4x ARM Cortex A57 • 4x ARM Cortex A53 • Adreno 430 GPU • Hexagon V56 DSP • 20nm Process Technology

Embedded Systems Research & Future Developments
Henk Corporaal
TU/e Electrical Engineering Faculty, Electronic Systems Group
June 2014

Our Group: Electronic Systems
Part of TU/e – Electrical Engineering
Mission
• Scientific basis for design trajectories
• From function to realization
• Iteration-free design
[Figure labels: medical imaging; design; sensing and actuation; circuit; chip; computer architecture; computer; fabrication; systems-on-chip; automotive; networked systems]
HC – FHI DevClub 2

Our industrial cooperation

What is an Embedded System
• Wikipedia:
• “An embedded system is a computer system with a dedicated function within a larger mechanical or electrical system, often with real-time constraints … Embedded systems contain processing cores … Embedded systems control many devices in common use today”
• Everywhere: credit cards, smartphones, cars, households, etc.
• ARM sells 30 M per day!
• Billions sold every year

Overview
• Design Challenges
• Processor Trends
  − CPUs
  − GPUs
• Research
  − Multiprocessor design
  − GPU programming
  − Low power

Major challenges
• Ultra low power
• Guarantee Real-Time requirements
• Deal with tremendous complexity
• Very complex processing platforms
  − Multi-processor
  − Heterogeneous
  − Huge software stack
• Programming difficulty

Your future phone requirements
Requirements:
- 4G Communication
- 2D/3D Graphics
- Audio/Video Codec
- Image/Video Processing
> 1 TOPS
< 1 pJ/op
< 1 Watt

Intrinsic Computational Efficiency (ICE)
[Figure: ICE graph; annotations: 1 pJ/op, accelerator]

Design complexity problem
[Figure: complexity (log scale, 10^1 to 10^3) versus year (4, 8, 12, 16). Process technology: +58%; HW design productivity: +21%; SW productivity: +8%. The differences between the curves are marked as the HW gap and the SW gap.]

Dealing with complexity: the design pyramid
[Figure: design pyramid from Idea to Realization. Labels: Requirements, Specification, Architecture, Implementation, Solution; abstraction level; construct aspect models and evaluate properties; X, Y]

Design problems: Complexity!
[Figure labels: idea; video, control, comm.; abstraction level; design wall; #concepts ?; convergence of design space; design flows; different realizations]

Processor Trends
By Cedric Nugteren, 2013

Haswell (4th gen. Intel Core processor)

Intel Xeon Phi
• Intel® Xeon Phi™ Coprocessor 7120X
• 61 cores
• 1.238 GHz, 1.333 GHz Turbo
• 512-bit vector instructions
• 30.5 MB cache (512 KB / core)
• 300 W TDP
• 16 GB on-board memory
• Price: $4129

ARM Cortex A57: 64-bit

Your S4, full of ARMs
Exynos 5 Octa processor
• Big-Little concept
• Quad ARM Cortex-A15
• Quad ARM Cortex-A7
• + PowerVR GPU
or Qualcomm Snapdragon S600 with quad ARM

Qualcomm Snapdragon 810
• 4x ARM Cortex A57
• 4x ARM Cortex A53
• Adreno 430 GPU
• Hexagon V56 DSP
• 20nm process technology
• 4K capture and playback with H.264 and H.265
• Up to 55 MP camera, dual ISP (Image Signal Processor)

PS4 / Xbox
• 8 AMD Jaguars
• L1 instruction: 32 KB, 2-way; L1 data: 32 KB, 8-way; 2 MB 16-way L2 per quad
• 1152 MAD (multiply-add) units: 1.84 TFLOPS
• 8 GB GDDR5: 176 GB/s
• 28nm

Nvidia GPU overview: going many core
[Figure: core count, GFLOPS and power [W] on a log scale (64 to 8192), Jan 2006 to Jan 2014, for 8800 GTX (G80), 9800 GTX (G92), GTX 280 (GT200), GTX 480 (GF100), GTX 580 (GF110), GTX 680 (GK104), GTX Titan (GK110) and GTX 780 Ti (GK110)]

Maxwell – GM107 (2014)
• 28nm
• SM:
  − 4x 32 cores
  − 4x 8 LD/ST units
  − 4x 8 SFUs
  − 4 warp schedulers, 2 instructions / warp
• 32 threads per warp; 1 clock cycle / warp

Very Hot: NVIDIA K1 mobile platform
• Big-Little: 4 big + 1 little ARM A15s
• Kepler GPU with 192 cores
• Supports 2160p @ 30fps
[Figure labels: Kepler 192 cores; ARM big-little]

Major challenges
• Ultra low power
• Guarantee Real-Time requirements
• Deal with tremendous complexity
• Very complex processing platforms
  − Multi-processor
  − Heterogeneous
  − Huge software stack
• Programming difficulty

Our TU/e group solutions
• Real-time Multi-processing out of the box
  − From Model to Multi-Processor / System-on-Chip: CompSoC, MAMPSx
  − Composable and Predictable design
• From C to GPUs & FPGAs
  − Characterize loop-nests: Species
  − From C to efficient GPU code
  − From C to Verilog FPGA accelerators
• Ultra Low Power
  − SIMD (vector processor) with very low Vdd
  − Avoid (external) memory traffic: Reuse data
  − Use of accelerators (Heterogeneous Processing Hardware)

MPSoC out of the box

Application Modeling: Synchronous Data Flow
[Figure: SDF graph for MPEG-4 SP with actors VLD, IDCT, MC and RC, connected by channels carrying tokens; rates of 99 are annotated]
• Conservative, worst-case abstraction
• Strong analysis & synthesis support
• Low implementation overhead …
• … but may lead to over-allocation of resources

Modeling: Scenario-Aware Data Flow
[Figure: graph for MPEG-4 SP with actors VLD, IDCT, MC and RC, channels and tokens; rates x, with x = {0, 30, 40, 50, 60, 70, 80, 99}; state machine with scenarios I, P0 and P99]
• Dynamics captured in scenarios
• Efficient implementations
• Design and Modeling Tools available: SDF3

MAMPSx
Multiprocessor Framework for Real-Time correct-by-construction Systems

GPU Programming

GPUs: hundreds of processors

However: Programming hell, from C to CUDA

Programming heaven? Or FPGA?

Good news: Lots of similarity

Algorithmic Species

Our performance (compared to other fully automatic compilers)

Ultra Low Power

SIMD + low Vdd + New Memory Architecture
• Exploiting Data Locality
• Extreme low Voltage
• Massive parallelism

Results on the ICE Graph
[Figure: ICE graph; annotation: 1 pJ/op]

Can we do better?
• Yes! How?
• Supporting special functionality
  − Accelerators
• Change your software
  − Advanced code (loop) transformations

E.g. Data Reuse in Convolutional Neural Networks
CNN used for e.g. recognizing traffic signs
[Figure: CNN with a feature extraction part and a classification part. Input 32 x 32; C1 feature maps 28 x 28 (5x5 convolution); S1 feature maps 14 x 14 (2x2 subsampling); C2 feature maps 10 x 10 (5x5 convolution); S2 feature maps 5 x 5 (2x2 subsampling); n1 (5x5 convolution); n2 (1x1 convolution); output: sign 30, 50, 60, 70, 80, 90, 100]

Data reuse: Bandwidth reduction
[Figure: #external memory accesses (x 10^6, log scale, 1 to 1000) versus on-chip memory size; curve: original; annotation: 327x]
• However: Huge on-chip memory required

After using our code transformer:
[Figure: #external memory accesses (x 10^6, log scale, 1 to 1000) versus on-chip memory size; curves: original, model, simulated; points: loop n, loop m, loop q, loop r]
• 64 times memory reduction / same BW reduction

Convolutional Neural Network FPGA prototype
• Synthesized with Vivado HLS (AutoESL), C -> HDL
• Extreme (linear) speedup (with number of MACC PEs)
[Figure: accelerator for CNN layer 3, connected via FSL; blocks: in_img, weight, bias, In Ctrl, array of MACC PEs, Out Ctrl, out_img, Activation LUT, Select, saturate, DDR]

Summary
• Embedded systems become extremely complex
• Processor trends
• Advanced methods and tooling needed for
  − Programming efficiency
  − Design Space Exploration
  − Predictable Design: guaranteeing timing properties
  − Correct by Construction
• Researching solutions
  − Automated MPSoC design
  − Automated C to target platform
  − Ultra low power
• We FPGA prototype everything

Cheap prototype boards

Minnowboard MAX (99 to 139 USD)
• Number of cores: Single/Dual core 64-bit Intel® Atom™ E38xx Series SoC @ 1.46 / 1.33 GHz
• RAM: 1/2 GB DDR3 RAM
• Flash: 8 MByte SPI Flash; external SD card
• Caches: 32 KByte I-Cache / 24 KByte D-Cache; 512 KByte L2 cache
• GPU: Intel integrated HD Graphics
• I/O:
  − DVI over HDMI connector
  − Audio
  − SATA2 3 Gb/sec
  − 2 USB (host) & 1 USB-B (device; slave)
  − Serial debug: Serial (UART 0) to USB conversion (mini-USB-B port)
  − 10/100/1000 Ethernet RJ-45 connector
  − 8 buffered GPIO (General Purpose I/O) pins
  − SPI & I2C
• Special features: 7-issue slot VLIW from SiliconHive (Intel)
• Price: 99 USD / 139 USD

Raspberry Pi (35 USD)
• Number of cores: 1 Broadcom BCM2835 (CPU: ARM11 @ 700 MHz + GPU + DSP)
• RAM: 512 MByte
• Flash: external SD card
• Caches: 16 KB (instruction) / 16 KB (data) L1 cache
• GPU: Broadcom VideoCore IV
• I/O:
  − 2 USB ports
  − Audio out
  − HDMI video out
  − Composite video out
  − 10/100 Mbit Ethernet
  − GPIO
• Special features: Very cheap, relatively small
• Price: 35 USD

Jetson TK1 (using Tegra K1) (192 USD)
• Number of cores: 4-Plus-1 quad-core ARM: 4 Cortex A15 + A7 CPU @ 2.3 GHz
• RAM: 2 GByte with 64 bit width
• Flash: 16 GB 4.51 eMMC memory & external SD card
• Caches: 32 KB I-cache, 32 KB D-cache; 4 MByte L2 cache
• GPU: Kepler GPU with 192 CUDA cores @ 950 MHz
• I/O:
  − Half mini-PCIe slot
  − Full-size SD/MMC connector
  − Full-size HDMI port
  − 1x USB 2.0, 1x USB 3.0, 1x RS232
  − Audio & Gigabit Ethernet
  − Via expansion port: GPIO, UART, SPI, I2C
• Special features: Very fast mobile GPU platform, OpenCL programmable
• Price: 192 USD

Google Tango (using Tegra K1) (1024 USD)
• Number of cores: 4-Plus-1 quad-core ARM: 4 Cortex A15 + A7 CPU @ 2.3 GHz
• RAM: 4 GB of RAM
• Flash: 128 GB
• Caches: 32 KB I-cache, 32 KB D-cache; 4 MByte L2 cache
• GPU: Kepler
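
Several figures quoted in the slides can be cross-checked with simple arithmetic. The Python sketch below is not part of the original deck: the function names are illustrative, the convolutions are assumed to be unpadded with stride 1, the subsampling is assumed to be non-overlapping, and 1 MB is taken as 1024 KB. Under those assumptions it relates the phone targets (1 TOPS at 1 pJ/op against a 1 W budget), the Xeon Phi cache total (61 cores at 512 KB each), and the CNN feature-map sizes (32, 28, 14, 10, 5).

```python
# Back-of-envelope checks on numbers quoted in the slides.
# All helper names are hypothetical; they are not from the deck.


def power_watts(ops_per_second, joules_per_op):
    """Average power = throughput x energy per operation."""
    return ops_per_second * joules_per_op


def cache_total_mb(cores, kb_per_core):
    """Total cache in MB, assuming 1 MB = 1024 KB."""
    return cores * kb_per_core / 1024


def conv_out(size, kernel):
    """Output width of an unpadded, stride-1 convolution (assumed)."""
    return size - kernel + 1


def subsample_out(size, factor):
    """Output width after non-overlapping subsampling (assumed)."""
    return size // factor


def cnn_feature_map_sizes(input_size=32):
    """Widths after C1 (5x5), S1 (2x2), C2 (5x5), S2 (2x2)."""
    sizes = [input_size]
    for kind, k in [("conv", 5), ("sub", 2), ("conv", 5), ("sub", 2)]:
        prev = sizes[-1]
        nxt = conv_out(prev, k) if kind == "conv" else subsample_out(prev, k)
        sizes.append(nxt)
    return sizes


if __name__ == "__main__":
    # 1 TOPS at 1 pJ/op: 1e12 op/s x 1e-12 J/op, i.e. about 1 W
    print(power_watts(1e12, 1e-12))
    # Xeon Phi: 61 cores x 512 KB
    print(cache_total_mb(61, 512))
    # CNN feature-map widths from the 32 x 32 input
    print(cnn_feature_map_sizes())
```

The first check shows why the 1 pJ/op target matters: at 1 TOPS, every additional picojoule per operation costs roughly another watt.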