KVM/ARM: the Design and Implementation of the Linux ARM Hypervisor

Total Page:16

File Type:pdf, Size:1020Kb

KVM/ARM: the Design and Implementation of the Linux ARM Hypervisor KVM/ARM: The Design and Implementation of the Linux ARM Hypervisor Christoffer Dall and Jason Nieh ~ 1.2 billion ~ 300 million Source: Gartner Market Report, Jan 2014 ARM Servers ARM Network Equipment Virtualization Extensions Key Challenges != Virtualization Extensions VT-x • No PC-standard on ARM Key Contributions • Split-Mode Virtualization • Implemented and evaluated KVM/ARM • Upstreamed implementation to mainline linux Outline • ARM Virtualization Extensions • Comparison to x86 • Split-Mode Virtualization • Results • Experiences ARM Virtualization Extensions • Provides virtualization in 4 key areas: • CPU Virtualization • Memory Virtualization • Interrupt Virtualization • Timer Virtualization ARM Virtualization Extensions CPU Virtualization User Kernel ARM Virtualization Extensions CPU Virtualization User Kernel Hyp ARM Virtualization Extensions CPU Virtualization VM 0 VM 1 User User Kernel Kernel Hypervisor ARM Virtualization Extensions CPU Virtualization • Separate stack • Different control registers • Different page table format • Controls Memory, Interrupt, and Timer virtualization • Can be completely bypassed ARM Virtualization Extensions Memory Virtualization 0 4 GB Virtual User space application Kernel Addresses Stage-1 Page Tables MMU 0 2^40 Intermediate Physical RAM Devices Addresses (IPA) Stage-2 Page Tables MMU 0 2^40 Physical RAM Devices Addresses ARM Virtualization Extensions Interrupt Virtualization • Interrupts are an important part of any system • The mechanism to notify CPU of events • CPUs interact with an interrupt controller • VMs accessing an interrupt controller traps • Hurts performance ARM Virtualization Extensions Interrupt Virtualization CPU IRQ Interface CPU 0 Dist. CPU ACK Interface CPU 1 GIC ARM Virtualization Extensions Interrupt Virtualization CPU IRQ Interface CPU 0 Virtual CPU VIRQ Interface Dist. CPU ACK Interface CPU 1 Virtual CPU ACK Interface GIC x86 Root Non-Root (VM) VM ENTRY VM EXIT VMCS ARM User Space Kernel Trap Hypervisor (HYP) ARM User Space Kernel Trap ? Hypervisor (HYP) Standalone hypervisor on ARM • Reimplement basic OS functionality • Hardware support is a pain! Hosted Hypervisor on ARM Kernel ARM Hardware Hosted Hypervisor on ARM VM VM Guest Kernel Guest Kernel Kernel Hypervisor ARM Hardware Hosted Hypervisor on ARM HYP mode? VM VM Guest Kernel Guest Kernel Kernel Hypervisor ARM Hardware Split-Mode Virtualization OS Kernel Hypervisor Kernel mode HYP mode Split-Mode Virtualization OS Kernel Hypervisor Kernel mode Highvisor HYP mode Lowvisor Split-Mode Virtualization OS Kernel Hypervisor Run VM Kernel mode Highvisor HYP mode Lowvisor Split-Mode Virtualization OS Kernel Hypervisor Run VM Kernel mode Highvisor Trap HYP mode Lowvisor Split-Mode Virtualization OS Kernel Hypervisor VM Run VM Kernel mode Highvisor Trap Trap HYP mode Lowvisor Split-Mode Virtualization OS Kernel Hypervisor VM Kernel mode Highvisor HYP mode Lowvisor Split-Mode Virtualization OS Kernel Hypervisor VM Kernel mode Highvisor Trap HYP mode Lowvisor Split-Mode Virtualization OS Kernel Hypervisor VM Kernel mode Highvisor Trap Trap HYP mode Lowvisor Split-Mode Virtualization OS Kernel Hypervisor VM Function call Kernel mode Highvisor Trap Trap HYP mode Lowvisor Split-Mode Virtualization User Space VM User Space User mode Kernel mode OS Kernel VM Kernel Hyper- visor HYP mode ARM Hardware Split-Mode Virtualization User Space VM User Space User mode Kernel mode OS Kernel VM Kernel Hyper- visor HYP mode ARM Hardware Split-Mode Virtualization User Space VM User Space User mode Kernel mode OS Kernel VM Kernel Hyper- visor HYP mode ARM Hardware CPU Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor CPU Virtualization OS Kernel Hypervisor VM Run VM Highvisor Kernel mode HYP mode Lowvisor CPU Virtualization OS Kernel Hypervisor VM Run VM Highvisor Kernel mode HYP mode Lowvisor Enables traps CPU Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor CPU Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Disables traps Memory Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Memory Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Configures Stage-2 Page Tables Memory Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Configures Stage-2 Page Tables Memory Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Enables Stage-2 Translation Memory Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Memory Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Disables Stage-2 Translation Interrupt Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Interrupt Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Prepare Virtual Interrupts Interrupt Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Prepare Virtual Interrupts Interrupt Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Enable Virtual Interrupts Interrupt Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Interrupt Virtualization OS Kernel Hypervisor VM Highvisor Kernel mode HYP mode Lowvisor Disable Virtual Interrupts Split-Mode Virtualization Pros Cons • Existing OS functionality • Existing hardware • Double-trap on support world-switch • HYP mode for virtualization Implementation • Implemented KVM/ARM using split-mode virtualization • Merged into mainline Linux v3.9 Experimental Setup Arndale Board - Exynos 5250 MacBook Air Dual-Core Cortex A-15 @ 1.4 GHz Dual-Core Intel i7 @1.8 GHz eSATA SSD SATA SSD Experimental Setup version 12.10 Kernel version 3.10 (Quantal Quetzal) Same amount of memory Same kernel config across all runs Micro Benchmarks Test ARM (cycles) x86 (cycles) Trap 27 632 Hypercall 5,326 1,336 Send virtual IPI 14,366 17,138 Ack virtual IPI 427 2,043 2.0 Macro Benchmarks 1.5 1.0 0.5 Apache MySQL ARM memcached x86 kernel compile untar curl1k curl1g hackbench Outline • ARM Virtualization Extensions • Comparison to x86 • Split-Mode Virtualization • Results • Experiences Maintainability is key Maintainability is key Genuine fear of “Dump and Run” arch/arm mach-aaec2000 mach-ixp4xx mach-omap2 mach-shark mach-at91 mach-keystone mach-orion5x mach-shmobile mach-bcm mach-kirkwood mach-picoxcell mach-socfpga mach-bcmring mach-ks8695 mach-pnx4008 mach-spear mach-clps711x mach-l7200 mach-prima2 mach-sti mach-davinci mach-lh7a40x mach-pxa mach-stmp378x mach-dove mach-loki mach-realview mach-stmp37xx mach-ebsa110 mach-mmp mach-rockchip mach-sunxi mach-ep93xx mach-msm mach-rpc mach-tegra mach-footbridge mach-mv78xx0 mach-s3c2400 mach-u300 mach-gemini mach-mvebu mach-s3c2410 mach-ux500 mach-h720x mach-mx1 mach-s3c2412 mach-versatile mach-highbank mach-mx2 mach-s3c2440 mach-vexpress mach-imx mach-mx25 mach-s3c2442 mach-virt mach-integrator mach-mx3 mach-s3c2443 mach-vt8500 mach-iop13xx mach-mxc91231 mach-s3c24a0 mach-w90x900 mach-iop32x mach-netx mach-s3c6400 mach-zynq mach-iop33x mach-nomadik mach-s3c6410 ! mach-ixp2000 mach-ns9xxx mach-s5pc100 ! mach-ixp23xx mach-omap1 mach-sa1100 Trust Trust It’s about the people Linus Torvalds Russell King Avi Kivity … arch/arm virt/kvm … Will Deacon Marc Zyngier (ARM Ltd.) (ARM Ltd.) Conclusion • KVM/ARM was the first system to run unmodified guests on hardware • Facilitated by split-mode virtualization • Performance within 10% of native execution • Implemented KVM/ARM and upstreamed to Linux • Commercially supported and maintained Use and Contribute • Web: http://systems.cs.columbia.edu/projects/kvm-arm • Code: https://github.com/torvalds/linux • Mailing list: [email protected] • IRC: #kvm-arm on freenode.
Recommended publications
  • Enabling the Use of Low Power Mobile and Embedded Technologies For
    Enabling the Use of Embedded and Mobile Technologies for High-Performance Computing Author: Advisor: Nikola Rajovic´ Alex Ramirez A THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor per la Universitat Polit`ecnicade Catalunya Departament d’Arquitectura de Computadors Barcelona, Spring 2017 To my family . Моjоj породици . Acknowledgements During my PhD studies I received help and support from many people - it would be strange if I did not. Thus I would like to thank professors Mateo Valero and Veljko Milutinovi´cfor recognizing my interest in HPC and introducing me to my advisor, professor Alex Ramirez. I would like to thank him for trusting in me and giving me an opportunity to be a pioneer of HPC on mobile ARM platforms, for guiding and shaping my work last five years, for helping me understand what real priorities are, and for being pushy when it was really needed. In addition, thank you Carlos and Puzo for helping me to have a smooth start of my PhD and for filling the gap when Alex was too busy. Last two years of my PhD I have been working with Alex remotely, and I would like to thank the local crew who helped me: professor Eduard Ayguade for providing me with some very hard-to-get manuscripts and accepting to be my "ponente" at the University; professor Jesus Labarta for thorough dis- cussions about parallel performance issues and know-how lessons on demand; Alex Rico for helping me deliver work for the Mont-Blanc project and for friendly advices every so little; Filippo Mantovani for making sure I could use the prototypes without outages and for filtering-out internal bureaucracy issues.
    [Show full text]
  • Bootstomp: on the Security of Bootloaders in Mobile Devices
    BootStomp: On the Security of Bootloaders in Mobile Devices Nilo Redini, Aravind Machiry, Dipanjan Das, Yanick Fratantonio, Antonio Bianchi, Eric Gustafson, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna, UC Santa Barbara https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/redini This paper is included in the Proceedings of the 26th USENIX Security Symposium August 16–18, 2017 • Vancouver, BC, Canada ISBN 978-1-931971-40-9 Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX BootStomp: On the Security of Bootloaders in Mobile Devices Nilo Redini, Aravind Machiry, Dipanjan Das, Yanick Fratantonio, Antonio Bianchi, Eric Gustafson, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna fnredini, machiry, dipanjan, yanick, antoniob, edg, yans, chris, [email protected] University of California, Santa Barbara Abstract by proposing simple mitigation steps that can be im- plemented by manufacturers to safeguard the bootloader Modern mobile bootloaders play an important role in and OS from all of the discovered attacks, using already- both the function and the security of the device. They deployed hardware features. help ensure the Chain of Trust (CoT), where each stage of the boot process verifies the integrity and origin of 1 Introduction the following stage before executing it. This process, in theory, should be immune even to attackers gaining With the critical importance of the integrity of today’s full control over the operating system, and should pre- mobile and embedded devices, vendors have imple- vent persistent compromise of a device’s CoT. However, mented a string of inter-dependent mechanisms aimed at not only do these bootloaders necessarily need to take removing the possibility of persistent compromise from untrusted input from an attacker in control of the OS in the device.
    [Show full text]
  • FAN53525 3.0A, 2.4Mhz, Digitally Programmable Tinybuck® Regulator
    FAN53525 — 3.0 A, 2.4 MHz, June 2014 FAN53525 3.0A, 2.4MHz, Digitally Programmable TinyBuck® Regulator Digitally Programmable TinyBuck Digitally Features Description . Fixed-Frequency Operation: 2.4 MHz The FAN53525 is a step-down switching voltage regulator that delivers a digitally programmable output from an input . Best-in-Class Load Transient voltage supply of 2.5 V to 5.5 V. The output voltage is 2 . Continuous Output Current Capability: 3.0 A programmed through an I C interface capable of operating up to 3.4 MHz. 2.5 V to 5.5 V Input Voltage Range Using a proprietary architecture with synchronous . Digitally Programmable Output Voltage: rectification, the FAN53525 is capable of delivering 3.0 A - 0.600 V to 1.39375 V in 6.25 mV Steps continuous at over 80% efficiency, maintaining that efficiency at load currents as low as 10 mA. The regulator operates at Programmable Slew Rate for Voltage Transitions . a nominal fixed frequency of 2.4 MHz, which reduces the . I2C-Compatible Interface Up to 3.4 Mbps value of the external components to 330 nH for the output inductor and as low as 20 µF for the output capacitor. PFM Mode for High Efficiency in Light Load . Additional output capacitance can be added to improve . Quiescent Current in PFM Mode: 50 µA (Typical) regulation during load transients without affecting stability, allowing inductance up to 1.2 µH to be used. Input Under-Voltage Lockout (UVLO) ® At moderate and light loads, Pulse Frequency Modulation Regulator Thermal Shutdown and Overload Protection . (PFM) is used to operate in Power-Save Mode with a typical .
    [Show full text]
  • Embedded Computer Solutions for Advanced Automation Control «
    » Embedded Computer Solutions for Advanced Automation Control « » Innovative Scalable Hardware » Qualifi ed for Industrial Software » Open Industrial Communication The pulse of innovation » We enable Automation! « Open Industrial Automation Platforms Kontron, one of the leaders of embedded computing technol- ogy has established dedicated global business units to provide application-ready OEM platforms for specifi c markets, includ- ing Industrial Automation. With our global corporate headquarters located in Germany, Visualization & Control Data Storage Internet-of-Things and regional headquarters in the United States and Asia-Pa- PanelPC Industrial Server cifi c, Kontron has established a strong presence worldwide. More than 1000 highly qualifi ed engineers in R&D, technical Industrie 4.0 support, and project management work with our experienced sales teams and sales partners to devise a solution that meets M2M SYMKLOUD your individual application’s demands. When it comes to embedded computing, you can focus on your core capabilities and rely on Kontron as your global OEM part- ner for a successful long-term business relationship. In addition to COTS standards based products, Kontron also of- fers semi- and full-custom ODM services for a full product port- folio that ranges from Computer-on-Modules and SBCs, up to embedded integrated systems and application ready platforms. Open for new technologies Kontron provides an exceptional range of hardware for any kind of control solution. Open for individual application Kontron systems are available either as readily integrated control solutions, or as open platforms for customers who build their own control applications with their own look and feel. Open for real-time Kontron’s Industrial Automation platforms are open for Real- Industrial Ethernet Time operating systems like VxWorks and Linux with real time extension.
    [Show full text]
  • A3MAP: Architecture-Aware Analytic Mapping for Networks-On-Chip Wooyoung Jang and David Z
    6C-2 A3MAP: Architecture-Aware Analytic Mapping for Networks-on-Chip Wooyoung Jang and David Z. Pan Department of Electrical and Computer Engineering University of Texas at Austin [email protected], [email protected] Abstract - In this paper, we propose a novel and global A3MAP formulation, we seek to embed a task graph into the metric (Architecture-Aware Analytic Mapping) algorithm applied to space of network. Then, the quality of task mapping is NoC (Networks-on-Chip) based MPSoC (Multi-Processor measured by the total distortion of metric embedding. System-on-Chip) not only with homogeneous cores on regular Through this formulation, our A3MAP can map a task mesh architecture as done by most previous mapping adaptively to any different sized tile both on a algorithms but also with heterogeneous cores on irregular mesh or custom architecture. As a main contribution, we develop a regular/irregular mesh and on a custom network. Fig. 1 simple yet efficient interconnection matrix that models any task shows the methodology of our A3MAP. Given a task graph graph and network. Then, task mapping problem is exactly and a network as inputs, an interconnection matrix that can formulated to an MIQP (Mixed Integer Quadratic model any task graph and network along interconnection is Programming). Since MIQP is NP-hard [15], we propose two generated. Then, task mapping problem is exactly effective heuristics, a successive relaxation algorithm and a formulated to an MIQP (Mixed Integer Quadratic genetic algorithm. Experimental results show that A3MAP by Programming) and is solved by two effective heuristics since the successive relaxation algorithm reduces an amount of the MIQP is NP-hard [15].
    [Show full text]
  • Comparative Study of Various Systems on Chips Embedded in Mobile Devices
    Innovative Systems Design and Engineering www.iiste.org ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online) Vol.4, No.7, 2013 - National Conference on Emerging Trends in Electrical, Instrumentation & Communication Engineering Comparative Study of Various Systems on Chips Embedded in Mobile Devices Deepti Bansal(Assistant Professor) BVCOE, New Delhi Tel N: +919711341624 Email: [email protected] ABSTRACT Systems-on-chips (SoCs) are the latest incarnation of very large scale integration (VLSI) technology. A single integrated circuit can contain over 100 million transistors. Harnessing all this computing power requires designers to move beyond logic design into computer architecture, meet real-time deadlines, ensure low-power operation, and so on. These opportunities and challenges make SoC design an important field of research. So in the paper we will try to focus on the various aspects of SOC and the applications offered by it. Also the different parameters to be checked for functional verification like integration and complexity are described in brief. We will focus mainly on the applications of system on chip in mobile devices and then we will compare various mobile vendors in terms of different parameters like cost, memory, features, weight, and battery life, audio and video applications. A brief discussion on the upcoming technologies in SoC used in smart phones as announced by Intel, Microsoft, Texas etc. is also taken up. Keywords: System on Chip, Core Frame Architecture, Arm Processors, Smartphone. 1. Introduction: What Is SoC? We first need to define system-on-chip (SoC). A SoC is a complex integrated circuit that implements most or all of the functions of a complete electronic system.
    [Show full text]
  • Nomadik Application Processor Andrea Gallo Giancarlo Asnaghi ST Is #1 World-Wide Leader in Digital TV and Consumer Audio
    Nomadik Application Processor Andrea Gallo Giancarlo Asnaghi ST is #1 world-wide leader in Digital TV and Consumer Audio MP3 Portable Digital Satellite Radio Set Top Box Player Digital Car Radio DVD Player MMDSP+ inside more than 200 million produced chips January 14, 2009 ST leader in mobile phone chips January 14, 2009 Nomadik Nomadik is based on this heritage providing: – Unrivalled multimedia performances – Very low power consumption – Scalable performances January 14, 2009 BestBest ApplicationApplication ProcessorProcessor 20042004 9 Lowest power consumption 9 Scalable performance 9 Video/Audio quality 9 Cost-effective Nominees: Intel XScale PXA260, NeoMagic MiMagic 6, Nvidia MQ-9000, STMicroelectronics Nomadik STn8800, Texas Instruments OMAP 1611 January 14, 2009 Nomadik Nomadik is a family of Application Processors – Distributed processing architecture ARM9 + multiple Smart Accelerators – Support of a wide range of OS and applications – Seamless integration in the OS through standard API drivers and MM framework January 14, 2009 roadmap ... January 14, 2009 Some Nomadik products on the market... January 14, 2009 STn8815 block diagram January 14, 2009 Nomadik : a true real time multiprocessor platform ARM926 SDRAM SRAM General (L1 + L2) Purpose •Unlimited Space (Level 2 •Limited Bandwidth Cache System for Video) DMA Master OS Memory Controller Peripherals multi-layer AHB bus RTOS RTOS Multi-thread (Scheduler FSM) NAND Flash MMDSP+ Video •Unlimited Space MMDSP+ Audio 66 MHz, 16-bit •“No” Bandwidth 133 MHz, 24-bit •Mass storage
    [Show full text]
  • ARM Was Developed at Acron Computer Limited Of
    MEH420 Intro. To Embedded Systems ARM Processors ARM Processors • ARM was developed at Acron Computer • Based upon RISC Architecture with Limited of Cambridge, England between enhancements to meet requirements of 1983 & 1985 embedded applications. • RISC concept introduced in 1980 at Stanford • A large uniform register file and Berkley • Load-store architecture, where data processing operations operate on register contents only • ARM Limited founded in 1990 • Uniform and fixed length instruction • ARM Cores • 32-bit processor • Licensed to partners to develop and fabricate new • Instructions are 32-bit long microcontrollers • Good speed / power consumption ratio • Soft core • High code density -1- -2- -3- ARM Processors ARM Processors ARM Processors • Version 1 (1983-1985) (obsolete) • Version 5T • Enhancement to Basic RISC Features: • 26-bit addressing, no multiply or coprocessor • Superset of 4T adding new instruction • Version 5TE • Control over ALU and barrel shifter for every data • Version 2 (obsolete) processing operation to maximize their usage • Includes 32-bit result multiply co-processor • Add signal processing extension • Auto-increment and auto-decrement addressing • Version 3 • Examples: • ARM9E-S: v5TE (Sony Ericsson K-W series, TI modes to optimize program loops • 32-bit addressing • Load and Store multiple instructions to maximize OMAPs) data throughput • Version 4 • XScale: v4 (Samsung Omnia, Blackberry) • Conditional execution of instructions to maximize • Add signed, unsigned half-word and signed byte • Version 6 execution throughput load and store instructions • ARM11: ARMv6 (iPhone, Nokia E90, N95 etc) • Version 4T: Thumb compressed form of • Cortex-M0-M1: ARMv6 (STM32, NXP LPC, FPGA instruction introduced. Softcore) -4- -5- -6- ARM Processors ARM Processors: Common Features (till v5) ARM Processors: Basic ARM Organization • ARM v7: (M,E-M,R,A): Cortex-M3-M4, Cortex-R4-R5- R7, Cortex-A5-A7-A8-A9,A12, A15.
    [Show full text]
  • DISI - University of Trento
    PhD Dissertation International Doctorate School in Information and Communication Technologies DISI - University of Trento Cyber-Physical Systems: Two Case Studies in Design Methodologies Luca Rizzon Advisor: Prof. Roberto Passerone Universit`adegli Studi di Trento April 2016 Abstract To analyze embedded systems, engineers use tools that can simulate the performance of software components executed on hardware architectures. When the embedded system functionality is strongly correlated to physical quantities, as in the case of Cyber-Physical System (CPS), we need to model physical processes to determine the overall behavior of the system. Unfortunately, embedded systems simulators are not generally suitable to evaluate physical processes, and in the same way physical model simulators hardly capture the functionality of computing systems. In this work, we present a methodology to concurrently explore these aspects using the metroII design framework. The methodology provides guidelines for the implementation of these models in the design environment. To demonstrate the feasibility of the proposed approach, we applied the methodology to two case studies. A case study regards a binaural guidance system developed to be included into a smart rollator for older adults. The second case consists of an energy recovery device which gets energy from the heat dissipated by a high performance processor and power a smart sink able to provide cooling or to serve as a wireless sensing node. Keywords [Cyber-Physical Systems, Design Methodology, Binaural Synthesis, Energy Harvesting] Contents 1 Introduction 1 1.1 CPS Design Challenges . .1 1.2 Structure of the Thesis . .3 2 Design Methodology 5 2.1 State of the Art . .5 2.2 MetroII Design Framework .
    [Show full text]
  • Low-Power Ultra-Small Edge AI Accelerators for Image Recog- Nition with Convolution Neural Networks: Analysis and Future Directions
    Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 16 July 2021 doi:10.20944/preprints202107.0375.v1 Review Low-power Ultra-small Edge AI Accelerators for Image Recog- nition with Convolution Neural Networks: Analysis and Future Directions Weison Lin 1, *, Adewale Adetomi 1 and Tughrul Arslan 1 1 Institute for Integrated Micro and Nano Systems, University of Edinburgh, Edinburgh EH9 3FF, UK; [email protected]; [email protected] * Correspondence: [email protected] Abstract: Edge AI accelerators have been emerging as a solution for near customers’ applications in areas such as unmanned aerial vehicles (UAVs), image recognition sensors, wearable devices, ro- botics, and remote sensing satellites. These applications not only require meeting performance tar- gets but also meeting strict reliability and resilience constraints due to operations in harsh and hos- tile environments. Numerous research articles have been proposed, but not all of these include full specifications. Most of these tend to compare their architecture with other existing CPUs, GPUs, or other reference research. This implies that the performance results of the articles are not compre- hensive. Thus, this work lists the three key features in the specifications such as computation ability, power consumption, and the area size of prior art edge AI accelerators and the CGRA accelerators during the past few years to define and evaluate the low power ultra-small edge AI accelerators. We introduce the actual evaluation results showing the trend in edge AI accelerator design about key performance metrics to guide designers on the actual performance of existing edge AI acceler- ators’ capability and provide future design directions and trends for other applications with chal- lenging constraints.
    [Show full text]
  • Tegra Linux Driver Package
    TEGRA LINUX DRIVER PACKAGE RN_05071-R32 | March 18, 2019 Subject to Change 32.1 Release Notes RN_05071-R32 Table of Contents 1.0 About this Release ................................................................................... 3 1.1 Login Credentials ............................................................................................... 4 2.0 Known Issues .......................................................................................... 5 2.1 General System Usability ...................................................................................... 5 2.2 Boot .............................................................................................................. 6 2.3 Camera ........................................................................................................... 6 2.4 CUDA Samples .................................................................................................. 7 2.5 Multimedia ....................................................................................................... 7 3.0 Top Fixed Issues ...................................................................................... 9 3.1 General System Usability ...................................................................................... 9 3.2 Camera ........................................................................................................... 9 4.0 Documentation Corrections ..................................................................... 10 4.1 Adaptation and Bring-Up Guide ............................................................................
    [Show full text]
  • Introducing Slambench, a Performance and Accuracy Benchmarking Methodology for SLAM
    Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM Luigi Nardi1, Bruno Bodin2, M. Zeeshan Zia1, John Mawer3, Andy Nisbet3, Paul H. J. Kelly1, Andrew J. Davison1, Mikel Lujan´ 3, Michael F. P. O’Boyle2, Graham Riley3, Nigel Topham2 and Steve Furber3 Kernels Architectures Correctness Performance Metrics Frame rate Computer bilateralFilter (..) halfSampleRobust (..) Vision renderVolume (..) integrate (..) : Energy : Accuracy Compiler/Runtime Hardware ICL-NUIM Dataset Fig. 1: The SLAMBench framework makes it possible for experts coming from the computer vision, compiler, run-time, and hardware communities to cooperate in a unified way to tackle algorithmic and implementation alternatives. Abstract— Real-time dense computer vision and SLAM offer While classical point feature-based simultaneous locali- great potential for a new level of scene modelling, tracking sation and mapping (SLAM) techniques are now crossing and real environmental interaction for many types of robot, into mainstream products via embedded implementation in but their high computational requirements mean that use on mass market embedded platforms is challenging. Mean- projects like Project Tango [3] and Dyson 360 Eye [1], while, trends in low-cost, low-power processing are towards dense SLAM algorithms with their high computational re- massive parallelism and heterogeneity, making it difficult for quirements are largely at the prototype stage on PC or robotics and vision researchers to implement their algorithms laptop platforms. Meanwhile, there has been a great focus in a performance-portable way. In this paper we introduce in computer vision on developing benchmarks for accuracy SLAMBench, a publicly-available software framework which represents a starting point for quantitative, comparable and comparison, but not on analysing and characterising the validatable experimental research to investigate trade-offs in envelopes for performance and energy consumption.
    [Show full text]