From Hardware Accelerators to Algorithm Design

Total Page:16

File Type:pdf, Size:1020Kb

From Hardware Accelerators to Algorithm Design 1 Efficient Computing for AI and Robotics: From Hardware Accelerators to Algorithm Design Vivienne Sze ( @eems_mit) Massachusetts Institute of Technology In collaboration with Luca Carlone, Yu-Hsin Chen, Joel Emer, Sertac Karaman, Tushar Krishna, Peter Li, Fangchang Ma, Amr Suleiman, Diana Wofk, Nellie Wu, Tien-Ju Yang, Zhengdong Zhang Slides available at https://tinyurl.com/SzeMITDL2020 Vivienne Sze http://sze.mit.edu/ @eems_mit 2 Processing at “Edge” instead of the “Cloud” Communication Privacy Latency Vivienne Sze http://sze.mit.edu/ @eems_mit 3 Computing Challenge for Self-Driving Cars (Feb 2018) Cameras and radar generate ~6 gigabytes of data every 30 seconds. Self-driving car prototypes use approximately 2,500 Watts of computing power. Generates wasted heat and some prototypes need water-cooling! Vivienne Sze http://sze.mit.edu/ @eems_mit 4 Existing Processors Consume Too Much Power < 1 Watt > 10 Watts Vivienne Sze http://sze.mit.edu/ @eems_mit 5 Transistors Are Not Getting More Efficient Slowdown of Moore’s Law and Dennard Scaling General purpose microprocessors are not getting faster or more efficient Slowdown Need specialized / domain-specific hardware for significant improvements in speed and energy efficiency Vivienne Sze http://sze.mit.edu/ @eems_mit 6 Efficient Computing with Cross-Layer Design Algorithms Systems Architectures Circuits Vivienne Sze http://sze.mit.edu/ @eems_mit 7 Energy Dominated by Data Movement Operation: Energy Relative Energy Cost (pJ) 8b Add 0.03 16b Add 0.05 32b Add 0.1 16b FP Add 0.4 32b FP Add 0.9 Memory access is orders 8b Multiply 0.2 of magnitude higher energy than compute 32b Multiply 3.1 16b FP Multiply 1.1 32b FP Multiply 3.7 32b SRAM Read (8KB) 5 32b DRAM Read 640 1 10 102 103 104 Vivienne Sze http://sze.mit.edu/ @eems_mit [Horowitz, ISSCC 2014] 8 Autonomous Navigation Uses a Lot of Data Semantic Understanding Geometric Understanding • High frame rate • Growing map size • Large resolutions • Data expansion 2 million pixels 10x-100x more pixels Vivienne Sze http://sze.mit.edu/ @eems_mit [Pire, RAS 2017] 9 Visual-Inertial Localization Determines location/orientation of robot from images and IMU (also used by headset in Augmented Reality and Virtual Reality) … Localization Image sequence Visual-Inertial Odometry IMU (VIO) * Inertial Measurement Unit *Subset of SLAM algorithm (Simultaneous Localization And Mapping) Mapping Slide 28 Vivienne Sze http://sze.mit.edu/ @eems_mit 10 Localization at Under 25 mW First chip that performs complete Visual-Inertial Odometry Front-End for camera (Feature detection, tracking, and outlier elimination) Front-End for IMU (pre-integration of accelerometer and gyroscope data) Back-End Optimization of Pose Graph Navion Consumes 684× and 1582× less energy than mobile and desktop CPUs, respectively [Joint work with Sertac Karaman (AeroAstro)] Vivienne Sze http://sze.mit.edu/ @eems_mit [Zhang, RSS 2017], [Suleiman, VLSI-C 2018] 11 Key Methods to Reduce Data Size Navion: Fully integrated system – no off-chip processing or storage Vision Frontend (VFE) Backend (BE) Previous Current Backend Control Line Buffers Frame Frame Data & Control Bus Feature Undistort Undistort Feature Floating Build Graph Exploit Detection & Rectify & Rectify Tracking Point Graph (FD) (UR) (UR) (FT) Arithmetic Sparsity in Apply Low Linear Matrix Linearize Solver Graph and Cost Left Right Operations Horizon Frame Frame Cholesky Linear States Linear Solver Frame Solver Back Shared Sparse Stereo (SS) Compression Substitute Marginal Memory Vision Frontend Control Rodrigues Register Operations Retract File Data & Control Bus IMU Frontend (IFE) Fixed Point IMU RANSAC Point Cloud Floating Point Arithmetic Arithmetic Pre-Integration memory Use compression and exploit sparsity to reduce memory down to 854kB Navion Project Website: http://navion.mit.edu Vivienne Sze http://sze.mit.edu/ @eems_mit [Suleiman, VLSI-C 2018] Best Student Paper Award 12 Understanding the Environment Depth Estimation Semantic Segmentation State-of-the-art approaches use Deep Neural Networks, which require up to several hundred millions of operations and weights to compute! >100x more complex than video compression Vivienne Sze http://sze.mit.edu/ @eems_mit 13 Deep Neural Networks Deep Neural Networks (DNNs) have become a cornerstone of AI Computer Vision Speech Recognition Game Play Medical Vivienne Sze http://sze.mit.edu/ @eems_mit 14 Book on Efficient Processing of DNNs Part I Understanding Deep Neural Networks Introduction Overview of Deep Neural Networks Part II Design of Hardware for Processing DNNs Key Metrics and Design Objectives Kernel Computation Designing DNN Accelerators Operation Mapping on Specialized Hardware Part III Co-Design of DNN Hardware and Algorithms Reducing Precision Exploiting Sparsity Designing Efficient DNN Models Advanced Technologies https://tinyurl.com/EfficientDNNBook Vivienne Sze http://sze.mit.edu/ @eems_mit Free download for institutional subscribers 15 Weighted Sums Sigmoid Rectified Linear Unit (ReLU) Nonlinear ⎛ 3 ⎞ 1 1 Yj =Activationactivation⎜∑Wij × Xi ⎟ 0 Function ⎝ i=1 ⎠ 0 W11 Y1 y=1/(1+e-x) y=max(0,x) -1 -1 X1 -1 0 1 -1 0 1 Y2 Image source: Caffe tutorial X2 Key operation is Y3 multiply and accumulate (MAC) Accounts for > 90% of computation X3 Output Layer W34 Y4 Input Layer Hidden Layer Vivienne Sze http://sze.mit.edu/ @eems_mit 16 High-Dimensional Convolution in CNN input fmap C filter … … … output fmap C … … H … E R … … S W F Many Input Channels (C) Vivienne Sze http://sze.mit.edu/ @eems_mit 17 Define Shape for Each Layer Input fmaps Output fmaps Filters C … M … … Shape varies across layers C … … H E H – Height of input fmap (activations) R W – Width of input fmap (activations) 1 … … … 1 1 C – Number of 2-D input fmaps /filters S W F (channels) R – Height of 2-D filter (weights) … … … S – Width of 2-D filter (weights) M – Number of 2-D output fmaps (channels) C M C… … … … … … E – Height of output fmap (activations) … F – Width of output fmap (activations) R E N – Number of input fmaps/output fmaps M … H (batch size) N … S … N … F W Vivienne Sze http://sze.mit.edu/ @eems_mit 18 Popular DNN Models Metrics LeNet-5 AlexNet VGG-16 GoogLeNet ResNet-50 EfficientNet-B4 (v1) Top-5 error (ImageNet) n/a 16.4 7.4 6.7 5.3 3.7* Input Size 28x28 227x227 224x224 224x224 224x224 380x380 # of CONV Layers 2 5 16 21 (depth) 49 96 # of Weights 2.6k 2.3M 14.7M 6.0M 23.5M 14M # of MACs 283k 666M 15.3G 1.43G 3.86G 4.4G # of FC layers 2 3 3 1 1 65** # of Weights 58k 58.6M 124M 1M 2M 4.9M # of MACs 58k 58.6M 124M 1M 2M 4.9M Total Weights 60k 61M 138M 7M 25.5M 19M Total MACs 341k 724M 15.5G 1.43G 3.9G 4.4G Reference Lecun, Krizhevsky, Simonyan, Szegedy, He, Tan, PIEEE 1998 NeurIPS 2012 ICLR 2015 CVPR 2015 CVPR 2016 ICML 2019 DNN models getting larger and deeper * Does not include multi-crop and ensemble ** Increase in FC layers due to squeeze-and-excitation layers (much smaller than FC layers for classification) Vivienne Sze http://sze.mit.edu/ @eems_mit 19 Properties We Can Leverage • Operations exhibit high parallelism à high throughput possible • Memory Access is the Bottleneck Memory Read MAC* Memory Write filter weight ALU fmap act updated DRAM partial sum DRAM partial sum 200x 1x * multiply-and-accumulate Worst Case: all memory R/W are DRAM accesses • Example: AlexNet has 724M MACs à 2896M DRAM accesses required Vivienne Sze http://sze.mit.edu/ @eems_mit 20 Properties We Can Leverage • Operations exhibit high parallelism à high throughput possible • Input data reuse opportunities (up to 500x) Input Fmaps Filters Filter Input Fmap Input Fmap Filter 1 1 2 2 Convolutional Reuse Fmap Reuse Filter Reuse (Activations, Weights) (Activations) (Weights) CONV layers only CONV and FC layers CONV and FC layers (sliding window) (batch size > 1) Vivienne Sze http://sze.mit.edu/ @eems_mit 21 Exploit Data Reuse at Low-Cost Memories Specialized PE PE hardware with Global Reg 0.5File – 1.0 kB DRAM small (< 1kB) Buffer PE ALU fetch data to run low cost memory a MACControl here near compute Normalized Energy Cost* ALU 1× (Reference) 0.5 – 1.0 kB RF ALU 1× Farther and larger NoC: 200 – 1000 PEs PE ALU 2× memories consume more power 100 – 500 kB Buffer ALU 6× DRAM ALU 200× * measured from a commercial 65nm process Vivienne Sze http://sze.mit.edu/ @eems_mit 22 Weight Stationary (WS) Global Buffer Psum Activation W0 W1 W2 W3 W4 W5 W6 W7 PE Weight • Minimize weight read energy consumption − maximize convolutional and filter reuse of weights • Broadcast activations and accumulate partial sums spatially across the PE array • Examples: TPU [Jouppi, ISCA 2017], NVDLA Vivienne Sze http://sze.mit.edu/ @eems_mit [Chen, ISCA 2016] 23 Output Stationary (OS) Global Buffer Activation Weight P0 P1 P2 P3 P4 P5 P6 P7 PE Psum • Minimize partial sum R/W energy consumption − maximize local accumulation • Broadcast/Multicast filter weights and reuse activations spatially across the PE array • Examples: [Moons, VLSI 2016], [Thinker, VLSI 2017] Vivienne Sze http://sze.mit.edu/ @eems_mit [Chen, ISCA 2016] 24 Row Stationary Dataflow Row 1 PE 1 Row 1 * Row 1 • Maximize row convolutional reuse in RF − Keep a filter row and fmap sliding window in RF • Maximize row psum accumulation in RF * = Vivienne Sze http://sze.mit.edu/ @eems_mit [Chen, ISCA 2016] Select for Micro Top Picks 25 Row Stationary Dataflow Row 1 Row 2 Row 3 PE 1 PE 4 PE 7 Row 1 * Row 1 Row 1 * Row 2 Row 1 * Row 3 PE 2 PE 5 PE 8 Row 2 * Row 2 Row 2 * Row 3 Row 2 * Row 4 PE 3 PE 6 PE 9 Row 3 * Row 3 Row 3 * Row 4 Row 3 * Row 5 = = = * Optimize for overall* energy efficiency
Recommended publications
  • Hardware Architecture
    Hardware Architecture Components Computing Infrastructure Components Servers Clients LAN & WLAN Internet Connectivity Computation Software Storage Backup Integration is the Key ! Security Data Network Management Computer Today’s Computer Computer Model: Von Neumann Architecture Computer Model Input: keyboard, mouse, scanner, punch cards Processing: CPU executes the computer program Output: monitor, printer, fax machine Storage: hard drive, optical media, diskettes, magnetic tape Von Neumann architecture - Wiki Article (15 min YouTube Video) Components Computer Components Components Computer Components CPU Memory Hard Disk Mother Board CD/DVD Drives Adaptors Power Supply Display Keyboard Mouse Network Interface I/O ports CPU CPU CPU – Central Processing Unit (Microprocessor) consists of three parts: Control Unit • Execute programs/instructions: the machine language • Move data from one memory location to another • Communicate between other parts of a PC Arithmetic Logic Unit • Arithmetic operations: add, subtract, multiply, divide • Logic operations: and, or, xor • Floating point operations: real number manipulation Registers CPU Processor Architecture See How the CPU Works In One Lesson (20 min YouTube Video) CPU CPU CPU speed is influenced by several factors: Chip Manufacturing Technology: nm (2002: 130 nm, 2004: 90nm, 2006: 65 nm, 2008: 45nm, 2010:32nm, Latest is 22nm) Clock speed: Gigahertz (Typical : 2 – 3 GHz, Maximum 5.5 GHz) Front Side Bus: MHz (Typical: 1333MHz , 1666MHz) Word size : 32-bit or 64-bit word sizes Cache: Level 1 (64 KB per core), Level 2 (256 KB per core) caches on die. Now Level 3 (2 MB to 8 MB shared) cache also on die Instruction set size: X86 (CISC), RISC Microarchitecture: CPU Internal Architecture (Ivy Bridge, Haswell) Single Core/Multi Core Multi Threading Hyper Threading vs.
    [Show full text]
  • Efficient Processing of Deep Neural Networks
    Efficient Processing of Deep Neural Networks Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer Massachusetts Institute of Technology Reference: V. Sze, Y.-H.Chen, T.-J. Yang, J. S. Emer, ”Efficient Processing of Deep Neural Networks,” Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, 2020 For book updates, sign up for mailing list at http://mailman.mit.edu/mailman/listinfo/eems-news June 15, 2020 Abstract This book provides a structured treatment of the key principles and techniques for enabling efficient process- ing of deep neural networks (DNNs). DNNs are currently widely used for many artificial intelligence (AI) applications, including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the- art accuracy on many AI tasks, it comes at the cost of high computational complexity. Therefore, techniques that enable efficient processing of deep neural networks to improve key metrics—such as energy-efficiency, throughput, and latency—without sacrificing accuracy or increasing hardware costs are critical to enabling the wide deployment of DNNs in AI systems. The book includes background on DNN processing; a description and taxonomy of hardware architectural approaches for designing DNN accelerators; key metrics for evaluating and comparing different designs; features of DNN processing that are amenable to hardware/algorithm co-design to improve energy efficiency and throughput; and opportunities for applying new technologies. Readers will find a structured introduction to the field as well as formalization and organization of key concepts from contemporary work that provide insights that may spark new ideas. 1 Contents Preface 9 I Understanding Deep Neural Networks 13 1 Introduction 14 1.1 Background on Deep Neural Networks .
    [Show full text]
  • Architectural Adaptation for Application-Specific Locality
    Architectural Adaptation for Application-Specific Locality Optimizations y z Xingbin Zhang Ali Dasdan Martin Schulz Rajesh K. Gupta Andrew A. Chien Department of Computer Science yInstitut f¨ur Informatik University of Illinois at Urbana-Champaign Technische Universit¨at M¨unchen g fzhang,dasdan,achien @cs.uiuc.edu [email protected] zInformation and Computer Science, University of California at Irvine [email protected] Abstract without repartitioning hardware and software functionality and reimplementing the co-processing hardware. This re- We propose a machine architecture that integrates pro- targetability problem is an obstacle toward exploiting pro- grammable logic into key components of the system with grammable logic for general purpose computing. the goal of customizing architectural mechanisms and poli- cies to match an application. This approach presents We propose a machine architecture that integrates pro- an improvement over traditional approach of exploiting grammable logic into key components of the system with programmable logic as a separate co-processor by pre- the goal of customizing architectural mechanisms and poli- serving machine usability through software and over tra- cies to match an application. We base our design on the ditional computer architecture by providing application- premise that communication is already critical and getting specific hardware assists. We present two case studies of increasingly so [17], and flexible interconnects can be used architectural customization to enhance latency tolerance to replace static wires at competitive performance [6, 9, 20]. and efficiently utilize network bisection on multiproces- Our approach presents an improvement over co-processing sors for sparse matrix computations. We demonstrate that by preserving machine usability through software and over application-specific hardware assists and policies can pro- traditional computer architecture by providing application- vide substantial improvements in performance on a per ap- specific hardware assists.
    [Show full text]
  • Development of a Predictable Hardware Architecture Template and Integration Into an Automated System Design Flow
    Development Secure of Reprogramming a Predictable Hardware of Architecturea Network Template Connected and Device Integration into an Automated System Design Flow Securing programmable logic controllers MARCUS MIKULCAK MUSSIE TESFAYE KTH Information and Communication Technology Masters’ Degree Project Second level, 30.0 HEC Stockholm, Sweden June 2013 TRITA-ICT-EX-2013:138Degree project in Communication Systems Second level, 30.0 HEC Stockholm, Sweden Secure Reprogramming of a Network Connected Device KTH ROYAL INSTITUTE OF TECHNOLOGY Securing programmable logic controllers SCHOOL OF INFORMATION AND COMMUNICATION TECHNOLOGY ELECTRONIC SYSTEMS MUSSIE TESFAYE Development of a Predictable Hardware Architecture TemplateKTH Information and Communication Technology and Integration into an Automated System Design Flow Master of Science Thesis in System-on-Chip Design Stockholm, June 2013 TRITA-ICT-EX-2013:138 Author: Examiner: Marcus Mikulcak Assoc. Prof. Ingo Sander Supervisor: Seyed Hosein Attarzadeh Niaki Degree project in Communication Systems Second level, 30.0 HEC Stockholm, Sweden Abstract The requirements of safety-critical real-time embedded systems pose unique challenges on their design process which cannot be fulfilled with traditional development methods. To ensure their correct timing and functionality, it has been suggested to move the design process to a higher abstraction level, which opens the possibility to utilize automated correct-by-design development flows from a functional specification of the system down to the level of Multiprocessor Systems-on-Chip. ForSyDe, an embedded system design methodology, presents a flow of this kind by basing system development on the theory of Models of Computation and side-effect-free processes, making it possible to separate the timing analysis of computation and communication of process networks.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Integrated Circuit Technology for Wireless Communications: an Overview Babak Daneshrad Integrated Circuits and Systems Laboratory UCLA Electrical Engineering Dept
    Integrated Circuit Technology for Wireless Communications: An Overview Babak Daneshrad Integrated Circuits and Systems Laboratory UCLA Electrical Engineering Dept. email: [email protected] http://www.ee.ucla.edu/~babak Abstract: Figure 1 shows a block diagram of a typical wireless com- This paper provides a brief overview of present trends in the develop- munication system. Moreover it shows the partitioning of the ment of integrated circuit technology for applications in the wireless receiver into RF, IF, baseband analog and digital components. communications industry. Through advanced circuit, architectural and In this paper we will focus on two main classes of integrated processing technologies, ICs have helped bring about the wireless revo- circuit technologies. The first is ICs for analog and more lution by enabling highly sophisticated, low cost and portable end user importantly RF communications. In general the technologist as terminals. In this paper two broad categories of circuits are highlighted. The first is RF integrated circuits and the second is digital baseband well as the circuit designer’s challenge here is to first, make the processing circuits. In both areas, the paper presents the circuit design transistors operate at higher carrier frequencies, and second, to challenges and options presented to the designer. It also highlights the integrate as many of the components required in the receiver manner in which these technologies have helped advance wireless com- onto the IC. The second class of circuits that we will focus on munications. are digital baseband circuits. Where increased density and I. Introduction reduced power consumption are the key factors for optimiza- tion.
    [Show full text]
  • Hybrid Pipeline Hardware Architecture Based on Error Detection and Correction for AES
    sensors Article Hybrid Pipeline Hardware Architecture Based on Error Detection and Correction for AES Ignacio Algredo-Badillo 1,† , Kelsey A. Ramírez-Gutiérrez 1,† , Luis Alberto Morales-Rosales 2,* , Daniel Pacheco Bautista 3,† and Claudia Feregrino-Uribe 4,† 1 CONACYT-Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla 72840, Mexico; [email protected] (I.A.-B.); [email protected] (K.A.R.-G.) 2 Faculty of Civil Engineering, CONACYT-Universidad Michoacana de San Nicolás de Hidalgo, Morelia 58000, Mexico 3 Departamento de Ingeniería en Computación, Universidad del Istmo, Campus Tehuantepec, Oaxaca 70760, Mexico; [email protected] 4 Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla 72840, Mexico; [email protected] * Correspondence: [email protected] † These authors contributed equally to this work. Abstract: Currently, cryptographic algorithms are widely applied to communications systems to guarantee data security. For instance, in an emerging automotive environment where connectivity is a core part of autonomous and connected cars, it is essential to guarantee secure communications both inside and outside the vehicle. The AES algorithm has been widely applied to protect communications in onboard networks and outside the vehicle. Hardware implementations use techniques such as iterative, parallel, unrolled, and pipeline architectures. Nevertheless, the use of AES does not Citation: Algredo-Badillo, I.; guarantee secure communication, because previous works have proved that implementations of secret Ramírez-Gutiérrez, K.A.; key cryptosystems, such as AES, in hardware are sensitive to differential fault analysis. Moreover, Morales-Rosales, L.A.; Pacheco it has been demonstrated that even a single fault during encryption or decryption could cause Bautista, D.; Feregrino-Uribe, C.
    [Show full text]
  • Extensible Hardware Architecture for Mobile Robots
    Extensible Hardware Architecture for Mobile Robots Eric Park∗, Linda Kobayashi, and Susan Y. Lee∗ Intelligent Robotics Group NASA Ames Research Center Moffett Field, CA 94035-1000, USA {epark, lkobayashi, sylee}@arc.nasa.gov Abstract— The Intelligent Robotics Group at NASA Ames Research Center has developed a new mobile robot hardware architecture designed for extensibility and reconfigurability. Currently implemented on the K9 rover, and soon to be integrated onto the K10 series of human-robot collaboration research robots, this achitecture allows for rapid changes in instrumentation configuration and provides a high degree of modularity through a synergistic mix of off-the-shelf and custom designed components, allowing eased transplantation into a wide variety of mobile robot platforms. A component level overview of this architecture is presented along with a description of the changes required for implementation on K10, followed by plans for future work. Index Terms— modular, extensible, hardware architecture, mobile robot, k9 INTRODUCTION Fig. 1. K9 rover Mobile robots used for research and development are in increasing demand as the application space for robotics K9 Overview widens. With this demand comes the need for a smarter way to produce reliable mobile robots with a proven set The K9 rover is a 6-wheel steer, 6-wheel drive rocker- of avionics. Time and money lost in developing custom bogey chassis outfitted with electronics and instruments architectures for each new robot makes it increasingly appropriate for supporting research relevant to remote important to develop extensible hardware architectures that exploration [2] [3] [4]. K9’s overall dimensions are 1.05m can expand and accommodate rapid changes in configu- long by 0.85m wide by 1.6m high, with a mass of ap- ration, including transfer to an entirely different mobile proximately 65 kg and a top speed of approximately 6 platform [1].
    [Show full text]
  • Ultra-Low-Power Design and Implementation of Application-Specific Instruction-Set Processors for Ubiquitous Sensing and Computing
    Ultra-low-power Design and Implementation of Application-specific Instruction-set Processors for Ubiquitous Sensing and Computing NING MA Doctoral Thesis in Electronic and Computer Systems Stockholm, Sweden 2015 TRITA-ICT 2015:11 KTH School of Information and ISSN 1653-6363 Communication Technology ISRN KTH/ICT-2015/11-SE SE-164 40 Kista, Stockholm ISBN 978-91-7595-692-3 SWEDEN Akademisk avhandling som med tillstånd av Kungl Tekniska högskolan framlägges till offentlig granskning för avläggande av teknologie doktorsexamen i Elektronik och Datorsystem onsdag den 4 november 2015 klockan 10.00 i Sal B, Electrum, Kungl Tekniska högskolan, Kista 164 40, Stockholm. © Ning Ma, September 2015 Tryck: Universitetsservice US AB iii Abstract The feature size of transistors keeps shrinking with the development of technology, which enables ubiquitous sensing and computing. However, with the break down of Dennard scaling caused by the difficulties for further lower- ing supply voltage, the power density increases significantly. The consequence is that, for a given power budget, the energy efficiency must be improved for hardware resources to maximize the performance. Application-specific inte- grated circuits (ASICs) obtain high energy efficiency at the cost of low flexi- bility for various applications, while general-purpose processors (GPPs) gain generality at the expense of efficiency. To provide both high energy efficiency and flexibility, this dissertation ex- plores the ultra-low-power design of application-specific instruction-set pro- cessors (ASIP) for ubiquitous sensing and computing. Two application sce- narios, i.e. high-throughput compute-intensive processing for multimedia and low-throughput low-cost processing for Internet of Things (IoT) are imple- mented in the proposed ASIPs.
    [Show full text]
  • Foundations for the Study of Software Architecture
    ACM SIGSOFT SOFTWARE ENGINEERING NOTES vol 17 no 4 Oct 1992 Page 40 Foundations for the Study of Software Architecture Alexander L. Wolf Dewayne E. Perry Department of Computer Science AT&T Bell Lab oratories University of Colorado 600 Mountain Avenue Boulder, Colorado 80309 Murray Hill, New Jersey 07974 [email protected] [email protected] c 1989,1991,1992 Dewayne E. Perry and Alexander L. Wolf ABSTRACT more toward integrating designs and the design pro cess into the broader context of the software pro cess and its management. One result of this integration was that The purp ose of this pap er is to build the foundation many of the notations and techniques develop ed for soft- for software architecture. We rst develop an intuition ware design have b een absorb ed by implementation lan- for software architecture by app ealing to several well- guages. Consider, for example, the concept of supp ort- established architectural disciplines. On the basis of ing \programmming-in-the-large". This integration has this intuition, we present a mo del of software architec- tended to blur, if not confuse, the distinction b etween ture that consists of three comp onents: elements, form, design and implementation. and rationale. Elements are either pro cessing, data, or The 1980s also saw great advances in our abilityto connecting elements. Form is de ned in terms of the describ e and analyze software systems. We refer here to prop erties of, and the relationships among, the elements | that is, the constraints on the elements. The ratio- such things as formal descriptive techniques and sophis- nale provides the underlying basis for the architecture in ticated notions of typing that enable us to reason more terms of the system constraints, which most often derive e ectively ab out software systems.
    [Show full text]
  • 1. SYSTEM ARCHITECTURE 1.1 System Hardware Architecture
    1. SYSTEM ARCHITECTURE 1.1 System Hardware Architecture This part represents a potential hardware architecture as a series view or deployment view. These are view base on the overall architectures of our hardware system and organization. Also, the system requirements are specified to hardware components. 1.1.1 List of Hardware components ● Ultrasonic sensor HC-SR04 ● Arduino Board MCU ESP8266 ● Monitor ● Keyboard/Mouse ● Wireless Router ● Computer CPU ● GPS/GPRS Module 1.1.2 Diagram showing the connectivity between the components We represent into the deployment view of the architecture. To showing the connectivity between the components. Describes the physical components (hardware), their distribution (e.g., a web server, database server) and how the different pieces are connected. So, we choose the deployment diagram to represent deployment view. The reason we choose this approach is the simplest model and suited for a proof of concept as shown in figure 1. Figure 1: Deployment Diagram Install the Ultrasonic sensor inside the waste container on its cover. The Ultrasonic sensor will measure the level of garbage inside the waste container and send the digital signal to centralized server by wireless cellular network. The centralized server will control all the process in a central location. System Administrator can manage all of it at the desktops at the waste station. Garbage truck driver can also check status of each waste container by their devices or Personal Computer at the waste station. 1.2 System Software Architecture This part represents a potential software architecture as a series views; use- case view. These are view based on the overall architecture of our software system and organization.
    [Show full text]
  • Method of Top-Level Design for Automated Test Systems
    Advances in Engineering Research, volume 127 3rd International Conference on Electrical, Automation and Mechanical Engineering (EAME 2018) Method of Top-level Design for Automated Test Systems Zhenjie Zeng1, Xiaofei Zhu1,*, Shiju Qi1, Kai Wu2 and Xiaowei Shen1 1Rocket Force University of Engineering, Xi’an, China 2Troops No. 96604, Beijing, China *Corresponding author Abstract—When designing an automatic test system, it is The top-level design of automatic test system integration is necessary to make each electronic test device conform to different based on sufficient requirements analysis, and comprehensively test requirements. The most important issue is the system top- considers the optimal matching of technical and economic level design. The article starts with the three steps of the top-level performances. It is advanced, practical, open, real-time, design: system requirements analysis, architecture selection and universal (compatibility), and reliability. , maintainability and analysis, and test equipment configuration. It describes in detail other aspects of a comprehensive analysis, determine the test how to develop the top-level system design efficiently and system architecture (including hardware platforms and software reasonably when developing automated test systems. The platforms), develop a corresponding test program. As shown in principles, available method techniques, and precautions have Figure 1, it is usually divided into three steps: requirements some guiding significance for the top-level design of automated analysis,
    [Show full text]