Design and Soc Implementation of a Low Cost Smart Home Energy Management System Trung Kien Nguyen

Total Page:16

File Type:pdf, Size:1020Kb

Design and Soc Implementation of a Low Cost Smart Home Energy Management System Trung Kien Nguyen Design and soc implementation of a low cost smart home energy management system Trung Kien Nguyen To cite this version: Trung Kien Nguyen. Design and soc implementation of a low cost smart home energy management system. Other. Université Nice Sophia Antipolis, 2015. English. NNT : 2015NICE4127. tel- 01674170 HAL Id: tel-01674170 https://tel.archives-ouvertes.fr/tel-01674170 Submitted on 2 Jan 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. UNIVERSITÉ NICE SOPHIA ANTIPOLIS POLYTECH’NICE-SOPHIA École Doctorale des Sciences et Technologies de l’Information et de la Communication Electronique pour Objets Connectés THESE Pour obtenir le titre de Docteur en Sciences spécialité Electronique de l’Université Nice Sophia Antipolis présentée et soutenue par Kien Trung NGUYEN Conception et réalisation d’un système de gestion intelligente de la consommation électrique domestique Thèse dirigée par Gilles JACQUEMOD Soutenance prévue le 11 Décembre 2015 Jury : P. GARDA Rapporteur Professeur, Université Pierre et Marie Curie Paris S. WEBER Rapporteur Professeur, Université de Lorraine Nancy G. JACQUEMOD Directeur Professeur, UNS Sophia Antipolis E. DEKNEUVEL Co-Encadrant Maître de Conférences, UNS Sophia Antipolis L. HEBRARD Examinateur Professeur, Université de Strasbourg B. NICOLLE Examinateur Ingénieur, Qualisteo Nice O. PARSON Examinateur Associate Professor, University of Southampton Table of Contents Chapter 1. INTRODUCTION ................................................................................................... 1 1.1. Motivation ....................................................................................................................... 3 1.2. NIALM Technology and Applications ............................................................................ 5 1.2.1 Introduction ............................................................................................................... 5 1.2.2. State of the art ........................................................................................................... 6 1.2.3. Applications of NIALM ........................................................................................... 9 1.3. Electrical Signatures ...................................................................................................... 10 1.3.1. Parallel RLC model ................................................................................................ 11 1.3.2. Steady-state signatures ........................................................................................... 12 1.3.3. Transition-state signatures ...................................................................................... 14 1.4. The trend of NIALM technology .................................................................................. 15 1.5. Thesis contributions ...................................................................................................... 17 1.5.1. Context ................................................................................................................... 17 1.5.2. A real-time innovative NIALM proposal ............................................................... 17 1.5.3. A HW SW co-development methodology for rapid prototype .............................. 18 1.6. Thesis Organization ....................................................................................................... 19 Chapter 2. SYSTEM MODELING FOR EMBEDDED SYSTEM ......................................... 21 2.1. SoC, SoPC and FPGA ................................................................................................... 23 2.2. System development of SoC ......................................................................................... 25 2.2.1. Algorithm optimization .......................................................................................... 25 2.2.2. SoC design flow ..................................................................................................... 27 2.2.3. SoC development approaches ................................................................................ 29 2.3. Model of Computation .................................................................................................. 35 2.3.1. Finite State Machine ............................................................................................... 36 2.3.2. StateChart ............................................................................................................... 38 2.3.3. Dataflow modeling ................................................................................................. 40 2.3.4. Kahn Process Network ........................................................................................... 41 2.3.5. Synchronous Data Flow ......................................................................................... 42 2.3.6. Structured Data Flow .............................................................................................. 44 2.3.7. Reactive Process Network ...................................................................................... 45 2.4. Languages and development tools................................................................................. 47 2.4.1. System design tools ................................................................................................ 47 2.4.2. Model-based design tools ....................................................................................... 48 2.4.3. Architecture design tools ........................................................................................ 51 2.4.4. RTL design tools .................................................................................................... 54 2.5. HW SW codevelopment approach for rapid prototyping .............................................. 56 2.5.1. Modeling executable specification of RPN system ................................................ 59 2.5.2. Architecture exploration ......................................................................................... 64 2.5.3. Hardware Software co-development ...................................................................... 66 2.6. Conclusion ..................................................................................................................... 67 Chapter 3. APPLICATION MODEL FOR A REAL-TIME NIALM SYSTEM ..................... 69 3.1. Activity model of system in dataflow ........................................................................... 71 3.1.1. System requirements .............................................................................................. 71 3.1.2. Entity analysis and modeling ................................................................................. 72 3.1.3. Activity model of system ....................................................................................... 73 3.2. Electrical signatures extraction: Event-based approach ................................................ 76 3.2.1. Power signatures ..................................................................................................... 78 3.2.2. Shape of transitions signatures ............................................................................... 85 3.2.3. Harmonic signatures ............................................................................................... 86 3.2.4. Early application classification .............................................................................. 89 3.3. CUSUM - An online Event Detection ........................................................................... 91 3.4. Genetic Algorithm-based power Disaggregation .......................................................... 96 3.4.1 Sequential clustering K-mean ................................................................................. 98 3.4.2. Genetic Algorithm .................................................................................................. 99 3.5. Conclusion ................................................................................................................... 101 Chapter 4. SOC IMPLEMENTATION OF NIALM SYSTEM ............................................. 103 4.1. The Zynq-7000 platform ............................................................................................. 105 4.2. Executable specification .............................................................................................. 107 4.2.1. Modeling Virtual Appliances ............................................................................... 107 4.2.2. Modeling the NIALM process ............................................................................. 108 4.2.3. Modeling Control Logic ....................................................................................... 110 4.2.4 Disaggregation functional validation .................................................................... 111 4.3. FPGA development approaches .................................................................................. 113 4.4.
Recommended publications
  • Lecture 7: Synchronous Sequential Logic
    Systems I: Computer Organization and Architecture Lecture 7: Synchronous Sequential Logic Synchronous Sequential Logic • The digital circuits that we have looked at so far are combinational, depending only on the inputs. In practice, most systems have many components that contain memory elements, which require sequential logic. • A sequential circuit contains both a combinational circuit and memory elements, whose output also serves as an input for the combinational circuit. • The binary data stored in the memory elements define the state of the sequential circuit and help determine the conditions for changing the state in the memory elements. Inputs Combinational Outputs circuit Memory elements Sequential Circuits: Synchronous and Asynchronous • There are two types of sequential circuits, classified by their signal’s timing: – Synchronous sequential circuits have behavior that is defined from knowledge of its signal at discrete instants of time. – The behavior of asynchronous sequential circuits depends on the order in which input signals change and are affected at any instant of time. Synchronous Sequential Circuits • Synchronous sequential circuits need to use signal that affect memory elements at discrete time instants. • Synchronization is achieved by a timing device called a master-clock generator which generates a periodic train of clock pulses. Basic Flip-Flop Circuit Using NOR Gates 1 R(reset) Q 0 Set state Clear state 1 Q’ 0 S(set) S R Q Q’ 1 0 1 0 0 0 1 0 (after S = 1, R = 0) 0 1 0 1 0 0 0 1 (after S = 0, R = 1) 1 1 0 0 Basic
    [Show full text]
  • ASIC Implemented Microblaze-Based Coprocessor for Data Stream
    ASIC-IMPLEMENTED MICROBLAZE-BASED COPROCESSOR FOR DATA STREAM MANAGEMENT SYSTEMS A Thesis Submitted to the Faculty of Purdue University by Linknath Surya Balasubramanian In Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical and Computer Engineering May 2020 Purdue University Indianapolis, Indiana ii THE PURDUE UNIVERSITY GRADUATE SCHOOL STATEMENT OF THESIS APPROVAL Dr. John J. Lee, Chair Department of Electrical and Computer Engineering Dr. Lauren A. Christopher Department of Electrical and Computer Engineering Dr. Maher E. Rizkalla Department of Electrical and Computer Engineering Approved by: Dr. Brian King Head of Graduate Program iii ACKNOWLEDGMENTS I would first like to express my gratitude to my advisor Dr. John J. Lee and my thesis committee members Dr. Lauren A. Christopher and Dr. Maher E. Rizkalla for their patience, guidance, and support during this journey. I would also like to thank Mrs. Sherrie Tucker for her patience, help, and encouragement. Lastly, I must thank Dr. Pranav Vaidya and Mr. Tareq S. Alqaisi for all their support, technical guidance, and advice. Thank you all for taking time and helping me complete this study. iv TABLE OF CONTENTS Page LIST OF TABLES :::::::::::::::::::::::::::::::::: vi LIST OF FIGURES ::::::::::::::::::::::::::::::::: vii ABSTRACT ::::::::::::::::::::::::::::::::::::: ix 1 INTRODUCTION :::::::::::::::::::::::::::::::: 1 1.1 Previous Work ::::::::::::::::::::::::::::::: 1 1.2 Motivation :::::::::::::::::::::::::::::::::: 2 1.3 Thesis Outline ::::::::::::::::::::::::::::::::
    [Show full text]
  • Synthesis and Verification of Digital Circuits Using Functional Simulation and Boolean Satisfiability
    Synthesis and Verification of Digital Circuits using Functional Simulation and Boolean Satisfiability by Stephen M. Plaza A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2008 Doctoral Committee: Associate Professor Igor L. Markov, Co-Chair Assistant Professor Valeria M. Bertacco, Co-Chair Professor John P. Hayes Professor Karem A. Sakallah Associate Professor Dennis M. Sylvester Stephen M. Plaza 2008 c All Rights Reserved To my family, friends, and country ii ACKNOWLEDGEMENTS I would like to thank my advisers, Professor Igor Markov and Professor Valeria Bertacco, for inspiring me to consider various fields of research and providing feedback on my projects and papers. I also want to thank my defense committee for their comments and in- sights: Professor John Hayes, Professor Karem Sakallah, and Professor Dennis Sylvester. I would like to thank Professor David Kieras for enhancing my knowledge and apprecia- tion for computer programming and providing invaluable advice. Over the years, I have been fortunate to know and work with several wonderful stu- dents. I have collaborated extensively with Kai-hui Chang and Smita Krishnaswamy and have enjoyed numerous research discussions with them and have benefited from their in- sights. I would like to thank Ian Kountanis and Zaher Andraus for our many fun discus- sions on parallel SAT. I also appreciate the time spent collaborating with Kypros Constan- tinides and Jason Blome. Although I have not formally collaborated with Ilya Wagner, I have enjoyed numerous discussions with him during my doctoral studies. I also thank my office mates Jarrod Roy, Jin Hu, and Hector Garcia.
    [Show full text]
  • A Logic Synthesis Toolbox for Reducing the Multiplicative Complexity in Logic Networks
    A Logic Synthesis Toolbox for Reducing the Multiplicative Complexity in Logic Networks Eleonora Testa∗, Mathias Soekeny, Heinz Riener∗, Luca Amaruz and Giovanni De Micheli∗ ∗Integrated Systems Laboratory, EPFL, Lausanne, Switzerland yMicrosoft, Switzerland zSynopsys Inc., Design Group, Sunnyvale, California, USA Abstract—Logic synthesis is a fundamental step in the real- correlates to the resistance of the function against algebraic ization of modern integrated circuits. It has traditionally been attacks [10], while the multiplicative complexity of a logic employed for the optimization of CMOS-based designs, as well network implementing that function only provides an upper as for emerging technologies and quantum computing. Recently, bound. Consequently, minimizing the multiplicative complexity it found application in minimizing the number of AND gates in of a network is important to assess the real multiplicative cryptography benchmarks represented as xor-and graphs (XAGs). complexity of the function, and therefore its vulnerability. The number of AND gates in an XAG, which is called the logic net- work’s multiplicative complexity, plays a critical role in various Second, the number of AND gates plays an important role cryptography and security protocols such as fully homomorphic in high-level cryptography protocols such as zero-knowledge encryption (FHE) and secure multi-party computation (MPC). protocols, fully homomorphic encryption (FHE), and secure Further, the number of AND gates is also important to assess multi-party computation (MPC) [11], [12], [6]. For example, the the degree of vulnerability of a Boolean function, and influences size of the signature in post-quantum zero-knowledge signatures the cost of techniques to protect against side-channel attacks.
    [Show full text]
  • Logic Optimization and Synthesis: Trends and Directions in Industry
    Logic Optimization and Synthesis: Trends and Directions in Industry Luca Amaru´∗, Patrick Vuillod†, Jiong Luo∗, Janet Olson∗ ∗ Synopsys Inc., Design Group, Sunnyvale, California, USA † Synopsys Inc., Design Group, Grenoble, France Abstract—Logic synthesis is a key design step which optimizes of specific logic styles and cell layouts. Embedding as much abstract circuit representations and links them to technology. technology information as possible early in the logic optimiza- With CMOS technology moving into the deep nanometer regime, tion engine is key to make advantageous logic restructuring logic synthesis needs to be aware of physical informations early in the flow. With the rise of enhanced functionality nanodevices, opportunities carry over at the end of the design flow. research on technology needs the help of logic synthesis to capture In this paper, we examine the synergy between logic synthe- advantageous design opportunities. This paper deals with the syn- sis and technology, from an industrial perspective. We present ergy between logic synthesis and technology, from an industrial technology aware synthesis methods incorporating advanced perspective. First, we present new synthesis techniques which physical information at the core optimization engine. Internal embed detailed physical informations at the core optimization engine. Experiments show improved Quality of Results (QoR) and results evidence faster timing closure and better correlation better correlation between RTL synthesis and physical implemen- between RTL synthesis and physical implementation. We elab- tation. Second, we discuss the application of these new synthesis orate on synthesis aware technology development, where logic techniques in the early assessment of emerging nanodevices with synthesis enables a fair system-level assessment on emerging enhanced functionality.
    [Show full text]
  • Designing a RISC CPU in Reversible Logic
    Designing a RISC CPU in Reversible Logic Robert Wille Mathias Soeken Daniel Große Eleonora Schonborn¨ Rolf Drechsler Institute of Computer Science, University of Bremen, 28359 Bremen, Germany frwille,msoeken,grosse,eleonora,[email protected] Abstract—Driven by its promising applications, reversible logic In this paper, the recent progress in the field of reversible cir- received significant attention. As a result, an impressive progress cuit design is employed in order to design a complex system, has been made in the development of synthesis approaches, i.e. a RISC CPU composed of reversible gates. Starting from implementation of sequential elements, and hardware description languages. In this paper, these recent achievements are employed a textual specification, first the core components of the CPU in order to design a RISC CPU in reversible logic that can are identified. Previously introduced approaches are applied execute software programs written in an assembler language. The next to realize the respective combinational and sequential respective combinational and sequential components are designed elements. More precisely, the combinational components are using state-of-the-art design techniques. designed using the reversible hardware description language SyReC [17], whereas for the realization of the sequential I. INTRODUCTION elements an external controller (as suggested in [16]) is utilized. With increasing miniaturization of integrated circuits, the Plugging the respective components together, a CPU design reduction of power dissipation has become a crucial issue in results which can process software programs written in an today’s hardware design process. While due to high integration assembler language. This is demonstrated in a case study, density and new fabrication processes, energy loss has sig- where the execution of a program determining Fibonacci nificantly been reduced over the last decades, physical limits numbers is simulated.
    [Show full text]
  • Logic Synthesis Meets Machine Learning: Trading Exactness for Generalization
    Logic Synthesis Meets Machine Learning: Trading Exactness for Generalization Shubham Raif,6,y, Walter Lau Neton,10,y, Yukio Miyasakao,1, Xinpei Zhanga,1, Mingfei Yua,1, Qingyang Yia,1, Masahiro Fujitaa,1, Guilherme B. Manskeb,2, Matheus F. Pontesb,2, Leomar S. da Rosa Juniorb,2, Marilton S. de Aguiarb,2, Paulo F. Butzene,2, Po-Chun Chienc,3, Yu-Shan Huangc,3, Hoa-Ren Wangc,3, Jie-Hong R. Jiangc,3, Jiaqi Gud,4, Zheng Zhaod,4, Zixuan Jiangd,4, David Z. Pand,4, Brunno A. de Abreue,5,9, Isac de Souza Camposm,5,9, Augusto Berndtm,5,9, Cristina Meinhardtm,5,9, Jonata T. Carvalhom,5,9, Mateus Grellertm,5,9, Sergio Bampie,5, Aditya Lohanaf,6, Akash Kumarf,6, Wei Zengj,7, Azadeh Davoodij,7, Rasit O. Topalogluk,7, Yuan Zhoul,8, Jordan Dotzell,8, Yichi Zhangl,8, Hanyu Wangl,8, Zhiru Zhangl,8, Valerio Tenacen,10, Pierre-Emmanuel Gaillardonn,10, Alan Mishchenkoo,y, and Satrajit Chatterjeep,y aUniversity of Tokyo, Japan, bUniversidade Federal de Pelotas, Brazil, cNational Taiwan University, Taiwan, dUniversity of Texas at Austin, USA, eUniversidade Federal do Rio Grande do Sul, Brazil, fTechnische Universitaet Dresden, Germany, jUniversity of Wisconsin–Madison, USA, kIBM, USA, lCornell University, USA, mUniversidade Federal de Santa Catarina, Brazil, nUniversity of Utah, USA, oUC Berkeley, USA, pGoogle AI, USA The alphabetic characters in the superscript represent the affiliations while the digits represent the team numbers yEqual contribution. Email: [email protected], [email protected], [email protected], [email protected] Abstract—Logic synthesis is a fundamental step in hard- artificial intelligence.
    [Show full text]
  • A Soft Processor Microblaze-Based Embedded System for Cardiac Monitoring
    (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 4, No. 9, 2013 A Soft Processor MicroBlaze-Based Embedded System for Cardiac Monitoring El Hassan El Mimouni Mohammed Karim University Sidi Mohammed Ben Abdellah University Sidi Mohammed Ben Abdellah Fès, Morocco Fès, Morocco Abstract—this paper aims to contribute to the efforts of recent years, and has become also an important way to assert design community to demonstrate the effectiveness of the state of the heart’s condition [5 - 9]. the art Field Programmable Gate Array (FPGA), in the embedded systems development, taking a case study in the II. SYSTEM OVERVIEW biomedical field. With this design approach, we have developed a We have designed and implemented a prototype of basic System on Chip (SoC) for cardiac monitoring based on the soft embedded system for cardiac monitoring, whose functional processor MicroBlaze and the Xilkernel Real Time Operating block diagram is shown in the figure 2; it exhibits a modular System (RTOS), both from Xilinx. The system permits the structure that facilitates the development and debugging. Thus, acquisition and the digitizing of the Electrocardiogram (ECG) analog signal, displaying heart rate on seven segments module it includes 2 main modules: and ECG on Video Graphics Adapter (VGA) screen, tracing the An analog module intended for acquiring and heart rate variability (HRV) tachogram, and communication conditioning of the analog ECG signal to make it with a Personal Computer (PC) via the serial port. We have used appropriate for use by the second digital module ; the MIT_BIH Database records to test and evaluate our implementation performance.
    [Show full text]
  • Performance Evaluation of FPGA Based Embedded ARM Processor
    10.1109/ULTSYM.2013.0135 Performance Evaluation of FPGA based Embedded ARM Processor for Ultrasonic Imaging Spenser Gilliland, Pramod Govindan, Thomas Gonnot and Jafar Saniie Department of Electrical and Computer Engineering Illinois Institute of Technology, Chicago IL, U.S.A. Abstract- This study evaluates the performance of an FPGA based embedded ARM processor system to implement signal processing for ultrasonic imaging and nondestructive testing applications. FPGA based embedded processors possess many advantages including a reduced overall development time, increased performance, and the ability to perform hardware- software (HW/SW) co-design. This study examines the execution performance of split spectrum processing, chirplet signal decomposition, Wigner-Ville distributions and short time Fourier transform implementations, on two embedded processing platforms: a Xilinx Virtex-5 FPGA with embedded MicroBlaze processor and a Xilinx Zynq FPGA with embedded ARM processor. Overall, the Xilinx Zynq FPGA significantly outperforms the Virtex-5 based system in software applications I. INTRODUCTION Figure 1. RUSH SoC setup Generally, ultrasonic imaging applications use personal computers or hand held devices. As these devices are not frequency diverse flaw detection [1], parametric echo specifically designed for efficiently executing estimation [2], and joint time-frequency distribution [3]) on computationally intensive ultrasonic signal processing the RUSH platform using a Xilinx Virtex-5 FPGA with an algorithms, the performance of these applications can be embedded soft-core MicroBlaze processor [4,5] and a improved by executing the algorithms using a dedicated Xilinx Zynq 7020 FPGA with an embedded ARM processor embedded system-on-chip (SoC) hardware. However, [6,7] as shown in Figure 2. porting these applications onto an embedded system requires deep knowledge of the processor architecture and RUSH embedded software development tools.
    [Show full text]
  • Performance Evaluation of a Signal Processing Algorithm with General-Purpose Computing on a Graphics Processing Unit
    DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2019 Performance Evaluation of a Signal Processing Algorithm with General-Purpose Computing on a Graphics Processing Unit FILIP APPELGREN MÅNS EKELUND KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE Performance Evaluation of a Signal Processing Algorithm with General-Purpose Computing on a Graphics Processing Unit Filip Appelgren and M˚ansEkelund June 5, 2019 2 Abstract Graphics Processing Units (GPU) are increasingly being used for general-purpose programming, instead of their traditional graphical tasks. This is because of their raw computational power, which in some cases give them an advantage over the traditionally used Central Processing Unit (CPU). This thesis therefore sets out to identify the performance of a GPU in a correlation algorithm, and what parameters have the greatest effect on GPU performance. The method used for determining performance was quantitative, utilizing a clock library in C++ to measure performance of the algorithm as problem size increased. Initial problem size was set to 28 and increased exponentially to 221. The results show that smaller sample sizes perform better on the serial CPU implementation but that the parallel GPU implementations start outperforming the CPU between problem sizes of 29 and 210. It became apparent that GPU's benefit from larger problem sizes, mainly because of the memory overhead costs involved with allocating and transferring data. Further, the algorithm that is under evaluation is not suited for a parallelized implementation due to a high amount of branching. Logic can lead to warp divergence, which can drastically lower performance.
    [Show full text]
  • Evaluation of Synthesizable CPU Cores
    Evaluation of synthesizable CPU cores DANIEL MATTSSON MARCUS CHRISTENSSON Maste r ' s Thesis Com p u t e r Science an d Eng i n ee r i n g Pro g r a m CHALMERS UNIVERSITY OF TECHNOLOGY Depart men t of Computer Engineering Gothe n bu r g 20 0 4 All rights reserved. This publication is protected by law in accordance with “Lagen om Upphovsrätt, 1960:729”. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of the authors. Daniel Mattsson and Marcus Christensson, Gothenburg 2004. Evaluation of synthesizable CPU cores Abstract The three synthesizable processors: LEON2 from Gaisler Research, MicroBlaze from Xilinx, and OpenRISC 1200 from OpenCores are evaluated and discussed. Performance in terms of benchmark results and area resource usage is measured. Different aspects like usability and configurability are also reviewed. Three configurations for each of the processors are defined and evaluated: the comparable configuration, the performance optimized configuration and the area optimized configuration. For each of the configurations three benchmarks are executed: the Dhrystone 2.1 benchmark, the Stanford benchmark suite and a typical control application run as a benchmark. A detailed analysis of the three processors and their development tools is presented. The three benchmarks are described and motivated. Conclusions and results in terms of benchmark results, performance per clock cycle and performance per area unit are discussed and presented. Sammanfattning De tre syntetiserbara processorerna: LEON2 från Gaisler Research, MicroBlaze från Xilinx och OpenRISC 1200 från OpenCores utvärderas och diskuteras.
    [Show full text]
  • Lecture 34: Bonus Topics
    Lecture 34: Bonus topics Philipp Koehn, David Hovemeyer December 7, 2020 601.229 Computer Systems Fundamentals Outline I GPU programming I Virtualization and containers I Digital circuits I Compilers Code examples on web page: bonus.zip GPU programming 3D graphics Rendering 3D graphics requires significant computation: I Geometry: determine visible surfaces based on geometry of 3D shapes and position of camera I Rasterization: determine pixel colors based on surface, texture, lighting A GPU is a specialized processor for doing these computations fast GPU computation: use the GPU for general-purpose computation Streaming multiprocessor I Fetches instruction (I-Cache) I Has to apply it over a vector of data I Each vector element is processed in one thread (MT Issue) I Thread is handled by scalar processor (SP) I Special function units (SFU) Flynn’s taxonomy I SISD (single instruction, single data) I uni-processors (most CPUs until 1990s) I MIMD (multi instruction, multiple data) I all modern CPUs I multiple cores on a chip I each core runs instructions that operate on their own data I SIMD (single instruction, multiple data) I Streaming Multi-Processors (e.g., GPUs) I multiple cores on a chip I same instruction executed on different data GPU architecture GPU programming I If you have an application where I Data is regular (e.g., arrays) I Computation is regular (e.g., same computation is performed on many array elements) then doing the computation on the GPU is likely to be much faster than doing the computation on the CPU I Issues: I GPU
    [Show full text]