Design, Construction, and Use of a Single Board Computer Beowulf Cluster: Application of the Small- Footprint, Low-Cost, Insignal 5420 Octa Board
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Shorten Device Boot Time for Automotive IVI and Navigation Systems
Shorten Device Boot Time for Automotive IVI and Navigation Systems Jim Huang ( 黃敬群 ) <[email protected]> Dr. Shi-wu Lo <[email protected]> May 28, 2013 / Automotive Linux Summit (Spring) Rights to copy © Copyright 2013 0xlab http://0xlab.org/ [email protected] Attribution – ShareAlike 3.0 Corrections, suggestions, contributions and translations You are free are welcome! to copy, distribute, display, and perform the work to make derivative works Latest update: May 28, 2013 to make commercial use of the work Under the following conditions Attribution. You must give the original author credit. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. License text: http://creativecommons.org/licenses/by-sa/3.0/legalcode Goal of This Presentation • Propose a practical approach of the mixture of ARM hibernation (suspend to disk) and Linux user-space checkpointing – to shorten device boot time • An intrusive technique for Android/Linux – minimal init script and root file system changes are required • Boot time is one of the key factors for Automotive IVI – mentioned by “Linux Powered Clusters” and “Silver Bullet of Virtualization (Pitfalls, Challenges and Concerns) Continued” at ALS 2013 – highlighted by “Boot Time Optimizations” at ALS 2012 About this presentation • joint development efforts of the following entities – 0xlab team - http://0xlab.org/ – OSLab, National Chung Cheng University of Taiwan, led by Dr. -
Enabling the Use of Low Power Mobile and Embedded Technologies For
Enabling the Use of Embedded and Mobile Technologies for High-Performance Computing Author: Advisor: Nikola Rajovic´ Alex Ramirez A THESIS SUBMITTED IN FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor per la Universitat Polit`ecnicade Catalunya Departament d’Arquitectura de Computadors Barcelona, Spring 2017 To my family . Моjоj породици . Acknowledgements During my PhD studies I received help and support from many people - it would be strange if I did not. Thus I would like to thank professors Mateo Valero and Veljko Milutinovi´cfor recognizing my interest in HPC and introducing me to my advisor, professor Alex Ramirez. I would like to thank him for trusting in me and giving me an opportunity to be a pioneer of HPC on mobile ARM platforms, for guiding and shaping my work last five years, for helping me understand what real priorities are, and for being pushy when it was really needed. In addition, thank you Carlos and Puzo for helping me to have a smooth start of my PhD and for filling the gap when Alex was too busy. Last two years of my PhD I have been working with Alex remotely, and I would like to thank the local crew who helped me: professor Eduard Ayguade for providing me with some very hard-to-get manuscripts and accepting to be my "ponente" at the University; professor Jesus Labarta for thorough dis- cussions about parallel performance issues and know-how lessons on demand; Alex Rico for helping me deliver work for the Mont-Blanc project and for friendly advices every so little; Filippo Mantovani for making sure I could use the prototypes without outages and for filtering-out internal bureaucracy issues. -
Connecting Peripheral Devices to a Pandaboard Using
11/16/2012 MARK CONNECTING PERIPHERAL DEVICES TO A BIRDSALL PANDABOARD USING PSI This is an application note that will help somebody use the Serial Programming Interface that is available on the OMAP-based PandaBoard’s expansion connector and also explains how use the SPI to connect with a real-time clock (RTC) chip. Ever since the PandaBoard came out, there has been a community of eager programmers constructing creative projects and asking questions about where else and what more they could do to extend the PandaBoard’s abilities. This application note will document a way to connect devices to a OMAP-based devise like a PandaBoard What is the Serial Programming Interface “Serial Programming Interface” (SPI) is a simple standard that was developed my Motorola. SPI can also be called “4-wire” interface (as opposed to 1, 2 or 3-wire serial buses) and it is sometimes referred to like that because the interface has four wires defined. The first is Master- Out-Slave-In (MOSI) and the second is the Master-In-Slave-Out (MISO). There is also a Serial Clock from the Master (SCLK) and a Chipselect Signal (CS#) which can allow for more than one slave devise to be able to connect with one master. Why do we want to use the SPI? There are various ways to connect a peripheral device to the PandaBoard like USB, SPI, etc... and SPI has some advantages. Using the Serial Programming Interface costs less in terms of power usage and it is easy to connect different devises to the PandaBoard and also to debug any problems that occur all while maintaining an acceptable performance rate. -
Deliverable D4.1
Deliverable D4.1 – State of the art on performance and power estimation of embedded and high-performance cores Anastasiia Butko, Abdoulaye Gamatié, Gilles Sassatelli, Stefano Bernabovi, Michael Chapman, Philippe Naudin To cite this version: Anastasiia Butko, Abdoulaye Gamatié, Gilles Sassatelli, Stefano Bernabovi, Michael Chapman, et al.. Deliverable D4.1 – State of the art on performance and power estimation of embedded and high- performance cores. [Research Report] LIRMM (UM, CNRS); Cortus S.A.S. 2016. lirmm-03168326 HAL Id: lirmm-03168326 https://hal-lirmm.ccsd.cnrs.fr/lirmm-03168326 Submitted on 12 Mar 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Project Ref. Number ANR-15-CE25-0007 D4.1 – State of the art on performance and power estimation of embedded and high-performance cores Version 2.0 (2016) Final Public Distribution Main contributors: A. Butko, A. Gamatié, G. Sassatelli (LIRMM); S. Bernabovi, M. Chapman and P. Naudin (Cortus) Project Partners: Cortus S.A.S, Inria, LIRMM Every effort has been made to ensure that all statements and information contained herein are accurate, however the Continuum Project Partners accept no liability for any error or omission in the same. -
A 1024-Core 70GFLOPS/W Floating Point Manycore Microprocessor
A 1024-core 70GFLOPS/W Floating Point Manycore Microprocessor Andreas Olofsson, Roman Trogan, Oleg Raikhman Adapteva, Lexington, MA The Past, Present, & Future of Computing SIMD MIMD PE PE PE PE MINI MINI MINI CPU CPU CPU PE PE PE PE MINI MINI MINI CPU CPU CPU PE PE PE PE MINI MINI MINI CPU CPU CPU MINI MINI MINI BIG BIG CPU CPU CPU CPU CPU BIG BIG BIG BIG CPU CPU CPU CPU PAST PRESENT FUTURE 2 Adapteva’s Manycore Architecture C/C++ Programmable Incredibly Scalable 70 GFLOPS/W 3 Routing Architecture 4 E64G400 Specifications (Jan-2012) • 64-Core Microprocessor • 100 GFLOPS performance • 800 MHz Operation • 8GB/sec IO bandwidth • 1.6 TB/sec on chip memory BW • 0.8 TB/sec network on chip BW • 64 Billion Messages/sec IO Pads Core • 2 Watt total chip power • 2MB on chip memory Link Logic • 10 mm2 total silicon area • 324 ball 15x15mm plastic BGA 5 Lab Measurements 80 Energy Efficiency 70 60 50 GFLOPS/W 40 30 20 10 0 0 200 400 600 800 1000 1200 MHz ENERGY EFFICIENCY ENERGY EFFICIENCY (28nm) 6 Epiphany Performance Scaling 16,384 G 4,096 F 1,024 L 256 O 64 4096 P 1024 S 16 256 64 4 16 1 # Cores On‐demand scaling from 0.25W to 64 Watt 7 Hold on...the title said 1024 cores! • We can build it any time! • Waiting for customer • LEGO approach to design • No global timinga paths • Guaranteed by design • Generate any array in 1 day • ~130 mm2 silicon area 1024 Cores 1Core 8 What about 64-bit Floating Point? Single Precision Double Precision 2 FLOPS/CYCLE 2 FLOPS/CYCLE 64KB SRAM 64KB SRAM 0.215mm^2 0.237mm^2 700MHz 600MHz 9 Epiphany Latency Specifications -
Universidad De Guayaquil Facultad De Ciencias Matematicas Y Fisicas Carrera De Ingeniería En Networking Y Telecomunicaciones D
UNIVERSIDAD DE GUAYAQUIL FACULTAD DE CIENCIAS MATEMATICAS Y FISICAS CARRERA DE INGENIERÍA EN NETWORKING Y TELECOMUNICACIONES DISEÑO DE UN SISTEMA INTELIGENTE DE MONITOREO Y CONTROL EN TIEMPO REAL PARA TANQUES DE ALMACENAMIENTO DE GASOLINA UTILIZANDO TECNOLOGÍA DE HARDWARE Y SOFTWARE LIBRE PARA PEQUEÑAS Y MEDIANAS EMPRESAS PROYECTO DE TITULACIÓN Previa a la obtención del Título de: INGENIERO EN NETWORKING Y TELECOMUNICACIONES AUTORES: Chamorro Salazar Hamilton Gabriel TUTOR: Ing. Jacobo Antonio Ramírez Urbina GUAYAQUIL – ECUADOR 2019 I REPOSITORIO NACIONAL EN CIENCIAS Y TECNOLOGIA FICHA DE REGISTRO DE TESIS TITULO: Diseño de un sistema inteligente de monitoreo y control en tiempo real para tanques de almacenamiento de gasolina utilizando tecnología de hardware y software libre para pequeñas y medianas empresas. REVISORES: Ing. Luis Espin Pazmiño, M.Sc Ing. Harry Luna Aveiga, M.Sc INSTITUCIÓN: Universidad de FACULTAD: Ciencias Matemáticas y Guayaquil Físicas. CARRERA: Ingeniería en Networking y Telecomunicaciones FECHA DE PUBLICACIÓN: N° DE PAGS: 143 AREA TEMÁTICA: Tecnología de la Información y Telecomunicaciones PALABRA CLAVES: Telemetría, microcomputadores, sensores, actuadores, sistema digital, tanques de almacenamiento, control en tiempo real RESUMEN: Este proyecto tiene como finalidad investigar y diseñar un sistema de monitoreo y control en tiempo real para tanques de almacenamiento de gasolina utilizando tecnología de hardware y software libre para las pequeñas y medianas empresas. El diseño permite una gestión eficiente de los -
Comparison of 116 Open Spec, Hacker Friendly Single Board Computers -- June 2018
Comparison of 116 Open Spec, Hacker Friendly Single Board Computers -- June 2018 Click on the product names to get more product information. In most cases these links go to LinuxGizmos.com articles with detailed product descriptions plus market analysis. HDMI or DP- USB Product Price ($) Vendor Processor Cores 3D GPU MCU RAM Storage LAN Wireless out ports Expansion OSes 86Duino Zero / Zero Plus 39, 54 DMP Vortex86EX 1x x86 @ 300MHz no no2 128MB no3 Fast no4 no5 1 headers Linux Opt. 4GB eMMC; A20-OLinuXino-Lime2 53 or 65 Olimex Allwinner A20 2x A7 @ 1GHz Mali-400 no 1GB Fast no yes 3 other Linux, Android SATA A20-OLinuXino-Micro 65 or 77 Olimex Allwinner A20 2x A7 @ 1GHz Mali-400 no 1GB opt. 4GB NAND Fast no yes 3 other Linux, Android Debian Linux A33-OLinuXino 42 or 52 Olimex Allwinner A33 4x A7 @ 1.2GHz Mali-400 no 1GB opt. 4GB NAND no no no 1 dual 40-pin 3.4.39, Android 4.4 4GB (opt. 16GB A64-OLinuXino 47 to 88 Olimex Allwinner A64 4x A53 @ 1.2GHz Mali-400 MP2 no 1GB GbE WiFi, BT yes 1 40-pin custom Linux eMMC) Banana Pi BPI-M2 Berry 36 SinoVoip Allwinner V40 4x A7 Mali-400 MP2 no 1GB SATA GbE WiFi, BT yes 4 Pi 40 Linux, Android 8GB eMMC (opt. up Banana Pi BPI-M2 Magic 21 SinoVoip Allwinner A33 4x A7 Mali-400 MP2 no 512MB no Wifi, BT no 2 Pi 40 Linux, Android to 64GB) 8GB to 64GB eMMC; Banana Pi BPI-M2 Ultra 56 SinoVoip Allwinner R40 4x A7 Mali-400 MP2 no 2GB GbE WiFi, BT yes 4 Pi 40 Linux, Android SATA Banana Pi BPI-M2 Zero 21 SinoVoip Allwinner H2+ 4x A7 @ 1.2GHz Mali-400 MP2 no 512MB no no WiFi, BT yes 1 Pi 40 Linux, Android Banana -
Development Boards This Product Is Rohs Compliant
Development Boards This product is RoHS compliant. PANDABOARD DEVELOPMENT PLATFORM Features: • Core Logic: OMAP4460 applications Processor • Interface: (1) General Purpose Expansion Header • Wireless Connectivity: 802.11 b/g/n (WiLink™ 6.0) • Memory: 1GB DDR2 RAM (I2C, GPMC, USB, MMC, DSS, ETM) • Debug options: JTAG, UART/RS-232, 1 GPIO button NTL • Full Size SD/MMC card port • Camera Expansion Header • Graphics APIs: OpenGL ES v2.0, OpenGL ES v1.1, • 10/100 Ethernet • Display Connectors: HDMI v1.3, DVI-D. LCD Expansion OpenVGv1.1, and EGL v1.3 • USB: (1) USB 2.0 OTG port, (2) USB 2.0 High-speed port • Audio Connectors: 3.5" In/Out, HDMI audio out For quantities greater than listed, call for quote. MOUSER Pandaboard Price Description STOCK NO. Part No. Each 595-PANDABOARD UEVM4430G-01-00-00 Pandaboard ARM Cortex-A9 MPCore 1GHz OMAP4430 SoC Platform 179.00 595-PANDABOARD-ES UEVM4460G-02-01-00 Pandaboard ARM Cortex-A9 MPCore 1GHz OMAP4460 SoC Platform 185.00 Embedded Modules Embedded BEAGLEBOARD SOC PLATFORMS BeagleBoard.org develops low-cost, fan-less single-board computers based on low-power Texas Instruments processors featuring the ARM Cortex-A8 core with all of the expandability of today's desktop machines, but without the bulk, expense, or noise. BeagleBoard.org provides an open source development platform for A B the creation of high-performance embedded designs. Beagleboard C4 Features: Beagleboard xM Features: Beaglebone Features: • Over 1,200 Dhrystone MIPS using the superscalar • Over 2,000 Dhrystone MIPS using the Super-scalar -
DISI - University of Trento
PhD Dissertation International Doctorate School in Information and Communication Technologies DISI - University of Trento Cyber-Physical Systems: Two Case Studies in Design Methodologies Luca Rizzon Advisor: Prof. Roberto Passerone Universit`adegli Studi di Trento April 2016 Abstract To analyze embedded systems, engineers use tools that can simulate the performance of software components executed on hardware architectures. When the embedded system functionality is strongly correlated to physical quantities, as in the case of Cyber-Physical System (CPS), we need to model physical processes to determine the overall behavior of the system. Unfortunately, embedded systems simulators are not generally suitable to evaluate physical processes, and in the same way physical model simulators hardly capture the functionality of computing systems. In this work, we present a methodology to concurrently explore these aspects using the metroII design framework. The methodology provides guidelines for the implementation of these models in the design environment. To demonstrate the feasibility of the proposed approach, we applied the methodology to two case studies. A case study regards a binaural guidance system developed to be included into a smart rollator for older adults. The second case consists of an energy recovery device which gets energy from the heat dissipated by a high performance processor and power a smart sink able to provide cooling or to serve as a wireless sensing node. Keywords [Cyber-Physical Systems, Design Methodology, Binaural Synthesis, Energy Harvesting] Contents 1 Introduction 1 1.1 CPS Design Challenges . .1 1.2 Structure of the Thesis . .3 2 Design Methodology 5 2.1 State of the Art . .5 2.2 MetroII Design Framework . -
Introducing Slambench, a Performance and Accuracy Benchmarking Methodology for SLAM
Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM Luigi Nardi1, Bruno Bodin2, M. Zeeshan Zia1, John Mawer3, Andy Nisbet3, Paul H. J. Kelly1, Andrew J. Davison1, Mikel Lujan´ 3, Michael F. P. O’Boyle2, Graham Riley3, Nigel Topham2 and Steve Furber3 Kernels Architectures Correctness Performance Metrics Frame rate Computer bilateralFilter (..) halfSampleRobust (..) Vision renderVolume (..) integrate (..) : Energy : Accuracy Compiler/Runtime Hardware ICL-NUIM Dataset Fig. 1: The SLAMBench framework makes it possible for experts coming from the computer vision, compiler, run-time, and hardware communities to cooperate in a unified way to tackle algorithmic and implementation alternatives. Abstract— Real-time dense computer vision and SLAM offer While classical point feature-based simultaneous locali- great potential for a new level of scene modelling, tracking sation and mapping (SLAM) techniques are now crossing and real environmental interaction for many types of robot, into mainstream products via embedded implementation in but their high computational requirements mean that use on mass market embedded platforms is challenging. Mean- projects like Project Tango [3] and Dyson 360 Eye [1], while, trends in low-cost, low-power processing are towards dense SLAM algorithms with their high computational re- massive parallelism and heterogeneity, making it difficult for quirements are largely at the prototype stage on PC or robotics and vision researchers to implement their algorithms laptop platforms. Meanwhile, there has been a great focus in a performance-portable way. In this paper we introduce in computer vision on developing benchmarks for accuracy SLAMBench, a publicly-available software framework which represents a starting point for quantitative, comparable and comparison, but not on analysing and characterising the validatable experimental research to investigate trade-offs in envelopes for performance and energy consumption. -
Report on Tuned Linux-ARM Kernel and Delivery of Kernel Patches to the Linux Kernel Version 1.0
D5.4{ Report on Tuned Linux-ARM kernel and delivery of kernel patches to the Linux kernel Version 1.0 Document Information Contract Number 288777 Project Website www.montblanc-project.eu Contractual Deadline M18 Dissemintation Level PU Nature Other Coordinator Alex Ramirez (BSC) Contributors Roxana Rusitoru (ARM) Reviewers Isaac Gelado (BSC) Keywords Linux kernel, HPC, Transparent HugePage Notices: The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no 288777 c 2013 Mont-Blanc Consortium Partners. All rights reserved. D5.4 - Report on Tuned Linux-ARM kernel and delivery of kernel patches to the Linux kernel Version 1.0 Change Log Version Description of Change v0.1 Initial draft released to the WP5 contributors v0.2 Revised draft released to the WP5 contributors v1.0 Version to send to the WP leaders and the EC 2 D5.4 - Report on Tuned Linux-ARM kernel and delivery of kernel patches to the Linux kernel Version 1.0 Contents 1 Introduction5 2 Analysis of Transparent HugePages: pandaboard5 2.1 Introduction...................................5 2.2 Hardware setup.................................5 2.2.1 Limitations...............................5 2.3 Benchmarks...................................6 2.3.1 Setup..................................8 2.3.1.1 Compilation flags......................8 2.3.1.2 Processor affinity.......................8 2.3.1.3 ATLAS............................8 2.4 Sample HPC application - bigDFT......................8 2.5 Methodology..................................9 2.6 Results...................................... 12 2.6.1 Observations.............................. 15 2.7 Conclusions................................... 15 2.8 Further work.................................. 15 2.8.1 Kernel profiling............................. 15 2.8.2 In-depth analysis of results..................... -
Survey and Benchmarking of Machine Learning Accelerators
1 Survey and Benchmarking of Machine Learning Accelerators Albert Reuther, Peter Michaleas, Michael Jones, Vijay Gadepally, Siddharth Samsi, and Jeremy Kepner MIT Lincoln Laboratory Supercomputing Center Lexington, MA, USA freuther,pmichaleas,michael.jones,vijayg,sid,[email protected] Abstract—Advances in multicore processors and accelerators components play a major role in the success or failure of an have opened the flood gates to greater exploration and application AI system. of machine learning techniques to a variety of applications. These advances, along with breakdowns of several trends including Moore’s Law, have prompted an explosion of processors and accelerators that promise even greater computational and ma- chine learning capabilities. These processors and accelerators are coming in many forms, from CPUs and GPUs to ASICs, FPGAs, and dataflow accelerators. This paper surveys the current state of these processors and accelerators that have been publicly announced with performance and power consumption numbers. The performance and power values are plotted on a scatter graph and a number of dimensions and observations from the trends on this plot are discussed and analyzed. For instance, there are interesting trends in the plot regarding power consumption, numerical precision, and inference versus training. We then select and benchmark two commercially- available low size, weight, and power (SWaP) accelerators as these processors are the most interesting for embedded and Fig. 1. Canonical AI architecture consists of sensors, data conditioning, mobile machine learning inference applications that are most algorithms, modern computing, robust AI, human-machine teaming, and users (missions). Each step is critical in developing end-to-end AI applications and applicable to the DoD and other SWaP constrained users.