Diplomat: Mapping of Multi-Kernel Applications Using a Static Dataflow Abstraction

Total Page:16

File Type:pdf, Size:1020Kb

Diplomat: Mapping of Multi-Kernel Applications Using a Static Dataflow Abstraction Diplomat: Mapping of Multi-kernel Applications Using a Static Dataflow Abstraction Bruno Bodin Luigi Nardi & Paul H. J. Kelly Michael F. P. O’Boyle The University of Edinburgh, Imperial College London, The University of Edinburgh, Edinburgh, United Kingdom London, United Kingdom Edinburgh, United Kingdom Email: [email protected] Email: fl.nardi,[email protected] Email: [email protected] Abstract—In this paper we propose a novel approach to They provide an automatic mapping, yet these models are heterogeneous embedded systems programmability using a task- overly restrictive. As an example, many vision applications graph based framework called Diplomat. Diplomat is a task- include dynamic behavior, such as early exit conditions, graph framework that exploits the potential of static dataflow modeling and analysis to deliver performance estimation and which are not supported by those models. Tools such as CPU/GPU mapping. An application has to be specified once, and Qualcomm Symphony [3] (previously known as MARE) are then the framework can automatically propose good mappings. more expressive. Symphony lets a programmer represent an We evaluate Diplomat with a computer vision application on two application using a task-graph, a more general representation embedded platforms. Using the Diplomat generation we observed of the dataflow model where communications are no longer a 16% performance improvement on average and up to a 30% improvement over the best existing hand-coded implementation. static. However, Symphony tasks are dynamically defined and mapping decisions are performed manually. Ideally we would like a framework that is both flexible and able to provide high I. INTRODUCTION performance mappings automatically. Mobile and embedded systems on chip (SoC) integrate In this paper, we address the mapping problem of multi- multiple computing resources. They contain multiple CPU kernel applications. For this purpose we introduce the Diplo- cores, a GPU and in some cases also a DSP. Such complex mat framework, a task-graph model embedded in the Python architectures have the potential for good performances but are language and combined with off-line performance charac- hard to program. They have a mapping problem, i.e. how terization to compute relevant mappings. It mainly targets to associate computational tasks to hardware resources in streaming applications, which are common in computer vision. order to meet the application’s response time, throughput In Diplomat, an application is defined as a task-graph. In objectives and power budget. OpenCL [1] has emerged as contrast with static dataflow modeling, the tasks and com- a standard to program heterogeneous systems and allows munication channels are not necessarily known at compile- program portability, i.e. the ability of a single program to time. Diplomat automatically collects performance profiling be executed on different devices. The OpenCL programming information to evaluate tasks’ workload; then it uses a static interface allows a programmer to express a task under the form dataflow model to evaluate performance and explores different of a function (called kernel in OpenCL) and to execute it on mappings for a given platform. The library generates C++ compatible resources. However, developing applications using code, which can then be integrated into an embedded system’s such systems and achieving acceptable performance is a non- firmware. This code is derived automatically from the task- trivial task. Even if a programmer has perfect knowledge of graph, and uses a runtime library to implement the necessary the target architecture and a deep understanding of OpenCL, communication and buffering between tasks. We show that a performance portability remains a challenge. class of computer vision applications can be abstracted to a Meanwhile, the emergence of computationally intensive static dataflow representation via the Diplomat task-graph. We mobile vision applications makes the use of embedded het- observe up to 30% speed improvement over the best existing erogeneous platforms increasingly important. Mobile vision hand-written implementation. applications have performance constraints and need efficient The contributions of this paper are: mappings. They are excellent candidates to evaluate new • A task-graph model, suitable for dynamic streaming mapping techniques. applications, and which can be easily abstracted into a Prior work on mapping exists. While most work deals with Synchronous Dataflow (SDF) model. single-kernel applications, recent research focuses on multi- • A framework which implements this abstraction and kernel applications. These represent more realistic real-world performs workload profiling together with static analysis applications. Static dataflow frameworks such as StreamIt [2] to find a mapping that achieves best performance on provide languages to statically express multi-kernel applica- heterogeneous platforms. tions and communication channels with specific constraints • We show that our approach adapts automatically to differ- on the amount of transferred data through the channels. ent architectures for truly performance-portable software. II. MOTIVATION 1 2 To illustrate choices that have to be made while designing an embedded system application on modern devices, we will Tasks' implementation in DIplomat DSL C++, OpenMP, OpenCL, ... Task graph representation consider the SLAMBench benchmarking framework [4] and in particular the KFusion benchmark. KFusion performs real- time localization and scene reconstruction for a camera mov- ing through an unknown environment, i.e. it estimates the pose Time profiling of a depth camera while building a highly detailed 3D model 3 of the environment. This application is composed of more than 10 different communicating tasks (refer to Figure 1). To im- Static dataflow plement this application and to reach acceptable performance, 4 abstraction a designer must optimize its algorithm while considering a good mapping at the same time. Thus, in Figure 2 we present Dataflow analysis, several possible mapping solutions. 5 e.g. mapping Running the KFusion benchmark on just one 2GHz ARM A15 core of an embedded SoC (Samsung Exynos 5422, see Code generation Table II), we observe low performance on a single CPU core, less than 2.5 Frames Per Second (FPS). By exploiting data 6 parallelism (using the KFusion OpenMP implementation), the Generated source code speed-up is 2.5x. We can also take advantage of the task paral- C++, OpenMP, OpenCL, ... lelism through a task-graph runtime (namely Symphony [3]), and again, we improve performance by a further 20%. Thus, Fig. 3: Overview of the Diplomat framework. The user pro- with these optimizations and by only using the multi-core vides (1) the task implementations in various languages and CPU, this application can reach 7.4 FPS. Alternatively, we (2) the dependencies between the tasks. Then in (3) Diplomat can use the GPU implementation (with OpenCL), and yields performs timing analysis on the target platform and in (4) a 2x speed-up comparing with the best CPU version, which is abstracts the task-graph as a static dataflow model. Finally, a 14.9 FPS. dataflow model analysis step is performed in (5) and in (6) It is evidently appealing to use both CPU and GPU at the the Diplomat compiler performs the code generation. Input frames same time and to combine those speed-ups through a more complex mapping, but implementing such behavior is not Tracking depth2vertex easy. The communication between CPU and GPU becomes preprocessing vertex2normal Integrate Raycast mm2meter halfsample integrate raycast a potential bottleneck, and the resulting performance might be bilateralfilter track reduce worse. For single-kernel applications, various solutions have been proposed to handle the mapping problem. But these are Reconstructed not well suited to multi-kernel applications. As an example, RenderDepth RenderTrack RenderVolume 3D scene following a speed-up based mapping methodology [5] we obtain a 16% slow-down compared to the OpenCL version. A partitioning technique similar to the one presented in [6] Fig. 1: The KFusion algorithm is composed of 7 main tasks, also results in a 16% slow-down. a total of 12 different OpenCL kernels. In contrast, using our approach, we are able to generate a mapping that reaches 17.2 FPS, which is a 15% speed-up Sequential (CPU) 2.5 compared to the OpenCL version. This example shows that OpenMP (CPU) 6.3 mapping using dataflow analysis tools can outperform existing Task-graph (CPU) 7.4 techniques. OpenCL (GPU) 14.9 Smart Mapping (GPU/CPU) 12.8 III. DIPLOMAT FRAMEWORK Partitioning (GPU+CPU) 12.8 Diplomat (GPU+CPU) 17.2 A. Framework Overview 0 5 10 15 Frames Per Second (FPS) The Diplomat framework combines task-graph modeling and static dataflow analysis to perform the mapping of stream- Fig. 2: Performance comparison of different possible mappings ing applications. An overview of the framework is given in for KFusion (higher FPS is better). Figure 3. 1) Front-end: The Diplomat front-end allows the frame- Bilateral work to gather fundamental information about the application: Filter the different possible implementations of the tasks, their expected input and output data sizes, and the existing data dependencies between each of them. 0 2) Static analysis: At compile-time, the framework per- ScaledDepth Vertex Normal forms static analysis. In order to benefit from existing dataflow i-1 i i i analysis techniques, the initial task-graph needs to be turned into a dataflow model. As
Recommended publications
  • Setting up Odroid C2 – Throughtput Testing
    Throughput Testing Single-Board Computer for Wireless LAN Professionals Setting up Odroid C2 As Test End-Point Inventory Project Items Unpack and knoll the following items ❑ Impact Strong 5200 mAh USB Battery o And it has a flashlight as well! ❑ USB Micro Power Cable (comes with Battery) ❑ Micro SD Card to eMCC adapter (Blueish green) ❑ eMCC Memory Card (small with Red label) ❑ Transcend USB 3.0 SD Card Reader ❑ Odroid Wi-Fi 4 NIC – (Black) ❑ USB ‘U-bend’ Adaper (Black ❑ Odroid C2 – Single Board Computer ❑ #WLPC_EU USB Drive (has the Image) ❑ Odroid Case Kit (Black) Later we will have you use as part of the testing (not in your own kit) • Small Phillips Screwdriver (to assemble your case) • Short Ethernet Cable (to test your unit when attached to a switch port) ❑ If you are running Windows copy the Win32DiskImager-0.9.5-binary executable as well. If you are running Windows follow the Windows directions, if Mac OS – move to the Mac OS directions. Mac OS Version Configure Image ❑ Copy Image File from the #WLPC_EU USB drive (in the Odroid for #WLPC_EU folder) to your Desktop… the file name is DietPi_v133_OdroidC2-arm64- (WLAN_PRO) ❑ Open Image File – Boot Drive ❑ Should have a Folder Call “Boot” ❑ Find the dietpi.txt We are changing setting so your Odroid device can join a local SSID and download more files we’ll need. ❑ Under the Wi-Fi Details section change the Wi-Fi SSID to WLPC_EU-01 with a PSK of password. #Enter your Wifi details below, if applicable (Case Sensitive). Wifi_SSID=WLPC_EU-01 Wifi_KEY=password ❑ Under WiFi Hotspot Settings,
    [Show full text]
  • Improving the Beaglebone Board with Embedded Ubuntu, Enhanced GPMC Driver and Python for Communication and Graphical Prototypes
    Final Master Thesis Improving the BeagleBone board with embedded Ubuntu, enhanced GPMC driver and Python for communication and graphical prototypes By RUBÉN GONZÁLEZ MUÑOZ Directed by MANUEL M. DOMINGUEZ PUMAR FINAL MASTER THESIS 30 ECTS, JULY 2015, ELECTRICAL AND ELECTRONICS ENGINEERING Abstract Abstract BeagleBone is a low price, small size Linux embedded microcomputer with a full set of I/O pins and processing power for real-time applications, also expandable with cape pluggable boards. The current work has been focused on improving the performance of this board. In this case, the BeagleBone comes with a pre-installed Angstrom OS and with a cape board using a particular software “overlay” and applications. Due to a lack of support, this pre-installed OS has been replaced by Ubuntu. As a consequence, the cape software and applications need to be adapted. Another necessity that emerges from the stated changes is to improve the communications through a GPMC interface. The depicted driver has been built for the new system as well as synchronous variants, also developed and tested. Finally, a set of applications in Python using the cape functionalities has been developed. Some extra graphical features have been included as example. Contents Contents Abstract ..................................................................................................................................................................................... 5 List of figures .........................................................................................................................................................................
    [Show full text]
  • Suzanne's Microcluster Slides
    csinparallel.org Microclusters for teaching PDC Suzanne J. Matthews (West Point) 1 csinparallel.org What is a Microcluster? • A personal, highly portable Beowulf cluster • Enables highly interactive and tactile experiential learning • Notable early examples: – Ultimate Linux Lunch Box (Ron Minnich and Mitch Williams, Sandia National Labs) – LittleFe (Charlie Peck, Earlham College) – Microwulf (Joel Adams, Calvin College) 2 csinparallel.org Single Board Computers (SBCs) 3 csinparallel.org Student Pi (West Point) Suzanne J. Matthews Raspberry Pi nodes - Prototype: Raspberry Pi B nodes - Initial: Raspberry Pi B+ nodes - Current: Raspberry Pi 2 nodes - 900 Mhz quad-core CPU, 1 GB of RAM, HDMI, USB, 10/100 Ethernet - Raspbian Linux June 2014 - ~$40 p/node - Materials: - http://suzannejmatthews.com/private/cluster.html October 2014 May 2016 4 csinparallel.org Student Parallella (West Point) Suzanne J. Matthews Parallella nodes - dual-core ARM A9 CPU, 16-core Epiphany co-processor, 1 GB of RAM, μHDMI, μUSB, Gigabit Ethernet - Linaro Linux - ~$145 p/node - Materials: - http://suzannejmatthews.com/private/cluster.html - http://suzannejmatthews.github.io/ October 2014 April 2016 January 2015 5 csinparallel.org Half ShoeBox Clusters (Centre College) David Toth Cubieboard/ODROID nodes (2-node clusters) - Prototype: Cubieboard2: dual-core ARM Cortex A7, 1 GB of RAM, HDMI, USB, 10/100 Ethernet - Latest: ODROID C2: 2Ghz quad-core A53, 2 GB of RAM, HDMI, USB, Gigabit Ethernet, - Android/Ubuntu Linux - ~ $150-$200 p/cluster - Materials: Early 2014 - http://web.centre.edu/david.toth/portablecluster/index.html
    [Show full text]
  • Passmark - Android Device List
    PassMark - Android Device List https://www.androidbenchmark.net/device_list.php AndroidTM Benchmarks Performance Comparison of Android Devices Below is an alphabetical list of all Android device types that appear in the charts. Clicking on a specific device name will take you to the charts where it appears in and will highlight it for you. PassMark Rating CPUMark Rating PassMark Rank Android Device Type Samples (higher is better) (higher is better) (lower is better) 4G R17S 1,572 4,088 1253 1 A-gold BV9500Plus 5,052 13,068 375 1 A-gold BV9800 4,450 11,400 487 1 A-gold F1 4,237 10,869 531 7 A-gold S3_Pro 4,392 11,219 504 2 A-gold Z2_PRO 4,406 11,246 499 1 A1 Alpha 20+ 4,753 12,266 435 1 Acer A3-A40 1,982 5,269 1082 1 Acer AO722 519 1,272 1725 1 1 z 62 2020-10-14, 12:02 PassMark - Android Device List https://www.androidbenchmark.net/device_list.php PassMark Rating CPUMark Rating PassMark Rank Android Device Type Samples (higher is better) (higher is better) (lower is better) AGM A10 2,030 8,521 1066 1 ALCATEL A574BL 497 1,202 1736 1 AlcatelOneTouch Alcatel_5044R 438 1,129 1759 1 Alco CT9223W97 1,214 3,111 1384 1 ALLDOCUBE M8 2,730 7,274 882 5 ALLDOCUBE T701 1,092 4,554 1437 1 ALLDOCUBE U1006H 1,902 4,931 1125 1 ALLVIEW P7_PRO 1,691 4,543 1210 1 ALLVIEW X4_Soul 2,536 6,938 925 1 Alps Acer One 8 T4-82L 2,539 6,526 924 1 Alps Tablet18T 1,201 3,043 1394 1 Alps tb8788p1_64_bsp 2,343 5,784 983 2 Amazon KFKAWI 712 1,701 1589 4 Amazon KFMAWI 2,306 5,640 992 19 Amazon KFONWI 1,082 2,588 1442 3 Amlogic A95X-A3 1,228 3,182 1381 1 Amlogic ABOX A4 397
    [Show full text]
  • Security Monitor for Mobile Devices: Design and Applications
    UNIVERSITY OF CALIFORNIA, IRVINE Security Monitor for Mobile Devices: Design and Applications DISSERTATION submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in Computer Science by Saeed Mirzamohammadi Dissertation Committee: Professor Ardalan Amiri Sani, UCI, Chair Professor Gene Tsudik, UCI Professor Sharad Mehrotra, UCI Doctor Sharad Agarwal, MSR 2020 Portion of Chapter 1 c 2018 ACM, reprinted, with permission, from [148] Portion of Chapter 1 c 2017 ACM, reprinted, with permission, from [150] Portion of Chapter 1 c 2018 IEEE, reprinted, with permission, from [149] Portion of Chapter 1 c 2020 ACM, reprinted, with permission, from [151] Portion of Chapter 2 c 2018 ACM, reprinted, with permission, from [148] Portion of Chapter 2 c 2017 ACM, reprinted, with permission, from [150] Portion of Chapter 3 c 2018 ACM, reprinted, with permission, from [148] Portion of Chapter 4 c 2018 IEEE, reprinted, with permission, from [149] Portion of Chapter 5 c 2017 ACM, reprinted, with permission, from [150] Portion of Chapter 6 c 2020 ACM, reprinted, with permission, from [151] Portion of Chapter 7 c 2020 ACM, reprinted, with permission, from [151] Portion of Chapter 7 c 2017 ACM, reprinted, with permission, from [150] Portion of Chapter 7 c 2018 IEEE, reprinted, with permission, from [149] Portion of Chapter 7 c 2018 ACM, reprinted, with permission, from [148] All other materials c 2020 Saeed Mirzamohammadi TABLE OF CONTENTS Page LIST OF FIGURES v LIST OF TABLES vii ACKNOWLEDGMENTS viii VITA ix ABSTRACT OF THE DISSERTATION xi 1 Introduction 1 1.1 Applications of the security monitor .
    [Show full text]
  • Cubieboard Cubieboard2 Cubietruck Beaglebone Black
    Raspberry Pi (Model B rev.2) Cubieboard Cubieboard2 Cubietruck Beaglebone Black 1 Ghz (OC) ARM® Cortex-A6 1 Ghz ARM® Cortex-A8 1 Ghz ARM® Cortex-A7 Dual Core 1 Ghz ARM® Cortex-A7 Dual Core 1 Ghz ARM® Cortex-A8 CPU ARM1176JZF-F Allwinner A10 C8096CA Allwinner A20 Allwinner A20 AM335x GPU/FPU VideoCore IV Mali-400 (CedarX, OpenGL) Mali-400MP2 (CedarX, OpenGL) Mali-400MP2 (CedarX, OpenGL) SGX350 3D / NEON FPU accelerator RAM 512 MB 1 GB DDR3 2 GB 2 GB 512 MB DDR3 Storage micro SD/SDHC 4 GB NAND Flash, micro SD/SDHC, SATA 4 GB NAND Flash, micro SD/SDHC, SATA 4 GB NAND Flash, micro SD/SDHC, SATA 2.0 2GB eMMC Power micro USB (5V/1A) 3.5 W DC 5v/2A DC 5v/2A DC 5v/2.5A DC 5V/500mA Video RCA Composite Video, HDMI 1.4 HDMI HDMI HDMI/VGA microHDMI Audio 3.5 mm Headphone Jack 3.5 mm Headphone Jack / Line In 3.5 mm Headphone Jack 3.5 mm Headphone Jack, SPDIF Network 10/100 Mbps 10/100 Mbps 10/100 Mbps 10/100/1000 Mbps, Wifi, Bluetooth 10/100 Mbps 2x46 PIN GPIO I/O ports 26 PIN GPIO, 2x Ribon 2x48 PIN GPIO, 4PIN Serial, 1IR 2x48 PIN GPIO, 4PIN Serial, 1IR 1x 54 PIN GPIO (Arduino Shield Compatible) USB ports 2x USB 2.0 2x USB 2.0 2x USB 2.0, 1 mini USB OTG 2x USB 2.0, 1 mini USB OTG 1x USB 2.0 Linux (Raspbian, Debian, Fedora, Arch, Gentoo, Kali), Andoid, Angstrom, Ubuntu, Fedora, Gentoo.
    [Show full text]
  • Passmark Android Benchmark Charts - CPU Rating
    PassMark Android Benchmark Charts - CPU Rating http://www.androidbenchmark.net/cpumark_chart.html Home Software Hardware Benchmarks Services Store Support Forums About Us Home » Android Benchmarks » Device Charts CPU Benchmarks Video Card Benchmarks Hard Drive Benchmarks RAM PC Systems Android iOS / iPhone Android TM Benchmarks ----Select A Page ---- Performance Comparison of Android Devices Android Devices - CPUMark Rating How does your device compare? Add your device to our benchmark chart This chart compares the CPUMark Rating made using PerformanceTest Mobile benchmark with PerformanceTest Mobile ! results and is updated daily. Submitted baselines ratings are averaged to determine the CPU rating seen on the charts. This chart shows the CPUMark for various phones, smartphones and other Android devices. The higher the rating the better the performance. Find out which Android device is best for your hand held needs! Android CPU Mark Rating Updated 14th of July 2016 Samsung SM-N920V 166,976 Samsung SM-N920P 166,588 Samsung SM-G890A 166,237 Samsung SM-G928V 164,894 Samsung Galaxy S6 Edge (Various Models) 164,146 Samsung SM-G930F 162,994 Samsung SM-N920T 162,504 Lemobile Le X620 159,530 Samsung SM-N920W8 159,160 Samsung SM-G930T 157,472 Samsung SM-G930V 157,097 Samsung SM-G935P 156,823 Samsung SM-G930A 155,820 Samsung SM-G935F 153,636 Samsung SM-G935T 152,845 Xiaomi MI 5 150,923 LG H850 150,642 Samsung Galaxy S6 (Various Models) 150,316 Samsung SM-G935A 147,826 Samsung SM-G891A 145,095 HTC HTC_M10h 144,729 Samsung SM-G928F 144,576 Samsung
    [Show full text]
  • Building a Datacenter with ARM Devices
    Building a Datacenter with ARM Devices Taylor Chien1 1SUNY Polytechnic Institute ABSTRACT METHODS THE CASE CURRENT RESULTS The ARM CPU is becoming more prevalent as devices are shrinking and Physical Custom Enclosure Operating Systems become embedded in everything from medical devices to toasters. Build a fully operational environment out of commodity ARM devices using Designed in QCAD and laser cut on hardboard by Ponoko Multiple issues exist with both Armbian and Raspbian, including four However, Linux for ARM is still in the very early stages of release, with SBCs, Development Boards, or other ARM-based systems Design was originally only for the Raspberry Pis, Orange Pi Ones, Udoo critical issues that would prevent them from being used in a datacenter many different issues, challenges, and shortcomings. Have dedicated hard drives and power system for mass storage, including Quads, PINE64, and Cubieboard 3 multiple drives for GlusterFS operation, and an Archive disk for backups and Issue OS In order to test what level of service commodity ARM devices have, I Each device sits on a tray which can be slid in and out at will rarely-used storage Kernel and uboot are not linked together after a Armbian decided to build a small data center with these devices. This included Cable management and cooling are on the back for easy access Build a case for all of these devices that will protect them from short circuits version update building services usually found in large businesses, such as LDAP, DNS, Designed to be solid and not collapse under its own weight and dust Operating system always performs DHCP request Raspbian Mail, and certain web applications such as Roundcube webmail, Have devices hooked up to a UPS for power safety Design Flaws Allwinner CPUs crash randomly when under high Armbian ownCloud storage, and Drupal content management.
    [Show full text]
  • Smart Walker IV
    Multidisciplinary Senior Design Conference Kate Gleason College of Engineering Rochester Institute of Technology Rochester, New York 14623 P16041: Smart Walker IV Alex Synesael Alexei Rigaud Electrical Engineering Electrical Engineering Danielle Stone Stephen Hayes Electrical Engineering Electrical Engineering ABSTRACT Diagnostic medical technology in its current form is typically expensive and intrusive to a patient’s lifestyle. The purpose of the Smart Walker IV project was to finalize a robotic assistive walker prototype capable of collecting long-term diagnostic information about a patient’s health and activities. This gives care providers a more complete image of their health than can be achieved during a relatively short visit. In addition to the diagnostic features, Smart Walker is capable of autonomous motion such that it can provide emergency support in the event of a fall, or contact the appropriate third party support. Smart Walker IV is a culmination of three previous prototype versions spanning 2013 through 2015, and is intended to serve as the final revision before the platform is ready to be utilized as a graduate research platform for robotic assistive medical diagnostics. The goal of Smart Walker IV is to complete the build and integration phases from where the previous Smart Walker III team ended. The project progressed through analysis and completion of the existing systems, the design and testing of new diagnostic subsystems, and finally complete system integration and qualification. BACKGROUND Smart Walker began as a diagnostic tool for elderly people, and has since evolved into a research platform. The Smart Walker can take diagnostic measurements, as well as measure vital signs and detect when a patient has fallen.
    [Show full text]
  • An ARM Cluster for Running CMSSW Jobs Lirim Osmani1 and Tomas Linden´ 2 1Department of Computer Science, University of Helsinki, 2Helsinki Institute of Physics
    An ARM cluster for running CMSSW jobs Lirim Osmani1 and Tomas Linden´ 2 1Department of Computer Science, University of Helsinki, 2Helsinki Institute of Physics Introduction Power measurements To meet the computing challenge posed by the High-Luminosity Large The RPi4 is powered by a 5 V / 3 A power supply with a USB-C connector. Hadron Collider (HL-LHC) improved software performance, changed analysis The power drawn by the RPi4 board itself was measured with an AVHzY CT-2 procedures and new hardware resources are required. ARM computers are USB-power meter, neglecting the losses of the power supply. developed for the highly comptetive mobile phone market, so they might The Odroid-N2 uses a 12 V / 1.5 A power supply so a less accurate meter provide less expensive computing resources for HL-LHC computing. had to be used than the one for the RPi4. Hardware Table 2: Power measurements. CMSSW has been ported in earlier work to armv7 [1, 2] and to armv8 [3, 4]. Computer Idle stress-ng Odroid-N2 2±1 W 6±1 W The available ARM systems are usually either servers or inexpensive Raspberry Pi 4 Model B with heat sink 2.6 ±0.1 W 5.8±0.1 W development boards. In this work we have used small ARM development Kaby Lake 14 nm quad core i7-7700 Fedora 29 8±1 W 99±1 W boards, two hexa core Odroid-N2 [5] from HardKernel and two quad core Raspberry Pi 4 Model B (RPi4 in the following) [6] from the Raspberry Pi Foundation.
    [Show full text]
  • Machine Learning/AI Processing on Mobile and Iot Edge Devices
    Mobile AI Machine Learning/AI processing on mobile and IoT Edge Devices Emilio Jose Coronado Lopez http://seoulai.com/ Paradigm We’re living in a data world. A world of connected sensors, signals and data streams. Sensory data continuously flows and gathers into a form of mobile or IoT Edge device, then processed, transferred through network to cloud backends, storage, analytics services... Analytics, outputs coming from it, are some kind of offline, late, delayed processing, mostly runs on stored or historical data. This has been pretty much the main scenario of many business and entities entering into digital industry 4.0. However, it will scale and growth through instant feedback, real time ML/AI services, IT and cloud providers are quite use to those incremental paths on controlled environments, but adding wireless, mobile networks, tough latencies and response times hard constraints beyond the confined system is another story. Examples of Real Time Data Streams Cloud or backend AI/ML storage processing Hardware/Network Wifi, 3G/4G, LPWAN, BT Interfaces ... Camera Data Sensor Signals Signals/Data Collect, gateway and transfer to the cloud or backend for Data Processing Audio Data further processing. Sensor Signals Model Training or Mobile Device Previously IMU/Loc Data Trained Sensor Signals Temp. Data Infer/Predict Sensor Signals Iot Edge Device Stock M. Data Sensor Signals Social N. Data Data/Action/Feedback to the source Sensor Signals To another subsystems Constraints: Latencies Constraints: Latencies, Processing, Bandwidth, data storage …. It’s happening now A typical big data, analytics pipeline integrated with Open Source machine learning frameworks like Tensorflow, Caffe, Keras, etc.
    [Show full text]
  • For the ODROID-N2  September 1, 2019
    Five-Minute Fun with your Monku R1: A Sega Genesis Handheld SD Card September 1, 2019 Now that you have your cool new retro gaming console all set up and you're enjoying playing some retro games, let's take a moment to think about what else we can do with our ODROID device. You'll need a Monku Retro device, like the one we show you how to Linux Gaming: PC-Engine / TurboGrafx - Part 6 September 1, 2019 With this article, we’ve reached the end of the PC-Engine / Turbografx series, which I admit went on much longer than I anticipated. It also was a lot more interesting than I thought it would be. Let’s see what this nal installment will bring, and if the series of famous GO-Tank, Go! - Control a Treaded Bot with Your ODROID-GO September 1, 2019 Using an ODROID-GO handheld gaming system for controlling a couple of small DC motors that are housed inside a treaded tank robot is easier than you may think. This chore is even more remarkable when you learn that only two of the GO’s general purpose input/output (GPIO) pins are used CoreELEC: For the ODROID-N2 September 1, 2019 Kodi has for years been one of the most popular media players and is often recommended when people ask for which multimedia solution to choose as it is available for multiple operating systems and a huge spectrum of devices. The majority of low-cost devices which are bought for the purpose How to Build a Monku Retro Gaming Console - Part 3: Adding The Final Touches September 1, 2019 This is a continuation of the Retro Gaming Console article from last month, where we learned how to congure the software for a retro gaming console.
    [Show full text]