Benchmarking Micro-Core Architectures for Detecting Disasters at the Edge
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Interconnect Your Future Enabling the Best Datacenter Return on Investment
Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, November 2016 Mellanox Accelerates The World’s Fastest Supercomputers . Accelerates the #1 Supercomputer . 39% of Overall TOP500 Systems (194 Systems) . InfiniBand Connects 65% of the TOP500 HPC Platforms . InfiniBand Connects 46% of the Total Petascale Systems . Connects All of 40G Ethernet Systems . Connects The First 100G Ethernet System on The List (Mellanox End-to-End) . Chosen for 65 End-User TOP500 HPC Projects in 2016, 3.6X Higher versus Omni-Path, 5X Higher versus Cray Aries InfiniBand is the Interconnect of Choice for HPC Infrastructures Enabling Machine Learning, High-Performance, Web 2.0, Cloud, Storage, Big Data Applications © 2016 Mellanox Technologies 2 Mellanox Connects the World’s Fastest Supercomputer National Supercomputing Center in Wuxi, China #1 on the TOP500 List . 93 Petaflop performance, 3X higher versus #2 on the TOP500 . 41K nodes, 10 million cores, 256 cores per CPU . Mellanox adapter and switch solutions * Source: “Report on the Sunway TaihuLight System”, Jack Dongarra (University of Tennessee) , June 20, 2016 (Tech Report UT-EECS-16-742) © 2016 Mellanox Technologies 3 Mellanox In the TOP500 . Connects the world fastest supercomputer, 93 Petaflops, 41 thousand nodes, and more than 10 million CPU cores . Fastest interconnect solution, 100Gb/s throughput, 200 million messages per second, 0.6usec end-to-end latency . Broadest adoption in HPC platforms , connects 65% of the HPC platforms, and 39% of the overall TOP500 systems . Preferred solution for Petascale systems, Connects 46% of the Petascale systems on the TOP500 list . Connects all the 40G Ethernet systems and the first 100G Ethernet system on the list (Mellanox end-to-end) . -
Configurable RISC-V Softcore Processor for FPGA Implementation
1 Configurable RISC-V softcore processor for FPGA implementation Joao˜ Filipe Monteiro Rodrigues, Instituto Superior Tecnico,´ Universidade de Lisboa Abstract—Over the past years, the processor market has and development of several programming tools. The RISC-V been dominated by proprietary architectures that implement Foundation controls the RISC-V evolution, and its members instruction sets that require licensing and the payment of fees to are responsible for promoting the adoption of RISC-V and receive permission so they can be used. ARM is an example of one of those companies that sell its microarchitectures to participating in the development of the new ISA. In the list of the manufactures so they can implement them into their own members are big companies like Google, NVIDIA, Western products, and it does not allow the use of its instruction set Digital, Samsung, or Qualcomm. (ISA) in other implementations without licensing. The RISC-V The main goal of this work is the development of a RISC- instruction set appeared proposing the hardware and software V softcore processor to be implemented in an FPGA, using development without costs, through the creation of an open- source ISA. This way, it is possible that any project that im- a non-RISC-V core as the base of this architecture. The plements the RISC-V ISA can be made available open-source or proposed solution is focused on solving the problems and even implemented in commercial products. However, the RISC- limitations identified in the other RISC-V cores that were V solutions that have been developed do not present the needed analyzed in this thesis, especially in terms of the adaptability requirements so they can be included in projects, especially the and flexibility, allowing future modifications according to the research projects, because they offer poor documentation, and their performances are not suitable. -
ASIC Implemented Microblaze-Based Coprocessor for Data Stream
ASIC-IMPLEMENTED MICROBLAZE-BASED COPROCESSOR FOR DATA STREAM MANAGEMENT SYSTEMS A Thesis Submitted to the Faculty of Purdue University by Linknath Surya Balasubramanian In Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical and Computer Engineering May 2020 Purdue University Indianapolis, Indiana ii THE PURDUE UNIVERSITY GRADUATE SCHOOL STATEMENT OF THESIS APPROVAL Dr. John J. Lee, Chair Department of Electrical and Computer Engineering Dr. Lauren A. Christopher Department of Electrical and Computer Engineering Dr. Maher E. Rizkalla Department of Electrical and Computer Engineering Approved by: Dr. Brian King Head of Graduate Program iii ACKNOWLEDGMENTS I would first like to express my gratitude to my advisor Dr. John J. Lee and my thesis committee members Dr. Lauren A. Christopher and Dr. Maher E. Rizkalla for their patience, guidance, and support during this journey. I would also like to thank Mrs. Sherrie Tucker for her patience, help, and encouragement. Lastly, I must thank Dr. Pranav Vaidya and Mr. Tareq S. Alqaisi for all their support, technical guidance, and advice. Thank you all for taking time and helping me complete this study. iv TABLE OF CONTENTS Page LIST OF TABLES :::::::::::::::::::::::::::::::::: vi LIST OF FIGURES ::::::::::::::::::::::::::::::::: vii ABSTRACT ::::::::::::::::::::::::::::::::::::: ix 1 INTRODUCTION :::::::::::::::::::::::::::::::: 1 1.1 Previous Work ::::::::::::::::::::::::::::::: 1 1.2 Motivation :::::::::::::::::::::::::::::::::: 2 1.3 Thesis Outline :::::::::::::::::::::::::::::::: -
Towards Exascale Computing
TowardsTowards ExascaleExascale Computing:Computing: TheThe ECOSCALEECOSCALE ApproachApproach Dirk Koch, The University of Manchester, UK ([email protected]) 1 Motivation: let’s build a 1,000,000,000,000,000,000 FLOPS Computer (Exascale computing: 1018 FLOPS = one quintillion or a billion billion floating-point calculations per sec.) 2 1,000,000,000,000,000,000 FLOPS . 10,000,000,000,000,000,00 FLOPS 1975: MOS 6502 (Commodore 64, BBC Micro) 3 Sunway TaihuLight Supercomputer . 2016 (fully operational) . 12,543,6000,000,000,000,00 FLOPS (125.436 petaFLOPS) . Architecture Sunway SW26010 260C (Digital Alpha clone) 1.45GHz 10,649,600 cores . Power “The cooling system for TaihuLight uses a closed- coupled chilled water outfit suited for 28 MW with a custom liquid cooling unit”* *https://www.nextplatform.com/2016/06/20/look-inside-chinas-chart-topping-new-supercomputer/ . Cost US$ ~$270 million 4 TOP500 Performance Development We need more than all the performance of all TOP500 machines together! 5 TaihuLight for Exascale Computing? We need 8x the worlds fastest supercomputer: . Architecture Sunway SW26010 260C (Digital Alpha clone) @1.45GHz: > 85M cores . Power 224 MW (including cooling) costs ~ US$ 40K/hour, US$ 340M/year from coal: 2,302,195 tons of CO2 per year . Cost US$ 2.16 billion We have to get at least 10x better in energy efficiency 2-3x better in cost Also: scalable programming models 6 Alternative: Green500 Shoubu supercomputer (#1 Green500 in 2015): . Cores: 1,181,952 . Theoretical Peak: 1,535.83 TFLOPS/s . Memory: 82 TB . Processor: Xeon E5-2618Lv3 8C 2.3GHz . -
Computational PHYSICS Shuai Dong
Computational physiCs Shuai Dong Evolution: Is this our final end-result? Outline • Brief history of computers • Supercomputers • Brief introduction of computational science • Some basic concepts, tools, examples Birth of Computational Science (Physics) The first electronic general-purpose computer: Constructed in Moore School of Electrical Engineering, University of Pennsylvania, 1946 ENIAC: Electronic Numerical Integrator And Computer ENIAC Electronic Numerical Integrator And Computer • Design and construction was financed by the United States Army. • Designed to calculate artillery firing tables for the United States Army's Ballistic Research Laboratory. • It was heralded in the press as a "Giant Brain". • It had a speed of one thousand times that of electro- mechanical machines. • ENIAC was named an IEEE Milestone in 1987. Gaint Brain • ENIAC contained 17,468 vacuum tubes, 7,200 crystal diodes, 1,500 relays, 70,000 resistors, 10,000 capacitors and around 5 million hand-soldered joints. It weighed more than 27 tons, took up 167 m2, and consumed 150 kW of power. • This led to the rumor that whenever the computer was switched on, lights in Philadelphia dimmed. • Input was from an IBM card reader, and an IBM card punch was used for output. Development of micro-computers modern PC 1981 IBM PC 5150 CPU: Intel i3,i5,i7, CPU: 8088, 5 MHz 3 GHz Floppy disk or cassette Solid state disk 1984 Macintosh Steve Jobs modern iMac Supercomputers The CDC (Control Data Corporation) 6600, released in 1964, is generally considered the first supercomputer. Seymour Roger Cray (1925-1996) The father of supercomputing, Cray-1 who created the supercomputer industry. Cray Inc. -
CS 110 Computer Architecture Lecture 5: Intro to Assembly Language, MIPS Intro
CS 110 Computer Architecture Lecture 5: Intro to Assembly Language, MIPS Intro Instructor: Sören Schwertfeger http://shtech.org/courses/ca/ School of Information Science and Technology SIST ShanghaiTech University Slides based on UC Berkley's CS61C 1 Using Memory You Don’t Own • What’s wrong with this code? char *append(const char* s1, const char *s2) { const int MAXSIZE = 128; char result[128]; int i=0, j=0; for (j=0; i<MAXSIZE-1 && j<strlen(s1); i++,j++) { result[i] = s1[j]; } for (j=0; i<MAXSIZE-1 && j<strlen(s2); i++,j++) { result[i] = s2[j]; } result[++i] = '\0'; return result; } 2 Using Memory You Don’t Own • Beyond stack read/write char *append(const char* s1, const char *s2) { const int MAXSIZE = 128; char result[128]; result is a local array name – int i=0, j=0; stack memory allocated for (j=0; i<MAXSIZE-1 && j<strlen(s1); i++,j++) { result[i] = s1[j]; } for (j=0; i<MAXSIZE-1 && j<strlen(s2); i++,j++) { result[i] = s2[j]; } result[++i] = '\0'; return result; Function returns pointer to stack } memory – won’t be valid after function returns 3 Managing the Heap • realloc(p,size): – Resize a previously allocated block at p to a new size – If p is NULL, then realloc behaves like malloc – If size is 0, then realloc behaves like free, deallocating the block from the heap – Returns new address of the memory block; NOTE: it is likely to have moved! E.g.: allocate an array of 10 elements, expand to 20 elements later int *ip; ip = (int *) malloc(10*sizeof(int)); /* always check for ip == NULL */ … … … ip = (int *) realloc(ip,20*sizeof(int)); -
FCMSSR Meeting 2018-01 All Slides
Federal Committee for Meteorological Services and Supporting Research (FCMSSR) Dr. Neil Jacobs Assistant Secretary for Environmental Observation and Prediction and FCMSSR Chair April 30, 2018 Office of the Federal Coordinator for Meteorology Services and Supporting Research 1 Agenda 2:30 – Opening Remarks (Dr. Neil Jacobs, NOAA) 2:40 – Action Item Review (Dr. Bill Schulz, OFCM) 2:45 – Federal Coordinator's Update (OFCM) 3:00 – Implementing Section 402 of the Weather Research And Forecasting Innovation Act Of 2017 (OFCM) 3:20 – Federal Meteorological Services And Supporting Research Strategic Plan and Annual Report. (OFCM) 3:30 – Qualification Standards For Civilian Meteorologists. (Mr. Ralph Stoffler, USAF A3-W) 3:50 – National Earth System Predication Capability (ESPC) High Performance Computing Summary. (ESPC Staff) 4:10 – Open Discussion (All) 4:20 – Wrap-Up (Dr. Neil Jacobs, NOAA) Office of the Federal Coordinator for Meteorology Services and Supporting Research 2 FCMSSR Action Items AI # Text Office Comment Status Due Date Responsible 2017-2.1 Reconvene JAG/ICAWS to OFCM, • JAG/ICAWS convened. Working 04/30/18 develop options to broaden FCMSSR • Options presented to FCMSSR Chairmanship beyond Agencies ICMSSR the Undersecretary of Commerce • then FCMSSR with a for Oceans and Atmosphere. revised Charter Draft a modified FCMSSR • Draft Charter reviewed charter to include ICAWS duties by ICMSSR. as outlined in Section 402 of the • Pending FCMSSR and Weather Research and Forecasting OSTP approval to Innovation Act of 2017 and secure finalize Charter for ICMSSR concurrence. signature. Recommend new due date: 30 June 2018. 2017-2.2 Publish the Strategic Plan for OFCM 1/12/18: Plan published on Closed 11/03/17 Federal Weather Coordination as OFCM website presented during the 24 October 2017 FCMMSR Meeting. -
FPGA Architecture: Survey and Challenges Full Text Available At
Full text available at: http://dx.doi.org/10.1561/1000000005 FPGA Architecture: Survey and Challenges Full text available at: http://dx.doi.org/10.1561/1000000005 FPGA Architecture: Survey and Challenges Ian Kuon University of Toronto Toronto, ON Canada [email protected] Russell Tessier University of Massachusetts Amherst, MA USA [email protected] Jonathan Rose University of Toronto Toronto, ON Canada [email protected] Boston – Delft Full text available at: http://dx.doi.org/10.1561/1000000005 Foundations and Trends R in Electronic Design Automation Published, sold and distributed by: now Publishers Inc. PO Box 1024 Hanover, MA 02339 USA Tel. +1-781-985-4510 www.nowpublishers.com [email protected] Outside North America: now Publishers Inc. PO Box 179 2600 AD Delft The Netherlands Tel. +31-6-51115274 The preferred citation for this publication is I. Kuon, R. Tessier and J. Rose, FPGA Architecture: Survey and Challenges, Foundations and Trends R in Elec- tronic Design Automation, vol 2, no 2, pp 135–253, 2007 ISBN: 978-1-60198-126-4 c 2008 I. Kuon, R. Tessier and J. Rose All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording or otherwise, without prior written permission of the publishers. Photocopying. In the USA: This journal is registered at the Copyright Clearance Cen- ter, Inc., 222 Rosewood Drive, Danvers, MA 01923. Authorization to photocopy items for internal or personal use, or the internal or personal use of specific clients, is granted by now Publishers Inc for users registered with the Copyright Clearance Center (CCC). -
This Is Your Presentation Title
Introduction to GPU/Parallel Computing Ioannis E. Venetis University of Patras 1 Introduction to GPU/Parallel Computing www.prace-ri.eu Introduction to High Performance Systems 2 Introduction to GPU/Parallel Computing www.prace-ri.eu Wait, what? Aren’t we here to talk about GPUs? And how to program them with CUDA? Yes, but we need to understand their place and their purpose in modern High Performance Systems This will make it clear when it is beneficial to use them 3 Introduction to GPU/Parallel Computing www.prace-ri.eu Top 500 (June 2017) CPU Accel. Rmax Rpeak Power Rank Site System Cores Cores (TFlop/s) (TFlop/s) (kW) National Sunway TaihuLight - Sunway MPP, Supercomputing Center Sunway SW26010 260C 1.45GHz, 1 10.649.600 - 93.014,6 125.435,9 15.371 in Wuxi Sunway China NRCPC National Super Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Computer Center in Cluster, Intel Xeon E5-2692 12C 2 Guangzhou 2.200GHz, TH Express-2, Intel Xeon 3.120.000 2.736.000 33.862,7 54.902,4 17.808 China Phi 31S1P NUDT Swiss National Piz Daint - Cray XC50, Xeon E5- Supercomputing Centre 2690v3 12C 2.6GHz, Aries interconnect 3 361.760 297.920 19.590,0 25.326,3 2.272 (CSCS) , NVIDIA Tesla P100 Cray Inc. DOE/SC/Oak Ridge Titan - Cray XK7 , Opteron 6274 16C National Laboratory 2.200GHz, Cray Gemini interconnect, 4 560.640 261.632 17.590,0 27.112,5 8.209 United States NVIDIA K20x Cray Inc. DOE/NNSA/LLNL Sequoia - BlueGene/Q, Power BQC 5 United States 16C 1.60 GHz, Custom 1.572.864 - 17.173,2 20.132,7 7.890 4 Introduction to GPU/ParallelIBM Computing www.prace-ri.eu How do -
The Sunway Taihulight Supercomputer: System and Applications
SCIENCE CHINA Information Sciences . RESEARCH PAPER . July 2016, Vol. 59 072001:1–072001:16 doi: 10.1007/s11432-016-5588-7 The Sunway TaihuLight supercomputer: system and applications Haohuan FU1,3 , Junfeng LIAO1,2,3 , Jinzhe YANG2, Lanning WANG4 , Zhenya SONG6 , Xiaomeng HUANG1,3 , Chao YANG5, Wei XUE1,2,3 , Fangfang LIU5 , Fangli QIAO6 , Wei ZHAO6 , Xunqiang YIN6 , Chaofeng HOU7 , Chenglong ZHANG7, Wei GE7 , Jian ZHANG8, Yangang WANG8, Chunbo ZHOU8 & Guangwen YANG1,2,3* 1Ministry of Education Key Laboratory for Earth System Modeling, and Center for Earth System Science, Tsinghua University, Beijing 100084, China; 2Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 3National Supercomputing Center in Wuxi, Wuxi 214072, China; 4College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China; 5Institute of Software, Chinese Academy of Sciences, Beijing 100190, China; 6First Institute of Oceanography, State Oceanic Administration, Qingdao 266061, China; 7Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China; 8Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China Received May 27, 2016; accepted June 11, 2016; published online June 21, 2016 Abstract The Sunway TaihuLight supercomputer is the world’s first system with a peak performance greater than 100 PFlops. In this paper, we provide a detailed introduction to the TaihuLight system. In contrast with other existing heterogeneous supercomputers, which include both CPU processors and PCIe-connected many-core accelerators (NVIDIA GPU or Intel Xeon Phi), the computing power of TaihuLight is provided by a homegrown many-core SW26010 CPU that includes both the management processing elements (MPEs) and computing processing elements (CPEs) in one chip. -
A Soft Processor Microblaze-Based Embedded System for Cardiac Monitoring
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 4, No. 9, 2013 A Soft Processor MicroBlaze-Based Embedded System for Cardiac Monitoring El Hassan El Mimouni Mohammed Karim University Sidi Mohammed Ben Abdellah University Sidi Mohammed Ben Abdellah Fès, Morocco Fès, Morocco Abstract—this paper aims to contribute to the efforts of recent years, and has become also an important way to assert design community to demonstrate the effectiveness of the state of the heart’s condition [5 - 9]. the art Field Programmable Gate Array (FPGA), in the embedded systems development, taking a case study in the II. SYSTEM OVERVIEW biomedical field. With this design approach, we have developed a We have designed and implemented a prototype of basic System on Chip (SoC) for cardiac monitoring based on the soft embedded system for cardiac monitoring, whose functional processor MicroBlaze and the Xilkernel Real Time Operating block diagram is shown in the figure 2; it exhibits a modular System (RTOS), both from Xilinx. The system permits the structure that facilitates the development and debugging. Thus, acquisition and the digitizing of the Electrocardiogram (ECG) analog signal, displaying heart rate on seven segments module it includes 2 main modules: and ECG on Video Graphics Adapter (VGA) screen, tracing the An analog module intended for acquiring and heart rate variability (HRV) tachogram, and communication conditioning of the analog ECG signal to make it with a Personal Computer (PC) via the serial port. We have used appropriate for use by the second digital module ; the MIT_BIH Database records to test and evaluate our implementation performance. -
It's a Multi-Core World
It’s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group Moore’s Law is not to blame. Intel process technology capabilities High Volume Manufacturing 2004 2006 2008 2010 2012 2014 2016 2018 Feature Size 90nm 65nm 45nm 32nm 22nm 16nm 11nm 8nm Integration Capacity (Billions of 2 4 8 16 32 64 128 256 Transistors) Transistor for Influenza Virus 90nm Process Source: CDC 50nm Source: Intel At end of day, we keep using all those new transistors. That Power and Clock Inflection Point in 2004… didn’t get better. Fun fact: At 100+ Watts and <1V, currents are beginning to exceed 100A at the point of load! Source: Kogge and Shalf, IEEE CISE Courtesy Horst Simon, LBNL Not a new problem, just a new scale… CPU Power W) Cray-2 with cooling tower in foreground, circa 1985 And how to get more performance from more transistors with the same power. RULE OF THUMB A 15% Frequency Power Performance Reduction Reduction Reduction Reduction In Voltage 15% 45% 10% Yields SINGLE CORE DUAL CORE Area = 1 Area = 2 Voltage = 1 Voltage = 0.85 Freq = 1 Freq = 0.85 Power = 1 Power = 1 Perf = 1 Perf = ~1.8 Single Socket Parallelism Processor Year Vector Bits SP FLOPs / core / Cores FLOPs/cycle cycle Pentium III 1999 SSE 128 3 1 3 Pentium IV 2001 SSE2 128 4 1 4 Core 2006 SSE3 128 8 2 16 Nehalem 2008 SSE4 128 8 10 80 Sandybridge 2011 AVX 256 16 12 192 Haswell 2013 AVX2 256 32 18 576 KNC 2012 AVX512 512 32 64 2048 KNL 2016 AVX512 512 64 72 4608 Skylake 2017 AVX512 512 96 28 2688 Putting It All Together Prototypical Application: Serial Weather Model CPU MEMORY First Parallel Weather Modeling Algorithm: Richardson in 1917 Courtesy John Burkhardt, Virginia Tech Weather Model: Shared Memory (OpenMP) Core Fortran: !$omp parallel do Core do i = 1, n Core a(i) = b(i) + c(i) enddoCore C/C++: MEMORY #pragma omp parallel for Four meteorologists in the samefor(i=1; room sharingi<=n; i++) the map.