The Sunway Taihulight Supercomputer: System and Applications
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Interconnect Your Future Enabling the Best Datacenter Return on Investment
Interconnect Your Future Enabling the Best Datacenter Return on Investment TOP500 Supercomputers, November 2016 Mellanox Accelerates The World’s Fastest Supercomputers . Accelerates the #1 Supercomputer . 39% of Overall TOP500 Systems (194 Systems) . InfiniBand Connects 65% of the TOP500 HPC Platforms . InfiniBand Connects 46% of the Total Petascale Systems . Connects All of 40G Ethernet Systems . Connects The First 100G Ethernet System on The List (Mellanox End-to-End) . Chosen for 65 End-User TOP500 HPC Projects in 2016, 3.6X Higher versus Omni-Path, 5X Higher versus Cray Aries InfiniBand is the Interconnect of Choice for HPC Infrastructures Enabling Machine Learning, High-Performance, Web 2.0, Cloud, Storage, Big Data Applications © 2016 Mellanox Technologies 2 Mellanox Connects the World’s Fastest Supercomputer National Supercomputing Center in Wuxi, China #1 on the TOP500 List . 93 Petaflop performance, 3X higher versus #2 on the TOP500 . 41K nodes, 10 million cores, 256 cores per CPU . Mellanox adapter and switch solutions * Source: “Report on the Sunway TaihuLight System”, Jack Dongarra (University of Tennessee) , June 20, 2016 (Tech Report UT-EECS-16-742) © 2016 Mellanox Technologies 3 Mellanox In the TOP500 . Connects the world fastest supercomputer, 93 Petaflops, 41 thousand nodes, and more than 10 million CPU cores . Fastest interconnect solution, 100Gb/s throughput, 200 million messages per second, 0.6usec end-to-end latency . Broadest adoption in HPC platforms , connects 65% of the HPC platforms, and 39% of the overall TOP500 systems . Preferred solution for Petascale systems, Connects 46% of the Petascale systems on the TOP500 list . Connects all the 40G Ethernet systems and the first 100G Ethernet system on the list (Mellanox end-to-end) . -
NVIDIA Tesla Personal Supercomputer, Please Visit
NVIDIA TESLA PERSONAL SUPERCOMPUTER TESLA DATASHEET Get your own supercomputer. Experience cluster level computing performance—up to 250 times faster than standard PCs and workstations—right at your desk. The NVIDIA® Tesla™ Personal Supercomputer AccessiBLE to Everyone TESLA C1060 COMPUTING ™ PROCESSORS ARE THE CORE is based on the revolutionary NVIDIA CUDA Available from OEMs and resellers parallel computing architecture and powered OF THE TESLA PERSONAL worldwide, the Tesla Personal Supercomputer SUPERCOMPUTER by up to 960 parallel processing cores. operates quietly and plugs into a standard power strip so you can take advantage YOUR OWN SUPERCOMPUTER of cluster level performance anytime Get nearly 4 teraflops of compute capability you want, right from your desk. and the ability to perform computations 250 times faster than a multi-CPU core PC or workstation. NVIDIA CUDA UnlocKS THE POWER OF GPU parallel COMPUTING The CUDA™ parallel computing architecture enables developers to utilize C programming with NVIDIA GPUs to run the most complex computationally-intensive applications. CUDA is easy to learn and has become widely adopted by thousands of application developers worldwide to accelerate the most performance demanding applications. TESLA PERSONAL SUPERCOMPUTER | DATASHEET | MAR09 | FINAL FEATURES AND BENEFITS Your own Supercomputer Dedicated computing resource for every computational researcher and technical professional. Cluster Performance The performance of a cluster in a desktop system. Four Tesla computing on your DesKtop processors deliver nearly 4 teraflops of performance. DESIGNED for OFFICE USE Plugs into a standard office power socket and quiet enough for use at your desk. Massively Parallel Many Core 240 parallel processor cores per GPU that can execute thousands of GPU Architecture concurrent threads. -
Scheduling Many-Task Workloads on Supercomputers: Dealing with Trailing Tasks
Scheduling Many-Task Workloads on Supercomputers: Dealing with Trailing Tasks Timothy G. Armstrong, Zhao Zhang Daniel S. Katz, Michael Wilde, Ian T. Foster Department of Computer Science Computation Institute University of Chicago University of Chicago & Argonne National Laboratory [email protected], [email protected] [email protected], [email protected], [email protected] Abstract—In order for many-task applications to be attrac- as a worker and allocate one task per node. If tasks are tive candidates for running on high-end supercomputers, they single-threaded, each core or virtual thread can be treated must be able to benefit from the additional compute, I/O, as a worker. and communication performance provided by high-end HPC hardware relative to clusters, grids, or clouds. Typically this The second feature of many-task applications is an empha- means that the application should use the HPC resource in sis on high performance. The many tasks that make up the such a way that it can reduce time to solution beyond what application effectively collaborate to produce some result, is possible otherwise. Furthermore, it is necessary to make and in many cases it is important to get the results quickly. efficient use of the computational resources, achieving high This feature motivates the development of techniques to levels of utilization. Satisfying these twin goals is not trivial, because while the efficiently run many-task applications on HPC hardware. It parallelism in many task computations can vary over time, allows people to design and develop performance-critical on many large machines the allocation policy requires that applications in a many-task style and enables the scaling worker CPUs be provisioned and also relinquished in large up of existing many-task applications to run on much larger blocks rather than individually. -
It's a Multi-Core World
It’s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Moore's Law abandoned serial programming around 2004 Courtesy Liberty Computer Architecture Research Group Moore’s Law is not to blame. Intel process technology capabilities High Volume Manufacturing 2004 2006 2008 2010 2012 2014 2016 2018 Feature Size 90nm 65nm 45nm 32nm 22nm 16nm 11nm 8nm Integration Capacity (Billions of 2 4 8 16 32 64 128 256 Transistors) Transistor for Influenza Virus 90nm Process Source: CDC 50nm Source: Intel At end of day, we keep using all those new transistors. That Power and Clock Inflection Point in 2004… didn’t get better. Fun fact: At 100+ Watts and <1V, currents are beginning to exceed 100A at the point of load! Source: Kogge and Shalf, IEEE CISE Courtesy Horst Simon, LBNL Not a new problem, just a new scale… CPU Power W) Cray-2 with cooling tower in foreground, circa 1985 And how to get more performance from more transistors with the same power. RULE OF THUMB A 15% Frequency Power Performance Reduction Reduction Reduction Reduction In Voltage 15% 45% 10% Yields SINGLE CORE DUAL CORE Area = 1 Area = 2 Voltage = 1 Voltage = 0.85 Freq = 1 Freq = 0.85 Power = 1 Power = 1 Perf = 1 Perf = ~1.8 Single Socket Parallelism Processor Year Vector Bits SP FLOPs / core / Cores FLOPs/cycle cycle Pentium III 1999 SSE 128 3 1 3 Pentium IV 2001 SSE2 128 4 1 4 Core 2006 SSE3 128 8 2 16 Nehalem 2008 SSE4 128 8 10 80 Sandybridge 2011 AVX 256 16 12 192 Haswell 2013 AVX2 256 32 18 576 KNC 2012 AVX512 512 32 64 2048 KNL 2016 AVX512 512 64 72 4608 Skylake 2017 AVX512 512 96 28 2688 Putting It All Together Prototypical Application: Serial Weather Model CPU MEMORY First Parallel Weather Modeling Algorithm: Richardson in 1917 Courtesy John Burkhardt, Virginia Tech Weather Model: Shared Memory (OpenMP) Core Fortran: !$omp parallel do Core do i = 1, n Core a(i) = b(i) + c(i) enddoCore C/C++: MEMORY #pragma omp parallel for Four meteorologists in the samefor(i=1; room sharingi<=n; i++) the map. -
E-Commerce Marketplace
E-COMMERCE MARKETPLACE NIMBIS, AWESIM “Nimbis is providing, essentially, the e-commerce infrastructure that allows suppliers and OEMs DEVELOP COLLABORATIVE together, to connect together in a collaborative HPC ENVIRONMENT form … AweSim represents a big, giant step forward in that direction.” Nimbis, founded in 2008, acts as a clearinghouse for buyers and sellers of technical computing — Bob Graybill, Nimbis president and CEO services and provides pre-negotiated access to high performance computing services, software, and expertise from the leading compute time vendors, independent software vendors, and domain experts. Partnering with the leading computing service companies, Nimbis provides users with a choice growing menu of pre-qualified, pre-negotiated services from HPC cycle providers, independent software vendors, domain experts, and regional solution providers, delivered on a “pay-as-you- go” basis. Nimbis makes it easier and more affordable for desktop users to exploit technical computing for faster results and superior products and solutions. VIRTUAL DESIGNS. REAL BENEFITS. Nimbis Services Inc., a founding associate of the AweSim industrial engagement initiative led by the Ohio Supercomputer Center, has been delving into access complexities and producing, through innovative e-commerce solutions, an easy approach to modeling and simulation resources for small and medium-sized businesses. INFORMATION TECHNOLOGY INFORMATION TECHNOLOGY 2016 THE CHALLENGE Nimbis and the AweSim program, along with its predecessor program Blue Collar Computing, have identified several obstacles that impede widespread adoption of modeling and simulation powered by high performance computing: expensive hardware, complex software and extensive training. In response, the public/private partnership is developing and promoting use of customized applications (apps) linked to OSC’s powerful supercomputer systems. -
Challenges in Programming Extreme Scale Systems William Gropp Wgropp.Cs.Illinois.Edu
1 Challenges in Programming Extreme Scale Systems William Gropp wgropp.cs.illinois.edu Towards Exascale Architectures Figure 1: Core Group for Node (Low Capacity, High Bandwidth) 3D Stacked (High Capacity, Memory Low Bandwidth) DRAM Thin Cores / Accelerators Fat Core NVRAM Fat Core Integrated NIC Core for Off-Chip Coherence Domain Communication Figure 2.1: Abstract Machine Model of an exascale Node Architecture 2.1 Overarching Abstract Machine Model We begin with asingle model that highlights the anticipated key hardware architectural features that may support exascale computing. Figure 2.1 pictorially presents this as a single model, while the next subsections Figure 2: Basic Layout of a Node describe several emergingFrom technology “Abstract themes that characterize moreMachine specific hardware design choices by com- Sunway TaihuLightmercial vendors. In Section 2.2, we describe the most plausible set of realizations of the singleAdapteva model that are Epiphany-V DOE Sierra viable candidates forModels future supercomputing and architectures. Proxy • 1024 RISC June• 19, Heterogeneous2016 2.1.1 Processor 2 • Power 9 with 4 NVIDA It is likely that futureArchitectures exascale machines will feature heterogeneous for nodes composed of a collectionprocessors of more processors (MPE,than a single type of processing element. The so-called fat cores that are found in many contemporary desktop Volta GPU and server processorsExascale characterized by deep pipelines, Computing multiple levels of the memory hierarchy, instruction-level parallelism -
Wearable Technology for Enhanced Security
Communications on Applied Electronics (CAE) – ISSN : 2394-4714 Foundation of Computer Science FCS, New York, USA Volume 5 – No.10, September 2016 – www.caeaccess.org Wearable Technology for Enhanced Security Agbaje M. Olugbenga, PhD Babcock University Department of Computer Science Ogun State, Nigeria ABSTRACT Sproutling. Watches like the Apple Watch, and jewelry such Wearable's comprise of sensors and have computational as Cuff and Ringly. ability. Gadgets such as wristwatches, pens, and glasses with Taking a look at the history of computer generations up to the installed cameras are now available at cheap prices for user to present, we could divide it into three main types: mainframe purchase to monitor or securing themselves. Nigerian faced computing, personal computing, and ubiquitous or pervasive with several kidnapping in schools, homes and abduction for computing[4]. The various divisions is based on the number ransomed collection and other unlawful acts necessitate these of computers per users. The mainframe computing describes reviews. The success of the wearable technology in medical one large computer connected to many users and the second, uses prompted the research into application into security uses. personal computing, as one computer per person while the The method of research is the use of case studies and literature term ubiquitous computing however, was used in 1991 by search. This paper takes a look at the possible applications of Paul Weiser. Weiser depicted a world full of embedded the wearable technology to combat the cases of abduction and sensing technologies to streamline and improve life [5]. kidnapping in Nigeria. Nigeria faced with several kidnapping in schools and homes General Terms are in dire need for solution. -
Defining and Measuring Supercomputer Reliability
Defining and Measuring Supercomputer Reliability, Availability, and Serviceability (RAS) Jon Stearley <[email protected]> Sandia National Laboratories?? Albuquerque, New Mexico Abstract. The absence of agreed definitions and metrics for supercomputer RAS obscures meaningful discussion of the issues involved and hinders their solution. This paper seeks to foster a common basis for communication about supercom- puter RAS, by proposing a system state model, definitions, and measurements. These are modeled after the SEMI-E10 [1] specification which is widely used in the semiconductor manufacturing industry. 1 Impetus The needs for standardized terminology and metrics for supercomputer RAS begins with the procurement process, as the below quotation excellently summarizes: “prevailing procurement practices are ... a lengthy and expensive undertak- ing both for the government and for participating vendors. Thus any technically valid methodologies that can standardize or streamline this process will result in greater value to the federally-funded centers, and greater opportunity to fo- cus on the real problems involved in deploying and utilizing one of these large systems.” [2] Appendix A provides several examples of “numerous general hardware and software specifications” [2] from the procurements of several modern systems. While there are clearly common issues being communicated, the language used is far from consistent. Sites struggle to describe their reliability needs, and vendors strive to translate these descriptions into capabilities they can deliver to multiple customers. Another example is provided by this excerpt: “The system must be reliable... It is important to define what we mean by reliable. We do not mean high availability... Reliability in this context means that a large parallel job running for many hours has a high probability of suc- cessfully completing. -
Supercomputer
Four types of computers INTRODUCTION TO COMPUTING Since the advent of the first computer different types and sizes of computers are offering different services. Computers can be as big as occupying a large building and as small as a laptop or a microcontroller in mobile & embedded systems.The four basic types of computers are. Super computer Mainframe Computer Minicomputer Microcomputer Supercomputer supercomputer - Types of Computer The most powerful computers in terms of performance and data processing are the supercomputers. These are specialized and task specific computers used by large organizations. These computers are used for research and exploration purposes, like NASA uses supercomputers for launching space shuttles, controlling them and for space exploration purpose. The supercomputers are very expensive and very large in size. It can be accommodated in large air-conditioned rooms; some super computers can span an entire building. In 1964, Seymour cray designed the first supercomptuer CDC 6600. Uses of Supercomputer In Pakistan and other countries Supercomputers are used by Educational Institutes like NUST (Pakistan) for research purposes. Pakistan Atomic Energy commission & Heavy Industry Taxila uses supercomputers for Research purposes. Space Exploration Supercomputers are used to study the origin of the universe, the dark-matters. For these studies scientist use IBM’s powerful supercomputer “Roadrunner” at National Laboratory Los Alamos. Earthquake studies Supercomputers are used to study the Earthquakes phenomenon. Besides that supercomputers are used for natural resources exploration, like natural gas, petroleum, coal, etc. Weather Forecasting Supercomputers are used for weather forecasting, and to study the nature and extent of Hurricanes, Rainfalls, windstorms, etc. Nuclear weapons testing Supercomputers are used to run weapon simulation that can test the Range, accuracy & impact of Nuclear weapons. -
Hardware Technology of the Earth Simulator 1
Special Issue on High Performance Computing 27 Architecture and Hardware for HPC 1 Hardware Technology of the Earth Simulator 1 By Jun INASAKA,* Rikikazu IKEDA,* Kazuhiko UMEZAWA,* 5 Ko YOSHIKAWA,* Shitaka YAMADA† and Shigemune KITAWAKI‡ 5 1-1 This paper describes the hardware technologies of the supercomputer system “The Earth Simula- 1-2 ABSTRACT tor,” which has the highest performance in the world and was developed by ESRDC (the Earth 1-3 & 2-1 Simulator Research and Development Center)/NEC. The Earth Simulator has adopted NEC’s leading edge 2-2 10 technologies such as most advanced device technology and process technology to develop a high-speed and high-10 2-3 & 3-1 integrated LSI resulting in a one-chip vector processor. By combining this LSI technology with various advanced hardware technologies including high-density packaging, high-efficiency cooling technology against high heat dissipation and high-density cabling technology for the internal node shared memory system, ESRDC/ NEC has been successful in implementing the supercomputer system with its Linpack benchmark performance 15 of 35.86TFLOPS, which is the world’s highest performance (http://www.top500.org/ : the 20th TOP500 list of the15 world’s fastest supercomputers). KEYWORDS The Earth Simulator, Supercomputer, CMOS, LSI, Memory, Packaging, Build-up Printed Wiring Board (PWB), Connector, Cooling, Cable, Power supply 20 20 1. INTRODUCTION (Main Memory Unit) package are all interconnected with the fine coaxial cables to minimize the distances The Earth Simulator has adopted NEC’s most ad- between them and maximize the system packaging vanced CMOS technologies to integrate vector and density corresponding to the high-performance sys- 25 25 parallel processing functions for realizing a super- tem. -
IMTEC-91-58 High-Performance Computing: Industry Uses Of
‘F ---l_l,__*“.^*___l.ll_l-_.. - -.”_-._ .----._..-_.. ___..-.-..,_.. -...-I_.__...... liitiltd l_--._-_.---..-_---. Sl;tl,~s (;~~ttc~ral Accottttt.irtg Offiw ---- GAO 1Zcport 1,oCongressional Request,ers i HIGH-PERFORMANCE COMPUTING Industry Uses of Supercomputers and High-Speed Networks 144778 RELEASED (;AO/IM’l’E(:-!tl-T,El .-. I .II ._. ._._. “..__-I ._.,__ _. _ .._.. ._.-._ . .I .._..-.-...-..--.-- ~-- United States General Accounting Office GAO Washington, D.C. 20648 Information Management and Technology Division B-244488 July 30,199l The Honorable Ernest F. Hollings ‘Chairman, Senate Committee on Commerce, Science, and Transportation The Honorable Al Gore Chairman, Subcommittee on Science, Technology, and Space Senate Committee on Commerce, Science, and Transportation The Honorable George E. Brown, Jr. Chairman, House Committee on Science, Space, and Technology The Honorable Robert S. Walker Ranking Minority Member House Committee on Science, Space, and Technology The HonorableTim Valentine Chairman, Subcommittee on Technology and Competitiveness House Committee on Scientie, Space, and Technology The Honorable Tom Lewis Ranking Minority Member Subcommittee on Technology and Competitiveness House Committee on Science, Space, and Technology This report responds to your October 2,1990, and March 11,1991, requests for information on supercomputers and high-speed networks. You specifically asked that we l provide examples of how various industries are using supercomputers to improve products, reduce costs, save time, and provide other benefits; . identify barriers preventing the increased use of supercomputers; and . provide examples of how certain industries are using and benefitting from high-speed networks. Page 1 GAO/JMTEG91-59 Supercomputera and High-Speed Networks B244488 As agreed with the Senate Committee on Commerce, Science, and Trans- portation, and Subcommittee on Science, Technology, and Space, our review of supercomputers examined five industries-oil, aerospace, automobile, and chemical and pharmaceutical. -
Next Steps in Quantum Computing: Computer Science's Role
Next Steps in Quantum Computing: Computer Science’s Role This material is based upon work supported by the National Science Foundation under Grant No. 1734706. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. Next Steps in Quantum Computing: Computer Science’s Role Margaret Martonosi and Martin Roetteler, with contributions from numerous workshop attendees and other contributors as listed in Appendix A. November 2018 Sponsored by the Computing Community Consortium (CCC) NEXT STEPS IN QUANTUM COMPUTING 1. Introduction ................................................................................................................................................................1 2. Workshop Methods ....................................................................................................................................................6 3. Technology Trends and Projections .........................................................................................................................6 4. Algorithms .................................................................................................................................................................8 5. Devices ......................................................................................................................................................................12 6. Architecture ..............................................................................................................................................................12