Current Trends in High Performance Computing

Chokchai Box Leangsuksun, PhD
SWEPCO Endowed Professor*, Computer Science
Director, High Performance Computing Initiative
Louisiana Tech University
12 December 2011
*SWEPCO endowed professorship is made possible by LA Board of Regents

Outline
• What is HPC?
• Current Trends
• More on PS3 and GPU computing
• Conclusion

Mainstream CPUs
• CPU speed plateaus at 3-4 GHz
• More cores in a single chip
  – Dual/quad core is now
  – Manycore (GPGPU)
• Traditional applications won't get a free ride
• Conversion to parallel computing (HPC, MT)
[Figure: 3-4 GHz clock-speed cap; diagram from the "no free lunch" article in DDJ]

New Trends in Computing
• Old & current: SMP, cluster
• Multicore computers
  – Intel Core 2 Duo
  – AMD 2x 64
• Many-core accelerators
  – GPGPU, FPGA, Cell
• More brains in one computer, rather than a higher CPU frequency
• Harness many computers: cluster computing

What is HPC?
• High Performance Computing
  – Parallel computing, supercomputing
  – Achieve the fastest possible computing outcome
  – Subdivide a very large job into many pieces
  – Enabled by multiple high-speed CPUs, networking, software and programming paradigms
  – Fastest possible solution
  – Technologies that help solve non-trivial tasks in science, engineering, medicine, business, entertainment, etc.
• Time to insight, time to discovery, time to market

Parallel Programming Concepts
• Conventional serial execution: the problem is represented as a series of instructions that are executed by the CPU.
• Parallel execution: the problem is partitioned into multiple executable parts that are mutually exclusive and collectively exhaustive, represented as a partially ordered set exhibiting concurrency.
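A minimal Python sketch of this partitioning, assuming a toy problem (summing i*i over an index range) and hypothetical helper names `partition`, `task`, `serial` and `parallel`. Because the chunks are mutually exclusive and collectively exhaustive, the partial results combine into exactly the serial answer. Threads from `concurrent.futures` are used only to show the structure; in standard CPython builds the global interpreter lock keeps CPU-bound threads from running truly in parallel, so real speedup would need a process pool or a compiled language.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n, parts):
    """Split the index range [0, n) into `parts` contiguous chunks that are
    mutually exclusive (no index appears twice) and collectively exhaustive
    (every index appears exactly once)."""
    base, extra = divmod(n, parts)
    chunks = []
    start = 0
    for p in range(parts):
        # Spread the remainder over the first `extra` chunks to balance the load.
        size = base + (1 if p < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def task(chunk):
    # Hypothetical per-task work: sum of squares over this chunk only.
    return sum(i * i for i in chunk)


def serial(n):
    # Serial execution: one "CPU" walks the whole problem.
    return task(range(n))


def parallel(n, workers=4):
    chunks = partition(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(task, chunks))
    # Combining partial results is the only step that waits for every task.
    return sum(partial_sums)


if __name__ == "__main__":
    n = 100_000
    print(serial(n) == parallel(n))  # True
```

Splitting into contiguous, near-equal chunks keeps the tasks balanced, and since no chunk depends on another, the tasks can run in any order.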
[Diagram: serial execution (Problem → instructions → one CPU) vs. parallel execution (Problem → four Tasks → instructions → four CPUs)]
Parallel computing takes advantage of concurrency to:
• Solve larger problems in less time
• Save wall-clock time
• Overcome memory constraints
• Utilize non-local resources
Source: Thomas Sterling's intro to HPC

HPC Applications and Major Industries
• Finite element modeling
  – Auto/aero
• Fluid dynamics
  – Auto/aero, consumer packaged goods manufacturers, process manufacturing, disaster preparedness (tsunami)
• Imaging
  – Seismic & medical
• Finance & business
  – Banks, brokerage houses (regression analysis, risk, options pricing, what-if, …)
  – Wal-Mart's use of HPC in its operations
• Molecular modeling
  – Biotech and pharmaceuticals
Complex problems, large datasets, long runs
This slide is from the Intel presentation "Technologies for Delivering Peak Performance on HPC and Grid Applications"

HPC Drives Knowledge Economy

Life Science Problem: Protein Folding Example
• A molecular dynamics simulation of a protein folding problem takes a computing year in serial mode
• Petaflop = a thousand trillion (10^15) floating point operations per second
Excerpted from IBM David Klepacki's "The Future of HPC"

Disaster Preparedness: Example
• Project LEAD
  – Severe weather (tornado) prediction
  – Led by OU
  – HPC and dynamic adaptation to weather forecasts
• Professor Seidel, LSU CCT
  – Hurricane route prediction
  – Emergency preparedness
  – Accuracy of prediction: 1 square mile = $1M

HPC Accelerates a Product
• FE analysis on 1 CPU
  – 1,000,000 elements
  – Numerical processing for 1 element = 0.1 sec
  – One computer will take 1,000,000 × 0.1 = 100,000 secs ≈ 27.8 hrs
• Say 100 CPUs (assuming perfect speedup)
  – 100,000 / 100 = 1,000 secs ≈ 0.28 hr ≈ 17 mins

Avian Flu Pandemic Modeled on a Supercomputer
• MIDAS (Models of Infectious Disease Agent Study) program
• The large-scale, stochastic simulation model examines the nationwide spread of a pandemic influenza virus strain
• A simulation starts with 2 passengers infected with avian flu arriving at LAX
• The simulation rolls out a city-by-city and census-tract-level picture of the spread of infection
• A synthetic population of 281 million people over the course of 180 days
• It is a very large-scale, complex multivariate simulation

Avian Flu Pandemic (90 days)
Timothy C. Germann, Kai Kadau, Catherine A. Macken (Los Alamos National Laboratory); Ira M. Longini Jr. (Emory University)
Source: www.lanl.gov

Avian Flu Pandemic (II)
• The results show that advance preparation of a modestly effective vaccine in large quantities appears to be preferable to waiting for the development of a well-matched vaccine that may arrive too late.
• The simulation models a synthetic population that matches U.S. census demographics and worker mobility data by randomly assigning the simulated individuals to households, workplaces, schools, and the like.
• The models serve as virtual laboratories to study how infectious diseases spread and which intervention strategies are more effective.
• Run on the Los Alamos supercomputer known as Pink, a 1,024-node (2,048-processor) LinuxBIOS/BProc cluster with 2 GB per node.
Source: www.lanl.gov

Significant Indicators: Why HPC Now?
• Mainstream computers with multiple cores (Intel or AMD)
  – In the past 1-2 years, CPU speed has flattened at 3+ GHz
  – More CPUs in one chip: dual-core, multi-core chips
  – Traditional software won't take advantage of these new processors
  – Personal/desktop supercomputing
• Many real problems are highly computationally intensive
  – NSA uses supercomputing for data mining
  – DOE: fusion, plasma, energy-related research (including weapons)
  – Helps solve problems in many other important areas (nanotech, life science, etc.)
  – Product design, ERM/inventory management
• Major figures have recently spoken out for HPC
  – Bush's State of the Union speech: supercomputing is one of 3 main S&T focus areas
  – Bill Gates' keynote speech at SC05: MS goes after HPC
• Google search engine: 100,000 nodes
• PlayStation 3 is a personal supercomputing platform
• Hollywood (entertainment) is HPC-bound (Pixar uses more than 3,000 CPUs to render animation)

HPC Preparedness
• Build a workforce that understands the HPC paradigm and its applications
  – HPC/Grid curriculum in IT/CS/CE/ICT
  – Offer HPC-enabling tracks to other disciplines (engineering, life science, physics, computational chemistry, business, etc.)
  – Train the business community
  – Raise public awareness
• National and strategic policies
• Improve infrastructure

Pause Here
• Switch to a tour of the machine rooms
  – Clusters and our lab, to show what students will be using
• Collect students' info on the signup sheet for accounts on our clusters (azul, quadcore, GPU and PS3)
• Intro to Linux
• Then continue with HPC 101

HPC 101

How to Run Applications Faster?
• There are 3 ways to improve performance:
  – Work harder
  – Work smarter
  – Get more help
• Computer analogy
  – Use faster hardware
  – Use optimized algorithms and techniques to solve computational tasks
  – Use multiple computers to solve a particular task

Parallel Programming Concepts
[Diagram: Problem → four Tasks → instructions → four CPUs]
Source: Thomas Sterling's intro to HPC

HPC Objective
• High Performance Computing
  – Parallel computing, supercomputing
  – Achieve the fastest possible computing outcome
  – Subdivide a very large job into many pieces
  – Enabled by multiple high-speed CPUs, networking, software and programming paradigms
  – Fastest possible solution
  – Technologies that help solve non-trivial tasks in science, engineering, medicine, business, entertainment, etc.

Flynn's Taxonomy of Computer Architectures
• SISD: Single Instruction/Single Data
• SIMD: Single Instruction/Multiple Data
• MISD: Multiple Instruction/Single Data
• MIMD: Multiple Instruction/Multiple Data

Single Instruction/Single Data
[Figure: SISD diagram; PU = Processing Unit]
• Your desktop, before the spread of dual-core CPUs
Slide source: Wikipedia, Flynn's Taxonomy

Flavors of SISD
[Figure]

More on Pipelining
[Figure]

Single Instruction/Multiple Data
• Processors that execute the same instruction on multiple pieces of data, e.g., NVIDIA GPUs
Slide source: Wikipedia, Flynn's Taxonomy

Single Instruction/Multiple Data (cont.)
• Each core runs the same set of instructions on different data
• Example: GPGPU processes the pixels of an image in parallel
Slide source: Klimovitski & Macri, Intel

SISD versus SIMD
• Writing a compiler for SIMD architectures is VERY difficult (inter-thread communication complicates the picture…)
Slide source: Ars Technica, PeakStream article

Multiple Instruction/Single Data
• Pipeline, e.g., the CMU Warp machine
Slide source: Wikipedia, Flynn's Taxonomy

Multiple Instruction/Multiple Data
• E.g., multicore systems are based on a MIMD architecture, programmed with paradigms such as OpenMP and multithreading
Slide source: Wikipedia, Flynn's Taxonomy

Multiple Instruction/Multiple Data (cont.)
• The sky is the limit: each PU is free to do as it pleases
• Can be either shared memory or distributed memory

Current HPC Hardware
• Traditionally, HPC has adopted expensive parallel hardware:
  – Massively Parallel Processors (MPP)
  – Symmetric Multi-Processors (SMP)
• Cluster computers
• Recent trends in HPC:
  – Multicore systems
  – Heterogeneous computing with accelerator boards (GPGPU, FPGA)

HPC Cluster
• Log in
• Compile
• Submit job
• At least 2 connections
• Run tasks

Parallel Programming Env
• Parallel programming environments and tools
  – Threads (PCs, SMPs, NOW..)
    • POSIX Threads
    • Java Threads
  – MPI
    • Linux, NT, on many supercomputers
  – OpenMP (predominantly
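The slides list threads, MPI and OpenMP without showing code. As a minimal shared-memory sketch, assuming Python's standard `threading` module as a stand-in for POSIX or Java threads (CPython creates native OS threads, which are POSIX threads on Linux), the hypothetical function `threaded_histogram` has each thread count its own slice of the input privately and then merge into one shared histogram under a lock. All threads read and write the same memory, which is the shared-memory MIMD case; a distributed-memory program (e.g., with MPI) would instead exchange the partial counts as messages.

```python
import threading


def threaded_histogram(data, num_threads=4, bins=10):
    """Hypothetical shared-memory example: each thread bins its own slice of
    `data` (value % bins) and then adds its counts into one shared histogram."""
    histogram = [0] * bins          # shared state, visible to all threads
    lock = threading.Lock()         # protects `histogram` during the merge

    def worker(lo, hi):
        local = [0] * bins          # private per-thread counts, no locking needed
        for x in data[lo:hi]:
            local[x % bins] += 1
        with lock:                  # merge once per thread to keep the critical section short
            for b in range(bins):
                histogram[b] += local[b]

    step = (len(data) + num_threads - 1) // num_threads  # ceiling division
    threads = [
        threading.Thread(target=worker, args=(t * step, (t + 1) * step))
        for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                    # wait for every thread before reading the result
    return histogram


if __name__ == "__main__":
    print(threaded_histogram(list(range(25)), num_threads=3, bins=5))  # [5, 5, 5, 5, 5]
```

Keeping per-thread counts private and locking only for the final merge limits contention; as with any CPython threads, the GIL limits speedup for pure-Python CPU-bound work, so the sketch illustrates the programming model rather than performance.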
