High Performance Computing

Course #: CSI 440/540, High Performance Scientific Computing I, Fall '09

Mark R. Gilder, Email: [email protected]

This course investigates the latest trends in high-performance computing (HPC) evolution and examines key issues in developing algorithms capable of exploiting these architectures.

Grading: Your grade in the course will be based on completion of assignments (40%), a course project (35%), a class presentation (15%), and class participation (10%).

Course Goals
• Understanding of the latest trends in HPC architecture evolution
• Appreciation for the complexities of efficiently mapping algorithms onto HPC architectures
• Familiarity with various program transformations for improving performance
• Hands-on experience in the design and implementation of algorithms for both shared- and distributed-memory parallel architectures using Pthreads, OpenMP, and MPI
• Experience in evaluating the performance of parallel programs

Grades

 40% Homework assignments  35% Final project  15% Class presentations  10% Class participation

Homework

• Usually weekly, with some exceptions
• Must be turned in on time; no late homework assignments will be accepted
• All work must be your own; cheating will not be tolerated
• All references must be cited
• Assignments may consist of problems, programming, or a combination of both

Homework (continued)

• Detailed discussion of your results is expected; the program is only a small part of the problem

• Homework assignments will be posted on the class website along with all of the lecture notes:
◦ http://www.cs.albany.edu/~gilder

Project

• Topic of general interest to the course
• Read three or four papers to get some ideas of the latest research in the HPC field
• Identify an application that can be implemented on our RIT cluster
• Implement it and write a final report describing your application and results
• Present your results in class

Other Remarks

• Would like the course to be very interactive

• Willing to accept suggestions for changes in content and/or form

Material

• Book(s) optional
◦ Introduction to Parallel Computing, 2nd Edition, by Grama et al. ISBN: 0-201-64865-2

◦ The Sourcebook of Parallel Computing (The Morgan Kaufmann Series) by J. Dongarra, I. Foster, G. Fox, et al. ISBN: 1558608710

◦ Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers by B. Wilkinson and M. Allen ISBN: 0136717101

Material

• Lecture notes will be provided online, either before or just after class

• Other reading material may be assigned

• Course website: http://www.cs.albany.edu/~gilder/

Course Overview

• Learning about:
◦ High-Performance Computing (HPC)
◦ Parallel Computing
◦ Performance Analysis
◦ Computational Techniques
◦ Tools to aid in parallel programming
◦ Developing programs using MPI, Pthreads, maybe OpenMP, maybe CUDA

What You Should Learn

• In-depth understanding of:
◦ When parallel computing is useful
◦ Parallel computing options
◦ Programming models (overview)
◦ Performance analysis and tuning

Background

• Strong C programming experience

• Understanding of operating systems

• Some background in numerical computing

Computer Accounts

• For most of the class we will be using the RIT computer cluster. See the following link for more info about the hardware: http://www.rit.albany.edu/wiki/IBM_pSeries_Cluster

• Accounts will be made available by the second week of class

Homework #1

Implement a version of each of the following operations:

1) Matrix-vector multiplication:

$$c_i = \sum_{j=1}^{n} A_{i,j}\, x_j, \qquad i = 1, \ldots, m$$

2) Matrix multiplication:

$$C_{i,j} = \sum_{k=1}^{n} A_{i,k}\, B_{k,j}, \qquad i, j = 1, \ldots, n$$

The point of this assignment is not to focus on writing software, but rather to look at the performance of each of your implementations and try to explain the observed behavior. You should run several experiments on various systems and provide an analysis of your results, including plots of your data for various values of n between, say, 10 and 5000. Make sure you provide a write-up along with your plots, and be sure to demonstrate that your implementation also generates correct results. Information on various processors may be found at:
http://www.cpu-world.com/CPUs/index.html
http://www.cpu-world.com/sspec/index.html
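As a rough starting point, here is a minimal C sketch of both kernels with simple clock()-based timing. The size n, fill values, and timing approach are illustrative assumptions, not requirements of the assignment:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* c = A * x, where A is m x n (row-major) and x has n entries. */
static void matvec(int m, int n, const double *A, const double *x, double *c)
{
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[(size_t)i * n + j] * x[j];
        c[i] = sum;
    }
}

/* C = A * B, all n x n (row-major), naive triple loop. */
static void matmul(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[(size_t)i * n + k] * B[(size_t)k * n + j];
            C[(size_t)i * n + j] = sum;
        }
}

int main(void)
{
    int n = 1000;  /* one sample size; the assignment asks for a sweep, e.g. 10..5000 */
    double *A = malloc((size_t)n * n * sizeof *A);
    double *B = malloc((size_t)n * n * sizeof *B);
    double *C = malloc((size_t)n * n * sizeof *C);
    double *x = malloc(n * sizeof *x);
    double *c = malloc(n * sizeof *c);
    if (!A || !B || !C || !x || !c) return 1;

    for (size_t i = 0; i < (size_t)n * n; i++) { A[i] = 1.0; B[i] = 0.5; }
    for (int i = 0; i < n; i++) x[i] = 2.0;

    /* For small n, wrap each kernel in a repetition loop so the time is measurable. */
    clock_t t0 = clock();
    matvec(n, n, A, x, c);
    double tv = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    matmul(n, A, B, C);
    double tm = (double)(clock() - t0) / CLOCKS_PER_SEC;

    /* Each sum term costs one multiply + one add: 2*n^2 and 2*n^3 flops. */
    printf("matvec n=%d: %.4f s, %.1f Mflop/s (c[0]=%g, expect %g)\n",
           n, tv, 2.0 * n * n / tv / 1e6, c[0], 2.0 * n);
    printf("matmul n=%d: %.4f s, %.1f Mflop/s (C[0]=%g, expect %g)\n",
           n, tm, 2.0 * n * n * n / tm / 1e6, C[0], 0.5 * n);

    free(A); free(B); free(C); free(x); free(c);
    return 0;
}

Compile with, e.g., gcc -O2 -std=c99; the printed "expect" values give a cheap correctness check for the chosen fill values, and the Mflop/s figures are what you would plot against n.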

Lecture 1 Outline:
◦ HPC Introduction
◦ Motivation
◦ General Computing Trends

Units of High Performance Computing

Term       Expanded        Actual Performance
1 Kflop/s  1 Kiloflop/s    10^3 Flop/sec
1 Mflop/s  1 Megaflop/s    10^6 Flop/sec
1 Gflop/s  1 Gigaflop/s    10^9 Flop/sec
1 Tflop/s  1 Teraflop/s    10^12 Flop/sec
1 Pflop/s  1 Petaflop/s    10^15 Flop/sec

Data
1 KB   1 Kilobyte   10^3 Bytes
1 MB   1 Megabyte   10^6 Bytes
1 GB   1 Gigabyte   10^9 Bytes
1 TB   1 Terabyte   10^12 Bytes
1 PB   1 Petabyte   10^15 Bytes

HPC Highlights

High Performance Computing: an overloaded term, but one that typically refers to the use of computer clusters and/or custom supercomputers to solve large-scale scientific computations. It essentially covers computers designed for large computations and/or data-intensive tasks. These systems rely on parallel processing to increase algorithm performance (speed-up).

Example applications include:
• Computational Fluid Dynamics (CFD)
• Large-scale modeling / simulation
• Bioinformatics
• Molecular dynamics
• Financial

HPC Highlights

HPC Attributes:
• Multiple processors: 10s, 100s, 1000s
• High-speed interconnect network, e.g., InfiniBand, GigE, etc.
• Clusters typically built from COTS / commodity components
• Supercomputers built from a mix of commodity and custom components
• Performance typically in the Teraflop range (10^12 floating-point operations / sec)

HPC Data Example

• Let's say you can print:
◦ 5 columns of 100 numbers each, on both sides of the page = 1,000 numbers (1 Kflop) in one second (1 Kflop/s)

• 10^6 numbers (1 Mflop) = 1,000 pages (about 10 cm)
◦ 2 reams of paper / second
◦ 1 Mflop/s

• 10^9 numbers (1 Gflop) = 10,000 cm = a 100 m stack
◦ the height of the Statue of Liberty, printed per second: 1 Gflop/s


• 10^12 numbers (1 Tflop) = a 100 km stack; the altitude achieved by SpaceShipOne, printed per second: 1 Tflop/s


• 10^15 numbers (1 Pflop) = a 100,000 km stack printed per second: 1 Pflop/s
◦ about 1/4 the distance to the moon


• 10^16 numbers (10 Pflop) = a 1,000,000 km stack printed per second: 10 Pflop/s
◦ the distance to the moon and back, and then a bit
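All of the stack heights above follow from two constants in the setup: 10^3 numbers per sheet and roughly 0.1 mm per sheet (1,000 pages ≈ 10 cm):

$$h(N) = \frac{N}{10^{3}} \times 0.1\ \text{mm}; \qquad \text{e.g. } h(10^{12}) = 10^{9} \times 0.1\ \text{mm} = 100\ \text{km}.$$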

High Performance Computing Today

• In the past decade, the world has experienced one of the most exciting periods in computer development

• Microprocessors have become smaller, denser, and more powerful

• The result is that microprocessor-based supercomputing is rapidly becoming the technology of choice for attacking some of the most important problems in science and engineering

Lecture 1 Outline:
◦ HPC Introduction
◦ Motivation
◦ General Computing Trends

HPC Motivation

Currently:
• Increases in performance have been accomplished by increases in clock speed

• Power and heat dissipation limits have clock frequencies stagnating

• Legacy software investments are at risk

HPC Motivation

• Legacy software is based on a single thread of execution

• We need to start thinking in parallel

• The lack of compiler/language tools means more painful and costly software development cycles

HPC Motivation

• Parallel or concurrent designs lead to multicore
• However, multicore may not be enough for some problems
• Heterogeneous systems to the rescue: these consist of a collection of processors designed for specific problems, tied together
• Heterogeneous multicore: the same thing on a chip

Result: a diversity of computing architectures with limited tools for exploiting their capabilities.

HPC Motivation

In the past, parallel computing efforts have shown great promise; however, in the end, uniprocessor computing has always won.

Why? We could just wait 12-18 months and the same algorithms would perform twice as fast! ... UNTIL NOW!

This shift towards increasing parallelism is not due to any new breakthroughs in software and/or parallel architectures but instead due to greater challenges in further improving uniprocessor architectures.

General Purpose Computing is taking an irreversible step toward parallel architectures.

Lecture 1 Outline:
◦ HPC Introduction
◦ Motivation
◦ General Computing Trends

Transistor Growth – Moore’s Law

[Figure: transistors per chip vs. year, 1970-2010, on a log scale from about 10^3 to 10^9: 8008, 8080, 8086, 286, Pentium, Pentium II, Pentium III, Itanium, Itanium 2-9M, Montecito]

• Moore’s law states that transistor densities will double every 2 years. Traditionally this has held and, in fact, the cost per function has dropped, on average, 27% per year.
• Many other components of computer systems have followed similar exponential growth curves, including data storage capacity and transmission speeds. It really has been unprecedented technological and economic growth.
• Smaller transistors have lots of nice effects, including the ability to switch faster. This, combined with other component speedups, causes many to relate Moore’s law to a 2x improvement in performance every two years.

Moore’s “Law”

Gordon Moore (co-founder of Intel), Electronics Magazine, 1965: the number of devices per chip doubles every 18 months.

• Moore’s Law is an exponential
◦ Exponentials cannot last forever
◦ However, Moore’s Law has held remarkably true for almost 30 years!
◦ Note: not really a law, but rather an observed / empirical result

Transistor Growth – Moore’s Law

Fun Facts
• The original transistor built by Bell Labs in 1947 could be held in your hand, while hundreds of Intel’s new 45nm transistors can fit on the surface of a single red blood cell.
• Intel’s first processor, the 4004, debuted in 1971 and consisted of 2,300 transistors. Compare that to the 800 million transistors found in today’s Intel quad-core processors (an increase by a factor of 350,000).
• The price of a transistor in one of Intel’s forthcoming next-generation processors, codenamed Penryn, will be about one millionth the average price of a transistor in 1968. If car prices had fallen at the same rate, a new car today would cost about 1 cent.
• You could fit more than 2,000 45nm transistors across the width of a human hair.
• A 45nm transistor can switch on and off approximately 300 billion times a second. A beam of light travels less than a tenth of an inch during the time it takes a 45nm transistor to switch on and off.
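Taking "today" as 2008 and the fun-fact numbers at face value, the 4004-to-quad-core growth is consistent with the two-year doubling period quoted above:

$$\frac{8 \times 10^{8}}{2300} \approx 3.5 \times 10^{5} \approx 2^{18.4}, \qquad \frac{2008 - 1971}{18.4} \approx 2\ \text{years per doubling}.$$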

Where Has The Speed Come From?

Processor architecture improvements

[Timeline figure, 1988-2006, processors from the 486 and Pentium through POWER3, AMD K7, and Athlon 64 X2: after bipolar technology hit its thermal limit, performance increases came from caches and on-board FPUs, pipelining, superscalar architectures, branch prediction, out-of-order execution, MMX, SSE, SSE2, SSE3, HyperThreading, and dual core]

Instruction Level Parallelism (ILP)

[Figure: relative performance per cycle vs. year, 1985-2005]

The figure shows how effective real Intel processors have been at extracting instruction parallelism over time. There is a flat region before instruction-level parallelism was pursued intensely, then a steep rise as parallelism was utilized usefully, followed by a tapering off in recent years as the available parallelism has become fully exploited.

Today’s Processors

• Equivalences for today’s microprocessors:
◦ Voltage level: a flashlight (~1 volt)
◦ Current level: an oven (~250 amps)
◦ Power level: a light bulb (~100 watts)
◦ Area: a postage stamp (~1 square inch)

Power Density

Power Densities

Processor Voltages

[Figure: processor core voltage (Vdd) vs. year, 1970-2020; data courtesy Intel]

Core voltage has been reduced from 18 V in 1970 to 1.2 V today. Currently, a conventional silicon-based transistor requires a minimum voltage of 0.7 V to perform a transition.

Power

$$P = A C V^{2} f + \tau A V I_{short} + V I_{leak}$$

The three terms:
1. Dynamic power consumption: charging and discharging of the capacitive load on each gate’s output. It is proportional to the frequency of the system’s operation, f, the activity of the gates in the system, A, the total capacitance seen by the gates’ outputs, C, and the square of the supply voltage, V.
2. Short-circuit power: the short-circuit current, I_short, which momentarily flows (for a time τ) between the supply voltage and ground when a logic gate’s output switches.
3. Power loss due to the leakage current, I_leak.

In today’s circuits the first term dominates; therefore, reducing the supply voltage is the most effective way to reduce power consumption. The savings can be significant: halving the voltage reduces the power consumption to one-fourth its original value.
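As a quick check of that claim, keep only the dominant dynamic term:

$$P_{dyn} = A C V^{2} f \quad\Longrightarrow\quad A C \left(\frac{V}{2}\right)^{2} f = \frac{1}{4}\, A C V^{2} f = \frac{P_{dyn}}{4}.$$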

CPU Clock Speed Stagnation

[Figure: clock speed development of AMD and Intel processors from 1993 until the end of 2005.] Between 1993 and 1999, the average clock speed increased tenfold. Then stagnation set in; over the past four years, frequencies haven't even doubled.

$$f_{max} \propto \frac{(V - V_{threshold})^{2}}{V}$$

Unfortunately, the maximum frequency of operation is roughly linear in V. Reducing V limits the circuit to a lower frequency: reducing the power to one-fourth its original value only halves the maximum frequency. These two equations have an important corollary:

Parallel processing, which involves splitting a computation in two and running it as two parallel independent tasks, has the potential to cut the power in half without slowing the computation.
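Spelling that corollary out: cutting a core’s power to one-fourth halves its frequency, so two such cores together deliver the original throughput at half the original power:

$$P_{core} = \frac{P}{4} \;\Rightarrow\; f_{core} \approx \frac{f}{2}; \qquad P_{total} = 2 \cdot \frac{P}{4} = \frac{P}{2}, \qquad \text{throughput} = 2 \cdot \frac{f}{2} = f.$$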

Power Cost of Frequency

• Power ≈ Voltage² × Frequency (V²F)
• Frequency ≈ Voltage
• Power ≈ Frequency³

                    Cores   V       Freq    Perf    Power   PE (perf/power)
Superscalar         1       1       1       1       1       1
“New” Superscalar   1X      1.5X    1.5X    1.5X    3.3X    0.45X
Multicore           2X      0.75X   0.75X   1.5X    0.8X    1.88X
(A larger PE is better.)

• The example illustrates that we can achieve 50% more performance with 20% less power
• The multicore row demonstrates that multiple slower devices can be better than one superfast device
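These entries are just the Power ≈ Frequency³ rule, applied and rounded as on the slide:

$$1.5^{3} = 3.375 \approx 3.3X, \quad \frac{1.5}{3.3} \approx 0.45X; \qquad 2 \times 0.75^{3} = 0.84 \approx 0.8X, \quad \frac{1.5}{0.8} \approx 1.88X.$$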


Power Cost of Frequency (continued)

                                 Cores   V      Freq   Perf   Power    PE (perf/power)
Superscalar                      1       1      1      1      1        1
Multicore (2 cores, per core)    1       0.5X   0.5X   0.5X   0.125X   4 (combined)

• Much better performance efficiency; however, it requires that the software can be parallelized!
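Again the numbers follow from the cubic rule, here for the halved-voltage case:

$$P_{core} = 0.5^{2} \times 0.5 = 0.125X; \qquad P_{total} = 0.25X, \quad Perf_{total} = 2 \times 0.5X = 1X, \quad PE = \frac{1X}{0.25X} = 4.$$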

4 GHz Barrier

• Can’t make reasonable computers out of current technology past 4 GHz

• Cooling the devices becomes very difficult

• We just can’t get the heat out of the cores efficiently

• Power density on today’s chips is staggering

Summary

• Power is a primary design constraint
• Clock frequencies are stagnating
• ILP has reached a roadblock
