High Performance Computing

Course #: CSI 441/541 High Perf Sci Comp II Spring ‘10

Mark R. Gilder Email: [email protected] [email protected] CSI 441/541

This course investigates the latest trends in high-performance computing (HPC) evolution and examines key issues in developing algorithms capable of exploiting these architectures.

Grading: Your grade in the course will be based on completion of assignments (40%), a course project (35%), a class presentation (15%), and class participation (10%).

Course Goals
 Understanding of the latest trends in HPC architecture evolution
 Appreciation for the complexities in efficiently mapping algorithms onto HPC architectures
 Hands-on experience in design and implementation of algorithms for both shared & distributed memory parallel architectures using MPI
 Experience in evaluating performance of parallel programs

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 2 Grades

 40% Homework assignments  35% Final project  15% Class presentations  10% Class participation

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 3 Homework

 Usually weekly, with some exceptions
 Must be turned in on time – no late homework assignments will be accepted
 All work must be your own – cheating will not be tolerated
 All references must be cited
 Assignments may consist of problems, programming, or a combination of both

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 4 Homework (continued)

 Detailed discussion of your results is expected ◦ the actual program is only a small part of the problem

 Homework assignments will be posted on the class website along with all of the lecture notes ◦ http://www.cs.albany.edu/~gilder

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 5 Project

 Topic of general interest to the course
 Read three or four papers to get some ideas of the latest research in the HPC field
 Identify an application that can be implemented on our RIT cluster
 Implement it and write a final report describing your application and results
 Present your results in class

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 6 Other Remarks

 Would like the course to be very interactive

 Willing to accept suggestions for changes in content and/or form

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 7 Material

 Book(s) optional ◦ Introduction to Parallel Computing, 2nd Edition, by Grama et al. ISBN: 0-201-64865-2

◦ The Sourcebook of Parallel Computing (The Morgan Kaufmann Series) by J. Dongarra, I. Foster, G. Fox et al. ISBN: 1558608710

◦ Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers by B. Wilkinson and M. Allen ISBN: 0136717101

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 8 Material

 Lecture notes will be provided online either before or just after class

 Other reading material may be assigned

 Course website: http://www.cs.albany.edu/~gilder/

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 9 Course Overview

 Learning about: ◦ High-Performance Computing (HPC) ◦ Parallel Computing ◦ Performance Analysis ◦ Computational Techniques ◦ Tools to aid in parallel programming ◦ Developing programs using MPI, Pthreads, maybe OpenMP, maybe CUDA
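
To give a concrete feel for the MPI programming we will use on the cluster, here is a minimal sketch of an MPI "hello world" in C (nothing course-specific is assumed; on most installations it would be built with mpicc and launched with mpirun, though the exact commands depend on the local setup):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime           */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id (0..size-1)   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes       */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();                        /* shut down the MPI runtime       */
    return 0;
}

Every process runs the same program; the rank is what distinguishes the work each one does.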

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 10 What You Should Learn

 In depth understanding of: ◦ When parallel computing is useful ◦ Understanding of parallel computing options ◦ Overview of programming models ◦ Performance analysis and tuning

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 11 Background

 Strong C-programming experience

 Knowledge of parallel programming

 Some background in numerical computing

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 12 Computer Accounts

 For most of the class we will be using the RIT computer cluster. See the following link for more info about the hardware: http://www.rit.albany.edu/wiki/IBM_pSeries_Cluster

 Accounts will be made available by the second week of class

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 13 Lecture 1 Outline: ◦ General Computing Trends ◦ HPC Introduction ◦ Why Parallel Computing

◦ Note: some portions of the following slides were taken from Jack Dongarra w/permission

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 15 Units of High Performance Computing

Term       Expanded        Actual Performance
1 Kflop/s  1 Kiloflop/s    10^3 Flop/sec
1 Mflop/s  1 Megaflop/s    10^6 Flop/sec
1 Gflop/s  1 Gigaflop/s    10^9 Flop/sec
1 Tflop/s  1 Teraflop/s    10^12 Flop/sec
1 Pflop/s  1 Petaflop/s    10^15 Flop/sec

Data
1 KB   1 Kilobyte   10^3 Bytes
1 MB   1 Megabyte   10^6 Bytes
1 GB   1 Gigabyte   10^9 Bytes
1 TB   1 Terabyte   10^12 Bytes
1 PB   1 Petabyte   10^15 Bytes

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 16 HPC Highlights High Performance Computing: Overloaded term but typically refers to the use of computer clusters and/or custom supercomputers to solve large-scale scientific computations. Essentially covers computers which are designed for large computations and/or data intensive tasks. These systems rely on parallel processing to increase algorithm performance (speed-up).

Example Applications Include:  Computational Fluid Dynamics (CFDs)  Large Scale Modeling / Simulation  Bioinformatics  Molecular Dynamics  Financial

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 17 HPC Highlights

HPC Attributes:
 Multiple processors – 10's, 100's, 1000's
 High-speed interconnect network, e.g., InfiniBand, GigE, etc.
 Clusters typically built from COTS / commodity components
 Supercomputers built from a mix of both commodity and custom components
 Performance typically in the Teraflop range (10^12 floating point operations / sec)

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 18 HPC Trends

 HPC at an all-time high of $11.6 billion in 2007 http://insidehpc.com/2008/02/21/idc-releases-preliminary-hpc-growth-figures/
  and Linux are the current winners
 Clustering continues to drive change
◦ Price/performance reset
◦ Dramatic increase in units
◦ Growth in standardization of CPU, OS, interconnect
 Grid technologies will become pervasive
 New challenges for datacenters:
◦ Power, cooling, system management, system consolidation, virtualization?
 Storage and data management are growing in importance

Software will be the #1 roadblock – and hence will provide major opportunities at all levels. Source: IDC Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 19 HPC Data Example

 Let's say you can print ◦ 5 columns of 100 numbers each, on both sides of the page = 1,000 numbers (1 Kflop) in one second (1 Kflop/s)

 10^6 numbers (1 Mflop) = 1,000 pages (about 10 cm) ◦ 2 reams of paper / second ◦ 1 Mflop/s

 10^9 numbers (1 Gflop) = 10,000 cm = 100 m stack ◦ height of the Statue of Liberty (printed / second) ◦ 1 Gflop/s

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 20 HPC Data Example
 Let's say you can print ◦ 5 columns of 100 numbers each, on both sides of the page = 1,000 numbers (1 Kflop) in one second (1 Kflop/s)
 10^6 numbers (1 Mflop) = 1,000 pages (about 10 cm) ◦ 2 reams of paper / second ◦ 1 Mflop/s
 10^9 numbers (1 Gflop) = 10,000 cm = 100 m stack ◦ height of the Statue of Liberty (printed / second) ◦ 1 Gflop/s

 10^12 numbers (1 Tflop) = 100 km stack; the altitude achieved by SpaceShipOne ◦ 1 Tflop/s

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 21 HPC Data Example
 Let's say you can print ◦ 5 columns of 100 numbers each, on both sides of the page = 1,000 numbers (1 Kflop) in one second (1 Kflop/s)
◦ 10^6 numbers (1 Mflop) = 1,000 pages (about 10 cm); 2 reams paper / sec
◦ 10^9 numbers (1 Gflop) = 10,000 cm = 100 m stack; height of the Statue of Liberty per sec
◦ 10^12 numbers (1 Tflop) = 100 km stack; SpaceShipOne's distance to space per sec

 10^15 numbers (1 Pflop) = 100,000 km stack printed per second; 1 Pflop/s ◦ about ¼ of the distance to the moon

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 22 HPC Data Example
 Let's say you can print ◦ 5 columns of 100 numbers each, on both sides of the page = 1,000 numbers (1 Kflop) in one second (1 Kflop/s)
◦ 10^6 numbers (1 Mflop) = 1,000 pages (about 10 cm); 2 reams paper / sec
◦ 10^9 numbers (1 Gflop) = 10,000 cm = 100 m stack; height of the Statue of Liberty per sec
◦ 10^12 numbers (1 Tflop) = 100 km stack; SpaceShipOne's distance to space per sec
◦ 10^15 numbers (1 Pflop) = 100,000 km stack printed per sec

 10^16 numbers (10 Pflop) = 1,000,000 km stack printed per second; 10 Pflop/s ◦ the distance to the moon and back and then a bit
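
The arithmetic behind these slides is easy to reproduce. Below is a small illustrative C program that uses only the slide's assumptions (1,000 numbers per page, and 1,000 pages ≈ 10 cm of paper) to convert a printing rate in numbers/second into pages and stack height per second:

#include <stdio.h>

/* Slide's assumptions: 1,000 numbers fit on one page (5 columns x 100
 * numbers, both sides), and 1,000 pages stack about 10 cm high,
 * i.e. 1e-4 m of paper per page. */
int main(void)
{
    const double numbers_per_page = 1000.0;
    const double metres_per_page  = 0.10 / 1000.0;   /* 10 cm per 1000 pages */
    const double rates[] = { 1e6, 1e9, 1e12, 1e15 }; /* Mflop/s ... Pflop/s  */

    for (int i = 0; i < 4; i++) {
        double pages  = rates[i] / numbers_per_page;
        double height = pages * metres_per_page;
        printf("%8.0e numbers/s -> %14.0f pages/s, stack %12.0f m/s\n",
               rates[i], pages, height);
    }
    return 0;
}

Running it reproduces the slides' figures: 10 cm/s at 1 Mflop/s, 100 m/s at 1 Gflop/s, 100 km/s at 1 Tflop/s, and 100,000 km/s at 1 Pflop/s.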

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 23 High Performance Computing Today

 In the past decade, the world has experienced one of the most exciting periods in computer development

 Microprocessors have become smaller, denser, and more powerful

 The result is that microprocessor-based supercomputing is rapidly becoming the technology of preference in attacking some of the most important problems of science and engineering

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 24 Transistor Growth – Moore's Law

[Chart: transistors per chip vs. year, 1970-2010, log scale from 1.E+03 to 1.E+10 – 8008, 8080, 8086, 286, Pentium, Pentium II, Pentium III, Itanium, Itanium 2-9M, Montecito]

 Moore's law states that transistor densities will double every 2 years. Traditionally, this has held and, in fact, the cost per function has dropped, on average, 27% per year.
 Many other components of computer systems have followed similar exponential growth curves, including data storage capacity and transmission speeds. It really has been unprecedented technological and economic growth.
 Smaller transistors have lots of nice effects, including the ability to switch transistors faster. This, combined with other component speedups, causes many to relate Moore's law to a 2x improvement in performance every two years.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 25 Moore’s “Law”

 Moore's Law is an exponential
◦ Exponentials cannot last forever
◦ However, Moore's Law has held remarkably true for almost 30 years!
◦ Note: not really a law, rather an observed / empirical result
Gordon Moore (co-founder of Intel), Electronics Magazine, 1965: the number of devices per chip doubles every 18 months

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 26 Transistor Growth – Moore’s Law

Fun Facts
 The original transistor built by Bell Labs in 1947 could be held in your hand, while hundreds of Intel's new 45nm transistors can fit on the surface of a single red blood cell.
 Intel's first processor, the 4004, debuted in 1971 and consisted of 2,300 transistors. Compare that to the 800 million transistors found in today's Intel Quad Core (an increase by a factor of roughly 350,000).
 The price of a transistor in one of Intel's forthcoming next-generation processors – codenamed Penryn – will be about 1 millionth the average price of a transistor in 1968. If car prices had fallen at the same rate, a new car today would cost about 1 cent.
 You could fit more than 2,000 45nm transistors across the width of a human hair.
 A 45nm transistor can switch on and off approximately 300 billion times a second. A beam of light travels less than a tenth of an inch during the time it takes a 45nm transistor to switch on and off.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 27 Where Has The Speed Come From?

Processor architecture improvements

[Timeline, 1988-2006: bipolar designs hit their thermal limit around 1988; performance then increases through caches and on-board FPUs, pipelining, superscalar architectures, branch prediction, out-of-order execution, MMX, SSE, SSE2, SSE3, HyperThreading, and dual-core x86. Representative processors: 486, Pentium, POWER3, AMD K7, Athlon 64 X2.]

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 28 Instruction Level Parallelism (ILP)

[Chart: relative performance per cycle vs. year, 1985-2005, for Intel processors]

The figure shows how effective real Intel processors have been at extracting instruction parallelism over time. There is a flat region before instruction-level parallelism was pursued intensely, then a steep rise as parallelism was utilized usefully, followed by a tapering off in recent years as the available parallelism has become fully exploited.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 29 Today’s Processors

 Equivalences for today’s microprocessors ◦ Voltage Level  A flashlight (~1 volt) ◦ Current Level  An Oven (~250 amps) ◦ Power Level  A light bulb (~100 watts) ◦ Area  A postage stamp (~ 1 square inch)

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 30 Power Density

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 31 Power Densities

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 32 Processor Voltages

[Chart: processor supply voltage (Vdd) vs. year, 1970-2020, falling from roughly 20 V toward about 1 V. Data courtesy Intel]

Core voltage has been reduced from about 18 V in 1970 to roughly 1.2 V today. Currently, a conventional silicon-based transistor still requires a minimum voltage of about 0.7 V to perform one transition.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 33 Power

P ≈ ACV^2f + τAVIshort + VIleak

1. Dynamic power consumption – charging and discharging of the capacitive load on each gate's output. It is proportional to the frequency of the system's operation, f, the activity of the gates in the system, A, the total capacitance seen by the gates' outputs, C, and the square of the supply voltage, V.
2. Short-circuit power – the short-circuit current, Ishort, which momentarily flows (for a time τ) between the supply voltage and ground when a logic gate's output switches.
3. Power loss due to the leakage current, Ileak.

In today's circuits the first term dominates; therefore, reducing the supply voltage is the most effective way to reduce power consumption. The savings can be significant, e.g., halving the voltage reduces the power consumption to one-fourth its original value.
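
As a quick numerical illustration of why the V^2 term matters, here is a tiny C sketch that evaluates only the dynamic term ACV^2f; the parameter values are made up for illustration and are not from the slide:

#include <stdio.h>

/* Dynamic power term P_dyn = A*C*V^2*f from the slide; the parameter
 * values below are illustrative only. */
static double p_dyn(double A, double C, double V, double f)
{
    return A * C * V * V * f;
}

int main(void)
{
    double A = 0.2;       /* fraction of gates switching each cycle */
    double C = 1e-9;      /* total switched capacitance, farads     */
    double f = 3e9;       /* clock frequency, Hz                    */

    double p_full = p_dyn(A, C, 1.2, f);   /* V = 1.2 V */
    double p_half = p_dyn(A, C, 0.6, f);   /* V = 0.6 V */
    printf("P_dyn at 1.2 V: %.3f W, at 0.6 V: %.3f W (ratio %.2f)\n",
           p_full, p_half, p_half / p_full);   /* ratio comes out to 0.25 */
    return 0;
}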

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 34 CPU Clock Speed Stagnation

This is a graph of the clock speed development of AMD and Intel processors from 1993 until the end of 2005. Between 1993 and 1999, the average clock speed increased tenfold. Then stagnation set in; over the past four years, frequencies haven't even doubled.

fmax ∝ (V − Vthreshold)^2 / V

Unfortunately, the maximum frequency of operation is roughly linear in V. Reducing V limits the circuit to a lower frequency; reducing the power to one-fourth its original value only halves the maximum frequency. These two equations have an important corollary:

Parallel processing, which involves splitting a computation in two and running it as two parallel independent tasks, has the potential to cut the power in half without slowing the computation: each of the two processors can run at roughly one-fourth of the original power and half of the original frequency, so together they match the original throughput at about half the total power.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 35 Power Cost of Frequency

 Power ≈ Voltage^2 x Frequency (V^2F)  Frequency ≈ Voltage  Power ≈ Frequency^3

                    Cores   V      Freq   Perf   Power   PE (freq/power)
Superscalar         1       1      1      1      1       1
"New" Superscalar   1X      1.5X   1.5X   1.5X   3.3X    0.45X

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 36 Power Cost of Frequency

 Power ≈ Voltage^2 x Frequency (V^2F)  Frequency ≈ Voltage  Power ≈ Frequency^3

                    Cores   V       Freq    Perf   Power   PE (perf/power)
Superscalar         1       1       1       1      1       1
"New" Superscalar   1X      1.5X    1.5X    1.5X   3.3X    0.45X
Multicore           2X      0.75X   0.75X   1.5X   0.8X    1.88X
(Larger # is better)

 This example illustrates that we can achieve 50% more performance with 20% less power  The multicore row demonstrates that multiple slower devices can be better than one super-fast device (a sketch of the arithmetic follows)
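
The table's numbers follow directly from the two rules above: frequency ∝ voltage, so power ∝ frequency^3 per core, while performance ∝ cores × frequency. A small illustrative C sketch that reproduces them:

#include <stdio.h>

/* Reproduce the slide's scaling argument: power ~ frequency^3 per core,
 * performance ~ cores * frequency (all values relative to the baseline). */
static void row(const char *name, double cores, double freq)
{
    double perf  = cores * freq;
    double power = cores * freq * freq * freq;
    printf("%-20s cores=%.0f freq=%.2fx perf=%.2fx power=%.2fx perf/power=%.2fx\n",
           name, cores, freq, perf, power, perf / power);
}

int main(void)
{
    row("Superscalar",         1, 1.00);   /* baseline                    */
    row("\"New\" superscalar", 1, 1.50);   /* ~1.5x perf, ~3.4x power     */
    row("Multicore",           2, 0.75);   /* ~1.5x perf, ~0.8x power     */
    return 0;
}

The output matches the table to within rounding (the slide rounds intermediate values, hence 3.3X and 1.88X rather than 3.38X and 1.78X).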

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 39 Percentage of Peak

 A rule of thumb that often applies ◦ A contemporary RISC processor, for a spectrum of applications, delivers (i.e., sustains) 10% of peak performance

 Why such low efficiency?

 There are two primary reasons behind the disappointing percentage of peak ◦ IPC (in)efficiency ◦ Memory (in)efficiency

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 40 IPC (Instructions / Cycle)

 Today the theoretical IPC (instructions per cycle) is 4 in most contemporary RISC processors (6 in Itanium)

 Detailed analysis for a spectrum of applications indicates that the average IPC is 1.2–1.4

 We are leaving ~75% of the possible performance on the table…

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 41 Why Fast Machines Run Slow

 Latency ◦ Waiting for access to memory or other parts of the system  Overhead ◦ Extra work that has to be done to manage program concurrency and parallel resources for the real work to be performed  Starvation ◦ Not enough work to do due to insufficient parallelism or poor load balancing among distributed resources  Contention ◦ Delays due to fighting over what task gets to use a shared resource next. Network bandwidth is a major constraint.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 42 Latency In A Single System

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 43 Memory Hierarchy

 Typical latencies for today’s data access technologies

Hierarchy         Processor Clocks
Register          1
L1 Cache          2-3
L2 Cache          6-12
L3 Cache          14-40
Near Memory       100-300
Far Memory        300-900
Remote Memory     O(10^3)
Message-Passing   O(10^3)-O(10^4)

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 44 Memory Hierarchy

 Most programs have a high degree of locality in their accesses ◦ Spatial Locality: accessing things nearby previous accesses ◦ Temporal Locality: reusing an item that was previously accessed  Memory hierarchy tries to exploit locality
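
A standard way to see spatial locality and the memory hierarchy at work (a generic demonstration, not specific to this course) is to traverse a large 2D array in row-major versus column-major order. The illustrative C sketch below does just that; on most machines the stride-1 loop is several times faster:

#include <stdio.h>
#include <time.h>

#define N 4096

/* 4096 x 4096 doubles = 128 MB, far larger than any cache. */
static double a[N][N];

static double sum_rows(void)          /* row-major: stride-1, good locality */
{
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

static double sum_cols(void)          /* column-major: stride-N, poor locality */
{
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;

    clock_t t0 = clock();
    double s1 = sum_rows();
    clock_t t1 = clock();
    double s2 = sum_cols();
    clock_t t2 = clock();

    printf("row-major: %.3fs  column-major: %.3fs  (sums %.0f %.0f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
    return 0;
}

Exact timings depend on the machine and compiler flags; the point is only the relative difference caused by cache-line reuse.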

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 45 Memory Bandwidth

 To provide bandwidth to the processor, the bus either needs to be faster or wider
 Busses are limited to perhaps 400-800 MHz
 Links (point-to-point) are faster
◦ Increased link frequencies increase error rates, requiring coding and redundancy, thus increasing power and die size and not helping bandwidth
 Making things wider requires pin-out (Si real estate) and power
◦ Both power and pin-out are serious issues

From www.naplestech.com Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 46 Processor-DRAM Memory Gap

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 47 Processor in Memory (PIM)

 PIM merges logic with memory ◦ Wide ALUs next to the row buffer ◦ Optimized for memory throughput, not ALU utilization  PIM has the potential of riding Moore’s Law while ◦ Greatly increasing effective memory bandwidth ◦ Providing many more concurrent execution threads ◦ Reducing latency ◦ Reducing power ◦ Increasing overall system efficiency  It may also simplify programming and system design

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 48 Internet – 4th Revolution in Telecommunications  Telephone, Radio, Television  Growth in Internet outstrips the others  Exponential growth since 1985  Traffic doubles every 100 days

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 49 Peer-to-Peer Computing

 Peer-to-peer is a style of networking in which a group of computers communicate directly with each other
 Wireless communication
 Home computer in the utility room, next to the water heater and furnace
 Web tablets
 BitTorrent ◦ http://en.wikipedia.org/wiki/Bittorrent
 Embedded computers in things all tied together ◦ Books, furniture, milk cartons, etc.
 Smart appliances ◦ Refrigerator, scale, etc.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 50 Internet On Everything

Smart Refrigerator Internet controlled refrigerator-oven

Smart laundry room appliances

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 51 SETI@home: Global Distributed Computing  Running on 500,000 PCs, ~1000 CPU Years per Day ◦ 485,821 CPU Years so far  Sophisticated Data & Signal Processing Analysis  Distributes Datasets from Arecibo Radio Telescope

Berkeley Open Infrastructure for Network Computing

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 52 SETI@home

 Use thousands of Internet-connected PCs to help in the search for extraterrestrial intelligence
 When a participant's computer is idle, the software downloads a 300-kilobyte chunk of data for analysis; each client performs about 3 Tflop of work over roughly 15 hours
 The results of this analysis are sent back to the SETI team and combined with those of thousands of other participants
 Largest distributed computation project in existence ◦ Averaging 40 Tflop/s
 Today a number of companies are trying this for profit

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 53
 Google query attributes
◦ 150M queries/day (2,000/second)
◦ 100 countries
◦ 8B documents in the index
 Data centers
◦ 100,000 Linux systems in data centers around the world
 15 TFlop/s and 1,000 TB total capability
 40-80 1U/2U servers per cabinet
 100 Mbit Ethernet switches per cabinet with gigabit Ethernet uplink
◦ Growth from 4,000 systems (June 2000)
 18M queries then
 Performance and operation
◦ Simple reissue of failed commands to new servers
◦ No performance debugging – problems are not reproducible

Source: Monika Henzinger, Google & Cleve Moler

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 54 General Computing Trends

In the past, parallel computing efforts have shown great promise; however, in the end, uniprocessor computing has always won.

Why? Because we could just wait 12-18 months and the same algorithms would perform twice as fast ... until now!

This shift towards increasing parallelism is not due to any new breakthroughs in software and/or parallel architectures but instead due to greater challenges in further improving uniprocessor architectures.

General Purpose Computing is taking an irreversible step toward parallel architectures.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 55 Why Parallel Computing

• Power is a primary design constraint • Clock frequencies are stagnating • ILP has reached a roadblock • Can’t make reasonable computers out of current technology past 4GHz • Cooling the devices becomes very difficult – just can’t get the heat out of the cores efficiently • Power density on today’s chips is staggering • Fundamental limits are being approached

• Desire to solve bigger, more realistic applications problems • More cost effective solution

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 56 MultiCore

[Chart: threads per socket vs. year, 2002-2017, log scale from 1 to 10,000 – the multi-core era of scalar and parallel applications, followed by massively parallel applications. Data courtesy Intel]

However, going multi-core introduces a whole new set of problems!

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 57 Considerations

 Rising software development costs  Increased compute system complexity  Lack of compiler technology solutions  Large volume of legacy code  Future changes that will make us start over again

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 58 Current Environment

 Lack of robust cross-platform development frameworks
 Parallel programming support implemented as an afterthought via language extensions and/or pragmas
 Developers *must* understand low-level details of both the algorithm and the architecture in order to be successful
◦ Processor interconnect network
◦ I/O performance
◦ Cache coherency
◦ Memory hierarchy

Parallel Programming is still mostly an art

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 59 Principles of Parallel Computing

 Parallelism and Amdahl’s Law  Granularity  Locality  Load balance  Coordination and synchronization  Performance modeling

All of these make parallel programming even harder than sequential programming

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 60 “Implicit” Parallelism In Current Machines  Bit level parallelism ◦ within floating point operations, etc.  Instruction level parallelism (ILP) ◦ multiple instructions execute per clock cycle  Memory system parallelism ◦ overlap of memory operations with computation  OS parallelism ◦ multiple jobs run in parallel on commodity SMPs

There are limits to all of these – for very high performance, the user needs to identify, schedule, and coordinate parallel tasks

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 61 Finding Enough Parallelism

 Suppose only part of an application can be parallelized  Amdahl’s law

◦ Let fs be the fraction of work done sequentially; then (1 − fs) = fp is the fraction that can be parallelized ◦ Also let N = the number of processors  Even if the parallel part speeds up perfectly, the sequential part can severely limit the performance gains

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 62 Amdahl’s Law

Amdahl’s Law:

 Amdahl's law states that if fs is the fraction of a calculation that is sequential, and (1 − fs) is the fraction that can be parallelized (fp ), then the maximum speedup that can be achieved by using N processors is

S = 1 / (fs + (1 − fs)/N)            – effect of multiple processors on speedup

tN = (fs + (1 − fs)/N) * t1          – effect of multiple processors on run time

 In the limit, as N tends to infinity, the maximum speedup tends to 1/ fs.

 As an example, if fs is only 10%, the problem can be sped up by at most a factor of 10, no matter how large the value of N used (see the sketch below).
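
A direct transcription of the speedup formula into C, evaluated for fs = 10% (nothing beyond the formula itself is assumed):

#include <stdio.h>

/* Amdahl's law from the slide: S(N) = 1 / (fs + (1 - fs)/N),
 * where fs is the sequential fraction and N the number of processors. */
static double amdahl(double fs, double n)
{
    return 1.0 / (fs + (1.0 - fs) / n);
}

int main(void)
{
    const double fs = 0.10;                        /* 10% sequential work */
    const int procs[] = { 1, 2, 4, 16, 64, 1024 };

    for (int i = 0; i < 6; i++)
        printf("N = %4d  speedup = %6.2f\n", procs[i], amdahl(fs, procs[i]));
    printf("limit as N -> infinity: %.2f\n", 1.0 / fs);   /* = 10 */
    return 0;
}

Even with 1,024 processors the speedup is only about 9.9: the 10% sequential fraction caps it at 10.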

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 63 Illustration of Amdahl’s Law

It takes only a small fraction of serial content in a code to degrade the parallel performance. It is essential to determine the scaling behavior of your code before doing production runs using large numbers of processors

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 64 Overhead of Parallelism

 Given enough parallel work, this is the biggest barrier to getting desired speedup
 Parallelism overheads include:
◦ cost of starting a thread or process
◦ cost of communicating shared data
◦ cost of synchronizing
◦ extra (redundant) computation
 Each of these can be in the range of milliseconds (= millions of flops) on some systems
 Tradeoff: the algorithm needs sufficiently large units of work to run fast in parallel (i.e., large granularity), but not so large that there is not enough parallel work to do (see the sketch below)
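
A toy model of the granularity tradeoff described in the last bullet (all numbers are made up for illustration): a fixed amount of work is split into equal tasks over a set of processors, and every task pays a fixed startup/communication overhead. Too few tasks leaves processors idle; too many drowns the useful work in overhead.

#include <stdio.h>

static double run_time(double total_flops, double flops_per_sec,
                       int procs, int tasks, double overhead_s)
{
    double per_task     = total_flops / tasks / flops_per_sec;  /* compute time per task */
    int tasks_per_proc  = (tasks + procs - 1) / procs;          /* ceiling               */
    return tasks_per_proc * (per_task + overhead_s);
}

int main(void)
{
    const double work = 1e12, rate = 1e9, overhead = 1e-3;      /* 1 ms per task */
    const int procs = 64;
    const int grain[] = { 16, 64, 1024, 1000000 };

    for (int i = 0; i < 4; i++)
        printf("%8d tasks -> %8.2f s\n", grain[i],
               run_time(work, rate, procs, grain[i], overhead));
    return 0;
}

With these numbers, 16 tasks leave 48 of 64 processors idle (~62 s), 64-1,024 tasks run near the ideal ~15.6 s, and a million tasks spend as long on overhead as on computation (~31 s).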

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 65 Locality And Parallelism

 Large memories are slow, fast memories are small  Storage hierarchies are large and fast on average  Parallel processors, collectively, have large, fast performance capabilities ◦ the slow accesses to “remote” data we call “communication”  Algorithm should do most work on local data

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 66 Load Imbalance

 Load imbalance is the time that some processors in the system are idle due to ◦ insufficient parallelism (during that phase) ◦ unequal size tasks  Examples of the latter ◦ adapting to “interesting parts of a domain” ◦ tree-structured computations ◦ fundamentally unstructured problems  Algorithm needs to balance load

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 67 What’s Ahead

 Greater instruction level parallelism?  Bigger caches?  Multiple processors per chip?  Complete systems on a chip?  High performance LAN, Interface, and Interconnect

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 68 Directions

 Move toward shared memory ◦ SMPs and Distributed Shared Memory ◦ Shared address space w/deep memory hierarchy

 Clustering of shared memory machines for scalability

 Efficiency of message passing and data parallel programming ◦ Helped by standards efforts such as MPI and HPF

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 69 High Performance Computers

 ~ 20 years ago
◦ 1x10^6 Floating Point Ops/sec (Mflop/s)
  Scalar based
 ~ 10 years ago
◦ 1x10^9 Floating Point Ops/sec (Gflop/s)
  Vector & shared memory computing, bandwidth aware
  Block partitioned, latency tolerant
 ~ Today
◦ 1x10^12 Floating Point Ops/sec (Tflop/s)
  Highly parallel, distributed processing, message passing, network based
  Data decomposition, communication/computation
 ~ 1 year away
◦ 1x10^15 Floating Point Ops/sec (Pflop/s)
  Many more levels of memory hierarchy, combination/grids & HPC
  More adaptive, bandwidth aware, fault tolerant, extended precision, attention to SMP nodes

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 70 Top 500 Computers

 Listing of the 500 most powerful Computers in the World

 Yardstick: Rmax from LINPACK MPP ◦ Ax=b, dense problem  Updated twice a year ◦ SuperComputing Conference (SC‘xy) in the US every November

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 71 What Is A Supercomputer
 A supercomputer is a hardware and software system that provides close to the maximum performance that can currently be achieved.
 Over the last 10 years the range for the Top500 has increased faster than Moore's Law
◦ 1993: #1 = 59.7 GFlop/s, #500 = 422 MFlop/s
◦ 2007: #1 = 478 TFlop/s, #500 = 5.9 TFlop/s
Why do we need them? Almost all of the technical areas that are important to the well-being of humanity use supercomputing in fundamental and essential ways: computational fluid dynamics, protein folding, climate modeling, and national security (in particular cryptanalysis and simulating nuclear weapons), to name a few.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 72 Architecture/Systems Continuum

Top 500 Rmax by system type

[Chart: share of Top 500 Rmax by system type (custom, hybrid, commodity), June 1993 – June 2004, ranging from tightly coupled to loosely coupled systems]

 Custom processor with custom interconnect
◦ Cray X1
◦ NEC SX-8
◦ IBM Regatta
◦ IBM Blue Gene/L
 Commodity processor with custom interconnect (hybrid)
◦ SGI Altix (Intel Itanium 2)
◦ Cray XT3, XD1 (AMD Opteron)
 Commodity processor with commodity interconnect
◦ Clusters – Pentium, Itanium, Opteron, Alpha; GigE, Infiniband, Myrinet, Quadrics
◦ NEC TX7
◦ IBM eServer
◦ Dawning

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 73 The Top-10 Supercomputers November 2008

Rank / Site / Computer, Year, Vendor / Cores / Rmax / Rpeak / Power
1. DOE/NNSA/LANL, United States – Roadrunner: BladeCenter QS22/LS21 Cluster, PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz, Voltaire Infiniband, 2008, IBM – 129600 cores, Rmax 1105.00, Rpeak 1456.70, Power 2483.47
2. Oak Ridge National Laboratory, United States – Jaguar: Cray XT5 QC 2.3 GHz, 2008, Cray Inc. – 150152 cores, Rmax 1059.00, Rpeak 1381.40, Power 6950.60
3. NASA/Ames Research Center/NAS, United States – Pleiades: SGI Altix ICE 8200EX, QC 3.0/2.66 GHz, 2008, SGI – 51200 cores, Rmax 487.01, Rpeak 608.83, Power 2090.00
4. DOE/NNSA/LLNL, United States – BlueGene/L: eServer Blue Gene Solution, 2007, IBM – 212992 cores, Rmax 478.20, Rpeak 596.38, Power 2329.60
5. Argonne National Laboratory, United States – Blue Gene/P Solution, 2007, IBM – 163840 cores, Rmax 450.30, Rpeak 557.06, Power 1260.00
6. Texas Advanced Computing Center/Univ. of Texas, United States – Ranger: SunBlade x6420, Opteron QC 2.3 Ghz, Infiniband, 2008, Sun Microsystems – 62976 cores, Rmax 433.20, Rpeak 579.38, Power 2000.00
7. NERSC/LBNL, United States – Franklin: Cray XT4 QuadCore 2.3 GHz, 2008, Cray Inc. – 38642 cores, Rmax 266.30, Rpeak 355.51, Power 1150.00
8. Oak Ridge National Laboratory, United States – Jaguar: Cray XT4 QuadCore 2.1 GHz, 2008, Cray Inc. – 30976 cores, Rmax 205.00, Rpeak 260.20, Power 1580.71
9. NNSA/Sandia National Laboratories, United States – Red Storm: Sandia/Cray Red Storm, XT3/4, 2.4/2.2 GHz dual/quad core, 2008, Cray Inc. – 38208 cores, Rmax 204.20, Rpeak 284.00, Power 2506.00
10. Shanghai Supercomputer Center, China – Dawning 5000A: QC Opteron 1.9 Ghz, Infiniband, Windows HPC 2008, 2008, Dawning – 30720 cores, Rmax 180.60, Rpeak 233.47

Rmax and Rpeak values are in Tflop/s; Power is in KW for the entire system.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 74 Performance Development

[Chart: Top500 performance development over time, annotated "~10 years", "6-8 years", and "My Laptop 8-10 years"]

http://www.top500.org/lists/2008/11/performance_development Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 75 Performance Projection

[Chart: Top500 performance projection, annotated "My Laptop 8-10 years"]

http://www.top500.org/lists/2008/11/performance_development Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 76 Top 500 Conclusions

 Microprocessor-based supercomputers have brought a major change in accessibility and affordability  MPPs continue to account for more than half of all installed high-performance computers worldwide

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 77 Distributed And Parallel Systems

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 78 Highly Parallel Supercomputing Where Are We?

 Performance: ◦ Sustained performance has dramatically increased during the last year. ◦ On most applications, sustained performance per dollar now exceeds that of conventional supercomputers. But... ◦ Conventional systems are still faster on some applications.

 Languages and compilers: ◦ Standardized, portable, high-level languages such as HPF, PVM and MPI are available. But ... ◦ Initial HPF releases are not very efficient. ◦ Message passing programming is tedious and hard to debug. ◦ Programming difficulty remains a major obstacle to usage by mainstream scientists.

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 79 Highly Parallel Supercomputing Where Are We?

 Operating Systems: ◦ Robustness and reliability are improving. ◦ New system management tools improve system utilization. But... ◦ Reliability still not as good as conventional systems.

 I/O Subsystems: ◦ New RAID disks, HiPPI interfaces, etc. provide substantially improved I/O performance. But… ◦ I/O remains a major bottleneck

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 80 The Importance Of Standards Software  Writing programs for MPP is hard ...  But ... they need not be one-off efforts if written in a standard language  Past lack of parallel programming standards ... ◦ ... has restricted uptake of the technology (to "enthusiasts") ◦ ... reduced portability (over a range of current architectures and between future generations)  Now standards exist (PVM, MPI & HPF), which ... ◦ ... allow users & manufacturers to protect their software investment ◦ … encourage growth of a "third party" parallel software industry & parallel versions of widely used codes

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 81 The Importance Of Standards Hardware  Processors ◦ commodity RISC processors  Interconnects ◦ high bandwidth, low latency communications protocol ◦ no de-facto standard yet (ATM, Fibre Channel, HiPPI, FDDI)  Growing demand for total solution: ◦ robust hardware + usable software  HPC systems containing all the programming tools / environments / languages / libraries / applications packages found on desktops

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 82 The Future of HPC

 The expense of being different is being replaced by the economics of being the same  HPC needs to lose its "special purpose" tag  Still has to bring about the promise of scalable general purpose computing ...  ... but it is dangerous to ignore this technology  Final success when MPP technology is embedded in desktop computing  Yesterday's HPC is today's mainframe is tomorrow's workstation

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 83 Achieving TeraFlops (10^12 Flops/sec)

 In 1991, 1 Gflop/sec
 1000-fold increase in performance from:
◦ Architecture – exploiting parallelism (processor, communication, memory)
◦ Moore's Law (hopefully!)
◦ Algorithmic improvements – block-partitioned algorithms

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 84 Future: Petaflops (10^15 Flops/sec)

 10^15 flops, for current workstations, is about a year of computing: a Pflop for 1 second ≈ a typical workstation computing for 1 year
 From an algorithmic standpoint:

 Concurrency
 Data locality
 Latency & synchronization
 Floating point accuracy
 Dynamic redistribution of workload
 New language constructs
 Role of numerical libraries
 Algorithm adaptation to hardware failure

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 85 A Petaflop Computer System – Today? 2 GHz processor (O(10^9) ops/s) ◦ 0.5 million PCs ◦ $0.5B ($1K each) ◦ 100 MWatts ◦ 5 acres ◦ 0.5 million Windows licenses!! ◦ a PC failure every second

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 86 A Petaflop Computer System

 1 Pflop/s sustained computing  Between 10,000 and 1,000,000 processors  Between 10 TB and 1PB main memory  Commensurate I/O bandwidth, mass store, etc.  May be feasible and “affordable” by the year 2010

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 87 IDC – HPC Survey

 The following depict recent trends in HPC growth ◦ copied w/Permission for distribution within class only ◦ DO NOT REDISTRIBUTE WITHOUT PERMISSION

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 88 HPC Growth

[Chart: all HPC server revenue in $ billions (axis 0-12), by year, 1999-2006]

Source IDC Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 89 HPC – By Processor Type

[Chart: revenue percentage by CPU type – EPIC, RISC, vector, x86-32, x86-64 – by year, 2000-2006]

Source IDC Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 90 HPC – By OS

[Chart: revenue percentage by O/S – Linux, Unix, W/NT, Other – by year, 2000-2006]

Source IDC Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 91 HPC – Clusters Cluster Market Penetration

[Chart: cluster vs. non-cluster share of the HPC market by quarter, Q1 2003 – Q4 2006]

Source IDC Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 92 HPC – Industry Trends

                                 2000         2005         2010
Bio-sciences                     $585,358     $1,433,807   $2,218,965
CAE                              $839,669     $1,108,510   $1,978,185
Chemical engineering             $324,601     $222,466     $447,418
DCC & distribution               $113,464     $513,684     $627,877
Economics/financial              $203,067     $254,967     $483,848
EDA                              $457,630     $648,477     $1,036,687
Geosciences and geo-engineering  $181,863     $489,452     $800,670
Mechanical design and drafting   $114,185     $155,843     $198,902
Defense                          $760,065     $811,335     $1,651,183
Government lab                   $688,555     $1,375,964   $1,671,932
Software engineering             $93,634      $19,774      $18,531
Technical management             $275,562     $101,561     $73,023
University/academic              $933,386     $1,699,966   $2,498,767
Weather                          $171,127     $358,978     $494,431
Other                            $153,769     $3,238       $64,744
Total revenue                    $5,895,934   $9,198,020   $14,265,164
(Revenue figures are in $ thousands.)

Source IDC Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 93 HPC Issues
 Clusters are still hard to use and manage
◦ Power, cooling and floor space are major issues
◦ Third party software costs
◦ Weak interconnect performance at all levels
◦ Applications & programming — hard to scale beyond a node
◦ RAS is a growing issue
◦ Storage and data management
◦ Multi-processor type support and accelerator support
 Requirements are diverging
◦ High-end — need more, but is a shrinking segment
◦ Mid and lower end — the mainstream will look more for complete solutions
◦ New entrants — ease-of-use will drive them, plus need applications
• Parallel software is missing for most users
• And will get weaker in the near future — software will be the #1 roadblock
• Multi-core will cause many issues to "hit-the-wall"

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 Source IDC 94 Custom HPC Architectures

 IBM BlueGene/L  IBM Cell Processor Sony/IBM/Toshiba

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 95 HPC – Current Top Performer BlueGene/L
Top HPC Performer – BlueGene/L (as of 8/2007), from http://www.top500.org/, located at Lawrence Livermore National Laboratory
 BlueGene/L boasts a peak speed of over 360 teraFLOPS, a total memory of 32 terabytes, total power of 1.5 megawatts, and machine floor space of 2,500 square feet. The full system has 65,536 dual-processor compute nodes.
 Nodes are configured as a 32 x 32 x 64 3D torus; each node is connected in six different directions for nearest-neighbor communications
 A global reduction tree supports fast global operations such as global max/sum in a few microseconds over 65,536 nodes
 Multiple global barrier and interrupt networks allow fast synchronization of tasks across the entire machine within a few microseconds
 1,024 gigabit-per-second links to a global parallel file system support fast input/output to disk

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 96 BlueGene/L

http://www.llnl.gov/asc/computing_resources/bluegenel/ Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 97 BlueGene/L ASIC

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 98 HPC – IBM Cell Processor Sony/IBM/Toshiba

 Summer 2000: Sony, Toshiba, and IBM came together to design Sony's new PlayStation 3 processor
 March 2001: STI Design Center opened in Austin – a joint investment of $400,000,000
 Sony figured out that it's not really FLOPS that are the problem, it's the data: if you can keep it fed and balanced, the Cell looks very promising
 Heterogeneous model can be very difficult to program
 1 Power Processing Unit (PPU)
 8 Synergistic Processing Units (SPU)
 Element Interconnect Bus (EIB)
 Memory Interface Controller (MIC)
 2 Configurable I/O Connections (FlexIO)

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 99 Cell Power Processing Unit (PPU)

• Simple 64-bit Power PC core (PPE)  In-order, dual-issue Pipeline  Symmetric Multi-Threaded (2-threads)  VMX (AltiVec) Vector Unit  512KB L2 Cache • Responsible for running the OS and coordinating SPUs

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 100 Synergistic Processing Unit (SPU)

Consists of…
• Synergistic Processing Element (SPE)
◦ SIMD vector processor, single-precision FP
◦ 128-entry, 128-bit register file
◦ No branch prediction; uses software hints
• 256KB Local Store (LS)
◦ NOT a cache – holds instructions & data
• Memory Flow Controller (MFC)
◦ Handles DMA & mailbox communications

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 101 Element Interconnect Bus (EIB)

 Connects PPU, MIC, SPUs, IO ports  4 Rings ◦ 2 Clockwise, 2 CCW  Peak Aggregate Bandwidth of 204.8GB/s  Sustained: 197GB/s

Mark R. Gilder CSI 441/541 – SUNY Albany Spring '10 102