
Introduction to Parallel Computing

Rick Wagner, HPC Systems Manager
SDSC Summer Institute, August 6-10, 2012
San Diego, CA

Purpose, Goals, Outline, etc.

• Introduce broad concepts…
• Define terms…
• Explore a working model…
• Provide references…

…all in the context of high performance computing.


Outline:
• Introduction
• Parallel Models
• Performance (Speedup)
• Example Application
• Summary

Parallel vs. High Performance

In the abstract, parallel computing is a very broad topic…

…this talk focuses on applications synchronized across processors: high performance computing.

Why Parallel Computing?

Impatience! The desire to solve problems faster. E.g., understanding climate change before itʼs too late.

Greed! The desire to solve bigger problems, or improve accuracy. E.g., to be correct when modeling climate change.

Parallel Programming Models

Parallel programming is built around multiples of the basic computer architecture, with some variation in memory access, I/O paths, etc.

Supercomputers are this, multiplied.

Source: https://computing.llnl.gov/tutorials/parallel_comp/

Shared Memory Programming

Typical single node: all processors access all memory.

Threads within an application can read and write the same memory addresses.

Watch out for race conditions!

Source: https://computing.llnl.gov/tutorials/parallel_comp/

Common tools: OpenMP, threads
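
The talk doesn't show code here, so the following is a minimal illustrative sketch (names and numbers are mine): an OpenMP loop where reduction(+:sum) gives each thread a private copy of sum, avoiding the race condition a shared update would cause.

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        const int n = 1000000;
        double sum = 0.0;

        /* Loop iterations are divided among threads; reduction(+:sum)
           gives each thread a private sum and combines them at the end,
           avoiding a race on the shared variable. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < n; i++)
            sum += 1.0 / (i + 1.0);

        printf("sum = %f using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }

Compile with, e.g., gcc -fopenmp and control the thread count with OMP_NUM_THREADS.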

Distributed Memory Programming

Grouping machines on a network provides:
• more memory = larger problems
• more CPUs = faster solutions

Source: https://computing.llnl.gov/tutorials/parallel_comp/

The standard tool is the Message Passing Interface (MPI), a library to abstract the network.
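
For comparison, a minimal MPI sketch (again illustrative, not from the talk): each task owns its own memory, so partial results are combined only through explicit messages.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each task computes a partial result in its own memory... */
        double local = rank + 1.0;
        double total = 0.0;

        /* ...and the results meet only through an explicit message,
           here a sum reduction onto rank 0. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d tasks = %f\n", size, total);

        MPI_Finalize();
        return 0;
    }

Launch with, e.g., mpirun -np 16 ./a.out.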

Terms

Shared memory: Threads (one process; all threads see the same address space)

Distributed memory: Tasks (separate processes; data exchanged by messages)

Other Models

• Hybrid parallelization, e.g., MPI code with threaded sections (see the sketch below)
• Accelerator/offload: GPU (CUDA, GLSL), MIC
• Partitioned Global Address Space (PGAS)
• High-Throughput Computing
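
A small sketch of the hybrid model mentioned above (illustrative; not from the talk): MPI tasks that each open an OpenMP parallel region.

    #include <stdio.h>
    #include <mpi.h>
    #include <omp.h>

    int main(int argc, char **argv) {
        /* Request thread support: MPI_THREAD_FUNNELED means only the
           main thread makes MPI calls, enough for this pattern. */
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Each MPI task spawns a team of OpenMP threads for local work. */
        #pragma omp parallel
        printf("task %d, thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());

        MPI_Finalize();
        return 0;
    }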

Google is your friend!

Performance, AKA, Speedup

Strong scaling: More cores for same problem = less time

Weak scaling: More cores for bigger problem = same time

Obviously, which you want depends on your problem.

Amdahlʼs Law
http://en.wikipedia.org/wiki/Amdahl%27s_law

P: Fraction of the code that can be parallelized

N: Number of tasks, threads, etc.

1 – P: Your enemy

S(N) = 1 / ((1 – P) + P/N)

[Plot: speedup vs. number of processors for several values of P; en:File:AmdahlsLaw.svg by Daniel, CC-BY-SA-3.0, via Wikimedia Commons]
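
A quick worked example (my numbers, not the slide's): evaluating S(N) for P = 0.95 shows why the serial fraction is the enemy; the speedup can never exceed 1/(1 – P) = 20, no matter how many cores you add.

    #include <stdio.h>

    /* Amdahl's law: S(N) = 1 / ((1 - P) + P/N) */
    static double speedup(double P, int N) {
        return 1.0 / ((1.0 - P) + P / N);
    }

    int main(void) {
        const double P = 0.95;  /* 95% of the code parallelizes */
        for (int N = 1; N <= 4096; N *= 4)
            printf("N = %4d  S(N) = %6.2f\n", N, speedup(P, N));
        printf("limit as N -> infinity: %.2f\n", 1.0 / (1.0 - P));
        return 0;
    }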

Instrument Your Code!

Know where your code spends its time!

Choose the big blocks and dive down

Start simple (printf), get better tools as needed
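
A minimal sketch of the printf approach (illustrative; the block contents are stubbed out): record the wall clock around each big block and print the differences.

    #include <stdio.h>
    #include <time.h>

    /* Wall-clock seconds; CLOCK_MONOTONIC ignores system clock changes. */
    static double wtime(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        double t0 = wtime();
        /* ... block 1: read data ... */
        double t1 = wtime();
        /* ... block 2: fft ... */
        double t2 = wtime();
        /* ... block 3: bin profile ... */
        double t3 = wtime();

        printf("read %.3f s, fft %.3f s, bin %.3f s\n",
               t1 - t0, t2 - t1, t3 - t2);
        return 0;
    }

OpenMP and MPI codes can use omp_get_wtime() and MPI_Wtime() the same way.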

Example: Power Spectrum

Very common application: Radially bin power from FFT

Example uses FFTW, “Fastest Fourier Transform in the West” http://www.fftw.org/

FFTW provides OpenMP and MPI wrappers.
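
The talk's source isn't reproduced here, but a condensed sketch of the serial algorithm looks roughly like this (grid size, bin count, and the input data are stand-ins): a real-to-complex 2-D FFT with FFTW, then accumulate |F(k)|^2 into radial bins.

    #include <math.h>
    #include <stdio.h>
    #include <fftw3.h>

    #define N 256      /* grid size (assumption for the sketch) */
    #define NBINS 128  /* number of radial bins */

    int main(void) {
        double *in = fftw_malloc(sizeof(double) * N * N);
        fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N * (N/2 + 1));
        fftw_plan plan = fftw_plan_dft_r2c_2d(N, N, in, out, FFTW_ESTIMATE);

        /* 1. read data (stubbed: fill with something) */
        for (int i = 0; i < N * N; i++)
            in[i] = sin(0.1 * i);

        /* 2. fft */
        fftw_execute(plan);

        /* 3. bin profile: accumulate |F(kx,ky)|^2 by radius k */
        double power[NBINS] = {0};
        long count[NBINS] = {0};
        for (int i = 0; i < N; i++) {
            int kx = (i <= N/2) ? i : i - N;  /* wrap negative frequencies */
            for (int j = 0; j <= N/2; j++) {
                int bin = (int)sqrt((double)(kx*kx + j*j));
                if (bin < NBINS) {
                    double re = out[i*(N/2+1) + j][0];
                    double im = out[i*(N/2+1) + j][1];
                    power[bin] += re*re + im*im;
                    count[bin]++;
                }
            }
        }
        for (int b = 0; b < NBINS; b++)
            if (count[b])
                printf("%d %g\n", b, power[b] / count[b]);

        fftw_destroy_plan(plan);
        fftw_free(in);
        fftw_free(out);
        return 0;
    }

Compile with, e.g., gcc spectrum.c -lfftw3 -lm.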

Our Experiment

Run an OpenMP and an MPI version of the power spectrum code, using 16 threads and 16 tasks, respectively.

Compare performance across the major code blocks.

Measure overall speedup.

Code Blocks

1. read data

2. fft


3. bin profile

OpenMP Version

1. read data

2. fft (threaded)

3. bin profile
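
Threading the fft block is mostly a planning change with FFTW's OpenMP wrapper; a sketch (assumes FFTW built with OpenMP support; transform sizes are stand-ins):

    #include <fftw3.h>
    #include <omp.h>

    #define N 256

    int main(void) {
        /* Must be called once before any plans are created. */
        fftw_init_threads();

        /* All subsequent plans may use this many threads (16 in the
           talk's experiment; here, whatever OMP_NUM_THREADS allows). */
        fftw_plan_with_nthreads(omp_get_max_threads());

        double *in = fftw_malloc(sizeof(double) * N * N);
        fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N * (N/2 + 1));

        /* The transform now runs threaded; the calling code is
           unchanged from the serial version. */
        fftw_plan plan = fftw_plan_dft_r2c_2d(N, N, in, out, FFTW_ESTIMATE);
        fftw_execute(plan);

        fftw_destroy_plan(plan);
        fftw_free(in);
        fftw_free(out);
        fftw_cleanup_threads();
        return 0;
    }

Link with, e.g., -lfftw3_omp -lfftw3 -lm -fopenmp.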

OpenMP: Total Time [plot]

OpenMP: Time by Section [plot]

OpenMP: Relative Time by Section [plot]

OpenMP: Scaling [plot]

OpenMP: Ratio of Time [plot]

MPI Version

1. read data

2. fft (parallel)

3. bin profile
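
In the MPI version, FFTW's MPI wrapper distributes the grid rows across tasks; a sketch of the planning calls (assumes FFTW 3.3+ built with MPI support; I/O, binning, and the final reduction of the bins are elided):

    #include <mpi.h>
    #include <fftw3-mpi.h>

    #define N 256

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        fftw_mpi_init();

        /* FFTW decides how many rows of the N x N grid this task owns. */
        ptrdiff_t local_n0, local_0_start;
        ptrdiff_t alloc_local = fftw_mpi_local_size_2d(
            N, N/2 + 1, MPI_COMM_WORLD, &local_n0, &local_0_start);

        double *in = fftw_alloc_real(2 * alloc_local);
        fftw_complex *out = fftw_alloc_complex(alloc_local);

        /* Collective plan: every task participates in the transform. */
        fftw_plan plan = fftw_mpi_plan_dft_r2c_2d(
            N, N, in, out, MPI_COMM_WORLD, FFTW_ESTIMATE);

        /* ... each task reads/fills its local rows ... */
        fftw_execute(plan);
        /* ... each task bins its local modes, then an MPI_Reduce
           combines the per-task radial bins (not shown) ... */

        fftw_destroy_plan(plan);
        fftw_free(in);
        fftw_free(out);
        MPI_Finalize();
        return 0;
    }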

MPI: Total Time [plot]

MPI: Time by Section [plot]

MPI: Relative Time by Section [plot]

MPI: Scaling [plot]

Review

• Parallel programming models are largely divided into shared vs. distributed memory

• The only way to know how well your code is parallelized is to measure it

• Serial blocks of code are the bottlenecks

• As core counts rise, minimizing those serial blocks is your challenge

References

Introduction to Parallel Computing https://computing.llnl.gov/tutorials/parallel_comp/

Physics 244: Parallel Computing for Science and Engineering http://lca.ucsd.edu/projects/phys244

Software Carpentry http://software-carpentry.org/
