Parallel Computing with MATLAB
Jiro Doke, Ph.D. Senior Application Engineer
Sarah Wait Zaranek, Ph.D. MATLAB Product Marketing
© 2011 The MathWorks, Inc.
A Question to Consider
Do you want to speed up your algorithms or deal with large data? If so…
– Do you have a multi-core or multi-processor computer?
– Do you have a high-end graphics processing unit (GPU)?
– Do you have access to a computer cluster?
Utilizing Additional Processing Power
. Built-in multithreading (implicit)
– Core MATLAB and Image Processing Toolbox
– Utility for specific matrix operations (linear algebra, fft, filter, etc.)
– No code changes necessary
. Parallel computing tools (explicit)
– Parallel Computing Toolbox
– MATLAB Distributed Computing Server
– Broad utility controlled by the MATLAB user
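As a rough illustration of the implicit case, the sketch below queries how many computational threads MATLAB is using and times a large matrix multiplication, which runs multithreaded without any change to the code. The matrix size is an arbitrary choice for illustration.

```matlab
% Implicit multithreading: no parallel constructs appear in the code.
nThreads = maxNumCompThreads;     % number of computational threads in use
A = rand(1500);                   % matrix size chosen only for illustration
tic;
B = A * A;                        % dense matrix multiply, multithreaded internally
elapsed = toc;
fprintf('%d threads, %.3f s\n', nThreads, elapsed);
```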
Agenda
. Introduction to Parallel Computing Tools
. Using Multi-core/Multi-processor Machines
. Using Graphics Processing Units (GPUs)
. Scaling Up to a Cluster
Going Beyond Serial MATLAB Applications
[Diagram: MATLAB desktop with toolboxes and blocksets, connected to a pool of workers]
Parallel Computing on the Desktop
. Use Parallel Computing Toolbox
. Speed up parallel applications on local computer
. Take full advantage of desktop power by using CPUs and GPUs (up to 12 workers in R2011b)
. Separate computer cluster not required
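A minimal sketch of starting workers on the desktop, assuming a recent release in which parpool and gcp are available. R2011b, the release mentioned on this slide, used the matlabpool command for the same purpose.

```matlab
% Start (or reuse) a pool of local workers.
pool = gcp('nocreate');           % returns the current pool, or empty if none exists
if isempty(pool)
    pool = parpool();             % start workers using the default cluster profile
end
fprintf('Pool has %d workers\n', pool.NumWorkers);
```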
Scale Up to Clusters, Grids and Clouds
[Diagram: desktop computer with Parallel Computing Toolbox connected through a scheduler to a computer cluster running MATLAB Distributed Computing Server]
Parallel Computing enables you to …
. Larger compute pool: speed up computations
. Larger memory pool: work with large data
[Figure: array of numbers illustrating a large data set]
Programming Parallel Applications
[Diagram: programming options range from ease of use to greater control]
Using Additional Cores/Processors (CPUs)
Options, from greatest ease of use to greatest control:
. Support built into toolboxes
Example: Built-in Support for Parallelism in Other Tools
. Use built-in support for Parallel Computing Toolbox in Optimization Toolbox
. Run optimization in parallel
. Use pool of MATLAB workers
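One way this can look in practice is sketched below, assuming Optimization Toolbox and a recent release with optimoptions. The Rosenbrock objective, start point, and bounds are hypothetical stand-ins chosen for illustration. In releases from the time of these slides, the equivalent setting was optimset('UseParallel','always').

```matlab
% Hypothetical objective for illustration: the Rosenbrock function.
objective = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
x0 = [-1.2, 1];                   % start point (illustrative)
lb = [-2, -2];                    % lower bounds (illustrative)
ub = [ 2,  2];                    % upper bounds (illustrative)

% UseParallel asks the solver to estimate gradients on the pool of workers.
options = optimoptions('fmincon', 'UseParallel', true, 'Display', 'off');
[xOpt, fOpt] = fmincon(objective, x0, [], [], [], [], lb, ub, [], options);
```

The solver code itself is unchanged; only the option differs from a serial run.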
Other Tools Providing Parallel Computing Support
. Optimization Toolbox
. Global Optimization Toolbox
. Statistics Toolbox
. Simulink Design Optimization
. Bioinformatics Toolbox
. Model-Based Calibration Toolbox
. …
http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html
Using Additional Cores/Processors (CPUs)
Options, from greatest ease of use to greatest control:
. Support built into toolboxes
. Simple programming constructs: parfor
Running Independent Tasks or Iterations
. Ideal problem for parallel computing
. No dependencies or communications between tasks
. Examples include parameter sweeps and Monte Carlo simulations
[Figure: tasks laid out along a time axis]
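A small Monte Carlo sketch of such independent iterations: estimating pi by sampling random points in the unit square. The trial count and variable names are illustrative choices, not taken from the presentation.

```matlab
% Each iteration is independent, so the loop can run as a parfor.
numTrials = 1e5;                  % number of random samples (illustrative)
hits = 0;                         % reduction variable accumulated across iterations
parfor i = 1:numTrials
    p = rand(1, 2);               % random point in the unit square
    hits = hits + (sum(p.^2) <= 1);   % count points inside the quarter circle
end
piEstimate = 4 * hits / numTrials;
```

Because no iteration reads a value written by another, the order in which workers execute them does not affect the count.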
Example: Parameter Sweep of ODEs
. Parameter sweep of ODE system
. Use pool of MATLAB workers
. Convert for to parfor
. Interleave serial and parallel code
Damped spring oscillator: $m\ddot{x} + b\dot{x} + kx = 0$, with $m = 5$, $b = 1, 2, \ldots$, $k = 1, 2, \ldots$
. Sweep through different values of b and k
. Record peak value for each simulation
Tips for using parfor
. Requirement: Task and order independence
. Classification of variables
– One of the most common types of problems people run into when working with parfor
– At runtime, MATLAB needs to determine how each variable is treated
– Documentation: Parallel Computing Toolbox User’s Guide > Parallel for-Loops > Advanced Topics
. http://blogs.mathworks.com/loren/2009/10/02/using-parfor-loops-getting-up-and-running/
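To make the variable classification concrete, here is a hedged sketch of the damped spring oscillator sweep written as a parfor loop. The parameter ranges, time span, and initial conditions are assumptions made for illustration; the comments mark how parfor is expected to classify each variable.

```matlab
m = 5;                            % broadcast variable: read-only inside the loop
bVals = 1:4;                      % damping values to sweep (illustrative range)
kVals = 1:3;                      % stiffness values to sweep (illustrative range)
[bGrid, kGrid] = meshgrid(bVals, kVals);
peakVals = nan(size(bGrid));      % sliced output: each iteration writes one element

parfor idx = 1:numel(bGrid)
    b = bGrid(idx);               % temporary variables, private to each iteration
    k = kGrid(idx);               % (bGrid and kGrid are sliced inputs)
    % State y = [x; xdot]; start at rest position with unit velocity (assumed)
    odeFcn = @(t, y) [y(2); -(b*y(2) + k*y(1))/m];
    [~, y] = ode45(odeFcn, [0 25], [0; 1]);
    peakVals(idx) = max(y(:, 1)); % record peak displacement for this (b, k)
end
```

Indexing peakVals only by the loop variable is what lets MATLAB treat it as sliced; writing to peakVals(idx + 1), for example, would break the order-independence requirement.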
Using Additional Cores/Processors (CPUs)
Options, from greatest ease of use to greatest control:
. Support built into toolboxes
. Simple programming constructs: parfor
. Full control of parallelization: jobs and tasks
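For the jobs-and-tasks end of the spectrum, a minimal sketch follows. It assumes the parcluster interface of releases after these slides (R2011b used findResource instead), and the two tasks are trivial placeholders.

```matlab
c = parcluster();                 % cluster object from the default profile
job = createJob(c);               % independent job: tasks do not communicate

% Each task: function handle, number of outputs, cell array of inputs.
createTask(job, @sum, 1, {[1 2 3]});
createTask(job, @max, 1, {[4 9 2]});

submit(job);                      % hand the job to the scheduler
wait(job);                        % block until all tasks finish
results = fetchOutputs(job);      % cell array with one row per task
delete(job);                      % remove the job's data from the cluster storage
```

Here the user decides how work is divided into tasks, which is the extra control this option offers over parfor.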
Gaining Performance with More Hardware
. Using more cores (CPUs)
. Using GPUs
[Diagram: a four-core CPU with cache beside a GPU with its own device memory]
What is a Graphics Processing Unit (GPU)?
. Originally for graphics acceleration, now also used for scientific calculations
. Massively parallel array of integer and floating point processors
– Typically hundreds of processors per card
– GPU cores complement CPU cores
. Dedicated high-speed memory
* Parallel Computing Toolbox requires NVIDIA GPUs with Compute Capability 1.3 or greater, including NVIDIA Tesla 10-series and 20-series products. See http://www.nvidia.com/object/cuda_gpus.html for a complete listing.
Example: GPU Computing in the Parallel Computing Toolbox
. Solve 2nd order wave equation: $\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}$
. Send and create data on the GPU
. Run calculations with built-in GPU functions
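The sketch below shows the send, compute, and gather pattern on the right-hand side of the wave equation. It is not the solver from the presentation: it assumes a periodic domain and evaluates the Laplacian with FFTs, and the grid size and Gaussian initial condition are illustrative. When no supported GPU is present it falls back to the CPU, so the same code runs either way.

```matlab
n = 128;                               % grid points per dimension (illustrative)
L = 2;                                 % periodic domain [-1, 1) in x and y (assumed)
x = -1 + L*(0:n-1)/n;
[X, Y] = meshgrid(x, x);
k = (2*pi/L) * [0:n/2-1, -n/2:-1];     % FFT wavenumbers
[KX, KY] = meshgrid(k, k);
u = exp(-40*(X.^2 + Y.^2));            % Gaussian initial displacement (assumed)
negK2 = -(KX.^2 + KY.^2);

useGPU = gpuDeviceCount > 0;           % is a supported GPU available?
if useGPU
    u = gpuArray(u);                   % send data to GPU device memory
    negK2 = gpuArray(negK2);
end

% fft2, ifft2, real and .* accept gpuArray inputs and then execute on the GPU.
lapU = real(ifft2(negK2 .* fft2(u)));  % u_xx + u_yy

if useGPU
    lapU = gather(lapU);               % copy the result back to host memory
end
```

Keeping the data on the GPU between operations, and gathering only the final result, avoids repeated transfers between host and device memory.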
Benchmark: Solving 2D Wave Equation CPU vs GPU
Summary of Options for Targeting GPUs
Options, from greatest ease of use to greatest control:
. Use GPU array interface with MATLAB built-in functions
. Execute custom functions on elements of the GPU array
. Create kernels from existing CUDA code and PTX files
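The second option can be sketched with arrayfun, which applies a scalar function to every element of a GPU array. The function clipScaled is a hypothetical example, and the code assumes a release that accepts anonymous functions here (earlier releases may require a function file). It falls back to the CPU when no supported GPU is present.

```matlab
% Hypothetical element-wise function: scale a value, then clip it to [-1, 1].
clipScaled = @(v) min(max(0.75 * v, -1), 1);

v = linspace(-2, 2, 9);
useGPU = gpuDeviceCount > 0;           % is a supported GPU available?
if useGPU
    v = gpuArray(v);                   % arrayfun on a gpuArray runs on the device
end

out = arrayfun(clipScaled, v);         % one function evaluation per element

if useGPU
    out = gather(out);                 % copy the result back to host memory
end
```

The third option, building kernels from CUDA code and PTX files, needs compiled files and is therefore not shown here.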
Webinar: “GPU Computing with MATLAB” http://www.mathworks.com/company/events/webinars/wbnr59816.html
Setting Up Cluster Computing (for System Admins)
[Diagram: desktop computer with Parallel Computing Toolbox connected through a scheduler to a computer cluster running MATLAB Distributed Computing Server]
MATLAB Distributed Computing Server
. All-product install
. Worker license per process
. License by packs: 8, 16, 32, 64, etc.
. No additional toolbox licenses needed
Why scale up to a cluster?
. Solve larger, computationally-intensive problems with more processing power
. Solve memory-intensive problems
. Schedule computations to offload from your local machine
Scheduling Work (batch)
[Diagram sequence: the local machine sends work to the scheduler, the scheduler distributes it to workers on the computer cluster, and the result is returned to the local machine]
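The batch workflow in the diagram can be sketched roughly as follows. The workload passed to batch is a trivial placeholder, and fetchOutputs assumes a release newer than the one these slides were written for.

```matlab
% Offload a function call to a worker: handle, number of outputs, inputs.
job = batch(@max, 1, {[3 7 5]});  % placeholder workload for illustration

% Control returns immediately; the client is free for other work here.
disp(job.State);                  % monitor progress, e.g. queued/running/finished

wait(job);                        % block only when the result is needed
out = fetchOutputs(job);          % cell array holding the function's outputs
peak = out{1};
delete(job);                      % clean up the job's stored data
```

With a cluster profile selected as the default, the same call sends the work to the cluster instead of local workers.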
Example: Scheduled Processing
. Offload processing to workers (local or cluster)
. Regain control of MATLAB after offloading
. Monitor progress of scheduled job
. Retrieve results from job
Damped spring oscillator: $m\ddot{x} + b\dot{x} + kx = 0$, with $m = 5$, $b = 1, 2, \ldots$, $k = 1, 2, \ldots$
. Sweep through different values of b and k
. Record peak value for each simulation
Distributing Large Data
[Figure: array of numbers partitioned across workers]
. Distributed array lives on the workers
. Remotely manipulate array from client MATLAB
Client-side Distributed Arrays and SPMD
. Client-side distributed arrays
– Class distributed
– Can be created and manipulated directly from the client
– Simpler access to memory on labs
– 100s of built-in functions that operate on distributed arrays, including client-side visualization functions
. spmd
– Block of code executed on workers
– Worker-specific commands
– Explicit communication between workers using MPI
– Mixture of parallel and serial code
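A short sketch of both styles, assuming a pool of workers is available (recent releases start one automatically; R2011b required matlabpool open first). The 8-by-8 magic square is a small stand-in for data that would normally be too large for one machine.

```matlab
% Client-side distributed array: data lives on the workers.
A = distributed(magic(8));        % small stand-in for a large array
colSums = sum(A, 1);              % built-in function executes on the workers
result = gather(colSums);         % bring the (small) result back to the client

% spmd: the same block runs on every worker, each with its own piece of A.
spmd
    localPart = getLocalPart(A);  % inside spmd, A is seen as codistributed
    localTotal = sum(localPart(:));
    total = gplus(localTotal);    % global sum via communication between workers
end
grandTotal = total{1};            % total is a Composite; read worker 1's copy
```

The distributed form needs no knowledge of how the array is partitioned, while the spmd block exposes each worker's local part and the communication explicitly.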
Summary
. Speed up parallel applications on desktop by using Parallel Computing Toolbox
. Take full advantage of CPU and GPU hardware
. Use MATLAB Distributed Computing Server to
– Scale up to clusters, grids and clouds
– Work with data that is too large to fit onto desktop computers
For more information
Visit www.mathworks.com/products/parallel-computing