An Introduction to Parallel Algorithms

Knut Mørken
Department of Informatics, Centre of Mathematics for Applications, University of Oslo
Winter School on Parallel Computing, Geilo, January 20–25, 2008

1. Introduction
2. Parallel addition
3. Computing the dot product in parallel
4. Parallel matrix multiplication
5. Solving linear systems of equations in parallel
6. Conclusion

Adding n numbers

Suppose we have n real numbers $(a_i)_{i=1}^{n}$ and want to compute their sum s.

In mathematics:

$$s = \sum_{i=1}^{n} a_i.$$

On a computer:

    s = 0;
    for i = 1, 2, ..., n
        s = s + a_i;

Algorithm

An algorithm is a precise prescription of how to accomplish a task.

Two important issues determine the character of an algorithm:

- Which operations are available to us?
- In which order can the operations be performed?
    - One at a time (sequentially).
    - Several at once (in parallel).
Primitive operations

We will assume that the basic operations available to us are:

- The four arithmetic operations (with two arguments)
- Comparison of numbers (if-tests)
- Elementary mathematical functions (trigonometric, exponential, logarithms, roots, ...)

We will measure computing time (complexity) by counting the number of time steps necessary to complete all the arithmetic operations of an algorithm.

Order of operations

In traditional (sequential) programming it is assumed that a computer can only perform one operation at a time.

Adding n numbers:

    s = 0;
    for i = 1, 2, ..., n
        s = s + a_i;

Mathematically, however, many problems exhibit considerable freedom in the order in which operations are performed.

We can add the n numbers in any order,

$$\sum_{i=1}^{n} a_i = a_{\pi_1} + a_{\pi_2} + \cdots + a_{\pi_n},$$

where $(\pi_1, \ldots, \pi_n)$ is any permutation of the integers $1, \ldots, n$. This can be exploited to speed up the addition if we have the possibility of performing several operations simultaneously.
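The sequential loop and the freedom to reorder the terms can be illustrated with a small Python sketch. The function name `sequential_sum`, the step counter, and the sample input are assumptions made for illustration; they are not part of the slides. Exact integers are used because floating-point addition can give slightly different results for different orderings due to rounding.

```python
import random


def sequential_sum(a):
    """Add the numbers one at a time; return (sum, number of additions)."""
    s = 0
    steps = 0
    for x in a:
        s = s + x      # one primitive operation per time step
        steps += 1
    return s, steps


numbers = list(range(1, 11))          # hypothetical input with n = 10
permuted = numbers[:]
random.Random(0).shuffle(permuted)    # an arbitrary permutation of the input

total, steps = sequential_sum(numbers)
total_permuted, _ = sequential_sum(permuted)
print(total, steps, total_permuted)
```

The sequential version needs one time step per number, and the permuted input gives the same sum, which is the property a parallel scheme can exploit.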
Parallel addition

Suppose that n = 10. Then we have

$$
\begin{aligned}
s &= \underbrace{a_1 + a_2}_{a_1^1} + \underbrace{a_3 + a_4}_{a_2^1} + \underbrace{a_5 + a_6}_{a_3^1} + \underbrace{a_7 + a_8}_{a_4^1} + \underbrace{a_9 + a_{10}}_{a_5^1} \\
  &= \underbrace{a_1^1 + a_2^1}_{a_1^2} + \underbrace{a_3^1 + a_4^1}_{a_2^2} + \underbrace{a_5^1}_{a_3^2} \\
  &= \underbrace{a_1^2 + a_2^2}_{a_1^3} + \underbrace{a_3^2}_{a_2^3} \\
  &= a_1^3 + a_2^3.
\end{aligned}
$$

This means that if we have 5 computing units at our disposal, we can add the 10 numbers in 4 time steps. In general, this technique allows us to add n numbers in $\lceil \log_2 n \rceil$ time steps if we have n/2 computing units.

Adding n numbers:

    a^0 = a;
    for j = 1, 2, ..., ⌈log2 n⌉
        parallel: a^j_{i/2} = a^{j-1}_{i-1} + a^{j-1}_i,  i = 2, 4, 6, ..., |a^{j-1}|;
        if |a^{j-1}| % 2 > 0 then a^j_{-1} = a^{j-1}_{-1};
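The level-by-level scheme can be simulated in Python as a rough sketch. It assumes that the index −1 in the pseudocode refers to the last element of a level, so that an unpaired element is carried over unchanged. The name `tree_sum` and the returned counters are illustrative. The additions inside one level are independent and could run on separate computing units; here they are simply evaluated in a list comprehension.

```python
def tree_sum(a):
    """Pairwise (tree) summation.

    Returns (sum, time steps, largest number of simultaneous additions).
    """
    level = list(a)
    if not level:
        raise ValueError("need at least one number")
    steps = 0
    max_units = 0
    while len(level) > 1:
        # All additions in this comprehension are independent of each other,
        # so they count as a single parallel time step.
        pairs = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        max_units = max(max_units, len(pairs))
        if len(level) % 2 == 1:
            pairs.append(level[-1])   # odd element moves up unchanged
        level = pairs
        steps += 1
    return level[0], steps, max_units


print(tree_sum(range(1, 11)))
```

For the ten numbers 1, ..., 10 this gives the sum 55 in 4 steps with at most 5 additions at once, which agrees with the count in the n = 10 example.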
There is a recursive version of the previous algorithm where each function call is given to a new processor.

Adding n numbers (recursive version), called as sum(a, 1, n):

    sum(a, lo, hi) {
        if hi > lo then {
            mid = (lo + hi) // 2;
            return(sum(a, lo, mid) + sum(a, mid + 1, hi));
        }
        else return(a_lo);
    }

Parallel computing in practice

Problem: Add 1000 integers, each with 5 digits.

Resources: This group of people.

Method 1: While more than one number:

1. I share out the numbers evenly, at least two numbers each
2. You add your numbers
3. You pass your results back to me
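Method 1 can be simulated with a short Python sketch. The group size of 30 people, the round-robin way of sharing out the numbers, and the name `method_one` are assumptions chosen for illustration; the slides do not specify them.

```python
import random


def method_one(numbers, people):
    """Simulate Method 1; return (sum, number of share/add/collect rounds)."""
    values = list(numbers)
    if not values:
        return 0, 0
    rounds = 0
    while len(values) > 1:
        # Use at most `people` workers and give each at least two numbers.
        workers = max(1, min(people, len(values) // 2))
        # Step 1: share the numbers out as evenly as possible (round-robin).
        shares = [values[w::workers] for w in range(workers)]
        # Steps 2 and 3: each worker adds its share and passes the result back.
        values = [sum(share) for share in shares]
        rounds += 1
    return values[0], rounds


rng = random.Random(0)
data = [rng.randint(10000, 99999) for _ in range(1000)]  # 1000 five-digit integers
total, rounds = method_one(data, people=30)
print(total, rounds)
```

With these assumptions the number of values shrinks from 1000 to 30, 15, 7, 3 and finally 1, so the loop finishes after five rounds. A real group would also spend time handing numbers back and forth, which this sketch does not model.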
Dot product

If $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$ then

$$a \cdot b = \sum_{i=1}^{n} a_i b_i.$$

Provided we have n processors this can be computed in $\lceil \log_2 n \rceil + 1$ time steps:

Inner product

1. Compute the n products in parallel.
2. Compute the sum.

Matrix multiplication

Let $A \in \mathbb{R}^{n,n}$ and $B \in \mathbb{R}^{n,n}$.
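Returning to the two-step inner product scheme on the dot product slide (n products in parallel, then the sum), the step count can be checked with a small Python simulation. The function name `parallel_dot` and the sample vectors are illustrative assumptions. The products and the additions within each level are independent, so each list comprehension stands for one parallel time step.

```python
import math


def parallel_dot(a, b):
    """Dot product as one parallel multiply step plus a pairwise sum.

    Returns (dot product, number of parallel time steps).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    # Step 1: the n products are independent, so they take one time step.
    level = [x * y for x, y in zip(a, b)]
    steps = 1
    # Step 2: pairwise summation, one time step per level.
    while len(level) > 1:
        pairs = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            pairs.append(level[-1])   # unpaired element moves up unchanged
        level = pairs
        steps += 1
    return level[0], steps


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
value, steps = parallel_dot(a, b)
bound = math.ceil(math.log2(len(a))) + 1
print(value, steps, bound)
```

For these vectors of length 4 the simulation uses 3 time steps, equal to the bound $\lceil \log_2 n \rceil + 1$.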
