An introduction to algorithms

Knut Mørken

Department of Informatics Centre of for Applications University of Oslo

Winter School on Parallel Computing Geilo January 20–25, 2008

1/26 1 Introduction

2 Parallel

3 Computing the dot product in parallel

4 Parallel

5 Solving linear systems of equations in parallel

6 Conclusion

2/26 On a s = 0; for i = 1, 2, . . . , n s = s + ai ;

Adding n numbers

n Suppose we have n real numbers (ai )i=1 and want to compute their sum s. In mathematics n X s = ai . i=1

3/26 Adding n numbers

n Suppose we have n real numbers (ai )i=1 and want to compute their sum s. In mathematics n X s = ai . i=1

On a computer s = 0; for i = 1, 2, . . . , n s = s + ai ;

3/26 One at a time (sequentially). Several at once (in parallel).

Which operations are available to us? In which order can the operations be performed?

Two important issues determine the character of an algorithm:

Algorithm

An algorithm is a precise prescription of how to accomplish a task.

4/26 One at a time (sequentially). Several at once (in parallel).

In which order can the operations be performed?

Algorithm

An algorithm is a precise prescription of how to accomplish a task. Two important issues determine the character of an algorithm: Which operations are available to us?

4/26 One at a time (sequentially). Several at once (in parallel).

Algorithm

An algorithm is a precise prescription of how to accomplish a task. Two important issues determine the character of an algorithm: Which operations are available to us? In which order can the operations be performed?

4/26 Several at once (in parallel).

Algorithm

An algorithm is a precise prescription of how to accomplish a task. Two important issues determine the character of an algorithm: Which operations are available to us? In which order can the operations be performed? One at a time (sequentially).

4/26 Algorithm

An algorithm is a precise prescription of how to accomplish a task. Two important issues determine the character of an algorithm: Which operations are available to us? In which order can the operations be performed? One at a time (sequentially). Several at once (in parallel).

4/26 We will measure computing time (complexity) by counting the number of time steps necessary to complete all the arithmetic operations of an algorithm.

Primitive operations

We will assume that the basic operations available to us are The four arithmetic operations (with two arguments) Comparison of numbers (if-tests) Elementary mathematical functions (trigonometric, exponential, logarithms, roots, . . . )

5/26 Primitive operations

We will assume that the basic operations available to us are The four arithmetic operations (with two arguments) Comparison of numbers (if-tests) Elementary mathematical functions (trigonometric, exponential, logarithms, roots, . . . )

We will measure computing time (complexity) by counting the number of time steps necessary to complete all the arithmetic operations of an algorithm.

5/26 Mathematically, however, many problems exhibit considerable freedom in the order in which operations are performed.

Order of operations

In traditional (sequential) programming it is assumed that a computer can only perform one operation at a time. Adding n numbers s = 0; for i = 1, 2, . . . , n s = s + ai ;

6/26

In traditional (sequential) programming it is assumed that a computer can only perform one operation at a time. Adding n numbers s = 0; for i = 1, 2, . . . , n s = s + ai ;

Mathematically, however, many problems exhibit considerable freedom in the order in which operations are performed.

6/26 Order of operations

We can add the n numbers in any order

n X ai = aπ1 + aπ2 + ··· + aπn i=1

where (π1, . . . , πn) is any permutation of the integers 1, . . . , n. This can be exploited to speed up the addition if we have the possibility of performing several operations simultaneously.

7/26 Parallel addition

Suppose that n = 10. Then we have

s = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10,

8/26 Parallel addition

Suppose that n = 10. Then we have

s = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10, | {z } | {z } | {z } | {z } | {z } 1 1 1 1 1 a1 a2 a3 a4 a5

8/26 Parallel addition

Suppose that n = 10. Then we have

s = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10, | {z } | {z } | {z } | {z } | {z } 1 1 1 1 1 a1 a2 a3 a4 a5 1 1 1 1 1 = a1 + a2 + a3 + a4 + a5,

8/26 Parallel addition

Suppose that n = 10. Then we have

s = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10, | {z } | {z } | {z } | {z } | {z } 1 1 1 1 1 a1 a2 a3 a4 a5 1 1 1 1 1 = a1 + a2 + a3 + a4 + a5 , | {z } | {z } |{z} 2 2 2 a1 a2 a3

8/26 Parallel addition

Suppose that n = 10. Then we have

s = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10, | {z } | {z } | {z } | {z } | {z } 1 1 1 1 1 a1 a2 a3 a4 a5 1 1 1 1 1 = a1 + a2 + a3 + a4 + a5 , | {z } | {z } |{z} 2 2 2 a1 a2 a3 2 2 2 = a1 + a2 + a3,

8/26 Parallel addition

Suppose that n = 10. Then we have

s = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10, | {z } | {z } | {z } | {z } | {z } 1 1 1 1 1 a1 a2 a3 a4 a5 1 1 1 1 1 = a1 + a2 + a3 + a4 + a5 , | {z } | {z } |{z} 2 2 2 a1 a2 a3 2 2 2 = a1 + a2 + a3 , | {z } |{z} 3 3 a1 a2

8/26 Parallel addition

Suppose that n = 10. Then we have

s = a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10, | {z } | {z } | {z } | {z } | {z } 1 1 1 1 1 a1 a2 a3 a4 a5 1 1 1 1 1 = a1 + a2 + a3 + a4 + a5 , | {z } | {z } |{z} 2 2 2 a1 a2 a3 2 2 2 = a1 + a2 + a3 , | {z } |{z} 3 3 a1 a2 3 3 = a1 + a2.

8/26 Adding n numbers a0 = a; for j = 1, 2, . . . , dlog2 ne j j−1 j−1 j−1 parallel: ai/2 = ai−1 + ai , i = 2, 4, 6, . . . , |a |; j−1 j j−1 if |a | % 2 > 0 then a−1 = a−1 ;

Parallel addition

This means that if we have 5 computing units at our disposal, we can add the 10 numbers in 4 time steps.

In general, this technique allows us to add n numbers in dlog2 ne time steps if we have n/2 computing units.

9/26 j−1 j j−1 if |a | % 2 > 0 then a−1 = a−1 ;

Parallel addition

This means that if we have 5 computing units at our disposal, we can add the 10 numbers in 4 time steps.

In general, this technique allows us to add n numbers in dlog2 ne time steps if we have n/2 computing units. Adding n numbers a0 = a; for j = 1, 2, . . . , dlog2 ne j j−1 j−1 j−1 parallel: ai/2 = ai−1 + ai , i = 2, 4, 6, . . . , |a |;

9/26 Parallel addition

This means that if we have 5 computing units at our disposal, we can add the 10 numbers in 4 time steps.

In general, this technique allows us to add n numbers in dlog2 ne time steps if we have n/2 computing units. Adding n numbers a0 = a; for j = 1, 2, . . . , dlog2 ne j j−1 j−1 j−1 parallel: ai/2 = ai−1 + ai , i = 2, 4, 6, . . . , |a |; j−1 j j−1 if |a | % 2 > 0 then a−1 = a−1 ;

9/26 Parallel addition

There is a recursive version of the previous algorithm where each call is given to a new processor. Adding n numbers (recursive version) sum(a, n) { if n > 1 then return(sum(a, 1, n//2) + sum(a, n//2 + 1, n)) else return(a1); }

10/26 Resources This group of people.

Method 1 While more than one number: 1 I share out the numbers evenly, at least two numbers each 2 You add your numbers 3 You pass your results back to me

Parallel computing in practice

Problem Add 1000 integers, each with 5 digits.

11/26 Method 1 While more than one number: 1 I share out the numbers evenly, at least two numbers each 2 You add your numbers 3 You pass your results back to me

Parallel computing in practice

Problem Add 1000 integers, each with 5 digits.

Resources This group of people.

11/26 Parallel computing in practice

Problem Add 1000 integers, each with 5 digits.

Resources This group of people.

Method 1 While more than one number: 1 I share out the numbers evenly, at least two numbers each 2 You add your numbers 3 You pass your results back to me

11/26 Provided we have n processors this can be computed in dlog2 ne + 1 time steps: Inner product 1 Compute the n products in parallel. 2 Compute the sum.

Dot product

If a = (a1, a2,..., an) and b = (b1, b2,..., bn) then

n X a · b = ai bi . i=1

12/26 Dot product

If a = (a1, a2,..., an) and b = (b1, b2,..., bn) then

n X a · b = ai bi . i=1 Provided we have n processors this can be computed in dlog2 ne + 1 time steps: Inner product 1 Compute the n products in parallel. 2 Compute the sum.

12/26 Matrix multiplication

n,n n,n Let A ∈ R and B ∈ R . Then the product AB is a matrix in n,n R which is given by   a1   a2   AB =  .  b1 b2 ··· bn  .   .  an   a1 · b1 a1 · b2 ··· a1 · bn   a2 · b1 a2 · b2 ··· a2 · bn =  . . .  .  . . ..   . . ···  an · b1 an · b2 ··· an · bn

13/26 Since all the n2 dot products in the matrix product are independent, we can also compute AB in

dlog2 ne + 1

time steps, provided we may use n3 processors.

Parallel matrix multiplication

Recall that one dot product of length n can be computed in dlog2 ne + 1 time steps provided we have n processors at our disposal.

14/26 Parallel matrix multiplication

Recall that one dot product of length n can be computed in dlog2 ne + 1 time steps provided we have n processors at our disposal. Since all the n2 dot products in the matrix product are independent, we can also compute AB in

dlog2 ne + 1

time steps, provided we may use n3 processors.

14/26 Accumulation of the dot product on one processor requires n time steps, provided we have n2 processors.

Parallel matrix multiplication

Alternative algorithm Give each of the n2 dot products in   a1 · b1 a1 · b2 ··· a1 · bn   a2 · b1 a2 · b2 ··· a2 · bn  . . .  .  . . ..   . . ···  an · b1 an · b2 ··· an · bn

to different processors.

15/26 Parallel matrix multiplication

Alternative algorithm Give each of the n2 dot products in   a1 · b1 a1 · b2 ··· a1 · bn   a2 · b1 a2 · b2 ··· a2 · bn  . . .  .  . . ..   . . ···  an · b1 an · b2 ··· an · bn

to different processors. Accumulation of the dot product on one processor requires n time steps, provided we have n2 processors.

15/26 can be computed by first calculating

H1 = A1,1(B1,2 − B2,2), H5 = (A1,1 + A2,2)(B1,1 + B2,2),

H2 = A2,2(B2,1 − B1,1), H6 = (A1,2 − A2,2)(B2,1 + B2,2),

H3 = (A2,1 + A2,2)B1,1, H7 = (A1,1 − A2,1)(B1,1 + B1,2).

H4 = a2,2(B2,1 − B1,1),

Then

C 1,1 = H4 + H5 + H6 − H2, C 2,1 = H2,1 + H2,2,

C 1,2 = H1,1 + H1,2, C 2,2 = H1 + H1 − H3 − H7.

Strassen’s algorithm (sequential)

The product of the two block matrices ! ! ! C C A A B B 1,1 1,2 = 1,1 1,2 1,1 1,2 C 2,1 C 2,2 A2,1 A2,2 B2,1 B2,2

16/26 H5 = (A1,1 + A2,2)(B1,1 + B2,2),

H6 = (A1,2 − A2,2)(B2,1 + B2,2),

H7 = (A1,1 − A2,1)(B1,1 + B1,2).

Then

C 1,1 = H4 + H5 + H6 − H2, C 2,1 = H2,1 + H2,2,

C 1,2 = H1,1 + H1,2, C 2,2 = H1 + H1 − H3 − H7.

Strassen’s algorithm (sequential)

The product of the two block matrices ! ! ! C C A A B B 1,1 1,2 = 1,1 1,2 1,1 1,2 C 2,1 C 2,2 A2,1 A2,2 B2,1 B2,2

can be computed by first calculating

H1 = A1,1(B1,2 − B2,2),

H2 = A2,2(B2,1 − B1,1),

H3 = (A2,1 + A2,2)B1,1,

H4 = a2,2(B2,1 − B1,1),

16/26 Then

C 1,1 = H4 + H5 + H6 − H2, C 2,1 = H2,1 + H2,2,

C 1,2 = H1,1 + H1,2, C 2,2 = H1 + H1 − H3 − H7.

Strassen’s algorithm (sequential)

The product of the two block matrices ! ! ! C C A A B B 1,1 1,2 = 1,1 1,2 1,1 1,2 C 2,1 C 2,2 A2,1 A2,2 B2,1 B2,2

can be computed by first calculating

H1 = A1,1(B1,2 − B2,2), H5 = (A1,1 + A2,2)(B1,1 + B2,2),

H2 = A2,2(B2,1 − B1,1), H6 = (A1,2 − A2,2)(B2,1 + B2,2),

H3 = (A2,1 + A2,2)B1,1, H7 = (A1,1 − A2,1)(B1,1 + B1,2).

H4 = a2,2(B2,1 − B1,1),

16/26 Strassen’s algorithm (sequential)

The product of the two block matrices ! ! ! C C A A B B 1,1 1,2 = 1,1 1,2 1,1 1,2 C 2,1 C 2,2 A2,1 A2,2 B2,1 B2,2

can be computed by first calculating

H1 = A1,1(B1,2 − B2,2), H5 = (A1,1 + A2,2)(B1,1 + B2,2),

H2 = A2,2(B2,1 − B1,1), H6 = (A1,2 − A2,2)(B2,1 + B2,2),

H3 = (A2,1 + A2,2)B1,1, H7 = (A1,1 − A2,1)(B1,1 + B1,2).

H4 = a2,2(B2,1 − B1,1),

Then

C 1,1 = H4 + H5 + H6 − H2, C 2,1 = H2,1 + H2,2,

C 1,2 = H1,1 + H1,2, C 2,2 = H1 + H1 − H3 − H7.

16/26 More elaborate algorithms exist that have complexity O(n2.376) (Coppersmith-Winograd).

Strassen’s algorithm

Strassen’s algorithm may be applied recursively to multiply together any two matrices. Theorem (Complexity of Strassen’s algorithm) If A and B are n × n matrices, then the (sequential) complexity of Strassen’s algorithm is O(n2.71).

17/26 Strassen’s algorithm

Strassen’s algorithm may be applied recursively to multiply together any two matrices. Theorem (Complexity of Strassen’s algorithm) If A and B are n × n matrices, then the (sequential) complexity of Strassen’s algorithm is O(n2.71).

More elaborate algorithms exist that have complexity O(n2.376) (Coppersmith-Winograd).

17/26 Complexity of matrix algorithms

Theorem (Equivalence of matrix operations) Matrix multiplication, computation of determinants, matrix inversion and solution of a linear system of equations all have the same computational complexity (sequential and parallel).

18/26 Solving linear systems of equations

The standard Gaussian elimination algorithm is a bit cumbersome to parallelise, we therefore consider a classical iterative algorithm instead.

19/26 If ai,i 6= 0 for i = 1, . . . , n, then we can solve equation i for xi ,

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1,

x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2, . .

xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n.

Solving linear systems of equations

Suppose we have the linear system of equations

a1,1x1 + a1,2x2 + ··· + a1,nxn = b1,

a2,1x1 + a2,2x2 + ··· + a2,nxn = b2, . .

an,1x1 + an,2x2 + ··· + an,nxn = bn.

20/26 x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2, . .

xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n.

Solving linear systems of equations

Suppose we have the linear system of equations

a1,1x1 + a1,2x2 + ··· + a1,nxn = b1,

a2,1x1 + a2,2x2 + ··· + a2,nxn = b2, . .

an,1x1 + an,2x2 + ··· + an,nxn = bn.

If ai,i 6= 0 for i = 1, . . . , n, then we can solve equation i for xi ,

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1,

20/26 xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n.

Solving linear systems of equations

Suppose we have the linear system of equations

a1,1x1 + a1,2x2 + ··· + a1,nxn = b1,

a2,1x1 + a2,2x2 + ··· + a2,nxn = b2, . .

an,1x1 + an,2x2 + ··· + an,nxn = bn.

If ai,i 6= 0 for i = 1, . . . , n, then we can solve equation i for xi ,

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1,

x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2, . .

20/26 Solving linear systems of equations

Suppose we have the linear system of equations

a1,1x1 + a1,2x2 + ··· + a1,nxn = b1,

a2,1x1 + a2,2x2 + ··· + a2,nxn = b2, . .

an,1x1 + an,2x2 + ··· + an,nxn = bn.

If ai,i 6= 0 for i = 1, . . . , n, then we can solve equation i for xi ,

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1,

x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2, . .

xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n.

20/26 Under suitable conditions on the coefficient matrix these iterations will converge to the true solution. Note that if the calculations are performed sequentially, we may make use of the new values of x1, x2,..., xi−1 when we compute xi ; this is called Gaus-Seidel iteration and converges faster than Jacobi iteration.

Solving linear systems of equations

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1

x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2 . .

xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n If we choose an initial estimate x0 for the solution, we can use these equations to compute a new estimate x1, then x2 etc. This is called Jacobi’s method.

21/26 Note that if the calculations are performed sequentially, we may make use of the new values of x1, x2,..., xi−1 when we compute xi ; this is called Gaus-Seidel iteration and converges faster than Jacobi iteration.

Solving linear systems of equations

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1

x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2 . .

xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n If we choose an initial estimate x0 for the solution, we can use these equations to compute a new estimate x1, then x2 etc. This is called Jacobi’s method. Under suitable conditions on the coefficient matrix these iterations will converge to the true solution.

21/26 Solving linear systems of equations

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1

x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2 . .

xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n If we choose an initial estimate x0 for the solution, we can use these equations to compute a new estimate x1, then x2 etc. This is called Jacobi’s method. Under suitable conditions on the coefficient matrix these iterations will converge to the true solution. Note that if the calculations are performed sequentially, we may make use of the new values of x1, x2,..., xi−1 when we compute xi ; this is called Gaus-Seidel iteration and converges faster than Jacobi iteration.

21/26 Each of n processors is given the task of computing one xi . Requires n time steps per iteration.

Jacobi’s method in parallel

Jacobi’s method is perfect for parallel implementation:

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1

x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2 . .

xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n

22/26 Jacobi’s method in parallel

Jacobi’s method is perfect for parallel implementation:

x1 = (b1 − a1,2x2 − a1,3x3 − · · · − a1,nxn)/a1,1

x2 = (b2 − a2,1x1 − a2,3x3 − · · · − a2,nxn)/a2,2 . .

xn = (bn − an,1x1 − an,2x2 − · · · − an,n−1xn−1)/an,n

Each of n processors is given the task of computing one xi . Requires n time steps per iteration.

22/26 1 Set j1 = 1000i + 1 and j2 = 1000(i + 1)

2 Give equations j1,..., j2 to the processors 1 1 1 1 0 0 3 Compute x ,..., x from x ,..., x , x ,..., x j1 j2 1 j1−1 j1 n

1 For i = 1, 2, . . . , 100:

First iteration:

Repeat until convergence.

Hybrid Jacobi/Gaus-Seidel

Suppose the system has dimension 100 000 and we only have 1000 processors. Assume that an initial solution x0 is given.

23/26 1 Set j1 = 1000i + 1 and j2 = 1000(i + 1)

2 Give equations j1,..., j2 to the processors 1 1 1 1 0 0 3 Compute x ,..., x from x ,..., x , x ,..., x j1 j2 1 j1−1 j1 n

Repeat until convergence.

Hybrid Jacobi/Gaus-Seidel

Suppose the system has dimension 100 000 and we only have 1000 processors. Assume that an initial solution x0 is given. First iteration:

1 For i = 1, 2, . . . , 100:

23/26 2 Give equations j1,..., j2 to the processors 1 1 1 1 0 0 3 Compute x ,..., x from x ,..., x , x ,..., x j1 j2 1 j1−1 j1 n

Repeat until convergence.

Hybrid Jacobi/Gaus-Seidel

Suppose the system has dimension 100 000 and we only have 1000 processors. Assume that an initial solution x0 is given. First iteration:

1 For i = 1, 2, . . . , 100:

1 Set j1 = 1000i + 1 and j2 = 1000(i + 1)

23/26 1 1 1 1 0 0 3 Compute x ,..., x from x ,..., x , x ,..., x j1 j2 1 j1−1 j1 n

Repeat until convergence.

Hybrid Jacobi/Gaus-Seidel

Suppose the system has dimension 100 000 and we only have 1000 processors. Assume that an initial solution x0 is given. First iteration:

1 For i = 1, 2, . . . , 100:

1 Set j1 = 1000i + 1 and j2 = 1000(i + 1)

2 Give equations j1,..., j2 to the processors

23/26 Repeat until convergence.

Hybrid Jacobi/Gaus-Seidel

Suppose the system has dimension 100 000 and we only have 1000 processors. Assume that an initial solution x0 is given. First iteration:

1 For i = 1, 2, . . . , 100:

1 Set j1 = 1000i + 1 and j2 = 1000(i + 1)

2 Give equations j1,..., j2 to the processors 1 1 1 1 0 0 3 Compute x ,..., x from x ,..., x , x ,..., x j1 j2 1 j1−1 j1 n

23/26 Repeat until convergence.

Hybrid Jacobi/Gaus-Seidel

Suppose the system has dimension 100 000 and we only have 1000 processors. Assume that an initial solution x0 is given. First iteration:

1 For i = 1, 2, . . . , 100:

1 Set j1 = 1000i + 1 and j2 = 1000(i + 1)

2 Give equations j1,..., j2 to the processors 1 1 1 1 0 0 3 Compute x ,..., x from x ,..., x , x ,..., x j1 j2 1 j1−1 j1 n

23/26 Hybrid Jacobi/Gaus-Seidel

Suppose the system has dimension 100 000 and we only have 1000 processors. Assume that an initial solution x0 is given. First iteration:

1 For i = 1, 2, . . . , 100:

1 Set j1 = 1000i + 1 and j2 = 1000(i + 1)

2 Give equations j1,..., j2 to the processors 1 1 1 1 0 0 3 Compute x ,..., x from x ,..., x , x ,..., x j1 j2 1 j1−1 j1 n

Repeat until convergence.

23/26 Often other algorithms than the traditional sequential ones are the most efficient. The actual choice of algorithm is usually highly dependent on the type of processor and how memory is organised.

Conclusion

The number of time steps in many (mathematical) operations can be reduced considerably if multiple operations can be performed simultaneously.

24/26 The actual choice of algorithm is usually highly dependent on the type of processor and how memory is organised.

Conclusion

The number of time steps in many (mathematical) operations can be reduced considerably if multiple operations can be performed simultaneously. Often other algorithms than the traditional sequential ones are the most efficient.

24/26 Conclusion

The number of time steps in many (mathematical) operations can be reduced considerably if multiple operations can be performed simultaneously. Often other algorithms than the traditional sequential ones are the most efficient. The actual choice of algorithm is usually highly dependent on the type of processor and how memory is organised.

24/26 Resources This group of people.

Method 1 While more than one number: 1 I share out the numbers evenly, at least two numbers each 2 You add your numbers 3 You pass your results back to me

Parallel computing in practice

Problem Add 1000 integers, each with 5 digits.

25/26 Method 1 While more than one number: 1 I share out the numbers evenly, at least two numbers each 2 You add your numbers 3 You pass your results back to me

Parallel computing in practice

Problem Add 1000 integers, each with 5 digits.

Resources This group of people.

25/26 Parallel computing in practice

Problem Add 1000 integers, each with 5 digits.

Resources This group of people.

Method 1 While more than one number: 1 I share out the numbers evenly, at least two numbers each 2 You add your numbers 3 You pass your results back to me

25/26 1 You add your numbers 2 You pass your result to someone more important than you

2 While more than one active computer:

3 I receive the result from my top assistant

The intelligence and communication skills of our processors are important, not just their computational skills.

Parallel computing in practice

Method (intelligent) Assumption: You are arranged in a strict hierarchy 1 I share out the numbers evenly

26/26 1 You add your numbers 2 You pass your result to someone more important than you 3 I receive the result from my top assistant

The intelligence and communication skills of our processors are important, not just their computational skills.

Parallel computing in practice

Method (intelligent) Assumption: You are arranged in a strict hierarchy 1 I share out the numbers evenly 2 While more than one active computer:

26/26 2 You pass your result to someone more important than you 3 I receive the result from my top assistant

The intelligence and communication skills of our processors are important, not just their computational skills.

Parallel computing in practice

Method (intelligent) Assumption: You are arranged in a strict hierarchy 1 I share out the numbers evenly 2 While more than one active computer: 1 You add your numbers

26/26 3 I receive the result from my top assistant

The intelligence and communication skills of our processors are important, not just their computational skills.

Parallel computing in practice

Method (intelligent) Assumption: You are arranged in a strict hierarchy 1 I share out the numbers evenly 2 While more than one active computer: 1 You add your numbers 2 You pass your result to someone more important than you

26/26 The intelligence and communication skills of our processors are important, not just their computational skills.

Parallel computing in practice

Method (intelligent) Assumption: You are arranged in a strict hierarchy 1 I share out the numbers evenly 2 While more than one active computer: 1 You add your numbers 2 You pass your result to someone more important than you 3 I receive the result from my top assistant

26/26 The intelligence and communication skills of our processors are important, not just their computational skills.

Parallel computing in practice

Method (intelligent) Assumption: You are arranged in a strict hierarchy 1 I share out the numbers evenly 2 While more than one active computer: 1 You add your numbers 2 You pass your result to someone more important than you 3 I receive the result from my top assistant

26/26 Parallel computing in practice

Method (intelligent) Assumption: You are arranged in a strict hierarchy 1 I share out the numbers evenly 2 While more than one active computer: 1 You add your numbers 2 You pass your result to someone more important than you 3 I receive the result from my top assistant

The intelligence and communication skills of our processors are important, not just their computational skills.

26/26