NJIT Computer Science Dept CS650

CS650 Computer Architecture

Lecture 10 Introduction to Multiprocessors and PC Clustering

Andrew Sohn Computer Science Department New Jersey Institute of Technology

Lecture 10: Intro to Multiprocessors/Clustering 1/15 12/7/2003 A. Sohn

NJIT Computer Science Dept CS650 Computer Architecture Key Issues

Run PhotoShop on 1 PC or N PCs

Programmability • How to program a bunch of PCs viewed as a single logical machine.

Performance - • Run PhotoShop on 1 PC (forget the specs of this PC) • Run PhotoShop on N PCs

• Will it run faster on N PCs? Speedup = ?

Lecture 10: Intro to Multiprocessors/Clustering 2/15 12/7/2003 A. Sohn NJIT Computer Science Dept CS650 Computer Architecture Types of Multiprocessors Key: Data and Instruction

Single Instruction Single Data (SISD) • Intel processors, AMD processors Single Instruction Multiple Data (SIMD) • Array • Pentium MMX feature Multiple Instruction Single Data (MISD) • • Special purpose machines Multiple Instruction Multiple Data (MIMD) • Typical multiprocessors (Sun, SGI, ,...)

Single Program Multiple Data (SPMD) • Programming model

Lecture 10: Intro to Multiprocessors/Clustering 3/15 12/7/2003 A. Sohn

NJIT Computer Science Dept CS650 Computer Architecture Shared-Memory Multiprocessor

Processor Prcessor Prcessor

Interconnection network

Main Memory Storage I/O

Lecture 10: Intro to Multiprocessors/Clustering 4/15 12/7/2003 A. Sohn NJIT Computer Science Dept CS650 Computer Architecture Distributed-Memory Multiprocessor

Processor Processor Processor

IO/S MM IO/S MM IO/S MM

Interconnection network

IO/S MM IO/S MM IO/S MM

Processor Processor Processor

Lecture 10: Intro to Multiprocessors/Clustering 5/15 12/7/2003 A. Sohn

NJIT Computer Science Dept CS650 Computer Architecture Key Issues

Programmability • Intel Pentium, AMD machines • Performance (Scalability) - Speedup • Run PhotoShop on 1 machine, or n machines • Execution time on 1 machine = x • Execution time on n machines = y • Speedup = x/y = the best speedup = n (often higher due to vari- ous reasons including cache behavior)

Lecture 10: Intro to Multiprocessors/Clustering 6/15 12/7/2003 A. Sohn NJIT Computer Science Dept CS650 Computer Architecture Parallelization of Applications

Parallel portion 80%

Serial portion 20%

Lecture 10: Intro to Multiprocessors/Clustering 7/15 12/7/2003 A. Sohn

NJIT Computer Science Dept CS650 Computer Architecture Parallelization for 4 Processors

Consider a loop with 100 iterations:

for (i=0;i<100;i++) a[i] = b[i] + c[i];

The loop has no loop-carried dependence, straight forward to parallelize the code for 4 processors. Each processor executes 1/4 of the loop, i.e. 25 iterations as fol- lows:

P0: for (i=0;i<25;i++) a[i] = b[i] + c[i]; P1: for (i=25;i<50;i++) a[i] = b[i] + c[i]; P2: for (i=50;i<75;i++) a[i] = b[i] + c[i]; P3: for (i=75;i<100;i++) a[i] = b[i] + c[i];

Lecture 10: Intro to Multiprocessors/Clustering 8/15 12/7/2003 A. Sohn NJIT Computer Science Dept CS650 Computer Architecture Shared-Memory Multiprocessor

Processor Prcessor Prcessor

Interconnection network

for (i=0;i<100;i++) a[i] = b[i] + c[i]; Storage I/O

Main Memory

Lecture 10: Intro to Multiprocessors/Clustering 9/15 12/7/2003 A. Sohn

NJIT Computer Science Dept CS650 Computer Architecture Distributed-Memory Multiprocessor

Processor Processor

for (i=0;i<25;i++) for (i=25;i<50;i++) IO/S a[i] = b[i] + c[i]; IO/S a[i] = b[i] + c[i];

Interconnection network

for (i=50;i<75;i++) for (i=75;i<100;i++) IO/S a[i] = b[i] + c[i]; IO/S a[i] = b[i] + c[i];

Processor Processor

Lecture 10: Intro to Multiprocessors/Clustering 10/15 12/7/2003 A. Sohn NJIT Computer Science Dept CS650 Computer Architecture Amdahl’s Law Maximum speedup is limited by the serial fraction of a program • N = the number of processors, • s = the time spent by a processor on serial part of a program, • p = the time spent by a processor on parallel part of a program • t = total time = s + p = 1

Maximum possible speedup is given by: sp+ 1 1 D ==------=------p p 1 s + ---- 1 – p + ---- 1 – p⎛⎞1 – ---- N N ⎝⎠N 1 1 – ---- D p = ------1 1 – ---- N

Lecture 10: Intro to Multiprocessors/Clustering 11/15 12/7/2003 A. Sohn

NJIT Computer Science Dept CS650 Computer Architecture Amdahl’s Law - Example

N = 10 PCs, the speedup to achieve D = 8, the portion of program to be parallelized will be

1 1 1 – ---- 1 – --- D 8 0.875 p ==------==------0.972 1 1 0.9 1 – ---- 1 – ------N 10

What do these numbers mean? want to achieve 8 fold speedup on 10 PCs --> 97.2% of the entire code has to be parallelized!

Lecture 10: Intro to Multiprocessors/Clustering 12/15 12/7/2003 A. Sohn NJIT Computer Science Dept CS650 Computer Architecture Amdahl’s Law - Example main() { ...... Parallel portion 97.2% ...... Serial portion 2.8% ...... }

Lecture 10: Intro to Multiprocessors/Clustering 13/15 12/7/2003 A. Sohn

NJIT Computer Science Dept CS650 Computer Architecture Clustering of PCs

• A Poor Man’s • Linux box clustering • MS Windows clustering • For scalability and high availability

• Mail server clustering: MS Exchange • DB server clustering - MS SQL server, MySQL sever, ... • VPN box clustering • Firewall clustering • Filtering clustering • Web clustering

Lecture 10: Intro to Multiprocessors/Clustering 14/15 12/7/2003 A. Sohn NJIT Computer Science Dept CS650 Computer Architecture Data Center Infrastructure Chapter 6 of our textbook

Lecture 10: Intro to Multiprocessors/Clustering 15/15 12/7/2003 A. Sohn