NJIT Computer Science Dept
CS650 Computer Architecture

Lecture 10: Introduction to Multiprocessors and PC Clustering
Andrew Sohn
Computer Science Department
New Jersey Institute of Technology
12/7/2003

Key Issues
Run Photoshop on 1 PC or N PCs
Programmability
• How to program a bunch of PCs viewed as a single logical machine.
Performance Scalability - Speedup
• Run Photoshop on 1 PC (forget the specs of this PC)
• Run Photoshop on N PCs
• Will it run faster on N PCs? Speedup = ?

Types of Multiprocessors
Key: Data and Instruction
Single Instruction Single Data (SISD)
• Intel processors, AMD processors
Single Instruction Multiple Data (SIMD)
• Array processor
• Pentium MMX feature
Multiple Instruction Single Data (MISD)
• Systolic array
• Special purpose machines
Multiple Instruction Multiple Data (MIMD)
• Typical multiprocessors (Sun, SGI, Cray,...)
Single Program Multiple Data (SPMD)
• Programming model

Shared-Memory Multiprocessor
[Diagram: three processors connected through an interconnection network to a shared main memory, storage, and I/O.]

Distributed-Memory Multiprocessor
[Diagram: six processors, each paired with its own IO/S and MM block, all connected through an interconnection network.]

Key Issues
Programmability
• Intel Pentium, AMD machines
Performance (Scalability) - Speedup
• Run Photoshop on 1 machine, or n machines
• Execution time on 1 machine = x
• Execution time on n machines = y
• Speedup = x/y
• The best expected speedup is n (in practice it is sometimes higher, for various reasons including cache behavior)

Parallelization of Applications
[Diagram: a program divided into a parallel portion (80%) and a serial portion (20%).]

Parallelization for 4 Processors
Consider a loop with 100 iterations:
    for (i=0;i<100;i++)
        a[i] = b[i] + c[i];
The loop has no loop-carried dependence, so it is straightforward to parallelize the code for 4 processors. Each processor executes 1/4 of the loop, i.e. 25 iterations, as follows:
    P0: for (i=0;i<25;i++) a[i] = b[i] + c[i];
    P1: for (i=25;i<50;i++) a[i] = b[i] + c[i];
    P2: for (i=50;i<75;i++) a[i] = b[i] + c[i];
    P3: for (i=75;i<100;i++) a[i] = b[i] + c[i];

Shared-Memory Multiprocessor
[Diagram: three processors connected through an interconnection network to shared main memory, storage, and I/O. The diagram is annotated with the whole loop:]
    for (i=0;i<100;i++)
        a[i] = b[i] + c[i];

Distributed-Memory Multiprocessor
[Diagram: four processors, each with its own IO/S block, connected through an interconnection network. Each processor is annotated with its quarter of the loop:]
    for (i=0;i<25;i++) a[i] = b[i] + c[i];
    for (i=25;i<50;i++) a[i] = b[i] + c[i];
    for (i=50;i<75;i++) a[i] = b[i] + c[i];
    for (i=75;i<100;i++) a[i] = b[i] + c[i];

Amdahl’s Law
Maximum speedup is limited by the serial fraction of a program
• N = the number of processors
• s = the time spent by a processor on the serial part of a program
• p = the time spent by a processor on the parallel part of a program
• t = total time = s + p = 1
Maximum possible speedup is given by:
$$D = \frac{s + p}{s + \frac{p}{N}} = \frac{1}{1 - p + \frac{p}{N}} = \frac{1}{1 - p\left(1 - \frac{1}{N}\right)}$$
Solving for p:
$$p = \frac{1 - \frac{1}{D}}{1 - \frac{1}{N}}$$

Amdahl’s Law - Example
With N = 10 PCs and a target speedup of D = 8, the portion of the program to be parallelized will be
$$p = \frac{1 - \frac{1}{D}}{1 - \frac{1}{N}} = \frac{1 - \frac{1}{8}}{1 - \frac{1}{10}} = \frac{0.875}{0.9} \approx 0.972$$
What do these numbers mean? To achieve an 8-fold speedup on 10 PCs, 97.2% of the entire code has to be parallelized!

Amdahl’s Law - Example
[Diagram: the body of main() { ... } divided into a parallel portion (97.2%) and a serial portion (2.8%).]

Clustering of PCs
• A Poor Man’s Supercomputer
• Linux box clustering
• MS Windows clustering
• For scalability and high availability
• Mail server clustering: MS Exchange
• DB server clustering - MS SQL Server, MySQL server, ...
• VPN box clustering
• Firewall clustering
• Filtering clustering
• Web clustering

Data Center Infrastructure
Chapter 6 of our textbook