
Parallel Processing: Past, Present and Future

Dr. G. Young

CS 370 Dr. Young 1 What is a Supercomputer?

 Let us run a contest. Who gives the most updated explanation?

Supercomputer

 (AllWords.com)

A very fast, powerful mainframe computer, used in advanced military and scientific applications.

Supercomputer

 (M-W.com, Merriam-Webster's Collegiate Dictionary)

A large very fast mainframe used especially for scientific computations

Supercomputer

 (Dictionary.com)

A computer that is among the largest, fastest, or most powerful of those available at a given time.

Supercomputer

 (FOLDOC.doc.ic.ac.uk)

A broad term for one of the fastest computers currently available. Such computers are typically used for number crunching including scientific simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical prospecting), structural analysis, computational fluid dynamics, physics, chemistry, electronic design, nuclear energy research and meteorology. Perhaps the best known supercomputer manufacturer is Cray Research. A less serious definition, reported from about 1990 at The University of New South Wales, states that a supercomputer is any computer that can outperform IBM's current fastest, thus making it impossible for IBM to ever produce a supercomputer.

Supercomputer

 (ComputerUser.com)

A very fast and powerful computer, outperforming most mainframes, and used for intensive calculation, scientific simulations, animated graphics, and other work that requires sophisticated and high-powered computing. Cray Research and Intel are well-known producers of supercomputers.

Supercomputer

 (PCWebopaedia.com)

The fastest type of computer. Supercomputers are very expensive and are employed for specialized applications that require immense amounts of mathematical calculations. For example, weather forecasting requires a supercomputer. Other uses of supercomputers include animated graphics, fluid dynamic calculations, nuclear energy research, and petroleum exploration. The chief difference between a supercomputer and a mainframe is that a supercomputer channels all its power into executing a few programs as fast as possible, whereas a mainframe uses its power to execute many programs concurrently.

Supercomputer

 (PrenHall.com)

The category that includes the largest and most powerful computers.

Supercomputer

 (Geek.com) This refers to a computer that is able to operate at a speed that places it at or near the top speed of currently produced computers. Most supercomputers cost millions of dollars, and the traditional model of using one large computer with proprietary hardware is being challenged by using a cluster of cheaper computers with more standard hardware.

Supercomputer Contest

 Who is the winner?
 AllWords.com
 M-W.com, Merriam-Webster's Collegiate Dictionary
 Dictionary.com
 FOLDOC.doc.ic.ac.uk
 ComputerUser.com
 PCWebopaedia.com
 PrenHall.com
 Geek.com

Contest Winner

 geek.com @ 2001 (Led by Chief Geek - Joel Evans )

Used to tell people all about Geek.

For example, to check out if you’re Beginner Geek, Intermediate Geek, Advanced Geek or Super Geek

Winner Highlight

 (Geek.com@2001) This refers to a computer that is able to operate at a speed that places it at or near the top speed of currently produced computers. Most supercomputers cost millions of dollars, and the traditional model of using one large computer with proprietary hardware is being challenged by using a cluster of cheaper computers with more standard hardware.

Topics of Discussion

 Introduction

 Computer Networks

 Parallel and Distributed Processing

 Affordable Supercomputer

 Future Trend and Challenge

 Conclusion

 Q&A

Introduction

 Why do we need Supercomputers?
 Supercomputer Vendors
 Supercomputer Products
 Top Supercomputers
 How to evaluate the power of a supercomputer?
 Top 10 Supercomputers
 Theoretical Implication of Parallel Machines
 Areas of Research in Supercomputing
 Supercomputing Journals

Why do we need Supercomputers?

 Processor speed has increased dramatically, but it is still not fast enough for our needs. Using multiple processors is the way to go.

 Areas that need supercomputers:

 Generally involve intensive computation

 Aerospace, Weather, Finance, Defense, Energy, Government, Chemistry, Geophysics, Telecom, Academic, Mechanics, Automotive, Transportation, Manufacturing, Fluid Dynamics, Petroleum

Supercomputer Vendors

Supercomputer Products

 The Avalon A12
 The Cambridge Parallel Processing Gamma II Plus
 The Compaq AlphaServer SC Series
 The Fujitsu AP3000
 The Fujitsu VPP5000 series
 The Hitachi SR8000 system
 The HP Exemplar V2600
 The IBM RS/6000 SP
 The NEC Cenju-4
 The NEC SX-5
 The SGI Origin 2000 series
 The Sun E10000 Starfire
 The Tera/Cray SV1
 The Tera/Cray T3E

They differ in: processor, OS, connection structure, proprietary hardware and software

How to evaluate the power of a supercomputer?

 Peak-performance

 Theoretical

 Run-time

 Benchmarks

 Linpack (Top500)

 Finding Largest Mersenne Prime Number

How to evaluate the power of a supercomputer?

Benchmark performance

 The LINPACK Benchmark (introduced by Jack Dongarra) solves a dense system of linear equations. It is used to rank the Top500 supercomputers.

 This performance does not reflect the overall performance of a given system, as no single number ever can.

 Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction of peak performance.
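To make the benchmark idea concrete, here is a minimal sketch of a LINPACK-style measurement. It is not the real LINPACK or HPL code: the solver is a plain Gaussian elimination written for illustration, the function names `solve_dense` and `linpack_like` are hypothetical, and the flop count uses the commonly quoted nominal figure 2/3 n^3 + 2 n^2 for solving an n x n system.

```python
import random
import time


def solve_dense(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies so the inputs stay intact
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_like(n=100, seed=0):
    """Time one dense solve; return (flops per second, max residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)
    nominal_flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2  # assumed nominal count
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return nominal_flops / elapsed, residual
```

Calling `linpack_like(200)` returns a rate in floating-point operations per second together with a residual that checks the answer. The rate from interpreted Python will be far below what the hardware can do, which is one reason a single number has to be read with care.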

How to evaluate the power of a supercomputer?

Prime Number

 Greek mathematician Euclid proved that there are infinitely many prime numbers.

 Primes do not occur in a regular sequence.

 There is no known simple formula for generating them.

 Discovering new large primes requires generating and testing millions of candidate numbers.

How to evaluate the power of a supercomputer?

Largest known Mersenne Prime Numbers* before 2000

Prime          Digits      Year   Name
2^21701-1      6533        1978   Landon Curt Noll (with Laura Nickel, Ariel Glenn)
2^23209-1      6987        1979   Landon Curt Noll
2^44497-1      13395       1979   David Slowinski (with Harry Nelson)
2^86243-1      25962       1982   David Slowinski
2^132049-1     39751       1983   David Slowinski
2^216091-1     65050       1985   David Slowinski
2^756839-1     227832      1992   David Slowinski, Paul Gage
2^859433-1     258716      1994   David Slowinski, Paul Gage
2^1257787-1    378632      1996   David Slowinski, Paul Gage
2^1398269-1    420921      1996   Joel Armengaud (GIMPS)
2^2976221-1    895932      1997   Gordon Spence (GIMPS)
2^3021377-1    909526      1998   Roland Clarkson (GIMPS)
2^6972593-1    2098960 #   1999   Nayan Hajratwala (GIMPS)

* Mersenne Prime Numbers are prime numbers of the form 2^n - 1
# 67 pages long if printed in newspaper format
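Numbers of the form 2^n - 1 are usually tested with the Lucas-Lehmer test, which is the standard primality test for this form (the slides themselves do not name it). The sketch below is a plain, unoptimized version; the helper names `is_mersenne_prime` and `mersenne_digits` are chosen here for illustration.

```python
import math


def is_mersenne_prime(p):
    """Lucas-Lehmer test: return True if 2**p - 1 is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime
    # The exponent itself must be an odd prime, otherwise 2**p - 1 is composite.
    if p < 2 or any(p % d == 0 for d in range(2, math.isqrt(p) + 1)):
        return False
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_digits(p):
    """Number of decimal digits of 2**p - 1, without building the number."""
    return math.floor(p * math.log10(2)) + 1
```

Each of the p - 2 iterations squares a number with about p bits, so the cost grows quickly with the exponent. That is why record exponents in the millions make a demanding workload for a whole system.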

How to evaluate the power of a supercomputer?

The current largest known Mersenne Prime Number (of the form 2^n - 1) can be found at http://www.mersenne.org/

The Electronic Frontier Foundation is offering a $100,000 award for the discovery of the first prime number with at least ten million digits.

How to evaluate the power of a supercomputer?

Finding the Largest Mersenne Prime Number

 Slowinski: (SGI, Cray)

"The prime finder program rigorously tests all elements of a system -- from the logic of the processors, to the memory, the compiler and the operating and multitasking systems. For high performance systems with multiple processors, this is an excellent test of the system's ability."

Top 10 Supercomputers

Country       2006   2007   2008
USA              6      8      6
Japan            2      -      -
Spain            -      1      -
India            -      -      1
Germany          1      1      1
France           1      -      2

Top 10 Supercomputers

Country       2012 (Nov)   2013 (June)   2013 (Nov)
USA                5            5             5
China              1            2             1
Japan              1            2             1
Germany            2            1             2
Italy              1            -             -
Switzerland        -            -             1

Top Supercomputers

 Timeline

 http://www.top500.org/timeline/

 Top #1 System

 http://www.top500.org/featured/top-systems/

Theoretical Implication of Parallel Machines

 A parallel machine with an unbounded number of processors behaves like a Non-deterministic Machine: every possible guess can be explored at the same time.

 A statement like Guess({S1,S2}) can be added to our familiar deterministic programs.

 Suddenly, problems in NP (e.g. the decision version of the Traveling Salesman Problem) can be solved in polynomial time.
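The role of Guess can be illustrated on an ordinary sequential machine by trying every guess in turn. In the sketch below, which is an illustration and not part of the original slides, each candidate tour is verified in polynomial time, but the loop over all orderings is what a nondeterministic (or unboundedly parallel) machine would collapse into a single step. The names `tour_length` and `tsp_decision` are hypothetical.

```python
from itertools import permutations


def tour_length(dist, order):
    """Length of the closed tour that visits cities in the given order."""
    n = len(order)
    return sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))


def tsp_decision(dist, bound):
    """Return a tour of length <= bound, or None if no such tour exists.

    A nondeterministic machine would Guess one ordering and only run the
    polynomial-time check; here every possible guess is enumerated instead.
    """
    n = len(dist)
    for rest in permutations(range(1, n)):
        order = (0,) + rest  # fix city 0 as the starting point
        if tour_length(dist, order) <= bound:  # the polynomial-time verification
            return order
    return None
```

With n cities the loop examines up to (n - 1)! orderings, which is the exponential cost that the Guess statement hides.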

Areas of Research in P&D Computing

 Parallel and Distributed Architectures

 Parallel and Distributed Algorithms

 Parallel Programming Languages

 Scientific Computing

 Signal & Image Processing Systems

 Special Purpose Processors

 VLSI and Configurable Logic Systems

 Performance Modeling/Evaluation

 Memory Hierarchy Issues in Parallel and Distributed Processing

 Programming Environments and Tools for Parallel and Distributed Platforms

 Compilers and Optimizations for Parallel and Distributed Processing

 Operating Systems and Runtime Support for Parallel and Distributed Processing

 Parallel and Distributed Network Protocols and Implementations

 Applications of Parallel and Distributed Computing

 Nontraditional Processor Technologies (Optical, Quantum, DNA, etc.)

Supercomputing Journals

 ACM J. of Experimental Algorithmics
 BIT
 Cluster Computing
 Computing and Visualization in Science
 IEEE Trans. on Computers
 IEEE Trans. on Parallel and Distributed Systems
 International J. of Computer Research
 International J. of Computers and Their Applications
 International J. of High Performance Computing and Networking
 International J. of High Speed Computing
 International J. of Parallel Programming
 J. of Networks
 J. of Parallel and Distributed Computing
 J. of Performance Evaluation and Modeling of Computer Systems
 J. of Supercomputing
 J. of Visual Languages & Computing
 Parallel Algorithms and Applications
 Parallel and Distributed Computing Practices
 Parallel Processing Letters
 SIAM J. of Computing
 SIAM J. of Scientific Computing

Computer Networks

 Homogeneity

 Same kind of computers

 Examples: a network of PCs, a network of Sun workstations, …

 Heterogeneity

 A mixture of different computers

 Example: Internet

Computer Networks

Network/Parallel Topologies

Chain, Ring, Mesh, Torus, Tree, Cube, Hypercube
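The practical difference between these topologies shows up in how many links each node needs and how many hops separate the two most distant nodes (the diameter). The following sketch builds several of the topologies as adjacency lists and measures the diameter with breadth-first search. All function names are illustrative choices, not taken from any library.

```python
from collections import deque


def chain(n):
    """n nodes in a line."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def ring(n):
    """A chain whose two ends are joined."""
    return {i: sorted({(i - 1) % n, (i + 1) % n}) for i in range(n)}


def mesh(rows, cols):
    """2-D grid without wrap-around links."""
    graph = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            graph[(r, c)] = [(a, b) for a, b in cand
                             if 0 <= a < rows and 0 <= b < cols]
    return graph


def torus(rows, cols):
    """2-D grid with wrap-around links in both dimensions."""
    return {
        (r, c): sorted({((r - 1) % rows, c), ((r + 1) % rows, c),
                        (r, (c - 1) % cols), (r, (c + 1) % cols)})
        for r in range(rows) for c in range(cols)
    }


def hypercube(d):
    """2**d nodes; neighbors differ in exactly one bit of the node number."""
    return {i: [i ^ (1 << k) for k in range(d)] for i in range(1 << d)}


def diameter(graph):
    """Largest shortest-path distance between any two nodes (BFS from each)."""
    best = 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best
```

For 16 nodes, for example, `diameter(chain(16))`, `diameter(mesh(4, 4))`, `diameter(torus(4, 4))` and `diameter(hypercube(4))` can be compared directly. The hypercube buys its short paths with more links per node.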

Computer Networks

Proprietary Parallel Computers

 Ring: HP Exemplar V2600

 Mesh: Cambridge Parallel Processing Gamma II Plus

 Torus: Fujitsu AP3000, Tera/Cray Research Inc. T3E

 Hypercube: SGI Origin series

Parallel and Distributed Processing

 Hardware structure of Parallel Computers

 Architectural Classes

 Memory Systems

 Distributed Processing

 PVM & MPI

 Parallel Applications

 Task Assignment

Parallel and Distributed Processing

Hardware Structure of Parallel Computers

 Classification is based on how instruction streams and data streams are handled

 4 main architectural classes [Flynn, 1972]

 Multiple/Single Instruction (MI/SI)

 Multiple/Single Data (MD/SD)

M.J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers, C-21, pp. 948-960, 1972.

Parallel and Distributed Processing

Architectural Classes

SISD machines:

 Accommodate one instruction stream that is executed serially.

 These are the conventional systems that contain one CPU

SIMD machines:

 Such systems often have thousands of processing units

 execute the same instruction on different data

 Hitachi S3600

Parallel and Distributed Processing

Architectural Classes

MISD machines:

 Multiple instructions should act on a single stream of data

 No practical machine

MIMD machines:

 Execute several instruction streams in parallel on different data

 Run many sub-tasks in parallel

 Large variety of MIMD systems
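The contrast between the SIMD and MIMD classes can be mimicked in a few lines. This is only a structural illustration under stated assumptions: Python threads stand in for processors, the names `simd_style` and `mimd_style` are invented here, and no real speed-up is implied.

```python
from concurrent.futures import ThreadPoolExecutor


def simd_style(op, data):
    """SIMD flavor: one instruction stream, the same operation on every element."""
    return [op(x) for x in data]


def mimd_style(tasks):
    """MIMD flavor: independent instruction streams, each with its own data.

    `tasks` is a list of (function, argument) pairs that run concurrently.
    """
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return [f.result() for f in futures]  # results in submission order
```

`simd_style(lambda x: x * x, [1, 2, 3])` applies a single operation across the data, while `mimd_style([(sum, [1, 2, 3]), (max, [4, 9, 2])])` runs two different programs side by side.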

Parallel and Distributed Processing

Memory Systems

Shared memory systems:

 Have multiple CPUs, all of which share the same address space.

Distributed memory systems:

 Each CPU has its own associated memory.

Parallel and Distributed Processing

Distributed Processing

 Takes the DM-MIMD concept one step further

 Instead of many integrated processors in one or several boxes, workstations are connected by (Gigabit) Ethernet, FDDI, or otherwise and set to work concurrently on tasks in the same program.

 Communication between processors is often orders of magnitude slower.

Parallel and Distributed Processing

PVM & MPI

 Packages to realize Distributed Processing

 PVM (Parallel Virtual Machine) [Geist et al., 1994]

 MPI (Message Passing Interface) [Snir et al. and Gropp et al., 1998]

A. Geist, A. Beguelin, J. Dongarra, R. Manchek, W. Jiang, and V. Sunderam, PVM: A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, Cambridge, MA, 1994.
M. Snir, S. Otto, S. Huss-Lederman, D. Walker, J. Dongarra, MPI: The Complete Reference, Vol. 1, The MPI Core, MIT Press, Cambridge, MA, 1998.
W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, M. Snir, MPI: The Complete Reference, Vol. 2, The MPI Extensions, MIT Press, Cambridge, MA, 1998.

Parallel and Distributed Processing

PVM & MPI

 This style of programming, called the "message passing" model, has been widely accepted

 PVM and MPI have been adopted by virtually all major vendors of distributed-memory MIMD systems and even on shared-memory MIMD systems for compatibility reasons.
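The message passing model can be sketched without PVM or MPI installed. In the example below, threads and queues from the Python standard library stand in for processes and network links; it does not use the PVM or MPI APIs, and the names `worker` and `parallel_sum` are hypothetical. The point is only the pattern: workers share no data and cooperate purely by sending and receiving messages.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of work, compute on it, send the partial result back."""
    chunk = inbox.get()                # blocking receive
    outbox.put((rank, sum(chunk)))     # send (sender rank, partial sum)


def parallel_sum(data, n_workers=4):
    """Master: scatter the data, then gather and combine the partial sums."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])   # send every n-th element
    total = sum(results.get()[1] for _ in range(n_workers))
    for t in threads:
        t.join()
    return total
```

On a real cluster the two queue operations would correspond to send and receive calls across the network, where each message carries a latency cost that the in-memory queues here hide.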

Parallel and Distributed Processing

Parallel Applications

 Parallel Algorithms

 Fine grain/Coarse grain

 Parallel Programming

 ParBegin/ParEnd

 PVM/MPI

Parallel and Distributed Processing

Task Assignment

 Performance Measures

 Completion Time

 Overheads for P&D Processing

 Execution Time for tasks (E)

 Intra-task Interference cost (ITI)

 Inter-task Communication cost (ITC)

Parallel and Distributed Processing

Task Assignment

 Throughput (Stone, 1977)

 $\sum \mathrm{E} + \sum \mathrm{ITI} + \sum \mathrm{ITC}$

H. Stone, Multiprocessor Scheduling with the Aid of Network Flow Algorithms, IEEE Transactions on Software Engineering, Vol. 3, No. 1, pp. 85-93, 1977.
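A small worked example may help show how the three sums interact. The sketch below evaluates the total cost of one assignment of tasks to processors and finds the cheapest assignment by brute force. It makes several assumptions that are not in the slides: ITC is charged only when two communicating tasks sit on different processors, ITI is modeled as a pairwise penalty when two tasks share a processor, and exhaustive search replaces the network flow method of the cited paper. All names and cost tables are hypothetical.

```python
from itertools import product


def total_cost(assign, exec_cost, comm, interference):
    """Sum of E + ITI + ITC for one assignment (assign[t] = processor of task t)."""
    e = sum(exec_cost[t][p] for t, p in enumerate(assign))
    # ITC: paid when communicating tasks are placed on different processors.
    itc = sum(c for (a, b), c in comm.items() if assign[a] != assign[b])
    # ITI (assumed model): paid when two tasks compete for the same processor.
    iti = sum(c for (a, b), c in interference.items() if assign[a] == assign[b])
    return e + iti + itc


def best_assignment(n_proc, exec_cost, comm, interference):
    """Try every assignment; feasible only for a handful of tasks."""
    n_tasks = len(exec_cost)
    return min(
        product(range(n_proc), repeat=n_tasks),
        key=lambda a: total_cost(a, exec_cost, comm, interference),
    )


# Illustrative data: exec_cost[task][processor], pairwise costs keyed by task pair.
EXEC = [[1, 4], [4, 1], [2, 2]]
COMM = {(0, 1): 1, (1, 2): 5}
INTERFERENCE = {(1, 2): 1}
```

With these tables, `best_assignment(2, EXEC, COMM, INTERFERENCE)` weighs running each task where it is fastest against keeping the heavily communicating tasks 1 and 2 together.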

Affordable supercomputer

 Computer networks with Off-the-Shelf hardware Powered by Parallel and Distributed Software Tools

 Advantages over Conventional Supercomputer

 System of Homogeneous Network

 A network of PCs with SCSI Link

 SPVM

 System of Heterogeneous Network

 Internet

 JMPI

Computer Networks with Off-the-Shelf Hardware Powered by Parallel and Distributed Processing Tools

Advantages over Conventional Supercomputer

 Decomposable
 Reusable
 Scale up and down easily
 Off-the-shelf
 Third World friendly
 Economical
 Reconfigurable Interconnection Topology
 Easy to upgrade – processor, software
 Collaborative R&D Environment
 General-purpose
 Multi-usage

Homogeneous Network

 A network of Pentium PCs

Heterogeneous Network

Future Trend and Challenge

 PVM and MPI Community continues to grow
 Cheaper and faster processors
 More employment of Clusters of Workstations for High Performance Computing
 More freely available Software Tools

Future Trend and Challenge

 Race between Proprietary supercomputers and Cluster computers

 How fast can a supercomputer go?

 How will heterogeneous computing evolve?

 Will a cluster of computers over the Internet be the fastest computer in the world?

 Processing Power on Demand Service?

 Processor ?

Conclusion

 Powered by state-of-the-art Parallel and Distributed Processing Tools, a high-speed computer network of powerful workstations will become a very attractive, affordable, highly scalable and highly available solution for the High Performance Computing world.

Conclusion

 Such an Exciting Area of Research

 Practical

 Affordable

 Educational

through Major Forums (e.g. IEEE TFCC, Top500, TopClusters)

 One issue is how to compare/evaluate/rank their performances

Conclusion

 Research topics

 Build Your Own Supercomputer (Cluster)

 Heterogeneous System

 Employ new COTS (Commercial Off-the-Shelf)

 Classification

 Benchmarks

 Performance Tracking Tools

 System Administration Software

Top 500 Supercomputers Update

 Trend of Cluster Computers Versus Proprietary Supercomputers.

 The TOP 500 Supercomputer List http://www.top500.org/

Q&A