Parallel Processing: Past, Present and Future
Dr. G. Young
CS 370 Dr. Young
What is a Supercomputer?
Let us run a contest. Which source gives the most up-to-date definition?
Supercomputer
(AllWords.com)
A very fast, powerful mainframe computer, used in advanced military and scientific applications.
Supercomputer
(M-W.com, Merriam-Webster's Collegiate Dictionary)
A large very fast mainframe used especially for scientific computations
Supercomputer
(Dictionary.com)
A mainframe computer that is among the largest, fastest, or most powerful of those available at a given time.
Supercomputer
(FOLDOC.doc.ic.ac.uk) A broad term for one of the fastest computers currently available. Such computers are typically used for number crunching including scientific simulations, (animated) graphics, analysis of geological data (e.g. in petrochemical prospecting), structural analysis, computational fluid dynamics, physics, chemistry, electronic design, nuclear energy research and meteorology. Perhaps the best known supercomputer manufacturer is Cray Research. A less serious definition, reported from about 1990 at The University Of New South Wales states that a supercomputer is any computer that can outperform IBM's current fastest, thus making it impossible for IBM to ever produce a supercomputer.
Supercomputer
(ComputerUser.com)
A very fast and powerful computer, outperforming most mainframes, and used for intensive calculation, scientific simulations, animated graphics, and other work that requires sophisticated and high-powered computing. Cray Research and Intel are well-known producers of supercomputers.
Supercomputer
(PCWebopaedia.com)
The fastest type of computer. Supercomputers are very expensive and are employed for specialized applications that require immense amounts of mathematical calculations. For example, weather forecasting requires a supercomputer. Other uses of supercomputers include animated graphics, fluid dynamic calculations, nuclear energy research, and petroleum exploration. The chief difference between a supercomputer and a mainframe is that a supercomputer channels all its power into executing a few programs as fast as possible, whereas a mainframe uses its power to execute many programs concurrently.
Supercomputer
(PrenHall.com)
The category that includes the largest and most powerful computers.
Supercomputer
(Geek.com) This refers to a computer that is able to operate at a speed that places it at or near the top speed of currently produced computers. Most supercomputers cost millions of dollars, and the traditional model of using one large computer with proprietary hardware is being challenged by using a cluster of cheaper computers with more standard hardware.
Supercomputer Contest
Who is the winner?
AllWords.com
M-W.com, Merriam-Webster's Collegiate Dictionary
Dictionary.com
FOLDOC.doc.ic.ac.uk
ComputerUser.com
PCWebopaedia.com
PrenHall.com
Geek.com
Contest Winner
Geek.com, 2001 (led by Chief Geek Joel Evans)
Used to tell people all about Geek.
For example, to check whether you are a Beginner Geek, Intermediate Geek, Advanced Geek, or Super Geek.
Winner Highlight
(Geek.com@2001) This refers to a computer that is able to operate at a speed that places it at or near the top speed of currently produced computers. Most supercomputers cost millions of dollars, and the traditional model of using one large computer with proprietary hardware is being challenged by using a cluster of cheaper computers with more standard hardware.
Topics of Discussion
Introduction
Computer Networks
Parallel and Distributed Processing
Affordable Supercomputer
Future Trend and Challenge
Conclusion
Q&A
Introduction
Why do we need supercomputers?
Supercomputer Vendors
Supercomputer Products
Top Supercomputers
How to evaluate the power of a supercomputer?
Top 10 Supercomputers
Theoretical Implication of Parallel Machines
Areas of Research in Supercomputing
Supercomputing Journals
Why do we need supercomputers?
Even though processor speeds have increased dramatically, they are still not fast enough for our needs. Using multiple processors is the way to go.
Areas that need supercomputers:
Generally, those that involve intensive computation
Aerospace, Weather, Finance, Defense, Energy, Internet, Government, Chemistry, Geophysics, Telecom, Academic, Database, Mechanics, Automotive, Transportation, Electronics, Manufacturing, Fluid Dynamics, Petroleum
Supercomputer Vendors
Supercomputer Products
The Avalon A12
The Cambridge Parallel Processing Gamma II Plus
The Compaq AlphaServer SC Series
The Fujitsu AP3000
The Fujitsu VPP5000 series
The Hitachi SR8000 system
The HP Exemplar V2600
The IBM RS/6000 SP
The NEC Cenju-4
The NEC SX-5
The SGI Origin 2000 series
The Sun E10000 Starfire
The Tera/Cray SV1
The Tera/Cray T3E
They differ in technology: processor, OS, interconnection structure, and proprietary hardware and software.
How to evaluate the power of a supercomputer?
Peak-performance
Theoretical
Run-time
Benchmarks
Linpack benchmark (Top500)
Finding Largest Mersenne Prime Number
How to evaluate the power of a supercomputer?
Benchmark performance
The LINPACK benchmark (introduced by Jack Dongarra) solves a dense system of linear equations. It is used to rank the Top500 supercomputers.
This performance does not reflect the overall performance of a given system, as no single number ever can.
Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction of peak performance.
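As a rough illustration of what a LINPACK-style measurement involves, the Python sketch below solves a random dense system by Gaussian elimination with partial pivoting and converts the elapsed time into a floating-point rate. The function names, the default seed, and the operation count 2n^3/3 + 2n^2 used for the rate are assumptions made for this sketch. It is not the official benchmark code, and pure Python will report rates far below what tuned libraries reach.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def linpack_style_rate(n, seed=0):
    """Return an estimated floating-point rate (operations per second)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2  # assumed operation count
    return flops / elapsed
```

Calling linpack_style_rate(200) gives one number for one problem size on one machine, which is exactly why such a figure cannot describe overall system performance.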
How to evaluate the power of a supercomputer?
Prime Number
Greek mathematician Euclid proved that there are an infinite number of prime numbers.
Primes do not occur in a regular sequence.
There is no known simple formula for generating them.
Discovery of new primes requires randomly generating and testing millions of numbers.
How to evaluate the power of a supercomputer?
Largest known Mersenne Prime Numbers* before 2000
Prime         Digits     Year  Name
2^21701-1     6533       1978  Landon Curt Noll (with Laura Nickel, Ariel Glenn)
2^23209-1     6987       1979  Landon Curt Noll
2^44497-1     13395      1979  David Slowinski (with Harry Nelson)
2^86243-1     25962      1982  David Slowinski
2^132049-1    39751      1983  David Slowinski
2^216091-1    65050      1985  David Slowinski
2^756839-1    227832     1992  David Slowinski, Paul Gage
2^859433-1    258716     1994  David Slowinski, Paul Gage
2^1257787-1   378632     1996  David Slowinski, Paul Gage
2^1398269-1   420921     1996  Joel Armengaud (GIMPS)
2^2976221-1   895932     1997  Gordon Spence (GIMPS)
2^3021377-1   909526     1998  Roland Clarkson (GIMPS)
2^6972593-1   2098960 #  1999  Nayan Hajratwala (GIMPS)
* Mersenne Prime Numbers are Prime Numbers of the form 2^n - 1
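Numbers of this form are usually tested with the Lucas-Lehmer test rather than by trial division. The Python sketch below is a minimal version of that test, together with a helper that computes the digit counts listed above. The function names are chosen for this illustration, and real prime-hunting programs rely on much faster multiplication than Python's built-in integers provide.

```python
import math


def is_mersenne_prime(p):
    """Lucas-Lehmer test: is 2**p - 1 prime? Assumes p itself is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the loop below needs odd p
    m = (1 << p) - 1
    s = 4
    # Iterate s -> s*s - 2 (mod m) exactly p - 2 times.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_digits(p):
    """Number of decimal digits of 2**p - 1."""
    # 2**p is never a power of 10, so 2**p - 1 has as many digits as 2**p.
    return math.floor(p * math.log10(2)) + 1
```

For example, mersenne_digits(21701) gives 6533 and mersenne_digits(6972593) gives 2098960, matching the table, while is_mersenne_prime(11) is False because 2^11 - 1 = 2047 = 23 x 89.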
How to evaluate the power of a supercomputer?
The current largest known Mersenne Prime Numbers (of the form 2^n - 1) can be found at http://www.mersenne.org/
The Electronic Frontier Foundation is offering a $100,000 award for discovering the first prime number with at least ten million digits.
How to evaluate the power of a supercomputer?
Finding the Largest Mersenne Prime Number
Slowinski: (SGI, Cray)
"The prime finder program rigorously tests all elements of a system -- from the logic of the processors, to the memory, the compiler and the operating and multitasking systems. For high performance systems with multiple processors, this is an excellent test of the system's ability."
Top 10 Supercomputers
Country   2006  2007  2008
USA       6     8     6
Japan     2     -     -
Spain     -     1     -
India     -     -     1
Germany   1     1     1
France    1     -     2
Top 10 Supercomputers
Country      2012 (Nov)  2013 (June)  2013 (Nov)
USA          5           5            5
China        1           2            1
Japan        1           2            1
Germany      2           1            2
Italy        1           -            -
Switzerland  -           -            1
Top Supercomputers
Timeline
http://www.top500.org/timeline/
Top #1 System
http://www.top500.org/featured/top-systems/
Theoretical Implication of Parallel Machines
A parallel machine with an unbounded number of processors behaves like a nondeterministic machine.
A statement like Guess({S1,S2}) can be added to our familiar deterministic programs.
Suddenly, problems in NP (e.g. the decision version of the Traveling Salesman Problem) can be solved in polynomial time.
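To make the role of Guess({S1,S2}) concrete, the Python sketch below simulates it on an ordinary sequential machine for the subset-sum problem, which is used here only as a small stand-in example. Every Guess becomes two recursive calls, so the sequential program visits all 2^n guess sequences, whereas a machine with one processor per branch would follow each sequence in only n steps. The function name and the branch counter are assumptions of this illustration.

```python
def subset_sum_exists(numbers, target):
    """Deterministically simulate Guess({take, skip}) for each number.

    Returns (found, branches), where branches counts the complete
    guess sequences that were explored.
    """
    branches = 0

    def explore(i, remaining):
        nonlocal branches
        if i == len(numbers):
            branches += 1  # one complete sequence of guesses
            return remaining == 0
        # Guess({S1, S2}): S1 takes numbers[i], S2 skips it.
        take = explore(i + 1, remaining - numbers[i])
        skip = explore(i + 1, remaining)
        return take or skip

    found = explore(0, target)
    return found, branches
```

With six numbers, subset_sum_exists([3, 34, 4, 12, 5, 2], 9) explores 64 branches before reporting that a subset (for example 4 + 5) exists.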
Areas of Research in P&D Computing
Parallel and Distributed Architectures
Parallel and Distributed Algorithms
Parallel Programming Languages
Scientific Computing
Signal & Image Processing Systems
Special Purpose Processors
VLSI and Configurable Logic Systems
Performance Modeling/Evaluation
Memory Hierarchy Issues in Parallel and Distributed Processing
Programming Environments and Tools for Parallel and Distributed Platforms
Compilers and Optimizations for Parallel and Distributed Processing
Operating System and Runtime Support for Parallel and Distributed Computing
Parallel and Distributed Network Protocols and Implementations
Applications of Parallel and Distributed Computing
Nontraditional Processor Technologies (Optical, Quantum, DNA, etc.)
Supercomputing Journals
ACM J. of Experimental Algorithmics
BIT
Cluster Computing
Computing and Visualization in Science
IEEE Trans. on Computers
IEEE Trans. on Parallel and Distributed Systems
International J. of Computer Research
International J. of Computers and Their Applications
International J. of High Performance Computing and Networking
International J. of High Speed Computing
International J. of Parallel Programming
J. of Interconnection Networks
J. of Parallel and Distributed Computing
J. of Performance Evaluation and Modeling of Computer Systems
J. of Supercomputing
J. of Visual Languages & Computing
Parallel Algorithms and Applications
Parallel Computing
Parallel and Distributed Computing Practices
Parallel Processing Letters
SIAM J. of Computing
SIAM J. of Scientific Computing
Computer Networks
Homogeneity
Same kind of computers
Examples: a network of PCs, a network of Sun workstations, …
Heterogeneity
A mixture of different computers
Example: Internet
Computer Networks
Network/Parallel Computer Architecture
Chain Ring Mesh Torus
Tree Star Cube Hypercube
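The topologies named above differ mainly in which nodes are directly connected. The Python sketch below shows one common way to compute neighbors for three of them: a ring, a 2D torus, and a hypercube. The node numbering schemes (integers for the ring and hypercube, row and column pairs for the torus) and the function names are assumptions made for this illustration.

```python
def ring_neighbors(node, n):
    """Neighbors of a node in a ring of n nodes numbered 0..n-1."""
    return sorted({(node - 1) % n, (node + 1) % n})


def torus_neighbors(row, col, rows, cols):
    """Neighbors in a 2D torus: a mesh whose edges wrap around."""
    return [
        ((row - 1) % rows, col),
        ((row + 1) % rows, col),
        (row, (col - 1) % cols),
        (row, (col + 1) % cols),
    ]


def hypercube_neighbors(node, dim):
    """Neighbors in a dim-dimensional hypercube: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hypercube_distance(a, b):
    """Hops between two hypercube nodes: number of differing address bits."""
    return bin(a ^ b).count("1")
```

In a 3-dimensional hypercube, node 5 (binary 101) is connected to nodes 4, 7 and 1, and no two of the 8 nodes are more than 3 hops apart.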
Computer Networks
Proprietary Parallel Computers
Ring: HP Exemplar V2600
Mesh: Cambridge Parallel Processing Gamma II Plus
Torus: Fujitsu AP3000, Tera/Cray Research Inc. T3E
Hypercube: SGI Origin series
Parallel and Distributed Processing
Hardware structure of Parallel Computers
Architectural Classes
Memory Systems
Distributed Processing
PVM & MPI
Parallel Applications
Task Assignment
Parallel and Distributed Processing
Hardware Structure of Parallel Computers
Classification is based on how instruction and data streams are handled.
Four main architectural classes [Flynn, 1972]:
Multiple/Single Instruction (MI/SI)
Multiple/Single Data (MD/SD)
M.J. Flynn, Some computer organizations and their effectiveness, IEEE Transactions on Computers, C-21, pp. 948-960, 1972.
Parallel and Distributed Processing
Architectural Classes
SISD machines:
Accommodate one instruction stream that is executed serially.
These are the conventional systems that contain one CPU.
SIMD machines:
Such systems often have thousands of processing units.
All units execute the same instruction on different data.
Example: Hitachi S3600
Parallel and Distributed Processing
Architectural Classes
MISD machines:
Multiple instructions should act on a single stream of data.
No practical machine
MIMD machines:
Execute instruction streams in parallel on different data.
Run many sub-tasks in parallel.
Large variety of MIMD systems
Parallel and Distributed Processing
Memory Systems
Shared memory systems:
Have multiple CPUs all of which share the same address space.
Distributed memory systems:
Each CPU has its own associated memory.
Parallel and Distributed Processing
Distributed Processing
Takes the distributed-memory MIMD (DM-MIMD) concept one step further.
Instead of many integrated processors in one or several boxes, workstations are connected by (Gigabit) Ethernet, FDDI, or other networks and set to work concurrently on tasks in the same program.
Communication between processors is often slower by orders of magnitude.
Parallel and Distributed Processing
PVM & MPI
Packages to realize Distributed Processing
PVM (Parallel Virtual Machine) [Geist et al., 1994]
MPI (Message Passing Interface) [Snir et al. and Gropp et al., 1998]
A. Geist, A. Beguelin, J. Dongarra, R. Manchek, W. Jiang, and V. Sunderam, PVM: A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, Boston, 1994.
M. Snir, S. Otto, S. Huss-Lederman, D. Walker, J. Dongarra, MPI: The Complete Reference, Vol. 1, The MPI Core, MIT Press, Boston, 1998.
W. Gropp, S. Huss-Lederman, A. Lumsdaine, E. Lusk, B. Nitzberg, W. Saphir, M. Snir, MPI: The Complete Reference, Vol. 2, The MPI Extensions, MIT Press, Boston, 1998.
Parallel and Distributed Processing
PVM & MPI
This style of programming, called the "message passing" model, has been widely accepted
PVM and MPI have been adopted by virtually all major vendors of distributed-memory MIMD systems and even on shared-memory MIMD systems for compatibility reasons.
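The message-passing style can be imitated in a few lines of Python. The sketch below uses threads and queues from the standard library purely as stand-ins for processes and a network: each worker blocks on a receive, computes a partial sum, and sends the result back. It does not use the PVM or MPI APIs, and the function names and the default of four workers are assumptions of this illustration.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one chunk of numbers, send back (rank, partial sum)."""
    chunk = inbox.get()              # blocking receive
    outbox.put((rank, sum(chunk)))   # send the result to the coordinator


def parallel_sum(data, n_workers=4):
    """Sum data by exchanging messages with n_workers workers."""
    results = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    # Scatter: worker r receives every n_workers-th element starting at r.
    for rank in range(n_workers):
        inboxes[rank].put(data[rank::n_workers])
    # Gather: collect exactly one message from each worker.
    partials = dict(results.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return sum(partials.values())
```

The workers never touch the coordinator's data directly. All cooperation happens through explicit send and receive operations, which is the property that lets the same program structure run on distributed-memory hardware.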
Parallel and Distributed Processing
Parallel Applications
Parallel Algorithms
Fine grain/Coarse grain
Parallel Programming
ParBegin/ParEnd
PVM/MPI APIs
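A ParBegin/ParEnd block starts several statements at once and continues only when all of them have finished. One way to approximate this in Python is with a thread pool from the standard library, as sketched below. The helper name par is hypothetical, and threads are used only to show the control structure, not to promise a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def par(*tasks):
    """Run zero-argument callables concurrently; return results in order."""
    if not tasks:
        return []
    # ParBegin: submit every task to the pool.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
    # ParEnd: leaving the with-block waits until every task has finished.
    return [f.result() for f in futures]
```

For example, par(lambda: sum(range(10)), lambda: max(3, 7)) returns [45, 7] once both tasks are done.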
Parallel and Distributed Processing
Task Assignment
Performance Measures
Completion Time
Throughput
Overheads for P&D Processing
Execution Time for tasks (E)
Intra-task Interference cost (ITI)
Inter-task Communication cost (ITC)
Parallel and Distributed Processing
Task Assignment
Throughput (Stone, 1977)
E + ITI + ITC
H. Stone, Multiprocessor Scheduling with the Aid of Network Flow Algorithms, IEEE Transactions on Software Engineering, Vol. 3, No. 1, pp. 83-85, 1977.
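To show how the sum E + ITI + ITC can drive a task assignment, the Python sketch below evaluates every possible assignment of a few tasks to processors and keeps the cheapest one. Several things here are assumptions of this illustration rather than Stone's method: the exhaustive search (Stone uses network flow algorithms), the way ITI is modeled as a penalty for pairs of tasks sharing a processor, the function names, and all cost values in the usage note.

```python
from itertools import product


def total_cost(assign, exec_cost, comm, interf):
    """Return E + ITI + ITC for one assignment.

    assign[t] is the processor of task t; exec_cost[t][p] is the
    execution cost of task t on processor p. comm and interf map
    task pairs (i, j) to a cost.
    """
    e = sum(exec_cost[t][p] for t, p in enumerate(assign))
    # ITC (assumed model): paid when two tasks run on different processors.
    itc = sum(c for (i, j), c in comm.items() if assign[i] != assign[j])
    # ITI (assumed model): paid when two tasks share a processor.
    iti = sum(c for (i, j), c in interf.items() if assign[i] == assign[j])
    return e + iti + itc


def best_assignment(exec_cost, comm, interf, n_procs=2):
    """Exhaustively search all assignments and return the cheapest."""
    n_tasks = len(exec_cost)
    return min(
        product(range(n_procs), repeat=n_tasks),
        key=lambda a: total_cost(a, exec_cost, comm, interf),
    )
```

With the hypothetical costs exec_cost = [[5, 10], [2, 8], [9, 3]], comm = {(0, 1): 6, (1, 2): 1} and interf = {(0, 1): 2}, the cheapest assignment is (0, 0, 1) with a total cost of 13: tasks 0 and 1 stay together to avoid their communication cost of 6, while task 2 moves to the processor where it runs fastest. The search visits n_procs^n_tasks assignments, so it is only practical for very small examples.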
Affordable Supercomputer
Computer networks with Off-the-Shelf hardware Powered by Parallel and Distributed Software Tools
Advantages over Conventional Supercomputer
System of Homogeneous Network
A network of PCs with SCSI links
SPVM
System of Heterogeneous Network
Internet
JMPI
Computer Networks with Off-the-Shelf Hardware Powered by Parallel and Distributed Processing Tools
Advantages over Conventional Supercomputer
Decomposable
Reusable
Scale up and down easily
Off-the-shelf
Third World friendly
Economical
Reconfigurable Interconnection Topology
Easy to upgrade: bus, processor, software
Collaborative R&D Environment
General-purpose
Multi-usage
Homogeneous Network
A network of Pentium PCs
Heterogeneous Network
Future Trend and Challenge
PVM and MPI Community continues to grow
Cheaper and faster processors and Interconnections
More employment of Clusters of Workstations for High Performance Computing
More freely available Software Tools
Future Trend and Challenge
Race between proprietary supercomputers and cluster computers
How fast can a supercomputer go?
How will heterogeneous computing evolve?
Will a cluster of computers over the Internet be the fastest computer in the world?
Processing Power on Demand Service?
Processor Sharing?
Conclusion
Powered by state-of-the-art Parallel and Distributed Processing Tools, a high-speed computer network of powerful workstations will become a very attractive, affordable, highly scalable, and highly available solution for the High Performance Computing world.
Conclusion
Such an Exciting Area of Research
Practical
Affordable
Educational
Knowledge Sharing through Major Forums (e.g. IEEE TFCC, Top500, TopClusters)
One key issue is how to compare, evaluate, and rank their performance
Conclusion
Research topics
Build Your Own Supercomputer(Cluster)
Heterogeneous System
Employ new COTS (Commercial Off-the-Shelf) components
Classification
Benchmarks
Performance Tracking Tools
System Administration Software
Top 500 Supercomputers Update
Trend of Cluster Computers Versus Proprietary Supercomputers.
The TOP 500 Supercomputer List http://www.top500.org/
Q&A