Parallel and Distributed Computing
Chapter 1: Introduction to Parallel Computing
Jun Zhang
Department of Computer Science, University of Kentucky, Lexington, KY 40506
1.1a: von Neumann Architecture
- Common machine model for over 70 years
- Stored-program concept
- The CPU executes a stored program: a sequence of read and write operations on memory
- The order of operations is sequential
1.1b: A More Detailed Architecture based on the von Neumann Model
1.1c: Old von Neumann Computer
1.1d: CISC von Neumann Computer
- CISC stands for Complex Instruction Set Computer; it uses a single bus system
- The Harvard (RISC) architecture uses two buses: separate buses for instructions and data
- RISC stands for Reduced Instruction Set Computer
- Both are SISD machines: a Single Instruction stream operating on a Single Data stream
1.1e: Personal Computer
1.1f: John von Neumann
- December 28, 1903 – February 8, 1957
- Hungarian mathematician
- Mastered calculus at age 8; doing graduate-level math at 12
- Received his Ph.D. at 23
- His proposal to his first wife: “You and I might be able to have some fun together, seeing as how we both like to drink.”
1.2a: Motivations for Parallel Computing
- Fundamental limits on single processor speed
- Heat dissipation from CPU chips
- Disparity between CPU & memory speeds
- Distributed data communications
- Need for very large scale computing platforms
1.2b: Fundamental Limits – Cycle Speed
Intel 8080            2 MHz      1974
ARM 2                 8 MHz      1986
Intel Pentium Pro     200 MHz    1996
AMD Athlon            1.2 GHz    2000
Intel QX6700          2.66 GHz   2006
Intel Core i7 3770K   3.9 GHz    2012

Speed of light: 30 cm in 1 ns; an electrical signal travels about 10 times slower.
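As a rough sanity check (assuming a 3 GHz clock for round numbers; this figure is not from the slide), the distance a signal can cover in one clock cycle is already close to chip dimensions:

\[
t_{\text{cycle}} = \frac{1}{3\ \text{GHz}} \approx 0.33\ \text{ns}, \qquad
d_{\text{signal}} \approx \frac{30\ \text{cm/ns}}{10} \times 0.33\ \text{ns} \approx 1\ \text{cm}.
\]

A signal cannot cross a chip much larger than a centimeter within a single cycle, which is one fundamental limit on clock speed.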
1.2c: High-End CPU is Expensive
The price of a high-end CPU rises sharply with performance.

Figure: Intel processor price/performance
1.2d: Moore’s Law
- Moore’s observation in 1965: the number of transistors per square inch on integrated circuits had doubled every year since the integrated circuit was invented
- Moore’s revised observation in 1975: the pace had slowed, with density doubling approximately every two years (popularly quoted as every 18 months)
- How about the future? (Does the price of computing power fall by half every 18 months?)
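As a rule of thumb, the revised observation can be written as a doubling law (the exact doubling period \(T\) is the only assumption here):

\[
N(t) \approx N_0 \cdot 2^{\,t/T}, \qquad T \approx 1.5\ \text{to}\ 2\ \text{years},
\]

where \(N_0\) is the transistor count in a reference year and \(N(t)\) is the count \(t\) years later.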
1.2e: Moore’s Law – Held for Now
1.3: Power Wall Effect in Computer Architecture
- Too many transistors in a given chip die area
- Tremendous increase in power density
- Increased chip temperature
- High temperature slows down the transistor switching rate and the overall speed of the computer
- The chip may melt down if not cooled properly
- Efficient cooling systems are expensive
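The standard first-order model of dynamic (switching) power makes the wall concrete:

\[
P_{\text{dynamic}} \approx \alpha\, C\, V^{2} f,
\]

where \(\alpha\) is the activity factor, \(C\) the switched capacitance, \(V\) the supply voltage, and \(f\) the clock frequency. Because raising \(f\) usually also requires raising \(V\), power grows much faster than linearly with clock rate, which is why frequency scaling stalled.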
1.3: Cooling Computer Chips
Some people have suggested putting computer chips in liquid nitrogen to cool them.
1.3: Solutions
- Use multiple inexpensive processors
- Use a processor with multiple cores
1.3: A Multi-core Processor
1.3a: CPU and Memory Speeds
- In 20 years, CPU speed (clock rate) has increased by a factor of 1000
- DRAM speed has increased by a factor of less than 4
- How can data be fed fast enough to keep the CPU busy?
- CPU cycle time: 1–2 ns; cache: ~10 ns; DRAM: 50–60 ns — so a single DRAM access can cost tens of CPU cycles
1.3b: Memory Access and CPU Speed
1.3b: CPU, Memory, and Disk Speed
1.3c: Possible Solutions
- A hierarchy of successively faster memory devices (multilevel caches)
- Locality of data and code references (see the sketch below)
- Efficient programming can be an issue
- Parallel systems may provide: 1) a larger aggregate cache; 2) higher aggregate bandwidth to the memory system
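The following minimal C sketch illustrates locality of reference (the matrix size N is an arbitrary choice for illustration): traversing a matrix row by row touches consecutive addresses and is cache-friendly, while traversing it column by column jumps N doubles per step and typically runs several times slower, even though both loops do identical arithmetic.

```c
#include <stdio.h>
#include <time.h>

#define N 4096

/* Same arithmetic, different memory access order: the row-major loop
   walks consecutive addresses (cache-friendly); the column-major loop
   jumps N doubles per step (cache-hostile). */
static double a[N][N];

int main(void) {
    double sum;
    clock_t t0;

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] = 1.0;

    sum = 0.0;
    t0 = clock();
    for (int i = 0; i < N; i++)        /* row-major traversal */
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    printf("row-major:    %.3f s (sum = %.0f)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

    sum = 0.0;
    t0 = clock();
    for (int j = 0; j < N; j++)        /* column-major traversal */
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    printf("column-major: %.3f s (sum = %.0f)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, sum);

    return 0;
}
```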
1.3f: Multilevel Hierarchical Cache
1.4a: Distributed Data Communications
- Data may be collected and stored at different locations
- It is expensive to bring the data to a central location for processing
- Many computing tasks may be inherently parallel
- Privacy issues arise in data mining and other large-scale commercial database manipulations
1.4b: Distributed Data Communications
1.4c: Move Computation to Data (CS626: Large Scale Data Science)
1.5a: Why Use Parallel Computing
- Save time: reduce wall-clock time by having many processors work together (quantified below)
- Solve larger problems: larger than a single processor’s CPU and memory can handle
- Provide concurrency: do multiple things at the same time, e.g., online access to databases, search engines
- Google’s 4,000 PC servers were once one of the largest clusters in the world (a server farm)
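The time savings can be quantified with the usual definitions of speedup and efficiency (standard textbook definitions, not specific to this course):

\[
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p},
\]

where \(T_1\) is the wall-clock time on one processor and \(T_p\) the wall-clock time on \(p\) processors; ideally \(S(p) = p\) and \(E(p) = 1\).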
1.5a: Google’s Data Center
1.5b: Other Reasons for Parallel Computing
- Taking advantage of non-local resources: using computing resources on a wide-area network, or even the Internet (grid or cloud computing)
- Cost savings: using multiple “cheap” computing resources instead of a high-end CPU
- Overcoming memory constraints: for large problems, using the memories of multiple computers may overcome the memory-constraint obstacle
1.6a: Need for Large Scale Modeling
- Long-term weather forecasting
- Large-scale ocean modeling
- Oil reservoir simulations
- Car and airplane manufacturing
- Semiconductor simulation
- Pollution tracking
- Large-scale commercial databases
- Aerospace (NASA microgravity modeling)
1.6b: Semiconductor Simulation
- Before 1975, an engineer had to make several runs through the fabrication line until a successful device was fabricated
- Device dimensions have shrunk below 0.1 micrometer
- A fabrication line costs about $1 billion to build
- A design must be thoroughly verified before it is committed to silicon
- A realistic simulation of one diffusion process may take days or months to run on a workstation
- Chip prices drop quickly after entering the market
1.6b: Semiconductor Diffusion Process
1.6c: Drug Design
- Most drugs work by binding to a specific site, called a receptor, on a protein
- A central problem is to find molecules (ligands) with high binding affinity
- We need to accurately and efficiently estimate electrostatic forces in molecular and atomic interactions (see below)
- Drug–protein binding energies are calculated using quantum mechanics, statistical mechanics, and simulation techniques
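For orientation (this is the generic pairwise Coulomb model, not necessarily the exact formulation used in drug-design codes), the electrostatic energy of \(N\) point charges is

\[
U = \frac{1}{4\pi\varepsilon_0} \sum_{i<j} \frac{q_i q_j}{r_{ij}},
\]

a sum over \(O(N^2)\) pairs. For large molecular systems this pair sum dominates the cost of each simulation step, which is a major reason such computations are parallelized.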
1.6d: Computing Protein Binding
1.6e: Computer-Aided Drug Design
1.7: Issues in Parallel Computing
- Design of parallel computers
- Design of efficient parallel algorithms
- Methods for evaluating parallel algorithms
- Parallel computer languages
- Parallel programming tools
- Portable parallel programs
- Automatic programming of parallel computers
- Education in the philosophy of parallel computing
1.8 Eclipse Parallel Tools Platform
- A standard, portable parallel integrated development environment that supports a wide range of parallel architectures and runtime systems (IBM)
- A scalable parallel debugger
- Support for the integration of a wide range of parallel tools
- An environment that simplifies end-user interaction with parallel systems
1.9 Message Passing Interface (MPI)
- We will use MPI for hands-on experience in this class
- The Message Passing Interface can be downloaded from an online website (MPICH)
- Parallel computing can be simulated on your own computers (Microsoft MPI)
- UK can provide distributed computing services for research purposes, for free
1.10 Cloud Computing
- Cloud computing is a style of computing in which dynamically scalable and often virtualized resources are provided over the Internet
- Users need not have knowledge of, expertise in, or control over the technology infrastructure in the “cloud” that supports them
- Compare to: grid computing (a cluster of networked, loosely coupled computers), utility computing (packaging of computing resources as a metered service), and autonomic computing (computer systems capable of self-management)
1.11 Cloud Computing
Figure by Sam Johnston, via Wikimedia Commons
1.12 Cloud Computing
A means to increase computing capacity or add computing capabilities at any time without investing in new infrastructure, training new personnel, or licensing new software
1.13–1.15 Cloud Computing (figure slides)
1.16 Top Players in Cloud Computing
- Amazon.com (IaaS; most comprehensive)
- VMware (vCloud; software to build clouds)
- Microsoft (PaaS, IaaS)
- Salesforce.com (SaaS)
- Google (IaaS, PaaS; Google App Engine)
- Rackspace (IaaS)
- IBM (OpenStack)
- Citrix (competes with VMware; free cloud operating system)
- Joyent (competes with VMware, OpenStack, Citrix)
- SoftLayer (web hosting service provider)
1.17 Parallel and Distributed Computing (CS621)
- This class is about parallel and distributed computing algorithms and applications
- Algorithms for communication between processors (see the sketch below)
- Algorithms for solving scientific computing problems
- Hands-on experience with the Message Passing Interface (MPI), writing parallel programs to solve some problems
This class is NOT about parallel data processing (CS626)
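As a taste of the message-passing style used throughout the course, here is a minimal point-to-point example (a sketch, not one of the course's algorithms): rank 0 sends an integer to rank 1, which receives and prints it.

```c
#include <stdio.h>
#include <mpi.h>

/* Minimal point-to-point communication: rank 0 sends an integer
   to rank 1, which receives and prints it.
   Run with at least two processes, e.g.: mpiexec -n 2 ./ptp */
int main(int argc, char *argv[]) {
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;   /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```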
*** Hands-on Experience
You can download and install a copy of Microsoft MPI (Message Passing Interface) at
https://www.microsoft.com/en-us/download/details.aspx?id=57467
You also need MS Visual Studio to work with it. https://visualstudio.microsoft.com/
*** Hands-on Experience
Here is a video demonstrating how to set up MS Visual Studio for MS MPI programming: https://www.youtube.com/watch?v=IW3SKDM_yEs&t=330s
Please make sure that you have installed MS Visual Studio and MS MPI on your computer
Please run the sample MPI “Hello World” program: https://mpitutorial.com/tutorials/mpi-hello-world/
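For reference, the core of that tutorial's program looks like the following (see the link above for the exact version); each MPI process prints its rank, the total number of processes, and its host name.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int world_size, world_rank, name_len;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                      /* start the MPI runtime */
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);  /* total number of ranks */
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);  /* this process's rank   */
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();                              /* shut down the runtime */
    return 0;
}
```

Once compiled (with MPICH: mpicc hello.c -o hello), run it with several processes, e.g. mpiexec -n 4 ./hello (under MS-MPI: mpiexec -n 4 hello.exe; the executable names here are just examples).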
*** Hands-on Experience
If you have serious research projects that need a lot of computing resources, you can also request an account on a supercomputer at the UK Center for Computational Science
Go to the Center for Computational Science, get the request form, fill it out, and ask your advisor to sign it
https://www.ccs.uky.edu/