Scalability and Classifications

Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static connections dynamic network topologies by switches ethernet connections MCS 572 Lecture 2 Introduction to Supercomputing Jan Verschelde, 13 January 2021 Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 1 / 28 Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static connections dynamic network topologies by switches ethernet connections Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 2 / 28 the classification of Flynn In 1966, Flynn introduced the following classification: SISD = Single Instruction Single Data stream one single processor handles data sequentially use pipelining (e.g.: car assembly) to achieve parallelism MISD = Multiple Instruction Single Data stream called systolic arrays, has been of little interest SIMD = Single Instruction Multiple Data stream graphics computing, issue same command for pixel matrix vector and arrays processors for regular data structures MIMD = Multiple Instruction Multiple Data stream this is the general purpose multiprocessor computer Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 3 / 28 Single Program Multiple Data Stream One model is SPMD: Single Program Multiple Data stream: 1 All processors execute the same program. 2 Branching in the code depends on the identification number of the processing node. Manager worker paradigm: manager (also called root) has identification zero, workers are labeled 1; 2;:::; p − 1. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 4 / 28 Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static connections dynamic network topologies by switches ethernet connections Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 5 / 28 types of parallel computers shared memory distributed memory m1 m2 m3 m4 network 6? 6? 6? 6? 6? 6? 6? 6? network p1 p2 p3 p4 6? 6? 6? 6? 6? 6? 6? 6? p1 p2 p3 p4 m1 m2 m3 m4 One crude distinction concerns memory, shared or distributed: A shared memory multicomputer has one single address space, accessible to every processor. In a distributed memory multicomputer, every processor has its own memory accessible via messages through that processor. Many parallel computers consist of multicore nodes. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 6 / 28 clusters Definition A cluster is an independent set of computers combined into a unified system through software and networking. Beowulf clusters are scalable performance clusters based on commodity hardware, on a private network, with open source software. What drove the clustering revolution in computing? 1 commodity hardware: choice of many vendors for processors, memory, hard drives, etc... 2 networking: Ethernet is dominating commodity networking technology, supercomputers have specialized networks; 3 open source software infrastructure: Linux and MPI. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 7 / 28 Message Passing and Scalability total time = computation time + communication time | {z } o v e r h e a d Because we want to reduce the overhead, the computation time computation/communication ratio = communication time determines the scalability of a problem: How well can we increase the problem size n, keeping p, the number of processors fixed? Desired: order of overhead order of computation, so ratio ! 1, 2 examples: O(log2(n)) < O(n) < O(n ). Remedy: overlapping communication with computation. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 8 / 28 Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static connections dynamic network topologies by switches ethernet connections Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 9 / 28 distributed shared memory computers In a distributed shared memory computer: memory is physically distributed with each processor, and each processor has access to all memory in single address space. Benefits: 1 message passing often not attractive to programmers, 2 shared memory computers allow limited number of processors, whereas distributed memory computers scale well Disadvantage: access to remote memory location causes delays and the programmer does not have control to remedy the delays. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 10 / 28 Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static connections dynamic network topologies by switches ethernet connections Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 11 / 28 some terminology bandwidth: number of bits transmitted per second on latency, we distinguish tree types: message latency: time to send zero length message (or startup time), network latency: time to make a message transfer the network, communication latency: total time to send a message including software overhead and interface delays. diameter of network: minimum number of links between nodes that are farthest apart on bisecting the network: bisection width : number of links needed to cut network in two equal parts, bisection bandwidth: number of bits per second which can be sent from one half of network to the other half. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 12 / 28 arrays, rings, meshes, and tori Connecting p nodes in complete graph is too expensive. An array and ring of 4 nodes: t t t t £ t t t t ¢ ¡ A matrix and torus of 16 nodes: £ £ £ £ t t t t £ t t t t ¢ ¡ t t t t £ t t t t ¢ ¡ t t t t £ t t t t ¢ ¡ t t t t £ t t t t ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ ¢ ¡ Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 13 / 28 hypercube network Two nodes are connected , their labels differ in exactly one bit. 011 111 01 11 001 t t 101 t t t t 110 010 t t 00 10 000 100 t t t t e-cube or left-to-right routing: flip bits from left to right, e.g.: going from node 000 to 101 passes through 100. In a hypercube network with p nodes: maximum number of flips is log2(p), number of connections is :::? Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 14 / 28 a tree network Consider a binary tree: The leaves in the tree are processors. The interior nodes in the tree are switches. P PP PP @ @ @ @ ¡ A ¡ A ¡ A ¡ A ¡ A ¡ A ¡ A ¡ A Often the tree is fat: t t t t t t t t with an increasing number of links towards the root of the tree. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 15 / 28 Scalability and Classifications 1 Types of Parallel Computers MIMD and SIMD classifications shared and distributed memory multicomputers distributed shared memory computers 2 Network Topologies static connections dynamic network topologies by switches ethernet connections Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 16 / 28 a crossbar switch In a shared memory multicomputer, processors are usually connected to memory modules by a crossbar switch. For example, for p = 4: p4 t p3 t p2 t p1 t m1 m2 m3 m4 t t t t A p-processor shared memory computer requires p2 switches. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 17 / 28 2-by-2 switches ¨H ¨H a switch pass through cross over Changing from pass through to cross over configuration: 1 2 1 2 ¨H¨H t t t t A¡ ¡A ¨H 3 4 3 ¨H 4 t t t t Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 18 / 28 a multistage network Rules in the routing algorithm: 1 bit is zero: select upper output of switch 2 bit is one: select lower output of switch The first bit in the input determines the output of the first switch, the second bit in the input determines the output of the second switch. 00 00 10 t A ¡ t 01 inputst ¡A t outputs 01 ¡ A 10 11 t t11 t t The communication between 2 nodes using 2-by-2 switches causes blocking: other nodes are prevented from communicating. Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 19 / 28 a 3-stage Omega interconnection network circuit switching: 000 000 100 001 t A £ A £ t t A£ A£ t 001 £A £A 010 101 B £ B £ 011 t C £B ¢ C £B ¢ t t £C ¢B £C ¢B t 010 £ C¢ B £ C¢ B 100 110 ¢ C ¢ C 101 t C¡ C¡ t t ¡C ¡C t 011 ¡ C ¡ C 110 111 t t111 t t p number of switches for p processors: log2(p) × 2 Introduction to Supercomputing (MCS 572) scalability and classifications L-2 13 January 2021 20 / 28 circuit and packet switching If all circuits are occupied, communication is blocked. Alternative solution: packet switching: message is broken in packets and sent through network.

Scalability and Classifications

Distributed Algorithms with Theoretic Scalability Analysis of Radial and Looped Load ﬂows for Power Distribution Systems

Scalability and Performance Management of Internet Applications in the Cloud

Scalability and Optimization Strategies for GPU Enhanced Neural Networks (Genn)

Parallel System Performance: Evaluation & Scalability

Multiprocessing and Scalability

Pascal Viewer: a Tool for the Visualization of Parallel Scalability Trends

Computer Architecture: Parallel Processing Basics

Parallel Programming with Openmp

Challenges for the Message Passing Interface in the Petaflops Era

Scalable SIMD-Efficient Graph Processing on Gpus

Computing at Massive Scale: Scalability and Dependability Challenges

CUDA Basics Murphy Stein New York University