Parallel Architectures

Part 1: The rise of parallel machines

Intel Core i7
• 4 CPU cores
• 2 hardware threads per core (8 “cores”)

Lab Cluster: Intel Xeon
• 4/10/16/18 CPU cores
• 2 hardware threads per core (8/20/32/36 “cores”)
• + multi-socket boards

SUN UltraSPARC T3
• 16 CPU cores
• 8 hardware threads per core (128 “cores”)

IBM Power 8

GPUs
• 2,000+ cores on one chip
NVIDIA TITAN Z

Top500.org

Part 2: Taxonomies for Parallel Architectures

Taxonomies for Parallel Architectures
• Flynn’s Taxonomy - program control and memory access
• Taxonomy Based on Memory Organization
• Taxonomy Based on Processor Granularity
• Taxonomy Based on Processor Synchronization
• Taxonomy Based on Interconnection Architecture

Flynn’s Taxonomy
• Computer architectures:
  – SISD (single instruction, single data)
  – MISD (multiple instruction, single data)
  – SIMD (single instruction, multiple data)
  – MIMD (multiple instruction, multiple data)
• Based on method of program control and memory access

SISD Computers
• Standard sequential computer.
• A single processing unit receives a single stream of instructions that operate on a single stream of data.

MISD Computers
• p processors, each with its own control unit, share a common memory.

SIMD Computers
• All p identical processors operate under the control of a single instruction stream issued by a central control unit.
• There are p data streams, one per processor, so different data can be used in each processor.
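
The SIMD model described above can be illustrated with a small Python sketch. This is only a sequential simulation under stated assumptions, not real SIMD hardware: the function name `simd_step`, the two-instruction `program`, and the choice of p = 4 are all hypothetical and chosen for illustration. A single "control unit" loop issues each instruction once, and every simulated processor applies it to its own data element.

```python
import operator

def simd_step(instruction, local_data, operand):
    """Apply one broadcast instruction to each processor's own data element.

    On real SIMD hardware all p elements would be processed in lockstep;
    here a list comprehension stands in for that (illustrative assumption).
    """
    return [instruction(x, operand) for x in local_data]

# p = 4 simulated processors, each holding one element of its data stream
data = [1, 2, 3, 4]

# One instruction stream issued by the central control unit
program = [(operator.mul, 2), (operator.add, 1)]

for instruction, operand in program:
    data = simd_step(instruction, data, operand)

print(data)  # [3, 5, 7, 9]
```

Every processor executes the same instruction at each step; only the data differs.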

MIMD Computers
• p processors
• p streams of instructions
• p streams of data

Taxonomy Based on Memory Organization
• Distributed memory
• Shared memory
  – UMA
  – NUMA

Distributed Memory
• Each processor has its own memory
• Communication is usually performed by message passing
• Each processor can access
  – its own memory, directly
  – memory of another processor, via message passing
[Figure: processors with private memories connected by an interconnect]

Shared Memory
• provides hardware support for read/write to a shared memory space
• has a single address space shared by all processors
[Figure: processors, memory modules (Mem), and I/O controllers with I/O devices connected by an interconnect]

Scaling Up…
– Problem is interconnect: cost (crossbar) or bandwidth (bus)
– Dance-hall: bandwidth still scalable, but lower cost than crossbar
  • latencies to memory uniform, but uniformly large
– Distributed memory or non-uniform memory access (NUMA)
  • Construct shared address space out of simple message transactions across a general-purpose network (e.g. read-request, read-response)
– Caching shared (particularly nonlocal) data?

Taxonomy Based on Processor Granularity
• Coarse Grained: Few powerful processors
• Fine Grained: Many small processors (massively parallel)
• Medium Grained: …between the two...

Taxonomy Based on Processor Synchronization
• Asynchronous: Processors run on independent clocks. The user has to synchronize via message passing or shared variables.
• Fully Synchronous: Processors run in sync on one global clock.
• Bulk-synchronous: Hybrid. Processors have independent clocks. Support is provided for global synchronization, to be called by the user’s application program.
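
The bulk-synchronous style described above can be sketched with Python threads. This is a minimal illustration under stated assumptions, not a real parallel machine: `run_bsp`, the ring-shaped exchange pattern, and the `inbox` list standing in for message passing are all hypothetical. Each worker runs independently within a phase and calls a global barrier (`threading.Barrier` from the standard library) to separate the communication and computation phases of each superstep.

```python
import threading

def run_bsp(p=4, supersteps=3):
    """Simulate p processors doing `supersteps` rounds of exchange + compute."""
    barrier = threading.Barrier(p)   # global synchronization point
    values = list(range(p))          # values[i] is owned by worker i
    inbox = [0] * p                  # inbox[i] receives one "message" per superstep

    def worker(rank):
        for _ in range(supersteps):
            # Communication phase: send own value to the right-hand neighbour.
            inbox[(rank + 1) % p] = values[rank]
            barrier.wait()           # all messages delivered
            # Computation phase: combine the received value with the local one.
            values[rank] += inbox[rank]
            barrier.wait()           # all inboxes consumed before the next send

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values

print(run_bsp())  # [16, 12, 8, 12]
```

Between barriers the workers proceed at their own pace; the two `barrier.wait()` calls are the only points where the application requests global synchronization.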

Taxonomy Based on Interconnection Architectures
• Static
  – Point to point connections
• Dynamic
  – Network with switches
  – Crossbars
  – Buses
[Figure: interconnect network]

Static Interconnection Topologies
• Linear Array
• Ring
Metrics:
• Diameter (maximum distance between any two processors)
• Bisection Width (minimum number of links that must be cut to split the network into two equal halves)
• Cost (number of links)

Static Interconnection Topologies
• Mesh
• Torus
Diameter? Bisection Width? Cost?

Static Interconnection Topologies
• Tree
Diameter? Bisection Width? Cost?

Static Interconnection Topologies
• Complete Network
Diameter? Bisection Width? Cost?

Static Interconnection Topologies
• d-dim Hypercube
  – 2^d processors
[Figure: hypercubes for d=0, d=1, d=2, d=3, d=4, d=5]
Diameter? Bisection Width? Cost?

Static Interconnection Topologies
• Fat Tree
Diameter? Bisection Width? Cost?
[Figure: switch-based interconnection network]

Summary

Taxonomy of parallel machines

                     Fine grained                         Coarse grained
Distributed memory   massively parallel clusters (MIMD)   coarse grained clusters (MIMD)
Shared memory        GPU (SIMD)                           multi-core (MIMD)

• Massively parallel cluster (MIMD, distributed memory, fine grained)
• Coarse grained cluster (MIMD, distributed memory, coarse grained)
• Multi-core processor (MIMD, shared memory, coarse grained)
• GPU (SIMD, shared memory, fine grained)
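
The "Diameter? Cost?" questions on the static topology slides can be explored numerically. The sketch below is one possible approach, not part of the original slides: it builds a few topologies as adjacency lists and measures diameter by breadth-first search and cost by counting links. All function names (`linear_array`, `ring`, `hypercube`, `complete`, `diameter`, `cost`) are hypothetical helpers. Bisection width is left out because computing it in general requires searching over partitions.

```python
from collections import deque

def linear_array(p):
    """p processors in a line; each is linked to its immediate neighbours."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < p] for i in range(p)}

def ring(p):
    """Linear array with the two ends joined (assumes p >= 3)."""
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}

def hypercube(d):
    """2^d processors; two are linked if their labels differ in exactly one bit."""
    return {i: [i ^ (1 << b) for b in range(d)] for i in range(2 ** d)}

def complete(p):
    """Every processor is linked to every other processor."""
    return {i: [j for j in range(p) if j != i] for i in range(p)}

def cost(adj):
    """Number of links; each undirected link appears in two adjacency lists."""
    return sum(len(neighbours) for neighbours in adj.values()) // 2

def diameter(adj):
    """Largest shortest-path distance, found by a BFS from every processor."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        best = max(best, max(dist.values()))
    return best

for name, net in [("linear array, p=8", linear_array(8)),
                  ("ring, p=8", ring(8)),
                  ("hypercube, d=3", hypercube(3)),
                  ("complete, p=8", complete(8))]:
    print(f"{name}: diameter={diameter(net)}, cost={cost(net)}")
```

Running the loop with other sizes is a quick way to check hand-derived formulas for each topology.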
