Parallel Architectures
Part 1:
The rise of parallel machines
Intel Core i7
4 CPU cores
2 hardware threads per core
(8 “cores”)
Lab Cluster
Intel Xeon
4/10/16/18 CPU cores
2 hardware threads per core
(8/20/32/36 “cores”)
+ multi-socket boards
SUN UltraSPARC T3
16 CPU cores
8 hardware threads per core
(128 “cores”)
IBM Power 8
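The "core" counts in parentheses above are physical cores multiplied by hardware threads per core. A minimal Python sketch of that arithmetic, using the figures quoted on the slides; `logical_cores` is a hypothetical helper, and `os.cpu_count()` reports the logical CPUs the operating system sees on whatever machine runs it (it can return `None`).

```python
import os

def logical_cores(cores: int, threads_per_core: int) -> int:
    """Hardware threads ("cores" in quotes above) = physical cores x threads per core."""
    return cores * threads_per_core

# (physical cores, hardware threads per core), as listed on the slides
machines = {
    "Intel Core i7": (4, 2),
    "SUN UltraSPARC T3": (16, 8),
}
for name, (cores, tpc) in machines.items():
    print(f"{name}: {cores} x {tpc} = {logical_cores(cores, tpc)} hardware threads")

# The lab cluster's Xeon variants, 2 hardware threads per core each
xeon = [logical_cores(c, 2) for c in (4, 10, 16, 18)]
print(xeon)  # [8, 20, 32, 36]

# Logical CPUs visible to the OS on the machine running this (None if unknown)
print("This machine:", os.cpu_count())
```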
GPUs
• 2,000+ cores on one chip
NVIDIA TITAN Z
Top500.org
Part 2:
Taxonomies for Parallel Architectures
• Flynn’s Taxonomy: program control and memory access
• Taxonomy Based on Memory Organization
• Taxonomy Based on Processor Granularity
• Taxonomy Based on Processor Synchronization
• Taxonomy Based on Interconnection Architecture
Flynn’s Taxonomy
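• Computer architectures:
  – SISD: single instruction stream, single data stream
  – MISD: multiple instruction streams, single data stream
  – SIMD: single instruction stream, multiple data streams
  – MIMD: multiple instruction streams, multiple data streams
• Based on the method of program control (instruction streams) and memory access (data streams)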
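Flynn's four classes follow from two questions: is there one instruction stream or several, and one data stream or several? A tiny sketch of that classification; the function name `flynn_class` is illustrative only.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Classify a machine by counting its instruction streams and data streams."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("need at least one instruction stream and one data stream")
    i = "S" if instruction_streams == 1 else "M"  # single vs. multiple instruction streams
    d = "S" if data_streams == 1 else "M"         # single vs. multiple data streams
    return f"{i}I{d}D"

print(flynn_class(1, 1))  # SISD: standard sequential computer
print(flynn_class(4, 1))  # MISD
print(flynn_class(1, 4))  # SIMD
print(flynn_class(4, 4))  # MIMD
```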
SISD Computers
• Standard sequential computer.
• A single processing unit receives a single stream of instructions that operate on a single stream of data.
MISD Computers
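• p processors, each with its own control unit, share a common memory.
• Each processor executes its own instruction stream, but all of them operate on a single data stream.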
SIMD Computers
• All p identical processors operate under the control of a single instruction stream issued by a central control unit.
• There are p data streams, one per processor, so each processor can operate on different data.
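A toy model of the SIMD idea: one control unit broadcasts each instruction, and every processing element applies it in the same step to its own data element. The class `ToySIMD` is hypothetical, and the list comprehension only models the semantics; real SIMD hardware executes the lanes simultaneously.

```python
from typing import Callable, List

class ToySIMD:
    """One control unit broadcasting each instruction to p processing elements."""

    def __init__(self, data: List[int]):
        self.lanes = list(data)  # one data element (data stream) per processor

    def broadcast(self, instruction: Callable[[int], int]) -> None:
        # Same instruction, same step, every lane; only the data differs.
        self.lanes = [instruction(x) for x in self.lanes]

m = ToySIMD([1, 2, 3, 4])     # p = 4
m.broadcast(lambda x: x * 10)  # step 1: every lane multiplies by 10
m.broadcast(lambda x: x + 1)   # step 2: every lane adds 1
print(m.lanes)                 # [11, 21, 31, 41]
```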
MIMD Computers
• p processors
• p streams of instructions
• p streams of data
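By contrast with SIMD, each MIMD processor runs its own program on its own data. A sketch using Python's `concurrent.futures.ThreadPoolExecutor`, where three different functions stand in for three instruction streams; the functions are made up for illustration. Because of CPython's global interpreter lock this shows the programming model, not real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Three different programs (instruction streams)...
def sum_squares(xs):
    return sum(x * x for x in xs)

def longest_word(words):
    return max(words, key=len)

def count_evens(xs):
    return sum(1 for x in xs if x % 2 == 0)

# ...each paired with its own data stream: p = 3 processors.
jobs = [
    (sum_squares, [1, 2, 3]),
    (longest_word, ["bus", "crossbar", "mesh"]),
    (count_evens, [1, 2, 4, 7, 8]),
]

with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    futures = [pool.submit(fn, data) for fn, data in jobs]
    results = [f.result() for f in futures]

print(results)  # [14, 'crossbar', 3]
```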
Taxonomy Based on Memory Organization
• Distributed memory
• Shared memory
  – UMA (uniform memory access)
  – NUMA (non-uniform memory access)
Distributed Memory
• Each processor has its own memory.
• Communication is usually performed by message passing.
• Each processor can access:
  – its own memory, directly
  – the memory of another processor, via message passing
[Figure: distributed-memory organization, processors with their own memories connected by an interconnect]
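Python threads actually share one address space, so this sketch only simulates distributed memory by convention: each hypothetical `Node` reads only its own `memory` dict, and data from other nodes arrives exclusively as messages through per-rank `queue.Queue` inboxes. Real distributed-memory programs typically use a message-passing library such as MPI; the class and method names here are illustrative, not MPI's API. The example computes a global sum: every node sums its local chunk and sends the partial result to node 0.

```python
import threading
import queue

class Node(threading.Thread):
    """Toy distributed-memory processor: private memory plus a mailbox."""

    def __init__(self, rank, local_data, inboxes):
        super().__init__()
        self.rank = rank
        self.memory = {"data": local_data}  # private: by convention no other node reads it
        self.inboxes = inboxes              # one inbox per rank
        self.result = None

    def send(self, dest, msg):
        self.inboxes[dest].put((self.rank, msg))

    def recv(self):
        return self.inboxes[self.rank].get()  # blocks until a message arrives

    def run(self):
        partial = sum(self.memory["data"])    # direct access to own memory
        if self.rank == 0:
            total = partial
            for _ in range(len(self.inboxes) - 1):
                _src, value = self.recv()     # remote data arrives only via messages
                total += value
            self.result = total
        else:
            self.send(0, partial)

p = 4
inboxes = [queue.Queue() for _ in range(p)]
chunks = [[1, 2], [3, 4], [5, 6], [7, 8]]
nodes = [Node(r, chunks[r], inboxes) for r in range(p)]
for n in nodes:
    n.start()
for n in nodes:
    n.join()
print(nodes[0].result)  # 36
```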
Shared Memory
• Provides hardware support for reads/writes to a shared memory space
• Has a single address space shared by all processors
[Figure: shared-memory organization, processors, memory modules (Mem), and I/O controllers with I/O devices attached to interconnects]
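With a single shared address space, every thread can read and write the same variable directly, so concurrent updates need synchronization. A minimal sketch with Python's `threading.Lock` guarding a shared counter; the worker count and increments are arbitrary.

```python
import threading

counter = 0              # one location in the address space all threads share
lock = threading.Lock()

def worker(increments: int) -> None:
    global counter
    for _ in range(increments):
        with lock:       # serialize the read-modify-write on the shared location
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

Without the lock, `counter += 1` is a separate read and write, so updates from different threads can be lost.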
Scaling Up…
– The problem is the interconnect: cost (crossbar) or bandwidth (bus)
– Dance-hall (processors on one side of a scalable network, memories on the other): bandwidth still scalable, but lower cost than a crossbar
  • Latencies to memory are uniform, but uniformly large
– Distributed memory or non-uniform memory access (NUMA)
  • Construct a shared address space out of simple message transactions across a general-purpose network (e.g. read-request, read-response)
– Caching shared (particularly nonlocal) data?
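The NUMA idea of building a shared address space from message transactions can be sketched as follows: every global address has a home node; a read of a local address is an ordinary memory access, while a read of a nonlocal address becomes a read-request/read-response pair. The block-wise address mapping (`WORDS_PER_NODE`), the class name, and the memory contents are all hypothetical choices for illustration.

```python
from typing import List, Tuple

WORDS_PER_NODE = 4  # hypothetical: each node is the "home" of 4 consecutive words

def home_of(addr: int) -> Tuple[int, int]:
    """Map a global address to (home node, offset within that node's local memory)."""
    return addr // WORDS_PER_NODE, addr % WORDS_PER_NODE

class SharedAddressSpace:
    """Toy NUMA model: one global address space stitched together from per-node memories."""

    def __init__(self, num_nodes: int):
        # Arbitrary demo contents: node i holds i*100 + 0 .. i*100 + 3.
        self.mem = [[i * 100 + j for j in range(WORDS_PER_NODE)] for i in range(num_nodes)]
        self.messages: List[str] = []  # network transactions, in order

    def read(self, requester: int, addr: int) -> int:
        home, offset = home_of(addr)
        if home == requester:
            return self.mem[home][offset]  # local access: no network traffic
        # Nonlocal access: realized as a request/response pair over the network.
        self.messages.append(f"read-request {requester}->{home} addr={addr}")
        value = self.mem[home][offset]
        self.messages.append(f"read-response {home}->{requester} value={value}")
        return value

space = SharedAddressSpace(num_nodes=2)
local_val = space.read(requester=0, addr=1)   # home is node 0 -> 1
remote_val = space.read(requester=0, addr=6)  # home is node 1 -> 102
print(local_val, remote_val)
print(space.messages)
```

The nonuniformity is visible in the message log: the local read costs nothing on the network, the remote read costs two messages.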
Taxonomy Based on Processor Granularity
• Coarse grained: few powerful processors
• Fine grained: many small processors (massively parallel)
• Medium grained: between the two
Taxonomy Based on Processor Synchronization
• Asynchronous: processors run on independent clocks. The user has to synchronize them via message passing or shared variables.
• Fully synchronous: processors run in lockstep on one global clock.
• Bulk-synchronous: a hybrid. Processors have independent clocks, but support is provided for global synchronization that the user’s application program can call.
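The bulk-synchronous style can be sketched with `threading.Barrier` as the user-callable global synchronization: each thread runs at its own pace within a superstep, and the barrier ends the superstep. The update rule (add the left neighbour's value) and all names are hypothetical, chosen only to make the supersteps observable.

```python
import threading

P = 4                    # number of processors (threads here)
SUPERSTEPS = 3
barrier = threading.Barrier(P)
values = list(range(P))  # values[r] is processor r's local value: [0, 1, 2, 3]
history = []             # snapshot of all values after each superstep

def processor(rank: int) -> None:
    for _ in range(SUPERSTEPS):
        # Read phase: fetch the left neighbour's value from the previous superstep.
        left = values[(rank - 1) % P]
        barrier.wait()   # all reads finish before anyone overwrites
        # Compute phase: update only this processor's own slot.
        values[rank] += left
        barrier.wait()   # global synchronization closes the superstep
        if rank == 0:
            history.append(list(values))

threads = [threading.Thread(target=processor, args=(r,)) for r in range(P)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(history)  # [[3, 1, 3, 5], [8, 4, 4, 8], [16, 12, 8, 12]]
```

Between barriers the threads are unsynchronized (asynchronous); the barriers are the only global synchronization points.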
Taxonomy Based on Interconnection Architectures
• Static
  – Point-to-point connections
• Dynamic
  – Network with switches
  – Crossbars
  – Buses
[Figure: interconnect network]
Static Interconnection Topologies
Linear Array
Ring
• Diameter: maximum distance (number of links on a shortest path) between any two processors
• Bisection width: minimum number of links that must be cut to split the network into two equal halves
• Cost: number of links
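The three metrics can be computed mechanically for small networks: diameter via breadth-first search from every node, bisection width by brute force over all equal splits (feasible only for small, even node counts), and cost as the number of links. A sketch for an 8-node linear array and ring; the helper names are hypothetical, and the graph is assumed connected.

```python
from collections import deque
from itertools import combinations

def linear_array(n):
    """Links of an n-node linear array: i -- i+1."""
    return [(i, i + 1) for i in range(n - 1)]

def ring(n):
    """Links of an n-node ring (n >= 3): a linear array plus a wraparound link."""
    return [(i, (i + 1) % n) for i in range(n)]

def diameter(n, links):
    """Largest shortest-path distance over all node pairs, via BFS from every node."""
    adj = {v: [] for v in range(n)}
    for u, v in links:
        adj[u].append(v)
        adj[v].append(u)
    worst = 0
    for src in range(n):
        dist = {src: 0}
        frontier = deque([src])
        while frontier:
            u = frontier.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    frontier.append(v)
        worst = max(worst, max(dist.values()))
    return worst

def bisection_width(n, links):
    """Fewest links cut by any split into two equal halves (brute force: small, even n)."""
    best = len(links)
    for half in combinations(range(n), n // 2):
        side = set(half)
        best = min(best, sum((u in side) != (v in side) for u, v in links))
    return best

n = 8
for name, links in (("linear array", linear_array(n)), ("ring", ring(n))):
    print(f"{name}: diameter={diameter(n, links)}, "
          f"bisection width={bisection_width(n, links)}, cost={len(links)}")
# linear array: diameter=7, bisection width=1, cost=7
# ring: diameter=4, bisection width=2, cost=8
```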
Mesh and Torus
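For a concrete size, the same measurements can be taken on a k x k mesh and a k x k torus (the mesh plus wraparound links in each row and column). A brute-force sketch for k = 4, with hypothetical helper names; nodes are (row, column) pairs, and the bisection search is only practical for small k.

```python
from collections import deque
from itertools import combinations

def grid_links(k, wrap=False):
    """Links of a k x k 2-D mesh (k >= 2); wrap=True adds wraparound links, giving a torus."""
    links = set()
    for i in range(k):
        for j in range(k):
            for di, dj in ((0, 1), (1, 0)):  # right and down neighbours
                ni, nj = i + di, j + dj
                if wrap:
                    ni, nj = ni % k, nj % k
                elif ni == k or nj == k:
                    continue                  # mesh: no link off the edge
                links.add(frozenset({(i, j), (ni, nj)}))  # the set removes duplicate links
    return [tuple(link) for link in links]

def diameter(links):
    """Largest shortest-path distance, via BFS from every node."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    worst = 0
    for src in adj:
        dist = {src: 0}
        frontier = deque([src])
        while frontier:
            u = frontier.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    frontier.append(v)
        worst = max(worst, max(dist.values()))
    return worst

def bisection_width(links):
    """Fewest links cut by any split of the nodes into two equal halves (brute force)."""
    nodes = sorted({v for link in links for v in link})
    best = len(links)
    for half in combinations(nodes, len(nodes) // 2):
        side = set(half)
        best = min(best, sum((u in side) != (v in side) for u, v in links))
    return best

k = 4
for name, wrap in (("mesh", False), ("torus", True)):
    links = grid_links(k, wrap)
    print(f"{k}x{k} {name}: diameter={diameter(links)}, "
          f"bisection width={bisection_width(links)}, cost={len(links)}")
# 4x4 mesh: diameter=6, bisection width=4, cost=24
# 4x4 torus: diameter=4, bisection width=8, cost=32
```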
Diameter? Bisection width? Cost?
• Tree
Diameter? Bisection width? Cost?
• Complete Network
Diameter? Bisection width? Cost?
• d-dimensional Hypercube
2^d processors
[Figure: hypercubes for d = 0, 1, 2, 3, 4, 5]
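Hypercube structure is easiest to see with binary labels: give each of the 2^d processors a d-bit label, and connect two processors exactly when their labels differ in one bit. The shortest path between two nodes then flips the differing bits one at a time, so its length is the Hamming distance of the labels. A sketch for d = 3 with hypothetical helper names; changing `d` works the same way.

```python
from typing import List

def neighbors(node: int, d: int) -> List[int]:
    """In a d-dimensional hypercube, neighbours differ in exactly one of the d label bits."""
    return [node ^ (1 << bit) for bit in range(d)]

def distance(a: int, b: int) -> int:
    """Shortest-path length = Hamming distance between the two labels."""
    return bin(a ^ b).count("1")

d = 3
n = 2 ** d  # 2^d processors
links = {frozenset((u, v)) for u in range(n) for v in neighbors(u, d)}
diam = max(distance(u, v) for u in range(n) for v in range(n))
print(n, neighbors(0, d), len(links), diam)  # 8 [1, 2, 4] 12 3
```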
Diameter? Bisection width? Cost?
• Fat Tree
Diameter? Bisection width? Cost?
Switch-based interconnection network
Summary
Taxonomy of parallel machines
[Figure: chart placing massively parallel clusters, coarse-grained clusters, multi-core processors, and GPUs by granularity (fine/coarse grained), memory organization (distributed/shared), and control (MIMD/SIMD)]
• Massively parallel cluster (MIMD, distributed memory, fine grained)
• Coarse-grained cluster (MIMD, distributed memory, coarse grained)
• Multi-core processor (MIMD, shared memory, coarse grained)
• GPU (SIMD, shared memory, fine grained)