CSE 560 Computer Systems Architecture: Multiprocessors

Flynn's Taxonomy

• Proposed by Michael Flynn in 1966
• SISD – single instruction, single data
  • Traditional uniprocessor
• SIMD – single instruction, multiple data (see the sketch after this list)
  • Execute the same instruction on many data elements
  • Vector machines, graphics engines
• MIMD – multiple instruction, multiple data
  • Each processor executes its own instructions
  • Multicores are all built this way
• SPMD – single program, multiple data (an extension proposed by Frederica Darema)
  • A MIMD machine in which each node executes the same code
• MISD – multiple instruction, single data
  • Systolic arrays
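To make the SISD/SIMD distinction concrete, here is a minimal C sketch (an illustrative addition, not from the slides) that performs the same four additions twice: once as a scalar loop and once with a single vector instruction operating on four data elements at once. The use of x86 SSE intrinsics is an assumption about the target machine.

```c
/* Illustrative sketch (assumes an x86 CPU with SSE): the same four
 * additions done one element at a time (SISD-style) and then with a
 * single vector instruction covering all four elements (SIMD). */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[4] = {1, 2, 3, 4};
    float b[4] = {10, 20, 30, 40};
    float c[4];

    /* SISD: one add instruction per data element */
    for (int i = 0; i < 4; i++)
        c[i] = a[i] + b[i];

    /* SIMD: one ADDPS instruction adds all four lanes at once */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);
    _mm_storeu_ps(c, vc);

    printf("%.0f %.0f %.0f %.0f\n", c[0], c[1], c[2], c[3]);
    return 0;
}
```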

Shared-Memory Multiprocessors

• Conceptual model: processors P0–P3 share a single memory through their caches
• The shared-memory abstraction – a common address space with cache coherence (covered previously)
• Familiar and feels natural to programmers
• Scales to about 10s to 100 processors
• Life would be easy if systems actually looked like this…

Distributed-Memory Multiprocessors

• …but systems actually look more like this: memory is physically distributed
• Each node pairs a processor and cache with local memory (M0–M3) and a router/interface, joined by an interconnect
• Needed when we want to scale up to 1000s (or millions) of cores
• Separate address spaces
• Arbitrary interconnect – custom, LAN, WAN

Connect Processors via Network

• Cluster approach: the interconnect is a Local-Area Network (LAN)
• Off-the-shelf processors (each of which is a multicore)
• Connected using off-the-shelf networking technology
• Leverages existing components – inexpensive to design
• Cloud service providers do this a lot: Amazon Web Services (AWS), Microsoft Azure
• Scales up very easily – 1000s of nodes
• But long latency to move data – traverse the network for one cache line? Nope!

Programming Models

• TCP/IP message delivery
  • IP addresses; the network handles routing, etc.
  • Socket-based programming (see the sketch after this list)
• Higher-level abstractions
  • Distributed shared memory – works but performs poorly (latency again)
  • Map-Reduce – Hadoop, etc.
  • Streaming data – Apache Storm, etc.
  • Explicit message passing (more on this later)
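As a concrete example of the socket-based programming mentioned above, here is a minimal POSIX TCP client sketch in C. The peer address 10.0.0.2 and port 5000 are hypothetical placeholders and error handling is abbreviated; the point is that the kernel's TCP/IP stack, not the application, handles routing and reliable delivery.

```c
/* Minimal POSIX TCP client sketch. The peer address and port are
 * hypothetical placeholders; error handling is abbreviated. */
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* TCP socket */

    struct sockaddr_in peer = {0};
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5000);                   /* hypothetical port */
    inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr);  /* hypothetical node */

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect");
        return 1;
    }

    /* The kernel's TCP/IP stack handles routing, ordering, and
     * retransmission; the application just reads and writes bytes. */
    const char msg[] = "hello";
    send(fd, msg, sizeof msg, 0);

    char buf[64];
    ssize_t n = recv(fd, buf, sizeof buf, 0);
    if (n > 0)
        printf("received %zd bytes\n", n);

    close(fd);
    return 0;
}
```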
Virtualization

• Sharing the processor cores
• VM technology allows multiple virtual machines to run on a single physical machine
• The hypervisor schedules VMs onto physical cores

Cluster Interconnect

• Ethernet switches
• The 1st tier are top-of-rack (ToR) switches
• Additional tiers connect racks; the top tier talks to the outside world
• Lots of redundant paths

Can We Fix the Latency Issue?

• In the cluster approach, TCP/IP network technology is dominant
• But is it needed? Or just readily available?

Custom Interconnect

• Known topology, trusted environment
• Routing is easier
• Security is easier
• (diagram: the endpoints still speak TCP/IP, now over a custom interconnect)

Interconnect Topologies

• Mesh; torus (a wraparound mesh)
• Routing is straightforward: move along the row to the destination column, then along the column to the destination
• Forwarding can be fast
  • Old-school: store-and-forward
  • Modern: cut-through

Cray Dragonfly

• Custom design for supercomputers
• Big applications with lots of parallelism
• Low-overhead message delivery
• All tiers in one switch (Aries)

Cray Dragonfly Network

• A mesh with additional links

Back to Standardized Interconnect

• The issue with Ethernet is latency
  • Protocol processing at the endpoints
  • Store-and-forward routing
• (diagram: still TCP/IP, now over an Infiniband interconnect)

Infiniband

• Standardized technology
  • Multiple vendors; equipment works together; competition
• Not trying to be the “Internet” – focused on low-latency interconnect needs
  • E.g., easier routing, simpler security model
• Minimize protocol processing
• Fast forwarding: cut-through packet delivery
• Remote Direct Memory Access (RDMA) – supports single-ended messaging

Network Programming Paradigm

• Message passing
• MPI (Message Passing Interface) is the de facto standard
• Used by almost all supercomputing applications

More MPI

• MPI capabilities beyond just send() and recv()
• One-sided communication: get() and put()
• Collective operations (see the sketch after this list)
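To make the message-passing paradigm concrete, here is a minimal SPMD sketch using the standard MPI C API: every rank runs the same program, ranks 0 and 1 exchange a point-to-point message, and all ranks then join a collective reduction. One-sided MPI_Put/MPI_Get calls follow a similar pattern but require setting up a memory window, omitted here for brevity.

```c
/* Minimal SPMD sketch with MPI: every node runs this same program
 * and branches on its rank. Build with mpicc; run with, e.g.,
 * mpirun -np 4 ./a.out. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Two-sided messaging: rank 0 sends, rank 1 receives. */
    if (rank == 0 && nprocs > 1) {
        int payload = 42;
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", payload);
    }

    /* Collective operation: every rank contributes its rank number
     * and every rank receives the global sum. */
    int mine = rank, sum = 0;
    MPI_Allreduce(&mine, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", nprocs - 1, sum);

    MPI_Finalize();
    return 0;
}
```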
SIMD Instructions

• (figure: SIMD instruction operation; image by Decora at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=30547549)

Graphics Engines

• Heterogeneous multiprocessors
• Many processing elements (PEs), many threads per PE
• Collections of threads execute in lock-step (SIMD-like)
• Hide memory latency by switching threads

Systolic Arrays

• H.T. Kung, “Why Systolic Architectures?,” Computer, 1982
• Purpose-built designs for specific problems
• A custom PE, replicated many times
• E.g., an array of MAC (multiply-accumulate) units for an FIR filter (see the sketch at the end of this section)
• RNA folding [Jacob et al. 2010]

Tensor Processing Unit

• (figures only on the original slides)
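The sketch below emulates, in plain C, the FIR example from the Systolic Arrays slide: a linear array of four multiply-accumulate PEs, each holding one resident filter weight, with partial sums advancing one PE per clock tick. It uses the transposed FIR form, in which the new sample reaches every PE in the same tick; that broadcast is a simplification of the purely nearest-neighbor designs in Kung's paper, and the moving-average weights are an arbitrary choice.

```c
/* Software emulation of a 4-PE multiply-accumulate array computing an
 * FIR filter (transposed form). Each PE holds one resident weight and
 * one partial-sum register; on every tick each PE folds the new sample
 * into the sum received from its right-hand neighbor. */
#include <stdio.h>

#define TAPS 4

int main(void) {
    const float w[TAPS] = {0.25f, 0.25f, 0.25f, 0.25f}; /* moving average */
    float s[TAPS + 1] = {0};  /* s[j]: partial sum in PE j; s[TAPS] = 0 */
    const float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    for (int n = 0; n < 8; n++) {
        /* One clock tick. Updating j in increasing order reads each
         * s[j+1] before it is overwritten, which models the one-tick
         * register delay between neighboring PEs. */
        for (int j = 0; j < TAPS; j++)
            s[j] = w[j] * x[n] + s[j + 1];

        printf("y[%d] = %.2f\n", n, s[0]); /* filter output exits PE 0 */
    }
    return 0;
}
```

After the array fills, each tick produces one output: y[n] = w[0]x[n] + w[1]x[n-1] + w[2]x[n-2] + w[3]x[n-3], the 4-tap moving average.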
