Topologies How to Design

Networks: Topologies How to Design
Gilad Shainer, [email protected]

TOP500 Statistics
[Two chart slides]

World Leading Large-Scale Systems
• National Supercomputing Centre in Shenzhen: Fat-tree, 5.2K nodes, 120K cores, NVIDIA GPUs, China (Petaflop)
• Tokyo Institute of Technology: Fat-tree, 4K nodes, NVIDIA GPUs, Japan (Petaflop)
• Commissariat a l'Energie Atomique (CEA): Fat-tree, 4K nodes, 140K cores, France (Petaflop)
• Los Alamos National Lab, Roadrunner: Fat-tree, 4K nodes, 130K cores, USA (Petaflop)
• NASA: Hypercube, 9.2K nodes, 82K cores, USA
• Jülich JuRoPa: Fat-tree, 3K nodes, 30K cores, Germany
• Sandia National Labs, “Red Sky”: 3D-Torus, 5.4K nodes, 43K cores, USA

ORNL “Spider” System – Lustre File System
• Oak Ridge National Lab central storage system
  – 13400 drives
  – 192 Lustre OSS
  – 240GB/s bandwidth
  – InfiniBand interconnect
  – 10PB capacity

Network Topologies
• Fat-tree (CLOS), Mesh, 3D-Torus topologies
• CLOS (fat-tree)
  – Can be fully non-blocking (1:1) or blocking (x:1)
  – Typically enables best performance
    • Non-blocking bandwidth, lowest network latency
• Mesh or 3D Torus
  – Blocking network, cost-effective for systems at scale
  – Great performance solutions for applications with locality
  – Support for dedicated sub-networks
  – Simple expansion for future growth
[Figure: 3x3 grid of nodes labeled 0,0 through 2,2]

d-Dimensional Torus Topology
• Formal definition
  – T = (V,E) is said to be a d-dimensional torus of size N_1 × N_2 × … × N_d if:
    • V = {(v_1, v_2, …, v_d) : 0 ≤ v_i ≤ N_i − 1}
    • E = {(u,v) : there exists j such that (1) v_i = u_i for each i ≠ j, and (2) v_j = (u_j ± 1) mod N_j}
• Examples
  – N_1 = 5: a ring of nodes 0, 1, 2, 3, 4
  – N_1 = N_2 = 3: a 3x3 grid of nodes 0,0 through 2,2 with wraparound links

3D-Torus System – Key Items
• Multiple server nodes per cube junction
• The smaller the 3D cube size, the better
  – Lowest latency between remote nodes
  – Minimizing throughput contention
• Ability to connect storage
• Support for separate networks
  – Dedicated network (links) for specific applications/usage
  – Example: links dedicated for collectives or specific jobs

InfiniBand 3D Torus
[Figure slide]

Routing for 3D Torus (Avoiding Deadlocks)
• Setting routing might look simple
  – Just route packets on the shortest path between source and destination
• In lossless networks trivial routing can be disastrous
• Communication pairs on the 5-node ring
  1. 0→2
  2. 1→3
  3. 2→4
  4. 3→0
  5. 4→1

Avoiding Deadlock – Restrictive Approach
• Idea
  – Define a set of rules forbidding usage of some resources, or a (temporal) combination of resources, which will guarantee freedom from deadlock
  – Design a routing complying with the rules
[Figure: the ring example with restricted routing]

Avoiding Deadlock – Separation Approach
• Idea
  – Decompose each (unidirectional) physical link into several logical channels with private buffer resources
  – Use logical channels to separate the network into virtual networks, each dependency-cycle-free
  – Assign communication pairs (with their paths) to the virtual networks
• Back to our ring
  – Routing: shortest path
  – Virtual mapping: if a path uses the 0→4 or 4→0 link, map it to the red virtual network, else to the black one

InfiniBand 3D Torus
• InfiniBand drivers include subnet management for
  – Fat Tree: min hop, up/down, etc.
  – 3D Torus: Dimension Ordered Routing

Mixed Topologies
• Fat-tree topology provides the best performance solution
• 3D-Torus can be more cost effective, easier to scale, and a good fit for applications with locality
• Mixed topology
  – System connected as 3D Torus
  – Fast Fat-tree for collective operations
[Figure: 3x3 grid of nodes labeled 0,0 through 2,2]

Notes
• The following Fat-Tree network configurations assume
  – Flat network
  – No unused port
  – Two layers of switch fabric (L1 and L2)
• The following 3D Torus configurations assume
  – Each 3D Torus junction is a 36-port switch
  – Number of switches refers to 36-port switches
• InfiniBand is a great interconnect technology to enable flat connectivity of thousands and tens-of-thousands of servers in future Mega Warehouse Data Centers

Example: Non-blocking, Fat-Tree, 40Gb/s
• 648 L1 36-port switches, 18 servers per L1 switch
• 18 L2 648-port switches
• Total: 11664 servers (nodes)
• Non-blocking throughput: 40Gb/s to the node

Example: 2:1 Oversubscription, Fat-Tree, 40Gb/s
• 648 L1 36-port switches, 24 servers per L1 switch
• 12 L2 648-port switches
• Total: 15552 servers (nodes)
• Non-blocking throughput: 20Gb/s to the node

Example: 3:1 Oversubscription, Fat-Tree, 40Gb/s
• 648 L1 36-port switches, 27 servers per L1 switch
• 9 L2 648-port switches
• Total: 17496 servers (nodes)
• Non-blocking throughput: 13Gb/s to the node

Example: 8:1 Oversubscription, Fat-Tree, 40Gb/s
• 324 L1 36-port switches, 32 servers per L1 switch
• 2 L2 648-port switches
• Total: 10368 servers (nodes)
• Non-blocking throughput: 5Gb/s to the node

Example: 3D Torus
• 3D Torus switch junction
  – Six inter-switch directions (+x, −x, +y, −y, +z, −z), 120Gb/s each
  – 18 servers (nodes) per junction, 40Gb/s each
• 3D Torus size: 8x8x8 (512 36-port switches)
• Total number of servers: 9216

3D Torus Connections Example
[Figure: ION1 and Node 0 through Node 16 attached to a 36-port switch, with the remaining switch ports going to −z, +z, −y, +y, −x, +x]

Choosing the Right Topology
• Performance: Fat Tree
  – Application locality? 3D Torus can become an option
  – Multiple users/applications? Fat Tree
  – Non-blocking? Fat Tree
• Cost
  – Depends on the size of the system
  – Very large systems can be more cost effective with 3D Torus
• Future expansion? 3D Torus will be easier to expand

Network Offloading
• Transport offloads: critical for CPU efficiency
• Congestion avoidance: must be done in the network
• Applications offloading (MPI offloading)
  – For example: MPI Collectives Offloads
[Figure: chart, lower is better]
  – Software MPI: losing performance beyond 20% CPU computation availability
  – Collectives Offload based MPI: beyond 80% CPU computation availability without any performance loss!

Thank You
www.hpcadvisorycouncil.com
[email protected]
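The formal torus definition from the "d-Dimensional Torus Topology" slide can be checked with a few lines of Python. This is a minimal sketch, not code from the deck: the function names `torus_neighbors` and `torus_edges` are made up for illustration, and edges are treated as undirected pairs.

```python
from itertools import product


def torus_neighbors(v, sizes):
    """Neighbors of node v: change exactly one coordinate j by +/-1 mod N_j."""
    nbrs = set()
    for j, n in enumerate(sizes):
        for step in (1, -1):
            u = list(v)
            u[j] = (v[j] + step) % n
            u = tuple(u)
            if u != v:  # a dimension of size 1 has no real link
                nbrs.add(u)
    return nbrs


def torus_edges(sizes):
    """Edge set E of the torus, each edge stored once as an unordered pair."""
    nodes = product(*(range(n) for n in sizes))
    return {frozenset((v, u)) for v in nodes for u in torus_neighbors(v, sizes)}


# The two examples from the slide: a 5-node ring and a 3x3 torus.
ring = torus_edges((5,))
grid = torus_edges((3, 3))
print(len(ring), len(grid))
```

For the 8x8x8 torus used later in the deck, `torus_edges((8, 8, 8))` gives the switch-to-switch adjacencies; each adjacency may be built from several parallel physical links.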
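The ring deadlock and the separation approach can be illustrated by building the channel dependency graph for the five communication pairs and testing it for cycles. The sketch below is an assumption-laden illustration: the helper names are hypothetical, a "channel" is modeled as a (from, to, virtual network) triple, and a cycle in the dependency graph is taken as the standard indicator of possible deadlock in a lossless network.

```python
N = 5
PAIRS = [(0, 2), (1, 3), (2, 4), (3, 0), (4, 1)]  # pairs from the slide


def shortest_ring_path(src, dst, n):
    """Shortest path on an n-node ring as a node list (ties go clockwise)."""
    cw = (dst - src) % n
    step = 1 if cw <= n - cw else -1
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + step) % n)
    return path


def uses_wrap_link(path, n):
    """True if the path crosses the link between node 0 and node n-1."""
    return any({a, b} == {0, n - 1} for a, b in zip(path, path[1:]))


def channel_dependencies(paths, vnet_of):
    """Map each channel to the channels a packet may wait on next."""
    deps = {}
    for path in paths:
        vn = vnet_of(path)
        chans = [(a, b, vn) for a, b in zip(path, path[1:])]
        for c in chans:
            deps.setdefault(c, set())
        for c1, c2 in zip(chans, chans[1:]):
            deps[c1].add(c2)
    return deps


def has_cycle(deps):
    """Depth-first search with three colors to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in deps}

    def visit(c):
        color[c] = GREY
        for nxt in deps[c]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and visit(c) for c in deps)


paths = [shortest_ring_path(s, d, N) for s, d in PAIRS]

# One virtual network: every path shares the same buffers.
single_deps = channel_dependencies(paths, lambda p: "black")
# Slide's mapping: paths crossing the 0<->4 link go to the red network.
split_deps = channel_dependencies(
    paths, lambda p: "red" if uses_wrap_link(p, N) else "black"
)

print(has_cycle(single_deps), has_cycle(split_deps))
```

With a single virtual network the dependencies close into a loop around the ring; with the red/black mapping each virtual network's dependency graph is acyclic.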
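Dimension Ordered Routing, which the deck names as the 3D Torus routing in InfiniBand subnet management, can be sketched generically: correct the x coordinate first, then y, then z, taking the shorter way around each ring. This is a textbook-style sketch under that assumption, not the subnet manager's actual implementation, and `dor_path` is a hypothetical name. On a torus the wraparound links still form rings, so dimension order alone is normally combined with virtual channels, as in the separation approach.

```python
def dor_path(src, dst, sizes):
    """Dimension-ordered route on a torus, returned as a list of nodes."""
    cur = list(src)
    path = [tuple(cur)]
    for j, n in enumerate(sizes):
        forward = (dst[j] - cur[j]) % n
        # Take the shorter direction around the ring; ties go forward.
        step = 1 if forward <= n - forward else -1
        while cur[j] != dst[j]:
            cur[j] = (cur[j] + step) % n
            path.append(tuple(cur))
    return path


# Example on the deck's 8x8x8 torus.
route = dor_path((0, 0, 0), (1, 7, 4), (8, 8, 8))
print(route)
```

In this example the route takes one hop in +x, one hop in −y across the wraparound link, and four hops in +z.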
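The four fat-tree examples follow from simple port arithmetic. The sketch below reproduces the slide numbers under the assumptions stated in the Notes slide (two switch layers, no unused port) plus one inferred assumption: every L1 port not used by a server is an uplink to L2. The function name `fat_tree_size` and its parameters are illustrative.

```python
def fat_tree_size(l1_switches, servers_per_l1,
                  l1_ports=36, l2_ports=648, link_gbps=40):
    """Size a two-layer fat-tree where every L1 port is a server or an uplink."""
    uplinks = l1_ports - servers_per_l1
    total_uplinks = l1_switches * uplinks
    # Round up: each uplink needs one L2 port.
    l2_switches = (total_uplinks + l2_ports - 1) // l2_ports
    # Per-node throughput is limited by the uplink share when oversubscribed.
    per_node_gbps = link_gbps * min(1.0, uplinks / servers_per_l1)
    return {
        "servers": l1_switches * servers_per_l1,
        "l2_switches": l2_switches,
        "oversubscription": servers_per_l1 / uplinks,
        "per_node_gbps": per_node_gbps,
    }


# (L1 switches, servers per L1 switch) for the four slide examples.
for l1, servers in [(648, 18), (648, 24), (648, 27), (324, 32)]:
    print(l1, servers, fat_tree_size(l1, servers))
```

The 3:1 case computes to about 13.3Gb/s per node, which the slide rounds to 13Gb/s.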
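The 3D Torus example can be checked the same way. The sketch assumes the 18 switch ports not used by servers are split evenly over the six directions; the deck does not state that split explicitly, but it is consistent with the 120Gb/s per direction shown on the slide. The names `torus_junction` and `torus_servers` are hypothetical.

```python
from math import prod


def torus_junction(switch_ports=36, servers=18, link_gbps=40, dims=3):
    """Port budget of one torus junction switch."""
    directions = 2 * dims  # +x, -x, +y, -y, +z, -z
    links_per_direction, spare = divmod(switch_ports - servers, directions)
    return {
        "links_per_direction": links_per_direction,
        "gbps_per_direction": links_per_direction * link_gbps,
        "spare_ports": spare,
    }


def torus_servers(sizes, servers_per_switch=18):
    """Switch and server counts for a torus with one switch per junction."""
    switches = prod(sizes)
    return switches, switches * servers_per_switch


print(torus_junction())
print(torus_servers((8, 8, 8)))
```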
