<<

Exascale HPC Fabric

Ashrut Ambastha Sr. Stuff of Solution Architect Mar, 2019

© 2018 Mellanox Technologies 1 Supporting Variety of

Fat Tree Hypercube Torus Dragonfly

© 2018 Mellanox Technologies 2 Topologies – Fat Tree Example

Full Fat Tree / Full CBB

Half Fat Tree / Half CBB

© 2018 Mellanox Technologies 3 Quantum HDR100 – Fat Tree Example

400-Node 100G InfiniBand Platform 800-Node 200G InfiniBand Platform

1 2 20 1 2 5

1 2 39 40 1 2 9 10

15 Switches 60 Switches

© 2018 Mellanox Technologies 4 Torus Topology

© 2018 Mellanox Technologies 5 Mesh or Torus

▪Mesh or 3D Torus ▪ Mesh – each node is connected to 4 other nodes: positive and negative X and Y axis ▪ 3D mesh – Each node is connected to 6 other nodes: positive and negative X, Y and Z axis ▪ 2D/3D torus – The ends of the 2D/3D meshes are connected ▪ Cost-effective for systems at scale ▪ Great performance solutions for applications with locality ▪ Support for dedicated sub-networks ▪ Simple expansion for future growth

3 X 3 2D Torus

© 2018 Mellanox Technologies 6 3D-Torus Example: 12K Nodes, 14x14x13

+ z ▪ 3D Torus configuration + y ▪ Cube links – 5 x 40Gb/s per direction (total of 30 links) 5 links ▪ 5 nodes per junction ▪ Ratio node bandwidth to network bandwidth 1:5 5 links ▪ 5 network links per node link ▪ A storage connection available in every junction 5 links 36 Port QDR 5 links + x - x Switch ▪ Can be used for server node ▪ System configuration ▪ 12,000 server nodes S1 - x N5 5 links - y N4 Legend: 5 links N3 S – connection to storage N1 N2 - z N – connection to node (server)

© 2018 Mellanox Technologies 7 Example 3D Torus – SDSC Gordon

+ z + y 120Gb/s

120Gb/s

120Gb/s 120Gb/s - x + x

40Gb/s each

120Gb/s - y 120Gb/s - z 16 Compute nodes 2 IO nodes

© 2018 Mellanox Technologies 8 Dragonfly+ Topology

© 2018 Mellanox Technologies 9 What is Dragonfly Topology?

▪ Dragonfly is a hierarchical topology with the following properties:

▪ Several “groups”, connected together using all to all links

▪ The topology inside each group can be any topology

▪ Focus on reducing the of long links and network diameter to reduce total cost of network

▪ Requires Adaptive Routing to enable efficient operation

Full-Graph connecting L every group to all B other groups B B B

Group 1 Group 2 Group G

1 2 H H+1 H+2 2H GH

© 2018 Mellanox Technologies 10 There are Different Dragonfly Topologies Options

1 m

b

1 2 3 r

1D = Full Graph 2D = GHC(2,m,r) DF+ = Full Bipartite

▪Dragonfly Flavors Differentiate on The “Intra-Group” Topology

© 2018 Mellanox Technologies 11 Traditional Dragonfly vs Dragonfly+

Dragonfly+ 1 s

1 2 3 l

1 s 1 s

1 2 3 l 1 2 3 l

1 s

1 2 3 l

© 2018 Mellanox Technologies 12 Dragonfly+ Topology

Full-Graph connecting ▪ Several “groups”, connected using all to all links L every group to all ▪ The topology inside each group can be any topology B other groups B B ▪ Reduce total cost of network (fewer long cables) B ▪ Utilizes Adaptive Routing to for efficient operations Group 1 Group 2 Group G ▪ Simplifies future system expansion

1 2 H H+1 H+2 2H GH

1200-Nodes Dragonfly+ Systems Example

1.1 1.2 1.20 2.1 2.2 2.20 3.1 3.2 3.20 G1 G2 G3

1 2 20 1 2 20 1 2 20

HCA HCA HCA HCA HCA HCA HCA HCA HCA x 20 x 20 x 20 x 20 x 20 x 20 x 20 x 20 x 20

© 2018 Mellanox Technologies 13 1200 Ports Dragonfly+ Topology Example

▪ 3 groups, each supports 400 hosts in fat tree topology

▪ Each group contains 20 leaf switches (400 hosts total) and 20 spine switches

▪ Total 1200 AOCs

1.1

1.2

1.2

0

1

2

0

11.

11. 11.2

2.2 2.2 2.1 0

1 2 20 1 2 20 1 2 20 20 20 20 HCA HCA HCA x 20 x 20 x 20

© 2018 Mellanox Technologies 14 Future Expansion of Dragonfly+ Based System

▪ Topology expansion of a Fat Tree, or a regular/Aries like Dragonfly requires one of the following ▪ Reduction of early phase bisection bandwidth due to reservation of ports on the network switches ▪ Re-cabling the long cables ▪ Dragonfly+ is the only topology that allows system expansion at zero cost ▪ While maintaining bisection bandwidth ▪ No port reservation

▪ No re-cabling

1.20

1.1

1.2

21.20

21.1 21.2

2.1 2.2 2.20 Phase 1: 11x400 = 4400 hosts 1 2 11 1 2 20 1 2 20 1 2 20 20 20 20 20 20 20 HCA HCA HCA x 20 x 20 x 20

© 2018 Mellanox Technologies 15

Future Expansion of Dragonfly+ Based System

x 20 x x 20 x

HCA HCA

20

20 20

Phase 2: 20

20 2 1 20 2

+10x400 = 1

12 1

8400 hosts 2

12.20 21.20 12.1 21.1 12.2 21.2

Re-cable the central racks,

1.20 1.1

a change local to the RACK 1.2

21.1

21.2 21.20

2.2 2.20 2.1 Phase 1: 11x400 = 4400 hosts 1 2 11 1 2 20 1 2 20 1 2 20 20 20 20 20 20 20 HCA HCA HCA x 20 x 20 x 20

© 2018 Mellanox Technologies 16 Dragonfly+ Simplifies Scale Deployment and Cost

▪ Reduced fabric cost compared to Fat Tree when scaling beyond 3-level Fat Tree ▪ For all traffic patterns Dragonfly with 2:1 bisection provides close performance to Fat Tree of 1:1 ▪ Cost saving in Switches/Host: ▪ 2:1 Dragonfly+ = 4/K ▪ 1:1 3 level Fat-Tree= 5/K ▪ Number of long (optical cables) is the same

▪ Dragonfly+ enables cost effective large scale networks

© 2018 Mellanox Technologies 17 Dragonfly+ Accelerates Fastest Supercomputer in Canada Large Scale Innovative Highest Performance at Lower TCO Dragonfly+ Topology

InfiniBand Leadership In-Network Computing with SHARP Performance and Tag Matching offloads

Advanced Adaptive Routing Highest Network Utilization

© 2018 Mellanox Technologies 18 Mellanox Accelerates Leading HPC and AI Systems (Examples)

27 34 59 NASA Ames Research Center World’s leading Industry Supercomputer Fastest Supercomputer in Canada 20K InfiniBand Nodes 4.6K InfiniBand Nodes Dragonfly+ Topology 1.5K InfiniBand Nodes

© 2018 Mellanox Technologies 19 Thank You

© 2018 Mellanox Technologies 20