VXLAN Design and Deployment
Total Page:16
File Type:pdf, Size:1020Kb
VXLAN Design and Deployment Marian Klas, Systems Engineer [email protected] Agenda • Why VXLAN? • VXLAN Fundamentals • Underlay Deployment Considerations • Overlay Deployment Considerations • Summary and Conclusion 2 Trend: Flexible Data Center Fabrics Create Virtual Networks on top of an efficient IP network Mobility Segmentation + Policy Scale Automated & Programmable V V Physical M M Full Cross Sectional BW Hosts O O S S Virtual L2 + L3 Connectivity Use VXLAN to Create DC Fabrics Physical + Virtual 3 VXLAN Fundamentals 4 Why Overlays? Seek well integrated best in class Overlays and Underlays Robust Underlay/Fabric Flexible Overlay Virtual Network • High Capacity Resilient Fabric • Mobility – Track end-point attach at edges • Intelligent Packet Handling • Segmentation • Programmable & Manageable • Scale – Reduce core state • Distribute and partition state to network edge • Flexibility/Programmability • Reduced number of touch points 5 Overlay Taxonomy Overlay Control Plane Service = Virtual Network Instance (VNI) VTEPs Identifier = VN Identifier (VNID) NVE = Network Virtualization Edge Encapsulation VTEP = VXLAN Tunnel End-Point Edge Devices (NVE) Edge Device (NVE) Hosts Underlay Network (end-points) Underlay Control Plane 6 VXLAN is an Overlay Encapsulation Data Plane Learning Protocol Learning Flood and Learn over a multidestination Advertise hosts in a protocol distribution tree joined by all edge devices amongst edge devices Overlay Control Plane Encapsulation VXLAN t 7 VXLAN Packet Structure Ethernet in IP with a shim for scalable segmentation Outer MAC Header Outer IP Header Outer UDP Header VXLAN Header Original Layer 2 Frame FCS 14 Bytes (4 Bytes Optional) 20 Bytes 8 Bytes 8 Bytes Ethernet Payload VNI Tag Port Source 0x8100 Header 0x0000 0x0800 Address Dest. IP Address Protocol VLAN ID Reserved Reserved RRRRIRRR Source IP Src. MAC Checksum Checksum IP Header 0x11 (UDP) VLAN Type Dest. MAC Misc. Data Ether Type VXLAN Port UDP Length VXLAN Flags 48 48 16 16 16 72 8 16 32 32 16 16 16 16 8 24 24 8 Src VTEP MAC Address Src and Dst addresses of Large scale the VTEPs Allows for 16M segmentation UDP 4789 possible segments Next-Hop MAC Address Hash of the inner L2/L3/L4 headers of the original frame. Enables entropy for ECMP Load Tunnel Entropy 50 (54) Bytes of overhead balancing in the Network. 8 Data Plane Learning Dedicated Multicast Distribution Tree per VNI PIM Join for Multicast PIM Join for Multicast Group 239.1.1.1 Group 239.2.2.2 V V V V V Web DB DB Web VM VM VM VM 9 Data Plane Learning Dedicated Multicast Distribution Tree per VNI V V V V V VM1 VM2 VM3 10 Data Plane Learning Learning on Broadcast Source - ARP Request Example ARP Req IP A è g ARP Req IP A è g V V V V V ARP Req VM1 VM2 VM3 MAC IP Addr MAC IP Addr VM 1 VTEP 1 VM 1 VTEP 1 ARP Req ARP Req 11 Data Plane Learning Learning on Unicast Source - ARP Response Example ARP Resp V V V V V ARP Resp VTEP 2 è VTEP 1 VM1 VM2 VM3 MAC IP Addr MAC IP Addr VM 2 VTEP 2 VM 1 VTEP 1 12 ARP Resp VXLAN Evolution • Head-end replication enables unicast-only mode Multicast Independent • Control Plane provides dynamic VTEP discovery Protocol Learning • Workload MAC addresses learnt by VXLAN NVEs • Advertise L2/L3 address-to-VTEP association prevents floods information in a protocol • VXLAN HW gateways to other encaps/networks External Connectivity • VXLAN HW gateway redundancy • Enable hybrid overlays • VXLAN Routing IP Services • Distributed IP gateways 14 VXLAN Evolution: Using a Control Protocol VTEP Discovery BgP consolidates and 2 propagates VTEP list for VNI BGP Route Reflector VTEPs advertise their VNI membership in BgP 1 1 1 IP A IP C IP B TOR 1 V V V V V Overlay Neighbors TOR 3 , IP C TOR 3 TOR 2 TOR 2 , IP B VTEP can perform 3 4 VTEP obtains list of Head-End Replication VTEP neighbors for each VNI 15 VXLAN Unicast Mode Head-end replication *Broadcast, Unknown Unicast or Multicast 5 Frames are unicast to the neighbors 4 VXLAN Encap BUM Frame IP A è IP B IP A IP C IP B Overlay Neighbors BUM Frame IP A è IP C POD3 , IP C TOR 1 V V V V V POD2 , IP B 3 VTEP performs Head- 2 End Replication TOR 3 TOR 2 VTEP retrieves the list of Overlay Neighbors** 1 BUM Frame **Information statically configured or dynamically retrieved via control plane A host sends a L2 BUM* frame (VTEP discovery) BGP EVPN Control Plane for VXLAN Host and Subnet Route Distribution Route-Reflectors deployed for scaling purposes RR RR iBgP Adjacencies V V V V V § Host Route Distribution decoupled from the Underlay protocol § Use MP-BGP on the leaf nodes to distribute internal host/subnet routes and external reachability information 17 BGP EVPN Control Plane MAC IP VNI Next- Encap Seq Hop Host Advertisement 1 1 5000 IP L1 VXLAN 0 MAC L1 NLRI: Host MAC1, IP1 NVE IP L1/MAC L1 RR RR VNI 5000 Ext.Community: Encapsulation: VXLAN, NVgRE Sequence 0 V V V V V VNI 5000 Host 1 VLAN 10 1. Host Attaches 2. Attachment NVE advertises host’s MAC (+IP) through BgP RR 3. Choice of encapsulation is also advertised 18 BGP EVPN Control Plane MACMAC IPIP VNIVNI NextNext-- EncapEncap SeqSeq HopHop Host Moves 11 11 50005000 IPIP L1L3 VXLANVXLAN 01 MACMAC L1L3 NLRI: Host MAC1, IP1 NVE IP L3/MAC L3 RR RR VNI 5000 Ext.Community: Encapsulation: VXLAN, NVgRE Sequence 1 V V V V V VNI 5000 Host 1 VLAN 10 1. Host Moves to NVE3 2. NVE3 detects Host1 and advertises H1 with seq#1 19 3. NVE1 sees more recent route and withdraws its advertisement VXLAN L2 and L3 gateways Connecting VXLAN to the broader network Egress interface chosen (bridge L2 Gateway: VXLAN to VLAN may .1Q tag the packet) VXLANORANGE VXLAN L2 Bridging Ingress VXLAN packet on Gateway Orange segment Destination is in another segment. Ingress VXLAN packet on Packet is routed to the new segment L3 gateway: VXLAN to X Routing Orange segment SVI • VXLAN VXLANORANGE VXLANBLUE VLAN100 VXLAN VLAN200 • VLAN Router Egress interface chosen (bridge may .1Q tag the packet) N7K w/F3 ASR1K/ASR Nexus 3K N5600 N9K LC 9K L2 gateway Yes Yes Yes Yes Yes L3 gateway Yes No Yes Yes Yes VXLAN BgP EVPN Nexus Programmable Fabric Support Platform Spine Leaf Function Border Leaf DCNM Fabric Function Function Management Nexus 5600 Yes Yes Yes Yes 7.3(0)N1(1) 7.3(0)N1(1) 7.3(0)N1(1) 7.2(3) Nexus Yes Yes Yes Yes 7000/7700 7.3(0)D1(1) 7.3(0)D1(1) 7.3(0)D1(1) 7.2(3) (F3 Only) Nexus 9300 Yes Yes Yes Yes 7.0(3)I1(3) 7.0(3)I1(3) 7.0(3)I1(3) 7.2(3) Nexus 9500 Yes Yes Yes Yes 7.0(3)I1(3) 7.0(3)I1(3) 7.0(3)I1(3) 7.2(3) 21 Underlay Deployment Considerations 22 Leaf/Spine Topology – Underlay Considerations • High Bi-Sectional Bandwidth • Wide ECMP: Unicast or Multicast • Uniform Reachability, ECMP Links Deterministic Latency (Layer-3) • High Redundancy: Node/Link Failure • Line rate, low latency, for all traffic Deployment Considerations Underlay • MTU and Overlays • Unicast Routing Protocol and IP Addressing • Multicast for BUM* Traffic Replication 24 MTU and VXLAN Underlay • VXLAN adds 50 Bytes to the Outer MAC Header Original Ethernet Frame • Avoid Fragmentation by adjusting Outer IP Header Underlay the IP Networks MTU UDP Header • Data Centers often require Jumbo MTU; most Server NIC do support VXLAN Header up to 9000 Bytes • Using a MTU of 9216* Bytes 50 (54) Bytes of Overhead accommodates VXLAN Overhead Original Layer-2 Frame Overlay plus Server max. MTU *Cisco Nexus 5600/6000 switches only support 9192 Byte for Layer-3 Traffic 25 Building your IP Network – Interface Principles Underlay • Know your IP addressing and IP scale requirements • Best to use single Aggregate for all Underlay Links and Loopbacks • IPv4 only • For each Point-2-Point (P2P) 2 connection, minimum /31 required V1 V • Loopback requires /32 • Routed Ports/Interfaces • Layer-3 Interfaces between Spine and Leaf (no switchport) V3 • VTEP uses Loopback as Source- Interface 26 Building your IP Network – Routing Protocols; OSPF Underlay • OSPF – watch your Network type • Network Type Point-2-Point (P2P) • Preferred (only LSA type-1) • No DR/BDR election • Suits well for routed interfaces/ports (optimal from a LSA Database perspective) 2 • Full SPF calculation on Link Change V1 V • Network Type Broadcast • Suboptimal from a LSA Database perspective (LSA type-1 & 2) • DR/BDR election • Additional election and Database 3 Overhead V 29 Building your IP Network – Routing Protocols; IS-IS Underlay • IS-IS – what was this CLNS? - Independent of IP (CLNS) - Well suited for routed interfaces/ports - No SPF calculation on Link change; only if Topology changes - Fast Re-convergence 2 1 V - Not everyone is familiar with it V V3 31 Building your IP Network – Routing Protocols; iBGP Underlay • iBgP + IgP = The Routing Protocol Combo • IgP for underlay topology & reachability (e.g. IS-IS, OSPF) • iBgP for VTEP (loopback) reachability • iBGP route-reflector for simplification and scale RR RR RR RR • Requires two routing protocols iBGP • Separates Links (IgP) from VTEPs 2 1 V (iBGP) V • End-Host information are still in iBgP but different address-family V3 33 Multicast Enabled Underlay • May use PIM-ASM or PIM-BiDir (Different hardware has different capabilities) Nexus 7K Nexus 9K CSR 1000V N1KV Nexus 3K Nexus 5K/6K ASR9K with F3 LC Standalone ASR1K PIM-ASM PIM-ASM PIM-ASM & Bidir-PIM PIM-ASM & Mcast mode IgMP v2/v3 (Bidir – Bidir-PIM (Bidir – Bidir-PIM (ASM –Future) Bidir-PIM Future) Future) • Cisco Nexus 9000 Series Switches (supports ASM/SSM/Ingress-Replication for VXLAN VTEP support) and Cisco Nexus 5600 Series Switches (supports only PIM BIDIR for VXLAN VTEP support) can be part of the same VXLAN EVPN fabric but not share the same Layer-2 VNI.