Ethernet Storage Network Designs and Integration

Greg McClellan – Network Consulting Engineer
BRKDCN-1902

Agenda

• Introduction

• SCSI and Fibre Channel

• Data Centre Bridging

• FCoE, iSCSI and RDMA

• SAN Designs – moving toward unified infrastructure

• Conclusion

Block storage protocols: built on SCSI

• SCSI is a block-transfer protocol that enables data transfer between various independent peripheral devices and computers.

• SCSI connects disks in a storage array and the tape drives of a tape library to the servers.

• Fibre Channel, Fibre Channel over Ethernet (FCoE), and iSCSI are the most widely deployed block-storage protocols.

BRKDCN-1902 © 2017 Cisco and/or its affiliates. All rights reserved. Cisco Public

SCSI basics

SCSI-3 Stack

Fibre Channel Stack

Fibre Channel Frame

Block Protocol Comparisons

Data Center Bridging

• IEEE 802.1 Data Center Bridging (DCB) is a collection of standards-based extensions to classical Ethernet. It provides a lossless data center transport layer that helps enable the convergence of LANs and SANs onto a single unified fabric.

• In addition to supporting Fibre Channel over Ethernet (FCoE), DCB can enhance the operation of iSCSI, network-attached storage (NAS), and other business-critical storage traffic.

Why do we need DCB?

• Block storage networks have different requirements than IP networks

• Low latency requirements for I/O

• No-drop (lossless) requirement for SCSI transactions

• Different flow control mechanisms (buffer credits vs. pause)

• In-order delivery for SCSI transactions (load balancing)

DCB Framework

The DCB framework defines the capabilities for switches and endpoints to be part of a data center fabric. It includes:

• Priority-based flow control (PFC, IEEE 802.1Qbb)

• Enhanced transmission selection (ETS, IEEE 802.1Qaz)

• Congestion notification (IEEE 802.1Qau)

• Extensions to the Link Layer Discovery Protocol standard (IEEE 802.1AB) that support Data Center Bridging Capability Exchange Protocol (DCBX)

Priority Based Flow Control

• Provides the capability to manage a bursty, single traffic source on a multiprotocol link

• Large bursts from one traffic type will not affect other traffic types, large queues of traffic from one traffic type will not starve other traffic types' resources, and optimization for one traffic type will not create high latency for small messages of other traffic types.

• PFC is an enhancement to the Ethernet pause mechanism. PFC pauses traffic per user priority or class of service instead of pausing the entire link.

PFC creates eight separate virtual links on the physical link and allows any of these links to be paused and restarted independently.
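The per-priority pause behavior can be sketched as a toy model. This is only an illustration, not switch code: the `PfcLink` class and its methods are hypothetical names, and mapping FCoE to priority 3 is an assumption made for the example.

```python
from dataclasses import dataclass, field

NUM_PRIORITIES = 8  # one virtual link per IEEE 802.1p priority (0-7)


@dataclass
class PfcLink:
    """Toy model: one physical link, eight independently pausable priorities."""
    paused: list = field(default_factory=lambda: [False] * NUM_PRIORITIES)
    queues: list = field(default_factory=lambda: [[] for _ in range(NUM_PRIORITIES)])

    def pause(self, priority: int) -> None:
        self.paused[priority] = True

    def resume(self, priority: int) -> None:
        self.paused[priority] = False

    def enqueue(self, priority: int, frame: str) -> None:
        self.queues[priority].append(frame)

    def transmit(self) -> list:
        """Send queued frames of every unpaused priority; paused frames stay queued."""
        sent = []
        for prio in range(NUM_PRIORITIES):
            if not self.paused[prio]:
                sent.extend(self.queues[prio])
                self.queues[prio] = []
        return sent


link = PfcLink()
link.enqueue(3, "fcoe-1")  # assumption: storage traffic on priority 3
link.enqueue(0, "lan-1")   # assumption: LAN traffic on priority 0
link.pause(3)              # receiver asks the sender to pause priority 3 only
first = link.transmit()    # LAN traffic keeps flowing
link.resume(3)
second = link.transmit()   # the held storage frame is sent, not dropped
```

Pausing priority 3 holds the storage frame in its queue while the LAN frame is still transmitted, which is the isolation the eight virtual links are meant to provide.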

Enhanced Transmission Selection (ETS)

• Enables bandwidth management between traffic types for multiprotocol links

• Assigns traffic to a virtual lane, using IEEE 802.1p class of service (CoS) values to identify the lane each frame belongs to
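A rough sketch of ETS-style bandwidth sharing follows. It assumes each traffic class has a guaranteed percentage of the link and that bandwidth a class leaves unused is lent to classes that want more, in proportion to their guarantees. The function name, class names, and numbers are hypothetical; real schedulers work per frame, not on averages.

```python
def ets_allocate(link_gbps, guarantees, offered):
    """Share a link among classes: guaranteed minimum first, then lend spare capacity.

    guarantees: class name -> guaranteed percent of the link
    offered:    class name -> offered load in Gbps
    """
    # Step 1: each class gets its offered load, capped at its guaranteed share.
    alloc = {c: min(offered[c], link_gbps * guarantees[c] / 100) for c in guarantees}
    spare = link_gbps - sum(alloc.values())

    # Step 2: hand spare bandwidth to classes with unmet demand, weighted by guarantee.
    while spare > 1e-9:
        hungry = [c for c in guarantees if offered[c] - alloc[c] > 1e-9]
        weight = sum(guarantees[c] for c in hungry)
        if not hungry or weight == 0:
            break
        given = 0.0
        for c in hungry:
            extra = min(spare * guarantees[c] / weight, offered[c] - alloc[c])
            alloc[c] += extra
            given += extra
        spare -= given
    return alloc


# Hypothetical 10G link with three classes.
shares = ets_allocate(
    10,
    guarantees={"san": 50, "lan": 30, "ipc": 20},
    offered={"san": 3, "lan": 9, "ipc": 2},
)
```

In this example the SAN class offers only 3G of its 5G guarantee, so the LAN class can borrow the unused 2G and reach 5G, while every class still receives at least its guarantee whenever it asks for it.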

ETS Bandwidth Consumption Example

Delayed Drop

• PFC works very well for protocols like FCoE that require a reliable medium, but it makes short-lived congestion and persistent congestion indistinguishable

• Delayed drop mediates between traditional Ethernet behavior and PFC behavior. With delayed drop, a CoS can be flow controlled and the duration of the congestion monitored, so that the traditional drop behavior follows if the congestion is not resolved.

• Delayed drop offers the capability to tune the definition of "short-lived congestion" with PFC, hence removing the need to increase physical buffers on the interfaces.
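The delayed drop decision can be sketched as a simple threshold check. This is a minimal illustration: the function name, the return labels, and the threshold value are hypothetical, and real platforms expose this as a per-CoS timer setting.

```python
def delayed_drop_action(congested_for_ms: float, threshold_ms: float) -> str:
    """Pick the behavior for one CoS given how long it has been congested."""
    if congested_for_ms <= 0:
        return "forward"  # no congestion: normal forwarding
    if congested_for_ms < threshold_ms:
        return "pause"    # short-lived congestion: flow control, as with PFC
    return "drop"         # persistent congestion: fall back to classical Ethernet drop


# Hypothetical tuning: congestion lasting under 5 ms counts as "short-lived".
THRESHOLD_MS = 5.0
short_burst = delayed_drop_action(2.0, THRESHOLD_MS)
long_stall = delayed_drop_action(40.0, THRESHOLD_MS)
```

Raising or lowering the threshold is what "tuning the definition of short-lived congestion" amounts to in this sketch.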

Congestion Notification

• Pushes congestion to the edge switches with a proxy buffer

• Instead of being dropped, packets are marked as having experienced congestion (in the ECN bits of the IP header, next to the DSCP field) on their way to the receiving host

• Transmission rate is decreased from the edge switch

Explicit Congestion Notification

Data Center Bridging Exchange (DCBX) Protocol

• DCBX is used to exchange all the DCB features across the devices and maintain consistency. DCBX helps ensure consistent quality-of-service (QoS) parameters across the network and servers. The features are advertised to the servers in type-length-value (TLV) format using the Link Layer Discovery Protocol (LLDP).

• IEEE DCB builds on classical Ethernet's strengths and adds several crucial extensions to provide a unified fabric.
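Since DCBX advertises its features as LLDP TLVs, a short encoding sketch may help. The LLDP TLV header (7-bit type, 9-bit length), the organizationally specific type 127, and the IEEE 802.1 OUI 00-80-C2 are standard. The subtype value 0x0B and the two-byte payload layout used below are assumptions about the PFC configuration TLV for illustration only and should be checked against the standard before use.

```python
import struct

ORG_SPECIFIC_TLV_TYPE = 127               # LLDP "organizationally specific" TLV
IEEE_8021_OUI = bytes.fromhex("0080c2")   # IEEE 802.1 organizationally unique identifier


def lldp_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one LLDP TLV: 7-bit type and 9-bit length, followed by the value."""
    if not 0 <= tlv_type <= 127:
        raise ValueError("TLV type must fit in 7 bits")
    if len(value) > 511:
        raise ValueError("TLV value must fit in a 9-bit length")
    header = (tlv_type << 9) | len(value)
    return struct.pack("!H", header) + value


def org_specific_tlv(oui: bytes, subtype: int, info: bytes) -> bytes:
    """Wrap an OUI, a subtype byte, and a payload in a type-127 TLV."""
    return lldp_tlv(ORG_SPECIFIC_TLV_TYPE, oui + bytes([subtype]) + info)


# Assumed for illustration: subtype 0x0B, then a capability byte and a
# per-priority enable bitmap (bit 3 set means PFC enabled on priority 3).
ASSUMED_PFC_SUBTYPE = 0x0B
capability_byte = 0x08
pfc_enable_bitmap = 1 << 3
tlv = org_specific_tlv(
    IEEE_8021_OUI, ASSUMED_PFC_SUBTYPE, bytes([capability_byte, pfc_enable_bitmap])
)
```

Both link partners exchange TLVs like this one, which is how mismatched QoS settings between a switch and a server adapter can be detected or corrected.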

Logical Link Down

• Many features and error conditions in both Ethernet and, particularly, FC count on flapping the link between the initiator and the first-hop switch to make the initiator resend FLOGI. This is a common recovery mechanism in FC.

• Now that both Ethernet and FC are on the same wire, if the link flaps due to an FC error condition, the Ethernet side is also brought down, and vice versa.

• As a result, we need a protocol message that the NIC and switch can use to flap just the “logical” FC link or the “logical” Ethernet link. Using this new message, the FC side can recover and resend FLOGI and the Ethernet side is never disturbed; the same is true in the opposite direction.

FCoE encapsulation

FCoE frame payload

FCoE benefit

• Integrate with existing FC network

• Existing staff skill

• Single wire

• Easy to implement with default settings once “feature fcoe” is enabled

FCoE considerations

• Specialized host adapters

• Careful consideration of application requirements and existing network resources

• Limited distance capability

• Difficult cost justification in the core layer for conversion

iSCSI payload

iSCSI packets

iSCSI benefit

• Existing infrastructure

• Existing staff skill

• Single wire

• No special interface*

• Easy to implement

iSCSI considerations

• Software based initiators consume a considerable amount of CPU resource

• TCP and IP encapsulation add overhead

• Can use TOE or iSCSI HBA but that limits the cost benefit of standard hardware

• Limited scale for iSCSI gateways to FC

• PDU segmentation

• Jumbo frames

• TCP Windowing
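The encapsulation overhead and the effect of jumbo frames can be estimated with simple arithmetic. The sketch below assumes untagged Ethernet, IPv4 and TCP headers without options, and ignores the iSCSI PDU header; the function name is hypothetical.

```python
# Per-frame bytes on the wire beyond the IP packet:
# preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
ETH_WIRE_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 header (20) + TCP header (20), no options assumed.
IP_TCP_HEADERS = 20 + 20


def tcp_payload_efficiency(mtu: int) -> float:
    """Fraction of wire bandwidth left for TCP payload at a given IP MTU."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_WIRE_OVERHEAD)


standard = tcp_payload_efficiency(1500)  # 1460 / 1538
jumbo = tcp_payload_efficiency(9000)     # 8960 / 9038
```

Under these assumptions a 1500-byte MTU leaves roughly 95% of the wire rate for payload and a 9000-byte MTU roughly 99%. Jumbo frames also cut the number of frames per PDU, which matters for software initiators that spend CPU per packet.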

RDMA vs. Sockets

RoCE Packet format

RoCE benefit

• Very low latency due to kernel bypass

• Frees host compute cycles for application processing instead of data transfer

RoCE considerations

• Specialized host adapters

• Limited application awareness

Ethernet Storage Network Considerations

• Know your application workload requirements

• Know your network capabilities

• CoS

• DCB – PFC, ETS, Congestion Notification

• Network design to limit oversubscription

• Ongoing verification of configuration settings (workloads change)

Network vs. Fabric

Classical Ethernet

• Ethernet/IP

  • Goal: Provide any-to-any connectivity

  • Unaware of packet loss (“lossy”); relies on ULPs for retransmission and windowing

  • Provides the transport without worrying about the services; services are provided by upper layers

  • East-west vs north-south traffic ratios are undefined

• Network design has been optimized for:

  • High Availability from a transport perspective by connecting nodes in mesh architectures

  • Service HA is implemented separately

  • Takes into account control protocol interaction (STP, OSPF, EIGRP, L2/L3 boundary, etc.)

[Figure: mesh of switches with undefined endpoints. Fabric Topology and Traffic Flows Are Highly Flexible. Client/Server Relationships Are Not Pre-Defined.]

Network vs. Fabric

LAN Design – Access/Aggregation/Core

• Servers typically dual homed to two or more access switches

• LAN switches have redundant connections to the next layer

• Distribution and Core can be collapsed into a single box

• L2/L3 boundary typically deployed in the aggregation layer

  • Spanning tree or advanced L2 technologies (vPC) used to prevent loops within the L2 boundary

  • L3 routes are summarized to the core

• Services deployed in the L2/L3 boundary of the network (load-balancing, firewall, NAM, etc.)

[Figure: Outside Data Center “Cloud”, Core, Aggregation (L3/L2 boundary), Access, with Virtual Port-Channel (vPC) and STP between layers.]

Network vs. Fabric

Classical Fibre Channel

• Fibre Channel SAN

  • Transport and Services are on the same layer in the same devices

  • Well defined end device relationships (initiators and targets)

  • Does not tolerate packet drop; requires lossless transport

  • Only north-south traffic, east-west traffic mostly irrelevant

• Network designs optimized for Scale and Availability

  • High availability of network services provided through dual fabric architecture

  • Edge/Core vs Edge/Core/Edge

  • Service deployment

[Figure: targets T0 to T2 and initiators I0 to I5 attached to switches running FSPF, Zone, RSCN and DNS services. Fabric Topology, Services and Traffic Flows Are Structured. Client/Server Relationships Are Pre-Defined.]

Network and Fabric

SAN Design – Two ‘or’ Three Tier Topology

• “Edge-Core” or “Edge-Core-Edge” Topology

• Servers connect to the edge switches

• Storage devices connect to one or more core switches

• HA achieved in two physically separate, but identical, redundant SAN fabrics

• Very low oversubscription in the fabric (1:1 to 12:1)

• Example: 10:1 O/S ratio

  • 60 Servers with 4 Gb HBAs (240 G)

  • 3 x 8G ISL ports (24 G)

  • 6 x 4G Array ports (24 G)

[Figure: FC Fabric ‘A’ and Fabric ‘B’, each with Core and Edge switches, servers attached through HBAs.]
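The 10:1 figure in the example above is simple arithmetic on port counts and speeds. A minimal sketch, with a hypothetical helper name:

```python
def oversubscription(host_ports: int, host_gbps: float,
                     uplink_ports: int, uplink_gbps: float) -> float:
    """Ratio of total host-facing bandwidth to total uplink bandwidth."""
    return (host_ports * host_gbps) / (uplink_ports * uplink_gbps)


# Figures from the slide: 60 servers with 4 Gb HBAs, 3 x 8G ISLs, 6 x 4G array ports.
edge_to_core = oversubscription(60, 4, 3, 8)    # 240 G of hosts over 24 G of ISLs
hosts_to_array = oversubscription(60, 4, 6, 4)  # 240 G of hosts over 24 G of array ports
```

Both ratios come out at 10:1, inside the 1:1 to 12:1 range quoted for SAN fabrics.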

Traditional Data Center Design

Ethernet LAN and Fibre Channel SAN

• Physical and Logical separation of LAN and SAN traffic

• Additional Physical and Logical separation of SAN fabrics

[Figure: legend Ethernet, FC. LAN on Nexus 7000 and Nexus 5000 with L3/L2 boundary; SAN Fabric ‘A’ and Fabric ‘B’ on MDS 9000; servers with separate NIC and HBA. Scale from Isolation to Convergence.]

Data Center Design with E-SAN

Ethernet LAN and Ethernet SAN

• Same topologies as existing networks, but using Nexus Unified Fabric Ethernet switches for SANs

• Physical and Logical separation of LAN and SAN traffic

• Additional Physical and Logical separation of SAN fabrics

• Ethernet SAN Fabric carries FC/FCoE & IP based storage (iSCSI, NAS, …)

• Common components: Ethernet Capacity and Cost

[Figure: legend Ethernet, FC, FCoE. LAN and SAN Fabric ‘A’ and Fabric ‘B’ each built from Nexus 7000 and Nexus 5000 with L3/L2 boundary; servers with NIC or CNA. Scale from Isolation to Convergence.]

Converged Access with vPC

Sharing Access Layer for LAN and SAN

• Shared Physical, Separate Logical LAN and SAN traffic at Access Layer

• Physical and Logical separation of LAN and SAN traffic at Aggregation Layer

• Additional Physical and Logical separation of SAN fabrics

• Storage VDC on Nexus 7000 for additional management / operation separation

[Figure: legend Ethernet, FC, FCoE, Converged FCoE link, Dedicated FCoE link. Nexus 7000 and Nexus 5000 with L3/L2 boundary; Fabric ‘A’ and Fabric ‘B’ on MDS 9000; servers with CNA. Scale from Isolation to Convergence.]

Dedicated vs. Converged ISLs

Why support Dedicated ISLs as opposed to Converged?

Converged ISLs:

• One wire for all traffic types

• QoS guarantees minimum bandwidth allocation

• No clear port ownership

• Desirable for DCI connections

• Agg BW: 40G; FCoE: 40G; Ethernet: 40G; HA: 4 links available

Dedicated ISLs:

• Dedicated wire for a traffic type

• No extra output feature processing

• Distinct port ownership

• Complete storage traffic separation

• Agg BW: 40G; FCoE: 20G; Ethernet: 20G; HA: 2 links available

Different methods, producing the same aggregate bandwidth. Dedicated links provide additional isolation of storage traffic.

Converged Network – Dual Fabrics with Dedicated Links

Maintaining Dual SAN fabrics with Overlay

• LAN and SAN traffic share physical switches

• LAN and SAN traffic use dedicated links between switches

• All Access and Aggregation switches are FCoE FCF switches

• Dedicated links between switches are VE_Ports

• Storage VDC for additional management / operation separation

[Figure: legend Ethernet, FC, Converged FCoE link, Dedicated FCoE link. LAN/SAN Fabric ‘A’ and Fabric ‘B’; FCF switches on Nexus 7000 and Nexus 5000 joined by VE ports, L3/L2 boundary; servers with CNA; FCoE and FC storage. Scale from Isolation to Convergence.]

Converged Network – Dual Fabrics with Dedicated Links

Overlay of SAN A / SAN B with Dedicated Links

[Figure: legend Ethernet, FC, Converged FCoE link, Dedicated FCoE link. LAN with SAN Fabric ‘A’ and Fabric ‘B’ overlaid; FCF switches on Nexus 7000 and Nexus 5000 joined by VE ports, L3/L2 boundary; servers with CNA; FCoE and FC storage.]

Converged Network – Dual Fabrics with Dedicated Links

Maintaining Dual SAN fabrics with Overlay

• LAN and SAN traffic share physical switches

• LAN and SAN traffic use dedicated links between switches

• All Access and Aggregation switches are FCoE FCF switches

• Dedicated links between switches are VE_Ports

• Storage VDC for additional management / operation separation

[Figure: legend Ethernet, FC, Converged FCoE link, Dedicated FCoE link, FabricPath. Fabric ‘A’ and Fabric ‘B’; FCF switches on Nexus 7000 and Nexus 5000 joined by VE ports, L3/L2 boundary; servers with CNA; FCoE and FC storage. Scale from Isolation to Convergence.]

Converged Network – Dual Fabrics with Dedicated Links

Overlay of SAN A / SAN B with Dedicated Links

[Figure: legend Ethernet, FC, Converged FCoE link, Dedicated FCoE link, FabricPath. LAN with SAN Fabric ‘A’ and Fabric ‘B’ overlaid; FCF switches on Nexus 7000 and Nexus 5000 joined by VE ports, L3/L2 boundary; servers with CNA; FCoE and FC storage.]


Q & A

Thank You