Why Fibre Channel over Ethernet?
R Snively
[email protected] 408-333-8135
With grateful acknowledgement to John Hufferd, Anoop Ghanwani, Dave Peterson, and Steve Wilson, all from Brocade.
Storage Developer Conference 2008 www.storage-developer.org © 2008 Insert Copyright Information Here. All Rights Reserved.

Introduction
• The 3 classes of interconnects
• Example of cross-pollination: iSCSI
• Our example of cross-pollination: FCoE
  • CEE Ethernet
  • The model
  • FCoE
  • FIP

The three classes of interconnects
• Storage
• Clustering
• Networking

Storage
Requirements
• Pre-allocated data locations
• Extreme reliability requirements
• Large latency multipliers affect performance
• Simple communications (mostly READ or WRITE)
• Rapidly increasing scalability requirements
Examples
• FIPS-60, ESCON, FICON, FCP over FC, SCSI, SAS, SATA

Fibre Channel, a storage example
Achieves goals with
• High reliability links
• HBA co-processor (simple because protocol is simple)
• Delivery of data straight to allocated memory
• Contains context information directly in frame.
• Flow control at both link level and end-to-end eliminates need to drop frames.
• Data path topologies tend to be simple, use FSPF routing.
• Integrated management.

Clustering
Requirements
• RDMA
• Extremely low latency
• Latency has significant performance impact
• Large latency multipliers affect performance
• Simple communications (mostly GET or PUT)
• Modest scalability requirements
Examples
• SCI, HIPPI, InfiniBand, VI, Myrinet, others

Networking
Requirements
• Data allocation determined by analysis of frames
• Reliability requirements met only at upper layers.
• Latency of links has a third-order performance impact.
• Sophisticated application level communications
• Essentially infinite scalability requirements
Examples
• Ethernet, Ethernet, Ethernet, Ethernet, FDDI, SONET, ATM, DSL

Ethernet, a network example
Achieves goals with
• Best effort links
• Simple cheap NICs with host-based frame analysis
• Delivery of data to buffer, analyzed and moved to destination.
• Contains context information at higher levels.
• Flow control at higher levels or not at all. May drop frames.
• Data path topologies extremely complex.
• Most management is out of band.

Why shove storage traffic onto a network?
Because it comes in handy sometimes
iSCSI
• Accepts penalties associated with TCP/IP
• Accepts penalties associated with latency of networks
• Gains advantages of LAN/WAN scalability
• Gains advantages of LAN/WAN distances
• Low-cost implementations possible if performance goals are modest.

Another way to shove storage traffic onto networks: FCoE
It is useful enough to justify changing Ethernet to Converged Enhanced Ethernet
FCoE
• Accepts scalability penalties of L2-only Ethernet cloud.
• Accepts mini-jumbo frame requirement for maximum performance.
• Accepts penalty of installing new lossless Ethernet bridges.
  • Requires additional flow control (PFC)
  • Requires enhanced transmission selection (ETS)
  • Requires Data Center Bridging eXchange (DCBX)
• Accepts penalty of possible traffic interactions.
• Gains advantage of one-port-fits-all configuration.
• Maintains optimum protocol efficiency.
• Maintains advantage of full TCP/IP and full FC compatibility.

Architectural Overview:
[Figure labels: Host Computer (three instances), Converged Network Adapter (CNA), Ethernet Cloud (CEE), FCoE Switch, SAN Fabric (FC), TCP/IP Router, LAN/WAN Network (TCP/IP). Legend: FCoE Traffic, TCP/IP Traffic, Cluster Traffic.]

Mini-jumbo frame support
Mini-jumbo frames (No standard defined)
• Standard Ethernet frame: 46 - 1500 data bytes + 18 bytes of headers
• Standard FC frame: 0 - 2112 data bytes + 28 to 52 bytes of headers (28 to 2164 bytes total frame size)
• Need mini-jumbo Ethernet frame to fit FC frame: 64 - 2200 bytes required in mini-jumbo frame
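The size arithmetic on this slide can be checked with a short Python sketch. The constant names are illustrative only, and the 36-byte encapsulation overhead is not quoted on the slide; it is inferred here as the difference between the 2200-byte mini-jumbo frame and the 2164-byte maximum FC frame.

```python
# Sizes quoted on the slide, in bytes. Names are illustrative only.
ETH_MAX_DATA = 1500      # largest standard Ethernet data field
FC_MAX_DATA = 2112       # largest FC data field
FC_MIN_HEADERS = 28      # smallest FC header overhead
FC_MAX_HEADERS = 52      # largest FC header overhead
MINI_JUMBO_MAX = 2200    # largest mini-jumbo frame needed

fc_min_frame = 0 + FC_MIN_HEADERS
fc_max_frame = FC_MAX_DATA + FC_MAX_HEADERS

# A full-size FC frame does not fit in a standard Ethernet data field.
fits_standard_ethernet = fc_max_frame <= ETH_MAX_DATA

# Assumption: overhead inferred from the slide's totals, not quoted by it.
implied_overhead = MINI_JUMBO_MAX - fc_max_frame

print(fc_min_frame, fc_max_frame, fits_standard_ethernet, implied_overhead)
```

Under the same inferred overhead, the smallest FC frame (28 bytes) yields the 64-byte minimum quoted on the slide.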

CEE Standards developed in IEEE
• Priority Flow Control (PFC) (IEEE 802.1Qbb)
• Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz)
• DCBX (Probably included in IEEE 802.1Qaz)
Note that other related work is not required for CEE, though it may ultimately be used in some FCoE implementations.
• Congestion Notification (IEEE 802.1Qau)
• Multi-pathing (IETF TRILL project among others)

Priority-based Flow Control (PFC)
Why PFC?
• Link level flow control is the only way to guarantee zero loss due to congestion
• 802.3x pauses all traffic including control traffic
• PFC only affects priorities that need it; e.g. the priority used by FCoE

PFC Implementation Conventions
• May be applied to each priority independently; i.e. support enable/disable transmission and receipt of pause frames
  • Both transmission and receipt are enabled/disabled simultaneously
• Frame format for PFC along with Ethertype and MAC address
• Specifies the minimum delay in reaction to pause that a receiver must be able to deal with
• After PFC is negotiated (using DCBX), 802.3x shall not be used
  • Don’t generate 802.3x pause frames
  • Ignore received 802.3x pause frames

PFC Frame Format
• The fields Time[n] for n = 0..7 are always present
• If e[n] = 0, Time[n] is zero on transmit and ignored on receipt
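The e[n] and Time[n] rule can be illustrated with a small Python sketch. The destination address, EtherType, and opcode constants below do not appear in the slide text; they are assumptions based on the usual IEEE MAC Control conventions and should be checked against IEEE 802.1Qbb. The function names are hypothetical.

```python
import struct

# Assumptions (not in the slide text): reserved multicast destination,
# MAC Control EtherType, and PFC opcode.
PFC_DEST_MAC = bytes.fromhex("0180c2000001")
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101


def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Build a PFC frame (without FCS) pausing the given priorities.

    pause_quanta maps priority n (0..7) to Time[n]; priorities that are
    absent have e[n] = 0 and are sent with Time[n] = 0.
    """
    enable_vector = 0
    times = [0] * 8                      # Time[0..7] are always present
    for prio, quanta in pause_quanta.items():
        if not 0 <= prio <= 7:
            raise ValueError("priority must be 0..7")
        enable_vector |= 1 << prio       # set e[prio]
        times[prio] = quanta
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *times)
    header = PFC_DEST_MAC + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    return (header + payload).ljust(60, b"\x00")   # pad to minimum size


def parse_pfc_times(frame: bytes) -> dict:
    """Return {priority: Time[n]} for enabled priorities only."""
    opcode, enable_vector, *times = struct.unpack("!HH8H", frame[14:34])
    if opcode != PFC_OPCODE:
        raise ValueError("not a PFC frame")
    # Time[n] is ignored on receipt when e[n] = 0.
    return {n: times[n] for n in range(8) if enable_vector & (1 << n)}
```

For example, `build_pfc_frame(src, {3: 0xFFFF})` pauses only priority 3, which is how a link could hold back FCoE traffic while other priorities keep flowing.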

PFC Deployment Considerations
• Pause and link level flow control are known to have several drawbacks
  • Deadlocks – routing and other deadlocks
  • Congestion spreading
• Solutions exist for these problems
  • Use 802.1Qau to deal with sustained congestion
  • Use deadlock avoidance schemes, including simplified topologies and dropping frames in the event a deadlock condition is detected.

Enhanced Transmission Selection
Why ETS?
• In a converged network, different traffic types have different needs
• Strict priority, which is the default defined in 802.1Q, falls short of meeting the needs
• Several pre-standard solutions exist for addressing the shortcomings of strict priority
• ETS provides a consistent management framework for assigning bandwidth to traffic classes
• It also defines a minimum behavior for the scheduler to ensure some minimum capability across vendors
(From ETS proposal to IEEE 802.1)

ETS Implementation Agreement
• Combined priority/WRR scheduler
• Each priority is assigned to a priority group
• A priority group is represented by 4 bits
  • Must be 0..7, 15 (8..14 are reserved)
• A priority may be designated as being strict priority by being assigned PGID 15
• After all priorities in PG15 are served using strict priority, start serving all the other groups in proportion to their weight
  • Weight is expressed as a percentage of remaining bandwidth with a resolution of 1%
  • Bandwidths must add up to 100%; if they don’t, behavior is undefined
  • A scheduler can pick the closest value that it can support, to within +/- 10%
• Minimum requirement: 3 groups (PGID 15 and 2 priority groups with weights)
• The scheduler should be configurable on a per-port basis

Priority Groups and Bandwidth Assignment – An Example
Priority  PGID  Desc
7         15    IPC
6         1     LAN
5         1     LAN
4         1     LAN
3         0     SAN
2         0     SAN
1         1     LAN
0         1     LAN

PGID  BW%  Desc
15    -    IPC
0     50   SAN
1     50   LAN

[Figure: IPC (pri 7) served by strict priority (SP); SAN (pri 3, 2) at 50% and LAN (pri 6, 5, 4, 1, 0) at 50%, served by WRR.]
• There are 3 priority groups in use – IPC, LAN & SAN
• First all IPC traffic is serviced (pri 7); it is assigned PGID 15
• Next, the available bandwidth is shared equally by LAN (pri 6,5,4,1,0) & SAN (pri 3,2)
• Within a group, scheduling is not specified; e.g. with the LAN traffic an implementation could map pri 6,5,4,1,0 to a single queue, or map them to separate queues and do round robin or priority between them – hierarchical scheduling
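The example can be expressed as a small Python sketch. It is a static model under stated assumptions, not a scheduler: every weighted group is assumed to have traffic waiting, the strict-priority load is given as a number, and the function names are hypothetical.

```python
# Priority -> PGID and PGID -> weight (%), taken from the example table.
PRIORITY_TO_PGID = {7: 15, 6: 1, 5: 1, 4: 1, 3: 0, 2: 0, 1: 1, 0: 1}
PG_WEIGHTS = {0: 50, 1: 50}
STRICT_PGID = 15


def validate(priority_to_pgid, pg_weights):
    """Check the PGID range and the 100% rule from the agreement."""
    for prio, pgid in priority_to_pgid.items():
        if pgid not in (*range(8), STRICT_PGID):
            raise ValueError(f"priority {prio}: PGID {pgid} is reserved")
    if sum(pg_weights.values()) != 100:
        raise ValueError("priority group weights must add up to 100%")


def allocate(link_mbps, strict_load_mbps, pg_weights):
    """Serve PGID 15 first, then split what remains by weight."""
    strict = min(strict_load_mbps, link_mbps)
    remaining = link_mbps - strict
    shares = {pgid: remaining * w / 100 for pgid, w in pg_weights.items()}
    shares[STRICT_PGID] = strict
    return shares


validate(PRIORITY_TO_PGID, PG_WEIGHTS)
print(allocate(10000, 1000, PG_WEIGHTS))
```

With a 10000 Mb/s link and 1000 Mb/s of IPC traffic, the remaining 9000 Mb/s is split evenly, so SAN and LAN each get 4500 Mb/s in this model.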

ETS Configuration Guidelines
• Within a PG, PFC must be enabled (or disabled) on all priorities within the group; a mix of PFC and non-PFC priorities within a single PG should be avoided
• The default configuration is the same as 802.1Q – strict priority
  • All priorities are assigned PGID 15
  • The bandwidth table contains only PGID 15 with no bandwidth allocation

Data Center Bridging eXchange (DCBX) Protocol
Why DCBX?
• Defines the limits of the CEE-capable cloud
• Detects mis-configuration between peers
• May be used to configure a peer
• Allows for coexistence with legacy devices

DCBX & LLDP
• DCBX has 2 state machines
  • DCBX control
  • DCBX feature
• somethingChangedLocal is used by DCBX control to inform LLDP that a new LLDPDU must be sent
• somethingChangedRemote informs DCBX control that LLDP has received new information from the peer
• remoteFeatureChanged informs the DCBX Feature state machine that a remote feature has changed
• seqNo and myAckNo are maintained in DCBX control but visible in the DCBX feature state machines
• Syncd is maintained in each of the DCBX feature state machines but is also visible in DCBX control
• Extensions are defined to the LLDP MIB for DCBX and each of the features
[Figure labels: DCBX Feature 1, DCBX Feature 2, ..., DCBX Feature n; DCBX Control; LLDP; LLDP MIB; LLDP-EXT MIB. Signals: remoteFeatureChanged, (Syncd), (seqNo, myAckNo), somethingChangedLocal, somethingChangedRemote.]

DCBX Handshaking With Peer
• DCBX control TLV contains a seqNo and ackNo
• Local system initializes and populates a seqNo in LLDP messages
• seqNo is modified when any of the local configuration changes
• The ackNo tells the peer the last seqNo that was received
• In this way, a system knows what information about it has been received by its peer
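The seqNo/ackNo exchange can be sketched in Python as follows. This is a heavily simplified, hypothetical model: the class and method names are invented for illustration, the initial values are assumptions, and LLDP transport, timers, and the feature state machines are left out.

```python
class DcbxControl:
    """Hypothetical, simplified model of the seqNo/ackNo exchange."""

    def __init__(self):
        self.seq_no = 1        # modified whenever local config changes
        self.ack_no = 0        # last seqNo received from the peer
        self.peer_acked = 0    # last of our seqNo values the peer acked

    def local_config_changed(self):
        self.seq_no += 1

    def build_tlv(self):
        # Control TLV carried in the next LLDP message.
        return {"seqNo": self.seq_no, "ackNo": self.ack_no}

    def receive_tlv(self, tlv):
        self.ack_no = tlv["seqNo"]
        self.peer_acked = tlv["ackNo"]

    def peer_is_up_to_date(self):
        # True once the peer has acknowledged our current seqNo.
        return self.peer_acked == self.seq_no
```

After a local change, `peer_is_up_to_date()` stays false until a TLV from the peer echoes the new seqNo in its ackNo, which is how a system knows what its peer has received.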

DCBX Sync With Peer
• PFC configuration must match with peer on all priorities, otherwise PFC is disabled on the link and an error is flagged
• ETS configuration need not match
• If one end is “Willing” and the other is “Not Willing”, then the “Willing” switch port accepts the configuration from the “Not Willing” port
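These rules can be sketched as two small Python functions. The function names and data shapes are hypothetical, and the slide does not say what happens when both ends are Willing or both are Not Willing; the sketch simply keeps the local configuration in those cases, which is an assumption.

```python
def resolve_pfc(local_enabled, peer_enabled):
    """PFC must match on all priorities, otherwise it is disabled.

    Each argument is a frozenset of priorities with PFC enabled.
    Returns (operational set, error message or None).
    """
    if local_enabled == peer_enabled:
        return local_enabled, None
    return frozenset(), "PFC mismatch: PFC disabled on link"


def resolve_config(local_cfg, local_willing, peer_cfg, peer_willing):
    """A Willing port accepts the configuration of a Not Willing peer."""
    if local_willing and not peer_willing:
        return peer_cfg
    # Assumption: in every other combination the local config is kept.
    return local_cfg
```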

Vocabulary Drill
• INCITS: InterNational Committee for Information Technology Standards. (Accredited by ANSI.)
• INCITS Technical Committee T11: Responsible for all Fibre Channel standards
• FC-BB-5: The standard that defines the Fibre Channel parts of Fibre Channel over Ethernet
• FCoE: In FC-BB-5, officially called the FC-BB_E model.

Architectural Overview: CNA
[Figure labels: Host Computer, Lossless Ethernet Cloud, FCoE Switch, SAN Fabric (FC). Inside the host: ENode, VN_Port, FCoE_LEP.]
LEP = Link End Point

Architectural Overview: FCF
[Figure labels: Host Computer, Lossless Ethernet Cloud, FCoE Switch, SAN Fabric (FC). Inside the switch: Fibre Channel Forwarder (FCF), VF_Port, FCoE_LEPs.]

Multiple Logical FC connections via a Single Ethernet MAC
[Figure: examples of single MACs with connections to two different FCFs]
The Logical FC Link is defined by a MAC Address pair
• A VN_Port MAC Address
• A VF_Port MAC Address
For a logical FC link the FCoE Frames are always sent to and received from a specific FCF’s MAC Address
• Therefore, pathing to and from the FC driver is always defined by the MAC Address of the partner FCF’s VF_Port

Model for the ENode

Model of the ENode with Multiple Logical FC interfaces
• Multiple FC NPIV instances on a single logical FC Host interface
• For each logical N_Port (VN_Port) there is one FLOGI and perhaps 100’s of FDISC
• Each VN_Port is seen by the Host as a separate (logical) FC connection
• The number of (logical) FC connections is implementation dependent
• Only one MAC Address is required for the FCoE Controller and the VN_Ports on a single physical MAC (aka Server Provided MAC Address – SPMA)
• FCF may choose to specify new MAC addresses for each VN_Port (aka Fabric Provided MAC Address – FPMA)
[Figure labels: FC-3/FC-4s, FC Entity, VN_Port, FCoE Entity, FCoE_LEP, FCoE Controller, Lossless Ethernet MAC, Ethernet_Port. Annotations: "In this model this is where FC-2 functions live"; "In this model this is where the Encapsulation/De-Encapsulation functions live"; "MAC Address of FCoE_LEP (VN_Port): may or may not be the same as the FCoE controller MAC Address of Burnt-in MAC".]

Model for the FCF

FCoE, Ethertype ‘8906’h
Word    Bits 31-24 | 23-16 | 15-8 | 7-0
0-2     Destination MAC Address (6 Bytes), Source MAC Address (6 Bytes)
        (Optional IEEE 802.1q 4 Byte Tag goes here)
3       ET=FCoE (16 bits) | Ver (4b) | Reserved (12 bits)
4       Reserved
5       Reserved
6       Reserved | SOF (8 bits)
7..n    Encapsulated FC Frame (this field varies in size)
n+1     EOF (8 bits) | Reserved
n+2     Ethernet FCS
FC Frame = Minimum 28 Bytes (7 Words), Maximum 2180 Bytes (545 Words) (including FC-CRC)
Ethernet frame size is 64 Bytes to 2220 Bytes
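The word layout above can be turned into a short Python sketch that wraps an FC frame and reproduces the 64-byte and 2220-byte limits. The function names are hypothetical. The 802.1Q tag EtherType (0x8100) and a version field of zero are assumptions not stated on the slide, and the SOF and EOF code values are left to the caller because the slide does not list them.

```python
import struct

FCOE_ETHERTYPE = 0x8906


def encapsulate(dst_mac, src_mac, fc_frame, sof, eof, vlan_tci=None):
    """Wrap an FC frame following the slide's word layout (no FCS)."""
    if not 28 <= len(fc_frame) <= 2180 or len(fc_frame) % 4:
        raise ValueError("FC frame must be 28..2180 bytes, whole words")
    out = dst_mac + src_mac
    if vlan_tci is not None:
        # Assumption: 0x8100 is the 802.1Q tag EtherType.
        out += struct.pack("!HH", 0x8100, vlan_tci)
    out += struct.pack("!H", FCOE_ETHERTYPE)
    out += bytes(13)                 # Ver (assumed zero) + reserved bits
    out += bytes([sof])              # SOF is the last byte of word 6
    out += fc_frame
    out += bytes([eof]) + bytes(3)   # EOF followed by reserved bytes
    return out


def wire_size(frame_without_fcs):
    """Frame size on the wire, including the 4-byte Ethernet FCS."""
    return len(frame_without_fcs) + 4
```

A 28-byte FC frame without a tag comes to 64 bytes on the wire, and a 2180-byte FC frame with the optional tag comes to 2220 bytes, matching the limits quoted on the slide.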

After that, everything is easy
All Class 2 and 3 protocols work, including:
• FCP (SCSI Fibre Channel Protocol)
• FC-SB-3 (FICON)
• RFC 4338 (Transmission of IPv6, IPv4, and Address Resolution Protocol (ARP) Packets over Fibre Channel)
• Security
• etc.
Link level flow control uses Ethernet conventions

A few little details left...
• How do you find the devices?
• How do you determine the allowable frame sizes?
• How do you associate FC FC_IDs and MAC addresses?
• How do you keep a virtual link alive in an Ethernet L2 cloud?
• How do you clear links?
Solution is a second protocol.

FIP, Ethertype ‘8914’h
Word    Bits 31-24 | 23-16 | 15-8 | 7-0
0-2     Destination MAC Address (6 Bytes), Source MAC Address (6 Bytes)
        (Optional IEEE 802.1q 4 Byte Tag goes here)
3       ET=FIP (16 bits) | Ver (4b) | Reserved (12 bits)
4       FIP Operation Code | Reserved | FIP SubCode
5       Descriptor List Length | Flags (FP, SP, A, S, F)
6..n    Descriptor List (varies in size), then PAD to minimum length or mini-jumbo length
n+1     Ethernet FCS
Ethernet frame size is 64 Bytes to 2220 Bytes

FIP Operation Codes
Operation Code    Subcode       Operation
0001h             01h           Discovery, Solicitation
                  02h           Discovery, Advertisement
0002h             01h           Link Service Request
                  02h           Link Service Response
0003h             01h           FIP Keep Alive
                  02h           FIP Clear Virtual Link
FFF8h .. FFFEh    00h .. FFh    Vendor Specific
All others        All others    Reserved
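The table can be encoded as a simple lookup. The sketch below is illustrative only; `describe_fip` is a hypothetical helper name, and the values are exactly those listed in the table.

```python
# (operation code, subcode) -> operation, as listed in the table.
FIP_OPERATIONS = {
    (0x0001, 0x01): "Discovery, Solicitation",
    (0x0001, 0x02): "Discovery, Advertisement",
    (0x0002, 0x01): "Link Service Request",
    (0x0002, 0x02): "Link Service Response",
    (0x0003, 0x01): "FIP Keep Alive",
    (0x0003, 0x02): "FIP Clear Virtual Link",
}


def describe_fip(op_code, subcode):
    """Classify a FIP operation code and subcode pair."""
    if (op_code, subcode) in FIP_OPERATIONS:
        return FIP_OPERATIONS[(op_code, subcode)]
    if 0xFFF8 <= op_code <= 0xFFFE and 0x00 <= subcode <= 0xFF:
        return "Vendor Specific"
    return "Reserved"
```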

An example of FIP
• An ENode solicits with a multicast Solicitation frame to identify any FCFs that it can use to make a virtual connection.
• Applicable FCFs respond with unicast Advertisement frames.
• After deciding which FCFs are appropriate, the ENode sends an FLOGI Request.
• The selected FCF responds with an appropriate FLOGI Response accepting or rejecting the Request.
• The FLOGI Response assigns the MAC address that the ENode will use for later FCoE frames with the VN_Port.
• And now the FCoE virtual link is up and ready for normal FC use.
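The sequence can be walked through with a toy Python model. Everything here is a sketch under assumptions: the `Fcf` class and `bring_up_virtual_link` function are hypothetical, frames are plain dictionaries rather than real FIP descriptors, the FCF simply grants the proposed MAC address, and the rule of trying FCFs in ascending priority value and falling back after a rejection is an invented policy, not something the slide specifies.

```python
from dataclasses import dataclass


@dataclass
class Fcf:
    name: str
    mac: str
    priority: int
    accepts_login: bool = True

    def advertise(self, enode_mac):
        # Unicast Advertisement back to the soliciting ENode.
        return {"dst": enode_mac, "src": self.mac, "priority": self.priority}

    def flogi(self, enode_mac, proposed_mac):
        # Hypothetical policy: accept and grant the proposed MAC address.
        if not self.accepts_login:
            return {"accepted": False}
        return {"accepted": True, "assigned_mac": proposed_mac}


def bring_up_virtual_link(enode_mac, proposed_mac, fcfs):
    # 1. The Solicitation reaches every FCF.
    # 2. Each FCF answers with a unicast Advertisement.
    adverts = [fcf.advertise(enode_mac) for fcf in fcfs]
    by_mac = {fcf.mac: fcf for fcf in fcfs}
    # 3. Assumed selection rule: try FCFs in ascending priority value.
    for advert in sorted(adverts, key=lambda a: a["priority"]):
        reply = by_mac[advert["src"]].flogi(enode_mac, proposed_mac)
        if reply["accepted"]:
            # 4. The FLOGI Response carries the MAC for later FCoE frames.
            return advert["src"], reply["assigned_mac"]
    return None
```

The returned pair is the FCF MAC address and the assigned VN_Port MAC address, the two addresses that identify the resulting virtual link.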

Multicast Solicitation from H2
[Figure labels: H1 (MAC(H1)), H2 (MAC(H2)), Lossless Ethernet Bridge, FCF A (FCF-MAC(A)), FCF B (FCF-MAC(B)), FC fabric.]
Solicitation (destination All-FCF-MACs, source MAC(H2)):
[From ENode, Addr modes supported, MAC(H2), H2 Port_ID, Max_Rcv_Size]
F=0b, this Solicitation solicits VF_Port capable FCF-MACs

Unicast Advertisements from FCFs
[Figure labels: H1 (MAC(H1)), H2 (MAC(H2)), Lossless Ethernet Bridge, FCF A (FCF-MAC(A)), FCF B (FCF-MAC(B)), FC fabric.]
Jumbo Advertisement (destination MAC(H2), source FCF-MAC(A)):
[Solicited, From FCF, Address Modes Supported, Priority, FCF-MAC(A), FC-MAP, Switch_Name, Fabric_Name]
Jumbo Advertisement (destination MAC(H2), source FCF-MAC(B)):
[Solicited, From FCF, Address Modes Supported, Priority, FCF-MAC(B), FC-MAP, Switch_Name, Fabric_Name]
H2’s FCF list: FCF-MAC(A) [J], FCF-MAC(B) [J]

FLOGI from H2 to FCF_A
[Figure labels: H1 (MAC(H1)), H2 (MAC(H2)), Lossless Ethernet Bridge, FCF A (FCF-MAC(A)), FCF B (FCF-MAC(B)), FC fabric.]
FLOGI Request (destination FCF-MAC(A), source MAC(H2)):
[From ENode, Addr mode supported, Proposed MAC address, FLOGI Request information]

FLOGI Response to H2
[Figure labels: H1 (MAC(H1)), H2 (MAC(H2)), Lossless Ethernet Bridge, FCF A (FCF-MAC(A)), FCF B (FCF-MAC(B)), FC fabric.]
FLOGI Response (destination MAC(H2), source FCF-MAC(A)):
[From FCF, Addr mode assigned, Assigned MAC address, FLOGI Response information]

Resulting virtual FC link runs FCoE
[Figure labels: H1 (MAC(H1)), H2 (Assigned MAC), Lossless Ethernet Bridge, FCF A (FCF-MAC(A)), FCF B (FCF-MAC(B)), FC fabric, FCoE traffic.]

More details of FCoE
The remainder of this work is “an exercise for the student”.
References:
• To become active in IEEE 802.1 and IEEE 802.3 see: http://grouper.ieee.org/groups/802/dots.html
• To become active in T11: www.t11.org