Architecture for a 10-Gigabit Standard

Sun MicrosystemsShimon Computer Muller Company

IEEE 802.3 High Speed Study Group June 1, 1999

1 IEEE 802.3

HSSG S. Muller - Sun Outline

■ Introduction ■ Market Requirements ■ Customer Requirements ■ Standards Requirements

■ Multi-Gigabit Transmission Options ■ 10- Objectives ■ Architecture ■ Functional Partitioning ■ Principles of Operation ■ Physical Coding and Framing Requirements ■ Media Independent Interfaces ■ Error Handling

■ Summary

2 IEEE 802.3

HSSG S. Muller - Sun Introduction --- Market Requirements

■ The expectation is that scaling Ethernet to 10Gb/s will necessarily expand the scope of its application space

■ “Ethernet everywhere” (or at least “wherever you can get away with it”) ■ Everything from data center and computer room clusters, through traditional LAN backbone and desktop applications, to MAN/RAN/WAN

3 IEEE 802.3

HSSG S. Muller - Sun Introduction --- Customer Requirements

■ The motivations and constraints for each one of the envisioned applications are quite different

■ Long Haul ■ MAN, RAN and WAN ■ Must operate over existing very long fiber links ■ Requires high coding efficiency ■ Not very sensitive to cost ■ Does not address any specific problem for traditional Ethernet users

■ Intermediate Haul ■ Traditional LAN backbones ■ Must operate over the existing cabling infrastructure ■ Coding efficiency is not an issue ■ Cost sensitive ■ Will address future problems of backbone congestion in Ethernet LANs

4 IEEE 802.3

HSSG S. Muller - Sun Introduction --- Customer Requirements (continued)

■ Short Haul ■ Computer room clusters, server-switch and inter-switch interconnects ■ Cabling infrastructure is not an issue ■ Coding efficiency is not an issue ■ Very cost sensitive ■ Will address existing hot spots in today’s networks

5 IEEE 802.3

HSSG S. Muller - Sun Introduction --- Standards Requirements

■ “One solution that fits all” for the 10-Gigabit Ethernet Standard will be sub-optimal for at least one major market segment

■ The standard should be able to accommodate multiple solutions that will address the divergent market requirements

■ The various options must be constrained to the

■ Specific Requirements:

■ The MAC should be modified one more time and made “truly” speed- independent ■ Media independent interfaces should be defined below the MAC ■ Several Physical Layers can be defined, that attach to these interfaces ■ This approach is fully compatible with previous 802.3 practice

6 IEEE 802.3

HSSG S. Muller - Sun Multi-Gigabit Transmission Options

■ “Go Faster”

■ Architecturally straightforward ■ Scale everything by a factor of 10 ■ Implementation and cost challenges ■ Requires very high-speed Physical Layer transmission components ■ Significant portions of logic need to be clocked at very high frequencies

■ “Go Wider”

■ Striping of the data stream across multiple transmission channels ■ Can be implemented using proven existing technology ■ Alleviates the very high-speed logic design requirements ■ Will provide a much cheaper alternative in a variety of network environments ■ The transmission channels can be separate physical links (ribbon fiber) or a single physical link that carries multiple logical channels (WDM)

■ The 10-Gigabit Ethernet standard should accommodate both the serial and the parallel transmission schemes 7 IEEE 802.3

HSSG S. Muller - Sun Multi-Gigabit Transmission Options (continued)

■ Coarse Granularity Striping

■ The channels’ convergence point is above the MAC Layer --- 802.3ad ■ Distribution/collection typically implemented in s/w or in the switching fabric ■ High-speed operation achieved only when multiple Layer 2/3/4 “flows” can be aggregated ■ For a given “flow”, the throughput and the latency are limited by the speed of a single channel

■ Fine Granularity Striping

■ The channels’ convergence point is below the MAC Layer ■ Distribution/collection implemented in h/w as part of the Physical Layer ■ Striping of the MAC data stream performed with byte granularity ■ From the MAC Client’s perspective the performance of the aggregate is identical to that of a single high-speed link

8 IEEE 802.3

HSSG S. Muller - Sun 10-Gigabit Ethernet Objectives

■ Support the speed of 10Gb/s at the MAC/PLS service interface ■ Preserve the 802.3 frame format at the MAC Client service interface ■ Preserve the minimum and maximum frame sizes of current 802.3 standard ■ Support simple forwarding between 10Gb/s, 1Gb/s, 100Mb/s and 10Mb/s ■ Provide support for Full Duplex operation only (no CSMA/CD) ■ Meet all 802 functional requirements, with the possible exception of Ham- ming Distance ■ Support star-wired topologies ■ Support media selected from ISO/IEC 11801 ■ Provide a family of Physical Layer specifications which support links of: ■ At least 100m (?) over multi-mode multi-fiber bundles ■ At least 300m (?) over multi-mode single-fiber cable ■ At least 3km (?) over single-mode single-fiber cable ■ At least 50km (?) over single-mode single-fiber cable ■ Provide specifications for optional Media Independent Interfaces

9 IEEE 802.3

HSSG S. Muller - Sun Architecture --- Functional Partitioning

LOWER SPEEDS HIGHER LAYERS

LLC --- LOGICAL LINK CONTROL

MAC CONTROL (OPTIONAL)

MAC --- MEDIA ACCESS CONTROL

RECONCILIATION RECONCILIATION

10-GMII

10-GMII DISTRIBUTOR/COLLECTOR

2-GMII

PCS PCS PCS PCS PCS

PMA PMA PMA PMA PMA STRIPED PHY SERIAL PHY PMD PMD PMD PMD PMD

MDI MDI

MEDIUM MEDIUM MEDIUM MEDIUM MEDIUM

1Gb/s 1Gb/s

10 IEEE 802.3

HSSG S. Muller - Sun Architecture --- Functional Partitioning (continued)

10-GMII 2-GMII 2-TBI MDI 156.25MHz/312.50MBd 156.25MHz/312.50MBd 156.25MHz/312.50MBd 3.125GBd

8 10 2 8B/10B SER- 8 PCS 10 DES 2

32 Distributor

8 10 2 8B/10B SER- 10-Gb 8 PCS 10 DES 2 MAC Ethernet Client MAC I/F 8 10 2 8B/10B SER- 8 PCS 10 DES 2

32 Collector

8 10 2 8B/10B SER- 8 PCS 10 DES 2

MDI ??? 12.5GBd

? 2 “Serial” SER- PCS DES ? 2

11 IEEE 802.3

HSSG S. Muller - Sun Architecture --- Principles of Operation

■ Assumptions:

■ A striped bundle contains a fixed number of channels ■ Equal to four ■ All the channels in a bundle have the same nominal speed ■ Equal to 3.125GBd ■ No partial operation is supported ■ For a striped bundle to be considered operational, all the channels in the bun- dle must be operational ■ The end points of a bundle are properly wired ■ In the same order and contiguous ■ The maximum skew between the channels in a bundle is bounded

12 IEEE 802.3

HSSG S. Muller - Sun Architecture --- Principles of Operation (continued)

■ Distributor:

■ Operates in an open loop ■ Not required to consider the skew at the receiving side or the receiver state ■ Accepts a contiguous byte stream from the MAC (frames and idles) and divides it into four sub-streams (“mini-frames” and “mini-idles”) ■ Round-robin arbitration ■ Byte granularity ■ Starting point is arbitrary, but the arbitration is contiguous afterwards ■ The first and/or last bytes of a packet may be sent over any channel with no restrictions ■ The Distributor uniquely “marks” the first and last bytes of a packet ■ The Distributor enforces packet sequencing during the Inter-Packet Gap

13 IEEE 802.3

HSSG S. Muller - Sun Architecture --- Principles of Operation (continued)

■ Collector:

■ Contains an elasticity buffer per channel ■ Compensate for the worst case skew between the channels ■ Synchronizes the channels using sequence information received during the Inter-Packet Gap ■ Reassembles “mini-frames” into packets and sends them to the MAC ■ Reassembly occurs only if all the channels are “in synch” and contain at least a partial “mini-frame” ■ Uses the uniquely “marked” first and last bytes of a packet to determine packet boundaries across the channels ■ Packet reassembly starting point is determined by the “first” byte of a packet ■ After the first byte, reassembly is round-robin with byte granularity ■ Packet reassembly end point is determined by the “last” byte of a packet

14 IEEE 802.3

HSSG S. Muller - Sun Physical Coding and Framing Requirements

■ Leverage from 1000BASE-X to the extent possible

■ Enhance the 1000BASE-X PCS to deal with following constraints:

■ 1. Preamble Shrinkage ■ The minimum Preamble of a “mini-frame” may be reduced to one byte (7/4) ■ 1000BASE-X encapsulation can extend the IPG by one symbol at the expense of the Preamble ■ 1000BASE-X encapsulation requires at least one symbol of Preamble for SPD ■ Solution: ■ Increase the Preamble field of the 10-Gigabit by one byte (total of eight) ■ 2. Inter-Packet Gap Shrinkage ■ The minimum IPG between “mini-frames” may be reduced to three bytes (12/4) ■ 1000BASE-X encapsulation allows for an EPD of a maximum of three symbols ■ This can potentially eliminate the Idle signalling between frames ■ Solution: ■ Define a new EPD of no more than two symbols 15 IEEE 802.3

HSSG S. Muller - Sun Physical Coding and Framing Requirements (cont.)

■ 3. Packet Sequencing Signalling ■ Allows for channel synchronization in the Collector ■ Enhances error robustness ■ To avoid additional overhead, packet sequencing is enforced during the IPG ■ Solution: ■ Define multiple flavors of the IDLE ordered set ■ 4. Packet and Frame Delimiters ■ “Mini-frame”-to-Packet reassembly in the Collector requires multiple flavors of start and end dilimiters ■ Solution: ■ Define an SPD code-group that indicates the start of a “mini-frame” on one of the chan- nels AND the start of a MAC packet for the entire aggregate ■ Define an SFD code-group that indicates the start of a “mini-frame” ONLY on the remain- ing three channels ■ Define an EPD code-group that indicates the end of a “mini-frame” on one of the chan- nels AND the end of a MAC packet for the entire aggregate ■ Define an EFD code-group that indicates the end of a “mini-frame” ONLY on the remain- ing three channels

16 IEEE 802.3

HSSG S. Muller - Sun Media Independent Interfaces

■ Standard media independent interfaces are a good thing to have!

■ Provide interoperable points of attachment between components from multiple vendors ■ Allow for clean architectural partitioning between functional modules ■ Simplify the standard’s specification

■ Multiple MII compliance interfaces may be needed for the 10-Giga- bit Ethernet Standard

■ Allows for various levels of silicon integration in implementations

■ All the interfaces are based on the same concepts that were used for Gigabit Ethernet

■ Utilize the full duplex subset of the GMII and the TBI ■ Trade-off between the absolute minimum number of signals used, with- out substantially increasing the baud rate ■ All the interfaces are defined such that they can be overlaid one on top of the other 17 IEEE 802.3

HSSG S. Muller - Sun Media Independent Interfaces (continued)

■ 10-GMII

■ 156.25MHz clocks ■ 25% increase compared to GMII ■ Both edges of the clocks used for data transfer ■ 32-bit data bus in each direction (TXD/RXD) ■ Increased from 8 on the GMII ■ 2-bit VLD bus in each direction ■ Indicates the number of valid bytes on TXD/RXD ■ Full Duplex operation only ■ GMII COL and CRS signals removed ■ GMII Carrier Extension encodings on TXD/RXD removed

18 IEEE 802.3

HSSG S. Muller - Sun Media Independent Interfaces (continued)

■ 2-GMII

■ 156.25MHz clocks ■ Both edges of the clocks used for data transfer ■ 8-bit data bus in each direction (TXD/RXD) ■ Same as GMII ■ 1-bit Packet Delimiter (PD) signal in each direction ■ Indicates the first and the last bytes of a MAC packet ■ Full Duplex operation only ■ GMII COL and CRS signals removed ■ GMII Carrier Extension encodings on TXD/RXD removed

■ 2-TBI

■ 156.25MHz clocks ■ Both edges of the clocks used for data transfer ■ 10-bit data bus in each direction (TX/RX) 19 IEEE 802.3

HSSG S. Muller - Sun Media Independent Interfaces (continued)

CLK

TX_EN/ RX_DV

TX_ER/ RX_ER

VLD[1:0] 00010 3 0 2 0 2 0

PA1 PA5 D1 D57 CRC1 IPG1 IPG5 IPG9 PA1 PA5 D1 D57 D61 CRC4 IPG4 IPG8 IPG12 PA4 SFD D56 D60 CRC4 IPG3 IPG7 IPG11 PA3 PA7 D3 PA2 PA6 D2 D58 CRC2 IPG2 IPG6 IPG10 PA2 PA6 D2 D58 CRC1 IPG1 IPG5 IPG9 PA1 PA5 D1 D57 D61 CRC4 IPG4 IPG8 IPG12 PA4 SFD D4 D[31:0] PA3 PA7 D3 D59 CRC3 IPG3 IPG7 IPG11 PA3 PA7 D3 D59 CRC2 IPG2 IPG6 IPG10 PA2 PA6 D2 D58 CRC1 IPG1 IPG5 IPG9 PA1 PA5 D1 D5 PA4 SFD D4 D60 CRC4 IPG4 IPG8 IPG12 PA4 SFD D4 D60 CRC3 IPG3 IPG7 IPG11 PA3 PA7 D3 D59 CRC2 IPG2 IPG6 IPG10 PA2 PA6 D2 D6

Notes: - When TX_EN/RX_DV is asserted, an encoding of VLD=0 indicates four valid bytes on D[31:0]. - A non-zero encoding of VLD is only relevant when TX_EN/RX_DV is asserted.

10-GMII

20 IEEE 802.3

HSSG S. Muller - Sun Media Independent Interfaces (continued)

CLK

TX_EN/ RX_DV

TX_ER/ RX_ER

PD

D[7:0] Idle1 PA1 PA5 D1 D57 CRC1 Idle2 Idle2 Idle2 PA1 PA5 D1 D57 D61 CRC4 Idle3 Idle3 Idle3 PA4 SFD D56 D60 CRC3 Idle4 Idle4 Idle4 PA3 PA7 D3

2-TBI 8B/10B IDLE1 SPD D DD D D EFD IDLE2 SPD DDD DDD EPD IDLE3 SFD DD D D D EFD IDLE4 SFD D D Code 2-GMII/2-TBI --- Channel 1

CLK

TX_EN/ RX_DV

TX_ER/ RX_ER

PD

D[7:0] Idle1 PA2 PA6 D2 D58 CRC2 Idle2 Idle2 Idle2 PA2 PA6 D2 D58 CRC1 Idle3 Idle3 Idle3 PA1 PA5 D1 D57 D61 CRC4 Idle4 Idle4 Idle4 PA4 SFD D4

2-TBI 8B/10B IDLE1 SFD DDD DD EFD IDLE2 SFD DDD DDEFD IDLE3 SPD D DDD D D EPD IDLE4 SFD D D Code 2-GMII/2-TBI --- Channel 2

21 IEEE 802.3

HSSG S. Muller - Sun Media Independent Interfaces (continued)

CLK

TX_EN/ RX_DV

TX_ER/ RX_ER

PD

D[7:0] Idle1 PA3 PA7 D3 D59 CRC3 Idle2 Idle2 Idle2 PA3 PA7 D3 D59 CRC2 Idle3 Idle3 Idle3 PA2 PA6 D2 D58 CRC1 Idle4 Idle4 Idle4 PA1 PA5 D1 D5

2-TBI 8B/10B IDLE1 SFD D DD D D EFD IDLE2 SFD DDD DDEFD IDLE3 SFD D DDD D EFD IDLE4 SPD D D D Code 2-GMII/2-TBI --- Channel 3

CLK

TX_EN/ RX_DV

TX_ER/ RX_ER

PD

D[7:0] Idle1 PA4 SFD D4 D60 CRC4 Idle2 Idle2 Idle2 PA4 SFD D4 D60 CRC3 Idle3 Idle3 Idle3 PA3 PA7 D3 D59 CRC2 Idle4 Idle4 Idle4 PA2 PA6 D2 D6

2-TBI 8B/10B IDLE1 SFD D DD D D EPD IDLE2 SFD DDD DDEFD IDLE3 SFD DDDDD D EFD IDLE4 SFD D D Code 2-GMII/2-TBI --- Channel 4

22 IEEE 802.3

HSSG S. Muller - Sun Media Independent Interfaces (continued)

PA1 PA5 D1 D57 CRC1 IPG1 IPG5 IPG9 PA1 PA5 D1 D57 D61 CRC4 IPG4 IPG8 IPG12 PA4 SFD D56 D60 CRC4 IPG3 IPG7 IPG11 PA3 PA7 D3 PA2 PA6 D2 D58 CRC2 IPG2 IPG6 IPG10 PA2 PA6 D2 D58 CRC1 IPG1 IPG5 IPG9 PA1 PA5 D1 D57 D61 CRC4 IPG4 IPG8 IPG12 PA4 SFD D4 PA3 PA7 D3 D59 CRC3 IPG3 IPG7 IPG11 PA3 PA7 D3 D59 CRC2 IPG2 IPG6 IPG10 PA2 PA6 D2 D58 CRC1 IPG1 IPG5 IPG9 PA1 PA5 D1 D5 PA4 SFD D4 D60 CRC4 IPG4 IPG8 IPG12 PA4 SFD D4 D60 CRC3 IPG3 IPG7 IPG11 PA3 PA7 D3 D59 CRC2 IPG2 IPG6 IPG10 PA2 PA6 D2 D6 10-GMII

Idle1 PA1 PA5 D1 D57 CRC1 Idle2 Idle2 Idle2 PA1 PA5 D1 D57 D61 CRC4 Idle3 Idle3 Idle3 PA4 SFD D56 D60 CRC3 Idle4 Idle4 Idle4 PA3 PA7 D3

Idle1 PA2 PA6 D2 D58 CRC2 Idle2 Idle2 Idle2 PA2 PA6 D2 D58 CRC1 Idle3 Idle3 Idle3 PA1 PA5 D1 D57 D61 CRC4 Idle4 Idle4 Idle4 PA4 SFD D4

Idle1 PA3 PA7 D3 D59 CRC3 Idle2 Idle2 Idle2 PA3 PA7 D3 D59 CRC2 Idle3 Idle3 Idle3 PA2 PA6 D2 D58 CRC1 Idle4 Idle4 Idle4 PA1 PA5 D1 D5

Idle1 PA4 SFD D4 D60 CRC4 Idle2 Idle2 Idle2 PA4 SFD D4 D60 CRC3 Idle3 Idle3 Idle3 PA3 PA7 D3 D59 CRC2 Idle4 Idle4 Idle4 PA2 PA6 D2 D6 2-GMII --- Channels 1-4

IDLE1 SPD D DD D D EFD IDLE2 SPD DDD DDD EPD IDLE3 SFD DD D D D EFD IDLE4 SFD D D

IDLE1 SFD DDD DD EFD IDLE2 SFD DDD DDEFD IDLE3 SPD D DDD D D EPD IDLE4 SFD D D

IDLE1 SFD D DD D D EFD IDLE2 SFD DDD DDEFD IDLE3 SFD D DDD D EFD IDLE4 SPD D D D

IDLE1 SFD D DD D D EPD IDLE2 SFD DDD DDEFD IDLE3 SFD DDDDD D EFD IDLE4 SFD D D

2-TBI --- Channels 1-4

23 IEEE 802.3

HSSG S. Muller - Sun Error Handling

■ Data Corruption Errors

■ Data symbol converted to another data symbol ■ Detected and handled by the MAC Layer, analogous to a single-channel trans- mission scheme

■ Per Channel Errors

■ Code violations, Framing errors, Disparity errors, etc. ■ Detected by the PCS in each channel ■ Propagated to the Collector over the 2-GMII using the RX_ER handshake ■ Propagated to the MAC over the 10-GMII using the RX_ER handshake ■ Affect the entire packet in progress

■ Distribution/Collection Errors

■ Length mismatch between “mini-frames” that belong to the same pack- et greater than 1 byte ■ Detected by the Collector ■ Propagated to the MAC over the 10-GMII using the RX_ER handshake 24 IEEE 802.3

HSSG S. Muller - Sun Error Handling (continued)

■ Channel Skew Errors

■ Overflow of at least one per channel elasticity fifo in the Collector ■ Detected by the Collector ■ The entire packet is dropped in the Collector

■ Channel Synchronization Errors

■ At least one channel is out of synch ■ The channel received a “mini-frame” with a sequence number that does not match the expected one ■ The Collector drops packets until synchronization is reestablished ■ Collector cannot reestablish synchronization ■ Timer based ■ Collector reestablished synchronization incorrectly ■ Can happen if multiple “mini-frames” on a single channel “vanished” ■ Results in a very high rate of CRC errors, with no other errors apparent ■ Flush the pipe (link initialization or 802.3x)

25 IEEE 802.3

HSSG S. Muller - Sun Summary

■ “One solution that fits all” for the 10-Gigabit Ethernet Standard will not appropriately address the customer needs

■ The standards effort should be prioritized as follows: ■ Phase 1: ■ Overall architecture and structure of the standard ■ Changes to the MAC Layer ■ Media Independent Interfaces ■ Channelized transmission scheme ■ PCS definition based on the 8B/10B coding scheme ■ Physical Layer for ~100m over MMF bundles using SX lasers ■ Physical Layer for ~300m over MMF using LX lasers and WDM ■ Phase 2: ■ Serialized transmission scheme ■ New PCS definition based on a high efficiency encoding method ■ Physical Layer for ~3km over SMF using LX lasers ■ Physical Layer for ~50km over SMF using LX lasers 26 IEEE 802.3

HSSG S. Muller - Sun