RapidIO, PCI Express and Gigabit Ethernet Comparison
Pros and Cons of Using These Interconnects in Embedded Systems

The Embedded Fabric Choice

© Copyright 2005 RapidIO® Trade Association. Revision 03, May 2005.

Agenda

• Interconnect Requirements & Market Focus
• Architecture and Protocol
• System Level Considerations
• Conclusions

Interconnect Issues in Embedded Systems

– Desire for higher performance
  • Interconnects often bottlenecks
– Lower cost (non-recurring costs, capital expense, operating costs)
  • Standards-based development
– Modularity & reuse
  • Standard interfaces promote reuse across platforms and over time
– Common components
  • Standards reduce components and complexity
– Distributed processing
  • Interconnects key to performance
– More communication standards and interworking
  • Alphabet soup, and getting worse!
– System-wide interconnects
  • Backplane and line card (chip-to-chip)

Interconnect Trends

(Diagram: four interconnect generations, ordered by increasing performance.)

Shared Bus
– Single segment
– Broadcast
– PHY: Single-ended
– Highest pin count
– Example: VME ≤ 66 MHz

Hierarchical
– Bridged hierarchy
– Broadcast
– PHY: Single-ended
– Example: PCI / PCI-X ≤ 133 MHz

1st Generation Point-to-Point
– Packet switched
– PHY: Source-synchronous differential
– Lower pin count
– Example: HT / P-RIO ≤ 3 GHz

2nd Generation Point-to-Point (switch fabric)
– Packet switched
– PHY: SERDES differential
– Lowest pin count
– Example: PCI Express / S-RIO ≤ 10 GHz

Interconnect Roles

• Interconnect roles
  – Chip-to-chip
  – Board-to-board (backplane)
  – Chassis-to-chassis


Market Focus: Gigabit Ethernet

• Successor to 100 Mbps Ethernet
  – 10GE in the wings
• First revision standard completed in 1998
  – Copper in 1999
  – Various MAC-to-PHY standards defined
• Initial application in LAN aggregation
  – High-performance switches, routers and servers in LAN backbones
  – Later to workstations, PCs and laptops
• Positioning
  – Well known
  – "Safe choice"

(Diagram: 10/100 switch/router aggregating onto Gigabit links.)

Market Focus: PCI Express

• Successor to PCI 2.3 / PCI-X
  – Driven by Intel and the PCI-SIG
  – Fully SW/firmware backward compatible with PCI
• First revision standard completed in 2002
  – Rollout this year in PC infrastructure
    • x86 chipsets, video cards, NICs
  – x1, x4, x8 and x16 most common
• Initial application in PCs and servers
  – Follow-on to AGP 8x for 3D graphics HW
  – GigE, 10GE NICs
  – Storage (RAID, FC), PCI bridges
• Positioning
  – Interconnect for the PC and server space
  – Embedded if suitable

(Diagram: PC platform with CPU and DRAM on a north bridge carrying an x16 link to the GPU; the south bridge carries USB, SATA, local bus, an x1 GigE link and an x1 link to a PCI Express switch.)

Market Focus: RapidIO

• Initially a processor interconnect
• First revision standard completed in 1999
  – Rollout in processors, bridges and switches
  – Parallel 8-bit RapidIO @ 500 MHz applied clock
• Initial application in embedded, networking and telecom systems
  – Compute, defense, networking & telecom line cards
  – CPU I/O, line-card aggregation, backplane
  – Serial PHY allowed expansion to the DSP data plane
    • Flow control, encapsulation, data streams
• Positioning
  – Best solution for embedded systems
  – Suitable for both control and data plane

(Diagram: host card and line cards joined by a control plane fabric and a data path fabric; traffic CPUs, aggregation ASICs, NPUs, CPUs and DSP farms with memory and PHYs uplink to Ethernet/IP, ATM/SONET and InfiniBand/Fibre Channel.)

Gigabit Ethernet Overview

• WAN-scale interconnect
  – Box-to-box, board-to-board, chip-to-chip, backplane
  – Connects the world's computers
  – Physical layer defined for LAN-scale interconnection
    • Closet to computer
    • 100+ m distance
• Extensibility
  – Layered OSI architecture
• Point-to-point packetized architecture
  – Variable packet size
  – High header overhead
  – 46-1500 byte L2 PDU
  – Up to 9000 byte jumbo frames

(Diagram: GigE endpoint with CPU, DRAM and MAC/PHY attached to one port of a four-port switch/router connecting further endpoints.)

Ethernet Protocol: Layer 2

Layer 2 frame layout (shown for a 256-byte PDU):

Preamble + SFD        8 bytes
Destination Address   6 bytes
Source Address        6 bytes
Type/Length           2 bytes
Packet PDU            256 bytes
FCS                   4 bytes
Inter-Frame Gap       12 bytes

Layer 2 packet type: 1500-byte max packet PDU. Total = 294 bytes (256-byte PDU).
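The framing arithmetic above is simple enough to check mechanically. A minimal C sketch (field widths as in the diagram; names are illustrative):

```c
#include <assert.h>
#include <stdio.h>

/* Layer 2 framing overhead, per the field sizes shown above. */
enum {
    ETH_PREAMBLE_SFD = 8,  /* 7-byte preamble + start-of-frame delimiter */
    ETH_DST_ADDR     = 6,
    ETH_SRC_ADDR     = 6,
    ETH_TYPE_LEN     = 2,
    ETH_FCS          = 4,
    ETH_IFG          = 12  /* inter-frame gap, counted as wire overhead */
};

/* Total wire bytes consumed by one L2 frame carrying `pdu` payload bytes. */
static unsigned eth_l2_wire_bytes(unsigned pdu)
{
    return ETH_PREAMBLE_SFD + ETH_DST_ADDR + ETH_SRC_ADDR + ETH_TYPE_LEN
         + pdu + ETH_FCS + ETH_IFG;
}

int main(void)
{
    assert(eth_l2_wire_bytes(256) == 294); /* matches the slide's total */
    printf("256-byte PDU -> %u wire bytes\n", eth_l2_wire_bytes(256));
    return 0;
}
```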

Ethernet Protocol: UDP w/Priority Tagging

Preamble/SFD (8 bytes) | L2 Header (14 bytes) | IP Header (20 bytes) | UDP Header (8 bytes) | User PDU (256 bytes) | FCS (4 bytes) | IFG (12 bytes)

UDP packet type: 1472-byte max user PDU. Total = 322 bytes (256-byte user PDU).

Ethernet Protocol: TCP/IP

Preamble/SFD (8 bytes) | L2 Header (14 bytes) | IP Header (20 bytes) | TCP Header (20 bytes) | User PDU (256 bytes) | FCS (4 bytes) | IFG (12 bytes)

TCP/IP packet type: 1460-byte max user PDU. Total = 334 bytes (256-byte user PDU).
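The UDP and TCP totals follow the same pattern: 38 bytes of L2 framing plus the IP header plus the transport header. A small sketch, with illustrative names:

```c
#include <stdio.h>

/* Wire bytes for one frame: 38 bytes of L2 framing (preamble/SFD + L2
 * header + FCS + IFG) plus IP header plus the given transport header. */
static unsigned wire_bytes(unsigned pdu, unsigned l4_hdr)
{
    const unsigned l2_overhead = 8 + 14 + 4 + 12;
    const unsigned ip_hdr = 20;
    return l2_overhead + ip_hdr + l4_hdr + pdu;
}

int main(void)
{
    printf("UDP: %u bytes\n", wire_bytes(256, 8));  /* 322, as on the slide */
    printf("TCP: %u bytes\n", wire_bytes(256, 20)); /* 334, as on the slide */
    return 0;
}
```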

PCI Express Overview

• Chassis-scale interconnect
  – Chip-to-chip, board-to-board via connector or cabling
  – Required legacy PCI compatibility
  – Physical layer defined for board + connector
    • ~40-50 cm + 2 connectors
• Extensibility
  – Layered architecture
• Point-to-point packetized architecture
  – Relatively low overhead
  – Variable size packets
  – 128-4096 byte PDU

(Diagram: CPU behind a host/root complex with ports 0-2; port 1 feeds the upstream port of a switch whose internal P2P bridges fan out through downstream ports to endpoints.)

PCI Express Protocol

Memory Write TLP layout (shown for a 256-byte PDU):

– TLP Sequence Number (2 bytes, plus reserved bits)
– Header (3 DW): FMT, Type, TC, TD/EP, Attr, Length, Requester ID, Tag, Last/First DW byte enables, Address[31:2]
– Packet PDU (256 bytes)
– Optional TLP Digest (ECRC, 4 bytes)
– LCRC (4 bytes), followed by the next packet/DLLP

Memory Write: 4096-byte max packet PDU. Total = 278 bytes (256-byte PDU).
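The 278-byte total is reproduced by counting the 2-byte sequence number, a 3-DW header, the optional ECRC and the LCRC; the STP/END framing symbols are evidently not counted. A hedged sketch of that accounting:

```c
#include <stdio.h>

/* Memory Write TLP size per the slide's accounting (no framing symbols). */
static unsigned pcie_mwr_bytes(unsigned pdu, int with_ecrc)
{
    const unsigned seq_num = 2;  /* TLP sequence number + reserved bits */
    const unsigned header  = 12; /* 3-DW MWr header: fmt/type/TC/length/
                                    requester ID/tag/BEs/address[31:2] */
    const unsigned lcrc    = 4;
    return seq_num + header + pdu + (with_ecrc ? 4u : 0u) + lcrc;
}

int main(void)
{
    printf("%u bytes\n", pcie_mwr_bytes(256, 1)); /* 278, as on the slide */
    return 0;
}
```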

RapidIO Overview

• Chassis-scale interconnect
  – Chip-to-chip, board-to-board via connector or cabling
  – Physical layer defined for backplane interconnection
    • ~80-100 cm + 2 connectors (serial)
• Extensibility
  – Layered architecture
• Point-to-point packetized architecture
  – Low overhead
  – Variable packet size
  – Maximum 256-byte PDU
  – SAR support for 4 KB messages

(Diagram: host endpoint on port 1 of a four-port switch connecting further endpoints.)

RapidIO Packet Format: SWRITE

SWRITE packet layout (shown for a 256-byte PDU):

– AckID, Prio, tt, FTYPE (0110), Target ID, Source ID (4 bytes)
– Address, 0, XAdd (4 bytes)
– Packet PDU (256 bytes), with an early CRC inserted after the first 80 bytes
– Final CRC

SWRITE packet type: 256-byte max packet PDU. Total = 266 bytes (256-byte PDU).

RapidIO Packet Format: Message

Message packet layout (shown for a 256-byte PDU):

– AckID, Prio, tt, FTYPE (1011), Target ID, Source ID (4 bytes)
– Msg length, Source size, Letter, Mbox, Msg seg, followed by the packet PDU
– Packet PDU (256 bytes), with an early CRC inserted after the first 80 bytes
– Final CRC

Message packet type: 256-byte max packet PDU, 4K with SAR. Total = 268 bytes (256-byte PDU with 2-byte pad).
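Taken together, the two totals imply roughly 10 bytes of header-plus-CRC overhead on a 256-byte PDU, with the message adding a 2-byte pad. A sketch that simply encodes the slide totals rather than modeling each field:

```c
#include <stdio.h>

enum { SRIO_OVERHEAD = 10 }; /* header + CRC bytes implied by the slides */

/* Not field-accurate models; these just reproduce the stated totals. */
static unsigned srio_swrite_bytes(unsigned pdu)  { return pdu + SRIO_OVERHEAD; }
static unsigned srio_message_bytes(unsigned pdu) { return pdu + 2 + SRIO_OVERHEAD; }

int main(void)
{
    printf("SWRITE:  %u bytes\n", srio_swrite_bytes(256));  /* 266 */
    printf("Message: %u bytes\n", srio_message_bytes(256)); /* 268 */
    return 0;
}
```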

Logical Layer Comparison

Feature                  | GigE                          | PCI Express                    | RapidIO
Memory-mapped R/W        | No                            | Read/Write, Configuration      | Read/Write, Atomics, Configuration
Write w/Response support | No                            | No                             | Yes
Address size             | N/A                           | 32, 64 bits                    | 34, 50, 66 bits
Globally shared memory   | No                            | No                             | Yes
Messaging support        | No                            | Interrupts and event signaling | Up to 4K messages with HW SAR support, doorbells
Datagram support         | Up to 1500-byte user payloads | No                             | HW SAR up to 64 KB user payloads

Transport Layer Comparison

Feature                     | GigE                                 | PCI Express           | RapidIO
Topologies                  | Any                                  | Tree                  | Any
Number of endpoints         | 2^48 (L2), 2^32 (IPv4), 2^128 (IPv6) | Large (address-based) | 2^8 (small), 2^16 (large)
Fields switches must modify | L2: none; IP: TTL, MAC, FCS          | TLP Seq Num, LCRC     | AckID
Multicast                   | Yes                                  | Yes                   | Yes (message only)
Delivery                    | L2: best effort; IP: guaranteed      | Guaranteed            | Guaranteed, best effort

Physical Layer Comparison

Feature            | GigE               | PCI Express                   | Parallel RapidIO         | Serial RapidIO
Tx/Rx signal pairs | 4x†                | 1x, 2x, 4x, 8x, 12x, 16x, 32x | 10 bits, 19 bits         | 1x, 4x, 8x*, 16x
Channel            | 100 m Cat5         | ~40-50 cm + 2 connectors      | ~50-80 cm + 2 connectors | ~80-100 cm + 2 connectors
Data rate          | 10, 100, 1000 Mbps | 2.5 GBaud                     | 500-2000 MHz             | 1.25, 2.5, 3.125 GBaud
Signaling          | 4D-PAM5            | Proprietary MLS, AC coupled   | LVDS                     | XAUI, AC coupled
Clocking           | Embedded           | Embedded                      | Clock + data             | Embedded
Latency            | Highest            | Higher                        | Lowest                   | Next lowest

† Tx and Rx are on the same wire pairs

Protocol Efficiency

(Chart: protocol efficiency, 0-100%, vs. PDU size from 1 to 10,000 bytes, for RapidIO, PCI Express, GigE L2, GigE UDP and GigE TCP; efficiency rises with PDU size.)

NOTE: SWRITE for RapidIO, MWr for PCI Express. Includes header and ACK overhead.
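The shape of these curves follows from the packet formats alone: efficiency ≈ PDU / (PDU + per-packet overhead), with transfers above a protocol's maximum payload segmented into multiple packets. A sketch using the overhead constants from the earlier slides (the chart also counts ACK overhead, so its plotted values sit slightly lower):

```c
#include <stdio.h>

struct proto { const char *name; unsigned overhead, max_pdu; };

/* Header efficiency for a transfer of `pdu` bytes, segmenting transfers
 * larger than the protocol's maximum payload. ACK traffic is ignored. */
static double efficiency(const struct proto *p, unsigned pdu)
{
    unsigned segs = (pdu + p->max_pdu - 1) / p->max_pdu;
    return (double)pdu / (pdu + segs * p->overhead);
}

int main(void)
{
    const struct proto protos[] = {
        { "RapidIO SWRITE", 10, 256  },
        { "PCIe MWr",       22, 4096 }, /* counting ECRC, per earlier slide */
        { "GigE L2",        38, 1500 },
        { "GigE UDP",       66, 1472 },
        { "GigE TCP",       78, 1460 },
    };
    const unsigned sizes[] = { 1, 10, 100, 1000, 10000 };
    for (int i = 0; i < 5; i++) {
        printf("%-15s", protos[i].name);
        for (int j = 0; j < 5; j++)
            printf(" %6.1f%%", 100.0 * efficiency(&protos[i], sizes[j]));
        printf("\n");
    }
    return 0;
}
```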

Effective Bandwidth

(Chart: effective bandwidth, Gbps, vs. PDU size from 1 to 10,000 bytes. SRIO 4x 3.125G plateaus at 9.4 Gbps, PCI Express x4 at about 8 Gbps, and GigE UDP at about 1 Gbps; all three fall off sharply below ~100-byte PDUs.)
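A back-of-envelope check: effective bandwidth ≈ post-coding data rate × protocol efficiency. Assuming 8b/10b line coding for SRIO and PCI Express, the plateaus come out close to the plotted values; the chart's slightly lower SRIO number presumably also counts ACK/control-symbol overhead:

```c
#include <stdio.h>

int main(void)
{
    /* data rates in Gbps after line coding */
    double srio_4x = 4 * 3.125 * 0.8; /* 10.0 Gbps after 8b/10b */
    double pcie_x4 = 4 * 2.5   * 0.8; /*  8.0 Gbps after 8b/10b */
    double gige    = 1.0;             /*  1.0 Gbps on the wire  */

    /* efficiency at a large transfer, from the previous slide's constants */
    printf("SRIO 4x:  ~%.1f Gbps\n", srio_4x * (256.0  / 266.0));  /* ~9.6 */
    printf("PCIe x4:  ~%.1f Gbps\n", pcie_x4 * (4096.0 / 4118.0)); /* ~8.0 */
    printf("GigE UDP: ~%.1f Gbps\n", gige    * (1472.0 / 1538.0)); /* ~1.0 */
    return 0;
}
```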

Quality-of-Service (QoS) Dependencies

• QoS depends on proper hooks across the interconnect fabric
  – Hierarchical flow control
    • Addresses short-, medium- and long-term congestion events
    • Link and end-to-end
  – Ability to define many streams of traffic
    • Often defined as a logical sequence of transactions between two endpoints
  – Ability to differentiate classes of traffic among streams
  – Ability to reserve and allocate bandwidth to streams and classes

(Diagram: overall interconnect traffic divided into streams and classes.)

QoS Comparison: Gigabit Ethernet

• No universal QoS standard
• Some Layer 2+ switches support priority tagging (802.1d/q)
  – Eight classes
• Increasing number of routers support MPLS at L3

TCI tag field (2 bytes): PRIO (3 bits), CFI (1 bit), VID (12 bits)

Preamble/SFD (8 bytes) | L2 Header w/tag (14 + 2 bytes) | IP Header (20 bytes) | UDP Header (8 bytes) | User PDU (256 bytes) | FCS (4 bytes) | IFG (12 bytes)

UDP packet type: 1472-byte user PDU. Total = 324 bytes (256-byte user PDU).

QoS Comparison: PCI Express

• 8 Traffic Classes (TC)
  – No ordering between TCs
• 8 Virtual Channels (VC)
  – Separate buffer resources per VC
  – TCs are mapped onto VCs
    • TC-to-VC mapping per port
  – No VC field in the TLP
• Flexible arbitration
  – Arbitrary, RR, WRR
• Most implementations support only a single TC/VC
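A minimal sketch of the per-port TC-to-VC mapping described above, including the single-TC/VC degenerate case the last bullet notes (types and names are illustrative, not from the spec):

```c
#include <stdint.h>

struct pcie_port {
    uint8_t num_vcs;     /* implemented virtual channels (1..8) */
    uint8_t tc_to_vc[8]; /* per-port mapping: traffic class -> VC */
};

/* Common case per the slide: one VC implemented, so all eight TCs
 * share VC0 and its buffer resources. */
static void map_all_tcs_to_vc0(struct pcie_port *p)
{
    p->num_vcs = 1;
    for (int tc = 0; tc < 8; tc++)
        p->tc_to_vc[tc] = 0;
}
```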

QoS Comparison: RapidIO

• All implementations must support 3 prioritized flows
  – No ordering between flows
  – Allows a shared buffer pool across flows
• Switches required to provide some improved service
  – Extent of improvement is implementation dependent
• Supports carrier-grade QoS
  – Support for 1000s of flows, hundreds of traffic classes
  – End-to-end traffic management

(Diagram: flows 0 and 1 from endpoints W and X crossing a switch to endpoints Y and Z.)
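A hedged sketch of the minimum requirement: three prioritized flows served strict-priority, with no ordering enforced between flows (queue structures are illustrative, not from the spec):

```c
#include <stddef.h>

struct packet { struct packet *next; }; /* packet from a shared buffer pool */
struct queue  { struct packet *head; };

/* flows[2] is highest priority; the highest non-empty queue is served
 * first, and nothing orders packets across different flows. */
static struct packet *next_to_send(struct queue flows[3])
{
    for (int prio = 2; prio >= 0; prio--) {
        if (flows[prio].head) {
            struct packet *p = flows[prio].head;
            flows[prio].head = p->next;
            return p;
        }
    }
    return NULL;
}
```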

Flow Control Comparison

Gigabit Ethernet
• Link-to-link flow control
  – PAUSE frames
• L3+ end-to-end flow control
  – ECN, TCP windowing, others

PCI Express
• Link-to-link flow control

RapidIO
• Link-to-link flow control
• Congestion control
  – XON, XOFF
• Fine-grained end-to-end flow control
  – Data Streaming logical layer, end-to-end traffic management

(Diagram: link-level flow control between adjacent devices, XON/XOFF congestion control acting across switches, and end-to-end traffic management between endpoints spanning line cards and backplane.)
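XON/XOFF-style congestion control is typically watermark-driven. A sketch, with illustrative thresholds:

```c
#include <stdbool.h>

struct rx_buffer {
    unsigned fill, capacity;
    bool xoff_sent;
};

/* Returns +1 to emit XOFF, -1 to emit XON, 0 for no control traffic.
 * Watermarks (3/4 and 1/4 full) are illustrative choices. */
static int flow_control_action(struct rx_buffer *b)
{
    unsigned high = b->capacity * 3 / 4, low = b->capacity / 4;
    if (!b->xoff_sent && b->fill >= high) { b->xoff_sent = true;  return +1; }
    if (b->xoff_sent  && b->fill <= low)  { b->xoff_sent = false; return -1; }
    return 0;
}
```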

GigE Usage Models

• Control plane
  – Sockets and TCP/IP
    • Protocol encapsulation
    • SAR support
    • Guaranteed delivery
  – Custom SW stack?
• Data plane
  – Custom UDP-like stack?
  – UDP
    • Flow multiplexing via port number
    • Address-less datagrams
    • No SAR support
    • Multicast/broadcast support
    • Best effort
  – MAC layer with L2 switching
    • Address-less datagrams
    • No SAR support
    • Best effort

(Diagram: three stacks: client over TCP/IP, client over UDP with L3+ SAR/error recovery, and client directly over the MAC; the MAC is hardware, everything above it is software.)
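The UDP data-plane model maps directly onto the POSIX socket API; everything listed above as missing (SAR, retries, delivery guarantees) is left to the application. A minimal sender:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one datagram; flows are multiplexed purely by destination port. */
int send_datagram(const char *dst_ip, uint16_t port,
                  const void *pdu, size_t len)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_in dst = { .sin_family = AF_INET,
                               .sin_port   = htons(port) };
    if (inet_pton(AF_INET, dst_ip, &dst.sin_addr) != 1) { close(fd); return -1; }

    /* one sendto() == one datagram: no stream, no SAR, no guarantee */
    ssize_t n = sendto(fd, pdu, len, 0, (struct sockaddr *)&dst, sizeof dst);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}
```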

PCI Express Usage Models

• Control plane
  – CPU load/store & DMA
  – Interrupts and event signaling
• Data plane
  – Address-based reads & writes
    • DMA completion requires a notification transaction
      – No Write w/Response
    • Hardware-based error recovery and ACK (lossless)

(Diagram: client application and DMA in software above a logical-interface / transaction / PHY stack in hardware, issuing MWr transactions.)

RapidIO Usage Models

• Control plane
  – Address-based reads, writes and messaging
  – CPU load/store & DMA
• Data plane
  – Address-based reads & writes
    • DMA completion requires a notification transaction
    • Hardware-based error recovery and ACK (lossless)
  – Messaging
    • Address-less datagrams
    • Hardware-based SAR
    • 4 KB max user PDU size
    • Hardware-based error recovery and ACK (lossless)
  – RapidFabric
    • Address-less datagrams
    • Hardware-based SAR
    • Up to 64 KB user PDU size
    • Lossy

(Diagram: client application and DMA in software above a logical-interface / transaction / PHY stack in hardware, issuing SWRITE and MSG transactions.)

Performance: Gigabit Ethernet

• Microsecond-plus fall-through latencies (~100 µs?)
  – Not just the hardware: data has to traverse the SW stack
• High CPU overhead
  – A rule of thumb appears to be borne out in data for TCP/IP SW overhead: 1 Hz of CPU per bit per second of throughput (per direction)
  – Wire speed achievable with GHz-class processors
  – Some CPU will be left, but how much depends on:
    • The protocol being terminated
    • Offload features of the GigE interfaces
    • Too often, advanced TOE features cannot be leveraged
• OS & SW stack support issues
  – Use of UDP or MAC/Layer 2 often involves proprietary protocols
    • Can defeat the value of an off-the-shelf "standards-based" solution
• Error correction in endpoint stacks introduces latency jitter and determinism issues
• Lack of end-to-end flow control is problematic for non-traffic-managed systems that can't significantly overprovision
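The "1 Hz per bit per second" rule of thumb, as arithmetic (a rough sizing heuristic from the slide, not a measurement):

```c
#include <stdio.h>

int main(void)
{
    double throughput_bps = 1e9;            /* wire-speed GigE, one way */
    double cpu_hz_needed  = throughput_bps; /* ~1 Hz of CPU per bit/s   */
    printf("~%.1f GHz of CPU per direction\n", cpu_hz_needed / 1e9);
    return 0;
}
```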

Performance: PCI Express & RapidIO

• Latency
  – Sub-microsecond switch latencies
    • PCI Express switches must do complicated address comparisons
  – End-to-end latency
    • Lower than GigE, since the path does not include a SW stack
• Architecture
  – PCI Express switches allow limited but not complete peer-to-peer communication
    • Multiple hosting for redundancy is problematic
      – Maintenance transactions cannot move upstream or peer-to-peer
      – Proposed switches use non-transparent bridges as a workaround (i.e., create two separate spaces for each host)
    • PCI Express systems with multiple hosts must use switches with non-transparent bridges
      – Bridging is non-standard and implementation specific
  – RapidIO switches can be simple and orthogonal in architecture
    • Header architected to reduce logic
    • No need to recalculate CRC

Efficiency and Throughput

• Efficiency
  – RapidIO has a significant advantage in header efficiency
    • Especially true when using Layer 3+ to transport user PDUs smaller than 128 bytes
  – Control plane traffic is often bursty and small-packet oriented
    • Effective utilization of GigE is likely to be low on this basis alone
• Throughput
  – What percentage of raw interface bandwidth can be utilized?
    • Flow control mechanisms are key to increasing utilization
    • Well under 50% is typical for Ethernet to avoid packet loss
    • Only RapidIO has a full range of flow-control mechanisms
  – Throughput can be limited by bottlenecks unrelated to the interconnect
    • Smaller packet sizes mean increasing overhead per byte of data
      – More buffer descriptor fetches
      – More header overhead
      – More interrupts
    • Memory and bridging device bottlenecks
    • Extensive protocol termination requires CPU cycles

Some Economics

• RapidIO, PCI Express and Gigabit Ethernet with some TCP/IP offload have similar underlying silicon costs
  – PCI Express logic size is larger than RapidIO's
  – An aggressive TCP/IP offload engine is larger than PCI Express and RapidIO endpoints
  – The GigE copper PHY is very large (~20 mm² in 130 nm)
• Leveraging Ethernet volume economics is rarely a reality
  – L2+ GigE switches suitable for aggregation and backplanes are not high volume
    • 12-16 ports, QoS features and SERDES PHYs
  – Terminating TCP/IP demands significant processor overhead
    • Dedicate a processor, or reduce performance and/or application features
• RapidIO has the underlying economics to offer better performance at a price competitive with PCI Express and GigE

Pros & Cons

Gigabit Ethernet
• Pro
  – Well understood and hence low risk
• Con
  – High overhead
  – High latency and latency jitter (i.e., poor determinism)
  – Significant cost jump for bandwidth above 1 Gbps
  – No standard backplane SERDES PHY
  – No standard HW acceleration

PCI Express
• Pro
  – Long-term use in PC-related HW
  – Long-term role as legacy chip-to-chip interconnect
• Con
  – Unsuitable for connecting more than a few devices
  – Higher protocol overhead than RapidIO
  – Features not driven by embedded requirements
  – No data plane support

RapidIO Technology
• Pro
  – Strong system interconnect solution
  – Full QoS and flow control features
  – Arbitrary topologies supported
  – Control and data plane support
  – Lowest overhead with minimal silicon footprint
• Con
  – Not intended for PC & server space

Revision 03 © Copyright 2005 RapidIO® Trade Association Slide 35 ConclusionConclusion

• RapidIO technology will expand its existing role as a standard system fabric
  – Efficient protocol supporting both control and data plane
  – Variety of PHY speeds
  – Cost-competitive underlying economics
  – Available now

• Gigabit Ethernet will serve a limited role as a system interconnect
  – Low-performance settings where significant overprovisioning is possible

• PCI Express will remain largely within the PC and server space and have a limited role in the embedded space
  – Where there is an intersection with the PC & server space
  – In places where PCI is used today
  – Rarely as a fabric, due to its handling of large numbers of endpoints

© Copyright 2005 RapidIO® Trade Association. RapidIO and the RapidIO logo are registered trademarks and RapidFabric is a trademark of the RapidIO Trade Association.