A 24-port 10G Ethernet Switch
(with asynchronous circuitry)
Andrew Lines
1 Agenda
• Product Information
• Technical Details
• Photos
2 Tahoe: First FocalPoint Family Member
The lowest-latency feature-rich 10GE switch chip
• 10G Ethernet switch
  24 Ports
• Line rate performance
  240Gb/s bandwidth
  360M frames/s
  Full-speed multicast
• Fully-integrated single chip
  1MB frame memory
  16K MAC addresses
• Lowest latency Ethernet
  200ns with copper cables
• Rich Feature Set
  Extensive layer 2 features
• Flexible SERDES interfaces
  10G XAUI (CX4)
  1G SGMII
[Block diagram labels: Tahoe; SPI, CPU, JTAG, LED; Frame Processor (Scheduler); Nexus®; RapidArray™ (packet storage); XAUI (CX4). Legend: Asynchronous Blocks]
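The line-rate figures above can be cross-checked with a little arithmetic. The sketch below assumes, since the slide does not say so, that the frame rate is counted for minimum-size 64-byte frames plus the standard 8-byte preamble and 12-byte inter-frame gap; the function and constant names are illustrative only.

```python
# Cross-check of the quoted line-rate figures.
# Assumption (not stated on the slide): frames are minimum-size (64 bytes)
# and each one also occupies 8 bytes of preamble/SFD and 12 bytes of
# inter-frame gap on the wire.

PORTS = 24
PORT_RATE_BPS = 10e9          # 10 Gb/s per port

MIN_FRAME_BYTES = 64
PREAMBLE_BYTES = 8
IFG_BYTES = 12


def aggregate_bandwidth_bps(ports=PORTS, rate_bps=PORT_RATE_BPS):
    """Total bandwidth with every port running at line rate."""
    return ports * rate_bps


def max_frames_per_second(ports=PORTS, rate_bps=PORT_RATE_BPS):
    """Frame rate with every port sending back-to-back minimum-size frames."""
    bits_on_wire = (MIN_FRAME_BYTES + PREAMBLE_BYTES + IFG_BYTES) * 8
    return ports * rate_bps / bits_on_wire


if __name__ == "__main__":
    print(f"bandwidth: {aggregate_bandwidth_bps() / 1e9:.0f} Gb/s")
    print(f"frame rate: {max_frames_per_second() / 1e6:.1f} M frames/s")
```

Under these assumptions the aggregate is exactly 240Gb/s and the frame rate comes out just under 360M frames/s, which is consistent with the rounded figure on the slide.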
3 Tahoe Hardware Architecture
Modular architecture, centralized control
[Block diagram labels: SPI Interface, CPU Interface, JTAG Interface, LED Interface; Management; Frame Control; LCI; Lookup; Handler; Stats; Scheduler; RX Port Logic and TX Port Logic (each with SerDes, PCS, MAC); Switch Element Data Path; Nexus®; RapidArray™ (1MB Shared Memory)]
4 Tahoe Chip Plot
Fabricated in TSMC 0.13um
[Chip plot labels:]
Ethernet Port Logic: SerDes, PCS, MAC
RapidArray Memory: 1MB shared
Nexus Crossbars: 1.5Tb/s total, 3ns latency
Scheduler: highly optimized, high event rate
MAC Table: 16K addresses
Management: CPU interface, JTAG, EEPROM interface, LEDs
Frame Control: Frame handler, Lookup, Statistics
5 Bridge Features
Robust set of layer-2 features
• General Bridge Features
  16K MAC entries
  STP: multiple, rapid, standard
  Learning and Ageing
  Multicast GMRP and IGMPv3
• VLAN Tag (IEEE 802.1Q-2003)
  Add / Remove tags
  Per port association default
  4K-entry VLAN-ID table
  Per VLAN, per-port STP
• Scheduling, Pause, Congestion
  16 traffic classes for WRED
  4 queues per port scheduling
  WRR or strict priority
  Pause support
• Security
  802.1x; MAC Address Security
• Monitoring
  Rich monitoring terms
    • logical combination of terms
    • Src Port, Dst Port, VLAN, Traffic Type, Priority, Src MA, Dst MA, etc.
  Monitoring action
    • Drop, Mirror, Redirect, Count, Change Priority
  16 rules per frame
• Statistics
  RFC 2819 compliant
  All counters are 64 bits
  13 counter groups
    • RMON and SMON
    • Fulcrum extensions
6 Link Aggregation and Fat Tree Support
True IEEE-compliant Link Aggregation used to group links between line and fabric switches
Symmetric hashing guarantees a conversation resolves to the same fabric switch chip
Link Aggregation features
• Configuration
  12 trunk groups
  Any ports in a group
  Up to 12 members
• Hash: Ethernet CRC
  Programmable Input: SA, DA, Type, VLAN ID, Priority, Source port
  SA-DA hash symmetry forcing
  Group renumbering
• Other HW hooks
  Slow protocol traps
[Diagram: a row of Fabric Chips above a row of Line Chips, joined by Intra-switch Links (ISL), with MAC A and MAC B attached to line chips. Callout: Ingress to fabric hop uses Link Aggregation hardware to load balance]
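The symmetry property above can be illustrated with a short sketch. It is not the chip's actual hash: the slide only says the hash is an Ethernet CRC over programmable inputs with SA-DA symmetry forcing. Here symmetry is obtained by ordering the two addresses before hashing, which is an assumed technique, and `symmetric_hash` and `pick_member` are hypothetical names. `zlib.crc32` uses the same CRC-32 polynomial as the Ethernet FCS.

```python
import zlib


def symmetric_hash(sa: bytes, da: bytes) -> int:
    """CRC-32 over the address pair, arranged so that swapping SA and DA
    gives the same value (assumed way of forcing symmetry)."""
    first, second = sorted((sa, da))
    return zlib.crc32(first + second)


def pick_member(sa: bytes, da: bytes, members):
    """Choose one link of a trunk group from the hash value."""
    return members[symmetric_hash(sa, da) % len(members)]


if __name__ == "__main__":
    mac_a = bytes.fromhex("001122334455")
    mac_b = bytes.fromhex("66778899aabb")
    fabric_links = ["fabric0", "fabric1", "fabric2", "fabric3"]

    # Both directions of the conversation resolve to the same fabric link.
    forward = pick_member(mac_a, mac_b, fabric_links)
    reverse = pick_member(mac_b, mac_a, fabric_links)
    print(forward, reverse, forward == reverse)
```

Because both directions select the same member, frames from MAC A to MAC B and from MAC B to MAC A cross the same fabric chip.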
7 Two Versions Sampling in Q1 2006
Announced pricing at SC|05
First company to break through $20/port for 10GE
• FM2224
  24 10GE Interfaces
  1433-ball BGA 40mm
  $450
• FM2112
  8 10GE Interfaces and 16 1-2.5GE Interfaces
  897-ball BGA 32mm
  $265
8 24-Port Reference Design (Now Shipping)
Evaluation Platform
[Figure: front panel with ports 1-12 and 13-24, plus labels CSL and ETH]
9 Agenda
• Product Information
• Technical Details
• Photos
10 Tahoe Hardware Features
• Multiple Frequency Requirements
  3.125GHz serial links (licensed from RAMBUS)
  312.5MHz 32-bit datapaths (sync and async)
  750MHz MAC Table, Scheduler, Main Memory, Statistics, cross-chip interconnect (async)
  360MHz Frame Processing (sync)
  66MHz Management (sync)
• Mixed design styles
  3 synchronous blocks: synthesize, place, and route
  Many custom async blocks (most of the transistors)
  Licensed cores: SERDES, PLL, TTL pads, fusebox
11 Tahoe Chip Statistics
• TSMC 0.13um LVOD FSG 1.2V
• 105M transistors
• Over 3000 unique cells
• 1.5MB total SRAM (all asynchronous)
• 0.5-1.5W per port depending on activity (36W peak)
• Flip-chip BGA package
12 Sync and Async together?
• Use existing 3rd party IP cores for synchronous I/O, such as high-speed SERDES from RAMBUS.
• Use standard synchronous synthesis, place, and route flow to implement logically complex units with lower speed requirements.
• Use async flow only where it has the biggest advantages – SRAMs, crossbars, chip-wide interconnect, FIFOs, and high-speed blocks.
• Must partition the problem in Architecture.
• Some day everything will be Async, but not yet!
13 Simple Sync-to-Async Conversion
• Synchronous Request / Grant FIFO protocol
[Figure: an S2A converter takes a Synchronous Datapath (Request, Grant, clock) to an Asynchronous Datapath (A); an A2S converter takes an Asynchronous Datapath (A) to a Synchronous Datapath (Request, Grant, clock)]
Seamlessly Bridges Different Clock Domains
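A request / grant FIFO interface of this kind can be modelled in a few lines. This is a behavioural sketch under stated assumptions, not the converter's actual circuit: it assumes a word transfers on a clock edge only when request and grant are both high, that the converter holds a small FIFO, and that the asynchronous side drains at its own pace. `run_s2a`, `drain_pattern`, and `depth` are hypothetical names.

```python
from collections import deque


def run_s2a(words, drain_pattern, depth=2):
    """Push `words` through a request/grant FIFO of the given depth.

    drain_pattern is a repeating list of booleans saying, per clock cycle,
    whether the asynchronous side happens to accept a word in that cycle.
    Returns the delivered words and the number of clock cycles used.
    """
    if not any(drain_pattern):
        raise ValueError("the asynchronous side never drains")

    pending = deque(words)   # words the synchronous side still wants to send
    fifo = deque()           # storage inside the converter
    delivered = []
    cycle = 0

    while pending or fifo:
        # Asynchronous side: takes a word whenever it is ready.
        if fifo and drain_pattern[cycle % len(drain_pattern)]:
            delivered.append(fifo.popleft())

        # Synchronous side, evaluated at the clock edge.
        request = bool(pending)      # sender has a word to offer
        grant = len(fifo) < depth    # converter has room for it
        if request and grant:
            fifo.append(pending.popleft())

        cycle += 1

    return delivered, cycle


if __name__ == "__main__":
    out, cycles = run_s2a(list(range(8)), [True, False, False])
    print(out, cycles)
```

When the asynchronous side is slow, grant simply stays low and the sender waits; no words are lost or reordered, which is the property that lets the interface bridge domains running at different rates.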
14 Digital Verification
• Often overlooked in Academia, but crucial in Industry!
• There are nearly as many engineers in verification as there are in design.
• Use industry-standard approach of a full-chip simulation with testbench, test suite, regression engine.
• Try to get full line and conjunct coverage.
• Convert CSP/PRS into Verilog for chip-level simulation combined with synchronous blocks.
• Also use simple closed-environment self-tests to check that different levels of async decomposition match, but this is not sufficient.
15 Design For Test
• Must be able to check for manufacturing defects in async blocks.
• Introduce special “scan-buffers” which integrate a serial shift register into an async buffer.
• Connect the scan-buffers into 16 serial scan-chains.
• Can issue an inject, drain, or skip command to each scan-buffer on a scan-chain.
• External clocked interface to standard testers.
• Commercial fault-grading tool (ZOIX).
16 Async SRAM in FocalPoint
Use TSMC 6T state bit layout
Multi-bank design connected with async crossbars and busses
Supports up to 32 write ports and 32 read ports in parallel
Bank runs at 600MHz, but interconnect sustains 750MHz
17 SRAM Test and Repair
• Scan-buffers integrated into most SRAM banks.
• On-chip accelerated testing for largest SRAM.
• Tester produces a defect map.
• Burn fusebox to use spare addresses to repair bit or address-line errors.
• In many SRAMs, can simply remove a block of bad “segments” of storage from the free memory pool. This can repair many more types of errors.
• Yield looks quite good so far, as expected.
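The free-pool style of repair can be sketched as follows. This is an illustration only: the slides do not describe how the pool is implemented, and `SegmentPool`, its methods, and the segment numbering are hypothetical. The idea shown is that segments listed in the tester's defect map are never placed in the free pool, so they are never handed out for frame storage.

```python
class SegmentPool:
    """Free pool of storage segments that skips known-bad ones."""

    def __init__(self, num_segments, defect_map=()):
        bad = set(defect_map)
        # Bad segments are retired up front and never enter the pool.
        self.retired = sorted(s for s in bad if 0 <= s < num_segments)
        self.free = [s for s in range(num_segments) if s not in bad]

    def allocate(self):
        """Hand out one good segment, or fail if none remain."""
        if not self.free:
            raise MemoryError("no free segments")
        return self.free.pop()

    def release(self, segment):
        """Return a previously allocated segment to the pool."""
        self.free.append(segment)


if __name__ == "__main__":
    pool = SegmentPool(8, defect_map={2, 5})
    print("retired:", pool.retired)
    print("allocated:", sorted(pool.allocate() for _ in range(6)))
```

The cost of this repair is a slightly smaller usable memory; no spare rows or fuses are consumed, and it does not matter what kind of fault made the segment bad.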
18 Agenda
• Product Information
• Technical Details
• Photos
19 FocalPoint Test Platform
20 FocalPoint EP Board
21 FocalPoint EP Rack
22 Wishlist
• CSP vs CSP formal verification
• CSP vs PRS formal verification
• ATPG tools for async circuits
• Static timing for async circuits
• Async synthesis from CSP
• 65nm advice
If you're working on any of these, talk to me!