A 24-Port 10G Ethernet Switch
A 24-port 10G Ethernet Switch (with asynchronous circuitry)
Andrew Lines

Agenda
· Product Information
· Technical Details
· Photos

Tahoe: First FocalPoint Family Member
The lowest-latency feature-rich 10GE switch chip
· 10G Ethernet switch
  - 24 Ports
· Line rate performance
  - 240Gb/s bandwidth
  - 360M frames/s
  - Full-speed multicast
· Fully-integrated single chip
  - 1MB frame memory
  - 16K MAC addresses
· Lowest latency Ethernet
  - 200ns with copper cables
· Rich Feature Set
  - Extensive layer 2 features
· Flexible SERDES interfaces
  - 10G XAUI (CX-4)
  - 1G SGMII
[Block diagram "Tahoe": SPI, CPU, JTAG, and LED interfaces; Frame Processor (Scheduler); XAUI (CX-4) ports; Nexus® crossbars; RapidArray™ (packet storage). Legend: Asynchronous Blocks.]

Tahoe Hardware Architecture
Modular architecture, centralized control
[Block diagram: SPI, CPU, JTAG, and LED interfaces; Management; Frame Control; LCI; Lookup; Handler; Stats; Scheduler; RX Port Logic and TX Port Logic, each with SerDes, PCS, and MAC; Switch Element Data Path with Nexus® crossbars and RapidArray™ (1MB Shared Memory).]

Tahoe Chip Plot
Fabricated in TSMC 0.13um
· Ethernet Port Logic
  - SerDes
  - PCS
  - MAC
· RapidArray Memory
  - 1MB shared
· Nexus Crossbars
  - 1.5Tb/s total
  - 3ns latency
· Scheduler
  - Highly optimized
  - High event rate
· MAC Table
  - 16K addresses
· Management
  - CPU interface
  - JTAG
  - EEPROM interface
  - LEDs
· Frame Control
  - Frame handler
  - Lookup
  - Statistics

Bridge Features
Robust set of layer-2 features
· General Bridge Features
  - 16K MAC entries
  - STP: multiple, rapid, standard
  - Learning and Ageing
  - Multicast GMRP and IGMPv3
· VLAN Tag (IEEE 802.1Q-2003)
  - Add / Remove tags
  - Per port association default
  - 4K-entry VLAN-ID table
  - Per VLAN, per-port STP
· Scheduling, Pause, Congestion
  - 16 traffic classes for WRED
  - 4 queues per port scheduling
  - WRR or strict priority
  - Pause support
· Security
  - 802.1x; MAC Address Security
· Monitoring
  - Rich monitoring terms
    · logical combination of terms
    · Src Port, Dst Port, VLAN, Traffic Type, Priority, Src MA, Dst MA, etc.
  - Monitoring action
    · Drop, Mirror, Redirect, Count, Change Priority
  - 16 rules per frame
· Statistics
  - RFC 2819 compliant
  - All counters are 64 bits
  - 13 counter groups
    · RMON and SMON
    · Fulcrum extensions

Link Aggregation and Fat Tree Support
True IEEE-compliant Link Aggregation used to group links between line and fabric switches
Symmetric hashing guarantees a conversation resolves to the same fabric switch chip
Ingress to fabric hop uses Link Aggregation hardware to load balance
Link Aggregation features
· Configuration
  - 12 trunk groups
  - Any ports in a group
  - Up to 12 members
· Hash: Ethernet CRC
  - Programmable Input
  - SA, DA, Type, VLAN-ID, Priority, Source port
  - SA-DA hash symmetry forcing
  - Group renumbering
· Other HW hooks
  - Slow protocol traps
[Diagram: Fabric Chips and Line Chips connected by Intra-switch Links (ISL), with endpoints MAC A and MAC B.]

Two Versions
Sampling in Q1 2006
Announced pricing at SC|05
First company to break through $20/port for 10GE
· FM2224
  - 24 10GE Interfaces
  - 1433-ball BGA, 40mm
  - $450
· FM2112
  - 8 10GE Interfaces and 16 1-2.5GE Interfaces
  - 897-ball BGA, 32mm
  - $265

24-Port Reference Design (Now Shipping)
Evaluation Platform
[Figure: front panel with ports 1 to 24, plus labels CSL and ETH.]

Agenda
· Product Information
· Technical Details
· Photos

Tahoe Hardware Features
· Multiple Frequency Requirements
  - 3.125GHz serial links (licensed from RAMBUS)
  - 312.5MHz 32-bit datapaths (sync and async)
  - 750MHz MAC Table, Scheduler, Main Memory, Statistics, cross-chip interconnect (async)
  - 360MHz Frame Processing (sync)
  - 66MHz Management (sync)
· Mixed design styles
  - 3 synchronous blocks: synthesize, place, and route
  - Many custom async blocks (most of the transistors)
  - Licensed cores: SERDES, PLL, TTL pads, fusebox

Tahoe Chip Statistics
· TSMC 0.13um LVOD FSG 1.2V
· 105M transistors
· Over 3000 unique cells
· 1.5MB total SRAM (all asynchronous)
· 0.5-1.5W per port depending on activity (36W peak)
· Flip-chip BGA package

Sync and Async together?
· Use existing 3rd party IP cores for synchronous I/O, such as high-speed SERDES from RAMBUS.
· Use standard synchronous synthesis, place, and route flow to implement logically complex units with lower speed requirements.
· Use async flow only where it has the biggest advantages: SRAMs, crossbars, chip-wide interconnect, FIFOs, and high-speed blocks.
· Must partition the problem in Architecture.
· Some day everything will be Async, but not yet!

Simple Sync-to-Async Conversion
· Synchronous Request / Grant FIFO protocol
[Diagram: S2A converter (Synchronous Datapath to Asynchronous Datapath) and A2S converter (Asynchronous Datapath to Synchronous Datapath); signals labeled Request, Grant, and clock; blocks labeled A.]
Seamlessly Bridges Different Clock Domains

Digital Verification
· Often overlooked in Academia, but crucial in Industry!
· There are nearly as many engineers in verification as there are in design.
· Use industry-standard approach of a full-chip simulation with test-bench, test suite, regression engine.
· Try to get full line and conjunct coverage.
· Convert CSP/PRS into Verilog for chip-level simulation combined with synchronous blocks.
· Also use simple closed-environment self-tests to check that different levels of async decomposition match, but this is not sufficient.

Design For Test
· Must be able to check for manufacturing defects in async blocks.
· Introduce special "scan-buffers" which integrate a serial shift register into an async buffer.
· Connect the scan-buffers into 16 serial scan-chains.
· Can issue an inject, drain, or skip command to each scan-buffer on a scan-chain.
· External clocked interface to standard testers.
· Commercial fault-grading tool (ZOIX).
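The Design For Test slide names three per-buffer commands (inject, drain, skip) but does not define them. The toy model below is only a sketch under assumed semantics: inject copies the shift-register bit into the async buffer, drain moves the buffer's token back into the shift register, and skip leaves the buffer untouched. The names `Cmd`, `ScanBuffer`, and `ScanChain` are hypothetical and do not come from the FocalPoint design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Cmd(Enum):
    INJECT = "inject"  # assumed: load the scanned-in bit into the async buffer
    DRAIN = "drain"    # assumed: capture the buffer's token into the scan register
    SKIP = "skip"      # assumed: leave this buffer alone


@dataclass
class ScanBuffer:
    token: Optional[int] = None  # value held by the async buffer, None if empty
    scan_bit: int = 0            # this buffer's stage of the serial shift register

    def apply(self, cmd: Cmd) -> None:
        if cmd is Cmd.INJECT:
            self.token = self.scan_bit
        elif cmd is Cmd.DRAIN and self.token is not None:
            self.scan_bit = self.token
            self.token = None
        # Cmd.SKIP (or DRAIN on an empty buffer): nothing changes


class ScanChain:
    """A serial chain of scan-buffers driven by an external clocked tester."""

    def __init__(self, length: int) -> None:
        self.buffers = [ScanBuffer() for _ in range(length)]

    def shift(self, bits_in: List[int]) -> List[int]:
        """Clock bits in at the head; return the bits pushed out of the tail."""
        out = []
        for bit in bits_in:
            out.append(self.buffers[-1].scan_bit)
            for i in range(len(self.buffers) - 1, 0, -1):
                self.buffers[i].scan_bit = self.buffers[i - 1].scan_bit
            self.buffers[0].scan_bit = bit
        return out

    def command(self, cmds: List[Cmd]) -> None:
        """Apply one command to each scan-buffer on the chain."""
        for buf, cmd in zip(self.buffers, cmds):
            buf.apply(cmd)


if __name__ == "__main__":
    chain = ScanChain(4)
    chain.shift([1, 0, 1, 0])                       # load a test pattern
    chain.command([Cmd.SKIP, Cmd.INJECT, Cmd.SKIP, Cmd.INJECT])
    chain.shift([0, 0, 0, 0])                       # clear the shift register
    chain.command([Cmd.DRAIN] * 4)                  # capture tokens for readout
    print(chain.shift([0, 0, 0, 0]))                # prints [1, 0, 1, 0]
```

In this model a tester would compare the bits shifted out after the drain step against the expected pattern; the circuit activity between inject and drain is deliberately left out.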
Async SRAM in FocalPoint
Use TSMC 6T state bit layout
Multi-bank design connected with async crossbars and busses
Supports up to 32 write ports and 32 read ports in parallel
Bank runs at 600MHz, but interconnect sustains 750MHz

SRAM Test and Repair
· Scan-buffers integrated into most SRAM banks.
· On-chip accelerated testing for largest SRAM.
· Tester produces a defect map.
· Burn fusebox to use spare addresses to repair bit or address-line errors.
· In many SRAMs, can simply remove a block of bad "segments" of storage from the free memory pool. This can repair many more types of errors.
· Yield looks quite good so far, as expected.

Agenda
· Product Information
· Technical Details
· Photos

FocalPoint Test Platform
[Photo]

FocalPoint EP Board
[Photo]

FocalPoint EP Rack
[Photo]

Wishlist
· CSP vs CSP formal verification
· CSP vs PRS formal verification
· ATPG tools for async circuits
· Static timing for async circuits
· Async synthesis from CSP
· 65nm advice
If you're working on any of these, talk to me!
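The segment-removal repair described under SRAM Test and Repair can be illustrated with a small free-pool model. This is a minimal sketch, assuming the tester's defect map is available as a list of bad segment indices; the class name `SegmentPool`, its methods, and the segment counts are hypothetical, and the slides do not say how the chip implements its pool.

```python
from typing import Iterable, List, Set


class SegmentPool:
    """Free pool of storage segments that excludes segments marked bad."""

    def __init__(self, num_segments: int, bad_segments: Iterable[int] = ()) -> None:
        bad: Set[int] = set(bad_segments)
        out_of_range = sorted(s for s in bad if not 0 <= s < num_segments)
        if out_of_range:
            raise ValueError(f"defect map names unknown segments: {out_of_range}")
        # Repair step: segments in the defect map never enter the free pool.
        self._usable: Set[int] = set(range(num_segments)) - bad
        self._free: List[int] = sorted(self._usable)
        self.capacity = len(self._free)

    def allocate(self) -> int:
        """Hand out one good segment, or fail if none are free."""
        if not self._free:
            raise MemoryError("no free segments")
        return self._free.pop()

    def release(self, segment: int) -> None:
        """Return a previously allocated good segment to the pool."""
        if segment not in self._usable or segment in self._free:
            raise ValueError(f"segment {segment} cannot be released")
        self._free.append(segment)


if __name__ == "__main__":
    pool = SegmentPool(8, bad_segments=[2, 5])  # hypothetical defect map
    print(pool.capacity)                        # prints 6
```

The cost of this style of repair is a slightly smaller usable memory, which is why it can cover faults that spare addresses cannot: any defect confined to a segment is avoided regardless of its type.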