Chang-Hong Wu Distinguished Engineer, Juniper Networks the INTERNET EXPLOSION
Total Page:16
File Type:pdf, Size:1020Kb
ASICS: THE HEART OF MODERN ROUTERS Chang-Hong Wu Distinguished Engineer, Juniper Networks THE INTERNET EXPLOSION # Web Sites 130EB/yr Internet Capacity 162M # Connected Devices 1B Total Digitized Information 420EB # Google Searches/Month 100M 31B/mo 12EB/yr 40M 110EB 4PB/yr 60PB/yr 9.5M 160M 25M 33K 1 1.7M 2.7B/mo 1988 1993 1998 2003 2008 Exponential growth, no matter how you measure it! The clearest indication of value delivered to end-users 2 Copyright © 2010 Juniper Networks, Inc. DRIVING FORCE BEHIND EXPONENTIAL GROWTH C S C S N C S Information N System N Digital Stored Pipelining Microprocessor Multi-core Computing Program Computing Digital Circuit Packet TCP/IP Transmission Switching Switching HPN Networking Flash Digital Core Disk DRAM Storage Memory Storage 3 Copyright © 2010 Juniper Networks, Inc. COMPUTER PERFORMANCE: 1988-2008 228 500,000 X over 20 years 226 224 222 220 218 System CAGR: 1.9x /year 216 214 12 2 Super Computers 210 28 26 Megahertz Megahertz / MFlops 24 Microprocessor CAGR: 1.3x /year 22 20 „88 „89 „90 „91 „92 „93 „94 „95 „96 „97 „98 „99 „00 „01 „02 „03 „04 „05 „06 „07 „08 4 Copyright © 2010 Juniper Networks, Inc. ROUTER PERFORMANCE 1988 – 2008 1000,000 X over 20 years (2x /year) 224 Post-ASIC era: 2.2x /year TX T1600 222 220 T640 M160 218 Pre-ASIC era: 1.6x /year M40 216 214 212 Interface CAGR: 1.7x /year 210 28 26 Megabits per second 24 22 20 „88 „89 „90 „91 „92 „93 „94 „95 „96 „97 „98 „99 „00 „01 „02 „03 „04 „05 „06 „07 „08 5 Copyright © 2010 Juniper Networks, Inc. SILICON THE FOUNDATION OF PERFORMANCE General-Purpose Microprocessor (G) Services Engine (S) Edge Engine Core (E) Fabric Engine Engine (C) (F) 1–10 10–100 100–1000 1000–100,000 100,000–∞ Instructions/packet 6 Copyright © 2010 Juniper Networks, Inc. COMPARISON OF SILICON TECHNOLOGIES Technology Advantages Disadvantages Use Cases Flexibility is General Poor performance, more important Very flexible Purpose CPUs density, and power than performance Field Smaller up-front Lower performance, Volume is low; Programmable development cost; density, and power; Changes are Gate Arrays Field upgrades High per part price expected Can fall short of Off-the-shelf Flexible. Jump performance, Differentiation is straight into Network power, and not important software design Processors functionality targets High High upfront cost; Tailor to your performance; Long development ASICs specification Low production cycle cost 7 Copyright © 2010 Juniper Networks, Inc. SYSTEM ARCHITECTURE Market requirements . Performance, density, feature, cost targets Software/hardware interactions . Functional partitioning Silicon process technology evaluation . Cost/performance tradeoffs Memory choices . Stores configuration, FIB tables, etc. Temporary working buffers Chip partitioning . IO and logic ratio, die size, interface simplicity 8 Copyright © 2010 Juniper Networks, Inc. ASIC PROCESS TECHNOLOGY . Greater density allows more features/functionality for the same price . Moore‟s Law: Transistor density doubles every 18 months – Holding up remarkably well. But how much longer? . While density is increasing, performance is starting to level off . The decrease in operating voltage, hence dynamic power, also slowed . Static power is becoming an issue . NRE costs associated with newer processes increasing dramatically . Architectural innovations are needed to continue to provide value to customers 9 Copyright © 2010 Juniper Networks, Inc. NETWORKING ASICS AND MEMORIES Queues / Packet Buffers Link Memory Input / Input / Output/ Output ASICs Fabric Control Memories (FIB, ACL, configs, etc.) 10 Copyright © 2010 Juniper Networks, Inc. MEMORY TECHNOLOGY CHARACTERISTICS Technology Capacity Frequency Latency Power Cost Embedded L H L M H SRAM Embedded M M L+ L M DRAM Embedded L M L+ H H TCAM External M L M H H SRAM External H M M L H RLDRAM External H+ M H L L- SDRAM External L L H H H TCAM 11 Copyright © 2010 Juniper Networks, Inc. MEMORY CHOICES WITH NETWORKING ASICS Packet buffering • Need high throughput, high density • Long bursts ok • SDRAM or RLDRAM (Reduced Latency DRAM) Queuing/Link memory • Need high throughput, low latency • Shorter bursts • SRAM, RLDRAM, or SDRAM Control memory • Need high throughput, low latency • Even smaller access quantum • SRAM, TCAM, or RLDRAM 12 Copyright © 2010 Juniper Networks, Inc. ARCHITECTURE – CHIP PARTITIONING . Fewer chips does not necessarily mean less overall cost – Chips get very expensive once Chip yield & they cross a certain die size PCB layers $ dominate – Economics of silicon is all about 1 chip fabrication yield Pins, packaging, power, PCB area dominate . Goals 2 chips – Balance size of each chip within packet forwarding engine – Minimize pin-count on each chip – Minimize overall component cost X depends on – Flexibility of support different technology configs with the same chipset Total # of transistors 13 Copyright © 2010 Juniper Networks, Inc. EXAMPLES OF SILICON PROCESS IMPROVEMENT, CHIP PARTITIONING, AND MEMORY USAGE Trio/NISP 65nm 4 Chips 1.2Bn Transistors IP3 I-Chip 180nm (90nm) 90nm 604Gbps IO IP1, 2 10 Chips 1 Chip RLDRAM/ 250nm 446m Trans 160m Transistors DDR3 SDRAM 4 Chips 412Gbps IO 219Gbps IO 18m Trans SRAM/ RLDRAM/ 47Gbps IO SRAM/ RDRAM DDR2 SDRAM SDRAM (RLDRAM) 1998 2002 2006 2009 14 Copyright © 2010 Juniper Networks, Inc. EXAMPLES: BENEFITS OF ASIC EVOLUTIONS M40 M160 T640 T1600 Slot Capacity, 3.0 10 40 100 Gbps System 40Gbps 160Gbps 640Gbps 1600Gbps Capacity Max System 1.5 KW 3.15 KW 4.52 KW 8.35 KW Draw EER 13 25 71 96 (Gbps/KW) FRS 1998 2000 2002 2007 15 Copyright © 2010 Juniper Networks, Inc. MICRO ARCHITECTURE Xchip Take each subsystem, divide into blocks, divide each block into sub- blocks, design down to the basic X_in X_out logic elements Document both functionality and X_in_a X_in_b architecture . Rigorous peer reviews of all documents X_in_b_cntl X_in_b_dp Control logic Datapath 16 Copyright © 2010 Juniper Networks, Inc. REGISTER TRANSFER LEVEL CODING Translate micro architecture for all blocks to “Register Transfer Level” code. always @ (sel or a or b) a begin out if (sel == 1) out = a; b else out = b; sel end OR assign out = sel ? a : b; . A large chip will have hundreds of thousands of lines of RTL code . Must always keep in mind physical placement and timing during the micro architecture phase – You pay now or you pay more later 17 Copyright © 2010 Juniper Networks, Inc. SYNTHESIS & TIMING Synthesis is the exercise of mapping RTL to GATES in the technology of choice INPUT – RTL code – Specification of clocks and cycle-time (frequency) – Input and output constraints for module being synthesized – Wire-load models as basis to model interconnect effects on gates – Recent trends: physical synthesis 18 Copyright © 2010 Juniper Networks, Inc. VERIFICATION Goal: First-time-right silicon . Avoid expensive ASIC respins . Simulations are far easier to debug than real chips Recipe: At least as many verification engineers as design engineers per chip TOOLS Performed at multiple levels Test-bench tool . Block level SystemVerilog . Chip level C/C++, Verilog Coverage tools . Sub-system level Equivalency checkers . System level Simulators Waveform viewers . Software/hardware co-simulation 19 Copyright © 2010 Juniper Networks, Inc. PHYSICAL DESIGN Power and clock planning Perform high-level floor-planning Place I/O, SRAMs, & Register Arrays Random logic placements Perform congestion analysis Wire up all the logic and IOs Run timing with physical placement Many iterations of all of the above 20 Copyright © 2010 Juniper Networks, Inc. PHYSICAL DESIGN EXAMPLE 1) Memory placement 2) Logic placement & clocks 3) M1 routing 4) M2 routing 5) M3 routing 6) M4 routing 7) M5 routing 8) M6 routing 9) M2/M4/M6 routing 10) M1/M3/M5 routing 21 Copyright © 2010 Juniper Networks, Inc. ASIC TAPEOUT Criteria for ASIC Tapeout . All functionality complete . All verification complete . Performance simulations meet goals . Chip is error free from a testability perspective . Chip meets timing under all process, temperature and voltage conditions . Design and verification database is archived 22 Copyright © 2010 Juniper Networks, Inc. MANUFACTURING After the ASIC is taped out . Masks are generated for photolithography . ASICs are then built layer-by-layer on a silicon substrate wafer Once the ASIC wafer is complete . Each die is tested in wafer test . Only good die are laser cut for packaging Once cut die are available . They are put in a package . The packaged devices are then tested again Tested packaged parts are put on system boards . Test with other hardware and software 23 Copyright © 2010 Juniper Networks, Inc. MANUFACTURING – CONTINUED Copper layers 300mm wafer 300mm wafer fab Packaging 24 Copyright © 2010 Juniper Networks, Inc. SUMMARY ASIC technology has transformed the network industry Silicon process technology is evolving at an impressive pace but architectural innovations are required to keep up with the demand for increasing performance at lower power A rigorous architecture, design, and verification process is required to implement complex networking ASICs There are a vast amount of architectural and design tradeoffs to be made so user community should provide feedbacks early and often 25 Copyright © 2010 Juniper Networks, Inc. .