ASICS: THE HEART OF MODERN ROUTERS
Chang-Hong Wu Distinguished Engineer, Juniper Networks THE INTERNET EXPLOSION
# Web Sites 130EB/yr
Internet Capacity 162M # Connected Devices 1B
Total Digitized Information 420EB # Google Searches/Month 100M 31B/mo
12EB/yr 40M
110EB 4PB/yr 60PB/yr 9.5M 160M 25M 33K 1 1.7M 2.7B/mo
1988 1993 1998 2003 2008 Exponential growth, no matter how you measure it! The clearest indication of value delivered to end-users
2 Copyright © 2010 Juniper Networks, Inc. DRIVING FORCE BEHIND EXPONENTIAL GROWTH
C S
C S N C S Information N System N
Digital Stored Pipelining Microprocessor Multi-core Computing Program Computing
Digital Circuit Packet TCP/IP Transmission Switching Switching HPN Networking Flash Digital Core Disk DRAM Storage Memory Storage
3 Copyright © 2010 Juniper Networks, Inc. COMPUTER PERFORMANCE: 1988-2008
228 500,000 X over 20 years
226
224
222
220
218 System CAGR: 1.9x /year
216
214
12 2 Super Computers 210
28
26
Megahertz Megahertz / MFlops 24 Microprocessor CAGR: 1.3x /year
22
20 „88 „89 „90 „91 „92 „93 „94 „95 „96 „97 „98 „99 „00 „01 „02 „03 „04 „05 „06 „07 „08
4 Copyright © 2010 Juniper Networks, Inc. ROUTER PERFORMANCE 1988 – 2008 1000,000 X over 20 years (2x /year)
224 Post-ASIC era: 2.2x /year TX T1600 222
220 T640 M160 218 Pre-ASIC era: 1.6x /year M40 216
214
212 Interface CAGR: 1.7x /year 210
28
26 Megabits per second
24
22
20 „88 „89 „90 „91 „92 „93 „94 „95 „96 „97 „98 „99 „00 „01 „02 „03 „04 „05 „06 „07 „08 5 Copyright © 2010 Juniper Networks, Inc. SILICON THE FOUNDATION OF PERFORMANCE
General-Purpose Microprocessor (G) Services Engine (S) Edge Engine Core (E) Fabric Engine Engine (C) (F)
1–10 10–100 100–1000 1000–100,000 100,000–∞ Instructions/packet
6 Copyright © 2010 Juniper Networks, Inc. COMPARISON OF SILICON TECHNOLOGIES
Technology Advantages Disadvantages Use Cases Flexibility is General Poor performance, more important Very flexible Purpose CPUs density, and power than performance
Field Smaller up-front Lower performance, Volume is low; Programmable development cost; density, and power; Changes are Gate Arrays Field upgrades High per part price expected
Can fall short of Off-the-shelf Flexible. Jump performance, Differentiation is straight into Network power, and not important software design Processors functionality targets High High upfront cost; Tailor to your performance; Long development ASICs specification Low production cycle cost
7 Copyright © 2010 Juniper Networks, Inc. SYSTEM ARCHITECTURE
Market requirements . Performance, density, feature, cost targets Software/hardware interactions . Functional partitioning Silicon process technology evaluation . Cost/performance tradeoffs Memory choices . Stores configuration, FIB tables, etc. . Temporary working buffers Chip partitioning . IO and logic ratio, die size, interface simplicity
8 Copyright © 2010 Juniper Networks, Inc. ASIC PROCESS TECHNOLOGY
. Greater density allows more features/functionality for the same price . Moore‟s Law: Transistor density doubles every 18 months – Holding up remarkably well. But how much longer? . While density is increasing, performance is starting to level off . The decrease in operating voltage, hence dynamic power, also slowed . Static power is becoming an issue . NRE costs associated with newer processes increasing dramatically . Architectural innovations are needed to continue to provide value to customers
9 Copyright © 2010 Juniper Networks, Inc. NETWORKING ASICS AND MEMORIES
Queues / Packet Buffers Link Memory
Input / Input / Output/ Output ASICs Fabric
Control Memories (FIB, ACL, configs, etc.)
10 Copyright © 2010 Juniper Networks, Inc. MEMORY TECHNOLOGY CHARACTERISTICS
Technology Capacity Frequency Latency Power Cost Embedded L H L M H SRAM Embedded M M L+ L M DRAM Embedded L M L+ H H TCAM External M L M H H SRAM External H M M L H RLDRAM External H+ M H L L- SDRAM External L L H H H TCAM
11 Copyright © 2010 Juniper Networks, Inc. MEMORY CHOICES WITH NETWORKING ASICS
Packet buffering • Need high throughput, high density • Long bursts ok • SDRAM or RLDRAM (Reduced Latency DRAM)
Queuing/Link memory • Need high throughput, low latency • Shorter bursts • SRAM, RLDRAM, or SDRAM
Control memory • Need high throughput, low latency • Even smaller access quantum • SRAM, TCAM, or RLDRAM
12 Copyright © 2010 Juniper Networks, Inc. ARCHITECTURE – CHIP PARTITIONING . Fewer chips does not necessarily mean less overall cost
– Chips get very expensive once Chip yield & they cross a certain die size PCB layers $ dominate – Economics of silicon is all about 1 chip fabrication yield Pins, packaging, power, PCB area dominate . Goals 2 chips – Balance size of each chip within packet forwarding engine – Minimize pin-count on each chip – Minimize overall component cost X depends on – Flexibility of support different technology configs with the same chipset Total # of transistors
13 Copyright © 2010 Juniper Networks, Inc. EXAMPLES OF SILICON PROCESS IMPROVEMENT, CHIP PARTITIONING, AND MEMORY USAGE
Trio/NISP 65nm 4 Chips 1.2Bn Transistors IP3 I-Chip 180nm (90nm) 90nm 604Gbps IO IP1, 2 10 Chips 1 Chip RLDRAM/ 250nm 446m Trans 160m Transistors DDR3 SDRAM 4 Chips 412Gbps IO 219Gbps IO 18m Trans SRAM/ RLDRAM/ 47Gbps IO SRAM/ RDRAM DDR2 SDRAM SDRAM (RLDRAM)
1998 2002 2006 2009
14 Copyright © 2010 Juniper Networks, Inc. EXAMPLES: BENEFITS OF ASIC EVOLUTIONS
M40 M160 T640 T1600
Slot Capacity, 3.0 10 40 100 Gbps System 40Gbps 160Gbps 640Gbps 1600Gbps Capacity Max System 1.5 KW 3.15 KW 4.52 KW 8.35 KW Draw EER 13 25 71 96 (Gbps/KW)
FRS 1998 2000 2002 2007
15 Copyright © 2010 Juniper Networks, Inc. MICRO ARCHITECTURE
Xchip Take each subsystem, divide into blocks, divide each block into sub- blocks, design down to the basic X_in X_out logic elements Document both functionality and X_in_a X_in_b architecture . Rigorous peer reviews of all documents X_in_b_cntl X_in_b_dp
Control logic Datapath
16 Copyright © 2010 Juniper Networks, Inc. REGISTER TRANSFER LEVEL CODING
Translate micro architecture for all blocks to “Register Transfer Level” code.
always @ (sel or a or b) a begin out if (sel == 1) out = a; b else out = b; sel end OR assign out = sel ? a : b;
. A large chip will have hundreds of thousands of lines of RTL code . Must always keep in mind physical placement and timing during the micro architecture phase – You pay now or you pay more later
17 Copyright © 2010 Juniper Networks, Inc. SYNTHESIS & TIMING
Synthesis is the exercise of mapping RTL to GATES in the technology of choice INPUT – RTL code – Specification of clocks and cycle-time (frequency) – Input and output constraints for module being synthesized – Wire-load models as basis to model interconnect effects on gates – Recent trends: physical synthesis
18 Copyright © 2010 Juniper Networks, Inc. VERIFICATION
Goal: First-time-right silicon . Avoid expensive ASIC respins . Simulations are far easier to debug than real chips Recipe: At least as many verification engineers as design engineers per chip TOOLS Performed at multiple levels Test-bench tool . Block level SystemVerilog . Chip level C/C++, Verilog Coverage tools . Sub-system level Equivalency checkers . System level Simulators Waveform viewers . Software/hardware co-simulation
19 Copyright © 2010 Juniper Networks, Inc. PHYSICAL DESIGN
Power and clock planning Perform high-level floor-planning Place I/O, SRAMs, & Register Arrays Random logic placements Perform congestion analysis Wire up all the logic and IOs Run timing with physical placement Many iterations of all of the above
20 Copyright © 2010 Juniper Networks, Inc. PHYSICAL DESIGN EXAMPLE
1) Memory placement 2) Logic placement & clocks 3) M1 routing 4) M2 routing 5) M3 routing 6) M4 routing 7) M5 routing 8) M6 routing 9) M2/M4/M6 routing 10) M1/M3/M5 routing
21 Copyright © 2010 Juniper Networks, Inc. ASIC TAPEOUT
Criteria for ASIC Tapeout . All functionality complete . All verification complete . Performance simulations meet goals . Chip is error free from a testability perspective . Chip meets timing under all process, temperature and voltage conditions . Design and verification database is archived
22 Copyright © 2010 Juniper Networks, Inc. MANUFACTURING
After the ASIC is taped out . Masks are generated for photolithography . ASICs are then built layer-by-layer on a silicon substrate wafer Once the ASIC wafer is complete . Each die is tested in wafer test . Only good die are laser cut for packaging Once cut die are available . They are put in a package . The packaged devices are then tested again Tested packaged parts are put on system boards . Test with other hardware and software
23 Copyright © 2010 Juniper Networks, Inc. MANUFACTURING – CONTINUED
Copper layers
300mm wafer
300mm wafer fab Packaging
24 Copyright © 2010 Juniper Networks, Inc. SUMMARY
ASIC technology has transformed the network industry Silicon process technology is evolving at an impressive pace but architectural innovations are required to keep up with the demand for increasing performance at lower power A rigorous architecture, design, and verification process is required to implement complex networking ASICs There are a vast amount of architectural and design tradeoffs to be made so user community should provide feedbacks early and often
25 Copyright © 2010 Juniper Networks, Inc.