ASICS: THE HEART OF MODERN ROUTERS

Chang-Hong Wu Distinguished Engineer, Juniper Networks THE INTERNET EXPLOSION

# Web Sites 130EB/yr

Internet Capacity 162M # Connected Devices 1B

Total Digitized Information 420EB # Google Searches/Month 100M 31B/mo

12EB/yr 40M

110EB 4PB/yr 60PB/yr 9.5M 160M 25M 33K 1 1.7M 2.7B/mo

1988 1993 1998 2003 2008 Exponential growth, no matter how you measure it! The clearest indication of value delivered to end-users

2 Copyright © 2010 Juniper Networks, Inc. DRIVING FORCE BEHIND EXPONENTIAL GROWTH

C S

C S N C S Information N System N

Digital Stored Pipelining Multi-core Computing Program Computing

Digital Circuit Packet TCP/IP Transmission Switching Switching HPN Networking Flash Digital Core Disk DRAM Storage Memory Storage

3 Copyright © 2010 Juniper Networks, Inc. COMPUTER PERFORMANCE: 1988-2008

228 500,000 X over 20 years

226

224

222

220

218 System CAGR: 1.9x /year

216

214

12 2 Super Computers 210

28

26

Megahertz Megahertz / MFlops 24 Microprocessor CAGR: 1.3x /year

22

20 „88 „89 „90 „91 „92 „93 „94 „95 „96 „97 „98 „99 „00 „01 „02 „03 „04 „05 „06 „07 „08

4 Copyright © 2010 Juniper Networks, Inc. ROUTER PERFORMANCE 1988 – 2008 1000,000 X over 20 years (2x /year)

224 Post-ASIC era: 2.2x /year TX T1600 222

220 T640 M160 218 Pre-ASIC era: 1.6x /year M40 216

214

212 Interface CAGR: 1.7x /year 210

28

26 Megabits per second

24

22

20 „88 „89 „90 „91 „92 „93 „94 „95 „96 „97 „98 „99 „00 „01 „02 „03 „04 „05 „06 „07 „08 5 Copyright © 2010 Juniper Networks, Inc. SILICON THE FOUNDATION OF PERFORMANCE

General-Purpose Microprocessor (G) Services Engine (S) Edge Engine Core (E) Fabric Engine Engine (C) (F)

1–10 10–100 100–1000 1000–100,000 100,000–∞ Instructions/packet

6 Copyright © 2010 Juniper Networks, Inc. COMPARISON OF SILICON TECHNOLOGIES

Technology Advantages Disadvantages Use Cases Flexibility is General Poor performance, more important Very flexible Purpose CPUs density, and power than performance

Field Smaller up-front Lower performance, Volume is low; Programmable development cost; density, and power; Changes are Gate Arrays Field upgrades High per part price expected

Can fall short of Off-the-shelf Flexible. Jump performance, Differentiation is straight into Network power, and not important software design Processors functionality targets High High upfront cost; Tailor to your performance; Long development ASICs specification Low production cycle cost

7 Copyright © 2010 Juniper Networks, Inc. SYSTEM ARCHITECTURE

Market requirements . Performance, density, feature, cost targets Software/hardware interactions . Functional partitioning Silicon process technology evaluation . Cost/performance tradeoffs Memory choices . Stores configuration, FIB tables, etc. . Temporary working buffers Chip partitioning . IO and logic ratio, die size, interface simplicity

8 Copyright © 2010 Juniper Networks, Inc. ASIC PROCESS TECHNOLOGY

. Greater density allows more features/functionality for the same price . Moore‟s Law: Transistor density doubles every 18 months – Holding up remarkably well. But how much longer? . While density is increasing, performance is starting to level off . The decrease in operating voltage, hence dynamic power, also slowed . Static power is becoming an issue . NRE costs associated with newer processes increasing dramatically . Architectural innovations are needed to continue to provide value to customers

9 Copyright © 2010 Juniper Networks, Inc. NETWORKING ASICS AND MEMORIES

Queues / Packet Buffers Link Memory

Input / Input / Output/ Output ASICs Fabric

Control Memories (FIB, ACL, configs, etc.)

10 Copyright © 2010 Juniper Networks, Inc. MEMORY TECHNOLOGY CHARACTERISTICS

Technology Capacity Frequency Latency Power Cost Embedded L H L M H SRAM Embedded M M L+ L M DRAM Embedded L M L+ H H TCAM External M L M H H SRAM External H M M L H RLDRAM External H+ M H L L- SDRAM External L L H H H TCAM

11 Copyright © 2010 Juniper Networks, Inc. MEMORY CHOICES WITH NETWORKING ASICS

Packet buffering • Need high throughput, high density • Long bursts ok • SDRAM or RLDRAM (Reduced Latency DRAM)

Queuing/Link memory • Need high throughput, low latency • Shorter bursts • SRAM, RLDRAM, or SDRAM

Control memory • Need high throughput, low latency • Even smaller access quantum • SRAM, TCAM, or RLDRAM

12 Copyright © 2010 Juniper Networks, Inc. ARCHITECTURE – CHIP PARTITIONING . Fewer chips does not necessarily mean less overall cost

– Chips get very expensive once Chip yield & they cross a certain die size PCB layers $ dominate – Economics of silicon is all about 1 chip fabrication yield Pins, packaging, power, PCB area dominate . Goals 2 chips – Balance size of each chip within packet forwarding engine – Minimize pin-count on each chip – Minimize overall component cost X depends on – Flexibility of support different technology configs with the same chipset Total # of transistors

13 Copyright © 2010 Juniper Networks, Inc. EXAMPLES OF SILICON PROCESS IMPROVEMENT, CHIP PARTITIONING, AND MEMORY USAGE

Trio/NISP 65nm 4 Chips 1.2Bn Transistors IP3 I-Chip 180nm (90nm) 90nm 604Gbps IO IP1, 2 10 Chips 1 Chip RLDRAM/ 250nm 446m Trans 160m Transistors DDR3 SDRAM 4 Chips 412Gbps IO 219Gbps IO 18m Trans SRAM/ RLDRAM/ 47Gbps IO SRAM/ RDRAM DDR2 SDRAM SDRAM (RLDRAM)

1998 2002 2006 2009

14 Copyright © 2010 Juniper Networks, Inc. EXAMPLES: BENEFITS OF ASIC EVOLUTIONS

M40 M160 T640 T1600

Slot Capacity, 3.0 10 40 100 Gbps System 40Gbps 160Gbps 640Gbps 1600Gbps Capacity Max System 1.5 KW 3.15 KW 4.52 KW 8.35 KW Draw EER 13 25 71 96 (Gbps/KW)

FRS 1998 2000 2002 2007

15 Copyright © 2010 Juniper Networks, Inc. MICRO ARCHITECTURE

Xchip Take each subsystem, divide into blocks, divide each block into sub- blocks, design down to the basic X_in X_out logic elements Document both functionality and X_in_a X_in_b architecture . Rigorous peer reviews of all documents X_in_b_cntl X_in_b_dp

Control logic Datapath

16 Copyright © 2010 Juniper Networks, Inc. REGISTER TRANSFER LEVEL CODING

Translate micro architecture for all blocks to “Register Transfer Level” code.

always @ (sel or a or b) a begin out if (sel == 1) out = a; b else out = b; sel end OR assign out = sel ? a : b;

. A large chip will have hundreds of thousands of lines of RTL code . Must always keep in mind physical placement and timing during the micro architecture phase – You pay now or you pay more later

17 Copyright © 2010 Juniper Networks, Inc. SYNTHESIS & TIMING

Synthesis is the exercise of mapping RTL to GATES in the technology of choice INPUT – RTL code – Specification of clocks and cycle-time (frequency) – Input and output constraints for module being synthesized – Wire-load models as basis to model interconnect effects on gates – Recent trends: physical synthesis

18 Copyright © 2010 Juniper Networks, Inc. VERIFICATION

Goal: First-time-right silicon . Avoid expensive ASIC respins . Simulations are far easier to debug than real chips Recipe: At least as many verification engineers as design engineers per chip TOOLS Performed at multiple levels Test-bench tool . Block level SystemVerilog . Chip level C/C++, Verilog Coverage tools . Sub-system level Equivalency checkers . System level Simulators Waveform viewers . Software/hardware co-simulation

19 Copyright © 2010 Juniper Networks, Inc. PHYSICAL DESIGN

Power and clock planning Perform high-level floor-planning Place I/O, SRAMs, & Register Arrays Random logic placements Perform congestion analysis Wire up all the logic and IOs Run timing with physical placement Many iterations of all of the above

20 Copyright © 2010 Juniper Networks, Inc. PHYSICAL DESIGN EXAMPLE

1) Memory placement 2) Logic placement & clocks 3) M1 routing 4) M2 routing 5) M3 routing 6) M4 routing 7) M5 routing 8) M6 routing 9) M2/M4/M6 routing 10) M1/M3/M5 routing

21 Copyright © 2010 Juniper Networks, Inc. ASIC TAPEOUT

Criteria for ASIC Tapeout . All functionality complete . All verification complete . Performance simulations meet goals . Chip is error free from a testability perspective . Chip meets timing under all process, temperature and voltage conditions . Design and verification database is archived

22 Copyright © 2010 Juniper Networks, Inc. MANUFACTURING

After the ASIC is taped out . Masks are generated for photolithography . ASICs are then built layer-by-layer on a silicon substrate wafer Once the ASIC wafer is complete . Each die is tested in wafer test . Only good die are laser cut for packaging Once cut die are available . They are put in a package . The packaged devices are then tested again Tested packaged parts are put on system boards . Test with other hardware and software

23 Copyright © 2010 Juniper Networks, Inc. MANUFACTURING – CONTINUED

Copper layers

300mm wafer

300mm wafer fab Packaging

24 Copyright © 2010 Juniper Networks, Inc. SUMMARY

ASIC technology has transformed the network industry Silicon process technology is evolving at an impressive pace but architectural innovations are required to keep up with the demand for increasing performance at lower power A rigorous architecture, design, and verification process is required to implement complex networking ASICs There are a vast amount of architectural and design tradeoffs to be made so user community should provide feedbacks early and often

25 Copyright © 2010 Juniper Networks, Inc.