
Microarchitecture of a High-Radix Router

John Kim, William J. Dally, Brian Towles¹, and Amit K. Gupta

Concurrent VLSI Architecture, Stanford University
¹D.E. Shaw Research & Development

ISCA’05 High-Radix Routers

Interconnection Network

• Supercomputer networks: Cray X1
• Router fabrics: Avici TSR
• On-chip networks: MIT RAW
• I/O interconnects: Myrinet/InfiniBand

Bandwidth Trend

[Figure: bandwidth per router node (Gb/s, log scale from 0.1 to 10000) versus year (1985 to 2010). Routers plotted: Torus Routing Chip, Intel iPSC/2, J-Machine, CM-5, Intel Paragon XP, Cray T3D, MIT Alewife, IBM Vulcan, Cray T3E, SGI Origin 2000, AlphaServer GS320, IBM SP Switch2, Quadrics QsNet, Cray X1, Velio 3003, IBM HPS, SGI Altix 3000.]

Bandwidth growth is the result of
1. Increase in signaling rate
2. Increase in number of signals

Approximately an order of magnitude increase in bandwidth every 5 years.

How to effectively utilize the increased bandwidth?

Current Interconnection Networks

• Earlier works [Dally ’90, Agarwal ’91] suggested “lower-radix” networks
  – Examples: Torus Routing Chip, Cray T3E, SGI Altix 3000
• Current routers
  – Myrinet: radix-32
  – Quadrics: radix-8
  – IBM SP2 Switch: radix-8
• Technology trends
  – Increasing bandwidth
  – Optical signaling

High-Radix Routers can take better advantage of these technology trends.

Outline

• Technology Trends for High-Radix Router
• Motivation for High-Radix Router
• High-Radix Router Architectures
  – Baseline Architecture
  – Fully Buffered Crossbar
  – Hierarchical Crossbar
• Conclusion & Future Work

High-Radix Router

[Figure: two router organizations side by side. Low-radix: a small number of fat ports. High-radix: a large number of skinny ports.]

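Latency in a Network

Latency = Header Latency + Serialization Latency
        = $H t_r + L / b$
        = $2 t_r \log_k N + 2kL / B$

where k = radix, B = total bandwidth, N = number of nodes, L = message size. Matching terms between the last two lines, the hop count is $H = 2 \log_k N$ with per-router delay $t_r$, and the per-channel bandwidth is $b = B / 2k$.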
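The two terms of the latency model pull in opposite directions as the radix grows. The following is a minimal sketch that evaluates them separately; the function name and all parameter values (4096 nodes, 1 Tb/s total bandwidth, 20 ns router delay, 128-bit messages) are assumptions chosen for illustration and are not taken from the slides.

```python
import math

def network_latency(k, N, B, t_r, L):
    """Return (header, serialization) latency in seconds for radix k.

    header        = 2 * t_r * log_k(N)
    serialization = 2 * k * L / B
    Units assumed here: B in bits/s, L in bits, t_r in seconds.
    """
    if k < 2:
        raise ValueError("radix must be at least 2")
    header = 2.0 * t_r * math.log(N) / math.log(k)
    serialization = 2.0 * k * L / B
    return header, serialization

# Hypothetical parameters, for illustration only.
N, B, t_r, L = 4096, 1e12, 20e-9, 128
for k in (4, 16, 64, 256):
    h, s = network_latency(k, N, B, t_r, L)
    print(f"k={k:3d}  header={h * 1e9:6.1f} ns  "
          f"serialization={s * 1e9:6.1f} ns  total={(h + s) * 1e9:6.1f} ns")
```

Sweeping k this way shows header latency falling and serialization latency rising, which is why an intermediate radix minimizes the total.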

Latency vs. Radix

[Figure: latency (nsec) versus radix (0 to 250) for 2003 technology and 2010 technology. As radix increases, header latency decreases and serialization latency increases. Optimal radix ~ 40 for 2003 technology and ~ 128 for 2010 technology.]

Determining Optimal Radix

Latency = Header Latency + Serialization Latency
        = $H t_r + L / b$
        = $2 t_r \log_k N + 2kL / B$

where k = radix, B = total bandwidth, N = number of nodes, L = message size

Optimal radix

Setting the derivative of latency with respect to k to zero gives

$k \log^2 k = (B t_r \log N) / L$ = Aspect Ratio
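The optimal-radix condition has no closed-form solution for k, but it is easy to solve numerically. The sketch below uses bisection and assumes natural logarithms, which is what differentiating $2 t_r \ln N / \ln k + 2kL / B$ produces; the function names and the example parameter values are hypothetical.

```python
import math

def aspect_ratio(B, t_r, N, L):
    """Aspect ratio A = B * t_r * ln(N) / L (natural log assumed)."""
    return B * t_r * math.log(N) / L

def optimal_radix(A, lo=2.0, hi=1e6, iters=200):
    """Solve k * ln(k)**2 = A for k by bisection.

    k * ln(k)**2 is increasing for k > 1, so the root in [lo, hi] is unique.
    If A is below the value at lo, lo is returned as the smallest usable radix.
    """
    def f(k):
        return k * math.log(k) ** 2 - A

    if f(lo) >= 0:
        return lo
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Hypothetical parameters: 1 Tb/s, 20 ns router delay, 4096 nodes, 128-bit messages.
A = aspect_ratio(B=1e12, t_r=20e-9, N=4096, L=128)
print(f"aspect ratio = {A:.1f}, optimal radix = {optimal_radix(A):.1f}")
```

The result is a real number; a practical design would round it to a convenient port count.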

Higher Aspect Ratio, Higher Optimal Radix

[Figure: optimal radix (k) versus aspect ratio on log-log axes (radix 1 to 1000, aspect ratio 10 to 10000), with points marked for 1991, 1996, 2003, and 2010.]

Higher Radix, Lower Cost

[Figure: cost (# of 1000 channels) versus radix (0 to 250) for 2003 technology and 2010 technology.]

Outline

• Technology Trends for High-Radix Router
• Motivation for High-Radix Router
• High-Radix Router Architectures
  – Baseline Architecture
  – Fully Buffered Crossbar
  – Hierarchical Crossbar
• Conclusion & Future Work

Virtual Channel Router Architecture

High-Radix Switch Architectures (I)

[Figure: (a) Baseline design, with inputs 1 to k and outputs 1 to k.]

Simulation Methodology

• Open-loop simulation done with steady-state measurement
• Output switch arbitration required two cycles
• Wire delay included in the pipeline
• Uniform random traffic pattern
• Each packet was assumed to be a single flit
• Radix-64 router evaluated with 4 VCs per input

• Metrics – latency & throughput
• Cost – area
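To illustrate what an open-loop, steady-state measurement looks like, here is a heavily simplified sketch. It is not the simulator behind these slides: it models a plain input-queued switch with single-cycle random arbitration, no virtual channels, and no wire delay, and every name and default value in it is an assumption. It keeps only the methodology: injection is independent of switch state, warm-up packets are discarded, and only packets created in the measurement window are timed.

```python
import random
from collections import deque

def simulate(k=8, load=0.3, warmup=500, measure=2000, max_cycles=20000, seed=1):
    """Return (average latency in cycles, accepted throughput per port)."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(k)]  # unbounded source queues (open loop)
    latencies = []
    pending = 0  # tagged packets not yet delivered
    for cycle in range(max_cycles):
        # Injection: Bernoulli arrivals, uniform random destinations,
        # single-flit packets. Injection never looks at queue occupancy.
        for q in queues:
            if rng.random() < load:
                tagged = warmup <= cycle < warmup + measure
                q.append((cycle, rng.randrange(k), tagged))
                pending += tagged
        # Arbitration: each output grants one head-of-line requester at random.
        requests = {}
        for i, q in enumerate(queues):
            if q:
                requests.setdefault(q[0][1], []).append(i)
        for inputs in requests.values():
            born, _, tagged = queues[rng.choice(inputs)].popleft()
            if tagged:
                latencies.append(cycle - born + 1)
                pending -= 1
        # Drain phase: keep injecting background traffic until every
        # tagged packet has been delivered.
        if cycle >= warmup + measure and pending == 0:
            break
    avg = sum(latencies) / len(latencies) if latencies else float("nan")
    return avg, len(latencies) / (k * measure)

for load in (0.1, 0.3, 0.5):
    avg, thr = simulate(load=load)
    print(f"offered load {load:.1f}: latency {avg:.2f} cycles, accepted {thr:.3f}")
```

Above the saturation point of this toy switch the tagged packets may not all drain before `max_cycles`, so the reported latency there is only a lower bound.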

Baseline Performance Evaluation

[Figure: latency (cycles) versus offered load (0 to 1) for the low-radix router and the baseline (high-radix) router.]

Low radix better

High-Radix Switch Architectures (II)

[Figure: three panels, each with inputs 1 to k and outputs 1 to k. (a) Baseline design; (b) Fully buffered crossbar; Fully buffered crossbar with shared buffers.]

Fully Buffered Crossbar Provides High Performance but …

[Figure: latency (cycles) versus offered load (0 to 1) for low-radix, baseline, and fully-buffered.]

… Becomes Costly to Build

[Figure: area (mm²) versus radix (0 to 250), showing wire area and buffer area. Buffer area dominates at radix 50.]

Other issues:
1. Credit information
2. Adaptive routing

High-Radix Switch Architectures (III)

[Figure: three panels, each with inputs 1 to k and outputs 1 to k. (a) Baseline design; (b) Fully buffered crossbar; (c) Hierarchical crossbar, with one subswitch labeled.]
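A rough count of buffers shows why the hierarchical organization is cheaper than the fully buffered one. The sketch below rests on assumptions that the slides do not state: that a fully buffered crossbar holds one buffer per virtual channel at each of its k × k crosspoints, and that a hierarchical crossbar is a (k/p) × (k/p) grid of p × p subswitches, each with per-VC buffers on its p inputs and p outputs. It counts buffers, not area, and the function names are hypothetical.

```python
def fully_buffered_buffers(k, v):
    """Assumed model: v buffers (one per VC) at each of the k*k crosspoints."""
    return k * k * v

def hierarchical_buffers(k, p, v):
    """Assumed model: (k/p)^2 subswitches, each buffering v VCs on its
    p inputs and p outputs, for 2*v*k*k/p buffers in total."""
    if p <= 0 or k % p != 0:
        raise ValueError("subswitch size p must evenly divide radix k")
    subswitches = (k // p) ** 2
    return subswitches * 2 * p * v

k, v = 64, 4  # radix and VC count used in the simulation methodology
print("fully buffered:", fully_buffered_buffers(k, v))
for p in (4, 8, 16, 32):
    print(f"subswitch {p:2d}:", hierarchical_buffers(k, p, v))
```

Under these assumptions the buffer count falls in proportion to the subswitch size p, which is the lever the hierarchical design uses to trade a little performance for area.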

Hierarchical Crossbar Performance on Uniform Random Traffic

[Figure: latency (cycles) versus offered load (0 to 1) for baseline, subswitch 32, subswitch 16, subswitch 8, subswitch 4, and fully-buffered.]

Worst-case Traffic Pattern

[Figure: recovered text shows a latency (cycles) versus offered load (0 to 1) plot with legend entries baseline, subswitch 32, subswitch 16, subswitch 8, subswitch 4, and fully-buffered.]

Worst Case Performance Comparison

[Figure: latency (cycles) versus offered load (0 to 1) for baseline, subswitch 32, subswitch 16, subswitch 8, subswitch 4, and fully-buffered.]

4k Node Network Performance

[Figure: latency (cycles) versus offered load (0 to 1) for a low-radix router and a high-radix router.]

High-radix leads to lower zero-load latency

High-Radix Router Microarchitecture

• In building a high-radix router, there are scaling issues as the number of ports increases
• Baseline design provides poor performance
• Fully buffered crossbar leads to high performance but a costly design
• Hierarchical crossbar provides a feasible design with minimal loss in performance

Conclusion

• Performance of digital systems is often limited by the interconnection network
• Increasing on-chip bandwidth can be used more efficiently by high-radix routers
• High-radix routers lead to lower-cost and lower-latency networks
• A high-radix router requires a different architecture
• A hierarchical organization makes high-radix routers feasible

Future Work

• Optimal topology for high-radix network
  – Clos topology suited for high-radix routers
• Routing
  – How to adaptively route in a high-radix router?
• Flow control with high-radix router
• Simulating a larger network
• Microarchitecture
  – Alternatives to crossbars: multistage switch organizations, network-on-chip (torus, mesh, etc.)

Thank you

Questions?
