Microarchitecture of a High-Radix Router
John Kim, William J. Dally, Brian Towles¹, and Amit K. Gupta
Concurrent VLSI Architecture, Stanford University
¹D.E. Shaw Research & Development
ISCA’05 High-Radix Routers
Interconnection Network
• Supercomputer networks: Cray X1
• Router fabrics: Avici TSR
• On-chip networks: MIT RAW
• I/O interconnects: Myrinet/Infiniband
Bandwidth Trend
[Figure: bandwidth per router node (Gb/s, log scale from 0.1 to 10000) versus year (1985 to 2010). Routers plotted: Torus Routing Chip, Intel iPSC/2, J-Machine, CM-5, Intel Paragon XP, Cray T3D, MIT Alewife, IBM Vulcan, Cray T3E, SGI Origin 2000, AlphaServer GS320, IBM SP Switch2, Quadrics QsNet, Cray X1, Velio 3003, IBM HPS, SGI Altix 3000.]
Bandwidth growth is the result of
1. Increase in signaling rate
2. Increase in number of signals
Bandwidth Trend
[Figure: bandwidth per router node (Gb/s, log scale) versus year, 1985 to 2010.]
Approximately an order of magnitude increase in bandwidth every 5 years.
How to effectively utilize the increased bandwidth?
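The trend stated on this slide, roughly 10x every 5 years, implies a compound annual growth factor of 10^(1/5). A minimal Python sketch of that arithmetic follows; the function names are illustrative and do not come from the talk.

```python
def annual_growth_factor(factor=10.0, period_years=5.0):
    """Per-year multiplier implied by `factor` growth every `period_years`."""
    return factor ** (1.0 / period_years)


def projected_bandwidth(start, years, factor=10.0, period_years=5.0):
    """Bandwidth after `years`, assuming the same exponential trend continues."""
    return start * factor ** (years / period_years)


if __name__ == "__main__":
    print(round(annual_growth_factor(), 3))  # 1.585
```

Under this assumption, bandwidth grows by a little under 60% per year.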
Current Interconnection Networks
• Earlier works [Dally ’90, Agarwal ’91] suggested “lower-radix” networks
  – Examples: Torus Routing Chip, Cray T3E, SGI Altix 3000
• Current routers
  – Myrinet: radix-32
  – Quadrics: radix-8
  – IBM SP2 Switch: radix-8
• Technology trends
  – Increasing bandwidth
  – Optical signaling
High-radix routers can take better advantage of these technology trends.
Outline
• Technology Trends for High-Radix Router
• Motivation for High-Radix Router
• High-Radix Router Architectures
  – Baseline Architecture
  – Fully Buffered Crossbar
  – Hierarchical Crossbar
• Conclusion & Future Work
High-Radix Router
[Figure: a low-radix router (small number of fat ports) next to a high-radix router (large number of skinny ports).]
Latency in a Network
Latency = Header Latency + Serialization Latency
= $H t_r + L / b$
= $2 t_r \log_k N + 2kL / B$
where k = radix, B = total bandwidth, N = # of nodes, L = message size, H = number of hops, t_r = router delay, b = bandwidth per channel
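The latency expression on this slide can be evaluated directly. The sketch below is a minimal Python illustration that reads H = 2 log_k N and b = B / (2k) off the two forms of the equation. The parameter values in the usage loop are hypothetical and chosen only to show the trade-off; they are not taken from the talk.

```python
import math


def network_latency(k, B, t_r, N, L):
    """Return (header, serialization) latency for a radix-k network.

    Uses Latency = H*t_r + L/b with H = 2*log_k(N) hops and
    per-channel bandwidth b = B/(2k), as in the slide's formula.
    Units must be consistent (e.g. B in bits/s, t_r in s, L in bits).
    """
    if k < 2:
        raise ValueError("radix must be at least 2")
    hops = 2 * math.log(N) / math.log(k)   # H = 2 log_k N
    channel_bw = B / (2 * k)               # b = B / 2k
    header = hops * t_r
    serialization = L / channel_bw
    return header, serialization


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    for k in (8, 32, 128):
        header, serialization = network_latency(k, B=1e12, t_r=20e-9, N=4096, L=128)
        print(k, header, serialization)
```

As the radix grows, the header term shrinks (fewer hops) while the serialization term grows (narrower channels), which is the tension the optimal radix resolves.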
Latency vs. Radix
[Figure: latency (nsec) versus radix (0 to 250) for 2003 technology and 2010 technology. As radix increases, header latency decreases and serialization latency increases. Optimal radix ~ 40 for 2003 technology and ~ 128 for 2010 technology.]
Determining Optimal Radix
Latency = Header Latency + Serialization Latency
= $H t_r + L / b$
= $2 t_r \log_k N + 2kL / B$
where k = radix, B = total bandwidth, N = # of nodes, L = message size
Optimal radix
⇒ $k \log^2 k = (B t_r \log N) / L$ = Aspect Ratio
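The optimal-radix condition follows from setting the derivative of the latency with respect to k to zero, which is why the sketch below assumes natural logarithms throughout. It solves k ln²k = A by bisection; the function names, search bounds, and tolerance are illustrative choices, not part of the talk.

```python
import math


def aspect_ratio(B, t_r, N, L):
    """Aspect ratio A = B * t_r * ln(N) / L (natural log assumed)."""
    return B * t_r * math.log(N) / L


def optimal_radix(A, lo=2.0, hi=1e6, tol=1e-9):
    """Solve k * ln(k)**2 = A for k by bisection on [lo, hi].

    k * ln(k)**2 is increasing for k > 1, so the root is unique.
    Returns `lo` if A is too small for any radix above `lo`.
    """
    def f(k):
        return k * math.log(k) ** 2 - A

    if f(lo) >= 0:
        return lo
    if f(hi) < 0:
        raise ValueError("aspect ratio too large for the search interval")
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

By this formula, k = 40 corresponds to an aspect ratio of about 544 and k = 128 to about 3013, so a higher aspect ratio pushes the optimum toward a higher radix.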
Higher Aspect Ratio, Higher Optimal Radix
[Figure: optimal radix (k), log scale from 1 to 1000, versus aspect ratio, log scale from 10 to 10000, with points for 1991, 1996, 2003, and 2010 technology.]
Higher Radix, Lower Cost
[Figure: cost (# of 1000 channels), from 0 to 8, versus radix (0 to 250) for 2003 technology and 2010 technology.]
Outline
• Technology Trends for High-Radix Router
• Motivation for High-Radix Router
• High-Radix Router Architectures
  – Baseline Architecture
  – Fully Buffered Crossbar
  – Hierarchical Crossbar
• Conclusion & Future Work
Virtual Channel Router Architecture
High-Radix Switch Architectures (I)
[Figure: (a) Baseline design, a switch with inputs 1 to k and outputs 1 to k.]
Simulation Methodology
• Open-loop simulation with steady-state measurement
• Output switch arbitration requires two cycles
• Wire delay is included in the pipeline
• Uniform random traffic pattern
• Each packet is assumed to be a single flit
• Radix-64 router evaluated with 4 VCs per input
• Metrics: latency & throughput
• Cost: area
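The traffic assumptions listed above (open-loop injection, uniform random destinations, single-flit packets) can be sketched as a small generator. This is a minimal Python illustration, not the simulator used in the talk: the Bernoulli injection process, the function name, and the choice to allow a packet's destination to equal its source are all assumptions made here.

```python
import random


def generate_traffic(num_ports, offered_load, cycles, seed=0):
    """Return a list of (cycle, src, dst) single-flit packets.

    Open-loop: each input injects a packet with probability
    `offered_load` every cycle, independent of network state.
    Uniform random: the destination port is drawn uniformly
    (assumption: the source port itself is a valid destination).
    """
    rng = random.Random(seed)
    packets = []
    for cycle in range(cycles):
        for src in range(num_ports):
            if rng.random() < offered_load:
                dst = rng.randrange(num_ports)
                packets.append((cycle, src, dst))
    return packets


if __name__ == "__main__":
    # Radix-64 router at 50% offered load, as an example operating point.
    trace = generate_traffic(num_ports=64, offered_load=0.5, cycles=1000)
    print(len(trace) / (64 * 1000))  # measured injection rate per port per cycle
```

In a steady-state measurement, an initial warm-up portion of such a trace would typically be discarded before latency and throughput are recorded.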
Baseline Performance Evaluation
[Figure: latency (cycles) versus offered load (0 to 1) for the low-radix router and the baseline (high-radix) router. The low-radix router performs better.]
High-Radix Switch Architectures (II)
[Figure: three switch designs, each with inputs 1 to k and outputs 1 to k: (a) Baseline design; (b) Fully buffered crossbar; Fully buffered crossbar with shared buffers.]
Fully Buffered Crossbar Provides High Performance but …
[Figure: latency (cycles) versus offered load (0 to 1) for the low-radix, baseline, and fully-buffered designs.]
… Becomes Costly to Build
[Figure: area (mm2), from 0 to 200, versus radix (0 to 250), split into wire area and buffer area. Buffer area dominates at radix 50.]
Other issues:
1. Credit Information
2. Adaptive Routing
High-Radix Switch Architectures (III)
[Figure: three switch designs, each with inputs 1 to k and outputs 1 to k: (a) Baseline design; (b) Fully buffered crossbar; (c) Hierarchical crossbar, built from subswitches.]
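To see why grouping crosspoints into subswitches reduces buffering, it helps to count buffers. The Python sketch below rests on an accounting that is assumed here for illustration and is not stated in the slides: a fully buffered crossbar holds one buffer per virtual channel at each of its k × k crosspoints, while a hierarchical crossbar built from p × p subswitches holds one buffer per virtual channel at each subswitch input and output.

```python
def fully_buffered_buffers(k, v):
    """Buffers in a k x k fully buffered crossbar with v VCs.

    Assumption: one buffer per VC at every crosspoint.
    """
    return k * k * v


def hierarchical_buffers(k, p, v):
    """Buffers in a k x k crossbar built from p x p subswitches.

    Assumption: (k/p)**2 subswitches, each with one buffer per VC
    at each of its p inputs and p outputs.
    """
    if k % p != 0:
        raise ValueError("subswitch size p must divide the radix k")
    subswitches = (k // p) ** 2
    return subswitches * 2 * p * v


if __name__ == "__main__":
    k, v = 64, 4  # radix and VC count from the simulation setup
    print("fully buffered:", fully_buffered_buffers(k, v))
    for p in (4, 8, 16, 32):
        print("subswitch", p, ":", hierarchical_buffers(k, p, v))
```

Under this accounting the hierarchical buffer count scales as k²/p rather than k², so larger subswitches need fewer buffers.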
Hierarchical Crossbar Performance on Uniform Random Traffic
[Figure: latency (cycles) versus offered load (0 to 1) for the baseline, subswitch 32, subswitch 16, subswitch 8, subswitch 4, and fully-buffered designs.]
Worst-case Traffic Pattern
[Figure: latency (cycles) versus offered load (0 to 1) for the baseline, subswitch 32, subswitch 16, subswitch 8, subswitch 4, and fully-buffered designs.]
Worst Case Performance Comparison
[Figure: latency (cycles) versus offered load (0 to 1) for the baseline, subswitch 32, subswitch 16, subswitch 8, subswitch 4, and fully-buffered designs.]
4k Node Network Performance
[Figure: latency (cycles) versus offered load (0 to 1) for a low-radix router and a high-radix router.]
High radix leads to lower zero-load latency.
High-Radix Router Microarchitecture
• In building a high-radix router, there are scaling issues as the number of ports increases
• Baseline design provides poor performance
• Fully buffered crossbar leads to high performance but a costly design
• Hierarchical crossbar provides a feasible design with minimal loss in performance
Conclusion
• Performance of digital systems is often limited by the interconnection network
• Increasing on-chip bandwidth can be used more efficiently by high-radix routers
• High-radix routers lead to lower-cost and lower-latency networks
• A high-radix router requires a different architecture
• A hierarchical organization makes high-radix routers feasible
Future Work
• Optimal topology for high-radix networks
  – Clos topology is suited for high-radix routers
• Routing: how to adaptively route in a high-radix router?
• Flow control with high-radix routers
• Simulating a larger network
• Microarchitecture: alternatives to crossbars, such as multistage switch organizations and network-on-chip (torus, mesh, etc.)
Thank you
Questions?