Routers • How do routers process IP packets • How do you build a 15-744: Computer Networking • Assigned reading • [P+98] A 50 Gb/s IP Router • [D+97] Small Forwarding Tables for Fast Lookups L-4 Routers

© Srinivasan Seshan, 2001 L -4; 1-24 -01 2

Forwarding vs. Routing Outline

• Forwarding: the process of moving packets from • Alternative methods for input to output • The forwarding table • IP packet routing • Information in the packet • Routing: process by which the forwarding table is • Variable prefix match built and maintained • One or more routing protocols • Procedures (algorithms) to convert routing info to • IP router design forwarding table. • Routing protocols – distance vector

© Srinivasan Seshan, 2001 L -4; 1-24 -01 3 © Srinivasan Seshan, 2001 L -4; 1-24 -01 4

Techniques for Forwarding Packets Source Routing • Source routing • List entire path in packet • Packet carries path • Driving directions (north 3 hops, east, etc..) • Table of virtual circuits • Router processing • Connection routed through network to setup • Examine first step in directions state • Strip first step from packet • Packets forwarded using connection state • Forward to step just stripped off • Table of global addresses (IP) • Routers keep next for destination • Packets carry destination address

© Srinivasan Seshan, 2001 L -4; 1-24 -01 5 © Srinivasan Seshan, 2001 L -4; 1-24 -01 6

1 Source Routing Example Source Routing • Advantages • Switches can be very simple and fast • Disadvantages Packet 3,4,3 4,3 2 2 Sender R1 R1 • Variable (unbounded) header size 1 3 1 3 4 4 • Sources must know or discover topology (e.g., 3 failures) 2 1 R2 Receiver 3 • Typical use 4 • Ad-hoc networks (DSR) • Machine room networks (Myrinet)

© Srinivasan Seshan, 2001 L -4; 1-24 -01 7 © Srinivasan Seshan, 2001 L -4; 1-24 -01 8

Virtual Circuits/Tag Switching Virtual Circuits Examples • Connection setup phase • Use other means to route setup request • Each router allocates flow ID on local link • Creates mapping of inbound flow ID/port to outbound Packet 5 7 flow ID/port 2 2

Sender R1 R1 1,7 à 4,2 1 3 1 3 • Each packet carries connection ID 4 4 st • Sent from source with 1 hop connection ID 1,5 à 3,7 2 • Router processing 2 1 R2 Receiver 3 • Lookup flow ID – simple table lookup 4 6 • Replace flow ID with outgoing flow ID 2,2 à 3,6 • Forward to output port

© Srinivasan Seshan, 2001 L -4; 1-24 -01 9 © Srinivasan Seshan, 2001 L -4; 1-24 -01 10

Virtual Circuits IP Datagrams on Virtual Circuits • Advantages • Challenge – when to setup connections • More efficient lookup (simple table lookup) • At bootup time – permanent virtual circuits • More flexible (different path for each flow) (PVC) • Can reserve bandwidth at connection setup • Easier for hardware implementations • Large number of circuits • Disadvantages • For every packet transmission • Still need to route connection setup request • Connection setup is expensive • More complex failure recovery – must recreate • For every connection connection state • What is a connection? • Typical uses • How to route connectionless traffic? • ATM – combined with fix sized cells • MPLS – tag switching for IP networks

© Srinivasan Seshan, 2001 L -4; 1-24 -01 11 © Srinivasan Seshan, 2001 L -4; 1-24 -01 12

2 IP Datagrams on Virtual Circuits Global Addresses (IP) • Traffic pattern • Each packet has destination address • Few long lived flows • Each switch has forwarding table of • Flow – set of data packets from source to destination à next hop destination • At v and x: destination à east • Large percentage of packet traffic • At w and y: destination à south • Improving forwarding performance by using • At z: destination à north virtual circuits for these flows • Distributed routing algorithm for calculating • Other traffic uses normal IP forwarding forwarding tables

© Srinivasan Seshan, 2001 L -4; 1-24 -01 13 © Srinivasan Seshan, 2001 L -4; 1-24 -01 14

Global Address Example Router Table Size • One entry for every host on the • 100M entries,doubling every year • One entry for every LAN Packet R R 2 2

Sender R1 R1 R à 4 • Every host on LAN shares prefix 1 3 1 3 4 4 • Still too many, doubling every year R à 3 R

2 • One entry for every organization 1 R2 Receiver 3 4 R • Every host in organization shares prefix R à 3 • Requires careful address allocation

© Srinivasan Seshan, 2001 L -4; 1-24 -01 15 © Srinivasan Seshan, 2001 L -4; 1-24 -01 16

Outline Original IP Route Lookup • Alternative methods for packet forwarding • Address classes • A: 0 | 7 bit network | 24 bit host (16M each) • IP packet routing • B: 10 | 14 bit network | 16 bit host (64K) • C: 110 | 21 bit network | 8 bit host (255) • Variable prefix match • Address would specify prefix for forwarding table • IP router design • Simple lookup

• Routing protocols – distance vector

© Srinivasan Seshan, 2001 L -4; 1-24 -01 17 © Srinivasan Seshan, 2001 L -4; 1-24 -01 18

3 Original IP Route Lookup – Example CIDR Revisited • www.cmu.edu address 128.2.11.43 • Supernets • Class B address – class + network is 128.2 • Assign adjacent net addresses to same org • Lookup 128.2 in forwarding table • Classless routing (CIDR) • Prefix – part of address that really matters for routing • How does this help ? • Forwarding table contains • Combine routing table entries whenever all nodes with same prefix share same hop • List of class+network entries • A few fixed prefix lengths (8/16/24) • Large tables • 2 Million class C networks

© Srinivasan Seshan, 2001 L -4; 1-24 -01 19 © Srinivasan Seshan, 2001 L -4; 1-24 -01 20

CIDR Example CIDR Illustration • Network provide is allocated 8 class C chunks, 201.10.0.0 to 201.10.7.255 Provider is given 201.10.0.0/21 • Allocation uses 3 bits of class C space • Remaining 21 bits are network number, written Provider as 201.10.0.0/21 • Replaces 8 class C routing entries with 1 combined entry 201.10.0.0/22 201.10.4.0/24 201.10.5.0/24 201.10.6.0/23 • Routing protocols carry prefix with destination network address • Longest prefix match for forwarding

© Srinivasan Seshan, 2001 L -4; 1-24 -01 21 © Srinivasan Seshan, 2001 L -4; 1-24 -01 22

CIDR Shortcomings Routing to the Network • Multi-homing • Packet to 10.1.1.2 • Customer selecting a new provider 10.1.1.4 10.1.1.3 10.1.1.3 arrives H1 H2 • Path is R2 – R1 – 10.1.1/24 10.1.0.2 10.1.0.1 10.1.1.1 R1 H3 H1 – H2 10.1.2.2 201.10.0.0/21 10.1.0/24

Provider 1 Provider 2 10.1.2/23 10.1/16 10.1.8/24 Provider R2 10.1.8.1 10.1.2.1 10.1.16.1 H4 10.1.8.4

201.10.0.0/22 201.10.4.0/24 201.10.5.0/24 201.10.6.0/23 or Provider 2 address

© Srinivasan Seshan, 2001 L -4; 1-24 -01 23 © Srinivasan Seshan, 2001 L -4; 1-24 -01 24

4 Routing Within the Subnet Routing Within the Subnet

10.1.1.2 10.1.1.2 • Packet to 10.1.1.3 10.1.1.4 10.1.1.3 • Packet to 10.1.1.3 10.1.1.4 10.1.1.3 • Matches 10.1.0.0/23 H1 H2 • Matches 10.1.1.1/31 H1 H2 10.1.1/24 10.1.1/24 10.1.0.2 10.1.0.2 Routing table at R2 10.1.0.1 • Longest prefix match 10.1.0.1 10.1.1.1 R1 H3 10.1.1.1 R1 H3 10.1.2.2 10.1.2.2 Destination Next Hop Interface Routing table at R1 10.1.0/24 10.1.0/24 127.0.0.1 127.0.0.1 lo0 Destination Next Hop Interface 10.1.2/23 127.0.0.1 127.0.0.1 lo0 10.1.2/23 Default or 0/0 provider 10.1.16.1 10.1/16 10.1.8/24 10.1/16 10.1.8/24 R2 R2 10.1.8.0/24 10.1.8.1 10.1.8.1 Default or 0/0 10.1.2.1 10.1.2.2 10.1.8.1 10.1.8.1 10.1.2.0/23 10.1.2.1 10.1.2.1 10.1.2.1 10.1.0.0/24 10.1.0.1 10.1.0.1 10.1.2.1 10.1.16.1 H4 10.1.16.1 H4 10.1.1.0/24 10.1.1.1 10.1.1.4 10.1.0.0/23 10.1.2.2 10.1.2.1 10.1.8.4 10.1.8.4 10.1.2.0/23 10.1.2.2 10.1.2.2 10.1.1.2/31 10.1.1.2 10.1.1.2

© Srinivasan Seshan, 2001 L -4; 1-24 -01 25 © Srinivasan Seshan, 2001 L -4; 1-24 -01 26

Routing Within the Subnet Global Addresses • Advantages 10.1.1.2 • Packet to 10.1.1.3 10.1.1.4 10.1.1.3 • Stateless – simple error recovery • Direct route H1 H2 10.1.1/24 • Disadvantages 10.1.0.2 • Longest prefix match 10.1.0.1 • Every switch knows about every destination 10.1.1.1 R1 H3 10.1.2.2 Routing table at H1 10.1.0/24 • Potentially large tables Destination Next Hop Interface 10.1.2/23 • All packets to destination take same route 127.0.0.1 127.0.0.1 lo0 10.1/16 10.1.8/24 R2 Default or 0/0 10.1.1.1 10.1.1.2 10.1.8.1 10.1.2.1 10.1.1.0/24 10.1.1.2 10.1.1.1 10.1.16.1 H4 10.1.1.3/31 10.1.1.2 10.1.1.2 10.1.8.4

© Srinivasan Seshan, 2001 L -4; 1-24 -01 27 © Srinivasan Seshan, 2001 L -4; 1-24 -01 28

Comparison How do we set up Routing Tables? • Graph theory to compute “shortest path”

Source Routing Global Addresses Virtual Circuits • Switches = nodes

Header Size Worst OK – Large address Best • Links = edges

Number of hosts Router Table Size None Number of circuits • Delay, hops = cost (prefixes)

Forward Overhead Best Prefix matching Pretty Good • Need to adapt to changes in topology

Setup Overhead None None Connection Setup

Tell all routers and Error Recovery Tell all hosts Tell all routers Tear down circuit and re-route

© Srinivasan Seshan, 2001 L -4; 1-24 -01 29 © Srinivasan Seshan, 2001 L -4; 1-24 -01 30

5 Outline How To Do Variable Prefix Match • Alternative methods for packet forwarding • Traditional method – Patricia Tree • Arrange route entries into a series of bit tests • IP packet routing • Worst case = 32 bit tests • Problem: memory speed is a bottleneck • Variable prefix match 0 Bit to test – 0 = left child,1 = right child 10 • IP router design default 0/0 16 128.2/16 19 • Routing protocols – distance vector 128.32/16 128.32.130/240 128.32.150/24

© Srinivasan Seshan, 2001 L -4; 1-24 -01 31 © Srinivasan Seshan, 2001 L -4; 1-24 -01 32

Speeding up Prefix Match (P+98) Prefix Tree • Cut prefix tree at 16 bit depth • 64K bit mask • Bit = 1 if tree continues below cut (root head) • Bit = 1 if leaf at depth 16 or less (genuine head) • Bit = 0 if part of range covered by leaf

1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Port 1 Port 5 Port 7 Port 9 Port 3 Port 5

© Srinivasan Seshan, 2001 L -4; 1-24 -01 33 © Srinivasan Seshan, 2001 L -4; 1-24 -01 34

Prefix Tree Speeding up Prefix Match (P+98) • Each 1 corresponds to either a route or a subtree • Keep array of routes/pointers to subtree • Need index into array – how to count # of 1s • Keep running count to 16bit word in base index + code word (6 bits) 1 0 0 0 1 0 1 1 1 0 0 0 1 1 1 1 • Need to count 1s in last 16bit word 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 • Clever tricks Subtree 1 Subtree 2 • Subtrees are handled separately Subtree 3

© Srinivasan Seshan, 2001 L -4; 1-24 -01 35 © Srinivasan Seshan, 2001 L -4; 1-24 -01 36

6 Speeding up Prefix Match (P+98) Speeding up Prefix Match - Alternatives • Scaling issues • Route caches • How would it handle IPv6 • Temporal locality • Other possiblities • Many packets to same destination • Other algorithms • Why were the cuts done at 16/24/32 bits? • Waldvogel – Sigcomm 97 • Improve data structure by shuffling bits • Binary search on hash tables • Works well for larger adresses • Bremler-Barr – Sigcomm 99 • Clue = prefix length matched at previous hop • Why is this useful?

© Srinivasan Seshan, 2001 L -4; 1-24 -01 37 © Srinivasan Seshan, 2001 L -4; 1-24 -01 38

Speeding up Prefix Match - Alternatives Outline • Content addressable memory (CAM) • Alternative methods for packet forwarding • Hardware based route lookup • Input = tag, output = value associated with tag • IP packet routing • Requires exact match with tag • Multiple cycles (1 per prefix searched) with single CAM • Variable prefix match • Multiple CAMs (1 per prefix) searched in parallel • Ternary CAM • IP router design • 0,1,don’t care values in tag match • Priority (I.e. longest prefix) by order of entries in CAM • Routing protocols – distance vector

© Srinivasan Seshan, 2001 L -4; 1-24 -01 39 © Srinivasan Seshan, 2001 L -4; 1-24 -01 40

What Does a Router Look Like? Router Processing (P+88) • Line cards • Packet arrives arrives at inbound line card • Network interface cards • Header transferred to forwarding engine • Forwarding engine • 24/56 bytes of packet + link layer info • Fast path routing (hardware vs. software) • Forwarding engine transmits result to line • Backplane card • Switch or bus interconnect • Packet copied to outbound line card • Network controller • Handles routing protocols, error conditions

© Srinivasan Seshan, 2001 L -4; 1-24 -01 41 © Srinivasan Seshan, 2001 L -4; 1-24 -01 42

7 Forwarding Engine (P+88) Forwarding Engine (P+88) • General purpose processor + software • Checksum updated but not checked • 8KB L1 Icache • Options handled by network proc • Holds full forwarding code • Fragmentation handed by network • 96KB L2 cache processor • Forwarding table cache • packets are copied on input line • 16MB L3 cache card • Full forwarding table x 2 - double buffered for • Packet trains help route hit rate updates • Packet train = sequence of packets for same/similar flows

© Srinivasan Seshan, 2001 L -4; 1-24 -01 43 © Srinivasan Seshan, 2001 L -4; 1-24 -01 44

Network Processor Switch Design Issues • Runs and downloads • Have N inputs and M outputs forwarding table to forwarding engines • Multiple packets for same output – output contention • Two forwarding tables per engine to allow easy switchover • Switch contention – switch cannot support arbitrary set of transfers • Performs “slow” path processing • Crossbar • Handles ICMP error messages • Bus • High clock/transfer rate needed for bus • Handles IP option processing • Banyan net • Complex scheduling needed to avoid switch contention • Solution – buffer packets where needed

© Srinivasan Seshan, 2001 L -4; 1-24 -01 45 © Srinivasan Seshan, 2001 L -4; 1-24 -01 46

Switch Buffering Line Card Interconnect (P+88) • Input buffering • Virtual output buffering • Which inputs are processed each slot – schedule? • Maintain per output buffer at input • Head of line packets destined for busy output blocks other packets • Solves head of line blocking problem • Output buffering • Each of MxN input buffer places bid for output • Output may receive multiple packets per slot • Crossbar connect • Need speedup proportional to # inputs • Challenge: map of bids to schedule for • Internal buffering crossbar • Head of line blocking • Amount of buffering needed

© Srinivasan Seshan, 2001 L -4; 1-24 -01 47 © Srinivasan Seshan, 2001 L -4; 1-24 -01 48

8 Switch Scheduling (P+88) Outline • Schedule for 128 byte slots • Alternative methods for packet forwarding • Greedy mapping of inputs to outputs • Fairness • IP packet routing • Order of greedy matching permuted randomly • Priority given to forwarding engine in schedule • Variable prefix match (why?) • Parallelized • IP router design • Check independent paths simultaneously • Routing protocols – distance vector

© Srinivasan Seshan, 2001 L -4; 1-24 -01 49 © Srinivasan Seshan, 2001 L -4; 1-24 -01 50

Factors Affecting Routing Two Main Approaches

• Routing algorithms view the network as • Distance-vector (DV) protocols a graph

• Problem: find lowest A 6 cost path between 3 1 • Link state (LS) protocols 2 F 1 E two nodes B 4 • Factors 1 C 9 D • Static topology • Dynamic load • Policy

© Srinivasan Seshan, 2001 L -4; 1-24 -01 51 © Srinivasan Seshan, 2001 L -4; 1-24 -01 52

Distance Vector Protocols Distributed Bellman-Ford

• Start Conditions: • Employed in the early Arpanet •Each router starts with a vector of (zero) distances to all directly • Distributed next hop computation attached networks • Unit of information exchange • Send step: •Each router advertises its current vector to all neighboring routers • Vector of distances to destinations • Receive step: • Distributed Bellman-Ford Algorithm •Upon receiving vectors from each of its neighbors, router computes its own distance to each neighbor •Then, for every network X, router finds that neighbor who is closer to X than to any other neighbor •Router updates its cost to X •After doing this for all X, router goes to send step

© Srinivasan Seshan, 2001 L -4; 1-24 -01 53 © Srinivasan Seshan, 2001 L -4; 1-24 -01 54

9 Example - Initial Distances E Receives D’s Routes; Updates Cost

1 1 B C Distance to B C Distance to Node Info at Info at 7 Node A B C D E 7 Node A B C D E

A 0 7 ~ ~ 1 A 0 7 ~ ~ 1

A 8 2 B 7 0 1 ~ 8 A 8 2 B 7 0 1 ~ 8 C ~ 1 0 2 ~ C ~ 1 0 2 ~ 1 2 D ~ ~ 2 0 2 1 2 D ~ ~ 2 0 2 E E D D E 1 8 ~ 2 0 E 1 8 4 2 0

© Srinivasan Seshan, 2001 L -4; 1-24 -01 55 © Srinivasan Seshan, 2001 L -4; 1-24 -01 56

A receives B’s; Updates Cost A receives E’s routes; Updates Costs

1 1 B C Distance to Node B C Distance to Node Info at Info at 7 Node A B C D E 7 Node A B C D E

A 0 7 8 ~ 1 A 0 7 5 3 1

A 8 2 B 7 0 1 ~ 8 A 8 2 B 7 0 1 ~ 8 C ~ 1 0 2 ~ C ~ 1 0 2 ~ 1 2 D ~ ~ 2 0 2 1 2 D ~ ~ 2 0 2 E D E D E 1 8 4 2 0 E 1 8 4 2 0

© Srinivasan Seshan, 2001 L -4; 1-24 -01 57 © Srinivasan Seshan, 2001 L -4; 1-24 -01 58

Final Distances View From a Node

1 1 B C Distance to Node B C E’s routing table Info at 7 Node A B C D E 7 Next hop

A 0 6 5 3 1 dest A B D

A 8 2 B 6 0 1 3 5 A 8 2 A 1 14 5 C 5 1 0 2 4 B 7 8 5 1 2 D 3 3 2 0 2 1 2 C 6 9 4 E D E D E 1 5 4 2 0 D 4 11 2

© Srinivasan Seshan, 2001 L -4; 1-24 -01 59 © Srinivasan Seshan, 2001 L -4; 1-24 -01 60

10 Final Distances After Link Failure Next Lecture: Intra-Domain Routing • Routing algorithms • Distance vector routing – challenges 1 B C Distance to Node • Link state routing Info at 7 Node A B C D E • How to make routing adapt to load A 0 7 8 10 1 • How to make routing scale A 8 2 B 7 0 1 3 8 C 8 1 0 2 9 • Assigned reading 1 2 D 10 3 2 0 11 • [KZ89] The revised ARPANET routing metric E X D E 1 8 9 11 0

© Srinivasan Seshan, 2001 L -4; 1-24 -01 61 © Srinivasan Seshan, 2001 L -4; 1-24 -01 62

11