ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

Link State

Jean‐Yves Le Boudec 2014

1 Contents

1. Link state 2. OSPF and Hierarchical routing with areas 3. Dynamic metrics and Braess paradox

2 1. Link State Routing

Principle of link state routing each keeps a topology database of whole network link state updates flooded, or multicast to all network routers compute their routing tables based on topology often uses Dijkstra’s shortest path algorithm Used in OSPF (Open Shortest Path First), IS‐IS (similar to OSPF) and PNNI (ATM )

3 (a) Topology Database Synchronization

Neighbouring nodes synchronize before starting any relationship Hello protocol; keep alive initial synchronization of database description of all links (no information yet) Once synchronized, a node accepts link state advertisements contain a sequence number, stored with record in the database only messages with new sequence number are accepted accepted messages are flooded to all neighbours sequence number prevents anomalies (loops or blackholes)

4 Example network Each router knows directly connected networks

n3 n6

B D E

n2 n4 n5 n7

A C F

n1

5 Initial routing tables

D E net type net type B n6 Ether n6 Ether net type n5 P-to-P n7 Ether n3 n6 n3 Ether n2 P-to-P n4 P-to-P B D E

n2 n4 A n5 n7 net type F n1 Ether A C F n2 P-to-P net type C n1 Ether n7 Ether n1 net type n1 Ether n4 P-to-P n5 P-to-P 6 After Flooding The local metric information is flooded to all routers After , all routers have the same information rtr net cost n6 A n1 10 n3 A n2 100 B n3 10 B n2 100 B D B n4 100 E C n1 10 C n4 100 n2 n4 C n5 100 n5 n7 D n6 10 D n5 100 E n6 10 E n7 10 A C F F n1 10 F n7 10 n1

7 (b) From Topology Database to Graph Arrows routers‐to‐nets with a given metric except point‐to‐point, stub, and external networks From nets to routers, metric = 0 external network n3 54 0 10 100 0 n6 stub network B D 10 E external network 100 100 0 10 10 n7 0 100 100 10 point to point 100 broadcast link A C F network 0 10 10 10 n1 0 0 8 (b) Path Computation Performed locally, based on topology database Computes one or several best paths to every destination from this node Best Path = shortest for OSPF OSPF uses Dijkstra’s shortest path the best known algorithm for centralized operation Paths are computed independently at every node synchronization of databases guarantees absence of persistent loops every node computes a shortest path tree rooted at self

9 Simplified graph Only arrows with metrics between routers Every node executes the shortest path computation on the graph – same graph, but different sources

100 100 100 10

A 10 C 10 F

10

10 Dijkstra’s Shortest Path Algorithm

The nodes are 0...N and the algorithm computes best paths from node 0 c(i,j) is the cost of (i,j), pred(i) is the predecessor of node i on the tree M being built m(j) is the distance from node 0 to node j.

m(0) = 0; M = {0}; for k=1 to N { find (i0, j0) that minimizes m(i) + c(i,j), with i in M, j not in M m(j0) = m(i0) + c(i0, j0) pred(j0) = i0 M = M  {j0} }

11 Example: Dijkstra at A

init: M = { A } B D 10 E step 1: i0=A 100 100 100 10 j0=C m(C)=10 M = {A, C} A 10 C 10 F m(C)=10 m(A)=0 10

12 Next, which node is added to M ?

1. F 2. E 3. D 4. B

5. I don’t know 0% 0% 0% 0% 0% 1. 2. 3. 4. 5. 13 Example: Dijkstra at A

m(F)=20 i0=F 10 B D E j0=E m(E)=20 M = {A,C,F,E} 100 100 100 10

A 10 C 10 F m(C)=10 m(A)=0 m(F)=10 10

15 Example: Dijkstra at A

m(D)=30 m(E)=20 i0=E 10 B D E j0=D m(D)=40 M = {A,C,F,E,D} 100 100 100 10

A 10 C 10 F m(C)=10 m(A)=0 m(F)=10 10

16 Example: Dijkstra at A

m(B)=100 m(D)=30 m(E)=20 i0=A 10 B D E j0=B m(B)=100 M = {A,C,F,E,D,B} 100 100 100 10

A 10 C 10 F m(C)=10 m(A)=0 m(F)=10 10

17 at A

A net next n3 n6 n1 direct n2 direct n3 B B D n4 C E n5 C n6 F n2 n4 n7 n7 F n5

A C F

n1

18 Dijkstra’s Algorithm At C

m(B)=100 m(D)=30 B D 10 E m(F)=20

100 100 100 10

10 10 At A A C F m(A)=10 m(C)=0 m(F)=10 10

19 Routing Tables at C

n3 n6

B D E

n2 n4 n5 n7 C A C F

net next n1 n1 direct n2 A n3 B n4 direct n5 direct n6 F n7 F back

20 Changes to Topology

Changes to topology (e.g. link failures) cause routers to send new Link State Advertisements All routers update their topology database and propagate the change to all their neighbours LSA sequence number is used to avoid loops in the propagation One router that has received an LSA already does not propagate it further Changes to topology database trigger re‐computation of shortest‐paths

21 Link State Routing can be used for non‐standard operations Example: assume you want to bridge VLANs across a campus One solution: tunnel MAC packets in IP Problem: automatic creation of tunnels

VLAN2

R3 VLAN1 R6 VLAN2 R1 VLAN2 R4 VLAN1 R7 R2

R5

VLAN1

22 Can you imagine a solution using Link State Routing in R1, R2, … ?

1. Routers R1, R2 … discover which VLAN is active on any of their ports and put this information in the topology database 2. Routers R1, R2 … overhear all MAC source addresses and put the information in the topology database 3. Both of these solutions seem bad to me 4. I don’t know

0% 0% 0% 0% 1. 2. 3. 4. 23 2. The OSPF Protocol and Hierarchical Routing

OSPF (Open Shortest Path First) IETF standard for internal routing used in large networks (ISPs), in MPLS and in TRILL (Cisco VLAN interconnection)

OSPF uses Link State protocol + Hierarchical

26 OSPF and hierarchical routing Why divide large networks? Cost of computing routing tables update when topology changes SPF algorithm n routers, k links complexity O(n*k) size of DB, update messages grows with the network size Use hierarchical routing to limit the scope of updates and computational overhead divide the network into several areas independent route computing in each area inject aggregated information on routes into other areas We explain hierarchical routing the OSPF way IS‐IS does things a bit differently

27 Hierarchical Routing An OSPF domain is configured in areas one backbone area (area 0) plus zero or several non backbone areas (areas numbered other than 0) All inter‐area traffic goes through area 0 strict hierarchy Inside one area: link state routing as seen earlier one topology database per area X1 X1

B1 A1 X4 X3 area 0 area 1 area 2

X4 X3 A2 B2

28 Principles Routing method used in the higher level: distance vector no problem with loops ‐ one backbone area Mapping of higher level nodes to lower level nodes area border routers (inter‐area routers) belong to both areas Inter‐level routing information summary link state advertisements (LSA) from other areas are injected into the local topology databases

29 Assume networks n1 and n2 become visible at time 0. Which are the topology databases at A1 ?

n1, d=11 D1 n2, d=17 1 n1, d=17 n2, d=11

n1, d=29 n1, d=29 n2, d=23 n2, d=45 1 n1, d=23 n1, d=23 n2, d=17 n2, d=17 D2 D3

1. D1 2. D2 3. D3 4. I don’t know 0% 0% 0% 0% 1. 2. 3. 4. 30 Comments Distance vector computation causes none of the RIP problems strict hierarchy: no loop between areas External and summary LSA for all reachable networks are present in all topology databases of all areas most LSAs are external can be avoided in configuring some areas as terminal: use default entry to the backbone Area partitions require specific support partition of non‐backbone area is handled by having the area 0 topology database keep a map of all area connected components partition of backbone cannot be repaired; it must be avoided; can be handled by backup virtual area 0 links through non backbone area

33 3. Dynamic Metrics Does a routing protocol minimize network utility ?

1. Yes, because it minimizes the cost to destination 2. Yes if TCP is used because it ensures fairness 3. No 4. I don’t know

0% 0% 0% 0%

1. 2. 3. 4. 35 Dynamic Metrics

Some proposed to use dynamic metrics for improving over shortest path high load on a link => high cost => link is less used This is used by EIGRP But there may be some issues Braess paradox

37 Least Delay Routing and Wardrop Equilibirum

Assume all (ms) traffic on this link flows pick the route with (Gb/s) shortest delay Assume parallel paths exist and Delay = flows can make use of them Eventually, Link 3 there will be an Delay = equilibrium Link 1 (called ? Delay = “Wardrop Link 4 Equilibrium”) ? such that delay is equal on all Link 2 competing Delay = routes 38 Which is a Wardrop Equilibrium for this Network ?

4. None of the above Delay =

Link 3 Delay = Link 1 ? Gb/s Delay = Link 4 ? Link 2 0% 0% 0% 0% Delay = 1. 2. 3. 4. 39 Now introduce link 5

Link 5 has delay function i.e. short delay and high capacity There are now 3 paths: 13, 154 and 24 Assume we start from previous equilibrium Delay = Is this a Wardrop equilibrium ? Link 3 Delay = Delay = Link 1 Link 5 Gb/s Delay = Link 4

Link 2

Delay =

41 Is a Wardrop Equilibrium ?

1. Yes 2. No 3. I don’t know

0% 0% 0%

1. 2. 3. 42 What is the Wardrop Equilibrium now ?

delay equations total flow

Solution : Gb/s Delay now is 92 ms on all routes

44 Braess Paradox

With shortest delay routing: disable link 5: delay = 83 ms enable link 5 : delay = 92 ms Adding capacity made things worse This is called Braess paradox Shortest delay routing is not optimal

45 Optimal Routing

One can change the objective of routing: instead of computing shortest paths, one could solve a global optimization problem maximizing a utility function: minimize total delay subject to flow constraints this is a well posed optimization problem the optimal solution depends on all flows but it can be implemented in a distributed algorithm similar to TCP congestion control ; see [BertsekasGallager92]

This can be solved using an offline optimization procedure that computes optimal paths for all traffic flows and downloads the routes into all routers

This can be done with “Software Defined Networking” (such as OpenFlow)

46 Conclusion

Link State Routing is an alternative to distance vector more complex allows more control over the chosen paths

Shortest path routing may not be globally optimal and may need to be complemented with offline optimization methods

47