Chapter 7 Packet-Switching Networks

Network Services and Internal Network Operation Packet Network Topology Datagrams and Virtual Circuits Routing in Packet Networks Shortest Path Routing ATM Networks Traffic Management

Network Layer

 Network Layer: the most complex layer  Requires the coordinated actions of multiple, geographically distributed network elements (switches & routers)  Must be able to deal with very large scales  Billions of users (people & communicating devices)  Biggest Challenges  Addressing: where should information be directed to?  Routing: what path should be used to get information there?


t1 t0

Network

 Transfer of information as payload in data packets  Packets undergo random delays & possible loss  Different applications impose differing requirements on the transfer of information

Network Service

Figure: protocol stacks at end systems α and β and at the intermediate network nodes. Messages are passed down to the transport layer, segments to the network layer; the network, data link, and physical layers operate at every node and together provide the network service to the transport layer.

 Network layer can offer a variety of services to transport layer  Connection-oriented service or connectionless service  Best-effort or delay/loss guarantees

2 Network Service vs. Operation

Network Service:
 Connectionless  Datagram transfer
 Connection-Oriented  Reliable and possibly constant bit rate transfer

Internal Network Operation:
 Connectionless  IP
 Connection-Oriented  Telephone connection, ATM

Various combinations are possible  Connection-oriented service over Connectionless operation  Connectionless service over Connection-Oriented operation  Context & requirements determine what makes sense

The End-to-End Argument

 An end-to-end function is best implemented at a higher level than at a lower level  End-to-end service requires all intermediate components to work properly  Higher-level better positioned to ensure correct operation  Example: stream transfer service  Establishing an explicit connection for each stream across network requires all network elements (NEs) to be aware of connection; All NEs have to be involved in re- establishment of connections in case of network fault  In connectionless network operation, NEs do not deal with each explicit connection and hence are much simpler in design

3 Network Layer Functions

What are essentials?

 Routing: mechanisms for determining the set of best paths for routing packets requires the collaboration of network elements

 Forwarding: transfer of packets from NE inputs to outputs

 Priority & Scheduling: determining order of packet transmission in each NE

Optional: congestion control, segmentation & reassembly, security

End-to-End Packet Network

 Packet networks very different than telephone networks  Individual packet streams are highly bursty  Statistical multiplexing is used to concentrate streams  User demand can undergo dramatic change  Peer-to-peer applications stimulated huge growth in traffic volumes  Internet structure highly decentralized  Paths traversed by packets can go through many networks controlled by different organizations  No single entity responsible for end-to-end service

4 Access Multiplexing

Access MUX To packet network

 Packet traffic from users multiplexed at access to network into aggregated streams  DSL traffic multiplexed at DSL Access Mux  Cable modem traffic multiplexed at Cable Modem Termination System

Oversubscription

 Access Multiplexer
 N subscribers, each connected @ c bps to the mux
 Each subscriber active r/c of the time (average rate r)
 Mux has C = nc bps to the network
 Oversubscription rate: N/n
 Find n so that there is at most 1% overflow probability

Feasible oversubscription rate increases with size

N     r/c    n    N/n
10    0.01   1    10     (10 extremely lightly loaded users)
10    0.05   3    3.3    (10 very lightly loaded users)
10    0.1    4    2.5    (10 lightly loaded users)
20    0.1    6    3.3    (20 lightly loaded users)
40    0.1    9    4.4    (40 lightly loaded users)
100   0.1    18   5.5    (100 lightly loaded users)

What is the probability that k users transmit at the peak rate?
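The table can be checked with a short calculation. A minimal sketch (my own, not from the slides), assuming each of the N subscribers is independently active with probability p = r/c and that "overflow" means more than n subscribers are active at once:

from math import comb

def overflow_prob(N, p, n):
    # P(more than n of N independent subscribers are active at the same time)
    return sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(n + 1, N + 1))

def smallest_n(N, p, target=0.01):
    # smallest n giving at most `target` overflow probability
    for n in range(N + 1):
        if overflow_prob(N, p, n) <= target:
            return n
    return N

for N, p in [(10, 0.01), (10, 0.05), (10, 0.1), (20, 0.1), (40, 0.1), (100, 0.1)]:
    n = smallest_n(N, p)
    print(f"N={N:3d}  r/c={p:<5}  n={n:2d}  N/n={N/n:.1f}")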

5 Home LANs

WiFi

Ethernet Home Router To packet network

How about network addressing?  Home Router  LAN Access using Ethernet or WiFi (IEEE 802.11)  Private IP addresses in Home (192.168.0.x) using Network Address Translation (NAT)  Single global IP address from ISP issued using Dynamic Host Configuration Protocol (DHCP)

Campus Network

Figure: a campus network with a gateway router to the Internet or wide area network, a high-speed backbone of routers (R) and switches (S), departmental routers and servers, and LAN switches (s). Annotations: organization servers have redundant connectivity to the backbone; the high-speed campus backbone net connects departmental routers; only outgoing packets leave a LAN through its router.

6 Connecting to ISP

Internet service provider

Border routers

Figure: the campus network connects to the ISP through border routers.
 Interdomain level: between autonomous systems (domains)
 Intradomain level: within an autonomous system or domain  a network administered by a single organization, running the same routing protocol

Key Role of Routing

How to get packet from here to there?  Decentralized nature of Internet makes routing a major challenge  Interior gateway protocols (IGPs) are used to determine routes within a domain  Exterior gateway protocols (EGPs) are used to determine routes across domains  Routes must be consistent & produce stable flows  Scalability required to accommodate growth  Hierarchical structure of IP addresses essential to keeping size of routing tables manageable

7 Chapter 7 Packet-Switching Networks

Datagrams and Virtual Circuits

Packet Switching Network

Packet switching network:
 Transfers packets between users
 Transmission lines + packet switches (routers)
 Origin in message switching

Two modes of operation:
 Connectionless
 Virtual Circuit

8 Message Switching

 Message switching invented for telegraphy

Message  Entire messages Message multiplexed onto shared Message lines, stored & forwarded

Source  Headers for source & destination addresses Message  Routing at message switches  Switches Destination Connectionless

 Transmission delay vs. propagation delay
 Example: transmit a 1000-byte message from LA to DC over a 1 Gbps network; signal propagation speed about 2 × 10^5 km/s
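A quick check of this example; the LA-to-DC distance is not given on the slide, so roughly 4,000 km is assumed here:

# 1000-byte message over a 1 Gbps link; ~4,000 km path at ~2e5 km/s
message_bits = 1000 * 8
link_rate = 1e9                       # bits per second
distance_km = 4000                    # assumed LA-to-DC distance
signal_speed = 2e5                    # km per second

transmission_delay = message_bits / link_rate     # time to push the bits onto the link
propagation_delay = distance_km / signal_speed    # time for the signal to cross the path

print(f"transmission: {transmission_delay * 1e6:.0f} microseconds")   # 8
print(f"propagation:  {propagation_delay * 1e3:.0f} milliseconds")    # 20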

Message Switching Delay

Figure: timing diagram of a message crossing Source → Switch 1 → Switch 2 → Destination; each hop contributes a transmission time T and a propagation delay τ.

Minimum delay = 3τ + 3T

Additional queueing delays possible at each link

9 Long Messages vs. Packets

A 1-Mbit message is sent from source to destination over two links, each with bit error rate BER = p = 10^-6. How many bits need to be transmitted to deliver the message?

 Approach 1: send the 1-Mbit message as a single unit
 Probability the message arrives correctly: Pc = (1 − 10^-6)^10^6 ≈ e^-1 ≈ 1/3
 On average it takes about 3 transmissions per hop
 Total # bits transmitted ≈ 6 Mbits

 Approach 2: send 10 packets of 100 kbits each
 Probability a packet arrives correctly: Pc' = (1 − 10^-6)^10^5 ≈ e^-0.1 ≈ 0.9
 On average it takes about 1.1 transmissions per hop
 Total # bits transmitted ≈ 2.2 Mbits
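The same arithmetic as a small sketch, assuming independent bit errors and retransmission of a whole unit (message or packet) on each of the two hops whenever it arrives corrupted (the slide rounds 1/Pc up to about 3 transmissions per hop):

p = 1e-6        # bit error rate per link
hops = 2

def expected_bits(unit_bits, units):
    p_ok = (1 - p) ** unit_bits               # probability one unit crosses one link intact
    return hops * units * unit_bits / p_ok    # about 1/p_ok attempts per unit per hop

print(f"{expected_bits(10**6, 1) / 1e6:.1f} Mbit")    # single 1-Mbit message  -> ~5.4 Mbit
print(f"{expected_bits(10**5, 10) / 1e6:.1f} Mbit")   # ten 100-kbit packets   -> ~2.2 Mbit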

Packet Switching - Datagram

 Messages broken into smaller units (packets)
 Source & destination addresses in packet header
 Connectionless, packets routed independently (datagram)
 Packets may arrive out of order
 Pipelining of packets across the network can reduce delay, increase throughput
 Lower delay than message switching, suitable for interactive traffic

10 Packet Switching Delay

Assume three packets corresponding to one message traverse the same path.

Minimum Delay = 3τ + 5(T/3) (single path assumed)
Additional queueing delays possible at each link
Packet pipelining enables the message to arrive sooner

Delay for k-Packet Mess. over L Hops

Figure: timing diagram for k packets crossing Source → Switch 1 → Switch 2 → Destination.

3 hops:                               L hops:
3τ + 2(T/3)  first bit received       Lτ + (L−1)P        first bit received
3τ + 3(T/3)  first bit released       Lτ + LP            first bit released
3τ + 5(T/3)  last bit released        Lτ + LP + (k−1)P   last bit released

where T = kP

11 Routing Tables in Datagram Networks

 Route determined by table lookup
 Routing decision involves finding next hop in route to given destination
 Routing table has an entry for each destination specifying output port that leads to next hop
 Size of table becomes impractical for very large number of destinations

Example table (destination address → output port): 0785 → 7, 1345 → 12, 1566 → 6, 2458 → 12

Example: Internet Routing

 Internet protocol uses datagram packet switching across networks  Networks are treated as data links

 Hosts have two-part IP address:  Network address + Host address

 Routers do table lookup on network address  This reduces size of routing table

 In addition, network addresses are assigned so that they can also be aggregated  Discussed as CIDR in Chapter 8

12 Packet Switching – Virtual Circuit

Packet Packet Packet

Packet

Virtual circuit

 Call set-up phase sets up pointers in a fixed path along the network

 All packets for a connection follow the same path

 Abbreviated header identifies connection on each link

 Packets queue for transmission

 Variable bit rates possible, negotiated during call set-up

 Delays variable, cannot be less than circuit switching

ATM Virtual Circuits

 A VC is a connection with resources reserved.

 A Cell is a small and fixed-size packet, delivered in order.

13 Routing in VC Subnet

Do VC subnets need the capability to route isolated packets from an arbitrary source to an arbitrary destination? (Label switching)

Connection Setup

Figure: a connect request propagates through switches SW 1, SW 2, …, SW n as the route is selected; a connect confirm returns along the same path.

 Signaling messages propagate as route is selected

 Signaling messages identify connection and setup tables in switches

 Typically a connection is identified by a local tag, Virtual Circuit Identifier (VCI)

 Each switch only needs to know how to relate an incoming tag in one input to an outgoing tag in the corresponding output

 Once tables are set up, packets can flow along the path

14 Connection Setup Delay

Figure: timing of connect request (CR), connect confirm (CC), packet transfer (packets 1, 2, 3), and release over a three-hop path.

 Connection setup delay is incurred before any packet can be transferred

 Delay is acceptable for sustained transfer of large number of packets

 This delay may be unacceptably high if only a few packets are being transferred

Virtual Circuit Forwarding Tables

 Each input port of a packet switch has a forwarding table
 Lookup entry for the VCI of the incoming packet (per link/port)
 Determine output port (next hop) and insert VCI for next link
 Very high speeds are possible (HW-based)
 Table can also include priority or other information about how packet should be treated

Example (input VCI → output port, output VCI): 12 → 13, 44; 15 → 15, 23; 27 → 13, 16; 58 → 7, 34

What are key pros & cons of VC switching vs. datagram switching?
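As an illustration only (not part of the slides), the per-port table above can be pictured as a simple mapping from incoming VCI to (output port, outgoing VCI):

# forwarding table for one input port; entries mirror the example above
forwarding_table = {
    12: (13, 44),     # incoming VCI 12 -> output port 13, outgoing VCI 44
    15: (15, 23),
    27: (13, 16),
    58: (7, 34),
}

def forward(incoming_vci):
    """Label swap: return (output port, outgoing VCI) for an arriving packet."""
    return forwarding_table[incoming_vci]

print(forward(27))    # (13, 16)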

15 Cut-Through switching

Figure: timing diagram of cut-through switching across Source → Switch 1 → Switch 2 → Destination.

Minimum delay = 3τ + T

 Some networks perform error checking on header only, so packet can be forwarded as soon as header is received & processed

 Delays reduced further with cut-through switching

Message vs. Packet Min. Delay

 Message: L τ + L T = L τ + (L – 1) T + T

 Packet L τ + L P + (k – 1) P = L τ + (L – 1) P + T

 Cut-Through Packet (Immediate forwarding after header) = L τ + T

Above neglect header processing delays
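A minimal sketch that evaluates the three formulas (tau = per-hop propagation delay, T = k*P = total transmit time of the message, L hops):

def message_switching(L, tau, T):
    return L * tau + L * T                    # whole message stored & forwarded at every hop

def packet_switching(L, tau, T, k):
    P = T / k                                 # per-packet transmit time
    return L * tau + L * P + (k - 1) * P      # = L*tau + (L-1)*P + T

def cut_through(L, tau, T):
    return L * tau + T                        # forwarded as soon as the header is processed

L, tau, T, k = 3, 1e-3, 3e-3, 3
print(message_switching(L, tau, T))           # 0.012  = 3*tau + 3*T
print(packet_switching(L, tau, T, k))         # 0.008  = 3*tau + 5*(T/3)
print(cut_through(L, tau, T))                 # 0.006  = 3*tau + T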

16 Comparison of VC and Datagram Subnets


17 Chapter 7 Packet-Switching Networks

Router Architecture Overview

17 Router Architecture Overview

Two key router functions:  run routing algorithms/protocol (RIP, OSPF, BGP)  forwarding datagrams from incoming to outgoing link


Input Port Functions

Physical layer: bit-level reception
Data link layer: e.g., Ethernet (see chapter 5)
Decentralized switching:
 given datagram dest., lookup output port using forwarding table in input port memory
 goal: complete input port processing at ‘line speed’
 queuing: if datagrams arrive faster than forwarding rate into switch fabric


1 Three types of switching fabrics


Switching Via Memory

First generation routers:
 traditional computers with switching under direct control of CPU
 packet copied to system’s memory
 speed limited by memory bandwidth (2 bus crossings per datagram)

Figure: Input Port → Memory → Output Port, over the System Bus


2 Switching Via a Bus

 datagram from input port memory to output port memory via a shared bus  bus contention: switching speed limited by bus bandwidth  32 Gbps bus, Cisco 5600: sufficient speed for access and enterprise routers


Switching Via An Interconnection Network

 overcome bus bandwidth limitations  Banyan networks, other interconnection nets initially developed to connect processors in multiprocessor  advanced design: fragmenting datagram into fixed length cells, switch cells through the fabric.  Cisco 12000: switches 60 Gbps through the interconnection network


3 Output Ports

 Buffering required when datagrams arrive from fabric faster than the transmission rate  Scheduling discipline chooses among queued datagrams for transmission


Output port queueing

 buffering when arrival rate via switch exceeds output line speed  queueing (delay) and loss due to output port buffer overflow!


4 How much buffering?

 RFC 3439 rule of thumb: average buffering equal to “typical” RTT (say 250 msec) times link capacity C
 e.g., C = 10 Gbps link: 2.5 Gbit buffer
 Recent recommendation: with N flows, buffering equal to RTT · C / √N
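Quick arithmetic for both rules; the flow count below is just an assumed example:

rtt = 0.250                          # "typical" RTT in seconds
C = 10e9                             # 10 Gbps link

classic = rtt * C                    # RFC 3439 rule of thumb
N = 10_000                           # assumed number of flows
revised = rtt * C / N ** 0.5         # RTT * C / sqrt(N)

print(f"{classic / 1e9:.1f} Gbit")   # 2.5 Gbit
print(f"{revised / 1e6:.0f} Mbit")   # 25 Mbit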


Input Port Queuing

 Fabric slower than input ports combined -> queueing may occur at input queues  Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward  queueing delay and loss due to input buffer overflow!


5 Chapter 7 Packet-Switching Networks

Routing in Packet Networks

17 Routing in Packet Networks

1 3 6

4

2 5 Node (switch or router)

 Three possible (loopfree) routes from 1 to 6:  1-3-6, 1-4-5-6, 1-2-5-6  Which is “best”?What is the objective function for optimization?  Min delay? Min hop? Max bandwidth? Min cost? Max reliability?

Creating the Routing Tables

 Need information on state of links  Link up/down; congested; delay or other metrics  Need to distribute link state information using a routing protocol  What information is exchanged? How often?  How to exchange with neighbors?  Need to compute routes based on information  Single metric; multiple metrics  Single route; alternate routes

18 Routing Algorithm Requirements

 Correctness

 Responsiveness  Topology or bandwidth changes, congestion

 Optimality  Resource utilization, path length

 Robustness  Continues working under high load, congestion, faults, equipment failures, incorrect implementations

 Simplicity  Efficient software implementation, reasonable processing load

 Fairness

Centralized vs Distributed Routing

 Centralized Routing  All routes determined by a central node  All state information sent to central node  Problems adapting to frequent topology changes  What is the problem? Does not scale

 Distributed Routing  Routes determined by routers using distributed algorithm  State information exchanged by routers  Adapts to topology and other changes  Better scalability, but maybe inconsistent due to loops

19 Static vs Dynamic Routing

 Static Routing  Set up manually, do not change; requires administration  Works when traffic predictable & network is simple  Used to override some routes set by dynamic algorithm  Used to provide default router

 Dynamic Routing  Adapt to changes in network conditions  Automated  Calculates routes based on received updated network state information

Flooding

Send a packet to all nodes in a network  Useful in starting up network or broadcast  No routing tables available  Need to broadcast packet to all nodes (e.g. to propagate link state information)

Approach  Send packet on all ports except one where it arrived  Exponential growth in packet transmissions

20 Flooding Example

Is flooding static or adaptive?
What is the major problem? Duplicates (an infinite number, due to loops).
How to handle the problem? Hop count; a sequence number with a counter per source router.
What are the main nice properties of flooding? Robustness; packets always follow a shortest path.
How can flooding be terminated? A TTL is one way to terminate flooding.

Flooding Example

1 3 6

4

2 5

Flooding is initiated from Node 1: Hop 2 transmissions

21 Flooding Example

1 3 6

4

2 5

Flooding is initiated from Node 1: Hop 3 transmissions

Limited Flooding

 Time-to-Live field in each packet limits number of hops to a certain diameter
 Each switch adds its ID before flooding; discards repeats
 Source puts a sequence number in each packet; switches record source address and sequence number and discard repeats
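A small sketch of limited flooding combining a hop limit (TTL) with per-source sequence numbers to discard repeats; the Node class is hypothetical and only for illustration:

class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = {}            # neighbor name -> Node
        self.seen = set()              # (source, seq_no) pairs already flooded

    def flood(self, source, seq_no, ttl, payload, arrived_from=None):
        if ttl <= 0 or (source, seq_no) in self.seen:
            return                     # hop limit reached, or a repeat: discard
        self.seen.add((source, seq_no))
        for name, node in self.neighbors.items():
            if name != arrived_from:   # send on all ports except the one it arrived on
                node.flood(source, seq_no, ttl - 1, payload, arrived_from=self.name)

# a small triangle, flooding a link-state update from A
a, b, c = Node("A"), Node("B"), Node("C")
a.neighbors = {"B": b, "C": c}
b.neighbors = {"A": a, "C": c}
c.neighbors = {"A": a, "B": b}
a.flood(source="A", seq_no=1, ttl=3, payload="link state update")
print(b.seen, c.seen)    # each node flooded ("A", 1) exactly once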

22 Limited Flooding Example

 Suppose the following network uses flooding as the routing algorithm. If a packet sent by A to D has a maximum hop of 3, list all the routes it will take. Also tell how many hops worth of bandwidth it consumes. Assume the bandwidth weight of the lines is the same.

B C

A D E

F G

Shortest Paths & Routing

 Many possible paths connect any given source and to any given destination

 Routing involves the selection of the path to be used to accomplish a given transfer

 Typically it is possible to attach a cost or distance to a link connecting two nodes

 Routing can then be posed as a shortest path problem

23 Routing Metrics

Means for measuring desirability of a path  Path Length = sum of costs or distances  Possible metrics  Hop count: rough measure of resources used  Reliability: link availability  Delay: sum of delays along path; complex & dynamic  Bandwidth: “available capacity” in a path  Load: Link & router utilization along path  Cost: $$$

An Example: Sink Tree

 Sink tree: minimum # of hops to a root.

(a) A subnet. (b) A sink tree for router B. Q1: must a sink tree be unique? An example? Q2: each packet will be delivered within a finite # of hops?

24 Shortest Path Approaches

Distance Vector Protocols  Neighbors exchange list of distances to destinations  Best next-hop determined for each destination  Ford-Fulkerson (distributed) shortest path algorithm Link State Protocols  Link state information flooded to all routers  Routers have complete topology information  Shortest path (& hence next hop) calculated  Dijkstra (centralized) shortest path algorithm

Distance Vector Do you know the way to San Jose?

San Jose 294 San Jose 392

San Jose 596 San Jose 250

25 Distance Vector

Local Signpost (per destination):
 Direction
 Distance

Routing Table  for each destination list:
 Next Node
 Distance

Table Synthesis:
 Neighbors exchange table entries
 Determine current best next hop
 Inform neighbors: periodically, and after changes

Shortest Path to SJ Focus on how nodes find their shortest path to a given destination node, i.e. SJ San Jose

If Di is the shortest distance to SJ from i and j is a neighbor on that shortest path, then Di = Cij + Dj

26 But we don’t know the shortest paths i only has local info from neighbors San Dj' Jose j' Cij' Dj Cij j i C Pick current D ij” i j" shortest path Dj"

Why Distance Vector Works

SJ sends accurate info. Hop-1 nodes calculate their current (next hop, distance) and send it to their neighbors; accurate info about SJ ripples across the network  1 hop, 2 hops, 3 hops from SJ  and the shortest paths converge.

27 Distance Vector Routing (RIP)

 Routing operates by having each router maintain a vector table giving the best known distance to each destination and which line to use to get there. The tables are updated by exchanging information with the neighbors.

 Vector table: one entry for each router in the subnet; each entry contains two parts: preferred outgoing line to use for that destination and an estimate of the time or distance to the destination.

 The router is assumed to know the distance/cost to each neighbor, and to update the vector table periodically by exchanging it with neighbors.

 Possible metrics: # hops, delay (measured with ECHO packets), capacity, congestion

An Example

(a) A subnet. (b) Input from A, I, H, K, and the new routing table for J.

28 Bellman-Ford Algorithm

 Consider computations for one destination d
 Initialization
 Each node table has 1 row for destination d
 Distance of node d to itself is zero: Dd = 0
 Distance of other node j to d is infinite: Dj = ∞, for j ≠ d
 Next hop node nj = −1 to indicate not yet defined, for j ≠ d
 Send Step
 Send new distance vector to immediate neighbors across local link
 Receive Step
 At node i, find the next hop that gives the minimum distance to d: minj { Cij + Dj }
 Replace the old (next hop, distance) entry if a new next node or shorter distance is found
 Go to send step

Bellman-Ford Algorithm

 Now consider parallel computations for all destinations d
 Initialization
 Each node has 1 row for each destination d
 Distance of node d to itself is zero: Dd(d) = 0
 Distance of other node j to d is infinite: Dj(d) = ∞, for j ≠ d
 Next node nj = −1 since not yet defined
 Send Step
 Send new distance vector to immediate neighbors across local link
 Receive Step
 For each destination d, find the next hop that gives the minimum distance to d: minj { Cij + Dj(d) }
 Replace the old (next hop, distance) entry for d if a new next node or shorter distance is found
 Go to send step
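A synchronous sketch of this computation for a single destination (node 6). The link costs below are an illustration chosen to be consistent with the tables that follow; they are not read off the figure:

INF = float("inf")

# undirected link costs: graph[i][j] = Cij
graph = {
    1: {2: 3, 3: 2, 4: 5},
    2: {1: 3, 4: 1, 5: 4},
    3: {1: 2, 4: 2, 6: 1},
    4: {1: 5, 2: 1, 3: 2, 5: 3},
    5: {2: 4, 4: 3, 6: 2},
    6: {3: 1, 5: 2},
}
dest = 6
dist = {n: (0 if n == dest else INF) for n in graph}      # Dj
nxt = {n: (n if n == dest else -1) for n in graph}        # next hop nj (-1 = undefined)

changed = True
while changed:                                            # repeat send/receive steps until stable
    changed = False
    for i in graph:
        if i == dest:
            continue
        j, d = min(((j, c + dist[j]) for j, c in graph[i].items()), key=lambda t: t[1])
        if d < dist[i]:                                   # receive step: minimize Cij + Dj
            dist[i], nxt[i] = d, j
            changed = True

print(dist)   # {1: 3, 2: 4, 3: 1, 4: 3, 5: 2, 6: 0}
print(nxt)    # {1: 3, 2: 4, 3: 6, 4: 3, 5: 6, 6: 6}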

29 Iteration Node 1 Node 2 Node 3 Node 4 Node 5 Initial (-1, ∞) (-1, ∞) (-1, ∞) (-1, ∞) (-1, ∞) 1 2 3

Figure: the example network with destination San Jose (node 6); each node keeps a table entry (next hop, distance) for destination SJ.

Iteration Node 1 Node 2 Node 3 Node 4 Node 5 Initial (-1, ∞) (-1, ∞) (-1, ∞) (-1, ∞) (-1, ∞) 1 (-1, ∞) (-1, ∞) (6,1) (-1, ∞) (6,2) 2 3

Figure: after iteration 1 the neighbors of SJ have updated their entries: D3 = D6 + 1 = 1 (n3 = 6) and D5 = D6 + 2 = 2 (n5 = 6), with D6 = 0.

30 Iteration Node 1 Node 2 Node 3 Node 4 Node 5 Initial (-1, ∞) (-1, ∞) (-1, ∞) (-1, ∞) (-1, ∞) 1 (-1, ∞) (-1, ∞) (6, 1) (-1, ∞) (6,2) 2 (3,3) (5,6) (6, 1) (3,3) (6,2) 3

Figure: network state after iteration 2.

Iteration Node 1 Node 2 Node 3 Node 4 Node 5 Initial (-1, ∞) (-1, ∞) (-1, ∞) (-1, ∞) (-1, ∞) 1 (-1, ∞) (-1, ∞) (6, 1) (-1, ∞) (6,2) 2 (3,3) (5,6) (6, 1) (3,3) (6,2) 3 (3,3) (4,4) (6, 1) (3,3) (6,2)

Nodes 1, 4, 5 process the new entry from node 2, but find nothing new; the shortest path tree to node 6 is now constructed.

31 Iteration Node 1 Node 2 Node 3 Node 4 Node 5 Initial (3,3) (4,4) (6, 1) (3,3) (6,2) 1 (3,3) (4,4) (4, 5) (3,3) (6,2) 2 3

Network disconnected (link 3–6 fails); a loop is created between nodes 3 and 4.

Iteration Node 1 Node 2 Node 3 Node 4 Node 5 Initial (3,3) (4,4) (6, 1) (3,3) (6,2) 1 (3,3) (4,4) (4, 5) (3,3) (6,2) 2 (3,7) (4,4) (4, 5) (5,5) (6,2) 3


Node 4 could have chosen 2 as next node because of tie

32 Iteration Node 1 Node 2 Node 3 Node 4 Node 5 Initial (3,3) (4,4) (6, 1) (3,3) (6,2) 1 (3,3) (4,4) (4, 5) (3,3) (6,2) 2 (3,7) (4,4) (4, 5) (5,5) (6,2) 3 (3,7) (4,6) (4, 7) (5,5) (6,2)

Node 2 could have chosen 5 as next node because of a tie.

Iteration Node 1 Node 2 Node 3 Node 4 Node 5 1 (3,3) (4,4) (4, 5) (3,3) (6,2) 2 (3,7) (4,4) (4, 5) (2,5) (6,2) 3 (3,7) (4,6) (4, 7) (5,5) (6,2) 4 (2,9) (4,6) (4, 7) (5,5) (6,2) 5 (2,9) (4,6) (4, 7) (5,5) (6,2)


Node 1 could have chosen 3 as next node because of a tie.

33 Count-to-Infinity Problem

 It converges to the correct answer quickly to good news but slowly to bad news.

B knows A is 1 hop away while all other routers still think A is down, why?

What is the spreading rate of good news? Does B know that C’s path runs through B?

How many exchanges needed in a N-hop subnet? Why spreading rate of bad news so slow? What is the core problem?

Problem: Bad News Travels Slowly

Remedies  Split Horizon  Do not report route to a destination to the neighbor from which route was learned

 Poisoned Reverse  Report route to a destination to the neighbor from which route was learned, but with infinite distance  Breaks erroneous direct loops immediately  Does not work on some indirect loops

34 10/14/2013

Counting to Infinity Problem

Figure: (a) a linear network 1–2–3–4 with unit link costs; (b) the link to node 4 fails (destination is node 4). Nodes come to believe the best path to 4 is through each other.

Update         Node 1   Node 2   Node 3
Before break   (2,3)    (3,2)    (4,1)
After break    (2,3)    (3,2)    (2,3)
1              (2,3)    (3,4)    (2,3)
2              (2,5)    (3,4)    (2,5)
3              (2,5)    (3,6)    (2,5)
4              (2,7)    (3,6)    (2,7)
5              (2,7)    (3,8)    (2,7)
…              …        …        …

Problem: Bad News Travels Slowly

Remedies

• Split Horizon – Do not report route to a destination to the neighbour from which route was learned.

• Split Horizon with Poisoned Reverse – Report route to a destination to the neighbour from which route was learned, but with infinite distance.


Split Horizon with Poison Reverse

Figure: (a) the same linear network 1–2–3–4 with unit link costs; (b) the link to node 4 fails, and without the remedy the nodes would believe the best path is through each other.

Update         Node 1    Node 2    Node 3
Before break   (2, 3)    (3, 2)    (4, 1)
After break    (2, 3)    (3, 2)    (-1, ∞)   Node 2 advertises its route to 4 to node 3 as having distance infinity; node 3 finds there is no route to 4
1              (2, 3)    (-1, ∞)   (-1, ∞)   Node 1 advertises its route to 4 to node 2 as having distance infinity; node 2 finds there is no route to 4
2              (-1, ∞)   (-1, ∞)   (-1, ∞)   Node 1 finds there is no route to 4

2 Link-State Algorithm

 Basic idea: two stage procedure  Each source node gets a map of all nodes and link metrics (link state) of the entire network  Learning who the neighbors are and what are delay to them  Construct a link state packet, and deliver it to others  Find the shortest path on the map from the source node to all destination nodes; Dijkstra’s algorithm

 Broadcast of link-state information  Every node i in the network broadcasts to every other node in the network:

 ID’s of its neighbors: Ni=set of neighbors of i

 Distances to its neighbors: {Cij | j ∈Ni}  Flooding is a popular method of broadcasting packets

Building Link State Packets

 A state packet starts with the ID of the sender, a seq#, age, and a list of neighbors with delay information.

(a) A subnet. (b) The link state packets for this subnet.

When to build the link state packets? Periodically, or when significant event occurs.

35 Distributing the Link State Packets

 Flooding is used to distribute the link state packets.

What is the major problem with flooding?

How to handle the problem? (source router, sequence number)

How to make the sequence number unique? 32-bit sequence number

What happens if a router crashes, losing its track, and starts again?

What happens if sequence number is corrupted, say 65,540, not 4. Age field (in sec; usually a packet comes in 10 sec.) All packets should be ACKed.

A Packet Buffer

flooding

36 Shortest Paths in Dijkstra’s Algorithm

Execution of Dijkstra’s algorithm

Figure: six successive snapshots of Dijkstra’s algorithm running on the example six-node network, starting from node 1.

Iteration   N               D2   D3   D4   D5   D6
Initial     {1}             3    2    5    ∞    ∞
1           {1,3}           3    2    4    ∞    3
2           {1,2,3}         3    2    4    7    3
3           {1,2,3,6}       3    2    4    5    3
4           {1,2,3,4,6}     3    2    4    5    3
5           {1,2,3,4,5,6}   3    2    4    5    3

37 Dijkstra’s algorithm

 N: set of nodes for which shortest path already found
 Initialization: (Start with source node s)
 N = {s}, Ds = 0, “s is distance zero from itself”
 Dj = Csj for all j ≠ s, distances of directly-connected neighbors
 Step A: (Find next closest node i)
 Find i ∉ N such that Di = min Dj for j ∉ N
 Add i to N
 If N contains all the nodes, stop
 Step B: (update minimum costs)
 For each node j ∉ N: Dj = min(Dj, Di + Cij)  the minimum distance from s to j through node i in N
 Go to Step A

Reaction to Failure

 If a link fails,  Router sets link distance to infinity & floods the network with an update packet  All routers immediately update their link database & recalculate their shortest paths  Recovery very quick

 But watch out for old update messages  Add time stamp or sequence # to each update message  Check whether each received update message is new  If new, add it to database and broadcast  If older, send update message on arriving link

38 Why is Link State Better?

 Fast, loopless convergence  Support for precise metrics, and multiple metrics if necessary (throughput, delay, cost, reliability)  Support for multiple paths to a destination  algorithm can be modified to find best two paths  More flexible, e.g., source routing

What is the memory required to store the input data for a subnet with n routers – each of them has k neighbors? But not critical today!

Source Routing vs. H-by-H

 Source host selects path that is to be followed by a packet  Strict: sequence of nodes in path inserted into header  Loose: subsequence of nodes in path specified  Intermediate switches read next-hop address and remove address  Or maintained for the reverse path  Source routing allows the host to control the paths that its information traverses in the network  Potentially the means for customers to select what service providers they use

39 Example

Figure: source routing example  source host A inserts the route (1, 3, 6, B) into the packet header; each switch reads the next-hop address and removes it, so the header shrinks to (3, 6, B) and then (6, B) on the way to destination host B.

How is the path learned? The source host needs link state information.

Chapter 7 Packet-Switching Networks

Traffic Management Packet Level Flow Level Flow-Aggregate Level

40 QOS in IP Networks

 IETF groups are working on proposals to provide QOS control in IP networks, i.e., going beyond best effort to provide some assurance for QOS  Work in Progress includes RSVP, Differentiated Services, and Integrated Services  Simple model for sharing and congestion studies:


Principles for QOS Guarantees

 Consider a phone application at 1Mbps and an FTP application sharing a 1.5 Mbps link.  bursts of FTP can congest the router and cause audio packets to be dropped.  want to give priority to audio over FTP  PRINCIPLE 1: Marking of packets is needed for router to distinguish between different classes; and new router policy to treat packets accordingly


1 Principles for QOS Guarantees (more)

 Applications misbehave (audio sends packets at a rate higher than 1Mbps assumed above);  PRINCIPLE 2: provide protection (isolation) for one class from other classes  Require Policing Mechanisms to ensure sources adhere to bandwidth requirements; Marking and Policing need to be done at the edges:


Principles for QOS Guarantees (more)

 Alternative to Marking and Policing: allocate a set portion of bandwidth to each application flow; can lead to inefficient use of bandwidth if one of the flows does not use its allocation  PRINCIPLE 3: While providing isolation, it is desirable to use resources as efficiently as possible


2 Principles for QOS Guarantees (more)

 Cannot support traffic beyond link capacity  Two phone calls each requests 1 Mbps  PRINCIPLE 4: Need a Call Admission Process; application flow declares its needs, network may block call if it cannot satisfy the needs


3 Traffic Management

Vehicular traffic management:
 Traffic lights & signals control flow of traffic in city street system
 Objective is to maximize flow with tolerable delays
 Priority Services
 Police sirens
 Cavalcade for dignitaries
 Bus & high-usage lanes
 Trucks allowed only at night

Packet traffic management:
 Multiplexing & access control mechanisms to control flow of packet traffic
 Objective is to make efficient use of network resources & deliver QoS
 Priority
 Fault-recovery packets
 Real-time traffic
 Enterprise (high-revenue) traffic
 High bandwidth traffic

Time Scales & Granularities

 Packet Level  Queueing & scheduling at multiplexing points  Determines relative performance offered to packets over a short time scale (microseconds)  Flow Level  Management of traffic flows & resource allocation to ensure delivery of QoS (milliseconds to seconds)  Matching traffic flows to resources available; congestion control  Flow-Aggregate Level  Routing of aggregate traffic flows across the network for efficient utilization of resources and meeting of service levels  “Traffic Engineering”, at scale of minutes to days

41 End-to-End QoS

Figure: a packet traverses a chain of N multiplexing points (1, 2, …, N−1, N), each with its own packet buffer.

 A packet traversing network encounters delay and possible loss at various multiplexing points  End-to-end performance is accumulation of per-hop performances

How to keep end-to-end delay under some upper bound? Jitter, loss?

Scheduling & QoS

 End-to-End QoS & Resource Control  Buffer & bandwidth control → Performance  Admission control to regulate traffic level

 Scheduling Concepts  fairness/isolation  priority, aggregation,

 Fair Queueing & Variations  WFQ, PGPS

 Guaranteed Service  WFQ, Rate-control

 Packet Dropping  aggregation, drop priorities


42 FIFO Queueing

Figure: arriving packets share a single packet buffer feeding the transmission link; packets are discarded when the buffer is full.

 All packet flows share the same buffer  Transmission Discipline: First-In, First-Out  Buffering Discipline: Discard arriving packets if buffer is full (Alternative: random discard; pushout head-of-line, i.e. oldest, packet) How about aggressiveness vs. fairness?


FIFO Queueing

 Cannot provide differential QoS to different packet flows  Different packet flows interact strongly

 Statistical delay guarantees via load control  Restrict number of flows allowed (connection admission control)  Difficult to determine performance delivered

 Finite buffer determines a maximum possible delay

 Buffer size determines loss probability  But depends on arrival & packet length statistics

 Variation: packet enqueueing based on queue thresholds  some packet flows encounter blocking before others  higher loss, lower delay


43 FIFO w/o and w/ Discard Priority

Figure: (a) FIFO queueing  arriving packets are discarded when the buffer is full; (b) FIFO with discard priority  Class 2 packets are discarded when a queue threshold is exceeded, Class 1 packets only when the buffer is full.


HOL Priority Queueing

Figure: head-of-line (HOL) priority queueing  separate high-priority and low-priority buffers feed the transmission link; low-priority packets are served only when the high-priority queue is empty, and each buffer discards arrivals when full.

 High priority queue serviced until empty
 High priority queue has lower waiting time
 Buffers can be dimensioned for different loss probabilities
 Surge in high priority queue can cause low priority queue to saturate

44 HOL Priority Features

Strict priority vs. WTP (Note: packets need class labeling)
 Provides differential QoS
 Pre-emptive priority: lower classes invisible
 Non-preemptive priority: lower classes impact higher classes through residual service times
 High-priority classes can hog all of the bandwidth & starve lower priority classes
 Need to provide some isolation between classes

Figure: delay versus per-class loads.


Fair Queuing / Generalized Processor Sharing (GPS)

Figure: packet flows 1 … n each have their own queue and share a transmission link of C bits/second (approximated bit-level round robin service).

 Each flow has its own logical queue: prevents hogging; allows differential loss probabilities
 C bits/sec allocated equally among non-empty queues
 transmission rate = C / n(t), where n(t) = # non-empty queues
 Idealized system assumes fluid flow from queues
 Implementation requires approximation: simulate the fluid system; sort packets according to completion time in the ideal system

45 Fair Queuing – Example 1

Fluid-flow system: at t = 0 both packets (one in buffer 1, one in buffer 2) are served at rate 1/2 each (overall rate 1 unit/second); both packets complete service at t = 2.

Packet-by-packet system: the packet from buffer 1 is served first at rate 1 while the packet from buffer 2 waits; then the packet from buffer 2 is served at rate 1.

Fair Queuing – Example 2

Fluid-flow system: buffer 1 holds a packet of size 2, buffer 2 a packet of size 1. At t = 0 both are served at rate 1/2; the buffer 2 packet finishes at t = 2, after which the buffer 1 packet is served at rate 1 until t = 3. Service rate = reciprocal of the number of active buffers at the time (within a buffer, FIFO still applies).

Packet-by-packet fair queueing: the buffer 2 packet (earlier finish time in the fluid system) is served first at rate 1, then the buffer 1 packet is served at rate 1.

46 Weighted Fair Queueing (WFQ)

Fluid-flow system: at t = 0 the packet from buffer 1 is served at rate 1/4 and the packet from buffer 2 at rate 3/4; when the buffer 2 packet completes, the buffer 1 packet is served at rate 1, finishing at t = 2.

Packet-by-packet weighted fair queueing: the buffer 2 packet is served first at rate 1, then the buffer 1 packet at rate 1; the buffer 1 packet waits meanwhile.

Example

 FIFO -> Fair queueing  packet-by-packet vs. bit-by-bit

(a) A router with five packets queued for line O. (b) Finishing times for the five packets.

47 Packetized GPS/WFQ

Figure: arriving packets pass through a tagging unit into a sorted packet buffer that feeds the transmission link; packets are discarded when the buffer is full.

 Compute packet completion time in the ideal (fluid) system
 add tag to packet
 sort packets in queue according to tag
 serve according to HOL
 WFQ and its many variations form the basis for providing QoS in packet networks
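A minimal sketch of that tagging idea for packet-by-packet WFQ. The fluid-system virtual time is simplified to a constant here (a real scheduler tracks it per busy period), so this only illustrates the tag-and-sort mechanics:

import heapq

class WFQ:
    def __init__(self, weights):
        self.weights = weights                        # flow id -> weight
        self.last_finish = {f: 0.0 for f in weights}  # finish tag of the flow's previous packet
        self.queue = []                               # heap of (finish tag, seq, flow, length)
        self.seq = 0

    def enqueue(self, flow, length, virtual_now=0.0):
        start = max(self.last_finish[flow], virtual_now)
        finish = start + length / self.weights[flow]  # completion time in the ideal system
        self.last_finish[flow] = finish
        heapq.heappush(self.queue, (finish, self.seq, flow, length))
        self.seq += 1

    def dequeue(self):
        _, _, flow, length = heapq.heappop(self.queue)  # serve smallest tag (HOL of sorted buffer)
        return flow, length

wfq = WFQ({"A": 3, "B": 1})                           # flow A gets 3x the share of flow B
for flow, size in [("A", 1000), ("B", 1000), ("A", 1000)]:
    wfq.enqueue(flow, size)
print([wfq.dequeue()[0] for _ in range(3)])           # ['A', 'A', 'B']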

Buffer Management

 Packet drop strategy: Which packet to drop when buffers full  Fairness: protect behaving sources from misbehaving sources  Aggregation:  Per-flow buffers protect flows from misbehaving flows  Full aggregation provides no protection  Aggregation into classes provided intermediate protection  Drop priorities:  Drop packets from buffer according to priorities  Maximizes network utilization & application QoS  Examples: layered video, policing at network edge  Controlling sources at the edge

48 Early or Overloaded Drop

Random early detection (RED):  drop pkts if short-term avg of queue exceeds threshold  pkt drop probability increases linearly with queue length  mark offending pkts  improves performance of cooperating TCP sources  increases loss probability of misbehaving sources

Random Early Detection (RED)

 TCP sources reduce their input rate in response to dropped packets
 Early drop: discard packets before buffers are full
 Random drop causes some sources to reduce rate before others, causing gradual reduction in aggregate input rate

Algorithm:  Maintain running average of queue length

 If Qavg < minthreshold, do nothing

 If Qavg > maxthreshold, drop packet  If in between, drop packet according to probability  Flows that send more packets are more likely to have packets dropped
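A bare-bones sketch of that drop decision (it omits refinements of the published RED algorithm such as the count since the last drop, idle-time handling, and ECN marking; the parameters are assumed, illustrative values):

import random

MIN_TH, MAX_TH, MAX_P, WEIGHT = 5.0, 15.0, 0.10, 0.002

avg = 0.0
def arriving_packet(queue_len):
    """Return True if the arriving packet should be dropped."""
    global avg
    avg = (1 - WEIGHT) * avg + WEIGHT * queue_len         # running (exponential) average
    if avg < MIN_TH:
        return False                                      # do nothing
    if avg >= MAX_TH:
        return True                                       # drop
    drop_p = MAX_P * (avg - MIN_TH) / (MAX_TH - MIN_TH)   # in between: drop with a probability
    return random.random() < drop_p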

49 Packet Drop Profile in RED

Figure: RED drop profile  probability of packet drop (from 0 to 1) as a function of average queue length, with thresholds min_th and max_th marked before the buffer is full.

“Early” dropping is used to notify some sources to reduce their transmission rate before the buffer becomes full.

Chapter 7 Packet-Switching Networks

Traffic Management at the Flow Level

50 Why Congestion?

Congestion occurs when a surge of traffic overloads network resources.

Can a large buffer help? Or could it make things even worse?

Approaches to Congestion Control:
• Preventive Approaches (open-loop): Scheduling & Reservations
• Reactive Approaches (closed-loop): Detect & Throttle/Discard

Ideal Effect of Congestion Control

Figure: throughput versus offered load  with congestion control (controlled), throughput follows the offered load up to the available capacity and stays there; uncontrolled, throughput collapses under overload.

Resources used efficiently up to capacity available

51 Open-Loop Control

 Network performance is guaranteed to all traffic flows that have been admitted into the network  Initially for connection-oriented networks  Key Mechanisms  Admission Control  Policing   Traffic Scheduling

Admission Control

 Flows negotiate a contract with the network
 Specify requirements:
 Peak, average, minimum bit rate
 Maximum burst size
 Delay, loss requirement
 Network computes resources needed
 “Effective” bandwidth
 If flow accepted, network allocates resources to ensure QoS is delivered as long as source conforms to contract

Figure: typical bit rate demanded by a variable bit rate information source over time, with the peak rate and average rate marked.

52 Policing

 Network monitors traffic flows continuously to ensure they meet their traffic contract
 When a packet violates the contract, network can discard or tag the packet, giving it lower priority
 If congestion occurs, tagged packets are discarded first
 The leaky bucket algorithm is the most commonly used policing mechanism
 Bucket has specified leak rate for average contracted rate
 Bucket has specified depth to accommodate variations in arrival rate
 Arriving packet is conforming if it does not result in overflow

The Leaky Bucket

water poured irregularly

(a) A leaky bucket with water. (b) a leaky bucket with packets.

53 The Leaky Bucket Example

 Data comes to a router in 1 MB bursts, that is, an input runs at 25 MB/s (burst rate) for 40 msec. The router is able to support 2 MB/s output (leaky) rate. The router uses a leaky bucket for traffic shaping.

(1) How large should the bucket be so that there is no data loss (assuming a fluid system)? (2) Now, if the leaky bucket size is 1 MB, how long can the maximum burst interval be?
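One way to check the arithmetic (fluid model; the answers are my calculation, not given on the slide):

burst_rate = 25e6      # bytes/s into the router during the burst
leak_rate = 2e6        # bytes/s out of the router
burst_time = 0.040     # seconds

# (1) the bucket fills at (burst_rate - leak_rate); size needed for no loss:
needed = (burst_rate - leak_rate) * burst_time
print(f"{needed / 1e6:.2f} MB")            # 0.92 MB

# (2) with a 1 MB bucket, the longest burst it can absorb without loss:
bucket = 1e6
max_burst = bucket / (burst_rate - leak_rate)
print(f"{max_burst * 1e3:.1f} ms")         # about 43.5 ms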

The Leaky Bucket Example

° Example: data comes to a router in 1 MB bursts, that is, an input runs at 25 MB/s for 40 msec. The router is able to support 2 MB/s outgoing (leaky) rate. The leaky bucket size is 1MB.

(a) Input to a leaky bucket. (b) Output from a leaky bucket.

54 Leaky Bucket Algorithm

Leaky bucket algorithm (depletion rate: 1 unit of bucket content per unit time):
 On arrival of a packet at time ta: X’ = X − (ta − LCT)
 If X’ < 0 (bucket empty), set X’ = 0
 If X’ > L, the arriving packet is nonconforming (it would cause the bucket to overflow)
 Otherwise the packet is conforming: X = X’ + I, LCT = ta

X = value of the leaky bucket counter; X’ = auxiliary variable; LCT = last conformance time; I = increment per arrival (nominal interarrival time); L + I = bucket depth.
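The steps above translate almost line for line into code; a sketch:

class LeakyBucketPolicer:
    def __init__(self, increment, limit):
        self.I = increment        # I: increment per arrival (nominal interarrival time)
        self.L = limit            # L: burst tolerance (bucket depth is L + I)
        self.X = 0.0              # bucket counter
        self.LCT = 0.0            # last conformance time

    def arrival(self, ta):
        """Return True if the packet arriving at time ta conforms."""
        x = self.X - (ta - self.LCT)      # drain at 1 per unit time since last conforming packet
        if x < 0:
            x = 0.0                       # bucket was empty
        if x > self.L:
            return False                  # nonconforming: would overflow the bucket
        self.X = x + self.I               # conforming: add increment, record conformance time
        self.LCT = ta
        return True

policer = LeakyBucketPolicer(increment=4, limit=6)     # the I = 4, L = 6 example below
print([policer.arrival(t) for t in range(0, 20, 2)])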

Leaky Bucket Example

Figure: bucket content versus time for I = 4, L = 6 (a per-packet, not fluid, system); the content rises by I on each conforming packet arrival and drains at 1 per unit time, and one arrival is marked nonconforming.

Non-conforming packets are not allowed into the bucket and hence are not included in the calculations; maximum burst size MBS = 3 packets.

55 Leaky Bucket Traffic Shaper

Figure: incoming traffic enters a packet buffer of size N; a server plays packets out periodically, producing the shaped traffic.

 Buffer incoming packets  Play out periodically to conform to parameters  Surges in arrivals are buffered & smoothed out  Possible packet loss due to buffer overflow

Too restrictive, since conforming traffic does not need to be completely smooth, how to allow some burstiness?

Token Bucket Traffic Shaper

Tokens arrive periodically into a token bucket of size K; an incoming packet must have sufficient tokens before admission into the network.

Figure: incoming traffic waits in a buffer of size N; the server releases packets (shaped traffic) as tokens allow.

 Token rate regulates transfer of packets
 If sufficient tokens available, packets enter network without delay
 K determines how much burstiness is allowed into the network

56 Token Bucket Traffic Shaper

• The LB traffic shaper is restrictive: its output rate is constant whenever the buffer is not empty
  • most applications generate variable rate traffic
  • storing packets for such traffic causes unnecessary delay
• The Token Bucket traffic shaper is designed to allow variation in the transmission rate
  • it employs a buffer to absorb bursts, as in the LB shaper
  • it controls the output rate using a token bucket (counter) into which tokens arrive periodically

The number of bytes in a burst of length S seconds at the maximum output rate M bytes/sec, if the token bucket capacity is C tokens and the token arrival rate is R tokens/sec, satisfies MS = C + RS, so S = C/(M − R).

TB Algorithm:
• tokens are generated at a constant rate and stored in the token bucket with capacity C; if the bucket is full, arriving tokens are discarded (in a packet system a token is a count of bytes, e.g. R tokens (bytes) are generated every ∆T seconds)
• a packet from the buffer is transmitted only if enough tokens can be drawn from the bucket, i.e. the bucket holds tokens equivalent to the length of the packet, L bytes
• if no token is available, the packet is backlogged in the buffer
• when the bucket is empty, backlogged packets are transmitted at the rate at which tokens arrive  the shaper then behaves like a LB shaper, transmitting packets at a constant rate
• when the bucket is not empty, packets are drawn from the buffer as soon as they arrive, so the traffic burstiness is preserved
• if the burst continues, the bucket is exhausted and packets are backlogged; the bucket size limits the burstiness
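A sketch of the shaper just described, plus the burst-length relation MS = C + RS:

class TokenBucketShaper:
    def __init__(self, rate, depth):
        self.rate = rate              # R: tokens (bytes) added per second
        self.depth = depth            # C: bucket capacity
        self.tokens = depth           # bucket starts full
        self.t = 0.0                  # time of last update

    def release_time(self, length, now):
        """Earliest time a packet of `length` bytes arriving at `now` can be sent."""
        self.tokens = min(self.depth, self.tokens + (now - self.t) * self.rate)
        self.t = now
        if self.tokens >= length:
            self.tokens -= length
            return now                                    # burstiness preserved: send at once
        wait = (length - self.tokens) / self.rate         # backlogged until tokens accumulate
        self.tokens = 0.0
        self.t = now + wait
        return now + wait

# maximum burst interval at output rate M bytes/s:  M*S = C + R*S  =>  S = C / (M - R)
C, R, M = 250e3, 2e6, 25e6          # figures from Example 2 below
print(C / (M - R))                  # about 0.011 s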

6 Delay Bound

Figure: arrival function A(t) = b + rt (b bytes arrive instantly at t = 0, then r bytes per second) into a multiplexer serving at rate R; the buffer occupancy starts at b and drains at rate R − r until it is empty.

• In time T a token bucket with rate r bytes/sec and depth b can transmit b + rT bytes: b bytes at time t = 0 and rT up to time T
• Assume a TB shaper followed by two multiplexers in tandem, each capable of transmitting R bytes/sec, where R > r
  • the first multiplexer buffers b bytes at t = 0 and subsequently receives r bytes/sec
  • its buffer occupancy of b bytes is drained at the rate R − r bytes/sec
  • a newly arrived byte experiences a delay equal to the time it takes for the buffer occupancy ahead of it to dissipate

Delay Bound Calculation

• the maximum delay experienced by the bytes following the initial b bytes at the first multiplexer will be b/R seconds
• no queue will build up at the second multiplexer, because arrival rate = service rate
• the maximum delay through a chain of multiplexers, each guaranteeing R bytes/sec of bandwidth to the flow, is b/R (fluid flow model)
• Parekh et al. give the delay bound in a packet network as

  D ≤ b/R + (H − 1)m/R + Σ_{j=1..H} M/Rj

  where m is the maximum packet size of the flow, M the maximum packet size in the network, H the number of hops, and Rj the bandwidth of link j:
  • b/R: buffer dissipation time at the first hop
  • (H − 1)m/R: transmit time of the preceding packet of the same flow at every other hop
  • Σ_{j=1..H} M/Rj: transmit time of a preceding packet of another flow at every hop

7 Token Bucket Shaping Effect

Figure: the token bucket constrains the traffic from a source to at most b + rt bits in any interval of length t  b bits may leave instantly, followed by r bits/second.

Q1: what are two main differences of a leaky bucket and a token bucket? Allow saving for burst spending; packet discarding or not.

Q2: When a token bucket is the same as a leaky bucket? b = 0; but still different indeed: packet discarding or not

The Token Bucket Example 1

° A network uses a token bucket for traffic shaping. A new token is put into the bucket every 1 msec. Each token is good for one packet, which contains 100 bytes of data. What is the maximum sustainable (input) data rate?

57 The Token Bucket Example 2

° Given: the token bucket capacity C, the token arrival rate p, and the maximum output rate M, calculate the maximum burst interval S

C + pS = MS

° Example 2: data comes to a router in 1 MB bursts, that is, an input runs at 25 MB/s (burst rate) for 40 msec. The router uses a token bucket with capacity of 250KB for traffic shaping. Initially, the bucket is full of tokens. And, the tokens are generated and put into the bucket in a rate of 2 MB/s.

What will be the output from the token bucket?

Fluid Examples

Output from a token bucket with capacities of (c) 250 KB, (d) 500 KB, (e) 750 KB, (f) Output from a 500KB token bucket feeding a 10-MB/sec leaky bucket of 1MB.

58 Current View of Router Function

Figure: current view of router function  control-plane agents (routing agent, reservation agent, management agent) feed admission control and maintain the routing database and the traffic control database; on the data path, the input driver passes packets to a classifier, the Internet forwarder, and a packet scheduler before the output driver.

How about security?

Closed-Loop Flow Control

 Congestion control  feedback information to regulate flow from sources into network  Based on buffer content, link utilization, etc.  Examples: TCP at transport layer; congestion control at ATM level  End-to-end vs. Hop-by-hop  Delay in effecting control  Implicit vs. Explicit Feedback  Source deduces congestion from observed behavior  Routers/switches generate messages alerting to congestion

59 E2E vs. H2H Congestion Control

Figure: (a) end-to-end congestion control  feedback information travels from the destination back to the source along the packet flow; (b) hop-by-hop congestion control  feedback is exchanged between neighboring nodes along the path.

TCP vs. ATM

Congestion Warning

 Threshold-based utilization warning  Which factor used for threshold calculation?  How to measure the utilization? Instantaneously or smoothed?  How to set the threshold?  How many threshold levels?

 The Warning Bit in ACKs

 Choke packets to the source

Isn’t this approach too slow in reaction?

60 Traffic Engineering

 Management exerted at flow aggregate level  Distribution of flows in network to achieve efficient utilization of resources (bandwidth)  Shortest path algorithm to route a given flow not enough  Does not take into account requirements of a flow, e.g. bandwidth requirement  Does not take account interplay between different flows  Must take into account aggregate demand from all flows

Effect of Traffic Engineering

Figure: (a) shortest path routing congests link 4–8; (b) a better flow allocation distributes flows more uniformly across the network.

61 Homework

 Chapter 7  Homework (due one week later):  7.28, 7.29, 7.33, 7.34, 7.51, 7.56, 7.57, 7.58, 7.59, 7.64

62