15-744: Computer Networking
L-7 Fair Queuing

Fair Queuing
• Fair queuing
• Core-stateless fair queuing
• Assigned reading
  • Analysis and Simulation of a Fair Queueing Algorithm, Internetworking: Research and Experience
  • Congestion Control for High Bandwidth-Delay Product Networks (XCP - 2 sections)


Overview
• TCP and queues
• Queuing disciplines
• RED
• Fair-queuing
• Core-stateless FQ
• XCP

Example
• 10 Gb/s linecard
  • Requires 300 Mbytes of buffering
  • Read and write a 40-byte packet every 32 ns
• Memory technologies
  • DRAM: requires 4 devices, but too slow
  • SRAM: requires 80 devices, 1 kW, $2000
• Problem gets harder at 40 Gb/s
  • Hence RLDRAM, FCRAM, etc.


Rule-of-thumb
• Rule-of-thumb (buffer B = 2T × C, the delay-bandwidth product) makes sense for one flow
• Typical backbone link has > 20,000 flows
• Does the rule-of-thumb still hold?

If flows are synchronized
• [Figure: aggregate window sawtooth oscillating between W_max and W_max/2 over time t]
• Aggregate window has same dynamics as a single flow
• Therefore buffer occupancy has same dynamics
• Rule-of-thumb still holds


If flows are not synchronized
• [Figure: desynchronized windows W; buffer occupancy B concentrates around its mean, with a narrow probability distribution of buffer size]

Central Limit Theorem
• CLT tells us that the more variables (congestion windows of flows) we have, the narrower the Gaussian (fluctuation of the sum of windows)
• Width of the Gaussian decreases with 1/√n
• Buffer size should also decrease with 1/√n:
  B = (2T × C) / √n


Required buffer size
• B = (2T × C) / √n
• [Simulation plot omitted; a worked arithmetic sketch follows the overview below]

Overview
• TCP and queues
• Queuing disciplines
• RED
• Fair-queuing
• Core-stateless FQ
• XCP
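A back-of-the-envelope check of the two sizing rules above, using the earlier example's numbers (10 Gb/s link, 20,000 flows; the 250 ms value for 2T is an assumption chosen to match the ~300 MB figure):

```python
# Buffer sizing: B = 2T*C for one flow, B = 2T*C/sqrt(n) for n
# desynchronized flows. Link speed, RTT, and flow count are illustrative.
import math

C = 10e9          # link capacity: 10 Gb/s
two_T = 0.25      # round-trip time 2T: 250 ms
n = 20_000        # concurrent flows on a backbone link

rule_of_thumb = two_T * C                       # B = 2T*C, in bits
small_buffer = rule_of_thumb / math.sqrt(n)     # B = 2T*C / sqrt(n)

print(f"rule of thumb: {rule_of_thumb / 8 / 1e6:,.0f} MB")   # ~313 MB
print(f"2TC/sqrt(n):   {small_buffer / 8 / 1e6:,.2f} MB")    # ~2.2 MB
```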


Queuing Disciplines
• Each router must implement some queuing discipline
• Queuing allocates both bandwidth and buffer space:
  • Bandwidth: which packet to serve (transmit) next
  • Buffer space: which packet to drop next (when required)
• Queuing also affects latency

Packet Drop Dimensions
• Aggregation: per-connection state ↔ single class (class-based queuing in between)
• Drop position: head, tail, or random location
• Drop timing: early drop ↔ overflow drop


Typical Queuing
• FIFO + drop-tail
  • Simplest choice
  • Used widely in the Internet
• FIFO (first-in-first-out)
  • Implies single class of traffic
• Drop-tail
  • Arriving packets get dropped when queue is full, regardless of flow or importance
• Important distinction:
  • FIFO: scheduling discipline
  • Drop-tail: drop policy

FIFO + Drop-tail Problems
• Leaves responsibility of congestion control to the edges (e.g., TCP)
• Does not distinguish between different flows
• No policing: send more packets → get more service
• Synchronization: end hosts react to the same events


Active Queue Management
• Design active router queue management to aid congestion control
• Why?
  • Routers can distinguish between propagation and persistent queuing delays
  • Routers can decide on transient congestion, based on workload

Active Queue Designs
• Modify both router and hosts
  • DECbit - congestion bit in packet header
• Modify router, hosts use TCP
  • Fair queuing
    • Per-connection buffer allocation
  • RED (Random Early Detection)
    • Drop packet or set bit in packet header as soon as congestion is starting


Overview
• TCP and queues
• Queuing disciplines
• RED
• Fair-queuing
• Core-stateless FQ
• XCP

Internet Problems
• Full queues
  • Routers are forced to have large queues to maintain high utilization
  • TCP detects congestion from loss
  • Forces network to have long-standing queues in steady state
• Lock-out problem
  • Drop-tail routers treat bursty traffic poorly
  • Traffic gets synchronized easily → allows a few flows to monopolize the queue space


Design Objectives
• Keep throughput high and delay low
• Accommodate bursts
• Queue size should reflect ability to accept bursts rather than steady-state queuing
• Improve TCP performance with minimal hardware changes

Lock-out Problem
• Random drop
  • Packet arriving when queue is full causes some random packet to be dropped
• Drop front
  • On full queue, drop packet at head of queue
• Random drop and drop front solve the lock-out problem but not the full-queues problem


Full Queues Problem
• Drop packets before queue becomes full (early drop)
• Intuition: notify senders of incipient congestion
• Example: early random drop (ERD):
  • If qlen > drop level, drop each new packet with fixed probability p
  • Does not control misbehaving users

Random Early Detection (RED)
• Detect incipient congestion, allow bursts
• Keep power (throughput/delay) high
  • Keep average queue size low
  • Assume hosts respond to lost packets
• Avoid window synchronization
  • Randomly mark packets
• Avoid bias against bursty traffic
• Some protection against ill-behaved users


RED Algorithm
• Maintain running average of queue length
• If avgq < minth, do nothing
  • Low queuing: send packets through
• If avgq > maxth, drop packet
  • Protection from misbehaving sources
• Else mark packet in a manner proportional to queue length
  • Notify sources of incipient congestion

RED Operation
• [Figure: queue with min thresh and max thresh markers on the average queue length; P(drop) rises linearly from 0 at minth to maxP at maxth, then jumps to 1.0]


RED Algorithm
• Maintain running average of queue length
• For each packet arrival:
  • Calculate the average queue size (avgq)
  • If minth ≤ avgq < maxth
    • Calculate probability Pa
    • With probability Pa, mark the arriving packet
  • Else if maxth ≤ avgq
    • Mark the arriving packet

Queue Estimation
• Standard EWMA: avgq = (1 - wq) · avgq + wq · qlen
  • Byte mode vs. packet mode - why?
  • Special fix for idle periods - why?
• Upper bound on wq depends on minth
  • Want to ignore transient congestion
  • Can calculate the queue average if a burst arrives
  • Set wq such that a certain burst size does not exceed minth
• Lower bound on wq to detect congestion relatively quickly
• Typical wq = 0.002
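A minimal sketch of the arrival path above. The parameter values are the "typical" ones from these slides (min_th/max_th in packets are illustrative); the count-based probability correction and the idle-period fix from the RED paper are omitted:

```python
import random

W_Q, MIN_TH, MAX_TH, MAX_P = 0.002, 5, 15, 0.02   # illustrative values

avg_q = 0.0

def on_packet_arrival(qlen: int) -> bool:
    """Return True if the arriving packet should be marked (or dropped)."""
    global avg_q
    # EWMA of the instantaneous queue length: avg = (1 - w_q)*avg + w_q*qlen
    avg_q = (1 - W_Q) * avg_q + W_Q * qlen

    if avg_q < MIN_TH:
        return False            # low queuing: send packet through
    if avg_q >= MAX_TH:
        return True             # persistent congestion: always mark/drop
    # Between thresholds: mark with probability rising linearly toward max_p
    p_a = MAX_P * (avg_q - MIN_TH) / (MAX_TH - MIN_TH)
    return random.random() < p_a
```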


Thresholds
• minth determined by the utilization requirement
  • Tradeoff between queuing delay and utilization
• Relationship between maxth and minth
  • Want to ensure that feedback has enough time to make a difference in load
  • Depends on average queue increase in one RTT
  • Paper suggests a ratio of 2
  • Current rule of thumb is a factor of 3

Packet Marking
• maxp is reflective of typical loss rates
  • Paper uses 0.02
  • 0.1 is a more realistic value
• If the network needs marking of 20-30%, then you need to buy a better link!
• Gentle variant of RED (recommended; see the sketch below)
  • Vary drop rate from maxp to 1 as avgq varies from maxth to 2·maxth
  • More robust to setting of maxth and maxp
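A sketch of the gentle-RED curve described above, assuming the same parameter names; the drop probability climbs from max_p to 1 between max_th and 2·max_th:

```python
def gentle_red_prob(avg_q: float, min_th: float, max_th: float,
                    max_p: float) -> float:
    """Mark/drop probability as a function of the average queue length."""
    if avg_q < min_th:
        return 0.0
    if avg_q < max_th:                 # classic RED region: 0 -> max_p
        return max_p * (avg_q - min_th) / (max_th - min_th)
    if avg_q < 2 * max_th:             # "gentle" region: max_p -> 1
        return max_p + (1 - max_p) * (avg_q - max_th) / max_th
    return 1.0
```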


Extending RED for Flow Isolation
• Problem: what to do with non-cooperative flows?
• Fair queuing achieves isolation using per-flow state - expensive at backbone routers
  • How can we isolate unresponsive flows without per-flow state?
• RED penalty box
  • Monitor history for packet drops, identify flows that use disproportionate bandwidth
  • Isolate and punish those flows

Overview
• TCP and queues
• Queuing disciplines
• RED
• Fair-queuing
• Core-stateless FQ
• XCP


Fairness Goals
• Allocate resources fairly
• Isolate ill-behaved users
  • Router does not send explicit feedback to source
  • Still needs e2e congestion control
• Still achieve statistical muxing

What is Fairness?
• At what granularity?
  • Flows, connections, domains?
• What if users have different RTTs/links/etc.?
  • Should it share a link fairly or be TCP fair?
• Maximize fairness index?
  • Fairness = (Σᵢ xᵢ)² / (n · Σᵢ xᵢ²)
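The index above is Jain's fairness index; a quick sketch makes it concrete (1.0 means perfectly equal shares, 1/n means one user takes everything):

```python
def jain_fairness(x: list[float]) -> float:
    """Jain's index: (sum of x_i)^2 / (n * sum of x_i^2)."""
    return sum(x) ** 2 / (len(x) * sum(v * v for v in x))

print(jain_fairness([5, 5, 5, 5]))    # 1.0  (perfectly fair)
print(jain_fairness([20, 0, 0, 0]))   # 0.25 (one flow monopolizes)
```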


Max-min Fairness
• Allocate user with "small" demand what it wants; evenly divide unused resources among "big" users
• Formally:
  • Resources allocated in order of increasing demand
  • No source gets a resource share larger than its demand
  • Sources with unsatisfied demands get equal shares of the resource

Max-min Fairness Example
• Assume sources 1..n, with resource demands X1..Xn in ascending order
• Assume channel capacity C
  • Give C/n to X1; if this is more than X1 wants, divide the excess (C/n - X1) among the other sources: each gets C/n + (C/n - X1)/(n-1)
  • If this is larger than what X2 wants, repeat the process
• (A code sketch of this procedure follows below)
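A minimal sketch of the progressive-filling procedure described above; the demands and capacity are example values:

```python
def max_min_allocate(demands: list[float], capacity: float) -> list[float]:
    """Max-min fair shares: satisfy small demands first, split what is left."""
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    alloc = [0.0] * len(demands)
    remaining, left = capacity, len(demands)
    for i in order:                           # ascending demand
        share = remaining / left              # equal split of remaining capacity
        alloc[i] = min(demands[i], share)     # never exceed the demand
        remaining -= alloc[i]
        left -= 1
    return alloc

print(max_min_allocate([2, 2.6, 4, 5], 10))   # ~[2, 2.6, 2.7, 2.7]
```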


Max-Min Fair Sharing Example
• [Figure: example allocation across 10 Mb/s links; each flow receives min(ri, rfair), where rfair is its equal share of the remaining capacity]

Implementing max-min Fairness
• Generalized processor sharing
  • Fluid fairness
  • Bitwise round robin among all queues
• Why not simple round robin?
  • Variable packet length → can get more service by sending bigger packets
  • Unfair instantaneous service rate
  • What if a packet arrives just before/after another departs?


Bit-by-bit RR
• Single flow: clock ticks when a bit is transmitted. For packet i:
  • Pi = length, Ai = arrival time, Si = begin transmit time, Fi = finish transmit time
  • Fi = Si + Pi = max(Fi-1, Ai) + Pi
• Multiple flows: clock ticks when a bit from all active flows is transmitted → round number
  • Can calculate Fi for each packet if the number of flows is known at all times
  • This can be complicated

Bit-by-bit RR Illustration
• Not feasible to interleave bits on real networks
  • FQ simulates bit-by-bit RR


Fair Queuing
• Mapping bit-by-bit schedule onto packet transmission schedule
• Transmit the packet with the lowest Fi at any given time
• How do you compute Fi? (see the sketch below)

FQ Illustration
• [Figure: flows 1..n arriving at input I/P, scheduled onto output O/P]
• Variation: Weighted Fair Queuing (WFQ)
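A minimal sketch of the finish-time bookkeeping above, using the recurrence Fi = max(Fi-1, R(Ai)) + Pi; it assumes the current round number R(t) is supplied by the caller (computing R itself, and WFQ weights, are omitted):

```python
import heapq

class FairQueue:
    """Packet-level FQ: always transmit the packet with the smallest F_i."""

    def __init__(self):
        self.last_finish = {}    # per-flow F_{i-1}
        self.heap = []           # (finish_time, flow_id, pkt_len)

    def enqueue(self, flow_id: int, pkt_len: int, round_now: float) -> None:
        # F_i = max(F_{i-1}, R(A_i)) + P_i   (unit weight per flow)
        start = max(self.last_finish.get(flow_id, 0.0), round_now)
        finish = start + pkt_len
        self.last_finish[flow_id] = finish
        heapq.heappush(self.heap, (finish, flow_id, pkt_len))

    def dequeue(self):
        # Lowest finish time goes first; no preemption of the packet in service
        return heapq.heappop(self.heap) if self.heap else None
```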


Bit-by-bit RR Example
• [Figure: Flow 1 and Flow 2 packets with finish times F=10, F=8, F=5; the output transmits in finish-time order; a later arrival with F=2 must wait, since the packet currently being transmitted (F=10) cannot be preempted]

Fair Queuing Tradeoffs
• FQ can control congestion by monitoring flows
  • Non-adaptive flows can still be a problem - why?
• Complex state
  • Must keep a queue per flow
  • Hard in routers with many flows (e.g., backbone routers)
  • Flow aggregation is a possibility (e.g., do fairness per domain)
• Complex computation
  • Classification into flows may be hard
  • Must keep queues sorted by finish times
  • Finish times change whenever the flow count changes


Overview
• TCP and queues
• Queuing disciplines
• RED
• Fair-queuing
• Core-stateless FQ
  • Not discussed in class - FYI only
• XCP

Core-Stateless Fair Queuing
• Key problem with FQ is core routers
  • Must maintain state for 1000's of flows
  • Must update state at Gbps line speeds
• CSFQ (Core-Stateless FQ) objectives
  • Edge routers should do complex tasks since they have fewer flows
  • Core routers can do simple tasks
    • No per-flow state/processing → this means that core routers can only decide on dropping packets, not on order of processing
    • Can only provide max-min bandwidth fairness, not delay allocation


Core-Stateless Fair Queuing
• Edge routers keep state about flows and do computation when a packet arrives
• DPS (Dynamic Packet State)
  • Edge routers label packets with the result of state lookup and computation
• Core routers use DPS and local measurements to control processing of packets

Edge Router Behavior
• Monitor each flow i to measure its arrival rate (ri)
  • EWMA of rate
  • Non-constant EWMA gain: e^(-T/K), where T = current packet interarrival time, K = a constant
    • Helps adapt to different packet sizes and arrival patterns
• Rate is attached to each packet
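A sketch of the edge rate estimator above, following the exponential-averaging form r ← (1 - e^(-T/K))·(len/T) + e^(-T/K)·r; the value of K and the class/method names are illustrative:

```python
import math

K = 0.1   # averaging constant in seconds (illustrative value)

class FlowRateEstimator:
    """Per-flow arrival-rate EWMA whose gain adapts to packet spacing."""

    def __init__(self):
        self.rate = 0.0
        self.last_arrival = None

    def on_packet(self, pkt_len_bits: float, now: float) -> float:
        if self.last_arrival is not None:
            T = max(now - self.last_arrival, 1e-9)    # interarrival time
            w = math.exp(-T / K)                      # non-constant EWMA gain
            self.rate = (1 - w) * (pkt_len_bits / T) + w * self.rate
        self.last_arrival = now
        return self.rate    # written into the packet header (DPS label)
```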


Core Router Behavior
• Keep track of fair share rate α
• Drop probability for a packet = max(1 - α/ri, 0) (see the sketch below)
• F(α) = Σi min(ri, α) → what does this look like?
• Periodically update α
  • Keep track of current arrival rate
  • Only update α if the entire period was congested or uncongested

F vs. Alpha
• [Figure: F(α) rises piecewise-linearly (slope = number of still-constrained flows) and saturates at the link capacity C; past saturation, increasing α does not increase the load F; x-axis marks r1, r2, r3, old α, new α]
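A sketch of the per-packet core decision above (r is the rate label carried in the packet's header via DPS; the function name is illustrative):

```python
import random

def csfq_forward(rate_label: float, alpha: float) -> bool:
    """Return True to forward the packet, False to drop it.

    Drop probability = max(1 - alpha/r, 0): flows at or below the fair
    share lose nothing; faster flows are trimmed down to alpha on average.
    """
    p_drop = max(1.0 - alpha / rate_label, 0.0)   # assumes rate_label > 0
    return random.random() >= p_drop
```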


Estimating Fair Share
• Need F(α) = capacity = C
  • Can't keep a map of F(α) values → would require per-flow state
• Since F(α) is concave and piecewise-linear (sketch below):
  • F(0) = 0 and F(αold) = current accepted rate = Fc
  • Approximate F(α) ≈ (Fc / αold) · α
  • Solve F(αnew) = C → αnew = αold · C / Fc
• What if a mistake was made?
  • Forced into dropping packets due to buffer capacity
  • When the queue overflows, α is decreased slightly

Other Issues
• Punishing fire-hoses - why?
  • Easy to keep track of in a FQ scheme
• What are the real edges in such a scheme?
• Must trust edges to mark traffic accurately
  • Could do some statistical sampling to see if an edge was marking accurately
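A sketch of the α update above. The linear interpolation gives αnew = αold · C/Fc; how the router decides a period was congested, and how much to back off on overflow, are simplified assumptions here:

```python
def update_alpha(alpha_old: float, accepted_rate_fc: float,
                 capacity_c: float, congested: bool,
                 queue_overflowed: bool) -> float:
    """Periodic fair-share update at a CSFQ core router (simplified)."""
    if queue_overflowed:
        return 0.99 * alpha_old        # a mistake was made: decrease alpha slightly
    if congested and accepted_rate_fc > 0:
        # F(alpha) ~ (Fc / alpha_old) * alpha; solve F(alpha_new) = C
        return alpha_old * capacity_c / accepted_rate_fc
    return alpha_old                   # uncongested period: no update here
```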


Overview
• TCP and queues
• Queuing disciplines
• RED
• Fair-queuing
• Core-stateless FQ
• XCP
  • See also slides by Nicolas Feltman

How does XCP Work?
• Each packet carries a congestion header:
  • Round Trip Time
  • Congestion Window
  • Feedback (initialized by the sender, e.g., Feedback = +0.1 packet)


How does XCP Work?
• Routers along the path modify the feedback field in the congestion header
  • e.g., Feedback = +0.1 packet may become Feedback = -0.3 packet

How does XCP Work?
• The sender applies the returned feedback:
  • Congestion Window = Congestion Window + Feedback
• XCP extends ECN and CSFQ
• Routers compute feedback without any per-flow state
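A tiny sketch of the sender-side update above (names are hypothetical; real XCP spreads per-packet feedback across a window, which is elided here):

```python
def xcp_sender_update(cwnd_pkts: float, feedback_pkts: float) -> float:
    """Congestion Window = Congestion Window + Feedback (floored at 1 packet)."""
    return max(cwnd_pkts + feedback_pkts, 1.0)
```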


How Does an XCP Router Compute the Feedback?
• Congestion Controller
  • Goal: matches input traffic to link capacity & drains the queue
  • Looks at aggregate traffic & queue
  • Algorithm: MIMD
    • Aggregate traffic changes by Φ: Φ ~ Spare Bandwidth and Φ ~ -Queue Size
    • So Φ = α · avg_rtt · Spare - β · Queue
• Fairness Controller
  • Goal: divides Φ between flows to converge to fairness
  • Looks at a flow's state in the Congestion Header
  • Algorithm: AIMD
    • If Φ > 0 ⇒ divide Φ equally between flows
    • If Φ < 0 ⇒ divide Φ between flows proportionally to their current rates

Getting the devil out of the details …
• Congestion Controller
  • Φ = α · avg_rtt · Spare - β · Queue
  • Theorem: system converges to optimal utilization (i.e., is stable) for any link bandwidth, delay, and number of sources if 0 < α < π/(4√2) and β = α²·√2
  • (Proof based on Nyquist Criterion)
  • No parameter tuning
• Fairness Controller
  • If Φ > 0 ⇒ divide Φ equally between flows
  • If Φ < 0 ⇒ divide Φ between flows proportionally to their current rates
  • No per-flow state: need to estimate the number of flows N
    • N̂ = Σ_{pkts in T} RTT_pkt / (T · Cwnd_pkt)
    • RTT_pkt: round-trip time in header; Cwnd_pkt: congestion window in header; T: counting interval
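A minimal sketch of the aggregate-feedback computation above, using constants consistent with the stability theorem (α = 0.4 satisfies 0 < α < π/(4√2) ≈ 0.555, and β = α²·√2 ≈ 0.226); variable names are illustrative:

```python
import math

ALPHA = 0.4                            # satisfies 0 < alpha < pi/(4*sqrt(2))
BETA = ALPHA ** 2 * math.sqrt(2)       # ~0.226, per the stability theorem

def xcp_aggregate_feedback(capacity: float, input_rate: float,
                           queue_bytes: float, avg_rtt: float) -> float:
    """Phi = alpha * avg_rtt * spare_bandwidth - beta * persistent_queue."""
    spare = capacity - input_rate      # spare bandwidth, bytes/sec
    return ALPHA * avg_rtt * spare - BETA * queue_bytes
```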

Discussion
• RED
  • Parameter settings
• RED vs. FQ
  • How much do we need per-flow tracking? At what cost?
• FQ vs. XCP/CSFQ
  • Is coarse-grained fairness sufficient?
  • Misbehaving routers / trusting the edge
  • Deployment (and incentives)
  • How painful is FQ?
• XCP vs. CSFQ
  • What are the key differences?
• Granularity of fairness
• Mechanism vs. policy → will see this in QoS

Important Lessons
• How does TCP implement AIMD?
  • Sliding window, slow start & ack clocking
  • How to maintain ack clocking during loss recovery → fast recovery
• How does TCP fully utilize a link?
  • Role of router buffers
• TCP alternatives
  • TCP being used in new/unexpected ways
  • Key changes needed


Lessons
• Fairness and isolation in routers
  • Why is this hard?
  • What does it achieve - e.g., do we still need congestion control?
• Routers
  • FIFO, drop-tail interacts poorly with TCP
  • Various schemes to desynchronize flows and control loss rate (e.g., RED)
• Fair-queuing
  • Clean resource allocation to flows
  • Complex packet classification and scheduling
• Core-stateless FQ & XCP
  • Coarse-grain fairness
  • Carrying packet state can reduce complexity

EXTRA SLIDES
• The rest of the slides are FYI

