Copa: Practical Delay-Based Congestion Control for the Internet

Venkat Arun and Hari Balakrishnan
M.I.T. Computer Science and Artificial Intelligence Laboratory
Email: {venkatar,hari}@mit.edu

Abstract

This paper introduces Copa, an end-to-end congestion control algorithm that uses three ideas. First, it shows that a target rate equal to 1/(δ·dq), where dq is the (measured) queueing delay, optimizes a natural function of throughput and delay under a Markovian packet arrival model. Second, it adjusts its congestion window in the direction of this target rate, converging quickly to the correct fair rates even in the face of significant flow churn. These two ideas enable a group of Copa flows to maintain high utilization with low queuing delay. However, when the bottleneck is shared with loss-based congestion-controlled flows that fill up buffers, Copa, like other delay-sensitive schemes, achieves low throughput. To combat this problem, Copa uses a third idea: detect the presence of buffer-fillers by observing the delay evolution, and respond with additive-increase/multiplicative-decrease on the δ parameter. Experimental results show that Copa outperforms Cubic (similar throughput, much lower delay, fairer with diverse RTTs), BBR and PCC (significantly fairer, lower delay), and co-exists well with Cubic unlike BBR and PCC. Copa is also robust to non-congestive loss and large bottleneck buffers, and outperforms other schemes on long-RTT paths.

1 Introduction

A good end-to-end congestion control protocol for the Internet must achieve high throughput, low queueing delay, and allocate rates to flows in a fair way. Despite three decades of work, these goals have been hard to achieve. One reason is that network technologies and applications have been continually changing. Since the deployment of Cubic [13] and Compound [32, 31] a decade ago to improve on Reno's [16] performance on high bandwidth-delay product (BDP) paths, link rates have increased significantly, wireless (with its time-varying link rates) has become common, and the Internet has become more global, with terrestrial paths exhibiting higher round-trip times (RTTs) than before. Faster link rates mean that many flows start and stop quicker, increasing the level of flow churn, but the prevalence of video streaming and large bulk transfers (e.g., file sharing and backups) means that these long flows must co-exist with short ones whose objectives are different (high throughput versus low flow completion time or low interactive delay). Larger BDPs exacerbate the "bufferbloat" problem. A more global Internet leads to flows with very different propagation delays sharing a bottleneck (exacerbating the RTT-unfairness exhibited by many current protocols).

At the same time, application providers and users have become far more sensitive to performance, with notions of "quality of experience" for real-time and streaming media, and various metrics to measure Web performance being developed. Many companies have invested substantial amounts of money to improve network and application performance. Thus, the performance of congestion control algorithms, which are at the core of the transport protocols used to deliver data on the Internet, is important to understand and improve.

Congestion control research has evolved in multiple threads. One thread, starting from Reno, and extending to Cubic and Compound, relies on packet loss (or ECN) as the fundamental congestion signal. Because these schemes fill up network buffers, they achieve high throughput at the expense of queueing delay, which makes it difficult for interactive or Web-like applications to achieve good performance when long-running flows also share the bottleneck. To address this problem, schemes like Vegas [4] and FAST [34] use delay, rather than loss, as the congestion signal. Unfortunately, these schemes are prone to overestimating delay due to ACK compression and network jitter, and under-utilize the link as a result. Moreover, when run with concurrent loss-based algorithms, these methods achieve poor throughput because loss-based methods must fill buffers to elicit a congestion signal.

A third thread of research, starting about ten years ago, has focused on important special cases of network environments or workloads, rather than striving for generality. The past few years have seen new congestion control methods for datacenters [1, 2, 3, 29], cellular networks [36, 38], Web applications [9], video streaming [10, 20], vehicular Wi-Fi [8, 21], and more. The performance of special-purpose congestion control methods is often significantly better than that of prior general-purpose schemes.

A fourth, and most recent, thread of end-to-end congestion control research has argued that the space of congestion control signals and actions is too complicated for human engineering, and that algorithms can produce better actions than humans.

Work in this thread includes Remy [30, 35], PCC [6], and Vivace [7]. These approaches define an objective function to guide the process of coming up with the set of online actions (e.g., on every ACK, or periodically) that will optimize the specified function. Remy performs this optimization offline, producing rules that map observed congestion signals to sender actions. PCC and Vivace perform online optimizations.

In many scenarios these objective-optimization methods outperform the more traditional window-update schemes [6, 35]. Their drawback, however, is that the online rules executed at runtime are much more complex and hard for humans to reason about (for example, a typical Remy controller has over 200 rules). A scheme that uses online optimization requires the ability to measure the factors that go into the objective function, which may take time to obtain; for example, PCC's default objective function incorporates the packet loss rate, but a network running at a low packet loss rate (a desirable situation) will require considerable time to estimate it.

We ask whether it is possible to develop a congestion control algorithm that achieves the goals of high throughput, low queueing delay, and fair rate allocations, but which is also simple to understand, is general in its applicability to a wide range of environments and workloads, and performs at least as well as the best prior schemes designed for particular situations.

Approach: We have developed Copa, an end-to-end congestion control method that achieves these goals. Inspired by work on Network Utility Maximization (NUM) [18] and by machine-generated algorithms, we start with an objective function to optimize. The objective function we use combines a flow's average throughput, λ, and packet delay (minus propagation delay), d: U = log λ − δ log d. The goal is for each sender to maximize its U. Here, δ determines how much to weigh delay compared to throughput; a larger δ signifies that lower packet delays are preferable. We show that under certain simplified (but reasonable) modeling assumptions of packet arrivals, the steady-state sending rate (in packets per second) that maximizes U is

λ = 1/(δ · dq),    (1)

where dq is the mean per-packet queuing delay (in seconds), and 1/δ is in units of MTU-sized packets. When every sender transmits at this rate, a unique, socially-acceptable Nash equilibrium is attained.

We use this rate as the target rate for a Copa sender. The sender estimates the queuing delay using its RTT observations and moves quickly toward hovering near this target rate. This mechanism also induces a property that the queue is regularly almost flushed, which helps all endpoints get a correct estimate of the queuing delay. Finally, to compete well with buffer-filling competing flows, Copa mimics an AIMD window-update rule when it observes that the bottleneck queues rarely empty.

Results: We have conducted several experiments in emulation, over real-world Internet paths, and in simulation, comparing Copa to several other methods.

1. As flows enter and leave an emulated network, Copa maintains nearly full link utilization with a median Jain's fairness index of 0.86. The median indices for Cubic, BBR and PCC are 0.81, 0.61 and 0.35 respectively (higher is better).
2. In real-world experiments, Copa achieved nearly as much throughput and 2-10× lower queueing delays than Cubic and BBR.
3. In datacenter network simulations, on a web search workload trace drawn from a datacenter network [11], Copa achieved a >5× reduction in flow completion time for short flows over DCTCP. It achieved similar performance for long flows.
4. In experiments on an emulated satellite path, Copa achieved nearly full link utilization with an average queuing delay of only 1 ms. Remy's performance was similar, while PCC achieved similar throughput but with ≈700 ms of queuing delay. BBR obtained 50% link utilization with ≈100 ms queuing delay. Both Cubic and Vegas obtained <4% utilization.
5. In an experiment to test RTT-fairness, Copa, Cubic, Cubic over CoDel, and NewReno obtained Jain fairness indices of 0.76, 0.12, 0.57, and 0.37 respectively (higher is better). Copa is designed to coexist with TCPs (see §2.2), but when told that no competing TCPs exist, Copa allocated equal bandwidth to all flows (fairness index ≈1).
6. Copa co-exists well with TCP Cubic. On a set of randomly chosen emulated networks where Copa and Cubic flows share a bottleneck, Copa flows benefit and Cubic flows aren't hurt (up to statistically insignificant differences) in average throughput. BBR and PCC obtain higher throughput at the cost of competing Cubic flows.

2 Copa Algorithm

Copa incorporates three ideas: first, a target rate to aim for, which is inversely proportional to the measured queueing delay; second, a window update rule that moves the sender toward the target rate; and third, a TCP-competitive strategy to compete well with buffer-filling flows.
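As a concrete illustration of Eq. (1), here is a minimal sketch (ours, not the authors' code) of the target-rate computation; the queuing-delay estimate dq = RTTstanding − RTTmin is defined in §2.1 below:

```python
def copa_target_rate(delta, rtt_standing, rtt_min):
    """Target rate from Eq. (1), in MTU-sized packets per second.

    delta:        weight on delay in the objective (default 0.5)
    rtt_standing: min RTT over the recent srtt/2 window (seconds)
    rtt_min:      min RTT over a long window (seconds)
    """
    dq = rtt_standing - rtt_min  # queuing delay estimate, Eq. (2) below
    if dq <= 0:
        return float("inf")      # empty queue: no delay signal, keep increasing
    return 1.0 / (delta * dq)

# Example: delta = 0.5 and a 10 ms standing queue give
# 1/(0.5 * 0.01) = 200 packets/s, i.e., about 2.4 Mbit/s
# with 1500-byte packets.
print(copa_target_rate(0.5, 0.030, 0.020))  # -> 200.0
```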

2.1 Target Rate and Update Rule

Copa uses a congestion window, cwnd, which upper-bounds the number of in-flight packets. On every ACK, the sender estimates the current rate λ = cwnd/RTTstanding, where RTTstanding is the smallest RTT observed over a recent time-window, τ. We use τ = srtt/2, where srtt is the current value of the standard smoothed RTT estimate. RTTstanding is the RTT corresponding to a "standing" queue, since it is the minimum observed in a recent time window.

The sender calculates the target rate using Eq. (1), estimating the queueing delay as

dq = RTTstanding − RTTmin,    (2)

where RTTmin is the smallest RTT observed over a long period of time. We use the smaller of 10 seconds and the time since the flow started for this period (the 10-second part is to handle route changes that might alter the minimum RTT of the path).

If the current rate exceeds the target, the sender reduces cwnd; otherwise, it increases cwnd. To avoid packet bursts, the sender paces packets at a rate of 2·cwnd/RTTstanding packets per second. Pacing also makes packet arrivals at the bottleneck queue appear Poisson as the number of flows increases, a useful property that increases the accuracy of our model to derive the target rate (§4). The pacing rate is double cwnd/RTTstanding to accommodate imperfections in pacing; if it were exactly cwnd/RTTstanding, then the sender may send slower than desired.

The reason for using the smallest RTT in the recent τ = srtt/2 duration, rather than the latest RTT sample, is for robustness in the face of ACK compression [39] and network jitter, which increase the RTT and can confuse the sender into believing that a longer RTT is due to queueing on the forward data path. ACK compression can be caused by queuing on the reverse path and by wireless links.

The Copa sender runs the following steps on each ACK arrival:

1. Update the queuing delay dq using Eq. (2), and srtt using the standard TCP exponentially weighted moving average estimator.
2. Set λt = 1/(δ·dq) according to Eq. (1).
3. If λ = cwnd/RTTstanding ≤ λt, then cwnd = cwnd + v/(δ·cwnd), where v is a "velocity parameter" (defined in the next step). Otherwise, cwnd = cwnd − v/(δ·cwnd). Over 1 RTT, the change in cwnd is thus ≈ v/δ packets.
4. The velocity parameter, v, speeds up convergence. It is initialized to 1. Once per window, the sender compares the current cwnd to the cwnd value at the time that the latest acknowledged packet was sent (i.e., cwnd at the start of the current window). If the current cwnd is larger, then set direction to "up"; if it is smaller, then set direction to "down". Now, if direction is the same as in the previous window, then double v. If not, then reset v to 1. However, start doubling v only after the direction has remained the same for three RTTs. Since direction may remain the same for 2.5 RTTs in steady state, as shown in Figure 1, doing otherwise can cause v to be >1 even during steady state. In steady state, we want v = 1.

When a flow starts, Copa performs slow-start, where cwnd doubles once per RTT until λ exceeds λt. While the velocity parameter also allows an exponential increase, the constants are smaller. Having an explicit slow-start phase allows Copa to have a larger initial cwnd, like many deployed TCP implementations.
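The per-ACK processing above can be summarized in a short sketch (a simplified illustration in Python, not the authors' implementation; the class and helper names are our own):

```python
class CopaSender:
    """Minimal sketch of Copa's per-ACK update (default mode)."""

    def __init__(self, delta=0.5):
        self.delta = delta
        self.cwnd = 10.0               # congestion window, packets
        self.v = 1.0                   # velocity parameter
        self.direction = None          # "up" or "down"
        self.same_direction_windows = 0
        self.cwnd_at_window_start = self.cwnd

    def on_ack(self, rtt_standing, rtt_min, srtt):
        # Steps 1-3: queuing delay (Eq. 2), target rate (Eq. 1), cwnd move.
        dq = rtt_standing - rtt_min
        rate = self.cwnd / rtt_standing
        target = float("inf") if dq <= 0 else 1.0 / (self.delta * dq)
        step = self.v / (self.delta * self.cwnd)   # sums to ~v/delta pkts per RTT
        self.cwnd += step if rate <= target else -step

    def on_window_end(self):
        # Step 4: once per window, adjust the velocity parameter v.
        d = "up" if self.cwnd > self.cwnd_at_window_start else "down"
        if d == self.direction:
            self.same_direction_windows += 1
            # double v only once the direction has persisted ~3 RTTs, so the
            # 2.5-RTT-per-direction steady-state oscillation keeps v = 1
            if self.same_direction_windows >= 3:
                self.v *= 2.0
        else:
            self.direction = d
            self.same_direction_windows = 0
            self.v = 1.0
        self.cwnd_at_window_start = self.cwnd

    def pacing_rate(self, rtt_standing):
        # 2x cwnd/RTTstanding: headroom so pacing imperfections do not
        # make the sender slower than intended
        return 2.0 * self.cwnd / rtt_standing
```

A real implementation also maintains the windowed minima that produce RTTstanding and RTTmin, and includes the slow-start phase; both are elided here.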
2.2 Competing with Buffer-Filling Schemes

We now modify Copa to compete well with buffer-filling algorithms such as Cubic and NewReno while maintaining its good properties. The problem is that Copa seeks to maintain low queuing delays; without modification, it will lose to buffer-filling schemes. We propose two distinct modes of operation for Copa:

1. The default mode, where δ = 0.5, and
2. A competitive mode, where δ is adjusted dynamically to match the aggressiveness of typical buffer-filling schemes.

Copa switches between these modes depending on whether or not it detects a competing long-running buffer-filling scheme. The detector exploits a key Copa property: the queue is empty at least once every 5·RTT when only Copa flows with similar RTTs share the bottleneck (Section 3). With even one concurrent long-running buffer-filling flow, the queue will not empty at this periodicity. Hence if the sender sees a "nearly empty" queue in the last 5 RTTs, it remains in the default mode; otherwise, it switches to competitive mode. We estimate "nearly empty" as any queuing delay lower than 10% of the RTT oscillations in the last four RTTs; i.e., dq < 0.1·(RTTmax − RTTmin), where RTTmax is measured over the past four RTTs and RTTmin is our long-term minimum as defined before. Using RTTmax allows Copa to calibrate its notion of "nearly empty" to the amount of short-term RTT variance in the current network.

In competitive mode the sender varies 1/δ according to whatever buffer-filling algorithm one wishes to emulate (e.g., NewReno, Cubic, etc.). In our implementation we perform AIMD on 1/δ based on packet success or loss, but this scheme could respond to other congestion signals. In competitive mode, δ ≤ 0.5. When Copa switches from competitive mode to default mode, it resets δ to 0.5.

The queue may be nearly empty even in the presence of a competing buffer-filling flow (e.g., because of a recent packet loss). If that happens, Copa will switch to default mode. Eventually, the buffer will fill again, making Copa switch to competitive mode.
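A sketch of the mode detector and the competitive-mode AIMD on 1/δ, continuing the Python sketch above (our own illustrative rendering; the AIMD constants are assumptions, since the paper only specifies "AIMD on 1/δ"):

```python
def nearly_empty(dq, rtt_max, rtt_min):
    """Queue deemed 'nearly empty' if dq < 10% of recent RTT oscillation.

    rtt_max is measured over the past four RTTs; rtt_min is the
    long-term minimum.
    """
    return dq < 0.1 * (rtt_max - rtt_min)

def update_mode(sender, saw_nearly_empty_in_last_5_rtts):
    # Default mode whenever the queue drained recently; else competitive.
    if saw_nearly_empty_in_last_5_rtts:
        sender.mode, sender.delta = "default", 0.5   # reset delta on switch
    else:
        sender.mode = "competitive"

def on_ack_competitive(sender, lost):
    # AIMD on 1/delta, keeping delta <= 0.5 (i.e., 1/delta >= 2).
    # +1 per window / halve on loss are illustrative constants.
    inv = 1.0 / sender.delta
    inv = max(2.0, inv / 2.0) if lost else inv + 1.0 / sender.cwnd
    sender.delta = 1.0 / inv
```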

Note that if some Copa flows are operating in competitive mode but no buffer-filling flows are present, perhaps because the decision was erroneous or because the competing flows left the network, Copa flows once again begin to periodically empty the queue. The mode-selection method will detect this condition and switch to default mode.

3 Dynamics of Copa

Figures 1 (schematic view) and 2 (emulated link) show the evolution of Copa's cwnd with time. In steady state, each Copa flow makes small oscillations about the target rate, which also is the equilibrium rate (Section 4). By "equilibrium", we mean the situation when every sender is sending at its target rate. When the propagation delays for flows sharing a bottleneck are similar and comparable to (or larger than) the queuing delay, the small oscillations synchronize to cause the queue length at the bottleneck to oscillate between 0 and 2.5/δ̂ packets every five RTTs. Here, δ̂ = (∑i 1/δi)⁻¹. The equilibrium queue length is (0 + 2.5/δ̂)/2 = 1.25/δ̂ packets. When each δ = 0.5 (the default value), 1/δ̂ = 2n, where n is the number of flows.

Figure 1: One Copa cycle: evolution of queue length with time. Copa switches direction at change points A and B when the standing queue length estimated by RTTstanding crosses the threshold of 1/δ̂. RTTstanding is the smallest RTT in the last srtt/2 window of ACKed packets (shaded region). Feedback on current actions is delayed by 1 RTT in the network. The slope of the line is ±1/δ̂ packets per RTT.

Figure 2: Congestion window and RTT as a function of time for a Copa flow running on a 12 Mbit/s Mahimahi [25] emulated link. As predicted, the period of oscillation is ≈5 RTT and the amplitude is ≈5 packets. The emulator's scheduling policies cause irregularities in the RTT measurement, but Copa is immune to such irregularities because the cwnd evolution depends only on comparing RTTstanding to a threshold.

We prove the above assertions about the steady state using a window analysis for a simplified deterministic (D/D/1) bottleneck queue model. In Section 4 we discuss Markovian (M/M/1 and M/D/1) queues. We assume that the link rate, µ, is constant (or changes slowly compared to the RTT), and that (for simplicity) the feedback delay is constant, RTTmin ≈ RTT. This means that the queue length inferred from an ACK at time t is q(t) = w(t − RTTmin) − BDP, where w(t) is the congestion window at time t and BDP is the bandwidth-delay product. Under the constant-delay assumption, the sending rate is cwnd/RTT = cwnd/RTTmin.

First consider just one Copa sender. We show that Copa remains in the steady-state oscillations shown in Figure 1, once it starts those oscillations. In steady state, v = 1 (v starts to double only after cwnd changes in the same direction for at least 3 RTTs; in steady state, direction changes once every 2.5 RTT, hence v = 1 in steady state). When the flow reaches "change point A", its RTTstanding estimate corresponds to the minimum in the srtt/2 window of latest ACKs. The latest ACKs correspond to packets sent 1 RTT ago. At equilibrium, when the target rate, λt = 1/(δ·dq), equals the actual rate, cwnd/RTT, there are 1/δ packets in the queue. When the queue length crosses this threshold of 1/δ packets, the target rate becomes smaller than the current rate. Hence the sender begins to decrease cwnd. By the time the flow reaches "change point B", the queue length has dropped to 0 packets, since cwnd decreases by 1/δ packets per RTT, and it takes 1 RTT for the sender to learn that the queue length has dropped below target. At "change point B", the rate begins to increase again, continuing the cycle. The resulting mean queue length of the cycle, 1.25/δ, is a little higher than 1/δ because RTTstanding takes an extra srtt/2 to reach the threshold at "change point A".

When N senders, each with a different δi, share the bottleneck link, they synchronize with respect to the common delay signal. When they all have the same propagation delay, their target rates cross their actual rates at the same time, irrespective of their δi. Hence they increase/decrease their cwnd together, behaving as one sender with δ = δ̂ = (∑i 1/δi)⁻¹.
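The single-sender steady state can be reproduced qualitatively with a toy discrete-time model (one step per RTT, feedback delayed by one RTT; our own simplification of the D/D/1 analysis, not the paper's simulator). Without the srtt/2 RTTstanding window, the cycle is shorter than the 5-RTT period of Figure 1, but the queue still periodically drains to zero:

```python
# Toy discrete-time model of one Copa flow (one step per RTT, delta = 0.5).
# Link rate mu is in packets/RTT; the queue observation lags one RTT.
mu, delta = 50.0, 0.5
w = mu                        # congestion window, packets (start at BDP)
q_delayed = 0.0               # queue length observed one RTT ago
for t in range(20):
    q_now = max(w - mu, 0.0)  # q(t) = w(t) - BDP
    # target rate 1/(delta*dq), with dq = q/mu, expressed in packets/RTT
    target = float("inf") if q_delayed == 0 else mu / (delta * q_delayed)
    w += (1.0 / delta) if w <= target else -(1.0 / delta)
    q_delayed = q_now
    print(t, round(w, 1), round(q_now, 1))
# The queue cycles 0 -> 2 -> 4 -> 2 -> 0 packets (mean 1/delta = 2),
# draining every 4 steps in this simplified model.
```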

To bootstrap the above steady-state oscillation, the target rate should be either above or below the current rate of every sender for at least 1.5 RTT while each v = 1. Note that other modes of oscillation are possible, for instance when two senders oscillate about the equilibrium rate 180° out-of-phase, keeping the queue length constant. Nevertheless, small perturbations will cause the target rate to go above/below every sender's current rate, causing the steady-state oscillations described above to commence.

So far, we have assumed v = 1. In practice, we find that the velocity parameter, v, allows the system to quickly reach the target rate. Then v remains equal to 1 as the senders hover around the target.

To validate our claims empirically, we simulated a dumbbell topology with a 100 Mbit/s bottleneck link, 20 ms propagation delay, and 5 BDP of buffer in ns-2. We introduce flows one by one until 100 flows share the network. We found that the above properties held throughout the simulation. The velocity parameter v remained equal to 1 most of the time, changing only when a flow was far from the equilibrium rate. Indeed, these claims hold in most of our experiments, even when jitter is intentionally added.

We have found that this behavior breaks only under two conditions in practice: (1) when the propagation delay is much smaller than the queuing delay, and (2) when different senders have very different propagation delays and the delay synchronization weakens. These violations can cause the endpoints to incorrectly think that a competing buffer-filling flow is present (see §2.2).

Even in competitive mode, Copa offers several advantages over TCP, including better RTT fairness, better convergence to a fair rate, and loss-resilience.

Median RTTstanding: If a flow achieves a steady-state rate of ≈λ pkts/s, the median standing queuing delay, RTTstanding − RTTmin, is 1/(λδ). If the median RTTstanding were lower, Copa would increase its rate more often than it decreases, thus increasing λ. Similar reasoning holds if RTTstanding were higher. This means that Copa achieves quantifiably low delay. For instance, if each flow achieves a 1.2 Mbit/s rate in the default mode (δ = 0.5), the median equilibrium queuing delay will be 20 ms. If it achieves 12 Mbit/s, the median equilibrium queuing delay will be 2 ms. In this analysis, we neglect the variation in λ during steady-state oscillations since it is small.
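These numbers follow directly from 1/(λδ) with MTU-sized (1500-byte) packets; a quick arithmetic check (ours):

```python
# Check the worked numbers above: median queuing delay = 1/(lambda * delta),
# with lambda in MTU-sized (1500-byte) packets per second.
def median_dq_ms(rate_mbps, delta=0.5, mtu_bytes=1500):
    pkts_per_sec = rate_mbps * 1e6 / (8 * mtu_bytes)
    return 1e3 / (pkts_per_sec * delta)

print(median_dq_ms(1.2))   # 100 pkts/s  -> 20.0 ms
print(median_dq_ms(12.0))  # 1000 pkts/s -> 2.0 ms
```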

Alternate approaches to reaching equilibrium: A different approach would be to directly set the current sending rate to the target rate of 1/(δ·dq). We experimented with and analyzed this approach, but found that the system converges only under certain conditions. We proved that the system converges to a constant rate when C·∑i 1/δi < (bandwidth-delay product), where C ≈ 0.8 is a dimensionless constant. With ns-2 simulations, we found this condition to be both necessary and sufficient for convergence. Otherwise it oscillates. These oscillations can lead to severe underutilization of the network, and it is non-trivial to ensure that we always operate under the condition where convergence is guaranteed.

Moreover, convergence to a constant rate and non-zero queuing delay is not ideal for a delay-based congestion controller. If the queue never empties, flows that arrive later will over-estimate their minimum RTT and hence underestimate their queuing delay. This leads to significant unfairness. Thus, we need a scheme that approaches the equilibrium incrementally and makes small oscillations about the equilibrium to regularly drain the queues.

A natural alternative candidate to Copa's method is additive-increase/multiplicative-decrease (AIMD) when the rate is below or above the target. However, Copa's objective function seeks to keep the queue length small. If a multiplicative decrease is performed at this point, severe under-utilization occurs. Similarly, a multiplicative increase near the equilibrium point will cause a large queue length.

AIAD meets many of our requirements. It converges to the equilibrium and makes small oscillations about it such that the queue is periodically emptied, while maintaining a high link utilization (§5.1). However, if the bandwidth-delay product (BDP) is large, AIAD can take a long time to reach equilibrium. Hence we introduce a velocity parameter (§2.1) that moves the rate exponentially fast toward the equilibrium point, after which it uses AIAD.
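To see why the velocity parameter matters at scale, a rough comparison (our own back-of-the-envelope numbers, not from the paper):

```python
# Rough time-to-equilibrium: plain AIAD vs. AIAD plus velocity doubling.
# Say cwnd must grow by 1000 packets (a large-BDP path) and delta = 0.5,
# so the base step is 1/delta = 2 packets per RTT.
import math

gap, step = 1000, 2
print(gap / step)                    # plain AIAD: 500 RTTs
# With v doubling each RTT, progress after k RTTs is step*(2**k - 1),
# so (ignoring the 3-RTT warm-up before doubling starts):
print(math.ceil(math.log2(gap / step + 1)))   # ~9 RTTs
```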

4 Justification of the Copa Target Rate

This section explains the rationale for the target rate used in Copa. We model packet arrivals at a bottleneck not as deterministic arrivals as in the previous section, but as Poisson arrivals. This is a simplifying assumption, but one that is more realistic than deterministic arrivals when there are multiple flows. The key property of random packet arrivals (such as with a Poisson distribution) is that queues build up even when the bottleneck link is not fully utilized.

In general, traffic may be burstier than predicted by Poisson arrivals [28] because flows and packet transmissions can be correlated with each other. In this case, Copa over-estimates network load and responds by implicitly valuing delay more. This behavior is reasonable, as increased risk of higher delay is being met by more caution. Ultimately, our validation of the Copa algorithm is through experiments, but the modeling assumption provides a sound basis for setting a good target rate.

4.1 Objective Function and Nash Equilibrium

Consider the objective function for sender (flow) i combining both throughput and delay:

Ui = log λi − δi log ds,    (3)

where ds = dq + 1/µ is the "switch delay" (total delay minus propagation delay). The use of switch delay is for technical ease; it is nearly equal to the queuing delay. Suppose each sender attempts to maximize its own objective function. In this model, the system will be at a Nash equilibrium when no sender can increase its objective function by unilaterally changing its rate. The Nash equilibrium is the n-tuple of sending rates (λ1, ..., λn) satisfying

Ui(λ1, ..., λi, ..., λn) > Ui(λ1, ..., λi−1, x, λi+1, ..., λn)    (4)

for all senders i and any non-negative x ≠ λi.

We assume a first-order approximation of Markovian packet arrivals. The service process of the bottleneck may be random (due to cross traffic, or time-varying link rates) or deterministic (fixed-rate links, no cross traffic). As a reasonable first-order model of the random service process at the bottleneck link, we assume a Markovian service distribution and use that model to develop the Copa update rule. Assuming a deterministic service process gives similar results, offset by a factor of 2. In principle, senders could send their data not at a certain mean rate but in Markovian fashion, which would make our modeling assumption match practice. In practice, we don't, because: (1) there is natural jitter in transmissions from endpoints anyway, (2) deliberate jitter unnecessarily increases delay when there are a small number of senders, and (3) Copa's behavior is not sensitive to the assumption.

We prove the following proposition about the existence of a Nash equilibrium for Markovian packet transmissions. We then use the properties of this equilibrium to derive the Copa target rate of Eq. (1). The reason we are interested in the equilibrium property is that the rate-update rule is intended to optimize each sender's utility independently; we derive it directly from this theoretical rate at the Nash equilibrium. It is important to note that this model is being used not because it is precise, but because it is a simple and tractable approximation of reality. Our goal is to derive a principled target rate that arises as a stable point of the model, and use that to guide the rate update rule.

Lemma 1. Consider a network with n flows, with flow i sending packets at rate λi such that the arrival at the bottleneck queue is Markovian. Then, if flow i has the objective function defined by Eq. (3), and the bottleneck is an M/M/1 queue, a unique Nash equilibrium exists. Further, at this equilibrium, for every sender i,

λi = µ / (δi (1/δ̂ + 1)),    (5)

where δ̂ = (∑ 1/δi)⁻¹.

Proof. Denote the total arrival rate in the queue, ∑j λj, by λ. For an M/M/1 queue, the sum of the average wait times in the queue and the link is 1/(µ − λ). Substituting this expression into Eq. (3) and separating out the λi term, we get

Ui = log λi + δi log(µ − λi − ∑_{j≠i} λj).    (6)

Setting the partial derivative ∂Ui/∂λi to 0 for each i yields δi λi + ∑j λj = µ. The second derivative, −1/λi² − δi/(µ − λ)², is negative. Hence Eq. (4) is satisfied if, and only if, ∀i, ∂Ui/∂λi = 0. We obtain the following set of n equations, one for each sender i:

λi (1 + δi) + ∑_{j≠i} λj = µ.

The unique solution to this family of linear equations is

λi = µ / (δi (1/δ̂ + 1)),

which is the desired equilibrium rate of sender i. ∎

When the service process is assumed to be deterministic, we can model the network as an M/D/1 queue. The expected wait time in the queue is 1/(2(µ−λ)) − 1/(2µ) ≈ 1/(2(µ−λ)). An analysis similar to the above gives the equilibrium rate of sender i as λi = 2µ / (δi (2/δ̂ + 1)), which is the same as the M/M/1 case when each δi is halved. Since there is less uncertainty, senders can achieve higher rates for the same delay.

4.2 The Copa Update Rule Follows from the Equilibrium Rate

At equilibrium, the inter-send time between packets is

τi = 1/λi = δi (1/δ̂ + 1) / µ.

Each sender does not, however, need to know how many other senders there are, nor what their δi preferences are, thanks to the aggregate behavior of Markovian arrivals. The term inside the parentheses in the equation above is a proxy for the "effective" number of other senders, or equivalently the network load, and can be calculated differently.
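A quick numerical check of Lemma 1 (an illustrative verification we add here, with arbitrary µ and δi): the rates of Eq. (5) should zero out each sender's gradient ∂Ui/∂λi = 1/λi − δi/(µ − λ).

```python
# Verify Eq. (5): at the claimed equilibrium, each sender's gradient
# dUi/d(lambda_i) = 1/lambda_i - delta_i/(mu - lambda) vanishes.
mu = 100.0                       # link rate, packets/s (arbitrary)
deltas = [0.5, 0.5, 1.0]         # per-flow delta_i (arbitrary)
delta_hat = 1.0 / sum(1.0 / d for d in deltas)
rates = [mu / (d * (1.0 / delta_hat + 1.0)) for d in deltas]
lam = sum(rates)                 # aggregate rate
for d, r in zip(deltas, rates):
    print(round(1.0 / r - d / (mu - lam), 12))   # ~0.0 for every sender
print(round(lam, 6), round(mu / (1.0 + delta_hat), 6))  # matches Eq. (8)
```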

As noted earlier, the average switch delay for an M/M/1 queue is ds = 1/(µ − λ). Substituting Eq. (8) for λ in this equation, we find that, at equilibrium,

τi = δi · ds = δi (dq + 1/µ),    (7)

where ds is the switch delay (as defined earlier) and dq is the average queuing delay in the network. This calculation is the basis and inspiration for the target rate. It does not model the dynamics of Copa, where sender rates change with time. The purpose of this analysis is to determine a good target rate for senders to aim for. Nevertheless, using steady-state formulae for expected queue delay is acceptable, since the rates change slowly in steady state.

4.3 Properties of the Equilibrium

We now make some remarks about this equilibrium.

First, by adding Eq. (5) over all i, we find that the resulting aggregate rate of all senders is

λ = ∑ λj = µ/(1 + δ̂).    (8)

This also means that the equilibrium queue corresponds to 1 + 1/δ̂ packets. If δi = 0.5, the number of enqueued packets with n flows is 2n+1.

Second, it is interesting to interpret Eqs. (5) and (8) in the important special case when the δi are all the same δ. Then, λi = µ/(δ + n), which is equivalent to dividing the capacity between n senders and δ (which may be non-integral) "pseudo-senders". δ is the "gap" from fully loading the bottleneck link, which allows the average packet delay to not blow up to ∞. The portion of capacity allocated to the "pseudo-senders" is unused and determines the average queue length, which the senders can adjust by choosing any δ ∈ (0, ∞). The aggregate rate in this case is n·λi = nµ/(δ + n). When the δi are unequal, bandwidth is allocated in inverse proportion to δi. The Copa rate update rules are such that a sender with constant parameter δ is equivalent to k senders with a constant parameter kδ in steady state.

Third, we recommend a default value of δi = 0.5. While we want low delay, we also want high throughput; i.e., we want the largest δ that also achieves high throughput. A value of 1 causes one packet in the queue on average at equilibrium (i.e., when the sender transmits at the target equilibrium rate). While acceptable in theory, jitter causes packets to be imperfectly paced in practice, causing frequently empty queues and wasted transmission slots when only a single flow occupies a bottleneck, a common occurrence in our experience. Hence we choose δ = 1/2, providing headroom for packet pacing. Note that, as per the above equation modeled on an M/M/1 queue, the link would be severely underutilized when there are a small number (≤5) of senders. But with very few senders, arrivals at the queue aren't Poisson and stochastic variations don't cause the queue length to rise. Hence link utilization is nearly 100% before queues grow, as demonstrated in §5.1.

Fourth, the definition of the equilibrium point is consistent with our update rule in the sense that every sender's transmission rate equals its target rate if (and only if) the system is at the Nash equilibrium. This analysis presents a mechanism to determine the behavior of a cooperating sender: every sender observes a common delay ds and calculates a common δds (if all senders have the same δ) or its own δids. Those transmitting faster than the reciprocal of this value must reduce their rate, and those transmitting slower must increase it. If every sender behaves thus, they will all benefit.
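For instance (an illustrative calculation with arbitrary numbers), the pseudo-sender interpretation works out as follows:

```python
# The "pseudo-sender" interpretation of Eqs. (5) and (8), equal deltas.
mu, n, delta = 100.0, 4, 0.5        # arbitrary example values
per_flow = mu / (delta + n)         # 22.22 pkts/s each
aggregate = n * per_flow            # 88.89 pkts/s
headroom = mu - aggregate           # 11.11 pkts/s left to the "pseudo-sender"
print(per_flow, aggregate, headroom)
```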
5 Evaluation

To evaluate Copa and compare it with other congestion-control protocols, we use a user-space implementation and ns-2 simulations. We run the user-space implementation over both emulated and real links.

Implementations: We compare the performance of our user-space implementation of Copa with kernel implementations of TCP Cubic, Vegas, Reno, and BBR [5], and user-space implementations of Remy, PCC [6], Vivace [7], Sprout [36], and Verus [38]. We used the developers' implementations for PCC and Sprout. For Remy, we developed a user-space implementation and verified that its results matched the Remy simulator. There are many available RemyCCs; whenever we found a RemyCC that was appropriate for that network, we report its results. We use the Linux qdisc to create emulated links. Our PCC results are for the default loss-based objective function. Pantheon [37], an independent test-bed for congestion control, uses the delay-based objective function for PCC.

ns-2 simulations: We compare Copa with Cubic [13], NewReno [15], and Vegas [4], which are end-to-end protocols, and with Cubic-over-CoDel [26] and DCTCP [1], which use in-network mechanisms.

5.1 Dynamic Behavior over Emulated Links

To understand how Copa behaves as flows arrive and leave, we set up a 100 Mbit/s link with 20 ms RTT and 1 BDP of buffer using Linux qdiscs. One flow arrives every second for the first ten seconds, and one leaves every second for the next ten seconds. The mean ± standard deviation of the bandwidths obtained by the flows at each time slot are shown in Figure 3. A CDF of the Jain fairness index in various timeslots is shown in Figure 4.

Figure 3: Mean ± std. deviation of throughput of 10 flows as they enter and leave the network once a second (three panels: throughput vs. time for BBR, Cubic, and PCC, each shown alongside Copa). The black line indicates the ideal allocation. Copa and Cubic flows follow the ideal allocation closely.

Figure 4: A CDF of the Jain indices obtained at various timeslots for the dynamic behavior experiment (§5.1).

Both Copa and Cubic track the ideal rate allocation. Figure 4 shows that Copa has the highest median fairness index, with Cubic close behind. BBR and PCC respond much more slowly to changing network conditions and fail to properly allocate bandwidth. In experiments where the network changed more slowly, BBR and PCC eventually succeeded in converging to the fair allocation, but this took tens of seconds.

This experiment shows Copa's ability to quickly adapt to changing environments. Copa's mode switcher correctly functioned most of the time, detecting that no buffer-filling algorithms were active in this period. Much of the noise and unfairness observed in Copa in this experiment was due to erroneous switches to competitive mode for a few RTTs. This happens because when flows arrive or depart, they disturb Copa's steady-state operation. Hence it is possible that for a few RTTs the queue is never empty and Copa flows can switch from default to competitive mode. In this experiment, there were a few RTTs during which several flows switched to competitive mode, and their δ decreased. However, queues empty every five RTTs in this mode as well if no competing buffer-filling flow is present. This property enabled Copa to correctly revert to default mode after a few RTTs.

5.2 Real-World Evaluation

To understand how Copa performs over wide-area Internet paths with real cross traffic and packet schedulers, we submitted our user-space implementation of Copa to Pantheon [37] (http://pantheon.stanford.edu), a system developed to evaluate congestion control schemes. During our experiments, Pantheon had nodes in six countries. It creates flows using each congestion control scheme between a node and an AWS server nearest it, and measures the throughput and delay. We separate the set of experiments into two categories, depending on how the node connects to the Internet (Ethernet or cellular).

To obtain an aggregate view of performance across the dozens of experiments, we plot the average normalized throughput and average queuing delay. Throughput is normalized relative to the flow that obtained the highest throughput among all runs in an experiment, to obtain a number between 0 and 1. Pantheon reports one-way delay for every packet in publicly-accessible logs, calculated with NTP-synchronized clocks at the two end hosts. To avoid being confounded by the systematic additive delay inherent in NTP, we report the queuing delay, calculated as the difference between the delay and the minimum delay seen for that flow. Each experiment lasts 30 seconds. Half of them have one flow. The other half have three flows starting at 0, 10, and 20 seconds from the start of the experiment.

Copa's performance is consistent across different types of networks. It achieves significantly lower queueing delays than most other schemes, with only a small throughput reduction. Copa's low delay, loss insensitivity, RTT fairness, resistance to buffer-bloat, and fast convergence enable resilience in a wide variety of network settings.

Vivace LTE and Vivace latency achieved excessive delays in a link between AWS São Paulo and a node in Colombia, sometimes over 10 seconds. When all runs with >2000 ms delay are removed for Vivace latency and Vivace LTE, they obtain average queuing delays of 156 ms and 240 ms respectively, still significantly higher than Copa. The Remy method used was trained for a 100× range of link rates. PCC uses its delay-based objective function here.
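The aggregation just described is straightforward; a sketch (our own rendering of the stated metric definitions, assuming per-run logs of throughputs and one-way delays):

```python
# Aggregate Pantheon-style metrics as described above (illustrative).
def normalized_throughputs(tputs_by_scheme):
    """Normalize each scheme's throughput by the best in the experiment."""
    best = max(tputs_by_scheme.values())
    return {s: t / best for s, t in tputs_by_scheme.items()}

def queuing_delays(one_way_delays_ms):
    """Queuing delay = per-packet delay minus the flow's minimum delay,
    cancelling NTP's systematic additive offset."""
    base = min(one_way_delays_ms)
    return [d - base for d in one_way_delays_ms]

print(normalized_throughputs({"copa": 9.0, "cubic": 10.0}))  # cubic -> 1.0
print(queuing_delays([52.0, 55.0, 50.0, 61.0]))              # [2, 5, 0, 11]
```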

Figure 5: Real-world experiments on Pantheon paths: average normalized throughput vs. queuing delay achieved by various congestion control algorithms under two different types of Internet connections (cellular and wired). Each type is averaged over several runs over 6 Internet paths. Note the very different axis ranges in the two graphs. The x-axis is flipped and in log scale. Copa achieves consistently low queueing delay and high throughput in both types of networks. Note that schemes such as Sprout, Verus, and Vivace LTE are designed specifically for cellular networks. Other schemes that do well in one type of network don't do well on the other type. On wired Ethernet paths, Copa's delays are 10× lower than BBR and Cubic, with only a modest mean throughput reduction.

5.3 RTT-fairness

Flows sharing the same bottleneck link often have different propagation delays. Ideally, they should get identical throughput, but many algorithms exhibit significant RTT unfairness, disadvantaging flows with larger RTTs. To evaluate the RTT fairness of various algorithms, we set up 20 long-running flows in ns-2 with propagation delays evenly spaced between 15 ms and 300 ms. The link has a bandwidth of 100 Mbit/s and 1 BDP of buffer (calculated with the 300 ms delay). The experiment runs for 100 seconds. We plot the throughput obtained by each of the flows in Figure 6.

Figure 6: RTT-fairness of various schemes: throughput of 20 long-running flows sharing a 100 Mbit/s bottleneck link versus their respective propagation delays. "Copa D" is Copa in the default mode without mode switching. "Copa" is the full algorithm. "Ideal" shows the fair allocation, which "Copa D" matches. Notice the log scale on the y-axis. Schemes other than Copa allocate little bandwidth to flows with large RTTs.

Copa's property that the queue is nearly empty once in every five RTTs is violated when such a diversity of propagation delays is present. Hence Copa's mode-switching algorithm erroneously shifts to competitive mode, causing Copa with mode switching (labeled "Copa" in the figure) to inherit AIMD's RTT unfriendliness. However, because the AIMD is on 1/δ, while the underlying delay-sensitive algorithm robustly grabs or relinquishes bandwidth to make the allocation proportional to 1/δ, Copa's RTT-unfriendliness is much milder than in the other schemes.

We also run Copa after turning off the mode switching, running it in the default mode (δ = 0.5), denoted as "Copa D" in the figure. Because the senders share a common queuing delay and a common target rate, under identical conditions they will make identical decisions to increase/decrease their rate, but with a time shift. This approach removes any RTT bias, as shown by "Copa D".

In principle, Cubic has a window evolution that is RTT-independent, but in practice it exhibits significant RTT-unfairness because low-RTT Cubic senders are slow to relinquish bandwidth. The presence of the CoDel AQM improves the situation, but significant unfairness remains.

Vegas is unfair because several flows have incorrect base RTT estimates, as the queue rarely drains. Schemes other than Copa allocate nearly no bandwidth to long-RTT flows (note the log scale), a problem that Copa solves.

Figure 7: Performance of various schemes in the presence of stochastic packet loss over a 12 Mbit/s link with a 50 ms RTT (throughput vs. packet loss rate for Copa, BBR, PCC, Reno, Cubic, and Vegas).

Figure 8: Flow completion times achieved by various schemes in a datacenter environment (flow completion time vs. flow length for Copa, DCTCP, Vegas, and NewReno). Note the log scale.

5.4 Robustness to Packet Loss

To meet the expectations of loss-based congestion control schemes, lower layers of modern networks attempt to hide packet losses by implementing extensive reliability mechanisms. These often lead to excessively high and variable link-layer delays, as in many cellular networks. Loss is also sometimes blamed for the poor performance of congestion control schemes across trans-continental links (we have confirmed this with measurements, e.g., between AWS in Europe and non-AWS nodes in the US). Ideally, a 5% non-congestive packet loss rate should decrease the throughput by 5%, not by 5×. Since TCP requires smaller loss rates for larger window sizes, loss resilience becomes more important as network bandwidth rises.

Copa in default mode does not use loss as a congestion signal, and lost packets only impact Copa to the extent that they occupy wasted transmission slots in the congestion window. In the presence of high packet loss, Copa's mode switcher would switch to default mode, as any competing traditional TCPs will back off. Hence Copa should be largely insensitive to stochastic loss, while still performing sound congestion control.

To test this hypothesis, we set up an emulated link with a rate of 12 Mbit/s and an RTT of 50 ms. We vary the stochastic packet loss rate and plot the throughput obtained by various algorithms. Each flow runs for 60 seconds.

Figure 7 shows the results. Copa and BBR remain insensitive to loss throughout the range, validating our hypothesis. As predicted [22], NewReno, Cubic, and Vegas decline in throughput with increasing loss rate. PCC ignores loss rates up to ≈5%, and so maintains throughput until then, before falling off sharply as determined by its sigmoid loss function.

5.5 Simulated Datacenter Network

To test how widely beneficial the ideas in Copa might be, we consider datacenter networks, which have radically different properties than wide-area networks. Many congestion-control algorithms for datacenters exploit the fact that one entity owns and controls the entire network, which makes it easier to incorporate in-network support [1, 3, 24, 29, 12]. DCTCP [1] and Timely [23], for example, aim to detect and respond to congestion before it builds up. DCTCP uses routers to mark ECN in packet headers when the queue length exceeds a pre-set threshold. Timely demonstrates that modern commodity hardware is precise enough to accurately measure RTT and pace packets. Hence it uses delay as a fine-grained signal for monitoring congestion. Copa is similar in its tendency to move toward a target rate in response to congestion.

Exploiting the datacenter's controlled environment, we make three small changes to the algorithm: (1) the propagation delay is externally provided, (2) since it is not necessary to compete with TCPs, we disable the mode switching and always operate in the default mode with δ = 0.5, and (3) since network jitter is absent, we use the latest RTT instead of RTTstanding, which also enables faster convergence. For computing v, the congestion window is considered to change in a given direction only if >2/3 of the ACKs cause motion in that direction.
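These three changes amount to a small configuration delta over the sketch in §2.1 (our own illustrative rendering; CopaSender is the hypothetical class from that sketch):

```python
class DatacenterCopa(CopaSender):
    """Sketch of the datacenter variant described above (illustrative)."""

    def __init__(self, prop_delay):
        super().__init__(delta=0.5)    # (2) fixed default mode, no switching
        self.rtt_min = prop_delay      # (1) propagation delay given externally
        self.up_acks = self.down_acks = 0

    def on_ack(self, latest_rtt):
        # (3) no jitter: use the latest RTT sample instead of RTTstanding
        super().on_ack(rtt_standing=latest_rtt,
                       rtt_min=self.rtt_min, srtt=latest_rtt)
        if self.cwnd > self.cwnd_at_window_start:
            self.up_acks += 1
        else:
            self.down_acks += 1

    def direction_for_v(self):
        # count a direction change only if >2/3 of the ACKs moved that way
        total = self.up_acks + self.down_acks
        if total and self.up_acks > 2 * total / 3:
            return "up"
        if total and self.down_acks > 2 * total / 3:
            return "down"
        return None
```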

We simulate 32 senders connected to a 40 Gbit/s bottleneck link via 10 Gbit/s links. The routers have 600 Kbytes of buffer and each flow has a propagation delay of 12 µs. We use an on-off workload with flow lengths drawn from a web-search workload in the datacenter [1]. Off times are exponentially distributed with mean 200 ms. We compare Copa to DCTCP, Vegas, and NewReno.

The average flow completion times (FCT) are plotted against the length of the flow in Figure 8, with the y-axis shown on a log scale. Because of its tendency to maintain short queues and robustly converge to equilibrium, Copa offers a significant reduction in flow-completion time for short flows. The FCT of Copa is between a factor of 3 and 16 better for small flows under 64 kbytes compared to DCTCP. For longer flows, the benefits are modest, and in many cases other schemes perform a little better in the datacenter setting. This result suggests that Copa is a good solution for datacenter network workloads involving short flows.

We also implemented TIMELY [23], but it did not perform well in this setting (over 7 times worse than Copa on average), possibly because TIMELY is targeted at getting high throughput and low delay for long flows. TIMELY requires several parameters to be set; we communicated with the developers and used their recommended parameters, but the difference between our workload and their RDMA experiments could explain the discrepancies; because we are not certain, we do not report those results in the graph.

5.6 Emulated Satellite Links

We evaluate Copa on an emulated satellite link using measurements from the WINDS satellite system [27], replicating an experiment from the PCC paper [6]. The link has 42 Mbit/s capacity, 800 ms RTT, 1 BDP of buffer, and a 0.74% stochastic loss rate, on which we run 2 concurrent senders for 100 seconds. This link is challenging because it has a high bandwidth-delay product and some stochastic loss.

Figure 9: Throughput vs. delay plot for a satellite link. Notice that algorithms that are not very loss-sensitive (including PCC, which ignores small loss rates) all do well on throughput, but the delay-sensitive ones get substantially lower delay as well. Note the log scale.

Figure 9 shows the throughput vs. delay plot for BBR, PCC, Remy, Cubic, Vegas, and Copa. Here we use a RemyCC trained for an RTT range of 30-280 ms for 2 senders with exponential on-off traffic of 1 second, each over a link speed of 33 Mbit/s, which, surprisingly, worked well in this case and was the best performer among the ones available in the Remy repository. PCC obtained high throughput, but at the cost of high delay, as it tends to fill the buffer. BBR ignores loss, but still underutilized the link as its rate oscillated wildly between 0 and over 42 Mbit/s due to the high BDP. These oscillations also caused high delays. Copa is insensitive to loss and can scale to large BDPs due to its exponential rate update. Both Cubic and Vegas are sensitive to loss and hence lose throughput.

5.7 Co-existence with Buffer-Filling Schemes

A major concern is whether current TCP algorithms will simply overwhelm the delay-sensitivity embedded in Copa. We ask: (1) how does Copa affect existing TCP flows?, and (2) do Copa flows get their fair share of bandwidth when competing with TCP (i.e., how well does mode-switching work)?

We experiment on several emulated networks. We randomly sample throughput between 1 and 50 Mbit/s, RTT between 2 and 100 ms, buffer size between 0.5 and 5 BDP, and run 1-4 Cubic senders and 1-4 senders of the congestion control algorithm being evaluated. The flows run concurrently for 10 seconds. We report the average of the ratio of the throughput achieved by each flow to its ideal fair share, for both the algorithm being tested and Cubic. To set a baseline for variations within Cubic, we also report numbers for Cubic, treating one set of Cubic flows as "different" from another.

Figure 10 shows the results. Even when competing with other Cubic flows, Cubic is unable to fully utilize the network. Copa takes this unused capacity to achieve greater throughput without hurting Cubic flows. In fact, Cubic flows competing with Copa get a higher throughput than when competing with other Cubic flows (by a statistically insignificant margin). Currently, Copa in competitive mode performs AIMD on 1/δ. Modifying this to more closely match Cubic's behavior would help reduce the standard deviation. PCC gets a much higher share of throughput because its loss-based objective function ignores losses until about 5% and optimizes throughput. BBR gets higher throughput while significantly hurting competing Cubic flows.

Figure 10: Throughput of different schemes versus Cubic, shown by plotting the mean and standard deviation of the ratio of each flow's throughput to the ideal fair rate. The mean is over several runs of randomly sampled networks. The left and right bars show the value for the scheme being tested and Cubic respectively. Copa is much fairer than BBR and PCC to Cubic. It also uses bandwidth that Cubic does not utilize to get higher throughput without hurting Cubic.

6 Related Work

Delay-based schemes like CARD [17], DUAL [33], and Vegas [4] were viewed as ineffective by much of the community for several years, but underwent a revival in the 2000s with FAST [34] and especially Microsoft's Compound TCP [32].

Recently, delay-based control has been used in datacenters by DX [19] and TIMELY [23]. Vegas and FAST share some equilibrium properties with Copa in the sense that they all seek to maintain queue length in proportion to the number of flows. Vegas seeks to maintain between 3 and 6 packets per flow in the queue, and doesn't change its rate if this target is met. Copa's tendency to always change its rate ensures that the queue is periodically empty. This approach has two advantages: (1) every sender gets the correct estimate of minimum RTT, which helps ensure fairness, and (2) Copa can quickly detect the presence of a competing buffer-filling flow and change its aggressiveness accordingly. Further, Copa adapts its rate exponentially, allowing it to scale to large-BDP networks. Vegas and FAST increase their rate as a linear function of time.

Network utility maximization (NUM) [18] approaches turn utility maximization problems into rate allocations and vice versa. FAST [34] derives its equilibrium properties from utility maximization to propose an end-to-end congestion control mechanism. Other schemes [14, 24] use the NUM framework to develop algorithms that use in-network support.

BBR [5] uses bandwidth estimation to operate near the optimal point of full bandwidth utilization and low delay. Although very different from Copa in mechanism, BBR shares some desirable properties with Copa, such as loss insensitivity, better RTT fairness, and resilience to bufferbloat. Experiments (§5.2) show that Copa achieves significantly lower delay and slightly less throughput than BBR. There are three reasons for this. First, the default choice of δ = 0.5, intended for interactive applications, encourages Copa to trade a little throughput for a significant reduction in delay. Applications can choose a smaller value of δ to get more throughput, such as δ = 0.5/6, emulating 6 ordinary Copa flows, analogous to how some applications open 6 TCP connections today. Second, BBR tries to be TCP-compatible within its one mechanism. This forced BBR's designers to choose more aggressive parameters, causing longer queues even when competing TCP flows are not present [5]. Copa's use of two different modes with explicit switching allowed us to choose more conservative parameters in the absence of competing flows. Third, both BBR and Copa seek to empty their queues periodically to correctly estimate the propagation delay. BBR uses a separate mechanism with a 10-second cycle, while Copa drains once every 5 RTTs within its AIAD mechanism. As shown in our evaluation in §5.1 and §5.5, Copa is able to adapt more rapidly to changing network conditions. It is also able to handle networks with large-BDP paths better than BBR (§5.6).

7 Conclusion

We described the design and evaluation of Copa, a practical delay-based congestion control algorithm for the Internet. The idea is to increase or decrease the congestion window depending on whether the current rate is lower or higher than a well-defined target rate, 1/(δ·dq), where dq is the (measured) queueing delay. We showed how this target rate optimizes a natural function of throughput and delay. Copa uses a simple update rule to adjust the congestion window in the direction of the target rate, converging quickly to the correct fair rates even in the face of significant flow churn.

These two ideas enable Copa flows to maintain high utilization with low queuing delay (on average, 1.25/δ packets per flow in the queue). However, when the bottleneck is shared with buffer-filling flows like Cubic or NewReno, Copa, like other delay-sensitive schemes, has low throughput. To combat this problem, a Copa sender detects the presence of buffer-fillers by observing the delay evolution, and then responds with AIMD on the δ parameter to compete well with these schemes.

Acknowledgments

We thank Greg Lauer, Steve Zabele, and Mark Keaton for implementing Copa on BBN's DARPA-funded IRON project and for sharing experimental results, which helped improve Copa. We are grateful to Karthik Gopalakrishnan and Hamsa Balakrishnan for analyzing an initial version of Copa. We thank the anonymous reviewers and our shepherd Dongsu Han for many useful comments. Finally, we would like to thank members of the NMS group at CSAIL, MIT for many interesting and useful discussions. Our work was funded in part by DARPA's EdgeCT program (award 024890-00001) and NSF grant 1407470.

funded in part by DARPA's EdgeCT program (award 024890-00001) and NSF grant 1407470.

References

[1] M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhye, P. Patel, B. Prabhakar, S. Sengupta, and M. Sridharan. Data Center TCP (DCTCP). In SIGCOMM, 2010.
[2] M. Alizadeh, A. Kabbani, T. Edsall, B. Prabhakar, A. Vahdat, and M. Yasuda. Less is More: Trading a Little Bandwidth for Ultra-low Latency in the Data Center. In NSDI, 2012.
[3] M. Alizadeh, S. Yang, M. Sharif, S. Katti, N. McKeown, B. Prabhakar, and S. Shenker. pFabric: Minimal Near-Optimal Datacenter Transport. In SIGCOMM, 2013.
[4] L. S. Brakmo, S. W. O'Malley, and L. L. Peterson. TCP Vegas: New Techniques for Congestion Detection and Avoidance. In SIGCOMM, 1994.
[5] N. Cardwell, Y. Cheng, C. S. Gunn, S. H. Yeganeh, and V. Jacobson. BBR: Congestion-Based Congestion Control. ACM Queue, 14(5):50, 2016.
[6] M. Dong, Q. Li, D. Zarchy, P. B. Godfrey, and M. Schapira. PCC: Re-architecting Congestion Control for Consistent High Performance. In NSDI, 2015.
[7] M. Dong, T. Meng, D. Zarchy, E. Arslan, Y. Gilad, B. Godfrey, and M. Schapira. Vivace: Online-Learning Congestion Control. In NSDI, 2018.
[8] J. Eriksson, H. Balakrishnan, and S. Madden. Cabernet: Vehicular Content Delivery using WiFi. In MobiCom, 2008.
[9] T. Flach, N. Dukkipati, A. Terzis, B. Raghavan, N. Cardwell, Y. Cheng, A. Jain, S. Hao, E. Katz-Bassett, and R. Govindan. Reducing Web Latency: The Virtue of Gentle Aggression. In SIGCOMM, 2013.
[10] M. Ghobadi, Y. Cheng, A. Jain, and M. Mathis. Trickle: Rate Limiting Video Streaming. In USENIX ATC, 2012.
[11] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A Scalable and Flexible Data Center Network. In SIGCOMM, 2009.
[12] M. P. Grosvenor, M. Schwarzkopf, I. Gog, R. N. Watson, A. W. Moore, S. Hand, and J. Crowcroft. Queues Don't Matter When You Can Jump Them! In NSDI, 2015.
[13] S. Ha, I. Rhee, and L. Xu. CUBIC: A New TCP-Friendly High-Speed TCP Variant. ACM SIGOPS Operating Systems Review, 42(5):64–74, July 2008.
[14] D. Han, R. Grandl, A. Akella, and S. Seshan. FCP: A Flexible Transport Framework for Accommodating Diversity. In SIGCOMM, 2013.
[15] J. C. Hoe. Improving the Start-up Behavior of a Congestion Control Scheme for TCP. In SIGCOMM, 1996.
[16] V. Jacobson. Congestion Avoidance and Control. In SIGCOMM, 1988.
[17] R. Jain. A Delay-based Approach for Congestion Avoidance in Interconnected Heterogeneous Computer Networks. In SIGCOMM, 1989.
[18] F. P. Kelly, A. Maulloo, and D. Tan. Rate Control in Communication Networks: Shadow Prices, Proportional Fairness and Stability. Journal of the Operational Research Society, 49:237–252, 1998.
[19] C. Lee, C. Park, K. Jang, S. Moon, and D. Han. Accurate Latency-based Congestion Feedback for Datacenters. In USENIX ATC, 2015.
[20] Z. Li, X. Zhu, J. Gahm, R. Pan, H. Hu, A. Begen, and D. Oran. Probe and Adapt: Rate Adaptation for HTTP Video Streaming at Scale. IEEE JSAC, 32(4):719–733, 2014.
[21] R. Mahajan, J. Padhye, S. Agarwal, and B. Zill. High Performance Vehicular Connectivity with Opportunistic Erasure Coding. In USENIX ATC, 2012.
[22] M. Mathis, J. Semke, J. Mahdavi, and T. Ott. The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm. ACM SIGCOMM Computer Communication Review, 27(3):67–82, 1997.
[23] R. Mittal, N. Dukkipati, E. Blem, H. Wassel, M. Ghobadi, A. Vahdat, Y. Wang, D. Wetherall, D. Zats, et al. TIMELY: RTT-based Congestion Control for the Datacenter. In SIGCOMM, 2015.
[24] K. Nagaraj, D. Bharadia, H. Mao, S. Chinchali, M. Alizadeh, and S. Katti. NUMFabric: Fast and Flexible Bandwidth Allocation in Datacenters. In SIGCOMM, 2016.
[25] R. Netravali, A. Sivaraman, S. Das, A. Goyal, K. Winstein, J. Mickens, and H. Balakrishnan. Mahimahi: Accurate Record-and-Replay for HTTP. In USENIX ATC, 2015.
[26] K. Nichols and V. Jacobson. Controlling Queue Delay. ACM Queue, 10(5), May 2012.
[27] H. Obata, K. Tamehiro, and K. Ishida. Experimental Evaluation of TCP-STAR for Satellite Internet over WINDS. In International Symposium on Autonomous Decentralized Systems (ISADS), 2011.
[28] V. Paxson and S. Floyd. Wide Area Traffic: The Failure of Poisson Modeling. IEEE/ACM Transactions on Networking, 3(3):226–244, 1995.
[29] J. Perry, A. Ousterhout, H. Balakrishnan, D. Shah, and H. Fugal. Fastpass: A Centralized "Zero-Queue" Datacenter Network. In SIGCOMM, 2014.
[30] A. Sivaraman, K. Winstein, P. Thaker, and H. Balakrishnan. An Experimental Study of the Learnability of Congestion Control. In SIGCOMM, 2014.
[31] M. Sridharan, K. Tan, D. Bansal, and D. Thaler. Compound TCP: A New TCP Congestion Control for High-Speed and Long Distance Networks. Internet-Draft draft-sridharan-tcpm-ctcp-02, 2008.
[32] K. Tan, J. Song, Q. Zhang, and M. Sridharan. A Compound TCP Approach for High-speed and Long Distance Networks. In INFOCOM, 2006.
[33] Z. Wang and J. Crowcroft. A New Congestion Control Scheme: Slow Start and Search (Tri-S). In SIGCOMM, 1991.
[34] D. Wei, C. Jin, S. Low, and S. Hegde. FAST TCP: Motivation, Architecture, Algorithms, Performance. IEEE/ACM Transactions on Networking, 14(6):1246–1259, 2006.
[35] K. Winstein and H. Balakrishnan. TCP ex Machina: Computer-Generated Congestion Control. In SIGCOMM, 2013.
[36] K. Winstein, A. Sivaraman, and H. Balakrishnan. Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks. In NSDI, 2013.
[37] F. Y. Yan, J. Ma, G. Hill, D. Raghavan, R. S. Wahby, P. Levis, and K. Winstein. Pantheon: The Training Ground for Internet Congestion-Control Research. http://pantheon.stanford.edu/.
[38] Y. Zaki, T. Pötsch, J. Chen, L. Subramanian, and C. Görg. Adaptive Congestion Control for Unpredictable Cellular Networks. In SIGCOMM, 2015.
[39] L. Zhang, S. Shenker, and D. D. Clark. Observations on the Dynamics of a Congestion Control Algorithm: The Effects of Two-Way Traffic. In SIGCOMM, 1991.

A Application-Layer Benefits

Many applications benefit from accurate information about path throughput and delay. For example, there has recently been a surge of interest in video streaming, where one of the primary challenges is estimating the correct bitrate to use. A low estimate hurts video quality, while a high estimate risks a stall in playback. Most bitrate-adaptation algorithms tend to under-estimate rates because stalls hurt the quality of experience more; as a result, they are unable to use the full available path rate.

We showed how every measurement of the queuing delay provides a new estimate of the target rate. Hence, to understand what throughput and delay can be expected from a path, an endpoint only needs to transmit a few packets. The expected performance can be calculated from the measured RTT and queuing delay. These packets can be small, containing only a header and no data, which reduces the bandwidth consumed by probes.

Applications can use this information in many ways. Copa offers a way for applications to obtain rate information: senders can use the techniques we have developed to measure the "expected throughput" (the rate that a Copa sender would use) by sending only a few small packets, and make an informed decision about what quality of content to transfer. As shown in §5.1, Copa's rate estimates are accurate, and senders are able to jump directly to the correct rate.

Quick estimation of a transport protocol's expected transmission rate is also useful for selecting good paths or endpoints. For instance, peer-to-peer networks can regularly send tiny packets without payload to monitor the throughput and delay available on the link to each peer. The monitoring is inexpensive, but enables more informed decisions. Content Distribution Networks using Copa for data delivery can use this too, by routing requests to the appropriate servers: for instance, they can route to minimize the flow-completion time, estimated as 2·RTT + l/λ, where l is the flow length and λ is the rate estimate.
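To make this concrete, the hypothetical helper below shows how an endpoint could turn a handful of probe RTT samples into Copa's expected rate 1/(δ·dq) and the flow-completion-time estimate 2·RTT + l/λ described above. The function names, the probe data, and the use of the latest sample as the current RTT are our illustrative assumptions.

```python
def expected_rate(rtt_samples, delta=0.5):
    """Estimate Copa's target rate 1/(delta*dq) from probe RTTs (seconds).

    The minimum sample approximates the propagation delay; dq is floored
    so a queue-free path yields a large but finite rate estimate.
    """
    rtt_min = min(rtt_samples)
    dq = max(rtt_samples[-1] - rtt_min, 1e-6)  # measured queuing delay
    return 1.0 / (delta * dq)                  # packets per second

def estimated_fct(flow_len_pkts, rtt, rate_pps):
    """Flow-completion-time estimate 2*RTT + l/lambda used for routing."""
    return 2.0 * rtt + flow_len_pkts / rate_pps

# Example: route a 1000-packet flow to the server with the lowest
# estimated completion time (probe RTTs here are made up).
probes = {
    "server_a": [0.050, 0.052, 0.051],
    "server_b": [0.030, 0.045, 0.060],
}
best = min(probes, key=lambda s: estimated_fct(
    1000, min(probes[s]), expected_rate(probes[s])))
```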
