When to use and when not to use BBR: An empirical analysis and evaluation study

Yi Cao, Arpit Jain, Kriti Sharma, Aruna Balasubramanian, and Anshul Gandhi
Stony Brook University

ABSTRACT
This short paper presents a detailed empirical study of BBR's performance under different real-world and emulated testbeds across a range of network operating conditions. Our empirical results help to identify network conditions under which BBR outperforms, in terms of goodput, contemporary TCP congestion control algorithms. We find that BBR is well suited for networks with shallow buffers, despite its high retransmissions, whereas existing loss-based algorithms are better suited for deep buffers.
To identify the root causes of BBR's limitations, we carefully analyze our empirical results. Our analysis reveals that, contrary to BBR's design goal, BBR often exhibits large queue sizes. Further, the regimes where BBR performs well are often the same regimes where BBR is unfair to competing flows. Finally, we demonstrate the existence of a loss rate "cliff point" beyond which BBR's goodput drops abruptly. Our empirical investigation identifies the likely culprits in each of these cases as specific design options in BBR's source code.

CCS CONCEPTS
• Networks → Transport protocols; Network performance analysis; Network measurement.

ACM Reference Format:
Yi Cao, Arpit Jain, Kriti Sharma, Aruna Balasubramanian, and Anshul Gandhi. 2019. When to use and when not to use BBR: An empirical analysis and evaluation study. In Internet Measurement Conference (IMC '19), October 21–23, 2019, Amsterdam, Netherlands. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3355369.3355579

1 INTRODUCTION
TCP congestion control algorithms have continued to evolve for more than 30 years [36]. As the internet becomes more and more complex, researchers have designed different TCP congestion control algorithms to serve different scenarios. For example, the legacy TCP Reno [25] uses an AIMD (additive-increase/multiplicative-decrease) algorithm to quickly respond to losses while slowly recovering from congestion. TCP Cubic [28] responds more aggressively to recover from losses — rather than a linear increase, TCP Cubic grows its congestion window in a cubic manner. Instead of using packet loss as a congestion signal, TCP Vegas [19] treats increasing round trip time (RTT) as evidence of congestion. To achieve high throughput and low latency in data centers, Alizadeh et al. implemented DCTCP [16], which uses explicit congestion notification (ECN) [35] as the congestion signal to prevent packet loss from occurring before the buffer becomes too congested.
A fundamental issue in designing a congestion control algorithm is: what is the optimal operating point for congestion control? Should we keep sending packets until the buffer becomes full and use packet loss as the congestion signal (e.g., Reno, Cubic), should we treat packet delay as the congestion evidence (e.g., Vegas, Copa [17, 19]), or should we implement sophisticated algorithms via learning-based techniques (e.g., PCC, Indigo [24, 38])?
In 1979, Kleinrock showed that the optimal operating point for the network is where throughput is maximized while delay is minimized [33]. However, it was not until 2016 that this design point was explicitly used for congestion control. Google's BBR (Bottleneck Bandwidth and Round-trip propagation time) algorithm aims to operate at this optimal point by probing the current bandwidth and delay sequentially in the network [20], as we discuss in Section 2. BBR has since been employed at Google, and continues to be actively developed. To avoid bufferbloat [27], BBR regulates its congestion window size such that the amount of in-flight packets is a multiple of the bandwidth-delay product (BDP); ideally, this should result in small buffer sizes.
Despite the rising popularity of BBR, it is not fully clear when BBR should be employed in practice, that is, when does BBR outperform other congestion control algorithms. Prior work has typically focused on BBR's fairness properties [34, 37] and its throughput and queueing delay [30]; we discuss prior work on BBR in detail in Section 5.
The goal of this short paper is to conduct a comprehensive empirical study to investigate BBR's performance under different network conditions and determine when to employ BBR. In doing so, we also aim to identify the root causes of BBR's sub-optimal performance. To this end, we conduct extensive experiments in both Mininet [6] and real-world networks to analyze the performance of BBR. To span the range of network operating conditions, we vary the bandwidth, RTT, and bottleneck buffer sizes by employing a router between the client and server machines.

Figure 1: BBR congestion control algorithm design. (a) BBR high level design; (b) state machine.

For each scenario, we contrast BBR's performance with that of Cubic [28], which is the default TCP variant on Linux and Mac OS, to determine the operating conditions under which BBR is helpful.
We synthesize the results of our 640 different experiments in the form of a decision tree highlighting the choice between BBR and Cubic, to aid practitioners. In general, we find that when the bottleneck buffer size is much smaller than BDP, BBR can achieve 200% higher goodput than Cubic. However, in the case of deep buffers, Cubic can improve goodput by 30% compared to BBR; here, buffer refers to the bottleneck buffer size. While we find that BBR achieves high goodput in shallow buffers, we observe that BBR's packet loss can be several orders of magnitude higher than that of Cubic when the buffers are shallow.
Our analysis of BBR's source code reveals that the high packet loss under shallow buffers is because of BBR's configuration parameters that maintain 2× BDP of data in flight. Decreasing the 2× multiplier or increasing the bottleneck buffer size significantly lowers the packet losses.
Our empirical results also suggest the existence of a "cliff point" in loss rate for BBR above which BBR's goodput decreases significantly. We find that, empirically, this cliff point is at around a 20% loss rate. By modifying the BBR parameters, we find that the cliff point is largely dictated by the maximum pacing_gain value BBR uses when probing for more bandwidth. Interestingly, we find that BBR exhibits the highest amount of packet retransmissions at this cliff point.
Finally, we investigate the behavior of BBR in the presence of other flows. We find that the goodput share of BBR primarily depends on the bottleneck buffer size — BBR utilizes more bandwidth when the buffers are shallow, despite the high losses, whereas Cubic does better when the buffers are deep.

2 BACKGROUND ON BBR
Overview of BBR's design: We illustrate the design of BBR via Figure 1(a). BBR periodically obtains network information via measurements, including bandwidth, RTT, and loss rate. BBR then models the bandwidth by using a max filter (the maximum value of the observed bandwidth in the last few RTTs), BtlBw, and the network delay by using a min filter, RTprop. BBR works according to a state machine which decides BBR's next state, as shown in Figure 1(b). The BtlBw and RTprop values are treated as input to this state machine.
Based on the current state, BBR calculates the pacing_gain (a dynamic gain factor used to scale BtlBw) and cwnd_gain (a dynamic gain factor used to scale BDP), and uses these values to derive the pacing_rate (which controls the inter-packet spacing) and the congestion window size, cwnd, respectively. BBR then regulates the pacing_rate between 1.25 × BtlBw and 0.75 × BtlBw to explore the achievable bandwidth and to drain the subsequently inflated queues. Finally, BBR sends cwnd packets at the inter-packet speed of pacing_rate. The algorithm continues iteratively with the next round of network measurements. BBR transitions between different states of the state machine based on the observed BtlBw, RTprop, amount of packets in flight, etc. BBR periodically enters the ProbeRTT state to reduce its cwnd and drain the queue to reset itself.
BBR vs. other congestion control algorithms: BBR differs from other major congestion control algorithms in the following aspects:
1) BBR does not explicitly respond to losses. Reno and Cubic regard packet loss as a congestion event, and subsequently reduce their cwnd value by a certain factor. Similarly, the delay-based algorithm TCP Vegas decreases its cwnd when observing increasing RTT. BBR, however, does not use explicit congestion signals to reduce cwnd. Rather, BBR decides the amount of packets to be sent based on past bandwidth and RTT measurements. Thus, in contrast to other event-driven algorithms, BBR is feedback driven.
2) BBR uses pacing_rate as the primary controller. Most congestion control algorithms, like Reno and Cubic, use cwnd to determine the number of packets in flight. However, cwnd does not directly control the sending rate, resulting in traffic bursts or an idle network [21]. To solve this issue, BBR uses the pacing rate to control the inter-packet spacing. Implementing the pacing rate in TCP congestion control algorithms is known to have benefits for throughput and fairness [15].
3) BBR actively avoids congestion, whereas loss-based algorithms passively decrease their sending rate in response to congestion. BBR is designed to have low latency and high throughput by maintaining (typically) 2× BDP packets in flight. One BDP is budgeted for the network capacity, and the other is to deal with delayed/aggregated ACKs [20]. BBR thus avoids congestion by limiting the number of packets in flight. By contrast, Reno and Cubic keep increasing the packets in flight until the bottleneck buffer is full and a packet loss is detected. This is problematic when the bottleneck buffer is deep, in which case Reno and Cubic queue up too many packets in the buffer, causing bufferbloat [27].
While in principle BBR should outperform other TCP variants due to the above design decisions, a detailed empirical study is necessary to evaluate the performance of BBR.
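For concreteness, the following Python sketch mimics the estimator/controller split described above: a windowed max filter for BtlBw, a windowed min filter for RTprop, and gain-scaled pacing_rate and cwnd. This is an illustration of the mechanism, not the Linux implementation; the class name, window lengths, and units are our own assumptions.

```python
from collections import deque

class BBRModelSketch:
    """Windowed max/min filters and gain-based control, per Section 2."""

    def __init__(self, btlbw_window=10, rtprop_window=100):
        # BtlBw: max filter over bandwidth samples from the last few RTTs.
        self.bw_samples = deque(maxlen=btlbw_window)
        # RTprop: min filter over recently observed RTTs.
        self.rtt_samples = deque(maxlen=rtprop_window)

    def on_ack(self, delivery_rate_bps, rtt_s):
        """Feed one measurement (delivery rate in bit/s, RTT in seconds)."""
        self.bw_samples.append(delivery_rate_bps)
        self.rtt_samples.append(rtt_s)

    def control(self, pacing_gain=1.25, cwnd_gain=2.0, mss_bytes=1500):
        """Derive pacing_rate and cwnd from the current path model."""
        btlbw = max(self.bw_samples)        # bottleneck bandwidth estimate
        rtprop = min(self.rtt_samples)      # round-trip propagation estimate
        bdp_bytes = btlbw / 8 * rtprop      # bandwidth-delay product
        pacing_rate = pacing_gain * btlbw   # bit/s; 1.25 probes, 0.75 drains
        cwnd_pkts = cwnd_gain * bdp_bytes / mss_bytes  # typically 2x BDP
        return pacing_rate, cwnd_pkts

m = BBRModelSketch()
m.on_ack(delivery_rate_bps=100e6, rtt_s=0.025)  # 100Mbps sample, 25ms RTT
print(m.control())  # -> (125000000.0, 416.66...) i.e. 1.25x BtlBw, 2x BDP
```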

Figure 2: Testbeds employed in our study. (a) Mininet/LAN testbed; (b) WAN testbed.

3 EXPERIMENTAL SETUP

3.1 Testbeds
We use Mininet [6], LAN, and WAN networks for our experiments. Mininet is a network emulator which creates a virtual network running multiple hosts, links, and switches on a single machine. For Mininet or LAN experiments, we use a simple dumbbell topology as shown in Figure 2(a). The minimum RTT between the various machines shown in the figure, h1/h2 and h3, in the Mininet and LAN testbeds is 40µs.
Figure 2(b) shows our WAN testbed, where two senders are located at Stony Brook University (New York), and the receiver is located at Rutgers University (New Jersey). The minimum RTT between the sender and the receiver is 7ms. The network interfaces of all hosts have a 1Gbps peak bandwidth.

3.2 Setting the network parameters
We use Linux TC to configure different network conditions for both real and emulated network links. Specifically, we use TC-NetEm [29] to set link delay, and TC-tbf [14] to set link bandwidth and bottleneck buffer size. Note that we do not set network parameters on end hosts, since doing so can result in a negative interaction with TCP Small Queues [1, 4]. Instead, in LAN and WAN networks, we employ TC on a separate Linksys WRT1900ACS router (running the OpenWRT Linux OS) between the end hosts. In Mininet, we set another node as a router between the end hosts. We investigate BBR in Linux 4.15, where the TCP layer can handle the pacing requirements of BBR, thus fq (Fair Queue) [2] is not needed [5].
TC-tbf: Figure 3 shows how Linux TC-tbf is used to limit the bandwidth. When the application wants to send data, the packets will first go through the IP stack to obtain headers. Then, the packets are enqueued to a queue named qdisc (queueing discipline) [11]. When we use TC-tbf to limit the bandwidth to, say, rate, a bucket holding tokens is created. During each second, the system adds rate tokens into the bucket. The bucket size is pre-configured and can only hold a limited number of tokens. If the bucket is already full, then the new tokens being added are discarded. Each token allows 1 byte of data to go through the network. As a result, a queued packet of size L can move to the NIC only if there are L tokens in the bucket. Note that we can also configure the network buffer size by changing the qdisc length.

Figure 3: Using TC-tbf to limit network bandwidth.
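As a concrete reading of the token-bucket mechanism just described, here is a minimal Python simulation; the class and parameter names are ours, and it models only the token accounting, not the kernel's tbf qdisc.

```python
class TokenBucketSketch:
    """Toy model of TC-tbf: the bucket gains `rate` tokens per second
    (capped at `capacity`), and a queued packet of size L bytes may move
    to the NIC only when L tokens are available."""

    def __init__(self, rate_bytes_per_s, bucket_size_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = bucket_size_bytes
        self.tokens = 0.0        # start empty for the example below
        self.last_t = 0.0

    def _refill(self, now):
        # Tokens accumulate at `rate`; tokens beyond capacity are discarded.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_t) * self.rate)
        self.last_t = now

    def try_send(self, now, pkt_len):
        """Return True (consuming pkt_len tokens) if the packet can leave."""
        self._refill(now)
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return True
        return False             # packet stays queued in the qdisc

# ~1Mbps limit: 125000 tokens (bytes) per second, 3000-byte bucket.
tbf = TokenBucketSketch(rate_bytes_per_s=125_000, bucket_size_bytes=3000)
assert not tbf.try_send(now=0.001, pkt_len=1500)  # ~125 tokens so far
assert tbf.try_send(now=0.015, pkt_len=1500)      # ~1875 tokens by 15ms
```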
4 EVALUATION
This section discusses our experimental results on evaluating BBR's performance in terms of goodput, packet loss, and fairness using our above-described testbeds.

Figure 4: Decision tree for employing BBR versus Cubic under different network conditions.

4.1 BBR versus Cubic
Our first evaluation aims to answer a practical question with respect to BBR – "given a certain network condition, should we employ BBR to maximize goodput?" To answer this question, we empirically study the goodput of BBR and Cubic under 640 different configurations in our LAN testbed. We generate traffic using iPerf3 [12] in our LAN topology (Figure 2(a)) from h1 to h3. For each network configuration, we run the iPerf3 experiment 5 times, each for 60 seconds. On the LAN router we configure:
8 RTT values: 5, 10, 25, 50, 75, 100, 150, and 200ms;
8 BW values: 10, 20, 50, 100, 250, 500, 750, and 1000Mbps;
5 buffer sizes: 0.1, 1, 10, 20, and 50MB.
The range of our chosen parameters is based on values commonly employed in modern networks [30, 31].
4.1.1 Decision Tree. We summarize our LAN results using a decision tree, in Figure 4, which shows whether BBR or Cubic achieves higher goodput under different network conditions. The decision tree is generated using the DecisionTreeClassifier API provided in the Python3 scikit-learn package [13]. The input data consists of the goodput values of BBR and Cubic from all 640 LAN experiments. The median and mean classification accuracy for the decision tree is 81.9% and 81.3%, respectively, under 5-fold cross validation. Note that we set the TCP read and write memory to the maximum allowable value in Ubuntu 18.10 (2^31 − 1 bytes); this is done so that the data transfer is not limited by small memory sizes. Under the OS's default TCP memory sizes, the median and mean classification accuracy for the decision tree increase to 90.2% and 90.0%. This is because the Linux default TCP read and write memory sizes are usually quite small (usually 4MB to 6MB), which results in lower variation in bandwidth, thus reducing prediction outliers.
For each node in the tree in Figure 4, except for the leaf nodes, the first row shows the condition; nodes to the left are True and nodes to the right are False with respect to the condition. The second row shows how many cases (out of the 75% training data) fall under this node. The third row shows the number of cases that are classified as BBR or Cubic, based on the condition in the first row. Finally, the last row indicates the general decision for this node. If the node's color is orange, the decision is BBR; if the node's color is blue, then the decision is Cubic. The intensity of the color is determined using the Gini impurity, and indicates how confident we are in our decision. The leaf nodes provide the final classification output. To leverage the tree for a given network condition, we start at the root and traverse until we reach a leaf node; the decision in the leaf node is the predicted optimal choice (BBR versus Cubic).
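For reference, the sketch below shows how such a tree can be produced with scikit-learn's DecisionTreeClassifier, mirroring the 5-fold cross validation and 75% training split described above. The load_goodput_dataset helper, the (RTT, BW, buffer) feature encoding, and the max_depth setting are our assumptions, not artifacts from the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical loader: each row is one (RTT_ms, BW_Mbps, buffer_MB)
# configuration, labeled 1 if BBR achieved higher goodput than Cubic, else 0.
X, y = load_goodput_dataset()  # assumed helper; replace with your own data

clf = DecisionTreeClassifier(max_depth=4, random_state=0)  # depth assumed

# 5-fold cross validation, as used for the reported 81.9%/81.3% accuracies.
scores = cross_val_score(clf, X, y, cv=5)
print(f"median={np.median(scores):.3f} mean={scores.mean():.3f}")

# Fit on a 75% training split to mirror the tree rendered in Figure 4.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.75,
                                          random_state=0)
clf.fit(X_tr, y_tr)
# Query the tree for one network condition: 50ms RTT, 500Mbps, 0.1MB buffer.
print("BBR" if clf.predict([[50, 500, 0.1]])[0] else "Cubic")
```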

Figure 5: Analysis of BBR and Cubic in terms of improvement in goodput and number of retransmissions under shallow and deep buffers for the LAN setup. (a) GpGain^bbr_cubic, buffer=100KB; (b) GpGain^bbr_cubic, buffer=10MB; (c) BBR's # retransmits, buffer=100KB; (d) Cubic's # retransmits, buffer=100KB.

The decision tree can be reasoned about as follows. The left branch indicates the region where BDP is small and the buffer is large, under which Cubic results in higher goodput. The right branch indicates the region where BDP is large and the buffer size is small, under which BBR has higher goodput. The key to explaining these findings is that BBR maintains 2 × BDP packets in flight, and so BDP worth of packets are queued in the buffer (while the remaining BDP are on the wire). Now, if the buffer size is larger than BDP, which is the case in most of the left branch, then BBR is unable to fully utilize the buffer, resulting in likely inferior goodput. When the buffer size is small, both BBR and Cubic experience losses. However, Cubic responds to losses by significantly shrinking its cwnd, whereas BBR does not directly respond to the loss, resulting in higher goodput.
We also generated a decision tree using Mininet experiments. Results are qualitatively similar, with the Mininet-based decision tree providing an accuracy of about 90%.
4.1.2 Deconstructing the decision tree results. To further analyze the decision tree results based on the LAN experiments, we focus on two metrics: goodput and packet loss. Goodput characterizes how well the congestion control algorithm utilizes the network, while packet loss indicates the extent of network resource wastage incurred by the algorithm.
Goodput: We use the following metric to evaluate BBR's goodput percentage gain over Cubic:

GpGain^bbr_cubic = (goodput|_BBR − goodput|_Cubic) / goodput|_Cubic × 100    (1)

We use heatmaps to visualize GpGain^bbr_cubic for different network settings — for each metric, we show one heatmap for a shallow buffer (100KB) and one for a deep buffer (10MB). Note that we refer to 10MB as a "deep" buffer since it is larger than most of the BDP values in our experiments. For example, a 500Mbps bandwidth and 100ms RTT results in a 6.25MB BDP, which is smaller than 10MB. In each heatmap, we show the metric value under different RTT and bandwidth settings. Red shaded regions and positive values indicate that BBR outperforms Cubic, whereas blue shaded regions and negative values indicate that Cubic outperforms BBR.
Figure 5(a) shows GpGain^bbr_cubic under the shallow buffer (100KB). We observe that BBR outperforms Cubic when either bandwidth or RTT is high, i.e., the BDP is high. On the other hand, for a deep buffer (10MB), Figure 5(b) shows that Cubic has higher goodput except for very large bandwidth and RTT values. However, Cubic's goodput gain in deep buffers is not as high as BBR's gain in shallow buffers. For example, under a 100KB buffer size, 200ms RTT, and 500Mbps bandwidth, Cubic's average goodput is 179.6Mbps, while BBR has a significantly higher average goodput of 386.0Mbps (115% improvement). However, for a 10MB buffer, Cubic only sees a maximum goodput improvement of 34%. We also tried a 50MB buffer size, but the results were similar to those for the 10MB buffer size.
Loss: Although BBR sees significantly higher goodput values in shallow buffers, there is a caveat here – a high number of losses. Figures 5(c) and 5(d) show the number of packet retransmissions for both BBR and Cubic in our LAN experiments under the 100KB bottleneck buffer size and different bandwidth and RTT values; note that retransmissions are initiated after a loss is detected [23]. We see that BBR often incurs 10× more retransmissions than Cubic. This is largely because BBR sets cwnd_gain to 2 most of the time, thus requiring a buffer size of at least BDP to queue its outstanding requests in flight; when the buffer size is smaller than BDP, BBR will continually have losses. On the other hand, although Cubic also hits the buffer capacity for a shallow buffer, it responds by lowering its cwnd, thus avoiding continued losses. In terms of losses, for a 100KB buffer size, the average loss percentage for BBR and Cubic is 10.1% and 0.9%, respectively. For a 10MB buffer size, the corresponding loss percentage for BBR and Cubic is 0.8% and 1.3%, respectively.
When the bottleneck buffer size increases, we find that the number of retransmissions decreases significantly for both BBR and Cubic. For example, in our 25ms RTT and 500Mbps bandwidth LAN experiment, when we increase the bottleneck buffer size from 100KB to 10MB, BBR's and Cubic's retransmissions decrease from 235798 to 0 and from 1649 to 471, respectively. To better understand this non-trivial loss behavior, we further analyze the relationship between goodput and losses in the next subsection.
Latency: To investigate TCP latency, we now consider finite flow sizes. Specifically, we use iPerf3 to generate 10MB and 100MB flows under the same 640 network configurations as for our previous set of experiments in the LAN testbed. We use the following metric to evaluate BBR's latency improvement percentage over Cubic:

LatDec^bbr_cubic = (latency|_Cubic − latency|_BBR) / latency|_Cubic × 100    (2)
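Both metrics translate directly into code. The snippet below is a minimal reference implementation of Eqs. (1) and (2); the function names are ours, and the usage line reproduces the 115% goodput gain quoted above for the 100KB/200ms/500Mbps configuration.

```python
def gp_gain(goodput_bbr, goodput_cubic):
    """Eq. (1): BBR's goodput gain over Cubic, in percent."""
    return (goodput_bbr - goodput_cubic) / goodput_cubic * 100

def lat_dec(latency_bbr, latency_cubic):
    """Eq. (2): BBR's latency decrease relative to Cubic, in percent."""
    return (latency_cubic - latency_bbr) / latency_cubic * 100

# Worked example from Section 4.1.2: 100KB buffer, 200ms RTT, 500Mbps BW.
print(round(gp_gain(386.0, 179.6)))  # -> 115 (% goodput improvement)
```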

Figure 6: BBR's latency decrease compared to Cubic under shallow and deep buffers for a finite flow size (100MB). (a) LatDec^bbr_cubic, buffer=100KB; (b) LatDec^bbr_cubic, buffer=10MB.

Figure 7: Mininet: 100Mbps BW, 25ms RTT, 10MB buffer. (a) Goodput vs. loss rate; (b) retransmits vs. loss rate (for Cubic, Reno, BBR, BBR_1.1, and BBR_1.5).

Figure 8: BBR and Cubic's bandwidth share under 1Gbps BW, 20ms RTT, and different buffer sizes. (a) Mininet results; (b) WAN results.

Our latency results for 100MB flows are shown in Figure 6, which agree with our goodput results in Figure 5. We observe in Figure 6(a) that in the shallow buffer case, BBR typically has lower latency. For a deep buffer, Figure 6(b) shows that Cubic has lower latency when the BDP is small. We also experimented with 10MB flows, and found that BBR has lower latency in almost all cases.

4.2 BBR's goodput vs packet losses
In the current Linux implementation, BBR does not actively react to packet losses. However, we observe in our experiments that BBR exhibits an abrupt drop in goodput when the loss rate becomes very high, about 20% in our case. This suggests a "cliff point" in loss rates beyond which BBR inadvertently reacts to losses.
On further analysis, we find that the cliff point has a close relationship with BBR's pacing_gain parameter that determines its probing capabilities (see Section 2). If the packet loss probability is p, then during bandwidth probing, BBR paces at pacing_gain × BW. However, due to losses, its effective pacing rate is pacing_gain × BW × (1 − p). Thus, if this value is less than the bandwidth, BBR will not probe for additional capacity, and will in fact infer a lower capacity due to losses. We determine the cliff point by solving:

pacing_gain × BW × (1 − p) = BW    (3)

Consequently, the cliff point is p = 1 − 1/pacing_gain. In current implementations, the maximum pacing_gain is 1.25, so the cliff point should be at p = 0.2, or a 20% loss rate.
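Solving Eq. (3) for p is a one-liner; the short sketch below reproduces the cliff points used in the validation that follows.

```python
def cliff_point(pacing_gain):
    """Loss rate p solving pacing_gain * BW * (1 - p) = BW, i.e. Eq. (3):
    beyond this p, probing can no longer compensate for losses."""
    return 1 - 1 / pacing_gain

for gain in (1.1, 1.25, 1.5):
    print(f"pacing_gain={gain}: cliff at {cliff_point(gain):.0%}")
# pacing_gain=1.1: 9%; 1.25: 20%; 1.5: 33% -- matching Section 4.2.
```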
Validation of cliff points: We validate our above analysis by varying the maximum pacing_gain value in BBR's source code, and conducting experiments in both our WAN and Mininet testbeds. We experiment with two different pacing_gain values, in addition to the default value of 1.25: pacing_gain = 1.1, denoted as BBR_1.1, and pacing_gain = 1.5, denoted as BBR_1.5. Via Eq. (3), we expect the cliff points of BBR_1.1 and BBR_1.5 to be at 9% and 33%, respectively.
Figure 7 shows how different TCP algorithms react to packet losses in the Mininet testbed (100Mbps bandwidth and 25ms RTT). We vary the loss rate from 0 to 50% using TC-NetEm to emulate lossy networks; TC-NetEm introduces random losses, which are common in WiFi and RED routers [26]. For each loss rate, we run iPerf3 experiments for 60s with different congestion control algorithms.
Figure 7(a) shows how goodput is affected by loss rates. We see that for the loss-based algorithms, Reno and Cubic, goodput decreases significantly even under moderate loss rates since they proactively reduce cwnd when encountering losses. On the other hand, BBR and BBR_1.1 exhibit a drop in goodput around 20% and 9% loss rates, respectively, validating our prior cliff point analysis. However, BBR_1.5 exhibits its cliff point much before the predicted 33% loss rate. This is because BBR considers a > 20% loss rate a signal of policing [8], and so uses the long-term average bandwidth instead of updating its maximum bandwidth estimate BtlBw.
Figure 7(b) shows the number of retransmissions under different loss rates. We see that BBR reaches its peak retransmissions around the loss rate cliff point. This is because, before this cliff point, the BBR goodput is stable but the loss rate is increasing, resulting in an increasing number of retransmits. However, after the cliff point is reached, BBR's goodput decreases, resulting in fewer packets sent and, subsequently, fewer retransmissions. We also conducted experiments in WAN to verify this behavior. We obtained similar results as Figure 7(b), thus confirming our cliff point analysis.

4.3 Analyzing BBR's fairness
Given BBR's aggressive behavior, evidenced by its high retransmissions, we now investigate the fairness of BBR when it coexists with other flows in our Mininet and WAN testbeds.
Mininet results: For our Mininet testbed setup for fairness (Figure 2(a)), nodes h1 and h2 send iPerf3 traffic to h3 using BBR and Cubic, respectively; a sketch of this setup is shown below. On the link from the router to h3, we add a 20ms network delay and vary the buffer size between 10KB and 100MB. Under each buffer size, the iPerf3 experiment lasts for 60s. Figure 8(a) shows the results of our Mininet experiments. We see that the bandwidth share of BBR and Cubic depends on the bottleneck buffer size. For a small buffer size (10KB), BBR utilizes 94% of the network goodput. When the buffer size is large (10MB), Cubic utilizes 3× more bandwidth than BBR. Under moderate buffer sizes (∼5MB), BBR and Cubic evenly share the bandwidth.
In terms of retransmissions, we find that BBR has a high retransmission rate when coexisting with Cubic flows in shallow buffers. Table 1 shows the number of retransmissions for BBR and Cubic under different buffer sizes in our Mininet experiments. Clearly, BBR has orders of magnitude more retransmits than Cubic for small buffer sizes. For example, under a 100KB buffer size, BBR has 200× more retransmits than Cubic. In deep buffers, BBR's retransmissions drop to zero since its packets in flight (cwnd) are now much smaller than the buffer capacity.
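For reference, the following Mininet sketch illustrates the fairness setup above. It is illustrative only: the paper routes traffic through a router node configured with tc, whereas this sketch approximates the bottleneck with TCLink parameters (bw, delay, max_queue_size), and the queue size value here is arbitrary.

```python
import time

from mininet.link import TCLink
from mininet.net import Mininet
from mininet.topo import Topo

class FairnessTopo(Topo):
    """Star topology: h1 (BBR) and h2 (Cubic) share a bottleneck to h3."""
    def build(self):
        s1 = self.addSwitch('s1')  # stands in for the router
        for h in ('h1', 'h2', 'h3'):
            self.addHost(h)
        self.addLink('h1', s1, cls=TCLink, bw=1000)
        self.addLink('h2', s1, cls=TCLink, bw=1000)
        # Bottleneck toward h3: 1Gbps, 20ms delay, small (arbitrary) queue.
        self.addLink('h3', s1, cls=TCLink, bw=1000, delay='20ms',
                     max_queue_size=100)

net = Mininet(topo=FairnessTopo(), link=TCLink)
net.start()
h1, h2, h3 = net.get('h1', 'h2', 'h3')
h3.cmd('iperf3 -s -D')                                        # receiver
h1.cmd('iperf3 -c %s -C bbr -t 60 > bbr.log &' % h3.IP())     # BBR flow
h2.cmd('iperf3 -c %s -C cubic -t 60 > cubic.log &' % h3.IP()) # Cubic flow
time.sleep(65)  # let the 60s flows finish
net.stop()
```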

Buffer (bytes) | 1e4   | 1e5    | 1e6   | 5e6  | 1e7 | 5e7 | 1e8
BBR Retr#      | 26746 | 305029 | 68741 | 1324 | 204 | 0   | 0
Cubic Retr#    | 908   | 1398   | 3987  | 1145 | 794 | 7   | 16
Table 1: Number of retransmissions for BBR and Cubic for different buffer sizes when they co-exist under the 1Gbps bandwidth and 20ms RTT Mininet network.

WAN results: We also conduct fairness experiments in our WAN network. For configuring the network conditions, we apply the same Mininet parameters to our WAN testbed. Our WAN results in Figure 8(b) show a different behavior – even at high buffer sizes, Cubic's bandwidth share does not increase, unlike that in Figure 8(a). This result suggests the existence of a shallow buffer on the WAN path between our router and the receiver h3. In Figure 8(b), the goodput of BBR and Cubic stabilizes when our router buffer size reaches 20KB; this indicates that the bottleneck buffer in the wild in our WAN setup is around 20KB. In terms of retransmissions, while we also see a large number of retransmits for BBR in shallow buffers, we find that BBR's retransmits stabilize at around 500 packets/min as we increase our router buffer size beyond 20KB; this further confirms the 20KB bottleneck buffer in the wild for our WAN setup.
Reason for using Mininet: While we show similar results in Section 4.1 for the LAN and Mininet testbeds, Mininet is more flexible than the LAN testbed in certain scenarios. In Section 4.3, we use Mininet to create a star topology for the fairness experiment – a router connecting 3 nodes. This was not possible in our LAN testbed given we only have two servers connected to a router. Also, we use Mininet in Section 4.2 and Section 4.3 due to its convenience and scalability to validate/reinforce our in-the-wild WAN results.

5 RELATED WORK
BBR's design was first published in a 2016 ACM article [20]. Since 2016, several BBR updates have been presented at IETF meetings [7, 9, 21], in addition to internet drafts [3, 22].
Most of the prior work on evaluating the performance of BBR has focused on BBR's fairness. Hock et al. [30] study how BBR coexists with Cubic under shallow and deep buffers. They find that BBR gets a bigger share of the bandwidth in small buffers while Cubic gets a larger share in deep buffers. Ma et al. [34] show that the persistent queue that develops in the bottleneck buffer contributes to BBR's unfairness. However, these works either experiment with very few buffer sizes or use only a single testbed. Our paper not only analyzes BBR's fairness for a range of buffer sizes (10KB – 100MB) under multiple testbeds, but also highlights the non-trivial fairness behavior in our WAN setting (see Section 4.3).
The high loss rate under BBR has been discussed in some recent papers [30, 32, 37], but these works do not investigate the reasons (such as the cliff point, see Section 4.2) behind this observation.
There have also been some works that investigate BBR's performance for specific scenarios. Zhong et al. [39] investigate BBR in a mobile network, and analyze the impact of delayed ACKs on BBR's performance. Atxutegi et al. [18] study BBR's performance in live mobile networks, and contrast it with TCP NewReno and Cubic. Our work focuses on BBR's performance in different wired settings, including LAN and WAN, in addition to Mininet.

6 LIMITATIONS AND FUTURE WORK
We now discuss the limitations of our study. First, all of our experimental testbeds, including LAN, Mininet, and WAN, use a simple dumbbell topology in order to easily control the bottleneck buffer size. However, the Internet consists of more complicated networks. For example, BBR has been used in Google's B4 network as well as in YouTube video servers [20]. We plan to extend our study to such real-world scenarios as part of future work.
Second, our experiments thus far consider at most two concurrent TCP flows. Also, since the LAN and Mininet testbeds are fully under our control, we deliberately eliminate irrelevant (background) traffic in our experiments to focus on the fairness comparison between BBR and Cubic. However, in real networks, temporary flows can enter and leave the network at different times, which might affect the results. We plan to investigate the impact of more competing flows in our future work.
Third, this paper primarily focuses on empirical measurements. We have not investigated how we can use our empirical findings to optimize the performance of BBR. In our ongoing work, we are investigating the design flaws of BBR with the eventual goal of enhancing the design of BBR to improve its performance. Specifically, we are working on mitigating BBR's high retransmission and unfairness issues.
Finally, the key issues revealed by our study, such as cliff points, high retransmissions, and unfairness, are inherent in the current version of BBR. It is not entirely obvious whether or not these issues will persist in future versions of BBR, though there is some online discussion [9, 10] about addressing unfairness in subsequent versions of BBR. Nonetheless, the empirical findings and root cause analysis presented in this paper can help the community to identify and solve performance issues as BBR continues to evolve.

7 CONCLUSION
Despite the excitement around BBR, there is a dearth of studies that evaluate the performance of BBR on multiple real-world testbeds and across a range of parameter settings, especially studies that investigate why BBR performs the way it does. This paper conducts over 600 experiments under both emulated and real-world testbeds, and analyzes the network conditions under which BBR outperforms contemporary algorithms. Our analysis reveals that it is the relative difference between the bottleneck buffer size and BDP that typically dictates when BBR performs well. In fact, this finding also extends to BBR's unfair behavior when it coexists with Cubic; however, in such cases, we find that when BBR performs well, it can be very unfair to competing flows. In addition, our study reveals the existence of a "cliff point" in loss rate, beyond which BBR's goodput drops abruptly. Our analysis reveals that the pacing_gain parameter in BBR is partly to blame for this behavior.

ACKNOWLEDGMENT
This work was supported by NSF grants 1566260, 1717588, 1750109, and 1909356.

REFERENCES
[1] TCP Small Queues. https://lwn.net/Articles/507065/, 2012.
[2] Fair Queue Traffic Policing. http://man7.org/linux/man-pages/man8/tc-fq.8.html, 2015.
[3] Delivery Rate Estimation. https://tools.ietf.org/html/draft-cheng-iccrg-delivery-rate-estimation-00, 2017.
[4] TC Configure Point. https://groups.google.com/d/topic/bbr-dev/8LYkNt17V_8, 2017.
[5] BBR Quick Start. https://github.com/google/bbr/blob/master/Documentation/bbr-quick-start.md, 2018.
[6] Mininet - An Instant Virtual Network on your Laptop (or other PC). http://mininet.org/, 2018.
[7] BBR congestion control: IETF 102 Update: BBR Startup. https://datatracker.ietf.org/meeting/102/materials/slides-102-iccrg-bbr-startup-behavior-01, 2019.
[8] BBR source code. https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/tree/net/ipv4/tcp_bbr.c, 2019.
[9] BBR v2: A Model-based Congestion Control: IETF 104 Update. https://datatracker.ietf.org/meeting/104/materials/slides-104-iccrg-an-update-on-bbr-00, 2019.
[10] BBR v2: A Model-based Congestion Control: IETF 105 Update. https://datatracker.ietf.org/meeting/105/materials/slides-105-iccrg-bbr-v2-a-model-based-congestion-control-00, 2019.
[11] Components of Linux Traffic Control. http://tldp.org/HOWTO/Traffic-Control-HOWTO/components.html, 2019.
[12] iPerf - The ultimate speed test tool for TCP, UDP and SCTP. https://iperf.fr/iperf-doc.php, 2019.
[13] Machine Learning in Python – scikit-learn. https://scikit-learn.org/stable/index.html, 2019.
[14] Token Bucket Filter. https://linux.die.net/man/8/tc-tbf, 2019.
[15] Aggarwal, A., Savage, S., and Anderson, T. Understanding the performance of TCP pacing. In Proceedings of IEEE INFOCOM 2000 (2000), vol. 3, IEEE, pp. 1157–1165.
[16] Alizadeh, M., Greenberg, A., Maltz, D. A., Padhye, J., Patel, P., Prabhakar, B., Sengupta, S., and Sridharan, M. Data center TCP (DCTCP). ACM SIGCOMM Computer Communication Review 41, 4 (2011), 63–74.
[17] Arun, V., and Balakrishnan, H. Copa: Practical delay-based congestion control for the internet. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18) (2018), pp. 329–342.
[18] Atxutegi, E., Liberal, F., Haile, H. K., Grinnemo, K.-J., Brunstrom, A., and Arvidsson, A. On the use of TCP BBR in cellular networks. IEEE Communications Magazine 56, 3 (2018), 172–179.
[19] Brakmo, L. S., O'Malley, S. W., and Peterson, L. L. TCP Vegas: New techniques for congestion detection and avoidance, vol. 24. ACM, 1994.
[20] Cardwell, N., Cheng, Y., Gunn, C. S., Yeganeh, S. H., and Jacobson, V. BBR: Congestion-based congestion control. ACM Queue 14, 5 (2016), 20–53.
[21] Cardwell, N., Cheng, Y., Gunn, C. S., Yeganeh, S. H., Swett, I., Iyengar, J., Vasiliev, V., and Jacobson, V. BBR congestion control: IETF 99 update. In Presentation in ICCRG at IETF 99th meeting, Jul (2017).
[22] Cardwell, N., Cheng, Y., Yeganeh, S. H., and Jacobson, V. BBR congestion control draft-cardwell-iccrg-bbr-congestion-control-00. Google, Inc Std. (2017).
[23] Dempsey, B. J., Liebeherr, J., and Weaver, A. C. On retransmission-based error control for continuous media traffic in packet-switching networks. Computer Networks and ISDN Systems 28, 5 (1996), 719–736.
[24] Dong, M., Li, Q., Zarchy, D., Godfrey, P. B., and Schapira, M. PCC: Re-architecting congestion control for consistent high performance. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15) (2015), pp. 395–408.
[25] Fall, K., and Floyd, S. Simulation-based comparisons of Tahoe, Reno and SACK TCP. ACM SIGCOMM Computer Communication Review 26, 3 (1996), 5–21.
[26] Floyd, S., and Jacobson, V. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking 1, 4 (1993), 397–413.
[27] Gettys, J. Bufferbloat: Dark buffers in the internet. IEEE Internet Computing 15, 3 (2011), 96.
[28] Ha, S., Rhee, I., and Xu, L. CUBIC: a new TCP-friendly high-speed TCP variant. ACM SIGOPS Operating Systems Review 42, 5 (2008), 64–74.
[29] Hemminger, S., et al. Network emulation with NetEm. In Linux conf au (2005), pp. 18–23.
[30] Hock, M., Bless, R., and Zitterbart, M. Experimental evaluation of BBR congestion control. In 2017 IEEE 25th International Conference on Network Protocols (ICNP) (2017), IEEE, pp. 1–10.
[31] Huffaker, B., Fomenkov, M., Plummer, D. J., Moore, D., Claffy, K., et al. Distance metrics in the internet. In Proc. of IEEE International Telecommunications Symposium (ITS) (2002).
[32] Hurtig, P., Haile, H., Grinnemo, K.-J., Brunstrom, A., Atxutegi, E., Liberal, F., and Arvidsson, Å. Impact of TCP BBR on CUBIC traffic: A mixed workload evaluation. In 2018 30th International Teletraffic Congress (ITC 30) (2018), vol. 1, IEEE, pp. 218–226.
[33] Kleinrock, L. Power and deterministic rules of thumb for probabilistic problems in computer communications. In Proceedings of the International Conference on Communications (1979), vol. 43, pp. 1–43.
[34] Ma, S., Jiang, J., Wang, W., and Li, B. Towards RTT fairness of congestion-based congestion control. arXiv preprint arXiv:1706.09115 (2017).
[35] Ramakrishnan, K., and Floyd, S. A proposal to add explicit congestion notification (ECN) to IP. Tech. rep., 1998.
[36] Schapira, M., and Winstein, K. Congestion-control throwdown. In Proceedings of the 16th ACM Workshop on Hot Topics in Networks (2017), ACM, pp. 122–128.
[37] Scholz, D., Jaeger, B., Schwaighofer, L., Raumer, D., Geyer, F., and Carle, G. Towards a deeper understanding of TCP BBR congestion control. In IFIP Networking (2018), pp. 109–117.
[38] Yan, F. Y., Ma, J., Hill, G. D., Raghavan, D., Wahby, R. S., Levis, P., and Winstein, K. Pantheon: the training ground for internet congestion-control research. In 2018 USENIX Annual Technical Conference (USENIX ATC 18) (2018), pp. 731–743.
[39] Zhong, Z., Hamchaoui, I., Khatoun, R., and Serhrouchni, A. Performance evaluation of CQIC and TCP BBR in mobile network. In 2018 21st Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN) (2018), IEEE, pp. 1–5.