Bufferbloat: What's Wrong with the Internet?
A discussion with Vint Cerf, Van Jacobson, Nick Weaver, and Jim Gettys.
DOI: 10.1145/2076450.2076464
Article development led by queue.acm.org

Internet delays now are as common as they are maddening. But that means they end up affecting system engineers just like all the rest of us. And when system engineers get irritated, they often go looking for what's at the root of the problem. Take Jim Gettys, for example. His slow home network had repeatedly proved to be the source of considerable frustration, so he set out to determine what was wrong, and he even coined a term for what he found: bufferbloat.

Bufferbloat refers to excess buffering inside a network, resulting in high latency and reduced throughput. Some buffering is needed; it provides space to queue packets waiting for transmission, thus minimizing data loss. In the past, the high cost of memory kept buffers fairly small, so they filled quickly and packets began to drop shortly after the link became saturated, signaling to the communications protocol the presence of congestion and thus the need for compensating adjustments. Because memory now is significantly cheaper than it used to be, buffering has been overdone in all manner of network devices, without consideration for the consequences. Manufacturers have reflexively acted to prevent any and all packet loss and, by doing so, have inadvertently defeated a critical TCP congestion-detection mechanism, with the result being worsened congestion and increased latency.

Now that the problem has been diagnosed, people are working feverishly to fix it. This case study considers the extent of the bufferbloat problem and its potential implications.
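To make the intuition concrete, the arithmetic is simple: a drop-tail buffer that a TCP flow keeps full adds roughly its size divided by the link rate of queuing delay to every packet behind it. The short sketch below works through a few hypothetical buffer and link sizes; the figures are illustrative only and are not measurements from the discussion that follows.

    # Back-of-the-envelope queuing delay added by a full drop-tail buffer:
    # once TCP has filled the buffer, every packet waits roughly
    # buffer_bytes / link_rate seconds before it is transmitted.
    # The buffer and link sizes below are hypothetical, not from the article.

    def queuing_delay_s(buffer_bytes: int, link_bits_per_s: float) -> float:
        """Seconds a packet waits behind a full buffer draining at link rate."""
        return (buffer_bytes * 8) / link_bits_per_s

    examples = [
        ("64 KB buffer on a 10 Mb/s link", 64 * 1024, 10e6),
        ("256 KB buffer on a 2 Mb/s uplink", 256 * 1024, 2e6),
        ("1 MB buffer on an 8 Mb/s uplink", 1024 * 1024, 8e6),
    ]

    for label, buf_bytes, rate in examples:
        print(f"{label}: ~{queuing_delay_s(buf_bytes, rate):.2f} s of added delay")

The one-second-plus results land in the same range as the latencies Gettys describes below, roughly 10 times the 150-millisecond budget he cites for good telephony.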
Working to steer the discussion is Vint Cerf, popularly known as one of the "fathers of the Internet." As the co-designer of the TCP/IP protocols, Cerf did indeed play a key role in developing the Internet and related packet data and security technologies while at Stanford University from 1972 to 1976 and with the U.S. Department of Defense's Advanced Research Projects Agency (DARPA) from 1976 to 1982. He currently serves as Google's chief Internet evangelist.

Van Jacobson, presently a research fellow at PARC, where he leads the networking research program, is also central to this discussion. Considered one of the world's leading authorities on TCP, he helped develop the random early detection (RED) queue-management algorithm that has been widely credited with allowing the Internet to grow and meet ever-increasing throughput demands over the years. Prior to joining PARC, Jacobson was a chief scientist at Cisco Systems and later at Packet Design Networks.

Also participating is Nick Weaver, a researcher at the International Computer Science Institute (ICSI) in Berkeley, where he was part of the team that developed Netalyzr, a tool that analyzes network connections and has been instrumental in detecting bufferbloat and measuring its impact across the Internet.

Rounding out the discussion is Jim Gettys, who edited the HTTP/1.1 specification and was a co-designer of the X Window System. He now is a member of the technical staff at Alcatel-Lucent Bell Labs, where he focuses on systems design and engineering, protocol design, and free software development.

Pull quote (Vint Cerf): "The key issue we've been talking about is that all this excessive buffering ends up breaking many of the timeout mechanisms built into our network protocols. That gets us to the question of just how bad the problem really is and how much worse it's likely to get as the speeds out at the edge of the net continue to increase."

VINT CERF: What caused you to do the analysis that led you to conclude you had problems with your home network related to buffers in intermediate devices?

JIM GETTYS: I was running some bandwidth tests on an old IPsec (Internet Protocol Security)-like device that belongs to Bell Labs and observed latencies of as much as 1.2 seconds whenever the device was running as fast as it could. That didn't entirely surprise me, but then I happened to run the same test without the IPsec box in the way, and I ended up with the same result. With 1.2-second latency accompanied by horrible jitter, my home network obviously needed some help. The rule of thumb for good telephony is 150-millisecond latency at most, and my network had nearly 10 times that much.

My first thought was that the problem might relate to a feature called PowerBoost that comes as part of my home service from Comcast. That led me to drop a note to Rich Woundy at Comcast, since his name appears on the Internet draft for that feature. He lives in the next town over from me, so we arranged to get together for lunch.

During that lunch, Rich provided me with several pieces to the puzzle. To begin with, he suggested my problem might have to do with the excessive buffering in a device in my path rather than with the PowerBoost feature. He also pointed out that ICSI has a great tool called Netalyzr that helps you figure out what your buffering is. Also, much to my surprise, he said a number of ISPs had told him they were running without any queue management whatsoever—that is, they weren't running RED on any of their routers or edge devices.

The very next day I managed to get a wonderful trace. I had been having trouble reproducing the problem I had experienced earlier, but since I was using a more recent cable modem this time around, I had a trivial one-line command for reproducing the problem. The resulting SmokePing plot clearly showed the severity of the problem, and that motivated me to take a packet capture so I could see just what in the world was going on. About a week later, I saw basically the same signature on a Verizon FiOS [a bundled home communications service operating over a fiber network], and that surprised me. Anyway, it became clear that what I'd been experiencing on my home network wasn't unique to cable modems.
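The article does not show Gettys' one-line reproduction command, but the shape of the measurement it implies is straightforward: compare latency on an idle link with latency while a bulk transfer keeps the bottleneck buffer full. The sketch below is one minimal way to run that comparison, assuming a Unix-style ping utility and a bulk-traffic sink you control; the 192.0.2.x addresses and the port are placeholders, not details from the article.

    # A rough sketch of a latency-under-load check; it is NOT the one-line
    # command mentioned above (the article does not show it). It measures
    # ping RTT on an idle link, then again while a bulk TCP upload keeps the
    # bottleneck buffer full. PING_TARGET and BULK_SINK are placeholders:
    # point them at your gateway (or first ISP hop) and at a host:port you
    # control that simply discards whatever it receives.

    import re
    import socket
    import subprocess
    import threading

    PING_TARGET = "192.0.2.1"
    BULK_SINK = ("192.0.2.2", 5001)

    def avg_rtt_ms(count: int = 10) -> float:
        """Average RTT reported by a Unix-style ping utility."""
        out = subprocess.run(["ping", "-c", str(count), PING_TARGET],
                             capture_output=True, text=True, check=True).stdout
        samples = [float(m) for m in re.findall(r"time=([\d.]+)", out)]
        return sum(samples) / len(samples)

    def saturate_uplink(seconds: float = 20.0) -> None:
        """Keep a bulk TCP upload running so the uplink buffer stays full."""
        stop = threading.Event()
        timer = threading.Timer(seconds, stop.set)
        timer.daemon = True
        timer.start()
        with socket.create_connection(BULK_SINK) as s:
            chunk = b"\0" * 65536
            while not stop.is_set():
                s.sendall(chunk)

    print(f"idle RTT:   {avg_rtt_ms():.1f} ms")
    threading.Thread(target=saturate_uplink, daemon=True).start()
    print(f"loaded RTT: {avg_rtt_ms():.1f} ms")  # bufferbloat shows up as a large jump here

On a badly bloated link the second figure can be an order of magnitude or more above the first, which is the kind of sustained jump a SmokePing plot makes visible over time.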
Pull quote (Jim Gettys): "In pulling on a string to figure out why my home router was misbehaving so badly, I discovered that all operating systems—Linux, Windows, and Macintosh alike—also are guilty of resorting to excessive buffering. This phenomenon pervades the whole path. You definitely can't fix it at any single location."

CERF: I assume you weren't the only one making noises about these sorts of problems?

GETTYS: I had been hearing similar complaints all along. In fact, Dave Reed [Internet network architect, now with SAP Labs] about a year earlier had reported problems in 3G networks that also appeared to be caused by excessive buffering. He was ultimately ignored when he publicized his concerns, but I've since been able to confirm that Dave was right. In his case, he would see daily high latency without much packet loss during the day, and then the latency would fall back down again at night as flow on the overall network dropped.

Dave Clark [Internet network architect, currently senior research scientist at MIT] had noticed that the DSLAM (Digital Subscriber Line Access Multiplexer) his micro-ISP runs had way too much buffering—leading to as much as six seconds of latency. And this is something he had observed six years earlier, which is what had led him to warn Rich Woundy of the possible problem.

CERF: Perhaps there's an important life lesson here suggesting you may not want to simply throw away outliers on the grounds they're probably just flukes. When outliers show up, it might be a good idea to find out why.

NICK WEAVER: But when testing for this particular problem, the outliers actually prove to be the good networks.

GETTYS: [At that point I couldn't be sure whether] what I'd been observing was anything more than just a couple of flukes. After seeing the Netalyzr data, however, I could see how widespread the problem really was. I can still remember the day when I first saw the data for the Internet as a whole plotted out. That was rather horrifying.

WEAVER: It's actually a pretty straightforward test that allowed us to capture all that data. In putting together Netalyzr at ICSI, we started out with a design philosophy that one anonymous commenter later captured very nicely: "This brings new meaning to the phrase, 'Bang it with a wrench.'" Basically, we just set out to hammer on everything—except we weren't interested in doing a bandwidth test, since there were plenty of good ones out there already.

Nick McKeown and others had ranted about how amazingly over-buffered home networks often proved to be, so buffering seemed like a natural thing to test for. It turns out that would also give us a bandwidth test as a side consequence. Thus we developed a pretty simple test. Over just a 10-second period, it sends a packet and then waits for a packet to return. Then each time it receives a packet back, it sends two more. It either sends large packets and receives small ones in return, or [the reverse: small packets that elicit large replies].
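The growth rule Weaver describes (start with one packet, send two more for every packet that comes back) doubles the traffic in flight until the bottleneck buffer fills and replies slow down or stop. The sketch below is a toy rendition of that idea rather than the actual Netalyzr test; it assumes a cooperating UDP service, here called PROBE_SINK at a placeholder address, that answers every large probe with a short acknowledgment, and it covers only the "large packets out, small ones back" direction.

    # Toy rendition of the probe described above: send one packet, then two
    # more for every packet that comes back, so the traffic in flight keeps
    # doubling until the bottleneck buffer fills or the 10-second window ends.
    # This is NOT the actual Netalyzr code; PROBE_SINK is a placeholder for a
    # UDP service you run yourself that answers each large probe with a short ack.

    import socket
    import time

    PROBE_SINK = ("192.0.2.3", 9000)
    PROBE = b"\0" * 1000      # the "large packets out, small ones back" direction
    DURATION = 10.0           # the 10-second window described above

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(0.2)

    start = time.monotonic()
    sent = received = 0

    sock.sendto(PROBE, PROBE_SINK)   # seed the process with a single packet
    sent += 1

    while time.monotonic() - start < DURATION:
        try:
            sock.recvfrom(512)       # wait for a small acknowledgment
        except socket.timeout:
            continue                 # nothing back yet: the path is backed up
        received += 1
        for _ in range(2):           # every packet that returns triggers two more
            sock.sendto(PROBE, PROBE_SINK)
            sent += 1

    backlog = sent - received
    print(f"sent {sent} probes, received {received} acks in {DURATION:.0f} s")
    print(f"about {backlog} probes ({backlog * len(PROBE)} bytes) still queued or lost")

How far the sender can outrun the acknowledgments before anything is dropped, together with how long those acknowledgments take to arrive once the link is saturated, is what reveals how much buffering sits in the path.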