
practice

doi:10.1145/2076450.2076464
Article development led by queue.acm.org

A discussion with Vint Cerf, Van Jacobson, Nick Weaver, and Jim Gettys.

What's Wrong with the Internet?

Internet delays now are as common as they are maddening. But that means they end up affecting system engineers just like all the rest of us. And when system engineers get irritated, they often go looking for what's at the root of the problem. Take Jim Gettys, for example. His slow home network had repeatedly proved to be the source of considerable frustration, so he set out to determine what was wrong, and he even coined a term for what he found: bufferbloat.

Bufferbloat refers to excess buffering inside a network, resulting in high latency and reduced throughput. Some buffering is needed; it provides space to queue packets waiting for transmission, thus minimizing data loss. In the past, the high cost of memory kept buffers fairly small, so they filled quickly and packets began to drop shortly after the link became saturated, signaling to the communications protocol the presence of congestion and thus the need for compensating adjustments.

Because memory now is significantly cheaper than it used to be, buffering has been overdone in all manner of network devices, without consideration for the consequences. Manufacturers have reflexively acted to prevent any and all packet loss and, by doing so, have inadvertently defeated a critical TCP congestion-detection mechanism, with the result being worsened congestion and increased latency.

Now that the problem has been diagnosed, people are working feverishly to fix it. This case study considers the extent of the bufferbloat problem and its potential implications.

Working to steer the discussion is Vint Cerf, popularly known as one of the "fathers of the Internet." As the co-designer of the TCP/IP protocols, Cerf did indeed play a key role in developing the Internet and related packet data and security technologies while at Stanford University from 1972-1976 and with the U.S. Department of Defense's Advanced Research Projects Agency (DARPA) from 1976-1982. He currently serves as Google's chief Internet evangelist.

Van Jacobson, presently a research fellow at PARC where he leads the networking research program, is also central to this discussion. Considered one of the world's leading authorities on TCP, he helped develop the RED (random early detection) queue management algorithm that has been widely credited with allowing the Internet to grow and meet ever-increasing demands over the years. Prior to joining PARC, Jacobson was a chief scientist at Cisco Systems and later at Packet Design Networks.

Also participating is Nick Weaver, a researcher at the International Computer Science Institute (ICSI) in Berkeley, where he was part of the team that developed Netalyzr, a tool that analyzes network connections and has been instrumental in detecting bufferbloat and measuring its impact across the Internet.

Rounding out the discussion is Jim Gettys, who edited the HTTP/1.1 specification and was a co-designer of the X Window System. He now is a member of the technical staff at Alcatel-Lucent Bell Labs, where he focuses on systems design and engineering, protocol design, and free software development.

VINT CERF: What caused you to do the analysis that led you to conclude you had problems with your home network related to buffers in intermediate devices?

JIM GETTYS: I was running some tests on an old IPsec (Internet Protocol Security)-like device that belongs to Bell Labs and observed latencies of as much as 1.2 seconds whenever the device was running as fast as it could. That didn't entirely surprise me, but then I happened to run the same test without the IPsec box in the way, and I ended up with the same result. With 1.2-second latency accompanied by horrible jitter, my home network obviously needed some help. The rule of thumb for good telephony is 150-millisecond latency at most, and my network had nearly 10 times that much.

My first thought was that the problem might relate to a feature called PowerBoost that comes as part of my home service from Comcast. That led me to drop a note to Rich Woundy at Comcast since his name appears on the Internet draft for that feature. He lives in the next town over from me, so we arranged to get together for lunch. During that lunch, Rich provided me with several pieces to the puzzle. To begin with, he suggested my problem might have to do with the excessive buffering in a device in my path rather than with the PowerBoost feature. He also pointed out that ICSI has a great tool called Netalyzr that helps you figure out what your buffering is. Also, much to my surprise, he said a number of ISPs had told him they were running without any queue management whatsoever—that is, they weren't running RED on any of their routers or edge devices.

The very next day I managed to get a wonderful trace. I had been having trouble reproducing the problem I had experienced earlier, but since I was using a more recent cable modem this time around, I had a trivial one-line command for reproducing the problem. The resulting SmokePing plot clearly showed the severity of the problem, and that motivated me to take a packet capture so I could see just what in the world was going on. About a week later, I saw basically the same signature on a Verizon FiOS [a bundled home communications service operating over a fiber network], and that surprised me. Anyway, it became clear that what I'd been experiencing on my home network wasn't unique to cable modems.

CERF: I assume you weren't the only one making noises about these sorts of problems?

GETTYS: I had been hearing similar complaints all along. In fact, Dave Reed [Internet network architect, now with SAP Labs], about a year earlier had reported problems in 3G networks that also appeared to be caused by excessive buffering. He was ultimately ignored when he publicized his concerns, but I've since been able to confirm that Dave was right. In his case, he would see daily high latency without much packet loss during the day, and then the latency would fall back down again at night as flow on the overall network dropped.

Dave Clark [Internet network architect, currently senior research scientist at MIT] had noticed that the DSLAM (Digital Subscriber Line Access Multiplexer) his micro-ISP runs had way too much buffering—leading to as much as six seconds of latency. And this is something he had observed six years earlier, which is what had led him to warn Rich Woundy of the possible problem.

CERF: Perhaps there's an important life lesson here suggesting you may not want to simply throw away outliers on the grounds they're probably just flukes. When outliers show up, it might be a good idea to find out why.

NICK WEAVER: But when testing for this particular problem, the outliers actually prove to be the good networks.

GETTYS: Without Netalyzr, I never would have known for sure whether what I'd been observing was anything more than just a couple of flukes. After seeing the Netalyzr data, however, I could see how widespread the problem really was. I can still remember the day when I first saw the data for the Internet as a whole plotted out. That was rather horrifying.

WEAVER: It's actually a pretty straightforward test that allowed us to capture all that data. In putting together Netalyzr at ICSI, we started out with a design philosophy that one anonymous commenter later captured very nicely: "This brings new meaning to the phrase, 'Bang it with a wrench.'" Basically, we just set out to hammer on everything—except we weren't interested in doing a bandwidth test since there were plenty of good ones out there already.

I remembered, however, that Nick McKeown and others had ranted about how amazingly over-buffered home networks often proved to be, so buffering seemed like a natural thing to test for. It turns out that would also give us a bandwidth test as a side consequence. Thus we developed a pretty simple test. Over just a 10-second period, it sends a packet and then waits for a packet to return. Then each time it receives a packet back, it sends two more. It either sends large packets and receives small ones in return, or it sends small packets and receives large ones. During the last five seconds of that 10-second period, it just measures the latency under load in comparison to the latency without load. It's essentially just a simple way to stress out the network.
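For readers who want to see the shape of that test, here is a minimal latency-under-load probe in the same spirit. It is an illustrative sketch only, not Netalyzr's actual code: the UDP echo host and port are placeholders, and a cooperating server that simply echoes datagrams back to the sender is assumed.

```python
import socket
import select
import struct
import time

# Illustrative sketch of a latency-under-load probe in the spirit of the test
# described above; this is not Netalyzr's actual implementation. ECHO_ADDR is
# a placeholder for a server that echoes UDP datagrams back to the sender.
# (The real test loads one direction at a time with asymmetric packet sizes;
# this sketch uses one size both ways to stay short.)
ECHO_ADDR = ("echo.example.net", 9)
PAYLOAD = 1400          # bytes per load packet
DURATION = 10.0         # seconds, matching the 10-second test window

def make_packet(size: int) -> bytes:
    """A datagram carrying its own send timestamp, padded to `size` bytes."""
    return struct.pack("!d", time.time()).ljust(size, b"x")

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(5.0)

# Quiescent RTT: one small packet on an otherwise idle link.
sock.sendto(make_packet(64), ECHO_ADDR)
reply, _ = sock.recvfrom(65535)
idle_rtt = time.time() - struct.unpack("!d", reply[:8])[0]

# Load phase: every packet that returns triggers two more, so the offered load
# keeps climbing until the bottleneck queue absorbs the excess.
loaded_rtts = []
deadline = time.time() + DURATION
sock.sendto(make_packet(PAYLOAD), ECHO_ADDR)
while time.time() < deadline:
    ready, _, _ = select.select([sock], [], [], 0.5)
    if not ready:
        continue
    reply, _ = sock.recvfrom(65535)
    rtt = time.time() - struct.unpack("!d", reply[:8])[0]
    if time.time() > deadline - DURATION / 2:    # sample only the last five seconds
        loaded_rtts.append(rtt)
    for _ in range(2):
        sock.sendto(make_packet(PAYLOAD), ECHO_ADDR)

if loaded_rtts:
    print(f"idle RTT {idle_rtt * 1000:.0f} ms, "
          f"RTT under load {max(loaded_rtts) * 1000:.0f} ms")
```

The number that matters is the gap between the two figures, which is the delay added by buffering along the path.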

We didn't get around to analyzing all that data until a few months after releasing the tool. Then what we saw were these very pretty graphs that gave us reasonable confidence that a huge fraction of the networks we had just tested could not possibly exhibit good behavior under load. That was a very scary discovery.

GETTYS: Horrifying, I think.

WEAVER: It wasn't quite so horrifying for me because I'd already effectively taken steps to mitigate the problem on my own network—namely, I'd paid for a higher class of service on my home network specifically to get better behavior under load. You can do that because the buffers are all sized in bytes. So if you pay for the 4x bandwidth service, your buffer will be 4x smaller in terms of delay, and that ends up acting as a boundary on how bad things can get under load. And I've taken steps to reduce other potential problems—by installing multiple access points in my home, for example.

GETTYS: The problem is that the next generation of equipment will come out with even larger buffers. That's part of why I was having trouble initially reproducing this problem with DOCSIS (Data over Cable Service Interface Specification) 3.0 modems. That is, because I had even more extreme buffering than I'd had before, it took even longer to fill up the buffer and get it to start misbehaving.

CERF: What I think you've just outlined is a measure of goodness that later proved to be exactly the wrong thing to do. At first, the equipment manufacturers believed that adding more buffers would be a good thing, primarily to handle increased traffic volumes and provide for fair access to capacity. Of course, it has also become increasingly difficult to buy a chip that doesn't have a lot of memory in it.

WEAVER: Also, to the degree that people have been testing at all, they've been testing for latency or bandwidth. The problem we're discussing is one of latency under load, so if you test only quiescent latency, you won't notice it; and if you test only bandwidth, you'll never notice it. Unless you're testing specifically for behavior under load, you won't even be aware this is happening.

VAN JACOBSON: I think there's a deeper problem. We know the cause of these big queues is data piling up wherever there's a fast-to-slow transition in the network. That generally happens either going from the Internet core out to a subscriber (as with YouTube videos) or from the subscriber back into the core, where a fast home network such as a 54-megabit wireless hits a slow 1- to 2-megabit Internet connection.

In Jim's case, a Linux machine defaulted to about a megabyte of data in transit, which amounts to several seconds' worth of delay over a 2-megabit line. That's the way Linux ships, and that's the way the YouTube servers come configured out of the box. Unless those things get reconfigured, they send a megabyte.
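The arithmetic behind "several seconds" is worth spelling out: the delay created by a standing queue is simply the amount of data in flight divided by the rate at which the bottleneck drains it.

```python
# Standing queue delay is just the data in flight divided by the drain rate,
# using the figures quoted above.
in_flight_bytes = 1 * 1024 * 1024      # the ~1 MB Linux default mentioned
link_bps = 2_000_000                   # the 2-megabit line

print(f"{in_flight_bytes * 8 / link_bps:.1f} seconds of queuing delay")   # about 4 seconds
```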
All that data flows through the network until it piles up at the subscriber link. If some manufacturer were to produce a home router with a very small buffer, data would get thrown away, and that manufacturer would very quickly get a reputation for making crummy routers. Then a competitor would be sure to say, "That router has a 90% loss rate, so you want my box instead since it's got enough memory to avoid losing anything at all." That's how we end up with marketplace pressure on manufacturers to put bigger and bigger buffers into their equipment.

GETTYS: This is not just a phenomenon at the broadband edge. In the course of pulling on a string to figure out why my home router was misbehaving so badly, I discovered that all our operating systems—Linux, Windows, and Macintosh alike—also are guilty of resorting to excessive buffering. This phenomenon pervades the whole path. You definitely can't fix it at any single location.

JACOBSON: Think of this as the network version of "Mutually Assured Destruction." Unless you can get everybody to cooperate on reducing the buffers for every piece of equipment, the right strategy for any provider is to increase its buffer. You can make your WiFi work better if you put more buffer into it. That's also the way to make your router work better and to make your end system work better.

GETTYS: I would argue that point. You can make things "better" in some very limited ways. But if all you're measuring is bandwidth, then you don't see all the havoc you're creating by contributing to the failure of even routine network services such as lookups of IP addresses. Those kinds of failures are now commonplace in part because these big buffers and the long delays they're causing are working at odds with the underlying protocols.

JACOBSON: You're preaching to the choir, Jim.

Excess buffering largely defeats TCP's congestion-avoidance mechanisms, which rely on a low level of timely packet drops to detect congestion. With buffering, once packets reach a chokepoint, they start to queue. If more packets come in than can be transmitted, the queue lengthens. The more packets in the queue, the higher the latency. Eventually packets are dropped, notifying the communications protocol of the congestion.

Bufferbloat allows these queues to grow too long before any packets are dropped. As a result, the buffers become flooded with packets and then take time to drain before they can allow in any additional packets. All the end user sees of this is slowed response. Services that require low latency—such as network gaming, VoIP, or chat programs—can slow to the point of becoming unusable. Also, because of the widespread nature of the problem, there are growing concerns that the stability of the Internet itself could be compromised.
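To put rough numbers on that dynamic (all values here are invented for illustration): when the offered load exceeds the bottleneck rate, the queue grows until the buffer overflows, and only that first drop delivers the congestion signal. A bigger buffer postpones the signal and adds standing delay in the meantime.

```python
# Invented numbers for the dynamic described above: offered load exceeds the
# bottleneck rate, so the queue grows until the buffer overflows; only that
# first drop delivers the congestion signal the transport is waiting for.
LINK_BPS = 2_000_000        # 2 Mbps bottleneck
OFFERED_BPS = 3_000_000     # senders pushing 3 Mbps at it

def seconds_until_first_drop(buffer_bytes: int) -> float:
    """Time for the excess traffic to fill the buffer and force a drop."""
    return buffer_bytes * 8 / (OFFERED_BPS - LINK_BPS)

for buf_kb in (64, 256, 1024):
    wait = seconds_until_first_drop(buf_kb * 1024)
    standing_delay = buf_kb * 1024 * 8 / LINK_BPS   # queue depth / drain rate at overflow
    print(f"{buf_kb:>5} KB buffer: first drop after ~{wait:.1f} s, "
          f"by which point the queue adds ~{standing_delay:.2f} s of latency")
```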

CERF: The key issue we've been talking about is that all this excessive buffering ends up breaking many of the timeout mechanisms built into our network protocols. That gets us to the question of just how bad the problem really is and how much worse it's likely to get as the speeds out at the edge of the Net continue to increase. I'd assume the problem is only going to get worse as the buffers start to fill up even faster.

WEAVER: No, actually, that will make things better. While it's true that the larger problem has to do with buffers that are too big, if you look at your own situation in terms of time rather than capacity and you can manage to double the performance on your link, then you ought to be able to cut your delay by half.

CERF: Good point. The issue here seems to revolve around how long it takes for the buffer to fill up. If it's on a slow line, it's going to take forever, and then the buffer is going to hold all that stuff and permanently impair the round-trip time.

GETTYS: The fundamental observation, however, is a bit subtler in that there actually is no single right size for a buffer. Think about wireless, for example, where you can easily have two or three orders of magnitude variance in bandwidth. We have broadband systems that can run as much as 100 megabits or more, but at the low end, they provision only 10 megabits or less. That's an order-of-magnitude swing right there. No single static size makes sense all the time.

WEAVER: What's even worse is that if you think about it in terms of time—such that you size the buffer for X milliseconds rather than X kilobytes (which you can do with just a little more logic)—that still doesn't entirely do the job. It gives you a 90% solution because that will still induce some latency under load. It will be bounded just to the point where it's about the maximum latency—or about the minimum latency under load for a simple queue if you're still looking to achieve full-TCP throughput.

JACOBSON: It won't even work with Jim's WiFi example. Move your WiFi receiver two inches and your throughput can go from 100 megabits down to a megabit, which means your queue delay will go up by a factor of 100. There's just no static buffer-management strategy—and no static buffer size—that will work in that situation.
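Here is a minimal sketch of the "size the buffer in time rather than bytes" idea Weaver describes, with the numbers chosen only for illustration: the byte cap has to be recomputed from the link rate, which is exactly what becomes impossible to pin down when the rate swings by two orders of magnitude, as in the WiFi example.

```python
# Sketch of sizing a buffer by time rather than bytes (illustrative numbers):
# the byte cap is derived from the link rate, so it has to be recomputed
# whenever that rate changes, which is the hard part on a link like WiFi.
TARGET_DELAY_S = 0.100    # keep no more than ~100 ms of standing queue

def buffer_limit_bytes(link_rate_bps: float) -> int:
    """Largest queue, in bytes, that still drains within the delay target."""
    return int(link_rate_bps * TARGET_DELAY_S / 8)

# 100 Mbps WiFi, the same radio after moving the receiver, and a slow uplink.
for rate_bps in (100e6, 1e6, 2e6):
    print(f"{rate_bps / 1e6:>5.0f} Mbps -> cap the queue at {buffer_limit_bytes(rate_bps):,} bytes")
```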

CERF: That suggests anything that's going to work will have to be dynamic and adaptable. But it also raises the odd problem that your device at the edge of the Net might end up communicating with devices at various locations in the network with different path dynamics. That gets to a question about how you go about observing flows and distinguishing between them. Is that something that would become necessary if we were to move at all toward dynamically adapting the buffer delay (if I might be permitted to use that term rather than buffer size) for all the various flows you might be trying to engage with at any given time?

JACOBSON: No. What you've actually got moving around on the Internet are packets, and it's hard to tell which packets constitute a "flow." What we do know, however, is that the data piles up wherever there's a fast-to-slow transition, and nowhere else. The router on the upstream end of that transition knows there's a big honking queue piling up. If you can somehow mitigate the queue at the point where you can actually see it forming, or at least signal the endpoints about the queue they're creating and tell them they need to deal with that, you don't need to do any sort of complicated flow analysis.

CERF: Could this queue show up anywhere on the Net, and not just at the edges?

JACOBSON: Theoretically, yes. But I think that theoretical conclusion has impeded a lot of work in this area because it leads people to think about the problem as potentially being anywhere. Yet the economics of the Internet tends to ensure a very high-bandwidth core where we aggregate traffic from a lot of low-bandwidth links. Remember also that the individual subscribers from whom you're aggregating aren't correlated, so as you add together their different traffic streams, the traffic tends to get smoother. This is the point Nick McKeown made a few years ago when he was talking about excessive buffering in the core. There's really no need for humongous buffers there since you're dealing with these huge aggregates, resulting in traffic that's pretty smooth.

WEAVER: The other issue regarding the Internet core is that the buffer gets to be really expensive once you start talking about 10-gigabit-plus links. Interestingly, if you talk to the Internet2 folks, their complaints about core routers have to do with there not being enough buffering, since their test involves single-stream TCP or few-stream TCP throughput on 10-gigabit links. For that, you actually do need a good amount of buffering, but that's just not economically feasible right now for a lot of those core routers.

JACOBSON: But that is something else that has impeded progress here—all that R&D aimed at maximizing the bandwidth between supercomputer centers. Remember that most of the network research in the U.S. has been funded so far by the NSF (National Science Foundation) with the goal of demonstrating very high bandwidth between academic institutions. This is great for the .01% of the Internet community that has those links available to them, but it doesn't do much good for the billion other people in the world who have 2-megabit or less.

We've put a lot of effort into protocol, router, and algorithm development that makes it possible for a single TCP to saturate a 40-gigabit link, but we haven't put anything even remotely like that effort into producing usable cellphone data links, DSL links, or home-cable links. That's where the really hard problem is because it requires that you start to think about how the buffering actually works and how it turns into latency.

CERF: We actually are circling around the fact that this is a hard problem because of the environment. We should talk about whether we have a crisis on our hands right now and, if so, what's to be done about it.

GETTYS: We don't even know whether the current state of things is going to remain stable. I get very nervous when I consider the spasmodic behavior of just a single TCP flow, with all these wild excursions going on much of the time. It certainly seems to me that our current situation is not at all static.

We're seeing many new applications—such as streaming video and off-site system backup, for example—that are far more likely to saturate the links. All these applications are starting to become much more routine at the same time that Windows XP is retiring. That's actually a significant milestone because Windows XP didn't implement window scaling, so it tended not to saturate links in quite the same way as anything released more recently. Maybe I'm being paranoid, but I really don't like what I'm seeing right now.

WEAVER: Does any major TCP stack exist now that doesn't do proper window scaling and so doesn't saturate everything up to a near-gigabit link?

GETTYS: I think they all do window scaling now—at least anything more recent than Windows XP. Windows, Mac OS X, and Linux are all very happy to run up to multigigabit speeds.

JACOBSON: But the packet output isn't driven by window scaling. Window scaling just gives you room to expand the window, but it expands only if the system defaults to a large window. For years, the default window was sized to fit the roughly 100-millisecond/1-Mbps TCP paths you typically found back then. At some point over the past five or six years, people have decided that the common path ought to be considered a gigabit or more. As a consequence, the buffer size has been bumped up to more than a megabyte.

WEAVER: Unfortunately, that's almost exactly the case. If you look at my house, in fact, going from my computer to the file server, I've got a gig link at 10-plus milliseconds.

JACOBSON: Have you done the experiment of varying the window size and then looking at throughput to your file server as a function of the window size to see whether or not you actually need the megabyte?

WEAVER: No, I have not.

JACOBSON: Well, I have, and the bandwidth flattops at an eight-packet window. I'm talking about a Linux file server running a very current kernel. Typically, the situation is that you're in a home environment with large bandwidth and very short delays, or you're going out over the Internet where you're going to encounter small bandwidth and much longer delays—often 10 times or more longer than what you have at home. You could pick a window size that was appropriate for the Internet—something like 100 kilobytes, which is 10 megabits into 100 milliseconds—and that would work more than adequately for your home environment as well.

But that just isn't what we see anymore. Instead, we ship several megabytes, and that's when you hear the complaint: "Oh my God, I'm seeing a second of delay." Well, yes, of course. That's how your system is configured.
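Jacobson's 100-kilobyte figure is simply the bandwidth-delay product of the path he describes; a window much beyond that stops adding throughput and only adds queue. A quick check of the arithmetic:

```python
# Bandwidth-delay product: the window needed to keep a path busy. Anything
# much larger stops adding throughput and only sits in a queue somewhere.
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    return rate_bps * rtt_s / 8

print(f"{bdp_bytes(1e6, 0.100) / 1e3:.1f} KB")   # ~12.5 KB: the old 1 Mbps / 100 ms default era
print(f"{bdp_bytes(10e6, 0.100) / 1e3:.0f} KB")  # ~125 KB: the "something like 100 kilobytes" above
```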

GETTYS: What's more, you find that sort of thing in several different places. The operating system itself may actually have problems. If you look at the device drivers, you'll find they've also grown some very large buffers of their own. Basically, whenever you look inside any of our current operating systems, you'll find a recurring theme of bufferbloat at multiple layers.

That's actually part of the problem: people are thinking of this as a system of layers, where they don't have to worry about how everything actually works, either above them or below them. They aren't really thinking through all the consequences of what they're doing—and it shows!

If we do indeed face an impending crisis, what's to be done? Bufferbloat presents a hard problem, with no single right solution. And making headway to mitigate the current situation will require a multipronged effort that involves ISPs, operating system implementers, application vendors, and equipment manufacturers.


Since network hardware is clearly a primary culprit, it would seem that reducing buffer size is all that ought to really be necessary. But buffer size cannot be configured on most routers and switches. Nor can buffer size be static. It must be sized dynamically, meaning that both hardware and software modifications will be required.

There also are some new queue-management algorithms that are currently being studied. Although the classic RED queue-management approach serves as a starting point for some of this research, it is not itself equal to the present challenge. Still, efforts to create an improved version of RED are already under way.

CERF: It certainly sounds as though equipment with smaller buffers would help keep a lot of the worst cases from occurring simply because there wouldn't be enough buffer space to create as much of a problem. The other possibility, I guess, would be to provide a way to discard backed-up packets and send a summary report back to the source of congestion whenever you see queues building up. But that gets to this awkward problem of not knowing what the source of the congestion might be unless you're monitoring source/destination pair traffic or something along those lines. Given that we now seem to know something about the nature of the problem, can you speculate a little on what, if anything, might be done to attack it?

JACOBSON: The approach you just described was the basic idea behind RED: whenever you've got a big queue, you should notify one or more of the endpoints that there's a problem so they will reduce the window, which in turn will reduce the queue. That provides an alternative to tracking flows, which at best is a very state-intensive problem. And with IPv6, it can turn into an impossible problem because someone who doesn't want to be policed will just spread traffic over a zillion addresses, and there's no way to detect whether those are separate flows or all coming from a single user.

Instead of trying to infer the flows, you might just pick a uniformly distributed random number and, when the packet with that number comes up, mark or drop it. So if, say, 90% of the packets come from one source, you would have a 90% probability that the dropped packet is from that source.
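A toy sketch of that flow-blind approach (illustrative only; this is not the full RED algorithm): the victim is chosen uniformly at random from the packets already queued, so a sender gets marked or dropped in proportion to its share of the queue without any per-flow state.

```python
import random

# Flow-blind marking, as a toy: pick a packet position in the queue uniformly
# at random and drop (or ECN-mark) that packet. A sender holding 90% of the
# queue gets the congestion signal about 90% of the time; no per-flow state.
def drop_random_packet(queue: list):
    return queue.pop(random.randrange(len(queue)))

queue = ["heavy"] * 90 + ["light"] * 10   # one aggressive sender, one quiet one
random.shuffle(queue)
victims = [drop_random_packet(queue) for _ in range(20)]
print(victims.count("heavy"), "of 20 drops hit the heavy sender")   # ~18 on average
```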

WEAVER: There's also a cool little idea out of Case Western Reserve University for handling queue management remotely. Basically, their observation is that if you are on the path that all the traffic passes through—for example, the NAT (network address translation) that's separate from the cable modem—you can track delay increase. Once the delay increase gets above a threshold of, say, 100 milliseconds, you can start using the RED approach for queue management or whatever it is you need to do to get the traffic to back off. The problem with this, however, is that it provides only an 80% solution for the telephony crowd.

JACOBSON: If I understand what is being proposed there, you're putting in an appliance to measure delay, but if you're looking at aggregated traffic, you don't know how it de-aggregates.

WEAVER: That's partially why this proposal focused on the home gateway. Basically, there aren't all that many large flows going through your NAT box at any point in time.

JACOBSON: But if you're on the gateway, you've already got the queue, so you don't need any state at all. The queue is its own state.
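Here is a rough sketch of the delay-threshold trigger Weaver describes, as one plausible reading of it (the 100-millisecond figure comes from the conversation; everything else is assumed): the gateway tracks the lowest delay it has seen as a baseline and engages its back-off mechanism once measured delay exceeds that baseline by the threshold.

```python
# Sketch of a delay-increase trigger (everything beyond the 100 ms threshold
# mentioned above is assumed): track the lowest delay seen as a baseline and
# start signaling congestion once samples exceed it by the threshold.
DELAY_INCREASE_THRESHOLD_S = 0.100

class DelayTrigger:
    def __init__(self):
        self.baseline = None   # lowest delay observed so far

    def sample(self, delay_s: float) -> bool:
        """Feed one measurement; True means traffic should be made to back off."""
        if self.baseline is None or delay_s < self.baseline:
            self.baseline = delay_s
        return delay_s - self.baseline > DELAY_INCREASE_THRESHOLD_S

trigger = DelayTrigger()
for delay in (0.020, 0.025, 0.060, 0.140, 0.180):   # seconds, rising as a queue builds
    if trigger.sample(delay):
        print(f"{delay * 1000:.0f} ms is >100 ms above baseline: start marking or dropping")
```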
GETTYS: One of the reasons queue management is even being considered for home networks is because so many of the broadband networks don't provide any. I recently read a paper that indicated as much as 30% of residential broadband networks run without any queue management whatsoever. That leaves it to the home routers to handle the problem. Then we've got the additional complication of the huge dynamic range presented by wireless. Van has told me that classic RED cannot be used to address that particular problem.

CERF: I'm trying to think about how we might start moving toward a solution. It seems there are at least three different pieces to consider. First, there's the party who's suffering the effects of bufferbloat in, say, a residential setting. Then there's the service provider that, I believe, really wants to provide a better quality of service for its customers. Finally, there are all those equipment and software vendors that might be persuaded to address the problem if they thought it would make the ISPs and end users happier.

The question is: under what conditions might we get all those parties to cooperate? Or is this going to remain largely a research problem?

WEAVER: Well, there have been a couple of cases where the application vendors concluded they were causing too much damage and therefore started making changes. BitTorrent is the classic example. It is shifting to delay-based congestion control specifically to: (a) be friendlier to TCP because most of the data carried by BitTorrent really is lower-priority stuff; and (b) mitigate the "you can't run BitTorrent and Warcraft at the same time" problem. So, there's some hope.

GETTYS: Right. There are also a number of queue-management algorithms other than classic RED that we're ready to start experimenting with. Van is working with Kathie Nichols [founder and CEO of Pollere Inc., specializing in network architecture and performance] on something they're calling RED Lite, and we hope to start playing around with that soon. There's also a queue-management algorithm called SFB (Stochastic Fair Blue) that's been implemented in Linux and we're now getting ready to try it out.

A number of us are trying now to set up OpenWrt (a Linux-based program for embedded devices) so we can build, at least as a proof of principle, a home router that's well behaved and might even function reasonably well.

JACOBSON: There's also a separate piece that's important to understand. Because there is no static answer to this problem, there's no answer that's right for all environments, and any answer needs to evolve over time. People need some no-brainer, easy-to-use measurement tools that say, "Your home router's delay is getting really bad."

For example, one group that's massively affected by this bufferbloat problem is YouTube. It's in a position to observe the congestion at the upstream end of the residential links. It would be really simple to add instrumentation to their servers to measure the buffering delay on each video sent out and then associate that with the destination IP address. That could be viewed at the individual level (via the player), the prefix level, or the network level to provide some understanding of where the delays are occurring.
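As a purely illustrative sketch of the kind of server-side roll-up Jacobson is suggesting (the addresses and delays below are invented), per-video buffering-delay samples keyed by destination address can be aggregated by network prefix to show where the delay is concentrated.

```python
import ipaddress
from collections import defaultdict
from statistics import median

# Invented per-video samples: (destination address, measured buffering delay in seconds).
samples = [
    ("203.0.113.17", 1.9), ("203.0.113.44", 2.3), ("203.0.113.80", 2.1),
    ("198.51.100.5", 0.06), ("198.51.100.90", 0.09),
]

# Roll the samples up by /24 prefix; the same idea works per player or per network.
by_prefix = defaultdict(list)
for addr, delay_s in samples:
    by_prefix[ipaddress.ip_network(f"{addr}/24", strict=False)].append(delay_s)

for prefix, delays in sorted(by_prefix.items(), key=lambda item: -median(item[1])):
    print(f"{prefix}: median buffering delay {median(delays) * 1000:.0f} ms "
          f"across {len(delays)} videos")
```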

GETTYS: The only place where the problem really matters is where you hit that point of bandwidth transition. Anywhere else, the excessive buffering might not matter at all, apart from being a waste of memory. But out at the edge of the network, we're certainly getting hurt pretty badly right now. That applies to the operating systems, the home routers, the broadband gear you find in people's homes, and the broadband gear that's installed at the ISP end. So, among those four places, we certainly ought to focus on producing some simple tools that can help people identify where their problems are coming from.

JACOBSON: I have four family members in my household. It used to be that none of us watched video over the Net, but now everybody watches video or plays games over the Net, or tries to do both at the same time. It only takes one person watching YouTube while somebody else is watching Netflix to completely saturate the Internet-to-home link.

That's not the fault of YouTube or Netflix or anybody in the household. It's just the way people are using the Internet now. This is a problem we need to fix so that people can continue using the Internet the ways they've grown accustomed to.

The bufferbloat problem is likely to worsen as the demand grows for Internet-intensive activities such as streaming video and remote data backup and storage, with the online user experience bound to deteriorate as a consequence. The problem is most visible at the edge of the network, but evidence of it can also be found in the Internet core, in corporate networks, and in the boundaries between ISPs. Besides calling for simple tools that people can use to find and measure bufferbloat, active queue management for all devices is also in order. The best algorithm for the job remains to be determined, however.

Related articles on queue.acm.org

Bufferbloat: Dark Buffers in the Internet
Jim Gettys and Kathleen Nichols
http://queue.acm.org/detail.cfm?id=2071893

A Conversation with Van Jacobson
http://queue.acm.org/detail.cfm?id=1508215

You Don't Know Jack about Network Performance
Kevin Fall and Steve McCanne
http://queue.acm.org/detail.cfm?id=1066069

© 2012 ACM 0001-0782/12/02 $10.00