What Lies Beneath: Understanding Internet Congestion

Leiwen Deng and Aleksandar Kuzmanovic
Northwestern University
(l-deng,akuzma)@northwestern.edu

Bruce Davie
Cisco Systems
[email protected]

ABSTRACT

Developing measurement tools that can concurrently monitor congested Internet links at a large scale would significantly help us understand how the Internet operates. While congestion at the Internet edge typically arises due to bottlenecks at a connection's last mile, congestion in the core can be more complex. This is because it may depend upon internal network policies, and it can hence reveal systematic problems such as routing pathologies, poorly-engineered network policies, or non-cooperative inter-AS relationships. Therefore, enabling tools to provide deeper insights about congestion in the core is certainly beneficial.

In this paper, we present the design and implementation of a large-scale triggered monitoring system that focuses on monitoring a subset of Internet core links that exhibit relatively strong and persistent congestion, i.e., hot spots. The system exploits triggered mechanisms to address scalability, and it automates the selection of good vantage points to counter the common measurement experience that the much more congested Internet edges often overshadow the observation of congestion in the core. Using the system, we characterize the properties of concurrently monitored hot spots. Contrary to common belief, we find that congestion events at these hot spots can be highly correlated, and that such correlated congestion events can span up to three neighboring ASes. We provide a root-cause analysis to explain this phenomenon and discuss the implications of our findings.

1. INTRODUCTION

The common wisdom is that very little to no congestion occurs in the Internet core (e.g., at Tier-1 or Tier-2 providers). Given that ISPs aggressively overprovision the capacity of their pipes, and that end-to-end data transfers are typically constrained at Internet edges, the "lightly-congested core" hypothesis makes a lot of sense. Indeed, measurements conducted within Tier-1 providers such as Sprint report almost no packet losses [27]; likewise, ISPs like Verio advertise SLAs that guarantee negligible packet-loss rates [9]. As a result, researchers have focused on characterizing congestion at access networks such as DSL and cable [24].

While numerous other measurement studies do confirm that the network edge is more congested than the core (e.g., [23]), understanding the properties of congestion events residing in the Internet core is meaningful for (at least) the following two reasons.

First, despite negligible packet losses in the core, queuing delay, which appears whenever the arrival rate at a router is larger than its service rate, may be non-negligible. Variable queuing delay leads to jitter, which can hurt the performance of delay-based congestion control algorithms (e.g., [15,18]) or of real-time applications such as VoIP. And whereas it is difficult to route around congestion in access-network congestion scenarios [24] unless a client is multihomed [13], congested links in the core can be effectively avoided in many cases [12,14]. Hence, identifying such congested locations¹, and characterizing their properties in space (how "big" they are) and time (the time scales at which they occur and repeat), is valuable for latency-sensitive applications such as VoIP [8].

Second, the Internet is a complex system composed of thousands of independently administered ASes. Events happening at one point in the network can propagate over inter-AS borders and have repercussions on other parts of the network (e.g., [37]). Consequently, measurements from each independent AS (e.g., [27]) are inherently limited: even when a congestion location is identified, establishing potential dependencies among events happening at different ASes, or revealing the underlying mechanisms responsible for propagating such events, is simply infeasible. To achieve such goals, it is essential to have global views, i.e., to concurrently monitor Internet congestion locations at a large scale across the Internet.

This paper makes two primary contributions. First, we present the design of a large-scale triggered monitoring system, the goal of which is to quantify, locate, track, correlate, and analyze congestion events happening in the Internet core. The system focuses on a subset of core links that exhibit relatively strong and persistent congestion, i.e., hot spots. Second, we use a PlanetLab [6] implementation of our system to provide, to the best of our knowledge, the first-of-its-kind measurement study that characterizes the properties of concurrently monitored hot spots across the world.

At a high level, our methodology is as follows. Using about 200 PlanetLab vantage points, we initially "jumpstart" measurements by collecting the underlying IP-level topology. Once sufficient topology information is revealed, we start light end-to-end probing from the vantage points. Whenever a path experiences excessive queuing delay over longer time scales, we accurately locate the congested link and designate it a hot spot. (Exactly what we consider "excessive" queuing and a "longer" time scale is defined in Section 2.)

¹ We define a congested location more precisely in Section 2.

In addition, we trigger large-scale coordinated measurements to explore the entire area around that hot spot, both topologically close to it and distant from it. Our system covers close to 30,000 Internet core links using over 35,000 distinct end-to-end paths. It is capable of monitoring up to 8,000 links concurrently, 350 of which can be inspected in depth using a tool we recently developed [23].

Designing a system capable of performing coordinated measurements at such a large scale is itself a challenging systems engineering problem. Some of the questions we must address are as follows. How to effectively collect the underlying topology and track routing changes? How to handle clock skew, routing alterations, and other anomalies? How to effectively balance the measurement load across vantage points? How to minimize the amount of measurement resources while still covering a large area? How to select vantage points that can achieve high measurement quality with high probability? In this paper, we provide a "from scratch" system design and answer all of these questions.

Our system has important implications for emerging protocols and systems, and these open avenues for future research. For instance, given the very light overhead on each node in our system (30 kbps on average), it could easily be ported into emerging VoIP systems to help avoid hot spots in the network. Further, our monitoring system should be viewed as a rudiment of a global Internet "debugging" system that we plan to develop. Such a system would not only track "anomalies" (i.e., hot spots or outages), but would also automate root-cause analysis (e.g., automatically detect whether a new policy implemented by an AS causes "trouble" for its neighbors). Finally, others will be able not just to leverage our data sets, but to actively use our system to gather their own data and perform independent analyses.

Our key findings are as follows. (i) Congestion events on some core links can be highly correlated. Such correlation can span up to three neighboring ASes. (ii) There are a small number of hot spots between or within large backbone networks that exhibit highly intensive, time-independent congestion. (iii) Both of these phenomena are closely related to the AS-level traffic aggregation effect, i.e., upstream traffic converging onto an aggregation point that is thin relative to its upstream traffic volume.

The implications of our findings are the following. First, the common wisdom is that the diversity of traffic and links makes large and long-lasting spatial link congestion dependence unlikely in real networks such as the Internet [20]. In this paper, we show that this is not the case: not only can the correlation between congestion events be substantial, it can also cover a large area. Still, most (if not all) network tomography models assume link congestion independence, e.g., [19,20,25,32,36,43]. This is because deriving these models without such an assumption is hard or often infeasible. Our research suggests that each such model should be carefully evaluated to understand how link-level correlations affect its accuracy. Finally, our results about the typical correlation "spreading" provide guidelines for overlay re-routing systems. To avoid an entire hot spot (not just a part of it), it makes sense to choose overlay paths that are at least 3 ASes disjoint from the one experiencing congestion, if such paths are available.

2. PRELIMINARIES

2.1 Congestion Events

Congestion has several manifestations. The most severe one induces packet losses. We detect congestion in its early stage, as indicated by increased queuing delays at links. From a microscopic view, a continuous queue building-up and draining period typically lasts from several ms to hundreds of ms [42]. On time scales of seconds to minutes, queue build-up can happen repeatedly, as shown in Figure 1. In this paper, we focus on measuring a subset of congested links in the core that exhibit repetitive congestion, as we define formally below.

[Figure 1: Congestion events on a link monitored by Pong. The Y axis shows congestion intensity, updated every 30 seconds; the X axis shows time (hour:minute) from 11:00 to 12:30.]

Definition of a congestion event. We define a congestion event over each 30-second-long epoch. If we measure high queuing delays (Section 2.2.3) on a link during a 30-second-long epoch, we regard it as a congestion event on this link. Consequently, such a link becomes a point of congestion. Figure 1 exemplifies congestion events on a link detected by our measurement tool (Section 2.2).

Congestion intensity. When we detect a congestion event, we quantify its congestion intensity as the percentage of probing samples that observe high queuing delays on the link during the 30-second-long epoch. Intuitively, the congestion intensity captures how frequently congestion occurs during a 30-second-long epoch. As an example, Figure 1 shows several congestion events on a link over time. The congestion event that appears at 11:30 has the highest intensity (22%) in this time interval. The congestion event that starts at around 11:50 has a smaller intensity; yet, it lasts longer, for approximately half an hour.

Hot spot. In this work, we exclusively focus on longer-lasting congestion events that show non-negligible congestion intensities. More precisely, we only consider events that have a congestion intensity larger than 5% and last longer than one minute. The corresponding link (the point of congestion) is then designated a hot spot.
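To make these definitions concrete, here is a small sketch (our illustration, not the authors' code): it derives congestion events, intensities, and a hot-spot check from timestamped queuing-delay samples on one link, with the delay threshold standing in for the per-path thresholds of Section 2.2.3.

```python
# Illustrative sketch of the Section 2.1 definitions (not Pong/T-Pong code).
from collections import defaultdict

EPOCH = 30.0           # seconds per epoch
MIN_INTENSITY = 0.05   # hot spots: intensity > 5% ...
MIN_EPOCHS = 3         # ... lasting > 1 minute, i.e., more than two 30 s epochs

def congestion_events(samples, high_delay_threshold):
    """samples: (timestamp_sec, queuing_delay_sec) pairs for one link.
    Returns {epoch_index: intensity}: the intensity of an epoch is the
    fraction of its probing samples that observe high queuing delay."""
    per_epoch = defaultdict(list)
    for ts, qdelay in samples:
        per_epoch[int(ts // EPOCH)].append(qdelay)
    return {epoch: sum(d > high_delay_threshold for d in delays) / len(delays)
            for epoch, delays in per_epoch.items()
            if any(d > high_delay_threshold for d in delays)}

def is_hot_spot(events):
    """One reading of the hot-spot rule: intensity above 5% in at least
    three consecutive epochs (i.e., strictly longer than one minute)."""
    strong = sorted(e for e, i in events.items() if i > MIN_INTENSITY)
    run = best = 1 if strong else 0
    for a, b in zip(strong, strong[1:]):
        run = run + 1 if b - a == 1 else 1
        best = max(best, run)
    return best >= MIN_EPOCHS
```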

2.2 Pong Background

Here, we introduce Pong, a measurement tool we recently developed [23], which is one of the main building blocks of our triggered congestion monitoring system. Below, we provide the necessary background and emphasize those elements of Pong (e.g., the link measurability score) that are essential for the vantage-point selection algorithms that we explain later in the text.

2.2.1 Pong Overview

Pong is a light-weight monitoring tool with a per-path overhead of less than 18 kbps. It can accurately locate points of congestion on a path with high resolution, at the granularity of a single link² in most cases, as we demonstrated via Emulab [3] and Internet experiments in [23]. It exploits active probing from both endpoints of the path and integrates the end-to-end probing with probing to intermediate routers in a novel way.

The key step in Pong's measurement methodology is to repeatedly divide a given unidirectional path into two halves, and to determine whether either half is congested (i.e., experiencing high queuing delay). That is, Pong seeks to determine whether the unidirectional path from the source to a given router is congested, and whether the unidirectional path from the router to the destination is congested. The challenges are as follows: (i) How to decouple congestion on the forward path from congestion on the backward path? (ii) How to set proper thresholds for queuing delays above which we consider a path segment congested? (iii) How to handle interference from clock skews, route alterations, asymmetric routing, ICMP processing delays at routers, timing issues at PlanetLab nodes, and other unknown anomalies? We answer these three questions in Sections 2.2.2, 2.2.3, and 2.2.5, respectively.

Once we determine whether either half-path of each router along a path is congested, we correlate the results of neighboring routers to identify congestion on the link in between. We exploit the repetitive congestion properties of the concerned links to improve the identification accuracy.

2.2.2 Coordinated Probing

Our active probing methodology integrates end-to-end probing with probing to intermediate routers, as shown in Figure 2. For each intermediate router, we send up to four probing packets: (i) two end-to-end probes, one from the source (f probe) and one from the destination (b probe); and (ii) two router-targeted probes, one from the source (s probe) and one from the destination (d probe). We coordinate the sending times of the four probes using negative feedback methods such that the probes travel shared path segments nearly simultaneously. Each probe measures the queuing delay³ of the path it travels and infers the congestion status of that path.

[Figure 2: Example of coordinated probing. The source S and the destination D probe around an intermediate router, correlating the probe to the upper router with the probe to the lower router. In this scenario, the s probe and the b probe both capture congestion on link 1, while the f probe and the d probe both capture congestion on link 2; this complementary pattern makes the 4-p probing approach applicable.]

We have four different probing techniques, which we call 4-p probing, fsd probing, fsb probing, and 2-p probing, respectively. Figure 2 shows an example of a 4-p probing scenario. The 4-p probing and fsd probing (involving the f, s, and d probes) approaches give the highest measurement accuracy; fsb probing (involving the f, s, and b probes) also gives high accuracy, while 2-p probing (involving the f and s probes) gives relatively low accuracy.

Figure 3 shows an experimental example on the Emulab testbed, in which we monitor congestion on a path consisting of 11 links using Pong. Initially (the left plot), we create congestion on links 1, 6, and 9 in the forward direction. Then, in case 1 (the middle plot), we add congestion on links 2 and 5 in the backward direction; and in case 2 (the right plot), we add congestion on links 5 and 11 in the forward direction. Congestion on all links is generated in a periodic on-and-off (TCP cross traffic) pattern, but with a different period for each link. Thus, we emulate concurrent but independent congestion on multiple links of a path. All three plots are congestion intensity snapshots over a 30-second-long time interval. As we can see, Pong monitors multiple points of congestion on the unidirectional path with high accuracy: it successfully denoises the interference from the two backward congested links in case 1 and detects all five forward congested links in case 2.

[Figure 3: An Emulab experiment example. Three congestion-intensity snapshots across the 11 links of the path: the initial setting, case 1 (after adding backward congestion on links 2 and 5), and case 2 (after adding forward congestion on links 5 and 11).]

² Here, "link" means an IP-level link. In addition, if some routers on the path do not respond to TTL-limited probes, a "link" is interpreted as the path segment between the two closest routers that do respond to TTL-limited probes.

³ We use a standard method to estimate queuing delays, i.e., subtracting the minimum value among a set of samples from each sample.
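Footnote 3's minimum-filter estimate is simple enough to state in code. A minimal sketch of that standard step, not an excerpt from Pong:

```python
def queuing_delays(raw_delays):
    """Estimate queuing delay per sample by subtracting the minimum of
    the sample set (footnote 3): the minimum approximates the fixed
    propagation and transmission component, so what remains above it is
    attributed to queuing."""
    base = min(raw_delays)
    return [d - base for d in raw_delays]

# Example with round-trip delays in ms: the 20.0 ms floor is removed.
print(queuing_delays([20.0, 23.5, 20.2, 31.7]))  # [0.0, 3.5, 0.2, 11.7]
```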

2.2.3 Setting Queuing Delay Thresholds

Setting the threshold at which we declare queuing delay to be "high" is important for measurement accuracy, but it is a non-trivial task for several reasons. First, queuing delay scales vary depending on link capacities and, to a lesser extent, buffer sizes. Second, queue build-up can take place on multiple links along an end-to-end path concurrently. The queuing-delay distribution of a path can therefore be very complex⁴. Some paths are more "clean", i.e., it is easy to decouple the signal (congestion) from the noise (no congestion) subspaces by setting the threshold to a moderate level, e.g., several hundred microseconds. On other paths, the end-to-end queuing delay distribution may be noisier, and the threshold must therefore be set to a higher value, e.g., several milliseconds. The higher the threshold, the smaller the probability of detecting moderate queuing, and hence the lower the threshold quality (which is taken into account when we quantify measurement quality, Section 2.2.6).

We set a threshold based on the distribution of queuing delay samples. Since we are measuring repetitive congestion, the queuing delay values for congested moments show clustering effects in the distribution, as we demonstrate below.

We divide the queuing delay range of 0.15 ms ∼ 302 ms into 30 bins, with the bin sizes being logarithmically uniform (except the first and the last bins).⁵ Using these bin settings, we categorize the queuing delay distributions we observed via a large number of Internet experiments into several patterns. For each pattern, we design a suitable threshold-setting algorithm. Figure 4 shows two examples of a dominant pattern we observed. One example is the end-to-end queuing delay distribution of the 11-link path in the aforementioned Emulab experiment; the other is that of an Internet path which also consists of 11 links. As we can see, the distribution on the Emulab path shows a clear two-mode pattern: one mode corresponds to samples observing no congestion, the other to samples observing congestion. Necessarily, the distribution on the Internet path is more blurred, but we can still observe two relatively strong clusters. In both scenarios, we set the threshold to a proper value between the two clusters.

[Figure 4: Example queuing-delay distributions (percent of samples per bin number): (a) an Emulab path and (b) an Internet path. The chosen thresholds (1.59 ms and 2.68 ms) fall between the two clusters.]

⁴ In particular, due to the heterogeneity of the links that constitute a path.

⁵ We set the bin sizes logarithmically uniform because the larger the average queuing delay, the larger the queuing delay deviation becomes.
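The binning itself is easy to reproduce. The sketch below is our illustration of the 30 log-uniform bins over 0.15 ms to 302 ms, with a toy valley-between-the-two-largest-modes rule standing in for the paper's per-pattern threshold algorithms, which are not specified in detail here:

```python
import math

LOW_MS, HIGH_MS, NBINS = 0.15, 302.0, 30

def bin_edges():
    """31 edges for 30 bins over 0.15-302 ms, logarithmically uniform
    (the paper treats the first and last bins specially; we do not)."""
    step = math.log(HIGH_MS / LOW_MS) / NBINS
    return [LOW_MS * math.exp(i * step) for i in range(NBINS + 1)]

def histogram(qdelays_ms):
    edges, counts = bin_edges(), [0] * NBINS
    for d in qdelays_ms:
        for i in range(NBINS):
            if edges[i] <= d < edges[i + 1]:
                counts[i] += 1
                break
    return counts

def threshold_between_clusters(qdelays_ms):
    """Toy stand-in for the per-pattern threshold algorithms: put the
    threshold at the emptiest bin strictly between the two most
    populated bins (the two clusters of Figure 4)."""
    counts, edges = histogram(qdelays_ms), bin_edges()
    a, b = sorted(sorted(range(NBINS), key=counts.__getitem__)[-2:])
    if b - a < 2:
        return None  # modes are adjacent: no valley to place a threshold in
    valley = min(range(a + 1, b), key=counts.__getitem__)
    return edges[valley]
```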
2.2.4 Quality of Measurability (QoM)

Pong has a novel ability: it can accurately detect moments of its own inaccuracy by exploiting a measure called quality of measurability (QoM). This is particularly meaningful for our PlanetLab-based experiments, which are subject to timing challenges caused by issues such as CPU bursts or bandwidth rate limiting. Because we use coordinated probing from two endpoints, we are capable of checking whether the observations from these two endpoints are consistent. For example, if both endpoints measure increased queuing delay on two specific round-trip paths that share the same congested link, the observation is consistent, and hence the QoM is high. In scenarios where the two endpoints do not have a consistent observation (which can happen for a number of reasons, e.g., ICMP queuing, clock skew, and the aforementioned timing issues at PlanetLab nodes), we are capable of detecting all such moments and assigning them a low QoM. A low QoM does not mean that a given congestion event did not happen, but simply that we were unable to accurately measure that event from both endpoints.

2.2.5 Handling Anomalies

Pong can effectively detect clock skew events between endpoints and route alterations on a path. It suppresses the generation of measurement results once it detects such anomalies. Other anomalies, such as the queuing of ICMP messages at routers, CPU bursts at PlanetLab nodes, and even some unknown anomalies (that affect measured queuing delays), can easily be detected via instantaneous QoM values. Once such anomalies happen, the QoM shows a very small instantaneous value. We use the instantaneous QoM as a weight when updating final results, thereby minimizing the measurement errors caused by anomalies.

2.2.6 Link Measurability Score (LMS)

Although Pong makes a best effort to measure the congestion location on a path, its measurement accuracy can differ considerably across paths due to the underlying path conditions. However, a link, especially a link in the core, can be observed by many paths, which can give very different measurement accuracies for congestion on that link. We can therefore optimize measurement performance by selecting the paths that give the best measurement accuracy. To quantify the measurement accuracy, we introduce a measure called the link measurability score (LMS). The LMS indicates how well we are able to measure a specific link from a specific path. Different paths can have very different LMSes for the same link. As we will describe in Section 3.5.2, our triggered monitoring system always selects the path that gives the best LMS when measuring a specific link.

An LMS is computed from the following three components: (i) Node score, which takes into account the corresponding QoM achieved. For each link, the node scores of both nodes of the link are considered. This measure dominantly affects the LMS. (ii) Queuing threshold score, which takes into account the queuing delay threshold quality (Section 2.2.3) for a given path. Queuing delay threshold settings are more accurate for some queuing delay distribution patterns than for others, leading to different queuing threshold scores. (iii) Observability score. Our measurement experience shows that congestion observed on a less frequently congested link can be blurred by a much more frequently congested link on the same path. We therefore assign a small observability score to such less frequently congested links when we have recently detected a much more frequently congested link on the same path.

The value of the LMS ranges from 0 to 6 in our implementation. We describe the implications of the major LMS levels in Section 4.2.2.
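The paper gives the three LMS components and the 0 to 6 range, but not the combining formula; the sketch below is therefore only a hypothetical shape, with invented weights, to show the kind of computation involved:

```python
def link_measurability_score(qom_node_a, qom_node_b,
                             threshold_score, observability_score):
    """Hypothetical LMS combination (illustrative only; not the paper's
    formula). All inputs are taken as values in [0, 1]. The node score
    uses the QoM of both nodes of the link and dominates the result, per
    Section 2.2.6; the weights 0.6/0.25/0.15 are invented for this sketch."""
    node_score = min(qom_node_a, qom_node_b)  # both endpoints must measure well
    score = 6.0 * (0.6 * node_score
                   + 0.25 * threshold_score
                   + 0.15 * observability_score)
    return max(0.0, min(6.0, score))

# A link measured with high QoM at both nodes, a good threshold, and an
# unobstructed view scores near the top of the 0-6 range:
print(round(link_measurability_score(0.9, 0.8, 0.7, 1.0), 2))  # 4.83
```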
3. A TRIGGERED MONITORING SYSTEM

In this section, we present T-Pong, a large-scale congestion monitoring system. To accurately measure points of congestion in the Internet core, and to allocate measurement resources effectively and scalably, T-Pong deploys an on-demand triggered monitoring approach.

3.1 Design Goals

We first summarize our design goals:

Good coverage. The system should have good coverage of links. By "good" we mean: (i) High coverage, low overhead. The system should be able to monitor a large percentage of network links concurrently while keeping its traffic overhead low. (ii) Uniform coverage. Measuring paths should cover congested links in an unbiased way. The system should reduce the probability that multiple paths measure the same congested link.

High measurement quality. The system should allocate the paths that provide the best measurement quality when measuring congested links.

Online processing. To support triggering mechanisms, the system should be capable of processing raw data online. It should provide: (i) Fast algorithms. To operate in real time, algorithms must be fast enough to keep up with the data input rate. (ii) Extensible triggering logic. Triggering logic may be adjusted frequently for research purposes; the system should therefore provide a convenient interface for updating it.

Integrity and robustness. To be capable of running long-term monitoring tasks, the system must guarantee the integrity of critical data and should be able to quickly recover from errors.

3.2 System Structure

Figure 5 illustrates the structure of T-Pong. It consists of one control node and hundreds of measuring nodes. The control node performs centralized control of the entire system. It collects measurement reports from the measuring nodes and adjusts the monitoring deployment based on a set of vantage-point selection algorithms. The measuring nodes monitor network congestion from vantage points all over the world. Each of them runs two measurement processes: TMon and Pong. TMon performs end-to-end congestion monitoring, while Pong performs fine-grained link congestion measurement upon triggering. The control node communicates with the measuring nodes via UDP-based communication channels.

[Figure 5: T-Pong system structure. The control node (main program, MySQL database, and an add-on Perl script) communicates over UDP channels with measuring nodes, each of which runs a TMon process and a Pong process; TMon paths and Pong paths connect pairs of measuring nodes.]

Control Node. In our implementation, we use a Pentium 4 PC for the control node. It runs Fedora Core 5 Linux. Its application software includes the main programs written in C++, a MySQL database that stores measurement data, and an optional Perl script that allows customized triggering logic. Fault tolerance of the control node is a concern for the implementation, which we plan to address using a failover mechanism to a standby control node.

Measuring Nodes. The measuring nodes are PlanetLab nodes running the TMon and Pong programs, both of which are written in C++. TMon keeps track of path routes and monitors end-to-end congestion events. Pong measures the link-level congestion of hot spots upon triggering.

3.3 Measurement Procedure Overview

In this section, we introduce the major steps of T-Pong's measurement procedure. We summarize the techniques associated with these steps in Table 1. (i) During the system bootstrap phase, the control node collects route information from all measuring nodes and updates its topology database⁶ accordingly (Section 3.4). After that, the system keeps track of route changes and path reachability, and incrementally updates the topology database. (ii) Once the topology database is available, the control node starts a greedy algorithm (Section 3.5.1) to select a subset of paths (called TMon paths) that can cover a relatively large percentage of links. The TMon processes on the measuring nodes associated with these TMon paths then start fast-rate probing. The fast-rate probing monitors end-to-end congestion for each TMon path. (iii) Once end-to-end congestion on a TMon path is detected, the system uses its default triggering logic (a priority-based algorithm, Section 3.5.2) to decide whether it should start Pong on that path. It might also decide to start Pong on other paths based on customized triggering logic (Section 3.5.3). The measuring nodes associated with such paths then use Pong (the paths are then called Pong paths) to locate and monitor congestion on these paths at the granularity of a single link.

⁶ The topology database stores the whole set of routes. It does not need to resolve extensive topology information, such as unique nodes and unique links via IP alias approaches. In our measurement and analysis, we only need a rough estimate of the topology.

All paths: no selection (full mesh); low-rate probing (Section 3.6.2) once every 5 minutes; objective: track topology and path reachability.

TMon paths (a subset of all paths): greedy TMon path selection (Section 3.5.1); fast-rate probing (Section 3.6.3) at 5 probes/sec; objective: monitor end-to-end congestion.

Pong paths (a subset of TMon paths, upon triggering): priority-based Pong path allocation (Section 3.5.2); coordinated probing (Section 2.2.2) at 10 probes/sec for end-to-end probes and 2 probes/sec for router-targeted probes; objective: locate and monitor link-level congestion.

Table 1: A summary of T-Pong’s major measurement techniques

The system keeps track of congestion events as well as anomalies such as route changes, clock skews, and path failures. It responds to such events by (i) processing congestion events and analyzing measurement accuracy online, (ii) updating the topology database, (iii) suppressing measurement on paths affected by anomalies, and (iv) rearranging the deployment of TMon and Pong paths. In this way, the system gradually transitions into a stable state, in which it collects measurement reports, processes them, applies the triggering logic, and adjusts the measurement deployment.

3.4 Initializing Topology Database

When the system initially starts up, all measuring nodes send topology messages to the control node (after which they send incremental reports upon route changes). The control node therefore experiences a burst of topology messages during bootstrap.

As we will discuss in Section 3.5.1, updating topology information in the database is not a cheap operation, since we also have to update additional parameters used for selecting TMon paths, which incurs costly database queries. Database updates would therefore be unable to keep up with the arrival rate of topology messages. To solve this, we only perform simple pre-processing operations on each topology message in real time, while buffering all database updates. In addition, we attach a version number and an 8-byte path digest value (a fingerprint of the route) to each topology message. This helps the control node easily filter outdated or duplicate updates, thereby avoiding unnecessary database updates.

3.5 Path Selection Algorithms

3.5.1 Greedy TMon Path Selection

The control node uses a greedy algorithm to select TMon paths. The goal of this algorithm is to select only a small fraction of paths while covering a relatively large percentage of links⁷.

Algorithm. This selection algorithm runs online in an iterative way. During each iteration, it selects a new TMon path that covers the most remaining links. A remaining link is a link that is currently covered by fewer than two TMon paths.

In addition, the new TMon path must satisfy the following conditions: (i) Each of the two endpoints of this path must not participate in more than 20 TMon paths⁸. This restricts the peak probing traffic at each measuring node and hence leads to low overhead. (ii) The path must be active. A path becomes inactive when it is unreachable or experiences considerable route alterations. (iii) The path must not be suppressed. A path is suppressed from being a TMon path for a short period of time (15 minutes) when anomalies (e.g., clock skews) are detected, and for a long period of time (6 hours) when it shows poor measurement quality while running Pong.

The iterative selection stops when no links currently satisfy the conditions. The algorithm then transitions into a sleeping state for 5 minutes, after which it attempts the iterative selection again.

Making it online-capable. Enabling this algorithm to run online is a non-trivial task, since each iteration involves complex logic. If implemented improperly, the algorithm could incur formidable database query loads. To address this, we pre-compute the per-link, per-path, and per-node parameters used in this algorithm during other operations. For example, when we update the topology information of a path, we additionally compute the number of paths and the number of TMon paths that cover each link of the path. Although pre-computing adds some complexity to other operations, our implementation shows it to be a good tradeoff.

⁷ In [16], the authors show that one can see a large fraction of the Internet (in particular, the "switching core") from its edges when properly deploying a relatively small-scale end-to-end measurement infrastructure.

⁸ Restricting nodes to 20 TMon paths is a good tradeoff between link coverage and measurement overhead.
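In essence, this is weighted set cover with side conditions. A condensed in-memory sketch of the selection loop (ours; the real system evaluates these conditions against its MySQL state using the pre-computed counters, and is_active and is_suppressed are assumed callbacks):

```python
def select_tmon_paths(paths, links_of, is_active, is_suppressed,
                      max_paths_per_node=20, min_cover=2):
    """Greedy TMon path selection (Section 3.5.1): repeatedly add the
    eligible path that covers the most remaining links, where a remaining
    link is one covered by fewer than two TMon paths so far.
    paths: list of (src, dst) tuples; links_of: path -> set of link ids."""
    cover = {link: 0 for p in paths for link in links_of[p]}
    load = {}        # node -> number of TMon paths it participates in
    chosen = []

    def gain(p):     # number of remaining links this path would cover
        return sum(1 for link in links_of[p] if cover[link] < min_cover)

    while True:
        candidates = [p for p in paths if p not in chosen
                      and is_active(p) and not is_suppressed(p)
                      and load.get(p[0], 0) < max_paths_per_node
                      and load.get(p[1], 0) < max_paths_per_node
                      and gain(p) > 0]
        if not candidates:
            return chosen   # the real system now sleeps 5 min, then retries
        best = max(candidates, key=gain)
        chosen.append(best)
        for link in links_of[best]:
            cover[link] += 1
        for node in best:
            load[node] = load.get(node, 0) + 1
```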
3.5.2 Priority-based Pong Path Allocation

The control node allocates Pong paths using a priority-based algorithm. In order to minimize the interference among Pong paths, we restrict each measuring node to participating in at most one Pong path. Given this constraint, the priority-based algorithm selects the paths that provide the best measurement quality and link coverage when measuring points of congestion. In addition, it effectively reduces the chance that multiple Pong paths measure the same congested link.

Potential path score and path score. Two important measures used in this algorithm are the potential path score and the path score. Both quantify the measurement contribution a path can achieve when running Pong. The difference is that the path score is a runtime measure which also takes into account the runtime contributions of other Pong paths, while the potential path score does not.

The potential path score is the maximum link measurability score (LMS, Section 2.2.6) over all non-edge⁹ links on a path. It indicates the best measurement quality a path can achieve. The path score is computed similarly. The only difference is that, when computing the maximum LMS, the path score excludes those links currently covered by other Pong paths with a higher LMS (i.e., the path score only considers links on which this path provides the highest LMS among all currently running Pong paths; if there are no such links, its path score is zero, even though the path may have a high potential path score). The path score indicates the runtime measurement contribution of a path. When computing the potential path score and the path score, we use the LMSes of recent history in our database.

Algorithm. To allocate Pong paths, we follow these rules in sequence: (i) Give priority to a path with a higher path score. When computing a path score, use the LMS history of the most recent 24 hours for a candidate path, and the LMS history of the most recent hour for a currently running Pong path that conflicts with the candidate path. (ii) Give priority to a path that shows higher congestion intensity, unless it is filtered by the triggering logic. (iii) Do not override a Pong path that has been measuring for only a short period of time (< 30 minutes). (iv) Suppress paths showing small potential path scores (using the LMS history of the most recent 24 hours). This priority-based algorithm allows measurement resources to be effectively allocated to track hot spots.

⁹ We classify edge links and non-edge links in terms of the normalized hop number (the hop number of a link divided by the total number of hops on the path). A non-edge link is a link whose normalized hop number lies between 0.2 and 0.8.
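Under the definitions just given, together with footnote 9's edge filter, the two scores reduce to a few lines. This is our sketch, in which lms_of and best_running_lms are assumed lookup functions rather than T-Pong APIs:

```python
def is_non_edge(hop_index, total_hops):
    """Footnote 9: a non-edge link has a normalized hop number (hop index
    divided by the total hops on the path) between 0.2 and 0.8."""
    return 0.2 <= hop_index / total_hops <= 0.8

def non_edge_links(path_links):
    n = len(path_links)
    return [l for i, l in enumerate(path_links, 1) if is_non_edge(i, n)]

def potential_path_score(path_links, lms_of):
    """Best measurement quality the path can achieve: the maximum LMS
    over its non-edge links (lms_of uses recent-history LMS values)."""
    return max((lms_of(l) for l in non_edge_links(path_links)), default=0.0)

def path_score(path_links, lms_of, best_running_lms):
    """Runtime contribution: like the potential score, but a link only
    counts if this path beats every currently running Pong path on it
    (best_running_lms(link) is the highest LMS any of them achieves)."""
    return max((lms_of(l) for l in non_edge_links(path_links)
                if lms_of(l) > best_running_lms(l)), default=0.0)

# Toy usage on an 11-link path where this path measures link6 best:
links = [f"link{i}" for i in range(1, 12)]
lms_of = lambda l: {"link6": 3.0, "link3": 1.0}.get(l, 0.0)
print(potential_path_score(links, lms_of))           # 3.0
print(path_score(links, lms_of, lambda l: 2.0))      # 3.0 (link6 wins)
print(path_score(links, lms_of, lambda l: 4.0))      # 0.0 (all covered better)
```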
3.5.3 Programmable Triggering Logic

The system allows us to customize our triggering strategies on the fly through an add-on Perl script. The interface between the main program and the Perl script is the database. The main program records each congestion event in the database. The Perl script, which runs in a loop, periodically checks for new events, selects the concerned ones, and performs its triggering logic. Based on that, it issues measurement commands and enqueues them in the database. The main program checks the database for enqueued commands every minute and changes the measurement deployment according to these commands. This feature provides great flexibility for congestion measurement experiments using the T-Pong system: we can easily change the network area we focus on and the network-wide congestion pattern of interest, and easily test and verify our research hypotheses.

3.6 TMon's Active Probing

3.6.1 3-packet Probing

Both TMon's low-rate and fast-rate probing use 3-packet probing, which consists of three probing packets. The 3-packet probing measures: (i) path reachability, (ii) round-trip delay, (iii) the one-way delays of both the forward and backward paths, and (iv) the number of hops on the forward path. Figure 6 illustrates the procedure. Note that the measured one-way delays include the clock skew between the measuring nodes, but the skew cancels out when computing round-trip delays or queuing delays.

[Figure 6: Three-packet probing procedure between a source measuring node and a destination measuring node: (1) the source initiates 3-packet probing by sending packet 1; (2) the source captures send_time(packet 1); (3) the destination captures recv_time(packet 1); (4) the destination computes the path hop number based on the TTL field in packet 1; (5) the destination sends packet 2 to report recv_time(packet 1) and the path hop number; (6) the destination captures send_time(packet 2); (7) the destination sends packet 3 to report send_time(packet 2); (8) the source captures recv_time(packet 2); (9) the source computes the round-trip delay and the one-way delays of the forward and backward paths:

one-way delay of the forward path = recv_time(packet 1) − send_time(packet 1)
one-way delay of the backward path = recv_time(packet 2) − send_time(packet 2)
round-trip delay = forward one-way delay + backward one-way delay]

3.6.2 Low-rate Probing

A node sends low-rate probes to all other nodes every five minutes. Each low-rate probe consists of a 3-packet probing and an optional traceroute. The 3-packet probing is used to explore the reachability and hop number of the path. For each reachable path, the traceroute is performed. Since we know the path hop number in advance, our version of traceroute runs efficiently: it sends TTL-limited probes to all hops along the path concurrently. Low-rate probes to different nodes are paced evenly over each 5-minute period to smooth the traffic and to avoid interference between traceroutes. For paths that are temporarily unreachable, we back off the probing rate to as little as 1/16 of the normal rate (i.e., one probe every 80 minutes) to reduce unnecessary overhead.

3.6.3 Fast-rate Probing

A node sends fast-rate probes (5 Hz) on each TMon path sourced at it. Each fast-rate probe is a 3-packet probing. To measure congestion, the source node keeps track of round-trip and one-way delays. Based on these, it computes the corresponding queuing delays and infers congestion. Fast-rate probing backs off to as little as 1/64 of its normal rate when a path becomes unreachable.
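The delay arithmetic of Figure 6 in code form (our sketch; timestamps are in seconds, and, as noted above, each one-way value embeds the unknown clock skew between the nodes, which cancels in the round trip):

```python
def three_packet_delays(send1_src, recv1_dst, send2_dst, recv2_src):
    """Step 9 of Figure 6. recv1_dst and send2_dst are stamped by the
    destination's clock, so each one-way delay includes the (unknown)
    clock skew with opposite signs; the skew cancels in the sum."""
    fwd = recv1_dst - send1_src      # forward one-way delay + skew
    bwd = recv2_src - send2_dst      # backward one-way delay - skew
    return fwd, bwd, fwd + bwd       # round-trip delay is skew-free

# A destination clock 5 s ahead inflates fwd and deflates bwd equally;
# the true delays here are 0.04 s forward and 0.05 s backward:
fwd, bwd, rtt = three_packet_delays(100.00, 105.04, 105.05, 100.10)
print(round(fwd, 2), round(bwd, 2), round(rtt, 2))  # 5.04 -4.95 0.09
```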

4. EVALUATION AND MEASUREMENTS

4.1 Experimental Setup

We conducted six PlanetLab-based experiments using our congestion monitoring system over a two-month period. Each experiment lasted for 4 ∼ 7 days, so altogether we collected one month's worth of measurement data. We intentionally collected data in different time intervals in order to show that our results are time-invariant. Indeed, all of our findings hold for all the distinct traces we took. We used a total of 191 PlanetLab nodes in our experiments. Table 2 summarizes these nodes, classified by continent.

Continent                       Number of nodes
North America, South America    110
Europe                          63
Asia, Australia                 18

Table 2: PlanetLab nodes used in our experiments

4.2 Evaluation

4.2.1 Coverage and Overhead

Here, we provide statistics about the network coverage, i.e., how many end-to-end paths and internal links our system has covered. We also quantify the measurement overhead imposed by our system.

Total paths and links. We observe about 36,000 paths (N², where N = 191 PlanetLab nodes), which expose about 12,100 distinct links at any one time. In addition, due to routing changes, we are able to observe about 29,000 distinct links in total.

Coverage and overhead of TMon paths. Our measurement log shows that 1,500 ∼ 2,000 paths run TMon concurrently. These paths cover about 7,600 ∼ 8,000 distinct links concurrently. Compared with the total numbers of paths and links, this shows that we use 4.9% of the total paths to cover 65% of the total links. The discrepancy between the percentage of paths and that of links is not a surprise, since this effect has been observed previously (e.g., [16,21]).

Each PlanetLab node we use participates in 9.2 TMon paths on average, which corresponds to an average per-node traffic overhead of 30.9 kbps; the peak per-node traffic overhead is 67.2 kbps. This is because we restrict each node to participating in at most 20 TMon paths (Section 3.5.1). Our computations show that we could still improve the coverage of links considerably by relaxing this restriction, e.g., by allowing each node to participate in up to 40 TMon paths. However, this would come at the cost of higher overhead.
Coverage of Pong paths. Our measurement log shows that 25 ∼ 30 paths run Pong concurrently. This means that 50 ∼ 60 PlanetLab nodes run Pong at any moment. These paths cover 250 ∼ 350 distinct links concurrently.

4.2.2 Measurement Quality

One important principle of our design is to select paths (or vantage points) that provide the best measurement quality on link congestion. Here, we evaluate the measurement quality achieved in our experiments. We use the link measurability score (LMS, Section 2.2.6) annotated with each link congestion event as the evaluation measure. As previously described, the LMS takes values between 0 and 6. We summarize the implications of the major LMS levels as follows:

LMS=0 means that both nodes of the link are using the 2-p probing technique. We have observed end-to-end congestion on the path at some moment, but Pong is unable to accurately locate this congestion; the conclusion that the reported link is congested may not be reliable.

LMS=1 usually means that at least one endpoint of the link is using the 4-p or fsd probing technique, but the quality of measurability (QoM) is not high. This is the lowest level at which a link congestion event report is considered acceptable.

LMS=2 usually means that either (i) one endpoint of the link is using 4-p or fsd and achieves a high QoM, or (ii) both endpoints are using 4-p or fsd and have moderate QoMs. It indicates moderate measurement quality.

LMS=3 usually means that both endpoints are using 4-p or fsd and both achieve good QoMs. It indicates good measurement quality.

LMS≥4 indicates very good measurement quality.

[Figure 7: CDF of the link measurability score, over the range 1 to 6, with the levels annotated as acceptable, fair, good, and excellent.]

We use the measurement data of a one-week-long experiment to exemplify our results. During this week, we collected a total of 54,991 link congestion events; 35,728 (65%) of them have a non-zero LMS. Figure 7 shows the cumulative distribution function (CDF) of the LMS for these 35,728 events. As we can see, (i) 95% of these link congestion events have acceptable measurement quality (LMS≥1), (ii) 75% have better-than-fair measurement quality (LMS≥2), (iii) 60% have good measurement quality (LMS≥3), and (iv) 35% have excellent measurement quality (LMS≥4).

4.3 Link-level vs. End-to-end Congestion

In this section, we study the properties of end-to-end congestion events (observed by our TMon system) and of link-level congestion events (observed by triggered Pong-based measurements), and we emphasize the fundamental differences between the two. Throughout the section, we leverage the fact that traffic loads exhibit a time-of-day pattern. As a result, network congestion also exhibits a similar pattern, which our system successfully captures.

4.3.1 End-to-end Congestion Properties

Here, we evaluate the properties of the end-to-end congestion captured by the TMon paths. These results are a basis for understanding the properties of the link-level congestion that we describe in the next section. We use a 4.5-day-long measurement data set to exemplify our results. Figure 8(a) shows the end-to-end congestion events of all paths.

To quantify the congestion scale, we show the following three measures: (i) the number of congestion events, (ii) the sum of the congestion intensities of all congestion events, and (iii) the number of distinct TMon paths associated with these events. All three measures are computed over 30-minute-long time bins and are normalized by their peak values.

[Figure 8: Time-of-day effect. The three measures are plotted over 4.5 days (Tuesday through Saturday) for four path groups: (a) all paths; (b) paths with both endpoints in South or North America; (c) paths with at least one endpoint in Europe; (d) paths with at least one endpoint in Asia. The Y axis in each sub-figure is the ratio to the peak value of the corresponding curve in sub-figure (a). The American, European, and Asian peaks are marked.]

Figure 8(a) shows a typical time-of-day effect in end-to-end congestion. All three measures show a consistent time-of-day pattern with a daily peak at 12:00∼21:00 GMT. However, this is a mixed time-of-day pattern for traffic generated by users from all over the world. To get a clearer picture, we decompose it into different time zones and compare each component with its local time. To this end, we classify paths according to the continents in which their endpoints reside. Since the two endpoints of a path can reside on different continents, we make an approximation by dividing paths into the following three groups: (i) paths with both endpoints in North or South America, (ii) paths with at least one endpoint in Europe, and (iii) paths with at least one endpoint in Asia. Figures 8(b)-(d) plot the congestion events captured by the three path groups, respectively.

From the three figures we can resolve three different daily peaks which map well to the local diurnal times of three continents. We denote them the "American peak", the "European peak", and the "Asian peak", respectively, as summarized in the following table.

Peak             GMT time       Local time
American peak    16:00–22:00    10:00–16:00 (GMT-6)
European peak    11:00–17:00    12:00–18:00 (GMT+1)
Asian peak       02:00–06:00    10:00–14:00 (GMT+8)

Next, we explain the above observations. First, it is well known that links at network edges are more congested than those in the core [23]. As a result, the main component of end-to-end congestion is congestion at the edges, i.e., congestion close to the measuring endpoints. In Figure 8(b), since both endpoints of a measuring path are located in the Americas, the figure shows a clear "American peak" in the sense that it matches the local diurnal time.

Figures 8(c) and 8(d) expose the same causality, capturing the "European peak" and the "Asian peak," respectively. However, since we apply the geographic constraint to only one endpoint, these two figures capture more than one daily peak. This is most apparent in Figure 8(d). In addition to the "Asian peak," this figure captures the "European peak" and the "American peak" as well. This is because only one endpoint of each path is required to be in Asia; the other endpoint can reside in America, Europe, etc.

4.3.2 Link-level Monitoring

Here, we evaluate the properties of the link-level congestion monitored via Pong by comparing them with the properties of the end-to-end congestion monitored via TMon. We emphasize the following two fundamental differences between the two approaches and reveal the advantages of link-level congestion monitoring.

First, link-level congestion monitoring allows us to focus on congestion events at a specific location in the Internet core, which can show different properties than the congestion at the edges. By contrast, end-to-end congestion monitoring inevitably mixes congestion from all locations on a path, and it is typically dominated by congestion at the edges.

Second, when monitoring a wide network area, our link-level congestion monitoring approach makes it possible to cover the area in an unbiased way (Section 3.5.2). On the contrary, location-unaware end-to-end monitoring would inevitably cover some congested links with a number of measuring paths. Such biased coverage leads to biased measurement results, because the same congested locations are counted repeatedly.

Figure 9(a) shows the results of the link-level congestion monitored via Pong. Since we know the congestion location, we can filter out congestion events at edge links.

As a result, the figure actually shows the aggregate link-level congestion on core links. The figure is plotted in terms of three measures: (i) the number of congestion events, (ii) the number of distinct Pong paths associated with these events, and (iii) the number of distinct links associated with these events. Similarly to the way we analyze end-to-end congestion, all three measures are computed over 30-minute-long epochs and are normalized by their peak values.

[Figure 9: Comparison between link congestion and end-to-end congestion observation over the same week (Saturday through Friday): (a) link congestion; (b) end-to-end congestion. The Y axis is the ratio to the peak value of each curve.]

            Figure 9(a)          Figure 9(b)
Weekend     10,060 (279/hour)    5,684 (158/hour)
Weekdays    44,931 (365/hour)    17,552 (143/hour)
Total       54,991 (346/hour)    23,236 (146/hour)

Table 3: Number of congestion events in Figure 9

Figure 9(b) shows the results of the end-to-end congestion monitored via TMon during the same week. The two figures reveal important differences between link-level and end-to-end congestion dynamics. In particular, Figure 9(a) shows a clearer time-of-day pattern: peaks are higher and valleys are deeper. Daily peaks in Figure 9(a) do not necessarily correspond to daily peaks in Figure 9(b), though they do correspond to some minor peaks there. This is because congestion on core links necessarily leads to end-to-end congestion, but it does not dominantly affect the end-to-end congestion. For the valleys, the two figures show an even larger mismatch. For example, the valleys in Figure 9(a) can correspond to epochs in Figure 9(b) at which there is high congestion. This happens because the end-to-end congestion is dominated by congestion at the edges.

In addition to the daily differences, the two figures also show a significant weekly difference. As summarized in Table 3, Figure 9(a) shows far fewer congestion events during the weekend than during the weekdays. On the contrary, Figure 9(b) shows slightly more congestion events during the weekend than during the weekdays.

Overall, we find that congestion in the core tends to behave very differently from end-to-end congestion; likewise, our link-level monitoring approach makes possible much more accurate insights about congestion in the core. In addition, the clearer time-of-day effect observed in Figure 9(a) is also due to the unbiased coverage achieved by our algorithm. In particular, the link coverage of the Pong paths is optimized to be uniform, because our triggering logic effectively reduces the chance that multiple Pong paths concurrently measure the same congested link.

4.4 Link-level Congestion Correlation

In this section, we utilize our methodology to analyze congestion correlation across core links and describe our basic findings. We present an in-depth analysis of the underlying correlation causes in the next section.

To quantify the congestion correlation, we introduce a measure called observed correlation, which represents the pairwise congestion correlation between two links. To compute it, we first search for concurrent link congestion events during each 30-second time period and update a matrix that records how many times each link pair has been concurrently congested (we call this measure the overlap count). Then, we divide the overlap count by the smaller of the two links' total congestion event counts. The quotient is the observed correlation. We did not use the classical statistical measure of correlation due to the complexity of acquiring the overlapped measurement period of two links in our trigger-based measurement context¹⁰.

In addition, to prevent scenarios in which the smaller total congestion event count is so small that it might lead to an unreasonably high correlation, we require the overlap count to be at least 10; otherwise, we filter out that link pair. We place the requirement on the overlap count rather than on the smaller total congestion event count because we also want to filter the cases in which two links do not share many concurrent measurement periods, which would yield an unreasonably low correlation. Indeed, because concurrent measurements might not always be available, the observed correlation may slightly underestimate the correlation between links.

[Figure 10: CDF of the observed correlation between link pairs, plotted separately for weekdays and the weekend.]

¹⁰ We only know the overlapped measurement periods of paths. It is complex to resolve whether a specific path measurement period was triggered by a specific link or not.
10 to Figure 9(a), which are data of link-level congestion on most likely to happen fortifies our hypothesis (Section the core links in the one-week-long experiment. We plot 4.5.3). the CDF of observed correlations for the weekend and 4.5.1 Spatial Distribution of Correlated Links weekdays separately as shown in Figure 10. The figure To perform an in-depth analysis about congestion shows that (i) congestion correlation across core links correlation, we select top 20 links in terms that they are is not rare and there are quite a number of repetitively commonly correlated with the largest number of other congested links that bear relatively high congestion cor- links. We call them the “causal links.” Note that these relations; (ii) the observed correlations are much higher links are not necessarily the real cause, but they tend to during the weekend than during the weekdays. be the most sensitive indicators for underlying causes; The second phenomenon implies a relationship be- hence, we can roughly treat them as the cause. tween the observed correlations and overall congestion We analyze the locations of each “causal link” to- extent on the core links. Recall that in Figure 9(a) we gether with those links highly correlated with it. To observe a much lower overall congestion extent on the represent the location, we classify links into intra-AS core links during the weekend than during the week- links and inter-AS links.11 We find that the above links days. We infer that the reason we observe higher corre- can reside at diversified locations, including (i) intra- lation during the weekend is that a pairwise correlation AS links within large ISPs (including Tier-1, Tier-2 and is much easier to observe when the overall congestion other backbone networks); (ii) inter-AS links between extent is low. We hypothesize that we observe lower cor- large ISPs; and (iii) inter-AS links from a lower tier to relation during weekdays not because the same pairwise a higher tier. correlation does not exist, but because they are blurred as a result of multiple pairwise correlations when the Congestedinter-ASlink(arrowshowstrafficdirection) overall congestion extent is high. Congestedintra-ASlink Peeringbetweentwo Ases PeeringwithallTier-1ISPs AS 81 Tie-1network To understand how far a pair of correlated links can (Peeringarefullymeshed) AS 174 AS be distant from each other, we select top 25% (in terms AS 2914 19401 AS 3549 NTT America COGENT of observed correlation) link pairs in the weekdays and AS 7018 GBLX AT&T AS 293 weekend respectively. We roughly estimate the short- AS 701 ESNET UUNET AS 237 est distance between the two links in each pair based AS 1299 MERIT AS 7029 TELIANET on our topology database. We represent the distance WINDSTREAM “Causal AS 2500 AS link” JPNIC by two measures: AS distance and hop distance. The AS Tier-1orlarge 11317 7896 AS 11537 former quantifies the AS level distance, the latter quan- Tier-2ISPs AS 20965 ABILENE GEANT AS AS AS 600 tifies the router level distance. Table 4 summarizes the AS 559 AS 10578 40498 OARNET SWITCH 11714 AS average and standard deviation of estimated distances. AS 19782 AS 5661 OtherISPs In addition, our result shows that the correlated links Universitiesandacademicinstitutions 14085 can span across up to three neighboring ASes (i.e., AS A typicalexampleoftheaggregationeffect distance ≤ 3). This figure shows locations of congested links correlated with an inter-AS link between AS 11537 and AS 237. 
4.5 Aggregation Effect Hypothesis

One important finding from our experiments is that the effect of traffic aggregation tends to play an important role in congestion behavior at core links. Here, traffic aggregation describes the situation in which traffic from a number of upstream links converges at a downstream AS-level aggregation link. We make this hypothesis based on two phenomena that we measured: (i) the spatial distribution pattern of correlated links tends to result from the aggregation effect (Section 4.5.1); and (ii) some hot spots in the core exhibit time-independent high congestion, which also tends to result from the aggregation effect (Section 4.5.2). A comparison between the congestion locations related to these two phenomena and the locations where the aggregation effect is most likely to happen fortifies our hypothesis (Section 4.5.3).

4.5.1 Spatial Distribution of Correlated Links

To perform an in-depth analysis of congestion correlation, we select the top 20 links in terms of the number of other links they are correlated with. We call them the "causal links." Note that these links are not necessarily the real cause, but they tend to be the most sensitive indicators of underlying causes; hence, we can roughly treat them as the cause.

We analyze the location of each "causal link" together with those links highly correlated with it. To represent the location, we classify links into intra-AS links and inter-AS links.11 We find that these links can reside at diverse locations, including (i) intra-AS links within large ISPs (including Tier-1, Tier-2, and other backbone networks); (ii) inter-AS links between large ISPs; and (iii) inter-AS links from a lower tier to a higher tier.

[Figure 11: In-depth analysis on spatial distribution of correlated links. The figure shows the locations of congested links correlated with an inter-AS link between AS 11537 and AS 237; the lower part shows a typical example of the aggregation effect. The legend distinguishes congested inter-AS links (arrows show traffic direction), congested intra-AS links, peering between two ASes, and peering with all Tier-1 ISPs (Tier-1 peerings are fully meshed); nodes are grouped into Tier-1 networks, Tier-1 or large Tier-2 ISPs, other ISPs, and universities and academic institutions. Among the ASes shown are AS 7018 (AT&T), AS 701 (UUNET), AS 2914 (NTT America), AS 3549 (Global Crossing), AS 174 (Cogent), AS 1299 (TeliaNet), AS 237 (Merit), AS 11537 (Abilene), AS 20965 (GEANT), and AS 559 (SWITCH).]

To describe our in-depth analysis of correlated links, we use the top "causal link" as an example. It is an inter-AS link between AS 11537 (Abilene) and AS 237 (Merit). This link shows high correlation with 25 other links. Figure 11 illustrates the AS-level locations of 23 of these links that we can resolve. The 23 links cover an area that spans as many as 24 distinct ASes, all of them no more than 3 ASes away from the "causal link". As we can see from Figure 11, most links reside upstream of the "causal link". Examples are: (i) an intra-AS link within AS 11537; (ii) inter-AS links from seven universities that directly or indirectly access AS 11537; and (iii) intra-AS links within Tier-1 and Tier-2 ISPs and backbone networks that are close to AS 11537.

11 Classifying intra- and inter-AS links is a non-trivial task. We use a best-effort algorithm based on the IP ownership of the two endpoints of each link.
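The best-effort classification from footnote 11 might look roughly as follows. The prefix-ownership table is a hypothetical stand-in for a real IP-ownership database; only the endpoint-ownership comparison reflects what the footnote actually specifies.

    # Illustrative sketch of footnote 11's best-effort classifier. The
    # prefix-to-AS table is a hypothetical stand-in for a real
    # IP-ownership database; the prefixes below are documentation ranges.
    import ipaddress

    PREFIX_TO_ASN = {
        ipaddress.ip_network("192.0.2.0/24"): 11537,   # placeholder block
        ipaddress.ip_network("198.51.100.0/24"): 237,  # placeholder block
    }

    def ip_to_asn(ip):
        # Longest-prefix match of an address against the ownership table.
        addr = ipaddress.ip_address(ip)
        matches = [net for net in PREFIX_TO_ASN if addr in net]
        if not matches:
            return None  # ownership unknown; leave the link unclassified
        return PREFIX_TO_ASN[max(matches, key=lambda net: net.prefixlen)]

    def classify_link(ip_a, ip_b):
        # Label a link intra-AS or inter-AS from its endpoints' owners.
        asn_a, asn_b = ip_to_asn(ip_a), ip_to_asn(ip_b)
        if asn_a is None or asn_b is None:
            return "unknown"
        return "intra-AS" if asn_a == asn_b else "inter-AS"

    print(classify_link("192.0.2.1", "192.0.2.9"))     # intra-AS
    print(classify_link("192.0.2.1", "198.51.100.1"))  # inter-AS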

Analysis of other "causal links" shows a similar congestion distribution pattern. Such a distribution pattern fortifies our hypothesis that the congestion correlation could result from the aggregation effect: when upstream traffic converges at a relatively thin aggregation point, upstream traffic surges can cause congestion at the aggregation point, hence the high probability that congestion at the two places (the upstream network and the aggregation point) is correlated. Take Abilene for example. Its cross-country backbone is 10 Gbps, while the two aggregation points (one in Chicago, the other in ) from Abilene to Merit tend to have less provisioned bandwidth, e.g., the one in Chicago still uses an OC-12 (622 Mbps) connection [5]. In addition, Abilene's aggregate network statistics for 2007 [1] show that aggregate traffic from Abilene to Merit is usually about twice that in the reverse direction. This matches the congestion direction that we observe in Figure 11.

4.5.2 Locations of Time-Independent Hot Spots

In addition to the spatial distribution of correlated links, we observe another phenomenon that fortifies our aggregation hypothesis: a number of hot spots exhibit time-independent high congestion. By analyzing the locations of such hot spots, we find a high likelihood that they result from the aggregation effect.

[Figure 12: Effect of time-independent hot spots. Two curves are plotted against time (Tuesday noon through Saturday noon): the number of congestion events and the average congestion intensity over events. The Y axis is the ratio to the peak value of each curve.]

To understand the properties of such time-independent hot spots, we show their effect on end-to-end congestion observations in Figure 12. The figure plots the dynamics of two measures: the number of congestion events and the average per-event congestion intensity. Both measures are computed over 30-minute-long time bins and are normalized by their peak values. The figure shows that the number of congestion events follows a clear time-of-day pattern. By contrast, the average congestion intensity exhibits very different dynamics: it is largely independent of the time-of-day effect, and it even peaks at the valleys of the congestion-event-count curve, and vice versa.

This phenomenon results from the following property of the time-independent hot spots, which we found through investigation: although such hot spots exhibit the time-of-day effect in terms of the number of congestion events (fewer events when overall network congestion is low), their per-event congestion intensities always remain high. This is why we can often observe a much higher average congestion intensity when the total number of congestion events is small.
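The two curves in Figure 12 can be derived from raw event logs along the following lines, assuming each congestion event is recorded as a (timestamp, intensity) pair; that input format is our assumption rather than the system's actual interface.

    # Illustrative sketch (input format assumed): per-bin event counts
    # and average per-event congestion intensity, each normalized by its
    # peak, as plotted in Figure 12.
    from collections import defaultdict

    BIN = 30 * 60  # 30-minute time bins, in seconds

    def bin_events(events):
        # events: iterable of (unix_timestamp, intensity) pairs.
        counts, sums = defaultdict(int), defaultdict(float)
        for ts, intensity in events:
            b = int(ts) // BIN
            counts[b] += 1
            sums[b] += intensity
        bins = sorted(counts)
        n_events = [counts[b] for b in bins]
        avg_intensity = [sums[b] / counts[b] for b in bins]
        norm = lambda xs: [x / max(xs) for x in xs]  # peak-normalize
        return bins, norm(n_events), norm(avg_intensity)

    # Toy example: many mild events early, a few intense events later;
    # the count curve and the intensity curve peak in different bins.
    events = [(0, 1.0), (60, 1.2), (120, 0.9), (200, 1.1), (3600, 9.5)]
    print(bin_events(events))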
We find that such time-independent hot spots are inter-AS links between large backbone networks all over the world, as well as intra-AS links within these networks. The former are about 1.5 times as numerous as the latter. Contrary to our initial hypothesis, most of these links are not inter-continental. Table 5 shows the ASes that host the most such time-independent hot spots. We list them in descending order of the maximum congestion intensity measured on the hot spots they host, i.e., AS 174 shows the largest maximum congestion intensity. In the next section, we show that these locations are the places where the aggregation effect is most likely to happen.

    AS#    Description
    174    Cogent Communications, a large Tier-2 ISP.
    1299   TeliaNet Global Network, a large Tier-2 ISP.
    20965  GEANT, a main European multi-gigabit computer network for
           research and education purposes, Tier-2.
    4323   Time Warner Telecom, a Tier-2 ISP in the US.
    3356   Level 3 Communications, a Tier-1 ISP.
    237    Merit, a Tier-2 network in the US.
    6461   Abovenet Communications, a large Tier-2 ISP.
    27750  RedCLARA, a backbone connecting the Latin-American National
           Research and Education Networks to Europe.
    6453   Teleglobe, a Tier-2 ISP.
    2914   NTT America, a Tier-1 ISP.
    3549   Global Crossing, a Tier-1 ISP.
    11537  Abilene, a backbone network in the US.
    4538   China Education and Research Network.

Table 5: Backbone networks with strong hot spots

4.5.3 AS-level Traffic Aggregation Effect

To analyze the aggregation effect hypothesis, we compare the top 20 correlated links (those commonly correlated with the largest number of links) mentioned in Section 4.5.1 and the locations of the time-independent hot spots mentioned in the previous section with the locations where the aggregation effect is most likely to take place.

The aggregation effect is most likely to happen at (i) networks that have the largest number of peers, and (ii) ISPs that are most aggressively promoting customer access. Table 6 shows the networks within the top ten of both categories that match the locations of the top 20 correlated links and the time-independent hot spots that we measured. For the top ten networks with the largest number of peers, we use the ranks provided by FixedOrbit [4]. For the top ten ISPs most aggressively promoting customer access, we use the ranks in terms of the Renesys customer base index [7] for three different continents. The definition of the Renesys customer base index is given with the table.

Table 6 shows a remarkable location match between our results and statistics collected by others. Indeed, Table 6 shows that almost all the hot spots shown in

Table 5 and Figure 11 are located at (i) networks that have the largest number of peers [4], or (ii) ISPs that are aggressively promoting customer access. Hence, the phenomena of both the congestion correlation and the time-independent hot spots could be closely related to the AS-level traffic aggregation effect. First, an upstream link could bear congestion correlation with a downstream AS-level aggregation link. Second, time-independent hot spots usually arise at AS-level aggregation points.

    Rank  Network          Peers
    1     UUNET            2,346
    2     AT&T WorldNet    2,092
    3     Level 3 Comm.    1,742
    5     Cogent Comm.     1,642
    7     Global Crossing  1,041
    8     Time Warner        918
    9     Abovenet           798

(a) Matched locations in the top ten networks defined by the number of peers

    North America          Europe                          Asia
    Rank  ISP              Rank  ISP                       Rank  ISP
    1     Level 3 Comm.    1     Level 3 Comm.             2     NTT America
    2     UUNET            2     TeliaNet Global Network   6     UUNET
    3     AT&T WorldNet    4     Global Crossing           8     AT&T WorldNet
    6     Cogent Comm.     8     Teleglobe                 9     Level 3 Comm.
    9     Global Crossing                                  10    Teleglobe

(b) Matched locations in the top ten ISPs defined by the Renesys customer base index across three continents for the month of February 2006

The Renesys customer base index [7] is defined based on the following five criteria: (1) which service providers have the most customer networks; (2) which service providers are acquiring customer networks at the fastest rate; (3) which service providers are experiencing the least customer churn; (4) which service providers have the most customers with only one link to the Internet; and (5) which service providers connect to the most customer networks, both directly and through peering relationships.

Table 6: Matched locations in top ten networks/ISPs defined by the number of peers and the Renesys customer base index

5. RELATED WORK

Our system and findings relate to other network monitoring systems, Internet tomography, and root-cause and performance-modeling analyses, which we outline below.

Trigger-based Monitoring. One of the key features of our monitoring system is its triggered measurement nature. In this context, Zhang et al.'s PlanetSeer [41] monitoring system is closest to ours, both in spirit and design. PlanetSeer detects network path anomalies such as outages or loops using passive observations of the CoDeeN CDN [2], and then initiates active probing. Contrary to PlanetSeer, in the absence of passive traffic monitoring, we use TMon, a light mesh-like active monitoring subsystem. Also, instead of monitoring loops and outages, we focus on congestion hot spots. Understanding if and how loops or outages affect congestion is part of our intended future work.

Other systems that apply a triggered-measurement approach include Boschi et al.'s SLA-validation system [17] and Wen et al.'s on-demand monitoring system [39]. In addition to the fact that our system's goals are fundamentally different from all of the above, our system also requires measuring entire network areas, and hence triggering a large number of vantage points, concurrently.

Information Plane. Our congestion monitoring system relates to information dissemination systems that monitor and propagate important network performance inferences to end-points (e.g., [14,22,28,30,38,40]). In general, such systems can be divided into two types. The first manages information about networks or nodes under the control of the information plane (e.g., RON [14]), while the second predicts path performance at Internet scale (e.g., iPlane [30]). As a result, the first type provides information over short time scales, while the second does so over much longer time scales, e.g., 6 hours [30].

Our system takes the best of the two worlds: its light mesh-like monitoring approach enables covering Internet-wide areas, yet its trigger-based approach helps it effectively focus on a smaller number of important events [34], and consequently disseminate the information about them over shorter time scales.

Locating Internet Bottlenecks. A number of tools have been designed to detect and locate Internet bottlenecks and their properties, e.g., [11,29,31,33]. A more detailed discussion of these approaches is available in [23]. Common to most of these tools is that they are designed for monitoring a single end-to-end path, and hence may not be suitable for large-scale Internet monitoring due to their large measurement overhead. Because we depend upon a light measurement tool [23], we are capable of generating concurrent measurements over a large Internet area and revealing correlation among concurrently congested hot spots, without overloading either the network or the monitoring vantage points. In particular, each node in our system experiences a moderate measurement overhead of 30 kbps.

Internet Correlations. The Internet is a complex system composed of thousands of ASes. Inevitably, events happening in one part of the network can have repercussions on other parts. For example, Sridharan et al. [35] find correlation between route dynamics and routing loops; Feamster et al. [26] show that there exists correlation between link failures and BGP routing messages; Teixeira et al. [37] find that hot-potato routing can trigger BGP events; in the other direction, Agarwal et al. [10] show that BGP routing changes can cause traffic shifts in a single backbone network.
6. CONCLUSIONS

In this paper, we performed an in-depth study of Internet congestion behavior with a focus on the Internet core. We developed a large-scale triggered monitoring system that integrates the following unique properties: (i) it is able to locate and track congestion events concurrently at a large fraction of Internet links; (ii) it exploits triggering mechanisms to effectively allocate measurement resources to hot spots; and (iii) it deploys a set of novel online path-selection algorithms that deliver highly accurate observations of the underlying congestion. Our system's ability to directly observe distinct link-level congestion events concurrently allows a much deeper understanding of Internet-wide congestion.

Our major findings from experiments using this system are as follows: (i) Congestion events in the core can be highly correlated. Such correlation can span across up to three neighboring ASes. (ii) There are a small number of hot spots between or within large backbone networks exhibiting highly intensive time-independent congestion. (iii) The phenomena of both the congestion correlation and the time-independent hot spots could be closely related to the AS-level traffic aggregation effect. (iv) Congestion dynamics in the core is quite different from that at the edges.

7. REFERENCES
[1] Abilene aggregate network statistics. http://stryper.grnoc.iu.edu/abilene/aggregate/html/.
[2] CoDeeN. http://codeen.cs.princeton.edu/.
[3] Emulab. http://www.emulab.net/.
[4] FixedOrbit. http://fixedorbit.com/.
[5] Merit Network. http://www.merit.edu/.
[6] PlanetLab. http://www.planet-lab.org/.
[7] Renesys Corporation. http://www.renesys.com/.
[8] Skype. http://www.skype.com/.
[9] Verio backbone SLA terms and conditions.


http://www.verio.com/global-ip-guarantee/.
[10] S. Agarwal, C. Chuah, S. Bhattacharyya, and C. Diot. The impact of BGP dynamics on intra-domain traffic. In SIGMETRICS'04.
[11] A. Akella, S. Seshan, and A. Shaikh. An empirical evaluation of wide-area Internet bottlenecks. In IMC'03.
[12] A. Akella, A. Shaikh, and S. Seshan. A comparison of overlay routing and multihoming route control. In SIGCOMM'04.
[13] A. Akella, A. Shaikh, and S. Seshan. Multihoming performance benefits: An experimental evaluation of practical enterprise strategies. In USENIX Technical Conference, 2004.
[14] D. Andersen, H. Balakrishnan, M. Kaashoek, and R. Morris. Resilient overlay networks. In SOSP'01.
[15] T. Anderson, A. Collins, A. Krishnamurthy, and J. Zahorjan. PCP: Efficient endpoint congestion control. In NSDI'06.
[16] P. Barford, A. Bestavros, J. Byers, and M. Crovella. On the marginal utility of network topology measurements. In SIGCOMM IMW'01.
[17] E. Boschi, M. Bossardt, and T. Dubendorfer. Validating inter-domain SLAs with a programmable traffic control system. In IWAN'05.
[18] L. Brakmo and L. Peterson. TCP Vegas: End to end congestion avoidance on a global Internet. IEEE JSAC, 1995.
[19] T. Bu, N. Duffield, F. Presti, and D. Towsley. Network tomography on general topologies. In SIGMETRICS'02.
[20] R. Caceres, N. Duffield, J. Horowitz, and D. Towsley. Multicast-based inference of network-internal loss characteristics. IEEE Transactions on Information Theory, 1999.
[21] Y. Chen, D. Bindel, H. Song, and R. Katz. An algebraic approach to practical and scalable overlay network monitoring. In SIGCOMM'04.
[22] D. Clark, C. Partridge, J. Ramming, and J. Wroclawski. A knowledge plane for the Internet. In SIGCOMM'03.
[23] L. Deng and A. Kuzmanovic. Pong: Diagnosing spatio-temporal Internet congestion properties. In SIGMETRICS (poster), 2007. A longer version is available at http://networks.cs.northwestern.edu/pong/.
[24] M. Dischinger, A. Haeberlen, K. Gummadi, and S. Saroiu. Characterizing residential broadband networks. In IMC'07.
[25] N. Duffield, J. Horowitz, D. Towsley, W. Wei, and T. Friedman. Multicast-based loss inference with missing data. IEEE JSAC, 2002.
[26] N. Feamster, D. Andersen, H. Balakrishnan, and M. F. Kaashoek. Measuring the effects of Internet path failures on reactive routing. In SIGMETRICS'03.
[27] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockwell, T. Seely, and C. Diot. Packet-level traffic measurements from the Sprint IP backbone. IEEE Network, 2003.
[28] J. Hellerstein, V. Paxson, L. Peterson, T. Roscoe, S. Shenker, and D. Wetherall. The network oracle. IEEE Data Engineering Bulletin, 2005.
[29] N. Hu, L. Li, Z. Mao, P. Steenkiste, and J. Wang. Locating Internet bottlenecks: Algorithms, measurements, and implications. In SIGCOMM'04.
[30] H. Madhyastha, T. Isdal, M. Piatek, C. Dixon, T. Anderson, A. Krishnamurthy, and A. Venkataramani. iPlane: An information plane for distributed services. In OSDI'06.
[31] R. Mahajan, N. Spring, D. Wetherall, and T. Anderson. User-level Internet path diagnostics. In SOSP'03.
[32] V. Padmanabhan, L. Qiu, and H. Wang. Server-based inference of Internet link lossiness. In INFOCOM'03.
[33] V. Ribeiro, R. Riedi, and R. Baraniuk. Locating available bandwidth bottlenecks. IEEE Internet Computing, 2004.
[34] H. Song, L. Qiu, and Y. Zhang. NetQuest: A flexible framework for large-scale network measurement. In SIGMETRICS'06.
[35] A. Sridharan, S. Moon, and C. Diot. On the correlation between route dynamics and routing loops. In IMC'03.
[36] C. Tang and P. McKinley. On the cost-quality tradeoff in topology-aware overlay path probing. In ICNP'03.
[37] R. Teixeira, A. Shaikh, T. Griffin, and J. Rexford. Dynamics of hot-potato routing in IP networks. In SIGMETRICS'04.
[38] M. Wawrzoniak, L. Peterson, and T. Roscoe. Sophia: An information plane for networked systems. In HotNets'03.
[39] Z. Wen, S. Triukose, and M. Rabinovich. Facilitating focused Internet measurements. In SIGMETRICS'07.
[40] P. Yalagandula, P. Sharma, S. Banerjee, S. Basu, and S. Lee. S3: A scalable sensing service for monitoring large networked systems. In SIGCOMM INM'06.
[41] M. Zhang, C. Zhang, V. Pai, L. Peterson, and R. Wang. PlanetSeer: Internet path failure monitoring and characterization in wide-area services. In OSDI'04.
[42] Y. Zhang, N. Duffield, V. Paxson, and S. Shenker. On the consistency of Internet path properties. In SIGCOMM IMW'01.
[43] Y. Zhao, Y. Chen, and D. Bindel. Towards unbiased end-to-end network diagnosis. In SIGCOMM'06.
