LOGIN: AUGUST 2005 NSDI ’05 79 Router Software
2nd Symposium on Networked Systems Design and Implementation (NSDI ’05)
Boston, Massachusetts
May 2–4, 2005

Keynote: The Challenges of Delivering Content and Applications on the Internet
Tom Leighton, MIT/Akamai
Summarized by Ningning Hu

Tom Leighton explained that Internet problems adversely affect current Web services. He pointed out that, for economic reasons, peering links often have limited capacity and that this can easily lead to poor performance, because Internet routing algorithms do not adapt to load. To make matters worse, routing protocols are subject to human errors, filtering, and intentional theft of routes. Tom discussed Internet security issues, working through an example of DNS hijacking. He made the point that virus and worm proliferation and DOS and botnet attacks are severe problems. In 2003, over 10% of PCs on the Internet were infected with viruses. These are not all home PCs: 83% of financial institutions were compromised, double the figure from 2002. Additionally, 17 out of 100 surveyed companies were the target of cyber extortion, and the number of botnet attacks against commercial sites is rising sharply. These problems are very hard to solve, because the Internet was designed around an assumption of trust that is no longer valid.

Tom then described Akamai’s on-demand infrastructure. It is made up of around 15,000 servers at 2400 locations on over 1000 networks in 70 countries; Akamai serves 10–15% of all Internet Web traffic each day. On average, Akamai can make small Web sites 15 times faster and large Web sites 2 to 3 times faster. Tom said that studies show that this translates directly into economic gain, e.g., a faster site for a top hotel generates an extra $30 million per year. The core idea of the infrastructure is to choose servers as close as possible to clients so as to avoid Internet problems. This helps because the Internet consists of more than 15,000 networks and none of them controls more than 5% of the total access traffic. Akamai’s SureRoute also finds alternative routes via intermediate Akamai servers when the network fails or performs poorly. It monitors roughly 40 alternative routes for each Web site, which improves performance by 30% on average.

Tom finished by highlighting the recent PITAC report on cyber-security, which calls for more investment in fundamental security research.

INTERNET ROUTING
Summarized by Ram Keralapura and Bob Bradley

Finding a Needle in a Haystack: Pinpointing Significant BGP Routing Changes in an IP Network
Jian Wu and Zhuoqing Morley Mao, University of Michigan; Jennifer Rexford, Princeton University; Jia Wang, AT&T Labs—Research

Morley Mao described a tool that monitors BGP (Internet routing) updates to find in real time a small number of high-level disruptions (such as flapping prefixes, protocol oscillations due to Multi-Exit Discriminators, and unstable BGP sessions). Unlike earlier research, it does not focus on finding the root cause of routing changes. The problem addressed is important because route changes are common and are associated with congestion and service disruptions; the hope is that operators can use notifications from the new tool to further mitigate the situation for users. It is challenging because there are many possible reasons for a given routing update, and multiple updates can originate from one underlying event, and it is difficult to decide which events are significant to operators.

The tool works by capturing BGP updates from border routers that peer with larger networks. This data is fed into a centralized system which processes the updates in real time. It groups the updates, classifies them into events, correlates the events, and then predicts traffic impact. A key difficulty is the large volume of BGP updates (there are millions daily). The discussion raised the issue of looking at data traffic directly, since significant events are by definition those that affect data traffic.

Design and Implementation of a Routing Control Platform
Matthew Caesar, University of California, Berkeley; Donald Caldwell, Aman Shaikh, and Jacobus van der Merwe, AT&T Labs—Research; Nick Feamster, MIT; Jennifer Rexford, Princeton University

The motivation for the authors was basic design issues in the iBGP protocol connecting routers within ISPs. Current full-mesh iBGP doesn’t scale, is prone to protocol oscillations and persistent loops when used with route reflection, and is hard to manage and difficult to develop. Their RCP approach attempts to address each of these problems by computing routes from a central point and removing the decisions from the routers. Use of a centralized system brings up the problem of single point of failure. The authors address this issue by replicating RCP at strategic network locations. They argue that, unlike route reflection, there will be no consistency issues that could potentially result in problems like forwarding loops. Matt argued that the RCP system has better scalability, reduces load on routers, and is easier to manage because it is configurable from a single point. It is also deployable, because it does not require changes to closed legacy router software. While RCP is only a first step at this stage, these properties may make it a practical way to improve Internet routing.

Negotiation-Based Routing Between Neighboring ISPs
Ratul Mahajan, David Wetherall, and Thomas Anderson, University of Washington

Today’s Internet is both a competitive and a cooperative environment, because ISPs are self-interested but carry traffic for each other. Each ISP independently decides how to route its traffic and optimizes for different points, and ISPs don’t share internal information. This can result in inefficient paths and unstable routes. Tom Anderson presented a negotiation model the authors developed to help solve these problems. It tries to find a point between cooperation and competition that limits the inefficiencies. ISPs assign preferences for routing options using an opaque range. They then exchange these preferences and take turns picking better routing options. They can reassign preferences when needed, and the process stops when either ISP wants it to. This strategy respects ISPs’ self-interest by allowing them to barter route choices according to their preferences (with each ISP losing a little on some flows and gaining more on others). ISPs have incentives to find good compromises because each stands to win overall and has no risk of losing. The goal is for both fair play and overall win-win results. The scheme was evaluated by simulation, which found that ISPs can achieve close to the socially optimal routing even though they must both win. Future work includes multiple-ISP negotiations.

The question of cheating came up in the discussion […] cheated. Another point of discussion was how well the scheme would work for traffic engineering, where preferences change depending on the load.

MODELS AND FAULTS
Summarized by Matthew Caesar

Detecting BGP Configuration Faults with Static Analysis
Nick Feamster and Hari Balakrishnan, MIT
Awarded Best Paper

Nick Feamster presented RCC, a router configuration checker that uses static analysis to detect faults in BGP configurations. Today, checking is highly ad hoc. Large configuration faults do occur and can cause major outages. Nick gave a taxonomy of faults. The goal of the RCC is to allow configurations to be systematically verified for correctness before being deployed. Correctness is defined in terms of two goals: path visibility (if there’s a path between two points, the protocol should propagate information about the path) and route validity (if there’s a route, there exists a path). RCC uses goals to produce a list of constraints and checks these constraints against the configurations. It was evaluated against configurations from 17 different ASes. It succeeded in uncovering faults without a high-level specification of the protocol. The major causes of errors were distributed configuration and the complexity of intra-AS dissemination (as configuration often expresses mechanism, not just policy). RCC is available online.

Q: Do large, well-run ISPs generate router instance configurations in a centralized manner? Would RCC provide any benefit in this case?

[…]

Q: What is the number of constraints you solved for most networks?

A: We used a fixed set of constraints resulting in a polynomial time algorithm.

IP Fault Localization via Risk Modeling
Ramana Rao Kompella and Alex C. Snoeren, University of California, San Diego; Jennifer Yates and Albert Greenberg, AT&T Labs—Research

Ramana Kompella presented SCORE, a tool that identifies the likely root causes of network faults, especially when they occur at multiple layers. Today, troubleshooting is ad hoc, with operators manually localizing faults reported via SNMP traps. This is challenging because alarms tell little about the failure; network databases can be corrupt or out-of-date, networks are highly layered (35% of the links have >10 components), and correlated failures can occur (e.g., a single fiber cut can take down several links). SCORE constructs a Shared Risk Link Group (SRLG) database that provides a mapping from each component to a set of links that will fail if the component fails. It manipulates this as a graph, using greedy approximations to find the simplest hypothesis to explain failure observations. SCORE also allows for imperfections (e.g., lost observations) with an error threshold. It performed well in practice: The accuracy was 95% for 20 failures; the misdiagnoses were due to loss of failure notifications and database inconsistencies. Ramana mentioned probabilistic modeling of faults and other domains (MPLS, and soft faults like link congestion) as future work.

Q: Would it be practical to use steady-state conditions to improve