2nd Symposium on Networked Systems Design and Implementation (NSDI '05)
Boston, Massachusetts
May 2–4, 2005

Keynote: The Challenges of Delivering Content and Applications on the Internet
Tom Leighton, MIT/Akamai

Summarized by Ningning Hu

Tom Leighton explained that Internet problems adversely affect current Web services. He pointed out that, for economic reasons, peering links often have limited capacity and that this can easily lead to poor performance, because Internet routing algorithms do not adapt to load. To make matters worse, routing protocols are subject to human errors, filtering, and intentional theft of routes. Tom discussed Internet security issues, working through an example of DNS hijacking. He made the point that virus and worm proliferation and DOS and botnet attacks are severe problems. In 2003, over 10% of PCs on the Internet were infected with viruses. These are not all home PCs: 83% of financial institutions were compromised, double the figure from 2002. Additionally, 17 out of 100 surveyed companies were the target of cyber extortion, and the number of botnet attacks against commercial sites is rising sharply. These problems are very hard to solve, because the Internet was designed around an assumption of trust that is no longer valid.

Tom then described Akamai's on-demand infrastructure. It is made up of around 15,000 servers at 2400 locations on over 1000 networks in 70 countries; Akamai serves 10–15% of all Internet Web traffic each day. On average, Akamai can make small Web sites 15 times faster and large Web sites 2 to 3 times faster. Tom said that studies show that this translates directly into economic gain, e.g., a faster site for a top hotel generates an extra $30 million per year. The core idea of the infrastructure is to choose servers as close as possible to clients so as to avoid Internet problems. This helps because the Internet consists of more than 15,000 networks and none of them controls more than 5% of the total access traffic. Akamai's SureRoute also finds alternative routes via intermediate Akamai servers when the network fails or performs poorly. It monitors roughly 40 alternative routes for each Web site, which improves performance by 30% on average.

Tom finished by highlighting the recent PITAC report on cyber-security, which calls for more investment in fundamental security research.

INTERNET ROUTING

Summarized by Ram Keralapura and Bob Bradley

Finding a Needle in a Haystack: Pinpointing Significant BGP Routing Changes in an IP Network
Jian Wu and Zhuoqing Morley Mao, University of Michigan; Jennifer Rexford, Princeton University; Jia Wang, AT&T Labs—Research

Morley Mao described a tool that monitors BGP (Internet routing) updates to find in real time a small number of high-level disruptions (such as flapping prefixes, protocol oscillations due to Multi-Exit Discriminators, and unstable BGP sessions). Unlike earlier research, it does not focus on finding the root cause of routing changes. The problem addressed is important because route changes are common and are associated with congestion and service disruptions; the hope is that operators can use notifications from the new tool to further mitigate the situation for users. It is challenging because there are many possible reasons for a given routing update, and multiple updates can originate from one underlying event, and it is difficult to decide which events are significant to operators.

The tool works by capturing BGP updates from border routers that peer with larger networks. This data is fed into a centralized system which processes the updates in real time. It groups the updates, classifies them into events, correlates the events, and then predicts traffic impact. A key difficulty is the large volume of BGP updates (there are millions daily). The discussion raised the issue of looking at data traffic directly, since significant events are by definition those that affect data traffic.

Design and Implementation of a Routing Control Platform
Matthew Caesar, University of California, Berkeley; Donald Caldwell, Aman Shaikh, and Jacobus van der Merwe, AT&T Labs—Research; Nick Feamster, MIT; Jennifer Rexford, Princeton University

The motivation for the authors was basic design issues in the iBGP protocol connecting routers within ISPs. Current full-mesh iBGP doesn't scale, is prone to protocol oscillations and persistent loops when used with route reflection, and is hard to manage and difficult to develop. Their RCP approach attempts to address each of these problems by computing routes from a central point and removing the decisions from the routers. Use of a centralized system brings up the problem of a single point of failure. The authors address this issue by replicating RCP at strategic network locations. They argue that, unlike route reflection, there will be no consistency issues that could potentially result in problems like forwarding loops. Matt argued that the RCP system has better scalability, reduces load on routers, and is easier to manage because it is configurable from a single point. It is also deployable, because it does not require changes to closed legacy router software.
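The centralized computation at the heart of the RCP idea can be illustrated with a toy sketch. The function names, route attributes, and tie-breaking order below are invented for illustration only; the real RCP implements the full BGP decision process and computes per-router route assignments.

```python
# Illustrative sketch of centralized route selection (RCP-style): collect the
# candidate routes per prefix in one place and pick a best route there,
# instead of having every router run the decision process itself.
# Attribute names and tie-breaking rules here are simplified assumptions.

def best_route(candidates):
    """Pick one route: higher local-pref wins, then shorter AS path,
    then lowest IGP cost as a final tiebreak."""
    return min(candidates,
               key=lambda r: (-r["local_pref"], len(r["as_path"]), r["igp_cost"]))

def compute_rib(routes_by_prefix):
    """Central computation: map each prefix to its single best route,
    which would then be pushed down to the client routers."""
    return {p: best_route(c) for p, c in routes_by_prefix.items()}

routes = {
    "10.0.0.0/8": [
        {"local_pref": 100, "as_path": [7018, 701], "igp_cost": 10},
        {"local_pref": 200, "as_path": [3356], "igp_cost": 30},
    ],
}
rib = compute_rib(routes)
print(rib["10.0.0.0/8"]["as_path"])  # the local-pref 200 route wins: [3356]
```

Doing this selection once, centrally, is what lets RCP sidestep the iBGP full-mesh and route-reflector consistency problems described above.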

;LOGIN: AUGUST 2005 NSDI '05 79

While RCP is only a first step at this stage, these properties may make it a practical way to improve Internet routing.

Negotiation-Based Routing Between Neighboring ISPs
Ratul Mahajan, David Wetherall, and Thomas Anderson, University of Washington

Today's Internet is both a competitive and a cooperative environment, because ISPs are self-interested but carry traffic for each other. Each ISP independently decides how to route its traffic and optimizes for different points, and ISPs don't share internal information. This can result in inefficient paths and unstable routes. Tom Anderson presented a negotiation model the authors developed to help solve these problems. It tries to find a point between cooperation and competition that limits the inefficiencies. ISPs assign preferences for routing options using an opaque range. They then exchange these preferences and take turns picking better routing options. They can reassign preferences when needed, and the process stops when either ISP wants it to. This strategy respects ISPs' self-interest by allowing them to barter route choices according to their preferences (with each ISP losing a little on some flows and gaining more on others). ISPs have incentives to find good compromises because each stands to win overall and has no risk of losing. The goal is for both fair play and overall win-win results. The scheme was evaluated by simulation, which found that ISPs can achieve close to the socially optimal routing even though they must both win. Future work includes multiple-ISP negotiations.

The question of cheating came up in the discussion. The authors explored simple cheating strategies and argued that there is little incentive to cheat, as the cheater often does less well than if he hadn't cheated. Another point of discussion was how well the scheme would work for traffic engineering, where preferences change depending on the load.

MODELS AND FAULTS

Summarized by Matthew Caesar

Detecting BGP Configuration Faults with Static Analysis
Nick Feamster and Hari Balakrishnan, MIT

Awarded Best Paper

Nick Feamster presented RCC, a router configuration checker that uses static analysis to detect faults in BGP configurations. Today, checking is highly ad hoc. Large configuration faults do occur and can cause major outages. Nick gave a taxonomy of faults. The goal of the RCC is to allow configurations to be systematically verified for correctness before being deployed. Correctness is defined in terms of two goals: path visibility (if there's a path between two points, the protocol should propagate information about the path) and route validity (if there's a route, there exists a path). RCC uses these goals to produce a list of constraints and checks the constraints against the configurations. It was evaluated against configurations from 17 different ASes. It succeeded in uncovering faults without a high-level specification of the protocol. The major causes of errors were distributed configuration and the complexity of intra-AS dissemination (as configuration often expresses mechanism, not just policy). RCC is available online.

Q: Do large, well-run ISPs generate router instance configurations in a centralized manner? Would RCC provide any benefit in this case?

A: Many ISPs run scripts from a centralized database, but many do not, and even with a centralized database there can be errors (e.g., bad copy/pastes).

Q: What is the number of constraints you solved for most networks?

A: We used a fixed set of constraints resulting in a polynomial-time algorithm.

IP Fault Localization via Risk Modeling
Ramana Rao Kompella and Alex C. Snoeren, University of California, San Diego; Jennifer Yates and Albert Greenberg, AT&T Labs—Research

Ramana Kompella presented SCORE, a tool that identifies the likely root causes of network faults, especially when they occur at multiple layers. Today, troubleshooting is ad hoc, with operators manually localizing faults reported via SNMP traps. This is challenging because alarms tell little about the failure; network databases can be corrupt or out-of-date, networks are highly layered (35% of the links have >10 components), and correlated failures can occur (e.g., a single fiber cut can take down several links). SCORE constructs a Shared Risk Link Group (SRLG) database that provides a mapping from each component to the set of links that will fail if the component fails. It manipulates this as a graph, using greedy approximations to find the simplest hypothesis to explain failure observations. SCORE also allows for imperfections (e.g., lost observations) with an error threshold. It performed well in practice: The accuracy was 95% for 20 failures; the misdiagnoses were due to loss of failure notifications and database inconsistencies. Ramana mentioned probabilistic modeling of faults and other domains (MPLS, and soft faults like link congestion) as future work.

Q: Would it be practical to use steady-state conditions to improve your results, e.g., if you assume the network is working correctly most of the time?

80 ;LOGIN: VOL. 30, NO. 4

A: You could inject faults into a network and test, but most ISPs wouldn't be willing to do that.

Q: You were able to uncover inconsistencies in the database. But isn't this circular: How do you know your inferences were correct if they come from an incorrect database?

A: You have to assume the database is reasonably accurate. Unfortunately, you can't just query the system to find out the IP/optical relations.

Performance Modeling and System Management for Multi-Component Online Services
Christopher Stewart and Kai Shen, University of Rochester

Online services that run on clusters in heterogeneous environments are difficult to model, predict, and manage. There is work on performance models to guide provisioning for single-component services, but it is not adequate when multiple components in the system can be replicated and interact with each other in complex ways. Christopher Stewart described a profile-driven approach to model system performance. It works at the OS level to profile key application characteristics transparently. They predict the resources required for individual components and transparently capture communications at the system-call level to model interconnections. Different models are then constructed for throughput and response time. The authors compared the resulting predictions with the actual system performance and found them to be accurate within 1%.

Q: Does it make sense to try real-time feedback to improve the model online?

A: Yes. We did it offline, but one could refine our approach in an online fashion.

Q: Have you considered bottlenecks in real machines' CPU/memory/network?

A: Yes, we do this in our model.

Q: What kinds of application behaviors would make your accuracy poor? For example, would caching effects reduce your accuracy?

A: We address caching and some other issues in the paper, but there could be interesting future work in that direction.

OVERLAYS AND DHTS

Summarized by Bernard Wong

Debunking Some Myths About Structured and Unstructured Overlays
Miguel Castro, Manuel Costa, and Antony Rowstron, Microsoft Research Cambridge

Popular file-sharing applications such as Gnutella use unstructured overlays that do not constrain links between nodes and rely on flooding to spread queries. To improve scalability, structured overlays constrain node and link placement so that queries can be resolved in O(log n) hops. However, people have claimed that structured overlays are unsuited for real-world applications given churn, heterogeneity, and complex queries. Miguel Castro's talk focused on debunking these myths using a trace-based simulation of Pastry and Gnutella 0.4. He showed how methods to handle heterogeneity that mimic unstructured techniques can be added to structured overlays. Similarly, he described structured flooding and random walks for complex queries.

Q: One advantage of unstructured overlays is that the overlay structure is decoupled from the service structure, allowing for reuse between services. Can you comment on this?

A: We could reuse structured overlays too. For example, we can carve out a part of a larger structured overlay for a single smaller service.

Q: How do heartbeats scale?

A: Heartbeats are sent at a fixed rate, independent of system size. The total overhead is fairly low.

Q: What are your thoughts on Mercury?

A: Mercury is a hybrid network with constrained routing that can solve complex queries. It emulates the functionality of an unstructured overlay, but cannot solve arbitrary queries, such as matching based on regular expressions.

Bandwidth-Efficient Management of DHT Routing Tables
Jinyang Li, Jeremy Stribling, Robert Morris, and M. Frans Kaashoek, MIT

Accordion is a DHT (distributed hash table) that addresses the trade-off between maintenance overhead and lookup performance. Reduced maintenance traffic leads to lower lookup performance due to churn, while aggressively maintaining neighbor freshness can be expensive in terms of bandwidth. Choices for maintenance frequency are often uninformed, since a priori knowledge of the churn rate is not usually available. Instead, Accordion relies on an outbound bandwidth budget to limit the amount of maintenance. It discovers new nodes, tracks the probability of a neighbor being dead (based on the lifetime of the neighbor and the time of its last communication), and removes those whose probability exceeds a fixed threshold. Compared with Chord and OneHop, Accordion achieves lower average lookup latencies for a given average bytes per node per second alive.

Q: Small-world properties are based on a neighbor distribution that is the inverse of the distance in the ID space. Would opportunistic neighbor discovery change the distribution and the properties?

A: If lookup keys are not uniform, then it is not guaranteed to yield small-world characteristics.

Q: What if nodes choose to behave maliciously in order to meet the bandwidth budget?

A: Accordion is not designed to work in a malicious environment.

Improving Web Availability for Clients with MONET
David G. Andersen, Carnegie Mellon University; Hari Balakrishnan, M. Frans Kaashoek, and Rohit N. Rao, MIT

The end-to-end availability of the Internet (95% and 99.6% in earlier studies) compares poorly to standard phone service. MONET aims to achieve 99.9% to 99.99% availability by exploiting the path and replica diversity that exists for Web downloads. It consists of an overlay of Squid Web proxies and a parallel DNS resolver. The key difficulty is that the number of paths through the overlay to all replicas can be large. This is good for diversity, but bad for overhead if all paths are to be explored. In MONET, a waypoint selection algorithm returns a set of paths separated by delays. These paths are the most likely to be successful, based on previous path history, and are explored in order to minimize overhead.

A six-site MONET has been deployed for two years with approximately 50 users per week. Its waypoint algorithm achieves availability that is similar to using all possible paths. This is 99.99%, if server failures are discounted. Also, Akamai sites have eight times more availability than non-Akamai sites if server failures are included in the availability metric. A challenge in gathering real measurements was the many incorrectly configured DNS and Web servers; consistently unreachable services were discounted in the measurements.

Q: When performing parallel connections, does MONET just perform the TCP connect, or does it download the object twice?

A: MONET just performs the TCP connect.

Q: Would MONET choose lossy but low-latency links?

A: Previous studies have shown that the first SYN packet is a good predictor of how long it will take to download the desired content over the connection.

STORAGE

Summarized by Kevin Walsh

Shark: Scaling File Servers via Cooperative Caching
Siddhartha Annapureddy, Michael J. Freedman, and David Mazières, New York University

Siddhartha Annapureddy presented Shark, a file system that is as convenient and familiar as NFS, yet scales to hundreds of clients and supports cross-file-system sharing. Pushing bundles of software to the nodes of a distributed system is wasteful, even with dissemination systems such as BitTorrent, because not all of the software may be needed. Instead, what is needed is the illusion that all files are located on every node, with the files being fetched only as needed. NFS provides these semantics but does not scale well, because a large number of clients cause delays at the central server. On the other hand, P2P file systems do scale, but have non-standard administrative models and new semantics, and so are not widely deployed. Shark combines both advantages by using a central server model together with very large cooperative client caches (to reduce redundant traffic at the server). One intriguing idea was to allow chunks of data to be shared across file systems, increasing the effective size of the cache. Several security concerns were discussed: clients need to be able to check integrity, eavesdroppers should not be able to see the contents, and cache sharing is somewhat in conflict with privacy. In one PlanetLab test, Shark retrieved a 40MB package in seven minutes, compared to 35 minutes for SFS. Another test revealed an eightfold improvement over NFS in the number of bits pushed through the network.

Q: How would a least-common-chunk fetch ordering policy, like BitTorrent's, compare with your sequential or random orderings?

A: We could do that, but we did not look at it yet.

Q: What consistency guarantees do you provide while things are being transferred?

A: We guarantee NFS-style consistency semantics at all times. This is done with leases at the central server.

Q: Your chunk cache indexes data only by the hash of the chunk. What do you do in case of hash collisions?

A: We assume there will be no hash collisions. This is the standard assumption for these scenarios.

Q: You showed scalability of Shark in terms of bandwidth, but the server is involved in each chunk transfer, no?

A: The authentication and session keys are between client and client, not client and server. The client must initially talk to the server to get chunk tokens, but then goes to clients to get chunks. This potentially uses many RPCs if the file is very large.

Glacier: Highly Durable, Decentralized Storage Despite Massive Correlated Failures
Andreas Haeberlen, Alan Mislove, and Peter Druschel, Rice University

A common assumption in distributed storage systems is that diversity is high because nodes use different OSes, applications, administrators, users, etc. This results in independent failure models, so that reliability comes from a small amount of replication. But these are unrealistic assumptions in practice: 70–80% of the OSes in use are Windows, and a virus or worm can lead to a correlated failure that spreads too rapidly even for reactive approaches to respond. So what can we do? Glacier's approach is to use massive redundancy to tolerate

correlated failure rather than try to predict and exploit correlations (like Phoenix and OceanStore). Of course, there is an upper bound on the maximum number of failures, but it is easier to pick this number than to specify a complete failure model.

The central question is whether this can be done with a reasonable amount of storage and bandwidth. Glacier uses erasure codes with a high degree (50 or 100 fragments per object, with only about 5 needed to recreate the object) and replicates data, too. Even during a correlated failure there should be enough fragments to reconstruct objects. A risk in Glacier is that objects may expire during a correlated failure. Also, the per-file overhead is especially large for small files.

Glacier was evaluated with a trace-driven workload and deployment with 17 users and 20 nodes based on FreePastry, PAST, Scribe, and Post. An artificial 58% correlated failure induced no losses of data at all. Glacier has yet to see any loss of data in deployment.

Q: In your test system, you use 5/48 encoding even though you had only 20 nodes. Couldn't you just use 5/20?

A: We wanted an idea of a realistic overhead. During the experiment, the size of the system grew and changed, and we felt 5/48 would be more realistic for a larger system.

Q: If you knew the size of the system, would you set the number of fragments to equal the number of nodes?

A: Normally there would be many more nodes than fragments.

Q: Won't the downtime constant cause poor performance because it is fixed and will be a poor choice sometimes?

A: The one-week figure came from an assumption that users could not do without email for more than a week. In reality, users were using more than one email system and sometimes let their node remain offline for more than a week. We have switched to four weeks, but perhaps could do something automatic.

Q: How do you assign fragments to nodes, and how do nodes know which fragments to store?

A: The assignment of fragments to nodes is done by the hash of the fragment. We divide the ring into 48 sections, and store at hash+1x, hash+2x, ..., hash+48x.

Q: How did you know not to store a fragment on the node that was down at the time of insertion?

A: The neighbors in the ring keep the pointer to the down node for one week, and can then report the node as being down whenever a message is destined for that node.

BUILDING NETWORK SERVICES

Summarized by Ashwin Sampath

Quorum: Flexible Quality of Service for Internet Services
Josep M. Blanquer, Antoni Batchelli, Klaus Schauser, and Rich Wolski, University of California, Santa Barbara

Internet services such as e-commerce tend to be clustered architectures in which it is important to provide acceptable levels of service to different kinds of customers. Current solutions either throw hardware at the problem (overprovisioning) or embed QoS logic in the application code. This is expensive either in terms of equipment or in reprogramming time. Josep Blanquer presented Quorum, which tackles these problems while being readily deployable. Quorum provides its QoS guarantees at the boundaries of an Internet site. This is an effective location to classify user requests into service classes and shape traffic based on the priorities of incoming requests, without delving inside the cluster. The authors show this by evaluating their solution on a 68-CPU cluster with the Teoma Internet search service alongside five other QoS architectures. They also examined the effects of sudden fluctuations in traffic and cluster node failures.

Trickles: A Stateless Network Stack for Improved Scalability, Resilience, and Flexibility
Alan Shieh, Andrew C. Myers, and Emin Gün Sirer, Cornell University

Today's client-server applications are built on TCP/IP, which stores per-connection state at both ends. This limits scalability (due to memory constraints) and leaves the server vulnerable to denial of service. Alan Shieh presented Trickles, a radical alternative in which the server state is moved to the clients. Each client supplies transport and user continuations along with its packets to request any computation. The server establishes a context based on these continuations, performs the requested computation, updates the associated state, and sends it back to the client along with the result of the computation. To make this work, the authors implemented an event-based server API. This design lends itself to efficient server load-balancing schemes and transparent server failover mechanisms, because clients establish contexts before issuing each computation request. A typical target application is a busy Web server. Alan presented an evaluation that showed the memory overhead of Trickles to be lower than TCP/IP's and the throughput rates to be comparable.

Designing Extensible IP Router Software
Mark Handley, University College, London/ICSI; Eddie Kohler, University of California, Los Angeles/ICSI; Atanu Ghosh, Orion Hodson, and Pavlin Radoslavov, ICSI

Everyone wants to fix BGP in some way (convergence, security, scalability), but the size of the routing infrastructure and expectations of 99.999% uptime make experiments with routing software almost

impossible. Mark Handley presented XORP, IP routing software designed for extensibility, latency, and scalability. XORP is based on an event-driven architecture with emphasis on quick processing and propagation of routing changes between processes. This lends itself to extensibility and experimentation, since each process is independent. XORP's BGP implementation is based on a data flow model, with routing tables implemented as processes that pass along routing updates. This differs from conventional router software designs, where all routing protocols process routing updates and store routes in a single large table. The trade-off is that the modular and robust design of XORP marginally increases memory usage but results in faster routing convergence. To show this, the authors tested the convergence times of Cisco (IOS), Quagga, and MRTD: Cisco and Quagga routers take up to 30 seconds to converge, while MRTD and XORP are consistently under one second.

WIRELESS

Summarized by Ashwin Bharambe

Using Emulation to Understand and Improve Wireless Networks and Applications
Glenn Judd and Peter Steenkiste, Carnegie Mellon University

Most wireless network studies are performed in simulation, which can be carefully controlled but misses many realistic factors. Glenn Judd proposed an emulation infrastructure to bridge the gap between simulation and real testbed evaluation. The basic idea is to use real wireless NICs at the sender and receiver and to control signal propagation through a customized FPGA. Analog signals from the sender are down-sampled and converted to digital format, processed by a DSP engine (built using the FPGA), converted back to analog format, and fed to the wireless antenna at the receiver. Glenn presented results validating the hardware. He also showed that different wireless cards from the same manufacturer and card family have surprisingly different RSSI and noise characteristics. In the discussion, it was suggested that Glenn compare the results of using the emulator with those of simulation models in simulators like QualNet and ns-2.

Geographic Routing Made Practical
Young-Jin Kim and Ramesh Govindan, University of Southern California; Brad Karp, Intel Research/Carnegie Mellon University; Scott Shenker, University of California, Berkeley/ICSI

Young-Jin Kim described the Cross-Link Detection Protocol (CLDP) for enabling geographic routing. Previous geographic routing (GPSR, Greedy Perimeter Stateless Routing) is based on face traversal with the right-hand rule. This needs a perfect planarization of the radio graph to operate correctly, and fails in practice due to irregular localization of wireless cards and radio-opaque obstacles. The previously proposed "mutual witness" fix also suffers from problems: It generates some additional cross-links and can result in collinear links as well. CLDP discovers and removes cross-links in a radio graph. It leaves some cross-links to prevent network partitions, but guarantees that face traversal will never fail. CLDP was evaluated using the TinyOS simulator with 200 nodes and 200 obstacles. It outperformed previous geographic routing protocols in terms of maintaining reachability and providing low stretch.

Q: Does CLDP work under dynamic conditions?

A: Yes, if the velocity of the nodes is limited.

Sustaining Cooperation in Multi-Hop Wireless Networks
Ratul Mahajan, Maya Rodrig, David Wetherall, and John Zahorjan, University of Washington

Maya Rodrig presented Catch, an add-on to multi-hop wireless routing protocols to deter "free-riding," in which nodes use the network but decline to forward packets. The protocol first detects free-riding behavior, then leverages the majority of "good" nodes to punish the "bad" node. The key idea was to send anonymous probes to which neighbors must respond. This forces a potentially bad node in the network to reveal its connectivity to everybody. Furthermore, packets relayed by a node can be overheard, due to the broadcast nature of the medium. Detection thus boils down to checking whether more data packets (which were meant to be forwarded) are dropped as compared to the anonymous probe responses. The protocol also incorporates a strategy based on one-way hash functions to enable neighbors to punish a misbehaving node. Handling attacks based on signal strengths is future work.

Q: What about Sybil attacks?

A: Catch builds on unforgeable identities for nodes.

Q: Can you falsely accuse a "good" node?

A: Yes, in which case Tit-for-Tat retaliates.

;LOGIN: VOL. 30, NO. 4, AUGUST 2005

SYSTEM MANAGEMENT AND CONFIGURATION

Summarized by Sherif Khattab and Dushyant Bansal

ACMS: The Akamai Configuration Management System
Alex Sherman, Akamai Technologies and Columbia University; Philip A. Lisiecki and Andy Berkheimer, Akamai Technologies; Joel Wein, Akamai Technologies and Polytechnic University

Akamai’s CDN (Content Delivery Network) serves Web content using 15,000+ edge servers deployed in 1,200+ ISPs. Its configuration information comes from Akamai customers, who want to control how their content is served via hundreds of parameters (e.g., cache TTL, allow lists, cookie management), and from internal Akamai services such as mapping and load balancing. Alex Sherman presented ACMS, a system for the timely, reliable delivery of dynamic configuration files in this setting. ACMS is composed of front ends that accept, store, and synchronize configuration file submissions, and back ends that deliver configuration files to edge servers. It uses a quorum-based protocol for agreement and synchronization among the front ends. Recovery is optimized using snapshots, a hierarchical versioning structure. Edge servers download configuration files via Akamai’s CDN with hierarchical caching. ACMS is divided into zones that are tested incrementally to avoid systemwide effects from bad configuration files. During the first nine months of 2004, 36 network failures affected the front ends, and over six months of 2004 there were three recorded instances of file corruption; ACMS continued to work successfully throughout. It took about two minutes to submit and deliver most configuration files. An audience member asked about TTLs versus cache invalidation. Sherman responded that the TTL technique is easier and tolerates propagation delays; for some cases, however, Akamai uses cache invalidation.

The Collective: A Cache-Based System Management Architecture
Ramesh Chandra, Nickolai Zeldovich, Constantine Sapuntzakis, and Monica S. Lam, Stanford University

About 30,000 desktops are infected every day, and downtime and confidentiality breaches translate into monetary damage. Ramesh Chandra presented the Collective, a cache-based system to improve the management of desktop PCs. It trades customizability for manageability through centralized management and distributed computation. The Collective introduces the concept of a virtual appliance, an encapsulation of system state (OS, shared libraries, and installed applications). Examples include Windows XP, GNU/Linux with NFS, and GNU/Linux with a local disk. Appliances are stored in appliance repositories editable only by administrators, whereas user state (user preferences and data) is stored in data repositories. In the Collective, software updates are atomic and dependable. Caching provides support for disconnected operation, a useful feature for mobile users: Chandra described a USB memory stick carrying appliances and data. A prototype of the Collective has been used on a daily basis at Stanford for about a year. Users find the system simple, with low virtualization overhead; from a 15-day block read trace, 80% of requests were for 20% of the data. Answering a question from the audience, Chandra identified graphics applications and 3-D games as unsuitable for use in the Collective.

Live Migration of Virtual Machines
Christopher Clark, Keir Fraser, and Steven Hand, University of Cambridge Computer Laboratory; Jacob Gorm Hansen and Eric Jul, University of Copenhagen; Christian Limpach, Ian Pratt, and Andrew Warfield, University of Cambridge

It takes about eight seconds to move the memory of a Virtual Machine (VM) within a machine cluster running Xen with networked storage, good connectivity, and support for L2 or L3 traffic redirection. Meanwhile, live interactive applications, such as Web servers, game servers, and quorum protocols, have soft real-time requirements. Ian Pratt presented a technique for relocating interactive VMs with downtime as low as 60ms. It uses iterative, rate-limited pre-copy of VM memory while the VM continues to run. Pre-copy is more effective than on-demand page faulting and leaves no “residual dependencies” on the original host. Pratt introduced the concept of the Writable Working Set (WWS) of a VM: its hot pages, such as process stacks and network receive buffers. The size and dirtying rate of the WWS are crucial in determining the number and rate of pre-copy iterations. Pratt also presented results for relocating a Web server running the SPECweb benchmark, a Quake 3 game server, and a synthetic worst case with rapid page dirtying.

SECURITY

Summarized by Robert Picci

Botz-4-Sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds
Srikanth Kandula and Dina Katabi, MIT; Matthias Jacob, Princeton University; Arthur Berger, MIT/Akamai

Awarded Best Student Paper

Srikanth Kandula focused on CyberSlam attacks, in which an attacker harnesses potentially hundreds of thousands of “bots” spread
across the Internet to take down a Web site. The key feature of these attacks is that they attempt to exhaust resources on the server by making requests that are indistinguishable from those of legitimate clients. Srikanth presented a novel defense based on CAPTCHAs, the graphical reverse Turing tests used to prevent automated account signup. When a CyberSlam attack or flash crowd is detected, the system starts using CAPTCHAs to distinguish legitimate users from attackers; they are served without per-client state at the server. Once it has learned which clients are the attackers (they cannot solve CAPTCHAs), the system switches into a mode where known attackers are kept out and new users are allowed in without CAPTCHA tests. Admission control is also used to balance system resources between authenticating new users and serving those who have proven themselves legitimate. This improves server responsiveness not only under attack but also under flash crowds.

Cashmere: Resilient Anonymous Routing
Li Zhuang and Feng Zhou, University of California, Berkeley; Ben Y. Zhao, University of California, Santa Barbara; Antony Rowstron, Microsoft Research, UK

Cashmere addresses some weaknesses in existing anonymous routing by using a structured overlay (in this case, FreePastry). Li Zhuang began with the basic idea of secure anonymous routing: packets are sent to their destinations through a series of intermediaries such that no one but the sender knows the entire path; cryptography is used to hide routing information from nodes as well as to protect the message contents. Without massive collusion, no one knows who sent the packet, and only the receiver can see its contents. However, with earlier schemes, failed intermediaries can reduce reliability, and the cryptography can be expensive. Cashmere deals with failures by exploiting the overlay to route each packet to a group of nodes rather than a single node, which makes it more likely for packets to get through when there is churn. Cashmere reduces the amount of per-packet cryptographic computation by decoupling the payload from the routing information. Session keys and lightweight symmetric ciphers can then be used instead of public-key cryptography.

SENSOR NETWORKS

Summarized by Rebecca Braynard

Decentralized, Adaptive Resource Allocation for Sensor Networks
Geoffrey Mainland, David Parkes, and Matt Welsh, Harvard University

Matt Welsh talked about controlling sensor network resources in a distributed manner by using market prices. Nodes determine their actions using a globally known reward, local available energy, and data dependencies. These actions include listening for incoming messages and taking sensor readings. The algorithm is motivated by the example of tracking a tank in a field of sensors and is evaluated through a 100-node simulation with the metrics of accuracy, energy consumption, and energy efficiency. The mechanism uses less energy to track an object and is more effective at adapting to changing conditions. The authors plan to develop richer models that extend allocation across multiple users and queries and adjust reward settings during runs.

Q: Can the pattern of movement lead to dead nodes?
A: The energy budget of a node limits its consumption.

Q: Since nodes have a local view, can they get caught in a “busybody” situation?
A: Yes, nodes can get caught in loops, and feedback is needed.

Q: With a TinyOS model you can meet resource allocation guarantees. You can’t with your approach. Which is better?
A: Periodic duty cycling is good for some applications, but not all.

Beacon Vector Routing: Scalable Point-to-Point Routing in Wireless Sensornets
Rodrigo Fonseca, Cheng Tien Ee, David Culler, and Ion Stoica, University of California, Berkeley; Sylvia Ratnasamy, Intel Research; Jerry Zhao, ICSI; Scott Shenker, University of California, Berkeley/ICSI

Rodrigo Fonseca presented BVR, a simple routing protocol that uses only local state and does not depend upon geographic locations. Instead, BVR creates a virtual coordinate space from connectivity information. In the algorithm, r nodes are chosen to be beacons, and the remaining nodes find their distances to the beacons. To transmit a packet, a node uses the destination’s coordinates to route it through the neighbor closest to the destination (a greedy algorithm). If the nodes reach a local minimum, they send the packet toward the beacon closest to the destination. If the greedy algorithm still does not work, the packet is flooded through the network. BVR was evaluated with a high-level simulation (3,200 nodes), an implementation on Mica2 motes, and a low-level simulator, TOSSIM. It was found to outperform a greedy geographic routing protocol.

Q: Will the beacons be running out of power prematurely?
A: Not necessarily: the data does not go through the beacons, so they may not consume more power.

Active Sensor Networks
Philip Levis and David Culler, University of California, Berkeley; David Gay, Intel Research

Phil Levis argued that sensor networks cannot realize their potential given the energy consumption
associated with existing frameworks. Sensor networks often need to be reprogrammed after deployment, as it is not efficient to collect all the data and process it offline. Yet they do not need general-purpose reprogramming, since the networks are application-specific. Instead, an application-specific virtual machine (ASVM) can be used. This leverages the trade-off that a sensor node can perform many cycles of computation for the energy cost of each bit sent or received. ASVMs provide a flexible, simple, and efficient infrastructure for programming devices. (See the paper for details on their design.) To show their effectiveness, Phil compared the original and the VM implementations of a region library (Regions Fiber) and a query library (TinyDB/TinySQL) on a 42-node testbed.

Q: To provide concurrency, the Banker’s algorithm is used; does this create a disadvantage for allocating resources?
A: It is a conservative approach and a drawback. To reduce the impact, programmers should use short-running handlers.

Q: In many projects, the work is to overcome small amounts of memory. Given Moore’s Law, should this work be focused on energy consumption instead of memory management?
A: Memory is limited by energy; this will affect how much memory is available and how it is used.
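The ASVM idea of reprogramming nodes by shipping a few bytes of application-specific bytecode, rather than a full binary image, can be sketched as a tiny stack interpreter. The opcodes and the sensor/radio stubs below are hypothetical illustrations, not the instruction set from the actual design:

```python
# Minimal sketch of an application-specific VM: the network retasks a
# node by sending a short bytecode program instead of a whole binary.
# Opcode names and the sensor/radio stubs are hypothetical.

def run(bytecode, read_sensor, send):
    """Interpret a tiny stack-based program on one 'node'."""
    stack = []
    for op in bytecode:
        if op == "SENSE":            # push a sensor reading
            stack.append(read_sensor())
        elif op == "HALVE":          # example application-specific op
            stack.append(stack.pop() / 2)
        elif op == "SEND":           # radio out the top of the stack
            send(stack.pop())

sent = []
# Retasking the node costs three opcodes over the radio, not a binary:
run(["SENSE", "HALVE", "SEND"], read_sensor=lambda: 42, send=sent.append)
```

The energy argument is visible in the sketch: interpreting a few opcodes costs CPU cycles, which are cheap relative to the radio bits a full binary reinstall would require.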
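The beacon-vector forwarding rule from Fonseca’s BVR talk (greedy descent on beacon-hop-count coordinates, with a fallback toward a beacon and, last, scoped flooding) can also be sketched. The distance metric below is a deliberate simplification of the paper’s; coordinates are tuples of hop counts to each beacon:

```python
# Sketch of BVR greedy forwarding over virtual coordinates: each node's
# coordinate is its hop distance to every beacon. The sum-of-absolute-
# differences metric here is a simplification of the paper's metric.

def dist(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def next_hop(node_coord, neighbor_coords, dest_coord):
    """Pick a neighbor strictly closer to the destination, if any.
    None signals a local minimum: the caller then routes toward the
    beacon closest to the destination, and floods as a last resort."""
    best = min(neighbor_coords, key=lambda c: dist(c, dest_coord))
    if dist(best, dest_coord) < dist(node_coord, dest_coord):
        return best
    return None

# A node at coordinate (2, 3) relative to two beacons, with two neighbors:
hop = next_hop((2, 3), [(1, 3), (3, 4)], dest_coord=(0, 3))
```

Only neighbor coordinates are consulted, which is why the protocol needs no geographic positions and keeps purely local state.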

