conference reports

THANKS TO THE SUMMARIZERS

2006 USENIX ANNUAL TECH: Marc Chiarini, Rik Farrow, Wei Huang, John Jernigan, Scott Michael Koch, Kiran-Kumar Muniswamy-Reddy, Partho Nath, Aameek Singh, Chris Small, Yizhan Sun
SRUTI '06: John Bethencourt, Balachander Krishnamurthy, Anirudh Ramachandran
EVT '06: Aaron Burstein, Sarah P. Everett, Dan Sandler, Ka-Ping Yee

SUMMARIES

2006 USENIX ANNUAL TECHNICAL CONFERENCE
2ND WORKSHOP ON STEPS TO REDUCING UNWANTED TRAFFIC ON THE INTERNET (SRUTI '06)
2006 USENIX/ACCURATE ELECTRONIC VOTING TECHNOLOGY WORKSHOP (EVT '06)

2006 USENIX Annual Technical Conference
Boston, MA
May 30–June 3, 2006

KEYNOTE: PLANETLAB: EVOLUTION VS. INTELLIGENT DESIGN IN PLANETARY-SCALE INFRASTRUCTURE
Larry Peterson, Professor and Chair, Department of Computer Science, Princeton University; Director, PlanetLab Consortium
Summarized by Yizhan Sun

PlanetLab is a global platform for evaluating and deploying network services. It currently includes 670 nodes, spanning 325 sites and 35 countries, and has more than 3 million users. PlanetLab hosts many kinds of services, including file transfer, routing, DNS, multicast, Internet measurement, and email.

Larry Peterson summarized design requirements for the PlanetLab architecture:

1. It must provide a global platform that supports both short-term experiments and long-running services.
2. It must be available now, even though no one knows for sure what "it" is. In other words, we must deploy the existing system and software.
3. We must convince sites to host nodes running code written by unknown researchers from other organizations. This requirement is satisfied by building a relationship between users and service providers through the trusted PLC (PlanetLab Consortium).
4. Sustaining growth depends on support for site autonomy and decentralized control.
5. It must scale to support many users with minimum resources available.

Peterson explained that they favor evolution over a clean slate, and design principles over a fixed architecture. Design principles include:

- leverage existing and interfaces
- keep VM monitor and control plane orthogonal
- exploit virtualization
- give no one root (no more privilege than necessary)
- support federation

VIRTUALIZATION
Summarized by Marc Chiarini

Antfarm: Tracking Processes in a Virtual Machine Environment
Stephen T. Jones, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin, Madison

Stephen Jones presented an approach that allows virtual machine monitors (VMMs) to track the existence and activities of guest OS processes. Process-aware VMMs are better able to implement traditional OS services such as I/O scheduling, which leads to improved performance over host and guest OS implementations. The main advantages of the authors' approach are threefold: (1) the VMM does not require detailed knowledge of the guest's internal architecture or implementation; (2) no changes to the guest OS are necessary (a big win in the case of legacy or closed-source components); and (3) accurate inference of process events incurs a very low overhead (2.5% in their worst-case scenario). The team implemented and evaluated their techniques on both x86 and SPARC architectures with the Xen VMM hosting and the Simics full-system simulator hosting Windows.
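As a rough illustration of the process-inference idea, the following toy model encodes the x86 rules that this summary goes on to describe: a previously unseen CR3 value implies process creation, a change of CR3 implies a context switch, and an empty address space followed by a TLB flush implies an exit. It is only a sketch under stated assumptions. The class name, method names, and event tuples are hypothetical, and the real system observes hardware state from inside the VMM rather than being called explicitly.

```python
class ProcessTracker:
    """Toy model of CR3-based process inference (all names are hypothetical)."""

    def __init__(self):
        self.known = set()    # page-directory addresses seen so far
        self.current = None   # address space believed to be running
        self.events = []      # inferred events, in order of inference

    def on_cr3_load(self, cr3):
        """Handle a guest write to CR3."""
        if cr3 not in self.known:
            # A previously unseen page directory implies a new process.
            self.known.add(cr3)
            self.events.append(("create", cr3))
        if cr3 != self.current:
            # A different page directory implies a context switch.
            self.events.append(("switch", cr3))
            self.current = cr3

    def on_empty_address_space_and_tlb_flush(self, cr3):
        """Handle the case where an address space has no mapped pages
        left and the TLB has been flushed: infer a process exit."""
        if cr3 in self.known:
            self.known.discard(cr3)
            self.events.append(("exit", cr3))
            if self.current == cr3:
                self.current = None


def demo():
    tracker = ProcessTracker()
    tracker.on_cr3_load(0x1000)  # first process appears
    tracker.on_cr3_load(0x2000)  # second process appears
    tracker.on_cr3_load(0x1000)  # back to the first: switch only
    tracker.on_empty_address_space_and_tlb_flush(0x2000)
    return tracker.events
```

Calling `demo()` returns the inferred event sequence for two processes, one of which exits. Note that the model never consults guest data structures, which is the point of the approach.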

Jones described the mechanism by which a VMM can detect process creation, destruction, and context switches in the guest. On x86, Antfarm tracks the contents of the privileged CR3 register, which points to the page directory for the process currently running in the guest. When the CR3 changes to any of a particular range of values, it can be inferred that a context switch has occurred. If the CR3 is loaded with a previously unseen value, it can further be inferred (and the VMM can track) that a new process has been created. The VMM makes two more observations to determine whether a process has been destroyed: Windows and Linux systematically clear nonprivileged portions of page table pages before reusing them; the TLB must also be flushed once an address space has been deallocated. If the VMM determines that the number of assigned pages in a process's address space has gone to zero and that the TLB has been flushed (by loading CR3 with a special value), the VMM can rightly infer a process exit. Similar techniques are available for SPARC architectures. Jones concluded with a case study of Antfarm's performance improvements for an anticipatory disk scheduler: By understanding which disk I/O requests come from which guest processes, a scheduler can try to optimize requests across all processes in all guests.

;LOGIN: VOL. 31, NO. 5, OCTOBER 2006, CONFERENCE SUMMARIES

Optimizing Network Virtualization in Xen
Aravind Menon, EPFL; Alan L. Cox, Rice University; Willy Zwaenepoel, EPFL
Awarded Best Paper!

Aravind Menon presented three modifications to the Xen architecture that significantly improve its network performance. First, the approach implements high-level network offload features for guest domain interfaces, including scatter/gather I/O, TCP/IP checksum, and TCP segmentation offload; second, the performance of the I/O channel between the guest and network driver domains is enhanced; last, the VMM is modified to allow guest OSes to use efficient virtual memory primitives, including superpage and global page mappings.

After a brief overview of the Xen Network Virtualization Architecture, Menon discussed the team's optimizations in detail. He noted that 60–70% of the processing time for transmit/receive operations is spent in the I/O channel and bridging within the driver domain. To help combat this bottleneck, an offload driver is inserted just before the NIC driver on the path to the physical NIC. This driver implements in software whichever offload features are not already implemented on the NIC. A 4x, 2.1x, and 1.9x reduction in execution cost was achieved in the guest domain, driver domain, and Xen VMM, respectively.

Menon and his team also attacked the mechanisms used to transfer packets over the I/O channel between the guest and driver domains. They found that the current technique of page remapping for each network packet is not necessary in many cases. Using simple methods, such as data copying, packet header investigation, and MTU-sized socket buffers, the team achieved a 15.7% and 17% improvement in transmission and reception, respectively, across the I/O channel.

Overall, the optimizations explored in the research improved the transmit throughput in guest domains by a factor of 4.4 and the receive throughput in the driver domain by 35%. The team needs to do further research to determine effective techniques for improving receive performance in the guest domain. In the Q&A, Mike Swift asked about other common network optimizations and how they may be applicable to this work. Menon responded that more offload features may be useful but their benefit has not yet been studied.

High Performance VMM-Bypass I/O in Virtual Machines
Jiuxing Liu, IBM T.J. Watson Research Center; Wei Huang, The Ohio State University; Bulent Abali, IBM T.J. Watson Research Center; Dhabaleswar K. Panda, The Ohio State University

Jiuxing Liu presented a new device virtualization model, VMM-bypass I/O, that allows guest OSes to perform time-critical I/O operations without diverting through the VMM or other specialized I/O VMs. The problems with those two conventional techniques are manifest when one considers that every I/O operation involves the VMM, making it a potential bottleneck. Additionally, the second technique results in expensive context switches. The key to VMM-bypass I/O is a "guest module" device driver, installed in guest VMs, that handles setup and management operations for direct I/O. These modules communicate with "backend modules" within either the VMM or a privileged device driver VM. Co-located with backend modules are the original privileged modules that know how to make requests to intelligent I/O devices.

Liu went on to describe the InfiniBand architecture and the design and implementation of Xen-IB, their InfiniBand virtualization driver for Xen. InfiniBand is a high-speed interconnect to various devices that supports OS-bypass, allowing processes in a host OS to communicate (semi-)directly with the hardware. The prototype built by the research team supports all privileged InfiniBand operations, including initialization, resource management, memory registration, and event handling. Liu gave a rundown of the InfiniBand cluster used as a testbed, consisting of Xen 3.0 running RedHat AS4 on Intel Xeon machines. Comparisons were presented between native InfiniBand and Xen-IB for latency and bandwidth (negligible differences), event/interrupt handling (10 to 25 µs of overhead introduced), memory registration (25–35% overhead), IP over InfiniBand (<10% throughput degradation for messages larger than 16KB), and MPI bandwidth and latency benchmarks (negligible differences).

Finally, Liu discussed some remaining challenges. Providing a complete and efficient bypass environment requires addressing several important issues, such as safe device access, QoS among competing VMs, and VM check-pointing and migration. The team is eyeing some directions they think will be fruitful. The prototype can be downloaded at http://xenbits.xensource.com/ext/xen-smartio.hg.

INVITED TALK

Deploying a Sensor Network on an Active Volcano
Matt Welsh, Harvard University
Summarized by John Jernigan

Matt Welsh related his experiences during two deployments of sensor arrays on volcanoes in Ecuador. He explained that the arrays could potentially provide civil authorities with warnings of volcanic activity and help mitigate hazards. The research team spread "motes" (small and inexpensive wireless sensors) in a swath around the volcano and measured seismic and acoustic activity in real time. By comparison, traditional methods of seismic data logging involve manual collection of data from the field site, which can be very remote and difficult to reach. In addition, wireless sensors can cover a larger amount of terrain because of lower costs.

The basic system involved the motes, synchronized by GPS timestamps and sophisticated algorithms, propagating data over an ad hoc mesh network to a radio modem that communicated with a base station many kilometers away. Deploying wireless sensors presents significant technological challenges, however, as high sampling rates generate huge amounts of data, and maintaining accurate timing of captured events is absolutely critical for use by seismologists.

He indicated that node reliability was one of the largest concerns. Ironically, it was not the sensor network that failed often in the deployment, but the base station at the observatory, where a laptop would experience sporadic electrical outages. Logistical issues and bad luck seemed to overshadow the technological acclaim of the sensors as system uptime sank. Seismologists working with the research team lost confidence in the data set as a result of reliability issues. Nevertheless, the data set could be cleaned up and analyzed using external validation techniques, including data from third-party data-logging stations.

Some of the lessons learned from the deployments were as follows: Accurate timing of captured events must be the first priority. The goals of the computer scientists and the seismologists were sometimes disparate, and this affected the usefulness of the gathered data. Nodes should have been collocated with existing data-logging stations for later verification of data. Finally, never take for granted that you can simply find an electrical outlet when you need one!

The next steps for the technology involve nailing down timing issues so that earthquakes can be localized in real time, and utilizing 3D mapping techniques to map the inside of volcanoes.

STORAGE
Summarized by Wei Huang

Provenance-Aware Storage Systems
Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Margo Seltzer, Harvard University

Kiran-Kumar Muniswamy-Reddy presented a provenance-aware storage system (PASS). In the context of his work, provenance refers to the information that describes data in sufficient detail to facilitate reproduction and enable validation of results. Kiran-Kumar started his talk with several use cases for provenance-aware storage, such as applications in homeland security, archiving, and business compliance, where accessing the history of files may be critical to end users. However, as Kiran-Kumar pointed out, support for provenance is very limited in file systems. Most of the current solutions are domain-specific, which may cause the data and the provenance to be out of sync. And in many cases the solutions are simply lacking.

Kiran-Kumar argued for the importance of PASS, which keeps the data and the provenance tightly bound and provides transparent management. He then introduced their design of PASS. In their design, the collector records the provenance data or events and passes the records to the file system. The storage layer, which is a stackable file system called PASTA, uses an in-kernel database engine to store the metadata. And the query tool makes the provenance accessible to users. Kiran-Kumar showed that their implementation had reasonable overhead on applications, both spatially and temporally.

Kiran-Kumar concluded his talk with several research challenges they have encountered in their prototype study, such as finding suitable security models, pruning provenance, and supporting network-attached storage. In the Q&A session, Kiran-Kumar was asked whether there are any micro-benchmark evaluations for PASS. He indicated that micro-benchmarks of small-file operations show overheads of up to 100–200%. However, since most applications do not access the storage system that often, the overhead is usually acceptable for applications.

Thresher: An Efficient Storage Manager for Copy-on-write Snapshots
Liuba Shrira and Hao Xu, Brandeis University

Thresher targets BITE (Back-In-Time Execution) applications that take snapshots of the past state, inspect the snapshots with BITE, and retain snapshots deemed interesting for an unlimited time for future analysis. Liuba started her talk with a discussion of why today's snapshot systems are inadequate for BITE applications. She pointed out that it is critical to provide applications with the ability to discriminate among snapshots, so that valuable snapshots can be retained while the less valuable ones can be discarded or moved offline, because although disk space is cheap, administration of storage becomes costly.

In the second part of the talk, Liuba introduced Thresher, a snapshot storage manager based on new copy-on-write snapshot techniques. Thresher is the first to provide applications with the ability to discriminate among snapshots efficiently. Liuba focused on two important concepts in Thresher: discrimination and segregation. Applications discriminate among snapshots by ranking them according to their importance. The storage manager segregates differently ranked snapshots efficiently, so that higher-ranked snapshots can be accessed faster and lower-ranked snapshots can eventually be discarded without affecting the accessibility of higher-ranked ones and without disk fragmentation.

The lazy segregation technique allows the rank of snapshots to be specified after the snapshots are taken, enabling BITE-based ranking. Liuba focused on the diff-based segregation technique and the optimizations for low-cost reclamation and faster access to snapshots. Liuba concluded her talk with a performance evaluation of Thresher. She showed that lazy segregation and faster snapshots can be implemented with very low performance overhead, allowing a huge reduction in storage requirements for snapshots.

Design Tradeoffs in Applying Content Addressable Storage to Enterprise-scale Systems Based on Virtual Machines
Partho Nath, Penn State University; Michael A. Kozuch, Intel Research Pittsburgh; David R. O'Hallaron, Jan Harkes, M. Satyanarayanan, Niraj Tolia, and Matt Toups, Carnegie Mellon University

Partho Nath presented their experience in applying Content Addressable Storage (CAS) to enterprise-scale systems based on virtual machines. Partho first described the Internet suspend/resume (ISR) client-management system, which is the execution environment at which their work is targeted. ISR is a virtual-machine-based client management system. It stores the user execution environments as parcels, which are the complete VM images, including memory and disk snapshots. Different versions of parcels are stored in a lossless manner.

Partho then asked two questions, both of which are answered by their evaluations in this paper: Can content-addressable storage reduce the (1) storage and (2) network requirements in ISR systems? And, if so, by how much? Their evaluation consisted of three dimensions: the policies, the chunk size, and gzip compression. They evaluated three policies for managing the parcels: the non-CAS baseline policy ("delta"), which stores different versions of parcels for each user as the diff of the previous version; the intra-parcel policy, where each parcel is represented by a separate pool of unique chunks shared by all versions from the same user; and the ALL policy, where all parcels for all users are represented by a single pool of chunks. Gzip can be used to further compress the data.

Partho showed their evaluation results. He pointed out that adopting CAS in the storage system significantly reduces storage requirements, especially when using the most relaxed policy (ALL). And within CAS policies, using smaller chunks works best in spite of metadata overheads. Another important observation is that CAS policies alone can consume less storage than a non-CAS policy with gzip compression, which avoids the expensive compression operations. In response to a question on the performance overhead of hash calculation for CAS policies, Partho indicated that they had not experienced any noticeable slowdown for hash calculations.
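The difference between the intra-parcel and ALL policies described above can be illustrated with a small chunk-store sketch. It assumes fixed-size chunks and SHA-1 digests as chunk names; neither detail is given in the summary, and the class and function names are hypothetical rather than taken from ISR.

```python
import hashlib


def split_chunks(data: bytes, size: int):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkPool:
    """A pool of unique chunks addressed by their content digest."""

    def __init__(self):
        self.store = {}  # digest -> chunk bytes

    def put(self, data: bytes, chunk_size: int):
        """Store data and return its recipe (ordered list of digests)."""
        recipe = []
        for chunk in split_chunks(data, chunk_size):
            digest = hashlib.sha1(chunk).hexdigest()
            self.store.setdefault(digest, chunk)  # keep one copy per digest
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Rebuild the original data from a recipe."""
        return b"".join(self.store[d] for d in recipe)

    def stored_bytes(self):
        return sum(len(c) for c in self.store.values())


def compare_policies(parcels, chunk_size):
    """parcels maps a user to a list of parcel versions (bytes).
    Returns (bytes stored with one pool per user, bytes stored with one shared pool)."""
    per_user = {user: ChunkPool() for user in parcels}
    shared = ChunkPool()
    for user, versions in parcels.items():
        for version in versions:
            per_user[user].put(version, chunk_size)
            shared.put(version, chunk_size)
    intra = sum(pool.stored_bytes() for pool in per_user.values())
    return intra, shared.stored_bytes()


example = {
    "alice": [b"A" * 8 + b"B" * 8, b"A" * 8 + b"C" * 8],
    "bob": [b"A" * 8 + b"D" * 8],
}
```

With 8-byte chunks, `compare_policies(example, 8)` stores 40 bytes under per-user pools and 32 bytes under a single shared pool, because the chunk common to both users is kept only once. The metadata cost of the recipes, which the talk notes grows as chunks shrink, is not modeled here.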

INVITED TALK

Panel: Open Source Software Business Models
Mike Olson, Oracle, Sleepycat; Brian Aker, MySQL; , , Ximian
Moderator: Stephen Walli, Optaros, Inc.
Summarized by Scott Michael Koch

The discussion began with each panelist sharing his opinions and experiences with OSS. Although the panel agreed that OSS businesses can be very successful, Miguel felt that giving away your company's product for free was a risk, and he does not recommend starting a business of this type. The panel seemed to agree that there are only certain circumstances in which an OSS business can have success. Mike reminded us that it is hard to start any sort of business, and Brian added that selling any sort of software, whether proprietary or open source, today is like "setting up a tip jar" in that you just hope that enough people are willing to pay for your software. Brian felt that, if you want to make money, a service and support model or an ASP model makes the most successful long-term option, instead of trying to sell a binary. It was pointed out that people are becoming very comfortable with the subscription model. Miguel felt that the model of building a proprietary server with free clients was the way to go. Everyone agreed that for the traditional model of selling OSS to be successful, it was key to find a niche in the market where your product was something that everyone needed. Mike summarized this well by saying that using open source software can be successful as a tactic if it supports your overall strategy as a business.

The panelists then went on to talk about the interactions and relations with the communities that surrounded their respective businesses. Mike explained that the community surrounding Bdb consisted mostly of users of the Bdb library, and although they benefited from the many eyeballs examining their code and enforcing high quality, there are no outside contributors. For MySQL, Brian said that ideas for features and quality bug reports are the most important contributions they receive from their community. Miguel explained that his current project, , receives many external code contributions, and he believes that the amount and type of contributions strongly depend on the maturity of the code base. Mike then said that the most important contribution from the community is the adoption of their software, which increases the visibility and popularity of the software in the community.

Inspired by a question from the audience, the panel discussed some lessons they had learned from their past experiences with OSS businesses. The only common problem they mentioned was that it can be frustrating trying to deal with the slashdot-type community, and anyone starting an OSS business should be aware of the energy and effort required to constantly nurture that community. Learn to communicate with your audiences appropriately. Marketing to the typical OSS user is best done through attending conferences, setting up blogs, and communicating with them one-on-one.

SHORT PAPERS SESSION I
Summarized by Kiran-Kumar Muniswamy-Reddy

Compare-by-Hash: A Reasoned Analysis
J. Black, University of Colorado, Boulder

John Black presented this paper, a rebuttal to Val Henson's HotOS 2003 paper that criticized the use of hash functions to compare two files to tell whether they are the same. John presented various arguments to make the point that although hash functions may not be strong enough for scenarios where there is an adversary, they are more than sufficient for usage scenarios where there is no adversary. John concluded the talk by stating that the computation power needed to find collisions in a 128-bit hash function in 24 days would cost around $80,000, and for SHA1 it would take $80,000,000 and 2 years. So other approaches such as social engineering might be more successful. In the Q&A session, John agreed that the current hash functions may not be secure after 20 years.

An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems
Paul Willmann, Scott Rixner, and Alan L. Cox, Rice University

The paper was presented by Paul Willmann. The paper evaluates three different strategies for parallelizing network stacks: (i) message-based (MsgP), (ii) connection-based using threads for synchronization (ConnP-T), and (iii) connection-based using locks for synchronization (ConnP-L). MsgP is the slowest of the three, as it has a significant amount of locking overhead. ConnP-T has lower locking overhead but experiences significant scheduling overhead. ConnP-L has the best performance, as it mitigates both locking and scheduling overheads. Paul concluded the talk by stating that current programs themselves haven't been written to take advantage of parallelism.
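One plausible reading of connection-based parallelism with locks can be sketched as follows. The sketch assumes that connections are hashed into a fixed number of groups and that each group's state is guarded by its own lock, so threads handling unrelated connections rarely contend. The summary does not give these details, so the grouping scheme, the class, and its methods are all assumptions made for illustration.

```python
import threading


class ConnGroupStack:
    """Hypothetical sketch: per-connection state partitioned into groups,
    with one lock per group instead of one lock for the whole stack."""

    def __init__(self, n_groups):
        self.locks = [threading.Lock() for _ in range(n_groups)]
        self.bytes_seen = [dict() for _ in range(n_groups)]

    def group_of(self, conn_id):
        # Assumption: a simple hash decides which group owns a connection.
        return hash(conn_id) % len(self.locks)

    def deliver(self, conn_id, payload):
        """Process one packet: only the owning group's lock is taken."""
        g = self.group_of(conn_id)
        with self.locks[g]:
            state = self.bytes_seen[g]
            state[conn_id] = state.get(conn_id, 0) + len(payload)

    def total(self, conn_id):
        g = self.group_of(conn_id)
        with self.locks[g]:
            return self.bytes_seen[g].get(conn_id, 0)


def demo():
    stack = ConnGroupStack(n_groups=4)
    conns = [("10.0.0.1", 80), ("10.0.0.2", 80), ("10.0.0.3", 443)]

    def worker():
        for _ in range(1000):
            for conn in conns:
                stack.deliver(conn, b"x" * 10)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [stack.total(conn) for conn in conns]
```

Any thread may process any connection here, which avoids handing packets off between threads; the cost is one uncontended lock acquisition per packet. This shows only the locking structure and says nothing about the measured performance reported in the talk.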

Disk Drive Level Workload Characterization
Alma Riska and Erik Riedel, Seagate Research

The paper, presented by Erik Riedel, characterizes workloads in various kinds of devices, including PCs, laptops, and home devices. The authors collected traces by inserting SCSI or IDE analyzers into the I/O bus and intercepting the signals. Some of their findings are as follows. The read/write ratio, the access pattern, and write traffic vary by application. The request size is around 4 kB. I/O bus and disks are underutilized. In the enterprise/desktop environment, requests are spread all over the disk. Videos are highly sequential. Access characteristics depend on the environment: cache management, arrival, and service processes at the disk drive. Characteristics common across environments include idleness and burstiness.

Two key questions were addressed in the Q&A session. (1) How are your results different from the Hewlett-Packard paper? Erik replied that they don't compare, because of the large difference between the devices and environment presented in this paper and those in that paper. (2) Can we get the traces? Erik replied that they may be able to give out the traces.

Towards a Resilient Operating System for Wireless Sensor Networks
Hyoseung Kim and Hojung Cha, Yonsei University

Currently, the only way to recover from crashes in sensors is to reset the sensors. Hyoseung presented RETOS, a resilient, expandable, threaded operating system. RETOS achieves resilience by introducing dual mode operation and static/dynamic code checking. Dual mode separates out the application and kernel code. RETOS then performs checks on the machine instructions to ensure that applications do not write to or jump to an address outside their logical portion. Some of the instructions can be verified statically at compile time and others need to be verified at run time; for the latter, verification code is injected while compiling the code.

Transparent Contribution of Memory
James Cipar, Mark D. Corner, and Emery D. Berger, University of Massachusetts, Amherst

The talk was given by James Cipar. Contributory applications such as Condor, SETI@home, and Farsite utilize wasted CPU cycles, idle memory, and free disk space on participating user machines. They can, however, disrupt user activity by forcing user pages to disk. Normal approaches such as scheduling do not help with memory and disk usage. James presented the Transparent Memory Manager (TMM), which controls memory usage by contributory applications, thereby ensuring that they do not impair normal system functionality. TMM works by detecting the memory imprint of user applications and then limiting the memory footprint of contributory applications accordingly. TMM detects the memory imprint by keeping an LRU histogram of memory accesses. When pages need to be allocated but there are no free pages and both normal and contributory apps have exceeded their limit, normal apps are favored. Otherwise, the page is evicted from the class that has exceeded its limit.

INVITED TALK

Success, Failure, and Alternative Solutions for Network Security
Peiter Zatko, BBN Technologies
Summarized by John Jernigan

Peiter Zatko, a.k.a. "Mudge," spoke of the current state of affairs in network security, offering his musings and concerns. He began with a summary of his background; he is a former member of l0pht, has worked with the National Security Council, and started his own security company, Intrusic. He is now working for BBN Technologies.

Peiter first addressed some of the pertinent questions in network security today, asking how much progress we have really made, where we have messed up, and where we are spinning our wheels. He pointed out that the Internet has far outpaced our understanding of security. As the Internet grew and added nodes and users, the threat model increased, but software was still not being designed with any notion of security. Eventually, a distinction between internal and external environments evolved, much like a military compound with a fence and a gateway to swap credentials, but internal resources were not themselves secure. Presently, many networks are watched by intrusion detection systems, which only let in and out certain traffic and flag dubious behavior. However, 0-days still penetrate these defenses and will always be one step ahead of patches by definition. Of even greater concern is that the defenses do little to prevent unauthorized activity within the network itself. Peiter emphasized that our threat model has changed, but our defenses have not grown with the environment.

On the topic of buffer overflows, he suggested that, even if they all went away, we would be left with plenty of threats, such as rootkits, sniffing, and trojaned applications. In addition, overflows of many different types, such as heap-based and pointer overflows, abound. In other areas, it has become too easy for naive developers to create enterprise applications, such as with PHP, and vulnerable software is live and rampant.

Peiter suggested that firewalls, intrusion detection systems, and intrusion prevention systems are not really the answer to our security woes. We should really be looking at what goes on inside a network as well. We should not see drastic changes in the behavior of nodes or out-of-order packets on internal systems with few routers. We need to adhere to RFCs and also to detect when behavior does not match real-world trends on the network.

The thought we are left with is that security is still a cat-and-mouse game, and more intelligent methods of security are strongly needed to keep pace with developing technologies and Internet expansion.

SERVER IMPLEMENTATION
Summarized by Scott Michael Koch

Implementation and Evaluation of Moderate Parallelism in the BIND9 DNS Server
Tatuya Jinmei, Toshiba Corporation; Paul Vixie, Internet Systems Consortium

Tatuya Jinmei presented a paper about improving the performance of ISC's BIND9, a widely used DNS server. The authors found that it had poor performance with threads and did not benefit from having multiple CPUs. Some of the key bottlenecks they found were memory management, operations on reference counters, and thread synchronization. While looking for these bottlenecks they found that 43% of total run time was spent waiting to acquire locks. They were able to eliminate the bottleneck in memory management by enabling an internal memory allocator to provide each thread with a separate pool of memory, and they also separated the workspaces of the threads, since the temporary data used by a single thread did not need to be shared. They eliminated the bottleneck on reference counters by using atomic operations without locks instead of using pthread locks. Although this solution is less portable, since it depends on specific hardware architectures, all the same platforms are supported, as before, through an abstract API. They also implemented more efficient reader-writer locks by basing the design on Mellor-Crummey's Algorithm.

By identifying and eliminating the thread synchronization overhead and these other bottlenecks, they significantly improved BIND9 performance with multiple threads. They confirmed their improvements by testing them on a four-way machine. Their improvements should be available in BIND9 as of version 9.4.0a5. Although they focused on BIND9, they feel the techniques and improvements that they used are applicable to other thread-based applications.

Flux: A Language for Programming High-Performance Servers
Brendan Burns, Kevin Grimaldi, Alexander Kostadinov, Emery D. Berger, and Mark D. Corner, University of Massachusetts, Amherst

Brendan Burns talked about a new programming language with the goal of simplifying the process of building high-performance servers. The authors felt that, when building these types of servers, having to deal with thread programming makes the code much harder to reuse and adds the possibility of deadlocks in the code. Having to worry about threading is an unnecessary burden on the programmer and can significantly complicate debugging. Flux aims to separate the programming process so that all the concurrency control is taken care of with its simple language, and the logical programming of the server is done in C, C++, or Java. They found that programming with this separated method allowed the programmer to better understand the overall functionality of the different parts of the server without having to worry about the underlying implementation of each of the parts. Using Flux they were able to put together a Web server, an image-rendering server, a BitTorrent peer, and a game server that performed as fast as or faster than their counterparts written entirely in C. More information and working examples of both the HTTP and BitTorrent servers can be found at http://flux.cs.umass.edu/.

Understanding and Addressing Blocking-Induced Network Server Latency
Yaoping Ruan, IBM T.J. Watson Research Center; Vivek Pai, Princeton University

The last paper in this session was way over my head. Even after going through the presentation and attempting to read the paper, any attempt at writing a summary just turned into trying to reword the abstract of the paper. You can find out more about this paper at http://www.cs.princeton.edu/nsg/papers/latency_usenix_06/.

;LOGIN: OCTOBER 2006 CONFERENCE SUMMARIES 91

INVITED TALK

Is University Systems Teaching and Research Relevant to Industry?
Moderator: Gernot Heiser, NICTA/UNSW
Panelists: Stephen Geary, HP, head of Linux strategy in R&D; Orran Krieger, IBM, K42, Xen strategy; Margo Seltzer, Harvard, Oracle, Sleepycat; Tim Roscoe, Intel Research Berkeley, OS, distributed systems; Jim Waldo, Sun Labs, Jini, Harvard; Andy Tanenbaum, Vrije University, Minix, 16 textbooks

Summarized by Chris Small

Gernot started by stating the claims of industry: that universities are not producing the kinds of systems people need and are producing irrelevant research. Industry does research, but because it doesn’t get published, industry gets no respect from academia. He then asked each of the six panelists to respond to his opening statement.

Andy Tanenbaum: What are universities for? To serve students? Industry? Government? Faculty? The average student’s career lasts 40 years; I want to focus on stuff that will be useful for 20 years, emphasizing principles, not facts. Teaching how MS-DOS works might have been very interesting 20 years ago but is less interesting now. I want to teach how to keep the design simple, good software engineering practice, and to expect paradigm shifts. Think in terms of systems. Ignore hype; don’t forget the past; ideas get recycled. I think sometimes I’m supposed to teach “bloat-ology.”

Jim Waldo: I’m the industry guy and mostly agree with Andy. But he’s going to concentrate not on 20 years from now but now. Students never have to maintain a system longer than it takes to write the paper. “I don’t fix bugs; I have a Ph.D.; I write new things.” When you get your Ph.D. you’re not done: You’re ready to start. But how can you teach system building and maintenance in a university setting? You can’t. Industrial research used to be “short-term”; academics did “long-term” research. But it’s no longer true: academics are doing short-term one-off things to get the next grant; industry looks at the longer term. So people coming out of universities are not ready for the adult world, but Jim doesn’t expect them to be.

Tim Roscoe: I don’t like the division between industry and academic research. Intel sets up lablets of 10–20 researchers, closely attached to the university. They do not pursue patents on joint work. Everything is supposed to be published, open source, etc. Intel can do this because Intel is a manufacturing company, not a software company. It probably doesn’t make sense for to do this. Intel wanted to strategically influence the way universities work. Can we get universities to do work that’s of more value to Intel? Intel provides industrial relevance and resources. PlanetLab is an example, an attempt to change the research culture in distributed systems in academia. There is less emulation and less “we ran this on 17 machines, so clearly it scales up to 100,000 nodes” thinking. Students take their distributed systems textbook and try to implement the ideas on PlanetLab, and it doesn’t work. They learn a lot about what really matters by doing it. How do you teach systems principles? It’s very hard, unless you’re getting experience building real systems.

Margo Seltzer: There are two different topics here that this nice academic-oriented panel are trying to hide from you. Undergrad education and grad education are two very different things. This conference is for graduate types, but industry mostly hires undergrads. Andy is fundamentally wrong: Universities and colleges are there to serve society, not students or faculty. We need to help students figure out what path they want to follow and how to follow it, not to build raw fodder for industry, nor clones of ourselves. They want to develop thinkers and people who can make good decisions, even if they end up being lawyers. Tension exists between giving them tools and giving them a specific skill set. In the long term, the tools are more important.

Orran Krieger agrees with Jim that there has been a long-term/short-term inversion. K42 was developed even though many people in the company thought it was a waste of time, but important skills and knowledge were brought into the company (for example, Linux and pervasive virtualization). What we want from Ph.D.s are people who will come up with radical ideas to change things. Hammond said, “Don’t read all the relevant literature; think about the fundamentals and the problem for a month, then go read the literature.” Researchers should work on big, irrelevant systems and work in teams. We used to have five-, six-, and seven-year Ph.D.s, and that gave them time to thrash and come up with their own ideas.

Stephen Geary: Hey, I’m a mechanical engineer. I have product responsibilities, making sure that Linux and open-source technologies work on Itanium-based systems. A chunk of code or a piece of research by itself is not interesting, or not as interesting as long-lived supported systems that do things for customers. You get them for four years; I get

92 ;LOGIN: VOL. 31, NO. 5

them for 40 years. You have to teach people about budgets and schedules.

Andy: The job of the university is to serve society, but they’re turning out lawyers.

Jim: What I’m really looking for when I’m hiring is people who know “how,” not who know “that”: people who know how to think, not people who know facts (e.g., how to build a particular kind of hash table).

Q. Is academia doing anything right?

Margo: We need to adjust expectations. Andy’s students understand how to think about systems, but they don’t understand every line of Windows.

Gernot: How is it that academia can churn out mechanical and electrical engineers but not computer systems folks? Why are there so few real systems departments?

Orran: Linux progress is much slower than it should be because they ignore the literature. They did a brilliant job of cloning . But that’s not going to revolutionize the field. The success of Linux has stifled the ability to do the kind of research that will move the field forward. Ten years ago there were more ideas moving things forward.

Andy: It’s not the job of universities to produce open source code. But many of the people producing open source code are university graduates.

Q. People build their own tools. They should come out of university with the start of their own personal toolkit.

Tim Roscoe: That’s insightful. One thing I’ve noticed about textbooks, particularly in systems, is that almost all of them are useless at teaching how to think about operating systems, planning to build a system, or how to deal with a large body of code.

Margo: Open source is a fraud: there are a handful of people who commit to the Linux source tree, not tens of thousands.

Orran: We should have a tax on corporations: where their top people come from, money goes.

Margo: Computer science headcount is plummeting. Students think “computer science means programming, and programming will be outsourced.”

Orran: One of the best things that is happening to academia is dropping enrollment. People used to get into academia because they were excited; for a while these were people who thought it was a good career move. Now a higher percentage of people are passionate about it.

Margo: It’s not that we’re only getting the passionate people. In 1992 (at Harvard) we had 30 concentrators; this year we have 12. People who are passionate about technology think, “Oh, I know how to write programs; I don’t need to study computer science.” Or people in other intellectual disciplines (e.g., physics), who used to have to learn how to program to get a summer job, got seduced, but now they learn these things in high school and ignore computer science in university.

Q. Of 16 graduates, 11 were double majors, and these were mostly in economics.

Gernot: To wrap up, what can we do? Or should we just give up?

Stephen: Gelato Consortium is a good example; it was founded by HP university relations to advance Linux, Itanium, and supercomputing.

Clem Cole: We need to teach people how to collaborate.

SECURITY

Summarized by Yizhan Sun

Reval: A Tool for Real-time Evaluation of DDoS Mitigation Strategies
Rangarajan Vasudevan and Z. Morley Mao, University of Michigan; Oliver Spatscheck and Jacobus van der Merwe, AT&T Labs - Research

An ISP network today faces many DDoS attacks. The defense decision for DDoS attack is often manual and complex. Many defense/mitigation strategies are available, and it is difficult for a network operator to choose the appropriate one in real time. The approach presented here is the Reval simulator framework. Reval takes network state, attack info, and mitigation policy as input and goes through initialization, mitigation setup, traffic setup, and evaluation steps. The output of Reval is the optimal solution for a DDoS attack.

A case study on the Abilene network was illustrated in the talk. Two mitigation mechanisms can be applied in this case: blackholing and scrubbing. The result of using Reval to determine the right mitigation strategy in real time was explained and evaluated.

LADS: Large-scale Automated DDoS Detection System
Vyas Sekar, Carnegie Mellon University; Nick Duffield, Oliver Spatscheck, and Jacobus van der Merwe, AT&T Labs - Research; Hui Zhang, Carnegie Mellon University

Several strategies and their drawbacks for DDoS attacks were introduced:

Wait for customer to complain: not effective at all
Buy a per-egress detection device: expensive and not scalable
Install devices at select locations: gives incomplete coverage and inaccurate limits on sensitivity

Use existing data feeds (e.g., SNMP and Netflow)
Use SNMP: entails low overhead and yields few false negatives, but has low diagnostic ability
Use Netflow: has good diagnostics, yields few false positives, but has higher overhead and does not scale

LADS is a better approach. The mechanism behind LADS is to use time-series anomaly detection to trigger collection of Netflow data and to do fine-grained analysis afterward. Benefits of LADS include detection of high-impact attacks, efficient data collection and reduced computational cost, and flexibility.

Bump in the Ether: A Framework for Securing Sensitive User Input
Jonathan M. McCune, Adrian Perrig, and Michael K. Reiter, Carnegie Mellon University

Jonathan McCune first introduced how a user’s input (user name and password) can be stolen by a malicious application installed on Windows systems. Then he introduced a threat model and some assumptions of BitE, including a priori knowledge of which software is good. Then he proceeded to explain BitE architecture, setup, and operation.

BitE system architecture is based upon a partially trusted host platform with a BitE kernel module installed and executed. The BitE kernel module and mobile client participate in key setup and bypass the traditional input path to avoid information being stolen by malicious applications.

BitE can be set up through device association and application registration and operates through several steps, including application request, verification of attestation, user interaction, and establishment of session keys.

INVITED TALK

Architectures and Algorithms for Biomolecular Simulation
Cliff Young, D.E. Shaw Research, LLC

Summarized by Partho Nath

This talk by Cliff Young was on the need for developing more powerful hardware to get closer to answering challenging questions in modern biology, chemistry, and medicine. A typical means of understanding phenomena in these fields is via molecular dynamics (MD): simulation of biologically significant molecules at the atomic level. If performing such experiments were accurate and infinitely fast it would be easy to perform arbitrary computational experiments such as determining structures by watching them form, transforming measurements into data for mining later, etc. However, for a goal of, say, simulating about 64,000 atoms at a millisecond scale, with explicit water molecules, one would need a 10,000-fold increase in computational power if a single state-of-the-art processor were used, or a 1,000-fold speedup if a modern parallel cluster were used. The talk considered the pros and cons of several different available architectural options: (a) clusters of commodity processors, (b) general-purpose supercomputers (e.g., Blue Gene), and (c) special-purpose supercomputing architectures.

The speaker was of the opinion that new specialized, enormously parallel architectures with special-purpose ASICs specially tailored for MD simulations are the answer. Optimizations could include arithmetic specialization, hardware tailored for speed (e.g., hardware tables with parameters) but not too programmable, data flow to exactly where the data is needed, and design for almost never touching off-chip memory. Given that the class of algorithms to be run on these machines is well known, such a machine could be an order of magnitude faster than general-purpose supercomputers. The speaker commented that production of such a machine was already underway and could be expected in 2008. This machine is designed to have 16 segments at the physical level, each consisting of 512 nodes (ASICs) in an 8-cube toroidal mesh (to reflect the physical space being simulated). The speaker detailed the performance of this machine for the NT algorithm (a parallel algorithm for range-limited pairwise interactions of atoms). He noted that this architecture showed asymptotically less inter-processor communication, which translates to better scaling.

Most of the questions to the speaker addressed the machine under production. Regarding soft errors (given that the machine has thousands of nodes), the speaker commented that off-chip memory has ECC, whereas on-chip memory is supposed to be free from such errors. Additionally, the runtime does a checkpoint and reload of the simulation once every hour. Another question was whether writing code for such specialized hardware was going to be a significant bottleneck. The speaker agreed that this might be a significant issue, especially given that programmers were writing code in assembly for specialized hardware. No compiler was being developed because the development cycle for a compiler would be longer than that of developing the code for the corresponding algorithms in

assembly itself. Given that the architecture is simplified by the absence of both speculation and out-of-order execution, writing efficient code for such an architecture may not be too bad. Answering a query on power demands, the speaker said that housing the machine would be another nontrivial task, both in terms of the physical space required and the cooling costs. On a question on the numeric precision of the machine, the speaker remarked that no floating-point arithmetic is used. Computations use a fixed-point subset of double-precision, i.e., 32-bit single-precision fixed-point arithmetic. The advantages gained here were that the simulation runs would be more deterministic and that the pipeline design would be simpler. Another question was on whether such a machine is viable at all: Given that the world market may absorb only about five such machines, would it not be cheaper to just build commodity clusters instead of such a specialized cluster? The speaker commented that with commodity clusters a 1,000-fold speedup would not be possible in a five-year timeframe. The speaker conceded that the size of the market justified by such an investment is still an open question.

MANAGEMENT AND ADMINISTRATION

Summarized by Kiran-Kumar Muniswamy-Reddy

Sharing Networked Resources with Brokered Leases
David Irwin, Jeff Chase, Laura Grit, Aydan Yumerefendi, and David Becker, Duke University; Kenneth G. Yocum, University of California, San Diego

David Irwin presented Shirako, a system to coordinate resource allocation between providers and consumers. Shirako introduces brokers, a software entity that maintains inventories of the resources offered by providers and matches requests with available resources. Leases are used to bind a set of resource units to a consumer for a lease term. Brokers issue tickets to consumers that are redeemed for leases at the providers. Shirako’s design makes resource allocation independent of the application. During the Q&A, Vivek Pai asked how Shirako knows what data the application needs (i.e., what do you do when the applications need disk space but not CPU?). David replied that they tried to allocate better bandwidth and to allocate systems closer to the consumers. Vivek then asked how they dealt with applications that checkpoint their state and restart on a different system. David replied that other groups have been looking at this and they plan to build on that work.

Understanding and Validating Database System Administration
Fábio Oliveira, Kiran Nagaraja, Rekha Bachwani, Ricardo Bianchini, Richard P. Martin, and Thu D. Nguyen, Rutgers University

The talk was given by Fábio Oliveira. The goal of this work is to reduce database downtime. Most database downtime is caused by mistakes made by database administrators. To this end, the authors conducted a survey of experienced administrators at SAGE to better characterize the source of these errors. They found that one common source of errors is that the deployment environment is different from the test environment. They also found that DBAs of all experience levels are prone to make mistakes.

They presented three forms of validation to reduce operator errors: trace-based, replica-based, and model-based. In the trace-based approach, they log the requests to and replies from a live system, play the traces onto the system/components to be tested, and compare the two results to detect any errors. Some operations, such as change in schema, cannot be validated by the first two methods, so they propose a model-based approach. In this approach, the operator can specify the expected behavior using their model. The dynamic behavior of the system is then validated with that of the predicted model. In the Q&A, Atul Adya asked whether they closed the loop, that is, did they go back to the DBAs with their results? Fábio replied that they did not. In response to another question by Atul, Fábio replied that they did not deal with triggers.

SMART: An Integrated Multi-Action Advisor for Storage Systems
Li Yin, University of California, Berkeley; Sandeep Uttamchandani, Madhukar Korupolu, and Kaladhar Voruganti, IBM Almaden Research Center; Randy Katz, University of California, Berkeley

The talk was given by Li Yin. The common approach to meeting the service level objective (SLO) for storage systems involves the observe, analyze, and act loop. This approach involves manual interaction and is slow. There are existing tools that help automate the task, but these are again restrictive, as they can correct only one action, such as work throttling, data migration, or addition of new resources. Li presented SMART, a framework that considers multiple corrective actions.

SMART aims to maximize the system utility for a given optimization window. SMART contains four key components: (1) INPUT modules (containing sensors monitoring system state, SLOs, component modules, workload request rate, etc.), (2) a utility evaluator (which calcu-

lates the overall utility delivered by the system), (3) single action tools (to automate invocation of a single action), and (4) an action advisor that, based on the other three components, generates a schedule for actions to be invoked to improve system utility. The action advisor operates in two different decision modes: normal and unexpected. In normal mode, it proactively generates decisions to forecasted workloads by optimizing local actions to achieve global optima. In unexpected mode, it makes defensive decisions in response to unexpected variations in workloads. The reason for it being defensive is that the unexpected workload may be transient. There were no questions after the talk.

INVITED TALK

Permissive Action Links, Nuclear Weapons, and the History of Public Key Cryptography
Steven M. Bellovin, Columbia University

Summarized by Partho Nath

This talk traced the history of PALs (Permissive Action Links), detailing the motivation for their invention and those responsible for their creation. The speaker ran through a timeline of their use and evolution, highlighting the possible design choices made at those junctures, along with cryptography and key management for the different designs. The talk concluded with possible designs for modern-day PALs and what we might learn from them in designing secure systems. The slides for the talk can be found at http://www.cs.columbia.edu/~smb/talks/pal.pdf. The content for the talk can be found at http://www.cs.columbia.edu/~smb/nsam-160/pal.html.

SHORT PAPERS SESSION II

Summarized by Wei Huang

sMonitor: A Non-Intrusive Client-Perceived End-to-End Performance Monitor of Secured Internet Services
Jianbin Wei and Cheng-Zhong Xu, Wayne State University

Jianbin Wei first described the inadequacies of existing approaches for monitoring end-to-end user-perceived performance of Internet services, especially with increasing deployment of HTTPS services. Jianbin indicated that there is a strong need to deploy a performance monitor that is nonintrusive, easy to deploy at the server side, and can handle HTTPS services.

Jianbin presented sMonitor, their solution to these goals. sMonitor consists of a packet capture to collect live network packets, a packet analyzer to reconstruct the pages of HTTP/HTTPS transactions, and a performance analyzer to derive client-perceived response time of the monitored services. Jianbin focused on their solutions to several key design challenges, such as identifying encrypted HTTP requests from packet size analysis, handling pipelined requests, and parallel downloading.

Jianbin concluded with their evaluation of the accuracy of sMonitor in measuring HTTPS and HTTP services. He showed that errors between the client measurements and the reported performance of sMonitor, which is deployed at the server, are less than 8%.

Privacy Analysis for Data Sharing in *nix Systems
Aameek Singh, Ling Liu, and Mustaque Ahamad, Georgia Institute of Technology

The *nix access control model, as Aameek Singh pointed out, must provide good support for both data selectivity (e.g., which data to share with other users) and user selectivity (e.g., with whom to share the data). However, Aameek’s study revealed that, in current *nix systems, the lack of convenience in data-sharing mechanisms often leads to users compromising their security requirements to conveniently fit the specifications of the underlying access-control model. Aameek talked about their studies on two multi-user *nix installations. Simply by scanning readable user directories and guessing executable-only directories, along with email and browser statistics, they were able to “attack” massive amounts of privacy data, which, they believed, were not exposed on purpose. Since the technical sophistication of the attacks is low and there is no quick fix to such vulnerabilities of private data, Aameek raised a major concern about the inadequate protection of privacy in *nix systems.

Aameek concluded the talk with some possible solutions to enhance privacy protection, such as using privacy auditing tools to monitor potential privacy data exposures or virtualizing the file system hierarchy differently for different users. But until that happened, Aameek said, users should pay more attention to monitoring the privacy of their own data.

Securing Web Service by Automatic Robot Detection
KyoungSoo Park and Vivek S. Pai, Princeton University; Kang-Won Lee and Seraphin Calo, IBM T.J. Watson Research Center

KyoungSoo Park presented an automatic robot detection framework to support a secure Web service. KyoungSoo first talked about the widespread existence of malicious bots, including those for password cracking and DDoS attacks. The increasing

abuse of robots motivated an accurate robot detection system. KyoungSoo described their techniques to separate human browser activities from robot-generated Web traffic. They include browser detection and human activity detection. Browser detection is based on the observation that most robots are not standard browsers; it catches robots if the behavior deviates from that of normal browsers. Human activity detection directly detects humans by observing human activities such as mouse movement or keyboard events behind the browsers. Hardware events are being tracked in dynamically embedded JavaScript and the activity is indirectly reported to the server via a fake image request. This technique is based on the fact that current robots are not generating hardware events.

KyoungSoo showed that most human activities can be distinguished within tens of HTTP requests. And the maximum false positive rate is low (2.4%). KyoungSoo also mentioned that with their system deployed on a CoDeeN content distribution network, complaints on robot-related abuse have dropped by a factor of 10. KyoungSoo admitted that serious hackers can still break their detection system and suggested using machine-learning techniques as a remedy.

Cutting through the Confusion: A Measurement Study of Homograph Attacks
Tobias Holgers, David E. Watson, and Steven D. Gribble, University of Washington

David Watson introduced a measurement study of homograph attacks. A homograph is a character or string that is visually confusable with a different character or string. A homograph attack tries to fool a user into visiting a nonauthoritative Web site.

David presented a study on the nature and quantity of homograph attacks. Using a nine-day trace of Web traffic from the Computer Science Department of the University of Washington, they probed the DNS to find registered names that are confusable with (i.e., a homograph to) the names of visited sites. The results of the study were fourfold: (1) No user visited a nonauthoritative site during the trace; (2) popular Web sites are more likely to have registered confusable names than unpopular sites; (3) registered confusable names tend to consist of substitutions of two or fewer confusable Latin characters, though some IDN (International Domain Name) substitutions were found; and (4) the intent behind most registered confusable names is benign, predominantly advertisements.

David concluded that homograph attacks currently are rare and not severe in nature. However, given the recent increase in phishing incidents, homograph attacks seem like an attractive future method for attackers to lure users to spoofed sites.

Stealth Probing: Efficient Data-Plane Security for IP Routing
Ioannis Avramopoulos and Jennifer Rexford, Princeton University

Ioannis Avramopoulos started his talk by introducing the challenges in secure IP routing. He argued that data-plane monitoring must be part of any complete solution. However, existing proposals for secure forwarding with link-level fault localization capability are heavyweight, requiring cryptographic operations at each hop in a path. Ioannis presented a lightweight data-plane mechanism that monitors the availability of paths in a secure fashion. In intradomain routing, this mechanism also enables the management plane to home in on the location of adversaries by combining the results of probes from different vantage points (called Byzantine tomography). Ioannis discussed advantages of stealth probing, including its incremental deployability, backward compatibility, and incentive compatibility.

Ioannis presented two deployment scenarios for stealth probing. He described how an ISP can deploy stealth probing to secure its own infrastructure. He also discussed how a pair of edge networks can deploy stealth probing to secure the path through untrusted ASes on the Internet.

INVITED TALK

Gold and Fool’s Gold: Successes, Failures, and Futures in Computer Systems Research
Butler Lampson, Microsoft Research

Summarized by Kiran-Kumar Muniswamy-Reddy

Butler Lampson started off by discussing trends in computer use. He then briefly enumerated things in the history of computer science that worked, things that didn’t work and why they didn’t work, and a list of things that “maybe” worked. He claimed that the future of computer science lay in applications that dealt with avoiding catastrophes and uncertainties.

In the context of Moore’s law, improvement in hardware simplifies software. Better hardware enables new applications with the complexity going into software. Accordingly, the fields in which computers have been used have been growing. In the 1950s, computers were used for simulation. In the 1980s, they were used for communication and storage (e.g., email, airline tickets, and search engines). By

2010, computers will be embodied in the physical world, that is, interacting nontrivially with the physical world, embedded in factories, cars, robots, etc.

He then gave a list of things that worked: virtual memory, address space, packet nets, objects/subtypes, transactions, RDB and SQL, bitmaps and GUIs, the Web, and algorithms. The list of things that did not work includes capabilities, fancy type systems, formal methods, software engineering (all they did was have interfaces and count the number of lines), RPC (which failed because the idea was to try to mask the fact that the call was remote distributed computing), persistent objects (in which you end up storing a bunch of rubble, because of program bugs), security (getting worse because there is a lot more software now; also, people don’t like security, because security says no but people want to say yes), and RISC (Intel retrofitted the good ideas of RISC into their chips).

Things that may have worked include parallelism (which now we actually need because we have multi-core systems, but many programmers don’t know how to apply the theory, so we probably can’t make it work), garbage collection (which was not designed to be used by systems), interface and specifications (with substantial overhead in breaking down the system and specifying the interfaces, they are slightly successful in hardware but not in software), and reusable components (which [1] are expensive to develop, [2] are specific to how resources are allocated and have unique failure models, [3] have been successful in filters and big things [e.g., OSes, DBs, browsers]). Reusable components have not worked for OLE/COM/Web services; however, they have worked for companies such as Amazon, who can afford to have 20% of the things displayed wrong.

Systems research has failed at times, the classic case being that we didn’t invent the Web. This is mainly because of the way we think. For example, we felt that the design and the idea of the Web are too simple. The idea of the Web had been around for some time but was never tried. Computer scientists would have tried too hard to come up with an optimal design. Another reason for the failure is that computer scientists tend to deny that things might work. For example, in the case of the Web, they would have just argued that it would never scale.

The future of systems research involves building systems that deal with uncertainty and that avoid catastrophe (e.g., reducing highway traffic deaths to zero). The problem involves computer vision; building world models for roads and vehicles; dealing with uncertainty about sensor inputs, vehicle performance, and a changing environment; and, finally, dependability. Butler defines a dependable system as one that avoids catastrophes. This ensures that the focus is on the really important and provides a way to reduce aspirations for a system. Catastrophe prevention has not always worked; for example, air traffic control specifications state that the downtime should be 3 seconds/year/workstation. But this is not true. The architecture of the system should have a normal mode and a catastrophe mode. The catastrophe mode should have clear, limited goals, implying limited functionality, have <50K lines, and have high assurance.

Another issue is dealing with uncertainty. Any “natural” user interface should make assumptions. For example, a speech-understanding program will get some unknown or uncertain input that the computer has to approximate. So one way to deal with this may be to build paradigms where distribution is a standard data type and can be parameterized over a domain (like lists).

Peter Honeyman asked the first question. Is it right to attribute the World Wide Web to physicists? Wasn’t Mosaic developed by computer scientists? Butler: Could be I oversimplified.

Question: Why is distributed computing a failure? Don’t we have the Web? Butler: We don’t do distributed computing. We do client-server, where only two machines are talking to each other. Grid? I don’t understand it.

Margo Seltzer: IBAL is a language that supports probability as a fundamental datatype. I encourage everyone to try it.

Margo Seltzer: It looks like catastrophe code is similar to recovery code as it is never run. Butler: Catastrophe code should be a subset of normal code and shouldn’t be used only in catastrophes.

Marc Chiarini: Is AI a success or a failure? Butler: Yes, it is successful. When it is successful, it’s spun off, for example, computer vision. AI continues to be a success and continues to be a mess.

Question: Are not large-scale bank computer crashes computer-only catastrophes? Butler: Not true; although it will inconvenience a lot of people, there is enormous redundancy that will get things back to normal.

Question: RISC is a success, since most game systems run on it. Butler: There has not been a successful RISC system since then.

98 ;LOGIN: VOL. 31, NO. 5

PLENARY SESSION

Why Mr. Incredible and Buzz Lightyear Need Better Tools: Pixar and Software Development
Greg Brandeau, Vice President of Technology, Pixar Animation Studios
Summarized by Scott Michael Koch

The talk began by explaining the process involved in creating a movie at Pixar. Using examples from their latest movie, Cars, and past movies such as Monsters Inc., he explained the key steps involved in turning an idea for a movie portrayed in storyboard drawings into a detailed, computer-rendered movie. Although all of Pixar’s movies are essentially cartoons, the company feels it is important for its movies to contain lifelike effects. Special attention is paid to detail when creating the environments in which their stories take place, by taking into account effects such as weather and fire. When appropriate, Pixar tries to avoid characters appearing to have a plastic texture by giving them fur or other more detailed textures.

The next part of the talk began with the showing of a trailer for Cars and an explanation of how it compared to some of the movies Pixar had done in the past. Their movies typically take three to five years to complete, and although Cars took about the same amount of time to complete as their earlier Toy Story, it required 300 times the computing power. He explained the basics of various lighting effects, such as irradiance, ambient occlusion, and reflection, that were used to improve the realistic characteristics of the cars. He also showed a demo of an in-house tool called the Cars Driving System that simulated the movement and interaction of the cars with their environment, so that the animators did not have to worry about underlying cartoon physics such as the car’s suspension, steering, and turning.

He talked about some of the challenges they encountered in creating Cars, as well as some of Pixar’s other movies. Besides the basic process mentioned here, each movie is custom-made. Each has a different director, environment, characters, and technology. With a technique called Reyes Rendering, the memory space of a 32-bit architecture machine was not enough, so the company was forced to switch to 64-bit machines when rendering Cars. In fact, a single car required more than 2 GB of memory. Overall, it required 2.4 CPU millennia to render the movie.

Along with using commercial applications, third-party libraries, and other in-house applications, the majority of Pixar’s work is done with a more than 2-million-line in-house application that has been developed over the past 20 years. The application is written in a number of different languages including C++, C, Python, Perl, and sh. The application is constantly being customized to meet the ever-changing needs of the current movie. To take advantage of the best tools at any given time, Pixar feels it is important to keep their software as cross-platform as possible.

A perceived major problem is that Linux/OSS development has not kept up with the innovation of hardware. Having had mixed results with gdb and purify, they felt there needed to be a better debugging utility geared toward larger applications. Using current debugger solutions, a process that usually takes several hours turns into a weekend-long process when run under a debugging environment. They would like OSS developers to design software with large production applications, long run times, OpenGL, and 64-bit technology in mind. They also would like to see a Visual Studio–type IDE for Linux. They also mentioned wanting vendors to provide a sitewide license for software, to make management of licenses for a large number of machines less complicated.

The talk got a mixed response as far as audience questions were concerned. Several attendees from other large companies attested to the fact that the problems and challenges mentioned were not exclusive to Pixar. Others questioned Pixar’s contributions back to the open source community. Although they are active in submitting bug reports and patches to projects they use, some thought that they need to be the ones taking the initiative to start solving these problems in the community, and others will join them if they see the project to be worthwhile. There were also suggestions about making all or portions of Pixar code open source in various ways, but the company does not feel that would be appropriate for their type of software. There were also a few suggestions about using Solaris’s dtrace, which is something they are considering.

WIDE AREA DISTRIBUTED SYSTEMS
Summarized by Wei Huang

Service Placement in a Shared Wide-Area Platform
David Oppenheimer, University of California, San Diego; Brent Chun, Arched Rock Corporation; David Patterson, University of California, Berkeley; Alex C. Snoeren and Amin Vahdat, University of California, San Diego

David Oppenheimer’s talk tried to answer one question: Can intelligent service placement be

useful on a shared wide-area platform such as PlanetLab? At the beginning of his talk, David laid out five perspectives, from which they will analyze the resource characteristics of shared wide-area platforms and try to answer this question: the variability in resource competitions across nodes; the variability in resource demands across slivers (allocated resources on a single node for an application); how random placement behaves; how the quality of initial resource mappings decays over time; and whether resource competition can be predicted.

David presented their studies on these five aspects from a six-month trace of node-, network-, and application-level measurements of PlanetLab. They found out that CPU and network resource usages are highly variable across the nodes. And the resource demands across instances of applications also varied widely. These trends suggested that an intelligent service placement will benefit applications. This was demonstrated by David’s simulation results on running OpenDHT, Coral, and CoDeeN, which showed that there were more slivers satisfying application resource requirements by using intelligent service placement. David also pointed out that node placement decisions can be ill-suited after about 30 minutes, which suggested that migration may help applications if the cost is acceptable. David also indicated that a node’s CPU and bandwidth usage can be predicted by its utilization of that resource recently, which implies that a migration service need not require high measurement update rate. However, David said that they found no daily or weekly periodicity on resource utilization.

Replay Debugging for Distributed Applications
Dennis Geels, Gautam Altekar, Scott Shenker, and Ion Stoica, University of California, Berkeley
Awarded Best Paper!

Dennis Geels presented liblog, a debugging tool for distributed applications. Dennis started his talk with challenges of the debugging process in distributed applications. Many errors are due to race conditions and usually are impossible to reproduce locally. Because of this “limited visibility,” testing or simulation is usually not sufficient to reproduce and catch the errors. The current state-of-the-art technique for debugging is still to use the print statement. However, once the software is deployed, this technique requires that the developer choose to expose the affected internal state before the fault manifests.

To address the difficulties of debugging distributed applications, Dennis proposed liblog, which provides lightweight logging and deterministic replay, is transparent to applications, and requires no patch to kernels. It intercepts all libc calls and logs all sending/incoming messages. Each message is associated with a Lamport clock so that it can be used later for deterministic replay.

Dennis discussed several key challenges and design choices of liblog. He talked about how to deal with concurrent threads, where deterministic replay was harder owing to the lack of kernel support. He also mentioned how to do user-level annotation for TCP traffic and using liblog in a mixed environment of logging and nonlogging processes.

In the Q&A session, when asked whether there are any success/failure cases, Dennis briefly mentioned their experience using liblog on I3/Chord and OCALA proxy. He said that liblog helped to find errors caused by broken assumptions about network or coding errors.

Loose Synchronization for Large-Scale Networked Systems
Jeannie Albrecht, Christopher Tuttle, Alex C. Snoeren, and Amin Vahdat, University of California, San Diego

Jeannie Albrecht started by addressing the inadequacy of current barrier semantics in large-scale distributed heterogeneous computing environments. She argued that the current barrier semantics is too strict to be effective for emerging applications. For example, network links may be unreliable and machines may become unresponsive. A traditional barrier may lead to the situation where progress is limited by the slowest participant or where one must wait for an indefinite time for failed hosts.

Jeannie proposed several possible relaxations of strict barrier synchronization (or partial barrier), which are designed to enhance liveness in loosely coupled networked systems. She proposed two partial barrier semantics: early entry, which allows nodes to pass through without waiting for certain slow participants, to prevent a few nodes from slowing down the whole process; and throttle release, which releases the barrier participants within a certain interval, to avoid resource overload by preventing all processes from simultaneously coming into the critical section. Jeannie also talked about several heuristics to dynamically choose the parameters used in partial barriers, such as detecting the knee of the curve (at which point the arrivals are considered to be slow) and finding the optimal capacity of the critical section.
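The two partial barrier semantics from the Loose Synchronization summary, early entry and throttle release, can be illustrated with a small single-process sketch using threads. This is only an illustration under stated assumptions, not the authors' implementation: the class name `PartialBarrier`, the parameters `quorum`, `timeout`, and `capacity`, and the helper `run_demo` are all hypothetical, and a real wide-area version would coordinate over the network rather than through shared memory.

```python
import threading
import time


class PartialBarrier:
    """Toy partial barrier (hypothetical API, not taken from the paper).

    Early entry: the barrier opens once `quorum` participants have arrived,
    or once any waiter has waited `timeout` seconds, so stragglers do not
    hold everyone back.
    Throttle release: at most `capacity` released participants may be in
    the critical section at the same time.
    """

    def __init__(self, quorum, timeout, capacity):
        self._quorum = quorum
        self._timeout = timeout
        self._arrived = 0
        self._lock = threading.Lock()
        self._open = threading.Event()  # set once the barrier has fired
        self._slots = threading.BoundedSemaphore(capacity)

    def enter(self):
        with self._lock:
            self._arrived += 1
            if self._arrived >= self._quorum:
                self._open.set()  # enough participants: open early
        if not self._open.wait(self._timeout):
            self._open.set()  # waited long enough: stop waiting for the rest
        self._slots.acquire()  # throttle admission to the critical section

    def leave(self):
        self._slots.release()


def run_demo(fast=3, capacity=2, slow_delay=0.3):
    """Run `fast` prompt workers plus one slow worker; return observed stats."""
    barrier = PartialBarrier(quorum=fast, timeout=5.0, capacity=capacity)
    lock = threading.Lock()
    stats = {"inside": 0, "peak": 0, "order": []}

    def worker(name, delay):
        time.sleep(delay)  # simulated arrival delay
        barrier.enter()
        with lock:
            stats["inside"] += 1
            stats["peak"] = max(stats["peak"], stats["inside"])
            stats["order"].append(name)
        time.sleep(0.01)  # simulated work inside the critical section
        with lock:
            stats["inside"] -= 1
        barrier.leave()

    threads = [threading.Thread(target=worker, args=(f"fast{i}", 0.0))
               for i in range(fast)]
    threads.append(threading.Thread(target=worker, args=("slow", slow_delay)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the prompt workers proceed as soon as the quorum is reached instead of waiting for the slow one, and the semaphore keeps the number of workers inside the critical section at or below `capacity`. Choosing `quorum`, `timeout`, and `capacity` dynamically is where heuristics such as the knee detection mentioned in the talk would come in.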

Jeannie presented their experience in adapting wide-area services by using partial barriers for synchronization. She showed promising results. For instance, using a semaphore barrier (a special variation of a throttle barrier) to perform admission control for parallel software installation in Plush enabled an overall completion rate close to the optimal value achievable by manual tuning.

Jeannie was asked about the flexibility of their partial barrier schemes. She answered that the schemes should be very flexible, since applications receive callbacks when the events happen (i.e., when some nodes are detected to be slow). The applications still have control of the progress and thus have the flexibility to make the best decisions.

INVITED TALK

Routing Without Tears, Bridging Without Danger
Radia Perlman, Sun Microsystems Laboratories
Summarized by Chris Small

Bridges, being at some level simpler than routers (at least, requiring less configuration), are often thought to have come first. Actually, routers came first, bridges came later. And it’s a myth that bridges are simpler:

Layer 1 relay = repeater
Layer 2 relay = bridge
Layer 3 relay = router

Wait, this doesn’t make sense: layer 2 is defined as neighbor-to-neighbor.

We’ll see why this makes sense later. Ethernet is a misnomer: It’s not a network, it’s a multi-access link. Layer 2 is flat, no topology. If we need to connect two networks of machines connected using a layer 2 protocol, we have no topology information. (If they were connected with layer 3, we could use a router.) So we use bridges. (Switches came from a different direction, but they ended up being the same thing as bridges through parallel evolution.)

The basic idea is that bridges listen promiscuously, learn who is on each side of the bridge, and only forward packets between the two networks as appropriate. But loops (cycles) are a disaster, since layer 2 has neither hop counts nor topology, so you get exponential proliferation of packets. (See Radia’s book for the detailed story.)

Let’s compute a spanning tree (i.e., subset the graph to make a tree and do not include cycles), only transmit along the links that are in the spanning tree, and save the other links for backups. On bridges, the spanning tree algorithm turns links on when it thinks the primary links are dead, so if you drop packets, the spanning tree gets turned back into a general graph. On routers, if you drop packets, the link gets shut down. Now that everyone has converged on IP (i.e., everybody is using the same layer 3 protocol), why use bridges at all? Why not routers? Well, bridges are simpler to configure; they are even self-configuring.

With link state routing, you discover who you are connected to and broadcast this to your neighbors. Everybody collects this information and forwards it on. Eventually everybody has full information about the entire network.

There is a solution to the bridge/spanning tree problem, called RBridges. They can replace bridges and are safer. Basically they are bridges that gather global link state information. Each RBridge builds its own spanning tree, which is more robust.

Then each RBridge encapsulates packets to tunnel across the network to other RBridges. Add a layer 2 header at each RBridge, with destination address set to the last RBridge.

To fit this into MPLS, we needed to map a 6-byte MAC addr into 19 bits. The trick is to use a nickname (mapping 6-byte MAC addrs to 19-bit nicknames).

Q. What about overflowing maximum packet size?

A. This turns out not to be a problem in practice. The original max packet size was set because the first Ethernet had a very limited amount of RAM, so the max packet size was set where it is. Everybody can handle larger packets these days.

NETWORK AND OPERATING SYSTEM SUPPORT
Summarized by Aameek Singh

System- and Application-level Support for Runtime Hardware Reconfiguration on SoC Platforms
D. Syrivelis and S. Lalis, University of Thessaly, Hellas

Dimitris Syrivelis presented this paper describing an approach to enable programs running on a reconfigurable System-on-Chip (SoC) to modify the underlying Field-Programmable Gate Array (FPGA) behavior at runtime. Accomplishing this requires support from both the underlying system and the application running on it. The reconfiguration is achieved using a quick suspend-resume mechanism, in which the FPGA bitstream corresponding to the new hardware layout is stored in external memory; the system saves its current runtime state and initiates its FPGA reprogramming (i.e., the entire FPGA is reprogrammed from

scratch, as opposed to dynamic partial reconfiguration). After the reconfiguration, the system restarts and manages all effects and potential side effects of the operation. Such reconfigurable systems can offer significant advantages over systems that have soft-core CPUs, drivers, or controllers. Changing the runtime characteristics of underlying units can help applications adapt to different requirements and boost overall performance, though a number of issues such as device addressing need to be resolved. The applications interact with the reconfigurable system through a library that issues device addition/removal requests. The paper also discusses two sample applications of a Mandelbrot calculation and an audio signal monitor.

Resilient Connections for SSH and TLS
Teemu Koponen, Helsinki Institute for Information Technology; Pasi Eronen, Nokia Research Center; Mikko Särelä, Helsinki University of Technology

Teemu Koponen presented this paper, which addresses a common concern of SSH/TLS connections dropping owing to a network outage or travel. The SSH and TLS protocols are extended to provide more resilient connections that can withstand changes in IP addresses and long disconnections. The authors argue that such mobility issues are best handled at a higher session layer as opposed to the data link or network layers, the traditional approaches employed in wireless handover or mobile IP mechanisms. This is especially true in the presence of long disconnection periods and absence of any network infrastructure such as mobile IP home agents. The proposed protocol extensions are made while ensuring that they do not require any network changes or middleware boxes, can be deployed incrementally, and minimize endpoint changes. The extensions use in-band signaling (thus easing deployability), negotiations (for an endpoint to agree to use the extension, support for which is already present in SSH and TLS), and authentication of reconnection (to prevent hijacking). One unanswered question is the security analysis of these extensions. The authors believe that the extensions do not introduce any new vulnerabilities, but a formal evaluation has yet to be made.

Structured and Unstructured Overlays under the Microscope: A Measurement-based View of Two P2P Systems That People Use
Y. Qiao and F. Bustamante, Northwestern University

Fabián E. Bustamante presented a measurement-based study of two file-sharing peer-to-peer systems based on unstructured (Gnutella-based) and structured (Distributed Hash Table [DHT]-based Overnet system) topologies. The unstructured systems do not dictate the topology of the network, and thus are thought to be more resilient to peer churn (peers joining/leaving the network). In contrast, the structured systems offer guaranteed and scalable O(log N) lookup performance (where N is the number of peers). Based on observations, the authors conclude that both systems are efficient in handling churn; even the Overnet DHT-based system was surprisingly efficient. Both systems had good performance for exact-match (precisely matching an object) queries of popular objects, but Overnet had almost twice the success rate for querying shared objects. Keyword searching was fast in both systems, and load balancing was better handled by Overnet.

INVITED TALK

An Introduction to Software Radio
Eric Blossom, Blossom Research; Coordinator and Maintainer of the GNU Radio Project
Summarized by Rik Farrow

Software radio means using code to modulate/demodulate radio signals by using as little hardware as possible. Instead of soldering parts, you change the code that is controlling the software radio, providing extreme flexibility, on-the-fly reconfiguration, the ability to act as multiple radios simultaneously, and a much quicker development cycle. Software radio is currently used by the military, SIGINT, research, and cellular companies. Another potential use would be public safety, where interoperability of radios has been a problem (recall the Katrina disaster relief fiasco).

Blossom introduced some basic concepts required for building any radio transceiver, with the focus on doing this as much in software as possible. Radio waves range from kilohertz into the gigahertz frequencies. To properly digitize any signal, you must sample it according to Nyquist’s rule, at a rate of at least twice the bandwidth. That sampling is done in hardware using analog to digital converters (ADC), with sampling rates as high as 6 GHz and sample sizes ranging from 8 to 24 bits. Think about that for a moment. If you sample at 6 GHz and 16 bits, we are talking about 12 GB/s (6 × 10^9 samples per second at 2 bytes each). You won’t be doing this on your desktop system soon. But researchers have recorded HDTV signals and stored them to disk, requiring a disk throughput of 40 MB/s.

Not all radio requires such high sampling rates (for example, FM radio), and there are projects at GNU Radio (www.gnu.org/software/gnuradio) for an FM

receiver and a 1 Mb/s data transceiver. Blossom said that you can buy hardware today that includes four ADC pairs, an FPLA (for onboard computations, programmed using GNU Radio), up to four daughter cards that include analog parts for filtering signals from antennas, and a connection via USB to your desktop system. Using this setup and a 2x2 phased array antenna 1.5 m on a side, Blossom has created software that can track aircraft using the signal from an FM radio station antenna on a mountaintop near his home in Nevada. There are regulatory issues when building your own radio transmitters, but these can be dealt with by using certain frequencies and signal strengths. Blossom called the FCC a bunch of politicians, lawyers, economists, and engineers who regulate bandwidth as if radio were stuck in the 1920s. A single VHF TV channel wastes 6 MHz of bandwidth, for example, and compulsory channels use more than all the bandwidth used by cellular channels. Creative use of software radios would make much better use of bandwidth without causing interference with other forms of radio communication.

GNU Radio uses data flow abstractions, event-based overlay, message queues, and messages, all written in a hybrid of C++ and Python. The software is free and the hardware now costs under $1,000 (www.ettus.com). You can learn more at http://comsec.com/wiki.

CLOSING SESSION

Real Operating Systems for Real-time Motion Control
Trevor Blackwell, CTO, Anybots
Summarized by Marc Chiarini

Trevor Blackwell gave an interesting and entertaining presentation about his experiences building robots and other devices that are controlled by humans in real time. Some useful areas for these include performing dangerous tasks (bomb squad and underwater salvage), avoiding long-term travel (Mars rover), and, perhaps somewhat controversially, supplying efficient on-demand manual labor (e.g., someone half a world away does your household chores, much to the delight of homebodies and eight-year-olds!).

The bulk of the talk comprised four parts: fundamentals of robotics and control, software platforms and components, levels of abstraction for controlling robotic devices, and a discussion of his construction of self-balancing motorized vehicles. For the first part, Blackwell quickly took the audience through a primer on a spectrum of computing components and sensors, from large to small, that serve different purposes and are placed on different sections of robots. The joints on humanoid robots are primarily pneumatic and are actuated by software-controlled proportional valves. Human control of the robots he builds, such as those for experimenting with bipedal motion, utilizes common sensors in gloves and cameras that provide constant feedback on hand and/or arm position. Several videos comically demonstrated the difficulty of real-time robot control, particularly when lag was involved (even human sensory lag).

Blackwell moved on to show a breakdown of his component-based heterogeneous infrastructure for robotic experimentation. He starts with a rack of BSD servers responsible for performing complex motion vector and other computations (mostly programmed in Python). These are connected via wired (or wireless) TCP/IP to embedded CPUs running UNIX and mounted at various points on the robots. The CPUs communicate with a set of microcontrollers that drive the actual hardware, valves, etc. By tinkering with several parameters in the embedded FreeBSD kernel, it is possible to achieve millisecond response times when coordinating controllers. Timing is very important to this task, because even a slight lag in actuation can result in, for example, a robot losing balance, running into a wall, or crushing an object. This led naturally into a discussion about the levels of abstraction for motion control: actuator, position control, position control with feedback, high-level, and fully autonomous. A graph gave the audience a good idea of what could be effectively controlled by a human (given human tolerances) or software at each level: Using just actuators and position control, a device at the level of a Roomba vacuum cleaner is achievable; with feedback, unmanned aircraft or an arm on wheels can be controlled; high-level computations might permit reasonable bipedal motion, but a fully humanoid robot would require a high degree of autonomous control. There are, of course, cracks in this picture and not every device falls neatly into a single category.

The last part of the presentation focused on Blackwell’s originating hobby of building self-balancing vehicles such as his Eunicycle and Segway-like scooter. Most of it turns out not to be rocket science, but it still requires a reasonable knowledge of mechanical engineering and classical physics. There were several pointed questions asked during the Q&A: Is building these things an affordable endeavor for hobbyists? Scooters and such are definitely reasonable. Humanoid robotics, especially smaller projects, are quickly becoming an option. Why didn’t Blackwell incorporate force-feedback in his projects? Latency is a significant stumbling block, especially for fine control. Why did Blackwell ignore other biologically inspired nonhuman robot designs? The response was that he was most interested in robots that could do tasks designed for people in environments designed for humanoids. See http://anybots.com and http://tlb.org/scooter.html for further details.
