THE MAGAZINE OF USENIX & SAGE October 2003 • volume 28 • number 5

inside: CONFERENCE REPORTS USENIX Annual Technical Conference 2003

The Advanced Computing Systems Association conference reports

This issue’s reports focus on The USENIX Annual Technical (Metro Fiber Systems, a wide-area opti- cal-networking company), MCI, and

USENIX Annual Technical Conference Conference EPORTS then Worldcom, rising to challenge the R June 9–14, 2003 largest telecommunications companies San Antonio, Texas in America. He is co-author of !%@:: A

Directory of Electronic Mail Addressing & ONFERENCE

AWARDS C

Networks, published by O’Reilly Books. The USENIX Lifetime Achievement OUR THANKS TO THE SUMMARIZERS: He is also co-author of RFC-850, the Award (the Flame) was given to Rick Standard for Interchange of USENET FOR USENIX ATC ‘03 Adams for implementing Serial Line IP Messages, which was updated to become William Acosta (SLIP) and founding UUNET, thereby RFC 1036 in 1987. Raya Budrevich making the Internet widely accessible. In 1982 Rick ran the first international The Software Tools User Group (STUG) Francis Manoj David UUCP email link at the machine seismo award recognizes significant contribu- Rik Farrow (owned by the Center for Seismic Stud- tions to the community that reflect the Shashi Guruprasad ies in Northern Virginia), which evolved spirit and character demonstrated by Hai Huang into the first (UUCP-based) UUNET. He those who came together in the STUG. James Nugent maintained “B” News (at one time the Recipients of the annual STUG award most popular Usenet News transport), conspicuously exhibit a contribution to Manish Prasad wrote the first implementation of SLIP the reusable code-base available to all Peter Salus (Serial Line IP), and defined the first and/or the provision of a significant, Benjamin A. Schmit protocol for running TCP/IP over ordi- enabling technology directly to users in Wenguang Wang nary serial ports (in particular, dial-up a widely available form. modems). The SLIP protocol was super- The 2003 award was given to CVS (the seded, years later, by PPP, which is still in Concurrent Versioning System) and its use. Rick founded a nonprofit telecom- four main authors, Dick Grune, Brian munications company, UUNET, to Berliner, Jeff Polk, and Jim Klingmon. reduce the cost of uucp mail and net- Without CVS, it wouldn’t be possible for news, particularly for rural sites in any number of people to work on the America. (UUNET was founded with a same code without interfering with each $50,000 loan from the USENIX Associa- other. It can be argued that without tion, which was subsequently repaid.) remote-collaboration tools such as CVS, UUNET became an official gateway most of the larger free and open source between UUCP mail and Internet email, software that is available today could not as well as between North America and have existed. While individuals can pro- Europe. It hosted many related services, duce significant software, collaborative such as Internet FTP access for its methods are often needed for complex UUCP clients and the comp.sources. and wide-ranging projects. unix archives. Rick spun out a for-profit company, UUNET Technologies, which Dick Grune: The original author of the was the second ISP in the United States. CVS shell script, written in July 1986, The for-profit company bought the Dick is also credited with many of the assets of the nonprofit, repaying it with a CVS conflict resolution algorithms. He share of the profits over the years. The developed the script at the Free Univer- nonprofit has spent that money for sity of Amsterdam (Vrije Universiteit), many UNIX-related charitable causes where he teaches principles of program- over the years, such as supporting the ming languages and compiler construc- Internet Software Consortium. The for- tion. He was involved in constructing profit ISP became a multi-billion-dollar Algol 68 compilers in the 1970s and par- company and was merged with MFS ticipated in the Amsterdam Compiler

October 2003 ;login: USENIX ATC ‘03 37 Kit in the 1980s. He is co-author of three concerns “how the brain works.”In case “I’m going to make no thunderous pro- books: Programming Language you don’t recall, Descartes separated nouncements,”Stephenson pronounced. Essentials, Modern Compiler Design, and mind and body; Damasio – and, by “Novelists, or coders, or philosophers, or Parsing Techniques: A Practical Guide. extension, Stephenson – think this paleontologists don’t go on churning wrong. The separation actually dates out masses of filterable stuff, but succeed Brian Berliner: Coder and designer of back to Plato, and Stephenson sees it as by doing it right the first time. the first translation of the CVS scripts to the difference between Spock and libido the C language, in April 1989, Brian “What we do is a much more physical (Kirk or Bones). based his design on the original work kind of work and emotional kind of done by Dick Grune. Stephenson went on to talk of his pro- work than is believed. Pressure to just duction of The Big U (1984), his first churn out lines of code will not lead to Jeff Polk: Jeff rewrote most of the code opus, his creation of “steaming moun- good things in the end.” of CVS 1.2. He made just about every- tains of crap,”resulting in a process of thing dynamic (by using malloc), added Yep. cutting and distillation. He analogized a generic hashed list manager, rewrote this with coding and with “distilling the modules’ database parsing in a com- INVITED TALKS whiskey from beer.” patible but extended way, generalized ENGINEERING REUSABLE SOFTWARE LIBRARIES directory hierarchy recursion for virtu- When we execute there is a “foreground Kiem-Phong Vo, AT&T Labs – Research ally all the commands, generalized the process” and a “background process” loginfo file to be used for pre-commit which goes on “all the time.”That back- Summarized by Wenguang Wang checks and commit templates, wrote a ground process needs space to do what Phong Vo has 20 years of experience new and flexible RCS parser, fixed an it does – “which is a mystery.”What we building general purpose software uncountable number of bugs, and have is “a faculty of maintaining the libraries, which cover a wide range of helped in the design of future CVS fea- entire stream all the time, while access- computing areas such as I/O, memory tures. ing it only bit-by-bit at a time.” allocation, container data types, and data transforming. In this talk, he Jim Kingdon: While at Cygnus, in 1993 Stephenson feels that we should “kick showed how to build efficient, flexible, Jim made the first remote CVS, which down the Platonic model.”He is force- and portable software libraries using the ran over TCP or rsh or kerberos’d rsh, fully against PowerPoint. discipline and method architecture. and eventually over TCP/IP. The “I use a fountain pen,”he remarked, and remote-CVS protocol enabled real use of Vo first pointed out that traditional went on to say that he does not use CVS by the open source community; standard software libraries such as mal- “smileys.”He thinks Larry Wall came up before remote CVS, everyone had to log loc, stdio, curses, and libc have many with something in Perl because there’s in to a central server, copy their patches problems. For example, malloc frag- “more than one way to do something.” there, etc. Some years later, Jim formed ments memory; many container data Cyclic, a company which offered CVS Stephenson returned to Damasio and types have multiple incompatible inter- support and development. cited Einstein’s “clear images,”which are faces; inconsistent and inadequate inter- “visual and muscular,”though he admit- faces are common; programmers often KEYNOTE ADDRESS ted that he wasn’t certain what Einstein resort to hacks around problem areas, Neal Stephenson meant by “muscular.”(I’d guess “vigor- which cause problems in maintenance, porting, and performance. Summarized by Peter H. Salus ous” or “forceful” would be the appro- priate gloss: the German is “kraeftig.”) Neal Stephenson – sci-fi author extraor- Vo then discussed the characteristics of dinaire of The Big U (part funny, part Mathematical entities can be combined ideal standard libraries. These libraries silly, but worth it), Snow Crash, and in an infinite number of useless forms, should have good standard interfaces; Cryptonomicon, among others – began Stephenson said, but they are capable of address unsatisfied needs; be easy to his keynote by remarking that Bertrand leading us to mathematical truth. configure, use, and upgrade; and, most Russell and Stephen Jay Gould didn’t “Invention is choice.” importantly, have decent performance. rewrite: they had “no delete key.” These libraries should enable applica- Down with Plato or Descartes: “We pos- tions to tailor algorithms for specific He then discussed at length Antonio sess an emotional marking system.” needs and simplify library composition Damasio’s Descartes’ Error (1994), which to optimize resource usage. Vo argued

38 Vol. 28, No. 5 ;login: that with these ideal libraries in hand, icy. Typical methods include Vmpool, software collection built on top of some programmers could focus on writing which allocates objects of the same size; of these libraries is available at http:// EPORTS libraries instead of writing programs, Vmbest, which allocates objects based www.research.att.com/sw/download. R since a program consists of libraries plus on a best-fit policy; and Vmlast, which the application specifics. The productiv- allocates memory for a complex struc- THE CONVERGENCE OF UBIQUITY: THE FUTURE OF WIRELESS NETWORK SECURITY

ture and releases all ONFERENCE objects together in con- William A. Arbaugh, University of C stant time. Maryland at College Park Summarized by Rik Farrow These methods are pre- Bill Arbaugh, who often works in the defined in the library wireless arena, started off his talk with a and cannot be extended joke (I wish I would remember to do by programmers. They that). He showed slides of Free Software, can be selected dynami- Free Willy, Free Kevin, Free Martha, Free cally by environment Wireless, and even Free Beer. But what variables. For example, Arbaugh really wanted to talk about was after the VMDEBUG the past, present, and future of wireless environment variable is security. set to 1, Vmalloc uses the debugging-enabled Hotspots are a part of the present of USENIX ’03 Reception methods instead of the wireless networks. You can find maps of default-efficient meth- ity of programming should increase due wireless networks, collected by war driv- ods for memory allocation, which allows to the maximal library reuse and mini- ing, for most large American cities, and various memory allocation problems to mal special code. some of these networks are open, requir- be detected automatically. This feature ing no authorization or encryption key. In traditional software libraries, the han- makes the debugging and the profiling Arbaugh postulated that your next gen- dle plus operations model is often used, of applications very easy whenever eration cell phone will work over - where the handle represents resources needed. less networks when possible, and fall and the operations define resource man- back on slower, but more ubiquitous, Vo then used the Vcodex library to show agement functions. Although this model cell phone technology when necessary. how to design a D&M library. The can address immediate needs and is easy As you walk out of a hotspot talking on Vcodex library can transform data using to use, it causes the problems discussed your cell phone, it will switch transpar- compression/decompression, data dif- above. To address these limitations, Vo ently from the Internet to the cellular ferencing, encryption/decryption, etc. presented the discipline and method network. Arbaugh quipped that the poor Designing a D&M library requires deter- architecture (D&M), which was used sound quality provided by cell phones mining resource types and general oper- when he developed the Vmalloc, Sfio, has made short breaks in communica- ations, characterizing resources in a Cdt, and Vcodex libraries with other tion and occasional garbling expected in general discipline interface, and captur- researchers. phone calls, and that will make Internet ing algorithms/resource management in use more acceptable. Vo used the Vmalloc library as an exam- methods. In Vcodex, bytestream is iden- ple to demonstrate the D&M architec- tified as a resource type (i.e., discipline). The ghost of wireless security past is ture. Vmalloc is a popular memory All data-transforming operations (com- WEP, or Wired Equivalent Privacy. allocation library used to replace the pression/decompression, encryption/ Arbaugh described the author of WEP standard malloc interface. In Vmalloc, a decryption, etc.) are different methods as being in hiding, having created a discipline defines the type of memory since they all change some set of bytes to cryptographic protocol that failed in and the event handling of memory allo- produce other bytes. every imaginable way. cation. Typical disciplines include heap All the D&M libraries discussed in this and shared memory. Programmers can The present of wireless security is Wi-Fi talk (Sfio, Cdt, Vmalloc, Vcodex) can be define their own disciplines to extend Protected Access (WPA). WPA bolts on downloaded from http://www.research. the type of memory. A method in Vmal- over existing hardware and solves some att.com/sw/tools. The AST OpenSource loc defines the memory-allocation pol- of the problems inherent in WEP. For

October 2003 ;login: USENIX ATC ‘03 39 example, keys will change with each an enterprise UNIX OS for the IA-64. correctly. In the example she described, packet sent, and packet integrity will SCO has indicated that they may go after the email system used the pads with the actually work, due to a different set of large-scale users of and mailed XOR operation two times too many, algorithms (TKIP). Stronger access con- letters to that effect to 1500 companies. making it possible to extract the pads trol (unlike the MAC addresses used by Even if IBM is exonerated, it is not clear from each message (and decrypt the WEP) is included. WPA2 will follow, that this would prevent SCO from messages as well). require hardware changes, and support bringing suits against other companies. One of Perlman’s personal pet peeves is AES. This is a problem for small companies, screensavers. When her screen goes which lack the money for a well organ- Alas, WPA does not solve the current blank, she immediately assumes that her ized legal defense. denial-of-service issues. Arbaugh screensaver has kicked in, and types her demonstrated this by sending manage- Several legal issues were brought up. password. Of course, while this might be ment packets (which are never One of them, an obscure legal ruling the case, a power-saving feature might encrypted or authenticated), which from the late 1800s involving the sale of have just blanked the screen, so if you knocked people off the room’s wireless a pregnant cow when neither the seller see a weird set of characters in an email channels. nor the buyer knew this fact, may be from her, it just might have been a pass- used to allow SCO to continue, despite word that she just had to change. Arbaugh’s future includes smaller their releasing Linux code under the devices, with the transparent transition- Perlman lambasted systems such as one GPL. ing between hotspots and cellular men- where whenever you change your pass- tioned earlier, IPv6 addresses and SCO has not yet divulged in June where word, it gets emailed back to you. Or an routing, and always-on connections. All the proprietary code is. A panel of peo- SSL-protected online site that emails you devices will need personal firewalls in ple from the open source community is back a receipt that includes your credit addition to anti-virus – not that that will forming to look at the code under a very card information. protect users from the management restrictive NDA, thus making the panel Perlman also contrasted secret and pub- back doors that already exist in many of questionable value. lic keys. One big reason for Kerberos, cell phones today. Jon Hall brought up the point that a quipped Perlman, was the cost of public Obviously, we are in for a lot of sur- comparison of the SCO and Linux code key licenses. Kerberos requires a direc- prises, and interesting work, in the is insufficient, since code could have tory that must always be available and future of wireless. been inherited from a multitude of securely hold everyone’s secrets. Public other places, such as BSD. keys no longer have the onerous license INTELLECTUAL PROPERTY IN AN AGE fee but require Public Key Infrastruc- The impact of this suit on the future is OF COMMERCE: CORE ISSUES IN THE tures instead. SCO / LINUX IP SUIT unclear, although it could affect the will- Chris DiBona, Damage Studios ingness of companies to share code. She described the four types of PKIs: monopoly, oligarchy, anarchy, and bot- Summarized by James Nugent HOW TO BUILD AN INSECURE SYSTEM OUT tom-up. In monopoly, everyone pays Chris DiBona gave a highly informal OF PERFECTLY GOOD CRYPTOGRAPHY Verisign to play. In oligarchy, 80 or more discussion-format talk on the SCO v. Radia Perlman, Sun Microsystems vendors all have self-signed keys in all IBM suit and what it may mean. Jon Laboratories browsers, with no mechanism for revo- Hall and Don Marti (Linux Journal) also Summarized by Rik Farrow cation. In anarchy, we use the PGP participated in a semi-panel format. Radia Perlman always seems like she is model, which scales poorly. Bottom-up First, a general discussion of the issue is having a great time, and this talk was no appeared the most reasonable, in which in order. On March 6, 2003, SCO filed a exception. She selected cryptography each organization has its own certifica- suit alleging that IBM had stolen some bloopers from a much larger collection tion authority (CA), and organizations code from them and placed it in the than she could present during one talk. sign the certificates for the CAs they Linux kernel; they asked for $1 billion She began with an email system based need to work with. [now $3 billion – ed.] in damages. This on a one-time pad. One-time pads are Perlman went on to rant about the was supposedly done as part of Project the strongest, most secure way of craziness in the SSL certificate scheme Monterey , a joint effort involving SCO, encrypting data, provided the pads (use of X.509), the “300 content-free IBM, and other companies to produce themselves are kept secure and are used

40 Vol. 28, No. 5 ;login: pages of ISAKMP framework” for IPSec, should dictate policy but that these matter of choice or desire that some and her top-ten favorite list of people issues should be decided by the market. people would like to have. An important EPORTS and things that get in the way of better counterpoint that came up was how to R The approach that New.net has taken security. During the question and handle Uniform Domain-Name Dis- does not involve configuration pains for answer period, a manager begged her to pute-Resolution Policy (UDRP) when ISPs or individuals and is backward

lobby for better security, but Perlman numerous TLDs exist. For example, how ONFERENCE

compatible with the existing system. C stated that the world just doesn’t care would they deal with someone wanting New.net domains can be enabled either enough. to open up a .boycott or .sucks domain (1) via recursives that rely on US gov- with organization names in it for people ernment root servers but augment with ALTERNATIVE TOP-LEVEL DOMAINS to criticize companies. It would also be New.net domains, or (2) through user (AKA THE GAME OF THE NAME) harder to enforce anti-cybersquatting machines that have the New.net client Steve Hotz, New.net laws. To the speaker’s position that we plug-in (which still relies on the US gov- Summarized by Shashi Guruprasad need a consortium rather than one ernment root servers). New.net cur- organization, one person responded that Steve Hotz, the CTO of New.net, gave rently offers 88 new domains. They ICANN was the consortium. Lastly, it one of the most controversial talks of currently have 150 million customers in was pointed out that New.net names will this year’s USENIX ATC. This was evi- total. They have tied up with five out of be invalid if DNS security extensions are dent from the audience’s hostility and the top seven US-based ISPs and with implemented. bitter reactions, including a spate of pro- many worldwide. The speaker also men- fanities hurled at New.net and a few tioned that their efforts to talk to even directed at the speaker himself. MODELING THE INTERNET ICANN and other organizations have Harry DeLano and Peter H. Salus, Matrix New.net is a domain name registrar not succeeded. They also avoid conflict- offering consumers a large number of NetSystems ing TLDs with ICANN or country-code Summarized by William Acosta new and alternative top-level domains TLDs. However, it remains to be seen if (TLDs) – e.g., .family, .kids, .shop – that As the Internet continues to grow, it ICANN will break the Internet by becomes harder to represent the net- are not ratified by the Internet Corpora- launching conflicting TLDs. New.net tion for Assigned Names and Numbers, work graphically using a geographical also plans to partner with other alterna- model. Such representations of the net- or ICANN, a technical coordination tive TLD providers that may bid on body composed of a broad coalition of work lack the ability to visually express TLDs with ICANN, such as .kids or such details as the location of the “big the Internet’s business, technical, aca- .golf. Some of the big companies such as demic, and user communities. Among pipes,”peering relations, and routing Microsoft, AOL, AT&T, Verisign, and information. To illustrate the increasing other things, ICANN is responsible for Nokia are potential new TLD providers. assignment of domain names. complexity of the visual models of the However, New.net is not really making Internet, the speakers presented a his- New.net’s position is that additional an effort in this direction. The reality of tory of “maps” of the Internet, starting TLDs provide more choices and richer “breaking the root” will occur only if with the earliest known maps of naming schemes and that a market for multiple non-cooperating businesses ARPANET from 1969 and continuing them exists. Currently, there is no real promote competing namespaces. The with the geographical maps through the Internet directory and nothing to fill the last point was that New.net has been 1970s and 1980s that included the first gap between whois and search engines. operational for the past two years, yet satellite links to both Hawaii and Europe An artificial scarcity exists and has not the Internet is not broken. and the integration of multiple networks been mitigated by ICANN. The speaker During the Q&A, one person expressed like Usenet to form what is now know as repeatedly stressed that this is not the the wish that New.net would fail. the Internet. most important problem that needs Another mentioned that New.net has solving in the Internet but just a place One of the fundamental questions that broken the fundamental principle of a modeling attempts to answer is, how where a market exists. Some of the issues single consistent naming system wher- in new TLDs were: how many, Internet does the Internet behave? This question ever you go. Someone else said he can be broken down into several compo- stability, trademarks, who will control believed that things would break only if them, and, finally, where does the money nents: What are the nodes and their con- there was significant adoption. Another nections? How can the Internet be ($1 billion per year) go? New.net’s per- person asked how it matters whether spective is that no single organization visualized? How can its health be moni- there is a .usenix or usenix.org. The tored? How can we detect and respond answer was that it is not a necessity but a to events? Some of the techniques used

October 2003 ;login: USENIX ATC ‘03 41 for measurement analysis include the Someone pointed out that maps of the can “hold hands” with each other to traceroute program, BGP information, physical world are backed by a physical form larger objects. By controlling the and DNS data. Similarly, tools such as reality that is less mutable than the direction of the “hand-holding,”one can ping and the Internet Weather Report Internet. This prompted the question of create solids, liquids, or gases. Control of [now discontinued - ed.] are used for whether we need better mapping sci- such machines requires nanocomputers. monitoring the “health” of the Internet. ence. Other issues raised during the Nanocomputers, however, cannot be However, these tools are not perfect. As Q&A included a discussion on the scala- programmed using the usual techniques. an example, the speakers noted that bility of current measurement tools and The energy required to erase a bit, traceroute does not accurately reflect the the limitation on probing large numbers though small, is very significant at such existence of multiple parallel paths to of hosts frequently. small scales. Thus, these nanocomputers some destination. need to be programmed using reversible URLs programming techniques that avoid The speakers discussed several projects CAIDA: http://www.caida.org/ erasing bits. (from Lumeta, CAIDA, MIDS) which Graphviz: http://www.research.att.com/ analyze and track certain sets of Internet Flying cars are now possible. Airflow can sw/tools/graphviz/ information. In particular, the speakers be controlled over the skin of the car. Lumeta: http://www.lumeta.com/ showed the Internet averages of latency, Negative drag gives all the required MIDS: http://average.matrix.net/ packet loss, and reachability observed thrust. Thus, the car can be sleek and Rocketfuel: http://www.cs.washington. during events like the September 11, elegant. Instant houses? At bacterial edu/research/networking/rocketfuel/ 2001, terrorist attacks and the April replication speeds, nanorobots can turn

Fools’ Virus, to demonstrate how these NANOTECHNOLOGY: AS HARDWARE a pile of dirt into a house. However, tools tracked the effect these events had BECOMES SOFTWARE there exists the problem of runaway on the Internet. Additionally, the speak- J. Storrs Hall, Institute for Molecular nanorobots. There are several means to ers discussed tools, such as Graphviz Manufacturing control these nanorobots. Removal of from AT&T Labs and Walrus from fuel can stop replication. Also, a fixed Summarized by Francis Manoj David CAIDA, that help visualize the topology supply of replication unit kernels can What is nanotechnology? It is the con- of the Internet. control the replication. struction of self-replicating machines Currently, data collection is subjective with atomic precision. Molecular manu- Other interesting applications include and performed in an ad hoc manner. facturing is actually a better term to thin skins that can insulate and protect Additionally, the data sets obtained lack describe this process. The manufactur- humans. These skins can be so thin that context and may be incomplete. As an ing process is not straightforward. they wrap around a single strand of hair, example, it was noted that paths from Because of their size it is not possible to providing air management and sweat traceroute and BGP data contain incon- just pick atoms and place them together. management! Respirocytes are yet sistencies. The speakers suggested that A molecule can be used as a handle to another design for small nanomachines data collection needs to be more objec- move the atom to the desired site. A that function like red blood cells. They tive and that the interpretation of the bond-forming chemical reaction is used can store and release oxygen in the results deserves more scrutiny. To this to fuse the atom to the rest of the struc- bloodstream as and when necessary. end, the speakers suggested the forma- ture. Space travel can also be made cheaper by tion of an international body patterned the construction of an extremely tall Self-replication is a key requirement for after the Centers for Disease Control platform using artificial diamonds. Thus nanomachines. This results in a sophisti- that would both monitor the health of nanotechnology has tremendous poten- cated product with exponential growth the Internet and be able to respond to tial for the future of humankind. in capital because of the 100% reinvest- events such as disasters and virus out- ment. A parts assembly robot is used to URLs: http://www.imm.org/Parts/ breaks. build other robots. Just like a car factory, http://discuss.foresight.org/~josh/ During the Q&A session, John Quarter- pipelining and convergent assembly http://www.losthighways.org/radebaugh. man asked what are the fixed points lines are used to speed up the process. html (analogous to cities on traditional road http://www.moller.com/ Josh went on to illustrate applications of maps) of Internet maps? Suggestions this technology. Nanomachines can be from attendees included using Internet built to simulate molecules. One such exchanges and the root nameservers. design is “utility fog.”These machines

42 Vol. 28, No. 5 ;login: TECHNICAL SESSIONS – ROLE CLASSIFICATION OF HOSTS WITHIN refined versions of most of the algo- GENERAL TRACK ENTERPRISE NETWORKS BASED ON CONNEC- rithms described in this paper. EPORTS

TION PATTERNS R ADMINISTRATION MAGIC Godfrey Tan, John Guttag, and Frans A COOPERATIVE INTERNET BACKUP SCHEME Summarized by Francis Manoj David Kaashoek, MIT; Massimiliano Poletto, Mark Lillibridge, Hewlett-Packard Labs; Mazu Networks Sameh Elnikety, Rice University;

UNDO FOR OPERATORS: BUILDING AN ONFERENCE UNDOABLE EMAIL STORE This paper presents methods to auto- Andrew Birrell, Mike Burrows, and C Michael Isard, Microsoft Research Aaron B. Brown and David A. matically group hosts in a network Patterson, University of California at based on their communication patterns. Backup service providers on the Internet Berkeley Grouping of hosts can simplify network are costly. This paper describes a scheme This won the General Track Best Paper management and can also detect anom- by which individual computers on the award. Human error is the primary bar- alous conditions. For example, a devel- Internet can cooperate to back up each rier to building dependable systems. A oper’s machine behaving like a mana- other’s data. Each computer uses a set of small mistake such as misconfiguring an ger’s machine would be interesting partners to store its backup data. In email virus scanner can result in lots of information to an administrator. return, it holds part of each partner’s lost email. This damage could be pre- backup data. By making redundant The proposed scheme uses probe hard- vented if operators were able to undo copies of the backup on multiple com- ware to sniff all network traffic. A cen- their mistakes. This paper presents an puters, a high level of reliability is tral aggregator then collects the infor- implementation of undo capabilities for obtained. Thus, inexpensive backups can mation from all the probes. It is respon- an email distribution system. be obtained. sible for maintaining a “neighbor rela- Three Rs – rewind, repair, and replay – tionship” between hosts that commun- The backup scheme depends upon are necessary to achieve such capabili- icate regularly. Groups are then identi- cooperation between hosts. There are ties. Rewind brings back a system to a fied based on host similarities. Similarity always bound to be hosts that refuse to time in the past, repair fixes the mistake, is defined as the number of common cooperate. In its basic form, this scheme and replay ensures that new information neighbors. Group formation is treated has several pitfalls. Hosts may promise is not lost. Paradoxes can arise during a as a graph theory problem. Each node in to hold data and not keep their word. replay, though. This is because the sys- the graph is a host and each edge with This is discouraged by several mecha- tem might make changes to message weight e represents the fact that there are nisms, including periodic challenges to bodies or restore missing email to a e common neighbors between the hosts. ensure that partners are cooperating and mailbox. To explain this to the user, For a given value of similarity, say k,a k- a novel method called “disk-space wast- explanatory emails are sent. neighborhood graph is constructed with ing” designed to make cheating unprof- edges of weight k. Bi-connected compo- itable. Attacks aimed at disrupting the The architecture for the undoable email nents in this k-neighborhood graph are service are also possible in the basic store consists of rewindable storage and considered separate groups. Special cases scheme. The paper outlines some solu- a proxy that intercepts user events. All are used when the graph is a tree. Groups tions to prevent these attacks as well. user events are encoded as “verbs” which can also be merged into a larger group are logged. An undo manager is used to Results from an initial prototype show based on group similarities. For more control all undo operations. Studies with that these techniques are feasible as far details on this, please refer to the paper. a Java implementation show that the as performance is concerned. The costs time and space overheads are not signifi- The effectiveness of this scheme has involved are also quite low compared to cant. Responding to a question at the been studied by comparing the groups other Internet backup options. end of the presentation, Aaron also clar- generated automatically with groups of ified that the explanatory messages hosts determined by system administra- themselves are not treated as user events tors. Results of this study show that the and hence no verbs are generated for groups extracted automatically closely them. match the desired groups. This scheme shows that automatic role classification of hosts based on their communication patterns is feasible. Mazu Network’s PowerSecure software now incorporates

October 2003 ;login: USENIX ATC ‘03 43 POWER ory nodes each process uses creates Machine (JVM), and by sharing Summarized by Hai Huang opportunities to power off a large per- resources among these processes, it centage of nodes in the system, thereby reduces resource consumption and CURRENTCY: A UNIFYING ABSTRACTION FOR saving a significant amount of energy. improves overall performance. Multi- EXPRESSING ENERGY MANAGEMENT POLICIES The authors introduced other tech- user MVM takes this a step further and Heng Zeng, Carla S. Ellis, Alvin R. Lebeck, niques – library aggregation and page allows multiple users to safely execute and Amin Vahdat, Duke University migration – to achieve energy savings. different processes on the same MVM. A system-level energy management These techniques are compatible with As if executing in a traditional OS, technique is discussed. Energy is classi- various DRAM architectures, including processes, files, and other resources fied as yet another resource that can be SDR, DDR, and RDRAM. belonging to the same user will not be managed by the . An illegally accessed by another user. MVM- energy quota is allocated to each process GET VIRTUAL 2 extends MVM by having a separate periodically, and when each process Summarized by Hai Huang instance of Jlogin per user session to accesses an energy-consuming hardware manage user identity, environment, and device (e.g., disk, CPU, network inter- OPERATING SYSTEM SUPPORT FOR VIRTUAL other access control information. It was face), it is charged the amount of energy MACHINES shown that for a typical workload, that was spent accessing the device. Samuel T. King, George W. Dunlap, and MVM-2 only incurs a small perform- When all its energy quota is spent, it Peter M. Chen, University of Michigan ance overhead over MVM, while provid- cannot execute further until the operat- The authors describe several techniques ing a safe multi-user environment. ing system replenishes its quota. The to reduce the virtualization overhead authors were able to show that the sys- associated with type II virtual-machine NEEDLES AND HAYSTACKS tem is able to achieve desired runtime, if monitors (VMM) so they can achieve it is possible, with a high success rate. By performance similar to type I VMMs. In Summarized by Wenguang Wang managing the energy quota of each their experiments, they used UMLinux A LOGIC FILE SYSTEM application, it is easy to suppress the as the guest operating system. One of Yoann Padioleau and Olivier Ridoux, execution of the less important tasks, the techniques they use to improve per- IRISA / University of Rennes while giving more energy to the more formance is to move the VMM from the Organizing information using the hier- important ones to keep the system run- user-space to the kernel, so they can archical paradigm, as is done by tradi- ning longer when the energy supply is avoid high-overhead ptrace calls and tional file systems, can provide naviga- low. reduce the number of context switches tion functions but is rigid in that an when guest system calls/signals are object can only be reached via one path. DESIGN AND IMPLEMENTATION OF POWER- invoked. Other techniques include the The Boolean query paradigm used by AWARE VIRTUAL MEMORY use of segmentation registers to reduce search engines provides flexible key- Hai Huang, Padmanabhan Pillai, and overhead of guest user-to-system and word-based search capability but lacks a Kang G. Shin, University of Michigan guest system-to-user boundary crossing, navigation mechanism. In this talk, A novel technique to reduce the power and the use of multiple address spaces Yoann Padioleau presented a logic file dissipation in the DRAM was presented. for the virtual machine to reduce the system (LISFS) based on the Boolean As workloads become more data-cen- guest user-to-user switching overhead. query paradigm but extended to provide tric, more energy is spent to sustain the The performance of their system was navigation. Each file in LISFS is associ- ever-growing DRAM in systems. The comparable to that achieved by VMware ated with a conjunction of properties of intuitive basis of current power-man- Workstation. interest. The directories are used to rep- agement technology is that processes resent properties. One can navigate in execute one at a time, and when each is A MULTI-USER VIRTUAL MACHINE LISFS by giving the desired or undesired executing, it only uses a small fraction of Grzegorz Czajkowski and Laurent properties as directory names. the total system memory. By figuring Daynès, Sun Microsystems; Ben Titzer, out the memory nodes each process Purdue University A prototype of LISFS has been imple- uses, energy can be saved by putting This paper describes the implementa- mented by the authors. The data struc- unused memory nodes into a low energy tion of adding multi-user capability in a ture of the current implementation is state. Proactively allocating physical multi-tasking virtual machine (MVM). optimized for searching and navigation. pages to minimize the number of mem- A MVM allows a user to execute multi- It is somewhat slow for file and property ple processes in the same Java Virtual creation. The performance experiments

44 Vol. 28, No. 5 ;login: of the prototype showed efficient navi- They found that delta-encoding using achieves a significant reduction of run- gation execution but 4 to 34 times Rabin fingerprints can improve on sim- time when reading large files from the EPORTS longer creation time than ext2. The disk ple compression by up to a factor of two, server, given that a significant portion of R space overhead is 2–5KB per file, given depending on workload. A small frac- the file blocks can be found in the CAS. 50 properties. A prototype of LISFS and tion of objects can potentially account

more information on this project can be for a large portion of the space and CHANGE IS CONSTANT ONFERENCE downloaded at http://www.irisa.fr/LIS. bandwidth savings. More importantly, C

Summarized by Shashi Guruprasad when multiple files match the same SYSTEM SUPPORT FOR ONLINE APPLICATION-SPECIFIC DELTA-ENCODING VIA number of features, any file can be used RECONFIGURATION RESEMBLANCE DETECTION as a good base for computing delta. Fred Douglis and Arun Iyengar, IBM T.J. Craig A.N. Soules, Gregory R. Ganger, Carnegie Mellon University; Jonathan Watson Research Center OPPORTUNISTIC USE OF CONTENT Appavoo, Michael Stumm, and Kevin Delta-encoding is a data-compressing ADDRESSABLE STORAGE FOR DISTRIBUTED Hui, University of Toronto; Robert W. FILE SYSTEMS method that represents an object using Wisniewski, Dilma Da Silva, Orran Niraj Tolia, Mahadev Satyanarayanan, its difference relative to a similar object. Krieger, Marc Auslander, Michal Carnegie Mellon University and Intel In this talk, Fred Douglis showed how to Ostrowski, Bryan Rosenburg, and Jimi Research Pittsburgh; Michael Kozuch, use delta-encoding to compress a set of Xenidis, IBM T.J. Watson Research Brad Karp, Intel Research Pittsburgh; files without specific knowledge of these Center Thomas Bressoud, Denison University files in advance. Unlike previous and Intel Research Pittsburgh; Adrian Craig Soules spoke about their work to approaches, the resemblance between Perrig, Carnegie Mellon University extend and replace active OS compo- files is detected dynamically. A distributed file system nents to perform an online reconfigura- on WAN is often slow. tion. OSes are complex pieces of Niraj Tolia presented software that are required to work under their work on building a a variety of hardware resources and distributed file system, usage patterns. To overcome this prob- called CASPER, which lem of one size fits all, OSes are required exploits the readily avail- to be tuned with the right set of compo- able Content Address- nents to work well under many demand- able Storage (CAS) to ing conditions. The need to bring down cache the objects in a the system in order to reconfigure it is remote server so that the costly in terms of availability, human read traffic through time, and lost system state. The goal is to WAN can be reduced. perform such online reconfiguration with minimal overhead. Implementing Ted Hayelka, Jim Larson, and Bart Massey enjoying the File recipes, which are this in a traditional OS is difficult USENIX Reception the file content hashes because of unstructured component that describe the data boundaries, and thus it is hard to back- blocks composing the file, are computed Douglis first explained how to detect fit new code. An object-oriented OS for each file and stored in the server. resemblance between objects. The Rabin such as IBM’s K42 is a suitable fit and is When a client wants to read a large file, fingerprints are used to compute hashes used for this purpose. it first fetches the recipes of this file from of overlapping sequences of bytes in a the server. The client then uses these To support online reconfiguration, com- file. A subset of these fingerprints (i.e., recipes as keys to search the file blocks in ponent boundaries must be clearly features) are used to represent the file. CAS. If a block is found, the client defined, external references to the com- Files sharing many features would, retrieves it from CAS; otherwise, the ponent must be updated, state transfer hopefully, have similar contents. client reads it from the server. The client mechanisms must be defined between After two objects are detected as similar, finally reconstitutes the full file content the components being replaced, and delta-encoding is applied to compress using these blocks. The experimental components should be quiesced during the objects. The authors evaluated a results of CASPER showed that when reconfiguration so as to avoid state cor- number of parameters of this approach. client-server bandwidth is low, CASPER ruption. An object translation (indirec-

October 2003 ;login: USENIX ATC ‘03 45 tion) table approach is used to take care application to achieve the goal. A tool CUP: CONTROLLED UPDATE PROPAGATION of external references. Components han- with an architecture-independent API IN PEER-TO-PEER NETWORKS dle their own state transfers, since the known as Dyninst is used for “hijacking” Mema Roussopoulos and Mary Baker, states of different components vary. An an application. It’s easier to hijack a Stanford University interposition phase interposes a media- dynamically linked library than a stati- Mema Roussopoulos presented work around an existing object that is cally linked one, which requires intro- addressing the problem of reducing being reconfigured. The mediator per- ducing the dynamic linker code into the search query latency in peer-to-peer forms the hot-swapping of the object in application’s process space. In order to (P2P) systems. Search queries are a per- three phases: forward, block, and trans- achieve a high level of transparency, the formance bottleneck in both structured fer. It uses a thread-generation mecha- system tries to automatically determine (Chord, CAN, Pastry, Tapestry) and nism to detect active calls, and the original X-server host so as to redi- unstructured (Gnutella, Freenet) P2P eventually blocks new calls and thus rect communication to the new X- systems. Recent work has used caching detects when a quiescent state is server. This is achieved by enumerating of metadata that is returned in response reached. Once new calls are blocked, the the process’s socket descriptors that are to these queries at intermediate nodes new object is hot-swapped and a state communicating with a peer port that along the path and is referred to as Path transfer is initiated. After this process, falls in the well-known X-server port Caching with Expiration (PCX). PCX the mediator is detached, the old com- range. This could fail, however, if tun- mechanisms are only a partial solution, ponent is destroyed, and all new calls neling is used, and is remedied by trying as there is no maintenance of the caches directly occur on the new component. to determine the right socket by a brute- along the path. force approach which involves connect- The performance overhead was around CUP addresses this problem by having ing with every port that the application 500 CPU cycles for the interposition asynchronous propagation of metadata is already connected to. A user could phase and 4000 cycles for the hot-swap- updates with independent local node also explicitly provide this information. ping phase on an IBM RS/6000 24 policies rather than global ones. It is Another step in the process is to find a 600Mhz Power-PC CPU-based server. independent of the underlying search suitable point in the X message stream Two benchmarks, Postmark and SDET, algorithm and uses incentive-based poli- so as not to corrupt or miss any mes- were used for the evaluation. Several cies. A node propagates an update only if sage. This is performed by walking well-known adaptive algorithms per- it has an incentive to do so. For example, through the process stack in search of X formed the online reconfiguration to a node would be willing to receive library stub functions and, if any exist, demonstrate the utility. Some of the updates as long as there are queries for a repeating the operation after a timeout. open issues include coordinated swap- particular piece of metadata, since the A limitation of this approach currently ping and recovery from broken compo- node would have to propagate the query is that it does not work with stripped nents that replace working ones. further if it did not maintain the cache statically linked binaries, for which alter- locally. Two policies are defined, proba- CHECKPOINTS OF GUI-BASED APPLICATIONS nate mechanisms are being worked out. bilistic and history-based. Separate logi- The last step is to retrieve the list of GUI Victor C. Zandy and Barton P. Miller, cal query and update channels are University of Wisconsin resources that exist on the source X- maintained and queries are coalesced. server and re-create them on the target Victor Zandy talked about transparently The number of overlay round-trip hops X-server. X-fonts present a problem migrating the GUI (X Windows) for queries to come back with answers because of lack of enough state on the belonging to unmodified applications starting from the originator is used as a application side to determine the font between hosts without premeditation, a metric for the cost of a query. Miss cost names. The current solution is to system that he calls “guievict.”GUI repli- is defined as the total number of hops retrieve all fonts and match the font cation as well as process migration is incurred by all misses. As long as the dif- geometry with the ones being used. This also possible using this system. Some of ference in miss costs between PCX and hideously slow approach will likely be the related work such as VNC or xmove CUP is larger than the overhead of solved with client-side fonts in the needs premeditation and uses proxies to update propagation, CUP recovers its future. Currently, this overhead is miti- achieve GUI migration. cost. gated in subsequent requests by caching The interesting part of the system is the font data. The Stanford Narses P2P flow-simulator use of program-editing techniques to is used to evaluate the algorithm with a URL: introduce new code into a running variety of cut-off policies, network scales http://www.cs.wisc.edu/~zandy/guievict/

46 Vol. 28, No. 5 ;login: and topologies, outgoing update capac- smarter load balancing, algorithm even override permissions found in the ity, and query arrival distributions. chaining within the kernel thread, using underlying file system (when mounted EPORTS

Based on their model, CUP recovers its the second processor as a cryptographic with VFS bypass permission). R cost by a factor of 2 to 300 under a vari- device in dual-processor systems, and NCryptfs outperformed CFS, TCFS, but ety of workloads described in detail in minimizing copying overhead. not BestCrypt in the Am-utils bench-

the paper. The various cut-off policies ONFERENCE

mark. In an I/O-intensive benchmark, C have comparable performance when the NCRYPTFS: A SECURE AND CONVENIENT CRYPTOGRAPHIC FILE SYSTEM NCryptfs was fastest. Future work query rates are high, while second- includes better key management, lock- chance history-based policy is better Charles P. Wright, Michael C. Martino, and Erez Zadok, Stony Brook University box mode, centralized key servers, and with low query rates. Charles Wright presented this paper by threshold secret serving. SECURITY MECHANISMS explaining motivation (providing pro- Matt Blaze was on his feet even before Summarized by Rik Farrow tection to data) and describing prior Wright finished. Blaze asked if Wright work. He mentioned that Matt Blaze’s was sure he didn’t have the CFS and THE DESIGN OF THE OPENBSD CFS suffered from network and data TCFS performance reversed? Wright CRYPTOGRAPHIC FRAMEWORK copy overhead, and I watched Blaze, just said that they fixed some problems with Angelos D. Keromytis, Columbia a few rows ahead of me, sit up a bit TCFS that had a performance impact. University; Jason L. Wright and Theo de straighter at that statement. Wright also Raadt, OpenBSD Project All three crypto file systems were using mentioned TCFS, Microsoft’s EFS , but had different overhead Angelos Keromytis described the ratio- (which does not work over a network), outside of encryption. nale and mechanisms behind the API Cryptfs, a proof of concept by the same created within OpenBSD, and adopted authors, BestCrypt (commercial), and Marc Stavely asked, Why not use under- by FreeBSD and NetBSD, for integrating StegFS. lying file system checks? Wright said that hardware cryptographic devices. The there are no ACLs on lower-level file sys- basic concept was to provide an abstrac- Design goals include strong encryption; tems, and that their ACLs provide the tion layer so that programs and kernel convenience for users, system adminis- key needed to read/write files. routines could ignore the differences trators, and programmers; and perform- between devices, or even the lack of a ance. NCryptfs has three sets of players: A BINARY REWRITING DEFENSE AGAINST device. Previous implementations could system administrators, who set up STACK-BASED BUFFER OVERFLOW ATTACKS stall the CPU waiting for a device to NCryptfs through mounts but do not Manish Prasad and Tzi-cker Chiueh, respond. hold keys; owners, who hold encryption Stony Brook University keys; and others, called readers and writ- Manish Prasad explained that the goal Within the kernel, three functions han- ers, to whom access is delegated by own- was to instrument a program to defend dle creating, dispatching, and freeing a ers. All keys are stored using a long-term against stack-based buffer overflow crypto_session. The dispatch call key, and each owner uses a passphrase attacks without having access to the includes a callback parameter, so that (currently) to authenticate and access source. There are tools available, such as the crypto device can progress asynchro- his or her own keys. Etch, that can perform binary code nously. Out in user-land, /dev/crypto Borrowing from Blaze’s CFS, NCryptfs translations, but the authors are not provides a uniform entry point, con- aware of any that perform return trolled via sysctl() calls. The current ver- also uses an attach to set up encrypted directories for owners. An attach is like a address defense (RAD) based on binary sion of OpenBSD Cryptographic rewriting. Framework (OCF) is synchronous at the lightweight user-mode mount that, user level. unlike a regular mount, cannot hide files The authors use both linear instruction or directories under a mount point. disassembly and control flow analysis. Another big goal of OCF beyond trans- Only data gets encrypted, not the meta- Because Intel processors use variable- parency is performance. The new API data, so any underlying store can be length instructions, and 248 out of 256 does add overhead, but in systems with used (local disk, NFS, or CIFS). In an bytes are legitimate starting points for hardware cryptographic devices it interesting addition to this scheme, instructions, linear analysis has real improves performance because the owners may delegate their access to indi- problems distinguishing code from data. hardware devices can operate in parallel viduals and to arbitrary groups, simply The authors then follow with a second with the CPU. Future work includes by adding authorizations for several pass, starting with the code’s entry point individuals. And this delegation can

October 2003 ;login: USENIX ATC ‘03 47 (as found in the program header), and ing I/O operations with significantly MULTIPROCESSOR SUPPORT FOR following all function calls and fewer system calls. It is implemented by EVENT-DRIVEN PROGRAMS branches. Even here there are problems, exposing certain elements of a connec- Nickolai Zeldovich, Stanford University; as GUI-based applications tend to use tion’s state to user-space over a piece of Alexander Yip, Frank Dabek, Frans callbacks instead of direct function calls. memory shared between kernel and Kaashoek, and Robert T. Morris, MIT; user-land. The amount of shared mem- David Mazières, New York University RAD replaces the prologue and epilogue ory required is just a small fraction of a The contribution of this paper is liba- of functions with its own instructions, typical Web server main memory (1MB, sync-smp, a library that allows event- storing the return address during the for application with 65K concurrent driven applications to take advantage of call and checking it when the function connections, 16 bytes per connection). multiprocessors by running code for returns. They instrumented several The authors implemented a user-level event handlers in parallel. Typically, Microsoft applications, and estimate select() wrapper called uselect(), which event-driven programs are structured as that they missed less than 1% of func- reads from the shared memory area a collection of callback functions which tions. When used against open source whenever possible, and calls select() a main loop calls as I/O events occur. Windows applications, such as gzip and only when the required information is However, to execute callbacks concur- Wget, they obtained similar results, but not available in user-space. rently on a multiprocessor requires run- not with Apache, which includes func- ning multiple copies of the application tions without any absolute addresses The other mechanism proposed is data- or fine-grained synchronization. (called from tables). stream splicing, which allows forwarding of data between server and client This paper maps this problem to graph- Overhead for an instrumented binary streams from within the kernel, thus coloring by allowing the programmer to varied 1–4%, and a test of an exploit reducing a lot of data-copy and context- choose a color for each callback, so that against an RAD-protected version of switching overhead. The two connec- callbacks with the same color are never winhlp32.exe worked. Prahad ended his tions (to client and to server) are spliced executed concurrently. Thus, event- last slide with a note: I’m looking for at the socket level. The novel features in driven servers can get more juice out of work! One person asked if safe lan- this mechanism are support for request a multiprocessor machine with minor guages would help. Prahad answered pipelining and persistent connections, modifications in code. As proof of con- that that is the best solution. A person content caching decoupled from client cept, they modified the SFS file server from CERT asked about how RAD han- aborts, and efficient splicing even for (about 90 lines of changed code out of a dles detected changes. Prahad said that short transfers. total of about 12,000), and observed that the program exits. the modified version on a four-CPU The proposed mechanisms were evalu- server ran 2.5 times as fast as an unmod- FAST SERVERS ated on a testbed comprising commod- ified SFS server running on one CPU. Summarized by Manish Prasad ity hardware. Experiments were done Further improvements were observed in using the Squid proxy server and bench- KERNEL SUPPORT FOR FASTER WEB PROXIES task-processing rates with thread-affin- marked using PolyGraph. The speaker ity optimizations. Marcel-Catalin Rosu and Daniela Rosu, presented CPU utilization results for dif- IBM T.J. Watson Research Center ferent request rates for a benchmark BIG DATA The paper addresses the challenges in biased toward small files, which showed Summarized by Manish Prasad Web proxy design that arise out of a best overhead reduction with a combi- large number of TCP connections. Han- nation of uselect and splicing. Splicing SENECA: REMOTE MIRRORING DONE WRITE dling a large number of network con- achieved very good results for large Minwen Ji, Alistair Veitch, and John nections causes the proxy server to incur objects. Also, the proxy hit response Wilkes, Hewlett-Packard Labs huge overheads due to context switches times showed substantial reductions Minwen Ji presented Seneca, a prototype and user/kernel data copies. The paper using a combination of uselect and remote-mirroring storage system. A pri- presents two mechanisms to ameliorate splice (10–30%). However, the miss mary contribution of the paper is a this overhead, namely, user-level con- response times reduced only by about comprehensive taxonomy of design nection tracking and data-stream splic- 0.5 to 1.3% using the same combination. choices for remote mirroring over vari- ing. ous axes: fault-coverage (component- User-level connection tracking allows an failure tolerance), degree of synchron- application to coordinate its non-block- ization (divergence), update propagation

48 Vol. 28, No. 5 ;login: and acknowledgment, and location of lows an access-based policy, which FAST, SCALABLE DISK IMAGING WITH FRISBEE data duplication. They also classify places a block into a cache when this Mike Hibler, Leigh Stoller, Jay Lepreau, EPORTS related work in the area, based on their block is accessed, so that the block that Robert Ricci, and Chad Barb, University R taxonomy. resides in the upper-level cache is also of Utah contained in the lower-level cache. Robert Ricci talked about Frisbee, a pro- The second contribution of the paper is

While this might be essential when totype disk-imaging system from Utah. ONFERENCE

the design of a robust asynchronous C upper-level caches are significantly

The paper motivates the use of raw disk remote-mirroring protocol that sup- smaller than the lower-level caches, it’s imaging over differential update which ports write coalescing, asynchronous not quite so important in a storage operates above the file system. Advan- propagation, and in-order delivery and cache hierarchy where storage-server tages offered by disk imaging include that provides resilience to many kinds and file-server (storage client) caches are generality across file systems, robustness and sequences of failures, low network comparable in size. An eviction-based to file-system corruption, and speed. bandwidth demands, and low (and tun- strategy places a block in the cache only Disk imaging, however, is low on band- able) data loss. The principal goal of when it is evicted from an upper level. width efficiency. Frisbee (derived from Seneca’s design is to make the data avail- “flying disks”) incorporates file-system- able and keep each copy consistent The paper uses idle distance as a metric aware data compression, data segmenta- despite disk array failures, Seneca box for evaluating the effectiveness of a tion, and a custom application-level failures, network failures, and both tem- cache-placement policy. Idle distance is reliable multicast protocol. porary and permanent site outages. Its the period of time the block resides in levels of fallback divergence modes are the cache without being accessed. A bet- The talk proceeded with a description of time-bounded, log-space-bounded, and ter cache-placement policy will have the Emulab environment, where the unbounded. Updates are propagated lower average idle distances. The paper prototype was built and tested, which either atomically in order, asynchronous presents real-world storage-access traces, comprises a cluster of 168 commodity batched, or out of order. In Seneca, the which reveal that eviction-based strate- PCs. Being a time-shared setup (in the data duplication is done in the duplexed gies show lower average idle distances true sense of the term), every user has SAN appliance. than access-based policies. The speaker root access to all machines in the cluster. presented simulation results comparing This necessitates returning the cluster Seneca’s correctness is verified using a the two placement policies as applied in nodes to a known state with change of simulator which “approximates” the I/O combination with various cache- users. Thus a system like Frisbee turns automaton model. One of the key goals replacement strategies (LRU, frequency out to be a compelling requirement in of performance evaluation experiments based, 2Q, MQ), which showed that the an environment like Emulab. is to answer the question as to how eviction-based strategy always per- much network bandwidth can be saved Frisbee was evaluated for scalability with formed better. by delaying I/Os (asynchrony) and coa- an increasing number of cluster nodes lescing writes. Results show that sub- The work proposes a mechanism to (from 1 to 80). Experiments were also stantial reductions in WAN traffic transparently obtain eviction informa- conducted to evaluate scalability in the (5–40%) were observed for a batch size tion from client buffer caches, which is presence of various percentages of of 30 seconds. nontrivial because a client buffer cache packet loss. Frisbee has been in produc- always silently evicts a clean page and tion use for over 18 months. Speed was EVICTION-BASED CACHE PLACEMENT FOR only writes out dirty pages to the back- reported as the single most attractive STORAGE CACHES end storage system. The main idea is to feature of Frisbee, being able to write Zhifeng Chen and Yuanyuan Zhou, Uni- maintain a mapping between buffer 50GB of data to 80 disks in 34 seconds. versity of Illinois at Urbana-Champaign; addresses and the disk block that it On this note Ricci quoted colleague Kai Li, Princeton University houses, so that by intercepting read/ Mike Hibler: “Earlier I could go for The work addresses the problem of write I/O operations, any changes lunch while the disk reloads, but now I cache placement (not replacement) in detected in the mapping can be attrib- can’t even make it to the bathroom!” the context of buffer cache management uted to the block being evicted from the for lower-level caches (resident on the cache. Evaluation on real systems back-end storage server) in a multi-level showed a 22% improvement in cache hit storage hierarchy (client-file server-disk ratios and a 20% improvement in the array). Cache placement typically fol- transaction rate.

October 2003 ;login: USENIX ATC ‘03 49 FREENIX TRACK and distributed anti-spam networks. involves extracting information from the Each of these methods has many draw- header and body of the message. For NETWORK SERVICES backs, however, including high percent- example, SpamAssassin uses hand-coded IMPLEMENTATION OF A MODERN WEB ages of false negatives. ASK proposes a features with linear weights combined SEARCH ENGINE CLUSTER new solution to the problem by using a with threshold values. More sophisti- Maxim Lifantsev and Tzi-cker Chiueh, challenge-authentication model. It cated techniques include information Stony Brook University applies authentication to the email with- gain and clustering. The simplest format Summarized by William Acosta out interpreting the content and focuses of features is binary, works on all algo- The authors presented the design and on validating the senders rather than the rithms, and is robust. There are a wide analysis of Yuntis, a prototype for a scal- message; this approach is highly effec- variety of machine-learning techniques. able and extensible Web search engine. tive, since spammers use fake addresses The authors chose techniques that were The design of Yuntis attempts to provide to hide their identity. The system con- simple, popular, and educational. The performance by using an event-driven sists of three lists: white, ignore, and general description of machine learning model with one main thread of execu- black. When a message is received into a involves gathering a large number of tion and “select”-like functionality to waiting queue, a confirmation is sent to training instances, measuring statistical avoid the context-switching overhead of the sender, and once a confirmation is properties with respect to a feature set, a multi-threaded design. Similarly, Yun- received the message is moved into the classifying target instances, and, finally, tis employs custom disk data storage and in-box and the email address of the measuring the accuracy of the classifica- intra-cluster communication mecha- sender is added to the white list; there- tion. nisms. after, a confirmation from that sender Bart Massey described five different will not be required. The overall process of creating the techniques for mail classification, focus- search database involves crawling the There are a few specific issues that the ing on only a few. The most simple is the network to retrieve documents, which software deals with. In order to catch minimal Hamming Distance Voting. are compressed and stored upon suc- spammers who impersonate the owner, Here a message is compared against the cessful retrieval. A page quality score is a “mailkey” can be added to one’s own messages in the corpus to determine its iteratively computed, keywords are mail. To avoid mail loops, the software classification; the classification time for extracted, and an index is created of the keeps the last few emails (the number is this method is huge and it is necessary extracted phrases. The prototype imple- variable) in a FIFO queue and checks to have a large variety in the data. The mentation consists of 12 cluster nodes whether a message occurs more than simplest neural net is a single neuron and a data set of 4 million crawled once. It is also possible to send com- call perceptron, but a more complex pages, with statistics and information mands to configure the queue remotely. neural net works even better. Basically, stored in 121 separate data tables. The A limitation of the software is dealing features are given as input and the net- content is partitioned over the entire with bounces: The software cannot dis- work’s output provides the classification. cluster. Future work includes workload tinguish between real and spam bounces There is a need for a representative data balancing of the cluster, scalability without accessing the MTA. Another set. Tests were run on different corpora, improvements, and fault-tolerance issue that needs to be addressed is that personal messages, and synthetic data. mechanisms. simple challenges can be defeated with a There were enough different characteris- smart auto-responder. tics to make a difference. The complex URL: http://www.ecsl.cs.sunysb.edu/ neural network seemed to have the best yuntis/ LEARNING SPAM: SIMPLE TECHNIQUES FOR results, with <1% false positives and FREELY AVAILABLE SOFTWARE 1–3% false negatives. Even humans can- MAIL Bart Massey, Raya Budrevich, Scott not achieve 0% misclassification rates. Summarized by Raya Budrevich Long, Mick Thomure, Portland State Machine learning is a key aspect of the University ASK: ACTIVE SPAM KILLER filtering. Marco Paganini Today there are many approaches to the problem of spam, including black/white Currently, there are three main preven- lists, laws, and automated filtering, tion approaches to the problem of which includes feature recognition and unwanted commercial email: real-time classification; these methods can all black hole lists, keyword identification, work together. Feature detection

50 Vol. 28, No. 5 ;login: NETWORK PROTOCOLS tion, rsync would be a good tool for syn- ing system, where a hashtable had been Summarized by Benjamin A. Schmit chronization with mobile devices. With designed too small for today’s require- EPORTS Network Programming for the approach presented, the update of a ments. R file can be done in place, without the the Rest of Us After these problems were solved, the creation of a temporary work copy. Glyph Lefkowitz, Twisted Matrix Labs; authors improved the NFS server per-

Itamar Shtull-Trauring, Zoteca ONFERENCE

A problem that needed to be solved, formance by introducing cursors, which C

Itamar Shtull-Trauring started the first however, was that data necessary for made the server able to recognize several talk by comparing network program- later rsync commands should not be concurrent file reads. On a heavily ming to driving a car: An automatic overwritten by earlier ones (as he loaded NFS server, these optimizations transmission frees the driver from hav- explained, rsync does its work by trying speed up file requests by a factor of up ing to make low-level choices, at the to find blocks from the old file some- to 2.4. possible cost of some performance. Sim- where in the new file). The solution here ilarly, programmers who want to do is the creation of a dependency graph; BIOS AND VIRTUAL DEVICES networking without caring for peak per- though it might contain cycles, these are Summarized by Shashi Guruprasad formance should use a networking broken up by retransmitting data over- FLEXIBILITY IN ROM: A STACKABLE OPEN toolkit. written by earlier commands, for which SOURCE BIOS two algorithms are implemented. The main design goal for the Twisted Adam Agnew, Adam Sulmicki, William networking framework was providing a The benchmarks, which update files typ- Arbaugh, University of Maryland at cross-platform solution that still allows ically used on handheld computers, College Park; Ronald Minnich, Los access to platform-specific features. It is show that there is very little difference in Alamos National Labs written in Python, a high-level language performance as compared to the original This won the Best Student Paper award. that helps beginners avoid common pro- rsync implementation. The new algo- Adam Agnew presented the first open gramming errors by providing, for rithm, however, needs more main mem- source PC BIOS that could boot any example, automatic boundary checking. ory for calculating the dependency modern OS. They leveraged earlier work The framework is built around an event graph. This problem can be solved by in Linux BIOS, which could only boot loop and allows programmers to easily introducing windowing, which is not yet Linux. Unlike Linux, some of the OSes add even complex services such as a Web implemented. rely on services such as video, hard server to their programs. drive, memory sizing, and a PCI table NFS TRICKS AND BENCHMARKING TRAPS provided by legacy BIOSes. Therefore In order to show how the Twisted Daniel Ellard and Margo Seltzer, Linux BIOS could not directly boot framework can help a programmer Harvard University these OSes. Legacy BIOSes do a poor job speed up development, Conch, an SSH The initial research goal of this paper of setting up a PC, requiring modern application, was implemented. The was to improve the performance of an OSes such as Linux to redo some of the implementation contains 5000 lines of NFS server by optimizing its read-ahead tasks, increasing bootstrap latency. The code written by a single person, as com- caching strategy. Daniel Ellard pointed advantages of using Linux BIOS are pared to 64,000 lines by 84 people in the out that 5–10% of the NFS requests numerous: (1) dramatic increase in boot case of OpenSSH. The framework is arrive at the server in the wrong order, speed – for example, they have achieved available for download from http://www. which could hinder it from doing read- a record of a three-second boot – the twistedmatrix.com/ and was released ahead properly. main bottleneck is the time required to under the LGPL. bring the IDE hard drive spin to normal The benchmarks designed to measure operational speed; (2) small size – an IN-PLACE RSYNC: FILE SYNCHRONIZATION performance improvements showed an image size of 36KB uncompressed offers FOR MOBILE AND WIRELESS DEVICES unusually large amount of variance. scope for more powerful tools to be part David Rasch and Randal Burns, Johns This was caused mainly by the fact that of BIOS; (3) open source – it’s easily Hopkins University modern hard disks have different data debuggable and royalty free and saves David Rasch identified the space require- transmission rates at different physical motherboard manufacturers $3–$5 per ment of rsync as a major problem for areas. New benchmarks that took this PC in royalties; (4) remote administra- mobile and wireless devices with little into account still showed problems, this tion is possible via console support. storage capacity. Without this restric- time located within the FreeBSD operat-

October 2003 ;login: USENIX ATC ‘03 51 The proposed solution employs a com- in supporting denser clusters that have better BIOS integration, serial port emu- bination of Linux BIOS and Bochs x86 more machines per rack where it is nec- lation, and frame-buffer emulation for emulator, using a custom wrapper func- essary to support only the most essential OSes that do not support character- tionality known as Adhesive Loader of the I/O interfaces for reducing both based consoles. (ADLO), whixh doesn’t require the space requirements and the heat dissipa- Bochs emulator to be modified. The tion of each machine. An example of IMPLEMENTING CLONABLE NETWORK STACKS Bochs emulator has excellent PC BIOS such a cluster that researchers in the IN THE FREEBSD KERNEL interrupt support, and using an existing IBM Austin lab built is known as a Marko Zec, University of Zagreb mechanism avoided maintaining more superdense server with 200–300 x86 Virtual hosting, network simulation, and software. Together, these provide the servers in a 42U rack. These servers advanced VPN provisioning require missing legacy BIOS support except for already support an Ethernet interface support for multiple protocol stacks on the Video BIOS, which is extracted from and have no room for serial ports on the the same physical host. This paper the Video ROM and packaged along board. focuses on the design, implementation, with the above to complete the BIOS. and performance of an experimental Two different solutions for console over One of the advantages of such a solution clonable network stack in the FreeBSD Ethernet were developed. The first solu- is that different components can be kernel. By separate stacks, the author tion, which is a TCP/IP-based Linux stacked, leading to different configura- means independent states – routing console named “etherconsole,”uses a tions, of bootloaders, for example, or tables, interfaces, and firewall rules – kernel interface to the socket library to more sophisticated tools in the BIOS, rather than independent protocol code. send and receive console messages such as console support. In other words, this work does not sup- instead of performing low-level device port the creation of a separate protocol Possible future work was discussed, operations on the serial device. A major stack that is not already supported by which included TCPA, console over downside of this approach is that con- the OS. Virtualizing a network stack for other devices such as USB, authenticated sole support is possible only after the above applications is a more light- booting, virtual machine/virtual TCP/IP initialization. Problems that weight alternative to a traditional virtual machine monitor support in BIOS, occur before this phase will not be machine. transparent encrypted storage, and reported on the console and, therefore, transparent backup and intrusion detec- debugging such problems is difficult. A The key design goals were: API/ABI tion that cannot be turned off. There second solution involves the use of link (Application Binary Interface) compati- was also a demonstration of a quick- layer networking, that is, the sending of bility and low or negligible performance booting OS. During the console messages using raw Ethernet overhead. A naïve approach to imple- Q&A, someone asked about menu-based frames with a special Ethernet type and ment clonable stacks is to add an array configuration support. The answer was a broadcast Ethernet address. At the dimension to the kernel networking data that currently no configuration support receiving end, a program captures these structures. Such an approach, however, is present and it is necessary to re-flash. frames and provides the console output. increases the amount of kernel code A combined approach was then used for modification. Instead, the approach used URLs: console support: the link-layer approach was to group the kernel data structures http://www.missl.cs.umd.edu/ until DHCP is performed, and the that maintain the state of the network http://Linuxbios.org/ TCP/IP approach afterwards. stack together and provide access http://bochs.sourceforge.net/ through an additional level of indirec- Security issues were discussed. The pos- tion. This grouping is termed a “virtual CONSOLE OVER ETHERNET sibility of break-ins and denial-of-ser- image” and has an instance of network Mike Kistler, Eric van Hensbergen, and vice is increased when the console is stack associated with it. Virtual images Freeman Rawson, IBM Austin Research available over the network. The sug- Laboratory are organized in a hierarchy similar to gested solutions included a separate pri- the UNIX process hierarchy that helps in Eric van Hensbergen talked about con- vate network for console and switch binding unmodified programs to differ- sole support over an Ethernet device. VLAN support. The latter does not ent network stacks. Traditional forms of console support are address denial-of-service. This was inte- through the serial devices that are aggre- grated with Linux BIOS into custom There was also a discussion on the gated through KVM switches in clusters. firmware at IBM and allowed full system implementation of CPU load and usage The main problem with serial console is administration and debugging. Some accounting/limiting per virtual image, future areas of research in this area were

52 Vol. 28, No. 5 ;login: which helps in resource management behaves when things go wrong, like over any existing mechanism of data and avoids receive livelock. The per- restarting an out-of-date SE or the fail- exchange (NFS, HTTP, FTP) and can EPORTS formance degradation is hardly notice- ure of an HE (manual failover), and also leverage existing IPSec infrastructure for R able, around 3.5% lower than an how the HE ensures that a quorum of secure client-file server communication. unmodified kernel where the test system SEs are in sync when one SE happens to The experimental platform comprised a

had one active network stack among 128 be slower than the other. ONFERENCE

set of machines with modest hardware C stack instances. In some tests, the per- Finally, the author presented some inter- running OpenBSD. Measurements using formance improved slightly (5.7%) due esting experimental results for read and Bonnie micro-benchmark showed that to better locality of kernel network stack write availability against various quo- write throughput was comparable to variables which led to higher CPU cache rum sizes and arrived at a quorum size that of NFSv2, although read through- hits. of two and a set of three replicating SEs put was observed to be lower than NFS. URL: http://www.tel.fer.hr/zec/BSD/ as a recommended configuration to The Postmark benchmark, representing vimage/index.html achieve good read and write availability. a heavy workload of small files, was used He concluded that SEs not in quorum for versions of DisCFS without any FILE SYSTEMS could be connected over the Internet, security and with and without credential Summarized by Manish Prasad thus eliminating the need for a dedi- caching. cated high-bandwidth low-latency link. STARFISH: HIGHLY AVAILABLE BLOCK The talk concluded with a discussion of STORAGE SECURE AND FLEXIBLE GLOBAL FILE SHARING future work, the most noteworthy con- Eran Gabber, Jeff Fellin, Michael Flaster, Stefan Miltchev, Jonathan M. Smith, cerning implementation of access con- Fengrui Gu, Bruce Hillyer, Wee Teck Sotiris Ioannidis, University of Pennsyl- trol beyond permission bits and Ng, Banu Özden, and Elizabeth Shriver, vania; Vassilis Prevelakis, Drexel Uni- avoiding the use of inode numbers as Lucent Technologies, Bell Labs versity; John Ioannidis, AT&T Labs – file handles because of security holes Michael Flaster presented this FREENIX Research; Angelos D. Keromytis, created by inode number reuse. award paper. The principal contribution Columbia University was the dissemination of a replicated Stefan Miltchev presented this work on THE CRYPTOGRAPHIC DISK DRIVER block storage system to the open source authentication in network file systems. Roland C. Dowdeswell, NetBSD Project; community, built from commodity The paper presents Distributed Creden- John Ioannidis, AT&T Labs – Research servers running FreeBSD, connected by tial FileSystem (DisCFS), which uses Dowdeswell presented the Crypto- standard high-speed IP networking gear. trust management credentials to Graphic Disk Driver (CGD), “a pseudo- StarFish is not a distributed file system. “directly authorize actions rather than device driver that sits below the buffer It assumes a single-owner scenario and divide the authorization task into cache and provides an encrypted view of thus doesn’t deal with issues resulting authentication and access control” and an underlying raw partition.”CGD tar- from multiple writers. “to identify files being stored; users; and gets the problems arising from laptops and other portables “growing legs” and StarFish architecture comprises a com- conditions under which their file access vanishing from public places! Here, modity server with an appropriate SCSI is allowed.”DisCFS is intended to “protection of data from other concur- or FC controller, called host element address the weaknesses of existing file rent users is not essential, but protection (HE), which helps the client host access access control mechanisms, which are against loss or theft is important.” data from a set of storage elements (SEs) either too coarse-grained (e.g., Web, that talk to the HE using TCP/IP.An SE FTP) or are unsuitable for use across Due to its positioning in the I/O stack forms a single unit of replication. The administrative domains (e.g., NFS). (just above the disk driver), it is com- SEs could be connected to the HE via DisCFS uses a direct binding between a pletely transparent to file systems. Some either a dedicated link or the Internet. public key and a set of authorizations. A of the goals of the work are to maintain Writes to StarFish are considered to be user can delegate trust to another, which good performance while providing ade- complete on receiving ACKs from all the results in creation of a chain of trust, quate security, ease of use, and seamless SEs that form the required quorum. similar to systems like SPKI. Only a sub- integration into the OS release. The in- kernel driver supports an ioctl interface The presenter projected steadfast relia- set of privileges may be delegated to to attach CGD to an underlying disk bility as the unique selling point of another user, making privilege escalation device or partition and configure the StarFish and presented in detail how it impossible. DisCFS can be implemented

October 2003 ;login: USENIX ATC ‘03 53 required parameters such as encryption expressions were used to recognize the to understand the raw packet data. The algorithm, key length, IV method, etc. strings of grid numbers that defined a usefulness of this tool is difficult to They define a modular framework for stroke. An additional problem is that if overemphasize; it allows the network adding cryptographic algorithms. the device is held tilted, strokes will also trace to be examined in detail and prob- be tilted. The solution to this is that the lem areas to be picked out easily. The driver was evaluated on DEC Per- common horizontal strokes (backspace sonal Workstation 500a, Pentium The LBX proxy uses application-specific and space) were used to reorient the grid 4–based PC, and an IBM Thinkpad compression to reduce the size of when one was recognized. One nice fea- 600E. Results were presented for read- requests. In general, LBX was found to ture is that tilting the device enough to and-write throughput for various block be of very little utility in a modern set- cause incorrect recognition produces sizes, comparing various encryption ting: some of the requests it can com- garbage characters, alerting the user to algorithms – Blowfish, AES, and 3DES – press are obsolete, plus bandwidth for X correct the problem with backspace. against unencrypted I/O. As expected, requests is now a problem only for large More information and software are encrypted I/O throughput edges closer image files. SSH’s gzip compression, in available at http://www.xstroke.org. to raw I/O with the increase in block fact, was found to be superior in almost

size. The talk concluded with a discus- MATCHBOX: WINDOW MANAGEMENT NOT all cases. sion of future work, which included FOR THE DESKTOP In general, latency dominates band- addressing the problem of key revoca- Matthew Allum, OpenedHand Ltd. width, and most of the latency comes tion and leveraging hardware crypto- Matchbox is an X Window manager from synchronous requests. Many of graphic accelerators to achieve better designed for small devices that have low these can be eliminated by batching performance. memory, no keyboard, and limited CPU them, as with internAtom requests power. Matchbox is designed to be (already done by some toolkits). Finally, X WINDOW SYSTEM small, fast, flexible, and configurable. It restructuring applications can reduce Summarized by James Nugent has some restrictions specific to its envi- the effects of latency. Image files were XSTROKE: FULL-SCREEN GESTURE ronment: Only a single full-screen app is found to be the dominant reason for RECOGNITION FOR X supported at a time, and applications bandwidth usage. It was suggested that Carl D. Worth, University of Southern cannot resize windows or make windows using original image formats, which are California larger than the screen. A toolbar/virtual usually compressed, would be helpful. The goal of Xstroke is to provide full- keyboard is always visible. Client-side fonts were also studied; they screen gesture recognition for X Win- Matchbox’s focus is on providing win- had the effect of significantly reducing dows. A gesture is recognized and is dow management that is also compliant the round trips and, hence, application passed to the system as a keystroke or a with standards, most notably the ICCCM startup time. Bandwidth usage is about series of keystrokes. The focus is on standards. Matchbox is themeable, has the same, although compressing the handheld devices, so it is important that runtime and compile-time configura- glyphs for transport would reduce this. the entire screen can be used for recog- tion options, and is extensible via plug- nition, unlike some systems that have a ins. It also has a PDA-style app launcher. EXPERIENCES special “gesture area.”The size and loca- Summarized by Raya Budrevich tion of the gesture are also important. X WINDOW SYSTEM NETWORK Finally, the gesture “draws” on the screen PERFORMANCE BUILDING A WIRELESS COMMUNITY NET- with a transparent color, thus allowing Keith Packard and James Gettys, WORK IN THE NETHERLANDS the gesture to be seen as it is made, but Cambridge Research Laboratory, HP Rudi van Drunen, Dirk-Willem van not to obscure data beneath. Labs Gulik, Jasper Koolhaas, Huub Several X clients were set up to talk to an Schuurmans, and Marten Vijn, Wireless One difficulty is toggling gesture recog- X server attached to a passive packet- Leiden Foundation nition on and off. Several approaches monitoring tool and then to low-band- The purpose is to provide free wireless were discussed, but none was ideal. The width X (LBX) and SSH proxies with a access in historical cities in the Nether- Recognizer is a 3x3 grid based on the NISTNet router. (NISTNet is capable of lands. The technology uses 802.11b with bounding box of the stroke. A grid has producing a variety of delay/bandwidth- three available concurrent channels the problem that similar strokes may limited network conditions.) The xplot without interference. The network is on look different near corners; thus regular performance visualization tool was used ISM band, shared with ham radio,

54 Vol. 28, No. 5 ;login: microwave, and vehicle ID. It is an IPv4 for the pointers and sign the revision with a portable application-specific col- network with a private address space, records, we get end-to-end integrity and lection strategy. EPORTS routing by OSPF, with ISC-DHCP and auditability. SSL is used as the authenti- R ISC-DNS; the network is transparent to cation technology. Because of the per- FREE SOFTWARE AND HIGH-POWER the user and provides no user-level secu- missions setup of the system, anony- ROCKETRY: THE PORTLAND STATE AEROSPACE SOCIETY

rity. mous access is easy and convenient. ONFERENCE

James Perkins, Andrew Greenberg, C Currently, there is a 20km2 outdoor cov- Currently authentication and integrity Jamey Sharp, David Cassard, and Bart erage for the city of Leiden, 20+ nodes, work well. During the last two years Massey, Portland State University 300 private users, and three proxies to there have been only three breaches. The The research group consists of under- connect to the Internet; many local hazards include OpenSSL bugs yielding graduate and graduate students in sci- schools and libraries are also connected. an unreliable transport. Boehm-GC ence and engineering at Portland State There is a need to find free locations for flaws yield leaked memory and regular University, people in local industry, and the antennas, since there is no funding server hangs. local aerospace enthusiasts. The current to pay rental fees. At every site the Several decisions, during implementa- model reached 3.6km; the next-genera- strength of the other nodes and the tion, that seemed plausible led to errors tion model will leave the atmosphere. interference are measured. Network sim- later on: The goal is to put nanosatellites in orbit. ulation software is used to determine the construction of all sites. 1. Using a texty format for debugging – The active guidance computer on the This cost 30% more space and didn’t rocket determines the rocket’s current A node consists of multiple antennas compress, and file systems didn’t respect position, heading, and course; the com- with donated PCs or embedded systems. cases. Therefore one should plan ahead puter then steers the rocket to keep it on The systems run FreeBSD because it is for binary format. course. In order to determine this infor- stable, single distribution, and tagged mation the computer uses several sen- release. Industry-standard wireless cards 2. Server-side integrity checks – Files sors: GPS, inertial measurement unit are used. might be damaged when they reach the (IMU), magnetometers, and pressure server. The server needs to serialize/ The wireless network is available freely and optical sensors. deserialize, which leads to the server to all inhabitants of Leiden. The Wireless knowing the object schema. The launch tower is used to launch the Leiden Foundation is a nonprofit organ- rocket, view rocket status, and send ization that tries to create strong con- 3. Build a simple file-based store first – emergency commands to the rocket. The nections with schools and universities. One file per object was stored, using gzip on-board system includes the flight soft- The network has many uses, such as P2P, to compress, but in switching to binary ware avionics firmware. gaming, and VPN. On any given day, format, it was discovered that 60% of about 100 users are connected. the files were less than 500 bytes. It was difficult to decide which embed- ded operating system to use. There were Community acceptance, project man- 4. Build an event-driven server – Used several candidates but RedHat’s eCos agement, and interference were the main non-blocking I/O and an event loop, was selected. It has all of the needed cri- problems facing the project. which simplified the code and the server teria: RT, POSIX, free for use, small. but caused problems with SSL. OPENCM: EARLY EXPERIENCES AND LESSONS Commercial OEM GPS boards don’t LEARNED 5. Use GC and exception handling – work because of their software limita- Jonathan S. Shapiro, John Vanderburgh, This was well engineered for memory tions in determining acceleration veloc- and Jack Lloyd, Johns Hopkins management but has many platform ity and altitude. GPL-licensed firmware University dependencies, leaks memory in large for GPS receivers uses eCos. Differential OpenCM is a new configuration-man- objects, and doesn’t work on OpenSSL. GPS base station and receiver provide agement system that uses cryptographic To fix GC and exceptions a new manage- precision timing and altitude determina- authentication and high integrity. The ment, GC_SCOPEs, is introduced. Cur- tions. architecture uses cryptographic naming, rently, schema issues described in the Future work includes creating an and the content of a configuration file is paper are all fixed in the development enhanced inertial measurement unit, a DAG; every revision is a pointer to the branch. There’s still a need to modify developing next-generation navigation root of the DAG. If we use crypto hashes storage format and replace Boehm-GC algorithms, investigating loose coupling

October 2003 ;login: USENIX ATC ‘03 55 of IMU and GPS, GPS aiding of the support relinquishing access privileges. For MAC, the FreeBSD kernel has been IMU unit, deep coupling of IMU/GPS This is no problem as long as the appli- extended by additional events. MAC- and other sensors, and adding a steer- cations do not contain any bugs (which aware applications register their interest able hybrid motor. could lead to root privileges for any local in some events, then get these events, user). and possibly de-register afterward. PRIVILEGE MANAGEMENT Access policies are divided into adapta- Privman is implemented as a user-space tions of traditional access policies, tradi- Summarized by Benjamin A. Schmit library which mirrors some calls to the tional MAC policies, and new ones like a POSIX ACCESS CONTROL LISTS ON LINUX libc as well as some system calls. A pro- port of the SELinux FLASK policy to gram linked to the library starts by fork- Andreas Gruenbacher, SuSE Linux AG FreeBSD. The POSIX.1 access permission model is ing into a privileged server and a client not always sufficient, especially when that gives up all its access privileges. The main benefit of MAC is that the interacting with Windows clients via When the client needs to make a privi- source code does not have to be modi- SAMBA. The abandoned standard leged call, it contacts the server on a fied to use it. The implementation still POSIX.1e (the last draft, 17, is available pipe. The server than decides, based on a lacks support for multi-threaded appli- to the public) is taken as the basis of policy file, whether the access should be cations and multi-threaded kernel most ACL implementations. granted. threads. Also, object labeling within the kernel currently decreases performance, The main advantage to Privman is that In the Linux implementation, which has which could be solved by introducing a few changes need to be added to the become an official part of the 2.5 kernel cache. version, access permissions (read, write, source code. The overhead induced by the wrapper calls can be huge for a sin- and execute) can be granted to the KERNEL gle system call, but because these calls owner, named users, the owning group, Summarized by Francis Manoj David named groups, and others. A mask is occur rarely, the performance of a typi- used to retain backward compatibility cal application decreases by only about USING READ-COPY-UPDATE TECHNIQUES for non-ACL-aware applications. 5%. Privman has been used to adapt FOR SYSTEM V IPC IN THE LINUX 2.5 Default permissions make it possible to OpenSSH and wu-ftpd for privilege KERNEL inherit permissions that were set at management. Andrea Arcangeli, SuSE; Mingming directory level. Cao, Paul McKenney, and Dipankar THE TRUSTEDBSD MAC FRAMEWORK: Sarma, IBM The Linux ACL implementation can be EXTENSIBLE KERNEL ACCESS CONTROL FOR Locking and synchronization are used for ACL-aware network protocols, FREEBSD 5.0 extremely important tools in multi- the most important ones being NFS Robert Watson, Wayne Morrison, and threaded environments. Atomic incre- (only version 4) and CIFS (formerly Chris Vance, Network Associates ment, the key to locking, is possible on SMB). Since these two protocols imple- Laboratories; Brian Feldman, FreeBSD current CPUs. However, newer proces- Project ment ACLs in a different way, a mapping sors (like the Pentium 4) consume sig- needs to be done here. The Linux imple- In his dense talk, Robert Watson nificantly more cycles than older mentation does not support granting explained the implementation of the processors during a single atomic incre- somebody other than the owner the Mandatory Access Control system ment. This is because the whole pipeline ability to change permissions. within FreeBSD 5.x. The MAC frame- needs to be flushed, and this is an work is implemented partly within the expensive operation. For data that is PRIVMAN: A LIBRARY FOR PARTITIONING kernel, partly in user-space. Common read-only, paying this penalty is unrea- APPLICATIONS applications require more than just the sonable. Douglas Kilpatrick, Network Associates standard UNIX security mechanisms. Laboratories Operating system support for access Read-Copy-Update (RCU) is a tech- In this talk, Douglas Kilpatrick described control is important; without it, applica- nique that “allows lock-free read-only the Privman privilege management tions would have to cope with several access to data structures that are concur- library. Some UNIX applications make different security mechanisms, the result rently modified on SMP systems.”To use of privileges that normal users can’t being mostly an intersection of the indi- illustrate RCU concepts, let us look at an acquire and, therefore, run with root vidual permissions. example. In a linked list, deletion of an privileges because UNIX does not yet element and traversal of the list cannot happen simultaneously. This is because

56 Vol. 28, No. 5 ;login: the traversal code might reference a some primitive like test and set or incre- PROVIDING A LINUX API ON THE SCALABLE pointer that is freed by the deletion ment a variable (e.g., count++ in C). In K42 KERNEL EPORTS code. The RCU-based solution is to order to guarantee that this code is Jonathan Appavoo, University of R defer the freeing of the pointer until it is called atomically, a user thread should Toronto; Marc Auslander, Dilma Da certain that no other code is traversing not be preempted in the middle of this Silva, David Edelsohn, Orran Krieger, Michal Ostrowski, Bryan Rosenburg,

the list. The paper references a couple of code. If it does get preempted, then the ONFERENCE more efficient solutions to this problem. code needs to be restarted. This restart- Robert W. Wisniewski, and Jimi Xeni- C dis, IBM T.J. Watson Research Center ing would ensure that the operation is The authors have successfully applied performed atomically. K42 is a new research OS kernel RCU techniques for semaphore data designed to be highly scalable and exten- structures in the Linux 2.5 kernel. There RAS has been implemented for the sible and written in object-oriented is a dynamic array called ipc_ids that NetBSD OS. The user thread registers style. It also supports online reconfigu- needs to be traversed for all semaphore the entry and exit points of the RAS ration of kernel services. K42 pushes a operations. This array is protected by a with the kernel. Whenever there is a lot of usual kernel functionality into global lock. This design prevents sema- context switch, the kernel checks to see user-space, enabling efficient IPC. Also, phore operations from proceeding in whether the execution was in the middle global data and locks are avoided in parallel. Using RCU, the global lock was of the RAS, and if it was, the RAS is order to improve performance. eliminated and deferred deletion of restarted when the thread returns. Thus, semaphores was implemented. The total atomic operations are provided. Also, The motivation for this work was to run number of new lines of code added to one cannot write arbitrary code for RAS all standard Linux programs on K42, the semaphore implementation was only and expect things to work correctly. An thus increasing the number of applica- 151. Micro-benchmarks show very good RAS should have the following proper- tions available to those working with results. In the future, the authors plan to ties. It should have a single entry and K42. This paper describes the challenges use RCU to optimize more code in the exit point, should not modify shared that the authors faced in providing the kernel. The current code is available data, and should not invoke any func- Linux API on K42. Linux processes were under the GPL license. tions. This ensures that, on restarting the supported by mapping them to K42 sequence, the thread is restored to a cor- processes. A K42 process called the URLs: rect state. Thus, the responsibility for ProcessLinuxServer maintains these http://www.rdrop.com/users/paulmck getting things to work correctly is shared relationships. Fork was implemented by http://sourceforge.net/projects/lse between the kernel and the user thread. placing most of the code in user-space. The file descriptor table was lazily repli- AN IMPLEMENTATION OF USER-LEVEL The user interface to the system is simi- cated for performance reasons. For sig- RESTARTABLE ATOMIC SEQUENCES ON THE lar to that provided by mmap. RASes are nals, all state was kept in the client. To NETBSD OPERATING SYSTEM registered with the kernel. The kernel provide a Linux environment, a new ver- Gregory McGarry checks for such sequences in the sion of glibc, targeted toward K42, was A Restartable Atomic Sequence (RAS) is cpu_switch function and restarts compiled. Exec was also implemented a mechanism used to efficiently imple- sequences if necessary. Performance in user-space. This involves an unload ment atomic operations on uniprocessor studies show that this approach is better operation followed by a reload opera- systems. It was designed to provide than the syscall approach. RAS is cur- tion. Namespace resolution for files was atomic operations on processors that rently transparently used in the NetBSD also implemented in user-space. The don’t provide them. On processors that pthread library for uniprocessor entire Linux emulation layer was pro- do provide them, RAS increases proces- machines. vided by a modified Linux kernel, with sor performance. K42 as the target architecture. Currently, System calls are one option when it the project has a fully functional imple- comes to performing atomic operations mentation for 64-bit architectures and is for user threads. This emulates memory- available for download under the LGPL interlocked instructions. However, this license. comes with a large overhead. RAS is a URL: http://www.research.ibm.com/K42/ better solution to this problem. An RAS is basically some code that provides

October 2003 ;login: USENIX ATC ‘03 57