O CTOBER 2005 VOLUME 30 NUMBER 5

THE USENIX MAGAZINE

OPINION Musings RIK FARROW

OSES FreeBSD 5 SMPng MICHAEL W. LUCAS Better Tools for Kernel Evolution, Please! MARC E. FIUCZYNSKI Solaris 10 Containers PETER BAER GALVIN

SYSADMIN DNS-based Spam Rejection HOBBIT Finding Trojans for Fun and Profit BORIS LOZA ISPadmin: Embedded Hardware ROBERT HASKINS

SECURITY Security Through Obscurity: A Review of a Few |of FreeBSD’s Lesser-Known Security Capabilities DAVI D MALONE Surviving DDoS Attacks SRI KANTH KANDULA

NETWORKING Distributed, Adaptive Resource Allocation for Sensor Networks GEOFFREY MAINLAND AND MATT WELSH

BOOK REVIEWS Book Reviews ELIZABETH ZWICKY AND ADAM TUROFF

USENIX NOTES The USENIX Association Financial Report for 2004 ...and much more

CONFERENCES Linux Kernel Developers Summit; International Workshop on Wireless Traffic Measurements and Modeling (WitMeMo ’05);Workshop on End-to-End, Sense and Respond Systems, Applications, and Services (EESR ’05); MobiSys 2005:The Third International Conference on Mobile Systems, Applications, and Services; HotOS X:Tenth Workshop on Hot Topics in Operating Systems;VEE ’05: First International Conference on Virtual Execution Environments; Steps to Reducing Unwanted Traffic on the Internet Workshop (SRUTI ’05) The Advanced Computing Systems Association Upcoming Events

ACM/IFIP/USENIX 6TH INTERNATIONAL 3RD SYMPOSIUM ON NETWORKED SYSTEMS MIDDLEWARE CONFERENCE DESIGN AND IMPLEMENTATION (NSDI ’06)

NOVEMBER 28–DECEMBER 2, 2005, GRENOBLE, FRANCE Sponsored by USENIX, in cooperation with ACM SIGCOMM http://middleware05.objectweb.org and ACM SIGOPS MAY 8–10, 2006, SAN JOSE, CA, USA http://www.usenix.org/nsdi06 19TH LARGE INSTALLATION SYSTEM Paper titles and abstracts due: October 10, 2005 ADMINISTRATION CONFERENCE (LISA ’05) Final paper submissions due: October 17, 2005 Sponsored by USENIX and SAGE

DECEMBER 4–9, 2005, SAN DIEGO, CA, USA 5TH SYSTEM ADMINISTRATION AND NETWORK http://www.usenix.org/lisa05 ENGINEERING CONFERENCE (SANE 2006) Organized by Stichting SANE and co-sponsored by Stichting NLnet, USENIX, and SURFnet 2ND WORKSHOP ON REAL, LARGE MAY 15–19, 2006, DELFT, THE NETHERLANDS DISTRIBUTED SYSTEMS (WORLDS ’05) http://www.sane.nl/sane2006 DECEMBER 13, 2005, SAN FRANCISCO, CA, USA Paper submissions due: October 24, 2005 http://www.usenix.org/worlds05

2006 USENIX ANNUAL TECHNICAL 3RD INTERNATIONAL IEEE SECURITY IN STORAGE CONFERENCE (USENIX ’06)

WORKSHOP MAY 30–JUNE 3, BOSTON, MA, USA Sponsored by IEEE Computer Society Task Force on Information http://www.usenix.org/usenix06 Assurance (TFIA) in cooperation with IEEE Mass Storage Systems Paper submissions due: January 17, 2006 Technical Committee (MSSTC) and USENIX

DECEMBER 13, 2005, SAN FRANCISCO, CA, USA http://www.ieeeia.org/sisw/2005 15TH USENIX SECURITY SYMPOSIUM (SECURITY ’06)

JULY 31–AUGUST 4, VANCOUVER, B.C., CANADA 4TH USENIX CONFERENCE ON FILE AND http://www.usenix.org/sec06 STORAGE TECHNOLOGIES (FAST ’05) Paper submissions due: February 1, 2006 Sponsored by USENIX in cooperation with ACM SIGOPS, IEEE Mass Storage Systems Technical Committee (MSSTC), and IEEE TCOS 7TH SYMPOSIUM ON OPERATING SYSTEMS DECEMBER 13–16, 2005, SAN FRANCISCO, CA, USA DESIGN AND IMPLEMENTATION (OSDI ’06)

http://www.usenix.org/fast05 NOVEMBER 6–8, SEATTLE, WA, USA http://www.usenix.org/osdi06 Paper submissions due: April 24, 2006 7TH IEEE WORKSHOP ON MOBILE COMPUTING SYSTEMS AND APPLICATIONS (WMCSA 2006) Sponsored by IEEE Computer Society in cooperation with USENIX

APRIL 6–7, 2006, SEMIAHMOO RESORT, WA, USA http://research.ihost.com/wmcsa2006/ Paper submissions due: October 1, 2005

For a complete list of all USENIX & USENIX co-sponsored events, see http://www.usenix.org/events OPINION 2 Musings RIK FARROW OPERATING SYSTEMS 4 FreeBSD 5 SMPng MICHAEL W. LUCAS 8 Better Tools for Kernel Evolution, Please! MARC E. FIUCZYNSKI 11 Solaris 10 Containers contents PETER BAER GALVIN SYSADMIN 15 DNS-based Spam Rejection HOBBIT 19 Finding Trojans for Fun and Profit BORIS LOZA 23 ISPadmin: Embedded Hardware ROBERT HASKINS SECURITY 27 Security Through Obscurity: A Review of a Few of FreeBSD’s Lesser-Known Security Capabilities DAVID MALONE 35 Surviving DDoS Attacks SRIKANTH KANDULA NETWORKING 40 Distributed, Adaptive Resource Allocation for Sensor Networks VOL. 30, NO. 5, OCTOBER 2005 GEOFFREY MAINLAND AND MATT WELSH

EDITOR ;login: is the official BOOK REVIEWS Rik Farrow magazine of the [email protected] USENIX Association. 45 Book Reviews ELIZABETH ZWICKY AND ADAM TUROFF MANAGING EDITOR ;login: (ISSN 1044-6397) is Jane-Ellen Long published bi-monthly by the USENIX NOTES [email protected] USENIX Association, 2560 COPY EDITOR Ninth Street, Suite 215, 48 25 Years Ago Steve Gilmartin Berkeley, CA 94710. PETER H. SALUS [email protected] $85 of each member’s annual 49 SAGE Update dues is for an annual sub- PRODUCTION CHRIS PALMER Rob Carroll scription to ;login:. Subscrip- Casey Henderson tions for nonmembers are 49 USACO, the USA Computing Olympiad $115 per year. TYPESETTER ROB KOLSTAD Star Type Periodicals postage paid at 50 The USENIX Association Financial Report for [email protected] Berkeley, CA, and additional offices. 2004 USENIX ASSOCIATION 2560 Ninth Street, POSTMASTER: Send address CONFERENCE REPORTS Suite 215, Berkeley, changes to ;login:, California 94710 USENIX Association, 55 Linux Kernel Developers Summit Phone: (510) 528-8649 2560 Ninth Street, JONATHAN CORBET FAX: (510) 548-5738 Suite 215, Berkeley, CA 94710. 57 International Workshop on Wireless Traffic http://www.usenix.org http://www.sage.org ©2005 USENIX Association. Measurements and Modeling (WitMeMo ’05) USENIX is a registered trade- 60 Workshop on End-to-End, Sense and Respond mark of the USENIX Associa- Systems, Applications, and Services (EESR ’05) tion. Many of the designa- tions used by manufacturers 64 MobiSys 2005:The Third International and sellers to distinguish their Conference on Mobile Systems, Applications, products are claimed as trade- and Services marks. USENIX acknowl- edges all trademarks herein. 75 HotOS X:Tenth Workshop on Hot Topics in Where those designations ap- Operating Systems pear in this publication and USENIX is aware of a trade- 86 VEE ’05: First International Conference on mark claim, the designations Virtual Execution Environments have been printed in caps or initial caps. 92 Steps to Reducing Unwanted Traffic on the Internet Workshop (SRUTI ’05) FOR THE PAST COUPLE OF MONTHS, I have become absorbed in operating sys-

RIK FARROW tems. My quest began with some consult- ing regarding security features of the cur- rent Linux kernel, including SELinux, then plunged even deeper during the HotOS workshop. I do not pretend to be a kernel hacker, but I am very interested in what musings goes on with the design and implementa- Rik Farrow provides and Internet security con- tion of software. sulting and training. He is the author of UNIX System Security and System Administrator’s Guide to System V and editor of the SAGE Short Topics in System Like some others, I had wondered what had hap- Administration series. pened to FreeBSD 4’s stellar performance when Free- [email protected] BSD 5 appeared. Instead of getting faster, FreeBSD was slower. If I had bothered digging deeper, I would have learned that this was the result of far-reaching changes in the FreeBSD kernel. Michael W. Lucas ex- plains these changes in his article, which in turn is based on a talk given by Robert Watson about modify- ing the kernel to support SMP (symmetric multipro- cessing). And perhaps by the time you read this col- umn, FreeBSD 6 will have appeared, ready to utilize the new multiprocessor cores that are popping up. Lucas explains just why the transition from single- threaded to multi-threaded kernel takes so long and is so hard to do right. I first understood the importance of the Big Giant Lock when I was reviewing an early multiprocessing server that used SPARC processors. I ran a simple benchmark that spawned additional processes, each of which ran an integer-intensive pro- gram. I tried my benchmark with one processor, then two, three, and, finally, four processors enabled, and the results astounded me (at the time). Adding pro- cessors does not in itself linearly improve perfor- mance. Enabling the fourth processor barely added a 15% improvement to the results. The Big Giant Lock ensures that only one process (or interrupt handler) can run in kernel space at a time, which devastates performance. The HotOS workshop (see the summaries in this issue) brought different surprises. I enjoyed the work- shop immensely, as much for the free time spent with attendees as for the talks. The lunchtime discussions have sparked one article already, in which Marc Fiuc- zynski makes a fervent plea for better methods for patching Linux kernels. You will also find a discussion of the Linux Kernel Developers Summit by Jonathan Corbet. Corbet, who has been summarizing the summits for years, has pro- vided a short overview for ;login: readers. And Peter Galvin explains Solaris 10 containers. Note that Sun bit the SMP revision bullet years ago, making it a

2 ;LO GIN: V OL. 30, NO. 5 leader in SMP OSes today. The concept of containers adds a powerful twist to Solaris, a useful VM architecture unlike others you may know about. On the security front, David Malone discusses security features of FreeBSD. As I read this, I found myself wishing that some of these appeared in Linux as well. But BSD-envy is nothing new. Srikanth Kandula explains Kill-Bots, based on a paper he presented at NSDI (see the summaries in the August issue of ;login:). You might note that instead of my usual musings, I have acted much more like an editor this time around. I apologize, but this issue is so packed with sum- maries and articles, I really didn’t have the space for me. You can expect that my column will appear much as it has in the past in the December issue of ;login:, with its focus on security. Again, I do appreciate your feedback about the changes appearing in ;login:. You can send me email at [email protected], along with article proposals, letters to the editor, complaints, and praise. If you have books you want to review (or think should be reviewed), try the new [email protected] alias. You can find a more structured approach to making suggestions about ;login: at https:// db.usenix.org/cgi-bin/loginpolls/oct05login/survey.cgi. To be honest, I am quite happy to be able to approach some of the world’s brightest minds and ask them to write articles about those issues I consider most crucial for the advancement of computing systems. But you do need to make sure I don’t stray too far.

SAVE THE DATE! FAST ’05 4th USENIX Conference on File and Storage Technologies December 13–16, 2005, San Francisco, CA

Join us in San Francisco, CA, December 13–16, 2005, for the latest in file and storage technologies. The 4th USENIX Conference on File and Storage Technologies (FAST ’05) brings together storage system researchers and practitioners to explore new directions in the design, implementation, evaluation, and deployment of storage systems. Meet with premier storage system researchers and practitioners for a day of advanced tutorials and 2.5 days of ground-breaking file and storage information!

;LOGIN: OCTOBER 2005 MUSINGS 3 FREEBSD IS ONE OF THE GRANDPARENTS of open source operating systems, and

MICHAEL W. LUCAS FreeBSD version 4 is considered the gold standard of high performance by its user community. In this article we’ll discuss the improvements in FreeBSD 5, using the net- work stack as an example of the particular- ly heinous problems faced when enhancing FreeBSD 5 SMPng multiprocessor operating systems.

THE NETWORK STACK FreeBSD 5 had many disruptive new features (such as the GEOM disk layer and ACPI) but also had ex- Michael W. Lucas is a network consultant and author tremely high ambitions for its new SMP implementa- of Absolute BSD, Absolute OpenBSD, Cisco Routers for the Desperate, and the forthcoming PGP & GPG. He tion. This SMP infrastructure was necessary for the has been logging onto UNIX-like systems for twenty future growth of FreeBSD, but required massive years and finds lesser operating systems actively rewrites in many parts of the system. The new multi- uncomfortable. CPU project, dubbed “SMPng,” steered FreeBSD in a [email protected] direction that promised incredible performance en- hancements—at the price of a lot of work. To understand why this was considered worthwhile, we need to consider some basics of multiprocessor The author wishes to gratefully acknowl- computing. What we usually think of as “multipro- edge Robert Watson’s strong contribution to cessing” is actually “symmetric multiprocessing,” or this article. SMP. In SMP you use multiple general purpose pro- cessors, all running the same OS. They share memory, I/O, PCI busses, and so on, but each CPU has a pri- vate register context, CPU cache, APIC timer, and so on. This is certainly not the only approach: all mod- ern video cards have a Graphics Processing Unit (GPU), which could be considered a special purpose asymmetric multiprocessor. Managing multiple iden- tical general purpose CPUs has become dramatically more important with the advent of multi-core CPUs. A multiprocessor-capable OS is one that operates cor- rectly on multiprocessor systems. Most operating sys- tems are designed to give the user access to those extra CPUs simply as “more computing power.” While many people have implemented alternatives to this, the general idea that more CPU means more horsepower is still what most of us believe. Additionally, adding processors can’t be allowed to change the look-and-feel of our operating system. Managing these processors becomes much simpler if you abandon the current process model, standard APIs, and so on. Many of these APIs and services were designed for systems with only a single CPU and as- sumed that the hardware had only one processor exe- cuting one task at a time. As with so many other hard- ware evolutions, it would be easier to knock down the house of UNIX and build a strip mall. Instead, the FreeBSD Project has had contractors climbing over the old house to bring it up to today’s building code— and gain some extras while we’re at it.

4 ;LO GIN: V OL. 30, NO. 5 Obviously, the goal of adding processors is improving bottleneck. Shuffle your trade-offs until you reach an performance. The problem is that “performance” is a acceptable average for all of your workloads. very vague term: it depends on the work you’re doing, Traditional OS kernels expect that there is only one and trade-offs happen everywhere. There’s a 100-mpg CPU, and so when the kernel returns to a task, all of carburetor that works wonderfully, if you don’t mind the appropriate resources should be right where they doing 0 to 60 in about an hour. were left for that task. When the machine has multi- The best way to handle performance tuning in SMP is ple processors, however, it’s entirely possible for mul- to measure how your system performs under the tiple kernel threads to access the same data structures workload you’re interested in, and continue measur- simultaneously. Those structures can become cor- ing as you add additional CPUs. The SMP implemen- rupted. This makes the kernel angry, and it will take tation needs to not slow down the application in it- out its feelings on you one way or another. self, and then it needs to provide features to make it You must implement a method of maintaining inter- possible to accelerate the application. In an ideal nal consistency and data synchronization, and pro- world, your application performance would increase vide higher level primitives to the rest of the system. linearly with additional processors—an eight-CPU Some applications require that certain actions be han- system would perform eight times as well as a one- dled as a whole (known as “atomic operations”). Oth- CPU system. This simply isn’t realistic. The OS and ers simply insist that certain things are done before application will be slowed by having to share re- other things. Your synchronization model must take sources such as memory and bus access, and keeping all of this into account, without breaking the API so track of where everything is becomes increasingly badly that you scare off your users. Your choice of complex as the number of CPUs increases. Consistent locking model affects your system’s performance and measuring and benchmarking are vital when embark- complexity, and so is very important. For example, ing on an SMP implementation. let’s contrast the locking model used with FreeBSD 3.x/4.x to that used in 5.x and later. Implementing SMP The Big Giant Lock (BGL) model used in FreeBSD 3.x/4.x is the most straightforward way to implement Implementing SMP is simple. First, make it run. Then SMP. The kernel is only allowed to execute on one make it run fast. Everything else is just petty details. CPU at a time. If a process running on the other CPU The obvious place to begin is in the kernel. Nothing needs to access the kernel, it is held off (spinlocks) happens until your kernel notices the additional until the kernel is released by the other process. processors. Your OS must be able to power up the Contention occurs in the BGL model when tasks on processors, send them instructions, and attend to multiple CPUs compete to enter the kernel. Think everything that makes it possible to use the hardware. about how many types of workload access the kernel: Then your applications must be able to use the addi- user threads that do system calls, interrupts or timer tional CPU. You might find that your important appli- driver activity, reading or writing to disk or networks, cation can only run on one processor at a time, ren- IPC, scheduler and context switches, and just about dering that additional processor almost useless. You everything else. On a two-CPU system this isn’t too can have an application monopolize one CPU while bad—at least you’re doing better than you would with the other CPU handles all the other petty details of one CPU. It’s horrible on a four-CPU system, and un- keeping the system up, but this is less than ideal. thinkable on an eight-way or bigger. Your OS libraries and utilities play a vital part in this. The locks are obviously necessary for synchroniza- Perhaps the most common way to make an applica- tion, but the costs are high. Overall, a dual-CPU sys- tion capable of using parallelism is by using threads. tem is an improvement over a single processor. Your OS must include a multiprocessor-capable and well-optimized threads library. It can do its thread Fine-Grained Locking support in either the kernel or userland. Some appli- cations use more brute-force methods of handling FreeBSD 4.x’s Big Giant Lock was the main perfor- parallelism—the popular Apache daemon forks mance bottleneck, and just had to go. That’s where copies of itself, and those children can automatically fine-grained locking came in. Fine-grained locking is run on separate processors. simply smaller kernel locks that contend less. For ex- Once your application can use additional processors, ample, a process that has entered the kernel to write start measuring performance. By observing the system to a file shouldn’t block another process from entering as it handles your work you can identify and address the kernel to transmit a packet. The FreeBSD devel- the bottlenecks in your real-world load. Once you fix opers implemented this iteratively. First they locked those bottlenecks, benchmark again to find the new the scheduler and close dependencies such as memo-

;LOGIN: OCTOBER 2005 FREEBSD 5 SMPNG 5 ry allocation and timer events. High-level subsystems a car’s oil while said vehicle was barreling down the followed, such as a generic network stack lock or a freeway at 80 miles an hour. file-system lock. They then proceeded to data-based Today, FreeBSD 5.x has fine-grained locking in most locking. Once they hit this point, it was a simple mat- major subsystems, except for VFS. The network stack ter of watching to identify the new bottlenecks and as a whole runs without Giant, although a few net- lock them more finely. work protocols still require it. Some high-end net- Goals along the path include adopting a more thread- work drivers execute without seizing Giant. FreeBSD ed architecture and implementing threads where the 6.0 (which should be out by the time this article kernel can work in parallel. Interrupts in particular reaches print) is almost completely Giant-free. VFS it- were permitted to execute as threads. FreeBSD also self runs without Giant, although some file systems had to introduce a whole range of synchronization do not. (Those of you running high-performance primitives such as mutexes, sx locks, rw locks, and databases on a FAT file store, with a server using semaphores. Low-level primitives are mapped into those dollar-a-dozen Ethernet cards, might not be higher level programming services. Atomic opera- pleased with its performance.) A few straggling device tions and IPIs are at the bottom, which are used to drivers require the BGL, but those are slated for con- build mutexes, semaphores, signals, and locks, version or execution before FreeBSD 7.0 is released which, in turn, are assembled into lockless queues (probably in 2007). The network stack also runs and other structures at the very top. without the BGL. As locking the network stack was As this was a gradual migration rather than an all-at- one of the more interesting parts of implementing once conversion, subsystems that were not yet prop- fine-grained locking, let’s take a closer look at it. erly locked during the conversion were allowed to “seize” the Giant Lock. A device driver that had not Locking the Network yet been converted was allowed to scream “OK, everybody out of the kernel, I must process an inter- The FreeBSD network stack includes components rupt!” This was called “holding” the Giant Lock, and such as the mbuf memory allocator, network device it reduced performance to the 3.x/4.x level. drivers and interface abstractions, a protocol-inde- One by one, each system was finely locked, the BGL pendent routing and event model, sockets and socket slid off, and newly exposed problems resolved. This buffers, and a slew of link-layer protocols and net- produced a very different contention pattern. work-layer protocols such as IPv4, IPv6, IPSec, IPX, EtherTalk, ATM, and the popular Netgraph extension This was complicated, of course, by the fact that framework. Excluding distributed file systems and FreeBSD is a project in use by millions of people device drivers, that’s about 400,000 lines of code. To around the world. People even consider selected complicate things further, FreeBSD’s TCP/IP stack has points along the development version stable enough been considered one of the best performers for many for production use. If the FreeBSD team could have years. It’s important not to squander that reputation! simply declared, “The development branch of FreeB- SD will be utterly unusable for six months,” fine- Locking the network stack has very real problems. grained locking could have been accomplished more Overhead is vital: a small per-packet cost becomes quickly. They would have also alienated many users very large when aggregated over millions of packets and commercial sponsors. While commercial compa- per second. TCP is very sensitive to misordering, and nies can get away with this, a project like FreeBSD interprets reordered packets as requiring fast retrans- simply can’t. The team did the equivalent of changing mit. Much like our 100-mpg carburetor, different op-

6 ;LO GIN: V OL. 30, NO. 5 timizations conflict (e.g., optimizing for latency can but weakening ordering can improve performance in damage throughput). certain cases. FreeBSD uses a few general strategies for locking the Awareness of locking order and violations is critical network stack. Data structures are locked. Also, locks throughout this. The WITNESS run-time lock-order are no finer than that required by the UNIX API— monitor tracks lock-order acquisitions, builds a graph e.g., parallel send and receive on the same socket is reflecting the current lock order, and detects lock- useful, but not parallel send on the same socket. Ref- order cycles. It also confirms that you’re not recur- erences to in-flight packets are locked, not the pack- sively locking a non-recursive lock as well as detect- ets themselves. Layers have their own locks, as ob- ing other basic problems. WITNESS uses up a lot of jects at different layers have different requirements. CPU time but is invaluable in debugging. Locking order is vital. Seizing locks incorrectly can Every lock is another slice of overhead. FreeBSD 5.x cause deadlocks. Driver locks are leaf locks. The net- amortizes the cost of locking by avoiding multiple work protocol drives most inter-layer activity, so pro- lock operations when possible, and it amortizes the tocol locks are acquired before either driver locks or cost of locking over multiple packets. When possible, socket locks. FreeBSD 5 avoids lock problems via de- locks are coalesced or reduced. Combining locks ferred dispatch. across layers can avoid additional locks. If you lock Transmission is generally serial, so the work is as- finely enough you can cause “live lock,” where your signed to a single thread. Reception can be more par- system is so busy locking and unlocking from inter- allel, so work can be split over multiple threads. rupts that it does no actual work. Some workloads handled parallelization better than others. Parts of the network stack, such as TCP, ab- Increasing Parallelism solutely require serialization to avoid protocol perfor- All of the above is just “making it run.” Afterwards mance problems. Any sort of naive threading violates came time to “make it run fast.” Once the network ordering. FreeBSD uses two sorts of serialization: stack is freed of the Big Giant Lock, pick an interest- thread serialization and CPU serialization, which uses ing workload and see where contention remains. per-CPU data structures and pinning/critical sections. Where was CPU-intensive activity serialized in a sin- gle thread, causing unbalanced CPU usage? Identify- Today’s SMPng Network Stack ing natural boundaries in processing, such as protocol hand-offs, layer hand-offs, etc., both restricted and in- FreeBSD 5.x and above largely run the network stack spired further optimizations. Every trade-off had to be without the Big Giant Lock, and 6.x shows substan- carefully considered and then tested to confirm those tial improvements over both 4.x and 5.x. The project ideas. Context switches and locks are expensive, so has progressed from raw functionality to performance they had to be made as useful as possible. tuning. The development team is paying close All this had its own challenges. The FreeBSD-current attention to the performance of popular applications mailing list (for those people using the development such as MySQL, as well as basic matters such as raw version) saw many reports of deadlocks, poor perfor- throughput. mance under edge situations, and any sort of weird Many workloads, such as databases and multi-thread- issue imaginable. While FreeBSD’s sponsors were very ed/multi-process TCP use, show significant improve- generous with donations of test facilities, no test can ments. The cost of locking hampers per-packet per- possibly compare with the absurd range of conditions formance on very specific workloads, such as the found in the real world. packets per second when forwarding and bridging One not uncommon problem during development packets. And, compared to the gold standard of 4.x, was deadlock. If threads one and two both require performance on single-processor systems is some- locks A and B, and thread one holds A while thread times suboptimal. While single-processor perfor- two holds B, the whole system grinds to a halt. This mance is being carefully monitored and enhanced, as deadly embrace was avoided by a hard lock order on dual-core systems become the standard even on most mutexes and sx locks, disallowing lock cycles, workstation systems this will be less important. The and the WITNESS lock verification tool. There’s also a FreeBSD team is actively working on performance variable, hierarchal lock order. Lock order is a proper- measurement and optimization. ty of data structures, and at any given moment the Juggling all these optimizations is hard; it took about lock order is defined for that data structure. The lock five years to get past merely functional to optimal. order can change as the data structure changes. And a The oil change at 80 mph is just about done, though, master lock serializes simultaneous access to multiple and it’s time to floor the accelerator and see what this leaf locks. Ordering was vital to avoiding deadlock— baby can do.

;LOGIN: OCTOBER 2005 FREEBSD 5 SMPNG 7 THIS ARTICLE SUMMARIZES A FORAY into the land of Linux, revealing the soft

MARC E. FIUCZYNSKI underbelly of the animal called kernel. This voracious animal is eagerly eating up better tools for others, but how much longer can it do so before its belly bursts? kernel evolution, Analogies aside, to many users Linux is great—it is please! cheap (free in many cases) and often works quite well for the intended purposes. Generally, such users treat Dr. Marc E. Fiuczynski is a research scientist in the Computer Science department at Princeton Linux as a black box by using an unmodified distribu- University and a member of the PlanetLab R&D tion such as Fedora, SuSE, etc. For users for whom a team. conventional distribution is insufficient, Linux, as one [email protected] of the quintessential open source projects, offers un- limited flexibility for customization. Moreover, vari- ous kernel extensions released as patch sets, or simply patches, offer unique and useful features not available in the mainline version of the kernel. Unfortunately, maintaining a customized kernel can be challenging, particularly when it is necessary to keep track of security updates, bug fixes, and general enhancements to the mainline kernel. The reason is that externally developed kernel extensions are often available as patches that typically apply only to the vanilla mainline kernel. While these patches some- times apply to the latest mainline kernel, often they do not and so require integration work. While such integration (merging) may be trivial at times, it can quickly get out of hand. Why? Even when these patches apply cleanly to the latest release from kernel.org, they often do not to the latest releas- es from distributions such as Fedora or SuSE. These distributions, to set themselves apart from others, in- troduce their own value-added modifications to the kernel and, in some cases, tend to be ahead of the sta- ble 2.6 release by integrating features from the unsta- ble kernel.org releases. Consequently, for those using a customized kernel in a production setting, the job of keeping on top with the latest and greatest kernel can become a tedious one that is pure overhead. This has been my experience while maintaining the Linux kernel used for PlanetLab (http://www .planet-lab.org). PlanetLab is a geographically distrib- uted overlay platform designed to support the deploy- ment and evaluation of planetary-scale network ser- vices. As of August 2005, it consists of over 580 machines at 275 sites in 30 countries, and it has sup- ported over 450 research projects. PlanetLab contin- ues to grow at a rate of approximately five sites and 10 machines per month. Each machine runs a cus- tomized version of Linux to support a “virtual private server” (VPS) model, which is used to isolate separate research projects running on a single machine from

8 ;LO GIN: V OL. 30, NO. 5 each other. At one point the kernel for PlanetLab was etc.) these changes need to be integrated into the ker- modified by 28 patches—both externally developed nel proper, hence driving kernel programmers to and homegrown. The kernel was so tedious to main- achieve the ultimate integration of their modification tain that at one point it lagged eight minor releases into the mainline kernel managed by Torvalds. To behind the 2.4 mainline kernel release. Upon switch- avoid this problem altogether, I am working with a ing to the 2.6 kernel release, we focused on reducing team spread across New York University, University of the patch count to a minimum with the goal of keep- Victoria, and Princeton University on a toolkit called ing close to the latest mainline kernel release. None- C4, for CrossCutting C Code. theless, our kernel still uses several large patches to As part of our work on C4, our analysis of various support performance isolation and namespace isola- patch sets reveals that kernel extensions primarily tion for said VPS support. make intramodule changes: a coherent collection of At this point, you, likely a Linux user, may be think- modifications encapsulated within an existing mod- ing, “Hey man, quit the whining. This problem only ule. These changes may modify many of the functions affects a small group that have a vested interest in within a particular module, but they do not change maintaining a highly customized kernel.” I humbly the externally visible interface. Clients of the module disagree. With the rampant success of Linux, this do not need to change their code and usage patterns. “small” group is growing by leaps and bounds. Be- Please note the loose use of the word “module” to sides well-known players like RedHat, SuSE, and mean any collection of components with a well-de- IBM, there is a growing number of corporations (e.g., fined external interface, including kernel subsystems. PalmSource, Wind River Systems, Panasonic, NEC, In particular, a module is not limited to a single ker- NTT DoCoMo, to name just a few) and government nel source file. agencies worldwide that are putting a tremendous Some kernel extensions also make intermodule amount of effort into the Linux kernel. The happy changes: modifications that change intermodule in- days of the Linux phenomenon probably are num- terfaces, or the visible semantics of an existing inter- bered. Why? Rather than working toward the general face, in fundamental ways. For instance, modifica- good, corporate kernel programmers will try as hard tions of a function’s type or the field makeup of a data as possible to push their “modifications” into the structure (e.g., adding, deleting, or changing the type Linux kernel in pursuit of their own agendas. It is just of a field) are intermodule changes. Making such a matter of time before the soft underbelly bursts. changes can have far-reaching consequences: it can When this happens, significant infighting will ensue, require updating all modules that directly use the leading to fragmentation or who knows what—noth- changed code. This is prohibitive when the interface ing that’s good for any community. What can be done changes are in the kernel proper or in the generic de- about this? Will Linus Torvalds keep everything vice driver framework and trigger corresponding under control so that the rest of us can continue to changes in specific device drivers—there might be obliviously toil along with Linux? No! To address this hundreds. Doing such updates manually is error- problem it is not prudent to rely on a bazaar, cabala, prone and time-consuming. cathedral, benevolent pope, or any sociopolitical Recognizing that intramodule and intermodule model. No one truly understands or can predict the changes are common to new kernel extensions for long-term outcome of such models. As engineers we Linux, our approach with C4 is to make them part of have two choices: (1) blissfully ignore the problem the kernel’s architecture by leveraging aspect-oriented and hope the day of reckoning will never arrive, or programming (AOP) techniques. Whereas object- (2) turn it into a technological problem that we can oriented programming provides linguistic mechanisms attempt to solve—i.e., find better ways to evolve the for structuring self-contained units of code, AOP pro- kernel proper with kernel extensions. vides linguistic mechanisms for structuring concerns My position paper titled “patch(1) Considered Harm- that naturally cut across primary modules of a system. ful,” presented at this year’s Workshop on Hot Topics More specifically, our approach is to express intramod- in Operating Systems , outlines why patch is harmful ule changes as semantic patches using aspects, which for the evolution of the kernel (see the HotOS sum- provide a language-supported methodology for inte- maries in this issue of ;login:). In a nutshell, patch is grating crosscutting concerns with a program. good for localized fixes but bad for kernel extensions The benefits of aspects are twofold. First, they pro- that introduce changes which crosscut many files vide a well-defined specification of domain-specific and/or functions. Analysis of patches revealed that features that is separate from baseline functionality, kernel extensions can easily cover a hundred existing yet can be automatically integrated with the kernel. kernel files, even though it represents a logical unit, Second, we believe that aspects will enable tools to expressing a single crosscutting concern. The crux of perform automatic analysis of the implications of the problem is that with today’s tools (patch, cvs, bk,

;LOGIN: OCTOBER 2005 BETTER TOOLS FOR KERNEL EVOLUTION, PLEASE! 9 composing several crosscutting concerns and there- extensions will be part of the mainline release, the fore help identify true semantic conflicts, as opposed kernel.org folks can declare a preferred composition to the line-by-line conflicts identified by patch. of these extensions, but others will be truly free to Thus, a toolkit like C4 provides an opportunity to the choose what they prefer—i.e., no longer will there be growing number of well-funded (and under-funded) a need to patch in code and resolve merge conflicts kernel developers to have their code included with between separate extensions. the latest mainline kernel. The automatic integration Building a toolkit to solve this problem is a tall of kernel extensions at build time will leave the ker- order, and the C4 toolkit is just one approach. There nel proper largely unencumbered. In contrast, the ex- are undoubtedly others. For more information isting model litters the kernel proper with unneces- and an initial release of the C4 toolkit, please see sary #include statements and code fragments that do http://c4.cs.princeton.edu. nothing until the corresponding CONFIG option is enabled. More important, Torvalds et al. no longer will need to decide what kernel extensions make it onto the mainline kernel.org release. Rather, since the

Thanks to USENIX Supporting Members

Addison-Wesley/Prentice Hall PTR Microsoft Research AMD NetApp Asian Development Bank Oracle Cambridge Computer Services, Inc. OSDL Delmar Learning Perfect Order Electronic Frontier Foundation Raytheon Eli Research Ripe NCC Hewlett-Packard Splunk IBM Sun Microsystems, Inc. Intel Taos Interhack Tellme Networks The Measurement Factory UUNET Technologies, Inc.

It is with the generous financial support of our supporting members that USENIX is able to fulfill its mission to:

• Foster technical excellence and innovation • Support and disseminate research with a practical bias • Provide a neutral forum for discussion of technical issues • Encourage computing outreach into the community at large

We encourage your organization to become a supporting member. Send email to Catherine Allman, Sales Director, [email protected], or phone her at 510-528-8649 extension 32. For more information about memberships, see http://www.usenix.org/membership/classes.html.

10 ;LO GIN: V OL. 30, NO. 5 THE CONCEPT IS SIMPLE: ALLOW multiple copies of Solaris to run within one

PETER BAER GALVIN physical system. Indeed, creating and using Solaris zones is simple for experienced sys- tem administrators. But then why are there so many questions surrounding this new Solaris 10 Solaris feature? Just what is a container? How do you upgrade it? How can you limit containers its resource use? Does it work like VMware? Peter Baer Galvin is the chief technologist for And so on. In this article I describe the theo- Corporate Technologies, Inc., a systems integrator and VAR, and was the systems manager for Brown ry of Solaris 10 containers, and the facts be- University’s Computer Science Department. He is hind creating, managing, and using them. currently contributing editor for SysAdmin Magazine, where he manages the Solaris Corner, and is co- author of the Operating Systems Concepts and Applied Operating Systems Concepts textbooks. Overview [email protected] Solaris 10 containers are a new feature of Solaris. A container is a virtualized copy of Solaris 10, running within a Solaris 10 system. While this is similar in concept to VMware, for example, it is more of a dis- tant relative than a blood brother. Perhaps its closest relative is BSD jails. Containers have a different pur- pose from VMware (and other operating system virtualization software, such as Xen). Those tools create a virtual layer on which multiple operating systems can run. Containers run within Solaris 10, act only as Solaris 10 environments, and only run So- laris applications. Before I delve further into containers, a clarification is needed. Solaris 10 has the concepts of “zones” and “containers.” Simply, a container is a zone with re- source management (including fair-share scheduling) added. Mostly the terms are used interchangeably, but where needed I will point out differences. Given that limited Solaris 10 view of virtualization, of what use are containers? Consider the following set of features that containers add to Solaris 10: I Up to 8192 containers can exist on the same sys- tem. The “global zone” is what would normally be called “the operating system.” All other containers are referred to as “non-global.” I Containers can share resources with the global zone, including binaries and libraries, to reduce disk-space requirements. An average container takes 60MB of disk space or less. A container shar- ing files with the global zone is known as a “sparse container” and is created via a sparse install. I Because binaries are shared (by default), and So- laris optimizes memory use by sharing objects in memory when possible, an average container uses approximately 60MB of memory when booted.

;LOGIN: OCTOBER 2005 SOLARIS 10 CONTAINERS 11 I By default a package (Sun’s concept of an install- gation facility. The same application could be run in able application) installs in the global zone and all multiple containers, use the same network ports, and nonglobal zones. Likewise, a patch will by default be unaware of any of its brothers. A container can install in all containers that contain the packages crash and reboot without affecting any other contain- to which the patch refers. ers or the global zone. An application can run amok I As mentioned above, a container is a resource- in a container, eating all available CPU but leaving the managed zone. The first release of Solaris 10, on other containers with their guaranteed fair-share which this article is based, includes CPU use as a scheduling slices. A user with root access inside of a manageable resource. (Note that there are now container may control that container, but cannot es- three streams of Solaris release: the standard com- cape from the container to read or modify the global mercial release; the “Express” updates that arrive zone or any other containers. In fact, a container every month or so, and for all intents are beta re- looks so much like a traditional system that it is a leases available to anyone interested; and the Open common mistake to think you are using the global Solaris community release, which is a periodic zone when you are “contained.” An easy navigation snapshot of the internal build of Solaris based on solution is found in the command zonename. It dis- the Open Solaris release, and is the least tested of plays the name of the current zone. I suggest you these. This latter release is the most recent of the have that output displayed as part of your prompt so builds, but also the most likely to have problems.) that you always track the zone you are logging in to. As with other OS-virtualizing technologies, contain- ers are secure from each other, allowing only network Creation access between them. They also have their own net- work configurations, having their own IP addresses The creation of a container is a two-step process. The and allowing variations in network membership (sub- zonecfg command defines the configuration of a con- nets and network masks, for instance). tainer. It can be used interactively or it can read a configuration file, such as: There are also some limits that come with containers create -b (most of these are likely to be removed in future re- set zonepath=/opt/zones/zone00 leases of Solaris): set autoboot=false I A container cannot be an NFS server. add inherit-pkg-dir I A container cannot be moved to another system set dir=/lib (i.e., imported or exported). end I A container must have a pre-set network address add inherit-pkg-dir (i.e., it cannot be DHCP-configured). set dir=/platform end I The only container-manageable resource as of the add inherit-pkg-dir first release of Solaris is CPU shares. A container set dir=/sbin could, for example, use all the system’s virtual end memory. add inherit-pkg-dir I Due to the security restriction that a nonglobal set dir=/usr container be securely separate from other contain- end ers, some features of Solaris 10 do not work in a add inherit-pkg-dir container. The most limiting is DTrace, the ex- set dir=/opt/sfw traordinary Solaris 10 debugging/analysis tool. end I A container cannot run Linux binaries natively add net (the initial container marketing from Sun to the set address=131.106.62.10 set physical=bcme0 contrary notwithstanding). Likewise not currently end supported, a container cannot have its own fire- add rctl wall configuration. Solaris 10 uses ipfilters as its set name=zone.cpu-shares firewall technology; ipfilters can be configured in add value (priv=privileged,limit=1,action=none) the global zone only. end I By definition, a container runs the same operating In the above example, I create zone00. It will not au- system release as the global zone. Even kernel tomatically boot when the system boots. It will be a patches installed on the system affect all contain- sparse install, inheriting the binaries, libraries, and ers. Only application patches or non-kernel oper- other static components from the global zone, for all ating system patches can vary between containers. of the inherit-pkg-dir directories added. It will have So what we are left with is a very lightweight, easy to the given network address, using the Ethernet port manage, but in some ways limited application segre- known as bcme0. Finally, it will have a fair-share

12 ;LO GIN: V OL. 30, NO. 5 scheduler share of 1. (See below for more information Placing this information in the sysidcfg file in the /etc on the fair-share scheduler.) The command zonecfg directory of the container (/opt/zones/zone00/root/ –z zone00 –f config-file will read the configuration etc/sysidcfg) before the first boot provides most of the (from the file named “config-file”). Either a sparse in- sysifconfig answers. Now the container can be boot- stall or a full container install is possible. If there are ed. To connect to the container console, as root, you no inherit-pkg-dir entries, all packages from the glob- can use zlogin –C zone00. Here you can watch the al zone are installed in full. Such a container can take boot output. Unfortunately, there is still an NFSv4 3GB or more of storage but is then less dependent on question to answer for sysidconfig, but once that is the global zone. Typically, sparse installation is used. done the container is up and running. The first boot Several options are available with zonecfg. An impor- also invokes the new Solaris 10 service management tant one is fs, which mounts file systems within the facility, which analyzes the container and adds ser- container. Not only can file systems from other hosts vices as prescribed by the installed daemons and be mounted via NFS, but loopback mounts can be startup scripts. Future boots of the container only used to mount directories from the global zone with take a few seconds. the container. For example, to mount /usr/local read only as /opt/local in zone00, interactively: Management # zonecfg –z zone00 Once a container has been installed and booted, it is zonecfg:zone00> add fs zoneadm list zonecfg:zone00:fs> set dir=/usr/local fairly self-sufficient. The command will zonecfg:zone00:fs> set special=/opt/local show the status of one or all containers. zonecfg:zone00:fs> set type=lofs I find a script to execute a command against every zonecfg:zone00:fs> add options [ro,nodevices] container is helpful. For example: zonecfg:zone00:fs> end #!/bin/bash When the zone is rebooted, the mount will be in place. for z in zone00 zone01 zone02 zone03 zone04 Next, zoneadm –z zone00 install will verify that the zone05 zone06 zone07 zone08 zone09 zone10 zone11 zone12 zone13 zone14 zone15 zone16 configuration information is complete, and if it is will zone17 zone18 zone19 zone20; perform the installation. The installation for the most do part consists of package installations of all of the zoneadm -z $z boot packages in the inherit-pkg-dir directories, but only zlogin -z $z hostname; those parts of the packages that are nonstatic (config- done uration files, for example). The directory under Note that the zlogin command, when run as root in which the container will be created must exist and the global zone, can execute a command in a specified must have file mode 700 set. Typically, a container in- zone without a password being given. stallation takes a few minutes. Another administrative convenience comes from the Once the container is created, zoneadm –z zone00 direct access the global zone has to the other contain- boot will boot the container. The first boot will cause ers’ root file systems. Root in the global zone can copy a sysidconfig to execute, which by default will files into the directory tree of any of the containers via prompt for the time zone, root password, name ser- the container’s root directory. vices information, and so on. Rather than answer those questions interactively, a configuration file can By default, any package added to the global zone is do the trick. For example: added to every container on the system. Likewise, any name_service=DNS patch will be added to every container that contains { the files to which that patch applies. It is a bit surpris- domain_name=petergalvin.info ing to watch pkgadd boot each installed but nonrun- name_server=131.106.56.1 ning container, install the packages, and then return search=arp.com the container to its previous state. But that is an ex- } ample of just how well integrated into Solaris 10 con- network_interface=PRIMARY tainers are. There are options to the pkgadd and { patchadd to override this behavior. hostname=zone00.petergalvin.info } timezone=US/Eastern Fair-Share Scheduler terminal=vt100 system_locale=C As previously mentioned, the fair-share scheduler can timeserver=localhost be used in conjunction with zones to make them into root_password=bwFOdwea2yBmc containers. A container then may be limited in its security_policy=NONE share of available CPU scheduling slices. Fair share is

;LOGIN: OCTOBER 2005 SOLARIS 10 CONTAINERS 13 a complex scheduling algorithm in which an arbitrary networking. If a set of applications uses shared mem- number of shares are assigned to entities within the ory, then they obviously need to be in the same con- computer. A given entity (in this case a container, but tainer. Much will depend on the support by software it could be a collection of processes called a project) creators and vendors of containers. Early indications then is guaranteed its fair share of the CPU—that is, are good that vendors will support their applications its share count divided by the total. If some CPU is being installed inside containers. unused, then anyone needing CPU may use it, but if Some of the design decisions surrounding containers there is more demand than CPU, all entities are given will limit how they are used. For example, it does not their share. There is even the notion of scheduling make sense to install a development, QA, and produc- memory, in that an entity that used more than its fair tion container for a given application on the same sys- share may get less than its share for a while when tem. Development might want to be using a different some other entity needs CPU. operating system release than the one currently in use The system on which a container runs must have the in production, for example. Or QA may need to test a fair-share scheduler enabled. First, the scheduler is kernel patch. It would be reasonable to create a con- loaded and set to be the default via dispadmin –d FSS. tainer for each developer on a development server, The fair-share scheduler is now the default on the sys- however, to keep rogue code from crashing that tem (even surviving reboots), and all new processes shared system or hogging the CPU. For those that are put into that scheduling class. Now all the current host applications for others (say, application service processes in the timesharing class (the previous de- providers), containers are a boon for managing and fault) can be moved into fair share by priocntl –s –c securing the multiple users or companies that use the FSS –i class TS. Finally, I’ll give the global zone some applications on their servers. shares (in this case, five) so it cannot be starved by the containers: prctl –n zone.cpu-shares –v 5 –r –i Conclusion zone global. To check the share status of a container (in this case, the global zone), use prctl –n zone.cpu- The Solaris 10 container facility is a different take on shares –i zone global. Take note that the prctl com- virtualized operating system environments commonly mand share setting does not survive reboots, so this found on other systems. Only one Solaris release may command should be added to an rc script to make it run on the system, and even kernel patches affect all permanent. containers. Beyond that, containers offer a cornu- copia of features and functions. They are lightweight, Use so they may be arbitrarily created and deleted. They are secure, protecting themselves from other contain- Containers are new, and therefore best practices are ers and protecting the global zone from all containers. still in their infancy. Given the power of containers And they can be resource managed (although only and the low overhead, I would certainly expect that CPU resources at this point) to avoid a container from most Solaris 10 systems will have containers config- starving others. Further, containers are integrated ured. An obvious configuration is that only root users into other aspects of Solaris 10, such as package and access the global zone, and all other users live in one patch management. Containers are a welcome or more containers. Another likely scenario is that addition to Solaris 10 and allow for improved utiliza- each application be installed in a nonglobal container, tion of machines, as well as more security and or each into its own container, if it only communi- manageability. cates with the other applications on that system via

14 ;LO GIN: V OL. 30, NO. 5 TIRED OF BEING HAMMERED BY spam and virus attacks from spyware-

HOBBIT infested ISP customers? Here is an easy and entertaining way to use the providers’ own DNS naming schemes against them. In the absence of ISPs really doing anything on DNS-based spam their end, we can use a few well-construct- ed filtering rules at our end to deny email rejection delivery from suspect networks, and still Hobbit has played the “Internet insecurity” game leave a path open for legitimate messages. long enough to realize that it’s hopeless as long as those bothersome and complacent humans are still in the loop. When he isn’t working on heavy-handed It is no secret that a huge flood of spam and malicious approaches to spam control, he’s probably outside hacking on the Prius. email emerges from compromised machines in homes and businesses. The botnets grow larger by the day [email protected] and have become a profitable underground offering. 0wned machines are used to launder connections and launch all sorts of attacks, using the convenient high- speed bandwidth in the customer infrastructure of the ISPs that connect them. Unfortunately, many major ISPs are behind the curve in preventing this, and ob- stinately remain so even though what they should be doing is common knowledge. They want to retain their neutral status as a carrier, not to be responsible for traffic filtering. Many ISPs have been forced to block downstream packets to obvious problems such as Windows file- sharing ports, but they have put little effort into up- stream filtering, thinking that it would cause too many (more) support complaints. Some, such as UUnet and Concentric, have denied direct SMTP de- livery from their own untrusted customer clouds, and they’ve been successfully running such configurations for years. The result is that spam and virus/trojan at- tempts rarely, if ever, arrive from their infrastructures. I believe that Comcast began experimenting with fil- tering in the cable-modem swamps, but that seems to have been undone recently. What is one to do about the rest of the ISPs that sim- ply let the stuff out? Well, another area that ISPs do seem to pay more attention to is DNS naming within the customer networks, and it turns out we can rely on that much more than their ability to keep a lid on the botnets. Largely automated within turn-key DHCP servers, each address that appears on cable and DSL networks generally receives some kind of valid PTR record in the DNS server authoritative for those .IN-ADDR.ARPA blocks. And since the naming is ma- chine-generated, based on the client’s IP or MAC ad- dress, it usually follows some recognizable pattern such as these: c-24-128-171-15.hsd1.ma.comcast.net pcp05184511pcs.plsntv01.nj.comcast.net

;LOGIN: OCTOBER 2005 DNS-BASED SPAM REJECTION 15 pool-151-203-213-167.bos.east.verizon.net fl-71-0-153-49.dyn.sprint-hsd.net 15-95.200-68.tampabay.res.rr.com dsl-67-114-79-114.dsl.lsan03.pacbell.net CPE0008a10ba047-CM014100000470.cpe.net.cable.rogers.com Almost all of these names imply something about the client and/or the sur- rounding network infrastructure. They are unlikely to be applied to infrastruc- ture machines for the ISP itself, such as the mail-relay servers customers are ex- pected to use for their outbound mail. Despite the fact that DNS is considered a generally untrustworthy source of data, these PTR names are at least somewhat trustable, since an SMTP server will generally look them up “out of band” via servers that are less likely to be under a spammer’s control. If a spammer is inter- cepting all of your DNS queries, then you’ve got a much larger problem. Thus, as part of an overall best-practice set of anti-spam measures, we can take advantage of such naming schemes and detect if a generic customer is attempt- ing to deliver mail directly, instead of passing it through the local ISP. We can then deny delivery with a rejection message asking the sender (if it’s a human paying attention at all) to please use the ISP’s authorized relayer to send the message. I do this within Postfix, and all examples herein are based on the filter- ing features that Postfix has to offer, but the concepts should easily map to other common SMTP servers. We can hope that the ISP mail relay also applies a few anti-spoofing rules to make sure the headers aren’t forged—this is in fact proba- bly not the case in most instances due to the management overhead, but as luck would have it, the mechanism works because most ISP customers get configured by default to drop off their mail at the ISP relay. The few who want to do their own direct sending will see the error and, hopefully, resend the message via the ISP to get it through. (Mildly political rant, below, about the relative merits of each path.) So, how is this implemented? In Postfix, the SMTP server is able to look up and act upon details about the client connecting to it. Restrictions may be imposed by adding one or more instructions to check client name or IP attributes in main.cf: smtpd_xxx_restrictions = ..., check_client_access regexp:/etc/postfix/client_acl, ... where xxx can indicate one of several stages of SMTP delivery. The most com- mon section is smtpd_recipient_restrictions, which allows collecting as much log detail about the client and the message as possible. The regexp: tag can also be hash: or dbm: or pcre:, depending on which style of data storage is desired and which version of Postfix, but it is likely that only regular-expression-based matching will be useful here. The client access directive is usually found some- where in a list amidst several other types of checks and special cases—allowing local or authenticated connections, looking up addresses in RBL services, etc. The client_acl file itself contains an ordered list of expressions to match, actions to take upon match, and optional error-message text for rejections. Here’s where the DNS-based magic happens, along with any other network-level checks need- ed. Rules are matched for both DNS lookups and the ASCII representation of the IP address, which rocks because of the seamless versatility that it lends. Some examples: # *all* of cybermall, 207.0.62.0/23 /^207\.0\.6[23]\./ 550 No Soliciting # oops, several of yahoo’s legit relays keep landing in spamcop +^216\.155\.201\.[56]+ OK # block all of Latvia [I don’t know anyone there] /\.lv$/ 550 Denied

16 ;LO GIN: V OL. 30, NO. 5 # random known spam-nests /cablemas\.net$/ REJECT /\.popsite\.net$/ 550 Access denied (spam) Note that it’s still all regexp string comparisons, so we can’t specify CIDR nota- tion, but that’s fine—to deny partial blocks or aggregates, the [set] syntax of ad- dress digits gets close enough. I tend to err on the broader side if I’m slamming the door on some podunk ISP that appears to be spammer-friendly. Postfix regu- lar expressions have a couple of very minor differences from normal ones, and by default are case-insensitive unless a modifier is used. Rule processing is nor- mally “at first match, take the action and exit the list.” Inside the set of client rules, two approaches can be taken for filtering a given provider: either whitelisting its legitimate relay servers and denying the rest, or blacklisting its known customer networks using patterns. The approach taken often depends upon how the ISP does its naming, where it locates its mail- servers, etc. It takes a bit of digging and actually reading the spam carefully to get a clear picture of their infrastructure, but once that’s done the rest is easy. More examples—first, of names that lend themselves to easy one-shot identification: # general classes of dialup/DHCP clients /[-.]dial/ 550 Use your ISP’s mail relay /dial[-inu.]/ 550 Use your ISP’s mail relay /dhcp[-0-9]/ 550 Use your ISP’s mail relay # AOL direct-dialup customers get dropped into this swamp /ipt\.aol\.com$/ 550 Use your ISP’s mail relay /ipt\.aol\.net$/ 550 Use your ISP’s mail relay # ATTBI dynamics—client-mumbledyfoo.attbi.com /client.*\.attbi\.com$/ 550 Use your ISP’s mail relay # typical comcast client naming /client.*\.comcast\.net$/ 550 Use your ISP’s mail relay /^pc.*\.comcast\.net$/ 550 Use your ISP’s mail relay /\.hsd[123]..*\.comcast\.net$/ 550 Use your ISP’s mail relay An example with special-casing—I have some correspondents in the Albany, NY area who usually send directly, so I let them in ahead of the rest of Verizon’s dy- namic blocks. Even trying to keep it this tight still lets spammers connect once in a while: /pool-129.*\.alb\.east\.verizon\./ OK /pool-141.*\.alb\.east\.verizon\./ OK # all dhcp/pppoe verizon clients seem to match this... /pool.*verizon\.net$/ 550 Use your ISP’s mail relay For Roadrunner, looking up their MXes gives a pretty clear picture of which naming classes are likely to deliver legitimate mail—usually from something.mgw.rr.com, but over time I discovered some additional ones: /mgw\.rr\.com$/ OK /smtp-.*\.rr\.com$/ OK /mx[-0123].*\.rr\.com$/ OK /\.rr\.com$/ 550 Use your ISP’s mail relay An example done purely by IP address—a particular business I was dealing with is unfortunately located within one of Electric Lightwave’s netblocks. ELI ap- pears to be a spammer petri dish, so they don’t get to talk to me, period. Rather than ask the business to change to a better provider, we can allow mail from a small chunk they’re in and then deny the rest of their main /15: ## -case: SR-systems sits within an ELI block, but let ’em in /^208\.187\.213\./ OK /^208.18[67]\./ 550 Denied (ELI, spam) Obviously, techniques like this aren’t for everyone and are certainly not a solu- tion by themselves, but they do kill quite a bit of spam once the major botnet of-

;LOGIN: OCTOBER 2005 DNS-BASED SPAM REJECTION 17 fenders are added in correctly. It is prudent to research as much as is externally visible about each ISP—using spam we receive through them, legitimate mes- sages we receive, MX lookups, their customer-service Web pages, and possibly even Googling for archived message headers in postings from some of their cus- tomers just to read how their mail was delivered. For large IP blocks it may be better to use no-logging firewall rules to completely hide the mailservers from them and avoid piling up big log files of rejections. As jingoistic as it may seem, I have had to occasionally deny huge RIPE/APNIC/LACNIC allocations right up front just to stem the tide. There are also many other techniques that can be done inside Postfix that are be- yond the scope of this article. There are plenty of FAQs on spam-busting kicking around and pointed to from the Postfix.org Web site. A nice feature of using Postfix’s own features, such as client parsing, is that it’s fast—the regular expres- sion matching runs natively right inside Postfix and doesn’t need to farm out to external plug-ins or start up extra processes and Perl interpreters. SpamAssassin and the like seem somewhat less attractive because they sit outside Postfix and chew much more CPU. But if the external packages have the other features you need, then by all means use them. Some ISP customers, such as small businesses and consultants, may have a legit- imate need to handle their own mail and deliver directly. Unfortunately, the ISPs often tar them with the same dynamic-DNS brush, so when the business be- lieves it’s acewidgets.com, the outside world sees its SMTP connections coming from c-67-163-134-87.hsd1.ct.comcast.net. The right answer for this is either delegated reverse DNS as described in RFC 2317, or for the ISP to maintain a nominal set of static PTR records for that customer. However, finding anyone within the ISPs who even knows what that means, let alone how to set up and maintain it, is often an insurmountable challenge. Forcing customers through ISP mailservers is also a political gray area, as well as a certain amount of maintenance headache for the ISP. It doesn’t have to be. The relay servers are usually already in place, since most customers appear to get set up to deliver through them by default. But those servers also present a good opportunity for the ISP to enforce anti-spoofing in message envelopes and headers and be a better neighbor to the rest of the Internet, especially with today’s nonstop epidemic of forged headers and joe-jobs coming from compro- mised customer machines. It is highly worthwhile for them to build infrastruc- ture that is sufficiently capable and configurable to support that enforcement but still handle some special cases, such as letting acewidgets.com deliver as acewidgets.com, and then lock down the customer subnets. This of course re- quires having those same people who are clueless about delegated DNS come to understand how to run a proper mailserver. If I seem to be down on ISP technical capability in general, it’s for very sound historical reasons. Really, I have tried to get through to many of them and have consistently failed to find anyone who knew what I was talking about. Perhaps if you’re reading this and work for one of these ISPs, you can take it to manage- ment as a wake-up call. The more ISPs realize the extraordinary extent of the problems and put forth the effort to be better Internet neighbors, the less need there will be for kludges like all of the above.

REFERENCES “Classless IN-ADDR.ARPA delegation”: http://rfc-editor.org/rfc/rfc2317.txt Postfix FAQs: http://www.postfix.org/docs.html

18 ;LO GIN: V OL. 30, NO. 5 “THE TROJAN HORSE” ORIGINALLY referred to the ploy used by the ancient

BORIS LOZA Greeks to attack the city of Troy. Today, it’s fairly common knowledge that a trojan horse is an application a cheeky hacker tries to install on your hard disk to get easy finding trojans for access to your computer. A trojan can be part of a rootkit while masquerading as a fun and profit legitimate application such as ls, df, or ps. In Boris Loza, Ph.D., is a founder of Tego System Inc. and this article I will show you how to find HackerProof Technology, in addition to being a con- tributor to many industry magazines. He holds sev- rootkits and trojans using handy little utili- eral patents and is an expert in . ties and a couple of tricks. [email protected]

Checking Inodes

One of the ways to find trojan files in a current direc- tory is to check inode numbers. Many rootkits modi- fy the access and modification time of the files they replace, so at a glance a file may appear to be un- changed or even untouched. What remains is to check an inode number of a file in question. Most installs will install files sequentially. For exam- ple, the output below shows inode numbers for files in the /etc directory: $ ls –ai /etc | sort | more …… 183491 TIMEZONE 183492 autopush 183493 cfgadm 183494 clri 183495 crash 183496 cron …… The –i option of ls lists the files’ inode numbers. As you can see from the output, most of the inode num- bers are in sequence. A broken number sequence indicates the possibility that those files were installed after the main installa- tion took place. Look for out-of-place entries, either very high or very low. Also look for new groupings, as all the rootkit pieces were probably installed at the same time. Note: The newfs command uses fsirand(1M) to in- stall random inodes when creating a new file system. Also, if you use fsirand periodically, your system inode numbers will not be in sequence. For this rea- son, you may want to create a master database of all inode numbers for all your files. You can use some- thing like the following to collect this information into a file: # ls –aiR / > my_inodes

;LOGIN: OCTOBER 2005 FIN DING TROJANS FOR FUN AND PROFIT 19 Put this database aside and check the inode numbers of files in question against it. Update the database after installing new patches or system upgrades. Check closely the /usr, /usr/bin, /sbin, /usr/sbin, and your X Window binaries di- rectory, because rootkits are usually hiding in these places. If an attack was successful, a hacker may install a rootkit. This is a suite of appli- cations that can be used for many nasty things (creating back doors, root shells, etc.). It also helps to hide its own presence by modifying system commands that, for example, list all files in the directory (ls, dir (on Linux)) or find any file (find). Therefore, if you suspect that an attacker is on your system, you may not want to trust the ls or find commands, because they most likely have been re- placed. How, then, do you list all files in the directory/s to find the rootkit’s files?

Alternative Ways to List Files in a Directory

If a rootkit has been installed on your system, it replaced the ls and find com- mands with trojan versions that will not show a real list, one that includes the rootkit’s files. If you suspect that the current directory may contain a hidden directory or file, do one of the following. If you use Korn shell (ksh), press “ESC=” to list all files in a directory. For example, on Solaris OS: $ ksh –o vi $ .= 1) ../ 2) ./ 3) .Xauthority 4) .dt/ 5) .dtprofile 6) .hushlogin 7) .netrc 8) .rhosts 9) .sh_history Or: $ * 1) TT_DB/ 12) mnt/ 2) bin/ 13) net/ 3) cdrom/ 14) opt/ 4) dev/ 15) platform/ 5) devices/ 16) proc/ 6) etc/ 17) sbin/ 7) export/ 18) tmp/ 8) home/ 19) usr/ 9) kernel/ 20) var/ 10) lib/ 21) vol/ 11) lost+found/ 22) xfn/ While your ls might be trojaned and not be able to see the hidden files, your Korn shell will see them. You can also use the echo(1) command, which lists all files in a directory. For example: $ echo * TT_DB bin cdrom core dev devices downloads etc export home kernel lib lost+found mnt mynes.txt net nsmail opt patches platform proc proj- ects sbin test tmp ts1 typescript usr var vol xfn Note: Echo will not show hidden files, such as files starting with “.”. On Linux systems, you may use the less(1) command to display all files in the directory:

20 ;LO GIN: V OL. 30, NO. 5 $ less . 2 drwxr-xr-x 3 boris other 512 Jun 4 20:59 ./ 2 drwxr-xr-x 43 root sys 1024 May 26 20:05 ../ 2 -rw———- 1 boris other 90 Jun 4 21:14 .bash_profile ... To list all files in the directory you can also use the tar(1) utility. Use the -w op- tion for “wait for user confirmation.” Answer “y” for the first entry and “n” for the rest of the list: $ tar cvwf /tmp/bb . r drwxr-xr-x 1003/1 1024 Jun 1 17:35 2005 .: y a ./ 0K r -rw———- 1003/1 3918 Jun 4 20:34 2005 ./.sh_history: n r -rw-r—r— 1003/1 418 Mar 22 14:43 2004 ./.profile: n ^C

Using od and cat Commands

Because a directory is also a file, we will use commands that can be used to look inside files: od and cat. (You cannot use od(1) or cat(1) to display directories on Linux). First, we will try to get the octal dump (using the od command) of the directory to look for all files. Let’s try the following: $ od -c . 0000000 \0 \b 023 337 \0 \f \0 001 . \0 \0 \0 \0 013 006 P 0000020 \0 \f \0 002 . . \0 \0 \0 \b 023 340 \0 020 \0 007 0000040 P r o j e c t \0 \0 \b 023 341 \0 024 \0 013 0000060 w e b s t a t . l o g \0 \0 \b 023 342 0000100 001 304 \0 006 s t a t u s \0 \0 \0 \0 \0 \0 0000120 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 * 0001000 $ The od command with -c option displays single-byte characters. Certain non- graphic characters appear as C-language escapes: null \0 backspace \b form-feed \f new-line \n return \r tab \t Others appear as 3-digit octal numbers. The od -c command starts each line with the number of bytes, in octal, shown since the start of the file. The first line starts at byte 0. The second line starts at byte 20 (that’s byte 16 in decimal, the way most of us count). And so on. One can easily find file names on this output (., .., Project, webstat.log, and status). The cat -v –t –e turns nonprintable characters into a printable form. A directory usually has some long lines, so it’s a good idea to pipe cat’s output through fold: $ cat -v -t -e . | fold -62 ^@^H^SM- _^@^L^@^A.^@^@^@^@^K^FP^@^L^@^B..^@^@^@^H^SM- `^@^P^@^G Project^@^@^H^SM-a^AM-X^@^Kwebstat.log^@^@^@^@^@^AM- D^@^Fstatu s^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ ^@^@^@^@^@^@^@^@^ ......

;LOGIN: OCTOBER 2005 FIN DING TROJANS FOR FUN AND PROFIT 21 @^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^ @^@^@^@^@^@^@^@^@^ @^@^@^@^@$ You may try to filter out nonprintable characters. You can use something like the following (or use your own method): $ cat -v -t -e . | fold | sed “s/[\x00-\x08\x80-\x88\x0B-\x19\ x8B-\x99\x7F\xFF]”//g Or: $ od -c . | sed “s/[\001-\010\013-\037\177-\377]//g” Note: Be aware that some files shown by od or cat may be files that have already been deleted.

Conclusion

Obviously, if your ls command doesn’t show the same files as any of these com- mands, beware (remember not to count deleted files!). This may be because your ls has been replaced by a hacker. Do not panic! This is not the end of the world. Open your favorite book on incident response and OS recovery. You can find more hands-on tips and tricks of this kind in my new book UNIX, Solaris and Linux: A Practical Security Cookbook, which deals with securing UNIX OS without any of the third-party tools. It can be reviewed at www.amazon.com.

NEW! ;login: Surveys To Help Us Meet Your Needs

;login: is the benefit you, the members of USENIX, have rated most highly. Please help us make this magazine even better. Every issue of ;login: online now offers a brief survey, for you to provide feedback on the articles in ;login: . Have ideas about au- thors we should—or shouldn’t—include, or topics you’d like to see covered? Let us know. See http://www.usenix.org/publications/login/2005-10/ or go directly to the survey at https://db.usenix.org/cgi-bin/loginpolls/oct05login/survey.cgi

22 ;LO GIN: V OL. 30, NO. 5 IN THIS EDITION OF ISPADMIN, I take a look at the use of embedded sys-

ROBERT HASKINS tems for ISPs and other networks. There are many embedded platforms and open source software applications available for those interested in deploying wireless access points and firewalls for an ISP or any network. It’s not possible to cover all the ISPadmin hardware and software in the space avail- able here, so I’ve provided overviews and EMBEDDED HARDWARE pointers to other resources.

Robert Haskins has been a UNIX system administra- tor since graduating from the University of Maine with a B.A. in computer science. Robert is employed by Renesys Corporation, a leader in real-time Inter- What Are Embedded Systems? net connectivity monitoring and reporting. He is lead author of Slamming Spam: A Guide for System According to Wikipedia, an embedded system is “a Administrators (Addison-Wesley, 2005). special-purpose computer system, which is complete- [email protected] ly encapsulated by the device it controls” [1]. Embed- ded systems differ from general-purpose computers (such as PCs) by their programming to accomplish specific tasks. Many different types of embedded sys- tems are in use on large networks, most notably switches and routers. Beyond their specific program- ming, embedded systems are usually characterized by: I Low power utilization I Small size and comparatively lower cost I Limited use of active cooling devices (fans) I Limited CPU and memory To keep the devices in a small package, many design decisions must be made. First of all, the small size dictates a small power supply, which limits the num- ber of components and the speed of the CPU (the faster the CPU, the more power and cooling is used). Less power also means that fewer memory chips can be used. All of these requirements lead to lower cost, which makes the platform attractive for all of the rea- sons larger devices (such as traditional PCs) are not. Often, embedded systems are designed to use Com- pact Flash (CF) or similar types of non-volatile stor- age rather than hard disk drives. While reducing heat cost and increasing reliability by using fewer moving parts, the lack of a hard drive does make for some added complexity in design. It also requires “stripped down” versions of operating systems to run success- fully and can limit log retention due to space and read/write cycles. CF memory is limited in the num- ber of times the memory gates can be switched on or off, so the memory must be replaced after this thresh- old is exceeded.

;LOGIN: OCTOBER 2005 ISPADMIN: EMBEDDED HARDWARE 23 nor is it big enough for a full-size PCI card. These Available Embedded Platforms mini-PCI and regular PCI cards are often used for For the purposes of this discussion, platforms are bro- 802.11 wireless network cards. ken down into two types, general-purpose and re- purposed. Rather than performing any specific func- Mini-ITX tion per se, general-purpose platforms are meant to accomplish different functions depending on the soft- Mini-ITX is a form-factor specification developed by ware loaded. Re-purposed platforms are commercial VIA after the purchase of Cyrix from National Semi- products that are loaded with alternate firmware and conductor and Centaur (WinChip) from IDT [7]. may or may not be used in the same applications that Most of the Mini-ITX products are meant to be low- the original creators intended them for. cost PC desktop solutions, as most of the line in- There are a number of general-purpose embedded cludes faster processors and active cooling (fans). platforms available on the market today. The more This is emphasized by the fact that most of the form common, lower-cost platforms include: factors for the Mini-ITX cases are desktop, set-top I Soekris Engineering [2] box, or other hobbyist type of enclosures. However, I PCEngines Wireless Router Application Platform VIA clearly understands the potential for the embed- [3] ded market, as it has several boards that do not re- I PC/104 [4] quire cooling fans and that do contain multiple I Mini-ITX [5] Ethernet ports. These attributes make those VIA product lines ideal for the embedded market. The Soekris and PCEngines platforms are each partic- ular to the vendors who created them and are not standards per se. The PC/104 and Mini-ITX platforms Embedded System Software are standards in the sense that multiple companies manufacture boards in those formats. There are many choices when it comes to running software on your embedded device, and no possible Due to space constraints, only Soekris and Mini-ITX way to cover them all here. If you need specialized platforms will be covered in this article. However, ISP-type services (e.g., RADIUS, QoS), you must use many of the software packages mentioned run on the an environment that allows packages to be added as other platforms listed above. On the re-purposed plat- desired and not a GUI environment such as form side, the big target currently is Linksys hardware (see next section). The available software platforms (and other hardware manufacturers who use similar range from build your own from scratch, to flash im- chipsets). Only the Linksys WRT54G is covered in ages that can be loaded directly on flash and then this article on the re-purposed hardware side. onto the embedded box. If you want the quickest start, the preprogrammed Soekris Engineering flash is the way to go. However, using a distribution such as Voyage Linux [8] is only a little more work, Soekris boards are one of the more widely used gener- but you gain a lot of flexibility. One complication is al-purpose embedded platforms available today. The that Voyage (and similar environments) requires an Soekris platform consists of an AMD Geode [6] 100 existing Linux machine to run the install script. If to 266MHz processor and 16 to 256MB of memory you want to build the Voyage kernel, environment, (specifications depend on the model). Flash memory and/or additional packages, the machine must run a (long-term storage) is provided on board or through a Debian Linux. Pebble [9] is another popular stripped- CF socket. All Soekris models come with two or three down Debian Linux environment for embedded/ Ethernet ports, which makes them ideal for firewalls wireless applications. or for aggregating multiple wireless access points/net- works into one egress/firewall point. Soekris boards Numerous BSD-flavored environments are available also support power over Ethernet, making them ideal because the Soekris line of boards seems to have a for situations where no traditional 110V AC power is bias toward this UNIX variant. BSD variants that run available. on Soekris include m0n0BSD [10] and wifiBSD [11], and there’s a nice write-up on installing embedded All Soekris boards are available with at least one mini- FreeBSD on Soekris net4801 hardware [12]. Many PCI slot. Some models are available with regular PCI people seem to roll their own BSD variant for use on slots, which makes it easy to add functionality, assum- embedded hardware [13]. As with Linux, if you want ing the add-in PCI board consumes little power and to build your own embedded BSD-based environ- runs at 3.3V. However, be aware that the default ment, you must first have a BSD-based development Soekris case doesn’t give external access to the slot, system from which to work.

24 ;LO GIN: V OL. 30, NO. 5 NSLU2 into a firewall or otherwise increasing its use- Pre-Packaged Firewalls fulness. If all you require is a firewall, then specialized GUI Most commonly in the ISP market, the WRT54G de- environments are for you. One of the nicest ones vice is used for enabling wireless access points at low available that runs on Soekris and other PC hardware cost. As shipped by Linksys, built-in functions of the is m0n0wall [14]. It is a FreeBSD-based system, WRT54G include: stripped down, with a very nice GUI; another one is I Wireless radio IPcop [15]. However, ease of installation and manage- I Five-port Ethernet switch ment comes at the price of flexibility, as you cannot I Consumer-grade router easily add functionality to these firewalls. I Firewall Similar chipsets are used by products offered by a Price/Performance Comparison number of manufacturers, including Asus, Buffalo, Motorola, and Siemens, among others, according to From a cost/performance standpoint, VIA is clearly the OpenWRT Table of Hardware site [19]. The the leader—500MHz VIA boards for embedded appli- OpenWRT software will run with varying degrees of cations can be purchased for approximately US$100 support on most of these other platforms. The hard- as of this writing. Add a case, power supply, CF-to- ware specifications for the Linksys WRT54G version IDE adapter, RAM, and a 128MB CF card and the cost 3.0 device, according to [19], are the following: of an embedded Mini-ITX system is comparable to I the low-end Soekris models. Obviously, the perfor- Broadcom 4712 chipset running at 200MHz I mance is much greater for the Mini-ITX system; the 4MB flash/16MB RAM Soekris boards run at only 100 to 266MHz and have At a current street price of US$50 for the WRT54G, smaller amounts of memory, but they also require that’s an excellent price/performance ratio! Products very little power (10 watts for a 266MHz net4801 sys- marketed specifically to the service provider with tem including a pair of wireless cards). similar functionality start around US$100. The inclusion of the typical PC devices on the Mini- ITX platform (VGA adapter, mouse, keyboard, paral- Software for Re-Purposing Devices lel, sound, etc.) does factor in, however. The “wasted” space on the printed circuit board, at least for the em- ISPs don’t use consumer-grade products such as the bedded application, means that the package for the Linksys WRT54G because the firmware as shipped Mini-ITX form factor can never be as small as the from Linksys doesn’t include authentication methods Soekris. The added components will increase power (such as RADIUS) and other functionality needed for requirements for Mini-ITX as well. The small size of working with their networks. These features are often the Soekris (as well as PC/104 and WRAP) form fac- found in higher-priced devices aimed specifically at tor is certainly useful in applications where space and the service-provider market, but are lacking in con- power are at an absolute premium. sumer-grade devices. To rectify this, several people have put out replace- Re-Purposing Hardware ment firmware for the WRT54G (and similar) de- vices. Two of the more widely used ones are: First, a warning: Re-flashing firmware on Linksys I Sveasoft Talisman [20] (and similar devices) is going to void your warranty I OpenWRT [21] and support agreements, so be sure you know what you are doing before embarking on any project that For someone wanting an easy, drop-in network solu- involves such activity! tion, Sveasoft is probably better. It is open source, but a small fee is charged for the latest versions of On the low-cost embedded hardware side, devices firmware/support. While OpenWRT is more flexible such as the Linksys NSLU2 [16] and Linksys and due to its modular framework for adding packages, WRT54G [17] can be re-purposed. The NSLU2 is de- more time must be invested to configure the exact signed to be a network storage device that allows a image you want. If the package you need isn’t avail- USB hard drive to be accessed across the network. It able, you may be able to port it to the OpenWRT envi- runs proprietary firmware, but this can be changed by ronment and load it into your custom image. installing one of the NSLU2-Linux project software images [18]. One of the firmware releases allows the user to add other hardware devices, turning your

;LOGIN: OCTOBER 2005 ISPADMIN: EMBEDDED HARDWARE 25 [7] For a brief history of Mini-ITX, see Conclusion http://www.mini-itx.com/hardware/history/. Embedded hardware is in use at many ISPs and other [8] Voyage Linux: http://www.voyage.hk/software/voyage.html. larger networks. The application for such hardware is usually wireless access points and/or firewalls. There [9] Pebble Linux: http://www.nycwireless.net/pebble/. are many embedded hardware solutions available to [10] m0n0BSD: http://www.m0n0.ch/bsd/. the ISP or hobbyist today. These include general- [11] wifiBSD: http://www.wifibsd.org/. purpose platforms such as Soekris Engineering and [12] Onlamp embedded BSD article: Mini-ITX, as well as re-purposed hardware such as http://www.onlamp.com/pub/a/bsd/2004/03/11/ Linksys WRT54G. Big_Scary_Daemons.html. On the software side for general-purpose embedded [13] For a nice description of how to reduce FreeBSD hardware, stripped-down versions of Linux and BSD for use in embedded systems, see variants are in wide use. Images can be readily ob- https://neon1.net/misc/minibsd.html. tained and loaded via CF and similar media. GUI [14] m0n0wall: http://www.m0n0.ch/wall/. front ends [15] IPcop: http://www.ipcop.org/. are available for firewall applications as well. For [16] Linksys NSLU2 network<==>USB storage link: re-purposed hardware such as WRT54G, several http://www.linksys.com/servlet/Satellite?childpagename Linux distributions are available that enable addition- =US%2FLayout&packedargs=c%3DL_Product_C2%26cid al functionality for ISPs, including RADIUS and other %3D1115416906769&pagename=Linksys%2FCommon service provider requirements. %2FVisitorWrapper. [17] Linksys WRT54G: REFERENCES http://www.linksys.com/servlet/Satellite?childpagename =US%2FLayout&packedargs=c%3DL_Product_C2%26cid [1] Wikipedia Embedded Systems definition: %3D1115416825557&pagename=Linksys http://en.wikipedia.org/wiki/Embedded_systems. %2FCommon%2FVisitorWrapper. [2] Soekris Engineering: http://www.soekris.com/. [18] NSLU2 Linux project: http://www.nslu2-linux.org/. [3] PCEngines WRAP: http://www.pcengines.ch/wrap.htm. [19] OpenWRT table of supported hardware: [4] PC104 Consortium: http://www.pc104.org/. http://openwrt.org/TableOfHardware. [5] VIA’s Mini-ITX page: http://www.via.com.tw/en/ [20] Sveasoft WRT54G replacement firmware: initiatives/spearhead/mini-itx/. http://www.sveasoft.com/content/view/20/1/. [6] AMD Geode: http://www.amd.com/us-en/ [21] OpenWRT project: http://openwrt.org/. ConnectivitySolutions/ProductInformation/ 0,,50_2330_9863,00.html.

26 ;LO GIN: V OL. 30, NO. 5 IN THIS ARTICLE I’M GOING TO LOOK at some less well known security features

DAVID MALONE of FreeBSD. Some of these features are com- mon to all the BSDs. Others are FreeBSD- specific or have been extended in some way in FreeBSD. The features that I will be security through discussing are available in FreeBSD 5.4.

obscurity First, I’ll mention some features that I don’t plan to cover. FreeBSD jails are like a more powerful version of chroot. Like chroot they are restricted to a subtree A REVIEW OF A FEW OF of the file system, but users (including root) in a jail are also restricted in terms of networking and system FREEBSD’S LESSER-KNOWN calls. It should be safe to give root access to a jail to an untrusted user, making a jail like virtual system, SECURITY CAPABILITIES not unlike some applications of UML [1] or Xen [2] David is a system administrator at Trinity College, but without running separate kernels for each system. Dublin, a researcher in NUI Maynooth, and a com- A lot has already been written about jails [3], so I mitter on the FreeBSD project. He likes to express won’t dwell on them further here. himself on technical matters, and so has a Ph.D. in mathematics and is the co-author of IPv6 Network Another feature that I don’t plan to spend much time Administration (O’Reilly, 2005). on is ACLs. ACLs are an extension of the traditional [email protected] UNIX permissions system to allow you to specify per- missions for users and groups other than the file’s owner and group. Since ACLs are familiar to many people from Solaris, Linux, and Windows, to name just a few examples, I’ll just refer to [4] for the details on FreeBSD.

File Flags

A lesser-known set of extended permissions is the “file flags” supported by BSD’s UFS file system. Of in- terest to us here are the append-only, immutable, and undeletable flags. These names are reasonably self- explanatory: append-only files can only be appended to, immutable files cannot be changed in any way, and undeletable files cannot be deleted (or have their hard links removed—all names referring to the inode are protected). Each of these flags comes in two flavors: system and user. System flags can only be set and cleared by root. User flags can be set and cleared by the file’s owner and root. The chflags command can be used to set them and the -o flag to ls can be used to display them. With these commands the names used for the system version of the flags are sappnd, schg, and sunlnk, and the user versions are uappnd, uchg, and uunlnk. For example, it often surprises newcomers to UNIX permissions that a file can be deleted by any user who can write to the directory the file is in. Below, user lmalone has set the undeletable flag, and now user dwmalone cannot remove it.

;LOGIN: OCTOBER 2005 SECURITY THROUGH OBSCURITY 27 dwmalone@hostname% ls -ldo . normal undeleteable drwxr-xr-x 19 dwmalone wheel - 3584 May 14 09:03 . -rw-r—r— 1 lmalone wheel - 0 May 14 09:03 normal -rw-r—r— 1 lmalone wheel uunlnk 0 May 14 09:03 undeleteable dwmalone@hostname% rm normal undeleteable override rw-r—r— lmalone/wheel for normal? y override rw-r—r— lmalone/wheel uunlnk for undeleteable? y rm: undeleteable: Operation not permitted Even root cannot remove this file, until the flag has been cleared manually: root@hostname# rm undeleteable rm: undeleteable: Operation not permitted root@hostname# chflags nouunlnk undeleteable root@hostname# rm undeleteable There are some obvious applications for these flags. Append-only files can be used as log files, protecting against accidental or malicious truncation. FreeBSD installs a number of important files as system immutable (libc, init) to keep them from being damaged accidentally. The immutable flag can also be used to prevent people “stealing” a link to an SUID executable. Usually, in any directory for which a person has write permis- sions, he can make a hard link to any file in that filesystem for which he has read permissions. This means that if someone knows there is a vulnerability to be an- nounced in some SUID executable, he can steal a hard link to it and still have ac- cess to the executable after the original file appears to the sysadmin to be delet- ed. If an immutable flag is set on a file, these sorts of games aren’t possible [5]. Note that file flags can only be manipulated locally and cannot be set or cleared over NFS. This means that marking a file on an NFS server as immutable is a good way to keep anyone from changing it.

BSD Secure Level

As I described above, the file system flags provide some useful flexibility, and even some protection against shooting oneself in the foot as root. However, they provide little protection against a malicious root user, who could just clear all the flags before going about their nefarious business. The BSD operating systems do provide a simple form of protection against a ma- licious root user in the form of numbered “secure levels,” in which the higher the number, the greater the restrictions on what can be done. The secure level can be raised using the command, but cannot be lowered while the sys- tem is running. On FreeBSD the secure level is set to -1 by default, but the se- cure level for multi-user operation can be set in /etc/rc.conf by adding settings such as: kern_securelevel_enable=”YES” kern_securelevel=”2” The restrictions placed on system operation at each secure level are as follows: If secure level > 0 you can’t: I access hardware from user processes via /dev/mem, /dev/pci, I/O instructions, and so on; I load or unload kernel modules; I change system-level file system flags (unlink, immutable, append); I run a debugger on init; I or cause /dev/random to perform a reseeding operation by writing to it. If secure level > 1 you also can’t: I open disks in /dev for writing (including SCSI pass-through devices);

28 ;LO GIN: V OL. 30, NO. 5 I change firewall rules; I or run the clock faster than twice its normal speed or turn time backwards. If secure level > 2 you also can’t: I change certain secondary firewall features such as ipf’s NAT and ipfw’s dum- mynet configuration; I change certain sysctl values (msgbuf_clear, ipport_reserved{high, low}). At secure level 1, root can’t change immutable files. A malicious root might de- cide to unmount the file system and edit the raw disk, circumventing file per- missions, flags, and ACLS. At secure level 2, this isn’t possible, as disks cannot be opened for writing. Interestingly, each jail actually has its own secure level, so a jail can run at a higher secure level than the host system. This means that a careful combination of a high secure level and UFS file flags can prevent an intruder from installing rogue kernels or kernel modules, the sort of trickery described in Rik Farrow’s “Musings” column last April [6]. For this to work, all the files and directories involved in the boot process need to be immutable—this would include /boot, /sbin, /etc, /bin, /lib, /usr/bin, /usr/lib, etc. In practice, this isn’t often done, as it reduces the amount that can be achieved with online system administration. To update libraries the system must be re- booted and the library installed in single-user mode before the secure level is raised. I did say that the secure level could not be lowered while the system is running. There is a way around this that I have used. If you have chosen to include kernel debugger support in your kernel, then someone with access to the kernel debug- ger can reduce the secure level. For example, I can use CTRL+ALT+ESC to get to the debugger on the console of my server: root@hostname# sysctl kern.securelevel=2 kern.securelevel: -1 -> 2 root@hostname# KDB: enter: manual escape to debugger [thread pid 13 tid 100001 ] Stopped at kdb_enter+0x2f: nop db> write securelevel 0 securelevel 0x2 = 0 db> continue root@hostname# sysctl kern.securelevel kern.securelevel: 0 This technique allows an administrator to run the system at a high secure level when appropriate but to lower it when needed. It is important to remember that access to the kernel debugger requires physical access to the system (either to the console or via FireWire) and requires debugger support in the kernel.

MAC Framework

The MAC (Mandatory Access Control) framework is part of the excellent work done by the TrustedBSD project [7] to bring new security features to FreeBSD. The MAC framework allows people to develop kernel modules that provide ad- ditional checks on what the kernel permits processes to do. The MAC framework includes a number of sample modules implementing well- known security systems, such as the Biba integrity model and Multi-Level Secu- rity (MLS), which I’ll just mention here since to do them justice would require many pages. The TrustedBSD project also provides a port of the SELinux [8] policy system. All the MAC modules that are shipped with FreeBSD are docu- mented both in manual pages (man 4 mac) and in the FreeBSD handbook [9]. Along with these well-known modules are included a number of quirkier offer- ings. I’ll mention three of these here: seeotheruids, bsdextended, and portacl.

;LOGIN: OCTOBER 2005 SECURITY THROUGH OBSCURITY 29 Note, while some of these modules can be loaded at any time, they all require that the MAC framework be compiled into your kernel by adding options MAC to your kernel config file. The more complex modules must either be loaded at boot time or compiled into your kernel.

SEEOTHERUIDS MAC MODULE

The seeotheruids module prevents users from seeing processes owned by other users. Usually ps, netstat, /proc and top will display the processes and sockets on the system belonging to all users. When the seeotheruids module is enabled, normal users can only see their own processes. Root is not a normal user, so it can see everyone’s processes. It is also possible to allow a particular group of users to see the processes of others; for example, we can arrange for users in group wheel (with GID 0) to be able to see everyone’s processes: root@hostname# kldload mac_seeotheruids root@hostname# sysctl security.mac.seeotheruids.enabled=1 security.mac.seeotheruids.enabled: 0 -> 1 root@hostname# sysctl security.mac.seeotheruids.specificgid=0 security.mac.seeotheruids.specificgid: 0 -> 0 root@hostname# sysctl security.mac.seeotheruids.specificgid_enabled=1 security.mac.seeotheruids.specificgid_enabled: 0 -> 1

BSDEXTENDED MAC MODULE

The bsdextended MAC module allows you to define more complex relationships between users and what files they are permitted to access. Unlike file permis- sions, ACLs, and flags, these rules are not attached to particular files but are systemwide. This is perhaps best explained using another example. Suppose we have a sand- box user “pproxy” whose sole purpose is to run a POP proxy daemon. This user probably only needs read access to a few files on the system but, in fact, has ac- cess to all the files that are readable by “others.” pproxy@hostname% ./ls -l total 8068 -r-xr-xr-x 1 pproxy wheel 4096352 Apr 30 12:07 cat -r-xr-xr-x 1 pproxy wheel 4096352 Apr 30 12:07 ls -rw-r—r— 1 pproxy wheel 6 Apr 30 12:27 myfile -rw-r—r— 1 root wheel 4 Apr 30 12:27 otherfile pproxy@hostname% ./cat myfile hello pproxy@hostname% ./cat otherfile bye By loading the bsdextended module we can use the ugidfw command to define rules stating which files the pproxy user can access. The following commands set up rules that allow the pproxy user read and execute access to a file and its attributes if it belongs to user pproxy and no access to any other files: root@hostname# kldload mac_bsdextended root@hostname# ugidfw add subject uid pproxy object uid pproxy mode srx root@hostname# ugidfw add subject uid pproxy object not uid pproxy mode n The “subject” part of a ugidfw command describes the user or group that the rule applies to. The “object” part describes the files the rule applies to, by speci- fying their owner (either by UID or GID). The mode describes the permitted op- erations. r normal read access to the file w normal write access to the file x execute/search access s read access to file attributes, such as permissions, owner, etc.

30 ;LO GIN: V OL. 30, NO. 5 a administrative operations such as chmod, etc. n no access The first matching rule is used to decide what access is permitted by mac_bsd- extended. Remember that for an action to be permitted it must also be allowed by traditional permissions and any other MAC modules, ACLs, or file flags in force. After creating these rules with ugidfw, the pproxy user has greatly reduced ac- cess to the system. They can no longer read world-readable files (even /etc/pass- wd and /etc/group are inaccessible), and they cannot write to any files: pproxy@hostname% ./ls -l ls: otherfile: Permission denied total 8066 -r-xr-xr-x 1 3007 0 4096352 Apr 30 12:07 cat -r-xr-xr-x 1 3007 0 4096352 Apr 30 12:07 ls -rw-r—r— 1 3007 0 6 Apr 30 12:27 myfile pproxy@hostname% ./cat myfile hello pproxy@hostname% ./cat otherfile cat: otherfile: Permission denied pproxy@hostname% echo > myfile myfile: Permission denied Those who read the preceding two examples carefully will have noticed that I used copies of ls and cat belonging to the pproxy user. In fact, I also had to stat- ically link these commands, as the pproxy user cannot read the normal copies of cat, ls, or libc because of the ugidfw rules! This is much the same situation as setting up a chrooted environment, where the correct executables and libraries need to be available to the user.

PORTACL MAC MODULE

The portacl module provides more flexible control of who can use which net- work ports. The traditional UNIX-style rules controlling who can listen for data on a port are: I Root can listen anywhere. I Everyone else can listen on ports > 1023. Thus certain daemons, such as Web servers and news servers, need to at least begin life running as root in order to get access to the required ports (port 80 and port 119, respectively). The portacl module allows rules like “user www can bind to port 80,” which means that the Web server never needs to run as root. Let’s take that as an example: root@www# kldload mac_portacl root@www# sysctl security.mac.portacl.rules=uid:80:tcp:80,uid:80:tcp:443 security.mac.portacl.rules: -> uid:80:tcp:80,uid:80:tcp:443 Here we’re saying that UID 80 (i.e., user www) should be permitted to bind to tcp port 80 and tcp port 443. We do not need to make any other change. Since the constraints enforced by the MAC framework are in addition to the normal constraints enforced by the kernel, we need to tell the kernel to relax its usual restrictions and let portacl do the work. root@www# sysctl net.inet.ip.portrange.reservedlow=0 net.inet.ip.portrange.reservedlow: 0 -> 0 root@www# sysctl net.inet.ip.portrange.reservedhigh=0 net.inet.ip.portrange.reservedhigh: 1023 -> 0

;LOGIN: OCTOBER 2005 SECURITY THROUGH OBSCURITY 31 Portacl actually has an implicit rule that limits ports 1–1023 to root, unless oth- erwise permitted by your setting of security.mac.portacl.rules, so we don’t need any further rules to have the other ports behave as usual. We can then start Apache as user www—provided user www can write to the necessary log files. In order to do this you could move the log files to /var/log/www and have that directory owned by user www. The necessary changes to the default Apache conf file look like this: Listen 0.0.0.0:80 LockFile /var/log/www/accept.lock PidFile /var/log/www/httpd.pid ErrorLog /var/log/www/httpd-error.log CustomLog /var/log/www/httpd-access.log combined In some cases, there will be no downside to starting a daemon as a non-root user. However, there are some minor downsides to starting Apache as user www. As the logs are owned by user www, a vulnerability in a CGI or PHP script may give write/truncate access to log files. Of course, file flags could be used to miti- gate this. Unfortunately, there is no equivalent of the net.inet.ip.portrange.reserved* for IPv6, which means that the mac_portacl module is of less use in combination with IPv6. This is why the example uses the wildcard IPv4 address explicitly: to stop Apache using the IPv6 wildcard address. This omission should be fixed in a future release of FreeBSD.

GEOM and Disk Encryption

The last feature of FreeBSD that I’ll mention is GEOM. GEOM is a framework for dealing with disk-like objects in the FreeBSD kernel. For example, the physical disks on the system are registered with GEOM. GEOM has classes that understand PC partition tables and FreeBSD disk labels. These classes can examine the disk and make the partitions and subpartitions of the disk available in /dev. GEOM isn’t restricted to just recognizing subpartitions; it can also perform more complex transformations such as striping, network-based disks, and encrypted disks. A sample module for doing disk-based encryption, called BDE, is included with GEOM. Using BDE is actually quite straightforward. Naturally, you need a spare partition to house the encrypted disk—in this case, we use /dev/ad0s1g. First, we initialize the disk and choose a passphrase—as usual, choosing a good passphrase is essential: root@hostname# gbde init /dev/ad0s1g -L /etc/ad0s1g.lock Enter new passphrase: Reenter new passphrase: The lock file specified in the command will have some information about the encrypted disk’s “lock sector” written into it. This file should be treated with some care—it needs to be kept backed up, and knowing its contents will make an attacker’s life easier. Next, we can attach the disk, create the file system, and check that everything works OK: root@hostname# gbde attach ad0s1g -l /etc/ad0s1g.lock Enter passphrase: root@hostname# newfs /dev/ad0s1g.bde root@hostname# mount /dev/ad0s1g.bde /stuff root@hostname# df /stuff Filesystem 1K-blocks Used Avail Capacity Mounted on /dev/ad0s1g.bde 60667770 4 55814346 0% /stuff

32 ;LO GIN: V OL. 30, NO. 5 Now the file system can be used as usual, but BDE will encrypt each block be- fore it is written to the disk. The problem with encrypted disks is getting the passphrase to the system when it wants to use the disk. Naturally, you can’t store the passphrase on an unen- crypted disk; the encryption would be pointless! One option is to manually issue the gbde attach command whenever the administrator wants access to use the encrypted disk. Another option is to require the administrator to enter the passphrase at boot time, when filesystems are mounted. This can be done by creating an appropri- ate entry in /etc/fstab and /etc/rc.conf: root@hostname% fgrep /stuff /etc/fstab /dev/ad0s1g.bde /stuff ufs rw 2 2 root@hostname% fgrep gbde /etc/rc.conf gbde_devices=”AUTO” With this configuration, the administrator will be prompted for the passphrase for /dev/ad0s1g at boot time. The boot scripts also support encryption of the swap partition. In this case, the key can be chosen randomly, as the contents of a swap partition aren’t required after a reboot: root@hostname% fgrep swap /etc/fstab /dev/ad0s1b.bde none swap sw 0 0 root@hostname% fgrep gbde /etc/rc.conf gbde_swap_enable=”YES” More details about how to operate the GEOM BDE system, including how to de- tach and destroy encrypted disks, can be read in the gbde manual page. For a de- scription of BDE internals, see [10]; a discussion of the strengths and weakness- es of its designs can be found at [11]. The GEOM system should also make it easier for FreeBSD to support disk encryption schemes used by other systems, such as NetBSD’s CGD [12].

Summary

In this article we’ve covered some older features (file flags and secure levels) and some newer ones (the MAC and GEOM frameworks). These features basically provide a richer set of choices in the design of a secure system, providing op- tions that aren’t available with the plain UNIX security model. As usual, the tricky bit is the care required to use these features correctly. More features are in the pipeline. In particular, there should be support for event auditing and OpenBSM available in the FreeBSD 6 family of releases. It will also be interesting to see what interesting applications of the MAC and GEOM frameworks people can come up with. Companies and individuals are already developing third-party modules for use in their own environments or in FreeB- SD-based products.

R EFERENCES [1] User Mode Linux: http://user-mode-linux.sourceforge.net/. [2] Xen virtual machines: http://www.cl.cam.ac.uk/Research/SRG/netos/xen/. [3] The original jail paper is available at http://docs.freebsd.org/44doc/. Many tutorials are also available; see, e.g., http://www.freebsddiary.org/jail.php. [4] FreeBSD Handbook, “Security,” File System Access Control Lists section, http://www.freebsd.org/handbook/; see a tutorial article at http://ezine.daemonnews.org/200310/acl.html. [5] FreeBSD also provides the sysctls security.bsd.hardlink_check_uid and security.bsd.hardlink_check_gid, which prevent users from making hard links unless their UID/GID matches that of the file. However, these sysctls are only enforced against locally running processes, and so hard links can still be made over NFS.

;LOGIN: OCTOBER 2005 SECURITY THROUGH OBSCURITY 33 [6] “Musings,” ;login:, April 2005. [7] TrustedBSD project: http://www.trustedbsd.org/. [8] SELinux: http://www.nsa.gov/selinux/. SEBSD port: http://www.trustedbsd.org/sebsd.html. [9] FreeBSD Handbook, “Mandatory Access Control,” http://www.freebsd.org/handbook/. [10] gbde—GEOM-Based Disk Encryption: http://phk.freebsd.dk/pubs/ bsdcon-03.gbde.paper.pdf. [11] GBDE discussion on the Cryptography Mailing List threads: http://www.mail-archive.com/[email protected]/msg03636.html; http://www.mail-archive.com/[email protected]/msg03671.html. [12] The CryptoGraphic Disk Driver: http://www.imrryr.org/~elric/cgd/.

34 ;LO GIN: V OL. 30, NO. 5 CONSIDER THE FOLLOWING SCENARIO: Alyssa Hacker subverts tens of thousands

SRIKANTH KANDULA of machines by using a worm and then uses these zombies to mount a distributed denial of service attack on a Web server. Alyssa’s zombies do not launch a SYN flood surviving or issue dummy packets that will only con- gest the Web server’s access link. Instead, DDoS attacks the zombies fetch files or query search Srikanth Kandula is a graduate student at the MIT engine databases at the Web server. From Computer Science and Artificial Intelligence Laboratory in the Networks and Mobile Systems the Web server’s perspective, these zombie group. His research interests are in networked sys- requests look exactly like legitimate tems and security. requests, so the server ends up spending a [email protected] lot of its time serving the zombies, causing legitimate users to be denied service. This article is related to research published as “Botz-4-Sale: Surviving Organized DDoS Such an attack, which we call CyberSlam, is discon- Attacks That Mimic Flash Crowds,” Pro- certingly real. In a recent FBI case, a Massachusetts ceedings of the 2nd Symposium on Networked businessman hired professionals to DDoS his com- Systems Design and Implementation (NSDI petitor’s Web site [1]. Like any other online business, ’05) (Berkeley: USENIX Association, 2005). the competitor had a search engine back end. So the professionals used a large botnet to flood the compe- titor’s site with a massive number of queries, bringing it down for almost a week. Several extortion attempts at online gaming and gambling sites used similar at- tacks [2]. Why CyberSlam? If you think about it, there are some real reasons why CyberSlam attacks happen. First, we know that many large botnets exist. Zombie machines are typically compromised by worms, viruses, or mal- ware, and the zombies are controlled by remote bot- net controllers over IRC channels [3, 4, 5]. Second, there is a great incentive to mimic the browsing pat- terns of legitimate users. It avoids detection by stan- dard filters and intrusion detection boxes that rou- tinely identify and block anomalous traffic. This is especially important for organized DDoS mafia, be- cause for them the botnet is a reusable resource that they would like to protect. Finally, in CyberSlam an attacker is doing little while the server does a lot. By sending a single HTTP packet containing a small re- quest, the attacker can make the server reserve sock- ets, TCP buffers, and an application process, and do significant database processing or congest some other server bottleneck. So how can a system administrator deal with Cyber- Slam attacks? Let us look at some existing tech- niques. First, how about using passwords for authen- tication? Passwords don’t exist for most Web sites (e.g., Google), because they are cumbersome to man- age both for the site and for the customers. More im-

;LOGIN: OCTOBER 2005 SURVIVING DDOS ATTACKS 35 FIGURE 1: GRAPHICAL PUZZLE portant, to check a password the server has to establish a TCP connection with the client, reserve a socket and a server worker process, and search the password database, so an attacker can simply DDoS the password-checking mechanism. Second, computational puzzles make the client do some heavy computation be- fore giving them service. But computation is typically abundant in a botnet, and solving a puzzle for every request slows down all the normal users. So, we need a new approach to counter CyberSlam attacks. Here is a potential solution. Intuitively, online businesses care more about serv- ing human users; so if the server can quickly identify the human users, it can serve their requests selectively and drop all the others. There are easy ways of distinguishing humans from zombies (e.g., the graphical puzzles used by Yahoo and Hotmail). Humans can solve these puzzles easily; zombies cannot do so at all. So when the server is overloaded, it sends a graphical puzzle, as shown in Fig. 1, to everybody and serves only those who answer correctly.

Challenges

Are we done, then? Unfortunately, the answer is no. There are three main chal- lenges with using graphical puzzles. First, in a typical setup, sending the graph- ical puzzle allows an unauthenticated client to establish a TCP connection, hog sockets, TCP buffers, and application processes, and force context-switches from the kernel network stack up into application space, so attackers can easily DDoS this authentication mechanism. Second, graphical puzzles have a bias against users who cannot (disabled users) or will not (due to inconvenience) solve the puzzle. By forcing a server to use graphical puzzles, the attacker has al- ready won—she denies access to all such users. Third, a server has to divide its resources between authenticating new users and serving the users it has already authenticated. This is a tricky problem. If the server authenticates every new user, it may run out of resources to serve users who are already authenticated, leading to unnecessary starvation. If, on the other hand, the server authenticates too few new arrivals, then it may not have enough users to serve and may go idle.

Introducing Kill-Bots

We present Kill-Bots, a simple, cheap, and effective software modification to the server’s operating system that distinguishes friend from foe. The core principle is simple: do not allow clients to reserve any server resource until they are authen- ticated. Kill-Bots kicks in whenever a Web site is in danger of being over- whelmed by requests. The software asks requestors to solve a simple graphical puzzle before granting access to server resources such as buffer space. Once a client solves the graphical puzzle, she is given an HTTP cookie so that she can obtain service for some time without having to solve another puzzle. Addresses that repeatedly request access to the server without solving the puzzle are black-

36 ;LO GIN: V OL. 30, NO. 5 listed automatically. When the load on the Web server decreases, it stops issuing puzzles and accepts requests from non-blacklisted addresses, so even real users who did not solve the puzzle gain access. Finally, Kill-Bots efficiently divides server resources by adapting the probability with which new users are authenti- cated. A Kill-Bots server neither accepts more users than it can serve nor goes idle by not authenticating enough new users. Fig. 2 shows how these individual pieces fit together in Kill-Bots, and Fig. 3 shows the HTML for the puzzle.

FIGURE 2: HANDLING ZOMBIES WITH KILL-BOTS

FIGURE 3: HTML SOURCE FOR THE PUZZLE

FIGUR E 4: KILL-BOTS MODIFIES SERVER’S TCP STACK TO SEND TESTS TO NEW CLIENTS WITHOUT ALLOCATING A SOCKET OR O T H ER CONNECTION RESOURCES Let us briefly look at each of the main components of Kill-Bots. The modified network stack is shown in Fig. 4. When, the client sends a TCP SYN, Kill-Bots responds with a SYN cookie. The SYN cookie is a standard defense mechanism that serves two purposes. First, it filters requests from clients with spoofed IP addresses. Second, it allows the server to continue the TCP handshake without maintaining any state for the half-open connection. Upon receiving the SYN cookie, the client responds with an acknowledgment completing the TCP hand- shake. At this point, the standard network stack allocates a socket, reserves TCP buffers, and passes the request onto an application-space process. All of this is done for every attacker request and is quite costly. More important, these re- sources remain allocated until the client responds with a FIN. An attacker can simply hog resources by not sending the FIN. To avoid this, Kill-Bots ignores the acknowledgment. Instead, it looks at the first data packet that contains the same acknowledgment number and peeps into the HTTP header to confirm that this is a previously authenticated client with a valid HTTP cookie. If not, the modi- fied kernel immediately sends the graphical puzzle and a TCP FIN without cre- ating a socket or reserving any resources, as shown in Fig. 4. Recall the second challenge: Graphical puzzles have a bias against users who cannot or will not answer the puzzle. So whenever an attacker forces the server to use graphical puzzles, these users are denied access. Kill-Bots uses the follow-

;LOGIN: OCTOBER 2005 SURVIVING DDOS ATTACKS 37 ing observation of client behavior: a human user who does not answer the graphical puzzle will only retry a couple of times to see if he can access the serv- er without answering; attackers, on the other hand, will have to continuously bombard the server with requests and fetch the graphical puzzles or the attack goes away. There is a simple difference between the human users and attackers now; the attackers have many more unanswered puzzles than the normal users. Kill-Bots uses a bloom filter to track the number of unanswered puzzles per IP and drops all requests from an IP if its associated bloom counters cross a limit, say, 32. Recall the third challenge of dividing resources between serving authenticated users and authenticating new users. Kill-Bots deals with this by probabilistically authenticating new users and dropping others. The authentication probability adapts to server conditions. Intuitively, whenever the server is lightly loaded, the authentication probability increases so that Kill-Bots authenticates more new users, and when the server is heavily loaded it decreases. I will defer the specific details of the adaptation, but the tricky parts are how much can one increase the authentication probability without overloading the server and how long it takes to adapt to changing conditions. Note that small increments would reduce the chance of overshooting, while large increments would react to changing condi- tions and get you to the optimal operating condition quicker. We reconcile these contradictory preferences by using techniques from control theory. In experiments, a Kill-Bots-protected Web server successfully endured five times as many hits as an unprotected Web server could tolerate. Not only did the Web server stay online, but protected Web sites also maintained speedy response times, even during the height of the attacks. You might wonder what would hap- pen during a flash crowd, i.e., when the server overload is caused by a large number of legitimate requests. Kill-Bots improves both server hit-rate and re- sponse times by using the adaptive authentication probability mechanism to quickly drop new users who cannot be served. This ensures that server re- sources are not wasted on requests that are going to be dropped at a later time.

FIGURE 5: A MODULAR REPRESENTATION OF THE KILL-BOTS CODE

Using Kill-Bots

From a practical standpoint, here is how you could use Kill-Bots to protect your own Web server. Fig. 5 is a modular representation of the Web server using Kill-Bots. Kill-Bots doesn’t require an extra server; it is a software patch to the operating system of the server kernel. Kill-Bots needs a store of graphical puz- zles. Generating the graphical puzzles is relatively easy, for example using the JCAPTCHA software [6], and can done on the server itself during periods of rel- ative inactivity or on a different dedicated machine. Also, puzzles may be pur- chased from a trusted third party. Kill-Bots has modest overhead; it uses 10MB of RAM to cache graphical puzzles, maintain the bloom filter that blacklists zombie

38 ;LO GIN: V OL. 30, NO. 5 IPs, and maintain state for each cookie. A kernel thread periodically loads fresh graphical puzzles into memory. The per-request overhead is also quite small. On a 2.4GHz PIV workstation, peeping into HTTP requests costs 8 microseconds of server time, and serving a puzzle costs 31 microseconds.

Why Does Kill-Bots Matter?

Worries over distributed denial-of-service attacks are spreading. It is depressing, yet true, that the future will see many more organized DDoS attacks. Most Web server defenses use authentication procedures that are easily outwitted and re- quire huge excesses in the form of replicated content, multiple CPUs, fancy hardware, and extra bandwidth. Kill-Bots is much cheaper and can be deployed easily; it requires no changes in users’ Web browsers and works with the very large number of Web servers running Linux. Although Kill-Bots occasionally misclassifies legitimate users as zombies, it allows Web sites under attack to re- main available and so keeps the Web open for business, while barring the way to thieves and vandals.

REFERENCES [1] K. Poulsen, “FBI Busts Alleged DDoS Mafia,” 2004: http://www.securityfocus.com/news/9411. [2] J. Leyden, “East European Gangs in Online Protection Racket,” 2003: http://www.theregister.co.uk/2003/11/12/east_european_gangs_in_online/. [3] J. Leyden, “The Illicit trade in Compromised PCs,” 2004: http://www.theregister.co.uk/2004/04/30/spam_biz/. [4] E. Hellweg, “When Bot Nets Attack,” MIT Technology Review, September 2004. [5] L. Taylor, “Botnets and Botherds”: http://sfbay-infragard.org. [6] JCAPTCHA: http://jcaptcha.sourceforge.net/.

;LOGIN: OCTOBER 2005 SURVIVING DDOS ATTACKS 39 SENSOR NETWORKS HAVE THE POTEN- tial to revolutionize any number of fields. GEOFFREY MAINLAND AND MATT WELSH In the future, sensor networks may allow bridges to immediately report on structural damage following an earthquake, environ- distributed, adaptive mental monitoring systems to track pollu- resource allocation tants in real time, and emergency response workers to direct limited resources to those for sensor networks who need medical attention most urgently. Geoffrey Mainland is currently a Ph.D. student at They will also vastly increase the quantity Harvard University and received his A.B. in Physics from Harvard College. His research interests include and quality of research data scientists can programming languages, distributed systems, net- collect in the field. Our group at Harvard is works, and intelligent control. working on several projects that apply sen- [email protected] sor network technology to problems not tra- ditionally associated with computer science, Matt Welsh is an assistant professor of Computer science at Harvard University. His research interests including CodeBlue, a software and hard- encompass sensor networks, operating systems, and distributed systems. ware platform for emergency response, and

[email protected] the Volcán Tungurahua project, where we are using sensor networks to monitor erup- tions at an active volcano in central Ecuador.

A typical sensor network device is the UC Berkeley Mica2 node, which consists of a 7.3MHz ATmega128L processor, 128KB of code memory, 4KB of data memo- ry, and a Chipcon CC1000 radio capable of 38.4Kbps and an outdoor transmission range of approximately 300 meters. The node measures 5.7cm by 3.1cm by 1.8cm and is typically powered by two AA batteries, with an expected lifetime of days to months, depend- ing on application duty cycle. With such limited com- putational and communication resources, how can ef- fective applications be developed using this class of device? One might be tempted to assume that in a few years sensor network devices will have more powerful CPUs and better radios. Undoubtedly more powerful devices will appear, but we believe that the majority of sensor network nodes will be similar to the Mica2 in terms of processing power and bandwidth—they’ll just be much smaller and cheaper. Far from becoming obsolete, techniques for programming these small de- vices will become increasingly important. Consider two commonly cited applications for sensor networks: environmental monitoring [1, 2] and dis- tributed vehicle tracking [3, 4]. Both applications re- quire nodes to collect local sensor data and relay it to a central base station, typically using a multi-hop rout- ing scheme. To reduce bandwidth requirements, nodes may need to aggregate their local sensor data with that of other nodes. Even these simple applications present unique challenges to system implementers. Nodes

40 ;LO GIN: V OL. 30, NO. 5 must individually determine a schedule for sampling, aggregating, and sending data, subject to energy budget constraints. This schedule affects energy usage and, therefore, the overall lifetime of the network, as well as the quality of the data generated by the network. A node’s ideal schedule is based on its physical location, position in the routing topology, and changes in the environment. As a result, there is almost never an ideal a priori common schedule for all nodes. Any action-scheduling algorithm must cope with network dynamics, so even full knowledge of a node’s network location and capabilities doesn’t allow one to choose a schedule ahead of time that will work well for all settings. Many current applications use a single fixed, common schedule anyway, or try to build in some ad hoc adaptive behavior. For example, an application might selectively activate nodes that are expected to be near some phenomenon of in- terest. The only current tool that programmers have to address the scheduling issue is manual tuning, which is difficult and error-prone. We propose an adaptive resource allocation scheme for sensor networks, called “Self-Organizing Resource Allocation” (SORA). Rather than defining a fixed node schedule, SORA causes nodes to individually tune their rate of operation using techniques from reinforcement learning [5]. Nodes receive rewards for taking “useful” actions that contribute to the overall network goal, such as lis- tening for incoming radio messages or taking sensor readings. Each node learns which actions are profitable based on this reward feedback. Network retasking is accomplished by adjusting rewards, rather than pushing new code to sensor nodes, and network lifetime is controlled by constraining nodes to take actions that meet a local energy budget.

Application Example: Vehicle Tracking

As a concrete example of using SORA to manage resource allocation in a realistic sensor network application, we consider tracking a moving vehicle through a field of sensors. Vehicle tracking raises a number of interesting problems in terms of detection accuracy and latency, in-network aggregation, energy man- agement, routing, node specialization, and adaptivity [4, 6, 7]. Vehicle tracking can be seen as a special case of the more general data collection problem also found in applications such as environmental and structural monitoring [2, 8]. In the tracking application, each sensor is equipped with a magnetometer capa- ble of detecting local changes in a magnetic field, which indicates the proximity of the vehicle to the sensor node. One node acts as a fixed base station, which collects readings from the other sensor nodes and computes the approximate lo- cation of the vehicle based on the data it receives. The systemwide goal is to track the location of the moving vehicle as accurately as possible while simulta- neously maximizing the efficiency of the network’s energy use. Each sensor node can take the following set of actions: sample a local sensor reading, send data toward the base station, listen for incoming radio messages, sleep for some interval, and aggregate multiple sensor readings into a single value. Each node maintains a fixed-length FIFO buffer of sensor readings, which may be sampled locally or received as a radio message from another node. Each entry in the buffer consists of a tuple containing a vehicle location estimate weighted by a magnetometer reading. The sample action appends a local reading to the buffer, and the listen action may add an entry if the node receives a mes- sage from another node during the listen interval.

;LOGIN: OCTOBER 2005 DISTRI BUTED, ADAPTIVE RESOURCE ALLOCATION FOR SENSOR NETWORKS 41 Each action a has a utility u(a) given by:  β r if the action is available u(a)= a a 0 otherwise where ra is the current reward for action a, and β isa the estimated probability of payment for that action, which is learned by nodes as described below. In our work, reward vectors are broadcast by the base station, but they may also be static parameters of the sensor network program. An action may be unavailable if either the current energy budget is too low to take the action, or other depen- dencies have not been met (such as lack of sensor readings to aggregate). This utility function is just the expected reward for taking a given action.

The estimated probability of receiving a reward for an action a, ,β isa updated every time that action is taken using an exponentially weighted moving average (EWMA). The equation for this update is:

  α +(1− α)βa if a receives a reward βa = (1 − α)βa otherwise where representsα the sensitivity of the EWMA filter. If an action does not pro- duce a reward, the node’s estimated probability of receiving a reward decreases, but if the action does produce a reward, the estimated probability increases. Nodes learn the probability of receiving a reward instead of directly learning the expected reward for an action because this allows them to more easily adapt when a new reward vector is injected into the network. A node chooses an action to perform by examining the utilities of all available actions and picking the action with the largest utility, which is just the expected reward. The expected reward for an action will vary over time due to possible re- ward adjustments and changing environmental conditions. Therefore, it is im- portant that nodes periodically “take risks” by choosing actions that have a low reward probability β. Toa allow nodes to occasionally explore the action space, we employ an -greedy action selection policy. With a small probability, , when faced with a decision, a node will choose an available action uniformly at ran- dom. With probability 1- the node selects the “greedy” action, that is, the ac- tion that maximizes the utility u(a). This exploration prevents a node from ever again selecting an action that has been unprofitable in the past.

Comparison with Existing Approaches

To compare the use of SORA with more traditional approaches to sensor net- work scheduling, we implemented three additional versions of the tracking sys- tem. The first employs static scheduling, in which every node uses a fixed sched- ule for sampling, aggregating, and transmitting data to the base station. Nodes perform a round of actions: sample, listen, aggregate, and transmit. The node then sleeps for a period of time. Given a daily energy budget, a node determines how long it must sleep between these rounds to meet this energy budget. The same schedule is used for every node in the network, so nodes do not learn which actions they should perform, nor do they adapt their sampling rate to en- vironmental changes such as the approach of the vehicle. This is the typical ap- proach used by current sensor network applications. The second approach employs dynamic scheduling, in which nodes continuously adjust their sleep period based on their current remaining energy. This allows nodes that do not consume energy aggregating or transmitting data to use this conserved energy to increase their sampling rate. The third and final approach, the Hoods tracker, is based on the tracking system implemented using the Hoods communication model [7]. It is largely similar to the dynamically scheduled tracker except in the way that nodes calculate the target location. Each node that detects the vehicle broadcasts its sensor reading

42 ;LO GIN: V OL. 30, NO. 5 to its neighbors. The node then listens for some period of time and, if its own reading is the maximum of those it has heard, computes the centroid of the readings (based on the known locations of neighboring nodes) as the estimated target location. This location estimate is then routed toward the base station. We implemented the Hoods tracker to emulate the behavior of a previously published tracking system for direct comparison with the SORA approach. For purposes of comparison, we are interested in two metrics: tracking accuracy and energy efficiency. We do not expect SORA to be more accurate than the other scheduling approaches, but if it is to be a realistic solution it should per- form similarly. Our goal is to give good performance while maximizing energy efficiency, so we are willing to sacrifice some accuracy for more efficient energy usage.

FIGURE 1: TRACKING ACCURACY AND ENERGY EFFICIENCY Figure 1 summarizes the accuracy and efficiency of each scheduling technique as the energy budget is varied. Each system varies in terms of its overall tracking accuracy as well as in the amount of energy used. While SORA has a somewhat higher rate of tracking error compared to the other scheduling techniques, it demonstrates the highest efficiency, exceeding 66% for a daily energy budget of 2100J. The static and dynamic schedulers achieve an efficiency of only 22%. In SORA, most nodes use far less energy than the budget allows. The ability of SORA to “learn” the duty cycle on a per-node basis is a significant advantage for increasing network lifetimes. The Hood tracker performs poorly due to its dif- ferent algorithm for collecting and aggregating sensor data, so it is not included in Figure 1.

;LOGIN: OCTOBER 2005 DISTRI BUTED, ADAPTIVE RESOURCE ALLOCATION FOR SENSOR NETWORKS 43 Conclusions

Current approaches to resource management are often extremely low-level, re- quiring that the operation of individual sensor nodes be specified manually. With SORA, nodes self-schedule their local actions in response to feedback. This allows nodes to automatically adapt to changing conditions and specialize their behavior in response to physical location, routing topology, and environmental changes. While it does require a reformulation of application logic, using SORA is not particularly difficult, and it allows sensor networks to utilize resources much more efficiently than standard, ad hoc manual techniques. The increased complexity of future sensor network applications may make them a bad fit for this simple technique, but SORA is not meant to be universal. We view it not as a final solution, but as an important step that demonstrates the potential power of adaptive techniques in real sensor network systems.

REFERENCES [1] Alberto Cerpa, Jeremy Elson, Deborah Estrin, Lewis Girod, Michael Hamilton, and Jerry Zhao, “Habitat Monitoring: Application Driver for Wireless Communications Tech- nology,” Proceedings of the Workshop on Data Communications in Latin America and the Caribbean, 2001. [2] Alan Mainwaring, Joseph Polastre, Robert Szewczyk, David Culler, and John Ander- son, “Wireless Sensor Networks for Habitat Monitoring,” ACM International Workshop on Wireless Sensor Networks and Applications (WSNA’02), Atlanta, GA, USA, Sept. 28, 2002. [3] Dan Li, Kerry Wong, Yu Hen Hu, and Akbar Sayeed,”Detection, Classification and Tracking of Targets in Distributed Sensor Networks,” Journal of the IEEE Signal Processing Magazine, Vol. 19, No. 2, March 2002. [4] Yingqi Xu and Wang-Chien Lee, “On Localized Prediction for Power Efficient Object Tracking in Sensor Networks,” Proceedings of the 1st International Workshop on Mobile Dis- tributed Computing, May 2003. [5] Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998. [6] R. Brooks, P. Ramanathan, and A. Sayeed, “Distributed Target Classification and Tracking in Sensor Networks,” Proceedings of the IEEE, November 2003. [7] Kamin Whitehouse, Cory Sharp, Eric Brewer, and David Culler, “Hood: A Neighbor- hood Abstraction for Sensor Networks,” Proceedings of the International Conference on Mo- bile Systems, Applications, and Services (MOBISYS ’04), Boston, MA, June 2004. [8] Venkata A. Kottapalli, Anne S. Kiremidjian, Jerome P. Lynch, Ed Carryer, Thomas W. Kenny, Kincho H. Law, and Ying Lei, “Two-Tiered Wireless Sensor Network Architecture for Structural Health Monitoring,” Proceedings of the SPIE 10th Annual International Sym- posium on Smart Structures and Materials, San Diego, CA, March 2000.

44 ;LO GIN: V OL. 30, NO. 5 ters. Plus, the pictures are very Optimizing Linux Performance does convincing. a nice job of negotiating these dif- If you are looking for practical in- ficult waters. It’s not going to turn formation of immediate use about anybody into a master perfor- passive and indirect attacks, this is mance tuner (no book is), but it not the book for you; then again, it should give somebody with nor- makes no claim to be. If you are a mal system administration skills “hacker” type in the old sense of sufficient background in the tools the word, fond of taking things and techniques to start solving apart to see how they work, and problems. Its descriptions of how book reviews you have any interest in security, things work are brief, but deep you will probably find significant enough that you can understand ELIZABETH ZWICKY portions of this book intriguing. what the tools are doing and why you care, and it covers a wide [email protected] Try not to be turned off by the ini- tial chapters, which unfortunately range of tools, with information about how to use their output. ADAM TUROFF are the weakest. Most valuably, it talks about, and [email protected] OPTIMIZING LINUX PERFORMANCE: demonstrates, the process of per- A HANDS-ON GUIDE TO LINUX formance tuning—how you actu- SILENCE ON THE WIRE: A FIELD PERFORMANCE TOOLS ally put the tools together to get a GUIDE TO PASSIVE RECONNAISSANCE Philip G. Ezolt desired result. AND INDIRECT ATTACKS Prentice Hall, 2005, 0-13-148682-9, Performance Tuning for Linux Serv- Michal Zalewski 353 pp. ers does not succeed so well. It at- No Starch Press, 2005, 1-59327-046-1, PERFORMANCE TUNING FOR LINUX tempts a vast and ambitious task: 281 pp. SERVERS to cover everything you need to Reviewed by Elizabeth Zwicky know about Linux internals, down Sandra K. Johnson, Gerrit to very precise levels of detail, plus Zalewski’s book is an interesting Huizenga, and Badari Pulavarty, performance characteristics of all tour through some security-related eds. the common kinds of server work- stuff. It is not, as the subtitle im- IBM Press, Pearson plc, 2005, 0-13- loads, and case studies, with a tu- plies, an overview of passive at- 144753-X, 547 pp. torial on programming for perfor- tacks. It’s more of a country ramble Reviewed by Elizabeth Zwicky mance thrown in. I’m prejudiced than the field guide you might use to start with against books with on such a walk. It wanders This batch of books brought me large numbers of authors, and this through security, and computing two on Linux performance opti- one, with 23 authors and three ed- as a whole, with some novel infor- mization, an under-served topic, itors, has only strengthened my mation, some basic information and a difficult one to write about. prejudices. The chapters are of un- you may not have come across be- Performance tuning is difficult, even quality and often overlap or fore, and some information you al- and doing it well involves three even contradict each other. Many ready knew and didn’t much care things: of the chapters appear to have about. I, for instance, found the 1. Excellent detective skills. originally been written for other description of how modems actu- 2. An intimate understanding of purposes and fail to integrate well ally work irrelevant but amusing, precisely how the system you are with the rest of the book. but yawned my way through the tuning works. state tables that accompany the re- This book contains an enormous capitulation of the entire history 3. Familiarity with applicable amount of information, but it’s not of computing, starting with logic tools. in a form that is particularly us- itself and ending with modern 1 is extraordinarily difficult to able. I like the chapter on file processor design. Then again, I teach at all, let alone in a book. 2 is servers (it’s entitled “File and Print was fascinated by the section on an immense amount to cover, dif- Servers,” but it doesn’t really say random number generators; it’s fers importantly between sub-re- much about print servers), and the a frequently discussed topic in leases of a single product, and final “case study” chapter is a gen- security, but it’s still rare to see a leads into territory best covered by uine case study that ties things to- lucid discussion of how they work “Internals of” books. That leaves gether and shows a realistic trou- and don’t work, and why it mat- 3, which isn’t all that interesting bleshooting process. The detailed without 1 and 2. sections on internals may also be

;LOGIN: OCTOBER 2005 BOOK REVIEWS 45 of interest if you are reasonably fa- ting him to use anything else is going to make her happy about fir- miliar with kernel internals on now a lost cause). But my father ing up a terminal. Your equally hy- other systems and want a technical is pretty technological to start pothetical technology-oriented overview of Linux internals. Al- with. The hypothetical mother in niece, however, will be writing though the information in them is sentences like “Even my mother shell scripts in no time. I found of use in performance tuning, they can use this computer” is not one unfortunate and puzzling do not in general tell you why or going to use this book. It would be error (the statement in Chapter 2 how. great for your reasonably technol- that the shell waits for background If you are experienced in perfor- ogy-savvy relatives, the people processes to finish before return- mance tuning and want to know who are actually running the latest ing its prompt—it’s not even clear about Linux specifics, parts of this version of Windows, and not be- to me what this was intended to book may be of use to you. If you cause it was installed on the com- say, but it’s definitely wrong), but are not, the book is unlikely to be puter when they bought it, either, single errors like that can occur in useful, and some chapters contain or the ones who successfully put even the best of books. information that is misleading or together their own machine. It’s Max OS X Tiger for UNIX Geeks is downright incorrect: “A module is not going to work for the people a nice introduction to software de- a kernel feature that provides the who think you’re a crazy UNIX velopment on OS X Tiger. It starts benefits of a microkernel without a bigot who hates Microsoft for with a general introduction to the penalty,” for instance. Skip it, and some stupid political reason of features of MacOS X that are dif- get Optimizing Linux Performance. your own. Peter van derLinden ferent from other operating sys- tries to be balanced and fair, but tems, and then dives into what you PETER VAN DERLINDEN’S GUIDE TO he’s got a very strong UNIX and need to know to build software. I LINUX open source accent, which I find found the general introduction en- Peter van derLinden charming but which will set off lightening and useful, although Pearson Education, 2005, 0-13- alarm bells in people who trust from a system administration 187284-2, 624 pp. Microsoft and distrust the open point of view it has some short- source community. I’m not sure it’s comings—I really wanted to know LEARNING UNIX FOR MAC OS X possible to sell Linux to those peo- TIGER how inetd.conf plays with the ple, but I’m sure this book won’t other ways of starting programs on Dave Taylor do it, although it does try. demand, for instance. These are O’Reilly, 2005, 0-596-00915-1, 260 pp. The book covers Linux installa- minor issues, however, and the

MAC OS X TIGER FOR UNIX GEEKS tion, using Linspire, and most day- book is intended for developers, to-day operations. It’s strongly ori- who can just use an appropriate Brian Jepson ented to using free (both open Tiger method to start their pro- O’Reilly, 2005, 0-596-00912-7, 395 pp. source and non-commercial) tools, grams. Even for a non-developer, Reviewed by Elizabeth Zwicky so it covers Lphoto and OpenOf- it’s a great reference to porting and fice, but doesn’t talk about emula- packaging software for Tiger. I rec- Here’s a batch of books trying to tion or about commercial software ommend it. introduce particular UNIX vari- for Linux. It does have thorough ants to various audiences. Peter PERL BEST PRACTICES coverage of dual boot options. van derLinden’s Guide to Linux is Damian Conway intended for people converting Learning UNIX for Mac OS X Tiger O’Reilly, 2005, 0-596-00173-8, 544 pp. from Windows; Learning UNIX for is what you offer to your geeky rel- Mac OS X Tiger is for experienced atives with Macintosh systems to Reviewed by Adam Turoff Macintosh users learning to use get them to use the command line. Conway’s collection of 256 tips for the UNIX command line; and the It is a solid introduction to UNIX writing better Perl programs is a audience for Mac OS X Tiger for basics with a Macintosh accent, in- breath of fresh air, highlighting UNIX Geeks is experienced UNIX cluding coverage of Macintosh- practices that work well and id- developers converting to the Mac- specific features and difficulties ioms to be avoided at all costs. All intosh. and comparisons with the GUI of his advice is prefaced with an tools. It moves at a pretty fast clip, Peter van derLinden’s Guide to Linux implicit zeroth-law of program- as you need to if you’re going to might possibly have allowed my ming: Always code as if the person get the basics of any UNIX into a father to switch to Linux (after who ends up maintaining your code mere 245 content pages; so once years of loathing Windows and will be a violent psychopath who again, if you have the hypothetical using it anyway, my father has knows where you live. mother people talk about, it’s not converted to a Macintosh, and get-

46 ;LO GIN: V OL. 30, NO. 5 Some advice, such as the chapters ADVANCED PERL PROGRAMMING, on advanced uses of Perl, not ad- on formatting, naming, and con- 2ND ED. vanced (and oblique) features of trol structures, may seem nitpicky Simon Cozens Perl. The book opens with a dis- and arbitrary. Taken together, they cussion of some deep, dark magic- O’Reilly, 2005, 0-596-00456-7, 304 pp. constitute a set of conventions that playing with the symbol table and allow you to do more with less ef- Reviewed by Adam Turoff incorporation of useful features fort. Later chapters focus on more The first edition of Advanced Perl first found in other programming advanced topics where the prac- Programming covered an esoteric languages. From there, the book tices described represent the dif- grab bag of topics, some useful, switches to a Cook’s Tour of ference between keeping your san- some arcane, and some impracti- CPAN, where Cozens compares ity and spending extreme amounts cal. These represented the van- and contrasts alternative CPAN of time mired in needless debug- guard of what “advanced Perl” modules for doing similar tasks. ging. meant in 1997. Since then, topics The discussions are useful and in- I recommend Perl Best Practices to such as object-oriented Perl, refer- formative, but never very deep. anyone who deals with Perl even ences, closures, modules, and eval While this book may continue to on a casual basis. This is the first are now widely understood and no be mistitled, it is certainly a wel- book that simply and concisely longer “advanced” topics. come addition to my Perl book- captures the lore of what makes a If you wrote off this title based on shelf. I recommend this book to good Perl program good, and how experience with the first edition, programmers who are starting to to avoid features that slowly lead look again. The second edition of delve into advanced topics such as to bit-rot. This book is already on Advanced Perl Programming is a parsing, templating, testing, event- my list of books to reread annually. complete rewrite, and attempts to centric programming, Unicode, or address the shortcomings of the database modeling. first edition. Topics considered ad- vanced eight years ago are now taken as given. The focus is now

SAVE THE DATE! WORLDS ’05 Second Workshop on Real, Large Distributed Systems December 13, 2005, San Francisco, CA http://www.usenix.org/events/worlds05

Join us in San Francisco, CA, December 13, 2005, for one day of discussion of useful ideas, experience, and research directions in the area of real, large distributed systems. The Second Workshop on Real, Large Distributed Systems will bring together people who are exploring the new challenges of building widely distributed networked systems and who lean toward the “rough consensus and running code” school of systems building. WORLDS is a place to share new ideas, experiences, and work in progress, with an emphasis on systems that actu- ally run in the wide area and the specific challenges they present for designers and researchers.

;LOGIN: OCTOBER 2005 BOOK REVIEWS 47 USENIX BOARD OF DIRECTORS 25 YEARS AGO Communicate directly with the PETER H. SALUS USENIX Board of Directors by writing to [email protected]. [email protected] It was October 1980. PRESIDENT ;login: carried an article headed: USENIX Michael B. Jones, “Microsoft Xenix Operating Sys- [email protected] tem.” With all the to-do over who did what to whom when in UNIX notes (see http://www.groklaw.net), I VICE PRESIDENT thought I’d read it and quote USENIX MEMBER BENEFITS Clem Cole, from it. [email protected] “The XENIX* Operating System Members of the USENIX Associa- from Microsoft is a microprocessor tion receive the following benefits: SECRETARY adaptation of Bell Laboratories’ FREE SUBSCRIPTION to ;login:, the Associ- Version 7 UNIX* operating system. Alva Couch, ation’s magazine, published six times With the XENIX operating system, a year, featuring technical articles, [email protected] Microsoft is bringing the power system administration articles, tips of the UNIX OS to microcomput- and techniques, practical columns on such topics as security, Perl, Java, and TREASURER ers. The XENIX Operating System will be released in versions that operating systems, book reviews, and Theodore Ts’o, run on the Zilog Z8000, Motorola summaries of sessions at USENIX [email protected] conferences. M68000, Intel 8086, and the DEC PDP-11*.” A C CESS TO ;LOGIN: online from October DIRECTORS 1997 to this month: www.usenix.org/ That’s how it began. It continued: Matt Blaze, publications/login/. “The XENIX system is an interac- [email protected] ACCESS TO PAPERS from USENIX confer- tive, multi-user, multi-tasking op- ences online: www.usenix.org/ Jon “maddog” Hall, erating system with a flexible user publications/ library/proceedings/ [email protected] interface. . . . THE RIGHT TO VOTE on matters affecting Geoff Halprin, “Microsoft is committed to sup- the Association, its bylaws, and elec- [email protected] porting the XENIX system as an tion of its directors and officers. Marshall Kirk McKusick, environment for program develop- DISCOUNTS on registration fees for all [email protected] ment that will be identical on all USENIX conferences. the popular 16-bit microproces- sors. Different CUs, as well as dif- DISCOUNTS on the purchase of proceed- EXECUTIVE DIRECTOR ings and CD-ROMs from USENIX ferent hardware boards for the conferences. Ellie Young, same CU, will be supported. . . .” [email protected] SPECIAL DISCOUNTS on a variety of prod- Interestingly, even 25 years ago, it ucts, books, software, and periodi- “sounds” like Microsoft: “backed cals. For details, see www.usenix.org/ by Microsoft’s experience and ex- membership/specialdisc.html. pertise in producing quality system software.” “Microsoft is committed FOR MORE INFORMATION regarding membership or benefits, please see to having a version of the XENIX www.usenix.org/membership/ system for every popular 16-bit mi- or contact offi[email protected]. croprocessor by the end of 1981.” Phone: 510-528-8649 Go talk to “Bob Greenberg, XENIX Product Manager,” should you have any questions. Oh. Those asterisks. There’s a 4- line footnote about TMs.

48 ;LO GIN: V OL. 30, NO. 5 WORKSHOP PROCEEDINGS meeting we selected an executive want to hear which member ser- The October 1985 ;login: carried director. By the time you read this, vices SAGE members want, and seven papers from the December we will have likely completed how best to deliver them. Board 1984 Graphics Workshop, held in arrangements with the AMC and members will be there all week Monterey. In Atlanta, in June will have introduced it and our ex- and will welcome input from pres- 1986, Reidar Bornholt accosted me ecutive director to SAGE. ent or potential SAGE members. with questions about the “pro- We continue to work in the short Please strongly consider attending ceedings” for the 1985 workshop. term, with the support of the LISA this year to see and shape the When I got back to El Cerrito USENIX Board, on moving SAGE future of SAGE, as well as for the (where the USENIX office was member services to our control. excellent technical and training then situated), I asked, and a dusty We hope to finish well before the programs that LISA offers. manila envelope was produced. A LISA ’05 conference. We also Speaking personally, I have en- photo-offset Proceedings was formed a number of strategic com- joyed serving on the SAGE Board available by September. It was the mittees, described in the August a great deal so far. The engage- first workshop so memorialized. Memo to Members, to build vari- ment and energy—and, honestly, ous parts of a stable SAGE, such fun times—at the meeting were a as partnerships, development of hopeful sign of things to come. I SAGE UPDATE sponsors, membership, communi- hope you join us in building an CHRIS PALMER cations, and leadership. effective organization focused solely on our profession. Please, [email protected] Leadership is a special focus of the new SAGE. We have formed a if you wish to get involved or Since the writing of the August highly independent, non-Board have feedback on the future of SAGE update, many of the transi- Leadership Committee. The Lead- SAGE, don’t hesitate to contact tional arrangements necessary to ership Committee will be the us at [email protected]. form a new, independent SAGE nominating committee for elec- have come to pass. First and fore- tions, but will stay active between USACO: THE USA COMPUTING OLYMPIAD most, we held elections for the elections to engage and encourage new Board. The SAGE Board con- volunteers, helping to develop R OB KOLS T AD sists of Geoff Halprin, Trey Harris, present and future leaders of USACO Head Coach Doug Hughes, Andrew Hume, SAGE. Greg Rose has graciously You probably know that USENIX Chris Palmer, David Parter, Tom consented to serve as chair of the is a major sponsor of the USA Perrine, Stephen Potter, and Pat Leadership Committee. Wilson. Computing Olympiad, a 13-year- We then went beyond our institu- old organization that serves pre- Then, from July 29th to the 31st, tional efforts and examined the college computer enthusiasts the new Board met for a strategic mission of SAGE. SAGE is dedicat- around the world. The Computing planning session and first Board ed to individual support and edu- Olympiad has these goals: meeting of the new SAGE. We cation, advancing our profession Provide pre-college students with elected our officers: Tom Perrine is as a whole, and serving the wider opportunities to sharpen their now the president, Pat Wilson the public interest in system adminis- computer programming skills to vice president, and Andrew Hume tration. That’s a broad mandate, enable them to compete success- is secretary/treasurer. We then and hard to distill. We aim to pro- fully at the international level. spent nearly all of our time plan- vide our membership year-round ning for the short and medium with the community experience Enhance the quality of pre-college term, and doing preliminary long- LISA conferences offer. Education, computer education by providing term brainstorming. communication, and research all students and teachers with chal- In the short term, our efforts are fall under this effort. lenging problems, training materi- als, and competitions that empha- focused on creating a stable, trans- Now we have to decide which pro- size algorithm development and parent, and responsible organiza- grams we want to revive or start, problem-solving skills. tion. Without a strong foundation, in what order, and how we deliver further efforts would be futile. The on them. SAGE’s work up to and Recognize excellent students with Interim Board did great work here, including LISA ’05 will aim to an- outstanding skills in computer sci- providing us with a corporation, swer these questions. We will be ence and encourage them to pur- bylaws, and legal structure. They presenting our medium-range sue the profession. also chose an association manage- plans there at the Community ment company (AMC), and at the Meeting and BoF sessions. We’ll

;LOGIN: OCTOBER 2005 USENIX NOTES 49 Provide educational, motivational, 18–25. Veterans Eric Price and year with an increase in net assets and competitive materials in the Alex Schwendner will be joined of $218K. form of programming competi- by John Pardon and Matt Mc- tions and Web-based training. Cutcheon in vying for the gold USENIX MEMBERSHIP DUES AND EXPENSES The 2004–05 season has been ex- against 300 students from over traordinarily successful in achiev- 75 other countries. USENIX averaged 5,700 members ing these goals. The Training Pages The USACO continues to strive in 2004, an 11% drop from the are approaching 30,000 logins per to accomplish its mission with previous year. Of these, half opted month by students in the 90 differ- expanding programs and qualify- for SAGE membership as well. ent countries that utilize the ing events. If you’d like to assist Chart 1 shows the total USENIX USACO training facilities. The or if your organization would like membership dues revenue mailing list has over 18,000 mem- to support the USACO, please ($632K) for 2004, divided into bers; participation in our monthly contact Rob Kolstad at membership types. Chart 2 pre- contests (held during the academ- [email protected]. sents how those dues were spent. ic year) will soon top 1,000 elite Note that all costs for producing programmers per contest. THE USENIX ASSOCIATION . conferences, including staff, mar- The most recent achievement was FINANCIAL REPORT FOR 2004 keting, and sales and exhibits, are the USA Invitational Computer covered by revenue generated by ELLIE YOUNG Olympiad held June 1–9 at Col- the conferences. orado College in Colorado [email protected] OTHER USENIX PROGRAMS Springs, Colorado. The 15 best The following information is pro- U.S. competitors matched skills vided as the annual report of the Chart 3 describes how the money against the best from Bulgaria, USENIX Association’s finances. allocated to student and outreach Canada, China, and Poland. Veter- The accompanying statements programs and standards activities an senior Alex Schwendner have been reviewed by Michelle ($200K) was spent in 2004. Chart emerged victorious, edging out Suski, CPA, in accordance with 4 shows how the USENIX admin- silver medalists Eric Price, Filipek Statements on Standards for Ac- istrative expenses were allocated. Wolski from Poland, and Chuen- counting and Review Services is- (The category “Misc.” covers such guag Zhu from China. sued by the American Institute of items as legal fees, elections, re- Among the innovations this year Certified Public Accountants. Ac- newals, taxes, licenses, and bank was the second of six rounds: the companying the statements are charges.) speed competition. Unlike most several charts that illustrate where SAGE programming contests, where the your USENIX and SAGE member- action cannot be compared favor- ship dues go. The Association’s Chart 5 shows SAGE revenue ably to watching paint dry, the complete financial statements for sources for 2004 (primarily mem- speed competition was suspense- the fiscal year ended December 31, bership dues of $145K and the ful. Competitors were given a 2004, are available on request. $114K revenue share from its problem requiring 5–10 minutes sponsorship of the LISA confer- to solve. The first solution earned FINANCIAL STATEMENT SUMMARY ence). Chart 6 shows all SAGE the most points, with declining USENIX continues to be a healthy expenses (a total of $226K). points until half the competitors organization. In 2003 and 2004,

had completed it. At that point, we managed to have a modest in- 2006 USENIX NOMINATING COMMITTEE the sub-round closed. Because crease in our net assets, which en- contestants incur a small penalty abled us to replenish the Reserve The biennial election of USENIX’s for wrong answers, later correct Fund. At the beginning of the Board of Directors will be held in answers may score more points. 2004 fiscal year, USENIX pro- the spring of 2006. The USENIX With almost a dozen tasks and two duced a balanced budget. Howev- Board has appointed Kirk McKu- dozen programmers, and nonstop er, we incurred a slight deficit of sick to serve as chairman of the action, several competitors kept $74K in operations, which was Nominating Committee. The com- neck-and-neck for the lead! mainly due to lower than project- position of this committee and in- The Olympiad was also used to ed attendance at the USENIX An- structions on how to nominate in- choose the four U.S. students who nual Technical Conference. This diviuals have been sent to USENIX will represent our country at the was largely offset by the fourth- members electronically and pub- International Olympiad, to be held quarter performance of the Re- lished on the USENIX Web site. in Nowy Sacz, Poland, August serve Fund. USENIX ended the

50 ;LO GIN: V OL. 30, NO. 5 USENIX ASSOCIATION STATEMENT OF FUNCTIONAL EXPENSES For the Years Ended December 31, 2004 and 2003

Student Conferences Programs, Managemen and Programs and Good Works Sage Total t and Fund Total 2004 2003 Workshops Membership and Projects SAGE Certification Program general Raising Support Total Total

Operating Expenses Conference & workshop-direct * $ 1,855,355 $ $ $ $1,855,355 $ $ $ $0 $ 1,855,355 $ 1,631,705 Personnel and related benefits: Salaries 509,536 121,306 17,128 114,140 762,110 210,831 75,182 286,013 1,048,123 1,150,706 Payroll taxes 49,385 11,757 1,660 11,063 73,865 20,434 7,287 27,721 0 101,586 82,928 Employee benefits 101,554 24,177 3,414 22,749 151,893 42,020 14,984 57,004 0 208,898 215,345 Membership/proceedings 8,254 8,254 0 8,254 7,370 Membership/login: 164,866 164,866 0 164,866 187,520 Membership/e-learning 0000 SAGE expenses * 39,630 39,630 0 39,630 68,839 SAGE Certification expenses * 0 0 0 6,878 Student programs, Good . Works, and projects 200,083 200,083 0 200,083 108,019 General and administrative * 188,929 98,394 6,954 21,176315,453 0 168,269 15,432 183,701 499,154 505,372 $ 2,704,759 $ 428,755 $ 229,239 $ 208,757 $3,571,509 0 $ $ 441,554 $ 112,885 5$54,439 $ 4,125,949 3,964,682

Additional detail may be found in the supplemental schedules on pages 14-17.

;LOGIN: OCTOBER 2005 USENIX NOTES 51 Chart 2: Where Your 2004 Membership Dues Went

Executive Office Expenses 28%

Executive Office Personnel 45%

;login: 27%

Chart 3: Student & Outreach Program Expenses 2004

Other Conference Sponsorship/Stipends 11%

Membership in Computing Research Association (CRA) & The Open Group 5%

Student Stipends to Attend USENIX Conferences 39% Standards Activities Chart 1: USENIX Member Revenue Sources 2004 10%

Student K-12 Program: USA 4% Corporate Computing Olympiad 4% 7% Supporting 4% Other: University Outreach, Awards, OpenAFS 3% Educational Inst. 6% Support for Electronic Frontier Foundation (EFF) 25%

Affiliates 7% Chart 4: USENIX Administrative Expenses 2004

Telephone & Postage 3% Office & Computer Supplies 4% Rent & Utilities 19% Accounting 6%

Individual 75% Insurance 8%

Marketing/Promotion/PR 8%

Misc 19%

Board Meetings 9%

Database, Systems & Depreciation Web Administration 12% 12%

52 ;LO GIN: V OL. 30, NO. 5 Chart 5: SAGE Revenue Sources 2004 Chart 6: SAGE Expenses 2004

Miscellaneous Pubs/T-Shirts/Misc 6% 2% Publications 8%

Share of LISA '03 Office Expenses Conference Revenue 15% 44%

Dues 54%

Personnel 71%

NEW! ;login: Surveys To Help Us Meet Your Needs

;login: is the benefit you, the members of USENIX, have rated most highly. Please help us make this magazine even better. Every issue of ;login: online now offers a brief survey, for you to provide feedback on the articles in ;login: . Have ideas about au- thors we should—or shouldn’t—include, or topics you’d like to see covered? Let us know. See http://www.usenix.org/publications/login/2005-10/ or go directly to the survey at https://db.usenix.org/cgi-bin/loginpolls/oct05login/survey.cgi

;LOGIN: OCTOBER 2005 USENIX NOTES 53 Attention, Members: Are You Getting the Most Out of Your Membership?

Become an active member of the Association.This is your community: get involved!

We are proud of our 30-year history of offering services to the advanced computing systems community. The support and participation of our members make us able to offer some of the most highly respected conferences and publications in the industry. We have recently added the benefit of a Jobs Board for all members, as well as additional benefits for our Educational, Corporate, and Supporting members. We encourage you either to upgrade your membership or to talk to your employer about an institutional membership with USENIX. In addition to the great benefits you already enjoy, we are offering these new benefits:

STANDARD MEMBERSHIPS: INDIVIDUAL ($115 PER YEAR) AND STUDENT ($40 PER YEAR)

• The USENIX Jobs Board: Looking for a new job? USENIX members have direct access to offerings from top-notch potential employers. For information on how to post, see http://www.usenix.org/jobs/.

EDUCATIONAL MEMBERSHIP ($250 PER YEAR)

• The USENIX Jobs Board (see above) • Up to two additional copies of ;login: per issue (email offi[email protected] with your request)

CORPORATE MEMBERSHIP ($460 PER YEAR)

• The USENIX Jobs Board (see above) • Up to four additional copies of ;login: per issue (email offi[email protected] with your request) • Up to five conference registrations at the USENIX member price for your staff (email [email protected] for a discount code to use in registering) • Your company name listed on our Corporate Members Web page, http://www.usenix.org/membership/corporate.html.

SUPPORTING MEMBERSHIP ($2500 PER YEAR)

• The USENIX Jobs Board (see above) • Up to four additional copies of ;login: per issue (email offi[email protected] with your request) • Tarballs of any USENIX conference Proceedings from the year before your membership term begins (email offi[email protected] with your request)

For a full listing of all benefits or to join online, please see http://www.usenix.org/membership.

54 ;LO GIN: V OL. 30, NO. 5

are responding by adding better 2005 Linux Kernel virtualization support to their Developers Summit CPUs. Jonathan Corbet is a co-founder of A session on I/O busses mostly LWN.net and the author of its kernel con- tent. He is the lead author of Linux Device concerned technical details on Drivers, 3rd edition, published by O’Reilly. the best interfaces for DMA opera- For the last four years, Jonathan has been tions and dealing with memory- conference on the planning committee for the Kernel Summit. challenged devices and systems. The kernel contains several inde- In the young and fast-moving reports pendent mechanisms for setting up Linux community, anything that and executing scatter/gather I/O; it has happened for four years in a was agreed that it might be nice to row can be called “traditional.” OUR THANKS TO THE SUMMARIZERS: unify these subsystems at some Thus, the 2005 Linux Kernel De- point, but nobody seems in any real 2005 LINUX KERNEL DEVELOPERS velopers Summit, held on July 18 hurry to do the work. There was SUMMIT and 19 (immediately prior to the discussion of a new memory alloca- Jonathan Corbet Ottawa Linux Symposium), is by tion interface that would help driv- now a traditional event. For two WITMEMO ’05 ers work around the memory ad- days each year, this invitation-only David Blinn dressing limitations found in a crowd of around 70 core kernel de- Minkyong Kim discouraging number of devices; a velopers gather to talk about where patch is expected soon. Irfan Sheriff kernel development should go over EESR ’05 the next year. Few important devel- The virtual memory management Ahmad T. Al-Hammouri opment decisions were made at the session showed that nobody is itch- ing to make major changes to how Apratim Purakayastha 2005 Summit, but it was an oppor- tunity for developers to catch up on the VM subsystem works—in the Himanshu Raj what is happening in areas outside near future, at least. Instead, atten- J. Sairamesh their particular expertise and, of tion is currently focused on dealing MOBISYS ’05 course, to pursue topics of interest with memory fragmentation and Mike Blackstock in the hallway and pub meetings. memory pressure. The most likely short-term solution to fragmenta- David Johnson The 2005 Summit opened with a tion (where it gets hard for the ker- Neil McCurdy panel of processor architects. This nel to allocate multiple, contiguous panel has, over the years, served Xiaoqiao (George) Meng pages) is a new allocation scheme as a forum where manufacturers Himanshu Raj that would segregate user-space could share some of their plans and Ram Kumar Rengaswamy pages (which are easily moved) hear about any concerns the kernel Ya-Yunn Su from kernel memory allocations developers have. Two themes stood (which are not). Linux still does Denitsa Tilkidjieva out this year: power management not behave as well as anybody Hongwei Zhang and virtualization. Manufacturers would like when memory gets seri- need to reduce the power demands HOTOS X ously tight; the issue here seems to of their chips lest future systems be Prashanth Bungale be finding a good way to throttle required to be equipped with cryo- Rik Farrow memory-intensive processes with- genic cooling units; they would like Alexandra Fedorova out creating performance problems. to have help from the kernel devel- Nikolaos Michalakis opers in designing algorithms A brief session discussed some se- Steve VanDeBogart (scheduling in particular) that can curity-related patches that could be Steve Zhang help with power management. merged soon. These include better The developers, in return, would kernel API checks (finding double- VEE ’05 like a reliable way to ask the free errors, for example), more ad- Xing Fang processor what its current power dress space randomization, making Long Fei consumption is so that power-relat- use of recent gcc features, and Shuo Yang ed changes can be benchmarked. tightening access to various files in

SRUTI ’05 There is quite a bit of hype around /proc and /dev/mem. virtualization—running guest op- Jayanthkumar Kannan A session dedicated to virtualiza- erating systems on top of software- Balachander Krishnamurthy tion had little to say; most of that implemented virtual machines— work is in user space at this point. Lakshminarayanan Subramanian and the processor manufacturers

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 55 There is interest in merging the tographic and security support, and talk of how the various ways of Xen patches sometime soon. Those more. We may even see support for providing real-time response could patches have been significantly re- hardware TCP offload engines, be judged, but no time to actually worked (Xen used to add itself as something that has been resisted by apply those criteria. So real-time an entirely new architecture, but the networking developers for can be expected to continue to heat that did not go over well with the years. up the mailing lists for a while yet. kernel developers) and should find From networking, the discussion The Desktop Developers Confer- their way into the kernel before too turned toward the increasing con- ence was happening at the same long. vergence of the networking and time as the Kernel Summit; a few The final session on Monday was storage subsystems. Storage-area delegates came over to give the ker- dedicated to the virtual filesystem networks, iSCSI, and so on are nel developers an update. The core layer. Some potentially contentious making networking a crucial part of the discussion consisted of issues (such as the merging of Reis- of the block I/O subsystem. This grungy details on how to rational- er4 or FUSE) were avoided entirely; convergence can cause problems ize Linux support for graphics instead, the discussion centered on when memory gets tight; the block cards. These cards are complex de- the increasing complexity of the layer needs to write out pages to vices, often with secret interfaces. core VFS code. In particular, the free memory, but a network-based In the past, there has been a great mixture of direct and buffered I/O storage layer must allocate memory deal of confusion as to whether has created a difficult mess that to accomplish those writes. There these cards should be controlled by somebody will eventually have to are things that can be done to ad- the kernel or by the X server in clean out. dress this problem, but Linus Tor- user space. These issues are slowly Tuesday began with a panel that ad- valds also wants to push back on being worked out, and better dressed the frustrations faced by the manufacturers of these systems. graphics support should be coming hardware manufacturers who wish Rather than go through all this to a screen near you shortly. to work with the kernel develop- trouble to make network-based The kernel developers heard a re- ment process. The corporate way storage work, wouldn’t it be better port from the power management of doing things, involving fixed to just install a local disk? That summit, held two days earlier. schedules, lengthy internal quality said, there are real reasons behind Much work remains to be done in assurance work, and control over these technologies, and Linux will the power management area. There the code, does not mix well with find a way to support them are currently two software suspend the Linux process. Getting code properly. implementations, neither of which into the kernel is easier if that code A brief session on clusters showed is as solid as its users would like. It is posted at a very early stage so that there was not a whole lot to was agreed that the external “sus- that show-stopper problems can be concern the kernel developers; pend2” patches would be posted identified and fixed. But hardware once again, most of the work is and considered for merging into companies would rather just pro- now in user space. There will be the mainline. Video adapters are a duce a fully functional, tested, and moves to merge a couple of cluster constant challenge in making sus- certified driver at the end. That ap- file systems soon (RedHat’s GFS pend work; they are supposed to be proach can get them sent back to and Oracle’s OCFS2); it seems that reinitialized by the operating sys- the drawing boards with funda- the two might have agreed to use tem on resume, but the manufac- mental problems to fix. The ven- the same distributed lock manager. turers will not tell the Linux devel- dors also complained about the The session on RAS tools was most- opers how to do that. So, instead, constantly changing kernel API, ly a celebration of the merging of pressure is now being put on the which gives them long-term sup- the kexec and kdump patches, BIOS vendors to provide that reini- port problems. which should bring reliable crash tialization support in the firmware. The networking developers had dump capability to the mainline The final session, traditionally, is held a summit of their own the kernel. There are still quite a few devoted to the kernel development week before the Kernel Summit; loose ends to tie down. process and the ongoing desire to the outcomes were summarized for Real-time capability for the Linux extract hard deadlines from Linus the crowd. A great deal is happen- kernel has been the subject of a Torvalds. Deadlines were less of an ing in the network community, in- great deal of intense discussion issue this year; instead, the devel- cluding the reworking of much old over the last year. Most of that in- opers were concerned with improv- code, the de-bloating of the core tensity failed to show up at the Ker- ing the quality of kernel releases; sk_buff structure (which represents nel Summit session dedicated to recent kernels are seen by many a packet in the kernel), better cryp- the topic, though. There was some as containing too many bugs. Two

56 ;LO GIN: V OL. 30, NO. 5 reasons were identified for this: cles to optimize radio transmis- International Workshop on kernel developers are waiting too sions, because computation re- Wireless Traffic Measurements long into the release cycle to merge quires less energy than radio trans- and Modeling (WitMeMo ’05) their changes (thus missing out missions. on weeks of testing time), and Seattle,WA Beyond ZebraNet, there are three bugs, even when identified, are not June 5, 2005 challenges: a lack of stable applica- being fixed. An attempt will be tion drivers on which to experi- made to address the first problem KEYNOTE ADDRESS ment, a lack of good experimental by requiring that new features be infrastructure, and a lack of data merged into the kernel within the Summarized by Minkyong Kim sharing among researchers. The first couple of weeks of the cycle. Dynamic Adaptation and Mobile first two challenges are difficult to After that, a feature freeze of sorts Wireless Systems: Experiences and change, but the last should be easi- will be imposed, and only fixes will Challenges er. Currently, data sharing takes be merged. Getting developers to place more or less exclusively at Margaret Martonosi, Princeton actually fix bugs can be a bigger conferences or workshops. This University challenge when there is no boss to community needs broader-scale order them to fix things. Mobile computing systems present sharing. Martonosi also advocated Overall, the 2005 Summit was seen several challenges. First, mobile creating test-beds and simulation as a successful gathering. Some de- computing happens on devices environments. with constrained optimization and velopers have noted that, over time, Martonosi concluded her talk with highly varying applications. Appli- the summit is moving away from a the following research questions: cations constantly change, and new forum where issues are debated and What can we do to tolerate sparse applications present new con- decided and is becoming instead a and high-disruption wireless net- straints. Second, hardware also two-day status report. Given that works? What do we do if a source- changes quickly. Instead of ab- the kernel has grown to a point to-destination route never exists or stracting hardware, people tend to where nobody can really under- exists only rarely? How can we re- make software to fit their needs. stand every part of it, such a status duce the packet delivery latency for Without hardware abstraction, the report can be important. But if the disseminating data? How do we first challenge becomes more se- summit is not a place where deci- better support infrastructure for vere. Third, there are always new sions are made, some of the devel- real-system wireless measurements? opers may stop coming. So there metrics for success. David Kotz may be changes made in the future (Dartmouth College) commented Maria Papadopouli (University of to spice things up a bit. that the reason people do not do North Carolina) commented on the abstraction is that they want more difficulty of correlating data from For more detailed reporting from control. different sensors and also the prob- the summit sessions, please see lem of measurement errors. David ZebraNet, a project consisting of a http://lwn.net/Articles/ Kotz (Dartmouth) asked whether network of mobile sensors de- KernelSummit2005/. there have been similar problems in signed for animal tracking, started space research. The speaker said three years ago in response to biol- that it is similar but the distinction ogists’ desire to track animals over is that the events in space opera- an extended period and long dis- tions are relatively well scheduled, tances. Biologists will use the re- whereas zebras are random. She sulting information to suggest ways mentioned that the range of control to manage land to preserve wildlife. is also different. ZebraNet uses a store-and-forward http://www.princeton.edu/~mrm/ network to collect sensor data. zebranet.html Each sensor has a radio with a one- kilometer range, though the effec- tive range was only 100–800 me- MORNING PRESENTATIONS ters due to a ground loop caused by Summarized by Irfan Sheriff the stitching in the collar. Sensor collars put on the necks of seven Analysis of a WiFi Hotspot Network zebras exchange data every two David P. Blinn, Tristan Henderson, and hours with others within range David Kotz, Dartmouth College (i.e., 2 km). The collar, designed David Blinn began by talking about from scratch, trades processor cy- some large-scale WiFi studies that

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 57 have been conducted. This study performed as well as NS-2, at the buggy too, as they found a weird is smaller than many of those and is same time supporting NS-2. packet result count with SNMP. based on the analysis of data from Ashu Sabharwal said he did not see Ashu Sabharwal asked how intru- 312 access points installed at Veri- a big difference between MobiNet sions and MAC layer misconfigura- zon phone booths in New York. and distributed NS-2, because real tions can be detected? Papadopouli However, this is the first work application traces can also be fed pointed out that some cases of mis- based on the traces collected from into NS-2. He also noted that with configured clients and APs can be a production 802.11 network. The large-scale deployment, the jitter detected by analyzing SNMP data. access points are connected encountered on the Ethernet line through an ADSL backbone. Modeling Users’ Mobility Among between the core and edge nodes WiFi Access Points The study involved looking at the could affect the results. Mahadevan number of active users and pending countered by saying that this could Minkyong Kim and David Kotz, users, and the movement of users be solved by limiting the number of Dartmouth College among access points. It was con- applications per edge node and Minkyong Kim presented a paper ducted for five weeks in Novem- having separate Ethernet lines. on modeling user mobility based ber–December 2004. Most results Additional information is available on traces collected at Dartmouth were not surprising—diurnal at http://ramp.ucsd.edu/ College. The tests were conducted usage, weekly usage patterns, 5% of ~pmahadevan/publications/ from April to May 2003. The Flux cards generating about 85% of traf- Mobinet_techrep.pdf. model, as it is called, clusters the fic, and the like. The paper con- set of access points that have the cluded that users were not very An Accurate Technique for Measuring common behavior of user distribu- mobile and usually stuck to one ac- the Wireless Side of Wireless Networks tion during different periods of time. cess point. Jihwang Yeo, Moustafa Youssef, and The model constructs five clusters Maria Papadopouli suggested that Ashok K. Agrawala, University of Mary- with a set of access points that have users who were always on the net- land; Tristan Henderson, Dartmouth similar peak behavior. Alex won- work might be working from home. College dered why there were just five clus- People were interested in the data Tristan Henderson presented a ters. Kim responded that the num- and there was discussion of at- measurement architecture for ber should probably depend on the tempts to make the data public. 802.11 networks on the wireless environment and the behavior of MobiNet: A Scalable Emulation side of the network using a set of the users. Ashu Sabharwal asked if Infrastructure for Ad Hoc and sniffers. Most earlier measurement there were any surprises in the Wireless Networks work has been on the wired side of peak behavior. Kim felt there were the 802.11 network, which cannot no significant surprises. There was Priya Mahadevan and Amin Vahdat, capture the full details of the pack- a discussion about the need for bet- University of California, San Diego; ets, because the access point strips ter models that closely simulate re- Adolfo Rodriguez, IBM and Duke Uni- some of the headers before handing alistic user behavior. versity; David Becker, Duke University over the packets. This work looks Priya Mahadevan presented an em- at countering the deficiencies of the AFTERNOON PRESENTATIONS ulation environment for ad hoc earlier work. Summarized by David Blinn wireless networks called the Mobi- The setup consisted of three sniffer Net. The advantage of this work is PCs equipped with prism2 802.11 An Experimental Study of Multimedia that real applications can be simu- cards. The tools used included lib- Traffic Performance in Mesh Networks lated on a large scale in the emula- pcap-based ethereal to capture Yuan Sun, Irfan Sheriff, Elizabeth M. tor. The drawbacks of the simula- data. The metric of importance was Belding-Royer, and Kevin C. Almeroth, tor, such as simplified physical completeness of data, and three University of California, Santa Barbara layer models and simplified mobili- sniffers were found to be sufficient Irfan Sheriff presented experimen- ty models, remain. This work tries for one access point. There was a tal results of streaming video and to combine the good features of the discussion of the effect of different voice traffic testing on a 25-node simulators (repeatability, efficien- drivers on the packet data collect- 802.11b mesh network test-bed cy) with that of live deployment ed. Maria Papadopouli thought it (http://moment.cs.ucsb.edu/ (real application usage). should not have an effect. However, meshnet/). The network topology Mahadevan presented the general people agreed that current device consisted of static routes between a architecture and evaluation results, drivers handle packets differently sequence of nodes in a four-hop which showed that the model is ac- and that such handling should be path. curate, scalable, and can support standardized. SNMP seems to be applications unmodified. MobiNet

58 ;LO GIN: V OL. 30, NO. 5 The experimenters observed that DSDV, but dramatically different re- chronization between the links, fewer simultaneous voice flows sults for DSR. He identified that which can be negated by summing could be well supported than video this difference was caused by the over minimum recorded times, and flows and determined that the effect of frequent link breakages. clock skew caused by clock racing, number of packets per second sent The AF model observed these which can be accounted for by ex- by an application had a greater im- breakages, which greatly affected amining trends in clock timing. In pact on quality than the size of the the results. an experimental setup of 802.11 ac- packets. Increasing the number of De Lara stressed that he was not cess points, experimental results multimedia flows beyond a satura- claiming his AF model was perfect, verified the scheme’s prediction of tion point caused dramatic jumps but rather that he wished to show C/3 for a three-hop network and in both latency and loss, and in- that simplifications in models C/4 for a network with four hops or creasing the number of hops de- could produce nonuniform varia- more, with C being the single-hop creased the saturation point. A tions, and that it is nontrivial to de- capacity of the links. The presenta- greater number of flows also result- termine which simplifications are tion concluded with a debate over ed in an increase in latency varia- important and which are not. the precise meaning and usefulness tion, or jitter. of the notion of wireless path ca- A paper describing the work is pacity. With multiple video flows, unfair- available at http://www.cs.toronto ness became an issue, with some .edu/~delara/papers/secon2004/. flows consuming great amounts of PANEL SESSION bandwidth leaving little for the Measurement Study of Path Capacity Summarized by David Blinn other flows. Unfairness was an even in 802.11b-Based Wireless Networks bigger problem in voice flows. At Tony Sun, Guang Yang, Ling-Jyh Chen, Who’s Afraid of Wireless the conclusion of the presentation, M.Y. Sanadidi, and Mario Gerla, Measurements Studies? some members of the audience ex- University of California, Los Angeles Panelists: Christophe Diot, Intel Re- pressed concern that some of the Tony Sun began by remarking that search Cambridge; David Kotz, measured results, such as the ef- studying wireless path capacity is Dartmouth College; Maria Papadopouli, fects of RTS/CTS, might be specific complicated by the dynamic condi- University of North Carolina; Ashu to the testing topology and would tions affecting the links, and pre- Sabharwal, Rice University fail to generalize to other networks. sented a scheme to better estimate The panelists took turns expressing The Perils of Simplified Simulation the maximum achievable data rate their hopes and fears for the future Models for Indoor MANET Evaluation in a multi-hop wireless path. of the field of wireless measure- Eyal de Lara, University of Toronto The new scheme, AdHoc Probe, ment studies. Maria Papadopouli emphasized the need to integrate Eyal de Lara opened his invited talk builds upon the previous CapProbe two directions: measurement and by questioning the value of Mobile (http://nrl.cs.ucla.edu/CapProbe/) modeling. Benchmarks and metrics Ad Hoc Network (MANET) simu- scheme, which uses a packet pair to should be defined to identify access lations when the simulations use estimate the capacity of a link. patterns. David Kotz stressed the extremely simplified space and mo- Measuring path capacity in a wire- need to build a strong measure- bility models. MANET simulations less ad hoc network is more diffi- ment community, noting the incon- frequently assume that the net- cult than in a wired network, due sistencies in definitions and tools. works are deployed in free space, to the effects of bottleneck capacity, He introduced a new endeavor, with no impediments to radio com- network topology, interference, CRAWDAD: A Community Re- munication, and that users move the use of 802.11 auto-rate, and source for Archiving Wireless Data between random waypoints. These RTS/CTS. AdHoc Probe is superior at Dartmouth (http://crawdad.cs models are inadequate and not ro- to CapProbe because it uses faster, .dartmouth.edu), which will in- bust. less interference-prone one-way transmission instead of CapProbe’s clude an archive of wireless data He introduced a detailed model, two-way transmission. Tristan Hen- sets and a set of tools and will run called attenuation factor (AF), of derson (Dartmouth) questioned measurement workshops. Ashu space and user mobility with an the absence of SIFS/DIFS delays in Sabharwal worried that the com- AutoCAD map of a building and the researchers’ calculations. Sun munity might be reinventing the waypoints between sensible points responded that the aim of the re- wheel by focusing too much effort in building rooms. In evaluating search was only a rough estimation on systems that are essentially irrel- two routing protocols, DSDV and of theoretical link capacity. evant. He introduced the CMC DSR, using these models, the de- Open Wireless Platform, an up- Implementation issues in the new tailed model predicted similar re- gradable and expandable wireless scheme include system time syn- sults to the simplified models for testbed for testing in all layers of

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 59 the network stack. Christophe Diot The infrastructure consists of three Workshop on End-to-End, pointed out the need for a strong layers: the generation, the transport Sense-and-Respond Systems, community and uniform testing (physical wires), and the substa- Applications, and Services equipment along with a common tions for distribution to end users (EESR ’05) standard to calibrate test-beds. and businesses. Because the trans- After these introductions, the panel port layer is exposed to the ele- took questions and a spirited dis- Seattle,WA ments, the whole system is vulnera- cussion followed. June 5, 2005 ble to disruptions, with serious Papadopouli and Sabharwal argued consequences to the economy. In addition, the distribution system, for the need to raise standards OPENING REMARKS within the community, which from big generation plants to users, does not support third-party, small- might require more visible work- Chatschik Bisdikian, of IBM Re- scale power producers. One objec- shops and conferences and a will- search, the co-chair of the EESR tive of the GridWise project, whose ingness to reject papers that make workshop, explained that the moti- overall goal is to apply sense-and- no attempt to justify their underly- vation behind the workshop is the respond research to the power grid ing assumptions. The panelists also context-aware middleware work for of the future, is to build the “ner- emphasized the need for multi-dis- pervasive applications. The goal is vous system” for the electric power ciplinary and interdisciplinary re- to examine the role of sensor and grid by making a ubiquitous com- search to support different layers of actuators and to apply the technol- munication infrastructure. This the systems (from the physical ogy in Internet-scale systems. layer all the way through the appli- will allow intelligent distribution of Advances in sensor technology and cation), as well as the need for power via facilitating cross-level increased intelligence in actuators strong ties with statisticians. communication. have provided a rich set of appli- What causes blackouts? Generally, James Scott (Intel Cambridge) cations. Businesses want to use complete blackouts are caused by a asked if trace-based experiments these Sensor and Actuator Net- ripple effect in which part of the could be made as simple as running works (SANETS) to make better electric grid goes down and the rest a simulator. The panel answered decisions and to form true sense- of the system tries to keep up with that it might be possible to take a and-respond (S&R) systems. An the load. When the system cannot measurement and replay it, but S&R system has not only sensors meet the demand, more of the sys- someone has to set up actual equip- and actuators but decision analysis tem shuts down, creating a vicious ment and find the traces in order to or control components. S&R sys- cycle and eventually resulting in a make it easy. People will want scal- tems evolve from SANET applica- complete blackout. To address this ability in such a setup and will tions by taking an end-to-end view, problem, we need intelligence built want to be able to tweak parame- where data is sensed, interpreted by into end-user appliances. Such ap- ters of the trace. the decision-making components, pliances will be able to take actions and then acted upon. An example Scott also asked how we might that are largely unnoticeable to the of such a system is the context- judge the success of a simulation or user and can help the overall sys- aware electricity grid, part of the trace when we have to sacrifice re- tem adjust to load shedding. Build- GridWise project, the subject of the alism for reproducibility. David an- ing such a system, however, re- keynote address. In addition to the swered that this problem is one en- quires collaboration between the keynote address, the workshop countered in the natural sciences Grid operators and device manu- hosted eight papers, with half cov- because, like a wireless network, facturers. not all elements of nature can be ering S&R technologies and half controlled. Following the example covering applications. The work- In conclusion, markets and control of these sciences, the solution is to shop closed with a lively panel dis- systems are ultimately going to perform many studies and use sta- cussion. merge. The ability to optimize at tistics to overcome the problem of lower granularities than the ones available today will drive future the huge number of parameters in KEYNOTE ADDRESS experiments, for instance in a business processes. More infor- multi-factor experiment. Simula- Summarized by Himanshu Raj mation on the GridWise project tion can then be used to drive ex- The GridWise Project is available from http://www .gridwise.org. perimentation. Rob Pratt, Pacific Northwest National Laboratory Most of the questions related to the basic electrical engineering behind Rob Pratt presented several prob- power generation, such as how lems in existing power grid design. multiple generators are kept in

60 ;LO GIN: V OL. 30, NO. 5 sync in terms of frequency (one Weak consistency is good enough. During the discussion that fol- generator starts up and works as Another member of the audience lowed, the similarities between this the heartbeat while others synchro- asked about using a content-driven work and that of multimedia sys- nize to it) and whether we can approach rather than a directory- tems were explored. Al-Hammouri packetize power and treat it like driven one. He offered TinyDB as explained that multimedia is con- data on the Internet to store and an example of a content-driven ap- cerned about jitter. He also ex- forward (no, but there are some proach. Tilak responded that a con- plained that they had examined taps in the making that can main- tent-driven approach can be real- queue lengths as well as packet-re- tain the flow of power on different ized by application-specific name- ceive rates and output lengths. He lines). spaces. The next question was presented fixed playback delay, but about how you can generate that pointed out that you could focus SENSE ’N’ RESPOND organization automatically, to on avoiding packet losses. TECHNOLOGIES which Tilak responded that file sys- More information is available at tems easily support a dynamic http://vorlon.case.edu/~vxl11/ Summarized by Jonathan Munson and scheme—unlike databases, whose Apratim Purakayastha NetBots. schemas are generally fixed. A File System Abstraction for Sense M-ECho: A Middleware for Morphable Transversal Issues in Real-Time Sense- and Respond Systems Data-Streaming in Pervasive Systems and-Respond Systems Sameer Tilak, Kenneth Chiu, and Nael Himanshu Raj, Karsten Schwan, and Ahmad T. Al-Hammouri, Vincenzo Lib- Abu-Ghazaleh, State University of New Ripal Nathuji, Georgia Institute of eratore, and Huthaifa A. Al-Omari, Case York (SUNY) at Binghamton; Bhanu Technology Western Reserve University; Stephen M. Pisupati and Geoffrey Brown, Indiana Phillips, Arizona State University Himanshu Raj presented M-ECho, University a middleware for system morphing Ahmad Al-Hammouri defined real- Challenges for wireless sensor net- (the continuous and dynamic adap- time sense-and-respond systems as works (WSNs) include heterogene- tation of services and systems). The remote control of a physical envi- ity in hardware and software, com- objective is to extend the applica- ronment via an S&R system. The munication models between system tion’s longevity by optimizing contribution of this work is formu- elements, the scale of the system, power consumption. The applica- lating a conceptual framework for and resource constraints. Previous tion area of interest is robotics. real-time S&R issues for networks. approaches have examined DB ab- M-ECho uses an event-driven, stractions, but these are too close to Transversal issues in S&R systems pub/sub architecture. Dynamically the application level. Current ab- arise because of the necessity of deployable codelets serve as event stractions are programming-lan- providing quality of service in real- handlers and filters. It uses both guage– and OS–based. In this pres- time S&R systems. One issue is dynamic code generation and static entation, Sameer Tilak proposed a adaptability and tolerance to code repositories. To evaluate the filesystem abstraction. With this round-trip delays and jitter. They system, the authors integrated it approach, the WSN is presented as decompose delay into predictable with the Player/Stage robotics a distributed file system. The in- delay and jitter. Jitter is dealt with framework and measured the terface is well understood, hides via playback buffers. Dealing with power performance of two different heterogeneity, enables application- playback delay is challenging be- mechanisms: code parameteriza- specific namespaces, offers struc- cause existing schemes (e.g., from tion/substitution and code migra- tured namespaces, and allows fine- multimedia systems) do not apply tion. In the future, Raj expects to grained control over resources. since the performance metrics are examine robotics applications, the Tilak then introduced an extended different. The delays in S&R sys- use of compiler-assisted tech- example, a Smart Zoo. The research tems are round-trip rather than niques, and quantifying the energy challenges they face include sup- one-way, and delays are determined cost of dynamic compilation of porting resource-efficient operation by the controller. The scheme they codelets versus using a static code and in-network application-specific used in this work is a combination repository server. processing, the consistency model, strategy, in which the controller de- A member of the audience asked and tolerating network unreliabili- termines the playback delay based whether the optimization is done ty. on the round-trip time. Al-Ham- mouri also discussed congestion on each node or across the entire A member of the audience asked control and enabling a choice of system. Raj responded that the op- about the rate of change of the di- utility functions for different S&R timization happens across the en- rectory structure. Tilak responded systems. tire system. Another question was that most changes are localized whether the optimization algo- even though the system is dynamic. rithm was centralized. Raj respond-

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 61 ed that it is currently centralized, scribed current emergency medical A Rule-Based System for Sense and Re- but that they plan to use a clustered services (EMS) problems: patient spond Telematics Services approach in the future. care reporting is paper-based, poor- Jonathan Munson, David Wood, Gerry The Abstract Task Graph: A Method- ly captured, untimely, incomplete, Thompson, and Alan Cole, IBM T. J. ology for Architecture-Independent and not searchable. The problem is Watson Research Center; Sang Woo Lee Programming of Networked Sensor exacerbated in mass casualty and DaeRyung Lee, IBM Ubiquitous Systems events, where there is a need to Computing Laboratory, Korea identify and distinguish each pa- Amol Bakshi and Viktor K. Prasanna, tient’s condition so that patients Jonathan Munson defined telemat- University of Southern California; Jim with severe conditions receive im- ics as vehicle-based computers and Reich and Daniel Larner, Palo Alto Re- mediate care. Also, there is a need communication systems. Potential search Center for patients to be directed to the applications include traffic naviga- Sensor networks involve three care center that can best treat the tion, weather detection, congestion roles: the end user (usually a do- patient’s condition. Consequently, detection, safety vehicles, and dis- main expert such as a scientist), the there is a need for an emergency tributed multimedia and gaming. application developer, and the sys- service system with improved com- Programming challenges arise tem programmer. In his talk, Amol munication, documentation, and when developing such applications Bakshi presented the Abstract Task exchange of information between because of the heterogeneities in Graph (ATaG), a macro-program- the pre-hospital and hospital phas- data acquisition, data processing, ming model that involves specifica- es of emergency care. data collection, event detection, and response processing. tion of aggregate behavior, not sim- Steve introduced a system that ply node-level programming. ATaG meets the above requirements. He Munson presented the Telematics is an application-neutral approach explained the system’s components Event Detection Service (TEDS), and tries to support reuse. ATaG and how data flows between these which offers a rule-based program- uses data-driven flow, in which components. Emergency medical ming model that enables develop- events carry information about the technicians (EMTs) enter patient ers to more easily develop a wide phenomenon. It also uses mixed information on PDAs and tablet range of event-driven telematics imperative/declarative specifica- PCs equipped with wireless con- services. This project was done in tions in order to separate “what” nectivity. A sensor is attached to the Ubiquitous Computing Labora- from “where” from “when.” each patient to keep track of the tory in Seoul, Korea, jointly created Feng Shao asked when the binding patient’s vital signs and the patient’s by the government of Korea and of abstract tasks occurs. Bakshi re- location (via GPS). Patient sensors IBM. sponded that some are bound at in one location form a sensor net- Rules present low-level events in runtime and others at compile work. Data from each sensor net- terms of Boolean expressions or time. Another member of the audi- work, along with data from PDAs programs with different states. Rule ence asked about the use of chan- and tablet PCs, used by EMTs, is inputs, such as position, velocity, or nels. Bakshi responded that the aggregated at a local command cen- pressure, are fed from data acquisi- channel is not a communication ter (an ambulance). All such com- tion devices. Functions defined on link but is input/output plus a zone munication occurs over a wireless rules include spatial functions, of interest. It is a way to associate a network, e.g., 802.15.4. Data from temporal functions, and logical task to a data item. different local command centers is functions. A higher-level program then transferred via the Internet can be constructed using the ABLE SENSE ’N’ RESPOND SOLUTIONS and aggregated at a central com- rule language, a low-level yet flexi- mand center where resources can ble language. Summarized by Ahmad T. Al-Hammouri be managed. Decisions then propa- In the most recent TEDS prototype, A Sensor-Based, Web Service–Enabled, gate back to local command cen- a developer defines a set of rules Emergency Medical Response System ters. The first prototype is built and the actions for responding to upon several technologies such as Nada Hashmi and Dan Myung, 10Blade, events from the rules; it represents TinyOS, 802.15.4 and Zigbee, C# Inc.; Mark Gaynor and Steve Moulton, a sense-and-respond framework and Java, and Web Services and Boston University similar to the Struts framework for Grid. developing Web Services applica- Steve Moulton, a surgeon at Boston tions. University, presented a scalable emergency response system that combines sensors, mobile databas- es, Web services, and wireless infra- structure technologies. Steve de-

62 ;LO GIN: V OL. 30, NO. 5 Meteorological Command and Control: tions, and asset performance. Jakka tions in particular: a marine center An End-to-End Architecture for a Haz- Sairamesh noted the factors moti- for air-ground combat sensors and ardous Weather Detection Sensor Net- vating the need for such a system: a system for monitoring space in work lack of visibility in current business parking lots. He demonstrated the Michael Zink, David Westbrook, Sherief operations, rising costs for business challenge of tracking vehicles using Abdallah, Bryan Horling, Vijay Lakam- management, and late reaction to ground-based sensors and the is- raju, Eric Lyons, Victoria Manfredi, and critical events. sues in real-time communication Jim Kurose, University of Massachu- The system needed to address sev- and coordination. setts; Kurt Hondl, National Severe eral challenges such as real-time Malena Mesarina focused on the Storms Laboratory sensing and monitoring, complex role of standards and return on in- Michael Zink presented the soft- event management, domain-specif- vestment (ROI) for sensor-based ware architecture of the meteoro- ic analytic, multi-format streams, applications in the industry and logical command and control and multi-rate streams. Jakka physical environment. She talked (MC&C) component of a NetRad Sairamesh presented the high-level about applications such as combat prototype. The goal of NetRad is to architecture for this system, which field tracking and habitat monitor- detect a tornado within 60 seconds is primarily composed of the Event ing, and the role of motes in sens- of formation and to track its cen- Stream Processor. The Event ing. She also talked about RFID- troid with a temporal error no Stream Processor uses the Event and EPC-based sensor applications greater than 60 seconds. The Transformation Service to convert for the industry as being important MC&C component forms a closed into the standard format, the Event and timely. She argued that the ROI control loop starting from ingesting Correlation Service to retrieve a list is critical to enabling real deploy- data from remote radars (sensors), of metrics that need to get calculat- ments and that pilots are needed to identifying meteorological features ed, and the Session Update Service validate such deployments. She has in the data, reporting features to to ensure that the current event is found that the technology is expen- end users, and determining each included in future metrics calcula- sive, hard to program, and requires radar’s future scan strategy based tions. The Metric Evaluation Ser- deep knowledge about the events. on detected features and end-user vice determines what, if any, ac- Ron Ambrosio talked about a better requirements. All of these steps tions need to be taken based on closed-loop operations control be- need to complete within 30 sec- the newly calculated metrics. These tween the sensor systems at the onds before another cycle starts. actions are taken with calls to the edge of the network and the opera- Action Instantiation Service. A benchmark based on Nexrad tional enterprise systems at the radar data is used to determine core. He discussed the traditional whether the total processing time PANEL SESSION model of computing, which had a separation between this core and of all steps can fulfill the 30-second Summarized by J. Sairamesh deadline. The results show that all the edge systems. He focused on ar- Research Challenges of End-to-End these steps have sub-second execu- chitectural principles and program- Sense ’n’ Respond Solutions tion times that make them well- ming models for scalable Internet suited for the NetRad system. Panelists: Ron Ambrosio, IBM T. J. Wat- control systems that have seamless son Research Center; Malena Mesarina, integration between the edge nodes Reducing Business Surprises Through HP Labs; Vincenzo Liberatore, Case and the core enterprise nodes. He Proactive, Real-Time Sensing and Alert Western Reserve University; Feng Zhao, enumerated many architectural Management Microsoft Research principles. He also defined “real- Mitch Cohen, Jakka Sairamesh, and time” processing as not being sim- The panel session began with short Mao Chen, IBM T. J. Watson Research ply handling a higher rate of infor- presentations from each panelist. Center mation through sensors but a Feng Zhao focused on challenges in notion of determinism of the This paper describes the system de- planet-scale S&R systems. He con- process and events. sign of a unified semantic event- sidered application domains such stream system that continuously as autonomous vehicle tracking, Vincenzo Liberatore supported the monitors diverse data sources and health care, networked transporta- desire to provide deterministic per- generates alerts based on domain- tion, ubiquitous appliances, and formance, even at the expense of specific rules. Such a system can motion sensing for security. He also latency. He supported the definition enable manufacturers to closely talked about the role of sensor sub- of “real-time” processing that Am- monitor critical business events nets, protocols for sensor subnet brosio described. He said that (reducing surprises) and gather communication, and storage issues measurement, validation, and met- business failures, warranty intelli- in S&R systems, and he considered rics for validation are critical in un- gence, field events, sales transac- issues pertaining to two applica- derstanding scalable S&R systems.

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 63 He talked about a tele-epistemolo- Rashid also gave examples of ubiq- MobiSys 2005:The Third gy application and the value of net- uitous I/O, which turns any surface International Conference on worked S&R in such applications. into an interactive computing sur- Mobile Systems, Applications, He argued that almost all of the face, and of streaming intelligence, and Services S&R applications fall into the area such as skyserver.sdss.org and of control theory models and that skyquery.net, which allow scien- these models need to be applied Seattle, WA tists to do data mining against a and validated. He strongly argued June 5–8, 2005 large number of databases and has the need for real-time S&R bench- resulted in the discovery of new as- Tristan Henderson of Dartmouth College marks, simulation, evaluation, met- tronomical phenomena. conducted a study of the conference rics, emulation, autonomous opera- WLAN during the conference. Questions Rashid believes the goal of ubiqui- tion, determinism in operation, and regarding the study can be directed to tous computing should be to bridge handling failure situations. him at [email protected]. the gap between the rich and the A fundamental question was raised poor. More than three billion peo- by the audience on whether we ple pay a poverty premium for KEYNOTE ADDRESS were reproducing control theory basic goods and services. They work. The panel responded that, to Summarized by Himanshu Raj have little access to important some extent, S&R systems are al- Technology’s Future: A Crisis in amenities and make decisions af- ready in the field (e.g., home con- Confidence? fecting their livelihood (e.g., when trol systems). However, these sys- to plant crops) based on incom- tems do not scale up. They also Rick Rashid, Microsoft Research plete information. Even in the de- believe that existing control theory Rick Rashid addressed the growing veloped world, people with in- is being applied. Questions were unrest among the CS community comes below $35,000 have a 70% raised about event management. about the future of the IT industry chance of not having good online The panel talked about event han- and opportunities for research. He access. Wiring to every home in the dling, event bus architecture, and then outlined certain trends emerg- whole world is not realistic. Rashid standards for handling events. ing from ubiquitous and pervasive described the Mesh Networking The panel discussed standards in domains and suggested that we are Project, which is applying wireless depth and agreed that standards actually just getting started on re- networking technology and mesh should focus on accessing informa- search that will impact society at its networks to form a cooperative net- tion and representing the data to core. work to cover the last hop. These abstractions provided by the sen- A terabyte of storage can hold every networks must be autonomic, in sors. As for what would enable conversation you take part in for that they should be self-managing, S&R to succeed, the panel felt that your entire life. It can hold one pic- self-healing, and inexpensive. the entire community has a role to ture for every minute of your life. It Other projects looking at this prob- play. Academia can contribute to can hold a year of everything you lem include the TIER project at application, methodology, and met- can see, in full-motion video. The Berkeley (http://tier.cs.berkeley rics for evaluation. In fact, the NSF SenseCam project at Microsoft Re- .edu/) and the Digital Gangetic has a $40 million program in this search is examining what you could Plains project at IIT Kanpur area. Business can also contribute, do with a terabyte of personal stor- (http://www.iitk.ac.in/mladgp/). especially focusing on the ROI age. They have devised a wearable In conclusion, there are ample op- question. data and image recorder. The de- portunities for reinvigorating CS vice contains a camera and numer- research. The emphasis must be on ous sensors (e.g., accelerometer, access to information rather than passive IR, light level, temperature, mere access to devices. images, etc.). Image capture is trig- During the Q&A session, Ramón gered by sensing some interesting Cáceres (IBM Research) asked change in environment (e.g., light- about how to sustain technology ing levels). They are hoping to change in rural areas. Rashid re- apply this technology to patients sponded that you must find the with moderate memory impair- economic value behind the tech- ment and as an aid for caregivers of nology and that, unless the person patients with severe memory loss. providing the information access It can also be used for self-reflec- points (such as Internet kiosks) tion. makes money, he will not keep pro- viding the service. Mandayam

64 ;LO GIN: V OL. 30, NO. 5 Raghunath, also of IBM Research, lenges. RealityFlythrough uses phones. Mosmondor pointed out followed up by asking how to get “motion” as a substitute for an that the server of LiveMail spent rural areas started. Rashid respond- infinite number of cameras; it uses 98% of the time on parameter adap- ed that there have been a number dynamic path estimation with tation yet only 2% of the time on of approaches. One that worked smoothened transition between storage and retrieval. in parts of India and Bangladesh scenes; and it uses dynamic archiv- Mosmondor answered questions was a bank that supports micro- ing of high-quality images from dif- regarding the relative importance of loans. Tapan Parikh (University of ferent locations to reduce the den- bandwidth and processing capabili- Washington) pointed out that the sity of camera deployment and the ty, the necessity of 3D rather than positive effects we foresee from amount of image data that needs to 2D face presentation, and the rela- technology have to do with the ap- be delivered. He concluded the talk tion of LiveMail to other efforts. plications that emerge from those with a discussion of their evalua- Mosmondor stressed that LiveMail technologies. Rashid agreed that tion. Overall, they found the num- does not require high bandwidth so this was true and then discussed ber of live cameras to be the bottle- much as high processing capability, whether this would cause the field neck, not the total number of such that the server could be of computer science to lose its cameras. dropped out of the architecture if identity. He pointed out that this After the talk, McCurdy discussed the client (e.g., cell phone or PDA) style of research can bring risks with the audience issues such as is fast enough. from a career and from a funding how to fetch archival images, how MediaAlert—A Broadcast Video perspective, but he also pointed out to deal with outdated images, the that about 50% of the basic re- Monitoring and Alerting System for impact of GPS precision on system Mobile Users search investment made by Mi- performance, and the network crosoft goes toward projects that challenges posed by the possible Bin Wei, Bernard Renger, Yih-Farn funding agencies are not (yet) com- use of omnidirectional cameras. Chen, Rittwik Jana, Huale Huang, Lee fortable with. He expressed hope Begeja, David Gibbon, Zhu Liu, and Be- that funding agencies would even- More information is available at hzad Shahraray, AT&T Labs—Research tually view these cross-disciplinary http://peanutgallery.homeip.net/ drupal/taxonomy/term/1. Bin Wei presented MediaAlert, a projects as legitimate research system for delivering TV news ac- areas. LiveMail: Personalized Avatars for cording to the interests of users. Mobile Entertainment Wei discussed the techniques em- APPLICATIONS ON THE GO Miran Mosmondor, Ericsson Nikola ployed in MediaAlert: processing Tesla; Tomislav Kosutic, KATE-KOM; media and extracting descriptive Summarized by Hongwei Zhang Igor S. Pandzic, Zagreb University abstractions of the media, and con- A Systems Architecture for Ubiquitous Miran Mosmondor presented Live- tent repurposing to disseminate Video Mail, a prototype system for com- media to devices with distinct pro- Neil J. McCurdy and William G. municating personalized images tocols and capabilities. He stressed Griswold, University of California, between mobile devices such as cell the importance of aligning unsyn- San Diego phones. He briefly introduced the chronized images and captions and the scalability of the system to sup- Neil McCurdy presented the archi- concepts and techniques of 3D port a large number of users and tecture of their ubiquitous video modeling, face animation, and 3D devices. system, RealityFlythrough, which graphics on mobile platforms, then enables users to visually explore a described the client-server architec- Wei walked through a scenario remote area in real time. McCurdy ture of LiveMail. The client enables where a user created an identity started by discussing the potential the user to customize the charac- and topics of interest, and Medi- applications of the system, which teristics of face animation, and the aAlert then analyzed and delivered include disaster response, remote server acts as the relay between the related images to the user. Wei also monitoring, remote navigation, and sender and the receiver. Mosmon- pointed out that media processing virtual shopping. dor stressed the importance of per- takes longer than dissemination forming face animation using and noted the importance of deal- McCurdy introduced the technical parameter adaptation, instead of ing with false positives in alert. challenges: the low density of cam- transmitting raw face images: by era deployment, the continuous David Kotz asked whether the transmitting the parameters instead movement of both the object and user’s location is used in the selec- of images, LiveMail only requires a the camera, and the need for reli- tion of content. Wei replied that it bandwidth of 0.3Kbps, which is able, live, and real-time data deliv- is and that the news content itself is available even in resource resource- ery. He then elaborated on how he location-dependent, because differ- constrained devices such as cell dealt with these technical chal- ent news channels provide different

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 65 content. Mark Corner asked simulate a lost key by impersonat- knows this is true, since he’s not whether closed captioning was im- ing one device and telling the other Alice, so he and Alice can agree portant to this system. Wei replied device that it has forgotten its key. that the bit is 1. If Alice instead had that closed captioning is very im- This action causes the two devices said, “I’m Bob,” Bob would know portant to the existing system and to run the full pairing process the that this is false, so they would that this is an area that will evolve. next time they communicate, thus both agree on the bit being 0. Eve Tristan Henderson asked whether opening the devices to the attack. hears these messages, but does not they had done any usability testing. Shaked suspects that custom hard- know who was the source and Wei said that they had only used ware would be required, because it therefore cannot determine the val- the system within their own team may be necessary to spoof the Blue- ues of the bits. and that more usability testing was tooth ID and because the timing of The challenge is to construct an en- necessary. Andre Hesse asked what the attack is critical. vironment where Eve cannot deter- happens once the user has received Shaked concluded the talk by sug- mine the source of a message. In the maximum number of alerts for gesting possible countermeasures this talk, Castelluccia discussed the day and then something really to this attack. One consequence of three approaches Eve could take to important happens. Wei clarified the re-pairing attack is that the user discover the source of a message. that the maximum number of alerts is asked to re-enter the PIN. Users She could use timing information, per day is for the scheduled alerts should be wary of such requests. signal power, or frequency. He then and that the system sends the most Users should also enter their PINs described ways that Alice and Bob relevant clips first. Real-time alerts as infrequently as possible to re- could protect against each of these cover emergency events and are not duce the risk of an attacker eaves- techniques, one of which required subject to the maximum. dropping on the pairing process. physically “shaking them up.” He also suggests that the hardware Someone suggested that an attacker SHAKE ’EM, BUT DON’T CRACK ’EM manufacturers use the 128-bit PINs could intentionally create a colli- that are allowed in the standard to Summarized by Neil McCurdy sion when Alice sends a message to make the brute-force attack less Bob, so Alice and Bob would not Cracking the Bluetooth PIN likely to succeed. Even when using agree. Castelluccia pointed out Yaniv Shaked and Avishai Wool, Tel Aviv a 128-bit PIN, though, a denial of that this would not work, since University service attack could still be per- 802.11 is reliable: Alice would ex- formed by constantly running the This paper describes the implemen- pect an ACK from Bob. Another re-pairing attack. tation of an attack on the Bluetooth questioner wondered whether cam- security mechanism and received Shake Them Up! A Movement-Based eras could be used to record the po- considerable press coverage prior Pairing Protocol for CPU-Constrained sitions of the devices as they were to MobiSys. Devices being shaken. The question was mostly asked in jest, but Castelluc- Shaked presented results which Claude Castelluccia, INRIA, France and cia pointed out that, even if such an show that a four-digit PIN (the PIN University of California, Irvine; Pars attack were feasible, the devices size on many commodity devices) Mutaf, INRIA, France could be obscured from view while can be cracked in less than 0.3 sec. This talk complemented the Blue- being shaken. on an old Pentium III 450MHz tooth cracking talk. Claude Castel- computer, and in 0.06 sec. on a luccia showed how secure authen- Pentium IV 3GHz HT computer. tication between sensor devices MOBILE SERVICES Because the attack uses brute force, could be performed across a clear Summarized by Mike Blackstock the time to crack larger PINs in- channel without using a PIN and Reincarnating PCs with Portable creases by a factor of 10 for each without using anything as CPU- SoulPads additional digit used in the PIN. intensive as a public key. Other The second contribution of this goals were: no preconfiguration, no Ramón Cáceres, Casey Carter, Chandra paper is the re-pairing attack, extra cost, no special equipment, Narayanaswami, and Mandayam which forces two Bluetooth devices and protection from denial of serv- Raghunath, IBM T.J. Watson Research to rerun the pairing process that ice attacks. Center had just been described. Because The strategy assumes that an eaves- Awarded Best Paper! devices usually store the link keys dropper cannot tell the source of a Mandayam Raghunath presented indefinitely, the first attack only message (source anonymity). As- SoulPad, a new approach to solving works during the short interval suming you have two devices, Alice the problem of providing a person- when devices first connect to each and Bob, Alice can send a message al and customized computing envi- other. However, an attacker can to Bob that says, “I’m Alice.” Bob ronment anywhere. When users

66 ;LO GIN: V OL. 30, NO. 5 move locations, they want to sus- execute applications on a server DeltaCast: Efficient File Reconciliation pend their work and then start up over the Internet. The bandwidth in Wireless Broadcast Systems again exactly where they left off. over the Internet is often limited to Julian Chesterfield, University of In the SoulPad solution, the PC a T1 (1.5Mbps) and latency is high, Cambridge; Pablo Rodriguez, (body) is separated from the ses- leading to poor response to desktop Microsoft Research sion state and bits (soul). SoulPad interaction. In a manner similar to uses a pocket-sized USB drive that cyber-foraging, Slingshot replicates Julian Chesterfield presented the can be plugged into any PC to state from the remote (home) com- DeltaCast system, which very effi- allow users to resume their person- puting environment to a VM run- ciently reconciles two versions of a alized computing environment. ning on one or more surrogates. Be- file between a server and any num- The key enablers for SoulPad are cause Slingshot only replicates the ber of clients with any version of a fast, small, high-capacity USB 2.0 state, users can fall back to their file using a pure radio broadcast drives from mass market media home server in the event of surro- system. DeltaCast uses a hierarchi- players, auto-configuring OSes gate failure. Once a replica is in- cal hashing scheme, combined with such as Knoppix, and mature virtu- stantiated on one or more surro- decomposable hashes and erasure alization technologies. gates, a proxy broadcasts user coding for high efficiency. Delta- Cast was compared with file down- Although the virtual machine and interaction to all replicas and the home server. The applications test- load, flat hash, hierarchical hash swap space use an encrypted file schemes using Web pages, and bi- system, SoulPad is still currently ed are the VNC desktop and speech recognition, though Su only dis- nary data such as software up- vulnerable to hardware and BIOS grades. From this evaluation, it was attacks. To address reliability, the cussed the VNC application in her presentation. determined that the number of hi- system supports network backups, erarchy levels could be dynamic though local backup is also possi- The system was evaluated to show depending on the data type. Com- ble. To deal with hardware diversi- that once the state was transferred, pared to hierarchical and single- ty, the system needs to keep many taking 27 minutes, the interaction layer hash systems, the time re- drivers up to date. Some older sys- performance was 2.6 times faster. quired to get the data required to tems cannot boot from a USB don- Using a microdrive to store volatile update a file on a client is also gle, so a small boot CD could be re- state and chunk hashes sped things much lower for Web pages. The quired. up considerably, requiring only 6 amount of data downloaded is the Raghunath was asked whether minutes: about 3 minutes to trans- same as in hierarchical hash there are other issues that need to fer state, another 3 to replay the schemes and less than in single- be addressed to put SoulPads to logs. Overall the system improves layer hashing. DeltaCast trades off use. One issue is that certain appli- performance for low-latency appli- decoding time, but the overall cations check the hardware and as- cations and hides surrogate failures penalty is low. This system can be sume that they are pirated when by using replicas and broadcasting applied to not only broadcast radio they have been moved to a new PC. the interactions. networks but also IP multicast, There are also legal issues that need Ramón Cáceres asked if the system overlay networks, and content dis- to be addressed. relied on the applications running tribution networks. Further information is available at in a deterministic manner; for ex- One attendee asked why the au- http://www.research.ibm.com/ ample, could a Slingshot applica- thors did not consider the use of WearableComputing/SoulPad/ tion maintain communications Turbocodes or other well-known soulpad.html. from the outside world? Su re- hash algorithms. Apparently these sponded that determinism is re- hashes do not require the system to Slingshot: Deploying Stateful Services quired and that outside communi- in Wireless Hotspots solve linear equations, but use iter- cation is not yet supported. ative algorithms, so they should be Ya-Yunn Su and Jason Flinn, University Another member of the audience faster. Chesterfield answered that of Michigan asked whether requiring VMware they used a hash that was already Ya-Yunn Su presented the Slingshot on all surrogates might perhaps be available to them and understand system, which alleviates the bottle- too strong an assumption. Su re- that perhaps their system may not neck associated with the back-haul sponded that they do assume that be optimal in this area. Maria connection to hotspots by replicat- the surrogate is already running Ebling asked whether the carousel ing application state to surrogate VMware, but suggested that per- size of hashes or data on a given computers closer to wireless access haps VMware itself could be trans- channel was fixed. Chesterfield an- points. She presented the scenario ferred first in the future. swered that any number of erasure where a user brings a Pocket PC codes can be generated and any of into a coffee shop and uses VNC to

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 67 these blocks/codes is useful in re- ary talk on the technological and their role in improving the robust- generating data and hash codes. societal implications of the devel- ness of systems (i.e., punishing ac- opment of mobile pervasive sys- tions that are not allowed, such as POSTERS AND DEMONSTRATIONS tems, with an emphasis on the ro- spam, viruses, etc). But Spector bustness of these systems. also pointed out the challenges of Summarized by Denitsa Tilkidjieva Spector first described the trend of InfoP: privacy, storage for massive This session hosted 24 displays of mobile systems. On one hand, the data, digital signatures and CAs, researchers’ work in the field of ever-growing modality, decreasing and law enforcement across inter- mobile systems, applications, and form factor, declining cost, and ex- national boundaries. services. A wide range of applica- ploding connectivity have been Spector discussed the importance tions were on display. We highlight pushing the development of mobile of the “science of design” in in- just three of these here: systems. On the other hand, med- creasing system robustness. Besides The pre-hospital patient care sys- ical informatics and security appli- the traditional time and space com- tem (Hashmi et al.) consisted of a cations are calling for mobile sys- plexity, Spector specifically noted number of pulse sensors, attached tems. For instance, the market for the need to manage the complexity to patients’ fingers to monitor vital health care is about $1.5 trillion per of usage (i.e., user interface). To signs. It is most useful when multi- year, which is about as large as the this end, he argued that we should ple patients are in need of atten- whole IT industry. Nevertheless, in pay attention to the meaning and tion, because it can allow quick de- most scenarios of mobile pervasive measurement of robustness and of cisions at critical moments. systems, we are envisioning amal- design methodologies. He also em- gams of components. The systems phasized the importance of simple The inHand system for ubiquitous are usually so complex that we can- yet flexible architectures, as well as personalized interactive content not prove their correctness. In self-healing and self-optimization. (Bhatti et al.) was shown in both a many cases, it is also impossible to Finally, he suggested that the tech- poster and a demonstration. The anticipate what may go wrong in a nical community should work with researchers demonstrated the in- system. the wider society to address a Hand device and how it can be broader range of robustness issues. used to gather user-customizable Given the complexity of pervasive information about movies, events, systems, Spector argued that one and the like. key challenge of designing these SPEEDY WI RELESS systems is to guarantee their ro- If you try to cook a turkey, a duck, Summarized by Xiaoqiao (George) Meng bustness, e.g., ease of use, evolu- and a chicken all in one, you will tion, QoS, reliability, security, and Improving TCP Performance over get a Turducken. Sorber and his fitness to purpose. Wireless Networks with Collaborative colleagues borrowed this name for Multi-Homed Mobile Hosts their system for hierarchical power After discussing the challenges management, consisting of a lap- posed by mobile pervasive systems, Kyu-Han Kim and Kang G. Shin, Uni- top, a stripped-down PDA, and a Spector analyzed why existing versity of Michigan mote. The device always chooses techniques and architectures do not Kyu-Han Kim pointed out that the platform that performed the satisfy the need for robustness. To wireless networks have capacity given task at the lowest energy cost, demonstrate design for robustness, limitations. In practice, a group of thus extending the lifetime of the Spector described a hierarchical ar- nearby hosts may constitute a mo- device up to 10 times. chitecture based on a common bile collaborative community trusted computing base, a secure Photos of the posters and demon- (MC2) where hosts share band- hypervisor, and trusted virtual do- strations can be found at width and content. In such a com- mains. Spector stressed the impor- http://www.mtholyoke.edu/ munity, each host is multi-homed tance of adopting a top-down ap- ~dntilkid/mobisys. and data is multiplexed to improve proach and the development of utilization. Accordingly, it is impor- trustworthy capabilities (e.g., attes- tant to allow TCP to achieve high PLENARY SESSION tation, privacy services, and au- utilization of the aggregate band- Summarized by Hongwei Zhang thentication). width over multiple interfaces, but this presents several challenges, Staying Off the Hot Seat with Cool Mo- In particular, Spector stressed the from requiring exact link-state in- bile Systems concept of “information prove- nance (InfoP),” by which the origin formation, to coping with dynamic Alfred Spector, IBM Software of information can be identified wireless links, to handling out-of- Alfred Spector, chief technology of- with proof. InfoP can provide the order packet delivery, to controlling ficer at IBM Software, gave a vision- basis for law enforcement to play congestion.

68 ;LO GIN: V OL. 30, NO. 5 To cope with these challenges, Kim lenges, from coping with limited by limiting how much data each introduced PRISM, Proxy-based In- bandwidth, to dealing with applica- host can send. The proposed solu- verse Multiplexer. PRISM consists tions having dissimilar needs, to tion is called Overlay MAC Layer of adaptive scheduling, intelligent dealing with dissimilar network (OML). OML uses readily available, acknowledgment controlling, net- channels with varying QoSes. inexpensive hardware, and can work-assisted fast recovery, and Horde middleware provides a num- evolve to meet diverse application IMUX at the proxy’s network layer. ber of services, the most novel of scenarios/requirements. Rao then PRISM architecture has an adaptive which is QoS modulation. With described how to implement OML. scheduler (ADAS) to achieve cheap QoS modulations, applications He summarized the results of their and adaptive fair-scheduling. express stream QoS sensitivities, evaluation on both a six-node test- PRISM has been implemented in where the QoS is multi-dimension- bed (based on Click from MIT) and Linux kernel 2.4.20 and netfilter. al. When an application sends data, a Qualnet network simulator. They PRISM-IMUX is a filter at a net- it receives some utility from the found that the disparity between work layer. The authors also de- consumption of its data at another one-hop and four-hop flows was re- vised a testbed, by which they host. Applications express QoS duced but that the throughput on found that PRISM delivers 95% of “objectives” which define QoS con- the one-hop flows was also greatly the aggregated bandwidth. In a set- straints on streams. In summary, reduced. ting with three heterogeneous mo- the goal for Horde is to build a flex- Himanshu Raj asked whether pro- bile nodes, PRISM achieved more ible network striping middleware viding fairness reduced the overall bandwidth than vanilla TCP. for WWANs. throughput. Rao explained that this David Kotz asked how members of David Kotz asked how many appli- approach actually increases the the community find one another. cations the authors have examined throughput, though it is not shown Kim answered that they can use the and how many more they plan to in the graph because the graph service location protocol to find ex- explore. Qureshi expressed con- does not account for starved flows. isting collaborative communities, cern that the interface may be too Ramón Cáceres pointed out that but that forming communities and rich, in that you may be able to ex- with an asymmetric link, one node dealing with membership dynamics press more than is necessary. They would never be in the active list, is an area for future work. are building the telemedicine appli- but Rao clarified that a node in the Horde: Separating Network Striping cation to gain more experience middle can piggyback information Policy from Mechanism with the system. about the other node so that it will be added to the active list. Richard Asfandyar Qureshi and John Guttag, An Overlay MAC Layer for 802.11 Paine encouraged Rao to propose MIT Computer Science and AI Laborato- Networks some of these changes to 802.11 ry Ananth Rao and Ion Stoica, University because, although 802.11n address- Asfandyar Qureshi presented of California, Berkeley es some of these issues, it would be Horde, a middleware mechanism Multi-hop wireless networks are useful to address the other ques- that allows multi-stream applica- being considered for last-mile tions as well. Brian Noble asked tions to communicate over multi- broadband connectivity. However, what happens if one of the nodes ple channels with widely varying such networks are subject to issues gets greedy, and Rao explained that latency and bandwidth. The au- of fairness. This work addresses they currently assume that the box thors were motivated to build this issue; specifically, the authors is tamper-proof. Horde in order to support mobile show how to prevent starvation of telemedicine, specifically an appli- flows without changing the hard- OPERATING SYSTEMS FOR cation that allows doctors to exam- ware. SENSOR NETWORKS ine patients in transit to the ER. This application requires the trans- Ananth Rao first described a starva- Summarized by David Johnson tion problem with 802.11 MAC mission of unidirectional video, bi- Design and Implementation of a that they identified by using a test- directional audio, and low-rate Single-System Image Operating bed with six nodes. They found physiological data streams in real System for Ad Hoc Networks time from a moving ambulance. that the cause of this problem was Hongzhou Liu, Tom Roeder, Kevin This system requires high through- interference and asymmetric carrier Walsh, Rimon Barr, and Emin Gün Sirer, put, low latency, and the ability to sense. Rao pointed out that existing Cornell University deal with vehicular motion in an solutions to this problem (e.g., con- urban area. tention-based MACs and TDMA) Hongzhou Liu presented Magnet- require hardware modifications but OS, a distributed operating system Network striping in a WWAN envi- that the starvation problem can designed for use in ad hoc net- ronment presents substantial chal- also be solved above the MAC layer works. Liu observed that ad hoc

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 69 network applications are extremely SOS: A Dynamic Operating System for costs were 400 times as much as for difficult to program, even today. Sensor Nodes SOS, because updates to TinyOS re- MagnetOS responds to this prob- Chih-Chieh Han, Ram Kumar, Roy Shea, quire the replacement of the entire lem by combining all network Eddie Kohler, and Mani Srivastava, system binary. Kumar observed that nodes into a single, event-based University of California, Los Angeles the important metric to observe virtual machine; this abstraction when comparing update energy eases application development and Ram Kumar presented SOS, an op- costs is the frequency of updates. erating system designed for use in increases network lifetime. In Mag- Himanshu Raj asked what impact netOS, synchronous and asynchro- sensor network applications. Be- cause sensor networks often mani- one module can have on another nous events signal code execution and whether they can crash one an- by triggering event handlers. A stat- fest in long-term deployments, in- dividual nodes must be flexible to other. Kumar responded that the ic partitioning approach converts architecture provides no form of application class files into event respond to remotely controlled changes. SOS is a modular, applica- memory protection and that a wild handlers. Migration of event han- pointer could corrupt the data dlers provides improved energy ef- tion-independent operating system that supports dynamic reprogram- space of another module. Unfortu- ficiency and saving of computation nately, there is no hardware support if loss of power is imminent. Mag- ming via module updates and re- placements. In contrast, TinyOS, to correct this problem. Bhanu netOS provides several migration Pisupati asked about the program- algorithms that are designed to the de facto sensor network operat- ing system, produces a static binary ming model. Kumar responded that minimize communication energy it is described in more detail in the overhead. composed of both system and ap- plication-level functionality and paper, but that all modules are im- Liu presented results from evaluat- must be recompiled and replaced plemented as message handlers and ing MagnetOS with an application on each node to effect an upgrade. listen on a particular port for mes- called SenseNet, in which there are Another similar system, Maté, pro- sages intended for them. Jason a number of fixed sensors and mo- vides a virtual machine that can Flinn found the idea of safety bile data processing components. execute small code fragments dis- checks fairly interesting, but asked They compared a number of algo- tributed through the network. Ap- for clarification about how this re- rithms, three that required no com- plication-level upgrades are possi- ally works. Kumar responded that munication overhead and two that ble, but interpreter upgrades must static checks would require analyz- did. The TopoCenter(1) algorithm, fall back on the TinyOS update sys- ing the entire source code and which moves objects using one- tem. SOS consists of a static kernel, would not solve the problem. He hop neighborhood knowledge from which provides an abstraction of clarified that the base stations each communicating partner, in- the hardware, and is installed on all maintain some information about creased system lifetime by a factor nodes. Dynamically linked mod- the kind of modules present on the of 2.5. ules communicate with various nodes and do some analysis at load Ahmad Al-Hammouri asked why kernel services and device drivers time before sending a module. they did not use aglets, since aglets as necessary. Modules can register provide a clean environment for functions with the kernel, and po- LOCATION (HERE) mobile agents. Liu explained that tential callers may subscribe to pro- Summarized by Neil McCurdy the major complaint about mobile vided functions. The kernel pro- agents is security, because they can vides priority scheduling, dynamic A Relative Positioning System for Co- execute any code on any node. memory and intra-module message Located Mobile Devices Doug Terry added that MagnetOS is passing, as well as safety features. Mike Hazas, Christian Kray, Hans using a different model; they are Kumar discussed the results of an Gellersen, Henoc Agbota, and Gerd Ko- breaking their code onto different evaluation comparing TinyOS, rtuem, Lancaster University, U.K.; Al- machines. Terry then asked SOS, and Maté. All ran Surge-like bert Krohn, University of Karlsruhe, whether they thought they would applications (Surge is a well-known Germany need multiple algorithms and adapt multi-hop data-gathering applica- Mike Hazas introduced Relate, a between them dynamically or tion for TinyOS). They found that compelling approach to determin- whether one would be sufficient. CPU activity was on average 1% ing fine-grained relative locations Liu responded that they currently higher in SOS than in TinyOS. En- and orientations between proxi- require the programmer to pick one ergy usage during code updates in mate devices. It uses ultrasound and that their experience to date SOS was an order of magnitude peer-to-peer sensing, giving the suggests that the two Topo algo- larger than in Maté, because Maté user relative location accuracy in rithms perform well throughout must update only a small chunk of the 10cm range and eliminating the the range. bytecode. However, TinyOS update need for the infrastructure-based

70 ;LO GIN: V OL. 30, NO. 5 location support that is typically WALRUS: Wireless Acoustic Location mation in the packets. David Kotz needed for such fine-grained accu- with Room-Level Resolution Using Ul- commented that one of the unique racy. The Relate team created their trasound characteristics of this work was own hardware—a USB dongle that Gaetano Borriello, University of Wash- that they were using existing de- has three ultrasound transmitters ington and Intel Research; Alan Liu, vices. He then asked why they de- and sensors. Time of flight of the Tony Offer, and Christopher Palistrant, cided to move toward custom hard- ultrasound signal determines the University of Washington; Richard ware. Borriello replied that one range, and the signal strength, as Sharp, Intel Research Cambridge, UK thing to consider is how much op- recorded by each of the three sen- timization each device could have. sors, determines the angle of ar- Gaetano Borriello presented a sys- Robert Hall asked whether they rival. tem that provides room-level gran- had considered the health implica- ularity by using a combination of To evaluate the system, Hazas de- tions of ultrasound. Borriello said wireless and ultrasound. Borriello that they are not boosting the scribed an experiment in which five began by explaining the impor- laptops were placed at various posi- speaker output and that one of the tance of room-level localization to reasons to move toward custom tions on a 2.4 x 1.6m surface. One usability. The goals of the system hundred iterations of this experi- hardware was to move further away included low deployment cost, low from the audible range. Another ment were performed, with half of support cost, and a system that was them ensuring that the dongles had member of the audience asked incrementally useful and deploy- whether they had considered plac- line-of-sight to one another. With able. The system should also ap- good line-of-sight, one can expect ing microphones in the room to proach 100% accuracy while main- measure the volume levels and then roughly 9 cm, 33° accuracy. With taining privacy. limited line of sight, one can expect including those volume levels in approximately 11cm, 48° accuracy. Borriello described WALRUS as the WiFi packets so that clients Hazas closed by discussing some of being inspired by lightning and would know what volume to ex- the issues with this approach. First, thunder, with WiFi (through an ac- pect. Borriello responded that they he posited that hardware that is cess point broadcast of informa- had walked around the room and better equipped to do signal pro- tion) acting as the lightning and ul- measured the volume levels, but cessing may be able to get the accu- trasound (through commodity they found that this was not impor- racy to as low as 2 or 3cm. He also speakers) acting as the thunder. tant because their detection algo- pointed out that there is a limit to Clients receiving the WiFi packet rithm relied on a relative measure the number of devices that can be keep the packet only if they also of strength. handled, since only one device can detect the sound. The system was tested in two environments under communicate at a time. WORK-IN-PROGRESS many different conditions (music REPORTS (WIPS) The Relate system was demonstrat- playing, conversations ongoing, ed on Monday night, and Hazas keys jangling, doors slammed, Summarized by Ya-Yunn Su was asked how his system differed etc.), and the system proved largely CAM: Architecture for Automating from Cricket. Relate dongles have immune to extraneous noise. Bor- three transducers, are optimized for Paper-Based Processes in the riello concluded by discussing Developing World co-planar calculations, and can cal- some of the limitations in the exist- culate the angle of arrival to deter- ing implementation and presenting Tapan Parikh, University of Washington mine the orientation of the devices. a vision of future prototypes that Tapan Parikh said that in develop- The Relate system also does not re- might one day be located in a store ing countries paper-based processes quire any calibration. near you: a wristwatch receiving are inefficient, but cell phone pene- One questioner pointed out that lo- device, with light bulb and air tration is growing, and most cell cation accuracy is more important freshener transmitting devices. phones have built-in cameras. He as the devices get closer to one an- Lin Zhong asked whether they had proposed using such mobile other. For example, 10cm accuracy considered listening to the ultra- phones to digitize manual, paper- may not be adequate when the de- sound signal first and then the driven processes. A user could take vices are only 2cm apart. Hazas WiFi. Borriello responded that this a picture of a paper document, agreed. There was also a question approach might be perfectly rea- transfer the data to a remote ma- about whether the signal process- sonable, but that they haven’t tried chine, and propagate it to the ap- ing could be done in software. it. Ed Nightingale asked whether propriate recipient. The receiver Hazas did not see any reason why they had considered rooms that could then print the document. not. were changing. Borriello responded http://www.cs.washington.edu/ that you could put arbitrary infor- homes/tapan

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 71 Smart Attire–The Digital Diary from syslog data, extracts mobility located at http://www.guide Tarek Abdelzaher, University of Virginia characteristics from the user’s path, .lancs.ac.uk/. and extracts speed/duration/pause Invisible Agents The author proposed building sen- to generate the mobility model. sors into clothing to record user ac- Nobuo Kawaguchi and Negishi Yuuya, tivity. The smart attire includes ac- Emulab Unleashed! Into the WiFi and Nagoya University Mobile Wireless Dimensions celerometers, magnetometers, and Nobuo Kawaguchi pointed out that temperature, sound, light, and GPS David Johnson, University of Utah in a world with many computers sensors. Possible applications are David Johnson presented Emulab, a (PC, laptop, PDAs, etc.) and smart personal digital diaries, and med- network emulation testbed. Due to appliances (music players, TiVo), ical monitors. The prototype sys- the complex nature of wireless net- we do not have a good interface to tem is a winter jacket with five works, simulation is not enough in control and aggregate the interfaces motes and a GPS tracking device. testing and validating new research of all the devices. They built invisi- The motes on the jacket record ac- ideas. A real testbed like Emulab ble agents to solve this problem. tivities and upload the data when would be a better way. Emulab has The example is a conference room near an access station using added three wireless testbeds in- with a projector, multiple monitors, 802.15.4. One example shows that cluding a building-scale WiFi test- ceiling lights, and similar equip- the author can infer the user’s activ- bed, a fixed sensor net, and a mo- ment. They can combine brightness ity (e.g., typing, walking) from the bile WiFi network. The WiFi with a human sensor to control the data collected by accelerometers. testbed enables remote hardware ceiling light, or a human sensor to Another example shows the user’s reset functionality, since it is diffi- activate the projector. On each of path or stillness by data from the cult to find the real location and re- the target devices, they run a VNC GPS device. The user may use data boot the machine manually. There server. The master device connects mining techniques to understand are also fully programmable motes to all the target devices. personal life patterns (e.g., Is my and mobile robots with a tracking social life declining?) and query the A Social Networking Web Site for the system accurate to 1cm. Emulab Research Community history to keep track of health or fi- can be remotely accessed and al- nancial activities (e.g., When is the lows multiple users. It provides a James Scott and Richard Sharp, Intel last time I visited my dentist?). The realistic and repeatable wireless en- Cambridge prototype system shows that power vironment. Future work includes James Scott said that the popula- management is important for smart automated rechargeable stations tion of researchers is huge and attire. and power monitoring. More growing. There are many types of Extracting a Mobility Model from information can be found at relationships between researchers Network Traces http://www.emulab.net. (e.g., as co-authors, work col- Minkyong Kim, Dartmouth College Content Management for Mobile and leagues, conference attendees), but Pervasive Experiences communities that overlap might Researchers need a realistic mobili- also be isolated (e.g., SIGMOBILE, ty model to simulate the effective- Nigel Davies, Lancaster University Pervasive/UbiComp, SIGCOMM). ness of new algorithms on wireless Nigel Davies emphasized the im- One useful feature of a social net- networks. There are two ways to portance of content in a working Web site would be an easy generate network traces: syslog and ubicom/pervasive computing envi- home-page generator containing GPS. Syslog data collected on ac- ronment based on experience in basic information (e.g., bio, publi- cess points contains client events, GUIDE deployment. Their solution cations, photos). To prevent spam- including time and action. It has was to assign many students to ming, the Web site could be semi- the advantage of availability of work on content and the user inter- exclusive or could follow Gmail’s large data sets, but it is often hard face. Other projects, such as Can invitation-based model. The Web to estimate a user’s location from an You See Me Now? (http://www site would encourage people to reg- AP location, perhaps due to the de- .canyouseemenow.co.uk) and ister and verify their information. vice not being associated to the Equator (http://www.equator.ac.uk/) closest AP or perhaps because of Crawdad—A Community Resource for reached similar conclusions. In the Archiving Wireless Data at Dartmouth some device-specific implementa- e-Campus project, content can tion. An alternative is GPS. Unfor- come from users, be automatically David Kotz, Dartmouth College tunately, there is no GPS data set generated, etc. The challenge is David Kotz pointed out that there publicly available, and it does not how to manage multimedia content is little real wireless network traffic work indoors. To address these lim- in future mobile and ubiquitous available for researchers. They initi- itations, Kim extracts the user’s computing environments. More in- ated Crawdad as a facility for stor- path from time and AP location formation on the GUIDE project is ing data sets collected from real

72 ;LO GIN: V OL. 30, NO. 5 wireless networks. They already tion technologies are GPS and Robert Harle asked what the test had a campuswide wireless infra- WiFi. GPS devices are not accurate environments were like. Youssef in- structure for collecting traces. The and have limited coverage. WiFi dicated that it was a typical CS de- challenges include developing a can only be used on limited de- partment with offline measure- common data format, importing vices. Based on calculations from ments taken at night and used existing data, and anonymizing GSM tower signal strengths, the during the day to capture environ- data. They hope to coordinate with cell phone can provide accuracy ment variations in a typical deploy- other communities to develop net- within 30cm outdoors and 4m ac- ment. Mark Corner asked how the work trace formats and tools. The curacy with a 1m grid. They plan to new dynamic power control feature tools and data can also be used in make the trace publicly available. would affect location determina- course projects. More information tion using the new WLAN systems. is available at http://crawdad.cs LOCATION (THERE) Youssef thought that their ongoing .dartmouth.edu/. work in automatically generating Summarized by Mike Blackstock Secure Mobile Architecture the radio maps based on environ- The Horus WLAN Location ment changes may be effective in Richard Paine, Boeing Technology Determination System dealing with this problem. Richard Paine proposed a secure Moustafa Youssef and Ashok Agrawala, Deploying and Evaluating a Location- mobile architecture that can cryp- University of Maryland Aware System tographically identify each packet. They can support mobility by Moustafa Youssef presented the R.K. Harle and A. Hopper, University of transparently changing the address Horus system used to determine in- Cambridge, U.K. for the user and application. This door locations to an accuracy of Robert Harle presented their expe- framework improves their enter- less than 2m by using existing rience with deploying and using prise network by reducing opera- wireless LAN infrastructure. Horus, the Active Bat system at Cam- tional cost and complexity. They like other WLAN-based locating bridge. The Bat system is accurate use four techniques to achieve their systems, uses APs as reference to 3 to 5cm in three dimensions. goal: points and the observed signal The data on which this study is strength to these APs to estimate 1. PKI: Public key infrastructure. based was collected over a period distance via triangulation. Howev- Each client uses his/her badge for of more than two years from a de- er, when indoors, the observed sig- client authentication. ployment that covers about 500 nal strength readers can differ by square meters in their new build- 2. HIP: Host mobility protocol. 15dBm for a given distance. Like ing. Communications are based on Radar, Horus uses a radio map to Harle found that the killer applica- IPSec, therefore each host is identi- characterize the area to counter tion for this type of system was al- fied by a security parameter index these effects. Unlike Radar, howev- lowing companies to learn more (SPI) rather than IP. Each host is er, Horus is a probabilistic system about how office building space is further identified by a host identity rather than deterministic. tag (HIT), which is SHA-1 of the used, to encourage people to work The goals of this system were high public key. more effectively. Surveys showed accuracy, low computational re- that privacy was not an issue, but 3. NDS: Network directory servic- quirements, energy efficiency, and that result may not extend beyond es. The client goes through an iden- scalability (both in number of users small communities such as theirs. tification process before using and in the area covered). The A more meaningful study of such LDAP. Horus techniques accounted for a systems would require deployment 4. LENS: Location-enabled net- 25% reduction in the average dis- in a corporate office or perhaps a work services: see http://www tance error. The Horus system is hospital where there is more over- .opengroup.org/bookstore/catalog/ shown to have higher accuracy on lap in working hours and collabo- select.tpl?text=secure+mobile+ average than Radar by more than ration is more common. 82% and better than 27% for the architecture. James Scott thought it was interest- probabilistic system, and is more GSM War Drive ing that the corporate space usage computationally efficient by an was the best application rather than Mike Chen, Intel Seattle order of magnitude. The authors an application that benefited end then applied the Horus ideas in Mike Chen presented their goal of users. Harle indicated that in fact Radar and showed that these ideas providing a playground for location- one company expected to get a re- could reduce Radar’s average dis- based services. Cell phones are the turn on their investment in one tance error by more than 58% and location devices people already year of data collection, without decrease the worst-case error by carry every day. Some current loca- using any of the software that could more than 78%.

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 73 benefit the end user. Further, they land (a less dense suburb where MORE POWER TO YOU were not interested in deploying homes are 15 to 20 feet apart)— end-user applications until they and compared their location esti- Summarized by Ram Kumar Rengaswamy had reaped that ROI. Minkyong mates with GPS readings. Energy Efficiency of Handheld Kim asked whether providing more They then applied three different Computer Interfaces: Limits, information to the end user regard- algorithms. Their baseline tests Characterization, and Practice ing the accuracy of a location read- found that the specific algorithm Lin Zhong and Niraj K. Jha, Princeton ing would increase system trust. used did not matter much, though University Harle indicated that they experi- fingerprints performed poorly with mented with a five-bar system Lin Zhong noted that the role of only one AP. Interestingly, errors the user interface has often been ig- ranking and found it useful, but were higher in the more dense that this ranking system did not nored in the design of energy-effi- downtown area, probably owing to cient systems. The speed of the help this issue significantly. Guan- the fact that many APs are higher ling Chen asked whether the high human interaction is significantly up and not contributing to making lower than the speed of the com- accuracy of the Bat system was nec- measurements more accurate. They essary for the applications in this puter. The computer ends up concluded that it is possible to get spending significant time waiting deployment. Harle indicated that acceptable accuracy of 13–20m in although it was not needed for this for user inputs. The energy con- high-density areas, and around sumed by the computer in these type of application, there are class- 40m in lower-density areas, with es of applications, such as those idle periods can be eliminated about 30 to 60 minutes of calibra- through better system design. that use position clicking and tion for the neighborhood. pointing, that require it. He agreed Zhong and Jha compared two that the most important thing is David Kotz asked about the data forms of user input, speech and highly accurate room-level granu- corresponding to Figure 4, where handwriting, using energy efficien- larity. the Y axis is labeled “% of Time”; cy as the metric. Energy efficiency Cheng clarified that it actually rep- is a combination of the speed of the Accuracy Characterization for resents “% of records.” Another Metropolitan-Scale WiFi Localization input and its power efficiency. The member of the audience was asked experiments showed that speech is Yu-Chung Cheng, Intel Research Seattle whether the urban results were af- more energy efficient than hand- and University of California, San Diego; fected by the GPS noise in urban writing. Zhong also designed a Yatin Chawathe and Anthony LaMarca, canyons. Cheng said that they wireless wrist-watch to be used as a Intel Research Seattle; John Krumm, Mi- countered this possible effect by low-power, low-cost cache device. crosoft Research checking that readings were corre- The cache device is a slave to the While an intern at Intel Research, lated by three GPS units and only main computer. It collects user Yu-Chung Cheng and his col- using those that were consistent. input while the main computer is leagues worked to characterize the David Kotz then asked whether it put to sleep. This results in system- accuracy of the Place Lab WLAN would be possible to improve accu- level power savings. racy with the indoor Horus tech- location determination systems for The results evoked a lot of interest use outdoors. Unlike GPS, their niques. Youssef and Cheng dis- cussed this possibility offline. from the audience. Brian Noble system works in urban canyons and raised an interesting point about indoor environments, and it relies Youssef later reported that they concluded that the Horus tech- the usage scenarios of the input on more ubiquitous technology techniques. Listening and writing (check out the density map for niques would be useful when there is more information available for are concurrent operations while Manhattan at http://www.wigle speaking and listening are not. .net). Unlike other WLAN systems, the localization algorithm. For ex- ample, when the number of APs Hence, from a holistic viewpoint, it their system requires less configu- might be more energy efficient to ration time (1km2 area/1 hour) per scan increases, Horus tech- niques can give a significant advan- listen and write than to listen and using war driving. Although it is speak. consequently less accurate tage. However, where you have just (13–40m), this is not an issue for one AP per scan, there is not applications such as a location- enough information to notice a dif- based Web search. ference between the techniques. For their experiment they gathered The traces and source code for this data in three neighborhoods— paper are available at http://www downtown Seattle, an urban resi- .placelab.org/. dential area, and an area in Kirk-

74 ;LO GIN: V OL. 30, NO. 5 Turducken: Hierarchical Power Make synchronous IPC irrelevant. HotOS X: Tenth Workshop on Management for Mobile Devices Synchronous IPC in microkernels Hot Topics in Operating Systems Jacob Sorber, Nilanjan Banerjee, and is expensive. But as we learned Mark D. Corner, University of from Xen, IPC does not need to be Massachusetts; Sami Rollins, Mount Sante Fe, New Mexico on a critical path. Holyoke College June 12–15, 2005 OS is a reusable component. VMMs Jacob Sorber explained that the have achieved complete compo- main principle of hierarchical RELIGIOUS WARS nent reusability, because they treat OS as the reusable component. power management is to pick the Summarized by Alexandra Fedorova most energy-efficient component of During the Q&A session, the most the system for a task. The challenge Are Virtual Machine Monitors fire came from Gernot Heiser, who is in partitioning a given task into a Microkernels Done Right? disagreed with the analysis. Gernot set of subtasks and assigning each Steven Hand, Andrew Warfield, Keir argued that liability inversion is not to the most efficient component. Fraser, Evangelos Kotsovinos, and Dan inherent to microkernels (in UNIX Such an approach automatically Magenheimer, University of Cambridge you have system daemons imple- maximizes system lifetime. It is de- Computer Laboratory and HP Labs mented as user processes that run sirable to have minimum user in- Steven Hard argued that microker- in privileged mode). He also insist- tervention in such a system. nels and virtual machine monitors ed that IPC in microkernels is not a The authors developed the Tur- (VMMs) both emerged to achieve problem: New fast hardware allows ducken system for hierarchical isolation of the software system cheap implementation. Besides, power management in laptops. from the underlying hardware, but Liedtke, using L4 as an example, Turducken consists of a laptop at- used different means to do so. He demonstrated that microkernels tached to a PDA and a mote sensor highlighted similarities and differ- can be fast. node. The role of the mote is to ences among VMMs and microker- OS Verification—Now! maintain clock synchronization nels, talked about architectural les- Harvey Tuch, Gerwin Klein, and Gernot with a time server. The laptop and sons learned with these systems, Heiser, National ICT Australia the PDA derive their clock from the and suggested that it is best to de- mote upon their wakeup. The PDA sign architectures that borrow the Harvey Tuch argued that there is an is responsible for caching Web best from VMMs and microkernels, urgent need to develop practical pages and waking up the laptop to as opposed to sticking to a particu- formal verification tools that can be display the pages once they are lar architecture. used in high-performance industri- al operating systems. This is a chal- fully loaded. The user interacts di- He summarized the differences as rectly with the laptop. The laptop lenging task, and requires the right follows: Whereas VMMs multiplex OS architecture: basically, a micro- responds to user queries (e.g., entire operating systems, microker- email retrieval). Turducken signifi- kernel. VMMs won’t work because nels multiplex many small tasks as they increase the size of the TCB. cantly lowers average power con- threads. VMMs are closely aligned sumption compared to convention- with hardware, so the interface The rest of the talk was a survey of al systems. looks like hardware; microkernels available formal verification meth- The audience provided some very expose a higher-level interface, and ods. Formal verification is done by interesting comments and sugges- tasks communicate using synchro- constructing a system model, a sys- tions. The power supply design of nous IPC. VMMs offer one address tem specification, and a verification such a system was discussed. It was space per scheduled entity; micro- tool that checks for desired proper- concluded that it would be most kernels offer multiple scheduled ties using the model and specifica- energy efficient to have separate entities per address space. Architec- tion as input. Examples of formal batteries for each system compo- tural lessons learned from research specifications are HOL temporal nent. Usage of the lower-tier com- with these systems are: and Bayer-Moore logic, microker- ponents during the transition inter- nel APIs. Model checking is usually Avoid liability inversion. Moving done by automatic reachable state- val of a high-tier device from one trusted system code to the user state to another was considered. space exploration or by theorem level, as is done in microkernels, proving. They have already imple- involves having to trust user code mented some preliminary verifica- to perform essential system func- tion tools for the L4 microkernel. tions (e.g., user-level page in Mach). During the Q&A, Aaron Brown pointed out that real customers are using OSes as giant application

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 75 servers; they run databases and ap- that using events is worth all this this by maintaining a radix tree of plication servers on them, and it is trouble. The speaker said that using the blocks in the active image. not clear how much leverage you events is preferable because they Armando Fox and Mary Baker are going to get from verifying just are much faster. Pei Cao responded asked about the fault tolerance of the OS kernel. Several other atten- that based on her experience of de- the system. Andrew responded that dees, including Eddie Kohler and signing large software projects peo- nodes are monitored for failure Rik Farrow, asked how they were ple use both threads and events. (both loss of connectivity and going to deal with changing operat- Events are a pain, so people use deadline failure) and that blocks ing systems. Harvey responded that them only when performance is ab- are sent to at least three nodes to there are two types of changes, im- solutely critical, which does not reduce the chance of losing infor- plementation changes and API happen very often. mation. changes; they have a way of dealing Stupid File Systems Are Better with implementation changes, but STORAGE API changes are more difficult. Lex Stein, Harvard University Summarized by Steve VanDeBogart Making Events Less Slippery with eel Lex Stein discussed some experi- Parallax: Managing Storage for a Ryan Cunningham and Eddie Kohler, ments he conducted to validate the Million Machines University of California, Los Angeles file system speed tricks accumulat- Andrew Warfield, Russ Ross, Keir Fraser, ed over the last 30 years. The clas- This talk continued the old debate Christian Limpach, and Steven Hand, sic assumption in file systems is concerning threads and events. University of Cambridge Computer that the closer the block numbers, Events are fast but difficult to pro- Laboratory the faster it will be to access blocks. gram. The speaker described eel, a However, with virtualized block Andrew Warfield spoke about Par- programming tool designed to sim- numbers on modern storage sys- allax, a system for dealing with the plify programming with events. eel tems, this is not necessarily the increased demand on storage sys- uses program analysis techniques case. To determine how much the tems created by the use of virtual to improve programmability while smart allocation of blocks affects machines. With the adoption of vir- preserving the event model, and performance on a modern storage tual machines in cluster environ- provides the libeel event library, vi- system Lex removed the smartness ments, the number of system im- sualization, and debugging tools. by randomizing the block values in ages per disk has increased by a Programming events is hard be- a trace. factor of 10 to 100. Some organiza- cause it is difficult to understand tions expect that within the next Two traces were used to conduct the flow of the program and diffi- few years they will be supporting tests, one taken during the compi- cult to debug. eel’s visualization one million live system images in a lation of a Linux kernel, the other tools make it easy to visualize the single data center. Furthermore, under the Postmark benchmark. control flow, and the debugger al- techniques that take advantage of The block numbers in these traces lows you to step through each pro- historical versions of VM state were then randomly permuted. The gram flow separately; you follow (e.g., for intrusion detection or modified and unmodified traces the callbacks related to the same configuration debugging) imply ad- were played back in a disk-accurate connection, so each flow of related ditional demand on storage sys- simulator with a varying number of events appears sequential. The tems. Parallax is designed to solve disks used as JBOD (Just a Bunch speaker concluded that events don’t these problems by being able to of Disks), RAID 4, and RAID 5. The need to be hard to program. You scale to many images while sup- simulation results show that with a can use simple tools to extract pro- porting fast and frequent snap- modest number of disks the ran- gram control flow and to help with shots. domized traces started to perform visualization, verification, and de- better than the smart traces. bugging. The key insight behind Parallax is that block-level write sharing John DeTreville suggested that Margo Seltzer pointed out that should not be on the critical path. maybe the smart traces were trying while eel helps clarify the event- The common case is that all active to schedule the blocks into too based program after it’s written, it system images started from a small short a time frame and thus not does not appear to help with the number of base images and then di- taking advantage of the parallel actual writing of the program. The verged. While it is possible that dis- seek capacity of the multiple disks. speaker responded that by allowing tinct systems have the same soft- Lex replied that maybe there is a the programmer to visualize newly ware added on top of the base point where things should get par- written code, eel does make writing image, this is not the major source allelized and that this might be a easier. Margo was not convinced of shared blocks. Parallax handles new trick to improve the perfor-

76 ;LO GIN: V OL. 30, NO. 5 mance of file systems. Russ Cox metrics to account for the true cost peated large resource allocation, added that maybe instead of dumb- of cache hits and misses. improved OS support for resource ing down the file system, since it Several attendees pointed out that isolation) could make revisiting still does help performance on a good access models are needed in this old idea fruitful. single disk, the volume manager order to prefetch the right data. Several people expressed doubt should do a better job of distribut- Athanasios agreed, adding that we that demand outpacing supply is ing blocks. Petros Maniatis suggest- need to come up with some generic only a problem for free usage test- ed adding traces that evenly distrib- models with parameters that can be bed systems. Others noted the diffi- uted the work among the disks for tuned for specific devices or appli- culty in choosing a viable currency comparison: maybe performance is cations. for such a market economy. Al- improved, not by randomness per though history has shown that arti- se, but through hot-spots eliminat- OUTSIDE THE COMFORT ZONE ficial currencies tend to work poor- ed in the random trace. ly, using real money is generally Aggressive Prefetching: An Idea Whose Summarized by Steve Zhang unpopular with user bases, since Time Has Come Why Markets Could (But Don’t users would not start on an equal Athanasios Papathanasiou and Michael Currently) Solve Resource Allocation footing. Finally, real economies Scott, University of Rochester Problems in Systems have problems that are generally addressed by government-con- Jeffrey Shneidman, Chaki Ng, and David Athanasios Papathanasiou put forth trolled regulatory agencies, and an Parkes, Harvard University; Alvin the idea that the performance char- analogous solution would need to AuYoung, Alex Snoeren, and Amin acteristics of common memory and be found for a market-based alloca- Vahdat, University of California, San storage architectures have changed tion policy. enough that in order to get good Diego; Brent Chun, Intel Berkeley Re- performance (where performance search Lab Operating Systems Should Support Business Change may affect energy consumption), Jeffrey Shneidman argued for using aggressive prefetching must be markets to solve the resource allo- Jeff Mogul, HP Labs done. There have been drastic im- cation problem in large-scale sys- Jeff Mogul expounded on why provements in CPU power and tems. Utilization data from systems building systems with business storage capacity with moderate im- like PlanetLab shows how demand changes in mind (e.g., starting or provements in I/O bandwidth, but exceeds supply, especially during changing services, meeting new I/O latency has hardly improved at peak times, and demonstrates the regulations, mergers, and spinoffs) all. This means there is decreased need for an allocation policy. Using is extremely important for most en- risk and increased need to do pre- markets would allow for an effi- terprises. AT&T wireless’s down- fetching. cient policy where those who value fall, HP’s SCM rollout, and Comair’s Bandwidth has increased 40% per the resources the most receive crew-scheduling fiasco were men- year, but latency has only increased them. tioned as anecdotal evidence of the 10%. In order to lower the latency He outlined some problems that difficulties in adapting to business to the same level as before, in- would face any allocation policy. changes and the high costs of being creased prefetching is required. First, it would have to be selected unable to do so efficiently. Agile Furthermore, the amount of mem- and supported by the users. In ad- businesses tend to dominate their ory available to do speculative op- dition, it would be difficult to di- markets. However, most business erations has increased. Memory vide resources, since different re- systems are so complex that slack has increased by a factor of sources in a system may be con- changing or replacing them is ex- 100 over the past 15 years. nected in subtle ways. Consumers tremely costly, and with most IT Having pitched the idea of aggres- would also need to predict needs departments’ current focus on cost- sive prefetching, Athanasios pre- well and accurately value their cutting, change is never easy or sented some research challenges needs based on the currency cho- quick. that would have to be met in order sen. Finally, the implementation of Jeff believes that research should to make aggressive prefetching a market policy requires a method focus on allowing application-level worthwhile: device-aware prefetch- for expressing bids easily and effi- flexibility, which requires standard- ing; characterization of an applica- ciently. Despite these problems, ization at the lower layers at enter- tion’s I/O demand; coordinating I/O and although using market econ- prise scales. OS-level flexibility, requests with device power states; omies have been studied before, he which has been the historical focus speculative predictors that provide believes today’s environ-ment (e.g., of many researchers (e.g., micro- sufficient data coverage; and better demand much higher than supply, kernels, extensible OS), actually a semi-cooperative user base, re- undermines standardization. Al-

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 77 though formal verification may be Christos Karamanolis talked about machine learning, focusing on su- unrealistic for years to come, OS his experience in applying control pervised learning and reinforce- conformance testing should be a theory to systems management. ment learning. In supervised learn- first-class research topic. Also, cap- Recent surveys have shown that ing, the goal is to build a model turing an accurate snapshot of an 75%–80% of IT costs go into man- that can predict system output IT infrastructure at many levels and aging existing systems. A feedback- from sensor values, whereas in quantifying the value of IT (how based control system would allow reinforcement learning the goal much money a machine or a system humans to be taken out of the loop is a model that can control the sys- is making) would be vital for man- and thus cut costs immensely. tem according to sensor values to agement to gauge the costs and However, because computer sys- achieve some desired behavior. benefits of system changes accu- tems change quite frequently, adap- For supervised learning, there are rately. Finally, with the prevalence tive control is needed to dynamical- well-established techniques that of outsourcing, auditability of sys- ly estimate models to be used by can handle high-dimensional data, tems becomes critical for third- the controller. but reinforcement learning tech- party auditors to verify that a cus- The authors experimented with niques work best for low- (5–10) tomer’s requirements are being met using an automated controller as a dimensional data. The speaker only by the supplier. scheduler that handles different briefly touched upon unsupervised Researching these areas is challeng- classes of clients, and attempted to learning but believes that many ing, however, because enterprise have the system maintain a consis- system problems may require such applications tend not to be open tent throughput based on a service- techniques. source and may require millions of level agreement. Some important Moises Goldszmidt then talked dollars and several person-years to lessons were learned: First, more about the challenges of applying install and configure. Studying con- recent controller actions must have any statistical learning technique to trolled testbed systems in a lab en- higher impact on current measure- systems. More specifically, he relat- vironment is not a replacement for ments than earlier actions. Second, ed the problem to his work with looking at real deployed systems. the action-measurement relation- correlating low-level system met- In addition, it is not clear how to ship must be close to linear; where rics with higher-level objectives. measure the success of any method that is not the case, different ac- Finding such correlations not only in addressing these concerns. tions and/or measurements must be would help novice system adminis- tried until a linear pair is found. trators deal with simple problems, IT’S NOT AI, IT’S SYSTEMS More properties from control theo- but would also be useful to experts ry were translated into system re- by providing better visibility about Summarized by Steve Zhang quirements in order to help system the behavior of large-scale systems. Designing Controllable Computer architects design systems more Although there is generally plenty Systems amenable to automated control, of raw data available for problems and these are listed in the paper. Christos Karamanolis, Magnus in this arena, thanks to mature During the Q&A session, it was Karlsson, and Xiaoyun Zhu, HP Labs measurement and monitoring established that although control tools, there is a lack of labeled data As a prelude to Christos Karamano- theory is at least four decades old, that would aid the evaluation and lis’s talk, Elizabeth Bradley of the researchers have only recently ex- comparison of different approach- University of Colorado, Boulder, plored taking formal control-theory es. Another challenge is that learn- gave a brief introduction to control approaches to systems issues. How- ing schemes must be adapted to theory. She talked about simple ever, informal ad hoc methods handle online streams of data. Al- methods and tools that work for based on control-theory principles though the machine-learning com- linear time-invariant systems. have been used for quite some munity has yet to provide any gen- However, operating system prob- time. eral solutions, it is possible to use lems tend to be nonlinear and time- Three Research Challenges at the In- domain-specific knowledge to con- varying, the solutions for which are tersection of Machine Learning, struct efficient algorithms for the usually ad hoc. Although it’s possi- Statistical Induction, and Systems systems area. The speaker also ble to simplify many of these cases noted that in most cases, obtaining to work with a linear system-based Moises Goldszmidt and Ira Cohen, HP true root-cause analysis is neither approach, it’s very important to be Labs; Steve Zhang and Armando Fox, practical nor necessary. Instead, mindful of the assumptions that Stanford University one should strive for diagnosis that must be made for these solutions to Terran Lane from the University of can easily map to possible repair be valid. New Mexico set the stage for this actions. talk by providing an overview of

78 ;LO GIN: V OL. 30, NO. 5 Panel: Control Theory/Machine uration, all files, all system vari- Because there are fixed costs in- Learning ables. Then we have a chance of volved, you have to look at the life- Christos Karamanolis, Elizabeth checking whether the configura- time of automation, to amortize Bradley, Terran Lane, Moises tion is correct. But in order to do fixed costs over time. Automation Goldszmidt this, we need a system model. lifetime can be very short, because The system model can incorporate the software package or the in- This panel focused on the feasibili- staller might change, for example. ty of applying control theory and submodels. Programmers, publish- ers, and remote administrators can Apart from cost, there are the issues machine-learning techniques to of trust and adoption. Automation large-scale distributed systems. write these submodels. Models ex- press rules for composing the pro- is a disruptive force for IT systems The consensus seemed to be that managers. Using an incremental without centralizing data from in- grams into systems. The expecta- tion is that system models will be transition path from manual to au- dividual nodes, it would be very tomatic may be the right approach. difficult if not impossible for these easier to compose from submodels. approaches to effect conformance As a result, no sequence of installs Margo Seltzer suggested that it may with systemwide high-level policy. and uninstalls can result in a badly be difficult to know the lifetime of It was noted, however, that while a formed system instance. The draw- automation. Aaron agreed and said globally optimal solution may be back of this approach is that system that they would like to be able to impossible for distributed systems, models/policies may be too difficult predict it. Christos Karamanolis locally stable methods (e.g., to express. In conclusion, John wondered how they can quantify TCP/IP) are often practicable. pointed out that earlier efforts at fixed costs for the model. Aaron re- declarative configuration were not sponded that this is challenging This led to a reference to biological widely adopted, because they were and that user studies are needed to systems and how nature has solved targeted at programmers. evaluate the costs of complication the problem of achieving global that come with automation. How- goals from local control. Terran Joe Hellerstein asked how to deal with distributed applications. John ever, they do have a way of model- Lane responded that Mother Na- ing such costs. ture has reached such solutions said that this is a hard problem but through billions of years of experi- that improving local system admin- Human-Aware Computer System De- ments on an infinitely parallel su- istration is a good start. Jay Lepreau sign percomputer. He added that if one pointed out that standard program Ricardo Bianchini, Richard Martin, had a problem that had an accurate installers are already designed to Kiran Nagaraju, Thu Nguyen, and analogy in nature, using nature’s handle program interdependencies. Fabio Oliveira, Rutgers University solution would be feasible. Howev- John responded that they have the right mechanisms, but in practice Thu Nguyen began his talk by en- er, most problems do not fit such a couraging the community to con- model. they do not handle these interde- pendencies properly. sider the human as a first-class en- tity in computer system design. He CLEANING UP THE MESS Reducing the Cost of IT Operations— argued that systems designers WE’VE MADE Is Automation Always the Answer? should use human-aware princi- Aaron Brown and Joseph Hellerstein, ples, such as: Summarized by Alexandra Fedorova IBM T.J. Watson Research Center making systems more robust to Making System Configuration More Aaron Brown said that costs of IT human mistakes; Declarative operations are quickly outpacing understanding human actions and John DeTreville, Microsoft Research server spending. A common solu- mistakes; System configuration is hard. A tion is automation, but expenses developing techniques and infra- huge fraction of every user’s time is involved in developing and deploy- structure to increase human under- spent futzing in a computer system. ing an automated solution often standing of systems to prevent, This is the biggest performance outweigh the savings it provides. hide, and undo mistakes; bottleneck. The issue is that a con- Aaron provided a case study where figuration is a shared mutable state. automated solution did not help to designing new metrics and bench- We update the state in place when cut costs. Then he offered a mathe- marks to measure system improve- we install and uninstall. A system’s matical framework for evaluating ments. correctness depends on every in- the cost-effectiveness of a solution. He then described their work on stall and uninstall we have ever Basic cost model: There are fixed understanding operator actions and done. The proposal is to use declar- costs for setup and maintenance mistakes in configuring Internet ative configuration: record every- and variable costs for automated services. The results of this study thing that describes system config- inner loop and per-instance tasks. could be used to fix and prevent

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 79 user mistakes. Thu concluded by locks (via use of atomic sections in- applications. Where’s plug ’n’ play admitting that designing good stead of explicit locking), API vio- for software applications? human-factor benchmarks is hard lations (via type qualifiers), re- Safe system extension: Extensions and recruiting human study sub- source leaks (via computation add new value to applications from jects with the necessary back- stacks), and macro errors (via a the user’s perspective (e.g., Google ground is even harder. safer and newer preprocessor). In toolbar) but are unsafe. If we look Pei Cao suggested that the reason conclusion, Bill pointed out the at the last few SOSP submissions, why networking guys do a lot more problem of having two conflicting safe OS extension has been an ac- online testing is because Cisco goals: starting out with a safe, clean tive area of research. But what routers are specifically designed for foundation, or keeping existing about safe application extension? code relevant. online testing. If people built better Multi-processor cores: We can ex- software, maybe we would not have An audience member asked about pect up to 256-processor cores in such issues with installation com- what linguistic features were novel the foreseeable future, and hence plexity. If you build complex soft- in Ivy, and why. The speaker said need better OS support. ware, you have to have a user that this was future work. Pra- model in mind. shanth Bungale asked why, when The Singularity architecture in- previous attempts such as Cyclone cludes a VMM for abstract hard- ware (with the abstract instruction APPROACHES TO OS RESEARCH failed for this reason, we should be- lieve that Ivy won’t end up involv- set being type-safe, memory-safe Summarized by Prashanth Bungale ing enormous amounts of human MSIL), closed processes (i.e., no shared memory, no dynamic code Thirty Years Is Long Enough: Getting intervention. Bill replied that it is a loading, no dynamic code-genera- Beyond C fundamental goal of Ivy to reduce manual intervention as much as tion), and IPC channels with con- Eric Brewer, Jeremy Condit, Bill Mc- possible, but it remains to be seen tract guarantees. Closkey, and Feng Zhou, University of how much this can be done. Jay One key feature of Singularity is California, Berkeley Lepreau said that Cyclone on the concept of software-isolated Bill McCloskey explained that the TCP/IP (which had millions of processes (SIPs), where each point of this talk was to get people lines of code) actually hadn’t re- process has its own garbage collec- to think of C as a bad habit and to quired very much manual interven- tor and its own garbage collection stop using it. Safety and security tion, and that perhaps what kept domain, exclusive ownership of its have now become more important Cyclone from being widely adopted address space, and no pointers out- than any other factors. Low-level were things like resistance and in- side its address space (guaranteed systems are still unsafe and inse- ertia. because of type-safety). Therefore, cure, and C is the main problem. Broad New OS Research: Challenges to exploit this situation for perfor- Java has failed mainly because it is and Opportunities mance reasons, the entire system is not expressive enough and because run in ring 0. According to their porting applications is expensive. Galen Hunt, James Larus, David Tarditi, IPC micro-performance results, The authors believe that it’s possi- and Ted Wobber, Microsoft Research Singularity is an order of magni- ble to design a systems language Galen Hunt described his working tude faster than hardware-protect- that is safer and better than C, Java, definition of OS research as re- ed systems, because there is no or C#. Areas for improvement in- search into the base abstractions hardware protection domain clude memory management (GC is provided for computation and into change, and an order of magnitude not good enough), concurrency the practical implementations of slower than an in-process proce- (manual locking is too error- those abstractions. dure call, because the IPC involves prone), data layout (the program- Singularity is a research project a GC domain change. mer should have bit-level control with the following hypothesis: Gernot Heiser asked if they also on data storage in memory), and Sound verification techniques can had a new hardware model; other- API adherence (compiler checks be combined with new OS abstrac- wise, how would type-safety help should be automatic). tions to provide dependability, device drivers? Galen responded Their proposal for a new language, reliability, and security (though that their hardware model already Ivy, guarantees that the following sometimes at the expense of perfor- incorporates simple I/O ports, for classes of errors are eliminated: mance). It has focused on the fol- example, but it is not yet sophisti- buffer overflows (via bounds lowing: cated. Armando Fox was con- checking), dangling pointers (via Configuration and manageability: cerned that while the changes re- checked memory management The OS knows a lot about the hard- garding the hardware protection policies), race conditions and dead- ware but next to nothing about the domain (e.g., entire system run-

80 ;LO GIN: V OL. 30, NO. 5 ning in ring 0) may only change very annoying), and new concerns with merge conflicts. Jay Lepreau the performance from being “really bloat and dirty the mainline code commented that the problem is fast” to being “really, really fast,” base. that Linux has a pope model— why get rid of the mechanical intel- Marc then presented C4, a toolkit there’s only one integrator. lect just for this reason? His con- to improve OS evolution. The prob- Panel: Do We Work Within Existing cern was mainly that historically, lem with the existing approach of Frameworks or Start from Scratch? checking through software has patch (1) is that it operates at the Bill McCloskey, Galen Hunt, Marc Fi- been hard to get right. Jay Lepreau lexical level. By contrast, C4 func- pointed out that Andrew Appel uczynski, Robert Grimm, Russ Cox, and tions at the semantic level so that Eric Brewer broke type-safety in secure chips. we can build better tools to manage Galen responded that if you care a complexity and analyze interfer- Chris Small: Mike Jones once said lot, you can always put the process ence semantically. Their approach to me, “When you’re on an expo- in a higher ring; an option can be is to use aspect-oriented software nential path, nothing you do now provided to say, “OK, I don’t trust development (AOSD) techniques. matters.” So, why waste your time this code. Use a separate protection C4 provides a semantic patch com- on Ivy? Why not build a better lan- domain.” Robert Grimm comment- piler through which one can ex- guage instead? ed that using an abstract machine press behavioral changes as seman- Eric Brewer: Do both. We’re not as an executable platform made it tic patches using aspects, which going to get the language right ei- hard to predict . . . The speaker in- provide a language-supported ther; but Ivy is extensible. terrupted him and immediately re- methodology for integrating cross- Robert Grimm: Every year, the pro- sponded that everything is compiled cutting concerns with a program. (no more JIT). gramming language community The C4 toolkit consists of an un- publishes work and more work on patch (1) Considered Harmful weaver and a weaver, which are Java + delta. What if you want Java Marc Fiuczynski and David Walker, analogous to diff and patch, respec- + delta + delta? Extensions are the Princeton University; Robert Grimm, tively. The main difference is that answer. the unweaver actually removes the New York University; Yvonne Coady, Margo Seltzer: Galen’s talk seemed University of Victoria code belonging to an aspect from the baseline code and the weaver so far removed from helping the Marc Fiuczynski said that the key puts it back in the right place, end user. lesson from his talk is that we need using knowledge of C’s abstract Galen Hunt: Extensible applica- better tools to improve OS evolu- syntax to avoid merge conflicts. tions would help the end user to a tion. The open source model is great extent. used everywhere—embedded sys- Their current focus is the engineer- tems, servers, clusters, HPC—and ing effort needed to get the weaver Andrew Hume: Saying “The lan- fosters community development. and unweaver working. Future guage we’re programming in is the Updates are performed through work enabled by C4 includes pro- problem” is a delusion. patches, where the changes can gram analysis tools, identifying Galen Hunt: The language directly correspond to intraprocedural data structure changes, identifying affects what we think. memory safety (or lack thereof), changes (modifications to the inter- Andrew Hume: That’s balls. nal logic of a function), intermod- and capturing programmer inten- ule changes (changes to a func- tions via declarative frameworks. Eric Brewer: I don’t believe any of tion’s signature or a data structure’s Steven Hand commented that a lot that. field-makeup), or behavior changes of the problem simply lies in really Phil Lewis: Education—what lan- (changes to the semantics of inter- crappy open source code. Margo guages do you learn? Let’s not teach faces). Through examples of Linux Seltzer asserted that they are still people C! Ten, twenty years from kernel patches containing separate distributing patches and asked now, the problems will go away. concerns, Marc showed how an ex- what they distribute exactly. Marc John DeTreville: This is not about tension could easily cover a hun- responded that they distribute C4 languages. Back in the ’70s, people dred existing kernel files, even files. Chris Small asked how they wrote their own OS, compiler, etc. though it represents a logical unit are going to get people to use these But now it’s impossible to write expressing a single, cross-cutting tools. Marc replied that whether el- your own OS. concern. Current practice makes egant or not, they reduce the pain OS evolution hard and dirty: Since for the user, and that is what is Michael Scott: Regarding the panel it is hard to understand implemen- going to get people to use their question, look at past examples. tation of a concern, composition of tools. Kirk McKusick asked why Two and a half models for success: separate concerns is very hard, not use CVS. Marc responded that (1) Exponential curve: Java, Perl, maintenance is generally hard (or even then one would have to deal HTML, etc.: there wasn’t a market

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 81 for it. (2) Migration path: C++ DISTRIBUTION WiDS included in their simulation. (from C), XHTML (from HTML), Zheng replied that they have end- Opteron (30 years of x86). (2.5) In Summarized by Nikolaos Michalakis to-end connectivity simulation and between: MESA, Smalltalk, VKer- WiDS: An Integrated Toolkit for packet drops and that the best way nel, maybe Plan9. So which model Distributed System Development to incorporate newly discovered vi- should we be looking at now? Shiding Lin, Aimin Pan, and Zheng olations is to use the model checker Galen Hunt: My answer is a defini- Zhang, Microsoft Research Asia; Rui and update the protocol model. An- tive it depends. Will I ship Singu- Guo, Beijing University of Aeronautics drew Hume asked if Zheng had larity as a product? No, we will and Astronautics; Zhenyu Guo, Tsinghua found anything not covered by the learn ideas. University protocol checker. He replied that this could happen if, for example, Pei Cao: Galen’s presentation’s Zheng Zhang started by explaining the protocol specification was im- problem and solution seemed total- that today’s distributed system de- plemented incorrectly or if the ly far off. velopment process is unscalable for model was wrong. Galen Hunt: Then you young peo- humans. Debugging is painful, and there is code divergence between Causeway: Operating System Support ple should go work on solving the for Controlling and Analyzing the interesting problems. simulation and implementation. WiDS is designed to maintain one Execution of Distributed Programs Margo Seltzer: What lessons can code for both simulation and im- Anupam Chanda, Khaled Elmeleegy, the OS community learn from each plementation, simplify debugging and Alan Cox, Rice University; Willy of your projects, assuming that by allowing debugging in a single Zwaenepoel, School of Computer and you’re wildly successful? address space as much as possible, Communication Sciences, EPFL Robert Grimm: Languages are and support large-scale perfor- Anupam Chanda began by sketch- very important for reliability and mance studies of the system in de- ing the execution flow of a multi- security. sign. To achieve these goals, WiDS tier program composed of a Web Galen Hunt: Sound static analysis essentially lets programmers link and database server. Execution and verification can dramatically their code to different libraries ac- steps are performed by “actors,” impact our ability to test (9,000 cording to their needs (single-node such as system calls, over “chan- testers testing Windows currently, simulation, parallel simulation, nels,” such as sockets. As he noted, and you know the result). network execution). Verification it is sometimes useful to write and debugging of protocol imple- meta-applications to control and Eric Brewer: We do want to support mentations can be done through a legacy drivers, etc. analyze the execution flow of such model checker in simulation. multi-tier programs. Meta-applica- Galen Hunt: The OS knows noth- Experience with WiDS shows that tions can be categorized as “log- ing about the application. We still distributed system development based” (e.g., Magpie) or “metadata have a 1970 model of what a pro- and deployment can be greatly sim- passing” (e.g., Pinpoint). Unlike gram is: a.out, stdin, stdout, stderr plified both in building complete log-based approaches, metadata model. systems such as BitVault, a data re- passing across actors allows online Andrew Hume: Sometimes you just tention system, and in the large control of multi-tier programs and get it right! Clarity and economy of scale, such as the RNRP protocol is the approach chosen by Cause- expression . . . Are we ever going to on two million nodes. Research in way, a framework that provides OS have “the model” of an OS, or are progress hopes to include playback support for building meta-applica- we going to continually have peri- of message logs in simulation mode tions. odic purging? to find network-related bugs. In The framework is placed at the Galen Hunt: I don’t know! terms of extending APIs, whether level of the OS, since placing it at to use events or threads must be a either the application or middle- programmability-centric decision. ware level might lead some compo- Zheng’s conclusion was that dis- nents of the multi-tier program to tributed system and tool develop- be oblivious to metadata passing, ment should go together. might require modifications to all Ion Stoica noted that in reality applications, and, in the case of there are problems due to connec- middleware, might not support all tivity asymmetries, congestion, and legacy protocols. Causeway associ- unbounded packet delivery, and ates metadata with an actor upon a asked how many of those violations write on a channel and propagates

82 ;LO GIN: V OL. 30, NO. 5 metadata when the actor at the This method is developed by the answer was that, when possible, other end of the channel performs a Rx system in a comprehensive, multiple changes are made simulta- read, thus making metadata passing safe, noninvasive, efficient, and in- neously. Machine learning could be follow the program flow. Causeway formative manner. Sensors detect of further help there. Mary Baker invokes a meta-application through bugs before the program crashes, asked how many rollbacks were callbacks. The authors implement- and changes include padding allo- necessary for avoiding bugs in gen- ed a priority scheduler for a Web cated memory to avoid overflows, eral. The answer was not more than server application in only 150 lines allocating memory in an isolated four. of code, making a convincing argu- location to protect against memory ment for the feasibility of building corruption, and, in the worst case, SECURITY meta-applications using OS sup- dropping user requests. However, port. A concern to be addressed in all changes respect the application’s Summarized by Nikolaos Michalakis the future, however, is security and, API. While preliminary results on When Virtual Is Harder than Real: Se- in particular, the illegal modifica- escaping deterministic bugs are curity Challenges in Virtual Machine- tion of metadata by the running more than encouraging, there are Based Computing Environments program. still several challenges for Rx, such Tal Garfinkel and Mendel Rosenblum, as committing on the program’s Petros Maniatis suggested that Stanford University Causeway could benefit from the output (one reply to user) and the Tal Garfinkel began by presenting use of both metadata passing and need for more powerful bug sen- functional differences between vir- log-based analysis, since logs are sors. Yuanyuan emphasized that to tual and real (traditional) machines streams and could be mined as they be successful against deterministic to support the hypothesis that such go by. Anupam didn’t find the idea bugs it is necessary to make the differences break existing security feasible, however. Doug Terry won- system nondeterministic. management approaches, so we dered what Causeway could do in Aaron Brown asked how Rx han- have to rethink VM security. More the OS layer that it couldn’t do in dles a bug that is detected after out- specifically, traditional machines the middleware layer. Anupam clar- put is sent to the user. Yuanyuan scale slowly and predictably while ified that by OS support he meant said that once the output reaches virtual machines do so rapidly, and modifications to the OS as well as the user the problem is hard, but traditional machines enforce homo- system libraries, but failed to an- before the output reaches the user, geneity but virtual ones encourage swer the question exactly. techniques such as data mining diversity. In addition, traditional could help prevent this. Aaron then Treating Bugs as Allergies: A Safe machines support stable popula- asked how Rx handles concurrent Method for Surviving Software tions, but virtual machines support requests. The reply was that re- Failures highly transient ones, and the dif- quests are not serialized, but re- ference is more acute since virtual Feng Qin, Joseph Tucek, and Yuanyuan plays after checkpointing are. Brett machines allow increased mobility, Zhou, University of Illinois, Urbana- Fleisch asked whether by “non-de- making it harder to link the VM to Champaign terministic” she meant increasing its owner. Bugs are inevitable, and they lead the percentage of hidden bugs com- to system failures. Existing solu- pared to deterministic bugs, and The solution proposed is to move tions such as rebooting, check- she agreed. Petros Maniatis (Intel security-related functionality out of point-recovery, application-specific Research) pointed out that changes the guest OS and into a ubiquitous recovery, and failure-oblivious by Rx might break down program- virtualization layer. Such an ap- computing cannot recover from mers’ optimizations and asked how proach will help decouple security deterministic bugs. Yuanyuan Zhou Rx would cope with that and en- and management from the struc- offered a different approach to this sure safety without programmer ture of the guest OS. problem: Since deterministic bugs feedback. Yuanyuan’s reply was that Margo Seltzer observed that this ar- are hard to cure, then “run away” future work will include identify- chitecture resembles a microkernel, from them, essentially treating ing common assumptions made by where the Trusted Computing Base them as allergies. This is achieved programmers and incorporating is pulled out of the VM. Tal replied by changing the execution environ- them into Rx. Armando Fox asked that microkernels were cool, and he ment on demand upon soft failures. how Rx compared to failure-oblivi- didn’t find anything wrong with Essentially, the program is rolled ous computing. The answer was that. Edward Wobber asked back after each change is applied to that Rx is more general. Dug Terry whether updating the VM state the execution environment until asked how Rx prioritizes changes from outside the VM could be use- the bug disappears. to the execution environment so ful. The reply was that, depending that the effects are maximized. The on the OS, it could be. Jay Lepreau

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 83 followed up on Margo’s remark, data from a new user B, the com- Access Control in a World of Software saying that there is no problem in partment is tagged anew with a Diversity separating the security from the third tag, AB. This prevents data Martin Abadi, University of California, VM. Tal added that such separation from being written out to compart- Santa Cruz; Andrew Birrell and Ted can give more control to adminis- ments having tags A or B, thus pro- Wobber, Microsoft Research trators and more flexibility to users. tecting users’ data from each other. Rik Farrow asked whether the se- When an operation is done, the Andrew Birrell described the first curity layer would look like an ad- tags on a compartment are removed steps of a design for authentication ditional virtual layer. Tal men- and the components restored. The and access control as part of Mi- tioned that the platform needs to be system finally uses trusted declassi- crosoft’s Singularity operating sys- beside the VM for security, but it is fiers that can act on behalf of multi- tem project. In actual operating an interesting question what the ac- ple users and traverse subprocess systems the facts that principals are tual architecture will look like. boundaries. bound to either users or “logged- in” users and that ACLs are flat lists Make Least Privilege a Right Philip Levis asked how Asbestos of principals are inadequate for (Not a Privilege) deals with database security. The making flexible access control deci- Maxwell Krohn, Cliff Frey, Frans answer was to have user data on sions. Kaashoek, and David Ziegler, MIT; different pages (serving as compart- ments) and a trusted index server. The new design is based on three Petros Efstathopoulos, David Mazières components: the naming tree, and Steve VanDeBogart, University of Jay Lepreau wanted to know how Asbestos differs from Flask in which records decisions the admin- California, Los Angeles; Michelle Os- istrator or implementer has made borne, New York University doing MAC in a distributed way. The answer was that applications, (e.g., when installing a program), Max Krohn said that a problem not only the kernel, can introduce thus separating static policy deci- faced today by servers is that compartments. However, as Jay sions from online control ones; a process boundaries do not always noted, a trojan might contaminate a compound principal mechanism; align with an application’s security declassifier and get access to other and a pattern recognizer for access- goals. Alice can steal Bob’s data via users’ data. Max agreed that they ing control lists. In the Singularity buffer overruns, trojans, SQL injec- need to be careful with declassifi- design, the principal is just a string tion, or even bad access control cation. constructed by a path in the nam- policies. Max presented a set of ing tree and logical operators. Since such scenarios based on Alice and Pei Cao asked whether an attacker applications are part of principal Bob accessing the same Web site. could trick the process into restor- names, they are described in the ing a component. The answer was naming tree in the form of mani- To avoid these issues, Asbestos OS that only the kernel enforces the uses Mandatory Access Control fests. Andrew argued that enumer- restore. John DeTreville asked ating principals in a control list will (MAC). Asbestos uses compart- whether this fine-grained control ments to track and control data not give the desired flexibility. In- is better. Max said, “The more fine- stead, given a list and a principal, it flow. Unlike other systems, com- grained, the better.” Alex Snoeren partments are introduced not only must be determined whether the asked whether a compartment tag principal string is contained in the by the kernel, but by applications could be illegally changed in the as well. The data tagger, which is a list string. The right approach, case of multi-threaded or event- therefore, is to do pattern recogni- small component that has no privi- driven applications. The answer leges, tags data based on users. tion, and for that reason regular ex- was that once a compartment has pressions are used. When running an Asbestos Web been labeled it cannot be accessed server, data flow is tagged; the more by a flow of different tag. Margo Armando Fox asked whether time the components of the applica- Seltzer asked what happens with expiration is included in the de- tion/system are touched by data other data, such as registers. The sign. The reply was that this might without conflicting tags, the more a answer was the registers are flushed be useful but no compelling need compartment grows, independently similarly to a context switch. Peter was found for that yet. Margo of the processes involved. Com- Druschel asked what happens if Seltzer noted that regular expres- partments are tagged upon reading user-specific data gets into a stack sions create a disconnect between data, and the more elements that and another user finds it. One way flexibility of expression and usabili- are touched (e.g., processes, sock- to deal with this problem is to wipe ty, because they are not understood ets, virtual memory pages), the the stack out when restore is is- by mere mortals. Andrew replied more a compartment grows. If a sued. that the ACLs will be created most- compartment that is already tagged ly by installation programs, not hu- by user A is touched upon a read by mans. Removing regular expres-

84 ;LO GIN: V OL. 30, NO. 5 sions does not solve the problem. els to the sensor nodes. The sensor protocol, IP, midway through the The goal is to be expressive. Mi- nodes can then only report events network stack that allowed work chael Jones asked how reputation- that violate the model. The proxy to proceed in parallel above and based access control could be in- can then either send a new model below that point. Can we just use corporated. Andrew replied that he or note the violation as an aberra- IP in sensor networks? No, it isn’t would not like to add more com- tion. This technique is more energy appropriate, but we should develop plexity than that of the naming efficient than a push model. It also something for sensor networks that tree. Reputation makes him nerv- allows the proxy to answer queries serves the same purpose that IP ous. Michael Scott suggested that immediately, since the proxy is no- does for the Internet. regular expressions could be used tified whenever an abnormal event Philip went on to argue that SP, the to find bad access control rules, occurs. If the query requires more Sensor Protocol, should be a single- something that Andrew agreed to accuracy than the model provides, hop protocol that provides a richer look into. Alex Snoeren noted that the proxy will first examine its interface than just send and re- since semantic value is put on the cache to see if it has already re- ceive. It should not specify the strings in the naming tree, there is trieved the needed information. If wire protocol, because the underly- a danger that if the tree is changed not, it will poll the relevant nodes. ing link layer varies too much. It it will hurt the system. Andrew This information may be available should work both for address-free agreed that they better get these at a lower tier in a multi-tiered net- protocols, such as flooding and tree names right; relative paths would work, possibly sparing the energy- collection, and for name-based pro- help there. Petros Maniatis asked scarce motes at the bottom from tocols. There should be an interface whether they could do combina- answering the query directly; if for the layers above to specify a for- tions of authentication methods in not, the data is cached at each tier warding predicate. Furthermore, it a scalable manner. Andrew replied on the way up, preventing further is important that it provide inter- that it was not possible in regular direct queries of the low-level node faces for things that have to cross expressions; they would need to if the data is needed again. These layers, such as power management, enhance their language. techniques provide a middle timing, and security. ground between streaming out all Petros Maniatis asked if SP will SENSOR NETS the data, which is energy expen- sive, and querying nodes directly, help in the wild or if it’s an academ- Summarized by Steve VanDeBogart which is slow. ic exercise. Philip responded that it will take place in both domains. In PRESTO: A Predictive Storage Towards a Sensor Network Architec- an academic sense SP will help us Architecture for Sensor Networks ture: Lowering the Waistline understand things at a deep level, Peter Desnoyers, Deepak Ganesan, David Culler, Prabal Dutta, Cheng Tien but at the same time it will facilitate Huan Li, Ming Li, and Prashant Shenoy, Ee, Rodrigo Fonseca, Jonathan Hui, real code. Additionally, because University of Massachusetts Philip Levis, Joseph Polastre, Scott sensor networks still exist in an Deepak Ganesan presented the Shenker, Ion Stoica, Gilman Tolle, and isolated administrative domain, we ideas and motivation for PRESTO, Jerry Zhao, University of California, may be able to iterate the design, a query architecture for sensor net- Berkeley unlike IP. works. Desirable features include Philip Levis began by saying that Breakout Sessions low latency, low power utilization, sensor network research today is a Summarized by Rik Farrow the ability to query archival data, mess. There are a lot of different During the last portion of the and the ability to formulate new options for solving a given prob- HotOS workshop, the attendees queries after events have already lem, but each solution is vertically split into groups that had been occurred. PRESTO tries to provide integrated into a complete stack arranged by Margo Seltzer. The all these features by taking advan- of solutions and each component assignment for each group was to tage of the decreasing cost of stor- in a stack is incompatible with any design and present a paper on a age as well as suppressing commu- other stack. Therefore, if you want particular topic in one hour. As it nications that report no new to use modules from more than one turned out, all papers that included information. stack you have to build a totally a PowerPoint presentation were ac- Many events that sensor networks new stack that integrates the com- cepted, and the one group that are currently being used to monitor ponents you need. This lack of ca- failed to reach that point had its have domain-specific models. For pability is a limiting factor to the topic rejected. instance, temperature variation is advancement of sensor networks. Completed papers will become part easily predicted from time of day This wasn’t a problem for the Inter- of the HotOS 2005 proceedings. and season. Based on previous data, net because there was a well-defined the PRESTO proxy can send mod-

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 85 tion is to add another set of layers. stituents, and the emergence of VM VEE ’05: First ACM/USENIX He then presented some perfor- abstractions. He then said that the Conference on Virtual Execution mance primitives, such as code motivation of the research is to en- Environments cache support, visibility of micro- sure fairness and efficiency in the ops, and so forth. underlying host resource alloca- June 11–12, 2005 Smith reminded the audience that tion. He advocated the use of self- killer applications motivate virtual- adaptation in the guest VMs them- KEYNOTE ADDRESS ization techniques. One of the kill- selves, based on feedback about er apps for virtualization is security. resource usage and availability. Summarized by Shuo Yang, Long Fei, and Xing Fang HLL VMs, with security features Bestavros defined a virtual machine built in; process VMs, whose code that fairly and efficiently adjusts its A Unified View of Virtualization can be inspected before being exe- demand for system resources as a James E. Smith, University of cuted; and system VMs, which pro- Friendly VM (FVM) and proposed Wisconsin vide isolation with simple VMMs, a resource-sharing technique that is James Smith started his keynote by can all be used to support secure applicable to any application whose discussing why virtualization tech- networked computing. execution is “friendly” to other ap- niques are interesting: they enable Smith also discussed the education plications sharing the same under- transcending interfaces, flexible in- curriculum about VMs. He pro- lying resources. The friendliness novation, adaptation to other soft- posed an outline of what to teach in feature of FVM applies the classical ware/hardware, networked com- a VM course, for example, putting end-to-end argument to the prob- puting, and enhanced security. He together virtually all levels of com- lem of multi-resource allocation views virtualization as the fourth puter system hardware and soft- across a set of applications sharing pillar in computer systems besides ware. He believes it is a challenging the same infrastructure. hardware, system software and ap- and necessary task. Bestavros said that the host in FVM plication software. Smith concluded the talk with the needs to provide unbiased on-de- He reviewed the domains and ex- following points: the common mand resource allocation and VMs, amples of virtual machines, and framework and terms should be and he mentioned the pricing is- the origins of virtual machine con- resolved; virtualization needs high- sues enabled by the FVM frame- cepts. He said that different interest er concepts, rather than a bag of work. Bestavros showed the experi- groups, such as OS developers, terms and techniques; and a unified mental result of resource utilization compiler developers, and applica- conference to exchange ideas from using the FVM system with “made- tion programmers have different different communities is needed, up” benchmarks and real Web serv- perspectives on virtual machines. and VEE serves that goal perfectly. er benchmarks. The results showed System virtual machines provide a that FVM successfully achieves system environment that is con- fairness and efficiency in sharing SCALABILITY, PERFORMANCE, structed at the ISA level. Process common hosting resources. AND REAL-TIMEXXXXXXXXXXX virtual machines are constructed at Diagnosing Performance Overheads in the ABI level. High-level-language Summarized by Shuo Yang the Xen Virtual Machine Environment virtual machines (HLL VMs) pro- Friendly Virtual Machines: Leveraging Aravind Menon, Jose Santos, Yoshio vide APIs, i.e., they raise the level a Feedback-Control Model for Applica- Turner, G. (John) Janakiraman, and of abstraction. tion Adaptation Willy Zwaenepoel He pointed out that many of the ex- Yuting Zhang, Azer Bestavros, Mina Yoshio Turner presented Xenoprof, isting VM techniques were invent- Guirguis, Ibrahim Matta, and Richard a systemwide statistical profiling ed on an ad hoc basis, and he asked West toolkit for the Xen virtual machine whether there might exist some- environment. Turner first discussed thing more fundamental than a col- Azer Bestavros presented a Friendly how the increased adaptation of lection of techniques. He then dis- Virtual Machines (FVM) frame- virtualization techniques can affect cussed his idea of how to establish work for efficient and fair resource application performance in unex- uniform concepts. He also reviewed allocation when sharing an under- pected ways. He then presented an the solutions and challenges of VM lying host system. Bestavros first example of Web server perfor- techniques. From there, he put discussed the background of virtual mance degradation under the Xen forward the question of how to machine adaptation: the trend of system, which motivated their enhance virtual machine perfor- VM techniques being increasingly Xenoprof project. An application’s mance. For example, the increase adapted to support applications performance in a virtual machine of system layer complexity leads to running on third-party hosts, the environment can differ markedly inefficiency. The Intel VT-x solu- need to isolate independent con-

86 ;LO GIN: V OL. 30, NO. 5 from its performance in a non-vir- cost of dynamic checks and the the implementation issues of the tualized environment, because of need for expression translation in E+S machine, which has been im- interactions with the underlying C. There is an overhead for using plemented using the StrongARM VMM. Xenoprof is designed to en- PLABs over thread-local allocation processor and integrated into able coordinated profiling of multi- buffers (TLABs) when the number HelyOS. In conclusion, Sanvido ple VMs in a system to obtain the of threads is less than the number said that their work has adopted distribution of hardware events. He of processors, or when threads are microkernel architecture for the introduced the Xenoprof design: a entirely compute-bound. PLABs are real-time application domain. paravirtualized interface to support much better than TLABs when the domain-level profilers. OProfile has number of threads is larger than the OBJECTS AND THEIR COLLECTION been ported to Xen environment as number of processors. He showed a domain-specific profiler. that their mechanism can combine Summarized by Long Fei Turner presented an example of the PLAB and TLAB adaptively. The Pauseless GC Algorithm Garthwaite said that their mecha- Xenoprof use, showing how they Cliff Click, Gil Tene, and Michael Wolf found a TCP-receive performance nism can easily be implemented in anomaly under XenoLinux. After x86 architecture. He then presented Cliff Click began by pointing out that, he presented the work done the experimental results of their that garbage collection response with Xenoprof to analyze several techniques and concluded that the time has become an important performance problems observed simple MP-RCS mechanism is in- problem for applications that con- under different Xen configurations dependent of the threading/sched- tain response-time-sensitive com- for receiver, sender, and Web server uling model and is applicable to ponents. He described a system, applications. Turner concluded that many platforms. including CPU, chip, board, and Xenoprof is a useful tool to identify A Programmable Microkernel for OS, built by Azul Systems to run major overhead in Xen. Xenoprof Real-Time Systems garbage-collected virtual machines. The hardware supports fast user- will be included in official Xen and Christoph Kirsch, Marco Sanvido, and OProfile releases. mode trap handlers. The hardware Thomas Henzinger TLB supports an additional privi- Supporting Per-Processor Local- Marco Sanvido presented the mi- lege level, GC mode, which lies be- Allocation Buffers Using Lightweight crokernel system architecture for tween the usual user and kernel User-Level Preemption Notification hard real-time applications. He first modes. TLB violations on GC-pro- Alex Garthwaite, Dave Dice, and Derek presented the reactive (response to tected pages generate fast user-level White environment) and the proactive traps instead of OS-level excep- Alex Garthwaite presented a local- (task scheduling in platforms) re- tions. The CPU supports a read allocation buffers (LAB) manage- quirement in embedded systems, barrier instruction, which resem- ment technique that supports local and introduced the concept and ar- bles a standard load instruction ex- buffer allocation with regard to chitecture of their solution’s design. cept that if it refers to a GC-protect- processors instead of threads. The E (embedded) machine is a ed page a fast user-mode trap virtual machine that triggers the (GC-trap) handler is invoked. Garthwaite first gave a performance execution of software tasks with re- comparison under different LAB- The GC algorithm is highly con- spect to events. The S (scheduling) current, parallel, and compact. The size policies for the VlanoMark machine is a virtual machine that benchmark. He made the point that algorithm is divided into three orders the execution of software phases, Mark, Relocate, and garbage collection is a hard prob- tasks, and together their E+S ma- lem when there are more threads Remap. The Mark phase is respon- chines equal the microkernel sible for periodically marking the than processors and high preemp- model. Their model represents the tion rates. live and dead objects. The Relocate abstraction of the interaction be- phase finds pages with little live He then presented a processor- tween the hardware platform, reac- data, to GC-protect, relocate, and local allocation buffers (PLABs) tively constrained by the E machine compact them and to free the back- strategy that associates local alloca- and proactively constrained by the ing physical memory. The Remap tion buffers (LABs) with processors S machine. phase updates every relocated and with buffers for each thread al- Sanvido talked about the time-safe- pointer in the heap. During the re- located from its processor’s LAB. ty requirement of hard real-time locating phase, if a mutator’s read- Multi-processor restartable critical applications and presented a proof barrier GC-traps, the GC-trap han- sections (MP-RCS) techniques im- that the time-safety requirements dler looks up the forwarding plement such a buffer allocation were fulfilled with schedule-carry pointer and places the correct value strategy. He then introduced the code in their system. He introduced both in the register and in memory. challenges of PLABs, such as the

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 87 The authors backed up their design Exploiting Frequent Field Values in provides two execution modes: na- with experiments using a modified Java Objects for Reducing Heap tive mode, where the debugger is version of SpecJBB benchmark. The Memory Requirements directly executed on a real CPU, authors also state that the read bar- Guangyu Chen, Mahmut Kandemir, and and the virtual machine mode, rier behavior can be emulated on Mary J. Irwin where the debugger is executed on standard hardware at some cost. a virtual machine. Guangyu Chen said that the capa- Use Page Residency to Balance Trade- bilities of applications executing on In order to provide compatibility offs in Tracing Garbage Collection embedded and mobile devices are and efficiency, the debugger uses Daniel Spoonhower, Guy Blelloch, and strongly influenced by memory size native machine code as its target. Robert Harper limitations. The authors use object The virtual machine translates the compression to improve memory native machine code of the target Daniel Spoonhower presented this program by inserting code to save paper. The key innovation of the space utilization in an embedded Java environment. The compres- states that are changed during exe- paper is a mechanism that allows cution. The debugger is capable of the collector to dynamically bal- sion is based on the observation that a small set of values appears switching between the native and ance the tradeoffs of copying and the VM mode. In the native mode, non-copying collection for each frequently in heap-allocated ob- jects. users cannot reverse-execute the page based on page residency, a target program; in the VM mode, measure of the density of reachable Their approach uses profile infor- the users can use reverse execution. objects on a page. If the residency mation to categorize the object Some basic debugging functionality of a page is sufficiently high, the fields into three levels: level-0 (the (e.g., breakpoint, step) is supported page should be promoted, other- field does not have a dominant fre- in both modes. The debugger al- wise it should be copied. quent value), level-1 (the field has lows four types of settings, to allow Measuring the residency of even a a non-zero or non-null frequent trade-offs between granularity, ac- single page requires a traversal of value), level-2 (the field has a fre- curacy, overhead, and memory re- the entire heap. To avoid this over- quent value that is zero or null). quirements of reverse execution. head, the authors devised several They propose two compression The user can choose the appropri- residency-prediction heuristics and schemes. The first one divides an ate setting by designating the prop- recovery mechanisms to handle object into primary part and sec- er reverse execution unit (line or poor predictions. The authors also ondary part (containing level-2 procedure) and optimization flag identified a continuous range of fields). The secondary part is elimi- (enable or disable). tracing collectors and showed that nated if all the level-2 fields are zero or null. The second scheme Module-aware Translation for Real-life classic GC algorithms can be con- Desktop Applications sidered special cases of the pro- shares level-1 fields among multi- posed GC algorithm with extreme ple objects. Experimental results Jianhui Li, Peng Zhang, and Orna Et- residency assumptions. showed that these compression zion schemes can reduce the heap size Their experiments revealed the im- Jianhui Li explained that a dynamic significantly with little perfor- binary translator is a just-in-time pact of heap size and configuration mance impact. thresholds on the performance of compiler that translates source ar- GC algorithms. Mark-sweep per- chitecture binaries into target ar- forms better when the heap size GOING NATIVE chitecture binaries on the fly. When hot modules are loaded and un- is small, whereas semi-space per- Summarized by Long Fei forms better when the heap size is loaded repeatedly, traditional dy- An Efficient and Generic Reversible large. In both cases, their algorithm namic translators spend a signifi- Debugger using the Virtual Machine yields a performance close to the cant amount of time on repeatedly based Approach better of these. Experiments show translating these modules. He pro- that this new algorithm has only Toshihiko Koju, Shingo Takada, and posed a translation reuse engine a small variation in performance Norihisa Doi that uses a novel verification method and a module-aware mem- under six different configurations. Toshihiko Koju began with the ory management mechanism. statement that reverse execution is very useful for locating the cause of There are three stages to accom- software failures. He described a plish translation reuse: translation novel reversible debugger that uses reservation, source binary verifica- a virtual machine based approach. tion, and translation revivification. This debugger provides compatibil- In this framework, when a translat- ity and efficiency. In addition, it ed code block is invalidated, it is

88 ;LO GIN: V OL. 30, NO. 5 preserved by the reuse engine and A program is first divided into cation to be compatible with differ- saved by the execution engine. In code partitions, which are then ent OS and hardware platforms. order to verify that the saved stored in a code server connected He introduced various approaches translation is exactly the expected to the client. Profiling is used to to building Web server applica- translation for a code block, the capture the hotness of code parti- tions, such as JVM (J2EE), CGI- translation engine uses a save-and- tions. The client’s code buffer is BIN, etc., and discussed the pros comparison scheme. If the reuse partitioned into multiple sub- and cons of each approach. Then engine decides to reuse the transla- buffers. The sub-buffers are or- he discussed different levels of vir- tion for a piece of the source binary, dered by the hotness of partitions tualization—OS, JVM, and Web it saves a minimum set of source bi- assigned to them. One sub-buffer server—with respect to their gener- naries that determine the semantics holds very hot code, while another ality. OS virtualization provides im- of translation. Before the transla- may hold infrequently executed portant and very general concepts tion engine translates a piece of the code. This approach is based on the to support application execution. source binary, it requests the reuse fact that most programs spend a JVM virtualization provides plat- engine to compare the saved partial large part of their execution in a form independence. Application source binaries with their counter- small portion of code. server virtualization targets certain parts in the current binary image. The authors discuss the overall classes of applications. The saved translation is used if they strategy of CB memory planning are the same. He said that application servers are and then describe two particular usually thought of as containers The authors propose a module- schemes. In the fixed scheme, code and pointed out the important role aware memory management mech- partitions are always housed in the of virtualization. First, the virtual anism, which organizes the trans- same sub-buffer during execution. environment serves as a logic wrap- lation code blocks of different In the adaptive scheme, partitions per. He gave a Web server applica- modules into different pools (mod- are cached in sub-buffers based on tion as an example of this view. He ule-private page pool and general a program’s run-time behavior. The then discussed Web application page pool). When a hot module is authors also introduce a heuristic containers as a virtual secure envi- unloaded, its private code pages are called density, which is defined as a ronment and talked about the scal- reserved for future reuse (unless partition’s execution frequency di- ability and performance challenges it’s identified as not reusable). Ex- vided by its size, to measure the of application servers. periments with real-life desktop priority of code partitions to reside applications show that this new in CB. Experiments show that these He presented cluster and workload translation-reuse technique can sig- schemes have fewer CB misses, management and hardware virtual- nificantly improve the performance which translates to a significant ization. One of the key features of of four real-life desktop applica- speedup. application servers is scaling. There tions. are a wide variety of configurations to virtualize hardware. He intro- Planning for Code Buffer Management KEYNOTE ADDRESS duced how Web cluster failover is in Distributed Virtual Execution Summarized by Shuo Yang, Xing Fang, and handled and recovered. Web clus- Environments Long Fei ters allow transparent application Shukang Zhou, Bruce R. Childers, and Application Servers as Virtualization updates and enable continuous Mary Lou Soffa Environments availability of service. WebSphere XD is the next level of sophistica- Many devices in a distributed com- Martin Nally, CTO, IBM Rational puting environment have tight tion of virtualizing hardware. He memory constraints. One approach Martin Nally first gave a broad discussed the challenges of an on- is to download code partitions on overview of virtualization services. demand operating environment for demand from a server and to cache Virtualization techniques serve as a large financial company: for ex- the partitions in the client. Shu- tools of software development syn- ample, the underutilization of kang Zhou and his colleagues ad- ergy between software and the exe- servers and the inability to share. dressed the problem of intelligently cution environment. According to He believes virtualization, such as managing the code buffer to mini- IDC data, the worldwide applica- WebSphere XD’s automatic man- mize the overhead of code buffer tion server software platform rev- agement, provides a better solution misses. They propose to move the enue is increasing dramatically. He than the conventional approaches code buffer management to the then introduced an example of in this case. server, where sophisticated writing Web server applications Nally concluded his talk with the schemes can be employed. without knowing the specific target following points. Application application servers, and showed the servers provide a virtual environ- necessity for the Web server appli-

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 89 ment for executing Internet and in- array which stores the current val- minimum achieved speedup of tranet applications. Application ues of all fields. nearly 12X in the micro-bench- servers present a simple virtual en- An object is represented by its allo- mark test cases. Inlining also ap- vironment to application program- cation instruction. The intraproce- pears to reduce the need for con- mers. Application servers virtualize dural analysis parses instructions servative assumptions about the across many physically individual that might cause an object to es- behavior of native code in the JIT computers that may be running dif- cape, and updates the escape state optimizer. ferent OS application servers, even of the instruction representing the Optimized Interval Splitting in a using some virtualization tech- object. Effects of various instruc- Linear Scan Register Allocator niques that are not seen at the OS tions on escape states were dis- level. Christian Wimmer and Hanspeter cussed. With SSA form, control Mössenböck He raised some questions at the flow is captured in phi-functions at end of his talk. J2EE applications the control merge points. Linear scan register allocation is run on JVMs which run on OSes, very suitable for JIT compilers be- Test results about compilation time cause it is much faster than graph and each of these layers is perform- and machine code quality were per- ing virtualization—is this working? coloring and is nearly as effective. formed and analyzed. The benefit is For each virtual register, lifetime Could some of the redundancy be very evident for some benchmarks, removed? Could these layers work intervals store the range of instruc- most notably mtrt in SPECjvm98 tions where a value is active. Two better together or even be coa- and Monte Carlo in SciMark. lesced? intersecting intervals must not Inlining Java Native Calls at Runtime have the same register assigned. Levon Stepanian, Angela Demke Brown, The algorithm assigns registers to DYNAMIC COMPILATION Allan Kielstra, Gita Koblents, and Kevin values in a single linear pass over TECHNIQUESXXXXXXXXXX Stoodley all intervals. Summarized by Xing Fang Levon Stepanian started out by ob- Three optimizations are proposed Escape Analysis in the Context serving that Java native calls are and implemented. Split positions of Dynamic Compilation and pervasive because they allow lega- are optimized to reduce the num- Deoptimization cy, high-performance, or architec- ber of spill loads and stores at run- ture-dependent native code to be time. They are moved out of loops Thomas Kotzmann and Hanspeter and into block boundaries. When Mössenböck integrated with Java applications. However, cross-language calls usu- two intervals are connected only by Thomas Kotzmann presented an ally incur large time and space a move instruction, the interval of intra- and interprocedural escape overheads, and this is true with the move target stores the source of analysis for a dynamic compiler. JNI. the move as its register hint. When Escape analysis determines, for possible, the target gets the same each object, whether it is accessible To reduce these overheads, the register allocated as the source, from within a single method, or authors propose inlining JNI calls eliminating the register-to-register one thread, or multiple threads. into Java applications with a JIT move. In two common cases that Method-local objects are eliminat- compiler. Both callouts (Java cover most of the intervals, moves ed and replaced with scalar vari- calls to functions implemented in inserted for spill stores can be re- ables. Thread-local objects are external languages) and callbacks moved. (external code accessing and modi- stack-allocated, and synchroniza- Christian Wimmer presented the tion on them is removed. fying data and services from a run- ning JVM) can be inlined. Work flow of the algorithm and used de- The analyses and optimizations are was done on the IBM TR JIT com- tailed examples to illustrate it. The implemented in Sun’s Java HotSpot piler, and the native functions exist algorithm is implemented for Sun’s Client VM, which has a front end in the form of W-code, the mature Java HotSpot Client VM. Results that operates on an SSA-based stack-based bytecode-like represen- prove the efficiency of the opti- High-level Intermediate Represen- tation generated by IBM compiler mized algorithm: the compilation tation (HIR). Escape analysis and front-ends. time of the algorithm is nearly lin- scalar replacement are performed ear, and it is even faster than the in parallel with the construction of Entire native functions are inlined original register allocation algo- the HIR. A state object containing a at their call sites. For small native rithm in the client compiler. locals array is maintained by the functions, removing the overhead compiler, to track the values most of callouts can be a significant ben- recently assigned to local variables. efit. The benefit of transforming The state object also has a fields callbacks is even higher, with a

90 ;LO GIN: V OL. 30, NO. 5 LANGUAGE REPRESENTATIONS Virtual Machine Showdown: Stack Instrumenting Annotated Programs Versus Registers Summarized by Xing Fang Marina Biberstein, Vugranam C. Sreed- Yunhe Shi, David Gregg, Andrew Beatty, har, Bilha Mendelson, Daniel Citron, An Execution Layer for Aspect- and M. Anton Ertl, and Alberto Giammaria Oriented Programming Languages A long-running question in the de- Instrumentation is commonly used Michael Haupt, Mira Mezini, Christoph sign of Virtual Machines is whether to collect program profile informa- Bockisch, Tom Dinkelaker, Michael stack architecture or register archi- tion. It is a spectative (i.e., one that Eichberg, and Michael Krebs tecture can be implemented more observes the program behavior) Michael Haupt introduced the key efficiently with an interpreter. program transformation and must concepts of the pointcut-and-ad- David Gregg started off by noting maintain the program structure and vice (PA) flavor of Aspect Oriented that stack machines were more functionality. Program annotation Programming (AOP): join points, popular, because of the small code enables developers and tools to pointcuts, and advice. A join point size and the ease of building stack pass extra information to later is a point in the execution of a pro- machines. The JVMs and PERL 5 stages of software development and gram, a pointcut is a query that interpreter took this approach. But execution. It is widely used in the quantifies over join points, defining PERL 6 used a register machine in- CLR platform and has been adopt- related sets of join points, and an stead, because register code was ed into the Java 1.5 standard. advice is a piece of functionality perceived as faster to interpret. Marina Biberstein gave a motivat- that can be attached to pointcuts, Execution of a VM instruction ing example of two annotations taking semantic effect when the re- could be broken down into instruc- that were both interfered with, al- spective pointcuts match. tion dispatch, operand access, and though differently, by instrumenta- According to Haupt, AOP language actual computation times. Register tion. To solve the problem, instru- mechanisms, like OOP mecha- code incurs less dispatch time than mentation must handle different nisms, deserve implementation ef- stack code, because of its fewer annotations in different ways. fort. AOP features have not gained number of instructions. However, There must be active cooperation sufficient support. register code needs to access its between the two. Currently, dispatching logic is in- operands explicitly so the code size The proposed solution takes an in- serted into application logic at is usually larger, resulting in more strumentation-driven approach. compile or load time. This gap in memory fetches for the code. Actu- Annotations are classified accord- semantics confuses debug efforts al computation time is about the ing to their behavior, and annota- and incurs performance drawbacks. same for register and stack codes. tion writers would, for each anno- The contribution of the work is the Instruction dispatch is more costly tation type, provide its description, integration of both the JPRM and than code fetching, so register code in the form of meta-annotations. the WM into the VM for supporting has the potential to be faster. The descriptions provide informa- AspectJ’s dynamic point model. The authors built a much more so- tion about the stage and lifetime of The authors developed Steamloom, phisticated translator from the the annotation, its scope, sensitivi- an extension to IBM’s Jikes RVM stack code to the register code. Re- ty to instrumentation, and whether that provides AOP functionality at sults showed that the register code the annotation can be removed or the VM level, assessable through a has 47% fewer instructions at a cost healed. This information is passed Java API. of a 25% increase in code size. Op- on to instrumentation, which then Evaluations show that the overhead timizations on the register code bases its decisions on the informa- incurred by the modification to im- were very effective. 43.47% of static tion provided. plement Steamloom is practically VM instructions were eliminated, The authors proposed a taxonomy zero, for hot runs. Class loading as well as 47.21% of the dynamic of annotations based on their study and method compilation overhead VM code. On a Pentium 4, the reg- of more than two hundred live ex- is about 7.8%. Other results show ister machine requires 32.3% less amples, which they used to classify that using an AOP-enabled infra- time than stack machine, with a annotations. They demonstrated structure does not in itself mean less than perfect dispatch scheme. their solution on a set of sample an- that execution is slowed down. Ap- With a better dispatch, the reduc- notations. plied modifications of the original tion in execution time was still VM do not critically interfere with 26.5%. It is a very strong indication other subsystems. AOP-related that the register architecture is su- functionality is more efficiently re- perior to the stack architecture for alizable at VM level. implementing interpreter-based VMs.

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 91 DISTRIBUTED VEES VM installation, and low perfor- He discussed the design of Hyper- mance overhead. Spector: it runs IDSes and server Summarized by Shuo Yang He gave a system-architecture applications on separate VMs, and PDS: A Virtual Execution Environment overview of the Entropia desktop it builds a virtual network across for Software Deployment computing system, which includes the IDS VMs. HyperSpector pro- Bowen Alpern, Joshua Auerbach, Vas- job management, resource manage- vides three mechanisms: software anth Bala, Thomas Frauenhofer, Todd ment, and physical node manage- port mirroring (packet capturing), Mummert, and Michael Pigott ment. Entropia virtual machines inter-VM mounting (filesystem (EVMs) consist of: (1) a desktop checking), and inter-VM mapping Joshua Auerbach presented a virtu- (process checking). al machine solution, the Progres- controller to guarantee unobtru- sive Deployment System (PDS), to siveness; and (2) a sandbox execu- HyperSpector has been implement- manage complex software deploy- tion layer to provide security fea- ed in their Persona operating sys- ment. The idea of PDS is to have tures. The sandbox layer takes two tem, which is based on the FreeBSD software packaged, to provision, approaches to guarantee security: 4.9 kernel. Their experimental re- deploy, and execute software on device driver mediation (with rela- sults showed the effectiveness of customer machines, and to share tively high overhead) and binary their HyperSpector system design software updates with many others interception (with low overhead). in terms of both security and over- (customers). The sandboxing execution layer head. provides file virtualization (all files Auerbach gave an overview of PDS are accessed through a confined Steps to Reducing Unwanted by introducing a working prototype virtual file system located in an En- Traffic on the Internet Workshop under Windows XP: assets to be de- tropia directory); file I/O throttling (SRUTI ’05) livered include Eclipse, JBoss, and and automated file encryption; reg- WebSphere. Using this example, istry virtualization; GUI virtualiza- he presented the architecture com- tion; network virtualization; and July 7, 2005, Cambridge, MA ponents of PDS and showed that network I/O throttling. Summarized by Jayanthkumar Kannan PDS’s virtual environment makes and Lakshminarayanan Subramanian, and software look locally installed and Finally, he discussed the perfor- edited by Balachander Krishnamurthy mance of jobs running on EVM. resolves environment conflicts. SRUTI, a first-time USENIX work- The talk concluded with an inter- shop, sponsored by AT&T Labs, He talked about PDS’s virtualizer. A esting discussion of market and Cisco Systems, and the Department process VM virtualizes the applica- customer demands. tion binary interface (ABI), not of Homeland Security, was attended hardware. Processes derived from HyperSpector: Virtual Distributed by 55 people, and 13 peer-reviewed the same asset are in the same VM. Monitoring Environments for Secure papers were presented. PDS uses a selective virtualization Intrusion Detection and only virtualizes OS calls that Kenichi Kourai and Shigeru Chiba DDOS AND WORMS access the assets’ virtually installed Kenichi Kourai presented a virtual image. Auerbach concluded his distributed monitoring environ- Using Routing and Tunneling to talk by comparing the related work ment called HyperSpector, whose Combat DoS Attacks and presenting the differences with goal is to achieve secure intrusion Adam Greenhalgh, Mark Handley, and PDS and saying that PDS provides a detection in distributed computer Felipe Huici, University College London feasible and convenient approach systems. to deploying software packages. The first session of the SRUTI He introduced the distributed workshop focused on different The Entropia Virtual Machine for intrusion detection systems (DIDs) forms of network-level filtering Desktop Grids and threats against DIDs by de- mechanisms to defend against Brad Calder, Andrew Chien, Ju Wang, scribing the behaviors and actions DDoS and worm attacks. The first and Don Yang of active attacks and passive at- paper argues that while many exist- Brad Calder introduced the back- tacks. He then talked about the ing DoS defense mechanisms are ground of desktop distributed com- traditional approach to solving hard to deploy, one can use a com- puting. The customers in this these challenges to DIDs—isolated bination of routing and tunneling venue require the following fea- monitoring—which is secure but techniques to obtain a deployable tures: desktop security, a clean exe- needs additional hardware and sup- DoS defense. The basic idea is to cution environment, unobtrusive- ports only network-based IDSes tunnel the traffic bound to a server ness, application security, ease of (NIDSes). across a fixed set of control points application integration, lightweight (edge routers in ISPs), which act as IP-level filtering gateways and use

92 ;LO GIN: V OL. 30, NO. 5 underlying routing protocols (e.g., SPAM-1 eral building blocks such as I-BGP, E-BGP, OSPF) to signal in- Bayesian detection, rate limiting, formation across different control Push vs. Pull: Implications of Protocol and blacklisting. This mechanism points. The concept of using nam- Design on Controlling Unwanted Traf- also leverages the social network of ing and path information as sepa- fic caller-callee relationships in deduc- rate entities to force inspection at Zhenhai Duan and Kartik Gopalan, ing the reputation of a caller. different control points is potential- Florida State University; Yingfei Dong, ly applicable in other network se- University of Hawaii BOTS AND SPOOFED SOURCES curity mechanisms. The second session was the first of The Zombie Roundup: Understanding, Reducing Unwanted Traffic in a two that focused on spam evasion Detecting, and Disrupting Botnets Backbone Network and detection. This paper proposes Kuai Xu and Zhi-Li Zhang, University a simple design principle for com- Evan Cooke and Farnam Jahanian, Uni- of Minnesota; Supratik Bhattacharyya, munication protocols that help par- versity of Michigan; Danny McPherson, Sprint ATL ticipants avoid unwanted traffic. Arbor Networks This paper shows how one can ob- The main observation is that a re- A common message from this ses- serve the communication patterns ceiver-pull approach is superior sion was the need for security de- of end-hosts and use this informa- to a sender-push approach in the velopers to share information in tion to determine unwanted traffic degree of control offered to a recip- order to keep pace with the grow- within the backbone of an ISP. The ient. However, in some applica- ing sophistication of Internet at- goal is to use the behavioral profile tions, such as email, a pure receiv- tacks. er-pull approach is not possible, of each end-host based on IP head- The first paper illustrates the preva- since communication is initiated by er information and the Zipf-like lence of bots on the Internet, where the sender. For such applications, a nature of traffic characteristics to thousands of new bots show up on sender-intent receiver-pull ap- identify and filter the large sources a daily basis, and it describes differ- proach is proposed, where the of unwanted traffic. One open ent techniques for detecting and sender first sends a short intent- question remains: Under what con- disrupting botnets. Among the dif- to-send message, on the basis of straints can good traffic be separat- ferent detection strategies, this which the receiver makes the deci- ed from bad traffic based only on paper stresses the need for a behav- sion to accept or reject the mes- observing the IP header informa- ioral methodology for analyzing sage. The principal advantage of tion? IRC traffic from end-hosts to detect this approach is the potential band- bot communication. One challenge Analyzing Cooperative Containment of width savings, since the receiver in measuring the prevalence of bots Fast Scanning Worms does not need to download the en- is that one needs to be part of sev- Jayanthkumar Kannan, Lakshmi- tire message. As pointed out by one eral botnets to perform such meas- narayanan Subramanian, Ion Stoica, workshop participant, this basic urements, raising legal issues. and Randy H. Katz, University of Cali- idea has been proposed before, but fornia, Berkeley this paper suggests a way of imple- An Architecture for Developing Behavioral History The final paper in this session menting it using simple extensions focused on analyzing the effective- to SMTP. Mark Allman and Vern Paxson, Interna- ness of different cooperative strate- Detecting Spam in VoIP Networks tional Computer Science Institute; Ethan Blanton, Purdue University gies for worm containment, specifi- Ram Dantu and Prakash Kolan, Univer- cally, on the relationship between sity of North Texas, Denton This paper examines how architec- the type of signaling between fire- ture can aid in determining the This paper deals with the problem walls and the level of containment. sources of unwanted traffic where of spam detection in VoIP net- This paper illustrates that the sig- the identity of a source can be in works. VoIP spam is likely to be naling strategy essential for good different granularities (e.g., email, more irritating to users than email containment depends on various end-host). The grand vision is to spam, since VoIP is synchronous. factors, including the reproduction build a repository that consists of In VoIP spam detection, the deci- rate of the worm (i.e., the number the sources of different forms of sion about spam potential has to be of new hosts one vulnerable host malicious traffic, with the chal- made using only the initial context affects), the level of malice, and the lenges that the architecture be scal- of the message and cannot be de- extent of deployment. How to gen- able, open system, distributed, ro- pendent on the content of the en- erate robust and succinct worm fil- bust, abe to handle various types of tire message. The paper proposes a ters with a low false-positive proba- traffic, and policy neutral. To detect multi-stage VoIP spam identifica- bility remains a goal for future bogus information in this reposito- tion mechanism that involves sev- work. ry, one would need audit trails for

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 93 evidence and the ability to assess the total traffic under certain condi- send spam so that HoneySpam can the reputation of reporters and cor- tions. then identify the spammers, traffic- roborate different entries for cor- Adaptive Defense Against Various Net- shape them, and provide them with rectness in the system. work Attacks incorrect information to hinder their progress. The challenge is to The Spoofer Project: Inferring the Ex- Cliff C. Zou, University of Massachu- tent of Internet Source Address Filter- hide the identity and location of setts; Nick Duffield, AT&T Labs—Re- the HoneySpam system. ing on the Internet search; Don Towsley and Weibo Gong, Robert Beverly and Steve Bauer, MIT University of Massachusetts Improving Spam Detection Based on Structural Similarity The final paper in this session de- This paper proposes a general way scribes a measurement study to to adaptively tune an attack detec- Luiz H. Gomes, Fernando D.O. Castro, quantify the extent and nature of tion mechanism in response to the Virgilio A.F. Almeida, Jussara M. Almei- source address filtering. Among the volume of attack traffic. The basic da, and Rodrigo B. Almeida, Universi- important findings are that a signif- idea is to periodically vary parame- dade Federal de Minas Gerais; Luis icant number of netblocks allow ters in a detection mechanism so as M.A. Bettencourt, Los Alamos National some form of spoofing, filtering is to optimize an objective function Laboratory applied inconsistently, filtering that includes penalties for missed This paper deals with improving policies correspond to netblocks in attacks (false negatives) and incor- traditional spam detection algo- BGP, and no specific geographic rect alarms (false positives). This is rithms using information regarding patterns abound in spoofing. based on the intuitive observation the social networks of the sender that a higher false positive proba- and the recipient. All senders are ADAPTIVE DEFENSE SYSTEMS bility is tolerable during periods grouped into clusters based on the of high attack. This technique is similarity of the recipients they Stress Testing Traffic to Infer Its applied to two detection mecha- send mail to. Similarly, receivers are Legitimacy nisms known in literature: the hop- grouped into clusters based on the count filtering method for detect- senders who have contacted them Nick Duffield and Balachander Krishna- ing spoofed SYN flood attacks, and in the past. The probability that a murthy, AT&T Labs—Research the threshold random walk for de- particular email is spam is comput- This paper proposes stress testing fending against worms. This paper ed based on the extent to which the as a general approach to distinguish provoked considerable discussion sender’s (recipient’s) cluster have between legitimate and malicious among attendees regarding the pros sent (received) spam in the past. traffic. By inducing artificial imped- and cons of such adaptive defense This decision is used to augment a iments to traffic and examining the techniques. While it is clear that Bayesian classifier, and the results reaction of the sender, one can de- smart attacks (such as pulsed DoS demonstrate that false positives are duce whether the traffic is mali- attacks) are still viable against reduced, but not by a significant cious. This idea is predicated on adaptive defense mechanisms, it amount. A question on the scalabil- two points: (1) differentiation: re- was generally agreed that adaptive ity of the system to several thou- sponse to impairment differs be- defense would reduce the impact of sands of senders/receivers was tween malicious traffic and legiti- the attack. raised, and the author suggested mate traffic; and (2) recovery: schemes like LRU aging to deal legitimate traffic can deal with im- SPA M-2 AND ENCRYPTION with this issue. pairments. They examine the appli- Lightweight Encryption for Email cability of these principles in differ- HoneySpam: Honeypots Fighting Spam Ben Adida, Susan Hohenberger, and ent domains, such as TCP, HTTP, at the Source UDP, SMTP, and BGP. The extent to Ronald L. Rivest, MIT Mauro Andreolini, Alessandro Bulgarel- which a TCP sender backs off in re- The final paper leverages identity- li, Michele Colajanni, Francesca sponse to an induced loss can be based encryption (IBE) techniques Mazzoni, and Luca Messori, Università used as a metric of the malice of the for easing the use of encrypted di Modena e Reggio Emilia sender. The frequency of HTTP email. The basic idea is to leverage connection establishment in re- The final session dealt with email DNS as a distribution mechanism sponse to a “Service unavailable” spam detection and encryption for public keys at the domain level. message can be used similarly. The mechanisms. The first paper de- In IBE, a sender can use the recipi- authors are also working on evalua- scribed the architecture of Hon- ent’s email address along with a tion of these techniques over nor- eySpam, a honeypot implementa- master public key (MPK) to derive mal traffic. One comment by an au- tion to reduce spam. The goal is the recipient’s public key. The dience member was that stress counter-cultural in that it encour- paper suggests that each email do- testing may lead to an increase in ages spammers to use the system to main should designate a set of key

94 ;LO GIN: V OL. 30, NO. 5 servers that would generate an ditional security, a recipient could One comment raised was that the MPK jointly and distribute it via also publish a second public key on ease of deriving the public key for DNS. These key servers would a broadcast channel. The security a particular recipient might also communicate the secret key for an of this scheme is dependent on that allow a spammer to encrypt mes- email address in their domain by of DNS and the channel between sages and render them unreadable simply sending it via email. For ad- the key server and the recipient. by spam filters.

• What, if any, non-text ele- are encouraged to write case ments (illustrations, code, studies of hardware or soft- diagrams, etc.) will be in- ware that you helped install writing for cluded? and configure, as long as you • What is the approximate are not affiliated with or paid ;login: length of the article? by the company you are writ- Start out by answering each of ing about. Writing is not easy for most of us. those six questions. In answering • Personal attacks Having your writing rejected, for the question about length, bear in any reason, is no fun at all. The mind that a page in ;login: is about FORMAT way to get your articles published 600 words. It is unusual for us to in ;login:, with the least effort on The initial reading of your article publish a one-page article or one your part and on the part of the will be done by people using UNIX over eight pages in length, but it staff of ;login:, is to submit a pro- systems. Later phases involve can happen, and it will, if your arti- posal first. Macs, but please send us text/plain cle deserves it. We suggest, howev- formatted documents for the pro- er, that you try to keep your article PROPOSALS posal. Send proposals to between two and five pages, as this [email protected]. In the world of publishing, writing matches the attention span of many a proposal is nothing new. If you people. plan on writing a book, you need DEADLINES to write one chapter, a proposed The answer to the question about table of contents, and the proposal why the article needs to be read is For our publishing deadlines, in- itself and send the package to a the place to wax enthusiastic. We cluding the time you can expect to book publisher. Writing the entire do not want marketing, but your be asked to read proofs of your arti- book first is asking for rejection, most eloquent explanation of why cle, see the online schedule. unless you are a well-known, pop- this article is important to the read- ular writer. ership of ;login:, which is also the COPYRIGHT membership of USENIX. ;login: proposals are not like paper You own the copyright to your submission abstracts. We are not U NACCEPTABLE ARTICLES work and grant USENIX permis- asking you to write a draft of the sion to publish it in ;login: and on article as the proposal, but instead ;login: will not publish certain arti- the Web. USENIX owns the copy- to describe the article you wish to cles. These include, but are not right on the collection that is each write. There are some elements limited to: issue of ;login:. You must grant per- that you will want to include in any • Previously published arti- mission for any third party to proposal: cles. A piece that has ap- reprint your text; financial negotia- • What’s the topic of the arti- peared on your own Web tions are a private matter between cle? server but not been posted to you and any reprinter. • What type of article is it USENET or slashdot is not (case study, tutorial, editori- considered to have been pub- FOCUS ISSUES al, mini-paper, etc.)? lished. In the past, there has been only one • Who is the intended audi- • Marketing pieces of any focus issue per year, the December ence (syadmins, program- type. We don’t accept articles Security edition. In the future, each mers, security wonks, net- about products. “Marketing” issue will have one or more sug- work admins, etc.)? does not include being en- gested focuses, tied either to events • Why does this article need to thusiastic about a new tool that will happen soon after ;login: be read? or software that you can has been delivered or events that download for free, and you are summarized in that edition.

;LOGIN: OCTOBER 2005 CONFERENCE SUMMARIES 95 PROFESSORS, CAMPUS STAFF, AND STUDENTS—

DO YOU HAVE A USENIX REPRESENTATIVE ON YOUR CAMPUS?

IF NOT, USENIX IS INTERESTED IN HAVING ONE

AT YOUR UNIVERSITY!

The USENIX University Outreach Program is a network of representatives at campuses around the world who provide Association information to students, and encourage student involvement in USENIX. This is a volunteer program, for which USENIX is always looking for academics to participate. The program is designed for faculty who directly interact with students. We fund one representative from a campus at a time. In return for service as a cam- pus representative, we offer a complimentary membership and other benefits. A liaison’s responsibilities include: I Maintaining a library (online and in print) of USENIX publications at your university for student use I Distributing calls for papers and upcoming event brochures, and re-distributing informa- tional emails from USENIX I Encouraging students to apply for travel stipends to conferences I Providing students who wish to join USENIX with information and applications I Helping students to submit research papers to relevant USENIX conferences I Providing USENIX with feedback and suggestions on how the organization can better serve students In return for being our “eyes and ears” on campus, liaisons receive a complimentary member- ship in USENIX with all membership benefits (except voting rights), and a free conference registration once a year (after one full year of service as a campus liaison). To qualify as a campus representative, you must: I Be full-time faculty or staff at a four year accredited university I Have been a dues- paying member of USENIX for at least one full year in the past For more information about our Student Programs, see http://www.usenix.org/students USENIX contact: Tara Mulligan, Scholastic Programs Manager, [email protected] Announcement and Call for Papers 2006 USENIX Annual Technical Conference: Systems Practice & Experience Track Formerly the General Session Refereed Papers Track http://www.usenix.org/usenix06

Training Program: Tuesday–Saturday, May 30–June 3, 2006 Boston, Massachusetts, USA Technical Sessions: Thursday–Saturday, June 1–3, 2006 Important Dates Overview Paper submissions due: Tuesday, January 17, 2006 Authors are invited to submit original and innovative (hard deadline) papers to the Systems Practice & Experience Track (for- Notification to authors: Monday, February 27, 2006 merly the General Session Refereed Papers Track) of Final papers due: Monday, April 17, 2006 the 2006 USENIX Annual Technical Conference. We Poster submissions due: Monday, April 24, 2006 seek high-quality submissions that further the knowl- edge and understanding of modern computing systems, Program Committee with an emphasis on practical implementations and Program Co-Chairs experimental results. We encourage papers that break Atul Adya, Microsoft new ground or present insightful results based on expe- Erich Nahum, IBM T.J. Watson Research Center rience with computer systems. The USENIX conference Program Committee has a broad scope, and we encourage papers in a wide Steven Bellovin, Columbia University range of topics in systems Ranjita Bhagwan, IBM T.J. Watson Research Center Topics Jeff Chase, Duke University Mike Chen, Intel Research, Seattle Specific topics of interest include but are not limited to: N Architectural interaction Jason Flinn, N Steven Hand, University of Cambridge Benchmarking N Deployment experience Gernot Heiser, University of New South Wales and N National ICT Australia Distributed and parallel systems N Embedded systems Kim Keeton, Hewlett-Packard N Dejan Kostic, EPFL Energy/power management N File and storage systems Jay Lepreau, University of Utah N Barbara Liskov, Massachusetts Institute of Technology Networking and network services N Operating systems Jason Nieh, Columbia University N Vivek Pai, Princeton University Reliability, availability, and scalability N Security, privacy, and trust Dave Presotto, Google N John Reumann, Google Self-managing systems N Usage studies and workload characterization Mendel Rosenblum, Stanford University N Stefan Saroiu, University of Toronto Virtualization N Web technology Geoff Voelker, University of California, San Diego N Alec Wolman, Microsoft Research Wireless and mobile systems Yuanyuan Zhou, University of Illinois at Urbana- Best Paper Awards Champaign Cash prizes will be awarded to the best papers at the conference. Please see http://www.usenix.org /publications/library/proceedings/best_papers.html for examples of Best Papers from previous years. How to Submit and plagiarism constitute dishonesty or fraud. The Authors are required to submit full papers by 11:59 USENIX Annual Technical Conference, like other sci- p.m. PDT, Tuesday, January 17, 2006. This is a hard entific and technical conferences and journals, prohibits deadline; absolutely no extensions will be given. these practices and may, on the recommendation of a All submissions for USENIX ’06 will be electronic, program chair, take action against authors who have in PDF format, via a Web form on the conference Web committed them. In some cases, the program committee site. Authors will be notified of receipt of submission may share information about submitted papers with via email. USENIX ’06 will accept two types of papers: other conference chairs and journal editors to ensure the N Regular Papers: Submitted papers must be no integrity of papers under consideration. If a violation of longer than 14 single-spaced pages, including fig- these principles is found, sanctions may include, but are ures, tables, and references, using 10 point font or not limited to, barring the authors from submitting to or larger. The first page of the paper should include participating in USENIX conferences for a set period, the paper title and author name(s); reviewing is not contacting the authors’ institutions, and publicizing the blind. Papers longer than 14 pages will not be details of the case. reviewed. Papers accompanied by nondisclosure agreements N Short Papers: Authors may submit short papers, at cannot be accepted. All submissions are held in the most 6 pages long. These will be reviewed, highest confidentiality prior to publication in the Pro- accepted submissions will be included in the Pro- ceedings, both as a matter of policy and in accord with ceedings, and time will be provided in the Short the U.S. Copyright Act of 1976. Papers Sessions for brief presentations of these Authors will be notified of paper acceptance or papers. We expect that this format will appeal to rejection by Monday, February 27, 2006. Accepted authors who wish to publicize early ideas, convey papers will be shepherded by a program committee results that do not require a full-length paper, or member. Final papers must be no longer than 14 advocate new positions. pages, formatted in 2 columns, using 10 point Times In addition, the program committee may accept some Roman type on 12 point leading, in a text block of standard submissions as 6-page short papers if they feel 6.5" by 9". the submission is interesting but does not meet the cri- Note regarding registration: One author per teria of a full-length paper. Please indicate explicitly if paper will receive a registration discount of $200. you do not wish your full-length paper to be considered USENIX will offer a complimentary registration for the Short Papers Sessions. Papers accepted for the upon request. Short Papers Sessions will automatically be included in Poster Session the Poster Session. Specific questions about submissions may be sent to A poster session will also be held, in conjunction with [email protected]. the reception, where researchers can present recent and In a good paper, the authors will have: ongoing projects. The poster session will be a good N attacked a significant problem forum to discuss new ideas and get useful feedback N devised an interesting and practical solution from the community. The poster submissions should N clearly described what they have and have not include a brief description of the research idea(s); the implemented submission must not exceed 2 pages. Accepted posters N demonstrated the benefits of their solution will be put on the conference Web site; however, they N articulated the advances beyond previous work will not be printed in the conference Proceedings. Send N drawn appropriate conclusions poster submissions to [email protected] by Simultaneous submission of the same work to mul- Monday, April 24, 2006. tiple venues, submission of previously published work,

Last Updated: 9/2/05 Announcement and Call for Papers

15th USENIX Security Symposium http://www.usenix.org/sec06

July 31–August 4, 2006 Vancouver, B.C., Canada Important Dates training program will be followed by a two and one-half day Paper submissions due: February 1, 2006, 11:59 p.m. PST technical program, which will include refereed papers, invited Panel proposals due: March 29, 2006 talks, Work-in-Progress reports, panel discussions, and Birds-of- Notification to authors: April 3, 2006 a-Feather sessions. Final papers due: May 11, 2006 New in 2006, a workshop, titled Hot Topics in Security (HotSec ’06), will be held in conjunction with the main Symposium Organizers conference. More details will be announced soon on the USENIX Web site, http://www.usenix.org. Program Chair Angelos D. Keromytis, Columbia University Symposium Topics Program Committee Refereed paper submissions are solicited in all areas relating to William Arbaugh, University of Maryland systems and network security, including: Lee Badger, DARPA N Adaptive security and system management Peter Chen, University of Michigan N Analysis of network and security protocols Bill Cheswick, Lumeta N Applications of cryptographic techniques Marc Dacier, Eurecom, France N Attacks against networks and machines Ed Felten, Princeton University N Authentication and authorization of users, systems, and Virgil Gligor, University of Maryland applications John Ioannidis, Columbia University N Automated tools for source code analysis Trent Jaeger, Pennsylvania State University N Cryptographic implementation analysis and Somesh Jha, University of Wisconsin construction Louis Kruger, University of Wisconsin N Defenses against malicious code (worms, viruses, trojans, Wenke Lee, Georgia Institute of Technology spyware, etc.) Fabian Monrose, Johns Hopkins University N Denial-of-service attacks and countermeasures Andrew Myers, Cornell University N File and filesystem security Vassilis Prevelakis, Drexel University N Firewall technologies Niels Provos, Google N Forensics and diagnostics for security Michael Reiter, Carnegie Mellon University N Intrusion and anomaly detection and prevention Michael Roe, Microsoft Research, UK N Network infrastructure security R. Sekar, Stony Brook University N Operating system security Anil Somayaji, Carleton University N Privacy-preserving (and compromising) systems Jessica Staddon, PARC N Public key infrastructure Salvatore Stolfo, Columbia University N Rights management and copyright protection David Wagner, University of California, Berkeley N Security of agents and mobile code Brian Weis, Cisco N Security architectures Tara Whalen, Dalhousie University N Security in heterogeneous and large-scale environments Invited Talks Co-Chairs N Security policy Patrick McDaniel, Pennsylvania State University N Self-protecting and healing systems Gary McGraw, Cigital N Techniques for developing secure systems Work-in-Progress Session Chair N Voting systems analysis and security Doug Szajda, University of Richmond N Wireless and pervasive/ubiquitous computing security N World Wide Web security Symposium Overview Note that the USENIX Security Symposium is primarily a The USENIX Security Symposium brings together research-ers, systems security conference. Papers whose contributions are pri- practitioners, system administrators, system programmers, and marily new cryptographic algorithms or protocols, crypt- others interested in the latest advances in the security of com- analysis, electronic commerce primitives, etc., may not be puter systems and networks. The 15th USENIX Security Sym- appropriate for this conference. posium will be held July 31–August 4, 2006, in Vancouver, B.C., Canada. Refereed Papers & Awards All researchers are encouraged to submit papers covering Papers which have been formally reviewed and accepted will be novel and scientifically significant practical works in security or presented during the Symposium and published in the Sympo- applied cryptography. Submissions are due on February 1, sium Proceedings. It is expected that one of the paper authors 2006, 11:59 p.m. PST. The Symposium will span five days: a will attend the conference and present the work. It is the responsibility of the authors to find a suitable replacement pre- Paper Submission Instructions senter for their work, if the need arises. Papers are due by February 1, 2006, 11:59 p.m. PST. All sub- The Proceedings will be distributed to attendees and, fol- missions will be made online, and details of the submissions lowing the Symposium, will be available online to USENIX process will be made available on the conference Web site, members and for purchase. http://www.usenix.org/events/sec06/cfp, well in advance of the One author per paper will receive a registration discount of deadline. Submissions should be finished, complete papers. $200. USENIX will offer a complimentary registration upon Paper submissions should be about 10 to a maximum of 20 request. typeset pages, formatted in a single column, using 11 point Awards may be given at the conference for the best overall Times Roman type on 12 point leading, in a text block of 6.5" paper and for the best paper for which a student is the lead by 9" (default LaTeX 11 point single-column article format is author. Papers by program committee members are not eligible acceptable). Reviewers may not take into consideration any for these awards. portion of a submission that is over the stated limit. Once accepted, papers must be reformatted to be about 8 to a max- Training Program, Invited Talks, Panels, imum of 16 typeset pages, formatted in 2 columns, using 10 WiPs, and BoFs point Times Roman type on 12 point leading, in a text block In addition to the refereed papers and the keynote presentation, of 6.5" by 9". the Symposium will include a training program, invited talks, Paper submissions must not be anonymized. panel discussions, Work-in-Progress reports (WiPs), and Birds- Submissions must be in PDF format. Please make sure your of-a-Feather sessions (BoFs). You are invited to make sugges- submission can be opened using Adobe Acrobat 4.0. For more tions regarding topics or speakers in any of these sessions via details on the submission process, consult the detailed author email to the contacts listed below or to the program chair at guidelines. [email protected]. To insure that we can read your PDF file, authors are urged to follow the NSF “Fastlane” guidelines for document prepara- Training Program tion and to pay special attention to unusual fonts. For more Tutorials for both technical staff and managers will provide details, see: immediately useful, practical information on topics such as N https://www.fastlane.nsf.gov/documents/pdf_create local and network security precautions, what cryptography can /pdfcreate_01.jsp and cannot do, security mechanisms and policies, firewalls, and N https://www.fastlane.nsf.gov/documents/tex/tex_01.jsp monitoring systems. If you are interested in proposing a tuto- All submissions will be judged on originality, relevance, cor- rial or suggesting a topic, contact the USENIX Training Pro- rectness, and clarity. The USENIX Security Symposium, like gram Coordinator, Dan Klein, by email to [email protected]. most conferences and journals, requires that papers not be sub- Invited Talks mitted simultaneously to another conference or publication and that submitted papers not be previously published elsewhere, or There will be several outstanding invited talks in parallel with subsequently published within 12 months of acceptance at the the refereed papers. Please submit topic suggestions and talk Symposium. In addition to citing relevant, published work, proposals via email to [email protected]. authors should relate their submission to any other relevant Panel Discussions submission of theirs in other venues that are under review at The technical sessions may include topical panel discussions. the same time as their submissions to the Symposium. Please send topic suggestions and proposals to sec06chair@ We may share information about submissions with the pro- usenix.org. The deadline for panel proposals is March 29, gram chairs of other conferences considering papers during the 2006. review period. In case of a double submission or a similar viola- tion, we will automatically reject the specific submission as well Work-in-Progress Reports (WiPs) as any other submissions by the authors of the offending paper, The last session of the Symposium will consist of Work-in- and reserve the right to take further action. Progress reports (WiPs). This session offers short presentations Papers accompanied by nondisclosure agreement forms will about work in progress, new results, or timely topics. Speakers not be considered. All submissions are treated as confidential, should submit a one- or two-paragraph abstract to both as a matter of policy and in accordance with the U.S. [email protected] by 6:00 p.m. PDT on Wednesday, Copyright Act of 1976. August 2, 2006. Make sure to include your name, your affilia- Authors will be notified of acceptance by April 3, 2006. The tion, and the title of your talk. The accepted abstracts and ses- final paper due date is May 11, 2006. Each accepted submis- sion schedule will be posted on the conference Web site. The sion may be assigned a member of the program committee to time available will be distributed among the presenters, with act as its shepherd through the preparation of the final paper. each speaker allocated between 5 and 10 minutes. The time The assigned member will act as a conduit for feedback from limit will be strictly enforced. the committee to the authors. Birds-of-a-Feather Sessions (BoFs) Specific questions about submissions may be sent via email Birds-of-a-Feather sessions (BoFs) will be held Tuesday, to the program chair at [email protected]. Wednesday, and Thursday evenings. Birds-of-a-Feather sessions are informal gatherings of persons interested in a par- Program and Registration Information ticular topic. BoFs often feature a presentation or a demonstra- Complete program and registration information will be available tion followed by discussion, announcements, and the sharing of in May 2006 on the Symposium Web site, both as HTML and strategies. BoFs can be scheduled onsite or in advance. To as a printable PDF file. If you would like to receive the latest preschedule a BoF, please send email to bofs@ USENIX conference information, please join our mailing list at usenix.org with the title and a brief description of the BoF; the http://www.usenix.org/about/mailing.html. name, title, affiliation, and email address of the facilitator; and your preference of date and time. D

Technical Knockout The magazine of choice for champion developers

Once again, Software Developers and Development Managers have selected Dr. Dobb’s Journal as the best industry publication, according to the 2005 Evans Data Developer Marketing Patterns Study.

WHY? Here’s what Dr. Dobb’s readers have to say*:

“DDJ not only provides a theoretical basis for using the new technology but usually provides a real-world implementation that I can study to determine applicability as well.” “DDJ is invaluable to me as a software architect/developer.” “DDJ is my connection to the world of people who actually think about how computers and software work.” “DDJ frequently suggests to me new tools and techniques for solving the problems that I deal with in my job.”

So, if you’re serious about software development and design, why not make sure Dr. Dobb’s Journal is in your corner every month?

Subscribe today at: Domestic: https://www.neodata.com/ddj/qualformWDDJ.html International: https://www.neodata.com/ddj/2awa.html

PO Box 52226 Boulder, CO 80322-2226 [email protected]

*Source: Signet Dr. Dobb’s Journal Ad Study, February 2005 DEC. 4–9, 2005 www.usenix.org/lisa05 SAN DIEGO, CA 19TH LARGE INSTALLATION SYSTEM ADMINISTRATION CONFERENCE

FIND THE MISSING PIECES TO YOUR TOUGHEST PUZZLES. 6 DAYS OF TRAINING 3-DAY TECHNICAL PROGRAM by industry experts, including: KEYNOTE: Qi Lu, Vice President of Engineering, • Rik Farrow on Hands-on Linux Security Yahoo! Inc., on “Scaling Search Beyond the • Don Bailey on 802.11 Wireless Network Public Web” Penetration Testing 2 TRACKS OF INVITED TALKS, INCLUDING: • Richard Bejtlich on Network Incident • Matt Blaze, University of Pennsylvania: “Picking Response Locks with Cryptology” • Jacob Farmer on Disk-to-Disk Backup • Kevin Bankston, EFF: “How Sysadmins Can Protect Free Speech and Privacy on the LISA ’05 offers the most Electronic Frontier” in-depth, real-world system NEW! Hit the Ground Running Track administration training available! Refereed Papers, Guru Is In Sessions, Vendor Exhibition, BoFs, WiPs, and more!

Sponsored by USENIX and SAGE

Check out the Web site for more information! http://www.usenix.org/lisa05

PERIODICALS POSTAGE PAID AT BERKELEY, CALIFORNIA USENIX Association AND ADDITIONAL OFFICES 2560 Ninth Street, Suite 215 RIDE ALONG ENCLOSED Berkeley, CA 94710

POSTMASTER Send Address Changes to ;login: 2560 Ninth Street, Suite 215 Berkeley, CA 94710