THE USENIX MAGAZINE

December 2003 • volume 28 • number 6

inside:

Focus Issue: Security
Guest Editor: Rik Farrow

SECURITY
Perrine: The End of crypt() Passwords . . . Please?
Wysopal: Learning Security QA from the Vulnerability Researchers
Damron: Identifiable Fingerprints in Network Applications
Balas: Sebek: Covert Glass-Box Host Analysis
Jacobsson & Menczer: Untraceable Email Cluster Bombs
Mudge: Insider Threat
Singer: Life Without Firewalls
Deraison & Gula: Nessus
Forte: Coordinated Incident Response Procedures
Russell: How Are We Going to Patch All These Boxes?
Kenneally: Evidence Enhancing Technology

BOOK REVIEWS AND HISTORY
USENIX NEWS
CONFERENCE REPORTS
12th USENIX Security Symposium

The Advanced Computing Systems Association

conference reports

Our thanks to Murray Kucherawy for his summaries

BSDCon ’03
SEPTEMBER 8–12, 2003
SAN MATEO, CALIFORNIA
Summaries by Murray Kucherawy

KEYNOTE

COMPUTING FALLACIES (OR, WHAT IS THE WORLD COMING TO?)
Michi Henning, ZeroC, Inc.

Henning presented fourteen common misperceptions of the technology industry, and explored the fallacies of each. Those of us who have weathered the storm of the dot-com collapse may be nestled in the comfort of stable jobs, but according to Henning, the reality is that we’re far from where we need to be and, in some cases, possibly even going in the wrong direction.

Many of these misperceptions involve the idea that computers in the workplace are easy to use and increase productivity. This overlooks some key considerations: Adding computers to the workplace also establishes some infrastructure that has to be maintained. GUIs were expected to close the gap between using or managing complex software and systems, but without truly good GUI designs (and there are very few of those) the gap is only changed, not truly reduced.

Henning asserted that a great deal of computing-related talent is wasted on doing things just because they’re cool. This also applies to the latest and greatest word processors and spreadsheet packages. There has been little true advancement in the last decade, but new versions keep coming out, mainly to please shareholders.

Time-to-market pressures have also reduced the average education of a software developer to far below what any seasoned administrator or developer would demand, with obvious detrimental effects. Remember how good you were after just two years? Major subject areas covered 10 years ago, like grammars and finite-state automata, are mysteries to younger programmers. A lot of senior developer productivity is lost explaining debuggers to the fresh crews coming into the market, which hurts both the “bottom line” and our progress in general.

While it is technically true that computers are getting faster, software bloat and inefficiency are completely obscuring the hardware advances offered by manufacturers and their research. Do your favorite Web pages really load any faster? There’s less of an emphasis on efficiency; we no longer really care about benchmarks and actual performance comparisons.

How do we deal with all of this and get back on track? Henning says we should start acting in the interests of the people we really work for, i.e., consumers, and not people obsessed with the “bottom line.” It needs to be okay for developers to do long-term work, with long-term funding, rather than fussing about how to achieve the current quarter’s projections. There also needs to be a code of ethics to quash the high levels of self-interest currently dominating the industry. Changing a single API can cost enormous amounts of time and money. Progress can only come from a lot more cooperation and respect from everyone involved: the market, the developers, their managers, and our sales forces.

STICKY PROBLEMS

REASONING ABOUT SMP IN FREEBSD
Jeffrey Hsu, FreeBSD Project

Hsu discussed the logic behind lock placement in the highly anticipated SMP code for FreeBSD. SMP itself is exciting not because it’s new, but because it’s becoming affordable, making a comparison of the innards of various implementations particularly interesting.

FreeBSD’s SMP locking is based on the work done by the BSD/OS team. Only two of the low-level locking primitives are needed in this implementation, namely mutexes and spin locks. This approach comes from the observation that most locks in the SMP kernel actually go uncontested, so complex locking methods are generally not needed. In fact, it’s been observed that bus contention will become an issue before anything complex really becomes necessary. It’s a better use of developer time to concentrate on subsystem lock code.

The approach used in SMP locking chiefly depends on what goes into the subsystems involved. There are really only a few places where locks are truly necessary, and other operations should be skipped when considering a locking scheme. User-level race conditions, for example, should really be dealt with out in user space. Locking single atomic reads, e.g., a read of four bytes, would also be a waste of a lock.

Reference counts are also used throughout the FreeBSD SMP kernel. There is rarely a need for an atomic reference count increment/decrement primitive if the basic mutex primitive is fast enough, especially given that most mutexes are uncontested anyway.

Hsu closed by going over some of the basic synchronization concepts that should revive memories of threaded programming courses from years past. Obviously, such practices are especially important in SMP as well, as it is probably a prime example of why those concepts are key.

DEVD: A DEVICE CONFIGURATION DAEMON
M. Warner Losh, Timing Solutions, Inc.

Losh presented his work on devd, an event-driven device configuration daemon package. The goal here is to overcome UNIX’s traditionally monolithic approach to devices. Drivers are typically compiled into the kernel or loaded at boot time, but the device subsystems never change while the system is running. The kernel was never designed to have “hot-plug” hardware. With the advent of PCMCIA, USB, FireWire, hot-plug PCI and other upcoming technologies, you can have devices suddenly appear and want to do something.

We don’t want to keep writing new daemons for new technologies as they arrive (pccardd, usbd, apmd, etc.). Taking advantage of dynamic kernel loading concepts would be ideal, since it keeps the kernel size down.

The configuration for devd involves defining event-action mappings that can be triggered by, for example, device attach, device detach, and unknown device events. It is possible to control a device’s label even if the probe order changes. Attach events can invoke configuration actions such as triggering dhclient executions, and it is also possible to guide configuration of devices based on location. Device drivers can be loaded when the device arrives, rather than having them built into the installed kernel. The configuration is similar to the format of modern named.conf files to define the event-actions.

Future work will include handling power events, e.g., suspend and resume, dock and undock. Link up/down events will also be able to trigger actions. Also planned is a control socket so that a user-land application can monitor for certain device events.

ULE: A MODERN SCHEDULER FOR FREEBSD
Jeff Roberson, FreeBSD Project

Jeff Roberson took on the task of writing a new scheduler for SMP environments after observing a lack of CPU affinity in the existing scheduler. “CPU affinity” refers to a thread preferring the same CPU for later time slices to take advantage of large CPU caches. Supporting this leads to enhanced support for hyperthreading/SMT (simultaneous multithreading) processors. Roberson also observed that the common priority decay algorithms aren’t very fair in SMP environments, and current schedulers have no concept of CPUs of non-uniform capacity.

Roberson presented a comparison of various existing scheduler algorithms, including the existing BSD, SVr4, and Linux implementations, before going into the ULE implementation in detail.

The major components of the ULE implementation include several queues, two CPU load-balancing algorithms, scoring of interactive activity, a CPU usage estimator, and slice size and priority calculators. The load-balancing algorithms work together to keep the CPUs evenly loaded under a variety of load conditions, even if the CPUs are of varying power. Since moving cached data regarding a specific thread from one CPU to another carries a cost, migration of threads between CPUs is taken into consideration by these algorithms. Also, threads scheduled for a non-idle CPU can be “stolen” by an idle CPU, and a periodic task evaluates the current load situation and evens it out.

Graphs comparing the performance of the four schedulers under various loads were presented and are available in the white paper.

ULE’s gains come mainly from the decoupling of interactivity, priority, and slice size into individual parameters. Other schedulers leave these tightly coupled, with varying side effects. The result of this is a system that appears to be much more interactive even when confronted with a lot of re-niced load: “Livelock under nice load has been a constant problem for schedulers which ULE now avoids entirely.”

RELEASE ENGINEERING

AN AUTOMATED BINARY SECURITY UPDATE SYSTEM FOR FREEBSD
Colin Percival, Computing Lab, Oxford University

Percival’s package is intended to address the ever-present problem of lazy system administrators. Though Microsoft is best known for system managers who don’t bother to apply security updates in a timely fashion (or, indeed, at all), the open source community is not immune. He asserts that at least 25% of FreeBSD administrators are also behind the curve. Among other things, impediments to improvement involve cvsup being non-intuitive, and the confusing and time-consuming nature of the make-world approach. Some systems lack the resources to make existing approaches palatable.

The trick here is to determine what binaries are affected by a change to a particular piece of source code. The answer is not always obvious. There are some simple approaches one can take, like comparing a make-world result before and after the change, but many files change every time they are built by virtue of such things as build timestamps. It’s possible to work around this since build stamps, for example, are always in the same place, so if that’s all that differs, it can be skipped. There are always a few other complications though, such as fortune files, kernel version numbers, and quirks in gcc. Sometimes a particular file will have both crypto and non-crypto versions distributed.

Once a list of files to be distributed has been established, an update index is generated with lines indicating the file to be replaced, the “old” MD5 hash of the file to be replaced, and the MD5 hash of the file replacing it. Updating an old file could have one of several “old” hashes, so each of these is included in the update index, along with a 2048-bit public RSA key, an MD5 hash of the update index signed with the private part of that key, the new binaries, and binary diffs. The distribution of the public key is secured by including its MD5 hash with the updating software.

An obvious limitation to this approach is that the binaries to be replaced must match one of the “old” MD5 hashes published in the update index. Percival says this is a limitation he can accept because such a case would only occur for an administrator who has compiled his or her own source, and such people likely don’t have a real need for an automated binary update system.

Of course, this all relies upon trust of the source of the binary updates. This problem can be addressed by Byzantine methods, whereby updates would only be trusted (and therefore installed) if some minimum number of systems independently built and signed the same set of updates.

BUILDING A HIGH-PERFORMANCE COMPUTING CLUSTER USING FREEBSD
Brooks Davis, Michael AuYeung, Gary Green and Craig Lee, Aerospace Corporation

The goal of this project is to be able to build a high-performance cluster of machines using commodity PCs, usually running one of the free operating systems. The cluster (named Fellowship, after the Fellowship of the Ring) has four core machines: frodo, the management server; fellowship, the shell server; gamgee, for backups, databases, and monitoring; and legolas, a scratch server with 2.8 terabytes of storage. The cluster as a whole runs FreeBSD 4.8-STABLE, and provides over 183 gigaflops of floating-point compute power (LINPACK benchmark). It consists of 160 nodes, each dual CPU, using a mix of Pentium III and Xeon chips. The network used is GigE, with terminal servers for serial console access, and serial power controllers.

Almost any OS works to build such an environment. Things to consider when selecting an OS are locally available system administrator experience, the applications you plan to run, the maintenance model you want to support, the availability of diskless machine support, and your relationship to the OS vendor for help with all that extra tinkering you’re going to want to do. Selection of CPU should take into consideration price, performance and software support, but don’t forget to think about dispersing all the heat you will generate! Network options are again based on operating system support, price, and the needs of your application. You’ll also need to tackle the question of public vs. private IPs, for obvious security and provisioning reasons. The node naming and IP assignment convention selected for the Fellowship reflects location in the racks of the machines. Don’t name your machines after the services they provide, because this can come back to haunt you later.

Node configuration management can be a headache for clusters of this size. Consider such things as individual vs. automated OS and software installations and network booting. Automation of tasks in a large cluster is critical to efficient use of your time. Also think about your job model: manual or batch? Of note here is the Sun Grid Engine (SGE), which has been ported to FreeBSD. Don’t forget your monitoring tools (e.g., Nagios (Net Saint), Big Sister, Ganglia, and, again, SGE).

The team learned that in a commodity cluster environment, hardware attrition can be significant, so plan accordingly; investing in neat cabling practices and equipment is well worth it; and automation is extremely important.

Future work may include Beowulf-style process management, a checkpoint and restart service, use of a distributed file system (e.g., GFS), on-demand clustering, and a database-driven DHCP service.

BUILD.SH: CROSS-BUILDING NETBSD
Luke Mewburn and Matthew Green, NetBSD Foundation

The NetBSD build infrastructure includes the capability to cross-build an entire release, including bootable media. Luke Mewburn presented a discussion of this capability and the changes involved in making NetBSD capable of supporting this system.

Native builds of NetBSD releases don’t scale. The number of machines required would be staggering, and the time needed to compile on the slower ones is oppressive. In a cross-compile, one host builds for another architecture, so with a small number of very fast machines, a complete release can be accomplished with greatly reduced cost and time. There is no need for superuser access to do so, even to build the distribution media. A goal in this design was to avoid nonportable OS things like chrooted environments, shared libraries and loopback or virtual file systems. It is important also to separate the build tools from the installed build tools, and to have minimal impact on the NetBSD source tree.

The three main tools used are build.sh, which does the cross-compiles; makefs, which builds file-system images (currently only ffs, but there is planned support for iso9660, ext2fs, and FAT); and installboot, a cross-platform-friendly boot sector writer.

An important feature for the “unprivileged build” process is the change to log privileged file-system operations, such as permission changes, to a “meta log file” instead of actually applying them, and using that information when building installation media and tar files.

The result of this work is a set of regular automated builds for all platforms from a few sources. The system is very simple to use, but it has some teething problems. Not all software is cross-compile friendly.

Upcoming work involves solving the hassles with X11 cross-builds, and some improvement in cross-compiling packages where autoconf is involved, as they seem to have a pattern of not being very cross-compile friendly.

INVITED TALK

LONG-RANGE 802.11 WANS
Tim Pozar and Matt Peterson, cofounders of BARWN

BARWN (Bay Area Research Wireless Network) is a community wireless network based in San Francisco, with access points in the city and one on Mount San Bruno serving South San Francisco, Colma, and Daly City. It is similar in concept to other metropolitan wireless networks such as SFLan, SFWireless, NYCwireless, and SeattleWireless, especially with its common technical and political problems. All of these are built on the concept of member-owned infrastructure, but differ on the for-pay vs. free issue.

BARWN’s objectives are to develop and document long-range (over two miles) very low-cost wireless networking, and to provide a test bed for new protocols. Practical applications, such as public safety and incident response, can also serve as a backbone tying together various communities. Other positive side effects involve counteracting the loss of bi-directional expression on the Internet, since BARWN offers limited AUP restrictions, symmetrical bandwidth, no port filtering, and real static address space. BARWN also hopes to bridge the gap between clients on different major providers, so that traffic within San Francisco doesn’t need to transit a major network exchange down in San Jose first.

The deployment prefers triangles of coverage, and it has been found that there’s good general coverage from the top of Mount San Bruno in the area between Daly City, South San Francisco and Colma, and San Francisco. The sticky legalities of using higher-powered radio frequency repeaters and transmitters, though, can impede deployment. RF radiation dictates how these antennas are deployed, and limits public access to do so. Local governments may even regulate the aesthetics of such deployments. Further, license-exempt devices have no priority rights over any other user.

Naturally there are political obstacles as well. Governments generally own all of the good potential transmitter locations, but work at glacial speeds. Finding the right person and getting full sign-off is no less than a challenge. There are ever-present permit and zoning issues, and finding someone willing to take a risk on a private experimental project is never easy. However, governments do like demonstrations, so BARWN put together a demonstration for a San Francisco Police Department mobile command center using streaming video, and that managed to grease some wheels to get the project where it is today.

Technical challenges also abound. 802.11 doesn’t scale well, especially over large distances. Interference with other nearby equipment is also a consideration. 802.11h will go some way to help clear these hurdles using a technique called “frequency co-ordination,” selecting the quietest part of the spectrum to use. Build-out over short distance can be done visually, but longer-range installations need to be done with expensive surveys.

FreeBSD has a lot of good, stable support for wireless, with more under development, but is leading the game by a small margin and their efforts tend to work around a lot of firmware idiosyncrasies in various cards more effectively. There is a general lack of drivers for Broadcom and TI devices, although there are unofficial drivers rumored to be out there.

STORAGE/CRYPTO

GBDE: GEOM-BASED DISK ENCRYPTION
Poul-Henning Kamp, The FreeBSD Project

Kamp’s work involves the principle of “making sure data gets lost.” User ID and password protection aren’t enough for really important data. A hard drive can be easily removed from one machine and inserted in another, mounted as a secondary disk, and read without difficulty. As an extreme example, the battle plan for Operation Desert Storm was stolen from a car on an unsecured laptop!

GBDE is a GEOM-based solution for protection of hard drives with strong crypto. Developed under a DARPA contract, it is file system and application independent, and architecture and byte-endian invariant. GBDE works at the disk level, so an encrypted partition looks like any other partition. This makes it trickier for implementing good crypto, but in the end this approach makes the service easier to use. The invariance is important for media portability, and extends lifetime of the algorithm for future systems.

If an encryption system is too cumbersome, people just won’t use it. GBDE, however, is practical and deployable. It uses multiple parallel passphrases, with master key schemes, backup key methods, and destructive keys, which render the data permanently useless when applied. The passphrases are all changeable. The crypto principles applied are all the standard algorithms: AES, SHA2, and MD5. The primary strength of the system is via the crypto, and the secondary strength comes from frustrating attackers via such things as unpredictable on-disk locations and one-time-use sector keys.

The keys used are symmetric, unlike PGP, for example; a 128-bit symmetric key is about as strong as a 2304-bit asymmetric key. Breaking 128 bits of data will open a single sector. Breaking 256 bits will open the entire thing, but you’d also have to try all sectors to find the randomly placed lock sector, and if you try a lot of variant encodings, you’d have to be able to recognize that you have an actual hit in the first place.

The passphrase is the weak point, as usual. To be useful, it has to be long and subtle, using control characters, digits, etc. Of course, people can’t or don’t want to remember those. GBDE can take a passphrase from anywhere, such as keyboard, USB-key, or chip cards. Kamp recommends making a passphrase out of two parts: your private keyboard stuff and 1–8K of random bits on a USB key, the “something you know” plus “something you have” principle.

Support for destructive keys enables a data owner to get rid of data fast. Kamp’s examples included such things as students taking over an embassy, raids on human rights offices by police or college dorms by the RIAA, or perhaps the wife asking, “What takes up those 40GB on our hard disk?” The user can quickly destroy all the lock sectors by erasing the 2048 + 128-bit master key. Attacking the disk now requires O(2^384) work, which is much bigger than the O(2^256) work needed when the keys are intact (though that’s a huge amount of work anyway). You get positive feedback that the lock is destroyed. Recovery is still possible if the encrypted lock sector can be restored from a backup.

The hit in performance and disk space is minor. The biggest risk is bad sectors, which will unfortunately lock down chunks of the disk.

GBDE is available in FreeBSD 5.0 and later.

CRYPTOGRAPHIC DEVICE SUPPORT FOR FREEBSD
Samuel J. Leffler, Errno Consulting

Leffler discussed his work porting the OpenBSD cryptographic framework to FreeBSD and improvements he made in doing so. The goals here were hardware-accelerated cryptographic transformations for kernel and user applications, compatibility with the OpenBSD API, and a pass at tuning for performance. Leffler earned the Best Paper award for this paper.

Core support was converted from SPL-style synchronization to a fine-grained locking method, and the software crypto algorithms were merged with existing ones used by KAME IPSec, the established best-of-breed. OpenBSD API compatibility was always the top priority. Performance of the initial work was slower than desired. There was a lot of extra context switching and CPU use. Leffler was sure things could be faster.

Peak performance of the package was limited by the context switch rate on many systems. The initial framework required two context switches for each operation. Leffler replaced the kernel thread with a software interrupt thread for a vast improvement in performance, by a factor of 3.6.

Making a distinction between normal and “batchable” operations enabled further optimizations. Operations that were not batchable used a direct dispatch method. Replacing the software interrupt dispatch with direct dispatch to the crypto driver was four times faster, so there was already an improvement factor of about 15.

Since many callback methods can take a long time, it was inadvisable to execute a callback method in the context of the device driver. However, the callback used by the /dev/crypto driver does execute quickly and also avoids a context switch. Use of this optimization, with due care in the area of synchronization, yielded another factor of 33 improvement.

All of these improvements reduce overhead on the system, so everyone wins. Comparing the FreeBSD implementation to the one in OpenBSD 3.3 shows that this work yielded more than a 70% improvement for certain hardware up to operand sizes of 1KB. This is now available in the CURRENT and STABLE FreeBSD branches, and NetBSD added it in August 2003. Future work will support asymmetric operations, support for more and better hardware, and load balancing.
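
As a rough illustration of the two-part passphrase idea from the GBDE talk in this session (a typed secret plus random bits kept on a USB key), the sketch below combines the two parts into one fixed-size value. It is not GBDE's actual key handling: the function name combine_passphrase, the use of SHA-256, and the length-prefix encoding are all assumptions made for this example.

```python
import hashlib
import secrets


def combine_passphrase(typed_secret: str, token_bytes: bytes) -> bytes:
    """Hypothetical helper: mix 'something you know' with 'something you have'.

    Returns a 32-byte SHA-256 digest over both parts. Illustration only;
    a real disk-encryption tool would use its own, stronger derivation.
    """
    if not typed_secret or not token_bytes:
        raise ValueError("both passphrase parts are required")
    digest = hashlib.sha256()
    for part in (typed_secret.encode("utf-8"), token_bytes):
        # Length-prefix each part so ("ab", b"c") and ("a", b"bc") differ.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.digest()


if __name__ == "__main__":
    # 1024 random bytes (8K bits) stand in for the data stored on a USB key.
    usb_material = secrets.token_bytes(1024)
    combined = combine_passphrase("long and subtle phrase", usb_material)
    print(combined.hex())
```

Losing either part changes the result completely, which is the point of the "know" plus "have" split: the typed phrase alone, or the USB key alone, is not enough.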

68 ENHANCEMENTS TO THE FAST FILE SYSTEM TO exhausted. Some performance enhance- and vulnerability to viruses. There are SUPPORT MULTI-TERABYTE STORAGE SYS- ments were made to as well. some obvious examples of this. EPORTS

TEMS R Enhancements for live dumps have been Mass-market software is full of security Marshall Kirk McKusick, author and consultant added to support snapshots. A setuid holes, is bloated and inflexible, and, as root utility has been added to make a Henning pointed out in his keynote

McKusick presented recent work on ONFERENCE

snapshot of a file system to allow non- speech, is intentionally incompatible C extending the capacity of the Berkeley root users to make snapshots. Large file even with its own versions. This makes it FFS under FreeBSD. The current imple- system snapshots are supported. annoying and expensive to administer. mentation uses 32-bit block pointers, Even worse are closed-box solutions, which means file systems are limited to The amount of memory needs for which can’t verifiably satisfy security, only a few terabytes. I-nodes lack space large file systems is proportional to the reliability, or autonomic operation to add new functionality, and some size of the file system being checked. requirements. This hinders analysis of newer file-system technology is difficult Four bytes are needed per i-node, 50 the system and its implications, impedes to apply without changing the existing bytes per directory, and one bit per improvement of quality, and limits on-disk format. block. This means 1.2MB of memory is urgent on-site fixes. However, the needed per terabyte on a file system like UFS2, the new version of FFS, addresses closed-box approach benefits develop- /usr, but only 66K is needed per terabyte these issues. There is a single code base ers. Hiding intellectual property makes on an MP3 file system because all the for both the older and newer implemen- money and encourages consumer reten- files are very large. tations. The new on-disk format tion and loyalty. Reluctant continuance increases the size of an i-node from 128 This work has been present in FreeBSD is incentivized. Regrettably, security by to 256 bytes, but directory format is 5.0 for over a year, and has now been obscurity does get some mileage. One retained. The existing linear scan ported to NetBSD. Future work will would think that closed-box systems remains, but there are hooks for an allow extent-based storage allocation by have implied liability, except the shrink- indexing system. 
The two implementa- having each i-node store its block size wrap user agreement disclaims every- tions share directory manipulation code. directly. thing. The idea of cylinder groups is retained, Open source has its own problems. The but all geometry information is elimi- INVITED TALK source is available to intruders, and vul- nated. A superblock is added as a super- SOCIAL AND TECHNICAL IMPLICATIONS OF nerabilities can be exploited. However, set of the original superblock. The size NONPROPRIETARY SOFTWARE as we all know, open source is also reserved for the boot block area can be Peter G. Neumann, Computer Science important in promoting the evolution of selected rather than being fixed, and a Laboratory, SRI International high reliability and critical systems. zero-size boot block may be selected for Neumann presented his views about the Given a really secure system, open file systems that don’t need one. open source community and the work it source isn’t harmful since it’s of little Extended attributes are now supported. produces in contrast to the industry’s help to attackers. Open box solutions, on There’s optional auxiliary data stored commercial output. The BSD commu- the other hand, need more work to with each i-node, much like Apple data nity has made significant advances make them trustworthy, robust, and forks. The current implementation toward high-assurance, trustworthy sys- dependable. allows an expandable 32K that is used to tems. They are generally secure, reliable, In general, failures are likely due to poor support such things as ACLs and MACs. and interoperable. security. Viruses, worms, and trojans are Timestamp fields are now 64-bits long, Where disciplined development is pres- rampant, and there are denial-of-service and a fourth timestamp has been added ent, open source has many advantages attacks for which there is, unfortunately, called “birth time,”which is the actual over closed-source proprietary systems, almost no defense. 
There is a general creation time of the file. The i-node flags and has the possibility of being very belief that using cryptography secures are separated into user-settable and ker- robust. The “best practices” advocated you, even if your use of it is sloppy. nel-settable flag sets. by commercial proprietors are substan- There is extensive outsourcing of system I-nodes are now dynamic. Only two dard and often inadequate. They suffer administration duties, which in itself blocks of i-nodes per cylinder group are from lax requirements, flawed lan- can be a compromise to security. The now allocated during newfs, and blocks guages, and sloppy development, caus- Homeland Security Agency wants to dedicated to i-nodes are expanded ing bounds and buffer size violations deploy Microsoft software globally whenever the current i-nodes are nearly

within its organization to facilitate interoperability. The list goes on.

There can be major social implications for computing failures. The USS Yorktown’s engines shut down for nearly three hours because of a Microsoft machine suffering a division-by-zero error. Design issues caused Patriot missile inaccuracies. Bad UI design and faulty assumptions caused the Iran Air Airbus shootdown. The 1980 ARPANET collapse and the 1990 AT&T nationwide slowdown are also prime examples. Developers of closed-box solutions often make “could never happen” assumptions, eventually with disastrous results.

Again echoing Henning’s keynote speech, Neumann also mentioned common development fiascoes, such as rampant feature creep, bloat, incompatibilities, bad requirements, and bad architectures. The update of the nation’s air traffic control system left controllers with basically the same equipment and squandered over four billion dollars. The update of the IRS told a similar story.

UI design is also responsible for some major disasters. The design of helicopters that would eventually be dispatched to the Middle East had no requirement for engine shielding against EM interference, or even sand. Pacemakers and anti-theft devices do not mix, a requirement that was never considered. John Denver’s final flight involved a fuel-starved engine because the user interface for the fuel system in his aircraft was basically “down for right tank, right for left tank, up for ‘off’.”

Neumann advocates future emphasis on discipline in development and good engineering of products at all levels. We need improvements in evolution, evaluation, education, and training. There needs to be more effort given to responsible operational support. We need open standards for code, interfaces and interoperability, and distribution. There should be progress toward more reasonable contracts, liabilities, and incentives. Sound business models for open-box software need to be designed. Architectures need to be more robust, with minimal dependence on weak components. We must work toward more trustworthy servers, firewalls, and distribution paths. User authentication has to become far less trivial, with bilateral peer authentication. The computing infrastructure has to become a lot more resistant to denial-of-service attacks. We need better protocols and analysis tools.

The discussion that followed acknowledged that many of us involved in the open source movement know and heed these points, but it’s an uphill battle. That very day, C-SPAN was broadcasting a congressional subcommittee hearing about a recent virus outbreak that was very expensive in terms of both time and money. Microsoft, NAI, and others were invited to testify. The ultimate advice coming out of this meeting was “don’t click on PIF attachments” rather than a concession that this is the fault of bad software design. Recent worms have gotten very deep penetration but fortunately haven’t been malicious... yet. We’ve reached a point where large-scale infections are very easy to create. Unfortunately, deaths from such egregious flaws in software design do not get enough attention, and new laws tend to remove liability rather than enforce it.

Neumann says BSD platforms are particularly promising in developing trustworthy systems. An alternative to the commonplace homogeneous Microsoft installations is desperately needed. Open-box software is not the final answer, but it has enormous potential, especially with continued diligence and discipline.

SYSTEM BUILDING

RUNNING BSD KERNELS AS USER PROCESSES BY PARTIAL EMULATION AND REWRITING OF MACHINE INSTRUCTIONS
Hideki Eiraku and Yasushi Shinjo, University of Tsukuba

Eiraku and Shinjo won the Best Student Paper award for their work on this paper.

Running multiple OSes on a single machine enables the simultaneous execution of applications written for different operating systems. There are two typical approaches to this: virtual OSes and user-level OSes. The latter have porting problems, involving tremendous effort and/or detailed knowledge of the host and target kernel and architectures.

Partial emulation of hardware and the rewriting of machine instructions at compile time are used to detect some nonprivileged instructions that are tightly related to privileged ones. Implemented this way, user-level NetBSD is faster than NetBSD on Bochs, a virtual machine implementation, by a factor of 10. Thus, we can generate a user-level OS based on a native system. The success is somewhat limited, though; the source is needed, and it’s still slower than NetBSD on VMware or than user-mode Linux, mainly because of the volume of faults taking place.

There are four key issues to tackle: detect and emulate privileged and some nonprivileged instructions; redirect system calls and page faults to the user-level OS; emulate essential peripherals; and emulate the MMU. The changes to NetBSD to work in this environment were minor but necessary, as unmodified NetBSD 1.6 does not provide the needed facilities. The PTRACE_SYSCALL facility of Linux was introduced, and six constants, including the base address, were changed. After doing this, detailed kernel knowledge was not needed.

To demonstrate their results, NetBSD 1.5 was booted under NetBSD 1.6, and a build of patch was done. User-level

NetBSD and FreeBSD have been generated based on native systems. More than one virtual machine can be run at once.

Source code for this work will be available on SourceForge.

A DIGITAL PRESERVATION NETWORK APPLIANCE BASED ON OPENBSD
David S. H. Rosenthal, Stanford University Libraries

Rosenthal presented a “network appliance,” a digital preservation system for keeping academic journals published on the Web and accessible over the long term. The goal here is to establish what appears to be a single-function box that can be connected to the Internet with minimal monitoring or administration, and that is cheap to install, maintain, and upgrade.

Rosenthal’s solution is a peer-to-peer system of persistent Web caches. The package crawls journal Web sites, distributes to local users by acting as a proxy (keeping available documents that may now be gone), and preserves material by cooperating with other libraries’ caches to detect and repair damage. The requirements are low material and personnel costs. As a result of its success since 1999, the system is now in use at over 60 libraries. The original paper covering this work was published at Freenix 2000.

The system uses generic PC hardware, with inherent replication to make it reliable, and open source is used to reduce software costs. Staff costs are reduced by minimizing the need for administration, with a goal of 10 minutes per month.

The first version was based on the Linux Router Project. Every station ran from a write-locked boot floppy. The kernel and RAM disk file system root was on floppy, and remaining binaries were copied via FTP to a temporary file system. A reboot always returned the system to a known good state. This has gone through about 150 machine-years of testing. Rosenthal has observed that a floppy used only for boot is very reliable. However, a great deal of effort was expended to condense the software down to a single 1.68MB format floppy, and using that format can be annoying.

The second version’s design has a few new requirements. It must run from read-only media, but somehow allow fast updates. Signatures against write-locked media must be checked. All disk file systems must be marked noexec. A major OS distribution must be used, with minimized changes to the build process. The OS footprint must be minimal, and everything should be built from CVS nightly. The system trusts only the BIOS and the contents of the boot CD and the write-locked floppy, which contains the entire configuration, passwords, and package verification keys. Everything else can be verified based on this or on data in the keyservers. This design has passed several security audits, takes just over five minutes to boot a single unit, and in a fire-drill test, 96% of all machines in the cluster were upgraded in 48 hours.

Future work involves enhancing the host to add support for DHCP, NAT, native Java, USB storage, and open source BIOS, and for more diverse environments such as governments and developing countries. The source is available on SourceForge.

USING FREEBSD TO RENDER REALTIME LOCALIZED AUDIO AND VIDEO
John H. Baldwin, Weather Channel

Baldwin demonstrated a FreeBSD box built for the Weather Channel that provides local information overlays onto live feeds. A “Weather STAR,” a FreeBSD-based satellite-addressable data device, receives a full feed of weather data, extracts data useful to its location, and overlays it on top of the live broadcast from the studio for rebroadcast locally. What you see on the Weather Channel is a live broadcast showing a weather presenter with national maps, but local forecast information in text over the bottom of the screen.

These boxes can be reconfigured or rebooted via satellite instructions. They support simultaneous NTSC (analog) and ASI (MPEG digital) output. They provide this service for the normal Weather Channel broadcast and also for a new product called WeatherScan, which has no presenters and only relays a local forecast.

One of the changes needed for FreeBSD includes ACPI back-porting to support “soft off,” which allows the “power” button to cause a software interrupt, permitting a cleaner shutdown. Several driver updates and fixes were made, and improvements were made to the installation system.

Some problems that need to be overcome: nice is too mean to processes that need large CPU slices; with user-land threads, very large read() calls block the whole process; and there were issues with thread priorities, scheduling, and signals. A lot of this is already addressed in FreeBSD 5.x.

NETWORKING

TAGGING DATA IN THE NETWORK STACK: mbuf_tags
Angelos D. Keromytis, Columbia University

An mbuf is a fixed-size buffering scheme used in the BSD network stack. The problem here is that packets require additional attributes for processing, and the available 16-bit flags and the interface information are not sufficient. There are many such potential attributes; IPSec requires four or five, and more appear periodically in other implementations.

Keromytis’ work involves mbuf_tags, a dynamically allocated variable-sized attribute buffer, and is similar to NetBSD’s aux mbufs. There is a minimal fixed header referred to by an mbuf packet header, and a general memory

allocator. A list is added to the mbuf packet header using a most-recent-first method. Kernel modules are free to use their own method, but an API is presented for creating, deleting, appending/prepending, copying, and finding tags. One or two lines needed to be changed for a few existing routines to handle propagation and freeing of tags.

This implementation currently uses malloc() to allocate the tag and its data. Some work had to be done in various network drivers to correct their handling of mbufs.

mbuf_tags can be used to propagate IPSec-related information throughout the stack, to detect loops in virtual interfaces (with some optimizations produced for multi-threaded kernels), or to support an improved packet filter engine.

Future work includes using the pool(9) allocator to avoid the need for synchronization when allocating memory, tag triggers for use with encapsulation, and application-defined tags.

FAST IPSEC: A HIGH-PERFORMANCE IPSEC IMPLEMENTATION
Samuel J. Leffler, Errno Consulting

IPSec is composed of three protocols: AH (authentication), ESP (encryption and authentication), and IPComp (compression). ESP is the most frequently used. There are also a variety of crypto and authentication algorithms used within these. There are plenty of IPSec implementations, most notably KAME, the OpenBSD IPSec code, FreeS/WAN for Linux, and Linux’s IPSec.

So why another implementation? Samuel Leffler sought to implement IPSec with hardware acceleration for FreeBSD and to develop a wireless mesh network. The requirements were support of hardware acceleration, a space-efficient implementation, and compatibility with FreeBSD. The approach chosen was an amalgam of KAME and OpenBSD’s work.

Leffler began by porting the OpenBSD crypto framework, and cloned the KAME IPSec code onto FreeBSD-STABLE. The KAME code was changed to a callback (continuation) model, and heavy tuning was done. The basic KAME framework was retained for compatibility, but ideas like packet tags, continuations, and code path merging were integrated from the OpenBSD code. The result was familiar to both developer groups.

The performance is dominated by crypto calculations, so several of the bottlenecks that needed to be optimized showed up only with fast crypto hardware. Improvements were made in the areas of crypto support, reducing processing overhead; data handling, aligning packet data and aggressively coalescing mbuf chains; network drivers, tackling the usual hardware issues of latency, bus bandwidth, and interrupt coalescing; and system I/O, such as IRQ multiplexing, bus bandwidth handling, interrupt latency, and system effects such as IRQ entropy.

The results: Leffler’s Fast IPSec implementation is 60% faster than the OpenBSD code for the software crypto case, and about the same speed as KAME. The peak hardware-accelerated operation is more than twice that of any other open source implementation. End-to-end throughput measurements were about 230MBps for a uniprocessor machine and over 400MBps when acting as an IPSec gateway. The CPU and 32-bit PCI are the limiting factors. Future work will include IPv6 support, an overhaul of the PF_KEY code, and an SADB redesign to improve locking.

THE WHBA PROJECT: EXPERIENCES “DEEPLY EMBEDDING” NETBSD
Jason R. Thorpe and Allen K. Briggs, Wasabi Systems, Inc.

Thorpe and Briggs worked with the idea of embedding NetBSD on a host bus adapter (HBA). Typical HBA applications include RAID, SCSI, iSCSI, Fibre Channel, and TCP/IP offload. The project of embedding NetBSD onto such a card would allow the offloading of a variety of processing chores from the host server.

The Wasabi Embedded Programming Environment (WEPE) is a merging of user space into kernel space. The entire application lives in the kernel to promote effective interaction with the messaging and DMA hardware. This also permits more direct access to large chunks of contiguous SDRAM. In addition, WEPE is an API for applications that provides portability for user space and the kernel environment, a configuration management framework, and a set of NetBSD kernel modifications.

The API offers file and socket I/O operations and thread and some networking functions. There is also a kshell, which acts as the WEPE debug console, and several kernel environment debugging tools. The operating environment is interesting in its contrasts: The host has plenty of local disk, but the embedded system has no local disk, only a small RAM disk; host reset is under software control, but the HBA reset is not; and applications on the host are largely independent, but on the HBA they are tightly coupled.

Thorpe and Briggs successfully demonstrated an iSCSI target HBA running NetBSD+WEPE in conjunction with Intel and DataCore at Storage Networking World in April 2003. The performance was not as good as they had hoped, but they understand now that their debugging and analysis tools are too limited. Although more work will be needed to move this work from the realm of “doable” to that of “viable,” something very useful was produced with many intelligent potential applications.
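As an illustration of the tagging scheme summarized under “Tagging Data in the Network Stack,” the following is a minimal Python sketch of a per-packet tag list kept in most-recent-first order, with operations for creating, prepending, appending, finding, copying, and deleting tags. It is a toy model only: the class and method names (Tag, PacketHeader, prepend, find, and so on) and the tag type strings are hypothetical and are not the kernel API described in the talk.

```python
"""Toy model of per-packet tags; all names are hypothetical, not the BSD kernel API."""


class Tag:
    """A variable-sized attribute: a type identifier plus opaque payload bytes."""

    def __init__(self, tag_type, data=b""):
        self.tag_type = tag_type
        self.data = bytes(data)


class PacketHeader:
    """Stands in for an mbuf packet header that refers to a list of tags."""

    def __init__(self):
        self.tags = []  # index 0 holds the most recently prepended tag

    def prepend(self, tag):
        """Add a tag at the front, giving most-recent-first order."""
        self.tags.insert(0, tag)

    def append(self, tag):
        """Add a tag at the end of the list."""
        self.tags.append(tag)

    def find(self, tag_type):
        """Return the first (most recent) tag of the given type, or None."""
        for tag in self.tags:
            if tag.tag_type == tag_type:
                return tag
        return None

    def delete(self, tag):
        """Unlink one tag (compared by identity, since Tag defines no __eq__)."""
        self.tags.remove(tag)

    def copy_tags_to(self, other):
        """Propagate copies of all tags to another packet header."""
        other.tags = [Tag(t.tag_type, t.data) for t in self.tags]

    def free_tags(self):
        """Drop every tag, as would happen when the packet is freed."""
        self.tags.clear()


pkt = PacketHeader()
pkt.prepend(Tag("ipsec_in_done", b"\x00\x01"))  # hypothetical tag types
pkt.prepend(Tag("loop_check", b"if0"))

clone = PacketHeader()
pkt.copy_tags_to(clone)             # tags follow the duplicated packet
pkt.delete(pkt.find("loop_check"))  # the original drops one tag

print([t.tag_type for t in pkt.tags])    # ['ipsec_in_done']
print([t.tag_type for t in clone.tags])  # ['loop_check', 'ipsec_in_done']
```

Keeping the newest tag at the front means a lookup by type returns the most recently attached attribute first, which is one plausible reason for a most-recent-first list; the summary itself does not state the motivation.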

INVITED TALK

POST-DIGITAL POSSIBILITIES
Michael Hawley, MIT

You’re never more creative than when you play. Michael Hawley leads a team of graduate student researchers interested in finding out what the next high-tech revolution will be. Photography was the last medium to be subsumed by the digital wave, but Hawley says the digital revolution has hardly started. The biggest theme out of the Media Lab lately is embedded intelligence.

Technology and toys tell us a lot about our advances. Video games are now doing what supercomputers used to do. Even a Furby contains a lot of technology (by old standards) packed into a small toy! There’s a lot of infrastructure still to be invented. Lots of it is based on what we already know, but a lot isn’t.

Technology in LEDs has improved in recent years. They are brighter, cheaper, efficient, and even networked, with no gels to switch around. But we also like technologies that break new ground, do things we didn’t imagine before. They have added chips to coffee mugs, watches, and various other devices. Imagine a coffee machine that knows what you want and how you like it! This can bring about a radical change in interface designs so that a lot of the simple things become automatic.

In the kitchen, such intelligence has been added to various devices to improve automation. They call this “counter intelligence,” which produced offers of sponsorship from the NSA and inquiries from the CIA (no, the Culinary Institute of America). They also invented a digital nose, based on some biochemical technology that already exists, which can sound an alarm when it smells that your cake is ready.

Another project involved having a marathon runner swallow a thermometer in a pill (don’t worry, it’s FAA-approved) and wear a fanny pack full of sensors like a heart monitor, GPS, and a cellular modem to upload it all. They’ve also added sensors to jewelry to monitor your health over time.

Mount Everest was a testing ground for some biometrics useful when people go to very cold climates and high altitudes. Weather monitoring stations were also deployed and were useful even at extremely low bandwidths. GPS surveying points were set up to monitor tectonic movement of the mountain over time. In Iceland, skiing kinematics were measured, but these need to be as noninvasive as possible while still being able to withstand harsh conditions.

In Hawaii, monitoring stations were built, disguised as tree branches or rocks, for tracking the pollination patterns of some very rare plants that are not fully understood. These need to be completely self-contained as there is no power in the areas they wish to monitor. They also need small monitors to broadcast their observations to bases.

Cambodia is an example of a very poor country that has managed to build good wireless coverage. Orphans with access to a computer become teachers, even celebrities. By local standards, a donated old Macintosh can turn a school into a supercomputing center. The country has a very high percentage of youth, so growing up with access to this technology is obviously very important for their future.

The team also traveled to Bhutan, a country in Asia about the size of Switzerland. It was the last country in the world to get cable TV. While taking photos there, a GPS attached to the team’s digital camera would sample their position periodically and add its details to the JPEG metadata, along with lens and camera setting information. This was later extracted when the photos were uploaded, making indexing and describing each photo much easier.

All of the applications presented were remarkable, spanning enhancements both in established technical areas such as photographic journals and in new areas of innovation such as advances in kitchen technology. It will indeed be interesting to see where Hawley’s research will take us next.
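To make the photo-tagging step from the Bhutan trip concrete, here is a small Python sketch of how a sampled GPS position might be prepared for JPEG metadata. The summary does not say which fields or tools the team used; as an assumption, this follows the standard EXIF GPS convention, in which latitude and longitude are each stored as three rational numbers (degrees, minutes, seconds) plus a hemisphere reference letter. The function names and the sample coordinates are hypothetical, and the sketch only builds the values; it does not write them into a file.

```python
from fractions import Fraction


def to_exif_dms(decimal_degrees, seconds_denominator=100):
    """Convert decimal degrees to EXIF-style rationals:
    ((degrees, 1), (minutes, 1), (seconds_numerator, seconds_denominator)).
    The sign is dropped here; the hemisphere is recorded separately.
    Sketch only: seconds that round up to 60 are not carried into minutes."""
    magnitude = abs(Fraction(str(decimal_degrees)))
    degrees = int(magnitude)
    minutes_exact = (magnitude - degrees) * 60
    minutes = int(minutes_exact)
    seconds_exact = (minutes_exact - minutes) * 60
    seconds_numerator = round(seconds_exact * seconds_denominator)
    return ((degrees, 1), (minutes, 1), (seconds_numerator, seconds_denominator))


def gps_fields(latitude, longitude):
    """Build a dictionary keyed by the standard EXIF GPS tag names."""
    return {
        "GPSLatitudeRef": "N" if latitude >= 0 else "S",
        "GPSLatitude": to_exif_dms(latitude),
        "GPSLongitudeRef": "E" if longitude >= 0 else "W",
        "GPSLongitude": to_exif_dms(longitude),
    }


# Hypothetical sample position, chosen only for illustration.
fields = gps_fields(27.5, 90.25)
print(fields["GPSLatitudeRef"], fields["GPSLatitude"])
# N ((27, 1), (30, 1), (0, 100))
print(fields["GPSLongitudeRef"], fields["GPSLongitude"])
# E ((90, 1), (15, 1), (0, 100))
```

Storing the position inside the image file itself is what lets it be extracted later at upload time, as the summary describes, without keeping a separate log in step with the photos.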
