The Flux OS Toolkit: Reusable Components for OS Implementation

Bryan Ford, Kevin Van Maren, Jay Lepreau, Stephen Clawson, Bart Robinson, Jeff Turner†
Department of Computer Science, University of Utah
Salt Lake City, UT 84112

http://www.cs.utah.edu/projects/flux/

Abstract

To an unappreciated degree, research both in operating systems and their programming languages has been severely hampered by the lack of cleanly reusable code providing mundane low-level OS infrastructure such as bootstrap code and device drivers. The Flux OS Toolkit solves this problem by providing a set of clean, well-documented components. These components can be used as basic building blocks both for operating systems and for running language run-time systems directly on the hardware. The toolkit's implementation itself embodies reuse techniques by incorporating components such as device drivers, file systems, and networking code, unchanged, from other sources. We believe the kit also makes feasible the production of highly assured embedded and operating systems: by enabling reuse of low-level code, the high cost of detailed verification of that code can be amortized over many systems for critical environments. The OS toolkit is already heavily used in several different OS and programming language projects, and has already catalyzed research and development that would otherwise never have been attempted.

1 Introduction

As operating system functionality continues to expand and diversify, it is increasingly impractical for a small group to implement even a basic useful OS core (e.g., the functionality traditionally found in the kernel) entirely from scratch. Furthermore, in most research domains, only a few specific areas provide fodder for interesting research topics. For example, from reading an OS conference proceedings, one might be given the impression that building an OS "only" involves writing a virtual memory system, an IPC system, a file system, a scheduler, some fast local-area network code, and a profiler to produce nice bar charts. However, as any experienced OS builder knows, many of the problems involved in building an OS have already been solved countless times, and just aren't interesting to researchers. For example, any realistic OS, in order to be useful even for research, typically includes boot loader code, kernel startup code, various device drivers, kernel printf and malloc code, a kernel debugger, etc. A research project starting a new OS completely from scratch would invariably spend at least the first six months simply writing such infrastructure "grunge" before even starting on the interesting aspects of the OS.

This research was supported in part by the Defense Advanced Research Projects Agency, monitored by the Department of the Army under contract number DABT63-94-C-0058, and Rome Laboratory, Air Force Materiel Command, USAF, under agreement number F30602-96-2-0269. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation hereon.

† U.S. Department of Defense

Copyright 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

1.1 Related work

Most OS researchers have realized this problem of high startup cost, and have resorted to cannibalizing BSD, Linux, or other freely available OSes rather than reinventing the wheel. Mach used BSD, Linux [13], and vendors' device drivers; SPIN [3] uses device drivers from FreeBSD; and VINO [17] takes its device drivers, bootstrap code, and low-level support for virtual memory from NetBSD.

While this approach saves time, the developer must manually examine and dissect the old OS; it would save even more time if the developer could simply obtain a set of clearly-documented components. It would also enable research by those whose primary expertise is in areas other than operating systems, e.g., programming language researchers who wish to explore the effects of higher-level languages running directly on the hardware. This is the purpose of the Flux OS Toolkit.

Recent research projects such as Exokernel [6], SPIN [3], and VINO [17] focus on creating extensible systems which allow applications to modify the behavior of the core OS to suit their particular needs. However, these systems still define a particular, fixed set of "core" functionality and a set of policies by which the core can be used and extended. The OS Toolkit, in contrast, makes no attempt to be a useful OS in itself and does not define any particular set of "core" functionality, but merely provides a suite of components from which real OSes can be built.

Other approaches involved creating an operating system built from a complex object-oriented framework, such as in the Choices [5] or Taligent [15] work. Although such efforts have been influential in other OS projects, such as Spring, they do not appear to have been widely used. In contrast, the OS Kit exhibits a less ambitious, but more pragmatic, and we believe more effective, approach to software design and re-use. Gabriel distinguished two approaches to software design and implementation, sardonically labeling them "The Right Thing" and "Worse is Better" [12]. The former is characterized by interface perfection at the cost of implementation complexity (e.g., Lisp with CLOS), whereas the latter sacrifices interface elegance and completeness for simplicity of implementation (e.g., Unix and C). Gabriel makes a strong case that "Worse is Better" is the more successful approach, and we believe that the OS Kit exemplifies this lesson.

1.2 Historical genesis of the OS Toolkit

We followed the cannibalization approach in our own OS research for some time. However, starting in 1995, that approach gradually evolved, resulting in what became the Flux OS Toolkit, or "OS Kit". Because we were finding our version of Mach [11] too constraining a vehicle in which to prototype new ideas, we developed a series of experimental kernels to try out ideas before designing our Fluke kernel [10]. In doing so, we gradually modularized and formalized the libraries of support code we developed, prototyping the OS Kit along the way. These experimental kernels embodied radical changes to fundamental aspects of OS structure which would have been impossible to explore in an existing operating system. One of these kernels explored implementations of high performance kernel-mediated capabilities and IPC paths, and took about 2 weeks to develop from scratch; the other explored interruptibility of kernel operations at arbitrary points (finding a more final expression in Fluke's atomic operations [18]), which took only a month. We have found this ability to prototype radical designs in a "real" kernel to be crucial to choosing designs that are worth fully developing.

2 Toolkit design

The Flux OS Toolkit is a framework and set of modularized library code, together with extensive documentation [9], for the construction of operating system kernels, servers, and other core OS functionality. The goal is for developers to take the OS Kit and immediately have a starting point for investigating "real" OS issues such as VM, IPC, file systems, or security. Researchers in programming languages for systems benefit as well, as the toolkit makes it easy to run language systems on the bare hardware.

The intention of this toolkit is not to "write the OS for you"; we certainly want to leave the OS writing to the OS writer. The dividing line between the "OS" and the "OS Toolkit," as we see it, is basically the line between what OS writers want to write and what they would otherwise have to write but don't really want to. Naturally this will vary among different OS groups and developers. If you really want to write your own x86 protected-mode startup code, or have found a clever way to do it "better," you are perfectly free to do so and simply not use the corresponding code in our toolkit. However, the OS Kit is modular enough so that you can still easily use other parts of it to fill in other functional areas you don't want to have to deal with yourself (or areas that you just don't have time to do "yet").

As such, the toolkit is designed to be usable either as a whole or in arbitrary subsets, as requirements dictate. It is useful not only for kernels but also for other OS-related programs, such as boot loaders or servers running on top of a microkernel.

While the OS Kit currently runs on x86 PCs, it is designed to be portable to other architectures, and most of the OS Kit's exported interfaces are architecture-neutral. Most of the heavily architecture-specific aspects of the OS Kit are isolated in a single component, the low-level kernel support library, whose purpose is to provide easy access to the raw privileged-mode hardware facilities without adding overhead or obscuring the underlying abstractions. For example, on the x86, the kernel support library includes functions to directly create and manipulate x86 page tables and segment registers. Other OS Kit components can, and often do, provide higher-level architecture-neutral facilities built on these low-level mechanisms; however, the architecture-specific interfaces always remain accessible in order to provide maximum flexibility.

3 Sample components

The toolkit currently contains fifteen major libraries, ranging from uniprocessor and multiprocessor bootstrap code, through [...], to support for popular file systems and [...] schemes.¹ In the following sections we briefly describe some of these components.

¹ The OS Kit currently includes the following libraries: low-level kernel bootstrapping and support, multiprocessor support, a list-based memory manager, an address map manager, a minimal C library, memory allocation debugging, disk partitioning, file system reading, program loading, a math library, device drivers, the NetBSD Fast File System, and the FreeBSD and x-kernel network protocol stacks.

3.1 Kernel bootstrap support

One of the time-honored ways to waste time in a research project is to write a boot loader for a new OS: as an inviolate rule, each new OS must have its own boot loader, and that boot loader must be incompatible with those of all other operating systems. Furthermore, each OS often has several boot loaders: one to boot from the hard disk, one to boot from across the network (this "one" often multiplied by the number of distinct Ethernet cards to be supported), one to boot from an existing OS such as MS-DOS, etc.

While searching for a good bootstrap solution for our own OS research, we examined the bootstrap mechanisms of a number of existing systems, and found that the diversity of existing mechanisms was caused not by any fundamental difference in the bootstrap service required by the OS, but instead merely by the completely ad hoc way in which boot loaders are typically constructed. In other words, because boot loaders are so fundamentally uninteresting, OS developers felt compelled to produce a minimal quick-and-dirty design, which results in this boot loader being unsuitable for the next OS due to slight differences in design philosophy or requirements.

To solve this problem, we worked with key people in various other OS projects to produce the MultiBoot standard [8] for x86 PCs, which is a standard interface between a boot loader and an OS so that any compliant boot loader can load any compliant OS. This standard interface includes features needed by advanced systems but typically not cleanly supported by existing boot loaders, such as support for boot images of unlimited size and boot images consisting of multiple distinct files. We then incorporated all the necessary support code into the OS Kit to make it trivial to create MultiBoot-compliant OS kernels, and included a set of simple MultiBoot-compliant boot loaders. A more complete and powerful MultiBoot-compatible boot loader, GRUB [4], is also available as a separate package. The result is that, using the OS toolkit, writing a "Hello World" OS kernel that boots from standard boot loaders is as easy as writing an ordinary "Hello World" application in C.

The OS Kit also provides the necessary code to initialize and start multiple processors in a symmetric multiprocessing (SMP) system. Of course, it is still up to the OS writer to make the overall OS SMP-safe. For convenience, some parts of the OS Kit, such as its default console I/O and debugging support, are designed to be automatically SMP-safe; other parts of the OS Kit can easily be made SMP-safe but require the OS writer to provide the appropriate synchronization mechanisms at the individual component level. Since the components do not generally contain fine-grained synchronization internally, higher-level components, such as the file system and the networking code, will work, but not exhibit optimal parallel performance. However, the low-level components are small enough to provide an appropriate level of granularity in typical situations.

3.2 Memory management

Another aspect of OS implementation that often involves a large amount of uninteresting work is physical memory management. Many research operating systems supporting virtual memory start out simply by keeping free physical pages on a list; systems that don't support virtual memory typically use a simple malloc-like allocator of some kind. Unfortunately, in practice, all hopes of using such clean, simple solutions are quickly dashed as soon as the unsuspecting OS attempts to support real hardware, which invariably proves to be painfully picky. For example, devices often require the use of contiguous physical memory blocks larger than a page in size, requiring the VM system to scrounge through page lists for contiguous pages. Even less well-behaved devices are extremely common: many DMA devices on PCs require contiguous buffers in the lowest 16MB of physical memory. In general, operating systems must efficiently manage address spaces of all types, such as virtual address spaces, paging spaces, block or page maps, etc.; these are precisely the kinds of grimy issues that OS researchers don't have time to worry about, but must be solved if the OS is ever to become "real" in any sense.

To address these memory management issues, the OS Kit includes a set of simple, but extremely flexible, memory management support libraries. The list-based memory manager, or LMM, provides powerful and efficient primitives for managing allocation of either physical or virtual memory, in kernel or user-level code, and includes support for managing multiple "types" of memory in a pool, and for allocations with various type, size, and alignment constraints. The address map manager, or AMM, is designed to manage address spaces that don't necessarily map directly to physical or virtual memory; it provides similar support for other aspects of OS implementation such as the management of processes' address spaces, paging partitions, free block maps, or IPC namespaces.

3.3 Minimal C library

Mature OS kernels typically contain a considerable amount of code that merely reimplements basic C library functionality such as printf and malloc. This is done because the "real" C library implementations of these functions make too many assumptions about the surrounding environment and are not flexible enough to work in a kernel environment. For example, the standard C library's printf includes a mass of complicated buffering code, which uses many different system calls, terminal-related ioctls, and dynamic memory allocation, when all that the kernel really needs is simple formatted console output. Similarly, standard malloc implementations make fundamental assumptions about the layout of a process's address space, e.g., that the heap is arbitrarily growable using sbrk, and will always be contiguous and monotonically increasing.

The OS Kit includes a minimal C library that provides common C library routines without all the unnecessary frills and unwanted assumptions in standard C libraries. For example, locales and floating-point are not supported, and the standard I/O calls don't do any buffering, relying instead on the underlying read and write operations provided by the OS. The C library routines are highly modularized and well-separated, preventing the entire library from being linked in when one function is called. Dependencies between functions are minimized, and those dependencies that do exist are well-documented, so that individual functions can be replaced as necessary in order to adapt the minimal C library to arbitrary environments. For example, printf relies on the OS only to provide a putchar implementation.

3.4 Device drivers

One of the most expensive and boring tasks in OS development and maintenance is supporting all of the different kinds of device hardware available. Devices are tricky and their glitches often undocumented; sometimes only binary versions of drivers are available. Recognizing the impracticality of providing our own device support from scratch, and the advantages of reusing others' code, we made a key extension to the approach taken in the rest of the OS Kit. We generalized the approach taken by Goel at Columbia and Utah, which allowed unchanged Linux device drivers to be used by the Mach 3.0 kernel [13]. As illustrated in Figure 1, the OS Kit's device driver support is composed of two cleanly-separated pieces: a large base of code imported directly from an existing OS, and a small surrounding layer of "glue code" that mimics the execution environment of the donor OS. This design allows the device drivers to operate oblivious to their true surroundings in environments vastly different from those for which they were originally written, such as in preemptive or multiprocessor kernels, or even in user-mode processes. For example, in addition to providing basic functions and variables that the drivers reference, the Linux glue code also invisibly emulates Linux's current process abstraction so that the drivers can be run in environments in which the process abstraction is completely different or even nonexistent. The glue code surrounding the imported code hides the details of the original OS environment from the developer and in its place presents clean, simple, well-defined device interfaces. These device interfaces conform to a small subset of the Component Object Model [14], namely the interface querying and reference counting mechanisms, which allow them to be cleanly extended and updated over time and facilitate future binary-level compatibility.

[Figure 1 diagram. Box labels: OS Kernel; NetBSD File System; FreeBSD Networking; AMM; Partitioning; Driver Support; libm; Linux Drivers; Minimal libc; MemDebug; Driver Support; LMM; SMP; Low-level Kernel Support Library; Hardware. Legend: You provide; OSkit provides; OSkit provides (from other OS).]

Figure 1. General OS Kit organization. Although the OS Kit's components work with each other easily, they are designed to be well separated from each other, allowing the OS to use them together or in isolation and to control how they interact with each other. Note that the relative size of the areas does not reflect the components' sizes.

This design required almost no modifications to the device drivers themselves, which vastly simplifies the task of keeping the drivers up-to-date with the newest versions of the donor operating system. Of course, the glue code still has to be updated to deal with changes in the drivers' overall native environment, but this is much simpler than updating all the device drivers manually. The use of binary versions of drivers, such as NetWare's ODI drivers, should also be possible, although we have not yet attempted this.

The OS Kit currently incorporates over 50 existing Ethernet and disk device drivers from Linux 2.0.29; fewer than 150 lines out of 80,000 were modified or added, mostly in header files. Some of these changes were to override macros for memory mappings and enabling/disabling interrupts; some fixed bugs that exist in Linux; and a few were added for debugging. The remainder were changes that added data structure elements required by the glue code, or defined different macros for operations; for example, we modified Linux's skbuff structure, which is used to store network packets.

Naturally, the flexibility provided by the framework sometimes imposes a performance cost, depending on the environment and the way the drivers are used. For instance, Linux uses contiguous buffers for network data, while many systems use more complex scatter-gather buffers, e.g., mbufs; thus using Linux drivers with BSD-derived networking code requires an extra copy on the send path.² However, even in projects intending to write custom device drivers specifically optimized for the OS in question, shrink-wrapped OS Kit device drivers are still very useful. Since OS Kit device drivers can coexist with custom drivers, they can be used as a base while custom drivers are being developed, and also to provide broader hardware coverage.

² This important case is one reason we are integrating an alternate set of network drivers from NetBSD.

Note that although the OS Kit's device drivers would seem at first to be highly machine-dependent, several of the original Linux drivers it contains are already used on non-x86 platforms supported by Linux such as Alpha, MIPS, and PowerPC (although many others still have embedded x86 assembly). When NetBSD drivers are supported, we will gain more portability, since NetBSD has gone further in separating out machine-dependent code from the device drivers.

4 Reusable specification and verification

We are exploring one other aspect of the OS Kit. In addition to saving development time and money in OS design, the OS Kit also presents the possibility of reusable verification. Verification is an extremely expensive activity, and is usually carried out at the operating system API level. Even for an A1 security evaluation, the low-level code is not required to be formally verified, primarily due to the inordinate cost of the verification, and the fact that the infrastructure code was never or rarely reused. It is not just the security community that requires correct functioning: critical systems are being developed now to run on OSes that cannot support their safety, security, or reliability requirements. Also, many critical systems run as embedded systems whose infrastructure needs are a good match to the OS Kit's libraries. In embedded systems the OS Kit's code would constitute a higher fraction of the total code than in full-blown operating systems, which is further evidence of the value of pursuing reusable verification of OS Kit components.

5 Current status

An early version of the OS Kit has been released publicly in beta-test form and is available from http://www.cs.utah.edu/projects/flux. The OS Kit currently consists of about 3,500 public header file lines and 220,000 lines of code. Of these, 207,000 lines are reused virtually unmodified from existing sources, so the OS Kit's maintenance burden only consists of the remaining 13,000 lines of "native" OS Kit code, 23% of which is x86-specific. All line counts were taken after filtering out comments, blank lines, preprocessor directives, and punctuation-only lines (e.g., lines containing only a brace); the result typically runs 1/4 to 1/2 the size of the unfiltered code.

5.1 Existing uses of the OS Toolkit

Our Fluke microkernel [10] puts almost all of the OS Kit to use, and in fact over half of the Fluke kernel is OS Kit code. We used an early version of the OS Kit in MOSS [7], a DOS extender (a small OS kernel that runs on MS-DOS and creates a more complete process environment for 32-bit applications), which is being used in commercial products. Besides the experimental kernels mentioned earlier, we have used the OS Kit in smaller utilities, such as a specialized kernel to boot another kernel from the network.

Some of the OS Kit's external users have informed us of their efforts. At MIT, Olin Shivers et al. are investigating advanced-language operating systems and use the OS Kit to run SML/NJ on the bare hardware as the OS [16]. This is a goal the ML community has desired for years, but until now the low-level aspects have presented too much of a barrier. The SR project at U.C. Davis [2] is exploring using the OS Kit to run SR directly on the hardware. Here at Utah, we have also ported two other languages to the OS Kit, Java and Smalltalk. The Systems and Communications Group at the University of Carlos III in Spain has employed the OS Kit in their distributed microkernel-based operating system project, named Off [1]. The OS Kit is also being used in the "bits and pieces microkernel" (bpmk) being developed in Finland.

6 Conclusion

The OS Kit has proven surprisingly powerful and popular, both at Utah and at external institutions, greatly aiding research and development in both operating systems and their implementation languages. The OS Kit's relatively mundane low-level components, and its provision of higher-level components through software reuse, fill crucial needs for wide classes of clients.

Acknowledgements

We thank Shantanu Goel for his important work, both at Columbia and at Utah, on the Linux device driver framework in Mach. Erich Boleyn co-authored the MultiBoot specification and developed a boot loader for it. We thank our clients for their feedback, in particular Olin Shivers, Albert Lin, and Greg Benson. Many others have made important contributions to the OS Kit, including Mike Hibler, Steve Smalley, Godmar Back, and Greg Benson. Mike Hibler and the anonymous reviewers provided helpful comments.

References

[1] F. J. Ballesteros and L. L. Fernandez. The Network Hardware is the Operating System. In Proc. of the Sixth Workshop on Hot Topics in Operating Systems, Cape Cod, MA, May 1997. To appear.

[2] G. D. Benson and R. A. Olsson. A Portable Run-Time System for the SR Concurrent Programming Language. In Proc. of the Workshop on Runtime Systems for Parallel Programming, Geneva, Switzerland, April 1997. Held in conjunction with the 11th International Parallel Processing Symposium (IPPS '97).

[3] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers, and S. Eggers. Extensibility, Safety, and Performance in the SPIN Operating System. In Proc. of the 15th ACM Symp. on Operating Systems Principles, pages 267-284, Copper Mountain, CO, Dec. 1995.

[4] E. Boleyn. GRUB - GRand Unified Bootloader. http://www.uruk.org/grub/, 1996.

[5] R. Campbell, N. Islam, P. Madany, and D. Raila. Designing and Implementing Choices: An Object-Oriented System in C++. Communications of the ACM, Sept. 1993.

[6] D. R. Engler, M. F. Kaashoek, and J. O'Toole Jr. Exokernel: An Operating System Architecture for Application-Level Resource Management. In Proc. of the 15th ACM Symp. on Operating Systems Principles, pages 251-266, Copper Mountain, CO, Dec. 1995.

[7] B. Ford. MOSS: A DOS extender based on the Flux OS Toolkit. Available as http://www.cs.utah.edu/projects/flux/moss/, 1996.

[8] B. Ford and E. S. Boleyn. MultiBoot Standard. Available as ftp://flux.cs.utah.edu/flux/multiboot, 1996.

[9] B. Ford and Flux Project Members. The Flux Operating System Toolkit. University of Utah. Postscript and HTML available under http://www.cs.utah.edu/projects/flux/oskit/html/, 1996.

[10] B. Ford, M. Hibler, J. Lepreau, P. Tullmann, G. Back, and S. Clawson. Microkernels Meet Recursive Virtual Machines. In Proc. of the Second Symp. on Operating Systems Design and Implementation, pages 137-151, Seattle, WA, Oct. 1996. USENIX Assoc.

[11] B. Ford and J. Lepreau. Evolving Mach 3.0 to a Migrating Thread Model. In Proc. of the Winter 1994 USENIX Conf., pages 97-114, Jan. 1994.

[12] R. P. Gabriel. Lisp: Good News, Bad News, How to Win Big. AI Expert, pages 31-39, June 1991.

[13] S. Goel and D. Duchamp. Linux Device Driver Emulation in Mach. In Proc. of the Annual USENIX 1996 Technical Conf., pages 65-73, San Diego, CA, Jan. 1996.

[14] Microsoft Corporation and Digital Equipment Corporation. Component Object Model Specification, Oct. 1995. 274 pp.

[15] W. Myers. Taligent's CommonPoint: The Promise of Objects. Computer, 28(3):78-83, Mar. 1995.

[16] O. Shivers. Automatic Management of Operating System Resources. In Proceedings of the Second ACM SIGPLAN International Conference on Functional Programming (ICFP '97), June 1997.

[17] C. Small and M. Seltzer. VINO: An Integrated Platform for Operating System and Database Research. Technical Report TR-30-94, Harvard University, 1994.

[18] P. Tullmann, J. Lepreau, B. Ford, and M. Hibler. User-level Checkpointing Through Exportable Kernel State. In Proc. Fifth International Workshop on Object Orientation in Operating Systems, pages 85-88, Seattle, WA, Oct. 1996. IEEE.