Using a Log-Structured File System for Tertiary Storage Management

Total Page:16

File Type:pdf, Size:1020Kb

Load more

HighLight: Using a Log-structured File System for Tertiary Storage Managementy John T. Kohl Carl Staelin University of California, Berkeley and Hewlett-Packard Laboratories Digital Equipment Corporation Michael Stonebraker University of California, Berkeley November 20, 1992 1. Introduction HighLight combines both conventional disk secondary Abstract storage and robotic tertiary storage into a single ®le sys- Robotic storage devices offer huge storage capacity at a tem. It builds upon the 4.4BSD LFS [10], which derives low cost per byte, but with large access times. Integrating directly from the Sprite Log-structured File System (LFS) these devices into the storage hierarchy presents a chal- [9], developed at the University of California at Berke- lenge to ®le system designers. Log-structured ®le systems ley by Mendel Rosenblum and John Ousterhout as part of (LFSs) were developed to reduce latencies involved in ac- the Sprite operating system. LFS is optimized for writ- cessing disk devices, but their sequential write patterns ing data, whereas most ®le systems (e.g. the BSD Fast match well with tertiary storage characteristics. Unfortu- File System [4]) are optimized for reading data. LFS di- nately, existing versions only manage memory caches and vides the disk into 512KB or 1MB segments, and writes disks, and do not support a broader storage hierarchy. data sequentially within each segment. The segments are threaded together to form a log, so recovery is quick; it HighLight extends 4.4BSD LFS to incorporate both sec- entails a roll-forward of the log from the last checkpoint. ondary storage devices (disks) and tertiary storage devices Disk space is reclaimed by copying valid data from dirty (such as robotic tape jukeboxes), providing a hierarchy segments to the tail of the log and marking the emptied within the ®le system that does not require any application segments as clean. support. This paper presents the design of HighLight, pro- poses various policies for automatic migration of ®le data Since log-structured ®le systems are optimized for between the hierarchy levels, and presents initial migration write performance, they are a good match for the write- mechanism performance ®gures. dominated environment of archival storage. However, sys- tem performance will depend on optimizing read perfor- mance, since LFS already optimizes write performance. Therefore, migration policies and mechanisms should ar- This research was sponsored in part by the University of California and Digital Equipment Corporation under Digital's ¯agship research range the data on tertiary storage to improve read perfor- project ªSequoia 2000: Large Capacity Object Servers to Support Global mance. Change Research.º Other industrial and governmentpartners include the California Department of Water Resources, United States Geological HighLight was developed to provide a data storage ®le Survey, MCI, ESL, Hewlett Packard, RSI, SAIC, PictureTel, Metrum system for use by Sequoia researchers. Project Sequoia Information Storage, and Hughes Aircraft Corporation. This work was 2000 [14] is a collaborative project between computer sci- also supported in part by Digital Equipment Corporation's Graduate Engineering Education Program. entists and earth science researchers to develop the nec- y Permission has been granted by the USENIX Association to essary support structure to enable global change research reprint the above article. This article was originally published in the on a larger scale than current systems can support. High- USENIX Association Conference Proceedings, January 1993. Copy- Light is one of several ®le management avenues under right c USENIX Association, 1993. 1 exploration as a supporting technology for this research. table and the inode map. The segment summary table Other storage management efforts include the Inversion contains information describing the state of each segment support in the POSTGRES database system [7] and the in the ®le system. Some of this information is necessary Jaquith manual archive system [6] (which was developed for correct operation of the ®le system, such as whether for other uses, but is under consideration for Sequoia's the segment is clean or dirty, while other information is use). used to improve the performance of the cleaner, such as the number of live data bytes in the segment. The inode The bulk of the on-line storage for Sequoia will be map contains the current disk address of each ®le's inode, provided by a 600-cartridge Metrum robotic tape unit; as well as some auxiliary information used for ®le system each cartridge has a capacity of 14.5 gigabytes for a total bookkeeping. In 4.4BSD LFS, both the inode map and of nearly 9 terabytes. We also expect to have a collection the segment summary table are contained in a regular ®le, of smaller robotic tertiary devices (such as the Hewlett- called the i®le. Packard 6300 magneto-optic changer). HighLight will have exclusive rights to some portionof thetertiary storage When reading ®les, the only difference between LFS space. and FFS is that the inode's location is variable. Once the system has found the inode (by indexing the inode map), HighLight is currently running in our laboratory, with a LFS reads occur in the same fashion as FFS reads, by simple automated ®le-oriented migration policy as well as following direct and indirect block pointers1. a manual migration tool. HighLight can migrate ®les to tertiary storage and automatically fetch them again from When writing, LFS and FFS differ substantially. In FFS, tertiary storage into the cache to enable application access. each logical block within a ®le is assigned a location upon allocation, and each subsequent operation (read or write) The remainder of this paper presents HighLight's mech- is directed to that location. In LFS, data are written to the anisms and some preliminary performance measurements, tail of the log each time they are modi®ed, so their location and speculates on some useful migration policies. We be- changes. This requires that their index structures (indirect gin with a thumb-nail sketch of the basic Log-structured blocks, inodes, inode map entries, etc.) be updated to ®le system, followed by a discussion of our basic storage re¯ect their new location, so these index structures are and migration model and a comparison with existing re- also appended to the log. lated work in policy and mechanism design. We continue with a brief discussion of potential migration policies and a In order to provide the system with a ready supply of description of HighLight's architecture. We present some empty segments for the log, a user-level process called preliminary measurements of our system performance, the cleaner garbage collects free space from dirty seg- and conclude with a summary and directions for future ments. The cleaner selects one or more dirty segments to work. be cleaned, appends all valid data from those segments to the tail of the log, and then marks those segments clean. The cleaner communicates with the ®le system by read- ing the i®le and calling a handful of LFS-speci®c system 2. LFS Primer calls. Making the cleaner a user-level process simpli®es the adjustment of cleaning policies. The primary characteristic of LFS is that all data are stored in a segmented log. The storage consists of large contigu- For recovery purposes the ®le system takes periodic ous spaces called segments which may be threaded to- checkpoints. During a checkpoint the address of the most gether to form a linear log. New data are appended to the recent i®le inode is stored in the superblock so that the log, and periodically the system checkpoints the state of recovery agent may ®nd it. During recovery the threaded the system. During recovery the system will roll-forward log is used to roll forward from the last checkpoint. Each from the last checkpoint, using the information in the log segment of the log may contain several partial segments. to recover the state of the ®le system at failure. Obviously, A partial segment is considered an atomic update to the as data are deleted or replaced, the log contains blocks of log, and is headed by a segment summary cataloging its invalid or obsolete data, and the system must coalesce this contents. The summary also includes a checksum to verify wasted space to generate new, empty segments for the log. that the entire partial segment is intact on disk and provide an assurance of atomicity. During recovery, the system 4.4BSD LFS shares much of its implementation with the scans the log, examining each partial segment in sequence. Berkeley Fast File System (FFS) [4]. It has two auxiliary data structures not found in FFS: the segment summary 1 In fact, LFS and FFS share this indirection code in 4.4BSD. 2 summary log contents file system (state) tail of log auto s new data reads; initial 0 d,a i writes migration 1 c (empty segment) caching 2 d s i s i disk farm tertiary jukebox(es) . ... ... State Key: Block Key: Figure 2: The storage hierarchy. d = dirty c = clean s = Summary a = active i = i−node vast storage. It manages the storage and the migration = data between the two levels. The basic storage and migration model is illustrated in Figure 2. HighLight has a great deal of ¯exibility, allowing arbi- trary data blocks, directories, indirect blocks, and inodes Figure 1: LFS data layout. to migrate to tertiary storage at any time. It uses the basic LFS layout to manage the on-disk storage and applying a variant on the cleaning mechanism to provide the mi- gration mechanism. A natural consequence of this layout Figure 1 shows the on-disk data structures of 4.4BSD is the use of LFS segments for the tertiary-resident data LFS. The on-disk data space is divided into segments.
Recommended publications
  • Twenty Years of Berkeley Unix : from AT&T-Owned to Freely

    Twenty Years of Berkeley Unix : from AT&T-Owned to Freely

    Twenty Years of Berkeley Unix : From AT&T-Owned to Freely Redistributable Marshall Kirk McKusick Early History Ken Thompson and Dennis Ritchie presented the first Unix paper at the Symposium on Operating Systems Principles at Purdue University in November 1973. Professor Bob Fabry, of the University of California at Berkeley, was in attendance and immediately became interested in obtaining a copy of the system to experiment with at Berkeley. At the time, Berkeley had only large mainframe computer systems doing batch processing, so the first order of business was to get a PDP-11/45 suitable for running with the then-current Version 4 of Unix. The Computer Science Department at Berkeley, together with the Mathematics Department and the Statistics Department, were able to jointly purchase a PDP-11/45. In January 1974, a Version 4 tape was delivered and Unix was installed by graduate student Keith Standiford. Although Ken Thompson at Purdue was not involved in the installation at Berkeley as he had been for most systems up to that time, his expertise was soon needed to determine the cause of several strange system crashes. Because Berkeley had only a 300-baud acoustic-coupled modem without auto answer capability, Thompson would call Standiford in the machine room and have him insert the phone into the modem; in this way Thompson was able to remotely debug crash dumps from New Jersey. Many of the crashes were caused by the disk controller's inability to reliably do overlapped seeks, contrary to the documentation. Berkeley's 11/45 was among the first systems that Thompson had encountered that had two disks on the same controller! Thompson's remote debugging was the first example of the cooperation that sprang up between Berkeley and Bell Labs.
  • The Release Engineering of Freebsd 4.4

    The Release Engineering of Freebsd 4.4

    The Release Engineering of FreeBSD 4.4 Murray Stokely [email protected] Wind River Systems Abstract different pace, and with the general assumption that they This paper describes the approach used by the FreeBSD re- have first gone into FreeBSD-CURRENT and have been lease engineering team to make production-quality releases thoroughly tested by our user community. of the FreeBSD operating system. It details the methodol- In the interim period between releases, nightly snap- ogy used for the release of FreeBSD 4.4 and describes the shots are built automatically by the FreeBSD Project build tools available for those interested in producing customized machines and made available for download from ftp: FreeBSD releases for corporate rollouts or commercial pro- //stable.FreeBSD.org. The widespread availabil- ductization. ity of binary release snapshots, and the tendency of our user community to keep up with -STABLE development with CVSup and “make world”[8] helps to keep FreeBSD- 1 Introduction STABLE in a very reliable condition even before the qual- ity assurance activities ramp up pending a major release. The development of FreeBSD is a very open process. Bug reports and feature requests are continuously sub- FreeBSD is comprised of contributions from thousands of mitted by users throughout the release cycle. Problem people around the world. The FreeBSD Project provides reports are entered into our GNATS[9] database through anonymous CVS[1] access to the general public so that email, the send-pr(1) application, or via a web-based form. others can have access to log messages, diffs between de- In addition to the multitude of different technical mailing velopment branches, and other productivity enhancements lists about FreeBSD, the FreeBSD quality-assurance mail- that formal source code management provides.
  • Berkeley DB from Wikipedia, the Free Encyclopedia

    Berkeley DB From Wikipedia, the free encyclopedia Berkeley DB Original author(s) Margo Seltzer and Keith Bostic of Sleepycat Software Developer(s) Sleepycat Software, later Oracle Corporation Initial release 1994 Stable release 6.1 / July 10, 2014 Development status production Written in C Operating system Unix, Linux, Windows, AIX, Sun Solaris, SCO Unix, Mac OS Size ~1244 kB compiled on Windows x86 Type Embedded database License AGPLv3 Website www.oracle.com/us/products/database/berkeley-db /index.html (http://www.oracle.com/us/products/database/berkeley- db/index.html) Berkeley DB (BDB) is a software library that provides a high-performance embedded database for key/value data. Berkeley DB is written in C with API bindings for C++, C#, PHP, Java, Perl, Python, Ruby, Tcl, Smalltalk, and many other programming languages. BDB stores arbitrary key/data pairs as byte arrays, and supports multiple data items for a single key. Berkeley DB is not a relational database.[1] BDB can support thousands of simultaneous threads of control or concurrent processes manipulating databases as large as 256 terabytes,[2] on a wide variety of operating systems including most Unix- like and Windows systems, and real-time operating systems. "Berkeley DB" is also used as the common brand name for three distinct products: Oracle Berkeley DB, Berkeley DB Java Edition, and Berkeley DB XML. These three products all share a common ancestry and are currently under active development at Oracle Corporation. Contents 1 Origin 2 Architecture 3 Editions 4 Programs that use Berkeley DB 5 Licensing 5.1 Sleepycat License 6 References 7 External links Origin Berkeley DB originated at the University of California, Berkeley as part of BSD, Berkeley's version of the Unix operating system.
  • Copyright © 1992, by the Author(S). All Rights Reserved

    Copyright © 1992, by the Author(S). All Rights Reserved

    Copyright © 1992, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. AN IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM FOR UNIX by Margo Seltzer, Keith Bostic, Marshall Kirk McKusick, and Carl Staelin Memorandum No. UCB/ERL M92/134 3 December 1992 AN IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM FOR UNIX by Margo Seltzer, Keith Bostic, Marshall Kirk McKusick, and Carl Staelin Memorandum No. UCB/ERL M92/134 3 December 1992 ELECTRONICS RESEARCH LABORATORY College of Engineering University of California, Berkeley 94720 AN IMPLEMENTATION OF A LOG-STRUCTURED FILE SYSTEM FOR UNIX by Margo Seltzer, Keith Bostic, Marshall Kirk McKusick, and Carl Staelin Memorandum No. UCB/ERL M92/134 3 December 1992 ELECTRONICS RESEARCH LABORATORY College of Engineering University ofCalifornia, Berkeley 94720 An Implementation ofa Log-Structured File System for UNIX1 Margo Seltzer —University ofCalifornia, Berkeley Keith Bostic - University ofCalifornia, Berkeley Marshall Kirk McKusick —University ofCalifornia, Berkeley Carl Staelin - Hewlett-Packard Laboratories ABSTRACT Research results [ROSE91] demonstrate that a log-structured file system (LFS) offers the potential for dramatically improved write performance, faster recovery time, and faster file creation and deletion than traditional UNIX file systems. This paper presents a redesign and implementation of the Sprite [ROSE91] log-structured file system that is more robust and integrated into the vnode interface [KLEI86].
  • Contributeurs Au Projet Freebsd Version: 43184 2013-11-13 Par Hrs

    Contributeurs Au Projet Freebsd Version: 43184 2013-11-13 Par Hrs

    Contributeurs au projet FreeBSD Version: 43184 2013-11-13 par hrs. Résumé Cet article liste les organisations et les personnes ayant contribué à FreeBSD. Table des matières 1. Gallerie des dons ..................................................................................................................... 1 2. Le bureau dirigeant .................................................................................................................. 3 3. Les développeurs FreeBSD .......................................................................................................... 3 4. Le projet de documentation de FreeBSD ...................................................................................... 14 5. Qui est reponsable de quoi ....................................................................................................... 15 6. Liste des anciens de la "Core Team" ........................................................................................... 16 7. Liste des anciens développeurs .................................................................................................. 17 8. Liste des logiciels contribués ..................................................................................................... 18 9. Contributeurs additionnels à FreeBSD ......................................................................................... 18 10. Contributeurs du kit de patch 386BSD ....................................................................................... 58 Index .....................................................................................................................................
  • Downloaded for Free From

    Downloaded for Free From

    The Design of the NetBSD I/O Subsystems SungWon Chung Pusan National University 2 This book is dedicated to the open-source code developers in the NetBSD community. The original copy of this publication is available in an electronic form and it can be downloaded for free from http://arXiv.org. Copyright (c) 2002 by SungWon Chung. For non-commercial and personal use, this publication may be reproduced, stored in a retrieval system, or transmitted in any form by any means, electronic, mechanical, photocopying, recording or otherwise. For commercial use, no part of this publication can be reproduced by any means without the prior written permission of the author. NetBSD is the registered trademark of The NetBSD Foundation, Inc. Contents Preface 14 I Basics to Learn Filesystem 15 1 Welcome to the World of Kernel ! 17 1.1 How Does a System Call Dive into Kernel from User Program ? . 17 1.1.1 Example: write system call . 17 1.1.2 Ultra SPARC 0x7c CPU Trap . 18 1.1.3 Jump to the File Descriptor Layer . 24 1.1.4 Arriving at Virtual Filesystem Operations . 28 1.2 General Data Structures in Kernel such as List, Hash, Queue, ... 30 1.2.1 Linked-Lists . 30 1.2.2 Tail Queues . 34 1.2.3 Hash . 38 1.3 Waiting and Sleeping in Kernel . 39 1.4 Kernel Lock Manager . 39 1.4.1 simplelock and lock . 39 1.4.2 Simplelock Interfaces . 40 1.4.3 Lock Interfaces . 40 1.5 Kernel Memory Allocation . 43 1.6 Resource Pool Manager .
  • An Operating System

    An Operating System

    Page 1 of 7 What is an Operating System 2.1 Examples: An operating system (OS) is software that manages computer hardware and software resources and provides common services for computer programs. The operating system is an essential component of the system software in a computer system. Application programs usually require an operating system to function. Unix and Unix-like operating systems Unix was originally written in assembly language.[6] Ken Thompson wrote B, mainly based on BCPL, based on his experience in the MULTICS project. B was replaced by C, and Unix, rewritten in C, developed into a large, complex family of inter-related operating systems which have been influential in every modern operating system (see History). The Unix-like family is a diverse group of operating systems, with several major sub-categories including System V, BSD, and Linux. The name "UNIX" is a trademark of The Open Group which licenses it for use with any operating system that has been shown to conform to their definitions. "UNIX-like" is commonly used to refer to the large set of operating systems which resemble the original UNIX. Unix-like systems run on a wide variety of computer architectures. They are used heavily for servers in business, as well as workstations in academic and engineering environments. Free UNIX variants, such as Linux and BSD, are popular in these areas. Four operating systems are certified by The Open Group (holder of the Unix trademark) as Unix. HP's HP-UX and IBM's AIX are both descendants of the original System V Unix and are designed to run only on their respective vendor's hardware.
  • Virus Bulletin, June 1990

    Virus Bulletin, June 1990

    June 1990 ISSN 0956-9979 THE AUTHORITATIVE INTERNATIONAL PUBLICATION ON COMPUTER VIRUS PREVENTION, RECOGNITION AND REMOVAL Editor: Edward Wilding Technical Editor: Fridrik Skulason, University of Iceland Editorial Advisors: Jim Bates, Bates Associates, UK, Phil Crewe, Fingerprint, UK, Dr. Jon David, USA, David Ferbrache, Heriot-Watt University, UK, Dr. Bertil Fortrie, Data Encryption Technologies, Holland, Hans Gliss, Datenschutz Berater, West Germany, Ross M. Greenberg, Software Concepts Design, USA, Dr. Harold Joseph Highland, Compulit Microcomputer Security Evaluation Laboratory, USA, Dr. Jan Hruska, Sophos, UK, Dr. Keith Jackson, Walsham Contracts, UK, Owen Keane, Barrister, UK, Yisrael Radai, Hebrew University, Israel, John Laws, RSRE, UK, David T. Lindsay, Digital Equipment Corporation, UK, Martin Samociuk, Network Security Management, UK, John Sherwood, Sherwood Associates, UK, Roger Usher, Coopers&Lybrand, UK, Dr. Ken Wong, BIS Applied Systems, UK KNOWN IBM PC VIRUSES CONTENTS (UPDATE) 11 EDITORIAL TOOLS & TECHNIQUES Computer Viruses as Weapon Dynamic Decompression, Systems 2 LZEXE and the Virus Problem 12 PROCEDURES WORM PROGRAMS Training and Awareness 3 The Internet Worm - Action and Reaction 13 FEATURE ARTICLE PRODUCT REVIEW The Bulgarian Computer Viruses - ‘The Virus Factory’ 6 Certus - Tools To Fight Viruses With 18 ANTI-VIRUS SOFTWARE END-NOTES & NEWS 20 FOR IBM PCs 10 VIRUS BULLETIN ©1990 Virus Bulletin Ltd, England./90/$0.00+2.50 This bulletin is available only to qualified subscribers. No part of this publication may be reproduced, stored in a retrieval system, or transmitted by any form or by any means, electronic, magnetic, optical or photocopying, without the prior written permission of the publishers or a licence permitting restricted copying issued by the Copyright Licencing Agency.
  • NEWS RELEASE Contact: Jim Ormond 212-626

    NEWS RELEASE Contact: Jim Ormond 212-626

    NEWS RELEASE Contact: Jim Ormond 212-626-0505 [email protected] ACM RECOGNIZES FAR-REACHING TECHNICAL ACHIEVEMENTS WITH SPECIAL AWARDS Award Recipients Made Contributions to Areas Including Battery-Free Communications, Theoretical Computer Science, and Software Systems New York, NY, May 26, 2021 – ACM, the Association for Computing Machinery, today announced the recipients of four prestigious technical awards. These leaders were selected by their peers for making contributions that extend the boundaries of research, advance industry, and lay the foundation for technologies that transform society. Shyamnath Gollakota, University of Washington, is the recipient of the 2020 ACM Grace Murray Hopper Award for contributions to the use of wireless signals in creating novel applications, including battery-free communications, health monitoring, gesture recognition, and bio-based wireless sensing. His work has revolutionized and reimagined what can be done using wireless systems and has a feel of technologies depicted in science fiction novels. Gollakota defined the technology referred to today as ambient backscatter—a mechanism by which an unpowered, battery-less device can harvest existing wireless signals (such as broadcast TV or WiFi) in the environment for energy and use it to transmit encoded data. In addition, he has developed techniques that can use sonar signals from smartphones to support numerous healthcare applications. Examples include detection and diagnosis of breathing anomalies such as apnea, detection of ear infections, and even detection of life-threatening opioid overdoses. These innovations have the potential to transform the way healthcare systems will be designed and delivered in the future, and some of these efforts are now being commercialized for real-world use.
  • UNIX Papers April, 1996

    UNIX Papers April, 1996

    UNIX Papers FreeBSD 2.1 April, 1996 Berkeley Pascal px Berkeley Pascal PX Implementation Notes Version 2.0 Performance Effects of Disk Subsystem Choices for VAX† Systems Running 4.2BSD UNIX. William N. Joy, M. Kirk McKusick. Revised January, 1979. Disk Performance diskperf Performance Effects of Disk Subsystem Choices for VAX† Systems Running 4.2BSD UNIX. Bob Kridle, Marshall Kirk McKusick. Revised July 27, 1983. Tune the 4.2BSD Kernel kerntune Using gprof to Tune the 4.2BSD Kernel. Marshall Kirk McKusick. Revised May 21, 1984 (?). New Virtual Memory newvm A New Virtual Memory Implementation for Berkeley. Marshall Kirk McKusick, Michael J. Karels. Revised 1986. Kernel Malloc kernmalloc Design of a General Purpose Memory Allocator for the 4.3BSD UNIX Kernel. Marshall Kirk McKusick, Michael J. Karels. Papers Contents Reprinted from: Proceedings of the San Francisco USENIX Conference, pp. 295-303, June 1988. Release Engineering relengr The Release Engineering of 4.3BSD. Marshall Kirk McKusick, Michael J. Karels, Keith Bostic. Revised 1989. Beyond 4.3BSD beyond4.3 Current Research by The Computer Systems Research Group of Berkeley. Marshall Kirk McKusick, Michael J Karels, Keith Sklower, Kevin Fall, Marc Teitelbaum, Keith Bostic. Revised February 2, 1989. Memory Based Filesystem memfs A Pageable Memory Based Filesystem. Marshall Kirk McKusick, Michael J. Karels, Keith Bostic. Revised 1990. Filesystem Interface fsinterface To ward a Compatible Filesystem Interface. Michael J. Karels, Marshall Kirk McKusick. Conference of the European Users’ Group, September 1986. Last modified April 16, 1991. System Performance sysperf Measuring and Improving the Performance of Berkeley UNIX. Marshall Kirk McKusick, Samuel J. Leffler, Michael J.
  • Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference

    Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference

    USENIX Association Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference Boston, Massachusetts, USA June 25–30, 2001 THE ADVANCED COMPUTING SYSTEMS ASSOCIATION © 2001 by The USENIX Association All Rights Reserved For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Improving the FreeBSD SMP implementation GregLehey IBM LTC Ozlabs [email protected] [email protected] ABSTRACT UNIX-derivedoperating systems have traditionally have a simplistic approach to process synchronization which is unsuited to multiprocessor application. Initial FreeBSD SMP support kept this approach by allowing only one process to run in kernel mode at any time, and also blocked interrupts across multiple processors, causing seriously suboptimal performance of I/O bound systems. This paper describes work done to remove this bot- tleneck, replacing it with fine-grained locking. It derivesfrom work done on BSD/OS and has manysimilarities with the approach taken in SunOS 5. Synchronization is per- formed primarily by a locking construct intermediate between a spin lock and a binary semaphore, termed mutexes.Ingeneral, mutexesattempt to block rather than to spin in cases where the likely wait time is long enough to warrant a process switch. The issue of blocking interrupt handlers is addressed by attaching a process context to the interrupt handlers.
  • The Design and Implementation of the 4.4BSD Operating System

    The Design and Implementation of the 4.4BSD Operating System

    The Design and Implementation of the 4.4BSD Operating System Marshall Kirk McKusick Keith Bostic Michael J. Karels John S. Quarterman The Design and Implementation of the 4.4BSD Operating System by Marshall Kirk McKusick, Keith Bostic, Michael J. Karels, and John S. Quarterman Revision: 76bf5f73 Copyright © 1996 Addison-Wesley Longman, Inc The second chapter of the book, The Design and Implementation of the 4.4BSD Operating System is excerpted here with the permission of the publisher. No part of it may be further reproduced or distributed without the publisher's express written permission. The rest of the book explores the concepts introduced in this chapter in incredible detail and is an excellent reference for anyone with an interest in BSD UNIX. More information about this book is available from the publisher, with whom you can also sign up to receive news of related titles. Information about BSD courses is available from Kirk McKusick. ii Table of Contents 2. Design Overview of 4.4BSD ......................................................................................................... 1 2.1. 4.4BSD Facilities and the Kernel ........................................................................................ 1 2.2. Kernel Organization ....................................................................................................... 2 2.3. Kernel Services ............................................................................................................. 4 2.4. Process Management .....................................................................................................