The Desktop File System
Morgan Clark
Stephen Rago
Programmed Logic Corporation
200 Cottontail Lane
Somerset, NJ 08873

Abstract

This paper describes the structure and performance characteristics of a commercial file system designed for use on desktop, laptop, and notebook computers running the UNIX operating system. Such systems are characterized by their small disk drives, dictated by system size and power requirements. In addition, these systems are often used by people who have little or no experience administering UNIX systems. The Desktop File System attempts to improve overall system usability by transparently compressing files, increasing file system reliability, and simplifying the administrative interfaces. The Desktop File System has been in production use for over a year, and will be included in future versions of the SCO Open Desktop UNIX system. Although originally intended for a desktop environment, the file system is also being used on many larger, server-style machines.

1. Overview

This paper describes a commercial file system designed for use on desktop, laptop, and notebook computers running the UNIX operating system. We describe the design choices we made and discuss some of the interesting ramifications of those choices. The most notable characteristic of the file system is its ability to compress and decompress files "on the fly." We provide performance information demonstrating that such a file system is a viable option in the UNIX marketplace.

When we use the term "commercial file system," we mean to imply two things. First, the file system is used in real life: it is not a prototype, nor is it a research project. Second, our design choices were limited to the scope of the file system. We were not free to rewrite portions of the base operating system to meet our needs, with one exception (we provided our own routines to access the system buffer cache).

1.1 Goals

Our goals in designing the Desktop File System (DTFS) were influenced by our impressions of what the environment was like for small computer systems, such as desktop and laptop computers. The physical size of these systems limits the size of the power supplies and hard disk drives that they can use at a reasonable cost. Systems that are powered by batteries attempt to use small disks to minimize the power drained by the disk, thus increasing the amount of time that the system can be used before the batteries must be recharged.

It is common to find disk sizes in the range of 80 to 300 megabytes in current 80x86-based laptop and notebook systems. Documentation for current versions of UnixWare recommends a minimum of 80 MB of disk space for the personal edition, and 120 MB for the application server. Similarly, Solaris documentation stipulates a minimum of 200 MB of disk space. These recommendations do not include space for additional software packages.

We also had the impression that desktop and notebook computers were less likely to be administered properly than larger systems in a general computing facility, because the primary user of the system will probably be performing the administrative procedures, often without the experience of professional system administrators. These impressions led us to the following goals:

- Reduce the amount of disk space needed by conventional file systems.
- Increase file system robustness in the presence of abnormal system failures.
- Minimize any performance degradation that might arise because of data compression.
- Simplify administrative interfaces when possible.

The most obvious way to decrease the amount of disk space used was to compress user data. (We use the term "user data" to refer to the data read from and written to a file, and the term "meta data" to refer to accounting information used internally by the file system to represent files.) Our efforts did not stop there, however. We designed the file system to allocate disk inodes as they are needed, so that no space is wasted by unused inodes. In addition, we use a variable block size for user data. This minimizes the amount of space wasted by partially-filled disk blocks.

Our intent in increasing the robustness of the file system stemmed from our belief that users of other desktop systems (such as MS-DOS) would routinely shut their systems down merely by powering off the computer, instead of using some more gradual method. As it turned out, our eventual choice of file structure required us to build robustness in anyway.

From the outset, we realized that any file system that added another level of data processing would probably be slower than other file systems. Nevertheless, we believed that this would not be noticeable on most systems because of the disparity between CPU and disk I/O speeds. In fact, current trends indicate that this disparity is widening, as CPU speeds increase at a faster pace than disk I/O speeds [KAR94]. Most systems today are bottlenecked in the I/O subsystem [OUS90], so the spare CPU cycles would be better spent compressing and decompressing data.

Our assumptions about typical users led us to believe that users would not know the number of inodes that they needed at the time they made a file system, and some might not even know the size of a given disk partition. We therefore endeavored to make the interfaces to the file system administrative commands as simple as possible.

1.2 Related Work

Several attempts to integrate file compression and decompression into the file system have been made in the past. [TAU91] describes compression as applied to executables on a RISC-based computer with a large page size and small disks. While compression is performed by a user-level program, the kernel is responsible for decompression when it faults a page in from disk. Each memory page is compressed independently, with each compressed page stored on disk starting with a new disk block. This simplifies the process of creating an uncompressed in-core image of a page from its compressed disk image. DTFS borrows this technique.

The disadvantage of this work is that it doesn't fully integrate compression into the file system. Only binary executable files are compressed, and applications that read the executables see compressed data. These files must be uncompressed before becoming intelligible to programs like debuggers, for example. The compression and decompression steps are not transparent.

In [CAT91], compression and decompression are integrated into a file system. Files are compressed and decompressed in their entirety, with the disk split into two sections: one area contains compressed file images, and the other area is used as a cache for the uncompressed file images. The authors consider it prohibitively expensive to decompress all files on every access, hence the cache. This reduces the disk space that would have been available had the entire disk been used to hold only compressed images. The file system was prototyped as an NFS server. Files are decompressed when they are first accessed, and thus migrate from the compressed portion of the disk to the uncompressed portion. A daemon runs at off-peak hours to compress the least-recently-used files and move them back to the compressed portion of the disk.

Transparent data compression is much more common with the MS-DOS operating system. Products like Stacker have existed for many years. They are implemented as pseudo-disk drivers, intercepting I/O requests and applying them to a compressed image of the disk [HAL94].

We chose not to implement our solution as a pseudo-driver because we felt that a file system implementation would integrate better into the UNIX operating system. For example, file systems typically maintain counters that track the number of available disk blocks in a given partition. A pseudo-driver would either have to guess how many compressed blocks would fit in its disk partition or continually try to fool the file system as to the number of available disk blocks. No means of feedback is available, short of the pseudo-driver modifying the file system's data structures. If the pseudo-driver were to make a guess at the time a file system is created, then things would get sticky if files didn't compress as well as expected.

In addition, a file system implementation gave us the opportunity to employ transaction processing techniques to increase the robustness of the file system in the presence of abnormal system failures. A pseudo-driver would have no knowledge of a given file system's data structures, and thus would have no way of associating a disk block address with any other related disk blocks. A file system, on the other hand, could employ its knowledge about the interrelationships of disk blocks to ensure that files are always kept in a consistent state.

[Figure 1. File System Layout: super block, inode bitmap, data/inode block bitmap, blocks]

Much more work has gone into increasing file system robustness than has gone into integrating compression into file systems. Episode [CHU92], WAFL [HIT94], and the Log-structured File System [SEL92] all use copy-on-write techniques, also known as shadow paging, to keep files consistent. Copies of modified disk blocks are not associated with files until the associated transaction is committed.

When a file's link count goes to 0, an integer, called the generation count, in the disk inode is incremented. This generation count is part of the file handle used to represent the file on client machines. Thus, when a file is removed, active handles for the file are made invalid, because newly-generated file handles for the inode will no longer match those held by clients. With DTFS, the inode block can be reallocated
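The page-at-a-time compression scheme from [TAU91], which DTFS borrows, can be sketched roughly as follows. This is an illustrative model, not the paper's implementation: the names, the 4 KB page size, and the 512-byte block size are assumptions. The key property is that each page is compressed independently and each compressed page begins on a fresh disk block, so any single page can be located and decompressed without touching its neighbors.

```python
import zlib

PAGE_SIZE = 4096    # memory page size (assumed for illustration)
BLOCK_SIZE = 512    # disk block size (assumed for illustration)

def compress_pages(data: bytes) -> list[bytes]:
    """Compress each page independently, so a single page can be
    reconstructed in core without decompressing the whole file."""
    pages = [data[i:i + PAGE_SIZE] for i in range(0, len(data), PAGE_SIZE)]
    return [zlib.compress(p) for p in pages]

def blocks_needed(compressed_page: bytes) -> int:
    """Each compressed page starts on a new disk block, so its
    length is rounded up to a whole number of blocks."""
    return -(-len(compressed_page) // BLOCK_SIZE)

def read_page(compressed_pages: list[bytes], n: int) -> bytes:
    # Random access: only page n is decompressed.
    return zlib.decompress(compressed_pages[n])
```

Rounding each page up to a block boundary wastes a little space, but it is what makes demand-paging a compressed executable practical: the kernel can fault in one page without scanning a whole compressed stream.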
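The copy-on-write (shadow paging) idea used by Episode, WAFL, and LFS can be illustrated with a minimal in-memory sketch; the class and method names here are hypothetical and stand in for on-disk structures. Modified blocks are written as copies, and the file is only pointed at the new copies when the transaction commits, so a crash mid-transaction leaves the last committed state intact.

```python
class ShadowedFile:
    """Toy model of shadow paging: writes go to shadow copies,
    which replace the committed blocks only at commit time."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # committed block images
        self.shadow = {}             # block index -> modified copy

    def write(self, i, data):
        self.shadow[i] = data        # committed block left untouched

    def read(self, i):
        return self.shadow.get(i, self.blocks[i])

    def commit(self):
        # Atomically (in this toy model) adopt the shadow copies.
        for i, data in self.shadow.items():
            self.blocks[i] = data
        self.shadow.clear()

    def abort(self):
        self.shadow.clear()          # file reverts to last commit
```

A crash before `commit` behaves like `abort`: the committed blocks were never overwritten, so the file remains consistent.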
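The generation-count mechanism for invalidating stale file handles can be sketched as follows; the structure and function names are illustrative, not DTFS's actual interfaces. A handle pairs the inode number with the inode's generation count, so bumping the generation when the inode is freed invalidates every handle minted for the old file.

```python
from dataclasses import dataclass

@dataclass
class DiskInode:
    inumber: int
    generation: int = 0   # incremented each time the inode is freed

def make_handle(inode: DiskInode) -> tuple[int, int]:
    """A file handle carries the inode number and its generation."""
    return (inode.inumber, inode.generation)

def free_inode(inode: DiskInode) -> None:
    """Called when the link count drops to zero; the bumped
    generation makes all outstanding handles stale."""
    inode.generation += 1

def handle_valid(inode: DiskInode, handle: tuple[int, int]) -> bool:
    return handle == (inode.inumber, inode.generation)
```

If the inode is later reused for a new file, clients holding old handles present a stale generation and are refused, even though the inode number matches.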