Proceedings of the Bsdcon 2002 Conference
Total Page:16
File Type:pdf, Size:1020Kb
USENIX Association Proceedings of the BSDCon 2002 Conference San Francisco, California, USA February 11-14, 2002 THE ADVANCED COMPUTING SYSTEMS ASSOCIATION © 2002 by The USENIX Association All Rights Reserved For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Rethinking /devand devices in the UNIX kernel Poul-Henning Kamp <[email protected]> The FreeBSD Project Abstract An outstanding novelty in UNIX at its introduction was the notion of ‘‘a file is a file is a file and evenadevice is a file.’’ Going from ‘‘hardware only changes when the DEC Field engineer is here’’to‘‘my toaster has USB’’has put serious strain on the rather crude implementation of the ‘‘devices as files’’concept, an implementation which has survivedpractically unchanged for 30 years in most UNIX variants. Starting from a high-levelviewofdevices and the semantics that have grown around them overthe years, this paper takes the audience on a grand tour of the redesigned FreeBSD device-I/O system, to convey anoverviewofhow itall fits together,and to explain whythings ended up as theydid, howtouse the newfeatures and in particular hownot to. 1. Introduction tax and meaning, so that a program expecting a file name as a parameter can be passed a device name; There are really only twofundamental ways to concep- finally,special files are subject to the same protec- tualise I/O devices in an operating system: The usual tion mechanism as regular files. wayand the UNIX way. At the time, this was quite a strange concept; it was The usual way is to treat I/O devices as their own class totally accepted for instance, that neither the system of things, possibly several classes of things, and pro- administrator nor the users were able to interact with a vide APIs tailored to the semantics of the devices. In disk as a disk. Operating systems simply did not pro- practice this means that a program must knowwhat it is vide access to disk other than as a filesystem. Most dealing with, it has to interact with disks one way,tapes vendors did not evenrelease a program to initialise a another and rodents yet a third way,all of which are disk-pack with a filesystem: selling pre-initialised and different from howitinteracts with a plain disk file. ‘‘quality tested’’disk-packs was quite a profitable busi- The UNIX way has neverbeen described better than in ness. the very first paper published on UNIX by Ritchie and In manycases some kind of API for reading and writ- Thompson [Ritchie74]: ing individual sectors on a disk pack did exist in the Special files constitute the most unusual feature of operating system, but more often than not it was not the UNIX filesystem. Each supported I/O device listed in the public documentation. is associated with at least one such file. Special files are read and written just likeordinary disk files, but requests to read or write result in acti- 1.1. The traditional implementation vation of the associated device. An entry for each The initial implementation used hardcoded inode num- special file resides in directory /dev, although a bers [Ritchie98]. The console device would be inode link may be made to one of these files just as it number 5, the paper-tape-punch number 6 and so on, may to an ordinary file. Thus, for example, to ev enifthose inodes were also actual regular files in the write on a magnetic tape one may write on the file filesystem. /dev/mt. Forreasons one can only too vividly imagine, this was Special files exist for each communication line, changed and Thompson [Thompson78] describes how each disk, each tape drive,and for physical main the implementation nowused ‘‘major and minor’’ memory.Ofcourse, the active disks and the mem- device numbers to indexthough the devsw array to the ory special files are protected from indiscriminate correct device driver. access. Forall intents and purposes, this is the implementation There is a threefold advantage in treating I/O de- which survivesinmost UNIX-likesystems eventothis vices this way: file and device I/O are as similar as day.Apart from the access control and timestamp possible; file and device names have the same syn- information which is found in all inodes, the special inodes in the filesystem contain only one piece of infor- hardware had changed since last reboot. This boot pro- mation: the major and minor device numbers, often log- cedure would amongst other things create the necessary ically OR’ed to one field. special files in the filesystem, based on an intricate sys- When a program opens a special file, the kernel uses tem of per device driverconfiguration files. the major number to find the entry points in the device In the recent years, we have become used to hardware driver, and passes the combined major and minor num- which changes configuration at anytime: people plug bers as a parameter to the device driver. USB, Firewire and PCCard devices into their comput- ers. These devices can be anything from modems and 2. The challenge disks to GPS receivers and fingerprint authentication hardware. Suddenly maintaining the correct set of spe- Now, wedid not talk much about where the special cial devices in ‘‘/dev’’became a major headache. inodes came from to begin with. Theywere created by Along the way,UNIX kernels had learned to deal with hand, using the mknod(2) system call, usually through multiple filesystem types [Heidemann91a] and a the mknod(8) program. ‘‘device-pseudo-filesystem’’was a pretty obvious idea. In those days a computer had a very static hardware The device drivers have a pretty good idea which configuration1 and it certainly did not change while the devices theyhav e found in the configuration, so all that system was up and running, so creating device nodes is needed is to present this information as a filesystem by hand was certainly an acceptable solution. filled with just the right special files. Experience has The first sign that this would not hold up as a solution shown that this likemost other ‘‘pseudo filesystems’’ came with the advent of TCP/IP and the telnet(1) pro- sound a lot simpler in theory than in practice. gram, or more precisely with the telnetd(8) daemon. In order to support remote login a ‘‘pseudo-tty’’device 3. Truly understanding devices driverwas implemented, basically as tty driverwhich Before we continue, we need to fully understand the instead of hardware had another device which would ‘‘device special file’’inUNIX. allowaprocess to ‘‘act as hardware’’for the tty.The telnetd(8) daemon would read and write data on the First we need to realize that a special file has the nature ‘‘master’’side of the pseudo-tty and the user would be of a pointer from the filesystem into a different names- running on the ‘‘slave’’ side, which would act just like pace; a little understood fact with far reaching conse- anyother tty: you could change the erase character if quences. you wanted to and all the signals and all that stuff One implication of this is that several special files can worked. exist in the filename namespace all pointing to the same Obviously with a device requiring no hardware, you device but each having their own access and timestamp can compile as manyinstances into the kernel as you attributes: like, as long as you do not use too much memory.As guest# ls -l /dev/fd0 /tmp/fd0 system after system was connected to the ARPANet, crw-r----- 1 root operator 9, 0 Sep 27 19:21 /dev/fd0 crw-rw-rw- 1 root wheel 9, 0 Sep 27 19:24 /tmp/fd0 ‘‘increasing number of ptys’’became a regular task for system administrators, and part of this task was to cre- Obviously,the administrator needs to be on top of this: ate more special nodes in the filesystem. one popular way to exploit an unguarded root prompt is to create a replica of the special file /dev/kmem in a Several UNIX vendors also noticed an issue when they location where it will not be noticed. Since /dev/kmem sold minicomputers in manydifferent configurations: givesaccess to the kernel memory,gaining anyparticu- explaining to system administrators just which special lar privilege can be arranged by suitably modifying the nodes theywould need and howtocreate them were a kernel’sdata structures through the illicit special file. significant documentation hassle. Some opted for the simple solution and pre-populated /devwith every con- When NFS appeared it opened a newavenue for this ceivable device node, resulting in a predictable slow- attack: People may have root privilege on one machine down on access to filenames in /dev. butnot another.Since device nodes are not interpreted on the NFS server but rather on the local computer,a System V UNIX provided a band-aid solution: a special user with root privilege on a NFS client computer can boot sequence would takeeffect if the kernel or the create a device node to his liking on a filesystem mounted from an NFS server.This device node can in 1 Unless your assigned field engineer was present on site. turn be used to circumvent the security of other com- puters which mount that filesystem, including the server,unless theyprotect themselves by not trusting on the disk special file, not by polluting the filesystem anydevice entries on untrusted filesystem by mounting code with another file type.