How to Not Invent Kernel Interfaces

How to not invent kernel interfaces Arnd Bergmann [email protected] July 31, 2007 Contents 1 Abstract 2 2 Concepts 2 2.1 Simplicity ............................. 2 2.2 Consistency ............................ 3 2.3 Fitness ............................... 3 2.4 ABI stability ........................... 4 3 Syscalls 5 3.1 ioctl ................................ 6 3.2 Sockets ............................... 7 3.2.1 Netlink ........................... 7 4 ABI Emulation 9 4.1 Other operating systems ..................... 9 4.2 compat ioctl ............................ 9 5 File systems 11 5.1 procfs ............................... 12 5.2 sysfs ................................ 12 5.3 debugfs .............................. 13 6 Conclusion 13 Arnd Bergmann 1/13 July 31, 2007 1 Abstract Every piece of kernel code needs some form of user space interface. Examples for this include system calls, ioctl based character devices, virtual file systems, netlink or procfs files. Choosing the right one is often hard and people tend to get flamed for their choices no matter what they do. The truth is that there usually is not a single right solution for a new problem, just different degrees on how wrong it gets, and coming up with a simple interface tends to be the most complicated task in any new subsystem. This paper explores different interfaces and their pros and cons by looking at examples from existing kernel code. 2 Concepts In Finding the right interface for a new functionality should take into account different aspects. The most important ones are simplicity, consistency and fitness for the given problem. Together, they form what can best be described as ’good taste’. Taste is the ultimate feature of a good interface, but it doesn’t have a clear definition, so we need to rely on more concrete indications for whether an interface is good enough. The importance of getting an interface right in the beginning comes from the problem of changing it later. It is always possible to completely replace all of the kernel code implementing a given functionality, but you cannot change an interface in fundamental ways without breaking user applications. 2.1 Simplicity The most important aspect for a kernel interface is simplicity. A complex interface is hard to implement correctly and hard to understand, which means application programmers will introduce bugs when trying to use it. Interestingly, it is much harder to come up with a simple interface than it is to create a complex mess. Even if you start out with a simple interface, you tend to forget some aspect of the problem and later add complexity to it in order to cover all the corner cases. When that happens, it may be better to start over from scratch and come up with a better model of what needs to be done. Since the interface needs to be implemented on both kernel and user side, both of these need to be as simple as possible. Making the user side as simple as possible is more important though, because there are typically many users Arnd Bergmann 2/13 July 31, 2007 of each kernel interface, so each one of them is prone to errors when it is not obvious how the interface is used. Example 1 (nfsservctl) Sometimes an existing interface can be replaced with something simpler. One such example is the obscure nfsservctl system call that is used for a specific subsystem, namely the NFS server in the kernel. Instead of the multiplexing system call, there is now a special file system called ’nfsd’, each file in it representing one of the subfunctions of the older system call. Using the simple fill super() and simple transaction helpers, a write() followed by a read() on a file in there can be used the same way the system call was before. This would be a good example of how to do a simple interface, if we didn’t have to implement the original system call on top of it to maintain the stable ABI. In the end, user applications still use the old syscall, and the complexity remains in both kernel and user space. 2.2 Consistency When adding a new interface to an existing kernel, you should take into account how other similar subsystems achieve the same functionality. Most of the time, multiple possible solutions exist, and if they are equally simple, you should choose the one that most closely resembles existing interfaces. This gives users and reviewers the least surprises and thereby avoids bugs. Example 2 (infiniband uverbs) The infiniband driver subsystem has an interface for direct interaction of user applications with the hardware through ’verbs’, which would typically be implemented as ioctl functions. For some reason, the author must have been scared of adding a large amount of ioctl numbers in his driver and decided to use an interface based on read/write operations on a character device. While this is not a bad idea in principle, the end result is a violation of the consistency principle. The ib uverbs write() function basically acts as an ioctl() replacement, where data buffers are passed in using write(), but contain command codes and structured data, in some cases accessing user data outside of the write buffer and passing data back to the user, or waiting for events from inside of the write() function, depending on the command code. Using a real ioctl() function instead of write() here would been consistent with other interfaces and avoid potential problems with debuggability and 32 bit emulation. 2.3 Fitness Third, an interface must be suited for the purpose it’s used for. Just because others use a particular interface does not mean that it is also a good idea for something new. Arnd Bergmann 3/13 July 31, 2007 Example 3 (Wext ioctl over netlink) One particularly sad story is configuration of wireless network devices in Linux. The wireless extensions in Linux use an ioctl interface over a socket, which is a bad idea to start with, for various reasons, but is somewhat consistent with what is used for ethernet devices and others. Some of the extensions are generic to all wireless network drivers, but other are only implemented by some of them, so a ’private’ range of ioctl numbers was assigned to these. In order to reply to ioctls asynchronously, an additional netlink interface was introduced. When it became apparent that device private numbers are not a good idea because of drivers using the same calls with different numbers and vice versa, a registration facility for them was added using special ioctls. Still adding to the complexity, some drivers started running out of private ioctl numbers, so another indirection was added for ’sub-ioctls’. With this being the most complex configuration interface in Linux (possibly aside from the history s390 ’chandev’), people called out for a rewrite. With all the best intentions, a new netlink based interface was added that unified the existing event mechanism with the configuration interfaces. Unfortunately, the new netlink interface had all the same warts as the ioctl interface, and was later removed again after a series of protests. 2.4 ABI stability Linux defines interface stability in terms of the interface between kernel and user space. We happily break the interface between kernel components all the time (see Documentation/stable-API-nonsense.txt for details), but that does not mean that we break user interfaces. The idea is to guarantee every user program that has ever run on a mainline kernel to keep running on every future version. In particular, the interface stability includes all device drivers, since they are distributed together with the kernel, but not together with the programs using these drivers. There have been cases in the past where we broke an ABI, whether inten- tionally or not. In general, this results in an endless stream of bug reports from users that all fall into the same traps, and public humiliation for the responsible developers. Example 4 (Module loader) The 2.6. version of Linux introduced a complex reimplementation of loadable modules. There was only one program using the system calls for module loading (modutils) and that was replaced by module-init- tools. While almost everyone agrees that the 2.6 method of loading modules is much preferrable to the old one, the transition caused an endless amount of pain for users and distributors alike. Arnd Bergmann 4/13 July 31, 2007 Example 5 (udev) The udev tool is a replacement for the a set of functionality, including the devfs file system, the hotplug tools and module autoloading. During the development, a lot of care was taken to ensure compatiblity with the older mechanism, so users that were switching from older kernels were not forced to upgrade to udev. Unfortunately, not enough care was taken to ensure compatiblity between udev versions. Since udev made use of a lot of sysfs files, it broke when some of these files went away. As a result, distributions shipping with an older version of udev could not be upgraded to a newer kernel without upgrading the udev version first. Example 6 (KVM) The KVM hypervisor was introduced into the kernel in late 2006. For the sake of being extremely useful, it was allowed into the kernel with an unstable ioctl ABI, which then got broken in every each of the following kernel versions, and even more often in the development tree. As long as the user base consisted of only a small group of developers, that went fine, since everyone capable of using the code was aware of this problem. With 2.6.22, the interface was formally frozen and will have to be maintained in a compatible way, but it is still likely to require changes in the near future.

How to Not Invent Kernel Interfaces

MLNX OFED Documentation Rev 5.0-2.1.8.0

Userland Tools and Techniques for Board Bring up and Systems Integration

The Linux Kernel Module Programming Guide

Hetfs: a Heterogeneous File System for Everyone

Procedures to Build Crypto Libraries in Minix

System Calls System Calls

UNIX Systems Programming II Systems Unixprogramming II Short Course Notes

Toward MP-Safe Networking in Netbsd

Flexible Lustre Management

Reducing Power Consumption in Mobile Devices by Using a Kernel

Beej's Guide to Unix IPC

ECE 598 – Advanced Operating Systems Lecture 19