How to Not Invent Kernel Interfaces

Total Page:16

File Type:pdf, Size:1020Kb

How to Not Invent Kernel Interfaces How to not invent kernel interfaces Arnd Bergmann [email protected] July 31, 2007 Contents 1 Abstract 2 2 Concepts 2 2.1 Simplicity ............................. 2 2.2 Consistency ............................ 3 2.3 Fitness ............................... 3 2.4 ABI stability ........................... 4 3 Syscalls 5 3.1 ioctl ................................ 6 3.2 Sockets ............................... 7 3.2.1 Netlink ........................... 7 4 ABI Emulation 9 4.1 Other operating systems ..................... 9 4.2 compat ioctl ............................ 9 5 File systems 11 5.1 procfs ............................... 12 5.2 sysfs ................................ 12 5.3 debugfs .............................. 13 6 Conclusion 13 Arnd Bergmann 1/13 July 31, 2007 1 Abstract Every piece of kernel code needs some form of user space interface. Examples for this include system calls, ioctl based character devices, virtual file systems, netlink or procfs files. Choosing the right one is often hard and people tend to get flamed for their choices no matter what they do. The truth is that there usually is not a single right solution for a new problem, just different degrees on how wrong it gets, and coming up with a simple interface tends to be the most complicated task in any new subsystem. This paper explores different interfaces and their pros and cons by looking at examples from existing kernel code. 2 Concepts In Finding the right interface for a new functionality should take into account different aspects. The most important ones are simplicity, consistency and fitness for the given problem. Together, they form what can best be described as ’good taste’. Taste is the ultimate feature of a good interface, but it doesn’t have a clear definition, so we need to rely on more concrete indications for whether an interface is good enough. The importance of getting an interface right in the beginning comes from the problem of changing it later. It is always possible to completely replace all of the kernel code implementing a given functionality, but you cannot change an interface in fundamental ways without breaking user applications. 2.1 Simplicity The most important aspect for a kernel interface is simplicity. A complex interface is hard to implement correctly and hard to understand, which means application programmers will introduce bugs when trying to use it. Interestingly, it is much harder to come up with a simple interface than it is to create a complex mess. Even if you start out with a simple interface, you tend to forget some aspect of the problem and later add complexity to it in order to cover all the corner cases. When that happens, it may be better to start over from scratch and come up with a better model of what needs to be done. Since the interface needs to be implemented on both kernel and user side, both of these need to be as simple as possible. Making the user side as simple as possible is more important though, because there are typically many users Arnd Bergmann 2/13 July 31, 2007 of each kernel interface, so each one of them is prone to errors when it is not obvious how the interface is used. Example 1 (nfsservctl) Sometimes an existing interface can be replaced with something simpler. One such example is the obscure nfsservctl system call that is used for a specific subsystem, namely the NFS server in the kernel. Instead of the multiplexing system call, there is now a special file system called ’nfsd’, each file in it representing one of the subfunctions of the older system call. Using the simple fill super() and simple transaction helpers, a write() followed by a read() on a file in there can be used the same way the system call was before. This would be a good example of how to do a simple interface, if we didn’t have to implement the original system call on top of it to maintain the stable ABI. In the end, user applications still use the old syscall, and the complexity remains in both kernel and user space. 2.2 Consistency When adding a new interface to an existing kernel, you should take into account how other similar subsystems achieve the same functionality. Most of the time, multiple possible solutions exist, and if they are equally simple, you should choose the one that most closely resembles existing interfaces. This gives users and reviewers the least surprises and thereby avoids bugs. Example 2 (infiniband uverbs) The infiniband driver subsystem has an inter- face for direct interaction of user applications with the hardware through ’verbs’, which would typically be implemented as ioctl functions. For some reason, the author must have been scared of adding a large amount of ioctl numbers in his driver and decided to use an interface based on read/write operations on a character device. While this is not a bad idea in principle, the end result is a violation of the consistency principle. The ib uverbs write() function basically acts as an ioctl() replacement, where data buffers are passed in using write(), but contain command codes and structured data, in some cases accessing user data outside of the write buffer and passing data back to the user, or waiting for events from inside of the write() function, depending on the command code. Using a real ioctl() function instead of write() here would been consistent with other interfaces and avoid potential problems with debuggability and 32 bit emulation. 2.3 Fitness Third, an interface must be suited for the purpose it’s used for. Just because others use a particular interface does not mean that it is also a good idea for something new. Arnd Bergmann 3/13 July 31, 2007 Example 3 (Wext ioctl over netlink) One particularly sad story is config- uration of wireless network devices in Linux. The wireless extensions in Linux use an ioctl interface over a socket, which is a bad idea to start with, for various reasons, but is somewhat consistent with what is used for ethernet devices and others. Some of the extensions are generic to all wireless network drivers, but other are only implemented by some of them, so a ’private’ range of ioctl numbers was assigned to these. In order to reply to ioctls asynchronously, an additional netlink interface was introduced. When it became apparent that device private numbers are not a good idea because of drivers using the same calls with different numbers and vice versa, a registration facility for them was added using special ioctls. Still adding to the complexity, some drivers started running out of private ioctl numbers, so another indirection was added for ’sub-ioctls’. With this being the most complex configuration interface in Linux (possibly aside from the history s390 ’chandev’), people called out for a rewrite. With all the best intentions, a new netlink based interface was added that unified the existing event mechanism with the configuration interfaces. Unfortunately, the new netlink interface had all the same warts as the ioctl interface, and was later removed again after a series of protests. 2.4 ABI stability Linux defines interface stability in terms of the interface between kernel and user space. We happily break the interface between kernel components all the time (see Documentation/stable-API-nonsense.txt for details), but that does not mean that we break user interfaces. The idea is to guarantee every user program that has ever run on a mainline kernel to keep running on every future version. In particular, the interface stability includes all device drivers, since they are distributed together with the kernel, but not together with the programs using these drivers. There have been cases in the past where we broke an ABI, whether inten- tionally or not. In general, this results in an endless stream of bug reports from users that all fall into the same traps, and public humiliation for the responsible developers. Example 4 (Module loader) The 2.6. version of Linux introduced a complex reimplementation of loadable modules. There was only one program using the system calls for module loading (modutils) and that was replaced by module-init- tools. While almost everyone agrees that the 2.6 method of loading modules is much preferrable to the old one, the transition caused an endless amount of pain for users and distributors alike. Arnd Bergmann 4/13 July 31, 2007 Example 5 (udev) The udev tool is a replacement for the a set of functionality, including the devfs file system, the hotplug tools and module autoloading. During the development, a lot of care was taken to ensure compatiblity with the older mechanism, so users that were switching from older kernels were not forced to upgrade to udev. Unfortunately, not enough care was taken to ensure compatiblity between udev versions. Since udev made use of a lot of sysfs files, it broke when some of these files went away. As a result, distributions shipping with an older version of udev could not be upgraded to a newer kernel without upgrading the udev version first. Example 6 (KVM) The KVM hypervisor was introduced into the kernel in late 2006. For the sake of being extremely useful, it was allowed into the kernel with an unstable ioctl ABI, which then got broken in every each of the following kernel versions, and even more often in the development tree. As long as the user base consisted of only a small group of developers, that went fine, since everyone capable of using the code was aware of this problem. With 2.6.22, the interface was formally frozen and will have to be maintained in a compatible way, but it is still likely to require changes in the near future.
Recommended publications
  • MLNX OFED Documentation Rev 5.0-2.1.8.0
    MLNX_OFED Documentation Rev 5.0-2.1.8.0 Exported on May/21/2020 06:13 AM https://docs.mellanox.com/x/JLV-AQ Notice This document is provided for information purposes only and shall not be regarded as a warranty of a certain functionality, condition, or quality of a product. NVIDIA Corporation (“NVIDIA”) makes no representations or warranties, expressed or implied, as to the accuracy or completeness of the information contained in this document and assumes no responsibility for any errors contained herein. NVIDIA shall have no liability for the consequences or use of such information or for any infringement of patents or other rights of third parties that may result from its use. This document is not a commitment to develop, release, or deliver any Material (defined below), code, or functionality. NVIDIA reserves the right to make corrections, modifications, enhancements, improvements, and any other changes to this document, at any time without notice. Customer should obtain the latest relevant information before placing orders and should verify that such information is current and complete. NVIDIA products are sold subject to the NVIDIA standard terms and conditions of sale supplied at the time of order acknowledgement, unless otherwise agreed in an individual sales agreement signed by authorized representatives of NVIDIA and customer (“Terms of Sale”). NVIDIA hereby expressly objects to applying any customer general terms and conditions with regards to the purchase of the NVIDIA product referenced in this document. No contractual obligations are formed either directly or indirectly by this document. NVIDIA products are not designed, authorized, or warranted to be suitable for use in medical, military, aircraft, space, or life support equipment, nor in applications where failure or malfunction of the NVIDIA product can reasonably be expected to result in personal injury, death, or property or environmental damage.
    [Show full text]
  • Userland Tools and Techniques for Board Bring up and Systems Integration
    Userland Tools and Techniques for Board Bring Up and Systems Integration ELC 2012 HY Research LLC http://www.hy-research.com/ Feb 5, 2012 (C) 2012 HY Research LLC Agenda * Introduction * What? * Why bother with userland? * Common SoC interfaces * Typical Scenario * Kernel setup * GPIO/UART/I2C/SPI/Other * Questions HY Research LLC http://www.hy-research.com/ Feb 5, 2012 (C) 2012 HY Research LLC Introduction * SoC offer a lot of integrated functionality * System designs differ by outside parts * Most mobile systems are SoC * "CPU boards" for SoCs * Available BSP for starting * Vendor or other sources * Common Unique components * Memory (RAM) * Storage ("flash") * IO * Displays HY Research LLC * Power supplies http://www.hy-research.com/ Feb 5, 2012 (C) 2012 HY Research LLC What? * IO related items * I2C * SPI * UART * GPIO * USB HY Research LLC http://www.hy-research.com/ Feb 5, 2012 (C) 2012 HY Research LLC Why userland? * Easier for non kernel savy * Quicker turn around time * Easier to debug * Often times available already Sample userland from BSP/LSP vendor * Kernel driver is not ready HY Research LLC http://www.hy-research.com/ Feb 5, 2012 (C) 2012 HY Research LLC Common SoC interfaces Most SoC have these and more: * Pinmux * UART * GPIO * I2C * SPI * USB * Not discussed: Audio/Displays HY Research LLC http://www.hy-research.com/ Feb 5, 2012 (C) 2012 HY Research LLC Typical Scenario Custom board: * Load code/Bring up memory * Setup memory controller for part used * Load Linux * Toggle lines on board to verify Prototype based on demo/eval board: * Start with board at a shell prompt * Get newly attached hw to work HY Research LLC http://www.hy-research.com/ Feb 5, 2012 (C) 2012 HY Research LLC Kernel Setup Additions to typical configs: * Enable UART support (typically done already) * Enable I2C support along with drivers for the SoC * (CONFIG_I2C + other) * Enable SPIdev * (CONFIG_SPI + CONFIG_SPI_SPIDEV) * Add SPIDEV to board file * Enable GPIO sysfs * (CONFIG_GPIO + other + CONFIG_GPIO_SYSFS) * Enable USB * Depends on OTG vs normal, etc.
    [Show full text]
  • The Linux Kernel Module Programming Guide
    The Linux Kernel Module Programming Guide Peter Jay Salzman Michael Burian Ori Pomerantz Copyright © 2001 Peter Jay Salzman 2007−05−18 ver 2.6.4 The Linux Kernel Module Programming Guide is a free book; you may reproduce and/or modify it under the terms of the Open Software License, version 1.1. You can obtain a copy of this license at http://opensource.org/licenses/osl.php. This book is distributed in the hope it will be useful, but without any warranty, without even the implied warranty of merchantability or fitness for a particular purpose. The author encourages wide distribution of this book for personal or commercial use, provided the above copyright notice remains intact and the method adheres to the provisions of the Open Software License. In summary, you may copy and distribute this book free of charge or for a profit. No explicit permission is required from the author for reproduction of this book in any medium, physical or electronic. Derivative works and translations of this document must be placed under the Open Software License, and the original copyright notice must remain intact. If you have contributed new material to this book, you must make the material and source code available for your revisions. Please make revisions and updates available directly to the document maintainer, Peter Jay Salzman <[email protected]>. This will allow for the merging of updates and provide consistent revisions to the Linux community. If you publish or distribute this book commercially, donations, royalties, and/or printed copies are greatly appreciated by the author and the Linux Documentation Project (LDP).
    [Show full text]
  • Hetfs: a Heterogeneous File System for Everyone
    HetFS: A Heterogeneous File System for Everyone Georgios Koloventzos1, Ramon Nou∗1, Alberto Miranda∗1, and Toni Cortes1;2 1 Barcelona Supercomputing Center (BSC) Barcelona, Spain 2 Universitat Polit`ecnicade Catalunya Barcelona, Spain georgios.koloventzos,ramon.nou,alberto.miranda,toni.cortes}@bsc.es Abstract Storage devices have been getting more and more diverse during the last decade. The advent of SSDs made it painfully clear that rotating devices, such as HDDs or magnetic tapes, were lacking in regards to response time. However, SSDs currently have a limited number of write cycles and a significantly larger price per capacity, which has prevented rotational technologies from begin abandoned. Additionally, Non-Volatile Memories (NVMs) have been lately gaining traction, offering devices that typically outperform NAND-based SSDs but exhibit a full new set of idiosyncrasies. Therefore, in order to appropriately support this diversity, intelligent mech- anisms will be needed in the near-future to balance the benefits and drawbacks of each storage technology available to a system. In this paper, we present a first step towards such a mechanism called HetFS, an extension to the ZFS file system that is capable of choosing the storage device a file should be kept in according to preprogrammed filters. We introduce the prototype and show some preliminary results of the effects obtained when placing specific files into different devices. 1 Introduction Storage devices have shown a significant evolution in the latest decade. As the improvements in the latencies of traditional hard disk drives (HDDs) have dimin- ished due to the mechanical limitations inherent to their design, other technolo- gies have been emerging to try and take their place.
    [Show full text]
  • Procedures to Build Crypto Libraries in Minix
    Created by Jinkai Gao (Syracuse University) Seed Document How to talk to inet server Note: this docment is fully tested only on Minix3.1.2a. In this document, we introduce a method to let user level program to talk to inet server. I. Problem with system call Recall the process of system call, refering to http://www.cis.syr.edu/~wedu/seed/Labs/Documentation/Minix3/System_call_sequence.pd f We can see the real system call happens in this function call: _syscall(FS, CHMOD, &m) This function executes ‘INT 80’ to trap into kernel. Look at the parameter it passs to kernel. ‘CHMOD’ is a macro which is merely the system call number. ‘FS’ is a macro which indicates the server which handles the chmod system call, in this case ‘FS’ is 1, which is the pid of file system server process. Now your might ask ‘why can we hard code the pid of the process? Won’t it change?’ Yes, normally the pid of process is unpredictable each time the system boots up. But for fs, pm, rs, init and other processes which is loaded from system image at the very beginning of booting time, it is not true. In minix, you can dump the content of system image by pressing ‘F3’, then dump the current running process table by pressing ‘F1’. What do you find? The first 12 entries in current process table is exactly the ones in system image with the same order. So the pids of these 12 processes will not change. Inet is different. It is not in the system image, so it is not loaded into memory in the very first time.
    [Show full text]
  • System Calls System Calls
    System calls We will investigate several issues related to system calls. Read chapter 12 of the book Linux system call categories file management process management error handling note that these categories are loosely defined and much is behind included, e.g. communication. Why? 1 System calls File management system call hierarchy you may not see some topics as part of “file management”, e.g., sockets 2 System calls Process management system call hierarchy 3 System calls Error handling hierarchy 4 Error Handling Anything can fail! System calls are no exception Try to read a file that does not exist! Error number: errno every process contains a global variable errno errno is set to 0 when process is created when error occurs errno is set to a specific code associated with the error cause trying to open file that does not exist sets errno to 2 5 Error Handling error constants are defined in errno.h here are the first few of errno.h on OS X 10.6.4 #define EPERM 1 /* Operation not permitted */ #define ENOENT 2 /* No such file or directory */ #define ESRCH 3 /* No such process */ #define EINTR 4 /* Interrupted system call */ #define EIO 5 /* Input/output error */ #define ENXIO 6 /* Device not configured */ #define E2BIG 7 /* Argument list too long */ #define ENOEXEC 8 /* Exec format error */ #define EBADF 9 /* Bad file descriptor */ #define ECHILD 10 /* No child processes */ #define EDEADLK 11 /* Resource deadlock avoided */ 6 Error Handling common mistake for displaying errno from Linux errno man page: 7 Error Handling Description of the perror () system call.
    [Show full text]
  • UNIX Systems Programming II Systems Unixprogramming II Short Course Notes
    Systems UNIXProgramming II Systems UNIXProgramming II Systems UNIXProgramming II UNIX Systems Programming II Systems UNIXProgramming II Short Course Notes Alan Dix © 1996 Systems Programming II http://www.hcibook.com/alan/ UNIX Systems Course UNIXProgramming II Outline Alan Dix http://www.hcibook.com/alan/ Session 1 files and devices inodes, stat, /dev files, ioctl, reading directories, file descriptor sharing and dup2, locking and network caching Session 2 process handling UNIX processes, fork, exec, process death: SIGCHLD and wait, kill and I/O issues for fork Session 3 inter-process pipes: at the shell , in C code and communication use with exec, pseudo-terminals, sockets and deadlock avoidance Session 4 non-blocking I/O and UNIX events: signals, times and select I/O; setting timers, polling, select, interaction with signals and an example Internet server Systems UNIXProgrammingII Short Course Notes Alan Dix © 1996 II/i Systems Reading UNIXProgramming II ¥ The Unix V Environment, Stephen R. Bourne, Wiley, 1987, ISBN 0 201 18484 2 The author of the Borne Shell! A 'classic' which deals with system calls, the shell and other aspects of UNIX. ¥ Unix For Programmers and Users, Graham Glass, Prentice-Hall, 1993, ISBN 0 13 061771 7 Slightly more recent book also covering shell and C programming. Ì BEWARE Ð UNIX systems differ in details, check on-line documentation ¥ UNIX manual pages: man creat etc. Most of the system calls and functions are in section 2 and 3 of the manual. The pages are useful once you get used to reading them! ¥ The include files themselves /usr/include/time.h etc.
    [Show full text]
  • Toward MP-Safe Networking in Netbsd
    Toward MP-safe Networking in NetBSD Ryota Ozaki <[email protected]> Kengo Nakahara <[email protected]> EuroBSDcon 2016 2016-09-25 Contents ● Background and goals ● Approach ● Current status ● MP-safe Layer 3 forwarding ● Performance evaluations ● Future work Background ● The Multi-core Era ● The network stack of NetBSD couldn’t utilize multi-cores ○ As of 2 years ago CPU 0 NIC A NIC B CPU 1 Our Background and Our Goals ● Internet Initiative Japan Inc. (IIJ) ○ Using NetBSD in our products since 1999 ○ Products: Internet access routers, etc. ● Our goal ○ Better performance of our products, especially Layer 2/3 forwarding, tunneling, IPsec VPN, etc. → MP-safe networking Our Targets ● Targets ○ 10+ cores systems ○ 1 Gbps Intel NICs and virtualized NICs ■ wm(4), vmx(4), vioif(4) ○ Layer 2 and 3 ■ IPv4/IPv6, bridge(4), gif(4), vlan(4), ipsec(4), pppoe(4), bpf(4) ● Out of targets ○ 100 cores systems and above ○ Layer 4 and above ■ and any other network components except for the above Approach ● MP-safe and then MP-scalable ● Architecture ○ Utilize hardware assists ○ Utilize lightweight synchronization mechanisms ● Development ○ Restructure the code first ○ Benchmark often Approach : Architecture ● Utilize hardware assists ○ Distribute packets to CPUs by hardware ■ NIC multi-queue and RSS ● Utilize software techniques ○ Lightweight synchronization mechanisms ■ Especially pserialize(9) and psref(9) ○ Existing facilities ■ Fast forwarding and ipflow Forwarding Utilizing Hardware Assists Least locks Rx H/W queues CPU 0 Tx H/W queues queue 0 queue 0 CPU 1 queue 1 queue 1 NIC A NIC B queue 2 queue 2 CPU 2 queue 3 queue 3 CPU 3 Packets are distributed Packets are processed by hardware based on on a received CPU to flow (5-tuples) the last Approach : Development ● Restructure the code first ○ Hard to simply apply locks to the existing code ■ E.g., hardware interrupt context for Layer 2, cloning/cloned routes, etc.
    [Show full text]
  • Flexible Lustre Management
    Flexible Lustre management Making less work for Admins ORNL is managed by UT-Battelle for the US Department of Energy How do we know Lustre condition today • Polling proc / sysfs files – The knocking on the door model – Parse stats, rpc info, etc for performance deviations. • Constant collection of debug logs – Heavy parsing for common problems. • The death of a node – Have to examine kdumps and /or lustre dump Origins of a new approach • Requirements for Linux kernel integration. – No more proc usage – Migration to sysfs and debugfs – Used to configure your file system. – Started in lustre 2.9 and still on going. • Two ways to configure your file system. – On MGS server run lctl conf_param … • Directly accessed proc seq_files. – On MSG server run lctl set_param –P • Originally used an upcall to lctl for configuration • Introduced in Lustre 2.4 but was broken until lustre 2.12 (LU-7004) – Configuring file system works transparently before and after sysfs migration. Changes introduced with sysfs / debugfs migration • sysfs has a one item per file rule. • Complex proc files moved to debugfs • Moving to debugfs introduced permission problems – Only debugging files should be their. – Both debugfs and procfs have scaling issues. • Moving to sysfs introduced the ability to send uevents – Item of most interest from LUG 2018 Linux Lustre client talk. – Both lctl conf_param and lctl set_param –P use this approach • lctl conf_param can set sysfs attributes without uevents. See class_modify_config() – We get life cycle events for free – udev is now involved. What do we get by using udev ? • Under the hood – uevents are collect by systemd and then processed by udev rules – /etc/udev/rules.d/99-lustre.rules – SUBSYSTEM=="lustre", ACTION=="change", ENV{PARAM}=="?*", RUN+="/usr/sbin/lctl set_param '$env{PARAM}=$env{SETTING}’” • You can create your own udev rule – http://reactivated.net/writing_udev_rules.html – /lib/udev/rules.d/* for examples – Add udev_log="debug” to /etc/udev.conf if you have problems • Using systemd for long task.
    [Show full text]
  • Reducing Power Consumption in Mobile Devices by Using a Kernel
    IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. Z, NO. B, AUGUST 2017 1 Reducing Event Latency and Power Consumption in Mobile Devices by Using a Kernel-Level Display Server Stephen Marz, Member, IEEE and Brad Vander Zanden and Wei Gao, Member, IEEE E-mail: [email protected], [email protected], [email protected] Abstract—Mobile devices differ from desktop computers in that they have a limited power source, a battery, and they tend to spend more CPU time on the graphical user interface (GUI). These two facts force us to consider different software approaches in the mobile device kernel that can conserve battery life and reduce latency, which is the duration of time between the inception of an event and the reaction to the event. One area to consider is a software package called the display server. The display server is middleware that handles all GUI activities between an application and the operating system, such as event handling and drawing to the screen. In both desktop and mobile devices, the display server is located in the application layer. However, the kernel layer contains most of the information needed for handling events and drawing graphics, which forces the application-level display server to make a series of system calls in order to coordinate events and to draw graphics. These calls interrupt the CPU which can increase both latency and power consumption, and also require the kernel to maintain event queues that duplicate event queues in the display server. A further drawback of placing the display server in the application layer is that the display server contains most of the information required to efficiently schedule the application and this information is not communicated to existing kernels, meaning that GUI-oriented applications are scheduled less efficiently than they might be, which further increases power consumption.
    [Show full text]
  • Beej's Guide to Unix IPC
    Beej's Guide to Unix IPC Brian “Beej Jorgensen” Hall [email protected] Version 1.1.3 December 1, 2015 Copyright © 2015 Brian “Beej Jorgensen” Hall This guide is written in XML using the vim editor on a Slackware Linux box loaded with GNU tools. The cover “art” and diagrams are produced with Inkscape. The XML is converted into HTML and XSL-FO by custom Python scripts. The XSL-FO output is then munged by Apache FOP to produce PDF documents, using Liberation fonts. The toolchain is composed of 100% Free and Open Source Software. Unless otherwise mutually agreed by the parties in writing, the author offers the work as-is and makes no representations or warranties of any kind concerning the work, express, implied, statutory or otherwise, including, without limitation, warranties of title, merchantibility, fitness for a particular purpose, noninfringement, or the absence of latent or other defects, accuracy, or the presence of absence of errors, whether or not discoverable. Except to the extent required by applicable law, in no event will the author be liable to you on any legal theory for any special, incidental, consequential, punitive or exemplary damages arising out of the use of the work, even if the author has been advised of the possibility of such damages. This document is freely distributable under the terms of the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. See the Copyright and Distribution section for details. Copyright © 2015 Brian “Beej Jorgensen” Hall Contents 1. Intro................................................................................................................................................................1 1.1. Audience 1 1.2. Platform and Compiler 1 1.3.
    [Show full text]
  • ECE 598 – Advanced Operating Systems Lecture 19
    ECE 598 { Advanced Operating Systems Lecture 19 Vince Weaver http://web.eece.maine.edu/~vweaver [email protected] 7 April 2016 Announcements • Homework #7 was due • Homework #8 will be posted 1 Why use FAT over ext2? • FAT simpler, easy to code • FAT supported on all major OSes • ext2 faster, more robust filename and permissions 2 btrfs • B-tree fs (similar to a binary tree, but with pages full of leaves) • overwrite filesystem (overwite on modify) vs CoW • Copy on write. When write to a file, old data not overwritten. Since old data not over-written, crash recovery better Eventually old data garbage collected • Data in extents 3 • Copy-on-write • Forest of trees: { sub-volumes { extent-allocation { checksum tree { chunk device { reloc • On-line defragmentation • On-line volume growth 4 • Built-in RAID • Transparent compression • Snapshots • Checksums on data and meta-data • De-duplication • Cloning { can make an exact snapshot of file, copy-on- write different than link, different inodles but same blocks 5 Embedded • Designed to be small, simple, read-only? • romfs { 32 byte header (magic, size, checksum,name) { Repeating files (pointer to next [0 if none]), info, size, checksum, file name, file data • cramfs 6 ZFS Advanced OS from Sun/Oracle. Similar in idea to btrfs indirect still, not extent based? 7 ReFS Resilient FS, Microsoft's answer to brtfs and zfs 8 Networked File Systems • Allow a centralized file server to export a filesystem to multiple clients. • Provide file level access, not just raw blocks (NBD) • Clustered filesystems also exist, where multiple servers work in conjunction.
    [Show full text]