CS 194-24 Lab 2: fs

Vedant Kumar, Palmer Dabbelt

February 27, 2014

Contents

1 Getting Started
2 lpfs Structures and Interfaces
3 The Linux VFS Layer
  3.1 Operation Tables
  3.2 Inode Cache
  3.3 Directory Cache
4 The Linux Block Layer
  4.1 Page Cache
  4.2 Device Mapper
5 Other Useful Kernel Primitives
  5.1 Slab Allocation
  5.2 Work Queues
  5.3 The RCU Subsystem
  5.4 Wait Queues
6 Schedule and Grading
  6.1 Design Document
  6.2 Checkpoint 1
  6.3 Checkpoint 2
  6.4 Checkpoint 3
  6.5 Evaluation

For this lab you will implement a filesystem that supports efficient snapshots, copy-on-write updates, encrypted storage, checksumming, and fast crash recovery. Our goal is to give you a deeper understanding of how real filesystems are designed and implemented. We have provided the on-disk data structures and some support code for a log-structured filesystem (lpfs). You can either build on top of the distributed code or implement a novel, feature-equivalent design.

The first rule of kernel programming may well be "don't mess with the kernel", which is why we've built fsdb. The idea here is to run your kernel code in userspace via a thin compatibility layer. This gives you the chance to debug and test in a relatively forgiving environment. You will extend fsdb to host ramfs as well as your own filesystem.

1 Getting Started

Pull the latest sources from the class project repo. You should see some new directories:

• lpfs: A filesystem skeleton. The compatibility layer also lives in here.
• ramfs: A compact version of linux/fs/ramfs. Note that ramfs/compat.c is symlinked to lpfs/compat.c. You can mount a ramfs by running mount -t ramfs ramfs /mnt.
• userspace: Miscellaneous tools to help build and debug your filesystem.

Run sudo make fsdb. You should see some interesting output. A reduced version follows:

    dd if=/dev/zero of=.lpfs/disk.img bs=1M count=128
    make reset_loop
    .lpfs/mkfs-lp /dev/loop0
    Disk formatted successfully.
    .lpfs/fsdb /dev/loop0 snapshot=0
    (info) lpfs: mount /dev/sda, snapshot=0
    Registered filesystem
    |fsdb>

The build system creates a disk image, attaches it to a loop device, and formats it for you. It then invokes fsdb on your new disk, leaving you ready to debug. Since we're relying on the build system to do some interesting work, it's crucial that you thoroughly understand */Makefile.mk. You may occasionally need to extend the build system, so reading through these files early on is worthwhile.

Let's make a small modification to lpfs to see how everything works. Go to the bottom of lpfs/struct.h and uncomment the LPFS_DARRAY_TEST macro. This will cause the filesystem to run sanity checks on its block layer abstraction code ("darray") instead of actually mounting. Now when you run sudo make fsdb, you should see this:

    (info) lpfs: mount /dev/sda, snapshot=0
    (info) lpfs: Starting darray tests.
    (info) lpfs: darray tests passed!
    (info) Note: proceeding to graceful crash...

Looks like the tests pass in userspace. The next step is to run them in the kernel to make sure this wasn't a fluke. Run make linux, then ./boot qemu, and finally mount -t lpfs /dev/sda /mnt in the guest's shell. If you see the same success messages, feel free to do a happy hacker dance. Life is short.

2 lpfs Structures and Interfaces

In an attempt to make this lab manageable, we've designed a set of on-disk structures that define lpfs. These structures are defined in lpfs/lpfs.h.
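To give you a feel for what these definitions look like, below is a minimal sketch in the same style. The field names and magic value here are invented for illustration; the real layout is struct lp_superblock_fmt in lpfs/lpfs.h. On-disk structures use packed, fixed-width, explicitly little-endian fields so that a formatted image means the same thing on every architecture, and so that the userspace formatter and the kernel module agree on the layout byte-for-byte.

#include <linux/kernel.h>
#include <linux/types.h>

#define LP_MAGIC_EXAMPLE 0x4c504653 /* hypothetical magic number */

/* Hypothetical on-disk superblock fragment, in the spirit of
 * struct lp_superblock_fmt. Every field has a fixed width and a
 * fixed (little-endian) byte order. */
struct lp_superblock_example {
	__le32 sb_magic;        /* identifies the image as lpfs */
	__le32 sb_block_size;   /* bytes per block */
	__le64 sb_nr_segments;  /* number of log segments on disk */
	__le64 sb_journal_addr; /* statically-placed journal location */
	__le64 sb_sut_addr;     /* segment utilization table location */
	__le32 sb_checksum;     /* covers the rest of the superblock */
} __attribute__((packed));

/* The CPU-endianness conversion happens once, at load time. */
static inline bool lp_sb_magic_ok(const struct lp_superblock_example *sb)
{
	return le32_to_cpu(sb->sb_magic) == LP_MAGIC_EXAMPLE;
}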
If you need to modify these structures, make sure that you also update the formatting program (lpfs/mkfs-lp.c); failure to do so will result in corrupted images.

The main structure you'll find in lpfs/lpfs.h is struct lp_superblock_fmt, which defines the on-disk format of an lpfs superblock. Superblocks are a concept that exists in most UNIX-derived systems: the superblock is the first block in the on-disk filesystem image and contains all the information necessary to initialize the filesystem. It is loaded when the OS attempts to mount a block device and is parsed by the particular filesystem implementation.

As you can probably see from the superblock structure, lpfs is a log-structured filesystem. LFS, the first log-structured filesystem, is described in a research paper available online at http://www.cs.berkeley.edu/~brewer/cs262/LFS.pdf. lpfs largely follows the design of LFS: data is stored in segments that are written serially, the SUT (segment utilization table) records how full each segment is, and garbage collection must be performed to free segments for later use. The one major difference is that lpfs uses a statically-placed journal instead of LFS's dynamic journal; the goal is to aid crash recovery, since a journal at a fixed location is easier to find after a crash. Another minor difference is that lpfs supports snapshots: you can make a system call that tells lpfs to keep around an exact copy of the filesystem at some particular point in time. This maps well onto log-structured filesystems, since old data is never overwritten in place.

lpfs/lpfs.h also contains the on-disk structures that describe files, directories, and journal entries. These pretty much mirror the structures of a traditional UNIX filesystem; most of the interesting bits in lpfs are in the log.

lpfs/struct.h summarizes the important interfaces the filesystem relies on. You will notice that much of the code (including the entire transaction system and some of the mount logic) is far from complete. You will need to implement all of this.

lpfs/inode.c takes care of loading and filling in batches of inodes. lpfs/inode_map.c tracks inode mappings: these objects define a snapshot by specifying an on-disk byte address for every live inode. lpfs/darray.c implements an abstraction on top of the buffer head API: it presents a picture of a segment as a contiguous array, handles locking, and can sync your buffers to disk. The problem with darray is that the buffer head interface is quite bloated; when the rest of your filesystem is done, you should rewrite darray using the lighter bio interface.
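For reference, here is a minimal sketch of the buffer head pattern that darray wraps. The helper name lp_read_u64_at is invented for illustration; sb_bread(), brelse(), mark_buffer_dirty(), and sync_dirty_buffer() are the real kernel calls. sb_bread() consults the page cache first and only touches the disk on a miss:

#include <linux/buffer_head.h>
#include <linux/fs.h>

/* Read a little-endian u64 from byte offset off of block blkno.
 * Hypothetical helper: the caller must ensure off + 8 fits within
 * one block. */
static int lp_read_u64_at(struct super_block *sb, sector_t blkno,
			  size_t off, u64 *out)
{
	struct buffer_head *bh = sb_bread(sb, blkno);

	if (!bh)
		return -EIO; /* I/O error or out-of-range block */

	*out = le64_to_cpu(*(__le64 *)(bh->b_data + off));

	/* For a write you would copy into bh->b_data instead, then call
	 * mark_buffer_dirty(bh), and sync_dirty_buffer(bh) if the write
	 * must reach the disk before you proceed. */
	brelse(bh); /* drop our reference; the block stays cached */
	return 0;
}

Each buffer_head carries reference counting, locking, and dirty-state tracking for a single block; a bio, by contrast, is just a description of an in-flight I/O request, which is why the handout calls it lighter.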
3 The Linux VFS Layer

Filesystems are one of the more complicated aspects of an operating system (by lines of code, only drivers/ and arch/ are bigger than fs/). Luckily for you, Linux provides something known as the VFS (Virtual Filesystem Switch) layer that is designed to help manage this complexity. In Linux, all filesystems are implemented using the VFS layer (there's also FUSE, which maps VFS calls into userspace, but FUSE itself hooks into VFS, so I think it still counts). Because UNIX is designed to map pretty much every operation onto the filesystem, the VFS layer plays a central role in Linux. Figure 1 shows exactly where the VFS layer lives and how it plugs into the rest of Linux.

[Figure 1: A map of Linux's VFS layer]

Linux's VFS documentation is very good and can be found at linux/Documentation/filesystems/vfs.txt. You will need to read this document to complete this lab. In that directory you'll also find documentation for other filesystems, which may or may not be useful: VFS was originally hacked on top of an early UNIX filesystem, which still more-or-less exists as ext2 in Linux today.

3.1 Operation Tables

The primary means of interfacing your filesystem with the VFS layer is filling out operation tables with callbacks that perform the operations specific to your filesystem. There are three of these tables: struct super_operations, which defines operations that are global across your filesystem; struct inode_operations, which defines methods on inodes; and struct file_operations, which defines operations that are local to a particular file. This split is largely historical: on the original UNIX, directories and files were both accessed via the same system calls, so a different function was required to differentiate between the two.

The simplest disk-based filesystem I know of is ext2, which you can find in the Linux sources (linux/fs/ext2/). If you look at how ext2 defines these operation tables, you'll notice that a significant fraction of the entries can be filled out using generic mechanisms provided by Linux; you'll want to take advantage of this so you can avoid re-writing a whole bunch of stuff that already works (a sketch of this boilerplate appears at the end of this section).

3.2 Inode Cache

Linux's VFS layer was designed with traditional UNIX filesystems in mind, and as such has a number of UNIX filesystem concepts baked into it. You've already seen one of these with the directory/file distinction, but another important one is that the VFS layer talks to your filesystem directly in terms of inodes. While this was probably originally a decision that stemmed from Sun's attempts at hacking NFS into their UNIX, today it has important performance implications: specifically, Linux caches inodes in something (quite sensibly) known as the inode cache. The VFS layer handles the majority of the inode caching logic for you.
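To make the operation tables from Section 3.1 concrete, here is a hedged sketch of the registration boilerplate for a disk-backed filesystem. Everything prefixed lp_ is invented for illustration; mount_bdev(), kill_block_super(), simple_statfs(), and register_filesystem() are real VFS helpers, and linux/fs/ext2/super.c shows the full-sized version of this pattern:

#include <linux/fs.h>
#include <linux/module.h>

static const struct super_operations lp_super_ops = {
	.statfs = simple_statfs, /* generic helper from fs/libfs.c */
	/* .alloc_inode, .write_inode, .evict_inode, .put_super, ... */
};

/* Called by mount_bdev() with a half-initialized superblock. */
static int lp_fill_super(struct super_block *sb, void *data, int silent)
{
	sb->s_op = &lp_super_ops;
	/* Real code would sb_bread() the on-disk superblock, validate
	 * its magic, set s_blocksize, load the root inode, and point
	 * sb->s_root at a dentry for it. */
	return -ENOSYS; /* placeholder in this sketch */
}

static struct dentry *lp_mount(struct file_system_type *fs_type, int flags,
			       const char *dev_name, void *data)
{
	/* mount_bdev() opens the block device and invokes fill_super. */
	return mount_bdev(fs_type, flags, dev_name, data, lp_fill_super);
}

static struct file_system_type lp_fs_type = {
	.owner    = THIS_MODULE,
	.name     = "lpfs",
	.mount    = lp_mount,
	.kill_sb  = kill_block_super,
	.fs_flags = FS_REQUIRES_DEV,
};

/* Calling register_filesystem(&lp_fs_type) from the module init
 * function is what makes "mount -t lpfs ..." work. */

Note how .statfs is filled with a generic helper rather than hand-written code; ext2 does the same thing wherever Linux already provides a sensible default.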
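Finally, the usual way to cooperate with the inode cache from Section 3.2 looks roughly like the sketch below. iget_locked(), unlock_new_inode(), and iget_failed() are the real VFS entry points; lp_read_inode_from_disk() stands in for code you will write yourself:

#include <linux/err.h>
#include <linux/fs.h>

static int lp_read_inode_from_disk(struct inode *inode); /* your code */

struct inode *lp_iget(struct super_block *sb, unsigned long ino)
{
	/* Returns a cached inode, or a fresh locked one marked I_NEW. */
	struct inode *inode = iget_locked(sb, ino);

	if (!inode)
		return ERR_PTR(-ENOMEM);
	if (!(inode->i_state & I_NEW))
		return inode; /* cache hit: already filled in */

	if (lp_read_inode_from_disk(inode) < 0) {
		iget_failed(inode); /* unlock and discard the bad inode */
		return ERR_PTR(-EIO);
	}

	unlock_new_inode(inode); /* publish the now-valid inode */
	return inode;
}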