A Versatile and User-Oriented Versioning File System

Kiran-Kumar Muniswamy-Reddy, Charles P. Wright, Andrew Himmer, and Erez Zadok
Stony Brook University

Appears in the proceedings of the Third USENIX Conference on File and Storage Technologies (FAST 2004)

Abstract

File versioning is a useful technique for recording a history of changes. Applications of versioning include backups and disaster recovery, as well as monitoring intruders’ activities. Alas, modern systems do not include an automatic and easy-to-use file versioning system. Existing backup solutions are slow and inflexible for users. Even worse, they often lack backups for the most recent day’s activities. Online disk snapshotting systems offer more fine-grained versioning, but still do not record the most recent changes to files. Moreover, existing systems do not give individual users the flexibility to control versioning policies.

We designed a lightweight user-oriented versioning file system called Versionfs. Versionfs works with any file system and provides a host of user-configurable policies: versioning by users, groups, processes, or file names and extensions; version retention policies; and version storage policies. Versionfs creates file versions automatically, transparently, and in a file-system portable manner, while maintaining Unix semantics. A set of user-level utilities allows administrators to configure and enforce default policies; users can set policies within configured boundaries, as well as view, control, and recover files and their versions. We have implemented the system on Linux. Our performance evaluation demonstrates overheads that are not noticeable by users under normal workloads.

1 Introduction

Versioning is a technique for recording a history of changes to files. This history is useful for restoring previous versions of files, collecting a log of important changes over time, or tracing the file system activities of an intruder. Ever since Unix became popular, users have desired a versatile and simple versioning file system. Simple mistakes such as accidental removal of files (the infamous “rm *” problem) could be ameliorated on Unix if users could simply execute a single command to undo such accidental file deletion.

CVS is one of the most popular versioning tools [2]. CVS allows a group of users to record changes to files in a repository, navigate branches, and recover any version officially recorded in a CVS repository. However, CVS does not work transparently with all applications.

Another form of versioning is backup tools such as Legato’s Networker [15] or Veritas’s Backup Exec [27] and FlashSnap [28]. Modern backup systems include specialized tools for users to browse a history of file changes and to initiate a recovery of file versions. However, backup systems are cumbersome to use, run slowly, and do not integrate transparently with all user applications. Worse, backup periods are usually set to once a day, so potentially all file changes within the last 24-hour period are not backed up.

Another approach is to integrate versioning into the file system [4, 17, 24, 25]. Developing a native versioning file system from scratch is a daunting task and will only work for one file system. Instead, we developed a stackable file system called Versionfs. A stackable file system operates at the highest possible layer inside the OS. Versionfs can easily operate on top of any other file system and transparently add versioning functionality without modifying existing file system implementations or native on-media structures. Versionfs monitors relevant file system operations resulting from user activity, and creates backup files when users modify files. Version files are automatically hidden from users and are handled in a Unix-semantics compliant manner.

To be flexible for users and administrators, Versionfs supports various retention and storage policies. Retention policies determine how many versions to keep per file. Storage policies determine how versions are stored. We define the term version set to mean a given file and all of its versions. A user-level dynamic library wrapper allows users to operate on a file or its version set without modifying existing applications such as ls, rm, or mv. Our library makes version recovery as simple as opening an old version with a text editor. All this functionality removes the need to modify user applications and gives users a lot of flexibility to work with versions.

We developed our system under Linux. Our performance evaluation shows that the overheads are not noticeable by users under normal workloads.

The rest of this paper is organized as follows. Section 2 surveys background work. Section 3 describes the design of our system. We discuss interesting implementation aspects in Section 4. Section 5 evaluates Versionfs’s features, performance, and space utilization. We conclude in Section 6 and suggest future directions.

2 Background

Versioning was provided by some early file systems like the Cedar File System [7] and 3DFS [13]. The main disadvantage of these systems was that they were not completely transparent. Users had to create versions manually by using special tools or commands. CVS [2] is a user-land tool that is used for source code management. It is also not transparent, as users have to execute commands to create and access versions.

Snapshotting or check-pointing is another approach for versioning, which is common for backup systems. Periodically, a whole or incremental image of a file system is made. The snapshots are available on-line and users can access old versions. Such a system has several drawbacks, the largest one being that changes made between snapshots cannot be undone. Snapshotting systems treat all files equally. This is a disadvantage because not all files have the same usage pattern or storage requirements. When space must be recovered, whole snapshots must be purged. Often, managing such snapshot systems requires the intervention of the administrator. Snapshotting systems include AFS [12], Plan-9 [20], WAFL [8], Petal [14], Episode [3], Venti [21], Spiralog [10], a newer 3DFS [22] system, File-Motel [9], and Ext3COW [19]. Finally, programs such as rsync, rdiff, and diff are also used to make efficient incremental backups.

Versioning with copy-on-write is another technique that is used by Tops-20 [4], VMS [16], Elephant File System [24], and CVFS [25]. Though Tops-20 and VMS had automatic versioning of files, they did not handle all operations, such as rename.

Elephant is implemented in the FreeBSD 2.2.8 kernel. Elephant transparently creates a new version of a file on the first write to an open file. Elephant also provides users with four retention policies: keep one, keep all, keep safe, and keep landmark. Keep one is no versioning, keep all retains every version of a file, keep safe keeps versions for a specific period of time but does not retain the long term history of the file, and keep landmark retains only important versions in a file’s history. A user can mark a version as a landmark, or the system uses heuristics to mark other versions as landmark versions. Elephant also provides users with the ability to register their own space reclamation policies. However, Elephant has its own low-level FFS-like disk format and cannot be used with other systems.

CVFS was designed with security in mind. Each individual write or small meta-data change (e.g., an atime update) is versioned. Since many versions are created, new data structures were designed so that old versions can be stored and accessed efficiently. As CVFS was designed for security purposes, it does not have facilities for the user to access or customize versioning.

NTFS version 5, released with Windows 2000, provides reparse points. Reparse points allow applications to enhance the capabilities of the file system without requiring any changes to the file system itself [23]. Several backup products, such as Storeactive’s LiveBackup [26], are built using this feature. These products are specific to Windows, whereas Versionfs uses stackable templates and can be ported to other OSes easily.

Clearly, attempts were made to make versioning a part of the OS. However, modern operating systems still do not include versioning support. We believe that these past attempts were not very successful as they were not convenient or flexible enough for users.

3 Design

We designed Versionfs with the following four goals:

Easy-to-use: We designed our system such that a single user could easily use it as a personal backup system. This meant that we chose to use per-file granularity for versions, because users are less concerned with entire file system versioning or block-level versioning. Another requirement was that the interface would be simple. For common operations, Versionfs should be completely transparent.

Flexibility: While we wanted our system to be easy to use, we also made it flexible by providing the user with options. The user can select minimums and maximums for how many versions to store and additionally how to store the versions. The system administrator can also enforce default, minimum, and maximum policies.

Portability: Versionfs provides portable versioning. The most common operation is to read or write from the current version. We implemented Versionfs as a kernel-level file system so that applications need not be modified for accessing the current version. For previous version access, we use library wrappers, so that the majority of applications do not require any changes. Additionally, no operating system changes are required for Versionfs.

Efficiency: There are two ways in which we approach efficiency. First, we need to maximize current version performance. Second, we want to use as little