Richer File System Metadata Using Links and Attributes
Alexander Ames, Nikhil Bobb, Scott A. Brandt, Adam Hiatt, Carlos Maltzahn, Ethan L. Miller, Alisa Neeman, Deepa Tuteja
Storage Systems Research Center
Jack Baskin School of Engineering
University of California, Santa Cruz

Abstract

Traditional file systems provide a weak and inadequate structure for meaningful representations of file interrelationships and other context-providing metadata. Existing designs, which store additional file-oriented metadata either in a database, on disk, or both, are limited by the technologies upon which they depend. Moreover, they do not provide for user-defined relationships among files. To address these issues, we created the Linking File System (LiFS), a file system design in which files may have both arbitrary user- or application-specified attributes, and attributed links between files. In order to ensure performance when accessing links and attributes, the system is designed to store metadata in non-volatile memory. This paper discusses several use cases that take advantage of this approach and describes the user-space prototype we developed to test the concepts presented.

1. Introduction

Traditional file systems provide a weak and inadequate structure for meaningful representations of file interrelationships and other context-providing metadata. Solving this problem has become increasingly urgent as users are faced with a growing amount of personal data such as email, chat communications, digital photography, and online music. Moreover, computational scientists continue to bemoan the lack of mechanisms for cross-archival access, retrofitting of metadata, and identifying groupings of related results needed for data mining [38, 39]. A content-neutral, file system-based mechanism for storage of arbitrary metadata provides one solution to this weakness. As an application of this principle, we introduce the Linking File System (LiFS). It extends file system metadata to include not only arbitrary, user-specifiable key-value pairs on files but also relationships between files in the form of links with attributes.

It is now often easier to find a document on the Web amongst billions of documents than on a local file system. Documents on the Web are embedded in a rich hyperlink structure while files typically are not. Companies such as Google are able to take advantage of links between Web documents in order to deliver meaningful ranking of search results using algorithms like PageRank [32]. In contrast, search tools for traditional file systems do not have information about inter-file relationships other than the hierarchical directory structure and ownership of files.

The reason for this dearth of relationships between files is that the management of file system metadata is expensive in traditional system architectures where volatile main memory must be backed by much slower disk-based secondary storage. The advent of new non-volatile main memory technologies such as MRAM [8] promises to reduce the cost of accessing file systems in an arbitrary or fine-grain fashion with the development of novel file system designs [16, 27, 49].

The promise of non-volatile main memory has prompted designers to use such memory for the persistent storage of file system metadata. Although memory-resident metadata trivially speeds up certain common file system operations (such as stat), a more remarkable benefit is the ability to employ far richer metadata structures. File system designers will no longer be constrained by disk access speed, but instead can focus on the needs of the user. Frequent access, context-aware searches, and other operations that are prohibitively expensive under traditional, disk-oriented file system architectures will become feasible.

The Linking File System's ability to assign attributes to and establish attributed links between files in a standardized fashion forms a powerful infrastructure capable of supporting a variety of different and extremely useful user, application, and system operations. Attributed files directly support enhanced file system searches. Attributed links will support a number of recent efforts to extend hierarchical directory structures with more user-friendly and personalized file organizations [18, 28, 31, 35, 43]. Weighted links between files can also be used to record access patterns that are useful for pre-fetching, hoarding, indexing, and search result ranking. Indeed, these links provide an abstract model for file interrelationships previously unavailable at a file system level.

2. The Design of LiFS

The key features of LiFS are links between files and attributes on both files and links. To ensure the performance and reliability of LiFS, the design relies on both the non-volatility and low latency of MRAM. At the most basic level, a search within the file system traverses a series of links across a graph of metadata. An in-memory structure is crucial for this operation; random seeks on disk to inodes, even with some caching, would be prohibitively slow. The low latency of MRAM will allow metadata operations and searches to be performed almost instantaneously.

2.1. Links

Each link in LiFS has a source file, a target file, and a non-empty set of attributes consisting of key-value pairs. LiFS links differ from POSIX links in that LiFS links represent a relationship between files instead of simply a reference to a file. The attributes of the links express the nature of the relationship.

Any file can potentially contain a link to any other file. As a consequence, every file is also a directory and the distinction between files and directories is eliminated. The traditional notion of containment of a file within a directory is simply one relationship among many that can be expressed with links.

The key benefit of links is that they provide native support for a variety of relationships between files that are currently supported in an ad hoc manner by individual applications and the operating system. In addition to containment, links can express a variety of other useful relationships such as included-in, referenced-by, dependent-upon, created-by, opened-by, and others. Links also allow for dynamically customizable views of the file system based on the type of link followed.

Figure 1. Example use of links with attributes: organizing files into a directory-like hierarchy called "main" and keeping track of dependencies between files. [Figure: the files mmst05-linkingfs, linkingfs.pdf, linkingfs.tex, ieee.cls, cvs, templates, and fast02-miller, connected by links labeled hierarchy:main, dependency:source, and dependency:included.]

2.2. Attributes

In our initial design both files and links can carry a number of attributes limited only by available memory. The size of each key and its corresponding value is also currently unconstrained. We anticipate putting an efficiency ceiling on this size into the final implementation; excessively long key and value strings might affect the ability to deliver high performance.

Both the key and value members of an attribute can contain arbitrary data, including binary data. This allows applications to have rich metadata such as thumbnails, preview video clips, and cached printer spool files without the overhead of special encoding. Sharing of metadata via file and link attributes provides a powerful infrastructure for application integration.

The main benefit of attributes is that they enable users, applications, and the system itself to annotate files and links. This allows for fast and effective file searching, categorizing, partitioning, and manipulation, and provides infrastructure for other features that may not have been considered by the file system designer. It also provides a far richer context in which files can include information about provenance, intended use, type, contents, creator, modification history, version, and other information that a user, application, or system may want to keep.

We also allow attributes to be executable. A special case of executable file attributes is the file trigger. A file trigger on a file specifies a pattern/action pair. A pattern specifies the file system operation (such as a read or write) on the file with which the executable attribute is associated. An action specifies code that is executed whenever the associated operation is invoked on the file. File triggers are a powerful mechanism that simplifies the implementation of a wide variety of file system services such as versioning, mirroring, and others (as discussed in Section 3.5). File triggers raise a rich set of interesting security and language design issues that we are actively investigating.

2.3. Interface

LiFS supports an enhanced version of the POSIX interface. Here, we present the key examples of the linking API.

2.3.1. Link Creation We introduce a new system call, rellink, to create new (relational) links.

[...] cites the target. This allows finding documents by specifying "citation paths."

Nonetheless, in order for users to properly identify files that they may wish to open, there must be a traversal of identifiable links from a file system entry point (akin to root) to the target file, or a path relative to the current working directory. Unlike conventional file systems, there need not be the same root directory for all users; in fact, a user may be able to choose between one of many roots.

2.3.3. Link Accesses, Updates and Deletions In LiFS, we introduce API calls to retrieve a single link, perform updates to the link, and delete a link. Just as users must open a single file in conventional systems, there are cases where