
A Model, Schema, and Interface for Metadata File Systems

Stijn Dekeyser, Richard Watson, Lasse Motrøen
University of Southern Queensland, Australia
{dekeyser,rwatson}@usq.edu.au, [email protected]

Abstract

Modern computer systems are based on the traditional hierarchical file system model, but typically contain large numbers of files with complex interrelationships. This traditional model is not capable of meeting the needs of current computer system users, who need to be able to store and retrieve files based on flexible criteria. A metadata file system can associate an extensive and rich set of data with a file, thus enabling more effective file organisation and retrieval than traditional file systems.

In this paper we review a wide range of existing proposals to add metadata to files and make that metadata available for searching. We then propose a hierarchy of definitions for metadata file systems based on the reviewed prototypes. We introduce a data model for a database-oriented pure mdfs complete with operations and semantics. The model supports user-initiated instance and schema updates and file searches based on structured queries. We also explore the design space of a set of user interface operations intended to implement the pure model and facilitate the capturing of rich metadata. We argue that without such a simple method for users to create rich metadata, progress in this field will remain limited.

Keywords: Operating systems, Advanced applications of databases, Metadata.

Copyright © 2008, Australian Computer Society, Inc. This paper appeared at the Thirty-First Australasian Computer Science Conference (ACSC2008), Wollongong, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 74, Gillian Dobbie and Bernard Mans, Ed. Reproduction for academic, not-for profit purposes permitted provided this text is included.

1 Introduction

Traditional file systems store simple file metadata; a predefined set of data, mostly maintained by the operating system, is held in directories and file control blocks (e.g. inodes). Apart from assigning file names, users can effectively specify metadata by creating a directory hierarchy. The file path may encode some metadata. For instance, the path courses/csc2404/07/s2/ass1/1234/sync.c assigns the following attributes to the file sync.c: course=csc2404, year=2007, semester=2, studentId=1234, assignmentNum=1, filename=sync, filetype=Csource. The ability to search based on attributes is limited as these attributes are stored hierarchically, and accessed via a path specification. It is a simple matter to build a search query that specifies all attributes in a file's path; this will yield all files in a directory. In our example, it is easy to locate all student assignment submissions for a particular offer of a course. However, a query that seeks to find all submissions for a particular student in a given semester is not supported.

To further explore these problems, consider the following common scenario. Bill has a multitude of music and image files on his personal computer and wants to organise his collection such that he can find and relate files easily. Using a traditional folder approach leads to various problems. The multimedia files can be placed in folders named according to several properties such as genre, year, band name, and location of photo. As discussed above, using hierarchical folders means that Bill loses the ability to search for files from different perspectives. He could populate the folders with soft links (or shortcuts) to the actual music files, but this would create an unacceptable burden of managing such links. Bill has installed third-party applications such as Google Picasa (for his image files) and RealPlayer (for his music files). These applications manage the organisation of files into groups based on the value of a property like "genre", which addresses the shortcoming of the folder approach. However, they offer no solution if Bill wishes to link an image file to a music file or if he wants to add his own metadata fields to a file.

This scenario demonstrates that organising multimedia using a traditional hierarchical file system, even when enhanced with specific applications, often proves to be impractical. The problem is not limited to multimedia, as every type of file can have a large collection of metadata associated with it which can be used to organise the file space.

Problem Statement. Simply stated, the first problem we address is that users must be able to manage files such that they can be located effectively at some future time. We need to be able to search for a file using multiple pathways (or search criteria). For example, we may use keywords that have been automatically extracted from the file, or attribute values (assigned by system or user), or links to related files, to seek the target file. A design for a metadata file system must include both the metadata storage model and appropriate user interfaces to allow a user to easily locate a file based on its metadata.

Critically, the second problem that we address is that a successful metadata file system must feature a user interface that allows users to easily assign meaningful and rich metadata. Requiring the user to create every piece of metadata through keyboard entry will almost certainly impede the adoption of such potentially revolutionary systems.

Existing Work. Recently, the advent of social networking websites that let users share images (e.g. Flickr) and video (e.g. YouTube) has demonstrated novel ways of organising multimedia. Such applications use the simple concept of tags to let users assign metadata to their files, and allow others to search for files easily. On the users' own computers, more advanced applications such as Picasa and Google Desktop offer automated collection of metadata and use localised databases to store metadata and use it in search. Solutions proposed by researchers in the past decade took a more comprehensive approach by extending file systems with metadata functionality. On the commercial side, Microsoft is attempting¹ to implement a metadata file system called WinFS. We review these efforts in Section 2.

Contribution. It is clear that various approaches to create, manage, and use metadata for files are being considered and developed, and that there is no single solution currently available that has wide adoption or satisfactorily solves all issues. In this paper we review a wide range of existing proposals to add metadata to files and make that metadata available for searching. We then propose a taxonomy for metadata file systems based on the reviewed prototypes. We introduce a data model for a database-oriented pure mdfs complete with operations and semantics. We explore a number of interesting and non-trivial issues that must be solved before a full-scale pure mdfs can be implemented. We also discuss two prototype implementations of our model and outline user interface interactions to capture rich metadata.

As evidenced by the fact that a major software company has not been able to deliver one after many years of work, it is clear that creating a truly useful and powerful mdfs is a daunting task. The problems are likely not only technical, but also of a more human nature (complexity for users, compatibility issues for businesses, etc). We therefore present our work as a modest step and as a basis for future extensions.

Note that the work presented in this paper is, within the context of computer science, of a highly multidisciplinary nature, drawing on results from [...]

[...] associated with a file. This can only be done by altering the file itself, which will result in re-indexation. Windows Desktop Search is a similar system, based on the research prototype Stuff I've Seen (SIS) [7].

MIT Semantic File System. The MIT Semantic File System [11] is one of the first file systems to address the shortcomings of traditional tree-structured file systems. The main aim of the MIT Semantic File System (SFS) is to allow users to access files based on file content, as well as accessing files by name.

MIT SFS is designed to be integrated into a tree-structured file system and it does so through the concept of virtual directories. Each virtual directory is interpreted as a query and contains symbolic links to the actual files stored in the underlying file system. In order for SFS to provide file access based on file content (to make use of virtual directories as queries) the content of a file needs to be extracted. SFS does this by associating each file type with a transducer program that will extract the relevant metadata from files in the system. Each file type will have a specific transducer, and each transducer will be specifically designed to extract desired attributes and values from a file type. For example, a transducer for an email file may extract attributes "To", "From" and "Subject". MIT SFS comes with a set of default transducers that can handle the most common file types, but users are also able to implement their own transducers. A transducer table is used to determine which transducer to use for a certain file type.

Gifford et al. [11] outline some of the shortcomings of MIT SFS. The first point mentioned is that of the query language that each virtual directory can be associated with. MIT SFS offers only a basic query language that prohibits users from using boolean operators (such as 'OR', 'AND', etc.) to specify their queries. Users are also unable to assign metadata to files manually.