
PennFS: A File System on Relational Database
Dept. of CIS - Senior Design 2012-2013

Jin You, Di Mu, Hongda Ma
Univ. of Pennsylvania, Philadelphia, PA

Boon Thau Loo
Univ. of Pennsylvania, Philadelphia, PA

ABSTRACT

PennFS, a Windows 8 Store Application, aims to visualize and simplify the user experience of file management by implementing an alternative file system that is better indexed and organized through a relational database. The traditional hierarchical file system has several limitations: in particular, when files share multiple relational properties, one file may relate to multiple folders, and the primary search method, keyword search, is not universally efficient. To resolve this problem, a tagging system is implemented using database design principles to index the file system. In PennFS, a new application layer based on a relational database and Language-Integrated Query (LINQ) is introduced to coordinate with a user interface built with C#/Extensible Application Markup Language (XAML) [9], a local file system accessed using the Windows Software Development Kit (SDK) for Windows 8 [10], and an email system accessed using the Limilab application programming interface (API) [6].

1. INTRODUCTION

In almost every mainstream operating system, a hierarchical file system with a tree structure, together with extra utilities for keyword search, is used to store files and documents. This structure has remained unchanged for many years while daily computer usage has soared and the complexity of file relationships has increased, so new ways to facilitate file searches would be very helpful.

Several problems are innate to the traditional hierarchical file system, which is based on the tree data structure. First, since files are always treated as leaves of a tree, users have to build increasingly complicated tree structures to classify data manually as the number of files grows, which extends the time spent on file management. Secondly, the more relationships the files have, the deeper and more complex the tree becomes. In many cases, since files are hidden deep in the tree structure, searching for them becomes very inefficient. Thirdly, files sharing common features may be scattered across many places in the tree, so users who want several files at the same time have to search many locations to collect them all. Fourthly, users have to save duplicate files under different folders if they want to organize files according to multiple relationships. Finally, the current file system does not support management of email and of files in online collaboration systems.

The current file system has made many attempts to solve these problems. For example, to reduce the time spent descending through nested folders, it allows the user to create shortcuts to files and favorite folders on the desktop. It also accelerates file searching by allowing search by keyword. However, these solutions are still limited by the disadvantages of the tree structure.

A customized view based on a relational database is an ideal way to resolve this problem. By using a relational database to manage file information, the limitations of the tree structure in a hierarchical file system can be mitigated. After users attach multiple tags to a file, the relational database can use the tags as indexes to facilitate the search process. One advantage of this tagging system is that only one copy of a file is required to present it in multiple views. In addition, the database layer is able to translate and forward a user query to search for data in both the local file system and email. Finally, since the relational database stores only file metadata, it allows easier integration with other platforms.

To implement this idea, PennFS integrates a user interface built using Visual Studio 2012 with Blend [8], a local file access system using the Windows SDK for Windows 8, and a Gmail access system built with the Limilab API, with a database management system using LINQ. PennFS first builds a relational layer on top of the hierarchical file system and connects it to a database that includes Windows 8 file system metadata, so that files under the same tag, whether user-defined or system-recorded, can be searched and managed by LINQ queries. Secondly, metadata from online email services and online file storage platforms is recorded in the database. Examples include email metadata from Gmail. In addition to searching for files, PennFS supports searching for tags in the database. With this feature, users searching with keywords can avoid placing multiple queries in various products such as mailboxes and cloud storage.

2. RELATED WORK

This section discusses related work, including the origin of the idea of a relational file system; WinFS, an attempt to implement a database file system on the Windows platform; the tagging system in Gmail; and database connection.

2.1 Idea of Relational File System

The hierarchical file system is the most traditional and commonly used file system. With a tree-like data structure, it is convenient for users to navigate at the front end and easy for developers to implement at the back end. However, the hierarchical file system has its limits; it is too specific and restrictive in defining a file, since one file can belong to only one folder. Much theoretical and practical work has been done to improve file storage and management. Here we discuss the idea of building a file system based on relational database principles. At the 2005 File and Storage Technologies Conference, Dr. Jim Gray suggested that file systems should borrow ideas from databases [3]. Later, in a technical report published in 2006, he argued that, contrary to what previous folklore suggests, file system performance may be improved by using database techniques to handle small files [12].

2.2 Relational Filesystem on Windows

On the Windows platform, the project that most closely resembles our approach is WinFS, an advanced storage subsystem project. WinFS, introduced by Microsoft in 2003 but later cancelled, was intended to be a revolutionary file storage system that empowers users to search and manage files based on content [5]. It introduces the idea that everything is an item, and that each item has metadata properties described by a schema. Unlike the small projects for Unix mentioned above, WinFS intends to change the structure of file storage to allow a single item to exist in more than one WinFS folder. PennFS adopts some ideas of WinFS. First, the WinFS UI, unlike the traditional hierarchical folder view, is very friendly to users with complex projects because it embodies many data visualization ideas not traditionally found in hierarchical file systems, such as timelines and graphs. Also, searching in WinFS is a process of organizing the relationships across the data and connecting them together.

2.3 Tagging System: Gmail

The tagging system in Gmail, one of the most popular email services, provided by Google, is a simple and efficient way to organize messages. Two important features of Gmail's tagging system inspired us to build a similar system in our application. First, it does not limit the number or names of labels; users are able to classify a great number of emails into categories like work, family, to do, read later, and any other category they want. Secondly, while labels do all the work that folders do, users are able to add more than one label to one message. Thus, we decided to build a tagging system in our application to help the user classify and find files more conveniently. Especially in the searching process, the tags can be transformed into a search query directly; as a result, users with little database background are able to

LINQ to SQL maps the data model of a relational database to an object model expressed in the programming language of the developer, which is C# in our application. It translates the language-integrated queries in the object model into SQL and sends them to the database for execution at run time. LINQ to SQL also translates the results returned from the database back into objects, so that the developer can work with them in the development language.

3. SYSTEM MODEL

Figure 1: System Model Diagram

PennFS builds an alternative file system on top of the existing New Technology File System (NTFS) in the Windows operating system. As shown in Figure 1, it consists of five major components: user interface, API infrastructure, file systems, email, and relational database.

i) User Interface: The user interface, a Windows 8 Store Application, presents the file system view in a clean and user-friendly fashion. It allows users to create tags and assign them to files from the local file system and to emails. Users should also be able to configure the backend and set up email accounts. This module also takes user requests such as search and file access and sends them to the API infrastructure. After hearing back from the backend, the user interface visualizes the search results and file previews.

ii) API Infrastructure: The API infrastructure layer glues the other four components together. It takes requests from the user interface and, after properly translating each request, propagates the query to the rest of the system. It then collects the results and returns them to the user interface. Since this component does all the heavy lifting, the load on the user interface is largely reduced. In other words, only minimal modification of the user interface is needed when we extend our application to be compatible with new platforms such as a new online email folder or collaboration folder.