A Reliable and Portable Multimedia File System
Total Page:16
File Type:pdf, Size:1020Kb
A Reliable and Portable Multimedia File System Joo-Young Hwang, Jae-Kyoung Bae, Alexander Kirnasov, Min-Sung Jang, Ha-Yeong Kim Samsung Electronics, Suwon, Korea {jooyoung.hwang, jaekyoung.bae, a78.kirnasov}@samsung.com {minsung.jang, hayeong.kim}@samsung.com Abstract being used for various CE devices from per- sonal video recorder (PVR) to hand held cam- corders, portable media players, and mobile In this paper we describe design and imple- phones. File systems for such devices have re- mentation of a database-assisted multimedia quirements for multimedia extensions, reliabil- file system, named as XPRESS (eXtendible ity, portability and maintainability. Portable Reliable Embedded Storage System). In XPRESS, conventional file system metadata Multimedia Extension Multimedia extensions like inodes, directories, and free space infor- required for CE devices are non-linear editing mation are handled by transactional database, and advanced file indexing. As CE devices which guarantees metadata consistency against are being capable of capturing and storing A/V various kinds of system failures. File sys- data, consumers want to personalize media data tem implementation and upgrade are made easy according to their preference. They make their because metadata scheme can be changed by own titles and shares with their friends via in- modifying database schema. Moreover, us- ternet. PVR users want to edit recorded streams ing well-defined database transaction program- to remove advertisement and uninterested por- ming interface, complex transactions like non- tions. Non-linear editing system had been only linear editing operations are developed easily. necessary in the studio to produce broadcast Since XPRESS runs in user level, it is portable contents but it will be necessary also for con- to various OSes. XPRESS shows streaming sumers. Multimedia file system for CE devices performance competitive to Linux XFS real- should support non-linear editing operations ef- time extension on Linux 2.6.12, which indi- ficiently. File system should support file index- cates the file system architecture can provide ing by content-aware attributes, for example the performance, maintainability, and reliability al- director and actor/actress of a movie clip. together. Reliability On occurrence of system fail- ures(e.g. power failures, reset, and bad blocks), 1 Introduction file system for CE devices should be recovered to a consistent state. Implementation of a reli- able file system from scratch or API extension Previously consumer electronics (CE) devices to existing file system are difficult and requires didn’t use disk drives, but these days disks are long stabilization effort. In case of appending 58 • A Reliable and Portable Multimedia File System multimedia API extension (e.g. non-linear edit data consistency. Metadata consistency is to operations), reliability can be a main concern. support transactional metadata operations with ACID (atomicity, consistency, isolation, dura- Portability Conventional file systems have bility) semantics or a subset of ACID. Typical dependency on underlying operating system. file system metadata consist of inodes, directo- Since CE devices manufacturers usually use ries, disk’s free space information, and free in- various operating systems, file system should odes information. Data consistency means sup- be easily ported to various OSes. porting ACID semantics for data transactions as well. If a data transaction to update a portion Maintainability Consumer devices are diverse of file is aborted due to some reasons, the data and the requirements for file system are slightly update is complete or data update is discarded different from device to device. Moreover, file at all. system requirements are time-varying. Main- tainability is desired to reduce cost of file sys- There have been several approaches of imple- tem upgrade and customization. menting file system consistency; log structured file system [11] or journaling file system [2]. In this paper, we propose a multi-media file sys- In log structured file system, all operations are tem architecture to provide all the above re- logged to disk drive. File system is structured quirements. A research prototype named as as logs of consequent file system update op- “XPRESS”(eXtensible Portable Reliable Em- erations. Journaling is to store operations on bedded Storage System) is implemented on separate journal space before updating the file Linux. XPRESS is a user-level database- system. Journaling file system writes twice (to assisted file system. It uses a transactional journal space and file system) while log struc- Berkeley database as XPRESS metadata store. tured file system does write once. However XPRESS supports zero-copy non-linear edit journaling approach is popular because it can (cut & paste) operations. XPRESS shows per- upgrade existing non-journaling file system to formance in terms of bandwidth and IO la- journaling file system without losing or rewrit- tencies which is competitive to Linux XFS. ing the existing contents. This paper is organized as follows. Archi- tecture of XPRESS is described in section 2. Implementation of log structured file system or XPRESS’s database schema is presented in sec- journaling is a time consuming task. There tion 3. Space allocation is detailed in section 4. have been a lot of researches and implemen- Section 5 gives experimental results and discus- tation of ACID transactions mechanism in sions. Related works are described in section 6. database technology. There are many stable We conclude and indicate future works in sec- open source databases or commercial databases tion 7. which provide transaction mechanism. So we decide to exploit databases’ transaction mech- anism in building our file system. Since we 2 Architecture aimed to design a file system with performance and reliability altogether, we decided not to save file contents in database. Streaming per- File system consistency is one of important file formance can be bottlenecked by database if system design issue because it affects overall file contents are stored in db. So, only meta- design complexity. File system consistency data is stored in database while file contents are can be classified into metadata consistency and stored to a partition. Placement of file contents 2006 Linux Symposium, Volume One • 59 on the data partition is guided by multimedia optimized allocation strategy. Storing file system metadata in database makes file system upgrades and customization much easier than conventional file systems. XPRESS handles metadata thru database API and is not responsible for disk layout of the metadata. File system developer has only to design high level database schema and use well defined database API to upgrade existing database. To upgrade conventional file systems, developer should handle details of physical layout of metadata and modify custom data structures. XPRESS is implemented in user level because user level implementation gives high portabil- ity and maintainability. File system source codes are not dependent on OS. Kernel level file systems should comply with OS specific Figure 1: Block Diagram of XPRESS File Sys- infrastructures. Linux kernel level file system tem compliant to VFS layer cannot be ported eas- ily to different RTOSes. There can be a over- head which is due to user level implementa- There are four modules handling file system tion. XPRESS should make system calls to metadata; superblock, directory, inode, and al- read/write block device file to access file con- loc modules. Directory module implements the tents. If file data is sparsely distributed, context name space using a lookup database (dir.db) switching happens for each extent. There was and path_lookup function. The function an approach to port existing user level database looks up the path of a file or a directory through to kernel level[9] and develop a kernel level recursive DB query for each token of the path database file system. It can be improved if us- divided separated by ’/’ character. The flow- ing Linux AIO efficiently. Current XPRESS ing is the simple example for the process of this does not use Linux AIO but has no significant function. Superblock module maintains free in- performance overhead. odes. Inode module maintains the logical-to- physical space mappings for files. Alloc mod- Figure 1 gives a block diagram of XPRESS file ule maintains free space. system. XPRESS is designed to be independent of database. DB Abstraction Layer (DBAL) is In terms of file IO, file module calls inode mod- located between metadata manager and Berke- ule to updates or queries extents.db and ley DB. DBAL defines basic set of interfaces reads or writes file contents. The block de- which modern databases usually have. SQL or vice file corresponding to the data partition (say complex data types are not used. XPRESS has “/dev/sda1”) is opened for read/write mode not much OS dependencies. OSAL of XPRESS at mount time and its file descriptor is saved has only wrappers for interfacing block device in environment descriptor, which is referred file which may be different across operating to by every application accessing the partition. systems. XPRESS does not implement caching but relies 60 • A Reliable and Portable Multimedia File System on Linux buffer cache. XPRESS supports both performance, transaction-protected data man- direct IO and cached IO. For cached IO, the agement services to applications. Berkeley DB block device file is opened without O_DIRECT provides a simple function-call API for data flag while opened with O_DIRECT in case of access and management. Berkeley DB is em- direct IO. bedded in its application. It is linked directly into the application and runs in the same ad- Support for using multiple partitions concur- dress space as the application. As a result, no rently, XPRESS manages mounted partitions inter-process communication, either over the information using environment descriptors. A network or between processes on the same ma- environment descriptor has information about a chine, is required for database operations. All partition and its mount point, which is used for database operations happen inside the library.