HMVFS: A Hybrid Memory Versioning File System


HMVFS: A Hybrid Memory Versioning File System
Shengan Zheng, Linpeng Huang, Hao Liu, Linzhu Wu, Jin Zha
Department of Computer Science and Engineering, Shanghai Jiao Tong University

Outline
• Introduction
• Design
• Implementation
• Evaluation
• Conclusion

Introduction
• Emerging Non-Volatile Memory (NVM)
  • Persistent like disk
  • Byte-addressable like DRAM
• Current file systems for NVM
  • PMFS, SCMFS, BPFS
  • Non-versioning: unable to recover old data
• Hardware and software errors
  • Large datasets and long execution times: a fault tolerance mechanism is needed
• Current versioning file systems
  • BTRFS, NILFS2
  • Not optimized for NVM

Design Goals
• Strong consistency
  • A Stratified File System Tree (SFST) represents a snapshot of the whole file system
  • Atomic snapshotting is ensured
• Fast recovery
  • Almost no redo or undo overhead during recovery
• High performance
  • Utilize the byte-addressability of NVM to update tree metadata at the granularity of bytes
  • Log-structured updates to files balance the endurance of NVM
  • Avoid write amplification
• User friendly
  • Snapshots are created automatically and transparently

Overview
• HMVFS is an NVM-friendly log-structured versioning file system
• Space-efficient file system snapshotting
• HMVFS decouples tree metadata from tree data
• High performance with a consistency guarantee
• POSIX compliant

Outline
• Introduction
• Design
• Implementation
• Evaluation
• Conclusion

On-Memory Layout
• DRAM: cache and journal
  • Node Address Tree Cache (NAT Cache), Segment Information Table Cache, Checkpoint Information Tree (CIT), and Journal
• NVM: a random write zone and a sequential write zone
  • Random write zone (auxiliary information): file system metadata and tree metadata, i.e. the Superblocks, the Block Information Table (BIT), and the Segment Information Table (SIT)
  • Sequential write zone (main area, SFST): file metadata and data plus tree data, i.e. the Node Address Tree (NAT), Checkpoint Blocks (CP), Node Blocks, and Data Blocks

Index Structure in Traditional Log-Structured File Systems
• Update propagation problem
  • Writing a data block out of place forces its direct node to be rewritten, which in turn forces every indirect node above it to be rewritten, all the way up to the inode
  [Figure: an inode with inline data, direct pointers, and single-, double-, and triple-indirect pointers; one updated data block dirties the whole chain of node blocks above it]

Index Structure Without Write Amplification
• Node Address Table
  • Maps each Node-ID to the current address of its node block
  • A rewritten node (e.g. node n moving from 0x42 to 0x73) only updates its table entry; parent blocks keep referring to the Node-ID and stay clean

Index Structure for Versioning
• Node Address Table extended with a version dimension
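The effect of the Node Address Table can be sketched in a few lines of code. The sketch below is illustrative Python, not HMVFS source: because parents reference children by Node-ID and the table resolves IDs to addresses, an out-of-place rewrite of a node updates one table entry instead of every ancestor block.

```python
# Illustrative sketch of a Node Address Table (NAT): one level of
# indirection from a stable node ID to the node's current address.
# Parents store node IDs, never raw addresses, so a log-structured
# out-of-place rewrite dirties only the table entry, not the parents.

class NodeAddressTable:
    def __init__(self):
        self.table = {}                 # node ID -> block address

    def lookup(self, node_id):
        return self.table[node_id]

    def update(self, node_id, new_addr):
        self.table[node_id] = new_addr

nat = NodeAddressTable()
n = 100                                 # a direct node's ID (made up)
nat.update(n, 0x42)                     # first written at address 0x42
inode_pointers = [n]                    # the inode records the ID only

nat.update(n, 0x73)                     # node rewritten out of place
# The inode block is untouched, yet it now resolves to the new copy.
assert nat.lookup(inode_pointers[0]) == 0x73
```

The versioned variant described in the slides adds a version dimension, which in this sketch would amount to keying the table by `(version, node_id)`; storing those per-version tables space-efficiently is the job of the Node Address Tree.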
  [Table: Node Address Table with versions; e.g. node n-1 is at 0x14 in version 1 and 0x38 in versions 2 and 3, node n is at 0x42 in versions 1 and 2 and 0x73 in version 3, node n+1 is at 0x24 in all three versions]

How to Store the Different Trees Space-Efficiently
• Node Address Tree (NAT)
  • A four-level B-tree that stores the multi-version Node Address Table space-efficiently
  • Adopts the idea of the CoW-friendly B-tree
  • NAT leaves contain NodeID-address pairs
  • The other NAT blocks contain pointers to lower-level blocks
  [Figure: a new snapshot gets a new NAT root plus copies of the NAT internal and NAT leaf blocks on the path to the changed entry; all other NAT blocks are shared with the original snapshot]

Stratified File System Tree (SFST)
• Four categories of blocks, one per layer:
  • Checkpoint layer (CP block)
  • Node Address Tree (NAT) layer: NAT root, NAT internal, and NAT leaf blocks
  • Node layer: inodes, direct nodes, and indirect nodes
  • Data layer: data blocks
• All SFST blocks are stored in the main area with log-structured writes
  • Balances the endurance of the NVM media
• Each SFST represents a valid snapshot of the file system
• Snapshots share overlapped blocks to achieve space efficiency

The Metadata of SFST
• Kept in the auxiliary information zone and updated with random writes
• Segment Information Table (SIT)
  • Contains the status information of every segment
• Block Information Table (BIT)
  • Keeps the information of every block
  • Updated precisely at a granularity of a few bytes
  • Contains: start and end version numbers, block type, node ID, and reference count

Garbage Collection in HMVFS
• Move all the valid blocks in the victim segment to the current segment
• When finished, update the SIT and create a snapshot
• Must handle the block sharing problem
  [Figure: a node block shared by the NAT blocks of versions 1 through 4; moving it to the current segment requires fixing every parent that points to it]
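The path-copying idea behind the CoW-friendly NAT, and behind block sharing across SFST snapshots generally, can be sketched with a toy tree. This is hypothetical Python, not the actual NAT layout: a new snapshot copies only the blocks on the path from the root to the modified leaf and shares everything else with the old snapshot.

```python
# Toy path-copying (CoW) tree: each snapshot is identified by its root.
# Updating one leaf copies only the root-to-leaf path; every other
# subtree is shared between the old and new snapshot, which is what
# makes frequent snapshots space-efficient.

class Block:
    def __init__(self, children=None, value=None):
        self.children = children or {}   # slot -> child Block
        self.value = value

def cow_update(root, path, value):
    """Return a new root; blocks off `path` are shared, not copied."""
    if not path:
        return Block(value=value)        # freshly written leaf
    slot, rest = path[0], path[1:]
    new_root = Block(children=dict(root.children))  # copy this block
    child = root.children.get(slot, Block())
    new_root.children[slot] = cow_update(child, rest, value)
    return new_root

# Build snapshot v1, then derive v2 by changing a single leaf.
v1 = cow_update(Block(), ["a", "x"], 1)
v1 = cow_update(v1, ["b", "y"], 2)
v2 = cow_update(v1, ["a", "x"], 99)

assert v1.children["a"].children["x"].value == 1   # old snapshot intact
assert v2.children["a"].children["x"].value == 99  # new snapshot updated
assert v2.children["b"] is v1.children["b"]        # untouched subtree shared
```

In HMVFS the same principle applies at four fixed levels (NAT root, internal, internal, leaf), so the number of copied blocks per snapshot is bounded by the tree height times the number of dirtied leaves.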
Outline
• Introduction
• Design
• Implementation
• Evaluation
• Conclusion

Block Information Table (BIT)
• Block sharing problem
  • If a new child block is written in the main area, the corresponding pointer in each parent block must be updated
• Node ID and block type
  • Used to locate the parent node:

    Type of the block       | Type of the parent | Node ID
    Checkpoint              | N/A                | N/A
    NAT internal, NAT leaf  | NAT internal       | Index code in NAT
    Inode, Indirect         | NAT leaf           | Node ID
    Direct, Data            | Inode or direct    | Node ID of parent node

• Start and end version number
  • The first and last versions in which the block is valid
  • Operations like write and delete set these two variables to the current version number
• Reference count
  • The number of parent nodes that link to the block
  • Updated with lazy reference counting: both file-level and snapshot-level operations update the reference count
  • When the count reaches zero, the block becomes garbage

Snapshot Creation
• Strong consistency is guaranteed
  • Flush dirty NAT entries from DRAM to form a new Node Address Tree, following a bottom-up procedure
  • Status information is stored in the checkpoint block
• Snapshots are space-efficient
• The atomicity of snapshot creation is ensured
  • An atomic update to the pointer in the superblock announces the validity of the new snapshot
  • A crash during snapshot creation is recovered by undo or redo, depending on the validity of the new snapshot

Snapshot Deletion
• Deletion starts from the checkpoint block
  • The checkpoint cache is stored in DRAM
• Follows a top-down procedure to decrease reference counts
  [Figure: deleting the original snapshot P drops P, F, and D to a reference count of zero, while shared blocks E and C drop from 2 to 1]
• Consistency is ensured by journaling
• Garbage collection is called afterwards
  • Many reference counts have dropped to zero by this point

Crash Recovery
• Mount the last completed snapshot (writable)
  • No additional recovery overhead
• Mount older snapshots (read-only)
  • Locate the checkpoint block of the snapshot
  • Retrieve files via its SFST
  [Figure: the superblock points to a list of checkpoints, each of which points to the NAT root of its snapshot]

Outline
• Introduction
• Design
• Implementation
• Evaluation
• Conclusion

Evaluation
• Experimental setup
  • A commodity server with 64 Intel Xeon 2GHz processors and 512GB DRAM
  • Performance comparison with PMFS, EXT4, BTRFS, NILFS2
• Postmark results with different read bias
  [Charts: transaction performance and snapshotting efficiency versus percentage of reads (0%-100%); the snapshotting efficiency of HMVFS is 2.7x and 2.3x higher than that of BTRFS and NILFS2]
• Filebench fileserver results with different numbers of files (2k-16k)
  [Charts: throughput and snapshotting efficiency; the snapshotting efficiency of HMVFS is 9.7x and 6.6x higher than that of BTRFS and NILFS2]
• Filebench varmail results with different directory depths
  [Charts: throughput and snapshotting efficiency; the snapshotting efficiency of HMVFS is 8.7x and 2.5x higher than that of BTRFS and NILFS2]

Outline
• Introduction
• Design
• Implementation
• Evaluation
• Conclusion

Conclusion
• HMVFS is the first file system to solve the consistency problem for NVM-based in-memory file systems using snapshotting.
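The top-down deletion with reference counting described in the slides can be sketched as a small graph walk. This is hypothetical Python, not HMVFS code: decrementing proceeds downward from the deleted checkpoint, but recursion stops at any block still referenced by another snapshot, so shared subtrees survive untouched.

```python
# Sketch of top-down snapshot deletion with reference counts.
# Each block records how many parents link to it; deleting a snapshot
# decrements the counts of the blocks its checkpoint reaches, and only
# recurses past blocks whose count hits zero.

refcount = {}     # block name -> number of parents linking to it
children = {}     # block name -> list of child block names

def link(parent, child):
    children.setdefault(parent, []).append(child)
    refcount[child] = refcount.get(child, 0) + 1

def delete_snapshot(checkpoint, garbage):
    refcount[checkpoint] = 0          # no snapshot references it anymore
    stack = [checkpoint]
    while stack:
        block = stack.pop()
        garbage.append(block)         # count is zero: reclaimable
        for child in children.get(block, []):
            refcount[child] -= 1
            if refcount[child] == 0:
                stack.append(child)   # recurse only into dead blocks

# Two snapshots P and Q sharing block B (names echo the slides' figure).
refcount["P"] = refcount["Q"] = 1
link("P", "A"); link("P", "B")        # A unique to P, B shared with Q
link("Q", "B"); link("Q", "C")        # C unique to Q

garbage = []
delete_snapshot("P", garbage)
assert set(garbage) == {"P", "A"}     # only P's unique blocks reclaimed
assert refcount["B"] == 1             # Q's link to B remains
```

The lazy part in HMVFS is that counts in the BIT are adjusted by file-level and snapshot-level operations rather than on every block write, which this sketch does not model.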