Multi-Process Systems: • Information Marshalling and File System Unmarshalling • Block Management • Directory Implementation

Total Page:16

File Type:pdf, Size:1020Kb

Multi-Process Systems: • Information Marshalling and File System Unmarshalling • Block Management • Directory Implementation What we have discussed before • The file abstraction • File manager and file descriptors Multi-Process Systems: • Information marshalling and File system unmarshalling • Block management • Directory implementation Operating Systems and Distributed Systems Operating Systems and Distributed Systems 2 What we have discussed before What we have discussed before Fid = open("MyInput.txt", O_RDONLY)) numRead = read(Fid, buf, BUF_LEN)) Operating Systems and Distributed Systems Operating Systems and Distributed Systems The long-term information What we will learn storage problem • Must store large amounts of data • Files – Gigabytes -> terabytes -> petabytes • Directories & naming • Stored information must survive the • File system implementation termination of the process using it • Example file systems – Lifetime can be seconds to years – Must have some way of finding it! • Multiple processes must be able to access the information concurrently Operating Systems and Distributed Systems 5 Operating Systems and6 Distributed Systems What is a file system? The Linux file system • An organization of data and metadata on a storage device • The Linux file system architecture is an interesting example of abstracting complexity. • Another way to think about a file system is as a • Using a common set of API functions, a large variety of file protocol. systems can be supported on a large variety of storage devices. – Just as network protocols (such as IP) give meaning to the streams of data traversing the Internet, file systems • Example: the read function call, which allows some number of give meaning to the data on a particular storage medium. bytes to be read from a given file descriptor. • There are many types of file systems and media. – unaware of file system types, such as ext3 or NFS. – unaware of the particular storage medium upon which the file – With all of this variation, the file system interface is system is mounted implemented as a layered architecture, separating • AT Attachment Packet Interface (ATAPI) disk, • the user interface layer • Serial-Attached SCSI (SAS) disk, • the file system implementation • Serial Advanced Technology Attachment (SATA) disk. • the drivers that manipulate the storage devices – when the read function is called for an open file, the data is returned as expected. Operating Systems and Distributed Systems 7 Operating Systems and Distributed Systems High-level architecture High-level architecture • User space contains the applications (for this example, the user of the file system) and the GNU C Library (glibc) which provides the user interface for the file system calls (open, read, write, close). – The SCI acts as a switch, funneling system calls from user space to the appropriate endpoints in kernel space • The VFS is the primary interface to the underlying file systems: – exports a set of interfaces and then abstracts them to the individual file systems, which may behave very differently from one another. • Two caches exist for file system objects (inodes and dentries), which I'll define shortly. Each provides a pool of recently-used file system objects. Operating Systems and Distributed Systems 9 Operating Systems and10 Distributed Systems The external view: Naming files Typical file extensions Extension Meaning • Important to be able to find files after they’re created File.txt Text file • Every file has at least one name File.c C source program • Name can be – Human-accessible: “foo.c”, “my photo”, “Go Slugs!” File.jpg JPEG-encoded picture – Machine-usable: 1138, 8675309 File.tgz Compressed tar file • Case may or may not matter File.html HTML file – Depends on the file system • Name may include information about the file’s contents File~ Emacs-created backup – Certainly does for the user (the name should make it easy to File.tex Input for TeX figure out what’s in it!) – Computer may use part of the name to determine the file type File.mp3 Audio encoded in MPEG layer 3 File.pdf Adobe Portable Document Format File.ps Postscript file Operating Systems and11 Distributed Systems Operating Systems and12 Distributed Systems File structures Internal file structure Magic number Header 1 byte 1 record Text size Module Data size name BSS size Date Header Symbol table size Object module 12A 101 111 Owner Entry point Protection Flags Size Text Header sab wm cm avg ejw sab elm ddel Object module Data Header Relocation bits Sequence Sequence Object module of bytes of records F01 W03 F03 Symbol table Tree Executable file Archive file Operating Systems and Distributed Systems 13 Operating Systems and Distributed Systems 14 Accessing a file File attributes (possibilities) • Sequential access Attribute Meaning – Read all bytes/records from the beginning Protection Who can access the file and how Password Password needed to access the file – Cannot jump around Creator ID of the person who created the file Owner Current owner of the file • May rewind or back up, however Read-only flag Indicates whether file may be modified – Convenient when medium was magnetic tape Hidden flag Indicates whether file should be displayed in listings System flag Is this an OS file? – Often useful when whole file is needed Archived flag Has this file been backed up? ASCII/binary flag Is this file ASCII or binary? • Random access Random access flag Is random access allowed on this file? Temporary flag Is this a temporary file? – Bytes (or records) read in any order Lock flags Used for locking a file to get exclusive access Record length Number of bytes in a record – Essential for database systems Key position / length Location & length of the key Creation time When was the file created? – Read can be … Last access time When was the file last accessed? • Move file marker (seek), then read or … Last modified time When was the file last modified? Current size Number of bytes in the file • Read and then move file marker Maximum size Number of bytes the file may grow to Operating Systems and Distributed Systems 15 Operating Systems and16 Distributed Systems File operations Directories • Create: make a new file • Append: like write, but only • Naming is nice, but limited at the end of the file • Delete: remove an existing • Humans like to group things together for file • Seek: move the “current” pointer elsewhere in the convenience • Open: prepare a file to be file accessed • File systems allow this to be done with • Get attributes: retrieve directories (sometimes called folders) • Close: indicate that a file is attribute information no longer being accessed • Set attributes: modify • Grouping makes it easier to • Read: get data from a file attribute information – Find files in the first place: remember the enclosing • Write: put data to a file • Rename: change a file’s directories for the file name – Locate related files (or just determine which files are related) Operating Systems and Distributed Systems 17 Operating Systems and18 Distributed Systems What’s in a directory? Directory structure • Structure • Two types of information – Linear list of files (often itself stored in a file) – File names • Simple to program – File metadata (size, timestamps, etc.) • Slow to run • Basic choices for directory information • Increase speed by keeping it sorted (insertions are slower!) – Store all information in directory – Hash table: name hashed and looked up in file • Fixed size entries • Decreases search time: no linear searches! • Disk addresses and attributes in directory entry • May be difficult to expand – Store names & pointers to index nodes (i-nodes) • Can result in collisions (two files hash to same location) attributes – Tree games attributes games • Fast for searching mail attributes mail attributes • Easy to expand news attributes news attributes research attributes research • Difficult to do in on-disk directory • Name length Storing all information Using pointers to attributes in the directory index nodes – Fixed: easy to program – Variable: more flexible, better for users Operating Systems and19 Distributed Systems Operating Systems and20 Distributed Systems Linux directories Hierarchical directory system • Linux supports Inode=3 Root directory – Plain text directories Rec_len=16 – Hashed directories Name_len=5 A B C – Plain text more common, but File_type=2 slower Users\0\0\0 • Directories just like regular A A A B B C C C foo Papers bar foo blah files Inode=251 Papers foo Photos • Name length limited to 255 Rec_len=12 bytes Name_len=2 A A A B B File_type=2 os.tex sunset Family foo.tex foo.ps Me\0\0 A A A sunset kids Mom Operating Systems and21 Distributed Systems Operating Systems and Distributed Systems 22 Unix directory tree Sharing files bin Root directory etc lib A B C usr tmp A A A B B C C C Papers foo Photos foo Photos bar foo blah elm A A A B os.tex sunset Family lake foo.c .cshrc /usr/elm/foo.c todo.txt A A ? sunset kids ??? Operating Systems and23 Distributed Systems Operating Systems and24 Distributed Systems Solution: use links Operations on directories • A creates a file, and inserts into her directory • Create: make a new • Readdir: read a directory • B shares the file by creating a link to it directory entry • A unlinks the file • Delete: remove a directory • Rename: change the name of a directory – B still links to the file (usually must be empty) • Similar to renaming a file • Opendir: open a directory – Owner is still A (unless B explicitly changes it) • Link: create a new entry in to allow searching it a directory to link to an A A B B • Closedir: close a directory existing file (done searching) b.tex b.tex • Unlink: remove an entry in a.tex a.tex a directory • Remove the file if this is the last link to this file Owner: A Owner: A Owner: A Count: 1 Count: 2 Count: 1 Operating Systems and25 Distributed Systems Operating Systems and26 Distributed Systems High-level architecture Virtual file system layer (Linux) • Each individual file system implementation, such as ext2, • The VFS acts as the root level of the file-system interface.
Recommended publications
  • Serverless Network File Systems
    Serverless Network File Systems Thomas E. Anderson, Michael D. Dahlin, Jeanna M. Neefe, David A. Patterson, Drew S. Roselli, and Randolph Y. Wang Computer Science Division University of California at Berkeley Abstract In this paper, we propose a new paradigm for network file system design, serverless network file systems. While traditional network file systems rely on a central server machine, a serverless system utilizes workstations cooperating as peers to provide all file system services. Any machine in the system can store, cache, or control any block of data. Our approach uses this location independence, in combination with fast local area networks, to provide better performance and scalability than traditional file systems. Further, because any machine in the system can assume the responsibilities of a failed component, our serverless design also provides high availability via redundant data storage. To demonstrate our approach, we have implemented a prototype serverless network file system called xFS. Preliminary performance measurements suggest that our architecture achieves its goal of scalability. For instance, in a 32-node xFS system with 32 active clients, each client receives nearly as much read or write throughput as it would see if it were the only active client. 1. Introduction A serverless network file system distributes storage, cache, and control over cooperating workstations. This approach contrasts with traditional file systems such as Netware [Majo94], NFS [Sand85], Andrew [Howa88], and Sprite [Nels88] where a central server machine stores all data and satisfies all client cache misses. Such a central server is both a performance and reliability bottleneck. A serverless system, on the other hand, distributes control processing and data storage to achieve scalable high performance, migrates the responsibilities of failed components to the remaining machines to provide high availability, and scales gracefully to simplify system management.
    [Show full text]
  • XFS: There and Back ...And There Again? Slide 1 of 38
    XFS: There and Back.... .... and There Again? Dave Chinner <[email protected]> <[email protected]> XFS: There and Back .... and There Again? Slide 1 of 38 Overview • Story Time • Serious Things • These Days • Shiny Things • Interesting Times XFS: There and Back .... and There Again? Slide 2 of 38 Story Time • Way back in the early '90s • Storage exceeding 32 bit capacities • 64 bit CPUs, large scale MP • Hundreds of disks in a single machine • XFS: There..... Slide 3 of 38 "x" is for Undefined xFS had to support: • Fast Crash Recovery • Large File Systems • Large, Sparse Files • Large, Contiguous Files • Large Directories • Large Numbers of Files • - Scalability in the XFS File System, 1995 http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html XFS: There..... Slide 4 of 38 The Early Years XFS: There..... Slide 5 of 38 The Early Years • Late 1994: First Release, Irix 5.3 • Mid 1996: Default FS, Irix 6.2 • Already at Version 4 • Attributes • Journalled Quotas • link counts > 64k • feature masks • • XFS: There..... Slide 6 of 38 The Early Years • • Allocation alignment to storage geometry (1997) • Unwritten extents (1998) • Version 2 directories (1999) • mkfs time configurable block size • Scalability to tens of millions of directory entries • • XFS: There..... Slide 7 of 38 What's that Linux Thing? • Feature development mostly stalled • Irix development focussed on CXFS • New team formed for Linux XFS port! • Encumberance review! • Linux was missing lots of bits XFS needed • Lot of work needed • • XFS: There and..... Slide 8 of 38 That Linux Thing? XFS: There and..... Slide 9 of 38 Light that fire! • 2000: SGI releases XFS under GPL • • 2001: First stable XFS release • • 2002: XFS merged into 2.5.36 • • JFS follows similar timeline • XFS: There and....
    [Show full text]
  • Comparing Filesystem Performance: Red Hat Enterprise Linux 6 Vs
    COMPARING FILE SYSTEM I/O PERFORMANCE: RED HAT ENTERPRISE LINUX 6 VS. MICROSOFT WINDOWS SERVER 2012 When choosing an operating system platform for your servers, you should know what I/O performance to expect from the operating system and file systems you select. In the Principled Technologies labs, using the IOzone file system benchmark, we compared the I/O performance of two operating systems and file system pairs, Red Hat Enterprise Linux 6 with ext4 and XFS file systems, and Microsoft Windows Server 2012 with NTFS and ReFS file systems. Our testing compared out-of-the-box configurations for each operating system, as well as tuned configurations optimized for better performance, to demonstrate how a few simple adjustments can elevate I/O performance of a file system. We found that file systems available with Red Hat Enterprise Linux 6 delivered better I/O performance than those shipped with Windows Server 2012, in both out-of- the-box and optimized configurations. With I/O performance playing such a critical role in most business applications, selecting the right file system and operating system combination is critical to help you achieve your hardware’s maximum potential. APRIL 2013 A PRINCIPLED TECHNOLOGIES TEST REPORT Commissioned by Red Hat, Inc. About file system and platform configurations While you can use IOzone to gauge disk performance, we concentrated on the file system performance of two operating systems (OSs): Red Hat Enterprise Linux 6, where we examined the ext4 and XFS file systems, and Microsoft Windows Server 2012 Datacenter Edition, where we examined NTFS and ReFS file systems.
    [Show full text]
  • Filesystem Considerations for Embedded Devices ELC2015 03/25/15
    Filesystem considerations for embedded devices ELC2015 03/25/15 Tristan Lelong Senior embedded software engineer Filesystem considerations ABSTRACT The goal of this presentation is to answer a question asked by several customers: which filesystem should you use within your embedded design’s eMMC/SDCard? These storage devices use a standard block interface, compatible with traditional filesystems, but constraints are not those of desktop PC environments. EXT2/3/4, BTRFS, F2FS are the first of many solutions which come to mind, but how do they all compare? Typical queries include performance, longevity, tools availability, support, and power loss robustness. This presentation will not dive into implementation details but will instead summarize provided answers with the help of various figures and meaningful test results. 2 TABLE OF CONTENTS 1. Introduction 2. Block devices 3. Available filesystems 4. Performances 5. Tools 6. Reliability 7. Conclusion Filesystem considerations ABOUT THE AUTHOR • Tristan Lelong • Embedded software engineer @ Adeneo Embedded • French, living in the Pacific northwest • Embedded software, free software, and Linux kernel enthusiast. 4 Introduction Filesystem considerations Introduction INTRODUCTION More and more embedded designs rely on smart memory chips rather than bare NAND or NOR. This presentation will start by describing: • Some context to help understand the differences between NAND and MMC • Some typical requirements found in embedded devices designs • Potential filesystems to use on MMC devices 6 Filesystem considerations Introduction INTRODUCTION Focus will then move to block filesystems. How they are supported, what feature do they advertise. To help understand how they compare, we will present some benchmarks and comparisons regarding: • Tools • Reliability • Performances 7 Block devices Filesystem considerations Block devices MMC, EMMC, SD CARD Vocabulary: • MMC: MultiMediaCard is a memory card unveiled in 1997 by SanDisk and Siemens based on NAND flash memory.
    [Show full text]
  • Netinfo 2009-06-11 Netinfo 2009-06-11
    Netinfo 2009-06-11 Netinfo 2009-06-11 Microsoft släppte 2009-06-09 tio uppdateringar som täpper till 31 stycken säkerhetshål i bland annat Windows, Internet Explorer, Word, Excel, Windows Search. 18 av buggfixarna är märkta som kritiska och elva av dem är märkta som viktiga, uppdateringarna finns för både servrar och arbetsstationer. Säkerhetsuppdateringarna finns tillgängliga på Windows Update. Den viktigaste säkerhetsuppdateringen av de som släpptes är den för Internet Explorer 8. Netinfo 2009-06-11 Security Updates available for Adobe Reader and Acrobat Release date: June 9, 2009 Affected software versions Adobe Reader 9.1.1 and earlier versions Adobe Acrobat Standard, Pro, and Pro Extended 9.1.1 and earlier versions Severity rating Adobe categorizes this as a critical update and recommends that users apply the update for their product installations. These vulnerabilities would cause the application to crash and could potentially allow an attacker to take control of the affected system. Netinfo 2009-06-11 SystemRescueCd Description: SystemRescueCd is a Linux system on a bootable CD-ROM for repairing your system and recovering your data after a crash. It aims to provide an easy way to carry out admin tasks on your computer, such as creating and editing the partitions of the hard disk. It contains a lot of system tools (parted, partimage, fstools, ...) and basic tools (editors, midnight commander, network tools). It is very easy to use: just boot the CDROM. The kernel supports most of the important file systems (ext2/ext3/ext4, reiserfs, reiser4, btrfs, xfs, jfs, vfat, ntfs, iso9660), as well as network filesystems (samba and nfs).
    [Show full text]
  • Zack's Kernel News
    KERNEL NEWS ZACK’S KERNEL NEWS ReiserFS Turmoil Their longer term plan, Alexander Multiport Card driver, again naming In light of recent events surrounding says, depends on what happens with himself the maintainer. Hans Reiser (http:// www. linux-maga- Hans. If Hans is released, the developers Jiri’s been submitting a number of zine. com/ issue/ 73/ Linux_World_News. intend to proceed as before. If he is not patches for these drivers, so it makes pdf), the question of how to continue released, Alexander’s best guess is that sense he would maintain them if he ReiserFS development came up on the the developers will try to appoint a wished; in any event, no other kernel linux-kernel mailing list. Alexander proxy to run Namesys. hacker has spoken up to claim the role. Lyamin from Hans’s Namesys company offered his take on the situation. He said Status of sysctl Filesystem Benchmarks that ReiserFS 3 has pretty much stabi- In keeping with Linus Torvalds’ recent Some early tests have indicated that ext4 lized into bugfix mode, though Suse assertions that it is never acceptable to is faster with disk writes than either ext3 folks had been adding new features like break user-space, Albert Cahalan volun- or Reiser4. There was general interest in ACL support. So ReiserFS 3 would go on teered to maintain the sysctl code if it these results, though the tests had some as before. couldn’t be removed. But Linus pointed problems (the tester thought delayed In terms of Reiser4, however, Alexan- out that really nothing actually used allocation was part of ext4, when that der said that he and the other Reiser de- sysctl (the implication being that it feature has not yet been merged into velopers were still addressing the techni- wouldn’t actually break anything to get Andrew Morton’s tree).
    [Show full text]
  • Compatibility Matrix for CTE Agent with Data Security Manager Release 7.0.0 Document Version 15 January 19, 2021 Contents
    Compatibility Matrix for CTE Agent with Data Security Manager Release 7.0.0 Document Version 15 January 19, 2021 Contents Rebranding Announcement 6 CTE Agent for Linux 6 Interoperability 6 Table 1: Linux Interoperability with Third Party Applications 6 ESG (Efficient Storage GuardPoint) Support 7 Table 2: Efficient Storage GuardPoint Support 7 Linux Agent Raw Device Support Matrix 8 Red Hat, CentOS, and OEL non-UEK 6.10 Raw Device Support 8 Table 3: Red Hat 6.10 | CentOS 6.10 | OEL non-UEK 6.10 (x86_64)3 8 Red Hat, CentOS, and OEL non-UEK 7.5-7.9 Raw Device Support 9 Table 4: Red Hat 7.5-7.9 | CentOS 7.5-7.8 | OEL non-UEK 7.5-7.8 (x86_64)1,2 9 Red Hat, CentOS, and OEL non-UEK 8 Raw Device Support 9 Table 5: Red Hat 8.0-8.2 | CentOS 8.0-8.2 | OEL non-UEK 8.0-8.2 (x86_64)1,2 9 SLES 12 Raw Device Support 10 Table 6: SLES 12 SP3, SLES 12 SP4, and SLES 12 SP5 (x86_64)2 10 SLES 15 Raw Device Support 10 Table 7: SLES 15, SLES 15 SP1, and SLES 15 SP2 (x86_64) 10 Redhat 6.10/7.5 ACFS Support with Secvm 11 Table 8: Oracle ACFS/Secvm support on Redhat 6.10/7.5 (x86_64) 11 Table 9: Oracle ACFS/Secvm Stack with Red Hat 6.10/7.5 (x86_64) 11 Linux Agent File System Support Matrix 12 Red Hat, CentOS, and OEL non-UEK 6. 10 File System Support 12 Table 10: Red Hat 6.10 | CentOS 6.10 | OEL non-UEK 6.10 (x86_64)1,3 12 LDT Feature for Red Hat, CentOS, and OEL non-UEK 6.10 File System Support 13 Table 11: Red Hat 6.10 | CentOS 6.10 | OEL non-UEK 6.10 (x86_64)1 13 Red Hat, CentOS, and OEL non-UEK 7.5 - 7.9 File System Support 14 Table 12: Red Hat 7.5-7.9 | CentOS
    [Show full text]
  • Xfs: a Wide Area Mass Storage File System
    xFS: A Wide Area Mass Storage File System Randolph Y. Wang and Thomas E. Anderson frywang,[email protected] erkeley.edu Computer Science Division University of California Berkeley, CA 94720 Abstract Scalability : The central le server mo del breaks down when there can b e: The current generation of le systems are inadequate thousands of clients, in facing the new technological challenges of wide area terabytes of total client caches for the servers networks and massive storage. xFS is a prototyp e le to keep track of, system we are developing to explore the issues brought billions of les, and ab out by these technological advances. xFS adapts p etabytes of total storage. many of the techniques used in the eld of high p er- Availability : As le systems are made more scalable, formance multipro cessor design. It organizes hosts into allowing larger groups of clients and servers to work a hierarchical structure so lo cality within clusters of together, it b ecomes more likely at any given time workstations can b e b etter exploited. By using an that some clients and servers will b e unable to com- invalidation-based write back cache coherence proto col, municate. xFS minimizes network usage. It exploits the le system Existing distributed le systems were originally de- naming structure to reduce cache coherence state. xFS signed for lo cal area networks and disks as the b ottom also integrates di erent storage technologies in a uni- layer of the storage hierarchy. They are inadequate in form manner. Due to its intelligent use of lo cal hosts facing the challenges of wide area networks and massive and lo cal storage, we exp ect xFS to achieve b etter p er- storage.
    [Show full text]
  • Scaling Source Control for the Next Generation of Game Development by Mike Sundy & Tobias Roberts Perforce User's Conference May 2007, Las Vegas
    Scaling Source Control for the Next Generation of Game Development by Mike Sundy & Tobias Roberts Perforce User's Conference May 2007, Las Vegas Introduction: This document intends to show performance differences between various Windows and Linux Perforce server configurations. About the Authors Tobias Roberts is the Studio IT Architect for the Electronic Arts Redwood Shores (EARS) and Sims Studios. He has been focused on server performance and studio architecture for the past nine years. E-mail: [email protected]. Mike Sundy is the Senior Perforce Administrator for the EARS/Sims Studios. He has over eleven years of version control administration experience and has specialized in Perforce for the past five years. E-mail: [email protected] or [email protected] Installation Metrics At Electronic Arts, we check in art binary data in addition to code. There are typically a much greater number of data assets compared to code. 90% of files we store in Perforce are binary, the other 10% are code. Some of our servers have upwards of 1200 users, 6.3 million files, 600,000 changelists, and a 80 GB db.have. Oldest server has 7 years of P4 history. EA is primarily a Windows-based development shop, with upwards of 90% of client desktops running Windows. EA has over 4,000 Perforce users and 90+ Perforce servers scattered across dozens of worldwide studios. Our largest server has 1.5 TB of data. We use 5 TB of Perforce RCS storage total at the EARS/Sims studios. The EARS/Sims studios have 10 Perforce servers and approximately 1,000 users.
    [Show full text]
  • A Transparently-Scalable Metadata Service for the Ursa Minor Storage System
    A Transparently-Scalable Metadata Service for the Ursa Minor Storage System Shafeeq Sinnamohideen† Raja R. Sambasivan† James Hendricks†∗ Likun Liu‡ Gregory R. Ganger† †Carnegie Mellon University ‡Tsinghua University Abstract takes to ensure scalability—the visible semantics must be consistent regardless of how the system chooses to The metadata service of the Ursa Minor distributed distribute metadata across servers. Several existing sys- storage system scales metadata throughput as metadata tems have demonstrated this design goal (e.g., [3, 6, 36]). servers are added. While doing so, it correctly han- dles operations that involve metadata served by differ- The requirement to be scalable implies that the sys- ent servers, consistently and atomically updating such tem will use multiple metadata nodes, with each storing metadata. Unlike previous systems, Ursa Minor does so some subset of the metadata and servicing some subset by reusing existing metadata migration functionality to of the metadata requests. In any system with multiple avoid complex distributed transaction protocols. It also servers, it is possible for the load across servers to be un- assigns object IDs to minimize the occurrence of multi- balanced; therefore, some mechanism for load balancing server operations. This approach allows Ursa Minor to is desired. This can be satisfied by migrating some of the implement a desired feature with less complexity than al- metadata from an overloaded server to a less loaded one, ternative methods and with minimal performance penalty thus relocating requests that pertain to that metadata. (under 1% in non-pathological cases). The requirement to be transparent implies that clients 1 Introduction see the same behavior regardless of how the system has distributed metadata across servers.
    [Show full text]
  • NOVA: the Fastest File System for Nvdimms
    NOVA: The Fastest File System for NVDIMMs Steven Swanson, UC San Diego XFS F2FS NILFS EXT4 BTRFS © 2017 SNIA Persistent Memory Summit. All Rights Reserved. Disk-based file systems are inadequate for NVMM Disk-based file systems cannot 1-Sector 1-Block N-Block 1-Sector 1-Block N-Block Atomicity overwrit overwrit overwrit append append append exploit NVMM performance e e e Ext4 ✓ ✗ ✗ ✗ ✗ ✗ wb Ext4 ✓ ✓ ✗ ✓ ✗ ✓ Performance optimization Order Ext4 ✓ ✓ ✓ ✓ ✗ ✓ compromises consistency on Dataj system failure [1] Btrfs ✓ ✓ ✓ ✓ ✗ ✓ xfs ✓ ✓ ✗ ✓ ✗ ✓ Reiserfs ✓ ✓ ✗ ✓ ✗ ✓ [1] Pillai et al, All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI '14. © 2017 SNIA Persistent Memory Summit. All Rights Reserved. BPFS SCMFS PMFS Aerie EXT4-DAX XFS-DAX NOVA M1FS © 2017 SNIA Persistent Memory Summit. All Rights Reserved. Previous Prototype NVMM file systems are not strongly consistent DAX does not provide data ATomic Atomicity Metadata Data Snapshot atomicity guarantee Mmap [1] So programming is more BPFS ✓ ✓ [2] ✗ ✗ difficult PMFS ✓ ✗ ✗ ✗ Ext4-DAX ✓ ✗ ✗ ✗ Xfs-DAX ✓ ✗ ✗ ✗ SCMFS ✗ ✗ ✗ ✗ Aerie ✓ ✗ ✗ ✗ © 2017 SNIA Persistent Memory Summit. All Rights Reserved. Ext4-DAX and xfs-DAX shortcomings No data atomicity support Single journal shared by all the transactions (JBD2- based) Poor performance Development teams are (rightly) “disk first”. © 2017 SNIA Persistent Memory Summit. All Rights Reserved. NOVA provides strong atomicity guarantees 1-Sector 1-Sector 1-Block 1-Block N-Block N-Block Atomicity Atomicity Metadata Data Mmap overwrite append overwrite append overwrite append Ext4 ✓ ✗ ✗ ✗ ✗ ✗ BPFS ✓ ✓ ✗ wb Ext4 ✓ ✓ ✗ ✓ ✗ ✓ PMFS ✓ ✗ ✗ Order Ext4 ✓ ✓ ✓ ✓ ✗ ✓ Ext4-DAX ✓ ✗ ✗ Dataj Btrfs ✓ ✓ ✓ ✓ ✗ ✓ Xfs-DAX ✓ ✗ ✗ xfs ✓ ✓ ✗ ✓ ✗ ✓ SCMFS ✗ ✗ ✗ Reiserfs ✓ ✓ ✗ ✓ ✗ ✓ Aerie ✓ ✗ ✗ © 2017 SNIA Persistent Memory Summit. All Rights Reserved.
    [Show full text]
  • Red Hat Ceph* Storage and Intel®
    RED HAT CEPH STORAGE AND INTEL CACHE ACCELERATION SOFTWARE Accelerating object storage with the Intel SSD Data Center family SOLUTION OVERVIEW INTRODUCTION To manage massive data growth, organizations are increasingly choosing object storage solutions, allowing them to scale storage flexibly while controlling costs. Ceph is a popular solution, letting organizations deploy cost-effective industry-standard hardware as a part of proven software- defined storage infrastructure. With this shift, the storage media itself has become a key consider- Achieve flash-accelerated object ation. Traditional hard disk drives (HDDs) are affordable, but often lack the desired input/output (I/O) storage performance at lower performance for demanding workloads, such as storing large numbers of objects. Proprietary all- costs than proprietary flash arrays offer performance, but can be cost-prohibitive for large-scale deployments. all-flash array solutions. Red Hat® Ceph Storage combined with Intel® Solid State Drives (SSDs) Data Center family and Intel® Cache Acceleration Software (CAS) has emerged as a compelling option. Organizations can use Intel Use the Intel SSD Data Center CAS to selectively classify key portions of a given I/O workload for acceleration with the high-per- family and Intel CAS to intelli- formance and low latency of flash storage. The performance difference can be remarkable. When gently prioritize I/O for caching. testing large object-count storage workloads, Red Hat saw performance improvements of up to 400% for small-object writes when using Intel SSDs and Intel CAS.1 The solution is also cost-effec- Confidently deploy software- tive, achieving strong performance results with only 2-4 SSDs per system.
    [Show full text]