Decoupled Analytics for Shared Storage Systems

MixApart: Decoupled Analytics for Shared Storage Systems

Madalin Mihailescu†, Gokul Soundararajan†, Cristiana Amza
University of Toronto, NetApp, Inc.†

USENIX Association, 11th USENIX Conference on File and Storage Technologies (FAST ’13)

Abstract

Distributed file systems built for data analytics and enterprise storage systems have very different functionality requirements. For this reason, enabling analytics on enterprise data commonly introduces a separate analytics storage silo. This generates additional costs, and inefficiencies in data management, e.g., whenever data needs to be archived, copied, or migrated across silos.

MixApart uses an integrated data caching and scheduling solution to allow MapReduce computations to analyze data stored on enterprise storage systems. The front-end caching layer enables the local storage performance required by data analytics. The shared storage back-end simplifies data management.

We evaluate MixApart using a 100-core Amazon EC2 cluster with micro-benchmarks and production workload traces. Our evaluation shows that MixApart provides (i) up to 28% faster performance than the traditional ingest-then-compute workflows used in enterprise IT analytics, and (ii) comparable performance to an ideal Hadoop setup without data ingest, at similar cluster sizes.

1 Introduction

We design a novel method for enabling distributed data analytics to use data stored on enterprise storage. Distributed data analytics frameworks, such as MapReduce [12, 19] and Dryad [22], are utilized by organizations to analyze increasing volumes of information. Activities range from analyzing e-mails, log data, or transaction records, to executing clustering algorithms for customer segmentation. Using such data flow frameworks, data partitions are loaded from an underlying commodity distributed file system, passed through a series of operators executed in parallel on the compute nodes, and the results are written back to the file system [19, 22]. The file system component of these frameworks [13, 21] is a standalone storage system; the file system stores the data on local server drives, with redundancy (typically, 3-way replication) to provide fault tolerance and data availability.

While the benefits of these frameworks are well understood, it is not clear how to leverage them for performing analysis of enterprise data. Enterprise data, e.g., corporate documents, user home directories, critical log data, e-mails, need to be secured from tampering, protected from failures, and archived for long-term retention; high-value data demands the enterprise-level data management features provided by enterprise storage. Traditional analytics file systems [13], however, have been designed for low-value data, i.e., data that can be regenerated on a daily basis; low-value data do not need enterprise-level data management. Consequently, these file systems, while providing high throughput for parallel computations, lack many of the management features essential for the enterprise environment, e.g., support for standard protocols (NFS), storage efficiency mechanisms such as deduplication, strong data consistency, and data protection mechanisms for active and inactive data [26].

These disparities create an environment where two dedicated reliable storage systems (silos) are required for running analytics on enterprise data: a feature-rich enterprise silo that manages the total enterprise data and a high-throughput analytics silo storing enterprise data for analytics. This setup leads to a substantial upfront investment as well as complex workflows to manage data across different file systems. Enterprise analytics typically works as follows: (i) an application records data into the enterprise silo, (ii) an extraction workflow reads the data and ingests the data into the analytics silo, (iii) an analytics program is executed and the results of the analytics are written into the analytics file system, (iv) the results of the analytics are loaded back into the enterprise silo. The workflow repeats as new data arrives.

To address the above limitations, we design MixApart to facilitate scalable distributed processing of datasets stored in existing enterprise storage systems, thereby displacing the dedicated analytics silo. MixApart replaces the distributed file system component in current analytics frameworks with an end-to-end storage solution, XDFS, that consists of:

• a stateless analytics caching layer, built out of local server disks, co-located with the compute nodes for data analytics performance,
• a data transfer scheduler that schedules transfers of data from consolidated enterprise storage into the disk cache, just in time, in a controlled manner, and
• the enterprise shared storage system for data reliability and data management.

The design space for MixApart includes two main alternatives: (i) unified caching, by having the XDFS cache interposed on the access path of both the analytics and the enterprise workloads, (ii) dedicated caching, by having the XDFS cache serve analytics workloads only. Moreover, in terms of implementation, for each design option, we can either leverage the existing HDFS [13] codebase for developing our cache, or leverage existing file caching solutions, e.g., for NFS or AFS [8] file systems. With MixApart we explore the design of a dedicated cache with an implementation based on the existing Hadoop/HDFS codebase. Our main reasons are prototyping speed and fair benchmarking against Hadoop.

We deployed MixApart on a 100-core Amazon EC2 cluster, and evaluated its performance with micro-benchmarks and production workload traces from Facebook. In general, the results show that MixApart provides comparable performance to Hadoop, at similar compute scales. In addition, MixApart improves storage efficiency and simplifies data management. First, the stateless cache design removes the redundancy requirements in current analytics file systems, thus lowering storage demands. Moreover, with classic analytics deployments, ingest is usually run periodically as a way to synchronize two storage systems, independent of job demands. In contrast, our integrated solution provides a dynamic ingest of only the needed data. Second, MixApart eliminates cross-silo data workflows, by relying on enterprise storage for data management. The data in the cache is kept consistent with the associated data in shared storage, thus enabling data freshness for analytics, transparently, when the underlying enterprise data changes.

2 MixApart Use Cases

MixApart allows computations to analyze data stored on enterprise storage systems, while using a caching layer for performance. We describe scenarios where we expect this decoupled design to provide high functional value.

Analyzing data on enterprise storage: Companies can leverage their existing investments in enterprise storage and enable analytics incrementally. There exist many file-based data such as source-code repositories, e-mails, and log files; these files are generated by traditional applications but currently require a workflow to ingest the data into an analytics filesystem. MixApart allows a single storage back-end to manage and service data for both enterprise and analytics workloads. Data analytics, using the same filesystem namespace, can analyze enterprise data with no additional ingest workflows.

Cross-data center deployments: MixApart’s decoupled design also allows for independent scaling of compute and storage layers. This gives the flexibility of placing the analytics compute tier on cloud infrastructures, such as Amazon EC2 [3], while keeping the data on-premise. In this scenario, upfront hardware purchases are replaced by the pay-as-you-go cloud model. Cross-data center deployments benefit from efforts such as AWS Direct Connect [2] that enable high-bandwidth connections between private data centers and clouds.

3 MapReduce Workload Analysis

Data analytics frameworks process data by splitting a user-submitted job into several tasks that run in parallel. In the input data processing phase, e.g., the Map phase, tasks read data partitions from the underlying distributed file system, compute the intermediate results by running the computation specified in the task, and shuffle the output to the next set of operators, e.g., the Reduce phase. The bulk of the data processed is read in this initial phase.

Recent studies [10, 16] of production workloads deployed at Facebook, Microsoft® Bing, and Yahoo! make three key observations: (i) there is high data reuse across jobs, (ii) the input phases are, on average, CPU-intensive, and (iii) the I/O demands of jobs are predictable. Based on the above workload characteristics, we motivate our MixApart design decisions. We then show that, with typical I/O demands and data reuse rates, MixApart can sustain large compute cluster sizes, equivalent to those supported by current deployments using dedicated storage.

3.1 High Data Reuse across Jobs

Production workloads exhibit high data reuse across jobs with only 11%, 6%, and 7% of jobs from Facebook, Bing, and Yahoo!, respectively, reading a file once [10]. Using the job traces collected, Ananthanarayanan et al. estimate that in-memory cache hit rates of 60% are possible by allocating 32GB of memory on each machine, with an optimal cache replacement policy [10].

MixApart employs a distributed cache layer built from local server drives; this disk cache is two orders of magnitude larger than an in-memory cache. For example, Amazon EC2 [3] instances typically provide about 1TB

[Figure 1: Compute Cluster Size. Panels: (a) 1Gbps, (b) 10Gbps. x-axis: Data Reuse Ratio (0 to 0.99); y-axis: # of Tasks (1 to 100000); series: 0.5s, 20s, 40s.]
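
To make the link between data reuse and shared-storage traffic concrete, the following is a minimal sketch of a stateless cache placed in front of a shared storage back-end. It is an illustration only, not the MixApart or XDFS implementation: the class names `SharedStorage` and `StatelessCache`, the whole-file caching granularity, and the LRU eviction policy are assumptions made for this example, and the data transfer scheduler is not modeled.

```python
from collections import OrderedDict


class SharedStorage:
    """Hypothetical stand-in for the enterprise storage back-end."""

    def __init__(self, files):
        self.files = dict(files)      # path -> bytes
        self.bytes_served = 0         # traffic that reached shared storage

    def read(self, path):
        data = self.files[path]
        self.bytes_served += len(data)
        return data


class StatelessCache:
    """Whole-file LRU cache (assumed policy); losing it loses no data."""

    def __init__(self, backend, capacity_bytes):
        self.backend = backend
        self.capacity = capacity_bytes
        self.entries = OrderedDict()  # path -> bytes, least recently used first
        self.used = 0
        self.hits = 0
        self.misses = 0

    def read(self, path):
        if path in self.entries:
            self.hits += 1
            self.entries.move_to_end(path)   # mark as most recently used
            return self.entries[path]
        self.misses += 1
        data = self.backend.read(path)       # fetch on demand from shared storage
        self._admit(path, data)
        return data

    def _admit(self, path, data):
        if len(data) > self.capacity:
            return                           # too large to cache; served uncached
        while self.used + len(data) > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        self.entries[path] = data
        self.used += len(data)


def run_trace(cache, trace):
    """Replay a list of file reads and return the cache hit ratio."""
    for path in trace:
        cache.read(path)
    total = cache.hits + cache.misses
    return cache.hits / total if total else 0.0


if __name__ == "__main__":
    storage = SharedStorage({"/logs/a": b"x" * 40,
                             "/logs/b": b"y" * 40,
                             "/mail/c": b"z" * 40})
    cache = StatelessCache(storage, capacity_bytes=80)
    ratio = run_trace(cache, ["/logs/a", "/logs/b", "/logs/a",
                              "/mail/c", "/logs/a", "/logs/b"])
    print(cache.hits, cache.misses, storage.bytes_served, round(ratio, 3))
```

In this toy trace, every read that hits the cache is one that never reaches the shared storage system, which is why the reuse ratio of a workload bounds how much back-end bandwidth a given compute cluster needs. Because the cache holds only copies, it can be discarded at any time without data loss, in keeping with the stateless design described in the text.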