COSC 6397 Big Data Analytics Distributed File Systems

Edgar Gabriel, Spring 2015

What is a file system
• A clearly defined method that the OS uses to store, catalog, and retrieve files
• Manages both the bits that make up the file itself and its metadata
• Metadata: “data about data”, e.g.
  – where the data is logically placed on the hard drive
  – file name
  – organizational hierarchies (e.g., directories)
  – last modification date
  – permissions (read, write, execute, etc.)

UNIX File Model - overview
• A file is a sequence of bytes
• When a program opens a file, the file system establishes a file pointer. The file pointer is an integer indicating the position in the file where the next byte will be read or written.
• Disk drives read and write data in fixed-size units (disk sectors)
• File systems allocate space in blocks; a block is a fixed number of contiguous disk sectors.
• In UNIX-based file systems, the blocks that hold a file's data are listed in an inode. An inode contains the information needed to find all the blocks that belong to a file.
• If a file is too large for the inode to hold the whole list of blocks, intermediate nodes (indirect blocks) are introduced.

Write operations
• Write:
  – the file system copies bytes from the user buffer into the system buffer
  – if the buffer fills up, the system sends the data to disk
• System buffering
  + allows the file system to collect full blocks of data before sending them to disk
  + the file system can send several blocks to the disk at once (delayed write or write-behind)
  - data is not really saved if the system crashes before the buffer is flushed
  - for very large write operations, the additional copy from the user buffer to the system buffer could/should be avoided

Read operations
• Read:
  – the file system determines which blocks contain the requested data
  – reads those blocks from disk into the system buffer
  – copies the data from the system buffer into user memory
• System buffering:
  + the file system always reads a full block (file caching)
  + if the application reads data sequentially, prefetching (read-ahead) can improve performance
  - prefetching hurts performance if the application has a random access pattern

Hiding disk latency: Caching and buffering
• Avoids repeated accesses to the same block
• Allows a file system to smooth out I/O behavior
• Helps to hide the latency of hard drives
• Lowers the performance of I/O operations for irregular access patterns
• Non-blocking I/O gives users control over prefetching and delayed writing
  – initiate read/write operations as early as possible
  – wait for the read/write operations to finish only when absolutely necessary

Journaling file systems
• Updating a file typically takes multiple steps.
  An interruption between the steps leaves the file system in an inconsistent state.
• Example: deleting a file
  – remove the directory entry
  – mark the inode and its blocks as free in the space map
• A journaling file system records the changes it is about to make in a journal before committing them to the main file system
  – entries are written to the journal before the file system is modified
• After a crash, the journal is replayed, and each entry either
  – succeeds: it can be completely replayed during recovery, or
  – is not replayed: the journal entry was never completed
  – journal entries often contain a per-entry checksum to detect corruption

Journaling file systems (II)
• Physical journal:
  – data and metadata are written to the journal before the file system is modified
  – large overhead: data is written twice
• Logical journal:
  – only metadata is written to the journal
  – modifications to data are written to the file system directly
  -> worst case: the data is garbage, but the directory structure and file structure are consistent
  -> trade-off between performance and reliability

Log-structured file systems
• Conventional file systems lay out files to optimize spatial locality
  – they make in-place changes to their data structures in order to perform well on magnetic disks (seeks are slow)
• Log-structured file systems treat storage as a circular buffer
  – writes always go to the head of the log
• Writes create multiple, chronologically advancing versions of both file data and metadata
  – can be used to make old file versions nameable and accessible (snapshotting)
• Recovery from crashes is simpler: upon its next mount, the file system can reconstruct its state from the last consistent point in the log
  – there is no need to walk all of its data structures

Distributed File Systems
• The generic term for a client/server file system where the data is not locally attached to a host.
• Clients, servers, and storage are dispersed across machines.
• Configuration and implementation may vary
• Clients should view a DFS the same way they would a centralized FS; the distribution is hidden at a lower level.
• Performance is measured in terms of throughput and response time.
Slide based on a lecture by Jerry Breecher: http://web.cs.wpi.edu/~jb/CS502/lectures/Section17-Dist_File_Sys.ppt

Distributed File Systems - Characteristics
• Naming: mapping between logical and physical objects
  – Example: a filename maps to <cylinder, sector>.
  – In a conventional file system, it is understood where the file actually resides; the system and disk are known.
  – In a transparent DFS, the location of a file, somewhere in the network, is hidden.
• Location transparency: the name of a file does not reveal any hint of the file's physical storage location.
• Location independence: the name of a file does not need to be changed when the file's physical storage location changes.
Slide based on a lecture by Jerry Breecher: http://web.cs.wpi.edu/~jb/CS502/lectures/Section17-Dist_File_Sys.ppt

Distributed File Systems - Characteristics
• Caching
  – Reduce network traffic by retaining recently accessed disk blocks in a cache, so that repeated accesses to the same information can be handled locally.
  – If the required data is not already cached, a copy of the data is brought from the server to the user.
  – Accesses are performed on the cached copy.
  – Each file has one master copy residing at the server machine,
  – while copies of (parts of) the file are scattered across different caches.
• Cache consistency problem: keeping the cached copies consistent with the master file.
Slide based on a lecture by Jerry Breecher: http://web.cs.wpi.edu/~jb/CS502/lectures/Section17-Dist_File_Sys.ppt

Distributed File Systems - Characteristics
• Typical steps for a read operation:
  – The client makes a request for file access.
  – The request is passed to the server in message format.
  – The server performs the file access.
  – Return messages bring the result back to the client.
• Cache location:
  – data can be kept in local memory or on the local disk
  – caching can be done on both the client and the server side
Slide based on a lecture by Jerry Breecher: http://web.cs.wpi.edu/~jb/CS502/lectures/Section17-Dist_File_Sys.ppt

Distributed File Systems - Characteristics
• Stateful: the server keeps track of information about client requests.
  – maintains which files are opened by a client
  – memory must be reclaimed when a client closes a file or when a client dies
  – good for performance: no need to parse the filename on each request, or to "open/close" the file on every request
  – bad for reliability: a stateful server loses everything on a crash
• Stateless: each client request provides the complete information needed by the server (e.g., filename, file offset).
  – the server may still cache information on behalf of the client, but it is not required to
  – a stateless server remembers nothing, so it can restart easily after a crash
Slide based on a lecture by Jerry Breecher: http://web.cs.wpi.edu/~jb/CS502/lectures/Section17-Dist_File_Sys.ppt

Example: NFS – The Network File System
• Protocol for a remote file service
• Stateless server (v3)
• Communication based on RPC (Remote Procedure Call)
• NFS provides session semantics
  – changes to an open file are initially only visible to the process that modified the file
• File locking is not part of the NFS protocol (v3), but is often available through a separate protocol/daemon
• Client caching is not part of the NFS protocol (v3)
  – implementation-dependent behavior
Image taken from a lecture by Jerry Breecher: http://web.cs.wpi.edu/~jb/CS502/lectures/Section17-Dist_File_Sys.ppt

Parallel File Systems
• Parallel file system: data blocks are striped across multiple storage devices on multiple storage servers.
• Support for parallel applications: all nodes can access the same files at the same time (concurrent read and write capabilities)
• Three relevant parameters:
  – stripe factor: number of disks
  – stripe size: size of each block
  – which disk contains the first block of the file
[Figure: file blocks 1, 2, 3, …, n striped across Disk 1, Disk 2, Disk 3, Disk 4]

Parallel File Systems: Conceptual overview
[Figure: compute nodes, a meta-data server, and storage servers 0 to 3]

Parallel File Systems - Concept
• Metadata server:
  – stores namespace metadata, such as filenames, directories, access permissions, and file layout
  – the metadata server is not necessarily involved in file I/O operations
• Distributed metadata server:
  – e.g., multiple metadata servers are available, each hosting a part of the namespace, selected by
    • a hash function on the file name, or
    • subtrees of the directory hierarchy
• Write operations:
  – require locking of the entire file or of a file block to ensure consistency
  – distributed locking protocols can be used

Example: Parallel Virtual File System
• Open source project from Clemson University
• Lightweight server daemon to provide simultaneous access to storage
• Each node in the cluster can be a server, a client, or both.
• Best suited for providing large, fast temporary storage.
• The basic PVFS2 package consists of three components: a server, a client, and a kernel module.
• Default stripe size: 64 kB
  – in practice, often changed to 1 MB
  – can be adjusted on a per-directory basis
Slides based on a talk by James W. Barker: http://www.slideshare.net/lystrata/survey-of-clusteredparallelfilesystems004lanlppt-10538039

Example: Parallel Virtual File System
• Stateless architecture
  – PVFS2 servers do not keep track of typical file system bookkeeping information such as which files have been opened, file positions, etc.
  – no shared lock state to manage
  – a server can fail and resume without disturbing the system as a whole
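The three striping parameters from the Parallel File Systems slide (stripe factor, stripe size, and the disk holding the first block) are enough to locate any byte of a file. The Python sketch below is a minimal illustration that assumes simple round-robin placement of blocks, which the slides do not state explicitly; the names `locate` and `split_request` are hypothetical, disks are numbered from 0, and real parallel file systems may use more elaborate layouts. The 64 kB value in the usage lines is the PVFS2 default stripe size mentioned on the PVFS slide.

```python
def locate(offset: int, stripe_size: int, stripe_factor: int,
           first_disk: int = 0) -> tuple[int, int]:
    """Map a logical file offset to (disk index, offset within that disk's part of the file).

    Assumes round-robin striping: block b of the file lives on disk
    (first_disk + b) % stripe_factor. Disks are numbered from 0.
    """
    if offset < 0 or stripe_size <= 0 or stripe_factor <= 0:
        raise ValueError("offset must be >= 0; stripe size and factor must be > 0")
    block = offset // stripe_size             # global block number of this byte
    within = offset % stripe_size             # position of the byte inside that block
    disk = (first_disk + block) % stripe_factor
    local_block = block // stripe_factor      # earlier blocks already stored on this disk
    return disk, local_block * stripe_size + within


def split_request(offset: int, length: int, stripe_size: int,
                  stripe_factor: int, first_disk: int = 0) -> list[tuple[int, int, int]]:
    """Split a contiguous request into per-disk pieces (disk, local offset, nbytes)."""
    pieces = []
    end = offset + length
    while offset < end:
        disk, local = locate(offset, stripe_size, stripe_factor, first_disk)
        # Stop at the block boundary: the next block usually lives on another disk.
        nbytes = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((disk, local, nbytes))
        offset += nbytes
    return pieces


if __name__ == "__main__":
    KB = 1024
    # 4 disks with 64 kB stripes
    print(locate(4 * 64 * KB + 10, 64 * KB, 4))       # (0, 65546)
    print(split_request(60000, 10000, 64 * KB, 4))    # [(0, 60000, 5536), (1, 0, 4464)]
```

The second call shows why striping helps parallel applications: a single 10000-byte request touches two disks, and each piece can be served by a different storage server concurrently.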
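The Parallel File Systems - Concept slide names two ways to split the namespace across several metadata servers: a hash function on the file name, or subtrees of the directory hierarchy. The sketch below contrasts the two under stated assumptions: SHA-1 is an arbitrary choice of hash, and the subtree table passed to `mds_by_subtree` is a hypothetical configuration, not taken from any particular file system.

```python
import hashlib


def mds_by_hash(path: str, num_servers: int) -> int:
    """Pick a metadata server by hashing the full file name.

    Uses SHA-1 rather than Python's built-in hash(), whose value for strings
    changes between interpreter runs, so every client computes the same answer.
    """
    digest = hashlib.sha1(path.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def mds_by_subtree(path: str, subtree_map: dict[str, int]) -> int:
    """Pick a metadata server by the longest matching directory prefix.

    subtree_map is a hypothetical table such as {"/": 0, "/home": 1}; it
    should contain "/" so that every absolute path has a match.
    """
    best_server, best_len = None, -1
    for prefix, server in subtree_map.items():
        p = prefix.rstrip("/")                # "/" becomes "", which matches every absolute path
        if (path == p or path.startswith(p + "/")) and len(p) > best_len:
            best_server, best_len = server, len(p)
    if best_server is None:
        raise KeyError(f"no subtree covers {path!r}")
    return best_server


if __name__ == "__main__":
    table = {"/": 0, "/home": 1, "/scratch": 2, "/scratch/proj": 3}
    print(mds_by_subtree("/scratch/proj/run1/out.dat", table))  # 3
    print(mds_by_subtree("/scratchpad/notes.txt", table))       # 0
    print(mds_by_hash("/home/alice/data.csv", 4))               # some server in 0..3
```

As a rough design note: hashing tends to spread files evenly across servers, but renaming a file can move its metadata to a different server; subtree partitioning keeps a directory's entries together, at the risk of one busy subtree overloading a single server.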
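The stateful/stateless distinction from the Distributed File Systems - Characteristics slides, which also underlies PVFS2's stateless architecture, can be made concrete with a toy model. In this sketch the classes `ReadRequest`, `StatelessServer`, and `StatefulServer` are hypothetical, an in-memory dict stands in for the server's disks, and `crash()` simply simulates losing the server's memory; no real NFS or PVFS2 API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    """Everything a stateless server needs: which file, where, and how much."""
    path: str
    offset: int
    count: int


class StatelessServer:
    def __init__(self, storage: dict[str, bytes]):
        self.storage = storage   # stands in for the server's disks; survives a crash

    def read(self, req: ReadRequest) -> bytes:
        # No open-file table: the request itself carries the file name and offset.
        data = self.storage[req.path]
        return data[req.offset:req.offset + req.count]


class StatefulServer:
    def __init__(self, storage: dict[str, bytes]):
        self.storage = storage
        self.open_files: dict[int, list] = {}   # handle -> [path, position], held in memory
        self.next_handle = 0

    def open(self, path: str) -> int:
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [path, 0]
        return handle

    def read(self, handle: int, count: int) -> bytes:
        path, pos = self.open_files[handle]           # KeyError if the server forgot the handle
        data = self.storage[path][pos:pos + count]
        self.open_files[handle][1] = pos + len(data)  # the server advances the file pointer
        return data

    def close(self, handle: int) -> None:
        del self.open_files[handle]                   # reclaim per-client memory

    def crash(self) -> None:
        # Simulate a restart: in-memory state is lost, stored data is not.
        self.open_files.clear()
```

After `crash()`, a client of the stateful server must notice the lost handle and reopen the file, while a client of the stateless server just resends the same `ReadRequest`, since it tracks the offset itself.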
