COMP520-12C Final Report

NomadFS
A block migrating distributed file system

Samuel Weston

This report is in partial fulfilment of the requirements for the degree of Bachelor of Computing and Mathematical Sciences with Honours (BCMS(Hons)) at The University of Waikato.

©2012 Samuel Weston

Abstract

A distributed file system is a file system that is spread across multiple machines. This report describes the block-based distributed file system NomadFS. NomadFS is designed for small scale distributed settings, such as those that exist in computer laboratories and cluster computers. It implements features, such as caching and block migration, which are aimed at improving the performance of shared data in such a setting.

This report includes a discussion of the design and implementation of NomadFS, including relevant background. It also includes performance measurements, such as scalability.

Acknowledgements

I would like to thank all the friendly members of the WAND network research group. This especially includes my supervisor Tony McGregor who has provided me with a massive amount of help over the year. Thanks!

On a personal level I have enjoyed developing NomadFS and have learnt a great deal as a consequence of this development. This learning includes improving my C programming ability, both in user space and kernel space (initially NomadFS was planned to be developed as a kernel space file system). I have also learnt a large amount about file systems and operating systems in general.

nomad /ˈnəʊmæd/ noun member of tribe roaming from place to place for pasture;

Contents

1 Introduction
2 Background
  2.1 A file system overview
    2.1.1 System calls
  2.2 Distributed systems
    2.2.1 Communication
    2.2.2 Synchronisation and Consistency
    2.2.3 Fault Tolerance
    2.2.4 Performance
    2.2.5 Scalability
    2.2.6 Transparency
  2.3 The Linux Virtual File System
  2.4 Filesystem in Userspace
  2.5 Summary
3 Goals
4 File System Survey
  4.1 Network File System (NFS)
  4.2 Gluster File System (GlusterFS)
  4.3 Google File System
  4.4 Zebra and RAID
  4.5 Summary
5 Design
  5.1 Overview
  5.2 Block interface
    5.2.1 Block-based approach
    5.2.2 Identification and locality
  5.3 File system structure
    5.3.1 Communication API
  5.4 Performance and Reliability
    5.4.1 Cache
    5.4.2 Synchronisation
    5.4.3 Block Mobility and Migration
    5.4.4 Block Allocation
    5.4.5 Prefetching
    5.4.6 Scalability
  5.5 Summary
6 Implementation
  6.1 Clients and Block Servers
    6.1.1 Client
    6.1.2 Block Server
    6.1.3 Locality and Client start up
  6.2 Communication
    6.2.1 Transport Protocol
    6.2.2 Messages
    6.2.3 Common Client and Server Communication
    6.2.4 Client Network Queue
    6.2.5 Overlapped IO
    6.2.6 Block server specific communication
  6.3 Synchronisation
    6.3.1 Distributed Synchronisation
    6.3.2 Internal Synchronisation
  6.4 Cache
    6.4.1 Cache coherency
  6.5 Block migration
  6.6 Aggressive Prefetching
  6.7 Issues and Challenges
  6.8 Summary
7 Evaluation
  7.1 Test Environment
  7.2 Migration
  7.3 Scalability
  7.4 Effect of block size on performance
  7.5 NFS Comparison
  7.6 IOZone
  7.7 Summary
8 Conclusions and Future Work
  8.1 Summary
  8.2 Conclusion
  8.3 Future Work
    8.3.1 Potential Extensions
  8.4 Final Words
Bibliography
A Performance analysis scripts
B NomadFS current quirks
C IOZone benchmark results
D Configuration file format for NomadFS
E NomadFS source code listing

List of Figures

2.1 A file system
2.2 File system layout on block abstraction (not to scale)
2.3 Inode structure including indirection blocks
2.4 A distributed file system
2.5 VFS flow example.
A user space write system call passes through the VFS and reaches the required file system write function. Adapted from Fig. 13.2 [11].
5.1 High Level Architecture
5.2 Client to server link
5.3 Block and inode identifier
5.4 Message passing
5.5 File based cache invalidation
6.1 Client Architecture
6.2 Message layout in NomadFS (Data Block not to scale)
6.3 Network queueing
6.4 Overlapped IO (Adapted from Figure 2.4 [12])
6.5 Synchronisation
6.6 Buffer Cache (Adapted from Fig. 5-20 [20])
6.7 Migration flow
7.1 Test Environment
7.2 Migration Performance
7.3 Scalability on file smaller than cache
7.4 Scalability on file larger than cache
7.5 Effect of block size on performance
7.6 IOZone Write
7.7 IOZone Random Write
7.8 IOZone Read
7.9 IOZone Random Read

Acronyms

API   Application Programming Interface.
FUSE  Filesystem in Userspace.
LFS   Log-Structured File System.
NFS   Network File System.
RAID  Redundant Array of Independent Disks.
VFS   Virtual File System.

Chapter 1 Introduction

Multiple computer systems, such as cluster computers and computer laboratories, generally have a large amount of aggregate storage, because each machine has its own 'small' hard disk drive. Rather than making use of the combined storage and performance capabilities of these 'small' disks, a common approach to shared data in these systems is to use a single centralised storage system. A distributed file system which can take advantage of these storage and performance capabilities would help to improve the usefulness of shared data in small scale distributed settings.

This report covers the design and implementation of NomadFS, a new, primarily block-based distributed file system for the Linux environment. NomadFS is aimed at meeting the needs of smaller scale distributed environments. From a user's standpoint, performance is important.
Because of this, NomadFS has built-in functionality which allows maximal use of each machine's local disk. This includes preferring the local disk when creating data and allowing data to migrate to the disks of the machines which use it the most. So that goals such as migration could be implemented and tested, common distributed file system functionality such as fault tolerance through replication was not deemed a priority in this research.

When approaching this problem there were a number of options available for implementing such a file system. Firstly, a decision was needed on whether the underlying architecture would operate on blocks or files. A block-based approach refers to the ability of the file system to operate directly on top of a block device, while a file-based approach means that the file system relies on some form of underlying file architecture. For reasons that are explained in Chapter 5, a block-based approach, with some file-based elements, was chosen for NomadFS.

Chapter 2 contains background file system information. This includes a background to file systems, block-based file systems and distributed systems. An understanding of these topics is required to fully understand this project. Chapter 3 contains the set of goals which NomadFS aimed to meet. Distributed file systems are not a new topic in Computer Science, so it is necessary to survey some related implementations. This file system survey can be found in Chapter 4.

The design and implementation of NomadFS are central to this project and are covered in Chapters 5 and 6. Chapter 5 gives an overview of the design of NomadFS and explains why these design decisions were made. Chapter 6 covers the specifics of how the various design elements were implemented in NomadFS. Chapter 7 contains a performance-oriented evaluation of NomadFS in its current state. Chapter 8 rounds off the report with conclusions and potential future work.

Chapter 2 Background

This chapter covers the background to this project. This includes an overview of file systems, and in particular block-based file systems, for the unfamiliar reader. Distributed systems, distributed file systems, and some of the issues they encounter are then covered. The chapter ends by covering the Linux Virtual File System (VFS) and Filesystem in Userspace (FUSE) in some depth. An understanding of these topics, especially the later ones, is important in the context of this project.

2.1 A file system overview

A file system is software that provides a means for users to store their data in a persistent manner. From the user's point of view this is generally seen as directories and files.

[Figure 2.1: A file system. Layers from top to bottom: end user; files and directories (/, /file1, /directory1/, /directory1/file2); file system; block device.]

In file system terminology, disks, or raw devices, are divided into equal-sized segments called blocks. File systems are then built on top of this block-based storage abstraction, which is typically provided by a block device driver that interfaces with a piece of hardware such as a hard disk drive (HDD).

For data to remain persistent, the file system must lay the data out on this series of blocks in an organised manner. Most Unix file systems do this by making use of superblocks, inodes, bitmap areas, and data blocks.
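
To make the layout described above concrete, the following is a minimal sketch in C of how a simple Unix-style file system might place its regions on a series of blocks. It is an illustration under stated assumptions, not the NomadFS on-disk format: the names fs_layout, make_layout, block_offset and inode_block, the region order, and the 128-byte inode size are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; not NomadFS's format. */
#define INODE_SIZE 128u /* assumed bytes per on-disk inode */

struct fs_layout {
    uint32_t block_size;         /* bytes per block */
    uint32_t total_blocks;       /* blocks on the device */
    uint32_t inode_count;        /* number of inodes */
    uint32_t inode_bitmap_start; /* first block of the inode bitmap */
    uint32_t block_bitmap_start; /* first block of the block bitmap */
    uint32_t inode_table_start;  /* first block of the inode table */
    uint32_t data_start;         /* first data block */
};

/* Number of blocks needed to hold `bytes` bytes, rounded up. */
uint32_t blocks_for(uint64_t bytes, uint32_t block_size)
{
    return (uint32_t)((bytes + block_size - 1) / block_size);
}

/* Place the regions back to back: superblock in block 0, then the
 * inode bitmap, the block bitmap, the inode table and the data blocks.
 * Each bitmap uses one bit per inode or per block. */
struct fs_layout make_layout(uint32_t block_size, uint32_t total_blocks,
                             uint32_t inode_count)
{
    struct fs_layout l;

    assert(block_size > 0);
    l.block_size = block_size;
    l.total_blocks = total_blocks;
    l.inode_count = inode_count;
    l.inode_bitmap_start = 1;
    l.block_bitmap_start = l.inode_bitmap_start +
        blocks_for(((uint64_t)inode_count + 7) / 8, block_size);
    l.inode_table_start = l.block_bitmap_start +
        blocks_for(((uint64_t)total_blocks + 7) / 8, block_size);
    l.data_start = l.inode_table_start +
        blocks_for((uint64_t)inode_count * INODE_SIZE, block_size);
    /* The metadata must leave room for at least one data block. */
    assert(l.data_start < total_blocks);
    return l;
}

/* Byte offset of a block on the underlying block device. */
uint64_t block_offset(const struct fs_layout *l, uint32_t block_no)
{
    assert(block_no < l->total_blocks);
    return (uint64_t)block_no * l->block_size;
}

/* Block of the inode table that holds inode number `ino`. */
uint32_t inode_block(const struct fs_layout *l, uint32_t ino)
{
    assert(ino < l->inode_count);
    return l->inode_table_start +
        (uint32_t)(((uint64_t)ino * INODE_SIZE) / l->block_size);
}
```

For example, under these assumptions, calling make_layout(4096, 32768, 1024) places the inode table at block 3 and the first data block at block 35, since 1024 inodes of 128 bytes each occupy 32 blocks of 4096 bytes.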