
DECAFS: A MODULAR DISTRIBUTED FILE SYSTEM TO FACILITATE

DISTRIBUTED SYSTEMS EDUCATION

A Thesis

presented to

the Faculty of California Polytechnic State University

San Luis Obispo

In Partial Fulfillment

of the Requirements for the Degree

Master of Science in Computer Science

by

Halli Meth

June 2014

© 2014 Halli Meth

ALL RIGHTS RESERVED

COMMITTEE MEMBERSHIP

TITLE: DecaFS: A Modular Distributed File System to Facilitate Distributed Systems Education

AUTHOR: Halli Meth

DATE SUBMITTED: June 2014

COMMITTEE CHAIR: Associate Professor Chris Lupo, Ph.D.,

Department of Computer Science

COMMITTEE MEMBER: Professor Alexander Dekhtyar, Ph.D.,

Department of Computer Science

COMMITTEE MEMBER: Associate Professor Aaron Keen, Ph.D.,

Department of Computer Science

COMMITTEE MEMBER: Associate Professor John Bellardo, Ph.D.,

Department of Computer Science

ABSTRACT

DecaFS: A Modular Distributed File System to Facilitate Distributed Systems Education

Halli Meth

Data quantity, speed requirements, reliability constraints, and other factors encourage industry developers to build distributed systems and use distributed services. Software engineers are therefore exposed to distributed systems and services daily in the workplace. However, distributed computing is hard to teach in Computer Science courses due to the complexity distribution brings to all problem spaces. This presents a gap in education where students may not fully understand the challenges introduced with distributed systems. Teaching students distributed concepts would help better prepare them for industry development work.

DecaFS, Distributed Educational Component Adaptable File System, is a modular distributed file system designed for educational use. The goal of the system is to teach distributed computing concepts to undergraduate and graduate level students by allowing them to develop small, digestible portions of the system. The system is broken up into layers, and each layer is broken up into modules so that students can build or modify different components in small, assignment-sized portions. Students can replace modules or entire layers by following the DecaFS APIs and recompiling the system. This allows the behavior of the DFS (Distributed File System) to change based on student implementation, while providing base functionality for students to work from.

Our implementation includes a code base of core DecaFS Modules that students can work from and basic implementations of non-core DecaFS Modules. Our basic non-core modules can be modified to implement more complex distribution techniques without modifying core modules. We have shown the feasibility of developing a modular DFS, while adhering to requirements such as configurable sizes (file, stripe, chunk) and support of multiple data replication strategies.

ACKNOWLEDGMENTS

Thanks to:

• Chris Lupo and Alex Dekhtyar for project inspiration, support and guidance.

• Jeffrey Forrester, #1 lab partner since 2010.

• Peter Faiman, compilation magician, git wizard, and data storage sorcerer.

• My Mom and Dad for constant love and support (and letting me vent through my stress).

TABLE OF CONTENTS

List of Tables
List of Figures
1 Introduction
  1.1 Distributed Systems in Education
  1.2 Distributed File Systems
  1.3 Our Contribution
2 Background
  2.1 Transparency
  2.2 Fault Tolerance
    2.2.1 Availability
    2.2.2 Replication
  2.3 Scalability
  2.4 File Names
    2.4.1 Additional Naming Properties
3 Related Work
  3.1 Andrew File System (AFS)
  3.2 Google File System (GFS)
    3.2.1 GFS Architecture
  3.3 Hadoop Distributed File System (HDFS)
  3.4 Others
  3.5 Related Work and DecaFS
4 DecaFS Requirements
  4.1 System Requirements
  4.2 DFS Requirements
  4.3 Limitations and Configuration
5 DecaFS Design
  5.1 Overview
  5.2 Definitions
  5.3 Barista
    5.3.1 Barista Core
    5.3.2 Persistent Metadata
    5.3.3 Volatile Metadata
    5.3.4 Locking Strategy
    5.3.5 IO Manager
    5.3.6 Distribution Strategy
    5.3.7 Replication Strategy
    5.3.8 IO, Distribution and Replication
    5.3.9 Access Module
    5.3.10 Monitored Strategy
  5.4 Espresso
    5.4.1 Espresso Storage
  5.5 Network
    5.5.1 Net TCP
    5.5.2 Network Core
  5.6 Connecting Layers
    5.6.1 DecaFS Barista
    5.6.2 Espresso Core
6 DecaFS Workflows
  6.1 Data Flow
  6.2 Open
  6.3 Read
  6.4 Write
    6.4.1 Metadata
    6.4.2 Chunks and Replica Chunks
    6.4.3 Distribution and Replication Strategy Modules
    6.4.4 Writes During Node Failures
  6.5 Close
  6.6 Delete
    6.6.1 Metadata
    6.6.2 Node Failures
  6.7 Seek
  6.8 Stat
    6.8.1 Stat and Write Processing
7 DecaFS Implementation
  7.1 Barista Core
    7.1.1 DecaFS Client Request State
      7.1.1.1 Request Information
      7.1.1.2 Read
      7.1.1.3 Write
      7.1.1.4 Delete
    7.1.2 Internal DecaFS Requests
  7.2 Persistent STL
  7.3 Persistent Metadata
  7.4 Volatile Metadata
  7.5 Locking Strategy
  7.6 IO Manager
  7.7 Distribution Strategy
  7.8 Replication Strategy
  7.9 Node Failures for our Distribution/Replication Strategy
    7.9.1 Read
    7.9.2 Write
  7.10 Access Module
  7.11 Monitored Strategy
  7.12 Espresso Storage
    7.12.1 read_chunk()
    7.12.2 write_chunk()
    7.12.3 delete_chunk()
  7.13 FUSE
8 Testing and Validation
  8.1 Google Test and Google Mock
  8.2 Espresso
  8.3 Barista
    8.3.1 Independent Modules
    8.3.2 Dependent Modules
      8.3.2.1 Volatile Metadata
      8.3.2.2 Other Mocks
  8.4 Data Storage
9 Conclusions
  9.1 Requirements
  9.2 Discussion
10 Future Work
  10.1 Classroom Use
  10.2 Testing
    10.2.1 Automated System Tests
    10.2.2 API Usability
Bibliography
Appendix
A Workflows for Failures
  A.1 Open Failures
  A.2 Read Failures
  A.3 Write Failures
  A.4 Close Failures
  A.5 Delete Failures
  A.6 Seek Failures
  A.7 Stat Failures
B APIs
  B.1 DecaFS Types
  B.2 Barista Core API
  B.3 Persistent Metadata API
  B.4 Volatile Metadata API
  B.5 Locking Strategy API
  B.6 IO Manager API
  B.7 Distribution Strategy API
  B.8 Replication Strategy API
  B.9 Access API
  B.10 Monitored Strategy API
  B.11 Espresso Storage API

LIST OF TABLES

7.1 Barista Core Client API
7.2 Persistent Metadata External API
7.3 Volatile Metadata (System Health) External API
7.4 Volatile Metadata (File Cursor) External API
7.5 Locking Strategy External API
7.6 IO Manager External API
7.7 Distribution Strategy API
7.8 Replication Strategy API
7.9 Access API
7.10 Monitored Strategy API
7.11 Custom Strategy Registration API
7.12 Espresso Storage External API

LIST OF FIGURES

3.1 Shows the namespace design for AFS files [22]
3.2 Architecture for GFS [5]
3.3 Architecture for HDFS [15]
5.1 Architecture for DecaFS
6.1 The process of data fragmentation through DecaFS
6.2 Components involved with a DecaFS open call
6.3 A successful call to open for reading a file
6.4 A successful call to open for reading/writing a file
6.5 Components involved with a DecaFS read call
6.6 A successful call to read
6.7 Components involved with a DecaFS write call
6.8 A successful call to write
6.9 Components involved with a DecaFS close call
6.10 A successful call to close
6.11 Components involved with a DecaFS delete call
6.12 A successful call to delete
6.13 Components involved with a DecaFS seek call
6.14 A successful call to seek
6.15 Components involved with a DecaFS stat call
6.16 A successful call to stat
A.1 A failed call to open due to the DecaFS Client being unable to obtain a shared lock on the file
A.2 A failed call to open due to the file not being found
A.3 A failed call to open due to the DecaFS Client being unable to obtain an exclusive lock on the file
A.4 A failed call to read due to the file being non-existent
A.5 A failed call to read because the DecaFS Client does not have a suitable lock on the file
A.6 A failed call to read due to the file cursor not existing
A.7 A failed call to read due to the file being non-existent
A.8 A failed call to write because the DecaFS Client does not have an exclusive lock on the file
A.9 A failed call to write due to the file cursor not existing
A.10 A failed call to close due to the calling DecaFS Client being different than the DecaFS Client that opened the file
A.11 A failed call to close due to the file not being open in the first place
A.12 A failed call to delete due to the file's non-existence
A.13 A failed call to delete due to the file being in use
A.14 A failed call to seek due to the file descriptor being invalid
A.15 A failed call to seek due to the calling DecaFS Client differing from the DecaFS Client associated with the file descriptor in question
A.16 A failed call to stat due to the file not existing

CHAPTER 1

Introduction

1.1 Distributed Systems in Education

Computer Science education is structured to be attractive to students and "strategic to a business enterprise [6]." Hoganson explains that curricula need to be structured in a way that can counter the movement to out-source development work by preparing students in "knowledge areas that are of strategic importance to the enterprise" and are therefore "less likely to be successfully out-sourced [6]." His "strategic CS program," within the ABET criteria, includes Distributed Technologies at both the Undergraduate and Graduate level [6]. The importance of distributed computing in education can also be seen in Google and IBM's joint Initiative in 2008 to "improve computer science students' knowledge ... to better address the emerging paradigm of large-scale distributed computing [13]."

The Cal Poly Computer Science Department has decided to begin updating the curriculum to facilitate Distributed Systems Education. This thesis aims to help shape coursework to teach Distributed Concepts by allowing students to write components of Distributed File Systems.

1.2 Distributed File Systems

Distributed File Systems have similar operational goals as non-distributed file systems, such as basic file and directory manipulation (open, read, write, delete). However, a Distributed File System (DFS) is "a file system, whose clients, servers, and storage devices are dispersed among the machines of a distributed system." Under this definition, a distributed system is "a collection of loosely coupled machines", servers are software services that run on a single machine in the system, and clients are processes that can invoke these services [7].

A DFS can be implemented as part of a Distributed Operating System, or as a software layer [7]. At Cal Poly, we are implementing a software layer DFS that will be responsible for managing the communication between the conventional operating systems and file systems.

1.3 Our Contribution

Research in Distributed Computing Education has focused on virtual systems [23]. However, at Cal Poly the theme of each department's educational goals is "learn by doing". To adhere to the Cal Poly motto, we developed DecaFS, a modular distributed file system to run on physical systems.

Our work describes the following contributions:

1. Design of a modular DFS (DecaFS), with small components that can be implemented or adapted in a classroom environment.

2. Implementation of base functionality in DecaFS to allow students to develop DecaFS modules.

Another main component of this project is describing sample projects for students to complete within the DecaFS infrastructure. Design of lab activities for students to complete using DecaFS to learn various distributed computing concepts is described in my colleague’s work [4].

Background information on Distributed File Systems and Related Work are detailed in CHAPTER 2 and CHAPTER 3. System requirements are detailed in CHAPTER 4. Design of our system, DecaFS, is presented in CHAPTER 5, followed by system workflows (CHAPTER 6) and implementation details in CHAPTER 7. Information on DecaFS testing and validation is presented in CHAPTER 8, and lastly Conclusions and Future Work in CHAPTER 9 and CHAPTER 10.

CHAPTER 2

Background

Distributed File Systems have many of the same goals as traditional file systems. Here we discuss the main DFS concerns that are independent of a single DFS implementation.

2.1 Transparency

DFS clients, like any file system client, should be unaware of the storage mechanisms. This means that the number of servers and the "dispersion of storage devices" should be transparent to a client of the DFS [7]. The main concept of transparency is network transparency. A client should be able to "access remote files using the same set of file operations applicable to local files [7]." This places the responsibility of locating files and transmitting data across the network on the DFS.

2.2 Fault Tolerance

A DFS should be able to perform in the case of faults. Levy and Silberschatz use a broad definition of a "fault" which includes communication faults, machine failures, storage device crashes, and the decay of storage media. In the case of these faults, the DFS needs to be able to function, but the level of continued system functionality can vary. It is often acceptable for performance to degrade in proportion to the severity of the fault [7].

2.2.1 Availability

Availability is an additional criterion that can be imposed on a DFS with respect to fault tolerance. A file is "available" if it can be accessed regardless of storage failures and machine faults [7]. A similar criterion is robustness, which means that the file is guaranteed to survive faults, but it may not become available until faults are recovered [7].

2.2.2 Replication

In order to increase the availability of files within a DFS, DFS designers can use replication strategies. One common mechanism for replication is "demand replication", where files have a primary replica, which is accessed first for read and write requests, and supplementary secondary replica(s) which can be used for access during faults on the primary machine [18].

2.3 Scalability

Scalability is "the capability of the system to adapt to increased service load [7]." When compared to a traditional file system, a DFS has a higher risk of saturation because communication overhead is associated with DFS client requests since the DFS must send and receive data over a network. However, a distributed system can gain benefits from having multiplicity of resources [7]. Centralized components of any distributed system remove the benefits of resource multiplicity and introduce a bottleneck on the resource, which can be the cause of system faults.

Scalability and fault tolerance are related since the act of scaling can cause faults, and some faults may hinder the system's ability to properly scale. In order to build a system that is both scalable and fault tolerant, a DFS must have multiple, independent servers that control multiple, independent storage devices [7].

2.4 File Names

All file systems have a layer of abstraction between a textual file name and the actual storage of the data in disk blocks. A DFS introduces another layer of abstraction for transparency as described in Section 2.1. To support transparency, a DFS must hide where in the network of resources file data is located. A DFS may also hide this information with file replicas (Section 2.2.2), multiple copies of file data to support fault tolerance (Section 2.2). With all abstractions, a DFS must maintain a mapping of a filename to a list of all storage locations for all pieces of a DFS file, hiding the existence of replicas and storage locations [7].
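As a concrete illustration of this kind of mapping, the toy sketch below keeps a per-filename list of storage locations. The types and names are our own invention for this example and do not come from any of the file systems discussed in this thesis.

```cpp
// Toy illustration (not from any real DFS) of the naming abstraction described
// above: clients see only a textual filename, while the DFS privately maps that
// name to every storage location holding a piece or replica of the file.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct StorageLocation {
  uint32_t node_id;     // which storage server holds this piece
  uint32_t chunk_id;    // which piece of the file it is
  bool is_replica;      // hidden from clients along with everything else here
};

class NameMapping {
 public:
  void add_location(const std::string &filename, StorageLocation loc) {
    table_[filename].push_back(loc);
  }
  // Location independence: data can move between nodes without renaming the file;
  // only the entries stored here change.
  std::vector<StorageLocation> lookup(const std::string &filename) const {
    auto it = table_.find(filename);
    return it == table_.end() ? std::vector<StorageLocation>{} : it->second;
  }

 private:
  std::map<std::string, std::vector<StorageLocation>> table_;
};
```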

2.4.1 Additional Naming Properties

Two new properties are imposed on a DFS with regard to transparency through naming [7]. In order to ensure that DFS Clients are unaware of storage mechanisms (Section 2.1), these naming properties must be enforced by a DFS.

1. Location Transparency: Filenames do not expose storage location information.

2. Location Independence: Filenames do not need to be changed when physical storage location changes.

CHAPTER 3

Related Work

Both research and industry use Distributed File Systems, but to the best of our knowledge, no extant system can be broken up into pieces conducive to student learning.

3.1 Andrew File System (AFS)

Andrew File System (AFS) is a distributed network file system [16]. AFS is unique to DFS research because it provides "location independence, scalability and transparent migration", while working across Operating Systems. AFS files are organized into a "globally unique namespace" as seen in Figure 3.1 [22]. AFS facilitates these non-server-identifiable namespaces by maintaining a replicated location database. Clients must connect to the database to resolve file names and find data. These domains are considered to be "AFS cells" and file pathnames include cells rather than server names [22]. This namespace design allows for location transparency and independence as described in Section 2.4.1, since AFS administrators can move data between servers without affecting clients. This model also addresses scalability (Section 2.3) since more resources can be added in a particular domain without notifying the clients.

Figure 3.1: Shows the namespace design for AFS files [22].

With all of these features, "the different aspects of AFS can be overwhelming at first and the learning curve for setting up your own AFS cell is steep [22]." AFS justifies the steep learning curve because "secure, platform-independent world-wide file sharing is a concept as attractive as serving your /usr/local/ area and all your UNIX home directories [22]." AFS is designed for secure global sharing, and is maintained today as an open source project [14]. However, these design considerations, and the learning curve, make the system too complex for small, meaningful, DFS behavioral modifications in the classroom.

3.2 Google File System (GFS)

The Google File System (GFS) is a "scalable distributed file system for large distributed data-intensive applications [5]." GFS is designed with goals similar to those discussed in CHAPTER 2, such as scalability (Section 2.3) and availability (Section 2.2), but is tailored to Google's application needs [5]. For example, Google constantly deals with component failures (Section 2.2) due to the "quantity and quality of the components" which guarantees that "some are not functional at any given time and some will not recover from their current failures [5]." For Google's use, most file modifications are append only, and once written to, "files are only read, and often sequentially [5]."

In addition to GFS being proprietary, some of the design considerations made for GFS make it unusable for students. Such considerations are as follows:

• "We expect a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them [5]."

  At Cal Poly we support "Learn By Doing" and encourage students to have hands-on experience in as many aspects of a project as possible. Therefore, it is desirable for students to collect and use their own data for projects, making smaller files more common than large files due to the time required for data collection.

• "The workloads primarily consist of two kinds of reads: large streaming reads and small random reads...The workloads also have many large, sequential writes that append data to files [5]."

  The goal of DecaFS is flexibility: students should be able to modify the system to adapt to their needs for a given project. Therefore, we cannot predict the workloads since they vary from project to project and by student.

• "The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file [5]."

  Cal Poly course sizes are small, and we will have significantly less hardware running our DFS than at Google, so we are able to impose write restrictions that allow us to have only one client writing a file at a time.

3.2.1 GFS Architecture

Even though a system like GFS is infeasible for our purposes, many other systems are based on GFS, so it is still meaningful to explore the GFS architecture [5] as seen in Figure 3.2. A GFS cluster has a single master node, and multiple chunk servers (worker nodes), and can be accessed by multiple clients. Every file is divided into chunks of a fixed size, identifiable by a unique "chunk handle" that is assigned by the master node on creation. Chunks are stored on disk in chunk servers, and each chunk is replicated three times by default. The master node is responsible for the maintenance of metadata and system-wide activities. The master communicates regularly with chunk servers via heartbeats, which are periodic checks for state.

Figure 3.2: Architecture for GFS [5].

3.3 Hadoop Distributed File System (HDFS)

Hadoop Distributed File System (HDFS) is an open-source project similar to GFS.

The HDFS architecture can be seen in Figure 3.3. Assumptions for HDFS, like GFS, include hardware failures, streaming data processing (batch processing), large data files, and "write-once-read-many" file access [15]. HDFS has a master-slave architecture, with one master (NameNode) and many slaves (DataNode) [15]. Files are stored as "blocks" and each file is split into one or more blocks (like GFS chunks) [15].

In addition to the requirement differences discussed in Section 3.2, which also apply to HDFS, HDFS cannot be directly mounted in user space, which limits its usefulness for students. Like GFS, we have determined that HDFS is unsuitable for classroom learning. The HDFS code-base is very large, and HDFS has many features for large-scale data processing that are not useful for the amount of data students can easily collect.

Figure 3.3: Architecture for HDFS [15].

3.4 Others

Other systems have been implemented that are similar to GFS and HDFS. For example, Quantcast File System (QFS) is a C++ HDFS alternative based on Kosmos Distributed File System (KFS) [1], implemented to work with the HDFS APIs. KFS, QFS, and other systems like it, are performance improvements over HDFS. These and other systems [10, 17] are implemented with design considerations similar to HDFS and GFS [9].

3.5 Related Work and DecaFS

We have explored a sample of Distributed File Systems that are available, and found that, in general, our design requirements discussed in CHAPTER 2 are accurate. Extant systems focus on scalability, fault tolerance, and transparency for real data sets. These systems impose stricter requirements and more complex implementations than are needed for a classroom setting. Since extant systems are too robust or complex for student use, we have decided to build DecaFS for educational purposes.

DecaFS is heavily inspired by DFS systems covered in this chapter, but simplified down to the most basic DFS components and functions to reduce the learning curve for students and make biweekly projects feasible.

CHAPTER 4

DecaFS Requirements

4.1 System Requirements

Cal Poly courses need projects that allow students to develop pieces of distributed systems to learn different distribution and distributed system management techniques. They must also be able to build applications that use a distributed system to solve problems from various application domains. The high-level requirements of the system are as follows:

• REQ-1: Students shall be able to develop architectural components of a distributed system.

• REQ-2: Students shall be able to build applications for a distributed system.

We have decided to develop a modular distributed file system (DecaFS) that allows students to develop components (modules) of the DFS. Altering modules serves two purposes: first, students can develop components of a distributed system, the DFS (REQ-1); second, students can alter the behavior of the DFS to support their data needs for applications (REQ-2).

4.2 DFS Requirements

In addition to the overall system requirements described in Section 4.1, several requirements have been placed on the DFS to ensure that the system can be adapted to the needs of various projects and courses.

• REQ-3: Students shall be able to change which node(s) data is stored on/recovered from.

• REQ-4: Students shall be able to change the replication policies of the system. The system should support no replication, mirroring, and some RAID [3] implementations.

• REQ-5: The DFS should be mountable with FUSE [21].

• REQ-6: The system shall be able to tolerate at least one worker-node failure.

4.3 Limitations and Configuration

The DFS also needs to have limitations and configuration properties that can change per DFS-instance.

• REQ-7: DecaFS System Administrators shall be able to set the maximum possible file size for the DFS.

• REQ-8: DecaFS System Administrators shall be able to set the size (in bytes) of the stripes for each file, where a stripe is the maximum number of bytes of file data that are broken up into pieces and distributed for storage.

• REQ-9: DecaFS System Administrators shall be able to set the size (in bytes) of the chunks for each file, where a chunk is the maximum number of bytes of a stripe of file data stored at a time by one write to one storage node.

CHAPTER 5

DecaFS Design

We designed DecaFS, Distributed Educational Component Adaptable File System, to meet the requirements specified in CHAPTER 4, providing students with a flexible DFS that can be customized to support an application need or illustrate a distributed systems concept.

5.1 Overview

We designed DecaFS to have a master-slave architecture similar to GFS (Section 3.2) and HDFS (Section 3.3). Our design is similar to these two systems, but scaled down for smaller numbers of nodes in the DFS and fewer clients. Our requirements are therefore more relaxed than those of GFS and HDFS. We designed the system to support single-node failures and smaller file sizes, and did not attempt to optimize for performance.

DecaFS is broken down into three functional layers. The Barista Layer (Section 5.3) contains all functionality of the master node of the system and the Espresso Layer (Section 5.4) contains all worker functionality. Communication between Barista and Espresso is also separated into its own layer, the Network Layer (Section 5.5). Each layer is broken up into modules, which are logical components of the DFS. Each module is responsible for one task, or a set of tasks, that accomplish a piece of DFS functionality.

Students should be able to replace a module, a set of modules, or one or more layers of DecaFS by following the module or layer APIs to achieve desired DFS behavior.

With this modular design, students can change the behavior of the system, develop pieces of the system that are not provided by a professor, or improve extant system modules to learn about distributed systems.

The overall architecture for DecaFS can be seen in Figure 5.1, and each module will be explained in more detail in the rest of this chapter and in CHAPTER 7. DecaFS modules work together to accomplish DFS tasks, which are a subset of Unix File System functionality. DecaFS supports the following operations:

• File: open, read, write, close, remove, seek, stat

• Directory: make, open, read, remove

Figure 5.1: Architecture for DecaFS.

5.2 Definitions

In order to properly discuss the functions of DecaFS Modules and the interactions between them, we need to define terms that we use throughout our system.

• Node: a machine running a component of DecaFS.

• Barista: the master node, one per instance of DecaFS.

• Espresso: a worker node, one or more per instance of DecaFS.

• DecaFS Client: any user running a process on a node in DecaFS. One user running multiple processes on the same node is considered the same client. One user running multiple processes on two different nodes is considered two separate clients.

• Stripe: a logical piece of a file to be distributed across multiple Espresso nodes in DecaFS.

• Chunk: a piece of a Stripe, the actual data sent to or read from an Espresso node.

5.3 Barista

As described in Section 5.1, the Barista Layer contains all functionality for the master node of DecaFS. Barista responsibilities include: metadata management, monitoring system health (node failures), processing DecaFS client requests, and managing synchronization.

5.3.1 Barista Core

Barista Core is the center of the Barista Layer. The goal of this module is to control the flow of requests throughout the system, and it should not be modified by students in most cases. Barista Core is the main entry point for the Barista Node from DecaFS Clients. While Barista Core manages the flow of control of the system, the functionality of the system lies in other modules. This allows students to change the behavior of the system without risk of broken communication channels between nodes.

Barista Core handles communication with the Network Layer, so that network information can be hidden from students. This removes the need for explicit knowledge of networks to develop a module for DecaFS. Barista Core also manages the storage and recovery of System Metadata. This ensures that DecaFS can clean up and start up without relying on students to log when they are reading and writing data from other modules.

Barista Core deals with data of various sizes from the DecaFS Clients, and breaks requests down into stripes before sending requests to other Barista Modules. In addition to handling DecaFS Client requests, Barista Core implements a C-API for students to use to gather System Information from any module. The C-APIs are discussed further in CHAPTER 7.

5.3.2 Persistent Metadata

Barista Core uses Persistent Metadata to store all system information. This information includes the list of files stored in DecaFS and general information for each file such as size, id, and the stripe and chunk size associated with that file.

5.3.3 Volatile Metadata

Barista Core uses Volatile Metadata to maintain the state of the current instance of DecaFS and the interaction between the system and its clients. Volatile Metadata is in charge of maintaining information such as the list of nodes currently "active", or not in a failure state, and the set of file descriptors for DecaFS clients.

5.3.4 Locking Strategy

Barista Core uses the Locking Strategy module to manage DecaFS Client accesses to files at any point in time. For simplicity, DecaFS allows only one DecaFS Client to access a file at a time, but this can be changed by modifying or re-writing the Locking Strategy module.

DecaFS maintains two types of locks: exclusive and shared. Exclusive locks are needed to write to a file, and shared locks allow reading from a file. By default, only one process under one client may have an exclusive lock on a file at any point in time.

Many processes under the same client may hold a shared lock on a file at any point in time. Locks are assigned based on the permissions that a file is opened with, and released when the file is closed.

We do not permit upgrading or downgrading of locks, so in order to switch from a shared lock to an exclusive lock or vice versa, DecaFS Clients must close and re-open the file. This allows DecaFS to avoid deadlocks and alleviate lock starvation.
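A minimal sketch of these rules appears below, assuming hypothetical class and function names; the actual Locking Strategy declarations are given in Appendix B.5. It grants an exclusive lock only when no lock is held, restricts shared holders to a single client, and offers no upgrade path, matching the close-and-reopen policy described above.

```cpp
// Hypothetical sketch of the default locking rules: one exclusive writer,
// shared readers restricted to a single client, and no lock upgrades.
// Names are illustrative only; the real DecaFS API appears in Appendix B.5.
#include <cstdint>
#include <map>

enum class LockType { NONE, SHARED, EXCLUSIVE };

struct LockState {
  LockType type = LockType::NONE;
  uint32_t client_id = 0;   // owner (all shared holders come from one client)
  int holders = 0;          // number of open instances holding the lock
};

class SimpleLockingStrategy {
 public:
  // Returns 0 on success, -1 if the lock cannot be granted.
  int get_shared_lock(uint32_t client_id, uint32_t file_id) {
    LockState &s = locks_[file_id];
    if (s.type == LockType::EXCLUSIVE) return -1;                // writer present
    if (s.type == LockType::SHARED && s.client_id != client_id) return -1;
    s.type = LockType::SHARED;
    s.client_id = client_id;
    ++s.holders;
    return 0;
  }

  int get_exclusive_lock(uint32_t client_id, uint32_t file_id) {
    LockState &s = locks_[file_id];
    if (s.type != LockType::NONE) return -1;                     // no sharing, no upgrades
    s.type = LockType::EXCLUSIVE;
    s.client_id = client_id;
    s.holders = 1;
    return 0;
  }

  // Called on close; releases one holder and clears the lock when none remain.
  int release_lock(uint32_t client_id, uint32_t file_id) {
    auto it = locks_.find(file_id);
    if (it == locks_.end() || it->second.client_id != client_id) return -1;
    if (--it->second.holders == 0) locks_.erase(it);
    return 0;
  }

 private:
  std::map<uint32_t, LockState> locks_;   // file_id -> current lock state
};
```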

5.3.5 IO Manager

IO Manager is responsible for breaking down tasks (read/write/delete) and distributing work among the Espresso Nodes of the system. Students can modify this module to customize the mechanisms for storage and retrieval of data in/from Espresso Nodes.

IO Manager receives requests of stripe size from Barista Core and is responsible for breaking striped requests into chunks for the Espresso Nodes. This allows students to customize behavior since they have total control over the size and number of chunks created per stripe. This module is also responsible for managing data replication policies.

5.3.6 Distribution Strategy

The Distribution Strategy Module provides the mechanism for determining what node a particular chunk should be sent to. A chunk is uniquely identified by its file, stripe and chunk number. This module allows students to change where chunks are sent throughout the system without writing an entire IO Manager as described in Section 5.3.5.

5.3.7 Replication Strategy

Similar to the Distribution Strategy Module (Section 5.3.6), the Replication Strategy Module provides a mechanism for determining to which node a particular chunk's replica should be sent. Again, students can alter where chunk replicas are sent without writing an entire IO Manager Module (Section 5.3.5).
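The sketch below shows one way such strategies could be expressed; the function names and signatures are our own assumptions, not the DecaFS APIs (those are listed in Appendices B.7 and B.8). It spreads consecutive chunks across nodes and places each replica on the next node over, so a chunk and its replica never share a node when at least two nodes exist.

```cpp
// Hypothetical Distribution/Replication Strategy sketch: choose an Espresso
// node for a chunk identified by (file_id, stripe_id, chunk_id), then choose a
// different node for its replica. Names and signatures are assumptions.
#include <cstdint>

// Deterministically spreads consecutive chunks of a stripe across nodes.
// num_espresso_nodes is assumed to be supplied by the caller (e.g. Barista Core).
uint32_t choose_chunk_node(uint32_t file_id, uint32_t stripe_id,
                           uint32_t chunk_id, uint32_t num_espresso_nodes) {
  return (file_id + stripe_id + chunk_id) % num_espresso_nodes;
}

// Mirrored placement: the replica goes to the next node in the ring, so chunk
// and replica land on different nodes whenever there are two or more nodes.
uint32_t choose_replica_node(uint32_t chunk_node, uint32_t num_espresso_nodes) {
  return (chunk_node + 1) % num_espresso_nodes;
}
```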

5.3.8 IO, Distribution and Replication

The combination of these three modules gives students total control over the mechanisms for distributing file chunks and replicating data. With this control, students can choose to not replicate their data, to do mirrored replication where each chunk is written to two nodes, or to implement more complex strategies such as RAID, without managing system metadata.

If students wish to implement more complex storage and replication strategies that require additional metadata, we allow them to register their own functions to be called on system start-up that will allow them to recover any additional metadata they wish to store.

5.3.9 Access Module

The Access Module is the end point for data processing on the Barista Layer. It receives read, write, and delete requests from other Modules and sends them to the Network Layer. The Access Module is a good place for students to add a buffering system to avoid extraneous network calls.

5.3.10 Monitored Strategy

Students may also wish to perform system monitoring for tasks such as re-balancing data if their storage mechanisms are not balanced. We allow them to register functions with Barista Core for monitoring; these functions will be called periodically, based on a time-out specified at the time of registration.

Two monitoring functions are pre-defined for students and will be called each time the system detects a node failure or a node coming back online, in case a student's module needs to be notified.
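A rough sketch of how this registration could be organized is shown below. All of the type and function names here are assumptions made for illustration; the actual Monitored Strategy API is given in Appendix B.10.

```cpp
// Hypothetical sketch of Monitored Strategy registration: student code hands
// Barista Core a callback plus a timeout, along with node-failure and node-up
// hooks. Names are illustrative; see Appendix B.10 for the real API.
#include <cstdint>
#include <functional>
#include <vector>

struct MonitoredTask {
  std::function<void()> callback;   // invoked periodically by Barista Core
  uint32_t timeout_seconds;         // interval between invocations
};

class MonitoredStrategyRegistry {
 public:
  // Student modules register periodic work such as re-balancing data.
  void register_monitor(std::function<void()> cb, uint32_t timeout_seconds) {
    tasks_.push_back({std::move(cb), timeout_seconds});
  }
  // Pre-defined hooks: called by Barista Core when node state changes.
  void register_node_down(std::function<void(uint32_t node_id)> cb) { on_down_ = std::move(cb); }
  void register_node_up(std::function<void(uint32_t node_id)> cb) { on_up_ = std::move(cb); }

 private:
  std::vector<MonitoredTask> tasks_;
  std::function<void(uint32_t)> on_down_, on_up_;
};
```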

5.4 Espresso

As described in Section 5.1, the Espresso Layer is responsible for storing data on disk and maintaining all associated metadata. Students should not have to alter any component of the Espresso Layer. The layer facilitates raw-data storage, but all logic related to the storage of file data resides in the Barista Layer.

5.4.1 Espresso Storage

Espresso Storage is the main module of the Espresso Layer. It is responsible for executing chunk-level read, write, and delete requests. This module manages local file data and associated metadata, and knows only about which chunks exist on the Espresso Node the module runs on.
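Conceptually, Espresso Storage behaves like a per-node chunk store keyed by (file id, stripe id, chunk id). The in-memory sketch below illustrates that idea only; the real module persists chunks on disk, and the names and signatures here are our assumptions rather than the declarations in Appendix B.11.

```cpp
// Minimal in-memory illustration of chunk-level read/write/delete keyed by
// (file_id, stripe_id, chunk_id). The real Espresso Storage module stores
// chunks on disk; names and signatures here are assumed for this sketch.
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <map>
#include <tuple>
#include <vector>
#include <sys/types.h>

using ChunkKey = std::tuple<uint32_t, uint32_t, uint32_t>;  // file, stripe, chunk

class InMemoryChunkStore {
 public:
  // Write count bytes into the chunk at offset, growing the chunk if needed.
  ssize_t write_chunk(ChunkKey key, size_t offset, const void *buf, size_t count) {
    std::vector<char> &data = chunks_[key];
    if (data.size() < offset + count) data.resize(offset + count);
    std::memcpy(data.data() + offset, buf, count);
    return static_cast<ssize_t>(count);
  }

  // Read up to count bytes of the chunk starting at offset.
  ssize_t read_chunk(ChunkKey key, size_t offset, void *buf, size_t count) {
    auto it = chunks_.find(key);
    if (it == chunks_.end() || offset >= it->second.size()) return -1;
    size_t n = std::min(count, it->second.size() - offset);
    std::memcpy(buf, it->second.data() + offset, n);
    return static_cast<ssize_t>(n);
  }

  // Remove the chunk and reclaim its storage.
  int delete_chunk(ChunkKey key) { return chunks_.erase(key) ? 0 : -1; }

 private:
  std::map<ChunkKey, std::vector<char>> chunks_;  // only local chunks are known
};
```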

5.5 Network

As described in Section 5.1, the Network Layer handles communication between nodes in DecaFS and between clients and the system. This layer hides the notion of packets from the rest of the system, allowing students to develop DecaFS modules without prior knowledge of networks. Students will need to understand that nodes in the system communicate with one another, but will use a DecaFS API to handle communication, rather than using direct network calls and buffers.

5.5.1 Net TCP

The Net TCP module provides functionality that is common to multiple network components of DecaFS. This functionality includes defining server behavior, allowing a node to listen for requests, defining client behavior, sending requests to a server, and management of the connections between clients and servers.

5.5.2 Network Core

The Network Core module extends the functionality of the Net TCP module to be specific to the needs of different nodes in DecaFS. This module defines the different packet types required for communication between Barista and Espresso nodes, as well as DecaFS Clients. It also handles the processing of packets and exposes an API for students to use to send requests across the network.

5.6 Connecting Layers

5.6.1 DecaFS Barista

The DecaFS Barista Module is responsible for set-up of the Barista Node. It ensures that all metadata is recovered through Barista Core, and connects itself to the system through the Network Layer.

5.6.2 Espresso Core

The Espresso Core Module is responsible for set-up of an Espresso Node. This layer must restore any metadata, and connect itself to the system through the Network Layer.

CHAPTER 6

DecaFS Workflows

DecaFS is designed for students to write modules, so the functionality of the DFS has to be broken down into pieces small enough for students to easily work with. However, this design makes it difficult to trace the flow of execution throughout the system for different tasks. In this section, we examine workflows of major DFS tasks, and follow the flow of execution through DecaFS modules.

For simplicity in all discussion of workflows, we will ignore the Network Layer. Please assume that all communication between Barista and Espresso Modules, and between the DecaFS Client and the Barista Core, goes through the Network Layer.

6.1 Data Flow

We have briefly discussed data flow throughout DecaFS in terms of stripes and chunks. DecaFS modules are responsible for dealing with different data fragments. Each module that deals directly with data is designed to fragment the data only once. The overall data flow for the system can be seen in Figure 6.1. DecaFS Clients request or send raw data to DecaFS of any size, within DecaFS file size limits. Raw Data goes directly to Barista Core, the entry-point for the Barista Node (Section 5.3). Barista Core then breaks up the request into stripe-size requests. Striped data is sent to/requested from the IO Manager, which further breaks down the request into chunk-sized requests for per-node storage or retrieval. Chunks are then sent to/requested from Espresso Nodes through the Access Module. The Access Module does not transform the data further, but translates the Barista's request into a Network Request to be sent to an Espresso Node. Once a request is received on the Espresso side, all data is dealt with at the chunk level.

Responses or requested data are sent from the Espresso Layer back to Barista Core, since Barista Core maintains information about the entire request from the DecaFS Client. Once all fragments of the request have been received, Barista Core sends a response to the DecaFS Client via the Network Layer.

Figure 6.1: The process of data fragmentation through DecaFS.
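To make the two levels of fragmentation concrete, the short program below cuts a byte range into stripe-sized pieces and each stripe into chunk-sized pieces, using example sizes in the spirit of REQ-8 and REQ-9. It is an illustration only, not DecaFS source, and the names are our own.

```cpp
// Illustrative sketch (not DecaFS source) of the fragmentation described above:
// Barista Core cuts a client request into stripe-sized pieces, and the IO
// Manager cuts each stripe into chunk-sized pieces. The stripe and chunk sizes
// correspond to the configurable values of REQ-8 and REQ-9; names are hypothetical.
#include <algorithm>
#include <cstddef>
#include <cstdio>

// Split `count` bytes starting at `offset` into pieces of at most `piece_size`,
// reporting each piece's index, offset within the piece, and length.
template <typename Fn>
void fragment(size_t offset, size_t count, size_t piece_size, Fn emit) {
  while (count > 0) {
    size_t index = offset / piece_size;
    size_t within = offset % piece_size;
    size_t len = std::min(count, piece_size - within);
    emit(index, within, len);
    offset += len;
    count -= len;
  }
}

int main() {
  const size_t stripe_size = 4096, chunk_size = 1024;  // example configuration
  // A 10000-byte write at offset 3000 -> stripes (Barista Core) -> chunks (IO Manager).
  fragment(3000, 10000, stripe_size, [&](size_t stripe, size_t off, size_t len) {
    std::printf("stripe %zu (%zu bytes):\n", stripe, len);
    fragment(off, len, chunk_size, [&](size_t chunk, size_t, size_t clen) {
      std::printf("  chunk %zu: %zu bytes\n", chunk, clen);
    });
  });
  return 0;
}
```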

6.2 Open

Opening a file in DecaFS happens only in the Barista Layer. We allow files to be opened either as read only or read/write, with the file cursor starting at 0 or end of file. Files opened for read/write are automatically created if they do not exist.

The normal flow of execution for open can be seen in Figure 6.3 and Figure 6.4, and failures can be seen in A.1. If a DecaFS Client requests to open a file, Barista Core first checks if the file exists through decafs_file_stat in the Persistent Metadata Module. If the file does not exist based on the file stat call, and the DecaFS Client wants to write to the file, it is created at this time. If the DecaFS Client attempts to open a non-existent file for reading, the open call will return an error. Then, Barista Core uses the Locking Strategy Module to obtain a lock (shared for read only, exclusive for read/write) on the file. Finally, Barista Core gets a file descriptor from the Volatile Metadata Module, which maintains associated metadata for that file descriptor throughout the time that the new instance of the file is open.
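The toy program below follows the same stat, create-if-needed, lock, and file-descriptor sequence. The `*_stub` helpers stand in for the Persistent Metadata, Locking Strategy, and Volatile Metadata calls and are invented for this sketch; only the control flow is meant to mirror the workflow above.

```cpp
// Illustrative sketch of the open workflow (stat -> create if needed -> lock ->
// file descriptor). The *_stub functions are hypothetical stand-ins for the
// module APIs in Appendix B; only the control flow is meant to match.
#include <cstdio>
#include <map>
#include <string>

static std::map<std::string, int> files;          // pathname -> file_id
static int next_file_id = 1, next_fd = 1;

static int stat_stub(const std::string &p)   { return files.count(p) ? files[p] : -1; }
static int create_stub(const std::string &p) { return files[p] = next_file_id++; }
static int lock_stub(int /*client*/, int /*file_id*/, bool /*exclusive*/) { return 0; }
static int new_cursor_stub(int /*client*/, int /*file_id*/) { return next_fd++; }

enum OpenFlags { DECAFS_RDONLY, DECAFS_RDWR };

int open_file_sketch(const std::string &pathname, OpenFlags flags, int client_id) {
  int file_id = stat_stub(pathname);                       // does the file exist?
  if (file_id < 0) {
    if (flags == DECAFS_RDONLY) return -1;                 // cannot read a missing file
    file_id = create_stub(pathname);                       // created on read/write open
  }
  // Shared lock for read-only opens, exclusive lock for read/write opens.
  if (lock_stub(client_id, file_id, flags == DECAFS_RDWR) < 0) return -1;
  return new_cursor_stub(client_id, file_id);              // the DecaFS file descriptor
}

int main() {
  std::printf("fd = %d\n", open_file_sketch("/data/results.txt", DECAFS_RDWR, 1));
  std::printf("fd = %d\n", open_file_sketch("/missing", DECAFS_RDONLY, 1));  // prints -1
  return 0;
}
```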

Figure 6.2: Components involved with a DecaFS open call.

27 Figure 6.3: A successful call to open for reading a file.

Figure 6.4: A successful call to open for reading/writing a file.

28 6.3 Read

Reading a file in DecaFS goes through every layer of the system, and involves most of the modules in the Barista Layer. A normal execution for read can be seen in Figure 6.6. When a read request comes in from a DecaFS Client, Barista Core will automatically break the read request into stripes. Stripes are then forwarded to the IO Manager Module, which is a component that students should be able to implement. Before processing moves to IO Manager, Barista Core will ensure that the file exists, a read or a write lock is held for the caller, and that the file is open for reading or both reading and writing.

The IO Manager Module is responsible for breaking the stripe into chunks, determining what node a chunk is stored on, and determining whether the chunk needs to be read from the regular storage or a replica storage (if available). It is then the responsibility of the IO Manager to check the "health" of the system, and only send requests to nodes that are "up", meaning the nodes that have not received a failure. The API to determine system health is exposed in the Volatile Metadata Module. The IO Manager is also responsible for handling node failures; more information can be found in Section 7.6.

IO Managers must break striped read requests into chunk-sized requests and send them to the Access Module, which will then handle the translation to the Network Layer. For each stripe processed, the IO Manager must inform Barista Core how many chunks were used in breaking down the striped request. Finally, the chunk request arrives on the corresponding Espresso Node, which returns the data.

Read responses from the Espresso Layer contain the different chunks requested by the IO Manager. These responses are sent to Barista Core, which waits for all chunks requested to arrive. Once the module has received all chunks, it assembles the full read buffer and sends the final read response to the DecaFS Client.

Figure 6.5: Components involved with a DecaFS read call.

30 Figure 6.6: A successful call to read.

6.4 Write

Like Read (Section 6.3), writing to a file goes through all layers and most modules of DecaFS. A non-erroneous execution of a write call can be seen in Figure 6.8. Similar to a read request, when a write request arrives in the Barista Core Module from a DecaFS Client, Barista Core will break up the write request into stripe-size requests. Before processing moves to IO Manager, Barista Core will ensure that the file exists, a write lock is held for the caller, and that the file is open for write. Stripes are forwarded to the IO Manager Module for chunk-level processing.

Once the striped request reaches the IO Manager, it is the IO Manager's responsibility to break down the request into chunk-sized requests and send these requests to the Access Module, which translates them into Network Requests. Chunk write requests arrive on the Espresso Layer in the Storage Module, where data is written to disk. Chunk Write Responses are sent back to Barista Core. As in the case of a read request, Barista Core waits for all write responses to arrive before notifying the DecaFS Client that the write is complete. Barista Core maintains global metadata affected by the write, such as the file size and the position of the file cursor.

6.4.1 Metadata

The IO Manager is responsible for maintaining metadata for determining what node each chunk is stored on. For each stripe processed, the IO Manager should update its local metadata when necessary. The IO Manager is later responsible for utilizing local metadata to locate previously-written chunks for read requests.

6.4.2 Chunks and Replica Chunks

It is also the responsibility of the IO Manager to facilitate Replication to ensure that DecaFS is Fault Tolerant. While processing a striped write request, there is no limitation on the number of chunk write requests or replication requests sent. The only limitation we impose on the IO Manager is that for each stripe, the IO Manager must utilize return parameters to notify Barista Core of the number of Chunk and Replica requests generated in the processing of the stripe. This allows Barista Core to handle Network Write Responses and notify the DecaFS Client on completion of the full write request.

6.4.3 Distribution and Replication Strategy Modules

We provide a default IO Manager to the students that utilizes Mirrored Replication. Our IO Manager uses two sub-modules, Distribution Strategy and Replication Strategy, to determine the node that each chunk (or replica) should be sent to. This allows students to alter the storage location of chunks (and replicas) without writing an entire IO Manager. More information about our Mirrored IO Manager can be found in Section 7.6.

6.4.4 Writes During Node Failures

Writing to new files (or new chunks) can work as described above, if the IO Manager is aware of node failures. For new files, chunks and replica data would be assigned only to nodes that were up at the time of the write. However, updating extant chunks is an issue if a node involved in the chunk/replica write is down. The issue can be described as follows: we have data written on node one with replica data on node three. If node one goes down, the replica can be updated. However, if node one comes back online, or is active again on system start-up at a later date, node one contains stale data.

This issue, and other similar issues, is the motivation behind the Monitored Strategy Module (Section 5.3.10). Monitored Strategy contains mechanisms that enable background processes to do system clean-up, such as this write issue (Section 7.11). In order to resolve stale data from writes during node failures, our recommendation is as follows (a sketch of the suggested bookkeeping appears after the list):

1. If a write request is to extant chunks, write only to nodes that are currently up.

2. Maintain a set of work to be completed in the background. We recommend use of the Persistent Set (Section 7.2). This data needs to persist in case failed nodes do not come online through the execution of DecaFS, since these failures will need to be resolved before a later execution of DecaFS can proceed without error.

3. When a node comes online (we provide a callback for this in Monitored Strategy, Section 7.11), check the set of work for background tasks that the newly "online" node needs to complete.

4. For each task, send write requests to the Espresso Node via the Access Module (Section 7.10) and notify Barista Core of the request (Section 7.1).
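The sketch below shows one possible shape for that bookkeeping: a set of deferred chunk writes that is drained when the node-up callback fires. The types and function names are invented for illustration; the real Persistent Set and Monitored Strategy APIs are described in Sections 7.2 and 7.11.

```cpp
// Hypothetical sketch of the recommendation above: a "pending work" set of
// chunk writes that were skipped because a node was down, drained when the
// node-up callback fires. In DecaFS this set would live in a Persistent Set so
// that it survives restarts; here a plain std::set stands in for it.
#include <cstdint>
#include <set>
#include <tuple>

// (node_id, file_id, stripe_id, chunk_id) identifying a stale chunk to rewrite.
using PendingWrite = std::tuple<uint32_t, uint32_t, uint32_t, uint32_t>;

static std::set<PendingWrite> pending_work;

// Called while processing a write when the target node is down (step 2).
void defer_chunk_write(uint32_t node, uint32_t file, uint32_t stripe, uint32_t chunk) {
  pending_work.insert({node, file, stripe, chunk});
}

// Node-up callback (step 3): replay every deferred write that targets this node.
// send_chunk_write is a stand-in for a call through the Access Module (step 4).
void on_node_up(uint32_t node, void (*send_chunk_write)(const PendingWrite &)) {
  for (auto it = pending_work.begin(); it != pending_work.end();) {
    if (std::get<0>(*it) == node) {
      send_chunk_write(*it);
      it = pending_work.erase(it);
    } else {
      ++it;
    }
  }
}
```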

Figure 6.7: Components involved with a DecaFS write call.

34 Figure 6.8: A successful call to write.

6.5 Close

Similar to Open (Section 6.2), the close call does not leave the Barista Layer. When a DecaFS Client requests to close a file, Barista Core uses Volatile Metadata to remove the file cursor representing the open instance of the file. If the cursor does not exist, or if the calling DecaFS Client is not the owner of the file cursor, this call will fail as seen in A.4. If removal of the cursor is successful, the lock held on the file will be released. Components involved with close can be seen in Figure 6.9 and a successful call to close can be seen in Figure 6.10.

Figure 6.9: Components involved with a DecaFS close call.

36 Figure 6.10: A successful call to close.

6.6 Delete

Similar to Read (Section 6.3) and Write (Section 6.4), the process of deleting a file utilizes every layer of DecaFS. When Barista Core receives a delete request from a DecaFS Client, it ensures that the file exists, and obtains an exclusive lock on the file from the Locking Strategy Module. Errors in either call result in a failed delete as seen in A.5. Barista Core then sends the delete request to the IO Manager.

The IO Manager must determine all of the chunks that correspond to the file (Section 6.6.1) and send a request to the Access Module to delete each chunk. The IO Manager must also notify Barista Core of the number of chunks it requested to delete. This number includes both chunks and potential replica chunks.

Requests pass through the Access Module and are translated into Network Requests. Chunk-level delete requests arrive on the Espresso Layer in the Espresso Storage Module and are deleted from disk. Espresso Storage Responses go through the Network Layer automatically and arrive at Barista Core. Barista Core waits for all chunk responses to arrive before responding to the DecaFS Client that the delete has completed.

A successful call to delete can be seen in Figure 6.12 and components involved can be seen in Figure 6.11.

6.6.1 Metadata

IO Manager must utilize its local metadata discussed in Section 6.4 to locate all chunks (and replica data) for a file on a delete call. This responsibility is due to the fact that deletion is a chunk-level operation, and the IO Manager is responsible for chunk-level processing, as discussed in Section 6.1.

6.6.2 Node Failures

It is also the responsibility of the IO Manager to determine how to handle node failures during or before a delete call. We recommend that the IO Manager skips delete chunk calls to nodes that are down at the time of the call. Students can later clean up missed chunks using a Monitored Strategy discussed in Section 7.11.

Without skipping these chunks, the IO Manager would either have to fail the delete call completely, wait for all nodes to come back up before processing the delete, or leave chunks on nodes that are down indefinitely.

38 Figure 6.11: Components involved with a DecaFS delete call.

39 Figure 6.12: A successful call to delete.

40 6.7 Seek

Seek is another operation that occurs only on the Barista Layer. Components involved can be seen in Figure 6.13 and a successful execution is shown in Figure 6.14. A DecaFS Client's seek request arrives in Barista Core. Barista Core uses Volatile Metadata to ensure that the file cursor in question exists, and then sets the cursor to the new value under the calling DecaFS Client. Errors can occur if the file cursor does not exist, or the calling DecaFS Client is not the same as the client that opened the file. Error workflows can be seen in A.6.

41 Figure 6.13: Components involved with a DecaFS seek call.

Figure 6.14: A successful call to seek.

42 6.8 Stat

Stat is an operation that occurs only on the Barista Layer. Components involved can be seen in Figure 6.15 and a successful call to stat is shown in Figure 6.16. All information about files stored in DecaFS is located in the Persistent Metadata Module. When a stat request arrives at Barista Core, information is queried from Persistent Metadata and returned to the DecaFS Client. An error may occur if the file in question does not exist, as seen in A.7.

6.8.1 Stat and Write Processing

Barista Core handles requests/responses one at a time. If a stat occurs after a write, the stat call will return information for the file as if the write had already completed, even though all chunks of the write request may not have been processed yet.

43 Figure 6.15: Components involved with a DecaFS stat call.

Figure 6.16: A successful call to stat.

CHAPTER 7

DecaFS Implementation

In this chapter we will discuss significant implementation details of the Barista Modules and Espresso Modules. Detailed information for the Network Layer can be found in Jeffrey Forrester's Master's Thesis [4]. My colleague's work describes the Network Layer in depth, as well as the system set-up for our use of DecaFS at Cal Poly.

The Barista Layer is implemented as a group of C++ classes and C-APIs that make up the nine Barista Modules. As discussed in Section 5.3, the main entry point for all calls to any module in the Barista Layer is Barista Core (Section 7.1). We discuss each module in detail, as well as the Espresso Storage Module, in the following sections. Full function declarations and error codes are provided with the APIs for each module in Appendix B.

7.1 Barista Core

Barista Core handles requests from DecaFS Clients, and forwards information to other DecaFS Modules to fulfill them as discussed in CHAPTER 6. The full Barista Core API can be found in B.2 and an overview of the Client API from the Barista Layer can be seen in Table 7.1. In DecaFS, Barista Core Client Functions are called through the Network Layer from DecaFS Client Processes as discussed in my colleague's work [4]. The interaction between Barista Core and DecaFS Clients should not need to be altered by students. All DecaFS Client functions in Barista Core are void functions because client requests are asynchronous. Responses are processed in separate functions, and results are sent over the Network back to DecaFS Clients once processing is complete.

void open_file (pathname, flags, DecaFS Client)
    Get a file descriptor for a DecaFS file pathname. A file is successfully opened for WRITE if no other process has the file open. A file is successfully opened for READ as long as other open requests for read come from the same DecaFS Client.

void read_file (fd, count, DecaFS Client)
    Read count bytes from the open file fd.

void write_file (fd, buf, count, DecaFS Client)
    Write count bytes from buf to the open file fd.

void close_file (fd, DecaFS Client)
    Close an open file fd.

void delete_file (pathname, DecaFS Client)
    Delete a file pathname as long as the file is not open by any DecaFS Client.

void file_seek (fd, offset, whence, DecaFS Client)
    Move the file cursor fd, owned by DecaFS Client, to the position specified by offset and whence.

void file_stat (pathname, buf)
    Receive information from the file pathname. Information includes: file size, file id, stripe size, chunk size, and last access time.

void open_directory (pathname, DecaFS Client)
    Obtain a list of the files and directories that reside in pathname at the time of the call to open_directory (...).

Table 7.1: Barista Core Client API

47 7.1.1 DecaFS Client Request State

In order to accommodate students that lack Network Experience, Barista Core maintains the state of each DecaFS Client Request. Barista Core also responds to DecaFS Clients when all chunked requests are fulfilled. This allows students to send requests to Espresso Nodes through their custom IO Manager Modules (Section 5.3.5) without processing response packets sent back from Espresso Nodes to the Barista Node.

Barista Core monitors the state of each DecaFS Client request by maintaining maps for each request type that requires sub-processing on Espresso Nodes (read, write, and delete). For each request, Barista Core assigns a unique request id. Barista Core provides other Barista Modules with the request id for each striped request it sends to IO Manager as discussed in CHAPTER 6. IO Manager breaks down striped requests into chunked requests. Chunked requests must be sent through the Access Module (Section 5.3.9) with this request id. The Network Layer stores the request id and ensures that Response Packets have the same id associated with them. Barista Core then uses the request id to organize Response Packets and determine when all responses to chunked requests have arrived. Once Barista Core has received all of the responses, the module sends a final response to the DecaFS Client.

7.1.1.1 Request Information

All of Barista Core’s state mappings require some common information for every request type. This information includes the following:

• chunks expected number of chunked responses expected

• file id the file id associated with the request

• client the DecaFS Client associated with the request

48 This general request information is managed through a struct called “request info”.
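A minimal sketch of such a record is shown below. The field and type names are assumptions made for illustration; the actual struct is declared with the Barista Core API in Appendix B.2.

```cpp
// Hedged sketch of the common request-state record described above. Field and
// type names are assumed for illustration, not the actual DecaFS declarations.
#include <cstdint>

struct request_info_sketch {
  uint32_t chunks_expected;   // number of chunked responses expected
  uint32_t file_id;           // the file id associated with the request
  uint32_t client_id;         // the DecaFS Client associated with the request
};
```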

7.1.1.2 Read

In addition to request info, Barista Core needs the following information to manage state for a read request:

• fd: the open file descriptor for the read request

• buf: the buffer where read results should be placed

As discussed briefly in Section 6.3, Barista Core requires the IO Manager to return the number of chunks needed to process each striped request. Barista Core sums these chunk counts from IO Manager and, once all stripes have been processed, sets chunks expected to the sum. Barista Core then checks whether all chunk responses have already arrived. If so, the final response packet is assembled. Otherwise, the module continues to process other requests.

As Barista Core receives read response packets, each packet is added to the read request state mapping. The packets are stored in a std::map keyed by a chunk identifier made up of {file id, stripe id, chunk id}. This chunk identifier key ensures that the responses are stored in the logical order of the file data. For every packet received, Barista Core checks whether all chunk responses have arrived; if not, Barista Core continues to process other requests and responses. Once the final chunk response arrives, Barista Core assembles the final response packet.

This storage of response packets by chunk identifier simplifies the process of assembling the final read buffer. The final packet assembly is a simple iteration through the map of packets: for each packet, chunk data is copied into the final read buffer, and the offset for the copy increases by the size of the chunk. After the map is emptied, Barista Core sends the newly assembled read buffer and the number of bytes read to the DecaFS Client.
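The sketch below illustrates this assembly step. It is an illustration only; the types are simplified stand-ins for the DecaFS chunk identifier and response packets. Because std::map iterates its keys in sorted order, copying each packet in iteration order reproduces the file data in logical order.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

// Simplified stand-in for the {file id, stripe id, chunk num} chunk identifier.
struct chunk_key {
  uint32_t file_id, stripe_id, chunk_num;
  bool operator<(const chunk_key &o) const {
    if (file_id != o.file_id) return file_id < o.file_id;
    if (stripe_id != o.stripe_id) return stripe_id < o.stripe_id;
    return chunk_num < o.chunk_num;
  }
};

// Copy each chunk's data into the final read buffer in key order and return
// the total number of bytes read.
static size_t assemble_read_buffer(
    const std::map<chunk_key, std::vector<uint8_t>> &packets, uint8_t *read_buf) {
  size_t offset = 0;
  for (const auto &entry : packets) {
    std::memcpy(read_buf + offset, entry.second.data(), entry.second.size());
    offset += entry.second.size();   // advance by the size of this chunk
  }
  return offset;
}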

7.1.1.3 Write

Write follows a similar process to read, but requires two separate request ids: one for primary write requests, and one for replica write requests. We require two separate ids since the DecaFS Client needs to be unaware of replication for transparency of the system. The separation of ids allows us to return the correct number of bytes written by a write request.

To manage state for a write request, Barista Core tracks the following information:

• info: the request info for primary chunked writes

• replica info: the request info for replica writes

• fd: the open file descriptor for the write request

Barista Core passes IO Manager both request ids with the calls to process each stripe for a write request. For each striped request, IO Manager must use return parameters to notify Barista Core of the number of primary chunk responses and replica responses created during the stripe's processing. Barista Core again sums these response counts and stores the totals in the write request state mapping. Once all stripes have been processed, Barista Core checks to see if all primary chunk and replica responses have arrived. If so, Barista Core sends the final response packet to the DecaFS Client.

Barista Core does not save write response packets since write data does not need to be returned to the DecaFS Client. Instead, Barista Core sums the number of bytes written for each chunk and stores this count in the write state mapping for use when sending the final response. Both the primary write request id and the replica request id can be used to look up state information in the write state mapping. As write response packets arrive in Barista Core, response counts are incremented in the corresponding request (either info or replica info). Once all responses to both the primary storage and the replica storage have been received, Barista Core sends the final response packet to the DecaFS Client.

7.1.1.4 Delete

Delete requests do not need any additional info to maintain state, so the mapping uses the base information provided in request info. Delete requests work in a similar manner to write, but are simplified since we do not need to differentiate between the deletion of primary storage and replica storage. Barista Core sends a request to IO Manager to delete all chunks associated with a specific file. IO Manager needs to return the number of chunked delete requests it sent out; this includes both primary and replica delete calls. Barista Core then waits for delete response packets to arrive, and, once all responses have been accounted for, sends the final response packet to the DecaFS Client.

7.1.2 Internal DecaFS Requests

In addition to handling DecaFS Client requests and responses, Barista Core facilitates the communication between DecaFS Modules. Modules within DecaFS may need to query one another to get information about the current state of the system. For example, Volatile Metadata contains information about which nodes are up at any given time, and an IO Manager may use this information to avoid sending requests to nodes that are down. Since many of our internals are implemented in C++, we use Barista Core to expose a C-API for student use of all Modules in the system. Barista Core manages instances of other modules and calls through to these instances.

7.2 Persistent STL

DecaFS uses a custom library called "Persistent STL" to manage all metadata that needs to be written to disk. Persistent STL includes a PersistentMap and a PersistentSet. Both C++ classes are exposed for student use to manage additional metadata they may need in custom DecaFS Modules.

PersistentMap and PersistentSet adhere to the std::map and std::set interfaces respectively. In addition, open, flush, and close methods are exposed. PersistentMap and PersistentSet must be opened with a pathname that represents the file where the metadata should be stored. These classes may only be constructed with the default constructor, and a call to open must be made before use. If a PersistentMap or PersistentSet is not closed manually, the destructor will close and flush the file.
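The call pattern looks roughly like the sketch below. The ToyPersistentMap here is a deliberately simplified stand-in (backed by std::map and a plain file write rather than mmap) used only to show the default-construct / open / flush / close sequence described above; it is not the Persistent STL implementation, and its template parameters and file name are illustrative.

#include <cstdint>
#include <fstream>
#include <map>
#include <string>

// Simplified stand-in that mimics the PersistentMap interface described in
// this section; the real class is mmap-based and works on static-sized
// keys and values.
template <typename K, typename V>
class ToyPersistentMap : public std::map<K, V> {
public:
  void open(const std::string &path) { path_ = path; }
  void flush() {
    std::ofstream out(path_, std::ios::binary | std::ios::trunc);
    for (const auto &kv : *this) {
      out.write(reinterpret_cast<const char *>(&kv.first), sizeof(kv.first));
      out.write(reinterpret_cast<const char *>(&kv.second), sizeof(kv.second));
    }
  }
  void close() { flush(); }            // destructor behavior: flush on close
private:
  std::string path_;
};

int main() {
  ToyPersistentMap<uint32_t, uint32_t> extra_metadata;  // default constructor only
  extra_metadata.open("student_module.meta");           // backing metadata file
  extra_metadata[42] = 3;                                // std::map-style access
  extra_metadata.flush();                                // push state to disk
  extra_metadata.close();                                // unclosed maps flush in the destructor
}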

Both classes have similar implementations. They maintain an internal map of pointers to key/value pairs. Pointers in the internal map reference locations in the file that the PersistentMap/PersistentSet was opened with. These files are read directly into memory with the mmap system call [2], and the full data (key/value pairs) is stored in the file. PersistentMap/PersistentSet files work with static-sized keys and values.

Free space in the file is managed by a "free set", which maintains a freelist of pairs of offsets into the file. The free space is then the number of bytes between the first and second offset within the pair. An entry becomes deallocated when it is added to the free set, and the key is zeroed out in the file. Adjacent free spaces are combined to prevent fragmentation issues.

When a PersistentMap/PersistentSet is opened, if the pathname provided is a nonexistent file, a new file is created and mmapped. If the file exists, it is opened and key/value data is read from disk. When a file is opened, all entries with zeroed-out keys are considered "free" and added to the free set in memory.

As mentioned previously, Persistent STL classes can be used by students who need a simple way to persist statically sized collections of information. Within DecaFS we use Persistent STL in the Espresso Storage Module (Section 7.12), the Persistent Metadata Module (Section 7.3), and our implementation of the IO Manager Module (Section 7.6).

7.3 Persistent Metadata

The Persistent Metadata Module is responsible for storing information about which files are stored in DecaFS. Additionally, it maintains and assigns file ids within DecaFS. Persistent Metadata uses PersistentMap (Section 7.2) to maintain a mapping from pathname to file id, and from file id to file metadata information. These PersistentMaps are recovered from disk when the Persistent Metadata Module is initialized on DecaFS start-up. Metadata information includes:

• file id: the id of the file

• size: the size of the file

• stripe size: the stripe size the file was created with (immutable)

• chunk size: the chunk size the file was created with (immutable)

• replica size: the replica size the file was created with (immutable)

• pathname: the name of the file

• last access time: the timestamp for the last access to the file

Persistent Metadata exposes a C-API that allows for querying of metadata information and updating of mutable fields (Table 7.2).

int get_num_files (DecaFS Client)
    Query for the number of files in DecaFS.

int get_filenames (char *filenames_result, size, client)
    Query for the names of all files in DecaFS, up to size. The result is stored in filenames result.

int decafs_file_stat (file_id, buf, DecaFS Client)
    Get information about a file in DecaFS. Information includes: file size, file id, stripe size, chunk size, and last access time.

int set_access_time (file_instance, time, DecaFS Client)
    Set the access time for a file.

int add_file (pathname, stripe_size, chunk_size, replica_size, time, DecaFS Client)
    Add metadata for a new file in DecaFS.

int delete_file_contents (file_id, DecaFS Client)
    Delete metadata for a file in DecaFS.

int update_file_size (file_id, size_delta, DecaFS Client)
    Alter the size of file file id by size delta.

Table 7.2: Persistent Metadata External API

7.4 Volatile Metadata

The Volatile Metadata Module maintains metadata about the current state of the running DecaFS instance. This information includes the management of file cursors (the relation of clients' processes to files) and system health (which nodes are up or down).

These calls can be seen in Table 7.3 and Table 7.4.

uint32_t get_chunk_size () / void set_chunk_size (size)
    Get/set the chunk size for the current instance of DecaFS (configurable in DecaFS start-up).

uint32_t get_stripe_size () / void set_stripe_size (size)
    Get/set the stripe size for the current instance of DecaFS (configurable in DecaFS start-up).

uint32_t get_num_espressos () / int set_num_espressos (num)
    Get/set the number of Espresso nodes expected to be connected in this instance of DecaFS (configurable in DecaFS start-up).

uint32_t set_node_down (node) / uint32_t set_node_up (node)
    Change a node's status to down/up in the current instance of DecaFS.

bool is_node_up (node)
    Query for the status of a node (down/up).

int get_active_node_count () / active_nodes get_active_nodes ()
    Query for the number/node numbers of "active" (up) nodes in the current instance of DecaFS.

Table 7.3: Volatile Metadata (System Health) External API

int new_file_cursor (file_id, DecaFS Client)
    Get a new file cursor for the file specified by file id under a specific DecaFS Client.

int close_file_cursor (fd, DecaFS Client)
    Close an open file cursor fd under a specific DecaFS Client. This call fails if the DecaFS Client provided did not open the file fd.

int get_file_cursor (fd)
    Get the current offset within the file specified by fd.

int set_file_cursor (fd, offset, DecaFS Client)
    Set the file cursor fd to offset under a specific DecaFS Client. This call fails if the DecaFS Client provided did not open the file fd.

file_instance get_file_info (fd)
    Get information about the current instance of a file fd. This information includes: the file id, the DecaFS Client who opened the file, and the current position of the cursor.

uint32_t get_new_request_id ()
    Get a new request id, unique to this instance of DecaFS.

Table 7.4: Volatile Metadata (File Cursor) External API

7.5 Locking Strategy

Barista Core (Section 7.1) is responsible for the enforcement of locks within DecaFS since this module is the entry point for DecaFS Client Requests in the system. However, Barista Core uses the Locking Strategy Module to obtain and release locks. The API Barista Core uses is described in Table 7.5.

An owning entity for a lock is a DecaFS Client, a single user running on a specific node. Only one entity can own a lock at a time. Within the DecaFS Client, multiple processes can own a shared lock or a single process can own an exclusive lock. We do not support upgrading locks, and all locks on a file must be released to change owning entities.
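The toy model below restates these rules in code. It is an illustration only, not the Locking Strategy source: clients are identified by a string for brevity, a file has at most one owning entity, processes of that same client may share a shared lock, exclusive locks are never shared, and there is no upgrade path.

#include <cstdint>
#include <map>
#include <string>

// Toy model of the locking semantics described above.
struct toy_lock_table {
  struct entry { std::string owner; bool exclusive; int holders; };
  std::map<uint32_t, entry> locks;    // file_id -> current lock state

  int get_shared(const std::string &client, uint32_t file_id) {
    auto it = locks.find(file_id);
    if (it == locks.end()) { locks[file_id] = {client, false, 1}; return 0; }
    if (it->second.owner != client || it->second.exclusive) return -1;
    it->second.holders++;             // another process of the same client
    return 0;
  }

  int get_exclusive(const std::string &client, uint32_t file_id) {
    if (locks.count(file_id)) return -1;   // no second owner and no upgrades
    locks[file_id] = {client, true, 1};
    return 0;
  }

  int release(const std::string &client, uint32_t file_id) {
    auto it = locks.find(file_id);
    if (it == locks.end() || it->second.owner != client) return -1;
    if (--it->second.holders == 0) locks.erase(it);
    return 0;
  }
};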

int get_exclusive_lock (DecaFS Client, file_id)
    Attempt to get an exclusive lock (read/write) on a file file id.

int get_shared_lock (DecaFS Client, file_id)
    Attempt to get a shared lock (read only) on a file file id.

int release_lock (DecaFS Client, file_id)
    Release a lock on a specific file, file id, under a specific DecaFS Client.

int has_exclusive_lock (DecaFS Client, file_id)
    Query whether or not this DecaFS Client has an exclusive lock on file file id.

int has_shared_lock (DecaFS Client, file_id)
    Query whether or not this DecaFS Client has a shared lock on file file id.

Table 7.5: Locking Strategy External API

7.6 IO Manager

As discussed in Section 5.3.5 and in Chapter 6, the IO Manager Module is responsible for the conversion between striped and chunk-level requests in DecaFS. IO Manager is one of the main modules that students will need to implement to change DecaFS behavior.

We have provided a basic version of IO Manager as a sample that implements mirrored replication and supports single-node failures. Our IO Manager implementation uses two PersistentMaps (Section 7.2) to keep mappings from each file chunk to the node it is stored on, and from each chunk's replica to the node it is stored on.

We define a file chunk to have the following properties:

• file id: the id of the file the chunk belongs to

• stripe id: the stripe within the file that the chunk belongs to

• chunk num: the order of this chunk within the stripe

For each chunk processed by IO Manager, we add the metadata needed to track that chunk to these maps. Our IO Manager uses the Distribution Strategy (Section 7.7) and Replication Strategy (Section 7.8) modules to determine where each chunk and replica chunk should be sent. This way, students are able to rewrite the Distribution and Replication Modules while still using our Mirrored IO Manager.

Our IO Manager Module and our Distribution/Replication Strategy Modules query Volatile Metadata (Section 7.4). IO Manager will only send read/write requests to nodes that are up, and will retry assigning a node to a chunk if the node requested from the Distribution/Replication Strategy is down.
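A minimal sketch of that placement logic appears below. It is an illustration only: the strategy and system-health calls are stubbed out, pick_any_up_node is a hypothetical fallback helper, and the real IO Manager also records each placement with set_node_id/set_replica_node_id from Table 7.6.

#include <cstdint>

// Stand-ins for the Distribution Strategy (Table 7.7), Replication Strategy
// (Table 7.8), and Volatile Metadata (Table 7.3) calls.
static int  put_chunk(uint32_t, uint32_t, uint32_t chunk_num)   { return (chunk_num % 2) ? 1 : 2; }
static int  put_replica(uint32_t, uint32_t, uint32_t chunk_num) { return (chunk_num % 2) ? 3 : 4; }
static bool is_node_up(int node)                                { return node != 2; }
static int  pick_any_up_node()                                  { return 1; }  // hypothetical fallback

// Choose primary and replica nodes for one chunk, retrying with a fallback
// node when the strategy's first choice is currently down.
static void place_chunk(uint32_t file_id, uint32_t stripe_id, uint32_t chunk_num,
                        int &node, int &replica_node) {
  node = put_chunk(file_id, stripe_id, chunk_num);
  if (!is_node_up(node))
    node = pick_any_up_node();

  replica_node = put_replica(file_id, stripe_id, chunk_num);
  if (!is_node_up(replica_node))
    replica_node = pick_any_up_node();
}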

uint32_t process_read_stripe (request_id, file_id, pathname, stripe_id, stripe_size, chunk_size, buf, offset, count)
    Break down a striped read request into corresponding chunked read request(s) and return the number of chunks needed to properly read the stripe.

uint32_t process_write_stripe (request_id, replica_request_id, chunks_written, replica_chunks_written, file_id, pathname, stripe_id, stripe_size, chunk_size, buf, offset, count)
    Break down a striped write request into corresponding chunked write request(s). Return the number of primary storage chunk writes and replica writes in chunks written and replica chunks written.

uint32_t process_delete_file (request_id, file_id)
    Delete all chunks that belong to file file id.

int set_node_id (file_id, stripe_id, chunk_num, node_id) / int set_replica_node_id (file_id, stripe_id, chunk_num, node_id)
    Set the storage location (node id) for the chunk/replica specified by {file id, stripe id, chunk num}.

int get_node_id (file_id, stripe_id, chunk_num) / int get_replica_node_id (file_id, stripe_id, chunk_num)
    Get the storage location (node id) for the chunk/replica specified by {file id, stripe id, chunk num}.

Table 7.6: IO Manager External API

7.7 Distribution Strategy

We implemented a basic Distribution Strategy for our IO Manager to use. This strategy sends odd chunks to node one and even chunks to node two. Students can implement more complex strategies.

int put_chunk (file_id, pathname, stripe_id, chunk_num)
    Determine which node a specific chunk should be sent to.

Table 7.7: Distribution Strategy API
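A sketch of this basic strategy is shown below. The parameter names follow Table 7.7; this is an illustration of the odd/even placement described above, not the exact DecaFS source.

#include <cstdint>

extern "C" int put_chunk(uint32_t file_id, char *pathname,
                         uint32_t stripe_id, uint32_t chunk_num) {
  (void)file_id; (void)pathname; (void)stripe_id;   // unused by this basic strategy
  return (chunk_num % 2 == 1) ? 1 : 2;              // odd -> node one, even -> node two
}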

7.8 Replication Strategy

We implemented a basic Replication Strategy for our IO Manager to use. This strat- egy sends odd chunks to node three and even chunks to node four. Students can implement more complex strategies.

int put_replica (file_id, pathname, stripe_id, chunk_num)
    Determine which node a specific replica should be sent to.

Table 7.8: Replication Strategy API
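The corresponding sketch for the basic Replication Strategy follows the same pattern (parameters as in Table 7.8; illustration only):

#include <cstdint>

extern "C" int put_replica(uint32_t file_id, char *pathname,
                           uint32_t stripe_id, uint32_t chunk_num) {
  (void)file_id; (void)pathname; (void)stripe_id;
  return (chunk_num % 2 == 1) ? 3 : 4;              // odd -> node three, even -> node four
}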

7.9 Node Failures for our Distribution/Replication Strategy

Our basic Distribution and Replication Strategies are set up to support single-node failures. We treat each pair of nodes as a "node group", so the Distribution Strategy uses node group one (nodes one and two) and the Replication Strategy uses node group two (nodes three and four).

7.9.1 Read

All chunk reads attempt to receive chunk data from the primary node first (the Distribution node). If the node where the primary chunk data was written is down at the time of the read, the replica chunk data is accessed instead.
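A minimal sketch of this read-time fallback is shown below. get_node_id/get_replica_node_id follow Table 7.6 and is_node_up follows Table 7.3, but here they are stubbed so the sketch stands alone; it is an illustration, not the DecaFS source.

#include <cstdint>

static bool is_node_up(int node)                              { return node != 1; } // pretend node 1 is down
static int  get_node_id(uint32_t, uint32_t, uint32_t)         { return 1; }
static int  get_replica_node_id(uint32_t, uint32_t, uint32_t) { return 3; }

// Return the node that should serve this chunk read.
static int node_for_read(uint32_t file_id, uint32_t stripe_id, uint32_t chunk_num) {
  int primary = get_node_id(file_id, stripe_id, chunk_num);
  if (is_node_up(primary))
    return primary;                                            // normal case
  return get_replica_node_id(file_id, stripe_id, chunk_num);   // fail over to the replica
}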

7.9.2 Write

If a node is down at the time of a write, all data within the node group goes to one node. For example, if node two is down and a write occurs, all chunk data for the write goes to node one.

7.10 Access Module

Similar to the Distribution and Replication Strategy Modules (Section 7.7, Section 7.8), the Access Module is implemented in its most basic form and can be extended by students. We provide a base Access Module that is the Barista Layer's hook into the Network to communicate with the Espresso Nodes. Chunked requests for the Espresso Nodes need to be sent through the Access Module with the API defined in Table 7.9.

ssize_t process_read_chunk (request_id, fd, file_id, node_id, stripe_id, chunk_num, offset, buf, count)
    Send a request to read count bytes into buf at chunk offset offset for the chunk defined by {file id, stripe id, chunk num} to espresso node node id.

ssize_t process_write_chunk (request_id, fd, file_id, node_id, stripe_id, chunk_num, offset, buf, count)
    Send a request to write count bytes from buf at chunk offset offset to the chunk defined by {file id, stripe id, chunk num} to espresso node node id.

ssize_t process_delete_chunk (request_id, file_id, node_id, stripe_id, chunk_num)
    Send a request to delete the chunk defined by {file id, stripe id, chunk num} to espresso node node id.

Table 7.9: Access API

7.11 Monitored Strategy

The Monitored Strategy Module allows students to implement their own system monitoring, metadata, and failure handlers. We provide a method strategy startup (), described in Table 7.10, for students to register their custom functions (Table 7.11) with Barista Core.

In strategy startup () students should use the registration functions to register all functions they have implemented. strategy startup () will be called as part of the DecaFS system startup process, and is guaranteed to execute before DecaFS receives requests from DecaFS Clients.

One example of a task that can be implemented in this module is the cleanup of resources after a failed node recovers. For our implementation of IO Manager, when a delete file call occurs, we send delete chunk requests for all chunks and replicas that reside on nodes that are up. However, if a node is down, we do not later go back and delete chunks/replicas that are stored on that node. To free up this space, we could implement a node up handler () to clean up chunks that should have been deleted.

void strategy_startup ()
    A function called during system startup for the registration of monitoring modules.

Table 7.10: Monitored Strategy API

void register_monitor_module (void (*f), timeout)
    Register a function f to be called every timeout.

void register_node_failure_handler (void (*failure_handler)(node number))
    Register a function to be called when a node goes down.

void register_node_up_handler (void (*handler)(node number))
    Register a function to be called when a node goes up.

Table 7.11: Custom Strategy Registration API
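For example, the node-up cleanup handler mentioned above could be registered as in the sketch below. This is an illustration only: cleanup_stale_chunks is a hypothetical handler, node_number is an assumed typedef for the node identifier, and the registration declaration would normally come from the Monitored Strategy header rather than being repeated here.

#include <cstdint>
#include <cstdio>

typedef uint32_t node_number;   // assumed node identifier type

// Normally declared by the Monitored Strategy header (Table 7.11).
extern "C" void register_node_up_handler(void (*handler)(node_number));

// Hypothetical handler: remove chunks/replicas that should have been deleted
// while this node was down.
static void cleanup_stale_chunks(node_number node) {
  std::printf("node %u is back up: cleaning up stale chunks\n", node);
}

extern "C" void strategy_startup() {
  // Called once during DecaFS start-up, before any client requests arrive.
  register_node_up_handler(cleanup_stale_chunks);
}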

7.12 Espresso Storage

As discussed in Section 5.4.1, the Espresso Storage Module is responsible for storing file data on disk.

The module uses three files to store file data:

1. Raw Data File

2. Metadata File

3. Free Set File

Both the Metadata File and the Free Extent Set File use the Persistent STL (Section 7.2). The data file is a raw, unordered, packed set of file chunks. The Metadata File and the Free Extent Set File use the following information with the Persistent STL libraries:

• data descriptor

  – file id: the id of the file the chunk belongs to

  – stripe id: the stripe within the file that the chunk belongs to

  – chunk num: the order of this chunk within the stripe

• data address

  – offset: the offset into a raw storage file

  – size: the size of the data stored at offset

The Metadata File is a PersistentMap that maps chunks (data descriptor) to their offsets within the raw data file and their sizing information (data address). The Free Extents File is a collection of start/end pairs that signify free space.

7.12.1 read chunk()

When a read chunk() request arrives at the Espresso Storage Module, the module looks up the data address of the chunk (data descriptor) in the metadata PersistentMap. The metadata contains the offset into the raw data file for the chunk data, so the Espresso Storage Module can seek to the offset that was looked up and read the data.

7.12.2 write chunk()

When Espresso Storage receives a write chunk() request, the module scans the free extent set for an extent with sufficient size for the write. This scan occurs if the chunk does not exist, or if the existing chunk is not large enough. When an extent is chosen to place the chunk, it is shrunk down by the size of the write, or removed from the Free Extent Set, and the write is issued. If a write exceeds the size of the chunk in its current location in the raw data file, the chunk grows in place if possible. If not, the chunk is moved to a location in the file that can hold the new size of the chunk.
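The sketch below illustrates the free-extent search just described: pick an extent large enough for the write, then either shrink it or remove it from the free set. The extent representation (start/end offset pairs in a std::set) mirrors the Free Extent Set description; this is an illustration, not the Espresso Storage source.

#include <cstdint>
#include <set>
#include <utility>

using extent = std::pair<uint32_t, uint32_t>;   // [start, end) offsets in the raw data file

// Returns the offset where the chunk should be written, or -1 if no extent fits.
static int64_t allocate_from_free_set(std::set<extent> &free_set, uint32_t size) {
  for (auto it = free_set.begin(); it != free_set.end(); ++it) {
    uint32_t extent_size = it->second - it->first;
    if (extent_size < size)
      continue;                                  // too small, keep scanning
    uint32_t offset = it->first;
    extent remainder(it->first + size, it->second);
    free_set.erase(it);                          // remove the chosen extent
    if (remainder.first < remainder.second)
      free_set.insert(remainder);                // shrink: keep the unused tail
    return offset;
  }
  return -1;
}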

7.12.3 delete chunk()

When a delete chunk() request arrives at the Espresso Storage Module, the module adds the freed data to the Free Extent Set. Adjacent free extents are merged.

ssize_t read_chunk (fd, file_id, stripe_id, chunk_num, offset, buf, count)
    Read count bytes at offset from the chunk defined by {file id, stripe id, chunk num} into buf.

ssize_t write_chunk (fd, file_id, stripe_id, chunk_num, offset, buf, count)
    Write count bytes at offset from buf into the chunk defined by {file id, stripe id, chunk num}.

int delete_chunk (fd, file_id, stripe_id, chunk_num)
    Delete the chunk defined by {file id, stripe id, chunk num}.

Table 7.12: Espresso Storage External API

7.13 FUSE

FUSE is a "simple library API" that allows developers to implement a "fully functional filesystem in a userspace program [21]." We provide an implementation of the FUSE interface that allows students to mount DecaFS. Our FUSE implementation is a wrapper around our DecaFS Client [4] that supports the following POSIX operations: getattr, mkdir, unlink, rmdir, open, read, write, close, opendir, readdir, closedir, create.
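To illustrate the wrapper idea, the sketch below shows FUSE 2.x style callbacks forwarding into client calls. The decafs_open/decafs_read functions here are hypothetical stand-ins for the DecaFS Client wrappers, and only two of the operations listed above are shown; our actual FUSE implementation covers the full list.

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <cerrno>

// Hypothetical DecaFS Client wrappers (not the real client API).
extern "C" int decafs_open(const char *path, int flags);
extern "C" int decafs_read(int fd, char *buf, size_t count, off_t offset);

static int decafs_fuse_open(const char *path, struct fuse_file_info *fi) {
  int fd = decafs_open(path, fi->flags);
  if (fd < 0) return -ENOENT;
  fi->fh = fd;                         // stash the DecaFS fd for later callbacks
  return 0;
}

static int decafs_fuse_read(const char *path, char *buf, size_t count,
                            off_t offset, struct fuse_file_info *fi) {
  (void)path;
  return decafs_read((int)fi->fh, buf, count, offset);
}

static struct fuse_operations decafs_ops = {};

int main(int argc, char *argv[]) {
  decafs_ops.open = decafs_fuse_open;
  decafs_ops.read = decafs_fuse_read;
  return fuse_main(argc, argv, &decafs_ops, NULL);
}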

CHAPTER 8

Testing and Validation

In order to test for expected behavior, we utilized Google Test [12] and Google Mock [11] to verify modules and sub-systems (layers). Our formal testing efforts focused on the verification of each module, or of a sub-system (Barista, Network, Espresso) when the inter-module dependencies were too high to test the module independently. This chapter will describe our testing of the Barista Layer and the Espresso Layer. More information about Network Layer testing can be found in my colleague's Master's Thesis [4].

8.1 Google Test and Google Mock

Google Test is a C++ testing framework based on the xUnit architecture [12]. It supports “automatic test discovery, a rich set of assertions, user-defined assertions, death tests, fatal and non-fatal failures, value- and type-parameterized tests, various options for running the tests, and XML test report generation [12].” Google Test provides some unique features that help with debugging memory issues that surface only some of the time. For example, you can provide a flag that causes a certain test to repeat x number of times. Additionally, you can cause a test (or test suite) to automatically break on failure and launch a debugger [19]. Test filtering is also provided to allow developers to run a subset of tests based on matching a search string with test names [19].
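For concreteness, a minimal Google Test case and the standard command-line options that correspond to the features mentioned above are sketched below (the test body is illustrative only):

#include <gtest/gtest.h>

TEST(EspressoStorageExample, WriteThenRead) {
  int stored = 42;                 // stand-in for data written to a chunk
  EXPECT_EQ(42, stored);           // simple assertion
}

int main(int argc, char **argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}

// Example invocations:
//   ./tests --gtest_repeat=100 --gtest_break_on_failure   # hunt intermittent memory bugs
//   ./tests --gtest_filter=EspressoStorage*               # run a subset of tests by name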

Google Mock is a C++ mocking framework, compatible with Google Test, for writing C++ Mock Classes [11]. Google Mock allows developers to "create mock classes trivially using simple macros" and use a "rich set of matchers and actions" for various expectations [11].

8.2 Espresso

Testing for Espresso Storage (Section 7.12) focuses on ensuring that chunks of any size can be stored properly. These tests are implemented using Google Test. Espresso Storage tests rely on a fixture [12] that creates a new chunk file and both metadata files on start-up and removes these files during tear-down. This fixture enables the Espresso Storage Module to be tested without initialization of an entire Espresso node.

After start-up and file creation with the test fixture, unit tests exercise data storage logic within the Espresso Storage Module.

These tests address issues such as:

• Data can be written

• Data can be read

• Storage is compact

• Chunks may be reallocated if they become too large for their storage location

• Adjacent free blocks are merged
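A minimal sketch of a fixture in the spirit described above is shown below: per-test files are created in SetUp() and removed in TearDown(). The file names and the test body are illustrative, not the actual DecaFS test code.

#include <gtest/gtest.h>
#include <cstdio>
#include <fstream>

class EspressoStorageTest : public ::testing::Test {
protected:
  void SetUp() override {
    // Create an empty chunk file and metadata files for this test run.
    std::ofstream("test_chunk_file").close();
    std::ofstream("test_metadata_file").close();
    std::ofstream("test_free_extents_file").close();
  }
  void TearDown() override {
    std::remove("test_chunk_file");
    std::remove("test_metadata_file");
    std::remove("test_free_extents_file");
  }
};

TEST_F(EspressoStorageTest, DataCanBeWrittenAndRead) {
  // Real tests would call write_chunk()/read_chunk() against the fixture files.
  SUCCEED();
}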

8.3 Barista

Testing the Barista Layer was more complex than testing the Espresso Layer due to dependencies between modules.

8.3.1 Independent Modules

Some of the modules in the Barista Layer were simple to test since they did not depend on the use of other modules. These modules were tested in a similar fashion to Espresso Storage (Section 8.2). We used Google Test to verify expected behavior with simple assertions.

Independent Modules and some issues their tests address are:

• Persistent STL (Section 7.2)

  – Test an instance of the persistent classes for basic behavior as defined by std::map and std::set

• Persistent Metadata Module (Section 7.3)

  – Ensure that files may be added (fail if the file exists or the name is invalid)

  – Ensure that the number of files corresponds to the number of files successfully added

  – Ensure that metadata about stored files is correct

  – Ensure file size and access time may be modified

  – Ensure that files may be deleted

• Locking Strategy Module (Section 7.5)

  – Ensure that locks can be acquired

  – Ensure that locks contain the proper data about the lock owner

  – Ensure that locks cannot be upgraded or downgraded

  – Ensure that multiple clients cannot hold any lock on the same file

  – Ensure that processes from the same client can hold shared locks

We consider Persistent Metadata an "independent module" even though it utilizes PersistentMap, since it can be recompiled and tested with std::map if desired.

8.3.2 Dependent Modules

Dependent modules were primarily tested manually at the system level. In order to provide more automated testing of dependent modules, we used Google Mock [11] to implement Mock classes that specify the behavior of the dependencies. These mocks allow us to test each module independently, since dependent behavior is specified when defining the mock. After mocks are implemented for dependencies, the original module can be unit-tested with Google Test.

8.3.2.1 Volatile Metadata

The simplest mock example is the one that facilitates the testing of the Volatile Metadata module (Section 7.4). Volatile Metadata is primarily an independent module, but it depends on Persistent Metadata (Section 7.3) for file cursor positions. This dependency exists because, for simplicity, we do not allow the file cursor's position to move past the end of a file. In order to test Volatile Metadata completely independently, we need a mechanism for the module to stat a file in order to check the size of the file before moving a file cursor. We implemented a Mock Persistent Metadata Module that gives pre-defined file stat information for the Volatile Metadata test to use. This mocked information does not affect the validity of our test since we are simply attempting to verify that the file cursor cannot be moved past the end of the file (past file size).
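A minimal sketch of this mocking approach with Google Mock follows. The PersistentMetadataInterface, its decafs_file_stat method, and the toy_file_stat struct are simplified stand-ins for the real module; the point is only that the mock hands back a pre-defined file size for the cursor check.

#include <gmock/gmock.h>
#include <gtest/gtest.h>
#include <cstdint>

struct toy_file_stat { uint32_t size; };

class PersistentMetadataInterface {
public:
  virtual ~PersistentMetadataInterface() {}
  virtual int decafs_file_stat(uint32_t file_id, toy_file_stat *buf) = 0;
};

class MockPersistentMetadata : public PersistentMetadataInterface {
public:
  MOCK_METHOD2(decafs_file_stat, int(uint32_t file_id, toy_file_stat *buf));
};

TEST(VolatileMetadataExample, CursorCannotPassEndOfFile) {
  using ::testing::_;
  using ::testing::DoAll;
  using ::testing::Return;
  using ::testing::SetArgPointee;

  MockPersistentMetadata metadata;
  // Every stat call reports a 100-byte file, regardless of file id.
  EXPECT_CALL(metadata, decafs_file_stat(_, _))
      .WillRepeatedly(DoAll(SetArgPointee<1>(toy_file_stat{100}), Return(0)));

  toy_file_stat stat_buf{};
  metadata.decafs_file_stat(1, &stat_buf);
  // A real test would now call set_file_cursor() and assert it is clamped to 100.
  EXPECT_EQ(100u, stat_buf.size);
}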

8.3.2.2 Other Mocks

We needed the following mocks to test dependent modules:

• Volatile Metadata (Section 7.4)

  – Persistent Metadata (Section 7.3) needs to be mocked (as described above) so that Volatile Metadata may stat information about a file

• Distribution Strategy (Section 7.7)

  – Volatile Metadata (Section 7.4) needs to be mocked so that this strategy module can query for system health (node up/down)

  – Student implementations may require mocks of other classes for more complex distribution strategies, such as Persistent Metadata (Section 7.3) for strategies that depend on file size (and therefore require access to the stat function)

• Replication Strategy (Section 7.8)

  – Volatile Metadata (Section 7.4) needs to be mocked so that this strategy module can query for system health (node up/down)

  – Student implementations may require mocks of other classes for more complex replication strategies, such as Persistent Metadata (Section 7.3) for strategies that depend on file size (and therefore require access to the stat function)

In our current implementation, the Access Module (Section 7.10) and the Monitored Strategy Module (Section 7.11) are implemented only as call-throughs, and do not provide any additional behavior. We have manually verified with logs that these call-throughs occur.

8.4 Data Storage

In order to test the highly dependent Barista Core (Section 7.1) and IO Manager (Section 7.6) modules, we added a stat function in addition to examining system behavior through logs. The client function file storage stat() is used to verify the storage location(s) for the chunks of a given file. This function is also useful in manual verification of Distribution/Replication strategies (Section 7.7 and Section 7.8).

Sample Execution:

This sample execution was run on a DecaFS configuration with a Barista Node and four Espresso Nodes. More information about system configuration and startup can be found in my colleague's work [4]. This version of DecaFS uses a mirrored IO Manager (Section 7.6) and basic Distribution/Replication strategies (Section 7.7 and Section 7.8).

File Data:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58

59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76

77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94

95 96 97 98 99 100

Write information:

• size: 291 bytes

• stripe size: 256 bytes

• chunk size: 128 bytes

– (stripe 1, chunk 1) - 128 bytes

Primary Data: Node 1, Replica Data: Node 3

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46

– (stripe 1, chunk 2) - 128 bytes

Primary Data: Node 2, Replica Data: Node 4

47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70

71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 8

– (stripe 2, chunk 1) - 35 bytes

Primary Data: Node 1, Replica Data: Node 3

9 90 91 92 93 94 95 96 97 98 99 100

After the write was complete and the file was closed, we used a test DecaFS Client to call file storage stat().

Storage Stat Output:

{
  "file_id": 1
  "stripe_size": 256
  "chunk_size": 128
  "stripes": [
    {
      "stripe_id": 1
      "chunks": [
        {
          "chunk_num": 1
          "node": 1
          "replica_node": 3
        }
        {
          "chunk_num": 2
          "node": 2
          "replica_node": 4
        }
      ]
    }
    {
      "stripe_id": 2
      "chunks": [
        {
          "chunk_num": 1
          "node": 1
          "replica_node": 3
        }
      ]
    }
  ]
}

CHAPTER 9

Conclusions

9.1 Requirements

Here we will address how DecaFS meets our requirements from Chapter 4.

• REQ-1: Students shall be able to develop architectural components of a distributed system.

  We presented DecaFS as a modular DFS so that students can develop modules (architectural components) of the system.

• REQ-2: Students shall be able to build applications for a distributed system.

  DecaFS is mountable via FUSE, so student applications can use DecaFS to store files.

• REQ-3: Students shall be able to change which node(s) data is stored on/recovered from.

  Students may write Distribution and Replication Strategy Modules (Section 7.7, Section 7.8) to determine which node(s) data is stored on, without modifying other modules.

• REQ-4: Students shall be able to change the replication policies of the system. The system should support no replication, mirroring, and some RAID [3] implementations.

  Students can write an IO Manager Module (Section 7.6) and Distribution and Replication Strategy Modules (Section 7.7, Section 7.8) to determine what types of replication the system uses. By default, we provide an IO Manager that supports mirroring. However, since the IO Manager has full control over storage at a stripe level, students may implement various replication policies, including RAID. More information about IO Manager Replication Labs is available in my colleague's work [4].

• REQ-5: The DFS shall be mountable with FUSE [21].

  We provide a FUSE Client for DecaFS.

• REQ-6: The system shall be able to tolerate at least one worker-node failure.

  With our default IO Manager, single-node failures are supported, since we provide mirrored replication. Students may also achieve single-node failure support with a RAID Replication IO Manager.

• REQ-7: DecaFS System Administrators shall be able to set the maximum possible file size for the DFS.

  File size maximums are set through a configuration file for DecaFS start-up. This is discussed in my colleague's work [4].

• REQ-8: DecaFS System Administrators shall be able to set the size (in bytes) of the stripes for each file, where a stripe is the maximum number of bytes of file data that are broken up into pieces and distributed for storage.

  Stripe size is set through a configuration file for DecaFS start-up. This is discussed in my colleague's work [4].

• REQ-9: DecaFS System Administrators shall be able to set the size (in bytes) of the chunks for each file, where a chunk is the maximum number of bytes of a stripe of file data stored at a time by one write to one storage node.

  Chunk size is set through a configuration file for DecaFS start-up. This is discussed in my colleague's work [4].

9.2 Discussion

As seen in Section 9.1, DecaFS was designed and implemented to meet our requirements. These requirements were created to ensure that DecaFS would be usable in an educational context. Due to the educational goals of our work, some common requirements, such as performance, were not considered; leaving them out keeps our implementation as simple as possible. Simplicity is an important goal of the system, ensuring that the start-up cost of learning the system APIs is as low as possible.

Our work has shown the feasibility of developing a DFS in modules. We hope that future use of DecaFS in Cal Poly classrooms will help students learn about distributed systems. Overall, DecaFS is a response to Hoganson, Google, IBM, and others [6,13], providing an example of a mechanism for bringing distributed systems into the educational space.

CHAPTER 10

Future Work

We will improve this project in the upcoming academic year (2014-2015) in various areas. We hope the project can continue to grow as it is used in Cal Poly courses. The areas of improvement described in this chapter were not addressed in our current work due to time constraints.

10.1 Classroom Use

DecaFS was designed to be used in an educational setting. We have implemented versions of the modules that students will be expected to implement and provided samples of the types of projects that may be assigned to students [4]. However, we have not yet tested the system in an actual classroom setting. We hope that throughout the system's use at Cal Poly it can continue to be improved based on student feedback.

10.2 Testing

As discussed in Chapter 8, we have provided basic testing to verify that DecaFS works as a minimally functional DFS. However, we would like to expand our testing methods in the future to ensure that DecaFS is behaving as expected.

10.2.1 Automated System Tests

As discussed in Chapter 8, we focused our testing efforts on Module and Sub-System testing. However, much of our effort to test DecaFS as a whole was through manual testing. We hope that future work on this project can develop a system test framework that can help verify the system's behavior automatically.

Additionally, a system-wide test suite would be beneficial because it could allow for automated testing of student submissions if the suite were altered to test for the behavior required by various assignments.

10.2.2 API Usability

Our work exposes APIs for students to use in developing individual modules or layers. Research has been done in the area of API usability testing [8, 20]. Performing a usability study on our APIs could help make our system more usable for student work.

BIBLIOGRAPHY

[1] Kosmosfs. https://code.google.com/p/kosmosfs/.

[2] mmap(2). http://man7.org/linux/man-pages/man2/mmap.2.html.

[3] P. M. Chen, E. K. Lee, G. A. Gibson, R. H. Katz, and D. A. Patterson. RAID: High-performance, reliable secondary storage. ACM Comput. Surv., 26(2):145–185, June 1994.

[4] J. Forrester. Master's thesis, California Polytechnic State University, San Luis Obispo, CA, United States, 2014.

[5] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. SIGOPS Oper. Syst. Rev., 37(5):29–43, Oct. 2003.

[6] K. Hoganson. Computer science curricula in a global competitive environment. J. Comput. Sci. Coll., 20(1):168–177, Oct. 2004.

[7] E. Levy and A. Silberschatz. Distributed file systems: Concepts and examples. ACM Comput. Surv., 22(4):321–374, Dec. 1990.

[8] M. Piccioni, C. A. Furia, and B. Meyer. An empirical study of API usability.

[9] M. Ovsiannikov, S. Rus, D. Reeves, P. Sutter, S. Rao, and J. Kelly. The Quantcast file system. Proc. VLDB Endow., 6(11):1092–1101, Aug. 2013.

[10] Gemisus. About MooseFS. http://www.moosefs.org/.

[11] Google, Inc. Google Mock. https://code.google.com/p/googlemock/.

[12] Google, Inc. Google Test. https://code.google.com/p/googletest/.

[13] Google, Inc. Google and IBM announce university initiative to address internet-scale computing challenges. http://googlepress.blogspot.com/2007/10/google-and-ibm-announce-university_08.html, Oct. 2007.

[14] International Business Machines Corporation and others. OpenAFS. http://www.openafs.org/.

[15] The Apache Software Foundation. HDFS architecture guide. http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html.

[16] University of Pittsburgh. Andrew File System (AFS). http://technology.pitt.edu/network-web/hosting-timesharing/afs.html.

[17] Zuse Institute Berlin. XtreemFS. http://www.xtreemfs.org/.

[18] Z. Ruan and W. F. Tichy. Performance analysis of file replication schemes in distributed systems. In Proceedings of the 1987 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '87, pages 205–215, New York, NY, USA, 1987. ACM.

[19] A. Sen. A quick introduction to the Google C++ testing framework. http://www.ibm.com/developerworks/aix/library/au-googletestingframework.html.

[20] J. Stylos and B. A. Myers. The implications of method placement on API learnability. In Proceedings of the 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering, SIGSOFT '08/FSE-16, pages 105–112, New York, NY, USA, 2008. ACM.

[21] M. Szeredi. FUSE: Filesystem in Userspace. http://fuse.sourceforge.net/.

[22] A. Wachsmann. AFS—a secure distributed filesystem, part III. Linux J., 2005(132):8–, Apr. 2005.

[23] J. Wein, K. Kourtchikov, Y. Cheng, R. Gutierez, R. Khmelichek, M. Topol, and C. Sherman. Virtualized games for teaching about distributed systems. SIGCSE Bull., 41(1):246–250, Mar. 2009.

APPENDIX A

Workflows for Failures

A.1 Open Failures

Figure A.1: A failed call to open due to the DecaFS Client being unable to obtain a shared lock on the file.

Figure A.2: A failed call to open due to the file not being found.

Figure A.3: A failed call to open due to the DecaFS Client being unable to obtain an exclusive lock on the file.

A.2 Read Failures

Figure A.4: A failed call to read due to the file being non-existent.

Figure A.5: A failed call to read because the DecaFS Client does not have a suitable lock on the file.

Figure A.6: A failed call to read due to the file cursor not existing.

A.3 Write Failures

Figure A.7: A failed call to write due to the file being non-existent.

Figure A.8: A failed call to write because the DecaFS Client does not have an exclusive lock on the file.

Figure A.9: A failed call to write due to the file cursor not existing.

A.4 Close Failures

Figure A.10: A failed call to close due to the calling DecaFS Client being different than the DecaFS Client that opened the file.

Figure A.11: A failed call to close due to the file not being open in the first place.

A.5 Delete Failures

Figure A.12: A failed call to delete due to the file’s non-existence.

Figure A.13: A failed call to delete due to the file being in use.

A.6 Seek Failures

Figure A.14: A failed call to seek due to the file descriptor being invalid.

Figure A.15: A failed call to seek due to the calling DecaFS Client differing from the DecaFS Client associated with the file descriptor in question.

A.7 Stat Failures

Figure A.16: A failed call to stat due to the file not existing.

APPENDIX B

APIs

B.1 DecaFS Types

/*
 * DIR equivalent for DecaFS
 */
struct decafs_dir {
  int current;
  int total;
  struct decafs_dirent *entries;

  decafs_dir(int current, int total, decafs_dirent* entries) :
    current(current), total(total), entries(entries) {}
  decafs_dir(int total, decafs_dirent* entries) :
    decafs_dir(0, total, entries) {}
};

struct decafs_dirent {
  uint32_t file_id;       // id unknown for directories
  unsigned char d_type;   // 'f' file, 'd' directory
  char d_name[256];       // entry name

  decafs_dirent(uint32_t file_id, unsigned char d_type, char* name) :
    file_id(file_id), d_type(d_type) {
    memcpy(d_name, name, strlen(name) + 1);
  }
};

/*
 * Stores information about a specific instance of an open file in
 * DecaFS.
 */
struct file_instance {
  struct client client_id;
  uint32_t file_id;
  uint32_t offset;

  file_instance() : client_id (client()), file_id (0), offset (0) {}
  file_instance (struct client client, uint32_t file_id, uint32_t offset) {
    this->client_id = client;
    this->file_id = file_id;
    this->offset = offset;
  }

  bool operator ==(const file_instance &other) const {
    return (this->client_id == other.client_id &&
            this->file_id == other.file_id);
  }

  bool operator <(const file_instance &other) const {
    if (this->client_id != other.client_id) {
      return this->client_id < other.client_id;
    }
    return (this->file_id < other.file_id);
  }
};

/*
 * Distinctly identifies a chunk of a file.
 */
struct file_chunk {
  uint32_t file_id;
  uint32_t stripe_id;
  uint32_t chunk_num;

  bool operator ==(const file_chunk &other) const {
    return (this->file_id == other.file_id &&
            this->stripe_id == other.stripe_id &&
            this->chunk_num == other.chunk_num);
  }

  friend bool operator <(const file_chunk &left, const file_chunk &right) {
    if (left.file_id != right.file_id) {
      return left.file_id < right.file_id;
    }
    if (left.stripe_id != right.stripe_id) {
      return left.stripe_id < right.stripe_id;
    }
    return left.chunk_num < right.chunk_num;
  }
};

/*
 * Storage information about one file in DecaFS.
 */
struct decafs_file_stat {
  uint32_t file_id;      /* DecaFS file id for the file. */
  uint32_t size;         /* Size of the file in bytes */
  uint32_t stripe_size;
  uint32_t chunk_size;
  uint32_t replica_size;
  struct timeval last_access_time;
};

struct ip_address {
  char addr[IP_LENGTH];

  ip_address() : addr {'\0'} {};
  ip_address(char *addr) {
    strcpy (this->addr, addr);
  }

  bool operator ==(const ip_address &other) const {
    return (strcmp (this->addr, other.addr) == 0);
  }

  bool operator !=(const ip_address &other) const {
    return !operator==(other);
  }

  bool operator <(const ip_address &other) const {
    return (strcmp (this->addr, other.addr) <= 0);
  }
};

struct client {
  struct ip_address ip;
  uint32_t user_id;
  ConnectionToClient *ctc;

  client() : ip (ip_address()), user_id (0), ctc (NULL) {};
  client(struct ip_address ip, uint32_t user_id, ConnectionToClient *ctc) :
    ip(ip), user_id (user_id), ctc (ctc) {}

  bool operator ==(const client &other) const {
    return (this->ip == other.ip &&
            this->user_id == other.user_id &&
            this->ctc == other.ctc);
  }

  bool operator !=(const client &other) const {
    return !operator==(other);
  }

  bool operator <(const client &other) const {
    return ((this->ip < other.ip) ? true :
            (this->user_id < other.user_id) ? true :
            (this->ctc < other.ctc) ? true : false);
  }
};

struct active_nodes {
  uint32_t node_numbers[NUM_ESPRESSO];
  uint32_t active_node_count;
};

B.2 Barista Core API

struct request_info {
  uint32_t chunks_expected;
  uint32_t chunks_received;
  uint32_t file_id;
  struct client client;

  request_info() : chunks_expected (0), chunks_received (0), file_id (0) {}
  request_info (struct client client, uint32_t file_id) {
    this->chunks_expected = 0;
    this->chunks_received = 0;
    this->file_id = file_id;
    this->client = client;
  }
};

struct read_buffer {
  int size;
  uint8_t *buf;

  read_buffer() : size (0), buf (NULL) {}
  read_buffer (int size, uint8_t *buf) {
    if (size > 0) {
      this->buf = (uint8_t *)malloc(size);
      memcpy (this->buf, buf, size);
      this->size = size;
    }
    else {
      this->size = 0;
      this->buf = NULL;
    }
  }
  ~read_buffer () {
    if (size > 0) {
      free(this->buf);
    }
  }
};

struct read_request_info {
  struct request_info info;
  int fd;
  uint8_t *buf;
  std::map response_packets;

  read_request_info() : info (request_info()), fd (0) {}
  read_request_info (struct client client, uint32_t file_id, int fd, uint8_t *buf) {
    this->info = request_info (client, file_id);
    this->fd = fd;
    this->buf = buf;
  }
};

struct write_request {
  uint32_t request_id;
  uint32_t replica_request_id;

  bool operator <(const write_request &other) const {
    if (this->request_id != other.request_id) {
      return this->request_id < other.request_id;
    }
    return (this->replica_request_id < other.replica_request_id);
  }
};

struct write_request_info {
  struct request_info info;
  struct request_info replica_info;
  int fd;
  int count;

  write_request_info() : info (request_info()), replica_info (request_info()),
                         fd (0), count (0) {}
  write_request_info (struct client client, uint32_t file_id, int fd) {
    this->info = request_info (client, file_id);
    this->replica_info = request_info (client, file_id);
    this->fd = fd;
    this->count = 0;
  }
};

extern "C" const char *get_size_error_message (const char *type, const char *value);

extern "C" void exit_failure (const char *message);

/*
 * Initialize barista core.
 */
extern "C" void barista_core_init (int argc, char *argv[]);

/*
 * Open a file for read or write access.
 *
 * Flags:
 *   O_RDONLY open a file for reading
 *   O_RDWR   open a file for both reading and writing
 *   O_APPEND start the file cursor at the end of the file
 *
 * @post
 *   open_file sends the file id for the newly opened file (non-zero)
 *   to the client or FILE_IN_USE if the proper lock cannot be obtained
 */
extern "C" void open_file (const char *pathname, int flags, struct client client);

/*
 * Opens a directory stream corresponding to the directory name.
 */
extern "C" void open_dir (const char* name, struct client client);

/*
 * If the process has a lock on the file, complete the read.
 * Translates a read request into chunks of requests to Espresso nodes.
 */
extern "C" void read_file (int fd, size_t count, struct client client);

/*
 * Aggregates the read_file futures and determines when the read is complete.
 * Upon completion of a read, this function returns read information to the
 * Network Layer.
 */
extern "C" void read_response_handler (ReadChunkResponse *read_response);

/*
 * If the process has an exclusive lock on the file, complete the write.
 * Translates write requests into chunks of requests to Espresso nodes.
 */
extern "C" void write_file (int fd, const void *buf, size_t count, struct client client);

/*
 * Aggregates the write_file futures and determines when the write is complete.
 * Upon completion of a write, this function returns write information to the
 * Network Layer.
 */
extern "C" void write_response_handler (WriteChunkResponse *write_response);

/*
 * Release locks associated with a fd.
 */
extern "C" void close_file (int fd, struct client client);

/*
 * Removes a file from DecaFS.
 * @return >= 0 success, < 0 failure
 */
extern "C" void delete_file (char *pathname, struct client client);

/*
 * Aggregates the delete_file futures and determines when the delete is complete.
 * Upon completion of a delete, this function returns delete information to the
 * Network Layer.
 */
extern "C" void delete_response_handler (DeleteChunkResponse *delete_response);

/*
 * Moves the file cursor to the location specified by whence, plus offset bytes.
 *
 * If the whence and offset cause the cursor to be set past the end of the file
 * it will be set to the end of the file.
 *
 * whence:
 *   SEEK_SET move to offset from the beginning of the file
 *   SEEK_CUR move to offset from the current location of the fd
 *   SEEK_END move to end of file
 *
 * client will receive the cursor's new location on success and < 0 on failure
 */
extern "C" void file_seek (int fd, uint32_t offset, int whence, struct client client);

/*
 * Fills struct stat with file info.
 */
extern "C" void file_stat (const char *path, struct stat *buf);
extern "C" void file_fstat (int fd, struct stat *buf);

/*
 * Get the storage and replica storage information for a file.
 */
extern "C" void file_storage_stat (const char *path, struct client client);

/*
 * Collects information about a mounted filesystem.
 * path is the pathname of any file within the mounted filesystem.
 */
extern "C" void statfs (char *pathname, struct statvfs *stat);

/*
 * Move an existing chunk to a different Espresso node in the system.
 */
extern "C" void move_chunk (const char* pathname, uint32_t stripe_id, uint32_t chunk_num, uint32_t dest_node, struct client client);
extern "C" void fmove_chunk (uint32_t file_id, uint32_t stripe_id, uint32_t chunk_num, uint32_t dest_node, struct client client);

/*
 * Move a chunk's replica to a different Espresso node in the system.
 */
extern "C" void move_chunk_replica (const char* pathname, uint32_t stripe_id, uint32_t chunk_num, uint32_t dest_node, struct client client);
extern "C" void fmove_chunk_replica (uint32_t file_id, uint32_t stripe_id, uint32_t chunk_num, uint32_t dest_node, struct client client);

B.3 Persistent Metadata API

/*
 * Return the number of files that exist in DecaFS.
 */
extern "C" int get_num_files (struct client client);

/*
 * Provide a list of filenames that exist in DecaFS.
 * filenames must have space to hold the number of filenames
 * returned by get_num_files().
 * @param size number of file names of length MAX_FILENAME_LENGTH
 *             that fit in filenames.
 * @return the number of files stored in the filenames array in
 *         alphabetical order
 */
extern "C" int get_filenames (char *filenames[MAX_FILENAME_LENGTH], int size, struct client client);

/*
 * Fill in system stat structure with information about one file.
 * @return 0 on success
 *         FILE_NOT_FOUND on failure
 */
extern "C" int decafs_file_sstat (char *pathname, struct decafs_file_stat *buf, struct client client);
extern "C" int decafs_file_stat (uint32_t file_id, struct decafs_file_stat *buf, struct client client);

/*
 * Fill in system stat structure with information about the entire
 * mounted DecaFS.
 */
extern "C" int decafs_stat (char *pathname, struct statvfs *buf, struct client client);

/*
 * Updates the access time of the file.
 * @return 0 on success
 *         FILE_NOT_FOUND on error
 */
extern "C" int set_access_time (file_instance inst, struct timeval time, struct client client);

/*
 * Add a file to the DecaFS metadata.
 * @return file_id on success
 *         FILE_EXISTS if filename already exists in DecaFS
 *         FILENAME_INVALID if filename is too long
 */
extern "C" int add_file (char *pathname, uint32_t stripe_size, uint32_t chunk_size, uint32_t replica_size, struct timeval time, struct client client);

/*
 * Removes a file from DecaFS metadata.
 * @return 0 on success
 *         FILE_NOT_FOUND on error
 */
extern "C" int delete_file_contents (uint32_t file_id, struct client client);

/*
 * Update the size (add or remove bytes to a file) of an existing file.
 * @return the size of the new file on success
 *         FILE_NOT_FOUND on failure
 */
extern "C" int update_file_size (uint32_t file_id, int size_delta, struct client client);

B.4 Volatile Metadata API

/*
 * Returns the chunk size that is set for this instance of DecaFS.
 * If chunk size has not been set yet, this function returns 0.
 */
extern "C" uint32_t get_chunk_size ();

/*
 * Sets the chunk size for this instance of DecaFS.
 * If chunk size has already been set, SIZE_ALREADY_SET is returned.
 * If the chunk size provided is an invalid size, SIZE_INVALID is returned.
 */
extern "C" int set_chunk_size (uint32_t size);

/*
 * Returns the stripe size that is set for this instance of DecaFS.
 * If stripe size has not been set yet, this function returns 0.
 */
extern "C" uint32_t get_stripe_size ();

/*
 * Sets the stripe size for this instance of DecaFS.
 * If stripe size has already been set, SIZE_ALREADY_SET is returned.
 * If the stripe size provided is an invalid size, SIZE_INVALID is returned.
 */
extern "C" int set_stripe_size (uint32_t size);

/*
 * Returns the number of espresso nodes that should be connected for this
 * instance of DecaFS.
 */
extern "C" uint32_t get_num_espressos ();

/*
 * Sets the number of espresso nodes to expect for this instance of DecaFS.
 * If the number of espressos is already set, SIZE_ALREADY_SET is returned.
 */
extern "C" int set_num_espressos (uint32_t num_espressos);

/*
 * Set the node with the unique node_number to be "down" in the instance
 * of DecaFS.
 * @return V_META_SUCCESS on success
 *         NODE_NUMBER_NOT_FOUND on failure
 */
extern "C" uint32_t set_node_down (uint32_t node_number);

/*
 * Set the node with the unique node_number to be "up" in the instance
 * of DecaFS.
 * @return V_META_SUCCESS on success
 *         NODE_NOT_FOUND on failure
 */
extern "C" uint32_t set_node_up (uint32_t node_number);

/*
 * Determines whether or not a specific node is "up".
 */
extern "C" bool is_node_up (uint32_t node_number);

/*
 * Returns the number of active nodes.
 */
extern "C" int get_active_node_count();

/*
 * Gives the "state" of the system.
 * Returns an active_nodes struct that represents the node numbers active
 * in the current instance of DecaFS.
 */
extern "C" struct active_nodes get_active_nodes ();

/*
 * Start a new file cursor if one doesn't exist already.
 * @return the fd
 */
extern "C" int new_file_cursor (uint32_t file_id, struct client client);

/*
 * Remove a file cursor for an open instance of a file.
 * @return id of the file closed on success
 * @return INSTANCE_NOT_FOUND if fd does not exist
 * @return WRONG_CLIENT if the client doesn't match the client who
 *         opened the file
 */
extern "C" int close_file_cursor (uint32_t fd, struct client client);

/*
 * Provides information about the cursor for an instance of an open
 * file.
 * @return the current byte offset for a given fd
 *         if the fd does not exist, INSTANCE_NOT_FOUND is returned.
 */
extern "C" int get_file_cursor (uint32_t fd);

/*
 * Set the cursor for an instance of an open file.
 * @return the current byte offset for a given fd
 *         if the fd does not exist, INSTANCE_NOT_FOUND is returned.
 */
extern "C" int set_file_cursor (uint32_t fd, uint32_t offset,
                                struct client client);

/*
 * Find the file_instance associated with a given fd.
 */
extern "C" struct file_instance get_file_info (uint32_t fd);

/*
 * Get a new request id for a client request.
 */
extern "C" uint32_t get_new_request_id();
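
The sketch below (illustrative only; decafs_volatile_metadata.h is an assumed header name) shows one way the configuration and node-state calls above could be used at startup and on node failure.

#include <stdio.h>
#include "decafs_volatile_metadata.h"   /* hypothetical header exposing the declarations above */

/* Fix instance-wide sizes once; report if they were already set. */
void configure_instance (uint32_t chunk_size, uint32_t stripe_size,
                         uint32_t num_espressos) {
  if (set_chunk_size (chunk_size) == SIZE_ALREADY_SET)
    printf ("chunk size already fixed at %u\n", get_chunk_size ());
  if (set_stripe_size (stripe_size) == SIZE_ALREADY_SET)
    printf ("stripe size already fixed at %u\n", get_stripe_size ());
  set_num_espressos (num_espressos);
}

/* Mark a node "down" and report how many nodes remain active. */
void mark_node_failed (uint32_t node_number) {
  if (set_node_down (node_number) == V_META_SUCCESS)
    printf ("%d of %u nodes still active\n",
            get_active_node_count (), get_num_espressos ());
}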

B.5 Locking Strategy API

/*
 * Tries to acquire an exclusive lock for a process. Fails if the lock cannot
 * be acquired.
 *
 * Returns 0 on success, or negative on error.
 */
int get_exclusive_lock(struct client client, uint32_t file_id);

/*
 * Tries to acquire a shared lock for a process. Fails if the lock cannot be
 * acquired.
 *
 * Returns 0 on success, or negative on error.
 */
int get_shared_lock(struct client client, uint32_t file_id);

/*
 * Releases a lock, either exclusive or shared. The lock released is whatever
 * kind of lock the process had on the file. Fails if the lock is not owned.
 *
 * Returns 0 on success, or negative on error.
 */
int release_lock(struct client client, uint32_t file_id);

/*
 * Checks whether a process has an exclusive lock. Specifying a negative value
 * for *user_id* or *proc_id* is like a wildcard, and will return whether any
 * *user_id* or *proc_id* has the lock.
 *
 * Returns positive if the lock is held, 0 if not, or negative on error.
 */
int has_exclusive_lock(struct client client, uint32_t file_id);

/*
 * Checks whether a process has a shared lock. Specifying a negative value for
 * *user_id* or *proc_id* is like a wildcard, and will return whether any
 * *user_id* or *proc_id* has the lock.
 *
 * Returns positive if the lock is held, 0 if not, or negative on error.
 */
int has_shared_lock(struct client client, uint32_t file_id);
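
A minimal sketch of the intended call pattern (illustrative only; decafs_locking.h is an assumed header name): a writer acquires an exclusive lock, backs off if the lock is contended, and releases it when finished.

#include "decafs_locking.h"   /* hypothetical header exposing the declarations above */

/* Returns 0 if the critical section ran, negative if the lock was unavailable. */
int with_exclusive_lock (struct client client, uint32_t file_id,
                         void (*critical_section)(uint32_t file_id)) {
  if (get_exclusive_lock (client, file_id) < 0)
    return -1;                    /* another client holds a conflicting lock */

  critical_section (file_id);     /* safe to modify the file here */

  return release_lock (client, file_id);
}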

B.6 IO Manager API

/*
 * Translates a read request from the stripe level to the chunk level.
 * The correct behavior of this function depends on the
 * Distribution and Replication strategies that are in place.
 *
 * @return the number of chunks that participated in the read
 */
extern "C" uint32_t process_read_stripe (uint32_t request_id, uint32_t file_id,
                                         char *pathname, uint32_t stripe_id,
                                         uint32_t stripe_size, uint32_t chunk_size,
                                         const void *buf, int offset, size_t count);

/*
 * Translates a write request into a series of chunk writes and handles
 * replication.
 * The correct behavior of this function depends on the
 * Distribution and Replication strategies that are in place.
 *
 * All requests sent to the access module for primary storage writes must be
 * sent with request_id.
 * All requests sent to the access module for replica writes must be sent with
 * replica_request_id.
 *
 * The number of requests sent to the access module for primary storage writes
 * must be returned in chunks_written.
 * The number of requests sent to the access module for replica writes must
 * be returned in replica_chunks_written.
 */
extern "C" void process_write_stripe (uint32_t request_id, uint32_t replica_request_id,
                                      uint32_t *chunks_written,
                                      uint32_t *replica_chunks_written,
                                      uint32_t file_id, char *pathname,
                                      uint32_t stripe_id, uint32_t stripe_size,
                                      uint32_t chunk_size, const void *buf,
                                      int offset, size_t count);

/*
 * Delete all chunks and replicas for a given file.
 *
 * @return the number of chunks that participated in the delete
 */
extern "C" uint32_t process_delete_file (uint32_t request_id, uint32_t file_id);

/*
 * Get information about the storage locations of chunks within a file.
 */
extern "C" char * process_file_storage_stat (struct file_storage_stat file_info);

/*
 * Set the storage location (node id) for a given chunk of a file.
 * @return the node id
 */
extern "C" int set_node_id (uint32_t file_id, uint32_t stripe_id,
                            uint32_t chunk_num, uint32_t node_id);

/*
 * Get the storage location (node id) for a given chunk of a file.
 * @return CHUNK_NOT_FOUND if the chunk hasn't been stored
 */
extern "C" int get_node_id (uint32_t file_id, uint32_t stripe_id,
                            uint32_t chunk_num);

/*
 * Set the storage location (node id) for a given replica of a
 * chunk of a file.
 */
extern "C" int set_replica_node_id (uint32_t file_id, uint32_t stripe_id,
                                    uint32_t chunk_num, uint32_t node_id);

/*
 * Get the storage location (node id) for a given replica of a
 * chunk of a file.
 */
extern "C" int get_replica_node_id (uint32_t file_id, uint32_t stripe_id,
                                    uint32_t chunk_num);

/*
 * Fill in struct decafs_file_stat structure that provides information
 * about where the chunks live for a specific file.
 */
extern "C" int stat_file_name (char *pathname, struct decafs_file_stat *buf);
extern "C" int stat_file_id (uint32_t file_id, struct decafs_file_stat *buf);

/*
 * Fill in struct decafs_file_stat structure that provides information
 * about where the stripes live for a specific file.
 */
extern "C" int stat_replica_name (char *pathname, struct decafs_file_stat *buf);
extern "C" int stat_replica_id (uint32_t file_id, struct decafs_file_stat *buf);

/*
 * Ensure that all file data is written to disk.
 */
extern "C" void sync();
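
One possible shape for the chunk-placement bookkeeping inside a write path is sketched below. This is not the reference implementation; it simply combines get_node_id/set_node_id from above with the Distribution Strategy call put_chunk (Appendix B.7), and the header names are assumptions.

#include "decafs_io_manager.h"     /* hypothetical header exposing the declarations above */
#include "decafs_distribution.h"   /* hypothetical header for put_chunk (Appendix B.7) */

/* Return the node that stores this chunk, asking the Distribution Strategy
 * to pick one the first time the chunk is written. */
int locate_or_place_chunk (uint32_t file_id, char *pathname,
                           uint32_t stripe_id, uint32_t chunk_num) {
  int node_id = get_node_id (file_id, stripe_id, chunk_num);
  if (node_id == CHUNK_NOT_FOUND) {
    node_id = put_chunk (file_id, pathname, stripe_id, chunk_num);
    if (node_id < 0)
      return node_id;   /* strategy could not place the chunk */
    set_node_id (file_id, stripe_id, chunk_num, (uint32_t) node_id);
  }
  return node_id;
}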

B.7 Distribution Strategy API

/*
 * Determine which node a given chunk from a stripe should be sent to.
 */
extern "C" int put_chunk (uint32_t file_id, char *pathname,
                          uint32_t stripe_id, uint32_t chunk_num);
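
This single entry point is where a distribution policy lives. The sketch below is one possible, illustrative implementation: round-robin placement over the configured espresso nodes, skipping nodes that are down. It relies on Volatile Metadata calls from Appendix B.4; the header name and 1-based node numbering are assumptions.

#include "decafs_volatile_metadata.h"   /* hypothetical header: get_num_espressos(), is_node_up() */

extern "C" int put_chunk (uint32_t file_id, char *pathname,
                          uint32_t stripe_id, uint32_t chunk_num) {
  (void) pathname;   /* unused by this simple policy */

  uint32_t num_nodes = get_num_espressos ();
  if (num_nodes == 0)
    return -1;

  /* Rotate the starting node with the chunk's position in the file. */
  uint32_t start = (file_id + stripe_id + chunk_num) % num_nodes;

  for (uint32_t i = 0; i < num_nodes; i++) {
    uint32_t node = ((start + i) % num_nodes) + 1;   /* assumed 1-based node numbers */
    if (is_node_up (node))
      return (int) node;
  }
  return -1;   /* no node is currently up */
}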

B.8 Replication Strategy API

/*
 * Determine which node a given chunk's replica should be sent to.
 */
extern "C" int put_replica (uint32_t file_id, char *pathname,
                            uint32_t stripe_id, uint32_t chunk_num);
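
A companion sketch (illustrative only) for replica placement: put each replica on the node after the one holding the primary chunk, so a single node failure cannot take out both copies. get_node_id comes from the IO Manager API (Appendix B.6); the header names and 1-based node numbering are assumptions.

#include "decafs_io_manager.h"          /* hypothetical header: get_node_id() (Appendix B.6) */
#include "decafs_volatile_metadata.h"   /* hypothetical header: get_num_espressos() (Appendix B.4) */

extern "C" int put_replica (uint32_t file_id, char *pathname,
                            uint32_t stripe_id, uint32_t chunk_num) {
  (void) pathname;   /* unused by this simple policy */

  int primary = get_node_id (file_id, stripe_id, chunk_num);
  if (primary < 0)
    return primary;   /* the primary chunk has not been placed yet */

  uint32_t num_nodes = get_num_espressos ();
  if (num_nodes == 0)
    return -1;

  /* Next node after the primary, wrapping around (assumed 1-based numbering). */
  return (int) (((uint32_t) primary % num_nodes) + 1);
}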

B.9 Access API

/*
 * Read data from a chunk at a specific offset.
 * If you are implementing this function:
 *   If data is being read from an Espresso node, Network
 *   Layer network_read_chunk() must be called.
 */
ssize_t process_read_chunk (uint32_t request_id, int fd, int file_id,
                            int node_id, int stripe_id, int chunk_num,
                            int offset, void *buf, int count);

/*
 * Write data to a chunk at a specific offset.
 * If you are implementing this function:
 *   If data is being written to an Espresso node, Network
 *   Layer network_write_chunk() must be called.
 */
ssize_t process_write_chunk (uint32_t request_id, int fd, int file_id,
                             int node_id, int stripe_id, int chunk_num,
                             int offset, void *buf, int count);

/*
 * Delete a specific chunk from DecaFS.
 */
ssize_t process_delete_chunk (uint32_t request_id, int file_id, int node_id,
                              int stripe_id, int chunk_num);
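
A short usage sketch (not from the DecaFS code base) of a caller higher in the stack reading the start of one chunk through the access layer; the header name, 4 KB buffer, and zero offset are illustration choices.

#include <stdio.h>
#include "decafs_access.h"   /* hypothetical header exposing the declarations above */

/* Read the first bytes of a chunk and report how much data arrived. */
void peek_chunk (uint32_t request_id, int fd, int file_id, int node_id,
                 int stripe_id, int chunk_num) {
  char buf[4096];

  ssize_t n = process_read_chunk (request_id, fd, file_id, node_id,
                                  stripe_id, chunk_num, 0 /* offset */,
                                  buf, (int) sizeof (buf));
  if (n < 0)
    fprintf (stderr, "chunk read failed\n");
  else
    printf ("read %zd bytes from chunk %d of stripe %d\n",
            n, chunk_num, stripe_id);
}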

B.10 Monitored Strategy API

/*
 * Called during DecaFS startup process. This function needs to initiate all
 * module-defined startup activities and register custom modules with
 * Barista Core.
 */
extern "C" void strategy_startup();

/*
 * Register a module to be called with a specific timeout,
 * repeatedly throughout DecaFS execution.
 *
 * If this function is called MORE THAN ONCE the last monitor will be the
 * monitor in effect.
 */
extern "C" void register_monitor_module (void (*monitor_module)(),
                                         struct timeval timeout);

/*
 * Register a function to be called on node failure.
 *
 * If this function is called MORE THAN ONCE the last handler will be the
 * handler in effect.
 */
extern "C" void register_node_failure_handler (void (*failure_handler)(uint32_t node_number));

/*
 * Register a function to be called when a node comes online.
 *
 * If this function is called MORE THAN ONCE the last handler will be the
 * handler in effect.
 */
extern "C" void register_node_up_handler (void (*up_handler)(uint32_t node_number));

/*
 * Call a previously registered node failure handler.
 */
extern "C" void run_node_failure_handler (uint32_t node_number);

/*
 * Call a previously registered node up handler.
 */
extern "C" void run_node_up_handler (uint32_t node_number);
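
A sketch (illustrative only; the handler bodies and the header name are placeholders) of a module's strategy_startup that wires up a periodic monitor and node-state handlers using the registration calls above:

#include <stdio.h>
#include <sys/time.h>
#include "decafs_monitored_strategy.h"   /* hypothetical header exposing the declarations above */

static void heartbeat_monitor () {
  /* Periodic work goes here, e.g. checking replica health. */
}

static void on_node_failure (uint32_t node_number) {
  fprintf (stderr, "node %u went down; re-replication could start here\n", node_number);
}

static void on_node_up (uint32_t node_number) {
  fprintf (stderr, "node %u is back online\n", node_number);
}

extern "C" void strategy_startup () {
  struct timeval every_five_seconds = {5, 0};

  /* Only the most recent registration of each kind stays in effect. */
  register_monitor_module (heartbeat_monitor, every_five_seconds);
  register_node_failure_handler (on_node_failure);
  register_node_up_handler (on_node_up);
}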

B.11 Espresso Storage API

/*
 * Reads *count* bytes from the chunk at offset *offset* into *buf*.
 * Fails if the chunk doesn't exist, or if the range [offset,
 * offset+count) falls outside the bounds of the chunk.
 *
 * Returns the size read, as reported by read(2), or -1 on error.
 */
ssize_t read_chunk(int fd, int file_id, int stripe_id, int chunk_num,
                   int offset, void *buf, int count);

/*
 * Writes *count* bytes from *buf* to the chunk at offset *offset*.
 * Creates a new chunk if it doesn't exist, and resizes the chunk if the
 * range [offset, offset+count) falls outside the existing bounds of
 * the chunk.
 *
 * Returns the size written, as reported by write(2), or -1 on error.
 */
ssize_t write_chunk(int fd, int file_id, int stripe_id, int chunk_num,
                    int offset, void *buf, int count);

/*
 * Deletes a chunk, freeing the space it occupied for future use. Fails
 * if the chunk doesn't exist.
 *
 * Returns 0 on success, or -1 on error.
 */
int delete_chunk(int fd, int file_id, int stripe_id, int chunk_num);
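
Finally, a round-trip sketch (illustrative only) against the storage calls above: write a small buffer into a chunk, read it back, compare, and delete the chunk. The file descriptor is assumed to refer to the Espresso node's backing store, and the header name is an assumption.

#include <string.h>
#include "espresso_storage.h"   /* hypothetical header exposing the declarations above */

/* Returns 0 if the data written to the chunk reads back identically. */
int storage_round_trip (int fd, int file_id, int stripe_id, int chunk_num) {
  char out[] = "decafs test payload";
  char in[sizeof (out)];

  if (write_chunk (fd, file_id, stripe_id, chunk_num, 0, out, (int) sizeof (out)) < 0)
    return -1;
  if (read_chunk (fd, file_id, stripe_id, chunk_num, 0, in, (int) sizeof (in)) < 0)
    return -1;

  int ok = (memcmp (out, in, sizeof (out)) == 0) ? 0 : -1;
  delete_chunk (fd, file_id, stripe_id, chunk_num);
  return ok;
}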
