Clemson University TigerPrints

All Theses

8-2009

A Study for Scalable Directory in Parallel File Systems

Yang Wu
Clemson University, [email protected]


Recommended Citation:
Wu, Yang, "A Study for Scalable Directory in Parallel File Systems" (2009). All Theses. 625. https://tigerprints.clemson.edu/all_theses/625

A STUDY FOR SCALABLE DIRECTORY IN PARALLEL FILE SYSTEMS

A Thesis Presented to the Graduate School of Clemson University

In Partial Fulfillment of the Requirements for the Degree Master of Science Computer Engineering

by Yang Wu August 2009

Accepted by:
Dr. Walter B. Ligon, III, Committee Chair
Dr. Adam W. Hoover
Dr. Richard R. Brooks

Abstract

One of the challenges that the design of parallel file systems for HPC (High Performance Computing) faces today is maintaining the scalability to handle the I/O generated by parallel applications that access directories containing a large number of entries and perform hundreds of thousands of operations per second. Currently, highly concurrent access to large directories is poorly supported in parallel file systems. As a result, it is important to build a scalable directory service for parallel file systems to support efficient concurrent access to larger directories.

In this thesis we demonstrate a scalable directory service designed for parallel file systems (specifically for PVFS) that can achieve high throughput and scalability while minimizing bottlenecks and synchronization overheads. We describe the important concepts and goals in scalable directory service design and its implementation in the parallel file system simulator HECIOS. We also explore the simulation model of MPI programs and the PVFS file system in HECIOS, including the methods used to verify and validate it. Finally, we test our scalable directory service on HECIOS and analyze its performance and scalability based on the results.

In summary, we demonstrate that our scalable directory service can effectively handle highly concurrent access to large directories in parallel file systems. We also show that our scalable directory service scales well with the number of I/O nodes in the cluster.

Table of Contents

Title Page
Abstract
List of Tables
List of Figures

1 Introduction
   1.1 File Systems
   1.2 Directories
   1.3 Motivation
   1.4 Thesis Statement
   1.5 Approach
   1.6 Thesis Organization

2 Related Work
   2.1 File Systems
   2.2 HECIOS
   2.3 GIGA+

3 Research and Design
   3.1 Server List
   3.2 P2S Map and Extensible Hashing
   3.3 Client Lookup
   3.4 Directory Splitting
   3.5 Request Forward and Response
   3.6 Directory Traversal

4 Implementation and Methodology
   4.1 HECIOS
   4.2 Scalable Directory
   4.3 Modeling

5 Results
   5.1 Mathematical Model
   5.2 Throughput and Scalability
   5.3 Directory Growth
   5.4 Synchronization

6 Conclusions
   6.1 Throughput
   6.2 Scalability
   6.3 Directory Growth
   6.4 Synchronization
   6.5 Summary
   6.6 Future Works

Bibliography

List of Tables

3.1 Example: A Cluster Running PVFS
3.2 Example: Server List for the Directories in the above Cluster
3.3 Example: Lookup Table for Virtual Server ID 1 to 7

4.1 Scalable Directory Options
4.2 Client Operation State Machines
4.3 Palmetto Compute Nodes Architecture
4.4 Example: HECIOS Trace Format

List of Figures

1.1 Network Overview
1.2 Parallel File System Overview
1.3 PVFS API
1.4 File Striping in PVFS
1.5 Directory in PVFS

2.1 HECIOS Architecture
2.2 Lookup Directory Entry in GIGA+
2.3 Split Directory in GIGA+

3.1 Distributed Directory in PVFS
3.2 P2S Map and Distributed Directory
3.3 Lookup Operation in PVFS
3.4 Lazy Update and Request Forward in PVFS
3.5 Directory Entry Collecting in PVFS

4.1 HECIOS Software Layers
4.2 HECIOS Objects Hierarchy
4.3 HECIOS Network
4.4 Lookup Name State Machine
4.5 Create Directory State Machine
4.6 Remove Directory Entry State Machine
4.7 Create Directory Entry State Machine
4.8 Server-side Lookup State Machine
4.9 Server-side Access Directory Entry State Machine
4.10 Server-side Create Directory Entry State Machine
4.11 Server-side Read Directory State Machine
4.12 Server-side Split Directory State Machine
4.13 HECIOS Simulation Sequence Diagram
4.14 Create Directory Entry Processing Time Distribution
4.15 Create Directory Entry Processing Time Cdf
4.16 Remove Directory Entry Processing Time Distribution
4.17 Remove Directory Entry Processing Time Cdf
4.18 Lookup Path Processing Time Distribution
4.19 Lookup Path Processing Time
4.20 Lookup Path Processing Time Cdf

5.1 Directory Entry Creation Queuing Model
5.2 Directory Entry Creation Throughput for Pre-Distributed Directory
5.3 Directory Entry Creation Queue Size
5.4 Directory Entry Creation Throughput in a Dynamically Splitting Directory
5.5 Directory Entry Creation Queue Size for a Pre-Distributed Directory
5.6 Directory Entry Creation Waiting Time for a Pre-Distributed Directory
5.7 Directory Entry Creation Waiting Time for a Dynamically Splitting Directory
5.8 Directory Growth in Pre-Distributed Directory
5.9 Directory Growth in Dynamically Splitting Directory
5.10 Synchronization Overhead in Pre-Distributed Directory
5.11 Synchronization Overhead in Dynamically Splitting Directory

Chapter 1

Introduction

1.1 File Systems

A file system is the part of an operating system that deals with the management of files, i.e., how they are structured, named, accessed, used, protected, implemented, and managed. From a user's point of view, as stated by Tanenbaum in [29], a file system is a collection of files and directories, plus operations on them (e.g., reading and writing files, creating and destroying directories, and moving files among directories). From a designer's point of view, on the other hand, a file system can look quite different. Designers need to be concerned with the implementation of storage allocation, the directory structure, and the way the system keeps track of the relationship between blocks and files. File systems vary in these aspects as well as in the methods they use to improve reliability, performance and scalability. Different file systems have been developed to meet a variety of requirements from hardware devices, operating systems, data access patterns and users. Here we categorize them into three groups: local file systems, network file systems and parallel file systems, and discuss each of them in the following sections.

1.1.1 Local File Systems

A local file system is a file system designed for the storage of files on a local data storage device, including but not limited to disk storage devices (e.g., floppy disk, hard disk drive, optical disk), tape storage devices and flash memory cards (e.g., multimedia card, USB flash drive, solid-state drive). Local file systems trace their history back to the early days of computers, when paper tape and punch cards were used to store information for automatic processing. Most of those file systems were built into the operating system they came with and usually did not have a name. Many more advanced local file systems have been developed in response to the invention of new storage devices and new data access demands. Examples of local file systems include FAT (FAT12, FAT16, FAT32, exFAT), NTFS, HFS and HFS+, the Ext family (Ext2, Ext3, Ext4), and ZFS [29].

1.1.2 Network File Systems

As the computer network plays an increasingly important role in modern computing, the corresponding growth in demand for access to persistent storage over a network, supporting the sharing of files, printers and other resources, has impelled the development of network file systems. Most network file systems follow a client/server architecture, i.e., while distributed over the network, the clients and the servers cooperate to form a complete system. Figure 1.1 shows an example of a network file system. The Network File System (NFS), created by Sun Microsystems in 1985, was the first widely used Internet Protocol based network file system. Other popular network file systems include the Andrew File System (AFS) [24], the NetWare Core Protocol (NCP), and Server Message Block (SMB). Some network protocol clients are also considered file-system-like (e.g., FTP, WebDAV, SSH).

Figure 1.1: Network Overview

Figure 1.2: Parallel File System Overview

1.1.3 Parallel File Systems

High-performance computing (HPC) systems have recently evolved into clusters containing more and more compute nodes, which, with the advent of multi-core processors capable of tens of gigaflops, enables the execution of applications with tens of thousands of threads running in parallel. For example, the Roadrunner system at Los Alamos National Laboratory, with 129,600 cores, achieved 1.1 petaflops in November 2008 [18]. This growth in computing power imposes a significant challenge on storage systems: the ability to scale so as to handle the I/O requests generated by the I/O-intensive parallel applications that are common in both scientific and commercial HPC. Many parallel file systems have been developed to help overcome the I/O bottleneck for parallel applications. By distributing data across multiple processing nodes, each with its own storage resources, a parallel file system can spread the I/O load across several servers rather than bursting the I/O onto a single server [4]. Figure 1.2 shows an example of a parallel file system. Not only does the distribution of resources allow the file system to leverage multiple independent storage devices, it also makes more effective use of the bandwidth of the interconnection network, since the aggregate throughput is not

constrained by the limitation of the bandwidth of a single link. Examples of parallel file systems include Lustre [3], the Google File System (GFS) [11], GPFS [25] and the Parallel Virtual File System (PVFS) [5].

Figure 1.3: PVFS API

1.1.4 Parallel Virtual File System

The Parallel Virtual File System (PVFS) is a parallel cluster file system designed to provide high-speed access to file data for parallel applications. It provides a clusterwide consistent namespace, enables user-controlled striping of data across disks on different I/O nodes, and allows existing binaries to operate on PVFS files without recompiling [5]. As shown in Figure 1.3, PVFS supports multiple APIs: the native PVFS API, the UNIX/POSIX I/O API [1], as well as other APIs such as MPI-IO [12]. Designed as a client/server system, PVFS contains multiple servers that run on separate nodes in the cluster, called the I/O nodes. Each I/O node can act as a metadata server that uses the Berkeley DB [21] database to store metadata, and/or as a data server that uses a local file system to store data files.

1.2 Directories

1.2.1 General Directories

To efficiently keep and organize files, a file system often uses directories (also called folders, catalogs or drawers) to contain files and other directories, such that related files are stored in the same directory. The simplest form of directory system has one directory containing all the files [29]; this is called a single-level directory. The advantages of a single-level directory are its simplicity and the ability to quickly locate a file. However, it is obviously inadequate for organizing thousands of files, which is the common scenario for modern file systems. Consequently, a hierarchical directory system is used, in which a directory can be contained within another directory (it is then called a subdirectory of the containing directory). Together, directories and subdirectories form a hierarchy, or tree structure, which is a powerful tool for representing complicated logical relationships between files. For this reason, most modern file systems use a hierarchical directory system [29].

In many file systems, directories are themselves files. A directory contains references to the files and directories located within it. Such a reference is called a directory entry and usually contains the file or directory's name, a link to its content, and other attributes or metadata. Though considered files, directories allow different operations than ordinary files. For example, the UNIX system supports the following directory operations [2]: mkdir, rmdir, opendir, closedir, readdir, rename, link and unlink. A directory can be created (mkdir), removed (rmdir), opened (opendir), closed (closedir) and renamed (rename) just like an ordinary file. However, reading and modifying the content of a directory are handled through operations on directory entries, such as readdir, link and unlink.
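As a concrete illustration, the directory operations above map directly onto Python's os module; a minimal sketch using a throwaway scratch directory (all file and directory names are illustrative):

```python
import os
import tempfile

# Scratch directory so the example is self-contained.
root = tempfile.mkdtemp()

os.mkdir(os.path.join(root, "photos"))                    # mkdir: create a directory
os.rename(os.path.join(root, "photos"),
          os.path.join(root, "images"))                   # rename a directory
open(os.path.join(root, "images", "a.txt"), "w").close()  # creating a file adds a directory entry

entries = os.listdir(os.path.join(root, "images"))        # readdir: enumerate directory entries
os.unlink(os.path.join(root, "images", "a.txt"))          # unlink: remove a directory entry
os.rmdir(os.path.join(root, "images"))                    # rmdir: only succeeds on an empty directory
```

Here os.listdir stands in for the opendir/readdir/closedir sequence, and os.link would correspond to link.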

Figure 1.4: File Striping in PVFS

1.2.2 PVFS Directories

Internally, a PVFS server implements storage objects called “dataspaces” that can be combined to represent logical objects like files and directories [17]. To distinguish them, each of these objects is assigned a unique id, called a handle in PVFS. The most important dataspace objects are: metafile objects, datafile objects, directory objects and directory data objects. As a virtual file system, PVFS runs on top of a local file system and uses both Berkeley DB and the underlying file system to store these objects.

• Metafile objects represent logical files. Similar to the i-nodes in the UNIX file system, which contain the block numbers of file data, the metafile objects in PVFS store all handles of the datafile objects associated with a particular file. In addition, several attributes are stored for a metafile object: POSIX metadata, datafile distribution, datafile count and datafile handles. Metafile objects are stored in the Berkeley DB databases on metadata servers.

6 • Datafile objects are used to store the raw data of files. Like many other parallel file systems, PVFS supports data distribution of files. Each PVFS file is divided into datafiles and striped across disks on the data servers to facilitate parallel access. Figure 1.4 shows the distribution of a file across multiple I/O nodes.

• Directory objects represent logical directories and are similar to metafile objects. They contain the handle of a directory data object which stores the handles of all files within the directory. Other attributes stored in the directory objects are: POSIX metadata, directory entry count and directory hints. Directory objects are stored on metadata servers.

• Directory data objects store the “file_name/metafile(directory)_object_handle” pairs to identify all files within the directory that the directory data object is associated with. Directory data objects are stored on metadata servers.

Figure 1.5 shows the location, relation and interaction of these objects in PVFS in providing the representation of logical files and directories. Metadata servers store directory objects, directory data objects and metafile objects in Berkeley DB databases, and data servers store datafile objects as files in the underlying file system. Each I/O node can be configured to run as a metadata server, a data server, or both, and all types of objects are distributed across the I/O nodes. However, each single object is indivisible and thus must reside on one particular server. For example, each directory object has a unique directory data object associated with it, and different directory data objects can be stored on different metadata servers. Yet for a given directory “foo”, the directory data object representing it is located on exactly one of the metadata servers. As a result, unlike logical files, directories cannot be distributed in PVFS, because of the one-to-one mapping between a logical directory and its directory data object and the rule that objects are indivisible.
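The striping of datafile objects shown in Figure 1.4 can be sketched as a round-robin mapping from a logical byte offset to a (datafile, local offset) pair. The strip size and datafile count below are illustrative assumptions, not PVFS defaults:

```python
STRIP_SIZE = 64 * 1024   # bytes per strip (illustrative, not a PVFS default)
NUM_DATAFILES = 4        # datafile objects the logical file is striped across

def locate(offset):
    """Map a logical byte offset to (datafile index, offset inside that datafile)."""
    strip = offset // STRIP_SIZE        # which strip of the logical file
    datafile = strip % NUM_DATAFILES    # strips are assigned to datafiles round-robin
    local = (strip // NUM_DATAFILES) * STRIP_SIZE + offset % STRIP_SIZE
    return datafile, local
```

With this layout, consecutive strips land on different datafiles, so a large sequential read is served by all data servers in parallel.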

Figure 1.5: Directory in PVFS

1.3 Motivation

One of the challenges that the design of a parallel file system for HPC has to face today is maintaining the scalability to handle the I/O generated by parallel applications with tens of thousands of threads that access directories containing billions to trillions of entries and perform hundreds of thousands of operations per second in such directories. It is typical for database applications that do data mining, and for monitoring applications that deal with a large number of transactions in real time (e.g., those used by telecommunication companies to record phone calls or by banks to collect financial information from the stock market), to create numerous files in the same directory every second. Other examples, like per-process checkpointing in HPC clusters [23], create a similar directory access scenario. As a result, it is important to build scalable directory services for parallel file systems to support efficient concurrent access to even larger directories in the future.

Traditional UNIX file systems and FAT use a flat, sequential, on-disk data structure for directory storage and thus have a lookup cost of O(n): the larger the directory, the worse the access time. To improve performance, newer file systems introduced faster indexing structures, such as B-trees for O(log n) lookup cost and hash tables for O(1). For example, XFS uses a B+ tree as its directory entry index [28], and Linux's Ext2/Ext3 use hash tables [30]. However, being local file systems prevents them from scaling beyond one machine. Network file systems rely on the underlying local file system that runs on the server to provide directory access, and thus do not achieve extra scalability for handling larger directories.

Fast indexing structures are used in several parallel file systems to gain better performance in handling large directories, including the use of extensible hashing in GPFS [25] and B-link trees in Boxwood [20].
Nevertheless, since GPFS distributes directory entries

at the disk block level and Boxwood relies on a global lock service for synchronized access to metadata, they lack the ability to deal effectively with the situation where a particular directory is accessed by many clients simultaneously, so that the directory becomes a “hotspot” and a potential bottleneck in the whole system. In PVFS, a logical directory is represented as a directory object and a directory data object, both of which are stored in the Berkeley DB database on a single metadata server. Specifically, directory entries are stored in the directory data object as key/value pairs, which PVFS accesses using the B+ tree access method provided by Berkeley DB. Though this provides low-cost lookup, insert and delete operations plus efficient sequential access through the B+ tree [8], the design of PVFS is inadequate for scaling large directories beyond a single I/O node.
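Conceptually, a directory data object behaves like an ordered name-to-handle table. The toy stand-in below emulates that interface with a sorted list and Python's bisect module; in PVFS proper, Berkeley DB's B+ tree access method plays this role:

```python
import bisect

class DirectoryData:
    """Toy ordered name -> handle table emulating a directory data object."""

    def __init__(self):
        self.names = []    # kept sorted, mirroring B+ tree key order
        self.handles = []

    def insert(self, name, handle):
        i = bisect.bisect_left(self.names, name)
        self.names.insert(i, name)
        self.handles.insert(i, handle)

    def lookup(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            return self.handles[i]
        return None

    def remove(self, name):
        i = bisect.bisect_left(self.names, name)
        if i < len(self.names) and self.names[i] == name:
            del self.names[i]
            del self.handles[i]

    def readdir(self):
        # Sequential access in key order, as a B+ tree leaf scan would give.
        return list(zip(self.names, self.handles))
```

The point of the sketch is the interface, not the cost: the sorted-list inserts here are O(n), whereas the B+ tree keeps all four operations cheap. Either way, the whole table lives on one metadata server, which is exactly the scaling limit discussed above.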

1.4 Thesis Statement

To address the insufficient handling of highly concurrent access to large directories in the above file systems, we propose a scalable directory design for existing parallel file systems, specifically PVFS in our research. In this thesis, we demonstrate a scalable distributed directory service and perform a thorough study of its use with a parallel file system, so that it fulfills the following design goals:

• Maintain PVFS I/O semantics: PVFS provides three APIs: the native PVFS API, the UNIX/POSIX I/O API and MPI-IO. To ease the adoption of existing utilities, applications and MPI programs that are designed for PVFS, we maintain the PVFS I/O semantics, so that the changes made in PVFS for the scalable directory service are transparent to users.

• Achieve high throughput and scalability: Under the condition that highly concurrent access generates a large number of operations of various kinds in large directories, the overall system performance should scale with the number of I/O nodes. It is also desired that hundreds of thousands of operations per second can be handled in directories containing billions to trillions of entries.

• Minimize bottlenecks and synchronization overheads: We eliminate bottlenecks by avoiding centralized locking, lookup and management mechanisms for directory access. In addition, we adopt lazy-invalidation protocols and client-side caching to minimize the client/server and server/server synchronization overheads.

1.5 Approach

The study of the scalable directory service is performed using a parallel file system simulator, the High-End Computing I/O Simulator (HECIOS). HECIOS [26] is a trace-driven parallel file system simulator built upon the discrete event simulation libraries provided by OMNeT++. It closely simulates the network messages and system calls used by the Parallel Virtual File System (PVFS) to perform file system tasks. During a simulation, HECIOS parses the trace files, processes the application requests into parallel file system requests and performs the corresponding parallel file I/Os. Although the parallel file system models perform most of the simulation work, most of the simulated-time accounting is done by the physical device simulation, including the network models, operating system models and disk models. In our study, those models are validated to accurately simulate the performance of PVFS on Palmetto, the HPC cluster at Clemson University. Based on the simulation models in HECIOS, we have designed and implemented our scalable directory service for the parallel file system simulated by HECIOS. By distributing the directory entries of large directories across multiple metadata servers in PVFS,

the workloads on such directories are balanced among the servers. Two approaches to distributing directory entries are implemented in the scalable directory service: pre-distribution and dynamic splitting. Each approach was tested and benchmarked with the I/O traces that we collected by running MPI programs that perform directory entry operations in PVFS on Palmetto. The experimental results of the two approaches were compared to demonstrate the impact of their differences on system performance. To analyze the performance of our scalable directory service, including throughput, scalability, directory growth rate and synchronization overheads, a series of experiments was performed by running MPI traces with various numbers of clients and servers, as well as different scalable directory settings. We also developed a set of tools to automate the experiments and to process and visualize the performance data.

1.6 Thesis Organization

In Chapter 2 we present an overview of related research projects in the field of parallel file systems and scalable directories. Chapter 3 contains a detailed description of the research that we conducted on scalable directory design. Chapter 4 contains a thorough discussion of the simulator used for this study, its software component model and the implementation of the scalable directory within the simulator, plus the methods we used to verify and validate the simulation model. In Chapter 5 we present the performance results of our scalable directory design, including the throughput and scalability data. Finally, in Chapter 6 we briefly summarize our experimental results, describe the conclusions and implications of this study, and discuss some suggestions for future work.

Chapter 2

Related Work

2.1 File Systems

Much effort has been made by various file system projects to address the I/O bottlenecks faced in HPC today. To improve the performance and scalability of I/O operations (specifically, large directories and concurrent access), research has been conducted on faster indexing data structures and file system architectures. To avoid the linear scan of directory blocks when searching for a particular file, which is required by the directory structures used by many traditional file systems (e.g., Ext, FAT), file systems like Episode [7] and VxFS [27] speed up the search for entries within a directory block via hashing. Different schemes, including in-memory and on-disk structures, have also been discussed [16]. Newer file systems like NTFS [10] and Cedar [13] use B-trees to index the entries in a directory.

The XFS [28] file system is a general-purpose local file system for Silicon Graphics' IRIX operating system that focuses on scaling capacity and performance to support very large file systems, including large files, large numbers of files, large directories, and very high performance I/O. To support large directories, XFS uses an on-disk B+ tree structure

to store directories. To reduce the height of the B+ tree, XFS hashes the variable-length file names to four-byte values and keeps entries with duplicate hash values next to each other in the tree. The use of fixed-size keys increases the breadth of the directory B+ tree and thus improves the performance of lookup, create and remove operations in large directories.

Extensible hashing is used by GPFS [25] to support efficient file name lookup in large directories. In GPFS, each directory is represented by a variable number of directory blocks, depending on the directory size. A hash function is applied to the names of the files within the directory, and certain bits of the hash value are used to decide the block number in which the entry should reside. If a block becomes full as new entries are added, it is split into two blocks and the hash bits are updated accordingly. The extensible hashing method ensures that an operation in the directory only requires scanning a single directory block, so that the cost of an operation is independent of the directory size and the directory is locally scalable. A more detailed description of extensible hashing and its use in our scalable directory design is given in Chapter 3.
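The block-selection rule of extensible hashing can be sketched as follows; zlib.crc32 stands in for GPFS's actual hash function, which is an assumption made purely for illustration:

```python
import zlib

def block_of(name, nbits):
    """Select a directory block from the low-order bits of the file name's hash.

    nbits is the number of hash bits currently in use, so the directory has
    at most 2**nbits blocks. zlib.crc32 is a stand-in hash, not GPFS's real one.
    """
    return zlib.crc32(name.encode()) & ((1 << nbits) - 1)

# With 2 bits in use, a lookup touches exactly one of up to 4 blocks.
# After a split adds a third bit, an entry either stays in its old block
# (new bit is 0) or moves to the newly created sibling (new bit is 1).
old = block_of("foo.txt", 2)
new = block_of("foo.txt", 3)
```

Because the new bit is the only difference, `new` is always either `old` or `old + 4`; this is exactly why a split only redistributes entries between one old block and one new block.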

2.2 HECIOS

The High-End Computing I/O Simulator (HECIOS) [22] is a trace-driven parallel file system simulator built upon the discrete event simulation libraries provided by OMNeT++ [9]. In this research we design and implement the scalable directory using the simulated model of a parallel file system (i.e., PVFS) in HECIOS, which simulates the network messages and system calls required by PVFS to perform file system tasks. Figure 2.1 is a systematic overview of the major components in HECIOS. Above the INET network component, the client-side components are on the left and the server-side ones are on the right. By running trace files and simulating I/O activities in the simulator, I/O performance statistics are collected for further analysis.

Figure 2.1: HECIOS Architecture

• I/O Application simulates the UNIX applications and MPI programs that generate the I/O requests served by PVFS. It initiates I/O activities in the simulator by translating both the UNIX system calls and the MPI collective file I/O calls contained in the corresponding trace files into proper parallel file system messages, and processes the responses accordingly.

• Cache Middleware is the module that provides client caching semantics. By storing an active subset of the file data and metadata in the compute node's main memory, it avoids contacting the parallel file system to access files and thus improves performance.

• Native File I/O is used for local file access since the data files in PVFS are stored as normal files in the local operating system that runs on the I/O nodes. It serves the local I/O requests by interacting with the disk model that simulates generic storage devices. The Berkeley DB operations required by metadata access are also simulated by performing certain local file I/Os.

15 • PFS Client simulates the PVFS client. It accepts PVFS I/O primitives and generates PVFS network messages that represent the primitives. Those messages are in turn sent to the PVFS servers.

• PFS Server simulates the PVFS server, which provides both data and metadata access. It receives PVFS requests from the PFS client and translates them into local file I/Os. A corresponding response is sent back to the PFS client after each request is served.

• BMI Protocol Client and Server, along with the INET Network Simulation component, simulate the network facility of PVFS. BMI is the interface used by both clients and servers to communicate with each other in PVFS. The default flow protocol is also built on BMI.

• INET Network Simulation uses the INET network simulation framework that is developed for the OMNeT++ simulation packages to simulate the interconnection network connecting the compute nodes and I/O nodes in the cluster. It provides detailed network simulation models like TCP/IP and Ethernet.

• Disk Model uses a simplified drive geometry model that accounts for typical hard disk parameters such as head movement speeds, spindle speeds, and physical inertia times, but does not take into account more complicated behaviors such as disk prefetching and speculative block accesses.

A set of tools was developed to convert the output from readily available parallel and serial application traces into the trace formats that follow the specifications defined by HECIOS. The raw trace files used in this research were generated using the publicly available tracing tools from Los Alamos National Laboratory with certain parallel applications running on Palmetto, Clemson University's large cluster computer available for research use.

16 2.3 GIGA+

GIGA+ [23] is a scalable directory service that aims to scale and parallelize metadata operations in parallel file systems. GIGA+ divides large directories into a scalable number of fixed-size partitions that are distributed across multiple servers in the cluster, employing an extensible hashing scheme to achieve load balance across all servers. A single metadata server may hold one or more partitions of a directory. The maximum possible number of partitions equals the number of metadata servers. A server list, unique to each directory, records the metadata servers across which the directory is split. A Partition-to-Server (P2S) map records the presence of a directory partition on a server, where “1” indicates presence and “0” absence. Along with the server list, the P2S map can be used to easily decide whether a directory has been split onto a certain server. Finally, a hash function decides the location of each directory entry among the directory partitions. Take the lookup operation as an example: when a client wants to access a file “bar” in the directory “foo”, it follows this procedure to look up the directory entry:

1. Get the P2S map for the directory “foo”

2. Let i = Hash(“bar”)

3. While P2S[i] != 1, let i = ⌊i/2⌋. Figure 2.2 describes this step.

4. Finally, N = i

5. Contact metadata server S_N for the directory entry "bar"
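The lookup steps above can be sketched as follows. This is a minimal sketch, assuming MD5 as the hash function and a Python list for the P2S map; both are illustrative choices, not GIGA+'s actual implementation:

```python
from hashlib import md5

def giga_lookup_server(p2s_map, entry_name):
    """Follow the GIGA+ lookup steps: hash the entry name into a
    partition index, then fall back with i = floor(i/2) while the
    partition is absent from the P2S map."""
    # Step 2: hash the entry name into a candidate partition index.
    i = int(md5(entry_name.encode()).hexdigest(), 16) % len(p2s_map)
    # Step 3: while P2S[i] != 1, let i = floor(i / 2).
    while p2s_map[i] != 1:
        i //= 2
    # Steps 4-5: contact metadata server S_i for the entry.
    return i
```

Since bit 0 (the home server's partition) is always present, the fallback loop is guaranteed to terminate.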

As new entries are added, a directory is split when it becomes too large, by moving half of the partitions that currently belong to it onto another server. The destination server number is determined by the following procedure.

Figure 2.2: Lookup Directory Entry in GIGA+ (the P2S map "1 1 1 0 0 1 0 0 ..." is probed at i = Hash(key); while bit i is not set, i is halved until a set bit is found)

1. Find the P2S map for the directory

2. Let k be the index of the leftmost non-zero bit of the source server number ID, so that ID[k] = 1 and ID[j] = 0 for j > k

3. Let l be the number of splits of this directory that took place on the source server

4. Let ID[k + l + 1] = 1; ID is now the destination server number

5. If server ID is valid, perform the split and update the P2S map: P2S[ID] = 1
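A sketch of this destination rule follows. The function name and the convention of treating server 0 (no set bits) as having k = −1 are our own; returning None marks an invalid (nonexistent) destination server:

```python
def split_destination(source_id, num_splits, num_servers):
    """Compute the destination server for the next split of a
    directory held by `source_id`, per steps 2-5 above.
    `num_splits` is l, the number of splits already performed on
    the source server for this directory."""
    # Step 2: k is the index of the leftmost set bit of the source
    # server number (k = -1 for server 0, which has no set bits).
    k = source_id.bit_length() - 1
    # Step 4: the destination sets bit k + l + 1.
    dest = source_id | (1 << (k + num_splits + 1))
    # Step 5: the split is only valid if such a server exists.
    return dest if dest < num_servers else None
```

With 8 servers this yields a splitting tree of the kind shown in Figure 2.3: server 0 splits to 1, then 2; server 1 splits to 3, then 5; server 2 to 6; server 3 to 7.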

Figure 2.3 shows the above process as a splitting tree. Note that in the figure, splits at the same level are independent of each other (e.g., "B" and "C" are independent); however, a split at a lower level cannot happen until the split that leads to its parent in the tree finishes (e.g., "F" cannot happen before "C"). Likewise, the directory merges in the reverse order when entries are removed. In GIGA+, P2S maps are updated in an asynchronous manner for the clients and servers to minimize synchronization cost. As a result, it is likely that the P2S map on a

client is outdated, in the sense that the client is not aware of a recent directory split and requests a directory entry from a server that no longer holds it. In case this happens, the server forwards the request along the splitting history until the request reaches the server that holds the requested directory entry. The worst-case forwarding cost is logarithmic in the number of metadata servers.

Figure 2.3: Split Directory in GIGA+

A user-level implementation of GIGA+ that adopts a "client/server" architecture is described in [15]. Built on the FUSE library, it supports scalable directory services including but not limited to splitting, nested huge directories, small directories, concurrency control and client caching. The design of the scalable directory service for parallel file systems in this research is greatly inspired by the principles of GIGA+ in its use of an extensible hashing scheme, P2S maps, directory splitting and message forwarding.

Chapter 3

Research and Design

In this chapter, we discuss the research and design issues regarding the scalable directory service of PVFS. The primary design goal is to efficiently distribute the directory files of large directories across multiple metadata servers to improve scalability and performance. Figure 3.1 shows the design of the distributed directory in PVFS, in which a single directory data object that contains directory entries is divided and scattered onto a number of metadata servers.

Figure 3.1: Distributed Directory in PVFS (directory objects, directory data objects and metafile objects spread across the metadata servers)

This design shrinks the size of the directory data objects associated with large directories on a single server, and hence leads to a shorter linear scan when looking up a directory entry within a directory data object. More importantly, it balances the load of the servers and makes better use of network bandwidth in case of highly concurrent access to a single directory, thus providing more parallelism. To achieve scalability, an extensible hashing scheme is used by the clients to locate a directory entry among the metadata servers. This hashing scheme is carefully designed to be flexible, so that it adjusts itself to work with the metadata servers, the number of which can change dynamically at runtime as the directory size changes. As a result, an O(1) locating cost can be maintained as the number of metadata servers in use grows, and therefore the size of directories scales with the number of metadata servers.

The synchronization overheads are minimized in this design in three ways: first, a client-side caching mechanism caches the directory objects as well as other useful information to avoid repeated visits to the same metadata servers; second, the splitting of large directories follows a certain pattern, so that metadata servers can keep track of a splitting history and directory entries can be easily located without heavy communication among metadata servers; finally, lazy-invalidation protocols are adopted, whereby metadata servers defer the invalidation of the location information of directory data objects until clients explicitly access them and invalidation is necessary.

3.1 Server List

In PVFS, each logical directory has a directory object associated with it. When a directory is created, the new directory object will be put onto different metadata servers

in a round-robin manner, so that directory objects are spread equally among the metadata servers in the cluster. The server where the directory object resides is called the "Home Server" of the corresponding logical directory, which is a very important concept in the design of the scalable directory service. The directory data object that contains the directory entries within the logical directory is also located on the corresponding home server. Each of the objects in PVFS (e.g., directory objects, directory data objects, metafile objects, and datafile objects) has a handle (a 64-bit integer) that uniquely identifies it in the system. Accordingly, each server (metadata server and data server) in PVFS holds several ranges of handles within the system. As a result, we can simply look up the server range list to determine the id of the server that holds a certain PVFS object. In the scalable directory service design, we construct a server list for each logical directory so that the extensible hashing scheme can refer to it for location. The server list of a directory is a linear array in which each element is the id of a metadata server and the index is its virtual id with respect to that directory. The first element in the server list is always the home server of the directory, followed by the other metadata servers in clockwise order according to their ids. For example, consider a cluster with eight I/O servers that runs PVFS, in which Server0, 1, 4, 5 and 7 are metadata servers, as shown in Table 3.1.

I/O Server ID    0  1  2  3  4  5  6  7
Metadata Server  x  x        x  x     x

Table 3.1: Example: A Cluster Running PVFS

Three directories "foo1", "foo2" and "foo3" reside on Server0, 1 and 5, respectively. Table 3.2 shows the server list for each of these directories.

Directory Name  Home Server  Server List (index 0 1 2 3 4)
foo1            0            0 1 4 5 7
foo2            1            1 4 5 7 0
foo3            5            5 7 0 1 4

Table 3.2: Example: Server List for the Directories in the above Cluster

Note that though the server list of a directory can be easily generated from its home server given knowledge of the metadata servers in the cluster (by sorting the metadata servers in clockwise order starting from the home server), it is necessary to maintain a server list for every distributed directory so that changes in cluster configuration, such as the addition of new metadata servers, can be handled seamlessly.
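The server lists in Table 3.2 can be generated mechanically. A sketch follows; the function name is ours:

```python
def build_server_list(home_server, metadata_servers):
    """Server list for a directory: the home server first, then the
    remaining metadata servers in clockwise (increasing, wrapping)
    order of their ids."""
    servers = sorted(metadata_servers)
    start = servers.index(home_server)
    return servers[start:] + servers[:start]
```

For the cluster of Table 3.1, build_server_list(1, [0, 1, 4, 5, 7]) yields [1, 4, 5, 7, 0], matching the row for "foo2" in Table 3.2.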

3.2 P2S Map and Extensible Hashing

To locate a directory entry among the metadata servers at small cost, a hashing scheme that combines the use of a P2S map and extensible hashing is adopted in our scalable directory service design. As in GIGA+, a P2S map stands for a Partition-to-Server map, which stores its owner's knowledge of the distribution of a certain directory across the metadata servers. In our design, a P2S map is a bitmap in which each bit indicates the presence of the directory on the corresponding metadata server. Take the previous cluster configuration as an example: if the P2S map of directory "foo2" is "11010" (lower bit first), it is known that "foo2" is scattered onto Server1, 4 and 7, as shown in Figure 3.2, by comparing the P2S map against the server list.

Figure 3.2: P2S Map and Distributed Directory (server list 1 4 5 7 0 with P2S map 1 1 0 1 0; Server0, 1, 4, 5 and 7 are metadata servers, Server2, 3 and 6 are data servers)

On the other hand, the hash value of the directory entry name decides the location of a certain directory entry. After applying a pre-chosen hash function (e.g., MD5) to the directory entry name under consideration, the lower n bits (n = ⌈log2 N⌉, where N is the number of metadata servers) of the hash value are taken as the target virtual server id. This id can then be used with the P2S map and/or server list to perform various operations like lookup, splitting, request forwarding and directory traversal.

Virtual ID = Hash(Entry Name) AND_bitwise (2^n − 1)    (3.1)
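Equation 3.1 in code form, with MD5 standing in for the pre-chosen hash function as suggested in the text:

```python
import math
from hashlib import md5

def virtual_server_id(entry_name, num_metadata_servers):
    """Equation 3.1: keep the lower n bits of the hash of the entry
    name, where n = ceil(log2(N)) for N metadata servers."""
    n = math.ceil(math.log2(num_metadata_servers))
    h = int(md5(entry_name.encode()).hexdigest(), 16)
    return h & ((1 << n) - 1)  # bitwise AND with 2^n - 1
```

Note that for N = 5 metadata servers, n = 3, so the virtual id ranges over 0 to 7, more values than servers; this is exactly why the fallback step of the client lookup procedure is needed.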

3.3 Client Lookup

As is described in Section 1.2, the PVFS directory design consists of directory objects and directory data objects. The former contains the metadata of the directory it refers to and the handle of the corresponding directory data object, which contains the directory entries. To look up a directory entry, the client first looks for the directory object of the root directory "/", whose handle is well known in the system. After obtaining the directory object of "/" and the handle of its directory data object, the client requests the corresponding directory data object and looks for the entry of the second-level directory in the target path. The handle of the second-level directory object is then obtained, which may be located on another metadata server. Similarly, the client looks for the third-level directory, the fourth level, ..., until the desired path is reached.

Figure 3.3 shows the process of looking up the directory entry "/foo/bar/file" (i.e., obtaining the handle of its directory object) in PVFS, where the dashed arrows represent requests from the client and the solid arrows represent responses from the servers. The client determines which server to contact and requests the entry name on that server. The server then responds with the handle of the directory entry. Note that after the lookup operation, the handles of all the directories along the path are cached by the client for future use.

Figure 3.3: Lookup Operation in PVFS (the client walks the path "/", "/foo", "/foo/bar", "/foo/bar/file" across Server 0, 1 and 2, caching each handle locally)

When putting distributed directories into consideration, the target server can no longer be determined simply by looking up the object handle in the server range list, which only indicates the home server for that directory. While the communication pattern is the

same as the lookup operation in the original PVFS design, in the scalable directory design the P2S map must be used together with the virtual server id to determine the location of directory entries, following the procedure below, before the client sends out a request. Take the request for the directory entry "/foo/bar" as an example:

1. Now we have the handle and P2S map of the directory object "/foo": H_foo and P2S_foo.

2. The home server S_home for "/foo" is determined by looking up the server range list.

3. A server list can be constructed from S_home.

4. By using Equation 3.1, we can get the virtual server id for the directory entry “/foo/bar”: id.

5. Check whether the id-th bit of the bitmap P2S_foo is non-zero. If not, clear the highest non-zero bit of id and check again. Repeat this step until the id-th bit of P2S_foo is "1".

6. The id-th element in the server list is the target server id.

Step 5 in the above procedure involves repeatedly checking for a non-zero bit in the P2S map while varying the virtual server id. Table 3.3 shows the process of varying the virtual server id from 1 to 7. By clearing the highest non-zero bit (the underlined bit) at each round, it requires at most ⌊log2 N⌋ rounds (where N is the number of metadata servers) to reach a non-zero bit in the P2S map (bit 0 represents the home server and is thus always non-zero).

Virtual Server ID  1(001)  2(010)  3(011)  4(100)  5(101)  6(110)  7(111)
1st round          0(000)  0(000)  1(001)  0(000)  1(001)  2(010)  3(011)
2nd round                          0(000)          0(000)  0(000)  1(001)
3rd round                                                          0(000)

Table 3.3: Example: Lookup Table for Virtual Server ID 1 to 7
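Steps 5 and 6, together with the bit-clearing of Table 3.3, can be sketched as follows. As an assumption of this sketch, virtual ids that fall beyond the end of the P2S map (ids with no corresponding server) are treated as absent:

```python
def locate_entry_server(virtual_id, p2s_map, server_list):
    """Client-side location: clear the highest set bit of the
    virtual server id until the corresponding P2S bit is "1",
    then map the index through the server list (Steps 5-6)."""
    i = virtual_id
    while i >= len(p2s_map) or p2s_map[i] != 1:
        # Clear the highest set bit; bit 0 (the home server) is
        # always set, so this terminates in at most floor(log2 N)
        # rounds, as in Table 3.3.
        i &= ~(1 << (i.bit_length() - 1))
    return server_list[i]
```

For the "foo2" example (P2S map 11010, server list [1, 4, 5, 7, 0]), a virtual id of 7 falls back 7 → 3, where P2S[3] = 1 selects Server7.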

3.4 Directory Splitting

Like GIGA+, we use a scalable number of fixed-size partitions to divide directory entries into multiple groups, and assign different partitions to different servers when splitting directories. The number of partitions equals the number of metadata servers in the beginning, which simplifies the splitting operation: it is as easy as moving partitions from one server to another. On the other hand, the service should also be adaptive to changes in the metadata server configuration, such as the addition of new metadata servers to the cluster. That is to say, the directory service should make use of newly added servers immediately and seamlessly in that case. This requirement is fulfilled by allowing the number of partitions to scale with the number of metadata servers. Take the insertion of a new directory entry as an example: when a client inserts a directory entry into a directory, the client first looks up the server to contact following the procedure in Section 3.3. On receiving the request to create a new directory entry, the server calculates the virtual server id for the entry by using Equation 3.1. This id is then used as the partition number for that entry. As a result, each directory entry carries a partition number. It is not rare for multiple partitions that share the same l lower bits (where l equals the number of splits of the directory, starting at 0) to reside on the same server. As the number of directory entries on a single server keeps accumulating, the server can initiate the splitting operation for better performance when that number reaches a pre-defined value. This is done by comparing the l + 1 lower bits of the partition number of every partition: the partitions with a value of the l + 1 lower bits that is different from the

virtual server id of the current server are moved to the server with a virtual server id of that value, if such a server exists. If it does not, the directory is flagged as "inseparable" and will not try to split itself again in the future. For directories that are known to be large before creation, it is possible to pass a hint (a parameter passed from clients to servers that has a special meaning to the parallel file system, if the servers support it) to PVFS so that those directories are split when they are created. Depending on how far the directory should be distributed, a proper P2S map of the directory is set on both the client and the servers that are to hold the directory. Consequently, all following directory entry creations are aware of the directory distribution, and the entries are thus automatically distributed among the servers in use. Moreover, already distributed directories can be further distributed to more servers if any are available. It is important that this prior distribution follows the rules of the splitting tree (Figure 2.3) in Section 2.3. When new metadata servers are added to the cluster (e.g., m servers are added to N servers), there are two cases:

• if ⌈log2(N + m)⌉ = ⌈log2 N⌉, the number of partitions stays the same; however, directories that are flagged as "inseparable" are tested again to see whether the newly added servers match their potential destinations, so that valid splitting operations can be performed. In addition, newly created directories can choose these new servers as their home server. Therefore, new servers are properly made use of.

• if ⌈log2(N + m)⌉ > ⌈log2 N⌉, n = ⌈log2(N + m)⌉ must be plugged into Equation 3.1 and the partition id of each directory entry in large directories on each server recalculated with the new parameter. After that, the check for splitting can be performed to move partitions to the newly added servers, and the other operations are similar to the above case.
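The split test described above — compare the lower l + 1 bits of each partition number against the current server's virtual id — can be sketched as follows. This is a simplified model; the names and the dictionary-of-moves return shape are ours:

```python
def partitions_to_move(partition_ids, my_virtual_id, num_splits, num_servers):
    """Decide which partitions leave this server at its next split.
    A partition moves when its lower l + 1 bits differ from the
    corresponding bits of the server's own virtual id, provided the
    destination server exists; otherwise it stays (and the directory
    would be flagged "inseparable")."""
    mask = (1 << (num_splits + 1)) - 1  # selects the lower l + 1 bits
    moves = {}  # destination virtual id -> partition ids to move there
    for pid in partition_ids:
        dest = pid & mask
        if dest != (my_virtual_id & mask) and dest < num_servers:
            moves.setdefault(dest, []).append(pid)
    return moves
```

For example, at its first split (l = 0) virtual server 0 gives all odd partitions to virtual server 1; at its second split (l = 1) it gives the partitions congruent to 2 mod 4 to virtual server 2.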

In conclusion, the directory splitting operation is carefully designed so that:

28 1. Small directories are not split.

2. Large directories are distributed automatically or dynamically.

3. The directory service is aware of metadata server number change and always adapts to it.

3.5 Request Forward and Response

Though the P2S map plays a significant role in the scalable directory service, we update and synchronize P2S maps in a lazy manner to minimize communication and synchronization overheads. It is not practical to actively update all the clients, because broadcast is a costly network operation, especially when the number of clients is large (which is usually the case) and many of them try to broadcast concurrently, which leads to nothing but network congestion. Hence, in our scalable directory service design, the updating of P2S maps is deferred until it is necessary and convenient. In PVFS, communication can take place between a client and a server, between two clients, or between two servers. These communications follow a "request/response" pattern, in which a client is allowed to send requests to a server while a server is not allowed to send requests to a client. In other words, servers can talk to clients only when they are being contacted by clients. The purpose of such a design is to reduce the complexity of the communication layer and thus simplify the implementation. As a result, it is only convenient to provide the updated P2S map to clients in responses from servers. The updated P2S map is synchronized across the client that initiated the request and the servers involved in processing the request, while the other clients and servers are not aware of the update and their P2S maps thus become outdated. Figure 3.4 shows this process in a cluster consisting of six clients and six I/O servers,

Figure 3.4: Lazy Update and Request Forward in PVFS (four stages (a)-(d); server list 123450; initial P2S map 100000; after one split 110000; after two splits 110100)

where the server list for a particular directory is "123450". In image (a), all the clients and servers hold the same P2S map (100000), which indicates that the directory exists only on its home server, Server1. In image (b), a client sends a "create directory entry" request to Server1, which splits the directory to Server2 on completing the creation. An updated P2S map (110000) that indicates the presence of the directory on Server1 and 2 is then shared between the two servers involved and sent to the client in the response. Note that during this time, the other clients and servers hold an outdated P2S map. Similarly, in image (c), another client sends a "create directory entry" request to Server1, and another split distributes the directory to Server4 and produces a new P2S map (110100). As a result, that client and Server1 and 4 now hold the latest P2S map, leaving the maps on the other clients and servers unchanged. It may appear inconsistent for different clients and servers to hold different versions of the P2S map of the same directory, many of which are outdated. However, this does not lead to incorrectness or inconsistency from the file system's point of view, thanks to the well-defined directory splitting path and the request forwarding mechanism. First, each metadata server maintains a splitting history of itself, and it is ensured that each of the servers in the splitting history either holds a particular directory entry or once held it and knows where it went (one server in its own splitting history). This guarantees that by recursively contacting the most likely destination, the clients and servers can eventually locate any directory entry. Second, upon locating the target metadata server that holds the directory entry, the request from the client is forwarded to that server as well, which then serves it.
We favor server-to-server message forwarding over multiple client-to-server "request/response" sessions during the search for a directory entry because, in many cases, server-to-server communication in clusters is carried out by dedicated hardware and is thus much faster and more efficient than client-to-server communication.

In image (d) of Figure 3.4, a client that holds an outdated P2S map (and has no knowledge of Server2) sends a request to Server1 to visit a directory entry that is now actually located on Server2. On receiving the request, Server1 calculates the virtual id of the directory entry and decides that Server2 might hold it (by consulting its splitting history). So Server1 forwards the request to Server2, and Server2 repeats what Server1 has done. If Server2 holds the entry, which is the case in Figure 3.4, it generates a response and sends it back to Server1, which in turn sends it back to the client along with an updated P2S map. If Server2 has split the directory and does not hold the entry, it forwards the request to another server, and so on, until the entry is found (or all possible servers have been contacted and the entry does not exist). Finally, the client and the servers involved will each have a P2S map that is the most up to date to their knowledge. In conclusion, in our scalable directory service design, the P2S maps in the cluster can be modified and updated concurrently in a distributed way. Consequently, the communication and synchronization overheads are significantly reduced while correctness is maintained.
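The forwarding walk can be modeled as a small sketch. Here `partitions` and `split_history` are our own simplified stand-ins for server state: each server records, for every partition it gave away, which server it sent it to:

```python
def serve_lookup(start_server, entry_partition, partitions, split_history):
    """Forward a request along the splitting history until it
    reaches the server that holds the entry's partition. Returns
    (holder, hops); the number of hops is at most logarithmic in
    the number of metadata servers."""
    server, hops = start_server, 0
    while entry_partition not in partitions[server]:
        # This server once held the partition and knows where it went.
        server = split_history[server][entry_partition]
        hops += 1
    return server, hops
```

For example, if virtual server 0 split its odd partitions to server 1, which later split partitions 3 and 7 to server 3, a request for partition 3 arriving at server 0 is forwarded twice before it is served.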

3.6 Directory Traversal

Many times, clients want to read the contents of a directory as a whole, namely all the entries within the directory. For small directories that reside on a single server, this simply involves the client requesting the directory and the server that holds it responding with a list of that directory's entries. However, things become more complicated for large directories that are distributed across multiple metadata servers. An intuitive approach is for the client to contact every metadata server that is in use, as indicated by the P2S map. However, certain drawbacks prevent this from being practical. First of all, the information retrieved this way could be incomplete, because the P2S map on the client can

be outdated and thus not contain all the servers where the directory resides. Second, client-to-server communication is not favored, because it can be slower than server-to-server communication in many cases. So in our design, directory traversal is performed on the server side. Whenever a client initiates a "read directory" operation, it sends the request to the home server of that directory, which then takes charge of the directory traversal. Similar to the "request forwarding" operation, the home server performs an exhaustive traversal of the directory splitting tree, sending requests to the servers in its P2S map and collecting directory entries from them and from the servers in their P2S maps. Figure 3.5 shows the process of gathering directory entries at the home server of the directory, forming a "hypercube" communication pattern.

Figure 3.5: Directory Entry Collecting in PVFS (entries from servers 0-7 are gathered toward the home server along the splitting tree)

The communication pattern looks like this under the assumption that the P2S map on each server is not synchronized with the others, so that every server only has knowledge of its direct children and contacts them. However, improvements to the communication pattern can be made based on the network topology of the cluster and information from updated P2S maps. After gathering all the directory entries, the home server can generate a response that contains the entries and the latest P2S map and send it back to the client. As a side effect, every server involved will end up with a P2S map that

contains the latest updates from its children.
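The server-side gather can be sketched as a recursive walk over the splitting tree. Here `children` stands in for each server's knowledge of its direct children, an assumption matching the unsynchronized-P2S-map case above:

```python
def read_directory(server, local_entries, children):
    """Exhaustive traversal: a server returns its own entries plus
    those collected from every server it split the directory to."""
    entries = list(local_entries[server])
    for child in children[server]:
        entries += read_directory(child, local_entries, children)
    return entries
```

The home server calls this on itself and replies to the client with the combined list (and, as a side effect in the real design, the freshest P2S map it has seen).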

Chapter 4

Implementation and Methodology

In this chapter, we describe the implementation of our scalable directory service design in the parallel file system simulator HECIOS, which simulates the system architecture and the behavior of each component of PVFS. We want the scalable directory service to be seamlessly integrated into HECIOS and to cooperate properly with the other components in HECIOS. More importantly, the changes made to PVFS should be transparent to users, so that PVFS I/O semantics are maintained. A description of the methodology used for benchmarking and validating the simulation model is also included.

4.1 HECIOS

4.1.1 Software Layers

As a parallel file system simulator specifically developed to simulate PVFS, HECIOS has software layers designed to be similar to those of PVFS. As shown in Figure 4.1, each layer in HECIOS implements the functionality of its corresponding PVFS software layer, as well as the interfaces used to interact with the other layers. Built on OMNeT++, HECIOS adopts OMNeT++ models, messages and finite state machines to

construct discrete event networks that represent complicated systems, namely PVFS. In implementing our design, the scalable directory service is distributed across different layers for the purpose of seamless integration.

Figure 4.1: HECIOS Software Layers (Config, Client, Server, Layout, OS, Physical, Messages, Common)

• Config layer contains the models that represent the configuration of the PVFS system being simulated, including but not limited to the models of computing nodes and I/O nodes, networking environment, operating system, MPI software and storage devices. Along with the parameters given in configuration files, these models can then be used to represent various PVFS systems.

• Client and Server layer contains the models that represent the client and server components of PVFS, as well as the finite state machines used to describe the operations supported by them. Driven by different messages within the system, these state machines can describe complicated operations and are adaptive to changes.

• Layout layer contains functions and data types that deal with data striping in PVFS, supporting various distribution schemes and data types. A helper class (File Builder)

that eases the management of files in HECIOS is also included.

• OS layer contains models and finite state machines that simulate the operating system that hosts the server components on the I/O nodes. The major supported operations are system calls involved in performing I/O and interacting with storage devices. A local file system is also simulated by the OS layer.

• Physical layer contains models of physical devices (e.g., hard disks) and networking interfaces that deal with the network protocols in BMI and MPI.

• Messages layer defines the message prototypes that are used by different components to interact with each other. The included prototypes are: the BMI message prototype, cache message prototype, MPI middleware message prototype, MPI message prototype, network message prototype, operating system message prototype and PVFS message prototype. All activities that occur in HECIOS are represented by messages of certain types.

• Common layer contains basic functions and data types that are shared by the other layers. Such functions include but are not limited to accessing trace files, processing MPI data types and managing collective communications.

4.1.2 Components

There are two types of components in HECIOS: simple components and module components. While the former represent simple objects that cannot be further divided, the latter represent compound objects that consist of a group of simple components and module components. The whole system is a special module component that forms a network, which contains other simple components and module components and thus forms an

objects hierarchy, as shown in Figure 4.2. We can see from the figure that client components run on the compute nodes and server components run on the I/O nodes. Both the compute node and the I/O node have an enhanced host that deals with network-communication-related issues, including IP addresses, routing, sockets, TCP/IP protocols, etc. The daemon process has a trove component, which is a unified interface that lets PVFS access disk files and databases in the same way.

Figure 4.2: HECIOS Objects Hierarchy (System contains Compute Node, I/O Node, Switch Network and Configuration; a Compute Node runs a Job Process with MPI, Cache and PFS Client; an I/O Node runs a Daemon Process with Job Manager, Request Scheduler, Trove and PFS Server)

Figure 4.3 shows a configuration of HECIOS, running in the simulator GUI, that simulates a PVFS system containing 8 compute nodes and 4 I/O nodes connected by a switch network. The arrows in the figure represent links between components that deliver messages (e.g., API messages, network messages). Operations like system calls and network communication can be simulated by passing specific messages across components. There are also 3 isolated components that hold the configurations for the parallel file system, MPI and the network, respectively.

Figure 4.3: HECIOS Network

4.2 Scalable Directory

Our scalable directory service is implemented by modifying mainly three components: the configuration component, the client component and the server component. New data structures, functions and finite state machines are added to these components to integrate the scalable directory service into HECIOS. The service is also designed to be optional, so that it can be easily enabled or disabled.

4.2.1 Configurations

The configuration component contains parameters that control the behavior of the parallel file system, MPI and the network. We add new options to the parallel file system configuration for the scalable directory service, as shown in Table 4.1. New data types that hold P2S maps are also added to the directory metadata structure. The trace file format is changed to allow the definition of the distribution of directory entries as well.

Module      Option        Description
PFS Config  useDistDEnt   Globally enable scalable directory service
PFS Config  dEntCapacity  Number of directory entries that triggers directory splitting
PFS Client  useDistDEnt   Enable scalable directory service on compute nodes
PFS Server  useDistDEnt   Enable scalable directory service on I/O nodes

Table 4.1: Scalable Directory Options

4.2.2 Operations

Some of the PVFS operations are related to the scalable directory service. We make sure that each of those operations is properly modified so that it works correctly both with and without the scalable directory service. A list of the scalable-directory-related operations on the client side follows (other operations, like the Read, Write and Close operations, are not related):

• Create Directory Operation

• Delete Operation

• Open Operation

• Read Directory Operation

• Stat Operation

A list of the scalable-directory-related operations on the server side:

• Lookup Operation

• Change Directory Entry Operation

• Remove Directory Entry Operation

• Create Directory Entry Operation

40 • Read Directory Operation

• Split Directory Operation

All of these operations are implemented using different finite state machines; those related to scalable directories have two versions: one that works with the scalable directory service and one that works without it. We will discuss each of them on both the client side and the server side.

4.2.3 Client

Table 4.2 shows the finite state machines used by the different operations.

                 Lookup  GetAttribute  CreateDir  Remove  Create  ReadDir
Create Dir Op      x          x            x
Delete Op          x                                  x
Open Op            x          x                               x
Read Dir Op        x                                                  x
Stat Op            x          x
Service Related    x                       x          x       x

Table 4.2: Client Operation State Machines

Note that the Lookup, CreateDir, Remove and Create state machines are related to the scalable directory service.

• Lookup: as shown in Figure 4.4, the state machine contains three states, of which the “LOOKUP_HANDLE” state implements the recursive searching for path segments mentioned in Section 3.3. In the version that works with the scalable directory service, instead of sending a request to the home server of the directory being accessed, a P2S map is used to find the location where the directory entry resides.

Figure 4.4: Lookup Name State Machine (states: INIT, LOOKUP_HANDLE, FINISH; a “Partial” transition loops on LOOKUP_HANDLE)

• CreateDir: as shown in Figure 4.5, the state machine contains six states, of which the “WR_DENT” state implements writing a directory entry onto a metadata server. The change we make is to use the P2S map to locate the target metadata server before sending the request.

Figure 4.5: Create Directory State Machine (states: INIT, CR_META, CR_DATA, WR_ATTR, WR_DENT, FINISH)

• Remove: as shown in Figure 4.6, the state machine contains six states, of which the “RM_DENT” state implements removing a directory entry from a metadata server. Similarly, we make changes so that the P2S map is used to locate the target metadata server.

Figure 4.6: Remove Directory Entry State Machine (states: INIT, RM_DENT, RM_META, RM_DATA, CNT_RSP, FINISH)

• Create: as shown in Figure 4.7, the state machine contains seven states, of which the “WR_DENT” state implements writing a directory entry onto a metadata server. We make changes so that the P2S map is not only used to locate the target metadata server but also updated when the metadata server responds, in case the directory has been split on the server.

Figure 4.7: Create Directory Entry State Machine (states: INIT, CR_META, CR_DATA, CNT_RSP, WR_ATTR, WR_DENT, FINISH)

To sum up, the client-side implementation of the scalable directory service in HECIOS is accomplished mainly by integrating the use of the P2S map into the various state machines.
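To make the client-side flow concrete, the following is a minimal sketch of how a client might consult a P2S bitmap before sending a request. The hash function (MD5 of the entry name) and the placement rule (hash modulo the number of servers holding the directory) are illustrative assumptions, not the HECIOS implementation.

```python
# Sketch (not HECIOS code): a client consults the P2S bitmap of a directory
# to find the metadata server that should hold a given entry. The MD5 hash
# and the modulo placement rule are illustrative assumptions.
import hashlib

def active_servers(p2s_map: int, num_servers: int) -> list:
    """Indices of metadata servers whose bit is set in the P2S bitmap."""
    return [i for i in range(num_servers) if (p2s_map >> i) & 1]

def target_server(p2s_map: int, num_servers: int, entry_name: str) -> int:
    """Hash the entry name onto one of the servers holding the directory."""
    servers = active_servers(p2s_map, num_servers)
    h = int.from_bytes(hashlib.md5(entry_name.encode()).digest()[:4], "big")
    return servers[h % len(servers)]

# A directory spread over servers 0 and 3 (bits 0 and 3 set):
assert active_servers(0b1001, 8) == [0, 3]
assert target_server(0b1001, 8, "file-42") in (0, 3)
```

The point of the sketch is that the client never needs the full directory placement table, only the per-directory bitmap and a deterministic hash.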

4.2.4 Server

Unlike the client side, each of the operations on the server side is implemented by a single state machine. We discuss each scalable-directory-related state machine by comparing the version that works with the scalable directory service to the version that does not.

• Lookup: Figure 4.8 shows the lookup state machines that work with (bottom) and without (top) the scalable directory service. To implement the scalable directory, a “LOCATE_DENT” state is added to look up the P2S map and determine whether the directory entry of interest is located locally. If the directory entry is not local, the request is forwarded to another metadata server; the “FORWARD” state implements this forwarding. Note that the target metadata server of the forwarded request inherits the state of the state machine from the sender; the dashed arrow in the figure shows the behavior of the target metadata server. The sender waits until the response is received from the target server, and its P2S map is updated if necessary during the process.

Figure 4.8: Server-side Lookup State Machine (top: INIT, LOOKUP_NAME; bottom: INIT, LOCATE_DENT, LOOKUP_NAME, FORWARD; transitions include Local Failed, Local Incomplete, Local Complete, Full Complete, and Not Local/Response)

• Change and Remove Directory Entry: Figure 4.9 shows the access directory entry state machines that work with (bottom) and without (top) the scalable directory service, covering both changing and removing directory entries. The two operations can be modeled the same way because they differ only in the implementation of the “WR_DENT” state.

Similar to the lookup state machine, a “LOCATE_DENT” state is added to work with the P2S map. Requests for directory entries that are not locally located are forwarded by the metadata server to other metadata servers, which then waits for the responses. The state of the state machine on the sender server is inherited by the receiver, as shown by the dashed arrow in the figure. The servers involved in the process update their P2S maps accordingly.

• Create Directory: Figure 4.10 shows the create directory entry state machines that

Figure 4.9: Server-side Access Directory Entry State Machine (top: INIT, WR_DENT, FINISH; bottom: INIT, LOCATE_DENT, WR_DENT, FORWARD, FINISH; a Not Local transition leads to FORWARD)

work with (bottom) and without (top) the scalable directory service. We have made changes so that the state machine that works with the scalable directory service supports both the use of the P2S map and directory splitting. Similar to the above state machines, the “LOCATE_DENT” and “FORWARD” states are implemented for the P2S map lookup and request forwarding. Moreover, a “CNT_DENT” state is implemented to decide whether a particular directory has exceeded its capacity on the server and whether there are servers available for directory splitting. If both conditions are satisfied, a “SPLIT_DENT” state performs the splitting operation. A separate split directory state machine is developed for the target metadata server to perform the operation on its side. Again, the P2S maps of all involved servers are subject to update.

• Read Directory: Figure 4.11 shows the read directory state machines that work with (bottom) and without (top) the scalable directory service. We make changes so that distributed directory entries are properly collected and sent to the client. A “CHK_SPLIT” state is implemented to check whether request forwarding is necessary, i.e., whether a metadata server that is known to hold directory entries has not been contacted by the request sender.

Figure 4.10: Server-side Create Directory Entry State Machine (top: INIT, WR_DENT, FINISH; bottom: INIT, LOCATE_DENT, WR_DENT, CNT_DENT, SPLIT_DENT, FORWARD, FINISH; a Subject to Split transition leads to SPLIT_DENT)

Figure 4.11: Server-side Read Directory State Machine (top: INIT, READ_DIR, FINISH; bottom: INIT, CHK_SPLIT, READ_DIR, FORWARD, FINISH; a Splitted transition leads to FORWARD)

This ensures that every metadata server holding a portion of the directory is contacted and that none of them is contacted more than once. The dashed arrow from the “READ_DIR” state to the “E” state indicates responding to a sender server rather than to the client.

• Split Directory: Figure 4.12 shows the split directory state machine, which only exists in the version that works with the scalable directory service. This operation is triggered when a metadata server receives a split request from another server. Upon receiving the request, the target metadata server stores the directory entries locally and updates its P2S map accordingly.

Figure 4.12: Server-side Split Directory State Machine (states: INIT, SPLIT_DENT, FINISH)

In conclusion, the server-side implementation of the scalable directory service focuses mainly on request forwarding and directory splitting.
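As an illustration of the server-side logic, the sketch below combines the “CNT_DENT” capacity check with the P2S map update that follows a split. The threshold parameter (cf. the dEntCapacity option) and the choice of the lowest-numbered absent server as the split target are assumptions for illustration, not the HECIOS code.

```python
# Sketch of the CNT_DENT / SPLIT_DENT logic: split a directory when the local
# entry count exceeds a capacity threshold (cf. the dEntCapacity option) and
# at least one metadata server does not yet hold a portion of it. Picking the
# lowest-numbered absent server is an illustrative assumption.
def should_split(local_entries: int, capacity: int,
                 p2s_map: int, num_servers: int) -> bool:
    over_capacity = local_entries > capacity
    server_available = any(not (p2s_map >> i) & 1 for i in range(num_servers))
    return over_capacity and server_available

def pick_split_target(p2s_map: int, num_servers: int) -> int:
    """Choose a server whose P2S bit is unset to receive part of the entries."""
    for i in range(num_servers):
        if not (p2s_map >> i) & 1:
            return i
    raise RuntimeError("directory already spans all servers")

p2s = 0b0001                       # the directory lives on server 0 only
if should_split(1500, 1000, p2s, 4):
    p2s |= 1 << pick_split_target(p2s, 4)   # record the split in the P2S map
assert p2s == 0b0011
```

Note that once every bit is set, `should_split` returns False, which mirrors the second condition checked in “CNT_DENT”.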

4.3 Modeling

One of the goals of designing a simulator is to make sure that the experimental results are a good approximation of real-world performance data. Thus it is important to carefully design the simulation model and properly validate it. In our case, the simulation model of HECIOS is designed to accurately simulate the performance of the Palmetto Cluster at Clemson University. Moreover, we develop mathematical models for the performance of I/O operations on both the client side and the server side.

4.3.1 Architecture

The Palmetto Cluster is the fastest supercomputer at Clemson University. It is a collection of 772 high-end computational nodes, as shown in Table 4.3. Each compute node has two processors, and each processor has four cores (quad-core), for a total of 6176 cores. The Clemson Computing & Information Technology (CCIT) staff demonstrated in [6] that a cluster performance of 46 TFlops has been reached on Palmetto. Palmetto provides two independent interconnection networks for each compute node: a Gigabit Ethernet network interface and a Myrinet Myri-10G network interface.

Model             Processor                      Count  L2 Cache (MB)  Cores  Memory (GB)
Dell PE 1950      Intel Xeon E5345 @2.33GHz x2   258    4              8      12
Dell PE 1950      Intel Xeon E5345 @2.33GHz x2   258    6              8      12
Sun X2200 M2 x64  AMD Opteron 2356 @2.3GHz x2    256    4              8      16

Table 4.3: Palmetto Compute Nodes Architecture

While the former is used by both the applications and the cluster management network, the latter is dedicated to application use, providing low-latency message passing and 1.2 GB/s of sustained network bandwidth. On Palmetto, local storage is provided with each compute node, and longer-term storage is provided via a 120 TByte StorageTek Model SL8500 storage system, on which SAM-QFS Storage Management Software is used for long-term storage (e.g., /home, /projsmall, /projlarge) and PVFS is used for fast storage (e.g., /pvfs2). In HECIOS, the networking environment is modeled using the INET network simulation package, which constructs a TCP/IP network transport over a switched Ethernet local area network. In [26], Settlemyer described a detailed process of validating and benchmarking the simulation model of HECIOS against the performance of Palmetto, including but not limited to the performance of PVFS, Gigabit Ethernet, and Myri-10G, using MPI and Flash I/O benchmarks. The results show that HECIOS was able to provide an adequate simulation of how the proposed modifications would affect a diverse metadata-intensive workload.

4.3.2 Operations

Before running any simulations on HECIOS, we first run MPI programs in a cluster against the tracing tools provided by LANL to retrieve trace files that contain information regarding the runtime, system and MPI calls, and parameters of those MPI programs. The trace files can then be translated by tools that are part of HECIOS into formats readable by HECIOS. Using these readable formats (HECIOS trace formats) as inputs, HECIOS can simulate the execution of the MPI code in a configured simulation environment. Since the HECIOS trace formats contain the timing and parameters of each MPI and system call, HECIOS can provide an accurate simulation of MPI code. Figure 4.13 is a sequence diagram that shows how the simulation on HECIOS is driven by trace files. A sequence of instructions is retrieved from the trace files and fed to the client. The client then executes each of the instructions, either locally or by contacting servers, until all instructions from the trace files are served. To accurately time every MPI call, a special “CPU_PHASE” instruction is used to simulate the execution of CPU instructions that are not related to I/O; it causes the client to pause for the amount of time specified in the trace files.
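The trace-driven execution described above can be sketched as follows, with records shaped like the rows of Table 4.4; the class and function names are illustrative, not part of HECIOS.

```python
# Sketch of trace-driven replay: CPU_PHASE records advance simulated time by
# their duration (the client "pauses"), while other records stand in for MPI
# calls issued to the simulated file system. Names are illustrative only.
from dataclasses import dataclass

@dataclass
class TraceRecord:
    instruction: str
    start: float      # seconds
    duration: float   # seconds

def replay(records):
    """Return (simulated clock, number of MPI calls served)."""
    clock = 0.0
    mpi_calls = 0
    for rec in records:
        if rec.instruction == "CPU_PHASE":
            clock += rec.duration      # client idles for the recorded time
        else:
            mpi_calls += 1             # stand-in for dispatching the MPI call
            clock += rec.duration
    return clock, mpi_calls

# The first rows of Table 4.4:
trace = [TraceRecord("MPI_COMM_RANK", 0.013379, 0.000225),
         TraceRecord("CPU_PHASE",     0.013604, 0.005907),
         TraceRecord("MPI_FILE_OPEN", 0.019511, 0.003594)]
clock, calls = replay(trace)
assert calls == 2 and abs(clock - 0.009726) < 1e-9
```

In the real simulator the non-CPU_PHASE branch would hand the call to the simulated client/server machinery rather than simply charging its recorded duration.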

ID  Instruction    Start Time (s)  Duration (s)  Return  Parameters
0   MPI_COMM_RANK  0.013379        0.000225      0       0x44000000...
1   CPU_PHASE      0.013604        0.005907      0
2   MPI_FILE_OPEN  0.019511        0.003594      0       0x44000001...
3   CPU_PHASE      0.023105        5.2e-05       0
...

Table 4.4: Example: HECIOS Trace Format

Table 4.4 shows the content of a sample HECIOS trace file, in which every other instruction is a CPU_PHASE instruction. By pausing the client for the duration recorded with each CPU_PHASE instruction, a close approximation of real-world CPU computation can be reached in the simulation. The execution of each MPI operation can be divided into local computation and network communication on the client side and the server side. Network latency applies to the “Request/Response” communication shown in Figure 4.13, which is handled by the INET

Figure 4.13: HECIOS Simulation Sequence Diagram (the trace feeds the client, which alternates idle CPU phases with MPI calls; each MPI call prepares a request, the servers process it, and the client processes the responses)

network simulation package. Since the computation time on the client side is simulated by CPU_PHASE, here we specifically model the local computation time of the server-side operations related to our scalable directory service by benchmarking the server module of the PVFS file system running on Palmetto. The server module of PVFS is implemented using a set of finite state machines, each of which implements an operation. By adding timing code to the state machines that we are interested in, we are able to benchmark the performance of the PVFS server module on Palmetto. In our configuration, we use 1 compute node and 2 I/O nodes, each of which takes 8 processes per node to avoid interference from other jobs. The MPI program we use to benchmark the performance of directory entry operations in PVFS on Palmetto first creates 10000 files within a single directory and then deletes them one by one. The process involves operations such as creating, removing, and looking up directory entries.

4.3.3 Create Directory Entry Processing

As shown in Figure 4.14, the create directory entry processing time on the server side in PVFS on Palmetto roughly forms a normal distribution, whose probability density function is given by Formula 4.1. The distribution is determined by µ, the mean value of x (the processing time in our case), and σ, which describes the spread and shape of the probability density curve.

p(x) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²))    (4.1)

So we model the server-side create directory entry processing time as a normal distribution, and we use the cumulative distribution function (cdf), given by Formula 4.2, to determine the proper µ and σ to use.

Figure 4.14: Create Directory Entry Processing Time Distribution (probability density vs. time in ms, Palmetto vs. HECIOS)

Φ(x) = (1/2) [1 + erf((x − µ) / (σ√2))]    (4.2)

where erf() is the Gauss error function, given by Formula 4.3.

erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt    (4.3)

Figure 4.15 shows the curves of the cdf for different pairs of µ and σ, from which it is determined that the pair µ = 0.378 and σ = 0.025 best matches the performance data. Thus the server-side create directory entry processing time is modeled as a normal distribution with the probability density given by Equation 4.4, where t is the processing time.

p(t) = (1 / (0.025√(2π))) exp(−(t − 0.378)² / (2 × 0.025²))    (4.4)
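The parameter search behind Figure 4.15 can be reproduced in miniature: compare the empirical cdf of measured timings against Formula 4.2 for candidate (µ, σ) pairs and keep the closest one. The synthetic “measured” samples below stand in for the Palmetto data; everything else uses only the standard library.

```python
# Sketch of the cdf-based fit behind Figure 4.15: compare an empirical cdf
# against the model cdf (Formula 4.2) for candidate (mu, sigma) pairs and
# keep the pair with the smallest maximum deviation. The synthetic "measured"
# data below stands in for the Palmetto timings.
import math
import random

def model_cdf(t, mu, sigma):
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def max_deviation(samples, mu, sigma):
    xs = sorted(samples)
    n = len(xs)
    return max(abs((i + 1) / n - model_cdf(x, mu, sigma))
               for i, x in enumerate(xs))

rng = random.Random(1)
measured = [rng.gauss(0.378, 0.025) for _ in range(5000)]  # stand-in data

candidates = [(0.36, 0.023), (0.37, 0.023), (0.378, 0.025), (0.39, 0.023)]
best = min(candidates, key=lambda p: max_deviation(measured, *p))
assert best == (0.378, 0.025)
```

The maximum-deviation criterion is one simple way to score the fit; the thesis selects the pair visually from the cdf curves.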

The HECIOS curve in Figure 4.14 represents the performance data collected by running the benchmark in HECIOS with the above create directory entry processing time model. It can be observed that the simulation data matches the experimental data quite well.

Figure 4.15: Create Directory Entry Processing Time Cdf (cumulative distribution for PVFS and several candidate (µ, σ) pairs; µ = 0.378, σ = 0.025 marked as the best fit)

4.3.4 Remove Directory Entry Processing

As shown in Figure 4.16, the remove directory entry processing time also roughly forms a normal distribution. As with the create directory entry processing time, we model the remove directory entry processing time as a normal distribution as well. The cdf curves for different pairs of µ and σ are shown in Figure 4.17, where the pair µ = 0.375 and σ = 0.023 best matches the performance data. Thus we model the remove directory entry processing time as a normal distribution with the probability density given by Equation 4.5.

p(t) = (1 / (0.023√(2π))) exp(−(t − 0.375)² / (2 × 0.023²))    (4.5)

The HECIOS curve in Figure 4.16 validates this model.

Figure 4.16: Remove Directory Entry Processing Time Distribution (probability density vs. time in ms, Palmetto vs. HECIOS)

Figure 4.17: Remove Directory Entry Processing Time Cdf (cumulative distribution for PVFS and several candidate (µ, σ) pairs; µ = 0.375, σ = 0.023 marked as the best fit)

Figure 4.18: Lookup Path Processing Time Distribution (probability density vs. time in ms, Palmetto vs. HECIOS)

4.3.5 Lookup Path Processing

As shown in Figure 4.18, the server-side lookup path processing time does not form a well-known distribution pattern, nor does it show an obvious relation to the directory size, as shown in Figure 4.19. It is observed that over 90% of the values fall in the range from 0.06 to 0.09. Thus we model the lookup path processing time as a uniform distribution over the range [0.06, 0.09], as given by probability function 4.6. The cdf is shown in Figure 4.20.

  1  0.06 ≤ t ≤ 0.09 p(t) = 0.09−0.06 (4.6)   0 otherwise

The HECIOS curve in Figure 4.18 represents the probability density function of the above lookup path processing time model. Though it only coarsely approximates the experimental data, the impact of this inaccuracy on our model is limited in this research because: firstly, in our experiment clients cache directory information locally, so lookup path operations only

Figure 4.19: Lookup Path Processing Time (lookup time in ms vs. directory size, 0 to 10000 entries)

Figure 4.20: Lookup Path Processing Time Cdf (cumulative distribution: PVFS vs. the uniform [0.06, 0.09] model)

occur at the beginning; secondly, since our experiment operates within a single directory, the number of lookup path operations is small.

4.3.6 P2S Map Processing

Many of the server-side directory operations that work with the scalable directory service involve processing the P2S map. Since the P2S map is a bitmap in which each bit represents a metadata server, the size of the P2S map is as small as

S = N / 8    (4.7)

where S is the size in bytes and N is the number of metadata servers. That is to say, the P2S map of a directory on a cluster with 1024 metadata servers is only 128 bytes (< 0.2 KB). Thus the processing time of P2S map operations on modern computers is negligible compared to I/O operations and network operations. Further, most operations on the P2S map are logarithmic. So we model the P2S map processing time as 0.
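Formula 4.7 and the bit operations it implies are easy to check directly; the sketch below treats the P2S map as a plain byte array (rounding the size up when N is not divisible by 8), which is a generic bitmap, not the HECIOS data structure.

```python
# Sketch: the P2S map as a plain bitmap. Formula 4.7 gives its size in bytes
# (rounded up for N not divisible by 8); set/test are O(1) bit operations.
def p2s_size_bytes(num_servers: int) -> int:
    return (num_servers + 7) // 8        # ceil(N / 8)

def set_server(p2s: bytearray, i: int) -> None:
    p2s[i // 8] |= 1 << (i % 8)

def holds_entry(p2s: bytearray, i: int) -> bool:
    return bool(p2s[i // 8] & (1 << (i % 8)))

assert p2s_size_bytes(1024) == 128       # the 128-byte example from the text

p2s = bytearray(p2s_size_bytes(1024))
set_server(p2s, 700)
assert holds_entry(p2s, 700) and not holds_entry(p2s, 701)
```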

Chapter 5

Results

In this chapter, we describe a series of experiments that we conducted to study the throughput and scalability, directory growth rate, and synchronization overhead of our scalable directory service design. We ran benchmark programs on Palmetto with PVFS and collected the MPI I/O traces. All trace files were then transformed into the HECIOS trace format and executed in HECIOS with the simulation models developed in previous chapters. An analysis of the simulated performance data is also included.

5.1 Mathematical Model

First, we develop a mathematical model to help understand the whole system and analyze the performance data. We use the queuing model [14] that is widely used for network performance analysis to describe our system. Normally a queuing system consists of:

• One or more servers that provide service to arriving customers

• A waiting queue with space for zero or more customers to await access to a server

Figure 5.1: Directory Entry Creation Queuing Model (arrivals enter a waiting queue and depart after service by one of the servers)

In our experiment, the servers are the metadata servers that handle directory entry creation requests, and the requests play the role of the customers in the model. Each of the servers maintains a waiting queue to ensure exclusive access to the directory data file by a given request.

5.1.1 Queuing Model

Figure 5.1 shows the queuing model of our experimental configuration, which is considered a single-source, multiple-server queuing system. According to Kendall's notation for queuing systems [14], it can be categorized as a G/M/n queuing system, where G stands for a generic arrival process, M for exponentially distributed service times, and n for the number of parallel servers. Such a queuing system can be described by:

• λ: The arrival rate

• µ: The service rate

• n: The number of parallel servers

and the performance of which can be determined with:

• Mean delay: The mean time a customer spends in system

• Mean queue length: The mean number of customers in the waiting queue

• Server utilization: The fraction of time that a server is busy

• Throughput: The throughput of the queuing system

5.1.2 Little’s Law

According to Little's Law [19], in a queuing system, let N denote the mean number of customers in the system, λ the arrival rate, and T the mean time that a customer spends in the system; then,

N = λ × T (5.1)

In addition, let Nq denote the mean number of customers in the waiting queue and W the mean time a customer spends in the waiting queue; then,

Nq = λ × W (5.2)

Thus, we have

T = W + 1/µ  →  N = Nq + λ/µ = Nq + ρ    (5.3)

where ρ is the server utilization. For our system, since each client waits for the completion of its previous request before initiating a new one, the request arrival rate equals the departure rate. Consequently, our system is a stable system in which the total number of requests in the queues on all servers (Nq) is a constant that equals the number of compute nodes. Thus, if the servers are fully utilized so that the departure rate equals the service rate, then λ = µ and thus ρ = 1 in our system.
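As a worked check of Equations 5.1–5.3 under the fully utilized case (ρ = 1), with illustrative rather than measured numbers:

```python
# Worked check of Little's Law (Equations 5.1-5.3) for the fully utilized
# case: lambda = mu, so rho = 1 and N = Nq + 1. The numbers are illustrative.
mu = 2600.0          # service rate per server (requests/s), illustrative
lam = mu             # stable, fully utilized system: arrival rate = mu
n_q = 16.0           # mean number of requests waiting in the queue

w = n_q / lam              # Eq. 5.2: mean waiting time in the queue
t = w + 1.0 / mu           # Eq. 5.3: mean time in the system
n = lam * t                # Eq. 5.1: mean number of requests in the system

assert abs(n - (n_q + 1.0)) < 1e-9   # N = Nq + rho with rho = 1
```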

5.2 Throughput and Scalability

To benchmark the throughput and scalability of our scalable directory service design, we use an MPI program that continuously creates files within a single directory. We collect throughput data by varying the number of I/O nodes from 1 to 32 with a fixed number of compute nodes. Similarly, we collect scalability data by running a fixed number of I/O nodes with various numbers of compute nodes (1 ∼ 512). In the experiment, we configure the directory to be split at creation time so that I/O traffic is balanced and the throughput and scalability can be easily observed. As a comparison, the performance data of a dynamically splitting directory is also collected. OMNeT++ provides two data recording tools: a vector class that records vector data and a scalar class that records scalar data. We use these tools to collect performance data, including the server directory entry create processing time and the client directory entry create wait time.

5.2.1 Throughput

Figure 5.2 shows the directory entry creation throughput in a pre-distributed directory when various numbers of compute nodes continuously create entries while the number of I/O nodes is scaled from 1 to 32. It is observed that with a larger number of compute nodes, which introduces a higher creation request rate, the throughput scales better, in the sense that it reaches a higher value and keeps scaling until a larger number of I/O nodes is used. On the other hand, with a smaller number of compute nodes (e.g., 32) the throughput stops scaling and even drops beyond 6 I/O nodes. This is because 32 compute nodes cannot provide a creation request arrival rate high enough to keep the I/O nodes busy and highly utilized. It is also noticeable that when the number of I/O nodes is relatively small (≤ 6),

increasing the number of compute nodes does not dramatically improve the throughput. This is because the potential arrival rate of requests from 32 compute nodes already exceeds the overall service rate, and the servers are 100% utilized. Similarly, it can be observed that 9 servers are fully utilized by roughly 64 compute nodes. Figure 5.3 shows the queue size during the directory entry creation process. It is observed that a smaller number of compute nodes results in a shorter waiting queue. Referencing it against Figure 5.2, we can observe a noticeable throughput increase when the queue length dramatically decreases. This is because a smaller waiting time in queue (W in Equation 5.2), which leads to a shorter queue, also indicates a higher departure rate. We also experiment with a directory that is not pre-distributed; the throughput is shown in Figure 5.4. It is observed that not only is the throughput lower than that of a pre-distributed directory, but the improvement gained by increasing the number of compute nodes is also less noticeable.

Figure 5.2: Directory Entry Creation Throughput for Pre-Distributed Directory (entry creation rate per second vs. number of I/O nodes, for 32 to 512 compute nodes)

Since the directory is dynamically splitting, the number

of metadata servers that handle the requests for the directory is small in the beginning. As the directory gradually splits across more metadata servers, the departure rate begins to grow. Thus the overall departure rate is lower than that of a pre-distributed directory. Besides, the splitting operations and P2S map synchronization that occur during the process introduce communication overhead that decreases performance. In summary, the throughput of directory entry creation in our scalable directory service is closely related to the request arrival rate (number of compute nodes), the service rate (number of I/O nodes), and the server utilization. It is also desirable to pre-distribute large directories at creation time to gain a performance improvement.

Figure 5.3: Directory Entry Creation Queue Size (number of directory entry creation requests in queue vs. number of I/O nodes, for 32 to 512 compute nodes)

5.2.2 Scalability

Figure 5.5 shows the mean waiting queue size during the process of directory entry creation in a pre-distributed directory for various numbers of I/O nodes when scaling the

Figure 5.4: Directory Entry Creation Throughput in a Dynamically Splitting Directory (entry creation rate per second vs. number of I/O nodes, for 32 to 512 compute nodes)

Figure 5.5: Directory Entry Creation Queue Size for a Pre-Distributed Directory (number of requests in queue vs. number of compute nodes, for 1 to 32 I/O nodes)

Figure 5.6: Directory Entry Creation Waiting Time for a Pre-Distributed Directory (mean waiting time in seconds vs. number of compute nodes, for 1 to 32 I/O nodes)

number of compute nodes. Let ncp be the number of compute nodes and nio the number of I/O nodes. It is noticeable that the average queue size on each server equals the number of compute nodes divided by the number of I/O nodes.

Nq = ncp / nio    (5.4)

Figure 5.6 shows the directory entry creation waiting time during the same process in a pre-distributed directory. It is observed that the mean waiting time increases linearly with the number of compute nodes, and the higher the number of I/O nodes, the lower the slope. According to Equation 5.2 and Equation 5.3, we have,

T = 1/µ + W = 1/µ + Nq/λ    (5.5)

Plugging Equation 5.4 into Equation 5.5, we have,

T = 1/µ + ncp / (nio × λ)    (5.6)

in which for each server, µ is a constant and λ = ρ × µ, thus we have,

T ∼ ncp (5.7)

and

T ∼ 1/nio    (5.8)
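Relations 5.7 and 5.8 can be checked numerically from Equation 5.6; the service rate below is illustrative, not a measured value.

```python
# Numerical check of Equation 5.6, T = 1/mu + ncp/(nio * lambda), with
# lambda = mu for fully utilized servers. The service rate is illustrative.
mu = 2600.0

def mean_time_in_system(ncp: int, nio: int) -> float:
    return 1.0 / mu + ncp / (nio * mu)

# Doubling the compute nodes doubles the queueing term (Relation 5.7)...
t1 = mean_time_in_system(128, 8)
t2 = mean_time_in_system(256, 8)
assert abs((t2 - 1.0 / mu) - 2.0 * (t1 - 1.0 / mu)) < 1e-12

# ...while doubling the I/O nodes halves it (Relation 5.8).
t3 = mean_time_in_system(256, 16)
assert abs((t3 - 1.0 / mu) - 0.5 * (t2 - 1.0 / mu)) < 1e-12
```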

As a comparison, Figure 5.7 shows the mean waiting time of creating directory entries in a dynamically splitting directory. It is noticeable that the waiting time shows a similar relation to ncp and nio in this case. It is also observed that the waiting time is the same as that in a pre-distributed directory for 1 I/O node and is larger for other numbers of I/O nodes. This is because the directory splitting and P2S map synchronization overhead decrease the performance. In summary, for our scalable directory service, the directory entry creation waiting time scales well with the number of I/O nodes, so that more I/O nodes bring better performance for a large number of compute nodes. Again, it is shown that pre-distributing large directories, and thus avoiding splitting and synchronization overhead, can lead to a performance improvement.

5.3 Directory Growth

Figure 5.7: Directory Entry Creation Waiting Time for a Dynamically Splitting Directory (mean waiting time in seconds vs. number of compute nodes, for 1 to 32 I/O nodes)

Figure 5.8 shows how the directory size grows on each of the 32 servers of a pre-distributed directory when 512 clients continuously create directory entries in it. Note that during this process, each of the metadata servers is put into use from the very beginning and no directory splitting operation is involved. Consequently, it is observed that the directory size on each server keeps increasing as more entries are added. For our scalable directory service, the location of a given directory entry is decided by the hash value of its file name. In this experiment we randomly generate strings as the file names. The figure also shows that the directory entries are not equally distributed among the metadata servers; this is because the hash function is not ideal. The same process for a dynamically splitting directory is shown in Figure 5.9. It is noticeable that the directory size on some servers decreases during the process, which indicates a directory splitting operation. For dynamically splitting directories, the number of splitting operations is determined by both the directory entry creation pattern of the clients and the splitting threshold settings on the metadata servers. In our experiment, the splitting threshold equals the total number of directory entries to be created divided by the number of I/O nodes. For large directories, pre-distribution is favored because it distributes the directory

67 600

400

200

0.5 1

Figure 5.8: Directory Growth in Pre-Distributed Directory

Figure 5.9: Directory Growth in Dynamically Splitting Directory (per-server directory size over time)

entries among the metadata servers without introducing the splitting overhead. However, if the directory size is unknown or known to be small, dynamic splitting is preferred because:

1. The number of metadata servers involved in operations like readdir can be limited.

2. It is adaptive to the growth of the directory size and can benefit from directory distribution when needed.

5.4 Synchronization

In our scalable directory service, we introduce request forwarding on the server side to avoid unnecessary client/server communication overhead for synchronizing the P2S map. For a pre-distributed directory, the P2S maps on the server side are not changed during the process of creating directory entries, and the P2S map on the client side only needs to be updated once for each client. Thus request forwarding happens at most once per client. For a dynamically splitting directory, the P2S maps keep changing as the directory gets split across more servers. As a result, more request forwarding operations are expected in this case. In this experiment, we record the number of request forwarding operations while a large number of clients continuously create directory entries in a pre-distributed directory and in a dynamically splitting directory.

69 Synchronization Overhead 600 comp=32 64 128 500 256 512

400

300

200

100 Number of Request Forwarding Operations 0 0 5 10 15 20 25 30 35 Number of I/O nodes

Figure 5.10: Synchronization Overhead in Pre-Distributed Directory the clients will not send requests to the wrong server for a second time. In this case, the synchronization overhead is as small as a one-time cost for each client. Figure 5.11 shows the number of request forwarding operations during the same process in a dynamically splitting directory. The number is noticeably higher than that in a pre-distributed directory, which indicates more significant synchronization overhead in this case. As a larger number of splitting operations are involved in the process, each of which changes the P2S map, the outdated P2S map on clients are more likely to cause metadata server miss hits and in turn more request forwarding operations. However, since the split- ting operations are closely related to the file creation pattern and the splitting threshold settings, such synchronization overhead will not dramatically increase when adding more I/O nodes into the system. Thus it does not prevent our scalable directory service from scaling with the number of I/O nodes.
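A minimal sketch of this forwarding protocol, assuming a partition-to-server (P2S) map keyed by partition number; the class names and message shapes here are invented for illustration and do not match the HECIOS message types:

```python
class Cluster:
    """Holds the authoritative P2S (partition-to-server) map shared by
    the metadata servers, plus a forwarding counter for the experiment."""
    def __init__(self, num_servers: int, num_partitions: int):
        self.p2s = {p: p % num_servers for p in range(num_partitions)}
        self.servers = [Server(i, self) for i in range(num_servers)]
        self.forwards = 0

class Server:
    def __init__(self, sid: int, cluster: Cluster):
        self.sid, self.cluster = sid, cluster

    def handle_create(self, partition: int, name: str) -> dict:
        owner = self.cluster.p2s[partition]
        if owner != self.sid:
            # The client's map was stale: forward server-side rather
            # than bouncing the request back to the client.
            self.cluster.forwards += 1
            return self.cluster.servers[owner].handle_create(partition, name)
        # ... create the entry locally ...
        # The response carries the current P2S map so the client resyncs.
        return dict(self.cluster.p2s)

class Client:
    def __init__(self, cluster: Cluster):
        self.cluster = cluster
        self.p2s = dict(cluster.p2s)    # cached, possibly stale, P2S map

    def create(self, partition: int, name: str) -> None:
        target = self.p2s[partition]    # pick a server from the cached map
        self.p2s = self.cluster.servers[target].handle_create(partition, name)
```

In this sketch a stale map entry costs each client at most one forward before its cache is corrected, which is consistent with the pre-distributed case observed above, where the forwarding count stays bounded by the number of clients.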

Figure 5.11: Synchronization Overhead in Dynamically Splitting Directory

Chapter 6

Conclusions

In this thesis we mainly focused on addressing the insufficiency of existing parallel file systems in handling highly concurrent access to large directories. In Chapter 1 we introduce several file systems with their directory services and propose a scalable directory service design for parallel file systems. Chapter 2 presents an overview of related research projects, including various file systems, HECIOS, and GIGA+. In Chapter 3, we discuss the research that we conducted on scalable directory service and the issues of designing it within PVFS. Chapter 4 describes in detail the implementation of our scalable directory service for PVFS in HECIOS and the methods we used to verify and validate the simulation model. In Chapter 5, we demonstrate the performance data that we collected from experimenting with the scalable directory service in HECIOS, and develop a mathematical model to analyze those data. In this chapter, we summarize our experimental results and describe future work.

6.1 Throughput

In our experiment, we found that the throughput of creating directory entries with our scalable directory service is determined by:

• The request arrival rate (related to the number of compute nodes)

• The service rate (related to the number of I/O nodes)

• The server utilization

A properly configured number of compute and I/O nodes can increase the overall request arrival rate and service rate while maintaining server utilization, thus yielding higher throughput.
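These three factors interact as in a standard multi-server queueing model (cf. Little's law [19]). A small illustration follows; the rates used in the example are invented for demonstration, not measured in our experiments:

```python
def utilization(arrival_rate: float, num_servers: int, service_rate: float) -> float:
    """Server utilization rho = lambda / (m * mu): offered load divided
    by the total service capacity of m I/O nodes."""
    return arrival_rate / (num_servers * service_rate)

def throughput(arrival_rate: float, num_servers: int, service_rate: float) -> float:
    """While rho < 1 the servers keep up and throughput tracks the
    offered load; at rho >= 1 it saturates at the service capacity."""
    return min(arrival_rate, num_servers * service_rate)

def mean_ops_in_system(arrival_rate: float, mean_response_time: float) -> float:
    """Little's law: N = lambda * W."""
    return arrival_rate * mean_response_time
```

For example, with a hypothetical per-server service rate of 300 operations per second, 32 I/O nodes cap throughput at 9600 operations per second no matter how many compute nodes generate requests; adding compute nodes beyond that point only raises utilization and client waiting time.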

6.2 Scalability

Our scalable directory service makes use of available I/O nodes in a cluster and provides performance improvement by decreasing the overall client waiting time for certain directory operations. We present experimental results and mathematical analysis to study and demonstrate the high scalability of our scalable directory service. In addition, we compare the performance of directory operations in a pre-distributed directory and a dynamically splitting directory to show further performance improvement with the former method by avoiding splitting and synchronization overhead.

6.3 Directory Growth

With the directory growth data that we collected from the experiments, we are able to demonstrate how pre-distributed directories gain better performance by avoiding

the directory splitting operations. We also show that dynamically splitting directories can adaptively adjust the number of metadata servers that they are distributed across, so as to deal with different file access patterns.

6.4 Synchronization

We are able to show that the synchronization overhead is minimized by design and thus does not affect the scalability of our scalable directory service. We also demonstrate further synchronization overhead reduction by using the pre-distribution method with large directories.

6.5 Summary

In summary, the scalable directory service fulfills the following design goals:

• Maintain PVFS I/O semantics: In our experiments, we collect trace files by running MPI programs with PVFS on Palmetto. The same trace files are used by HECIOS with and without the scalable directory service. Thus the changes we made to PVFS in HECIOS are transparent to user programs and PVFS I/O semantics are maintained.

• Achieve high throughput and scalability: Our scalable directory service shows good scalability and high throughput under highly concurrent access to large directories in PVFS on HECIOS. We are able to achieve 9000 operations per second by using 512 compute nodes and 32 I/O nodes. The performance can be further improved by scaling the number of compute and I/O nodes.

• Minimize bottlenecks and synchronization overheads: We are able to remove directory operation bottlenecks by distributing directory entries among multiple metadata servers. Synchronization overhead as small as a one-time cost is demonstrated in pre-distributed directories. In addition, synchronization overhead is minimized by design so that it does not prevent the system performance from scaling with the number of I/O nodes.

6.6 Future Work

Based on our research, possible future work can be carried out in the following areas:

• Since the directory entry distribution is largely determined by the hash function that is used, it is desirable to study the impact of using different hash functions with different file name and file access patterns on system performance.

• Directory splitting is controlled by the splitting threshold settings on the metadata servers. It will be interesting to further explore the effect of different settings on system performance.

• This work was done in a parallel file system simulator, so a natural next step is to validate our design by implementing it in a real file system such as PVFS.

Bibliography

[1] American National Standards Institute. IEEE standard for information technology: Portable Operating System Interface (POSIX). Part 1, system application program interface (API) — amendment 1 — realtime extension [C language]. IEEE Computer Society Press, 1109 Spring Street, Suite 300, Silver Spring, MD 20910, USA, 1994. IEEE Std 1003.1b-1993 (formerly known as IEEE P1003.4; includes IEEE Std 1003.1-1990). Approved September 15, 1993, IEEE Standards Board. Approved April 14, 1994, American National Standards Institute.

[2] AT&T. UNIX Programmer’s Manual System Calls and Library Routines, volume 2. Holt, Rinehart, and Winston, New York, NY, USA, 1986.

[3] P. J. Braam. The lustre storage architecture, 2002.

[4] Philip H. Carns. Achieving scalability in parallel file systems. PhD thesis, Clemson University, Clemson, SC, USA, 2005. Advisor: Walter B. Ligon, III.

[5] Philip H. Carns, Walter B. Ligon, III, Robert B. Ross, and Rajeev Thakur. PVFS: a parallel file system for Linux clusters. In ALS'00: Proceedings of the 4th Annual Linux Showcase & Conference, pages 28–28, Berkeley, CA, USA, 2000. USENIX Association.

[6] Clemson University CCIT. The palmetto architecture. http://citi.clemson.edu/palm_arch.

[7] S. Chutani, O. T. Anderson, M. L. Kazar, B. W. Leverett, W. A. Mason, and R. N. Sidebotham. The Episode file system. In Proceedings of the USENIX Winter 1992 Technical Conference, pages 43–60, San Francisco, CA, USA, 1992.

[8] Douglas Comer. The ubiquitous B-tree. ACM Comput. Surv., 11(2):121–137, 1979.

[9] OMNeT++ Community. OMNeT++. http://www.omnetpp.org.

[10] Helen Custer. Inside Windows NT. Microsoft Press, Bellevue, WA, USA, 1993. The authoritative technical reference on Windows NT.

[11] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. In SOSP '03: Proceedings of the 19th ACM Symposium on Operating Systems Principles: the Sagamore, Bolton Landing, Lake George, New York, USA, October 19–22, 2003, pages 29–43, 2003.

[12] W. Gropp, E. Lusk, R. Ross, and R. Thakur. Using MPI-2: advanced features of the Message Passing Interface. Cluster Computing, IEEE International Conference on, 0:xix, 2003.

[13] R. Hagmann. Reimplementing the Cedar file system using logging and group commit. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles, pages 155–162, Austin, TX, USA, November 8–11 1987.

[14] Joseph L. Hammond and Peter J.P. O’Reilly. Performance analysis of local computer networks. Addison-Wesley Pub. Co., 1986.

[15] Sanket Hase, Aditya Jayaraman, Vinay K. Perneti, Sundararaman Sridharan, Swapnil V. Patil, Milo Polte, and Garth A. Gibson. User level implementation of scalable directories (GIGA+). Technical Report CMU-PDL-08-107, Carnegie Mellon University, May 2008.

[16] D. Hitz, J. Lau, and M. Malcolm. File system design for an NFS file server appliance. In Proceedings of the USENIX Winter 1994 Technical Conference, pages 235–246, San Francisco, CA, USA, January 17–21 1994.

[17] Michael Kuhn, Julian Kunkel, and Thomas Ludwig. Directory-based metadata optimizations for small files in PVFS. In Euro-Par '08: Proceedings of the 14th International Euro-Par Conference on Parallel Processing, pages 90–99, Berlin, Heidelberg, 2008. Springer-Verlag.

[18] Los Alamos National Laboratory. Roadrunner, world’s first petaflop/s system, re- mains #1 on the top500 and #7 on the green500. Internet.

[19] John D. C. Little. A proof for the queuing formula: L = λW. Operations Research, 9(3):383–387, 1961.

[20] John MacCormick, Nick Murphy, Marc Najork, Chandramohan A. Thekkath, and Lidong Zhou. Boxwood: abstractions as the foundation for storage infrastructure. In OSDI'04: Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation, pages 8–8, Berkeley, CA, USA, 2004. USENIX Association.

[21] Michael A. Olson, Keith Bostic, and Margo Seltzer. Berkeley DB. In ATEC '99: Proceedings of the Annual Conference on USENIX Annual Technical Conference, pages 43–43, Berkeley, CA, USA, 1999. USENIX Association.

[22] Clemson University PARL, the Parallel Architecture Research Laboratory. The high-end computing I/O simulator. http://www.parl.clemson.edu/hecios.

[23] Swapnil V. Patil, Garth A. Gibson, Sam Lang, and Milo Polte. GIGA+: scalable directories for shared file systems. In PDSW '07: Proceedings of the 2nd International Workshop on Petascale Data Storage, pages 26–29, New York, NY, USA, 2007. ACM.

[24] Mahadev Satyanarayanan. Scalable, secure, and highly available distributed file access. Computer, 23(5):9–18, 20–21, May 1990.

[25] Frank Schmuck and Roger Haskin. GPFS: a shared-disk file system for large computing clusters. In FAST '02: Proceedings of the 1st USENIX Conference on File and Storage Technologies, page 19, Berkeley, CA, USA, 2002. USENIX Association.

[26] Bradley Settlemyer. A study of client-based caching for parallel I/O. PhD thesis, Clemson University, August 2009.

[27] Veritas Software. VxFS. http://www.veritas.com.

[28] A. Sweeney, D. Doucette, W. Hu, C. Anderson, M. Nishimoto, and G. Peck. Scalability in the XFS file system. In Proceedings of the USENIX 1996 Technical Conference, pages 1–14, San Diego, CA, USA, January 22–26 1996.

[29] Andrew S. Tanenbaum. Modern Operating Systems. Pearson Prentice Hall, Upper Saddle River, NJ 07458, USA, third edition, 2008.

[30] Theodore Y. Ts'o and Stephen Tweedie. Planned extensions to the Linux ext2/ext3 filesystem. In Proceedings of the FREENIX Track: 2002 USENIX Annual Technical Conference, pages 235–243, Berkeley, CA, USA, 2002. USENIX Association.
