Optimizing Local File Accesses for FUSE-Based Distributed Storage

Shun Ishiguro∗ Jun Murakami∗ Yoshihiro Oyama∗‡ Osamu Tatebe†‡
∗Department of Informatics, The University of Electro-Communications
Email: {shun,murakami}@ol.inf.uec.ac.jp, [email protected]
†Faculty of Engineering, Information and Systems, University of Tsukuba
Email: [email protected]
‡Japan Science and Technology Agency, CREST

Abstract—Modern distributed file systems can store huge amounts of information while retaining the benefits of high reliability and performance. Many of these systems are prototyped with FUSE, a popular framework for implementing user-level file systems. Unfortunately, when these systems are mounted on a client that uses FUSE, they suffer from I/O overhead caused by extra memory copies and context switches during local file access. Overhead imposed by FUSE on distributed file systems is not small and may significantly degrade the performance of data-intensive applications. In this paper, we propose a mechanism that achieves rapid local file access in FUSE-based distributed file systems by reducing the number of memory copies and context switches. We then incorporate the mechanism into the FUSE framework and demonstrate its effectiveness through experiments, using the Gfarm distributed file system.

I. INTRODUCTION

Highly scalable distributed file systems have become a key component of data-intensive science where low latency, high capacity, and high scalability of storage are crucial to application performance. As a result, a large body of literature has been devoted to distributed file systems and how they can store a great number of large files by federating storage across multiple servers [1]–[7].

Several modern distributed file systems [1], [3], [6], [7] allow users to mount their file systems on UNIX clients by using FUSE [8], a widely used framework for implementing user-level file systems. Once the file systems are mounted in this way, the distributed file system can be accessed with standard system calls such as open, close, read, and write. While this approach increases the simplicity and enhances the separation of concerns significantly, it also has a drawback: the current FUSE framework imposes considerable overhead on I/O and can degrade the overall performance of data-intensive applications. Previous papers describe that FUSE may incur large runtime overhead [9]–[11]. For example, Bent et al. [9] reported that approximately 20% overhead was imposed on the bandwidth of file systems.
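To make the preceding point concrete, the following minimal C sketch reads a file on a FUSE-mounted distributed file system using only standard system calls. The mount point /mnt/gfarm and the file name are illustrative assumptions; any path under a FUSE mount behaves the same way, with each open and read below being routed through the FUSE kernel module to the userland daemon.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Read a file from a FUSE-mounted distributed file system with plain
 * POSIX calls.  "/mnt/gfarm/data.bin" is an assumed mount point and
 * file name; the same code works on any mounted path. */
int main(void)
{
    char buf[4096];
    ssize_t n;

    int fd = open("/mnt/gfarm/data.bin", O_RDONLY);  /* request forwarded to the daemon */
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }
    while ((n = read(fd, buf, sizeof(buf))) > 0) {    /* each read crosses kernel/daemon */
        /* process buf[0..n-1] ... */
    }
    if (n < 0)
        perror("read");
    close(fd);
    return EXIT_SUCCESS;
}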
The FUSE framework consists of a kernel module and userland daemons. The kernel module mediates between applications and a userland daemon, forwarding requests for file system access to the daemon and returning the processing results of the daemon to the application. Because these communications between the kernel module and the userland daemon involve frequent memory copies and context switches, they introduce significant runtime overhead. The framework forces applications to access data in the mounted file system via the userland daemon, even when the data is stored locally and could be accessed directly. The memory copies also increase memory consumption because redundant data is stored in different page caches.

In this paper, we propose a mechanism that allows applications to access local storage directly via the FUSE kernel module. We focus on the flow of operations performed in local file accesses by FUSE-based file systems and propose a mechanism to “bypass” redundant context switches and memory copies. We then explain our modifications to the FUSE kernel module in order to incorporate our mechanism into it. The result is a kernel module that directly accesses local storage wherever possible, thereby eliminating unnecessary kernel-daemon communication and many memory copies and context switches associated with this communication, which in turn improves the I/O performance of FUSE-based file systems.

We demonstrate the effectiveness of our proposed mechanism by using a Gfarm distributed file system [1] against our modified version of FUSE and testing its performance. Our results show that the mechanism significantly improved the I/O throughput of the Gfarm file system (ranging from 26% to 63%), and also greatly reduced the amount of memory consumed by page caching (half of the original). Because our mechanism has no strict dependencies on Gfarm, we believe that our test results are generalizable to other FUSE-based distributed file systems and that our proposed mechanism could be clearly integrated into the FUSE kernel module.

The contributions of this work are (1) to provide a mechanism for reducing context switches and memory copies occurring in local storage accesses in FUSE-based distributed file systems, and (2) to demonstrate its effectiveness by adapting the mechanism to Gfarm and showing experimental results.

II. GFARM AND FUSE

A. Gfarm

Figure 1 shows the basic architecture of Gfarm, which consists of metadata servers and I/O servers. A metadata server manages file system metadata, including inodes, directory entries, and file locations. I/O servers manage file data and are run in either dedicated nodes or client nodes. Under certain settings, an I/O server and its client programs run in the same node; under other settings, they may run in different nodes.

Fig. 1. Gfarm Architecture

One of the ways for applications to access a Gfarm file system is to call functions in the Gfarm library, including those for standard operations such as open and read. When a client accesses a file, the flow of the operation is as follows:
1) The client sends an open request to a metadata server when it wants to open a file.
2) The client receives the location of the file from the metadata server.
3) The client sends an open request to the I/O server that contains the file.
4) The I/O server communicates with the metadata server to check whether the open request is valid. File information such as an inode number and a generation number is also communicated.
5) The client sends further requests, such as read and write, to the I/O server.
6) The client sends a close request to both the metadata server and the I/O server.
By restricting communication of read and write requests to only the I/O server containing the file, Gfarm reduces the workloads of the metadata server and the network.
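A minimal sketch of this library access path follows, assuming libgfarm client entry points of roughly this shape (gfarm_initialize, gfs_pio_open, gfs_pio_read, gfs_pio_close); the exact signatures, flags, and the gfarm:// URL used here are assumptions and should be checked against the installed Gfarm headers.

#include <stdio.h>
#include <gfarm/gfarm.h>

/* Illustrative sketch of direct access through the Gfarm library, as an
 * alternative to going through a FUSE mount.  The entry points and the
 * gfarm:// URL below are assumptions based on the libgfarm client API. */
int main(int argc, char **argv)
{
    GFS_File gf;
    char buf[4096];
    int n;
    gfarm_error_t e;

    e = gfarm_initialize(&argc, &argv);
    if (e != GFARM_ERR_NO_ERROR) {
        fprintf(stderr, "gfarm_initialize: %s\n", gfarm_error_string(e));
        return 1;
    }
    /* The open request goes to a metadata server and then to the I/O
     * server holding the file (steps 1-4 above). */
    e = gfs_pio_open("gfarm:///tmp/data.bin", GFARM_FILE_RDONLY, &gf);
    if (e == GFARM_ERR_NO_ERROR) {
        /* Reads are exchanged only with that I/O server (step 5). */
        while (gfs_pio_read(gf, buf, sizeof(buf), &n) == GFARM_ERR_NO_ERROR && n > 0)
            ;  /* process buf[0..n-1] ... */
        gfs_pio_close(gf);  /* close goes to both servers (step 6). */
    } else {
        fprintf(stderr, "gfs_pio_open: %s\n", gfarm_error_string(e));
    }
    gfarm_terminate();
    return 0;
}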
B. FUSE

The basic FUSE architecture consists of a kernel module (hereafter referred to as the “FUSE module”) and userland daemons. Developers create their user-level file systems within these daemons by providing file system operations such as open, close, read, and write.

Fig. 2. Access to a user-level file system using FUSE

Figure 2 shows how the FUSE framework is used to access a user-level file system. The steps are as follows:
1) The application issues a system call to the operating system to access files in a FUSE-based file system. The service routine for the system call sends a request for the corresponding file operation to the FUSE module.
2) The FUSE module forwards the request to a userland daemon.
3) The daemon processes the request in several ways. Some use a local file system such as ext3. Others use a remote server and network communications.
4) The daemon returns the result to the FUSE module.
5) The FUSE module forwards the result to the application program.

To clarify the runtime overhead imposed on reading and writing files, we here outline the flow of the read operation in FUSE-based file systems. First, the application issues a read system call to the user-level file system, which sends a corresponding read request to the FUSE module. The FUSE module receives this read request and forwards it to a userland daemon by writing the request and its properties to the special file /dev/fuse. The daemon picks up the request from /dev/fuse and performs the operations appropriate for the read request, such as copying data to the FUSE module buffer. Finally, the FUSE module returns the result to the application.

Due to these additional context switches and memory copies, the performance of a FUSE-based user-level file system tends to be poorer than that of a kernel-level file system. At minimum, two additional context switches occur when an application process sends a request to a user-level file system and receives the result: one to transfer control from the application process to the daemon, and another to transfer control back from the daemon to the application process. Additional memory copies also occur, when request data (such as the request type) is communicated to the daemon via /dev/fuse, and when the daemon copies file data to the FUSE module’s own buffer, so that it can be returned to the application.

C. Accessing a FUSE-Mounted Gfarm File System

Gfarm2fs is a FUSE-based userland daemon for mounting a Gfarm file system on a UNIX client. Figure 3 shows the flow of a file access when gfarm2fs is used. The steps are as follows:
1) The application program issues a system call for a file access.
2) The FUSE module receives a request for file system access.
3) The FUSE module forwards the request to the gfarm2fs daemon. gfarm2fs processes the request by communicating with a metadata server and I/O servers using the Gfarm library.
4) The results of the request are returned to the FUSE module.

Fig. 3. Access to a Gfarm file system using FUSE
Fig. 4. Flow of the open component. The FUSE module obtains a file descriptor from gfarm2fs, which opens the file.
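The extra copy and the context switches described in Section II-B take place around the daemon’s callbacks. The sketch below is a simple passthrough daemon written against the libfuse high-level API (version 2.6), serving files from an assumed local backing directory; it is only an illustration of where the daemon copies data into the buffer supplied by the FUSE module, not the actual gfarm2fs implementation, which forwards these operations to the Gfarm library instead.

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Backing directory served through FUSE; an assumption for illustration. */
static const char *backing = "/var/lib/backing";

static void full_path(char *out, size_t len, const char *path)
{
    snprintf(out, len, "%s%s", backing, path);
}

static int pt_getattr(const char *path, struct stat *st)
{
    char p[4096];
    full_path(p, sizeof(p), path);
    return lstat(p, st) < 0 ? -errno : 0;
}

/* open: the daemon opens the real file and stashes the descriptor in fi->fh,
 * roughly what Fig. 4 describes for gfarm2fs (with a Gfarm handle instead). */
static int pt_open(const char *path, struct fuse_file_info *fi)
{
    char p[4096];
    full_path(p, sizeof(p), path);
    int fd = open(p, fi->flags);
    if (fd < 0)
        return -errno;
    fi->fh = fd;
    return 0;
}

/* read: the daemon copies file data into buf, the buffer handed over by the
 * FUSE module via /dev/fuse; this is the extra copy discussed in II-B. */
static int pt_read(const char *path, char *buf, size_t size, off_t offset,
                   struct fuse_file_info *fi)
{
    (void)path;
    ssize_t n = pread(fi->fh, buf, size, offset);
    return n < 0 ? -errno : (int)n;
}

static int pt_release(const char *path, struct fuse_file_info *fi)
{
    (void)path;
    close(fi->fh);
    return 0;
}

static struct fuse_operations pt_ops = {
    .getattr = pt_getattr,
    .open    = pt_open,
    .read    = pt_read,
    .release = pt_release,
};

int main(int argc, char *argv[])
{
    /* The mount point is given on the command line, e.g. ./passthrough -f /mnt/pt */
    return fuse_main(argc, argv, &pt_ops, NULL);
}

Such a daemon would typically be built against libfuse 2 (for example, gcc passthrough.c `pkg-config fuse --cflags --libs`) and mounted by running it with a mount point argument; every read issued against the mount then follows the kernel-module-to-daemon round trip outlined above.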