Direct-FUSE: Removing the Middleman for High-Performance FUSE File System Support

Yue Zhu*, Teng Wang*, Kathryn Mohror+, Adam Moody+, Kento Sato+, Muhib Khan*, Weikuan Yu*
Florida State University*    Lawrence Livermore National Laboratory+

Outline
  • Background & Motivation
  • Design
  • Performance Evaluation
  • Conclusion

Introduction
  • High-performance computing (HPC) systems need efficient file systems to support large-scale scientific applications
    - Different file systems are used for different kinds of data within a single job
    - Both kernel- and user-level file systems can be used by applications
    - Because kernel-level file systems suffer from development complexity, reliability, and portability issues, user-level file systems are often leveraged for special-purpose I/O workloads
  • Filesystem in Userspace (FUSE)
    - A software interface for Unix-like operating systems
    - Allows non-privileged users to create their own file systems without modifying kernel code
    - The user-defined file system runs as a separate process in user space
    - Examples: SSHFS, the GlusterFS client, FusionFS (BigData'14)

How does FUSE Work?
  • Execution path of a function call
    - Send the request to the user-level file system process
      · App program → VFS → FUSE kernel module → user-level file system process
    - Return the data back to the application program
      · User-level file system process → FUSE kernel module → VFS → app program
  [Figure: application program and user-level file system in user space; VFS, FUSE kernel module, Ext4, and page cache in kernel space; storage device below]

FUSE File System vs. Native File System

                                 FUSE File System   Native File System
  # User-kernel mode switches    4                  2
  # Context switches             2                  0
  # Memory copies                2                  1

Number of Context Switches & I/O Bandwidth
  • The complexity added to the FUSE execution path degrades I/O bandwidth
    - tmpfs: a file system that stores data in volatile memory
    - FUSE-tmpfs: a FUSE file system deployed on top of tmpfs
    - The dd micro-benchmark and the perf profiling tool are used to gather I/O bandwidth and the number of context switches
    - Experiment method: continually issue 1000 writes

  Block Size (KB)   Write Bandwidth                    # Context Switches
                    FUSE-tmpfs (MB/s)   tmpfs (GB/s)   FUSE-tmpfs   tmpfs
  4                 163                 1.3            1012         7
  16                372                 1.6            1012         7
  64                519                 1.7            1012         7
  128               549                 2.0            1012         7
  256               569                 2.4            2012         7

Breakdown of Metadata & Data Latency
  • The actual file system operations (i.e., metadata or data operations) occupy only a small fraction of the total execution time
    - Tests are run on tmpfs and FUSE-tmpfs
    - Real Operation (metadata): the time spent actually performing the operation
    - Data Movement: the time spent on the actual write within a complete write call
    - Overhead: all remaining cost, e.g., the time spent on context switches (a sketch of how such timings can be gathered follows below)
  [Fig. 1: Time Expense in Metadata Operations: latency (ns) of Create and Close on tmpfs vs. FUSE-tmpfs, split into Real Operation and Overhead]
  [Fig. 2: Time Expense in Data Operations: latency (ns) for transfer sizes from 1 KB to 256 KB on tmpfs vs. FUSE-tmpfs, split into Data Movement and Overhead]
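To make the measurement method above concrete, here is a minimal sketch of a user-space microbenchmark that issues repeated write() calls on a target file and reports the average per-call latency and bandwidth. It is not the code used for the results above; the default path, block size, and iteration count are illustrative assumptions.

```c
/* Minimal sketch (not the authors' code): time N sequential write() calls
 * on a file inside the mount point under test, e.g. a FUSE-tmpfs directory
 * vs. a plain tmpfs directory. Build: cc -O2 -o wbench wbench.c */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "/tmp/wbench.dat"; /* illustrative default */
    size_t blk = (argc > 2) ? (size_t)atoi(argv[2]) * 1024 : 4096; /* block size (KB arg) */
    int iters = 1000;                     /* matches the "1000 writes" method above */

    char *buf = malloc(blk);
    if (!buf) { perror("malloc"); return 1; }
    memset(buf, 'a', blk);

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        if (write(fd, buf, blk) != (ssize_t)blk) { perror("write"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("avg latency: %.0f ns/call, bandwidth: %.1f MB/s\n",
           sec / iters * 1e9, (double)blk * iters / sec / 1e6);

    close(fd);
    free(buf);
    return 0;
}
```

Running the same binary against a FUSE mount and against its native backing file system, while counting context switches with perf, would reproduce the kind of comparison shown in the table and figures above.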
Existing Solution and Our Approach
  • How can the overheads of FUSE be reduced?
    - Build an independent user-space library to avoid going through the kernel (e.g., IndexFS (SC'14), FusionFS)
    - However, this approach cannot support multiple FUSE libraries with distinct file paths and file descriptors
  • We propose Direct-FUSE to support multiple backend I/O services for a single application
    - We adapted libsysio for this purpose in Direct-FUSE
      · libsysio, developed by the Scalability team at Sandia National Laboratories, provides POSIX-like file I/O and name space support for remote file systems from an application's user-level address space

Outline
  • Background & Motivation
  • Design
  • Performance Evaluation
  • Conclusion

The Overview of Direct-FUSE
  • Direct-FUSE mainly consists of three components:
    1. Adapted-libsysio
       - Intercepts file paths and file descriptors to identify the backend service
       - Simplifies the metadata and data execution paths of the original libsysio
    2. lightweight-libfuse (not the real libfuse)
       - Abstracts the file system operations of the backend services into unified APIs
    3. Backend services
       - Provide the defined file system operations (e.g., FusionFS)
  [Figure: application program on top of Direct-FUSE (adapted-libsysio, lightweight-libfuse, and backend services such as FUSE-Ext4 and the FusionFS client), which in turn sits on Ext4, the FusionFS server, etc.]

Path and File Descriptor Operations
  • To facilitate interception of file system operations for multiple backends, the operations are categorized into two groups (a sketch of this dispatch follows the design slides below):
    1. File path operations
       i. Intercept the prefix and path (e.g., sshfs:/sshfs/test.txt) and return the mount information
       ii. Look up the corresponding inode based on the mount information, and redirect to the defined operations
    2. File descriptor operations
       i. Find the open-file record for the given file descriptor
          · An open-file record contains pointers to the inode, the current stream position, etc.
       ii. Redirect to the defined operations based on the inode info in the open-file record

Requirements for New Backends
  • Interact with the FUSE high-level APIs
  • Be packaged as an independent user-space library
    - The library contains the FUSE file system operations, the initialization function, and the unmount function
    - If a backend passes specialized data to the FUSE module via fuse_mount(), that data has to be made global for later file system operations
  • Be implemented in C/C++, or be binary-compatible with C/C++
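To illustrate the path-based dispatch described above, here is a minimal sketch of how a prefix such as sshfs: might be matched against a mount table and routed to that backend's FUSE-style open operation entirely in user space. The backend_t structure, the mounts table, and sshfs_open are hypothetical names used only for illustration; this is not the actual Direct-FUSE, libsysio, or SSHFS code.

```c
/* Minimal sketch (illustrative, not the actual Direct-FUSE code):
 * route an open() on a prefixed path such as "sshfs:/sshfs/test.txt"
 * to the matching backend's FUSE-like operation table, with no kernel crossing. */
#include <stdio.h>
#include <string.h>

/* A backend exposes a table of FUSE-style operations (only open() shown here). */
typedef struct backend {
    const char *prefix;                            /* e.g. "sshfs:" */
    int (*open)(const char *path, int flags);      /* backend-defined open */
} backend_t;

static int sshfs_open(const char *path, int flags) /* hypothetical backend op */
{
    (void)flags;
    printf("sshfs backend opens %s\n", path);
    return 3;                                      /* pretend file descriptor */
}

static backend_t mounts[] = {                      /* hypothetical mount table */
    { "sshfs:", sshfs_open },
};

/* Intercepted open(): match the prefix, strip it, call the backend directly. */
static int dfuse_open(const char *fullpath, int flags)
{
    for (size_t i = 0; i < sizeof(mounts) / sizeof(mounts[0]); i++) {
        size_t n = strlen(mounts[i].prefix);
        if (strncmp(fullpath, mounts[i].prefix, n) == 0)
            return mounts[i].open(fullpath + n, flags);
    }
    return -1;                                     /* no matching backend */
}

int main(void)
{
    int fd = dfuse_open("sshfs:/sshfs/test.txt", 0);
    printf("got fd %d\n", fd);
    return 0;
}
```

A real implementation would also keep an open-file table so that subsequent read, write, and close calls on the returned descriptor can be redirected to the same backend, as described in the file descriptor operations above.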
Outline
  • Background & Motivation
  • Design
  • Performance Evaluation
  • Conclusion

Experimental Methodology
  • We compare the bandwidth of Direct-FUSE with a local FUSE file system and the native file system, on disk and in memory, using IOzone
    - Disk
      · Ext4-fuse: a FUSE file system overlying Ext4
      · Ext4-direct: Ext4-fuse with the FUSE kernel bypassed
      · Ext4-native: the original Ext4 on disk
    - Memory
      · tmpfs-fuse, tmpfs-direct, and tmpfs-native are analogous to the three disk configurations
  • We also compare the I/O bandwidth of a distributed FUSE file system with Direct-FUSE
    - FusionFS: a distributed file system that supports metadata- and write-intensive operations

Sequential Write Bandwidth
  • Direct-FUSE achieves bandwidth comparable to the native file system
    - Ext4-direct outperforms Ext4-fuse by 16.5% on average
    - tmpfs-direct outperforms tmpfs-fuse by at least 2.15x
  [Figure: sequential write bandwidth (MB/s, log scale) vs. write transfer size (4 KB to 1024 KB) for Ext4-fuse, Ext4-direct, Ext4-native, tmpfs-fuse, tmpfs-direct, and tmpfs-native]

Sequential Read Bandwidth
  • Similar to the sequential write bandwidth, the read bandwidth of Direct-FUSE is comparable to the native file system
    - Ext4-direct outperforms Ext4-fuse by 2.5% on average
    - tmpfs-direct outperforms tmpfs-fuse by at least 2.26x
  [Figure: sequential read bandwidth (MB/s, log scale) vs. read transfer size (4 KB to 1024 KB) for the same six configurations]

Distributed I/O Bandwidth
  • Direct-FUSE outperforms FusionFS in write bandwidth and shows comparable read bandwidth
    - Writes benefit more from bypassing the FUSE kernel
  • Direct-FUSE delivers scalability similar to the original FusionFS
  [Figure: write and read bandwidth (MB/s, log scale) vs. number of nodes (1 to 16) for fusionfs and direct-fusionfs]

Overhead Analysis
  • The dummy read/write occupies less than 3% of the complete I/O function time in Direct-FUSE, even when the I/O size is very small (a sketch of this dummy-operation methodology appears at the end of the deck)
    - Dummy write/read: returns immediately once the backend service is reached, with no actual data movement
    - Real write/read: the actual Direct-FUSE read and write I/O calls
  [Figure: latency (ns, log scale) of dummy vs. real writes and dummy vs. real reads for transfer sizes from 1 B to 1 KB]

Conclusions
  • We have revealed and analyzed the context-switch counts and time overheads in FUSE metadata and data operations
  • We have designed and implemented Direct-FUSE, which avoids crossing the kernel boundary and supports multiple FUSE backends simultaneously
  • Our experimental results indicate that Direct-FUSE achieves significant performance improvements over the original FUSE file systems

Sponsors of This Research
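As a companion to the overhead analysis referenced above, here is a minimal sketch of what a dummy backend write (one that returns as soon as the backend is reached) might look like next to a real write that actually moves data, together with a simple timing loop. The function names, buffer sizes, and iteration count are illustrative assumptions, not the authors' measurement code.

```c
/* Minimal sketch (illustrative): a "dummy" backend write that returns immediately
 * at the backend boundary vs. a "real" write that moves data into an in-memory
 * buffer. Timing both isolates the cost of the dispatch path from data movement. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#define FILE_BUF_SIZE 4096
#define ITERS 1000000

static char filebuf[FILE_BUF_SIZE];               /* hypothetical in-memory file */

/* Real write: perform the actual data movement. */
static ssize_t real_write(const char *buf, size_t count, off_t off)
{
    memcpy(filebuf + off, buf, count);
    return (ssize_t)count;
}

/* Dummy write: the backend service is reached, but no data is moved. */
static ssize_t dummy_write(const char *buf, size_t count, off_t off)
{
    (void)buf; (void)off;
    return (ssize_t)count;
}

static double ns_per_call(ssize_t (*op)(const char *, size_t, off_t),
                          const char *buf, size_t count)
{
    struct timespec t0, t1;
    ssize_t sum = 0;                              /* keep calls from being optimized away */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERS; i++)
        sum += op(buf, count, 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return sum > 0 ? ns / ITERS : 0.0;
}

int main(void)
{
    char buf[1024];                               /* 1 KB transfer, the largest size above */
    memset(buf, 'a', sizeof(buf));
    printf("dummy write: %.1f ns/call\n", ns_per_call(dummy_write, buf, sizeof(buf)));
    printf("real  write: %.1f ns/call\n", ns_per_call(real_write, buf, sizeof(buf)));
    return 0;
}
```

Comparing the dummy and real variants over the same transfer sizes shows how much of the total call latency is spent in the dispatch layers rather than in data movement, which is the comparison the overhead analysis above reports.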
