ExtFUSE Extension framework for File systems in User space

Ashish Bijlani, Umakishore Ramachandran Georgia Institute of Technology

1 Kernel vs User File Systems

• Examples • Examples – , OverlayFS, etc. – EncFS, Gluster, etc.

• Pros • Pros – Native performance – Improved security/reliability – Easy to develop/debug/ • Cons maintain – Poor security/reliability – Not easy to develop/ • Cons debug/maintain ––PoorPoor performance! performance!

2 File Systems in User Space (FUSE)

struct fuse_lowlevel_ops ops { • State-of-the-art framework .lookup = handle_lookup, .access = NULL, – All handlers .getattr = handle_getattr, implemented in user space .setattr = handle_setattr, .open = handle_open, .read = handle_read, • Over 100+ FUSE file systems .readdir = handle_readdir, – Stackable: Android .write = handle_write, SDCardFS, EncFS, etc. // more handlers … .getxattr = handle_getxattr, – Network: GlusterFS, , .rename = handle_rename, Amazon S3FS, etc. .symlink = handle_symlink, .flush = NULL, }

3 FUSE Architecture

4’ FUSE Daemon Application Over the L I B F U S E network User 4 Stackable 1 Kernel VFS

2

3 5 FUSE Driver QUEUE

6 Lower FS (e.g., EXT4)

4 FUSE Architecture

lookup() FUSE Daemon open(“/mnt/foo/bar”) getattr() setattr() L I B F U S E Application open() User read() 4 1 Kernel readdir() write() VFS … rename() 2 lookup(“foo”) symlink() close() 3 5 FUSE Driver getxattr() QUEUE C O N T E X T setxattr() S W I T C H 6 Lower FS (e.g., EXT4)

5 FUSE Performance

“cd -4.18; make tinyconfig; make -j4” • Intel i5-3350 quad core, Ubuntu 16.04.4 LTS • Linux 4.11.0, LibFUSE commit # 386b1b, StackFS (w/ EXT4)

50 39.74 Opts Enabled 38 30.91 -o max_write=128K 25 -o splice_read -o splice_write 13 28.57% overhead! Time (sec) -o splice_move 0 entry_timeout > 0 Native (EXT4) FUSE attr_timeout > 0

6

6 # Req received by FUSE

• “cd linux-4.18; make tinyconfig; make -j4”

400K Regular Optimized 350K 1 2 300K 3 250K 4 200K atimeVFS changes issues duringgetxattr() 150K 4x fewer read() invalidate

# Requests for each write() for 100K lookup()s cachedreading attributes security labels 50K 2’ 1’ 0K Lookup Getattr Rename Setattr Create Open Release Getxattr Mkdir Unlink Opendir Readdir Releasedir Read Write

7 eBPF Overview

• Pseudo machine architecture Clang/LLVM bytecode BPF C program • C code compiled into BPF code syscall() • Verified and loaded into kernel user • Executed under VM runtime kernel

Verifier • Evolved as a generic kernel extension framework bpf virtual machine • Used by trace, perf, net subsystems BPF Map sandbox key-value data struct • Shared BPF maps with user space Kernel functions

8 ExtFUSE

• Extension framework for File systems in User space –Register “thin” extensions - handle requests in kernel • Avoid user space context switch!

–Share data between FUSE daemon and extensions using BPF maps • Cache metadata in the kernel

9 ExtFUSE Architecture

BPF Handlers FUSE Daemon Application L I B E x t F U S E L I B F U S E User 4 1 Kernel 7 0’ Cache Load VFS Meta- BPF data Code Deliver req to 2 extension 3’ 3 BPF VM FUSE Driver QUEUE 4’ 5 BPF Map Serve from cache 6 Lower FS (e.g., EXT4)

10 ExtFUSE Applications

• BPF code to proactively cache/invalidate meta-data in kernel – Applies potentially to all FUSE file systems – e.g., Gluster readdir ahead results could be cached

• BPF code to perform custom filtering or perm checks – e.g., Android SDCardFS uid checks in lookup(), open()

• BPF code to directly forward I/O requests to lower FS in kernel – e.g., install/remove target file descriptor in BPF map

11 ExtFUSE Example struct bpf_map_def map = { .type = BPF_MAP_TYPE_HASH, .key_size = sizeof(u64), // ino (param 0) .value_size = sizeof(struct fuse_attr_out), .max_entries = MAX_NUM_ATTRS, // 2 << 16 }; // getattr() kernel extension - cache attrs int getattr(struct extfuse_args *args) { u32 key = bpf_extfuse_read(args, PARAM0); u64 *val = bpf_map_lookup_elem(map, &key); if (val) bpf_extfuse_write(args, PARAM0, val); }

12 ExtFUSE Example

• Invalidate cached attrs from kernel extensions. E.g.,

// setattr() kernel extension - invalidate attrs int setattr(struct extfuse_args *args) { u32 key = bpf_extfuse_read(args, PARAM0); if (val) bpf_map_delete_elem(map, &key); } • Cache attrs from FUSE daemon – Insert into map on atime change

• Similarly, cache lookup()s and xattr()s, symlink()s

13 ExtFUSE Performance

• “cd linux-4.18; make tinyconfig; make -j4” • Intel i5-3350 quad core, SSD, Ubuntu 16.04.4 LTS • Linux 4.11.0, LibFUSE commit # 386b1b, StackFS (w/ EXT4)

40 Overhead 30 Optimized Latency: 28.57% ExtFUSE Latency: 2.16% 20 Only 2.16% overhead! ExtFUSE Memory: 50MB

Time (sec) 10 (worst case)

0 Cached: lookup, attr, xattr Native Optimized ExtFUSE Passthrough: read, write

14 # Req received by FUSE

• “cd linux-4.18; make tinyconfig; make -j4”

400K Optimized ExtFUSE 350K 300K 250K 200K No read/ 150K Very few Very few write() reqs # Requests 100K getattr()s getxattr()s 50K 0K Lookup Getattr Rename Setattr Create Open Release Getxattr Mkdir Unlink Opendir Readdir Releasedir Read Write

15 Conclusion

• ExtFUSE framework safely executes “thin” file system handlers in the kernel.

• Developers can use ExtFUSE to • Cache metadata requests • Directly pass I/O requests to lower FS. • Insert custom security checks in the kernel.

• We ported four FUSE file systems to ExtFUSE, including Android SDcardFS and show significant performance improvements.

16 Thank You!

• Open Source Summit ’18 • Presentation slides

• Linux Plumbers Conference ’18 talk • Presentation video

• Project page –https://extfuse.github.io

17