When eBPF Meets FUSE: Improving the Performance of User File Systems
Ashish Bijlani, PhD Student, Georgia Tech
@ashishbijlani

In-Kernel vs User File Systems

“People who think that userspace filesystems are realistic for anything but toys are just misguided.”
- Linus Torvalds

“A lot of people once thought Linux and the machines it ran on were toys… Apparently I’m misguided.”
- Jeff Darcy

In-Kernel vs User File Systems

In-kernel file systems
• Examples
  – Ext4, OverlayFS, etc.
• Pros
  – Native performance
• Cons
  – Poor security/reliability
  – Not easy to develop/debug/maintain

User file systems
• Examples
  – EncFS, Gluster, etc.
• Pros
  – Improved security/reliability
  – Easy to develop/debug/maintain
• Cons
  – Poor performance!

File Systems in User Space (FUSE)
• State-of-the-art framework
  – All file system handlers implemented in user space
• 100+ FUSE file systems
  – Stackable: Android SDCardFS, EncFS, etc.
  – Network: GlusterFS, Ceph, Amazon S3FS, etc.

    struct fuse_lowlevel_ops ops = {
        .lookup   = handle_lookup,
        .access   = NULL,
        .getattr  = handle_getattr,
        .setattr  = handle_setattr,
        .open     = handle_open,
        .read     = handle_read,
        .readdir  = handle_readdir,
        .write    = handle_write,
        // more handlers …
        .getxattr = handle_getxattr,
        .rename   = handle_rename,
        .symlink  = handle_symlink,
        .flush    = NULL,
    };

FUSE Architecture
[Figure: FUSE architecture. User space: Application and FUSE daemon (LibFUSE). Kernel: VFS, FUSE driver with request QUEUE, lower FS (e.g., EXT4). Numbered arrows 1–6 trace a request from the application through VFS and the FUSE driver queue to the daemon, which serves it over the network (4’) or, for a stackable file system, through the lower FS.]

FUSE Performance
• “tar xf linux-4.17.tar.xz”
  – Intel i5-3350 quad core, Ubuntu 16.04.4 LTS
  – Linux 4.11.0, LibFUSE commit # 386b1b
• Overhead
  – HDD: 78.8%
  – SSD: 81.1%
• More noticeable on faster media!
[Figure: bar chart of time (sec), Native vs FUSE, on HDD and SSD]

FUSE Performance
• “cd linux-4.17; make tinyconfig; make -j4”
  – Intel i5-3350 quad core, SSD, Ubuntu 16.04.4 LTS
  – Linux 4.11.0, LibFUSE commit # 386b1b
• 17.54% overhead
[Figure: bar chart of time (sec), Native vs FUSE]

FUSE Architecture
[Figure: FUSE architecture with request flow. The application calls open(“/mnt/foo/bar”); VFS issues lookup(“foo”) to the FUSE driver, which queues it for the FUSE daemon (LibFUSE) in user space. Every request crosses a user/kernel CONTEXT SWITCH. Requests handled by the daemon: lookup(), getattr(), setattr(), open(), read(), readdir(), write(), …, rename(), symlink(), close(), getxattr(), setxattr().]

Requests received by FUSE daemon
• “cd linux-4.17; make tinyconfig; make -j4”
[Figure: bar chart of # requests (0K–400K) per request type: Lookup, Getattr, Rename, Setattr, Create, Open, Release, Getxattr, Mkdir, Unlink, Opendir, Readdir, Releasedir, Read, Write]

FUSE Optimizations
• Big 128K writes
  – “-o max_write=131072”
• Zero data copying for data I/O
  – “-o splice_read, splice_write, splice_move”
• Leveraging VFS caches
  – Page cache for data I/O
    • “-o writeback_cache”
  – Dentry and inode caches for lookup() and getattr()
    • “entry_timeout”, “attr_timeout”

FUSE Performance
• “cd linux-4.17; make tinyconfig; make -j4”
• Intel i5-3350 quad core, Ubuntu 16.04.4 LTS
• Linux 4.11.0, LibFUSE commit # 386b1b
• Opts do not help much!
• Opts enabled
  – -o max_write=128K
  – -o splice_read
  – -o splice_write
  – -o splice_move
  – entry_timeout > 0
  – attr_timeout > 0
[Figure: bar chart of time (sec) for Native, Regular, and Optimized]

Requests received by FUSE daemon
• “cd linux-4.17; make tinyconfig; make -j4”
• 4x fewer lookup()s
[Figure: bar chart of # requests (0K–400K) per request type, Regular vs Optimized: Lookup, Getattr, Rename, Setattr, Create, Open, Release, Getxattr, Mkdir, Unlink, Opendir, Readdir, Releasedir, Read, Write]

Requests received by FUSE daemon
• “cd linux-4.17; make tinyconfig; make -j4”
• atime changes during read() invalidate cached attributes
[Figure: same chart, Regular vs Optimized, with the Getattr bars highlighted]

Requests received by FUSE daemon
• “cd linux-4.17; make tinyconfig; make -j4”
• VFS issues getxattr() for each write() to read security labels
[Figure: same chart, Regular vs Optimized, with the Getxattr bars highlighted]

eBPF
• Berkeley Packet Filter (BPF)
  – Pseudo machine architecture for packet filtering
• eBPF enhances BPF
  – Evolved as a generic kernel extension framework
  – Used by tracing, perf, and network subsystems

eBPF Overview
• Extensions written in C
• Compiled into BPF code
• Code is verified and loaded into kernel
• Execution under virtual machine runtime
• Shared BPF maps with user space
[Figure: a BPF C program is compiled by Clang/LLVM into bytecode and loaded from user space via syscall(); in the kernel it passes the verifier and runs in the sandboxed BPF virtual machine, with access to kernel functions and a BPF map (key-value data structure).]

eBPF Simplified Example

    struct bpf_map_def map = {
        .type        = BPF_MAP_TYPE_ARRAY,
        .key_size    = sizeof(u32),
        .value_size  = sizeof(u64),
        .max_entries = 1, // single element
    };

    // tracepoint/syscalls/sys_enter_open
    int count_open(struct syscall *args) {
        u32 key = 0;
        // the helper takes a pointer to the map
        u64 *val = bpf_map_lookup_elem(&map, &key);
        if (val)
            __sync_fetch_and_add(val, 1); // atomic increment
        return 0;
    }

ExtFUSE: eBPF meets FUSE
• Extension framework for File systems in User space
  – Register “thin” extensions that handle requests in the kernel
    • Avoid the user-space context switch!
  – Share data between FUSE daemon and extensions using BPF maps
    • Cache metadata in the kernel

FUSE Architecture
[Figure: FUSE architecture with ExtFUSE. User space: Application, FUSE daemon (LibFUSE, LibExtFUSE) with BPF handlers. Kernel: VFS, FUSE driver with request QUEUE, BPF VM, BPF map, lower FS (e.g., EXT4). Added steps: load BPF code (0’), deliver req to extension (3’), serve from cache (4’), cache metadata (7).]

ExtFUSE Simplified Example

    struct bpf_map_def map = {
        .type        = BPF_MAP_TYPE_HASH,
        .key_size    = sizeof(u64), // ino (param 0)
        .value_size  = sizeof(struct fuse_attr_out),
        .max_entries = MAX_NUM_ATTRS, // 2 << 16
    };

    // getattr() kernel extension - cache attrs
    int getattr(struct extfuse_args *args) {
        // key type matches .key_size above
        u64 key = bpf_extfuse_read(args, PARAM0);
        // value type matches .value_size above
        struct fuse_attr_out *val = bpf_map_lookup_elem(&map, &key);
        if (val)
            bpf_extfuse_write(args, PARAM0, val);
        // return code elided on the slide
    }

ExtFUSE Simplified Example
• Invalidate cached attrs from kernel extensions. E.g.,

    // setattr() kernel extension - invalidate attrs
    int setattr(struct extfuse_args *args) {
        u64 key = bpf_extfuse_read(args, PARAM0);
        // deleting a key that is not cached is harmless
        bpf_map_delete_elem(&map, &key);
        // return code elided on the slide
    }

• Cache attrs from FUSE daemon
  – Insert into map on atime change
• Similarly, cache lookup()s and xattr()s

ExtFUSE Performance
• “cd linux-4.17; make tinyconfig; make -j4”
• Intel i5-3350 quad core, SSD, Ubuntu 16.04.4 LTS
• Linux 4.11.0, LibFUSE commit # 386b1b
• Overhead
  – Regular latency: 17.54%
  – ExtFUSE latency: 5.71%
  – ExtFUSE memory: 50MB (worst case)
• Cached: lookup, attr, xattr
[Figure: bar chart of time (sec) for Native, Regular, Optimized, and ExtFUSE]

Requests received by FUSE daemon
• “cd linux-4.17; make tinyconfig; make -j4”
• Very few getattr()s
• Very few getxattr()s
[Figure: bar chart of # requests (0K–400K) per request type, Regular vs Optimized vs ExtFUSE: Lookup, Getattr, Rename, Setattr, Create, Open, Release, Getxattr, Mkdir, Unlink, Opendir, Readdir, Releasedir, Read, Write]

ExtFUSE Applications
• BPF code to cache/invalidate meta-data in kernel
  – Applies potentially to all FUSE file systems
  – e.g., Gluster readdir ahead results could be cached
• BPF code
  to perform custom filtering or perm checks
  – e.g., Android SDCardFS uid checks in lookup(), open()
• BPF code to forward I/O requests to lower FS in kernel
  – e.g., install/remove target file descriptor in BPF map

ExtFUSE Status and References
• Work in progress at Georgia Tech
  – Applying to Gluster, Ceph, EncFS, Android SDCardFS, etc.
  – Project page: https://extfuse.github.io
• References
  – FUSE performance study by FSL, Stony Brook
  – IOVisor eBPF Project
  – BPF Compiler Collection (BCC) Toolchain

Thank You!
[email protected]
www.linkedin.com/in/ashishbijlani
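
Backup: attribute-cache logic in plain C
The getattr()/setattr() extensions in the ExtFUSE Simplified Example slides amount to lookup, insert, and delete on a map keyed by inode number. The following is a minimal user-space sketch of that logic, not ExtFUSE code: a fixed-size open-addressing table stands in for the BPF hash map, and struct attr, CACHE_SLOTS, attr_cache_put(), attr_cache_get(), and attr_cache_del() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size; the slides size the real map with MAX_NUM_ATTRS. */
#define CACHE_SLOTS 1024

/* Hypothetical stand-in for the cached attributes (struct fuse_attr_out on the slides). */
struct attr {
    uint64_t size;
    uint32_t mode;
};

enum slot_state { SLOT_EMPTY = 0, SLOT_USED, SLOT_DELETED };

struct slot {
    enum slot_state state;
    uint64_t ino;
    struct attr attr;
};

static struct slot cache[CACHE_SLOTS]; /* zero-initialised: all slots empty */

static size_t hash_ino(uint64_t ino)
{
    /* Multiplicative hash; unsigned overflow wraps and is well defined. */
    return (size_t)((ino * 0x9E3779B97F4A7C15ULL) >> 32) % CACHE_SLOTS;
}

/* Linear probe for the slot holding ino; NULL if it is not cached. */
static struct slot *find_slot(uint64_t ino)
{
    size_t h = hash_ino(ino);
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct slot *s = &cache[(h + i) % CACHE_SLOTS];
        if (s->state == SLOT_EMPTY)
            return NULL; /* the probe chain ends here */
        if (s->state == SLOT_USED && s->ino == ino)
            return s;
    }
    return NULL;
}

/* Daemon side: insert or refresh attrs. Returns 0 on success, -1 if the table is full. */
int attr_cache_put(uint64_t ino, const struct attr *a)
{
    assert(a != NULL);
    struct slot *s = find_slot(ino);
    if (!s) {
        size_t h = hash_ino(ino);
        for (size_t i = 0; i < CACHE_SLOTS && !s; i++) {
            struct slot *c = &cache[(h + i) % CACHE_SLOTS];
            if (c->state != SLOT_USED)
                s = c; /* reuse the first empty or deleted slot */
        }
        if (!s)
            return -1;
    }
    s->state = SLOT_USED;
    s->ino = ino;
    s->attr = *a;
    return 0;
}

/* getattr() fast path: returns 1 and fills *out on a hit, 0 on a miss
 * (a miss is where the request would go on to the daemon). */
int attr_cache_get(uint64_t ino, struct attr *out)
{
    assert(out != NULL);
    struct slot *s = find_slot(ino);
    if (!s)
        return 0;
    *out = s->attr;
    return 1;
}

/* setattr() path: invalidate the cached attrs; a no-op if ino is not cached. */
void attr_cache_del(uint64_t ino)
{
    struct slot *s = find_slot(ino);
    if (s)
        s->state = SLOT_DELETED; /* tombstone keeps later probe chains intact */
}
```

In this sketch the daemon would call attr_cache_put() when it has fresh attributes, the fast path would try attr_cache_get() before forwarding a request, and a setattr() would call attr_cache_del(), mirroring the insert, lookup, and delete roles on the slides.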