Extension Framework for File Systems in User Space
ExtFUSE
Extension framework for File systems in User space
Ashish Bijlani, Umakishore Ramachandran
Georgia Institute of Technology

Kernel vs User File Systems

Kernel file systems
• Examples
  – Ext4, OverlayFS, etc.
• Pros
  – Native performance
• Cons
  – Poor security/reliability
  – Not easy to develop/debug/maintain

User file systems
• Examples
  – EncFS, Gluster, etc.
• Pros
  – Improved security/reliability
  – Easy to develop/debug/maintain
• Cons
  – Poor performance!

File Systems in User Space (FUSE)
• State-of-the-art framework
  – All file system handlers implemented in user space
• Over 100 FUSE file systems
  – Stackable: Android SDCardFS, EncFS, etc.
  – Network: GlusterFS, Ceph, Amazon S3FS, etc.

    struct fuse_lowlevel_ops ops = {
        .lookup   = handle_lookup,
        .access   = NULL,
        .getattr  = handle_getattr,
        .setattr  = handle_setattr,
        .open     = handle_open,
        .read     = handle_read,
        .readdir  = handle_readdir,
        .write    = handle_write,
        // more handlers …
        .getxattr = handle_getxattr,
        .rename   = handle_rename,
        .symlink  = handle_symlink,
        .flush    = NULL,
    };

FUSE Architecture
[Figure: FUSE architecture. User space holds the Application and the FUSE Daemon (LibFUSE); the kernel holds the VFS, the FUSE Driver with its request queue, and the Lower FS (e.g., EXT4). Numbered arrows 1 to 6 trace a request; arrow 4 is labeled "Stackable" and arrow 4’ is labeled "Over the network".]

FUSE Architecture
[Figure: the same architecture for a single call. The application calls open(“/mnt/foo/bar”); the VFS issues lookup(“foo”) to the FUSE Driver, which queues it for the FUSE Daemon. The user/kernel boundary is labeled CONTEXT SWITCH. Daemon handlers shown: lookup(), getattr(), setattr(), open(), read(), readdir(), write(), …, rename(), symlink(), close(), getxattr(), setxattr().]

FUSE Performance
• “cd linux-4.18; make tinyconfig; make -j4”
• Intel i5-3350 quad core, Ubuntu 16.04.4 LTS
• Linux 4.11.0, LibFUSE commit # 386b1b, StackFS (w/ EXT4)
• Opts enabled for FUSE
  – -o max_write=128K
  – -o splice_read
  – -o splice_write
  – -o splice_move
  – entry_timeout > 0
  – attr_timeout > 0
[Figure: bar chart, y-axis Time (sec). Native (EXT4): 30.91; FUSE: 39.74.]
• 28.57% overhead!

# Req received by FUSE
• “cd linux-4.18; make tinyconfig; make -j4”
[Figure: bar chart, y-axis # Requests (0K to 400K), series Regular and Optimized, one group per request type: Lookup, Getattr, Rename, Setattr, Create, Open, Release, Getxattr, Mkdir, Unlink, Opendir, Readdir, Releasedir, Read, Write.]
• Annotations on the chart
  – 4x fewer lookup()s
  – atime changes during read() invalidate cached attributes
  – VFS issues getxattr() for each write() for reading security labels

eBPF Overview
• Pseudo machine architecture
• C code compiled into BPF code
• Verified and loaded into kernel
• Executed under VM runtime
• Evolved as a generic kernel extension framework
• Used by trace, perf, net subsystems
• Shared BPF maps with user space
[Figure: a BPF C program is compiled by Clang/LLVM to bytecode and loaded from user space via syscall(); in the kernel it passes the Verifier and runs in the bpf virtual machine sandbox, with access to a BPF Map (key-value data struct) and to kernel functions.]

ExtFUSE
• Extension framework for File systems in User space
  – Register “thin” extensions - handle requests in kernel
    • Avoid user space context switch!
  – Share data between FUSE daemon and extensions using BPF maps
    • Cache metadata in the kernel

ExtFUSE Architecture
[Figure: ExtFUSE architecture. User space holds the Application and the FUSE Daemon with its BPF Handlers, LibExtFUSE, and LibFUSE; the kernel holds the VFS, the FUSE Driver with its request queue, the BPF VM, the BPF Map, and the Lower FS (e.g., EXT4). Annotations: "Load BPF Code", "Cache Meta-data", "Deliver req to extension", "Serve from cache".]

ExtFUSE Applications
• BPF code to proactively cache/invalidate meta-data in kernel
  – Applies potentially to all FUSE file systems
  – e.g., Gluster readdir ahead results could be cached
• BPF code to perform custom filtering or perm checks
  – e.g., Android SDCardFS uid checks in lookup(), open()
• BPF code to directly forward I/O requests to lower FS in kernel
  – e.g., install/remove target file descriptor in BPF map

ExtFUSE Example

    struct bpf_map_def map = {
        .type        = BPF_MAP_TYPE_HASH,
        .key_size    = sizeof(u64),   // ino (param 0)
        .value_size  = sizeof(struct fuse_attr_out),
        .max_entries = MAX_NUM_ATTRS, // 2 << 16
    };

    // getattr() kernel extension - cache attrs
    int getattr(struct extfuse_args *args) {
        // key must be 64 bits wide to match .key_size above
        u64 key = bpf_extfuse_read(args, PARAM0);
        struct fuse_attr_out *val = bpf_map_lookup_elem(&map, &key);
        if (val)
            bpf_extfuse_write(args, PARAM0, val);
        return 0; // placeholder: the slide omits the return convention
    }

ExtFUSE Example
• Invalidate cached attrs from kernel extensions, e.g.,

    // setattr() kernel extension - invalidate attrs
    int setattr(struct extfuse_args *args) {
        u64 key = bpf_extfuse_read(args, PARAM0);
        // deleting an absent key is harmless, so no lookup is needed first
        bpf_map_delete_elem(&map, &key);
        return 0; // placeholder: the slide omits the return convention
    }

• Cache attrs from FUSE daemon
  – Insert into map on atime change
• Similarly, cache lookup()s, xattr()s, and symlink()s

ExtFUSE Performance
• “cd linux-4.18; make tinyconfig; make -j4”
• Intel i5-3350 quad core, SSD, Ubuntu 16.04.4 LTS
• Linux 4.11.0, LibFUSE commit # 386b1b, StackFS (w/ EXT4)
[Figure: bar chart, y-axis Time (sec) (0 to 40), bars Native, Optimized, ExtFUSE.]
• Overhead
  – Optimized latency: 28.57%
  – ExtFUSE latency: 2.16%
  – ExtFUSE memory: 50MB (worst case)
• Only 2.16% overhead!
• Cached: lookup, attr, xattr
• Passthrough: read, write

# Req received by FUSE
• “cd linux-4.18; make tinyconfig; make -j4”
[Figure: bar chart, y-axis # Requests (0K to 400K), series Optimized and ExtFUSE, one group per request type: Lookup, Getattr, Rename, Setattr, Create, Open, Release, Getxattr, Mkdir, Unlink, Opendir, Readdir, Releasedir, Read, Write.]
• Annotations on the chart
  – Very few getattr()s
  – Very few getxattr()s
  – No read/write() reqs

Conclusion
• The ExtFUSE framework safely executes “thin” file system handlers in the kernel.
• Developers can use ExtFUSE to
  – Cache metadata requests.
  – Directly pass I/O requests to the lower FS.
  – Insert custom security checks in the kernel.
• We ported four FUSE file systems to ExtFUSE, including Android SDCardFS, and show significant performance improvements.

Thank You!
• Open Source Summit ’18
  – Presentation slides
• Linux Plumbers Conference ’18 talk
  – Presentation video
• Project page
  – https://extfuse.github.io
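
Appendix sketch: attribute caching and invalidation

As a companion to the ExtFUSE Example slides, the cache-and-invalidate pattern can be tried out in plain user-space C, with no kernel or BPF involved. This is only a rough sketch under stated assumptions: a small direct-mapped table stands in for the BPF hash map, struct attr_entry stands in for struct fuse_attr_out, and every name below (attr_cache, daemon_cache_attr, ext_getattr, ext_setattr, CACHE_SLOTS) is hypothetical and not part of ExtFUSE or LibFUSE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 1024u /* hypothetical capacity; must be a power of two */

/* Hypothetical stand-in for struct fuse_attr_out, reduced to two fields. */
struct attr_entry {
    uint64_t size;
    uint32_t mode;
};

struct slot {
    bool used;
    uint64_t ino;
    struct attr_entry attr;
};

/* Direct-mapped table standing in for the shared BPF map. */
struct attr_cache {
    struct slot slots[CACHE_SLOTS];
};

void cache_init(struct attr_cache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof(*c));
}

/* Map an inode number to a slot using a multiplicative hash. */
static uint32_t slot_index(uint64_t ino)
{
    uint64_t h = ino * 0x9E3779B97F4A7C15ull;
    return (uint32_t)(h >> 32) & (CACHE_SLOTS - 1u);
}

/* Daemon side: publish attributes so later getattr calls can skip the daemon.
 * A colliding inode simply overwrites the slot. */
void daemon_cache_attr(struct attr_cache *c, uint64_t ino,
                       const struct attr_entry *attr)
{
    struct slot *s = &c->slots[slot_index(ino)];
    s->used = true;
    s->ino = ino;
    s->attr = *attr;
}

/* Extension side, getattr: true means "served from cache";
 * false means the request would be forwarded to the daemon. */
bool ext_getattr(const struct attr_cache *c, uint64_t ino,
                 struct attr_entry *out)
{
    const struct slot *s = &c->slots[slot_index(ino)];
    if (!s->used || s->ino != ino)
        return false;
    *out = s->attr;
    return true;
}

/* Extension side, setattr: drop the stale entry, if any. */
void ext_setattr(struct attr_cache *c, uint64_t ino)
{
    struct slot *s = &c->slots[slot_index(ino)];
    if (s->used && s->ino == ino)
        s->used = false;
}
```

A caller would run cache_init() once, call daemon_cache_attr() after the daemon answers a getattr, try ext_getattr() first on later requests, and call ext_setattr() whenever attributes change. The direct-mapped layout keeps the sketch short; a real BPF hash map resolves collisions instead of overwriting entries.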