Wayback: a User-Level Versioning File System for Linux
Total Page:16
File Type:pdf, Size:1020Kb
Wayback: A User-level Versioning File System For Linux Brian Cornell, Peter Dinda, and Fabián Bustamante Computer Science Department Northwestern University 1 Really excited last night ● Great idea for this presentation ● 3d rotating slides... ● Woke up this morning, Oh no! ● Too excited: no backup copy ● No RCS1 or CVS2 check-in ● Lucky, Wayback was there! ● Reverted back to yesterday afternoon, and here we are, aren't you glad? 1: F. Tichy, 1980. 2: B. Berliner, 2001. 2 Outline ● Overview ● Related work ● How it works – User's perspective ● Behind the scenes – System's perspective ● Design decisions ● Performance ● Future work 3 Wayback: Automatic Versioning ● Automatic: Versions files and directories without user interaction ● Universal: Versions files of all types ● Comprehensive: Never loses data ● Generic: Works over any directory on any base FS ● User-space: Implemented in user-space Free (GPL): http://wayback.sourceforge.net 4 Related Work ● VMS operating system [K McCoy, 1990] – Version on close ● Elephant [D Santry, et al, 1999] – Exploit massive disks ● VersionFS stackable file system [Muniswamy- Reddy, et al, 2004] – Contemporaneous to this project – Implements different features ● Cedar [D. K. Gifford, et al, 1998], 3DFS [D. G. Korn, et al, 1989], CVFS [C. A. Soules, et al, 2003] 5 How does it work? ● Three simple steps ● Remount folder ● Edit and work as normal ● Revert or extract old versions 6 Remount folder ● Run wayback executable to remount – wayback /mnt/data /home/brian/Projects ● Use mount.wayback script to simplify and mount for all users – mount.wayback /mnt/data /home/brian/Projects ● Files still at /mnt/data, but if you edit them at /home/brian/Projects, they are versioned 7 Work normally ● Edit files with any editor ● Modify binary files too ● Move, rename, delete files ● Edit directories as much as you want ● Everything you do is recorded and remembered 8 Revert or extract versions ● List the versions of a file – vstat Wayback.sxi ● Revert a file to an old version – To a time: vrevert -d 15:00:00 Wayback.sxi – To a date: vrevert -d 2004:06:29:15:00:00 Wayback.sxi – To a numbered version as listed by vstat: vrevert -n 5 Wayback.sxi ● Revert a directory to a date/time – vrevert -d 2004:06:29:15:00:00 Documents ● Note: Even reverting can be undone 9 Revert or extract versions ● Extract versions to a different file – Same format as vrevert with a destination – vextract -d 2004:06:29:15:00:00 Wayback.sxi GoodWayback.sxi ● Delete the versioning if you're sure you don't need it anymore – vrm Wayback.sxi – All versioning info for the file is permanently deleted – File is not deleted. To delete file and versioning, first delete file, then versioning. 10 Behind the Scenes ● FUSE1 module sends all FS calls to Wayback ● Everything kept track of transparently ● Hidden log files, one per file or directory ● File operations recorded ● Directory operations recorded 1: M. Szeredi,, Filesystem in USEr space, http://sourceforge.net/projects/avf 11 Log Files ● Every versioned file has a hidden log file – Wayback.sxi. versionfs! version ● Every directory has a hidden catalog – Documents/. versionfs! version ● Logs created as soon as needed ● Logs omitted from directory listing by Wayback ● Logs in binary custom format 12 File Operations ● On every write or Log file truncate, entry Previous appended to log Log Entries ● Data needed to undo ... write or truncate is Timestamp written Write Offset Overwritten Size File Enlarged Flag Overwritten Data 13 Directory Operations ● When files are changed Catalog file in a way that changes Previous their directory entries, Catalog directory catalog is Entries ... updated Timestamp ● Every operation writes Action ID timestamp and action ID ● Action Specific data per Specific operation Data 14 Dir Ops: Create, Mkdir ● Just needs to know name so it can be deleted on revert Length of Name File or Directory Name 15 Dir Ops: Rm ● First, file is truncated to 0 bytes – File version log now has backup ● Record file attributes – Mode, owner, times ● Record filename ● Record destination for symbolic link Attributes Contents: Size of Data Attributes Mode UID GID Filename Access Time Modified Time Create Time [Link dest] 16 Dir Ops: Rmdir, Rename ● Rmdir actually does rename and omits from directory listing – Don't want to lose logs for files in directory ● Record old and new names Length of Names Old and New Names 17 Dir Ops: Chmod, Chown, Utime ● Just need to know the filename and the old attributes Size of Data Attributes Filename 18 Design Decisions ● User vs kernel level – User: Remount any base FS – User: Development easier, more stable – Kernel: Faster ● Undo vs redo log – Undo: Everything always retained as long as current version is not lost – Undo: No overhead of initial file copy – Redo: Files recoverable even if the current version is lost 19 Design Decisions ● FUSE vs other user level modules – It's modern and actively developed – It works with kernels 2.4 and 2.6 – It gives complete but simple access to FS operations – It's written in C (with which we are more familiar) 20 Design Decisions ● Comprehensive versioning vs others – Record every operation that ever happens – Periodic: Choose time-granularity of versioning – Version on close: Assume one logical operation per file open ● Individual log per file vs central log – Individual: No central point of failure – Individual: Simple, logs stay with files – Central: Doesn't consume inodes as fast 21 What's the cost of Wayback? ● 3 performance tests – Bonnie1: Large file reads and writes – Andrew2: Typical use and compilation estimation – RCS: Versioning comparison with RCS ● Tested with ext3, FUSE, and Wayback 1: T. Bray, http://www.textuality.com/bonnie 2: J. Howard, et al, 1988. 22 Test machines ● Tests on 3 machines – Machine A: Normal ● Athlon XP 2400 ● 512 Mb memory ● 2.5 in notebook hard drive – Machine B: Slow disk ● Pentium IV 2.2 Ghz ● 512 Mb memory ● USB 1.1 hard drive – Machine C: Slow processor ● Celeron 500 Mhz ● 128 Mb memory ● 2.5 in notebook hard drive 23 Bonnie: 30-70% max overhead 18 16 14 12 s / B 10 ext3 M FUSE 8 Wayback 6 4 2 0 Write per Write per Rewrite Read per Read per character block character block 24 Modified Andrew Benchmark ● Andrew uses Makefile to run five phases with Sun graphical source code ● Modification uses Perl and Linux wm ● Five phases: – 1: Directory creation (mkdir) – 2: Copying files (cp) – 3: Stating files (ls -l) – 4: Reading files (grep and wc) – 5: Compilation (gcc) 25 Andrew: Compile 15% slower 1100 1000 900 800 ) ms 700 ( Ext3 e 600 FUSE Wayback Tim 500 400 300 200 100 0 mkdir cp ls -l grep, wc gcc / 20 26 RCS Comparison ● 3 traces – Binary write: ● Random small changes with write system call – Binary file: ● Random small changes, but rewrite whole file – Text: ● Random changes to lines in text file ● 11 configurations – Wayback – RCS checkin after every k changes, k=1..10 27 RCS: Wayback Faster 50 45 40 ) 35 (s e 30 Binary write m i Binary file T 25 Text x 10 20 15 10 5 0 Wb 1 2 3 4 5 6 7 8 9 10 28 Future Work ● Log compression – Remove unnecessary granularity ● Garbage collection – Remove old versions ● Limiting – Limit number or size of versions ● Customizable versioning – Filter to only version some files, not others ● Migration of virtual machines – Undo/redo to sync virtual machines 29 Summary ● Don't lose important data! ● RCS, CVS, etc require user action ● Wayback automatically versions – Remount directory with versioning – Edit files as normal – Writes are written as undo information – Easy to revert or extract to any time ● Wayback faster than RCS even excluding user time 30 Get it! It's free! (Gnu General Public License) http://wayback.sourceforge.net 31 RCS Size Comparison (MB) File Type Binary Write Binary File Text Wayback 1157456.4 106347428.0 2182218.0 RCS Period 1 2242325.4 3856521.8 101062.2 RCS Period 2 2237020.7 3779180.4 96134.2 RCS Period 3 2233854.1 3731427.8 94336.4 RCS Period 4 2234384.4 3719578.2 93597.1 RCS Period 5 2233716.4 3700853.4 93095.3 RCS Period 6 2227924.3 3621657.1 92375.5 RCS Period 7 2230300.3 3635107.2 92321.2 RCS Period 8 2227552.2 3590195.5 91960.0 RCS Period 9 2231124.4 3625548.8 92060.8 RCS Period 10 2232218.7 3629717.4 92045.2 32 Slow Disk: Bonnie Performance 1.1 1 0.9 0.8 s 0.7 / B ext3 0.6 M FUSE 0.5 Wayback 0.4 0.3 0.2 0.1 0 Write per Write per Rewrite Read per Read per character block character block 33 Slow Processor: Bonnie Performance 18 16 14 12 s / B 10 ext3 M FUSE 8 Wayback 6 4 2 0 Write per Write per Rewrite Read per Read per character block character block 34 Slow Disk: Andrew Performance 2250 2000 1750 ) s 1500 (m Ext3 e 1250 FUSE m i T Wayback 1000 750 500 250 0 mkdir cp ls -l grep, wc gcc / 20 35 Slow Processor: Andrew Performance 3000 2750 2500 2250 ) 2000 ms ( 1750 Ext3 e FUSE 1500 Tim Wayback 1250 1000 750 500 250 0 mkdir cp ls -l grep, wc gcc / 20 36 Slow Disk: RCS Performance 60 55 50 45 ) (s 40 e Binary write m 35 i Binary file T 30 Text x 10 25 20 15 10 5 0 Wb 1 2 3 4 5 6 7 8 9 10 37 Slow Processor: RCS Performance 110 100 90 80 ) s ( 70 e Binary write 60 Binary file Tim Text x 10 50 40 30 20 10 0 Wb 1 2 3 4 5 6 7 8 9 10 38 .