NILFS2: Review and Challenges

Utilizing NILFS2 Fine-grained Snapshots
Ryusuke KONISHI
NTT Cyberspace Laboratories, NTT Corporation
2011/6/1, Copyright (C) 2011 NTT Corporation

Outline
• NILFS2 overview
• Fine-grained snapshots: why?
• Use-case scenario and applications
• Work in progress on snapshots
• Current status and future plan

NILFS2 Overview
• A mainlined filesystem (since kernel 2.6.30)
• A log-structured filesystem
  – The filesystem itself is a big journal.
  – Ensures consistency and quick recovery from unexpected power failure.
• Supports fine-grained and "any time" snapshots
  – Creates a number of checkpoints every time the user makes a change.
  – Can change arbitrary checkpoints into snapshots later on.
  – Snapshots are concurrently mountable and accessible.

Fine-grained Snapshots: Why?
• Backup is necessary to prevent data loss, but it is still inconvenient and painful.

Cause of data loss:
    Hardware failure       59%
    Human error            26%
    Software malfunction    9%
    Viruses                 4%
    Natural disaster        2%
Chart annotations: "Mostly preventable with basic high-integrity system (redundant configuration)" and "Unprotected by redundant drives."
Source: Ontrack Data Recovery, Inc. Including office PC. The data is based on actual data recoveries performed by Ontrack.

Solution with NILFS
• Buffer filesystem history on disk.
• Users can even restore files mistakenly overwritten or destroyed just a few seconds ago.
[Figure: usual filesystems versus NILFS. With usual filesystems, previous data is overwritten and older versions survive only in periodic backups (e.g. every 8 hours). With NILFS, the successive versions (A→B→C→D→E→F) are preserved on disk and easily accessible.]

Disk write in NILFS
• Only modified blocks are incrementally written to disk (copy-on-write).
  – This applies to metadata and B-tree intermediate blocks as well as data.
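The copy-on-write behaviour described on this slide can be illustrated with a small model. This is only a sketch under simplifying assumptions: the names `Log`, `write_file`, `update_block`, and `read_file` are hypothetical, and a single index block stands in for the B-tree and metadata blocks that NILFS2 actually uses. It shows that an update appends new blocks to the log and leaves the old ones readable through the old index.

```python
class Log:
    """Append-only 'disk': blocks are only ever added, never overwritten."""

    def __init__(self):
        self.blocks = []

    def append(self, payload):
        self.blocks.append(payload)
        return len(self.blocks) - 1  # address of the block just written


def write_file(log, data_blocks):
    """Write the data blocks, then an index block pointing at them."""
    addrs = [log.append(d) for d in data_blocks]
    return log.append(tuple(addrs))  # address of the index block


def update_block(log, index_addr, position, new_data):
    """Copy-on-write update: append the new data block and a new index."""
    addrs = list(log.blocks[index_addr])
    addrs[position] = log.append(new_data)
    return log.append(tuple(addrs))  # the old index block stays untouched


def read_file(log, index_addr):
    return [log.blocks[a] for a in log.blocks[index_addr]]


log = Log()
old_index = write_file(log, ["A", "B"])
new_index = update_block(log, old_index, 0, "A'")

print(read_file(log, old_index))  # the earlier version is still readable
print(read_file(log, new_index))  # the current version
```

In this model, keeping `old_index` around plays the role of a checkpoint: the earlier state stays reachable as long as its blocks are not reclaimed.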
[Figure: application view versus disk usage. File A is modified (A → A') and File B is appended (B → B'). On disk, the original blocks A and B stay in place, and only the modified or appended blocks A' and B' are written after them, together with the affected B-tree intermediate blocks and metadata blocks (inodes, ...).]

Garbage Collection
• Creates new disk space to continue writing logs (essential for LFS).
• NILFS2 employs a unique GC which can reclaim disk space while keeping selected checkpoints.
  – This makes checkpoints storable long term at whatever granularity the user demands.
[Figure: timeline of checkpoints (CP) and snapshots (SS). Checkpoints that the user marked as snapshots are preserved. Recent checkpoints within the protection period are preserved, too.]

Command Line Programs
Tools are included in the nilfs-utils package (nilfs-tools on Debian/Ubuntu).
• Snapshot management programs
    lscp          list checkpoints
    lscp -s       list snapshots
    mkcp -s       make a snapshot
    chcp          change an existing checkpoint to a snapshot (or vice versa)
    nilfs-clean   manually trigger garbage collection

    $ lscp
      CNO        DATE     TIME  MODE  FLG   NBLKINC   ICNT
        1  2011-05-08 14:45:49    cp    -        11      3
        2  2011-05-08 14:50:22    cp    -    200523     81
        3  2011-05-08 20:40:34    cp    -       136     61
        4  2011-05-08 20:41:20    cp    -    187666   1604
        5  2011-05-08 20:41:42    cp    -        51   1634
        6  2011-05-08 20:42:00    cp    -        37   1653
        7  2011-05-08 20:42:42    cp    -    272146   2116
        8  2011-05-08 20:43:13    cp    -    264649   2117
        9  2011-05-08 20:43:44    cp    -    285848   2117
       10  2011-05-08 20:44:16    cp    -    139876   7357
      ...

Debian is a registered trademark of Software in the Public Interest, Inc. Ubuntu is a registered trademark of Canonical Ltd.

Use-Case Scenario
• Casual data protection
  – Prevents data loss from operation mistakes, even if you have NOT taken a snapshot.
• Versioning
  – Makes the change history of files browsable.
• Tamper detection and recovery
  – The filesystem itself preserves the full-time, overall change history, so changes can be tracked using the filesystem.
• Upgrade / troubleshooting
  – Can revert system state after unexpected trouble.
  – NILFS does not require taking a snapshot before every upgrade or configuration file edit.

TimeBrowse Project
• A GNOME Nautilus extension applying NILFS
• Allows browsing the change history of documents and restoring an arbitrary version.
http://sourceforge.net/projects/timebrowse
[Figure: screenshot of browsable document change history. The user can confirm the content of each version through a thumbnail image.]

Snapshot Appliance
• Example: in-house shared storage server
• Files are restorable even if other users edited or deleted them (like a wiki).
• Seamlessly accessible from Windows clients.
• We actually have an operation record of one and a half years.
[Figure: Samba + NILFS + Device-Mapper on redundant drives. Snapshots become browsable on Windows by configuring the volume shadow copy VFS module in Samba.]
Windows is a registered trademark of Microsoft Corporation in the United States and other countries.

Tamper Detection
Typical approaches:
    Approach               Characteristic               Example
    Notification           Real-time                    inotify
    Database + rule set    Rich auditing capabilities   Tripwire, AIDE, etc.
Fine-grained snapshots
• Can closely track the evidence of intrusion and tampering after the fact, as well as their progress.
• Quick and accurate restoration from the local disk.
Tripwire is a registered trademark of Tripwire, Inc.

Development Focus
Establish fine-grained snapshots and make them ready for use.
Enhance support for remote backup and disaster recovery.
• Efficient delta extraction, restoration, de-dupe.
• Data security (e.g. shredding), anti-tampering.

WIP: Snapshot diff (1/4)
• Problem (user's demand)
  • It takes too long to find changes on the filesystem across thousands of snapshots.
    Users want to shorten the time for:
    – Incremental remote backup
    – Search index rebuild
    – Tamper detection
• Current effort
  • Proposing an experimental API which quickly looks up changed inodes between two checkpoints.

WIP: Snapshot diff (2/4)
• Approach
  • Compare the B-trees of the "ifile" (metadata storing NILFS2 inodes), then scan modified inodes in the ifile blocks whose disk addresses differ.
[Figure: ifile B-trees at cno=100 and cno=200. Block addresses shown are 10, 11 and 1 2 3 4 5 for cno=100, and 82, 11 and 1 2 81 4 5 for cno=200. Only ifile block #3 (cno=100) and ifile block #81 (cno=200) differ. Comparing their inodes shows ino=n modified, ino=n+2 created, and ino=n+4 deleted.]

WIP: Snapshot diff (3/4)
• API (testbed)
  • NILFS_IOCTL_COMPARE_CHECKPOINTS
    – Acquires the inode numbers of modified inodes.
  • NILFS_IOCTL_INO_LOOKUP
    – Looks up the pathname of an inode by inode number.
    – Implementing this ioctl has an impact on the disk format, and hard links are not handled at present.
• Command line tool
    nilfs-diff [options] [device] cno1..cno2
  where cno1..cno2 are checkpoint numbers.

WIP: Snapshot diff (4/4)
Time required to compare two directories/snapshots containing the linux-2.6.39 source code in which one file differs:
    Comparison method                        Time (seconds)   nilfs-diff speedup
    diff:1 -Nqr snapshot-a/ snapshot-b/      56.5             209x faster
    diff:2 -Nqr snapshot-a/ snapshot-b/      10.2             38x faster
    nilfs-diff                               0.27
diff:1 is a modified diff which does not skip comparison even if device numbers and inode numbers are equal.
diff:2 is an optimized diff which skips comparison if inode numbers and ctimes are equal.
Hardware specs: Processor: Xeon 5160 @ 3.00 GHz x 2, Memory: 7988MB, Disk: IBM SAS SES-2

WIP: Revert API (1/4)
• Problem (user's demand)
  • Recovery may fail due to disk space shortage because each file is copied.
  • Restoring many files or media files takes time, which also leads to availability loss in business systems.
    – Recovery of large user data
    – Recovery against system upgrade failures
    – Recovery from tampering
• Current effort
  • Recovery of past data without duplication.

WIP: Revert API (2/4)
• Approach (preliminary)
  • A deleted block in NILFS is not actually discarded; its lifetime is just marked as ended.
  • Revive the blocks that we want to recover, and reuse them.
[Figure: timeline in which File A is created, deleted, and then reverted into a restored file. Each block (node block #102, data blocks #100 and #101) has a lifetime. The blocks of "File A" reappear by extending their lifetime.]

WIP: Revert API (3/4)
• API
  • In preparation (is it reflink?)
• Command line tool (testbed)
    nilfs-revert [options] source-file file-to-be-reverted

WIP: Revert API (4/4)
Time and disk space required to recover a 2 GiB file:
    Restore method   Time (seconds)   Capacity growth (GiB)
    cp               84.6             2.04
    nilfs-revert     1.1              0.016
The 0.8% overhead comes from updating 32 bytes of metadata per disk block.
Hardware specs: Processor: Xeon 5160 @ 3.00 GHz x 2, Memory: 7988MB, Disk: IBM SAS SES-2

Current Status
• Not many enhancements to the kernel code. The only noticeable changes are:
  – Online resize, fiemap, discard, performance tuning, etc.
• Advancement in userland support
  – Now bootable from GRUB2
  – util-linux-ng (libblkid) recognizes NILFS2 partitions.
  – Palimpsest/udisks (GUI disk utility), parted, and so on.
• nilfs-utils 2.1
  – Contains a resize tool and an easy-to-use GC tool/library.
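Returning to the nilfs-revert measurement in Revert API (4/4), the 0.8% overhead and the 0.016 GiB capacity growth can be checked with a few lines of arithmetic. This is a rough sketch: the 32 bytes of metadata per block comes from the slide, while the 4096-byte block size is an assumption (the slide does not state it), and the function name is hypothetical.

```python
GIB = 1024 ** 3


def revert_metadata_overhead(file_bytes, block_bytes=4096, meta_bytes_per_block=32):
    """Bytes of metadata updated when every block of a file is revived.

    block_bytes=4096 is an assumed block size; 32 bytes per block is the
    figure quoted on the slide.
    """
    blocks = -(-file_bytes // block_bytes)  # ceiling division
    return blocks * meta_bytes_per_block


file_size = 2 * GIB
overhead = revert_metadata_overhead(file_size)

print(f"overhead: {overhead / GIB:.3f} GiB")
print(f"relative: {100 * overhead / file_size:.1f} %")
```

Under that block-size assumption the result agrees with the slide's rounded figures of 0.016 GiB and 0.8%.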
TODO items / Future Plan
• Snapshot diff and revert API
• Efficient remote replication and restoration
• Security
  – Past file shredding
  – Transient vulnerability frozen in snapshots
• Remaining essential features
  – Extended attributes, POSIX ACL
  – Fsck
• Performance improvement
  – Log writer, GC, directory lookup, inode allocator, etc.
  – Fast and space-efficient caching of inodes and data pages against many snapshot mounts
• Kernel space Garbage Collector

Questions?
We welcome your contributions.
• Mailing list
  – linux-nilfs <linux-nilfs (at) vger.kernel.org>
• Project information
  – http://www.nilfs.org/
• Development tree
  – git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2.git

Thank you for listening!
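As a closing illustration of the snapshot diff item on the TODO list, the approach from Snapshot diff (2/4) can be modelled in a few lines. This is a toy sketch, not the NILFS2 implementation: the ifile is represented as a dictionary from ifile block index to a pair of disk address and inode table, the per-inode value stands in for a ctime, and the name `diff_checkpoints` is hypothetical. The point is that blocks with an unchanged disk address are skipped without looking at their inodes.

```python
def diff_checkpoints(old_ifile, new_ifile):
    """Return {ino: 'created' | 'deleted' | 'modified'} between two checkpoints.

    Each ifile maps an ifile block index to (disk_address, {ino: ctime}).
    """
    changes = {}
    for index in old_ifile.keys() | new_ifile.keys():
        old_addr, old_inodes = old_ifile.get(index, (None, {}))
        new_addr, new_inodes = new_ifile.get(index, (None, {}))
        if old_addr == new_addr:
            continue  # same disk address: the block is shared, so skip it
        for ino in old_inodes.keys() | new_inodes.keys():
            if ino not in old_inodes:
                changes[ino] = "created"
            elif ino not in new_inodes:
                changes[ino] = "deleted"
            elif old_inodes[ino] != new_inodes[ino]:
                changes[ino] = "modified"
    return changes


# Loosely mirrors the slide: the block at address 3 was rewritten at address 81.
ifile_cno100 = {
    0: (1, {1: 100, 2: 100}),
    1: (2, {3: 100}),
    2: (3, {10: 100, 11: 100, 14: 100}),
}
ifile_cno200 = {
    0: (1, {1: 100, 2: 100}),
    1: (2, {3: 100}),
    2: (81, {10: 200, 11: 100, 12: 200}),
}

result = diff_checkpoints(ifile_cno100, ifile_cno200)
print(sorted(result.items()))
```

Because only one of the three ifile blocks has a new address, only that block's inodes are scanned, which is the reason the slides give for nilfs-diff being much faster than walking both directory trees.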
