Ottawa Symposium 2007 Discussion

Filesystem Support for Continuous Snapshotting

Ryusuke Konishi, Koji Sato, Yoshiji Amagai
{ryusuke,koji,amagai}@osrg.net
NILFS Team, NTT Labs.

Agenda

● What is Continuous Snapshotting?
● Continuous Snapshotting Demo
● Brief overview of NILFS filesystem
● Performance
● Kernel issues
● Free Discussion
  – Applications, other approaches, and so on.

June 30, 2007 Copyright (C) NTT 2007

What's Continuous Snapshotting?

● A technique that continuously creates a number of checkpoints (recovery points).
  – Can restore files mistakenly overwritten or destroyed, even if the misoperation happened only a few seconds ago.
  – Needs no explicit instruction BEFORE (unexpected) data loss.
  – Instantaneous and automatic creation.
  – No inconvenient limits on the number of recovery points.

What's Continuous Snapshotting?

● Typical techniques...
  – make a few recovery points a day
  – have a limit on the number of recovery points
    ● e.g. the Vista Shadow Copy (VSS): 1 snapshot a day (by default), up to 64 snapshots, not instant.

User's merits

● Receive the full benefit of snapshotting
  – For general desktop users
    ● No need to append versions to filenames; document folders become cleaner.
    ● Can delete (or overwrite on save) without hesitation.
    ● Possible application to regulatory compliance (e.g. the SOX Act).
  – For system administrators and operators
    ● Helps with online backup and online data restoration.
    ● Allows rollback to past system states and safer system upgrades.
    ● Tamper detection, or recovery of contaminated hard drives.

Continuous Snapshotting Demo (NILFS)

● A realization of Continuous Snapshotting shown through a Browser Interface
● Online Disk Space Reclaiming (NILFSv2)

NILFS project

● NILFS(v1) released in Sept. 2005.
  – The first version, which lacks GC (Cleaner).
● NILFS2(v2) released in June 2007.
  – Supports online GC while maintaining multiple snapshots.
  – Supports kernels 2.6.11 through 2.6.21.
● NILFS project home page
  – http://www.nilfs.org/
  – GPL software; downloadable from the site.
  – NILFS Mailing List (in English)

Filesystems with Snapshots

Columns: Filesystem, Developer, Max. Number of Snapshots, Instant Snapshotting, Sustainable Snapshots after GC, Can change past checkpoint into snapshot

ZFS, Sun Microsystems: No limits 
NTFS, Microsoft: 64 
Johns Hopkins University: No limits 
LFS, UCB: 0 0
Linux LFS, Charles University (Prague): No limits √  with GC
NILFS2, NTT: No limits √ No limits √

Features of NILFS

● Continuous snapshots
  – Every snapshot can be accessed as a normal RO filesystem
● Practical Log-structured Filesystem for Linux
  – B-tree based block and metadata management
  – 64-bit data structures
    ● support many files, large files and disks.
  – Immediate recovery after system crash
    ● Highly available like journaling filesystems
  – Loadable Kernel module (No kernel patch required)

Conceptual diagram of NILFS snapshotting

[Diagram labels: File; Directory; Incomplete writes (unclosed segment*2); validity of checksum]

● Segment*2: a set of modified data and metadata blocks, written back periodically or triggered by sync requests.
● Appends all modifications instead of overwriting existing blocks.
● Immediate recovery to the latest consistent state.
● Allows total data restoration: every snapshot preserves the namespace and attributes as well as the contents of files.

*2: We use “segment” in a few different meanings. The segment here represents the unit of recovery.
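The append-instead-of-overwrite idea on this slide can be illustrated with a small in-memory toy. This is only a sketch under simplifying assumptions, not how NILFS is implemented: the class `AppendOnlyLog` and its methods are hypothetical names, whole files stand in for blocks, and a plain dictionary stands in for the B-tree based metadata.

```python
class AppendOnlyLog:
    """Toy model: every write appends a record; nothing is overwritten."""

    def __init__(self):
        self.blocks = []        # the log: (path, content) records, append-only
        self.current = {}       # path -> index of the latest record for that path
        self.checkpoints = []   # each checkpoint is a frozen copy of `current`

    def write(self, path, content):
        self.blocks.append((path, content))       # append, never overwrite
        self.current[path] = len(self.blocks) - 1

    def delete(self, path):
        self.current.pop(path, None)              # old records stay in the log

    def checkpoint(self):
        """Freeze the current namespace and return the checkpoint number."""
        self.checkpoints.append(dict(self.current))
        return len(self.checkpoints) - 1

    def read(self, path, cno=None):
        """Read from the live view, or read-only from checkpoint `cno`."""
        view = self.current if cno is None else self.checkpoints[cno]
        return self.blocks[view[path]][1]


log = AppendOnlyLog()
log.write("report.txt", "draft 1")
c0 = log.checkpoint()
log.write("report.txt", "oops, overwritten")
c1 = log.checkpoint()
log.delete("report.txt")

print(log.read("report.txt", cno=c0))  # draft 1
print(log.read("report.txt", cno=c1))  # oops, overwritten
```

Because the old records are never overwritten, both the earlier content and the deleted file remain reachable through their checkpoints even though the live view no longer contains the file.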

Online Disk Space Reclaiming (NILFS2)

● Cleanerd (user-land daemon)
  – selects segments to be cleaned
  – copies in-use blocks
  – modifies MTs
● Selectable GC policy (but not yet sophisticated)
● MTs: tables for managing checkpoints, segments, and disk address translation.
● Disk block addresses are virtualized with an address translation table (called DAT).
● Modified blocks of MTs are also written out.

[Diagram: upper row shows checkpoints ... n-2, n-1, n, n+1 ...; lower row shows checkpoint ... n ... (Snapshot); both rows span segments i-1, i, i+1 ... j, j+1. Legend: block kept by a snapshot; eliminable block; recent block; modified block in MT. Labels: Snapshot; Lifetime of blocks; Write position.]
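Why address virtualization helps the cleaner can be sketched in a few lines. The following is a toy model under stated assumptions, not the NILFS on-disk format: `DAT`, `clean_segment`, the `(segment, offset)` address tuples, and the `live_vblocks` argument are all hypothetical, and how liveness is decided (snapshots, recent checkpoints, GC policy) is left out.

```python
class DAT:
    """Toy disk address translation table: virtual block number -> physical address."""

    def __init__(self):
        self.table = {}
        self.next_vblock = 0

    def allocate(self, physical):
        vblock = self.next_vblock
        self.next_vblock += 1
        self.table[vblock] = physical
        return vblock

    def translate(self, vblock):
        return self.table[vblock]

    def relocate(self, vblock, new_physical):
        # Only the table entry changes; referrers keep using the same vblock.
        self.table[vblock] = new_physical


def clean_segment(dat, disk, segment, write_pos, live_vblocks):
    """Copy live blocks out of `segment` to the write position, then free it.

    disk maps (segment, offset) -> data. Stale DAT entries for eliminated
    blocks are left in place to keep the sketch short.
    """
    seg_w, off_w = write_pos
    for vblock in live_vblocks:
        old = dat.translate(vblock)
        if old[0] != segment:
            continue                      # block lives in another segment
        new = (seg_w, off_w)
        disk[new] = disk.pop(old)         # copy the in-use block forward
        dat.relocate(vblock, new)
        off_w += 1
    for addr in [a for a in disk if a[0] == segment]:
        del disk[addr]                    # drop eliminable blocks
    return (seg_w, off_w)                 # new write position


dat = DAT()
disk = {(0, 0): "A", (0, 1): "B-old", (1, 0): "B-new"}
va = dat.allocate((0, 0))
vb_old = dat.allocate((0, 1))             # eliminable: nothing references it
vb_new = dat.allocate((1, 0))

write_pos = clean_segment(dat, disk, segment=0, write_pos=(2, 0),
                          live_vblocks=[va, vb_new])
print(dat.translate(va), disk[dat.translate(va)])  # (2, 0) A
```

In this toy, anything that refers to block `va` by its virtual number needs no update after the move; only the table entry is rewritten, which is the property the slide attributes to the DAT.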

(1) Performance

● Discussion Point
  – Though the LFS approach shows high write performance, it incurs a performance penalty due to fragmentation and GC overhead (shown in the following slides).
● Question
  – Is this acceptable in return for the feature?

(1) Performance – iozone write

[Bar chart: iozone write, rewrite, and rand-write throughput (MB/s; higher is faster) for NILFS(v1), NILFS2 GC-wait, and NILFS2 0.4seg/s, with the number of written segments and cleaned segments on a secondary axis. Axis note: -o extents.]

Measurement Environment:
● Linux 2.6.20 SMP PREEMPT x86_64
● EM64T Pentium 4 HT 3.2GHz / 2GB RAM
● 80GB SATA drive x2 (sda: system, sdb: test)
● iozone -i 0 -i 1 -i 2 -s 1g -r 16 -e -U /test -f /test/testcase

Reclamation Speed: number of segments cleaned per second (1 segment = 8MB)

(1) Performance – iozone read

[Bar chart: iozone read, reread, and rand-read throughput (MB/s; higher is faster) for Ext3, Ext4, NILFS(v1), NILFS2 GC-wait, and NILFS2 0.4seg/s. Axis note: -o extents. Chart annotation: 35~38%.]

Measurement Environment:
● Linux 2.6.20 SMP PREEMPT x86_64
● EM64T Pentium 4 HT 3.2GHz / 2GB RAM
● 80GB SATA drive x2 (sda: system, sdb: test)
● iozone -i 0 -i 1 -i 2 -s 1g -r 16 -e -U /test -f /test/testcase

Reclamation Speed: number of segments cleaned per second (1 segment = 8MB)

(2) Page Cache for Continuous Snapshotting

● Discussion Point
  – The current NILFS design avoids applying any specialized versioning extension to the page cache itself.
    ● Every snapshot has independent page caches.
    ● Concurrent access to different snapshots is inefficient in both memory utilization and read performance.
● Question
  – What kind of page cache enhancement is reasonable and acceptable for continuous snapshotting?
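The memory cost of independent per-snapshot caches can be made concrete with a small count. This is only an illustration under assumptions: the read tuples, inode number, and block numbers are made up, and keying a cache by block number is one hypothetical alternative for comparison, not a design taken from the slides.

```python
def cached_pages(reads, key):
    """Number of distinct cache entries needed to serve `reads` under `key`."""
    return len({key(r) for r in reads})


# Each read: (snapshot, inode, page index, virtual block number).
# The file is unchanged across the three snapshots, so they share its blocks.
reads = [(snap, 12, page, 1000 + page)
         for snap in ("cp1", "cp2", "cp3")
         for page in range(4)]

# Independent page cache per snapshot: entries keyed by snapshot, inode, page.
per_snapshot = cached_pages(reads, key=lambda r: (r[0], r[1], r[2]))

# Hypothetical shared cache: entries keyed by the underlying block number.
shared = cached_pages(reads, key=lambda r: r[3])

print(per_snapshot, shared)  # 12 4
```

In this toy case the same four blocks occupy twelve cache entries when each snapshot caches independently, and each snapshot's first access misses even though the data is already in memory.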

(3) Online Block Relocation

● Discussion Point
  – An LFS changes the on-disk address of data and metadata on each write, or when GC moves them to reclaim disk space.
    ● To avoid filesystem failure or insecure data reads due to reuse of disk blocks, the start and end of each disk read must be recognizable.
    ● But the FS cannot know when some types of read complete, e.g. readv(), io_submit().
    ● The mapped flag (BH_Mapped) interferes with reassignment of disk addresses.
● Question
  – Can the kernel offer better support for safe block relocation?
    ● This seems to be a common issue for operations such as online defragmentation.
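The requirement that the start and end of a read be recognizable can be pictured as a reader count that the cleaner consults before reusing blocks. The following is a user-space toy, not kernel code and not an existing kernel interface: `SegmentPins`, `reading`, and `can_reuse` are hypothetical names chosen for illustration.

```python
import threading
from contextlib import contextmanager


class SegmentPins:
    """Hypothetical per-segment reader counts guarding block reuse."""

    def __init__(self):
        self._lock = threading.Lock()
        self._readers = {}

    @contextmanager
    def reading(self, segment):
        """Mark the start and end of a read that touches `segment`."""
        with self._lock:
            self._readers[segment] = self._readers.get(segment, 0) + 1
        try:
            yield
        finally:
            with self._lock:
                self._readers[segment] -= 1

    def can_reuse(self, segment):
        """A cleaner may reuse the segment only when no read is in flight."""
        with self._lock:
            return self._readers.get(segment, 0) == 0


pins = SegmentPins()
with pins.reading(7):
    print(pins.can_reuse(7))  # False: a read is still in flight
print(pins.can_reuse(7))      # True: the read has completed
```

The difficulty raised on the slide is that for some read paths the filesystem never sees the equivalent of the `finally` step, so a count like this cannot be maintained reliably without further kernel support.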

Free Discussion

● What do you think of Continuous Snapshotting?
● Other approaches to Continuous Snapshotting
● Applications
● Garbage Collection Strategy
● How to present snapshots?
  – Mount point per snapshot, or extended namespace
