GCTrees: Garbage Collecting Snapshots Chris Dragga and Douglas J. Santry Advanced Technology Group, NetApp Inc.
[email protected],
[email protected] Abstract—File-system snapshots have been a key component There are many advantages to employing shared storage. of enterprise storage management since their inception. Creating Shared storage offers location transparency, disaster recovery, and managing them efficiently, while maintaining flexibility and and advanced data management features. Snapshots have be- low overhead, has been a constant struggle. Although the cur- come an indispensable tool for data management. Snapshots rent state-of-the-art mechanism, hierarchical reference counting, offer a consistent read-only point-in-time view of a file system. performs reasonably well for traditional small-file workloads, Such views are important when a consistent backup is required. these workloads are increasingly vanishing from the enterprise data center, replaced instead with virtual machine and database Snapshots can also be used to conveniently recover from data workloads. These workloads center around a few very large files, loss, potentially without the need for a system administrator’s violating the assumptions that allow hierarchical reference count- assistance. ing to operate efficiently. To better cope with these workloads, we introduce GCTrees, a novel method of space management A space-efficient implementation of snapshots has to man- that uses concepts of block lineage across snapshots, rather than age block sharing well. As a result, snapshot implementations explicit reference counting. As a proof of concept, we create a must be able to efficiently detect which blocks are shared. prototype file system, gcext4, a modified version of ext4 that uses Consider a new snapshot.