iDedup: Latency-aware, inline data deduplication for primary storage

Kiran Srinivasan, Tim Bisson, Garth Goodson, Kaladhar Voruganti

NetApp, Inc. {skiran, tbisson, goodson, kaladhar}@netapp.com

Abstract

Deduplication technologies are increasingly being deployed to reduce cost and increase space-efficiency in corporate data centers. However, prior research has not applied deduplication techniques inline to the request path for latency sensitive, primary workloads. This is primarily due to the extra latency these techniques introduce. Inherently, deduplicating data on disk causes fragmentation that increases seeks for subsequent sequential reads of the same data, thus, increasing latency. In addition, deduplicating data requires extra disk IOs to access on-disk deduplication metadata. In this paper, we propose an inline deduplication solution, iDedup, for primary workloads, while minimizing extra IOs and seeks. Our algorithm is based on two key insights from real-world workloads: i) spatial locality exists in duplicated primary data; and ii) temporal locality exists in the access patterns of duplicated data. Using the first insight, we selectively deduplicate only sequences of disk blocks. This reduces fragmentation and amortizes the seeks caused by deduplication. The second insight allows us to replace the expensive, on-disk, deduplication metadata with a smaller, in-memory cache. These techniques enable us to tradeoff capacity savings for performance, as demonstrated in our evaluation with real-world workloads. Our evaluation shows that iDedup achieves 60-70% of the maximum deduplication with less than a 5% CPU overhead and a 2-4% latency impact.

1 Introduction

Storage continues to grow at an explosive rate of over 52% per year [10]. In 2011, the amount of data will surpass 1.8 zettabytes [17]. According to the IDC [10], to reduce costs and increase storage efficiency, more than 80% of corporations are exploring deduplication technologies. However, there is a huge gap in the current capabilities of deduplication technology. No deduplication systems exist that deduplicate inline with client requests for latency sensitive primary workloads. All prior deduplication work focuses on either: i) throughput sensitive archival and backup systems [8, 9, 15, 21, 26, 39, 41]; or ii) latency sensitive primary systems that deduplicate data offline during idle time [1, 11, 16]; or iii) file systems with inline deduplication, but agnostic to performance [3, 36]. This paper introduces two novel insights that enable latency-aware, inline, primary deduplication.

Many primary storage workloads (e.g., email, user directories, databases) are currently unable to leverage the benefits of deduplication, due to the associated latency costs. Since offline deduplication systems impact latency the least, they are currently the best option; however, they are inefficient. For example, offline systems require additional storage capacity to absorb the writes prior to deduplication, and excess disk bandwidth to perform reads and writes during deduplication. This additional disk bandwidth can impact foreground workloads. Additionally, inline compression techniques also exist [5, 6, 22, 38] that are complementary to our work.

The challenge of inline deduplication is to not increase the latency of the already latency sensitive, foreground operations. Reads are affected by the fragmentation in data layout that naturally occurs when deduplicating blocks across many disks. As a result, subsequent sequential reads of deduplicated data are transformed into random IOs resulting in significant seek penalties. Most of the deduplication work occurs in the write path; i.e., generating block hashes and finding duplicate blocks. To identify duplicates, on-disk data structures are accessed. This leads to extra IOs and increased latency in the write path. To address these performance concerns, it is necessary to minimize any latencies introduced in both the read and write paths.

We started with the realization that in order to improve latency a tradeoff must be made elsewhere. Thus, we were motivated by the question: Is there a tradeoff between performance and the degree of achievable deduplication? While examining real-world traces [20], we developed two key insights that ultimately led to an answer: i) spatial locality exists in the duplicated data; and ii) temporal locality exists in the accesses of duplicated data. The first observation allows us to amortize the seeks caused by deduplication by only performing deduplication when a sequence of on-disk blocks are duplicated. The second observation enables us to maintain an in-memory fingerprint cache to detect duplicates in lieu of any on-disk structures. The first observation mitigates fragmentation and addresses the extra read path latency; whereas, the second one removes extra IOs and lowers write path latency. These observations lead to two control parameters: i) the minimum number of sequential duplicate blocks on which to perform deduplication; and ii) the size of the in-memory fingerprint cache. By adjusting these parameters, a tradeoff is made between the capacity savings of deduplication and the performance impact to the foreground workload.

This paper describes the design, implementation and evaluation of our deduplication system (iDedup) built to exploit the spatial and temporal localities of duplicate data in primary workloads. Our evaluation shows that good capacity savings are achievable (between 60%-70% of maximum) with a small impact to latency (2-4% on average). In summary, our key contributions include:

• Insights on spatial and temporal locality of duplicated data in real-world, primary workloads.
• Design of an inline deduplication algorithm that leverages both spatial and temporal locality.
• Implementation of our deduplication algorithm in an enterprise-class, network attached storage system.
• Implementation of efficient data structures to reduce resource overheads and improve cacheability.
• Demonstration of a viable tradeoff between performance and capacity savings via deduplication.
• Evaluation of our algorithm using data from real-world, production, enterprise file system traces.

The remainder of the paper is as follows: Section 2 provides background and motivation of the work; Section 3 describes the design of our deduplication system; Section 4 describes the system's implementation; Section 5 evaluates the implementation; Section 6 describes related work, and Section 7 concludes.

Type         Offline                      Inline
Primary,     NetApp ASIS [1],             iDedup
latency      EMC Celerra [11],            (This paper)
sensitive    StorageTank [16]
Secondary,   (No motivation               EMC DDFS [41], EMC Cluster [8],
throughput   for systems in               DeepStore [40], NEC HydraStor [9],
sensitive    this category)               Venti [31], SiLo [39],
                                          Sparse Indexing [21], ChunkStash [7],
                                          Foundation [32], Symantec [15],
                                          EMC Centera [24], GreenBytes [13]

Table 1: Table of related work. The table shows how this paper, iDedup, is positioned relative to some other relevant work. Some primary, inline deduplication file systems (like ZFS [3]) are omitted, since they are not optimized for latency.

2 Background and motivation

Thus far, the majority of deduplication research has targeted improving deduplication within the backup and archival (or secondary storage) realm. As shown in Table 1, very few systems provide deduplication for latency sensitive primary workloads. We believe that this is due to the significant challenges in performing deduplication without affecting latency, rather than the lack of benefit deduplication provides for primary workloads. Our system is specifically targeted at this gap.

The remainder of this section further describes the differences between primary and secondary deduplication systems and describes the unique challenges faced by primary deduplication systems.

2.1 Classifying deduplication systems

Although many classifications for deduplication systems exist, they are usually based on internal implementation details, such as the fingerprinting (hashing) scheme or whether fixed sized or variable sized blocks are used. Although important, these schemes are usually orthogonal to the types of workloads their system supports. Similar to other storage systems, deduplication systems can be broadly classified as primary or secondary depending on the workloads they serve. Primary systems are used for primary workloads. These workloads tend to be latency sensitive and use RPC based protocols, such as NFS [30], CIFS [37] or iSCSI [35]. On the other hand, secondary systems are used for archival or backup purposes. These workloads process large amounts of data, are throughput sensitive and are based on streaming protocols.

Primary and secondary deduplication systems can be further subdivided into inline and offline deduplication systems. Inline systems deduplicate requests in the write path before the data is written to disk. Since inline deduplication introduces work into the critical write path, it often leads to an increase in request latency. On the other hand, offline systems tend to wait for system idle time to deduplicate previously written data. Since no operations are introduced within the write path, write latency is not affected, but reads remain fragmented.

Content addressable storage (CAS) systems (e.g., [24, 31]) naturally perform inline deduplication, since blocks are typically addressed by their fingerprints. Both archival and CAS systems are sometimes used for primary storage. Likewise, a few file systems that perform inline deduplication (e.g., ZFS [3] and SDFS [36]) are also used for primary storage. However, none of these systems are specifically optimized for latency sensitive workloads while performing inline deduplication. Their design for maximum deduplication introduces extra IOs and does not address fragmentation.

Primary inline deduplication systems have the following advantages over offline systems:

1. Storage provisioning is easier and more efficient: Offline systems require additional space to absorb the writes prior to deduplication processing. This causes a temporary bloat in storage usage leading to inaccurate space accounting and provisioning.
2. No dependence on system idle time: Offline systems use idle time to perform deduplication without impacting foreground requests. This is problematic when the system is busy for long periods of time.
3. Disk-bandwidth utilization is lower: Offline systems use extra disk bandwidth when reading in the staged data to perform deduplication and then again to write out the results. This limits the total bandwidth available to the system.

For good reason, the majority of prior deduplication work has focused on the design of inline, secondary deduplication systems. Backup and archival workloads typically have a large amount of duplicate data, thus the benefit of deduplication is large. For example, reports of 90+% deduplication ratios are not uncommon for backup workloads [41], compared to the 20-30% we observe from our traces of primary workloads. Also, since backup workloads are not latency sensitive, they are tolerant to delays introduced in the request path.

Figure 1: a) Increase in seeks due to increased fragmentation. b) The amortization of seeks using sequences. This figure shows the amortization of seeks between disk tracks by using sequences of blocks (threshold=3).

2.2 Challenges of primary deduplication

The almost exclusive focus on maximum deduplication at the expense of performance has left a gap for latency sensitive workloads. Since primary storage is usually the most expensive, any savings obtained in primary systems have high cost advantages. Due to their higher cost ($/GB), deduplication is even more critical for flash based systems; nothing precludes our techniques from working with these systems. In order for primary, inline deduplication to be practical for enterprise systems, a number of challenges must be overcome:

• Write path: The metadata management and IO required to perform deduplication inline with the write request increases write latency.
• Read path: The fragmentation of otherwise sequential writes increases the number of disk seeks required during reads. This increases read latency.
• Delete path: The requirement to check whether a block can be safely deleted increases delete latency.

All of these penalties, due to deduplication, impact the performance of foreground workloads. Thus, primary deduplication systems only employ offline techniques to avoid interfering with foreground requests [1, 11, 16].

Write path: For inline, primary deduplication, write requests deduplicate data blocks prior to writing those blocks to stable storage. At a minimum, this involves fingerprinting the data block and comparing its signature within a table of previously written blocks. If a match is found, the metadata for the block, e.g., the file's block pointer, is updated to point to the existing block and no write to stable storage is required. Additionally, a reference count on the existing block is incremented. If a match is not found, the block is written to stable storage and the table of existing blocks is updated with the new block's signature and its storage location. The additional work performed during write path deduplication can be summarized as follows:

• Fingerprinting data consumes extra CPU resources.
• Performing fingerprint table lookups and managing the table persistently on disk requires extra IOs.
• Updating a block's reference count requires an update to persistent storage.

As one can see, the management of deduplication metadata, in memory, and on persistent storage, accounts for the majority of write path overheads. Even though much previous work has explored optimizing metadata management for inline, secondary systems (e.g., [2, 15, 21, 39, 41]), we feel that it is necessary to minimize all extra IO in the critical path for latency sensitive workloads.
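To make the write-path overheads listed above concrete, the following is a minimal, runnable sketch of the generic per-block check that an inline deduplicating write path performs. It is illustrative only: the names (fingerprint_table, refcounts, write_block) are hypothetical stand-ins, and this is the baseline approach being critiqued here, not iDedup's selective algorithm developed in Section 3.

```python
import hashlib

def dedup_write(block_data, fingerprint_table, refcounts, write_block):
    """Generic inline dedup check for one block (schematic sketch)."""
    fp = hashlib.sha256(block_data).digest()        # fingerprinting: extra CPU
    dbn = fingerprint_table.get(fp)                 # table lookup: extra IO if the table is on disk
    if dbn is not None:
        refcounts[dbn] = refcounts.get(dbn, 1) + 1  # refcount update: extra persistent write
        return dbn                                  # share the existing block; no data write
    dbn = write_block(block_data)                   # no match: write data to stable storage
    fingerprint_table[fp] = dbn                     # record the new block's signature
    refcounts[dbn] = 1
    return dbn

store = []
table, refs = {}, {}
for blk in (b"A" * 4096, b"B" * 4096, b"A" * 4096):
    dedup_write(blk, table, refs, lambda d: (store.append(d) or len(store) - 1))
print(refs)   # {0: 2, 1: 1} -> the repeated block was shared, not rewritten
```

Each commented step maps to one of the overheads above; iDedup's goal is to keep the lookup and metadata updates off the disk in the common case.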

Read path: Deduplication naturally fragments data that would otherwise be written sequentially. Fragmentation occurs because a newly written block may be deduplicated to an existing block that resides elsewhere on storage. Indeed, the higher the deduplication ratio, the higher the likelihood of fragmentation. Figure 1(a) shows the potential impact of fragmentation on reads in terms of the increased number of seeks. When using disk based storage, the extra seeks can cause a substantial increase in read latency. Deduplication can convert sequential reads from the application into random reads from storage.

Delete path: Typically, some metadata records the usage of shared blocks. For example, a table of reference counts can be maintained. This metadata must be queried and updated inline to the deletion request. These actions can increase the latency of delete operations.

3 Design

In this section, we present the rationale that led to our solution, the design of our architecture, and the key design challenges of our deduplication system (iDedup).

3.1 Rationale for solution

To better understand the challenges of inline deduplication, we performed data analysis on real-world enterprise workloads [20]. First, we ran simulations varying the block size to see its effect on deduplication. We observed that the drop in deduplication ratio was less than linear with increasing block size. This implies duplicated data is clustered, thus indicating spatial locality in the data. Second, we ran simulations varying the fingerprint table size to determine if the same data is written repeatedly close in time. Again, we observed the drop in deduplication ratio was less than linear with decreasing table size. This implies duplicated data exhibits notable temporal locality, thus making the fingerprint table amenable to caching. Unfortunately, we could not test our hypothesis on other workloads due to the lack of traces with data duplication patterns.

3.2 Solution overview

We use the observations of spatial and temporal locality to derive an inline deduplication solution.

Spatial locality: We leverage the spatial locality to perform selective deduplication, thereby mitigating the extra seeks introduced by deduplication for sequentially read files. To accomplish this, we examine blocks at write time and attempt to only deduplicate full sequences of file blocks if and only if the sequence of blocks are i) sequential in the file and ii) have duplicates that are sequential on disk. Even with this optimization, sequential reads can still incur seeks between sequences. However, if we enforce an appropriate minimum sequence length for such sequences (the threshold), the extra seek cost is expected to be amortized; as shown by Figure 1(b). The threshold is a configurable parameter in our system. While some schemes employ a larger block size to leverage spatial locality, they are limited as the block size represents both the minimum and the maximum sequence length. Whereas, our threshold represents the minimum sequence length and the maximum sequence length is only limited by the file's size.

Inherently, due to our selective approach, only a subset of blocks are deduplicated, leading to lower capacity savings. Therefore, our inline deduplication technique exposes a tradeoff between capacity savings and performance, which we observe via experiments to be reasonable for certain latency sensitive workloads. For an optimal tradeoff, the threshold must be derived empirically to match the randomness in the workload. Additionally, to recover the lost savings, our system does not preclude executing other offline techniques.

Temporal locality: In all deduplication systems, there is a structure that maps the fingerprint of a block to its location on disk. We call this the deduplication metadata structure (or dedup-metadata for short). Its size is proportional to the number of blocks and it is typically stored on disk. Other systems use this structure as a lookup table to detect duplicates in the write path; this leads to extra, expensive, latency-inducing, random IOs. We leverage the temporal locality by maintaining dedup-metadata as a completely memory-resident, LRU cache, thereby avoiding extra dedup-metadata IOs. There are a few downsides to using a smaller, in-memory cache. Since we only cache mappings for a subset of blocks, we might not deduplicate certain blocks due to lack of information. In addition, the memory used by the cache reduces the file system's buffer cache size. This can lead to a lower buffer cache hit rate, affecting latency. On the other hand, the buffer cache becomes more effective by caching deduplicated blocks [19]. These observations expose another tradeoff between performance (hit rate) and capacity savings (dedup-metadata size).
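The amortization argument behind the threshold (see the spatial locality discussion above) can be illustrated with a back-of-the-envelope model; the seek time and transfer cost below are assumed round numbers, not measurements from this paper. A sequential read that detours to a deduplicated run pays roughly one extra seek per run, so the per-block penalty shrinks as the minimum sequence length grows.

```python
def extra_latency_per_block(threshold, seek_ms=10.0, transfer_ms_per_4k=0.04):
    """Rough per-block read penalty when a sequential read detours to a
    deduplicated run of at least `threshold` blocks (illustrative model)."""
    return seek_ms / threshold + transfer_ms_per_4k

for t in (1, 2, 4, 8):
    print(t, round(extra_latency_per_block(t), 2), "ms/block")
# threshold=1 pays the full ~10 ms seek for every duplicate block;
# threshold=8 amortizes it to roughly 1.3 ms per block.
```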

3.3 Architecture

In this subsection, we provide an overview of our architecture. In addition, we describe the changes to the IO path to perform inline deduplication.

3.3.1 Storage system overview

An enterprise-class network-attached storage (NAS) system (as illustrated in Figure 2) is used as the reference system to build iDedup. For primary workloads, the system supports the NFS [30] and CIFS [37] RPC-based protocols. As seen in Figure 2, the system uses a log-structured file system [34] combined with non-volatile RAM (NVRAM) to buffer client writes to reduce response latency. These writes are periodically flushed to disk during the destage phase. Allocation of new disk blocks occurs during this phase and is performed successively for each file written. Individual disk blocks are identified by their unique disk block numbers (DBNs). File metadata, containing the DBNs of its blocks, is stored within an inode structure. Given our objective to perform inline deduplication, the newly written (dirty) blocks need to be deduplicated during the destage phase. By performing deduplication during destage, the system benefits by not deduplicating short-lived data that is overwritten or deleted while buffered in NVRAM. Adding inline deduplication modifies the write path significantly.

Figure 2: iDedup Architecture. Non-deduplicated blocks (different patterns) in the NVRAM buffer are deduplicated by the iDedup algorithm before writing them to disk via the file system.

3.3.2 Write path flow

Compared to the normal file system write path, we add an extra layer of deduplication processing. As this layer consumes extra CPU cycles, it can prolong the total time required to allocate dirty blocks and affect time-sensitive file system operations. Moreover, any extra IOs in this layer can interfere with foreground read requests. Thus, this layer must be optimized to minimize overheads. On the other hand, there is an opportunity to overlap deduplication processing with disk write IOs in the destage phase. The following steps take place in the write path:

1. For each file, the list of dirty blocks is obtained.
2. For each dirty block, we compute its fingerprint (hash of the block's content) and perform a lookup in the dedup-metadata structure using the hash as the key.
3. If a duplicate is found, we examine adjacent blocks, using the iDedup algorithm (Section 3.4), to determine if it is part of a duplicate sequence.
4. While examining subsequent blocks, some duplicate sequences might end. In those cases, the length of the sequence is determined; if it is greater than the configured threshold, we mark the sequence for deduplication. Otherwise, we allocate new disk blocks and add the fingerprint metadata for these blocks.
5. When a duplicate sequence is found, the DBN of each block in the sequence is obtained and the file's metadata is updated and eventually written to disk.
6. Finally, to maintain file system integrity in the face of deletes, we update reference counts of the duplicated blocks in a separate structure on disk.
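A runnable toy distillation of steps 3-4 appears below. It assumes the fingerprint lookup of step 2 has already produced, for each file block in write order, either the DBN of an existing duplicate or None, and it only handles blocks with a single duplicate (the multi-duplicate dup-tree cases are covered in Section 4.2). The function name and data shapes are illustrative, not the system's internal API.

```python
def select_sequences(candidates, threshold):
    """Keep only runs of duplicate blocks whose existing DBNs are also
    consecutive on disk and at least `threshold` blocks long."""
    keep, run = [], []
    prev_dbn = None
    for fbn, dbn in candidates:          # (file block number, duplicate DBN or None)
        if dbn is not None and prev_dbn is not None and dbn == prev_dbn + 1:
            run.append((fbn, dbn))       # the run of on-disk-sequential duplicates grows
        else:
            if len(run) >= threshold:
                keep.extend(run)         # terminated run is long enough: deduplicate it
            run = [(fbn, dbn)] if dbn is not None else []
        prev_dbn = dbn
    if len(run) >= threshold:
        keep.extend(run)
    return keep                          # these blocks are shared; the rest are written normally

cands = [(0, 100), (1, 101), (2, 102), (3, None), (4, 250), (5, 400)]
print(select_sequences(cands, threshold=3))   # -> [(0, 100), (1, 101), (2, 102)]
```

The short, isolated duplicates at file blocks 4 and 5 are deliberately ignored, trading some capacity savings for sequential read performance.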

3.3.3 Read path flow

Since iDedup updates the file's metadata as soon as deduplication occurs, the file system cannot distinguish between a duplicated block and a non-duplicated one. This allows file reads to occur in the same manner for all files, regardless of whether they contain deduplicated blocks. Although sequential reads may incur extra seeks due to deduplication, having a minimum sequence length helps amortize this cost. Moreover, if we pick the threshold closer to the expected sequentiality of a workload, then the effects of those seeks can be hidden.

3.3.4 Delete path flow

As mentioned in the write path flow, deduplicated blocks need to be reference counted. During deletion, the reference count of deleted blocks is decremented and only blocks with no references are freed. In addition to updating the reference counts, we also update the in-memory dedup-metadata when a block is deleted.
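A minimal sketch of this delete path follows; dictionaries stand in for the on-disk refcount file and the in-memory DBN index, and all names (delete_block, free_block) are hypothetical.

```python
def delete_block(dbn, refcounts, dedup_cache, free_block):
    """Schematic delete path: drop one reference; free the block and its
    in-memory dedup-metadata only when no references remain."""
    refcounts[dbn] -= 1                  # persistent refcount update
    if refcounts[dbn] == 0:
        del refcounts[dbn]
        dedup_cache.pop(dbn, None)       # remove the block's content-node
        free_block(dbn)                  # mark the block free on disk

refs = {42: 2}
cache = {42: "content-node"}
delete_block(42, refs, cache, free_block=lambda d: None)
print(refs, cache)   # {42: 1} {42: 'content-node'} -> still referenced, so not freed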

3.4 iDedup algorithm

The iDedup deduplication algorithm has the following key design objectives:

1. The algorithm should be able to identify sequences of file blocks that are duplicates and whose corresponding DBNs are sequential.
2. The largest duplicate sequence for a given set of file blocks should be identified.
3. The algorithm should minimize searches in the dedup-metadata to reduce CPU overheads.
4. The algorithm execution should overlap with disk IO during the destage phase and not prolong the phase.
5. The memory and CPU overheads caused by the algorithm should not prevent other file system processes from accomplishing their tasks in a timely manner.
6. The dedup-metadata must be optimized for lookups.

More details of the algorithm are presented in Section 4.2. Next, we describe the design elements that enable these objectives.

3.4.1 Dedup-metadata cache design

The dedup-metadata is maintained as a cache with one entry per block. Each entry maps the fingerprint of a block to its DBN on disk. We use LRU as the cache replacement policy; other replacement policies did not perform better than the simpler LRU scheme.

The choice of the fingerprint influences the size of the entry and the number of CPU cycles required to compute it. By leveraging processor hardware assists (e.g., Intel AES [14]) to compute stronger fingerprints (like SHA-2, SHA-256 [28], etc.), the CPU overhead can be greatly mitigated. However, longer, 256-bit fingerprints increase the size of each entry. In addition, a DBN of 32-bits to 64-bits must also be kept within the entry, thus making the minimum entry size 36 bytes. Given a block size of 4 KB (typical of many file systems), the cache entries comprise an overhead of 0.8% of the total size. Since we keep the cache in memory, this overhead is significant as it reduces the number of cached blocks.

In many storage systems, memory not reserved for data structures is used by the buffer cache. Hence, the memory used by the dedup-metadata cache comes at the expense of a larger buffer cache. Therefore, the effect of the dedup-metadata cache on the buffer cache hit ratio needs to be evaluated empirically to size the cache.
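The arithmetic behind these figures is shown below; the 36-byte entry works out to about 0.9% of a 4 KB block, which the text rounds to 0.8%, and the 64-byte content-node size is taken from Section 4.1.1.

```python
BLOCK_SIZE = 4096          # 4 KB file system block
FP_BYTES = 256 // 8        # e.g., a 256-bit fingerprint
DBN_BYTES = 4              # 32-bit disk block number

entry = FP_BYTES + DBN_BYTES
print(entry, "bytes per minimal cache entry")                 # 36
print(round(100.0 * entry / BLOCK_SIZE, 2), "% of each block")  # ~0.88

# With ~64-byte content-nodes (Section 4.1.1), a 1 GiB cache holds about
# 16M mappings, i.e., dedup-metadata covering roughly 64 GiB of 4 KB blocks.
print((1 << 30) // 64, "content-nodes per 1 GiB of cache")
```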

3.4.2 Duplicate sequence processing

This subsection describes some common design issues in duplicate sequence identification.

Sequence identification: The goal is to identify the largest sequence among the list of potential sequences. This can be done in multiple ways:

• Breadth-first: Start by scanning blocks in order; concurrently track all possible sequences; and decide on the largest when a sequence terminates.
• Depth-first: Start with a sequence and pursue it across the blocks until it terminates; make multiple passes until all sequences are probed; and then pick the largest. Information gathered during one pass can be utilized to make subsequent passes more efficient.

In practice, we observed long chains of blocks during processing (on the order of 1000s). Since multiple passes are too expensive, we use the breadth-first approach.

Figure 3: Overlapped sequences. This figure shows an example of how the algorithm works with overlapped sequences.

Overlapped sequences: Choosing between a set of overlapped sequences can prove problematic. An example of how overlapping sequences are handled is illustrated in Figure 3. Assume a threshold of 4. Scanning from left to right, multiple sequences match the set of file blocks. As we process the 7th block, one of the sequences terminates (S1) with a length of 6. But, sequences S2 and S3 have not yet terminated and have blocks overlapping with S1. Since S1 is longer than the threshold (4), we can deduplicate the file blocks matching those in S1. However, by accepting S1, we are rejecting the overlapped blocks from S2 or S3; this is the dilemma. It is possible that either S2 or S3 could potentially lead to a longer sequence going forward, but it is necessary to make a decision about S1. Since it is not possible to know the best outcome, we use the following heuristic: we determine if the set of non-overlapped blocks is greater than the threshold; if so, we deduplicate them. Otherwise, we defer to the unterminated sequences, as they may grow longer. Thus, in the example, we reject S1 for this reason.
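The heuristic can be sketched as follows. This is a simplified, runnable illustration: sequences are modeled as plain collections of block numbers, the numbers are invented for the example, and it ignores the detail that the surviving non-overlapped blocks must themselves remain contiguous.

```python
def accept_terminated(seq, open_seqs, threshold):
    """Deduplicate a terminated sequence only if the blocks it does NOT
    share with still-open sequences are at least threshold-many; otherwise
    defer to the open sequences (Section 3.4.2 heuristic)."""
    claimed = set().union(*open_seqs) if open_seqs else set()
    non_overlapped = [b for b in seq if b not in claimed]
    return non_overlapped if len(non_overlapped) >= threshold else None

# A Figure-3-like situation: S1 terminates with length 6, but several of its
# blocks are still claimed by the unterminated S2 and S3.
S1 = [1, 2, 3, 4, 5, 6]
open_seqs = [{4, 5, 6, 7}, {5, 6, 7, 8}]
print(accept_terminated(S1, open_seqs, threshold=4))   # -> None: reject S1 for now
```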

3.4.3 Threshold determination

The minimum sequence threshold is a workload property that can only be derived empirically. The ideal threshold is one that most closely matches the workload's natural sequentiality. For workloads with more random IO, it is possible to set a lower threshold because deduplication should not worsen the fragmentation. It is possible to have a real-time, adaptive scheme that sets the threshold based on the randomness of the workload. Although valuable, this investigation is beyond this paper's scope.

4 Implementation

In this section, we present the implementation and optimizations of our inline deduplication system. The implementation consists of two subsystems: i) the dedup-metadata management; and ii) the iDedup algorithm.

4.1 Dedup-metadata management

The dedup-metadata management subsystem is comprised of several components:

1. Dedup-metadata cache (in RAM): Contains a pool of block entries (content-nodes) that contain deduplication metadata organized as a cache.
2. Fingerprint hash table (in RAM): This table maps a fingerprint to DBN(s).
3. DBN hash table (in RAM): This table maps a DBN to its content-node; used to delete a block.
4. Reference count file (on disk): Maintains reference counts of deduplicated file system blocks in a file.

We explore each of them next.

4.1.1 Dedup-metadata cache

This is a fixed-size pool of small entries called content-nodes, managed as an LRU cache. The size of this pool is configurable at compile time. Each content-node represents a single disk block and is about 64 bytes in size. The content-node contains the block's DBN (a 4 B integer) and its fingerprint. In our prototype, we use the MD5 checksum (128-bit) [33] of the block's contents as its fingerprint. Using a stronger fingerprint (like SHA-256) would increase the memory overhead of each entry by 25%, thus leading to fewer blocks cached. Other than this effect, using MD5 is not expected to alter other experimental results.

All the content-nodes are allocated as a single global array. This allows the nodes to be referenced by their array index (a 4 byte value) instead of by a pointer. This saves 4 bytes per pointer in 64-bit systems. Each content-node is indexed by three data structures: the fingerprint hash table, the DBN hash table and the LRU list. This adds two pointers per index (to doubly link the nodes in a list or tree), thus totaling six pointers per content-node. Therefore, by using array indices instead of pointers we save 24 bytes per entry (37.5%).

4.1.2 Fingerprint hash table

This hash table contains content-nodes indexed by their fingerprint. It enables a block's duplicates to be identified by using the block's fingerprint. As shown in Figure 4, each hash bucket contains a single pointer to the root of a red-black tree containing the collision list for that bucket. This is in contrast to a traditional hash table with a doubly linked list for collisions at the cost of two pointers per bucket. The red-black tree implementation is an optimized, left-leaning, red-black tree [12]. With uniform distribution, each hash bucket is designed to hold 16 entries, ensuring an upper-bound of 4 searches within the collision tree (tree search cost is O(log N)). By reducing the size of the pointers and the number of pointers per bucket, the per-bucket overhead is reduced, thus providing more buckets for the same memory size.

Each collision tree content-node represents a unique fingerprint value in the system. For thresholds greater than one, it is possible for multiple DBNs to have the same fingerprint, as they can belong to different duplicate sequences. Therefore, all the content-nodes that represent duplicates of a given fingerprint are added to another red-black tree, called the dup-tree (see Figure 4). This tree is rooted at the first content-node that maps to that fingerprint. There are advantages to organizing the duplicate content-nodes in a tree, as explained in the iDedup algorithm section (Section 4.2).

Figure 4: Fingerprint Hash Table. The fingerprint hash table with hash buckets as pointers to collision trees. The content-node with fingerprint 'foo' has duplicate content-nodes in a tree (dup-tree) with DBNs 205 and 110.

4.1.3 DBN hash table

This hash table indexes content-nodes by their DBNs. Its structure is similar to the fingerprint hash table without the dup-tree. It facilitates the deletion of content-nodes when the corresponding blocks are removed from the system. During deletion, blocks can only be identified by their DBNs (otherwise the data must be read and hashed). The DBN is used to locate the corresponding content-node and delete it from all dedup-metadata.
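A compact sketch of how these indices fit together is shown below. Python dictionaries and sorted lists stand in for the hash tables and red-black trees, an OrderedDict stands in for the LRU list, and eviction is per fingerprint group rather than per content-node; none of this is the production layout (where content-nodes live in one global array linked by 4-byte indices), it only illustrates the relationships between the structures.

```python
from collections import OrderedDict
import bisect

class DedupMetadata:
    """Sketch of the in-memory dedup-metadata of Section 4.1."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.by_fp = OrderedDict()   # fingerprint -> sorted DBNs (the "dup-tree"), LRU ordered
        self.by_dbn = {}             # DBN -> fingerprint (the DBN hash table)

    def lookup(self, fp):
        """Return the sorted duplicate DBNs for fp (refreshing LRU), or None."""
        dbns = self.by_fp.get(fp)
        if dbns is not None:
            self.by_fp.move_to_end(fp)
        return dbns

    def insert(self, fp, dbn):
        dbns = self.by_fp.setdefault(fp, [])
        bisect.insort(dbns, dbn)
        self.by_fp.move_to_end(fp)
        self.by_dbn[dbn] = fp
        while len(self.by_dbn) > self.max_entries:        # LRU eviction
            _, old_dbns = self.by_fp.popitem(last=False)
            for d in old_dbns:
                del self.by_dbn[d]

    def remove_dbn(self, dbn):
        """Delete path: drop the content-node for a freed block."""
        fp = self.by_dbn.pop(dbn, None)
        if fp is not None:
            self.by_fp[fp].remove(dbn)
            if not self.by_fp[fp]:
                del self.by_fp[fp]

md = DedupMetadata(max_entries=4)
for dbn, fp in enumerate([b"a", b"b", b"a", b"c"]):
    md.insert(fp, dbn)
print(md.lookup(b"a"))   # [0, 2] -> two duplicates share the fingerprint
md.remove_dbn(0)
print(md.lookup(b"a"))   # [2]
```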

4.1.4 Reference count file

The refcount file stores the reference counts of all deduplicated blocks on disk. It is ordered by DBN and maintains a 32-bit counter per block. When a block is deleted, its entry in the refcount file is decremented. When the reference count reaches zero, the block's content-node is removed from all dedup-metadata and the block is marked free in the file system's metadata. The refcount file is also updated when a block is written. By deduplicating sequential blocks, we observe that refcount updates are often collocated to the same disk blocks, thereby amortizing IOs to the refcount file.

4.2 iDedup algorithm

For each file, the iDedup algorithm has three phases:

1. Sequence identification: Identify duplicate block sequences for file blocks.
2. Sequence pruning: Process duplicate sequences based on their length.
3. Sequence deduplication: Deduplicate sequences greater than the configured threshold.

We examine these phases next.

4.2.1 Sequence identification

In this phase, a set of newly written blocks, for a particular file, are processed. We use the breadth-first approach for determining duplicate sequences. We start by scanning the blocks in order and utilize the fingerprint hash table to identify any duplicates for these blocks. We filter the blocks to pick only data blocks that are complete (i.e., of size 4 KB) and that do not belong to special or system files (e.g., the refcount file). During this pass, we also compute the MD5 hash for each block.

In Figure 5, the blocks B(n) (n = 1, 2, 3, ...) and the corresponding fingerprints H(n) (n = 1, 2, 3, ...) are shown. Here, n represents the block's offset within the file (the file block number or FBN). The minimum length of a duplicate sequence is two; so, we examine blocks in pairs; i.e., B(1) and B(2) first, B(2) and B(3) next and so on. For each pair, e.g., B(n) and B(n+1) (see Figure 5), we perform a lookup in the fingerprint hash table for H(n) and H(n+1); if neither of them is a match, we allocate the blocks on disk normally and move to the next pair. When we find a match, the matching content-nodes may have more than one duplicate (i.e., a dup-tree) or just a single duplicate (i.e., just a single DBN). Accordingly, to determine if a sequence exists across the pair, we have one of four conditions. They are listed below in increasing degrees of difficulty; they are also illustrated in Figure 5.

Figure 5: Identification of sequences. This figure shows how sequences are identified.

1. Both H(n) and H(n+1) match a single content-node: Simplest case; if the DBN of H(n) is b, and the DBN of H(n+1) is (b+1), then we have a sequence.
2. H(n) matches a single content-node, H(n+1) matches a dup-tree content-node: If the DBN of H(n) is b, search for (b+1) in the dup-tree of H(n+1).
3. H(n) matches a dup-tree, H(n+1) matches a single content-node: Similar to the previous case with H(n) and H(n+1) swapped.
4. Both H(n) and H(n+1) match dup-tree content-nodes: This case is the most complex and can lead to multiple sequences. It is discussed in greater detail below.

When both H(n) and H(n+1) match entries with dup-trees, we need to identify all possible sequences that can start from these two blocks. The optimized red-black tree used for the dup-trees has a search primitive, nsearch(x), that returns 'x' if 'x' is found; or the next largest number after 'x'; or an error if 'x' is already the largest number. The cost of nsearch is the same as that of a regular tree search (O(log N)). We use this primitive to quickly search the dup-trees for all possible sequences. This is illustrated via an example in Figure 6.

Figure 6: Sequence identification example. Sequence identification for blocks with multiple duplicates. D1 represents the dup-tree for block B(n) and D2 the dup-tree for B(n+1).

In our example, we show the dup-trees as two sorted lists of DBNs. First, we compute the minimum and maximum overlapping DBNs between the dup-trees (i.e., 10 and 68 in the figure); all sequences will be within this range. We start with 10, since this is in D1, the dup-tree of H(n). We then perform a nsearch(11) in D2, the dup-tree of H(n+1), which successfully leads to a sequence. Since the numbers are ordered, we perform a nsearch(12) in D2 to find the next largest potential sequence number; the result is 38. Next, to pair with 38, we perform nsearch(37) in D1. However, it results in 67 (not a sequence). Similarly, since we obtained 67 in D1, we perform nsearch(68) in D2, thus yielding another sequence. In this fashion, with a minimal number of searches using the nsearch primitive, we are able to glean all possible sequences between the two blocks.

It is necessary to efficiently record, manage, and identify the sequences that are growing and those that have terminated. For each discovered sequence, we manage it via a sequence entry: the tuple ⟨Last FBN of sequence, Sequence Size, Last DBN of sequence⟩. Suppose B(n) and B(n+1) have started a sequence with sequence entry S1. Upon examining B(n+1) and B(n+2), we find that S1 grows and a new sequence, S2, is created. In such a scenario, we want to quickly search for S1 and update its contents and create a new entry for S2. Therefore, we maintain the sequence entries in a hash table indexed by a combination of the tuple fields. In addition, as we process the blocks, to quickly determine terminated sequences, we keep two lists of sequence entries: one for sequences that include the current block and another for sequences of the previous block. After sequence identification for a block completes, if a sequence entry is not in the current block's list, then it has terminated.
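The alternating successor-search pattern of the Figure 6 walkthrough can be sketched as follows. Sorted Python lists with bisect emulate the dup-trees and their nsearch primitive; the exact probe order differs slightly from the walkthrough above, and the D1/D2 contents in the example are partly assumed (only 10, 11, 38, 67, and 68 are given explicitly in the text).

```python
import bisect

def nsearch(sorted_dbns, x):
    """Smallest DBN >= x, or None (emulating the dup-tree successor search)."""
    i = bisect.bisect_left(sorted_dbns, x)
    return sorted_dbns[i] if i < len(sorted_dbns) else None

def candidate_sequences(d1, d2):
    """All (b, b+1) pairs with b in D1 and b+1 in D2, found by alternating
    successor searches between the two dup-trees."""
    pairs = []
    b = d1[0] if d1 else None
    while b is not None:
        succ = nsearch(d2, b + 1)           # is b+1 (or something later) in D2?
        if succ is None:
            break
        if succ == b + 1:
            pairs.append((b, b + 1))
            b = nsearch(d1, b + 1)          # advance D1 past b
        else:
            b = nsearch(d1, succ - 1)       # jump D1 to the only value that could pair with succ
    return pairs

print(candidate_sequences([10, 25, 67], [11, 38, 65, 68]))  # -> [(10, 11), (67, 68)]
```

Each probe skips every DBN that provably cannot start a sequence, so the number of searches stays proportional to the number of candidate pairs rather than the dup-tree sizes.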

4.2.2 Sequence pruning

Once we determine the sequences that have terminated, we process them according to their sizes. If a sequence is larger than the threshold, we check for overlapping blocks with non-terminated sequences using the heuristic mentioned in Section 3.4.2, and only deduplicate the non-overlapped blocks if they form a sequence greater than the threshold. For sequences shorter than the threshold, the non-overlapped blocks are allocated by assigning them to new blocks on disk.

4.2.3 Deduplication of blocks

For each deduplicated block, the file's metadata is updated with the original DBN at the appropriate FBN location. The appropriate block in the refcount file is retrieved (a potential disk IO) and the reference count of the original DBN is incremented. We expect the refcount updates to be amortized across the deduplication of multiple blocks for long sequences.

5 Experimental evaluation

In this section, we describe the goals of our evaluation followed by details and results of our experiments.

5.1 Evaluation objectives

Our goal is to show that a reasonable tradeoff exists between performance and deduplication ratio that can be exploited by iDedup for latency sensitive, primary workloads. In our system, the two major tunable parameters are: i) the minimum duplicate sequence threshold, and ii) the in-memory dedup-metadata cache size. Using these parameters we evaluate the system by replaying traces from two real-world, enterprise workloads to examine:

1. Deduplication ratio vs. threshold: We expect a drop in deduplication rate as threshold increases.
2. Disk fragmentation profile vs. threshold: We expect the fragmentation to decrease as threshold increases.
3. Client read response time vs. threshold: We expect the client read response time characteristics to follow the disk fragmentation profile.
4. System CPU utilization vs. threshold: We expect the utilization to increase slightly with the threshold.
5. Buffer cache hit rate vs. dedup-metadata cache size: We expect the buffer cache hit ratio to decrease as the metadata cache size increases.

We describe these experiments and their results next.

5.2 Experimental setup

All evaluation is done using a NetApp FAS 3070 storage system running Data ONTAP 7.3 [27]. It consists of: 8 GB RAM; 512 MB NVRAM; 2 dual-core 1.8 GHz AMD CPUs; and 3 10K RPM 144 GB FC Seagate Cheetah 7 disk drives in a RAID-0 stripe. The trace replay client has a 16-core, Intel Xeon 2.2 GHz CPU with 16 GB RAM and is connected by a 1 Gb/s network link.

We use two real-world CIFS traces obtained from a production, primary storage system, collected and made available by NetApp [20]. One trace contains Corporate departments' data (MS Office, MS Access, VM Images, etc.), called the Corporate trace; it contains 19,876,155 read requests (203.8 GB total read) and 3,968,452 write requests (80.3 GB total written). The other contains Engineering departments' data (user home dirs, source code, etc.), called the Engineering trace; it contains 23,818,465 read requests (192.1 GB total read) and 4,416,026 write requests (91.7 GB total written). Each trace represents ≈ 1.5 months of activity. They are replayed without altering their data duplication patterns.

We use three dedup-metadata cache sizes: 1 GB, 0.5 GB and 0.25 GB, which cache block mappings for approximately 100%, 50% and 25% of all blocks written in the trace respectively. For the threshold, we use reference values of 1, 2, 4, and 8. Larger thresholds produce too little deduplication savings to be worthwhile.

Two key comparison points are used in our evaluation:

1. The Baseline values represent the system without the iDedup algorithm enabled (i.e., no deduplication).
2. The Threshold-1 values represent the highest deduplication ratio for a given metadata cache size. Since a 1 GB cache caches all block mappings, Threshold-1 at 1 GB represents the maximum deduplication possible (with a 4 KB block size) and is equivalent to a static offline technique.

Figure 7: Deduplication ratio vs. Threshold. Deduplication ratio versus threshold for the different dedup-metadata cache sizes (0.25 GB, 0.5 GB, 1 GB) for the Corporate (top) and Engineering (bottom) traces.

Figure 8: Disk fragmentation profile. CDF of the number of sequential blocks in disk read requests for the Corporate (top) and Engineering (bottom) traces with a 1 GB cache. Mean request sizes (blocks): Corporate - Baseline 12.8, Threshold-1 9.7, Threshold-2 11.2, Threshold-4 11.5, Threshold-8 12.2; Engineering - Baseline 15.8, Threshold-1 12.5, Threshold-2 14.8, Threshold-4 14.9, Threshold-8 15.4.

5.3 Deduplication ratio vs. threshold

Figure 7 shows the tradeoff in deduplication ratio (dedup-rate) versus threshold for both the workloads and different dedup-metadata sizes. For both the workloads, as the threshold increases, the number of duplicate sequences decreases, and correspondingly the dedup-rate drops; there is a 50% decrease between Threshold-1 (24%) and 8 (13%), with a 1 GB cache. Our goal is to maximize the size of the threshold, while also maintaining a high dedup-rate. To evaluate this tradeoff, we look for a range of useful thresholds (> 1) where the drop in dedup-rate is not too steep; e.g., the dedup-rates between Threshold-2 and Threshold-4 are fairly flat. To minimize performance impact, we would pick the largest threshold that shows the smallest loss in dedup-rate: Threshold-4 from either graph. Moreover, we notice the drop in dedup-rate from Threshold-2 to Threshold-4 is the same for 0.5 GB and 0.25 GB (≈ 2%), showing a bigger percentage drop for smaller caches. For the Corporate workload, iDedup achieves a deduplication ratio between 66% (at Threshold-4, 0.25 GB) and 74% (at Threshold-4, 1 GB) of the maximum possible (≈ 24% at Threshold-1, 1 GB). Similarly, with the Engineering workload, we achieve between 54% (at Threshold-4, 0.25 GB) and 62% (at Threshold-4, 1 GB) of the maximum (≈ 23% at Threshold-1, 1 GB).

5.4 Disk fragmentation profile

To assess disk fragmentation due to deduplication, we gather the number of sequential blocks (request size) for each disk read request across all the disks and plot them as a CDF (cumulative distribution function). All CDFs are based on the average over three runs. Figure 8 shows the CDFs for both Corporate and Engineering workloads for a dedup-metadata cache of 1 GB. Other cache sizes show similar patterns.

Since the request stream is the same for all thresholds, the difference in disk IO sizes, across the different thresholds, reflects the fragmentation of the file system's disk layout.

As expected, in both the CDFs, the Baseline shows the highest percentage of longer request sizes or sequentiality; i.e., the least fragmentation. Also, it can be observed that the Threshold-1 line shows the highest amount of fragmentation. For example, there is an 11% increase in the number of requests smaller than or equal to 8 blocks between the Baseline and Threshold-1 for the Corporate workload and 12% for the Engineering workload. All the remaining thresholds (2, 4, 6, 8) show progressively less fragmentation, and have CDFs between the Baseline and the Threshold-1 line; e.g., a 2% difference between Baseline and Threshold-8 for the Corporate workload. Hence, to optimally choose a threshold, we suggest the tradeoff is made after empirically deriving the dedup-rate graph and the fragmentation profile. In the future, we envision enabling the system to automatically make this tradeoff.

5.5 Client response time behavior

Figure 9 (top graph) shows a CDF of client response times taken from the trace replay tool for varying thresholds of the Corporate trace at 1 GB cache size. We use response time as a measure of latency. For thresholds of 8 or larger, the behavior is almost identical to the Baseline (an average difference of 2% for Corporate and 4% for Engineering at Threshold-8), while Threshold-2 and 4 (not shown) fall in between. We expect the client response time to reflect the fragmentation profile. However, the impact on client response time is lower due to the storage system's effective read prefetching.

Figure 9: Client response time CDF. CDF of client response times for Corporate with a 1 GB cache (top); we highlight the region where the curves differ. The difference between the Baseline and Threshold-1 CDFs (bottom).

As can be seen, there is a slowly shrinking gap between Threshold-1 and Baseline for larger response times (> 2 ms) comprising ≈ 10% of all requests. The increase in latency of these requests is due to the fragmentation effect and it affects the average response time. To quantify this better, we plot the difference between the two curves in the CDF (bottom graph of Figure 9) against the response time. The area under this curve shows the total contribution to latency due to the fragmentation effect. We find that it adds 13% to the average latency and a similar amount to the total runtime of the workload, which is significant. The Engineering workload has a similar pattern, although the effect is smaller (1.8% for average latency and total runtime).

5.6 System CPU utilization vs. threshold

We capture CPU utilization samples every 10 seconds from all the cores and compute the CDF for these values. Figure 10 shows the CDFs for our workloads with a 1 GB dedup-metadata cache. We expect Threshold-8 to consume more CPU because there are potentially more outstanding, unterminated sequences leading to more sequence processing and management. As expected, compared to the Baseline, the maximum difference in mean CPU utilization occurs at Threshold-8, but is relatively small: ≈ 2% for Corporate and ≈ 4% for Engineering. However, the CDFs for the thresholds exhibit a longer tail, implying a larger standard deviation compared to the Baseline; this is evident in the Engineering case but less so for Corporate. However, given that the change is small (< 5%), we feel that the iDedup algorithm has little impact on the overall utilization. The results are similar across cache sizes; we chose the maximal 1 GB one, since that represents maximum work in sequence processing for the iDedup algorithm.

Figure 10: CPU Utilization CDF. CDF across all the cores for varying thresholds for the Corporate (top) and Engineering (bottom) workloads with a 1 GB cache. Threshold-2 is omitted, since it almost fully overlaps Threshold-4. Mean utilizations: Corporate - Baseline 10.8%, Threshold-1 11.1%, Threshold-4 12.0%, Threshold-8 12.5%; Engineering - Baseline 13.2%, Threshold-1 15.0%, Threshold-4 16.6%, Threshold-8 17.1%.

5.7 Buffer cache hit ratio vs. metadata size

We observed the buffer cache hit ratio for different sizes of the dedup-metadata cache. The size of the dedup-metadata cache (and threshold) had no observable effect on the buffer cache hit ratio for two reasons: i) the dedup-metadata cache size (max of 1 GB) is relatively small compared to the total memory (8 GB); and ii) the workloads' working sets fit within the buffer cache. The buffer cache hit ratio was steady for the Corporate (93%) and Engineering (96%) workloads. However, workloads with working sets that do not fit in the buffer cache would be impacted by the dedup-metadata cache.

6 Related work

Storage efficiency can be realized via various complementary techniques such as thin-provisioning (not all of the storage is provisioned up front), data deduplication, and compression. As shown in Table 1 and as described in Section 2, deduplication systems can be classified as primary or secondary (backup/archival). Primary storage is usually optimized for IOPS and latency whereas secondary storage systems are optimized for throughput. These systems either process duplicates inline, at ingest time, or offline, during idle time.

Another key trade-off is with respect to the deduplication granularity. In file level deduplication (e.g., [18, 21, 40]), the potential gains are limited compared to deduplication at block level. Likewise, there are algorithms for fixed-sized block or variable-sized block (e.g., [4, 23]) deduplication. Finally, there are content addressable systems (CAS) that reference the object or block directly by its content hash, inherently deduplicating them [24, 31]. Although we are unaware of any prior primary, inline deduplication systems, offline systems do exist. Some are block-based [1, 16], while others are file-based [11].

Complementary research has been done on inline compression for primary data [6, 22, 38]. Burrows et al. [5] describe an on-line compression technique for primary storage using a log-structured file system. In addition, offline compression products also exist [29].

The goals for inline secondary or backup deduplication systems are to provide high throughput and a high deduplication ratio. Therefore, to reduce the in-memory dedup-metadata footprint and the number of metadata IOs, various optimizations have been proposed [2, 15, 21, 39, 41]. Another inline technique, by Lillibridge et al. [21], leverages temporal locality with sampling to reduce dedup-metadata size in the context of backup streams.

Deduplication systems have also leveraged flash storage to minimize the cost of metadata IOs [7, 25]. Clustered backup storage systems have been proposed for large datasets that cannot be managed by a single backup storage node [8].

7 Conclusion

In this paper, we describe iDedup, an inline deduplication system specifically targeting latency-sensitive, primary storage workloads. With latency sensitive workloads, inline deduplication has many challenges: fragmentation leading to extra disk seeks for reads, deduplication processing overheads in the critical path, and extra latency caused by IOs for dedup-metadata management. To counter these challenges, we derived two insights by observing real-world, primary workloads: i) there is significant spatial locality on disk for duplicated data, and ii) temporal locality exists in the accesses of duplicated blocks. First, we leverage spatial locality to perform deduplication only when the duplicate blocks form long sequences on disk, thereby avoiding fragmentation. Second, we leverage temporal locality by maintaining dedup-metadata in an in-memory cache to avoid extra IOs. From our evaluation, we see that iDedup offers significant deduplication with minimal resource overheads (CPU and memory). Furthermore, with careful threshold selection, a good compromise between performance and deduplication can be reached, thereby making iDedup well suited to latency sensitive workloads.

Acknowledgments

We thank Blake Lewis, Craig Johnston, Ling Zheng, Praveen Killamsetti, Subbu PVS, Sriram Venketaraman, and James Pitcairn-Hill for helping us debug our prototype, our intern Stephanie Jones for helping us with our earlier prototype, and our shepherd John Bent and the anonymous reviewers for helping us improve this work.

NetApp, the NetApp logo, Go further, faster, and Data ONTAP are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. Intel and Xeon are registered trademarks of Intel Corporation. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.

References

[1] C. Alvarez. NetApp deduplication for FAS and V-Series deployment and implementation guide. Technical Report TR-3505, NetApp, 2011.

[2] D. Bhagwat, K. Eshghi, D. D. E. Long, and M. Lillibridge. Extreme binning: Scalable, parallel deduplication for chunk-based file backup. In Proceedings of the 17th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '09), pages 1–9, Sept. 2009.

[3] J. Bonwick. ZFS deduplication. http://blogs.oracle.com/bonwick/entry/zfs_dedup, Nov. 2009.

[4] S. Brin, J. Davis, and H. Garcia-Molina. Copy detection mechanisms for digital documents. In Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data, pages 398–409, May 1995.

[5] M. Burrows, C. Jerian, B. Lampson, and T. Mann. On-line data compression in a log-structured file system. In Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pages 2–9, Boston, MA, Oct. 1992.

[6] C. Constantinescu, J. S. Glider, and D. D. Chambliss. Mixing deduplication and compression on active data sets. In Proceedings of the 2011 Data Compression Conference, pages 393–402, Mar. 2011.

[7] B. Debnath, S. Sengupta, and J. Li. ChunkStash: speeding up inline storage deduplication using flash memory. In Proceedings of the 2010 USENIX Annual Technical Conference, pages 16–16, June 2010.

[8] W. Dong, F. Douglis, K. Li, R. H. Patterson, S. Reddy, and P. Shilane. Tradeoffs in scalable data routing for deduplication clusters. In Proceedings of the Ninth USENIX Conference on File and Storage Technologies (FAST), pages 15–29, Feb. 2011.

[9] C. Dubnicki, L. Gryz, L. Heldt, M. Kaczmarczyk, W. Kilian, P. Strzelczak, J. Szczepkowski, C. Ungureanu, and M. Welnicki. Hydrastor: A scalable secondary storage. In Proceedings of the Seventh USENIX Conference on File and Storage Technologies (FAST), pages 197–210, Feb. 2009.

[10] L. DuBois, M. Amaldas, and E. Sheppard. Key considerations as deduplication evolves into primary storage. White Paper 223310, Mar. 2011.

[11] EMC. Achieving storage efficiency through EMC Celerra data deduplication. White paper, Mar. 2010.

[12] J. Evans. Red-black tree implementation. http://www.canonware.com/rb, 2010.

[13] GreenBytes. GreenBytes, GB-X series. http://www.getgreenbytes.com/products.

[14] S. Gueron. Intel advanced encryption standard (AES) instructions set. White Paper, Intel, Jan. 2010.

[15] F. Guo and P. Efstathopoulos. Building a high-performance deduplication system. In Proceedings of the 2011 USENIX Annual Technical Conference, pages 25–25, June 2011.

[16] IBM Corporation. IBM white paper: IBM Storage Tank – A distributed storage system, Jan. 2002.

[17] IDC. The 2011 digital universe study. Technical report, June 2011.

[18] N. Jain, M. Dahlin, and R. Tewari. Taper: Tiered approach for eliminating redundancy in replica synchronization. In Proceedings of the Fourth USENIX Conference on File and Storage Technologies (FAST), pages 281–294, Dec. 2005.

[19] R. Koller and R. Rangaswami. I/O deduplication: Utilizing content similarity to improve I/O performance. In Proceedings of the Eighth USENIX Conference on File and Storage Technologies (FAST), pages 211–224, Feb. 2010.
active data sets. In Proceedings of the 2011 Data Compression Conference, pages 393–402, Mar. [19] R. Koller and R. Rangaswami. I/O deduplication: 2011. Utilizing content similarity to improve I/O perfor- mance. In Proceedings of the Eight USENIX Con- [7] B. Debnath, S. Sengupta, and J. Li. Chunkstash: ference on File and Storage Technologies (FAST), speeding up inline storage deduplication using flash pages 211–224, Feb. 2010.

[20] A. W. Leung, S. Pasupathy, G. Goodson, and E. L. Miller. Measurement and analysis of large-scale network file system workloads. In Proceedings of the 2008 USENIX Annual Technical Conference, pages 213–226, June 2008.

[21] M. Lillibridge, K. Eshghi, D. Bhagwat, V. Deolalikar, G. Trezis, and P. Camble. Sparse indexing: Large scale, inline deduplication using sampling and locality. In Proceedings of the Seventh USENIX Conference on File and Storage Technologies (FAST), pages 111–123, Feb. 2009.

[22] LSI. LSI WarpDrive SLP-300. http://www.lsi.com/products/storagecomponents/Pages/WarpDriveSLP-300.aspx, 2011.

[23] U. Manber. Finding similar files in a large file system. In Proceedings of the Winter 1994 USENIX Technical Conference, pages 1–10, Jan. 1994.

[24] T. McClure and B. Garrett. EMC Centera: Optimizing archive efficiency. Technical report, Jan. 2009.

[25] D. Meister and A. Brinkmann. dedupv1: Improving deduplication throughput using solid state drives (SSD). In Proceedings of the 26th IEEE Symposium on Massive Storage Systems and Technologies, pages 1–6, June 2010.

[26] D. T. Meyer and W. J. Bolosky. A study of practical deduplication. In Proceedings of the Ninth USENIX Conference on File and Storage Technologies (FAST), pages 1–13, Feb. 2011.

[27] Network Appliance Inc. Introduction to Data ONTAP 7G. Technical Report TR 3356, Network Appliance Inc.

[28] NIST. Secure hash standard (SHS). Federal Information Processing Standards Publication FIPS PUB 180-3, Oct. 2008.

[29] Ocarina. Ocarina networks. http://www.ocarinanetworks.com/, 2011.

[32] S. C. Rhea, R. Cox, and A. Pesterev. Fast, inexpensive content-addressed storage in Foundation. In Proceedings of the 2008 USENIX Annual Technical Conference, pages 143–156, June 2008.

[33] R. Rivest. The MD5 message-digest algorithm. Request For Comments (RFC) 1321, IETF, Apr. 1992.

[34] M. Rosenblum and J. K. Ousterhout. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, 10(1):26–52, Feb. 1992.

[35] J. Satran. Internet small computer systems interface (iSCSI). Request For Comments (RFC) 3720, IETF, Apr. 2004.

[36] S. Silverberg. http://opendedup.org, 2011.

[37] Storage Networking Industry Association. Common Internet File System (CIFS) Technical Reference, 2002.

[38] StorWize. Preserving data integrity assurance while providing high levels of compression for primary storage. White paper, Mar. 2007.

[39] W. Xia, H. Jiang, D. Feng, and Y. Hua. SiLo: a similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput. In Proceedings of the 2011 USENIX Annual Technical Conference, pages 26–28, June 2011.

[40] L. L. You, K. T. Pollack, and D. D. E. Long. Deep Store: An archival storage system architecture. In Proceedings of the 21st International Conference on Data Engineering (ICDE '05), Tokyo, Japan, Apr. 2005. IEEE.

[41] B. Zhu, K. Li, and H. Patterson. Avoiding the disk bottleneck in the data domain deduplication file system. In Proceedings of the Sixth USENIX Conference on File and Storage Technologies (FAST), pages 269–282, Feb. 2008.

[30] B. Pawlowski, C. Juszczak, P. Staubach, C. Smith, D. Lebel, and D. Hitz. NFS version 3: Design and implementation. In Proceedings of the Summer 1994 USENIX Technical Conference, pages 137–151, 1994.

[31] S. Quinlan and S. Dorward. Venti: A new approach to archival storage. In Proceedings of the 2002 Conference on File and Storage Technologies (FAST), pages 89–101, Monterey, California, USA, 2002. USENIX.
