arXiv:1803.07722v1 [cs.DC] 21 Mar 2018

A Robust Fault-Tolerant and Scalable Cluster-wide Deduplication for Shared-Nothing Storage Systems

Awais Khan, Chang-Gyu Lee, Prince Hamandawana*, Sungyong Park, Youngjae Kim
Sogang University, Seoul, Republic of Korea

* Mr. Prince is currently affiliated with Ajou University, Suwon, Republic of Korea.

Abstract

Deduplication has been largely employed in distributed storage systems to improve space efficiency. Traditional deduplication research ignores the design specifications of shared-nothing distributed storage systems such as no central metadata bottleneck, scalability, and storage rebalancing. Further, deduplication introduces transactional changes, which are prone to errors in the event of a system failure, resulting in inconsistencies in data and deduplication metadata. In this paper, we propose a robust, fault-tolerant and scalable cluster-wide deduplication that can eliminate duplicate copies across the cluster. We design a distributed deduplication metadata shard which guarantees performance scalability while preserving the design constraints of shared-nothing storage systems. The placement of chunks and deduplication metadata is made cluster-wide based on the content fingerprint of chunks. To ensure transactional consistency and garbage identification, we employ a flag-based asynchronous consistency mechanism. We implement the proposed deduplication on Ceph. The evaluation shows high disk-space savings with minimal performance degradation as well as high robustness in the event of sudden server failure.

1 Introduction

Shared-nothing storage systems (SN-SS) accommodate a large number of storage servers for high performance, scalability, availability, and fault-tolerance [1, 8, 11, 20]. SN-SS such as GlusterFS [11] and Ceph [18] are widely employed in cloud storage due to multiple properties: (i) they are highly scalable, (ii) they contain no central metadata bottleneck, as storage servers are wholly independent, so a single server failure cannot crash the whole cluster, and (iii) they allow dynamic changes such as the addition and removal of storage servers in the cluster and can relocate objects across the storage cluster to balance the utilization of the storage servers.

Deduplication (dedup) techniques are employed in cloud storage systems to increase storage efficiency, and there exist several studies on cluster-wide deduplication [2, 16, 9, 22, 5, 10, 21, 7, 13, 14]. However, the direct adoption of such dedup techniques violates the basic design constraints of SN-SS. For example, a central dedup server approach using a single deduplication metadata management server [13, 16, 2, 22] not only violates the shared-nothing property of SN-SS but also limits scalability. On the other hand, approaches that distribute deduplication metadata management across multiple decentralized servers [10, 7, 8, 21, 5, 9, 15, 12] require additional hardware and software resources for the dedicated deduplication servers. In order to reduce this additional cost, a simple DB-sharding approach has been proposed in which each storage server embeds a DB-shard of the whole dedup metadata database [13]. However, this DB-sharding approach suffers from an inherited problem in SN-SS, i.e., to identify a duplicate chunk, the fingerprint lookup must be broadcast to all DB-shards in the cluster.

Another challenging issue is deeply related to storage rebalancing. In SN-SS, storage rebalancing is triggered whenever a storage change occurs in the cluster, such as adding or deleting a storage server. This rebalancing shuffles the chunks across the storage servers to evenly balance the space utilization across the cluster. In this case, the new location of each chunk must be updated in the deduplication metadata of the storage servers in the cluster. However, this rebalancing incurs a high number of metadata renewal I/Os on each DB-shard storage server. Figure 1(a)(b) illustrates these problems.

Deduplication also requires transactional-level changes, where a complete object-based transaction is split into multiple small fixed- or variable-size chunk-based transactions [13]. These changes, if not implemented carefully, can cause inconsistent data and metadata in the deduplication cluster in the event of storage server failures. A recent study addresses the consistency of reference counts in a single storage server using soft-update style disk-based file system metadata [4]. However, it is not directly applicable to SN-SS, where I/Os are parallel and distributed in nature and are responsible for distributing chunks. Additionally, transaction ordering and checkpointing require additional operation delay and journaling overhead, which is contrary to deduplication savings. Undo and redo logging can be employed, but due to the additional space and fingerprint lookup overhead, i.e., latency, the storage cost increases. Another effect of transaction failures in deduplication storage systems is garbage chunks left by failed transactions.

Figure 1: (a) Traditional distributed DB-sharding approach and (b) storage rebalancing issues in SN-SS such as Ceph [18] and GlusterFS [11]. Specifically, (b) illustrates the chunk relocation when a new server is added to the cluster.

To address the above-mentioned challenges in SN-SS, we propose to build a scalable and consistent cluster-wide deduplication framework for SN-SS. In particular, we use the chunk's content fingerprint to avoid the lookup broadcast issue in DB-shards and employ a tagged consistency approach to ensure the validity of deduplication metadata. This paper has the following specific contributions:

• We employ database partitioning to handle deduplication metadata in a decentralized manner and preserve the shared-nothing property of SN-SS. We use the content-based fingerprint to distribute and locate the chunks in the cluster. Even if chunks are shuffled across the storage servers in the cluster, the content fingerprint is able to determine the exact location of the storage server responsible for storing the object and chunks.

• We design an asynchronous tagged consistency which ensures the correct status of transactions and deduplication metadata. Moreover, our partitioned deduplication metadata and tagged consistency aid in identifying garbage chunks and require no additional monitoring and journaling.

• We design and implement the proposed data deduplication in Ceph, a scale-out shared-nothing storage system [18], and evaluate our proposed ideas on real testbeds.

Figure 2: Cluster-wide deduplication based on DB-sharding and content-fingerprint based placement in SN-SS.
2 Cluster-wide Data Deduplication

2.1 Architecture Overview

The proposed cluster-wide deduplication is built on a shared-nothing distributed storage system. Figure 2 shows the architecture design of cluster-wide deduplication. Logically, the SN-SS is composed of clients and storage servers with no additional metadata servers, and it employs a distributed hash table (DHT) for data placement [18, 11].

The client performs object name hashing and locates the storage server to write or read objects in the cluster. Each storage server performs deduplication and stores data and metadata. When a storage server receives a write request (OSS 1 in Figure 2), it is responsible for splitting the object into small fixed-size data chunks and computes the fingerprint for each chunk's content. Then, it redirects each data chunk to a storage server based on the computed fingerprint (OSS 4 in Figure 2).

This fingerprint-based redirection frees us from keeping the location of each data chunk in the storage system. At this point, the storage server builds a mapping of the object and its data chunks' fingerprints in the DM-Shard (Deduplication Metadata Shard) as shown in Figure 2 (OSS 1). We explain the DM-Shard in Section 2.2 in detail. The redirected chunks received on other storage servers (OSS 4 in Figure 2) are treated in the following manner: the chunk fingerprint lookup is made in the CIT (Chunk Information Table) of the DM-Shard, which is responsible for maintaining the fingerprint of the data chunk, its reference count, and its commit flag. The reference count of a fingerprint shows the degree of references linked to it, and the commit flag is a tag to ensure the validity of the chunk (tagged consistency), i.e., whether the fingerprint points to valid stored content in the storage server or the content is missing from the storage server. If the chunk fingerprint exists and the commit flag is valid, then the reference count (RFC in CIT) increment is granted. Whereas, the non-existence of the fingerprint is treated as a unique chunk: the data chunk is stored in the storage server and a CIT entry is updated accordingly (OSS 4). This process is iterated for all the data chunks in parallel. When all the chunks are stored, an Object-Map (OMAP) entry is created (OSS 1) which defines the object layout such as the name, fingerprint, and chunk list of the object. The write operation finishes when all the data chunks, OMAP, and CIT data structures are created.
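To make the write path above concrete, the following is a minimal sketch, not the actual Ceph implementation: fixed-size chunking, SHA-1 content fingerprints, and a hypothetical place() function standing in for CRUSH over fingerprints. Names such as CHUNK_SIZE, NUM_SERVERS, and the in-line redirection comment are illustrative assumptions.

    import hashlib

    CHUNK_SIZE = 512 * 1024   # fixed-size chunks (illustrative value)
    NUM_SERVERS = 5           # illustrative cluster size

    def split_into_chunks(data: bytes):
        """Split an object into fixed-size data chunks."""
        return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

    def fingerprint(content: bytes) -> str:
        """Content fingerprint of a chunk (SHA-1, as used in Section 3)."""
        return hashlib.sha1(content).hexdigest()

    def place(fp: str) -> int:
        """Hypothetical stand-in for CRUSH(fp): fingerprint -> owning server id."""
        return int(fp, 16) % NUM_SERVERS

    def write_object(name: str, data: bytes):
        """Receiving server (OSS 1): chunk, fingerprint, and redirect each chunk."""
        chunk_list = []
        for chunk in split_into_chunks(data):
            fp = fingerprint(chunk)
            owner = place(fp)              # server holding the chunk and its CIT entry
            # send (fp, chunk) to 'owner'  # network redirection is omitted in this sketch
            chunk_list.append(fp)
        object_fp = fingerprint(name.encode())
        return object_fp, chunk_list       # recorded in the local OMAP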

The tagged consistency guarantees the validity and correctness of all the CIT entries and data chunks in storage without additional logging and journaling. The DM-Shard and tagged consistency together assist in identifying garbage and orphan data chunks, i.e., the remains of partially failed transactions. Chunk fingerprints with an invalid flag (Flag in CIT) are interpreted as garbage data chunks and collected periodically.

2.2 Deduplication Metadata Shard

We build a Deduplication Metadata Shard (DM-Shard) as shown in Figure 2 to effectively manage deduplication metadata. The design decision to use a distributed DM-Shard is to comply with the scalable and shared-nothing property of SN-SS. Every storage server in the cluster hosts a DM-Shard holding all the persistent data structures such as object layout information and data chunk fingerprints. Each shard keeps the unique information of objects and data chunks in separate data structures, i.e., the Object Map (OMAP) and the Chunk Information Table (CIT).

• Object Map (OMAP): OMAP maintains the complete layout and reconstruction logic of an object, i.e., the object name, object fingerprint, and list of data chunks. The OMAP data structure is shown in Figure 2. In DHT-based storage systems, an object is identified by hashing the object name, and if we do not maintain the hash of the object, we cannot reconstruct the original object because we need all the chunk fingerprints created from this object. OMAP assists in read operations, where the object fingerprint is given to look up the chunks belonging to a specific object.

• Chunk Information Table (CIT): CIT maintains the performance-sensitive deduplication metadata. It includes the data chunk fingerprint, reference count, and commit flag. All the lookup and reference update operations are possible via this data structure.

The advantage of keeping separate data structures is manifold: i) effective execution of fingerprint operations, i.e., lookup and increment/decrement, ii) reduced congestion on a single data structure when multiple I/Os access it, and iii) avoiding data chunk fingerprint lookups in the case of read requests.

Both OMAP and CIT data structures are updated synchronously during a write operation to avoid concurrent lookups of identical fingerprints, which can result in storage inefficiency. We describe complete read and write I/O transactions with the usage of OMAP and CIT in Figure 3. For deduplication metadata replication and fault-tolerance, we rely on SN-SS because we store our DM-Shard in the storage server and it is replicated like a normal object.
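Section 3 states that SQLite backs the DM-Shard, so one plausible schema for the two structures is sketched below; the table and column names are assumptions for illustration and do not come from the paper.

    import sqlite3

    def create_dm_shard(path: str = "dm_shard.db") -> sqlite3.Connection:
        """Create a per-server DM-Shard with the two structures of Section 2.2."""
        db = sqlite3.connect(path)
        db.executescript("""
            -- Object Map: object layout and reconstruction logic
            CREATE TABLE IF NOT EXISTS omap (
                object_name TEXT,
                object_fp   TEXT PRIMARY KEY,
                chunk_list  TEXT               -- ordered chunk fingerprints
            );
            -- Chunk Information Table: performance-sensitive dedup metadata
            CREATE TABLE IF NOT EXISTS cit (
                chunk_fp TEXT PRIMARY KEY,
                rfc      INTEGER DEFAULT 0,    -- reference count
                flag     INTEGER DEFAULT 0     -- commit flag: 0 invalid, 1 valid
            );
        """)
        db.commit()
        return db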
2.3 Chunk Relocation and I/O Routing

SN-SS such as Ceph [18] and Gluster [11] distribute objects in a storage-balanced fashion. For instance, Ceph uses the CRUSH algorithm [19] to fairly distribute the storage load across the storage servers when the cluster topology changes, e.g., a new storage server is added or removed, or a disk failure occurs. The objects are relocated across the storage servers in order to balance the storage load in the cluster as shown in Figure 1(b). This object and chunk relocation process is neglected in all previous deduplication studies [10, 7, 13, 15]. In previous studies, the location of objects and data chunks is stored along with the metadata, i.e., data chunk 1A is stored on server x and data chunk 1B is stored on server y. This type of deduplication metadata management suffers when chunks are relocated in the cluster because the object and chunk locations are lost. One solution could be to transform the current self-balancing mechanism to update the deduplication metadata while relocating the objects and chunks, but it entails complex implementation and a high number of I/Os for every object and chunk relocation to update the deduplication metadata.

To determine the exact location of a data chunk and the related DM-Shard across the cluster, we use the data chunk fingerprint. The fingerprint can be obtained in two ways: i) by generating the fingerprint directly from the data chunk contents (write request), and ii) by obtaining the data chunk fingerprint from OMAP using the object name or object fingerprint (read request). The computed fingerprint tells the storage server location responsible for storing the actual data chunk and the metadata shard (CIT). This content-based placement relieves us from i) complicated location management for each data chunk, ii) modifications in the existing self-balancing mechanism, and iii) frequent deduplication metadata updates. Another gain of this content-based placement is that we do not require broadcasting I/Os to all storage servers for fingerprint lookup; instead, we send a single lookup I/O to only a single storage server.
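A sketch of the read-side routing described above, under the same illustrative assumptions as the earlier write-path sketch (SHA-1 fingerprints and a hypothetical place() mapping standing in for CRUSH); the in-memory OMAP and per-server chunk stores are stand-ins for the real structures and RPCs.

    import hashlib

    NUM_SERVERS = 5  # illustrative cluster size

    def place(fp: str) -> int:
        """Hypothetical stand-in for CRUSH(fp): fingerprint -> owning server id."""
        return int(fp, 16) % NUM_SERVERS

    def read_object(omap: dict, chunk_stores: list, object_name: str) -> bytes:
        """Read path: chunk locations are derived from content fingerprints,
        so no per-chunk location table is consulted."""
        object_fp = hashlib.sha1(object_name.encode()).hexdigest()
        data = b""
        for chunk_fp in omap[object_fp]:           # ii) fingerprints via OMAP lookup
            owner = place(chunk_fp)                # server holding the chunk and its CIT
            data += chunk_stores[owner][chunk_fp]  # stands in for a chunk read I/O
        return data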

Figure 3: A complete write and read I/O transaction in the cluster-wide data deduplication system: (a) write I/O flow and (b) read I/O flow.

2.4 Asynchronous Tagged Consistency

Deduplication metadata inconsistencies in distributed storage systems lead to data authenticity and integrity issues. For example, if an object transaction is split into multiple chunk-based transactions and one of the small transactions fails, then the whole object transaction fails and two problems are likely to happen: first, an invalid reference fingerprint in the DB-shard, and second, garbage chunks left over from the failed transaction. Worst of all, a new incoming duplicate fingerprint increments the invalid reference entry, causing serious metadata inconsistency. Due to these transactional modifications, complicated transaction and rollback logic is required to keep the reference count consistent [12, 6].

To address such consistency concerns, we add a commit flag to each data chunk entry which specifies the consistency state of the chunk, i.e., 0 or 1. A flag of 0 denotes an invalid chunk (missing from storage) and 1 a valid chunk (available in storage). A simple approach is to add the commit flag to the object or chunk data structure and update it at transaction completion time. However, this simple approach requires a transaction lock and updates the flag synchronously, which affects the scalability of the system. To bypass such a transaction lock, we propose an asynchronous thread-based consistency manager which runs on every storage server. All incoming write I/Os register with the consistency manager. Once the I/O transaction completes, the consistency manager asynchronously updates the flag managed in the CIT (Section 2.2). If a crash occurs in the middle of a transaction when the data chunk is stored but the commit flag is not yet updated, then the chunk will be marked as garbage due to the invalid commit flag value, because the transaction partially failed. We explain the tagged consistency using two use cases.

Unique Write: In this case, the object is split into multiple small chunks and the chunks are stored on different storage servers based on the data chunk fingerprint. Each fingerprint in the CIT holds an invalid flag by default, i.e., 0. The consistency manager is notified of the received write operation. Once the I/O finishes, the flag is switched from invalid (0) to valid (1) asynchronously.

Duplicate Write: In the duplicate write case, whenever a duplicate fingerprint wants to increment the reference count in the CIT, it needs to check the flag as shown in Figure 3. Fingerprint entries with a valid flag allow reference count increment or decrement operations. If the flag is invalid and a reference update is required, then the data chunk must undergo an additional consistency check to ensure the existence of the data chunk in the storage server. We perform the consistency check by simply getting the data chunk attributes from the storage server, just like a stat call in a file system. If the data chunk exists, we switch the flag to valid and conduct the reference operations. Otherwise, we first store the actual data chunk contents and then switch the flag. This consistency check verifies the presence of the actual data and can repair missing data chunks.
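A minimal sketch of the flag handling for the two cases, following the flow described above and in Figure 3; the CIT and chunk store are reduced to in-memory dictionaries, and locking is omitted for brevity, so this is illustrative rather than the actual OSD code.

    def apply_chunk_write(cit: dict, store: dict, fp: str, data: bytes) -> None:
        """CIT update on the chunk's owning server for one redirected chunk."""
        entry = cit.get(fp)
        if entry is None:                        # unique write: unseen fingerprint
            cit[fp] = {"rfc": 1, "flag": 0}      # flag stays 0 until committed
            store[fp] = data
        elif entry["flag"] == 1:                 # duplicate against a valid chunk
            entry["rfc"] += 1
        else:                                    # duplicate against an uncommitted entry
            if fp not in store:                  # consistency check (like a stat call)
                store[fp] = data                 # repair the missing chunk first
            entry["flag"] = 1
            entry["rfc"] += 1

    def commit_completed(cit: dict, completed_fps) -> None:
        """Asynchronous consistency manager: switch flags once the transaction completes."""
        for fp in completed_fps:
            if fp in cit:
                cit[fp]["flag"] = 1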
To reclaim the free space consumed by garbage data chunks, we design and implement a garbage collection thread. The thread periodically collects the data chunk fingerprints with an invalid commit flag in the CIT. It keeps the fingerprints for a pre-defined threshold. Once the threshold expires, the thread cross-matches the collected fingerprints against the CIT. This cross-matching is required to assess any change, in particular to the invalid fingerprints. If there is no change, then the fingerprints along with their data chunks are removed from the storage system. We do not use any additional journaling because it requires additional disk space. We claim that the proposed asynchronous consistency manager ensures data and metadata accuracy even in the case of failures and prevents the deduplication storage system from inconsistencies.
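A sketch of such a garbage-collection thread; the threshold and period values are illustrative, and synchronization with concurrent writers is omitted for brevity.

    import threading, time

    def start_garbage_collector(cit: dict, store: dict,
                                threshold_s: int = 300, period_s: int = 60):
        """Periodically collect invalid-flag fingerprints and remove those that
        are still invalid after the threshold expires."""
        def loop():
            while True:
                suspects = [fp for fp, e in cit.items() if e["flag"] == 0]
                time.sleep(threshold_s)                    # keep the FPs for a threshold
                for fp in suspects:                        # cross-match against the CIT
                    entry = cit.get(fp)
                    if entry is not None and entry["flag"] == 0:
                        store.pop(fp, None)                # drop the garbage chunk
                        del cit[fp]
                time.sleep(period_s)
        t = threading.Thread(target=loop, daemon=True)
        t.start()
        return t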

3 Evaluation

Implementation: We implement the proposed cluster-wide deduplication framework in Ceph v10.2.3. The DM-Shard, consistency manager, and garbage collector are embedded in each OSD (Object Storage Daemon). We use the SHA-1 algorithm to generate a data chunk fingerprint and pass the fingerprint to the CRUSH algorithm [19] to distribute the data chunks in the Ceph storage cluster. We use SQLite [17] as the backend storage for the DM-Shard.

Testbed: We configured the Ceph storage cluster with four Object Storage Servers (OSS), each equipped with two SSDs acting as Object Storage Daemons (OSD). The details of the testbed are listed in Table 1. We used the FIO [1] benchmark with librbd/krbd support for evaluation, varying the deduplication ratio and the number of client threads with a 500GB workload. We compare the proposed cluster-wide deduplication technique with baseline Ceph without deduplication and Ceph with a central-server deduplication. We drop caches after every experiment and report the average of 5 iterations for each experiment.

OSS (x4) & Monitors (x3)
  CPU      Intel(R) Xeon(R) CPU E5-2640 v4 Processor @ 2.40GHz (10 cores)
  Memory   32GB
  Network  10Gbps
  OS       CentOS 7.3.1611
  Storage  Samsung SSD 850 PRO 256GB x 2 per OSS
Table 1: Testbed setup.

Performance Analysis: To analyze the performance penalty incurred by the proposed cluster-wide dedup, we use synthetic datasets generated via FIO [1]. To clearly observe the performance overhead, we set the deduplication percentage to 0% and use 8 client threads in the FIO benchmark. Figure 4 (a) shows the bandwidth of all three approaches. Our proposed partitioned metadata scales as much as baseline Ceph with respect to the increased chunk size. However, there is a certain performance overhead which is mainly derived from fingerprint computation and network overhead for small chunk sizes. The fingerprint overhead can be further minimized by employing a hardware accelerator such as a GPU for parallel fingerprint computation.

Next, we discuss the performance of cluster-wide dedup with respect to the deduplication ratio as shown in Figure 4 (b). We set the chunk size to 512KB and use 8 client threads to compare the central and cluster-wide dedup approaches. We observe that both central deduplication and cluster-wide dedup show limited performance up to certain thresholds regardless of the deduplication rate. However, we see that the cluster-wide dedup performance is twice that of central dedup. This improvement is basically due to scalable and distributed deduplication metadata management, which reduces the metadata I/O contention. We do not observe notable performance improvement with cluster-wide dedup when the dedup ratio varies because small data chunk I/Os are still directed over the network, which are too small to show improvement if not stored on the storage server.

Figure 4: Performance analysis: (a) bandwidth vs. chunk size and (b) bandwidth vs. deduplication percentage.

Scalability Analysis: To test the scalability, we use multiple client threads in FIO [1]. In Figure 5 (a), we show the impact of I/O contention created by multiple clients. We set the chunk size to 512KB to benefit the counter approach, i.e., central dedup, because its single deduplication metadata DB becomes a bottleneck due to the increased number of concurrent I/Os. Figure 5 (a) shows that, when the number of client threads is low, the cluster-wide dedup performance is high compared to central dedup even when there is no contention. This is because the central dedup server is responsible for all the chunking and fingerprinting overhead. However, with an increased number of client threads, central dedup further degrades the performance. It becomes worse when the number of client threads is 32, where the central dedup bandwidth degrades to 200MB/s. Whereas, our cluster-wide deduplication approach shows scalability and improves the bandwidth with an increasing number of client threads, because CRUSH [19] distributes the data chunks uniformly in a load-aware fashion to the object storage servers, and the DM-Shard is distributed across all the object storage servers, which overcomes the possible chances of dedup metadata contention.

Figure 5: Scalability and consistency analysis: (a) scalability with multiple clients (bandwidth vs. number of threads) and (b) asynchronous tagged consistency (bandwidth vs. chunk size).

Asynchronous Tagged Consistency: In chunk-based consistency, the flag is managed for each data chunk fingerprint, whereas in object-based consistency, the flag is stored at object granularity. Figure 5 (b) shows the bandwidth of the different variants when employed. We see that, when the chunk size is small, the performance is poor in both chunk- and object-based consistency compared to asynchronous tagged consistency. However, when we increase the chunk size, the performance improves. The chunk-based consistency shows high performance overhead as compared to the others, due to the additional serialized number of I/Os required to switch flags. Whereas, object-based consistency shows fair performance because only a single I/O is required to switch the flag, but it still degrades the performance by more than 15% compared to baseline cluster-wide deduplication. On the other hand, the asynchronous tagged consistency incurs negligible overhead compared to the chunk- and object-based consistency. This is because both the chunk and object consistency approaches introduce a transaction lock which increases the I/O latency, whereas our approach switches the commit flags asynchronously without acquiring any transaction lock, hence no overhead is incurred.

Storage Efficiency: We conduct this experiment to show the storage space efficiency of the proposed cluster-wide dedup compared to local disk-based deduplication. To enable disk-based dedup, we configure the Ceph cluster with btrfs [3] as the backend disk file system with deduplication enabled. We use a 100% deduplication ratio and report the results in Table 2. We observe that the disk-based dedup storage efficiency decreases with an increasing number of disks. It is because disks are not aware of each other and cannot identify the duplicates stored on other disks. Whereas, the cluster-wide dedup storage efficiency remains high irrespective of the number of disks.

  # of Disks                    1    2    4    8
  Cluster-wide Dedup Approach   85   85   85   85
  Disk-based Dedup Approach     85   77   65   61
Table 2: Deduplication space savings in percentage.
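The downward trend for the disk-based approach in Table 2 can be reproduced with a toy calculation: under a fully duplicated workload, disk-local dedup only removes copies that happen to land on the same disk, while cluster-wide dedup keeps a single copy overall. The sketch below is purely illustrative (it does not model btrfs or the actual workload), so its numbers differ from Table 2; only the trend matters.

    import random

    def disk_local_savings(num_disks: int, copies_per_chunk: int = 8,
                           unique_chunks: int = 10000) -> float:
        """Percentage of chunk copies removed when dedup is local to each disk."""
        random.seed(0)
        seen_per_disk = [set() for _ in range(num_disks)]
        stored = 0
        for chunk_id in range(unique_chunks):
            for _ in range(copies_per_chunk):
                disk = random.randrange(num_disks)       # placement spreads copies over disks
                if chunk_id not in seen_per_disk[disk]:  # a disk only sees its own duplicates
                    seen_per_disk[disk].add(chunk_id)
                    stored += 1
        total = copies_per_chunk * unique_chunks
        return 100.0 * (1 - stored / total)

    # disk_local_savings(1) is highest; it drops as num_disks grows, because
    # duplicates that land on different disks are never detected.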
4 Concluding Remarks

This paper presents a robust, fault-tolerant, cluster-wide deduplication framework for shared-nothing storage systems. We design and implement a distributed deduplication metadata shard approach that uses the content hash of chunks to avoid I/O broadcasting and dynamic object relocation problems. We also propose a tagged consistency approach which can recover reference errors and lost data chunks in case of sudden storage server failures. We implement the proposed ideas on Ceph. The evaluation shows that our proposed approaches support high scalability with minimal performance overhead and robust fault tolerance. Our future work is to minimize the fingerprint overhead and to evaluate on a large-scale testbed with realistic datasets.

References

[1] Axboe, J. Flexible I/O Tester. https://github.com/axboe/fio.

[2] Bhagwat, D., Eshghi, K., Long, D. D. E., and Lillibridge, M. Extreme binning: Scalable, parallel deduplication for chunk-based file backup. In MASCOTS (2009), IEEE Computer Society, pp. 1-9.

[3] Btrfs Wiki. https://btrfs.wiki.kernel.org/index.php/Main_Page. (Accessed on 03/09/2018).

[4] Chen, Z., and Shen, K. OrderMergeDedup: Efficient, failure-consistent deduplication on flash. In Proceedings of the 14th USENIX Conference on File and Storage Technologies (2016), FAST '16.

[5] Clements, A. T., Ahmad, I., Vilayannur, M., and Li, J. Decentralized deduplication in SAN cluster file systems. In Proceedings of the 2009 USENIX Annual Technical Conference (Berkeley, CA, USA, 2009), USENIX '09, USENIX Association, pp. 8-8.

[6] Douglis, F., Duggal, A., Shilane, P., Wong, T., Yan, S., and Botelho, F. The logic of physical garbage collection in deduplicating storage. In 15th USENIX Conference on File and Storage Technologies (FAST 17) (Santa Clara, CA, 2017), USENIX Association, pp. 29-44.

[7] Dubnicki, C., Gryz, L., Heldt, L., Kaczmarczyk, M., Kilian, W., Strzelczak, P., Szczepkowski, J., Ungureanu, C., and Welnicki, M. HYDRAstor: A scalable secondary storage. In 7th USENIX Conference on File and Storage Technologies (FAST 09) (San Francisco, CA, 2009), USENIX Association.

[8] EMC Data Domain Global Deduplication Array. http://www.datadomain.com/products/global-deduplication-array.html. (Accessed on 03/08/2018).

[9] Frey, D., Kermarrec, A.-M., and Kloudas, K. Probabilistic deduplication for cluster-based storage systems. In Proceedings of the Third ACM Symposium on Cloud Computing (New York, NY, USA, 2012), SoCC '12, ACM, pp. 17:1-17:14.

[10] Fu, Y., Jiang, H., and Xiao, N. A scalable inline cluster deduplication framework for big data protection. In Proceedings of the 13th International Middleware Conference (New York, NY, USA, 2012), Middleware '12, Springer-Verlag New York, Inc., pp. 354-373.

[11] Gluster. Storage For Your Cloud. http://www.gluster.org.

[12] Guo, F., and Efstathopoulos, P. Building a high-performance deduplication system. In Proceedings of the 2011 USENIX Annual Technical Conference (Berkeley, CA, USA, 2011), USENIXATC '11, USENIX Association, pp. 25-25.

[13] Kaiser, J., Meister, D., Brinkmann, A., and Effert, S. Design of an exact data deduplication cluster. In 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST) (April 2012), pp. 1-12.

[14] Lu, M., Chambliss, D., Glider, J., and Constantinescu, C. Insights for data reduction in primary storage: A practical analysis. In Proceedings of the 5th Annual International Systems and Storage Conference (New York, NY, USA, 2012), SYSTOR '12, ACM, pp. 17:1-17:7.

[15] Luo, S., Zhang, G., Wu, C., Khan, S., and Li, K. Boafft: Distributed deduplication for big data storage in the cloud. IEEE Transactions on Cloud Computing PP, 99 (2015), 1-1.

[16] Quinlan, S., and Dorward, S. Venti: A New Approach to Archival Storage. In Proceedings of the Conference on File and Storage Technologies (Berkeley, CA, USA, 2002), FAST '02, USENIX Association, pp. 89-101.

[17] SQLite: SQLite Home Page. https://www.sqlite.org/.

[18] Weil, S. A., Brandt, S. A., Miller, E. L., Long, D. D. E., and Maltzahn, C. Ceph: A Scalable, High-performance Distributed File System. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (2006), OSDI '06.

[19] Weil, S. A., Brandt, S. A., Miller, E. L., and Maltzahn, C. CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (New York, NY, USA, 2006), SC '06, ACM.

[20] Xia, M., Saxena, M., Blaum, M., and Pease, D. A. A tale of two erasure codes in HDFS. In 13th USENIX Conference on File and Storage Technologies (FAST 15) (Santa Clara, CA, 2015), USENIX Association, pp. 213-226.

[21] Xia, W., Jiang, H., Feng, D., and Hua, Y. SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput. In Proceedings of the 2011 USENIX Annual Technical Conference (Berkeley, CA, USA, 2011), USENIXATC '11, USENIX Association, pp. 26-28.

[22] Zhu, B., Li, K., and Patterson, H. Avoiding the disk bottleneck in the data domain deduplication file system. In Proceedings of the 6th USENIX Conference on File and Storage Technologies (Berkeley, CA, USA, 2008), FAST '08, USENIX Association, pp. 18:1-18:14.
