Towards Fast De-Duplication Using Low Energy Coprocessor
Liang Ma, Caijun Zhen, Bin Zhao, Jingwei Ma, Gang Wang and Xiaoguang Liu
College of Information Technical Science, Nankai University
Nankai-Baidu Joint Lab
Weijin Road 94, Tianjin, 300071, China
[email protected], [email protected]

Abstract—Backup technology based on data de-duplication has become a hot topic nowadays. To obtain better performance, traditional research mainly focuses on decreasing disk access time. In this paper, we consider the computational complexity problem in data de-duplication systems and try to improve system performance by reducing computing time. We offload the computing tasks to a commodity coprocessor to speed up the computing process. Compared with general-purpose processors, commodity coprocessors have lower energy consumption and lower cost. Experimental results show that they deliver equal or even better performance than general-purpose processors.

Keywords-De-duplication; Commodity coprocessors; Computing complexity; Low energy

I. INTRODUCTION

With the development of information technology, data has become the foundation of all trades and industries, and how to store the massive backup data generated in the backup process has become a new hot spot. In full backup, incremental backup and continuous data protection (CDP) [1]–[4], storing the backup data directly without any treatment causes a large amount of storage overhead. The data de-duplication technology that has emerged in recent years is a good solution to this problem: by discarding the duplicate data produced in the backup process, the storage overhead can be greatly reduced. For example, the Data Domain De-duplication File System (DDFS) [5], applying de-duplication to the backup data produced in one month, achieves a compression ratio of 38.54:1 and, as expected, greatly reduces storage overhead and network bandwidth consumption.

Since there is a great performance gap between CPU and I/O operations, traditional de-duplication systems [5], [6] pursue better performance by focusing on how to decrease the disk access time. However, with the emergence of new storage media such as SSDs, this will no longer be the main problem. Together with the development of network technology, which greatly enlarges data transmission bandwidth, the CPU has to carry out more and more computation tasks, such as SHA-1 hashing and compression, during the de-duplication process, which makes the CPU a potential bottleneck. If the CPU is occupied by too many highly complex tasks, I/O performance will be greatly decreased. There are two possible ways to solve this problem: using more computing units to distribute the computing tasks, or using special computing devices in each computing unit.

Using more computing units can noticeably improve computing performance and has better scalability [7], [8], but maintaining data synchronization between computing units remains a big problem. Using special computing devices avoids the synchronization problem, but incurs higher cost and a longer development cycle.

We found that putting the computing tasks on high-performance commodity coprocessors can be a good solution: it avoids introducing extra synchronization problems while offering better computing performance, lower energy consumption and lower cost.

In this paper, we present a new continuous data protection system based on data de-duplication: DedupCDP. Unlike traditional de-duplication systems, we pay more attention to reducing the computing pressure on the CPU by cooperating with coprocessors. The rest of this paper is organized as follows: Section II surveys related work. Section III presents the architecture of the DedupCDP system. Section IV describes how to perform fast de-duplication using a coprocessor. In Section V, we report on various simulation experiments with real data. Finally, we describe our conclusions and future work in Section VI.

II. RELATED WORK

Early de-duplication storage systems, such as EMC's Centera Content Addressed Storage (CAS) [9], implemented de-duplication at the file level. File-level de-duplication storage systems use the fingerprint of a file to detect identical files. Since they only detect duplicates among whole files, such systems achieve only limited storage saving. Moreover, file-level de-duplication storage systems have poor portability.

It is natural to implement de-duplication at the block level because of the block read and write interface provided by the kernel. Venti [10] computes a cryptographic hash of each disk block's contents and performs de-duplication by comparing the cryptographic hashes of blocks. Venti achieves a better compression ratio than file-level systems and can be easily ported to other operating systems. Venti implements a large on-disk index with a cache for fast fingerprint lookup, but since fingerprints have no locality, its index cache is ineffective. Its throughput is still limited to less than 7 MB/sec even though 8 disks are used to look up fingerprints in parallel.

Early studies focused only on compression ratio, not on techniques for high throughput. To decrease the disk access time, DDFS [5] uses a Bloom filter [11] to detect duplicates instead of direct fingerprint comparisons, and uses the Stream-Informed Segment Layout technique to take advantage of locality. With these techniques, DDFS achieves 100 MB/sec single-stream throughput and 210 MB/sec multi-stream throughput.

The above studies have mainly focused on compression ratio and high throughput, not on CPU pressure, which can become a major bottleneck in de-duplication storage systems in the near future. Our prototype is also implemented at the block level like DDFS, and uses a commodity coprocessor to relieve the CPU pressure.
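As a concrete illustration of the block-level fingerprinting these systems share, the following is a minimal, self-contained sketch (not code from any of the cited systems or from our prototype): split the input into fixed-size blocks, fingerprint each block with SHA-1, and treat blocks whose fingerprint has already been seen as duplicates. The 4 KB block size, the flat in-memory fingerprint array and the use of OpenSSL's SHA1() are assumptions made for brevity; a real system keeps the fingerprint index on disk behind a cache or a Bloom filter.

```c
/*
 * Minimal sketch of block-level, fingerprint-based de-duplication.
 * Build with:  cc dedup_sketch.c -lcrypto
 * The flat in-memory fingerprint array and linear search are for brevity
 * only; a production system would use an on-disk index plus a cache.
 */
#include <openssl/sha.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096                 /* fixed-size blocks, assumed 4 KB */

int main(int argc, char **argv)
{
    FILE *in = argc > 1 ? fopen(argv[1], "rb") : stdin;
    if (!in) { perror("fopen"); return 1; }

    unsigned char block[BLOCK_SIZE];
    unsigned char (*fps)[SHA_DIGEST_LENGTH] = NULL;  /* fingerprints seen so far */
    size_t stored = 0, capacity = 0, total = 0, dup = 0;

    size_t n;
    while ((n = fread(block, 1, BLOCK_SIZE, in)) > 0) {
        unsigned char fp[SHA_DIGEST_LENGTH];
        SHA1(block, n, fp);                           /* per-block fingerprint */
        total++;

        int found = 0;                                /* search the index */
        for (size_t i = 0; i < stored; i++)
            if (memcmp(fps[i], fp, SHA_DIGEST_LENGTH) == 0) { found = 1; break; }

        if (found) {
            dup++;                                    /* duplicate: store nothing */
        } else {                                      /* unique: remember fingerprint */
            if (stored == capacity) {
                capacity = capacity ? capacity * 2 : 1024;
                fps = realloc(fps, capacity * sizeof *fps);
                if (!fps) { perror("realloc"); return 1; }
            }
            memcpy(fps[stored++], fp, SHA_DIGEST_LENGTH);
            /* a real backend would also write the unique block to storage */
        }
    }

    printf("%zu blocks, %zu duplicates eliminated (%.2f%%)\n",
           total, dup, total ? 100.0 * dup / total : 0.0);
    free(fps);
    if (in != stdin) fclose(in);
    return 0;
}
```

The per-block SHA-1 call in this loop is exactly the kind of computation that piles up on the CPU as I/O and network bandwidth grow, and it is this work that our design moves onto a commodity coprocessor.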
III. SYSTEM ARCHITECTURE

DedupCDP is a data de-duplication CDP system based on Logical Volume Manager version 2 (LVM2). Our system uses the SplitDownStream [12] architecture and transfers data via an NBD device. The whole system consists of three parts: the backup client, the de-duplication server and the backup server. The backup client is the data generating center: it makes a copy of each write request, sends the copy to the NBD device, and then performs the original write operation. The primary task of the de-duplication server is to filter duplicate data blocks. After receiving backup data from the backup client, the de-duplication server performs the de-duplication process: it discards the duplicate data, encrypts the metadata and the non-duplicate data, and then sends them to the backup server. The backup server decrypts the backup data, then stores the metadata and the unique data separately.

Here, we detail each component of the DedupCDP system (Figure 1): the backup client, the de-duplication server and the backup server.

A. Backup Client

The backup client runs on the machine which requires backup service. It consists of three main modules: the user interface module, the backup engine module and the data transmission module.

The user interface module provides an interface between the system and the user, including the creation and management of CDP volumes. Besides, this module provides access to the de-duplication server.

The most important module of the backup client is the backup engine module, which runs at the block level and provides the CDP volume backup service. The system dispatches fixed-size write requests to the CDP volume, and the backup engine backs up each data chunk written to the CDP volume as follows:

∙ Capture every write operation (BIO) to the original volume and make a copy of it.
∙ Change the target device of the copied BIO to the data transmission module.
∙ Insert the modified BIO into the Remote I/O Request Queue.
∙ Dispatch the original write request to the original device.

Since the backup engine module and the data transmission module are asynchronous, the backup engine only puts the modified data into the Remote I/O Request Queue without waiting for it to be sent, so it can reach a very high throughput. The data transmission module is responsible for sending the backup data to the de-duplication server via the NBD device. Backup data are compressed before sending to save bandwidth; meanwhile, the compression also lets the system achieve a better overall compression ratio.
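The sketch below is a user-space analogue of this write path, written under stated assumptions rather than taken from the kernel module itself (which operates on BIOs inside the LVM2/device-mapper stack): a mutex-protected linked list stands in for the Remote I/O Request Queue, and a sender thread plays the role of the asynchronous data transmission module, which in the real system also compresses the data and pushes it through the NBD device.

```c
/*
 * User-space analogue of the backup engine's asynchronous write path.
 * Build with:  cc backup_engine_sketch.c -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct remote_io {                    /* copied write request ("modified BIO") */
    unsigned long long sector;
    size_t len;
    unsigned char *data;
    struct remote_io *next;
};

static struct remote_io *head, *tail;            /* Remote I/O Request Queue */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;

/* Backup engine: copy the write and retarget it at the transmission module. */
static void capture_write(unsigned long long sector, const void *buf, size_t len)
{
    struct remote_io *io = malloc(sizeof *io);
    if (!io) exit(1);
    io->sector = sector;
    io->len = len;
    io->data = malloc(len);
    if (!io->data) exit(1);
    memcpy(io->data, buf, len);                  /* copy of the original payload */
    io->next = NULL;

    pthread_mutex_lock(&lock);                   /* insert into the queue ... */
    if (tail) tail->next = io; else head = io;
    tail = io;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
    /* ... then the original write would be dispatched to the original device,
     * without waiting for the copy to be sent */
}

/* Data transmission module: drains the queue asynchronously.  Here it only
 * prints; the real module compresses and sends via the NBD device. */
static void *sender_thread(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!head)
            pthread_cond_wait(&nonempty, &lock);
        struct remote_io *io = head;
        head = io->next;
        if (!head) tail = NULL;
        pthread_mutex_unlock(&lock);

        printf("send sector %llu (%zu bytes)\n", io->sector, io->len);
        free(io->data);
        free(io);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, sender_thread, NULL);

    unsigned char chunk[4096] = {0};
    for (unsigned long long s = 0; s < 8; s++)
        capture_write(s * 8, chunk, sizeof chunk);   /* simulated writes */

    sleep(1);                                        /* let the sender drain */
    return 0;
}
```

The design point the sketch captures is that the foreground write never blocks on the network: enqueueing a copy is the only extra work on the critical path, which is why the backup engine can sustain a very high throughput.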
B. Deduplication Server

The deduplication server performs the following tasks: receiving backup data from the backup client and de-duplicating it, creating an index for duplicate data search and metadata for data recovery, and sending the metadata and the unique data to the backup server for storage.

As described for the DDFS system, too many disk index searches can decrease system performance. For data coming from the backup client, we therefore use a Bloom filter to do the duplication check. However, the Bloom filter has a certain false positive probability, so we have to further check whether the data is really duplicate; we do this by comparing the fingerprint of the data to be stored with the fingerprints stored in the system. As indicated in [11], assuming the hash functions are perfectly random, the false positive probability is

$$\left(1-\left(1-\frac{1}{m}\right)^{ks}\right)^{k}\approx\left(1-e^{-ks/m}\right)^{k}\qquad(1)$$

In our system, the size of the Summary Vector is 80M bits (m = 80M) and the number of hash functions is 6 (k = 6); the test data set is about 4GB (s = 1M fingerprints), so the false positive rate of our system is less than 1%.

Most fingerprints have to be stored on disk, since memory is not large enough to hold all of them. We store the fingerprints in a hash table on disk for fast search. The fingerprint cache is also organized as a hash table with the same structure, so when a cache miss occurs we can easily locate the missing fingerprint on disk; the cost is only one disk read operation.

Organizing the fingerprints in memory and on disk consistently with the same hashing scheme results in low cache miss overhead. However, it does not guarantee efficient use of the cache space.
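To make the Summary Vector check and equation (1) concrete, the following is a self-contained Bloom filter sketch using the parameters given above (m = 80M bits, k = 6, s = 1M fingerprints). The FNV-style hashes and the double-hashing derivation of the k bit positions are illustrative assumptions; the text does not specify which hash functions DedupCDP actually uses, and a filter hit still has to be confirmed against the exact fingerprints in the on-disk hash table.

```c
/*
 * Sketch of the Summary Vector duplicate screen.
 * Build with:  cc bloom_sketch.c -lm
 */
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define M_BITS (80u * 1024 * 1024)   /* Summary Vector size m, in bits */
#define K_HASH 6                     /* number of hash functions k */

static uint8_t *vector;              /* the m-bit Summary Vector */

/* Seeded 64-bit FNV-1a style hash; the k bit positions are derived from two
 * such hashes by double hashing: h1 + i * h2 (mod m).  Illustrative choice. */
static uint64_t hash64(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = key;
    uint64_t h = 1469598103934665603ull ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ull;
    }
    return h;
}

static void bloom_insert(const void *fp, size_t len)
{
    uint64_t h1 = hash64(fp, len, 0), h2 = hash64(fp, len, 0x9e3779b9ull) | 1;
    for (int i = 0; i < K_HASH; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % M_BITS;
        vector[bit / 8] |= 1u << (bit % 8);
    }
}

/* Returns 1 if the fingerprint may already be stored (exact fingerprint
 * comparison still required), 0 if it is definitely new. */
static int bloom_maybe_contains(const void *fp, size_t len)
{
    uint64_t h1 = hash64(fp, len, 0), h2 = hash64(fp, len, 0x9e3779b9ull) | 1;
    for (int i = 0; i < K_HASH; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % M_BITS;
        if (!(vector[bit / 8] & (1u << (bit % 8))))
            return 0;
    }
    return 1;
}

int main(void)
{
    vector = calloc(M_BITS / 8, 1);
    if (!vector) return 1;

    /* Insert s = 1M synthetic 20-byte fingerprints. */
    size_t s = 1u << 20;
    unsigned char fp[20] = {0};
    for (size_t i = 0; i < s; i++) {
        memcpy(fp, &i, sizeof i);
        bloom_insert(fp, sizeof fp);
    }

    /* Expected false positive rate from equation (1): (1 - e^(-ks/m))^k. */
    double p = pow(1.0 - exp(-(double)K_HASH * s / M_BITS), K_HASH);
    printf("predicted false positive rate: %.6f%%\n", 100.0 * p);

    /* Query a fingerprint that was never inserted; a "maybe" answer would be
     * a false positive, caught later by the on-disk fingerprint comparison. */
    memcpy(fp, &s, sizeof s);
    printf("filter says maybe-present: %d\n", bloom_maybe_contains(fp, sizeof fp));

    free(vector);
    return 0;
}
```

With these parameters, equation (1) predicts a false positive rate far below the 1% bound quoted above, so in the common case a new block is recognized as unique from the in-memory Summary Vector alone and the on-disk fingerprint table is consulted only on filter hits.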