File Version Based Continuous Data Protection on Distributed Object Storage

Xin Yang¹, Jiangjiang Wu, Jing Ning, Jun Li
College of Electronic Science and Engineering, National University of Defense Technology
Changsha 410000, China
E-mail: [email protected], [email protected]

Continuous Data Protection (CDP) can restore data to any point-in-time, but its high storage space overhead and the drastic system performance drop restrict its application. In this paper, we propose a file version based file-level CDP system (FV-CDP) that brings down the backup storage costs by using cheap distributed object storage, and masks the network storage latency by using a local object cache and parallel asynchronous object sending. It designs a special operation log to identify the file system hierarchy at any point-in-time and exploits parallel restoring in the file system recovery. The experimental results show that parallel asynchronous object sending improves the FV-CDP system's max write ops by about 3.4 times, and that the parallel recovery reduces the file system recovery time by up to 57%. Under a high-frequency file system change workload, FV-CDP causes a large storage space overhead.

CENet2017
22-23 July 2017
Shanghai, China

¹Speaker

Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). http://pos.sissa.it/

1. Introduction and Related Work

As IT is becoming a data-driven business, data loss can cause severe disaster. For example, the recent outbreak of the WannaCry virus resulted in data loss for a large number of PCs and caused a global panic. The traditional backup is taken periodically; it provides only a coarse Recovery Point Objective (RPO), and the data written between two backups can be lost. For these scenarios, continuous data protection (CDP) can theoretically recover data to any specified point-in-time, but its huge space overhead causes high storage cost. Meanwhile, distributed storage provides a new low-cost solution for data protection. CDP systems can be classified into block-level, file-level and application-level CDP.

Previous CDP studies mainly focused on block-level CDP. Block-level CDP systems work in the block interface layer, which is easy to implement and makes them simultaneously suitable for different backup systems. Laden et al. [1] studied four different architectures for storage-controller-based CDP. Trap-array [2] proposed a disk array based CDP model. Zhu et al. [3] proposed an iSCSI interface based CDP. In terms of optimizing CDP system overhead, Clotho [4] used a differential backup method to only back up the changed data portion between versions, but differential backups caused a large read/write bandwidth overhead. Morrey III et al. [5] used deduplication to optimize the backup space overhead. Mariner [6] leveraged track-based logging to minimize the write overhead and the Recovery Time Objective (RTO). ST-CDP [7] introduced a snapshot mechanism based on Trap-array and gave a prediction model of the snapshot frequency's influence on the storage cost. The block-level CDP system is easy to implement, but it ignores the consistency of the file data, resulting in difficulties in data consistency and usability examination. File-level CDP records file system updates in the background, gets better file data consistency, and its recovery process is more concise. In the file-level research, VMS [8] used a copy-on-write strategy for file version backup. OceanStore [9] used versioning in the backup and recovery process and introduced cache and replication mechanisms. LBFS [10] realized file version backup in remote areas under a low-bandwidth network. Ext3cow [11] implemented version backup in the file system. As it is cheap and scalable, distributed storage has been gradually favored in the data backup field and has become a mainstream solution to large-scale data storage. Wood et al. [12] showed that backups placed on the cloud were cheaper. Cumulus [13] backed up local data to the cloud and provided file system data management. The cCDP [14] system used the cloud object store to back up the change log.

Compared with previous researches, our FV-CDP is a file version based file-level continuous data protection system with distributed object storage as the storage backend. The FV-CDP implementation focuses on improving the write performance under high network storage latency: it leverages a local cache and asynchronous object sending to optimize the system write performance, presents a log-based method to identify the file system hierarchy at any point-in-time, and exploits parallel restoring in the file system recovery.

2. System Architecture and Implementation

This chapter introduces the system architecture and implementation in three parts: Section 2.1 gives the overview of the FV-CDP system architecture, Section 2.2 introduces how FV-CDP backs up, and Section 2.3 introduces how FV-CDP restores.

2.1 FV-CDP System Architecture

As shown in Figure 1, the FV-CDP system consists of three parts: the local Linux node with the file operation filtering capability, the back-end storage consisting of a distributed object storage cluster, and, between the two, the Swift Gateway used as a proxy server for object storage management.

FUSE is a software interface for Unix-like computer operating systems that allows users to create file systems in user space. With FUSE, we added a file operation filtering function on top of the local xfs file system: through the stackable FUSE layer, the user can get the file system operation information, including the path name, the operation type and the corresponding data. Swift, as the OpenStack Object Store gateway, provides a simple and complete object storage management interface; the Swift Gateway acts as a proxy server to send the objects to the storage backend. The storage backend uses a distributed Ceph storage cluster.

Figure 1: FV-CDP Architecture
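To make the filtering step concrete, the following is a minimal sketch of such a stackable layer in Python using the fusepy bindings. The paper does not give implementation details, so the class name, the backing directory, the mount point and the queue handed to the backup module are illustrative assumptions; only the logged operations are shown (a real passthrough also implements open, read, readdir and so on).

import os, queue, time
from fuse import FUSE, Operations

backup_queue = queue.Queue()   # consumed by the FV-CDP backup module

class FilterFS(Operations):
    """Stackable filter over a local xfs directory, emitting operation_info."""

    def __init__(self, root):
        self.root = root

    def _full(self, path):
        return os.path.join(self.root, path.lstrip('/'))

    def _log(self, op, path, data=b''):
        # operation_info: path name, operation type, data and timestamp
        backup_queue.put({'path': path, 'op': op, 'data': data,
                          'ts': time.time()})

    def getattr(self, path, fh=None):
        st = os.lstat(self._full(path))
        return {k: getattr(st, k) for k in ('st_mode', 'st_size', 'st_uid',
                'st_gid', 'st_nlink', 'st_atime', 'st_mtime', 'st_ctime')}

    def create(self, path, mode, fi=None):
        self._log('create', path)
        return os.open(self._full(path), os.O_WRONLY | os.O_CREAT, mode)

    def write(self, path, data, offset, fh):
        self._log('write', path, data)
        os.lseek(fh, offset, os.SEEK_SET)
        return os.write(fh, data)

    def unlink(self, path):
        self._log('delete', path)
        os.unlink(self._full(path))

if __name__ == '__main__':
    FUSE(FilterFS('/data/xfs_root'), '/mnt/fvcdp', foreground=True)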

2.2 FV-CDP Backup Module

As shown in Figure 2, when the operation information of the file system is acquired, it is represented by an operation_info structure containing the path name, the operation type, the corresponding data and a timestamp, and FUSE passes it to the FV-CDP backup module. After the backup module receives the operation information, it decides which operations have to be backed up, and then converts the information into two data objects: one is the structured operation log object op_record, and the other is the unstructured op_data object. These objects are then sent to the Swift Gateway.

Figure 2: FV-CDP Backup Process
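The conversion itself can be pictured as a small pure function. The sketch below follows the field names used in this section, while the exact record layout and the data_obj back-reference from op_record to op_data are our assumptions:

def to_backup_objects(op_info):
    """Convert one operation_info (from the filtering layer's queue) into
    the structured op_record and the unstructured op_data object."""
    # op_data: a copy of the file version at this timepoint, named by
    # directly encoding the path and the timestamp (see Section 2.2)
    data_name = '%s@%.6f' % (op_info['path'], op_info['ts'])
    op_data = (data_name, op_info['data'])
    # op_record: a small structured log entry referencing the op_data object
    op_record = {'path': op_info['path'], 'op': op_info['op'],
                 'ts': op_info['ts'], 'data_obj': data_name}
    return op_record, op_data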

The op_record object is used to record the operations on FUSE, including the file path, the operation type such as create, write and delete, and the timestamp. An op_record is a very small object; if each op_record were stored as a separate Swift object in the back-end distributed object storage cluster, the storage delay would greatly affect the system performance. Thus, we aggregate multiple op_records to form an op_log object, which is essentially a data table of structured op_records. The aggregation reduces the average storage delay of each op_record. The op_data object is a copy of the file version corresponding to that timepoint, and it is simply packaged into a Swift object with its name directly encoded by using the path name and the timestamp.

When the op_log and op_data objects are encapsulated, they are sent to the Swift gateway. Simple synchronous network transmission causes serious storage latency, which would seriously affect the system's write performance; therefore, all objects are firstly cached locally and then sent to the Swift gateway asynchronously and in parallel.
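A sketch of the aggregation and of the cached, parallel asynchronous upload, using the python-swiftclient Connection API; the aggregation threshold, container names and authentication parameters are placeholders, not values from the paper:

import json, queue, time
from concurrent.futures import ThreadPoolExecutor
from swiftclient.client import Connection

AGG_SIZE = 128                # op_records per op_log object (assumed value)
local_cache = queue.Queue()   # objects wait here instead of blocking writes

def aggregate(op_records):
    """Pack many small op_records into one op_log object and cache it."""
    for i in range(0, len(op_records), AGG_SIZE):
        body = json.dumps(op_records[i:i + AGG_SIZE]).encode()
        local_cache.put(('op_log', 'op_log@%.6f' % time.time(), body))

def cache_op_data(op_data):
    name, body = op_data
    local_cache.put(('op_data', name, body))

def sender_loop(n_threads=8):
    """Drain the local cache and upload to the Swift gateway in parallel."""
    def upload(item):
        container, name, body = item
        # one connection per upload keeps the sketch simple and thread-safe;
        # a real implementation would keep a connection per worker thread
        conn = Connection(authurl='http://swift-gw:8080/auth/v1.0',
                          user='fvcdp', key='secret')  # placeholder creds
        conn.put_object(container, name, contents=body)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        while True:
            pool.submit(upload, local_cache.get())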

2.3 FV-CDP Restoring Implementation

In order to restore the file system to any point in time, the file hierarchy at that point-in-time is needed. The op_log records all changes of the file system, so it can be used to restore the file system hierarchy. Normally it is not simple to extract the file system hierarchy from an operation log; however, the op_log is essentially organized as a data table (the op_record table), so we can retrieve the hierarchy through that table. It is easy and efficient to do this task with a database, and we take MySQL.

Before restoring the entire file system, we firstly look at how to use the file version backup to restore a single file to any point in time. Figure 3 shows the changes of a single file, file1, on the time axis: file1 is manipulated at t1, t2, t3 and t4, and these operations are logged as op_records, each pointing to an op_data object that is a copy of the file version at that timepoint. To restore file1 to a time t, the op_log objects that logged the operations before t are fetched from the Swift gateway, and the op_record of the latest version before t identifies the op_data object that needs to be fetched.

Figure 3: Restore file1 to Any Point-in-time
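In database terms, finding the version of a single file that is valid at time t reduces to one lookup in the op_record table. A sketch, assuming the table columns mirror the op_record fields above (the table and column names are ours):

# Newest record for the file at or before t decides its state: a
# create/write record names the op_data object to fetch, while a delete
# record means the file did not exist at that time.
SINGLE_FILE_QUERY = """
    SELECT op, data_obj FROM op_log_table
    WHERE path = %(path)s AND ts <= %(t)s
    ORDER BY ts DESC LIMIT 1
"""

def file_version_at(db, path, t):
    cur = db.cursor()
    cur.execute(SINGLE_FILE_QUERY, {'path': path, 't': t})
    row = cur.fetchone()
    if row is None or row[0] == 'delete':
        return None            # no such file at time t
    return row[1]              # name of the op_data object to fetch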

When the needed op_log objects are fetched, they are loaded into the MySQL op_log_table. In combination with the single-file restoring method above, FV-CDP can traverse the file system hierarchy and restore the entire file system in parallel. File version based backup eliminates the interdependence among the backup data, so the restoring can take full advantage of the independence between files. In the restoring phase, the procedures shown in Figure 4 are executed: MySQL determines the file system hierarchy at the target point-in-time from the op_log_table, and then FV-CDP restores the file versions in parallel.

Figure 4: Restoring the Entire File System
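The text does not list Figure 4's procedures, so the sketch below is an assumed reading of this phase: one SQL query generalizes the single-file lookup to every path, and a thread pool restores the resulting independent file versions in parallel. The mount path and container name are placeholders, and swift_conn is a python-swiftclient Connection as in the earlier sketch.

import os
from concurrent.futures import ThreadPoolExecutor

# Latest record per path at or before t; paths whose latest record is a
# delete are dropped, and the remaining rows form the hierarchy to rebuild.
HIERARCHY_QUERY = """
    SELECT r.path, r.data_obj FROM op_log_table r
    JOIN (SELECT path, MAX(ts) AS ts FROM op_log_table
          WHERE ts <= %(t)s GROUP BY path) last
      ON r.path = last.path AND r.ts = last.ts
    WHERE r.op <> 'delete'
"""

def restore_filesystem(db, swift_conn, t, root='/mnt/restore', n_threads=8):
    cur = db.cursor()
    cur.execute(HIERARCHY_QUERY, {'t': t})
    rows = cur.fetchall()

    def restore_one(row):
        path, data_obj = row
        # file versions were backed up independently, so each restore only
        # needs its own op_data object -- no ordering between files
        _, body = swift_conn.get_object('op_data', data_obj)
        target = os.path.join(root, path.lstrip('/'))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, 'wb') as f:
            f.write(body)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(restore_one, rows))  # independent, parallel restores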

3. Performance Evaluation

We firstly introduce the test environment settings. The storage nodes of the back-end object storage cluster are built on CentOS7 virtual machines with 2GB RAM, a 100GB hard disk and a 4-core CPU; the Swift Object Gateway is also deployed on a CentOS node with the same hardware settings. The MySQL database server used by FV-CDP is built on a single-node CentOS7 virtual machine with 4GB RAM, a 100GB hard disk and a 4-core CPU. Filebench is used as the benchmark.

For the system write performance test, the filebench sequential write-file workload is used. The workload is set with 1 thread sequentially writing a file set of 10k files, 128k per file, and the I/O size is set to 64k. The test baseline is FUSE without the FV-CDP mechanism. The result is shown in Figure 5: when the FV-CDP mechanism is added with synchronous object sending, the single-thread write performance drops drastically from the baseline 330 write ops to 67 write ops, because synchronous backup data sending over the network brings huge latency, and this latency blocks writing. To amortize the network delay, FV-CDP introduces the local object cache and uses parallel asynchronous object sending, which improves the write performance greatly: the experimental results show that parallel asynchronous object sending improves the FV-CDP system's max write ops by about 3.4 times.

Figure 5: Max Write Performance

In the recovery performance test, the workload is the filebench file-server workload, set with 1 thread, a file size of 128k, a file number of 10k and an eventgen rate of 30. We test the restoring time overhead to selected points in time: 10 minutes after FV-CDP started running the file set holds 3033 files, at 20 minutes 4054 files, and at 30 minutes 4936 files. The test result is shown in Figure 6, in which parallel recovery has sped up the system restoring. When the thread number exceeds a certain value, the restoring time no longer decreases, because the time overhead of MySQL executing the procedures in Figure 4 is fixed. The experimental results show that the parallel recovery reduces the file system recovery time by up to 57%.

Figure 6: Restoring Performance

In the storage space overhead test, we use the filebench file-server workload under 1 thread with a file set of 10k files and a file size of 1M. The result is shown in Figure 7, in which the storage overhead increases near linearly over time: in 60 minutes the workload takes approximately 55G of backup storage space for a 5G file system, because FV-CDP backs up multiple versions of a single file independently. This independence helps the parallel recovery, but it causes a larger storage overhead, resulting in redundant backup in the scenario of high-frequency file system changes.

Figure 7: Storage Overhead

4. Conclusion and Future Work

This paper proposes a file version based file-level continuous data protection system, FV-CDP, which uses FUSE to filter the file operation information and backs the information up into locally cached objects. FV-CDP sends the cached objects to the Swift gateway asynchronously and in parallel to mask the network latency, and the Swift gateway forwards them to the distributed object storage cluster. In the restoring phase, we design a method to determine the file system hierarchy from the operation log, and FV-CDP can restore the entire file system in parallel because of the independence of the backup data. The experimental results show that parallel asynchronous object sending improves the FV-CDP system's max write ops by about 3.4 times, and that the parallel recovery reduces the file system recovery time by up to 57%.

The FV-CDP system can cause significant storage overhead due to the independent backup of all versions of a file. Future work will further explore the correlation between file history versions and introduce deduplication technology to improve the system's storage space overhead.

References

[1] LADEN G, TA-SHMA P, YAFFE E, et al. Architectures for Controller Based CDP [C]. Proceedings of FAST, 2007.

[2] YANG Q, XIAO W, REN J. Trap-array: A disk array architecture providing timely recovery to any point-in-time [C]. Proceedings of the ACM SIGARCH Computer Architecture News, 2006. IEEE Computer Society.

[3] ZHU N, CHIUEH T-C. Portable and efficient continuous data protection for network file servers [C]. Proceedings of the Dependable Systems and Networks (DSN'07), 37th Annual IEEE/IFIP International Conference on, 2007. IEEE.

[4] FLOURIS M, BILAS A. Clotho: Transparent Data Versioning at the Block I/O Level [C]. Proceedings of MSST, 2004.

[5] MORREY III C B, GRUNWALD D. Content-based block caching [C]. Proceedings of the 23rd IEEE Conference on Mass Storage Systems and Technologies, 2006.

[6] LU M, LIN S, CHIUEH T-C. Efficient logging and replication techniques for comprehensive data protection [C]. Proceedings of the Mass Storage Systems and Technologies (MSST 2007), 24th IEEE Conference on, 2007. IEEE.

[7] YANG J, CAO Q, LI X, et al. ST-CDP: Snapshots in TRAP for continuous data protection [J]. IEEE Transactions on Computers, 2012, 61(6): 753-766.

[8] MCCOY K. VMS file system internals [M]. Digital Press, 1990.

[9] RHEA S C, EATON P, GEELS D, et al. Pond: The OceanStore Prototype [C]. Proceedings of FAST, 2003.

[10] MUTHITACHAROEN A, CHEN B, MAZIERES D. A low-bandwidth network file system [C]. Proceedings of the ACM SIGOPS Operating Systems Review, 2001. ACM.

[11] PETERSON Z, BURNS R. Ext3cow: A time-shifting file system for regulatory compliance [J]. ACM Transactions on Storage (TOS), 2005, 1(2): 190-212.

[12] WOOD T, CECCHET E, RAMAKRISHNAN K K, et al. Disaster Recovery as a Cloud Service: Economic Benefits & Deployment Challenges [J]. HotCloud, 2010, 10: 8-15.

[13] VRABLE M, SAVAGE S, VOELKER G M. Cumulus: Filesystem backup to the cloud [J]. ACM Transactions on Storage (TOS), 2009, 5(4): 14.

[14] ROUTRAY R, MANDAGERE N, SONG Y, et al. Cloud object storage based Continuous Data Protection (cCDP) [C]. Proceedings of the Networking, Architecture and Storage (NAS), 2015 IEEE International Conference on, 2015. IEEE.