Performance of a Parallel Network Backup Manager
James da Silva, Olafur Guðmundsson, Daniel Mossé - Department of Computer Science, University of Maryland

Summer '92 USENIX - June 8-June 12, 1992 - San Antonio, TX

ABSTRACT

The advent of inexpensive multi-gigabyte tape drives has made possible the completely automated backup of many dozens of networked workstations to a single tape. One problem that arises with this scheme is that many computers cannot backup their disks over the network at more than a fraction of the tape's rated speed. Thus, running overnight backups sequentially can take well into the next day.

We have developed a parallel backup manager named Amanda that solves this problem by running a number of backups in parallel to a holding disk, then using a multi-buffer copy scheme to transfer the backups to the tape at the full rated tape speed. Amanda uses accurate estimates of current backup sizes as well as historical information about backup rates so as to schedule backups in parallel without swamping the network or overrunning the holding disk or tape.

Locally, we use Amanda to backup 11.5 gigabytes of data in over 230 filesystems on more than 100 workstations, using a single 2 gigabyte 8mm tape drive, taking two to three hours each night. This paper discusses the architecture and performance of Amanda.

Supported in part by contract DASG60-87-C-0066 from the U.S. Army Strategic Defense Command to the Department of Computer Science and by the Rome Labs and the Defense Advanced Research Projects Agency under contract F30602-90-C-0010 to the UMIACS, both at the University of Maryland. The views, opinions, and/or findings contained in this report are those of the author(s) and should not be interpreted as representing the official policies, either expressed or implied, of the Department of Defense, DARPA, RL, or the U.S. Government, unless so designated by other official documentation. Computer facilities were provided in part by NSF grant CCR-8811954.

Background

Until a few years ago, the backup medium of choice for most large UNIX sites was the 1/2" 9 track reel-to-reel tape, while cartridge tapes were (and still are) popular with smaller systems. Storage capacities for 9-track and cartridge tapes vary from about 40 to 200 Megabytes. These tape systems are often of smaller capacity than the disk subsystems they are backing up, requiring an operator to feed multiple tapes into the drive for a full backup of the disks.

This problem has had a big influence on site system administration. Sites with only a few large timesharing systems or file servers can arrange backups by operators at scheduled times, but the coordination of backups of a large number of workstations on a network is more difficult. Requiring users to do their own backups to cartridge tapes doesn't work very well; even computer-literate users just don't do backups on a regular basis.

A common solution that most sites have adopted is a dataless workstation model, in which all user data is stored on file servers with small local disks to hold temporary files and frequently used binaries, or even a diskless workstation model, where the workstations have no disks at all[1]. These network organizations require fast file servers with large disks, and generate heavy network traffic.

Our department, on the other hand, has always used datafull workstations, where all user data, temporary files and some binaries, are stored on the workstations. File servers only provide shared binaries. This allows the use of smaller file servers, with smaller disks. A big advantage of this model is political; users tend to want their own disks with their own data on their own desks. They don't want to deal with a central authority for space or CPU cycles, or be at the whim of some file server in the basement.

Since most file writes are local, performance can be better as we avoid expensive synchronous NFS file writes and network traffic is lower. With the datafull model we are able to have each fileserver support over 40 machines if needed, while in dataless and diskless environments only specialized fileservers can support more than 20 workstations. The big disadvantage is the difficulty of managing and backing up all the datafull workstations.

The arrival of inexpensive gigabyte Digital Audio Tape (DAT) and 8mm tape technology has changed the situation drastically. Affordable disks are now smaller than affordable tape drives, allowing the backup of many disks onto a single gigabyte tape. It is now possible to back up all the workstation disks at a site over the network onto a single 8mm tape.

Now that the space problem is solved, the new problem is time. Backing up workstations one at a time over the network to tape is simply too slow. Many workstations cannot produce dump data as quickly as tapes can write[2]. For example, typical dump rates (both full and incremental) on our network range between about 5% to 70% of the rated 246 KB per second of our Exabyte EXB-8200 8mm tape drives[3]. We found that we could not add workstations to our network backups because the nightly backup would not finish until well after the start of the next work day.

Amanda, the "Advanced Maryland Automated Network Disk Archiver," was developed to solve these problems. To make the project manageable, we built Amanda on top of the standard BSD UNIX DUMP program. Amanda uses a holding disk to run multiple backups in parallel, and copies the dump images from the holding disk to tape, usually as fast as the tape can stream.

This paper concentrates on the performance issues involved with backing up a network of datafull workstations, including the performance characteristics of DUMP, tape drives, and of the sequential and parallel versions of our backup manager. We will also discuss the architecture and technical details of Amanda.

Performance of BSD DUMP

Berkeley UNIX systems and their derivatives come with a backup program called DUMP, and its corresponding restoration program, aptly named RESTORE. DUMP can do incremental and full backups on a per-filesystem basis. Incremental backups are done in terms of numbered dump levels, where each level backs up all the files changed since the last backup at a lower dump level. Full backups are known as level 0 dumps. While backup policies vary from site to site, generally a level 0 dump is done on each filesystem once every week or month, and incrementals are done every day. We run incremental backups every weekday evening, and each filesystem gets a full dump every two weeks.

RESTORE can restore entire filesystems or individual files and directories, and includes an interactive mode where operators can browse the dump directories to pick what should be restored.

DUMP runs several child processes that read particular files from the disk in parallel and take turns writing to the output tape. In this way DUMP is able to keep the tape writing and disk reading simultaneously. Once DUMP has decided which files to back up, it produces data at a very steady rate for the duration of the backup. We have measured the dump rates over the network for various computer architectures; some typical values for full dumps are shown in Table 1. These numbers will depend on disk speeds and compression rates as well as architecture, so your mileage will vary.

To get more data onto backup tapes, it is often advantageous to compress the backup data. Table 1 also shows the resulting dump rates when the dump output is run through the UNIX COMPRESS program before being sent over the network. The effective dump rate column is the original dump size divided by the time it took to dump with compression.

Architecture        Dump Rate     Compressed Rate
                                  Actual        Effective
SPARCstation 2      322           90            150
SPARCstation 1+     172           45            101
DECstation 3100     142           38            101
NeXT Cube           164           18            36
VAXstation 3200     128           15            40
Sun 3/50            124           12            29
Sun 2/120           [illegible]   4             7

Table 1: Full Dump Rates (KB/sec)

These times are for level 0 dumps on relatively large disks. Incremental dumps will have much lower rates, because there is a fixed overhead for DUMP to scan the filesystem before dumping. Incrementals have less data to amortize that overhead, as shown in Table 2.

Architecture        Dump Rate     Compressed Rate
                                  Actual        Effective
SPARCstation 2      298           60            156
SPARCstation 1+     22            10            19
DECstation 3100     47            11            40
NeXT Cube           39            11            20
VAXstation 3200     [illegible]   [illegible]   [illegible]
Sun 3/50            8             [illegible]   [illegible]
Sun 2/120           [illegible]   [illegible]   [illegible]

Table 2: Incremental Dump Rates (KB/sec)

With each filesystem getting a full backup only once every two weeks, each night about nine out of ten filesystems are getting an incremental dump, making the incremental backup rates more representative of the overall backup rates.

Before DUMP outputs any data, it makes an estimate of how large the output will be. This estimate can be used to make calculations about dump sizes in advance. We measured the accuracy of this estimate as a predictor of dump output sizes for dumps done later the same night. The results are good: the dump estimates are very accurate, usually well within 1%.

Like most backup systems, DUMP does have some problems correctly backing up filesystems that are being modified while the backup is occurring[4]. Some sites instead use file-oriented programs, like TAR or CPIO, but these programs have their own set of problems[5].

Achieving Rated Tape Speed

We measured the transfer rates to tape by repeatedly writing a memory buffer on a system with no load, that is, writing in a tight loop as fast as the operating system (OS) and tape would allow. Achieving that same rate when transferring dump image files from the disk to the tape requires some sort of multi-buffer

Performance of Gigabyte Tape