Breaking Through the Backup Barrier
SYSADMIN: BackupPC
Issue 93, August 2008
Smooth Operator

BackupPC handles backups over the network for a range of platforms. Find out more about this user-friendly, configurable, high-performance open source backup system.
By David Nalley

Network backup platforms are often unwieldy, partly because of the complexities of scheduling logic and media management. User friendliness can be hard to find in an enterprise-ready backup system. The BackupPC project [1] fills the backup niche elegantly, handling backups over the network for a plethora of platforms and transports.

BackupPC follows the Unix tradition of small programs that perform a single task very well. Like other classic Unix utilities, BackupPC leverages the power of other applications instead of trying to reinvent the wheel. BackupPC supports several protocols for both Windows and Unix-like clients, from rsync and SMB/CIFS to tar and rsync tunneling over SSH. The focus is on efficient scheduling and a user-friendly restore process.

BackupPC has an active user community with mailing lists and a user-generated wiki, and the project is still led by the original primary author, Craig Barratt. Although the tool has been around since 2001 and is relatively mature, the latest version, BackupPC 3.1.0, seems to be reaching new users.

Benefits
One of the defining features of BackupPC is data de-duplication. In a traditional backup system, keeping multiple backups of files that haven't changed in more than one full backup interval means storing the same information more than once. The problem is only compounded when you back up multiple computers, particularly if they are end-user machines that might be on the same circulation list for memos, spreadsheets, and other common documents.

BackupPC addresses this problem with a two-tier check. The first check locates files with the same names and hashes the files to see whether they are identical. If it determines that the files are the same, it moves a single copy of the file to the "pool" and creates hard links to each instance of the file in the backup set. The results are surprising. In the first test I ran on eight machines, I performed uncompressed backups and retained two full backups and six incrementals. My total data store was ~1TB, but BackupPC's data de-duplication brought the actual size on disk to ~675GB.

BackupPC also offers several nice scheduling features, such as the ability to prioritize backups. By default, BackupPC wakes hourly and identifies any computer that hasn't completed a backup within the specified interval. It also checks which machines are on the network. After combining these two lists, BackupPC prioritizes the list of available hosts on the basis of time since the last backup. Other factors can also influence this priority list. For instance, a machine that is on the network 24 hours a day is generally preempted by a machine with a more sporadic network presence record.

My favorite feature of BackupPC is that end users can initiate and perform their own restores without the involvement of the backup operator or system administrator. If you have been involved in backups on any scale, you know that handling restores of a lost or mangled file is time consuming. If the user needs to find a specific version of the file, the restore process can grow into a multi-hour effort. BackupPC offers a friendly web interface that provides a directory and file tree for each backup. Users can select a single file or multiple files in the tree, and BackupPC will restore these files without the need for a system administrator. BackupPC even checks whether the user has the necessary access permissions to view the file before beginning the restore.

Users also have some control over when to start a backup (full or incremental). They can also remove their machines from the backup list for a number of hours.

Installation
Installation of BackupPC is relatively painless because it's included in most mainstream distribution package repositories. However, the packaged version sometimes doesn't include the latest available code or has some special installation requirements, so I'll cover installation from source. If you used your distribution's package for BackupPC, skip ahead to the Configuration section.

Before working on the installation, you must consider disk space and how it is set up. BackupPC handles de-duplication by creating hard links from the file location in the backup directory structure to the pool where duplicated files are actually stored. A hard link can only refer to a file on the same filesystem, so the backup store must be on a single filesystem. This doesn't mean that you can't use LVM, software RAID, or hardware RAID to combine multiple disks, but you can only use a single filesystem to hold the store. BackupPC tests at each startup that it can create these hard links. You'll need to know the mount point of this filesystem during installation.

The next two steps are really one-liners in the console: creating a user for BackupPC to run as and installing the software prerequisites. Of course, this assumes that you already have httpd installed and configured for your server:

# adduser backuppc
# yum install perl-Compress-Zlib perl-Archive-Zip perl-XML-RSS perl-File-RsyncP

After the prerequisites are out of the way, you can grab the source [1] and uncompress it:

$ tar -zxvf BackupPC-3.1.0.tar.gz
$ cd BackupPC-3.1.0
$ su -c "perl ./configure.pl"

This launches the installer, which performs the basic configuration and installation of BackupPC. The default answers are fine, with a few exceptions. The data directory should be the mount point of the filesystem for the backup pool (e.g., /data/BackupPC or a subdirectory therein). You might also need to enter the correct path for the CGI bin directory (e.g., /var/www/cgi-bin/).

So that BackupPC will start automatically, you must add an init script to your system. The init.d subdirectory of the unpacked source contains init scripts for a variety of distributions. From that directory, copy the appropriate script to /etc/init.d, set it to start on boot, and then start the daemon:

$ su -c "cp linux-backuppc /etc/init.d/backuppc"
$ su -c "chkconfig --add backuppc"
$ su -c "chkconfig --level 345 backuppc on"
$ su -c "chkconfig --list backuppc"
$ su -c "service backuppc start"

Configuration
Although the installation process handles the basic configuration elements, other options are available via the web interface or command line.

BackupPC configuration is contained in two files under /etc/BackupPC. The hosts file identifies the hosts to be backed up, and config.pl controls the server configuration.

The hosts file lists the hostnames to be backed up and the authorized users for each machine:

host        dhcp    user    moreUsers    # <--- do not edit this line
nalleyt61   0       david                # <--- example static IP host entry
host2       1       bill    jeff,fred    # <--- example DHCP host entry

Figure 1: In the Configuration Editor, you can adjust the server configuration settings.

When I cover authenticating to the web interface, I'll explain authorized users further. For now, the vital points are the hostname and the DHCP setting. Even if your machine gets its address via DHCP, you still want to use 0 for the DHCP setting, which tells BackupPC to use DNS to find the host. Setting this value to 1 tells BackupPC to use only nmblookup to query for the host address via NetBIOS.

The default config.pl is configured to wake up every hour to look for hosts to back up, do a full backup approximately every 7 days, and do an incremental backup every day (Figure 1).

… will only look at the minimum options that must be configured to start backups on either Windows or Linux. Also remember that you can make modifications on a per-machine basis.

One thing to set up is the admin user and how backups will be transferred in your environment (see Listing 1). However, I don't advocate the use of root as the backup user. Instead, I suggest that you use a low-privileged account and set up sudo so that rsync is accessible. As the backuppc user, you'll also need to log in to the client machine via SSH so that it becomes a known host.

backuppc ALL=NOPASSWD: /usr/bin/rsync

then modify the command arguments so that it uses sudo to call rsync:

$Conf{RsyncClientCmd} = '$sshPath -l backup $host nice -n 19 sudo /path/to/rsyncSend $argList+';

Figure 2: Check the status of currently running backups and failures.

Although you shouldn't limit yourself to just these configuration options, setting these items at a minimum will take care of backing up Windows machines or Linux machines with Samba shares exposed. You can configure a number of other things, such as file/directory exclusions and compression levels, but the last required item is configuring the web interface. The installer has already set up the web interface, but you need to set up authentication for it. You also need a way to authenticate both the users in the hosts file and the admin users. Because you are using Apache to provide authentication, you have a variety of