LINUXUSER Command Line: Wget

FTP and mirroring with Wget

MIRROR IMAGE

Wget downloads files and even whole websites from the command line. BY HEIKE JURZIK

Any number of GUI-based download managers allow users to download files and whole websites. At the command line, you'll need a tool like Wget. Wget handles downloads quickly and without a lot of pointing and clicking. Wget "speaks" HTTP, HTTPS, and FTP; it can continue interrupted transfers, and it even has an update function that only updates files that have changed.

All-Rounder

The generic syntax for Wget is as follows:

    wget URL

Wget gives you command line output to let you know what it is doing (Figure 1): in our example, the tool is establishing a connection to a web server (standard port 80) and downloading the index.html file to a local directory, ignoring embedded images and not following links.

Figure 1: The simplest form of the Wget command ignores embedded images and does not follow links.

If you do not want to view the fairly verbose output at the console, you might like to specify the -q (for quiet) option. As this tells Wget to suppress the output of error messages and basic information, however, you might prefer a compromise, which you can achieve by entering wget -nv. This option tells the program to write less output to your console but still provide some information.

To tell Wget to follow local links on the server and mirror the data recursively, just add the -r parameter. It makes sense to specify the recursion depth while doing so. You need to go down one level to get both index.html and all embedded links (such as images or other HTML pages):

    wget -r -l 1 www.linux-magazine.com

If you set the level depth to -l 2, Wget will dig deeper by one level. In other words, if index.html contains a link to images.html, the download manager will now follow the links on this page.

A folder on the local hard disk is created for each URL, but you can change this behavior by adding another option. You can specify -nH ("no host") to leave out the host-name folder, so that the server's directory tree is recreated directly in the current directory.

Wget can modify the links in individual HTML files. For example, if you set the -k flag, Wget will handle references to images, stylesheets, HTML pages from the same server, and so on. Wget references links to files that it has already downloaded by means of a relative path, whereas files that have not been stored on the local disk will keep their full URL.

Don't Panic!

Even if a large download is interrupted, there is no need to panic, and no need to start over from scratch. Wget gives you the -c (for continue) option, which picks up from where the previous download left off. It does not matter whether you made the original download attempt using Wget or a graphical download manager; the tool compares the fragments with the original and just carries on from there. Wget gives you a lot of information while doing so; it might report:

    The file is already fully retrieved; nothing to do.

In cases where you repeatedly download the same data, it makes sense to specify the -N option, which tells Wget to compare the size and date of each file with the local copy:

    $ wget -N ftp://ftp.debian.de/debian-cd/3.1_r0a/i386/iso-cd/debian-31r0a-i386-binary-1.iso
    ...
    The sizes do not match (local 7935840) - retrieving.
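The recursion, link-conversion, and output options described above are often used together. The following is a minimal sketch, not a recommended standard invocation: it only prints the command it would run, so nothing is downloaded. The helper name mirror_cmd is hypothetical and not part of Wget; the flags themselves (-nv, -r, -l, -k, -nH) are the ones discussed in the text.

```shell
#!/bin/sh
# Sketch: assemble a Wget mirroring command and print it instead of running it.
# mirror_cmd is a hypothetical helper name chosen for this example.
mirror_cmd() {
    depth="$1"   # recursion depth handed to -l
    url="$2"     # start URL of the mirror
    # -nv: less output, -r: recurse, -l: depth limit,
    # -k: convert links for local viewing, -nH: no host-name folder
    echo "wget -nv -r -l $depth -k -nH $url"
}

# Dry run with the article's example site and a depth of one level
mirror_cmd 1 www.linux-magazine.com
```

To actually start the download, you could paste the printed line at the prompt; keeping the dry run separate makes it easy to check the depth before a large mirror begins.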

ISSUE 62 JANUARY 2006 WWW.LINUX-MAGAZINE.COM

If nothing has changed, the download manager will say something like:

    Server file not newer than local file "index.html" -- not retrieving.

But don't worry if you forget the option: Wget does not normally overwrite local files, but instead saves the new copy under a name with a serial number (index.html.1, index.html.2 etc.).

Specifying File Types

If you only want to download files of a specific type, and you try to pass an asterisk (*) to Wget as a wildcard, the tool will simply display the following error message:

    wget www.linux-magazine.com/*.jpg
    ...
    HTTP request sent, awaiting response... 404 Not Found
    14:24:09 ERROR 404: Not Found.

Instead, you need to state the file type, or a list of file types, with the -A option, for example:

    wget -r -l 1 -A jpg,png,gif ...

If you watch the output closely, you will notice that Wget first downloads the index.html but then removes the file again to leave only the images.

The whole thing also works in the opposite direction, allowing users to ignore specific file types using the -R parameter. Again, the parameter expects a list of file types that you do not want to transfer to your local disk. The following command

    wget -R avi,mpg,wmv ...

keeps the disk memory hogs at bay.

Frugal

You have several options for restricting Wget. For example, if you have a slow Internet connection, and prefer not to use up all of your bandwidth for the download, you can restrict the bandwidth by specifying the --limit-rate= option. You additionally need to specify the volume in Kbytes per second, as in:

    wget --limit-rate=20k ...

The parameter also understands values in Mbytes; for 10 MBps you would type:

    wget --limit-rate=10m ...

If you prefer to restrict the total download volume, specify the -Q parameter instead. Again, you need to state the volume of data; the option understands values in bytes, Kbytes or Mbytes. For example, the following command:

    wget -Q40m ...

restricts the download volume to 40 Mbytes.

Fully Automatic

If you have difficulty remembering the parameters for Wget, or if you think doing so is a waste of your time, you can use a hidden configuration file in your home directory to define your own preferences. To create a configuration file, copy the global /etc/wgetrc template to your home directory

    cp /etc/wgetrc ~/.wgetrc

and then launch an editor to modify the file. The file has entries for all the command line options, although they are disabled by default. To set the -N parameter as a default, simply remove the pound sign (#) at the start of the following line

    #timestamping = off

and replace off with on. Figure 2 shows a sample configuration file for Wget.

Figure 2: This could be your very own "~/.wgetrc"; comments are indicated with pound signs.

Secure Download

Wget can also mirror data from servers where you need to authenticate by providing a username and password. To use this option, pass your credentials to the program when you launch it:

    wget --http-user=username --http-passwd=password ...

If you do this, you should beware of other inquisitive users: if another user runs the ps command to display the active processes on the machine, the Wget command will be listed along with the clear text username and password. As a workaround, you could use a ~/.wgetrc configuration file in your home directory. Enter the following, for example:

    http_user = username
    http_passwd = password

and keep the file private by entering

    chmod 600 ~/.wgetrc

Perfect Combination

Wget does not expect user input but instead just gets on with the job in the background. This is a big advantage if you need to run Wget on a remote machine via an SSH session. To do this, first establish an SSH session, and then launch the Screen program by entering

    screen

at the prompt. After typing the Wget command and options to start downloading, you can press [Ctrl-A], [D] to detach from the Screen session. You can then log off, as any processes you have launched will continue to run. The next time you log on to the remote machine, simply type

    screen -r

to reestablish the Screen session. You can now check whether Wget performed as expected and relaunch the download if necessary. ■
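The ~/.wgetrc settings mentioned in the article (timestamping, credentials) can be collected in one private file, along with configuration-file counterparts of the bandwidth and volume limits. The sketch below writes such a file to a scratch location rather than to ~/.wgetrc, so that an existing configuration is not overwritten. The scratch file, the placeholder credentials, and the chosen limits are assumptions for illustration; limit_rate and quota are assumed here to be the wgetrc equivalents of --limit-rate and -Q.

```shell
#!/bin/sh
# Sketch: write a private Wget configuration file to a scratch location.
# File location, placeholder credentials, and limits are assumptions.
umask 077                 # files created from here on are owner-only
rcfile="$(mktemp)"        # scratch file; copy it to ~/.wgetrc when satisfied

cat > "$rcfile" <<'EOF'
# update only files that have changed (same effect as -N)
timestamping = on
# cap the bandwidth (counterpart of --limit-rate=20k)
limit_rate = 20k
# cap the total download volume (counterpart of -Q40m)
quota = 40m
# credentials kept out of the process list
http_user = username
http_passwd = password
EOF

chmod 600 "$rcfile"       # same protection the article applies to ~/.wgetrc
echo "$rcfile"            # print the path of the generated file
```

One way to try the file without touching ~/.wgetrc is to point the WGETRC environment variable at it when launching Wget. Some Wget versions may spell the password setting differently (for example http_password), so it is worth checking the documentation of the installed version.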
