WP-MIRROR 0.6 Reference Manual
Total Page:16
File Type:pdf, Size:1020Kb
WP-MIRROR 0.6 Reference Manual Dr. Kent L. Miller December 16, 2013 DRAFT 1 DRAFT 2 To Tylery DRAFT i WP-MIRROR 0.6 Reference Manual Legal Notices Copyright (C) 2012,2013 Dr. Kent L. Miller. All rights reserved. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License. THIS PUBLICATION AND THE INFORMATION HEREIN ARE FURNISHED AS IS, ARE FURNISHED FOR INFORMATIONAL USE ONLY, ARE SUBJECT TO CHANGE WITH- OUT NOTICE, AND SHOULD NOT BE CONSTRUED AS A COMMITMENT BY THE AU- THOR. THE AUTHOR ASSUMES NO RESPONSIBILITY OR LIABILITY FOR ANY ER- RORS OR INACCURACIES THAT MAY APPEAR IN THE INFORMATIONAL CONTENT CONTAINED IN THIS MANUAL, MAKES NO WARRANTY OF ANY KIND (EXPRESS, IMPLIED, OR STATUTORY) WITH RESPECT TO THIS PUBLICATION,AND EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES OF MERCHANTABILITY, FITNESS FOR PAR- TICULAR PURPOSES, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS. The WP-MIRROR logotype (see margin) includes a sunflower that is derived from 119px-Mediawiki logo sunflower Tournesol 5x rev2.png, which is in the public domain. See http://en.wikipedia.org/wiki/File:Mediawiki_logo_sunflower_Tournesol_5x.png. Debian is a registered United States trademark of Software in the Public Interest, Inc., man- aged by the Debian project. Linux is a trademark of Linus Torvalds. InnoDB and MySQL are trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Abstract The Wikimedia Foundation offers wikipedias in nearly 300 languages. In addition, the WMF has several other projects (e.g. wikibooks, wiktionary, etc.) for a total of around 1000 wikis. WP-MIRROR is a free utility for mirroring any desired set of these wikis. That is, it builds a wiki farm that the user can browse locally. Many users need such off-line access, often for reasons of mobility, availability, and privacy. They currently use kiwix which provides selected articles and thumbnail images. WP-MIRROR builds a complete mirror with original size images. WP-MIRROR is robust and uses check-pointing to resume after interruption. By default, WP-MIRROR mirrors the simple wikipedia and the simple wiktionary (Simple English means shorter sentences). The default should work ‘out-of-the-box’ with no user configu- ration. It should build in 200ks (two days), occupy 90G of disk space, be served locally by virtual hosts http://simple.wikipedia.site/ and http://simple.wiktionary.site/, and update automatically every week. The default should be suitable for anyone who learned English as a second languageDRAFT (ESL). The top ten wikis are the en wikipedia, followed by the de, nl, fr, it, ru, es, sv, pl, and ja wikipedias. Because WP-MIRROR uses original size image files, these wikis are too large to fit on a laptop with a single 500G disk. When higher capacity hard disk drives reach the market, this list may be shortened. But for now, a desktop PC with ample disk space and main memory is required for any of the top ten, unless the user does not need the images (and this is configurable). ii The en wikipedia is the most demanding case. It should build in 1Ms (twelve days), occupy 3T of disk space, be served locally by a virtual host http://en.wikipedia.site/, and update automatically every month. This project’s current Web home page is http://www.nongnu.org/wp-mirror/. This document was generated on December 16, 2013. Features WP-MIRROR has been designed for robustness. WP-MIRROR asserts hardware and software prerequisites, skips over unparsable articles and bad file names, waits for Internet access when needed, retries interrupted downloads, uses check-pointing to resume after interruption, and offers concurrency to accelerate mirroring of the largest wikis. WP-MIRROR issues a warning if MySQL is insecure (e.g. has a root account with no password), or if a caching web proxy is detected. WP-MIRROR warns if disk space may be inadequate for the default (a mirror of the simple wikipedia and simple wiktionary), and gracefully exits if disk space runs too low. WP-MIRROR normally runs in background as a weekly cron job, updating the mirror when- ever the Wikimedia Foundation posts new dump files. Most features are configurable, either through command-line options, or via the configuration file /etc/wp-mirror/local.conf. Wikis can be added to the mirror, or dropped from it. If an user makes an hash of things, and wants to start over, just execute root-shell# wp-mirror --restore-default which drops all WP-MIRROR related databases and files (except image files). This option is used regularly during software development and debugging. An user who wishes also to delete the image files, may execute: root-shell# cd /var/lib/mediawiki/images/ root-shell# rm --recursive --force * That said, a good network citizen would try to avoid burdening the Internet by downloading the same files, over and over. Troubleshooting If WP-MIRROR fails or hangs, please take a look at the log files /var/log/wp-mirror.log. Alternatively, try running root-shell# wp-mirror --debug root-shell# tail -n 1000 /var/log/wp-mirror.log The last few lines in the log file should provide a clue. Developers should select some of the smaller wikis (e.g. the cho, xh, and zu wikipedias) to speed up the test cycle. WP-MIRROR was written and is maintained by Dr. Kent L. Miller. Please see §1.8, Credits for a list of contributors. PleaseDRAFT report bugs in WP-MIRROR to <[email protected]>. iii Contents WP-MIRROR 0.6 Reference Manual ii Contents iv List of Figures xvi List of Tables xvii 1 General Information 1 1.1 About This Manual ................................... 1 1.2 Typographical and Syntax Conventions ........................ 2 1.3 Overview of WP-MIRROR ............................... 2 1.3.1 What is WP-MIRROR? ............................ 2 1.3.2 History of WP-MIRROR ............................ 3 1.3.3 The Main Features of WP-MIRROR ..................... 4 1.4 WP-MIRROR Development History .......................... 5 1.5 What Is New in WP-MIRROR ............................. 6 1.5.1 What Is New in WP-MIRROR 0.6 ....................... 6 1.5.2 What Is New in WP-MIRROR 0.5 ....................... 7 1.5.3 What Is New in WP-MIRROR 0.4 ....................... 8 1.5.4 What Is New in WP-MIRROR 0.3 ....................... 9 1.5.5 What Is New in WP-MIRROR 0.2 ....................... 10 1.5.6 What Is New in WP-MIRROR 0.1 ....................... 10 1.6 WP-MIRROR Information Sources .......................... 11 1.6.1 Project Home Page for WP-MIRROR ..................... 11 1.6.2 Downloading WP-MIRROR .......................... 11 1.6.3 Documentation for WP-MIRROR ....................... 11 1.6.4 Mailing Lists for WP-MIRROR ........................ 11 1.7 How to Report Bugs or Problems ........................... 12 1.8 Credits .......................................... 12 1.8.1 Contributors to WP-MIRROR ......................... 12 1.8.2 Documenters and Translators ......................... 12 2 Installing and Upgrading WP-MIRROR 13 2.1 General Installation Guidance ............................. 13 2.1.1 Operating Systems Supported by WP-MIRROR ............... 13 2.1.2 GNU/Linux Distributions Supported by WP-MIRROR ........... 13 2.1.3 How to Get WP-MIRROR ........................... 13 DRAFT2.1.4 Choosing Which WP-MIRROR Distribution to Install ........... 14 2.1.4.1 Choosing Which Version of WP-MIRROR to Install ....... 14 2.1.4.2 Choosing a Distribution Format ................... 14 2.1.4.3 Choosing When to Upgrade ..................... 14 2.1.5 Verifying Package Integrity Using Checksums or GnuPG ........... 14 2.1.5.1 How to Get the Author’s Public Build Key ............ 14 iv Contents 2.1.5.1.1 Download the Public Key from Public Keyserver . 15 2.1.5.1.2 Cut and Paste the Public Key from Public Keyserver . 15 2.1.5.2 Checksums for a DEB Package .................... 16 2.1.5.3 Checksums for a RPM Package .................... 17 2.1.5.4 Signature Checking for a Tarball .................. 17 2.2 Installing WP-MIRROR on Linux ........................... 17 2.2.1 Installing from a DEB Package ......................... 18 2.2.2 Installing from a RPM Package ......................... 18 2.3 Installing WP-MIRROR from Source ......................... 19 2.3.1 Installing Dependencies ............................. 19 2.3.1.1 Installing Build Dependencies .................... 19 2.3.1.2 Verifying clisp Version ....................... 19 2.3.1.3 Configuring common-lisp-controller ............... 19 2.3.1.4 Configuring cl-asdf ......................... 20 2.3.1.5 Installing Binary Dependencies ................... 20 2.3.2 Installing WP-MIRROR from a Standard Source Distribution ....... 21 2.3.3 WP-MIRROR Build Options .......................... 21 2.4 Postinstallation Configuration and Testing ...................... 22 2.4.1 Default Configuration .............................. 22 2.4.2 Working with X ................................. 22 2.4.3 Configuration of a Wiki Farm ......................... 23 3 System Planning and Configuration 25 3.1 General Information ................................... 25 3.1.1 Storage at the Wikimedia Foundation ..................... 25 3.1.1.1 XML dump files ............................. 25 3.1.1.2 SQL dump files ............................. 26 3.1.1.3 Images ................................. 26 3.1.2 Storage at Other Dump Sites .......................... 26 3.1.3 Storage on Mirror ................................ 26 3.1.4 WP-MIRROR .................................. 27 3.2 Laptop Planning and Configuration .......................... 27 3.2.1 Plan Your Disk Space .............................. 27 3.2.1.1 Configure HDD Write Caching ................... 27 3.2.1.1.1 Purpose: ........................... 27 3.2.1.1.2 Motivation: ......................... 27 3.2.1.1.3 Implementation: ...................... 27 3.2.1.2 Put images Directory on Partition with Adequate Free Space .