WP-MIRROR 0.5 Reference Manual
Total Page:16
File Type:pdf, Size:1020Kb
WP-MIRROR 0.5 Reference Manual Dr. Kent L. Miller December 14, 2012 1 2 To Tylery i WP-MIRROR 0.5 Reference Manual Legal Notices Copyright (C) 2012 Dr. Kent L. Miller. All rights reserved. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License. THIS PUBLICATION AND THE INFORMATION HEREIN ARE FURNISHED AS IS, ARE FURNISHED FOR INFORMATIONAL USE ONLY, ARE SUBJECT TO CHANGE WITH- OUT NOTICE, AND SHOULD NOT BE CONSTRUED AS A COMMITMENT BY THE AU- THOR. THE AUTHOR ASSUMES NO RESPONSIBILITY OR LIABILITY FOR ANY ER- RORS OR INACCURACIES THAT MAY APPEAR IN THE INFORMATIONAL CONTENT CONTAINED IN THIS MANUAL, MAKES NO WARRANTY OF ANY KIND (EXPRESS, IMPLIED, OR STATUTORY) WITH RESPECT TO THIS PUBLICATION,AND EXPRESSLY DISCLAIMS ANY AND ALL WARRANTIES OF MERCHANTABILITY, FITNESS FOR PAR- TICULAR PURPOSES, AND NONINFRINGEMENT OF THIRD-PARTY RIGHTS. The WP-MIRROR logotype (see margin) includes a sunflower that is derived from 119px-Mediawiki logo sunflower Tournesol 5x rev2.png, which is in the public domain. See http://en.wikipedia.org/wiki/File:Mediawiki_logo_sunflower_Tournesol_5x.png. Debian is a registered United States trademark of Software in the Public Interest, Inc., man- aged by the Debian project. Linux is a trademark of Linus Torvalds. InnoDB and MySQL are trademarks of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Abstract The WikiMedia Foundation offers wikipedias in nearly 300 languages. WP-MIRROR is a free utility for mirroring any desired set of these wikipedias. That is, it builds a wikipedia farm that the user can browse locally. Many users need such off-line access, often for reasons of mobility, availability, and privacy. They currently use kiwix which provides selected articles and thumbnail images. WP-MIRROR builds a complete mirror with original size images. WP-MIRROR is robust and uses check-pointing to resume after interruption. By default, WP-MIRROR mirrors the simple wikipedia (Simple English means shorter sentences). The default should work ‘out-of-the-box’ with no user configuration. It should build in 200ks (two days), occupy 60G of disk space, be served locally by a virtual host http://simple.wpmirror.site/, and update automatically every week. The default should be suitable for anyone who learned English as a second language (ESL). The top ten wikipedias are: en de, fr, nl, it, es, pl, ru, ja, and pt. Because WP-MIRROR uses original size image files, these wikipedias are too large to fit on a laptop with a single 500G disk. When higher capacity hard disk drives reach the market, this list may be shortened. But for now, a desktop PC with ample disk space and main memory is required for any of the top ten, unless the user does not need the images (and this is configurable). ii The en wikipedia is the most demanding case. It should build in 5Ms (two months), occupy 3T of disk space, be served locally by a virtual host http://en.wpmirror.site/, and update automatically every month. This project’s current Web home page is http://www.nongnu.org/wp-mirror/. This document was generated on December 14, 2012. Features WP-MIRROR has been designed for robustness. WP-MIRROR asserts hardware and software prerequisites, skips over unparsable articles and bad file names, validates image files (many are corrupt), waits for Internet access when needed, uses check-pointing to resume after interruption, and offers concurrency to accelerate mirroring of the largest wikipedias. WP-MIRROR warns: if MySQL is insecure (e.g. has a root account with no password), and if a caching web proxy is detected. WP-MIRROR warns if disk space may be inadequate for the default (a mirror of the simple wikipedia), and gracefully exits if disk space runs too low. WP-MIRROR normally runs in background as a weekly cron job, updating the mirror when- ever the WikiMedia Foundation posts new dump files. Most features are configurable, either through command-line options, or via the configuration file /etc/wp-mirror/local.conf. Wikipedias can be added to the mirror, or dropped from it. If a user makes a hash of the configuration files and wants to start over, just execute root-shell# wp-mirror --restore-default which drops all WP-MIRROR related databases and files (except image files). This option is used regularly during software development and debugging. A user who wishes also to delete the image files, may execute: root-shell# cd /var/lib/mediawiki/images/ root-shell# rm --recursive --force * That said, a good network citizen would try to avoid burdening the Internet by downloading the same files, over and over. Troubleshooting If WP-MIRROR fails or hangs, please take a look at the log files /var/log/wp-mirror.log. Alternatively, try running root-shell# wp-mirror --debug root-shell# tail -n 1000 /var/log/wp-mirror.log The last few lines in the log file should provide a clue. Developers should select some of the smaller wikipedias (e.g. the cho, xh, and zu wikipedias) to speed up the test cycle. WP-MIRROR was written and is maintained by Dr. Kent L. Miller. Please see §1.8, Credits for a list of contributors. Please report bugs in WP-MIRROR to <[email protected]>. iii Contents WP-MIRROR 0.5 Reference Manual ii Contents iv List of Figures xii List of Tables xiii 1 General Information 1 1.1 About This Manual ................................... 1 1.2 Typographical and Syntax Conventions ........................ 2 1.3 Overview of WP-MIRROR ............................... 2 1.3.1 What is WP-MIRROR? ............................ 2 1.3.2 History of WP-MIRROR ............................ 3 1.3.3 The Main Features of WP-MIRROR ..................... 4 1.4 WP-MIRROR Development History .......................... 5 1.5 What Is New in WP-MIRROR ............................. 6 1.5.1 What Is New in WP-MIRROR 0.5 ....................... 6 1.5.2 What Is New in WP-MIRROR 0.4 ....................... 7 1.5.3 What Is New in WP-MIRROR 0.3 ....................... 8 1.5.4 What Is New in WP-MIRROR 0.2 ....................... 8 1.5.5 What Is New in WP-MIRROR 0.1 ....................... 9 1.6 WP-MIRROR Information Sources .......................... 10 1.6.1 Project Home Page for WP-MIRROR ..................... 10 1.6.2 Downloading WP-MIRROR .......................... 10 1.6.3 Documentation for WP-MIRROR ....................... 10 1.6.4 Mailing Lists for WP-MIRROR ........................ 10 1.7 How to Report Bugs or Problems ........................... 11 1.8 Credits .......................................... 11 1.8.1 Contributors to WP-MIRROR ......................... 11 1.8.2 Documenters and Translators ......................... 11 2 Installing and Upgrading WP-MIRROR 12 2.1 General Installation Guidance ............................. 12 2.1.1 Operating Systems Supported by WP-MIRROR ............... 12 2.1.2 GNU/Linux Distributions Supported by WP-MIRROR ........... 12 2.1.3 How to Get WP-MIRROR ........................... 12 2.1.4 Choosing Which WP-MIRROR Distribution to Install ........... 13 2.1.4.1 Choosing Which Version of WP-MIRROR to Install ....... 13 2.1.4.2 Choosing a Distribution Format ................... 13 2.1.4.3 Choosing When to Upgrade ..................... 13 2.1.5 Verifying Package Integrity Using Checksums or GnuPG ........... 13 2.1.5.1 How to Get the Author’s Public Build Key ............ 13 2.1.5.1.1 Download the Public Key from Public Keyserver . 14 iv Contents 2.1.5.1.2 Cut and Paste the Public Key from Public Keyserver . 14 2.1.5.2 Checksums for a DEB Package .................... 15 2.1.5.3 Checksums for a RPM Package .................... 16 2.1.5.4 Signature Checking for a Tarball .................. 16 2.2 Installing WP-MIRROR on Linux ........................... 16 2.2.1 Installing from a DEB Package ......................... 17 2.2.2 Installing from a RPM Package ......................... 17 2.3 Installing WP-MIRROR from Source ......................... 17 2.3.1 Installing Dependencies ............................. 18 2.3.1.1 Installing Build Dependencies .................... 18 2.3.1.2 Verifying clisp Version ....................... 18 2.3.1.3 Configuring common-lisp-controller ............... 18 2.3.1.4 Configuring cl-asdf ......................... 19 2.3.1.5 Installing Binary Dependencies ................... 19 2.3.2 Installing WP-MIRROR from a Standard Source Distribution ....... 20 2.3.3 WP-MIRROR Build Options .......................... 20 2.4 Postinstallation Configuration and Testing ...................... 21 2.4.1 Default Configuration .............................. 21 2.4.2 Working with X ................................. 21 2.4.3 Configuration of a Wikipedia Farm ...................... 22 3 System Planning and Configuration 24 3.1 General Information ................................... 24 3.2 Laptop Planning and Configuration .......................... 25 3.2.1 Plan Your Disk Space .............................. 25 3.2.1.1 Disable HDD Write Caching ..................... 25 3.2.1.2 Put images Directory on Partition with Adequate Free Space . 26 3.2.1.3 Setup Thermal Management ..................... 27 3.2.1.4 Setup smart Disk Monitoring .................... 27 3.2.2 Plan Your Database Management System ................... 28 3.2.2.1 Secure the Database Management System ............. 28 3.2.2.2 Load the DBMS time zone Tables ................. 29 3.2.2.3 Configure the InnoDB Storage Engine ................ 30 3.2.3 Plan Your DRAM ................................ 31 3.2.4 Plan Your Internet Access ........................... 31 3.2.4.1 Configure cURL ............................ 31 3.2.5 Plan Your MediaWiki .............................. 32 3.2.5.1 Configure MediaWiki ......................... 32 3.2.5.2 Enable MediaWiki Extensions .................... 33 3.2.5.3 MediaWiki Redux ........................... 34 3.2.6 Plan Your Image Processing .........................