From Paper Book to a Digital One on Wikisource
[[User:Xelgen]]
Aleksey Chalabyan
Armenian Wikipedia (hy.wikipedia.org)
Armenian Wikisource (hy.wikisource.org)

Wikisource
● Launched in 2003
● 69 languages
● Over 4 million pieces

Image requirements
● 300 DPI or more
● As few geometrical/optical distortions as possible
● Evenly lit
● Color or grayscale

Flatbed
Image by Fir0002, CC-BY-SA, from Wikimedia Commons

ADF (Auto Document Feeder)

Document feeder scanner
Image by [email protected], CC-BY-SA, from Wikimedia Commons

Camera (or phone camera)
Image by Plasmarelais, CC-BY-SA, from Wikimedia Commons

Hand scanner
Images by GBPublic_PR and Zoliverz, CC-BY-SA, from Wikimedia Commons

DIY Book Scanner (http://diybookscanner.org)
Image by daniel reetz, from http://diybookscanner.org

Planetary document scanner
Image by JamesMoorey, CC-BY-SA, from Wikimedia Commons

Prism book scanner (http://prismscanner.org)

Professional book scanners
Image by Marie-Lan Nguyen and Ra Boe, CC-BY-SA, from Wikimedia Commons

| Scanner type               | Quality | Time and effort per page | Damage to book        | Price     | Availability                           |
|----------------------------|---------|--------------------------|-----------------------|-----------|----------------------------------------|
| Flatbed                    | High    | A lot                    | Somewhat              | 50-100$   | Easy to find                           |
| ADF on flatbed/MFD         | High    | Very low                 | Close to irreversible | 250-400$  | Not hard to find                       |
| Document scanner (feeder)  | High    | Extremely low            | Close to irreversible | 300-450$  | Need to order one                      |
| Camera/Smartphone          | Low     | Significant              | None                  | 150$+     | You probably have one                  |
| Hand scanner               | Low     | Too much                 | Almost none           | 50-80$    | Need to order one                      |
| DIY Book scanner           | High    | Very low                 | Almost none           | 300-500$  | Not hard to build it yourself          |
| Planetary document scanner | Medium  |                          | None                  | 800$+     | Order                                  |
| Linear book scanner        | High    | Very low                 | Somewhat              | ~1500$    | Hard to build one, store and maintain  |
| Pro book scanner           | High    | Very low                 | Usually none          | 10 000$+  | Very hard to get                       |

Taking the book apart
Image by Xelgen, CC-BY-SA

Scan Tailor (http://scantailor.org)
● Fix rotation
● Split pages
● Deskew
● Autoselect content
● Set up margins

OCR (Optical Character Recognition)
● ABBYY FineReader
● CuneiForm
● Tesseract
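As an illustration of the OCR step with Tesseract, the free engine in the list above, here is a minimal Python sketch that runs the `tesseract` command-line tool over a folder of page images. It is a sketch under assumptions, not the workflow used in the talk: the folder names `out` and `ocr`, the `.tif` extension, and the helper names `tesseract_command` and `ocr_pages` are hypothetical, and it assumes Tesseract 4 or later is installed with the Armenian language data (`hye`).

```python
import subprocess
from pathlib import Path


def tesseract_command(image: Path, out_dir: Path,
                      lang: str = "hye", dpi: int = 300) -> list[str]:
    """Build the Tesseract CLI call for one page image.

    Tesseract adds the ".txt" extension to the output base itself.
    """
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base),
            "-l", lang, "--dpi", str(dpi)]


def ocr_pages(scan_dir: Path, out_dir: Path, lang: str = "hye") -> None:
    """Run Tesseract on every .tif page in scan_dir, in file-name order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for image in sorted(scan_dir.glob("*.tif")):
        # check=True stops at the first page Tesseract fails on
        subprocess.run(tesseract_command(image, out_dir, lang), check=True)


if __name__ == "__main__":
    # Hypothetical folders: "out" holds the cleaned page images,
    # "ocr" receives one text file per page.
    ocr_pages(Path("out"), Path("ocr"))
```

Passing `--dpi 300` matches the image requirement above and helps when the image files carry no resolution metadata. Swap `hye` for the language code of your book.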
Watch out before OCR

Wikipedia vs. Wikisource

Wikisource Index page

1. Find a book which is free (or make it free)
2. Prepare your book for scanning
3. Scan it*
4. Rename files if needed*
5. Crop and straighten images with Scan Tailor*
6. Additional corrections with any image batch editor (e.g. ImageMagick or XnView)*
7. OCR*
8. Analyze and fix common mistakes in OCR software*
9. Export it as DjVu*
10. Upload to Commons or Wikisource
11. Create the Index page on Wikisource
12. Start proofreading and encourage others to join in*

* Double check your results

Thank you! Questions?
[[User:Xelgen]]
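Step 4 of the checklist (renaming files) is easy to script. The sketch below is one possible approach, not something shown in the talk: it renames scans to zero-padded sequential names so that every later tool sees the pages in the right order. The function names `natural_key` and `rename_scans`, the `page` prefix, and the `.jpg` extension are assumptions for illustration. It also assumes the original file names already sort into reading order once numbers are compared numerically.

```python
import re
from pathlib import Path


def natural_key(name: str) -> list:
    """Sort key that compares digit runs as numbers: scan2 before scan10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def rename_scans(scan_dir: Path, prefix: str = "page",
                 ext: str = ".jpg") -> list[Path]:
    """Rename scans to prefix_0001.ext, prefix_0002.ext, ... in natural order."""
    files = sorted(
        (p for p in scan_dir.iterdir() if p.suffix.lower() == ext),
        key=lambda p: natural_key(p.name),
    )
    # First pass: move everything to temporary names, so a new name
    # can never overwrite a file that has not been renamed yet.
    temp_files = []
    for number, path in enumerate(files, start=1):
        temp = path.with_name(f".tmp_{number:04d}{ext}")
        path.rename(temp)
        temp_files.append(temp)
    # Second pass: assign the final zero-padded names.
    renamed = []
    for number, temp in enumerate(temp_files, start=1):
        target = temp.with_name(f"{prefix}_{number:04d}{ext}")
        temp.rename(target)
        renamed.append(target)
    return renamed
```

Calling `rename_scans(Path("scans"))` on a hypothetical `scans` folder returns the new paths in page order. As the checklist says, double check the result before moving on, because a page scanned out of sequence will keep its wrong position.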