From Paper Book to a Digital One on Wikisource

[[User:Xelgen]]
Aleksey Chalabyan
Armenian Wikipedia (hy.wikipedia.org)
Armenian Wikisource (hy.wikisource.org)

Wikisource
● Launched in 2003
● 69 languages
● Over 4 million pieces

Image requirements
● 300 DPI or more
● As few geometrical/optical distortions as possible
● Evenly lit
● Color or grayscale

Flatbed
Image by Fir0002, CC-BY-SA, from Wikimedia Commons

ADF (Auto Document Feeder)

Document feeder scanner
Image by [email protected], CC-BY-SA, from Wikimedia Commons

Camera (or phone camera)
Image by Plasmarelais, CC-BY-SA, from Wikimedia Commons

Hand scanner
Images by GBPublic_PR and Zoliverz, CC-BY-SA, from Wikimedia Commons

DIY Book Scanner (http://diybookscanner.org)
Image by daniel reetz, from http://diybookscanner.org

Planetary document scanner
Image by JamesMoorey, CC-BY-SA, from Wikimedia Commons

Prism book scanner (http://prismscanner.org)

Professional book scanners
Image by Marie-Lan Nguyen and Ra Boe, CC-BY-SA, from Wikimedia Commons

Method                     | Quality | Time and effort per page | Damage to book        | Price    | Availability
Flatbed                    | High    | A lot                    | Somewhat              | $50-100  | Easy to find
ADF on flatbed/MFD         | High    | Very low                 | Close to irreversible | $250-400 | Not hard to find
Document scanner (feeder)  | High    | Extremely low            | Close to irreversible | $300-450 | Need to order one
Camera/Smartphone          | Low     | Significant              | None                  | $150+    | You probably have one
Hand scanner               | Low     | Too much                 | Almost none           | $50-80   | Need to order one
DIY Book scanner           | High    | Very low                 | Almost none           | $300-500 | Not hard to build it yourself
Planetary document scanner | Medium  |                          | None                  | $800+    | Order
Linear book scanner        | High    | Very low                 | Somewhat              | ~$1500   | Hard to build one, store and maintain
Pro book scanner           | High    | Very low                 | Usually none          | $10,000+ | Very hard to get

Taking book apart
Image by Xelgen, CC-BY-SA

Scan Tailor (http://scantailor.org)
● Fix rotation
● Split pages
● Deskew
● Autoselect content
● Set up margins

OCR (Optical Character Recognition)
● ABBYY FineReader
● CuneiForm
● Tesseract

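
Of the OCR tools listed above, Tesseract is a command-line program, so it can be scripted over a folder of cleaned page images. The sketch below is only an illustration under assumptions: the helper names `build_ocr_command` and `ocr_pages` are hypothetical, the pages are assumed to be `.tif` files, and the language code `hye` (Armenian) assumes the matching Tesseract language data is installed. It runs Tesseract only if the `tesseract` executable is found on the PATH; otherwise it just returns the commands it would run.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_ocr_command(image: Path, out_dir: Path, lang: str = "hye") -> list[str]:
    """Return the Tesseract command line for one page image.

    Tesseract takes an output *base* name and appends ".txt" itself.
    The default language "hye" is an assumption; pass the code for your book.
    """
    out_base = out_dir / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_pages(image_dir: Path, out_dir: Path, lang: str = "hye") -> list[list[str]]:
    """Build one command per .tif page and run it if Tesseract is installed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pages = sorted(image_dir.glob("*.tif"))  # assumed file type
    commands = [build_ocr_command(page, out_dir, lang) for page in pages]
    if shutil.which("tesseract") is not None:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop on the first failing page
    return commands
```

Keeping the command construction separate from the execution makes it easy to print and review the commands before a long batch run.
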
Watch out before OCR

Wikipedia vs. Wikisource

Wikisource Index page

1. Find a book which is free (or make it free)
2. Prepare your book for scanning
3. Scan it*
4. Rename files if needed*
5. Crop and straighten images with ScanTailor*
6. Additional corrections with any image batch editor (e.g. ImageMagick or XnView)*
7. OCR*
8. Analyze and fix common mistakes in OCR software*
9. Export it as DjVu*
10. Upload to Commons or Wikisource
11. Create the Index page on Wikisource
12. Start proofreading and encourage others to*

* Double check your results

Thank you! Questions?
[[User:Xelgen]]
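
Step 4 of the workflow above (renaming files) matters because later tools process pages in file-name order, and names such as `scan10.jpg` sort before `scan2.jpg` in a plain alphabetical listing. A minimal sketch of one way to plan zero-padded names follows; the function name `plan_renames`, the `page_` prefix, and the four-digit width are illustrative assumptions, not part of the original talk. It only computes the mapping and touches no files.

```python
from __future__ import annotations

import re
from pathlib import Path


def natural_key(name: str) -> list:
    """Sort key that compares digit runs as numbers ("scan2" before "scan10")."""
    parts = re.split(r"(\d+)", name.lower())
    # re.split with a capture group alternates text, digits, text, ...
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


def plan_renames(names: list[str], prefix: str = "page_", width: int = 4) -> dict[str, str]:
    """Map each original file name to a sequential zero-padded name.

    The original extension is kept (lowercased); numbering starts at 1.
    """
    ordered = sorted(names, key=natural_key)
    return {
        old: f"{prefix}{number:0{width}d}{Path(old).suffix.lower()}"
        for number, old in enumerate(ordered, start=1)
    }


if __name__ == "__main__":
    for old, new in plan_renames(["scan10.JPG", "scan2.jpg", "scan1.jpg"]).items():
        print(old, "->", new)
```

Reviewing the printed plan before renaming anything fits the "double check your results" note: a wrong page order is much cheaper to catch here than after OCR.
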
