Eldakar et al. / J Zhejiang Univ SCI 2005 6A(11):1327-1340 1327 Journal of Zhejiang University SCIENCE ISSN 1009-3095 http://www.zju.edu.cn/jzus E-mail:
[email protected] The Million Book Project at Bibliotheca Alexandrina ELDAKAR Youssef1, EL-GAZZAR Khalid1, ADLY Noha†1,2, NAGI Magdy1,2 (1Bibliotheca Alexandrina, El Shatby 21526, Alexandria, Egypt) (2Computer and Systems Engineering Department, Alexandria University, Alexandria, Egypt) †E-mail:
[email protected] Received Aug. 5, 2005; revision accepted Sept. 10, 2005 Abstract: The Bibliotheca Alexandrina (BA) has been developing and putting to use a workflow for turning printed books into digital books as its contribution to the building of a Universal Digital Library. This workflow is a process consisting of multiple phases, namely, scanning, image processing, OCR, digital archiving, document encoding, and publishing. Over the past couple of years, the BA has defined procedures and special techniques for the scanning, processing, OCR and publishing, especially of Arabic books. This workflow has been automated, allowing the governance of the different phases and making possible the production of 18000 books so far. The BA has also designed and implemented a framework for the encoding of digital books that allows publishing as well as a software system for managing the creation, maintenance, and publishing of the overall digital repository. Key words: Million Book Project (MBP), Digital books workflow, Digitization, Universal Digital Library, Scanning, Multi- ---lingual OCR, Digital publishing, Image-on-text, DjVu, PDF doi:10.1631/jzus.2005.A1327 Document code: A CLC number: TP391 INTRODUCTION SCANNING The Bibliotheca Alexandrina (BA) has develop- The Minolta PS7000 scanner is used.