Dieper Lb-5632

DIEPER LB-5632

Deliverable 13
Survey of current methodology in image capturing and document management

Date: 08-07-99
Reference: D13final/public/ABC, 29 pages
Produced by: ABC Datenservice GmbH for UNIGOE
Workpackage 4, supervised by UBG
Distribution list: All DIEPER partners
Contact person: Reinhard Ecker
    Am Wasserturm 6, D-60435 Frankfurt am Main
    Tel.: + 49 69 954031-30
    Fax: + 49 69 954031-12
    E-mail: [email protected]

DIEPER Project: Deliverable 13 Survey of current methodology in image capturing and document management
Version: Final    Date: 08.07.1999

Document history

Versions

Version  Date      Author    Comments
1        20/12/98  R. Ecker  Preliminary Draft of D13
2        16/02/99  R. Ecker  Draft 2 of D 13
3        09/07/99  R. Ecker  Final version of D 13

Updates

Chapter  Description of modifications  Version

Contents

1. Introduction
2. Scanning
   Kind of printed materials
   Kind of intended use and further processing of the digitised materials
   Kind of intended access to the digitised materials
   Image processing
   Image compression
   Versions of image files for different applications
3. Indexing
   Categories of indexing
   Document identifier
   Document structure
4. Methods of full text + meta data capturing
   Manual text capturing
   Text capturing by OCR / ICR
   Download of catalogue data
5. Document storage
   Document storage formats
   Digital master file
   Application file formats
   Self-describing image files
   Storage media
6. Document management
   Electronic archiving and document management systems
   Basic functions of archiving and document management systems
   Document storage
   Document retrieval
   Document visualisation and reproduction
   Maintenance and administration
   Existing archiving and document management systems for digital libraries
   Online library catalogue software systems
   Local solutions
7. Relevant Standards
8. References, URLs etc.
Appendix: Dieper Questionnaire
1. Introduction

This document describes the current status of image capturing and document management methods, especially with respect to library documents. Some years ago several libraries started to retro-digitise printed materials, e.g. books or periodicals, and to distribute these digital documents via the Internet or other networks to users all over the world. We are now at the beginning of a development which will, as we hope, make all important information available immediately from anywhere and at any time. This new kind of information access has already met with enough enthusiasm from its users to act as a catalyst for starting additional projects. It is expected that information behaviour will be influenced considerably by direct access to digitised documents. Libraries, in our tradition one of the significant groups of conventional information providers, will identify and use this chance to take on a leading role in the digital information society as well.

The goal of the DIEPER project is to enhance these developments with respect to the digitisation, indexing and presentation of scientific periodicals.

This report gives an overview of the current state of methodology for the digitisation of printed library materials and for the electronic storage and administration of digital documents. In addition, a short overview of indexing and of the capturing of full text and meta data is given. (Deliverable 16, which is to be prepared later, will give more details on these items.) A list of relevant standards and a technical glossary of relevant terms are added, together with some references.

In the appendix to this report, the results of a survey (“Dieper Questionnaire”) investigating the current methodology in image capturing and document management at the project partners and selected European libraries are presented.
2. Scanning

To make a printed document available via the Internet it has to be converted into an electronic format. One of the first difficult issues that must be addressed in any digital conversion project concerns the selection of appropriate formats and technologies for storage, display and distribution of the material. Another difficult question is what file format (images, PDF, SGML, HTML, etc.) should be used to deliver the content. A further question is whether to store and deliver the materials as images or as text.

Given the technology available to web browsers, the most accurate way to replicate the originally published material completely, which is full of special characters, foreign languages, mathematical symbols, charts and pictures, is with scanned images. In addition, by the use of Optical Character Recognition (OCR) software, a corresponding text file can be built that allows the user to search the full text of the journals in the database. But these (uncorrected) OCR text files should not be made available to users.

A distinction should be made between coded and non-coded information:

                  Coded information         Non-coded information
Files             Text                      Image
Capturing         Manually input, OCR/ICR   Scanning
Editing           Text editor               Pixel editor
Direct retrieval  Yes                       No

Basic scanning parameters

· Kind of the original document (printed text on paper, printed image, photograph, colour, microfilm, microfiche, ...)
· Size of the original document (micro form, <A4, >A4 <A3, <A3, ..., >A0)
· Scanning resolution (100 dpi, 300 dpi, 400 dpi, 600 dpi, ...)
· Image depth (pixel information: 1 bit, 8 bit, 12 bit, 3 x 12 bit, ...)
· Intended exploitation of the digital materials
· File size of the digital materials

Criteria for the definition of scanning parameters
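As one illustration of how these parameters depend on each other: the uncompressed file size of a scanned page follows directly from the size of the original, the scanning resolution and the image depth. The following Python sketch is not part of the original report. The function name `uncompressed_bytes` and the paper-size table are hypothetical, and the estimate deliberately ignores file headers, row padding and compression.

```python
# Hypothetical helper, not taken from the DIEPER report: estimates the
# uncompressed size of one scanned page from the basic scanning parameters.

MM_PER_INCH = 25.4

# ISO 216 paper sizes in millimetres (width, height).
PAPER_SIZES_MM = {
    "A4": (210, 297),
    "A3": (297, 420),
}


def uncompressed_bytes(width_mm, height_mm, dpi, bits_per_pixel):
    """Return the raw image size in bytes (no headers, padding or compression)."""
    # Convert the physical size to pixels at the chosen resolution.
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    width, height = PAPER_SIZES_MM["A4"]
    # Bitonal, greyscale and 3 x 8 bit colour at common resolutions.
    for dpi, depth in [(300, 1), (600, 1), (300, 8), (300, 24)]:
        size_mb = uncompressed_bytes(width, height, dpi, depth) / 1_000_000
        print(f"A4 at {dpi} dpi, {depth} bit: about {size_mb:.1f} MB")
```

Under these assumptions an A4 page at 300 dpi needs roughly 1.1 MB as a 1-bit image but about 8.7 MB at 8 bit, which is why resolution and image depth have to be weighed against the intended use and the resulting file size.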
