COURSEWARE FOR TRAINING OF TRAINERS AND USERS ON THE SPECIAL APPLICATIONS OF INTERNET-BASED SERVICES IN THE FIELDS OF CULTURAL EDUCATION

CHAPTER 6

ADVANCED TECHNIQUES OF IMAGE PROCESSING AND THEIR USE FOR SPREAD OF DIGITALIZED CULTURAL VALUES

OVERVIEW OF THE TECHNIQUES FOR CREATING DIGITAL ARCHIVES AND PRESENTATION ON THE WEB

Abstract. In this short tutorial traditional and new image compression techniques are presented. Using different examples, the advantages and drawbacks of various methods of image compression are presented. This tutorial focuses on the application of new based compression techniques, which are especially suitable for the Internet presentation of the scanned documents.

Keywords: digitisation, digital libraries and archives, image compression, Internet presentations, distance learning Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

CHAPTER 6

ADVANCED TECHNIQUES OF IMAGE PROCESSING AND THEIR USE FOR SPREAD OF DIGITALIZED CULTURAL VALUES

OVERVIEW OF THE IMAGE COMPRESSION TECHNIQUES FOR CREATING DIGITAL ARCHIVES AND PRESENTATION DOCUMENTS ON THE WEB

TABLE OF CONTENTS

Introduction ...... 3

1. Overview of Image Compression Techniques...... 5

1.1 Standard and New Image Formats...... 6 1.1.1 Summary Table of Standard Formats ...... 6 1.1.2 Summary Table of New Image Formats ...... 7 1.1.3 Graphics Interchange Format (GIF) ...... 7 1.1.4 Portable Network Graphics (PNG)...... 7 1.1.5 Joint Bilevel Group (JBIG)...... 8 1.1.6 Joint Photographic Experts Group (JPEG)...... 8 1.1.7 Scalable (SVG)...... 8 1.1.8 Multi-Resolution Seamless Image Database (MrSID) ...... 8 1.1.9 Image Compression Program DjVu...... 8 1.1.10 -Based Compression Standard (WSQ) ...... 8 1.1.11 Scaleable Internet/Intranet LuraWave Format (LWF)...... 9 1.1.12 Lightning Strike Image Compression (COD)...... 9 1.1.13 Wavelet Image Files (WIF) ...... 9 1.1.14 Fractal Image Format (FIF)...... 9

2. Special Formats Suitable for the Presentation of Scanned Documents on the Internet.. 10

2.1. Disadvantages of GIF and JPEG Formats for Compression of Documents ...... 10 2.2. New Formats Based on Wavelet Techniques...... 10 2.2.1 Wavelet Based Format DjVu ...... 10 2.2.2 New Standard of Image Compression LuraDocument (LDF)...... 11 2.2.3 Wavelet Based format MrSID ...... 12 2.2.4 Comments on Usability of New Compression Techniques...... 12 2.4. The List of Pictures Demonstrating Strength of New Compressing Techniques ...... 13

References...... 18

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 2 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

PowerPoint slides 1-80 for Chapter 6

OVERVIEW OF THE IMAGE COMPRESSION TECHNIQUES FOR CREATING DIGITAL ARCHIVES AND PRESENTATION DOCUMENTS ON THE WEB

Bogdan Smolka [email protected] Silesian University of Technology, Department of Automatic Control, Electronics and Science Poland

Introduction

SLIDES 1 - 5

Much of the cultural heritage is currently available only in paper form. Archives, libraries, museums are places where the majority of the scientific and cultural resources are stored and preserved. For centuries, incunabula, maps, music sheets, manuscripts, facsimile and old prints have been carefully archived and protected over times of wars and natural disasters, as a deposit for future generations. Unfortunately, the direct access to this huge deposit of valuable material for professional and personal use is rather complicated and expensive.

The historical documents deteriorate with time in many ways. Vellum, canvas, paper, ink and print are exposed to aggressive chemical, physical and biological factors, which in turn leads to slow but inevitable and permanent loss of valuable exhibits. Although the deterioration processes can be slowed down using optimal storage conditions and renovation, the collections are lost irretrievably with time. In case of many libraries and archives in Eastern Europe the situation is even worse because of years of neglect and permanent lack of funding. Not only storage conditions are grossly neglected. The archives are also insufficiently protected against vandalism, theft and natural disasters. This dramatically adds to time-related dangers to cultural heritage. What may be surprising is that relatively young historic relics are most endangered. 19th and 20th century accidentals, newspapers and letters are degrading very quickly, due to acid paper syndrome. This great danger is forcing to take prompt counter-measures in order to preserve our cultural heritage for future generations.

Because of inevitable process of total natural destruction that is about to happen to many items in nearest future, the only way of preserving the collection is to save the information content and make exact copies of originals by analog or digital acquisition of document images. Such actions allow securing informational values of the documents and making them available to general public. It is especially important considering the fact that optimal document storage conditions, i.e. in separation from damaging external influences, often limit the access to the documents, even making them unavailable to larger groups of specialists.

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 3 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

Saving information stored in documents by copying them was already practiced in distant ages, for example in case of Greek philosophical treatises that survived to modern times only copied by the Arabs, or in scriptoria in Irish monasteries during Dark Ages. Archiving the information layer of the document indirectly prolongs its physical existence, because the need to make the original document accessible is eliminated, at least in majority of the access requests where only the information contained in the document is needed.

The most interesting documents have been filmed already to ensure the permanence of the vulnerable source of information and to enable the easy access for students and scientists. Microfilms offer acceptable level of quality, media longevity, little machine dependence and the possibility for producing additional copies without much informational loss. The widespread practice of copying documents on microfilms has, however, several serious disadvantages:

! The lifetime of color microfilms is rather short, which in many cases leads to discarding of the color information; ! Copying the microfilms for distribution leads to significant degradation of image quality, ! The microfilms themselves degrade with time as well; ! The microfilms hold only images of documents without any additional information, which can cause problems to inexperienced users; ! The microfilms can be only accessed locally where equipment for their browsing exists. To enable remote access to the filmed information, the microfilms have to be digitized, which means an additional quality loss.

These factors seriously limit usability of microfilm collections. In additions, many institutions don't keep them - for example numerous in Eastern Europe church libraries, often holding fascinating collections known exclusively to experts.

The digital methods of archiving and popularization of the collections are free from many of those disadvantages. In the time of rapid development in the network and data technology, with their constantly improving capacity for the transmission, the microfilm approach is beginning to belong to the old analogue era. The new era is the explosive development of the Internet and possibilities offered by the rapidly growing technical advances of the digitization, enabling easy access to various treasures of the cultural heritage. Modern techniques of digital collection storage and display are:

! The only durable and relatively cheap way to preserve the information value of documents, ! The only effective method of making public the knowledge about collections of documents: manuscripts, incunabula, books, maps and historical documents preserved in museums and libraries.

Even now, rapid development of digital information processing and distribution takes over microfilm collections in many places. Scanners and high-resolution digital cameras together with efficient digital archiving systems, providing long-term loss less data storage, are very attractive when compared to traditional microfilm collections:

! Digital image quality is comparable to microfilms and is still improving, ! Copying digital documents does not lead to image quality degradation,

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 4 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

! Digital collections can be accessible in degrees incomparable to traditional archives, thanks to Internet and multimedia publications, ! Existing collections of traditional document copies (microfilms, photos etc.) can be digitized as well--this is called an indirect digital image acquisition, contrary to direct acquisition directly from the document.

However, electronic access to the printed or written documents is a complex matter. Paper documents contain text and illustrations and very often the whole document has to be digitized and transformed into an image format. Sometimes it is possible to use optical character recognition tools and extract the text from the photographs or drawings, but this seems to be very difficult, when dealing with old documents and in many cases even inappropriate. An old document is simply not a sum of the information contained in the text and illustrations. The colors, texture of the paper, style of the handwriting or printing technique are very often more important as the informational content of the document, which in many cases is already known.

The philosophy behind this approach is that old documents should be shown as an integration of the information embedded in the document and its visual content. In this way the documents are digitized and shown as images. The so-called virtual libraries allow viewing the documents, making copies using a printer and what is also important the documents can be collected and stored in a private archive for later use.

The major problem with the presentation of printed documents on the Web is that a compromise between the quality of reproduction of the document and the transmission time required to download the huge amount of data contained in an image file has to found.

At the moment, the rapid growth of the number of the Internet users is worsening the problem, as the transmission capacities are almost exhausted. Perhaps with the introduction of high-speed Internet connections the problem will be alleviated, but at present and in the near future the only possible solution is the lossly compression of the image data, so that the user can access the document in a reasonable time and enjoy viewing it in a satisfactory quality. As a result of digitization of an average size document at a medium resolution and color depth, a huge file of average size about 20–50 MB is produced. The quality of such an image is high, but the transmission time, the computational burden and the hardware requirements are simply enormous. That is why techniques are commonly used, as they are capable of reducing the amount of data without much loss of the image quality.

1. Overview of Image Compression Techniques

An important aspect of image processing is the enormous amount of data, which has to be handled when transmitting digital images. The efficient transmission of images is extremely important as the image data transfer takes up over 90 percent of the volume on the Internet. In this aspect computer is a powerful technology, which is playing a vital role in the Information Age.

The compression of information can be classified into loss less and techniques. In some cases such as text or financial data transfer only the loss less algorithms can be applied. However when transmitting or storing digital images or music data, the application of

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 5 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

techniques is almost not perceivable by the user, but enables a drastic reduction of the data volume.

In this short tutorial some of the compression techniques, which can be used when transmitting or storing digital images, are presented. All formats described here, are accompanied by a short description and an Internet hyperlink, which can be used when detailed information is needed. In this material we provide the reader with the substantial information on the existing formats and make some comments regarding their usage when presenting images.

1.1 Standard and New Image Formats

1.1.1 Summary Table of Standard Formats

Plug Max. Name Full name WWW in Colors http://www.ora.com/centers/gff/specs.htm Graphics http://members.aol.com/royalef/gifabout.htm Interchange http://www.atpm.com/6.08/graphicsandtheinternet.shtml N 256 GIF Format http://www.geom.umn.edu/events/courses/1996/cmwh/Anim ations/GIF.html http://cloanto.com/users/mcb/19950127giflzw.html http://www.cdrom.com/pub/png Portable http://www.libpng.org/pub/png/ PNG Network http://www.w3.org/TR/REC-png.html N 24 bit Graphics http://www.w3.org/Graphics/PNG/ http://www.pdsimage.com/html/news/jbig/faq.htm Joint Bilevel http://www.jpeg.org/public/jbighomepage.htm 256 JBIG Group http://www.research.ibm.com/journal/rd/426/marks.html Y (gray) http://www.crs4.it/~luigi/MPEG/jbig.html http://www.ora.com/centers/gff/specs.htm 24 bit Joint http://www.jpeg.org/ JPEG Photographers http://www.landfield.com/faqs/jpeg-faq/ N Expert Group http://www.ijg.org/ http://www.jpegwizard.com/

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 6 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

1.1.2 Summary Table of New Image Formats

Plug Max. Name Full name WWW in Colors Graphics http://www.webutilities.com/Tips/gifx-1.htm GIFX Interchange http://www.gifx.com/Manual/ Y 256 Format http://www.digital-foundry.com/animation/software/gifx.shtml PTI System for Progressive http://server.hvzgymn.wn.schule-bw.de/pti/Intro/Index.htm Y 24 bit Transfer of Images DJVU Deja Vu http://www.djvu.com Y 24 bit FIF Fractal Image Format http://www.iterated.com Y 24 bit SWF Shockwave Flash http://www.macromedia.com Y 24 bit FPX FlashPix http://www.livepicture.com Y 24 bit WSQ Wavelet Scalar Quantization http://www.aware.com/products/compression/wsq.html Y 24 bit MrSID Multi-resolution Seamless http://www.lizardtech.com Y 24 bit Image Database FIF Fractal Image http://www.altamira-group.com Y 24 bit Format WIF Wavelet http://www.cengines.com/wavelet.htm Y 24 bit Image Files COD Lightning Strike Image http://www.infinop.com/infinop/html/image_compress.html Y 24 bit Compression SVG Scalable 24 bit Vector http://www.ora.com/centers/gff/specs.htm N Graphics LWF LuraWave http://www.luratech.com/ Y 24 bit

1.1.3 Graphics Interchange Format (GIF)

Although limited to 256 colours and 95 dpi resolution, GIF images are found in vast quantities and supported by most image-using software applications. The GIF image format uses a built-in LZW compression algorithm, which is a patented technology and is currently owned by Unisys Corporation. Unisys decided that commercial vendors, whose products use the GIF LZW compression, must license its use from UNISYS. End users, online services, and non-profit organizations do not pay this royalty. To avoid this royalty, vendors have developed an alternative to GIF that supports transparency and called PNG ("ping"), the Portable Network Graphic.

1.1.4 Portable Network Graphics (PNG)

This format was designed to replace the older and simpler GIF and to some extent the much more complex TIFF format. For the WWW, the PNG has three main advantages over

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 7 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

GIF: alpha channels (variable transparency), (cross-platform control of image brightness), and two-dimensional interlacing (a method of progressive display). PNG also compresses better than GIF in almost every case, but the difference is generally only around 5% to 25%.

1.1.5 Joint Bilevel Group (JBIG)

JBIG is a data-encoding standard used to compress 1-bit, bilevel image data. The recent extensions to JPEG have defined a (SPIFF) that will also store facsimile and JBIG-compressed data.

1.1.6 Joint Photographic Experts Group (JPEG)

JPEG is a standardized lossy encoding method used for compressing true-color and grayscale image data. JPEG if one of the most popular methods of data compression. JPEG data is stored in its raw form, or using the JFIF file format. Recent extensions to the JPEG standard have defined an official file format for JPEG (and others) named SPIFF.

1.1.7 (SVG)

Work in progress at W3C ( Consortium, http://www.w3.org/ ) on SVG, a vector graphics format written in XML and stylable with CSS, is expected to be a popular choice for including graphics in XML documents. It may be included either by linkage, or by textual inclusion an XML document that uses a different namespace. Because SVG can itself include raster images such as JPEG and PNG, SVG can be used to add raster and mixed vector/ to XML documents

1.1.8 Multi-Resolution Seamless Image Database (MrSID)

MrSID (Multi-resolution Seamless Image Database) is a powerful wavelet based image compressor, viewer and file format that enables instantaneous viewing and manipulation of images locally and over networks while maintaining maximum image quality. Features include unprecedented compression ratios while maintaining highest image quality; true multiple resolutions, selective decompression, seamless mosaicking and browsing.

1.1.9 Image Compression Program DjVu

AT&T Labs - Research has released an image compression program called DjVu. This format is particularly appropriate for web designers who wish to scan high-resolution color images and deliver them over the Internet or Intranets.

1.1.10 Wavelet Transform-Based Compression Standard (WSQ)

WSQ is a wavelet transform-based compression standard designed by the FBI for compression of digital fingerprint images. Aware™ is the leading provider of WSQ solutions and has a customer base that includes major system integrators as well as live scan vendors and application developers. WSQ by Aware is 3 - 4 times faster than the next fastest commercial implementation of the WSQ algorithm

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 8 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

1.1.11 Scaleable Internet/Intranet LuraWave Format (LWF)

LuraImage is a scaleable Internet/Intranet front-end technology that enables access to image databases, utilizing the highly efficient LuraWave image format. LuraImage is a web- based database optimized for image communication in low-bandwidth environments - such as the Internet. The usage of LuraWave-image compression rapidly speeds up the transmission of digital images in the Internet. LuraWave images contain all user-relevant resolutions (thumbnail representation up to the original image size) as one image file. The very first bytes of an image transmission are used to generate a good image preview.

1.1.12 Lightning Strike Image Compression (COD)

Lightning Strike software delivers the most advanced image compression currently available. Lightning Strike outperforms JPEG by 200 to 500 percent. In fact, on a 28.8 modem, a 1MB image file would take 90 seconds to download as a GIF image, 15 seconds as a JPEG image, and just 3 seconds as a Lightning Strike image. Speed plus quality is the clear advantage. At identical file sizes, Lighting Strike images have been judged significantly better than JPEG images.

Remark: a 1MB image file would take 90 seconds to download as a GIF image, 15 seconds as a JPEG image, and just 3 seconds as a Lightning Strike image

1.1.13 Wavelet Image Files (WIF) Method called "Wavelet Transform" made at the Houston Advanced Research Center (HARC™). By applying the complex wavelet algorithms to a digital image, the "compression engine" software is able to represent the image as a mathematical expression. The result is a WIF files are created using the groundbreaking advances in the mathematical compression compressed WIF file that can be 300 times smaller than the original image file size while still maintaining high image quality.

1.1.14 Fractal Image Format (FIF)

Fractal compression works by using a variety of methods to identify features within an image and then breaking down the image into a mathematically modeled series of repeating shapes and patterns. Fractal compression is very efficient, achieving compression ratios of up to 250:1; typical fractal compression ratios will range between 20:1 and 100:1. Images can be magnified or reduced, because the compression process allows the modeled images to be resolution-independent. When a fractal-encoded image is converted to a pixel image, it can be enlarged or reduced to any desired size with minimal loss of image quality. However, published reviews of fractal compression software indicate that there is probably a practical limit to how much a fractal-encoded image can be enlarged before there is a significant loss of image quality; perhaps up to 300% of the original size.

However, published reviews of fractal compression software indicate that there is probably a practical limit to how much a fractal-encoded image can be enlarged before there is a significant loss of image quality; perhaps up to 300% of the original size.

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 9 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

2. Special Formats Suitable for the Presentation of Scanned Documents on the Internet

SLIDES 6 - 20

2.1. Disadvantages of GIF and JPEG Formats for Compression of Documents

The GIF format is mainly used for compression of images that contain low number of different colors. As it uses the loss less coding scheme, its efficiency is not very high and it is not suitable for distribution of real life color or gray scale images. Much better results are obtained using the JPEG format from the Joint Pictures Expert Group, which performs the sub-sampling of the chrominance information, quantization of the DCT transform coefficients and of the image data. Although the compression ratios of about 40:1 are easily achieved without sacrificing much of image quality, the JPEG format is not suitable for the compression of documents. As documents contain many high frequency objects like letters and drawings, therefore the elimination of the high frequency components of the cosine transform leads to severe loss of the quality of document reproduction. As compression rates increase, the text rapidly becomes distorted and illegible. In order to ensure the document legibility, the JPEG files have to be large and this is the main obstacle while preparing an efficient Internet Library.

Although the compression ratios of about 40:1 are easily achieved without sacrificing much of image quality, the JPEG format is not suitable for the compression of documents. As compression rates increase, the text rapidly becomes distorted and illegible.

2.2. New Formats Based on Wavelet Techniques

A common solution is the transformation of the document image to a binary form and then to compress it using the CCITT compression standards Fax Group 3 or Fax Group 4. This approach enables the text legibility at high compression rates, at the expense of the total loss of color information. The JPEG, GIF and fax formats used for distribution of documents are being replaced now by the new wavelet based formats geared towards the direct compression of high-quality scanned documents. These new formats enable fast transmission of document images over the Internet with acceptable level of quality.

Among the new wavelet formats three are especially interesting for the presentation of scanned documents with the use of Internet: namely: DjVu, LuraDocument, and MrSID.

SLIDE 21

2.2.1 Wavelet Based Format DjVu

The DjVu format has been developed at AT&T Bell Labs (see http://www.djvu.com/) with the purpose of constructing a format, which could enable the distribution of scanned high- resolution images over the Web.

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 10 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

SLIDES 22 - 34

DjVu represents the image using three layers:

! The mask, which is a bitonal representing the text and drawings. This layer, coded with loss less compression algorithm, indicates which pixel belongs to the foreground (text, drawings, signatures) and which one to the background (paper texture, photographs). ! The second layer represents the colors of the background using the wavelet based transformation coding ! The third layer contains the information of the foreground encoded using the same wavelet algorithm.

Using DjVu, it is possible to faithfully reproduce a document scanned at 300 dpi of 25 MB size down to about 100 - 200 KB. The quality of the DjVu images is acceptable and this format is suitable for the compression of books, newspaper pages with color photographs, catalogues and so on. The size of the compressed document images is so low, that it can be used for distribution of documents on CD-ROMs (a CD-ROM can contain about 5000 newspaper pages) and of course over the Internet (the Internet sites contain in average about 50 KB of information). Another important feature of DjVu is the good performance of the freely available Internet browsers plug-ins, which allow fast zooming and scrolling of images, its splitting into background and foreground, conversion to black and white and many other useful facilities.

Using DjVu, it is possible to faithfully reproduce a document scanned at 300 dpi of 25 MB size down to about 100 - 200 KB. 2.2.2 New Standard of Image Compression LuraDocument (LDF)

As digital images and scanned documents require an enormous amount of memory, storage capacity and transmission speed, the LuraTech company (see: http://www.luratech.com/) has developed its own standard of image compression called LuraDocument (LDF), based on the novel achievements of the wavelet theory.

SLIDES 35 - 40

The LDF, like DjVu format, performs the segmentation of the document into the text and background. This approach allows the achievement of high compression ratios. Using the to the compression of the background and foreground layer, the LDF format achieves compression ratios of about 200:1, while preserving acceptable image quality. This format is specially designed for the processing of document images and allows their distribution over the Web and local networks in the business and non-profit applications. LDF format offers significant savings in disk and network resources and can be used on many computer platforms. An efficient plug-in for the popular Internet browsers is also available.

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 11 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

2.2.3 Wavelet Based format MrSID

SLIDES 41 - 57 LizardTech's MrSID for Photography (see: http://www.lizardtech.com/) is a file format, which encodes large, high-resolution images to a fraction of their original file size while maintaining high image quality. In this way images become really scalable and can be reduced, enlarged, zoomed or printed without much quality loss. MrSID is designed specifically for the compression of large files of scanned documents, like old books, newspapers and especially large geographical maps. This multi- resolution, device- independent image format delivers the optimal resolution for the screen and the Internet

This format enables that portions of huge files can be sent quickly via the Internet at print- ready quality and the transfer of data takes only a few seconds. The same is true for the encoding process. MrSID requires only seconds to open large files once they have been encoded. MrSID's high-encoding ratios result in the great overall file-size reduction without visible quality compromises.

One of the benefits of a MrSID image is its simplicity of usage. It is possible to compress an image from Photoshop® or to use the MrSID Workgroup Encoder for image collections. MrSID files are supported in commonly used graphic applications and standard Web browsers, allowing users to take advantage of the flexibility of MrSID throughout their workflows. SLIDES 57 - 80 There are many digitization projects whose aim is to preserve the cultural heritage to the future generations. Some of them are shown on slides 57- 80. The slides are accompanied by hyperlinks, so that a user can visit a specific digital archive or library.

2.2.4 Comments on Usability of New Compression Techniques As shown in this short overview and as depicted in Figures 1 - 4, the new compression techniques based on the wavelet transform deliver substantially higher image quality compared to the standard formats and enable the presentation of scanned documents over the Internet. The high quality of the compressed images and their low average size make them an interesting tool for building virtual libraries on the Web.

Our experience with the new technology shows that it should be widely used to allow the easy and fast access to high quality material, available only in the paper form. The introduction of the new techniques will surely bring the Internet a step further towards being the universal, most powerful medium of information exchange.

The capabilities of the new formats can be assessed at the experimental WWW site http://plum.ia.polsl.gliwice.pl/vb prepared in cooperation with the Silesian Library, Katowice, Poland. At this site we have collected many examples of different documents like incunabula, old letters, newspapers, songbooks, photographs, maps. Every document is compressed using DjVu and LuraDocument to enable the comparison of these two software packages. Our virtual library provides also information on other projects, whose aim is to enable the enhanced access to the cultural heritage over the Internet.

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 12 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

2.4. The List of Pictures Demonstrating Strength of New Compressing Techniques FIG. 1 The sample of a letter in picture (a) and images compressed using different techniques: LDF in (b); DJVU in (c); MrSID in (d); and JPEG in (e). The whole files were of the same size (140 KB compressed, the original had 17,2 MB in TIFF format) and were compressed using the default settings of the compressor software.

FIG. 2 An example of the efficiency of DjVu: a page from the Polish newspaper from 1922 at different zoom settings (180 KB).

FIG. 3 An example of the efficiency of the LDF: a page from a German newspaper from 1927 at different zoom settings (185 KB).

FIG. 4 An example of the efficiency of MrSID at different zoom settings.

The pictures are displayed in the following four pages

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 13 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

FIG. 1

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 14 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

FIG. 2

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 15 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

FIG. 3

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 16 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

FIG. 4 Example of the efficiency of MrSID at different zoom settings.

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 17 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

References

[1] Abid, A. Memory of the World: preserving our documentary heritage. Paris: UNESCO, Information and Informatics Division, July 1997. Available from: http://www.unesco.org/webworld/memory/Abid.htm. [2] Beagrie, N. and Greenstein, D. A strategic policy framework for creating and preserving digital collections. JISC/NPO Studies on the Preservation of Electronic Materials. eLib Supporting Study P3. London: South Bank University, Library Information Technology Centre, 1998. Available from: http://ahds.ac.uk/manage/framework.htm [3] Bearman, D. Virtual Archives. ICA Meeting, Beijing, September 1996. Available from: http://www.lis.pitt.edu/~nhprc/prog6.html [4] Bennett, J.C. A framework of data types and formats, and issues affecting the long-term preservation of digital material. JISC/NPO studies on the Preservation of Electronic Materials. British Library Research and Innovation Report, 50. London: British Library Research and Innovation Centre, 1997. Available from: http://www.ukoln.ac.uk/services/papers/bl/jisc-npo50/bennet.html [5] Cox, R.J. American archivists, cyberculture, and stasis. Paper delivered at: Cyber, Hyper or Resolutely Jurassic? Archivists and the Millenium, University College Dublin, 2-3 October 1998. Available from: http://www.ucd.ie/~archives/coxart.html [6] Day, M.W. Extending for digital preservation. Ariadne, no. 9, May 1997. Available from: http://www.ariadne.ac.uk/issue9/metadata [7] Dodson, S. and Wellbeiser, J. (comp.) Bibliography of standards and selected references related to preservation in libraries. Ottawa: National Library of Canada, 1996. Available from: http://www.nlc-bnc.ca/resource/presv/eintro.htm [8] Duranti, L., Eastwood, T. and MacNeil, H. The preservation of the integrity of electronic records. Vancouver: University of British Columbia, School of Library, Archival & Information Studies, 1996. Available from: http://www.slais.ubc.ca/users/duranti/intro.htm [9] Graham, P.S. Building the digital research library: preservation and access at the heart of scholarship. Follett Lecture Series, Leicester University, 19 March 1997. Available from: http://www.ukoln.ac.uk/services/papers/follett/graham/paper.html [10] Graham, P.S. Intellectual preservation: electronic preservation of the third kind. Washington, D.C.: Commission on Preservation and Access, 1994. Available from: http://www.clir.org/pubs/reports/graham/intpres.html [11] Haddad, P. and Jones, M. Electronic publishing: access, bibliographic and preservation issues: report to the Electronic Publishing Working Group, Australian Vice-Chancellors' Committee. Canberra: National Library of Australia, 1995. Available from: http://www.adfa.oz.au/EPub/AvccNLA.html [12] Haddad, P. and Jones, M. Electronic publishing: library and archival issues. 1996. Available from: http://www.adfa.oz.au/Epub/key/Library.html

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 18 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

[13] Haynes, D., Streatfield, D., Jowett, T. and Blake, M. Responsibility for digital archiving and long term access to digital data. JISC/NPO Studies on the Preservation of Electronic Materials. British Library Research and Innovation Report, 67. London: British Library Research and Innovation Centre, 1997. Available from: http://www.ukoln.ac.uk/services/papers/bl/jisc-npo67/digital- preservation.html [14] Hedstrom, M. and Montgomery, S. Digital preservation needs and requirements in RLG member institutions: a study commissioned by the Research Libraries Group. Mountain View, Calif.: Research Libraries Group, December 1998. Available from: http://www.rlg.org/preserv/digpres.html (Last visited: 17-Jun-1999) [15] Kenney, A.R. and Personius, L.K. The Cornell/Xerox/Commission on Preservation and Access joint study in digital preservation. Report: Phase 1, Digital capture, paper facsimiles and network access. Washington, D.C.: Commission on Preservation and Access, 1992. Available from: http://www.clir.org/pubs/reports/joint/ [16] Kenney, A.R. Digital to Microfilm Conversion: a demonstration project, 1994-1996. Final Report to the National Endowment for the Humanities. Ithaca, N.Y.: Cornell University Library, Department of Preservation and Conservation, 1997. Available from: http://www.library.cornell.edu/preservation/com/comfin.html [17] Lesk, M. Image formats for preservation and access: a report of the Technology Advisory Committee to the Commission on Preservation and Access. Washington, D.C.: Commission on Preservation and Access, 1990. Available from: http://www.clir.org/pubs/reports/lesk/lesk.html [18] Lesk, M. Preserving digital objects: recurrent needs and challenges. Available from: http://www.lesk.com/mlesk/auspres/aus.html [19] Matthews, G., Poulter, A. and Blagg, E. Preservation of digital materials: policy and strategy issues for the UK: report of a meeting held at the British Library Research and Innovation Centre, London, 13th December 1996. JISC/NPO Studies on the Preservation of Electronic Materials. British Library Research and Innovation Report, 41. London: British Library Research and Innovation Centre, 1997. Available from: http://www.ukoln.ac.uk/services/papers/bl/blri041/digpres.html [20] Menne-Haritz, A. and Brübach, N. The intrinsic value of archive and library material. Digitale Texte der Archivschule Marburg, Nr. 5. Marburg: Archivschule Marburg, 1997. Available from: http://www.uni-marburg.de/archivschule/intrinsengl.html [21] Research Libraries Group, RLG Preservation Working Group on Digital Archiving. Preliminary Report: Recommendations for RLG. 14 January 1998. Available from: http://www.rlg.org/preserv/archpre.html [22] Research Libraries Group, RLG Working Group on Preservation Issues of Metadata. Final report. May 1998. Available from: http://www.rlg.org/preserv/presmeta.html (Last visited: 17-Jun-1999) [23] Ross, S. Changing trains at Wigan: digital preservation and the future of scholarship. The JISC/NPO Digital Preservation Strategy Workshop, Scarman House, University of Warwick, 3-4 March 1999. Available from: http://www.leeds.ac.uk/cedars/OTHER/SRoss.htm

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 19 Chapter 6 ADVANCED TECHNIQUES OF IMAGE PROCESSING

[24] Russell, K. and Sergeant, D. The Cedars project: implementing a model for distributed digital archives. RLG DigiNews, 3 (3), June 1999. Available from: http://www.rlg.org/preserv/diginews/diginews3-3.html [25] Task Force on the Archiving of Digital Information. Preserving digital information: report of the Task Force on Archiving of Digital Information commissioned by the Commission on Preservation and Access and the Research Libraries Group.Washington, D.C.: Commission on Preservation and Access, 1996. Available from: http://www.rlg.org/ArchTF/ [26] UKOLN: the UK Office for Library and Information Networking. Long term preservation of electronic materials: a JISC/British Library Workshop as part of the Electronic Libraries Programme (eLib) organised by UKOLN, 27th and 28th November 1995 at the University of Warwick. British Library R&D Report 6238. Available from: http://www.ukoln.ac.uk/services/papers/bl/rdr6238/ [27] Waters, D.J. and Kenney, A. The Digital Preservation Consortium: mission and goals. Washington, D.C.: Commission on Preservation and Access, 1994. Available from: http://www.clir.org/pubs/reports/dpcmiss/dpcmiss.html [28] Waters, D.J. Archiving digital information. A presentation to the OCLC Research Library Directors' Conference, Dublin, Ohio, March 12, 1996. Available from: http://www.oclc.org/oclc/man/9680rldc/waters1.htm Available from: http://pantheon.cis.yale.edu/~dwaters/rlactalk.html [29] Waters, D.J. Electronic technologies and preservation. Washington, D.C.: Commission on Preservation and Access, 1992. Available from: http://www.clir.org/pubs/reports/waters/waters2.html [30] Waters, D.J. Electronic technologies and preservation: [...] based on a presentation to the Annual Meeting of the Research Libraries Group, June 25, 1992. Available from: http://www.ifla.org/documents/libraries/net/waters1.txt [31] Waters, D.J. Some considerations on the archiving of digital information. 1995. Available from: http://www.oclc.org:5046/oclc/research/links/archtf/waters_dig_arch.html [32] Waters, D.J. The social organization of archiving digital information [DRAFT]. 1995. Available from: http://www.oclc.org:5046/oclc/research/links/archtf/archtf_outline.html

 UNESCO, 2000. TRAINING OF TRAINERS AND USERS IN THE FIELDS OF CULTURAL EDUCATION 6 - 20