Browsing through High Quality Do cument Images with DjVu

Patrick Haner Leon Bottou Paul G Howard

Patrice Simard Yoshua Bengio and Yann Le Cun

ATT LabsResearch

Schultz Drive

Red Bank NJ

fhanerleonbpghpatriceyoshuayanngresearchattcom

Abstract Even if pictures and drawings are scanned and in

tegrated into the web page much of the visual asp ect

We present a new technique

of the original do cument is likely to be lost Visual

cal led DjVu that is specical ly geared towards the

details including font irregularities pap er color and

compression of highresolution highquality images of

pap er texture are particularly imp ortant for histori

scanned documents in color With DjVu any scr een

cal do cuments and may also b e crucial in do cuments

connected to the Internet can access and display im

with tables mathematical or chemical formulae and

ages of scannedpages while faithful ly reproducing the

handwritten text

font color drawing pictures and paper texture A

typical magazine page in color at dpi can becom

A simple alternativewould b e to scan the original

pressed down to between to KB approximately

page and simply compress the image as a JPEG or

to times better than JPEG for a similar level of

GIF le Unfortunately those les tend to be quite

subjective quality BW documents are typical ly

large if one wants to preserv e the readability of the

to KBytes at dpi or to times better than

text Compressed with JPEG a color image of a

ecient version of CCITTG Arealtime memory

typical magazine page scanned at dpi dots per

the decoder was implemented and is available as a

inch would be around KBytes to KBytes

plugin for popular web browsers

and would be barely readable The same page at

Keywords digital libraries image compression im

dpi would b e of acceptable quality but would o c

age segmentation coding

cupy around KBytes Even worse not only would

JBIG

the decompressed image ll up the entire memory of

an average PC but only a small p ortion of it would

Intro duction

b e visible on the screen at once A justreadable black

As electronic storage retrieval and distribution of

and white page in GIF would be around to

do cuments b ecomes faster and cheap er libraries are

KBytes

b ecoming increasingly digital Recent studies have

To summarize the current situation it is clear that

shown that it is already less costly to store do cuments

the complete digitization of the worlds ma jor library

digitally than to provide for buildings and shelves to

collections is only a matter of time In this context it

house them Unfortunately the diculty of con

seems paradoxical that there exist no universal stan

verting existing do cuments to electronic form is a ma

dard for ecient storage retrieval and transmission

jor obstacle to the development of digital libraries

of highquality do cument images in color

Existing do cuments are usually retyp ed and con

verted to HTML or Adob es PDF format a tedious Tomake remote access to digital libraries a pleasant

and exp ensive task sometimes facilitated by the use exp erience pages must app ear on the screen after only

of Optical Character Recognition OCR While the a few seconds delay Assuming a kilobits p er second

accuracy of OCR systems has b een steadily improv kbps connection this means that the most relevant

ing over the last decade they are still far from b eing parts of the do cument the text must b e compressed

able to translate faithfully a scanned do cumentinto a down to ab out to KBytes With a progressive

readable format without extensive manual compression technique the text would b e transmitted

correction and displayed rst Then the pictures drawings and

backgrounds would b e transmitted and displayed im background The p erformance of b oth of these meth

proving the quality of the image as more bits arrive o ds heavily relies on a new adaptive binary arithmetic

The overall size of the le should b e on the order of co ding technique called the Zco der also briey de

to KBytes to keep the overall transmission time scrib ed Section intro duces the plugin that allows

and storage requirements within reasonable b ounds to browse DjVu do cuments through standard appli

cations such as Netscap e Navigator or Microsoft Ex

Another p eculiarity of do cument images their large

plorer with a DjVu plugin Comparative results on a

size makes current image compression techniques in

wide variety of do cumenttyp es are given in Section

appropriate A magazinesize page at dots per

inch is pixel high and pixel wide Uncom

ImageBased Digital Libraries

pressed it o ccupies MBytes of memory more than

The imagebased approach to digital libraries is

what the average PC can handle A practical do cu

to store and to transmit do cuments as images To

ment image viewer would need to keep the image in a

achieve that we need to devise a metho d for com

compressed form in the memory of the machine and

pressing do cument images that makes it p ossible to

only decompress ondemand the part of the image that

transfer a highquality page over lowsp eed links mo

is displayed on the screen

dem or ISDN in a few seconds OCR would only b e

The DjVu do cument image compression technique

used for indexing with less stringent constraints on

describ ed in this pap er is an answer to all the ab ove

accuracy

problems With DjVu scanned pages at dpi in full

Several authors have prop osed imagebased ap

color can b e compressed down to to KBytes les

proaches to digital libraries The most notable ex

from MBytes originals with excellent quality Black

ample is the RightPages system The RightPages

and white pages typically o ccupy to KBytes

system was designed for do cument image transmission

once compressed This puts the size of highquality

over a lo cal area network It was used for some years

scanned pages in the same order of magnitude as an

bytheATT Bell Labs technical community and dis

average HTML page KBytes according to the lat

tributed commercially for customized applications in

est statistics DjVu pages are displayed within the

several countries The absence of a universal and op en

browser window through a plugin The DjVu plugin

platform for networking and bro wsing suchastodays

allows easy panning and zo oming of very large images

Internet limited the dissemination of the RightPages

This is made p ossible by an onthey decompression

system Similar prop osals have b een made more re

metho d whichallows images that would normally re

cently

quire MBytes of RAM once decompressed to re

All of the ab ove imagebased approaches and most

quire only MBytes of RAM

commercially available do cument image management

The basic idea b ehind DjVu is to separate the text

systems are restricted to black and white bilevel

from the backgrounds and pictures and to use dier

images This is adequate for technical and business

ent techniques to compress each of those comp onents

do cuments but insucient for other typ es of do cu

Traditional metho ds are either designed to compress

ments such as magazines catalogs or historical do c

natural images with few edges JPEG or to compress

uments Many formats exist for co ding bilevel do c

black and white do cument images almost entirely com

ument images notably the CCITT G and G fax

p osed of sharp edges CCITT G G and JBIG

standards the recent JBIG standard and the up

The DjVutechnique improves on b oth and combines

coming JBIG standard Using ATTs prop osals to

the b est of b oth approaches

the JBIG standard images of a t ypical black and

white page at dpi dots p er inch can b e transmit

Section reviews the currentavailable compression

ted over a mo dem link in a few seconds Work on

and displaying technologies and states the require

bilevel image compression standards is motivated by

ments for do cument image compression Section de

the fact that until recently do cument images were

scrib es the DjVu metho d of separately co ding the text

primarily destined to b e printed on pap er Most low

and drawings on one hand and the pictures and back

cost printer technologies excel at printing bilevel im

grounds on the other hand Section turns this idea

ages but they must rely on dithering and halftoning

into an actual image compression format It starts

to print greylevel or color images thus reducing their

with a description of the metho d used by DjVutoen

eective resolution

co de the text and the drawings This metho d is a

variation of ATTs prop osal to the new JBIG fax The low cost and availability of highresolution

standard It also includes a description of the IW color displays is causing more and more users to rely

waveletbased compression metho d for pictures and on their screen rather than on their printer to dis

play do cument images Even mo dern lowend PCs

can display x pixel images with bits per

pixelbitsperRGB comp onent while highend PCs

and workstations can display x at bits p er

pixel

Most do cuments displayed in bilevel mo de are

readable at dpi but are not pleasanttoread At

dpi the quality is quite acceptable in bilevel mo de

Displaying an entire x lettersize page at such

high resolution requires a screen resolution of pix

els vertically and pixels horizontallywhichisbe

yond traditional display technology Fortunately us

ing color or graylevels when displaying do cument im

ages at lower resolutions drastically improves readabil

ity and sub jective quality Most do cuments are read

able when displayed at dpi on a color or greyscale

display Only do cuments with particularly small fonts

require dpi for eortless readability At dpi a

typical page o ccupies pixels vertically and

pixels horizontally This is within the range of to days

highend displays Lowend PC displays and highend

p ortable computer displays have enough pixels but in

the landscap e mo de rather that the desired p ortrait

mo de

Do cument Image Compression with

DjVu

Figure Example of color document hobby

As we stated earlier the exp erience

cannot b e complete without a way of transmitting and

displaying do cument images in color Traditional color

image compression standards such as JPEG are inap

Let us consider a do cument image scanned at

propriate for do cument images JPEGs use of lo cal

dpi with bits per pixel such the catalog page

cosine transforms relies on the assumption that the

shown in Figure The main idea of our do cument

high spatial frequency comp onents in images can es

image compression technique is to generate and en

sentially be removed or heavily quantized without

co de separately three images from which the original

to o much degradation of quality While this assump

image can be reconstructed the background image

tion holds for most pictures of natural scenes it do es

the foreground image and the mask image The rst

not for do cument images A dierent technique is

two are lowresolution color images and the latter is

required to co de accurately and eciently the sharp

a highresolution bilevel image dpi A pixel in

edges of character images so as to maximize their clar

the deco ded image is constructed as follows if the

ity

corresp onding pixel in the mask image is the out

put pixel takes the value of the corresp onding pixel It is clear that dierent elements in the color im

in the appropriately upsampled background image If age of a typical page have dierent p erceptual char

the mask pixel is the pixel color is chosen as the acteristics First the text is usually highly contrasted

color of the connected comp onent or taken from the from the background with sharp edges The text must

foreground image The background image see for be rendered at high resolution dpi in bilevel or

instance the lower right image of Figure can be dpi in color if reading the page is to be a pleas

enco ded with a metho d suitable for continuoustone ant exp erience The second element in a do cument

images DjVu uses a progressive waveletbased com image is the pictures Rendering pictures at dpi

pression algorithm called IW for this purp ose The to dpi is typically sucient for acceptable quality

mask image the upp er left image of Figure can b e The third element is the background color and pap er

enco ded with a bilevel image compression algorithm texture The background colors may not require more

than dpi resolution DjVu uses a metho d called JB for this purp ose

A complete description of used by the IW waveletbased enco der a multiscale

DjVu s foregroundbackground separations algorithm successive projections algorithm which avoids

is beyond the scop e of this pap er so only the main sp ending bits to co de the parts of the background im

ideas are given here The image is partitioned into age that are covered by foreground ob jects The e

square blo cks of pixels A clustering algorithm nds vily draws on their use an ciency of IW and JB hea

the two dominant colors within each blo ck Then a ecient binary adaptive arithmetic co der called the

relaxation algorithm ensures that neighb oring blo cks ZPcoder On average the ZPco der is faster and

assign similar colors to the foreground and the back yields b etter average compression than other approx

ground After this phase each pixel is assigned to the imate arithmetic co ders All these algorithms have

foreground if its color closer to the foreground cluster realtime implementations and have b een integrated

prototyp e than to the background cluster prototyp e into a standalone DjVu enco der Each comp onentof

A subsequent phase cleans up and lters foreground the enco der is briey describ ed b elow

comp onents using a variety of criteria

Co ding the Bilevel Mask Using JB

The bilevel image compression algorithm used by

The DjVu Compression Format

DjVu to enco de the mask is dubb ed JB It is a vari

Here are the elements that comp ose a DjVu enco ded

ation on ATTs prop osal to the up coming JBIG

image le

fax standard Although the JBIG bilevel image

compression algorithm works quite well it has b ecome

The text and drawings also called the Maskare

clear over the past few years that there is a need to

represented by a single whose bits indicate

provide b etter compression capabilities for b oth loss

whether the corresp onding pixel in the do cument

less and of arbitrary scanned images

image has been classied as a foreground or a

containing b oth text and halftone images with scan

background pixel This bitmap typically contains

ning resolutions from to dots p er inch This

all the text and the highcontrast comp onents of

need was the basis for JBIG which is b eing devel

the drawings It is co ded at dpi using an algo

opp ed as a standard for bilevel do cumentcoding The

rithm called JB whichisavariation of ATTs

key to the compression metho d is a metho d for mak

prop osal to the up coming JBIG fax standard cf

ing use of the information in previously encountered

Section

characters without risking the intro duction of charac

The color of the text namely the Foregroundcon ter substitution errors that is inherent in the use of

tain a large number of neighb oring pixels with

Optical Character Recognition OCR metho ds

almost identical colors It can be considered as

The basic ideas b ehind JB are as follows

uniform for a given mark ie a connected com

The basic image is rst segmented into individual

p onent of foreground pixels

marks connected comp onents of black pixels

The Background is co ded at dpi using the

The marks are clustered hierarchically based on

waveletbased compression algorithm called IW

similarity using an appropriate distance measure

describ ed in Section

Some marks are compressed and co ded directly

The foregroundbackground representation was

using a statistical mo del and arithmetic co ding

prop osed in the ITU MRCT recommendation

The idea was used in

Other marks are compressed and co ded indirectly

Xeroxs XIFF image format which currently uses

based on previously co ded marks also using a sta

CCITTG to co de the mask layer and JPEG to co de

tistical mo del and arithmetic co ding The previ

the background and foreground layers A similar for

ously co ded mark used to help in co ding a given

mat is used in Xeroxs PagisPro desktop application

mark mayhave b een co ded directly or indirectly

for do cument scanning and indexing

DjVuachieves sup erior compression ratios by using

The image is co ded by sp ecifying for each mark

new compression algorithms for the mask layer as well

the identifying index of the mark and its p osition

as for the background and foreground layers Here are

relative to that of the previous mark

some of the novel techniques used by DjVu the soft

pattern matching algorithm used in the JB bi There are manyways to achieve the clustering and

level image compression algorithm for the mask layer the conditional enco ding of marks the algorithm that

the sparse set representation of wavelet co ecients we currently use is called soft pattern matching

This algorithm do es not yet attempt to optimize the ral to use waveletbased algorithms for enco ding the

clustering step image backgrounds However the requirements of the

DjVu pro ject set extreme constraints on the sp eed and

The key novelty with JB co ding is the solution to

memory requirements of the wavelet enco ding scheme

the problem of substitution errors in which an imp er

The background image is typically a dpi color im

fectly scanned symbol due to noise irregularities in

age containing one to two million pixels It mayonly

scanning etc is improp erly matched and treated as a

represent a nearly uniform background color It may

totally dierentsymbol Typical examples of this typ e

also contain colorful pictures and illustrations which

o ccur frequently in OCR representations of scanned

tally while the DjVu data should b e displayed incremen

do cuments where symb ols like o are often represented

is coming

as c when a complete lo op is not obtained in the

scanned do cument or a t is changed to an l when Our wavelet compression algorithm uses an inter

the upp er cross in the t is not detected prop erly By mediate representation based on a very fast ve stage

co ding the bitmap of eachmark rather than simply lifting decomp osition using DeslauriersDubuc inter

sending the matched class index the JB metho d is p olating with four analyzing moments and

robust to small errors in the matching of the marks to four vanishing moments Then the wavelet co e

class tokens Furthermore in the case when a good cients are progressively enco ded using arithmetic co d

match is not found for the current mark that mark ing cf Section and a technique named Hierar

b ecomes a chical Set Dierence comparable to zerotrees or token for a new class This new token is

setpartitioning algorithms then co ded using JBIG with a xed template of previ

ous pixels around the current mark By doing a small

Finally we have develop ed a technique for co ding

amount of prepro cessing such as elimination of very

the background image without wasting bits on back

small marks that represent noise intro duced during

ground pixels that will b e masked by foreground text

the scanning pro cess and smo othing of marks b efore

This simple and direct numerical metho d sets a large

compression the JB metho d can b e made highly ro

number of wavelet co ecients to zero while trans

bust to small distortions of the scanning pro cess used

forming the remaining wavelet co ecients in order to

to create the bilevel input image

preserve the visible pixels of the background only The

null co ecien The JB metho d has proven itself to b e ab out ts do not use memory and are co ded very

more ecient that the JBIG standard for lossless eciently by the arithmetic co der

compression of bilevel images By running the al

During decompression the wavelet co ecients are

gorithm in a controlled lossy mo de by prepro cessing

represented in a compact sparse array whichusesal

and decreasing the threshold for an acceptable match

most no memory for zero co ecients Using this tech

to an existing mark the JB metho d provides com

nique we can represent the complete background us

pression ratios ab out to times that of the JBIG

ing only a quarter of the memory required by the im

metho d for a wide range of do cuments with various

age pixels and generate the fully decompressed image

combinations of text and continuous tone images In

ondemand This greatly reduces the memory require

lossy mo de JB is to times b etter than CCITTG

ments of the viewer

which is lossless It is also to times b etter than

Arithmetic Co ding

GIF

Arithmetic co ding isawell known algorithm

Wavelet Compression of Background

for enco ding a string of symb ols with compression ra

Images

tios that can reach the information theory limit Its

of Multiresolution wavelet decomp osition is one

mechanism is to partition the interval of the real

the most ecient algorithms for co ding color images

line into subintervals whose lengths are prop ortional to

and is the most likely candidate for future

the probabilities of the sequences of events they repre

multilevel image compression standards The image

sent After the subinterval corresp onding to the actual

is rst represented as a linear combination of lo cally

sequence of data is known the co der outputs enough

supp orted wavelets The image lo cal smo othness en

bits to distinguish that subinterval from all others If

sures that the distribution of the wavelet co ecients is

probabilities are known for the p ossible events at a

sharply concentrated around zero High compression

given p ointin the sequence an arithmetic co der will

eciency is achieved using a quantization and co ding

use almost exactly  log p bits to co de an event whose

scheme that takes advantage of of this p eaked distri

ords the co der achieves en probabilityisp In other w

bution

tropic compression We can think of the enco der and

Because of the smo othness assumption it is natu deco der as blackboxes that use the probability infor

mation to pro duce and consume a bitstream the ELSCo der ab out the same as the QMCo der

and b etter than the QCo der The dierences are

Arithmetic co ders unfortunately are computation

all small in the to p ercent range We also per

ally intensive For each string element a subroutine

formed two articial tests In a test of steady state

must provide the co derdeco der with a table con

behavior co ding a long sequence of random bits with

taining estimated probabilities for the o ccurrence of

xed probabilities the ZPCo der p erformed ab out as

each p ossible symb ol at this p oint in the string The

well as the QMCo der better than the QCo der

co derdeco der itself must p erform a table search and

and much b etter than the ELSCo der

at least one multiplication

In a test of early adaptation co ding a long sequence

of random bits with xed probabilities but reinitializ

Binary Arithmetic Co ding

output bits the ZP ing the enco der index every

Binary adaptive arithmetic coders have b een devel

Co der did b etter than the QMCo der whichwas b et

op ed to overcome this drawback as computations can

ter than the QCo der whichinturnwas b etter than

be approximated using a small number of shifts and

the ELSCo der

additions instead of a multiplication

The ZPCo ders deco ding sp eed is faster than that

Moreover Binary Adaptive Arithmetic Co ders in

of the QMCo der which is in turn faster than the

clude an adaptive algorithm for estimating the symbol

other two co ders

probabilities This algorithm up dates the integer vari

Browsing DjVu Do cuments

able along with the enco ding and deco ding op erations

Satisfaction of the digital library user dep ends crit

Complex probability mo dels are easily represented by

ically on the p erformance of the browsing to ols Much

maintaining multiple indices representing the condi

more time is sp ent viewing do cuments than formulat

tional probabilities of the symbols for each value of

ing queries As a consequence browsers must provide

the contextual information considered by the mo del

very fast resp onse smo oth zo oming and scrolling abil

The QMCo der used in the JBIG standard and

ities realistic colors and sharp pictures

in the lossless mo de of JPEG standard is an example

These requirements put stringent requirements on

of approximate binary arithmetic co der Other such

the browsing software The full resolution color image

co ders are the QCo der and the ELSCo der

of a page requires ab out megabytes of memory We

cannot store and pro cess many of these in the browser

The ZCo der and the ZPCo der

without exceeding the memory limits of average desk

top However wewant to displaysuch im

Wehave implemented new approximate binary arith

ages seamlessly and allow users to drag them in real

metic co ders namely the ZCo der and the ZPCo der

time on their screen

The ZCo der was develop ed as a generalization of

We develop ed a solution called Multithreaded

the Golomb runlength co der and it has inherited

twostage deco ding consisting of the following

its qualities of sp eed and simplicity Its inter

nal representation leads to faster and more accurate

A rst thread known as the decoding thread

implementations than either the QCo der or the QM

reads bytes on the internet connection and par

Co der The probability adaptation in the ZCo der also

tially deco des the DjVu stream This rst stage

departs from the QCo der and QMCo der algorithms

of the deco ding pro cess asynchronously up dates

in a way that simplies the design of the co der tables

an intermediate representation of the page image

The ZPCo der is a variation on the ZCo der with

This representation is still highly compressed our

nearly exactly the same sp eed and p erformance char

implementation requires less than megabytes of

acteristics A rougher approximation in the optimal

main memory p er page

entropic Zvalue costs less than half a p ercent p enalty

in co de size

A second thread known as the interactive thread

Wehave compared the ZPCo der with three other

handles user interaction and repaints the screen

adaptive binary co ders the QMCo der the Q

as needed This thread uses the intermediate rep

Co der a variant of the QCo der that uses bit

resentation to reconstruct the pixels corresp ond

registers instead of bit and the Augmented ELS

ing to the parts of the do cument that must be

Co der based on the ELSCo der

redrawn on the screen

In the main test various co ders including the ZP

Despite the complexity related to thread management Co der have b een incorp orated into the JB compres

and synchronization this organization provides many sion system The ZPCo der did slightly worse than

seck the mask text K is loaded secK The background is still blurred

K are necessary for this background

secK Loading is nished image

Figure Downloading through a K modem progressive decompression of text rst fol lowed by the background

at increasing qualitydetail of Figure

advantages Since b oth threads work asynchronously

the browser is able to display images incrementally

while data is coming in Memory requirements are

limited b ecause the interactive thread computes im

age pixels only for regions smaller than the screen size

Scrolling op erations are smo oth b ecause they involve

just the second stage deco ding of the few pixels un

covered by the scrolling op erations

We implemented these ideas in a plugin for

Netscap e Navigator or Internet Explorer Each page

of a DjVu do cument is displayed byinvoking its URL

Behind the scenes the plugin implements information

JPEG at dpi only DjVu

caching and sharing This design allows the digital li

brary designer to set up anavigation interface using

Figure Comparison of JPEG at dpi left with

well known Web technologies like HTML or Java Fig

quality factor and DjVu right The images are

ure shows how a do cument is displayed while it is

cropped from hobby The le sizes are K for

downloaded through a K mo dem

JPEG and K for DjVu

Results and Comparisons with

Conclusion

Other Metho ds

As digital libraries are increasingly b ecoming a fact

Wehave selected seven images representing typical

of life they will require a universal standard for ef

color do cuments These images have b een scanned at

cient storage retrieval and transmission of high

dpi and bitspixel from a variety of sources Our

quality do cument images The work describ ed in this

compression scheme combines twomain chunks one

pap er is a substantial step towards meeting this need

wavelet compressed chunk for the background color

by prop osing a highly ecient compression format

one JBIG chunk for the bitmap whose combined size

DjVu together with a browser that enables fast

is rep orted in Figure

internet access With the same level of legibility

Figure gives a full comparison between JPEG

dots p er inch DjVuachieves compression ratios to

and DjVu with details from each image to assess the

times higher than JPEG

readability Those details do not show the signicant

The DjVu plugin is freely available for download

amount of graphics and texture that all these images

at httpdjvuresearchattcom This site con

contain However wegive the p ercentage of bits sp ent

tains an exp erimental Digital Library with do cu

on co ding graphics and texture in eac h image which

ments from various origins The DjVu enco der is also

ranges from to When compared to the original

available for researchandevaluation purp oses

dpi raw image DjVu achieves compression rates

It is p ossible to further optimize the compres

ranging from to

sion rate A version of DjVu that enco des sev

Compressing JPEG do cuments at dpi with the

eral pages together will be able to share JBIG

lowest p ossible quality setting yields images that

shap e dictionaries between pages Problems such

are of comparable quality as with DjVu As shown

as foregroundbackgroundmask separation or con

in the JPEG dpi column le sizes are to

nected comp onent ltering are b eing rephrased in

times larger than DjVu le sizes

terms of compression rate optimization

For the sake of comparison we subsampled the do c

The addition of text layout analysis and optical

ument images to dpi with lo cal averaging and

cognition OCR will makeit p ossible to character re

applied JPEG compression adjusting the quality pa

index and edit text extracted from DjVuenco ded do c

rameter to pro duce les sizes similar to those of DjVu

uments

In the JPEG dpi column the fact that text is

hardly readable is not due to the dpi subsampling

References

but to ringing artifacts inherentinlow JPEG qual

ity settings

JBIG Progressive bilevel image compression

Figure shows a more global comparison b etween ITU recommendation T ISOIEC Interna

DjVu and JPEG tional Standard

Image JPEG dpi JPEG dpi DjVu

Description Raw image detail quality sizeDjVu compressed

Magazine Add

image

K K K K adsfreehand

Brattain

Noteb o ok

image

K K K K brattain

Scientic

Article

image

graham K K K K

Newspap er

Article

image

lrrwpostlrrwpost K K K K

CrossSection

of Jupiter

image

planetsjupiter K K K K

XVI I Ith

Century book

image

cuisinep K K K K

US First

Amendment

image

usaamend K K K K

Figure Compression results for seven selected images with compression format Raw applies no compression

JPEG quality ranges from to the value yielding the compression rate which is the closest to DjVu is

chosen The image value corresponds to the percentage of bits required to code for the background Each

column shows the same selected detail of the image Tomakeselection as objective as possible when possible the

are the le size in kilobytes rst occurrence of the word the was chosen The two numbers under each image

and the compression ratio with respect to the raw le size

M Lesk Practical Digital Libraries Books Bytes PGHoward and J S Vitter Arithmetic co ding

and Bucks Morgan Kaufmann for Proceedings of the IEEE

G Story L OGorman D Fox L Shap er and

JPEG Digital compression and co ding of contin

H Jagadish The RightPages imagebased elec

requirements and guide uous tone still images

tronic library for alerting and browsing IEEE

lines ITU recommendation T ISOIEC Inter

Computer

national Standard

T Phelps and R Wilensky Towards active ex

W B Pennebaker J L Mitchell G G Lang

tensible networked do cuments Multivalentarchi

don and R B Arps An overview of the basic

tecture and applications In Proceedings of the

principles of the qco der adaptive arithmetic bi

st ACM International Conference on Digital Li

nary co der IBM Journal of Research and Devel

braries pages

opment Novemb er

I H Witten A Moat and T C Bell Managing

Wm D Withers A rapid entropyco ding algo

Gigabytes Compressing and Indexing Documents

rithm Technical

and Images Van Nostrand Reinhold New York

rep ort Pegasus Imaging Corp oration Url

ftpwwwp egasusimagingcompubelsco derp df

MRC Mixed rater content MRC mo de ITU

SW Golomb Runlength enco dings IEEE

Recommendation T

Trans Inform Theory IT July

E H Adelson E Simoncelli and R Hingorani

L Bottou P Howard and Y Bengio The Z

Orthogonal pyramid transform for image co ding

co der adaptive binary co der In Proceedings of

In Proc SPIE vol Visual Communication

IEEE Data Compression Conference Snowbird

and Image Processing II pages Cambridge

UT

MA Octob er

L Bottou and S Pigeon Lossy compression of

J M Shapiro Emb edded image co ding using ze

partially masked still images In Proceedings of

rotrees of wavelets co ecients IEEE Transac

IEEE Data Compression Conference Snowbird

tions on Signal Processing Decem

UT

b er

Wim Sweldens The lifting scheme A custom

design construction of biorthogonal wavelets

Harmonic Journal of Applied Computing and

Analysis

Amir Said and William A Pearlman A new fast

and ecient image co dec based on set partitioning

in hierarchical trees IEEE Transactions on Cir

cuits and Systems for VideoTechnology

June

R N Ascher and G Nagy A means for achiev

ing a high degree of compaction on scandigitized

printed text IEEE Trans Comput C

Novemb er

PGHoward Text image compression using soft

pattern matching Computer Journal to

app ear

I H Witten R M Neal and J G Cleary Arith

metic co ding for data compression Communica

tions of the ACM June