
Color Documents on the Web with DjVu

Patrick Haffner, Yann LeCun, Leon Bottou, Paul Howard, Pascal Vincent, Bill Riemers

AT&T Labs - Research

Schulz Drive, Red Bank, NJ, USA

{haffner,yann,leonb,pgh,vincent,bcr}@research.att.com

Abstract

We present a new technique called DjVu that is specifically geared towards the compression of scanned documents in color at high resolution. With DjVu, a magazine page in color at dpi typically occupies between KB and KB, approximately to times better than JPEG for a similar level of readability. Using a combination of Hidden Markov Model techniques and MDL-driven heuristics, DjVu first classifies each pixel in the image as either foreground (text, drawings) or background (pictures, photos, paper texture). The pixel categories form a bitonal image which is compressed using a pattern matching technique that takes advantage of the similarities between character shapes. A progressive, wavelet-based compression technique, combined with a masking technique, is then used to compress the foreground and background images at lower resolution while minimizing the number of bits spent on the pixels that are not visible in the foreground and background planes. Encoders, decoders, and real-time, memory-efficient plug-ins for various web browsers are available for all the major platforms.

1 Introduction

With the generalized use of the Internet and the declining cost of scanning and storage hardware, documents are increasingly archived, communicated, and manipulated in digital form rather than in paper form. The growing need for instant access to information makes the screen the preferred display medium.

Compression technology for bitonal (black and white) document image archives has a long history (see and references therein). It is the basis of a large and rapidly growing industry with widely accepted standards (Group 3, MMR/Group 4) and less popular and emerging standards (JBIG, JBIG2).

The last few years have seen a growing demand for a technology that could handle color documents in an effective manner. Such applications as online digital libraries with ancient or historical documents, online catalogs for e-commerce sites, online publishing, forms processing, and scientific publication are in need of an efficient compression technique for color documents. The availability of low-cost, high quality color scanners, the recent emergence of high-speed production color scanners, and the appearance of ultra high resolution digital cameras open the door to such applications.

Standard color image compression algorithms are inadequate for such applications because they produce excessively large files if one wants to preserve the readability of the text. Compressed with JPEG, a color image of a typical magazine page scanned at dpi (dots per inch) is around KB and is barely readable. The same page at dpi has acceptable quality but occupies KB. These sizes are impractical for online document browsing, even with broadband connections.

Preserving the readability of the text and the sharpness of line art requires high resolution and efficient coding of sharp edges (typically dpi). On the other hand, preserving the appearance of continuous-tone images and background paper textures does not require as high a resolution (typically dpi). An obvious way to take advantage of this is to segment these elements into separate layers. The foreground layer would contain the text and line drawings, while the background layer would contain continuous-tone pictures and background textures.

The separation method brings another considerable advantage. Since the text layer is separated, it can be stored in a chunk at the beginning of the image file and decoded by the viewer as soon as it arrives in the client machine.

Overall, the requirements for an acceptable user experience are as follows. The text should appear on the screen after only a few seconds delay. This means that the text layer must fit in KB, assuming a Kb/sec connection. The pictures and backgrounds would appear next, improving the image quality as more bits arrive. The overall size of the file should be on the order of to KB to keep the overall transmission time and storage requirements within reasonable bounds.

Large images are also problematic during the decompression process. A magazine-size page at dpi is pixels high and pixels wide and occupies MB of memory in uncompressed form, more than what the average PC can properly handle. A practical document image viewer should therefore keep the image in a compressed form in the memory of the machine and only decompress on demand the pixels that are being displayed on the screen.

The DjVu document image compression technique described in this paper addresses all the above-mentioned problems. With DjVu, pages scanned at dpi in full color can be compressed down to to KB files from MB originals with excellent quality. This puts the size of high-quality scanned pages in the same order of magnitude as an average HTML page (which is around KB on average). DjVu pages are displayed progressively within a web browser window through a plug-in, which allows easy panning and zooming of very large images without generating the fully decoded MB image. This is made possible by storing partially decoded images in a data structure that typically occupies MB, from which the pixels actually displayed on the screen can be decoded on the fly.

This paper gives an overview of the DjVu technology, which is described in more detail in [2]. The first two sections explain the general principles used in DjVu compression and decompression. The remaining two sections (4 and 5) study the behavior of the foreground/background segmentation algorithm. The last section details unique features that contribute to the performance of DjVu.

2 The DjVu Compression Method

The basic idea behind DjVu is to separate the text from the background and pictures and to use different techniques to compress each of those components. Traditional methods are either designed to compress natural images with few edges (JPEG), or to compress black and white document images almost entirely composed of sharp edges (Group 3, MMR/Group 4, and JBIG). The DjVu technique improves on both, and combines the best of both approaches. A foreground/background separation algorithm generates three images from which the original image can be reconstructed: the background image, the foreground image and the mask image. The first two are low-resolution color images (generally dpi for the background and dpi for the foreground), and the latter is a high-resolution bilevel image (dpi).

A pixel in the decoded image is constructed as follows: if the corresponding pixel in the mask image is 0, the output pixel takes the value of the corresponding pixel in the appropriately upsampled background image. If the mask pixel is 1, the pixel color is taken from the foreground image.

The mask image is encoded with a new bilevel image compression algorithm dubbed JB2. It is a variation on AT&T's proposal to the emerging JBIG2 standard. The basic idea of JB2 is to locate individual shapes on the page (such as characters), and use a shape clustering algorithm to find similarities between shapes. Shapes that are representative of each cluster (or in a cluster by themselves) are coded as individual bitmaps with a method similar to JBIG. A given pixel is coded with arithmetic coding using previously coded and neighboring pixels as a context (or predictor). Other shapes in a cluster are coded using the cluster prototype as a context for the arithmetic coder, thereby greatly reducing the required number of bits, since shapes in the same cluster have many pixels in common. In lossy mode, shapes that are sufficiently similar to the cluster prototype may be substituted by the prototype. Another chunk of bits contains a list of shape indices together with the position at which they should be painted on the page. All of this is coded using arithmetic coding.

For the background and foreground images, DjVu uses a progressive wavelet-based compression algorithm called IW44. IW44 offers many key advantages over existing continuous-tone image compression methods. First, the wavelet transform can be performed entirely without multiplication operations, relying exclusively on shifts and adds, thereby greatly reducing the computational requirements. Second, the internal memory data structure for IW44 images allows in-place progressive decoding of the wavelet coefficients without copies and uses an efficient sparse tree representation. Third, the data structure allows efficient on-the-fly rendering of any sub-image at any prescribed resolution, in a time proportional to the number of rendered pixels (not image pixels). This last feature is particularly useful for efficient panning and zooming through large images. Lastly, a masking technique based on multiscale successive projections [4] is used to avoid spending bits to code areas of the background that are covered by foreground characters or drawings. Both JB2 and IW44 rely on a new type of adaptive binary arithmetic coder called the ZP-coder [3], which squeezes out any remaining redundancy to within a few percent of the Shannon limit. The ZP-coder is adaptive and faster than other approximate binary arithmetic coders.

The idea of foreground/background representation is a key element of the MRC/T.44 standard proposal [8]. Section 6 reports the gain from using JB2 and IW44 rather than the traditional MMR/Group 4 and JPEG, which are part of the T.44 standard.

According to an independent test by Inglis [6], with bitonal scanned documents, JB2 in lossless mode achieves an average compression ratio of , which can be compared to for MMR/Group 4 and for JBIG. The lossy mode brings another factor of to over lossless, with more improvement on mostly textual images and less on images with pictures and on low-quality images.

The performance of IW44 is very similar to the best published wavelet-based encoders. The size of IW44 images is typically to that of JPEG for the same signal to noise ratio. IW44 is particularly good at high compression ratios, and for images with few highly textured areas.

On color documents, the full DjVu method with foreground/background separation can reach compression ratios of to . As shown in Table 1, typical letter-size color documents at dpi (catalog or magazine page) compressed with DjVu occupy KB to KB. Occasionally, larger documents, or documents with lots of highly detailed pictures or handwriting, may occupy to KB. This is to times better than JPEG for a similar level of legibility of the text. DjVu is particularly good at reproducing ancient documents with textured paper.

Results and examples are available from the DjVu web site at http://www.djvu.att.com/djvu. More examples are available from many commercial and non-commercial users of DjVu on the Internet.

Compression      None   GIF   JPEG   DjVu
hobby p
medical dict
time zone
cookbook
hobby p
US Constit
hobby p
AT&T Olympic

Table 1: Compressed file sizes (in KB) for documents using the following compression methods: no compression, GIF on the dpi image, JPEG on the dpi image with quality , and DjVu with dpi mask and dpi backgrounds. The visual quality of the compressed images can be examined at http://www.djvu.att.com/examples/comp

3 The DjVu Browser Plugin

Browsers must provide a very fast response, smooth zooming and scrolling abilities, good color reproduction, and sharp text and pictures. These requirements impose stringent constraints on the browsing software. The full resolution color image of a page requires about MB of memory. Decompressing such images before displaying them would exceed the memory limits of average desktop computers.

We developed a web browser plug-in that supports all the major browsers on all the major OS platforms. Downloaded images are first pre-decoded into an internal memory data structure that occupies approximately MB per page. The piece of the image displayed in the browser window is decoded on the fly from this data structure as the user pans around the page. Unlike many document browsers, each page of a DjVu document is associated with a single URL. Behind the scenes, the plug-in implements information caching and sharing. This design allows the digital library designer to set up a navigation interface using well-known Web technologies like HTML, JavaScript, or Java. This provides more flexibility than other document browsing systems where multi-page documents are treated as a single entity, and the viewer handles the navigation between the pages. The DjVu plug-in supports hyperlinks in DjVu documents by allowing the content designer to specify active regions on the image which link to a given URL when clicked upon.

4 The Foreground/Background Segmenter

The first phase of the foreground/background separation is based on two-dimensional Hidden Markov models¹ with two states (foreground and background) and single Gaussian distributions. While the Gaussian parameters (means and covariances) are estimated on local regions of the image, the consolidated transition probability² between the foreground state and the background state is regarded as an external constant, the choice of which will be discussed in the next section.

¹ Equivalent to causal Markov Random Fields.

² As we use x pixel cliques, there are in fact transition patterns whose probabilities are expressed as a function of one common transition probability.

This initial foreground separation stage is designed to prefer over-segmentation, so that no characters are dropped. As a consequence, it may erroneously put highly contrasted pieces of photographs in the foreground. A variety of filters must be applied to the resulting foreground image so as to eliminate the most obvious mistakes.

The main filter is designed to be as general as possible and avoids heuristics that would have to be tuned on hundreds of different kinds of documents. Since the goal is compression, the problem is to decide, for each foreground blob found by the previous algorithm, whether it is preferable to actually code it as foreground or as background. Two competing strategies are associated with data-generating models. Using a Minimum Description Length (MDL) approach, the preferred strategy is the one that yields the lowest overall coding cost, which is the sum of the cost of coding the model parameters and the cost of coding the error with respect to this ideal model. Like most MDL approaches used for segmentation [7], the motivation is to obtain a system with very few parameters to hand-tune. However, the MDL principle is used here to make only one decision, thus avoiding the time-consuming minimization of a complex objective function. To code the blob as part of the smooth background only requires a background model. To code the blob as a piece of foreground that sticks out of the background requires a foreground model, a background model and a mask model.

The background model assumes that the color of a pixel is the average of the colors of the closest background pixels that can be found up and to the left. The foreground model assumes that the color of a blob is uniform. What remains to be coded is the boundaries of the mask: the model we use tends to favor horizontal and vertical boundaries.

Thus, in the main filter, the background model allows a slow drift in the color, whereas the foreground model assumes the color to be constant in a connected component. This difference is critical to break the symmetry between the foreground and the background.

5 Optimizing the segmenter

This section shows on one example how we defined a semi-automatic tuning procedure for the segmenter.

In the Markov model that drives the foreground/background separation, the transition probability between the foreground and the background states could not be estimated on the data and was considered as a heuristic choice. The proportion of foreground decreases monotonically as this transition probability decreases. When it reaches zero, no foreground is left. Figure 1 shows the impact of this transition probability on three very different types of documents. The area plots show how the DjVu file size, which is the sum of the number of bits used to encode the mask, background and foreground layers, evolves as a function of this probability. When it is high, the document is over-segmented and bits are wasted in encoding a mask that includes image elements such as texture and noise. As it converges to zero, the text gets encoded as background. (Note that, to avoid interferences, the filtering stage was not applied.) In each area plot, the vertical line corresponds to the value of the transition probability for which a human observed the best document quality. It has been observed that for most documents this point corresponds to the best compression rate. This property greatly facilitates the tuning of the segmenter.

[Figure 1: three stacked area plots of file size (mask, background, foreground) against the transition -log probability. Panels: mail order catalog; XVII century book; textured document.]

Figure 1: DjVu file sizes as a function of the negative log probability of the foreground/background transition, reported for three documents. The first two are browsable on the Internet at wwwdjvuattcomdjvucatsharp erim agep djv u and wwwdjvuattcomdjvuantic spha rmp djvu

6 Improvements due to JB2 and wavelet compressions

Assuming a satisfactory foreground/background separation, we can focus on the separate compression of the three sub-images. Our test sample is composed of document images and contains highly textured backgrounds, handwriting, mathematical symbols, hand drawings and musical scores.

For each image, our foreground/background separation algorithm produces three sub-images. The raw mask, with 1 bit/pixel, is 24 times smaller than the original. The raw background image is 9 times smaller. The raw foreground image is about 144 times smaller. The raw multi-layer image yields a compression rate of 6.25:1, as shown on the left of Figure 2. On these sub-images, an objective comparison between DjVu and traditional approaches is possible.
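As a quick arithmetic check on the size reductions described above, the following sketch recomputes them from the sizes printed in Figure 2 (a 23,000K raw image split into a 958K mask, a 2,555K background and a 160K foreground). The function and variable names are illustrative only, and the sizes are read off the figure, so small rounding differences are expected.

```python
# Raw (uncompressed) sizes in KB, as printed in Figure 2.
RAW_IMAGE_KB = 23_000
RAW_SUBIMAGES_KB = {"mask": 958, "background": 2_555, "foreground": 160}


def reduction_factor(original_kb, reduced_kb):
    """Return how many times smaller `reduced_kb` is than `original_kb`."""
    return original_kb / reduced_kb


def multilayer_ratio(original_kb, subimages_kb):
    """Return the compression ratio of the raw multi-layer image as a whole."""
    return original_kb / sum(subimages_kb.values())


if __name__ == "__main__":
    for name, size_kb in RAW_SUBIMAGES_KB.items():
        factor = reduction_factor(RAW_IMAGE_KB, size_kb)
        print(f"{name}: {factor:.1f} times smaller than the original")
    ratio = multilayer_ratio(RAW_IMAGE_KB, RAW_SUBIMAGES_KB)
    print(f"raw multi-layer image: {ratio:.2f}:1")
```

The mask factor is what one would expect when a 24-bit color pixel is replaced by a 1-bit mask pixel at the same resolution (24-bit color is an assumption here). The background and foreground factors are consistent with subsampling by 3 and by 12 along each axis, although that is an inference from the figure's numbers rather than something stated in this section.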

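The three sub-images are recombined at decoding time with the per-pixel rule given in the description of the compression method: the mask selects, for each full-resolution pixel, either the foreground color or the upsampled background color. The sketch below is a minimal illustration under stated assumptions, not the DjVu decoder: it takes a mask value of 1 to mean foreground, uses nearest-neighbour upsampling by integer factors (the interpolation is not specified in this overview), and the names `compose_page`, `upsample_pixel`, `fg_factor` and `bg_factor` are hypothetical.

```python
def upsample_pixel(image, row, col, factor):
    """Nearest-neighbour lookup into a low-resolution image (assumed scheme)."""
    return image[row // factor][col // factor]


def compose_page(mask, foreground, background, fg_factor, bg_factor):
    """Rebuild the full-resolution image from the three sub-images.

    mask: rows of 0/1 values at full resolution.
    foreground, background: rows of (r, g, b) tuples stored at
    1/fg_factor and 1/bg_factor of the mask resolution.
    """
    page = []
    for r, mask_row in enumerate(mask):
        out_row = []
        for c, bit in enumerate(mask_row):
            if bit:
                # Mask pixel set: the color comes from the foreground image.
                out_row.append(upsample_pixel(foreground, r, c, fg_factor))
            else:
                # Mask pixel clear: the color comes from the background image.
                out_row.append(upsample_pixel(background, r, c, bg_factor))
        page.append(out_row)
    return page


if __name__ == "__main__":
    WHITE, BLACK, BLUE = (255, 255, 255), (0, 0, 0), (0, 0, 255)
    mask = [[0, 1, 1, 0],
            [0, 0, 1, 0]]
    foreground = [[BLACK, BLUE]]   # one row, two columns (factor 2)
    background = [[WHITE]]         # a single pixel (factor 4)
    for row in compose_page(mask, foreground, background, 2, 4):
        print(row)
```

Because each output pixel is computed independently from the three stored images, a viewer could apply the same rule to just the visible window instead of the whole page, which is in the spirit of the on-the-fly rendering described for the plug-in.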
[Figure 2: block diagram, summarized here as a table. Raw image: 23,000K.]

Sub-image     No compression   Standard compression     DjVu compression
Mask          958K             68K (CCITT-G4, 14:1)     32K (JB2, 30:1)
Foreground    160K             5K (JPEG, 35:1)          3K (IW44, 52:1)
Background    2,555K           43K (JPEG, 59:1)         24K (IW44, 103:1)
Total         6.25:1           116K                     59K

Figure 2: For a given foreground/background separation, this figure compares, for the three sub-images, the following configurations: (i) no compression, (ii) standard compression techniques such as JPEG and MMR/Group 4, and (iii) compression techniques used in DjVu. Assuming we start from a 23 MBytes raw document image, the average size we can expect for each sub-image is reported in the corresponding block. The arrows show the techniques used and the compression rates obtained.

We measure, for each image, the gain in compression rate brought by the algorithms used in DjVu when compared to the best widely available compression standard for this type of image (MMR/Group 4 for the mask, JPEG for the foreground and background). Figure 2 reports compression rates for the images grouped together.

Comparing JPEG with IW44 for the background layer is performed by imposing the same mean square error on visible (non-masked) pixels. Figure 2 shows that the overall compression ratio for the background sub-images improves from 59:1 with JPEG to 103:1 with IW44. Overall, Figure 2 shows that the novel compression techniques used in DjVu to compress the mask, foreground and background sub-images compress our typical 23 MBytes image into a 59 KBytes DjVu file. This is about half of what we would have obtained by combining the foreground/background segmenter with JPEG and MMR/Group 4.

7 Conclusion

DjVu, as a new compression technique for color document images, fills the gap between the world of paper and the world of bits. It allows high-quality scanned documents to be easily published on the Internet without implying prohibitive download times for the end user, with full-color dpi pages occupying only to KB, black and white pages only to KB, and ancient books, where most of the color is on the background, to KB.

The UNIX version of the DjVu compression/decompression software is available free for non-commercial use at http://www.djvu.att.com. The DjVu reference file format specification and the library are available in source form at the same URL. The DjVu browser plug-in is available for Linux, Windows NT, Mac, and various UNIX platforms. The above web site also contains a digital library with over pages of scanned documents from various origins.

DjVu is already used by a wide variety of commercial and non-commercial users on the web, including University Microfilm Inc. for their million page Early English Book Online service.

References

[1] R. N. Ascher and G. Nagy. A means for achieving a high degree of compaction on scan-digitized printed text. IEEE Trans. Comput., C, November.

[2] L. Bottou, P. Haffner, P. G. Howard, P. Simard, Y. Bengio, and Y. LeCun. High quality document image compression with DjVu. Journal of Electronic Imaging.

[3] L. Bottou, P. G. Howard, and Y. Bengio. The Z-coder adaptive binary coder. In Proceedings of IEEE Data Compression Conference, pages , Snowbird, UT.

[4] L. Bottou and S. Pigeon. Lossy compression of partially masked still images. In Proceedings of IEEE Data Compression Conference, Snowbird, UT, March-April.

[5] P. G. Howard. Text image compression using soft pattern matching. Computer Journal.

[6] Stuart Inglis. Lossless Document Image Compression. PhD thesis, University of Waikato, March.

[7] W. Niblack, J. Sheinvald, B. Dom, and D. Steele. Unsupervised image segmentation using the minimum description length principle. In Proceedings of ICPR.

[8] MRC. Mixed raster content (MRC) mode. ITU Recommendation T.44.

[9] I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, New York.