<<

Color Do cuments on the Web with DjVu

Patrick Ha ner, Yann LeCun, Leon Bottou, Paul Howard, Pascal Vincent, Bill Riemers

AT&T Labs - Research

100 Schulz Drive, Red Bank, NJ 07701, USA

fha ner,yann,leonb,pgh,vincent,b [email protected]

Abstract catalogs for e-commerce sites, online publishing, forms

pro cessing, and scienti c publication, are in need of an

We present a new technique

ecient compression technique for color do cuments.

cal led \DjVu" that is speci cal ly geared towards the

The availability of low-cost, high quality color scan-

compression of scanned documents in color at high

ners, the recent emergence of high-sp eed pro duction

resolution. With DjVu, a magazine page in color at

color scanners, and the app earance of ultra high res-

300dpi typical ly occupies between 40KB and 80KB,

olution digital cameras op ens the do or to such appli-

approximately 5 to 10 times better than JPEG for

cations.

a similar level of readability. Using a combination

of Hidden Markov Model techniques and MDL-driven

Standard color image compression algorithms are

heursitics, DjVu rst classi es each in the image

inadequate for such applications b ecause they pro-

as either foreground text, drawings or background

duces excessively large les if one wants to preserve

pictures, photos, paper texture. The pixel categories

the readabilityofthe text. Compressed with JPEG,

form a bitonal image which is compressed using a pat-

a color image of atypical magazine page scanned at

tern matching technique that takes advantage of the

100dpi dots per inch is around 100-200KB, and is

similarities between character shapes. A progessive,

barely readable. The same page at 300dpi has accept-

-based compression technique, combined with

able quality, but o ccupies 300-600 KB. These sizes are

a masking algorithm, is then used to compress the

impractical for online do cument browsing, even with

foreground and background images at lower resolution

broadband connections.

while minimizing the number of bits spent on the

Preserving the readability of the text and the sharp-

that are not visible in the foreground and background

ness of line art requires high resolution and ecient

planes. Encoders, decoders, and real-time, memory ef-

co ding of sharp edges typically 300dpi. On the other

cient plug-ins for various web browsers are available

hand preserving the app earance of continuous-tone

for al l the major platforms.

images and background pap er textures do es not re-

quire as high a resolution typically 100dpi An obvi-

1 Intro duction

ous waytotake advantage of this is to segment these

With the generalized use of the Internet and the

elements into separate layers. The foreground layer

declining cost of scanning and storage hardware, do c-

would contain the text and line drawings, while the

uments are increasingly archived, communicated, and

background layer would contain continuous-tone pic-

manipulated in digital form rather than in pap er

tures and background textures.

form. The growing need for instant access to informa-

The separation metho d brings another considerable

tion makes the screen the preferred display

advantage. Since the text layer is separated, it can b e

medium.

stored in a chunk at the b eginning of the image le

Compression technology for bitonal black and

and deco ded by the viewer as so on as it arrives in the

white do cument image archives has a long history

client machine.

see [9] and references therein. It is the basis of a

large and rapidly growing industry with widely ac- Overall, the requirements for an acceptable user ex-

cepted standards Group 3, MMR/Group 4, and less p erience are as follows: The text should app ear on

p opular and emerging standards JBIG1, JBIG2. the screen after only a few seconds delay. This means

The last few years have seen a growing demand for that the text layer must t in 20-40KB assuming a

a technology that could handle color do cuments in a 56Kb/sec connection. The pictures, and backgrounds

e ective manner. Such applications as online digital would app ear next, improving the as

libraries with ancient or historical do cuments, online more bits arrive. The overall size of the le should

A pixel in the deco ded image is constructed as follows: be on the order of 50 to 100 KB to keep the over-

if the corresp onding pixel in the mask image is 0, the all transmission time and storage requirements within

output pixel takes the value of the corresp onding pixel reasonable b ounds.

in the appropriately up-sampled background image. If

Large images are also problematic during the de-

the mask pixel is 1, the pixel color is taken from the

compression pro cess. A magazine-size page at 300dpi

foreground image.

is 3300 pixels high and 2500 pixels wide and o ccupies

25 MB of memory in uncompressed form, more than

The mask image is enco ded with a new bi-level im-

what the average PC can prop erly handle. A practi-

age compression algorithm dubb ed JB2. It is a varia-

cal do cument should therefore keep the

tion on AT&T's prop osal to the emerging JBIG2 stan-

image in a compressed form in the memory of the ma-

dard. The basic idea of JB2 is to lo cate individual

chine and only decompress on-demand the pixels that

shap es on the page such as characters, and use a

are b eing displayed on the screen.

shap e clustering algorithm to nd similarities b etween

shap es [1, 5]. Shap es that are representativeof each

The DjVu do cument image compression tech-

cluster or in a cluster by themselves are co ded as in-

nique [2] describ ed in this pap er addresses all the

dividual with a metho d similar to JBIG1. A

ab ove mentioned problems. With DjVu, pages

given pixel is co ded with arithmetic co ding using pre-

scanned at 300dpi in full color can be compressed

viously co ded and neighb oring pixels as a context or

down to 30 to 80 KB les from 25 MB originals with

predictor. Other shap es in a cluster are co ded using

excellent quality. This puts the size of high-quality

the cluster prototyp e as a context for the arithmetic

scanned pages in the same order of magnitude as an

co der, thereby greatly reducing the required number

average HTML page which are around 50KB on aver-

of bits since shap es in a same cluster have many pixels

age. DjVu pages are displayed progressively within a

in common. In lossy mo de, shap es that are suciently

web browser window through a plug-in, which allows

similar to the cluster prototyp e may b e substituted by

easy panning and zo oming of very large images with-

the prototyp e. A another chunk of bits contains a list

out generating the fully deco ded 25MB image. This

of shap e indices together with the p osition at which

is made p ossible by storing partially deco ded images

they should b e painted on the page. all of this is co ded

in a data structure that typically o ccupies 2MB, from

using arithmetic co ding.

which the pixels actually displayed on the screen can

b e deco ded on the y.

For the background and foreground images, DjVu

This pap er gives an overview of the DjVu technol-

uses a progressive, wavelet-based compression algo-

ogy, which is describ ed in more details in [2]. The

rithm called IW44. IW44 o ers many key advan-

rst two sections explain the general principles used

tages over existing continuous-tone images compres-

in DjVu compression and decompression. The remain-

sion metho ds. First, the can be

ing two sections 4 and 5 study the b ehavior of the the

p erformed entirely without multiplication op erations,

foreground/background segmentation algorithm. The

relying exclusively on shifts and adds, thereby greatly

last section details unique features that contribute to

reducing the computational requirements. Second, the

the p erformance of DjVu.

internal memory data structure for IW44 images al-

lows in-place progressive deco ding of the wavelet co-

2 The DjVu Compression Metho d

ecients without copies, and uses an ecient sparse

tree representation. Third, the data structure allows

The basic idea b ehind DjVu is to separate the text

ecient on-the- y rendering of any sub-image at any

from the background and pictures and to use dif-

prescrib ed resolution, in a time prop ortional to the

ferent techniques to compress each of those comp o-

number of rendered pixels not image pixels. This

nents. Traditional metho ds are either designed to

last feature is particularly useful for ecient panning

compress natural images with few edges JPEG, or

and zo oming through large images. Lastly, a mask-

to compress black and white do cument images al-

ing technique based on multiscale successive pro jec-

most entirely comp osed of sharp edges Group 3,

tions [4] is used to avoid sp ending bits to co de areas of

MMR/Group 4, and JBIG1. The DjVu technique

the background that are covered by foreground char-

improves on b oth and combines the b est of b oth ap-

acters or drawings. Both JB2 and IW44 rely on a new

proaches. A foreground/background separation algo-

typ e of adaptive binary arithmetic co der called the

rithm generates three images from which the original

ZP-co der[3], that squeezes out any remaining redun-

image can be reconstructed: the background image,

dancy to within a few p ercent of the Shannon limit.

the foreground image and the mask image. The rst

The ZP-co der is adaptive and faster than other ap-

two are low-resolution color images generally 100dpi

proximate binary arithmetic co ders.

for the background and 25dpi for the foreground, and

The idea of foreground/background representa- the latter is a high-resolution bi-level image 300dpi.

Compression None GIF JPEG DjVu

3 The DjVu Browser Plug-in

hobby p15 24715 1562 469 58

Browsers must provide a very fast resp onse, smo oth

medical dict. 16411 1395 536 110

zo oming and scrolling abilities, good color repro duc-

time zone 9174 576 249 36

tion and sharp text and pictures. These requirements

co okb o ok 12128 1000 280 52

imp ose stringent constraints on the browsing software.

hobby p17 23923 1595 482 52

The full resolution color image of a page requires ab out

U.S. Constit. 31288 2538 604 134

25 MB of memory. Decompressing such images b efore

hobbyp2 23923 1213 383 68

displaying them would exceed the memory limits of

ATT Olympic 23946 955 285 41

average desktop .

We develop ed a web browser plug-in that supp orts

all the ma jor browsers on all the ma jor OS platforms.

Table 1: Compressed les sizes in KB for 8 docu-

Downloaded images are rst pre-deco ded into an in-

ments using the fol lowing compression methods: no

ternal memory data structure that o ccupies approx-

compression, GIF on the 150dpi image, JPEG on

imately 2MB per page. The piece of the image dis-

the 300dpi image with quality 20 and DjVu with

played in the browser window is deco ded on-the- y

300dpi mask, and 100dpi backgrounds. The visual

from this data structure as the user pans around the

quality of the compressed images can be examined at

page. Unlike many do cument browsers, each page of a

http://www.djvu.att.com/examples/comp

DjVu do cument is asso ciated with a single URL. Be-

hind the scenes, the plug-in implements

caching and sharing. This design allows the digital

tion is a key element of the MRC/T.44 standard

library designer to set up a navigation interface using

prop osal[8]. Section 6 rep orts the gain from using JB2

well-known Web technologies like HTML, JavaScript,

and IW44 rather than the traditional MMR/Group 4

or Java. This provides more exibility than other do c-

and JPEG which are part of the T.44 standard.

ument browsing systems where multi-page do cuments

are treated as a single entity, and the viewer handles

According to an indep endent test by Inglis [6]

the navigation b etween the pages. The DjVu plug-in

with bitonal scanned do cuments, JB2 in lossless mo de

supp orts hyp erlinks in DjVu do cuments by allowing

achieves an average compression ratio of 26.5, which

the content designer to sp ecify active regions on the

can b e compared to 13.5 for MMR/Group 4, and 19.4

image which links to a given URL when clicked up on.

for JBIG1. The lossy mo de brings another factor of 2

to 3 over lossless, with more improvement on mostly

textual images, and less on images with pictures and

4 The Foreground/Background Seg-

in low-quality images.

menter

The p erformance of IW44 is very similar to the b est

The rst phase of the foreground/background sep-

published wavelet-based enco ders. The size of IW44

aration is based on two-dimensional Hidden Markov

1

images is typically 50 to 70 that of JPEG for the

mo dels with two states foreground and background

same signal to noise ratio. IW44 is particularly go o d

and single Gaussian distributions. While the Gaussian

at high compression ratios, and for images with few

parameters means and covariances are estimated on

2

highly textured areas.

lo cal regions of the image, the \consolidated" tran-

sition probability between the foreground state and

On color do cuments, the full DjVu metho d with

the background state is regarded as an external con-

foreground/background separation can reach compres-

stant, the choice of which will b e discussed in the next

sion ratios of 500:1 to 1000:1. As shown in table 1, typ-

section.

ical letter-size color do cuments at 300dpi catalog or

magazine page compressed with DjVu o ccupy 30KB

This initial foreground separation stage is designed

to 80KB. Occasionally, larger do cuments, or do cument

to prefer over-segmentation, so that no characters are

with lots of highly detailed pictures or handwriting

dropp ed. As a consequence, it may erroneously put

may o ccupy 80 to 140KB. This is 5 to 10 times b et-

highly-contrasted pieces of in the fore-

ter than JPEG for a similar level of legibility of the

ground. Avariety of lters must b e applied to the re-

text. DjVu is particularly go o d at repro ducing ancient

sulting foreground image so as to eliminate the most

do cuments with textured pap er.

obvious mistakes.

Results and examples are available from the DjVu

1

Equivalent to causal Markov Random Fields.

at http://www.djvu.att.com/djvu.

2

As we use 2x2 pixel cliques, there are in fact 16 transition

More examples are available from many commercial

patterns, whose probabilities are expressed as a function of one

and non-commercial users of DjVu on the Internet. common transition probability. 4 4 4 x 10 x 10 x 10 The main lter is designed to b e as general as p ossi- 10 9 18 mask mask mask

background background

voids heuristics that would have to b e tuned ble and a background foreground foreground foreground

9 8 16

on hundreds of di erent kinds of do cuments. Since

8

the goal is compression, the problem is to decide, for 7 14

h foreground blob found by the previous algorithm, eac 7

6 12

whether it is preferable to actual ly co de it as fore-

6

kground. Two comp eting strategies ground or as bac 5 10

5

are asso ciated with data-generating mo dels. Using

4 8

um Description Length MDL approach[7],

a Minim 4

west the preferred strategy is the one that yields the lo 3 6

3

overall co ding cost, which is the sum of the cost of 2 4

2

co ding the mo del parameters and the cost of co ding

Like most

the error with resp ect to this \ideal" mo del. 1 1 2

MDL approaches used for segmentation [7], the moti- 0 0 0

2.5 5 7.5 2.5 5 7.5 2.5 5 7.5

vation is to obtain a system with very few parameters

Transition −log probability Transition −log probability Transition −log probability

to hand-tune. However, the MDL principle is used

Mail order XVI I century Textured

here to make only one decision, thus avoiding the time

catalog book do cument

consuming minimization of a complex ob jective func-

tion. To co de the blob as part of the \smo oth" back-

Figure 1: DjVu les sizes as a function of the

ground only requires a background mo del. To co de

negative log probability of the foreground/background

the blob as a piece of foreground that sticks out of

transition, reported for three documents. The

the background requires a foreground mo del, a back-

rst two are browsable on the Internet at

ground mo del and a mask mo del.

www.djvu.att.com/djvu/cat/sharp erim age/p 0009 .djv u

The background mo del assumes that the color of a

and www.djvu.att.com/djvu/antic s/pha rm/p 0001. djvu .

pixel is the average of the colors of the closest back-

ground pixels that can be found up and to the left.

The foreground mo del assumes that the color of a blob

enco ding a mask that includes images elements, -

is uniform. What remains to b e co ded is the b ound-

ture and noise. As it converges to zero, the text gets

aries of the mask: the mo del we use tends to favor

enco ded as background. Note that, to avoid inter-

horizontal and vertical b oundaries.

ferences, the ltering stage was not applied. In each

Thus, in the main lter, the background mo del al-

area plot, the vertical line corresp onds to the value of

lows a slow drift in the color, whereas the foreground

the transition probability for whichahuman observed

mo del assumes the color to be constant in a con-

the b est do cument quality. It has b een observed that,

nected comp onent. This di erence is critical to break

for most do cument, this p oint corresp onds to the b est

the symmetry b etween the foreground and the back-

compression rate. This prop erty greatly facilitates the

ground.

tuning of the segmenter.

6 Improvements due to JB2 and

5 Optimizing the segmenter

wavelet compressions

This section shows, on one example, howwe de ned

a semi-automatic tuning pro cedure for the segmenter.

Assuming a satisfactory foreground/background

separation, we can fo cus on the separate compres-

In the Markov mo del that drives the fore-

sion of the three sub-images. Our test sample is com-

ground/background separation, the transition prob-

p osed of 70 do cument images and contains highly tex-

ability between the foreground and the background

tured backgrounds, handwriting, mathematical sym-

states could not be estimated on the data, and was

b ols, hand drawings, and musical scores.

considered as a heuristic choice. The prop ortion of

For each image, our foreground/background separa- foreground decreases monotonously as this transition

tion algorithm pro duces 3 sub-images. The raw mask probability decreases. When it reaches zero, no fore-

with 1 bit/pixel is 24 times smaller than the origi- ground is left. Figure 1 shows the impact of this tran-

nal . The raw background image is 3  3=9 times sition probability on three very di erenttyp es of do c-

smaller. The raw foreground image is 12  12 = 144 uments. The area plots showhow the DjVu le size,

times smaller. The raw multi-layer image yields a which is the sum of the number of used to enco de

compression rate of 6.25:1, as shown on the left of Fig- the mask, background and foreground layers, evolve

ure 2. On these sub-images, an ob jective comparison as a function of this probability. When it is high, the

between DjVu and traditional approaches is p ossible. do cument is over-segmented and bits are wasted in

out implying prohibitivedownload times for the end-

user with full-color 300dpi pages o ccupying only 40

80 KB, black and white pages only 15 to 40 KB, Mask to

958K CCITT−G4 14:1 t b o oks, where most of the color is on the Raw Mask and ancien

Image 68K kground, 30 to 60 KB 23,000K bac

Standard

The UNIX version of the DjVu compres- Fg 160K JPEG 35:1 Compression

116K

software is available free for No sion/decompression

compression Fg 5K

non-commercial use at http://www.djvu.att.com.

JPEG 59:1

le format sp eci cation and the DjVu reference Bg: 43K The

6.25:1

library is available in source form at the same URL.

Bg

u browser plug-in is available for Linux, Win- 2,555K IW44 The DjV

52:1

ws 95/98/NT, Mac, and various UNIX platforms. JB2 do

30:1

The ab oveweb site also contains a digital library with

ver 1000 pages of scanned do cuments from various Mask o 32K IW44 DjVu origins.

103:1 Fg 3K compression

u is already used by a wide variety of commer- 59K DjV

Bg:24K

cial and non-commercial users on the web, including

University Micro lm Inc. for their 22 million page

Early English Bo ok Online service.

Figure 2: For a given foreground/background separa-

tion, this gure compares, for the three sub-images,

References

the fol lowing con gurations: i no compression, ii

[1] R. N. Ascher and G. Nagy. A means for achieving

standard compression techniques such as JPEG and

a high degree of compaction on scan-digitized printed

MMR/Group 4 and iii compression techniques used

text. IEEE Trans. Comput., C-23:1174{1179, Novem-

in DjVu. Assuming we start from a 23MBytes raw

b er 1974.

document image, the average size we can expect for

[2] L. Bottou, P. Ha ner, P. G. Howard, P. Simard,

each sub-image is reported in the corresponding block.

Y. Bengio, and Y. LeCun. High quality do cument

The arrows show the techniques used and the compres-

image compression with djvu. Journal of Electronic

sion rates obtained.

Imaging, 73:410{428, 1998.

[3] L. Bottou, P.G.Howard, and Y. Bengio. The Z-co der

We measure, for each image, the gain in compression

adaptive binary co der. In Proceedings of IEEE Data

rate brought by the algorithms used in DjVu when

Compression Conference, pages 13{22, Snowbird, UT,

compared to the b est widely available compression

1998.

standard for this typ e of image MMR/Group 4 for

[4] L. Bottou and S. Pigeon. of par-

the mask, JPEG for the foreground and background.

tially masked still images. In Proceedings of IEEE Data

Figure 2 rep orts compression rates for the 70 images

Compression Conference, Snowbird, UT, March-April

group ed together.

1998.

Comparing JPEG with IW44 for the background

layer is p erformed by imp osing the same mean square

[5] P.G.Howard. Text image compression using soft pat-

error on visible non-masked pixels. Figure 2 shows

tern matching. Computer Journal, 402/3:146{156,

1997.

that the overall compression ratio for the background

sub-images improves from 59:1 with JPEG to 103:1

[6] Stuart Inglis. Lossless Document Image Compression.

with IW44. Overall, Figure 2 shows that the novel

PhD thesis, UniversityofWaikato, March 1999.

compression techniques used in DjVu to compress the

mask, foreground and background sub-images com-

[7] W. Niblack J. Sheinvald, B. Dom and D. Steele. Un-

press our typical 23MBytes image into a 59 KBytes

sup ervised image segmentation using the minimum de-

scription length principle. In Proceedings of ICPR 92,

DjVu le. This is ab out half what wewould have b een

1992.

obtained by combining the foreground/background

segmentor with JPEG and MMR/Group 4.

[8] MRC. Mixed rater content MRC mo de. ITU Rec-

ommendation T. 44, 1997.

7 Conclusion

DjVu, as a new compression technique for color do c-

[9] I. H. Witten, A. Mo at, and T. C. Bell. Managing

ument images, lls the gap b etween the world of pap er

Gigabytes: Compressing and Indexing Documents and

and the world of bits. It allows high-quality scanned

Images. Van Nostrand Reinhold, New York, 1994.

do cument to b e easily published on the Internet, with-