Color Do cuments on the Web with DjVu
Patrick Ha ner, Yann LeCun, Leon Bottou, Paul Howard, Pascal Vincent, Bill Riemers
AT&T Labs - Research
100 Schulz Drive, Red Bank, NJ 07701, USA
fha ner,yann,leonb,pgh,vincent,b [email protected]
Abstract catalogs for e-commerce sites, online publishing, forms
pro cessing, and scienti c publication, are in need of an
We present a new image compression technique
ecient compression technique for color do cuments.
cal led \DjVu" that is speci cal ly geared towards the
The availability of low-cost, high quality color scan-
compression of scanned documents in color at high
ners, the recent emergence of high-sp eed pro duction
resolution. With DjVu, a magazine page in color at
color scanners, and the app earance of ultra high res-
300dpi typical ly occupies between 40KB and 80KB,
olution digital cameras op ens the do or to such appli-
approximately 5 to 10 times better than JPEG for
cations.
a similar level of readability. Using a combination
of Hidden Markov Model techniques and MDL-driven
Standard color image compression algorithms are
heursitics, DjVu rst classi es each pixel in the image
inadequate for such applications b ecause they pro-
as either foreground text, drawings or background
duces excessively large les if one wants to preserve
pictures, photos, paper texture. The pixel categories
the readabilityofthe text. Compressed with JPEG,
form a bitonal image which is compressed using a pat-
a color image of atypical magazine page scanned at
tern matching technique that takes advantage of the
100dpi dots per inch is around 100-200KB, and is
similarities between character shapes. A progessive,
barely readable. The same page at 300dpi has accept-
wavelet-based compression technique, combined with
able quality, but o ccupies 300-600 KB. These sizes are
a masking algorithm, is then used to compress the
impractical for online do cument browsing, even with
foreground and background images at lower resolution
broadband connections.
while minimizing the number of bits spent on the pixels
Preserving the readability of the text and the sharp-
that are not visible in the foreground and background
ness of line art requires high resolution and ecient
planes. Encoders, decoders, and real-time, memory ef-
co ding of sharp edges typically 300dpi. On the other
cient plug-ins for various web browsers are available
hand preserving the app earance of continuous-tone
for al l the major platforms.
images and background pap er textures do es not re-
quire as high a resolution typically 100dpi An obvi-
1 Intro duction
ous waytotake advantage of this is to segment these
With the generalized use of the Internet and the
elements into separate layers. The foreground layer
declining cost of scanning and storage hardware, do c-
would contain the text and line drawings, while the
uments are increasingly archived, communicated, and
background layer would contain continuous-tone pic-
manipulated in digital form rather than in pap er
tures and background textures.
form. The growing need for instant access to informa-
The separation metho d brings another considerable
tion makes the computer screen the preferred display
advantage. Since the text layer is separated, it can b e
medium.
stored in a chunk at the b eginning of the image le
Compression technology for bitonal black and
and deco ded by the viewer as so on as it arrives in the
white do cument image archives has a long history
client machine.
see [9] and references therein. It is the basis of a
large and rapidly growing industry with widely ac- Overall, the requirements for an acceptable user ex-
cepted standards Group 3, MMR/Group 4, and less p erience are as follows: The text should app ear on
p opular and emerging standards JBIG1, JBIG2. the screen after only a few seconds delay. This means
The last few years have seen a growing demand for that the text layer must t in 20-40KB assuming a
a technology that could handle color do cuments in a 56Kb/sec connection. The pictures, and backgrounds
e ective manner. Such applications as online digital would app ear next, improving the image quality as
libraries with ancient or historical do cuments, online more bits arrive. The overall size of the le should
A pixel in the deco ded image is constructed as follows: be on the order of 50 to 100 KB to keep the over-
if the corresp onding pixel in the mask image is 0, the all transmission time and storage requirements within
output pixel takes the value of the corresp onding pixel reasonable b ounds.
in the appropriately up-sampled background image. If
Large images are also problematic during the de-
the mask pixel is 1, the pixel color is taken from the
compression pro cess. A magazine-size page at 300dpi
foreground image.
is 3300 pixels high and 2500 pixels wide and o ccupies
25 MB of memory in uncompressed form, more than
The mask image is enco ded with a new bi-level im-
what the average PC can prop erly handle. A practi-
age compression algorithm dubb ed JB2. It is a varia-
cal do cument image viewer should therefore keep the
tion on AT&T's prop osal to the emerging JBIG2 stan-
image in a compressed form in the memory of the ma-
dard. The basic idea of JB2 is to lo cate individual
chine and only decompress on-demand the pixels that
shap es on the page such as characters, and use a
are b eing displayed on the screen.
shap e clustering algorithm to nd similarities b etween
shap es [1, 5]. Shap es that are representativeof each
The DjVu do cument image compression tech-
cluster or in a cluster by themselves are co ded as in-
nique [2] describ ed in this pap er addresses all the
dividual bitmaps with a metho d similar to JBIG1. A
ab ove mentioned problems. With DjVu, pages
given pixel is co ded with arithmetic co ding using pre-
scanned at 300dpi in full color can be compressed
viously co ded and neighb oring pixels as a context or
down to 30 to 80 KB les from 25 MB originals with
predictor. Other shap es in a cluster are co ded using
excellent quality. This puts the size of high-quality
the cluster prototyp e as a context for the arithmetic
scanned pages in the same order of magnitude as an
co der, thereby greatly reducing the required number
average HTML page which are around 50KB on aver-
of bits since shap es in a same cluster have many pixels
age. DjVu pages are displayed progressively within a
in common. In lossy mo de, shap es that are suciently
web browser window through a plug-in, which allows
similar to the cluster prototyp e may b e substituted by
easy panning and zo oming of very large images with-
the prototyp e. A another chunk of bits contains a list
out generating the fully deco ded 25MB image. This
of shap e indices together with the p osition at which
is made p ossible by storing partially deco ded images
they should b e painted on the page. all of this is co ded
in a data structure that typically o ccupies 2MB, from
using arithmetic co ding.
which the pixels actually displayed on the screen can
b e deco ded on the y.
For the background and foreground images, DjVu
This pap er gives an overview of the DjVu technol-
uses a progressive, wavelet-based compression algo-
ogy, which is describ ed in more details in [2]. The
rithm called IW44. IW44 o ers many key advan-
rst two sections explain the general principles used
tages over existing continuous-tone images compres-
in DjVu compression and decompression. The remain-
sion metho ds. First, the wavelet transform can be
ing two sections 4 and 5 study the b ehavior of the the
p erformed entirely without multiplication op erations,
foreground/background segmentation algorithm. The
relying exclusively on shifts and adds, thereby greatly
last section details unique features that contribute to
reducing the computational requirements. Second, the
the p erformance of DjVu.
internal memory data structure for IW44 images al-
lows in-place progressive deco ding of the wavelet co-
2 The DjVu Compression Metho d
ecients without copies, and uses an ecient sparse
tree representation. Third, the data structure allows
The basic idea b ehind DjVu is to separate the text
ecient on-the- y rendering of any sub-image at any
from the background and pictures and to use dif-
prescrib ed resolution, in a time prop ortional to the
ferent techniques to compress each of those comp o-
number of rendered pixels not image pixels. This
nents. Traditional metho ds are either designed to
last feature is particularly useful for ecient panning
compress natural images with few edges JPEG, or
and zo oming through large images. Lastly, a mask-
to compress black and white do cument images al-
ing technique based on multiscale successive pro jec-
most entirely comp osed of sharp edges Group 3,
tions [4] is used to avoid sp ending bits to co de areas of
MMR/Group 4, and JBIG1. The DjVu technique
the background that are covered by foreground char-
improves on b oth and combines the b est of b oth ap-
acters or drawings. Both JB2 and IW44 rely on a new
proaches. A foreground/background separation algo-
typ e of adaptive binary arithmetic co der called the
rithm generates three images from which the original
ZP-co der[3], that squeezes out any remaining redun-
image can be reconstructed: the background image,
dancy to within a few p ercent of the Shannon limit.
the foreground image and the mask image. The rst
The ZP-co der is adaptive and faster than other ap-
two are low-resolution color images generally 100dpi
proximate binary arithmetic co ders.
for the background and 25dpi for the foreground, and
The idea of foreground/background representa- the latter is a high-resolution bi-level image 300dpi.
Compression None GIF JPEG DjVu
3 The DjVu Browser Plug-in
hobby p15 24715 1562 469 58
Browsers must provide a very fast resp onse, smo oth
medical dict. 16411 1395 536 110
zo oming and scrolling abilities, good color repro duc-
time zone 9174 576 249 36
tion and sharp text and pictures. These requirements
co okb o ok 12128 1000 280 52
imp ose stringent constraints on the browsing software.
hobby p17 23923 1595 482 52
The full resolution color image of a page requires ab out
U.S. Constit. 31288 2538 604 134
25 MB of memory. Decompressing such images b efore
hobbyp2 23923 1213 383 68
displaying them would exceed the memory limits of
ATT Olympic 23946 955 285 41
average desktop computers.
We develop ed a web browser plug-in that supp orts
all the ma jor browsers on all the ma jor OS platforms.
Table 1: Compressed les sizes in KB for 8 docu-
Downloaded images are rst pre-deco ded into an in-
ments using the fol lowing compression methods: no
ternal memory data structure that o ccupies approx-
compression, GIF on the 150dpi image, JPEG on
imately 2MB per page. The piece of the image dis-
the 300dpi image with quality 20 and DjVu with
played in the browser window is deco ded on-the- y
300dpi mask, and 100dpi backgrounds. The visual
from this data structure as the user pans around the
quality of the compressed images can be examined at
page. Unlike many do cument browsers, each page of a
http://www.djvu.att.com/examples/comp
DjVu do cument is asso ciated with a single URL. Be-
hind the scenes, the plug-in implements information
caching and sharing. This design allows the digital
tion is a key element of the MRC/T.44 standard
library designer to set up a navigation interface using
prop osal[8]. Section 6 rep orts the gain from using JB2
well-known Web technologies like HTML, JavaScript,
and IW44 rather than the traditional MMR/Group 4
or Java. This provides more exibility than other do c-
and JPEG which are part of the T.44 standard.
ument browsing systems where multi-page do cuments
are treated as a single entity, and the viewer handles
According to an indep endent test by Inglis [6]
the navigation b etween the pages. The DjVu plug-in
with bitonal scanned do cuments, JB2 in lossless mo de
supp orts hyp erlinks in DjVu do cuments by allowing
achieves an average compression ratio of 26.5, which
the content designer to sp ecify active regions on the
can b e compared to 13.5 for MMR/Group 4, and 19.4
image which links to a given URL when clicked up on.
for JBIG1. The lossy mo de brings another factor of 2
to 3 over lossless, with more improvement on mostly
textual images, and less on images with pictures and
4 The Foreground/Background Seg-
in low-quality images.
menter
The p erformance of IW44 is very similar to the b est
The rst phase of the foreground/background sep-
published wavelet-based enco ders. The size of IW44
aration is based on two-dimensional Hidden Markov
1
images is typically 50 to 70 that of JPEG for the
mo dels with two states foreground and background
same signal to noise ratio. IW44 is particularly go o d
and single Gaussian distributions. While the Gaussian
at high compression ratios, and for images with few
parameters means and covariances are estimated on
2
highly textured areas.
lo cal regions of the image, the \consolidated" tran-
sition probability between the foreground state and
On color do cuments, the full DjVu metho d with
the background state is regarded as an external con-
foreground/background separation can reach compres-
stant, the choice of which will b e discussed in the next
sion ratios of 500:1 to 1000:1. As shown in table 1, typ-
section.
ical letter-size color do cuments at 300dpi catalog or
magazine page compressed with DjVu o ccupy 30KB
This initial foreground separation stage is designed
to 80KB. Occasionally, larger do cuments, or do cument
to prefer over-segmentation, so that no characters are
with lots of highly detailed pictures or handwriting
dropp ed. As a consequence, it may erroneously put
may o ccupy 80 to 140KB. This is 5 to 10 times b et-
highly-contrasted pieces of photographs in the fore-
ter than JPEG for a similar level of legibility of the
ground. Avariety of lters must b e applied to the re-
text. DjVu is particularly go o d at repro ducing ancient
sulting foreground image so as to eliminate the most
do cuments with textured pap er.
obvious mistakes.
Results and examples are available from the DjVu
1
Equivalent to causal Markov Random Fields.
digital library at http://www.djvu.att.com/djvu.
2
As we use 2x2 pixel cliques, there are in fact 16 transition
More examples are available from many commercial
patterns, whose probabilities are expressed as a function of one
and non-commercial users of DjVu on the Internet. common transition probability. 4 4 4 x 10 x 10 x 10 The main lter is designed to b e as general as p ossi- 10 9 18 mask mask mask
background background
voids heuristics that would have to b e tuned ble and a background foreground foreground foreground
9 8 16
on hundreds of di erent kinds of do cuments. Since
8
the goal is compression, the problem is to decide, for 7 14
h foreground blob found by the previous algorithm, eac 7
6 12
whether it is preferable to actual ly co de it as fore-
6
kground. Two comp eting strategies ground or as bac 5 10
5
are asso ciated with data-generating mo dels. Using
4 8
um Description Length MDL approach[7],
a Minim 4
west the preferred strategy is the one that yields the lo 3 6
3
overall co ding cost, which is the sum of the cost of 2 4
2
co ding the mo del parameters and the cost of co ding
Like most
the error with resp ect to this \ideal" mo del. 1 1 2
MDL approaches used for segmentation [7], the moti- 0 0 0
2.5 5 7.5 2.5 5 7.5 2.5 5 7.5
vation is to obtain a system with very few parameters
Transition −log probability Transition −log probability Transition −log probability
to hand-tune. However, the MDL principle is used
Mail order XVI I century Textured
here to make only one decision, thus avoiding the time
catalog book do cument
consuming minimization of a complex ob jective func-
tion. To co de the blob as part of the \smo oth" back-
Figure 1: DjVu les sizes as a function of the
ground only requires a background mo del. To co de
negative log probability of the foreground/background
the blob as a piece of foreground that sticks out of
transition, reported for three documents. The
the background requires a foreground mo del, a back-
rst two are browsable on the Internet at
ground mo del and a mask mo del.
www.djvu.att.com/djvu/cat/sharp erim age/p 0009 .djv u
The background mo del assumes that the color of a
and www.djvu.att.com/djvu/antic s/pha rm/p 0001. djvu .
pixel is the average of the colors of the closest back-
ground pixels that can be found up and to the left.
The foreground mo del assumes that the color of a blob
enco ding a mask that includes images elements, tex-
is uniform. What remains to b e co ded is the b ound-
ture and noise. As it converges to zero, the text gets
aries of the mask: the mo del we use tends to favor
enco ded as background. Note that, to avoid inter-
horizontal and vertical b oundaries.
ferences, the ltering stage was not applied. In each
Thus, in the main lter, the background mo del al-
area plot, the vertical line corresp onds to the value of
lows a slow drift in the color, whereas the foreground
the transition probability for whichahuman observed
mo del assumes the color to be constant in a con-
the b est do cument quality. It has b een observed that,
nected comp onent. This di erence is critical to break
for most do cument, this p oint corresp onds to the b est
the symmetry b etween the foreground and the back-
compression rate. This prop erty greatly facilitates the
ground.
tuning of the segmenter.
6 Improvements due to JB2 and
5 Optimizing the segmenter
wavelet compressions
This section shows, on one example, howwe de ned
a semi-automatic tuning pro cedure for the segmenter.
Assuming a satisfactory foreground/background
separation, we can fo cus on the separate compres-
In the Markov mo del that drives the fore-
sion of the three sub-images. Our test sample is com-
ground/background separation, the transition prob-
p osed of 70 do cument images and contains highly tex-
ability between the foreground and the background
tured backgrounds, handwriting, mathematical sym-
states could not be estimated on the data, and was
b ols, hand drawings, and musical scores.
considered as a heuristic choice. The prop ortion of
For each image, our foreground/background separa- foreground decreases monotonously as this transition
tion algorithm pro duces 3 sub-images. The raw mask probability decreases. When it reaches zero, no fore-
with 1 bit/pixel is 24 times smaller than the origi- ground is left. Figure 1 shows the impact of this tran-
nal . The raw background image is 3 3=9 times sition probability on three very di erenttyp es of do c-
smaller. The raw foreground image is 12 12 = 144 uments. The area plots showhow the DjVu le size,
times smaller. The raw multi-layer image yields a which is the sum of the number of bytes used to enco de
compression rate of 6.25:1, as shown on the left of Fig- the mask, background and foreground layers, evolve
ure 2. On these sub-images, an ob jective comparison as a function of this probability. When it is high, the
between DjVu and traditional approaches is p ossible. do cument is over-segmented and bits are wasted in
out implying prohibitivedownload times for the end-
user with full-color 300dpi pages o ccupying only 40
80 KB, black and white pages only 15 to 40 KB, Mask to
958K CCITT−G4 14:1 t b o oks, where most of the color is on the Raw Mask and ancien
Image 68K kground, 30 to 60 KB 23,000K bac
Standard
The UNIX version of the DjVu compres- Fg 160K JPEG 35:1 Compression
116K
software is available free for No sion/decompression
compression Fg 5K
non-commercial use at http://www.djvu.att.com.
JPEG 59:1
le format sp eci cation and the DjVu reference Bg: 43K The
6.25:1
library is available in source form at the same URL.
Bg
u browser plug-in is available for Linux, Win- 2,555K IW44 The DjV
52:1
ws 95/98/NT, Mac, and various UNIX platforms. JB2 do
30:1
The ab oveweb site also contains a digital library with
ver 1000 pages of scanned do cuments from various Mask o 32K IW44 DjVu origins.
103:1 Fg 3K compression
u is already used by a wide variety of commer- 59K DjV
Bg:24K
cial and non-commercial users on the web, including
University Micro lm Inc. for their 22 million page
Early English Bo ok Online service.
Figure 2: For a given foreground/background separa-
tion, this gure compares, for the three sub-images,
References
the fol lowing con gurations: i no compression, ii
[1] R. N. Ascher and G. Nagy. A means for achieving
standard compression techniques such as JPEG and
a high degree of compaction on scan-digitized printed
MMR/Group 4 and iii compression techniques used
text. IEEE Trans. Comput., C-23:1174{1179, Novem-
in DjVu. Assuming we start from a 23MBytes raw
b er 1974.
document image, the average size we can expect for
[2] L. Bottou, P. Ha ner, P. G. Howard, P. Simard,
each sub-image is reported in the corresponding block.
Y. Bengio, and Y. LeCun. High quality do cument
The arrows show the techniques used and the compres-
image compression with djvu. Journal of Electronic
sion rates obtained.
Imaging, 73:410{428, 1998.
[3] L. Bottou, P.G.Howard, and Y. Bengio. The Z-co der
We measure, for each image, the gain in compression
adaptive binary co der. In Proceedings of IEEE Data
rate brought by the algorithms used in DjVu when
Compression Conference, pages 13{22, Snowbird, UT,
compared to the b est widely available compression
1998.
standard for this typ e of image MMR/Group 4 for
[4] L. Bottou and S. Pigeon. Lossy compression of par-
the mask, JPEG for the foreground and background.
tially masked still images. In Proceedings of IEEE Data
Figure 2 rep orts compression rates for the 70 images
Compression Conference, Snowbird, UT, March-April
group ed together.
1998.
Comparing JPEG with IW44 for the background
layer is p erformed by imp osing the same mean square
[5] P.G.Howard. Text image compression using soft pat-
error on visible non-masked pixels. Figure 2 shows
tern matching. Computer Journal, 402/3:146{156,
1997.
that the overall compression ratio for the background
sub-images improves from 59:1 with JPEG to 103:1
[6] Stuart Inglis. Lossless Document Image Compression.
with IW44. Overall, Figure 2 shows that the novel
PhD thesis, UniversityofWaikato, March 1999.
compression techniques used in DjVu to compress the
mask, foreground and background sub-images com-
[7] W. Niblack J. Sheinvald, B. Dom and D. Steele. Un-
press our typical 23MBytes image into a 59 KBytes
sup ervised image segmentation using the minimum de-
scription length principle. In Proceedings of ICPR 92,
DjVu le. This is ab out half what wewould have b een
1992.
obtained by combining the foreground/background
segmentor with JPEG and MMR/Group 4.
[8] MRC. Mixed rater content MRC mo de. ITU Rec-
ommendation T. 44, 1997.
7 Conclusion
DjVu, as a new compression technique for color do c-
[9] I. H. Witten, A. Mo at, and T. C. Bell. Managing
ument images, lls the gap b etween the world of pap er
Gigabytes: Compressing and Indexing Documents and
and the world of bits. It allows high-quality scanned
Images. Van Nostrand Reinhold, New York, 1994.
do cument to b e easily published on the Internet, with-