Color Documents on the Web with Djvu

Color Do cuments on the Web with DjVu Patrick Ha ner, Yann LeCun, Leon Bottou, Paul Howard, Pascal Vincent, Bill Riemers AT&T Labs - Research 100 Schulz Drive, Red Bank, NJ 07701, USA fha ner,yann,leonb,pgh,vincent,b [email protected] Abstract catalogs for e-commerce sites, online publishing, forms pro cessing, and scienti c publication, are in need of an We present a new image compression technique ecient compression technique for color do cuments. cal led \DjVu" that is speci cal ly geared towards the The availability of low-cost, high quality color scan- compression of scanned documents in color at high ners, the recent emergence of high-sp eed pro duction resolution. With DjVu, a magazine page in color at color scanners, and the app earance of ultra high res- 300dpi typical ly occupies between 40KB and 80KB, olution digital cameras op ens the do or to such appli- approximately 5 to 10 times better than JPEG for cations. a similar level of readability. Using a combination of Hidden Markov Model techniques and MDL-driven Standard color image compression algorithms are heursitics, DjVu rst classi es each pixel in the image inadequate for such applications b ecause they pro- as either foreground text, drawings or background duces excessively large les if one wants to preserve pictures, photos, paper texture. The pixel categories the readabilityofthe text. Compressed with JPEG, form a bitonal image which is compressed using a pat- a color image of atypical magazine page scanned at tern matching technique that takes advantage of the 100dpi dots per inch is around 100-200KB, and is similarities between character shapes. A progessive, barely readable. The same page at 300dpi has accept- wavelet-based compression technique, combined with able quality, but o ccupies 300-600 KB. These sizes are a masking algorithm, is then used to compress the impractical for online do cument browsing, even with foreground and background images at lower resolution broadband connections. while minimizing the number of bits spent on the pixels Preserving the readability of the text and the sharp- that are not visible in the foreground and background ness of line art requires high resolution and ecient planes. Encoders, decoders, and real-time, memory ef- co ding of sharp edges typically 300dpi. On the other cient plug-ins for various web browsers are available hand preserving the app earance of continuous-tone for al l the major platforms. images and background pap er textures do es not re- quire as high a resolution typically 100dpi An obvi- 1 Intro duction ous waytotake advantage of this is to segment these With the generalized use of the Internet and the elements into separate layers. The foreground layer declining cost of scanning and storage hardware, do c- would contain the text and line drawings, while the uments are increasingly archived, communicated, and background layer would contain continuous-tone pic- manipulated in digital form rather than in pap er tures and background textures. form. The growing need for instant access to informa- The separation metho d brings another considerable tion makes the computer screen the preferred display advantage. Since the text layer is separated, it can b e medium. stored in a chunk at the b eginning of the image le Compression technology for bitonal black and and deco ded by the viewer as so on as it arrives in the white do cument image archives has a long history client machine. see [9] and references therein. It is the basis of a large and rapidly growing industry with widely ac- Overall, the requirements for an acceptable user ex- cepted standards Group 3, MMR/Group 4, and less p erience are as follows: The text should app ear on p opular and emerging standards JBIG1, JBIG2. the screen after only a few seconds delay. This means The last few years have seen a growing demand for that the text layer must t in 20-40KB assuming a a technology that could handle color do cuments in a 56Kb/sec connection. The pictures, and backgrounds e ective manner. Such applications as online digital would app ear next, improving the image quality as libraries with ancient or historical do cuments, online more bits arrive. The overall size of the le should A pixel in the deco ded image is constructed as follows: be on the order of 50 to 100 KB to keep the over- if the corresp onding pixel in the mask image is 0, the all transmission time and storage requirements within output pixel takes the value of the corresp onding pixel reasonable b ounds. in the appropriately up-sampled background image. If Large images are also problematic during the de- the mask pixel is 1, the pixel color is taken from the compression pro cess. A magazine-size page at 300dpi foreground image. is 3300 pixels high and 2500 pixels wide and o ccupies 25 MB of memory in uncompressed form, more than The mask image is enco ded with a new bi-level im- what the average PC can prop erly handle. A practi- age compression algorithm dubb ed JB2. It is a varia- cal do cument image viewer should therefore keep the tion on AT&T's prop osal to the emerging JBIG2 stan- image in a compressed form in the memory of the ma- dard. The basic idea of JB2 is to lo cate individual chine and only decompress on-demand the pixels that shap es on the page such as characters, and use a are b eing displayed on the screen. shap e clustering algorithm to nd similarities b etween shap es [1, 5]. Shap es that are representativeof each The DjVu do cument image compression tech- cluster or in a cluster by themselves are co ded as in- nique [2] describ ed in this pap er addresses all the dividual bitmaps with a metho d similar to JBIG1. A ab ove mentioned problems. With DjVu, pages given pixel is co ded with arithmetic co ding using pre- scanned at 300dpi in full color can be compressed viously co ded and neighb oring pixels as a context or down to 30 to 80 KB les from 25 MB originals with predictor. Other shap es in a cluster are co ded using excellent quality. This puts the size of high-quality the cluster prototyp e as a context for the arithmetic scanned pages in the same order of magnitude as an co der, thereby greatly reducing the required number average HTML page which are around 50KB on aver- of bits since shap es in a same cluster have many pixels age. DjVu pages are displayed progressively within a in common. In lossy mo de, shap es that are suciently web browser window through a plug-in, which allows similar to the cluster prototyp e may b e substituted by easy panning and zo oming of very large images with- the prototyp e. A another chunk of bits contains a list out generating the fully deco ded 25MB image. This of shap e indices together with the p osition at which is made p ossible by storing partially deco ded images they should b e painted on the page. all of this is co ded in a data structure that typically o ccupies 2MB, from using arithmetic co ding. which the pixels actually displayed on the screen can b e deco ded on the y. For the background and foreground images, DjVu This pap er gives an overview of the DjVu technol- uses a progressive, wavelet-based compression algo- ogy, which is describ ed in more details in [2]. The rithm called IW44. IW44 o ers many key advan- rst two sections explain the general principles used tages over existing continuous-tone images compres- in DjVu compression and decompression. The remain- sion metho ds. First, the wavelet transform can be ing two sections 4 and 5 study the b ehavior of the the p erformed entirely without multiplication op erations, foreground/background segmentation algorithm. The relying exclusively on shifts and adds, thereby greatly last section details unique features that contribute to reducing the computational requirements. Second, the the p erformance of DjVu. internal memory data structure for IW44 images allows in-place progressive deco ding of the wavelet co- 2 The DjVu Compression Metho d ecients without copies, and uses an ecient sparse tree representation. Third, the data structure allows The basic idea b ehind DjVu is to separate the text ecient on-the- y rendering of any sub-image at any from the background and pictures and to use dif- prescrib ed resolution, in a time prop ortional to the ferent techniques to compress each of those comp o- number of rendered pixels not image pixels. This nents. Traditional metho ds are either designed to last feature is particularly useful for ecient panning compress natural images with few edges JPEG, or and zo oming through large images. Lastly, a mask- to compress black and white do cument images al- ing technique based on multiscale successive pro jec- most entirely comp osed of sharp edges Group 3, tions [4] is used to avoid sp ending bits to co de areas of MMR/Group 4, and JBIG1. The DjVu technique the background that are covered by foreground char- improves on b oth and combines the b est of b oth ap- acters or drawings. Both JB2 and IW44 rely on a new proaches. A foreground/background separation algo- typ e of adaptive binary arithmetic co der called the rithm generates three images from which the original ZP-co der[3], that squeezes out any remaining redun- image can be reconstructed: the background image, dancy to within a few p ercent of the Shannon limit.

Load more