Effective Decompression of JPEG Document Images
The-Anh Pham, Mathieu Delalandre
Abstract—This work concentrates on developing an effective approach for decompressing JPEG document images. Our main goal is targeted to time-critical applications, especially those situated on mobile network infrastructures. To this aim, the proposed approach is designed to work either in the transform domain or in the image spatial plane. Specifically, the image blocks are first classified into smooth blocks (e.g., background, uniform regions) and non-smooth blocks (e.g., text, graphics, line-drawings). Next, the smooth blocks are fully decoded in the transform domain by minimizing the total block boundary variation, which is very efficient to compute. For decoding non-smooth blocks, a novel text model is presented that accounts for the specifics of document content. Additionally, an efficient optimization algorithm is introduced to reconstruct the non-smooth blocks. The proposed approach has been validated by extensive experiments, demonstrating a significant improvement of visual quality, assuming that document images have been encoded at low bit-rates and are thus subject to severe distortion.

Index Terms—Document decompression, JPEG decoding, total variation, soft classification.

I. INTRODUCTION

Currently, due to the quickly expanding variety of portable digital imaging devices (e.g., cameras, smartphones, electronic camera-pen systems), the acquisition of document images has become much more convenient, leading to a huge expansion of document data. In this expeditious evolution, the real challenges in document image analysis (DIA) have shifted toward effective pervasive computing, storage, sharing, and browsing of mass digitized documents. A promising and efficient approach is to exploit the benefits of very low bit-rate compression technologies. Lossless compression algorithms allow the encoded images to be reconstructed exactly, but the achievable compression ratio is not sufficiently high. In contrast, lossy compression algorithms provide very low bit rates at the cost of losing a certain degree of image quality. In addition, the level of quality reduction can be easily controlled by pre-determined parameters. For these reasons, multimedia data, in their current form, are mostly compressed using lossy compression schemes.

With the rapid growth of 3G-/4G-based markets, handheld devices, and infrastructures, both the pervasive computation of document images and the exploitation of related applications are becoming crucial needs for mobile users. Customers want to access and retrieve good-quality images while expecting low bandwidth consumption. Additional needs include fast response time and memory-efficient usage. These constraints imply that document images must be encoded and decoded in a very efficient manner. Several efforts have been carried out for lossless and near-lossless compression methods devoted to document images, such as MRC [1], DjVu [2], DigiPaper [3], and TSMAP/RDOS [4]. Although these attempts were shown to outperform state-of-the-art compression techniques on a particular class of document images, they require new standards for image representation. This constraint is not always acceptable for many applications, especially those on mobile markets. Mobile users, in practice, prefer to keep the existing standards, such as JPEG [5], for the images being accessed.

This work concentrates on improving the visual quality of document images compressed by the JPEG standard. At low bit-rate coding, JPEG-encoded images are subject to heavy distortion from both blocking and ringing artifacts. These artifacts can leave the visual perception of document images impaired, or even render the content illegible. Although a large number of methods have been proposed to address coding artifacts, they suffer from the major issue of expensive computational cost. This aspect prevents them from being applicable to time-critical applications, especially those developed on low-bandwidth and resource-restricted platforms. In addition, most of the existing work is devoted to natural images [6]–[14]. There has been little effort to address the same problem for document content [15]–[17]. Inspired by these facts, we attempt to provide an effective approach for decoding JPEG document images. The proposed approach has been developed to produce a substantial improvement of visual quality while incurring a low computational cost. In doing so, we have restricted our approach to scenarios in which document images have been compressed at very high compression rates and hence are exposed to heavy disturbance from coding artifacts.

The rest of this paper is organized as follows. Section II reviews the most recent research on post-processing JPEG artifacts. Section III provides a global description of the proposed approach, including its three main components: block classification, smooth block decompression, and text block decompression. Next, the block classification process is detailed in Section IV, while the decompression of smooth blocks and text blocks is described in Sections V and VI, respectively. Experimental results are presented in Section VII. Finally, we conclude the paper and outline several lines of future extension in Section VIII.

The authors are with the Computer Science Lab of Polytech Tours school, 64 avenue Jean Portalis, 37200 Tours, France. The-Anh Pham is with Hong Duc University (HDU), Thanh Hoa city, Vietnam. Emails: [email protected], [email protected]
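For orientation, the overall pipeline sketched in the abstract amounts to a two-way dispatch: each 8x8 JPEG block is classified from its dequantized DCT coefficients and then routed to either the smooth-block or the text-block decoder. The Python sketch below is schematic only; the function names, the AC-energy criterion, and the threshold are illustrative assumptions, not the classification rule or decoders actually used in this paper (those are detailed in Sections IV-VI).

```python
import numpy as np

def classify_block(dct_block, ac_energy_threshold=50.0):
    """Label a block 'smooth' if its AC energy is small, else 'text'.

    This criterion and threshold are hypothetical placeholders for the
    block classifier described in Section IV.
    """
    ac_energy = float(np.sum(dct_block ** 2) - dct_block[0, 0] ** 2)
    return "smooth" if ac_energy < ac_energy_threshold else "text"

def decode_image(dct_blocks, decode_smooth, decode_text):
    """Dispatch every block to the appropriate decoder and return the results.

    dct_blocks maps block positions to 8x8 arrays of dequantized DCT
    coefficients; decode_smooth and decode_text are the per-block decoders.
    """
    return {
        pos: (decode_smooth if classify_block(block) == "smooth" else decode_text)(block)
        for pos, block in dct_blocks.items()
    }

# Example usage with trivial stand-in decoders:
# decoded = decode_image(blocks, decode_smooth=np.copy, decode_text=np.copy)
```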
II. RELATED WORK

Generally speaking, there are two main strategies for dealing with JPEG artifacts, namely iterative and non-iterative approaches. The former is characterized by an iterative optimization process in which an objective function is formulated based on some prior model of the signal or original image. The latter is typically composed of two steps: artifact detection and post-processing. Both approaches can operate in the spatial image domain and/or the Discrete Cosine Transform (DCT) domain. In what follows, we discuss the most representative methods for each approach. We also provide deeper analysis, throughout this paper, of the methods that are closely related to ours.

Iterative methods: Typical iterative methods for artifact post-processing include maximum a posteriori (MAP) estimation [7], [17]–[19], projection onto convex sets (POCS) [6], [8], [9], total variation (TV) regularization [10], [20], [21], and sparse representation based on a learned dictionary (SRLD) [11], [13], [14]. Table I summarizes the main characteristics of each method, and the details are presented hereafter.

TABLE I: CHARACTERIZATION OF ITERATIVE METHODS

  Approach | Original signal model                     | Domain
  MAP      | Gibbs [7], [17], [18], BS-PM [19]         | DCT & Spatial
  POCS     | Neighboring constraint sets [6], [8], [9] | DCT & Spatial
  TV       | Gradient magnitude sum [10], [20], [21]   | DCT & Spatial
  SRLD     | Learned dictionaries [11], [13], [14]     | DCT & Spatial

The MAP-based approach has been well developed for image denoising and JPEG artifact treatment. It solves the inverse problem of finding the image X that corresponds to the maximum a posteriori probability P(X|Y) given the compressed image Y.

Total variation (TV) regularization methods have also been widely used to address compression artifacts [10], [20], [21]. The rationale behind the TV approach rests on the fact that the total variation of a noisy signal is generally higher than that of the original signal. Consequently, image denoising is handled by minimizing a proper TV function. The work in [20], for instance, suggested that a weighted TV function be computed using the L1-norm. The authors in [10] proposed using the L2-norm to build a continuous TV model. The main deficiency with such TV functions, however, is the so-called staircasing effect [22]. Total generalized variation (TGV) was introduced in [21] to alleviate this defect to some extent. Once the TV functions have been determined, a variety of well-established methods from the convex programming field can be applied to solve the minimization problem.

Recent development in JPEG artifact processing has shifted towards dictionary-based sparse representation [11], [13], [14]. The authors in [11] first introduced a denoising model based on sparse and redundant representations. The underlying spirit is to build a dictionary consisting of atoms that are used to sparsely represent the images. This dictionary can be constructed in an offline process using a set of noise-free training image patches [11], [13] or in an online phase using the input image itself [11], [14]. In the latter case, image restoration is combined with dictionary learning.
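To make the MAP formulation above concrete, the standard Bayesian decomposition used by this family of methods is recalled below; the specific prior P(X) (e.g., the Gibbs models listed in Table I) varies from method to method and is not reproduced here.

```latex
\hat{X} = \arg\max_{X} P(X \mid Y)
        = \arg\max_{X} \frac{P(Y \mid X)\, P(X)}{P(Y)}
        = \arg\max_{X} P(Y \mid X)\, P(X)
```

Since P(Y) does not depend on X, the estimate is equivalently obtained by minimizing the negative log-posterior, -log P(Y|X) - log P(X), which is how the data-fidelity term and the image prior enter the iterative optimization.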
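As a reference point for the TV discussion, a generic discrete TV-regularized restoration problem is shown below; this is given for orientation only and is not the exact functional of [10], [20], or [21].

```latex
\hat{X} = \arg\min_{X} \; \tfrac{1}{2}\|X - Y\|_2^2 + \lambda\, \mathrm{TV}(X),
\qquad
\mathrm{TV}(X) = \sum_{i,j} \sqrt{(X_{i+1,j} - X_{i,j})^2 + (X_{i,j+1} - X_{i,j})^2}
```

Here Y is the decoded, artifact-laden image and lambda trades data fidelity against smoothness; replacing the isotropic gradient norm with an L1 (weighted, anisotropic) or squared L2 penalty yields variants in the spirit of the weighted and continuous TV models cited above.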
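Similarly, the dictionary-based sparse representation mentioned above is commonly posed as a patch-wise sparse coding problem; the formulation below is the generic template rather than the exact objective of [11], [13], [14].

```latex
\min_{D,\;\{\alpha_k\}} \; \sum_{k} \Big( \|R_k X - D\,\alpha_k\|_2^2 + \mu\,\|\alpha_k\|_0 \Big)
```

Here R_k extracts the k-th patch of the image X, D is the dictionary whose columns are the atoms, and alpha_k is the sparse coefficient vector of that patch. In the offline setting D is trained beforehand on noise-free patches; in the online setting D is updated jointly with the restored image, which is what combining restoration with dictionary learning refers to.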