Context-based image retrieval system: mechanisms, differences from traditional retrieval systems, and related compression and histogram techniques
(Yan Wei, Yang Li, Ni Xudong)
Abstract
It is necessary to index and retrieve relevant information on the Internet and to keep pace with the latest data. For these purposes, retrieval systems have been researched and developed for indexing and retrieving. However, there is still much room for retrieval systems to develop. The increasing development of advanced multimedia applications requires new technologies for organizing and retrieving, by content, databases of still digital images or digital video sequences: the context-based image retrieval system.
This paper provides a comprehensive survey of the mechanisms of traditional retrieval systems and content-based image retrieval systems and introduces two promising research directions: histograms and compression. Based on hands-on projects, the paper first compares traditional retrieval systems and content-based image retrieval systems. Second, the paper focuses on the color feature, a main cue by which humans recognize objects. Since Swain and Ballard, many effective methods have been proposed to retrieve images on the basis of the color feature, such as Histogram Intersection and the Color Histogram. As requirements grow and new problems emerge in this field, further research and techniques have been proposed to increase the quality and speed of image retrieval. This section surveys the recursive HSV-space segmentation technique, the description and representation of multimedia information, multiple distributions, the color cooccurrence histogram and the compressed color histogram, as well as the intelligent retrieval system. In the third section, the paper surveys compression techniques used in still image compression and in video compression. For still images, the paper focuses on lossless compression by Huffman coding and arithmetic coding, and on lossy compression by DCT-based transform coding and subband coding. For video compression, the paper focuses on the most popular compression technique, MPEG, and introduces four kinds of MPEG formats. At the end, the paper provides an overview of JPEG 2000.
Section 1. (by Yan Wei)
1. Introduction
It is necessary to index and retrieve relevant information on the Internet and to keep pace with the latest data. For these purposes, corporations have researched and developed many tools for indexing and retrieving, which are called Information Retrieval Systems. However, because these systems are still at an immature stage of construction, there is much room left for Information Retrieval Systems to develop. The increasing development of advanced multimedia applications requires new technologies for organizing and retrieving, by content, databases of still digital images or digital video sequences. At this point, the paper expounds a new approach: image retrieval.
2. Traditional retrieval systems
2.1. How retrieval systems work
There are three parts in a traditional retrieval system: the user retrieval interface, the indexing spider and the database engine.
2.1.1 Indexing Spider
According to [5], indexing spiders, just like browsers, request and retrieve documents from web servers. Unlike browsers, however, they do so not for viewing by humans but for automatic indexing and inclusion in their database, and they do it around the clock. Each new document encountered by the spider is scanned for links, and these links are either traversed immediately or scheduled for later retrieval. Theoretically, by following all links starting from a representative initial set of documents, a spider will end up having indexed the whole Web.
Almost all systems allow users to add their URLs to the database for spidering. Some of them retrieve submitted documents immediately; others schedule them for future scanning. Spiders update their databases by revisiting sites they have already indexed. The update periods vary from one week to several months.
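The link-following behaviour described above can be sketched as a breadth-first traversal. This is only an illustration: the in-memory `TOY_WEB` dictionary, the example URLs and the `crawl` function are hypothetical stand-ins for real HTTP requests and are not taken from [5].

```python
from collections import deque

# Hypothetical in-memory "web": URL -> URLs linked from that page.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}


def crawl(seed_urls, fetch_links):
    """Index every reachable document once, scheduling new links for later."""
    scheduled = deque(seed_urls)  # links waiting to be retrieved
    seen = set(seed_urls)         # avoids scheduling the same URL twice
    indexed = []                  # documents in the order they were indexed
    while scheduled:
        url = scheduled.popleft()
        indexed.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                scheduled.append(link)
    return indexed


order = crawl(["http://a.example/"], lambda url: TOY_WEB.get(url, []))
print(order)
```

A real spider would replace the lambda with a function that downloads the page and extracts its links, and would also revisit already indexed URLs periodically.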
2.1.2. How to design for retrieval systems
All searches on the Web are done via keywords, so probably the most important requirement is to make sure that your documents contain all the keywords that are likely to be used to find them. Two distinct strategies can be outlined in this respect [6].
The first idea is simple: the more keywords are hit in a page, the better. Therefore, it is necessary to think about all possible synonyms, variants, generic inclusive terms, subterms, and related concepts for words. Besides, keywords can be entered in a different grammatical form, such as plural instead of singular for nouns. So the document should contain the most common collocations of the main keyword with closely related nouns, adjectives, verbs, and so on.
The second idea is that one of the factors in results ranking, as implemented by major systems, is frequency, which is computed as the number of keyword occurrences divided by the document size. One consequence of this calculation is that if two documents contain the same keyword (located at the same distance from the top of the document), the one that is smaller in size will get a higher ranking.
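As a small worked example of this frequency measure, the sketch below divides the number of keyword occurrences by the document size. Measuring size in whitespace-separated words is an assumption made here; real systems may count bytes or characters instead, and `keyword_frequency` is a hypothetical helper name.

```python
def keyword_frequency(text, keyword):
    """Keyword occurrences divided by document size (size counted in words)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


short_doc = "image retrieval by color"
long_doc = "image retrieval by color and by texture and by shape too"

# Both documents contain "image" once, at the same position,
# but the shorter document gets the higher frequency.
print(keyword_frequency(short_doc, "image"))
print(keyword_frequency(long_doc, "image"))
```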
These two keyword strategies correspond to the two types of search queries, specific and general searches. Some retrieval system users are looking for very specific information: they use rare keywords, phrase searches, and various advanced features such as Boolean operators.
2.1.3. User interface
The user interface is the visible part for the users to generate search queries.
All major retrieval systems have, besides the simplest form of query with one or several keywords, some additional search options. However, the scope of these features varies significantly, and no standard syntax for invoking them is yet established. Among the most common search options are:
Boolean operators: AND (find all), OR (find any), AND NOT (exclude) to combine keywords in queries.
Phrase search: Looking for the keywords only if they're positioned in the document next to each other, in this particular order.
Proximity search: Looking for the keywords only if they're close enough to each other (the notion of "close enough" ranges from 2 in-between words for WebCrawler to 25 words for Lycos).
Media search: Looking for pages containing Java applets, Shockwave objects, and so on.
Special searches: Looking for keywords or URLs within links, image names, document titles.
Various search constraints: Limiting the search to a time span of document creation, specifying a document language (Alta Vista), and so on.
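The first three options in this list can be sketched over plain word lists. The function names and the whitespace tokenization are assumptions for illustration; they do not reproduce the query syntax of any particular system.

```python
def tokens(text):
    """Lower-case, whitespace-separated words of a document."""
    return text.lower().split()


def match_and(text, keywords):
    """AND: every keyword must occur."""
    words = set(tokens(text))
    return all(k.lower() in words for k in keywords)


def match_or(text, keywords):
    """OR: at least one keyword must occur."""
    words = set(tokens(text))
    return any(k.lower() in words for k in keywords)


def match_and_not(text, include, exclude):
    """AND NOT: all of `include` occur and none of `exclude` occur."""
    return match_and(text, include) and not match_or(text, exclude)


def match_phrase(text, phrase):
    """Phrase: keywords adjacent to each other, in this particular order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def match_proximity(text, first, second, max_between):
    """Proximity: at most `max_between` words separate the two keywords."""
    words = tokens(text)
    pos_a = [i for i, w in enumerate(words) if w == first.lower()]
    pos_b = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) - 1 <= max_between
               for i in pos_a for j in pos_b if i != j)


doc = "content based image retrieval uses color histograms"
print(match_phrase(doc, "image retrieval"))        # adjacent, in order
print(match_proximity(doc, "image", "color", 2))   # two words in between
```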
In the future, retrieval systems may offer more sophisticated options, although for now, their search interfaces seem to be developing in another direction, described in the following subsection.
2.2. Ranking and results
All retrieval systems rank their results so that more relevant documents are at the top of the list [5]. This sorting is based on the frequency of keywords within a document, and the distance of keyword occurrences from the beginning of the document. In other words, if one document contains two matches for a keyword and another is identical but contains only one, the first document will be closer to the top of list. If two documents are identical except that one has a keyword positioned closer to the top, it will come first.
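The two ranking rules can be illustrated with a sort key. How real systems weigh frequency against position is not stated in the text, so the choice below (number of matches first, position of the first match as a tie-breaker) is an assumption, and `rank` is a hypothetical helper.

```python
def rank(documents, keyword):
    """Order documents: more keyword matches first, then earlier first match."""
    keyword = keyword.lower()

    def sort_key(doc):
        words = doc.lower().split()
        positions = [i for i, w in enumerate(words) if w == keyword]
        first = positions[0] if positions else len(words)
        return (-len(positions), first)

    return sorted(documents, key=sort_key)


docs = [
    "the color of the sky",
    "color and more color",
    "color only once here",
]
for doc in rank(docs, "color"):
    print(doc)
```

Here the second document comes first because it has two matches; the other two have one match each, so the one whose match is closer to the top wins.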
Usually, lists of search results contain document titles, URLs, summaries, sometimes dates of the document creation and document sizes. For compiling document summaries, several approaches have been developed.
Many retrieval systems use META descriptions provided by page authors, but when META data is unavailable, they usually take the first 100 or 200 characters of page text. Excite stands apart by ignoring META tags altogether and employing a sophisticated algorithm that extracts sentences and presents them as the page's summary.
2.3. The typical traditional retrieval systems developed by us
We will expound the frameworks, methods and flow charts of a traditional retrieval system, based on our hands-on experience with AIL, the Multi-Coding Intelligent Retrieval System, which was developed by Wei Yan [4]. Figure 1 shows the flow control of the project.
[Figure 1: flow chart. Recoverable steps: build the search object and search from the original URL; find a free slot and generate the search threads; check whether no slot is free and the URL list is empty; parse pages and perform segmentation; store the data in the Oracle database; if the WWW address queue is empty or the user stops the search, stop the search, otherwise continue to search.]
Fig. 1 Flow control of the program
AIL adds some features on top of the traditional retrieval system: multi-code support, flexible searching methods, a varied search grammar, subject orientation and personalization for users. The user interface of AIL can be divided into a general search retrieval interface and a user-template-based interface. The general search retrieval interface is composed of simple and advanced search modules.
Simple search interface
Fig. 2 AIL simple search interface
Advanced search interface
Fig. 3 AIL advanced search interface
Fig. 4 The search results of AIL
User template search interface
Fig. 5 The user template search interface
In the advanced search interface (shown in Figure 3), users can input keywords and sentences. Users can also choose search parameters, for example the author of the document, subjects, the size of the file and the modification date of files. Based on these parameters, the system can generate a dynamic retrieval statement. Further, the system rearranges the data in the database by professional field. If users choose a professional template in the interface, the system retrieves data from the corresponding professional database, which speeds up retrieval and realizes the idea of subject orientation.
The results include the document number, title, size and creation date, abstract and the whole article. AIL ranks the results by number of hits.
2.4. The shortcomings of traditional retrieval systems
Traditional retrieval systems have some shortcomings:
Traditional retrieval methods cannot provide retrieval on the semantic level.
Some spiders cannot understand frames. Such spiders often fail when they parse the frames in HTML pages.
Retrieval systems cannot yet make heads or tails of images, audio or video clips, so these pieces of information are wasted. What remains is pure HTML source, from which spiders additionally strip all markup and tags to get to the bare-bones plain text.
3. Image retrieval system
With the recent growth of the World Wide Web, many image and video storage and retrieval systems are being ported to the Web to give the public easy access. Researchers are currently developing a new kind of retrieval system: the image and video retrieval system. An image and video storage and retrieval system is an application that manages and stores images and videos and provides tools for retrieving them.
3.1. Content-Based Query
The basic idea of content-based query is that when the user provides a description of some of the prominent visual features of an image or video, the computer can search the archive and return the images and videos that best match the description. Typically, research on content-based query has focused on the visual features of color, texture and shape. For example, in [2], the IBM Query By Image Content (QBIC) project proposes and utilizes feature sets that capture the color, texture and shape of image objects that have been segmented manually. Texture and color features that describe the global features of images are also utilized.
The components of an image retrieval system include:
a graphical user interface
a server application for receiving and processing queries
an image retrieval server
an image archive
index files that index the images in the archive by visual features
In [2], the system provides the user with a graphical user-interface. The user-interface will collect the user's query and communicate the query to the query server. The Query Server receives the query message from the system interface and translates the query to find the images that match the query. After finding the best image matches to the query, thumbnail presentations of the image matches are to be transmitted back and presented to the user. The user should be able to select from the thumbnails to download the image files.
The query formulated by the user is translated into a query string, which is passed to the query server. The query string should be visible to, decodable by and modifiable by the user. The visibility of the query string may help the user to understand how the query information is recorded, which may improve the user's ability to use the system. The decodability condition requires that the query string be easily interpreted. The user should also be able to both reuse and alter queries by entering the query strings directly if desired. After the query is executed by the server, the output of the query should include:
indication to the user that the parameters were received correctly
the image thumbnails
indications of the match scores for each image match.
3.2. VisualSEEk
[2] introduces a specific example, VisualSEEk. VisualSEEk is a visual feature retrieval system developed at Columbia University. VisualSEEk provides a tool with which a person may search for and retrieve images and videos over the Web. The person formulates queries by using the VisualSEEk interface tools to illustrate salient features of the images and videos desired. The query is sent to the server, which finds and returns to the user the images and videos that best match the visual description in the query. The VisualSEEk interface also provides tools to assign other visual properties to query elements. These include texture, shape, motion and embedded text.
3.2.1. System design
Fig. 6 System design
The overall system architecture will consist of three parts [3]:
user-interface
network handling
server programs
The user interface will consist of a Web browser. The browser will have the ability to connect to URLs on the Web, display HTML pages, and display and save JPEG images. VisualSEEk collects the query from the user and sends the query string to the server as parameters of a CGI-BIN URL.
The communication across the network from user to server will be handled entirely by the HTTP protocol. The Common Gateway Interface (CGI) used with HTTP will be used to execute the server program on the server machine. The query string will be passed within the URL. The user interface will collect the query from the user and form a query string. The query program will execute by reading the query string file, the database indexes and the database, and the query will then be performed. The HTML output of the query shall be structured such that image thumbnails appear for each matched image. When the user selects a thumbnail, the corresponding image is downloaded from the server.
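Passing a visual query as parameters of a CGI URL can be sketched with Python's standard `urllib.parse`. The host name, the parameter names (`r0_color`, `r0_x`, `r0_y`, ...) and the region format are all hypothetical; the text does not specify the actual VisualSEEk query-string layout.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, regions):
    """Encode each query region as URL parameters (hypothetical names)."""
    params = []
    for index, region in enumerate(regions):
        params.append((f"r{index}_color", region["color"]))
        params.append((f"r{index}_x", region["x"]))
        params.append((f"r{index}_y", region["y"]))
    return base_url + "?" + urlencode(params)


# A "sunset"-style query: a yellow region near the centre, orange near the top.
url = build_query_url(
    "http://server.example/cgi-bin/query",
    [{"color": "yellow", "x": 50, "y": 50},
     {"color": "orange", "x": 50, "y": 10}],
)
print(url)

# The string stays visible and decodable: it can be parsed back and edited.
print(parse_qs(urlsplit(url).query))
```

Keeping the whole query in the URL is what lets a user reuse or alter a query simply by editing the string.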
The objective for this kind of retrieval system is to create an image and video search function that is easy to use and provides power and flexibility in the expression of visual queries. These systems provide an interface through which a person may search for and retrieve images and videos over the Web. The person formulates queries by using the interface to create a query. The query is sent to the server, which finds and returns to the user the images and videos that best match the visual description in the query. The usable features include color content and the spatial layout of color regions. For example, to retrieve images of "sunsets" from the archive, one possible query might be constructed by sketching a yellow circle near the center of the image (for the sun) and filling the upper part of the image with orange (for the sky). The images and videos that best match this query will contain colored regions that closely match the specified regions in terms of color and spatial location, and should therefore include "sunsets". Figure 7 shows the interface.
Fig. 7 sunsets retrieval interface
3.2.2. Challenge
However, there are two technical challenges for systems that aim to provide easy access to images and videos. First, visual searching tools are primitive. Typically, information describing each image is recorded in text or using keywords. This requires great human effort in creating the meta-data that enables visual queries. The text descriptions also do not completely or consistently characterize the content of the images and videos. Second, the relatively large data sizes of images and videos compared to the communication channel bandwidth prohibit the user from browsing or perusing all but a small portion of the archive at a time. Therefore, the ability to find desired images and videos depends primarily on the capabilities of the query tools provided by the system.
4. The differences between the two systems
The context-based image retrieval system is better than the traditional retrieval system in some respects:
The indexing part of the context-based image retrieval system is more powerful, because image retrieval imposes more requirements.
The context-based image retrieval system can deal with incomplete and fuzzy queries.
The context-based image retrieval system can refine the results.
However, there are still many open research issues to be solved before such retrieval systems can be put into practice. This paper proposes that the context-based image retrieval system should implement the following points:
Adopt the template concept in image retrieval
Just as in the AIL system, templates let users retrieve images according to their interests. The image results currently returned by the systems come from all fields and are not well classified. By adopting templates, users can generate more exact queries to find what they really need.
Combine with the traditional retrieval system
Traditional retrieval techniques are more mature than those of the context-based image retrieval system. If we add the functions of the traditional retrieval system to the image retrieval system, the combined system will become more powerful. It will handle both keyword and image search tasks, thus taking better advantage of the resources on the Internet.
More resources
The context-based image retrieval system is still under development. The image resources on the WWW need to be enlarged. However, more resources mean that the database and other related modules in the systems need to be upgraded, which increases the cost. Therefore, more efficient indexing and storing algorithms should be developed.
Section 2. Histogram: Representation of Color Feature in Image Processing (by Yang Li)
In recent years, numerous methods for efficient image indexing and retrieval from image databases have been proposed for digital library and other applications. Low-level visual features such as color, texture and shape are often employed to search for relevant images based on the query image. Among these features, color constitutes a powerful visual cue and is perhaps the most salient and commonly used feature in color image retrieval systems. Color distribution information can be represented in a number of ways: mean and covariance of the RGB values [7], color clusters [8,9,10], and color names [11,12].
However, in image retrieval the Color Histogram is the most commonly used color feature representation. Statistically, it denotes the joint probability of the intensities of the three color channels. It is the most common in the sense that color statistics such as the mean or color clusters are usually calculated from the color histogram. Swain and Ballard propose Histogram Intersection, an L1 metric, as the similarity measure for the Color Histogram [13]. To take into account the similarities between similar but not identical colors, Ioka [14] and Niblack et al. [15] introduced an L2-related metric for comparing histograms.
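A minimal sketch of a joint RGB color histogram and of histogram intersection follows. The coarse quantization (4 bins per channel), the pixel-list input format and the function names are assumptions chosen for brevity; the intersection is normalized by the model histogram, following the usual statement of the Swain and Ballard measure.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Joint histogram of 8-bit (r, g, b) pixels with bins_per_channel**3 cells."""
    step = 256 / bins_per_channel

    def bin_of(value):
        return min(int(value / step), bins_per_channel - 1)

    hist = [0] * bins_per_channel ** 3
    for r, g, b in pixels:
        index = (bin_of(r) * bins_per_channel + bin_of(g)) * bins_per_channel + bin_of(b)
        hist[index] += 1
    return hist


def histogram_intersection(image_hist, model_hist):
    """Sum of bin-wise minima, normalized by the size of the model histogram."""
    total = sum(model_hist)
    if total == 0:
        return 0.0
    return sum(min(i, m) for i, m in zip(image_hist, model_hist)) / total


image = [(250, 200, 30)] * 3 + [(250, 120, 20)]
model = [(250, 200, 30)] * 2 + [(10, 10, 200)] * 2
score = histogram_intersection(color_histogram(image), color_histogram(model))
print(score)  # half of the model's pixels find a match in the image
```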
Searching and locating multimedia data in response to queries requires a good description and representation of multimedia information. Mahmood and Tanveer proposed two ways of capturing color content in images, called Color Histogram and Region Color [16]. These descriptors are suitable for a wide variety of applications requiring image-to-image matching and object-to-image matching. For the color histogram descriptor, they propose a matching method based on the perceptual distance between colors. They propose a color similarity matrix that gives a quantitative comparison value of a color cell with every other color cell. A weighted Euclidean distance of colors is then taken to match color histograms [14]. When the goal is to find a query embedded in an image based on the color of one or more of its regions, a region color descriptor is introduced, which is illumination- and pose-invariant to account for the different appearances of the query object in the images of the database. Such a descriptor allows the localization of the regions containing the query through suitable segmentation. The region color can also support cross-modal queries, such as using a skin color query to retrieve videos depicting a person talking, and it allows direct data manipulation, since the matching of a color region retrieves the relevant image automatically. The utility of these descriptors can be demonstrated through experiments on the MPEG-7 dataset.
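One common way to combine a color similarity matrix with a Euclidean-style distance is the quadratic form d(h, g) = sqrt((h - g)^T A (h - g)). The sketch below uses that form as an illustration; it is not claimed to be the exact weighting of [14] or [16]. The similarity a_ij = 1 - d_ij / d_max between bin colors and the three example bin colors are assumptions.

```python
import math


def similarity_matrix(bin_colors):
    """a[i][j] = 1 - d_ij / d_max, with d_ij the distance between bin colors."""
    dists = [[math.dist(p, q) for q in bin_colors] for p in bin_colors]
    d_max = max(max(row) for row in dists) or 1.0
    return [[1.0 - d / d_max for d in row] for row in dists]


def quadratic_distance(h, g, a):
    """sqrt((h - g)^T A (h - g)) for histograms h, g and similarity matrix a."""
    diff = [x - y for x, y in zip(h, g)]
    n = len(diff)
    total = sum(diff[i] * a[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(total, 0.0))  # guard against small negative values


# Three color cells: red, a slightly different red, and blue.
cells = [(255, 0, 0), (230, 30, 0), (0, 0, 255)]
a = similarity_matrix(cells)

all_red = [1.0, 0.0, 0.0]
near_red = [0.0, 1.0, 0.0]
all_blue = [0.0, 0.0, 1.0]

print(quadratic_distance(all_red, near_red, a))  # small: similar colors
print(quadratic_distance(all_red, all_blue, a))  # large: dissimilar colors
```

A plain bin-by-bin comparison would treat both pairs as equally different, because in each case the histograms share no bins; the similarity matrix is what separates them.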
Owing to its simplicity, the color histogram [17,18] remains the most commonly used method for this task. However, the lack of good perceptual histogram similarity measures, the global nature of the color content captured by histograms, and the erroneous retrieval results caused by gamma nonlinearity call for improved methods [19]. Androutsos, Plataniotis and Venetsanopoulos present a new scheme that implements a recursive HSV-space segmentation technique to identify perceptually prominent color areas. The average color vectors of these extracted areas are then used to build the image indices, requiring very little storage. Retrieval is performed with a combination distance measure based on the vector angle between two vectors. Their system provides accurate retrieval results and a high retrieval rate. It allows for queries based on single or multiple colors and, in addition, it allows certain colors to be excluded from the query.
This flexibility is due to their distance measure and the multidimensional query space in which the retrieval ranking of the database images is determined. Furthermore, their scheme proves to be very resistant to gamma nonlinearity, providing robust retrieval results for a wide range of gamma nonlinearity values, which is of great importance since, in general, the image acquisition source is unknown.
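The angular part of such a measure can be sketched as follows. This shows only the plain angle between two color vectors, not the full combination distance measure of the cited scheme, whose exact form is not given in the text; using RGB triples and the name `vector_angle` are assumptions.

```python
import math


def vector_angle(c1, c2):
    """Angle in radians between two color vectors (0 means same direction)."""
    dot = sum(a * b for a, b in zip(c1, c2))
    norm = math.hypot(*c1) * math.hypot(*c2)
    if norm == 0:
        return 0.0
    # Clamp to [-1, 1] to absorb floating-point rounding before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


query = (200, 100, 0)
print(vector_angle(query, (100, 50, 0)))   # same direction, darker: about 0
print(vector_angle(query, (0, 0, 255)))    # orthogonal color: pi / 2
```

Because the angle ignores vector length, a uniformly darker or brighter version of a color stays close to the original.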
Furthermore, a widespread representation of image color content uses a color clustering technique based on a single color histogram, which gives the frequency of occurrence of every color in the quantized color space. Considering that most Color Histograms are very sparse and thus sensitive to noise, Stricker and Orengo propose to use the cumulated Color Histogram instead. Their results demonstrate the advantages of the proposed approach over the conventional Color Histogram approach [18].
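The effect of cumulating a histogram can be seen in a one-dimensional toy case. The sketch assumes a single channel with ordered bins and an L1 comparison; the function names are hypothetical, and the example is a simplified reading of the idea rather than the exact procedure of [18].

```python
from itertools import accumulate


def cumulative_histogram(hist):
    """Normalized running sum of the bins of an ordered 1-D histogram."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [c / total for c in accumulate(hist)]


def l1_distance(h, g):
    """Sum of absolute bin-wise differences."""
    return sum(abs(x - y) for x, y in zip(h, g))


a = [10, 0, 0, 0]   # all pixels in bin 0
b = [0, 10, 0, 0]   # slightly shifted: all pixels in bin 1
c = [0, 0, 0, 10]   # strongly shifted: all pixels in bin 3

# Plain histograms cannot tell a small shift from a large one.
print(l1_distance(a, b), l1_distance(a, c))

# Cumulated histograms grow with the size of the shift.
ca, cb, cc = map(cumulative_histogram, (a, b, c))
print(l1_distance(ca, cb), l1_distance(ca, cc))
```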
Colombo and Genovesi introduce a method for extending the use of image histograms to characterize the local color properties of an image and better preserve its intrinsic geometric information [20]. The method uses a set of color histograms to represent an image through a variable number of regions, each with a well-defined and homogeneous color distribution. The extended representation is extracted by embedding the histogram intersection operator in standard segmentation techniques as a measure of color distribution homogeneity between two image regions. As such, the novel representation can capture specific objects contained in the image whose distribution differs significantly from the global image color distribution. Besides, it is not required that each region have a dominant color: multimode distributions are also successfully classified as homogeneous regions. Segmenting the image using color distributions also makes it possible to obtain additional features, related to the geometric characteristics of each region and the spatial relationships between pairs of regions. All these descriptions are said to be "induced" by the color distribution, which remains the main feature of the representation. Once a query image directly reflecting the user's current retrieval task is produced through a graphic interface, it is processed as above to obtain an internal representation suitable for image search inside a pictorial database.
Similarity is assessed by means of a global similarity score. Aiming at improving and exploiting the user's knowledge of system behavior, they introduce "internal query manipulation", which is based on directly accessing and manipulating the internal query through graphics, thus complementing traditional query composition and refinement modes such as query by sketch, query by example and relevance feedback.
Object recognition in images is always based on a model of the object at some level of abstraction, and rigidity is one interesting dimension of abstraction. Near one end of this dimension are the several object recognition algorithms that abstract objects into a rigid or semi-rigid geometric juxtaposition of image features. These include Hausdorff distance [21], geometric hashing [22], active blobs [23], and eigenimages [24,25]. In contrast, histogram-based approaches abstract away (nearly) all geometric relationships between pixels. In pure histogram matching, e.g. Swain & Ballard [13], there is no preservation of geometry, just an accounting of the number of pixels of given colors. The technique of Funt & Finlayson [26] uses a histogram of the ratios of neighboring pixels, which introduces a slight amount of geometry into the representation.
Peng Chang and John Krumm introduce some geometric representation into the color histogram by using a histogram of the cooccurrences of color pixels [27]. The color cooccurrence histogram (CH) keeps track of the number of pairs of certain colored pixels that occur at certain separation distances in image space. By adjusting the distances of cooccurrences, they can adjust the sensitivity of the algorithm to geometric changes in the object's appearance, such as those caused by viewpoint change or object flexing. The CH is also robust to partial occlusions, because they do not require that the image account for all the cooccurrences of the model. A significant part of the paper is devoted to understanding the algorithm's false alarm probability, which gives a principled way of choosing the algorithm's adjustable parameters. This is presented as the first theoretical false alarm analysis of histograms (both cooccurrence and regular) for recognizing objects. The approach discussed in the paper is similar to other histogram-based approaches, most of which are used to find images in a database rather than to find an object in an image. Those approaches share an attempt to add spatial information to a regular color histogram.
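A simplified color cooccurrence histogram can be sketched on a small grid of quantized color indices. To stay short, the sketch counts only horizontally and vertically separated pixel pairs and uses exact integer distances; the original method considers pairs in all directions with quantized distance ranges, so this is an assumption-laden illustration, not the algorithm of [27]. The function names are hypothetical.

```python
from collections import Counter


def cooccurrence_histogram(image, max_distance):
    """Count (color1, color2, distance) for axis-aligned pixel pairs.

    `image` is a 2-D list of quantized color indices; each unordered pair of
    pixels separated by 1..max_distance pixels along a row or column is
    counted once, with its two colors stored in sorted order.
    """
    rows, cols = len(image), len(image[0])
    ch = Counter()
    for y in range(rows):
        for x in range(cols):
            for d in range(1, max_distance + 1):
                for ny, nx in ((y, x + d), (y + d, x)):
                    if ny < rows and nx < cols:
                        c1, c2 = sorted((image[y][x], image[ny][nx]))
                        ch[(c1, c2, d)] += 1
    return ch


def ch_intersection(image_ch, model_ch):
    """Number of model cooccurrences that are also present in the image."""
    return sum(min(image_ch[key], count) for key, count in model_ch.items())


model = [[0, 0, 1],
         [0, 1, 1]]
model_ch = cooccurrence_histogram(model, max_distance=1)
print(dict(model_ch))
print(ch_intersection(model_ch, model_ch))  # a model fully matches itself
```

Raising `max_distance` records longer-range pairs and therefore more of the object's geometry, which is the knob the text describes for trading off sensitivity to geometric change.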
Huang et al. [28] use the "color correlogram" to search a database for similar images. The correlogram is essentially a normalized version of Chang and Krumm's CH. Pass and Zabih [29] use "color coherence vectors" that represent which image colors are part of relatively large regions of similar color.
Histogram modification, and in particular histogram equalization, is one of the basic and most useful operations in image processing, especially for contrast enhancement. Contrast enhancement techniques are basically divided into global and local histogram methods. An early attempt to introduce shape criteria in contrast enhancement was made in [30]. The Mathematical Morphology school [31] argues that the basic operations on images should be invariant with respect to contrast changes, such as homomorphic transformations. As a consequence, the basic information of an image $u$ is contained in the family of its binary shadows or level sets, that is, in the family of sets $X_\lambda u := \{x : u(x) > \lambda\}$ for all values of $\lambda$ in the range of $u$. Under fairly general conditions, an image can be reconstructed from its level sets by the formula $u(x) = \sup\{\lambda : x \in X_\lambda u\}$. If $h$ is a strictly increasing function, the transformation $v = h(u)$ does not modify the family of level sets of $u$; it only changes its index, in the sense that
$X_{h(\lambda)} v = X_\lambda u$ for all $\lambda$.
The formalization of multiscale analyses given in [32] leads to a formulation of recursive, causal and local morphological and geometric invariant filters in terms of solutions of certain partial differential equations of geometric type, which provides a new view on many of the basic mathematical morphology operations. One of its basic assumptions was the locality assumption, which aimed to translate into mathematical language the fact that the basic operations under consideration are a kind of local average around each pixel; that is, only a few pixels around a given sample influence the output value of the operations. This excludes algorithms such as histogram modification, and operations like those in [33] are not modeled by these equations. G. Sapiro and V. Caselles show in their paper that the histogram can be modified to achieve any given distribution. The modification can be performed while simultaneously reducing noise, thus avoiding the noise sharpening effect of classical algorithms. Their approach is extended to local contrast enhancement as well. One advantage of using PDEs for image processing is the possibility of combining algorithms, as was done successfully in [34], for example, where the smoothing operator of [35] and the deblurring operator of [36] are combined. Another advantage of this methodology is the accuracy achieved when efficient numerical implementations are used. They present a novel PDE for histogram modification and show how to obtain any grey-level distribution, and then combine it with the smoothing operator proposed in [37], obtaining contrast normalization and denoising at the same time.
V. Caselles, J.-L. Lisani, J.-M. Morel and G. Sapiro propose a novel approach for shape-preserving contrast enhancement, which is a particular case of homomorphic transformation [38]. They realize the contrast enhancement by means of a local histogram equalization algorithm, because global histogram modification does not always produce good contrast, especially for small regions, which are hardly visible after such a global operation. The scheme they introduce is based on the grey values and the spatial relations between pixels in the image, which, following mathematical morphology, constitute the basic objects in the scene. Examples of both grey-value and color images are presented. Their approach attains both the shape-preservation property of global techniques and the contrast improvement quality of local ones.
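For reference, classical global histogram equalization, the baseline that the PDE-based and local methods above improve on, can be sketched as follows. This is the textbook cumulative-distribution mapping, not the method of any of the cited papers; the nested-list image format and the name `equalize` are assumptions.

```python
def equalize(image, levels=256):
    """Global histogram equalization of a 2-D list of integer grey values."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution of grey levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]

    # Map each grey level so that the output CDF is approximately linear.
    lut = [round(max(c - cdf_min, 0) / (n - cdf_min) * (levels - 1)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


low_contrast = [[100, 100, 101],
                [101, 102, 103]]
print(equalize(low_contrast))
```

The mapping is increasing in the grey level, so the ordering of grey values is preserved while the narrow input range 100 to 103 is stretched over the full output range.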
One disadvantage of color indexing based on the color distribution histogram is that the speed of the system is directly related to the size of the histogram to be indexed, and it is relatively expensive to match or compare color histograms. J. Berens, G. D. Finlayson and G. Qiu [39] show that color histograms contain highly correlated information and so can be effectively compressed: they can be represented by a few numbers. They show how color histograms can be effectively compressed and how compressed color histograms can be compared for indexing. As such, color histogram comparison is no slower than any other color-based indexing method. They make two important contributions. First, they show that an opponent color histogram can be compressed more readily than a histogram in a conventional color space. Second, they use standard transform encoding methods (the Karhunen-Loeve transform, the discrete cosine transform, the Hadamard transform and hybrid transforms) to compress color histograms. Experiments show that compression rates of up to 250:1 are possible without affecting indexing performance. This means that a database 250 times larger can be searched in the same time as with conventional indexing.
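The idea of comparing histograms through a few transform coefficients can be illustrated with a one-dimensional discrete cosine transform. Applying the DCT to a flat 1-D histogram and keeping the first `keep` coefficients are assumptions made for brevity; the cited work operates on opponent color histograms and also considers other transforms, which are not reproduced here.

```python
import math


def dct(values):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(values)
    out = []
    for k in range(n):
        s = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                for i, v in enumerate(values))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def compress(hist, keep):
    """Represent a histogram by its first `keep` DCT coefficients."""
    return dct(hist)[:keep]


def compressed_distance(ca, cb):
    """Euclidean distance computed directly on the kept coefficients."""
    return math.dist(ca, cb)


h1 = [8, 7, 6, 4, 3, 2, 1, 1]
h2 = [7, 7, 6, 5, 3, 2, 1, 1]
h3 = [1, 1, 2, 3, 4, 6, 7, 8]

# Two coefficients per histogram instead of eight bins.
print(compressed_distance(compress(h1, 2), compress(h2, 2)))  # similar pair
print(compressed_distance(compress(h1, 2), compress(h3, 2)))  # dissimilar pair
```

Because the transform is orthonormal, keeping all coefficients reproduces the ordinary Euclidean distance between the histograms, and truncation can only lower it; smooth, correlated histograms lose little when only a few coefficients are kept.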
Section 3. Image Compression (by Xudong Ni)
Recent years have seen significant progress in the compression of still and moving images. Image compression technology has dramatically widened the application of still digital images. This section briefly introduces the goals, principles and methods of still image compression, in particular transform coding and subband coding, focusing on the basic ideas, principles and processing steps of some methods. We then introduce recent work on the JPEG2000 image compression standard. Finally, this section briefly introduces motion image compression: MPEG.
3.1 Object Quality Measure
Digital images have broad application in areas such as Internet browsing, TV transmission, video conferencing, transmission of remotely sensed images and printing, but the vast amount of data required to represent a digital image restricts these applications. Applications of digital images are often not viable because of high storage or transmission costs. Image compression technology offers a possible solution. The basic goal of image compression is to reduce the bit rate of an image so as to minimize the communication channel capacity or digital storage memory requirements while maintaining the necessary fidelity in the image, or, equivalently, to obtain the best possible fidelity for a given bit rate. The bit rate is measured in bits per sample or bits per pixel
(bpp). The raw (uncompressed) bit rate is typically 8 bits per pixel for a gray-level image and 24 bits per pixel for a color image with three 8-bit components. Fidelity can be judged by quantitative criteria such as mean square error (MSE) or peak signal-to-noise ratio (PSNR) [40] between the original image and the reconstructed image. MSE is given by

$$\mathrm{MSE} = \frac{1}{NM} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \left( a_{ij} - \hat{a}_{ij} \right)^2$$

and PSNR is given by

$$\mathrm{PSNR} = 10 \log_{10} \frac{\max |a_{ij}|^2}{\mathrm{MSE}}$$

where the $a_{ij}$ are the elements of the original image and the $\hat{a}_{ij}$ are the elements of the reconstructed image. N and M are the dimensions of the image. Generally, the smaller the MSE, the higher the PSNR and the better the image quality.
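The two measures can be computed directly from their definitions. The following is a minimal sketch, assuming images stored as nested Python lists of gray levels; the function names and the tiny 2x2 test image are hypothetical. Following the definition above, the peak value is taken as the largest magnitude in the original image (many implementations instead use the nominal maximum, e.g. 255 for 8-bit data).

```python
import math

def mse(original, reconstructed):
    """Mean square error between two N x M images given as nested lists."""
    n, m = len(original), len(original[0])
    total = sum(
        (original[i][j] - reconstructed[i][j]) ** 2
        for i in range(n)
        for j in range(m)
    )
    return total / (n * m)

def psnr(original, reconstructed):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    peak = max(abs(v) for row in original for v in row)
    return 10 * math.log10(peak ** 2 / err)

# Hypothetical 2 x 2 gray-level image and a slightly distorted copy.
original = [[100, 200], [50, 255]]
reconstructed = [[101, 198], [50, 253]]

error = mse(original, reconstructed)      # (1 + 4 + 0 + 4) / 4 = 2.25
quality = psnr(original, reconstructed)   # roughly 44.6 dB
```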
The fidelity of the reconstructed image can also be judged by subjective criteria such as the statistically based acceptability tests for specific applications or viewer quality ratings. For example, five-point scales of quality (bad, poor, fair, good, excellent) are sometimes used. However, quantitative criteria are often used when evaluating image compression methods because of the testing cost inherent to subjective tests.
3.2 The principles of image compression
Almost all methods of image compression are based on two fundamental principles.
The first principle is to exploit the properties of the signal sources, e.g., the statistical property, and to remove redundancy from the signal. This approach is called redundancy reduction. Almost all sampled signals in coding are redundant because
Nyquist sampling typically tends to preserve some degree of intersample correlation. This redundancy is reflected in the form of a nonflat power spectrum. Greater degrees of nonflatness lead to greater gains from redundancy removal. These gains are also referred to as prediction gains or transform coding gains, depending on whether the redundancy is processed in the time domain or frequency (or transform) domain. The second principle is to exploit the properties of the signal receiver (usually the human visual system) and to remove parts or details of the signal that will not be noticed by the receiver. This approach is called irrelevancy reduction. The idea is to quantize the sample or transform coefficients just finely enough to leave an imperceptibly distorted result, even though the quantized quantity is not mathematically zero. If the available bit rate is not sufficient to realize this kind of perceptual transparency, the intent is to minimize the perceptibility of the distortion.
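The notion of prediction gain can be made concrete with a small experiment. This is a minimal sketch under stated assumptions: the test signal is a synthetic, slowly varying sinusoid (hypothetical, not image data), and the predictor is the simplest possible one, which predicts each sample by its predecessor. The gain is the ratio of the signal variance to the variance of the prediction residual.

```python
import math
import statistics

# Hypothetical slowly varying signal: strong intersample correlation.
samples = [round(128 + 50 * math.sin(2 * math.pi * i / 64)) for i in range(256)]

# Previous-sample prediction: the residual is the first difference.
residual = [samples[i] - samples[i - 1] for i in range(1, len(samples))]

signal_variance = statistics.pvariance(samples)
residual_variance = statistics.pvariance(residual)

# Prediction gain as a ratio and in decibels.
prediction_gain = signal_variance / residual_variance
prediction_gain_db = 10 * math.log10(prediction_gain)
```

The residual has a much smaller variance than the signal itself, which is why coding the residual rather than the samples needs fewer bits for the same distortion.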
Some image compression methods are based on only one of these two principles, redundancy reduction or irrelevancy reduction, but most methods exploit both. The parts of a coder that process redundancy and irrelevancy are separate in some methods, while in other methods they cannot be easily separated.
3.3 Classification of Image Compression Methods
Image compression methods can be classified by different characteristics. A widely accepted classification divides them into lossless (information-preserving) and lossy techniques.
3.3.1 Lossless compression techniques
Lossless compression is also called noiseless coding, entropy coding or data compaction. The most popular lossless compression techniques are
Huffman coding [41]
Arithmetic coding [42]
The latter has been reported to give 5-10% better compression than Huffman coding but is generally more complex. Lossless compression methods are often based on redundancy reduction. With lossless compression, the compressed data can be restored exactly, so that it is identical to the original.
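Huffman coding can be sketched in a few lines. This is a minimal illustration, assuming symbols are small integers such as quantized gray levels; the function names and the 16-symbol input are hypothetical, and practical coders add details (canonical codes, table transmission) that are omitted here.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code {symbol: bit string} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, idx, {sym: ""}) for idx, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)  # two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, table):
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix property: first match is a symbol
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical 16-pixel source with a skewed gray-level distribution.
pixels = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
table = huffman_code(pixels)
bits = encode(pixels, table)
```

For this input the code uses 28 bits, against 32 bits for a fixed 2-bit code, and decoding restores the pixels exactly.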
3.3.2 Lossy compression techniques
With lossy compression, the original image cannot be exactly recovered from the compressed data; the reconstruction is only an acceptable approximation to the original. Lossy image compression methods are often based on irrelevancy reduction, though practical image compression methods usually exploit both redundancy reduction and irrelevancy reduction. The lossy image compression methods include the following subclasses:
Scalar quantization, including Pulse Code Modulation (PCM) and Differential PCM (DPCM)
Transform coding, including the Discrete Cosine Transform (DCT), etc.
Subband/Wavelet coding
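As an illustration of the first subclass, the following sketch combines a uniform scalar quantizer with previous-sample prediction (closed-loop DPCM). It is a minimal example under stated assumptions: the step size, the sample values and the function names are hypothetical, and the initial prediction is simply zero.

```python
def dpcm_encode(samples, step):
    """Quantize the prediction error; predict from the decoder's own output."""
    indices = []
    prediction = 0
    for s in samples:
        q = round((s - prediction) / step)  # uniform scalar quantizer index
        indices.append(q)
        prediction += q * step              # track the reconstructed value
    return indices

def dpcm_decode(indices, step):
    """Rebuild the samples by accumulating dequantized prediction errors."""
    out = []
    prediction = 0
    for q in indices:
        prediction += q * step
        out.append(prediction)
    return out

# Hypothetical scan line of gray levels and a quantizer step of 4.
samples = [100, 102, 105, 109, 110, 108, 104, 101]
step = 4
decoded = dpcm_decode(dpcm_encode(samples, step), step)
max_error = max(abs(a - b) for a, b in zip(samples, decoded))
```

Because the encoder predicts from reconstructed values, quantization errors do not accumulate: each decoded sample differs from the original by at most half a step, but the original cannot be recovered exactly, which is what makes the method lossy.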
3.4 DCT-Based Transform Coding
The basic motivation behind transform coding is to transform a set of pixels or samples from the spatial domain into another set of less correlated (or more independent) coefficients in the frequency domain, so that the frequency domain coefficients can be encoded more efficiently. Because DCT and DPCM were briefly introduced in the Advanced Information Processing class, we focus here on the intuitive ideas of the most recent compression technique: subband/wavelet coding.
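The decorrelating effect of a transform can be seen on a single row of pixels. The sketch below is only an illustration, assuming a one-dimensional orthonormal DCT-II and a hypothetical smooth 8-pixel row; it measures how much of the signal energy ends up in the first two coefficients.

```python
import math

def dct_ortho(x):
    """Orthonormal DCT-II of a list of numbers (direct O(n^2) form)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

# Hypothetical smooth row of 8 gray levels (highly correlated neighbours).
row = [100, 102, 104, 106, 108, 110, 112, 114]
coeffs = dct_ortho(row)

# The orthonormal transform preserves total energy (Parseval).
total_energy = sum(c * c for c in coeffs)
low_energy_fraction = sum(c * c for c in coeffs[:2]) / total_energy
```

For a smooth row like this one, more than 99% of the energy falls into the two lowest-frequency coefficients, so the remaining coefficients can be coded coarsely or dropped.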
3.5 Subband/Wavelet Coding (SBC)
3.5.1 Basic Idea of Subband Coding
Within a typical image, although areas of significant spatial activity are usually apparent, there also exist extensive regions where detail is slowly varying or even substantially uniform. Since, by standard analytical methods, rapidity of spatial variation can be expressed in terms of spatial frequency components, we are led to the conclusion that image data has a strongly low-pass spectrum. Thus, it is wasteful to expend as much effort coding insignificant segments of the spectrum as on processing those spectral regions in which the data energy is concentrated. An intuitive approach to coding is therefore to split the image into different frequency bands and apply efficient techniques to the individual subbands. This is the motivation of subband image coding. Subband coding was first introduced in speech coding by Crochiere et al. [43] in 1976. The basic idea is to divide the frequency band of the signal and then to code each subband with either PCM or DPCM, using a coder and bit rate accurately matched to the statistics of that band. The idea of subband coding was extended to image coding by Woods and O'Neil in 1986 [44]. The basic idea of subband image coding is to split the image signal into frequency bands of equal or unequal bandwidth, and then to encode these subbands independently according to the signal energy contained in each band.
3.5.2 Subband Filtering and Decomposing
The essential step of subband image coding is the decomposition of the signal into the various frequency bands by means of a subband filter bank. Then, different methods can be applied to encode the subbands.
Figure: Filter bank with morphological filter (branches M(x) and I-M(x) applied to input x), yielding perfect reconstruction when M(x) is a generalized half-band filter.
For perfect reconstruction, a subband filter bank must obey certain design rules.
The Quadrature Mirror Filter (QMF) is one such filter. QMFs were introduced into subband coding by Esteban and Galand [45].
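A two-band analysis and synthesis step with perfect reconstruction can be sketched with the two-tap Haar pair, which is the simplest filter pair with this property. This is only an illustration on a hypothetical row of eight pixels; it is not the longer QMF design of [45], and the function names are assumptions.

```python
import math

def analyze(signal):
    """Split an even-length signal into low and high subbands (Haar pair)."""
    s = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high

def synthesize(low, high):
    """Invert analyze(): interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.append((l + h) / s)
        out.append((l - h) / s)
    return out

# Hypothetical row of gray levels with locally similar neighbours.
row = [50, 52, 60, 61, 80, 79, 40, 42]
low, high = analyze(row)
rebuilt = synthesize(low, high)

# Energy per subband: most of it lands in the low band.
low_energy = sum(v * v for v in low)
high_energy = sum(v * v for v in high)
```

Each subband has half as many samples as the input, the synthesis step restores the input up to rounding error, and the low band carries almost all of the energy, which is what allows the bands to be coded with different precision.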
Morphological Subband Decomposition (MSD) is another subband filtering method, introduced by Egger et al. [46]. The following figures show results obtained with the MSD filter.
Figure: MSD decomposed image "Lena", two subbands; MSD decomposed image "Lena", seven subbands.
3.5.3 Advantages of Subband Coding
(1) It supplies a scalable image representation and facilitates progressive transmission: In the context of image coding, scalability means that the transmitted bit-stream can be decoded hierarchically. That is, a low-resolution version of the transmitted image can be decoded with few operations, and the full-resolution image is decoded only if necessary or desired. This facilitates progressive transmission, where low-resolution data is sent first and further detail appears gradually. The SBC technique makes high-definition and low-definition systems compatible, in that low-definition receivers can simply ignore the high-resolution subbands.
(2) It has good subjective error properties: Since quantization is performed separately for each subband, SBC allows a more flexible design of the coding scheme, which gives good subjective error properties. Different subbands can be allocated different bit rates. Therefore, through an appropriate bit allocation strategy among subbands, subjectively superior performance can be obtained when compared to a fullband coding scheme.
(3) It has good SNR performance: Compared to adaptive DCT, subband coding has the better SNR performance at all bit rates in the range 0.67-2.0 bpp [50]. The image reconstructed using SBC is also free of the blocking effects that may appear when an image is coded with a block transform method such as DCT coding. We will compare the performance of DCT and SBC in a later part of this survey.
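The bit allocation mentioned in advantage (2) can be illustrated with one classical rule, which is an assumption of this sketch and is not taken from the works cited above: under high-resolution quantization theory, each subband receives the average bit rate plus half the base-2 logarithm of the ratio of its variance to the geometric mean of all subband variances. The subband variances and the function name are hypothetical.

```python
import math

def allocate_bits(variances, average_bits):
    """Log-variance rule: b_k = average + 0.5 * log2(var_k / geometric mean)."""
    log_mean = sum(math.log2(v) for v in variances) / len(variances)
    return [average_bits + 0.5 * (math.log2(v) - log_mean) for v in variances]

# Hypothetical variances of four subbands, from lowest to highest frequency.
subband_variances = [400.0, 100.0, 25.0, 25.0]
bits = allocate_bits(subband_variances, average_bits=2.0)
total_bits = sum(bits)
```

With these values the rule assigns approximately 3.25, 2.25, 1.25 and 1.25 bits per sample, so the total budget of 8 bits is preserved while the high-energy low band is quantized most finely. The rule can return fractional or negative values, which a practical coder would have to round and clip.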
3.6 Overview of JPEG 2000
The International Organization for Standardization's (ISO) JPEG2000 committee [47] has finalized specifications for a new algorithm that compresses images up to 200 times with no appreciable degradation in quality. The JPEG2000 specification will become ISO 15444 when it is officially approved in 2001. Its major change from the current JPEG is that wavelets will replace the DCT as the means of transform coding.
Among many things it will address:
o Low bit-rate compression performance,
o Lossless and lossy compression in a single codestream,
o Transmission in noisy environments where the bit-error rate is high,
o Application to both gray/color images and bi-level (text) imagery, natural imagery and computer generated imagery,
o Interface with MPEG-4,
o Content-based description.
Figure: JPEG2000 image (middle) shows almost no quality loss from current JPEG, even at 158:1 compression [48].
3.7 Overview of MPEG
The Moving Picture Experts Group (MPEG) [49] is a working group of ISO/IEC in charge of the development of standards for the coded representation of digital audio and video. Established in 1988, the group has produced MPEG-1, the standard on which such products as Video CD and MP3 are based; MPEG-2, the standard on which such products as digital television set-top boxes and DVD are based; and MPEG-4, the standard for multimedia for the web and mobility. The current thrust is MPEG-7, "Multimedia Content Description Interface", whose completion is scheduled for July 2001. Work on the new standard MPEG-21, "Multimedia Framework", started in June 2000 and has already produced a Draft Technical Report. Several Calls for Proposals have already been issued.
3.8 Summary
We have introduced some elementary concepts of image compression and some basic image compression methods. DCT-based transform coding is widely used due to its simplicity and its standardization in JPEG. Subband coding is very efficient at bit rates from 0.5 to 2.0 bpp and is well suited to progressive transmission.
This section cannot provide complete descriptions or exhaustive studies of image compression methods. It is hoped, however, that it gives readers an introduction to recent and current work in the image compression area.
References:
[1] Y. Rui, T. Huang, S. Mehrotra, and M. Ortega, "A relevance feedback architecture for content-based multimedia information systems," in Workshop on Content Based Access of Image and Video Libraries, Puerto Rico, June 1997.
[2] John R. Smith and Shih-Fu Chang, "VisualSEEk: a Content-Based Image/Video Retrieval System," System Report and User's Manual, version 1.0 beta.
[3] Edoardo Ardizzone and Marco La Cascia, "Automatic Video Database Indexing and Retrieval," Kluwer Academic Publishers, 1997.
[4] Wei Yan, "The mechanisms and implement of Intelligent Information Retrieval System," Master's thesis, Tianjin University, May 2000.
[5] Search Engines for the World Wide Web (2nd Edition), Peachpit Press, ISBN 0-201-69642-8, http://www.peachpit.com/books/catalog/69642.html
[6] Super Searchers' Search Secrets, Mining Co. Web Search Guide, Jan. 1, 1999, http://websearch.miningco.com/library/weekly/aa010899.htm
[7] Finlayson, G.D., Chatterjee, S.S., and Funt, B.V., "Color angular indexing," Proceedings of the European Conference on Computer Vision, April 1996, pp. 16-27.
[8] Uchiyama, T., and Arbib, M.A., "Color image segmentation using competitive learning," IEEE Trans. Pattern Anal. Mach. Intell., 1994, 16, (12), pp. 1197-1206.
[9] Rubner, Y., "Perceptual metrics for image database navigation," PhD thesis, Dept. of Computer Science, Stanford University, 1999.
[10] Selim, S.Z., and Ismail, M.A., "K-means-type algorithms: A generalized convergence theorem and characterization of local optimality," IEEE Trans. Pattern Anal. Mach. Intell., 1984, PAMI-6, (1), pp. 81-87.
[11] Syeda-Mahmood, T.F., "Data and model-driven selection using color regions," Int. J. Computer Vis., 1997, 21, (1/2), pp. 9-36.
[12] Mehter, P.M., Kankanhalli, M.S., Narasimhalu, A.D., and Man, G.C., "Color matching for image retrieval," Pattern Recognit. Lett., 1995, 16, pp. 325-331.
[13] M. Swain and D. Ballard, "Color Indexing," International Journal of Computer Vision, vol. 7, no. 1, pp. 11-32, 1991.
[14] M. Ioka, "A method of defining the similarity of images on the basis of color information," Tech. Report RT-0030, IBM Tokyo Research Lab., 1989.
[15] W. Niblack, R. Barber, et al., "The QBIC project: Querying images by content using color, texture and shape," in Proc. SPIE Storage and Retrieval for Image and Video Databases, Feb. 1994.
[16] Tanveer Syeda-Mahmood and Dragutin Pekovic, "On describing color and shape information in images," Signal Processing: Image Communication, vol. 16, no. 1, pp. 15-31, Sep. 2000.
[17] X. Wan and C.-C. Jay Kuo, "Color distribution analysis and quantization for image retrieval," in Storage and Retrieval for Image and Video Databases IV, SPIE-2670, pp. 8-16, 1995.
[18] M. Stricker and M. Orengo, "Similarity of color images," in Storage and Retrieval for Image and Video Databases III, SPIE-2420, pp. 381-392, 1995.
[19] D. Androutsos, K.N. Plataniotis, and A.N. Venetsanopoulos, "A Novel Vector-Based Approach to Color Image Retrieval Using a Vector Angular-Based Distance Measure," Computer Vision and Image Understanding, vol. 75, no. 1, pp. 46-58, 1999.
[20] C. Colombo and I. Genovesi, "Image Querying and Retrieval by Multiple Color Distributions," Proceedings of Image and Video Content Based Retrieval, pp. 19-26, 1998.
[21] D. P. Huttenlocher, G. A. Klanderman, and W. J. Rucklidge, "Comparing Images Using the Hausdorff Distance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, pp. 850-863, 1993.
[22] Y. Lamdan and H.J. Wolfson, "Geometric Hashing: A General and Efficient Model-Based Recognition Scheme," presented at Second International Conference on Computer Vision, Tampa, Florida, 1988.
[23] S. Sclaroff and J. Isidoro, "Active Blobs," presented at Sixth International Conference on Computer Vision, Bombay, India, 1998.
[24] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, vol. 3, pp. 71-86, 1991.
[25] H. Murase and S.K. Nayar, "Visual Learning and Recognition of 3-D Objects from Appearance," International Journal of Computer Vision, vol. 14, pp. 5-24, 1995.
[26] B.V. Funt and G.D. Finlayson, "Color Constant Color Indexing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 522-529, 1995.
[27] C. Peng and K. John, "Object Recognition with Color Cooccurrence Histograms," IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, June 23-25, 1999.
[28] J. Huang, S. R. Kumar, M. Mitra, W.-J. Zhu, and R. Zabih, "Image Indexing Using Color Correlograms," presented at IEEE Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 1997.
[29] G. Pass and R. Zabih, "Histogram Refinement for Content-Based Image Retrieval," presented at IEEE Workshop on Applications of Computer Vision, Sarasota, Florida, 1996.
[30] R. Cromartie and S. M. Pizer, "Edge-affected context for adaptive contrast enhancement," Proc. Information Processing in Medical Imaging, Lecture Notes in Comp. Science 511, pp. 474-485, July 1991.
[31] J. Serra, "Image Analysis and Mathematical Morphology," Academic Press, New York, 1983.
[32] L. Alvarez and P. L. Lions, "Axioms and fundamental equations of image processing," Arch. Rational Mechanics and Anal. 16: IX, pp. 200-257, 1993.
[33] G. Sapiro and V. Caselles, "Histogram modification via differential equations," Journal of Differential Equations 135:2, pp. 238-268, 1997.
[34] L. Alvarez and L. Mazorra, "Signal and Image restoration by using shock filters and anisotropic diffusion," SIAM J. Numer. Anal., 1994.
[35] L. Alvarez, P. L. Lions, and J.M. Morel, "Image selective smoothing and edge detection by nonlinear diffusion," SIAM J. Numer. Anal. 29, pp. 845-866, 1992.
[36] S. Osher and L. I. Rudin, "Feature-oriented image enhancement using shock filters," SIAM J. Numer. Anal. 27, pp. 919-940, 1990.
[37] G. Sapiro and A. Tannenbaum, "Edge preserving geometric enhancement of MRI data," EE-TR, University of Minnesota, April 1994.
[38] V. Caselles, J.-L. Lisani, J.-M. Morel, and G. Sapiro, "Shape Preserving Local Histogram Modification," IEEE Transactions on Image Processing, vol. 8, no. 2, pp. 220-230, Feb. 1999.
[39] J. Berens, G. D. Finlayson, and G. Qiu, "Image indexing using compressed color histogram," IEE Proceedings: Vision, Image and Signal Processing, vol. 147, no. 4, pp. 349-355, 2000.
[40] C. Yang, G. Shong, and C. Zhang, "A subband Coding Aiming at Real-Time Image Transmission," Journal of Image and Graphics, China, vol. 5, no. 3, pp. 191-195, 2000.
[41] D.A. Huffman, "A method for the construction of minimum redundancy codes," in Proc. IRE, vol. 40, pp. 1098-1101, 1952.
[42] W. B. Pennebaker, et al., "Arithmetic coding articles," IBM J. Res. Dev., vol. 32, no. 6, pp. 717-774, Nov. 1988.
[43] R. E. Crochiere, S. A. Webber, and J. L. Flanagan, "Digital coding of speech in subbands," Bell Syst. Tech. J., vol. 55, pp. 1069-1085, Oct. 1976.
[44] J. W. Woods and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 5, pp. 1278-1288, Oct. 1986.
[45] D. Esteban and C. Galand, "Application of quadrature mirror filters to split band voice coding schemes," in Proc. ICASSP, pp. 191-195, May 1977.
[46] O. Egger, W. Li, and M. Kunt, "High compression image coding using an adaptive morphological subband decomposition," Proceedings of the IEEE, vol. 83, pp. 272-287, 1995.
[47] "An overview of JPEG 2000," http://citeseer.nj.nec.com/264144.html
[48] "JPEG2000 wavelet compression spec approved," http://www.eetimes.com/story/OEG19991228S0028
[49] Moving Picture Experts Group, http://www.cselt.it/mpeg/
[50] J. W. Woods and S. D. O'Neil, "Subband coding of images," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 5, pp. 1278-1288, Oct. 1986.