GENERIC IMAGE SIMILARITY BASED ON KOLMOGOROV COMPLEXITY

Nima Nikvand and Zhou Wang

Dept. of Electrical & Computer Engineering, University of Waterloo, Waterloo, ON, Canada

ABSTRACT

Image similarity measurement is a fundamental and common issue in a broad range of problems in image processing, compression, communication, recognition and retrieval. Existing image similarity measures are limited to restricted application environments. The theory of Kolmogorov complexity and the related normalized information distance (NID) measure provide an attractive theoretical framework for generic image similarity that is applicable to any scenario. While this is appealing, the difficulty lies in the implementation, because Kolmogorov complexity is non-computable. In this paper, we propose a practical framework to approximate NID, where the key is to find the shortest program within a set of potential transformations that convert one image to another and vice versa. As one of the initial attempts in this new and promising research direction, our preliminary experimental work demonstrates that the proposed approach has wider applicability than existing methods.

Index Terms: image similarity measurement, Kolmogorov complexity, normalized information distance, compression distance

1. INTRODUCTION

Measuring the similarity between two images is a fundamental issue in many problems throughout the entire field of image processing and machine vision. These include image restoration, denoising, coding, communication, interpolation, registration, fusion, classification and retrieval, as well as object detection, recognition and tracking. Many existing image similarity measures were proposed to work with very specific types of image distortions (e.g., JPEG compression) [1]. There are also methods such as the structural similarity (SSIM) index [2] that are applicable to a wider range of applications. However, even these “general-purpose” methods [1] are still limited in their application scopes. For example, SSIM does not apply or work properly when significant geometric changes (e.g., enlargement by a factor of 2, or rotation by 90 degrees) exist between the two images being compared.

The theory of Kolmogorov complexity provides solid groundwork to build a universal and generic information distance metric between any objects that minorizes all metrics in the class [3]. A practically more useful metric, namely the normalized information distance (NID), was introduced in [4]. To overcome the non-computable nature of Kolmogorov complexity and NID, a normalized compression distance (NCD) was proposed [4], which is an effective approximation of NID and has found many successful applications in the fields of bioinformatics, pattern recognition, and natural language processing.

Nevertheless, the application of NID to image similarity measurement is still in its early stage. Several authors have done pioneering work that applied the NID framework and the NCD algorithm to image clustering [5], image distinguishability [6], content-based image retrieval [7] and classification problems [8], but most of them reported only moderate success. Moreover, because they focused on specific applications, the generic property of NID was not fully exploited.

In this paper, we attempt to develop a practical framework that can effectively approximate NID, where the most critical step is to find the shortest program within a list of available transformations that convert one image to another. Based on the framework, generic image similarity measures can be built that have much wider applicability than existing image similarity measures.

2. KOLMOGOROV COMPLEXITY AND NORMALIZED INFORMATION DISTANCE

The Kolmogorov complexity [3] of an object is defined to be the length of the shortest program that can produce that object on a universal computer and halt:

$$K(x) = \min_{p:\,U(p)=x} l(p) . \tag{1}$$

In [4], the authors assume the existence of a general decompressor that can be used to decompress the presumably shortest program $x^*$ to the desired object $x$. However, they note that due to the non-computability of this concept, a compressor that does the opposite does not have to exist.

The conditional Kolmogorov complexity of $x$ relative to $y$ is denoted by $K(x|y)$. An information distance between $x$ and $y$ can then be defined as $\max\{K(x|y), K(y|x)\}$, which is the larger of the lengths of the shortest programs that compute $x$ from $y$ and $y$ from $x$. To convert it to a normalized symmetric metric, a novel NID measure was introduced in [4]:

$$\mathrm{NID}(x, y) = \frac{\max\{K(x|y^*), K(y|x^*)\}}{\max\{K(x), K(y)\}} . \tag{2}$$

It was proved that NID is a valid distance metric that satisfies the identity and symmetry axioms and the triangle inequality [4].

The real-world application of NID is difficult because Kolmogorov complexity is a non-computable quantity [3]. By using the fact that $K(xy) = K(y|x^*) + K(x) = K(x|y^*) + K(y)$ (subject to a logarithmic term), and by approximating Kolmogorov complexity $K$ using a practical data compressor $C$, a normalized compression distance (NCD) was proposed in [4] as

$$\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x), C(y)\}}{\max\{C(x), C(y)\}} . \tag{3}$$

NCD has been proved to be an effective approximation of NID and achieves superior performance in bioinformatics applications such as the construction of phylogeny trees using DNA sequences [4].

3. THE PROPOSED FRAMEWORK

When NCD was used to quantify image similarities, it did not achieve the same level of success as in other application fields. For example, it was reported in [6] that NCD works well when parts are added or subtracted from an image, but struggles when image variations involve form, material and structure. We believe that this is mainly due to the poor approximation of $K(xy)$ using $C(xy)$, which is often implemented by applying a regular image compressor to the concatenation of two images. For example, when an image is a ninety-degree rotated copy of another, concatenating the two images would not facilitate any efficient compression. To avoid this problem, we propose to approximate the conditional Kolmogorov complexity in Eq. (2) directly by designing a conditional image compressor denoted by $C_T$, so that

$$K(y|x) \approx C_T(y|x) \quad \text{and} \quad K(x|y) \approx C_T(x|y) . \tag{4}$$

This leads to a normalized conditional compression distance (NCCD) measure given by

$$\mathrm{NCCD}(x, y) = \frac{\max\{C_T(x|y), C_T(y|x)\}}{\max\{C(x), C(y)\}} . \tag{5}$$

It remains to define the conditional compressor $C_T$. Here we propose a practical solution by making use of a set of transformations that convert one image to another. Let $\{T_i \,|\, i = 1, \cdots, N\}$ be the set of transformations, let $T_i(x)$ represent the transformed image when applying the $i$-th transform to image $x$, and let $p(T_i, x)$ denote the parameters used in the transformation. Each type of transformation is also associated with a parameter compressor, and $C_i^p$ denotes the parameter compressor of the $i$-th transformation. We can then define our conditional compressor as

$$C_T(y|x) = \min_i \left\{ C[y - T_i(x)] + C_i^p[p(T_i, x)] + \log_2(N) \right\} , \tag{6}$$

where $C$ remains a practical image compressor that encodes the difference between $y$ and the transformed image $T_i(x)$, and the $\log_2(N)$ term is the number of bits required to encode the selection of one out of $N$ potential transformations.

The idea of finding the simplest transformation between two images is sensible from the viewpoint of human visual perception, for which it has long been hypothesized that the biological visual system is an efficient coder of the visual world [9]. For example, given two images that are rotated copies of each other, our visual system would not interpret the difference between them by directly differencing their intensity values (which requires a large number of bits to encode the residual), but by estimating the amount of rotation (which can be coded very efficiently).

4. IMPLEMENTATION

An advantage of NCCD (as opposed to NCD) is that it provides a more flexible framework so that different types of transformations can be included. The list of transformations can also be incremental, in the sense that new transformations, when available, can be easily added into the existing list, and expanding the list always improves the approximation of NCCD to NID. Of course, exhausting all possible transformations is practically impossible. However, going through a handful of transformations may be sufficient to appropriately cover most image distortions encountered in real-world applications.

Our current implementation of NCCD is as follows. First, we adopt the context-based, adaptive, lossless image coding (CALIC) algorithm [10] as the base image compressor, which achieves superior performance when compared with state-of-the-art algorithms. CALIC is employed in computing the denominator of Eq. (5) as well as the first term in Eq. (6). Since $y - T_i(x)$ in Eq. (6) can generate negative values and CALIC applies to grayscale images with positive intensity values only, the mean intensity value of $y - T_i(x)$ is shifted to mid-gray level before the application of CALIC. Second, the types of transformations involved in the computation of $C_T$ include

• Global contrast and luminance change. This is computed by a pointwise intensity transformation defined as $s = \alpha(r - \bar{r}) + \bar{r} + \beta$, where $r$ and $s$ are the intensity values before and after the transformation, respectively, $\bar{r}$ is the average value of $r$, and $\alpha$ and $\beta$ are the parameters that determine the degrees of contrast and mean luminance changes, respectively. In the special case when $\alpha = 1$ and $\beta = 0$, it reduces to an identity transform, i.e., $T(x) = x$.

• Global Fourier power spectrum scaling. This transformation attempts to match two images by scaling the power spectrum of one image in the Fourier transform domain. Let $X(\omega)$ and $Y(\omega)$ be the Fourier transforms of $x$ and $y$, respectively. We first find the best linear transform parameters $p_1$ and $p_2$, such that $\| |Y(\omega)| - (p_1 |X(\omega)| + p_2) \|^2$ is minimized. We then define the transform $T(x)$ as the inverse Fourier transform of $p_1 X(\omega) + p_2$.

• Global affine transform. This transformation tries to match one image by applying a global affine transform to another. The transformation can be encoded using six parameters and covers a variety of image changes including translation, scaling (zooming in or zooming out), rotation, and shearing.

• Local registration transformation. This is implemented by aligning two images using the coherent point drift registration algorithm [11] that allows for both rigid and affine non-rigid transformations.

Given a pair of images $x$ and $y$ for comparison, we attempt all the above transformations from both $x$ to $y$ and $y$ to $x$ (multiple transformations are also allowed). This is important because the values of $K(x|y)$ and $K(y|x)$ can be drastically different (and so can the values of $C_T(x|y)$ and $C_T(y|x)$). For example, converting the “Lena” image $x$ to a blank image $y$ is easy (as $y$ can be created by a very short program), but the opposite is not.

5. EXPERIMENT

The goal of our preliminary experimental work is to test the applicability of the proposed NCCD implementation for various distortion types and compare it with existing measures such as the mean squared error (MSE) and SSIM. Figure 1 shows four original images distorted with JPEG compression, blur, JPEG2000 compression and contrast reduction. MSE appears to be a poor measure in this test, because the best quality image (contrast reduced), which does not exhibit any structural distortion, results in the worst MSE value. Both SSIM and NCCD work reasonably well, giving the best quality values (highest SSIM and lowest NCCD) to the contrast-reduced image.

[Fig. 1 panel values, left to right: NCCD = 0.8121, SSIM = 0.4649, MSE = 73.4; NCCD = 0.2189, SSIM = 0.7059, MSE = 49.8; NCCD = 0.3901, SSIM = 0.8425, MSE = 78.4; NCCD = 0.0091, SSIM = 0.9094, MSE = 105.5]

Fig. 1. Comparison of MSE, SSIM and NCCD measures using images distorted by JPEG compression, blur, JPEG2000 compression and contrast reduction.

In Fig. 2, the test images underwent certain geometric distortions, for which the MSE and SSIM measures do not apply or cannot work appropriately. By contrast, NCCD produces meaningful similarity evaluations. In particular, it works when the two images are of different size and shape; it works if parts of an image are missing; it also works when images are shifted and/or the black/white pixels are inverted. All of these demonstrate the wider applicability of NCCD.

[Fig. 2 panel values, left to right: NCCD = 0.0189; NCCD = 0.4901; NCCD = 0.0016; NCCD = 0.021]

Fig. 2. Tests using images with geometric and compound distortions.

6. CONCLUSION

This work aims to develop a generic image similarity measure based upon the theoretical groundwork of Kolmogorov complexity and the NID metric. The most important contribution of this paper is to propose a practical framework of NCCD for the approximation of NID. The framework is flexible and expandable to include any image transformations that may help find the shortest description that converts one image to another and vice versa. Although the current implementation and experimental work are only preliminary, the resulting similarity measure works properly in a wide variety of scenarios. To the best of our knowledge, no existing image similarity measure has achieved the same level of wide applicability. In the future, the implementation of NCCD can be improved by incorporating more transformations. To further refine the NCCD framework, it is also useful to take into account the degrees of perceptual relevance of different transformations.

7. ACKNOWLEDGMENT

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada in the form of Discovery and Strategic Grants, and in part by Ontario Ministry of Research & Innovation in the form of an Early Researcher Award, which are gratefully acknowledged.

8. REFERENCES

[1] Z. Wang and A. C. Bovik, Modern Image Quality Assessment. Morgan & Claypool Publishers, Mar. 2006.
[2] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[3] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd ed. Berlin: Springer, 1997.
[4] M. Li, X. Chen, X. Li, B. Ma, and P. M. B. Vitányi, “The similarity metric,” IEEE Trans. Info. Theory, vol. 50, pp. 3250–3264, Dec. 2004.
[5] B. Hescott and D. Koulomzin, “On clustering images using compression,” CS Department, Boston University, Tech. Rep., 2007.
[6] N. Tran, “The normalized compression distance and image distinguishability,” in The 19th IS&T/SPIE Symposium on Electronic Imaging Science and Technology, San Jose, Jan. 2007.
[7] D. Cerra, A. Mallet, L. Gueguen, and M. Datcu, “Complexity based analysis of earth observation imagery: an assessment,” in ESA-EUSC, Mar. 2008.
[8] K. Kaabneh, A. Abdullah, and Z. Al-Halalemah, “Video classification using normalized information distance,” in Geometric Modeling and Imaging: New Trends, Aug. 2006, pp. 34–40.
[9] H. Barlow, Possible principles underlying the transformation of sensory messages. Cambridge, MA: MIT Press, 1961.
[10] X. Wu and N. Memon, “Context-based, adaptive, lossless image coding,” IEEE Trans. Comm., vol. 45, no. 4, pp. 437–444, Apr. 1997.
[11] A. Myronenko and X. Song, “Point-set registration: Coherent point drift,” 2009. [Online]. Available: http://www.citebase.org/abstract?id=oai:arXiv.org:0905.2635