An analysis of the Scale Saliency algorithm

Timor Kadir, Djamal Boukerroui, Michael Brady
Robotics Research Laboratory, Department of Engineering Science, University of Oxford, Parks Road, Oxford OX1 3PJ, U.K.

August 8, 2003

Abstract

In this paper, we present an analysis of the theoretical underpinnings of the Scale Saliency algorithm recently introduced in (Kadir and Brady, 2001). Scale Saliency considers image regions salient if they are simultaneously unpredictable in some feature-space and over scale. The algorithm possesses a number of attractive properties: invariance to planar rotation, scaling, intensity shifts and translation; and robustness to noise, changes in viewpoint, and intensity scalings. Moreover, the approach offers a more general model of feature saliency than conventional ones, such as those based on kernel convolution, for example wavelet analysis. Typically, such techniques define saliency and scale with respect to a particular set of basis morphologies. The aim of this paper is to make explicit the nature of this generality. Specifically, we aim to answer the following questions: What exactly constitutes a 'salient feature'? How does this differ from other feature selection methods? The main result of our analysis is that Scale Saliency defines saliency and scale independently of any particular feature morphology. Instead, a more general notion is used, namely spatial unpredictability, measured within the geometric constraint imposed by the sampling window and its parameterisation. In (Kadir and Brady, 2001) this window was a circle parameterised by a single parameter controlling the radius. Under such a scheme, features are considered salient if their feature-space properties vary rapidly with an incremental change in radius; in other words, the algorithm favours isotropically unpredictable features.
We also present a number of variations of the original algorithm: a modification for colour images (or, more generally, any vector-valued image) and a generalisation of the isotropic scale constraint to the anisotropic case.

Keywords: Visual Saliency, Scale Selection, Salient Features, Entropy, Scale-space.

1 Introduction

Computer vision algorithms are, in general, information reduction processes. Brute-force approaches to image or image sequence analysis can quickly overwhelm most computing resources at our disposal. Fortunately, images are a redundant data source: the same set of inferences may often be drawn from a variety of image characteristics. This becomes self-evident when one considers the array of different methodologies available for solving any particular vision task. Hence, the selection of a sufficient set of image regions and properties, or salient features, forms the first step in many computer vision algorithms. Two key issues face the vision algorithm designer: the subset of image properties selected for subsequent analysis, and the model used to represent those properties. For example, many image matching algorithms begin with a set of 'landmark' points which serve as a basis for estimating the image transformation that defines the match. In this case, well-localised and unique image regions are desirable to minimise the likelihood of false matches. For many tasks, geometric and photometric invariance properties are also beneficial. Finally, there is often an implicit, but difficult to quantify, requirement that the salient regions be relevant to the task of interest; in other words, the regions, or the descriptions subsequently extracted from them, should somehow be characteristic of the scene contents they are intended to signify.

Many definitions of saliency have been proposed. Perhaps the most popular have arisen out of the application of local surface differential geometry techniques to imaging.
Such methods consider the image to be a discrete approximation to a surface and categorise it by the application of differential operators. Closely related to these are basis projection and filtering methods. Common to both is the development of one or two dimensional features: one dimensional features include edges, lines and ridges (Bergholm, 1986, Canny, 1986); two dimensional features are often referred to as interest points or 'corners' (Deriche and Giraudon, 1993, Harris and Stephens, 1988, Mokhtarian and Suomela, 1998). Much effort within the Scale-Space and Wavelet communities has been devoted to providing a mathematically sound basis for the application of such techniques to what are essentially discrete sets (Koenderink, 1984, Lindeberg and ter Haar Romeny, 1994, Mallat, 1998, Witkin, 1983). In general, these methods share one assumption: that saliency is a direct property of the geometry or morphology of the image surface. While it is certainly the case that many useful image features can be defined in such a manner, efforts to generalise such methods to capture a broader range of salient image regions have had limited success. We contend that one of the major factors for this is that such methods typically define both saliency and scale with respect to a small set of basis functions or geometric properties. Perhaps, then, it is for this reason, and the lack of a satisfactory definition of what constitutes a salient feature in the broader sense, that the term 'feature selection' has acquired this restricted interpretation.

There are a number of exceptions to this. Phase Congruency and the related Local Energy approach (Kovesi, 1999) define features in terms of the phase coherence of Fourier components. For example, at a step-edge all Fourier components are maximally in phase at an angle of 0◦ or 180◦ for positive or negative transitions respectively.
One of the benefits of such an approach is that several feature types may be detected simultaneously. Yet despite the novelty of the model, Kovesi was primarily interested in the simple one or two dimensional features typical of geometric methods; there was no effort to broaden the definition of saliency. An alternative strategy is to define saliency in terms of the probabilistic or statistical properties of the image. This approach has been most popular for region segmentation tasks (Besag, 1986, Leclerc, 1989, Li, 1995, Paragios and Deriche, 2002, Zhu and Yuille, 1996). There have also been several attempts at feature detection using statistical measures; it is well known, for example, that local variance can be employed as a basic edge detector. Other methods have attempted to estimate saliency by measuring the rarity of feature properties.

In (Kadir and Brady, 2001), we proposed a novel model of feature saliency. In our approach, termed Scale Saliency, regions are deemed salient if they exhibit unpredictable behaviour (in a probabilistic sense) simultaneously in feature-space and over scale. Scale Saliency possesses a number of attractive properties. First, it offers a more general model of feature saliency than conventional methods. Second, it incorporates an intrinsic notion of scale and a method for selecting it locally. Third, it makes explicit the link between the definition of saliency and the method of description. In short, it offers a coherent methodology incorporating three intimately related concepts: scale, saliency and image description. The implementation presented in (Kadir and Brady, 2001) possesses a number of other beneficial qualities: invariance to planar rotation, scaling, intensity shifts and translation; and robustness to noise, changes in viewpoint, and intensity scalings.

In this paper, we present an in-depth analysis of the theoretical underpinnings of the Scale Saliency model.
The aim here is to make explicit the definition of saliency in this model. Specifically, we aim to answer the following questions: What exactly constitutes a salient feature? How is this different from other feature selection methods?

This paper is organised as follows. In Section 2 we provide a brief overview of the Scale Saliency algorithm. Scale Saliency is the product of two terms, measuring the unpredictability of the local PDF in feature-space and over scale respectively. Detailed analyses of these two terms are presented in Sections 3 and 4, where we derive expressions for the conditions under which Scale Saliency is maximised and discuss the underlying model. In Section 5, we present generalisations of the method to colour images and anisotropic scale. In Section 6, we discuss the relationship between the Scale Saliency algorithm and transform-based methods for feature detection. Finally, in Section 7 we conclude our analysis and outline a number of remaining open issues.

2 Scale Saliency

In this section, we briefly describe the Scale Saliency algorithm. A more detailed discussion of the technique may be found in (Kadir and Brady, 2001).

2.1 Saliency as local unpredictability

Gilles (1998) investigated the use of salient local image patches, or 'icons', for matching and registering two images. He defined saliency in terms of local signal complexity or unpredictability; more specifically, he estimated saliency using the Shannon entropy of local attributes. Figure 1 shows local intensity histograms from a number of image segments. Areas corresponding to high signal complexity tend to have flatter distributions, hence higher entropy. More generally, high complexity of any suitable descriptor can be used as a measure of local saliency. Local attributes, such as colour or edge strength, direction or phase, may be used. Given a point x, a local neighbourhood R_X, and a descriptor d that takes values from D = {d_1, ..., d_r} (e.g.
in an 8 bit grey level image D would range from 0 to 255), local entropy (in the discrete form) is defined as:

H_{D,R_X} = -\sum_{i=1}^{r} p_{d,R_X}(d_i) \log_2 p_{d,R_X}(d_i)    (1)

where p_{d,R_X}(d_i) is the probability of descriptor D taking the value d_i in the local region R_X.

Gilles' method has a number of limitations. It requires the specification of a window size, or scale, over which an estimate of the local PDF may be obtained. Underlying this definition of saliency is the assumption that complexity is rare in real images. This is generally true, except in the case of pure noise or self-similar images (e.g. fractals), where complexity is independent of scale and position, and textured regions, where, in general, complexity is more prevalent.
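As a concrete illustration of Eq. (1), the sketch below estimates the entropy of the grey-level histogram inside a circular window of a given radius, the sampling geometry used in (Kadir and Brady, 2001). This is only a minimal illustration of the entropy term, not the authors' implementation; the function name and parameters are our own choices.

```python
import numpy as np

def local_entropy(image, x, y, radius, bins=256):
    """Shannon entropy, as in Eq. (1), of the grey-level histogram
    inside a circular window of the given radius centred at (x, y)."""
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    # Circular sampling window R_X: a circle parameterised by its radius
    mask = (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
    values = image[mask]
    # p_{d,R_X}(d_i): empirical probability of each descriptor value d_i
    counts = np.bincount(values, minlength=bins)
    p = counts / counts.sum()
    p = p[p > 0]  # 0 log 0 is taken as 0
    return float(-np.sum(p * np.log2(p)))
```

A flat (constant) patch yields zero entropy, while a patch whose grey levels are spread evenly over the histogram bins approaches the maximum of log2(bins) bits, matching the observation above that flatter distributions signal higher complexity.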