Local Feature Extraction in Log-Polar Images

Manuela Chessa and Fabio Solari

Department of Informatics, Bioengineering, Robotics and System Engineering - DIBRIS, University of Genoa, Via All’Opera Pia 13, 16145 Genova, Italy [email protected]

Abstract. We propose two different strategies to compute edges in the log-polar (cortical) domain. The space-variant processing is obtained either by applying local operators (e.g. local derivative filters) directly to the log-polar images, or by embedding the same operators into the log-polar mapping, thus obtaining a cortical representation of the Cartesian features. The two approaches have been tested on three standard algorithms for edge detection (Canny, Marr-Hildreth and Harris), applied to the BSDS500 dataset. Qualitative and quantitative comparisons give a first indication of the validity of the proposed approaches.

Keywords: Space-variant processing · Foveated images · Edge detection · Cortical representation · Bio-inspired visual processing

1 Introduction

The computation of local image features, such as edges and corners, is at the basis of many approaches for matching and recognition, which are important in image processing and robotics applications. In the literature, several edge or contour detection methods, based on local operators applied in the Cartesian domain, are described. Among them, some algorithms detect edges by convolving a grayscale image with local derivative filters (e.g. the Roberts, Sobel and Prewitt operators), the Marr and Hildreth method uses the zero crossings of the Laplacian of Gaussian operator, and the Canny operator defines edge detection and localization criteria based on the first derivatives of a Gaussian [5]. A combined edge and corner detector, based on local derivatives, has been proposed by Harris [8]. More recent local approaches take into account color and texture information and make use of learning techniques for cue combination [11]. Recently, a contour detection method that combines multiple local cues into a globalization framework, based on spectral clustering, has been described in [1], and a multi-scale Harris corner detector has been proposed in [7].

Although a great effort has been made to improve edge detectors in the Cartesian domain, few works address the same problem in the log-polar (cortical) domain [14]. Space-variant images are promising for many image processing and robotics applications, since they provide a high spatial resolution in the region of interest, i.e. the fovea, and a reduction of the amount of data to be processed, similarly to what happens in the mammalian visual system. Indeed, the distribution of the photoreceptors in the primate retina is space-variant (i.e. denser in the center, the fovea, and sparser in the periphery), and the projection of such photoreceptors onto the primary visual cortex can be described by a log-polar mapping [14]. However, processing log-polar images is a challenging task, due to the image distortions generated by the retino-cortical transform, which often require a specific adaptation of the algorithms in order for them to work properly.

Primal feature extraction [10] in log-polar images has been addressed by some authors, who propose ad-hoc solutions designed to work in the cortical domain. In [12], the authors present a mechanism for computing operators, such as edge detection, directly in foveated images; in [6], the authors present an approach to extract edges, bars, blobs and ends from log-polar images, based on neural networks that learn the feature’s class; and a comparison of several strategies for gradient detection in log-polar images is presented in [18].

In this paper, we propose two approaches that allow standard algorithms to work in the log-polar domain without specific adaptation. Thus, we do not propose a new method for feature detection, but we analyze how well-known state-of-the-art methods for edge and corner extraction work in the log-polar domain. In particular, the aim of the paper is to show the performance, also in terms of accuracy of the feature detection, of the two considered approaches: (i) the local operator for edge detection (i.e. derivative of Gaussian or Laplacian of Gaussian) is applied to the cortical image, i.e. to the image that has been transformed into the log-polar domain through low-pass Gaussian filters; (ii) the local operator is embedded into the log-polar transform, thus producing a cortical representation of the Cartesian derivatives of the image, on which edges are computed. Moreover, we assess the two proposed approaches by using the metrics and the BSDS500 dataset presented in [1].

© Springer International Publishing Switzerland 2015. V. Murino and E. Puppo (Eds.): ICIAP 2015, Part I, LNCS 9279, pp. 410–420, 2015. DOI: 10.1007/978-3-319-23231-7_37

2 Log-Polar Mapping

The log-polar mapping is a non-linear transformation that maps each point (x, y) of the Cartesian domain into a cortical domain described by the coordinates (ξ, η). In the literature, several log-polar mapping models are described [2,4,9]. We consider the central blind-spot model, since it is characterized by scale and rotation invariance [17]. The log-polar transformation is described by the following equations:

$$\begin{cases} \xi = \log_a\left(\dfrac{\rho}{\rho_0}\right) \\ \eta = q\,\theta, \end{cases} \tag{1}$$

where a parameterizes the non-linearity of the mapping, q is related to the angular resolution, ρ0 is the radius of the central blind spot, and $(\rho, \theta) = (\sqrt{x^2 + y^2}, \arctan(y/x))$ are the polar coordinates derived from the Cartesian ones. All points with ρ < ρ0 are ignored, thus ρ0 has to be small with respect to the size of the image. In order to deal with digital images, given a Cartesian image of M × N pixels, and defining ρmax = 0.5 min(M, N), we obtain an R × S (rings × sectors) discrete cortical image of coordinates (u, v) by taking:

$$\begin{cases} u = \lfloor \xi \rfloor \\ v = \lfloor \eta \rfloor, \end{cases} \tag{2}$$

where ⌊·⌋ denotes the integer part, q = S/(2π), and a = exp(ln(ρmax/ρ0)/R). Figure 1 shows the transformations through the different domains. The retinal area (i.e. the log-polar pixel) that refers to a given cortical pixel defines its receptive field (RF). By inverting Eq. 1, the centers of the RFs can be computed; these points have a non-uniform distribution over the retinal plane, as in Figure 2a (green crosses). The optimal relationship between R and S is the one that makes the log-polar pixel aspect ratio γ as close as possible to 1. It can be shown that, for a given R, the optimal rule is S = 2π/(a − 1) [15,17].
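As an illustration of Eqs. 1 and 2, the following Python sketch computes the mapping parameters and the discrete cortical coordinates of a Cartesian point. The function names, the use of atan2 folded into [0, 2π) in place of arctan(y/x), the rounding of S to the nearest integer, and the evaluation of RF centers at half-integer cortical coordinates are assumptions made here for illustration; they are not taken from the paper.

```python
import math

def logpolar_params(R, rho0, rho_max):
    """Return (a, S, q) for the central blind-spot model."""
    a = math.exp(math.log(rho_max / rho0) / R)   # a = exp(ln(rho_max / rho0) / R)
    S = round(2.0 * math.pi / (a - 1.0))         # aspect-ratio rule S = 2*pi / (a - 1), rounded (assumption)
    q = S / (2.0 * math.pi)                      # q = S / (2*pi)
    return a, S, q

def cart_to_cortical(x, y, a, q, rho0):
    """Eqs. 1-2: discrete cortical coordinates (u, v), or None inside the blind spot."""
    rho = math.hypot(x, y)
    if rho < rho0:
        return None                              # points with rho < rho0 are ignored
    theta = math.atan2(y, x) % (2.0 * math.pi)   # angle folded into [0, 2*pi) (assumption)
    xi = math.log(rho / rho0) / math.log(a)      # xi = log_a(rho / rho0)
    eta = q * theta
    return int(xi), int(eta)                     # integer part (xi and eta are non-negative)

def rf_center(xi, eta, a, q, rho0):
    """Invert Eq. 1: Cartesian position of the cortical point (xi, eta)."""
    rho = rho0 * a ** xi
    theta = eta / q
    return rho * math.cos(theta), rho * math.sin(theta)

a, S, q = logpolar_params(R=100, rho0=3.0, rho_max=160.0)
x, y = rf_center(50.5, 40.5, a, q, 3.0)          # center of cortical pixel (50, 40) (assumption)
print(S, cart_to_cortical(x, y, a, q, 3.0))
```

For instance, with R = 100, ρ0 = 3 and a hypothetical ρmax = 160, the rule S = 2π/(a − 1) rounds to S = 155 sectors.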

Fig. 1. Transformations among the Cartesian, cortical and retinal domains. Left: the cyan circle and the green sector in the Cartesian domain (x, y) map to vertical and horizontal stripes, respectively, in the cortical domain (ξ,η). The red area represents a RF that is mapped into the corresponding cortical pixel. Right: an example of image transformation from the Cartesian to the cortical domain, and backward to the retinal domain. The RFs (yellow circles) are overlaid on the Cartesian image. The specific choice of the mapping parameters is: R = 80, S = 131, ρ0 = 3, and ρmax = 256. The cortical image is scaled to improve the visualization.

The shape of the RFs affects both the quality of the transformation and its computational burden. In [3] the authors analyze four techniques, each characterized by a different shape of the RFs: nearest pixel, bilinear interpolation, adjacent RFs, and overlapping circular RFs. The overlapping circular RFs [2,13] are the most biologically plausible technique and they allow a better preservation of the image information [3], thus we consider this solution in the paper.

To implement the log-polar mapping, the Cartesian plane is divided into two regions: the fovea and the periphery. The periphery is defined as the part of the plane in which the distance between the centers of two RFs on the same radius is greater than 1 pixel (undersampling). To obtain the cortical image we use overlapping Gaussian RFs, as shown in Figure 2a. The fovea (in which we have an oversampling, i.e. the distance between two consecutive RFs is less than 1 pixel) is handled by using fixed-size RFs, whereas in the periphery the size of the RFs grows. The standard deviation of the RF Gaussian profile is a third of the distance between the centers of two consecutive RFs, and the spatial support is six times the standard deviation. As a consequence of this choice, adjacent RFs overlap. A cortical pixel C_i is computed as a Gaussian weighted sum of the Cartesian pixels P_j in the i-th RF: $C_i = \sum_j w_{ij} P_j$, where the weights w_ij are the values of a normalized Gaussian centered on the i-th RF. A similar approach is used to compute the inverse log-polar mapping that produces the retinal image, where the space-variant effect of the log-polar mapping is observable.
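The Gaussian weighted sum described above can be sketched for a single cortical pixel as follows. This is a minimal illustration, not the authors' implementation: the function name, the representation of the image as a list of rows, the 1-pixel floor on the RF spacing (a stand-in for the fixed-size foveal RFs), and the clipping of the window at the image border are assumptions.

```python
import math

def cortical_pixel(image, cx, cy, spacing):
    """Gaussian-weighted mean of `image` (list of rows) over the RF centered at (cx, cy).

    `spacing` is the distance between the centers of two consecutive RFs.
    The RF center is assumed to lie inside the image.
    """
    sigma = max(spacing, 1.0) / 3.0          # sigma = spacing / 3; the 1-pixel floor is an assumption
    half = math.ceil(3.0 * sigma)            # support = 6 * sigma, i.e. 3 * sigma on each side
    x0, y0 = round(cx), round(cy)            # window centered on the nearest pixel (assumption)
    num = den = 0.0
    for y in range(max(y0 - half, 0), min(y0 + half + 1, len(image))):
        for x in range(max(x0 - half, 0), min(x0 + half + 1, len(image[0]))):
            w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            num += w * image[y][x]           # accumulate w_ij * P_j
            den += w                         # normalization so that the weights sum to 1
    return num / den

flat = [[7.0] * 64 for _ in range(64)]
print(cortical_pixel(flat, 30.3, 20.7, 6.0))
```

Because the weights are normalized, a uniform image is mapped to the same uniform value, whatever the RF size.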

Fig. 2. The RFs considered to obtain the cortical representation of the image. (a) Gaussian RFs, used to obtain the cortical image (log-polar transform). (b) Laplacian of Gaussian RFs and (c) derivative of Gaussian RFs (along the horizontal and vertical axes, respectively), used to obtain the cortical representation of the derivatives of the image.

3 Feature Detection in the Log-Polar Domain

The cortical representation R(ξ,η) of a space-variant processed Cartesian image I(x, y) is described as follows:

$$R(\xi,\eta) = \langle g(x - x_0(\xi,\eta),\, y - y_0(\xi,\eta)),\, I(x,y) \rangle, \tag{3}$$

where ⟨·,·⟩ denotes the inner product, g(x, y) is the local operator that defines the weights of the log-polar mapping, and (x0(ξ,η), y0(ξ,η)) is the center of each RF. Figure 2 shows several operators, which are used to obtain R(ξ,η). In particular, Gaussian RFs (Fig. 2a) are used to obtain the cortical image, i.e. the log-polar transform of the Cartesian image, whereas Laplacian of Gaussian and derivative of Gaussian RFs (Fig. 2b-c) are used to obtain the cortical representation of the image derivatives. The choice of the RFs is at the basis of the two approaches described in the following paragraphs and summarized in Figure 3. In the following, we show that the two cortical approaches can approximate the corresponding Cartesian processing.

Fig. 3. Sketch of the two approaches described in the paper. Top: the Cartesian image (from the BSDS500 dataset, see Section 4) is transformed (LPM) by considering overlapping RFs embedding one of the considered local operators (in the figure, the derivative of Gaussian filters) to obtain the cortical representation of the Cartesian derivatives, from which edges are extracted, thus obtaining cortical edges. Finally, through the inverse log-polar mapping (ILPM), retinal edges are computed. Bottom: the Cartesian image is transformed into the cortical image through the Gaussian RFs, then the derivatives are computed directly on the cortical image, by applying one of the considered local operators, and cortical edges are extracted. Finally, through the inverse log-polar mapping, retinal edges are computed.

3.1 Direct Feature Detection in the Log-Polar Domain

To obtain a cortical representation, i.e. a cortical image, of a Cartesian image, we consider Gaussian filters as the local operators that describe the RFs used to perform the log-polar mapping (see Fig. 2a):

$$g(x,y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right), \tag{4}$$

where σ is the standard deviation. Then, standard differential local operators for edges and corners are applied to the cortical image, by following the design strategies devised in [15]. In particular, to apply standard edge detection algorithms we define:

$$R_\xi(\xi,\eta) = \frac{\partial}{\partial \xi} R(\xi,\eta), \tag{5}$$

where Rξ(ξ,η) is the partial derivative (along the ξ axis) of the cortical image, and similarly for the η axis. Eq. 5 can be approximated through derivatives of Gaussian, to implement the Canny edge detector in the log-polar domain (Cannyd), which is equivalent to implementing it in the Cartesian domain if the following rules are satisfied:

- Local filtering operations in the log-polar domain are a good approximation of the same filtering done in the Cartesian domain when the log-polar mapping is performed with a pixel aspect ratio γ close to 1, and the spatial support of the filters is small with respect to the size of the cortical image (e.g. less than 10%) [15].
- The gradients in the cortical domain are related to the ones in the Cartesian domain by the following relationship [16]:

$$\begin{bmatrix} R_\xi \\ R_\eta \end{bmatrix} = \frac{1}{\rho_0 a^{\xi} \ln(a)} \begin{bmatrix} \cos\eta & \sin\eta \\ -\sin\eta & \cos\eta \end{bmatrix} \begin{bmatrix} I_x \\ I_y \end{bmatrix}, \tag{6}$$

where Ix and Iy are the partial derivatives of the Cartesian image. Eq. 6 represents a rotation and a magnitude change of the Cartesian image gradient, thus the cortical gradients Rξ and Rη have the same properties as the Cartesian ones, and they can be used for edge detection (see Fig. 3 bottom).
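The direct approach of Eq. 5 can be sketched by filtering the cortical image with sampled derivative of Gaussian kernels. The sketch below is illustrative only: the function names, the storage of the cortical image as a list of rings (rows, ξ) by sectors (columns, η), the wrap-around along η and the clamping along ξ at the image border are assumptions; the 9 taps and the standard deviation of 1.4 are the values quoted in Section 4 for Cannyd.

```python
import math

def dog_kernels(size, sigma):
    """Sampled 1-D Gaussian g (unit sum) and its derivative dg on `size` taps."""
    half = size // 2
    taps = range(-half, half + 1)
    g = [math.exp(-(t * t) / (2.0 * sigma ** 2)) for t in taps]
    norm = sum(g)
    g = [v / norm for v in g]
    dg = [-t / sigma ** 2 * v for t, v in zip(taps, g)]   # d/dt of the Gaussian profile
    return g, dg

def cortical_gradient(cortical, u, v, g, dg):
    """Approximate (R_xi, R_eta) at cortical pixel (u, v) by separable convolution."""
    rows, cols = len(cortical), len(cortical[0])
    half = len(g) // 2
    r_xi = r_eta = 0.0
    for i, t in enumerate(range(-half, half + 1)):
        uu = min(max(u - t, 0), rows - 1)        # clamp along xi (assumption)
        for j, s in enumerate(range(-half, half + 1)):
            vv = (v - s) % cols                  # wrap along eta, the angle being periodic (assumption)
            val = cortical[uu][vv]
            r_xi += dg[i] * g[j] * val           # derivative along xi, smoothing along eta
            r_eta += g[i] * dg[j] * val          # smoothing along xi, derivative along eta
    return r_xi, r_eta

g, dg = dog_kernels(9, 1.4)
ramp = [[2.0 * u for _ in range(32)] for u in range(24)]   # intensity grows by 2 per ring
print(cortical_gradient(ramp, 12, 5, g, dg))
```

On this synthetic ramp the ξ response is close to the slope 2 and the η response vanishes; the small deficit in the ξ response comes from truncating the Gaussian to 9 taps.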

The Marr-Hildreth edge detector (MHd) is based on the Laplacian of Gaussian, which can be directly applied in the cortical domain, following the previously explained rules. The Harris corner and edge detector (Harrisd) is based on derivatives of Gaussian that yield the cortical image structure tensor, and can be described as follows:

$$M(\xi,\eta) = w(\xi,\eta) * \begin{bmatrix} R_\xi^2(\xi,\eta) & R_\xi(\xi,\eta) R_\eta(\xi,\eta) \\ R_\xi(\xi,\eta) R_\eta(\xi,\eta) & R_\eta^2(\xi,\eta) \end{bmatrix}, \tag{7}$$

where w(ξ,η) are Gaussian weights, and ∗ is the convolution operator. The edge and corner features are then derived by computing:

$$H(\xi,\eta) = \det(M(\xi,\eta)) - k\, \mathrm{tr}^2(M(\xi,\eta)), \tag{8}$$

where k is a scalar, whose values are in the interval [0.04, 0.15] [7], and det(·) and tr(·) are the determinant and the trace operators, respectively. By considering a small spatial support for the Gaussian weights w(ξ,η), and the relationship between the cortical and the Cartesian gradients (see Eq. 6), the Harris detector in the log-polar domain is a good approximation of the one in the Cartesian domain. Indeed, the cortical image structure tensor can be written in terms of a rotation of the Cartesian gradients, multiplied by a term that can be considered constant within the small spatial support of the Gaussian weights:

$$M(\xi(x,y),\eta(x,y)) = w(\xi(x,y),\eta(x,y)) * \frac{1}{\left(\rho_0 a^{\xi(x,y)} \ln(a)\right)^2} \begin{bmatrix} (I_x^\eta)^2 & I_x^\eta I_y^\eta \\ I_x^\eta I_y^\eta & (I_y^\eta)^2 \end{bmatrix}, \tag{9}$$

with $I_x^\eta = I_x(x,y)\cos\eta + I_y(x,y)\sin\eta$ and $I_y^\eta = -I_x(x,y)\sin\eta + I_y(x,y)\cos\eta$. It is worth noting that the two coordinate axes in the log-polar domain have different meanings and ranges of values with respect to the Cartesian ones, and this affects the structure tensor. Nevertheless, by following the design rules previously explained, the error between the processing in the two domains is negligible.
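The structure tensor and the Harris response (Eqs. 7 and 8) can be sketched at a single cortical pixel as follows. The function names and the list-of-rows layout of the gradient maps are assumptions, the pixel is assumed to lie far enough from the border for the window to fit, and the 3 × 3 Gaussian window with standard deviation 0.5 and k = 0.05 are the values quoted in Section 4.

```python
import math

def gaussian_window(size=3, sigma=0.5):
    """Normalized size x size Gaussian weights w (unit sum)."""
    half = size // 2
    w = [[math.exp(-(i * i + j * j) / (2.0 * sigma ** 2))
          for j in range(-half, half + 1)] for i in range(-half, half + 1)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def harris_response(r_xi, r_eta, u, v, w, k=0.05):
    """H = det(M) - k * tr(M)^2 at pixel (u, v), with M the weighted structure tensor."""
    half = len(w) // 2
    m11 = m12 = m22 = 0.0
    for i in range(-half, half + 1):
        for j in range(-half, half + 1):
            gx = r_xi[u + i][v + j]
            gy = r_eta[u + i][v + j]
            wt = w[i + half][j + half]       # symmetric window, so correlation equals convolution
            m11 += wt * gx * gx
            m12 += wt * gx * gy
            m22 += wt * gy * gy
    det = m11 * m22 - m12 * m12
    trace = m11 + m22
    return det - k * trace * trace

w = gaussian_window()
# Toy gradient maps: one dominant orientation in columns 0-2, the orthogonal one in columns 3-4.
r_xi = [[1.0 if j <= 2 else 0.0 for j in range(5)] for _ in range(5)]
r_eta = [[0.0 if j <= 2 else 1.0 for j in range(5)] for _ in range(5)]
print(harris_response(r_xi, r_eta, 2, 2, w))
```

With this definition, H is negative where a single gradient orientation dominates (an edge), positive where two orientations coexist inside the window (a corner), and zero in flat regions.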

3.2 Feature Detection Based on Embedded Processing in the Log-Polar Transform

Following the biological evidence, visual processing is performed by networks of neurons described by their RFs. The RFs can be approximated by a filter bank that performs the desired visual processing. Thus, in this paper we propose to modify the log-polar transform by using, as cortical mapping weights, specific filters that perform the desired feature computation; in particular, we consider as weights the Laplacian of Gaussian and the derivatives of Gaussian (see Fig. 2b-c). The cortical representation Rx(ξ,η) of the Cartesian derivative Ix(x, y) of the image along the x axis (computed through derivative of Gaussian filters) is obtained as follows:

$$R_x(\xi,\eta) = \langle g_x(x - x_0(\xi,\eta),\, y - y_0(\xi,\eta)),\, I(x,y) \rangle, \tag{10}$$

where gx(x, y) is the x-axis derivative of Gaussian operator, and (x0(ξ,η), y0(ξ,η)) is the center of each RF; the y-axis is treated similarly. The Canny edge detector (CannyRF) can thus be implemented by using the cortical representation of the Gaussian derivatives (see Fig. 3 top). It is worth noting that this approach does not require any approximation of the filtering stage, since Rx(ξ,η) and Ry(ξ,η) are the actual values of the image derivatives, represented in the log-polar domain. The same principle holds for the Marr-Hildreth edge detector (MHRF), by using the Laplacian of Gaussian as weights of the log-polar mapping. The Harris corner and edge detector (HarrisRF) is now based on the following equation:

$$M(\xi,\eta) = w(\xi,\eta) * \begin{bmatrix} R_x^2(\xi,\eta) & R_x(\xi,\eta) R_y(\xi,\eta) \\ R_x(\xi,\eta) R_y(\xi,\eta) & R_y^2(\xi,\eta) \end{bmatrix}. \tag{11}$$

By considering a small spatial support for w(ξ,η), edges and corners can be computed by using Eq. 8, since the image structure tensor is now defined in terms of the Cartesian image derivatives (mapped into the log-polar domain).

Figure 4 shows the cortical derivatives, the cortical edges, and the corresponding retinal edges for a sample image, by considering the two described approaches and the three considered algorithms. In Section 4, we further analyze the performance of the proposed approaches.
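The embedded approach of Eq. 10 can be sketched by evaluating a derivative of Gaussian RF at an arbitrary (non-integer) RF center. This is an illustration under stated assumptions rather than the authors' implementation: the function name and the list-of-rows image layout are hypothetical, the weights are proportional to the mirrored derivative of Gaussian so that a positive slope gives a positive response, and the zero-mean correction and the normalization of the sampled weights are added here so that a linear ramp returns exactly its slope.

```python
import math

def derivative_rf_response(image, cx, cy, sigma):
    """Responses (R_x, R_y) of derivative-of-Gaussian RFs centered at (cx, cy).

    The RF is assumed to lie entirely inside `image` (a list of rows).
    """
    half = math.ceil(3.0 * sigma)                       # 3 * sigma on each side of the center
    xs = range(round(cx) - half, round(cx) + half + 1)
    ys = range(round(cy) - half, round(cy) + half + 1)
    gx = [math.exp(-((x - cx) ** 2) / (2.0 * sigma ** 2)) for x in xs]
    gy = [math.exp(-((y - cy) ** 2) / (2.0 * sigma ** 2)) for y in ys]
    # Zero-mean correction: the sampled weights must not respond to a constant image.
    mx = sum((x - cx) * g for x, g in zip(xs, gx)) / sum(gx)
    my = sum((y - cy) * g for y, g in zip(ys, gy)) / sum(gy)
    wx = [(x - cx - mx) * g for x, g in zip(xs, gx)]    # derivative profile along x
    wy = [(y - cy - my) * g for y, g in zip(ys, gy)]    # derivative profile along y
    # Normalization: a unit-slope ramp must give a unit response.
    nx = sum(w * (x - cx) for x, w in zip(xs, wx)) * sum(gy)
    ny = sum(w * (y - cy) for y, w in zip(ys, wy)) * sum(gx)
    rx = ry = 0.0
    for y, g_y, w_y in zip(ys, gy, wy):
        for x, g_x, w_x in zip(xs, gx, wx):
            rx += w_x * g_y * image[y][x]               # inner product with the x-derivative RF
            ry += g_x * w_y * image[y][x]               # inner product with the y-derivative RF
    return rx / nx, ry / ny

plane = [[2.0 * x - 3.0 * y + 5.0 for x in range(40)] for y in range(40)]
print(derivative_rf_response(plane, 20.3, 15.7, 2.0))
```

On the synthetic plane I(x, y) = 2x − 3y + 5 the two responses recover the slopes 2 and −3 up to rounding, which illustrates why no approximation of the filtering stage is involved when the operator is embedded in the RFs.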

4 Results

The qualitative and quantitative analyses of the approaches described in this paper have been performed by using the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500), described in [1]. This dataset is an extension of the BSDS300, where the original 300 images are used for training/validation and 200 fresh images, together with human annotations, are added for testing. Each image was segmented by five different subjects on average. In this paper, we only use and present results for the testing images.

Fig. 4. Edges computed for an image of the BSDS500 dataset. The columns represent the different approaches and algorithms (MHRF, MHd, CannyRF, Cannyd, HarrisRF, Harrisd). First row: cortical image derivatives. Second row: cortical edges. Third row: retinal edges (computed from the cortical ones through the inverse log-polar mapping). Fourth row: edges computed in the Cartesian domain for comparison. Only the highlighted part of the Cartesian image is mapped into the cortical domain and analyzed (the shaded part is not considered).

The log-polar mapping is performed by setting the transform’s parameters as follows: R = 100, S = 155, ρ0 = 3, and ρmax = 320, which correspond to a compression ratio of about 10 times. Edges are computed by setting the following parameters:

- Marr-Hildreth (MHd and MHRF): the threshold on the slope of the zero-crossing values is 40. The spatial support of the Laplacian of Gaussian for MHd is 3 × 3 pixels.
- Canny (Cannyd and CannyRF): the spatial support of the derivatives of Gaussian for Cannyd is 9 × 9 pixels, and the standard deviation is 1.4. Non-maximum suppression and a hysteresis threshold (whose lower and upper bounds are set to 0.3 and 0.7 of the maximum gradient value, respectively) have been applied.
- Harris (Harrisd and HarrisRF): the spatial support of the derivatives of Gaussian for Harrisd is 3 × 3 pixels. The weighting function w(ξ,η) is a Gaussian function with spatial support 3 × 3 pixels and standard deviation 0.5, and k is 0.05.

A qualitative comparison of edge computation is presented in Figure 4. The approaches based on direct feature detection in the log-polar domain and the ones based on feature detection using embedded processing in the log-polar transform perform in a similar manner. This suggests that the approximation errors between computing a feature directly in the cortical domain and by embedding the local operators into the log-polar transform are negligible. The differences among the three methods are due to the different choice of the parameters, but a systematic comparison of edge detection algorithms is out of the scope of this paper. The Harris method provides both edges and corners, as Figure 5 shows for 4 sample images of the BSDS500 dataset. Also in this case, Harrisd and HarrisRF perform very similarly.

Fig. 5. Edges (green) and corners (red) computed with the Harris detector, by considering derivative of Gaussian RFs embedded in the log-polar transform (HarrisRF, first row), and directly on the cortical image (Harrisd, second row). Images from the BSDS500 dataset.

In addition to the qualitative evaluation, we have tested the performance of the implemented algorithms in terms of precision, the fraction of detected boundary pixels that are true positives, and recall, the fraction of ground-truth boundary pixels detected [11]. We have also computed the F-measure, i.e. the harmonic mean of precision and recall at the optimal detector threshold, which provides a single score useful to evaluate the methods.

Fig. 6. Boundary benchmark on the BSDS500 dataset (precision versus recall). Our implemented methods, working in the log-polar domain, have been compared with the human ground truth provided by the dataset. The precision-recall curves have been computed by using the code available from http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html. The F-measure for each method is shown in the legend: Cannyd F = 0.51, CannyRF F = 0.51, MHd F = 0.52, MHRF F = 0.51, Harrisd F = 0.51, HarrisRF F = 0.52.

It is worth noting that it is not possible to compare the proposed approaches with the state-of-the-art algorithms for edge detection, for the following reasons. First, we compute edges only in the part of the images covered by RFs (see Fig. 2). In particular, for the BSDS500 dataset the part of the images transformed into the log-polar domain is the highlighted one (e.g. see Fig. 4), and the ground-truth edges have been masked accordingly. Moreover, working in the cortical domain yields a loss of details in the image periphery (due to the high compression ratio that is achieved), thus a comparison with methods that work at full resolution would be unfair. The precision-recall graphs are reported in Figure 6. The F-measures obtained when choosing an optimal scale for the entire dataset have almost the same values for all the algorithms, whereas the F-measures per image show that HarrisRF has a slightly better performance (F = 0.54). For the sake of clarity, it is worth noting that on the same dataset the Canny algorithm (in the Cartesian domain at full resolution) has a better F-measure (F = 0.60).
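The evaluation scores used above can be made concrete with a short sketch. It only shows how precision, recall and the F-measure are combined once the numbers of true positives, false positives and false negatives are known; the matching of detected and ground-truth boundary pixels is done by the benchmark code and is not reproduced here. The function names and the toy counts are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def best_f(curve):
    """Largest F-measure over (precision, recall) pairs, one pair per detector threshold."""
    return max(f_measure(p, r) for p, r in curve)

p, r = precision_recall(tp=6, fp=4, fn=6)           # toy counts, for illustration only
print(p, r, f_measure(p, r))
print(best_f([(0.9, 0.1), (0.6, 0.5), (0.2, 0.9)]))
```

Taking the maximum over the thresholds corresponds to reporting the F-measure at the optimal detector threshold.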

5 Conclusion

In this paper, we have proposed two approaches for edge detection in the log-polar domain, which allow us to apply well-known and established techniques, previously developed in the Cartesian domain, without the need to adapt them to the cortical domain. We have presented the theoretical basis on which this is possible, by following two distinct approaches: one in which features are directly computed on the log-polar images, the other in which the local differential operators at the basis of the considered algorithms are embedded into the log-polar mapping, thus allowing us to obtain a cortical representation of the Cartesian image processing.

The results confirm the validity of the approaches and suggest that they could be promising for achieving feature detection in space-variant images. Nevertheless, some issues are still open and will be further investigated. On the one hand, a parametric analysis of the considered edge detectors should be performed, in order to find the optimal set of parameters, and more recent state-of-the-art algorithms for edge detection should be analyzed in order to describe them in the cortical domain. On the other hand, it is necessary to consider multiple log-polar mappings of the same image, obtained by moving the fovea to several image points, in order to build a full-resolution representation of edges, thus mimicking the shifting focus of attention of the human visual system.

References

1. Arbelaez, P., Maire, M., Fowlkes, C., Malik, J.: Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 898–916 (2011)
2. Bolduc, M., Levine, M.D.: A review of biologically motivated space-variant data reduction models for robotic vision. Computer Vision and Image Understanding 69(2), 170–184 (1998)
3. Chessa, M., Sabatini, S.P., Solari, F., Tatti, F.: A quantitative comparison of speed and reliability for log-polar mapping techniques. In: Crowley, J.L., Draper, B.A., Thonnat, M. (eds.) ICVS 2011. LNCS, vol. 6962, pp. 41–50. Springer, Heidelberg (2011)
4. Florack, L.M.J.: Modeling foveal vision. In: Sgallari, F., Murli, A., Paragios, N. (eds.) SSVM 2007. LNCS, vol. 4485, pp. 919–928. Springer, Heidelberg (2007)
5. Forsyth, D.A., Ponce, J.: Computer Vision: A Modern Approach. Prentice Hall Professional Technical Reference (2002)
6. Gomes, H.M., Fisher, R.B.: Primal sketch feature extraction from a log-polar image. Pattern Recognition Letters 24(7), 983–992 (2003)
7. Gueguen, L., Pesaresi, M.: Multi scale Harris corner detector based on differential morphological decomposition. Pattern Recognition Letters 32(14), 1714–1719 (2011)
8. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proceedings of the 4th Alvey Vision Conference, pp. 147–151 (1988)
9. Jurie, F.: A new log-polar mapping for space variant imaging. Application to face detection and tracking. Pattern Recognition 32, 865–875 (1999)
10. Marr, D.: Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. Henry Holt and Co., Inc., New York (1982)
11. Martin, D., Fowlkes, C., Malik, J.: Learning to detect natural image boundaries using local brightness, color, and texture cues. IEEE Trans. Pattern Anal. Mach. Intell. 26(5), 530–549 (2004)
12. Nattel, E., Yeshurun, Y.: Direct feature extraction in a foveated environment. Pattern Recognition Letters 23(13), 1537–1548 (2002)
13. Pamplona, D., Bernardino, A.: Smooth foveal vision with Gaussian receptive fields. In: 9th IEEE-RAS International Conference on Humanoid Robots (2009)
14. Schwartz, E.: Spatial mapping in the primate sensory projection: Analytic structure and relevance to perception. Biological Cybernetics 25, 181–194 (1977)
15. Solari, F., Chessa, M., Sabatini, S.P.: Design strategies for direct multi-scale and multi-orientation feature extraction in the log-polar domain. Pattern Recognition Letters 33(1), 41–51 (2012)
16. Solari, F., Chessa, M., Sabatini, S.P.: An integrated neuromimetic architecture for direct motion interpretation in the log-polar domain. Computer Vision and Image Understanding 125, 37–54 (2014)
17. Traver, V., Pla, F.: Log-polar mapping template design: From task-level requirements to geometry parameters. Image Vision Computing 26(10), 1354–1370 (2008)
18. Wallace, A., McLaren, D.: Gradient detection in discrete log-polar images. Pattern Recognition Letters 24(14), 2463–2470 (2003)