Efficient and Effective Contour-based Methods for Image Matching

Naurin Afrin

A dissertation submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

Supervisors: Dr Wei Lai, Professor Chengfei Liu and Dr Nabeel Mohammed

Department of Computer Science and Software Engineering, Swinburne University of Technology, Australia

28 July 2019

Except where otherwise indicated, this thesis is my own original work and has not been submitted for any other degree.

Naurin Afrin
28 July 2019

This thesis is dedicated to my parents.

Acknowledgments

First and foremost, I would like to express my gratitude to the Almighty Allah for all His blessings that made everything possible for me and gave me the strength to overcome all the obstacles I encountered during my studies. I would like to especially thank my principal PhD supervisor, Dr Wei Lai, for his guidance, support and patience throughout my doctoral study. Without his great guidance, patience and encouragement, my PhD journey would not have come this far. I would also like to thank my coordinating supervisor, Professor Chengfei Liu, for his valuable suggestions. I am also thankful to my associate supervisor, Dr Nabeel Mohammed, for his thoughtful attention, guidance, moral support and invaluable suggestions. His valuable comments and constant encouragement have remarkably improved the quality of my thesis. I am grateful to my parents, Md Shamsul Alam and Susmita Shamsul, for their kindness, mental support, love and encouragement, and for always keeping me in their prayers. I would like to thank my wonderful sisters, Rumana Shabnom, Sharmin Afrin and Orin Afrin, for always encouraging me in my hard times. I am thankful to the teachers, lab mates and friends of Swinburne University of Technology for supporting and encouraging me during my candidature. Lastly and most importantly, my heartiest appreciation and sincere thanks go to my beloved husband, Rafi Md Najmus Sadat, who spared no effort, whatsoever, to help me in reaching the light at the end of the tunnel. I would also like to thank my lovely daughter, Aaeesha Areebah, for helping and supporting me in every possible way while I was busy at work. Thank you, everyone, for believing in me.

Abstract

Corner detection and matching are key steps in many applications such as image retrieval, image registration, image classification and object recognition. The main challenge is to find the correspondences between two images under different photometric and geometric transformations. To find such correspondences, the local feature-based approach applies the following main steps: (i) detecting the location of the feature, such as a corner, (ii) selecting the interest region around the feature location, (iii) building the descriptor to represent each feature, and finally (iv) applying these processes in local feature-based applications like image matching and others. In this research work, we have proposed new effective and efficient methods for the aforementioned steps. For the first step, we have developed three contour-based corner detectors to detect the feature locations. Two of the proposed corner detectors use a single chord and the other uses multiple chords to detect the corner locations. The experimental results show that our proposed methods are more effective and efficient than other existing methods. As contour-based corner detectors depend on the contour, a good edge detector can have a profound effect on the performance of the corner detection method. For this reason, we have analysed the performance of corner detectors using different edge detectors and discovered some interesting results for choosing the best edge detector. For the second step, we have proposed a method for estimating the region around the location of the corners, using the curvature maxima to generate circular regions. Experimental results show that a descriptor built using the region estimated by our method performs better in finding the corresponding corners between two images. Finally, we have used our proposed methods in an image matching framework based on the Bag-of-Visual-Words model.

Abbreviations

ANDD Anisotropic Gaussian Directional Derivative
AGAST Adaptive and Generic Accelerated Segment Test
ARCSS Affine Resilient Curvature Scale-Space
BRIEF Binary Robust Independent Elementary Features
BOVW Bag of Visual Words
BPLR Boundary Preserving Local Region detector
CBIR Content-Based Image Retrieval
CCSR Chord to Cumulative Sum Ratio
CHoG Compressed Histogram of Gradients
CPDA Chord to Point Distance Accumulation
CTAR Chord to Triangular Arm Ratio
CCR Chord to Curve Ratio
CCN Consistency of Corner Numbers
CSS Curvature Scale Space
CenSurE Center Surround Extrema
DCSS Direct Curvature Scale Space
DoG Difference of Gaussian
EBR Edge-Based Region
FAST Features from Accelerated Segment Test
FREAK Fast Retina Keypoint
GCM Gradient Correlation Matrix
gLoG generalized Laplacian of Gaussian
GLOH Gradient Location Oriented Histogram
IBR Intensity extrema-based Region
ICA Independent Component Analysis
LATCH Learned Arrangements of Three patCH codes
LoG Laplacian of Gaussian
LDA Linear Discriminant Analysis


LDB Local Difference Binary
LE Localization Error
LBP Local Binary Pattern
LIOP Local Intensity Order Pattern
LIROP Local Intensity Relative Order Pattern
LTP Local Ternary Pattern
MAP Mean Average Precision
MLTP Modified Local Ternary Pattern
MSCAD Multiple Single Chord Accumulated Distance
MSCP Multi-scale Curvature Product
MSER Maximally Stable Extremal Region
NN Nearest Neighbor
NNDR Nearest Neighbor Distance Ratio
OIOP Overall Intensity Order Pattern
ORB Oriented FAST and Rotated BRIEF
PCA Principal Component Analysis
PDF Probability Distribution Function
SCAD Single Chord Accumulated Distance
SCCPDA Single Chord CPDA
SIFT Scale Invariant Feature Transform
SURF Speeded Up Robust Features
SUSAN Smallest Univalue Segment Assimilating Nucleus
TBIR Text-Based Image Retrieval
RoS Region of Support
RMGD Ring-based Multi-Grouped Descriptor
RMSE Root Mean Square Error
USAN Univalue Segment Assimilating Nucleus

Publications on PhD Research

Journal Articles

1. Naurin Afrin and Wei Lai. Effective Interest Region Estimation Model to Represent Corners for Image Retrieval. In Signal & Image Processing: An International Journal (SIPIJ), Vol. 9, No. 6, December 2018.

2. Naurin Afrin, Nabeel Mohammed and Wei Lai. Local Feature Detectors and Descriptors: Where We Are Now. IET Image Processing (Under Review).

Conference Papers

1. Naurin Afrin and Wei Lai. Single chord based corner detectors on planar curves. In 23rd International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision (WSCG 2015), Plzen, Czech Republic, 8-12 June 2015, Vol. 2501, pp. 45-52.

2. Naurin Afrin, Wei Lai and Nabeel Mohammed. An effective multi-chord corner detection technique. In International Conference on Digital Image Computing: Techniques and Applications (DICTA 2016), Gold Coast, Queensland, Australia, 30 November-2 December 2016.

3. Naurin Afrin, Wei Lai and Nabeel Mohammed. Performance analysis of corner detection algorithms based on edge detectors. In 25th International Conference on Computer Graphics, Visualization and Computer Vision (WSCG 2017), Plzen, Czech Republic, 29 May-2 June 2017.

4. Naurin Afrin and Wei Lai. Effective Corner-based Image Retrieval Technique using Bag-of-Words Model. International Journal of Electrical and Computer Engineering (IJECE) (To be submitted).

Contents

Abstract
Abbreviations
Publications
List of Figures
List of Tables

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Objectives
  1.4 Contribution and Structure of the Thesis
    1.4.1 Literature Review
    1.4.2 Single chord-based Corner Detectors
    1.4.3 Multi Chord-based Corner Detector
    1.4.4 Performance Analysis of Corner Detection Algorithms based on Edge Detectors
    1.4.5 Estimating Interest Region for Corner Locations
    1.4.6 Application
    1.4.7 Conclusion and Future Work

2 Literature Review
  2.1 Feature
  2.2 Global Features vs Local Features
  2.3 Types of Feature Detectors


    2.3.1 Edge
    2.3.2 Corner
    2.3.3 Blob
    2.3.4 Region
  2.4 Corner Detectors
    2.4.1 Intensity-Based Corner Detectors
      2.4.1.1 Classifications of intensity-based corner detectors
    2.4.2 Dominant Point Detectors
    2.4.3 Contour-based Corner Detectors
      2.4.3.1 Steps of Contour-based corner detectors
      2.4.3.2 Contour-based corner detectors
  2.5 Interest Region Estimation
    2.5.1 Estimating scale-invariant region
      2.5.1.1 Region estimation at the time of feature detection
      2.5.1.2 Region estimation at the time of feature description
    2.5.2 Estimating affine-invariant region
  2.6 Feature Descriptors
    2.6.1 Descriptors based on Geometric Relations
    2.6.2 Descriptors based on Pixels of the Interest Region
      2.6.2.1 Binary Descriptors
      2.6.2.2 Floating-Point Descriptors
  2.7 Feature Matching
  2.8 Content Based Image Retrieval (CBIR)
  2.9 Performance Evaluation
    2.9.1 Performance Evaluation of Feature Detectors
    2.9.2 Performance Evaluation of Estimated Interest Regions and Feature Descriptors
  2.10 Summary

3 Single Chord based Corner Detectors
  3.1 CPDA Corner Detector
  3.2 Weaknesses of CPDA Corner Detector

  3.3 Proposed Corner Detectors
    3.3.1 Why using single chord?
    3.3.2 Proposed Single Chord CPDA Detector
    3.3.3 Proposed Chord to Cumulative Sum Ratio (CCSR) Detector
  3.4 Experimental Results
    3.4.1 Experimental Setup
      3.4.1.1 Evaluation Metrics
      3.4.1.2 Test Image Dataset
    3.4.2 Parameter Optimization
    3.4.3 Performance Evaluation
    3.4.4 Complexity Comparison
  3.5 Conclusion

4 Effective Multi-Chord based Corner Detector
  4.1 Distance Accumulation Technique
  4.2 Proposed Method
    4.2.1 A simple corner model
    4.2.2 Single Chord Accumulated Distance Detector (SCAD)
    4.2.3 Why use multiple chord lengths?
    4.2.4 Combining multiple chords of different lengths
  4.3 Experimental Results
  4.4 Conclusion

5 Performance Analysis of Corner Detection Algorithms Based on Edge Detectors
  5.1 Importance of Edges for Detecting Corners
  5.2 Edge Detection Methods
    5.2.1 Sobel operator
    5.2.2 Robert Cross operator
    5.2.3 Prewitt operator
    5.2.4 Laplacian of Gaussian (LoG) operator
    5.2.5 Zero cross operator
    5.2.6 Canny operator

  5.3 Using Adaptive Canny Edge Detector
  5.4 Performance Study
    5.4.1 Dataset
    5.4.2 Evaluation Method
    5.4.3 Results and Discussion
  5.5 Conclusion

6 Effective Interest Region Estimation Method to Represent Corners for Image Retrieval
  6.1 What is Scale Invariance
  6.2 Concepts of Scale Space
  6.3 Determining Interest Regions using Scale Space
  6.4 Proposed Method
  6.5 Performance Study
    6.5.1 Experimental Setup
    6.5.2 Evaluation Metrics
    6.5.3 Parameter Settings
    6.5.4 Experimental Results
  6.6 Conclusion

7 Application
  7.1 Introduction
  7.2 Bag-of-Visual-Words Model
  7.3 Proposed Framework using BOVW Model
    7.3.1 Keypoint Detection
    7.3.2 Interest Region Estimation
    7.3.3 Codebook Generation and Image Matching
  7.4 Experimental Results
    7.4.1 Experimental Setup
      7.4.1.1 Training Stage
      7.4.1.2 Query Phase
    7.4.2 Results
      7.4.2.1 Dataset 1: Corel 1000 dataset

      7.4.2.2 Dataset 2: PASCAL VOC 2007 dataset
  7.5 Conclusion

8 Conclusion and Future Work
  8.1 Summary of the Research Findings
  8.2 Future Work

Bibliography

List of Figures

1.1 Local feature-based approach of image matching
1.2 Importance of corners in visual shape recognition
1.3 An example of detecting corner location and representation from two images
1.4 Feature matching between two images
1.5 Image matching process using contour-based corner detector
1.6 The aims and achievements of this thesis by Chapter

2.1 Example of feature representation using local-feature-based model
2.2 Examples of different features (a) Corners (b) Blobs and (c) Regions
2.3 Different edge operators (a) Sobel (b) Roberts (c) Prewitt and (d) LoG
2.4 SUSAN corner detector
2.5 Steps of contour-based corner detector
2.6 An example of different Gaussian smoothing scales applied on a curve (a) original curve (b) σ = 4 (c) σ = 8 (d) σ = 16
2.7 Representing an image by scale-space representation
2.8 SIFT descriptor building using gradients
2.9 Building a SURF descriptor
2.10 Example of LBP formation (a) 8 neighbours of radius 1, (b) 12 neighbours of radius 1.5, (c) 16 neighbours of radius 2
2.11 CBIR framework (Image courtesy [198])

3.1 Curvature estimation process for CPDA method (Image courtesy [17])
3.2 (a) Original Lena image [44] (b) local maxima locations from the estimated CPDA curvature values (c) locations after discarding the weak corners using a curvature threshold (d) ultimate corner sets after false corner removal using the angle threshold
3.3 Triangles formed for different angles using CPDA method


3.4 Estimated curvature of Curve 1 of image 3.2(b) using chords L10, L20 and L30 and using combined Equations 3.2 and 3.3
3.5 (a) Detected corners using CPDA detector; (b)-(d) resulting curvature using three different chords following normalisation
3.6 Missing corner location due to using the refinement process by the CPDA detector
3.7 Steps of the proposed methods
3.8 Examples of T-junctions
3.9 Curvature estimation using SCCPDA with chord
3.10 (a) & (c) The locations of the minima found using SCCPDA curvature estimation technique; (b) & (d) the final corner set by SCCPDA using the curvature threshold
3.11 Three different curves with three different flatness
3.12 Detection of the corner location within the curve segment
3.13 Average repeatability of single chord CPDA (SCCPDA) detector against different transformations
3.14 Average repeatability of CCSR detector against different transformations
3.15 Average repeatability of different corner detectors
3.16 Localization Error of different corner detectors

4.1 Computation of distance between a point and a chord using distance accumulation technique
4.2 Derivation of curvature value from an angle
4.3 Corners identified by SCAD for different angles when using a chord of length 20
4.4 Suitable angles for different chord lengths
4.5 Highest average repeatability of different chords
4.6 Angle vs chord for the highest repeatability
4.7 Average repeatability achieved when using Equations 4.6 – 4.9 to calculate angle
4.8 Corners detected using our single chord approach when using chord lengths 15 and 30 with angles 157° and 152° respectively
4.9 Corners identified by the different detectors
4.10 Extracted edges found in 'Lena' image
4.11 Corners identified by using Multiple SCAD (MSCAD)
4.12 Comparison of Average Repeatability of MSCAD detectors with other corner detectors
4.13 Comparison of Localization Error of MSCAD detectors with other corner detectors
4.14 Number of missed and repeated corners by the CPDA detector using different thresholds
4.15 Number of missed and repeated corners by the detectors
4.16 Corners detected on the Box image from the second dataset
4.17 Comparison of Average Repeatability of MSCAD detectors with CPDA and CCR corner detectors on second image dataset
4.18 Comparison of Localization Error of MSCAD detectors with CPDA and CCR corner detectors on second image dataset
4.19 Number of missed and repeated corners by the detectors on second image dataset

5.1 Example of detected corners from edge map of an input image
5.2 Gradient direction

5.3 Applying kernel
5.4 Masks used by Sobel operator
5.5 Masks used for Robert operator
5.6 Masks used for Prewitt operator
5.7 Masks used for LoG
5.8 Canny edge detection process
5.9 Corners detected by CPDA method using different edge operators
5.10 Corners detected by MSCAD method using different edge operators
5.11 Number of extracted edges after applying different transformations
5.12 Number of corners after applying different transformations
5.13 Number of repeated corners after applying different transformations
5.14 Extracted edges and detected corners using Canny adaptive and Canny (0.2-0.7)
5.15 Performance comparison of Canny adaptive and Canny (0.2-0.7)

6.1 Example of interest regions in two images
6.2 Example of images with two different scales
6.3 Scale space representation of an image
6.4 Example of different scale levels (a) Original image, (b) σ = 1, (c) σ = 2, (d) σ = 4, (e) σ = 8 and (f) σ = 16
6.5 Example of characteristic scales. (a) and (b) two images with different focal lengths. (c) and (d) the response over scales, respectively. Image courtesy [126]
6.6 Characteristic scales using Laplacian operator (a) blob and (b) corner
6.7 Interest region detection using edge extrema
6.8 (a) Precision and (b) Recall graphs for CCSR corner locations against a series of thresholds (th) using different dimensions
6.9 (a) Precision and (b) Recall graphs for MSCAD corner locations against a series of thresholds (th) using different dimensions
6.10 Performance evaluation of different interest region estimation methods using different corner detectors


7.1 Bag of Words Model
7.2 Proposed Bag of Words Model
7.3 Example of corners identified by using MSCAD corner detector
7.4 SIFT descriptor calculation
7.5 Examples of clustering using K-means algorithm
7.6 Examples of Corel 1000 dataset
7.7 Example of matched flower images using different corner detectors on Corel 1000 dataset
7.8 Example of matched bus images using different corner detectors on Corel 1000 dataset
7.9 Performance analysis of CPDA, CCR, MSCAD, CCSR and SCCPDA corner detectors using the proposed BOVW-based framework on the Corel 1000 dataset
7.10 Number of retrieved images of different categories using different corner detectors (Corel dataset)
7.11 Examples of PASCAL VOC 2007 dataset
7.12 Example of matched bus and car images using different corner detectors on PASCAL VOC 2007 dataset with query image at the top

7.13 Mean Average Precision (MAP%) for class 'Car' images using different corner detectors
7.14 Mean Average Precision (MAP%) for class 'Horse' images using different corner detectors
7.15 Number of matched images for class 'Car' images using different corner detectors
7.16 Number of matched images for class 'Horse' images using different corner detectors
7.17 Performance of CPDA method using different thresholds
7.18 Number of matched images using CPDA method with various thresholds along with proposed MSCAD detector
7.19 Performance analysis of CPDA, CCR and MSCAD corner detectors using the proposed BOVW-based framework on the PASCAL VOC 2007 dataset
7.20 Example of retrieved image using the proposed framework

List of Tables

3.1 Curvatures calculated by CPDA detector
3.2 Recommended Gaussian smoothing scale σ for corner detectors
3.3 Total detected corners by using CPDA, SCCPDA, CCSR, DoG Detector, CCR, ARCSS, He & Yung, Eigenvalues, MSCP and GCM corner detectors
3.4 Total time to detect corners from 23 images

4.1 Details of MSCAD detectors
4.2 Total time to detect corners from 23 images

5.1 Time computation for different detectors (in seconds)

6.1 Characteristic scales of Figure 6.6 (a) and (b)

7.1 Average precision of the retrieval results for varying codebook sizes constructed for MSCAD detector (Corel dataset)
7.2 Average precision of the retrieval results for varying codebook sizes constructed for CCR detector (Corel dataset)
7.3 Average precision of the retrieval results for varying codebook sizes constructed for CPDA detector (Corel dataset)
7.4 Average precision of the retrieval results for varying codebook sizes constructed for CCSR detector (Corel dataset)
7.5 Average precision of the retrieval results for varying codebook sizes constructed for SCCPDA detector (Corel dataset)
7.6 Mean Average Precision (MAP%) of the retrieval results for varying codebook sizes using MSCAD corner detector (PASCAL VOC 2007 dataset)
7.7 Mean Average Precision (MAP%) of the retrieval results for varying codebook sizes using CCR corner detector (PASCAL VOC 2007 dataset)
7.8 Mean Average Precision (MAP%) of the retrieval results for varying codebook sizes using CPDA corner detector (PASCAL VOC 2007 dataset)

Chapter 1

Introduction

1.1 Background

Image features are among the most important parts of an image. Image feature detection and matching are essential in many computer vision applications [140, 149, 183, 211, 212]. However, the task of finding image correspondences has become more challenging, especially under various geometric and photometric transformations such as scale and rotation. The growing size and quality of images have made the task more complex, and it has attracted the attention of the computer vision community over the past decades. A number of researchers have studied the local feature-based approach as a solution to the problem of finding correspondences between two images.

Figure 1.1: Local feature-based approach of image matching

Figure 1.2: Importance of corners in visual shape recognition

In the local feature-based approach, the prominent points in the image, which are also known as local features, are selected and represented to find the correspondence between two images. A local feature can be a point, an edge or even a small image patch. The local feature-based approach basically consists of three stages: 1) feature detection, 2) feature representation and 3) feature matching, as shown in Figure 1.1. These three steps are the key to solving any image correspondence problem using local features; however, they are not completely independent. In the feature detection stage, a set of local features is detected. A local feature is a prominent part of an image that can be distinguished under image transformations such as rotation and scaling [27, 112, 187]. For instance, corners, blobs, contours or T-junctions can be classified as local features. Among them, corners are the most significant features because they can represent the shape of an object in two-dimensional images. This is depicted in Figure 1.2. The figure shows the corner locations of some objects, from which we can easily visualize the shape and outline of each object. In other words, corners help to reconstruct the shape structure. After the feature detection stage, the extracted features are represented by a multidimensional feature vector calculated from the neighbourhood of the feature. This feature vector is called a feature descriptor. A feature descriptor is a distinct key component

which must have the capability to encapsulate enough important information from the neighbourhood of the feature location. A feature descriptor should be distinctive as well as resistant to noise. To find matching pairs of local features between two images, the corresponding features need to be described with similar histograms. One of the difficulties between the steps of detecting and describing features is deciding the appropriate size of the neighbouring region (usually known as the interest region) around the corner location in order to construct the feature descriptor. Estimating the correct size of the interest region for each feature is very important, particularly for achieving scale invariance. An example of the process is depicted in Figure 1.3. In the figure, the x's are the feature locations and, for a specific feature marked by a red x, the descriptor is represented by a histogram built from the neighbourhood pixels.

Figure 1.3: An example of detecting corner location and representation from two images

In the feature matching stage, the descriptors in one image are compared to those in other images to find the matching pairs of features. Figure 1.4 shows an example where two images are matched using their features.

Figure 1.4: Feature matching between two images

The effectiveness and efficiency of the local feature-based approach may vary depending on the nature of the application. Some applications, e.g. image registration, need a small number of correct matches, while others, e.g. object recognition, need a high number of correct matches. Some, e.g. real-time and mobile applications, need to complete the task in a short period of time. Therefore, we have concentrated on both the effectiveness and efficiency of the local feature-based approach in this thesis so that the proposed approach can be applied to a wide variety of image matching applications.
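To make the three stages concrete, the following minimal Python sketch (an illustration only, assuming OpenCV is installed; ORB is used as a stand-in detector/descriptor, not one of the methods proposed in this thesis) detects features, builds descriptors and matches them between two images:

    import cv2

    # Stage 1 and 2: detect feature locations and build a descriptor
    # around each location, in both the reference and transformed image.
    img1 = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("transformed.png", cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=500)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Stage 3: match descriptors. Hamming distance suits ORB's binary
    # descriptors; cross-checking keeps only mutually best matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    print(len(matches), "putative correspondences")

Any detector/descriptor pair can be slotted into the same three-stage skeleton; this is precisely where the corner detectors and region estimation methods proposed later in the thesis plug in.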

1.2 Motivation

Although the local feature-based approach has been a trending interest in the field of computer vision, several aspects of the conventional methods still need improvement, which makes this area challenging for researchers. There are several reasons that motivate our research on corner detection and matching.

1. Traditional local feature-based approaches detect a large number of features. Some of these features are not robust and do not appear in the transformed images, which decreases accuracy in the feature matching stage. On the other hand, processing a large number of features is time-consuming. Compared to other feature points, corners are more robust, and the number of corners in an image is generally smaller than that of other feature points. Corners simplify the processing of an image by reducing the amount of data without losing any important information about the object. However, some corner detectors are computationally more expensive compared to others and generate high localization error. So, developing effective and efficient corner detectors is a challenging task for researchers in the field of local feature-based approaches.

2. Among different types of corner detectors, contour-based corner detectors are more stable and less sensitive to noise [24, 76, 132, 153, 215]. The primary step of these detectors [17, 23, 24, 132, 154, 217] is to extract the edges that are relevant for corner detection. For contour-based corner detection, researchers have been using the Canny edge detector since its popularisation by [17], and this trend has continued without question in [17, 23, 132, 154] and others. However, for any successful application of a corner detector, it is important to analyse how its performance depends on the extracted contours of the image.

3. To estimate the interest region around a detected feature, the traditional scale-space representation is commonly used, which is usually combined with the feature detection step. However, it is a challenging task to estimate the interest region for features that are not compatible with the scale-space representation.

4. It is a fundamental problem in computer vision to obtain the maximum number of corner matches between the original and transformed corner sets while avoiding mismatches. To assess the effectiveness and efficiency of corner detectors, it is important to apply them in an image matching application.

1.3 Objectives

The research focuses on robust corner detection methods. It also explores how corners can be represented and matched effectively and efficiently at the application level. The fundamental questions of this research are:

1. How can we detect robust corner locations as features?

2. What is the role of edge detectors in detecting corners?

3. How can we estimate the interest region around the corner locations to build descriptors?

4. How can we design and implement a robust corner-based application for image matching?

Based on the research questions presented above, our focus is to develop an image matching framework using a contour-based corner detection method, as shown in Figure 1.5.

Figure 1.5: Image matching process using contour-based corner detector

The main objectives of this research are summarized as follows:

1. Corner Detection: Detecting robust corners in an image is a challenging task. Several corner detectors presented in the literature tend to miss some real corners or detect false ones. Our main objective in this thesis is to detect most of the real corners in images while reducing the number of false corner detections. We focus on identifying the problems of existing corner detection methods and propose new corner detection methods which can overcome these difficulties.

2. Performance Analysis of Edge Detectors: Edges play important roles in contour-based corner detectors. The first step of these detectors is to extract the edges from an image, so the performance of the corner detection process is directly related to edge detection. Thus, it is important to observe the role of different edge detectors in the corner detection process.

3. Interest Region Estimation: After detecting its location, a corner is represented by the feature vector built by the feature descriptor. To construct the feature descriptor, it is essential to evaluate the neighbourhood area around the corner location. The size of the interest region determines the overall performance of feature matching. Our aim is to find a way to estimate the interest region that performs well with features like corners.

4. Application: In order to examine the effectiveness and efficiency of the corner detectors, we propose a framework adapted from the traditional bag-of-words model for image matching.

An overview of the aims and achievements of this thesis is presented in Figure 1.6.

Figure 1.6: The aims and achievements of this thesis by Chapter

1.4 Contribution and Structure of the Thesis

In this Section, we present the organisation of the thesis. The main focus of each Chapter is explained below.

1.4.1 Literature Review

Chapter 2 presents a comprehensive review of the existing feature detectors and feature descriptors for finding the correspondences between two images. It also presents the available interest region estimation methods for the detected features. In addition, it reviews the metrics and the image datasets that we have used for the experiments in this thesis.

1.4.2 Single chord-based Corner Detectors

In Chapter 3, two efficient and effective corner detectors are proposed [3]. We first propose the single chord CPDA detector, and then another new approach named Chord to Cumulative Sum Ratio (CCSR), which estimates the curvature value on the curve using a cumulative-sum-based analysis. Both corner detectors use the curves extracted from the image for corner detection. The proposed corner detectors have been compared with nine recent corner detectors using an automatic evaluation system which does not need any human intervention. The proposed detectors perform better than or comparably with other detectors in terms of accuracy and reliability. However, the main achievement of the proposed detectors is their efficiency: they are the most efficient among all the detectors compared.

1.4.3 Multi Chord-based Corner Detector

In Chapter 4, we propose a new method [6] to detect corners from image contours. We approach the problem by first modelling a corner by two parameters, a chord length and an angle. We use this corner model to determine a suitable threshold value, which is then used by our proposed SCAD detector to identify suitable corners while eliminating false and weak ones. We then use SCAD in exhaustive experiments that result in four equations, each of which can be used to determine a suitable angle for a given chord length to be used with SCAD, thus altogether eliminating the need to manually specify the angle. We demonstrate that using different chord lengths can result in better corner detection, so we propose a simple approach using multiple chords: MSCAD, our proposed multi-chord detector, performs multiple SCAD detections followed by a simple duplicate removal process.

1.4.4 Performance Analysis of Corner Detection Algorithms based on Edge Detectors

Edges play an important role in the contour-based corner detection process. In Chapter 5, we analyse the performance of different contour-based corner detection algorithms from the edge detection perspective [5]. Most of the detectors use a Canny edge detector with a pre-defined threshold. Instead of following this trend, we observe the performance of different edge detectors in the corner detection process. The experimental results show that the choice of edge detector has a profound effect on the results of the corner detection process.

1.4.5 Estimating Interest Region for Corner Locations

In order to describe the detected corners with a feature descriptor, the interest regions need to be estimated for each corner location. Therefore, in Chapter 6, we propose an interest region estimation method [4] so that feature descriptors can be built to represent the corners and, eventually, to find the correspondences between two images under different image transformations. The proposed method systematically compares the robustness of the pixel information around each corner to select the distinct contents and uses the corresponding regions as the interest regions for the corners. The experimental study shows that the proposed method estimates the interest region better than the traditional methods under different image transformations.

1.4.6 Application

In Chapter 7, we present an image matching application using our proposed corner detection and region estimation methods. For image matching, we adapt the popular Bag-of-Visual-Words model to assess the efficiency and effectiveness of our proposed corner detectors.

1.4.7 Conclusion and Future Work

Finally, Chapter 8 concludes the thesis with the major findings and some thoughts on the direction of future research.

Chapter 2

Literature Review

In this Chapter, we review the existing works related to the research explored in this thesis. The Chapter is organized as follows. In Section 2.1, we explain the properties of an ideal feature, while Section 2.2 compares the global and local feature-based approaches. In Section 2.3, we describe different types of feature detectors. Section 2.4 presents a detailed discussion of corner detectors and their classifications available in the literature. Section 2.5 explains existing methods to estimate interest regions around the corner locations. Section 2.6 reviews the existing local feature-based descriptors and their limitations. In Section 2.7, different feature matching approaches are described. We explain the Content-Based Image Retrieval process in Section 2.8. Section 2.9 presents the existing approaches to evaluating corner detectors and local feature descriptors, including the metrics used in the experiments. Finally, Section 2.10 summarizes the Chapter.

2.1 Feature

In the field of image processing and computer vision, one of the main tasks is to represent an image by its features. Human eyes can identify and extract all the information of an image without any problem, but this is a challenging task for a computer system. So, what does a feature really mean in an image? A feature is basically an image primitive that represents the content of the image. It is typically found at a particular location in an image that holds important information for finding similarity between images. An ideal feature needs some important properties [192]:

1. Repeatability: Repeatability is one of the most important properties of an ideal feature. If a feature extracted in a reference image is also detected in a transformed image under different viewing conditions, then the feature is considered repeated (a small code sketch of this measurement is given after this list).

2. Distinctiveness: Distinctiveness is another important property of an ideal feature. A feature needs to be distinct compared to other feature locations. This distinctiveness makes the feature a salient part of an image.

3. Localization: A good feature needs to be well-localized. That means the feature should be detected at the same corresponding location in the images regardless of any geometric and photometric image transformations. If the feature is not localized perfectly, the feature descriptors built on it will give different results. Localization accuracy is particularly useful for wide baseline matching, registration and structure-from-motion applications.

4. Quantity: The number of features detected in an image should suit the requirements of the specific application. If there are more features, more true matches can be obtained, but at the same time the possibility of false matches becomes higher. On the other hand, if we get fewer features, there might be a chance of not getting enough matches.

5. Efficiency: The features should be detected in a time-efficient manner, which has a huge impact on real-time applications. The more efficient the detector, the less time it takes to detect the feature locations.

6. Invariance: A good feature should be invariant to geometric and photometric transformations such as scale and rotation.
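As promised under the repeatability property above, the sketch below shows one common way to measure repeatability and localization error together, under the assumption that the ground-truth mapping between the two images is a known 3×3 homography H (the 3-pixel tolerance is an illustrative choice; evaluation protocols differ):

    import numpy as np

    def repeatability_and_localization(pts_ref, pts_trans, H, tol=3.0):
        """pts_ref: (N,2) feature locations in the reference image.
        pts_trans: (M,2) feature locations in the transformed image.
        H: known 3x3 homography mapping reference to transformed coords.
        Returns (repeatability ratio, mean localization error of repeats)."""
        ones = np.ones((len(pts_ref), 1))
        proj = (H @ np.hstack([pts_ref, ones]).T).T
        proj = proj[:, :2] / proj[:, 2:3]          # back to Cartesian
        # Distance from every projected reference point to every detection.
        d = np.linalg.norm(proj[:, None, :] - pts_trans[None, :, :], axis=2)
        nearest = d.min(axis=1)
        repeated = nearest <= tol                  # re-detected within tolerance
        ratio = repeated.sum() / max(len(pts_ref), 1)
        loc_error = nearest[repeated].mean() if repeated.any() else float("nan")
        return ratio, loc_error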

In the literature, two types of image features have been proposed by researchers: 1) global features and 2) local features. In the next Section, we discuss both feature types.

2.2 Global Features vs Local Features

A global feature is a multidimensional vector that is constructed using all the pixels of the image. In this approach, two images are compared by the feature vector extracted

from each image. For example, if we want to compare the picture of an apple and the picture of a banana, the colour feature will generate different vectors for each image. Thus, global features can be described as a particular property of an image such as texture, colour histograms, edges or even a specific descriptor [43, 80, 141, 166]. Because of their high discrimination power, global features are generally used in applications such as image retrieval and image classification [140, 149, 183, 211]. The main advantages of global features are that they are much faster to compute and generally require small amounts of memory. However, they cannot be used in the presence of noise or for joined objects [123]. The limitation lies in the feature extraction based on all pixels: if the content of the image changes, the feature representation will potentially change too, which makes it difficult to obtain invariance to image transformations [125].

Unlike the global feature-based approach, local features detect a number of prominent locations of an image to distinguish it from other images under various image transformations [27, 112, 187]. These locations are then described using nearby pixel intensities, a neighbourhood known as the interest region of that location. The image is then represented, based on its local structures, by a set of local feature descriptors extracted from a set of interest regions. A feature descriptor is basically a multidimensional vector and is a distinct key component which must have the capability to encapsulate enough important information from the neighbourhood of the feature location. The descriptors should also be rich enough to distinguish themselves from other descriptors. These descriptors can then be used in various image applications. The main steps of the local feature-based approach are two-fold: feature detection and feature description [127]. Local features are computed from local shape regions and can potentially deal with occlusions. A local feature can be a point, an edge or even a small image patch. However, local features are sensitive to noise and rotation and, by definition, can comprise many thousands of instances in a single image; therefore, preprocessing is required for noise reduction. Thus, local features are more computation-intensive than global features.

As shown in Figure 2.1, the locations of the features are identified from the image in the first step. The identified features are likely to be salient among all the image pixels, such as the corners in Figure 2.1. Therefore, the detected feature locations are considered as the representatives of that image.

Figure 2.1: Example of feature representation using local-feature-based model

The next step is to sample the image around the location of the feature and build a descriptor to describe the feature. Figure 2.1 shows an example of feature representation of an image using the local feature-based approach. First, the salient features are identified from the image. Then, a descriptor is built around the location of each feature by sampling, to describe that feature. Finally, the feature matching step finds the most similar descriptor (or a set of descriptors) in the transformed image for each descriptor in the reference image.
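Returning to the apple/banana example above, the following sketch illustrates the global approach: a single colour histogram computed over all pixels yields one fixed-length vector per image (the bin counts, the Euclidean distance and the file names are arbitrary illustrative choices):

    import cv2
    import numpy as np

    def global_colour_feature(path, bins=(8, 8, 8)):
        # One histogram over *all* pixels of the image: a global,
        # fixed-length feature vector.
        img = cv2.imread(path)
        hist = cv2.calcHist([img], [0, 1, 2], None, list(bins), [0, 256] * 3)
        return cv2.normalize(hist, hist).flatten()

    apple = global_colour_feature("apple.png")
    banana = global_colour_feature("banana.png")
    # A small distance would indicate globally similar colour content.
    print(np.linalg.norm(apple - banana))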


Figure 2.2: Examples of different features (a) Corners (b) Blobs and (c) Regions


Figure 2.3: Different edge operators (a) Sobel (b) Roberts (c) Prewitt and (d) LoG

2.3 Types of Feature Detectors

Feature detectors are categorized into different groups based on the detected feature types: (1) edge detectors, (2) corner detectors, (3) blob detectors and (4) region detectors. Figure 2.2 shows examples of features detected by the different types of detectors.

2.3.1 Edge:

Generally, edges refer to sharp changes in image brightness. So, if there is a high difference between two neighbouring pixels, a possible edge is detected. An edge detector determines the transition between two regions based on gray-level discontinuity. Edge detectors can be classified into two classes: 1) first-order differentiation based gradient operators such as the Prewitt and Sobel operators [151, 210], which appear in pairs as shown in Fig. 2.3, and 2) Gaussian operators like Canny [33]. The Gaussian operator is used to blur images and remove noise. Edge detectors of both classes apply simple convolution masks to the entire image in order to compute the first-order (gradient) and/or second-order (Laplacian) derivatives. Recently, efforts have been devoted to multi-resolution edge analysis, sub-pixel edge detection and hysteresis thresholding [78]. The Canny edge detector is one of the most popular methods to find edges by

separating noise from the input image [33]. The steps of the Canny edge detection algorithm are filtering, non-maximal suppression, hysteresis thresholding and edge tracking. It uses a Gaussian filter $G_\sigma$ to smooth the image in order to remove noise:

$$g(m,n) = G_\sigma(m,n) \ast f(m,n) \tag{2.1}$$

where $G_\sigma = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{m^2+n^2}{2\sigma^2}\right)$.

Most of the corner detectors in the literature use the Canny detector with predefined high and low thresholds of 0.7 and 0.2, respectively [17, 23, 24, 132, 154, 217].
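A minimal OpenCV sketch of this common setup follows; scaling the 0.2/0.7 fractions by the 8-bit intensity range is an assumption made here purely for illustration, since implementations differ in how the thresholds are expressed:

    import cv2

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
    # Gaussian smoothing first (Equation 2.1) to suppress noise.
    blurred = cv2.GaussianBlur(img, (5, 5), sigmaX=1.5)
    # Low and high hysteresis thresholds following the 0.2/0.7 convention,
    # expressed here on the 0-255 intensity range.
    edges = cv2.Canny(blurred, threshold1=0.2 * 255, threshold2=0.7 * 255)
    cv2.imwrite("edges.png", edges)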

2.3.2 Corner:

Corners [66] are among the most interesting features extracted from visual images. They now play an increasingly important role in the analysis of visual big data, not only in overcoming the curse of dimensionality but also in saving computation and storage. In general, a corner is described as a position on an edge or curve where the slope angle changes abruptly, i.e. where the curvature becomes high [39]. Corners are very dominant features for depicting the structure of an object in two-dimensional images. We discuss the corner detection process in detail in the next Section.

2.3.3 Blob:

A blob is a point or region in an image which is darker or lighter than its surroundings. It generally consists of dark image pixels surrounded by bright pixels, or bright pixels surrounded by dark pixels. A blob helps to identify image areas which are very smooth and cannot be detected using corners or edges. Most researchers have used derivative-based differential methods for blob detection [105, 109]. Lindeberg [105] defined a blob as a region associated with at least one local extremum of a given Laplacian function. He proposed a scale-invariant blob detector that is capable of extracting blobs of any size. For a given image I(x,y) and a given scale σ, a scale-space representation L(x,y,σ) is obtained by convolving the image with the Gaussian kernel:

$$L(x,y,\sigma) = I(x,y) \ast g(x,y,\sigma), \qquad g(x,y,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{x^2+y^2}{2\sigma^2}\right) \tag{2.2}$$

Then, the Laplacian operator is applied to this scale-space representation according to

$$\nabla^2 L(x,y,\sigma) = L_{xx}(x,y,\sigma) + L_{yy}(x,y,\sigma) \tag{2.3}$$

where $L_{xx}$ ($L_{yy}$) denotes the second-order partial derivative along the x-axis (y-axis). A number of blob detectors have been proposed in the literature [27, 86, 109, 112, 128]. The Hessian detector [175] detects blob-like structures based on the determinant of the Hessian matrix:

$$H(x,y,\sigma_1,\sigma_2) = \begin{pmatrix} L_{xx}(x,y,\sigma_1,\sigma_2) & L_{xy}(x,y,\sigma_1,\sigma_2) \\ L_{xy}(x,y,\sigma_1,\sigma_2) & L_{yy}(x,y,\sigma_1,\sigma_2) \end{pmatrix}$$

where $L_{xx}$ ($L_{yy}$) is the second-order partial derivative along the x-axis (y-axis) and $L_{xy}$ is the second-order mixed derivative. However, the method also detects false blob locations that are either on an edge or where the signal changes in only one direction. Mikolajczyk [125] applied the scale-space framework [108] with the Hessian matrix to make the blob detector scale- and affine-invariant, naming it Hessian-Laplace [126]. The Salient Regions detector [86] uses the entropy of the PDF (Probability Distribution Function) of intensity values within a local region instead of a derivative-based approach. The three blob detectors discussed above are expensive to compute due to the derivatives or entropy evaluated at each image location. To improve efficiency, DoG [112] and SURF [27, 187] have been developed. DoG [56] approximates the Laplacian, and SURF [9, 10] approximates the Hessian matrix by using integral images [179, 200]. DoG is quicker than LoG as there is no need to calculate the second-order derivatives, but it suffers from localization problems. A generalized Laplacian of Gaussian (gLoG) filter, proposed by [96], detects general elliptical blob structures. From a set of automatically selected gLoG filter banks, an intermediate map is generated by clustering the convolution responses. The centers, as well as the scales, shapes and orientations of the detected blobs, are accurately measured by detecting the local extrema of this intermediate map.
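The following sketch, assuming SciPy is available, implements the scale-space Laplacian of Equations 2.2-2.3 (multiplying by σ² is Lindeberg's normalisation so that responses are comparable across scales) alongside the DoG approximation just mentioned; the σ values and the scale ratio k = 1.6 are illustrative choices:

    import numpy as np
    from scipy import ndimage

    def log_responses(image, sigmas=(1, 2, 4, 8)):
        # Scale-normalised Laplacian-of-Gaussian responses (Eqs. 2.2-2.3);
        # gaussian_laplace fuses the Gaussian smoothing and the Laplacian.
        image = image.astype(float)
        return np.stack([s ** 2 * ndimage.gaussian_laplace(image, sigma=s)
                         for s in sigmas])

    def dog(image, sigma, k=1.6):
        # Difference of Gaussians: subtracting two smoothed images
        # approximates the Laplacian without second-order derivatives,
        # which is why DoG is cheaper to compute than LoG.
        image = image.astype(float)
        return (ndimage.gaussian_filter(image, k * sigma)
                - ndimage.gaussian_filter(image, sigma))

    # Blob candidates are the local extrema of these responses over
    # both space and scale.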

Recent studies [7, 27, 48, 67] have concentrated on accelerating the detection process while keeping the accuracy the same, or at the expense of slightly less accurate results.

2.3.4 Region:

A region can be defined as a collection of homogeneous pixels in the image. Matas [122] introduced a region detector called Maximally Stable Extremal Regions (MSER), which is based on the watershed segmentation algorithm [199] and finds homogeneous regions which have either higher or lower intensity than all the pixels on their outer boundary. The set of regions is obtained by thresholding a gray-level image. Although this method is not computationally expensive, it does not perform well on blurred or noisy images, or under scale changes. The Intensity extrema-based Region (IBR) detector proposed by [194] detects local extrema of image intensities over multiple scales. Based on a local extremum in intensity, an intensity function is defined which searches for similar pixels in a radial way.

Another segmentation-based region detection technique is called Superpixels [136, 159]. Superpixels are produced by using a popular image segmentation technique called Normalized Cuts [176]. Superpixels have been used successfully in image segmentation, target localization, tracking [178] and modelling and exploiting mid-level visual cues [192]. However, superpixels are not suited for the purpose of image registration, as the uniform regions are far from discriminative. Recently, a Boundary Preserving Local Region detector (BPLR) was proposed by Kim and Grauman [91]. This method can differentiate several regions in an image in a boundary-preserving manner.

Among the feature types, blobs and regions are detected in huge numbers, some of them are not stable, and they do not have enough repeatability. In addition, the location of a blob or a region does not carry any properties of the blob or region. On the other hand, most corner detectors are able to give information about the corners detected by them. Based on these strengths of corner-like features, we have concentrated on corner detection in this thesis.
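For reference, the MSER detector discussed at the start of this Section is available in OpenCV; a minimal usage sketch (library default parameters, illustrative file name):

    import cv2

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
    mser = cv2.MSER_create()
    # Each region is returned as an array of its pixel coordinates,
    # together with one bounding box per region.
    regions, boxes = mser.detectRegions(img)
    print(len(regions), "maximally stable extremal regions")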

2.4 Corner Detectors

Since one of the contributions of this thesis relates to corner detection, we review the different types of corner detectors in this Section. Detecting corners is one of the most important operations in the field of computer vision. Corners have some inherent properties which make them more distinctive and informative. Several corner attributes can help in the detection and evaluation tasks [49, 113, 162, 175, 192]:

1. Location: The X and Y coordinates of the corner location

2. Curvature angle: The angle between the curves at the junction of the corner location

3. Orientation: The actual orientation of the corner location

4. Classification: Different types of corners

5. Scale: The maxima or minima of the edge for a possible corner

6. Contrast: The difference in intensity of the corner point with the background

The most important attribute of a good corner is the localization of true corner points. The remaining criteria are application-dependent: for example, repeatability measures the usefulness of detectors for matching applications, while stability refers to the appearance of corners in multiple images of the same visual scene, which is good for stereo and tracking applications. In an image alignment application, the localization accuracy and repeatability rate are critical, but detecting all true corners is of little importance, as long as a sufficient number of interest points are available to allow accurate alignment of the images. However, in an object recognition application, failing to detect a corner may result in a different description of the object being generated, leading to a misclassification of the object. We can broadly classify corner detectors into three categories [23, 62]: 1) intensity-based corner detectors, 2) contour-based corner detectors and 3) model-based corner detectors. The corner detectors and interest-point detectors discussed above can also be classified into two groups: single-scale detectors [73] and multi-scale detectors [126, 130, 132, 156]. Single-scale detectors, as the name says, use only one particular smoothing scale. Corner detectors in this group perform better if the image has

features of almost similar sizes. However, they may become ineffective if the image contains both fine-scale and coarse-scale features, as these detectors are designed to detect either fine or coarse features. If a smaller scale is applied, the detector becomes noise-sensitive; as a result, a number of false and weak points may be detected, which eventually increases the complexity of the search space as well as of the matching algorithms. On the other hand, if a large scale is used, the detector becomes robust to noise but will overlook many important fine details of the image. Furthermore, feature-point detection at a small scale provides better localization (accuracy) of the detected feature points than at a high scale. The multi-scale detectors proposed in the literature are mostly extensions of single-scale detectors. For example, Gao et al. [59] proposed an improved version of the Harris detector [73] using a multi-scale analysis, which increases the robustness and generates good localisation of the corners. Basically, multi-scale detectors work with two or more smoothing scales, which gives these detectors the opportunity to improve the effectiveness of the overall corner detection process. Most of the multi-scale corner detectors are based on scale-space theory [25, 94, 109]. Though multi-scale detectors are, in most cases, able to solve the aforementioned problems, they are not always less complicated. If a detector, for example Rattarangsi and Chin [156], detects feature points at all scales, it becomes computationally more costly. Furthermore, combining the feature points detected at multiple scales is also difficult to implement. On the other hand, if a detector uses a small set of smoothing scales [130, 132], the process becomes computationally less expensive, but it does not resolve the difficulties of single-scale detectors entirely. However, such detectors also trace the feature points down to a similar lower smoothing scale, which later helps to combine feature points detected at multiple scales.
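To illustrate what a smoothing scale means for a contour-based detector, the sketch below smooths a planar curve at several σ values and computes its curvature, whose local maxima are corner candidates. This is a generic curvature-scale-space-style computation under assumed inputs, not any of the specific detectors proposed in later Chapters:

    import numpy as np
    from scipy import ndimage

    def curvature(x, y, sigma):
        # Smooth the contour coordinates at the given scale, then apply
        # the planar-curve curvature k = (x'y'' - y'x'') / (x'^2 + y'^2)^1.5.
        xs = ndimage.gaussian_filter1d(x.astype(float), sigma)
        ys = ndimage.gaussian_filter1d(y.astype(float), sigma)
        dx, dy = np.gradient(xs), np.gradient(ys)
        ddx, ddy = np.gradient(dx), np.gradient(dy)
        return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2 + 1e-12) ** 1.5

    # A small sigma keeps fine detail but is noise-sensitive; a large sigma
    # suppresses noise but blunts fine corners, as discussed above:
    # for sigma in (2, 4, 8): k = curvature(x, y, sigma)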

2.4.1 Intensity-Based Corner Detectors

An intensity-based corner detector uses a measure based on intensities or gradients in the neighbourhood of an image point to decide whether the point is a corner [73, 133, 163, 181, 214]. A contour-based corner detector extracts contours after detecting edges with an edge detector such as [33] and then searches for curvature maxima [24, 76, 131, 132, 212]. Finally, a model-based corner detector determines corners by comparing image information with a predefined model.

2.4.1.1 Classifications of intensity-based corner detectors

Intensity-based corner detectors can be classified into two groups: 1) derivative-based and 2) non-derivative-based corner detectors. These groups differ in the method of cornerness measurement and in the number of scales used to detect the corners. Derivative-based corner detectors can be further categorised into single-scale and multi-scale detectors, where the scale defines the level of Gaussian smoothing applied to the image. The details of these categories are discussed in the following sub-sections.

1. Derivative-based corner detectors

Derivative-based detectors first consider the neighbourhood of each image point and then apply a local operator in this region. When an edge crosses this neighbourhood, the local operator computes the curvature. Based on this idea, Kitchen and Rosenfeld [93] proposed an algorithm that uses edge intensity and direction information, computing the local gradients with the Prewitt operator for the 'cornerity' measurement. This method is sensitive to noise because it depends on second-order derivative terms, and its poor repeatability and localization make it unsuitable for natural scenes. Before this approach, Beaudet [28] had introduced an operator that detects corner points as extrema of the determinant of the Hessian matrix; Dreschler and Nagel [46] later applied Beaudet's concept in their proposed detector, but the operator was unsuccessful at detecting sharp corners. Moravec [134, 135] considered the large intensity variation of an image and introduced the concept of a 'point of interest'. [73] reported that Moravec's method is prone to detecting false corners and is not robust to noise. To overcome these limitations, Harris et al. [73] developed the Plessey detector (also known as the Harris interest-point detector), which uses circular Gaussian smoothing to reduce noise and measures the intensity variation in all directions from every pixel as follows:

C_p(x, y) = \frac{\overline{I_x^2}\;\overline{I_y^2} - \overline{I_x I_y}^{\,2}}{\overline{I_x^2} + \overline{I_y^2}}    (2.4)

where I_x and I_y are found using the n × n first-difference approximations to the partial derivatives; I_x^2, I_y^2 and I_x I_y are computed, Gaussian smoothing is applied, and the sampled means \overline{I_x^2}, \overline{I_y^2} and \overline{I_x I_y} are obtained over the n × n neighbouring point samples. This idea was later extended in several studies [173, 218]. However, the detector suffered from poor localization of corner points because of the convolution masks used to approximate the first- and second-order derivatives of the image intensity [181].
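As a concrete illustration of Eq. (2.4), the following is a minimal NumPy/SciPy sketch; the Sobel masks stand in for the n × n first-difference approximations and the Gaussian window for the sampled means, and the smoothing scale is illustrative rather than a value prescribed here:

```python
import numpy as np
from scipy import ndimage

def plessey_response(image, sigma=1.5):
    """Cornerness of Eq. (2.4): Gaussian-weighted means of the squared
    gradients combined into a single per-pixel response."""
    img = image.astype(float)
    Ix = ndimage.sobel(img, axis=1)   # derivative along x
    Iy = ndimage.sobel(img, axis=0)   # derivative along y
    Sxx = ndimage.gaussian_filter(Ix * Ix, sigma)   # sampled mean of Ix^2
    Syy = ndimage.gaussian_filter(Iy * Iy, sigma)   # sampled mean of Iy^2
    Sxy = ndimage.gaussian_filter(Ix * Iy, sigma)   # sampled mean of IxIy
    return (Sxx * Syy - Sxy ** 2) / (Sxx + Syy + 1e-12)  # avoid div by zero
```

Local maxima of this response above a threshold are then taken as corner candidates; as noted above, their localization is limited by the derivative masks.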

To overcome the problem, several intensity-based corner detectors were developed [51, 63, 64, 139, 177].

2. Non-derivative-based corner detectors

Instead of using derivatives, non-derivative-based corner detectors consider the pixel or gray-level values directly. One of the most popular non-derivative-based methods is SUSAN, which builds on [69] and was proposed by Smith and Brady [181]. This method considers an image pixel, called the nucleus, and a circular mask around it. Comparing the brightness of each pixel within the mask with the brightness of the nucleus defines an area of the mask called the USAN, an acronym for "Univalue Segment Assimilating Nucleus"; the smallest such area is called the SUSAN (Smallest Univalue Segment Assimilating Nucleus), as illustrated in Fig. 2.4. To find corners, the area and the centre of gravity of the USAN are computed. Although this method is well localized and, thanks to its non-derivative approach, robust to noise, it works poorly on smoothed images and has a lower repeatability rate. Based on the SUSAN detector, Trajkovic and Hedley [189] proposed a method that measures the similarity between diagonal pixels of the neighbourhood to distinguish a corner. More recently, Rosten and Drummond [163] proposed an efficient corner detector named Features from Accelerated Segment Test (FAST), which classifies a point as a corner if enough neighbouring pixels on a circular ring around it are significantly brighter or darker than the centre point. The adaptive and generic accelerated segment test corner detector (AGAST) [115] improves the performance of FAST with a decision-tree-based optimization framework.
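FAST is available in OpenCV, so the segment test can be run in a few lines; the file name and the detector settings below are illustrative assumptions, not values prescribed by the detectors above:

```python
import cv2

# Load any grayscale test image ("scene.png" is a placeholder name).
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Segment-test threshold and non-maximum suppression are tunable.
fast = cv2.FastFeatureDetector_create(threshold=25, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print(f"{len(keypoints)} FAST corners detected")
```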

Figure 2.4: SUSAN corner detector


3. Multi-scale corner detectors

The detection of multi-scale interest points relies on a Gaussian smoothing function with a number of different scales; the derivatives of the Gaussian can also be used to build the multi-scale representation. For each scale, the Gaussian smoothing factor acts as a distinct filter: convolving the Gaussian function with the input image yields a smoothed image [25]. Multi-scale detectors essentially apply the concepts of single-scale detectors at every scale in a series of scales to achieve scale invariance. The main drawback of this approach is that it tends to produce a higher number of false matches, since the same point may be detected at each scale within the scale range, and it is computationally expensive.

Lindeberg [109] introduced the concept of automatic scale selection by applying the scale-space tool, which allows feature detectors to detect features at their own characteristic scale. According to Lindeberg, the characteristic scale of a local image pattern is the scale parameter at which the Laplacian filter attains a local maximum; here, the scale determines the window size of the Laplacian filter. Each feature of an image exhibits a small number of such characteristic scales, which are the extrema over a series of scales. As the size of real-life objects varies and the distance between camera and object is unknown, it is impossible to determine the actual characteristic scale of an object from an image without using a multi-scale representation. For this reason, a range of scales must be considered to determine the characteristic scales.

To achieve scale invariance, Mikolajczyk and Schmid [126] combined the Harris corner detector with a scale-space representation over different scales. Only corners whose Laplacian-of-Gaussian (LoG) response attains an extremum across scales are selected as prominent features. Later, [219] proposed a method to analyse the topological structure at multiple scales using the Harris detector, and [36, 58] proposed similar approaches using the wavelet transform. Lee et al. [99] also proposed using a wavelet transform to identify corners at different scales; by representing the signal in the wavelet domain, they are able to detect both arcs and corners at multiple scales.

2.4.2 Dominant Point Detectors

Dominant point detectors detect corners on curves, mostly closed curves, to identify a shape or object [34, 40, 160, 184]. These detectors generally try to detect the changes of slope on the curve, which results in a higher number of detected corners than with contour-based detectors. Rosenfeld and Johnston [160] estimated the curvature value from the cosine value and took the resulting curvature extrema as the dominant points. The main drawback of this method is that it can generate false dominant points when curves are close to each other. Later, Rosenfeld and Weszka [161] proposed an improved method that averages the cosine values of successive non-overlapping slopes. Teh and Chin [184] proposed a dominant point detector that uses the ratio of the distance between a candidate point and a chord to the length of the chord to estimate the region of support (RoS); the same information is then used as the curvature value to detect the dominant points. Cornic [40] proposed an adaptive algorithm that detects the dominant points using the curvature values of all locations on the curve. Wang et al. [202] proposed computing a bending value, used as the curvature of a location, from the directions of the forward and backward vectors; [205] improved this work by using an adaptive bending value to detect the dominant points.

2.4.3 Contour-based Corner Detectors

Contour-based corner detectors detect corner locations from the contours, or boundaries, of an image; contours carry very important information. A number of contour-based corner detectors have been proposed in the literature [17, 23, 24, 131, 132, 154, 217]. Contour-based techniques generally work in two stages: first, an edge detector extracts the contours or curves from the image; then, the locations with the highest or lowest curvature are found. Comparative studies of contour-based corner detectors can be found in [21, 22, 130, 172]. Contour-based corner detectors are more reliable than intensity-based detectors in terms of invariance under image transformations, robustness and efficiency.

2.4.3.1 Steps of Contour-based corner detectors

Most contour-based corner detectors follow the four main steps [22] shown in Figure 2.5: edge extraction and curve selection, noise removal, curvature estimation, and corner finding.

Edge extraction and curve selection

The first step of a contour-based corner detection method is to extract from the image the edges that are relevant for detecting corners. Edge detection is a very important step of the overall corner detection process, and the location and number of extracted edges depend on the application. Some applications, such as medical imaging, require near-perfect edge identification, which is time-consuming, while others, such as mobile robot vision, require real-time computation and do not rely on impeccable edge recognition. For contour-based corner detection, researchers have been using the Canny edge detector since its popularisation by [17]: the CPDA corner detector [17] first applies the Canny edge detector with thresholds low = 0.2 and high = 0.7, and this trend continues in [23, 24, 132, 154] and other recent chord-based corner detectors. As the number of detected corners depends on the number of extracted edges, [5] analysed the role of the Canny edge detection method with both adaptive and predefined thresholds.

Figure 2.5: Steps of contour-based corner detector

The extracted curves can be of three types: 1) short curves, 2) loop curves and 3) normal curves. Short curves are normally treated as noise because they are unlikely to be invariant to geometric image transformations; to remove them, Awrangjeb and Lu [19] proposed an approach that depends on the resolution of the image. Loop curves are curves with both ends connected. The minimum length of an edge is determined by the following equation:

L_{min} = \frac{w + h}{\alpha}    (2.5)

where w and h are the width and height of the image respectively, and α is the edge-length controller.
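A minimal OpenCV sketch of this step; the 8-bit Canny thresholds are an assumption (the normalised 0.2/0.7 values quoted above do not transfer directly), contour tracing stands in for curve extraction, and α = 25 is illustrative:

```python
import cv2

def extract_curves(gray, alpha=25):
    """Detect edges, then keep only curves at least L_min = (w + h) / alpha
    points long (Eq. 2.5); shorter curves are treated as noise."""
    h, w = gray.shape
    l_min = (w + h) / alpha
    edges = cv2.Canny(gray, 50, 180)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST,
                                   cv2.CHAIN_APPROX_NONE)
    return [c for c in contours if len(c) >= l_min]
```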

Noise removal

The next step is to remove noise from the extracted curves; if the noise is not removed properly, false or weak corner locations are likely to be detected. Two noise-removal approaches have been proposed in the literature: 1) applying Gaussian smoothing [17, 132, 154] and 2) using a Region of Support (RoS) [34, 184, 205]. In the Gaussian smoothing-based approach, a scale (σ) is applied to the curve: the bigger the scale, the more noise is removed. Choosing an appropriate scale is important, however, because it affects the localization of the detected corners, and sometimes a small scale gives better localization. Figure 2.6 shows the effect of applying different Gaussian smoothing scales to a curve; a code sketch of this smoothing step follows at the end of this subsection. In the Region of Support (RoS) based method, a region is defined using x and y pixels from the candidate location to the left and right sides respectively [34, 39, 205]; [34] proposed a method to choose the number of pixels adaptively. Choosing the region of support is again a challenging task, because a large region may smooth out the detail of a curve while a small region may generate false corner locations. Teh and Chin [184] proposed a corner detector that does not require any input parameter to compute the RoS; however, this method is not robust on noisy contours [40].



Figure 2.6: An example of different Gaussian smoothing scales applied to a curve: (a) original curve, (b) σ = 4, (c) σ = 8, (d) σ = 16

Therefore, the characterization of digital straight segments by Freeman chain codes [146, 204] was used to compute the RoS. Later, [202, 205] used the bending value and [117] used an integral-square-error criterion to estimate the RoS.
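The Gaussian-smoothing option can be made concrete with a short sketch; it assumes the curve is an (N, 2) array of pixel coordinates and smooths each coordinate independently:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_curve(curve, sigma):
    """Smooth the x- and y-coordinates of an (N, 2) curve with a 1-D
    Gaussian of scale sigma (cf. Figure 2.6)."""
    out = np.asarray(curve, dtype=float).copy()
    out[:, 0] = gaussian_filter1d(out[:, 0], sigma)
    out[:, 1] = gaussian_filter1d(out[:, 1], sigma)
    return out
```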

Curvature estimation

One of the most important steps in detecting corners is estimating the curvature value at each location of the curves; the curvature value indicates the cornerness of that location. Various estimation methods have been proposed in the literature: [160, 161, 205] estimated the curvature by calculating the change in the slope angle of the line segment, while [24, 132] used derivatives along the curve. One popular method, proposed by [17], calculates the perpendicular distance between the candidate point and the line segment joining two of its neighbours.

Finding corners

After the curvature values have been evaluated, the next step is to find the extrema among those values. These extrema locations form the primary set of candidate corners. Some detectors [142, 160, 161] use these locations as the final corner set without further refinement; others apply a refinement process to define the final set. [17, 154] applied fixed curvature thresholds, while [34, 205] used adaptive algorithms to estimate the threshold dynamically as the refinement process.

2.4.3.2 Contour-based corner detectors

Contour-based corner detectors can be classified from different points of view, such as the number of Gaussian smoothing scales used to remove noise from the curve, or the type of curvature estimation technique used to measure the cornerness of the locations. Next, we discuss these two groupings in detail.

1. Classification based on Gaussian smoothing:

In this grouping, contour-based detectors are categorised by the number of Gaussian smoothing scales applied to the extracted edges: single-scale and multi-scale corner detectors. Single-scale detectors [14, 76, 132] apply only one smoothing scale to each curve to calculate the curvature values. The Curvature Scale Space (CSS) [132] corner detector is a widely used single-scale detector: it estimates the curvature at each pixel along the curve using a coarse smoothing scale, identifies prominent corner locations, and then tracks the corner locations at a finer scale to improve their localisation. Selecting an appropriate scale that eliminates noise is a challenging task in this process: a coarser scale makes the detector robust to noise but misses some potential corner locations, while a finer scale finds significant corners but is sensitive to noise. To overcome this problem, Awrangjeb proposed the enhanced CSS [131] detector, which uses different scales for curves of various lengths; but the problem of accurate localization of the corner locations still persists. Similarly, Awrangjeb [17] proposed the CPDA detector, which applies different smoothing scales based on the length of the curve. Awrangjeb et al. also proposed a multi-scale detector (ARCSS) [24] using three different scales; instead of arc-length, it uses affine-length parametrisation to detect the corners, which makes the detector more expensive to compute. He and Yung [76] modified the original CSS detector with an adaptive threshold, chosen according to the curvature of the neighbourhood region and the angle over a proper region of support. A few detectors [23, 147, 156, 157, 212] use a range of smoothing scales on all the curves of the image and later combine or select the measured cornerness from all versions of the curves. The Multi-scale Curvature Product (MSCP) [212] applies three different scales to each curve and combines the curvature measures by taking the product at corresponding locations of the curve; similar work is found in [83], where a sum is used instead of a product.

The main disadvantage of multi-scale detectors is that the cornerness of the same locations is measured at multiple scales, which is computationally expensive. To address this, Ray and Pandyan proposed a tree-based adaptive way to smooth the curve that avoids constructing the scale-space map [156]. A few detectors [61, 143] use a series of scales.

There are also other multi-scale representations for detecting corner locations, such as wavelet-based detectors [15, 56, 98]. These methods search a one-dimensional orientation signal of the curve for wavelet maxima points, applying the discrete wavelet transform at two scales before processing. Recently, [84] proposed a method that decomposes the contour orientation function, obtained via the original image chain code, into scales of different detail; the dominant shape points are then identified as the local maxima and minima of successive scales of this function, obtained through a multi-scale wavelet decomposition. [213] proposed an anisotropic Gaussian directional derivative-based method (ANDDs) to detect corners, which is robust to noise and supplies excellent directional intensity variation at each pixel.

2. Classification based on curvature estimation techniques:

We now discuss the classification based on how the curvature value is estimated on the extracted curves. Curvature estimation techniques can be classified into two groups: direct and indirect methods. Direct methods use geometric or algebraic calculations to determine the corners at high-curvature points [17, 90, 132]; these methods usually look for robust corner locations that can also be found under different image transformations.

[131, 132, 212] are widely used Curvature Scale Space-based methods that use the Euclidean curvature to estimate the curvature values; these techniques consider a 2 × 2 pixel neighbourhood on both sides of the candidate location. The properties of Curvature Scale Space-based methods have been investigated in [156, 220]. He and Yung [76] modified the original CSS detector by using an adaptive threshold based on the curvature of the surrounding pixels and then calculating the angle over the region of support. Similar work is found in [83].

Among the contour-based detectors, some recent chord-based detectors have been reported to perform better than the others [21, 22, 154, 185]. The CPDA corner detector [17], based on the Chord-to-Point Distance Accumulation (CPDA) technique [72], is reported to be one of the best chord-based corner detectors. The CPDA technique [72] calculates the curvature values of a two-dimensional planar curve; Awrangjeb later proposed a multi-chord-based angle detector using the original CPDA technique. To estimate the curvature value, the general approach is to place a chord on the curve and slide it across each interior location, from the nearest leftmost placement to the nearest rightmost placement. While the chord slides, the perpendicular distance from the interior location to the straight line through the chord endpoints is calculated and accumulated; this sum is the curvature value of that location. The accumulated curvature values of each individual chord are then normalised, and the estimated values from the different chords are multiplied at each point to obtain the final curvature value for each location. Finally, the local maxima of the multiplied values are selected as candidate corners and later refined to form the final corner set. Fast CPDA [23] is an extension of the CPDA detector by Awrangjeb in which some probable candidate points are obtained, using different levels of Gaussian smoothing, before the curvature estimation starts. An analysis of the single-chord CPDA corner detector can be found in [168].
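As a concrete illustration of the accumulation and multi-chord combination just described, here is a minimal sketch; it assumes the curve is an (N, 2) NumPy array, uses the chord lengths 10, 20 and 30 commonly quoted for CPDA, and applies a simple max-normalisation:

```python
import numpy as np

def cpda_accumulate(curve, chord_len):
    """For every interior point, sum its perpendicular distances to each
    chord of `chord_len` points that spans it."""
    n = len(curve)
    acc = np.zeros(n)
    for j in range(n - chord_len):
        a, b = curve[j], curve[j + chord_len]
        ab = b - a
        norm = np.hypot(ab[0], ab[1]) + 1e-12
        for i in range(j + 1, j + chord_len):
            p = curve[i] - a
            # perpendicular distance of curve[i] from the line a-b
            acc[i] += abs(ab[0] * p[1] - ab[1] * p[0]) / norm
    return acc

def cpda_response(curve, chord_lens=(10, 20, 30)):
    """Normalise each chord's accumulated profile, then multiply them."""
    resp = np.ones(len(curve))
    for L in chord_lens:
        h = cpda_accumulate(curve, L)
        resp *= h / (h.max() + 1e-12)
    return resp  # local maxima above a threshold become corner candidates
```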

Chord-to-Curve Ratio (CCR) and Chord-to-Triangle-Arms Ratio (CTAR), proposed in [185], are two recent chord-based corner detectors. CCR uses the ratio of the chord length to the length of the corresponding curve segment and assigns the value to the middle point of that segment. CTAR estimates the curvature value as the ratio of the length of the chord to the sum of the lengths of the two arms of the triangle, which run from the middle point to the two ends of the chord. The maxima of the curvature values of both detectors are then filtered against a threshold to obtain the detected corners. Both detectors use a single chord to detect the corners.
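A sketch of the CTAR-style measure under the same curve representation; the half-chord span below is illustrative rather than the authors' setting, and the ratio is inverted here so that maxima indicate corners, matching the thresholding described above:

```python
import numpy as np

def ctar_cornerness(curve, k=8):
    """Triangle test: for the middle point P2 between P1 and P3 (k points
    apart), |P1P3| / (|P1P2| + |P2P3|) is near 1 on straight runs and
    drops at corners, so 1 minus the ratio peaks at corners."""
    n = len(curve)
    vals = np.zeros(n)
    for i in range(k, n - k):
        p1, p2, p3 = curve[i - k], curve[i], curve[i + k]
        arms = np.linalg.norm(p2 - p1) + np.linalg.norm(p3 - p2)
        vals[i] = 1.0 - np.linalg.norm(p3 - p1) / (arms + 1e-12)
    return vals
```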

In the detection process, most chord-based corner detectors use at least one threshold value; CPDA uses two, a curvature threshold and an angle threshold. The threshold values in the various corner detectors are determined either manually or experimentally. Very recently, MSCAD, a multi-chord-based corner detector, was proposed [6]. This method uses a corner model to calculate a suitable threshold and, based on that value, calculates the angle; using this information, it applies multiple chords to detect prominent corner locations. This detector gives better results than CPDA and the other chord-based detectors.

A KD-curvature-based corner detector [37] computes the KD curvature at each point on the curve; the authors also introduced corner strength as a new concept for controlling detection precision.

Several detectors apply different techniques to the curve before corner detection. The detector proposed by [216] applies a number of levels of Difference of Gaussians (DoG) to a curve to obtain corresponding planar curves, from which the corners are then detected. Other approaches use matrix manipulation to detect corner locations: for example, the eigenvalues of the covariance matrix [191] and the Gradient Correlation Matrix (GCM) [217] process the curve through matrix manipulation to find the appropriate corner locations, which makes these detectors computationally very complex. Some other detectors [57, 100, 152] work on the wavelet transform: they apply a wavelet transform to the selected curve to obtain multiple wavelets that indicate its contour orientation. Although the obtained wavelets are very close to the second derivatives of the curve, they are very sensitive to noise. As mentioned above, [84] decomposes the contour orientation function obtained via the original image chain code into scales of different detail and identifies the dominant shape points as the local maxima and minima of consecutive scales through a multi-scale wavelet decomposition.

Indirect curvature-based methods [95, 119], on the other hand, are mostly based on a polygonal approximation of the curve; the locations they detect are mainly used to represent the boundary of shapes or patterns. They fall into two categories: 1) detectors using local neighbourhood information of the curve and 2) detectors using polygonal approximation or curve fitting. In the first approach, the curvature values are estimated from neighbourhood pixel information; a number of corner detectors in the literature follow this approach [39, 121, 223]. In the second approach, a curve fit is applied to the polygonal structure for curvature estimation: [55, 190] used curve fitting to detect corners, while [118, 120, 171] used a polygonal approximation to measure the curvature.

2.5 Interest Region Estimation

One of the most important steps in describing a local feature is to calculate the region around the feature location so as to achieve invariance against different image transformations; the pixels inside this interest region are used to build the descriptor. This is a fundamental step in describing a corner feature, but it is challenging under different image conditions. If the size of the interest region were the same for every feature, a feature would rarely match the corresponding feature of another image captured at a different scale, because the descriptors built on the interest regions would not be similar. Interest regions therefore need to be estimated in such a way that the image structure within the region remains the same for corresponding features, so that the corresponding feature descriptors become the same. Interest region estimation can be classified into two categories: 1) estimating scale-invariant interest regions and 2) estimating affine-invariant interest regions [158].

2.5.1 Estimating scale-invariant region

Scale invariance is an important property of a local feature: features from two images captured at different scales need to be matched. The main task is therefore to describe the corners of unknown real-life objects in such a way that corresponding corners always have similar descriptors regardless of scale differences; in other words, to define the neighbourhood of a corner location whose pixel information will be used to build a scale-invariant descriptor. Traditional methods [112, 126] derive the interest region for each feature while detecting the local feature, and the region is then used to describe the feature at the description step; a few methods [41, 42] instead derive the interest region after detecting the feature locations. Most corner detectors derive appropriate scales to estimate the region on which descriptors are built. Scale-invariant interest regions can thus be estimated in two ways: 1) at the time of feature detection and 2) at the time of feature description. We discuss these two processes briefly below.

Figure 2.7: Representing an image by a scale-space presentation

2.5.1.1 Region estimation at the time of feature detection

In this approach, the information about the interest region is determined at the same time as the corner detection process. This information is then used to describe the feature from the neighbouring pixels, referred to as the area of the region, of that particular feature location; the descriptor is built from the content inside the region. The major challenge in constructing the descriptor is to identify correlated local regions around the corner despite scale differences. Intensity-based corner detectors [73, 163, 181] generally follow this approach. To achieve scale invariance, Lindeberg [109] introduced scale-space theory. A scale space is generally defined as a 3D representation of an image, presented as a stack of gradually smoothed images (x, y, scale); Fig. 2.7 shows an example of a scale-space representation. According to Lindeberg, the Laplacian-of-Gaussian (LoG) kernel is the best for generating the scale space. The LoG is defined as:

LoG(x, σ) = σ^2 (I_{xx}(x, σ) + I_{yy}(x, σ))    (2.6)

where x is the pixel location; I_xx and I_yy are the second derivatives of the pixel intensity in the X and Y directions respectively; and σ determines the size of the LoG kernel used to smooth the image. The LoG filter mask corresponds to a circular centre-surround structure, with positive weights in the centre region and negative weights in the surrounding ring. Thus, it produces maximal responses when applied to an image neighbourhood

that contains a similar (roughly circular) blob structure at a corresponding scale. By searching for scale-space extrema of the LoG, we can therefore detect circular blob structures. Later, Mikolajczyk and Schmid proposed the Harris-Laplace detector [126] as a combination of the Harris corner detector [73] and Lindeberg's automatic scale selection method [109]. The method first builds two separate scale spaces, for the Harris function and for the Laplacian. It then uses the Harris function to localize candidate points at each scale level and selects those points for which the Laplacian simultaneously attains an extremum over scales. The resulting points are highly discriminative and robust to changes in scale, image rotation, illumination and camera noise. The same idea can also be applied to the Hessian, leading to the Hessian-Laplace detector; as with the single-scale versions, the Hessian-Laplace detector typically returns more interest regions than Harris-Laplace, at slightly lower repeatability [128]. Lowe [112] proposed another efficient method that uses the Difference of Gaussians (DoG), an approximation of the normalised LoG, to detect local maxima and minima in the scale space. In this approach, Gaussian kernels of increasing scale are convolved with the original image to obtain a stack of gradually smoothed images; the DoG representations are then obtained by subtracting each pair of adjacent smoothed images. The smoothed images are then sub-sampled and the process is repeated down to a preset sub-sampling size. To detect a local extremum, each pixel is compared with its eight neighbours in its own DoG layer and with the nine neighbours in each of the two adjacent DoG layers.

DoG(x, σ) = (G(x, kσ) − G(x, σ)) ∗ I(x)    (2.7)

where k is the scale factor. Many recent algorithms perform slightly better, such as the Wash detector [196] and Boundary Preserving dense Local Regions (BPLR) [92].
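A minimal sketch of Eq. (2.7) over one octave; the base scale, the factor k and the level count below are conventional SIFT-style choices, not values fixed by the text above:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_stack(image, sigma0=1.6, k=2 ** (1 / 3), levels=5):
    """Difference of adjacent Gaussian-smoothed layers (one octave).
    Extrema over each 3x3x3 neighbourhood of the returned stack are
    the candidate keypoint locations."""
    blurred = [gaussian_filter(image.astype(float), sigma0 * k ** i)
               for i in range(levels)]
    return np.stack([blurred[i + 1] - blurred[i]
                     for i in range(levels - 1)])
```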

2.5.1.2 Region estimation at the time of feature description

In this approach, the information needed to construct a scale-invariant descriptor is determined after the corner has been detected. Contour-based corner detectors, which locate the corners on edges, usually adopt this approach. Contour-based detectors [6, 17, 132] examine the edges over a large pixel neighbourhood, which makes the corner detection robust. Moreover, because the scale space progressively smooths out the image content, the same edges are unlikely to survive at every scale level; this is why contour-based detectors do not use the scale space, and consequently no information is produced during corner detection that could be used to build a scale-invariant descriptor. To describe the corners detected by contour-based detectors, descriptors constructed for intensity-based detectors have therefore also been used in the literature. An example is Awrangjeb's method [20], in which each corner detected by the CPDA [17] corner detector is represented by all the nearby SIFT descriptors [112] within a 3-pixel neighbourhood of that corner location. Unfortunately, the same SIFT descriptors are often absent from the image under different transformations, which makes the process weak in terms of robustness; new descriptors may instead appear inside the 3-pixel neighbourhood after a transformation. This lowers the chance of achieving corner matches between the reference and transformed images. [194] uses the edges near the detected corners to calculate the regions for building scale-invariant feature descriptors, but this performs worse than the Harris-Laplace [126] detector. In another approach, [167] placed a number of circular regions centred at the corner location and calculated the dissimilarity between adjacent circular regions. Recently, [169] proposed a technique that computes a suitable scale as part of the corner detection process and uses this derived scale to calculate the interest region around the corner location, in order to construct a robust local feature descriptor. This region estimation method also considers a number of circular regions around the corner, keeping the corner location as the centre of the circles, and calculates the dissimilarity between adjacent circular regions.

2.5.2 Estimating affine-invariant region

Affine-invariant detectors reliably and repeatedly identify similar regions in images taken from different viewpoints that are related by simple geometric transformations: scaling, rotation and shearing. The detected regions change covariantly with the image transformation, so that the image content they cover remains comparable. This approach is, however, both computationally expensive and error-prone, since local feature patches typically contain only a small number of pixels; nevertheless, a number of researchers [26, 122, 126, 194] have shown that a local affine approximation is sufficient in such cases. Several affine-invariant region estimation techniques have been proposed in the literature, such as Maximally Stable Extremal Regions (MSER) [122], Intensity-Based Regions (IBR) [194] and superpixels [69, 86]. Both the Harris-Laplace and Hessian-Laplace detectors [125] can also be extended to yield affine covariant regions; in both cases, scale selection is based on the Laplacian, and the shape of the elliptical region is determined with the second-moment (auto-correlation) matrix of the intensity gradient. Maximally Stable Extremal Regions (MSER) were proposed by Matas et al. [122]. An MSER is a connected component of a thresholded image: a watershed segmentation algorithm is applied to the image, and intensity regions that are stable over a large range of thresholds are extracted. 'Maximally stable' refers to the optimization involved in selecting an appropriate threshold, and 'extremal' refers to the property that all pixels inside an MSER have either higher (bright extremal regions) or lower (dark extremal regions) intensity than all the pixels on its outer boundary. The MSER detector has been widely used in object recognition. Intensity-Based Regions (IBR) [194] detects local extrema of image intensity over multiple scales, then explores the image around them in a radial way, delineating regions of arbitrary shape that are subsequently replaced by ellipses. Edge-Based Regions (EBR) [193] detects affine covariant regions by exploiting the edges extracted from the image; the primary motivation is that edges are typically rather stable features that can be detected over a range of viewpoints, scales and/or illumination changes, and the dimensionality of the problem can be significantly reduced by using the edge geometry.

Kadir and Brady proposed a salient regions detector [86, 87]. The basic idea behind this feature detector is to look for salient features, where saliency is defined as local complexity or unpredictability, measured by the entropy of the probability distribution of intensity values within a local region. A segmentation-based technique called superpixels was proposed by [136, 159]. Typical image segments are too large to be used as local features; instead of using large segments, this technique increases the number of segments, producing atomic regions of pixels. These atomic regions, referred to as superpixels, are generally produced by applying Normalized Cuts.

2.6 Feature Descriptors

As discussed in Section 2.1, after the feature detection process, every feature requires a unique identifier or signature which can later be used to identify the corresponding feature in another image. These identifiers or signatures are known as feature descriptors; they are usually histograms of image information derived from the interest regions. The types of existing feature descriptors are discussed in this Section. There are two types of local descriptors found in the literature: 1) descriptors based on geometric relations and 2) descriptors based on pixels of the interest region. In the following subsections, these two types of descriptors and their invariance against different geometric and photometric image transformations are discussed.

2.6.1 Descriptors based on Geometric Relations

Descriptors based on geometric relations are built from the connections between feature locations; the relations can be distances, angles or orientations. Several methods using geometric relations have been proposed in the literature. Zhou et al. [221] constructed a Delaunay triangulation over the improved-SUSAN [181] corner locations of each image; the interior angles are then computed as the properties of the descriptor. These descriptors are invariant to geometric transformations such as rotation, translation and uniform scaling, but they perform poorly under non-uniform scaling or affine transformations [20]. Later, [18] proposed a curvature descriptor that uses the corner location, the absolute curvature values and the angle with the two neighbouring corners to build a local descriptor representing each corner. With the ARCSS corner detector [24], they instead used affine-lengths between a corner and one of its neighbouring corners in place of the angle with the two neighbouring corners. These descriptors are easy to construct and relatively low-dimensional, but the distinctiveness of the corner locations is relatively low, which results in false or missed matches.

2.6.2 Descriptors based on Pixels of the Interest Region

The most popular way to build descriptors is based on the pixels of the interest region. This technique makes the feature more independent and robust to occlusion. Local feature descriptors of this type can be classified into two groups: 1) binary descriptors and 2) floating-point descriptors. We discuss the descriptors in these two groups next.

2.6.2.1 Binary Descriptors

Binary descriptors compare the intensities of pairs of pixel positions located around the selected keypoints. These descriptors are computationally inexpensive and at the same time very efficient in terms of feature storage and matching. Binary Robust Independent Elementary Features (BRIEF) [32] is the first binary descriptor; it encodes the pixel information of the interest region as a binary string obtained from a small number of intensity-difference tests. The distance between BRIEF descriptors is computed with the Hamming distance, which is significantly more efficient than the common Euclidean distance; however, the method is not rotationally invariant. Later, Oriented FAST and Rotated BRIEF (ORB) [165], an extension of this descriptor, was proposed to add rotational invariance by adopting the intensity centroid concept proposed by [164]. Inspired by the design pattern of the DAISY descriptor [187], the BRISK descriptor [101] uses a circular sampling pattern with different radii. LDAHash [182] maps the descriptor vectors into the Hamming space to reduce the dimensionality of the descriptor and represents them as binary strings.

FREAK (Fast Retina Keypoint) [8], an extension of BRIEF, is based on a retinal sampling pattern; it operates on a cascade of binary strings computed by comparing image intensities over the retinal sampling pattern. The Ring-based Multi-Grouped Descriptor (RMGD) [60] is a binary descriptor that generates meaningful binary strings from pairwise ring regions of various shapes, scales and distances using a pooling configuration. Levi and Hassner [102] proposed the Learned Arrangements of Three patCH codes (LATCH) descriptor, which samples triplets instead of pairs to make the descriptor more discriminative. The recently proposed local intensity order pattern (LIOP) and overall intensity order pattern (OIOP) descriptors [203] effectively encode the intensity-order information of each pixel in different ways. An extension of the LIOP descriptor was presented in [52], where several LIOPs over different measurement regions are collected into a single descriptor. Another binary descriptor, LIROP (Local Intensity Relative Order Pattern), was proposed by [209]; it encodes the relative intensity ordinal information of the points neighbouring each sampling point. [170] proposed a biologically inspired binary keypoint descriptor called Bink. Rather than single points, Local Difference Binary (LDB) [208] and Accelerated-KAZE (AKAZE) [9] extended the intensity tests to grids of areas.

2.6.2.2 Floating-Point Descriptors

Floating-point descriptors [1, 21, 24, 25] are mainly constructed from histograms of locations and gradient orientations. The most popular descriptor of this type is the SIFT descriptor [112], one of the most robust descriptors with respect to different geometric and photometric changes [127]. The gradient magnitude and orientation are sampled in a 16 × 16 region around the keypoint; the region is then sub-divided into a 4 × 4 grid, and an orientation histogram (quantized to 8 directions) is computed on each sub-region (see Fig. 2.8). This results in a 128-dimensional (4 × 4 × 8) feature descriptor, which is then normalised to make it more invariant to illumination changes. Inspired by SIFT, researchers have developed many of its variants in the literature [53, 206].

Figure 2.8: SIFT descriptor building using gradients

Figure 2.9: Building a SURF descriptor.

SURF [27], an accelerated version of SIFT, is an efficient and robust scale- and rotation-invariant method. The descriptor is built using the integral image and Haar wavelets to efficiently approximate the histogram of gradient orientations. Because the gradient values within a sub-patch are integrated to generate the descriptor, it is less sensitive to image noise. The method generates a 64-dimensional descriptor vector (see Fig. 2.9). Although its computation time is lower than that of the originally proposed SIFT detector and it is less sensitive to noise, it can only handle rotation and scale transformations. Similar approaches that use the integral image to speed up the process can be found in the literature [7, 48, 67]. In [67], the authors used the integral histogram [150], a faster way of extracting histograms from computed integral images; although this descriptor takes less time to compute, it is not invariant to rotation or non-uniform scale changes.

In CenSurE (Center Surround Extrema) [7], the authors modified the SURF descriptor to speed up the descriptor-building process by determining the keypoints using the extrema of the Hessian-Laplacian matrix across all scales. Ebrahimi [48] extended the CenSurE descriptor by predicting the response of the Haar wavelet for adjacent pixels to make the process faster. The Gradient Location and Orientation Histogram (GLOH) [127], a variant of SIFT [112], computes the SIFT descriptor in a log-polar coordinate system, splitting the region into 17 sub-regions; the gradient orientations are quantised into 16 bins for each sub-region, giving a 272-dimensional descriptor. This method outperforms SIFT under different image transformations; its main drawback is computational expense caused by the large dimensionality, which is reduced using Principal Component Analysis (PCA). Later, the Compressed Histogram of Gradients (CHoG) [35] was proposed based on the same idea. Another variation of the SIFT descriptor is Shape Context [30], which computes the descriptors in a log-polar coordinate system with nine sub-regions and four quantised orientation bins, resulting in a 36-dimensional descriptor; owing to its lower dimensionality than SIFT, its computation time is comparatively small. PCA-SIFT [89] and ICA-SIFT [47, 207] applied Principal Component Analysis (PCA) and Independent Component Analysis (ICA), respectively, to the SIFT [112] feature descriptor to reduce the number of dimensions. Rank-ordered SIFT (SIFT-Rank) [186] normalizes invariant feature image measurements for correspondence by using ordinal image descriptions; an ordinal description is a meta-technique that considers image measurements in terms of their ranks in a sorted array, so the dimensions of the SIFT descriptor are sorted by value and their ranks are used in the final ordinal description. The DAISY descriptor [187], inspired by SIFT [112], is designed for wide-baseline stereo vision; it enables highly efficient dense matching by reformulating the computation of the histogram of gradient orientations. A few successful applications of DAISY can be found in [197, 222]. DAISY has higher efficiency and accuracy than SIFT [112], but it requires additional memory to build a scale space of image derivatives. Very recently, a new descriptor was proposed [65] that divides the image into patches; the patch position and angle are then used to build pixel-based descriptors.


Figure 2.10: Example of LBP formation: (a) 8 neighbours of radius 1, (b) 12 neighbours of radius 1.5, (c) 16 neighbours of radius 2

Apart from local descriptors, a number of texture descriptors [68, 75, 140] are available in the literature as global features for image retrieval and for face and iris recognition. Among them, the Local Binary Pattern (LBP) [140] is widely used because of its computational simplicity and tolerance to illumination changes. Heikkila [77] used the idea of LBP [140] to build a local feature descriptor that performs better than SIFT in terms of efficiency. LBP essentially uses a set of histograms of the neighbourhood around each pixel: the basic LBP operator considers the eight pixels surrounding the centre pixel, but the number of neighbourhood samples can be increased using an interpolation technique [70]; an example is shown in Figure 2.10. There are also a few descriptors in the literature that use colour information [1, 81, 103] to build colour-invariant feature descriptors. Colour descriptors perform better under photometric image transformations, but the colour information increases the complexity of making them discriminative.
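Returning to the basic 8-neighbour LBP operator mentioned above, the sketch below computes one code per pixel; histogramming these codes over a region then yields the descriptor. The bit ordering is one conventional choice, not prescribed by the text:

```python
import numpy as np

def basic_lbp(image):
    """Basic 8-neighbour LBP: threshold the ring around each pixel at the
    centre value and read the resulting bits as a code in [0, 255]."""
    img = np.asarray(image, dtype=np.int32)
    c = img[1:-1, 1:-1]                      # centre pixels (no border)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        neigh = img[1 + dy:img.shape[0] - 1 + dy,
                    1 + dx:img.shape[1] - 1 + dx]
        code |= (neigh >= c).astype(np.int32) << bit
    return code
```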

2.7 Feature Matching

The most popular way of matching is to calculate the Euclidean distance between the descriptors. As discussed before, feature descriptors are multi-dimensional vectors, so the similarity between two descriptors can easily be calculated from the distance in each dimension of the corresponding descriptors. The Euclidean distance between two descriptors can be measured by the following equation:

Distance = \sqrt{\sum_{i=1}^{n} (D_{Ai} - D_{Bi})^2}    (2.8)

where D_A and D_B are the two descriptors and n is the dimensionality of the descriptors. Local descriptor matching approaches can be classified into three types [127]: (1) threshold-based matching, (2) nearest neighbour-based (NN) matching and (3) nearest neighbour distance ratio-based (NNDR) matching. All of these techniques calculate the distance between each descriptor of the reference image and each descriptor of the transformed image. If we denote by DA a descriptor in the reference image, by Di a descriptor in the target image, and by D1 and D2 the first and second nearest neighbours of DA respectively, the three matching techniques work as follows:


1. Threshold-based matching: In the threshold-based approach, two descriptors DA and Di are compared against a threshold: if the distance between them is below the predefined threshold, the descriptors are considered matched. Applications such as image retrieval and image classification usually use this approach.

2. NN matching: For nearest neighbour-based matching, a threshold is again used, but with a different condition: two descriptors DA and Di are considered matched if the distance between them is both the lowest among all candidates and below the threshold.

3. NNDR matching: Nearest neighbour distance ratio-based (NNDR) matching is very similar to nearest neighbour-based (NN) matching; in both approaches, one descriptor yields at most one match. NNDR, however, considers the two nearest neighbours: the descriptors are matched only if the ratio of the distance to the nearest neighbour to the distance to the second nearest neighbour is below a threshold t, i.e. if

||DA − D1|| / ||DA − D2|| < t,    (2.9)

where D1 and D2 are the first and second nearest neighbours of DA, and ||DA − D1|| and ||DA − D2|| are the Euclidean distances from D1 and D2 to DA respectively.

NNDR matching is commonly used in applications such as image registration, panorama creation and 3D reconstruction, because it decides a match by considering both the actual descriptor distance and the discrimination from all other descriptors. Besides these strategies, other methods used in the literature involve multidimensional search trees (k-d trees) [29], various hashing methods [31, 97, 155, 188], approximate nearest neighbours [137], and so on.
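A brute-force sketch of the ratio test in Eq. (2.9), assuming the descriptors of the two images are rows of NumPy arrays; t = 0.8 is a commonly quoted choice rather than a value fixed here:

```python
import numpy as np

def nndr_matches(desc_a, desc_b, t=0.8):
    """Keep a match only when the nearest neighbour is clearly closer
    than the second nearest (Eq. 2.9)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # Eq. (2.8) per row
        j1, j2 = np.argsort(dists)[:2]               # two nearest neighbours
        if dists[j1] / (dists[j2] + 1e-12) < t:
            matches.append((i, j1))
    return matches
```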

2.8 Content Based Image Retrieval (CBIR)

Analysing and retrieving objects in images is a fundamental job of computer vision applications. Recent technological developments have greatly increased the number of image archives, making it harder to retrieve images from large-scale collections. The main concerns are how to represent the images and how to match and retrieve them efficiently. Based on the query, image retrieval methods can be classified into two groups: text-based image retrieval (TBIR) and content-based image retrieval (CBIR). TBIR methods require meta-data or keywords for each image, and the retrieval is performed through textual queries. These methods are effective and efficient as long as the images are correctly tagged, but the approach has limitations: manual annotation requires human interaction, and if the number of keywords is insufficient, it is hard to retrieve images. Content-based image retrieval (CBIR), first introduced in the early 1990s, provides a potential solution to these problems. Instead of textual annotation, CBIR methods describe the images by their visual content, such as colour, shape, texture and local descriptors, and represent them as low-level feature vectors. Images can then be retrieved by calculating the similarity between the low-level features, typically using some metric to measure the similarity or dissimilarity between the images.

CBIR methods are very effective in practical applications such as large-scale web image search and personal image database search; however, they face several challenges, such as occlusion, resolution and illumination changes, which can affect retrieval performance. Figure 2.11 presents the content-based image retrieval (CBIR) framework used by most image retrieval systems.

Figure 2.11: CBIR framework (image courtesy of [198])

Based on the features used, CBIR methods can be classified into two groups: methods based on global features and methods based on local features [12]. A global feature is a multidimensional vector constructed using all the pixels of the image; two images are compared through the feature vectors extracted from each. For example, when comparing the picture of an apple with the picture of a banana, a colour feature generates a different vector for each image. Global features can thus be any particular property of an image, such as texture, colour histograms, edges or even a specific descriptor [43, 80, 141, 166].

The main advantages of global features are that they are much faster to compute and generally require small amounts of memory [140, 149, 183, 211]; however, they cannot be used in the presence of noise or for joined objects [123]. Unlike the global feature-based approach, local features detect a number of prominent locations in an image to distinguish it from other images under various image transformations [27, 112, 187]. These locations are then described using the nearby pixel intensities, which form the interest region of each location, and the image is represented by its local structures through a set of local feature descriptors extracted from the set of interest regions. Local features are, however, sensitive to noise and rotation and, by definition, may comprise many thousands of instances in a single image; preprocessing is therefore required for noise reduction, point detection and edge detection, and to handle the sheer number of detected local features. Local features are thus computationally more expensive than global features. To detect a feature, a feature detector takes the image as input and outputs only the feature location, with no other information; the interest region around the feature location is then estimated to describe the feature, and feature descriptors encode the relevant information as a vector that acts as a numerical 'fingerprint' for matching corresponding features. In our work, we use corners as features, since they are more stable than other features [130, 132]. Of the two types of corner detectors, intensity-based methods [73, 194] typically use the scale-space representation to calculate the region, and most use geometric parameters such as the second-moment matrix: Lindeberg [107] used the Laplacian and the scale-space determinant of the Hessian to represent elliptical regions, and auto-correlation matrices are used in the Harris-Affine and Harris-Laplace detectors [126] to detect regions. Contour-based corner detectors [6, 17, 154], on the other hand, are more robust, and the interest region has to be estimated at the description stage: Awrangjeb [20] used the SIFT descriptor [112] to calculate the region within 3 pixels of the corner location, while Sadat et al. [167] used a scale-invariant estimation method that differentiates distinct content around the corner location. Several content-based image retrieval methods have been proposed so far [82, 138, 145]; among them, the Bag of Visual Words (BoVW) approach produces better results than the others due to its low computation cost and its robustness against illumination, occlusion and cluttered backgrounds. BoVW is one of the most popular methods for addressing the main issues of CBIR and also performs well in the fields of

object recognition, image classification and image annotation. Inspired by the bag-of-words concept used in natural language processing, BoVW treats an image as a document of features [85]. The features are represented by descriptors, which are clustered into a chosen number of clusters; the prototype of each cluster represents a visual word, and together the visual words form the codebook. Many researchers have worked on the BoVW framework in recent decades [10, 11, 54, 104, 212]: [104] used the BoVW model based on regions of interest, [116] combined colour information with the BoVW approach to retrieve images, and [212] analysed feature detectors, descriptors and support vector machine kernel functions.
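A minimal sketch of the codebook and histogram steps of BoVW, assuming scikit-learn's k-means and an illustrative vocabulary size of 500 words:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, k=500):
    """Cluster the pooled training descriptors; each cluster centre is
    one visual word of the codebook."""
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bovw_histogram(codebook, image_descriptors):
    """Represent one image as a normalised histogram of visual words."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-12)
```

Images are then compared (or indexed) through these fixed-length histograms rather than through the raw variable-length descriptor sets.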

2.9 Performance Evaluation

2.9.1 Performance Evaluation of Feature Detectors

In this Section, we discuss the metrics frequently used in the literature for evaluating the performance of contour-based corner detectors. Some corner detectors in the literature [34, 39, 205] were evaluated on a very limited number of hand-drawn shapes without assessing performance under various image transformations, while others [131, 132] used a manual evaluation process covering only a few image transformations. Awrangjeb [17, 24] proposed an automatic corner detection evaluation process that examines the number of repeated corners and does not require human visual inspection. Instead of relying on human opinion, he proposed using the corner locations detected in an image as the reference corners and then comparing the locations of the corners detected in the transformed image with these reference corners. If a reference corner is detected at the corresponding transformed location, that corner is considered repeated. The main advantage of this process is that there is no limit on the number of images in the dataset; moreover, it does not require any human intervention and is therefore less subjective. A few metrics are available in the literature to evaluate the effectiveness of corner detectors: accuracy, consistency of corner numbers, repeatability rate and localization error. Repeatability has been defined in a few ways [174, 175, 189] in the literature.

Trajkovic and Hedley [189] defined repeatability as the ratio of the number of repeated corners to the number of corners in the original image, i.e.

\[ R = \frac{N_m}{N_o} \tag{2.10} \]

where Nm is the number of repeated corners detected in the transformed image. However, Schmid et al. [174, 175] defined the repeatability as follows,

\[ R = \frac{N_m}{\min(N_o, N_t)} \tag{2.11} \]

where the function min selects the lower of the two given numbers. The repeatability of the automatic evaluation system is measured using the following equation proposed by Mohanna and Mokhtarian [129]:

\[ \text{AverageRepeatability} = 100\% \times \frac{1}{2}\left(\frac{N_m}{N_o} + \frac{N_m}{N_t}\right) \tag{2.12} \]

where No and Nt are the numbers of corners in the original image and the transformed image respectively, and Nm is the number of matched corners between the original and transformed images. A corner is judged to be matched if it is detected within a 3-pixel radius of its expected location in the transformed image [17]. The localization error [2, 17] has been introduced to evaluate the localization of corner detectors. The localization error indicates how accurately the detector has located the corners in the transformed image compared to the locations of the corresponding corners in the original image; the lower the value of the localization error, the better. It is calculated using the root-mean-square error (RMSE), i.e.

\[ LE = \sqrt{\frac{1}{N_m} \sum_{i=1}^{N_m} \left[ (x_{oi} - x_{ti})^2 + (y_{oi} - y_{ti})^2 \right]} \tag{2.13} \]

where (xoi, yoi) and (xti, yti) are the i-th pair of matched corners. Mokhtarian and Mohanna [130] calculated accuracy by comparing corners identified by human observers with the corners detected by the corner detectors under evaluation. They also used the concept of consistency of corner numbers (CCN) to compare the total numbers of corners detected in the reference and transformed

image. They defined CCN as:

\[ CCN = 100 \times 1.1^{-|N_t - N_o|} \tag{2.14} \]

where Nt and No are the numbers of detected corners in the transformed and reference images respectively. Unfortunately, this metric does not show the actual strength of a corner detector, because it only considers the number of corners detected, not the locations of the corners in the two images. The numbers of corners in the reference and transformed images might be the same while the corner locations differ; in that case the detector would receive the highest rank according to this metric even though it is not effective.
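To make Equations 2.12 – 2.14 concrete, here is a minimal Python sketch of the automatic evaluation metrics (illustrative only; the thesis experiments were run in Matlab). The greedy nearest-neighbour matching helper and the homography-based projection are assumptions about how matched corners are obtained; the text only requires that a repeated corner lie within 3 pixels of its expected transformed location [17].

```python
import numpy as np

def match_corners(ref_pts, trans_pts, H, radius=3.0):
    """Greedily pair reference corners (projected by homography H) with
    detected corners in the transformed image, within a pixel radius."""
    ones = np.ones((len(ref_pts), 1))
    proj = np.hstack([ref_pts, ones]) @ H.T
    proj = proj[:, :2] / proj[:, 2:3]              # expected locations
    pairs, used = [], set()
    for p in proj:
        d = np.linalg.norm(trans_pts - p, axis=1)
        j = int(d.argmin())
        if d[j] <= radius and j not in used:
            used.add(j)
            pairs.append((p, trans_pts[j]))
    return pairs

def average_repeatability(n_match, n_orig, n_trans):
    return 100.0 * (n_match / n_orig + n_match / n_trans) / 2.0  # Eq. 2.12

def localization_error(pairs):
    diffs = np.array([a - b for a, b in pairs])    # expected vs detected
    return np.sqrt((diffs ** 2).sum(axis=1).mean())              # Eq. 2.13

def ccn(n_trans, n_orig):
    return 100.0 * 1.1 ** (-abs(n_trans - n_orig))               # Eq. 2.14
```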

2.9.2 Performance Evaluation of Estimated Interest Regions and Feature Descriptors

As discussed in Section 2.5, interest region estimation is the step prior to corner feature description, where the information in the estimated interest region is used to represent the corner. The correct matches of descriptors between reference and transformed images indicate both how well the interest regions of the corners have been estimated and how distinctive the descriptors are in representing the corners. Therefore, the evaluation metrics used to assess interest region estimation methods can also be used to evaluate the performance of the descriptors. The evaluation criterion we have used for the feature descriptors is based on the number of correct matches and the number of possible correct matches (correspondences). Given two sets of descriptors built from two images, we can obtain a set of matching descriptor pairs based on the matching strategy used. From the detected corners in the transformed image, we can also derive the number of correct matches and false matches, as well as the total number of possible correct matches, by applying the transformation matrix or homography matrix to the reference points. To analyse the matching performance, we have used Precision and Recall, defined as:

\[ \text{recall} = \frac{\text{number of correct matches}}{\text{total number of correspondences}} \tag{2.15} \]

\[ \text{precision} = \frac{\text{number of correct matches}}{\text{total number of matches (correct and false)}} \tag{2.16} \]

A perfect descriptor would yield a recall of 1 at any precision; in practice this is impossible to achieve. Rather, we expect low distance thresholds to give high precision but a low final recall, while high threshold values give lower precision but higher recall.
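A direct transcription of Equations 2.15 and 2.16, assuming the numbers of correct matches, false matches and ground-truth correspondences are already known:

```python
def precision_recall(num_correct, num_false, num_correspondences):
    """Precision and recall for descriptor matching (Eqs. 2.15-2.16)."""
    total = num_correct + num_false
    recall = num_correct / num_correspondences if num_correspondences else 0.0
    precision = num_correct / total if total else 0.0
    return precision, recall
```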

2.10 Summary

From the discussion of corner detectors, we have found that contour-based corners are more stable, and their number is relatively smaller than that of intensity-based corners and other features. In addition, corners detected by intensity-based detectors are more sensitive to noise. Moreover, the lower number of corners significantly reduces the complexity of finding matching corners between two images. Therefore, we have focused on contour-based corner detectors in our research, and we propose three contour-based corner detectors which perform better than existing detectors in Chapter 3 and Chapter 4. We analyse the role of edge detectors in Chapter 5. Estimating the interest region is one of the most important steps in representing corners with local descriptors. Since contour-based corner detectors do not have any inherent information for estimating the interest region, we propose an interest region estimation method in Chapter 6 so that we can build local descriptors to represent the detected corners. Finally, we propose a framework to apply our proposed methods in Chapter 7.

Chapter 3

Single Chord based Corner Detectors

In Chapter 2, we mentioned the Chord to Point Distance Accumulation (CPDA) corner detection technique [17], which is reported to be one of the best contour-based corner detectors in the literature. In this Chapter, we discuss the details of the CPDA corner detection method and identify its potential weaknesses. To overcome these weak points, we propose two new single chord-based corner detectors: the Single Chord CPDA (SCCPDA) detector and the Chord to Cumulative Sum Ratio (CCSR) detector [3]. SCCPDA is a modified version of the CPDA method, and the CCSR detector is based on a cumulative sum analysis. As both new corner detectors perform better than other existing corner detectors, these methods can be utilized in many computer vision applications such as image retrieval, image registration, image classification, scene or object recognition and many others. This Chapter is organised as follows. In Section 3.1, the CPDA [17] corner detector is described in detail. In Section 3.2, some potential weaknesses of the CPDA corner detector are analysed. We then propose two single chord-based corner detectors to overcome these problems in Section 3.3. The experimental results are discussed in Section 3.4 and Section 3.5 concludes the chapter.

3.1 CPDA Corner Detector

In this Section, we describe the popular Chord-to-Point Distance Accumulation (CPDA) corner detection method [17]. The CPDA method follows the main steps of other CSS-based corner detectors [24, 76, 216]:


1. For a given input image, extract edges using the Canny edge detector [33].

2. From the edge image, extract the planar curves.

3. Fill the gaps if two ends of one or more curves are very close, and find the T-junctions.

4. Compute the curvature of each point after smoothing out the curves.

5. Compare the curvature maxima against a curvature threshold to find the candidate corners.

6. For better localization, trace down each of the candidate corner locations to the lowest scale.

7. Compare the corner locations with the T-junctions and add any T-junctions that are far away from the detected corners, to finalize the corner set.

After extracting the curves, each curve is smoothed to remove noise with an appropriate smoothing scale, σ = 1, 2 or 3; the smoothing scale depends on the length of the curve. Next, this method uses three chords of lengths 10, 20 and 30. These three chords are moved along each of the extracted curves, and the distance of each point on the curve to the chord is measured to calculate the curvature values. The curvature values for the three chords are normalized and then multiplied to obtain the final curvature value. This process is illustrated in Figure 3.1.

In Figure 3.1, let P1, P2, P3, ..., PN be the N points on a curve. The value i of chord Li defines a straight line joining points Pj and Pj+i on the curve. To estimate the curvature value hLi(q) at point Pq using a chord whose endpoints are i points apart, the chord is moved on each side of Pq for at most i points while keeping Pq as an interior point, and the distances dq,j from Pq to the chord are calculated. Finally, all the distances are accumulated to derive the curvature estimate using the equation given below:

\[ h_{L_i}(q) = \sum_{j=q-i+1}^{q-1} d_{q,j} \tag{3.1} \]

The accumulated curvature values for each chord (from Equation 3.1) are then normalized using Equation 3.2.

Figure 3.1: Curvature estimation process for CPDA method, Image Courtesy [17]

\[ h'_{L_i}(q) = \frac{h_{L_i}(q)}{\max(h_{L_i})}, \quad \text{for } 1 \le q \le N,\ i \in \{10, 20, 30\} \tag{3.2} \]

The values calculated for the three different chords from Equation 3.2 are then multiplied together using Equation 3.3.

\[ H(q) = h'_{L_{10}}(q) \times h'_{L_{20}}(q) \times h'_{L_{30}}(q), \quad \text{for } 1 \le q \le N \tag{3.3} \]

Next, CPDA finds the candidate corners by rejecting weak corners: the local maxima of the absolute curvature are compared with a threshold Th, which the authors set to 0.2. Based on the hypothesis that a well-defined corner should have a relatively sharp angle [38], CPDA then calculates the angle from each candidate corner to its two neighbouring candidate corners from the previous step and compares it with an angle threshold δ to remove false corners. The angle threshold δ is set to 157°. Figure 3.2 shows an example of the detected locations after each filtering step of removing the weak and false corners, using the original Lena image [44].
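To summarize the curvature estimation above, the following is a simplified, illustrative Python sketch of Equations 3.1 – 3.3 on a single smoothed curve (an (N, 2) array of points). It omits curve extraction, smoothing, T-junction handling and the final maxima/angle thresholding, and uses the standard cross-product formula for point-to-line distance; it is a reading aid, not the authors' implementation (which was written in Matlab).

```python
import numpy as np

def point_to_chord(p, a, b):
    """Perpendicular distance from point p to the straight line through a and b."""
    ab, ap = b - a, p - a
    return abs(ab[0] * ap[1] - ab[1] * ap[0]) / max(np.linalg.norm(ab), 1e-12)

def cpda_curvature(curve, chords=(10, 20, 30)):
    """Product of normalized chord-to-point accumulated distances (Eqs. 3.1-3.3)."""
    n = len(curve)
    H = np.ones(n)
    for L in chords:
        h = np.zeros(n)
        for q in range(L, n - L):                 # keep P_q interior to the chord
            h[q] = sum(point_to_chord(curve[q], curve[j], curve[j + L])
                       for j in range(q - L + 1, q))   # Eq. 3.1
        if h.max() > 0:
            h /= h.max()                           # Eq. 3.2
        H *= h                                     # Eq. 3.3
    return H
```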

3.2 Weaknesses of CPDA Corner Detector

Though the CPDA detector is reported to be one of the best contour-based corner detectors, outperforming other state-of-the-art corner detectors, we have found several weaknesses in this method, which we discuss in this Section.


Figure 3.2: (a) Original Lena image [44]; (b) local maxima locations from the estimated CPDA curvature values; (c) locations after discarding the weak corners using a curvature threshold; (d) final corner set after false corner removal using the angle threshold

1. The approximated curvature values are disproportionate to the original corner angle.

A corner detector estimates the curvature value to quantify the sharpness of a corner. These curvature values should be proportional to the corner angles: if the angle of the corner increases or decreases, the estimated curvature value should increase or decrease accordingly. However, the CPDA method does not meet this requirement. For example, Figure 3.3 shows three separate angles for the triangle formed on a curve; lines AD and AC represent the chords of the respective angles ∠ABD and ∠ABC. Table 3.1 shows the curvature values for the three different chords {10, 20, 30} using the CPDA detector. From these curvature values, we can observe that the smallest angle produces the largest curvature values, while the middle angle produces the smallest values.


Figure 3.3: Triangles formed for different angles using the CPDA method

2. Prone to missing less sharp corners if the curve has one or more sharp corners.

Detector   Chord (Li)   Figure 3.3(b)   Figure 3.3(c)   Figure 3.3(d)
CPDA       10           4.44            3.62            4.00
           20           10.21           8.59            9.00
           30           15.66           13.56           14.00

Table 3.1: Curvatures calculated by CPDA detector

If a curve has multiple corner locations with different sharpness, the CPDA detector detects only the sharpest corner locations and fails to find the less sharp corners. The reason for this lies in the normalization process of the CPDA method. When the curvature value of a corner location is estimated using a chord, the normalization process discards the actual magnitude differences between the curvature value of that corner location and those of other corners. For example, a rounded curve might not contain a corner location at all, yet the normalization process of the CPDA method still produces a location with curvature value 1. Therefore, the final product of the normalized curvature values from the three chord lengths does not reflect the true cornerness of the curve. We use the popular Lena image, shown in Figure 3.2, to illustrate this behaviour. Consider Curve 1 of the original Lena image [44] shown in Figure 3.2(c): Curve 1 has corners of different sharpness, and the CPDA detector fails to find the less sharp corner locations. Since the changes in slope at these two corner locations are different, the curvature values of the two points derived from the same chord will also be different. This is reflected in Figures 3.4(a)-(c) for the three chords

(L10, L20, L30) respectively. As these values are then normalized by the highest curvature value of the corresponding chord, the normalized value of the sharp corner location for each chord will be 1. Since the original curvature values are now lost, the final curvature value of the less sharp corner location will be significantly lower (see Figure 3.4(d)), potentially excluding from the resulting corner set corners that are otherwise detected reliably.

3. Possibility of failing to detect closely located corners on the curve.

We observe that the CPDA corner detector may fail to find observable corners if they are located very close together. The reason it misses closely placed corner locations lies in the use of chords that intersect


Figure 3.4: Estimated curvature of Curve 1 of image 3.2(b) using chords L10, L20 and L30, and the combined curvature using Equations 3.2 and 3.3

curve segments whose endpoints are many pixels apart. For example, Figure 3.5 shows a hand-drawn shape where the CPDA detector repeatedly fails to detect corner 'C', which is obviously a prominent corner location. The chords L10 and L20 detect the local maximum at location 'C' (Figures 3.5(b) and (c)); however, the third chord L30 does not (Figure 3.5(d)). After normalizing the curvature values and multiplying them, the final curvature value representing this location is too small for the CPDA corner detector to report it as a corner.

4. The refinement process discards very sharp corners.

The last step of the CPDA detector is the corner refinement process to remove false corners. Since the CPDA detector normalizes the curvature values of each chord and multiplies them to derive the final curvature, a refinement process is a necessary step after thresholding the maxima values. The purpose of the refinement process is to remove false corners; however, we have found that this process may also remove


Figure 3.5: (a) Detected corners using the CPDA detector; (b)-(d) resulting curvature using three different chords following normalisation

some of the potential corners from the curves. There are two corner locations in Figure 3.6, of which the CPDA detector can detect only one. The CPDA detector calculates the angle at a candidate corner using the two neighbouring candidate corners, one from each side, or the end of the curve if there is no neighbour. In Figure 3.6, the missing corner has no neighbour on its right side, so the CPDA detector uses the right end of the curve as one of the neighbours. Consequently, the calculated angle does not reflect the original angle or the estimated curvature value of that location, which is how the CPDA detector misses the corner. Finally, the complexity of the curvature estimation technique and the subsequent refinement process to finalize the corner set make the whole CPDA detection process expensive to compute.

Figure 3.6: Corner location missed due to the refinement process of the CPDA detector

3.3 Proposed Corner Detectors

In this Section, we present two single chord-based corner detectors that address the shortcomings of the CPDA and CSS-based corner detectors discussed in the previous Section. We first propose the single chord CPDA (SCCPDA) detector, and then we propose another new approach, named Chord to Cumulative Sum Ratio (CCSR), which estimates the curvature value on the curve using a cumulative-sum-based analysis [3].

3.3.1 Why use a single chord?

The original CPDA paper [17] mentions the use of multiple chords but does not justify this choice with any experimental results. In particular, CPDA uses three chords of lengths 10, 20 and 30. To understand whether using multiple chords is superior to using a single chord, we modified the CPDA process to use a single chord Li, i ∈ [5, 30], where i is determined experimentally. The modified CPDA detector does not need the step shown in Equation 3.3, as it is only meaningful when using multiple chords. However, as described in [17], apart from combining the distances of multiple chords, Equation 3.3 also magnifies the difference between the chord-to-point distances of weak and strong corners.

3.3.2 Proposed Single Chord CPDA Detector

Our proposed single chord CPDA detector (SCCPDA) [3] follows the steps of other CSS-based corner detectors, as discussed in Section 3.1. The overall corner detection procedure has two main steps: 1. edge extraction and 2. corner detection on contours. The flowchart of the proposed corner detector is shown in Figure 3.7.

Figure 3.7: Steps of the proposed methods

Edge extraction

In the first step, the edges are extracted from the image by any single-pixel-width edge detector. The edge detector should extract edges that are strong enough to appear in other images despite any image transformations. Most CSS-based corner detectors in the literature use the Canny edge detector [33] for this purpose. The CPDA corner detector [17] first uses the Canny edge detector with thresholds low = 0.2 and high = 0.7, and this trend continues in [154], [24] and other recent chord-based corner detectors. Based on our analysis in Chapter 5, we have used an adaptive Canny edge detector, which gives better results than other edge detectors. The extracted edges form the edge map, a binary image in which 1 represents an edge pixel and 0 a non-edge pixel. To select the curves, we tracked the connected edge pixels using a 3x3 neighbourhood of each end point.
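One widely used way to make Canny adaptive is to derive its two thresholds from the image median. Whether this matches the exact adaptive variant analysed in Chapter 5 is not specified here, so the sketch below should be read as an assumption-labelled illustration (Python/OpenCV, whereas the experiments themselves were run in Matlab).

```python
import cv2
import numpy as np

def adaptive_canny(gray, sigma=0.33):
    """Canny with thresholds derived from the image median -- a common
    adaptive heuristic, not necessarily the thesis's exact variant."""
    m = float(np.median(gray))
    lo = int(max(0, (1.0 - sigma) * m))
    hi = int(min(255, (1.0 + sigma) * m))
    return cv2.Canny(gray, lo, hi)   # binary edge map: 255 = edge pixel
```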

Curve selection

The next step includes contour extraction, filling small gaps between contours, and finding T-corners. Then each extracted curve is smoothed and the curvature


Figure 3.8: Examples of T-junctions

of each point on the curve is computed; candidate corners are found by comparing the curvature maxima values with a curvature threshold and with the values of the neighbouring minima; and the corner locations are compared with the T-junctions, which are added if they are far from the detected corners. Finally, the corner set is selected. If an edge runs through any point within 2 pixels of the end of another edge, that end is selected as a T-junction and stored in the T-corner set [17]. Figure 3.8 shows examples of T-corners from the edge images.

Noise removing

Noise removal is an important part of finding corner locations. Contour-based corner detectors usually remove noise by applying Gaussian smoothing or a region of support (RoS). Some corner detectors use Gaussian smoothing at multiple scales [148, 212], while others use a single smoothing scale [17, 132]. Multiple scales can complicate the process by detecting repeated corners at different locations across scales, and classifying the curves by length is computationally expensive; it is therefore sometimes impractical to use multi-scale smoothing in practical applications. Like other recent corner detectors [17, 153], we use single-scale Gaussian smoothing to remove noise. Most detectors use a smoothing scale of σ = 2 or 3; in our proposed detector, we have chosen σ = 3. The curvature estimation is then performed on the resulting smoothed curve.

Curvature estimation

After noise removal with Gaussian smoothing, the next task is to estimate the curvature values on the smoothed curves. The CPDA method [17] uses three chords of lengths 10, 20 and 30. Instead of using multiple chords like CPDA, we use a single chord Li, i ∈ [5, 30], where i has been decided from the experimental results presented in Section 3.4.

Figure 3.9: Curvature estimation using SCCPDA with a single chord

In Figure 3.9, let P1, P2, P3, ..., PN be the N points on the curve. The value i of chord Li defines the number of points of the curve segment between points Pj and Pj+i.

To estimate the curvature value hLi(q) at point Pq using a chord whose endpoints are i points apart, the chord is moved on each side of Pq for at most i points, keeping Pq as a central point, and the distances dq,j from Pq to the chord are calculated. CPDA then accumulates these distances; to achieve a similar goal, we replace the accumulation step shown in Equation 3.1 with Equation 3.4 and follow it with the normalisation shown in Equation 3.5. The rest of the process, using the curvature threshold and the angle threshold, is kept the same.

\[ h_{L_i}(q) = \left( \sum_{j=q-i+1}^{q-1} d_{q,j} \right)^2 \tag{3.4} \]

The accumulated curvature values for the chord (from Equation 3.4) are then normalized using Equation 3.5.

\[ h'_{L_i}(q) = \frac{h_{L_i}(q)}{\max(h_{L_i})}, \quad \text{for } 1 \le q \le N,\ i \in [5, 30] \tag{3.5} \]
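The only changes relative to the CPDA curvature code sketched in Section 3.1 are the single chord and the squaring of the accumulated distance; an illustrative sketch follows, reusing the point_to_chord helper from that sketch, with the chord length 18 suggested by the parameter study in Section 3.4.2 (an assumption-labelled default, not a prescription).

```python
import numpy as np

def sccpda_curvature(curve, L=18):
    """Single-chord squared accumulated distance, normalized (Eqs. 3.4-3.5)."""
    n = len(curve)
    h = np.zeros(n)
    for q in range(L, n - L):
        s = sum(point_to_chord(curve[q], curve[j], curve[j + L])
                for j in range(q - L + 1, q))
        h[q] = s ** 2        # squaring magnifies strong versus weak corners
    return h / h.max() if h.max() > 0 else h     # Eq. 3.5
```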

Finding corners

After evaluating the curvature values, the next task is to find the local minima on each curve; the local minima locations are considered the primary set of candidate corners. For this, we chose a curvature threshold Th = 0.989: if the curvature value at a local minimum is less than Th, we consider that location a prominent corner. We then use an angle threshold θ = 157°, similar to the CPDA detector. Figures 3.10(a) and (c) show the locations of the minima on each curve of the Lena and Leaf images, respectively. Next, we consider the T-junctions that were not selected as prominent corners, examining the neighbourhood pixels (a 5 × 5 window) around each point; if the condition is fulfilled, we include the T-junction in the final list of prominent corners. Figures 3.10(b) and (d) show the final sets of corners detected by the SCCPDA corner detector.


Figure 3.10: (a) & (c) the locations of the minima found using the SCCPDA curvature estimation technique; (b) & (d) the final corner sets selected by SCCPDA using the curvature threshold

3.3.3 Proposed Chord to Cumulative Sum Ratio (CCSR) Detector

Now we explain another approach which also uses a single chord to detect corners [3]. One basic way of finding corner locations is to measure the flatness of a curve: a corner is a location where the slope of the curve changes direction, i.e. the curve is not flat. According to this hypothesis, if we place a straight line on the curve touching the two ends of the curve or curve segment, the ratio of the length of the straight line to the curve length gives the essence of the flatness of the curve. As a result, this ratio also carries information about the presence of a corner location on the curve or curve segment, provided the curve length is small.


Figure 3.11: Three curves with different degrees of flatness

Figures 3.11(a)-(c) show three curves of different flatness. If we measure the ratio of the straight-line length to the curve length, we get 0.9232, 0.8039 and 0.6706 respectively. These ratio values clearly indicate that a lower value corresponds to a more bent curve. Therefore, if we calculate the ratio over a small part of the curve, it can indicate the presence of a corner in that segment. Next, the distances from all the interior locations of the curve segment to the respective straight line are calculated, and the location with the greatest distance to the straight line is considered a corner location; this also helps to localize the corner. An example of this scenario is depicted in Figure 3.12. Similar to the CPDA detector, the new detector also starts by extracting and smoothing curves from the image. Since the length of the curve segment needs to be measured, we use the cumulative sum of distances to calculate the curve length between two given locations on the curve. The cumulative sum of the distances between consecutive points of the curve gives a sequence of partial sums. For example, the cumulative sums of the distances between consecutive locations of the curve

Figure 3.12: Detection of the corner location within the curve segment

from the start, d1, d2, d3, ..., are d1, d1 + d2, d1 + d2 + d3, ... Therefore, the cumulative distance (CD) can be expressed as follows,

\[ CD(i) = \sum_{j=1}^{i} d_j \tag{3.6} \]

Let P1, P2, ..., PN be the N points of a curve, along which a chord of fixed length is moved. Note that the length of the chord in chord-based corner detection means the number of points apart at which the chord is placed on the curve. For example, if the chord length is L = 5, the chord is first placed from location P1 to P6, next from P2 to P7, and so on. Each ratio is assigned to the middle location of the curve segment, as expressed in Equation 3.7.

\[ R\!\left(i + \lceil L/2 \rceil\right) = \frac{\sqrt{(x_{p_i} - x_{p_{i+L}})^2 + (y_{p_i} - y_{p_{i+L}})^2}}{CD(i+L) - CD(i)} \tag{3.7} \]

Next, we filter the curve segments based on a threshold on the ratio, keeping the ratio minima along each curve. The perpendicular distances from each interior point of the curve segment to the straight line are then measured, and the location with the longest distance is considered a corner. We name this detector Chord to Cumulative Sum Ratio (CCSR).
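An illustrative sketch of the CCSR computation (Eqs. 3.6 – 3.7) follows, reusing the point_to_chord helper from the sketch in Section 3.1. For brevity it thresholds every ratio directly rather than keeping only the ratio minima along the curve, and the default chord length and threshold follow the parameter study in Section 3.4.2; this is a simplified reading of the method, not the reference implementation.

```python
import numpy as np

def ccsr_corners(curve, L=9, ratio_thresh=0.981):
    """Chord to Cumulative Sum Ratio corner detection sketch."""
    n = len(curve)
    seg = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    cd = np.concatenate([[0.0], np.cumsum(seg)])        # Eq. 3.6
    corners = []
    for i in range(n - L):
        chord = np.linalg.norm(curve[i + L] - curve[i])
        ratio = chord / max(cd[i + L] - cd[i], 1e-12)   # Eq. 3.7
        if ratio < ratio_thresh:                        # segment is bent enough
            # localize: interior point farthest from the chord
            d = [point_to_chord(p, curve[i], curve[i + L])
                 for p in curve[i + 1:i + L]]
            corners.append(i + 1 + int(np.argmax(d)))
    return sorted(set(corners))
```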

3.4 Experimental Results

This Section presents the experimental setup and the results used to evaluate the performance of the proposed detectors. First, the experimental setup and evaluation process are described. Second, the characteristics of the SCCPDA and CCSR detectors are shown in terms of their different parameters. Third, the overall performance of these detectors is compared with other existing corner detectors from the literature. Finally, the efficiency of the detectors is presented based on the time they take to detect corners.

3.4.1 Experimental Setup

The performance comparison criterion and the experimental setup are outlined briefly in this Section. There are two main approaches in the literature for evaluating the performance of corner detectors. The first approach requires human involvement to manually select corner locations as ground truth for the original images [130–132]. This evaluation method is impractical for real-life applications, as it is very hard to apply to datasets with a huge number of images. The other approach is an automatic evaluation process, used by several recently developed corner detectors [17, 24, 154], and we have adopted it to measure the performance of the corner detectors. In this process, the corner locations detected in the original images are considered the reference corners; a known geometric transformation is then applied to the original images, and the corners are detected again. A corner of the reference image is considered repeated if the corresponding location of that corner is within a 3-pixel distance of a detected corner location in the transformed image. The benefit of this approach is that human intervention is not needed; as a result, any image database can be used in this framework.

3.4.1.1 Evaluation Metrics

Two types of evaluation metrics are used to measure the robustness of the proposed detectors: 1) average repeatability [17, 132] and 2) average localization error [17]. These evaluation metrics have been used by [17, 153] to compare performance. The average repeatability Ravg measures the average fraction of repeated corners between the original and transformed images. It is defined as,

\[ \text{AverageRepeatability} = 100\% \times \frac{1}{2}\left(\frac{N_m}{N_o} + \frac{N_m}{N_t}\right) \tag{3.8} \]

where No and Nt are the numbers of corners detected in the original and transformed images respectively, and Nm is the number of repeated corners between them. The corners detected in the original images are utilized as the reference corners; with these reference corners, no human intervention is needed to decide the ground truth [17].

The localization error Le is defined as the amount of pixel deviation of a repeated corner. It is measured as the root-mean-square-error (RMSE) of the repeated corner locations in the original and transformed images,

\[ L_e = \sqrt{\frac{1}{N_m} \sum_{i=1}^{N_m} \left[ (x_{oi} - x_{ti})^2 + (y_{oi} - y_{ti})^2 \right]} \tag{3.9} \]

where (xoi, yoi) and (xti, yti) are the positions of the i-th repeated corner in the original and transformed images respectively. Thus, the lower the Le value, the better the detector.

3.4.1.2 Test image Dataset

We have used an image dataset of around 9000 images, obtained by applying a number of different geometric transformations. Initially, we used 23 base grey-scale images to evaluate the performance of the corner detectors. These base images consist of both real and artificial images; the same dataset was also used by [17, 153]. Seven different transformations have been applied to these 23 images: scaling, shearing, rotation, rotation-scale, non-uniform scaling, JPEG compression and Gaussian noise. As a result, more than 8000 transformed images were generated for comparison with the reference corner locations.

Rotation: In this experiment, the corner locations were extracted from the original images. Next, the original images were rotated by 18 different angles in the interval [−90°, +90°], 10° apart, excluding 0°. Finally, the corner locations were detected again in all the transformed images and the evaluation metrics described above were calculated.

Uniform scale: Similar to the rotation transformation, the input images were scaled by factors in the interval 0.5 to 2.0, 0.1 apart, excluding 1.0.

Non-uniform scaling: Non-uniform scaling differs from uniform scaling in that the scale changes in the two directions differ. We scaled the images by factors of 0.7 to 1.5 in the x direction and 0.5 to 1.8 in the y direction, 0.1 apart, excluding the cases in which the scale factors in both directions are equal.

Rotation and scale: This is a combination of the rotation and uniform scale transformations, applied at the same time. The images were rotated by angles in the interval [−30°, +30°], 10° apart, excluding 0°, and scaled by factors from 0.8 to 1.2, 0.1 apart, excluding 1.0.

JPEG compression: The images were compressed at 20 quality factors in the range 5 to 100, at intervals of 5.

Gaussian noise: Zero-mean Gaussian noise at 10 variances, in the range 0.005 to 0.05 at intervals of 0.005, was applied to the original base images.

Shear transform: Shear factors in the range 0 to 0.012, at intervals of 0.002, were applied to the original images in both the x and y directions, excluding the cases in which the shear factors in the x and y directions are equal.

3.4.2 Parameter Optimization

In this Section, we present the average repeatability of the single chord CPDA detector for a series of chords from 5 to 30 and thresholds from 0.15 to 0.25, 0.01 apart. Although we evaluated the performance of 26 chords of different lengths, we show the results only for the chords from 11 to 20, because the performance of longer and shorter chords is relatively worse than that of the selected chords. The Gaussian smoothing scales used by different corner detectors are listed in Table 3.2.

Corner Detector       Gaussian Smoothing Scale
SCCPDA [3]            3
Zhang [217]           3
DoG Detector [216]    3
CPDA [17]             1, 2, 3
Fast CPDA [23]        3, 4
MSCP [212]            3, 3.5, 4
ARCSS [24]            3, 4, 5
He and Yung [76]      3
CCSR [3]              3
EigenValues [191]     3

Table 3.2: Recommended Gaussian smoothing scale σ for corner detectors


3.4.3 Performance Evaluation

This Section compares the average repeatability and localization error of the proposed detectors with those of eight contour-based detectors from the literature; Figures 3.15 and 3.16 show the results for seven different transformations along with the averages over the seven transformations.

Figure 3.13 shows the average repeatability of the SCCPDA detector for different chord lengths and curvature thresholds against the seven transformations mentioned in the previous Section, and Figure 3.13(h) shows the mean of the average repeatability over all transformations. We have not added axis titles to the graphs, as the titles take up much space and make the actual graphs smaller; the vertical axis of each graph shows the average repeatability, the horizontal axis shows the threshold values, and each curve shows the repeatability of one chord for different thresholds. From the mean of the average repeatability over all transformations, we can see that chord 18 has the best average repeatability at threshold 0.24.

Similar to the SCCPDA detector, the performance of the CCSR detector has also been evaluated for different chord lengths and thresholds against the seven transformations. The average repeatability of each chord is shown in Figure 3.14, and Figure 3.14(h) shows the mean of the average repeatability over the seven transformations.

(a) Scale; (b) Rotation; (c) Gaussian; (d) Non-Uniform; (e) Rotation and Scale; (f) Shear; (g) JPEG; (h) Average

Figure 3.13: Average repeatability of the single chord CPDA (SCCPDA) detector against different transformations

The evaluation process was conducted for chords from 5 to 21, 2 apart, and for thresholds from 0.979 to 0.990, 0.001 apart. From the mean of the average repeatability in Figure 3.14(h), we can see that chord length 9 has the highest average repeatability at threshold 0.981.

Next, we compare the average repeatability and localization error of our proposed detectors with eight contour-based detectors: DoG Detector [216], CCR [154], CPDA [17], He & Yung [76], MSCP [212], ARCSS [24], EigenValues [191] and GCM [217]. Figures 3.15 and 3.16 show the average repeatability and localization error respectively, for the seven transformations along with the averages over the seven transformations.

In terms of average repeatability, CCSR and SCCPDA perform consistently better across the different transformations, except for the scale transformation; CPDA and MSCP have superior results for the scale and shear transformations respectively. Considering all the transformations, CCSR has the best mean average repeatability, followed by CCR and the SCCPDA detector.

In terms of localization error, presented in Figure 3.16, CCSR is not the best corner detector (note that the lower the localization error, the better the detector). Except for the shear transformation and Gaussian noise, the original CPDA detector has the lowest localization error. However, as with average repeatability, the CCSR and SCCPDA detectors are always among the first four detectors for all the transformations. In summary, CPDA has the lowest mean localization error across all the transformations, followed by the SCCPDA and CCSR detectors.

Table 3.3 shows the number of corners detected in the 23 base images by the corner detectors. We see that CPDA detects the fewest corners, followed by SCCPDA and CCSR. However, both SCCPDA and CCSR have better repeatability than the CPDA detector, and at the same average repeatability, a higher number of repeated corners is an obvious advantage over fewer repeated corners. We have also shown the percentage of the CPDA corners in the 23 base images that are detected by the other corner detectors: both SCCPDA and CCSR detect more than 90% of the corners detected by CPDA.

(a) Scale; (b) Rotation; (c) Gaussian; (d) Non-Uniform; (e) Rotation and Scale; (f) Shear; (g) JPEG; (h) Average

Figure 3.14: Average repeatability of the CCSR detector against different transformations

Figure 3.15: Average repeatability of different corner detectors

Figure 3.16: Localization error of different corner detectors

Corner Detector    # of detected corners    % of CPDA corners
CPDA               922                      100
SCCPDA             1000                     91.398
CCSR               1264                     92.437
DoG Detector       1342                     88.152
CCR                1364                     94.557
ARCSS              1402                     76.431
He & Yung          1617                     88.167
EigenValues        1710                     94.757
MSCP               1747                     92.646
GCM                2714                     94.878

Table 3.3: Total corners detected by the CPDA, SCCPDA, CCSR, DoG Detector, CCR, ARCSS, He & Yung, EigenValues, MSCP and GCM corner detectors

Therefore, we can claim that these single chord detectors not only detect stable corners but also discard noisy or unstable locations.

3.4.4 Complexity Comparison

Corner Detector    Time (sec)
CCSR               0.056090
CCR                0.057176
Zhang              0.083072
MSCP               0.167033
He & Yung          0.310473
SCCPDA             0.310724
CPDA               0.384464
ARCSS              0.384809
DoG Detector       0.405147
EigenValues        1.078661

Table 3.4: Total time to detect corners from 23 images

Table 3.4 tabulates the total time taken by each corner detector to detect all the corners in the 23 base images. We see that CCSR is the most efficient corner detector, followed by CCR. Although the SCCPDA detector is not one of the fastest, it is faster than the original CPDA detector. Note that the time for extracting edges from the images has been excluded from the totals, as this process is the same for all the detectors; the times shown in Table 3.4 are taken only to

detect the corners from the extracted curves.

3.5 Conclusion

In this Chapter, we have proposed two chord-based corner detectors which are not only effective but also efficient. First, we analysed a state-of-the-art detector, the CPDA corner detector, and modified it so that the modified version has not only better average repeatability but also better efficiency. We have also proposed another contour-based corner detector, named CCSR, that outperforms CPDA and seven other existing corner detectors in terms of average repeatability, while having a very competitive average localization error. In addition, CCSR detects more stable corners than the CPDA detector. A further achievement of the CCSR detector is that it is the fastest at detecting corners among all the detectors compared in the experiments.

Chapter 4

Effective Multi-Chord based Corner Detector

Most chord-based corner detectors [17, 154] require at least two parameters to be specified: the chord length and the threshold value. Selecting a single chord length is a difficult task, as different chord lengths can identify different corner locations. On the other hand, all chord-based detectors determine the threshold value manually, which may not be ideal in practice. To solve these problems, in this Chapter we propose a simple corner model using the distance accumulation technique to calculate a suitable threshold value. Based on the corner model, we then propose a single chord-based corner detector, Single Chord Accumulated Distance (SCAD). To overcome the problem of selecting a single chord length, we implement a new method that uses our SCAD detector with multiple chords in combination. The corner detectors proposed in the previous Chapter 3 perform better than other chord-based detectors in terms of average repeatability, but not in terms of localization error; the new corner detector proposed in this chapter performs best among the compared detectors in terms of localization error. We call this detector Multiple Single Chord Accumulated Distance (MSCAD) [6]. This Chapter is organized as follows. In Section 4.1, the distance accumulation technique is discussed in detail. Our proposed multi chord-based corner detector is then discussed in Section 4.2. The experimental results are presented in Section 4.3 and Section 4.4 concludes the Chapter.


4.1 Distance Accumulation Technique

An important criterion of a feature detection process is that it should be invariant to different transformations. Such a process depends upon one or more parameters, for example the σ of a Gaussian filter, which must be chosen by some criterion; the best value for such parameters depends upon the input image, so it can be useful to examine a whole interval of values. The process of computing the accumulated distance from a point on the curve to a chord within an interval is known as the distance accumulation technique [71]. The process begins by extracting the edges from the image; each curve is then processed individually. The basic idea is to place a straight line on the curve and calculate the distance from a point of the curve to the straight line. As the distance depends upon the location of the line segment, and a single calculation cannot give the full information, we move the line across every position for which the point is interior, from the nearest left-most location to the nearest right-most location. While moving the line, we calculate the distance from the interior location to the straight line and sum these distances. This sum is the curvature value of that particular location.

Figure 4.1: Computation of distance between a point and a chord using distance ac- cumulation technique

In Figure 4.1, let P1, P2, P3, ..., PN be the N points on a curve. The value i of chord Li defines a straight line joining points Pj and Pj+i on the curve. To estimate the

curvature value hLi(q) at point Pq using a chord whose endpoints are i points apart, the chord is moved on each side of Pq for at most i points while keeping Pq as an interior point, and the distances dq,j from Pq to the chord are calculated. Finally, all the distances are accumulated to derive the curvature estimate using the equation given below:

\[ h_{L_i}(q) = \sum_{j=q-i+1}^{q-1} d_{q,j} \tag{4.1} \]

In the distance accumulation technique, the curvature value at a point is proportional to the chord length, which means that if the chord length increases, the curvature value also increases. However, choosing a single chord length for a given curve is a very difficult task: curves of the same length may contain different types of corners. If the selected chord length is set too low, the corner detector will be very sensitive to noise, producing weak and false corners. On the other hand, if the selected chord length is set too high, it will smooth out details on the curve, losing some potential corner locations. To overcome this problem, we have proposed a corner model, which is discussed in the next Section.

4.2 Proposed Method

The CPDA, CTAR and CCR detectors have to specify at least two parameters: the chord length and the threshold value used to determine whether the curvature value estimated for a point meets the corner criterion. The threshold values, along with the actual curvature estimation method, can be used to arrive at an approximate angle that each of these detectors implies as the maximum allowed corner angle. However, in all these detectors the threshold values are essentially hand-tuned. We have selected distance accumulation as the measure of curvature for detecting corners. CPDA uses distance accumulation; however, after finding the curvature values on an edge, it performs a normalization operation (Equation 3.2), which then forces CPDA to also use an angle threshold to determine true corners. We now describe a simple corner model which allows us to use a single threshold to detect corners while using distance accumulation [6].

4.2.1 A simple corner model

Given a chord length L and angle θ, we want to derive an approximate curvature value of a corner on the curve, where the corner has an angle of θ. The length of the curve is at least 2L − 1, so that the chord can traverse L − 1 positions keeping the corner location interior. The estimated curvature value can then be used as a threshold (TL,θ) to detect corners in an image. TL,θ could be calculated from a variety of different corner configurations; however, we have chosen to model the corner as a simple meeting of two straight lines at an angle θ, as shown in Figure 4.2.

Figure 4.2: Derivation of curvature value from an angle

In Figure 4.2, PQ is a curve and B is the corner location whose angle is θ. The curvature value of location B needs to be calculated for a given chord length L using the distance accumulation technique. The chord of length L is placed on the curve keeping location B interior; note that the length L is the number of locations apart between the endpoints of the straight line (chord). Let the location immediately to the right of B be X, so that one end of the chord is on X and the other on A, which lies on the left side of B. We can derive the length of AB as,

\[ AB = L - BX \tag{4.2} \]

\[ AX = \sqrt{AB^2 + BX^2 - 2 \times AB \times BX \times \cos\theta} \tag{4.3} \]

\[ d_{BX} = \frac{AB \times BX \times \sin\theta}{AX} \tag{4.4} \]

Since the chord L needs to traverse until the nearest left location of A, the value of BX increases accordingly, taking values from 1 to L − 1. Therefore the accumulated distance D is,

\[ D = \sum_{BX=1}^{L-1} d_{BX} \tag{4.5} \]

Note that the estimated curvature value is only an approximation, as the distance between two locations is not always 1.
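Since Equations 4.2 – 4.5 are fully constructive, the threshold TL,θ can be computed directly; a small illustrative Python transcription follows (the experiments themselves were run in Matlab).

```python
import math

def scad_threshold(L, theta_deg):
    """Accumulated distance of the model corner (Eqs. 4.2-4.5)."""
    theta = math.radians(theta_deg)
    D = 0.0
    for BX in range(1, L):        # slide the chord past the corner location
        AB = L - BX                                                    # Eq. 4.2
        AX = math.sqrt(AB**2 + BX**2 - 2 * AB * BX * math.cos(theta))  # Eq. 4.3
        D += AB * BX * math.sin(theta) / AX                            # Eq. 4.4
    return D                                                           # Eq. 4.5

# e.g. scad_threshold(20, 157) gives the curvature threshold for SCAD_{20,157}
```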

4.2.2 Single Chord Accumulated Distance Detector (SCAD)

Given that we have a method to find a threshold TL,θ for a chord of length L and corner angle θ, we can now describe the single chord-based corner detection process used in this stage of the experiments. The steps of extracting the image edges and smoothing are similar to those of other chord-based corner detectors. The curvature value of the points on the edges of the image is calculated simply as the accumulated distance for a chord of length L. Instead of using a threshold of 0.2 like CPDA, we compare the local maxima (of curvature values) with TL,θ to filter out the weak and false corners; no further steps are required for normalising, combining or refining. We call this method Single Chord Accumulated Distance (SCAD), where SCADL,θ indicates that the chord length was L and the threshold was TL,θ (found using the method described in Section 4.2.1). Figure 4.3 shows the corners identified by SCAD20,175, SCAD20,170, SCAD20,160 and SCAD20,150. Quite clearly, the angle has a large bearing on the number and types of corners found. The question is: given a chord length L, what angle θ will give us the "best", or at least a "good", corner detection? To answer this question, we performed a comprehensive set of experiments. We used the dataset and the automatic evaluation process described in [17, 24] and settled on average repeatability, as defined in Equation 2.12, as the measure of "goodness": it is not enough to just find corners; we need to find corners/interest points that can be identified even if an image undergoes some unknown transformation. We calculated the average repeatability of the SCAD approach for the seven transformations on the dataset. We performed experiments for chord lengths from 7

Figure 4.3: Corners identified by SCAD for different angles when using a chord of length 20

to 30, and for each chord length we used angles from 145° to 165°. From the results of these experiments, we extracted two important pieces of information: the best average repeatability for each chord length, and the angle at which the highest repeatability was achieved for each chord length. Figure 4.5 shows a bar chart of the best average repeatability for each chord length. Chord length 15 and its close neighbours have the best overall performance; in general, the repeatability tapers off as the chord length increases. In Figure 4.6, the blue diamonds represent the angles at which the best repeatability was achieved for each chord length. For lengths 7 – 15, the trend is quite clearly linear, i.e. smaller angles are required for larger chord lengths. For chord lengths 16 – 23 the trend is similar, although not quite as uniform. For the higher chord lengths, the angles taper off and then start increasing again.

Figure 4.4: Suitable angles for different chord lengths

An explanation of this behaviour can be found in the scenario depicted in Figure 4.4: initially, as chord lengths increase, the angles formed at the corner decrease, i.e. ∠ACB > ∠A′CB′. In fact, as long as the length of the chord does not exceed the corner boundaries, this trend continues. For the higher chord lengths, however, the corner boundaries will understandably be crossed; in Figure 4.4, this situation is shown by the chord formed by the points A″CB″, where the resulting angle satisfies ∠A″CB″ > ∠A′CB′ but ∠A″CB″ < ∠ACB. It should be clarified that this behaviour emerges because, when calculating TL,θ, we form the model angle as a simple triangle, so the θ that gives the best performing TL,θ is a measurement of the angle of the model corner for a particular L, not necessarily of the angles present in an individual image. We derived four different equations from this experimental data, which help us determine a suitable angle given a chord length, so that we can calculate TL,θ for SCAD just by selecting the chord length. The red line with triangular markers shows the curve found by fitting a quadratic polynomial to the data points (blue diamonds). The equation of the quadratic curve is

\[ \theta = 0.248L^2 - 1.509L + 174.4892 \tag{4.6} \]

The black line with square markers shows the straight line that was fitted through

Figure 4.5: Highest average repeatability of different chords

Figure 4.6: Angle vs chord for the highest repeatability

the data points. The equation of this straight line is

\[ \theta = -0.5848L + 167.1935 \tag{4.7} \]

The average repeatability shown in Figures 4.5 and 4.6 is the average over all the

transformations, as explained in Section 3.4. However, we identified the non-uniform scaling as the transformation with the most difficult geometric changes. Fitting a straight line through the data points of the best average repeatability for the non-uniform transformation gives the green straight line with circular markers. The equation of the line is

\[ \theta = -0.6035L + 168.9977 \tag{4.8} \]

Finally, using the uniform changes in the best angle between chord lengths 7 – 15, we show the gray line with square markers. This line has the following rather elegant equation:

\[ \theta = 172 - L \tag{4.9} \]

Now, by using any one of Equations 4.6 – 4.9, we can find an angle suitable for a chord of a certain length. As a result, we only have to specify L, from which θ can be found; we have thus reduced the number of parameters required for SCAD from two to one. The average repeatability obtained using these equations for different transformations is shown in Figure 4.7; "Best Repeatability" in Figure 4.7 is the performance of the individual chords shown in Figure 4.5. Equations 4.6 – 4.9 were derived to provide a generalized way to find a suitable angle given a chord length for our proposed SCAD method. Not surprisingly, Equation 4.9 performs well for chord lengths of 15 and below, but is not as effective for the larger chord lengths. Equations 4.6, 4.7 and 4.8 can be considered quite good models: the average differences in repeatability between the best results found through the comprehensive tests and the results obtained using Equations 4.6, 4.7, 4.8 and 4.9 are 0.06%, 0.11%, 0.21% and 0.3% respectively, when averaged over all the chord lengths. These are quite small differences, and in later Sections we use these four equations to develop a scheme that uses multiple chords of different lengths.
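The four fitted models translate directly into a one-parameter lookup; a small illustrative transcription of Equations 4.6 – 4.9 follows (the model names are ours, chosen for illustration).

```python
def angle_for_chord(L, model='quadratic'):
    """Suitable model-corner angle (degrees) for a chord length (Eqs. 4.6-4.9)."""
    if model == 'quadratic':
        return 0.248 * L**2 - 1.509 * L + 174.4892   # Eq. 4.6
    if model == 'linear':
        return -0.5848 * L + 167.1935                # Eq. 4.7
    if model == 'nonuniform':
        return -0.6035 * L + 168.9977                # Eq. 4.8
    return 172 - L                                   # Eq. 4.9 (chords 7-15)
```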

4.2.3 Why use multiple chord lengths?

Figure 4.8 shows the corners detected by our proposed single chord-based method when using chord lengths 15 and 30, one at a time. Corners marked with a blue '*'

Figure 4.7: Average repeatability achieved when using Equations 4.6 – 4.9 to calculate the angle

are detected when using chord length 15 with angle 157°, and corners marked with a black 'O' are detected when using a chord length of 30 with angle 152°. The angles are the best performing angles from our exhaustive experiments shown in Figure 4.5. Clearly there is a large amount of overlap, but the figure also demonstrates that using different chord lengths can result in finding different corners: some corners were identified only when using chord length 15, and some only when using chord length 30. This scenario is our primary motivation for using multiple chords of different lengths. CPDA's use of multiple chords can be attributed to a similar motivation; however, CPDA combines its curvature values, calculated using different chord lengths, at an early stage of its corner determination. Our proposed method is quite different and lets each chord length operate independently. We describe our method of using multiple chord lengths below.

4.2.4 Combining multiple chords of different lengths

The basic idea of combining multiple chords is to keep all the corner locations detected by the individual chords. Suppose we have a set of n chords of lengths l1, l2, l3, ..., ln. For each of these chord lengths, we calculate the suitable angle using one of Equations

Figure 4.8: Corners detected using our single chord approach when using chord lengths 15 and 30 with angles 157◦ and 152◦ respectively

4.6 – 4.9 to find θ1, θ2, θ3, ..., θn. For each pair {li, θi}, we use our proposed SCAD to find corners in the grey-scale image I. Let Ci be the corner set returned by SCAD when using chord length li and angle θi. We then determine our final set of corners Cfinal as

Cfinal = C1 ∪ C2 ∪ ... ∪ Cn (4.10)

This process identifies all the locations where a corner was detected using at least one of the chord lengths independently. It differs from the CPDA approach because we estimate the curvature value independently for each chord length and we accept the decision (about whether a point is a corner or not) for each chord length. Our previously described threshold determination process gives us confidence that each corner detected by SCAD for a particular chord length is indeed a suitable corner. Figure 4.9 shows the corners identified by CPDA, CCR and our proposed multiple chord approach for an example image from the data set mentioned in Section 4.3. CPDA uses chords of lengths 10, 20 and 30; for a fair comparison we use the same chord lengths with our proposed method. For this image CCR identifies

more corners than CPDA, but our proposed multiple chord-based approach identifies the highest number of corners in this image, including some corner locations which both CCR and CPDA fail to identify. Some of these locations are shown by a green bounding box in Figure 4.9c. However, this simple method of using multiple chord lengths also has the problem that it identifies corners which are just a few pixels apart. These cases are marked with a red box in Figure 4.9c. These slight variations in corner location can be attributed to the use of different chord lengths. We consider these "very close" corners to really be duplicates of the same corner. To justify this, we examined the original 'Lena' image and extracted a total of 14 curves from it. We then placed each chord of length 7 to 30 on each of the extracted curves and studied the corner locations found. We noticed that some corner locations found with different chord lengths are only 1 or 2 pixels away from each other. For example, when we examine curve 6 in Figure 4.10, our experimental result shows two different corner locations, at points 16 and 17, for chord lengths 7 to 11 and 12 to 24 respectively. These locations are so close that we can consider them duplicates of the same corner. Similar scenarios were found in a few other curves. Therefore, we add a simple duplicate removal process to our proposed method of using multiple chords: we identify pairs of corners which are at a Euclidean distance of less than or equal to 2, and replace both corners by their middle point. Figure 4.11 shows the final set of corners produced by our proposed method after removing 19 duplicate corners. We call this method Multiple Single Chord Accumulated Distance (MSCAD).
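A minimal sketch of this combination step is given below (Python with NumPy; the function name and the representation of corners as N×2 arrays of (row, column) coordinates are our own assumptions, and the per-chord SCAD runs are assumed to have already produced the input sets).

    import numpy as np

    def mscad_combine(corner_sets, min_dist=2.0):
        """Union the per-chord corner sets (Equation 4.10), then repeatedly
        merge any pair of corners within min_dist pixels into its midpoint,
        implementing the duplicate removal step described above."""
        corners = np.unique(np.vstack(corner_sets), axis=0).astype(float)
        merged = True
        while merged:
            merged = False
            for i in range(len(corners)):
                d = np.linalg.norm(corners - corners[i], axis=1)
                d[i] = np.inf                  # ignore self-distance
                j = int(np.argmin(d))
                if d[j] <= min_dist:
                    mid = (corners[i] + corners[j]) / 2.0
                    corners = np.delete(corners, [i, j], axis=0)
                    corners = np.vstack([corners, mid])
                    merged = True
                    break
        return corners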

4.3 Experimental Results

In this Section, we present the average repeatability and localization error (Section 3.4.1.1) of the proposed detectors compared to other existing corner detectors. All the experiments were run on Matlab 2012b on a Windows 7 (64-bit) machine with an Intel Core i7 4770 processor and 16GB of RAM. Since it has already been shown in [185] that CPDA and CCR are two of the best contour-based corner detectors in the literature, outperforming [132], [24], [38], [212], we compare our proposed MSCAD detectors with these two detectors along with our

(a) CPDA (b) CCR

(c) Proposed (without duplicate removal)

Figure 4.9: Corners identified by the different detectors

Figure 4.10: Extracted edges found in ’Lena’ image

Figure 4.11: Corners identified by using Multiple SCAD (MSCAD)

proposed detectors described in Chapter 3. The previous Section showed the corners detected on a leaf image by the MSCAD detector with chord lengths 10, 20 and 30. However, we chose to combine chord lengths 7, 15 and 30 with Equations 4.6 – 4.9. The parameters of the MSCAD detectors, along with their equations, are tabulated in Table 4.1. Our motivation for this choice comes from Figure 4.5, which shows that chord length 15 has the best average repeatability; to get better coverage we also took chord lengths of double (30) and half (7) that size.

    Method              Equation       Chord Lengths   Description
    MSCAD BFSL          Equation 4.7   7, 15, 30       Best Fitted Straight Line
    MSCAD BFQL          Equation 4.6   7, 15, 30       Best Fitted Quadratic Line
    MSCAD BFSL NU       Equation 4.8   7, 15, 30       Best Fitted Straight Line with
                                                       Non-Uniform Scale Transformation
    MSCAD BFSL (7-15)   Equation 4.9   7, 15, 30       Best Fitted Straight Line from
                                                       chord lengths 7 to 15

Table 4.1: Details of MSCAD detectors

Figure 4.12 shows the average repeatability under seven different transformations for the MSCAD detectors compared with the SCCPDA, CCSR, CCR and CPDA detectors. BFSL and BFQL stand for Best Fitted Straight Line and Best Fitted Quadratic Line respectively, and NU stands for Non-Uniform Transformation. Finally, (7-15) indicates Equation 4.9, which was fitted from chord lengths 7 to 15. The performance of all the detectors is consistent across the transformations. Under all transformations, the MSCAD detectors perform better than the CPDA and CCR corner detectors. The average repeatability in the rightmost bars shows the same trend, with MSCAD BFSL having the best average repeatability; the repeatability of the CPDA detector is the worst among the detectors. Figure 4.13 shows the localization error of the detectors for each individual transformation as well as the overall average localization error. On this metric CCR has the worst performance. This is not surprising, as CCR takes the middle point of the curve segment as the corner location without any further refinement. CPDA performs better than CCR, but all the MSCAD detectors outperform both CPDA and CCR. In particular, MSCAD BFSL (7-15) performs best at localizing

the corner locations. Thus, although CCR has better repeatability than the CPDA detector, CCR has the highest localization error among all the detectors, while the MSCAD detectors have the lowest localization error, followed by CPDA.

Figure 4.12: Comparison of Average Repeatability of MSCAD detectors with other corner detectors

Figures 4.12 and 4.13 show the performance of the different corner detectors on the basis of average repeatability and localization error. As described previously, average repeatability considers the number of corners detected in the original image and in the transformed image, and the number of matched corners among them. Therefore this measure only counts the percentage of common corners in the original and transformed images instead of counting the actual number of repeated corners. For example, a detector that detects only one corner in the original image and the same corner in the transformed image will have 100% repeatability, yet such a detector may not be helpful for any application of corner detectors. Following from this, consider the case of two detectors with similar average repeatability. To choose between them, we should consider the actual number of corners detected in the original images. The detector which detects the higher number actually finds

Figure 4.13: Comparison of Localization Error of MSCAD detectors with other corner detectors

more points of interest, making it a more useful detector overall. We have already shown in Section 4.2.4 that MSCAD finds more corners than the CPDA and CCR detectors for that particular image. Figure 4.15 shows the numbers of repeated and non-repeated corners for the detectors. The number of corners shown in Figure 4.15 is the total number detected in the 23 primary images; the repeated and non-repeated counts were calculated using the average repeatability shown in Figure 4.12. CPDA has the lowest number of repeatedly found corners, followed by CCR. All MSCAD detectors identify more corners repeatedly compared to both CPDA and CCR. Therefore, MSCAD outperforms the current state-of-the-art chord-based corner detectors. As we had considered only one threshold for CPDA, we also examined its performance using different thresholds. We considered thresholds 0.24 and 0.989, which are used in other contour-based corner detectors. The performance of the CPDA method using different thresholds, compared to the CCR and MSCAD corner detectors, is presented in Figure 4.14. The experimental results clearly indicate that, despite the different thresholds, our proposed MSCAD detector still performs better.

Figure 4.14: Number of missed and repeated corners by the CPDA detector using different thresholds

Figure 4.15: Number of missed and repeated corners by the detectors

    Corner Detector      Time (sec)
    CCSR                 0.056090
    CCR                  0.057176
    SCCPDA               0.310724
    CPDA                 0.384464
    MSCAD BFSL           1.04051
    MSCAD BFQL           1.078661
    MSCAD BFSL NU        1.088566
    MSCAD BFSL (7-15)    1.077952

Table 4.2: Total time to detect corners from 23 images

Table 4.2 tabulates the total time taken by each corner detector to detect all the corners in the 23 base images. We see that CCSR is the most efficient corner detector, followed by CCR. Our proposed MSCAD detector takes some extra time because it runs the corner detection once per chord length to get better results. The time shown in Table 4.2 covers only the detection of corners from the extracted curves. MSCAD performs best in terms of both average repeatability and localization error, but the other two single chord-based detectors are faster in computation; so the proper corner detector should be chosen based on the application at hand. A question may arise about the results shown so far, as Equations 4.6 – 4.9 were derived using data obtained from this image set. To test whether these equations are still applicable for MSCAD on other images, we now present results for a second image set. This second set contains 13 primary images, including a few painted shapes along with some real-life images frequently used in the literature on contour-based corner detection. It is worth mentioning that the same dataset was also used to present the results of the CCR corner detector in [185]. Using this dataset, we compare our method only with the CPDA and CCR corner detectors. Figure 4.16 shows the corners detected by CPDA, CCR and our proposed MSCAD for an image from this second data set. Quite clearly, CCR identifies more corners than CPDA, but both CPDA and CCR miss some locations which are quite obviously visual corners. MSCAD BFSL identifies at least 7 corner locations more than CCR, and 13 more than CPDA. This visual inspection demonstrates that although Equation 4.6 was derived using the first image set, it worked effectively for this image from the second set as well.

(a) CPDA (b) CCR (c) MSCAD BFSL

Figure 4.16: Corners detected on the Box image from the second data set

Figures 4.17 and 4.18 show the average repeatability and localization error obtained by the different corner detectors on the second set of images. The behavior of the detectors is very similar to that on the first set. Although CCR shows the best result in terms of average repeatability, MSCAD BFSL NU is only 0.07% lower than the CCR detector, and the performance of the remaining MSCAD detectors is also very comparable. All the MSCAD detectors have the lowest (best) localization error, as shown in Figure 4.18. Figure 4.19 shows the numbers of repeated and non-repeated corners. Similar to the first dataset, the MSCAD detectors detect more repeated corners than the other two detectors; in fact, MSCAD BFSL NU has the highest number of repeated corners in both image sets. The results for this second set indicate that although Equations 4.6 – 4.9 were derived from the first image set, they are just as effective for other images. The results show that the MSCAD detectors consistently identify more corner locations, more accurately, than CCR. MSCAD can also identify these locations under different transformations at least as reliably as CCR. These properties make it a more effective chord-based corner detector than the current state of the art.

Figure 4.17: Comparison of Average Repeatability of MSCAD detectors with CPDA and CCR corner detectors on Second Image Dataset

Figure 4.18: Comparison of Localization Error of MSCAD detectors with CPDA and CCR corner detectors on Second Image Dataset

Figure 4.19: Number of missed and repeated corners by the detectors on second Image dataset

4.4 Conclusion

In this Chapter, we proposed a new method to detect corners from image contours. We approached the problem by first modelling a corner by two parameters, a chord length and an angle. We used this corner model to determine a suitable threshold value; our proposed SCAD detector uses this threshold to identify suitable corners while eliminating false and weak ones. Next, we performed exhaustive experiments which resulted in four equations, each of which can be used to determine a suitable angle for a given chord length to be used with SCAD, thus altogether eliminating the need to manually specify the angle. We demonstrated that using different chord lengths can result in better corner detection, so we proposed a simple approach using multiple chords. MSCAD, our proposed multi-chord detector, performs multiple SCAD detections followed by a simple duplicate removal process. We compared the performance of MSCAD against the popular CPDA detector and the very recent CCR detector. The results show that MSCAD identifies more visually obvious corner locations than both CPDA and CCR. Quantitatively, we compared the performance of MSCAD with CPDA and CCR on two image sets, both of which have been used in previous studies. MSCAD consistently outperforms CPDA in terms of average repeatability and localization error. The MSCAD detectors match CCR's average repeatability while achieving better localization error and consistently identifying more corner regions, thus outperforming the current state-of-the-art chord-based corner detector. Note that our proposed SCCPDA and CCSR corner detectors described in Chapter 3 also perform better than other detectors. MSCAD performs best in terms of both average repeatability and localization error, but the two single chord-based detectors are faster in computation; so the proper corner detector should be chosen based on the application.

Chapter 5

Performance Analysis of Corner Detection Algorithms Based on Edge Detectors

As discussed in previous Chapters, contour-based corner detectors are among the most stable and least noise-sensitive types of corner detectors. The primary step of these detectors [17, 23, 24, 132, 154, 217] is to extract the edges that are relevant for corner detection. The role of edge detection in the image matching process is shown in Figure 1.5. The accuracy of the corner detection process relies on the accuracy of the edge detection step. In this Chapter, we analyze the role of different edge detection methods in the current state-of-the-art contour-based corner detectors [5]. For contour-based corner detection, the CPDA method [17] first used the Canny edge detector with predefined thresholds, and since its popularization researchers have continued this trend without question in [23, 132, 154] and elsewhere. We observe that these contour-based corner detectors use the Canny edge detector with low and high thresholds of 0.2 and 0.7 respectively, which is not always suitable for finding corners in natural images. Thus, we examined the performance of the adaptive Canny edge detector and found that it gives excellent results for extracting edges, which in turn leads to detecting more reliable corners. A number of comparative studies [114, 144, 185] have been performed on edge detection techniques from the edge-quality perspective. In this Chapter, we compare these techniques in the context of corner detection. We analyze the performance of the chord-to-point distance accumulation (CPDA) [17] detector and the recently proposed Curve to Chord Ratio (CCR) [185] detector along with our proposed Single Chord


CPDA (SCCPDA), Chord to Cumulative Sum Ratio (CCSR) and Multiple Single Chord Accumulated Distance (MSCAD) corner detectors. More specifically, given the diverse nature of the different techniques, we investigate the role of edge detectors in corner detection methods based on the following questions:

1. Does the Canny edge detector give the best result in all conditions? If not, which edge detector performs better for detecting corners?

2. Which edge detector is best at finding the maximum number of repeatable corners?

3. Which edge detector works best under different transformations?

4. Which edge detector is fastest for each corner detector?

5. Which edge detector finds and extracts the edges most quickly?

This Chapter is organized as follows. Section 5.1 explains the importance of edge detection methods for detecting corners. Section 5.2 discusses some classic edge detection techniques in detail. Section 5.3 explains the use of the adaptive Canny edge detection method. The performance analysis is presented in Section 5.4. Finally, Section 5.5 concludes the Chapter.

5.1 Importance of Edge Detection for Detecting Corners

Edge detection is a preprocessing step for many image processing algorithms. Edges are those image points where the intensity changes sharply. The principal objective of edge detection is to identify the discontinuities in an image. These discontinuities may occur due to surface orientation, object occlusion, shadow casting and surface reflectance. Edge detection remarkably reduces the quantity of data required to represent an image. It also filters out irrelevant information while keeping the important structural features of an image, such as an object's size, shape, color and orientation. The importance of the edge detection process depends on the application. A few applications, like medical imaging, require perfect edge identification, which is time-consuming, while others, like mobile robot vision, require real-time computation and do not rely on impeccable edge recognition.

The corner detection process is closely related to edge detection, as a corner is basically the intersection of two edges. Thus, to detect corners in real-life applications, edge detection is necessary. Corner detection relies on measuring the curvature of an edge that passes through a neighborhood. The strength of the corner response depends on both the edge strength and the rate of change of edge direction. Thus it is necessary to identify the true edges to get the best results from a detection process that fits the application; an example is shown in Figure 5.1. But choosing an appropriate edge detector is not an easy task. Different effects, such as a change of direction or poor focus, can alter the intensity values, resulting in errors such as false edge detection, loss of true edges, poor edge localization, high computational time and problems due to noise. Edge detectors that depend on Gaussian smoothing lead to poorer localization of corner positions because of the rounding effect in the corner neighborhood. Moreover, the non-maximum suppression used in common edge detectors can make straight lines curved. In the past few decades, many methods have been proposed for edge detection. However, edge detection is application-oriented, i.e., the same algorithm cannot be applied to all types of images (applications). Therefore, the choice of an edge detection process is significant for chord-based corner detectors. It may be obvious that the number of corners depends on the number of edges extracted. However, it is not just how many edges are detected, but which edges are detected, that may be more important in the actual application of corner detectors.

5.2 Edge Detection Methods

An important property of an edge detection method is its ability to extract accurate edge lines with good orientation. Edge detection is usually applied to the gray-scale image. Edge detectors can be classified into two classes: classical operators and Gaussian operators. For the classical operators, edge detection is simple and very fast, and the edges are detected with their orientations; however, these operators are very sensitive to noise. Gaussian operators are more complex and time-consuming, but they are more robust to noise and provide better, more accurate and well-localized edges.

(a) An input image (b) Gray scale image

(c) Edge map (d) Detected corners

Figure 5.1: Example of detected corner from edge map of an input image

Classical operators, such as the Roberts, Prewitt and Sobel operators, detect the presence of edges by looking for maxima and minima in the first derivatives of the image. The gradient is a vector with magnitude and direction. For each pixel, the gradient magnitude gives the rate of the largest change in intensity, in the direction where intensity changes fastest, which is always perpendicular to the edge direction, as shown in Figure 5.2. In order to produce separate measurements of the gradient component in each orientation (known as Gx and Gy), masks are applied individually to the input image [114]. These can then be combined to obtain the absolute magnitude of the gradient at each point and the orientation of that gradient. The gradient magnitude is given by:

G = √(Gx² + Gy²) (5.1)

Typically, an approximate magnitude is computed using |G| = |Gx| + |Gy|, which is much faster to compute. The angle of orientation of the edge (relative to the pixel grid) is given by:

θ = tan⁻¹(Gy / Gx) (5.2)
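As an illustration (a sketch only, not the thesis implementation; it uses Sobel masks as the derivative operators and SciPy's ndimage module), the per-pixel gradient magnitude and orientation of Equations 5.1 and 5.2 can be computed as follows.

    import numpy as np
    from scipy import ndimage

    def gradient_magnitude_orientation(image):
        """Gradient magnitude (Equation 5.1) and orientation (Equation 5.2),
        with Sobel masks supplying the directional derivatives Gx and Gy."""
        img = image.astype(float)
        gx = ndimage.sobel(img, axis=1)   # horizontal derivative Gx
        gy = ndimage.sobel(img, axis=0)   # vertical derivative Gy
        magnitude = np.hypot(gx, gy)      # exact: sqrt(Gx^2 + Gy^2)
        approx = np.abs(gx) + np.abs(gy)  # faster approximation |Gx| + |Gy|
        orientation = np.arctan2(gy, gx)  # angle relative to the pixel grid
        return magnitude, approx, orientation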

Figure 5.2: Gradient direction

On the other hand, Gaussian operators search for the 'zero crossings' in the second derivative of the image to find edges. The Gaussian operator is used to blur images and remove noise. Both classes of edge detectors apply simple convolution masks to the entire image in order to compute the first-order (gradient) and/or second-order (Laplacian) derivatives. The convolution kernel is a small 2-D matrix structure, and kernels of different sizes generate different results after convolution. An n×n convolution kernel is applied at each pixel location (i, j) of image I by the following formula:

f(n,n) ∗ I(i,j) = ∑_{k=−n/2}^{+n/2} ∑_{l=−n/2}^{+n/2} f(k,l) × I(i−k, j−l) (5.3)
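For clarity, a direct (unoptimized) rendering of Equation 5.3 might look as follows; border pixels are skipped for brevity, and the function name is our own.

    import numpy as np

    def convolve2d(image, kernel):
        """Direct implementation of Equation 5.3 for an odd n x n kernel;
        the kernel is indexed from -n/2 to +n/2 around each pixel (i, j)."""
        n = kernel.shape[0]
        r = n // 2
        out = np.zeros(image.shape, dtype=float)
        for i in range(r, image.shape[0] - r):
            for j in range(r, image.shape[1] - r):
                acc = 0.0
                for k in range(-r, r + 1):
                    for l in range(-r, r + 1):
                        acc += kernel[k + r, l + r] * image[i - k, j - l]
                out[i, j] = acc
        return out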

Figure 5.3 shows an example of a 3 × 3 kernel, where the value of the central pixel (shown in black) is derived from the values of its eight surrounding neighbors (shown in dark blue).

Figure 5.3: Applying convolution kernel

5.2.1 Sobel Operator

The Sobel operator [210] is a pair of 3 × 3 convolution kernels, as shown in Figure 5.4. These kernels are orthogonal to each other and are well suited to edges that run vertically and horizontally. They can be applied separately to the input image to calculate the gradient in each orientation (Gx and Gy), and the two components at each point can then be combined to find the absolute magnitude of the gradient. The gradient magnitude and orientation are given by Equation 5.1 and Equation 5.2 respectively.

Figure 5.4: Masks used by Sobel Operator.

5.2.2 Robert Cross operator

The Roberts cross operator consists of a pair of 2 × 2 convolution kernels, as shown in Figure 5.5. These kernels respond to edges oriented at 45° to the pixel grid, with one kernel for each of the two perpendicular orientations. Though the Roberts cross operator is very simple and fast to compute, the kernels are too small to find true edges reliably in the presence of noise. Like the Sobel kernels, the Roberts cross kernels can be applied separately to the input image to obtain the gradient in each orientation (Gx and Gy). The gradient magnitude is given by Equation 5.1, and the angle of orientation of the edge can be calculated using Equation 5.4:

θ = tan⁻¹(Gy / Gx) − 3π/4 (5.4)

Figure 5.5: Masks used for Roberts Operator.

5.2.3 Prewitt operator

Similar to the Sobel operator, the Prewitt operator also uses two 3 × 3 matrices which are convolved with the original image to find vertical and horizontal edges [151]. Unlike Sobel, this operator does not place extra emphasis on pixels closer to the center of the mask. The gradient direction is given by the mask producing the maximal response. The masks used by the Prewitt operator are shown in Figure 5.6.

5.2.4 Laplacian of Gaussian (LoG) operator

The Laplacian of Gaussian operator calculates the second derivative of an image and does not require the edge direction [88]. It subtracts the brightness values of each of

Figure 5.6: Masks used for Prewitt Operator

the neighboring pixels from the central pixel. When a discontinuity is present within the neighborhood in the form of a point, line or edge, the result of the Laplacian is a non-zero value, which may be positive or negative depending on where the central point lies with respect to the edge. Commonly used kernels for the LoG operator are shown in Figure 5.7.

Figure 5.7: Masks used for LoG

The Laplacian L(x,y) of an image I with pixel intensity values I(x,y) is given by:

∇² = ∂²/∂x² + ∂²/∂y² (5.5)

As the Laplacian is very sensitive to noise, Gaussian smoothing is often applied before the Laplacian filter. This preprocessing step reduces the high-frequency noise components prior to the differentiation step, and it is done very efficiently by first convolving the Gaussian and Laplacian filters together before applying them to the images.

5.2.5 Zero cross operator

The Zerocross operator finds the locations where the Laplacian value goes through zero, i.e., points where the Laplacian changes sign. Zero crossings always lie on closed contours, so the output of the zero-crossing detector is usually a binary image. The main disadvantage of this operator is its susceptibility to noise [16].
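The following sketch (our own illustration; SciPy's gaussian_laplace performs the combined Gaussian-plus-Laplacian convolution described in the previous subsection) computes a LoG response and marks its zero crossings:

    import numpy as np
    from scipy import ndimage

    def log_zero_crossings(image, sigma=2.0):
        """LoG response via one combined Gaussian + Laplacian convolution,
        followed by marking pixels where the response changes sign."""
        lap = ndimage.gaussian_laplace(image.astype(float), sigma=sigma)
        s = np.sign(lap)
        edges = np.zeros(s.shape, dtype=bool)
        # a pixel is an edge if its sign differs from its right or lower neighbor
        edges[:, :-1] |= s[:, :-1] != s[:, 1:]
        edges[:-1, :] |= s[:-1, :] != s[1:, :]
        return edges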

5.2.6 Canny Edge Detector

The Canny edge detector, proposed by John Canny, is one of the most popular methods for finding edges [33]. Canny proposed three criteria for image edge detection: (i) high signal-to-noise ratio; (ii) good localization; and (iii) uniqueness of response, i.e., a single response to a single edge. Based on these criteria, the Canny edge detection algorithm consists of the following steps.

Image smoothing
The first step is to remove any noise present in the original image. To do this, a

Gaussian filter Gσ is generally used to smooth the image in order to remove the noise (Equation 5.6). The performance of the smoothing depends on the standard deviation and the filter size; the filter size should be smaller than the size of the original image.

g(x,y) = Gσ(x,y) ∗ f (x,y) (5.6)

where Gσ(x,y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))

The smoothed output image is obtained by convolving the original image with the Gaussian function: H(x,y) = g(x,y) ∗ I(x,y).

Gradient calculation

The horizontal gradient Gx(x,y) and vertical gradient Gy(x,y) at each pixel location are calculated by convolving the image I(x, y) with partial derivatives of a Gaussian function, G(x,y):

G(x,y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))

Gradient magnitude and direction

Next, the gradient magnitude M(x,y) and direction θG(x,y) at every pixel location are calculated using the following equations:

M(x,y) = √(Gx²(x,y) + Gy²(x,y))

θG(x,y) = arctan(Gy(x,y) / Gx(x,y))

Non-maximum suppression (NMS)
To find the true edges, non-maximum suppression is applied. It is done by tracing along the edge direction and suppressing any pixel value that is not considered to be an edge; to be exact, all points where the gradient is not a maximum are suppressed. This results in a thin edge line.

Hysteresis thresholding

This step computes high and low thresholds Thigh and Tlow respectively, depending on the histogram of gradient magnitudes of the whole image. A pixel P(i,j) with gradient magnitude G is considered an edge pixel if:

1. G > Thigh; or

2. Tlow < G < Thigh and any of its neighbors in a 3×3 region around it has a gradient magnitude greater than Thigh; or

3. none of the neighbors of pixel P(i,j) has a high gradient magnitude but at least one falls between Tlow and Thigh, and a search of the 5×5 region around it finds a pixel with a magnitude greater than Thigh.

The process of the Canny edge detection algorithm is shown in Figure 5.8.
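In practice, all of these steps are usually wrapped in a single library call. A usage sketch with OpenCV is shown below; the 8-bit threshold scaling is our own rough analogue of the 0.2/0.7 normalized thresholds discussed in the next Section, not an exact equivalent, and "input.png" is a placeholder filename.

    import cv2

    # cv2.Canny performs the smoothing, gradient, NMS and hysteresis steps.
    image = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(image, int(0.2 * 255), int(0.7 * 255))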

5.3 Using Adaptive Canny Edge Detector

The choice of an edge detection process has great significance for chord-based corner detectors. It may be obvious that the number of corners depends on the number of edges extracted. Most contour-based corner detection processes use the Canny

Figure 5.8: Canny edge detection process

edge detector [17], [154], [24] for the initial edge extraction step. However, the Canny edge detector requires two user-provided thresholds in its final edge-tracking stage, a high threshold Thigh and a low threshold Tlow. Selecting these two thresholds manually is challenging, as a fixed pair might not suit natural images. If Thigh is set too high, many edges might be missed; if it is too low, much noise can be detected as edges. Besides this, the two thresholds may need to change under different conditions such as saturation, exposure time and other factors. Thus it is important to find a way to calculate them adaptively. The CPDA corner detector [17] first used the Canny edge detector with thresholds low = 0.2 and high = 0.7, and this trend continues in [154], [24] and other recent chord-based corner detectors. Instead of following the trend, we analyze the role of the Canny edge detection method, with both adaptive and predefined thresholds, in the current state-of-the-art chord-based corner

detectors. We use an adaptive Canny edge detection method that follows the popular Otsu method to calculate the thresholds Thigh and Tlow, which are deduced by a least squares (LS) method based on the gray histogram. We use the adaptive Canny edge detector from the implementation in MATLAB 2012b. We used the adaptive Canny operator alongside Canny with the predefined high and low thresholds of 0.7 and 0.2 respectively; we refer to Canny with these predefined thresholds as "canny threshold" in our experiments. Most of the corner detectors in the literature use canny threshold edge detection [17, 23, 24, 132, 154, 217]. The experimental results are discussed in Section 5.4.
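One common recipe for such adaptive thresholds, sketched below with OpenCV, takes Otsu's histogram-based threshold as Thigh and a fixed fraction of it as Tlow; this is an illustrative assumption, not the exact MATLAB 2012b implementation used in our experiments.

    import cv2

    def adaptive_canny(gray):
        """Adaptive Canny: Otsu's histogram-based threshold supplies T_high,
        and T_low is set to half of it before running the Canny detector."""
        t_high, _ = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        t_low = 0.5 * t_high
        return cv2.Canny(gray, t_low, t_high)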

5.4 Performance Study

In this Section, we discuss the performance of the edge detectors when they are used within corner detectors to detect corners. First, the dataset is described; next, the evaluation method is explained; and finally the results are presented.

5.4.1 Dataset

We used an image dataset of 23 different types of gray-scale images to evaluate the performance of the corner detectors with different edge detectors. Seven different transformations (Scaling, Shearing, Rotation, Rotation-Scale, Non-uniform Scale, JPEG Compression and Gaussian Noise) were applied to these 23 base images to obtain more than 8000 transformed test images. All the experiments were run on Matlab 2012b on a Windows 7 (64-bit) machine with an Intel Core i5-3470 processor and 8GB of RAM.

5.4.2 Evaluation Method

We applied the automatic corner detection evaluation process proposed by Awrangjeb [17] to count the number of repeated corners. In this process, the corner locations detected in an image are taken as the reference corners, and the locations of the corners detected in the transformed image are compared with them. If a reference corner is detected in the corresponding transformed

location, then that corner is considered repeated. The main advantages of this process are that there is no limit on the number of images in the dataset and that it requires no human intervention.
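A sketch of this evaluation is given below (our own illustration; it assumes the ground-truth transformation is available as a 3×3 homography H and that corner sets are N×2 coordinate arrays).

    import numpy as np

    def count_repeated(ref_corners, det_corners, H, tol=3.0):
        """Map each reference corner through the known transformation H and
        count it as repeated if a detected corner lies within tol pixels."""
        pts = np.hstack([ref_corners, np.ones((len(ref_corners), 1))]) @ H.T
        pts = pts[:, :2] / pts[:, 2:3]   # homogeneous -> image coordinates
        repeated = 0
        for p in pts:
            if np.min(np.linalg.norm(det_corners - p, axis=1)) <= tol:
                repeated += 1
        return repeated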

5.4.3 Results and Discussion

We studied the commonly used Canny [33], Sobel [210], Roberts, Prewitt [151], LoG [88] and Zerocross [16] edge detection methods and conducted experiments to answer the questions posed at the beginning of this Chapter. Selecting the best edge detection operator is a crucial step: the noise environment, edge orientation, edge structure and luminance are all variables involved in the choice, and the operator should be chosen so that it is sensitive where the pixel intensity changes gradually. The serious issues in edge detection are false edge detection, high processing time, and so on. Therefore, a comparative study between the various edge detection operators, based on these different parameters, is performed and analyzed. We considered the performance of the very popular chord-to-point distance accumulation (CPDA) [17], Curve to Chord Ratio (CCR) [185] and Difference of Gaussian (DoG) [216] detectors along with our proposed Single Chord CPDA (SCCPDA), Chord to Cumulative Sum Ratio (CCSR) and Multiple Single Chord Accumulated Distance (MSCAD) corner detectors. Figure 5.9 shows the corner locations detected by the CPDA corner detector alone after applying different edge detectors to an image. It is clearly seen that each edge detector gives different corner locations for the same image; this happens because each edge detection technique produces a distinct edge extraction structure. As Canny, LoG and Zerocross extract a good number of edges, the number of identified corners is also high, while Prewitt, Roberts and Sobel derive fewer edges, resulting in fewer corner locations. A similar result can be observed in Figure 5.10, which shows the corner locations detected by the MSCAD corner detector alone after applying different edge detectors. Initially, we conducted our tests to find the effects of different geometrical transformations on finding edges, corners and repeatable corners. The comparative results of the edge detectors in terms of the numbers of extracted edges, detected corners

(a) Original (b) Canny

(c) Adaptive Canny (d) Sobel

(e) Prewitt (f) Roberts

(g) LoG (h) Zerocross

Figure 5.9: Corners detected by the CPDA method using different edge operators

(a) Original (b) Canny

(c) Adaptive Canny (d) Sobel

(e) Prewitt (f) Roberts

(g) LoG (h) Zerocross

Figure 5.10: Corners detected by the MSCAD method using different edge operators

and repeatable corners under various conditions are presented in Figures 5.11, 5.12 and 5.13, respectively.

Figure 5.11: Number of extracted edges after applying different transformations

Figure 5.12: Number of corners after applying different transformations

First, we tried to find the average number of edges retrieved using different corner detectors with different edge detectors after applying the transformations

Figure 5.13: Number of repeated corners after applying different transformations

mentioned earlier. Our first experiment observes the effects of different geometrical transformations on the images when detecting edges. Figure 5.11 shows that the Canny edge detector leaves the others behind for detecting edges under almost every condition. Each edge detector performs differently under the various geometrical changes; evaluation of the images showed that, under several conditions, Canny, LoG, Zerocross, Sobel, Prewitt and Roberts exhibit better performance, in that order. The number of detected corners also depends on the number of extracted edges. However, if an edge detector extracts a good number of loosely connected edges, the detected corners will be few and not suitable for practical application (see Figure 5.9). We performed experiments to find the average number of corners obtained with different edge detectors under several transformations and found that the Canny edge detector performs best under most geometrical transformations for finding corners, for the same reason: finding more edges results in finding more corners. However, the Zerocross and LoG operators perform better than the Canny edge detector under scale and shear transformations. Repeatability is the ability to detect the same corner locations in two or more different images of the same scene. In this regard, we analyzed how the different edge detectors affect the performance of finding repeatable corners under geometrical

changes. It is noticeable that though the Canny edge detector finds a large number of edges, resulting in more corners, it is not best for finding repeatable corners. The LoG operator is best, followed by the Zerocross operator, for finding average repeatable corners. Though the LoG and Zerocross operators give better results than Canny, they malfunction at corners and curves: their edges are not connected like Canny's, which results in more edges and corner locations that may not be significant for practical applications. To find which detector is more efficient, we examined the execution time of each corner detector with the different edge detectors, as shown in Table 5.1. We found that the Prewitt and Sobel operators are fast compared to the others at detecting edges, and that the Roberts operator is quicker than the others for curve extraction. However, the Canny edge detector using predefined thresholds is best for finding corners, followed by the Zerocross and LoG operators. From Figure 5.14, we find that the Canny edge detector with adaptive thresholds extracts more edges than with the predefined threshold values, resulting in a good number of corners. We evaluated the performance of these two edge detectors after applying the seven different transformations, and Figure 5.15 shows that the adaptive Canny edge detector performs better than Canny with predefined thresholds in terms of the number of extracted edges, corners and repeated corners. We therefore use the adaptive Canny edge detection method in the primary edge extraction step before detecting corners.

    Corner     Edge             Edge       Curve       Corner     Total
    Detector   Detector         Detection  Extraction  Detection  time
                                time       time        time
    CPDA       canny            2.128      37.622      3.728      43.477
               canny threshold  2.09       4.456       1.314      7.861
               Prewitt          0.282      39.126      1.039      40.447
               LoG              0.760      29.869      2.921      33.551
               Roberts          0.309      46.604      0.586      47.499
               Sobel            0.282      38.601      1.078      39.962
               Zerocross        0.740      29.374      2.837      32.952
    CCR        canny            2.587      46.644      0.632      49.864
               canny threshold  2.414      5.098       0.194      7.705
               Prewitt          0.305      39.706      0.172      40.183
               LoG              0.926      37.597      0.568      39.091
               Roberts          0.399      59.435      0.139      59.974
               Sobel            0.309      40.901      0.176      41.386
               Zerocross        0.768      30.337      0.428      31.533
    SCCPDA     canny            2.128      37.622      3.728      43.477
               canny threshold  2.414      5.098       0.194      7.705
               Prewitt          0.006      0.309       0.027      0.342
               LoG              0.760      29.869      2.921      33.551
               Roberts          0.317      47.968      0.604      48.885
               Sobel            0.302      39.078      0.179      39.559
               Zerocross        0.740      29.374      2.837      32.952
    CCSR       canny            2.188      37.574      0.537      40.299
               canny threshold  2.077      4.33        0.168      6.575
               Prewitt          0.302      39.437      0.181      39.92
               LoG              0.764      30.408      0.423      31.596
               Roberts          0.31       46.865      0.113      47.289
               Sobel            0.302      39.078      0.179      39.559
               Zerocross        0.759      29.506      0.413      30.678
    MSCAD      canny            2.151      37.260      3.409      42.821
               canny threshold  2.029      4.315       1.133      7.476
               Prewitt          0.302      40.039      1.033      41.375
               LoG              0.753      30.209      2.673      33.636
               Roberts          0.317      47.968      0.604      48.885
               Sobel            0.293      39.050      1.049      40.391
               Zerocross        0.744      29.908      2.667      33.319

Table 5.1: Time computation for different detectors (in seconds)

(a) Original (b) Adaptive Canny

(c) Canny with predefined threshold

Figure 5.14: Extracted edges and detected corners using Canny adaptive and Canny (0.2-0.7)

Figure 5.15: Performance comparison of Canny adaptive and Canny (0.2-0.7)

5.5 Conclusion

In this Chapter, we have analyzed the performance of different edge operators on different contour-based corner detectors and investigated their performance under different transformations. Since edge detection is the first step of contour-based corner detection, it is important to know how different edge detection techniques perform. The relative performance of various edge detection techniques was examined with five contour-based corner detectors. It was observed that the Canny edge detection algorithm yields higher accuracy in detecting edges and corners, but it is not best for finding repeatable corners, which is considered one of the most important criteria in evaluating corner detection; instead, the LoG operator gives the best results. In terms of efficiency, the Prewitt, Roberts and Sobel operators are fast compared to the others at detecting edges. Therefore, we can choose different edge detectors for different scenarios, rather than treating the Canny edge detector as ideal in every case. More importantly, we observed the limitations of the commonly used Canny edge detector with predefined thresholds and applied the adaptive Canny detector instead, which shows better results.

Chapter 6

Effective Interest Region Estimation Method to Represent Corners for Image Retrieval

In Chapters 3 and 4, we proposed new contour-based corner detection methods. In this Chapter, we propose a new method of estimating the interest region for the corner features detected by our proposed contour-based corner detectors. This interest region is then used to build descriptors for corner matching. Corner detectors take an image as input and output only the corner locations, with no other information. If the same object appears in different images captured from different distances, the difference in the scale of the object might give different results for the corresponding corners. Thus, it is important to describe the corners by estimating an interest region around each corner location that is invariant to various image transformations. An interest region is basically the neighborhood pixel area around the corner location. Using the interest region, descriptors are built as vectors encoding relevant information about the corners, which act as a 'numerical fingerprint' to match identical regions between images. For the roles of the edge detection, corner detection and interest region estimation processes, see Figure 1.5. In our approach, we propose a new local maxima-based interest region detection method around the detected corner location, to build descriptors invariant to different image transformations [4]. The rest of the Chapter is organized as follows. Section 6.1 describes the details of scale invariance. Section 6.2 explains the concept of scale space, while Section 6.3


describes the process of determining the interest region using scale space, along with its weaknesses. Our proposed region estimation method is presented in Section 6.4. Section 6.5 presents the experimental results. Finally, Section 6.6 concludes the Chapter.

6.1 What is Scale Invariance

Scale invariance is an important property of a local feature. This property can be assessed by matching the features of two images captured at two different scales, which can result either from different distances or from different focal lengths. The main task in achieving scale invariance is to find a way to describe the corners by building descriptors which can be matched with similar descriptors in other images regardless of the scale differences. Basically, the task is to define the interest region around the corner location; the pixels inside the interest region are used to build the scale-invariant descriptor. As discussed in Chapter 2, the first step of matching local features is to detect the features, more precisely the locations of the features. Next, each feature uses its neighborhood pixel area to estimate a region from which to build a descriptor; these descriptors then represent the corresponding features. The region estimated around the feature location to build the descriptor is known as the interest region. Interest regions can be estimated in two ways: 1) at the time of feature detection, and 2) at the time of feature description. The most popular approach in local feature-based methods estimates the region at the feature detection stage; most such methods use a 3D scale space [109], which is computationally very expensive. On the other hand, prominent features like corners, whose detection step does not produce any scale information, estimate the region at the descriptor stage. An example of interest regions in two images is shown in Figure 6.1. Determining the size of the interest region based on the content is an important part of the subsequent feature matching. If the size of the interest region remains the same for every feature, then it is hard to match a feature with its correspondence in another image at a different scale. The reason lies in the descriptors built from the neighboring pixels of the feature location: descriptors built with same-sized regions will not be similar across images of different

Figure 6.1: Example of interest region in two images

scales. An example is shown in Figure 6.2, which shows two images of the same boat captured from two different distances. The red plus signs (+) in both images are corresponding local features. Now, if we build a descriptor using an interest region of the same size for both features, their descriptors are likely to be dissimilar: the red circles shown in both images have the same size, but the content within the circles is obviously not the same, so the descriptors built using these circular regions will not match. Therefore, interest regions need to be estimated in such a way that the image structure inside the region remains unchanged, keeping the corresponding feature descriptors unchanged at the same time.

6.2 Concepts of Scale Space

The concept of scale space was first used by [108] to achieve scale invariance between image features. It is basically a framework for multi-scale image representation developed by the computer vision community [94, 111]. In this approach, an image is represented as a stack of increasingly smoothed versions of the original image. These smoothed versions can be obtained by convolving the image with Gaussian kernels of different scales. Starting from the original image at the finest scale, the multi-scale

(a) (b)

Figure 6.2: Example of images with two different scales

process convolves the image with the kernel at several scale values. The main advantage of using scale space is to represent the data at all scales simultaneously, so as to reproduce the features of an image despite the unknown distance or focal length between camera and object. The motivation for generating a scale-space representation comes from a basic fact about real-world objects: in practice, different objects have different structures at different scales, and they may appear in various ways depending on the view. [111] explains this with the example of a tree branch and a cloud. A branch of a tree is meaningful over a particular range of distances, for example from a few centimetres to a few meters. If the branch is observed from nanometer or kilometer distances, the overall concept of the branch changes; those scales are rather appropriate for the concepts of molecules (which form the tree) and the forest, respectively. Similarly, at finer scales the concept of a cloud gives way to that of the droplets which form it; a cloud is only meaningful over a certain range of coarse scales. In practice, there is no way to know which scales are appropriate for describing the data of an unknown object; thus, the description needs to be considered at all scales simultaneously. The main idea of multi-scale representation is to generate a one-parameter family of smoothed images by convolving the original image with a smoothing kernel, where each scale level gradually suppresses the fine-scale structures in the image. The main reasons for representing an image at multiple scales are: (1) to explicitly represent the multi-scale aspect of real-life data, and (2) to simplify further processing by sup-

Figure 6.3: Scale space representation of an image

pressing the noise. An example of a multi-scale representation is shown in Figure 6.3. The Gaussian smoothing kernel is the one most commonly used to build the linear scale space, because this kernel does not create new structures while generating a coarser scale level from a finer one. Therefore, the Gaussian smoothing kernel has been used for decades in image processing to represent an image at multiple scales; the parameter is usually referred to as the scale (σ) when the Gaussian kernel is used. An example of different levels of Gaussian smoothing applied to an image is shown in Figure 6.4. To build the Gaussian scale space, let I(x,y) denote the intensity of a given image as a function of position, and let G(x,y;σ) be the Gaussian kernel which is convolved with the image, where σ is the standard deviation of the kernel. The bigger the scale (σ) value, the smoother the image will be; at the same time, more of the fine details of the image will be removed.

G(x,y;σ) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)) (6.1)

Now, the multi-scale representation I(x,y;σ) of the original image is derived by

I(x,y;σ) = G(x,y;σ) ∗ I(x,y) (6.2)
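A minimal sketch of building such a stack (our own illustration with SciPy; the scale values mirror those shown in Figure 6.4) follows Equations 6.1 and 6.2 directly:

    import numpy as np
    from scipy import ndimage

    def gaussian_scale_space(image, sigmas=(1, 2, 4, 8, 16)):
        """Equation 6.2: one Gaussian-smoothed copy of the image per scale,
        stacked along a new leading axis (coarser scales lose fine detail)."""
        img = image.astype(float)
        return np.stack([ndimage.gaussian_filter(img, sigma=s) for s in sigmas])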

Similar to the Gaussian smoothing kernel, a scale space can be computed by convolving with any of the Gaussian derivatives, such as the Laplacian of Gaussian (LoG) [106]. LoG is the second derivative of the Gaussian smoothing and gives information about edges [13, 45, 79]; basically, it tracks the locations where the sign of the LoG response changes. The Difference of Gaussians (DoG), an approximation of LoG, is also used to build the scale space. Regardless of which technique is used, the main challenge is to determine the appropriate scales for each detected feature.

6.3 Determining Interest Regions using Scale space

As an image may contain a number of objects at different scales, the detected features of different objects should each have a suitably sized interest region so that the corresponding feature descriptors, which are built using the region information, can be matched. In other words, the content in the interest regions of corresponding features of the same object at different distances should be the same. The feature descriptors may also need to match even if the features are detected from the same object at the same distance but in a different scene. The task of scale space is to determine this scale for the detected features of an image. The determined scale is usually called the Characteristic Scale of a feature. This characteristic scale, introduced by Lindeberg [109, 110], is used to define the interest region from which the descriptor representing the feature is built. To select the region, the characteristic scale needs to be the same for corresponding feature locations in the reference image and the transformed image. Thus, determining the characteristic scale is one of the most important tasks in achieving scale invariance of a feature detector. To determine the scale, Lindeberg [109, 110] introduced the concept of automatic scale selection using the scale space tool. This tool allows feature detectors to detect feature locations with their own characteristic scale. These scales are estimated by detecting local extrema, over scales, of differential expressions in terms of γ-normalized derivatives. For example, two corresponding feature locations in Figure 6.5 are indicated with plus (+) signs. Figures 6.5 (c) and (d) represent the responses of the LoG operator over a range of scales. The maxima of the response for those two locations in

(a) (b)

(c) (d)

(e) (f)

Figure 6.4: Example of different scale levels (a) Original image, (b) σ = 1, (c) σ = 2, (d) σ = 4, (e) σ = 8 and (f) σ = 16

(a) (b)

(c) (d)

Figure 6.5: Example of characteristic scales. (a) and (b) two images with different focal lengths. (c) and (d) the response over scales, respectively. Image courtesy [126].

the two images are found at scales 10.1 and 3.89 respectively. These two scales are called the characteristic scales of the corresponding features. With circles (interest regions) drawn using the characteristic scales, the features will enclose almost the same structure (content) for building the descriptor, and consequently the descriptors of the features will be similar to each other. Note that there might be several maxima in the response of the LoG operator; in that case, one feature location will have multiple image structures to describe and thereby multiple descriptors to represent it. The image structure in the interest region taken at the scale where the maximum is found is independent of the image resolution, because the maxima for the same structure are expected to be found in both higher- and lower-resolution images. Among all the Gaussian derivatives, the normalized LoG gives the most stable scale response for determining the characteristic scales of features [124]. However, because of the symmetric nature of the Laplacian kernel, LoG is more responsive when selecting scales for blob-like features. Figure 6.5 also shows an example of blob features

and their characteristic scales obtained using the LoG operator. Although Mikolajczyk [126] adapted the normalized LoG operator to estimate the characteristic scale for Harris [73] corner locations, we have found in our experiments that the LoG operator misses a lot of distinct structures around the corner location; the related experimental study is shown in Section 6.5 below. Moreover, the Harris corner detector detects more corner locations, some of which are unstable, and these unstable corners reduce the repeatability of the detector [130]. By contrast, contour-based corner detectors detect only the stable corner locations, and their repeatability is much better than that of intensity-based detectors such as the Harris corner detector [73]. For this reason, we are more interested in estimating the interest region using the contour-based corner detectors.

Weaknesses of Scale Space for Determining the Interest Region

Although the scale space tool has been widely used to approximate the interest region, we have found a few weaknesses of scale space in determining the interest region for corner-like features. As mentioned earlier, the scale space is built using the normalized Laplacian operator, which finds blob-like structures in the image and can determine the interest region well for blob-like features because the LoG operator is symmetric in nature. In addition, a single pixel value represents the response of the LoG operator regardless of the size or standard deviation of the kernel applied to the image for convolution. Consequently, the interest region of a feature can be adequately estimated if the feature location is adequately localized. Blobs are well-localized structures due to their homogeneity; however, blobs do not carry significant information at their center (the feature location), as the center of a blob says nothing about the structure or the size of the blob. This is one of the reasons why corner detectors are preferable as local feature detectors to represent the image. On the other hand, the image structure around a corner location is not homogeneous, and a small shift in the corner location may give a completely different estimation of the characteristic scales when using the scale space. For example, Figure 6.6 shows two images, of a blob and of a corner feature. Figure 6.6 (a) has two circles (distinguished by two colors) whose centre is at (111,108), and the location of the corner in Figure 6.6 (b) is at the same position. Now, we have applied the normalized Laplacian operator (the operator used to build the scale


Figure 6.6: Characteristic scales using the Laplacian operator: (a) blob and (b) corner

Location    Blob           Corner
[111 108]   27.73, 39.93   27.73, 39.93
[113 110]   27.73, 39.93   3.11, 4.48, 16.05, 19.26, 23.11, 39.93

Table 6.1: Characteristic scales of Figure 6.6 (a) and (b)

We have applied the normalized Laplacian operator (the operator used to build the scale space [126]) at a series of scales on both images and found the characteristic scales by determining the extrema of the response at the center location over the scales. The scales of these extrema are shown in Table 6.1. We have also shown the extrema scales from the response at a location (113,110), just two pixels away from the center. The normalized LoG operator gives the same characteristic scales for the two locations in the case of the blob (Figure 6.6 (a)); however, the characteristic scales of the two corner locations (Figure 6.6 (b)) do not match each other. This example shows that the traditional way of determining the interest region may not work well for a corner if the corner detector detects a slightly different location for the corresponding corner in the transformed image. However, we have seen in Chapter 3 that every corner detector has at least a certain amount of localization error, so the corresponding corner may not be detected exactly at the correct location. The example corner in Figure 6.6 (b) does not have any other structure around the corner location; however, real-life images have many complex structures, which may produce an even worse estimation of the characteristic scales. In addition, the neighborhood of a corner location (such as a T-junction) may contain multiple backgrounds. The situation may get even more complex when comparing a cluttered and an uncluttered object from two different images. Therefore, the scale space tool may fail to estimate the correct characteristic scale for corner-like features. Scale space contains a stack of increasingly smoothed images, which produces a very large amount of information, most of it redundant [195]. Furthermore, the complexity of building the scale space increases quadratically with the window size of the kernel. Since our aim is to use the robust corners detected by the contour-based detectors (Chapters 3 and 4), and these detectors do not carry any predefined information about the scales of the features from the detection step, a new way of estimating the characteristic scales of the features is a necessary step towards building the descriptors that represent these corners. Therefore, we have proposed a method for calculating the interest region of the corners detected by contour-based detectors. The proposed interest region estimation method overcomes the weaknesses of the scale space and performs better for corner-like features, according to the experimental study shown in Section 6.5.
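To make the scale-selection behaviour discussed above concrete, the following minimal sketch reproduces a Table 6.1 style probe: it evaluates the scale-normalized LoG response at a single pixel over a range of scales and returns the scales at which the response is a local extremum. The scale range, the image loader and the probed locations are illustrative assumptions, not values prescribed in this thesis.

import numpy as np
from scipy import ndimage

def characteristic_scales(image, location, sigmas):
    """Return the sigmas at which the scale-normalized LoG response
    at `location` is a local extremum along the scale axis."""
    r, c = location
    # Scale-normalized response: sigma^2 * LoG(image, sigma), read at one pixel.
    responses = np.array([(s ** 2) * ndimage.gaussian_laplace(image, sigma=s)[r, c]
                          for s in sigmas])
    scales = []
    for i in range(1, len(sigmas) - 1):
        prev_, cur, next_ = responses[i - 1], responses[i], responses[i + 1]
        if (cur > prev_ and cur > next_) or (cur < prev_ and cur < next_):
            scales.append(sigmas[i])  # local maximum or minimum over scale
    return scales

# Probe the same pixel and a two-pixel shifted one, as in Table 6.1:
# img = load_grayscale("blob_or_corner.png")   # hypothetical loader
# print(characteristic_scales(img, (111, 108), np.arange(1.0, 45.0, 0.5)))
# print(characteristic_scales(img, (113, 110), np.arange(1.0, 45.0, 0.5)))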

6.4 Proposed Method

As mentioned earlier, one of the most important steps in describing local features is to approximate the interest region around the feature location to gain invariance against different image transformations. After the estimation, the pixels inside the interest region are used to build the descriptor, which then represents the feature. In this Section, we propose a new curvature-based interest region detection method for corners detected by contour-based corner detectors [4]. Almost all contour-based corner detectors first extract the edges of an image using an edge detector such as Canny [33]. The overall concept of the interest region estimation is to pick the curvature maxima of the edges. The estimated region needs to be defined accurately so that the content of the corresponding corner in the transformed image can be estimated well. To do so, first, we detect the corner locations of an image. While estimating the curvature values to find the corner locations, we also calculate the curvature maxima for each corner of each edge. For each corner, we calculate the distance to each maximum; if the distance from the corner location to a curvature maximum is larger than a predefined distance (set to 50 in our method, corresponding to the maximum radius of the interest region), we discard that maximum and keep the rest as candidates. For the remaining candidate maxima, we apply a simple refinement process: for a particular corner location, we consider all the candidate maxima found after extracting the edges, and if two adjacent maxima are less than 3 pixels apart, we replace their two distances from the corner location with their average. Algorithm 1 presents the pseudo-code of the proposed method.

Algorithm 1 Proposed extrema-based method for estimating the interest region around a corner location
 1: procedure ESTIMATEINTERESTREGION
 2:     Find corners using a contour-based corner detector
 3:     Cj = jth corner location
 4:     Mi = ith curvature maximum
 5:     Ri = distance from a corner to Mi
 6:     r = maximum radius of the interest region
 7:     Th = lowest threshold
 8:     Di = distance between Mi and Mi+1
 9:     Calculate all the maxima Mi of each edge
10:     Calculate Di from the corner location Cj to each maximum Mi
11:     for Cj = 1 ... n do
12:         for Mi = 1 ... n, which are within the circle of center Cj and radius r do
13:             if Di > Th then
14:                 Ri is added as a region for Cj
15:             else
16:                 Di+1 = (Di + Di+1)/2
17:                 Discard Di
18:     Repeat the refinement process until all the regions are Th distance apart
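A minimal Python sketch of the refinement in Algorithm 1 is given below, assuming the curvature maxima of the corner's edge have already been extracted; the corner and maxima coordinates are assumed inputs. Maxima falling inside the maximum region radius r become candidate radii, and adjacent candidates less than Th apart are merged by averaging.

import numpy as np

def interest_region_radii(corner, maxima, r=50.0, th=3.0):
    """corner: (x, y); maxima: iterable of (x, y) curvature maxima on the
    corner's edge. Returns the refined list of circular-region radii."""
    corner = np.asarray(corner, dtype=float)
    # Distances from the corner to each maximum, keeping only those inside
    # the maximum interest-region radius r.
    radii = sorted(
        d for d in (np.linalg.norm(np.asarray(m, float) - corner) for m in maxima)
        if d <= r
    )
    # Merge adjacent radii until all remaining radii are at least th apart.
    merged = True
    while merged and len(radii) > 1:
        merged = False
        for i in range(len(radii) - 1):
            if radii[i + 1] - radii[i] < th:
                radii[i + 1] = (radii[i] + radii[i + 1]) / 2.0
                del radii[i]
                merged = True
                break
    return radii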

Figure 6.7 shows the corners detected by the MSCAD corner detector [6] as the centers of the circular regions. The gradient information of all the circles can be obtained from the gradient information of the biggest circle. Thus, there is no need to calculate the gradient magnitude and orientation for all of the circles; they only need to be computed once for a particular corner location.

Figure 6.7: Interest region detection using edge extrema
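The gradient-reuse idea above can be sketched as follows: the gradient magnitude and orientation are computed once for the patch covering the biggest circle, and each smaller circle simply masks that shared patch. This is a minimal sketch assuming the corner lies at least max_radius pixels away from the image border; the function names are illustrative.

import numpy as np

def gradient_patch(image, corner, max_radius):
    """Compute gradient magnitude/orientation once for the biggest circle."""
    x, y = corner
    r = int(np.ceil(max_radius))
    patch = image[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    gy, gx = np.gradient(patch)                # row- and column-wise gradients
    return np.hypot(gx, gy), np.arctan2(gy, gx)

def circle_mask(max_radius, radius):
    """Boolean mask selecting a smaller circle inside the shared patch."""
    r = int(np.ceil(max_radius))
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    return xx ** 2 + yy ** 2 <= radius ** 2

# mag, ori = gradient_patch(img, (x, y), max(radii))
# inner = circle_mask(max(radii), radii[0])    # reuse mag[inner], ori[inner]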

6.5 Performance Study

In this Section, we compare our proposed method with Harris-Laplace [126], Awrangjeb's method [20] and the LoG method [109]. We use the MSCAD [6], CCSR [3], SCCPDA [3], CCR [153] and CPDA [17] corner detection methods to observe the performance of the interest region detection process. As the feature descriptor, we have applied the popular SIFT [112] descriptor with all of the methods. All the experiments were run on MATLAB 2018a on a Windows 10 (64-bit) machine with an Intel Core i5-3470 processor and 8GB of RAM.

6.5.1 Experimental Setup

We have used the CPDA dataset [17, 23, 24, 154] from the previous Chapter to evaluate the performance of the proposed method. As described in Section 3.4.1.2, the dataset has 23 images. The same transformations have been applied, which are uniform scale, rotation, non-uniform scale, the combination of rotation and scale, shear transformation and JPEG compression. Each of the transformations is described in Section 3.4.1.2. Although we are using the same dataset and the same transformations for the experiments, there is a basic difference between the experiments described in Chapters 3 and 4 and this Chapter. Since the transformations applied to the images are known, the transformation matrix can easily be derived. In Chapters 3 and 4, the transformation matrix is used to find out whether or not a reference corner is detected in the transformed image. However, the experiment in this Chapter finds the matching pairs among the descriptors of the reference and transformed images, and then uses the transformation matrix to validate the corresponding descriptor locations in the reference and transformed images for each matched pair. The experiments described in Chapters 3 and 4 did not use descriptors. Since the nearest-neighborhood descriptor matching strategy (Section 2.7) is the most commonly used for finding correspondences with the local feature-based approach, we also compare the descriptors using this strategy. For a particular corner location, our proposed method and the LoG method might have several interest regions, which may result in inappropriately matched descriptors. Therefore, a refinement process was applied to filter out the incorrect matches. In this filtering process, we only consider the corresponding matches with the highest number of matched relations between corners in the original and transformed images, discarding the rest of the matches.
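The matching-and-validation procedure described above can be sketched as follows, assuming the descriptors and corner locations of both images are given and the known transformation is available as a 3×3 matrix H. The 3-pixel localization tolerance is an illustrative assumption.

import numpy as np

def match_and_validate(des_ref, loc_ref, des_tr, loc_tr, H, tol=3.0):
    """des_*: (n, d) descriptor arrays; loc_*: (n, 2) corner locations;
    H: known 3x3 transformation matrix. Returns (matches, correct count)."""
    matches, correct = [], 0
    for i, d in enumerate(des_ref):
        # Nearest-neighbour descriptor matching in Euclidean space.
        j = int(np.argmin(np.linalg.norm(des_tr - d, axis=1)))
        matches.append((i, j))
        # Project the reference location with the known transformation.
        p = H @ np.array([loc_ref[i][0], loc_ref[i][1], 1.0])
        p = p[:2] / p[2]
        if np.linalg.norm(p - np.asarray(loc_tr[j], float)) <= tol:
            correct += 1
    return matches, correct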

6.5.2 Evaluation metrics

To evaluate the performance under the different transformations, we use the commonly used precision and recall measures [127]. Precision and recall represent the number of correct matches with respect to the total number of matches, and the number of correct matches with respect to the total number of correspondences, respectively. Thus, precision and recall can be measured by Equations 6.3 and 6.4.

precision = number of correct matches / total number of matches    (6.3)

recall = number of correct matches / total number of correspondences    (6.4)
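As a direct transcription of Equations 6.3 and 6.4, the two measures can be computed from the counts produced by the matching step, for example:

def precision_recall(correct, total_matches, total_correspondences):
    """Equations 6.3 and 6.4, guarded against empty denominators."""
    precision = correct / total_matches if total_matches else 0.0
    recall = correct / total_correspondences if total_correspondences else 0.0
    return precision, recall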

6.5.3 Parameter Settings

In this Section, we will show the recall and precision graphs for a range of thresholds used to distinguish between two circular regions, and for a range of dimensions that can cover 360◦ of gradient orientations. Next, we will select the threshold and the dimension of the scale-histogram for our proposed method based on this evaluation. The dimension of the scale-histogram relates to the complexity of the proposed method. If we choose a higher dimension, it may be easier to differentiate two regions; however, it will increase the complexity due to calculating the distance between two scale-histograms. On the other hand, a lower dimensionality may decrease the possibility of finding the discriminative image structure around the feature location. The threshold is also related to the complexity of the matching step when comparing descriptors between two images. We may use a lower threshold value to get a higher number of distinct regions; however, that may produce regions which are only slightly different from each other. Consequently, we will have a higher number of descriptors to compare in the matching step. In contrast, a higher threshold value may reduce the number of feature descriptors, but there is the possibility of missing important image structures. Therefore, both the dimension and the threshold value need to be decided very carefully.

Figures 6.8 and 6.9 show the precision and recall over a series of thresholds (th) and dimensions. Our proposed corner detectors, CCSR and MSCAD, have been used as the feature detectors, and the state-of-the-art feature descriptor SIFT [112] has been used to describe the interest regions. Figures 6.8 (a) and 6.9 (a) show that the recall of our proposed methods is better at the lower dimensions; once the dimension exceeds 40, the recall starts decreasing. A similar scenario is observed in Figures 6.8 (b) and 6.9 (b): the precision is also better at dimensions up to 40. Therefore, we have selected 40 as the dimension size of the scale-histogram. We have used a threshold range of [0.30, 0.40] with a 0.01 interval for differentiating the distinct circular regions. According to the experimental results, the lower threshold gives better precision and recall. The experimental results shown in the next Section use a threshold (th) of 0.3.

Figure 6.8: (a) Precision and (b) Recall graphs for CCSR corner locations against a series of thresholds (th) using different dimensions

Figure 6.9: (a) Precision and (b) Recall graphs for MSCAD corner locations against a series of thresholds (th) using different dimensions

6.5.4 Experimental results

We use the popular precision and recall measures for the performance evaluation. Figure 6.10 shows the precision vs. recall graphs comparing the performance of all the methods. We have considered six different image transformations to observe the performance of the corner detectors using the different interest region estimation processes along with our proposed interest region detection method. The parameters of these transformations are discussed in Section 3.4.1.2. From the output results, it is easily observable that, under all the image transformations, our proposed interest region estimation method along with the MSCAD [6] corner detector consistently performs better than the other methods. Using the CCSR [3] corner detector with the proposed method performs second best. Though for JPEG compression the LoG-based approach performs better than the other methods, as there is no geometric transformation, our proposed method performs consistently better in almost every case. Although the precision of Harris-Laplace [126] is better in a few cases when the recall is higher, the recall of the proposed method is always higher than that of Harris-Laplace. Awrangjeb's method and the LoG method do not show consistent performance across all the image transformations; neither their precision nor their recall is better than that of the proposed method. However, Awrangjeb's method has better precision than Harris-Laplace only when the recall is lower. In this experiment, we compared the performance of the contour-based corner detectors along with the Harris-Laplace detector, which is an intensity-based corner detector. The reason for selecting Harris-Laplace was to observe the performance of an intensity-based detector with the interest region methods. In all cases, the contour-based detectors perform better than the intensity-based detector. The reason behind the better performance of the contour-based corner detectors lies in their effective and efficient corner detection process, which gives a higher possibility of finding the corresponding corners in the transformed images compared to the intensity-based corner detectors.

(a) Scale transformation

(b) Rotation transformation

Figure 6.10: Performance evaluation of different interest region estimation methods using different corner detectors

(c) JPEG transformation

(d) Gaussian transformation

Figure 6.10: Performance evaluation of different interest region estimation methods using different corner detectors (Continued).

(e) Rotation and Scale transformation

(f) Non-Uniform transformation

Figure 6.10: Performance evaluation of different interest region estimation methods using different corner detectors (Continued).

(g) Shear transformation

Figure 6.10: Performance evaluation of different interest region estimation methods using different corner detectors (Continued).

6.6 Conclusion

In this Chapter, we have proposed a new interest region estimation method around the corner location using the curvature maxima. Our proposed method represents the corners detected by contour-based corner detectors using local feature descriptors. The experimental results show that our proposed method performs better than other existing methods in terms of precision and recall values.

Chapter 7

Application

7.1 Introduction

This Chapter presents a unified image matching framework adapted from the bag-of-visual-words (BoVW) model, integrating our proposed corner detection and interest region detection methods presented in Chapters 3, 4 and 6. We mentioned in Chapter 1 that the major concern of image matching is to find the corresponding features in two or more images. Even if two images contain similar content, the content of the two images might still differ in several factors like scale, rotation and viewpoint. The main objective of this Chapter is to observe the similarity of different images through image matching performance, using our proposed methods with SIFT feature descriptors by adapting the popular Bag-of-Visual-Words (BoVW) model. We mainly focus on the performance of different corner detectors for image matching. At first, we compute the corner locations from the sets of training and test images. Next, we estimate the interest region around the corner locations and build descriptors from the pixels within the region using the SIFT descriptor. Finally, we use these descriptors in an image matching framework based on the bag-of-visual-words (BoVW) model to retrieve similar images efficiently and accurately. The rest of this Chapter is organized as follows. Section 7.2 gives an overview of the BoVW model. In Section 7.3, the proposed unified framework of our image matching system is presented. Our experimental results are presented in Section 7.4. Finally, Section 7.5 concludes the Chapter.


7.2 Bag-of-Visual-Words model

In this Section, we briefly describe the Bag-of-Visual-Words (BoVW) model. The BoVW model is inspired by the Bag-of-Words (BoW) concept for text document analysis [85, 180], where documents are considered as a collection of words, regardless of grammar and word order. Each document is then represented as a histogram of the frequency of occurrence of the vocabulary words. These histograms are then used to perform document retrieval and classification. Due to the outstanding performance of this approach, it became popular in computer vision applications. Motivated by the Bag-of-Words (BoW) model, the Bag-of-Visual-Words (BoVW) model represents an image by an unordered set of discrete visual features. These image features are treated as "words" in this approach. Basically, a 'Bag-of-Visual-Words' is a vector of the occurrence counts of local image features over a vocabulary. The BoVW model mainly consists of three stages: 1) feature detection, 2) feature description, and 3) codebook generation and image representation, as shown in Figure 7.1.

Feature Detection

The first step of the BoVW model is to detect features or interest points. These features can be local features or global features. As discussed before, global features are used to represent the entire image, which is not always applicable in image retrieval or categorization. On the other hand, local features use certain interest points or regions of an image which are its most prominent parts. These interest points need to be invariant to geometric or photometric transformations such as scale, rotation etc. Several well-known detectors have been presented in the literature; popular ones include Harris-Laplace, Hessian-Laplace, Harris-Affine, Hessian-Affine, Difference of Gaussian (DoG), Laplacian of Gaussian (LoG) and Maximally Stable Extremal Regions (MSER), which are briefly discussed in Chapter 2. The Scale Invariant Feature Transform (SIFT) [112] is an efficient method for detecting features and later describing them in an image. It has been used in many computer vision applications in recent years, as it is robust compared to other methods.

Figure 7.1: Bag of Words Model

After feature extraction for the training images, the features detected from the images are used to build the descriptors for image matching.

Feature Description

Feature descriptors describe the neighborhood of pixels around a localized feature point. Similar to feature detectors, descriptors should also be invariant to various geometric and photometric transformations. The descriptors generate histograms of image information derived from the interest regions. The most commonly used feature descriptors in the BoVW model include the Scale Invariant Feature Transform (SIFT) [112], Speeded Up Robust Features (SURF) [27] and Local Binary Patterns (LBP) [140]. Among them, SIFT is one of the most robust descriptors with respect to different geometric and photometric changes [127]. Using the SIFT descriptor, the gradient magnitude and orientation are sampled in a 16×16 region around the key point. The region is then sub-divided into a 4 × 4 grid of components for each descriptor. The orientation histograms (quantized to 8 directions) are computed on each sub-region (see Figure 2.8). This results in a 128-dimensional (i.e. 4 × 4 × 8) feature descriptor. The descriptor is then normalized to make it more invariant against illumination changes. Inspired by SIFT, many of its variants have been developed by researchers in the literature [53, 206].
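As an illustration, SIFT descriptors can be computed at externally supplied key-point locations, for example with OpenCV, where each key point carries its location and the size of its interest region. This is a minimal sketch; corners and sizes are assumed inputs from the detection and region estimation steps, and gray is assumed to be an 8-bit grayscale image.

import cv2

def sift_descriptors(gray, corners, sizes):
    """Compute 128-dimensional SIFT descriptors at given key points."""
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), float(s))
                 for (x, y), s in zip(corners, sizes)]
    # `compute` builds the 4 x 4 x 8 = 128-bin descriptor for each key point.
    keypoints, descriptors = sift.compute(gray, keypoints)
    return descriptors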

Codebook Generation and Image Representation

The next step after feature detection and description is to quantize the descriptors into visual words in order to produce a visual word dictionary. This quantization process reduces the high-dimensional local features to a set of indices. In this way, an image can be treated as a document, with each descriptor assigned to an index ID like a text word. The next step is to group these index IDs into a set of clusters. A number of different clustering algorithms have been proposed in the literature, such as k-means and hierarchical clustering [74]; among them, the most popular is the K-means clustering method, for its simplicity and its unsupervised learning algorithm. K-means clustering basically takes a large number of local features in n-dimensional space and generates a smaller number of clusters by iteration. Each cluster is treated as an individual visual word in the vocabulary, represented by its respective cluster center, to generate the codebook. The size of the codebook

depends on the number of clusters as well as on the dataset used to build the vocabulary. The choice of the codebook size has a profound effect on the overall performance of the BoVW framework. If the vocabulary is very small, it would be difficult to differentiate two distinct sets of features. On the other hand, a very large vocabulary would lack generalization and be more prone to noise and time-consuming. However, a large vocabulary has been found to increase performance on larger datasets, compared to a smaller vocabulary. After finalizing the visual vocabulary, each descriptor is assigned to a single visual word (cluster) in the codebook. Then, a histogram is generated based on the number of occurrences of each visual word. This histogram, made from the visual words, represents the image. Once the visual codebook has been generated, the image can be represented by the histogram of the codewords.
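A minimal sketch of codebook generation and image representation, using scikit-learn's k-means as the clustering step, is given below; all_descriptors is assumed to stack the SIFT descriptors of the training images, and k is the chosen vocabulary size.

import numpy as np
from sklearn.cluster import KMeans

def build_codebook(all_descriptors, k=500):
    """Cluster the pooled training descriptors into k visual words."""
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bovw_histogram(codebook, descriptors):
    """Assign each descriptor to its nearest visual word and count words."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()  # normalise; assumes at least one descriptor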

7.3 Proposed Framework using BOVW model

In this Section, we propose a unified framework for image matching using the Bag-of-Visual-Words model. The proposed framework consists of four stages: 1) key point detection, 2) interest region estimation, 3) codebook generation, and 4) image matching. The pipeline of the proposed framework is shown in Figure 7.2. The detailed process of each stage is discussed next.

7.3.1 Key Point Detection

The first step of our approach is to detect the key points. In our work, we chose corner points as the key points, as corners contain the most important information of an image. For this purpose, we use our proposed CCSR, SCCPDA and MSCAD corner detectors [3, 6], which are described in detail in Chapters 3 and 4. Using a corner detector, we first obtain all the possible corner locations of an image. Figure 7.3 shows an example of the corners detected in an image by the MSCAD corner detector.

Figure 7.2: Proposed Bag of Words Model


Figure 7.3: Example of corners identified by using MSCAD corner detector

7.3.2 Interest Region Estimation

After extracting all the corner locations, the next step is to estimate the interest region around each corner location. To accomplish this task, we apply our proposed region estimation method [4], discussed in Chapter 6. First, we consider a number of circular regions with the corner location as their center. Next, we build a histogram using the information of the pixels inside the candidate circles to calculate the dissimilarity among the adjacent circular regions. The gradient magnitude and pixel orientation of the corresponding interest regions are calculated in order to build the histogram. Finally, we calculate the Euclidean distance between the histograms of each pair of successive radii and compare it against a threshold (th): we keep the circle with the larger radius only if the distance is less than the threshold; otherwise, we keep the smaller radius.

Figure 7.4: SIFT descriptor calculation

7.3.3 Codebook Generation and Image Matching

After estimating the region around the corner locations, we build the descriptors using the popular SIFT technique proposed by Lowe [112]. In this technique, the circular region around the key point is divided into 4 × 4 non-overlapping patches, and the histograms of gradient orientations within these patches are calculated. Histogram smoothing is applied in order to avoid sudden changes of orientation, and the bin size is limited to 8 bins in order to bound the descriptor's size. This results in a 4 × 4 × 8 = 128-dimensional feature vector for each key point. Figure 7.4 illustrates this procedure for a 2 × 2 window.

Now, we quantize the descriptors into clusters using the K-means clustering algorithm. These clusters are treated as visual words. The center of a cluster is referred to as its centroid. The K-means clustering algorithm iteratively assigns each feature descriptor to the cluster whose mean is closest to it and then updates the mean of each cluster to the centroid of its new members. An example of clustering using the K-means algorithm is depicted in Figure 7.5. Next, we calculate the distance of a descriptor from the centroid of each cluster. The descriptor is then added to the cluster with the lowest distance from its centroid. Next, we compute the occurrence frequency of the visual words in a vector and then generate the codebook using the vector histogram.

To match images similar to the query image, the search is conducted through its representative visual code words. We calculate the similarity between the vector of the query image and those of the training dataset, computing the Euclidean distance between the training and query image descriptors using a threshold. In the next Section, we will present the results of image matching using the proposed framework.

Figure 7.5: Example of clustering using the K-means algorithm

7.4 Experimental Results

In this Section, we present the experimental results of our proposed method.

7.4.1 Experimental setup

To analyze the performance of our proposed corner detectors, we build the codebooks with SIFT-based histograms from a set of randomly selected training images in the datasets. We perform our experiments using the PASCAL VOC dataset [50], consisting of around 5000 images, and the Corel-1000 dataset of 1000 images. We used half of the images of each class for training and the other half for querying. All the experiments were run on MATLAB 2016b on a Windows 10 (64-bit) machine with an Intel Xeon E3-1280 processor and 32GB of RAM.

7.4.1.1 Training Stage

During the training stage, the proposed system first takes the input images and converts them to grayscale. Then the corner locations of the images are detected using our proposed contour-based corner detectors discussed in Chapters 3 and 4. Using the corner locations, we determine the interest regions and build the descriptors. These descriptors are then clustered using the K-means algorithm. The K-means algorithm has some difficulties, one of which is the determination of the value of the parameter K. We conducted our experiments with different values of K, obtained different results, and selected the value of K that gave the best matching results. Now, for each descriptor in an image, we find the nearest visual word in the vocabulary using Euclidean distance-based matching. Then, each image in the dataset is encoded into a histogram of k bins, where each bin represents the frequency of occurrence of the corresponding visual word in the image under consideration. The retrieval is performed by matching the histograms. For the training images, we follow the initial steps mentioned above to build the descriptors, using the popular SIFT descriptor.

7.4.1.2 Query Phase

Similar to the training stage, in the query phase the proposed system takes an image as the query image and converts it to grayscale. Then the corner locations are detected using the chord-based corner detectors we propose. Following the training stage, we select the interest regions and build the descriptors using the SIFT algorithm. The descriptors are then assigned to the clusters of the visual vocabulary built with the K-means algorithm. Next, we calculate the histogram for each image. Finally, we compute the distance between the histogram of the query image and the histograms of the testing images in the visual dictionary using the Euclidean distance. For each corner location, the shortest histogram distance is chosen and accumulated over all the corner points in the query image. In this way, the minimum distance from the query image to a testing image can be calculated. We repeat the process for every entry of the visual dictionary. Once the process is complete, the query image has the 10 lowest distances to each group, representing the distance of the query image. We then select the smallest of these minimum distances and classify the query image into the group with the smallest distance. When a query finds its matching group, its features are also mapped to the codebook. The extracted interest points and descriptors are then grouped into visual words by calculating the Euclidean distance and selecting the smallest distance from each interest point to each visual word. Then we generate another histogram that shows how many interest points are grouped under each visual word. This means that the query image has its own BoVW histogram. The BoVW histogram of the query image is then matched against the training images of the same matching group in order to return the best matches.
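The ranking step of the query phase can be sketched as follows, assuming every training image has already been encoded as a BoVW histogram; the top-10 cut-off mirrors the description above.

import numpy as np

def retrieve(query_hist, train_hists, top=10):
    """train_hists: (n_images, k) array of training BoVW histograms.
    Returns the indices and distances of the `top` most similar images."""
    dists = np.linalg.norm(train_hists - query_hist, axis=1)
    ranked = np.argsort(dists)          # ascending: most similar first
    return ranked[:top], dists[ranked[:top]]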

7.4.2 Results

We have used two standard datasets to evaluate the performance of the different corner detectors for image matching. In Chapter 6, we compared the performance mainly using the chord-based corner detectors along with a non-chord corner detector (i.e., the Harris-Laplace detector). In this Chapter, for the performance evaluation, we only consider the chord-based corner detectors. The datasets and the results are described next.

7.4.2.1 Dataset 1: Corel 1000 dataset

The Corel-1000 dataset [201] consists of 1000 natural images that are categorized into ten different classes. Each class contains 100 images, and the resolution of each image is 256×384 or 384×256. Some sample images from Corel-1000 are shown in Figure 7.6. To find out the impact of the size of the codebooks on the individual features, we conduct our experiments by varying the codebook sizes for different features. 50% of the images are used as training images and the other 50% as query/test images. For the performance evaluation, we have used different vocabulary sizes (500,

Figure 7.6: Examples of Corel1000 dataset

Category   500 Words   1000 Words   1500 Words
Africa        68.9        64.4         57.9
Bus           76.2        79.8         70.2
Beach         36.4        34.8         31.8
Building      35.2        43.0         42.8
Dinosaur      85.9        91.4         86.8
Elephant      59.6        27.8         23.4
Flower        85.2        83.8         89.8
Food          60.4        55.2         61.2
Horse         78.4        79.8         70.4
Mountain      49.5        55.2         56.2

Table 7.1: Average precision of the retrieval results for varying codebook sizes constructed for the MSCAD detector (Corel dataset)

1000 and 1500) to observe the image matching performance of the different corner detectors. Figures 7.7 and 7.8 show the retrieval results for an example query image. The results clearly show that our proposed MSCAD corner detector gives considerably better retrieval performance compared to the CPDA and CCR corner detectors. Table 7.1 shows the Mean Average Precision of the retrieval results using the MSCAD

(a) MSCAD

(b) SCCPDA

(c) CCSR

Figure 7.7: Example of matched flower images using different corner detectors of Corel 1000 dataset.

(d) CPDA

(e) CCR

Figure 7.7: Example of matched flower images using different corner detectors of Corel 1000 dataset (Continued).

(a) MSCAD

(b) SCCPDA

(c) CCSR

Figure 7.8: Example of matched bus images using different corner detectors of Corel 1000 dataset.

(d) CPDA

(e) CCR

Figure 7.8: Example of matched bus images using different corner detectors of Corel 1000 dataset (Continued).

corner detector, using the top 10 retrieved images with a varying number of words in the codebook. Images are represented with histograms built from those different codebooks. Similarly, Tables 7.2, 7.3, 7.4 and 7.5 show the Mean Average Precision of the retrieval results using the CCR, CPDA, CCSR and SCCPDA corner detectors respectively. From Tables 7.1, 7.2, 7.3, 7.4 and 7.5, it can be seen that the histograms built with a codebook size of 500 words with SIFT feature descriptors have better retrieval performance than the other codebooks. However, in the case of the SIFT descriptor, the histogram with 1500 words also provides performance comparable to that of 500 words. Figure 7.10 shows the number of matched images for different categories of images using three different corner detectors with a codebook size of 500 words. The result again indicates the superiority of our proposed MSCAD detector compared to the other detectors in retrieving relevant images for most of the categories. We have evaluated the image matching performance of our BoVW-based framework with different corner detectors using different vocabulary sizes (500, 1000, 1500, 2000). We have calculated the Mean Average Precision (MAP) using the top 10 retrieved images. The result is shown in Figure 7.9.

Category   500 Words   1000 Words   1500 Words
Africa        60.6        61.4         59.1
Bus           63.6        78.6         67.0
Beach         37.6        25.0         32.8
Building      26.4        30.8         26.4
Dinosaur      73.4        89.1         71.4
Elephant      55.4        28.6         23.8
Flower        84.8        80.2         75.6
Food          60.2        53.2         57.0
Horse         74.6        71.8         68.6
Mountain      45.9        51.5         45.6

Table 7.2: Average precision of the retrieval results for varying codebook sizes constructed for the CCR detector (Corel dataset)

Category   500 Words   1000 Words   1500 Words
Africa        58.9        65.2         59.7
Bus           68.2        79.8         76.6
Beach         37.2        43.2         35.6
Building      36.8        22.8         25.6
Dinosaur      85.4        81.2         77.2
Elephant      41.6        17.6         21.2
Flower        85.1        81.65        86.4
Food          55.4        50.6         56.6
Horse         75.0        62.8         68.0
Mountain      35.9        45.2         46.2

Table 7.3: Average precision of the retrieval results for varying codebook sizes constructed for the CPDA detector (Corel dataset)

Category   500 Words   1000 Words   1500 Words
Africa        58.9        63.7         52.9
Bus           71.6        74.8         68.2
Beach         36.2        44.5         41.6
Building      24.7        27.8         26.1
Dinosaur      89.1        82.2         91.8
Elephant      43.2        23.6         20.3
Flower        83.0        87.8         88.1
Food          59.5        60.5         58.9
Horse         72.5        69.8         71.4
Mountain      49.5        40.05        45.30

Table 7.4: Average precision of the retrieval results for varying codebook sizes constructed for the CCSR detector (Corel dataset)

7.4.2.2 Dataset 2: PASCAL VOC 2007 dataset

We have used the PASCAL VOC 2007 dataset, which consists of around 5000 images. To find out the impact of the size of the codebooks on the individual features, we conduct our experiments by varying the codebook sizes for different features. Half of the images are used as training images and the other half as query/test images. As mentioned earlier, we have used the k-means clustering algorithm to form the bags of visual words, or codebooks. Some sample images from the PASCAL VOC 2007 dataset are shown in Figure 7.11. Table 7.6 shows the Mean Average Precision of the retrieval results using the MSCAD corner detector when the top 10 images are retrieved with a varying number of words in the codebook.

Category   500 Words   1000 Words   1500 Words
Africa        56.9        61.4         57.3
Bus           70.6        64.8         73.2
Beach         35.6        41.8         31.2
Building      27.6        27.8         24.6
Dinosaur      83.4        81.1         90.1
Elephant      40.5        21.8         23.2
Flower        82.8        82.0         87.5
Food          57.2        52.3         54.7
Horse         72.6        65.1         63.6
Mountain      40.5        45.4         51.2

Table 7.5: Average precision of the retrieval results for varying codebook sizes constructed for the SCCPDA detector (Corel dataset)

Figure 7.9: Performance analysis of CPDA, CCR, MSCAD, CCSR and SCCPDA corner detectors using the proposed BOVW-based framework on the Corel-1000 dataset

Codebooks of size 500, 1000 and 1500 words are considered, and images are represented with histograms built from them. Similarly, Tables 7.7 and 7.8 show the Mean Average Precision of the retrieval results using the CCR and CPDA corner detectors respectively.

Figure 7.10: Number of retrieved images of different categories using different corner detectors (Corel dataset)

Category     500 Words   1000 Words   1500 Words
Car             36.79       25.32        21.47
Horse           14.26       11.62        13.31
Background      42.21       42.43        42.31

Table 7.6: Mean Average Precision (MAP%) of the retrieval results for varying codebook sizes using the MSCAD corner detector (PASCAL VOC 2007 dataset)

Category     500 Words   1000 Words   1500 Words
Car             25.53       24.17        24.44
Horse           13.72       11.69        12.03
Background      40.78       42.12        41.82

Table 7.7: Mean Average Precision (MAP%) of the retrieval results for varying codebook sizes using the CCR corner detector (PASCAL VOC 2007 dataset)

From Tables 7.6, 7.7 and 7.8, it can be seen that the histograms built with a codebook size of 500 words with SIFT feature descriptors have better retrieval performance than

Figure 7.11: Examples of PASCAL VOC 2007 dataset

Category     500 Words   1000 Words   1500 Words
Car             26.24       25.38        21.29
Horse           13.78        8.85         8.65
Background      40.71       42.41        41.38

Table 7.8: Mean Average Precision (MAP%) of the retrieval results for varying codebook sizes using the CPDA corner detector (PASCAL VOC 2007 dataset)

the other codebooks. However, in the case of the SIFT descriptor, the histogram with a codebook of 1500 words also provides performance comparable to that of the codebook of 500 words. Figure 7.12 shows the retrieval results for an example query image. From the retrieved results, it is easily noticeable that our proposed detectors give considerably better retrieval performance. To observe the performance of the corner detectors, we calculate the mean average precision over the sets of training and query images of the dataset. Examples of the average number of images retrieved by the different corner detectors for the classes 'Horse' and 'Car' are depicted in Figures 7.13 and 7.14, which use different numbers of codebooks.

(a) MSCAD

(b) SCCPDA

(c) CCSR

Figure 7.12: Example of matched car images using different corner detectors of PASCAL VOC 2007 dataset with query image at the top.

(d) CPDA

(e) CCR

Figure 7.12: Example of matched car images using different corner detectors of PASCAL VOC 2007 dataset with query image at the top (Continued).

Figure 7.13: Mean Average Precision (MAP%) for class 'Car' images using different corner detectors

The Mean Average Precision of these two classes is presented in Figures 7.15 and 7.16. The results show the efficiency of our proposed MSCAD detector, whose precision is competitive with the other detectors when images are retrieved. In our experiments, we have used the CPDA method with its default threshold of 0.2 in all cases. To observe the performance of the CPDA method compared to our proposed method, we have conducted another experiment. The main objective of this experiment is to observe the performance of the CPDA method using different thresholds. The key question is: can CPDA perform better than our proposed corner detectors if we choose a different threshold rather than the default value? To answer this question, we have chosen two further threshold values which were used in other popular contour-based corner detectors: 0.24 and 0.989, used in CTAR and in our proposed SCCPDA corner detector respectively, along with the default threshold of 0.2. We have calculated the mean average precision (MAP%) on the dataset. Figure 7.17 shows the mean average precision results, which clearly indicate that, despite using different thresholds, our proposed corner detector performs better than CPDA.

Figure 7.14: Mean Average Precision (MAP%) for class 'Horse' images using different corner detectors

We have also calculated the number of matched images from the dataset using the different threshold values and observed the results. Again, from Figure 7.18, it is clearly observable that our proposed detector performs better in retrieving matched images. While conducting this experiment, we also noticed that CPDA actually performs better in finding images when the threshold is high. Figure 7.19 shows the mean average precision results of the different corner detectors over all the classes of the PASCAL VOC 2007 dataset, using our proposed Bag-of-Visual-Words-based image matching framework discussed in Section 7.3. The resulting Mean Average Precision (MAP%) clearly indicates that our proposed MSCAD detector gives the best results. We observe from the experimental results that the process cannot give 100% accuracy in matching true images. An example is depicted in Figure 7.20: the result includes a non-relevant image (see the second image of the second row in Figure 7.20). To overcome this problem, we can combine our process with color features.

Figure 7.15: Number of matched images for class ’Car’ images using different corner detectors

Figure 7.16: Number of matched images for class 'Horse' images using different corner detectors

Figure 7.17: Performance of the CPDA method using different thresholds

Figure 7.18: Number of matched images using the CPDA method with various thresholds, along with the proposed MSCAD detector

Figure 7.19: Performance analysis of CPDA, CCR and MSCAD corner detectors using the proposed BOVW-based framework on the PASCAL VOC 2007 dataset

The color feature is generally represented with a histogram computed in the HSV color space, quantized into 16 bins of hue, 4 bins of saturation and 4 bins of value. If we combine the whole process with color features along with the corner features, the accuracy of the matching results will increase and the non-relevant images will be eliminated.
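A minimal sketch of this suggested color feature, assuming OpenCV and a BGR input image: an HSV histogram quantized into 16 hue, 4 saturation and 4 value bins (16 × 4 × 4 = 256 bins), which could then be concatenated with the corner-based BoVW histogram.

import cv2
import numpy as np

def hsv_histogram(bgr_image):
    """16 x 4 x 4 HSV histogram, normalised for size-independent comparison."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    # OpenCV stores hue in [0, 180); saturation and value in [0, 256).
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [16, 4, 4],
                        [0, 180, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / hist.sum()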

Figure 7.20: Example of retrieved image using the proposed framework

7.5 Conclusion

In this Chapter, a corner matching application was presented using the Bag-of-Visual-Words (BoVW) approach. The main aim of this approach is to observe the effectiveness of the corner detection algorithms in the image matching process. A combined framework using the BoVW model with our proposed corner detectors and interest region estimation method, along with the SIFT descriptor, was used for image retrieval. We observe that the average precision of the top retrieved results using our proposed corner detectors is better compared to the others, indicating the effectiveness of the method.

Chapter 8

Conclusion and Future Work

This Chapter summarizes the overall research work outlined in this thesis, highlights the achievements, and outlines a few potential directions for future work.

8.1 Summary of the Research Findings

The main goal of this thesis is to improve the effectiveness and efficiency of the image matching approach by finding the corresponding corner locations between images. In this regard, we have proposed three effective and efficient contour-based corner detectors which detect stable corner locations in images. We have also analyzed the role of edge detectors in the corner detection process to choose the best detector for better performance. To effectively match images based on these detected corners, an effective interest region estimation method has been proposed, which can be used to build any robust descriptor to represent corner features. Finally, we adapted the BoVW model in an image matching application to observe the performance of different contour-based corner detectors for image matching. The major findings from our contributions are described as follows.

1. Corner Detection
We have analyzed a number of contour-based corner detectors and found some drawbacks in them. To overcome these problems, we have proposed three contour-based corner detectors, as follows:
i) Single Chord-based Corner Detection using the Chord-to-Point Distance Accumulation Technique


We have analyzed the Chord-to-Point Distance Accumulation (CPDA) detector, as several researchers regard this method as one of the best contour-based corner detectors. We have found some limitations of this detector: it is unable to detect some robust corner locations, and it also detects non-corner locations. All the limitations are discussed in Chapter 3. To overcome these limitations, we have proposed a single chord-based corner detector named Single Chord CPDA (SCCPDA). The original CPDA method uses three chords of length 10, 20 and 30. Instead of using multiple chords, we modified the CPDA process to use a single chord Li, i ∈ [5, 30], where i has been determined experimentally. This new modified version discards the calculation of the distance accumulation of the three chords of the CPDA method, resulting in better average repeatability and efficiency.

ii) Single Chord-based Corner Detection using the Chord to Cumulative Sum Ratio Technique
We have proposed another single chord-based approach to detect corners using a measurement of the flatness of a curve (detailed in Chapter 3). As different curves have different flatness, we observe that the ratio of the chord to the curve length gives a measure of the bentness of a curve, indicating the possibility of a corner location. Based on this hypothesis, we have proposed a corner detector which calculates the cumulative sum of the distances of each point of a curve to the chord; the point with the longest distance ratio is considered the potential corner location, as sketched below. We named this detector the Chord-to-Cumulative-Sum-Ratio (CCSR) detector. This detector detects more robust corners with lower localization error compared to other existing contour-based corner detectors.
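A rough sketch of this bentness measure follows (the exact CCSR formulation is given in Chapter 3): a chord spans L curve points, the perpendicular distances of the covered points to the chord are accumulated, and the ratio of this cumulative sum to the chord length peaks near a corner. The curve array and L are assumed inputs, and i + L is assumed to stay within the curve.

import numpy as np

def bentness(curve, i, L):
    """curve: (n, 2) array of ordered edge points; i: chord start index."""
    a, b = np.asarray(curve[i], float), np.asarray(curve[i + L], float)
    dx, dy = b - a
    chord_len = np.hypot(dx, dy)
    # Perpendicular distance of each covered point to the chord line.
    total = sum(abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / chord_len
                for p in curve[i + 1:i + L])
    return total / chord_len

# Candidate corners are curve points where this ratio is a local maximum.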

iii) Corner Detection based on Multiple Chords using Accumulated Distance
In Chapter 4, we have proposed a multi-chord corner detector named the Multiple Single-Chord Accumulated Distance (MSCAD) detector. This detector first uses a corner model based on two parameters, a chord length and an angle. We then propose Single Chord Accumulated Distance (SCAD), a chord-based corner detector which uses the corner model to calculate a suitable threshold, which in turn is used to eliminate weak and false corners. From a set of exhaustive experiments, we derive four

equations, each of which can be used to estimate a suitable angle for a particular chord length, thus eliminating the need to manually specify one of the two parameters of the corner model. As different chord lengths can result in identifying different corner locations, we outline a method that uses multiple chord lengths, yielding Multiple SCAD (MSCAD). In a qualitative comparison, the MSCAD detector consistently identifies more visually obvious corner locations and repeatedly finds corners under different transformations, compared to both CPDA and CCR, thus showing better performance than the state-of-the-art techniques.

2. Performance Analysis of Edge Detectors
Contour-based corner detection methods are particularly interesting as they rely on edges detected from an image, and for such corner detectors, edge detection is the first step. Almost all the contour-based corner detectors proposed in the last few years use the Canny edge detector, yet there was no comparative study exploring the effect of using different edge detection methods on the performance of these corner detectors. In Chapter 5, we have carried out a performance analysis of different contour-based corner detectors when using different edge detectors. We studied four recently developed corner detectors, which are considered the current state-of-the-art, and found that the Canny edge detector should not be taken as a default choice; in fact, the choice of edge detector can have a profound effect on the corner detection performance.

3. Interest Region Estimation
Since the contour-based corner detectors do not derive any information for estimating the interest regions of the detected corner locations, it is not directly possible to build a local feature descriptor to represent the corners. We also did not find any existing interest region estimation method or local feature descriptor that can be used directly on the detected corner locations in order to compare the corner locations between two images. After reviewing existing interest region estimation methods for intensity-based feature detectors, we find that the estimation is usually done using a multi-scale image representation, e.g. scale space. However, using scale space is very expensive in terms of computation and, most importantly, it is not compatible with contour-based corner detection. Therefore, we proposed an interest region estimation method for corner locations detected by contour-based detectors. The general approach to calculating the interest region is to choose the curvature maxima of the edges or contours. The resulting region after the estimation needs to be defined well enough to represent the corner, so that the corresponding corner in the transformed image can be estimated with the same content. We used a series of circular regions and made histograms for all the circles using the pixels' gradient information. Next, we established the dissimilarities among the circular regions at the corresponding corner locations. Consequently, we selected only those regions with a distinct image content or structure. Later, we built the feature descriptor using the pixels of the interest regions to represent the corner locations. According to the experimental study described in Chapter 6, our proposed estimation method estimates the same interest regions for corner locations which are very close to each other. The estimation method is also invariant under different image transformations such as scale, rotation, a combination of scale and rotation, non-uniform scale, shear and different levels of JPEG compression. The proposed method can be applied to the features detected by any type of detector to estimate the interest regions of the features in order to build the feature descriptors.

4. Application
In Chapter 7, to observe the performance of the corner detectors for image matching, we have adapted the popular bag-of-visual-words (BoVW) model, which uses our proposed corner detection methods (discussed in Chapters 3 and 4) along with the proposed interest region estimation method (discussed in Chapter 6) to build descriptors using the SIFT descriptor. We then quantize the descriptors into clusters using the K-means clustering algorithm. These clusters are known as visual words, and the center of a cluster is referred to as its centroid. Next, we calculate the distance of a descriptor from the centroid of each cluster. The descriptor is then added to the cluster with the lowest distance from its centroid. Next, we compute the occurrence frequency of the visual words in a vector and then generate the codebook using the vector histogram. To match images similar to the query image, the search is conducted through its representative visual code words. We calculate the similarity between the vector of the query image and those of the training dataset: we compute the Euclidean distance between the training and query image descriptors using a threshold to retrieve similar images efficiently and accurately.

8.2 Future Work

There are a few future research directions which can be developed to further improve our work in this thesis. Some potential directions are discussed below.

1. Faster Corner Detection: The current implementation of the contour-based corner detectors discussed in Chapters 3 and 4 estimates the discrete curvature at all the points of a curve and therefore runs slowly. In future, we can make the process faster, for example by estimating the curvature only at a selected set of curve points and observing the resulting performance.

2. Elliptical Interest Region Estimation: Our proposed interest region estimation method outlined in Chapter 6 uses circular regions to select the pixels around the corner location. However, a few researchers [126, 194] use elliptical regions to describe the feature, as elliptical regions make the features more invariant against affine transformations. For contour-based corner detectors, we could estimate the elliptical region from the angle and edges to make the estimation more invariant against affine transformations.

3. Combine Other Features for Image Matching: In Chapter 7, we have used the corner features for image matching. However, single feature-based matching cannot give accurate results under all conditions. To improve the performance of image matching, we can combine our method with color features, as they are robust, stable and, more importantly, invariant to different image transformations such as rotation and scaling. The combined corner and color features can be used to build a multidimensional feature vector histogram, which can be used in similarity measurement to match images accurately. To improve the result further, we can combine other features like shape and edge features.

4. Robust Matching Technique: Matching images based on features is robust with respect to image transformations, and new robust dissimilarity metrics can always improve the performance. The study of dissimilarity techniques and spatial-based image matching is a wide research area, which would benefit from more work aiming to introduce innovation.

5. Applications: The proposed corner detectors and interest region estimation methods could be used in other applications in future, for example mobile robot vision, panorama stitching, video sequence matching, video data mining, and face and gesture recognition.

Bibliography

[1] A. Abdel-Hakim and A. Farag. CSIFT: A SIFT descriptor with color invariant characteristics. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 1978–1983, 2006. 44

[2] Y. Abdeljaoued and T. Ebrahimi. Feature point extraction using scale-space representation. In Image Processing, 2004. ICIP '04. 2004 International Conference on, volume 5, pages 3053–3056, Oct. 2004. 50

[3] N. Afrin and W. Lai. Single chord-based corner detectors on planar curves. 2015. 8, 53, 61, 62, 67, 72, 142, 147, 157

[4] N. Afrin and W. Lai. Effective interest region estimation model to represent corners for image. Signal and Image Processing : An International Journal (SIPIJ), 9(6), 2018. 9, 130, 140, 158

[5] N. Afrin, N. Mohammed, and W. Lai. Performance analysis of corner detection algorithms based on edge detectors. In Computer Graphics, Visualization and Computer Vision, 25th International Conference on, May 2017. 9, 25, 108

[6] N. Afrin, N. Mohammed, and W. Lai. An effective multi-chord corner detection technique. In Digital Image Computing: Techniques and Applications (DICTA), 2016 International Conference on, pages 1–8. IEEE, 2016. 8, 32, 37, 48, 85, 87, 141, 142, 147, 157

[7] M. Agrawal and K. Konolige. CenSurE: Center surround extremas for realtime feature detection and matching. In ECCV, 2008. 18, 42

[8] A. Alahi, R. Ortiz, and P. Vandergheynst. FREAK: Fast retina keypoint. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 510–517. IEEE, 2012. 41

[9] P. F. Alcantarilla, J. Nuevo, and A. Bartoli. Fast explicit diffusion for accelerated features in nonlinear scale spaces. IEEE Trans. Patt. Anal. Mach. Intell, 34(7):1281–1298, 2011. 41


[10] N. Ali, K. B. Bajwa, R. Sablatnig, S. A. Chatzichristofis, Z. Iqbal, M. Rashid, and H. A. Habib. A novel image retrieval based on visual words integration of sift and surf. PloS one, 11(6):e0157428, 2016. 49

[11] M. Alkhawlani, M. Elmogy, and H. Elbakry. Content-based image retrieval using local features descriptors and bag-of-visual words. Int. J. Adv. Comput. Sci. Appl. (IJACSA), 6(9), 2015. 49

[12] A. Alzu'bi, A. Amira, and N. Ramzan. Semantic content-based image retrieval: A comprehensive study. Journal of Visual Communication and Image Representation, 32:20–54, 2015. 47

[13] V. Anh, J. Y. Shi, and H. T. Tsui. Scaling theorems for zero crossings of bandlimited signals. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 18(3):309–320, Mar. 1996. 135

[14] N. Ansari and K.-W. Huang. Non-parametric dominant point detection. Pattern Recognition, 24(9):849–862, 1991. 29

[15] F. Arrebola and F. Sandoval. Corner detection and curve segmentation by multiresolution chain-code linking. Pattern Recogn., 38(10):1596–1614, Oct. 2005. 30

[16] M. Avlash and D. L. Kaur. Performances analysis of different edge detection methods on road images. International Journal of Advanced Research in Engineering and Applied Sciences, 2(6), 2013. 116, 120

[17] M. Awrangjeb and G. Lu. Robust image corner detection based on the chord-to-point distance accumulation technique. Multimedia, IEEE Transactions on, 10(6):1059–1072, 2008. xiv, 5, 16, 25, 27, 28, 29, 30, 31, 37, 48, 49, 50, 53, 55, 61, 62, 63, 64, 69, 70, 72, 77, 85, 89, 108, 118, 119, 120, 142

[18] M. Awrangjeb and G. Lu. A robust corner matching technique. In Multimedia and Expo, 2007 IEEE International Conference on, pages 1483 –1486, july 2007. 39

[19] M. Awrangjeb and G. Lu. An improved curvature scale-space corner detector and a robust corner matching approach for transformed image identification. Image Processing, IEEE Transactions on, 17(12):2425 –2441, dec. 2008. 27

[20] M. Awrangjeb and G. Lu. Techniques for efficient and effective transformed image identification. J. Vis. Commun. Image Represent., 20(8):511–520, Nov. 2009. 37, 39, 48, 142

[21] M. Awrangjeb, G. Lu, and C. Fraser. A comparative study on contour-based corner detectors. In Digital Image Computing: Techniques and Applications (DICTA), 2010 International Conference on, pages 92–99, Dec. 2010. 25, 31

[22] M. Awrangjeb, G. Lu, and C. S. Fraser. Performance comparisons of contour-based corner detectors. Image Processing, IEEE Transactions on, 21(9):4167–4179, Sept. 2012. 25, 31

[23] M. Awrangjeb, G. Lu, C. S. Fraser, and M. Ravanbakhsh. A fast corner detector based on the chord-to-point distance accumulation technique. In Proceedings of the 2009 Digital Image Computing: Techniques and Applications, DICTA ’09, pages 519–525, Washington, DC, USA, 2009. IEEE Computer Society. 5, 16, 19, 25, 30, 32, 72, 108, 119, 142

[24] M. Awrangjeb, G. Lu, and M. Murshed. An affine resilient curvature scale-space corner detector. In Acoustics, Speech and Signal Processing (ICASSP), 2007 IEEE International Conference on, volume 1, pages I-1233–I-1236, Apr. 2007. 5, 16, 21, 25, 28, 30, 40, 49, 53, 62, 69, 72, 77, 89, 96, 108, 118, 119, 142

[25] J. Babaud, A. P. Witkin, M. Baudin, and R. O. Duda. Uniqueness of the gaussian kernel for scale-space filtering. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PAMI-8(1):26 –33, jan. 1986. 20, 23

[26] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool. Speeded-up robust features (surf). Comput. Vis. Image Underst., 110(3):346–359, June 2008. 38

[27] H. Bay, T. Tuytelaars, and L. Van Gool. Surf: Speeded up robust features. In European conference on computer vision, pages 404–417. Springer, 2006. 2, 13, 17, 18, 42, 48, 156

[28] P. R. Beaudet. Rotationally invariant image operators. In Proc. 4th Int. Joint Conf. Pattern Recog., Tokyo, Japan, 1978. 21

[29] J. S. Beis and D. G. Lowe. Indexing without invariants in 3D object recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21:1000–1015, 1999. 46

[30] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 24(4):509–522, Apr. 2002. 43

[31] M. Brown, R. Szeliski, and S. Winder. Multi-image matching using multi-scale oriented patches. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 510 – 517 vol. 1, june 2005. 46

[32] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. Brief: binary robust independent elementary features. In Proceedings of the 11th European conference on Computer vision: Part IV, ECCV'10, pages 778–792, Berlin, Heidelberg, 2010. Springer-Verlag. 40

[33] J. Canny. A computational approach to edge detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PAMI-8(6):679 –698, nov. 1986. 15, 16, 20, 54, 62, 116, 120, 140

[34] A. Carmona-Poyato, N. Fernández-García, R. Medina-Carnicer, and F. Madrid-Cuevas. Dominant point detection: A new proposal. Image and Vision Computing, 23(13):1226–1236, 2005. 24, 27, 29, 49

[35] V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod. Chog: Compressed histogram of gradients: A low bit-rate feature descriptor. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2504–2511. IEEE, 2009. 43

[36] C.-H. Chen, J.-S. Lee, and Y.-N. Sun. Wavelet transformation for gray-level corner detection. Pattern Recognition, 28(6):853–861, 1995. 24

[37] S. Chen, H. Meng, C. Zhang, and C. Liu. A kd curvature based corner detector. Neurocomputing, 173:434–441, 2016. 32

[38] X. C. He and N. H. C. Yung. Corner detector based on global and local curvature properties. Optical Engineering, 47(5), 2008. 55, 96

[39] F.-H. Cheng and W.-H. Hsu. Parallel algorithm for corner finding on digital curves. Pattern Recognition Letters, 8(1):47 – 53, 1988. 16, 27, 33, 49

[40] P. Cornic. Another look at the dominant point detection of digital curves. Pattern Recogn. Lett., 18(1):13–25, Jan. 1997. 24, 28

[41] C. Cui and K. Ngan. Automatic scale selection for corners and junctions. In Image Processing (ICIP), 2009 16th IEEE International Conference on, pages 989–992, Nov. 2009. 34

[42] C. Cui and K. N. Ngan. Scale- and affine-invariant fan feature. Image Processing, IEEE Transactions on, 20(6):1627 –1640, june 2011. 34

[43] R. Datta, D. Joshi, J. Li, and J. Z. Wang. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys (Csur), 40(2):5, 2008. 13, 47

[44] D. C. Munson, Jr. A note on Lena. IEEE Transactions on Image Processing, 5(1), Jan. 1996. xiv, 55, 56, 58

[45] H. Dehghan. Zero-crossing contour construction for scale-space filtering. Volume 2, pages 1479–1483, Nov. 1997. 135

[46] L. Dreschler and H.-H. Nagel. Volumetric model and 3d trajectory of a moving car derived from monocular tv frame sequences of a street scene. Computer Graphics and Image Processing, 20(3):199–228, 1982. 21

[47] C. Duan, X. Meng, C. Tu, and C. Yang. How to make local image features more efficient and distinctive. Computer Vision, IET, 2(3):178 –189, september 2008. 43

[48] M. Ebrahimi and W. Mayol-Cuevas. Susure: Speeded up surround extrema feature detector and descriptor for realtime applications. In Computer Vision and Pattern Recognition Workshops, 2009. CVPR Workshops 2009. IEEE Computer Society Conference on, pages 9 –14, june 2009. 18, 42, 43

[49] S. Ehsan, N. Kanwal, A. F. Clark, and K. D. McDonald-Maier. Improved repeatability measures for evaluating performance of feature detectors. Electronics Letters, 46(14):998–1000, 2010. 19

[50] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The pascal visual object classes (voc) challenge. International Journal of Computer Vision, 88(2):303–338, 2010. 160

[51] F. Faille. Adapting interest point detection to illumination conditions. In Digital Image Computing: Techniques and Applications (DICTA), pages 499–508, 2003. 22

[52] B. Fan, F. Wu, and Z. Hu. Aggregating gradient distributions into intensity orders: A novel local image descriptor. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2377–2384. IEEE, 2011. 41

[53] B. Fan, F. Wu, and Z. Hu. Aggregating gradient distributions into intensity orders: A novel local image descriptor. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 2377–2384, June 2011. 41, 156

[54] C. T. Ferraz, O. Pereira, M. V. Rosa, and A. Gonzaga. Object recognition based on bag of features and a new local pattern descriptor. International Journal of Pattern Recognition and Artificial Intelligence, 28(08):1455010, 2014. 49

[55] P. J. Flynn and A. K. Jain. On reliable curvature estimation. In CVPR, volume 88, pages 5–9, 1989. 33

[56] X. Gao, F. Sattar, A. Quddus, and R. Venkateswarlu. Corner detection of contour images using continuous wavelet transform. In Information, Communications and Signal Processing, 2003 and Fourth Pacific Rim Conference on Multimedia. Proceedings of the 2003 Joint Conference of the Fourth International Conference on, volume 2, pages 724–728, Dec. 2003. 30

[57] X. Gao, F. Sattar, A. Quddus, and R. Venkateswarlu. Multiscale contour corner detection based on local natural scale and wavelet transform. Image and Vision Computing, 25(6):890 – 898, 2007. 33

[58] X. Gao, F. Sattar, and R. Venkateswarlu. Multiscale corner detection of gray level images based on log-gabor wavelet transform. Circuits and Systems for Video Technology, IEEE Transactions on, 17(7):868 –875, july 2007. 24

[59] X. Gao, W. Zhang, F. Sattar, R. Venkateswarlu, and E. Sung. Scale-space based corner detection of gray level images using plessey operator. In Information, Communications and Signal Processing, 2005 Fifth International Conference on, pages 683–687. IEEE, 2005. 20

[60] Y. Gao, W. Huang, and Y. Qiao. Local multi-grouped binary descriptor with ring-based pooling configuration and optimization. IEEE Transactions on Image Processing, 24(12):4820–4833, 2015. 41

[61] A. Garrido, N. P. de la Blanca, and M. García-Silvente. Boundary simplification using a multiscale dominant-point detection algorithm. Pattern Recognition, 31(6):791–804, 1998. 30

[62] S. Gauglitz, T. Höllerer, and M. Turk. Evaluation of interest point detectors and feature descriptors for visual tracking. International Journal of Computer Vision, 94(3):335, 2011. 19

[63] M. Gevrekci and B. Gunturk. Reliable interest point detection under large illumination variations. In Image Processing, 2008. ICIP 2008. 15th IEEE International Conference on, pages 869–872, Oct. 2008. 22

[64] M. Gevrekci and B. K. Gunturk. Illumination robust interest point detection. Computer Vision and Image Understanding, 113(4):565 – 571, 2009. 22

[65] D. Giveki, M. A. Soltanshahi, and G. A. Montazer. A new image feature descriptor for content based image retrieval using scale invariant feature transform and local derivative pattern. Optik-International Journal for Light and Electron Optics, 131:242–254, 2017. 43

[66] G. G. Gordon. Face recognition based on depth and curvature features. In Computer Vision and Pattern Recognition, 1992. Proceedings CVPR’92., 1992 IEEE Computer Society Conference on, pages 808–810. IEEE, 1992. 16

[67] M. Grabner, H. Grabner, and H. Bischof. Fast approximated sift. In 7th Asian Conference on Computer Vision, pages 918–927, 2006. 18, 42

[68] T. Gritti, C. Shan, V. Jeanne, and R. Braspenning. Local features based facial expression recognition with face registration errors. In Automatic Face Gesture Recognition, 2008. FG ’08. 8th IEEE International Conference on, pages 1 –8, sept. 2008. 44

[69] A. Guiducci. Corner characterization by differential geometry techniques. Pattern Recogn. Lett., 8(5):311–318, Dec. 1988. 22

[70] A. Hafiane, G. Seetharaman, and B. Zavidovique. Median binary pattern for textures classification. In International Conference Image Analysis and Recognition, pages 387–398. Springer, 2007. 44

[71] J. H. Han and T. Poston. Distance accumulation and planar curvature. In Computer Vision, 1993. Proceedings., Fourth International Conference on, pages 487–491. IEEE, 1993. 86

[72] J. H. Han and T. Poston. Chord-to-point distance accumulation and planar curvature: a new approach to discrete curvature. Pattern Recognition Letters, 22(10):1133–1144, 2001. 31

[73] C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, pages 147–151, 1988. 19, 20, 21, 35, 36, 48, 138, 147

[74] J. A. Hartigan and M. A. Wong. Algorithm as 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108, 1979. 156

[75] D. He and N. Cercone. Local triplet pattern for content-based image retrieval. In Proceedings of the 6th International Conference on Image Analysis and Recognition, ICIAR ’09, pages 229–238, 2009. 44

[76] X. C. He and N. H. C. Yung. Curvature scale space corner detector with adaptive threshold and dynamic region of support. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), pages 791–794, 2004. 5, 21, 29, 30, 31, 53, 72, 77

[77] M. Heikkilä, M. Pietikäinen, and C. Schmid. Description of interest regions with local binary patterns. Pattern Recogn., 42(3):425–436, Mar. 2009. 44

[78] S. Hinz. Fast and subpixel precise blob detection and attribution. In Image Processing, 2005. ICIP 2005. IEEE International Conference on, volume 3, pages III-457–460, Sept. 2005. 15

[79] R. Hummel and R. Moniot. Reconstructions from zero crossings in scale space. Acoustics, Speech and Signal Processing, IEEE Transactions on, 37(12):2111 –2130, dec 1989. 135

[80] A. K. Jain and A. Vailaya. Image retrieval using color and shape. Pattern Recognition, 29(8):1233–1244, 1996. 13, 47

[81] A. Jalilvand, H. Boroujeni, and N. Charkari. Ch-sift: A local kernel color histogram sift based descriptor. In Multimedia Technology (ICMT), 2011 International Conference on, pages 6269–6272, July 2011. 44

[82] H. Jegou, H. Harzallah, and C. Schmid. A contextual dissimilarity measure for accurate and efficient image search. In Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on, pages 1–8. IEEE, 2007. 48

[83] S. Junding and Z. Zhaosheng. A new contour corner detector based on curvature scale space. In Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on, volume 5, pages 316–319, Aug. 2009. 30, 31

[84] I. C. d. P. Júnior, F. N. S. d. Medeiros, F. N. Bezerra, and D. M. Ushizima. Corner detection within a multiscale framework. 30, 33

[85] F. Jurie and B. Triggs. Creating efficient codebooks for visual recognition. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 1, pages 604–610. IEEE, 2005. 49, 154

[86] T. Kadir and M. Brady. Saliency, scale and image description. Int. J. Comput. Vision, 45(2):83–105, Nov. 2001. 17, 39

[87] T. Kadir, A. Zisserman, and M. Brady. An affine invariant salient region detector, 2004. 39

[88] B. Kamgar-Parsi and A. Rosenfeld. Optimally isotropic laplacian operator. IEEE transactions on image processing: a publication of the IEEE Signal Processing Society, 8(10):1467–1472, 1998. 114, 120

[89] Y. Ke and R. Sukthankar. Pca-sift: a more distinctive representation for local image descriptors. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 2, pages II-506–II-513, June 2004. 43

[90] B. Kerautret, J. O. Lachaud, and B. Naegel. Comparison of discrete curvature estimators and application to corner detection. In Proceedings of the 4th International Symposium on Advances in Visual Computing, ISVC '08, pages 710–719, 2008. 31

[91] J. Kim and K. Grauman. Boundary preserving dense local regions. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1553–1560. IEEE, 2011. 18

[92] J. Kim and K. Grauman. Boundary preserving dense local regions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(5):931–943, 2015. 36

[93] L. Kitchen and A. Rosenfeld. Gray-level corner detection. Pattern Recognition Letters, 1(2):95–102, 1982. 21

[94] J. J. Koenderink. The structure of images. Biological Cybernetics, 50(5):363–370, Aug. 1984. 20, 132

[95] A. Kolesnikov and P. Fränti. Polygonal approximation of closed discrete curves. Pattern Recogn., 40(4):1282–1293, Apr. 2007. 33

[96] H. Kong, H. C. Akakin, and S. E. Sarma. A generalized laplacian of gaussian filter for blob detection and its applications. IEEE transactions on cybernetics, 43(6):1719–1733, 2013. 17

[97] B. Kulis and K. Grauman. Kernelized locality-sensitive hashing. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(6):1092–1104, June 2012. 46

[98] J.-S. Lee, Y.-N. Sun, and C.-H. Chen. Boundary-based corner detection using wavelet transform. In Systems, Man and Cybernetics, 1993. ’Systems Engineering in the Service of Humans’, Conference Proceedings., International Conference on, pages 513 –516 vol.4, oct 1993. 30

[99] J.-S. Lee, Y.-N. Sun, and C.-H. Chen. Multiscale corner detection by using wavelet transform. Image Processing, IEEE Transactions on, 4(1):100 –104, jan 1995. 24

[100] J.-S. Lee, Y.-N. Sun, and C.-H. Chen. Multiscale corner detection by using wavelet transform. Image Processing, IEEE Transactions on, 4(1):100 –104, Jan. 1995. 33

[101] S. Leutenegger, M. Chli, and R. Y. Siegwart. Brisk: Binary robust invariant scalable keypoints. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2548–2555. IEEE, 2011. 40

[102] G. Levi and T. Hassner. Latch: learned arrangements of three patch codes. In Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on, pages 1–9. IEEE, 2016. 41

[103] D. Li, Y. Ke, and G. Zhang. A sift descriptor with local kernel color histograms. In Mechanic Automation and Control Engineering (MACE), 2011 Second International Conference on, pages 992–995, July 2011. 44

[104] W. Li, P. Dong, B. Xiao, and L. Zhou. Object recognition based on the region of interest and optimal bag of words model. Neurocomputing, 172:271–280, 2016. 49

[105] T. Lindeberg. Detecting salient blob-like image structures and their scales with a scale-space primal sketch: A method for focus-of-attention. International Journal of Computer Vision, 11:283–318, 1993. 16

[106] T. Lindeberg. Scale-Space Theory in Computer Vision. Kluwer international series in engineering and computer science. Springer, 1993. 135

[107] T. Lindeberg. Scale selection for differential operators. In Scale-Space Theory in Computer Vision, pages 317–348. Springer, 1994. 48

[108] T. Lindeberg. Scale-space theory: A basic tool for analysing structures at different scales. Journal of Applied Statistics, pages 224–270, 1994. 17, 132

[109] T. Lindeberg. Feature detection with automatic scale selection. Int. J. Comput. Vision, 30(2):79–116, Nov. 1998. 16, 17, 20, 23, 35, 36, 131, 135, 142

[110] T. Lindeberg. Principles for automatic scale selection. In Handbook on Computer Vision and Applications: volume II, Handbook on Computer Vision and Applications, pages 239–274. 1999. 135

[111] T. Lindeberg. Scale-space. 2007. 132, 133

[112] D. G. Lowe. Distinctive image features from scale-invariant key points. Int. J. Comput. Vision, 60(2):91–110, Nov. 2004. 2, 13, 17, 34, 36, 37, 41, 43, 48, 142, 144, 154, 156, 159

[113] B. Luo and D. Pycock. Unified multi-scale corner detection. In 4th IASTED International Conference on Visualisation, Imaging and Image Processing, 2004. 19

[114] R. Maini and H. Aggarwal. Study and comparison of various image edge detection techniques. International Journal of Image Processing (IJIP), 3(1):1–11, 2009. 111

[115] E. Mair, G. D. Hager, D. Burschka, M. Suppa, and G. Hirzinger. Adaptive and generic corner detection based on the accelerated segment test. In European conference on Computer vision, pages 183–196. Springer, 2010. 22

[116] N. S. Mansoori, M. Nejati, P. Razzaghi, and S. Samavi. Bag of visual words approach for image retrieval using color information. In Electrical Engineering (ICEE), 2013 21st Iranian Conference on, pages 1–6. IEEE, 2013. 49

[117] M. Marji and P. Siy. A new algorithm for dominant points detection and polygonization of digital curves. Pattern Recognition, 36(10):2239–2251, 2003. 28

[118] M. Marji and P. Siy. A new algorithm for dominant points detection and polygonization of digital curves. Pattern Recognition, 36(10):2239–2251, 2003. 33

[119] A. Masood. Dominant point detection by reverse polygonization of digital curves. Image Vision Comput., 26(5):702–715, May 2008. 33

[120] A. Masood. Optimized polygonal approximation by dominant point deletion. Pattern Recognition, 41(1):227–239, 2008. 33

[121] A. Masood and M. Sarfraz. Corner detection by sliding rectangles along planar curves. Computers & Graphics, 31(3):440–448, 2007. 33

[122] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10):761–767, 2004. 18, 38

[123] R. Mehrotra and J. E. Gary. Similar-shape retrieval in shape data management. Computer, 28(9):57–62, 1995. 13, 48

[124] K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In Computer Vision, 2001. ICCV 2001. Proceedings. Eighth IEEE International Conference on, volume 1, pages 525–531, 2001. 137

[125] K. Mikolajczyk and C. Schmid. An affine invariant interest point detector. In Proceedings of the 7th European Conference on Computer Vision-Part I, ECCV ’02, pages 128–142, London, UK, UK, 2002. Springer-Verlag. 13, 17, 38

[126] K. Mikolajczyk and C. Schmid. Scale & affine invariant interest point detectors. Int. J. Comput. Vision, 60(1):63–86, Oct. 2004. xvii, 17, 19, 24, 34, 36, 37, 38, 48, 137, 138, 139, 142, 186

[127] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 27(10):1615–1630, Oct. 2005. 13, 41, 43, 45, 143, 156

[128] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. V. Gool. A comparison of affine region detectors. Int. J. Comput. Vision, 65(1-2):43–72, Nov. 2005. 17, 36

[129] F. Mohanna and F. Mokhtarian. Performance evaluation of corner detection algorithms under similarity and affine transforms. British Machine Vision Conference, pages 353–362, 2001. 50

[130] F. Mokhtarian and F. Mohanna. Performance evaluation of corner detectors using consistency and accuracy measures. Computer Vision and Image Understanding, 102(1):81–94, Apr. 2006. 19, 20, 25, 48, 50, 69, 138

[131] F. Mokhtarian and F. Mohanna. Enhancing the curvature scale space corner detector. pages O–M4B, 2001. 21, 25, 30, 31, 49, 69

[132] F. Mokhtarian and R. Suomela. Robust image corner detection through curvature scale space. IEEE Trans. Pattern Anal. Mach. Intell., 20(12):1376–1381, 1998. 5, 16, 19, 20, 21, 25, 27, 28, 29, 31, 37, 48, 49, 64, 69, 70, 96, 108, 119

[133] H. Moravec. Obstacle avoidance and navigation in the real world by a seeing robot rover. Technical Report CMU-RI-TR-80-03, Carnegie Mellon University, Sept. 1980. 20

[134] H. P. Moravec. Rover visual obstacle avoidance. In Proceedings of the 7th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'81, pages 785–790, San Francisco, CA, USA, 1981. Morgan Kaufmann Publishers Inc. 21

[135] H. P. Morevec. Towards automatic visual obstacle avoidance. In Proceedings of the 5th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI'77, page 584, San Francisco, CA, USA, 1977. Morgan Kaufmann Publishers Inc. 21

[136] G. Mori, X. Ren, A. Efros, and J. Malik. Recovering human body configurations: combining segmentation and recognition. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 2, pages II-326–II-333, June 2004. 18, 39

[137] M. Muja and D. G. Lowe. Fast approximate nearest neighbors with automatic algorithm configuration. In VISAPP International Conference on Computer Vision Theory and Applications, pages 331–340, 2009. 46

[138] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 2161–2168. IEEE, 2006. 48

[139] J. A. Noble. Finding corners. Image Vision Comput., 6(2):121–128, May 1988. 22

[140] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. on Pattern Analysis and Machine Intelligence, 24(7):971, 2002. 1, 13, 44, 48, 156

[141] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001. 13, 47

[142] G. V. Pedrosa and C. A. Barcelos. Anisotropic diffusion for effective shape corner point detection. Pattern Recognition Letters, 31(12):1658–1664, 2010. 29

[143] S.-C. Pei and C.-N. Lin. The detection of dominant points on digital curves by scale-space filtering. Pattern Recognition, 25(11):1307–1314, 1992. 30

[144] T. Peli and D. Malah. A study of edge detection algorithms. Computer graphics and image processing, 20(1):1–21, 1982. 108

[145] M. Perdoch, O. Chum, and J. Matas. Efficient representation of local geometry for large scale object retrieval. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 9–16. IEEE, 2009. 48

[146] S. Pham. Digital straight segments. Computer Vision, Graphics, and Image Processing, 36(1):10–30, Oct. 1986. 28

[147] A. Pinheiro and M. Ghanbari. Piecewise approximation of contours through scale-space selection of dominant points. Image Processing, IEEE Transactions on, 19(6):1442 –1450, june 2010. 30

[148] A. M. G. Pinheiro and M. Ghanbari. Piecewise approximation of contours through scale-space selection of dominant points. Trans. Img. Proc., 19(6):1442– 1450, June 2010. 64

[149] J. Päivärinta, E. Rahtu, and J. Heikkilä. Volume local phase quantization for blur-insensitive dynamic texture classification. In A. Heyden and F. Kahl, editors, Image Analysis, volume 6688 of Lecture Notes in Computer Science, pages 360–369. Springer Berlin / Heidelberg, 2011. 1, 13, 48

[150] F. Porikli. Integral histogram: a fast way to extract histograms in cartesian spaces. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 829–836, June 2005. 42

[151] J. M. Prewitt. Object enhancement and extraction. Picture Processing and Psychopictorics, 10(1):15–19, 1970. 15, 114, 120

[152] A. Quddus and M. Gabbouj. Wavelet-based corner detection technique using optimal scale. Pattern Recogn. Lett., 23:215–220, January 2002. 33

[153] R. M. N. Sadat, S. W. Teng, and G. Lu. An effective and efficient contour-based corner detector using simple triangular theory. pages 37–42, 2011. 5, 64, 70, 142

[154] R. M. N. Sadat, S. W. Teng, and G. Lu. An effective and efficient contour-based corner detector using simple triangular theory. pages 37–42, 2011. 5, 16, 25, 27, 29, 31, 48, 62, 69, 77, 85, 108, 118, 119, 142

[155] M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Advances in Neural Information Processing Systems, 2009. 46

[156] A. Rattarangsi and R. Chin. Scale-based detection of corners of planar curves. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 14(4):430 –449, apr 1992. 19, 20, 30, 31

[157] B. K. Ray and R. Pandyan. ACORD–an adaptive corner detector for planar curves. Pattern Recognition, 36(3):703–708, Mar. 2003. 30

[158] F. Remondino. Detectors and descriptors for photogrammetric applications. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(3):49–54, 2006. 34

[159] X. Ren and J. Malik. Learning a classification model for segmentation. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pages 10–17, Oct. 2003. 18, 39

[160] A. Rosenfeld and E. Johnston. Angle detection on digital curves. IEEE Trans. Comput., 22(9):875–878, Sept. 1973. 24, 28, 29

[161] A. Rosenfeld and J. Weszka. An improved method of angle detection on digital curves. Computers, IEEE Transactions on, C-24(9):940–941, Sept. 1975. 24, 28, 29

[162] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In European Conference on Computer Vision (ECCV), pages 430–443, 2006. 19

[163] E. Rosten and T. Drummond. Machine learning for high-speed corner detection. In European Conference on Computer Vision (ECCV), pages 430–443, 2006. 20, 22, 35

[164] E. Rosten, R. Porter, and T. Drummond. Faster and better: A machine learning approach to corner detection. IEEE transactions on pattern analysis and machine intelligence, 32(1):105–119, 2010. 40

[165] E. Rublee, V. Rabaud, K. Konolige, and G. Bradski. Orb: An efficient alternative to sift or surf. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2564 –2571, nov. 2011. 40

[166] Y. Rui, T. S. Huang, and S.-F. Chang. Image retrieval: Past, present, and future. In Journal of Visual Communication and Image Representation. Citeseer, 1997. 13, 47

[167] R. Sadat, S. W. Teng, and G. Lu. An effective method of estimating scale-invariant interest region for representing corner features. In Image and Vision Computing New Zealand (IVCNZ), 2012 27th International Conference of, Nov. 2012. 37, 48

[168] R. M. N. Sadat, A. Hossain, M. Nabeel, and N. Afrin. Efficient and reliable corner detectors through analysing CPDA. In The Twentieth International Symposium on Artificial Life and Robotics 2015 (AROB 20th 2015), Jan. 2015. 32

[169] R. M. N. Sadat, S. W. Teng, and G. Lu. An effective method of estimating scale-invariant interest region for representing corner features. In Proceedings of the 27th Conference on Image and Vision Computing New Zealand, pages 73–78. ACM, 2012. 37

[170] M. Saleiro, K. Terzić, J. Rodrigues, and J. du Buf. Bink: Biological binary keypoint descriptor. Biosystems, 162:147–156, 2017. 41

[171] H. Sánchez-Cruz and E. Bribiesca. Polygonal approximation of contour shapes using corner detectors. Journal of Applied Research and Technology, 7(3):275–290, 2009.

[172] M. Sarfraz. Detecting corner points from digital curves. In Systems, Signals and Devices (SSD), 2011 8th International Multi-Conference on, pages 1–9, Mar. 2011. 25

[173] C. Schmid and R. Mohr. Combining greyvalue invariants with local constraints for object recognition. In Computer Vision and Pattern Recognition, 1996. Proceedings CVPR '96, 1996 IEEE Computer Society Conference on, pages 872–877, June 1996. 22

[174] C. Schmid, R. Mohr, and C. Bauckhage. Comparing and evaluating interest points. In Computer Vision, 1998. Sixth International Conference on, pages 230 – 235, jan 1998. 49, 50

[175] C. Schmid, R. Mohr, and C. Bauckhage. Evaluation of interest point detectors. Int. J. Comput. Vision, 37(2):151–172, June 2000. 17, 19, 49, 50

[176] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 22(8):888–905, Aug. 2000. 18

[177] J. Shi and C. Tomasi. Good features to track. In Computer Vision and Pattern Recognition, 1994. Proceedings CVPR ’94., 1994 IEEE Computer Society Conference on, pages 593 –600, jun 1994. 22

[178] G. Shu, A. Dehghan, and M. Shah. Improving an object detector and extracting regions using superpixels. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 3721–3727. IEEE, 2013. 18

[179] P. Y. Simard, L. Bottou, P. Haffner, and Y. LeCun. Boxlets: a fast convolution algorithm for signal processing and neural networks. In Proceedings of the 1998 conference on Advances in neural information processing systems II, pages 571–577, Cambridge, MA, USA, 1999. MIT Press. 17

[180] J. Sivic and A. Zisserman. Video google: A text retrieval approach to object matching in videos. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, page 1470. IEEE, 2003. 154

[181] S. M. Smith and J. M. Brady. Susan: A new approach to low level image processing. Int. J. Comput. Vision, 23(1):45–78, 1997. 20, 22, 35, 39

[182] C. Strecha, A. Bronstein, M. Bronstein, and P. Fua. Ldahash: Improved matching with smaller descriptors. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(1):66–78, Jan. 2012. 40

[183] M. J. Swain and D. H. Ballard. Color indexing. Int. J. Comput. Vision, 7(1):11–32, Nov. 1991. 1, 13, 48

[184] C.-H. Teh and R. Chin. On the detection of dominant points on digital curves. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 11(8):859 –872, aug 1989. 24, 27

[185] S. W. Teng, R. M. N. Sadat, and G. Lu. Effective and efficient contour-based corner detectors. Pattern Recognition, 48(7):2185 – 2197, 2015. 31, 32, 96, 103, 108, 120

[186] M. Toews and W. Wells. Sift-rank: Ordinal description for invariant feature correspondence. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 172 –177, june 2009. 43

[187] E. Tola, V. Lepetit, and P. Fua. A fast local descriptor for dense matching. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1 –8, june 2008. 2, 13, 17, 40, 43, 48

[188] A. Torralba, R. Fergus, and W. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 30(11):1958 –1970, nov. 2008. 46

[189] M. Trajkovic and M. Hedley. Fast corner detection. pages 75–78, 1998. 22, 49, 50

[190] D.-M. Tsai and M.-F. Chen. Curve fitting approach for tangent angle and curvature measurements. Pattern Recognition, 27(5):699–711, 1994. 33

[191] D. M. Tsai, H. T. Hou, and H. J. Su. Boundary-based corner detection using eigenvalues of covariance matrices. Pattern Recognition Letters, 20:31–40, 1999. 33, 72, 77

[192] T. Tuytelaars and K. Mikolajczyk. Local Invariant Feature Detectors: A Survey. Now Publishers Inc., Hanover, MA, USA, 2008. 11, 18, 19

[193] T. Tuytelaars and L. Van Gool. Content-based image retrieval based on local affinely invariant regions. In International Conference on Advances in Visual Information Systems, pages 493–500. Springer, 1999. 38

[194] T. Tuytelaars and L. Van Gool. Matching widely separated views based on affine invariant regions. Int. J. Comput. Vision, 59(1):61–85, Aug. 2004. 18, 37, 38, 48, 186

[195] I. Vanhamel, C. Mihai, H. Sahli, A. Katartzis, and I. Pratikakis. Scale selection for compact scale-space representation of vector-valued images. International Journal of Computer Vision, 84:194–204, 2009. 10.1007/s11263-008-0154-4. 140

[196] C. Varytimidis, K. Rapantzikos, and Y. Avrithis. Wαsh: Weighted α-shapes for local feature detection. In Computer Vision–ECCV 2012, pages 788–801. Springer, 2012. 36

[197] C. Velardo and J.-L. Dugelay. Face recognition with daisy descriptors. In Proceedings of the 12th ACM Workshop on Multimedia and Security, MM&Sec '10, pages 95–100, New York, NY, USA, 2010. ACM. 43

[198] R. C. Veltkamp and M. Tanase. Content-based image retrieval systems: A survey. 2000. xiv, 47

[199] L. Vincent and P. Soille. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 13(6):583 –598, jun 1991. 18

[200] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I-511–I-518, 2001. 17

[201] J. Z. Wang, J. Li, and G. Wiederhold. Simplicity: Semantics-sensitive integrated matching for picture libraries. IEEE Transactions on pattern analysis and machine intelligence, 23(9):947–963, 2001. 162

[202] M.-J. J. Wang, W.-Y. Wu, L.-K. Huang, and D.-M. Wang. Corner detection using bending value. Pattern Recogn. Lett., 16(6):575–583, June 1995. 24, 28

[203] Z. Wang, B. Fan, G. Wang, and F. Wu. Exploring local and overall ordinal information for robust feature description. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11):2198–2211, 2016. 41

[204] L.-D. Wu. On the chain code of a line. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PAMI-4(3):347–353, May 1982. 28

[205] W.-Y. Wu. Dominant point detection using adaptive bending value. Image and Vision Computing, 21(6):517 – 525, 2003. 24, 27, 28, 29, 49

[206] Y. Xu, Y. Quan, Z. Zhang, H. Ji, C. Fermuller, M. Nishigaki, and D. Dementhon. Contour-based recognition. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3402 –3409, june 2012. 41, 156

[207] M. Yamazaki and S. Fels. Local image descriptors using supervised kernel ica. In Proceedings of the 3rd Pacific Rim Symposium on Advances in Image and Video Technology, PSIVT ’09, pages 94–105, 2008. 43

[208] X. Yang and K.-T. Cheng. Ldb: An ultra-fast feature for scalable augmented reality on mobile devices. In Mixed and Augmented Reality (ISMAR), 2012 IEEE International Symposium on, pages 49–57. IEEE, 2012. 41

[209] Y. Yang, F. Duan, and L. Ma. A rotationally invariant descriptor based on mixed intensity feature histograms. Pattern Recognition, 76:162–174, 2018. 41

[210] Q. Ying-Dong, C. Cheng-Song, C. San-Ben, and L. Jin-Quan. A fast subpixel edge detection method using sobel–zernike moments operator. Image and Vision Computing, 23(1):11–17, 2005. 15, 113, 120

[211] B. Yuan, H. Cao, and J. Chu. Combining local binary pattern and local phase quantization for face recognition. In Biometrics and Security Technologies (ISBAST), 2012 International Symposium on, pages 51–53, Mar. 2012. 1, 13, 48

[212] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. Int. J. Comput. Vision, 73(2):213–238, June 2007. 1, 21, 30, 31, 49, 64, 72, 77, 96

[213] W.-C. Zhang and P.-L. Shui. Contour-based corner detection via angle difference of principal directions of anisotropic gaussian directional derivatives. Pattern Recognition, 48(9):2785–2797, 2015. 31

[214] X. Zhang and X. H. Ji. An improved harris corner detection algorithm for noised images. In Advanced Materials Research, volume 433, pages 6151–6156. Trans Tech Publ, 2012. 20

[215] X. Zhang, M. Lei, D. Yang, Y. Wang, and L. Ma. Multi-scale curvature product for robust image corner detection in curvature scale space. Pattern Recogn. Lett., 28(5):545–554, 2007. 5

[216] X. Zhang, H. Wang, M. Hong, L. Xu, D. Yang, and B. C. Lovell. Robust image corner detection based on scale evolution difference of planar curves. Pattern Recogn. Lett., 30:449–455, March 2009. 33, 53, 72, 77, 120

[217] X. Zhang, H. Wang, A. W. B. Smith, X. Ling, B. C. Lovell, and D. Yang. Corner detection based on gradient correlation matrices of planar curves. Pattern Recogn., 43:1207–1223, Apr. 2010. 5, 16, 25, 33, 72, 77, 108, 119

[218] Z. Zheng, H. Wang, and E. K. Teoh. Analysis of gray level corner detection. Pattern Recognition Letters, 20(2):149 – 162, 1999. 22

[219] D. Zhengjian and M. Aihua. Harris corner detection based on the multi-scale topological feature. In Computer Science and Network Technology (ICCSNT), 2011 International Conference on, volume 3, pages 1394 –1397, dec. 2011. 24

[220] B. Zhong, K.-K. Ma, and W. Liao. Scale-space behavior of planar-curve corners. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(8):1517 –1524, aug. 2009. 31

[221] D. Zhou, G. Li, and Y.-H. Liu. Effective corner matching based on delaunay triangulation. In Robotics and Automation, 2004. Proceedings. ICRA '04. 2004 IEEE International Conference on, volume 3, pages 2730–2733, Apr. 2004. 39

[222] C. Zhu, C.-E. Bichot, and L. Chen. Visual object recognition using daisy descriptor. In Multimedia and Expo (ICME), 2011 IEEE International Conference on, pages 1–6, July 2011. 43

[223] P. Zhu and P. M. Chirlian. On critical point detection of digital shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):737–748, 1995. 33