Matchability Prediction for Full-Search Template Matching Algorithms
Total Page:16
File Type:pdf, Size:1020Kb
Matchability Prediction for Full-Search Template Matching Algorithms Adrian Penate-Sanchez1, Lorenzo Porzi2,3, and Francesc Moreno-Noguer1 1Institut de Robotica` i Informatica` Industrial, UPC-CSIC, Barcelona, Spain 2Fondazione Bruno Kessler, Trento, Italy 3University of Perugia, Italy Abstract template matching algorithms that perform a full search on all possible transformations of the template guarantee to While recent approaches have shown that it is possible find the global maximum of the distance function, yielding to do template matching by exhaustively scanning the pa- more accurate results at the price of increasing the compu- rameter space, the resulting algorithms are still quite de- tational cost. manding. In this paper we alleviate the computational load Several approaches have focused on improving specific of these algorithms by proposing an efficient approach for characteristics of the template matching framework, such as predicting the matchability of a template, before it is actu- its speed [22, 24], its robustness to partial occlusions [4] or ally performed. This avoids large amounts of unnecessary ambiguities [26] and its ability to select reliable templates computations. We learn the matchability of templates by for rapid visual tracking [2]. Yet, most of these improve- using dense convolutional neural network descriptors that ments have been designed for the algorithms that do not per- do not require ad-hoc criteria to characterize a template. form a full search. In this work we will bring these improve- By using deep learning descriptions of patches we are able ments to algorithms that perform full search, and in particu- to predict matchability over the whole image quite reli- lar we will consider FAst-Match [15], a recent method that ably. We will also show how no specific training data is re- performs near full search at reasonable speed. quired to solve problems like panorama stitching in which Algorithms like FAst-Match, though, may still suffer you usually require data from the scene in question. Due from false positives. Furthermore, they are only able to es- to the highly parallelizable nature of this tasks we offer an timate affine transformations: this is enough to approximate efficient technique with a negligible computational cost at small projective deformations, but produces bigger errors in test time. the general case (as seen on the third experiment of [15]). We aim to provide a template selection approach to partially overcome these problems. We propose to learn a matchable template detector by modeling the probability of a template 1. Introduction to be correctly matched. In particular, we combine dense CNN features, due to the promising results obtained on a Template-based matching has been an important topic in variety of computer vision tasks [30, 16], and a logistic re- Computer Vision for nearly as long as the field has existed. gression objective. This allows for an extremely efficient Recently it has been applied successfully to dense 3D re- GPU implementation of the proposed method, which makes construction [8], image-based rendering [7], image match- its computational demands negligible when compared to the ing [32] and even used with RGBD data for object detec- matching itself. tion [12]. As finding a close approximation of the correct trans- Template Matching algorithms can consider all possible formation is not difficult when performing full search, our transformations [18, 31] of the template or just a discrete main aim will be to reduce the overlap error, as well as de- subset of them [11, 21]. By sampling a subset of all can- creasing the computational effort by preemptively selecting didate templates we are able to obtain algorithms that can good templates. We use the overlap error as it seems the operate at speeds of nearly 30 fps [11]. On the other hand, community agrees on it [15, 19, 20]; this is discussed ex- Adrian Penate-Sanchez and Lorenzo Porzi have contributed equally to tensively in [19, 20]. this work. In a similar manner as [10] did for feature points we 4321 Figure 1. First row: first image from sequences 1,2,4 and 5. Second row: heat maps representing the response of our matchable template detector on the images in the first row (blue: low probability, red: high probability). Third row: original images filtered using the heat maps to highlight the regions most likely to be selected. will identify which are the areas of a scene that give good matching problem. In order to handle general transforma- matchability. The main difference when using templates tions, in [14] a grayscale template matching algorithm that is that, instead of focusing on a set of points, one has to de- considered variations of scale and rotation was presented. fine the matchability of all the image. An example of how More recently FAst-Match [15] was proposed as an effi- our approach is capable of obtaining the interesting parts cient way to handle general affine transformations; although from all the image can be seen in Fig. 1. In this work we it was not the first successful approach to try so [9], FAst- will show that template selection is able to noticeably im- Match remains to this day the approach that shows best re- prove matching accuracy at the expense of an almost neg- sults in literature. FAst-Match [15] lies between discrete ligible additional computational cost. Furthermore, we will template matching and full search approaches. In particu- show the proposed approach to have good generalization lar it considers all the affine transformation space but, us- capabilities. ing a tree search approach, avoids searching high error sub- spaces. 2. Related Work 2.1. Feature Selection As mentioned above, we may roughly classify template matching algorithms into those that perform full search or Feature selection approaches can enhance both speed near full search of the transformation parameters [18, 31], and robustness of matching techniques. In [13] feature se- and those that just locally or discretely refine the parame- lection is done based on the upper bound of the average ters. The latter allow for fast optimization but do not guar- error, prioritizing templates that are more robust to small antee a global maximum [11]. errors in the transformation; this is appropriate to avoid test- In between, there exist a huge body of work that attempts ing huge amounts of candidate templates. In [2], the relia- to balance both. In [1] the relation between appearance dis- bility of a template is defined through a number of char- tance and spatial overlap is exploited to give an upper bound acteristics, namely the uniformity of texture, contrast and on appearance distance given the spatial overlap of two win- spatial locality among others. These characteristics are then dows in an image. By doing this, it is then possible to used to define a scoring scheme using Support Vector Re- provide a computationally efficient solution to the template gression [25] that will, at runtime, score the different candi- Rescaling CNN Descriptors Matchability Maps Fk(·) Template Selection Tθ1(·) Fk(·) Tθ2(·) Tθt(·) Fk(·) Figure 2. Schematic representation of our matchable template detection method. In a first step we apply a set of transformations Tθ1 ;:::;Tθt to the input image. Then we extract dense CNN features for each of the transformed images and use them to compute the matchability maps. Finally we select the matchable templates as the set of image patches corresponding to local maxima in the matchability maps. dates. [4] presents a method that selects template pixels that template. We can express this as an optimization problem: verify the approximation of the tracking algorithm. In [34] X method that selects an set of templates by learning a quality arg min kI1(p) − I2(Ap + t)k; (1) A;t measure and by optimizing coverage. The commented work p2I1 is mainly applied to tracking and builds on the assumption 2×2 2 where A 2 R and t 2 R are the transformation’s that a full search is not performed on the test image, it just parameters, while the vector p denotes pixel coordinates needs the template to be stable to local transformations and on the images. FAst-Match describes a branch-and-bound does not have to assume free transformations through the scheme to find a good solution of (1) over the 6-DOF space whole space. of affine transformations in a computationally efficient way. Another relevant approach [24] enhances the perfor- In general, the template I1 can be a rectangular region of mance and the speed of [11] and [12] by learning what a larger image I, which we call the source image. In this parts of the template are worth analyzing and boosts the best case, selecting a good template from I can be crucial for parts of each template only testing those, thus in the process the effectiveness of the template matching algorithm. Con- reducing greatly the computational cost. This work focus sider e.g. a natural landscape image: intuitively, a template on texture-less objects and uses edges as features making it depicting a large portion of uniformly colored sky could be specific for such objects. It was also expanded to handle the especially hard to match. Similarly to the well-known ap- RGB-D inputs of [12] in a similar manner. proach adopted for feature point descriptors and detectors We have seen how learning algorithms have been pre- such as SIFT [17] or SURF [3], we propose a matchable viously applied to speedup and increase reliability of tem- template detector which is able to select good candidate plate matching when using discrete search template match- templates from an input image. We train our detector with ing algorithms [24, 2, 13]. But when dealing with full the objective of learning a function that assigns to each can- search [18, 31] or near full search algorithms [15] you are didate template a probability of it being correctly matched.