Contributions to a Fast and Robust Object Recognition in Images
Jérôme Revaud
To cite this version: Jérôme Revaud. Contributions to a fast and robust object recognition in images. Other [cs.OH]. INSA de Lyon, 2011. English. NNT: 2011ISAL0042. tel-00694442

HAL Id: tel-00694442
https://tel.archives-ouvertes.fr/tel-00694442
Submitted on 4 May 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Order number: 2011ISAL0042
Year 2011

Institut National des Sciences Appliquées de Lyon
Laboratoire d'InfoRmatique en Image et Systèmes d'information
École Doctorale Informatique et Mathématiques de Lyon

Thesis of the Université de Lyon, presented to obtain the degree of Doctor, specialty Computer Science, by Jérôme Revaud

Contributions to a Fast and Robust Object Recognition in Images

Defense scheduled for 27 May 2011 before a jury composed of:

Mr. Patrick Gros, Research Director, INRIA Rennes (Reviewer)
Mr. Frédéric Jurie, Professor, Université de Caen (Reviewer)
Mr. Vincent Lepetit, Senior Researcher, EPFL (Examiner)
Mr. Jean Ponce, Professor, INRIA Paris (Examiner)
Mr. Atilla Baskurt, Professor, INSA Lyon (Supervisor)
Mr. Yasuo Ariki, Professor, Kobe University (Co-supervisor)
Mr. Guillaume Lavoué, Associate Professor (Maître de Conférences), INSA Lyon (Co-advisor)

Laboratoire d'InfoRmatique en Image et Systèmes d'information, UMR 5205 CNRS, INSA de Lyon, Bât. Jules Verne, 69621 Villeurbanne cedex, France
Tel: +33 (0)4 72 43 60 97, Fax: +33 (0)4 72 43 71 17

This thesis is available at: http://theses.insa-lyon.fr/publication/2011ISAL0042/these.pdf
© [J. Revaud], [2011], INSA de Lyon, all rights reserved

Abstract

Object recognition in images is a growing field. For several years, the emergence of invariant interest points such as SIFT [Low01] has enabled rapid and effective systems for recognizing instances of specific objects as well as classes of objects (e.g. using the bag-of-words model). However, our experiments on the recognition of specific object instances have shown that under realistic conditions of use (e.g. in the presence of various kinds of noise such as blur, poor lighting or low-resolution cameras), progress remains to be made in terms of recall: despite the low rate of false positives, too few actual instances are detected, regardless of the system used (RANSAC, voting / Hough, etc.). In this work, we first present a contribution that addresses this robustness problem for the recognition of object instances, then we directly extend this contribution to the detection and localization of classes of objects.

First, we have developed a method inspired by graph matching to address the problem of fast recognition of instances of specific objects in noisy conditions. This method makes it easy to combine any type of local feature (e.g. contours, textures) that is less affected by noise than keypoints, while bypassing the normalization problem and without penalizing detection speed too much. In this approach, the detection system consists of a set of cascades of micro-classifiers trained beforehand. Each micro-classifier is responsible for comparing the test image, locally and from a certain point of view (e.g. in terms of contours or textures), to the same area in the model image. The cascades of micro-classifiers can therefore recognize different parts of the model in a robust manner (only the most effective cascades are selected during learning). Finally, a probabilistic model that combines those partial detections infers global detections. Unlike other methods based on a global rigid transformation, our approach is robust to complex deformations such as those due to perspective or the non-rigid deformations inherent to the model itself (e.g. a face, a flexible magazine). Our experiments on several datasets have shown the relevance of our approach. Overall, it is slightly less robust to occlusion than existing approaches, but it performs better in noisy conditions.

Second, we have developed an approach for detecting classes of objects in the same spirit as the bag-of-visual-words model. For this, we use our cascaded micro-classifiers to recognize visual words that are more distinctive than the classical words simply based on visual dictionaries (like Csurka et al. [CDF∗04] or Lazebnik et al. [LSP05]). Training is divided into two parts: first, we generate cascades of micro-classifiers for recognizing local parts of the model images; then, we use a classifier to model the decision boundary between images of the class and images outside it. This classifier bases its decision on a vector counting the outputs of each binary micro-classifier. This vector is extremely sparse, and a simple classifier such as Real-Adaboost manages to produce a system with good performance (this type of classifier is in fact similar to the subgraph membership kernel).
In particular, we show that combining classical visual words (from keypoint patches) with our distinctive words results in a significant improvement. The computation time is generally quite low, since the structure of the cascades minimizes detection time and the form of the classifier makes it extremely fast to evaluate.

Keywords: Specific object recognition, class object recognition, graph matching, cascades, optimization, mobile robotics.

Contents

Abstract
Contents
List of Figures
List of Tables
List of Algorithms

1 Introduction
  1.1 A Few Preliminary Words
  1.2 Application Field
  1.3 A Short Definition of Object Recognition Terms
  1.4 Outlines

2 Survey on Object Recognition
  2.1 A Glance at Object Recognition
  2.2 Low-level Features
    2.2.1 Dense features
      2.2.1.1 Convolution-based features
      2.2.1.2 Non-linear features
    2.2.2 Sparse features
      2.2.2.1 Edges
      2.2.2.2 Keypoints
      2.2.2.3 Regions
    2.2.3 Histogram-based features
      2.2.3.1 Local descriptors
  2.3 Specific Object Recognition
    2.3.1 Using global features
    2.3.2 Using local features
      2.3.2.1 Rigid matching
      2.3.2.2 Non-rigid matching
  2.4 Class Object Recognition
    2.4.1 Feature spaces for class object recognition
    2.4.2 Detection schemes

3 Cascaded Multi-feature Incomplete Graph Matching For 3D Specific Object Recognition
  3.1 Introduction and Motivations
    3.1.1 The feature combination problem
    3.1.2 Outlines of the proposed method
    3.1.3 Related works
  3.2 Useful notation
  3.3 Used Features
    3.3.1 Keypoints
    3.3.2 Edges
    3.3.3 Textures
  3.4 Algorithm Description
    3.4.1 The prototype graphs
    3.4.2 The detection lattice
    3.4.3 Aggregate position
    3.4.4 Aggregate recognition
    3.4.5 Clustering of detected aggregates
    3.4.6 Probabilistic model for clusters of hypothesis
  3.5 How to build the detection lattice
    3.5.1 Algorithm inputs
    3.5.2 Iterative pruning of the lattice
    3.5.3 Learning the micro-classifier thresholds
    3.5.4 Ranking of the aggregates
    3.5.5 Discretization of the training image into parts
  3.6 Conclusion

4 Evaluation of Our Contribution For Specific Object Detection
  4.1 Discussion about the evaluation
    4.1.1 Test datasets
    4.1.2 Evaluation metrics
  4.2 Preliminary training
    4.2.1 Learning the subclassifier thresholds
    4.2.2 Other kernel parameters
  4.3 The CS17 dataset
    4.3.1 Parameter Tuning
    4.3.2 Comparative experiments
    4.3.3 Discussion
  4.4 The ETHZ toys dataset
  4.5 The Rothganger dataset
  4.6 Conclusion

5 Extension of the Multi-feature Incomplete Graph Matching to Recognition of Class Objects
  5.1 Introduction
    5.1.1 Method overview
    5.1.2 Related works
    5.1.3 Chapter outline
  5.2 Method Description
    5.2.1 Features used
    5.2.2 Window classification
    5.2.3 Optimization for training the classifier
    5.2.4 Optimization for detection speed
  5.3 Modifications to the original lattice
    5.3.1 Rotation variance
    5.3.2 Recognition procedure for the lattice
    5.3.3 Training procedure for the lattice
  5.4 Conclusion

6 Evaluation of Our Contribution For Class Object Detection
  6.1 Introduction