Monocular Depth Cues in Computer Vision Applications

Monocular Depth Cues in Computer Vision Applications A dissertation submitted by Diego Cheda at Universitat Autònomade Barcelona to fulfil the degree of Doctor of Philosophy. Bellaterra, September 26, 2012 Director: Dr. Daniel Ponsa Mussarra Centre de Visióper Computador Universitat Autònomade Barcelona Co-director: Dr. Antonio M. López Pe~na Centre de Visióper Computador Universitat Autònomade Barcelona Thesis commite: Dr. Juan Andrade Cetto Institut de Robòticai InformàticaIndustrial Consejo Superior de Investigaciones Cient´ıficas Dr. Angel´ Domingo Sappa Centre de Visióper Computador Dr. David Masip Rodó Dept. d'Estudis d’InformàticaMultimèdiai Telecomunicació Universitat Oberta de Catalunya Dra. Aura Hernandez Sabaté Dept. de Ciènciesde la Computació Universitat Autònomade Barcelona Dra. Carme JuliàFerré Dept. d'Enginyeria Informàticai Matemàtiques Universitat Rovira i Virgili This document was typeset by the author using LATEX 2". The research described in this book was carried out at the Computer Vision Center, Universitat Autònomade Barcelona. Copyright ⃝c MMXII by Diego Cheda. All rights reserved. No part of this publica- tion may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the author. ISBN 978-84-940530-0-9 Printed by Ediciones GráficasRey, S.L. A Martina y a Pau Acknowledgements Foremost, I would like to thank my advisors, Dr. Daniel Ponsa and Dr. Antonio M. López, for their continuous support, advice and patience during these years. They introduced me to the Computer Vision field and shared with me a lot of his expertise and research insight. I would like to express my gratitude towards all the members of ADAS group. I would like to acknowledge the funding, academic and technical support of the Computer Vision Center, the Universitat Autònomade Barcelona, and the Ministry of Education and Science (projects TRA2011-29454-C03-01 and TIN2011-29494-C03-02, and Consolider Ingenio 2010: MIPRCV (CSD200700018)). I would also like to thank my committee for their time and comments: Dr. Juan Andrade Cetto, Dr. Angel´ Domingo Sappa, Dr. David Masip Rodó,Dra. Aura Hernandez Sabaté,and Dra. Carme JuliàFerré.I also want to thank the anonymous conference and journal reviewers for their invaluable comments along these years. I also owe a great deal to the many friends and colleagues that I have met at Computer Vision Center. Thank you all for share your time in one way or another along some endless days at CVC. I am indebted to Fernando Barrera and David Rojas for the innumerable talks about research, life, rare things and, almost as important, the morning coffee ritual. I thank to my closest friends Juan JoséTamagnini, Salvador Cavadini, and Daniel Romero for always being there. Most importantly, I thank my wonderful family. First, to all my big family back in Argentina, I thank you for your continued support and encouragement from abroad. Second, to my small family in Catalonia: to Fernanda for those nice days in Tarragona, and to Mariana and Aarónfor those incredible shared moments at Barcelona. Last, to my dear wife Martina for embarked with me in this adventure, for your continued love and support, and for keeping at my side {dealing with the good and bad times along these years. During this time, you had the generous experience of raising a family, teaching me how deep is your love. To my gorgeous son, Pau, I thank you for your smiles, your 3am wake-ups asking for milk, for make me playing again and again with the animals, and for showing me the impressive way of the life. i ii Abstract Depth perception is a key aspect of human vision. It is a routine and essential visual task that the human do effortlessly in many daily activities. This has often been associated with stereo vision, but humans have an amazing ability to perceive depth relations even from a single image by using several monocular cues. In the computer vision field, if image depth information were available, many tasks could be posed from a different perspective for the sake of higher performance and robustness. Nevertheless, given a single image, this possibility is usually discarded, since obtaining depth information has frequently been performed by three-dimensional reconstruction techniques, requiring two or more images of the same scene taken from different viewpoints. Recently, some proposals have shown the feasibility of computing depth information from single images. In essence, the idea is to take advantage of a priori knowledge of the acquisition conditions and the observed scene to estimate depth from monocular pictorial cues. These approaches try to precisely estimate the scene depth maps by employing computationally demanding techniques. However, to assist many computer vision algorithms, it is not really necessary computing a costly and detailed depth map of the image. Indeed, just a rough depth description can be very valuable in many problems. In this thesis, we have demonstrated how coarse depth information can be inte- grated in different tasks following alternative strategies to obtain more precise and robust results. In that sense, we have proposed a simple, but reliable enough tech- nique, whereby image scene regions are categorized into discrete depth ranges to build a coarse depth map. Based on this representation, we have explored the potential usefulness of our method in three application domains from novel viewpoints: camera rotation parameters estimation, background estimation and pedestrian candidate generation. In the first case, we have computed camera rotation mounted in a moving vehicle applying two novels methods based on distant elements in the image, where the translation component of the image flow vectors is negligible. In background estimation, we have proposed a novel method to reconstruct the background by pe- nalizing close regions in a cost function, which integrates color, motion, and depth terms. Finally, we have benefited of geometric and depth information available on single images for pedestrian candidate generation to significantly reduce the number of generated windows to be further processed by a pedestrian classifier. In all cases, results have shown that our approaches contribute to better performances. iii Resumen La percepciónde la profundidad es un aspecto clave en la visiónhumana. El ser humano realiza esta tarea sin esfuerzo alguno con el objetivo de efectuar diversas actividades cotidianas. A menudo, la percepciónde la profundidad se ha asociado con la visiónbinocular. Pese a esto, los seres humanos tienen una capacidad asombrosa de percibir las relaciones de profundidad, incluso a partir de una sola imagen, mediante el uso de varias pistas monoculares. En el campo de la visiónpor ordenador, si la informaciónde la profundidad de una imagen estuviera disponible, muchas tareas podr´ıanser planteadas desde una perspectiva diferente en aras de un mayor rendimiento y robustez. Sin embargo, dada una únicaimagen, esta posibilidad es generalmente descartada, ya que la ob- tenciónde la informaciónde profundidad es frecuentemente obtenida por las técnicas de reconstruccióntridimensional, que requieren dos o másimágenesde la misma escena tomadas desde diferentes puntos de vista. Recientemente, algunas propues- tas han demostrado que es posible obtener informaciónde profundidad a partir de imágenesindividuales. En esencia, la idea es aprovechar el conocimiento a priori de las condiciones de adquisiciónde la imagen y de la escena observada para estimar la profundidad empleando pistas pictóricasmonoculares. Estos enfoques tratan de estimar con precisiónlos mapas de profundidad de la escena empleando técnicascom- putacionalmente costosas. Sin embargo, muchos algoritmos de visiónpor ordenador no necesitan un mapa de profundidad detallado de la imagen. De hecho, sólouna descripciónen profundidad aproximada puede ser muy valiosa en muchos problemas. En nuestro trabajo, hemos demostrado que incluso la informaciónaproximada de profundidad puede integrarse en diferentes tareas siguiendo una estrategia hol´ıstica con el fin de obtener resultados másprecisos y robustos. En ese sentido, hemos propuesto una técnicasimple, pero fiable, por medio de la cual regiones de la imagen de una escena se clasifican en rangos de profundidad discretos para construir un mapa tosco de la profundidad. Sobre la base de esta representación,hemos explorado la utilidad de nuestro método en tres dominios de aplicacióndesde puntos de vista novedosos: la estimaciónde la rotaciónde la cámara,la estimacióndel fondo de una escena y la generaciónde ventanas de interéspara la detecciónde peatones. En el primer caso, calculamos la rotaciónde la cámaramontada en un veh´ıculoen movimiento mediante dos nuevos métodos que emplean elementos distantes en la imagen a travésde nuestros mapas de profundidad, aprovechando el hecho de que v vi el componente traslacional en los vectores de flujo de la imagen pueden considerarse como insignificantes. En la reconstruccióndel fondo de una imagen, propusimos un método novedoso que penaliza las regiones cercanas en una funciónde coste que integra, además,informacióndel color y del movimiento. Por último,empleamos la informacióngeométricay de la profundidad de una escena para la generaciónde peatones candidatos. Este método reduce significativamente el númerode ventanas generadas, las cuales seránposteriormente procesadas por un clasificador de peatones. En todos los casos, los resultados muestran que los enfoques basados en la profundidad contribuyen a un mejor rendimiento de las aplicaciones estudidadas. viii Contents Abstract iii Resumen v 1 Introduction 3 1.1 Contributions and outline . 8 2 Depth perception 13 2.1 Introduction . 13 2.2 Depth perception . 13 2.3 Depth estimation from a single image .

Monocular Depth Cues in Computer Vision Applications

Acquired Monocular Vision Rehabilitation Program

Relative Importance of Binocular Disparity and Motion Parallax for Depth Estimation: a Computer Vision Approach

Binocular Vision and Ocular Motility SIXTH EDITION

Lecture 6 Depth Perception

Depth and Size Perception 7

Clinical Suppression and Amblyopia

Understanding Eye Movements and Attentional Mechanisms Is Key to Improving Amblyopia Treatment

Whale of a View

3-D Depth Reconstruction from a Single Still Image

The Effects of Ocular Dominance on Visual Processing in College Students William Alexander Holland James Madison University

The Neural Basis of Suppression and Amblyopia in Strabismus

Picture IT at Work: an Inside Review of Monocular Vision