Structure from Motion Using a Hybrid Stereo-Vision System François Rameau, Désiré Sidibé, Cédric Demonceaux, David Fofi
Total Page:16
File Type:pdf, Size:1020Kb
Structure from motion using a hybrid stereo-vision system François Rameau, Désiré Sidibé, Cédric Demonceaux, David Fofi To cite this version: François Rameau, Désiré Sidibé, Cédric Demonceaux, David Fofi. Structure from motion using a hybrid stereo-vision system. 12th International Conference on Ubiquitous Robots and Ambient Intel- ligence, Oct 2015, Goyang City, South Korea. hal-01238551 HAL Id: hal-01238551 https://hal.archives-ouvertes.fr/hal-01238551 Submitted on 5 Dec 2015 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The 12th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI 2015) October 28 30, 2015 / KINTEX, Goyang city, Korea ⇠ Structure from motion using a hybrid stereo-vision system Franc¸ois Rameau, Desir´ e´ Sidibe,´ Cedric´ Demonceaux, and David Fofi Universite´ de Bourgogne, Le2i UMR 5158 CNRS, 12 rue de la fonderie, 71200 Le Creusot, France Abstract - This paper is dedicated to robotic navigation vision system and review the previous works in 3D recon- using an original hybrid-vision setup combining the ad- struction using hybrid vision system. In the section 3.we vantages offered by two different types of camera. This describe our SFM framework adapted for heterogeneous couple of cameras is composed of one perspective camera vision system which does not need inter-camera corre- associated with one fisheye camera. This kind of config- spondences. While the third part of this paper (section 4. uration, is also known under the name of foveated vision ) concerns the results obtained with our method. Finally, system since it is inspired by the human vision system a short conclusion ends this article. and allows both a wide field of view and a detail front view of the scene. 2. Previous works Here, we propose a generic and robust approach for SFM, which is compatible with a very broad spectrum of multi- In this section we review the already existing meth- camera vision systems, suitable for perspective and om- ods developed for the calibration and the navigation using nidirectional cameras, with or without overlapping field hybrid-vision system. We are also giving a clear defini- of view. tion about the term ”hybrid-vision system” and their uses. Keywords - Hybrid vision, SFM 2.1 Hybrid vision system The term of hybrid vision system means that the cam- eras used within the vision system are of different natures 1. Introduction or modalities [2]. This type of camera association al- Binocular vision system is a well-known configura- lows the acquisition of complementary informations, for tion in computer vision which has been studied over the instance, an extension of the field of view, depth informa- past decades. This configuration of two similar cameras tion or the study of a wider range of wavelength. is widely used for 3D reconstruction, mobile robot nav- For example, in [3] the authors proposed to merge infor- igation, etc. Such type of system is particularly inter- mation from a conventional binocular system and from esting because it allows the simultaneous capture of two two infrared cameras in order to improve pedestrian de- akin images from which stereo matching can be achieved tection process. accurately using geometrical constraints and photometric RGB-D sensors, such as the Kinect, using a reconstruc- descriptors. tion approach based on the projection of an infra-red pat- In this paper we propose to modify the conventional tern can also be viewed as hybrid vision sensor. In fact, stereo-vision system by replacing one of the cameras by a RGB camera is used to texture the reconstruction with an omnidirectional sensor, more specifically a fisheye visible color, while the infra-red camera can analyse the camera. This new combination of cameras is very ver- pattern to reconstruct the 3D structure of the scene. These satile as it combines the advantages from both cameras, two sensors are very complementary. For instance, in [4] offering desirable features for robot localisation and map- the registration of 3D point cloud is simplified by the uti- ping. Indeed, the fisheye camera provides a large vision lization of information from the RGB images. Many oth- of the scene. Furthermore, it has been proved in [1] that ers original combination exist, for example in [5], where spherical sensors are an appropriate solution to overcome the sensor consists in the association of a high resolu- the ambiguities when small amplitude motions are per- tion camera with two low resolution cameras for real time formed. On the other hand, the perspective camera can events detection. be employed to extract details from the scene captured In this article, we are interested in the case where cam- by the sensor. The other advantage offered by this second eras have different geometrical properties. More explic- camera is the possibility to estimate the displacements of itly, the association of one perspective and one omnidi- a robot at the real scale. rectional camera. This specific type of system has al- In this paper, we propose a novel approach to estimate the ready been studied especially for video-surveillance pur- motion of a mobile robot using this configuration of cam- poses for its ability to obtain conjointly a global view of eras in an efficient way. This approach is mainly inspired the scene and an accurate image of the target from one by non-overlapping SFM techniques. or multiple perspective cameras. The calibration of such type of system is discussed in [6] where one omnidirec- This article is organized in the following manner. In tional camera is used in collaboration with a network of the next section we give a definition of the term hybrid perspective cameras. This configuration of camera can potentially be valuable This work was supported by DGA (Direction Generale de l’Armement) for robotic navigation. In [7], the authors propose to use and the regional council of Burgundy. a catadioptric sensor and a perspective camera for obsta- cle detection. For multi-robot collaboration, Roberti et O t+2 al. [8] propose an original approach to compute the struc- o3 Op3 ture and the motion from multiple robots equipped with o3 different types of camera. Mo2 p3 Eynard et al. [2] also take advantage of this setup in or- Mp2 o3 Oo2 der to estimate both the attitude and the altitude of a UAV Mo1 p3 using a dense plane sweeping based registration. t+1 Mp1 2.2 3D reconstruction and localisation using a hy- Op2 o2 p Mo1 Mo brid vision system Mp2 Classical structure from motion methods consist in the p1 O joint estimation of the scene structure and the motion o1 t from a single moving camera [9]. When this type of ap- Op1 proach is used to determine the displacements of a robot, Mp =[Rp tp] the images are sorted temporally. Consequently, they are o o | o processed one after the other at every new frame acqui- Fig. 1 Model of our system for two successive dis- sition, we call this strategy a sequential SFM. The 3D placements reconstruction as well as the motion estimation obtained from these methods are up to a scale factor, which repre- sents a limitation in robotic navigation. To overcome this tion of the usual descriptor to the geometry of the cameras specific problem, a common solution is to utilize multi- (for instance Harris [14] or SIFT [15]). These descriptors ple cameras, the most basic example being the conven- are often used in conjunction with appropriated geomet- tional stereo-vision system (usually two similar perspec- ric constraints in order to remove outliers. The mentioned tive cameras). Nevertheless, this sort of configuration approaches can be used to achieve 3D reconstruction of needs a full calibration of the system. Furthermore, an the scene and to localize the cameras, however, they do accurate synchronisation of the cameras is mandatory in not consider a calibrated stereo-vision system. In fact, order to ensure a simultaneous images acquisition. The this prior calibration is carrying valuable informations stereo image matching can be used to estimate the re- which can simplify the images matching, for instance, construction of the environment with real scale. Most by rectification. Nevertheless, this epipolar rectification of the approaches of SFM using stereo-vision systems with cameras having notable different resolutions does can be split into two main steps, the first one being the not allow an accurate matching. Clearly, the inter-camera 3D reconstruction of the environment at a time t using matching is a particularly complicated step and needs the stereo-correspondences only. The second step is the tem- use of sophisticated and computationally expensive ap- poral tracking of the feature points in the next images. proaches. Moreover, the accuracy offered by such type A pose estimation can be performed by minimizing the of process is highly depending upon the dissimilarity be- re-projection error of the 3D points on the new images tween the cameras resolution. Indeed, this difference is acquired at time t +1[10]. emphasized as the focal length of the perspective camera Other techniques are also possible. In [11] the authors increases, making the previously described methods inef- proposed a motion estimation using two interlaced tri- ficient. focal tensors in order to estimate the six degrees of free- Nevertheless, the point matching between cameras of dom of the stereo-rig motion.