Article

Spherical-Model-Based SLAM on Full-View Images for Indoor Environments
Jianfeng Li 1, Xiaowei Wang 2 and Shigang Li 1,2,*
1 School of Electronic and Information Engineering, also with the Key Laboratory of Non-linear Circuit and Intelligent Information Processing, Southwest University, Chongqing 400715, China; [email protected]
2 Graduate School of Information Sciences, Hiroshima City University, Hiroshima 7313194, Japan; [email protected]
* Correspondence: [email protected]
Received: 14 October 2018; Accepted: 14 November 2018; Published: 16 November 2018
Featured Application: For an indoor mobile robot, finding its location and building environment maps.
Abstract: SLAM (simultaneous localization and mapping) relies on observations of the surroundings, and a full-view image provides more of them than a limited-view image does. In this paper, we present a spherical-model-based SLAM on full-view images for indoor environments. Unlike a traditional limited-view image, a full-view image follows its own specific, nonlinear imaging principle and is accompanied by distortions, so specific techniques are needed to process it. In the proposed method, we first use a spherical model to express the full-view image. Then, the algorithms are implemented on the spherical model, including feature point extraction, feature point matching, 2D-3D connection, and projection and back-projection of scene points. Thanks to the full field of view, the experiments show that the proposed method effectively handles sparse-feature or partially non-feature environments, and also achieves high accuracy in localization and mapping. An experiment is also conducted to show how the accuracy is affected by the field of view.
Keywords: full-view image; spherical model; SLAM
1. Introduction

For a mobile robot, finding its location and building environment maps are basic and important tasks. An answer to this need is the development of simultaneous localization and mapping (SLAM) methods. For vision-based SLAM systems, localization and mapping are achieved by observing the features of environments via a camera. Therefore, the performance of a vision-based SLAM method depends not only on the algorithm, but also on the feature distribution of the environments observed by the camera. Figure 1 shows a sketch of an indoor environment. For such a room, there may be few features within the field of view (FOV). When such scenes are observed with a limited-FOV camera, SLAM often fails to work. However, these problems can be avoided by using a full-view camera.
Appl. Sci. 2018, 8, 2268; doi:10.3390/app8112268
Figure 1. The performance of a vision-based SLAM is influenced by the field of view.
Another reason for using a full-view camera is that it improves the accuracy of SLAM methods. Because some different motions may produce similar changes in the image of a limited-FOV camera, these motions can only be discriminated with a full-view image. For example, as shown in Figure 2, the translation of a limited-view camera along the horizontal axis or its rotation in the horizontal direction may result in the same movement for some features in the image, i.e., Target 1 in Figure 2. However, for Targets 2–4, the two camera motions cause different movements. This means that in some cases, different motions may be difficult to decouple from the observation of limited-FOV images, while it is possible to distinguish them from the observation of full-view images.
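To make this ambiguity concrete, the following minimal sketch (our illustration, not code from the paper; the target positions and motion magnitudes are assumed) compares the feature motion on the unit viewing sphere induced by a small horizontal translation with that induced by a small horizontal rotation. For a target straight ahead the two induced motions are nearly identical, while for targets to the side or behind the camera they differ clearly; only a full-view camera observes those targets.

```python
import numpy as np

def bearing(point, cam_pos):
    # Unit bearing vector from the camera center to a 3D point,
    # i.e., the feature's position on the unit viewing sphere.
    d = point - cam_pos
    return d / np.linalg.norm(d)

def yaw(angle):
    # Rotation about the vertical (y) axis by `angle` radians.
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Hypothetical targets roughly matching Figure 2 (z forward, x right, y up).
targets = {
    "Target 1 (ahead)":  np.array([0.0, 0.0, 10.0]),
    "Target 2 (left)":   np.array([-10.0, 0.0, 0.0]),
    "Target 3 (behind)": np.array([0.0, 0.0, -10.0]),
    "Target 4 (right)":  np.array([10.0, 0.0, 0.0]),
}

t = np.array([0.05, 0.0, 0.0])   # small translation along the horizontal axis
R = yaw(np.radians(0.3))         # small rotation in the horizontal direction

for name, p in targets.items():
    b_trans = bearing(p, t)                 # bearing after translating the camera
    b_rot = R.T @ bearing(p, np.zeros(3))   # bearing after rotating the camera
    # Angle between the feature motions induced by the two camera motions.
    diff = np.degrees(np.arccos(np.clip(b_trans @ b_rot, -1.0, 1.0)))
    print(f"{name}: induced motions differ by {diff:.3f} deg")
```

Running this sketch, the two motions differ by only a few hundredths of a degree for Target 1, but by several tenths of a degree for the lateral and rear targets, which is exactly the decoupling argument above.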
[Figure 2: a camera at the center observes Targets 1–4; feature positions m1 and m2 show the image motion induced by rotation and by translation.]
Figure 2. The translation of a limited-view camera along the horizontal axis and the horizontal rotation may result in the same movement for some features in images (Target 1); however, the same camera motion causes different movements for the features in other places (Targets 2–4); only a full-view camera can capture them.

Based on the above observations, a vision-based SLAM method using full-view images can effectively manage sparse-feature or partially non-feature environments, and also achieve higher accuracy in localization and mapping than conventional limited-FOV methods. Until now, few SLAM methods using omnidirectional images have been proposed, and although a wider view results in a better performance in localization and mapping, there are no experiments assessing how accuracy is affected by the field of view.

In this paper, we realize simultaneous localization and mapping (SLAM) on full-view images. The principle is similar to the typical approach of conventional SLAM methods, PTAM (parallel tracking and mapping) [1]. In the proposed method, a full-view image is captured by a Ricoh Theta [2]. Next, feature points are extracted from the full-view image.
Then, spherical projection is used to compute the projection and back-projection of scene points. Finally, feature matching is performed using a spherical epipolar constraint (a brief sketch of this constraint is given at the end of this section). The characteristic of this paper is that a spherical model is used throughout the processing, from feature extraction to the computation of localization and mapping.

The rest of this paper is organized as follows. In the next section, we introduce the related research. In Section 3, we introduce a camera model. Section 4 describes our system. Comprehensive validations are shown in Section 5. Finally, we summarize our work in Section 6.
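As a minimal illustration of the spherical epipolar constraint mentioned above (a hedged sketch; the relative pose, the scene point, and the acceptance threshold are our assumptions, not values from the paper), two unit bearing vectors b1 and b2 form a consistent match when b2ᵀ E b1 is close to zero, where E = [t]× R is the essential matrix between the two spherical views:

```python
import numpy as np

def skew(v):
    # Skew-symmetric matrix such that skew(v) @ w == np.cross(v, w).
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def essential(R, t):
    # Essential matrix E = [t]x R for relative rotation R and translation t.
    return skew(t) @ R

def epipolar_residual(E, b1, b2):
    # Spherical epipolar residual b2^T E b1; zero for a perfect match.
    return float(b2 @ E @ b1)

# Assumed relative pose between two spherical views: pure translation along x.
R = np.eye(3)
t = np.array([1.0, 0.0, 0.0])
E = essential(R, t)

# A scene point observed from both views as unit bearing vectors.
X = np.array([2.0, 1.0, 3.0])
b1 = X / np.linalg.norm(X)            # bearing in view 1 (camera at origin)
b2 = (X - t) / np.linalg.norm(X - t)  # bearing in view 2 (camera at t)

print(abs(epipolar_residual(E, b1, b2)) < 1e-9)  # True: consistent match
```

Because the bearings live on the whole sphere rather than an image plane, the same residual test applies to features in any direction, including behind the camera.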
2. Related Works

In this section, we first introduce the related research in two categories. The first addresses omnidirectional image sensors, while the other addresses SLAM methods that use omnidirectional image sensors. Then, we explain the characteristics of the proposed method.
2.1. Omnidirectional Image Sensor

A wider FOV means more visual information. One of the methods for covering a wide FOV uses a cluster of cameras. However, multiple capture devices are needed, requiring a complicated calibration of the multiple cameras. Conversely, it is desirable to observe a wide scene using a single camera during a real-time task. For this purpose, other than fisheye cameras [3], omnidirectional image sensors have been developed; a popular approach combines a camera with a mirror to form a catadioptric omnidirectional camera. The omnidirectional images are acquired using mirrors such as spherical [4], conical [5], convex hyperbolic [6], or convex parabolic [7] ones. To acquire a full-view image, a camera with a pair of fisheye lenses was also developed [8].
2.2. Omnidirectional-Vision SLAM

Recently, many vision-based SLAM algorithms have been proposed [9]. According to the FOV of the cameras, these approaches can be grouped into two categories: one is based on perspective images with a conventional limited FOV, and the other relies on omnidirectional images with a hemispherical FOV. Here, we focus the discussion of the related work on the omnidirectional SLAM approaches.

One of the most popular techniques for SLAM with omnidirectional cameras is the extended Kalman filter (EKF). For example, Rituerto et al. [10] integrated the spherical camera model into EKF-SLAM by linearizing the direct and the inverse projection. Building on [10], Gutierrez et al. [11] introduced a new computation of the descriptor patch for catadioptric omnidirectional cameras that aimed to reach rotation and scale invariance. Then, a new view initialization mechanism was presented for the map building process within the problem of EKF-based visual SLAM [12]. Gamallo et al. [13] proposed a SLAM algorithm (OV-FastSLAM) for omnivision cameras operating with severe occlusions. Chapoulie et al. [14] presented an approach, applied to SLAM, that addresses qualitative loop closure detection using a spherical view. Caruso et al. [15] proposed an extension of LSD-SLAM to a generic omnidirectional camera model; the resulting method was capable of handling central projection systems such as fisheye and catadioptric cameras.

In the above omnidirectional-vision SLAM methods, although they used omnidirectional images, their images could still not obtain a view as wide as that of full-view images. Additionally, since processing such as feature extraction and feature matching was not conducted based on the spherical model, they used perspective projection. These approaches are not considered spherical-model-based approaches, according to a recent survey [16].

However, visual odometry and structure from motion are also designed for estimating camera motion and 3D structure in an unknown environment. Some camera pose estimation methods for full-view systems have been proposed [17]. An extrinsic camera parameter recovery method for a moving omni-directional multi-camera system was proposed; this method is based on the shape-from-motion and PnP techniques [18]. Pagani et al. investigated the use of full spherical panoramic images for Structure from Motion algorithms; several error models for solving the pose problem and the relative pose problem were introduced [19]. Scaramuzza et al. obtained a 3D metric reconstruction of a scene from two highly-distorted omnidirectional images by using image correspondences only [20]. Our spherical model is similar to theirs; their methods also prove that a wide view based on the spherical model has advantages over a limited view.

The aim of this research is to develop a SLAM method for a mobile robot in indoor environments. The proposed method is based on a spherical model. The problem of tracking failure due to a limited view can be completely avoided. Furthermore, as described in Section 1, full-view images also result in better performance for SLAM. The processing of the proposed full-view SLAM method is based on a spherical camera model.
The effectiveness of the proposed method is shown using real-world experiments in indoor environments. Additionally, an experiment is also conducted to show how the accuracy is affected by the field of view.
3. Camera Model and Full-View Image

Although the perspective projection model serves as the dominant imaging model in computer vision, there are several factors that make the perspective model far too restrictive. A variety of non-perspective imaging systems have been developed, such as a catadioptric sensor that uses a combination of lenses and mirrors [21], wide-angle lens systems [3], clusters of cameras [22], and a compound camera [23]. No matter what form of imaging system is used, Grossberg et al. [24] presented a general imaging model to represent an arbitrary imaging system. In their opinion, an imaging system may be modeled as a set of raxels on a sphere surrounding the imaging system. Strictly speaking, there is no commercial single-viewpoint full-view camera. In this paper, we approximate the RICOH THETA (Ricoh, 2016) as a single-viewpoint full-view camera, because the distance between the two viewpoints of the fisheye lenses of the RICOH THETA is very small relative to the observed scenes [25]. This approximation is rather reasonable for a multiple-camera motion estimation approach, as mentioned in [26]. The RICOH THETA camera is only used as a capture device for a single-viewpoint full-view image. The captured full-view image is then transformed to a discrete spherical image; the proposed method is based on the discrete spherical image (Figure 3a). The discrete spherical model follows [27].
Figure 3. (a) The proposed method is based on the discrete spherical image; (b) the spherical projection model; (c) full-view image.
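The transformation from the captured full-view image to spherical directions can be sketched as follows (our illustration; it assumes the common equirectangular output format of the RICOH THETA, while the paper's discrete spherical model follows [27] and its exact discretization is not specified here): each pixel (u, v) is mapped to an azimuth angle ϕ and a polar angle θ, which give a unit vector on the sphere.

```python
import numpy as np

def equirect_to_sphere(u, v, width, height):
    # Map an equirectangular pixel (u, v) to a unit vector on the sphere.
    # The azimuth phi spans [-pi, pi) across the image width; the polar
    # angle theta spans [0, pi] from the top to the bottom of the image.
    phi = (u / width) * 2.0 * np.pi - np.pi
    theta = (v / height) * np.pi
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

# With this convention, the image center maps to the +x direction
# and the top row maps to the north pole (+z) of the sphere.
print(equirect_to_sphere(1024.0, 512.0, 2048, 1024))  # ~[1, 0, 0]
print(equirect_to_sphere(1024.0, 0.0, 2048, 1024))    # ~[0, 0, 1]
```

Sampling this mapping over a regular grid of (θ, ϕ) yields the discrete spherical image on which the subsequent processing operates.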
Next, we describe the spherical projection model. Suppose there is a sphere with a radius f and a point p in space, as shown in Figure 3b. The intersection of the sphere surface with the line joining point p and the center of the sphere is the projection of the point onto the sphere. Suppose the ray direction of the projection is determined by a polar angle, θ, and an azimuth angle, ϕ, and the coordinates of point p in space are

$$M_c = (X_c, Y_c, Z_c)^T \quad (1)$$

The projection of point p, in the spherical model, can be represented as