
Article

Spherical-Model-Based SLAM on Full-View Images for Indoor Environments

Jianfeng Li 1 , Xiaowei Wang 2 and Shigang Li 1,2,*

1 School of Electronic and Information Engineering, also with the Key Laboratory of Non-linear Circuit and Intelligent Information Processing, Southwest University, Chongqing 400715, China; [email protected]
2 Graduate School of Information Sciences, Hiroshima City University, Hiroshima 7313194, Japan; [email protected]
* Correspondence: [email protected]

Received: 14 October 2018; Accepted: 14 November 2018; Published: 16 November 2018

Featured Application: For an indoor mobile robot, finding its location and building environment maps.

Abstract: As we know, SLAM (Simultaneous Localization and Mapping) relies on observations of the surroundings. A full-view image provides more benefits to SLAM than a limited-view image. In this paper, we present a spherical-model-based SLAM on full-view images for indoor environments. Unlike traditional limited-view images, the full-view image has its own specific imaging principle (which is nonlinear) and is accompanied by distortions. Thus, specific techniques are needed to process a full-view image. In the proposed method, we first use a spherical model to express the full-view image. Then, the algorithms are implemented based on the spherical model, including feature point extraction, feature point matching, 2D–3D connection, and projection and back-projection of scene points. Thanks to the full field of view, the experiments show that the proposed method effectively handles sparse-feature or partially non-feature environments and achieves high accuracy in localization and mapping. An experiment is also conducted to show how the accuracy is affected by the field of view.

Keywords: full-view image; spherical model; SLAM

1. Introduction

For a mobile robot, finding its location and building environment maps are basic and important tasks. An answer to this need is the development of simultaneous localization and mapping (SLAM) methods. For vision-based SLAM systems, localization and mapping are achieved by observing the features of environments via a camera. Therefore, the performance of a vision-based SLAM method depends not only on the algorithm, but also on the feature distribution of the environments observed by the camera. Figure 1 shows a sketch of an indoor environment. For such a room, there may be few features within the field of view (FOV). When observing such scenes with a limited-FOV camera, SLAM often fails to work. However, these problems can be avoided by using a full-view camera.


Figure 1. The performance of a vision-based SLAM is influenced by the field of view.

Another reason for using a full-view camera is that it improves the accuracy of SLAM methods. Because some different motions may produce similar changes in the image of a limited-FOV camera, these motions can only be discriminated with a full-view image. For example, as shown in Figure 2, the translation of a limited-view camera along the horizontal axis or a rotation in the horizontal direction may result in the same movement for some features in the image, i.e., Target 1 in Figure 2. However, for Targets 2–4, the two camera motions cause different movements. This means that in some cases, different motions may be difficult to decouple from the observation of limited-FOV images, while it is possible to distinguish them from the observation of full-view images.

Figure 2. The translation of a limited-view camera along the horizontal axis and a horizontal rotation may result in the same movement for some features in images (Target 1); however, the same camera motions cause different movements for the features in other places (Targets 2–4), which only a full-view camera can capture.

Based on the above observations, a vision-based SLAM method using full-view images can effectively manage sparse-feature or partially non-feature environments, and also achieve higher accuracy in localization and mapping than conventional limited field-of-view methods. Until now, few SLAM methods using omnidirectional images have been proposed, and although a wider view results in a better performance in localization and mapping, there are no experiments assessing how accuracy is affected by the view field.

In this paper, we realize simultaneous localization and mapping (SLAM) on full-view images. The principle is similar to the typical approach of the conventional SLAM methods, PTAM (parallel tracking and mapping) [1]. In the proposed method, a full-view image is captured by a Ricoh Theta [2]. Next, feature points are extracted from the full-view image. Then, spherical projection is used to compute the projection and back-projection of the scene points. Finally, feature matching is performed using a spherical epipolar constraint. The characteristic of this paper is that a spherical model is used throughout the processing, from feature extraction to localization and mapping computation.

The rest of this paper is organized as follows. In the next section, we introduce the related research. In Section 3, we introduce the camera model. Section 4 describes our system. Comprehensive validations are shown in Section 5. Finally, we summarize our work in Section 6.


2. Related Works

In this section, we first introduce the related research in two categories. The first addresses omnidirectional image sensors, while the other addresses SLAM methods using omnidirectional image sensors. Then, we explain the characteristics of the proposed method.

2.1. Omnidirectional Image Sensor

A wider FOV means more visual information. One method for covering a wide FOV uses a cluster of cameras. However, multiple capture devices are needed, requiring a complicated calibration of multiple cameras. Conversely, it is desirable to observe a wide scene using a single camera during a real-time task. For this purpose, other than fisheye cameras [3], omnidirectional image sensors have been developed; a popular approach combines a camera with a mirror, i.e., a catadioptric omnidirectional camera. The omnidirectional images are acquired using spherical [4], conical [5], convex hyperbolic [6], or convex parabolic [7] mirrors. To acquire a full-view image, a camera with a pair of fisheye lenses was also developed [8].

2.2. Omnidirectional-Vision SLAM

Recently, many vision-based SLAM algorithms have been proposed [9]. According to the FOV of the cameras, these approaches can be grouped into two categories: one is based on perspective images with a conventional limited FOV, and the other relies on omnidirectional images with a hemispherical FOV. Here, we focus the discussion of related work on omnidirectional SLAM approaches.

One of the most popular techniques for SLAM with omnidirectional cameras is the extended Kalman filter (EKF). For example, Rituerto et al. [10] integrated the spherical camera model into EKF-SLAM by linearizing the direct and inverse projections. Building on [10], Gutierrez et al. [11] introduced a new computation of the descriptor patch for catadioptric omnidirectional cameras, aiming at rotation and scale invariance. Then, a new view initialization mechanism was presented for the map building process within the problem of EKF-based visual SLAM [12]. Gamallo et al. [13] proposed a SLAM algorithm (OV-FastSLAM) for omnivision cameras operating under severe occlusions. Chapoulie et al. [14] presented an approach, applied to SLAM, that addresses qualitative loop closure detection using a spherical view. Caruso et al. [15] proposed an extension of LSD-SLAM to a generic omnidirectional camera model; the resulting method is capable of handling central projection systems such as fisheye and catadioptric cameras.

Although the above omnidirectional-vision SLAM methods use omnidirectional images, their images still cannot provide a view as wide as that of full-view images. Additionally, processing such as feature extraction and feature matching was not conducted based on the spherical model but relied on perspective projection; these approaches are therefore not considered spherical-model-based according to a recent survey [16].

Visual odometry and structure from motion are also designed for estimating camera motion and 3D structure in an unknown environment, and some camera pose estimation methods for full-view systems have been proposed [17]. An extrinsic camera parameter recovery method for a moving omni-directional multi-camera system was proposed; this method is based on structure-from-motion and PnP techniques [18]. Pagani et al. investigated the use of full spherical panoramic images for Structure from Motion algorithms; several error models for solving the pose problem and the relative pose problem were introduced [19]. Scaramuzza et al. obtained a 3D metric reconstruction of a scene from two highly-distorted omnidirectional images by using image correspondences only [20]. Our spherical model is similar to theirs, and their methods also show that a wide view based on the spherical model has advantages over a limited view.

The aim of this research is to develop a SLAM method for a mobile robot in indoor environments. The proposed method is based on a spherical model, so the problem of tracking failure due to a limited view can be completely avoided. Furthermore, as described in Section 1, full-view images also result in better performance for SLAM. The processing of the proposed full-view SLAM method is based on a spherical camera model. The effectiveness of the proposed method is shown using real-world experiments in indoor environments. Additionally, an experiment is also conducted to prove that the accuracy is affected by the view field.

3. Camera Model and Full-View Image

Although the perspective projection model serves as the dominant imaging model in computer vision, several factors make the perspective model far too restrictive. A variety of non-perspective imaging systems have been developed, such as catadioptric sensors that use a combination of lenses and mirrors [21], wide-angle lens systems [3], clusters of cameras [22], and compound cameras [23]. No matter what form of imaging system is used, Grossberg and Nayar [24] presented a general imaging model to represent an arbitrary imaging system. In their view, an imaging system may be modeled as a set of raxels on a sphere surrounding the imaging system. Strictly speaking, there is no commercial single-viewpoint full-view camera. In this paper, we approximate the RICOH THETA (Ricoh, 2016) as a single-viewpoint full-view camera because the distance between the two viewpoints of its fisheye lenses is very small relative to the observed scenes [25]. This approximation is rather reasonable for a multiple-camera motion estimation approach, as mentioned in [26]. The RICOH THETA camera is only used as a capture device for a single-viewpoint full-view image. The captured full-view image is then transformed into a discrete spherical image, and the proposed method is based on this discrete spherical image (Figure 3a). The discrete spherical model follows [27].



Figure 3. (a) The proposed method is based on the discrete spherical image; (b) the spherical projection model; (c) a full-view image.

Next, we describe the spherical projection model. Suppose there is a sphere with a radius f and a point p in space, as shown in Figure 3b. The intersection of the sphere surface with the line joining point p and the center of the sphere is the projection of the point onto the sphere. Suppose the ray direction of the projection is determined by a polar angle, θ, and an azimuth angle, ϕ, and the coordinates of point p in space are

Mc = (Xc, Yc, Zc)T    (1)

The projection of point p, in a spherical model, can be represented as

m = ( f sin θ cos ϕ, f sin θ sin ϕ, f cos θ)T (2)

Then, we have the following relation between the two

m = ( f sin θ cos ϕ, f sin θ sin ϕ, f cos θ)T = ( f /ρ)(Xc, Yc, Zc)T    (3)

where ρ = √(Xc² + Yc² + Zc²). Let τ = f /ρ; we have

m = τMc ≅ Mc    (4)

This means that m is equal to Mc apart from a scale factor. Let f = 1,

m = (sin θ cos ϕ, sin θ sin ϕ, cos θ)T (5)

Now we have the normalized spherical coordinates. This corresponds to the projection onto a unit sphere. Compared with the perspective projection of a planar image based on a pinhole camera model, a full-view image is obtained by projecting all visible points onto the sphere. Thus, a full-view image is the surface of a sphere, with the focal point at the center of the sphere. We use the equidistant cylindrical projection to represent a full-view image, as shown in Figure 3c. The corners of squares become more distorted as they approach the poles; thus, planar algorithms cannot be applied directly. Strictly speaking, the two lenses should be calibrated with a dedicated calibration process rather than approximated by a single spherical model. However, the distance between the two viewpoints of the RICOH THETA is smaller than 1 cm, and according to the experiments on the absolute errors of the estimated rotation and translation in [26], the spherical approximation has only slightly lower performance than a purely spherical camera (which does not suffer from approximation-induced errors) and is comparable when the feature tracking error is large. The authors of [26] also showed that the approximation can even improve the accuracy and stability of the estimated motion over the exact algorithm.
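To make the spherical representation concrete, the following sketch converts between equirectangular (equidistant cylindrical) pixel coordinates and unit-sphere bearings as in Equation (5). It is a minimal illustration rather than the authors' implementation; the 1280 × 640 image size matches Section 4.1, and the exact pixel-to-angle convention is an assumption.

```python
import numpy as np

WIDTH, HEIGHT = 1280, 640  # equirectangular resolution used in Section 4.1

def pixel_to_sphere(u, v, width=WIDTH, height=HEIGHT):
    """Map an equirectangular pixel (u, v) to a unit-sphere bearing (Equation (5)).

    Assumed convention: u spans the azimuth phi in [-pi, pi),
    v spans the polar angle theta in [0, pi].
    """
    phi = (u / width) * 2.0 * np.pi - np.pi
    theta = (v / height) * np.pi
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def sphere_to_pixel(point, width=WIDTH, height=HEIGHT):
    """Project a 3D point onto the unit sphere (Equation (4)), then to pixels."""
    m = np.asarray(point, dtype=float)
    m = m / np.linalg.norm(m)                     # m ~ Mc up to a scale factor
    theta = np.arccos(np.clip(m[2], -1.0, 1.0))
    phi = np.arctan2(m[1], m[0])
    u = (phi + np.pi) / (2.0 * np.pi) * width
    v = theta / np.pi * height
    return u, v
```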

4. Method Overview

Our method continuously builds and maintains a map, which describes the system's representation of the user's environment. The map consists of a collection of M point features located in a world coordinate frame W. Each point feature represents a local, spherical-textured patch in the world. The jth point pj on the map has a spherical descriptor dj and its 3D world coordinate pjW = (xjW, yjW, zjW)T in coordinate frame W.

The map also maintains N keyframes, which are taken in various frame coordinates according to the keyframe insertion strategy. Each keyframe K contains a keyframe index i, and the rotation RiK and translation TiK between this frame coordinate and the world coordinate. Each keyframe also stores the frame image IiK, the index set of map points ViK that appear in the keyframe, and the map points' 2D projections PnViK = (unViK, vnViK)T.

The flow of our full-view SLAM system is shown in Figure 4. Like other SLAM systems, our system has two main parts: mapping and tracking. The tracking system receives full-view images and maintains an estimation of the camera pose. At every frame, the system performs two-stage coarse-to-fine tracking. After tracking, the system determines whether a keyframe should be added. Once a keyframe is added, a spherical epipolar search is conducted, followed by a spherical match to insert the new map point. The flow of our full-view SLAM system is similar to many existing SLAM approaches; the difference is that these steps are handled based on the spherical model. Each step is now described in detail.
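The map and keyframe bookkeeping described above can be summarized with the minimal data-structure sketch below. The field names are illustrative and not taken from the authors' code; they simply mirror the quantities dj, pjW, RiK, TiK, IiK, ViK, and PnViK listed in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import numpy as np

@dataclass
class MapPoint:
    """The j-th map point: a spherical (SPHORB) descriptor d_j plus p_jW."""
    descriptor: np.ndarray      # binary descriptor of the spherical patch
    p_world: np.ndarray         # 3D coordinate (x_jW, y_jW, z_jW) in frame W

@dataclass
class Keyframe:
    """The i-th keyframe: pose, stored image, and its map-point observations."""
    index: int
    R: np.ndarray               # rotation R_iK between frame and world coordinates
    T: np.ndarray               # translation T_iK
    image: np.ndarray           # stored full-view frame I_iK
    observations: Dict[int, np.ndarray] = field(default_factory=dict)
    # maps a map-point index in V_iK to its 2D projection (u_nViK, v_nViK)

@dataclass
class SphericalMap:
    points: List[MapPoint] = field(default_factory=list)
    keyframes: List[Keyframe] = field(default_factory=list)
```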

Figure 4. The flow of our full-view SLAM system.

4.1. Image Acquisition

Images are captured from a Ricoh Theta S and are scaled to 1280 × 640. Before tracking, we run SPHORB, a fast and robust binary feature detector and descriptor for spherical panoramic images [27]. In contrast to state-of-the-art spherical features, this approach stems from the geodesic grid, a nearly equal-area hexagonal grid parametrization of the sphere used in climate modeling. It enables us to directly build fine-grained pyramids and construct robust features on the hexagonal spherical grid, avoiding the costly computation of spherical harmonics and their associated bandwidth limitation. The SPHORB feature also achieves scale and rotation invariance. Therefore, each feature point has a specific spherical descriptor.

4.2. Map Initialization

Our system is based on the map. When the system starts, we take two frames with a slight positional offset between them to perform the spherical match. Then, we represent the feature points using the unit-sphere polar coordinates (Equation (5)). Next, we employ the RANSAC five-point stereo algorithm to estimate an essential matrix. Finally, we triangulate the matched feature pairs to compute the map points. The resulting map is refined through bundle adjustment [28].

4.3. Map Point Projection and Pose Update

To project a map point into the image, it is first transformed from the world coordinate frame to the current frame coordinate frame C:

pjC = RCW pjW + TCW    (6)

where pjC is the jth map point in the current frame coordinates, pjW is the jth map point in the world coordinates, and RCW and TCW represent the current camera pose with respect to the world coordinate frame.

Then, we project the map point pjC onto the unit sphere as (mjx, mjy, mjz), as shown in Equation (5). Next, we detect all the feature points in the current frame by SPHORB and perform a fixed-range search around the unit coordinates (mjx, mjy, mjz). Since the map point pjC has a spherical descriptor, we can compare it with the candidate feature points to find the best match point p̂jC = (m̂jx, m̂jy, m̂jz).

Given a pair of matched feature points, (mjx, mjy, mjz) and (m̂jx, m̂jy, m̂jz), the angle between them can be computed as

ej = arccos( (mjx × m̂jx + mjy × m̂jy + mjz × m̂jz) / ( √(mjx² + mjy² + mjz²) · √(m̂jx² + m̂jy² + m̂jz²) ) )    (7)

Since the spherical points are unit vectors,

ej = arccos( mjx × m̂jx + mjy × m̂jy + mjz × m̂jz )    (8)

Substituting Equations (5) and (6) into Equation (8), we obtain a function of the rotation RCW and translation TCW. Since ej = 0 for a successfully matched feature point under the true rotation RCW and translation TCW, we can compute RCW and TCW by minimizing the following objective function, given n matched feature points.

∑_{j=1}^{n} ej    (9)
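A minimal sketch of this pose update is given below, assuming the matched unit-sphere observations and the corresponding 3D map points are already available. The Rodrigues-vector parameterization and the use of SciPy are our choices for illustration, not the authors'; note also that least_squares minimizes the sum of squared angular errors rather than Equation (9) literally.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def angular_residuals(pose, points_w, bearings):
    """Per-point angular errors e_j (Equation (8)) for a pose [rvec | t]."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()
    p_cam = (R @ points_w.T).T + pose[3:]                        # Equation (6)
    pred = p_cam / np.linalg.norm(p_cam, axis=1, keepdims=True)  # Equation (5)
    cos_ang = np.clip(np.sum(pred * bearings, axis=1), -1.0, 1.0)
    return np.arccos(cos_ang)

def update_pose(points_w, bearings, pose0=None):
    """Estimate R_CW, T_CW from n matched pairs by minimizing the angular errors."""
    pose0 = np.zeros(6) if pose0 is None else pose0
    sol = least_squares(angular_residuals, pose0, args=(points_w, bearings))
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```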

4.4. Two-Stage Coarse-to-Fine Tracking

Because of the distortion of full-view images, even a slight camera movement can cause a feature point to move a large distance in the full-view image. To be more resilient to this problem, our tracking system uses the same tracking strategy as PTAM. At every frame, a prior pose estimate is generated from a motion model. Then, map points are projected into the image, and their matching points are searched for within a coarse range. In this procedure, the camera pose is updated based on 50 coarse matches. After this, map points are re-projected into the image, and a tight range is used to search for the matching points. The final frame pose is calculated from all the matched pairs.
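The two-stage strategy can be outlined as follows. The injected callables (motion model, projection, match search, pose solver) are placeholders standing in for the components described elsewhere in Section 4, and the search radii are illustrative values, so this is an orchestration sketch under those assumptions rather than the authors' implementation.

```python
def track_frame(frame, map_points, motion_model, project, search_matches,
                solve_pose, coarse_radius=0.05, fine_radius=0.01, n_coarse=50):
    """Two-stage coarse-to-fine tracking of one full-view frame.

    motion_model()                          -> prior pose guess
    project(map_points, pose)               -> predicted unit-sphere bearings
    search_matches(frame, bearings, radius) -> matched (map point, bearing) pairs
    solve_pose(matches, pose)               -> refined pose (e.g., minimizing Equation (9))
    """
    prior = motion_model()

    # Stage 1: coarse search around the prior pose; update from ~50 matches.
    coarse = search_matches(frame, project(map_points, prior), coarse_radius)
    pose = solve_pose(coarse[:n_coarse], prior)

    # Stage 2: re-project with the updated pose and search within a tight range.
    fine = search_matches(frame, project(map_points, pose), fine_radius)
    return solve_pose(fine, pose), fine
```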

4.5. Tracking Recovery

Since the tracking is based on map point projection, the camera may fail to localize its own position when too few map points are matched to the current frame. To determine whether the tracking has failed, we use a threshold on the number of matched point pairs. If the tracking fails continuously for more than a few frames, our system initiates a tracking recovery procedure. We compare the current frame with all keyframes stored on the map using the spherical image match. The pose of the closest keyframe is used as the current frame's pose for the next tracking procedure, and the motion model is not used for the next frame.

4.6. Keyframe and Map Point Insertion

To keep the system functioning as the camera moves, new keyframes and map points are added to the system to include more information about the environment. After the tracking procedure, the current frame is reviewed to determine whether it is a qualified keyframe using the following conditions (a minimal check combining them is sketched after the list):

1. The tracking did not fail at this frame;
2. The recovery procedure has not been activated in the past few frames;
3. The distance from the nearest keyframe is greater than a minimum distance;
4. The time since the last keyframe was added exceeds a certain number of frames.
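A minimal check combining the four conditions might look as follows; the threshold values and state variables are placeholders, not the values used in the paper.

```python
def should_insert_keyframe(tracking_ok, frames_since_recovery, dist_to_nearest_kf,
                           frames_since_last_kf, min_dist=0.10,
                           min_recovery_gap=30, min_frame_gap=20):
    """Return True if the current frame qualifies as a keyframe (conditions 1-4).

    The thresholds (10 cm, 30 frames, 20 frames) are illustrative assumptions.
    """
    return (tracking_ok                                    # 1. tracking succeeded
            and frames_since_recovery > min_recovery_gap   # 2. no recent recovery
            and dist_to_nearest_kf > min_dist              # 3. far enough from nearest keyframe
            and frames_since_last_kf > min_frame_gap)      # 4. enough frames since last keyframe
```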

Map point insertion is implemented when a new keyframe is added. First, feature points in the new keyframe are considered as candidate map points. Then, we determine whether the candidate point is near successfully observed map points in the current view. This step helps us to discard candidate points that may already be on the map. Next, we select the closest keyframe by distance, since map point insertion is not available from a single keyframe. To establish a correspondence between the two views, we use the spherical epipolar search to limit the search range to a small, but accurate range. Based on the tight search range, spherical matching for the candidate map point is more reliable. If a match has been found, the candidate map point is triangulated. However, until this step, we cannot guarantee the candidate map point is a new point which is not on the map. Therefore, we compare the 3D candidate map point with each existing map point by projecting them onto the

first and current keyframe image. If both offsets on the two images are small, we conclude they are the same point, and then connect the current frame information to the existing map point. Finally, the remaining candidate map point is inserted into the map.
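The duplicate test on a newly triangulated candidate can be sketched as below: both the candidate and an existing map point are projected onto the unit spheres of the first and current keyframes, and the candidate is considered a duplicate if the angular offsets are small in both views. The angular threshold is an assumed value.

```python
import numpy as np

def to_bearing(point_w, R, T):
    """Transform a world point into a keyframe and normalize onto the unit sphere."""
    p = R @ point_w + T
    return p / np.linalg.norm(p)

def is_duplicate(candidate_w, existing_w, keyframe_poses, max_angle=0.01):
    """True if the candidate projects close to the existing map point in every
    given keyframe (here: the first and the current one)."""
    for R, T in keyframe_poses:
        a = to_bearing(candidate_w, R, T)
        b = to_bearing(existing_w, R, T)
        angle = np.arccos(np.clip(a @ b, -1.0, 1.0))
        if angle > max_angle:
            return False      # the offset is not small in this view
    return True               # small offsets in both views: treat as the same point
```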

4.7. Spherical Epipolar Search

Suppose that the same point is observed by two spherical-model cameras. Let the coordinates of the same point in the two camera coordinate systems be M1 and M2. The pose between the two cameras is represented by a rotation matrix, R, and a translation vector, T. According to the coplanar principle, we have

M1^T E M2 = 0    (10)

where E = [T]× R and

[T]× = [  0   −Tz   Ty
          Tz    0   −Tx
         −Ty   Tx    0  ]    (11)

By using the spherical projection in Equation (4), we have

M1^T E M2 = m1^T E m2 = 0    (12)

Given that m1 is known, according to Equation (5) we can rewrite Equation (12) as

A sin θ2 cos ϕ2 + B sin θ2 sin ϕ2 + C cos θ2 = 0    (13)

where

[A, B, C] = m1^T E    (14)

In practice, we give an error range ∆e > 0:

|A sin θ2 cos ϕ2 + B sin θ2 sin ϕ2 + C cos θ2| < ∆e (15)

Equation (15) gives the constraint used in the spherical epipolar search to identify the corresponding feature point.
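A sketch of Equations (10)–(15) in code: build E from the relative pose between the two views and keep only those candidate bearings on the second sphere whose epipolar residual is below Δe. The variable names and the threshold value are illustrative.

```python
import numpy as np

def skew(t):
    """[T]x as in Equation (11)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential(R, T):
    """E = [T]x R."""
    return skew(T) @ R

def epipolar_candidates(m1, candidates, R, T, delta_e=1e-3):
    """Return the unit bearings m2 satisfying |m1^T E m2| < delta_e (Equation (15)).

    m1 is a unit bearing in the first view; candidates is an (N, 3) array of
    unit bearings detected in the second view.
    """
    line = m1 @ essential(R, T)              # [A, B, C] = m1^T E (Equation (14))
    residuals = np.abs(candidates @ line)    # A*x2 + B*y2 + C*z2 per candidate
    return candidates[residuals < delta_e]
```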

4.8. Data Refinement

After a new keyframe and new map points are added, bundle adjustment is used as the last step. In our map, each map point corresponds to at least one keyframe. Using the keyframe poses and the 2D/3D point correspondences, each map point projection has an associated reprojection error, calculated as in Equation (7). Bundle adjustment iteratively adjusts all map points and keyframes to minimize this error.

An added map point may still be incorrect because of a matching error. To identify such outliers, at every frame we calculate the reprojection error of the map points visible in the current frame. If the number of erroneous observations of a map point exceeds the number of correct ones, the map point is discarded.
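A minimal sketch of the per-frame bookkeeping used to flag outlier map points, assuming each map point keeps simple good/bad observation counters; the error threshold and the exact counter policy are assumptions consistent with the description above, not the authors' precise rule.

```python
import numpy as np

def update_outlier_counts(map_points, observations, R, T, error_threshold=0.02):
    """Accumulate, per visible map point, how often its angular reprojection error
    (Equation (7)) is large versus small, and keep only points with no more bad
    observations than good ones."""
    kept = []
    for point, bearing in zip(map_points, observations):
        pred = R @ point["p_world"] + T
        pred = pred / np.linalg.norm(pred)
        err = np.arccos(np.clip(pred @ bearing, -1.0, 1.0))
        key = "bad" if err > error_threshold else "good"
        point[key] = point.get(key, 0) + 1
        if point.get("bad", 0) <= point.get("good", 0):
            kept.append(point)
    return kept
```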

5. Validation

The system described above was implemented on a desktop PC with a 2.5 GHz Intel(R) Core(TM) i5-2400S processor and 8 GB RAM. To evaluate the performance of our full-view SLAM system, we first evaluated it in a small workspace to test the system functions. Then, we ran an accuracy test on a measured route in a sparse-feature environment. Last, we moved the camera in a room to conduct a real-world test. As this paper focuses on emphasizing the advantage of the full-view-based method over limited-view-based methods, we also conducted an experiment to show that the accuracy decreases as the field of view decreases.

5.1. Functions Test

We prepared a video with 2429 frames. This video, which was recorded from a full-view model camera, explored a desk and its immediate surroundings. First, we let the camera move in front of the desk to produce an overview of the scene. Then, the camera moved away from the desk. Next, we occluded the camera and let it try to recover tracking at a place near the desk. The full test video can be seen at https://youtu.be/_4CFIPA5kZU.

The final map consisted of 31 keyframes and 1243 map points. We selected nine frames (frame index: 134, 356, 550, 744, 1067, 1301, 1639, 1990, and 2103) from the test video to illustrate the performance, as shown in Figure 5. The yellow plane in these figures represents the world coordinate system, which was established during map initialization; it remained stable in the environment throughout the test. Whether the camera was far from or near to an object, our system functioned well. The final map is shown in Figure 6. From the map, we identified the laptop PC, desktop PC, and other items. We also tested the tracking recovery, which is very important for camera re-localization, in this video; it was also used to evaluate our tracking and mapping system. We occluded the camera at frame 2171 and stopped occluding at frame 2246. While the camera was occluded, we moved it from a distant place to a closer place relative to the desk. Our system successfully localized itself after being moved (Figure 7).


Figure 5. Sample frames from the desk video. (a) Frame 519; (b) Frame 1533; (c) Frame 2294.


Figure 6. The map produced in the desk video. (a) Map points and the corresponding real situation; (b) the map points coincide with the real situation.





Figure 7. Tracking recovery in the desk video. (a) Tracking well before occlusion; (b) the system loses tracking; (c) tracking recovers even though the location has changed.

5.2. Accuracy Test

To validate our system further and compare it with the limited-view method (PTAM), we measured routes in a space with sparse features, as shown in Figure 8. The camera moved from the starting point to the ending point with no rotation, as quickly as possible. The first 10 cm was used for map initialization; we provided the real scale for the system. During the test, the limited-view SLAM failed because the environment was sparse and only a few features were captured in the limited views. We tested PTAM [1], one of the most famous limited-view SLAM approaches, on perspective images generated directly from the full-view images. Figure 9 shows that PTAM failed to track in the sparse-feature environment, which was a wall in this test. However, our full-view SLAM tracked well in this situation because it can capture features from places other than the wall. The final map is shown as the left image in Figure 10. Two boxes were defined in the final map. The coordinate system on the map was the world coordinate system, which was also the starting point in this test. Each green point represents a keyframe. The trajectory, which was very close to the ground truth, is shown as the right image in Figure 10. The video can be seen at https://youtu.be/c1ER1OqyjXM.

Figure 8. A measured route in a sparse environment.

Figure 9. At the same position, PTAM fails to track as fewer feature points are visible, while full-view SLAM tracks well.

Figure 10. The map and the trajectory for Figure 9.

Then, we conducted an experiment to test our system's accuracy in a larger environment. We planned a new measured route, as shown in Figure 11. The camera moved in a clockwise loop through 16 marked points; we randomly put a few objects around the route to generate a sparse space, and no objects were put around points 9–12. Point A was the starting point. The first 10 cm from A to B was used for map initialization. Using continuous tracking and mapping, we obtained the distance between point A and every marked point. We measured the ground truth using a laser rangefinder. The final errors for every marked point using our method are shown in Table 1 and Figure 12. As seen in Table 1, the accuracy at marked points 8–12 decreased compared to the other points. The reason is that we set few trackable features around these marked points to simulate a sparse environment. Our proposed method is based on full-view images; it uses features from every direction, so this had only a small influence on our system. However, limited-view based methods, such as PTAM [1], immediately failed to track at these points because of a limited view with few features. The result shows that our method tracked in a sparse environment, and that the accuracy of our method was sufficient. The average error was 0.02 m.

Figure 11. The designed route.

Figure 12. The errors of marked points.

Table 1. The accuracy compared with ground truth.

Marked Point   Calculated Value (m)   Ground Truth (m)   Errors (m)
1              0.290                  0.299              0.009
2              0.601                  0.602              0.001
3              1.086                  1.082              0.004
4              1.131                  1.116              0.015
5              1.241                  1.239              0.002
6              1.401                  1.406              0.005
7              1.604                  1.615              0.011
8              1.830                  1.857              0.027
9              1.546                  1.590              0.044
10             1.482                  1.521              0.039
11             1.473                  1.516              0.043
12             1.515                  1.559              0.044
13             0.998                  1.001              0.007
14             0.735                  0.741              0.006
15             0.506                  0.522              0.016
16             0.378                  0.425              0.047
Average error = 0.020 m

5.3. Route Test

In this section, we demonstrate the performance of our system in real-world situations. We allowed the camera to move along the wall in a square path in a room to build the map. Some sample frames are shown in Figure 13 to illustrate the formation of the map and the trajectory. We marked "P" on the map images to denote the current camera positions and marked "door" on both the full-view image and the map image; readers may find the relationship between the map and the real environment easier to understand using these markers. The final trajectory was a square, as planned. The video is available at https://youtu.be/QzilX069U0M. The estimated surroundings and the trajectory were similar to those of the real environment.

For further validation of the route test, as Figure 14a shows, we mounted the full-view camera on an MAV and let the MAV fly along a planned route. One side of the route is a tall building with unique features, while the other side is full of trees with confusing features. The final trajectory is very close to our planned route. Our final estimated trajectory, the trajectory from PTAM, and the trajectory from GPS are compared in Figure 14. PTAM performs badly and even loses tracking as the MAV rotates, because limited-view SLAM systems do not insert a keyframe if only rotation happens. The trajectories from our system and GPS are very close, but around the start point the GPS has larger errors than the trajectory estimated by our system. The GPS trajectory shows that the MAV is very close to the wall, which is not possible; this is because the GPS signal may be affected by the tall building nearby. The whole test video by our system is available at https://youtu.be/R8C_W5HY1Dg.

Figure 13. Sample frames from the indoor test.


Figure 14. (a) The full-view camera on the MAV; (b) PTAM performs badly as the MAV rotates or in a sparse-feature view; (c) GPS flight trajectory; (d) the flight trajectory by GPS, PTAM, and our system.

5.4. Accuracy Validation with a Decrease in the Field of View

In this section, we validate the inherent disadvantages of limited-view based methods. We examine how the performance of our system behaves as the field of view decreases. Similar to Section 5.2, we designed a route and collected full-view images for tracking and mapping. To decrease the field of view from the full-view images, we decreased the range of theta, since the full-view image was presented in the spherical model (Figure 15). Thus, we obtained limited-view images with the desired view angles. We tested our system with views of theta = 160, 140, 120, 100, and 80 degrees, and the experiment was conducted ten times. Each time, the limited view was generated randomly, so the tracking result differed depending on the features. The accuracy with different theta values is listed in Table 2.

As shown in Table 2, as the field of view decreased, the accuracy decreased. When the field of view was less than 100 degrees, some experiments failed to track ('X' in Table 2). Because a limited view means limited information, limited features from one direction could not yield a result as accurate as features from all directions, especially in sparse environments. The average error showed that wider views resulted in a better performance.

Figure 15. Achieving a limited view from full-view images by decreasing theta.


Table 2. The accuracy with different theta.

Times           theta = 160 Deg.   theta = 140 Deg.   theta = 120 Deg.   theta = 100 Deg.   theta = 80 Deg.
1               0.016 m            0.213 m            0.088 m            X                  X
2               0.184 m            0.093 m            0.294 m            X                  X
3               0.026 m            0.037 m            0.244 m            X                  X
4               0.026 m            0.037 m            0.244 m            X                  X
5               0.126 m            0.078 m            0.159 m            0.158 m            X
6               0.015 m            0.301 m            0.093 m            0.187 m            X
7               0.015 m            0.097 m            0.156 m            0.170 m            0.111 m
8               0.015 m            0.097 m            0.156 m            0.170 m            0.111 m
9               0.104 m            0.076 m            0.005 m            0.032 m            0.321 m
10              0.077 m            0.102 m            0.027 m            0.141 m            0.315 m
Average error   0.060 m            0.113 m            0.147 m            X                  X

6. Conclusions

In this paper, we presented a SLAM system for full-view images. Based on the spherical model, our system can track features even in a sparse environment where limited-view SLAM systems do not perform well. The contribution of this paper is that we realized tracking and mapping directly on full-view images; the key algorithms of SLAM were all adapted to the spherical model. The experiments showed that our method can successfully be applied to camera pose estimation in an unknown scene. Since our system was recently developed, no processing optimization was considered, and we did not evaluate the time cost in this paper. Future work will consider increasing the speed of our system using multiple threads and the GPU.

Author Contributions: Software, J.L.; Supervision, S.L.; Validation, J.L. and X.W.; Writing—original draft, J.L.; Writing—review & editing, S.L.

Funding: This work is supported by the Fundamental Research Funds for the Central Universities (SWU20710916).

Conflicts of Interest: The authors declare no conflict of interest.

References

1. Klein, G.; Murray, D. Parallel tracking and mapping for small AR workspaces. In Proceedings of the 6th IEEE/ACM ISMAR, Nara, Japan, 13–16 November 2007; pp. 225–234.
2. Available online: https://theta360.com/en/ (accessed on 1 October 2018).
3. Miyamoto, K. Fish eye lens. J. Opt. Soc. Am. 1964, 54, 1060–1061. [CrossRef]
4. Hong, J.; Tan, X.; Pinette, B.; Weiss, R.; Riseman, E.M. Image-based Homing. In Proceedings of the 1991 IEEE International Conference on Automation, Sacramento, CA, USA, 9–11 April 1991; pp. 620–625.
5. Yagi, Y.; Kawato, S.; Tsuji, S. Real-time omni-directional vision for robot navigation. Trans. Robot. Autom. 1994, 10, 11–22. [CrossRef]
6. Yamazawa, K.; Yagi, Y.; Yachida, M. Omnidirectional imaging with hyperboloidal projection. In Proceedings of the 1993 IEEE/RSJ International Conference on Intelligent Robots and Systems, Yokohama, Japan, 26–30 July 1993; pp. 1029–1034.
7. Nayar, S.K. Catadioptric omnidirectional camera. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, PR, USA, 17–19 June 1997; pp. 482–488.
8. Li, S. Full-view spherical image camera. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Hong Kong, China, 20–24 August 2006; pp. 386–390.
9. Fuentes-Pacheco, J.; Ruiz-Ascencio, J.; Rendon-Mancha, J. Visual simultaneous localization and mapping: A survey. Artif. Intell. Rev. 2015, 43, 55–81. [CrossRef]
10. Rituerto, A.; Puig, L.; Guerrero, J.J. Visual SLAM with an omnidirectional camera. In Proceedings of the 20th International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 348–351.
11. Gutierrez, D.; Rituerto, A.; Montiel, J.M.M.; Guerrero, J.J. Adapting a real-time monocular visual SLAM from conventional to omnidirectional cameras. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 343–350.

12. Valiente, D.; Jadidi, M.G.; Miró, J.V.; Gil, A.; Reinoso, O. Information-based view initialization in visual SLAM with a single omnidirectional camera. Robot. Auton. Syst. 2015, 72, 93–104. [CrossRef]
13. Gamallo, C.; Mucientes, M.; Regueiro, C.V. Omnidirectional visual SLAM under severe occlusions. Robot. Auton. Syst. 2015, 65, 76–87. [CrossRef]
14. Chapoulie, A.; Rives, P.; Filliat, D. A spherical representation for efficient visual loop closing. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, Spain, 6–13 November 2011; pp. 335–342.
15. Caruso, D.; Engel, J.; Cremers, D. Large-scale direct SLAM for omnidirectional cameras. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 141–148.
16. Taketomi, T.; Uchiyama, H.; Ikeda, S. Visual SLAM algorithms: A survey from 2010 to 2016. IPSJ Trans. Comput. Vis. Appl. 2017, 9, 16. [CrossRef]
17. Scaramuzza, D.; Fraundorfer, F. Visual Odometry [Tutorial]. Robot. Autom. Mag. 2011, 18, 80–92. [CrossRef]
18. Sato, T.; Ikeda, S.; Yokoya, N. Extrinsic Camera Parameter Recovery from Multiple Image Sequences Captured by an Omni-Directional Multi-camera System. In Proceedings of the European Conference on Computer Vision, Prague, Czech Republic, 11–14 May 2004; Volume 2, pp. 326–340.
19. Pagani, A.; Stricker, D. Structure from Motion using full spherical panoramic cameras. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, 6–13 November 2011; pp. 375–382.
20. Scaramuzza, D.; Martinelli, A.; Siegwart, R. A Flexible Technique for Accurate Omnidirectional Camera Calibration and Structure from Motion. In Proceedings of the IEEE International Conference on Computer Vision Systems, New York, NY, USA, 4–7 January 2006; p. 45.
21. Nayar, S.K.; Baker, S. Catadioptric image formation. In Proceedings of the DARPA Image Understanding Workshop, Bombay, India, 1–4 January 1998; Volume 35, pp. 35–42.
22. Swaminathan, R.; Nayar, S.K. Nonmetric Calibration of Wide-Angle Lenses and Polycameras. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1172–1178. [CrossRef]
23. Gaten, E. Geometrical optics of a galatheid compound eye. J. Comp. Physiol. A 1994, 175, 749–759. [CrossRef]
24. Grossberg, M.D.; Nayar, S.K. A General Imaging Model and a Method for Finding its Parameters. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 108–115.
25. Available online: https://theta360.com/en/about/theta/technology.html (accessed on 1 October 2018).
26. Kim, J.S.; Hwangbo, M.; Kanade, T. Spherical Approximation for Multiple Cameras in Motion Estimation: Its Applicability and Advantages. Comput. Vis. Image Underst. 2010, 114, 1068–1083. [CrossRef]
27. Zhao, Q.; Feng, W.; Wan, L.; Zhang, J. SPHORB: A fast and robust binary feature on the sphere. Int. J. Comput. Vis. 2015, 113, 143–159. [CrossRef]
28. Triggs, B. Bundle adjustment—A modern synthesis. In International Workshop on Vision Algorithms; Springer: Berlin/Heidelberg, Germany, 1999; pp. 298–372.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).