Omnidirectional Visual-Inertial Odometry Using Multi-State Constraint Kalman Filtering

Milad Ramezani¹, Kourosh Khoshelham¹ and Laurent Kneip²

¹M. Ramezani and K. Khoshelham are with the Department of Infrastructure Engineering, The University of Melbourne, Australia [email protected], [email protected]
²L. Kneip is with the ShanghaiTech School of Information Science and Technology, Shanghai, China [email protected]

Abstract— We present an Omnidirectional Visual-Inertial Odometry (OVIO) approach based on Multi-State Constraint Kalman Filtering (MSCKF) to estimate the ego-motion of a moving platform. Instead of considering visual measurements on the image plane, we use individual planes for each point that are tangent to the unit sphere and normal to the corresponding measurement ray. This way, we combine spherical images captured by an omnidirectional camera with inertial measurements within the MSCKF filtering framework. The key hypothesis of OVIO is that a wider field of view allows incorporating more visual features from the surrounding environment, thereby improving the accuracy and robustness of the motion estimation. Moreover, by using an omnidirectional camera, it is less likely to end up in a situation where there is not enough texture. We provide an evaluation of OVIO using synthetic and real video sequences captured by a fish-eye camera, and compare its performance with MSCKF using a perspective camera. The results show the superior performance of the proposed OVIO.

I. INTRODUCTION

Pose estimation by means of visual sensors is becoming important in many applications including navigation and autonomous vehicles. The demand for accurate position information is more pronounced in areas where the Global Positioning System (GPS) has no coverage, such as urban canyons.

A common approach to visual navigation is Visual Odometry (VO), which uses only visual information to estimate motion. This method is highly sensitive to the quality of image observations. By combining visual features from a video camera with inertial measurements from an Inertial Navigation System (INS), the resilience to low-texture situations and the accuracy can be substantially improved. This technique, known as Visual-Inertial Odometry (VIO), has been an active research area for more than a decade [1]–[4]. VIO employs visual measurements to compensate for INS errors within an estimation model.

Today, most VIO systems use visual information from a perspective camera with a limited Field of View (FoV). Therefore, achieving robust positioning performance with state-of-the-art VIO approaches remains a challenge in environments that lack salient features, due to illumination changes or insufficient texture across the entire image, as well as the presence of moving objects covering a significant area of the image of a perspective camera. The robustness of such VIO can be substantially improved by increasing the FoV of the camera. A larger FoV allows capturing a larger extent of the environment, thereby increasing the likelihood of usable regions in images for feature extraction and matching. Furthermore, a larger FoV provides a better geometry for pose estimation, because it is more likely to track features from different angles surrounding the moving camera. More importantly, a wider FoV allows features to be tracked over a longer sequence of images. In theory, this increases the number of visual measurements and consequently improves the precision and robustness of pose estimation. It is also worth noting that a larger FoV can resolve the rotation-translation ambiguity, which is a serious problem with smaller FoVs.

Despite these clear advantages, until now only little attention has been paid to large-FoV and omnidirectional cameras for visual-inertial odometry. To fill this gap, this paper presents an Omnidirectional Visual-Inertial Odometry (OVIO) approach using Multi-State Constraint Kalman Filtering (MSCKF) [5], which integrates visual information extracted from an omnidirectional camera with the inertial measurements from an INS. Since the standard perspective model is not suitable for larger FoVs, we develop a measurement model in MSCKF that incorporates measurement residuals on the unit sphere rather than on the image plane. Moreover, we apply an omnidirectional model to calculate the global 3D position of features and propagate their uncertainties. We evaluate OVIO and compare its performance with the conventional VIO within MSCKF through experiments in synthetic and real environments.

The paper is organized as follows: After a brief discussion of state-of-the-art VIO algorithms in section II, we introduce in section III our Omnidirectional Visual-Inertial Odometry (OVIO) approach using MSCKF for pose estimation. In section IV, we evaluate our approach and compare it with MSCKF using a perspective camera through experiments in synthetic and real environments. Finally, conclusions are drawn and directions for future research are discussed.
II. RELATED WORK

Algorithms for the integration of an exteroceptive sensor such as a camera with a proprioceptive inertial sensor are known as Visual-Inertial Odometry (VIO) and have been investigated in a vast body of literature [1]–[5]. From the estimation point of view, VIO approaches can be classified into optimization-based and filtering methods.

In optimization methods, unknown parameters such as the sensor pose are estimated by iterative minimization of a least-squares error function. In [6] the sensor pose, scene structure, and other unknowns are estimated using all image and inertial measurements simultaneously. However, because this method requires that all measurements from all images are available before the estimation, it is not suitable for real-time applications. In order to solve the real-time operability issue, Leutenegger et al. [7] bounded the size of the optimization window by marginalizing out old states. This method is introduced as the Sliding Window Filter (SWF) in [8]. The shortcoming of this approach is that not all measurements of a seen feature are used, because multiple poses are discarded, resulting in a loss of constraints.

In contrast to non-linear optimization, filtering methods estimate sensor motion and other parameters recursively in a filter as measurements from the INS and camera become available. Therefore, this approach is appropriate for real-time applications even in large-scale environments with infinitely long image sequences [6]. A popular estimator in filtering methods is the Extended Kalman Filter (EKF) [1], [2], [9]–[11]. In EKF-based methods, a state vector that comprises the unknowns is first propagated in a linear system model and then updated by feature measurements extracted from the images. If the scene structure and the current INS pose are simultaneously estimated, the approach is called EKF-based Simultaneous Localization and Mapping (SLAM) [12]–[15]. An advantage of EKF-SLAM algorithms is that the correlations between camera motion and 3D features are considered, since camera poses and 3D positions of observed features are estimated together. However, correctly handling these correlations imposes computational complexity that compromises this technique in an environment with many features [1].

There are several filtering algorithms in which, contrary to SLAM, the global position of features is not jointly estimated in the state vector [5], [16]–[21]. These methods utilize the visual measurements to impose epipolar constraints between either pairs of images [16]–[18] or multiple camera poses [5], [22]. Amongst these methods, Multi-State Constraint Kalman Filtering (MSCKF) [5] benefits from the entire set of geometric constraints that a feature produces when seen from multiple camera poses. By employing all these geometric constraints, more accurate estimates of the camera poses can be obtained.

What all vision-aided navigation systems have in common is that they are highly dependent on visual information observed from the environment. If the visible scene lacks sufficiently salient features, or moving entities occupy large parts of the image, these approaches cannot provide accurate positioning. The aforementioned issues are more likely to occur if the camera is limited to a small FoV [23]. On the contrary, larger-FoV cameras (e.g. using a fish-eye lens or a catadioptric system) have the potential to achieve more accurate ego-motion estimation, because they allow visual features to be tracked over a longer image sequence [24]. Cameras that have a FoV of 360° in azimuth and 90° to 140° in elevation [6] are known as omnidirectional cameras and include fish-eye lenses and catadioptric optical systems. In the last few years these cameras have gained popularity in robotics [23], [25], and a substantial body of work has been devoted to modelling their imaging geometry [26], [27] and calibration [28]–[30]. Despite the advantages of omnidirectional cameras for ego-motion estimation, they have been mostly used in VO [23]–[25], but rarely in VIO. In [31] an effort was made to fuse omnidirectional vision with inertial sensors using an adaptive estimator. This method estimates the position and linear velocity of a mobile robot, but not other pose parameters such as orientation. Furthermore, the performance of the algorithm has not been tested in a large-scale environment.

Overall, the review of the literature shows a preference for perspective cameras in VO and VIO approaches. Motivated by the potential of omnidirectional cameras for pose estimation, this work presents an omnidirectional visual-inertial odometry (OVIO) approach.
III. OMNIDIRECTIONAL VISUAL-INERTIAL ODOMETRY

We begin by briefly describing the system model, and then introduce a new measurement model in the MSCKF based on the tangential plane around the measurement ray on the unit sphere. After defining the new measurement model, we can link the omnidirectional observations with the system model to update the state vector and its covariance. A detailed description of the MSCKF can be found in [1], [5].

A. System Model

Let X be the (16 + 7M) × 1 EKF state vector defined as:

X = [\, q_E^{B\,T} \;\; b_g^T \;\; {}^{E}v_B^T \;\; b_a^T \;\; {}^{E}p_B^T \;\; q_E^{C_1 T} \;\; {}^{E}p_{C_1}^T \,\cdots\, q_E^{C_i T} \;\; {}^{E}p_{C_i}^T \,\cdots\, q_E^{C_M T} \;\; {}^{E}p_{C_M}^T \,]^T \qquad (1)

where q_E^B is the 4 × 1 unit quaternion representing the orientation of the INS resolved in the body frame {B}, the vectors ^E v_B and ^E p_B are the linear velocity and the position of the INS w.r.t. the Earth frame {E}, respectively, and b_g and b_a are the 3 × 1 vectors describing the gyroscope and accelerometer biases, respectively. The remaining parameters are the M camera poses, which include the camera quaternions q_E^{C_i} relative to the camera frames {C_i}, i = 1···M, and the positions ^E p_{C_i} in the Earth frame {E}. Following equation (1), the (15 + 6M) × 1 error-state vector is:

\delta X = [\, \delta\theta_B^T \;\; \delta b_g^T \;\; {}^{E}\delta v_B^T \;\; \delta b_a^T \;\; {}^{E}\delta p_B^T \;\; \delta\theta_{C_1}^T \;\; {}^{E}\delta p_{C_1}^T \,\cdots\, \delta\theta_{C_i}^T \;\; {}^{E}\delta p_{C_i}^T \,\cdots\, \delta\theta_{C_M}^T \;\; {}^{E}\delta p_{C_M}^T \,]^T \qquad (2)

where δθ_B and δθ_{C_i} are the INS rotation perturbation in the body frame and the camera rotation perturbation in the camera frame, respectively. The rotation perturbation can be linearly approximated from the quaternion perturbation [32].
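To make the bookkeeping implied by equations (1) and (2) concrete, the following Python sketch builds index maps into the state and error-state vectors for a given number of camera poses M. The helper names are ours, not from the paper; the sketch only illustrates where the 16 + 7M and 15 + 6M dimensions come from.

```python
import numpy as np

def state_layout(M):
    """Index slices for the (16 + 7M) x 1 MSCKF state of equation (1).
    Quaternions occupy 4 entries; biases, velocity and positions 3 each."""
    idx, layout = 0, {}
    for name, size in [("q_B", 4), ("b_g", 3), ("v_B", 3), ("b_a", 3), ("p_B", 3)]:
        layout[name] = slice(idx, idx + size); idx += size
    for i in range(M):                          # appended camera poses
        layout[f"q_C{i}"] = slice(idx, idx + 4); idx += 4
        layout[f"p_C{i}"] = slice(idx, idx + 3); idx += 3
    return layout, idx                          # idx == 16 + 7*M

def error_state_layout(M):
    """Index slices for the (15 + 6M) x 1 error state of equation (2).
    Each quaternion is replaced by a 3-dof rotation perturbation."""
    idx, layout = 0, {}
    for name in ["dtheta_B", "db_g", "dv_B", "db_a", "dp_B"]:
        layout[name] = slice(idx, idx + 3); idx += 3
    for i in range(M):
        layout[f"dtheta_C{i}"] = slice(idx, idx + 3); idx += 3
        layout[f"dp_C{i}"] = slice(idx, idx + 3); idx += 3
    return layout, idx                          # idx == 15 + 6*M

layout, n = state_layout(M=5)
err_layout, n_err = error_state_layout(M=5)
assert n == 16 + 7 * 5 and n_err == 15 + 6 * 5
```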
The INS-related components (q_E^B, ^E v_B, ^E p_B, b_g, b_a) in the state vector are propagated by the following equations:

{}^{E}\dot{p}_B = {}^{E}v_B, \quad {}^{E}\dot{v}_B = {}^{E}a_B, \quad \dot{q}_B = \tfrac{1}{2}\, q_B \otimes \omega, \quad \dot{b}_a = 0, \quad \dot{b}_g = 0 \qquad (3)

In these expressions the sign ⊗ indicates quaternion multiplication between the INS orientation (quaternion) and the angular velocity of the INS, ω. The estimate of the angular velocity, ω̂, is obtained from the measured angular velocity, ω_m, and the estimated gyroscope bias, b̂_g:

\hat{\omega} \approx \omega_m - \hat{b}_g \qquad (4)

The vector ^E a_B is the INS acceleration in the Earth frame, which is estimated from the measured INS acceleration, a_m, by:

{}^{E}\hat{a}_B \approx C_B^E (a_m - \hat{b}_a) + g^E \qquad (5)

where C_B^E is the rotation matrix from the body frame to the Earth frame, b̂_a is the estimated vector of the accelerometer bias, and g^E is the gravity vector in the Earth frame.
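As a minimal illustration of equations (3)–(5), the sketch below performs one Euler integration step of the INS state. The scalar-first, body-to-Earth quaternion convention and the simple discretization are assumptions of this sketch; the paper does not prescribe a particular integration scheme, and effects such as Earth rotation are ignored.

```python
import numpy as np

def quat_mult(q, r):
    """Hamilton product of two scalar-first quaternions."""
    w0, x0, y0, z0 = q
    w1, x1, y1, z1 = r
    return np.array([w0*w1 - x0*x1 - y0*y1 - z0*z1,
                     w0*x1 + x0*w1 + y0*z1 - z0*y1,
                     w0*y1 - x0*z1 + y0*w1 + z0*x1,
                     w0*z1 + x0*y1 - y0*x1 + z0*w1])

def quat_from_rotvec(phi):
    """Unit quaternion for a small rotation vector phi = omega * dt."""
    angle = np.linalg.norm(phi)
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = phi / angle
    return np.concatenate(([np.cos(angle / 2)], np.sin(angle / 2) * axis))

def quat_to_rot(q):
    """Rotation matrix C_B^E for a scalar-first body-to-Earth quaternion."""
    w, x, y, z = q
    return np.array([[1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
                     [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
                     [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def propagate(q, v, p, b_g, b_a, omega_m, a_m, dt, g=np.array([0.0, 0.0, -9.81])):
    """One Euler step of equations (3)-(5): bias-corrected gyro/accel integration."""
    omega = omega_m - b_g                                # equation (4)
    a_E = quat_to_rot(q) @ (a_m - b_a) + g               # equation (5)
    q_new = quat_mult(q, quat_from_rotvec(omega * dt))   # attitude update of (3)
    q_new /= np.linalg.norm(q_new)
    v_new = v + a_E * dt                                 # velocity update
    p_new = p + v * dt                                   # position update
    return q_new, v_new, p_new
```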

When a new image is captured, the camera exterior parameters are appended to the state; they can be estimated from the INS orientation and position by:

\hat{q}_E^{C_i} = q_B^{C} \otimes \hat{q}_E^{B}, \qquad {}^{E}\hat{p}_{C_i} = {}^{E}\hat{p}_B + C_B^E\, {}^{B}p_C \qquad (6)

where the entities q_B^C and ^B p_C are the calibration parameters between the INS and the camera, expressing the misalignment and the displacement, respectively.

Having added the camera pose to the state vector, the propagated covariance is augmented according to the Jacobian matrix derived from equations (6) w.r.t. the state vector. This way, the MSCKF estimates motion by directly imposing geometric constraints on multiple camera poses, M.
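The covariance augmentation mentioned above can be illustrated with the generic EKF cloning step used in MSCKF-style filters. The sketch below is our illustration, not code from the paper, and assumes that the Jacobian J of the new camera pose (equation (6)) with respect to the current error state has already been formed; the example Jacobian uses identity blocks and ignores the lever-arm term.

```python
import numpy as np

def augment_covariance(P, J):
    """Append a new 6-dof camera pose to the filter covariance.

    P : (n, n) covariance of the current error state.
    J : (6, n) Jacobian of the new camera pose w.r.t. the current error
        state, obtained by linearizing equation (6).
    Returns the (n+6, n+6) augmented covariance
        [[P,    P J^T],
         [J P,  J P J^T]].
    """
    top = np.hstack((P, P @ J.T))
    bottom = np.hstack((J @ P, J @ P @ J.T))
    P_aug = np.vstack((top, bottom))
    return 0.5 * (P_aug + P_aug.T)   # symmetrize for numerical hygiene

# Example with a 21-dimensional error state (15 INS states + one camera pose).
# Simplified Jacobian: rotation copies delta_theta_B, position copies delta_p_B.
P = np.eye(21) * 1e-3
J = np.zeros((6, 21))
J[:3, :3] = np.eye(3)
J[3:, 12:15] = np.eye(3)
P_new = augment_covariance(P, J)
assert P_new.shape == (27, 27)
```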
Fig. 1: Different components of projection in a single view-point omnidirectional camera and the triangulation between views C_1 and C_i. The scene point X^j ∈ R^3 is measured at location x^s_ij on the unit sphere. The vector u'' is the mapped coordinate of X^j on the sensor plane, so that u'' is along x'', and p'' is the vector of this point on the mirror or the lens. By reprojecting the coordinate of the scene point calculated from triangulation in the camera frame, x̃^s_ij, we obtain the residual r_ij in the tangential plane.

B. Measurement Model

In the conventional MSCKF, the measurement model is described on the image plane [5]. This plane is unsuited for the highly distorted images acquired by omnidirectional cameras. Therefore, we define the measurement model on a plane around each measurement ray that is tangent to the unit sphere. As can be seen in Fig. 1, central omnidirectional cameras have a single viewpoint, C_i, i = 1···N_j, that all rays pass through. N_j is the number of camera poses from which the scene point X^j is observed. The normalized ray direction, x^s_ij, is the mapping of the scene point to the unit sphere. Collinear with this vector, there exists a 3D vector, p'', which is the map of the ray to the mirror in catadioptric cameras or to the fish-eye lens in dioptric systems. This vector can be written as:

p'' = \begin{bmatrix} u'' \\ g(\|u''\|) \end{bmatrix} \qquad (7)

where u'' is the corresponding point of X^j on the sensor plane, which can be obtained from the pixel coordinate u' under an affine transformation u'' = A u' + t, A ∈ R^{2×2}, t ∈ R^2, and g(||u''||) is a non-linear function of the Euclidean distance ||u''|| of the point u'' to the sensor axis origin, varying with the type of refraction in dioptric or reflection in catadioptric systems. This function can be replaced by the generic omnidirectional-camera model proposed by Scaramuzza et al. [30]:

g(\|u''\|) = \sum_{i=0}^{N} a_i \|u''\|^i \qquad (8)

where the coefficients a_i, i = 0···N, are calculated through the camera calibration. Amongst several omnidirectional camera models [28]–[30], we choose the Scaramuzza model to describe the relation between the image point and the normalized point on the unit sphere. The advantage of this model is its flexibility with different types of central omnidirectional cameras, fitting a polynomial to the non-linear function that relates the image point to the point on the mirror or lens.

Let us assume that the feature X^j is observed from a set of N_j camera poses, C_i, i ∈ S_j. The normalized point, x^s_ij, is the unit spherical point of X^j with respect to view C_i, which is given by:

x^s_{ij} = \begin{bmatrix} x^s_{ij} \\ y^s_{ij} \\ z^s_{ij} \end{bmatrix} = N\!\left( \begin{bmatrix} u''_{ij} \\ g(\|u''_{ij}\|) \end{bmatrix} \right) \qquad (9)

where N(·) maps points from the mirror or lens onto the unit sphere to spherically normalize them:

x^s = N(x) = \frac{x}{\|x\|} \qquad (10)
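The pixel-to-bearing mapping of equations (7)–(10) can be sketched as below. The affine parameters A, t and the polynomial coefficients a_i are assumed to come from an offline calibration (e.g. with the toolbox of [30]); the numbers used here are placeholders, not the calibration of the authors' camera.

```python
import numpy as np

# Placeholder calibration (illustrative only): affine sensor-plane transform
# and Scaramuzza polynomial coefficients a_0..a_N of equation (8).
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])
t = np.array([-640.0, -360.0])                   # shifts the pixel origin to the sensor axis
poly = np.array([-250.0, 0.0, 7.0e-4, -2.0e-7])  # a_0, a_1, a_2, a_3

def pixel_to_unit_sphere(u_pixel):
    """Map a pixel u' to the spherically normalized bearing x^s (eqs. (7)-(10))."""
    u2 = A @ u_pixel + t                         # u'' = A u' + t
    rho = np.linalg.norm(u2)                     # ||u''||
    g = np.polyval(poly[::-1], rho)              # g(||u''||) = sum_i a_i ||u''||^i
    p2 = np.array([u2[0], u2[1], g])             # p'' of equation (7)
    return p2 / np.linalg.norm(p2)               # N(.) of equations (9)-(10)

x_s = pixel_to_unit_sphere(np.array([700.0, 400.0]))
assert np.isclose(np.linalg.norm(x_s), 1.0)
```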
Considering the covariance of the point on the sensor plane as Q_{u''u''} = σ² I_2, the covariance matrix of the spherically normalized point varies depending on the location of the point on the sensor plane. According to [33], it is given by:

Q_{x^s x^s} = J_s \begin{bmatrix} Q_{u''u''} & 0 \\ 0^T & 0 \end{bmatrix} J_s^T \qquad (11)

where the matrix J_s is the Jacobian given by:

J_s = \frac{\partial x^s}{\partial x} = (I_3 - x^s x^{sT}) / \|x\| \qquad (12)

Now, using the normalized ray direction, we formulate the collinearity equations:

x^s_{ij} = N\!\left( [I_3 \,|\, 0_{3,1}]\, M_{C_i}^{-1} X^{sj} \right) \qquad (13)

where X^{sj} = [X_0 ; X_h] is the homogeneous vector of the world point, and the matrix M_{C_i} is the camera motion, which consists of the rotation matrix of the Earth frame with respect to the camera system, C_E^{C_i}, and the position of the projection center in the Earth frame, ^E P_{C_i}:

M_{C_i} = \begin{bmatrix} C_E^{C_i T} & {}^{E}P_{C_i} \\ 0_{3,1}^T & 1 \end{bmatrix} \qquad (14)

Equation (13) consists of three collinearity equations per measured camera ray and four entries for the feature point. Due to this redundancy in equation (13), the covariance matrices corresponding to the spherical measurements are singular. Therefore, before they can be used in the measurement model, they must be reduced to the minimum parameters using [33]:

x^s_r = (\mathrm{null}(\tilde{x}^{sT}))^T x^s = \begin{bmatrix} r^T \\ s^T \end{bmatrix} x^s \qquad (15)

where r and s span the 2D tangential plane of the unit sphere evaluated at the reprojected point x̃^s.

By mapping the omnidirectional measurements onto the tangential plane, as well as their corresponding reprojected points, and assuming that the noise has a zero-mean Gaussian distribution and is uncorrelated with the system model, we are now able to determine the residuals and linearize them around the reprojected points on the tangential plane. The reprojected points are obtained from the 3D estimated position of the feature w.r.t. the Earth frame.
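A small numerical sketch of the reduction in equations (11), (12) and (15): the 2D basis [r s] of the tangent plane is obtained here as the null space of x̃^{sT}, and the singular 3 × 3 spherical covariance is mapped to a full-rank 2 × 2 covariance on that plane. This is our illustration of the minimal representation of [33], under the assumption that scipy is available.

```python
import numpy as np
from scipy.linalg import null_space

def sphere_jacobian(x):
    """J_s of equation (12) for the normalization x^s = x / ||x||."""
    n = np.linalg.norm(x)
    x_s = x / n
    return (np.eye(3) - np.outer(x_s, x_s)) / n

def tangent_reduction(x_s_meas, x_s_reproj, p2, sigma_px=1.0):
    """Reduce a spherical measurement to the 2D tangent plane at x_s_reproj.

    x_s_meas   : measured unit bearing (from the image, eqs. (7)-(10)).
    x_s_reproj : reprojected unit bearing of the triangulated point.
    p2         : point p'' on the mirror/lens whose normalization gave x_s_meas.
    Returns the tangent-plane residual and its 2x2 covariance
    (cf. equations (11), (15), (23)).
    """
    B = null_space(x_s_reproj.reshape(1, 3))              # columns r, s of the tangent plane
    Q_sensor = np.diag([sigma_px**2, sigma_px**2, 0.0])   # [Q_u''u'' 0; 0^T 0]
    J_s = sphere_jacobian(p2)
    Q_sphere = J_s @ Q_sensor @ J_s.T                     # equation (11), rank 2
    r_2d = B.T @ (x_s_meas - x_s_reproj)                  # residual in the tangent plane
    R_2d = B.T @ Q_sphere @ B                             # full-rank 2x2 covariance
    return r_2d, R_2d
```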
The linearization of the residuals can be written as:

r_{ij} = J_s^T(\tilde{x}^s_{ij})\, C_E^{C_i} \lfloor \hat{X}^j_0 \wedge \rfloor\, \delta\theta_{C_i} - J_s^T(\tilde{x}^s_{ij})\, C_E^{C_i} \hat{X}^j_h\, \delta{}^{E}P_{C_i} + J_s^T(\tilde{x}^s_{ij})\, [I_3 | 0_{3,1}]\, M_{C_i}^{-1}\, \mathrm{null}(\hat{X}^{sjT})\, \delta X^{sj}_r \qquad (16)

In the preceding equation, the vectors δθ_{C_i} and δ^E P_{C_i} are the perturbations of the camera pose that are augmented in the MSCKF, and the vector δX^{sj}_r is the error of the reduced coordinate of the spherical scene feature X^j. The symbol ⌊X̂^j_0 ∧⌋ is the skew-symmetric matrix of X̂^j_0:

\lfloor \hat{X}^j_0 \wedge \rfloor = \begin{bmatrix} 0 & -\hat{Z}^j_0 & \hat{Y}^j_0 \\ \hat{Z}^j_0 & 0 & -\hat{X}^j_0 \\ -\hat{Y}^j_0 & \hat{X}^j_0 & 0 \end{bmatrix} \qquad (17)

The Jacobian matrix J_s(x̃^s_ij) is also given by:

J_s(\tilde{x}^s_{ij}) = \frac{1}{\|x\|} \left( I_3 - \frac{x x^T}{x^T x} \right) \mathrm{null}(x^{sT}) \,\Big|_{x = \tilde{x}^s_{ij}} \qquad (18)

Now, equation (16) can be rewritten as:

r_{ij} = H^j_{X_i}\, \delta X + H^j_{f_i}\, \delta X^{sj}_r + w_{ij} \qquad (19)

where δX is the error-state vector described in the system model and w_ij is the vector of measurement noise. The Jacobian matrices H^j_{X_i} and H^j_{f_i} are given by the following equations:

H^j_{X_i} = \begin{bmatrix} 0 & \cdots & J_s^T(\tilde{x}^s_{ij})\, C_E^{C_i} \lfloor \hat{X}^j_0 \wedge \rfloor & -J_s^T(\tilde{x}^s_{ij})\, C_E^{C_i} \hat{X}^j_h & \cdots & 0 \end{bmatrix} \qquad (20)

where the dimensions of the left and right zero blocks are 2 × (15 + 6(i−1)) and 2 × 6(M−i), with M the number of camera poses augmented in the state.

H^j_{f_i} = J_s^T(\tilde{x}^s_{ij})\, [I_3 | 0_{3,1}]\, M_{C_i}^{-1}\, \mathrm{null}(\hat{X}^{sjT}) \qquad (21)

Now, we stack the residuals r_ij pertaining to the observations of the feature X^j for i ∈ S_j to form a single block vector:

r_j = H^j_X\, \delta X + H^j_f\, \delta X^{sj}_r + w_j \qquad (22)

where H^j_X and H^j_f are the block matrices of H^j_{X_i} and H^j_{f_i}, respectively. The measurement noise covariance matrix R_j is a block-diagonal matrix including the individual covariance matrices R_ij, i = 1···N_j, given by [33]:

R_{ij} = J_s^T(\tilde{x}^s_{ij}) \begin{bmatrix} Q_{u''u''} & 0 \\ 0^T & 0 \end{bmatrix} J_s(\tilde{x}^s_{ij}) \qquad (23)

Finally, the standard measurement model that can be used in the EKF update is achieved by projecting equation (22) onto the left null space of the matrix H^j_f. The reduced equation is:

r_{rj} = H_{rj}\, \delta X + w_{rj} \qquad (24)

where

r_{rj} = (\mathrm{null}(H^{jT}_f))^T r_j, \qquad H_{rj} = (\mathrm{null}(H^{jT}_f))^T H^j_X, \qquad w_{rj} = (\mathrm{null}(H^{jT}_f))^T w_j \qquad (25)

Equation (24) is a standard linear measurement model that is used in an EKF estimation to update the error-state, δX, as well as the covariance.
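The projection of equations (24)–(25), which eliminates the feature error from the stacked residual before the EKF update, can be sketched as follows. This is a generic illustration of the MSCKF marginalization step under the assumption that the stacked r_j, H^j_X and H^j_f have already been built; in practice a QR decomposition is often used instead of an explicit null-space computation.

```python
import numpy as np
from scipy.linalg import null_space

def project_onto_left_nullspace(r_j, H_X, H_f):
    """Implement equations (24)-(25): eliminate the feature error term.

    r_j : (2*N_j,)        stacked tangent-plane residuals of one feature.
    H_X : (2*N_j, 15+6M)  stacked Jacobians w.r.t. the error state.
    H_f : (2*N_j, 2)      stacked Jacobians w.r.t. the reduced feature error.
    Returns the reduced residual and Jacobian used in the EKF update.
    """
    A = null_space(H_f.T)          # columns span the left null space of H_f
    r_r = A.T @ r_j                # (2*N_j - 2,) reduced residual
    H_r = A.T @ H_X                # reduced Jacobian w.r.t. the error state
    return r_r, H_r

# Toy example: a feature seen from N_j = 4 poses, M = 4, error state of size 15 + 6*4.
rng = np.random.default_rng(0)
H_X = rng.standard_normal((8, 39))
H_f = rng.standard_normal((8, 2))
r_j = rng.standard_normal(8)
r_r, H_r = project_onto_left_nullspace(r_j, H_X, H_f)
assert r_r.shape == (6,) and H_r.shape == (6, 39)
assert np.allclose(null_space(H_f.T).T @ H_f, 0, atol=1e-10)
```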

IV. EXPERIMENTS

We evaluate our Omnidirectional Visual-Inertial Odometry (OVIO) using a synthetic dataset and a real dataset. We compare the performance of the OVIO with the MSCKF-VIO in terms of the accuracy of the reconstructed trajectories. In both experiments we measure the accuracy by computing the Root Mean Squared Error (RMSE), which represents the difference between the estimated pose and the ground truth.

A. Synthetic Dataset

In order to examine the effect of a larger FoV on the performance of VIO, we used the photorealistic synthetic images of the urban canyon dataset generated by Zhang et al. [24] in Blender. Two sequences of 1500 images each, one captured by a simulated perspective camera (90° FoV) and one by a simulated fish-eye camera (180° FoV), along a trajectory of an approximate length of 295 m were used. The resolution of these images is 640 × 480 pixels and the frame rate is 30 Hz. The synthetic dataset does not include INS measurements. Therefore, we generated the INS measurements from the ground truth comprising the 3D position and 4D quaternion of each image. Since the ground truth sampled at 30 Hz in the software is not sufficient for simulating INS measurements, we synthesized the INS accelerations and angular velocities using the ground truth resampled at 300 Hz.

For feature extraction and matching, we use the SURF algorithm [34] in both the perspective and fish-eye images. Having extracted salient image features, we track the corresponding features along the image sequence using the Kanade-Lucas-Tomasi (KLT) tracker [35]. It is worth noting that the SURF detector is not the best choice for feature tracking, especially in real-time pose estimation. Comparing various feature detectors, we use this detector in order to obtain the best feature detection in terms of robustness and spatial distribution. This way, the effect of poor conditioning caused by weak feature detection on the pose estimation is minimized.

To estimate the global position of features, an inverse-depth Gauss-Newton parametrization [36] is used. The advantage of this method is avoiding local minima [5]. We use the ray direction as the observation on the spherical images [37]. Having observed a scene point in an image sequence, we select the first and the last measurements for the triangulation to attain the best initialization of the 3D position; the other frames in between are then added in order to find the optimum solution in an iterative fashion.

The planar trajectories estimated from the different integrations are shown in Fig. 2. As can be observed, while the drift of the INS integration is significant, the reconstructed trajectories of the MSCKF with perspective and fish-eye images closely follow the ground truth. In the following, we compare only the performance of the proposed spherical MSCKF with the perspective MSCKF.

Fig. 2: 2D top view of the estimated trajectories from different integrations in the global frame.

To compare the accuracy of our method with the perspective VIO, we computed the cumulative error distribution curves for the rotational and translational errors measured by the RMSE at each time-step. Fig. 3 shows the result. As can be seen, overall, OVIO yields more accurate position and rotation estimates than VIO. In particular, a comparison of the translational errors of the two methods reveals that the spherical MSCKF results in a maximum error that is only half of that of the perspective MSCKF. Also, from the rotational graph we can see that the spherical MSCKF results in a larger percentage of smaller errors compared to the perspective MSCKF.

Fig. 3: Left: Cumulative distribution of the rotational errors for the synthetic dataset. Right: Cumulative distribution of the translational errors on the same trajectory.

Table I summarizes the performance of the spherical and perspective MSCKF as measured by the total error, defined as the root mean square of the RMSEs, and the ARMSE, defined as the average of the RMSEs along the trajectory. It can be seen that the spherical MSCKF outperforms the perspective MSCKF in terms of both average RMSE and total error. Considering the length of the trajectory and the total error for translation, the spherical MSCKF achieves a translational accuracy of 0.8% of the trajectory length, compared to 1.2% for the perspective MSCKF.

TABLE I: Comparison of the Average Root Mean Squared Error (ARMSE) and the total error of the proposed spherical MSCKF with the perspective MSCKF.

                          Spherical MSCKF   Perspective MSCKF
Trans. ARMSE (m)               1.31               2.86
Rot. ARMSE (rad)               0.073              0.095
Total Trans. Error (m)         2.3                3.5
Total Rot. Error (rad)         0.08               0.11
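The error statistics reported in Table I can be computed from per-time-step pose errors with a few lines of code. The sketch below is our interpretation of the definitions given above (ARMSE as the mean of the per-step RMSEs, total error as their root mean square, and relative accuracy as the total translational error over the trajectory length); the example numbers are synthetic, not the paper's data.

```python
import numpy as np

def trajectory_metrics(est_pos, gt_pos):
    """Per-time-step translational RMSE plus the ARMSE / total-error summary.

    est_pos, gt_pos : (T, 3) arrays of estimated and ground-truth positions.
    """
    err = est_pos - gt_pos
    rmse_t = np.sqrt(np.mean(err**2, axis=1))   # RMSE at each time-step
    armse = rmse_t.mean()                       # average RMSE along the trajectory
    total = np.sqrt(np.mean(rmse_t**2))         # root mean square of the RMSEs
    length = np.sum(np.linalg.norm(np.diff(gt_pos, axis=0), axis=1))
    return {"ARMSE": armse, "total_error": total,
            "relative_accuracy_percent": 100.0 * total / length}

# Example with synthetic numbers only.
t = np.linspace(0, 2 * np.pi, 500)
gt = np.column_stack((50 * np.cos(t), 50 * np.sin(t), np.zeros_like(t)))
est = gt + 0.5 * np.random.default_rng(1).standard_normal(gt.shape)
print(trajectory_metrics(est, gt))
```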

B. Real Dataset

To test the OVIO algorithm in a real environment, we strapped our sensor board, shown in Fig. 4, on top of a car and drove in a built-up area near the Parkville campus of the University of Melbourne. The sensor board included a USB3 fish-eye camera providing a 185° FoV, installed to be upward-looking. This way, we remove the negative effect of mobile objects (e.g. vehicles, pedestrians) that exist in urban environments. Moreover, tracking features on both sides of the motion trajectory could provide a better geometry to obtain more accurate position and rotation estimates. It is also worth noting that an upward-looking camera ensures a sufficient difference between the motion heading and the camera optical axis, resulting in an imaging geometry which is more suitable for vision-based positioning [38]. However, using a camera facing upward could decrease the number of tracked features in areas with short buildings, even though the camera has a fish-eye lens.

Fig. 4: The sensors used in our real experiment. We installed the sensors on a board that is attached to the roof of the car using rubber magnets.

A sample image of the real dataset and the features tracked on both sides of the road between two consecutive images are shown in Fig. 5. The resolution of the images is 1280 × 720 and they are recorded at about 20 Hz. In total, 5260 sequential images were captured along a 1.3 km-long trajectory. For feature tracking, we used the same algorithm as for the synthetic environment. On average, features were tracked over 30 sequential images. Therefore, the maximum number of camera poses in the state was set to 30. This value is appropriate to employ the constraints between multiple camera poses in the state vector of the MSCKF.

Fig. 5: Left: A raw image captured by the fish-eye camera looking upward. Right: Features tracked between two consecutive images.

The inertial measurements, consisting of accelerations and angular velocities in three dimensions, were obtained at a rate of 100 Hz using the MMQG INS [39]. GPS observations were collected and used for the initialization of the INS; however, the raw inertial measurements were stored without being corrected by GPS. In order to synchronize the inertial measurements with the images, we recorded the local time-stamp of the images and precisely converted them to GPS time, considering the difference between Coordinated Universal Time (UTC) and the local time as well as the leap seconds, which amounted to 17 s on the date of the experiment. This allows comparing the absolute accuracy of the vehicle motion estimated by the proposed OVIO with the ground truth obtained by the surveying-grade GPS antenna, a GS15 (see Fig. 4), and improved by post-processing. The frequency of the ground truth is 1 Hz, resulting in 242 points with centimetre-level accuracy for the evaluation.
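The time synchronization described above reduces to two constant offsets. The snippet below is a minimal sketch of that conversion; the local-time offset from UTC is our assumption for Melbourne on the experiment date, while the 17 leap seconds come from the text.

```python
from datetime import timedelta

LOCAL_UTC_OFFSET = timedelta(hours=10)         # assumed Melbourne offset on the experiment date
GPS_UTC_LEAP_SECONDS = timedelta(seconds=17)   # GPS time was ahead of UTC by 17 s

def local_to_gps_time(local_timestamp):
    """Convert a camera time-stamp recorded in local time to GPS time."""
    utc_time = local_timestamp - LOCAL_UTC_OFFSET
    return utc_time + GPS_UTC_LEAP_SECONDS
```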
In Fig. 6, left, the top view of the estimated trajectory overlaid on Google Earth is shown. The comparison between the trajectory estimated by the OVIO and the ground truth for the area where precise GPS was available is shown in Fig. 6, right. The total RMSE over this part of the trajectory, which is 600 m long, is about 2 m. Thus, the relative accuracy is 0.33%, which shows that the position error can be kept below 0.5 m within a 150 m GPS outage. This accuracy can respond to the positioning demand of many outdoor location-based applications.

Fig. 6: Left: 2D top view of the estimated trajectories from different integrations in the global frame. Right: Comparison between the estimated trajectory and the ground truth for the areas where GPS has given an accurate positioning solution.

V. CONCLUSIONS AND FUTURE WORK

In this paper, we proposed the integration of an INS with omnidirectional cameras using the MSCKF, which we name Omnidirectional Visual-Inertial Odometry (OVIO). Because the original MSCKF has been developed for perspective cameras, its measurement model is unsuited for the highly distorted images captured by an omnidirectional camera. We introduced a new measurement model projecting residuals onto the tangential plane around each ray. Our experiment on synthetic data shows the superiority of the OVIO over the conventional MSCKF-VIO, which uses a perspective camera. We also evaluated the OVIO approach using a real-world video stream captured by an upward-looking 185° fish-eye camera on top of a car. The result shows a small 0.33% drift of the OVIO from the ground truth over the travelled trajectory. This implies that the OVIO approach can be used in many location-based applications, especially autonomous navigation in areas where GPS cannot provide precise positioning. Future work will focus on integrating City Information Modelling (CIM) with the OVIO to remove the drift and maintain the accuracy for longer trajectories.

ACKNOWLEDGMENT

This work is supported by a Research Engagement Grant from the Melbourne School of Engineering and a Melbourne Research Scholarship.

REFERENCES

[1] A. I. Mourikis and S. I. Roumeliotis, "A multi-state constraint kalman filter for vision-aided inertial navigation," in Robotics and Automation, 2007 IEEE International Conference on. IEEE, 2007, pp. 3565–3572.
[2] J. Kim and S. Sukkarieh, "Real-time implementation of airborne inertial-slam," Robotics and Autonomous Systems, vol. 55, no. 1, pp. 62–71, 2007.
[3] M. Li and A. I. Mourikis, "High-precision, consistent ekf-based visual-inertial odometry," The International Journal of Robotics Research, vol. 32, no. 6, pp. 690–711, 2013.
[4] C. Forster, L. Carlone, F. Dellaert, and D. Scaramuzza, "Imu preintegration on manifold for efficient visual-inertial maximum-a-posteriori estimation," Georgia Institute of Technology, 2015.
[5] A. I. Mourikis, "A multi-state constraint kalman filter for vision-aided inertial navigation," Dept. of Computer Science and Engineering, University of Minnesota, Tech. Rep., 2006, www.cs.umn.edu/~mourikis/tech reports/TR MSCKF.pdf.
[6] D. Strelow and S. Singh, "Motion estimation from image and inertial measurements," The International Journal of Robotics Research, vol. 23, no. 12, pp. 1157–1195, 2004.
[7] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale, "Keyframe-based visual–inertial odometry using nonlinear optimization," The International Journal of Robotics Research, vol. 34, no. 3, pp. 314–334, 2015.
[8] G. Sibley, L. Matthies, and G. Sukhatme, "Sliding window filter with application to planetary landing," Journal of Field Robotics, vol. 27, no. 5, pp. 587–608, 2010.
[9] A. Huster, E. W. Frew, and S. M. Rock, "Relative position estimation for auvs by fusing bearing and inertial rate sensor measurements," in OCEANS'02 MTS/IEEE, vol. 3. IEEE, 2002, pp. 1863–1870.
[10] S. You and U. Neumann, "Fusion of vision and gyro tracking for robust registration," in Virtual Reality, 2001. Proceedings. IEEE. IEEE, 2001, pp. 71–78.
[11] D. G. Kottas and S. I. Roumeliotis, "Exploiting urban scenes for vision-aided inertial navigation," in Robotics: Science and Systems, 2013.
[12] J. W. Langelaan, "State estimation for autonomous flight in cluttered environments," 2006.
[13] D. Strelow, "Motion estimation from image and inertial measurements," Ph.D. dissertation, Carnegie Mellon University, 2004.
[14] L. Ong, M. Ridley, J.-H. Kim, E. Nettleton, S. Sukkarieh, et al., "Six dof decentralised slam," in Australasian Conference on Robotics and Automation, 2003, pp. 10–16.
[15] E. Eade and T. Drummond, "Scalable monocular slam," in Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 1. IEEE Computer Society, 2006, pp. 469–476.
[16] S. I. Roumeliotis, A. E. Johnson, and J. F. Montgomery, "Augmenting inertial navigation with image-based motion estimation," in Robotics and Automation, 2002. Proceedings. ICRA'02. IEEE International Conference on, vol. 4. IEEE, 2002, pp. 4326–4333.
[17] D. D. Diel, P. DeBitetto, and S. Teller, "Epipolar constraints for vision-aided inertial navigation," in Application of Computer Vision, 2005. WACV/MOTIONS'05 Volume 1. Seventh IEEE Workshops on, vol. 2. IEEE, 2005, pp. 221–228.
[18] D. S. Bayard and P. B. Brugarolas, "An estimation algorithm for vision-based exploration of small bodies in space," in American Control Conference, 2005. Proceedings of the 2005. IEEE, 2005, pp. 4589–4595.
[19] S. Soatto, R. Frezza, and P. Perona, "Motion estimation via dynamic vision," IEEE Transactions on Automatic Control, vol. 41, no. 3, pp. 393–413, 1996.
[20] S. Soatto and P. Perona, "Recursive 3-d visual motion estimation using subspace constraints," International Journal of Computer Vision, vol. 22, no. 3, pp. 235–259, 1997.
[21] Q. F. Ke, T. Kanade, R. Prazenica, and A. Kurdila, "Vision-based kalman filtering for aircraft state estimation and structure from motion," in AIAA Guidance, Navigation, and Control Conference and Exhibit, 2007, p. 6003.
[22] D. Nistér, O. Naroditsky, and J. Bergen, "Visual odometry for ground vehicle applications," Journal of Field Robotics, vol. 23, no. 1, pp. 3–20, 2006.
[23] D. Caruso, J. Engel, and D. Cremers, "Large-scale direct slam for omnidirectional cameras," in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015, pp. 141–148.
[24] Z. Zhang, H. Rebecq, C. Forster, and D. Scaramuzza, "Benefit of large field-of-view cameras for visual odometry," in Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE, 2016, pp. 801–808.
[25] J. C. Bazin, C. Demonceaux, P. Vasseur, and I. Kweon, "Motion estimation by decoupling rotation and translation in catadioptric vision," Computer Vision and Image Understanding, vol. 114, no. 2, pp. 254–273, 2010.
[26] J. P. Barreto and H. Araujo, "Issues on the geometry of central catadioptric image formation," in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 2. IEEE, 2001, pp. II–II.
[27] C. Geyer and K. Daniilidis, "A unifying theory for central panoramic systems and practical implications," in European Conference on Computer Vision. Springer, 2000, pp. 445–461.
[28] J. Kannala and S. S. Brandt, "A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1335–1340, 2006.
[29] C. Mei and P. Rives, "Single view point omnidirectional camera calibration from planar grids," in Robotics and Automation, 2007 IEEE International Conference on. IEEE, 2007, pp. 3945–3950.
[30] D. Scaramuzza, A. Martinelli, and R. Siegwart, "A flexible technique for accurate omnidirectional camera calibration and structure from motion," in Computer Vision Systems, 2006 ICVS'06. IEEE International Conference on. IEEE, 2006, pp. 45–45.
[31] L. Li, Y.-H. Liu, K. Wang, and M. Fang, "A simple algorithm for position and velocity estimation of mobile robots using omnidirectional vision and inertial sensors," in Robotics and Biomimetics (ROBIO), 2014 IEEE International Conference on. IEEE, 2014, pp. 1235–1240.
[32] J. Sola, "Quaternion kinematics for the error-state kf," Laboratoire d'Analyse et d'Architecture des Systèmes - Centre national de la recherche scientifique (LAAS-CNRS), Toulouse, France, Tech. Rep., 2012.
[33] W. Förstner, "Minimal representations for testing and estimation in projective spaces," Photogrammetrie-Fernerkundung-Geoinformation, vol. 2012, no. 3, pp. 209–220, 2012.
[34] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-up robust features (surf)," Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346–359, 2008.
[35] C. Tomasi and T. Kanade, "Detection and tracking of point features," 1991.
[36] J. M. Montiel, J. Civera, and A. J. Davison, "Unified inverse depth parametrization for monocular slam," in Robotics: Science and Systems, 2006.
[37] A. Torii, A. Imiya, and N. Ohnishi, "Two- and three-view geometry for spherical cameras," in Proceedings of the Sixth Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras. Citeseer, 2005.
[38] J. Oliensis, "A new structure-from-motion ambiguity," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 7, pp. 685–700, 2000.
[39] "Systron Donner Inertial," http://www.unmannedsystemstechnology.com/company/systron-donner-inertial/mmqg-ins-gps/, accessed: 2017-02-20.
