Omnidirectional Visual-Inertial Odometry Using Multi-State Constraint Kalman Filter

Omnidirectional Visual-Inertial Odometry Using Multi-State Constraint Kalman Filter Milad Ramezani 1 and Kourosh Khoshelham 1 and Laurent Kneip 2 Abstract— We present an Omnidirectional Visual-Inertial VIO can be substantially improved by increasing the Field of Odometry (OVIO) approach based on Multi-State Constraint View (FoV) of the camera. A larger FoV allows capturing Kalman Filtering (MSCKF) to estimate the ego-motion of a a larger extent of the environment, thereby increasing the moving platform. Instead of considering visual measurements on image plane, we use individual planes for each point that likelihood of usable regions in images for feature extraction are tangent to the unit sphere and normal to the corresponding and matching. Furthermore, a larger FoV provides a better measurement ray. This way, we combine spherical images cap- geometry for the pose estimation due to the fact that it tured by omnidirectional camera with inertial measurements is more likely to track features from different angles sur- within the filtering method MSCKF. The key hypothesis of rounding the moving camera. More importantly, a wider OVIO is that a wider field of view allows incorporating more visual features from the surrounding environment, thereby im- FoV allows features to be tracked over a longer sequence proving the accuracy and robustness of the motion estimation. of images. In theory, this increases the number of visual Moreover, by using an omnidirectional camera, it is less likely measurements, and consequently improves the precision and to end up in a situation where there is not enough texture. robustness of pose estimation. It is worth noting that larger We provide an evaluation of OVIO using synthetic and real FoV can solve the rotation- translation ambiguity which is a video sequences captured by a fish-eye camera, and compare the performance with MSCKF using a perspective camera. The serious problem with smaller FoVs. results show the superior performance of the proposed OVIO. Despite these clear advantages, until now only little at- tention has been paid to large FoV and omnidirectional I. INTRODUCTION cameras for visual inertial odometry. To fill this gap, this Pose estimation by means of visual sensors is becoming paper presents an Omnidirectional Visual-Inertial Odometry important in many applications including navigation and (OVIO) approach using Multi-State Constraint Kalman Fil- autonomous vehicles. The demand for accurate position tering (MSCKF) [5], which integrates visual information ex- information is more pronounced in areas where the Global tracted from an omnidirectional camera with the inertial mea- Positioning System (GPS) has no coverage, such as urban surements from an INS. Since the standard perspective model canyons. is not suitable for larger FoVs, we develop a measurement A common approach to visual navigation is Visual Odom- model in MSCKF that incorporates measurement residuals etry (VO), which uses only visual information to estimate on the unit sphere rather than on the image plane. Moreover, motion. This method is highly sensitive to the quality of we apply an omnidirectional model to calculate the global image observations. By combining visual features from a 3D position of features and propagate their uncertainties. video camera with inertial measurements from an Inertial We evaluate OVIO and compare its performance with the Navigation System (INS), the resilience in low-texture sit- conventional VIO within MSCKF through experiments in uations and accuracy can be substantially improved. This synthetic and real environments. technique, which is known as Visual-Inertial Odometry The paper is organized as follows: After a brief discussion (VIO), has been an active research area for more than a of the state-of-the-art VIO algorithms in section II, we intro- decade [1]–[4]. VIO employs visual measurements in order duce in section III our Omnidirectional Visual Inertial Odom- to compensate for INS errors within an estimation model. etry (OVIO) approach using MSCKF for pose estimation. In Today, most VIO systems use visual information from a section IV, we evaluate our approach and compare it with perspective camera with a limited Field of View (FoV). MSCKF using a perspective camera through experiments Therefore, achieving a robust positioning performance by in synthetic and real environments. Finally, conclusions are the state-of-the-art VIO approaches is still a challenge in drawn and directions for future research are discussed. environments that lack salient features due to illumination changes or insufficient texture in the entire image as well as II. RALATED WORK the presence of moving objects covering a significant area Algorithms for the integration of an exteroceptive sensor of the image of a perspective camera. The robustness of such as a camera with a proprioceptive inertial sensor are 1M. Ramezani and K. Khoshelham are with the Department known as Visual-Inertial Odometry (VIO) and have been of Infrastructure Engineering, The University of Melbourne, investigated through a vast body of literature in robotics [1]- Australia [email protected], [5]. From the estimation point of view, VIO approaches can [email protected] 2L. Kneip is with the ShanghaiTech School of Information Science and be classified into optimization-based and filtering methods. Technology, Shanghai, China [email protected] In optimization methods, unknown parameters such as the sensor pose are estimated by iterative minimization of a least- In the last few years these cameras have gained popularity squares error function. In [6] the sensor pose, scene structure, in robotics [23], [25], and a substantial body of work has and other unknowns are estimated using all image and been devoted to modelling their imaging geometry [26], inertial measurements simultaneously. However, because this [27] and calibration [28]–[30]. Despite the advantages of method requires that all measurements from all images are omnidirectional cameras for ego-motion estimation, they available before the estimation, it is not suitable for real-time have been mostly used in VO [23]–[25], but rarely in VIO. applications. In order to solve the real-time operability issue, In [31] an effort was made in order to fuse omnidirectional Leutenegger et al. [7] bounded the size of the optimization vision with inertial sensors using an adaptive estimator. This window by marginalizing out old states. This method is method estimates the position and linear velocity of a mobile introduced as Sliding Window Filter (SWF) in [8]. The robot, but not other pose parameters such as orientation. shortcoming of this approach is that not all measurements of Furthermore, the performance of the algorithm has not been a seen feature are used, because multiple poses are discarded tested in a large scale environment. resulting in a loss of constraints. Overall the review of the literature shows a preference for In contrast to non-linear optimization, filtering methods perspective cameras in VO and VIO approaches. Motivated estimate sensor motion and other parameters recursively in a by the potential of omnidirectional cameras for pose esti- filter as measurements from INS and camera become avail- mation, this work presents an omnidirectional visual-inertial able. Therefore, this approach is appropriate for real-time odometry (OVIO) approach. applications even in large-scale environments with infinitely III. OMNIDIRECTIONAL VISUAL-INERTIAL long image sequences [6]. A popular estimator in filtering ODOMETRY methods is the Extended Kalman Filter (EKF) [1], [2], [9]– [11]. In EKF-based methods, a state vector that comprises We begin by briefly describing the system model, and then the unknowns is first propagated in a linear system model introduce a new measurement model in the MSCKF based on and then is updated by feature measurements extracted from the tangential plane around the measurement ray on the unit the images. If the scene structure and the current INS pose sphere. After defining the new measurement model, we can are simultaneously estimated, the approach is called EKF- link the omnidirectional observations with the system model based Simultaneous Localization and Mapping (SLAM) to update the state vector and its covariance. A detailed [12]–[15]. An advantage of EKF-SLAM algorithms is that description of the MSCKF can be found in [1], [5]. the correlations between camera motion and 3D features are A. System model considered since camera poses and 3D positions of observed M × features are estimated together. However, correctly handling Let X be the (16+7 ) 1 EKF state vector defined as: these correlations imposes computational complexity that BT T E T T E T compromises this technique in an environment with many X = [qE bg vB ba pB ··· qC1T EpT ··· qCiT EpT ··· qCM T EpT ]T features [1]. E C1 E Ci E CM There are several filtering algorithms in which, contrary to (1) B SLAM, the global position of features is not jointly estimated where qE is the 4 × 1 unit quaternion representing the in the state vector [5], [16]–[21]. These methods utilize the orientation of the INS resolved in the body frame fBg, E E visual measurements to impose epipolar constraints between vectors vB, and pB are the linear velocity and the either pairs of images [16]–[18] or multiple camera poses [5], position of the INS w.r.t. the Earth frame fEg, respectively, [22]. Amongst these methods, Multi-State Constraint Kalman bg and ba are the 3 × 1 vectors describing the gyroscope and Filtering (MSCKF) [5] benefits from the entire geometric accelerometer biases, respectively. The remaining parameters Ci constraints that a feature produces when seen from multiple are M camera poses that include the camera quaternions, qE , camera poses. By employing all these geometric constraints, relative to the camera frame fCig, i = 1 ··· M and positions, E fEg more accurate estimates of the camera poses can be obtained. pCi ,in the Earth frame . Following equation (1), the What all vision-aided navigation systems have in common (15 + 6M) × 1 error-state vector is: is that they are highly dependent on visual information observed from the environment.

Load more