AI-IMU Dead-Reckoning

Martin BROSSARD∗, Axel BARRAU† and Silvère BONNABEL∗
∗MINES ParisTech, PSL Research University, Centre for Robotics, 60 Bd Saint-Michel, 75006 Paris, France
†Safran Tech, Groupe Safran, Rue des Jeunes Bois-Châteaufort, 78772 Magny Les Hameaux Cedex, France

Abstract—In this paper we propose a novel accurate method for dead-reckoning of wheeled vehicles based only on an Inertial Measurement Unit (IMU). In the context of intelligent vehicles, robust and accurate dead-reckoning based on the IMU may prove useful to correlate feeds from imaging sensors, to safely navigate through obstructions, or for safe emergency stops in the extreme case of exteroceptive sensors failure. The key components of the method are the Kalman filter and the use of deep neural networks to dynamically adapt the noise parameters of the filter. The method is tested on the KITTI odometry dataset, and our dead-reckoning inertial method based only on the IMU accurately estimates 3D position, velocity, and orientation of the vehicle and self-calibrates the IMU biases. We achieve on average a 1.10% translational error, and the algorithm competes with top-ranked methods which, by contrast, use LiDAR or stereo vision. We make our implementation open-source at: https://github.com/mbrossar/ai-imu-dr

Index Terms—localization, deep learning, invariant extended Kalman filter, KITTI dataset, inertial navigation, inertial measurement unit

I. INTRODUCTION

INTELLIGENT vehicles need to know where they are located in the environment, and how they are moving through it. An accurate estimate of vehicle dynamics allows validating information from imaging sensors such as lasers, ultrasonic systems, and video cameras, correlating the feeds, and also ensuring safe motion throughout whatever may be seen along the road [1]. Moreover, in the extreme case where an emergency stop must be performed owing to severe occlusions, lack of texture, or more generally imaging system failure, the vehicle must be able to assess accurately its dynamical motion. For all those reasons, the Inertial Measurement Unit (IMU) appears as a key component of intelligent vehicles [2]. Note that the Global Navigation Satellite System (GNSS) allows for global position estimation, but it suffers from phase tracking loss in densely built-up areas or through tunnels, is sensitive to jamming, and may not be used to provide continuous, accurate and robust localization information, as exemplified by a GPS outage in the well known KITTI dataset [3], see Figure 1.

Fig. 1. Trajectory results on seq. 08 (drive #28, 2011/09/30) [3] of the KITTI dataset. The proposed method (green) accurately follows the benchmark trajectory for the entire sequence (4.2 km, 9 min), whereas the pure IMU integration (cyan) quickly diverges. Both methods use only IMU signals and are initialized with the benchmark pose and velocity. We see that during the GPS outage that occurs in this sequence, our solution keeps estimating the trajectory accurately.

Kalman filters are routinely used to integrate the outputs of IMUs. When the IMU is mounted on a car, it is common practice to make the Kalman filter incorporate side information about the specificity of wheeled vehicle dynamics, such as an approximately null lateral and upward velocity assumption, in the form of pseudo-measurements. However, the degree of confidence the filter should have in this side information is encoded in a covariance noise parameter which is difficult to set manually, and which moreover should dynamically adapt to the motion, e.g., lateral slip is larger in bends than in straight lines. Using recent tools from the field of Artificial Intelligence (AI), namely deep neural networks, we propose a method to automatically learn those parameters and their dynamic adaptation for IMU only dead-reckoning. Our contributions, and the paper's organization, are as follows:
• we introduce a state-space model for wheeled vehicles as well as simple assumptions about the motion of the car in Section II;
• we implement a state-of-the-art Kalman filter [4,5] that exploits the kinematic assumptions and combines them with the IMU outputs in a statistical way in Section III-C. It yields accurate estimates of position, orientation and velocity of the car, as well as IMU biases, along with associated uncertainty (covariance);
• we exploit deep learning for dynamic adaptation of covariance noise parameters of the Kalman filter in Section IV-A. This module greatly improves the filter's robustness and accuracy, see Section V-D;
• we demonstrate the performance of the approach on the KITTI dataset [3] in Section V. Our approach based solely on the IMU produces accurate estimates and competes with top-ranked LiDAR and stereo camera methods [6,7]; and we do not know of IMU based dead-reckoning methods capable of competing with such results;
• the approach is not restricted to inertial-only dead-reckoning of wheeled vehicles. Thanks to the versatility of the Kalman filter, it can easily be applied to railway vehicles [8], coupled with GNSS, the backbone for IMU self-calibration, or for using the IMU as a speedometer in path-reconstruction and map-matching methods [9]–[12].

A. Relation to Previous Literature

Autonomous vehicles must robustly self-localize with their embarked sensor suite, which generally consists of odometers, IMUs, radars or LiDARs, and cameras [1,2,12]. Simultaneous Localization And Mapping based on inertial sensors, cameras, and/or LiDARs has enabled robust real-time localization systems, see e.g., [6,7]. Although these highly accurate solutions based on those sensors have recently emerged, they may drift when the imaging system encounters troubles.

As concerns wheeled vehicles, taking into account vehicle constraints and odometer measurements is known to increase the robustness of localization systems [13]–[16]. Although quite successful, such systems continuously process a large amount of data, which is computationally demanding and energy consuming. Moreover, an autonomous vehicle should run in parallel its own robust IMU-based localization algorithm to perform maneuvers such as emergency stops in case of other sensors' failures, or as an aid for correlation and interpretation of image feeds [2].

High precision aerial or military inertial navigation systems achieve very small localization errors but are too costly for consumer vehicles. By contrast, low- and medium-cost IMUs suffer from errors such as scale factor, axis misalignment and random walk noise, resulting in rapid localization drift [17]. This makes IMU-based positioning unsuitable, even during short periods.

Inertial navigation systems have long leveraged virtual and pseudo-measurements from IMU signals, e.g. the widespread Zero velocity UPdaTe (ZUPT) [18]–[20], as well as covariance adaptation [21]. In parallel, deep learning and more generally machine learning are gaining much interest for inertial navigation [22,23]. In [22] velocity is estimated using support vector regression, whereas [23] uses recurrent neural networks for end-to-end inertial navigation. Those methods are promising but restricted to pedestrian dead-reckoning, since they generally consider slow horizontal planar motion and must infer velocity directly from a small sequence of IMU measurements, whereas we can afford to use larger sequences. A more general end-to-end learning approach is [24], which trains deep networks end-to-end in a Kalman filter. Albeit promising, the method obtains a large translational error (> 30%) in their stereo odometry experiment. Finally, [25] uses deep learning for estimating the covariance of a local odometry algorithm that is fed into a global optimization procedure, and in [26] we used Gaussian processes to learn a wheel encoder's error. Our conference paper [20] contains preliminary ideas, albeit not concerned at all with covariance adaptation: a neural network essentially tries to detect when to perform ZUPT.

Dynamic adaptation of noise parameters in the Kalman filter is standard in the tracking literature [27]; however, adaptation rules are application dependent and are generally the result of manual "tweaking" by engineers. Finally, in [28] the authors propose to use classical machine learning techniques to learn static noise parameters (without adaptation) of the Kalman filter, and apply it to the problem of IMU-GNSS fusion.

II. IMU AND PROBLEM MODELLING

An inertial navigation system uses the accelerometers and gyros provided by the IMU to track the orientation R_n, velocity v_n ∈ R^3 and position p_n ∈ R^3 of a moving platform relative to a starting configuration (R_0, v_0, p_0). The orientation is encoded in a rotation matrix R_n ∈ SO(3) whose columns are the axes of a frame attached to the vehicle.

A. IMU Modelling

The IMU provides noisy and biased measurements of the instantaneous angular velocity vector ω_n and specific acceleration a_n as follows [17]

  ω_n^IMU = ω_n + b_n^ω + w_n^ω,   (1)
  a_n^IMU = a_n + b_n^a + w_n^a,   (2)

where b_n^ω, b_n^a are quasi-constant biases and w_n^ω, w_n^a are zero-mean Gaussian noises. The biases follow a random walk

  b_{n+1}^ω = b_n^ω + w_n^{bω},   (3)
  b_{n+1}^a = b_n^a + w_n^{ba},   (4)

where w_n^{bω}, w_n^{ba} are zero-mean Gaussian noises.

The kinematic model is governed by the following equations

  R_{n+1}^IMU = R_n^IMU exp((ω_n dt)_×),   (5)
  v_{n+1}^IMU = v_n^IMU + (R_n^IMU a_n + g) dt,   (6)
  p_{n+1}^IMU = p_n^IMU + v_n^IMU dt,   (7)

between two discrete time instants sampled at dt, where we let the IMU velocity be v_n^IMU ∈ R^3 and its position p_n^IMU ∈ R^3 in the world frame. R_n^IMU ∈ SO(3) is the 3 × 3 rotation matrix that represents the IMU orientation, i.e. that maps the IMU frame to the world frame, see Figure 2. Finally, (y)_× denotes the skew-symmetric matrix associated with the cross product with y ∈ R^3. The true angular velocity ω_n ∈ R^3 and the true specific acceleration a_n ∈ R^3 are the inputs of the system (5)-(7).

In our application scenarios, the effects of earth rotation and Coriolis acceleration are ignored, Earth is considered flat, and the gravity vector g ∈ R^3 is a known constant.

All sources of error displayed in (1) and (2) are harmful, since a simple implementation of (5)-(7) leads to a triple integration of raw data, which is much more harmful than the unique integration of differential wheel speeds [12]. Indeed, a bias of order ε has an impact of order εt²/2 on the position after t seconds, leading to potentially huge drift.
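As an illustration, here is a minimal Python/NumPy sketch of one step of the strapdown integration (5)-(7); it assumes bias-corrected, noise-free inputs, and uses the standard Rodrigues formula for the SO(3) exponential. The function names are ours and not taken from the released implementation.

```python
import numpy as np

def skew(y):
    """Skew-symmetric matrix (y)_x such that (y)_x v = y x v."""
    return np.array([[0.0, -y[2], y[1]],
                     [y[2], 0.0, -y[0]],
                     [-y[1], y[0], 0.0]])

def so3_exp(phi):
    """SO(3) exponential of a rotation vector phi (Rodrigues formula)."""
    angle = np.linalg.norm(phi)
    if angle < 1e-9:
        return np.eye(3) + skew(phi)  # first-order approximation
    axis = phi / angle
    K = skew(axis)
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * K @ K

def strapdown_step(R, v, p, omega, acc, g, dt):
    """One step of the kinematic model (5)-(7)."""
    R_next = R @ so3_exp(omega * dt)   # (5) orientation
    v_next = v + (R @ acc + g) * dt    # (6) velocity
    p_next = p + v * dt                # (7) position
    return R_next, v_next, p_next

# example usage with a flat-Earth gravity vector
g = np.array([0.0, 0.0, -9.80665])
```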

Fig. 2. The coordinate systems used in the paper. The IMU pose (R_n^IMU, p_n^IMU) maps vectors expressed in the IMU frame i (red) to the world frame w (black). The IMU frame is attached to the vehicle and misaligned with the car frame c (blue). The pose between the car and inertial frames (R_n^c, p_n^c) is unknown. The IMU velocity v_n^IMU and the car velocity v_n^c are respectively expressed in the world frame and in the car frame.

B. Problem Modelling

We distinguish between three different frames, see Figure 2: i) the static world frame, w; ii) the IMU frame, i, where (1)-(2) are measured; and iii) the car frame, c. The car frame is an ideal frame attached to the car, that will be estimated online and plays a key role in our approach. Its orientation w.r.t. i is denoted R_n^c ∈ SO(3), and its origin, denoted p_n^c ∈ R^3, is the car to IMU lever arm. In the rest of the paper, we tackle the following problem:

IMU Dead-Reckoning Problem. Given an initial known configuration (R_0^IMU, v_0^IMU, p_0^IMU), perform in real-time IMU dead-reckoning, i.e. estimate the IMU and car variables

  x_n := (R_n^IMU, v_n^IMU, p_n^IMU, b_n^ω, b_n^a, R_n^c, p_n^c)   (8)

using only the inertial measurements ω_n^IMU and a_n^IMU.

III. KALMAN FILTERING WITH PSEUDO-MEASUREMENTS

The Extended Kalman Filter (EKF) was first implemented in the Apollo program to localize the space capsule, and is now pervasively used in the localization industry, the radar industry, and robotics. It starts from a dynamical discrete-time non-linear law of the form

  x_{n+1} = f(x_n, u_n) + w_n   (9)

where x_n denotes the state to be estimated, u_n is an input, and w_n is the process noise, which is assumed Gaussian with zero mean and covariance matrix Q_n. Assume side information in the form of loose equality constraints h(x_n) ≈ 0 is available. It is then customary to generate a fictitious observation from the constraint function:

  y_n = h(x_n) + n_n,   (10)

and to feed the filter with the information that y_n = 0 (pseudo-measurement), as first advocated by [29], see also [13,30] for an application to visual inertial localization and general considerations. The noise n_n ∼ N(0, N_n) is assumed to be a centered Gaussian, where the covariance matrix N_n is set by the user and reflects the degree of validity of the information: the larger N_n, the less confidence is put in the information.

Starting from an initial Gaussian belief about the state, x_0 ∼ N(x̂_0, P_0), where x̂_0 represents the initial estimate and the covariance matrix P_0 the uncertainty associated to it, the EKF alternates between two steps. At the propagation step, the estimate x̂_n is propagated through model (9) with noise turned off, w_n = 0, and the covariance matrix is updated through

  P_{n+1} = F_n P_n F_n^T + G_n Q_n G_n^T,   (11)

where F_n, G_n are Jacobian matrices of f(·) with respect to x_n and u_n. At the update step, the pseudo-measurement is taken into account, and the Kalman equations allow to update the estimate x̂_{n+1} and its covariance matrix P_{n+1} accordingly.

To implement an EKF, the designer needs to determine the functions f(·) and h(·), and the associated noise matrices Q_n and N_n. In this paper, the noise parameters Q_n and N_n will be wholly learned by a neural network.

A. Defining the Dynamical Model f(·)

We now need to assess the evolution of the state variables (8). The evolution of R_n^IMU, p_n^IMU, v_n^IMU, b_n^ω and b_n^a is already given by the standard equations (3)-(7). The additional variables R_n^c and p_n^c represent the car frame with respect to the IMU. This car frame is rigidly attached to the car and encodes an unknown fictitious point where the pseudo-measurements of Section III-B are most advantageously made. As the IMU is also rigidly attached to the car, and R_n^c, p_n^c represent the misalignment between the IMU and car frames, they are approximately constant:

  R_{n+1}^c = R_n^c exp((w_n^{Rc})_×),   (12)
  p_{n+1}^c = p_n^c + w_n^{pc},   (13)

where we let w_n^{Rc}, w_n^{pc} be centered Gaussian noises with small covariance matrices σ_{Rc}^2 I, σ_{pc}^2 I that will be learned during training. The noises w_n^{Rc} and w_n^{pc} encode possible small variations of the lever arm through time, due to the lack of rigidity stemming from dampers and shock absorbers.

B. Defining the Pseudo-Measurements h(·)

Consider the different frames depicted in Figure 2. The velocity of the origin point of the car frame, expressed in the car frame, writes

  v_n^c = (v_n^for, v_n^lat, v_n^up)^T = R_n^{cT} R_n^{IMUT} v_n^IMU + (ω_n)_× p_n^c,   (14)

from basic screw theory, where p_n^c ∈ R^3 is the car to IMU lever arm. In the car frame, we consider that the car lateral and vertical velocities are roughly null, that is, we generate two scalar pseudo-observations of the form (10):

  y_n = (y_n^lat, y_n^up)^T = (h^lat(x_n) + n_n^lat, h^up(x_n) + n_n^up)^T = (v_n^lat + n_n^lat, v_n^up + n_n^up)^T,   (15)

where the noises n_n = (n_n^lat, n_n^up)^T are assumed centered and Gaussian with covariance matrix N_n ∈ R^{2×2}. The filter is then fed with the pseudo-measurement that y_n^lat = y_n^up = 0.

Assumptions that v_n^lat and v_n^up are roughly null are common for cars moving forward on human made roads or wheeled robots moving indoor. Treating them as loose constraints, i.e., allowing the uncertainty encoded in N_n to be non strictly null, leads to much better estimates than treating them as strictly null [13].

It should be duly noted that the vertical velocity v_n^up is expressed in the car frame, and thus the assumption that it is roughly null generally holds for a car moving on a road even if the motion is 3D. It is quite different from assuming null vertical velocity in the world frame, which then boils down to planar horizontal motion.

The main point of the present work is that the validity of the null lateral and vertical velocity assumptions widely varies depending on what maneuver is being performed: for instance, v_n^lat is much larger in turns than in straight lines. The role of the noise parameter adapter of Section IV-A, based on AI techniques, will be to dynamically assess the parameter N_n that reflects confidence in the assumptions, as a function of past and present IMU measurements.
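To make the pseudo-measurement concrete, the following NumPy sketch evaluates (14)-(15) for a given state; the variable names are illustrative and not taken from the released code, and (ω_n)_× p_n^c is computed as a plain cross product.

```python
import numpy as np

def car_frame_velocity(R_imu, v_imu, R_c, p_c, omega):
    """Velocity of the car-frame origin expressed in the car frame, eq. (14)."""
    return R_c.T @ (R_imu.T @ v_imu) + np.cross(omega, p_c)

def pseudo_measurement(R_imu, v_imu, R_c, p_c, omega):
    """h(x) of eq. (15): lateral and vertical velocity components.
    The filter is fed with the information that both should be zero."""
    v_c = car_frame_velocity(R_imu, v_imu, R_c, p_c, omega)
    return v_c[1:3]  # (v_lat, v_up); the forward component v_for is unconstrained
```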

C. The Invariant Extended Kalman Filter (IEKF)

For inertial navigation, we advocate the use of a recent EKF variant, namely the Invariant Extended Kalman Filter (IEKF), see [4,5], which has recently given rise to a commercial aeronautics product [31] and to various successes in the field of visual inertial odometry [32]–[34]. We thus opt for an IEKF to perform the fusion between the IMU measurements (1)-(2) and (15) treated as pseudo-measurements. Its architecture, which is identical to the conventional EKF's, is recapped in Figure 3.

Fig. 3. Structure of the IEKF (propagation and update steps). The filter uses the noise parameter N_{n+1} of the pseudo-measurements (15) to yield a real time estimate of the state x̂_{n+1} along with its covariance P_{n+1}.

However, understanding the IEKF [4] in detail requires some background in Lie group geometry. The interested reader is referred to the Appendix, where the exact equations are provided.

IV. PROPOSED AI-IMU DEAD-RECKONING

This section describes our system for recovering all the variables of interest from IMU signals only. Figure 4 illustrates the approach, which consists of two main blocks summarized as follows:
• the filter integrates the inertial measurements (1)-(2) with the dynamical model f(·) given by (3)-(7) and (12)-(13), and exploits (15) as measurements h(·) with covariance matrix N_n to refine its estimates;
• the noise parameter adapter determines in real-time the most suitable covariance noise matrix N_n. This deep learning based adapter converts raw IMU signals (1)-(2) directly into covariance matrices N_n, without requiring knowledge of any state estimate nor any other quantity.

Fig. 4. Structure of the proposed system for inertial dead-reckoning. The measurement noise adapter feeds the filter with the measurement covariance computed from raw IMU signals only.

The amplitudes of the process noise parameters Q_n are considered fixed by the algorithm, and are learned during training.

Note that the adapter computes covariances meant to improve localization accuracy, and thus the computed values may broadly differ from the actual statistical covariance of y_n (15), see Section V-D for more details. In this respect, our approach is related to [24], but the considered problem is more challenging: our state-space is of dimension 21, whereas [24] has a state-space of dimension only 3. Moreover, we compare our results in the sequel to state-of-the-art methods based on stereo cameras and LiDARs, and we show we may achieve similar results based on the moderately precise IMU only.

A. AI-based Measurement Noise Parameter Adapter

The measurement noise parameter adapter computes at each instant the covariance N_{n+1} used in the filter update, see Figure 3. The base core of the adapter is a Convolutional Neural Network (CNN) [35]. The network takes as input a window of N inertial measurements and computes

  N_{n+1} = CNN({ω_i^IMU, a_i^IMU}_{i=n-N}^{n}).   (16)

Our motivations for the above simple CNN-like architecture are threefold:
i) avoiding over-fitting, by using a relatively small number of parameters in the network and also by making its outputs independent of state estimates;
ii) obtaining an interpretable adapter from which one can infer general and safe rules using reverse engineering, e.g. to which extent one must inflate the covariance during turns, e.g., for generalization to all sorts of wheeled and commercial vehicles, see Section V-D;
iii) letting the network be trainable. Indeed, as reported in [24], training is quite difficult and slow. Setting the adapter with a recurrent architecture [35] would make the training even much harder.

The complete architecture of the adapter consists of a bunch of CNN layers followed by a fully connected layer outputting a vector z_n = (z_n^lat, z_n^up)^T ∈ R^2. The covariance N_{n+1} ∈ R^{2×2} is then computed as

  N_{n+1} = diag(σ_lat^2 10^{β tanh(z_n^lat)}, σ_up^2 10^{β tanh(z_n^up)}),   (17)

with β ∈ R_{>0}, and where σ_lat and σ_up correspond to our initial guess for the noise parameters. The network thus may inflate the covariance up to a factor 10^β and squeeze it up to a factor 10^{-β} with respect to its original values. Additionally, as long as the network is disabled or barely reactive (e.g. when starting training), we get z_n ≈ 0 and recover the initial covariance diag(σ_lat^2, σ_up^2).

Regarding the process noise parameter Q_n, we choose to fix it to a value Q and leave its dynamic adaptation for future work. However, its entries are optimized during training, see Section IV-C.

B. Implementation Details

We provide in this section the settings and implementation details of our method. We implement the full approach in Python, with the PyTorch library (https://pytorch.org/) for the noise parameter adapter part. We set as initial values before training

  P_0 = diag(σ_{R0}^2 I_2, 0, σ_{v0}^2 I_2, 0_4, σ_{bω0}^2 I, σ_{ba0}^2 I, σ_{Rc0}^2 I, σ_{pc0}^2 I),   (18)
  Q = diag(σ_ω^2 I, σ_a^2 I, σ_{bω}^2 I, σ_{ba}^2 I, σ_{Rc}^2 I, σ_{pc}^2 I),   (19)
  N_n = diag(σ_lat^2, σ_up^2),   (20)

where I = I_3, σ_{R0} = 10^{-3} rad, σ_{v0} = 0.3 m/s, σ_{bω0} = 10^{-4} rad/s, σ_{ba0} = 3·10^{-2} m/s², σ_{Rc0} = 3·10^{-3} rad, σ_{pc0} = 10^{-1} m in the initial error covariance P_0; σ_ω = 1.4·10^{-2} rad/s, σ_a = 3·10^{-2} m/s², σ_{bω} = 10^{-4} rad/s, σ_{ba} = 10^{-3} m/s², σ_{Rc} = 10^{-4} rad, σ_{pc} = 10^{-4} m for the noise propagation covariance matrix Q; and σ_lat = 1 m/s, σ_up = 3 m/s for the measurement covariance matrix. The zero values in the diagonal of P_0 in (18) correspond to a perfect prior on the initial yaw, position and zero vertical speed.

The adapter is a 1D temporal convolutional neural network with 2 layers. The first layer has kernel size 5, output dimension 32, and dilation parameter 1. The second has kernel size 5, output dimension 32 and dilation parameter 3; this sets the window size to N = 15. The CNN is followed by a fully connected layer that outputs the scalars z_n^lat and z_n^up. Each activation function between two layers is a ReLU unit [35]. We define β = 3 in the right part of (17), which allows each covariance element to be 10^3 higher or smaller than its original value.
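Below is a minimal PyTorch sketch of the adapter (16)-(17), following the layer sizes of Section IV-B. It is our reading of the architecture, not a copy of the released implementation: the class and argument names are ours, and the exact window bookkeeping (padding, normalization, dropout placement) may differ from the open-source code.

```python
import torch
import torch.nn as nn

class MeasurementNoiseAdapter(nn.Module):
    """Sketch of the covariance adapter: two dilated 1D convolutions over a
    window of raw IMU measurements, a linear layer, then the tanh scaling (17)."""

    def __init__(self, beta=3.0, sigma_lat=1.0, sigma_up=3.0):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(6, 32, kernel_size=5, dilation=1),  # 6 channels: gyro + accel
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, dilation=3),
            nn.ReLU(),
        )
        self.fc = nn.Linear(32, 2)  # outputs z_n = (z_lat, z_up)
        self.beta = beta
        # initial guesses sigma_lat, sigma_up from Section IV-B
        self.register_buffer("sigma2", torch.tensor([sigma_lat**2, sigma_up**2]))

    def forward(self, imu_window):
        # imu_window: (batch, 6, T) raw IMU signals over a short window
        z = self.fc(self.cnn(imu_window)[:, :, -1])  # features at the last step
        # eq. (17): inflate or squeeze the initial covariance by up to 10**beta
        n_diag = self.sigma2 * torch.pow(10.0, self.beta * torch.tanh(z))
        return torch.diag_embed(n_diag)  # (batch, 2, 2) covariance N_{n+1}
```

Note that with z ≈ 0 (an untrained network), the module returns diag(σ_lat², σ_up²), matching the fallback behavior described after (17).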

C. Training

We seek to optimize the relative translation error t_rel computed from the filter estimates x̂_n, which is the averaged increment error over all possible sub-sequences of length 100 m to 800 m.

Toward this aim, we first define the learnable parameters. They consist of the 6210 parameters of the adapter, along with the parameter elements of P_0 and Q in (18)-(19), which add 12 parameters to learn. We then choose an Adam optimizer [36] with learning rate 10^{-4} that updates the trainable parameters. Training consists of repeating, for a chosen number of epochs, the following iterations:
i) sample a part of the dataset;
ii) get the filter estimates, then compute the loss and its gradient w.r.t. the learnable parameters;
iii) update the learnable parameters with the gradient and the optimizer.

Following continual training [37], we suppose the number of epochs is huge, potentially infinite (in our application we set this number to 400). This makes sense for online training in a context where the vehicle gathers accurate ground-truth poses from e.g. its LiDAR system or precise GNSS. It requires careful procedures for avoiding over-fitting, such that we use dropout and data augmentation [35]. Dropout refers to ignoring units of the adapter during training, and we set the probability p = 0.5 of any CNN element being ignored (set to zero) during a sequence iteration.

Regarding i), we sample a batch of nine 1 min sequences, where each sequence starts at a random arbitrary time. We add to the data a small Gaussian noise with standard deviation 10^{-4}, a.k.a. a data augmentation technique. We compute the gradient with standard backpropagation, and we finally clip the gradient norm to a maximal value of 1 to avoid gradient explosion at step iii).

We stress that the loss function consists of the relative translation error t_rel, i.e. we optimize the parameters for improving the filter accuracy, disregarding the values actually taken by N_n, in the spirit of [24].
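A schematic version of one training epoch under the procedure above might look as follows. Here sample_subsequences, run_filter and relative_translation_error are hypothetical helpers standing in for the dataset sampling, the differentiable filter pass and the t_rel loss; they are not functions of the released code.

```python
import torch

def train_epoch(adapter, filter_params, dataset, optimizer):
    """One epoch of steps i)-iii) from Section IV-C (schematic sketch)."""
    # i) sample a batch of nine 1-minute sub-sequences at random start times
    batch = sample_subsequences(dataset, n_sequences=9, duration_s=60)
    loss = 0.0
    for imu, ground_truth, init_state in batch:
        # data augmentation: small Gaussian noise on the IMU signals
        imu = imu + 1e-4 * torch.randn_like(imu)
        # ii) run the filter with adapter-provided covariances, accumulate t_rel
        estimates = run_filter(imu, init_state, adapter, filter_params)
        loss = loss + relative_translation_error(estimates, ground_truth)
    # iii) backpropagate, clip the gradient norm to 1, and update
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(
        list(adapter.parameters()) + list(filter_params), max_norm=1.0)
    optimizer.step()
    return float(loss)
```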


V. EXPERIMENTAL RESULTS

We evaluate the proposed method on the KITTI dataset [3], which contains data recorded from LiDAR, cameras and IMU, along with centimeter-accurate ground-truth pose, in various environments (e.g., urban, highways, and streets). The dataset contains 22 sequences for benchmarking odometry algorithms; 11 of them contain publicly available ground-truth trajectories, raw and synchronized IMU data. We download the raw data, with IMU signals sampled at 100 Hz (dt = 10^{-2} s), rather than the synchronized data sampled at 10 Hz, and discard seq. 03 since we did not find raw data for this sequence. The RT3003 IMU (https://www.oxts.com/) has an announced gyro and accelerometer bias stability of respectively 36 deg/h and 1 mg.

The KITTI dataset has an online benchmarking system that ranks algorithms. However, we could not submit our algorithm for online ranking since the sequences used for ranking do not contain IMU data, which is reserved for training only. Our implementation is made open-source at: https://github.com/mbrossar/ai-imu-dr.

Table 1 (t_rel in %, r_rel in deg/m):

seq | length (km) | duration (s) | environment    | IMLS [6]    | ORB-SLAM2 [7] | IMU         | proposed
01  | 2.6         | 110          | highway        | 0.48 / 0.08 | 1.38 / 0.20   | 5.35 / 0.12 | 1.11 / 0.12
03  | -           | 80           | country        | -    / -    | -    / -      | -    / -    | -    / -
04  | 0.4         | 27           | country        | 0.25 / 0.08 | 0.41 / 0.21   | 0.97 / 0.10 | 0.35 / 0.08
06  | 1.2         | 110          | urban          | 0.78 / 0.07 | 0.89 / 0.22   | 5.78 / 0.19 | 0.97 / 0.20
07  | 0.7         | 110          | urban          | 0.32 / 0.12 | 1.16 / 0.49   | 12.6 / 0.30 | 0.84 / 0.32
08  | 3.2         | 407          | urban, country | 1.84 / 0.17 | 1.52 / 0.30   | 549  / 0.56 | 1.48 / 0.32
09  | 1.7         | 159          | urban, country | 0.97 / 0.11 | 1.01 / 0.25   | 23.4 / 0.32 | 0.80 / 0.22
10  | 0.9         | 120          | urban, country | 0.50 / 0.14 | 0.31 / 0.34   | 4.58 / 0.25 | 0.98 / 0.23
avg |             |              |                | 0.98 / 0.12 | 1.17 / 0.27   | 171  / 0.31 | 1.10 / 0.23

Table 1. Results on [3]. IMU integration tends to drift or diverge, whereas the proposed method may be used as an alternative to the LiDAR-based (IMLS) and stereo-vision-based (ORB-SLAM2) methods, using only IMU information. Indeed, on average, our dead-reckoning solution performs better than ORB-SLAM2 and achieves a translational error close to that of the LiDAR-based method IMLS, which is ranked 3rd on the KITTI online benchmarking system. Data from seq. 03 was unavailable for testing the algorithms, and sequences 00, 02 and 05 are discussed separately in Section V-C. It should be duly noted that IMLS, ORB-SLAM2, and the proposed AI-IMU algorithm all use different sensors. The interest of ranking algorithms based on different information is debatable. Our goal here is rather to evidence that, using data from a moderately precise IMU only, one can achieve results similar to state-of-the-art systems based on imaging sensors, which is a rather surprising feature.

A. Evaluation Metrics and Compared Methods

To assess performances, we consider the two error metrics proposed in [3]:
1) Relative Translation Error (t_rel): the averaged relative translation increment error over all possible sub-sequences of length 100 m, ..., 800 m, in percent of the traveled distance;
2) Relative Rotational Error (r_rel): the relative rotational increment error over all possible sub-sequences of length 100 m, ..., 800 m, in degrees per meter.

We compare four methods which alternatively use LiDAR, stereo vision, and IMU-based estimations:
• IMLS [6]: a recent state-of-the-art LiDAR-based approach ranked 3rd on the KITTI benchmark. The author provided us with the code after disabling the loop-closure module;
• ORB-SLAM2 [7]: a popular and versatile library for monocular, stereo and RGB-D cameras that computes a sparse reconstruction of the map. We took the open-source code, disabled the loop-closure capability, and evaluated the stereo algorithm without modifying any parameter;
• IMU: the direct integration of the IMU measurements based on (5)-(7), that is, pure inertial navigation;
• proposed: the proposed approach, which uses only the IMU signals and involves no other sensor.

Fig. 5. Results on seq. 07 (drive #27, 2011/09/30) [3]. The proposed method competes with LiDAR and visual odometry methods, whereas the IMU integration broadly drifts after the car stops. The sequence contains a 5 s car stop, annotated in the figure.

B. Trajectory Results

We follow the same protocol for evaluating each sequence: i) we initialize the filter with the parameters described in Section IV-B; ii) we then train the noise parameter adapter following Section IV-C for 400 epochs without the evaluated sequence (e.g. for testing seq. 10, we train on seq. 00-09), so that the noise parameter adapter has never been confronted with the evaluated sequence; iii) we run the IMU-based methods on the full raw sequence with ground-truth initial configuration (R_0^IMU, v_0^IMU, p_0^IMU), whereas we initialize the remaining variables to zero (b_0^ω = b_0^a = p_0^c = 0, R_0^c = I); and iv) we keep the estimates only over the times corresponding to the odometry benchmark sequence. LiDAR and visual methods are directly evaluated on the odometry sequences.
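For concreteness, a simplified NumPy sketch of the relative translation error t_rel of Section V-A follows. The official KITTI devkit differs in details (it evaluates pose errors in the local frame, including rotation), so this is an approximation for illustration only, with names of our choosing.

```python
import numpy as np

def t_rel(p_est, p_gt, lengths=(100, 200, 300, 400, 500, 600, 700, 800)):
    """Approximate KITTI relative translation error, in percent.
    p_est, p_gt: (n, 3) estimated and ground-truth positions, synchronized
    and expressed in the same (ground-truth) frame."""
    # cumulative traveled distance along the ground-truth path
    steps = np.linalg.norm(np.diff(p_gt, axis=0), axis=1)
    dist = np.concatenate(([0.0], np.cumsum(steps)))
    errors = []
    for length in lengths:
        for i in range(len(p_gt)):
            # first index j at least `length` meters further along the path
            j = int(np.searchsorted(dist, dist[i] + length))
            if j >= len(p_gt):
                break
            drift = np.linalg.norm((p_est[j] - p_est[i]) - (p_gt[j] - p_gt[i]))
            errors.append(100.0 * drift / length)
    return float(np.mean(errors))
```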

Results are averaged in Table 1 and illustrated in Figures 1, 5 and 6, where we exclude sequences 00, 02 and 05, which contain problems with the data and are discussed separately in Section V-C. From these results, we see that:
• LiDAR and visual methods perform generally well in all sequences, and the LiDAR method achieves slightly better results than its visual counterpart;
• our method competes on average with the latter image-based methods, see Table 1;
• directly integrating the IMU signals leads to rapid drift of the estimates, especially for the longest sequences but even for short periods;
• our method looks unaffected by stops of the car, as in seq. 07, see Figure 5.

The results are remarkable, as we use none of the vision sensors, nor wheel odometry. We only use the IMU, which moreover has moderate precision.

Fig. 6. Results on seq. 09 (drive #33, 2011/09/30) [3]. The proposed method competes with LiDAR and visual odometry methods, whereas the IMU integration drifts quickly after the first turn.

We also sought to compare our method to visual inertial odometry algorithms. However, we could not find an open-source method of this kind that performs well on the full KITTI dataset. We tested [38], but the code is still under development (results sometimes diverge), and the authors in [39] evaluate their open-source method on short sequences only (≤ 30 s). The papers [33,40] evaluate their visual inertial odometry methods on seq. 08; both get a final error around 20 m, which is four times what our method gets, with a final distance to ground-truth of only 5 m. This clearly evidences that methods tailored for ground vehicles [13,15] may achieve higher accuracy and robustness than general methods designed for robots, drones and aerial vehicles.

C. Results on Sequences 00, 02 and 05

Following the procedure described in Section V-B, the proposed method seems to have degraded performances on seq. 00, 02 and 05, see e.g. Figure 9 and Table 2. However, the behavior is wholly explainable: data are missing for a couple of seconds due to logging problems, which appear both for the IMU and the ground-truth. This is illustrated in Figure 10 for seq. 02, where we plot the available data over time. We observe a jump in the IMU and ground-truth signals, which illustrates that data are missing between t = 1 and t = 3. The problem was corrected manually when using those sequences in the training phase described in Section V.

Although those sequences could have been discarded due to their logging problems, we used them for testing without correcting those problems. This naturally results in degraded performance, but it also evidences that our method is remarkably robust to such problems in spite of their inherent harmfulness. For instance, the 2 s time jump of seq. 02 results in an estimate shift, but no divergence occurs for all that, see Figure 9.

Table 2 (t_rel in %, r_rel in deg/m):

seq | length (km) | duration (s) | environment | IMLS [6]    | ORB-SLAM2 [7] | IMU        | proposed
00  | 3.7         | 454          | urban       | 1.46 / 0.17 | 1.33 / 0.29   | 426 / 4.68 | 7.40 / 2.65
02  | 5.1         | 466          | urban       | 2.42 / 0.12 | 1.84 / 0.28   | 346 / 0.87 | 2.85 / 0.38
05  | 2.2         | 278          | urban       | 0.83 / 0.10 | 0.75 / 0.22   | 189 / 0.52 | 2.76 / 0.86

Table 2. Results on [3] on seq. 00, 02 and 05. The degraded results of the proposed method are wholly explained by a problem of missing data, see Section V-C and Figure 10.

Fig. 9. Results on seq. 02 (drive #34, 2011/09/30) [3]. The proposed method competes with LiDAR and visual odometry methods until a problem in the data occurs (2 seconds are missing). It is remarkable that the proposed method is robust to such trouble, undergoing a shift in the estimates but no divergence.

Fig. 10. Data on seq. 02 (drive #34, 2011/09/30) [3], plotted as a function of the timestamp number n (the figure shows p_n^x, p_n^y and ω_n^x, ω_n^y, ω_n^z). A 2 s time jump happens around n = 33750, i.e. data have not been recorded during this jump. It leads to jumps in the ground-truth and IMU signals, causing estimate drift.

D. Discussion

The performances are owed to three components: i) the use of a recent IEKF that has been proved to be well suited for IMU based localization; ii) the incorporation of side information in the form of pseudo-measurements with dynamic noise parameter adaptation learned by a neural network; and iii) accounting for a "loose" misalignment between the IMU frame and the car frame.

As concerns i), it should be stressed that the method is perfectly suited to the use of a conventional EKF and is easily adapted if need be. However, we advocate the use of an IEKF owing to its accuracy and convergence properties. To illustrate the benefits of points ii) and iii), we consider two sub-versions of the proposed algorithm: one without alignment, i.e. where R_n^c and p_n^c are not included in the state and are fixed at their initial values R_n^c = I, p_n^c = 0; and a second one that uses the static filter parameters (18)-(20), i.e. without the covariance adapter.

End trajectory results for the highway seq. 01 are plotted in Figure 7, where we see that the two sub-version methods have trouble when the car is turning. Therefore, their respective translational errors t_rel are higher than that of the full version of the proposed method: the proposed method achieves 1.11%, the method without the alignment lever arm achieves 1.65%, and the absence of covariance adaptation yields a 1.94% error. All methods have the same rotational error r_rel = 0.12 deg/m. This could be anticipated for the considered sequence, since the full method has the same rotational error as the standard IMU integration method.

Fig. 7. End trajectory results on the highway seq. 01 (drive #42, 2011/10/30) [3], comparing the ground-truth, the proposed method, and the two sub-versions (without alignment, and without covariance adapter). Dynamically adapting the measurement covariance and considering the misalignment between the car and inertial frames enhance the performance of the proposed method from a translational error of 1.94% to one of 1.11%.

As systems equipped with AI-based approaches may be hard to certify for commercial or industrial use [41], we note that adaptation rules may be inferred from the AI-based adapter and encoded in an EKF using pseudo-measurements. To this aim, we plot the covariances computed by the adapter for seq. 01 in Figure 8. The adapter clearly increases the covariances during the bend, i.e. when the gyro yaw rate is important. This is especially the case for the zero velocity measurement (15): its associated covariance is inflated by a factor of 10^2 between t = 90 s and t = 110 s. This illustrates the kind of information the adapter has learned. Interestingly, we see that large discrepancies may occur between the actual statistical uncertainty (which should clearly be below 10^0 m²/s²) and the inflated covariances, whose values are computed for the sole purpose of filter performance enhancement. Indeed, such a large noise parameter inflation indicates that the AI-based part of the algorithm has learned and recognizes that the pseudo-measurements have no value for localization at those precise moments, so the filter should barely consider them.

Fig. 8. Covariance values computed by the adapter on the highway seq. 01 (drive #42, 2011/10/30) [3]. The figure plots ω_n^z, a_n^x, a_n^y and the covariances cov(y_n^lat), cov(y_n^up) on a log scale as functions of t (s). We clearly observe a large increase in the covariance values when the car is turning between t = 90 s and t = 110 s.

VI. CONCLUSION

This paper proposes a novel approach for inertial dead-reckoning for wheeled vehicles. Our approach exploits deep neural networks to dynamically adapt the covariance of simple assumptions about the vehicle motion, which are leveraged in an invariant extended Kalman filter that performs localization, velocity and sensor bias estimation. The entire algorithm is fed with IMU signals only, and requires no other sensor. The method leads to surprisingly accurate results, and opens new perspectives. In future work, we would like to address the learning of the Kalman covariance matrices for images, and also the issue of generalization from one vehicle to another.

ACKNOWLEDGMENTS

The authors would like to thank J-E. DESCHAUD for sharing the results of the IMLS algorithm [6], and Paul CHAUCHAT for relevant discussions.


REFERENCES

[1] G. Bresson, Z. Alsayed, L. Yu et al., "Simultaneous Localization and Mapping: A Survey of Current Trends in Autonomous Driving," IEEE Transactions on Intelligent Vehicles, vol. 2, no. 3, pp. 194–220, 2017.
[2] OxTS, "Why it is Necessary to Integrate an Inertial Measurement Unit with Imaging Systems on an Autonomous Vehicle," https://www.oxts.com/technical-notes/why-use-ins-with-autonomous-vehicle/, 2018.
[3] A. Geiger, P. Lenz, C. Stiller et al., "Vision meets robotics: The KITTI dataset," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1231–1237, 2013.
[4] A. Barrau and S. Bonnabel, "The Invariant Extended Kalman Filter as a Stable Observer," IEEE Transactions on Automatic Control, vol. 62, no. 4, pp. 1797–1812, 2017.
[5] ——, "Invariant Kalman Filtering," Annual Review of Control, Robotics, and Autonomous Systems, vol. 1, no. 1, pp. 237–257, 2018.
[6] J.-E. Deschaud, "IMLS-SLAM: Scan-to-Model Matching Based on 3D Data," in International Conference on Robotics and Automation. IEEE, 2018.
[7] R. Mur-Artal and J. Tardos, "ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras," IEEE Transactions on Robotics, vol. 33, no. 5, pp. 1255–1262, 2017.
[8] F. Tschopp, T. Schneider, A. W. Palmer et al., "Experimental Comparison of Visual-Aided Odometry Methods for Rail Vehicles," IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1815–1822, 2019.
[9] M. Mousa, K. Sharma, and C. G. Claudel, "Inertial Measurement Units-Based Probe Vehicles: Automatic Calibration, Trajectory Estimation, and Context Detection," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 10, pp. 3133–3143, 2018.
[10] J. Wahlström, I. Skog, J. G. P. Rodrigues et al., "Map-Aided Dead-Reckoning Using Only Measurements of Speed," IEEE Transactions on Intelligent Vehicles, vol. 1, no. 3, pp. 244–253, 2016.
[11] A. Mahmoud, A. Noureldin, and H. S. Hassanein, "Integrated Positioning for Connected Vehicles," IEEE Transactions on Intelligent Transportation Systems, pp. 1–13, 2019.
[12] R. Karlsson and F. Gustafsson, "The Future of Automotive Localization Algorithms: Available, reliable, and scalable localization: Anywhere and anytime," IEEE Signal Processing Magazine, vol. 34, no. 2, pp. 60–69, 2017.
[13] K. Wu, C. Guo, G. Georgiou et al., "VINS on Wheels," in International Conference on Robotics and Automation. IEEE, 2017, pp. 5155–5162.
[14] A. Brunker, T. Wohlgemuth, M. Frey et al., "Odometry 2.0: A Slip-Adaptive EIF-Based Four-Wheel-Odometry Model for Parking," IEEE Transactions on Intelligent Vehicles, vol. 4, no. 1, pp. 114–126, 2019.
[15] F. Zheng and Y.-H. Liu, "SE(2)-Constrained Visual Inertial Fusion for Ground Vehicles," IEEE Sensors Journal, vol. 18, no. 23, pp. 9699–9707, 2018.
[16] M. Buczko, V. Willert, J. Schwehr et al., "Self-Validation for Automotive Visual Odometry," in Intelligent Vehicles Symposium. IEEE, 2018, pp. 1–6.
[17] M. Kok, J. D. Hol, and T. B. Schön, "Using Inertial Sensors for Position and Orientation Estimation," Foundations and Trends in Signal Processing, vol. 11, no. 1-2, pp. 1–153, 2017.
[18] A. Ramanandan, A. Chen, and J. Farrell, "Inertial Navigation Aiding by Stationary Updates," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 1, pp. 235–248, 2012.
[19] G. Dissanayake, S. Sukkarieh, E. Nebot et al., "The Aiding of a Low-cost Strapdown Inertial Measurement Unit Using Vehicle Model Constraints for Land Vehicle Applications," IEEE Transactions on Robotics and Automation, vol. 17, no. 5, pp. 731–747, 2001.
[20] M. Brossard, A. Barrau, and S. Bonnabel, "RINS-W: Robust Inertial Navigation System on Wheels," Submitted to IROS 2019, 2019. [Online]. Available: https://hal.archives-ouvertes.fr/hal-02057117/file/main.pdf
[21] F. Aghili and C.-Y. Su, "Robust Relative Navigation by Integration of ICP and Adaptive Kalman Filter Using Laser Scanner and IMU," IEEE/ASME Transactions on Mechatronics, vol. 21, no. 4, pp. 2015–2026, 2016.
[22] H. Yan, Q. Shan, and Y. Furukawa, "RIDI: Robust IMU Double Integration," in European Conference on Computer Vision, 2018.
[23] C. Chen, X. Lu, A. Markham et al., "IONet: Learning to Cure the Curse of Drift in Inertial Odometry," in AAAI Conference on Artificial Intelligence, 2018.
[24] T. Haarnoja, A. Ajay, S. Levine et al., "Backprop KF: Learning Discriminative Deterministic State Estimators," in Advances in Neural Information Processing Systems, 2016.
[25] K. Liu, K. Ok, W. Vega-Brown et al., "Deep Inference for Covariance Estimation: Learning Gaussian Noise Models for State Estimation," in International Conference on Robotics and Automation. IEEE, 2018.
[26] M. Brossard and S. Bonnabel, "Learning Wheel Odometry and IMU Errors for Localization," in International Conference on Robotics and Automation. IEEE, 2019.
[27] F. Castella, "An Adaptive Two-Dimensional Kalman Tracking Filter," IEEE Transactions on Aerospace and Electronic Systems, vol. AES-16, no. 6, pp. 822–829, 1980.
[28] P. Abbeel, A. Coates, M. Montemerlo et al., "Discriminative Training of Kalman Filters," in Robotics: Science and Systems, vol. 2, 2005.
[29] M. Tahk and J. Speyer, "Target tracking problems subject to kinematic constraints," IEEE Transactions on Automatic Control, vol. 32, no. 2, pp. 324–326, 1990.
[30] D. Simon, "Kalman Filtering with State Constraints: A Survey of Linear and Nonlinear Algorithms," IET Control Theory & Applications, vol. 4, no. 8, pp. 1303–1318, 2010.
[31] A. Barrau and S. Bonnabel, "Alignment Method for an Inertial Unit," Patent 15/037,653, 2016.
[32] M. Brossard, S. Bonnabel, and A. Barrau, "Unscented Kalman Filter on Lie Groups for Visual Inertial Odometry," in International Conference on Intelligent Robots and Systems. IEEE/RSJ, 2018.
[33] S. Heo and C. G. Park, "Consistent EKF-Based Visual-Inertial Odometry on Matrix Lie Group," IEEE Sensors Journal, vol. 18, no. 9, pp. 3780–3788, 2018.
[34] R. Hartley, M. G. Jadidi, J. W. Grizzle et al., "Contact-Aided Invariant Extended Kalman Filtering for Legged Robot State Estimation," in Robotics: Science and Systems, 2018.
[35] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[36] D. P. Kingma and J. Ba, "ADAM: A Method for Stochastic Optimization," in International Conference on Learning Representations, 2014.
[37] G. Parisi, R. Kemker, J. Part et al., "Continual Lifelong Learning with Neural Networks: A Review," Neural Networks, vol. 113, pp. 54–71, 2019.
[38] T. Qin, J. Pan, S. Cao et al., "A General Optimization-based Framework for Local Odometry Estimation with Multiple Sensors," 2019.
[39] M. Ramezani and K. Khoshelham, "Vehicle Positioning in GNSS-Deprived Urban Areas by Stereo Visual-Inertial Odometry," IEEE Transactions on Intelligent Vehicles, vol. 3, no. 2, pp. 208–217, 2018.
[40] S. Heo, J. Cha, and C. G. Park, "EKF-Based Visual Inertial Navigation Using Sliding Window Nonlinear Optimization," IEEE Transactions on Intelligent Transportation Systems, pp. 1–10, 2018.
[41] N. Sünderhauf, O. Brock, W. Scheirer et al., "The limits and potentials of deep learning for robotics," The International Journal of Robotics Research, vol. 37, no. 4-5, pp. 405–420, 2018.

APPENDIX A

The Invariant Extended Kalman Filter (IEKF) [4,5] is an EKF based on an alternative state error. One must define a linearized error and an underlying group to derive the exponential map; the methodology is then akin to the EKF's.

1) Linearized Error: the filter state x_n is given by (8). The state evolution is given by the dynamics (5)-(7) and (12)-(13), see Section II. Along the lines of [4], the variables χ_n^IMU := (R_n^IMU, v_n^IMU, p_n^IMU) are embedded in the Lie group SE_2(3) (see Appendix B for the definition of SE_2(3) and its exponential map). Then the biases vector b_n = (b_n^{ωT}, b_n^{aT})^T ∈ R^6 is merely treated as a vector, that is, as an element of R^6 viewed as a Lie group endowed with standard addition; R_n^c is treated as an element of the Lie group SO(3), and p_n^c ∈ R^3 as a vector. Once the state is broken into several Lie groups, the linearized error writes as the concatenation of the corresponding linearized errors, that is,

  e_n = (ξ_n^{IMU T}, e_n^{b T}, ξ_n^{Rc T}, e_n^{pc T})^T ∼ N(0, P_n),   (21)

where the state uncertainty e_n ∈ R^{21} is a zero-mean Gaussian variable with covariance P_n ∈ R^{21×21}. As (15) are measurements expressed in the robot's frame, they lend themselves to the Right IEKF methodology. This means each linearized error is mapped to the state using the corresponding Lie group exponential map on the right, multiplying it by the elements of the state space. This yields:

  χ_n^IMU = exp_{SE2(3)}(ξ_n^IMU) χ̂_n^IMU,   (22)
  b_n = b̂_n + e_n^b,   (23)
  R_n^c = exp_{SO(3)}(ξ_n^{Rc}) R̂_n^c,   (24)
  p_n^c = p̂_n^c + e_n^{pc},   (25)

where (ˆ·) denotes estimated state variables.

2) Propagation Step: we apply (5)-(7) and (12)-(13) to propagate the state and obtain x̂_{n+1} and the associated covariance through the Riccati equation (11), where the Jacobians F_n, G_n are related to the evolution of the error (21) and write:

  F_n = I_{21×21} + [ 0       0    0   -R_n               0     0_{3×6} ;
                      (g)_×   0    0   -(v_n^IMU)_× R_n   -R_n  0_{3×6} ;
                      0       I_3  0   -(p_n^IMU)_× R_n   0     0_{3×6} ;
                      0_{12×21}                                       ] dt,   (26)

  G_n = [ R_n               0         0_{3×12} ;
          (v_n^IMU)_× R_n   R_n       0_{3×12} ;
          (p_n^IMU)_× R_n   0         0_{3×12} ;
          0_{12×3}          0_{12×3}  I_{12×12} ] dt,   (27)

with R_n = R_n^IMU, 0 = 0_{3×3}, and where Q_n denotes the classical covariance matrix of the process noise as in Section IV-B.

3) Update Step: the measurement vector y_{n+1} is computed by stacking the motion information

  y_{n+1} = (v_{n+1}^lat, v_{n+1}^up)^T = 0,   (28)

with assessed uncertainty a zero-mean Gaussian variable with covariance N_{n+1} = cov(y_{n+1}). We then compute an updated state x̂_{n+1}^+ and an updated covariance P_{n+1}^+ following the IEKF methodology, i.e. we compute

  S = H_{n+1} P_{n+1} H_{n+1}^T + N_{n+1},   (29)
  K = P_{n+1} H_{n+1}^T S^{-1},   (30)
  e^+ = K (y_{n+1} - ŷ_{n+1}),   (31)
  χ̂_{n+1}^{IMU+} = exp_{SE2(3)}(ξ^+) χ̂_{n+1}^IMU,   (32)
  b̂_{n+1}^+ = b̂_{n+1} + e^{b+},   (33)
  R̂_{n+1}^{c+} = exp_{SO(3)}(ξ^{Rc+}) R̂_{n+1}^c,   (34)
  p̂_{n+1}^{c+} = p̂_{n+1}^c + e^{pc+},   (35)
  P_{n+1}^+ = (I_{21} - K H_{n+1}) P_{n+1},   (36)

summarized as the Kalman gain (30), state innovation (31), state update (32)-(35) and covariance update (36), where H_{n+1} is the measurement Jacobian matrix with respect to the linearized error (21) and is thus given as:

  H_n = A [ 0  R_n^{cT}  0  -(p_n^c)_×  0  B  C ],   (37)

where A = [I_2 0_2] selects the two first rows of the right part of (37), B = R_n^{cT} R_n^{IMUT} (v_n^IMU)_× and C = -(ω_n - b_n^ω)_×.

APPENDIX B

The Lie group SE_2(3) is an extension of the Lie group SE(3) and is described as follows, see [4] where it was first introduced. A 5 × 5 matrix χ_n ∈ SE_2(3) is defined as

  χ_n = [ R_n      v_n  p_n ;
          0_{2×3}  I_2      ] ∈ SE_2(3).   (38)

The uncertainties ξ_n ∈ R^9 are mapped to the Lie algebra se_2(3) through the transformation ξ_n → ξ_n^∧ defined as

  ξ_n = (ξ_n^{R T}, ξ_n^{v T}, ξ_n^{p T})^T,   (39)
  ξ_n^∧ = [ (ξ_n^R)_×  ξ_n^v  ξ_n^p ;
            0_{2×5}                ] ∈ se_2(3),   (40)

where ξ_n^R ∈ R^3, ξ_n^v ∈ R^3 and ξ_n^p ∈ R^3. The closed-form expression for the exponential map is given as

  exp_{SE2(3)}(ξ_n) = I + ξ_n^∧ + a (ξ_n^∧)² + b (ξ_n^∧)³,   (41)

where a = (1 - cos ‖ξ_n^R‖) / ‖ξ_n^R‖² and b = (‖ξ_n^R‖ - sin ‖ξ_n^R‖) / ‖ξ_n^R‖³, and which inherently uses the exponential of SO(3), defined as

  exp_{SO(3)}(ξ_n^R) = exp((ξ_n^R)_×)   (42)
                     = I + (sin ‖ξ_n^R‖ / ‖ξ_n^R‖) (ξ_n^R)_× + a ((ξ_n^R)_×)².   (43)
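For reference, here is a small NumPy transcription of the closed-form exponential (41), with a first-order guard for small rotation angles; it is a direct implementation of the formula above, not code from the released repository.

```python
import numpy as np

def skew(y):
    """Skew-symmetric matrix (y)_x."""
    return np.array([[0.0, -y[2], y[1]],
                     [y[2], 0.0, -y[0]],
                     [-y[1], y[0], 0.0]])

def se2_3_exp(xi):
    """Closed-form exponential map of SE_2(3), eq. (41).
    xi: 9-vector stacking the rotation, velocity and position uncertainties."""
    xi_wedge = np.zeros((5, 5))
    xi_wedge[:3, :3] = skew(xi[:3])
    xi_wedge[:3, 3] = xi[3:6]
    xi_wedge[:3, 4] = xi[6:9]
    theta = np.linalg.norm(xi[:3])
    if theta < 1e-9:
        a, b = 0.5, 1.0 / 6.0  # Taylor limits of a and b as theta -> 0
    else:
        a = (1.0 - np.cos(theta)) / theta**2
        b = (theta - np.sin(theta)) / theta**3
    return (np.eye(5) + xi_wedge + a * xi_wedge @ xi_wedge
            + b * xi_wedge @ xi_wedge @ xi_wedge)
```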