GP-BayesFilters: Bayesian Filtering Using Gaussian Process Prediction and Observation Models

Jonathan Ko and Dieter Fox
Dept. of Computer Science & Engineering, University of Washington, Seattle, WA

Abstract— Bayesian filtering is a general framework for recursively estimating the state of a dynamical system. The most common instantiations of Bayes filters are Kalman filters (extended and unscented) and particle filters. Key components of each Bayes filter are probabilistic prediction and observation models. Recently, Gaussian processes have been introduced as a non-parametric technique for learning such models from training data. In the context of unscented Kalman filters, these models have been shown to provide estimates that can be superior to those achieved with standard, parametric models. In this paper we show how Gaussian process models can be integrated into other Bayes filters, namely particle filters and extended Kalman filters. We provide a complexity analysis of these filters and evaluate the alternative techniques using data collected with an autonomous micro-blimp.

I. INTRODUCTION

Estimating the state of a dynamical system is a fundamental problem in robotics. The most successful techniques for state estimation are Bayesian filters such as particle filters or extended and unscented Kalman filters [13]. Bayes filters recursively estimate posterior probability distributions over the state of a system. The key components of a Bayes filter are the prediction and observation models, which probabilistically describe the temporal evolution of the process and the measurements returned by the sensors, respectively. Typically, these models are parametric descriptions of the involved processes [13]. However, parametric models are not always able to capture all aspects of a dynamical system.

To overcome the limitations of parametric models, researchers have recently introduced non-parametric, Gaussian process (GP) regression models [12] to learn prediction and observation models for dynamical systems. GPs have been applied successfully to the problem of learning predictive state models [3, 4, 9]. The fact that GP regression models provide uncertainty estimates for their predictions allows them to be readily incorporated into particle filters as observation models [2] or as improved proposal distributions [10]. Ko and colleagues introduced GP-UKFs, which combine GP prediction and observation models with unscented Kalman filters. Using data collected with a robotic blimp they demonstrated that GP-UKFs outperform parametric unscented Kalman filters and that the performance of GP-UKFs can be increased by combining GP models with parametric models [7].

In this paper we investigate the integration of Gaussian processes (GP) into different forms of Bayes filters. In addition to GP-UKFs, the previously introduced combination with unscented Kalman filters, we show how GP prediction and observation models can be combined with particle filters (GP-PF) and extended Kalman filters (GP-EKF). The development of GP-EKFs requires a linearization of GP regression models, which we derive in this paper. We furthermore perform a thorough comparison of the performance of the different filters based on simulation experiments and data collected by a robotic blimp.

This paper is organized as follows. After providing the necessary background on Bayesian filtering and Gaussian processes, we introduce the different instantiations of GP-BayesFilters in Section III. Section IV presents the experimental evaluation. We conclude in Section V.

II. BACKGROUND OF GP-BAYESFILTERS

Before we describe the generic GP-BayesFilter, let us discuss the basic concepts underlying Bayes filters and Gaussian processes.

A. Bayes Filters

Bayes filters recursively estimate posterior distributions over the state x_k of a dynamical system conditioned on all sensor information collected so far:

  p(x_k | z_{1:k}, u_{1:k-1}) ∝ p(z_k | x_k) ∫ p(x_k | x_{k-1}, u_{k-1}) p(x_{k-1} | z_{1:k-1}, u_{1:k-2}) dx_{k-1}   (1)

Here z_{1:k} and u_{1:k-1} are the histories of sensor measurements and controls obtained up to time k. The term p(x_k | x_{k-1}, u_{k-1}) is the prediction model, a probabilistic model of the system dynamics. p(z_k | x_k), the observation model, describes the likelihood of making an observation z_k given the state x_k. Typically, these models are parametric descriptions of the underlying processes, see [13] for several examples. In GP-BayesFilters, both prediction and observation models are learned from training data using non-parametric Gaussian process regression.
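To make the recursion in (1) concrete, the following minimal sketch implements one prediction/correction step for a discrete state space, where the integral becomes a sum. This is our own illustration with generic tabular models; it is not part of the system described in this paper.

import numpy as np

def bayes_filter_step(belief, u, z, trans, obs):
    """One step of the Bayes filter recursion (1) on a discrete state space.

    belief[i]      -- p(x_{k-1} = i | z_{1:k-1}, u_{1:k-2})
    trans[u][i, j] -- prediction model p(x_k = j | x_{k-1} = i, u_{k-1} = u)
    obs[j, z]      -- observation model p(z_k = z | x_k = j)
    """
    predicted = belief @ trans[u]          # prediction: sum over the previous state
    posterior = obs[:, z] * predicted      # correction: weight by observation likelihood
    return posterior / posterior.sum()     # normalize (the proportionality in (1))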
B. Gaussian Processes for Regression

Gaussian processes (GP) are a powerful, non-parametric tool for learning regression functions from sample data. Key advantages of GPs are their modeling flexibility, their ability to provide uncertainty estimates, and their ability to learn noise and smoothness parameters from training data [12].

A Gaussian process represents posterior distributions over functions based on training data. To see how this works, assume we have a set of training data, D = <X, y>, where X = [x_1, x_2, ..., x_n] is a matrix containing d-dimensional input examples x_i, and y = [y_1, y_2, ..., y_n] is a vector containing scalar training outputs y_i. A GP assumes that the data is drawn from the noisy process

  y_i = f(x_i) + ε,   (2)

where ε is zero-mean, additive Gaussian noise with variance σ_n^2. Conditioned on training data D = <X, y> and a test input x*, a GP defines a Gaussian predictive distribution over the output y* with mean

  GP_µ(x*, D) = k*^T [K + σ_n^2 I]^{-1} y   (3)

and variance

  GP_Σ(x*, D) = k(x*, x*) - k*^T [K + σ_n^2 I]^{-1} k*.   (4)

Here, k is the kernel function of the GP, k* is a vector defined by the kernel values between x* and the training inputs X, and K is the n × n kernel matrix of the training input values; that is, k*[i] = k(x*, x_i) and K[i, j] = k(x_i, x_j). Note that the prediction uncertainty, captured by the variance GP_Σ, depends on both the process noise and the correlation between the test input and the training data.

The choice of the kernel function depends on the application, the most widely used being the squared exponential, or Gaussian, kernel:

  k(x, x') = σ_f^2 e^{-1/2 (x - x') W (x - x')^T},   (5)

where σ_f^2 is the signal variance. The diagonal matrix W defines the smoothness of the process along the different input dimensions.

The GP parameters θ = [W, σ_f, σ_n], describing the kernel function (5) and the process noise (2), respectively, are called the hyperparameters of the Gaussian process. These hyperparameters can be learned by maximizing the log likelihood of the training data using numerical optimization techniques such as conjugate gradient descent [12].
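As a concrete illustration of (3), (4), and (5), the following sketch computes the GP predictive mean and variance for a single scalar output under the squared exponential kernel. It is our own simplified example and assumes the hyperparameters W, σ_f, and σ_n are already given rather than learned.

import numpy as np

def sq_exp_kernel(a, b, W, sigma_f):
    """Squared exponential kernel (5); W holds the diagonal entries of the matrix W."""
    d = a - b
    return sigma_f**2 * np.exp(-0.5 * d @ (W * d))

def gp_predict(x_star, X, y, W, sigma_f, sigma_n):
    """GP predictive mean (3) and variance (4) at test input x_star."""
    K = np.array([[sq_exp_kernel(xi, xj, W, sigma_f) for xj in X] for xi in X])
    k_star = np.array([sq_exp_kernel(x_star, xi, W, sigma_f) for xi in X])
    A = K + sigma_n**2 * np.eye(len(X))
    alpha = np.linalg.solve(A, y)           # [K + sigma_n^2 I]^{-1} y, cacheable
    mean = k_star @ alpha                   # GP_mu, eq. (3)
    var = sq_exp_kernel(x_star, x_star, W, sigma_f) \
          - k_star @ np.linalg.solve(A, k_star)   # GP_Sigma, eq. (4)
    return mean, var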
C. Learning Prediction and Observation Models with GPs

Gaussian process regression can be applied directly to the problem of learning the prediction and observation models required by the Bayes filter (1). The training data for each GP is a set of input-output relations. The prediction model maps the state and control, (x_k, u_k), to the state transition ∆x_k = x_{k+1} - x_k. The next state can then be found by adding the state transition to the previous state. The observation model maps from the state, x_k, to the observation, z_k. The appropriate form of the prediction and observation training data sets is thus

  D_p = <(X, U), X'>   (6)
  D_o = <X, Z>,   (7)

where X is a matrix containing ground truth states, and X' = [∆x_1, ∆x_2, ..., ∆x_k] is a matrix containing the transitions made from those states when applying the controls stored in U. Z is the matrix of observations made when in the corresponding states X.

The resulting GP prediction and observation models are then

  p(x_k | x_{k-1}, u_{k-1}) ≈ N(GP_µ([x_{k-1}, u_{k-1}], D_p), GP_Σ([x_{k-1}, u_{k-1}], D_p))   (8)

and

  p(z_k | x_k) ≈ N(GP_µ(x_k, D_o), GP_Σ(x_k, D_o)),   (9)

respectively. The reader may notice that while these models are Gaussians, both the means and variances are non-linear functions of the input and training data. Furthermore, the locally Gaussian nature of these models allows a very natural integration into different instantiations of Bayes filters, as we will describe in Section III.

GPs are typically defined for scalar outputs, and GP-BayesFilters represent models for vectorial outputs by learning a different GP for each output dimension. As a result, the noise covariances GP_Σ are diagonal matrices. Another issue to consider is that GPs assume a zero-mean prior over the outputs of the functions. A direct ramification of this assumption is that the GP predictions tend towards zero as the distance between the test input and the training data increases. In practice, this problem can be reduced by collecting sufficient training data covering possible states, controls, and observations, and by incorporating parametric models into the GP, as shown in [7].
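The following sketch shows how the training sets D_p and D_o from (6) and (7) could be assembled from a logged ground-truth trajectory. It is our own illustration; the array names are placeholders.

import numpy as np

def build_training_sets(states, controls, observations):
    """Assemble D_p = <(X, U), X'> from (6) and D_o = <X, Z> from (7).

    states:       (T, d)   ground-truth states x_1 ... x_T
    controls:     (T-1, c) controls u_1 ... u_{T-1}
    observations: (T, m)   observations z_1 ... z_T
    """
    X = states[:-1]                              # states at which the transitions start
    X_prime = states[1:] - states[:-1]           # transitions Delta x_k = x_{k+1} - x_k
    D_p = (np.hstack([X, controls]), X_prime)    # prediction model: (x_k, u_k) -> Delta x_k
    D_o = (states, observations)                 # observation model: x_k -> z_k
    return D_p, D_o

A separate GP is then trained on each output column of X_prime and of observations, which is why the noise covariances GP_Σ in (8) and (9) are diagonal.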
III. INSTANTIATIONS OF GP-BAYESFILTERS

We will now show how GP models can be incorporated into different instantiations of Bayes filters. Specifically, we present algorithms for GP integration into particle filters, extended Kalman filters, and unscented Kalman filters. For notation, we will stick close to the versions presented in [13].

A. GP-PF: Gaussian Process Particle Filters

Particle filters are sample-based implementations of Bayes filters. The key idea of particle filters is to represent posteriors over the state x_k by sets X_k of weighted samples:

  X_k = {<x_k^[m], w_k^[m]> | m = 1, ..., M}.

Here each x_k^[m] is a sample (or state), and each w_k^[m] is a non-negative numerical factor called an importance weight. Particle filters update posteriors according to a sampling procedure [13]. Table I shows how this procedure can be implemented with GP prediction and observation models. In Step 4, the state at time k is sampled based on the previous state x_{k-1}^[m] and control u_{k-1}, using the GP prediction model defined in (8). Here, GP([x_{k-1}^[m], u_{k-1}], D_p) is short for the Gaussian represented by N(GP_µ([x_{k-1}^[m], u_{k-1}], D_p), GP_Σ([x_{k-1}^[m], u_{k-1}], D_p)). Note that the variance of this prediction is typically different for each sample, taking the local density of training data into account. Importance sampling is implemented in Step 5, where each particle is weighted by the likelihood of the most recent measurement z_k given the sampled state x_k^[m]. This likelihood can be easily extracted from the GP observation model defined in (9). All other steps are identical to the generic particle filter algorithm, where Steps 8 through 11 implement the resampling step (see [13]).

TABLE I
THE GP-PF ALGORITHM

1:  Algorithm GP-PF(X_{k-1}, u_{k-1}, z_k):
2:    X̂_k = X_k = ∅
3:    for m = 1 to M do
4:      sample x_k^[m] ~ GP([x_{k-1}^[m], u_{k-1}], D_p)
5:      w_k^[m] = N(z_k; GP_µ(x_k^[m], D_o), GP_Σ(x_k^[m], D_o))
6:      X̂_k = X̂_k + <x_k^[m], w_k^[m]>
7:    endfor
8:    for m = 1 to M do
9:      draw i with probability ∝ w_k^[i]
10:     add x_k^[i] to X_k
11:   endfor
12:   return X_k
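A minimal sketch of one GP-PF update following Table I is given below. This is our own illustration; pred_mean, pred_var, obs_mean, and obs_var are placeholders for the learned GP models of (8) and (9), and the prediction GP is assumed to output the state change ∆x as described in Section II-C.

import numpy as np

def gp_pf_step(particles, u, z, pred_mean, pred_var, obs_mean, obs_var,
               rng=np.random.default_rng()):
    """One GP-PF update (Table I); particles is an (M, d) array of states."""
    M = particles.shape[0]
    inputs = np.hstack([particles, np.tile(u, (M, 1))])
    # Step 4: sample the state change from the GP prediction model (8).
    deltas = np.array([rng.normal(pred_mean(xu), np.sqrt(pred_var(xu))) for xu in inputs])
    pred = particles + deltas
    # Step 5: weight each particle by the observation likelihood (9);
    # GP_Sigma is diagonal, so the likelihood factorizes over output dimensions.
    w = np.array([np.prod(np.exp(-0.5 * (z - obs_mean(x))**2 / obs_var(x))
                          / np.sqrt(2 * np.pi * obs_var(x))) for x in pred])
    w /= w.sum()
    # Steps 8-11: importance resampling.
    return pred[rng.choice(M, size=M, p=w)]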

B. GP-EKF: Gaussian Process Extended Kalman Filters

In addition to the GP mean and covariance estimates used in the GP-PF, the incorporation of GP models into the extended Kalman filter requires a linearization of the GP prediction and observation functions. Fortunately, this linearization follows directly from the definition of (3). For each output dimension, the derivative of the GP mean function is

  ∂GP_µ(x*, D) / ∂x* = (∂k* / ∂x*)^T [K + σ_n^2 I]^{-1} y.   (10)

As noted above, k* is the vector of kernel values between the query input x* and the training inputs X. The partial derivatives of the kernel vector function form the matrix

  ∂k*/∂x* = [ ∂k(x*, x_1)/∂x*[1]  ...  ∂k(x*, x_1)/∂x*[d]
                      ...          ...          ...
              ∂k(x*, x_n)/∂x*[1]  ...  ∂k(x*, x_n)/∂x*[d] ],   (11)

where n is the number of training points and d is the dimensionality of the input space. The partial derivatives depend on the kernel function used. For the squared exponential kernel we get

  ∂k(x*, x)/∂x*[i] = -W_ii σ_f^2 (x*[i] - x[i]) e^{-1/2 (x*-x) W (x*-x)^T}.   (12)

(10) defines the d-dimensional Jacobian vector for a single output dimension. The full l × d Jacobian of a prediction or observation model is determined by stacking l Jacobian vectors together, one for each of the output dimensions.

We are now prepared to incorporate GP prediction and observation models into an EKF, as shown in Table II. Step 2 uses the GP prediction model (3) to generate the predicted mean µ̂_k. Step 3 sets the additive process noise, Q_k, which corresponds directly to the GP uncertainty. G_k, the linearization of the prediction model, is extracted from the GP via (10). Step 5 uses these matrices to compute the predictive uncertainty, Σ̂_k, in the same way as the standard EKF algorithm. Similarly, Steps 6 through 8 compute the predicted observation, ẑ_k, the noise covariance, R_k, and the linearization of the observation model, H_k, using the GP observation model. The remaining steps are identical to the standard EKF algorithm: Step 9 computes the Kalman gain, followed by the update of the mean and covariance estimates in Steps 10 and 11, respectively.

TABLE II
THE GP-EKF ALGORITHM

1:  Algorithm GP-EKF(µ_{k-1}, Σ_{k-1}, u_{k-1}, z_k):
2:    µ̂_k = GP_µ([µ_{k-1}, u_{k-1}], D_p)
3:    Q_k = GP_Σ([µ_{k-1}, u_{k-1}], D_p)
4:    G_k = ∂GP_µ([µ_{k-1}, u_{k-1}], D_p) / ∂x_{k-1}
5:    Σ̂_k = G_k Σ_{k-1} G_k^T + Q_k
6:    ẑ_k = GP_µ(µ̂_k, D_o)
7:    R_k = GP_Σ(µ̂_k, D_o)
8:    H_k = ∂GP_µ(µ̂_k, D_o) / ∂x_k
9:    K_k = Σ̂_k H_k^T (H_k Σ̂_k H_k^T + R_k)^{-1}
10:   µ_k = µ̂_k + K_k (z_k - ẑ_k)
11:   Σ_k = (I - K_k H_k) Σ̂_k
12:   return µ_k, Σ_k
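The linearization in (10), (11), and (12) can be evaluated from cached quantities. The following sketch computes one row of the Jacobian for a single output dimension under the squared exponential kernel; it is our own illustration, and alpha stands for the cached vector [K + σ_n^2 I]^{-1} y.

import numpy as np

def gp_mean_jacobian(x_star, X, alpha, W, sigma_f):
    """One row of the GP mean Jacobian, eqs. (10)-(12), for a single output dimension.

    X:     (n, d) training inputs
    alpha: (n,)   cached [K + sigma_n^2 I]^{-1} y
    W:     (d,)   diagonal entries of the length-scale matrix W
    """
    diff = x_star - X                                         # rows: x_star - x_i
    k_star = sigma_f**2 * np.exp(-0.5 * np.sum(diff * W * diff, axis=1))
    # Eq. (12): d k(x_star, x_i) / d x_star[j] = -W_jj (x_star[j] - x_i[j]) k(x_star, x_i)
    dk = -(diff * W) * k_star[:, None]                        # (n, d) matrix of eq. (11)
    return dk.T @ alpha                                       # eq. (10): d GP_mu / d x_star

Stacking one such row per output dimension gives the l × d Jacobians G_k and H_k used in Steps 4 and 8 of Table II.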
C. GP-UKF: Gaussian Process Unscented Kalman Filters

The GP-UKF algorithm in Table III is restated for completeness of exposition [7]. The key idea underlying unscented Kalman filters is to replace the linearization employed by the EKF by a more accurate linearization based on the unscented transform. To do so, the UKF generates a set of so-called sigma points based on the mean and covariance estimates of the previous time step. This set, generated in Step 2, contains 2d+1 points, where d is the dimensionality of the state space. Each of these points is then projected forward in time using the GP prediction model in Step 3. The process noise, computed in Step 4, is identical to the noise used in the GP-EKF algorithm. Steps 5 and 6 are identical to the standard unscented Kalman filter; they compute the predictive mean and covariance from the predicted sigma points. A new set of sigma points is extracted from this updated estimate in Step 7. The GP observation model is used in Step 8 to predict an observation for each of these points. The observation noise matrix, R_k, is set to the GP uncertainty in Step 9.

Steps 10 through 16 are standard UKF updates (see [13]). The mean observation is determined from the observation sigma points in Step 10, and Steps 11 and 12 compute the uncertainties and correlations used to determine the Kalman gain in Step 13. Finally, the next two steps update the mean and covariance estimates, which are returned in Step 16.

TABLE III
THE GP-UKF ALGORITHM

1:  Algorithm GP-UKF(µ_{k-1}, Σ_{k-1}, u_{k-1}, z_k):
2:    X_{k-1} = ( µ_{k-1}   µ_{k-1} + γ sqrt(Σ_{k-1})   µ_{k-1} - γ sqrt(Σ_{k-1}) )
3:    for i = 0 ... 2n:  X̄_k^[i] = GP_µ([X_{k-1}^[i], u_{k-1}], D_p)
4:    Q_k = GP_Σ([µ_{k-1}, u_{k-1}], D_p)
5:    µ̂_k = sum_{i=0}^{2n} w_m^[i] X̄_k^[i]
6:    Σ̂_k = sum_{i=0}^{2n} w_c^[i] (X̄_k^[i] - µ̂_k)(X̄_k^[i] - µ̂_k)^T + Q_k
7:    X̂_k = ( µ̂_k   µ̂_k + γ sqrt(Σ̂_k)   µ̂_k - γ sqrt(Σ̂_k) )
8:    for i = 0 ... 2n:  Ẑ_k^[i] = GP_µ(X̂_k^[i], D_o)
9:    R_k = GP_Σ(µ̂_k, D_o)
10:   ẑ_k = sum_{i=0}^{2n} w_m^[i] Ẑ_k^[i]
11:   S_k = sum_{i=0}^{2n} w_c^[i] (Ẑ_k^[i] - ẑ_k)(Ẑ_k^[i] - ẑ_k)^T + R_k
12:   Σ̂_k^{x,z} = sum_{i=0}^{2n} w_c^[i] (X̂_k^[i] - µ̂_k)(Ẑ_k^[i] - ẑ_k)^T
13:   K_k = Σ̂_k^{x,z} S_k^{-1}
14:   µ_k = µ̂_k + K_k (z_k - ẑ_k)
15:   Σ_k = Σ̂_k - K_k S_k K_k^T
16:   return µ_k, Σ_k

D. GP-BayesFilter Complexity Analysis

This section examines the runtime complexity of the different algorithms. In typical tracking applications, the number n of training points used for the GP is much higher than the dimensionality d of the state space. In these cases, the complexity of GP-BayesFilters is dominated by the GP operations, on which we will concentrate our analysis. Furthermore, since the ratio between prediction and observation evaluations is the same for all algorithms, we focus on the prediction step of each algorithm (the correction step is analogous).

Since GP-PFs need to perform one GP mean and variance computation per particle, the overall complexity of GP-PFs follows as

  C_pf = M (C_µ + C_Σ).   (13)

The GP-EKF algorithm requires one GP mean and variance computation plus one GP Taylor series expansion step:

  C_ekf = C_µ + C_Σ + C_tse   (14)

The GP-UKF algorithm requires one GP mean computation for each sigma point. One GP variance computation is also necessary per step. Since the UKF requires 2d+1 sigma points, the complexity follows as

  C_ukf = (2d + 1) C_µ + C_Σ.   (15)

To get a more accurate measure, we have to consider the complexity of the individual GP operations. The core of each GP mean prediction operation consists of a kernel evaluation (5) followed by a multiplication (we treat (12) as computationally equivalent to a kernel evaluation). We denote these costs as C_kern and C_mult, respectively. The computation of the mean function GP_µ(x*, D) defined in (3) requires n such evaluations, one for each combination of the query point x* and a training point x_i. In addition, this operation must be done for each output dimension since we use a separate GP to model each output dimension. We assume the number of output dimensions is equal to the dimensionality of the state space:

  C_µ = nd (C_kern + C_mult)   (16)

Note that the term [K + σ_n^2 I]^{-1} y in (3) and (10) is independent of x* and can be cached. The computation of the GP variance GP_Σ(x*, D) given in (4) requires nd additional multiplications given proper caching:

  C_Σ = nd C_mult,   (17)

and similarly, the linearization of the GP, given in (12), also requires nd multiplications:

  C_tse = nd C_mult.   (18)

The total costs in terms of kernel computations and multiplications are tabulated in Table IV.

TABLE IV
COMPUTATIONAL REQUIREMENTS

            C_kern        C_mult
  GP-UKF    nd(2d + 1)    nd(2d + 2)
  GP-EKF    nd            3nd
  GP-PF     Mnd           2Mnd

Unfortunately, there is no way to compare kernel operations directly to multiplications since various different kernels can be used. However, an analysis in these terms can still be useful. First, the number of particles is the key determinant of the complexity of the GP-PF. As can be seen from Table V, the GP-PF is not very competitive time-wise with the other filters. On the other hand, the GP-PF is the only algorithm which can perform global localization. The GP-UKF requires roughly 2d times more kernel computations than the GP-EKF. Since a kernel computation is generally much slower than a multiply operation, the cost of the kernel operations will dominate in these two algorithms.
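To make Steps 2 through 6 of Table III concrete, and to illustrate why the prediction step costs (2d+1) C_µ as in (15), the following sketch performs one GP-UKF prediction. It is our own illustration: pred_mean and pred_var are placeholders for the learned prediction model GP_µ and GP_Σ, the GP is assumed to predict the state change ∆x, and we use the classic unscented-transform weights with parameter κ (one of several possible parameterizations).

import numpy as np

def gp_ukf_predict(mu, Sigma, u, pred_mean, pred_var, kappa=1.0):
    """GP-UKF prediction, Steps 2-6 of Table III, for a d-dimensional state."""
    d = mu.shape[0]
    gamma = np.sqrt(d + kappa)                        # sigma-point spread, gamma in Table III
    # Step 2: 2d+1 sigma points from the previous estimate.
    L = np.linalg.cholesky(Sigma)                     # matrix square root of Sigma
    sigma_pts = np.vstack([mu, mu + gamma * L.T, mu - gamma * L.T])
    w = np.full(2 * d + 1, 1.0 / (2.0 * (d + kappa)))
    w[0] = kappa / (d + kappa)                        # same weights used for mean and covariance
    # Step 3: propagate every sigma point through the GP prediction model
    # (2d+1 GP mean evaluations, the (2d+1) C_mu term of eq. (15)).
    X_bar = np.array([x + pred_mean(np.hstack([x, u])) for x in sigma_pts])
    # Step 4: process noise Q_k from the GP uncertainty at the previous mean.
    Q = np.diag(pred_var(np.hstack([mu, u])))
    # Steps 5-6: predicted mean and covariance.
    mu_hat = w @ X_bar
    diff = X_bar - mu_hat
    Sigma_hat = (w * diff.T) @ diff + Q
    return mu_hat, Sigma_hat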
IV. EXPERIMENTS

A. Robotic Blimp Experiments

1) Testbed: The experimental testbed for evaluating the GP-Bayes filters is a robotic micro-blimp. A custom-built gondola is suspended beneath a 5.5 foot (1.7 meter) long envelope. The gondola houses two main fans that pivot together to provide thrust in the longitudinal (forwards-up) plane. A third motor located in the tail provides thrust to yaw the blimp about the body-fixed Z-axis. There are a total of three control inputs: the power of the gondola fans, the angle of the gondola fans, and the power of the tail fan.

Observations for the filters come from two network cameras mounted in the laboratory. Observations are formed by background subtraction followed by fitting an ellipse to the remaining pixels. The observations are then the parameters of the ellipse in image space.

A motion capture system is used to capture the ground truth positions of the blimp as it flies. The VICON motion capture (MOCAP) system tracks reflective markers attached to the blimp as 3D points in space. This data is used to train the GPs and parametric motion models, and to evaluate the tracking performance of the various filtering algorithms. The testbed software is written in a combination of C++ and MATLAB. Gaussian process code is from Neil Lawrence [8]. All experiments were performed on an Intel Xeon running at 3.40 GHz.

2) Parametric Prediction and Observation Models: The parametric motion model appropriate for the blimp was described in detail in [7]. The state of the blimp consists of position p, orientation ξ parameterized by Euler angles, linear velocity v, and angular velocity ω. The parametric motion model takes into account forces including gravity, buoyancy, thrust, and drag. This model is discretized in time in order to produce ĝ, which predicts the next state given the current state and control input.

The parametric observation model takes as input the six-dimensional pose of the blimp within the camera's coordinate system. It outputs the parameters of an ellipse in the camera's image space. The ellipse parameters are generated based on a shape model of the blimp.

3) Tracking Results: Experimental results were obtained via 4-fold cross-validation. The data was broken up into 4 equally sized sections covering approximately 5 minutes of blimp flight each. The algorithms were tested on each section with the remaining sections used for learning the GP models and the parameters of the physics-based models. The GP motion models were optimized using approximately 900 training points while the GP observation model used 800 points. The particle filters were run with 2000 particles.

TABLE V
BLIMP TRACKING QUALITY

  Tracking algorithm   MLL        p (mm)    ξ (deg)    v (mm/s)   ω (deg/s)   time (s)
  GP-UKF               14.9±0.5    89±1.3    4.7±0.2    50±0.4     4.5±0.1     1.28±0.3
  GP-EKF               13.0±0.2    93±1.4    5.2±0.1    52±0.5     4.6±0.1     0.29±0.1
  GP-PF                 9.4±1.9    91±7.5    6.4±1.6    52±3.7     5.0±0.2    449.4±21
  paraUKF              10.1±0.6   111±3.8    7.9±0.1    64±1.2     7.6±0.1     0.33±0.1
  paraEKF               8.4±1.0   112±3.8    8.0±0.2    65±1.6     7.5±0.2     0.21±0.1
  paraPF               -4.5±4.2   115±4.5   10.5±1.7    73±5.4     9.4±0.3    30.7±5.8

Table V shows the tracking accuracy of the various filters. Tracking accuracy is shown separately for position, rotation, velocity, and rotational velocity. In addition, the mean log likelihood (MLL) of the ground truth given the state estimate and the time per filter update are shown.

The three GP-Bayes filters have better accuracy in general than their parametric Bayes filter counterparts. GP-UKF outperforms GP-EKF by a small margin; however, the GP-EKF is more than four times faster than the GP-UKF. The GP-PF has surprisingly poor accuracy. This is likely due to an insufficient number of particles for the size of the state space.

4) Training Data Sparsity: The next experiment tests the tracking accuracy of the GP-EKF vs. the GP-UKF as the number of training points is reduced. For each level of data sparseness, we perform cross validation by selecting 16 sets containing the corresponding number of training points. These sets were then tested on hold-out data, just as in the four-fold cross validation structure from the previous experiment. Results for this experiment are shown in Fig. 1 using two measures. The first is the percentage of runs that failed to accurately track the blimp. A run is considered a failure if the MLL is less than -20, since such runs have completely lost track of the blimp. The other measure is the MLL of the remaining runs. There is a fairly smooth degradation of accuracy for both methods. The tracking only experiences a drastic reduction in accuracy with less than 15% of the total training points (≈ 130). This result also shows GP-UKF to be more accurate and more robust than GP-EKF at all levels of training data.

Fig. 1. GP-UKF shows better accuracy and robustness compared to GP-EKF as the number of training points for the GP is reduced. (The plots show the percentage of failed runs and the mean log likelihood (MLL) as a function of the percentage of total training data.)

B. Synthetic Experiment

The previous experiment showed that GP-UKFs and GP-EKFs have very similar behavior for sparse training data. Here, we demonstrate the somewhat surprising result that GP-EKFs can handle certain sparseness conditions better than GP-UKFs. The filters use range observations to known landmarks to track a robot moving in a square circuit.
The robot uses the following motion model:

  p(t + 1) = p(t) + v(t) + ε_p   (19)
  v(t + 1) = v(t) + u(t) + ε_v,   (20)

where p denotes the position and v the velocity of the robot, and u the control input. ε_p and ε_v are zero-mean Gaussian noise terms for position and velocity, respectively. The robot receives as an observation the range to one of the landmarks. Observations are made for each landmark in a cyclic fashion.
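A minimal simulation of the system defined by (19) and (20), with range observations made to the landmarks in a cyclic fashion, could look as follows. This is our own illustration; the control sequence, landmark layout, and noise magnitudes are placeholders, not the values used in our experiments.

import numpy as np

def simulate(T, landmarks, sigma_p=0.1, sigma_v=0.05, sigma_r=0.5, seed=0):
    """Simulate the planar robot of (19)-(20) with cyclic range observations."""
    rng = np.random.default_rng(seed)
    p, v = np.zeros(2), np.zeros(2)
    states, ranges = [], []
    for t in range(T):
        # Placeholder control that steers the robot around a loop.
        u = 0.1 * np.array([np.cos(2 * np.pi * t / T), np.sin(2 * np.pi * t / T)])
        p = p + v + sigma_p * rng.standard_normal(2)          # eq. (19)
        v = v + u + sigma_v * rng.standard_normal(2)          # eq. (20)
        lm = landmarks[t % len(landmarks)]                    # landmarks observed cyclically
        ranges.append(np.linalg.norm(p - lm) + sigma_r * rng.standard_normal())
        states.append(np.concatenate([p, v]))
    return np.array(states), np.array(ranges)

States and ranges from such a training run provide the transition and observation examples for D_p and D_o.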

The training data for the GPs is obtained from a training run of the robot. The GPs for the motion model learn the change in position and velocity from one time step to the next. This is similar to the modeling of the blimp dynamics. The observation model uses as training data range information and positions from the robot during the training run. That is, observation training data only comes from locations visited during the training run.

Fig. 2. The robot is indicated by the square and the landmarks by the circles. The dashed line represents the trajectory of the robot. The (green/grey) shaded region indicates the density of training data. The five crosses indicate sigma points of the GP-UKF during a tracking run. Notice that the four outer points are in areas of low training data density.

TABLE VI
TRACKING ERROR FOR SYNTHETIC TEST

             p           v
  GP-UKF     74.6±0.34   8.0±0.003
  GP-EKF     48.2±0.08   7.2±0.005
  GP-PF      43.8±0.07   6.0±0.003

As can be seen in Table VI, the tracking performance of GP-UKF is significantly worse than that of GP-EKF and GP-PF. This is a result of the sigma points being spread out around the mean prediction of the filter. In this experiment, however, the training data only covers a small envelope around the robot trajectory. Therefore, it can happen that sigma points fall outside the training data and thus receive poor GP estimates (see Fig. 2). The GP-EKF does not suffer from this problem since all GP predictions are made from the mean point, which is well within the coverage of the training data. GP-PFs handle such problems by assigning very low weights to samples with inaccurate GP models.

This experiment indicates that, when using GP-UKFs, one must be careful to select training points that ensure broad enough coverage of the operational space of the system.

V. CONCLUSIONS AND FUTURE WORK

Recently, several researchers have demonstrated that Gaussian process regression models are well suited as components of Bayesian filters. GPs are non-parametric models that can be learned from training data and that provide uncertainty estimates which take both sensor noise and model uncertainty due to data sparsity into account. In this paper, we introduced GP-BayesFilters, the integration of Gaussian process prediction and observation models into generic Bayesian filtering techniques. In addition to the existing GP-UKF algorithm, we developed GP-PFs, which combine GP models with particle filtering. Furthermore, we showed how GP models can be linearized and incorporated into extended Kalman filters. In addition to developing the algorithms, we provide a complexity analysis of the different instances of GP-BayesFilters.

In our experiments, all versions of GP-BayesFilters outperform their parametric counterparts. Typically, GP-UKFs perform slightly better than GP-EKFs, at the cost of higher computational complexity. However, one experiment demonstrates that GP-EKFs can outperform GP-UKFs when training data is sparse and thus does not cover all sigma points generated during tracking.

The additional operations needed for GP models can result in prohibitive complexity, especially in the case of particle filters. We thus conjecture that the best applications for GP-BayesFilters are those where computational complexity is not crucial, or where accuracy is very important. When efficiency is important, the use of sparse GP models can greatly reduce the computational complexity of GP regression [12]. Furthermore, as shown in [6], Gaussian process models can be combined with parametric models, resulting in further improved accuracy.

ACKNOWLEDGMENTS

This work was supported in part by DARPA's ASSIST Program (contract number NBCH-C-05-0137).

REFERENCES

[1] A. Doucet, N. de Freitas, and N. Gordon, editors. Sequential Monte Carlo in Practice. Springer-Verlag, New York, 2001.
[2] B. Ferris, D. Hähnel, and D. Fox. Gaussian processes for signal strength-based location estimation. In Proc. of Robotics: Science and Systems, 2006.
[3] A. Girard, C. Rasmussen, J. Quiñonero-Candela, and R. Murray-Smith. Gaussian process priors with uncertain inputs – application to multiple-step ahead forecasting. In Advances in Neural Information Processing Systems 15 (NIPS), 2005.
[4] D. Grimes, R. Chalodhorn, and R. Rao. Dynamic imitation in a humanoid robot through nonparametric probabilistic inference. In Proc. of Robotics: Science and Systems, 2006.
[5] K. Kersting, C. Plagemann, P. Pfaff, and W. Burgard. Most likely heteroscedastic Gaussian process regression. In Proc. of the International Conference on Machine Learning (ICML), 2007.
[6] J. Ko, D. Klein, D. Fox, and D. Hähnel. GP-UKF: Unscented Kalman filters with Gaussian process prediction and observation models. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2007.
[7] J. Ko, D.J. Klein, D. Fox, and D. Hähnel. Gaussian processes and reinforcement learning for identification and control of an autonomous blimp. In Proc. of the IEEE International Conference on Robotics & Automation (ICRA), 2007.
[8] N.D. Lawrence. http://www.dcs.shef.ac.uk/~neil/fgplvm/.
[9] K. Liu, A. Hertzmann, and Z. Popovic. Learning physics-based motion style with nonlinear inverse optimization. In ACM Transactions on Graphics (Proc. of SIGGRAPH), 2005.
[10] C. Plagemann, D. Fox, and W. Burgard. Efficient failure detection on mobile robots using Gaussian process proposals. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), 2007.
[11] V. Quoc, A. Smola, and S. Canu. Heteroscedastic Gaussian process regression. In Proc. of the International Conference on Machine Learning (ICML), 2005.
[12] C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. The MIT Press, 2005.
[13] S. Thrun, W. Burgard, and D. Fox. Probabilistic Robotics. MIT Press, Cambridge, MA, September 2005. ISBN 0-262-20162-3.