Nonparametric Density Estimation for Learning Noise Distributions in Mobile Robotics
David M. Rosen and John J. Leonard

Abstract—By admitting an explicit representation of the uncertainty associated with sensing and actuation in the real world, the probabilistic robotics paradigm has led to remarkable improvements in the robustness and performance of autonomous systems. However, this approach generally requires that the sensing and actuation noise acting upon an autonomous system be specified in the form of probability density functions, thus necessitating that these densities themselves be estimated. To that end, in this paper we present a general framework for directly estimating the probability density function of a noise distribution given only a set of observations sampled from that distribution. Our approach is based upon Dirichlet process mixture models (DPMMs), a class of infinite mixture models commonly used in Bayesian nonparametric statistics. These models are very expressive and enjoy good posterior consistency properties when used in density estimation, but the density estimates that they produce are often too computationally complex to be used in mobile robotics applications, where computational resources are limited and speed is crucial. We derive a straightforward yet principled approximation method for simplifying the densities learned by DPMMs in order to produce computationally tractable density estimates. When implemented using the infinite Gaussian mixture model (a specific class of DPMMs that is particularly well-suited for mobile robotics applications), our approach is capable of approximating any continuous distribution on R^n arbitrarily well in total variation distance, and produces C^1, bounded, everywhere positive, and efficiently computable density estimates suitable for use in real-time inference algorithms on mobile robotic platforms.

[Figure 1: "Histogram of observations" — a histogram of the simulated noise samples (x-axis: Observation, y-axis: Count).]

Fig. 1. A complicated noise distribution: This figure shows a histogram of 3000 observations sampled from a complicated noise distribution in simulation (cf. equations (37)–(40)). The a priori identification of a parametric family capable of accurately characterizing complex or poorly-understood noise distributions such as the one shown here is a challenging problem, which renders parametric density estimation approaches vulnerable to model misspecification. Nonparametric approaches overcome this difficulty by inferring an appropriate model directly from the data.

I. INTRODUCTION

By admitting an explicit representation of the uncertainty associated with sensing and actuation in a complex, noisy, and ever-changing world, the widespread adoption of the probabilistic robotics paradigm has led to remarkable improvements in the robustness and performance of autonomous systems. However, this approach generally requires [22] that the sensing and actuation noise acting upon an autonomous system be specified in the form of probability density functions, thus necessitating that these densities themselves be estimated.

In the context of mobile robotics, this density estimation problem is commonly addressed by assuming a priori that the true density is a member of a particular parametric family p(·|θ) (e.g. Gaussian, gamma, exponential, etc.), and then attempting to estimate the parameter θ that best captures the observed noise characteristics. More advanced approaches (e.g. the Bayes [19] or Akaike [1] Information Criteria) allow joint model selection and parameter estimation from amongst a (possibly very large) finite set of candidate parametric families [5]. However, all of these techniques depend upon the practitioner's identification of a set of candidate parametric models containing at least one member that is able to capture the true noise characteristics well. It is far from clear that such a suitable a priori model selection will always be possible, especially in cases where the physical noise-generating process is complex or not well understood (cf. Fig. 1). This renders existing parametric density estimation approaches vulnerable to model misspecification, which can severely degrade the performance of autonomous systems that rely upon the resulting density estimates.
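To make this failure mode concrete, the following minimal sketch (ours, not from the paper) fits a single Gaussian by maximum likelihood to samples from a bimodal mixture standing in for the kind of distribution shown in Fig. 1; the particular mixture, seed, and use of NumPy/SciPy are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Observations from a bimodal noise distribution (a stand-in for Fig. 1's data).
samples = np.concatenate([rng.normal(-5.0, 1.0, 1500),
                          rng.normal(10.0, 2.0, 1500)])

# Parametric approach: assume p(. | theta) is Gaussian and fit theta = (mu, sigma)
# by maximum likelihood (sample mean and standard deviation).
mu, sigma = samples.mean(), samples.std()

# The fitted model puts substantial probability mass around mu ~ 2.5, a region
# containing almost no observations -- a model-misspecification failure.
print(f"fitted N({mu:.2f}, {sigma:.2f}^2)")
print("density at fitted mean:", norm.pdf(mu, mu, sigma))
print("density at true modes:", norm.pdf([-5.0, 10.0], mu, sigma))
```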
In this paper, we present a method for performing nonparametric density estimation suitable for learning noise distributions in mobile robotics applications. Our approach is based upon Dirichlet process mixture models (DPMMs), a class of infinite mixture models commonly used in Bayesian nonparametric statistics [8]. These models are very expressive and enjoy good posterior consistency properties when used in density estimation [12], [25], but the density estimates that they produce are often too complex to admit real-time inference using the limited computational resources available on mobile robotic platforms. We derive a straightforward yet principled approximation method for simplifying the densities learned by DPMMs in order to produce computationally tractable density estimates. When implemented using the infinite Gaussian mixture model (a specific class of DPMMs that is particularly well-suited for mobile robotics applications), our approach is capable of approximating any continuous distribution on R^n arbitrarily well in total variation distance, and produces C^1, bounded, everywhere positive, and efficiently computable density estimates suitable for use in real-time inference algorithms on mobile robotic platforms.

(The authors are with the Massachusetts Institute of Technology, Cambridge, MA 02139, USA; {dmrosen, [email protected])

II. THE DIRICHLET PROCESS

The Dirichlet process is a stochastic process whose samples are themselves random distributions. It was introduced by Ferguson in [8], who proposed its use in Bayesian nonparametric statistics as a prior over the set of all probability distributions on a given sample space. In this section, we review its formal definition and some of the properties that make it an attractive choice as a prior in Bayesian nonparametric applications.

A. Formal definition

Let Ω be a sample space, G_0 a probability measure on Ω, and α ∈ R^+ a positive scalar. A random probability distribution G on Ω is distributed according to the Dirichlet process with base distribution G_0 and concentration parameter α, denoted G ∼ DP(α, G_0), if

$$(G(A_1), \dots, G(A_k)) \sim \mathrm{Dir}\bigl(\alpha G_0(A_1), \dots, \alpha G_0(A_k)\bigr) \qquad (1)$$

for every finite measurable partition {A_i}_{i=1}^k of Ω. In other words, we require that the (random) probabilities assigned by G to the elements of the partition {A_i}_{i=1}^k be Dirichlet distributed.

B. Useful properties of the Dirichlet process

In this subsection we collect some useful properties of the Dirichlet process that make it an attractive choice as a prior in Bayesian nonparametric applications. Due to space constraints we present these results without proof; interested readers are encouraged to consult the excellent introductory articles [11], [20] for a more detailed exposition.

For the remainder of this section, we let Ω be a sample space, G_0 a probability measure on Ω, α ∈ R^+, G ∼ DP(α, G_0), and θ_1, …, θ_n ∼ G.

1) Expectation and variance: Letting A ⊆ Ω be a measurable subset and considering the partition {A, A^c}, we have from (1) that

$$G(A) \sim \mathrm{Beta}\bigl(\alpha G_0(A), \ \alpha (1 - G_0(A))\bigr) \qquad (2)$$

and therefore that

$$\mathbb{E}[G(A)] = G_0(A), \qquad \mathrm{Var}[G(A)] = \frac{G_0(A)\,(1 - G_0(A))}{1 + \alpha}. \qquad (3)$$
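Since (2) identifies the marginal law of G(A) as a Beta distribution, the moments in (3) are easy to confirm numerically. A minimal sketch (ours; the values of α and G_0(A) are arbitrary illustrative choices, and NumPy is assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

alpha = 5.0   # concentration parameter
G0_A = 0.3    # base-measure probability G0(A) of some fixed measurable set A

# By (2), G(A) ~ Beta(alpha * G0(A), alpha * (1 - G0(A))) when G ~ DP(alpha, G0).
draws = rng.beta(alpha * G0_A, alpha * (1.0 - G0_A), size=200_000)

print("empirical mean:", draws.mean(), " vs  G0(A) =", G0_A)
print("empirical var: ", draws.var(),
      " vs  G0(A)(1 - G0(A))/(1 + alpha) =", G0_A * (1 - G0_A) / (1 + alpha))
```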
2) Posterior belief: The posterior belief for G | θ_1, …, θ_n is again Dirichlet-process distributed:

$$G \mid \theta_1, \dots, \theta_n \sim \mathrm{DP}\!\left(\alpha + n,\ \frac{\alpha}{\alpha + n} G_0 + \frac{n}{\alpha + n} \cdot \frac{1}{n} \sum_{i=1}^{n} \delta_{\theta_i}\right) \qquad (5)$$

where δ_{θ_i} is the measure assigning unit mass to the point θ_i. In other words, the Dirichlet process provides a class of priors over probability distributions G that is closed under posterior updates given observations θ_1, …, θ_n sampled from G.

3) Posterior predictive distribution: Given the observations θ_1, …, θ_n, we can obtain the predictive distribution θ_{n+1} | θ_1, …, θ_n for a subsequent sample θ_{n+1} drawn from G by observing that θ_{n+1} | G, θ_1, …, θ_n ∼ G and then marginalizing over G using the posterior distribution for G | θ_1, …, θ_n obtained in (5). This gives the posterior predictive distribution:

$$\theta_{n+1} \mid \theta_1, \dots, \theta_n \sim \frac{\alpha}{\alpha + n} G_0 + \frac{n}{\alpha + n} \cdot \frac{1}{n} \sum_{i=1}^{n} \delta_{\theta_i}. \qquad (6)$$

In other words, the posterior base distribution in (5) is also the predictive distribution for θ_{n+1} given θ_1, …, θ_n.

4) Sample clustering and the Chinese restaurant process: Equation (6) implies an important clustering property for samples drawn from a random distribution G with a Dirichlet process prior. By the chain rule of probability, we can realize a sequence of draws θ_1, …, θ_n from G by sequentially sampling each θ_i according to

$$\theta_i \sim \theta_i \mid \theta_1, \dots, \theta_{i-1}, \qquad 1 \le i \le n, \qquad (7)$$

where the right-hand side of equation (7) is the predictive distribution given in (6). Now, observe that the presence of the distributions δ_{θ_j} centered on the previous draws θ_j in the predictive distribution implies that p(θ_i = θ_j | θ_1, …, θ_{i−1}) > 0 for all 1 ≤ j < i ≤ n; i.e., that previously sampled values will repeat with positive probability.

More precisely, consider drawing the sample θ_i from the conditional distribution θ_i | θ_1, …, θ_{i−1} at timestep i. Letting θ*_1, …, θ*_m denote the unique values within the set {θ_1, …, θ_{i−1}} of previously drawn samples and n*_1, …, n*_m denote their corresponding empirical counts, we have from (6) that θ_i repeats the value θ*_j with probability

$$p(\theta_i = \theta_j^* \mid \theta_1, \dots, \theta_{i-1}) = \frac{n_j^*}{\alpha + i - 1}, \qquad (8)$$

and otherwise draws a fresh value from the base distribution G_0, which by (6) occurs with probability α/(α + i − 1).
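Equations (6)–(8) translate directly into a sequential sampler: at step i, θ_i repeats a previously seen value θ*_j with probability n*_j/(α + i − 1) and is otherwise drawn fresh from G_0. A minimal sketch (ours; the standard-normal base distribution, α = 2, and n = 3000 are illustrative choices, and NumPy is assumed):

```python
import numpy as np

def dp_sequential_draws(n, alpha, sample_G0, rng):
    """Realize theta_1, ..., theta_n from G ~ DP(alpha, G0) via the
    posterior predictive (6), without ever representing G explicitly."""
    draws = []
    for i in range(1, n + 1):
        # With probability alpha / (alpha + i - 1), draw a new value from G0;
        # otherwise repeat a previous draw chosen uniformly at random, which
        # reproduces the repeat probabilities n_j* / (alpha + i - 1) of (8).
        if rng.random() < alpha / (alpha + i - 1):
            draws.append(sample_G0(rng))
        else:
            draws.append(draws[rng.integers(i - 1)])
    return np.array(draws)

rng = np.random.default_rng(2)
theta = dp_sequential_draws(3000, alpha=2.0,
                            sample_G0=lambda r: r.normal(), rng=rng)
print("distinct values among 3000 draws:", np.unique(theta).size)
```

Summing the new-value probabilities α/(α + i − 1) over i shows that the expected number of distinct values grows only logarithmically, roughly α log(n/α) (about 15 here), so the vast majority of the 3000 draws are repeats; this is precisely the clustering behavior that mixture models built on the Dirichlet process exploit.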