Bayesian Filtering: From Kalman Filters to Particle Filters, and Beyond
ZHE CHEN

MANUSCRIPT

Abstract— In this self-contained survey/review paper, we systematically investigate the roots of Bayesian filtering as well as its rich leaves in the literature. Stochastic filtering theory is briefly reviewed with emphasis on nonlinear and non-Gaussian filtering. Following Bayesian statistics, different Bayesian filtering techniques are developed for different scenarios. Under the linear quadratic Gaussian setting, the celebrated Kalman filter can be derived within the Bayesian framework. Optimal and suboptimal nonlinear filtering techniques are extensively investigated. In particular, we focus our attention on the Bayesian filtering approach based on sequential Monte Carlo sampling, the so-called particle filters. Many variants of the particle filter, as well as their features (strengths and weaknesses), are discussed. Related theoretical and practical issues are addressed in detail. In addition, some other (new) directions in Bayesian filtering are explored.

Index Terms— Stochastic filtering, Bayesian filtering, Bayesian inference, particle filter, sequential Monte Carlo, sequential state estimation, Monte Carlo methods.

"The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening."
— Thomas Bayes (1702-1761), [29]

"Statistics is the art of never having to say you're wrong. Variance is what any two statisticians are at."
— C. J. Bradfield

Contents
I Introduction
  I-A Stochastic Filtering Theory
  I-B Bayesian Theory and Bayesian Filtering
  I-C Monte Carlo Methods and Monte Carlo Filtering
  I-D Outline of Paper
II Mathematical Preliminaries and Problem Formulation
  II-A Preliminaries
  II-B Notations
  II-C Stochastic Filtering Problem
  II-D Nonlinear Stochastic Filtering Is an Ill-posed Inverse Problem
    II-D.1 Inverse Problem
    II-D.2 Differential Operator and Integral Equation
    II-D.3 Relations to Other Problems
  II-E Stochastic Differential Equations and Filtering
III Bayesian Statistics and Bayesian Estimation
  III-A Bayesian Statistics
  III-B Recursive Bayesian Estimation
IV Bayesian Optimal Filtering
  IV-A Optimal Filtering
  IV-B Kalman Filtering
  IV-C Optimum Nonlinear Filtering
    IV-C.1 Finite-dimensional Filters
V Numerical Approximation Methods
  V-A Gaussian/Laplace Approximation
  V-B Iterative Quadrature
  V-C Multigrid Method and Point-Mass Approximation
  V-D Moment Approximation
  V-E Gaussian Sum Approximation
  V-F Deterministic Sampling Approximation
  V-G Monte Carlo Sampling Approximation
    V-G.1 Importance Sampling
    V-G.2 Rejection Sampling
    V-G.3 Sequential Importance Sampling
    V-G.4 Sampling-Importance Resampling
    V-G.5 Stratified Sampling
    V-G.6 Markov Chain Monte Carlo
    V-G.7 Hybrid Monte Carlo
    V-G.8 Quasi-Monte Carlo
VI Sequential Monte Carlo Estimation: Particle Filters
  VI-A Sequential Importance Sampling (SIS) Filter
  VI-B Bootstrap/SIR Filter
  VI-C Improved SIS/SIR Filters
  VI-D Auxiliary Particle Filter
  VI-E Rejection Particle Filter
  VI-F Rao-Blackwellization
  VI-G Kernel Smoothing and Regularization
  VI-H Data Augmentation
    VI-H.1 Data Augmentation is an Iterative Kernel Smoothing Process
    VI-H.2 Data Augmentation as a Bayesian Sampling Method
  VI-I MCMC Particle Filter
  VI-J Mixture Kalman Filters
  VI-K Mixture Particle Filters
  VI-L Other Monte Carlo Filters
  VI-M Choices of Proposal Distribution
    VI-M.1 Prior Distribution
    VI-M.2 Annealed Prior Distribution
    VI-M.3 Likelihood
    VI-M.4 Bridging Density and Partitioned Sampling
    VI-M.5 Gradient-Based Transition Density
    VI-M.6 EKF as Proposal Distribution
    VI-M.7 Unscented Particle Filter
  VI-N Bayesian Smoothing
    VI-N.1 Fixed-point smoothing
    VI-N.2 Fixed-lag smoothing
    VI-N.3 Fixed-interval smoothing
  VI-O Likelihood Estimate
  VI-P Theoretical and Practical Issues
    VI-P.1 Convergence and Asymptotic Results
    VI-P.2 Bias-Variance
    VI-P.3 Robustness
    VI-P.4 Adaptive Procedure
    VI-P.5 Evaluation and Implementation
VII Other Forms of Bayesian Filtering and Inference
  VII-A Conjugate Analysis Approach
  VII-B Differential Geometrical Approach
  VII-C Interacting Multiple Models
  VII-D Bayesian Kernel Approaches
  VII-E Dynamic Bayesian Networks
VIII Selected Applications
  VIII-A Target Tracking
  VIII-B Computer Vision and Robotics
  VIII-C Digital Communications
  VIII-D Speech Enhancement and Speech Recognition
  VIII-E Machine Learning
  VIII-F Others
  VIII-G An Illustrative Example: Robot-Arm Problem
IX Discussion and Critique
  IX-A Parameter Estimation
  IX-B Joint Estimation and Dual Estimation
  IX-C Prior
  IX-D Localization Methods
  IX-E Dimensionality Reduction and Projection
  IX-F Unanswered Questions
X Summary and Concluding Remarks

The work is supported by the Natural Sciences and Engineering Research Council of Canada. Z. Chen was also partially supported by the Clifton W. Sherman Scholarship. The author is with the Communications Research Laboratory, McMaster University, Hamilton, Ontario, Canada L8S 4K1, e-mail: [email protected], Tel: (905)525-9140 x27282, Fax: (905)521-2922.

I. Introduction

The contents of this paper span three major scientific areas: stochastic filtering theory, Bayesian theory, and Monte Carlo methods. All three are discussed in relation to the subject of our interest: Bayesian filtering. In the course of telling this long story, some relevant theories are briefly reviewed to give the reader a complete picture. Mathematical preliminaries and background material are also provided in detail so that the paper is self-contained.

A. Stochastic Filtering Theory

Stochastic filtering theory was first established in the early 1940s through the pioneering work of Norbert Wiener [487], [488] and Andrey N. Kolmogorov [264], [265], and it culminated in 1960 with the publication of the classic Kalman filter (KF) [250] (and the subsequent Kalman-Bucy filter in 1961 [249]), though much credit is also due to earlier work by Bode and Shannon [46], Zadeh and […] its line have been proposed and developed to overcome its limitation.

B. Bayesian Theory and Bayesian Filtering

Bayesian theory was originally discovered by the British researcher Thomas Bayes in a posthumous publication in 1763 [29]. The well-known Bayes theorem describes the fundamental probability law governing the process of logical inference. However, Bayesian theory did not gain the attention it deserved in the early days, until its modern form was rediscovered by the French mathematician Pierre-Simon de Laplace in Théorie analytique des probabilités. Bayesian inference [38], [388], [375], devoted to applying Bayesian statistics to statistical inference, has become one of the important branches of statistics and has been applied successfully in statistical decision, detection and estimation, pattern recognition, and machine learning. In particular, the November 19, 1999 issue of Science magazine gave the Bayesian research boom four pages of special attention [320]. In many scenarios, the solutions obtained through Bayesian inference are viewed as "optimal". Not surprisingly, Bayesian theory was also studied in the filtering literature. One of the first explorations of iterative Bayesian estimation is found in Ho and Lee's paper [212], in which they specified the principle and procedure of Bayesian filtering. Spragins [426] discussed the iterative application of Bayes' rule to sequential parameter estimation and called it "Bayesian learning". Lin and Yau [301] and Chien and Fu [92] discussed the Bayesian approach to the optimization of adaptive systems. Bucy [62] and Bucy and Senne [63] also explored the point-mass approximation method in the Bayesian filtering framework.

C. Monte Carlo Methods and Monte Carlo Filtering

The early idea of Monte Carlo can be traced back to the problem of Buffon's needle, when Buffon attempted in 1777 to estimate π (see, e.g., [419]). But the modern formulation of Monte Carlo methods began in the 1940s in physics [330], [329], [393] and later, in the 1950s, spread to statistics [198]. During World War II, John von Neumann, Stanislaw Ulam, Nick Metropolis, and others initiated the Monte Carlo method at Los Alamos Laboratory. von Neumann also used the Monte Carlo method to calculate the elements of …
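Buffon's needle, mentioned above as an early ancestor of Monte Carlo, can be simulated in a few lines. The sketch below is an illustration and not taken from the paper. It assumes a needle whose length `needle` is no longer than the line spacing `spacing`, in which case the crossing probability is 2·needle/(π·spacing), and it inverts the observed crossing frequency to estimate π. The function name `buffon_pi` and its parameters are hypothetical. To avoid using π while drawing the needle's angle, the orientation is obtained by rejection-sampling a point in the unit quarter-disc.

```python
import math
import random


def buffon_pi(n_drops: int, needle: float = 1.0, spacing: float = 2.0, seed: int = 0) -> float:
    """Estimate pi by dropping needles on parallel lines (hypothetical helper).

    Assumes needle <= spacing, so that P(cross) = 2 * needle / (pi * spacing).
    """
    if needle > spacing:
        raise ValueError("this estimator assumes needle <= spacing")
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_drops):
        # Distance from the needle's centre to the nearest line: uniform on [0, spacing/2].
        centre = rng.uniform(0.0, spacing / 2)
        # Uniform angle in [0, pi/2] without using pi: rejection-sample the unit quarter-disc.
        while True:
            u, v = rng.random(), rng.random()
            r = math.hypot(u, v)
            if 0.0 < r <= 1.0:
                break
        sin_theta = v / r
        # The needle crosses a line if half of its vertical extent reaches the line.
        if centre <= (needle / 2) * sin_theta:
            crossings += 1
    if crossings == 0:
        raise RuntimeError("no crossings observed; increase n_drops")
    # Invert P(cross) = 2 * needle / (pi * spacing).
    return 2 * needle * n_drops / (spacing * crossings)
```

With the default `needle = 1` and `spacing = 2`, the crossing probability is 1/π, so `buffon_pi(200_000)` should return a value close to 3.14; the Monte Carlo error shrinks only like one over the square root of the number of drops.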
Recommended publications
  • Kalman and Particle Filtering
    Abstract: The Kalman and Particle filters are algorithms that recursively update an estimate of the state and find the innovations driving a stochastic process given a sequence of observations. The Kalman filter accomplishes this goal by linear projections, while the Particle filter does so by a sequential Monte Carlo method. With the state estimates, we can forecast and smooth the stochastic process. With the innovations, we can estimate the parameters of the model. The article discusses how to set a dynamic model in a state-space form, derives the Kalman and Particle filters, and explains how to use them for estimation. Since both filters start with a state-space representation of the stochastic processes of interest, section 1 presents the state-space form of a dynamic model. Then, section 2 introduces the Kalman filter and section 3 develops the Particle filter. For extended expositions of this material, see Doucet, de Freitas, and Gordon (2001), Durbin and Koopman (2001), and Ljungqvist and Sargent (2004).
    1. The state-space representation of a dynamic model
    A large class of dynamic models can be represented by a state-space form:
        $X_{t+1} = \varphi(X_t, W_{t+1}; \gamma)$   (1)
        $Y_t = g(X_t, V_t; \gamma)$.   (2)
    This representation handles a stochastic process by finding three objects: a vector that describes the position of the system (a state, $X_t \in \mathcal{X} \subset \mathbb{R}^l$) and two functions, one mapping the state today into the state tomorrow (the transition equation, (1)) and one mapping the state into observables, $Y_t$ (the measurement equation, (2)).
  • The Exponential Family
    The Exponential Family. David M. Blei, Columbia University, November 9, 2016. The exponential family is a class of densities (Brown, 1986). It encompasses many familiar forms of likelihoods, such as the Gaussian, Poisson, multinomial, and Bernoulli. It also encompasses their conjugate priors, such as the Gamma, Dirichlet, and beta.
    1 Definition
    A probability density in the exponential family has this form
        $p(x \mid \eta) = h(x) \exp\{\eta^\top t(x) - a(\eta)\}$,   (1)
    where $\eta$ is the natural parameter; $t(x)$ are sufficient statistics; $h(x)$ is the "base measure"; $a(\eta)$ is the log normalizer. Examples of exponential family distributions include Gaussian, gamma, Poisson, Bernoulli, multinomial, Markov models. Examples of distributions that are not in this family include student-t, mixtures, and hidden Markov models. (We are considering these families as distributions of data. The latent variables are implicitly marginalized out.) The statistic $t(x)$ is called sufficient because the probability as a function of $\eta$ only depends on $x$ through $t(x)$. The exponential family has fundamental connections to the world of graphical models (Wainwright and Jordan, 2008). For our purposes, we'll use exponential families as components in directed graphical models, e.g., in the mixtures of Gaussians. The log normalizer ensures that the density integrates to 1,
        $a(\eta) = \log \int h(x) \exp\{\eta^\top t(x)\}\, dx$.   (2)
    This is the negative logarithm of the normalizing constant. The function $h(x)$ can be a source of confusion. One way to interpret $h(x)$ is the (unnormalized) distribution of $x$ when $\eta = 0$. It might involve statistics of $x$ that are not in $t(x)$, i.e., that do not vary with the natural parameter.
  • Machine Learning Conjugate Priors and Monte Carlo Methods
    CPSC 540: Machine Learning. Conjugate Priors and Monte Carlo Methods. Mark Schmidt, University of British Columbia, Winter 2016. Outline: Hierarchical Bayes for Non-IID Data; Conjugate Priors; Monte Carlo Methods. Admin: Nothing exciting?
    Last Time: Bayesian Statistics. In Bayesian statistics we work with posterior over parameters,
        $p(\theta \mid x, \alpha, \beta) = \frac{p(x \mid \theta)\, p(\theta \mid \alpha, \beta)}{p(x \mid \alpha, \beta)}$.
    We discussed empirical Bayes, where you optimize prior using marginal likelihood,
        $\operatorname{argmax}_{\alpha,\beta}\, p(x \mid \alpha, \beta) = \operatorname{argmax}_{\alpha,\beta} \int_\theta p(x \mid \theta)\, p(\theta \mid \alpha, \beta)\, d\theta$.
    Can be used to optimize $\lambda_j$, polynomial degree, RBF $\sigma_i$, polynomial vs. RBF, etc. We also considered hierarchical Bayes, where you put a prior on the prior,
        $p(\alpha, \beta \mid x, \gamma) = \frac{p(x \mid \alpha, \beta)\, p(\alpha, \beta \mid \gamma)}{p(x \mid \gamma)}$.
    But is the hyper-prior really needed?
  • Applying Particle Filtering in Both Aggregated and Age-Structured Population Compartmental Models of Pre-Vaccination Measles
    bioRxiv preprint doi: https://doi.org/10.1101/340661; this version posted June 6, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
    Applying particle filtering in both aggregated and age-structured population compartmental models of pre-vaccination measles. Xiaoyan Li (1,*), Alexander Doroshenko (2), Nathaniel D. Osgood (1). (1) Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada; (2) Department of Medicine, Division of Preventive Medicine, University of Alberta, Edmonton, Alberta, Canada; (*) [email protected]
    Abstract: Measles is a highly transmissible disease and is one of the leading causes of death among young children under 5 globally. While the use of ongoing surveillance data and, recently, dynamic models offer insight on measles dynamics, both suffer notable shortcomings when applied to measles outbreak prediction. In this paper, we apply the Sequential Monte Carlo approach of particle filtering, incorporating reported measles incidence for Saskatchewan during the pre-vaccination era, using an adaptation of a previously contributed measles compartmental model. To secure further insight, we also perform particle filtering on an age-structured adaptation of the model in which the population is divided into two interacting age groups: children and adults. The results indicate that, when used with a suitable dynamic model, particle filtering can offer high predictive capacity for measles dynamics and outbreak occurrence in a low vaccination context. We have investigated five particle filtering models in this project. Based on the most competitive model as evaluated by predictive accuracy, we have performed prediction and outbreak classification analysis.
  • Polynomial Singular Value Decompositions of a Family of Source-Channel Models
    Polynomial Singular Value Decompositions of a Family of Source-Channel Models. The MIT Faculty has made this article openly available. Citation: Makur, Anuran and Lizhong Zheng. "Polynomial Singular Value Decompositions of a Family of Source-Channel Models." IEEE Transactions on Information Theory 63, 12 (December 2017): 7716–7728. © 2017 IEEE. As Published: http://dx.doi.org/10.1109/tit.2017.2760626. Publisher: Institute of Electrical and Electronics Engineers (IEEE). Version: Author's final manuscript. Citable link: https://hdl.handle.net/1721.1/131019. Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike. Detailed Terms: http://creativecommons.org/licenses/by-nc-sa/4.0/
    Anuran Makur, Student Member, IEEE, and Lizhong Zheng, Fellow, IEEE
    Abstract—In this paper, we show that the conditional expectation operators corresponding to a family of source-channel models, defined by natural exponential families with quadratic variance functions and their conjugate priors, have orthonormal polynomials as singular vectors. These models include the Gaussian channel with Gaussian source, the Poisson channel with gamma source, and the binomial channel with beta source. To derive the singular vectors of these models, we prove and employ the equivalent condition that their conditional moments are strictly degree preserving polynomials.
    […] interested in a particular subclass of one-parameter exponential families known as natural exponential families with quadratic variance functions (NEFQVF). So, we define natural exponential families next. Definition 1 (Natural Exponential Family). Given a measurable space $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ with a σ-finite measure µ, where $\mathcal{Y} \subseteq \mathbb{R}$ and $\mathcal{B}(\mathcal{Y})$ denotes the Borel σ-algebra on $\mathcal{Y}$, the parametrized …
  • Dynamic Detection of Change Points in Long Time Series
    AISM (2007) 59: 349–366, DOI 10.1007/s10463-006-0053-9. Nicolas Chopin, Dynamic detection of change points in long time series. Received: 23 March 2005 / Revised: 8 September 2005 / Published online: 17 June 2006. © The Institute of Statistical Mathematics, Tokyo 2006.
    Abstract: We consider the problem of detecting change points (structural changes) in long sequences of data, whether in a sequential fashion or not, and without assuming prior knowledge of the number of these change points. We reformulate this problem as the Bayesian filtering and smoothing of a non-standard state space model. Towards this goal, we build a hybrid algorithm that relies on particle filtering and Markov chain Monte Carlo ideas. The approach is illustrated by a GARCH change point model. Keywords: Change point models · GARCH models · Markov chain Monte Carlo · Particle filter · Sequential Monte Carlo · State space models.
    1 Introduction: The assumption that an observed time series follows the same fixed stationary model over a very long period is rarely realistic. In economic applications, for instance, common sense suggests that the behaviour of economic agents may change abruptly under the effect of economic policy, political events, etc. For example, Mikosch and Stărică (2003, 2004) point out that GARCH models fit overly long sequences of financial data very poorly, say 20 years of daily log-returns of some speculative asset. Despite this, these models remain highly popular, thanks to their forecast ability (at least on short to medium-sized time series) and their elegant simplicity (which facilitates economic interpretation). Against the common trend of building more and more sophisticated stationary models that may spuriously provide a better fit for such long sequences, the aforementioned authors argue that GARCH models remain a good 'local' approximation of the behaviour of financial data, …
  • A Compendium of Conjugate Priors
    A Compendium of Conjugate Priors. Daniel Fink, Environmental Statistics Group, Department of Biology, Montana State University, Bozeman, MT 59717, May 1997.
    Abstract: This report reviews conjugate priors and priors closed under sampling for a variety of data generating processes where the prior distributions are univariate, bivariate, and multivariate. The effects of transformations on conjugate prior relationships are considered and cases where conjugate prior relationships can be applied under transformations are identified. Univariate and bivariate prior relationships are verified using Monte Carlo methods.
    1 Introduction
    Experimenters are often in the position of having collected some data from which they desire to make inferences about the process that produced that data. Bayes' theorem provides an appealing approach to solving such inference problems. Bayes' theorem,
        $g(\theta \mid x_1, \ldots, x_n) = \frac{\pi(\theta)\, L(\theta \mid x_1, \ldots, x_n)}{\int \pi(\theta)\, L(\theta \mid x_1, \ldots, x_n)\, d\theta}$,   (1)
    is commonly interpreted in the following way. We want to make some sort of inference on the unknown parameter(s), $\theta$, based on our prior knowledge of $\theta$ and the data collected, $x_1, \ldots, x_n$. Our prior knowledge is encapsulated by the probability distribution on $\theta$, $\pi(\theta)$. The data that has been collected is combined with our prior through the likelihood function, $L(\theta \mid x_1, \ldots, x_n)$. The normalized product of these two components yields a probability distribution of $\theta$ conditional on the data. This distribution, $g(\theta \mid x_1, \ldots, x_n)$, is known as the posterior distribution of $\theta$. Bayes' theorem is easily extended to cases where $\theta$ is multivariate, a vector of parameters.
  • 36-463/663: Hierarchical Linear Models
    36-463/663: Hierarchical Linear Models. Taste of MCMC / Bayes for 3 or more "levels". Brian Junker, 132E Baker Hall, [email protected], 11/3/2016.
    Outline: Practical Bayes; Mastery Learning Example; A brief taste of JAGS and RUBE; Hierarchical Form of Bayesian Models; Extending the "Levels" and the "Slogan"; Mastery Learning: Distribution of "mastery" (to be continued…).
    Practical Bayesian Statistics: (posterior) ∝ (likelihood) × (prior). We typically want quantities like: point estimate (posterior mean, median, mode); uncertainty (SE, IQR, or other measure of 'spread'); credible interval (CI), e.g. $(\hat\theta - 2\,\mathrm{SE}, \hat\theta + 2\,\mathrm{SE})$ or $(\theta_{.025}, \theta_{.975})$; other aspects of the "shape" of the posterior distribution. Aside: if (prior) ∝ 1, then (posterior) ∝ (likelihood), so posterior mode = MLE, posterior SE = $1/I(\hat\theta)^{1/2}$, etc.
    Obtaining posterior point estimates and credible intervals: easy if we recognize the posterior distribution and we have formulae for means, variances, etc. Whether or not we have such formulae, we can get similar information by simulating from the posterior distribution. Key idea: […] where $\theta_1, \theta_2, \ldots, \theta_M$ is a sample from $f(\theta \mid \text{data})$.
    Example: Mastery Learning. Some computer-based tutoring systems declare that you have "mastered" a skill if you can perform it successfully r times. The number of times x that you erroneously perform the skill before the r-th success is a measure of how likely you are to perform the skill correctly (how well you know the skill). The distribution for the number of failures x before the r-th success is the negative binomial distribution.
  • Monte Carlo Smoothing for Nonlinear Time Series
    Monte Carlo Smoothing for Nonlinear Time Series. Simon J. Godsill, Arnaud Doucet, and Mike West. We develop methods for performing smoothing computations in general state-space models. The methods rely on a particle representation of the filtering distributions, and their evolution through time using sequential importance sampling and resampling ideas. In particular, novel techniques are presented for generation of sample realizations of historical state sequences. This is carried out in a forward-filtering backward-smoothing procedure that can be viewed as the nonlinear, non-Gaussian counterpart of standard Kalman filter-based simulation smoothers in the linear Gaussian case. Convergence in the mean squared error sense of the smoothed trajectories is proved, showing the validity of our proposed method. The methods are tested in a substantial application for the processing of speech signals represented by a time-varying autoregression and parameterized in terms of time-varying partial correlation coefficients, comparing the results of our algorithm with those from a simple smoother based on the filtered trajectories. KEY WORDS: Bayesian inference; Non-Gaussian time series; Nonlinear time series; Particle filter; Sequential Monte Carlo; State-space model.
    1. INTRODUCTION
    In this article we develop Monte Carlo methods for smoothing in general state-space models. To fix notation, consider the standard Markovian state-space model (West and Harrison 1997)
        $x_{t+1} \sim f(x_{t+1} \mid x_t)$ (state evolution density),
    […] and
        $p(x_{t+1} \mid y_{1:t+1}) = \frac{g(y_{t+1} \mid x_{t+1})\, p(x_{t+1} \mid y_{1:t})}{p(y_{t+1} \mid y_{1:t})}$.
    Similarly, smoothing can be performed recursively backward in time using the smoothing formula
        $p(x_t \mid y_{1:T}) = \int p(x_{t+1} \mid y_{1:T})\, \frac{p(x_t \mid y_{1:t})\, f(x_{t+1} \mid x_t)}{p(x_{t+1} \mid y_{1:t})}\, dx_{t+1}$.
  • A Geometric View of Conjugate Priors
    Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence. A Geometric View of Conjugate Priors. Arvind Agarwal and Hal Daumé III, Department of Computer Science, University of Maryland, College Park, Maryland, USA. [email protected], [email protected]
    Abstract: In Bayesian machine learning, conjugate priors are popular, mostly due to mathematical convenience. In this paper, we show that there are deeper reasons for choosing a conjugate prior. Specifically, we formulate the conjugate prior in the form of Bregman divergence and show that it is the inherent geometry of conjugate priors that makes them appropriate and intuitive. This geometric interpretation allows one to view the hyperparameters of conjugate priors as the effective sample points, thus providing additional intuition. We use this geometric understanding of conjugate priors to derive the hyperparameters …
    […] Using the same geometry also gives the closed-form solution for the maximum-a-posteriori (MAP) problem. We then analyze the prior using concepts borrowed from information geometry. We show that this geometry induces the Fisher information metric and 1-connection, which are, respectively, the natural metric and connection for the exponential family (Section 5). One important outcome of this analysis is that it allows us to treat the hyperparameters of the conjugate prior as the effective sample points drawn from the distribution under consideration. We finally extend this geometric interpretation of conjugate priors to analyze the hybrid model given by [7] in a purely geometric setting, and justify the argument presented in [1] (i.e. a coupling prior should be conjugate) using a much simpler analysis (Section 6).
  • Time Series Analysis 5
    Time Series Analysis 5. State space models and Kalman filtering. Andrew Lesniewski, Baruch College, New York, Fall 2019.
    Outline: 1 Warm-up: Recursive Least Squares; 2 Kalman Filter; 3 Nonlinear State Space Models; 4 Particle Filtering.
    OLS regression: As a motivation for the remainder of this lecture, we consider the standard linear model
        $Y = X^\top \beta + \varepsilon$,   (1)
    where $Y \in \mathbb{R}$, $X \in \mathbb{R}^k$, and $\varepsilon \in \mathbb{R}$ is noise (this includes the model with an intercept as a special case in which the first component of $X$ is assumed to be 1). Given $n$ observations $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ of $X$ and $Y$, respectively, ordinary least squares (OLS) regression leads to the following estimated value of the coefficient $\beta$:
        $\hat\beta_n = (X_n^\top X_n)^{-1} X_n^\top Y_n$.   (2)
    The matrices $X_n$ and $Y_n$ above are defined as
        $X_n = \begin{pmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{pmatrix} \in \mathrm{Mat}_{n,k}(\mathbb{R})$ and $Y_n = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \in \mathbb{R}^n$,   (3)
    respectively.
    Recursive least squares: Suppose now that $X$ and $Y$ consist of a streaming set of data, and each new observation leads to an updated value of the estimated $\beta$.
  • Adaptive Motion Model for Human Tracking Using Particle Filter
    2010 International Conference on Pattern Recognition. Adaptive Motion Model for Human Tracking Using Particle Filter. Mohammad Hossein Ghaeminia (1), Amir Hossein Shabani (2), and Shahryar Baradaran Shokouhi (1). (1) Iran Univ. of Science & Technology, Tehran, Iran; (2) University of Waterloo, ON, Canada. [email protected] [email protected] [email protected]
    Abstract: This paper presents a novel approach to model the complex motion of humans using a probabilistic autoregressive moving average model. The parameters of the model are adaptively tuned during the course of tracking by utilizing the main varying components of the pdf of the target's acceleration and velocity. This motion model, along with the color histogram as the measurement model, has been incorporated in the particle filtering framework for human tracking. The proposed method is evaluated on the PETS benchmark, in which the targets have non-smooth motion and suddenly change their motion direction. Our method competes with the state-of-the-art techniques for human tracking in the real world …
    […] is periodically adapted by an efficient learning procedure (Figure 1). The core of learning the motion model is the parameter estimation, for which the sequences of velocity and acceleration are innovatively analyzed and modeled by a Gaussian Mixture Model (GMM). Non-negative matrix factorization is then used for dimensionality reduction to take care of high variations during abrupt changes [3]. Utilizing this adaptive motion model along with a color histogram as the measurement model in the PF framework provided us an appropriate approach for human tracking in the real-world scenario of the PETS benchmark [4]. The rest of this paper is organized as follows. Section 2 overviews the related works. Section 3 explains the particle filter and the probabilistic ARMA model.
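The "Kalman and Particle Filtering" and "Time Series Analysis 5" entries both build the Kalman filter on the state-space form $X_{t+1} = \varphi(X_t, W_{t+1}; \gamma)$, $Y_t = g(X_t, V_t; \gamma)$. As a minimal sketch of the linear-Gaussian special case, assume (hypothetically, for illustration only) a scalar state with $\varphi(x, w) = a x + w$ and $g(x, v) = c x + v$, Gaussian noises of variances `q` and `r`, and a Gaussian prior `N(m0, p0)` on the first state. The function name and parameters are illustrative and not taken from either source; the code alternates the standard update (condition on $y_t$) and predict (push through the transition) steps.

```python
from typing import List, Tuple


def kalman_filter(ys: List[float], a: float, c: float, q: float, r: float,
                  m0: float = 0.0, p0: float = 1.0) -> Tuple[List[float], List[float]]:
    """Scalar Kalman filter for x_{t+1} = a*x_t + w_{t+1}, y_t = c*x_t + v_t.

    Assumes w ~ N(0, q), v ~ N(0, r) and a Gaussian prior N(m0, p0) on the
    first state. Returns the filtered means and variances of x_t given y_1..y_t.
    """
    means: List[float] = []
    variances: List[float] = []
    m_pred, p_pred = m0, p0
    for y in ys:
        # Update: condition the Gaussian prediction on the new observation y.
        s = c * c * p_pred + r          # innovation variance
        k = p_pred * c / s              # Kalman gain
        innovation = y - c * m_pred     # the part of y the prediction did not explain
        m = m_pred + k * innovation
        p = p_pred * r / s              # equals (1 - k*c) * p_pred, without cancellation
        means.append(m)
        variances.append(p)
        # Predict: push the filtered Gaussian through the linear transition.
        m_pred = a * m
        p_pred = a * a * p + q
    return means, variances
```

With `a = 1`, `c = 1`, `q = 0` (a constant state) and a very diffuse prior such as `p0 = 1e12`, the filtered mean approaches the running sample mean of the observations, e.g. about 2.0 after observing 1, 2, 3, with variance close to r/3.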
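Several entries above (the Kalman/Particle filtering notes, the measles study, Chopin's change-point paper and the human-tracking paper) rely on the particle filter. The sketch below is a minimal bootstrap (sampling-importance-resampling) filter for a scalar state, assuming the transition density is used as the proposal and multinomial resampling is applied at every step. It is a generic illustration rather than the method of any listed paper; `bootstrap_filter` and the three model callbacks are hypothetical names supplied by the caller.

```python
import math
import random
from typing import Callable, List

# Hypothetical signatures for the user-supplied model pieces.
InitialSampler = Callable[[random.Random], float]
TransitionSampler = Callable[[float, random.Random], float]
LogLikelihood = Callable[[float, float], float]


def bootstrap_filter(ys: List[float], n_particles: int,
                     sample_initial: InitialSampler,
                     sample_transition: TransitionSampler,
                     log_likelihood: LogLikelihood,
                     seed: int = 0) -> List[float]:
    """Bootstrap (SIR) particle filter for a scalar state.

    sample_initial draws x_1 from its prior, sample_transition draws x_{t+1}
    given x_t, and log_likelihood(y, x) returns log g(y | x) up to a constant.
    Returns the particle estimate of E[x_t | y_1..y_t] for each t.
    """
    rng = random.Random(seed)
    particles = [sample_initial(rng) for _ in range(n_particles)]
    estimates: List[float] = []
    for t, y in enumerate(ys):
        if t > 0:
            # Proposal = transition density (the "prior" proposal of the SIR filter).
            particles = [sample_transition(x, rng) for x in particles]
        # Importance weights proportional to the likelihood; shift by the max for stability.
        log_w = [log_likelihood(y, x) for x in particles]
        top = max(log_w)
        w = [math.exp(lw - top) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]
        estimates.append(sum(wi * xi for wi, xi in zip(w, particles)))
        # Multinomial resampling: N draws with replacement, probabilities w.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates
```

As a sanity check, with a standard-normal prior, a unit-variance Gaussian likelihood and a single observation y = 2, the exact posterior mean is 1.0, and the estimate with 20,000 particles should land close to it. Resampling at every step is the simplest choice; adaptive schemes that resample only when the effective sample size drops are common refinements.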
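Blei's notes above define an exponential-family density as $p(x \mid \eta) = h(x) \exp\{\eta^\top t(x) - a(\eta)\}$ with log normalizer $a(\eta)$. A small worked instance, chosen here for illustration: the Bernoulli distribution has $h(x) = 1$, $t(x) = x$ and $a(\eta) = \log(1 + e^\eta)$. The sketch checks normalization and a standard property of log normalizers that is not stated in the excerpt, namely that $a'(\eta) = \mathbb{E}[t(X)]$, using a finite difference. All function names are hypothetical.

```python
import math


def bernoulli_log_normalizer(eta: float) -> float:
    """a(eta) = log sum_{x in {0,1}} h(x) exp(eta * t(x)) with h(x) = 1, t(x) = x."""
    # log(1 + e^eta), rewritten to stay finite for large |eta|.
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def bernoulli_pmf(x: int, eta: float) -> float:
    """p(x | eta) = h(x) * exp(eta * t(x) - a(eta)) for x in {0, 1}."""
    return math.exp(eta * x - bernoulli_log_normalizer(eta))


def mean_from_log_normalizer(eta: float, step: float = 1e-5) -> float:
    """Central finite difference of a(eta); for exponential families a'(eta) = E[t(X)]."""
    return (bernoulli_log_normalizer(eta + step)
            - bernoulli_log_normalizer(eta - step)) / (2 * step)
```

For a success probability of 0.3 the natural parameter is $\eta = \log(0.3/0.7)$; then `bernoulli_pmf(1, eta)` returns 0.3 (up to rounding) and the finite-difference derivative of the log normalizer is also approximately 0.3.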
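The conjugate-prior entries (Fink's compendium, Schmidt's slides, Agarwal and Daumé) and Junker's point that posterior summaries can be obtained by simulating from the posterior combine naturally in one small example. The sketch assumes a Beta(α, β) prior on a success probability with Bernoulli data, a standard conjugate pair whose posterior is Beta(α + successes, β + failures), and then approximates the posterior mean and an equal-tailed 95% credible interval from simulated draws. Function names and defaults are hypothetical.

```python
import random
from typing import List, Tuple


def beta_bernoulli_update(alpha: float, beta: float, data: List[int]) -> Tuple[float, float]:
    """Conjugate update: Beta(alpha, beta) prior + Bernoulli data -> Beta posterior."""
    successes = sum(data)
    failures = len(data) - successes
    return alpha + successes, beta + failures


def posterior_summary(alpha: float, beta: float, n_samples: int = 50_000,
                      seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Monte Carlo summaries of a Beta(alpha, beta) posterior from simulated draws."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_samples))
    # Sample mean of the draws approximates the posterior mean.
    mean = sum(draws) / n_samples
    # Equal-tailed 95% credible interval from the empirical 2.5% and 97.5% quantiles.
    lower = draws[int(0.025 * n_samples)]
    upper = draws[int(0.975 * n_samples) - 1]
    return mean, (lower, upper)
```

For example, a uniform Beta(1, 1) prior and the data 1, 1, 0, 1, 0, 1, 1, 1 give a Beta(7, 3) posterior whose exact mean is 0.7; the simulated mean should agree to within Monte Carlo error, which is the point of the "simulate from the posterior" approach when no closed form is available.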
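Junker's mastery-learning example models the number of failures x before the r-th success with a negative binomial distribution. Assuming independent attempts that each succeed with probability p (a parameterization chosen here for illustration), the pmf is $P(X = x) = \binom{x + r - 1}{x} p^r (1 - p)^x$ with mean $r(1 - p)/p$. The sketch evaluates this pmf and checks it against a direct simulation of the tutoring scenario; both function names are hypothetical.

```python
import math
import random


def neg_binomial_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): exactly x failures before the r-th success, success probability p."""
    return math.comb(x + r - 1, x) * p ** r * (1 - p) ** x


def simulate_failures(r: int, p: float, rng: random.Random) -> int:
    """Count failed attempts until r successes have been observed."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures
```

With r = 3 and p = 0.6, the probability of mastering the skill with no errors is 0.6³ = 0.216, and the expected number of errors before mastery is 3 · 0.4 / 0.6 = 2, which the simulated average should approach.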
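The Godsill, Doucet and West entry gives the forward filtering recursion and the backward smoothing formula $p(x_t \mid y_{1:T}) = \int p(x_{t+1} \mid y_{1:T})\, p(x_t \mid y_{1:t})\, f(x_{t+1} \mid x_t) / p(x_{t+1} \mid y_{1:t})\, dx_{t+1}$. Their paper approximates these densities with particles; the sketch below instead assumes a finite state space (a hidden Markov model, chosen for illustration), where the integral becomes an exact sum, so the formula can be checked against brute-force enumeration of all state paths. The function names and the matrix layout are hypothetical.

```python
from itertools import product
from typing import List

Matrix = List[List[float]]


def forward_backward_smoother(prior: List[float], trans: Matrix, lik: Matrix) -> Matrix:
    """Smoothed marginals p(x_t | y_{1:T}) for a finite-state hidden Markov model.

    prior[i] = p(x_1 = i); trans[i][j] = f(x_{t+1} = j | x_t = i);
    lik[t][i] = g(y_t | x_t = i). Assumes every predicted probability is positive.
    """
    n, T = len(prior), len(lik)
    filtered: Matrix = []   # filtered[t][i] = p(x_t = i | y_{1:t})
    predicted: Matrix = []  # predicted[t][i] = p(x_t = i | y_{1:t-1}); predicted[0] = prior
    pred = list(prior)
    for t in range(T):
        predicted.append(pred)
        unnorm = [pred[i] * lik[t][i] for i in range(n)]
        evidence = sum(unnorm)  # p(y_t | y_{1:t-1})
        filt = [u / evidence for u in unnorm]
        filtered.append(filt)
        # Prediction: p(x_{t+1} = j | y_{1:t}) = sum_i f(j | i) p(x_t = i | y_{1:t}).
        pred = [sum(filt[i] * trans[i][j] for i in range(n)) for j in range(n)]
    # Backward pass: the smoothing formula with the integral replaced by a sum over x_{t+1}.
    smoothed: Matrix = [[] for _ in range(T)]
    smoothed[T - 1] = filtered[T - 1]
    for t in range(T - 2, -1, -1):
        smoothed[t] = [
            filtered[t][i] * sum(trans[i][j] * smoothed[t + 1][j] / predicted[t + 1][j]
                                 for j in range(n))
            for i in range(n)
        ]
    return smoothed


def brute_force_marginals(prior: List[float], trans: Matrix, lik: Matrix) -> Matrix:
    """Reference answer by enumerating every state path (only feasible for tiny T)."""
    n, T = len(prior), len(lik)
    marg = [[0.0] * n for _ in range(T)]
    total = 0.0
    for path in product(range(n), repeat=T):
        weight = prior[path[0]] * lik[0][path[0]]
        for t in range(1, T):
            weight *= trans[path[t - 1]][path[t]] * lik[t][path[t]]
        total += weight
        for t, state in enumerate(path):
            marg[t][state] += weight
    return [[m / total for m in row] for row in marg]
```

The brute-force routine grows exponentially with T and exists only as a check; the forward-backward pass costs O(T·n²), and a particle smoother replaces the sums over states with sums over weighted particles.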
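Lesniewski's slides motivate the Kalman filter with recursive least squares: when observations stream in, $\hat\beta_n$ is updated rather than recomputed from $(X_n^\top X_n)^{-1} X_n^\top Y_n$. The sketch below uses the standard RLS recursion (not reproduced from the slides), assuming a hypothetical diffuse start $P_0 = \delta I$ with large `delta`, so the result matches OLS only approximately, with a bias of order 1/δ. Here `P` tracks $(X_n^\top X_n)^{-1}$, and all names are illustrative.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def rls_update(beta: Vector, P: Matrix, x: Vector, y: float) -> Tuple[Vector, Matrix]:
    """Fold one observation (x, y) into the RLS state (beta, P).

    P approximates (X_n^T X_n)^{-1}; beta approximates the OLS estimate beta_hat_n.
    """
    k = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(k)) for i in range(k)]
    denom = 1.0 + sum(x[i] * Px[i] for i in range(k))
    gain = [v / denom for v in Px]
    residual = y - sum(x[i] * beta[i] for i in range(k))  # one-step prediction error
    new_beta = [beta[i] + gain[i] * residual for i in range(k)]
    # P <- P - gain * (x^T P); P is symmetric, so x^T P is Px transposed.
    new_P = [[P[i][j] - gain[i] * Px[j] for j in range(k)] for i in range(k)]
    return new_beta, new_P


def rls(xs: List[Vector], ys: List[float], delta: float = 1e8) -> Vector:
    """Run RLS over a stream of (x, y) pairs starting from beta = 0, P = delta * I."""
    k = len(xs[0])
    beta = [0.0] * k
    P = [[delta if i == j else 0.0 for j in range(k)] for i in range(k)]  # diffuse start
    for x, y in zip(xs, ys):
        beta, P = rls_update(beta, P, x, y)
    return beta
```

For a model with an intercept, each `x` is `[1.0, z]`, matching the slides' convention that the first component of X is 1; after the full stream, `rls` should agree with the closed-form OLS intercept and slope to many decimal places.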