Bayesian Inference via Approximation of Log-likelihood for Priors in Exponential Family

Tohid Ardeshiri, Umut Orguner, and Fredrik Gustafsson

Abstract—In this paper, a Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is devised for the case where the prior distribution belongs to the exponential family of distributions. The logarithm of the likelihood function is linearized with respect to the sufficient statistic of the prior distribution in the exponential family such that the posterior obtains the same exponential family form as the prior. Similarities between the proposed method and the extended Kalman filter for nonlinear filtering are illustrated. Furthermore, an extended target measurement update is derived for target models where the target extent is represented by a random matrix having an inverse Wishart distribution. The approximate update covers the important case where the spread of the measurements is due to the target extent as well as the measurement noise in the sensor.

Index Terms—Approximate Bayesian inference, exponential family, Bayesian graphical models, extended Kalman filter, extended target tracking, group target tracking, random matrices, inverse Wishart.

I. INTRODUCTION

Determination of the posterior distribution of a latent variable x given the measurements (observed data) y is at the core of Bayesian inference using probabilistic models. These probabilistic models describe the relation between the random latent variables, the deterministic parameters, and the measurements. Such relations are specified by prior distributions of the latent variables p(x) and the likelihood function p(y|x), which gives a probabilistic description of the measurements given (some of) the latent variables. Using the probabilistic model and the measurements, the exact posterior can be expressed in functional form using the Bayes rule

  p(x|y) = p(x) p(y|x) / ∫ p(x) p(y|x) dx.    (1)

The exact posterior distribution can be analytical. A subclass of cases where the posterior is analytical is when the posterior belongs to the same family of distributions as the prior distribution. In such cases, the prior distribution is called a conjugate prior for the likelihood function. A well-known example where an analytical posterior is obtained using conjugate priors is when the latent variable is a priori normal-distributed and the likelihood function given the latent variable as its mean is again normal.

The conjugacy is especially useful when measurements are processed sequentially, as they appear in the filtering task for stochastic dynamical systems known as hidden Markov models (HMMs), whose probabilistic graphical model is presented in Fig. 1. In such filtering problems, the posterior after processing the last measurement is in the same form as the prior distribution before the measurement update. Thus, the same inference algorithm can be used in a recursive manner.

Figure 1: A probabilistic graphical model for a stochastic dynamical system with latent states x_1, x_2, x_3, ..., x_T and measurements y_1, y_2, y_3, ..., y_T.

The exact posterior distribution of a latent variable cannot always be given a compact analytical expression since the integral in the denominator of (1) may not be available in analytical form. Consequently, the number of parameters needed to express the posterior distribution will increase every time the Bayes rule (1) is used for inference. Several methods for approximate inference over probabilistic models have been proposed in the literature, such as variational Bayes [1], [2], expectation propagation [3], integrated nested Laplace approximation (INLA) [4], generalized linear models (GLMs) [5] and Monte-Carlo (MC) sampling methods [6], [7].

Variational Bayes (VB) and expectation propagation (EP) are two analytical optimization-based solutions for approximate Bayesian inference [8]. In these two approaches, the Kullback-Leibler divergence [9] between the true posterior distribution and an approximate posterior is minimized. INLA is a technique to perform approximate Bayesian inference in latent Gaussian models [10] using the Laplace approximation. GLMs are an extension of ordinary linear regression where the errors belong to the exponential family.

Sampling methods such as particle filters and Markov Chain Monte Carlo (MCMC) methods provide a general class of numerical solutions to the approximate Bayesian inference problem. However, in this paper, our focus is on fast analytical approximations which are applicable to large-scale inference problems. The proposed solution is built on properties of the exponential family of distributions and earlier work on the extended Kalman filter [11].

In this paper, a Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is derived for the case where the prior distribution belongs to the exponential family of distributions. The rest of this paper is organized as follows: In Section II an introduction to the exponential family of distributions is provided. In Section III a general algorithm for approximate inference in the exponential family of distributions is suggested. We show how the proposed technique is related to the extended Kalman filter for nonlinear filtering in Section III-B. A novel measurement update for tracking extended targets using random matrices is given in Section IV. The new measurement update for tracking extended targets is evaluated in numerical simulations in Section V. The concluding remarks are given in Section VI.

Tohid Ardeshiri and Fredrik Gustafsson are with the Department of Electrical Engineering, Linköping University, 58183 Linköping, Sweden (e-mail: [email protected], [email protected]). The authors gratefully acknowledge funding from the Swedish Research Council (VR) for the project Scalable Kalman Filters. Umut Orguner is with the Department of Electrical and Electronics Engineering, Middle East Technical University, 06531 Ankara, Turkey (e-mail: [email protected]).

II. THE EXPONENTIAL FAMILY

The exponential family of distributions [8] includes many common distributions such as the Gaussian, beta, gamma and Wishart. For x ∈ X the exponential family in its natural form can be represented by

  p(x; η) = h(x) exp(η · T(x) − A(η)),    (2)

where η is the natural parameter, T(x) is the sufficient statistic, A(η) is the log-partition function and h(x) is the base measure. η and T(x) may be vector-valued. Here a · b denotes the inner product of a and b.¹ The log-partition function is defined by the integral

  A(η) ≜ log ∫_X h(x) exp(η · T(x)) dx.    (3)

Also, η ∈ Ω ≜ {η ∈ R^m | A(η) < +∞}, where Ω is the natural parameter space. Moreover, Ω is a convex set and A(·) is a convex function on Ω.

¹In the rest of this paper we may use a^T b in place of a · b when a and b are vector valued. When a and b are matrix valued we may equivalently write Tr(a^T b) instead of vec(a) · vec(b), where Tr(·) is the trace operator and the operator vec(·) vectorizes its argument (lays it out in a vector).

When a probability density function (PDF) in the exponential family is parametrized by a non-canonical parameter θ, the PDF in (2) can be written as

  p(x; θ) = h(x) exp(η(θ) · T(x) − A(η(θ))),    (4)

where θ is the non-canonical parameter vector of the PDF.

Example 1. Consider the normal distribution p(x) = N(x; µ, Σ) with mean µ ∈ R^d and covariance Σ. Its PDF can be written in the exponential family form given in (2) where

  h(x) = (2π)^{−d/2},    (5a)
  T(x) = (x, vec(xx^T)),    (5b)
  η(µ, Σ) = (Σ^{−1}µ, vec(−½Σ^{−1})),    (5c)
  A(η(µ, Σ)) = ½ µ^T Σ^{−1} µ + ½ log |Σ|.    (5d)  ∎

Remark 1. In the rest of this document, the vec(·) operators will be dropped to keep the notation less cluttered. That is, we will use the simplified notation T(x) = (x, xx^T) instead of (5b) and η(µ, Σ) = (Σ^{−1}µ, −½Σ^{−1}) instead of (5c).

A. Conjugate Priors in Exponential Family

In this section we study the conjugate prior family for a likelihood function belonging to the exponential family. Consider m independent and identically distributed (IID) measurements Y ≜ {y^j ∈ R^d | 1 ≤ j ≤ m} and a likelihood function p(Y|x, λ) in the exponential family, where λ is a set of hyper-parameters of the likelihood and x is the latent variable. The likelihood of the m IID measurements can be written as

  p(Y|x, λ) = (∏_{j=1}^m h(y^j)) exp(η(x, λ) · ∑_{j=1}^m T(y^j) − m A(η(x, λ))).    (6)

We seek a conjugate prior distribution for x denoted by p(x). The prior distribution p(x) is a conjugate prior if the posterior distribution

  p(x|Y) ∝ p(Y|x, λ) p(x)    (7)

is also in the same exponential family as the prior. Let us assume that the prior is in the form

  p(x) ∝ exp(F(x))    (8)

for some function F(·). Hence,

  p(x|Y) ∝ p(Y|x, λ) exp(F(x))    (9)
         ∝ exp(η(x, λ) · ∑_{j=1}^m T(y^j) − m A(η(x, λ)) + F(x)).    (10)

For the posterior (10) to be in the same exponential family as the prior [12], F(·) needs to be in the form

  F(x) = ρ_1 · η(x, λ) − ρ_0 A(η(x, λ))    (11)

for some ρ ≜ (ρ_0, ρ_1), such that

  p(x|Y) ∝ exp((ρ_1 + ∑_{j=1}^m T(y^j)) · η(x, λ) − (m + ρ_0) A(η(x, λ))).    (12)

Hence, the conjugate prior for the likelihood (6) is parametrized by ρ and is given by

  p(x; ρ) = (1/Z) exp(ρ_1 · η(x, λ) − ρ_0 A(η(x, λ))),    (13)

where

  Z = ∫ exp(ρ_1 · η(x, λ) − ρ_0 A(η(x, λ))) dx.    (14)

In Example 2, the conjugate prior for the normal likelihood is derived using the exponential family form for the normal distribution given in Example 1. In Example 3, a conjugate prior for a more complicated likelihood function is derived.

Example 2. Consider the measurement y with the likelihood p(y|x) = N(y; Cx, R). The likelihood can be written in exponential family form as in

  p(y|x) = (2π)^{−d/2} exp(η(Cx, R) · T(y) − A(η(Cx, R)))    (15)

where T(y) = (y, yy^T), η(Cx, R) = (R^{−1}Cx, −½R^{−1}) and

  A(η(Cx, R)) = ½ x^T C^T R^{−1} C x + ½ log |R|    (16a)
             = ½ C^T R^{−1} C · xx^T + ½ log |R|.    (16b)

Therefore, the likelihood function can be written in the form

  p(y|x) ∝ exp((C^T R^{−1} y, −½ C^T R^{−1} C) · T(x)).    (17)

Hence, the conjugate prior for the likelihood function of this example should have T(·) = (x, xx^T) as its sufficient statistic, i.e., the conjugate prior is normal. When the prior distribution is normal, i.e., p(x) = N(x; µ, Σ), the posterior distribution obeys

  p(x|y) ∝ exp((C^T R^{−1} y + Σ^{−1}µ, −½ C^T R^{−1} C − ½ Σ^{−1}) · T(x)),    (18)

which is the information filter form of the Kalman filter's measurement update [13]. ∎
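The conjugate update in Example 2 can be checked numerically. Below is a minimal sketch (Python/NumPy; the matrices and measurement are illustrative choices, not values from the paper) that adds the likelihood statistic of (17) to the prior's natural parameters, per (18), and verifies the result against the standard Kalman measurement update.

```python
import numpy as np

# Sketch of Example 2: conjugate (normal) prior updated by natural-parameter
# addition, cf. (17)-(18). All numbers below are illustrative choices.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
Lam = np.linalg.inv(Sigma)                # prior information matrix
eta1, eta2 = Lam @ mu, -0.5 * Lam         # prior natural parameters

C = np.array([[1.0, 0.0], [1.0, 1.0]])
R = 0.5 * np.eye(2)
Rinv = np.linalg.inv(R)
y = np.array([0.7, 0.2])

eta1_post = eta1 + C.T @ Rinv @ y         # add likelihood statistic, cf. (18)
eta2_post = eta2 - 0.5 * C.T @ Rinv @ C

Sigma_post = np.linalg.inv(-2.0 * eta2_post)   # back to moment form
mu_post = Sigma_post @ eta1_post

# Sanity check against the standard Kalman measurement update
S = C @ Sigma @ C.T + R
K = Sigma @ C.T @ np.linalg.inv(S)
assert np.allclose(mu_post, mu + K @ (y - C @ mu))
assert np.allclose(Sigma_post, Sigma - K @ S @ K.T)
print(mu_post, Sigma_post)
```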

Example 3. Consider the measurement y with the likelihood

  p(y|x) ∝ exp(−(1/24)(y − x)² + cos(y − x)).    (19)

This likelihood function is multi-modal and is illustrated in Fig. 2 for y = 3. The likelihood can be written in the form

  p(y|x) ∝ exp(λ · T(x))    (20)

where λ = [−1/24, y/12, cos y, sin y]^T and T(x) = [x², x, cos x, sin x]^T. Hence, a conjugate prior for the likelihood function of this example should have T(·) as its sufficient statistic, such as p(x) ∝ exp([−1/10, 0, 0, 0] · T(x)). Thus, the posterior for this likelihood function and prior distribution obeys

  p(x|y) ∝ exp([−1/24 − 1/10, y/12, cos y, sin y]^T · T(x)).    (21)

Although such posteriors can be computed analytically up to a normalization factor, the density may not be available in analytical form. The numerically normalized prior and posterior are illustrated in Fig. 2. ∎

Figure 2: The likelihood function (19)–(20), the prior distribution, and the posterior distribution (21) in Example 3, plotted over x ∈ [−10, 15].

Similar reasoning as the one presented in this section can be applied to the case where the prior is fixed and the likelihood function is to be selected; this concept of "conjugate likelihoods" will be considered in the sequel.

Table I: Some continuous exponential family distributions and their sufficient statistics.

Continuous Exp. Family Distribution           | T(·)
Normal distribution with known variance σ²    | x/σ
Normal distribution                           | (x, xx^T)
Pareto distribution with known minimum x_m    | log x
Weibull distribution with known shape k       | x^k
Chi-squared distribution                      | log x
Dirichlet distribution                        | (log x_1, ..., log x_n)
Laplace distribution with known mean µ        | |x − µ|
Inverse Gaussian distribution                 | (x, 1/x)
Scaled inverse Chi-squared distribution       | (log x, 1/x)
Beta distribution                             | (log x, log(1 − x))
Lognormal distribution                        | (log x, (log x)²)
Gamma distribution                            | (log x, x)
Inverse gamma distribution                    | (log x, 1/x)
Gaussian Gamma distribution                   | (log τ, τ, τx, τx²)
Wishart distribution                          | (log |X|, X)
Inverse Wishart distribution                  | (log |X|, X^{−1})
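Example 3's posterior (21) has no standard closed form, but it is cheap to normalize numerically. A minimal sketch (Python/NumPy; the grid resolution is an illustrative choice, the prior weight follows the example):

```python
import numpy as np

# Sketch of Example 3: posterior natural parameter = prior + likelihood
# statistic, normalized on a grid since no closed-form partition function
# is available for this family.

def T(x):
    """Sufficient statistic T(x) = (x^2, x, cos x, sin x)."""
    return np.stack([x**2, x, np.cos(x), np.sin(x)])

y = 3.0
lam_likelihood = np.array([-1/24, y/12, np.cos(y), np.sin(y)])  # lambda(y), cf. (20)
lam_prior = np.array([-1/10, 0.0, 0.0, 0.0])                    # prior of Example 3

x = np.linspace(-10, 15, 4001)
dx = x[1] - x[0]

log_post = (lam_prior + lam_likelihood) @ T(x)   # cf. (21), up to a constant
post = np.exp(log_post - log_post.max())         # stabilize before normalizing
post /= post.sum() * dx                          # numerical normalization

print("posterior mean ~", (x * post).sum() * dx)
```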

B. Conjugate Likelihoods in Exponential Family

First, we define the conjugate likelihood functions.

Definition 1. A likelihood function p(y|x) is called a conjugate likelihood for a prior distribution p(x) in a family of distributions if the posterior distribution p(x|y) belongs to the same family of distributions as the prior distribution.

Now, consider the prior distribution p(x; ρ) in the exponential family form on the latent variable x ∈ X, where ρ is a set of hyper-parameters for the prior:

  p(x; ρ) = h(x) exp(η(ρ) · T(x) − A(η(ρ))).    (22)

We will follow the treatment given for the characterization of the conjugate priors in Section II-A, as well as [12], for the conjugate likelihood functions. Assume now that we have observed m IID measurements Y ≜ {y^j ∈ R^d | 1 ≤ j ≤ m}. Let the likelihood of the measurements be written as

  p(Y|x) ∝ exp(L(x, Y))    (23)

for some function L(·). We seek a conjugate likelihood function in the exponential family. Hence, the posterior distribution has to be in the exponential family, where

  p(x|Y) ∝ h(x) exp(η(ρ) · T(x) − A(η(ρ)) + L(x, Y)).    (24)

For the posterior (24) to be in the same exponential family, L(·) needs to be in the form

  L(x, Y) =⁺ λ(Y) · T(x)    (25)

where =⁺ means equality up to an additive constant with respect to the latent variable x, such that

  p(x|Y) ∝ h(x) exp((η(ρ) + λ(Y)) · T(x)).    (26)

Hence, the conjugate likelihood for the prior distribution family (22) is parametrized by λ(Y) and is given by

  p(Y|x) ∝ exp(λ(Y) · T(x)).    (27)

The property expressed in (27) is the necessary condition which should hold for a conjugate likelihood for a given prior distribution family. Another condition which should hold is that the log-partition function of the posterior (26) should obey

  A = log ∫_X h(x) exp((η(ρ) + λ(Y)) · T(x)) dx < ∞    (28)

such that the natural parameter of the posterior belongs to the natural parameter space Ω. These properties of the conjugate likelihood are summarized in Theorem 1.

Theorem 1. The likelihood function p(y|x) is a conjugate likelihood for the prior distribution in an exponential family p(x) = h(x) exp(η · T(x) − A(η)) with x ∈ X if and only if
1) p(y|x) ∝ exp(λ · T(x)) for some λ, and
2) the likelihood function p(y|x) is integrable with respect to y, and
3) log ∫_X h(x) exp((η + λ) · T(x)) dx < ∞ for all η ∈ Ω.  ∎

In Examples 4 and 5, the conjugate likelihood functions for two prior distributions are derived using the exponential family form.

Example 4. Consider the normal distribution p(x) = N(x; µ, Σ) whose PDF is written in exponential family form in Example 1. The conjugate likelihood function has to be in the form

  p(y|x) ∝ exp(λ_1(y) · x + λ_2(y) · vec(xx^T))    (29)

for some permissible λ(·) ≜ [λ_1, λ_2], which includes p(y|x) = N(y; Cx, R) for a matrix C with appropriate dimensions and symmetric R ≻ 0. ∎

Example 5. Consider the prior distribution

  p(x) ∝ exp(λ · T(x))    (30)

where T(x) = [x², x, cos x, sin x]^T. The family of conjugate likelihood density functions p(y|x) is parametrized by ψ ∈ R⁴ such that
1) log p(y|x) =⁺ ψ · T(x),
2) ∫_{−∞}^{∞} p(y|x) dy = 1, i.e., the likelihood is integrable with respect to y,
3) ∫_{−∞}^{∞} exp((λ + ψ) · T(x)) dx < ∞, i.e., the posterior is integrable with respect to x.

A member of this family is

  p(y|x) ∝ exp(−α²(y − x)² + cos(βy − x + γ))    (31)

for [α, β, γ]^T ∈ R³. ∎

In Table I the sufficient statistics of some continuous members of the exponential family are given. Some members of the continuous exponential family have conjugate likelihoods, which are summarized in Table II.

III. MEASUREMENT UPDATE VIA APPROXIMATION OF THE LOG-LIKELIHOOD

In this section a method is proposed for the analytical approximation of complex likelihood functions based on linearization of the log-likelihood with respect to the sufficient statistic function T(·). We will use the defining property of conjugate likelihood functions, i.e., linearity of the log-likelihood function with respect to the sufficient statistic of the prior distribution, to come up with an approximation of the likelihood function for which there exists an analytical posterior distribution.

In the proposed method, first the log-likelihood function is derived in analytical form. Second, the likelihood function is approximated by a dot product of a statistic which depends on the measurement and the sufficient statistic of the prior distribution. For example, consider a likelihood function

  p(Y|x) ∝ exp(L(x, Y))    (32)

where the log-likelihood function is given by L(·) and the prior distribution is given by

  p(x) = h(x) exp(η · T(x) − A(η))    (33)

where η ∈ Ω. If we approximate L(·) such that

  L(x, Y) ≈⁺ L̂(x, Y) = λ(Y) · T(x)    (34)

for some λ(·) such that for any measurement Y belonging to the support of the likelihood function

  η + λ(Y) ∈ Ω    (35)

and

  ∫ exp(L̂(x, Y)) dY < ∞,    (36)

then the approximate likelihood will be a conjugate likelihood for the prior distribution with sufficient statistic T(x).

When the prior distribution is normal, the proposed approximation can be obtained using the well-known Taylor series approximation at the global maximum of the log-likelihood function. This is due to the fact that T(x) = (x, xx^T) for the normal prior, and a second order Taylor series approximation of the log-likelihood approximates the log-likelihood with a function linear in T(x). This mathematical convenience is used in the well-known approximate Bayesian inference technique INLA [4]. Similarly, when the prior distribution is an exponential distribution, a first order Taylor series approximation can be used to approximate the log-likelihood with a function linear in T(x) = x.

Taylor series expansions of functions will be used in the proposed approximations in the following. Hence, we establish the notation used in this paper for Taylor series expansion of arbitrary functions before we proceed.

A. Taylor series expansion

The Taylor series expansion of a real-valued function f(x) that is infinitely differentiable at a real vector x̂ is given by the series

  f(x) = f(x̂) + (1/1!) ∇_x f(x̂) · (x − x̂) + (1/2!) ∇²_x f(x̂) · (x − x̂)(x − x̂)^T + ···    (37)

where ∇_x f(x̂) is the gradient vector, composed of the partial derivatives of f(x) with respect to the elements of x, evaluated at x̂. Similarly, ∇²_x f(x̂) is the Hessian matrix of f(x), containing the second order partial derivatives and evaluated at x̂.

When f(x) is a vector-valued function, i.e., f(x) ∈ R^d with ith element denoted by f_i(x), first the Taylor series expansion is derived for each element separately. Then, the expansion of f(x) at x̂ is constructed by rearranging the terms in a matrix:

  f(x) = f(x̂) + (1/1!) (∇_x f(x̂))^T (x − x̂) + (1/2!) ∑_{i=1}^d e_i ∇²_x f_i(x̂) · (x − x̂)(x − x̂)^T + ···    (38)

In (38), ∇_x f(x̂) is the Jacobian matrix of f(x) evaluated at x̂ and e_i is a unity vector with its ith element equal to 1.
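As noted above, for a normal prior a second order Taylor series expansion of the log-likelihood is linear in T(x) = (x, xx^T). The sketch below applies this to an assumed Poisson-count log-likelihood L(x) = yx − eˣ (this likelihood is our illustration, not an example from the paper):

```python
import numpy as np

# Sketch: normal prior N(x; mu, sig2) with log-likelihood L(x) = y*x - exp(x)
# (a Poisson count with log-intensity x; an assumed example). A second order
# Taylor expansion of L about xhat is linear in T(x) = (x, x^2), cf. (34),
# so the approximate posterior stays normal.
mu, sig2 = 0.0, 1.0
y = 4.0
xhat = mu                      # linearization point

L1 = y - np.exp(xhat)          # dL/dx at xhat
L2 = -np.exp(xhat)             # d2L/dx2 at xhat

# lambda(y) multiplying (x, x^2): from L(x) ~ (L1 - L2*xhat)*x + (L2/2)*x^2
lam = np.array([L1 - L2 * xhat, L2 / 2.0])

# add to the prior natural parameter (mu/sig2, -1/(2*sig2))
eta = np.array([mu / sig2, -1.0 / (2.0 * sig2)]) + lam
sig2_post = -1.0 / (2.0 * eta[1])
mu_post = sig2_post * eta[0]
print(mu_post, sig2_post)
```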

Table II: Some prior distributions and their conjugate likelihood functions. The arguments of the PDFs in the left column are the latent random variables. In the right column the conjugate likelihood functions p(y|·) are given, where y denotes the measurement. In the middle column the sufficient statistic functions of the prior distributions are given. The logarithm of each conjugate likelihood function is linear in the sufficient statistic function to its left.

Prior distribution                   | T(·)                 | Conjugate likelihood function
N(x; µ, Σ)                           | (x, xx^T)            | N(y; Cx, R) ∝ exp(−½ Tr(R^{−1}(y − Cx)(y − Cx)^T))
N(x; µ, Σ)                           | (x, xx^T)            | log-N(y; x, σ²) ∝ exp(−(log(y) − x)²/(2σ²))
Gamma(x; α, β)                       | (log x, x)           | Exp(y; x) = x exp(−xy)
Gamma(x; α, β)                       | (log x, x)           | IGamma(y; α′, x) ∝ x^{α′} exp(−x/y)
Gamma(x; α, β)                       | (log x, x)           | Gamma(y; α′, x) ∝ x^{α′} exp(−xy)
Gamma(x; α, β)                       | (log x, x)           | N(y; µ, x^{−1}) ∝ x^{1/2} exp(−(x/2)(y − µ)²)
IGamma(x; α, β)                      | (log x, 1/x)         | N(y; µ, x) ∝ x^{−1/2} exp(−(y − µ)²/(2x))
IGamma(x; α, β)                      | (log x, 1/x)         | Weibull(y; x, k) ∝ x^{−1} y^{k−1} exp(−y^k/x)
GaussianGamma(x, τ; µ, λ, α, β)      | (log τ, τ, τx, τx²)  | N(y; x, τ^{−1}) ∝ τ^{1/2} exp(−(τ/2)(y − x)²)
W(X; n, V)                           | (log |X|, X)         | N(y; µ, X^{−1}) ∝ |X|^{1/2} exp(−½ Tr(X(y − µ)(y − µ)^T))
IW(X; ν, Ψ)                          | (log |X|, X^{−1})    | N(y; µ, X) ∝ |X|^{−1/2} exp(−½ Tr(X^{−1}(y − µ)(y − µ)^T))
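As an illustration of one Table II row (Gamma prior with an exponential likelihood), the sketch below adds each measurement's statistic to the prior's natural parameter; the hyper-parameter values are assumptions for the demo, not values from the paper.

```python
import numpy as np

# Sketch of the Gamma/exponential row of Table II: with prior
# Gamma(x; alpha, beta) and measurements y_j ~ Exp(y; x), each measurement
# contributes (1, -y_j) to the natural parameter (alpha - 1, -beta) of the
# prior, so the posterior is Gamma(alpha + m, beta + sum(y)).

rng = np.random.default_rng(1)
alpha, beta = 2.0, 1.0                      # illustrative prior hyper-parameters
x_true = 0.5
y = rng.exponential(scale=1.0 / x_true, size=100)

eta = np.array([alpha - 1.0, -beta])        # prior natural parameter
for yj in y:
    eta += np.array([1.0, -yj])             # log Exp(y; x) = log x - x*y

alpha_post, beta_post = eta[0] + 1.0, -eta[1]
assert np.isclose(alpha_post, alpha + len(y))
assert np.isclose(beta_post, beta + y.sum())
print(alpha_post, beta_post)
```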

B. The extended Kalman filter

A well-known problem where a special case of the proposed inference technique is used is the filtering problem for nonlinear state-space models in the presence of additive Gaussian noise, which will be described here. Consider the discrete-time stochastic nonlinear dynamical system

  y_k | x_k ∼ N(y_k; c(x_k), R_k),    (39a)
  x_{k+1} | x_k ∼ N(x_{k+1}; f(x_k), Q_k),    (39b)

where x_k and y_k are the latent variable and the measurement at time index k, and c(·) and f(·) are nonlinear functions. A common solution to the inference problem is the extended Kalman filter (EKF) [11]. In the following example, we show that the measurement update of the EKF is a special case of the proposed algorithm.

Example 6. Consider the latent variable x with a priori PDF p(x) = N(x; µ, Σ) and the measurement y with the likelihood function p(y|x) = N(y; c(x), R). The likelihood function can be written in exponential family form as in

  p(y|x) = (2π)^{−d/2} exp(η(x, R) · T(y) − A(η(x, R)))    (40)

where T(y) = (y, yy^T), η(x, R) = (R^{−1}c(x), −½R^{−1}) and

  A(η(x, R)) = ½ Tr(R^{−1}c(x)c(x)^T) + ½ log |R|.    (41)

Thus,

  p(y|x) ∝ exp(R^{−1}c(x) · y − ½ c(x)^T R^{−1} c(x))    (42)

and the log-likelihood function can be written as

  L(x) =⁺ y^T R^{−1} c(x) − ½ c(x)^T R^{−1} c(x)    (43)

where =⁺ means equality up to an additive constant with respect to the latent variable x. A second order Taylor series approximation of L(x) would be linear in the sufficient statistic T(x) = (x, xx^T). However, such an approximation does not guarantee the integrability of the approximate likelihood and the corresponding approximate posterior, due to the dependence of the second element of the posterior's natural parameter on the measurement y.

A common solution to the log-likelihood linearization problem has been the first order Taylor series approximation of c(x) about x̂ = µ (instead of a second order Taylor series approximation of L(x)), i.e.,

  c(x) ≈ c(µ) + c′(µ)(x − µ)    (44)

where

  c′(µ) ≜ ∇_x c(x)|_{x=µ}.    (45)

With this approximation the approximate log-likelihood becomes

  L(x) ≈⁺ (c′(µ)R^{−1}(y − c(µ) + c′(µ)^T µ)) · x − ½ c′(µ)R^{−1}c′(µ)^T · xx^T.    (46)

Hence, a conjugate likelihood function for the prior distribution is obtained by approximating the log-likelihood function by a function that is linear in the sufficient statistic of the prior distribution. Also note that the posterior distribution obeys

  p(x|y) ∝ exp(φ · T(x))    (47)

where

  φ = (c′(µ)R^{−1}(y − c(µ) + c′(µ)^T µ) + Σ^{−1}µ, −½ c′(µ)R^{−1}c′(µ)^T − ½ Σ^{−1}),    (48)

which is the extended information filter form of the extended Kalman filter [14, Sec. 3.5.4]. An advantage of this solution over a second order Taylor series approximation of the log-likelihood is that the natural parameter of the posterior will be in the natural parameter space by construction, i.e., ½ c′(µ)R^{−1}c′(µ)^T + ½ Σ^{−1} ≻ 0. ∎
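A small sketch of Example 6 (Python/NumPy; the range-like measurement function and all numbers are illustrative assumptions): linearizing c(x) about the prior mean and adding the resulting statistic to the prior's natural parameters reproduces the EKF measurement update in information form.

```python
import numpy as np

# Sketch of Example 6: EKF measurement update as natural-parameter addition,
# cf. (44)-(48). c(x) and all numbers are illustrative.

def c(x):
    return np.array([np.sqrt(x[0]**2 + x[1]**2)])   # e.g. a range measurement

def c_jac(x):
    r = np.sqrt(x[0]**2 + x[1]**2)
    return np.array([[x[0] / r, x[1] / r]])          # Jacobian C = dc/dx

mu = np.array([3.0, 4.0])
Sigma = np.diag([1.0, 2.0])
R = np.array([[0.1]])
y = np.array([5.2])

C = c_jac(mu)
Rinv = np.linalg.inv(R)
Lam = np.linalg.inv(Sigma)

# phi in (48): information vector and information matrix of the posterior
info_vec = C.T @ Rinv @ (y - c(mu) + C @ mu) + Lam @ mu
info_mat = C.T @ Rinv @ C + Lam

Sigma_post = np.linalg.inv(info_mat)
mu_post = Sigma_post @ info_vec

# Same result as the textbook EKF update
S = C @ Sigma @ C.T + R
K = Sigma @ C.T @ np.linalg.inv(S)
assert np.allclose(mu_post, mu + K @ (y - c(mu)))
assert np.allclose(Sigma_post, Sigma - K @ S @ K.T)
print(mu_post)
```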

C. A general linearization guideline

The approximation of a general scalar function L(·) with another function L̂(·) linear in a general vector-valued function t(·), i.e., L̂(·) ≜ λ · t(·), can be expressed as an optimization problem where various cost functions are suggested and minimized. The study of such procedures is considered outside the scope of this paper and a subject of future research. However, a few tricks are introduced here which can be used for the purpose of linearization.

Lemma 1. Let L(x) be an arbitrary scalar-valued continuous and differentiable function and t(x) be a vector-valued invertible function with independent continuous elements such that x = t^{−1}(z) and L(t^{−1}(z)) is differentiable with respect to z. Then, L(x) can be linearized with respect to t(x) about t(x̂) such that

  L(x) ≈ L(x̂) + Φ · (t(x) − t(x̂))    (49)

with Φ = ∇_z L(t^{−1}(z))|_{z=t(x̂)}.

Proof. Let z = t(x) and Q(z) ≜ L(t^{−1}(z)). Then x = t^{−1}(z) and Q(z) = L(x) for z = t(x). Using the first-order Taylor series approximation of the scalar function Q(z) about ẑ ≜ t(x̂) we obtain

  Q(z) ≈ Q(ẑ) + ∇_z Q(z)|_{z=ẑ} · (z − ẑ).    (50)

Hence, the proof follows. ∎

Remark 2. The variable z and the function t(x) can be scalar-valued, vector-valued or even matrix-valued. Further, ∇_z Q(z) can be computed using the chain rule and the fact that Q(z) = L(t^{−1}(z)).

The elements of the sufficient statistic of most continuous members of the exponential family are dependent, such as x and xx^T for the normal distribution. Furthermore, some of them are not invertible, such as log |X| for the Wishart distribution. However, the linearization method given in Lemma 1 can be applied to individual summand terms of the log-likelihood function and individual elements of the sufficient statistic. Due to the freedom in choosing the element of the sufficient statistic with respect to which the log-likelihood is linearized, there is no unique solution for the linearization problem. In Example 7 we will use Lemma 1 to linearize a log-likelihood function in various ways and describe the properties of the resulting approximations; a numerical sketch of one of the solutions is given after the example.

Example 7. Consider the scalar latent variable x with a priori PDF p(x) = IGamma(x; α, β) and the measurement y with the likelihood function p(y|x) = N(y; 0, x + σ²). Since the exact posterior density is not analytically tractable, an approximation is needed.

The prior distribution can be written in the exponential family form where the natural parameter is η = (−α − 1, −β) and the sufficient statistic is T(x) = (log x, x^{−1}). The logarithm of the prior distribution is given by

  log p(x) =⁺ −(α + 1) log x − βx^{−1}.    (51)

In order to compute the posterior using the proposed linearization approach, first the log-likelihood function is computed:

  −2L(x, y) =⁺ log(x + σ²) + y²/(x + σ²).    (52)

The linearization of the log-likelihood with respect to the sufficient statistic of the prior distribution can be carried out in several ways, four of which are listed in the following.

Solution 1: We can linearize the right-hand side of (52) with respect to log x using Lemma 1 by letting z ≜ log x and subsequently linearizing with respect to z about ẑ ≜ log x̂, which gives

  −2L(x, y) ≈⁺ (x̂/(x̂ + σ²) − y²x̂/(x̂ + σ²)²) log x,    (53)

where ≈⁺ means approximation up to an additive constant with respect to all sufficient statistics (i.e., the elements of T(x)). The approximate likelihood can be found as

  p̂_1(y|x) ∝ x^{−x̂(x̂ + σ² − y²)/(2(x̂ + σ²)²)},    (54)

which is not integrable with respect to y since x̂ > 0 and p̂_1(y|x) goes to infinity as y goes to infinity.

Solution 2: We can linearize the right-hand side of (52) with respect to x^{−1} by letting z ≜ x^{−1} and subsequently linearizing with respect to z about ẑ ≜ x̂^{−1} using Lemma 1, which gives

  −2L(x, y) ≈⁺ (−x̂²/(x̂ + σ²) + y²x̂²/(x̂ + σ²)²) x^{−1}.    (55)

The approximate likelihood can be found as

  p̂_2(y|x) ∝ exp(−(x̂²(y² − x̂ − σ²)/(2(x̂ + σ²)²)) x^{−1}),    (56)

which is integrable with respect to y. However, the approximate posterior obtains the form

  p̂(x|y) ∝ exp(−(α + 1) log x − (x̂²(y² − x̂ − σ²)/(2(x̂ + σ²)²) + β) x^{−1}),    (57)

which is not integrable with respect to x for a small y such that x̂²(y² − x̂ − σ²)/(2(x̂ + σ²)²) + β < 0.

Solution 3: We can linearize the first term on the right-hand side of (52) with respect to log x and the second term with respect to x^{−1}, about log x̂ and x̂^{−1} respectively, using Lemma 1, which gives

  −2L(x, y) ≈⁺ (x̂/(x̂ + σ²)) log x + (y²x̂²/(x̂ + σ²)²) x^{−1}.    (58)

The approximate likelihood can be found as

  p̂_3(y|x) ∝ x^{−x̂/(2(x̂ + σ²))} exp(−y²x̂²/(2x(x̂ + σ²)²)),    (59)

which is integrable with respect to y. The approximate posterior remains integrable with respect to x for x̂ > 0.

Solution 4: We can first expand the log-likelihood as in

  −2L(x, y) =⁺ log x + log(1 + σ²/x) + y²/(x + σ²)    (60)

and then linearize the last two terms on the right-hand side of (60) with respect to z ≜ x^{−1} about the nominal point ẑ ≜ x̂^{−1} using Lemma 1, which gives

  −2L(x, y) ≈⁺ log x + (σ²x̂/(σ² + x̂) + y²x̂²/(σ² + x̂)²) x^{−1}.    (61)

The approximate likelihood can be found as

  p̂_4(y|x) ∝ x^{−1/2} exp(−(1/(2x))(σ²x̂/(σ² + x̂) + y²x̂²/(σ² + x̂)²)),    (62)

which is integrable with respect to y. The posterior remains integrable with respect to x for x̂ > 0.

Integrability of the approximate posterior is not guaranteed for the first two solutions, while the last two solutions result in an inverse gamma approximate posterior distribution. The parameters of the posterior depend on the linearization point x̂ among other factors. A candidate point for linearization is the prior mean x̂ = β/(α − 1). ∎
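As a numerical illustration of Solution 3 (a sketch with made-up values of α, β, σ² and y, not taken from the paper), the coefficients of log x and x^{−1} in (58) are added to the prior's natural parameter, yielding an inverse gamma approximate posterior:

```python
import numpy as np

# Sketch of Example 7, Solution 3: linearize -2L(x,y) = log(x+s2) + y^2/(x+s2)
# with respect to (log x, 1/x) about xhat, cf. (58), and add the result to the
# natural parameter of the IGamma(alpha, beta) prior. Numbers are illustrative.

alpha, beta = 3.0, 4.0
sigma2 = 1.0
y = 2.5
xhat = beta / (alpha - 1.0)          # linearization point: the prior mean

# Coefficients of log x and 1/x in (58) (note the overall factor -1/2 in L)
a = xhat / (xhat + sigma2)                     # multiplies log x
b = (y**2) * xhat**2 / (xhat + sigma2)**2      # multiplies 1/x

# Prior natural parameter for T(x) = (log x, 1/x) is (-alpha-1, -beta);
# the approximate likelihood contributes (-a/2, -b/2).
alpha_post = alpha + a / 2.0                   # posterior IGamma shape
beta_post = beta + b / 2.0                     # posterior IGamma scale

print("approximate posterior: IGamma(%.3f, %.3f)" % (alpha_post, beta_post))
```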

As pointed out in Theorem 1 and illustrated in Example 7, linearity of the log-likelihood with respect to the sufficient statistic of the prior is a necessary condition for conjugacy. Hence, even after succeeding in linearizing the log-likelihood with respect to T(x), the integrability of the posterior distribution should be examined. Consequently, the linearization should be done skillfully, such that the approximation error is minimized and the approximate posterior remains integrable.

In the rest of this paper we focus on exemplifying the application of the linearization technique in a specific Bayesian inference problem in Section IV and the related numerical simulations in Section V. In this specific example, two of the sufficient statistics are not invertible as required in Lemma 1. Hence, a remedy is suggested and used in the example.

IV. EXTENDED TARGET TRACKING

A. The problem formulation

In the Bayesian extended target tracking (ETT) framework [15] the state of an extended target consists of a kinematic state and a symmetric positive definite matrix representing the extent state. Suppose that the extended target's kinematic state and the target's extent state are denoted by x_k and X_k, respectively. Further, we assume that the measurements Y_k ≜ {y_k^j ∈ R^d}_{j=1}^{m_k} generated by the extended target are independent and identically distributed (conditioned on x_k and X_k) as y_k^j ∼ N(y_k^j; Hx_k, sX_k + R); see Fig. 3. The following measurement likelihood, which will be used in this paper, was first introduced in [16]:

  p(Y_k | x_k, X_k) = ∏_{j=1}^{m_k} N(y_k^j; Hx_k, sX_k + R),    (63)

where
• m_k is the number of measurements at time k,
• s is a real positive scalar constant,
• R is the measurement noise covariance.

Figure 3: A probabilistic graphical model for extended targets with kinematic states x_k, extent states X_k and measurements y_k, where m_k > 0 measurements are received at each time k.

We assume the following prior form on the kinematic state and target extent state:

  p(x_k, X_k | Y_{0:k−1}) = N(x_k; x_{k|k−1}, P_{k|k−1}) IW(X_k; ν_{k|k−1}, V_{k|k−1})    (64)

where x_k ∈ R^n, X_k ∈ R^{d×d}_{>0} and the double time index "a|b" is read "at time a using measurements up to and including time b".

The inverse Wishart distribution IW(X; ν, V) we use in this work is given in the following form:

  IW(X; ν, V) ≜ |V|^{(ν−d−1)/2} exp(−½ tr(V X^{−1})) / (2^{(ν−d−1)d/2} Γ_d((ν − d − 1)/2) |X|^{ν/2}),    (65)

where X is a symmetric positive definite matrix of dimension d × d, ν > 2d is the scalar degrees of freedom and V is a symmetric positive definite matrix of dimension d × d called the scale matrix. This form of the inverse Wishart distribution is the same one used in the well-known reference [17].

No analytical solution for the posterior exists. The reason for the lack of an analytical solution for the posterior corresponding to the likelihood (63) and the prior (64) is both the covariance addition in the Gaussian distributions on the right-hand side of (63) and the intertwined nature of the kinematic state and extent state in the likelihood. In order to get rid of the covariance addition, latent variables are defined in [18] and the problem of intertwined kinematic and extent states is solved using a variational approximation. In [16] the authors design an unbiased estimator. Both of these measurement updates can be used in a recursive estimation framework, which requires the posterior to be in the same form as the prior, as in

  p(x_k, X_k | Y_{0:k}) ≈ N(x_k; x_{k|k}, P_{k|k}) IW(X_k; ν_{k|k}, V_{k|k}).    (66)

B. Solution proposed by Feldmann et al. [16]

In [16] the authors cleverly design a measurement update based on unbiasedness properties for the measurement model (63) to calculate x_{k|k}, P_{k|k}, ν_{k|k} and V_{k|k} in the posterior (66).

The kinematic state x_k is updated as follows:

  x_{k|k} = x_{k|k−1} + K_k (ȳ_k − Hx_{k|k−1})    (67)
  P_{k|k} = P_{k|k−1} − K_k S_k K_k^T    (68)

where

  K_k = P_{k|k−1} H^T S_k^{−1}    (69)
  S_k = H P_{k|k−1} H^T + (sX_{k|k−1} + R)/m_k    (70)
  X_{k|k−1} = V_{k|k−1} / (ν_{k|k−1} − 2d − 2)    (71)

and

  ȳ_k ≜ (1/m_k) ∑_{j=1}^{m_k} y_k^j    (72)

is the mean of the measurements at time k.

The extent state update given in [16] is of the following form:

  ν_{k|k} = ν_{k|k−1} + m_k    (73)
  V_{k|k} = V_{k|k−1} + M_k^{FFK}    (74)

where M_k^{FFK} is a given positive definite update matrix and the superscript ·^{FFK}, used to ease presentation, consists of the authors' initials in [16]. Before describing the construction of the matrix M_k^{FFK} of [16], let us define the measurement spread 𝒴_k (denoted 𝒴_k to distinguish it from the measurement set Y_k) around the predicted measurement Hx_{k|k−1} as follows:

  𝒴_k ≜ (1/m_k) ∑_{j=1}^{m_k} (y_k^j − Hx_{k|k−1})(y_k^j − Hx_{k|k−1})^T.    (75)
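The kinematic update (67)–(72) is a Kalman update against the measurement mean with an inflated innovation covariance. A sketch (Python/NumPy; all numbers and the stand-in measurements are illustrative):

```python
import numpy as np

# Sketch of the FFK kinematic update (67)-(72); all numbers illustrative.
d = 2
H = np.hstack([np.eye(d), np.zeros((d, d))])                # H = [I2, 0]
s, R = 0.25, 100.0**2 * np.eye(d)

x_pred = np.array([0.0, 0.0, 100.0, 100.0])
P_pred = np.diag([50.0**2, 50.0**2, 10.0**2, 10.0**2])
nu_pred, V_pred = 20.0, 10.0 * np.eye(d)

Y = np.random.default_rng(2).normal(size=(15, d)) * 300.0   # stand-in measurements
m = Y.shape[0]
y_bar = Y.mean(axis=0)                                      # (72)

X_pred = V_pred / (nu_pred - 2 * d - 2)                     # (71)
S = H @ P_pred @ H.T + (s * X_pred + R) / m                 # (70)
K = P_pred @ H.T @ np.linalg.inv(S)                         # (69)
x_post = x_pred + K @ (y_bar - H @ x_pred)                  # (67)
P_post = P_pred - K @ S @ K.T                               # (68)
print(x_post)
```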

Feldmann et al. first write the measurement spread 𝒴_k as

  𝒴_k = 𝒴_k^1 + 𝒴_k^2.    (76)

The summands 𝒴_k^1 and 𝒴_k^2 are defined as

  𝒴_k^1 ≜ (ȳ_k − Hx_{k|k−1})(ȳ_k − Hx_{k|k−1})^T,    (77)
  𝒴_k^2 ≜ (1/m_k) ∑_{j=1}^{m_k} (y_k^j − ȳ_k)(y_k^j − ȳ_k)^T.    (78)

When we take the conditional expected values of 𝒴_k^1 and 𝒴_k^2, we obtain

  𝒴̄_k^1 ≜ E[𝒴_k^1 | X_k = X_{k|k−1}] = H P_{k|k−1} H^T + (sX_{k|k−1} + R)/m_k,    (79)
  𝒴̄_k^2 ≜ E[𝒴_k^2 | X_k = X_{k|k−1}] = ((m_k − 1)/m_k)(sX_{k|k−1} + R).    (80)

The matrix M_k^{FFK} is then proposed to be

  M_k^{FFK} = X_{k|k−1}^{1/2} (𝒴̄_k^1)^{−1/2} 𝒴_k^1 (𝒴̄_k^1)^{−1/2} X_{k|k−1}^{1/2} + (m_k − 1) X_{k|k−1}^{1/2} (𝒴̄_k^2)^{−1/2} 𝒴_k^2 (𝒴̄_k^2)^{−1/2} X_{k|k−1}^{1/2}.    (81)

The conditional expected value of M_k^{FFK} is

  E[M_k^{FFK} | X_k = X_{k|k−1}] = m_k X_{k|k−1},    (82)

which results in an unbiased estimator.

C. ETT via log-likelihood linearization

A new measurement update is obtained by performing linearization of the logarithm of the likelihood function (63) with respect to the sufficient statistic of the prior distribution (64). The sufficient statistics of the prior distribution (64) are given as follows:

  T(x_k, X_k) = (x_k, x_k x_k^T, X_k^{−1}, log |X_k|).    (83)

As seen above, the second and fourth elements of T(·) are not invertible. In the following lemma, using log-likelihood linearization (LLL) via a first order Taylor series expansion, we approximate the likelihood function (63) to obtain a conjugate likelihood (with respect to the prior (64)) with the sufficient statistics given in (83).

Lemma 2. The likelihood ∏_{j=1}^m N(y^j; Hx, sX + R) can be approximated up to a multiplicative constant² by a first order Taylor series approximation around the nominal points X̂ (for the variable X) and x̂ (for the variable x) as

  ∏_{j=1}^m N(y^j; Hx, sX + R) ≈ (∏_{j=1}^m N(y^j; Hx, sX̂ + R)) IW(X; m, M)    (84)

where

  M = mX̂ + msX̂(sX̂ + R)^{−1} (𝒴 − (sX̂ + R)) (sX̂ + R)^{−1} X̂    (85)

and 𝒴 ≜ (1/m) ∑_{j=1}^m (y^j − Hx̂)(y^j − Hx̂)^T is the measurement spread around Hx̂.

²That is, constant with respect to the variables X and x.

Proof. The proof is given in Appendix A for the sake of clarity. ∎

Lemma 2 states that the likelihood (63) can be approximately factorized into two independent likelihood terms corresponding to the kinematic and extent states. This type of factorization gives independent measurement updates for the kinematic state and the extent state. For this purpose, one can set X̂ = V_{k|k−1}/(ν_{k|k−1} − 2d − 2) and x̂ = x_{k|k−1} in order to find the factors of Lemma 2. Using the conjugate likelihood factors in (84), the posterior density p(x_k, X_k | Y_{0:k}) is given in the form of (66) with the update parameters given below:

  x_{k|k} = P_{k|k} (P_{k|k−1}^{−1} x_{k|k−1} + m_k H^T (sX_{k|k−1} + R)^{−1} ȳ_k),    (86)
  P_{k|k} = (P_{k|k−1}^{−1} + m_k H^T (sX_{k|k−1} + R)^{−1} H)^{−1},    (87)

where

  X_{k|k−1} = V_{k|k−1} / (ν_{k|k−1} − 2d − 2),    (88)
  ȳ_k = (1/m_k) ∑_{j=1}^{m_k} y_k^j.    (89)

Note that the kinematic updates are the same as the kinematic updates proposed in [16]. The extent state is updated as given in

  ν_{k|k} = ν_{k|k−1} + m_k,    (90)
  V_{k|k} = V_{k|k−1} + M_k^{LLL},    (91)

where

  M_k^{LLL} = m_k X_{k|k−1} + m_k s X_{k|k−1} (sX_{k|k−1} + R)^{−1} (𝒴_k − (sX_{k|k−1} + R)) (sX_{k|k−1} + R)^{−1} X_{k|k−1}.    (92)

This form of the update suggests that the matrix 𝒴_k serves as a pseudo-measurement for the quantity sX_k + R. If 𝒴_k is larger than the current predicted estimate of sX_k + R (i.e., sX_{k|k−1} + R), then a positive matrix quantity is added to the statistic V_{k|k−1}, and vice versa.

We here note that

  E[𝒴_k | X_k = X_{k|k−1}] = H P_{k|k−1} H^T + sX_{k|k−1} + R,    (93)

where the expectation is taken over all the measurements at time k and the estimate x_{k|k−1}. Hence we conclude that the expected value of the second term on the right-hand side of (92) is always positive, which makes the current update biased. Another drawback of the update is that the uncertainty of the predicted estimate x_{k|k−1} does not affect the update. In order to solve both problems we modify the quantity M_k^{LLL} as follows:

  M_k^{ULL} ≜ m_k X_{k|k−1} + m_k s X_{k|k−1} (H P_{k|k−1} H^T + sX_{k|k−1} + R)^{−1}
           × (𝒴_k − (H P_{k|k−1} H^T + sX_{k|k−1} + R))
           × (H P_{k|k−1} H^T + sX_{k|k−1} + R)^{−1} X_{k|k−1}.    (94)

Note that in the modified update the difference

  𝒴_k − (H P_{k|k−1} H^T + sX_{k|k−1} + R)    (95)

has zero conditional mean, which makes the update unbiased, and the terms (H P_{k|k−1} H^T + sX_{k|k−1} + R)^{−1} decrease with increasing uncertainty in the predicted estimate x_{k|k−1}, which makes the update term smaller. In the following we will call the proposed unbiased measurement update, based on linearization of the likelihood in the log domain, ULL for ease of presentation.
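A sketch of the ULL extent update (90)–(94) in Python/NumPy (the values and stand-in measurements are illustrative; `spread` stands for 𝒴_k computed as in (75)):

```python
import numpy as np

# Sketch of the ULL extent update (90)-(94); values are illustrative.
def ull_extent_update(nu_pred, V_pred, x_pred, P_pred, Y, H, s, R, d):
    m = Y.shape[0]
    X_pred = V_pred / (nu_pred - 2 * d - 2)                  # (88)
    resid = Y - H @ x_pred                                   # y_k^j - H x_{k|k-1}
    spread = resid.T @ resid / m                             # (75)
    C = H @ P_pred @ H.T + s * X_pred + R                    # cf. (93)
    Cinv = np.linalg.inv(C)
    M_ull = m * X_pred + m * s * X_pred @ Cinv @ (spread - C) @ Cinv @ X_pred  # (94)
    return nu_pred + m, V_pred + M_ull                       # (90), (91)

d = 2
H = np.hstack([np.eye(d), np.zeros((d, d))])
Y = np.random.default_rng(3).normal(size=(12, d)) * 200.0    # stand-in data
nu_post, V_post = ull_extent_update(
    nu_pred=20.0, V_pred=10.0 * np.eye(d),
    x_pred=np.zeros(4), P_pred=np.diag([2500.0, 2500.0, 100.0, 100.0]),
    Y=Y, H=H, s=0.25, R=400.0 * np.eye(d), d=d)
print(nu_post)
print(V_post)
```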

V. NUMERICAL SIMULATION

In this section the newly proposed measurement update, ULL, is compared with the update based on variational Bayes [18] and the update proposed by Feldmann et al. in [16], which will be referred to as VB and FFK, respectively. In Section V-A, we use a simulation scenario which is partly based on the simulation scenario described in [18, Section IV], which in turn is based on [19, Section 6.1]. In Section V-B an ETT scenario based on the numerical simulation described in [16, Section VI-B] is presented.

Figure 4: Extent estimation error for FFK, VB and ULL with respect to the optimal Bayesian solution. Solid lines show the mean error E_X and shaded areas of respective colors illustrate the interval between the fifth and the ninety-fifth percentiles.

A. Monte-Carlo simulations

A planar extended target in the two-dimensional space whose true kinematic state x_k^0 and extent state X_k^0 are

  x_k^0 = [0 m, 0 m, 100 m/s, 100 m/s]^T    (96a)
  X_k^0 = E_k Diag([300² m², 200² m²]) E_k^T    (96b)

is considered. Here, E_k ≜ [e_1, e_2] is a 2 × 2 matrix whose columns e_1 and e_2 are the normalized eigenvectors of X_k^0, which are e_1 ≜ (1/√2)[1, 1]^T and e_2 ≜ (1/√2)[1, −1]^T. For the extended target with these true parameters, we conduct Monte Carlo (MC) simulations to quantify the differences between the measurement updates FFK, ULL and VB. For each MC run we assume that the initial predicted target density for all three methods is the same and has the following structure:

  p(x_k, X_k | Y_{0:k−1}) = N(x_k; x_{k|k−1}^j, P_{k|k−1}) IW(X_k; ν_{k|k−1}^j, V_{k|k−1}^j)    (97)

where the superscript j indicates the jth MC run. The quantities x_{k|k−1}^j, P_{k|k−1} for the kinematic state are selected as

  x_{k|k−1}^j ∼ N(·; x_k^0, P_{k|k−1}/α_k),    (98a)
  P_{k|k−1} = Diag([50², 50², 10², 10²]),    (98b)

where the scalar α_k is the scaling parameter for the covariance of the density in (98a), used to adjust the distribution of x_{k|k−1}^j in the MC simulation study. The quantities ν_{k|k−1}^j and V_{k|k−1}^j are selected as

  ν_{k|k−1}^j = max(7, ν ∼ Poisson(100)),    (99a)
  V_{k|k−1}^j | ν_{k|k−1}^j ∼ W(·; δ_k, ((ν_{k|k−1}^j − 2d − 2)/δ_k) X_k^0),    (99b)

where the functions Poisson(λ) and W(·; n, Ψ) represent the Poisson density with expected value λ and the Wishart density with degrees of freedom n and scale matrix Ψ. The scalar δ_k controls the variance of the Wishart density in this study. For the jth MC run, m_k^j measurements are generated according to

  m_k^j = max(2, m ∼ Poisson(10)),    (100a)
  y_k^{j,i} ∼ N(·; Hx_k^0, sX_k^0 + R),    (100b)

where s = 0.25, R = 100² I_2 m² and H = [I_2, 0_{2×2}]. The matrices I_2 and 0_{2×2} are the identity matrix and zero matrix of size 2 × 2, respectively. The performance of the measurement updates is compared for various levels of accuracy of the initial predicted target density in view of the optimal Bayesian solution. To this end, the optimal solution is numerically computed via importance sampling using the predicted density as the proposal density. A total of 10⁵ samples are generated from the predicted density and the normalized importance weights are calculated from the likelihood function given in (63). Using the importance weights, the posterior expected values of the kinematic state x_k and extent state X_k are calculated. The resulting expected values are referred to as x_{k|k}^{opt} and X_{k|k}^{opt}, respectively.

To change the level of accuracy of the predicted target density, the parameters α_k are selected on a linear grid of 40 values between 1 and 50, while the values of δ_k are selected on a logarithmic grid of 40 values between 2 and 1000. For each pair of α_k and δ_k out of the 40 pairs, N_MC = 1000 runs have been made by selecting the parameters x_{k|k−1}^j, P_{k|k−1}, ν_{k|k−1}^j, V_{k|k−1}^j and Y_k^j as above, and the estimates x_{k|k}^{U,j} and X_{k|k}^{U,j} for U ∈ {FFK, VB, ULL} are calculated.³

³In each MC run, 20 iterations were performed for the VB solution, although the VB solution would usually converge within the first 5 iterations.

Using the results of the MC runs, the average error metrics E_x^U and E_X^U for U ∈ {FFK, VB, ULL} for the kinematic state and the extent state, respectively, are calculated as follows:

  E_x^U ≜ ((1/(d N_MC)) ∑_{j=1}^{N_MC} ||H(x_{k|k}^{U,j} − x_{k|k}^{opt,j})||²)^{1/2},    (101)
  E_X^U ≜ ((1/(d² N_MC)) ∑_{j=1}^{N_MC} tr²(X_{k|k}^{U,j} − X_{k|k}^{opt,j}))^{1/4}.    (102)

Figure 5: Kinematic state estimation error for FFK, VB and ULL with respect to the optimal Bayesian solution. Solid lines show the mean error E_x and shaded areas of respective colors illustrate the interval between the fifth and the ninety-fifth percentiles. Since FFK and ULL use identical kinematic state updates (for identical X_{k|k−1}), their respective graphs coincide.

Figure 6: Trajectory of an extended target in the x–y plane (coordinates in km). For every third scan, the true extended target and its center are shown.
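A sketch of the error metrics (101)–(102) (Python/NumPy; the estimate arrays are placeholders, and the square/fourth-root forms follow the reconstruction above):

```python
import numpy as np

# Sketch of the error metrics (101)-(102). x_hat/x_opt have shape (N_MC, n),
# X_hat/X_opt have shape (N_MC, d, d); contents here are placeholders.
def error_metrics(x_hat, x_opt, X_hat, X_opt, H):
    n_mc, d = X_hat.shape[0], X_hat.shape[1]
    dx = (H @ (x_hat - x_opt).T).T                       # position errors
    E_x = np.sqrt(np.sum(dx**2) / (d * n_mc))            # (101)
    tr = np.trace(X_hat - X_opt, axis1=1, axis2=2)
    E_X = (np.sum(tr**2) / (d**2 * n_mc)) ** 0.25        # (102)
    return E_x, E_X

rng = np.random.default_rng(4)
H = np.hstack([np.eye(2), np.zeros((2, 2))])
x_opt = rng.normal(size=(1000, 4)); x_hat = x_opt + rng.normal(size=(1000, 4))
X_opt = np.tile(np.eye(2), (1000, 1, 1)); X_hat = X_opt * 1.1
print(error_metrics(x_hat, x_opt, X_hat, X_opt, H))
```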

Table III: Comparison of measurement updates for ETT with respect to cycle time and estimation errors. The data are obtained from the single ETT simulation scenario. Since the VB update diverged several times, errors exceeding 24 m are set to 24 m to make the quantitative comparison meaningful.

Update | E_x [m] (Mean ± St.Dev.) | E_X [m] (Mean ± St.Dev.) | Cycle time [s] (Mean ± St.Dev.)
FFK    | 15.4581 ± 0.9943         | 19.4354 ± 0.6894         | 0.1627 ± 0.0000
VB     | 16.3447 ± 1.1913         | 19.8340 ± 0.7764         | 3.2547 ± 0.0002
ULL    | 15.5204 ± 0.9685         | 19.2356 ± 0.6680         | 0.1293 ± 0.0000

Figure 7: Histogram of the root mean square error of the kinematic state E_x^{U,j} for U ∈ {FFK, VB, ULL} in a single ETT scenario.

The errors are illustrated in Fig. 4 and Fig. 5. Since FFK and ULL use identical kinematic state updates, their respective graphs coincide. VB has the lowest extent state estimation error. ULL has a lower estimation error compared to FFK in the simulation scenario presented here. When the simulation is repeated with R = 50²I_2, FFK performs better than ULL, while VB maintains its superior estimation performance in terms of E_X.

B. Single extended target tracking scenario

The performance of the measurement updates is evaluated in a single extended target tracking scenario where an extended target follows a trajectory in a two-dimensional space illustrated in Fig. 6. In this simulation scenario, an elliptical extended target with diameters 340 and 80 meters moves with a constant velocity of about 50 km/h. The extended target's true initial kinematic state x_1^0 and true initial extent state X_1^0 are as follows:

  x_1^0 = [0 m, 0 m, 9.8 m/s, −9.8 m/s]^T,    (103a)
  X_1^0 = E_k Diag([170² m², 40² m²]) E_k^T,    (103b)

where E_k is the 2 × 2 matrix whose columns are the normalized eigenvectors of X_1^0, which are e_1 ≜ (1/√2)[−1, 1]^T and e_2 ≜ (1/√2)[1, 1]^T, i.e., E_k ≜ [e_1, e_2].

We conduct an MC study to quantify the difference between the measurement updates in a single extended target tracking scenario. The measurement set Y_k^j = {y_k^{j,i}}_{i=1}^{m_k^j} for the jth MC run is generated according to (100), where s = 0.25, R = 20² I_2 m², H = [I_2, 0_{2×2}] and 1 ≤ k ≤ 181.

In the filters, the kinematic state vector consists of the position and the velocity, x = [p_x, p_y, ṗ_x, ṗ_y]^T, which evolves according to the discrete-time constant velocity model in two-dimensional Cartesian coordinates [20], where p(x_{k+1}|x_k) = N(x_{k+1}; A_k x_k, Q_k) and

  A_k = [I_2, τI_2; 0_2, I_2],  Q_k = σ_ν² [(τ⁴/4) I_2, (τ³/2) I_2; (τ³/2) I_2, τ² I_2],

with τ = 10 s and σ_ν = 0.1 m/s².

In the filters the initial prior density for the kinematic state and extent state is

  p(x_1, X_1) = N(x_1; x_{1|0}^j, P_{1|0}) IW(X_1; ν_{1|0}^j, V_{1|0}^j)    (104)

where

  x_{1|0}^j ∼ N(·; x_1^0, P_{1|0}/α_0),    (105a)
  P_{1|0} = Diag([50², 50², 10², 10²]),    (105b)
  ν_{1|0}^j = max(7, ν ∼ Poisson(10)),    (105c)
  V_{1|0}^j | ν_{1|0}^j ∼ W(·; δ_0, ((ν_{1|0}^j − 2d − 2)/δ_0) X_1^0),    (105d)
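A sketch of the constant-velocity model matrices used in the filters (Python/NumPy, using the stated τ and σ_ν; the initial state values are illustrative):

```python
import numpy as np

# Constant velocity model in 2D Cartesian coordinates, cf. Section V-B.
tau, sigma_nu = 10.0, 0.1
I2, Z2 = np.eye(2), np.zeros((2, 2))

A = np.block([[I2, tau * I2],
              [Z2, I2]])
Q = sigma_nu**2 * np.block([[tau**4 / 4 * I2, tau**3 / 2 * I2],
                            [tau**3 / 2 * I2, tau**2 * I2]])

# One time-update step of the kinematic state, cf. (106)-(107) below
x, P = np.zeros(4), np.diag([2500.0, 2500.0, 100.0, 100.0])
x_pred = A @ x
P_pred = A @ P @ A.T + Q
print(x_pred, P_pred.diagonal())
```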

and where α_0 = 10 and δ_0 = 5.

1) Time update: The expression for the time update of the target's kinematic state in the filter follows the standard Kalman filter time update equations,

  x_{k+1|k}^j = A_k x_{k|k}^j,    (106)
  P_{k+1|k}^j = A_k P_{k|k}^j A_k^T + Q_k.    (107)

The extent state is assumed to be slowly varying. Hence, an exponential forgetting strategy [15], [21], [22] can be used to account for possible changes in the parameters over time. In [23] it is shown that using the exponential forgetting factor produces the maximum entropy distribution in the time update for processes which are slowly varying with unknown dynamics but bounded by a Kullback-Leibler divergence (KLD) constraint. The forgetting factor is applied to the parameters of the inverse Wishart distribution as follows:

  ν_{k+1|k}^j = exp(−τ/τ_0) ν_{k|k}^j,    (108)
  V_{k+1|k}^j = ((ν_{k+1|k}^j − 2d − 2)/(ν_{k|k}^j − 2d − 2)) V_{k|k}^j.    (109)

The exponential decay time constant was selected as τ_0 = 15.

2) Evaluation of filters: 50 000 MC runs are performed where the parameters x_{1|0}^j, P_{1|0}^j, ν_{1|0}^j, V_{1|0}^j and Y_k^j for 1 ≤ k ≤ 181 are selected as above. Using the results of the MC runs, the average error metrics E_x^{U,j} and E_X^{U,j} for U ∈ {FFK, VB, ULL} and the jth MC run for the target's kinematic state and extent state, respectively, are calculated as follows:

  E_x^{U,j} ≜ ((1/(dK)) ∑_{k=1}^K ||H(x_{k|k}^{U,j} − x_k^0)||²)^{1/2},    (110a)
  E_X^{U,j} ≜ ((1/(d²K)) ∑_{k=1}^K tr²(X_{k|k}^{U,j} − X_k^0))^{1/4}.    (110b)

The errors (mean ± standard deviation) and the cycle times (mean ± standard deviation) for the evaluated measurement update rules are given in Table III. The histograms of the distribution of the errors defined in (110) are given in Fig. 7 and Fig. 8. The estimation errors of FFK and ULL are comparable, but the proposed update, ULL, is less computationally expensive.

Figure 8: Histogram of the average extent state estimation error E_X^{U,j} for U ∈ {FFK, VB, ULL} in a single ETT scenario.

VI. CONCLUSION

A Bayesian inference technique based on Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is devised for the case where the prior distribution belongs to the exponential family of distributions and is continuous. The logarithm of the likelihood function is linearized with respect to the sufficient statistic of the prior distribution in the exponential family such that the posterior obtains the same exponential family form as the prior. A Taylor series approximation is used for the linearization with respect to the sufficient statistic. However, the Taylor series approximation is a local approximation which is used here to approximate the posterior over its entire support. In spite of this inherent weakness, the proposed algorithm performs well in the numerical simulations that are presented here for extended target tracking. Furthermore, there are numerous successful applications of the extended Kalman filter, which is a special case of the proposed algorithm, in theory and practice. When a Bayesian posterior estimate is needed with a limited computational budget which prohibits the use of Monte-Carlo methods or even a variational Bayes approximation, as in real-time filtering problems, the proposed algorithm offers a good alternative.

The comparison of possible choices for the linearization point and of linearization methods with respect to the sufficient statistic are among the future research problems.

VII. ACKNOWLEDGMENTS

The authors would like to thank Henri Nurminen for proofreading and providing comments on an earlier version of the manuscript.

REFERENCES

[1] M. Jordan, Learning in Graphical Models. MIT Press, 1999.
[2] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," Mach. Learn., vol. 37, no. 2, pp. 183–233, Nov. 1999. [Online]. Available: http://dx.doi.org/10.1023/A:1007665907178
[3] T. Minka, "Expectation propagation for approximate Bayesian inference," in Proceedings of the Seventeenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-01). San Francisco, CA: Morgan Kaufmann, 2001, pp. 362–369.
[4] H. Rue, S. Martino, and N. Chopin, "Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations," Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 71, no. 2, pp. 319–392, 2009.
[5] J. A. Nelder and R. W. M. Wedderburn, "Generalized linear models," Journal of the Royal Statistical Society. Series A (General), vol. 135, no. 3, pp. 370–384, 1972.
[6] W. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, pp. 97–109, 1970.
[7] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984.
[8] M. Wainwright and M. Jordan, Graphical Models, Exponential Families, and Variational Inference, ser. Foundations and Trends in Machine Learning. Now Publishers, 2008.
[9] T. M. Cover and J. Thomas, Elements of Information Theory. John Wiley and Sons, 2006.
[10] W. Hennevogl, L. Fahrmeir, and G. Tutz, Multivariate Statistical Modelling Based on Generalized Linear Models, ser. Springer Series in Statistics. Springer New York, 2001.
[11] G. Smith, S. Schmidt, and L. McGee, Application of Statistical Filter Theory to the Optimal Estimation of Position and Velocity on Board a Circumlunar Vehicle, ser. NASA Technical Report. National Aeronautics and Space Administration, 1962.
[12] M. I. Jordan, "Lecture notes on conjugacy and exponential family," 2014.
[13] R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME–Journal of Basic Engineering, vol. 82, no. Series D, pp. 35–45, 1960.
[14] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, ser. Intelligent Robotics and Autonomous Agents. MIT Press, 2005.

[15] J. W. Koch, "Bayesian approach to extended object and cluster tracking using random matrices," IEEE Trans. Aerosp. Electron. Syst., vol. 44, no. 3, pp. 1042–1059, Jul. 2008.
[16] M. Feldmann, D. Fränken, and W. Koch, "Tracking of extended objects and group targets using random matrices," IEEE Trans. Signal Process., vol. 59, no. 4, pp. 1409–1420, Apr. 2011.
[17] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions. Boca Raton, FL: Chapman & Hall/CRC, 2000.
[18] U. Orguner, "A variational measurement update for extended target tracking with random matrices," IEEE Trans. Signal Process., vol. 60, no. 7, pp. 3827–3834, 2012.
[19] M. Baum, M. Feldmann, D. Fraenken, U. D. Hanebeck, and W. Koch, "Extended object and group tracking: A comparison of random matrices and random hypersurface models," in GI Jahrestagung (2), ser. LNI, K.-P. Fähnrich and B. Franczyk, Eds., vol. 176. GI, 2010, pp. 904–906.
[20] X. Li and V. Jilkov, "Survey of maneuvering target tracking. Part I. Dynamic models," IEEE Trans. Aerosp. Electron. Syst., vol. 39, no. 4, pp. 1333–1364, Oct. 2003.
[21] R. Kulhavy and M. B. Zarrop, "On a general concept of forgetting," International Journal of Control, vol. 58, no. 4, pp. 905–924, 1993.
[22] C. M. Carvalho and M. West, "Dynamic matrix-variate graphical models," Bayesian Anal., vol. 2, pp. 69–98, 2007.
[23] E. Özkan, V. Smidl, S. Saha, C. Lundquist, and F. Gustafsson, "Marginalized adaptive particle filtering for non-linear models with unknown time-varying noise parameters," to appear in Automatica.

APPENDIX

A. Proof of Lemma 2

The logarithm of the likelihood ∏_{j=1}^m N(y^j; Hx, sX + R) is given as

  −2 ∑_{j=1}^m log N(y^j; Hx, sX + R)
  =⁺ m log |sX + R| + ∑_{j=1}^m (y^j − Hx)^T (sX + R)^{−1} (y^j − Hx)    (111)
  = m log |X| + m log |sI + X^{−1/2} R X^{−1/2}| + ∑_{j=1}^m (y^j − Hx)^T (sX + R)^{−1} (y^j − Hx)    (112)
  = m log |X| + m log |sI + X^{−1/2} R X^{−1/2}| + ∑_{j=1}^m tr((y^j − Hx)(y^j − Hx)^T (sX + R)^{−1}),    (113)

where the sign =⁺ denotes an equality up to an additive constant. We approximate the right-hand side of (113) by a function that is linear in the sufficient statistic

  T(x, X) = (x, xx^T, log |X|, X^{−1}).    (114)

As aforementioned, the statistics xx^T and log |X| are not invertible. The remedy we suggest here is to make a first order Taylor series expansion of (113) with respect to the new variables

  Z ≜ X^{−1},    (115a)
  N^j ≜ (y^j − Hx)(y^j − Hx)^T, j = 1, …, m,    (115b)

around the corresponding nominal values

  Ẑ ≜ X̂^{−1},    (116a)
  N̂^j ≜ (y^j − Hx̂)(y^j − Hx̂)^T, j = 1, …, m.    (116b)

Writing (113) in terms of Z and N ≜ [N^1, …, N^m], we obtain

  −2 ∑_{j=1}^m log N(y^j; Hx, sX + R)
  =⁺ m log |Z^{−1}| + m log |sI + Z^{1/2} R Z^{1/2}| + ∑_{j=1}^m tr(N^j (sZ^{−1} + R)^{−1}).    (117)

If we make a first order Taylor series expansion of the second and third terms on the right-hand side of (117) with respect to the variables Z and N around Ẑ and N̂, we obtain

  −2 ∑_{j=1}^m log N(y^j; Hx, sX + R) ≈⁺ m log |Z^{−1}| + m tr((sR^{−1} + Ẑ)^{−1} Z)
  + ∑_{j=1}^m tr(N^j (sẐ^{−1} + R)^{−1}) + s ∑_{j=1}^m tr(Ẑ^{−1}(sẐ^{−1} + R)^{−1} N̂^j (sẐ^{−1} + R)^{−1} Ẑ^{−1} Z),    (118)

where ≈⁺ represents an approximation up to an additive constant. The first order Taylor series approximations for the scalar-valued functions of matrix variables of interest are given in Appendix B. We now substitute the relations (115) and (116) back into (118) to obtain the approximation

  −2 ∑_{j=1}^m log N(y^j; Hx, sX + R) ≈⁺ m log |X| + m tr((sR^{−1} + X̂^{−1})^{−1} X^{−1})
  + ∑_{j=1}^m tr((y^j − Hx)(y^j − Hx)^T (sX̂ + R)^{−1})
  + s ∑_{j=1}^m tr(X̂(sX̂ + R)^{−1} (y^j − Hx̂)(y^j − Hx̂)^T (sX̂ + R)^{−1} X̂ X^{−1})    (119)

  = m log |X| + m tr((sR^{−1} + X̂^{−1})^{−1} X^{−1}) + ∑_{j=1}^m (y^j − Hx)^T (sX̂ + R)^{−1} (y^j − Hx)
  + s tr(X̂(sX̂ + R)^{−1} (∑_{j=1}^m (y^j − Hx̂)(y^j − Hx̂)^T) (sX̂ + R)^{−1} X̂ X^{−1}).    (120)

Rearranging the terms in (120), dividing both sides by −2 and then taking the exponential of both sides, we can see that

  ∏_{j=1}^m N(y^j; Hx, sX + R) ≈ˣ (∏_{j=1}^m N(y^j; Hx, sX̂ + R)) IW(X; m, M)    (121)

where the sign ≈ˣ denotes an approximation up to a multiplicative constant and M is given as

  M ≜ m(sR^{−1} + X̂^{−1})^{−1} + sX̂(sX̂ + R)^{−1} (∑_{j=1}^m (y^j − Hx̂)(y^j − Hx̂)^T) (sX̂ + R)^{−1} X̂.    (122)

Another form of the inverse Wishart parameter M can be written using the matrix inversion lemma as follows:

  M = mX̂ + msX̂(sX̂ + R)^{−1} ((1/m) ∑_{j=1}^m (y^j − Hx̂)(y^j − Hx̂)^T − (sX̂ + R)) (sX̂ + R)^{−1} X̂.    (123)

The proof is complete. ∎
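The equivalence of (122) and (123) can be checked numerically; a sketch (Python/NumPy, with illustrative values):

```python
import numpy as np

# Numeric check that (122) and (123) give the same inverse Wishart parameter M.
rng = np.random.default_rng(5)
d, m, s = 2, 10, 0.25
Xh = np.array([[4.0, 1.0], [1.0, 3.0]])          # nominal extent X-hat
R = np.array([[2.0, 0.0], [0.0, 2.0]])
N_sum = sum(np.outer(e, e) for e in rng.normal(size=(m, d)) * 2.0)

C = np.linalg.inv(s * Xh + R)
M_122 = m * np.linalg.inv(s * np.linalg.inv(R) + np.linalg.inv(Xh)) \
        + s * Xh @ C @ N_sum @ C @ Xh
M_123 = m * Xh + m * s * Xh @ C @ (N_sum / m - (s * Xh + R)) @ C @ Xh
assert np.allclose(M_122, M_123)
print(M_122)
```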

B. First Order Taylor Series Approximations for Some Scalar Valued Functions of Matrix Variables

In this section, first order Taylor series approximations of some scalar-valued functions of matrix arguments are studied. The two functions we consider are

• Function 1: f_1(Z) ≜ log |sI + Z^{1/2} R Z^{1/2}|,    (124)
• Function 2: f_2(Z) ≜ tr(N(sZ^{−1} + R)^{−1}).    (125)

Both functions can be approximated with a first order Taylor series approximation around a nominal point Ẑ as follows:

  f_k(Z) ≈ f_k(Ẑ) + tr(F_k^T (Z − Ẑ))    (126)

for k = 1, 2, where F_k is defined as

  [F_k]_{ij} ≜ ∂f_k/∂z_{ij} |_{Z=Ẑ}    (127)

for 1 ≤ i, j ≤ d, where the notation [·]_{ij} denotes the element in the ith row and jth column of the argument matrix and z_{ij} ≜ [Z]_{ij}. Hence, in order to construct the required Taylor series approximation, we only need to calculate the matrices F_1 and F_2.

1) Calculation of F_1: We first write f_1(·) as

  f_1(Z) ≜ log |sI + Z^{1/2} R Z^{1/2}|    (128)
        = log |Z| + log |sZ^{−1} + R|.    (129)

Now, we can calculate the related derivative using two well-known matrix derivatives,

  ∂ ln |U| / ∂x = tr(U^{−1} ∂U/∂x),    (130)
  ∂U^{−1}/∂x = −U^{−1} (∂U/∂x) U^{−1}.    (131)

We obtain

  ∂f_1/∂z_{ij} = tr(Z^{−1} ∂Z/∂z_{ij}) + tr((sZ^{−1} + R)^{−1} ∂(sZ^{−1} + R)/∂z_{ij})    (132)
  = tr(Z^{−1} E_{ij}) + s tr((sZ^{−1} + R)^{−1} ∂Z^{−1}/∂z_{ij})    (133)
  = [Z^{−1}]_{ji} − s tr((sZ^{−1} + R)^{−1} Z^{−1} (∂Z/∂z_{ij}) Z^{−1})    (134)
  = [Z^{−1}]_{ji} − s tr(Z^{−1}(sZ^{−1} + R)^{−1} Z^{−1} E_{ij})    (135)
  = [Z^{−1}]_{ji} − s [Z^{−1}(sZ^{−1} + R)^{−1} Z^{−1}]_{ji}    (136)
  = [Z^{−1} − sZ^{−1}(sZ^{−1} + R)^{−1} Z^{−1}]_{ji}    (137)
  = [(Z + sR^{−1})^{−1}]_{ji},    (138)

where E_{ij} is a d × d matrix filled with zeros except for the ijth element, which is unity. The last expression (138) is equivalent to

  F_1 = (Z + sR^{−1})^{−T},    (139)

which gives

  log |sI + Z^{1/2} R Z^{1/2}| ≈ log |sI + RẐ| + tr((Ẑ + sR^{−1})^{−1}(Z − Ẑ)).    (140)

2) Calculation of F_2:

  ∂f_2/∂z_{ij} = ∂ tr(N(sZ^{−1} + R)^{−1}) / ∂z_{ij}    (141)
  = tr(N ∂(sZ^{−1} + R)^{−1}/∂z_{ij})    (142)
  = −tr(N(sZ^{−1} + R)^{−1} (∂(sZ^{−1} + R)/∂z_{ij}) (sZ^{−1} + R)^{−1})    (143)
  = −s tr(N(sZ^{−1} + R)^{−1} (∂Z^{−1}/∂z_{ij}) (sZ^{−1} + R)^{−1})    (144)
  = s tr(N(sZ^{−1} + R)^{−1} Z^{−1} (∂Z/∂z_{ij}) Z^{−1} (sZ^{−1} + R)^{−1})    (145)
  = s tr(N(sZ^{−1} + R)^{−1} Z^{−1} E_{ij} Z^{−1} (sZ^{−1} + R)^{−1})    (146)
  = s tr(Z^{−1}(sZ^{−1} + R)^{−1} N (sZ^{−1} + R)^{−1} Z^{−1} E_{ij})    (147)
  = s [Z^{−1}(sZ^{−1} + R)^{−1} N (sZ^{−1} + R)^{−1} Z^{−1}]_{ji}.    (148)

The last expression (148) is equivalent to

  F_2 = s (Z^{−1}(sZ^{−1} + R)^{−1} N (sZ^{−1} + R)^{−1} Z^{−1})^T,    (149)

which gives

  tr(N(sZ^{−1} + R)^{−1}) ≈ tr(N(sẐ^{−1} + R)^{−1}) + s tr(Ẑ^{−1}(sẐ^{−1} + R)^{−1} N (sẐ^{−1} + R)^{−1} Ẑ^{−1} (Z − Ẑ)).    (150)
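The gradients (139) and (149) are easy to validate by finite differences, treating the entries z_{ij} as independent as in (127); a sketch (symmetric positive definite test matrices, illustrative values):

```python
import numpy as np

# Finite-difference check of F1 = (Z + s R^-1)^-T, cf. (139), and F2 in (149).
rng = np.random.default_rng(6)
d, s, eps = 3, 0.25, 1e-6
A = rng.normal(size=(d, d)); Z = A @ A.T + d * np.eye(d)   # SPD test point
B = rng.normal(size=(d, d)); R = B @ B.T + d * np.eye(d)
N = np.eye(d)

def f1(Z):
    return np.linalg.slogdet(Z)[1] + np.linalg.slogdet(s * np.linalg.inv(Z) + R)[1]

def f2(Z):
    return np.trace(N @ np.linalg.inv(s * np.linalg.inv(Z) + R))

F1 = np.linalg.inv(Z + s * np.linalg.inv(R)).T             # (139)
ZiCi = np.linalg.inv(Z) @ np.linalg.inv(s * np.linalg.inv(Z) + R)
F2 = s * (ZiCi @ N @ ZiCi.T).T                             # (149)

for f, F in ((f1, F1), (f2, F2)):
    num = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            E = np.zeros((d, d)); E[i, j] = eps
            num[i, j] = (f(Z + E) - f(Z - E)) / (2 * eps)
    assert np.allclose(num, F, atol=1e-4)
print("gradients (139) and (149) match finite differences")
```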