
Density Approximation Based on Dirac Mixtures with Regard to Nonlinear Estimation and Filtering

Oliver C. Schrempf, Dietrich Brunn, Uwe D. Hanebeck
Intelligent Sensor-Actuator-Systems Laboratory, Institute of Computer Science and Engineering, Universität Karlsruhe (TH), Germany
[email protected], [email protected], [email protected]

Abstract— A deterministic procedure for the optimal approximation of arbitrary density functions by means of Dirac mixtures with equal weights is proposed. The optimality of this approximation is guaranteed by minimizing the distance of the approximation from the true density. For this purpose a distance measure is required, which is in general not well defined for Dirac mixtures. Hence, a key contribution is to compare the corresponding cumulative distribution functions.

This paper concentrates on the simple and intuitive integral quadratic distance measure. For the special case of a Dirac mixture with equally weighted components, closed–form solutions for special types of densities like uniform and Gaussian densities are obtained. Closed–form solution of the given optimization problem is not possible in general. Hence, another key contribution is an efficient solution procedure for arbitrary true densities based on a homotopy continuation approach.

In contrast to standard Monte Carlo techniques like particle filters that are based on random sampling, the proposed approach is deterministic and ensures an optimal approximation with respect to a given distance measure. In addition, the number of required components (particles) can easily be deduced by application of the proposed distance measure. The resulting approximations can be used as a basis for recursive nonlinear filtering mechanisms as an alternative to Monte Carlo methods.

I. INTRODUCTION

Bayesian methods are very popular for dealing with systems suffering from uncertainties and are used in a wide variety of applications. For nonlinear systems, unfortunately, the processing of the probability density functions involved in the estimation procedure typically cannot be performed exactly. This affects especially the type of density in recursive processing, which changes and increases in complexity. Hence, nonlinear estimation in general requires the approximation of the underlying true densities by means of generic density types.

In the literature, different types of parametric continuous densities have been proposed for approximation, including Gaussian mixtures [1], Edgeworth expansions [2], and exponential densities [3]. Furthermore, discrete approximations are very popular. A well known approach is to represent the true density by means of a set of samples [4]. This is used by the class of particle filters [5]. Typically, the locations and weights of the particles are determined by means of Monte Carlo techniques [6], [7].

In this paper we provide a different view on such a discrete representation. The given data points are interpreted as a mixture of Dirac delta components in order to systematically approximate an arbitrary density. The proposed method differs from the deterministic type of particle filters in [8] in that a distance measure is employed to find an optimal approximation of the true density. However, typical distance measures quantifying the distance between two densities are not well defined for the case of Dirac mixtures. Examples are the Kullback–Leibler distance [9] and integral quadratic distances between the densities. Hence, in this paper the corresponding cumulative distribution functions of the true density and its approximation are compared in order to define an optimal Dirac mixture approximation. This can be viewed as a reversal of the procedure introduced in [10], where a distribution distance, in that case the Kolmogorov–Smirnov test statistic, is used to calculate optimal parameters of a density given observed samples.

Here, we apply the integral quadratic distance between the cumulative distributions, which is simple and intuitive. Other possible distribution distances include the Kolmogorov–Smirnov distance [11] or the Cramér–von Mises distance [12], [13]. For the special case of a Dirac mixture with equally weighted components, closed–form solutions for special types of densities like uniform and Gaussian densities are obtained. The more general case of non-equally weighted components is discussed in [14]. Since a closed–form solution of the given optimization problem is not possible in general, an efficient solution procedure for arbitrary true densities based on a homotopy continuation approach similar to the approach introduced in [15] is applied.

The approximations yielded by the approach presented in this paper can immediately be used for implementing a recursive nonlinear filter that could serve as an alternative to the popular particle filters. In contrast to a standard particle representation, the proposed approach provides an optimal approximation with respect to a given distance measure. Furthermore, the approach is deterministic, since no random numbers are involved. In addition, the number of required components (particles) can easily be deduced by taking the presented distance measure into account.

The paper is organized as follows. After the problem formulation in Section II, the conversion of the approximation problem into an equivalent optimization problem is explained in Section III. Closed–form solutions of this optimization problem for special types of densities are given in Section IV. A general solution approach for the case of arbitrary densities is then given in Section V. Conclusions and a few remarks about possible extensions and future work

are given in Section VI.

It is important to note that this paper is restricted to the case of scalar random variables. Furthermore, the focus is on the special case of Dirac mixtures with equally weighted components. This dramatically simplifies the derivations and allows for closed-form approximations in some important cases.

II. PROBLEM FORMULATION

We consider a given density function f̃(x). The goal is to approximate this density by means of a Dirac mixture given by

f(x, η) = Σ_{i=1}^{L} w_i δ(x − x_i) ,   (1)

where, in the context of this paper, the weighting factors w_i are assumed to be equal and given by w_i = 1/L. The vector η contains the positions of the individual Dirac functions according to

η = [x_1, x_2, …, x_L]^T .

For the remainder of this paper, it is assumed that the positions are ordered according to

x_1 < x_2 < … < x_L .

Fig. 1. Approximation of the uniform distribution for a different number of components L = 5, L = 10, and L = 15.

III. THE OPTIMIZATION APPROACH

The first key idea is to reformulate the above approximation problem as an optimization problem by minimizing a certain distance between the true density f̃(x) and its approximation f(x). Instead of comparing the densities directly, which does not make sense for Dirac delta functions, the corresponding (cumulative) distribution functions are employed for that purpose.

The distribution function corresponding to the true density f̃(x) is given by

F̃(x) = ∫_{−∞}^{x} f̃(t) dt ,

and the distribution function corresponding to the Dirac mixture approximation can be written as

F(x, η) = ∫_{−∞}^{x} f(t, η) dt = Σ_{i=1}^{L} w_i H(x − x_i) ,   (2)

where H(·) denotes the Heaviside function defined as

H(x) = { 0 for x < 0 ; 1/2 for x = 0 ; 1 for x > 0 } .

A distance measure can then be given by

G(η) = ∫_{−∞}^{∞} ( F̃(x) − F(x, η) )² dx .   (3)

Theorem III.1 The optimal parameters x_i, i = 1, …, L of the Dirac mixture approximation f(x, η) according to (1) of a given density f̃(x) with distribution function F̃(x) are given by

F̃(x_i) = (2 i − 1) / (2 L)   (4)

for i = 1, …, L.

PROOF. The necessary condition for a minimum of the distance measure G(η) is satisfied by the roots of the derivative of G(η) with respect to the parameter vector η according to

∂G(η)/∂η = 0 .

For the individual parameters x_i we obtain

∂G(η)/∂x_i = −2 ∫_{−∞}^{∞} ( F̃(x) − Σ_{j=1}^{L} w_j H(x − x_j) ) δ(x − x_i) dx

for i = 1, …, L. Using the fundamental property of the Dirac delta function gives

∂G(η)/∂x_i = −2 ( F̃(x_i) − Σ_{j=1}^{L} w_j H(x_i − x_j) ) .

Evaluating the Heaviside function and setting the result to zero finally gives the desired result

F̃(x_i) − (2 i − 1) / (2 L) = 0

for i = 1, …, L. ∎

IV. SOLUTION FOR SPECIAL CASES

For illustrating the usefulness of the result given in Theorem III.1, we consider two special types of densities that admit a closed–form solution for the desired parameters of the Dirac mixture approximation.

A. Special Case: Uniform Density

Without loss of generality, we consider the uniform density

f̃(x) = { 0 for x < 0 ; 1 for 0 ≤ x < 1 ; 0 for x ≥ 1 } .   (5)

The corresponding distribution function is given by

F̃(x) = { 0 for x < 0 ; x for 0 ≤ x < 1 ; 1 for x ≥ 1 } .

Hence, in (4) we have F̃(x_i) = x_i, and the parameters of the Dirac mixture approximation are immediately given by

x_i = (2 i − 1) / (2 L)   (6)

for i = 1, …, L.

The true distribution F̃(x) and its Dirac mixture approximation are shown for a different number of components L in Figure 1. Obviously, the Dirac mixture approximation converges to the true density for L → ∞, which will be shown more formally in the next theorem.

Fig. 2. Approximation of the standard normal density for a different number of components L = 3, L = 5, and L = 10.

Theorem IV.1 The distance measure G(η) between the uniform density f̃(x) and its Dirac mixture approximation with L components is given by

G(η) = 1 / (12 L²)

and decreases quadratically towards zero as the number of components L increases.

PROOF. The distance measure is now given by

G(η) = ∫_0^1 ( x − (1/L) Σ_{i=1}^{L} H(x − (2 i − 1)/(2 L)) )² dx ,

which is equivalent to

G(η) = Σ_{i=1}^{2L} ∫_0^{1/(2L)} x² dx .

Evaluating the integral gives

G(η) = Σ_{i=1}^{2L} (1/3) (1/(2L))³ = (1/3) (1/(2L))² ,

which concludes the proof. ∎

The expected value and the variance of the given true density f̃(x) are given by

E_{f̃}{x} = 1/2

and

E_{f̃}{ (x − E_{f̃}{x})² } = 1/12 .

Lemma IV.1 The expected value of the Dirac mixture approximation of the uniform distribution is (independent of the number of components L) given by

E_f{x} = (1/L) Σ_{i=1}^{L} (2 i − 1)/(2 L) = 1/2 .

PROOF.

E_f{x} = ∫_{−∞}^{∞} x ( Σ_{i=1}^{L} w_i δ(x − (2 i − 1)/(2 L)) ) dx ,

which upon using the fundamental property of the Dirac delta function gives

E_f{x} = (1/L) Σ_{i=1}^{L} (2 i − 1)/(2 L) .

Simplification yields

E_f{x} = (L (L + 1) − L) / (2 L²) = 1/2 . ∎

Hence, the expected value of the approximation density is equal to the expected value of the true density independent of the number of components.

Lemma IV.2 The variance of the Dirac mixture approximation is given by

E_f{ (x − E_f{x})² } = (L² − 1) / (12 L²) .

PROOF. The variance is given by

E_f{ (x − E_f{x})² } = (1/L) Σ_{i=1}^{L} (2 i − 1)²/(2 L)² − 1/4 = (2 L + 1)(2 L − 1) / (12 L²) − 1/4 ,

which gives the above result. ∎
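The closed-form results for the uniform density lend themselves to a direct numerical check. The following Python sketch (our own illustration, not code from the paper) places the components according to (6), evaluates the distance (3) by a midpoint rule, and confirms Theorem IV.1 and Lemmas IV.1 and IV.2:

```python
# Numerical illustration of the uniform-density results:
# optimal positions (6), distance G(eta) = 1/(12 L^2) (Theorem IV.1),
# and the moments from Lemmas IV.1 and IV.2.
L = 15
w = 1.0 / L
x = [(2 * i - 1) / (2 * L) for i in range(1, L + 1)]  # positions from (6)

def F_approx(t):
    """Dirac mixture distribution F(t, eta) from (2), with H(0) = 1/2."""
    return w * sum(1.0 if t > xi else (0.5 if t == xi else 0.0) for xi in x)

# Integral quadratic distance (3): F_tilde(t) = t on [0, 1], and the
# integrand vanishes outside [0, 1], so a fine midpoint rule on [0, 1] suffices.
N = 100000
G = sum(((k + 0.5) / N - F_approx((k + 0.5) / N)) ** 2 for k in range(N)) / N

assert abs(G - 1.0 / (12 * L**2)) < 1e-6            # Theorem IV.1
mean = w * sum(x)
var = w * sum(xi**2 for xi in x) - mean**2
assert abs(mean - 0.5) < 1e-12                      # Lemma IV.1
assert abs(var - (L**2 - 1) / (12 * L**2)) < 1e-12  # Lemma IV.2
```

Note that the integrand of (3) is continuous here (the sawtooth F̃ − F takes the value ±1/(2L) on either side of each jump), which is why a plain midpoint rule converges quickly.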
Fig. 3. Progressive approximation of the Gaussian mixture with two components from Example V.2 for γ = 0 … 1. The approximation density consists of L = 15 components. The height of each Dirac component in the density approximation corresponds to its weight value.

The variance of the approximation density converges to the true variance when the number of components goes to infinity.

B. Special Case: Gaussian Density

Without loss of generality, we consider a standard normal density with distribution function

F̃(x) = (1/2) erf( x / √2 ) + 1/2 ,

where erf(·) denotes the error function.

Lemma IV.3 The parameters of a Dirac mixture approximation of the standard normal density are given by

x_i = √2 erf⁻¹( (2 i − 1 − L) / L )

for i = 1, …, L.

PROOF. With (4) we have

(1/2) erf( x_i / √2 ) + 1/2 = (2 i − 1) / (2 L) ,

which immediately gives the desired result. ∎

The Dirac mixture approximation of the standard normal density and the corresponding distribution for a different number of components L is shown in Figure 2.

Remark IV.1 We assume that a suitable implementation of the inverse error function erf⁻¹(·) is available.¹

¹The corresponding MATLAB function is denoted by erfinv().

V. SOLUTION FOR THE GENERAL CASE

For general true densities f̃(·), a closed–form solution for the parameters of the approximating Dirac mixture density is not possible. Hence, we have to resort to a numerical solution of (4) in Theorem III.1.

Of course, in the scalar case considered in this paper, a wealth of numerical procedures for solving (4) is readily available. However, the multidimensional case calls for more advanced approaches. This is even more important when Dirac mixture approximations with non–equally weighted components are considered [14]. Hence, we provide an efficient solution procedure that will be derived and explained in simple scalar cases, but is also very well suited for the more advanced cases.

The approach pursued here is based on the intuition that in typical density approximation scenarios, a certain prior density is given, which is transformed by the considered type of processing step. This includes the transformation of random variables, the Bayesian filter step (measurement update), and the prediction step (time update) for propagating a given prior density through a nonlinear dynamic system.

Instead of performing the considered processing step at once, the effect on the resulting density is introduced gradually. For that purpose, a continuous transformation of the given density towards the desired density is employed. This typically allows us to start with a "simple" density s(x), for which the approximation is either already known or can easily be constructed.

A progression parameter γ is introduced, which is used to parameterize the true density f̃(·). Without loss of generality, the progression parameter γ is assumed to range in the interval γ ∈ [0, 1], such that f̃(x, γ = 0) corresponds to the simple density s(x) and f̃(x, γ = 1) corresponds to the original true density, i.e., f̃(x, γ = 0) = s(x) and f̃(x, γ = 1) = f̃(x).

Fig. 4. Approximation of the Gaussian mixture from Example V.2 with a Dirac mixture with L = 10, L = 20, L = 30, and L = 40 components.
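Lemma IV.3 together with Remark IV.1 can be illustrated in a few lines of Python (our own sketch; the paper itself only mentions MATLAB's `erfinv()`). Python's `math` module lacks an inverse error function, so we use `statistics.NormalDist.inv_cdf`, exploiting the identity √2 erf⁻¹((2i − 1 − L)/L) = Φ⁻¹((2i − 1)/(2L)), i.e., the positions are standard normal quantiles at the levels given by (4):

```python
# Dirac mixture positions for the standard normal density (Lemma IV.3),
# computed as standard normal quantiles at p_i = (2i - 1) / (2L).
# NormalDist.inv_cdf stands in for erfinv (a toolchain choice, not the paper's).
from statistics import NormalDist
from math import erf, sqrt

L = 10
std_normal = NormalDist()  # mean 0, sigma 1
x = [std_normal.inv_cdf((2 * i - 1) / (2 * L)) for i in range(1, L + 1)]

# Each position satisfies the optimality condition (4): F_tilde(x_i) = (2i-1)/(2L)
for i, xi in enumerate(x, start=1):
    F_xi = 0.5 * erf(xi / sqrt(2)) + 0.5   # standard normal CDF
    assert abs(F_xi - (2 * i - 1) / (2 * L)) < 1e-9

# The positions are antisymmetric about the origin, as expected by symmetry
assert all(abs(x[i] + x[L - 1 - i]) < 1e-9 for i in range(L))
```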

A straightforward progression from the simple density s(x) to the original true density f̃(x) is given by

f̃(x, γ) = (1 − γ) s(x) + γ f̃(x) ,   (7)

which is demonstrated in the next example. Different progression schedules are possible as shown in [14].

Example V.1 We consider a Gaussian mixture density with two components given by

f̃(x) = w₁ N(x, m₁, σ₁) + w₂ N(x, m₂, σ₂)   (8)

with

w₁ = 0.3 , w₂ = 0.7 , m₁ = −0.5 , m₂ = 2.0 , σ₁ = 1 , σ₂ = 0.3 .

As simple density, we select the uniform density given in (5). The resulting homotopy is visualized in Figure 3 for various values of γ ∈ [0, 1]. In addition, the optimal Dirac mixture approximations with L = 15 components are shown for the corresponding values of γ. This approximation will be discussed in what follows.

The approximation now tracks the true density that is progressively modified by increasing γ. In order for the parameter vector η(γ) to track the optimum, we require a differential relation between the progression parameter γ and the parameter vector η. For that purpose, we consider the cumulative version of (7) given by

F̃(x, γ) = (1 − γ) S(x) + γ F̃(x) ,

where S(x) is the cumulative distribution corresponding to the simple density s(x). The resulting progressive version F̃(x, γ) of F̃(x) is then plugged into (4) and we take the derivative with respect to γ. Since F̃(x_i, γ) is both an explicit and, due to x_i = x_i(γ), an implicit function of γ, we obtain

∂F̃(x_i, γ)/∂γ + ∂F̃(x_i, γ)/∂x_i · ∂x_i(γ)/∂γ = 0

for i = 1, …, L. With

∂F̃(x_i, γ)/∂x_i = f̃(x_i, γ) ,

we obtain

−∂F̃(x_i, γ)/∂γ = f̃(x_i, γ) ẋ_i(γ) ,

where ẋ_i(γ) denotes the derivative of x_i with respect to the progression parameter γ.

In vector–matrix notation, we obtain the following system of explicit first–order ordinary differential equations (ODE)

b( η(γ), γ ) = P( η(γ), γ ) · η̇(γ) ,   (9)

with

b( η(γ), γ ) = [ −∂F̃(x₁, γ)/∂γ , −∂F̃(x₂, γ)/∂γ , … , −∂F̃(x_L, γ)/∂γ ]^T

and

P( η(γ), γ ) = diag( f̃(x₁, γ), f̃(x₂, γ), …, f̃(x_L, γ) ) .

For the specific progression given in (7), we obtain

∂F̃(x_i, γ)/∂γ = F̃(x_i(γ)) − S(x_i(γ)) .

As a result, we obtain b( η(γ), γ ) = b( η(γ) ) with

b( η(γ) ) = [ S(x₁(γ)) − F̃(x₁(γ)) , S(x₂(γ)) − F̃(x₂(γ)) , … , S(x_L(γ)) − F̃(x_L(γ)) ]^T .   (10)
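The ODE system (9) with b from (10) can be integrated with any standard scheme. The following Python sketch (our own illustration; the integrator and step count are not prescribed by the paper) tracks the parameter vector for the setting of Example V.1, from the uniform-density approximation (6) at γ = 0 to the Gaussian mixture (8) at γ = 1, using explicit Euler steps:

```python
# Homotopy continuation for Example V.1: integrate (9)/(10), i.e.
# dx_i/dgamma = (S(x_i) - F_true(x_i)) / f_prog(x_i, gamma),
# where f_prog is the progressive density (7).
# Explicit Euler with a fixed step is our own choice, not the paper's.
from statistics import NormalDist

# Gaussian mixture (8): (weight, component) pairs
comps = [(0.3, NormalDist(-0.5, 1.0)), (0.7, NormalDist(2.0, 0.3))]

def f_true(t):   # mixture density f~(t)
    return sum(w * d.pdf(t) for w, d in comps)

def F_true(t):   # mixture distribution F~(t)
    return sum(w * d.cdf(t) for w, d in comps)

def s(t):        # simple density: uniform on [0, 1), see (5)
    return 1.0 if 0.0 <= t < 1.0 else 0.0

def S(t):        # its distribution
    return min(max(t, 0.0), 1.0)

L = 15
x = [(2 * i - 1) / (2 * L) for i in range(1, L + 1)]  # eta(gamma = 0), from (6)

steps = 10000
h = 1.0 / steps
for k in range(steps):
    g = k * h
    # denominator: progressive density (7); numerator: b from (10)
    x = [xi + h * (S(xi) - F_true(xi)) / ((1.0 - g) * s(xi) + g * f_true(xi))
         for xi in x]

# At gamma = 1 the positions should satisfy the optimality condition (4)
err = max(abs(F_true(xi) - (2 * i - 1) / (2 * L)) for i, xi in enumerate(x, 1))
assert err < 1e-2
assert x == sorted(x)   # the component ordering is preserved
```

A useful property of (9) is that F̃(x_i(γ), γ) is constant along exact trajectories, so the residual in (4) at γ = 1 stems only from the discretization of the integrator.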

Fig. 5. Trace of the parameter vector η(γ) with L = 15 samples. Each line in the figure shows the evolution of a particular parameter.

Example V.2 This example demonstrates the approximation of the Gaussian mixture density from Example V.1 by means of a Dirac mixture. For that purpose, we use the simple progression scheme given in (7) also used in Example V.1. For tracking the parameter vector, the system of ODE (9) is solved for γ ∈ [0, 1] with b( η(γ), γ ) given in (10). The progression is started with a parameter vector η(γ = 0) corresponding to the optimal approximation of the uniform density given in (6). For γ = 1, the parameter vector η(γ = 1) corresponding to the desired optimal Dirac mixture approximation of the true density (8) is obtained. This is demonstrated in Figure 3 for a fixed number of L = 15 components and a few selected values of γ ∈ [0, 1]. Please note that γ continuously covers the interval [0, 1].

The resulting Dirac mixture approximations are shown in Figure 4 for a different number of mixture components. Figure 5 shows the evolution of the parameters of the Dirac mixture, i.e., the positions of the individual components, as the progression parameter γ varies from γ = 0 to γ = 1.

VI. DISCUSSION AND FUTURE WORK

In this paper a systematic procedure for approximating an arbitrary probability density function by means of a Dirac mixture with equally weighted components, as a special case of the method proposed in [14], has been introduced. The resulting approximation can be used immediately as a basis for recursive nonlinear filtering mechanisms. This can be seen as an alternative to Monte Carlo based particle filters. A special benefit of the proposed method lies in the fact that the approximation is deterministic and hence, successive application yields identical results, which is not the case for random number sampling based Monte Carlo methods.

The proposed procedure has been introduced in the context of scalar random variables for the sake of simplicity. It can, however, be generalized to random vectors in a straightforward manner.

Of course, the procedure is not limited to integral quadratic distance measures. Similar derivations as the ones given in this paper can be performed for different distance measures.

REFERENCES

[1] D. L. Alspach and H. W. Sorenson, "Nonlinear Bayesian Estimation Using Gaussian Sum Approximation," IEEE Transactions on Automatic Control, vol. AC-17, no. 4, pp. 439–448, 1972.
[2] S. Challa, Y. Bar-Shalom, and V. Krishnamurthy, "Nonlinear Filtering via Generalized Edgeworth Series and Gauss–Hermite Quadrature," IEEE Transactions on Signal Processing, vol. 48, no. 6, pp. 1816–1820, 2000.
[3] D. Brigo, B. Hanzon, and F. LeGland, "A Differential Geometric Approach to Nonlinear Filtering: The Projection Filter," IEEE Transactions on Automatic Control, vol. 42, no. 2, pp. 247–252, 1998.
[4] N. Oudjane and S. Rubenthaler, "Stability and Uniform Particle Approximation of Nonlinear Filters in Case of Non-Ergodic Signal," Prépublication, Laboratoire de Probabilités et Modèles Aléatoires, Université de Paris VI, Tech. Rep. PMA-786, January 2003. [Online]. Available: http://www.proba.jussieu.fr/mathdoc/textes/PMA-786.pdf
[5] S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, "A Tutorial on Particle Filters for On-line Non-linear/Non-Gaussian Bayesian Tracking," IEEE Transactions on Signal Processing, vol. 50, no. 2, pp. 174–188, 2002.
[6] E. Bølviken, P. J. Acklam, N. Christophersen, and J.-M. Størdal, "Monte Carlo Filters for Non-Linear State Estimation," Automatica, vol. 37, pp. 177–183, 2001.
[7] A. Doucet, S. Godsill, and C. Andrieu, "On Sequential Monte Carlo Sampling Methods for Bayesian Filtering," Statistics and Computing, vol. 10, no. 3, pp. 197–208, 2000.
[8] E. Bølviken and G. Storvik, "Deterministic and Stochastic Particle Filters in State Space Models," in Sequential Monte Carlo Methods in Practice, A. Doucet, N. de Freitas, and N. Gordon, Eds. Springer Series of "Statistics for Engineering and Information Science," 2001.
[9] S. Kullback and R. A. Leibler, "On Information and Sufficiency," Annals of Mathematical Statistics, vol. 22, no. 2, pp. 79–86, 1951.
[10] M. D. Weber, L. M. Leemis, and R. K. Kincaid, "Minimum Kolmogorov–Smirnov Test Statistic Parameter Estimates," Journal of Statistical Computation and Simulation, vol. 76, no. 3, pp. 196–206, 2006.
[11] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, "Kolmogorov–Smirnov Test," in Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1992, pp. 623–628.
[12] D. A. Darling, "The Kolmogorov–Smirnov, Cramér–von Mises Tests," The Annals of Mathematical Statistics, vol. 28, no. 4, pp. 823–838, 1957.
[13] D. D. Boos, "Minimum Distance Estimators for Location and Goodness of Fit," Journal of the American Statistical Association, vol. 76, no. 375, pp. 663–670, 1981.
[14] O. C. Schrempf, D. Brunn, and U. D. Hanebeck, "Dirac Mixture Density Approximation Based on Minimization of the Weighted Cramér–von Mises Distance," in Proceedings of the International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2006), Heidelberg, Germany, September 2006, pp. 512–517.
[15] U. D. Hanebeck and O. Feiermann, "Progressive Bayesian Estimation for Nonlinear Discrete-Time Systems: The Filter Step for Scalar Measurements and Multidimensional States," in Proceedings of the 2003 IEEE Conference on Decision and Control (CDC'03), Maui, Hawaii, 2003, pp. 5366–5371.