Moment-based Dirac Mixture Approximation of Circular Densities

Uwe D. Hanebeck ∗ and Anders Lindquist ∗∗

∗ Intelligent Sensor-Actuator-Systems Laboratory (ISAS), Institute for Anthropomatics, Karlsruhe Institute of Technology (KIT), Germany (e-mail: [email protected]). ∗∗ Department of Automation, Shanghai Jiao Tong University, Shanghai, China, and CIAM and ACCESS Centers, Royal Institute of Technology, Stockholm, Sweden (e-mail: [email protected]).

Abstract: Given a circular probability density function, called the true probability density function, the goal is to find a Dirac mixture approximation based on some circular moments of the true density. When keeping the locations of the Dirac points fixed, but almost arbitrarily located, we apply recent results on the circulant rational covariance extension problem to the problem of calculating the weights. For the case of simultaneously calculating optimal locations, additional constraints have to be deduced from the given density. For that purpose, a distance measure for the deviation of the Dirac mixture approximation from the true density is derived, which is then minimized while considering the moment conditions as constraints. The method is based on progressive numerical minimization, converges quickly, and gives well-distributed Dirac mixtures that fulfill the constraints, i.e., have the desired circular moments.

Keywords: Circular densities, Dirac mixture approximation, moment problem.

1. INTRODUCTION

Periodic quantities occur in a variety of applications (Mardia and Jupp, 2009). In the scalar case, they can always be mapped to the circumference of a circle with appropriate radius. When the error in these quantities is large, the topology cannot be ignored and the uncertainty is described by means of circular probability density functions, which leads to so-called circular statistics. Based on this description, systematic estimation, filtering, and data fusion can be performed. Non-periodic quantities will be called linear in the following.

Although the primary uncertainties often can be described by simple continuous circular probability density functions such as the von Mises distribution, see Sec. 6, or the wrapped normal distribution (Mardia and Jupp, 2009), processing these distributions quickly becomes difficult (Azmani et al., 2009; Stienne et al., 2013). Hence, a discrete approximation is often useful, which could either mean discrete probabilities over a regular lattice or irregularly spaced samples. In the latter case we can distinguish between randomly or deterministically placed samples. In this paper, we focus on deterministically placed samples, which we will call Dirac mixture approximation, which either are equally weighted or equipped with weights (that sum to one).

Replacing a continuous circular probability density function by a Dirac mixture approximation can be posed as an approximation problem. Typically, either the resolution, i.e., the number of discrete samples on the circle, is prespecified or some measure of the approximation quality is given. The parameters of the Dirac mixture approximation are then calculated in such a way that the discrete approximation matches some properties of the given continuous density, e.g., its moments.

Moment problems have a long history in mathematics, going back to Chebyshev, Markov, and Lyapunov (for a reference, see, e.g., Krein and Nudelman (1977)). In the setting of this paper with probability densities on the unit circle, one needs to consider (truncated) trigonometric moment problems: given a finite sequence $(c_0, c_1, \ldots, c_N)$ of numbers, find a positive measure $d\mu$ such that

$$\int_{-\pi}^{\pi} e^{in\theta}\, d\mu = c_n, \quad n = 0, 1, \ldots, N.$$

(Since $c_0 = 1$, there are $N$ nontrivial moments.) If the measure is absolutely continuous, there is a probability density $p$ such that $d\mu = p(\theta)\,d\theta$. In this paper, we shall consider the approximation of such a probability measure by a $\mu$ that is a staircase function. One can formally regard such a measure as one with a probability density

$$p(\theta) = \sum_{j=1}^{L} w_j\, \delta(\theta - \hat{\theta}_j), \tag{1}$$

where $w_j$, $j = 1, \ldots, L$, are positive weights adding up to one, and where $\hat{\theta}_j$, $j = 1, \ldots, L$, are the Dirac locations. This is the formalism we shall use in this paper.

For noncircular/linear quantities, moment-based Dirac mixture approximations have been proposed in the context of the Linear Regression Kalman Filter (LRKF), see Lefebvre et al. (2005). As these are Gaussian filters, the Dirac mixture approximation is performed for Gaussian densities. Typical examples are the Unscented Kalman Filter (UKF) in Julier et al. (2000), its scaled version in Julier (2002), and its higher-order generalization in Tenne and Singh (2003). A generalization to an arbitrary number of deterministic samples that are placed along the coordinate axes is introduced in Huber and Hanebeck (2008). Also for linear quantities, systematic Dirac mixture approximations based on distance measures have been proposed for the case of scalar continuous densities (Schrempf et al., 2006a,b). An algorithm for sequentially increasing the number of components is given in Hanebeck and Schrempf (2007) and applied to recursive nonlinear prediction in Schrempf and Hanebeck (2007). Systematic Dirac mixture approximations of arbitrary multi-dimensional Gaussian densities are calculated in Hanebeck et al. (2009). A more efficient method for the case of standard normal distributions with a subsequent transformation is given in Gilitschenski and Hanebeck (2013). In Hanebeck (2014), a faster version of Hanebeck et al. (2009) has been introduced that gives slightly suboptimal results. The method does not rely on comparing the probability masses on all scales as in Hanebeck et al. (2009). Instead, repulsion kernels are introduced to assemble an induced kernel density and perform the comparison of the given density with its Dirac mixture approximation.

For circular quantities, especially the von Mises distribution and the wrapped normal distribution, a first approach to Dirac mixture approximation in the spirit of the UKF is introduced in Kurz et al. (2013b). The approach is based on matching the first circular moment, and the locations of three Dirac components are calculated. The resulting approximation has already been applied to sensor scheduling based on bearings-only measurements (Gilitschenski et al., 2013). The results are also used to perform recursive circular filtering for tracking an object constrained to an arbitrary one-dimensional manifold in Kurz et al. (2013a).

This paper focuses on Dirac mixture approximations with an arbitrary number of, say $L$, components that match a given number of, say $N \le L$, circular moments of a given circular probability density function. We start with Dirac mixture approximations with prespecified component locations, where the weights are determined based on the theory of circulant rational covariance extension developed by Lindquist and Picci (2013), which in turn generalizes previous results on reciprocal processes in Krener (1986); Levy et al. (1990); Carli et al. (2011); Carli and Georgiou (2011) and on covariance extension with rationality constraints in Georgiou (1987); Byrnes et al. (1995, 1999). Subsequently, we consider additionally optimizing the component locations, as we desire to adapt the density of Dirac components to the height of the given continuous probability density function. This gives another $L$ parameters. For the underdetermined case, i.e., the number of constraints resulting from the given circular moments is less than the number of parameters, we minimize a measure of deviation of the Dirac mixture approximation from the given circular probability density function while considering the moment conditions as constraints. This is performed in analogy to the ideas proposed for linear quantities in Hanebeck (2014).

The paper is structured as follows. A detailed formulation of the considered approximation problem is given in Sec. 2. The resulting moment problem is formulated in Sec. 3. The case of a Dirac mixture approximation with fixed locations is treated in Sec. 4. For the case of fewer moment constraints than parameters of the Dirac mixture approximation, a distance measure between the Dirac mixture approximation and the given density is derived in Sec. 5 and the resulting constrained optimization problem is given. An example of approximating circular probability density functions is given in Sec. 6, where a von Mises distribution is approximated. Conclusions are given in Sec. 7.

2. PROBLEM FORMULATION

We consider periodic probability density functions $p : \mathbb{R} \to \mathbb{R}_+$ with $p(\theta) = p(\theta + c)$ for $c \in \mathbb{R}_+$. When $c = 2\pi$, $p(\theta)$ is called a circular density and it is sufficient to consider a domain $\Gamma$ that is some interval of length $2\pi$. Without loss of generality we will use $\Gamma = [-\pi, \pi]$.

2.1 Circular Dirac mixtures

A circular Dirac mixture density $p(\theta)$ with $L$ Dirac components is given by (1) with positive weights, i.e., $w_i > 0$ for $i = 1, \ldots, L$, adding up to one and locations $\hat{\theta}_i \in \Gamma$ for $i = 1, \ldots, L$. $\delta(\cdot)$ denotes the Dirac delta distribution.

2.2 Circular moments

The circular moments of a probability density function $p(\theta)$ are given by

$$E_p\{e^{in\theta}\} = \int_{-\pi}^{\pi} e^{in\theta}\, p(\theta)\, d\theta \tag{2}$$

for $n \in \mathbb{N}$. Obviously, the circular moments are the coefficients of the Fourier series expansion of the probability density function $p(\theta)$.

For the circular Dirac mixture in (1), the circular moments are given by

$$E_p\{e^{in\theta}\} = \sum_{j=1}^{L} w_j \int_{-\pi}^{\pi} e^{in\theta}\, \delta(\theta - \hat{\theta}_j)\, d\theta. \tag{3}$$

Using the sifting property of the Dirac delta distribution gives

$$E_p\{e^{in\theta}\} = \sum_{j=1}^{L} w_j\, e^{in\hat{\theta}_j}. \tag{4}$$

2.3 Approximation problem

Given a circular density $\tilde{p}(\theta)$, we would like to find a Dirac mixture approximation $p(\theta)$ as in (1) that has exactly the same first $N$ circular moments

$$c_n = \int_{-\pi}^{\pi} e^{in\theta}\, \tilde{p}(\theta)\, d\theta, \quad n = 1, 2, \ldots, N. \tag{5}$$

Clearly $c_0 = 1$ is fixed.

Remark 1. Many interesting variations of this approximation problem exist. First of all, $N$ arbitrary moments could be considered. Second, it would also make sense to consider more given moments than there are degrees of freedom in the approximating density. In that case, the moments cannot be exactly attained. Instead, some error between the given moments and the approximate moments has to be minimized.

3. MOMENT PROBLEM

We will now formalize the problem of constructing a Dirac mixture approximation for a given density in such a way that the two densities exactly share the first $N$ circular moments. First of all, we focus on symmetric densities, which simplifies the notation. The results can then easily be generalized to non-symmetric densities. Without loss of generality it is sufficient to consider densities symmetric to $\theta = 0$ during the approximation, as the resulting Dirac mixture approximation can later be shifted to arbitrary center locations by shifting each component accordingly.
For symmetric densities $p(\theta)$, the imaginary parts of the circular moments are zero, so that we are left with real moments

$$E_p\{\cos(n\theta)\} = \int_{-\pi}^{\pi} \cos(n\theta)\, p(\theta)\, d\theta \tag{6}$$

that are from now on used as the given moments $c_n = E_p\{\cos(n\theta)\}$.

For symmetric Dirac mixtures, we have to distinguish two cases, i.e., even and odd numbers of components. Here we focus on an even number of components, so that the circular Dirac mixture centered around $\theta = 0$ can be written as

$$p(\theta) = \sum_{j=1}^{L/2} w_j \left[ \delta(\theta + \hat{\theta}_j) + \delta(\theta - \hat{\theta}_j) \right] \tag{7}$$

with $\hat{\theta}_j \in [0, \pi]$ for $j = 1, \ldots, L/2$. Its moments are given by

$$E_p\{\cos(n\theta)\} = 2 \sum_{j=1}^{L/2} w_j \cos(n\hat{\theta}_j). \tag{8}$$

We are given $N$ moments $c_n$ for $n = 1, \ldots, N$ that we would like to match with a symmetric Dirac mixture approximation with $L$ components as in (7). When weights and locations are free parameters, we have $L - 1$ parameters available for matching the given $N$ moments. The parameters are collected in a parameter vector $\eta$ given by

$$\eta = \left[ w_1, \ldots, w_{L/2}, \hat{\theta}_1, \ldots, \hat{\theta}_{L/2} \right]^T \tag{9}$$

and we now parametrize the circular Dirac mixture in (7) as $p(\theta) = p(\theta, \eta)$. The parameter vector is confined to the domain $S = \mathbb{R}_+ \times \ldots \times \mathbb{R}_+ \times \Gamma \times \ldots \times \Gamma$ with $2 w_1 + \ldots + 2 w_{L/2} = 1$.

In Sec. 4, we fix $\hat{\theta}_1, \ldots, \hat{\theta}_L$ in advance and just determine the weights $w_1, \ldots, w_L$. With this method, the moment equations should actually be under-determined, leaving room for tuning parameters.

4. APPLYING THE CIRCULANT RATIONAL TRIGONOMETRIC MOMENT PROBLEM

In this section we apply some recent results in Lindquist and Picci (2013) to the approximation problem of Section 2. (In that paper the density $p$ is a spectral density, but the formalism remains the same.) Then we need to fix the locations of the Dirac points a priori, but by choosing many equidistant points on the discrete circle and constraining the weights to be zero at certain points, we can achieve an almost arbitrary selection of the locations of the Dirac points.

To this end, set $\zeta_1 := e^{i\Delta}$, where $\Delta := 2\pi/L$, and define the discrete variable $\zeta$ taking the $L$ values $\zeta_k := \zeta_1^k = e^{ik\Delta}$ running counter-clockwise on the discrete unit circle $\mathbb{T}_L$. In particular, $\zeta_{L-k} = \bar{\zeta}_k$. Then, for a sufficiently large $L > 2N$, we consider the Dirac delta distributions in (1) with $\hat{\theta}_j := (j - 1)\Delta$ leading to a moment problem

$$\sum_{j=0}^{L-1} \zeta_j^n\, w_{j+1} = c_n, \quad n = 1, 2, \ldots, N, \tag{10}$$

where $c_1, c_2, \ldots, c_N$ are the moments defined in (5) of the given continuous probability density $\tilde{p}$ to be matched, and some of the weights $w_j$ are constrained to be zero. We also define the zeroth moment $c_0$, which clearly is one, i.e., $c_0 = 1$.

4.1 Dual cones

Consider the class of trigonometric polynomials

$$Q(e^{i\theta}) = \sum_{k=-N}^{N} q_k\, e^{-ik\theta}, \quad q_{-k} = q_k, \tag{11}$$

where $q := [q_0, q_1, \ldots, q_N]^T \in \mathbb{R}^{N+1}$ with $q_0 > 0$. Let $\mathcal{P}_+(L)$ be the cone of all such $(N+1)$-vectors for which

$$Q(\zeta_j) \ge 0, \quad j = 1, 2, \ldots, L. \tag{12}$$

Then, setting $\hat{q} = [Q(\zeta_0), Q(\zeta_1), \ldots, Q(\zeta_{L-1})]^T$, we have

$$\hat{q} = F q, \tag{13}$$

where

$$F = \begin{bmatrix} 1 & 2\cos(\hat{\theta}_1) & 2\cos(2\hat{\theta}_1) & \cdots & 2\cos(N\hat{\theta}_1) \\ 1 & 2\cos(\hat{\theta}_2) & 2\cos(2\hat{\theta}_2) & \cdots & 2\cos(N\hat{\theta}_2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2\cos(\hat{\theta}_L) & 2\cos(2\hat{\theta}_L) & \cdots & 2\cos(N\hat{\theta}_L) \end{bmatrix}, \tag{14}$$

since

$$Q(e^{i\theta}) = q_0 + 2 \sum_{k=1}^{N} q_k \cos(k\theta).$$

Consequently, the vectors $q \in \mathcal{P}_+(L)$ are precisely the $q$ satisfying $Fq \ge 0$. However, $\hat{q} \ne 0$ since $q_0 > 0$. We also define the interior $\mathcal{P}_+^{\circ}(L)$ of $\mathcal{P}_+(L)$. This is an open cone which requires all $\hat{q}_j := Q(\zeta_j)$, $j = 0, 1, \ldots, L-1$, to be positive.

The $L \times (N+1)$ matrix $F$ is a Vandermonde matrix and, as $L > 2N$, it has full column rank. It follows from the theory of discrete Fourier transforms and is easy to check that

$$F^T F = L I, \tag{15}$$

where $I$ is the $(N+1) \times (N+1)$ identity. Hence

$$q = \frac{1}{L} F^T \hat{q}. \tag{16}$$

Next define the dual cone $\mathcal{C}_+(L)$ of all

$$c := [c_0, c_1, \ldots, c_N]^T \in \mathbb{R}^{N+1} \quad \text{with } c_0 = 1 \tag{17}$$

such that

$$c^T q \ge 0 \quad \text{for all } q \in \mathcal{P}_+(L), \tag{18}$$

and let the open cone $\mathcal{C}_+^{\circ}(L)$ be its interior. Using Plancherel's Theorem for discrete Fourier transforms we have

$$c^T q = \frac{1}{L} \sum_{j=0}^{L-1} C(\zeta_j)\, Q(\zeta_j), \tag{19}$$

where

$$C(e^{i\theta}) = \sum_{k=-N}^{N} c_k\, e^{ik\theta}, \tag{20}$$

yielding an alternative formulation of (18). Since the moments of a periodic probability density have a positive Toeplitz matrix, the following proposition follows from Proposition 6 in Lindquist and Picci (2013).

Proposition 2. Let $c = [c_1, c_2, \ldots, c_N]^T$ be the moments (5). Then there is an integer $L_0$ so that $c \in \mathcal{C}_+^{\circ}(L)$ for all $L \ge L_0$.

4.2 Point-mass approximation by convex optimization

We are now in the position to reformulate Theorems 3, 4, and 7 in Lindquist and Picci (2013) in the present setting.

Theorem 3. (Lindquist and Picci (2013)). Let $c \in \mathcal{C}_+^{\circ}(L)$. Then, for each $P \in \mathcal{P}_+(L)$, there is a unique $\hat{Q} \in \mathcal{P}_+(L)$ such that

$$p_L(\zeta) = \frac{P(\zeta)}{\hat{Q}(\zeta)} \tag{21}$$

satisfies the moment condition

$$\sum_{j=0}^{L-1} \zeta_j^k\, p_L(\zeta_j) = c_k, \quad k = 0, 1, \ldots, N. \tag{22}$$

If $\hat{Q}$ has zeros on the unit circle, then $P$ has the same zeros, and there is cancellation of zeros in (21). Moreover, $\hat{Q}$ is the unique minimum of the strictly convex functional

$$J_P(q) = \sum_{j=0}^{L-1} \left[ C(\zeta_j)\, Q(\zeta_j) - P(\zeta_j) \log Q(\zeta_j) \right], \tag{23}$$

where the variable $Q$ is given by (11) and $C$ by (20). Moreover,

$$\lim_{L \to \infty} p_L(\zeta) = \frac{P(\zeta)}{\tilde{Q}(\zeta)}, \tag{24}$$

where $\tilde{Q}$ is the unique minimum of the functional

$$J_P(Q) = \int_{-\pi}^{\pi} \left[ C(e^{i\theta})\, Q(e^{i\theta}) - P(e^{i\theta}) \log Q(e^{i\theta}) \right] d\theta. \tag{25}$$

It is shown in Lindquist and Picci (2013) that the problem to minimize (23) is the dual problem of the primal problem to maximize the generalized entropy gain

$$I_P(p) = \sum_{j=0}^{L-1} P(\zeta_j) \log p(\zeta_j) \tag{26}$$

subject to the moment condition

$$\sum_{j=0}^{L-1} \zeta_j^n\, p(\zeta_j) = c_n, \quad n = 1, 2, \ldots, N.$$

The optimal solution of the primal problem is precisely (21), where $Q$ is the optimal solution to the dual problem to minimize (23). Choosing $P = 1$, we obtain the maximum-entropy solution.

Remark 4. It is easy to see that maximizing $I_P$ is equivalent to minimizing the Kullback-Leibler divergence

$$D(P \| p) = \sum_{j=0}^{L-1} P(\zeta_j) \log \frac{P(\zeta_j)}{p(\zeta_j)} \tag{27}$$

with respect to $p$. Likewise, the problem to minimize (25) is the dual of the problem to minimize the continuous Kullback-Leibler divergence

$$D(P \| p) = \int_{-\pi}^{\pi} P(e^{i\theta}) \log \frac{P(e^{i\theta})}{p(e^{i\theta})}\, d\theta \tag{28}$$

subject to the moment constraints in (5). (See, e.g., Georgiou and Lindquist (2003).)

If we are content with equidistantly placed Dirac points, we now immediately have a procedure to determine a solution to the approximation problem of Section 2. In fact, applying Theorem 3, we obtain the solution

$$\eta = [w_1, \ldots, w_L, \hat{\theta}_1, \ldots, \hat{\theta}_L]^T, \tag{29}$$

where

$$w_j = \frac{P(\zeta_{j-1})}{\hat{Q}(\zeta_{j-1})}, \quad \hat{\theta}_j = (j - 1)\Delta. \tag{30}$$

In particular, we can use the maximum entropy solution setting $P = 1$.

4.3 A generalization of Theorem 3

Inspecting the proof of Theorem 3 in Lindquist and Picci (2013) and modifying it along the lines of Byrnes and Lindquist (2003, 2006), it is trivial to see that $P$ need not belong to $\mathcal{P}_+(L)$ but could, for example, be an arbitrary probability density.

Theorem 5. Let $f$ be an arbitrary continuous probability density. Then, replacing $P$ by $f$ in Theorem 3, all statements of that theorem remain true. Moreover, $Q_L$ tends to a limit $Q_\infty$ as $L \to \infty$ and

$$\lim_{L \to \infty} p_L(\zeta) = p_\infty(\zeta) := \frac{f(\zeta)}{Q_\infty(\zeta)}. \tag{31}$$

The limit density $p_\infty$ satisfies the moment conditions (5).

Proof. Exchanging $P$ for $f$, a trivial modification of the proofs of Theorems 3, 4, and 7 in Lindquist and Picci (2013) proves the theorem.

Corollary 6. Let $\tilde{p}$ be a probability density with the moments $c_1, c_2, \ldots, c_N$, and set $P := \tilde{p}$ everywhere in the statement of Theorem 3. Then

$$\lim_{L \to \infty} p_L(\zeta) = \tilde{p}(\zeta). \tag{32}$$

Proof. The limit of $p_L$ as $L \to \infty$ is obtained from (31) setting $f := \tilde{p}$. It remains to show that $Q_\infty \equiv 1$ in this case so that $p_\infty = \tilde{p}$, which will be done by means of the primal optimization problems of Remark 4. However, since $p_\infty$ satisfies the moment conditions (5), the minimum of $D(\tilde{p} \| p) \ge 0$ is zero, and hence the unique minimizer of the Kullback-Leibler divergence $D(\tilde{p} \| p)$ subject to the moment constraints (5) is precisely $\tilde{p}$. Consequently, the optimal dual solution is $Q_\infty \equiv 1$, as claimed.

4.4 Determining nonequidistant circular Dirac mixtures

As explained in the introduction, in general we would like to place the Dirac points denser where the probability mass is larger and more sparse where it is small. Then the procedure above needs to be modified.

To this end, let $M$ be the unique integer such that $L = 2M$ when $L$ is even and $L = 2M + 1$ when $L$ is odd, and let $\mathcal{R}_+$ be the class of probability densities

$$R(e^{i\theta}) = \sum_{k=-M}^{M} r_k\, e^{ik\theta}, \quad r_{-k} = r_k \tag{33}$$

that are nonnegative on the discrete unit circle. By Theorem 5, we can replace $P \in \mathcal{P}_+$ by $R \in \mathcal{R}_+$ in the statement of Theorem 3. We now choose $L$ sufficiently large and remove Dirac points by setting $R(\zeta_j) = 0$ in points where no point mass is desired. Solving the optimization problem of Theorem 3, we then obtain a solution (29) with $\hat{\theta}_k$ removed whenever $R(e^{i\hat{\theta}_k}) = 0$.

To construct such an $R$ we choose a vector $\hat{r} \in \mathbb{R}^{M+1}$, such that $\hat{r}_j = R(\zeta_j)$, $j = 0, 1, \ldots, M$, in the following manner. Choose $\hat{r}_j = 0$ in positions where no Dirac point is needed and take the other $\hat{r}_j$ to be positive tuning parameters. We fix only $M + 1$ values for $R$ on the circle, rather than $L$, since the symmetry condition $\zeta_{L-k} = \bar{\zeta}_k$ will then prescribe values for $R$ in the remaining $L - M - 1$ points. Then, analogously to (16), we obtain

$$r := \begin{bmatrix} r_0 \\ r_1 \\ \vdots \\ r_M \end{bmatrix} = \frac{1}{M + 1} \tilde{F}^T \hat{r}, \tag{34}$$

where $\tilde{F}$ is the $(M+1) \times (M+1)$ matrix

$$\tilde{F} = \begin{bmatrix} 1 & 2\cos(\hat{\theta}_1) & \cdots & 2\cos(M\hat{\theta}_1) \\ 1 & 2\cos(\hat{\theta}_2) & \cdots & 2\cos(M\hat{\theta}_2) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 2\cos(\hat{\theta}_{M+1}) & \cdots & 2\cos(M\hat{\theta}_{M+1}) \end{bmatrix}. \tag{35}$$

Since $\tilde{F}$ is a square Vandermonde matrix, it is nonsingular and

$$\tilde{F} \tilde{F}^T = \tilde{F}^T \tilde{F} = (M + 1) I, \tag{36}$$

where $I$ is the $(M+1) \times (M+1)$ identity, so there is a one-to-one correspondence between $r$ and $\hat{r}$. Therefore inserting (34) into (33) yields a suitable $R$.

5. DISTANCE MEASURE

In this section, we assume that more Dirac components are available than required for satisfying the given moment constraints. This redundancy is exploited by continuously placing the Dirac components in such a way that the resulting circular Dirac mixture $p(\theta)$ is as close as possible to the given continuous density $\tilde{p}(\theta)$. Closeness between the two densities, the given continuous density $\tilde{p}(\theta)$ and its approximation $p(\theta)$, is quantified by a distance measure.

For the direct comparison of continuous densities and Dirac mixture densities, typical distance measures such as the Kullback-Leibler distance or squared integral distances are not well defined (Hanebeck and Klumpp, 2008). An alternative in the case of linear quantities is to compare cumulative distributions instead of densities. Doing so is, however, complicated in the circular case. In this paper, we adapt the method proposed for linear quantities in Hanebeck (2014) to the circular case. This approach is inspired by blue-noise sampling methods, e.g., see Schlömer et al. (2011). Here, we use kernels for describing the mass of individual Dirac components as in Fattal (2011) instead of discs.

The key idea of this paper is the following. Individual Dirac components of the circular Dirac mixture $p(\theta)$ are characterized by repulsion kernels. A repulsion kernel is an appropriate kernel function representing the spread of probability mass that is required to approximate the given density at a given location. The individual repulsion kernels are summed up to form an induced kernel density, which is then a continuous representation of the circular Dirac mixture $p(\theta)$. For obtaining a distance measure, the deviation between the given density and the induced kernel density is considered. In this paper, we use a squared integral distance, which is minimized by means of an optimization method in order to find the parameters of the circular Dirac mixture $p(\theta)$.

5.1 Using von Mises distribution kernels

For characterizing each Dirac component $j$ located at $\hat{\theta}_j$, we use a von Mises distribution normalized to the height of the given density at the location of the Dirac component. For that purpose, we take an unnormalized von Mises distribution (see Sec. 6) according to $e^{\kappa_j \cos(\theta - \hat{\theta}_j)}$ located at $\hat{\theta}_j$. Dividing it by a normalization term $e^{\kappa_j}$, it is normalized to height one. Multiplying it with the height of the given density at the Dirac component location $\tilde{p}(\hat{\theta}_j)$ gives the desired kernel functions

$$k_j(\theta) = \tilde{p}(\hat{\theta}_j)\, \frac{e^{\kappa_j \cos(\theta - \hat{\theta}_j)}}{e^{\kappa_j}} \tag{37}$$

for $j = 1, \ldots, L$, where $\kappa_j$ is a spread parameter to be determined.

The constants $\kappa_j$ in (37) are determined by maintaining the mass in the Dirac component given by $w_j$, so we obtain

$$\int_{-\pi}^{\pi} k_j(\theta)\, d\theta = w_j, \tag{38}$$

which gives

$$2\pi\, \tilde{p}(\hat{\theta}_j)\, e^{-\kappa_j} I_0(\kappa_j) = w_j, \tag{39}$$

where $I_0(\cdot)$ is the modified Bessel function of the first kind and zeroth order. For obtaining $\kappa_j$, $j = 1, \ldots, L$, (39) is rewritten as

$$f(\kappa) = 2\pi\, e^{-\kappa} I_0(\kappa) - \frac{w_j}{\tilde{p}(\hat{\theta}_j)} \tag{40}$$

and we desire to find the zero $\kappa_0$ of the function, i.e., $f(\kappa_0) = 0$. For that purpose, a Newton-like method is used with

$$\frac{\partial}{\partial \kappa} f(\kappa) = 2\pi\, e^{-\kappa} \left( I_1(\kappa) - I_0(\kappa) \right). \tag{41}$$

The resulting kernels have a larger width in low-density regions of the given continuous density, as these regions are characterized by few Dirac components per unit length. In high-density areas, the kernels have a smaller width, which results in more Dirac components per unit length.

The density induced by the kernels in (37) is given by

$$k(\theta, \eta) = \sum_{j=1}^{L} k_j(\theta). \tag{42}$$

This induced kernel density $k(\theta, \eta)$ is now used to characterize the circular Dirac mixture $p(\theta)$. We assume that when the mass distributions of $\tilde{p}(\theta)$ and the circular Dirac mixture $p(\theta)$ are similar, then $k(\theta)$ is close to $\tilde{p}(\theta)$ on $\Gamma$ and we consider the densities $\tilde{p}(\theta)$ and $p(\theta)$ to be close. Following this argument, $k(\theta)$ is used instead of the circular Dirac mixture $p(\theta, \eta)$ to define a distance measure $D(\eta)$ between $\tilde{p}(\theta)$ and $p(\theta, \eta)$ on $\Gamma$.

Here, we use a squared integral distance measure given by

$$D(\eta) = \int_{\Gamma} \left( \tilde{p}(\theta) - k(\theta, \eta) \right)^2 d\theta. \tag{43}$$

As the final optimization problem, we now desire to minimize the distance measure in (43) while maintaining the moment constraints, i.e.,

$$\eta_{\text{opt}} = \arg\min_{\eta \in S} D(\eta) \quad \text{s.t. } E_p\{\cos(n\theta)\} = c_n \text{ for } n = 1, \ldots, N. \tag{44}$$

Minimization of the distance measure under the moment constraints is nonlinear and nonconvex.
Applying standard numerical minimization methods to (44) will most likely end ˆ For characterizing each Dirac component j located at θj, up in local minima and the results depend on the starting we use a von Mises distribution normalized to the height of values for the parameter vector. As this is undesired, we κ=2 π/2 D2 = 0.052 L=4 0.6 N=0

0.5 0.6

0.4 0.4 → →

) π ) θ 0.3 θ 0

p( π p( − 0.2 0.2 0 0.1 1 1 0 0 0 −2 0 2 sin(θ) → −1 −1 θ → θ → cos( ) −π/2

Fig. 1. Graphical visualizations for a symmetric Dirac mixture approximation of a von Mises distribution with L = 4 components and no moment constraints (N=0) for κ = 2. (Left) Standard plot. (Middle) Circular plot in three dimensions. (Right) Circular plot in two dimensions. propose the use of a homotopy continuation method (Allgo- The given circular probability density function p˜(θ) is wer and Georg, 2003). In doing so, we start with a density approximated with a circular Dirac mixture p(θ, η) defined with a known approximation and gradually approach the ˆ according to (7). Both weights wj and locations θj are desired density. For that purpose, a progression parameter optimized for j = 1,...,L. γ is defined for parametrizing the given density as p˜γ (θ) The approximation problem can now be simplified by noting p˜(θ, γ) = . (45) that it is sufficient to consider a specific fixed location value R π γ −π p˜ (θ) dθ µ, where without loss of generality we use µ = 0. The Dirac We start with γ = 0, which gives us a uniform density on mixture approximation calculated for this location value the circle, i.e., we have can later be shifted to arbitrary location values by shifting 1 each component accordingly. As a result, the true circular p˜(θ, γ = 0) = . (46) moments are given by 2π The Dirac mixture approximation for this density is known In(˜κ) and given by cn = (50) I (˜κ) 1 π 0 wj = and θj = (2j − L − 1) (47) L L for n ∈ IN, where I0(.) is the Bessel function of the first for j=1, . . . , L. kind and zeroth order and In(.) is the Bessel function of the first kind and nth order. We now progressively increase the parameter γ and for every γ solve the optimization problem by using the result from For a von Mises distribution, the progressive parametriza- the previous step as starting values, which ensures tracking tion is given by the desired solution, see Alg. 1. No predictor is used. As 1 a corrector in line 7, we use the function fmincon for p˜(θ, γ) = eγκ˜ cos(θ−µ) . 
(51) constrained optimization in our MATLAB implementation. 2πI0(γκ˜) A maximum amount of iterations is prespecified and success is reported when the optimum for the next value of γ can For a von Mises distribution with κ˜ = 2, a symmetric Dirac be found within these iterations. mixture approximation of a von Mises distribution with L = 4 components and no moment constraints (N=0) is 6. AN APPROXIMATION EXAMPLE visualized with three different types of plots in Fig. 1. The given density is always shown in yellow, the induced kernel As an example of a continuous circular density, we consider density corresponding to its Dirac mixture approximation the von Mises distribution (Mardia and Jupp, 2009) given in blue. On the left side, a standard linear plot of the by 1 periodic probability density functions is shown. The middle p(θ, µ, κ˜) = eκ˜ cos(θ−µ) , (48) plot shows a circular plot in three dimensions. The right 2πI0(˜κ) graph shows a dedicated circular plot in two dimensions where µ is the location parameter, κ˜ is the concentration to compactly represent the results of the simulations. parameter, and we have µ ∈ Γ and κ˜ > 0. I0(.) is the Bessel The outermost ring shows the given circular probability function of the first kind and zeroth order. density function in yellow and the induced kernel density The circular moments of the von Mises density are given corresponding to its Dirac mixture approximation in by blue. The middle ring shows the locations of the Dirac components in purple, where the width is proportional to  inθ In(˜κ) inµ Ep e = e (49) the associated weight. The innermost ring shows the error I0(˜κ) between the given circular probability density function for n ∈ IN. In(.) is the Bessel function of the first kind and and the induced kernel density corresponding to its Dirac nth order. mixture approximation normalized to one in red. Dirac mixture approximation maintaining a set of predefined circular moments. 
It will be Input :Given densityp ˜, number of Dirac components L, interesting to compare the performance of the two proposed number of moments N to maintain methods. This will be the topic of a future study. Output :Dirac mixture approximation with weights wj and The next step is the generalization to higher-dimensional ˆ locations θj , j = 1,...,L collected in parameter η periodic manifolds such as torii or spheres. according to (9) // Initialize progression step counter ACKNOWLEDGEMENTS PC = 0; 1 // Initialize progression parameter γ The authors would like to thank Gerhard Kurz and Igor γ = 0; 2 Gilitschenski from the Intelligent Sensor-Actuator-Systems // Initialize ∆γ Laboratory (ISAS) at the Karlsruhe Institute of Technology ∆γ =  (some small positive number); 3 (KIT) for their helpful comments. // Parameters for in-/decreasing ∆γ Down = 0.5 , Up = 1.5; 4 // Initial parameter vector for γ = 0 REFERENCES ˆ Initialize η with wj and θj , j = 1,...,L from (47); 5 Allgower, E.L. and Georg, K. (2003). Introduction to while γ < 1 do 6 Numerical Continuation Methods. SIAM. // Try correcting values for increased γ Azmani, M., Reboul, S., Choquel, J.B., and Benjel- [ηtmp, success] = Corrector(η, γ + ∆γ); 7 loun, M. (2009). A recursive fusion filter for angu- lar data. In 2009 IEEE International Conference on if Correction step successful? then 8 // Make trial update the temporary estimate Robotics and Biomimetics (ROBIO), 882–887. doi: η = ηtmp; 9 10.1109/ROBIO.2009.5420492. // Increment γ Byrnes, C.I. and Lindquist, A. (2006). The generalized γ = γ + ∆γ; 10 moment problem with complexity constraint. Integral Equations and Operator Theory, 56, 163–180. // Increase step size ∆γ = Up ∗ ∆γ; 11 Byrnes, C., Gusev, S., and Lindquist, A. (1999). A convex optimization approach to the rational covariance // Increment progression step counter PC extension problem. SIAM J. Control and Optmimization, PC = PC + 1; 12 37, 211–229. else 13 // Decrease step size Byrnes, C. 
and Lindquist, A. (2003). A convex opti- ∆γ = Down ∗ ∆γ; 14 mization approach to generalized moment problems. end 15 In K. Hashimoto, Y. Oishi, and Y. Yamamoto (eds.), // Limit γ to [0, 1] Control and Modeling of Complex Systems: Cybernetics if γ + ∆γ > 1 then 16 in the 21st Century: Festschrift in Honor of Hidenori ∆γ = 1 − γ; 17 Kimura on the Occasion of his 60th Birthday, 3–21. end 18 Birkhauser. end 19 Byrnes, C., Lindquist, A., Gusev, S., and Matveev, A. (1995). A complete parameterization of all positive Algorithm 1: Dirac mixture approximation of given rational extensions of a covariance sequence. IEEE continuous circular probability density function. Trans. Automatic Control, 40, 1841–1857. Carli, F., Ferrante, A., Pavon, M., and Picci, G. (2011). Circular plots of the Dirac mixture approximation of a von A maximum entropy solution of the covariance exten- Mises distribution with L = 6 and L = 8 components sion problem for reciprocal processes. IEEE Trans. and no moment constraints (N=0) for (left) κ = 0, Automatic Control, 56, 1999–2012. (middle) κ = 1, (right) κ = 3 in Fig. 2 and Fig. 3, Carli, F.P. and Georgiou, T. (2011). On the covariance respectively. It is obvious that for a larger number of completion problem under a circulant structure. IEEE components L, the error between the given density and its Trans. Automatic Control, 56(1), 918 –922. Dirac mixture approximation decreases. For N = L − 1 Fattal, R. (2011). Blue-noise point sampling us- moment constraints, circular plots of the Dirac mixture ing kernel density model. In ACM SIGGRAPH approximation of a von Mises distribution with L = 6 and 2011 papers, SIGGRAPH ’11, 48:1–48:12. ACM, New L = 8 components are shown for (left) κ = 0, (middle) York, NY, USA. doi:10.1145/1964921.1964943. URL κ = 1, (right) κ = 3 in Fig. 4 and Fig. 5, respectively. Here, the errors are always a littler larger than in the http://doi.acm.org/10.1145/1964921.1964943. unconstrained case. 
However, in all cases, the Dirac mixture Georgiou, T.T. and Lindquist, A. (2003). Kullback-leibler approximations are well-distributed and approximation approximation of spectral density functions. IEEE quality is very good. Trans. Information Theory, 49, 2910–2917. Georgiou, T. (1987). Realization of power spectra from 7. CONCLUSION partial covariances. IEEE Trans. on Acoustics, Speech and Signal Processing, 35, 438–449. The first proposed method for calculating the weights Gilitschenski, I. and Hanebeck, U.D. (2013). Efficient for a Dirac mixture approximation with fixed component Deterministic Dirac Mixture Approximation. In Pro- locations has not been simulated so far. Simulations for the ceedings of the 2013 American Control Conference (ACC second proposed approximation method are shown in the 2013). Washington D. C., USA. paper. It reliably provides well-distributed Dirac mixture Gilitschenski, I., Kurz, G., and Hanebeck, U.D. (2013). approximations of given circular densities while exactly Circular Statistics in Bearings-only Sensor Scheduling. κ=0 π/2 D2 = 0.028 κ=1 π/2 D2 = 0.037 κ=3 π/2 D2 = 0.051 L=6 L=6 L=6 N=0 N=0 N=0

π π π 0 0 0 −π −π −π

−π/2 −π/2 −π/2

Fig. 2. Circular plot of Dirac mixture approximation of von Mises distribution with L = 6 components and no moment constraints (N=0) for (left) κ = 0, (middle) κ = 1, (right) κ = 3.

[Fig. 3 panel titles (L = 8, N = 0): κ = 0, D2 = 0.026; κ = 1, D2 = 0.033; κ = 3, D2 = 0.044. Angular axis marked at 0, π/2, π (= −π), and −π/2.]

Fig. 3. Circular plot of Dirac mixture approximation of von Mises distribution with L = 8 components and no moment constraints (N=0) for (left) κ = 0, (middle) κ = 1, (right) κ = 3.

In Proceedings of the 16th International Conference on Information Fusion (Fusion 2013). Istanbul, Turkey.
Hanebeck, U.D. (2014). Kernel-based Deterministic Blue-noise Sampling of Arbitrary Probability Density Functions. In Proceedings of the 48th Annual Conference on Information Sciences and Systems (CISS 2014). Princeton, New Jersey, USA.
Hanebeck, U.D., Huber, M.F., and Klumpp, V. (2009). Dirac Mixture Approximation of Multivariate Gaussian Densities. In Proceedings of the 2009 IEEE Conference on Decision and Control (CDC 2009). Shanghai, China.
Hanebeck, U.D. and Klumpp, V. (2008). Localized Cumulative Distributions and a Multivariate Generalization of the Cramér-von Mises Distance. In Proceedings of the 2008 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2008), 33–39. Seoul, Republic of Korea.
Hanebeck, U.D. and Schrempf, O.C. (2007). Greedy Algorithms for Dirac Mixture Approximation of Arbitrary Probability Density Functions. In Proceedings of the 2007 IEEE Conference on Decision and Control (CDC 2007), 3065–3071. New Orleans, Louisiana, USA.
Huber, M.F. and Hanebeck, U.D. (2008). Gaussian Filter based on Deterministic Sampling for High Quality Nonlinear Estimation. In Proceedings of the 17th IFAC World Congress (IFAC 2008), volume 17. Seoul, Republic of Korea.
Julier, S., Uhlmann, J., and Durrant-Whyte, H.F. (2000). A new method for the nonlinear transformation of means and covariances in filters and estimators. IEEE Transactions on Automatic Control, 45(3), 477–482. doi:10.1109/9.847726.
Julier, S.J. (2002). The scaled unscented transformation. In American Control Conference, 2002. Proceedings of the 2002, volume 6, 4555–4559 vol.6. IEEE. doi:10.1109/ACC.2002.1025369.
Krein, M. and Nudelman, A. (1977). The Markov Moment Problem and Extremal Problems. American Mathematical Society, Providence, Rhode Island.
Krener, A. (1986). Reciprocal processes and the stochastic realization problem for acausal systems. In C.I. Byrnes and A. Lindquist (eds.), Modelling, Identification and Robust Control, 197–211. North-Holland, Amsterdam.
Kurz, G., Faion, F., and Hanebeck, U.D. (2013a). Constrained Object Tracking on Compact One-dimensional Manifolds Based on Directional Statistics. In Proceedings of the Fourth IEEE GRSS International Conference on Indoor Positioning and Indoor Navigation (IPIN 2013). Montbeliard, France.

[Fig. 4 panel titles (L = 6, N = 4): κ = 0, D2 = 0.028; κ = 1, D2 = 0.043; κ = 3, D2 = 0.062. Angular axis marked at 0, π/2, π (= −π), and −π/2.]

Fig. 4. Circular plot of Dirac mixture approximation of von Mises distribution with L = 6 components and N = 4 moment constraints for (left) κ = 0, (middle) κ = 1, (right) κ = 3.
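The moment constraints used for Fig. 4 require the first N circular moments of the Dirac mixture to coincide with those of the von Mises density. The following Python sketch illustrates such a check. The function names, the quadrature rule, the node count, and the example mixture are assumptions made for illustration and are not taken from the authors' implementation. It evaluates the trigonometric moments ∫ e^{inθ} p(θ) dθ of a zero-mean von Mises density numerically and compares them with the moments Σ_j wj e^{inθ̂j} of a Dirac mixture.

```python
import cmath
import math


def von_mises_moment(kappa, n, num_nodes=2048):
    """n-th circular moment of a zero-mean von Mises density (numerical).

    Uses the equally spaced rectangle rule on [-pi, pi), which is highly
    accurate for smooth periodic integrands. The normalization constant is
    computed with the same rule, so no Bessel function is required.
    """
    thetas = [-math.pi + 2.0 * math.pi * k / num_nodes for k in range(num_nodes)]
    unnormalized = [math.exp(kappa * math.cos(t)) for t in thetas]
    total = sum(unnormalized)
    weighted = sum(u * cmath.exp(1j * n * t) for u, t in zip(unnormalized, thetas))
    return weighted / total


def dirac_mixture_moment(weights, locations, n):
    """n-th circular moment of a Dirac mixture with given weights and locations."""
    return sum(w * cmath.exp(1j * n * t) for w, t in zip(weights, locations))


def moment_errors(kappa, weights, locations, num_moments):
    """Absolute moment mismatch for n = 1, ..., num_moments."""
    return [
        abs(dirac_mixture_moment(weights, locations, n) - von_mises_moment(kappa, n))
        for n in range(1, num_moments + 1)
    ]


# Assumed example: L = 6 equally weighted, equally spaced Dirac components.
L = 6
weights = [1.0 / L] * L
locations = [-math.pi + 2.0 * math.pi * (j + 0.5) / L for j in range(L)]

print(moment_errors(0.0, weights, locations, 4))  # uniform density: mismatch is numerically zero
print(moment_errors(1.0, weights, locations, 4))  # kappa = 1: this simple mixture misses the moments
```

The equally spaced mixture reproduces the moments of the uniform density (κ = 0) but not those of a concentrated density, which is why the locations and weights have to be adapted when moment constraints are imposed for κ > 0.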

[Fig. 5 panel titles (L = 8, N = 6): κ = 0, D2 = 0.026; κ = 1, D2 = 0.038; κ = 3, D2 = 0.070. Angular axis marked at 0, π/2, π (= −π), and −π/2.]

Fig. 5. Circular plot of Dirac mixture approximation of von Mises distribution with L = 8 components and N = 6 moment constraints for (left) κ = 0, (middle) κ = 1, (right) κ = 3.
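The step-size control of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the corrector of the paper is replaced by a hypothetical toy corrector (a few Newton steps on a scalar equation), the parameter vector η is a single float, and the `max_trials` guard is an addition that Algorithm 1 does not contain.

```python
def progressive_continuation(corrector, eta0, dgamma0=1e-3,
                             down=0.5, up=1.5, max_trials=10_000):
    """Adaptive progression from gamma = 0 to gamma = 1 in the style of Algorithm 1.

    corrector(eta, gamma) must return a pair (eta_new, success).
    Returns the final parameter and the number of accepted steps PC.
    """
    eta, gamma, dgamma, pc = eta0, 0.0, dgamma0, 0
    trials = 0
    while gamma < 1.0:
        if trials == max_trials:  # safety guard, not part of Algorithm 1
            raise RuntimeError("progression did not reach gamma = 1")
        trials += 1

        # Limit gamma to [0, 1]: never step past the end of the progression.
        remaining = 1.0 - gamma
        if dgamma >= remaining:
            dgamma, target = remaining, 1.0
        else:
            target = gamma + dgamma

        # Try correcting the parameters for the increased gamma.
        eta_tmp, success = corrector(eta, target)
        if success:
            eta, gamma = eta_tmp, target  # accept the trial update
            dgamma *= up                  # increase step size
            pc += 1
        else:
            dgamma *= down                # decrease step size and retry
    return eta, pc


def toy_corrector(x, gamma, max_newton=3, tol=1e-10):
    """Hypothetical corrector: a few Newton steps on x**2 - (1 + 3*gamma) = 0.

    The step is reported as failed when the residual is still above tol,
    which happens when gamma was increased too much at once.
    """
    a = 1.0 + 3.0 * gamma
    for _ in range(max_newton):
        x = 0.5 * (x + a / x)
    return x, abs(x * x - a) < tol


# Start from the exact solution at gamma = 0 and progress to gamma = 1.
eta, pc = progressive_continuation(toy_corrector, eta0=1.0)
print(eta, pc)  # eta is close to 2.0, the solution of x**2 = 4
```

Growing the step after each success and shrinking it after each failure lets the loop take large steps where the corrector converges easily and small steps where it does not, without a problem-specific step schedule.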

Kurz, G., Gilitschenski, I., and Hanebeck, U.D. (2013b). Recursive Nonlinear Filtering for Angular Data Based on Circular Distributions. In Proceedings of the 2013 American Control Conference (ACC 2013). Washington D. C., USA.
Lefebvre, T., Bruyninckx, H., and De Schutter, J. (2005). The linear regression Kalman filter. In Nonlinear Kalman Filtering for Force-Controlled Robot Tasks, volume 19 of Springer Tracts in Advanced Robotics.
Levy, B.C., Frezza, R., and Krener, A.J. (1990). Modeling and estimation of discrete-time Gaussian reciprocal processes. IEEE Trans. Automatic Control, 35(9), 1013–1023.
Lindquist, A. and Picci, G. (2013). The circulant rational covariance extension problem: The complete solution. IEEE Trans. Automatic Control, 58(11), 2848–2861.
Mardia, K.V. and Jupp, P.E. (2009). Directional Statistics. John Wiley & Sons.
Schlömer, T., Heck, D., and Deussen, O. (2011). Farthest-point optimized point sets with maximized minimum distance. In Proceedings of the ACM SIGGRAPH Symposium on High Performance Graphics, HPG '11, 135–142. ACM, New York, NY, USA. doi:10.1145/2018323.2018345. URL http://doi.acm.org/10.1145/2018323.2018345.
Schrempf, O.C., Brunn, D., and Hanebeck, U.D. (2006a). Density Approximation Based on Dirac Mixtures with Regard to Nonlinear Estimation and Filtering. In Proceedings of the 2006 IEEE Conference on Decision and Control (CDC 2006). San Diego, California, USA.
Schrempf, O.C., Brunn, D., and Hanebeck, U.D. (2006b). Dirac Mixture Density Approximation Based on Minimization of the Weighted Cramér-von Mises Distance. In Proceedings of the 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2006), 512–517. Heidelberg, Germany.
Schrempf, O.C. and Hanebeck, U.D. (2007). Recursive Prediction of Stochastic Nonlinear Systems Based on Optimal Dirac Mixture Approximations. In Proceedings of the 2007 American Control Conference (ACC 2007), 1768–1774. New York, New York, USA.
Stienne, G., Reboul, S., Azmani, M., Choquel, J., and Benjelloun, M. (2013). A multi-temporal multi-sensor circular fusion filter. Information Fusion. doi:10.1016/j.inffus.2013.05.012.
Tenne, D. and Singh, T. (2003). The higher order unscented filter. In American Control Conference, 2003. Proceedings of the 2003, volume 3, 2441–2446 vol.3.