Optimal Quantization of Circular Distributions

Igor Gilitschenski¹, Gerhard Kurz², Uwe D. Hanebeck², and Roland Siegwart¹

¹ I. Gilitschenski and R. Siegwart are with the Autonomous Systems Laboratory (ASL), Institute for Robotics and Intelligent Systems, ETH Zurich, Switzerland.
² G. Kurz and U. D. Hanebeck are with the Intelligent Sensor-Actuator-Systems Laboratory (ISAS), Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology (KIT), Germany.

Abstract: In this work, a new method for approximating circular distributions by a mixture of weighted discrete samples is proposed. Particularly, the wrapped normal distribution, the von Mises distribution, and the Bingham distribution are considered. The approximation approach is based on formulating a quantizer and a global optimality measure, which can be optimized directly. Furthermore, a relationship between the Bingham distribution and the von Mises distribution is formulated, showing that it is sufficient to approximate a von Mises distribution with suitably chosen parameters in order to obtain an optimal approximation of the Bingham distribution. The resulting approximation is of particular interest for filtering applications, because the involved optimality measure gives rise to a general error estimate in propagation of uncertainties through nontrivial functions in the circular domain.

I. INTRODUCTION

A broad range of applications in perception and sensing requires estimation of quantities that are defined on inherently nonlinear state-spaces. These quantities include the direction of the wind, poses of a robot, or direction of arrival measurements in signal processing. Linear approximation of such state-spaces may be employed in cases of low noise. This gives rise to using well-known linear estimation techniques such as the Kalman filter. However, in cases involving high levels of noise, neglecting the underlying geometric structure of the manifold becomes infeasible.

For the circle, which is considered in this work, a number of different filtering techniques have been proposed that take the underlying geometry into account. This is achieved by making use of probability distributions from directional statistics [1], [2], [3], which is a subfield of statistics that considers uncertainties defined on these manifolds. For example, circular equivalents of a Kalman filter may be based on matching trigonometric moments and are used for simple system models, particularly the identity or a shift.

Whenever system models are more complex, the continuous distribution representing the current estimate is typically approximated by a discrete distribution in order to simplify propagation. For the linear case, there exists an extensive body of research discussing different types of such approximations. However, this is not the case when approximating continuous densities defined on the circle, as the state of the art is limited by either using only a fixed number of discrete values (as in the UKF) or not providing any provable error bound for the estimation procedure.

In linear state-spaces, one of the approaches overcoming these limitations is based on stochastic quantization. Quantization has been well known for many years in the context of signal processing and information compression [4]. Recently, there has been an emphasis on the quantization of probability distributions [5]. For the Gaussian case, this has been proposed in [6], which is based on a quadratic quantization approach.

In this work, we investigate optimal quadratic quantization of circular probability distributions. The involved optimality measure gives rise to an error bound for computing expectations of nonlinear transformations of random variables. The proposed quantizers are of considerable interest for nonlinear filtering involving circular quantities because a discrete distribution with a sufficiently large number of samples can be chosen to maintain a predefined quality level for the prediction step. This work considers the wrapped normal, the von Mises, and the Bingham distributions because of their importance in many applications involving circular data. However, the proposed approach is easily adaptable to other circular distributions and other application scenarios besides stochastic filtering, such as analysis of time series involving periodic quantities.

In summary, this paper contains the following contributions:
• A novel method for approximating probability densities on the circle with applications to wrapped normal (WN) and von Mises (VM) distributions.
• Derivation of a simple relationship between the VM and Bingham distributions for easily adapting the approximations of the VM distribution to the Bingham case.
• An error bound for scenarios where the resulting sample set is applied to numerical integration for Lipschitz continuous transforms of random variables.
• Evaluation comparing the proposed approach to other state-of-the-art approaches.

The remainder of this paper is structured as follows. The next section gives an overview of related work. Some preliminaries on circular distributions and stochastic quantization are given in Sec. III. The proposed method is presented in Sec. IV and evaluated in Sec. V. Finally, the work is concluded in Sec. VI.

Figure 1: Density functions of the probability distributions considered in this work (three panels showing $f_{\mathrm{WN}}(\theta)$, $f_{\mathrm{VM}}(\theta)$, and $f_{\mathrm{Bingham}}(\theta)$ over $\cos(\theta)$ and $\sin(\theta)$). All location parameters were set in a way ensuring the mode (or one of the modes) to be at 0. The wrapped normal density (red) has dispersion parameter σ = 1, for the von Mises density (green) κ = 1 was chosen, and the Z matrix of the Bingham density (blue) was chosen as Z = diag(−10, 0).

II. RELATED WORK

There are numerous applications involving stochastic filtering based on directional statistics. In [7], a von Mises distribution based filter is used for azimuthal speaker tracking. This distribution is also used in [8] for GNSS signal tracking. Active speaker localization using von Mises and wrapped normal distributions is discussed in [9], [10]. A multivariate variant of the wrapped normal distribution is used in the context of radar signal processing in [11]. Furthermore, a discussion of circular estimation using lattice theory is presented in [12]. Early literature considering

filtering on nonlinear domains dates back to [13], [14] and is still an active area of research [15].

Most of the nonlinear filters discussed so far typically assume simple system models such as the identity or a mere shift. In order to tackle propagation through more complex functions, sampling schemes reminiscent of the unscented Kalman filter (UKF) have been proposed in [16], [17]. Similar to the UKF, these schemes are based on matching moments for obtaining a discrete approximation of the underlying continuous distribution. The method presented in [18] considers a situation where the underlying discrete distribution has more Dirac components than required in order to satisfy the moment constraints. Then, a distance measure is used to exploit this redundancy in order to make the discrete distribution as similar as possible (with respect to this distance measure) to its original continuous counterpart.

In contrast to these approaches, the method presented in this work does not aim at satisfying moment constraints. Instead, a quantization based method is used in order to provide a discrete approximation of the continuous density. Unless precomputation and interpolation are used, this comes at the cost of a higher computation time compared to the UKF-like approaches. However, the main advantage over the state of the art is an error bound for propagation of uncertainty through Lipschitz continuous functions. Therefore, in scenarios where a Lipschitz constant is known a priori, it is possible to approximate the result of an expectation computation up to a predefined level of precision.

III. PRELIMINARIES

This section first revisits some circular probability distributions and their properties. Then, the concept of optimal quadratic quantization is introduced.

A. Circular Distributions

Circular probability distributions are inherently defined on the circle $S^1$. Thus, there are several ways to parametrize them and to derive their respective probability density functions (p.d.f.). The most widely used circular distributions arise from modifying non-circular counterparts in order to restrict them to a circular domain. All distributions considered in this work are related to the normal distribution in some sense. Their density functions are shown in Fig. 1.

One of the main reasons for the wide-spread use of the normal distribution is the central limit theorem. Thus, it is desirable to obtain a distribution which also appears as a limit distribution for a wide class of circular densities. An obvious approach is wrapping a normally distributed random variable $x \sim \mathcal{N}(\mu, \sigma^2)$ around an interval of length $2\pi$. This results in the wrapped normal distribution.

Definition 1. Let $X \in [-\pi, \pi)$ be a random variable with corresponding p.d.f.

$$ f_{\mathrm{WN}}(\theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \sum_{k=-\infty}^{\infty} \exp\left( -\frac{(\theta - \mu + 2k\pi)^2}{2\sigma^2} \right) $$

where $\sigma > 0$ and $\mu, \theta \in [-\pi, \pi)$. Then, the distribution of X is called a wrapped normal distribution and we write $X \sim \mathrm{WN}(\mu, \sigma)$.

It can be easily seen that this distribution naturally arises as a limit distribution for a circular summation scheme of other wrapped distributions (which originally converged to the normal distribution). Furthermore, the underlying density function appears as a solution of the heat equation, giving further justification for considering this distribution [19]. Similar to the normal distribution, it can be shown that the sum of two wrapped normal distributed random variables is itself a wrapped normal distributed random variable (after a modulo $2\pi$ operation).

Unfortunately, the wrapped normal distribution is lacking one of the important properties of the normal distribution. The product of two wrapped normal densities is not itself a rescaled wrapped normal density. This is particularly a problem for a typical Bayesian measurement update step in stochastic filtering. Furthermore, handling of the infinite sum within the density might be computationally inconvenient in some applications, particularly for large σ.

The von Mises distribution arises when taking a $\mathcal{N}(\mu, C)$ distributed random vector in $\mathbb{R}^2$ with $\|\mu\| = 1$ and conditioning it to the unit circle. The definition of this distribution is usually formulated as follows.

Definition 2. Let $X \in [-\pi, \pi)$ be a random variable with corresponding p.d.f.

$$ f_{\mathrm{VM}}(\theta) = \frac{1}{2\pi I_0(\kappa)} \exp(\kappa \cos(\theta - \mu)) $$

where $\kappa \geq 0$, $\mu, \theta \in [-\pi, \pi)$, and $I_0(\kappa)$ is the modified Bessel function of order 0. Then, the distribution of X is called a von Mises distribution and we write $X \sim \mathrm{VM}(\mu, \kappa)$.

The von Mises distribution can be used for a closed form measurement update step, thus it is of particular interest for stochastic filtering. The same property is also maintained by the Bingham distribution [20]. It is an antipodally symmetric equivalent to the von Mises distribution and it is constructed in a similar way. Instead of requiring $\|\mu\| = 1$, we require $\mu = 0$. Thus, the resulting density has 180° symmetry.

Definition 3. Let $X \in [-\pi, \pi)$ be a random variable with corresponding p.d.f.

$$ f_{\mathrm{Bingham}}(\theta) = \frac{1}{N(Z)} \exp\left( v(\theta)^\top M Z M^\top v(\theta) \right) $$

where $v(\theta) = (\cos(\theta), \sin(\theta))^\top$, $M \in \mathbb{R}^{2 \times 2}$ is orthogonal, and $Z \in \mathbb{R}^{2 \times 2}$ is diagonal. Then, the distribution of X is called a two-dimensional Bingham distribution and we write $X \sim \mathrm{Bingham}(M, Z)$.

The particular interest in the Bingham distribution stems from its universal applicability not only to the circular case but also to representing uncertain orientations using its four-dimensional variant. Besides the filtering applications mentioned in the introduction, the Bingham distribution was also applied in the analysis of textures [21] and shapes.

B. Optimal Quadratic Quantization

The basic idea of quantization in digital signal processing is to reduce the number of values (which is typically uncountable) of some signal to a countable set. Optimal quantization of a continuous probability distribution defined on a space $\Omega \subset \mathbb{R}^n$ addresses the same problem. The basic goal considered in this work is the reduction of the number of values describing a probability distribution to some finite set and thus approximation of a continuous probability distribution by a discrete distribution. Quantization of probability distributions is discussed in greater depth in [5].

Voronoi quantizers can be used for the considered approximation problem. Such a quantization can be thought of as a vector $x \in \Omega^N$ consisting of means of Voronoi cells (in the scenario considered in this paper, it will always be assumed that this vector is sorted, i.e., $x_i < x_{i+1}$). The Voronoi cells defined by this vector are given by

$$ C_i = \{ x \in \Omega : d(x, x_i) < d(x, x_j), \ \forall j \neq i \} , $$

where $d(\cdot, \cdot)$ is a suitable choice for the underlying distance. Thus, the following notation defining a quantizer is meaningful.

$$ V_x(X) = x_i \quad \text{for } X \in C_i . \tag{1} $$

For the purpose of this work, the boundary between two neighbouring sets $C_i$ and $C_j$ may be arbitrarily added to either $C_i$ or $C_j$.

There are many ways to define a quantization error describing the quality of a given quantizer. Thus, a general definition is given here.

Definition 4. For an N-component Voronoi quantizer $V_x$ (i.e., $x \in \Omega^N$) of a probability distribution P, the N-th quantization error for P of order p is defined by $E(\|X - V_x(X)\|^p)$, where X is a P-distributed random variable.

In the case where $\|\cdot\|$ is the Euclidean norm and p = 2, the resulting error measure is also known as (quadratic) distortion and denoted by $D_x$. Using (1), this error measure can be formulated as

$$ E(\|X - V_x(X)\|^p) = \int_\Omega \min_{1 \leq i \leq N} \|x_i - X\|^p \, dP . $$

Finally, the problem of optimal quantization for P of order p can be formulated as finding

$$ x^* = \arg\min_{x \in \Omega^N} E(\|X - V_x(X)\|^p) . $$

A quantizer $V_x$ can be used to define a discrete probability distribution where each component $x_i$ is assigned probability weight

$$ w_i = P(X \in C_i) . \tag{2} $$

This can be applied to deriving error bounds for approximate numerical integration of some function $g : \Omega \to \Omega'$, e.g., by using an optimal quantizer with respect to the quadratic distortion error measure. The resulting approximate integral is given by

$$ E(g(X)) \approx E(g(V_x(X))) = \sum_{i=1}^{N} \int_{C_i} g(x_i) \, dP = \sum_{i=1}^{N} w_i \, g(x_i) . $$

In this scenario, the following error bound [6], [22] can be obtained for a Lipschitz continuous function g with Lipschitz constant $L_g$

$$ \begin{aligned} \|E(g(X)) - E(g(V_x(X)))\|_2 &\leq E(\|g(X) - g(V_x(X))\|_2) \\ &\leq L_g \cdot E(\|X - V_x(X)\|_2) \\ &\leq L_g \cdot \sqrt{D_x} , \end{aligned} \tag{3} $$

where $D_x$ denotes the quadratic distortion of the quantizer $V_x$.

IV. QUANTIZATION OF CIRCULAR DENSITIES

A circular equivalent to the quadratic distortion will be used as the considered error measure. It is defined by

$$ D_x := E(d(X, V_x(X))^2) , \tag{4} $$

where $d(\cdot, \cdot)$ is the circular equivalent of a Euclidean distance taking the periodicity of the circle into account. That is, $d(\cdot, \cdot)$ is given by

$$ d(x, y) := \min\{ |x - y|, \ 2\pi - |x - y| \} $$

for every $x, y \in [a, a + 2\pi)$ and arbitrary $a \in \mathbb{R}$. This choice makes (3) adaptable to the circular case. An important application of this is the derivation of an upper bound for the error in an approximate propagation on the circular domain. It can be observed that the distortion of optimal quantizers with a fixed number of components is maximized when the considered densities approach or become the uniform distribution on the circle (this happens for $\sigma \to \infty$, $\kappa = 0$, and $Z = \mathrm{diag}(c, c)$ with arbitrary $c \in \mathbb{R}$). Thus, an optimal N-component quantizer of the uniform distribution yields an upper limit for the approximation error made by approximating the true continuous density by a discrete density obtained from quantization.

Using the $2\pi$-periodicity of circular densities allows rewriting the error measure (4) as

$$ D_x = \sum_{i=1}^{N} \int_{C_i} (x - x_i)^2 \cdot f(x) \, dx = \sum_{i=1}^{N} \int_{x_{i-1/2}}^{x_{i+1/2}} (x - x_i)^2 \cdot f(x) \, dx , $$

where the $x_i$ are assumed to be in increasing order, $x_0 = x_N - 2\pi$, $x_{N+1} = x_1 + 2\pi$, and $x_{i+1/2} = (x_i + x_{i+1})/2$.

Optimization procedures for computing an optimal quantizer can usually be sped up by providing the gradient. Thus, a general result is derived on the gradient of the distortion for continuous circular densities.

Some further notation will be needed for the formulation of the following results and their respective proofs. For a p.d.f. $f(y)$, let $F^{(1)}(y)$ denote the antiderivative of $y^2 \cdot f(y)$, let $F^{(2)}(y)$ denote the antiderivative of $y \cdot f(y)$, and let $F^{(3)}(y)$ denote the antiderivative of the density $f(y)$ itself.

Proposition 1. Consider a circular random variable $X \in [-\pi, \pi)$ following a continuous distribution given by the density f. Then, the gradient of the error measure (4) with respect to x is given by

$$ \frac{\partial D_x}{\partial x_i} = -2 \left[ F^{(2)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} + 2 \, x_i \left[ F^{(3)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} . $$

A proof is given in Appendix A. In the following, a brief look will be taken at the considered circular distributions and some theoretical aspects of computing the distortion for a quantizer of these distributions.

A. Wrapped Normal Case

The antiderivative of a wrapped normal density cannot be computed in closed form. However, due to its relationship to the Gaussian distribution, it can be represented as an infinite sum involving the erf function.

Proposition 2. The distortion of a quantization $V_x$ of the wrapped normal distribution $\mathrm{WN}(0, \sigma)$ is given by

$$ D_x = \sum_{i=1}^{N} \sum_{k=-\infty}^{\infty} P^{(1)}_{i,k} - 2 P^{(2)}_{i,k} + P^{(3)}_{i,k} . $$

With $f_k(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y + 2k\pi)^2}{2\sigma^2} \right)$ denoting the k-th summand of the wrapped normal density, the components are given by

$$ P^{(1)}_{i,k} = \left[ \sigma^2 (2k\pi - y) f_k(y) + \left( 2k^2\pi^2 + \frac{\sigma^2}{2} \right) \mathrm{erf}\left( \frac{y + 2k\pi}{\sqrt{2}\sigma} \right) \right]_{x_{i-1/2}}^{x_{i+1/2}} , $$

$$ P^{(2)}_{i,k} = -x_i \left[ \sigma^2 \cdot f_k(y) + k \cdot \pi \cdot \mathrm{erf}\left( \frac{y + 2k\pi}{\sqrt{2}\sigma} \right) \right]_{x_{i-1/2}}^{x_{i+1/2}} , $$

and

$$ P^{(3)}_{i,k} = \frac{x_i^2}{2} \left[ \mathrm{erf}\left( \frac{y + 2k\pi}{\sqrt{2}\sigma} \right) \right]_{x_{i-1/2}}^{x_{i+1/2}} . $$

A proof is given in Appendix B. The resulting error measure for the case of N = 2 Voronoi cells is shown in the left plot of Fig. 2. The triangular structure of the plot is due to the fact that we assume the locations of the components to be given in increasing order, i.e., $x_2 > x_1$.

Figure 2: The first plot shows the distortion for a 2-quantizer of a WN(0, 1) distribution over $x_1$ and $x_2$ (where $x_1 < x_2$ is required). The second plot shows the distortion $D_x$ in a similar scenario for the VM(0, κ) distribution with κ = 0.1, κ = 1, and κ = 10, where the means of the Voronoi cells are symmetrically located at −x and x.

B. Von Mises Case

A simplification occurs for the quantization of the von Mises distribution, because there is no infinite series involved in the actual density function. However, the antiderivative of the von Mises density needs to be computed numerically. It is given by

$$ F^{(3)}_{\mathrm{VM}}(x) = \frac{1}{2\pi} \left( x + \frac{2}{I_0(\kappa)} \sum_{j=1}^{\infty} I_j(\kappa) \frac{\sin(j(x - \mu))}{j} \right) . $$

This series representation unfortunately involves the evaluation of a Bessel function in each summand. Furthermore, no similar representations for $F^{(1)}_{\mathrm{VM}}$ and $F^{(2)}_{\mathrm{VM}}$ are known so far.
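Since closed forms for these antiderivatives are not available, the bracketed terms of Proposition 1 can be evaluated by numerical quadrature. The following Python sketch illustrates this for a von Mises density; it is not the authors' Matlab implementation. The function names (`vm_pdf`, `cell_bounds`, `distortion`, `gradient`) are hypothetical, and SciPy's adaptive `quad` routine is an assumed stand-in for whichever integration method is preferred.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import i0


def vm_pdf(theta, mu=0.0, kappa=1.0):
    """von Mises density from Definition 2 (2*pi-periodic in theta)."""
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * i0(kappa))


def cell_bounds(x):
    """Cell limits x_{i-1/2} and x_{i+1/2} of a sorted quantizer, with wrap-around."""
    x = np.asarray(x, dtype=float)
    prev = np.concatenate(([x[-1] - 2.0 * np.pi], x[:-1]))  # x_0 = x_N - 2*pi
    nxt = np.concatenate((x[1:], [x[0] + 2.0 * np.pi]))     # x_{N+1} = x_1 + 2*pi
    return 0.5 * (prev + x), 0.5 * (x + nxt)


def distortion(x, pdf):
    """Circular quadratic distortion (4), integrated cell by cell."""
    lo, hi = cell_bounds(x)
    return sum(quad(lambda y, c=c: (y - c) ** 2 * pdf(y), a, b)[0]
               for c, a, b in zip(x, lo, hi))


def gradient(x, pdf):
    """Gradient from Proposition 1: -2 [F2] + 2 x_i [F3] over each cell."""
    lo, hi = cell_bounds(x)
    grad = []
    for c, a, b in zip(x, lo, hi):
        f2 = quad(lambda y: y * pdf(y), a, b)[0]  # integral of y * f(y)
        f3 = quad(pdf, a, b)[0]                   # integral of f(y)
        grad.append(-2.0 * f2 + 2.0 * c * f3)
    return np.array(grad)
```

A finite-difference comparison of `gradient` against `distortion` is a cheap way to check such an implementation before handing it to an optimizer.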

0.6 0.6 )

θ 1 ( ) ) θ θ ( 0.4 ( 0.4 VM WN f f 0.5 Bingham 0.2 0.2 f

0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 −1 −1 −1 −1 −1 −1 sin(θ) cos(θ) sin(θ) cos(θ) sin(θ) cos(θ) Figure 3: Optimal quantization of the WN(0, 1), VM(0, 1), and Bingham(A, Z) distributions, where A is the same as in (6) and Z = diag(−10, 0). The quantizer is represented by the induced discrete probability density, where the height of the discrete components is proportional to their respective probability weights.
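Quantizers like those in Figure 3 can be approximated with a general-purpose optimizer. The Python sketch below is one possible realization under stated assumptions, not the authors' implementation: it uses SciPy's BFGS method instead of the Matlab `fminunc` trust-region procedure reported in Section V, takes the start values from the heuristic given there, and finds only a locally optimal quantizer. All function names are hypothetical. The last function follows the steps of Algorithm 1; splitting each von Mises weight equally between the two antipodal Bingham components is an assumption based on the 180° symmetry of the density.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.special import i0


def vm_pdf(theta, kappa):
    """Zero-mean von Mises density (Definition 2 with mu = 0)."""
    return np.exp(kappa * np.cos(theta)) / (2.0 * np.pi * i0(kappa))


def cell_bounds(x):
    """Cell limits x_{i-1/2}, x_{i+1/2} of a sorted circular quantizer."""
    prev = np.concatenate(([x[-1] - 2.0 * np.pi], x[:-1]))
    nxt = np.concatenate((x[1:], [x[0] + 2.0 * np.pi]))
    return 0.5 * (prev + x), 0.5 * (x + nxt)


def distortion_and_gradient(x, kappa):
    """Distortion (4) and its gradient (Proposition 1), both by quadrature."""
    lo, hi = cell_bounds(x)
    d, g = 0.0, np.zeros(len(x))
    for i, (c, a, b) in enumerate(zip(x, lo, hi)):
        d += quad(lambda y: (y - c) ** 2 * vm_pdf(y, kappa), a, b)[0]
        f2 = quad(lambda y: y * vm_pdf(y, kappa), a, b)[0]
        f3 = quad(vm_pdf, a, b, args=(kappa,))[0]
        g[i] = -2.0 * f2 + 2.0 * c * f3
    return d, g


def optimal_vm_quantizer(kappa, n):
    """Locally optimal n-component quantizer of VM(0, kappa) with weights (2)."""
    i = np.arange(1, n + 1)
    scale = 1.0 if kappa == 0 else min(1.0 / np.sqrt(kappa), 1.0)
    x0 = np.pi * (2 * i - 1 - n) / n * scale  # start values as in Sec. V
    # Assumes the iterates stay sorted, which the spread-out start values favor.
    res = minimize(distortion_and_gradient, x0, args=(kappa,),
                   jac=True, method="BFGS")
    x = np.sort(res.x)
    lo, hi = cell_bounds(x)
    w = np.array([quad(vm_pdf, a, b, args=(kappa,))[0] for a, b in zip(lo, hi)])
    return x, w, res.fun


def bingham_quantizer(M, Z, n):
    """Algorithm 1: derive a 2n-component Bingham quantizer from a VM quantizer."""
    mu = np.arctan2(M[1, 1], M[0, 1])   # mode of the Bingham density
    kappa = (Z[1, 1] - Z[0, 0]) / 2.0   # requires z2 > z1
    y, w, _ = optimal_vm_quantizer(kappa, n)
    x = np.concatenate((y / 2.0, y / 2.0 + np.pi))
    x = np.mod(x + mu, 2.0 * np.pi)
    # Assumption: antipodal components share each VM weight equally.
    return x, np.concatenate((w, w)) / 2.0
```

For example, `optimal_vm_quantizer(1.0, 5)` returns the component locations, their probability weights, and the attained distortion, which can be compared against the values reported in Table II.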

Consequently, using direct numerical integration is necessary to compute the distortion for a quantizer of the von Mises distribution. The same problem arises in the computation of its gradient using Proposition 1. However, a speed-up can be achieved by omitting the normalization constant (or precomputing it at the very beginning of the entire optimization procedure). This avoids numerous evaluations of the Bessel function. The utility function for approximating a von Mises distribution using a quantizer with N = 2 and symmetric placement of the components $x_1 = -x_2$ is shown in the right plot of Fig. 2.

C. Bingham Case

Instead of deriving a quantizer particularly suited for the Bingham distribution, a relationship with the von Mises distribution can be used to obtain an optimal quantizer from an optimal quantizer of the von Mises distribution.

Proposition 3. Let X be a $\mathrm{Bingham}(M, Z)$ distributed random variable, where $M = (m_{i,j})$ and $Z = \mathrm{diag}(z_1, z_2)$ with $z_2 > z_1$. Then, $2X \bmod 2\pi$ is a $\mathrm{VM}(\theta, \kappa)$ distributed random variable with $\theta = (2 \cdot \mathrm{atan2}(m_{2,2}, m_{1,2})) \bmod 2\pi$ and $\kappa = (z_2 - z_1)/2$.

This avoids the need for running an optimization method in a scenario involving multimodal densities, because an optimal 2N-component quantizer of a Bingham distribution can be derived from an N-component quantizer of a von Mises distribution. The whole process is described in Algorithm 1.

Algorithm 1: Optimal quantization of a Bingham distribution
  Input: Number of Voronoi cells N; Bingham distribution parameters M = (m_{i,j}), Z = diag(z_1, z_2) with z_2 > z_1.
  Output: Optimal 2N quantizer x
  /* Extract mode of Bingham */
  µ ← atan2(m_{2,2}, m_{1,2})
  /* von Mises parameter */
  κ ← (z_2 − z_1)/2
  /* Optimal quantization of VM(0, κ) */
  y ← OptimalVonMisesQuantizer(0, κ, N)
  /* Obtain optimal Bingham quantizer */
  x ← (y_1/2, ..., y_N/2, y_1/2 + π, ..., y_N/2 + π)
  x ← ((x_1 + µ) mod 2π, ..., (x_{2N} + µ) mod 2π)

V. EVALUATION

Several problems need to be addressed when implementing a method to compute optimal quantizers. The starting value of the optimizer is of particular importance in finding the optimal quantizer for the following three reasons. First, a bad starting value can result in the optimizer not finding the optimum, because components located sufficiently far away from the mean of a very peaked distribution might not change their position during the optimization process. This is a consequence of floating-point round-off errors making the computed derivative of the distortion (with respect to these components) become 0. Second, our utility measure assumes $x_i < x_{i+1}$ and thus, choosing the same value as starting point for all $x_i$ might break this ordering, which would need to be fixed by introducing additional sort operations. Finally, a good heuristic for the starting value might significantly reduce the number of required iterations. However, in these evaluations, a very rough heuristic was used. For approximating the zero mean wrapped normal distribution, the start values for the optimizer were chosen as

$$ x_i = \frac{\pi (2i - 1 - N)}{N} \cdot \min(\sigma, 1) , \quad i = 1, \ldots, N , $$

and for the zero mean von Mises distribution, the start values were chosen as

$$ x_i = \frac{\pi (2i - 1 - N)}{N} \cdot \min\left( \frac{1}{\sqrt{\kappa}}, 1 \right) , \quad i = 1, \ldots, N , $$

with $\min(\kappa^{-1/2}, 1) = 1$ for $\kappa = 0$. Approximations of distributions with a non-zero mean can be made by using a suitable shift.

Formulas from Proposition 2 were used for approximation of the distortion in the wrapped normal case. Numerical integration was used for computing the distortion and its derivative in the von Mises case. The resulting optimal quantizers for some circular densities are shown in Fig. 3, where the quantizers are represented by the induced discrete probability distribution approximating the original continuous density.

The first circular moment was used to evaluate the approximation quality of the proposed quantizers. It is given by $E(\exp(iX))$ and is of particular importance for moment matching based parameter estimation of the involved distributions. The true first circular moment was compared with the first circular moment of the discrete distributions obtained from optimal quantization with respective probability weights as defined in (2). The results are shown in Fig. 4.

Figure 4: Norm error between the first circular moment of the discrete probability densities derived from optimal quantization (with N components) of wrapped normal (red; σ = 0.1 and σ = 1) and von Mises (green; κ = 1 and κ = 10) distributions and their true first circular moments.

        σ = 0.1                       σ = 1
N    D_x          iter   time     D_x          iter   time
5    8.0 · 10^−4   5     0.2s     7.7 · 10^−2   5     0.2s
10   2.3 · 10^−4   8     0.5s     2.1 · 10^−2   8     0.7s
15   1.1 · 10^−4   8     1.1s     9.8 · 10^−3   7     1.2s
30   2.8 · 10^−5   8     4.3s     2.5 · 10^−3   8     5.3s
Table I: Simulation results for optimal quantization of wrapped normal distributions.

        κ = 1                         κ = 10
N    D_x          iter   time     D_x          iter   time
5    1.1 · 10^−1   4     0.1s     8.8 · 10^−3   7     0.2s
10   2.8 · 10^−2   5     0.5s     2.5 · 10^−3   6     0.6s
15   1.3 · 10^−2   6     1.3s     1.2 · 10^−3   7     1.5s
30   3.1 · 10^−3   6     5.0s     3.2 · 10^−4   6     5.0s
Table II: Simulation results for optimal quantization of von Mises distributions.

An overview of the optimization procedure for the wrapped normal case is given in Table I. These evaluations were computed using a straightforward implementation of the error measure (and its derivatives) in Matlab 2013b and a trust-region optimization procedure (which is the default for fminunc). All simulations were performed on a system with an Intel Core i7-4770 CPU. In the wrapped normal case, the computation time rises not only for a rising number of components but also for a rising σ. This is because the number of relevant terms in the infinite sum within the wrapped normal densities (and thus within the corresponding antiderivatives) is dependent on the parameter σ. The overall picture looks similar for the von Mises case (see Table II). Contrary to the wrapped normal distribution, the computation time rises for a very peaked density. This is due to the use of an adaptive integration method.

The presented quantization can be used directly in a scenario without real-time requirements, e.g., for analysis of certain types of periodic time series. Furthermore, a better implementation (e.g., by generating compiled binary code and considering symmetry) makes the proposed approach suitable for many real-time applications. Several strategies can be used to achieve a further speed-up, where numerical optimization is avoided entirely. First, suboptimal quantizers can be used which are obtained by computationally efficient heuristics. In this case, the distortions of such quantizers need to be precomputed in order to make a design choice about the proper number of components involved. Furthermore, computation of probability weights of the induced discrete probability distribution also involves numerical evaluations, making a good heuristic choice necessary. Second, a combination of interpolation and precomputed look-up tables may be used in order to approximate an optimal quantization. In this case, error bounds for the involved interpolation method can be used to derive a bound on the suboptimality of the interpolated quantizers.

VI. CONCLUSION

In this work, Voronoi quantization of circular densities is proposed. These Voronoi quantizers give rise to an induced discrete probability distribution approximating the original continuous distribution. The proposed method can be used to guarantee a predefined degree of precision in computing transformations of circular random variables. This is of particular interest in stochastic filtering. For future work, it is of interest to derive quantization based approximations of continuous probability densities on other manifolds, such as the manifold of orientations SO(3) and the manifold of rigid-body motions SE(3), because of their importance in many applications. Furthermore, use of different circular distributions and error measures might result in deriving more efficient approximation techniques.

APPENDIX

A. Proof of Proposition 1

The gradient can be decomposed into

$$ \frac{\partial D_x}{\partial x_i} = G^{(1)}_i - 2 \, G^{(2)}_i + G^{(3)}_i . $$

Furthermore, let $\tilde{x}_i := x_i$, $\tilde{x}_{i \pm 1/2} := x_{i \pm 1/2}$ for $2 < i < N$, $\tilde{x}_0 := x_N$, and $\tilde{x}_{N+1} := x_1$ (similarly $\tilde{x}_{N+1/2} := x_{1-1/2}$ and $\tilde{x}_{1-1/2} := x_{N+1/2}$). A first useful observation is that x and $\tilde{x}$ have been chosen in a way such that

$$ f(\tilde{x}_i) = f(x_i) \tag{5} $$

holds. In the next step, the $G^{(k)}_i$ are derived.

$$ G^{(1)}_i = \frac{\partial}{\partial x_i} \left( \left[ F^{(1)}(y) \right]_{\tilde{x}_{i-3/2}}^{\tilde{x}_{i-1/2}} + \left[ F^{(1)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} + \left[ F^{(1)}(y) \right]_{\tilde{x}_{i+1/2}}^{\tilde{x}_{i+3/2}} \right) . $$

Use of $\tilde{x}$ instead of x in the first and last summand is to make sure that for i = 1 or i = N periodicity is taken into account correctly. Leaving away unnecessary terms yields

$$ \begin{aligned} G^{(1)}_i &= \frac{\partial}{\partial x_i} F^{(1)}(\tilde{x}_{i-1/2}) + \frac{\partial}{\partial x_i} \left( F^{(1)}(x_{i+1/2}) - F^{(1)}(x_{i-1/2}) \right) - \frac{\partial}{\partial x_i} F^{(1)}(\tilde{x}_{i+1/2}) \\ &\overset{(5)}{=} \frac{\tilde{x}_{i-1/2}^2 - x_{i-1/2}^2}{2} f(x_{i-1/2}) + \frac{x_{i+1/2}^2 - \tilde{x}_{i+1/2}^2}{2} f(x_{i+1/2}) . \end{aligned} $$

Similar reasoning can be used for computation of $G^{(2)}_i$. This yields

$$ G^{(2)}_i = \frac{\tilde{x}_{i-1} \cdot \tilde{x}_{i-1/2} - x_i \cdot x_{i-1/2}}{2} \cdot f(x_{i-1/2}) + \left[ F^{(2)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} + \frac{x_i \cdot x_{i+1/2} - \tilde{x}_{i+1} \cdot \tilde{x}_{i+1/2}}{2} \cdot f(x_{i+1/2}) . $$

And for $G^{(3)}_i$, we obtain

$$ G^{(3)}_i = \frac{\tilde{x}_{i-1}^2 - x_i^2}{2} \cdot f(x_{i-1/2}) + 2 \, x_i \left[ F^{(3)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} + \frac{x_i^2 - \tilde{x}_{i+1}^2}{2} \cdot f(x_{i+1/2}) . $$

Putting all of this together yields the desired result

$$ \begin{aligned} \frac{\partial D_x}{\partial x_i} &= G^{(1)}_i - 2 \cdot G^{(2)}_i + G^{(3)}_i \\ &= \frac{(\tilde{x}_{i-1/2} - \tilde{x}_{i-1})^2 - (x_{i-1/2} - x_i)^2}{2} f(x_{i-1/2}) - 2 \left[ F^{(2)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} + 2 \, x_i \left[ F^{(3)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} \\ &\quad + \frac{(x_{i+1/2} - x_i)^2 - (\tilde{x}_{i+1/2} - \tilde{x}_{i+1})^2}{2} f(x_{i+1/2}) \\ &= -2 \left[ F^{(2)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} + 2 \, x_i \left[ F^{(3)}(y) \right]_{x_{i-1/2}}^{x_{i+1/2}} . \end{aligned} $$

B. Proof of Proposition 2

The expectation $E((X - V_x(X))^2)$ can be decomposed into

$$ E\left( (X - V_x(X))^2 \right) = \sum_{i=1}^{N} \left( \sum_{k=-\infty}^{\infty} P^{(1)}_{i,k} - 2 P^{(2)}_{i,k} + P^{(3)}_{i,k} \right) . $$

Now, each component is computed separately. In each case, the computations come down to using the relationship to the Gaussian distribution.

$$ \begin{aligned} P^{(1)}_{i,k} &= \int_{x_{i-1/2}}^{x_{i+1/2}} \frac{y^2}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y + 2k\pi)^2}{2\sigma^2} \right) dy \\ &= \left[ \frac{\sigma (2k\pi - y)}{\sqrt{2\pi}} \exp\left( -\frac{(y + 2k\pi)^2}{2\sigma^2} \right) + \left( 2k^2\pi^2 + \frac{\sigma^2}{2} \right) \mathrm{erf}\left( \frac{y + 2k\pi}{\sqrt{2}\sigma} \right) \right]_{x_{i-1/2}}^{x_{i+1/2}} \\ &= \left[ \sigma^2 (2k\pi - y) f_k(y) + \left( 2k^2\pi^2 + \frac{\sigma^2}{2} \right) \mathrm{erf}\left( \frac{y + 2k\pi}{\sqrt{2}\sigma} \right) \right]_{x_{i-1/2}}^{x_{i+1/2}} , \end{aligned} $$

$$ \begin{aligned} P^{(2)}_{i,k} &= \int_{x_{i-1/2}}^{x_{i+1/2}} \frac{x_i \, y}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y + 2k\pi)^2}{2\sigma^2} \right) dy \\ &= -x_i \left[ \frac{\sigma}{\sqrt{2\pi}} \exp\left( -\frac{(y + 2k\pi)^2}{2\sigma^2} \right) + k \cdot \pi \cdot \mathrm{erf}\left( \frac{y + 2k\pi}{\sqrt{2}\sigma} \right) \right]_{x_{i-1/2}}^{x_{i+1/2}} \\ &= -x_i \left[ \sigma^2 \cdot f_k(y) + k \cdot \pi \cdot \mathrm{erf}\left( \frac{y + 2k\pi}{\sqrt{2}\sigma} \right) \right]_{x_{i-1/2}}^{x_{i+1/2}} , \end{aligned} $$

$$ \begin{aligned} P^{(3)}_{i,k} &= \int_{x_{i-1/2}}^{x_{i+1/2}} \frac{x_i^2}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y + 2k\pi)^2}{2\sigma^2} \right) dy \\ &= \frac{x_i^2}{2} \left[ \mathrm{erf}\left( \frac{y + 2k\pi}{\sqrt{2}\sigma} \right) \right]_{x_{i-1/2}}^{x_{i+1/2}} . \end{aligned} $$

C. Proof of Proposition 3

Due to the fact that $z_2 > z_1$, we observe that one of the modes of the Bingham distribution is located at µ with $(\cos(\mu), \sin(\mu)) = (m_{1,2}, m_{2,2})$. Furthermore, we note that M is orthogonal. Consequently, the density can be rewritten as

$$ f_{\mathrm{Bingham}}(x; M, Z) = f_{\mathrm{Bingham}}(x - \mu; A, Z) , $$

with

$$ A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} . \tag{6} $$

The combination of $z_1 < z_2$ and this particular choice of A ensures that $f_{\mathrm{Bingham}}(x - \mu; A, Z)$ takes its maximum values at $x = \mu$ and $x = \mu - \pi$. The mean µ is obtained by

$$ \mu = \mathrm{atan2}(m_{2,2}, m_{1,2}) . $$

Now, it is clearly sufficient to consider a $\mathrm{Bingham}(A, Z)$ distributed random variable X in the following.

In order to rewrite the density of $\mathrm{Bingham}(A, Z)$, we note that $\mathrm{Bingham}(A, Z) = \mathrm{Bingham}(A, Z + c \cdot I)$ for arbitrary $c \in \mathbb{R}$ (see [23]). Thus $\mathrm{diag}(z_1, z_2)$ and $\mathrm{diag}(0, z_2 - z_1)$ describe the same distribution and it can be assumed that the first entry of Z is zero. Then, the p.d.f. of the Bingham distribution can be rewritten as

$$ f(x) \propto \exp\left( (z_2 - z_1) \cos(x)^2 \right) . $$

Making use of

$$ \cos(x)^2 = \frac{\cos(2x) + 1}{2} $$

yields the density representation

$$ f(x) \propto \exp\left( (z_2 - z_1) \frac{\cos(2x) + 1}{2} \right) \propto \exp\left( \frac{z_2 - z_1}{2} \cos(2x) \right) . $$

The density $\tilde{f}$ of 2X can be obtained by applying the transformation theorem for probability densities. This procedure yields

$$ \tilde{f}(x) \propto \exp\left( \frac{z_2 - z_1}{2} \cos(x) \right) . $$

Because of its $2\pi$ periodicity, $\tilde{f}$ is also proportional to the density of $2X \bmod 2\pi$. Thus, $2(X + \mu) \bmod 2\pi \sim \mathrm{VM}(\theta, \kappa)$ with $\theta = (2\mu \bmod 2\pi)$ and $\kappa = (z_2 - z_1)/2$.

REFERENCES

[1] N. I. Fisher, Statistical Analysis of Circular Data. Cambridge University Press, 1995.
[2] K. V. Mardia and P. E. Jupp, Directional Statistics, 1st ed. Wiley, 1999.
[3] G. S. Watson, Statistics on Spheres. Wiley, 1983.
[4] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Springer, 1992.
[5] S. Graf and H. Luschgy, Foundations of Quantization for Probability Distributions. Springer, 2000.
[6] G. Pagès and J. Printems, “Optimal Quadratic Quantization for Numerics: the Gaussian Case,” Monte Carlo Methods and Applications, vol. 9, no. 2, 2003.
[7] J. Traa and P. Smaragdis, “A Wrapped Kalman Filter for Azimuthal Speaker Tracking,” Signal Processing Letters, IEEE, vol. 20, no. 12, pp. 1257–1260, Dec. 2013.
[8] G. Stienne, S. Reboul, M. Azmani, J. Choquel, and M. Benjelloun, “GNSS Dataless Signal Tracking with a Delay Semi-Open Loop and a Phase Open Loop,” Signal Processing, vol. 93, no. 5, pp. 1192–1209, May 2013.
[9] I. Markovic and I. Petrovic, “Bearing-only Tracking with a Mixture of von Mises Distributions,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, Oct. 2012, pp. 707–712.
[10] I. Markovic, A. Portello, P. Danes, I. Petrovic, and S. Argentieri, “Active Speaker Localization with Circular Likelihoods and Bootstrap Filtering,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, Nov. 2013, pp. 2914–2920.
[11] V. Carotenuto, A. Aubry, A. De Maio, and A. Farina, “Phase Noise Modeling and its Effects on the Performance of Some Radar Signal Processors,” in 2015 IEEE Radar Conference (RadarCon). IEEE, May 2015, pp. 0274–0279.
[12] R. G. McKilliam, “Lattice Theory, Circular Statistics and Polynomial Phase Signals,” Ph.D. dissertation, University of Queensland, 2010.
[13] J. Lo and A. Willsky, “Estimation for Rotational Processes with One Degree of Freedom–Part I: Introduction and Continuous-Time Processes,” IEEE Transactions on Automatic Control, vol. 20, no. 1, pp. 10–21, Feb. 1975.
[14] A. Willsky and J. Lo, “Estimation for Rotational Processes with one Degree of Freedom–Part II: Discrete-Time Processes,” IEEE Transactions on Automatic Control, vol. 20, no. 1, pp. 22–30, Feb. 1975.
[15] M. Azmani, S. Reboul, J.-B. Choquel, and M. Benjelloun, “A Recursive Fusion Filter for Angular Data,” in International Conference on Robotics and Biometrics (ROBIO), Dec. 2009, pp. 882–887.
[16] G. Kurz, I. Gilitschenski, and U. D. Hanebeck, “Recursive Bayesian Filtering in Circular State Spaces,” IEEE Aerospace and Electronic Systems Magazine, vol. 31, no. 3, 2016.
[17] I. Gilitschenski, G. Kurz, S. J. Julier, and U. D. Hanebeck, “Unscented Orientation Estimation Based on the Bingham Distribution,” IEEE Transactions on Automatic Control, vol. 61, no. 1, pp. 172–177, Jan. 2016.
[18] U. D. Hanebeck and A. Lindquist, “Moment-based Dirac Mixture Approximation of Circular Densities,” in Proceedings of the 19th IFAC World Congress (IFAC 2014), Cape Town, South Africa, Aug. 2014.
[19] G. S. Chirikjian, Stochastic Models, Information Theory, and Lie Groups, Volume 1: Classical Results and Geometric Methods. Birkhäuser, 2009.
[20] C. Bingham, “An Antipodally Symmetric Distribution on the Sphere,” The Annals of Statistics, vol. 2, no. 6, pp. 1201–1225, Nov. 1974.
[21] K. Kunze and H. Schaeben, “The Bingham Distribution of Quaternions and Its Spherical Radon Transform in Texture Analysis,” Mathematical Geology, vol. 36, no. 8, pp. 917–943, Nov. 2004.
[22] G. Pagès, “A Space Quantization Method for Numerical Integration,” Journal of Computational and Applied Mathematics, vol. 89, no. 1, pp. 1–38, Mar. 1998.
[23] G. Kurz, I. Gilitschenski, S. J. Julier, and U. D. Hanebeck, “Recursive Estimation of Orientation Based on the Bingham Distribution,” in Proceedings of the 16th International Conference on Information Fusion (Fusion 2013), Istanbul, Turkey, Jul. 2013.