Goodness-Of-Fit for a Concentrated Von Mises-Fisher Distribution

Comput Stat (2012) 27:69–82 DOI 10.1007/s00180-011-0238-4 ORIGINAL PAPER Goodness-of-fit for a concentrated von Mises-Fisher distribution Adelaide Maria Sousa Figueiredo Received: 29 May 2008 / Accepted: 1 February 2011 / Published online: 16 February 2011 © Springer-Verlag 2011 Abstract The von Mises-Fisher distribution is widely used for modelling directional data. In this paper we propose goodness-of-fit methods for a concentrated von Mises-Fisher distribution and we analyse by simulation some questions concerning the application of these tests. We analyse the empirical power of the Kolmogorov- Smirnov test for several dimensions of the sphere, supposing as alternative hypothesis a mixture of two von Mises-Fisher distributions with known parameters. We also compare the empirical power of the Kolmogorov-Smirnov test with the Rao’s score test for data on the sphere, supposing as alternative hypothesis, a mixture of two Fisher distributions with unknown parameters replaced by their maximum likelihood estimates or a 5-parameter Fisher-Bingham distribution. Finally, we give an example with real spherical data. Keywords Directional data · Fisher-Bingham distribution · Goodness-of-fit test · Von Mises-Fisher distribution Mathematics Subject Classification (2000) 62H11 · 62G10 1 Introduction The von Mises-Fisher distribution is frequently used for modelling directional data (see for instance, Watson 1983, Fisher et al. 1987, Fisher 1993 and Mardia and Jupp 2000). This distribution is called Fisher distribution for data on the sphere and is called von Mises distribution for data on the circle. Some goodness-of-fit tests have A. M. S. Figueiredo (B) Matemática e Sistemas de Informação, Faculdade de Economia/LIAAD—INESC Porto L.A., Universidade do Porto, Rua Dr. Roberto Frias, 4200-464 Porto, Portugal e-mail: [email protected] 123 70 A. M. S. Figueiredo been proposed in the literature for this distribution in the case of circular data and spherical data. For the particular case of data defined on the circle, Lockhart and Stephens (1985) gave goodness-of-fit tests for the von Mises distribution and Lawson (1988) considered the fit of the von Mises distribution using GLIM. For the particular case of data defined on the sphere, Lewis and Fisher (1982) presented graphical methods for investigating the fit of a Fisher distribution to spherical data; Fisher and Best (1984) considered goodness-of-fit tests based on the empirical distribution function to investigate the adequacy of fit of Fisher’s distribution on the sphere and Rivest (1986) proposed a goodness-of fit test for the Fisher distribution in small concentrated samples. Mardia et al. (1984) derived a likelihood ratio test for the adequacy of a von Mises-Fisher distribution against the alternative of a Fisher-Bingham distribution and Boulerice and Ducharme (1997) proposed for directional and axial data, general purposes smooth tests of goodness-of-fit for rotationally symmetric distributions, including the von Mises-Fisher distribution against general families of embedding alternatives constructed from complete orthonormal bases of functions. In Sect. 2 we recall some distributions used for directional data, namely the von Mises-Fisher distribution and the 5-parameter Fisher-Bingham distribution. In Sect. 3 we propose some goodness-of-fit methods for a concentrated von Mises-Fisher distribution based on an asymptotic chi-square distribution and we also recall a goodness-of-fit test based on the Rao’s score statistic. In Sect. 4 we carry out a simulation study for analysing some questions concerning the application of the methods suggested in this paper: the adequacy of using the asymptotic chi-square distribution in the tests and the adequacy of using the tabulated critical values of the Kolmogorov-Smirnov statistic when the parameters of the von Mises-Fisher distribution are unknown and replaced by their maximum likelihood estimates. We also determine by simulation the critical values of the Rao’s score statistic in some cases and we compare them with the asymptotic values. In Sect. 5 we analyse the empirical power of the Kolmogorov-Smirnov test for several dimensions of the sphere, supposing as an alternative hypothesis a mixture of two von Mises-Fisher distributions. We also compare the empirical power of this test with the Rao’s score test for data on the sphere supposing as alternative hypothesis a mixture of two Fisher distributions with unknown parameters replaced by their maximum likelihood estimates or a Kent distribution. In Sect. 6 we consider spherical data given in the literature for illustrating the previous methods and finally, in Sect. 7,we conclude the paper. 2 Some distributions for directional data p p The von Mises-Fisher distribution on the unit sphere in R , Sp−1= x ∈ R : x x=1 is usually denoted by Mp (μ,κ) and it has probability density function given by f (x|μ,κ) = cp (κ) exp κx μ x ∈ Sp−1, μ ∈ Sp−1,κ≥ 0 , (2.1) where the normalising constant cp (κ) is defined by 123 Goodness-of-fit for a concentrated von Mises-Fisher distribution 71 κ p −1 (κ) = 2 1 cp p 2 I p − (κ) 2 2 1 and Iν denotes the modified Bessel function of the first kind and order ν.Formore details about this function, see for instance, Mardia and Jupp (2000), p. 168. The parameter μ is the mean direction and κ is the concentration parameter around μ. The parameter μ is also the mode and −μ is the antimode (provided that κ>0). This distribution is rotationally symmetric about μ (Mardia and Jupp 2000, p. 179). If x comes from Mp (μ,κ) distribution and U is an orthogonal matrix, then Ux comes from Mp (Uμ,κ) distribution. If x comes from Mp (μ,κ) distribution then for large κ (Mardia and Jupp 2000, p. 172): . κ − μ ∼ χ 2 . 2 1 x p−1 (2.2) Let (x1, x2,...,xn) be a random sample of size n from the von Mises-Fisher dis- (μ,κ) . tribution Mp Let Rn be the resultant length mean of the sample defined by = 1/2 , ,..., Rn x x , where x is the sample vector mean of (x1 x2 xn) defined by = n μ x i=1 xi n. The maximum likelihood estimator of is the sample mean direction, that is μ = x0 = xRn and the maximum likelihood estimator of κ is the solution of the equation A p (κ) = Rn, (2.3) where the function A p (κ) is defined by A p (κ) =−c (κ)cp (κ) =I p (κ)I p − (κ) . p 2 2 1 (See Mardia and Jupp 2000, p. 198). For p = 3, the von Mises-Fisher distribution is called Fisher distribution and denoted by M3 (μ,κ) or by F (μ,κ).Forp = 2 the von Mises-Fisher distribution is called von Mises distribution and denoted by M2 (μ,κ) or by M (μ,κ). The Fisher distribution is a particular case of the 5-parameter Fisher-Bingham distribution or Kent distribution proposed by Kent (1982). This distribution, denoted by FB5 (κ, β, ) is defined by the density 2 2 ( ) = ( ,β)−1 γ + β γ − γ ∈ . f x c k exp k (1)x (2)x (3)x x S2 (2.4) The parameters are the concentration k ≥ 0, the ovalness β ≥ 0 and a (3 × 3) = γ , γ , γ , γ ortogonal matrix (1) (2) (3) where (1) is the mean direction or pole, γ γ (2) is the major axis and (3) is the minor axis. If β = 0, then (2.3) reduces to a Fisher density. The FB5 distribution is a spherical analogue of the bivariate normal distribution. More details about the FB5 distribution can be found in Kent (1982). 123 72 A. M. S. Figueiredo 3 Goodness-of-fit methods We wish to test the null hypothesis H0:Thesample(x1, x2,...,xn) comes from a von Mises-Fisher distribution Mp (μ,κ), with large κ. Basedontheresult(2.2), we suggest to reduce the goodness-of-fit of a von Mises- Fisher distribution to the goodness-of-fit of a chi-square distribution. For each xi , = ,..., = κ − μ i 1 n of the sample we calculate the value yi 2 1 xi and we test the ( ,..., ) χ 2 null hypothesis H0:Thesample y1 yn comes from a (p−1) population. Then, κ for a large concentration parameter ,totestH0 we may use a chi-square Q–Q plot and the usual chi-square and Kolmogorov-Smirnov (K.-S.) goodness-of-fit tests. If the parameters μ and κ are unknown, they are replaced by their maximum likelihood estimates. Mardia et al. (1984) derive a goodness-of-fit test for the von Mises-Fisher distribution. These authors consider the likelihood ratio test for the null hypothesis of a Fisher distribution against the alternative of a Fisher-Bingham distribution and use the Rao’s statistic, which is asymptotically equivalent to the likelihood ratio statistic, to test this hypothesis. This test takes a simpler form for p = 3 with an alternative of a Kent distribution (see Kent 1982; Mardia and Jupp 2000, pp. 272–273) and as we’ll use it in this paper, next we’ll describe it briefly. Let x1,...,xn be observations on the sphere S2.LetH x0,(0, 0, 1) be the rota- tion matrix, which transforms the sample mean direction x0 into the pole (0, 0,1) . Denote by yi the 2-vector defined by the last two components of H x0,(0, 0, 1) xi . Let l1 and l2 be the eigenvalues of n 1 y y . n i i i=1 Then the score statistic takes the form κ3 2 T = n l1 − l2 , (3.1) 4 κ − 3R where κ is the maximum likelihood estimate of κ under the Fisher distribution.

Goodness-Of-Fit for a Concentrated Von Mises-Fisher Distribution

Contributions to Directional Statistics Based Clustering Methods

A Guide on Probability Distributions

Recent Advances in Directional Statistics

Multivariate Statistical Functions in R

Handbook on Probability Distributions

A New Unified Approach for the Simulation of a Wide Class of Directional Distributions

A Logistic Normal Mixture Model for Compositions with Essential Zeros

High-Precision Computation of Uniform Asymptotic Expansions for Special Functions Guillermo Navas-Palencia

Package 'Directional'

Spherical-Homoscedastic Distributions: the Equivalency of Spherical and Normal Distributions in Classiﬁcation

Clustering on the Unit Hypersphere Using Von Mises-Fisher Distributions

A Toolkit for Inference and Learning in Dynamic Bayesian Networks Martin Paluszewski*, Thomas Hamelryck