K-Nearest Neighbor Based Consistent Entropy Estimation for Hyperspherical Distributions

Entropy 2011, 13, 650-667; doi:10.3390/e13030650 OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Article k-Nearest Neighbor Based Consistent Entropy Estimation for Hyperspherical Distributions Shengqiao Li 1,, Robert M. Mnatsakanov 1,2, and Michael E. Andrew 1 1 Health Effects Laboratory Division, National Institute for Occupational Safety and Health, Morgantown, WV 26505, USA; E-Mail: [email protected] 2 Department of Statistics, West Virginia University, Morgantown, WV 26506, USA Authors to whom correspondence should be addressed; E-Mails: [email protected] (S.L.); [email protected] (R.M.). Received: 22 December 2010; in revised form: 27 January 2011 / Accepted: 28 February 2011 / Published: 8 March 2011 Abstract: A consistent entropy estimator for hyperspherical data is proposed based on the k-nearest neighbor (knn) approach. The asymptotic unbiasedness and consistency of the estimator are proved. Moreover, cross entropy and Kullback-Leibler (KL) divergence estimators are also discussed. Simulation studies are conducted to assess the performance of the estimators for models including uniform and von Mises-Fisher distributions. The proposed knn entropy estimator is compared with the moment based counterpart via simulations. The results show that these two methods are comparable. Keywords: hyperspherical distribution; directional data; differential entropy; cross entropy; Kullback-Leibler divergence; k-nearest neighbor 1. Introduction The Shannon (or differential) entropy of a continuously distributed random variable (r.v.) X with probability density function (pdf ) f is widely used in probability theory and information theory as a measure of uncertainty. It is defined as the negative mean of the logarithm of the density function, i.e., H(f)=−Ef [ln f(X)] (1) Entropy 2011, 13 651 k-Nearest neighbor (knn) density estimators were proposed by Mack and Rosenblatt [1]. Penrose and Yukich [2] studied the laws of large numbers for k-nearest neighbor distances. The nearest neighbor entropy estimators when X ∈ Rp were studied by Kozachenko and Leonenko [3]. Singh et al.[4] and Leonenko et al.[5] extended these estimators using k-nearest neighbors. Mnatsakanov et al. [6] studied knn entropy estimators for variable rather than fixed k. Eggermontet et al.[7] studied the kernel entropy estimator for univariate smooth distributions. Li et al.[8] studied parametric and nonparametric entropy estimators for univariate multimodal circular distributions. Neeraj et al.[9] studied knn estimators of circular distributions for the data from the Cartesian product, that is, [0, 2π)p. Recently, Mnatsakanov et al.[10] proposed an entropy estimator for hyperspherical data based on the moment-recovery (MR) approach (see also Section 4.3). In this paper, we propose k-nearest neighbor entropy, cross-entropy and KL-divergence estimators for hyperspherical random vectors defined on a unit p-hypersphere Sp−1 centered at the origin in p-dimensional Euclidean space. Formally, Sp−1 = {x ∈ Rp : x =1} (2) 2πp/2 The surface area Sp of the hypersphere is well known: Sp = Γ( p ) , where Γ is the gamma function. For 2 a part of the hypersphere, the area of a cap with solid angle φ relative to its pole is given by Li [11] (cf. Gray [12]): 1 1 p−1 − 2 S(φ)= 2 Sp 1 sgn(cos φ)Icos φ 2 , 2 (3) where sgn is the sign function, and Ix(α, β) is the regularized incomplete beta function. For a random vector from the unit circle S1, the von Mises distribution vM(μ,κ) is the most widely used model: 1 κμT x fvM(x; μ,κ)= e 2πI0(κ) where T is the transpose operator, ||μ|| =1and κ ≥ 0 are the mean direction vector and concentration parameters, and I0 is the zero-order modified Bessel function of the first kind. Note that the von Mises distribution has a single mode. The multimodal extension to the von Mises distribution is the so-called generalized von Mises model. Its properties are studied by Yfantis and Borgman [13] and Gatto and Jammalamadaka [14]. The generalization of von Mises distribution onto Sp−1 is the von Mises-Fisher distribution (also known as Langevin distribution) vMFp(μ,κ) with pdf, κμT x fp(x; μ,κ)=cp(κ)e (4) where the normalization constant is κp/2−1 cp(κ)= p/2 (2π) Ip/2−1(κ) and Iν(x) is the ν-order modified Bessel function of the first kind. See Mardia and Jupp [15] (p. 167) for details. Since von Mises-Fisher distributions are members of the exponential family, by differentiating the cumulant generating function, one can obtain the mean and variance of μT X: T Ef [μ X]=Ap(κ) Entropy 2011, 13 652 and T Vf [μ X]=Ap(κ) d − 2 − − where Ap(κ)=Ip/2(κ)/Ip/2−1(κ), and Ap(κ)= dκ Ap(κ)=1 Ap(κ) (p 1)/κAp(κ). See Watamori [16] for details. Thus the entropy of fp is: T H(fp)=−Ef [ln fp(X)] = − ln cp(κ) − κEf [μ X]=− ln cp(κ) − κAp(κ) (5) and 2 T 2 Vf [ln fp(X)] = κ Vf [μ X]=κ Ap(κ) (6) Spherical distributions have been used to model the orientation distribution functions (ODF) in HARDI (High Angular Resolution Diffusion Imaging). Knutsson [17] proposed a mapping from (p =3) orientation to a continuous and distance preserving vector space (p =5). Rieger and Vilet [18] generalized the orientation in any p-dimensional spaces. McGraw et al.[19] used vMF3 mixture to model the 3-D ODF and Bhalerao and Westin [20] applied vMF5 mixture to 5-D ODF in the mapped space. Entropy of the ODF is proposed as a measure of anisotropy (Ozarslan¨ et al.[21], Leow et al.[22]). McGraw et al.[19] used Renyi´ entropy for the vMF3 mixture since it has a closed form. Leow et al.[22] proposed an exponential isotropy measure based on the Shannon entropy. In addition, KL-divergence can be used to measure the closeness of two ODF’s. A nonparametric entropy estimator based on knn approach for hyperspherical data provides an easy way to compute the entropy related quantities. In Section 2, we will propose the knn based entropy estimator for hyperspherical data. The unbiasedness and consistency are proved in this section. In Section 3, the knn estimator is extended to estimate cross entropy and KL-divergence. In Section 4, we present simulation studies using uniform hyperspherical distributions and aforementioned vMF probability models. In addition, the knn entropy estimator is compared with the MR approach proposed in Mnatsakanov et al.[10]. We conclude this study in Section 5. 2. Construction of knn Entropy Estimators p−1 Let X ∈ S be a random vector having pdf f and X1, X2,...,Xn be a set of i.i.d. random vectors drawn from f. To measure the nearness of two vectors x and y, we define a distance measure T as the angle between them: φ = arccos(x y) and denote the distance between Xi and its k-th nearest neighbor in the set of n random vectors by φi := φn,k,i. With the distance measure defined above and without loss of generality, the na¨ıve k-nearest neighbor density estimate at Xi is thus, k/n fn(Xi)= (7) S(φi) where S(φi) is the cap area as expressed by (3). Let Ln,i be the natural logarithm of the density estimate at Xi, k/n Ln,i =lnfn(Xi)=ln (8) S(φi) Entropy 2011, 13 653 and thus we construct a similar k-nearest neighbor entropy estimator (cf. Singh et al.[4]): n n 1 1 Hn(f)=− [Ln,i − ln k + ψ(k)] = ln[nS(φi)] − ψ(k) (9) n i=1 n i=1 Γ(k) where ψ(k)= Γ(k) is the digamma function. In the sequel, we shall prove the asymptotic unbiasedness and consistency of Hn(f). 2.1. Unbiasedness of Hn To prove the asymptotic unbiasedness, we first introduce the following lemma: Lemma 2.1. For a fixed integer k<n, the asymptotic conditional mean of Ln,i given Xi = x,is E[ lim Ln,i|Xi = x]=lnf(x)+lnk − ψ(k) (10) n→∞ Proof. ∀ ∈ R, consider the conditional probability P {Ln,i < |Xi = x} = P {fn(Xi) <e|Xi = x} k − = P {S(φi) > e } (11) n Equation (11) implies that there are at most k samples falling within the cap Ci centered at Xi = x k − with area Sci = n e . If we let pn,i = f(y) dy Ci and Yn,i be the number of samples falling onto the cap Ci, then Yn,i ∼ BIN(n, pn,i), is a binomial random variable. Therefore, P {Ln,i < |Xi = x} = P {Yn,i <k} k → →∞ → →∞ If we let n 0 as n , then pn,i 0 as n . It is reasonable to consider the Poisson ke− approximation of Yn,i with mean λn,i = npn,i = pn,i. Thus, the limiting distribution of Yn,i is a Sci Poisson distribution with mean: − pn,i − λi = lim λn,i = ke lim = ke f(x) (12) n→∞ n→∞ Sci Define a random variable Li having the conditional cumulative density function, FL ,x( ) = lim P {Ln,i < |Xi = x} i n→∞ then k−1 − j [kf(x)e ] −kf(x)e− FLi,x( )= e j=0 j! By taking derivative w.r.t. , we obtain the conditional pdf of Li: − k [kf(x)e ] −kf(x)e− fL ,x( )= e (13) i (k − 1)! Entropy 2011, 13 654 The conditional mean of Li is ∞ − k [kf(x)e ] −kf(x)e− E[Li|Xi = x]= · e d −∞ (k − 1)! By change of variable, z = kf(x)e−, ∞ k−1 z −z E[Li|Xi = x]= [ln f(x)+lnk − ln z] e dz (k − 1)! 0 ∞ zk−1 =lnf(x)+lnk − ln z e−z dz 0 (k − 1)! =lnf(x)+lnk − ψ(k) (14) −L Corollary 2.2.

K-Nearest Neighbor Based Consistent Entropy Estimation for Hyperspherical Distributions

On Measures of Entropy and Information

Information Theory 1 Entropy 2 Mutual Information

Information Theory and Maximum Entropy 8.1 Fundamentals of Information Theory

A Characterization of Guesswork on Swiftly Tilting Curves Ahmad Beirami, Robert Calderbank, Mark Christiansen, Ken Duffy, and Muriel Medard´

Language Models

Neural Networks and Backpropagation

Entropy and Decisions

A Gentle Tutorial on Information Theory and Learning Roni Rosenfeld Carnegie Mellon University Outline • First Part Based Very

Entropy, Relative Entropy, Cross Entropy Entropy

Reduced Perplexity: a Simplified Perspective on Assessing Probabilistic Forecasts Kenric P

Entropy Methods for Joint Distributions in Decision Analysis Ali E

Chapter 2. Information Theory and Bayesian Inference