An Asymptotic Test for Bimodality Using the Kullback–Leibler Divergence
Javier E. Contreras-Reyes
Departamento de Estadística, Facultad de Ciencias, Universidad del Bío-Bío, Concepción 4030000, Chile; [email protected]
Received: 16 May 2020; Accepted: 9 June 2020; Published: 16 June 2020

Abstract: Detecting bimodality of a frequency distribution is of considerable interest in several fields. Classical inferential methods for detecting bimodality focus on the third and fourth moments through the kurtosis measure. Nonparametric asymptotic tests (DIPtest) for comparing the empirical distribution function with a unimodal one are also available. The latter point motivates this paper, which considers a parametric approach based on the bimodal skew-symmetric normal distribution. This general class captures bimodality, asymmetry, and excess kurtosis in data sets. The Kullback–Leibler divergence is used to obtain the test statistic. Comparisons with DIPtest, simulations, and a study of sea surface temperature data illustrate the usefulness of the proposed methodology.

Keywords: bimodality; bimodal skew-symmetric normal distribution; Kullback–Leibler divergence; Shannon entropy; sea surface temperature

1. Introduction

The bimodality of a frequency distribution is crucial in several fields. For example, [1] analyzed bimodality-generating mechanisms for age-size plant population data sets. Ashman et al. [2] discussed the presence of bimodality in globular cluster metallicity distributions, velocity distributions of galaxies in clusters, and burst durations of gamma-ray sources. Hosenfeld et al. [3] detected bimodality in samples of elementary schoolchildren's reasoning performance. Bao et al. [4] applied the minimum relative entropy method for bimodal distributions to remanufacturing system data. Freeman and Dale [5] assessed bimodality to detect the presence of dual cognitive processes, and Shalek et al. [6] found bimodal variation in immune cells using ribonucleic acid (RNA) fluorescence.

Several measures of a frequency distribution's bimodality exist in the literature. Classical inferential methods for detecting bimodality focus on the third and fourth moments through the kurtosis measure [1]. Darlington [7] and Hildebrand [8] claimed that kurtosis is more a measure of unimodality versus bimodality than a measure of peakedness versus flatness. Hartigan and Hartigan [9] considered an asymptotic test to compare the empirical distribution function with a unimodal one. This paper is motivated by the latter, but considers a parametric approach. Specifically, we consider a generalized class of distributions that accommodates bimodal behavior in the empirical distribution. This class is called the bimodal skew-symmetric normal (BSSN) distribution [10], and includes as a particular case the bimodal normal distribution of [11]. With the BSSN distribution it is therefore possible to capture asymmetry and platykurtic/leptokurtic behavior (negative/positive excess kurtosis) in data sets, in addition to bimodality. Moreover, entropic measures are useful for obtaining the test statistic when certain regularity conditions of the probability distribution function hold [12]. We consider the Shannon entropy [13], the Kullback–Leibler divergence [14], and the BSSN maximum likelihood estimators to provide an asymptotic test for bimodality.

This paper is organized as follows: some properties and inferential aspects of the BSSN distribution are presented in Section 2. In Section 3, we provide the computation and description of information-theoretic measures related to the BSSN distribution, and then develop a hypothesis test on the significance of the bimodality parameter together with a simulation study (Section 4). In Section 5, real data on sea surface temperature collected off northern Chile illustrate the usefulness of the developed methodology. Section 6 concludes with a discussion.

2. Bimodal Skew-Symmetric Normal Distribution

Definition 1. Let $X$ be a continuous random variable defined on $\mathbb{R}$. We say that $X$ is bimodal skew-symmetric normal (BSSN) distributed, denoted $X \sim \mathrm{BSSN}(\mu, \sigma^2, \beta, \delta)$ [10], if its probability density function (pdf) is given by

$$ f(x) = c\,[(x-\beta)^2 + \delta]\,\phi(x;\mu,\sigma^2), \quad x \in \mathbb{R}, \qquad (1) $$

where $\mu, \beta \in \mathbb{R}$ are location parameters, $\sigma > 0$ and $\delta \geq 0$ denote the scale and bimodality parameters, respectively, $\phi(\cdot\,;\mu,\sigma^2)$ is the normal pdf with location $\mu$ and scale $\sigma$, and $c = [\lambda^2 + \sigma^2 + \delta]^{-1}$, with $\lambda = \beta - \mu$.

The mean and variance of $X$ are given by

$$ E[X] = \mu - 2c\lambda\sigma^2, \qquad (2) $$
$$ \mathrm{Var}[X] = c^2\sigma^2\big(3\sigma^4 + 4\delta\sigma^2 + [\lambda^2+\delta]^2\big), \qquad (3) $$

respectively. Equation (1) shows that $X \sim N(\mu,\sigma^2)$ in the limit as $\delta \to \infty$ or $|\beta| \to \infty$.
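To make Definition 1 concrete, the following minimal sketch (ours, not part of the original article) implements the pdf of Equation (1) in Python and checks Equations (2) and (3) by numerical integration; the function name bssn_pdf is ours, and the parameter values are illustrative, borrowed from panel (d) of Figure 1 below.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def bssn_pdf(x, mu, sigma2, beta, delta):
    """BSSN pdf of Equation (1): f(x) = c[(x - beta)^2 + delta] * phi(x; mu, sigma^2)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return c * ((x - beta)**2 + delta) * norm.pdf(x, loc=mu, scale=np.sqrt(sigma2))

# Illustrative parameters (cf. Figure 1(d)): mu = -1, beta = 0.5, sigma^2 = 5, delta = 2
mu, sigma2, beta, delta = -1.0, 5.0, 0.5, 2.0
lam = beta - mu
c = 1.0 / (lam**2 + sigma2 + delta)

f = lambda x: bssn_pdf(x, mu, sigma2, beta, delta)
mass = quad(f, -np.inf, np.inf)[0]                        # total probability
mean = quad(lambda x: x * f(x), -np.inf, np.inf)[0]
var  = quad(lambda x: (x - mean)**2 * f(x), -np.inf, np.inf)[0]

print(mass)                                   # ~1.0: Equation (1) is a valid pdf
print(mean, mu - 2 * c * lam * sigma2)        # numerical mean vs. Equation (2)
print(var, c**2 * sigma2 * (3 * sigma2**2     # numerical variance vs. Equation (3)
      + 4 * delta * sigma2 + (lam**2 + delta)**2))
```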
The pdf in Equation (1) can also be expressed in a standardized form, as presented next.

Definition 2. A random variable $Y$ has a BSSN distribution, with location parameters $\mu, \beta \in \mathbb{R}$, positive scale parameter $\sigma$, and non-negative bimodality parameter $\delta$, if its pdf is

$$ f(y) = \left( c\sigma y^2 - 2c\lambda y + \frac{1}{\sigma} - c\sigma \right)\phi(y), \quad y \in \mathbb{R}, $$

where $y = (x-\mu)/\sigma$, $\phi(\cdot)$ is the standardized normal pdf with location 0 and scale 1, and $c$ and $\lambda$ are defined as in Equation (1).

Figure 1 portrays various plots of the BSSN pdf, accommodating various shapes in terms of skewness, kurtosis, and bimodality. We observe that bimodality appears for the smallest values of $\delta$; see also Proposition 2.4 in [10]. In addition, the $\mu$ and $\beta$ parameters allow accommodating skewness and kurtosis.

For a random sample $X = (X_1, \ldots, X_n)^\top$ with pdf given in Equation (1), the log-likelihood function can be written as

$$ \ell(\theta; X) = n \log c - \frac{n}{2}\log(2\pi\sigma^2) + \sum_{m=1}^{n} \log[(X_m - \beta)^2 + \delta] - \frac{1}{2}\sum_{m=1}^{n} Y_m^2, \qquad (4) $$

where $Y_m = (X_m - \mu)/\sigma$, $m = 1, \ldots, n$, and $\theta = (\mu, \sigma^2, \beta, \delta)^\top$. Therefore, the MLE $\hat{\theta}$ is obtained by maximizing the function (4). The Fisher information matrix (FIM) related to the maximum likelihood equations and derivatives with respect to $\theta$ is

$$ \mathcal{I}(\theta) = \begin{pmatrix} I_{\mu\mu} & I_{\sigma\mu} & I_{\beta\mu} & I_{\delta\mu} \\ I_{\mu\sigma} & I_{\sigma\sigma} & I_{\beta\sigma} & I_{\delta\sigma} \\ I_{\mu\beta} & I_{\sigma\beta} & I_{\beta\beta} & I_{\delta\beta} \\ I_{\mu\delta} & I_{\sigma\delta} & I_{\beta\delta} & I_{\delta\delta} \end{pmatrix}, \qquad (5) $$

where, writing the subscript $\sigma$ for differentiation with respect to the scale parameter, its elements, denoted by $I_{\theta_i\theta_j} = -E[\partial^2 \ell(\theta;X)/\partial\theta_i\partial\theta_j]$, $\theta_k \in \{\mu, \sigma, \beta, \delta\}$, $k = 1, 2, 3, 4$, are

$$ I_{\mu\mu} = 2nc - 4nc^2\lambda^2 + \frac{n}{\sigma^2}, $$
$$ I_{\sigma\sigma} = 2nc - 4nc^2\sigma^2 - \frac{n}{\sigma^2} + \frac{3}{\sigma^4}\sum_{i=1}^n (X_i - \mu)^2, $$
$$ I_{\beta\beta} = I_{\mu\mu} - \frac{n}{\sigma^2} - 2\sum_{i=1}^n \frac{\delta - (X_i-\beta)^2}{[\delta + (X_i-\beta)^2]^2}, $$
$$ I_{\delta\delta} = -nc^2 + \sum_{i=1}^n \frac{1}{[\delta + (X_i-\beta)^2]^2}, $$
$$ I_{\mu\sigma} = 4n\lambda\sigma c^2 + \frac{2n}{\sigma^3}(\bar{X} - \mu) = I_{\sigma\mu}, $$
$$ I_{\mu\beta} = \frac{n}{\sigma^2} - I_{\mu\mu} = I_{\beta\mu}, $$
$$ I_{\mu\delta} = 2n\lambda c^2 = I_{\delta\mu}, $$
$$ I_{\sigma\beta} = -4n\lambda\sigma c^2 = I_{\beta\sigma}, $$
$$ I_{\sigma\delta} = \frac{I_{\sigma\beta}}{2\lambda} = I_{\delta\sigma}, $$
$$ I_{\beta\delta} = -I_{\mu\delta} - 2\sum_{i=1}^n \frac{X_i - \beta}{[\delta + (X_i-\beta)^2]^2} = I_{\delta\beta}, $$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$. It can be seen that the FIM in Equation (5) is regular for all $\delta \geq 0$.
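As a brief illustration of how the MLE $\hat{\theta}$ can be obtained in practice, the sketch below (ours; the optimizer, starting values, bounds, and toy data are illustrative choices, not prescriptions from the paper) numerically maximizes the log-likelihood of Equation (4) with SciPy, parameterizing by $\sigma$ rather than $\sigma^2$.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, x):
    """Negative of the BSSN log-likelihood in Equation (4)."""
    mu, sigma, beta, delta = theta
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma**2 + delta)
    y = (x - mu) / sigma
    n = len(x)
    ll = (n * np.log(c) - 0.5 * n * np.log(2 * np.pi * sigma**2)
          + np.sum(np.log((x - beta)**2 + delta))
          - 0.5 * np.sum(y**2))
    return -ll

# Toy bimodal data (illustrative; not an exact BSSN sample)
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])

theta0 = [x.mean(), x.std(), x.mean(), 1.0]          # crude starting point
fit = minimize(neg_loglik, theta0, args=(x,), method="L-BFGS-B",
               bounds=[(None, None), (1e-6, None),   # sigma > 0
                       (None, None), (1e-9, None)])  # delta >= 0
print(fit.x)   # approximate MLE of (mu, sigma, beta, delta)
```

Since bimodality corresponds to small $\delta$ (Figure 1), the fitted $\hat{\delta}$ is the quantity of interest for the bimodality test developed later via the Kullback–Leibler divergence.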
Figure 1. Various shapes of the pdf of $X \sim \mathrm{BSSN}(\mu, \sigma^2, \beta, \delta)$, with $\sigma^2 = 5$, $\delta = 0, 2, 5, 10$ (black, red, blue, and violet lines, respectively); and (a) $\mu = 1$, $\beta = 1$; (b) $\mu = 1$, $\beta = 0$; (c) $\mu = -1$, $\beta = -0.5$; and (d) $\mu = -1$, $\beta = 0.5$.

3. Information Measures

In the next subsections, we present the main results on information measures for the BSSN distribution.

3.1. Shannon Entropy

The entropy concept is associated with the uncertainty of information. Of all possible entropies presented in the literature, we focus on the Shannon entropy (SE) [13]. The SE of a random variable $Z$ with pdf $f(z)$ is defined as the expected value

$$ H(Z) = -E[\log f(Z)] = -\int_{\mathbb{R}} f(z)\log f(z)\,dz, \qquad (6) $$

where $E[g(Z)]$ denotes the expected information in $Z$ for a function $g(z)$. In this case, the SE is the expected value of the function $g(z) = -\log f(z)$, which satisfies $g(1) = 0$ and $g(0) = \infty$. We use this notation for all expected values in this paper.

Proposition 1. Let $X \sim \mathrm{BSSN}(\mu, \sigma^2, \beta, \delta)$ with pdf defined in Equation (1). The SE of $X$ is given by

$$ H(X) = \frac{1}{2}\log\left(\frac{2\pi\sigma^2}{c^2}\right) + \frac{(c\sigma)^2}{2}\left[3\sigma^2 + 4\delta + 4\lambda^2 + \left(\frac{\lambda^2+\delta}{\sigma}\right)^2\right] - E\big[\log\{(X-\beta)^2 + \delta\}\big], \qquad (7) $$

with $\lambda = \beta - \mu$.

Proof. From Equation (1), we have

$$ \log f(x) = \log c + \log[(x-\beta)^2 + \delta] - \frac{1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(x-\mu)^2. $$

Then, from the definition of SE given in Equation (6), we have

$$ H(X) = -\int_{\mathbb{R}} f(x)\log c\,dx + \frac{1}{2}\log(2\pi\sigma^2)\int_{\mathbb{R}} f(x)\,dx - \int_{\mathbb{R}} \log[(x-\beta)^2+\delta]\,f(x)\,dx + \frac{1}{2\sigma^2}\int_{\mathbb{R}} (x-\mu)^2 f(x)\,dx $$
$$ = -\log c + \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}E[(X-\mu)^2] - E\big[\log\{(X-\beta)^2+\delta\}\big]. \qquad (8) $$

Given that $E[(X-\mu)^2] = \mathrm{Var}[X-\mu] + E[X-\mu]^2 = \mathrm{Var}[X] + (E[X]-\mu)^2$, the result for $H(X)$ follows from Equations (2) and (3) and some basic algebra.

For any $\delta$, the expected value $E[\log\{(X-\beta)^2+\delta\}]$ in Equation (7) is not directly computable in closed form.
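Since this expectation has no closed form, the following sketch (ours; the parameter values and the finite integration range are illustrative choices) evaluates $H(X)$ two ways: directly from Equation (6) by quadrature, and from Equation (7) with $E[\log\{(X-\beta)^2+\delta\}]$ computed numerically. The two results should coincide.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def bssn_pdf(x, mu, sigma2, beta, delta):
    """BSSN pdf of Equation (1)."""
    lam = beta - mu
    c = 1.0 / (lam**2 + sigma2 + delta)
    return c * ((x - beta)**2 + delta) * norm.pdf(x, loc=mu, scale=np.sqrt(sigma2))

mu, sigma2, beta, delta = 1.0, 5.0, 0.0, 2.0   # illustrative values
sigma, lam = np.sqrt(sigma2), beta - mu
c = 1.0 / (lam**2 + sigma2 + delta)
f = lambda x: bssn_pdf(x, mu, sigma2, beta, delta)

# Finite range wide enough for negligible tail mass, avoiding log(0) underflow
a, b = mu - 30 * sigma, mu + 30 * sigma

# Equation (6): H(X) = -integral of f log f
H_direct = quad(lambda x: -f(x) * np.log(f(x)), a, b)[0]

# Equation (7), with the remaining expectation evaluated numerically
E_log = quad(lambda x: np.log((x - beta)**2 + delta) * f(x), a, b)[0]
H_prop = (0.5 * np.log(2 * np.pi * sigma2 / c**2)
          + 0.5 * (c * sigma)**2 * (3 * sigma2 + 4 * delta + 4 * lam**2
                                    + ((lam**2 + delta) / sigma)**2)
          - E_log)

print(H_direct, H_prop)   # the two evaluations agree
```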