The Modes of a Mixture of Two Normal Distributions

Silesian J. Pure Appl. Math. vol. 6, is. 1 (2016), 59–67 Grzegorz SITEK Department of Statistics, University of Economics in Katowice, Poland THE MODES OF A MIXTURE OF TWO NORMAL DISTRIBUTIONS Abstract. Mixture distributions arise naturally where a statistical pop- ulation contains two or more subpopulations. Finite mixture distributions refer to composite distributions constructed by mixing a number K of component distributions. The first account of mixture data being analyzed was documented by Pearson in 1894. We consider the distribution of a mixture of two normal distributions and investigate the conditions for which the distribution is bimodal. This paper presents a procedure for answer- ing the question of whether a mixture of two normal distributions which five known parameters µ1, µ2, σ1, σ2, p is unimodal or not. For finding the modes, a simple iterative procedure is given. This article presents the possibility of estimation of modes using biaverage. 1. Introduction Mixture models have been widely used in econometrics and social science, and the theories for mixture models have been well studied Lindsay (see [5]). The importance of the research for unimodality or bimodality in statistics have been described by Murphy (see [6]). Consider f(x, p)= pf (x)+(1 p)f (x) (1) 1 − 2 2010 Mathematics Subject Classification: 60E05, 62E99. Keywords: mixture of normal distributions, bimodal distributions, biaverage. Corresponding author: G. Sitek ([email protected]). Received: 25.07.2016. 60 G. Sitek where, for i =1, 2, 1 − − 2 2 (x µi) /2σi fi(x)= e σi√2π with 0 <p< 1. The function f(x, p) is the probability density function of a mixture of two normal distributions. Here we are concerned with the study of the modes of the mixtures (1). The density f(x, p) may have more then one mode, but, except in a very special case, there is no simple rule to know whether the mixture is unimodal or bimodal. 2. Theoretical discussion The separation of the components of a two-component Gaussian mixture can be expressed by the difference between the component means, which is ∆= µ µ . 2 − 1 Eisenberger (see [4]) gave the sufficient condition that a mixture is unimodal if 2 2 2 27σ1σ2 ∆ < 2 2 . 4(σ1 + σ2) Accordingly to this condition, a mixture with σ1 = σ2 = 1 is unimodal for ∆ < 1.84. Behboodian (see [3]) considered this problem, too, and derived the following sufficient condition for a mixture of two Gaussian distributions to be unimodal ∆ 2 min σ , σ . ≤ { 1 2} Since σ1 = σ2 = 1 is assumed, his classification corresponds to that one chosen in this work,which is ∆ < 2. We consider the following conditions: 1) If µ1 = µ2 f(x, p) is unimodal for all p, 0 <p< 1. 2 ′ p(x µ ) (x µ ) f (x, p)= 1 exp 1 − 3 − −2 − √2πσ1 2σ1 − (1 p)(x µ ) (x µ )2 2 exp 2 =0. (2) − −3 − −2 − √2πσ2 2σ2 Equation (2) has one root x = µ1 = µ2. The modes of a mixture of two normal distributions 61 2) If µ = µ and σ = σ = σ and p = 1 . 1 6 2 1 2 2 The mixture density 0.5 (x µ )2 0.5 (x µ )2 f(x, p)= exp − − 1 + exp − − 2 √2πσ 2σ2 √2πσ 2σ2 of two normal probability density functions with the same standard deviation, σ, but with different means, µ1 and µ2 respectively, is bimodal if and only if µ µ > 2σ. | 2 − 1| Depending on the distance between µ1 and µ2, the mixture density will have 1 either a maximum at x0 = 2 (µ1 + µ2) (the unimodal case) or a local minimum at 1 x0 = 2 (µ1 + µ2) (the bimodal case). Indeed, x0 is a stationary point because 1 (x µ ) (x µ )2 f ′ x 0 1 0 1 ( 0)= − −2 exp − −2 + 2√2πσ σ h 2σ i 2 1 (x0 µ2) (x0 µ2) + − −2 exp − −2 = 2√2πσ σ h 2σ i − µ2−µ1 µ2−µ1 − µ1−µ2 − 1 ( ) ( ( ) )2 1 ( ) ( µ2 µ1 )2 2 2 2 2 . = 2 exp − 2 + 2 exp − 2 =0 2√2πσ σ h 2σ i 2√2πσ σ h 2σ i Now we must check the second derivative to see whether a maximum or a minimum occurs. 1 (x µ )2 (x µ ) 2 (x µ )2 f ′′ x 0 1 0 1 0 1 ( 0)= exp − −2 + −2 exp − −2 2√2πσ3 − h 2σ i σ h 2σ i − 2 2 2 (x0 µ2) (x0 µ2) (x0 µ2) exp − −2 + −2 exp − −2 = − h 2σ i σ h 2σ i −(µ2−µ1) 2 −(µ2−µ1) 2 −(µ2−µ1) 2 1 2 2 2 = exp 2 + 2 exp 2 2√2πσ3 − h 2σ i σ h 2σ i − −(µ1−µ2) 2 −(µ1−µ2) 2 −(µ1−µ2) 2 2 2 2 exp 2 + 2 exp 2 = − h 2σ i σ h 2σ i − µ2−µ1 2 − µ2−µ1 2 1 ( ) ( ) 2 2 > = exp 2 1+ 2 0 2√2πσ3 − h 2σ i − σ if (µ µ )2 > 4σ2 or, equivalently, if µ µ > 2σ. Thus, a minimum occurs 2 − 1 | 2 − 1| only if the distance between the two means exceeds two standard deviations. 62 G. Sitek 3) If µ = µ and σ = σ . A sufficient condition that there exists values of p, 1 6 2 1 6 2 0 <p< 1 for which f(x, p) is bimodal is that 2 2 2 8σ1 σ2 (µ2 µ1) > 2 2 . − (σ1 + σ2 ) For every set of values µ1, µ2, σ1, σ2 exist p, 0 <p< 1 for which f(x, p) is unimodal. ′ Now suppose µ2 > µ1. Since x = µ1 is not a root of f (x, p) = 0, one can divide (2) by the first term of f ′(x, p). After rearranging, one obtains µ x σ3p g(x)= 2 − h(x)= 2 , x µ σ3(1 p) − 1 1 − where 2 2 (x µ2) (x µ1) h(x) = exp − 2 + − 2 . − 2σ2 2σ1 Since σ3p 2 > 0 σ3(1 p) 1 − and this term takes on all finite positive values exactly once on the interval 0 < p< 1 for all fixed values σ1 and σ1, each value x for which g(x) > 0 there is a root of equation σ3p g(x)= 2 σ3(1 p) 1 − for some unique p, and hence is a root of f ′(x, p) = 0 for exactly one value of p. For x > µ2 and x<µ1, g(x) 6 0, so that one is interested only in values of x on the interval µ <x<µ . In this interval g(x) > 0,g(x) as x µ and 1 2 → ∞ → 1 g(µ2) = 0. Therefore, since g(x) is continuous on µ1 <x<µ2, g(x) takes on all positive finite values at least once in the interval. Moreover, if g(x) is monotone decreasing in this interval, all positive values will be attained exactly once so that since there will then exist a one-to-one correspondence between the values of g(x) for µ1 <x<µ2 and 3 σ2 p σ3(1 p) 1 − for 0 <p< 1, f(x, p) will have a single maximum for all p and will be unimodal. ′ Since decreasing monotonicity is implied by g (x) < 0, on µ1 <x<µ2, condition for which this relation is satisfied will now be investigated. The modes of a mixture of two normal distributions 63 For µ1 <x<µ2: ′ h(x) g (x)= σ2(x µ )(µ x)2 + σ2σ2(x µ )2 1 − 1 2 − 1 2 − 1 h 2 2 2 2 + σ2 (x µ1) (µ2 x) σ2σ1 (µ2 µ1) < − − − − i h(x) 27 < (σ2 + σ2)(µ µ )3 σ2σ2(µ µ ) < 0 σ2σ2(x µ )2 4 1 2 2 − 1 − 2 1 2 − 1 1 2 − 1 h i if 2 2 2 27σ1σ2 (µ2 µ1) < 2 2 . (3) − 4(σ1 + σ2 ) Thus for values of µ1,µ2, σ1, σ1, satisfying the inequality (3), g(x) decreases mono- tonically on µ1 <x<µ2. Then, for each value of p, 0 <p< 1, there exists only one value of x for which f ′(x, p) = 0. This must be a maximum since f(x, p) 0 → as x . → ±∞ (µ1+µ2) However, for x = 2 : µ1+µ2 ′ µ + µ 4h( ) 1 g 1 2 = 2 (σ2 + σ2)(µ µ )3 σ2σ2(µ µ ) > 0 2 σ2σ2(µ µ )2 8 1 2 2 − 1 − 2 1 2 − 1 1 2 2 − 1 h i if 2 2 2 8σ1 σ2 (µ2 µ1) > 2 2 . − (σ1 + σ2 ) 3. Biaverage and modes of a mixture of two normal distributions We consider a mixture of two normal distributions 2 2 p (x µ1) 1 p (x µ2) f(x, p)= exp − −2 + − exp − −2 . √2πσ1 2σ1 √2πσ2 2σ2 The modes of this mixture is determined from the condition ′ ′ pf (x, µ , σ )+(1 p)f (x, µ , σ )=0. 1 1 1 − 2 2 2 64 G. Sitek This formula can be written as [2]: 2 2 pµ1 −(x−µ1) (1−p)µ2 −(x−µ2) 3 exp 2 + 3 exp 2 σ1 2σ1 σ2 2σ2 x = h 2i h 2 i. (4) p −(x−µ1) (1−p) −(x−µ2) σ3 exp 2σ2 + σ3 exp 2σ2 1 h 1 i 2 h 2 i The above equation can be solved iteratively. Suppose there are two numbers m and m such that min = E((X a)(X b))2 = E((X m)(X m))2, (5) a,b − − − − where ∞ E(X)= xf(x)dx. Z−∞ The numbers m and m then call biaverage.Biaverage is a two-dimensional vector (m, m).

The Modes of a Mixture of Two Normal Distributions

Exploring Bimodality in Introductory Computer Science Performance Distributions

Recombining Partitions Via Unimodality Tests

UNIMODALITY and the DIP STATISTIC; HANDOUT I 1. Unimodality Definition. a Probability Density F Defined on the Real Line R Is C

Field Guide to Continuous Probability Distributions

On Multivariate Unimodal Distributions

Large Time Unimodality for Classical and Free Brownian Motions with Initial Distributions

Nber Working Paper Series the Distribution of the Size

The Basic Distributional Theory for the Product of Zero Mean Correlated

Distributionally Robust Chance Constrained Optimal Power Flow Assuming Log-Concave Distributions

Alpha-Skew Generalized T Distribution

Distributionally Robust Chance Constrained Optimal Power

Statistics to Detect Low-Intensity Anomalies in PV Systems