A Sufficiency Paradox: An Insufficient Statistic Preserving the Fisher Information

Abram KAGAN and Lawrence A. SHEPP

An example of a regular statistical experiment is constructed where an insufficient statistic preserves the Fisher information contained in the data. The data are a pair (Δ, X) where Δ is a binary random variable and, given Δ, X has a density f(x − θ | Δ) depending on a location parameter θ. The phenomenon is based on the fact that f(x | Δ) smoothly vanishes at one point; it can be eliminated by adding to the regularity of a statistical experiment positivity of the density function.

KEY WORDS: Convexity; Regularity; Statistical experiment.

Abram Kagan is Professor, Department of Mathematics, University of Maryland, College Park, MD 20742 (E-mail: [email protected]). Lawrence A. Shepp is Professor, Department of Statistics, Rutgers University, New Brunswick, NJ 08855. The authors are grateful to the editor and an associate editor whose comments helped improve the article. The first author's work on the article was done during his visit to the University of Leeds with financial support from the Engineering and Physical Sciences Research Council of the U.K.

The American Statistician, February 2005, Vol. 59, No. 1, p. 54. © 2005 American Statistical Association. DOI: 10.1198/000313005X21041

1. INTRODUCTION

Let 𝒫 = {p(x; θ), θ ∈ Θ} be a parametric family of densities (with respect to a measure μ) of a random element X taking values in a measurable space (𝒳, 𝒜), the parameter space Θ being an interval. Following Ibragimov and Has'minskii (1981, chap. 1), a triple E = (𝒳, 𝒜, 𝒫) is called a regular statistical experiment if (a) p(x; θ) is continuously differentiable in θ ∈ Θ for all x ∈ 𝒳 and (b) the Fisher information on θ contained in X,

    I_X(θ) = ∫ (1/p(x; θ)) (∂p(x; θ)/∂θ)² dμ(x),

is finite (the integral is taken over the set where p(x; θ) > 0).

Suppose that T : (𝒳, 𝒜) → (𝒯, ℬ) is a statistic. Define I_T(θ) as the Fisher information on θ contained in T; then I_T(θ) ≤ I_X(θ), θ ∈ Θ, and, if T is sufficient for θ, then

    I_T(θ) = I_X(θ), θ ∈ Θ.    (1)

See Ibragimov and Has'minskii (1981, chap. 1, theorem 7.2).

Regularity of a statistical experiment does not require positivity of p(x; θ); the density may vanish smoothly. Because of this it is possible to find an insufficient statistic T such that (1) holds. The following example illustrates this. Let g(x) be the density function of a gamma distribution Gamma(3, 1), that is,

    g(x) = (1/2) x² e^(−x), x ≥ 0;  g(x) = 0, x < 0.

Take a binary variable Δ with

    P(Δ = 1) = w₁, P(Δ = 2) = w₂, w₁ + w₂ = 1, w₁ ≠ w₂,    (2)

and let Y be a continuous random variable with the conditional density f(y | Δ) given by

    f₁(y) = f(y | Δ = 1) = .7g(y), y ≥ 0; = .3g(−y), y ≤ 0,    (3)
    f₂(y) = f(y | Δ = 2) = .3g(y), y ≥ 0; = .7g(−y), y ≤ 0.    (4)

There is nothing special in the pair (.7, .3); any pair of positive numbers (p, 1 − p) except (.5, .5) will suffice.

From (3) and (4) one sees that both densities are continuously differentiable everywhere. Formulas (2)–(4) determine the joint distribution of (Δ, Y). For any Borel set A ⊂ R,

    P(Δ = i, Y ∈ A) = wᵢ ∫_A fᵢ(y) dy, i = 1, 2.    (5)
An observation ("data") is a pair (Δ, X) = (Δ, Y + θ) with θ ∈ Θ as a parameter. From (5),

    P_θ(Δ = i, X ∈ A) = wᵢ ∫_A fᵢ(x − θ) dx, i = 1, 2.    (6)

Because

    I = ∫₀^∞ [g′(x)]²/g(x) dx = 1,

the Fisher information I_(Δ,X)(θ) on θ contained in (Δ, X) (which does not depend on the location parameter θ) equals

    I_(Δ,X) = w₁I + w₂I = 1.

Because the information is finite, the statistical experiment of observing (Δ, X) is regular.

Take now the second component X of the data as a statistic. From (2)–(4), the density function of X,

    f(x; θ) = f(x − θ) = w₁f₁(x − θ) + w₂f₂(x − θ)
            = (.7w₁ + .3w₂) g(x − θ), x ≥ θ,
            = (.3w₁ + .7w₂) g(θ − x), x ≤ θ,    (7)

is continuously differentiable in θ for all x, and the Fisher information I_X(θ) on θ contained in X does not depend on θ and equals

    I_X = (.7w₁ + .3w₂)I + (.3w₁ + .7w₂)I = 1.

Thus, I_X(θ) = I_(Δ,X)(θ), θ ∈ Θ, though X is not sufficient for θ because the probability element of (Δ, X),

    wᵢ f(x − θ | Δ = i), (i, x) ∈ {1, 2} × R, θ ∈ Θ,

is not factorized into R(x; θ) r(i, x) as required by the factorization theorem (see Lehmann 1986, chap. 2, theorem 8). See Section 3 for a self-contained proof of the insufficiency of X.

To understand the analytical origin of the phenomenon, one can look at the last step in the proof of theorem 7.2 in Ibragimov and Has'minskii (1981, chap. 1). The relation (1) holding for all θ ∈ Θ implies

    ∂p(x; θ)/∂θ = γ(T(x); θ) p(x; θ).    (8)

To get from (8) the required factorization p(x; θ) = R(T(x); θ) r(x), one needs to divide both sides of (8) by p(x; θ), which is, in general, impossible because p(x; θ) may vanish at x = x(θ) (or at θ = θ(x)); this is exactly the case with f(x − θ | Δ) in the above example: it vanishes at x = θ. Though not an obstruction for an experiment to be regular, it is an obstruction for solving (8). Adding to the regularity the condition p(x; θ) > 0, x ∈ 𝒳, θ ∈ Θ, makes (1) a characteristic property of sufficient statistics. It seems that the positivity condition (or the like) is missing from theorem 7.2 in Ibragimov and Has'minskii (1981, chap. 1). The proofs in Kagan, Linnik, and Rao (1973, theorem 8.3.1) and in Witting (1985, pp. 333–334) use the positivity condition.
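The insufficiency of X can also be seen numerically: by Bayes' formula applied to (6), the conditional probability P(Δ = 1 | X = x) depends on θ through the sign of x − θ, which a sufficient statistic would not allow. A sketch with illustrative values w₁ = .6, w₂ = .4, x = 1 (these particular numbers are not from the paper):

```python
import numpy as np

def g(u):
    """Gamma(3,1) density: (1/2) u^2 e^{-u} for u >= 0, else 0."""
    return 0.5 * u**2 * np.exp(-u) if u >= 0 else 0.0

# conditional densities of X given Delta, from (3)-(4) shifted by theta;
# the two terms have disjoint supports, so the sum equals the two-branch formula
f1 = lambda x, th: 0.7 * g(x - th) + 0.3 * g(th - x)
f2 = lambda x, th: 0.3 * g(x - th) + 0.7 * g(th - x)

w1, w2 = 0.6, 0.4   # illustrative mixture weights with w1 != w2

def post_delta1(x, th):
    """P(Delta = 1 | X = x) under parameter theta, by Bayes' formula."""
    num = w1 * f1(x, th)
    return num / (num + w2 * f2(x, th))

# same observed x, two parameter values on either side of it
p_right = post_delta1(1.0, 0.5)   # x > theta: .7 w1 / (.7 w1 + .3 w2) = 7/9
p_left  = post_delta1(1.0, 1.5)   # x < theta: .3 w1 / (.3 w1 + .7 w2) = 9/23
print(p_right, p_left)  # the two values differ, so X is not sufficient
```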
As to the statistical origin of the phenomenon, it is convexity but not strict convexity of the Fisher information with respect to mixtures of statistical experiments. It is something of a surprise that the phenomenon of an insufficient statistic preserving the Fisher information owes its origin to this property (which may be of interest in its own right), briefly discussed in Section 2. Section 3 contains the main result and discussion. Other references related to the phenomenon discussed in this article are Akahira and Takeuchi (1991), Bahadur (1955), and Huber (1981, chap. 4).
2. CONVEXITY OF THE FISHER INFORMATION AND THE SOURCE OF THE PARADOX

Let Eᵢ, i = 1, 2, be a regular statistical experiment consisting in observing a random variable T with values in 𝒯 having a density fᵢ(t; θ) depending on a scalar parameter θ. Denote by I(Eᵢ; θ) the Fisher information on θ in Eᵢ.

An experiment E is called a mixture of E₁, E₂ with (mixture) coefficients w₁, w₂, w₁ ≥ 0, w₂ ≥ 0, w₁ + w₂ = 1, if it consists in observing a random variable with the density f(x; θ) = w₁f₁(x; θ) + w₂f₂(x; θ). If the mixture is denoted as E = w₁E₁ + w₂E₂, the convexity of the Fisher information means

    I(w₁E₁ + w₂E₂; θ) ≤ w₁I(E₁; θ) + w₂I(E₂; θ).    (9)

For a purely statistical proof of (9), see Kagan (2003).

Let g(x) be a continuously differentiable probability density function such that

    g(x) > 0, x > 0;  g(x) = 0, x ≤ 0,  and  I = ∫₀^∞ [g′(x)]²/g(x) dx < ∞.

An example of such a density is given in Section 1. Set now

    f₁(x) = .7g(x) + .3g(−x),  f₂(x) = .3g(x) + .7g(−x),    (10)

and define Eᵢ, i = 1, 2, as a statistical experiment consisting of observing a random variable with density fᵢ(x − θ) with θ ∈ R as a parameter. One has

    I(E₁; θ) = .7 ∫_θ^∞ [g′(x − θ)]²/g(x − θ) dx + .3 ∫_−∞^θ [g′(θ − x)]²/g(θ − x) dx = I

(similarly, I(E₂; θ) = I), so that E₁, E₂ are regular statistical experiments. Their mixture E = w₁E₁ + w₂E₂ consists in observing a random variable with density

    f(x − θ) = w₁f₁(x − θ) + w₂f₂(x − θ)
             = (.7w₁ + .3w₂) g(x − θ), x ≥ θ,
             = (.3w₁ + .7w₂) g(θ − x), x ≤ θ.    (11)

The information I(E; θ) does not depend on θ and, as can be seen from (11),

    I(E; θ) = (.7w₁ + .3w₂)I + (.3w₁ + .7w₂)I = I = w₁I(E₁; θ) + w₂I(E₂; θ).    (12)

Relation (12) shows that the convexity of the Fisher information with respect to mixtures is not strict. Indeed, E₁, E₂ are different experiments, but in (9) the equality sign holds.

Consider now the choice of a smooth fᵢ(x). If g(x) is the density of a gamma distribution Gamma(α, 1),

    g(x) = (1/Γ(α)) x^(α−1) e^(−x), x ≥ 0,

then for α > m + 1 one has

    g(0) = g′(0) = ··· = g⁽ᵐ⁾(0) = 0,    (13)

and the first m derivatives of fᵢ(x) also vanish at x = 0. For x > 0, fᵢ(x), i = 1, 2, is infinitely differentiable.

Plainly, fᵢ(x) will have the same smoothness if it is constructed from any g(x) which is positive and infinitely differentiable for x > 0 and vanishes for x ≤ 0 in such a way that relations (13) hold. Let now Δ be a binary random variable with

    P(Δ = 1) = w₁,  P(Δ = 2) = w₂    (14)

with known w₁ > 0, w₂ > 0, w₁ + w₂ = 1, and assume that