Lecture 7, Review

Data Reduction

We should think about the following questions carefully before the "simplification" process:

• Is there any loss of information due to summarization?

• How to compare the amount of information about θ in the original data $\mathbf{X}$ and in $T(\mathbf{X})$?

• Is it sufficient to consider only the "reduced data" T?

1. Sufficient Statistics

A statistic T is called sufficient if the conditional distribution of X given T is free of θ (that is, the conditional distribution is completely known).

Example. Toss a coin n times, where the probability of heads is an unknown θ. Let T = the total number of heads. Is T sufficient for θ?

Sufficiency Principle

If T is sufficient, the "extra information" carried by X is worthless as far as θ is concerned. It is then only natural to consider inference procedures that do not use this extra, irrelevant information. This leads to the Sufficiency Principle:

Any inference procedure should depend on the data only through sufficient statistics.


Definition: (in terms of Conditional Probability)

(discrete case):

For any x and t, if the conditional pmf of X given T,

$$P(\mathbf{X}=\mathbf{x} \mid T(\mathbf{X})=t) = \frac{P(\mathbf{X}=\mathbf{x},\, T(\mathbf{X})=t)}{P(T(\mathbf{X})=t)} = \frac{P(\mathbf{X}=\mathbf{x})}{P(T(\mathbf{X})=t)}$$

(the second equality holds because the event $\{\mathbf{X}=\mathbf{x}\}$ implies $\{T(\mathbf{X})=t\}$ when $t = T(\mathbf{x})$), does not depend on θ, then we say $T(\mathbf{X})$ is a sufficient statistic for θ.

Sufficient Statistic, the general definition (for both discrete and continuous variables):

Let the pdf of the data X be $f(\mathbf{x};\theta)$ and the pdf of T be $q(t;\theta)$. If

$$f(\mathbf{x};\theta)\,/\,q(T(\mathbf{x});\theta) \text{ is free of } \theta \text{ (it may depend on } \mathbf{x}\text{)} \qquad (*)$$

for all x and θ, then T is a sufficient statistic for θ.

Example: Toss a coin n times, where the probability of heads is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ?

$X_1,\dots,X_n$ i.i.d. Bernoulli: $f(x;\theta) = \theta^x (1-\theta)^{1-x}$, $x = 0,1$.

$$f(\mathbf{x};\theta) = f(x_1,\dots,x_n;\theta) = \theta^{\sum_i x_i}(1-\theta)^{\,n-\sum_i x_i}.$$

$T = \sum_{i=1}^n X_i \sim \mathrm{Bin}(n,\theta)$:

$$q(t;\theta) = \binom{n}{t}\,\theta^{t}(1-\theta)^{\,n-t}, \qquad t = \sum_i x_i.$$

Thus

$$\frac{f(\mathbf{x};\theta)}{q(T(\mathbf{x});\theta)} = 1\Big/\binom{n}{t}$$

is free of θ.

So by the definition, $T = \sum_{i=1}^n X_i$ is a sufficient statistic for θ.
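As a quick numerical illustration (my addition, not part of the notes), the following Python sketch computes $P(\mathbf{X}=\mathbf{x} \mid T=t)$ for a small Bernoulli sample directly from the definition; the sample point and the θ values are arbitrary choices.

```python
# Sketch: check numerically that P(X = x | T = t) is free of theta
# for i.i.d. Bernoulli(theta) trials.
from itertools import product

def cond_prob(x, theta):
    """P(X = x | T = sum(x)) for i.i.d. Bernoulli(theta) trials."""
    n, t = len(x), sum(x)
    p_x = theta**t * (1 - theta)**(n - t)              # P(X = x)
    p_t = sum(theta**sum(y) * (1 - theta)**(n - sum(y))
              for y in product([0, 1], repeat=n)
              if sum(y) == t)                          # P(T = t)
    return p_x / p_t

x = (1, 0, 1, 1)                  # one arrangement with t = 3 heads, n = 4
for theta in (0.2, 0.5, 0.9):
    print(theta, cond_prob(x, theta))   # prints 0.25 = 1/C(4,3) every time
```

Every θ gives the same conditional probability: once t is known, the particular arrangement of heads carries no further information about θ.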


Example. $X_1,\dots,X_n$ i.i.d. $N(\theta,1)$, $T = \bar{X}$.

Remarks: The definition (*) is not always easy to apply.

• Need to guess the form of a sufficient statistic.

• Need to figure out the distribution of T.

How to find a sufficient statistic?

2. (Neyman-Fisher) Factorization theorem.

T is sufficient if and only if $f(\mathbf{x};\theta)$ can be written as the product $g(T(\mathbf{x});\theta)\,h(\mathbf{x})$, where the first factor depends on x only through $T(\mathbf{x})$ and the second factor is free of θ.

Example. Binomial: $X_1,\dots,X_n$ i.i.d. $\mathrm{Bin}(1,\theta)$.

Solution 1:

Bernoulli: $f(x;\theta) = \theta^x(1-\theta)^{1-x}$, $x = 0,1$.

$$f(\mathbf{x};\theta) = f(x_1,\dots,x_n;\theta) = \theta^{\sum_i x_i}(1-\theta)^{\,n-\sum_i x_i} = \Big[\theta^{\sum_i x_i}(1-\theta)^{\,n-\sum_i x_i}\Big]\cdot[1] = g\Big(\sum_i x_i,\ \theta\Big)\cdot h(x_1,\dots,x_n).$$

So according to the factorization theorem, $T = \sum_{i=1}^n X_i$ is a sufficient statistic for θ.

Solution 2:

$$f(\mathbf{x};\theta) = f(x_1,\dots,x_n \mid \theta) = \begin{cases} \theta^{\sum_i x_i}(1-\theta)^{\,n-\sum_i x_i}, & \text{if } x_i = 0,1,\ i = 1,2,\dots,n,\\ 0, & \text{otherwise} \end{cases}$$

$$= \theta^{t}(1-\theta)^{\,n-t}\, h(x_1,\dots,x_n) = g(t,\theta)\,h(x_1,\dots,x_n),$$

where $t = \sum_i x_i$, $g(t,\theta) = \theta^{t}(1-\theta)^{\,n-t}$, and

$$h(x_1,\dots,x_n) = \begin{cases} 1, & \text{if } x_i = 0,1,\ i = 1,2,\dots,n,\\ 0, & \text{otherwise.} \end{cases}$$

Hence $T = \sum_{i=1}^n X_i$ is a sufficient statistic for θ.

Example. Exp(λ).

Let $X_1,\dots,X_n$ be a random sample from an exponential distribution with rate λ. Let $T = X_1 + X_2 + \cdots + X_n$ and let $f$ be the joint density of $X_1, X_2, \dots, X_n$.

$$f(\mathbf{x};\lambda) = f(x_1,\dots,x_n \mid \lambda) = \begin{cases} \lambda^n e^{-\lambda \sum_i x_i}, & \text{if } x_i > 0,\ i = 1,2,\dots,n,\\ 0, & \text{otherwise} \end{cases}$$

$$= \lambda^n e^{-\lambda t}\, h(x_1,\dots,x_n) = g(t,\lambda)\,h(x_1,\dots,x_n),$$

where $g(t,\lambda) = \lambda^n e^{-\lambda t}$ and

$$h(x_1,\dots,x_n) = \begin{cases} 1, & \text{if } x_i > 0,\ i = 1,2,\dots,n,\\ 0, & \text{otherwise.} \end{cases}$$

Hence $T$ is a sufficient statistic for λ.

Example. Normal: $X_1,\dots,X_n$ i.i.d. $N(\theta,1)$.

Please derive the sufficient statistic for θ by yourself.


When the support of X depends on θ, one should be more careful about the factorization: the indicator functions must be used explicitly.

Example. Uniform: $X_1,\dots,X_n$ i.i.d. $U(0,\theta)$.

Solution 1:

Let $X_1,\dots,X_n$ be a random sample from a uniform distribution on $(0,\theta)$. Let $T = X_{(n)} = \max_i X_i$ and let $f$ be the joint density of $X_1, X_2, \dots, X_n$.

Then

$$f(\mathbf{x};\theta) = f(x_1,\dots,x_n \mid \theta) = \begin{cases} \dfrac{1}{\theta^n}, & \text{if } \theta > x_i > 0,\ i = 1,2,\dots,n,\\ 0, & \text{otherwise} \end{cases}$$

$$= \begin{cases} \dfrac{1}{\theta^n}, & \text{if } \theta > x_{(n)} \ge \cdots \ge x_{(1)} > 0,\\ 0, & \text{otherwise} \end{cases}$$

$$= g(t,\theta)\,h(x_1,\dots,x_n),$$

where

$$g(t,\theta) = \begin{cases} \dfrac{1}{\theta^n}, & \text{if } \theta > t > 0,\\ 0, & \text{otherwise,} \end{cases} \qquad h(x_1,\dots,x_n) = \begin{cases} 1, & \text{if } x_{(1)} > 0,\\ 0, & \text{otherwise.} \end{cases}$$

Hence $T = X_{(n)}$ is a sufficient statistic for θ.

*** I personally prefer this approach because it is the most straightforward. Alternatively, one can use the indicator function and simplify the solution, as illustrated next.


Definition: Indicator function

$$I_A(x) = \begin{cases} 1, & \text{if } x \in A,\\ 0, & \text{if } x \notin A. \end{cases}$$

Solution 2 (in terms of the indicator function):

Uniform: $f(x;\theta) = \dfrac{1}{\theta}$, $x \in (0,\theta)$.

$$f(\mathbf{x};\theta) = f(x_1,\dots,x_n) = \frac{1}{\theta^n}, \qquad x_i \in (0,\theta)\ \forall i,$$

$$= \frac{1}{\theta^n}\prod_{i=1}^n I_{(0,\theta)}(x_i) = \frac{1}{\theta^n}\, I_{(0,\theta)}\big(x_{(n)}\big)\, I_{(0,\infty)}\big(x_{(1)}\big) = \Big[\frac{1}{\theta^n}\, I_{(0,\theta)}\big(x_{(n)}\big)\Big]\cdot\Big[I_{(0,\infty)}\big(x_{(1)}\big)\Big] = g\big(x_{(n)},\theta\big)\cdot h(x_1,\dots,x_n).$$

So by the factorization theorem, $X_{(n)}$ is a sufficient statistic for θ.
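A small numerical illustration (my addition, with arbitrary sample values): two samples that share the same maximum have identical $U(0,\theta)$ likelihood functions, which is exactly what sufficiency of $X_{(n)}$ predicts.

```python
# Sketch: for U(0, theta) the likelihood depends on the data only through
# max(x); the samples below are arbitrary but share the same maximum.
import numpy as np

def likelihood(x, theta):
    x = np.asarray(x)
    return theta**(-x.size) if theta > x.max() else 0.0

x1 = [0.2, 0.7, 1.4]
x2 = [1.4, 0.05, 0.9]    # different sample, same maximum 1.4
for theta in (1.5, 2.0, 5.0):
    print(theta, likelihood(x1, theta), likelihood(x2, theta))  # equal
```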

Example: Please derive the sufficient statistic for θ, given a random sample of size n from $U(\theta, \theta+1)$.

Solution:

1. Indicator function approach:

Uniform: $f(x;\theta) = 1$, $x \in (\theta, \theta+1)$.

$$f(x_1,\dots,x_n \mid \theta) = (1)^n, \qquad x_i \in (\theta, \theta+1)\ \forall i,$$

$$= \prod_{i=1}^n I_{(\theta,\theta+1)}(x_i) = I_{(\theta,\theta+1)}\big(x_{(1)}\big) \cdot I_{(\theta,\theta+1)}\big(x_{(n)}\big) = \Big[I_{(\theta,\theta+1)}\big(x_{(1)}\big) \cdot I_{(\theta,\theta+1)}\big(x_{(n)}\big)\Big] \cdot [1] = g\big(x_{(1)}, x_{(n)}, \theta\big) \cdot h(x_1,\dots,x_n).$$

So $T = \big(X_{(1)}, X_{(n)}\big)$ is a sufficient statistic (SS) for θ.

2. Without using the indicator function:


$$f(x_1,\dots,x_n \mid \theta) = \begin{cases} 1, & \text{if } \theta + 1 > x_i > \theta,\ i = 1,2,\dots,n,\\ 0, & \text{otherwise} \end{cases}$$

$$= \begin{cases} 1, & \text{if } \theta + 1 > x_{(n)} \ge \cdots \ge x_{(1)} > \theta,\\ 0, & \text{otherwise} \end{cases}$$

$$= g\big(x_{(1)}, x_{(n)}, \theta\big)\, h(x_1,\dots,x_n),$$

where

$$g\big(x_{(1)}, x_{(n)}, \theta\big) = \begin{cases} 1, & \text{if } \theta + 1 > x_{(n)} \text{ and } x_{(1)} > \theta,\\ 0, & \text{otherwise,} \end{cases} \qquad h(x_1,\dots,x_n) = 1.$$

So $T = \big(X_{(1)}, X_{(n)}\big)$ is a SS for θ.

Two-dimensional Examples.

Example. Normal: i.i.d. $N(\mu,\sigma^2)$, $\boldsymbol{\theta} = (\mu,\sigma^2)$ (both unknown).

Let $X_1,\dots,X_n$ be a random sample from $N(\mu,\sigma^2)$. Let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^n \big(X_i - \bar{X}\big)^2,$$

and let $f$ be the joint density of $X_1, X_2, \dots, X_n$.

$$f(\mathbf{x};\boldsymbol{\theta}) = f(x_1,\dots,x_n \mid \mu,\sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}} = \Big(\frac{1}{2\pi\sigma^2}\Big)^{n/2} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i-\mu)^2}.$$

Now

$$\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x} + \bar{x} - \mu)^2$$

$$= \sum_{i=1}^n (x_i - \bar{x})^2 + 2(\bar{x}-\mu)\sum_{i=1}^n (x_i - \bar{x}) + n(\bar{x}-\mu)^2$$

$$= (n-1)s^2 + 2(\bar{x}-\mu)\cdot 0 + n(\bar{x}-\mu)^2$$

$$= (n-1)s^2 + n(\bar{x}-\mu)^2,$$

using $\sum_{i=1}^n (x_i - \bar{x}) = 0$.

Thus,

$$f(\mathbf{x};\boldsymbol{\theta}) = f(x_1,\dots,x_n \mid \mu,\sigma^2) = \Big(\frac{1}{2\pi\sigma^2}\Big)^{n/2} \exp\Big(-\frac{1}{2\sigma^2}\big[(n-1)s^2 + n(\bar{x}-\mu)^2\big]\Big)$$

$$= g(\bar{x}, s^2, \mu, \sigma^2)\, h(x_1,\dots,x_n),$$

where

$$g(\bar{x}, s^2, \mu, \sigma^2) = \Big(\frac{1}{2\pi\sigma^2}\Big)^{n/2} \exp\Big(-\frac{1}{2\sigma^2}\big[(n-1)s^2 + n(\bar{x}-\mu)^2\big]\Big), \qquad h(x_1,\dots,x_n) = 1.$$

In this case we say $(\bar{X}, S^2)$ is sufficient for $(\mu,\sigma^2)$.
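A quick numerical check of the sum-of-squares decomposition used above (my sketch; the sample, μ, and the seed are arbitrary):

```python
# Verify: sum (x_i - mu)^2 = (n - 1) s^2 + n (xbar - mu)^2
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 3.0, size=10)   # arbitrary normal sample
mu = 1.7                            # arbitrary value of mu
lhs = np.sum((x - mu)**2)
rhs = (x.size - 1) * np.var(x, ddof=1) + x.size * (x.mean() - mu)**2
print(np.isclose(lhs, rhs))         # True
```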


3. The (Regular) Exponential Family

The density function of a regular exponential family is:

$$f(x;\boldsymbol{\theta}) = c(\boldsymbol{\theta})\,h(x)\exp\Big[\sum_{j=1}^k w_j(\boldsymbol{\theta})\,t_j(x)\Big], \qquad \boldsymbol{\theta} = (\theta_1,\dots,\theta_k).$$

Example. Poisson(θ)

$$f(x;\theta) = \frac{1}{x!}\,\exp(-\theta)\,\exp[\ln(\theta)\cdot x], \qquad x = 0,1,2,\dots$$

Example. Normal: $N(\mu,\sigma^2)$, $\boldsymbol{\theta} = (\mu,\sigma^2)$ (both unknown).

$$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{1}{2\sigma^2}(x-\mu)^2\Big) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{1}{2\sigma^2}\big(x^2 - 2x\mu + \mu^2\big)\Big)$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{\mu^2}{2\sigma^2}\Big)\exp\Big(-\frac{1}{2\sigma^2}\big(x^2 - 2x\mu\big)\Big),$$

which is of the regular exponential family form with $c(\boldsymbol{\theta}) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\big(-\frac{\mu^2}{2\sigma^2}\big)$, $h(x) = 1$, $w_1(\boldsymbol{\theta}) = -\frac{1}{2\sigma^2}$, $t_1(x) = x^2$, $w_2(\boldsymbol{\theta}) = \frac{\mu}{\sigma^2}$, and $t_2(x) = x$.

4. Theorem (Exponential family & sufficient

Statistic). Let 푋,…, 푋 be a random sample from the regular exponential family.

Then

$$\boldsymbol{T}(\mathbf{X}) = \Big(\sum_{i=1}^n t_1(X_i),\ \dots,\ \sum_{i=1}^n t_k(X_i)\Big)$$

is sufficient for $\boldsymbol{\theta} = (\theta_1,\dots,\theta_k)$.


Example. Poisson(θ)

Let $X_1,\dots,X_n$ be a random sample from Poisson(θ).

Then

$T(\mathbf{X}) = \sum_{i=1}^n X_i$ is sufficient for θ.

Example. Normal: $N(\mu,\sigma^2)$, $\boldsymbol{\theta} = (\mu,\sigma^2)$ (both unknown).

Let $X_1,\dots,X_n$ be a random sample from $N(\mu,\sigma^2)$.

Then

$\boldsymbol{T}(\mathbf{X}) = \big(\sum_{i=1}^n X_i,\ \sum_{i=1}^n X_i^2\big)$ is sufficient for $\boldsymbol{\theta} = (\mu,\sigma^2)$.

Exercise.

Apply the general exponential family result to all the standard families discussed above, such as the binomial, Poisson, normal, exponential, and gamma.

A Non-Exponential Family Example.

Discrete uniform.

$P(X = x) = 1/\theta$, $x = 1,\dots,\theta$, where θ is a positive integer.
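A brief sketch of this example (the factorization itself is my addition): the joint pmf of a sample of size n is

$$P(\mathbf{X} = \mathbf{x}) = \theta^{-n}\, I\{x_{(n)} \le \theta\} \cdot I\{x_i \in \{1,2,\dots\} \text{ for all } i\},$$

so by the factorization theorem $T = X_{(n)}$ is sufficient, even though the family is not an exponential family (its support depends on θ).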

Another Non-Exponential Family Example.

$X_1,\dots,X_n$ i.i.d. $U(0,\theta)$, $T = X_{(n)}$.


Universal Cases.

$X_1,\dots,X_n$ are i.i.d. with density $f(x;\theta)$.

• The original data $X_1,\dots,X_n$ are always sufficient for θ.

(They form a trivial statistic, since they do not lead to any data reduction.)

• The order statistics $T = \big(X_{(1)},\dots,X_{(n)}\big)$ are always sufficient for θ.

(The dimension of the order statistics is n, the same as the dimension of the data. Still, this is a nontrivial reduction, as n! different values of the data correspond to one value of T.)

5. Theorem (Rao-Blackwell)

Let $X_1,\dots,X_n$ be a random sample from the population with pdf $f(\mathbf{x};\boldsymbol{\theta})$. Let $\boldsymbol{T}(\mathbf{X})$ be a sufficient statistic for θ, and let $\boldsymbol{U}(\mathbf{X})$ be any unbiased estimator of θ.

Let $\boldsymbol{U}^*(\mathbf{X}) = E[\boldsymbol{U}(\mathbf{X}) \mid \boldsymbol{T}]$. Then

(1) $\boldsymbol{U}^*(\mathbf{X})$ is an unbiased estimator of θ,
(2) $\boldsymbol{U}^*(\mathbf{X})$ is a function of T,
(3) $\mathrm{Var}(\boldsymbol{U}^*) \le \mathrm{Var}(\boldsymbol{U})$ for every θ, and $\mathrm{Var}(\boldsymbol{U}^*) < \mathrm{Var}(\boldsymbol{U})$ for some θ unless $\boldsymbol{U}^* = \boldsymbol{U}$ with probability 1.

The Rao-Blackwell theorem tells us that in searching for an unbiased estimator with the smallest possible variance (i.e., the best estimator, also called the uniformly minimum variance unbiased estimator, or UMVUE, sometimes referred to simply as the MVUE), we can restrict our search to unbiased functions of the sufficient statistic T(X).
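As an illustration (my sketch, not from the notes): for Bernoulli data the crude unbiased estimator $U = X_1$ can be Rao-Blackwellized on $T = \sum_i X_i$; by symmetry $E[X_1 \mid T] = T/n = \bar{X}$, and simulation shows the variance reduction. The values of θ, n, and the seed are arbitrary.

```python
# Sketch: Rao-Blackwellizing U = X_1 on T = sum(X_i) for Bernoulli(theta).
# E[X_1 | T] = T / n by symmetry, so U* is the sample mean.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.3, 20, 100_000
x = rng.binomial(1, theta, size=(reps, n))
u      = x[:, 0]           # unbiased but noisy: Var(U) = theta (1 - theta)
u_star = x.mean(axis=1)    # E[U | T] = T / n:  Var(U*) = theta (1 - theta) / n
print(u.mean(), u_star.mean())   # both approximately 0.3 (unbiased)
print(u.var(),  u_star.var())    # approximately 0.21 vs 0.0105
```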


6. Transformation of Sufficient Statistics

1. If $T$ is sufficient for θ and $T = c(U)$ for some mathematical function c of another statistic U, then U is also sufficient.

2. If $T$ is sufficient for θ, and $U = G(T)$ with G one-to-one, then U is also sufficient.

Remark: When one statistic is a function of the other statistic and vice versa, then they carry exactly the same amount of information.

Examples:

• If $\sum_{i=1}^n X_i$ is sufficient, so is $\bar{X}$.

• If $\big(\sum_{i=1}^n X_i,\ \sum_{i=1}^n X_i^2\big)$ is sufficient, so is $(\bar{X}, S^2)$.

• If $\sum_{i=1}^n X_i$ is sufficient, then $\big(\sum_{i=1}^n X_i,\ \sum_{i=1}^n X_i^2\big)$ is also sufficient, and so is $(\bar{X}, S^2)$.

Examples of non-sufficiency.

Ex. $X_1, X_2$ i.i.d. Poisson(λ). $T = X_1 - X_2$ is not sufficient.
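A numerical check of this example (my sketch; the sample point and λ values are arbitrary): the conditional probability of a sample point given $T = X_1 - X_2$ still depends on λ, so T fails the definition of sufficiency.

```python
# Sketch: P(X1 = x1, X2 = x2 | X1 - X2 = d) depends on lambda,
# so T = X1 - X2 is not sufficient for Poisson(lambda).
from math import exp, factorial

def pois(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def cond(x1, x2, lam, kmax=60):
    """P(X1 = x1, X2 = x2 | X1 - X2 = x1 - x2) for i.i.d. Poisson(lam)."""
    d = x1 - x2
    num = pois(x1, lam) * pois(x2, lam)
    den = sum(pois(k + d, lam) * pois(k, lam)   # all pairs with difference d
              for k in range(max(0, -d), kmax))
    return num / den

print(cond(3, 1, lam=1.0))   # about 0.24
print(cond(3, 1, lam=5.0))   # about 0.046: the answer changes with lambda
```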

Ex. $X_1,\dots,X_n$ i.i.d. with pmf $f(x;\theta)$. $T = (X_1,\dots,X_{n-1})$ is not sufficient.


7. Minimal Sufficient Statistics

It is seen that different sufficient statistics are possible. Which one is the "best"? Naturally, the one with the maximum reduction.

• For $N(\theta,1)$, $\bar{X}$ is a better sufficient statistic for θ than $(\bar{X}, S^2)$.

Definition:

$T$ is a minimal sufficient statistic if, given any other sufficient statistic $T'$, there is a function $c(\cdot)$ such that $T = c(T')$.

Equivalently, $T$ is minimal sufficient if, given any other sufficient statistic $T'$, whenever $\mathbf{x}$ and $\mathbf{y}$ are two data values such that $T'(\mathbf{x}) = T'(\mathbf{y})$, then $T(\mathbf{x}) = T(\mathbf{y})$.

Partition Interpretation for Minimal Sufficient Statistics:

• Any sufficient statistic introduces a partition on the sample space.

• The partition of a minimal sufficient statistic is the coarsest.

• The minimal sufficient statistic has the smallest dimension among possible sufficient statistics. Often this dimension equals the number of free parameters (exceptions do exist).

Theorem (How to check minimal sufficiency).

A statistic T is minimal sufficient if the following property holds: for any two sample points $\mathbf{x}$ and $\mathbf{y}$, $f(\mathbf{x};\theta)/f(\mathbf{y};\theta)$ does not depend on θ (i.e., $f(\mathbf{x};\theta)/f(\mathbf{y};\theta)$ is a constant function of θ) if and only if $T(\mathbf{x}) = T(\mathbf{y})$.
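As a worked illustration of this criterion (my addition): for $X_1,\dots,X_n$ i.i.d. $N(\theta,1)$,

$$\frac{f(\mathbf{x};\theta)}{f(\mathbf{y};\theta)} = \exp\Big(-\tfrac{1}{2}\sum_i x_i^2 + \tfrac{1}{2}\sum_i y_i^2 + n\theta(\bar{x} - \bar{y})\Big),$$

which is constant in θ if and only if $\bar{x} = \bar{y}$. Hence $T = \bar{X}$ is minimal sufficient for θ.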


8. Exponential Families & Minimal Sufficient Statistic:

For a random sample from the regular exponential family with probability density $f(\mathbf{x};\boldsymbol{\theta}) = c(\boldsymbol{\theta})\,h(\mathbf{x})\exp\big[\sum_{j=1}^k w_j(\boldsymbol{\theta})\,t_j(\mathbf{x})\big]$, where $\boldsymbol{\theta}$ is k-dimensional, the statistic

$$\boldsymbol{T}(\mathbf{X}) = \Big(\sum_{i=1}^n t_1(X_i),\ \dots,\ \sum_{i=1}^n t_k(X_i)\Big)$$

is minimal sufficient for $\boldsymbol{\theta}$.

Example. Poisson(θ)

Let $X_1,\dots,X_n$ be a random sample from Poisson(θ).

Then

$T(\mathbf{X}) = \sum_{i=1}^n X_i$ is minimal sufficient for θ.

Example. Normal: $N(\mu,\sigma^2)$, $\boldsymbol{\theta} = (\mu,\sigma^2)$ (both unknown).

Let $X_1,\dots,X_n$ be a random sample from $N(\mu,\sigma^2)$.

Then

$$\boldsymbol{T}(\mathbf{X}) = \Big(\sum_{i=1}^n X_i,\ \sum_{i=1}^n X_i^2\Big)$$

is minimal sufficient for $\boldsymbol{\theta} = (\mu,\sigma^2)$.

Remarks:

• The minimal sufficient statistic is not unique. However, any two minimal sufficient statistics are in one-to-one correspondence, and so are equivalent.


9. Complete Statistics

Let a family of distributions $\{f(x;\theta),\ \theta \in \Theta\}$ be given, and let $T$ be a statistic. $T$ induces the family of distributions $\{f_T(t;\theta),\ \theta \in \Theta\}$.

A statistic $T$ is complete for the family $\{f(x;\theta),\ \theta \in \Theta\}$, or equivalently, the induced family $\{f_T(t;\theta),\ \theta \in \Theta\}$ is called complete, if $E_\theta[g(T)] = 0$ for all $\theta \in \Theta$ implies that $g(T) = 0$ with probability 1.

Example. Poisson(θ)

Let $X_1,\dots,X_n$ be a random sample from Poisson(θ).

Then

$T(\mathbf{X}) = \sum_{i=1}^n X_i$ is minimal sufficient for θ. Now we show that T is also complete.

We know that $T(\mathbf{X}) = \sum_{i=1}^n X_i \sim \mathrm{Poisson}(\lambda = n\theta)$.

Consider any function $u(T)$. We have

$$E[u(T)] = \sum_{t=0}^{\infty} u(t)\,\frac{e^{-\lambda}\lambda^{t}}{t!} = e^{-\lambda}\sum_{t=0}^{\infty}\frac{u(t)}{t!}\,\lambda^{t}.$$

Because $e^{-\lambda} \neq 0$, setting $E[u(T)] = 0$ for all $\lambda > 0$ forces the power series $\sum_t [u(t)/t!]\,\lambda^t$ to vanish identically, so all the coefficients $u(t)/t!$ must be zero, which implies $u(T) = 0$ with probability 1.

Example. Let $X_1,\dots,X_n$ be i.i.d. from $\mathrm{Bin}(1,\theta)$. Show that $T = \sum_{i=1}^n X_i$ is a complete statistic. (*Please read our textbook for more examples; the following result on the regular exponential family is the most important.)
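A sketch of this exercise (my outline of the standard argument): $T \sim \mathrm{Bin}(n,\theta)$, so

$$E[g(T)] = \sum_{t=0}^{n} g(t)\binom{n}{t}\theta^{t}(1-\theta)^{\,n-t} = (1-\theta)^{n}\sum_{t=0}^{n} g(t)\binom{n}{t} r^{t}, \qquad r = \frac{\theta}{1-\theta}.$$

If $E[g(T)] = 0$ for all $\theta \in (0,1)$, the polynomial in $r$ on the right vanishes for all $r > 0$, so every coefficient $g(t)\binom{n}{t}$ is zero, i.e., $g(t) = 0$ for $t = 0,1,\dots,n$.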


10. Exponential Families & Complete Statistics

Theorem. Let $X_1,\dots,X_n$ be i.i.d. observations from the regular exponential family with pdf $f(\mathbf{x};\boldsymbol{\theta}) = c(\boldsymbol{\theta})\,h(\mathbf{x})\exp\big[\sum_{j=1}^k w_j(\boldsymbol{\theta})\,t_j(\mathbf{x})\big]$ and $\boldsymbol{\theta} = (\theta_1,\dots,\theta_k)$. Then

$$\boldsymbol{T}(\mathbf{X}) = \Big(\sum_{i=1}^n t_1(X_i),\ \dots,\ \sum_{i=1}^n t_k(X_i)\Big)$$

is complete if the parameter space $\{(w_1(\boldsymbol{\theta}),\dots,w_k(\boldsymbol{\theta})) : \boldsymbol{\theta} \in \Theta\}$ contains an open set in $\mathbb{R}^k$.

(This is only a sufficient condition, not a necessary one.)

Example. $X_1,\dots,X_n \sim N(\theta,1)$, $-\infty < \theta < \infty$.

Example. Poisson(θ); 0 < 휃 < ∞.

$$f(x;\theta) = \frac{1}{x!}\,\exp(-\theta)\,\exp[\ln(\theta)\cdot x], \qquad x = 0,1,2,\dots$$

Example. Normal: $N(\mu,\sigma^2)$, $\boldsymbol{\theta} = (\mu,\sigma^2)$ (both unknown), $-\infty < \mu < \infty$, $0 < \sigma^2 < \infty$.

$$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{1}{2\sigma^2}(x-\mu)^2\Big) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{1}{2\sigma^2}\big(x^2 - 2x\mu + \mu^2\big)\Big)$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\Big(-\frac{\mu^2}{2\sigma^2}\Big)\exp\Big(-\frac{1}{2\sigma^2}\big(x^2 - 2x\mu\big)\Big).$$

Here $(w_1, w_2) = \big(-\frac{1}{2\sigma^2},\ \frac{\mu}{\sigma^2}\big)$ ranges over the open half-plane $\{(a,b) : a < 0\}$ in $\mathbb{R}^2$, so the open-set condition holds and $\big(\sum_i X_i^2,\ \sum_i X_i\big)$ is complete.

Example. $X_1,\dots,X_n \sim N(\theta,1)$, $\theta = 1, 2$, is not complete.

Example. $X_1,\dots,X_n \sim N(\theta,1)$, $-\infty < \theta < \infty$, is complete.

Example. $\{\mathrm{Bin}(2,p):\ p = 1/2,\ 1/4\}$ is not complete.


Example. The family $\{\mathrm{Bin}(2,p):\ 0 < p < 1\}$ is complete.

Properties of Complete Statistics

(i) If $T$ is complete and $S = \psi(T)$, then $S$ is also complete.

(ii) If a statistic T is complete and sufficient, then any minimal sufficient statistic is complete.

(iii) Trivial (constant) statistics are complete for any family.

11. Theorem (Lehmann-Scheffe). (Complete Sufficient Statistic and the Best Estimator)

If T is complete and sufficient, then $U = h(T)$ is the best estimator (also called the UMVUE or MVUE) of its expectation.

Example. Poisson(θ)

Let $X_1,\dots,X_n$ be a random sample from Poisson(θ).

Then

$T(\mathbf{X}) = \sum_{i=1}^n X_i$ is complete and sufficient for θ. Since

$$U = \frac{T(\mathbf{X})}{n} = \frac{\sum_{i=1}^n X_i}{n} = \bar{X}$$

is an unbiased estimator of θ, by the Lehmann-Scheffe theorem we know that U is the best estimator (UMVUE/MVUE) of θ.
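A small simulation (my sketch; θ, n, and the seed are arbitrary) comparing $\bar{X}$ with another unbiased estimator of θ: for Poisson data the sample variance $S^2$ is also unbiased for θ (the mean and variance are both θ), but it is not a function of the complete sufficient statistic alone and has larger variance, as the Lehmann-Scheffe theorem predicts.

```python
# Sketch: compare two unbiased estimators of theta for Poisson data.
# xbar is a function of the complete sufficient statistic T = sum(X_i);
# the sample variance s2 is also unbiased, but has larger variance.
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 4.0, 30, 100_000
x = rng.poisson(theta, size=(reps, n))
xbar = x.mean(axis=1)
s2   = x.var(axis=1, ddof=1)
print(xbar.mean(), s2.mean())   # both approximately 4.0 (unbiased)
print(xbar.var(),  s2.var())    # about 0.13 vs about 1.2
```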


12. Theorem (Basu)

A complete sufficient statistic T for the parameter θ is independent of any ancillary statistic, that is, any statistic whose distribution does not depend on θ.

Example. Consider a random sample of size n from a normal distribution $N(\mu,\sigma^2)$, $\boldsymbol{\theta} = (\mu,\sigma^2)$.

ˆ  X  Consider the MLEs 풇풐풓 흁, 흈ퟐ   2 2 (Xi  X ) ˆ   n

It is easy to verify that $\bar{X}$ is a complete sufficient statistic for μ, for fixed values of σ². Also,

$$\frac{n\hat{\sigma}^2}{\sigma^2} \sim \chi^2(n-1),$$

which does not depend on μ. It follows from Basu's theorem that the two MLEs are independent of each other.
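A numerical illustration of this example (my sketch; the parameters and seed are arbitrary): across many simulated samples the empirical correlation between the two MLEs is essentially zero, consistent with the independence guaranteed by Basu's theorem (zero correlation is of course only a symptom of independence, not a proof).

```python
# Sketch: simulate many normal samples and check that the MLEs
# xbar and sigma2_hat = sum((x - xbar)^2) / n look independent.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 1.0, 2.0, 15, 200_000
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2hat = x.var(axis=1)                  # MLE uses divisor n (ddof=0)
print(np.corrcoef(xbar, s2hat)[0, 1])  # approximately 0
```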
