Lecture 7, Review Data Reduction 1. Sufficient Statistics
We should think carefully about the following questions before the "simplification" process:
• Is there any loss of information due to summarization?
• How do we compare the amount of information about $\theta$ in the original data $\mathbf{X}$ and in $T(\mathbf{X})$?
• Is it sufficient to consider only the "reduced data" $T$?

1. Sufficient Statistics

A statistic $T$ is called sufficient if the conditional distribution of $\mathbf{X}$ given $T$ is free of $\theta$ (that is, the conditional distribution is completely known).

Example. Toss a coin $n$ times, where the probability of a head is an unknown parameter $\theta$. Let $T$ = the total number of heads. Is $T$ sufficient for $\theta$?

Sufficiency Principle

If $T$ is sufficient, the "extra information" carried by $\mathbf{X}$ is worthless as far as $\theta$ is concerned. It is then only natural to consider inference procedures that do not use this extra, irrelevant information. This leads to the Sufficiency Principle: Any inference procedure should depend on the data only through sufficient statistics.

Definition: Sufficient Statistic (in terms of conditional probability) (discrete case): For any $\mathbf{x}$ and $t = T(\mathbf{x})$, if the conditional pmf of $\mathbf{X}$ given $T$,

$$P(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t) = \frac{P(\mathbf{X} = \mathbf{x},\, T(\mathbf{X}) = t)}{P(T(\mathbf{X}) = t)} = \frac{P(\mathbf{X} = \mathbf{x})}{P(T(\mathbf{X}) = t)},$$

does not depend on $\theta$, then we say $T(\mathbf{X})$ is a sufficient statistic for $\theta$. (The second equality holds because the event $\{\mathbf{X} = \mathbf{x}\}$ is contained in the event $\{T(\mathbf{X}) = T(\mathbf{x})\}$.)

Sufficient Statistic, the general definition (for both discrete and continuous variables): Let the pdf of the data $\mathbf{X}$ be $f(\mathbf{x}; \theta)$ and the pdf of $T$ be $q(t; \theta)$. If

$$\frac{f(\mathbf{x}; \theta)}{q(T(\mathbf{x}); \theta)} \text{ is free of } \theta \text{ (it may depend on } \mathbf{x}\text{)} \qquad (*)$$

for all $\mathbf{x}$ and $\theta$, then $T$ is a sufficient statistic for $\theta$.

Example: Toss a coin $n$ times, where the probability of a head is an unknown parameter $\theta$. Let $T$ = the total number of heads. Is $T$ sufficient for $\theta$?

$X_i$ i.i.d. Bernoulli: $f(x_i) = \theta^{x_i}(1-\theta)^{1-x_i}$, $x_i = 0, 1$, so

$$f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n) = \theta^{\sum x_i}(1-\theta)^{n - \sum x_i}.$$

$T = \sum X_i \sim B(n, \theta)$:

$$q(t; \theta) = q\left(\textstyle\sum x_i\right) = \binom{n}{\sum x_i}\, \theta^{\sum x_i}(1-\theta)^{n - \sum x_i}.$$

Thus

$$\frac{f(\mathbf{x}; \theta)}{q(T(\mathbf{x}); \theta)} = 1 \Big/ \binom{n}{t}$$

is free of $\theta$. So, by the definition, $\sum X_i$ is a sufficient statistic for $\theta$.

Example. $X_1, \ldots, X_n$ iid $N(\theta, 1)$. $T = \bar{X}$.
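The coin-tossing example above can also be checked by brute force. The following is a minimal sketch, not part of the original notes: it enumerates all outcomes for a small, arbitrarily chosen $n = 4$ and confirms that $P(\mathbf{X} = \mathbf{x} \mid T = t)$ equals $1/\binom{n}{t}$ for several values of $\theta$. The function names `joint_pmf` and `conditional_pmf` and the chosen values of $\theta$ are illustrative assumptions.

```python
from itertools import product
from math import comb, isclose


def joint_pmf(x, theta):
    """P(X = x) for iid Bernoulli(theta) tosses x = (x_1, ..., x_n)."""
    t = sum(x)
    return theta**t * (1 - theta) ** (len(x) - t)


def conditional_pmf(x, theta):
    """P(X = x | T = sum(x)), computed directly from the definition."""
    n, t = len(x), sum(x)
    # P(T = t): add the joint pmf over every outcome with the same total.
    p_t = sum(
        joint_pmf(y, theta)
        for y in product((0, 1), repeat=n)
        if sum(y) == t
    )
    return joint_pmf(x, theta) / p_t


n = 4  # illustrative sample size (an assumption, not from the notes)
for x in product((0, 1), repeat=n):
    values = [conditional_pmf(x, theta) for theta in (0.2, 0.5, 0.9)]
    # The conditional probability is the same for every theta: 1 / C(n, t).
    assert all(isclose(v, 1 / comb(n, sum(x))) for v in values)
```

The enumeration is exponential in $n$, so this is only meant as a sanity check on the algebra for small samples.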
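Remarks: The definition (*) is not always easy to apply.
• We need to guess the form of a sufficient statistic.
• We need to figure out the distribution of $T$.
How can we find a sufficient statistic?

2. (Neyman-Fisher) Factorization theorem.

$T$ is sufficient if and only if $f(\mathbf{x}; \theta)$ can be written as the product $g(T(\mathbf{x}); \theta)\, h(\mathbf{x})$, where the first factor depends on $\mathbf{x}$ only through $T(\mathbf{x})$ and the second factor is free of $\theta$.

Example. Binomial. iid bin$(1, \theta)$.

Solution 1: Bernoulli: $f(x_i) = \theta^{x_i}(1-\theta)^{1-x_i}$, $x_i = 0, 1$.

$$f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n) = \theta^{\sum x_i}(1-\theta)^{n - \sum x_i} = \left[\theta^{\sum x_i}(1-\theta)^{n - \sum x_i}\right] \cdot [1] = g\left(\textstyle\sum x_i, \theta\right) \cdot h(x_1, \ldots, x_n)$$

So, according to the factorization theorem, $T = \sum X_i$ is a sufficient statistic for $\theta$.

Solution 2: Write $t = \sum x_i$. Then

$$f(\mathbf{x}; \theta) = f(x_1, x_2, \ldots, x_n \mid \theta) = \begin{cases} \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}, & \text{if } x_i = 0, 1,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

$$= \begin{cases} \theta^{\sum x_i}(1-\theta)^{n - \sum x_i}, & \text{if } x_i = 0, 1,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

$$= \theta^{t}(1-\theta)^{n-t}\, h(x_1, x_2, \ldots, x_n) = g(t, \theta)\, h(x_1, \ldots, x_n),$$

where $g(t, \theta) = \theta^{t}(1-\theta)^{n-t}$ and

$$h(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } x_i = 0, 1,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise.} \end{cases}$$

Hence $T$ is a sufficient statistic for $\theta$.

Example. Exp($\lambda$). Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution with rate $\lambda$. Let $T = X_1 + X_2 + \cdots + X_n$, with observed value $t = \sum x_i$, and let $f$ be the joint density of $X_1, X_2, \ldots, X_n$.

$$f(\mathbf{x}; \lambda) = f(x_1, x_2, \ldots, x_n \mid \lambda) = \begin{cases} \prod_{i=1}^{n} \lambda e^{-\lambda x_i}, & \text{if } x_i > 0,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

$$= \begin{cases} \lambda^{n} e^{-\lambda \sum x_i}, & \text{if } x_i > 0,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

$$= \lambda^{n} e^{-\lambda t}\, h(x_1, \ldots, x_n) = g(t, \lambda)\, h(x_1, \ldots, x_n),$$

where $g(t, \lambda) = \lambda^{n} e^{-\lambda t}$ and

$$h(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } x_i > 0,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise.} \end{cases}$$

Hence $T$ is a sufficient statistic for $\lambda$.

Example. Normal. iid $N(\theta, 1)$. Please derive the sufficient statistic for $\theta$ by yourself.

When the range of $X$ depends on $\theta$, we should be more careful about the factorization and must use indicator functions explicitly.

Example. Uniform. iid $U(0, \theta)$.

Solution 1: Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution on $(0, \theta)$. Let $T = X_{(n)}$ and let $f$ be the joint density of $X_1, X_2, \ldots, X_n$.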
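Then

$$f(\mathbf{x}; \theta) = f(x_1, x_2, \ldots, x_n \mid \theta) = \begin{cases} \prod_{i=1}^{n} \frac{1}{\theta}, & \text{if } \theta > x_i > 0,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

$$= \begin{cases} \frac{1}{\theta^{n}}, & \text{if } \theta > x_i > 0,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

$$= \begin{cases} \frac{1}{\theta^{n}}, & \text{if } \theta > x_{(n)} \ge \cdots \ge x_{(1)} > 0 \\ 0, & \text{otherwise} \end{cases}$$

$$= g(t, \theta)\, h(x_1, \ldots, x_n),$$

where

$$g(t, \theta) = \begin{cases} \frac{1}{\theta^{n}}, & \text{if } \theta > t > 0 \\ 0, & \text{otherwise,} \end{cases}$$

and

$$h(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } x_{(1)} > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Hence $T$ is a sufficient statistic for $\theta$.

*** I personally prefer this approach because it is the most straightforward. Alternatively, one can use the indicator function and simplify the solution, as illustrated next.

Definition: Indicator function

$$I_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases}$$

Solution 2 (in terms of the indicator function): Uniform: $f(x) = \frac{1}{\theta}$, $x \in (0, \theta)$.

$$f(\mathbf{x}; \theta) = f(x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{\theta}, \quad x_i \in (0, \theta)\ \forall i$$

$$= \frac{1}{\theta^{n}} \prod_{i=1}^{n} I_{(0, \theta)}(x_i)$$

$$= \frac{1}{\theta^{n}}\, I_{(0, \theta)}\left(x_{(n)}\right) \cdot I_{(0, \infty)}\left(x_{(1)}\right)$$

$$= \left[\frac{1}{\theta^{n}}\, I_{(0, \theta)}\left(x_{(n)}\right)\right] \cdot \left[I_{(0, \infty)}\left(x_{(1)}\right)\right]$$

$$= g\left(x_{(n)}, \theta\right) \cdot h(x_1, \ldots, x_n)$$

So, by the factorization theorem, $X_{(n)}$ is a sufficient statistic for $\theta$.

Example: Please derive the sufficient statistics for $\theta$, given a random sample of size $n$ from $U(\theta, \theta + 1)$.

Solution:

1. Indicator function approach:

Uniform: $f(x) = 1$, $x \in (\theta, \theta + 1)$.

$$f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^{n} (1), \quad x_i \in (\theta, \theta + 1)\ \forall i$$

$$= \prod_{i=1}^{n} (1)\, I_{(\theta, \theta+1)}(x_i)$$

$$= I_{(\theta, \theta+1)}\left(x_{(1)}\right) \cdot I_{(\theta, \theta+1)}\left(x_{(n)}\right)$$

$$= \left[I_{(\theta, \theta+1)}\left(x_{(1)}\right) \cdot I_{(\theta, \theta+1)}\left(x_{(n)}\right)\right] \cdot [1]$$

$$= g\left(x_{(1)}, x_{(n)}, \theta\right) \cdot h(x_1, \ldots, x_n)$$

So $T = \left(X_{(1)}, X_{(n)}\right)$ is a sufficient statistic (SS) for $\theta$.

2. Without the indicator function:

$$f(x_1, x_2, \ldots, x_n \mid \theta) = \begin{cases} \prod_{i=1}^{n} 1, & \text{if } \theta + 1 > x_i > \theta,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

$$= \begin{cases} 1, & \text{if } \theta + 1 > x_i > \theta,\ i = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases}$$

$$= \begin{cases} 1, & \text{if } \theta + 1 > x_{(n)} \ge \cdots \ge x_{(1)} > \theta \\ 0, & \text{otherwise} \end{cases}$$

$$= g\left(x_{(1)}, x_{(n)}, \theta\right) h(x_1, \ldots, x_n),$$

where

$$g\left(x_{(1)}, x_{(n)}, \theta\right) = \begin{cases} 1, & \text{if } \theta + 1 > x_{(n)} \text{ and } x_{(1)} > \theta \\ 0, & \text{otherwise,} \end{cases}$$

and $h(x_1, \ldots, x_n) = 1$.

So $T = \left(X_{(1)}, X_{(n)}\right)$ is a SS for $\theta$.

Two-dimensional Examples.

Example. Normal. iid $N(\mu, \sigma^2)$. $\boldsymbol{\theta} = (\mu, \sigma^2)$ (both unknown).

Let $X_1, \ldots, X_n$ be a random sample from a normal distribution $N(\mu, \sigma^2)$. Let

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2,$$

and let $f$ be the joint density of $X_1, X_2, \ldots, X_n$.

$$f(\mathbf{x}; \boldsymbol{\theta}) = f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x_i - \mu)^2}{2\sigma^2}} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2}$$

Now

$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x} + \bar{x} - \mu)^2$$

$$= \sum_{i=1}^{n} (x_i - \bar{x})^2 + 2 \sum_{i=1}^{n} (x_i - \bar{x})(\bar{x} - \mu) + \sum_{i=1}^{n} (\bar{x} - \mu)^2$$

$$= (n-1)s^2 + 2(\bar{x} - \mu) \sum_{i=1}^{n} (x_i - \bar{x}) + n(\bar{x} - \mu)^2$$

$$= (n-1)s^2 + n(\bar{x} - \mu)^2,$$

where the cross term vanishes because $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.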
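Thus,

$$f(\mathbf{x}; \boldsymbol{\theta}) = f(x_1, x_2, \ldots, x_n \mid \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2}\left((n-1)s^2 + n(\bar{x} - \mu)^2\right)\right) = g(\bar{x}, s^2, \mu, \sigma^2)\, h(x_1, \ldots, x_n),$$

where

$$g(\bar{x}, s^2, \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left(-\frac{1}{2\sigma^2}\left((n-1)s^2 + n(\bar{x} - \mu)^2\right)\right),$$

and $h(x_1, \ldots, x_n) = 1$.

In this case we say $(\bar{X}, S^2)$ is sufficient for $(\mu, \sigma^2)$.

3. (Regular) Exponential Family

The density function of a regular exponential family is:

$$f(x; \boldsymbol{\theta}) = c(\boldsymbol{\theta})\, h(x) \exp\left[\sum_{j=1}^{k} w_j(\boldsymbol{\theta})\, t_j(x)\right], \qquad \boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$$

Example. Poisson($\theta$)

$$f(x; \theta) = \frac{1}{x!} \exp(-\theta) \exp[\ln(\theta) \cdot x], \qquad x = 0, 1, 2, \ldots$$

Example. Normal. $N(\mu, \sigma^2)$. $\boldsymbol{\theta} = (\mu, \sigma^2)$ (both unknown).

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right]$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x^2 - 2x\mu + \mu^2)\right]$$

$$= \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{\mu^2}{2\sigma^2}\right] \exp\left[-\frac{1}{2\sigma^2}(x^2 - 2x\mu)\right]$$

This has the exponential family form with $t_1(x) = x$, $w_1(\boldsymbol{\theta}) = \mu/\sigma^2$, $t_2(x) = x^2$, and $w_2(\boldsymbol{\theta}) = -1/(2\sigma^2)$.

4. Theorem (Exponential family & sufficient statistic).

Let $X_1, \ldots, X_n$ be a random sample from the regular exponential family. Then

$$\mathbf{T}(\mathbf{X}) = \left(\sum_{i=1}^{n} t_1(X_i), \ldots, \sum_{i=1}^{n} t_k(X_i)\right)$$

is sufficient for $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$.

Example. Poisson($\theta$). Let $X_1, \ldots, X_n$ be a random sample from Poisson($\theta$). Then

$$T(\mathbf{X}) = \sum_{i=1}^{n} X_i$$

is sufficient for $\theta$.

Example. Normal. $N(\mu, \sigma^2)$. $\boldsymbol{\theta} = (\mu, \sigma^2)$ (both unknown). Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then

$$\mathbf{T}(\mathbf{X}) = \left(\sum_{i=1}^{n} X_i,\ \sum_{i=1}^{n} X_i^2\right)$$

is sufficient for $\boldsymbol{\theta} = (\mu, \sigma^2)$.

Exercise. Apply the general exponential family result to all the standard families discussed above, such as binomial, Poisson, normal, exponential, and gamma.

A Non-Exponential Family Example. Discrete uniform. $P(X = x) = 1/\theta$, $x = 1, \ldots, \theta$, where $\theta$ is a positive integer.

Another Non-Exponential Example. $X_1, \ldots, X_n$ iid $U(0, \theta)$, $T = X_{(n)}$.

Universal Cases. $X_1, \ldots, X_n$ are iid with density $f$.
• The original data $X_1, \ldots, X_n$ are always sufficient for $\theta$. (They are trivial statistics, since they do not lead to any data reduction.)
• The order statistics $T = \left(X_{(1)}, \ldots, X_{(n)}\right)$ are always sufficient for $\theta$. (The dimension of the order statistics is $n$, the same as the dimension of the data. Still, this is a nontrivial reduction, as $n!$ different values of the data correspond to one value of $T$.)

5. Theorem (Rao-Blackwell)

Let $X_1, \ldots, X_n$ be a random sample from the population with pdf $f(\mathbf{x}; \theta)$. Let $T(\mathbf{X})$ be a sufficient statistic for $\theta$, and let $U(\mathbf{X})$ be any unbiased estimator of $\theta$.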
Let $U^*(\mathbf{X}) = E[U(\mathbf{X}) \mid T]$. Then
(1) $U^*(\mathbf{X})$ is an unbiased estimator of $\theta$,
(2) $U^*(\mathbf{X})$ is a function of $T$,
(3) $Var(U^*) \le Var(U)$ for every $\theta$, and $Var(U^*) < Var(U)$ for some $\theta$ unless $U^* = U$ with probability 1.
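As an illustration of the theorem (not part of the original notes), consider iid Bernoulli($\theta$) data with the crude unbiased estimator $U = X_1$ and the sufficient statistic $T = \sum X_i$. The sketch below, under these assumed choices, computes $U^* = E[X_1 \mid T]$ by exact enumeration and compares means and variances. The function names `pmf` and `rao_blackwell_check` and the values of `n` and `theta` are hypothetical, chosen only for the example.

```python
from itertools import product
from math import isclose


def pmf(x, theta):
    """Joint pmf of iid Bernoulli(theta) observations x = (x_1, ..., x_n)."""
    t = sum(x)
    return theta**t * (1 - theta) ** (len(x) - t)


def rao_blackwell_check(n, theta):
    """Exact moments of U = X_1 and U* = E[X_1 | T], where T = sum(X).

    Returns (E[U], Var(U), E[U*], Var(U*), cond_mean), where cond_mean[t]
    is E[X_1 | T = t]. Assumes 0 < theta < 1 so that P(T = t) > 0.
    """
    outcomes = list(product((0, 1), repeat=n))

    # E[X_1 | T = t] from the definition of conditional expectation.
    cond_mean = {}
    for t in range(n + 1):
        group = [x for x in outcomes if sum(x) == t]
        p_t = sum(pmf(x, theta) for x in group)
        cond_mean[t] = sum(x[0] * pmf(x, theta) for x in group) / p_t

    def moments(stat):
        """Mean and variance of stat(X) under the joint pmf."""
        m1 = sum(stat(x) * pmf(x, theta) for x in outcomes)
        m2 = sum(stat(x) ** 2 * pmf(x, theta) for x in outcomes)
        return m1, m2 - m1**2

    mean_u, var_u = moments(lambda x: x[0])
    mean_star, var_star = moments(lambda x: cond_mean[sum(x)])
    return mean_u, var_u, mean_star, var_star, cond_mean


n, theta = 5, 0.3  # illustrative values
mean_u, var_u, mean_star, var_star, cond_mean = rao_blackwell_check(n, theta)

# (1) both estimators are unbiased for theta
assert isclose(mean_u, theta) and isclose(mean_star, theta)
# (2) U* depends on the data only through T; here E[X_1 | T = t] = t / n
assert all(isclose(cond_mean[t], t / n, abs_tol=1e-12) for t in range(n + 1))
# (3) conditioning on T does not increase the variance
assert var_star <= var_u
```

In this example $U^*$ turns out to be the sample mean $T/n$, whose variance $\theta(1-\theta)/n$ is smaller than $Var(X_1) = \theta(1-\theta)$ whenever $n > 1$.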