Lecture 7, Review
Data Reduction
We should think carefully about the following questions before "simplifying" the data:
• Is there any loss of information due to summarization?
• How can we compare the amount of information about θ in the original data X and in the reduced data T(X)?
• Is it sufficient to consider only the "reduced data" T?
1. Sufficient Statistics
A statistic T is called sufficient if the conditional distribution of X given T is free of θ (that is, the conditional is a completely known distribution).
Example. Toss a coin n times, where the probability of heads is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ?
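A quick check using the definition (a standard computation): for a binary sequence x = (x_1, …, x_n) with t = Σ x_i heads,
\[
P_\theta(\mathbf{X} = \mathbf{x} \mid T = t)
= \frac{P_\theta(\mathbf{X} = \mathbf{x})}{P_\theta(T = t)}
= \frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\,\theta^{t}(1-\theta)^{n-t}}
= \frac{1}{\binom{n}{t}},
\]
the uniform distribution over the $\binom{n}{t}$ sequences with exactly t heads. This is free of θ, so T is sufficient.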
Sufficiency Principle
If T is sufficient, the "extra information" carried by X is worthless as far as θ is concerned. It is then only natural to consider inference procedures that do not use this extra, irrelevant information. This leads to the Sufficiency Principle:
Any inference procedure should depend on the data only through sufficient statistics.
Definition: Sufficient Statistic (in terms of Conditional Probability)
(discrete case):
For any x and t, if the conditional pmf of X given T,
\[
P(\mathbf{X} = \mathbf{x} \mid T(\mathbf{X}) = t)
= \frac{P(\mathbf{X} = \mathbf{x},\, T(\mathbf{X}) = t)}{P(T(\mathbf{X}) = t)}
= \frac{P(\mathbf{X} = \mathbf{x})}{P(T(\mathbf{X}) = t)}
\]
(the second equality holds for t = T(x), since then the event {X = x} is contained in {T(X) = t}), does not depend on θ, then we say T(X) is a sufficient statistic for θ.
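As a sanity check, here is a minimal simulation sketch for the coin-tossing example; the values n = 4, t = 2, and the two θ values are illustrative choices, not from the lecture. It shows that the conditional distribution of the full sequence given T is the same under different θ:

import numpy as np
from collections import Counter

# Simulate Bernoulli(theta) tosses and condition on T = t_fixed.
# n, t_fixed, reps, and the theta values below are illustrative choices.
rng = np.random.default_rng(0)
n, t_fixed, reps = 4, 2, 200_000

for theta in (0.3, 0.7):
    X = (rng.random((reps, n)) < theta).astype(int)  # reps sequences of n tosses
    keep = X[X.sum(axis=1) == t_fixed]               # keep sequences with T = t_fixed
    counts = Counter(map(tuple, keep.tolist()))
    total = sum(counts.values())
    print(f"theta = {theta}:")
    for seq in sorted(counts):
        print(f"  P(X = {seq} | T = {t_fixed}) ~ {counts[seq] / total:.3f}")

# Under both theta values, each of the C(4,2) = 6 sequences with exactly
# two heads has conditional probability ~ 1/6, i.e. free of theta.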
Sufficient Statistic, the general definition (for both discrete and continuous variables):
Let the pdf (or pmf) of the data X be f(x; θ) and that of T be q(t; θ). If
\[
f(\mathbf{x}; \theta) \,/\, q(T(\mathbf{x}); \theta) \ \text{is free of } \theta \qquad (\ast)
\]
for all x and θ (the ratio may depend on x), then T is a sufficient statistic for θ.
Example (revisited): Toss a coin n times, where the probability of heads is an unknown parameter θ. Let T = the total number of heads. Is T sufficient for θ? (This time, check with the general definition (∗).)
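A worked answer via (∗), as a sketch: write X = (X_1, …, X_n) for the individual tosses (X_i ∈ {0, 1}) and T(x) = Σ x_i. Then, since T ~ Binomial(n, θ),
\[
f(\mathbf{x}; \theta) = \theta^{\sum x_i} (1-\theta)^{\,n - \sum x_i},
\qquad
q(t; \theta) = \binom{n}{t}\, \theta^{t} (1-\theta)^{\,n-t},
\]
so
\[
\frac{f(\mathbf{x}; \theta)}{q(T(\mathbf{x}); \theta)} = \frac{1}{\binom{n}{T(\mathbf{x})}},
\]
which depends on x but not on θ. By (∗), T is sufficient for θ.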