Brief Review on Estimation Theory
K. Abed-Meraim
ENST Paris, Signal and Image Processing Dept.
[email protected]

This presentation is essentially based on the course 'BASTA' by E. Moulines.

Oct. 2005

Presentation Outline

• Basic concepts and preliminaries
• Parameter estimation
• Asymptotic theory
• Estimation methods (ML, moment, ...)

Basic Concepts and Preliminaries

Definition and applications

Statistics is the set of methods that allow the analysis of (and information extraction from) a given set of observations (data). Application examples include:

• The determination of production quality by a probing study.
• The measurement of the visibility of a web site (e.g. number of pages read, visiting strategies, ...).
• The modeling of packet flow at a high-speed network gateway.
• The discrimination of important e-mails from spam.
• The prediction of missing data for the restoration of old recordings.
• The estimation and tracking of a mobile position in a cellular system.
• etc.

Some history...

One can distinguish three phases of development:

• Beginning of the 19th century: the first data analysis experiments (Prony, Laplace) and the first canonical methods in statistics (Gauss, Bayes).
• First part of the 20th century (until the 1960s): the basis of statistical inference theory was established (Pearson, Fisher, Neyman, Cramer, ...). However, due to the lack of powerful computing machines, the applications and the impact of statistics were quite limited.
• With the fast development of computers and databases, statistics has seen a huge expansion, and its applications now cover a very large number of domains, in industry as well as in research labs.
Statistical model

• In statistics, the observations x = (x1, x2, ..., xn) are seen as a realization of a random vector (process) X^n = (X1, X2, ..., Xn) whose law P is partially known.
• The observation model translates the a priori knowledge we have on the data.
• The nature and complexity of the model vary considerably from one application to another...

Parametric model

• Parametric model: a set of probability laws (Pθ, θ ∈ Θ) indexed by a scalar or vector parameter θ ∈ IR^d.
• Observation: the observation X is a random variable of distribution Pθ, where the parameter θ is unknown.
• The probability of a given event is a function of θ, and hence we write: Pθ(A), Eθ(X), ...

Objectives

When considering parametric models, the objectives are often:

• Estimation: find an approximate value of the parameter θ.
• Testing: answer questions of the following type... Can we state, given the observation set, that the proportion of defective objects θ is smaller than 0.01 with a probability higher than 99%?

Example: Gaussian model

• A random variable X is said to be standard Gaussian if it admits the p.d.f.

    Φ(x) = (1/√(2π)) exp(−x²/2),

  which is denoted X = N(0, 1).
• X is a Gaussian random variable of mean µ and variance σ² if X = µ + σX0, where X0 is standard Gaussian.
• Gaussian model: the observations (X1, ..., Xn) are n Gaussian i.i.d. random variables of mean µ and variance σ² (i.e. θ = (µ, σ)).

Statistic's concept

• To build statistical estimators or tests, one has to evaluate certain functions of the observations: Tn = T(X1, ..., Xn). Such a function is called a statistic.
• It is crucial that the defined statistic is not a function of the parameter θ or of the exact p.d.f. of the observations.
• A statistic is a random variable whose distribution can be computed from that of the observations.
• Note that a statistic is a random variable, but not every random variable is a statistic.

Examples

• Empirical mean: Tn = (1/n) Σ_{i=1}^n Xi.
• Median value: Tn = med(X1, ..., Xn).
• Min + Max: Tn = 0.5 (max(X1, ..., Xn) + min(X1, ..., Xn)).
• Variance: Tn = (1/n) Σ_{i=1}^n Xi².

Parametric Estimation

Parametric versus non-parametric

• Non-parametric: the p.d.f. f of X is unknown but belongs to a known function space F, e.g.

    F = {f : IR → IR+, twice differentiable with ‖f''‖ ≤ M}.

  This leads to difficult estimation problems!
• Semi-parametric: consider for example a set of observations {(Xi, zi)} following the regression model Xi = g(θ, zi) + εi, where g is a known function and the εi are i.i.d. random variables. This model is said to be semi-parametric if the p.d.f. of εi is completely unknown.
• Parametric: the previous model is parametric if the p.d.f. of εi is known (up to certain unknown parameters).

Parametric estimation

• Let X = (X1, X2, ..., Xn) be an observation of a statistical model (Pθ, θ ∈ Θ).
• An estimator is a function of the observation

    θ̂n(X) = θ̂n(X1, X2, ..., Xn)

  used to infer (approximate) the value of the unknown parameter.

Example: Estimation of the mean value

• Let (X1, X2, ..., Xn) be an n-sample i.i.d. observation given by Xi = θ + Xi0, where θ ∈ IR and the Xi0 are i.i.d. zero-mean random variables.
• Mean estimators:

  1- Empirical mean: θ̂n = (1/n) Σ_{i=1}^n Xi.
  2- Median value: θ̂n = med(X1, ..., Xn).
  3- (Min + Max)/2: θ̂n = (max(X1, ..., Xn) + min(X1, ..., Xn))/2.

Estimator

• A statistic is referred to as an 'estimator' to indicate that it is used to 'estimate' a given parameter.
• Estimation theory allows us to characterize 'good estimators'.
• For that, one needs 'performance measures' of a given estimator.
• Different performance measures exist, and they may sometimes lead to different conclusions: i.e. an estimator might be 'good' for a first criterion and 'bad' for another.

Bias

• An estimator T of the parameter θ is said to be unbiased if θ is the mean value of the distribution of T (θ being the exact value of the parameter), i.e. Eθ(T) = θ.
• Otherwise, the estimator T is said to be 'biased', and the difference b(T, θ) = Eθ(T) − θ represents the estimation bias.

Example: variance estimation

• Let (X1, ..., Xn) be an i.i.d. observation of p.d.f. pθ(x) = (1/σ) p((x − µ)/σ), with θ = (µ, σ²), where p satisfies ∫ x² p(x) dx = 1 and ∫ x p(x) dx = 0.
• Sn = (1/(n−1)) Σ_{i=1}^n (Xi − X̄)² is an unbiased estimator of σ².
• Vn = (1/n) Σ_{i=1}^n (Xi − X̄)² is a biased estimator of σ², whose bias is given by b = −σ²/n. It is however said to be asymptotically unbiased, as the bias goes to zero when n tends to infinity.

Unbiased estimator

• Instead of θ, one might be interested in a function of this parameter... For example, in the previous example, the objective can be to estimate σ = √θ2 instead of σ² = θ2. When θ is a parameter vector, one might in particular be interested in estimating only a sub-vector of θ.
• T is an unbiased estimator of g(θ) if Eθ(T) = g(θ) for all θ ∈ Θ.
• Otherwise, b(T, θ, g) = Eθ(T) − g(θ) represents the bias of this estimator.
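The bias of Vn versus the unbiasedness of Sn can be checked numerically. The sketch below is illustrative only (not from the slides): it assumes plain standard-library Python and Gaussian data, although any p satisfying the stated moment conditions would do, and it estimates Eθ(Sn) and Eθ(Vn) by averaging over many simulated samples.

```python
import random
import statistics

random.seed(0)
n, trials = 10, 100_000
sigma2 = 4.0  # true variance (assumed value for the simulation)

sn_vals, vn_vals = [], []
for _ in range(trials):
    # one i.i.d. sample of size n, zero mean, variance sigma2
    x = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(x) / n
    ss = sum((xi - xbar) ** 2 for xi in x)
    sn_vals.append(ss / (n - 1))  # Sn: unbiased estimator
    vn_vals.append(ss / n)        # Vn: biased estimator

print(statistics.mean(sn_vals))  # close to sigma2 = 4.0
print(statistics.mean(vn_vals))  # close to sigma2*(n-1)/n = 3.6
```

With n = 10, the average of Vn comes out near σ²(n−1)/n = 3.6, matching the bias b = −σ²/n, while the average of Sn stays near σ² = 4.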
Bias and transforms

• Non-linear transforms of unbiased estimators are not necessarily unbiased: i.e. if T is an unbiased estimator of θ, g(T) is not in general an unbiased estimate of g(θ).
• For example, if T is an unbiased estimate of θ, then T² is not an unbiased estimate of θ². Indeed, we have

    Eθ(T²) = varθ(T) + (Eθ(T))² = varθ(T) + θ².

Mean square error

Another pertinent performance measure is the mean square error (MSE). The MSE measures the dispersion of the estimator around the 'true' value of the parameter:

    MSE(T, θ) = R(T, θ) = E[(T(X) − θ)²].

The MSE can be decomposed into:

    MSE(T, θ) = (b(T, θ))² + varθ(T).

Example: MSE of the empirical mean

• (X1, ..., Xn): n-sample i.i.d. observation of law N(µ, σ²).
• Empirical mean: X̄ = (1/n) Σ_{i=1}^n Xi.
• X̄ is an unbiased estimator, and

    var(X̄) = σ²/n.

Estimator's comparison: Risk measure

• We have previously considered the quadratic risk (loss) function: l(θ, α) = (θ − α)².
• Other risk (loss) functions are possible and sometimes more suitable:

  1- Absolute-value error: l(θ, α) = |θ − α|,
  2- Truncated quadratic risk function: l(θ, α) = min((θ − α)², d²).
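The three mean estimators of the earlier example can be compared under the quadratic risk by estimating their MSEs empirically. A minimal simulation sketch (illustrative only; it assumes standard-library Python and i.i.d. N(θ, 1) noise, with θ = 2 chosen arbitrarily):

```python
import random
import statistics

random.seed(1)
theta, n, trials = 2.0, 25, 20_000

def mse(estimates, true_value):
    # empirical MSE = average of (estimate - true value)^2
    return statistics.mean((e - true_value) ** 2 for e in estimates)

means, medians, midranges = [], [], []
for _ in range(trials):
    x = [theta + random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sum(x) / n)                    # empirical mean
    medians.append(statistics.median(x))        # median value
    midranges.append(0.5 * (max(x) + min(x)))   # (min + max)/2

print(mse(means, theta))      # close to sigma^2/n = 0.04
print(mse(medians, theta))    # larger, roughly pi/(2n) for Gaussian noise
print(mse(midranges, theta))  # largest of the three under Gaussian noise
```

Under Gaussian noise the empirical mean attains the smallest MSE (≈ σ²/n); with heavier-tailed noise the median can win instead, which illustrates why different risk criteria and noise models can rank estimators differently.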