Lecture 4: Estimating a Univariate Distribution

In this lecture we consider estimating the CDF and PDF of a continuous random variable based on a simple random sample from it. Suppose the data we have is a sample $Y_1, Y_2, \ldots, Y_m$ that is independently and identically distributed from the distribution $F$.

Estimating the Cumulative Distribution Function of Y

The natural estimator of the CDF is the empirical cumulative distribution function, denoted by $F_m(y)$: the proportion of the sample data that do not exceed the value $y$. This function is also called the empirical distribution function and the sample distribution function. Mathematically,

$$F_m(y) = \frac{1}{m} \sum_{j=1}^{m} I(Y_j \le y),$$

where

$$I(S) = \begin{cases} 1 & \text{if the event } S \text{ is true} \\ 0 & \text{otherwise} \end{cases}$$

is the indicator function. Note that $F_m(y)$ is a step function of $y$ with jumps of $1/m$ at the ordered values of the sample data.

How well does $F_m(y)$ estimate $F(y)$?

For a fixed $y$, $F_m(y)$ is itself a random variable. The exact distribution of $m F_m(y)$ is binomial with $m$ trials and probability of success $F(y)$. This leads to:

Theorem. For each value of $y$, $F_m(y)$ is a consistent estimator of $F(y)$. The sequence $F_m(y)$, $m = 1, 2, \ldots$, is also asymptotically normal:

$$F_m(y) \sim AN\!\left( F(y), \; \frac{F(y)\,(1 - F(y))}{m} \right), \quad -\infty < y < \infty,$$

as $m \to \infty$.

Some important notation

The Gaussian or Normal Distribution. The notation $N(\mu, \sigma^2)$ is used to denote a normal (or Gaussian) distribution with mean $\mu$ and variance $\sigma^2$. The standard normal is $N(0, 1)$ and the corresponding CDF is often denoted by $\Phi(x)$, $-\infty < x < \infty$.

Asymptotic Convergence of Distributions. Consider a sequence of random variables $X_1, X_2, \ldots$ where the $m$th random variable has CDF $F_m(x)$. Suppose $X$ has CDF $H(x)$.

We say that the $X_m$ converge in distribution to $X$ if, for each continuity point $x$ of $H(x)$,

$$\lim_{m \to \infty} F_m(x) = H(x).$$

This concept measures a sense in which the $X_m$ are "cross-sectionally" close to $X$ when the sample size is large. It does not focus on how close a particular sequence of $X_m$ is to $X$, only the aggregate.

We say that the $X_m$ converge with probability one to $X$ if

$$P\!\left( \lim_{m \to \infty} X_m = X \right) = 1.$$

This concept measures a sense in which the $X_m$ are "longitudinally" close to $X$ when the sample size is large. If a sequence converges with probability one, then it also converges in distribution.

We say the sequence is asymptotically normal with "mean" $\mu_m$ and "variance" $\sigma_m^2 > 0$ if

$$\frac{X_m - \mu_m}{\sigma_m}$$

converges in distribution to a standard normal distribution. In this situation $H(x) = \Phi(x)$, which is continuous for each $-\infty < x < \infty$. For additional information see Kelly (1994) or Serfling (1980).

Notation for the convergence properties of sequences

1. Deterministic sequences: Let $x_n$ and $y_n$ be two real-valued deterministic (nonrandom) sequences. Then, as $n \to \infty$,
   (a) $x_n = O(y_n)$ if and only if $\limsup_{n \to \infty} |x_n / y_n| < \infty$,
   (b) $x_n = o(y_n)$ if and only if $\lim_{n \to \infty} |x_n / y_n| = 0$.
2. Random sequences: Let $X_n$ and $Y_n$ be two real-valued random sequences. Then, as $n \to \infty$,
   (a) $X_n = O_p(Y_n)$ if and only if for all $\epsilon > 0$ there exist $\delta_\epsilon$ and $N_\epsilon$ such that $P(|X_n / Y_n| > \delta_\epsilon) < \epsilon$ for all $n > N_\epsilon$,
   (b) $X_n = o_p(Y_n)$ if and only if for all $\epsilon > 0$, $\lim_{n \to \infty} P(|X_n / Y_n| > \epsilon) = 0$.

The consistency result above states that there is convergence for each individual value of $y$.
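To make the pointwise result concrete, the following is a minimal sketch, assuming NumPy and SciPy are available. The choice of the standard normal as the true $F$, the evaluation point $y_0$, and the sample and replication sizes are all arbitrary illustrative choices, not part of the lecture. It checks that replicated values of $F_m(y_0)$ have mean close to $F(y_0)$ and variance close to $F(y_0)(1 - F(y_0))/m$, as the theorem predicts.

```python
# A minimal sketch, assuming NumPy and SciPy.  The true F (standard
# normal), the point y0, and the sizes are illustrative choices.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(594)

def ecdf(sample, y):
    """F_m(y): the proportion of the sample not exceeding y."""
    return np.mean(sample <= y)

m = 500                       # sample size
y0 = 0.5                      # fixed evaluation point
p = norm.cdf(y0)              # true value F(y0)

# m * F_m(y0) is Binomial(m, F(y0)), so across replications F_m(y0)
# should be approximately N(F(y0), F(y0)(1 - F(y0)) / m).
reps = np.array([ecdf(rng.standard_normal(m), y0) for _ in range(2000)])

print(f"true F(y0)      = {p:.4f}")
print(f"mean of F_m(y0) = {reps.mean():.4f}")
print(f"empirical var   = {reps.var():.6f}")
print(f"theoretical var = {p * (1 - p) / m:.6f}")
```

Increasing $m$ shrinks both the empirical and theoretical variances in step, which is the content of the consistency statement for a fixed $y$.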
One commonly used measure of the global closeness of $F_m(y)$ to $F(y)$ is the Kolmogorov-Smirnov distance

$$D_m = \sup_{-\infty < y < \infty} |F_m(y) - F(y)|.$$

The convergence of $F_m(y)$ to $F(y)$ occurs simultaneously for all $y$ in the sense that $D_m$ converges to zero with probability one, that is, $P[\lim_{m \to \infty} D_m = 0] = 1$. In this sense, for large sample sizes the deviation between $F_m(y)$ and $F(y)$ will be small for all $y$. See Serfling (1980).

Estimation of the Quantile Function

Recall:

$$Q(p) = F^{-1}(p) = \inf\{\, y : F(y) \ge p \,\}.$$

The natural estimator of $Q(p)$ is the $p$th quantile of the sample distribution function $F_m(y)$, defined by

$$Q_m(p) = \inf\{\, y : F_m(y) \ge p \,\}.$$

The properties of $Q_m(p)$ as an estimator of $Q(p)$ are similar to those of $F_m(y)$ as an estimator of $F(y)$.

Theorem. Assume that $0 < p < 1$ and suppose $F(y)$ possesses a density $f(y)$ in a neighborhood of $Q(p)$, and $f(y)$ is positive and continuous at $Q(p)$. Then, as $m \to \infty$,

$$Q_m(p) \sim AN\!\left( Q(p), \; \frac{p(1 - p)}{m f^2(Q(p))} \right).$$

These estimators have the drawback that they are step functions, while $F(y)$ and $Q(p)$ are usually continuous and much smoother. This suggests that alternative estimators exist that may better reflect the properties of $F(y)$. In particular, if we had a smooth estimator $\hat{f}(y)$ of $f(y)$, we could estimate $F(y)$ by $\int_{-\infty}^{y} \hat{f}(x)\,dx$.
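The quantities above are all easy to compute directly. Below is a minimal sketch, assuming NumPy and SciPy; the helpers `ks_distance`, `empirical_quantile`, and `smooth_cdf` are our own illustrative names, the standard normal plays the role of the known $F$, and the kernel bandwidth is an arbitrary placeholder rather than a recommendation. The smooth CDF estimate integrates a Gaussian kernel density estimate, which works out to an average of normal CDFs centered at the data points.

```python
# A minimal sketch, assuming NumPy and SciPy.  Helper names, the true F
# (standard normal), and the bandwidth h are illustrative choices.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(594)

def ks_distance(sample, cdf):
    """D_m = sup_y |F_m(y) - F(y)|; the sup is attained at a jump of F_m,
    so compare F just before and just after each ordered observation."""
    y = np.sort(sample)
    m = len(y)
    f = cdf(y)
    i = np.arange(1, m + 1)
    return max((i / m - f).max(), (f - (i - 1) / m).max())

def empirical_quantile(sample, p):
    """Q_m(p) = inf{ y : F_m(y) >= p }: the ceil(m*p)-th order statistic."""
    y = np.sort(sample)
    k = int(np.ceil(len(y) * p))    # smallest k with k/m >= p
    return y[k - 1]

def smooth_cdf(sample, y, h=0.3):
    """Smooth CDF estimate: the integral of a Gaussian kernel density
    estimate, i.e. an average of normal CDFs centered at the data."""
    return np.mean(norm.cdf((y - sample[:, None]) / h), axis=0)

for m in (50, 500, 5000):
    s = rng.standard_normal(m)
    print(f"m = {m:5d}:  D_m = {ks_distance(s, norm.cdf):.4f}, "
          f"Q_m(0.5) = {empirical_quantile(s, 0.5):+.4f}")

grid = np.linspace(-3, 3, 7)
print("smooth CDF estimate on a grid:", np.round(smooth_cdf(s, grid), 3))
```

Running the loop shows $D_m$ shrinking toward zero (the with-probability-one convergence above) and the sample median approaching $Q(0.5) = 0$, while the smooth estimate, unlike the step function $F_m$, is continuous in $y$.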