
Econometrics II: Statistical Analysis

Prof. Dr. Alois Kneip
Statistische Abteilung
Institut für Finanzmarktökonomie und Statistik
Universität Bonn

Contents:
1. Empirical Distributions, Quantiles and Nonparametric Tests
2. Nonparametric Density Estimation
3. Nonparametric Regression
4. Bootstrap
5. Semiparametric Models

EconometricsII-Kneip 0–1

Some literature:
• Gibbons, J.D. (1971): Nonparametric Statistical Inference; McGraw-Hill
• Bowman, A.W. and Azzalini, A. (1997): Applied Smoothing Techniques for Data Analysis; Clarendon Press
• Li, Q. and Racine, J.S. (2007): Nonparametric Econometrics; Princeton University Press
• Greene, W.H. (2008): Econometric Analysis; Pearson Education
• Silverman, B.W. (1986): Density Estimation for Statistics and Data Analysis; Chapman and Hall
• Davison, A.C. and Hinkley, D.V. (2005): Bootstrap Methods and their Application; Cambridge University Press
• Yatchew, A. (2003): Semiparametric Regression for the Applied Econometrician; Cambridge University Press
• Hastie, T., Tibshirani, R. and Friedman, J. (2001): The Elements of Statistical Learning; Springer Verlag

1 Empirical distributions, quantiles and nonparametric tests

1.1 The empirical distribution function

The distribution of a real-valued random variable X can be completely described by its distribution function

  F(x) = P(X ≤ x) for all x ∈ ℝ.

It is well known that any distribution function possesses the following properties:
• F(x) is a monotonically increasing function of x.
• Any distribution function is right-continuous:
    lim_{Δ→0} F(x + |Δ|) = F(x)
  for any x ∈ ℝ. Furthermore,
    lim_{Δ→0} F(x − |Δ|) = F(x) − P(X = x).
• If F(x) is continuous, then there exists a density f such that ∫_{−∞}^{x} f(t) dt = F(x) for all x ∈ ℝ. If f(x) is continuous at x, then F′(x) = f(x).

Data: i.i.d.
random sample X_1, …, X_n.

For given data, the sample analogue of F is the so-called empirical distribution function, which is an important tool of statistical inference. Let I(·) denote the indicator function, i.e., I(x ≤ t) = 1 if x ≤ t, and I(x ≤ t) = 0 if x > t.

Empirical distribution function:
  F_n(x) = (1/n) Σ_{i=1}^n I(X_i ≤ x),
i.e., F_n(x) is the proportion of observations with X_i ≤ x.

Properties:
• 0 ≤ F_n(x) ≤ 1
• F_n(x) = 0 if x < X_(1), where X_(1) is the smallest observation
• F_n(x) = 1 if x ≥ X_(n), where X_(n) is the largest observation
• F_n is a monotonically increasing step function

Example
  x_1 = 5.20, x_2 = 4.80, x_3 = 5.40, x_4 = 4.60, x_5 = 6.10, x_6 = 5.40, x_7 = 5.80, x_8 = 5.50

[Figure: the corresponding empirical distribution function — a step function rising from 0 to 1 over the range 4.0 to 6.5]

For real-valued random variables the empirical distribution function is closely linked with the so-called "order statistics".
• Given a sample X_1, …, X_n, the corresponding order statistic is the n-tuple of the ordered observations (X_(1), …, X_(n)), where X_(1) ≤ X_(2) ≤ ⋯ ≤ X_(n).
• For r = 1, …, n, X_(r) is called the r-th order statistic.

Order statistics can only be determined for one-dimensional random variables. But an empirical distribution function can also be defined for random vectors. Let X be a d-dimensional random variable defined on ℝ^d, and let X_i = (X_{i1}, …, X_{id})^T denote an i.i.d. sample of random vectors from X. Then for any x = (x_1, …, x_d)^T
  F(x) = P(X_1 ≤ x_1, …, X_d ≤ x_d)
and
  F_n(x) = (1/n) Σ_{i=1}^n I(X_{i1} ≤ x_1, …, X_{id} ≤ x_d).

We can also define the so-called "empirical measure" P_n. For any A ⊂ ℝ^d
  P_n(A) = (1/n) Σ_{i=1}^n I(X_i ∈ A).
Note that P_n(A) simply quantifies the relative frequency of observations falling into A. As n → ∞,
  P_n(A) →_P P(A).
Note that P_n of course depends on the observations and thus is random. At the same time, however, it possesses all properties of a probability measure.
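The definition of F_n can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `ecdf` and the use of NumPy are our own choices, and the sample is the eight observations from the example above.

```python
import numpy as np

def ecdf(sample):
    """Return the empirical distribution function F_n of a sample.

    F_n(x) = (1/n) * #{i : X_i <= x}, a right-continuous step
    function; the returned callable evaluates F_n at a scalar x.
    """
    data = np.asarray(sample)
    n = len(data)
    return lambda x: np.sum(data <= x) / n

# The eight observations from the example above
sample = [5.20, 4.80, 5.40, 4.60, 6.10, 5.40, 5.80, 5.50]
F_n = ecdf(sample)

# F_n is 0 below the smallest observation X_(1) = 4.60, jumps only
# at observed values, and equals 1 at the largest observation:
print(F_n(4.00))   # 0.0
print(F_n(5.40))   # 0.625  (5 of 8 observations are <= 5.40)
print(F_n(6.10))   # 1.0
```

Since the two observations equal to 5.40 coincide, F_n jumps by 2/8 there — the step height at a point x equals the proportion of observations tied at x.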
When knowing F_n we can uniquely reconstruct all observed values {X_1, …, X_n}. The only information lost is the exact succession of these values. For i.i.d. samples this information is completely irrelevant for all statistical purposes. All important statistics (and estimators) can thus be written as functions of F_n (or P_n).

In particular, in the theoretical literature expectations and corresponding sample averages are often represented in the following form: for a continuous function g,
  E(g(X)) = ∫ g(x) dP = ∫ g(x) dF(x)
and
  (1/n) Σ_{i=1}^n g(X_i) = ∫ g(x) dP_n = ∫ g(x) dF_n(x).

Here, ∫ g(x) dF(x) refers to the Stieltjes integral. This is a generalization of the well-known Riemann integral. Let d = 1, and consider a partition a = x_0 < x_1 < ⋯ < x_m = b of an interval [a, b]. Then
  ∫_a^b g(x) dF(x) = lim_{m→∞, sup_j |x_j − x_{j−1}| → 0} Σ_{j=1}^m g(ξ_j) (F(x_j) − F(x_{j−1}))
if the limit exists and is independent of the specific choices of ξ_j ∈ [x_{j−1}, x_j]. It can be shown that for any continuous function g and any distribution function F the corresponding Stieltjes integral exists for any finite interval [a, b]. ∫_{−∞}^{∞} g(x) dF(x) ≡ ∫ g(x) dF(x) corresponds to the limit (if existent) as a → −∞, b → ∞.

1.2 Theoretical properties of empirical distribution functions

In the following we will assume that X is a real-valued random variable (d = 1).

Theorem: For every x ∈ ℝ,
  n F_n(x) ∼ B(n, F(x)),
i.e., n F_n(x) has a binomial distribution with parameters n and F(x). The probability distribution of F_n(x) is thus given by
  P(F_n(x) = m/n) = C(n, m) F(x)^m (1 − F(x))^{n−m},  m = 0, 1, …, n,
where C(n, m) denotes the binomial coefficient.

Some consequences:
• E(F_n(x)) = F(x), i.e., F_n(x) is an unbiased estimator of F(x).
• Var(F_n(x)) = (1/n) F(x)(1 − F(x)), i.e., as n increases the variance of F_n(x) decreases.
• F_n(x) is a (weakly) consistent estimator of F(x).

Theorem of Glivenko-Cantelli:
  P( lim_{n→∞} sup_{x∈ℝ} |F_n(x) − F(x)| = 0 ) = 1

The distribution of Y = F(X)

Note: there is an important difference between F(x) and F(X):
• For any fixed x ∈ ℝ the corresponding value F(x) is also a fixed number, F(x) = P(X ≤ x).
• F(X) is a random variable, where F denotes the distribution function of X.

Theorem: Let X be a random variable with a continuous distribution function F. Then Y = F(X) has a (continuous) uniform distribution on the interval (0, 1), i.e.,
  F(X) ∼ U(0, 1),  P(a ≤ F(X) ≤ b) = b − a for all 0 ≤ a < b ≤ 1.

Consequence: If F is continuous, then
• F(X_1), …, F(X_n) can be interpreted as an i.i.d. random sample of observations from a U(0, 1) distribution;
• (F(X_(1)), …, F(X_(n))) is the corresponding order statistic.

1.3 Quantiles

Quantiles are an essential tool for statistical analysis. They provide important information for characterizing location and dispersion of a distribution. In statistical inference they play a central role in measuring risk. Let X denote a real-valued random variable with distribution function F.

Quantiles: For 0 < τ < 1, any q_τ ∈ ℝ satisfying
  F(q_τ) = P(X ≤ q_τ) ≥ τ  and  P(X ≥ q_τ) ≥ 1 − τ
is called τ-th quantile (or simply τ-quantile) of X.

Note that quantiles are not necessarily unique: for a given τ, there may exist an interval of possible values fulfilling the above conditions. But if X is a continuous random variable with density f, then q_τ is unique if f(q_τ) > 0 (then F(q_τ) = τ and F(q) ≠ τ for all q ≠ q_τ).

In the statistical literature most work on quantiles is based on the so-called quantile function, which is defined as an "inverse" distribution function. For 0 < τ < 1 the quantile function is defined by
  Q(τ) := inf{ y | F(y) ≥ τ }

• For any 0 < τ < 1 the value q_τ = Q(τ) is a τ-quantile satisfying the above conditions. If there is an interval of possible values for q_τ, Q(τ) selects the smallest possible value.
• Like the distribution function, the quantile function provides a complete characterization of the random variable X.
• If the distribution function F(x) is strictly monotonically increasing, then Q(τ) is the inverse of F, Q(τ) = F^{−1}(τ).

Important quantiles:
• µ_med = Q(0.5) is the median of X (with probability at least 0.5 an observation is smaller than or equal to Q(0.5), and with probability at least 0.5 an observation is larger than or equal to Q(0.5)).
• Q(0.25) and Q(0.75) are called lower and upper quartile, respectively. Instead of the standard deviation, the inter-quartile range
    IQR = Q(0.75) − Q(0.25)
  is frequently used as a measure of statistical dispersion. Note that P(X ∈ [Q(0.25), Q(0.75)]) ≈ 0.5.
• Q(0.1), Q(0.2), …, Q(0.9) are the "deciles" of X.
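The empirical analogue of the quantile function applies Q(τ) = inf{ y | F_n(y) ≥ τ } to the sample: since F_n jumps only at observed values, Q(τ) is the order statistic X_(r) with r = ⌈nτ⌉. A minimal sketch, again using the eight observations from the earlier example (the function name `empirical_quantile` is ours):

```python
import numpy as np

def empirical_quantile(sample, tau):
    """Empirical quantile: Q(tau) = inf{ y : F_n(y) >= tau }.

    For a sorted sample this is the order statistic X_(r) with
    r = ceil(n * tau), i.e. the smallest observation y such that
    at least a fraction tau of the data is <= y.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie in (0, 1)")
    x = np.sort(np.asarray(sample))
    n = len(x)
    r = int(np.ceil(n * tau))   # rank of the order statistic
    return x[r - 1]             # 0-based indexing

sample = [5.20, 4.80, 5.40, 4.60, 6.10, 5.40, 5.80, 5.50]

median = empirical_quantile(sample, 0.5)   # X_(4) = 5.40
q1 = empirical_quantile(sample, 0.25)      # X_(2) = 4.80
q3 = empirical_quantile(sample, 0.75)      # X_(6) = 5.50
iqr = q3 - q1
```

Note that this "inverted-CDF" convention need not agree with `numpy.quantile`, whose default method interpolates linearly between order statistics; the inf-based definition above always returns an actual observation.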