Econometrics II: Statistical Analysis

Prof. Dr. Alois Kneip Statistische Abteilung Institut für Finanzmarktökonomie und Statistik Universität Bonn

Contents:
1. Empirical Distributions, Quantiles and Nonparametric Tests
2. Nonparametric Density Estimation
3. Nonparametric Regression
4. Bootstrap
5. Semiparametric Models

EconometricsII-Kneip 0–1

Some literature:
• Gibbons, J.D. (1971): Nonparametric Statistical Inference; McGraw-Hill, Inc.
• Bowman, A.W. and Azzalini, A. (1997): Applied Smoothing Techniques for Data Analysis; Clarendon Press
• Li and Racine (2007): Nonparametric Econometrics; Princeton University Press
• Greene, W.H. (2008): Econometric Analysis; Pearson Education
• Silverman, B.W. (1986): Density Estimation for Statistics and Data Analysis; Chapman and Hall
• Davison, A.C. and Hinkley, D.V. (2005): Bootstrap Methods and their Application; Cambridge University Press
• Yatchew, A. (2003): Semiparametric Regression for the Applied Econometrician; Cambridge University Press
• Hastie, T., Tibshirani, R. and Friedman, J. (2001): The Elements of Statistical Learning; Springer Verlag

1 Empirical distributions, quantiles and nonparametric tests

1.1 The empirical distribution function

The distribution of a real-valued random variable X can be completely described by its distribution function

F (x) = P (X ≤ x) for all x ∈ IR.

It is well known that any distribution function possesses the following properties:
• F(x) is a monotonically increasing (i.e., non-decreasing) function of x.
• Any distribution function is right-continuous: $\lim_{\Delta\to 0} F(x+|\Delta|) = F(x)$ for any x ∈ IR. Furthermore, $\lim_{\Delta\to 0} F(x-|\Delta|) = F(x) - P(X=x)$.
• If F(x) is absolutely continuous, then there exists a density f such that $\int_{-\infty}^{x} f(t)\,dt = F(x)$ for all x ∈ IR. If f is continuous at x, then F′(x) = f(x).

Data: i.i.d. random sample X1,...,Xn.

For given data, the sample analogue of F is the so-called empirical distribution function, which is an important tool of statistical inference. Let I(·) denote the indicator function, i.e., I(x ≤ t) = 1 if x ≤ t, and I(x ≤ t) = 0 if x > t.

Empirical distribution function:

$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x)$,

i.e., Fn(x) is the proportion of observations with Xi ≤ x.

Properties:

• 0 ≤ Fn(x) ≤ 1

• Fn(x) = 0 if x < X(1), where X(1) is the smallest observation

• Fn(x) = 1 if x ≥ X(n), where X(n) is the largest observation

• Fn is a monotonically increasing step function

Example:

x1     x2     x3     x4     x5     x6     x7     x8
5.20   4.80   5.40   4.60   6.10   5.40   5.80   5.50

Corresponding empirical distribution function:

[Figure: step plot of the empirical distribution function Fn(x) of the example data; x from 4.0 to 6.5, Fn(x) from 0.0 to 1.0]
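As a minimal sketch (Python standard library only; the function name `ecdf` is our own), Fn can be evaluated by sorting the sample once and counting, via binary search, the observations Xi ≤ x. Applied to the eight observations above, the jump of size 2/8 at 5.40 reflects the two tied values:

```python
from bisect import bisect_right

def ecdf(sample):
    """Return the empirical distribution function F_n of `sample` as a callable."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # bisect_right returns the number of observations X_i <= x
        return bisect_right(xs, x) / n

    return F_n

data = [5.20, 4.80, 5.40, 4.60, 6.10, 5.40, 5.80, 5.50]
F_n = ecdf(data)
for x in (4.0, 4.6, 5.0, 5.4, 6.1, 6.5):
    print(x, F_n(x))
```

Sorting costs O(n log n) once; each evaluation then costs only O(log n).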

For real-valued random variables the empirical distribution function is closely linked with the so-called “order statistics”.

• Given a sample X1,...,Xn, the corresponding order statistics is the n-tuple of the ordered observations (X(1),...,X(n)), where X(1) ≤ X(2) ≤ · · · ≤ X(n).

• For r = 1, . . . , n, X(r) is called the r-th order statistic.

Order statistics can only be determined for one-dimensional random variables. But an empirical distribution function can also be defined for random vectors. Let X be a d-dimensional random variable defined on IR^d, and let Xi = (Xi1,...,Xid)^T, i = 1, . . . , n, denote an i.i.d. sample of random vectors from X. Then for any x = (x1, . . . , xd)^T

$F(x) = P(X_1 \le x_1, \dots, X_d \le x_d)$ (where X1, . . . , Xd denote the components of X) and

$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_{i1} \le x_1, \dots, X_{id} \le x_d)$

We can also define the so-called “empirical measure” Pn. For any A ⊂ IR^d

$P_n(A) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \in A)$

Note that Pn(A) simply quantifies the relative frequency of observations falling into A. As n → ∞, $P_n(A) \to_P P(A)$. Note that Pn of course depends on the observations and thus is random. At the same time, however, it possesses all properties of a probability measure.

Knowing Fn, we can uniquely reconstruct all observed values {X1,...,Xn}. The only information lost is the exact order in which these values were observed. For i.i.d. samples this information is completely irrelevant for all statistical purposes. All important statistics (and estimators) can thus be written as functions of Fn (or Pn). In particular, in the theoretical literature expectations and the corresponding sample averages are often represented in the following form: for a continuous function g

$E(g(X)) = \int g(x)\,dP = \int g(x)\,dF(x)$ and $\frac{1}{n}\sum_{i=1}^{n} g(X_i) = \int g(x)\,dP_n = \int g(x)\,dF_n(x)$

Here, $\int g(x)\,dF(x)$ refers to the Stieltjes integral. This is a generalization of the well-known Riemann integral. Let d = 1, and consider a partition a = x0 < x1 < ··· < xm = b of an interval [a, b]. Then

$\int_a^b g(x)\,dF(x) = \lim_{m\to\infty;\ \sup_j |x_j - x_{j-1}| \to 0} \sum_{j=1}^{m} g(\xi_j)\,\big(F(x_j) - F(x_{j-1})\big)$

if the limit exists and is independent of the specific choices of ξj ∈ [xj−1, xj]. It can be shown that for any continuous function g and any distribution function F the corresponding Stieltjes integral exists for any finite interval [a, b]. $\int_{-\infty}^{\infty} g(x)\,dF(x) \equiv \int g(x)\,dF(x)$ corresponds to the limit (if it exists) as a → −∞, b → ∞.
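Since Fn increases only at the observations, by 1/n per observation, the Stieltjes integral with respect to Fn reduces to the sample average. The following numerical check is a sketch under our own choices (the function names `ecdf` and `stieltjes_sum`, an equidistant partition of [4, 7], right-endpoint tags ξj = xj, and g(x) = x²), using the eight example observations from Section 1.1:

```python
from bisect import bisect_right

def ecdf(sample):
    xs = sorted(sample)
    return lambda x: bisect_right(xs, x) / len(xs)

def stieltjes_sum(g, F, a, b, m):
    """Riemann-Stieltjes sum of g with respect to F on [a, b].

    Uses the equidistant partition a = x_0 < ... < x_m = b and the
    right endpoints xi_j = x_j as tags."""
    h = (b - a) / m
    total = 0.0
    for j in range(1, m + 1):
        x_prev, x_j = a + (j - 1) * h, a + j * h
        total += g(x_j) * (F(x_j) - F(x_prev))
    return total

data = [5.20, 4.80, 5.40, 4.60, 6.10, 5.40, 5.80, 5.50]
g = lambda x: x ** 2
approx = stieltjes_sum(g, ecdf(data), a=4.0, b=7.0, m=100_000)
exact = sum(g(x) for x in data) / len(data)   # (1/n) * sum of g(X_i)
print(approx, exact)
```

With mesh h = 3·10⁻⁵ the two numbers agree to roughly three decimal places; any other choice of tags in [xj−1, xj] gives the same limit because g is continuous.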

1.2 Theoretical properties of empirical distribution functions

In the following we will assume that X is a real-valued random variable (d = 1).

Theorem: For every x ∈ IR

nFn(x) ∼ B(n, F(x)),

i.e., nFn(x) has a binomial distribution with parameters n and F(x). The probability distribution of Fn(x) is thus given by

$P\left(F_n(x) = \frac{m}{n}\right) = \binom{n}{m} F(x)^m \big(1-F(x)\big)^{n-m}, \quad m = 0, 1, \dots, n$

Some consequences:

• E(Fn(x)) = F(x), i.e., Fn(x) is an unbiased estimator of F(x).
• $Var(F_n(x)) = \frac{1}{n}F(x)\big(1-F(x)\big)$, i.e., as n increases the variance of Fn(x) decreases.

• Fn(x) is a (weakly) consistent estimator of F (x).

Theorem of Glivenko-Cantelli:

$P\left(\lim_{n\to\infty} \sup_{x\in IR} |F_n(x) - F(x)| = 0\right) = 1$
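The two moment formulas can be checked by a small simulation. This is a sketch under our own assumptions: U(0, 1) data (so that F(x) = x), x = 0.3, n = 50, 5000 replications and a fixed seed; the helper name `F_n_at` is hypothetical.

```python
import random
import statistics

def F_n_at(sample, x):
    """Empirical distribution function of `sample` evaluated at the point x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

rng = random.Random(2024)          # fixed seed for reproducibility
x, n, reps = 0.3, 50, 5000         # U(0,1) data, hence F(x) = x
values = [F_n_at([rng.random() for _ in range(n)], x) for _ in range(reps)]

print(statistics.fmean(values), x)                     # E(F_n(x)) = F(x)
print(statistics.pvariance(values), x * (1 - x) / n)   # Var(F_n(x)) = F(x)(1-F(x))/n
```

Increasing n shrinks the simulated variance proportionally to 1/n, in line with the consistency of Fn(x).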

The distribution of Y = F(X)

Note: there is an important difference between F(x) and F(X):
• For any fixed x ∈ IR the corresponding value F(x) is also a fixed number, F(x) = P(X ≤ x).
• F(X) is a random variable, where F denotes the distribution function of X.

Theorem: Let X be a random variable with a continuous distribution function F. Then Y = F(X) has a (continuous) uniform distribution on the interval (0, 1), i.e.

F (X) ∼ U(0, 1),

P (a ≤ F (X) ≤ b) = b − a for all 0 ≤ a < b ≤ 1

Consequence: If F is continuous, then

• F (X1),...,F (Xn) can be interpreted as an i.i.d. random sample of observations from a U(0, 1) distribution

• (F(X(1)),...,F(X(n))) is the corresponding order statistics
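A quick illustration of the theorem, assuming (our choice) X ∼ Exponential(1) with continuous distribution function F(x) = 1 − exp(−x): the transformed values F(Xi) behave like a U(0, 1) sample, so the share falling into [a, b] is close to b − a.

```python
import math
import random

rng = random.Random(7)
lam = 1.0
# X ~ Exponential(lam); its distribution function is F(x) = 1 - exp(-lam * x)
xs = [rng.expovariate(lam) for _ in range(20000)]
ys = [1.0 - math.exp(-lam * x) for x in xs]      # Y_i = F(X_i)

a, b = 0.2, 0.5
share = sum(a <= y <= b for y in ys) / len(ys)
print(share, b - a)   # relative frequency vs. P(a <= F(X) <= b) = b - a
```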

1.3 Quantiles

Quantiles are an essential tool for statistical analysis. They provide important information for characterizing location and dispersion of a distribution. In statistical inference they play a central role in measuring risk. Let X denote a real-valued random variable with distribution function F.

Quantiles: For 0 < τ < 1, any qτ ∈ IR satisfying

F(qτ) = P(X ≤ qτ) ≥ τ and P(X ≥ qτ) ≥ 1 − τ

is called τth quantile (or simply τ-quantile) of X. Note that quantiles are not necessarily unique: for a given τ, there may exist an interval of possible values fulfilling the above conditions. But if X is a continuous random variable with density f, then qτ is unique if f(qτ) > 0 (then F(qτ) = τ and F(q) ≠ τ for all q ≠ qτ). In the statistical literature most work on quantiles is based on the so-called quantile function, which is defined as an “inverse” distribution function. For 0 < τ < 1 the quantile function is defined by

$Q(\tau) := \inf\{y \mid F(y) \ge \tau\}$

• For any 0 < τ < 1 the value qτ = Q(τ) is a τ-quantile satisfying the above conditions. If there is an interval of possible values for qτ, Q(τ) selects the smallest possible value.
• Like the distribution function, the quantile function provides a complete characterization of the random variable X.
• If the distribution function F(x) is strictly monotonically increasing, then Q(τ) is the inverse of F, Q(τ) = F^{-1}(τ).
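The definition Q(τ) = inf{y | F(y) ≥ τ} can be evaluated for any distribution function by bisection. The sketch below is our own illustration (the function name `quantile_function`, the search interval and the tolerance are assumptions): for the standard normal it reproduces the inverse F⁻¹, and for a Bernoulli(1/2) variable, where every q ∈ [0, 1] is a 0.5-quantile, it returns the smallest one, 0.

```python
from statistics import NormalDist

def quantile_function(F, tau, lo, hi, tol=1e-12):
    """Approximate Q(tau) = inf{y : F(y) >= tau} by bisection on [lo, hi].

    Requires F(lo) < tau <= F(hi); works for any non-decreasing,
    right-continuous F, including step functions."""
    assert F(lo) < tau <= F(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= tau:
            hi = mid       # F(mid) >= tau, so the infimum is <= mid
        else:
            lo = mid       # F(mid) < tau, so the infimum is > mid
    return hi

# Continuous, strictly increasing F: Q coincides with the inverse F^{-1}
N = NormalDist(0, 1)
print(quantile_function(N.cdf, 0.975, -10, 10), N.inv_cdf(0.975))

# Bernoulli(1/2): Q(0.5) selects the smallest 0.5-quantile, namely 0
def F_bern(y):
    return 0.0 if y < 0 else (0.5 if y < 1 else 1.0)

print(quantile_function(F_bern, 0.5, -1, 2))
```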

Important quantiles:

• µmed = Q(0.5) is the median of X (with probability at least 0.5 an observation is smaller than or equal to Q(0.5), and with probability at least 0.5 an observation is larger than or equal to Q(0.5)).
• Q(0.25) and Q(0.75) are called lower and upper quartile, respectively. Instead of the standard deviation, the inter-quartile range IQR = Q(0.75) − Q(0.25) is frequently used as a measure of statistical dispersion. Note that P(X ∈ [Q(0.25), Q(0.75)]) ≈ 0.5 (with equality if X is continuous).
• Q(0.1), Q(0.2), ..., Q(0.9) are the “deciles” of X.
• Q(0.01), Q(0.02), ..., Q(0.99) are the “percentiles” of X.

The median is of particular interest. In classical nonparametric statistics it is often preferred to the mean µ = E(X) in order to locate the center of a distribution. Unlike the mean, the median is defined for any real-valued random variable X. The median is a robust measure: its value is not much affected by the tails of a distribution (⇒ empirically, outliers in the data do not play much of a role when estimating a median or quartiles). If a distribution is heavily skewed, then the median is more informative than the mean for locating the “center” of a distribution.

• If the distribution of X is symmetric, then µmed = µ (provided that µ = E(X) exists).

• For skewed distributions, typically µ ≠ µmed. As a rule of thumb,

µmed < µ if the distribution is right-skewed,

µmed > µ if the distribution is left-skewed.

For many important measures “summarizing” characteristics of a distribution, there exist different versions which are either based on moments or on quantiles. The quantile-based versions are more robust and more generally applicable, since quantiles are well-defined for any distribution, while the existence of moments already introduces some restriction. Some summary measures:

• Location measures: mean µ = E(X), median µmed
• Dispersion measures: standard deviation σ, IQR
• Skewness measures:

$\gamma_1 := E\left[\left(\frac{X-\mu}{\sigma}\right)^3\right]$

$\gamma(\tau) := \frac{Q(\tau) + Q(1-\tau) - 2\mu_{med}}{Q(\tau) - Q(1-\tau)}$ (for τ > 0.5)

In empirical analysis, sample quantiles are used to estimate the unknown true quantiles of X.

Data: i.i.d. random sample X1,...,Xn from X

The sample quantile function Qn(τ) is then defined by using the empirical distribution function Fn instead of F.

The sample quantile function: For 0 < τ < 1 define

$Q_n(\tau) := \inf\{y \mid F_n(y) \ge \tau\}$

• For a fixed τ ∈ (0, 1), Qn(τ) is called the τth sample quantile.

A frequently used tool for descriptive data analysis is the so-called boxplot. The boxplot provides a graphical description of the empirical distribution of the observed data by using sample quantiles. It provides information about the median, the lower and upper quartiles, as well as outliers.

Example: Order statistics (n = 10): 0.1 0.1 0.2 0.4 0.5 0.7 0.9 1.2 1.4 1.9

[Figure: histogram of the example data; x from 0.0 to 2.0]

[Figure: boxplot of the example data; x from 0.0 to 2.0]
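The definition of Qn(τ) translates directly into code: since Fn is a right-continuous step function, the infimum is attained at an observation. A minimal sketch (function names are our own) for the example data, also plugging the sample quartiles into the quantile-based skewness γ(0.75). Note that many software packages use interpolating quantile definitions, so a package boxplot may show slightly different values (e.g., averaging the two middle observations gives the median 0.6 instead of 0.5).

```python
from bisect import bisect_right

def sample_quantile(sample, tau):
    """Q_n(tau) = inf{y : F_n(y) >= tau}; the infimum is attained at an observation."""
    xs = sorted(sample)
    n = len(xs)
    for y in xs:
        if bisect_right(xs, y) / n >= tau:    # F_n(y) >= tau
            return y
    raise ValueError("tau must lie in (0, 1)")

def quantile_skewness(sample, tau=0.75):
    """gamma(tau) with sample quantiles plugged in (tau > 0.5)."""
    q_hi, q_lo = sample_quantile(sample, tau), sample_quantile(sample, 1 - tau)
    med = sample_quantile(sample, 0.5)
    return (q_hi + q_lo - 2 * med) / (q_hi - q_lo)

data = [0.1, 0.1, 0.2, 0.4, 0.5, 0.7, 0.9, 1.2, 1.4, 1.9]
q1, med, q3 = (sample_quantile(data, t) for t in (0.25, 0.5, 0.75))
print(q1, med, q3, q3 - q1)       # lower quartile, median, upper quartile, IQR
print(quantile_skewness(data))    # positive: the sample is right-skewed
```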

[Figure: boxplots of hourly wages for women and men; axis from 0 to 40]

1.4 Nonparametric tests: the Kolmogorov-Smirnov test

There exists an enormous variety of nonparametric tests for different statistical problems. Starting with the Kolmogorov-Smirnov one-sample test, we will introduce some important test procedures which are based on the use of empirical distribution functions and order statistics. There exist many further “classical” nonparametric tests based on various approaches. A reference is the book by Gibbons (1971). Although approaches and setups are different, there are some common characteristics shared by all of these tests:
• Generality: The null hypothesis of interest is formulated in a general way; no parametrization, no dependence on existence and values of moments of specific distributions.
• Distribution-free tests: The distribution of the test statistic under H0 does not depend on the underlying distribution of the variable of interest.
• Robustness: Test results should not be unduly affected by “outliers” or small departures from the model assumptions.

Goodness-of-fit tests: There are a number of nonparametric tests which try to assess whether a given distribution is suited to a dataset. The aim is to verify whether an observed variable possesses a specified distribution, as e.g. an exponential distribution with parameter λ = 1 or a normal distribution with mean 0 and variance 1. The most important test in this context is the Kolmogorov-Smirnov test.

Assumption: Real-valued random variable X with continuous distribution function F

Data: i.i.d. random sample X1,...,Xn from X

Goal: Test of the null hypothesis H0 : F = F0, where F0 is a given distribution function.

Idea: Fn(x) is an unbiased and consistent estimator of F (x).

Hence, if the null hypothesis is correct and F = F0, the differences |Fn(x) − F0(x)| should be sufficiently small.

Kolmogorov-Smirnov test:

H0 : F (x) = F0(x) for all x ∈ IR

H1 : F(x) ≠ F0(x) for some x ∈ IR

Test statistic:

$D_n = \sup_{x\in IR} |F_n(x) - F_0(x)|$

H0 is rejected if Dn > dn,1−α, where dn,1−α is the 1−α-quantile of the distribution of Dn under H0.

Problem: Distribution of Dn under H0?

a) Under H0 : F = F0 the test statistic Dn is distribution-free: its distribution coincides with the distribution of the random variable

$D_n^* = \sup_{y\in[0,1]} |y - F_n^*(y)|$.

Here, $F_n^*$ denotes the empirical distribution function of an i.i.d. sample Y1,...,Yn from a U(0, 1)-distribution.

b) Asymptotic distribution (n large): For every λ > 0 we obtain

$\lim_{n\to\infty} P(D_n \le \lambda/\sqrt{n}) = 1 - 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2\lambda^2}$

• Result a) implies that the critical values of a Kolmogorov-Smirnov test can be approximated by Monte Carlo simulations:
– Using a random number generator, draw an i.i.d. sample Y1,...,Yn from a U[0, 1]-distribution, and calculate the corresponding value $D_{n,1}^* = \sup_{y\in[0,1]} |y - F_n^*(y)|$.
– Iterate k times (k large, e.g. k = 2000) ⇒ k values: $D_{n,1}^*, D_{n,2}^*, \dots, D_{n,k}^*$
– The (1 − α)-quantile of the empirical distribution of $D_{n,1}^*, \dots, D_{n,k}^*$ provides an approximation of dn,1−α (the larger k, the more accurate the approximation).

• There exist tables providing critical values dn,1−α for small n.
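The Monte Carlo recipe above, together with the limit distribution from b), can be sketched as follows. This is an illustration under our own choices (function names, k = 5000, a fixed seed, and truncation of the infinite series after 100 terms); the simulated value fluctuates slightly around the tabulated d10,0.95 = 0.409.

```python
import math
import random

def ks_stat_uniform(ys):
    """D*_n = sup_{y in [0,1]} |y - F*_n(y)| for a sample from U(0,1).

    The supremum is attained at the jumps of F*_n, hence
    D*_n = max_i max(i/n - Y_(i), Y_(i) - (i-1)/n)."""
    ys = sorted(ys)
    n = len(ys)
    return max(max(i / n - y, y - (i - 1) / n) for i, y in enumerate(ys, start=1))

def ks_critical_value_mc(n, alpha=0.05, k=5000, seed=1):
    """Monte Carlo approximation of d_{n,1-alpha} based on result a)."""
    rng = random.Random(seed)
    sims = sorted(ks_stat_uniform([rng.random() for _ in range(n)]) for _ in range(k))
    return sims[math.ceil((1 - alpha) * k) - 1]   # empirical (1-alpha)-quantile

def kolmogorov_cdf(lam, terms=100):
    """Limit distribution 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2), truncated."""
    return 1 - 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                       for k in range(1, terms + 1))

print(ks_critical_value_mc(10))   # compare with the tabulated value 0.409
print(kolmogorov_cdf(1.358))      # approximately 0.95
print(1.358 / math.sqrt(10))      # asymptotic critical value for n = 10
```

For n = 10 the asymptotic value 1.358/√10 ≈ 0.43 is still noticeably larger than the exact one, which is why tables or simulation are preferable for small n.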

Example: A manufacturer of a certain SUV claims that, when driving at a constant speed of 100 km/h, the fuel consumption X of the SUV is normally distributed with mean µ = E(X) = 12 and standard deviation σ = 1. A random sample of 10 SUVs leads to the following observed fuel consumptions:

12.4 11.8 12.9 12.6 13.0 12.5 12.0 11.5 13.2 12.8

Calculating the K-S test statistic yields (n = 10): D10 = 0.3554

Critical value of the test for n = 10 and α = 0.05: d10,0.95 = 0.409

⇒ H0 is not rejected, since 0.3554 < 0.409
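The value D10 = 0.3554 can be reproduced with a few lines of Python (standard library; the helper name `ks_statistic` is our own). For continuous F0 the supremum is attained at an order statistic, either at the jump of Fn or just before it:

```python
from statistics import NormalDist

def ks_statistic(sample, F0):
    """D_n = sup_x |F_n(x) - F0(x)| for a continuous F0.

    Checks i/n - F0(X_(i)) (at the jump) and F0(X_(i)) - (i-1)/n (just before it)."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - F0(x), F0(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))

fuel = [12.4, 11.8, 12.9, 12.6, 13.0, 12.5, 12.0, 11.5, 13.2, 12.8]
F0 = NormalDist(mu=12, sigma=1).cdf      # H0: N(12, 1)
d10 = ks_statistic(fuel, F0)
print(round(d10, 4))                     # 0.3554, attained just before X_(4) = 12.4
print("reject H0" if d10 > 0.409 else "do not reject H0")
```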

Remark: In principle, the test may also be used for discrete distributions. In this case the test is conservative, i.e., under H0 the probability of a type I error is usually smaller than α.

Composite null hypotheses

It is common to speak of a composite null hypothesis if F0(x) ≡ F0(x, θ) is only specified up to an unknown parameter vector θ ∈ IR^m. An example is the normal distribution with unknown mean and variance, i.e., θ = (µ, σ²). In such a case the aim is simply to test whether the data are “normally distributed” (irrespective of the particular mean and variance). Testing problem:

H0 : F (x) = F0(x, θ) for all x ∈ IR; θ unknown

H1 : For all possible θ: F(x) ≠ F0(x, θ) for some x ∈ IR

Test statistic:

$D_n = \sup_{x\in IR} |F_n(x) - F_0(x, \hat\theta)|$

Here, θ̂ denotes the maximum-likelihood estimate of θ. Normal distribution: $\hat\theta = (\bar X, \hat\sigma^2)$, $\hat\sigma^2 = \frac{1}{n}\sum_{i} (X_i - \bar X)^2$.

H0 is rejected if Dn > dn,1−α
• In general one uses the same critical values as in the case of a simple null hypothesis (see above). This implies that the test is conservative, i.e., under H0 the probability of a type I error is usually smaller than α.
• For the special case of a normal distribution, appropriate critical values have been determined (by Monte Carlo simulation) by Lilliefors. The resulting “Lilliefors test” is implemented in many statistical program packages.
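The idea behind the Lilliefors correction can be sketched by simulation: generate normal samples under H0, re-estimate (µ, σ) by maximum likelihood in every replication, and take the (1 − α)-quantile of the resulting statistics. The function names, k = 5000 and the seed are our own assumptions; published tables may rest on slightly different conventions (e.g., the divisor in σ̂), so small discrepancies are to be expected. The simulated critical value for n = 10 is clearly below the simple-hypothesis value 0.409, which illustrates why the latter is conservative here.

```python
import math
import random
from statistics import NormalDist, fmean

def ks_normal_estimated(sample):
    """D_n = sup_x |F_n(x) - Phi((x - mean) / sigma_hat)| with ML estimates plugged in."""
    xs = sorted(sample)
    n = len(xs)
    mu = fmean(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)   # ML estimate (divisor n)
    F0 = NormalDist(mu, sigma).cdf
    return max(max(i / n - F0(x), F0(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))

def lilliefors_critical_value_mc(n, alpha=0.05, k=5000, seed=3):
    """Simulate under H0; N(0,1) suffices because the statistic is invariant
    under location-scale changes when (mu, sigma) are re-estimated."""
    rng = random.Random(seed)
    sims = sorted(ks_normal_estimated([rng.gauss(0, 1) for _ in range(n)]) for _ in range(k))
    return sims[math.ceil((1 - alpha) * k) - 1]

crit = lilliefors_critical_value_mc(10)
print(crit)   # noticeably smaller than d_{10,0.95} = 0.409
```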

1.5 Nonparametric one-sample tests

1.5.1 Rank statistics

Many nonparametric tests are (implicitly or explicitly) based on ranks of observations. Ranks are easily determined from order statistics.

• Consider an i.i.d. random sample X1,...,Xn from a continuous random variable X. If Xi ≠ Xj for all i ≠ j, then the rank r(Xi) of observation Xi, i = 1, . . . , n, is defined by

$r(X_i) := \sum_{j=1}^{n} I(X_j \le X_i)$.

This means that the smallest observation has rank 1, while the largest observation has rank n, and

r(X(i)) = i, i = 1, . . . , n

• For an i.i.d. sample from a continuous random variable we have P(Xi = Xj for some i ≠ j) = 0. Consequently, with probability 1, r(X1), . . . , r(Xn) is a random permutation of all natural numbers between 1 and n.
– E(r(Xi)) = (n + 1)/2
– Var(r(Xi)) = (n² − 1)/12

• In practice, “ties” can of course occur, i.e., different observations which have equal values. In this case an average rank is assigned to all observations with identical value.

Examples (n = 5):

Xi       0.3    1.5    −0.1   0.8    1.0
r(Xi)    2      5      1      3      4

Xi       2.0    0.5    0.9    1.3    2.6
r(Xi)    4      1      2      3      5

Xi       1.09   2.17   2.17   2.17   3.02
r(Xi)    1      3      3      3      5

Xi       0.5    0.5    0.9    1.3    1.3
r(Xi)    1.5    1.5    3      4.5    4.5

Note: If there are ties, then the empirical variance of r(X1), . . . , r(Xn) is necessarily smaller than (n² − 1)/12.
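Average ranks for tied observations can be computed by sorting once and assigning each block of equal values the mean of the ranks it occupies. A minimal sketch (the function name `average_ranks` is our own) reproducing the examples above and the variance reduction caused by ties:

```python
from statistics import pvariance

def average_ranks(sample):
    """Ranks r(X_i); tied observations all receive the average of their ranks."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0.0] * len(sample)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the block [pos, end] over all observations tied with sample[order[pos]]
        while end + 1 < len(order) and sample[order[end + 1]] == sample[order[pos]]:
            end += 1
        avg = (pos + 1 + end + 1) / 2        # average of the ranks pos+1, ..., end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

print(average_ranks([0.3, 1.5, -0.1, 0.8, 1.0]))   # [2.0, 5.0, 1.0, 3.0, 4.0]
print(average_ranks([0.5, 0.5, 0.9, 1.3, 1.3]))    # [1.5, 1.5, 3.0, 4.5, 4.5]
# With ties the variance of the ranks falls below (n^2 - 1)/12 = 2 for n = 5
print(pvariance(average_ranks([0.5, 0.5, 0.9, 1.3, 1.3])))
```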

1.5.2 Linear rank statistics (one sample)

Consider a random variable X with continuous distribution function F.

Data: i.i.d. random sample X1,...,Xn

Nonparametric one-sample tests try to verify hypotheses concerning the location of the center of a distribution. More precisely, they aim to test whether the median µmed is equal to a pre-specified value µ0.

Recall that for a continuous random variable the median µmed necessarily satisfies F(µmed) = 0.5. For simplicity, in the following we will only consider two-sided tests. One-sided tests are completely analogous.

Formal testing problem:

H0 : µmed = µ0

H1 : µmed ≠ µ0

Example: For studying the intelligence of PhD students at a certain university, n = 10 students were randomly selected and their IQ values were measured using an IQ test. This led to the following 10 observations:

Xi 99 131 118 112 128 136 120 107 134 122

Question: Is the data compatible with the hypothesis H0 : µmed = 110?

Linear rank statistics for the one-sample problem rely on the ranks of the absolute values of the differences Di = Xi − µ0:

r(|Di|) := rank of |Di| = |Xi − µ0| in the sample of the absolute values |D1|, ..., |Dn|

Moreover, let

Vi := 1 if Xi − µ0 > 0, and Vi := 0 if Xi − µ0 ≤ 0.

For a suitable weight function g a linear rank statistic $L_n^+$ is then defined by

$L_n^+ = \sum_{i=1}^{n} g(r(|D_i|)) \cdot V_i$

IQ example (µ0 = 110):

Xi 99 131 118 112 128 136 120 107 134 122

Vi 0 1 1 1 1 1 1 0 1 1

|Di| 11 21 8 2 18 26 10 3 24 12

r(|Di|) 5 8 3 1 7 10 4 2 9 6

There exist some general theoretical results on the choice of a suitable weight function for constructing locally optimal rank tests. The term “locally optimal” refers to the assumption that the underlying F is “close” to some pre-specified parametric distribution (e.g. normal). In practice, the most frequently used linear rank tests are the sign test and the Wilcoxon test.

The sign test: The sign test is the linear rank test with the simplest possible weight function: g(x) = 1 for all x. For testing H0 : µmed = µ0 the sign test thus relies on the test statistic

$V_n^+ = \sum_{i=1}^{n} V_i$

• Under H0 we obtain P(Vi = 1) = 1/2 and P(Vi = 0) = 1/2.
• This implies that the null distribution of $V_n^+$ is a binomial distribution with parameters n and 1/2, $V_n^+ \sim B(n, \tfrac{1}{2})$.

⇒ For a given significance level α > 0, the sign test rejects H0 if either $P(B_{n,1/2} \le V_n^+) \le \alpha/2$ or $P(B_{n,1/2} \ge V_n^+) \le \alpha/2$, where $B_{n,1/2}$ denotes a B(n, 1/2)-distributed random variable.

n large: the binomial distribution may be approximated by a normal distribution. Under H0 we have approximately

$\frac{V_n^+ - n/2}{\sqrt{n/4}} \sim AN(0, 1)$

Remark: Since F is continuous we have P (Xi − µ0 = 0) = 0. In practice, however, there may exist observations with Xi −µ0 = 0. In this case it is common practice to eliminate these observations and to apply the sign test to the corresponding reduced sample.
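For the IQ example the exact sign test needs only binomial coefficients. A minimal sketch (the function name `sign_test_pvalue` is our own); the two-sided p-value 2·min(lower tail, upper tail) is at most α exactly when the rejection rule above applies:

```python
from math import comb

def sign_test_pvalue(v_plus, n):
    """Two-sided exact p-value of the sign test; V_n^+ ~ B(n, 1/2) under H0."""
    lower = sum(comb(n, k) for k in range(0, v_plus + 1)) / 2 ** n   # P(B <= v)
    upper = sum(comb(n, k) for k in range(v_plus, n + 1)) / 2 ** n   # P(B >= v)
    return min(1.0, 2 * min(lower, upper))

iq = [99, 131, 118, 112, 128, 136, 120, 107, 134, 122]
mu0 = 110
diffs = [x - mu0 for x in iq if x != mu0]    # observations equal to mu0 are dropped
v_plus = sum(d > 0 for d in diffs)
print(v_plus, sign_test_pvalue(v_plus, len(diffs)))   # 8 0.109375
```

With p ≈ 0.109 the sign test does not reject H0 : µmed = 110 at α = 0.05.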

The Wilcoxon test: The Wilcoxon test is a linear rank test based on the weight function g(x) = x for all x. It relies on the additional assumption that the underlying distribution is symmetric. The test statistic is

$W_n^+ = \sum_{i=1}^{n} r(|D_i|) \cdot V_i$

For a given significance level α > 0, the Wilcoxon test rejects H0 if either $W_n^+ \le w_{n,\alpha/2}$ or $W_n^+ \ge w_{n,1-\alpha/2}$. Here, $w_{n,\alpha/2}$ and $w_{n,1-\alpha/2}$ are the corresponding quantiles of the distribution of $W_n^+$ under H0.

• If F is symmetric, then under H0 the statistic $W_n^+$ is distribution-free. Under H0, V1,...,Vn are i.i.d. with P(Vi = 1) = 1/2 and P(Vi = 0) = 1/2, while symmetry of F implies that the random variables Vi and |Di| are independent. Hence, all possible combinations of zeros and ones for the indicator variables V1,...,Vn are equally probable, while at the same time r(|D1|), . . . , r(|Dn|) are purely random permutations of {1, . . . , n}. Therefore, critical values can be obtained by straightforward combinatorial methods.

• Asymptotic approximation (n large):

$\frac{W_n^+ - n(n+1)/4}{\sqrt{Var(W_n^+)}} \sim AN(0, 1)$, where $Var(W_n^+) = \frac{n(n+1)(2n+1)}{24}$.

Note: The theoretical derivation of the null distribution relies on the assumption of a continuous random variable (probability of ties equal to zero). Ties may of course exist in practice. Then the above distributions are only approximately valid, and the accuracy of the approximation decreases with the number of ties. Formulas providing corrected critical values in the presence of ties can be found in the literature.
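The combinatorial argument translates into a short dynamic program: under H0 each of the 2ⁿ sign patterns is equally likely, so P(W⁺n = s) is the number of subsets of {1, ..., n} with sum s, divided by 2ⁿ. A sketch for the IQ example, assuming no ties among the |Di| (true here); function names are our own:

```python
def signed_rank_null_counts(n):
    """counts[s] = number of subsets of {1, ..., n} whose elements sum to s."""
    total = n * (n + 1) // 2
    counts = [1] + [0] * total
    for j in range(1, n + 1):
        for s in range(total, j - 1, -1):   # backwards: each rank j is used at most once
            counts[s] += counts[s - j]
    return counts

def wilcoxon_signed_rank(sample, mu0):
    """Return W_n^+ and its two-sided exact p-value (no ties among |D_i| assumed)."""
    diffs = [x - mu0 for x in sample if x != mu0]
    n = len(diffs)
    by_abs = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = {i: r for r, i in enumerate(by_abs, start=1)}       # r(|D_i|)
    w_plus = sum(ranks[i] for i in range(n) if diffs[i] > 0)
    counts = signed_rank_null_counts(n)
    lower = sum(counts[: w_plus + 1]) / 2 ** n     # P(W+ <= w)
    upper = sum(counts[w_plus:]) / 2 ** n          # P(W+ >= w)
    return w_plus, min(1.0, 2 * min(lower, upper))

iq = [99, 131, 118, 112, 128, 136, 120, 107, 134, 122]
print(wilcoxon_signed_rank(iq, 110))   # (48, 0.037109375)
```

Unlike the sign test, the Wilcoxon test rejects H0 : µmed = 110 at α = 0.05 for these data, since it also uses the sizes of the deviations.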

Application: Paired-sample procedures

Paired samples:

Sample (X1,Y1),..., (Xn,Yn)

X1,...,Xn i.i.d. with distribution function FX

Y1,...,Yn i.i.d. with distribution function FY

Xi and Yi not independent; e.g. (Xi,Yi) repeated measurements for the same statistical unit

Example: advertising campaign The following table represents the weekly sales (in 10000 Euro) of a trade chain before and after an advertising campaign.

Chain store            1      2      3      4      5      6
Before campaign (X)    18.5   15.6   20.1   17.2   21.1   19.3
After campaign (Y)     20.2   16.6   19.8   19.3   21.9   19.0

⇒ x̄ = 18.63, ȳ = 19.47

Question: Has the advertising campaign been successful? Did the campaign (in tendency) lead to significantly higher sales?

Nonparametric approach: Analysis of the resulting sample of differences

Z1 = X1 − Y1, Z2 = X2 − Y2, ..., Zn = Xn − Yn

The above problem can be translated into the question: Is the median of Z1,...,Zn significantly different from zero?

⇒ Testing problem:

H0 : µmed;Z = 0

H1 : µmed;Z ≠ 0

⇒ Application of the sign test (or Wilcoxon test) based on Z1,...,Zn.
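Applying the sign test to the differences of the advertising example is a short computation. A sketch under our own conventions: differences are rounded to the recording precision (one decimal) so that floating-point noise does not create spurious non-zero values, and zero differences would be dropped as described in the remark on the sign test.

```python
from math import comb

before = [18.5, 15.6, 20.1, 17.2, 21.1, 19.3]
after = [20.2, 16.6, 19.8, 19.3, 21.9, 19.0]
# Differences Z_i = X_i - Y_i, rounded to the recording precision of the data
z = [round(x - y, 1) for x, y in zip(before, after)]
z = [d for d in z if d != 0]          # zero differences would be dropped
n = len(z)
v_plus = sum(d > 0 for d in z)

lower = sum(comb(n, k) for k in range(v_plus + 1)) / 2 ** n      # P(B <= v)
upper = sum(comb(n, k) for k in range(v_plus, n + 1)) / 2 ** n   # P(B >= v)
p_value = min(1.0, 2 * min(lower, upper))
print(z)                  # [-1.7, -1.0, 0.3, -2.1, -0.8, 0.3]
print(v_plus, p_value)    # 2 0.6875
```

Even the one-sided p-value P(B ≤ 2) = 0.34375 (direction "sales increased", i.e. few positive Zi) is far from significant with only n = 6 stores.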

Power for detecting alternatives:
• Parametric alternative (assuming normality): Student t-test
• The asymptotic relative efficiency of the sign test relative to the t-test is 0.637 if the underlying distribution is normal. The sign test can be much more efficient than the t-test if the underlying distribution is skewed or possesses heavy tails.
• For symmetric distributions with moderate tails, such as the normal distribution, the Wilcoxon test is more efficient than the sign test; for very heavy-tailed distributions (e.g. the double exponential distribution) the sign test can be more efficient. The asymptotic relative efficiency of the Wilcoxon test relative to the t-test is 0.96 if the underlying distribution is normal.

1.6 Two-sample tests

In the following we consider two random variables X and Y with continuous distribution functions FX and FY.

Data: i.i.d. random samples X1,...,Xm and Y1,...,Yn from underlying populations with distribution functions FX and FY. Xi is independent of Yj for all i, j.

Problem: Test the null hypothesis H0 : FX = FY of equality of the underlying distributions

Example: Coﬀee and the speed of typing on a keyboard An experiment was conducted in order to measure the inﬂuence of caﬀeine on the speed of typing on a computer keyboard. 20 trained test persons were randomly divided into two groups of 10 persons. The ﬁrst group did not receive any beverages, but each member of the second group had to drink a big cup of coﬀee (administering 200 mg caﬀeine). Every test person then had to type a text on a keyboard. The following table provides the respective average number of characters typed per minute.

No caffeine (X):      242.8  245.3  244.0  240.2  247.1  248.3  241.7  244.7  246.5  240.4
200 mg caffeine (Y):  246.4  251.1  250.2  252.3  248.0  250.9  246.1  248.2  245.6  250.0

Question: Is there a difference between typing speeds with and without caffeine?

Formal testing problem:

H0 : FX = FY

H1 : FX ≠ FY

For two-sample tests based on order statistics, the ranks of the observations Xi and Yj in the combined sample of all m + n observations play a central role. If there are no ties, then r(Xi) is defined by

$r(X_i) := \sum_{j=1}^{m} I(X_j \le X_i) + \sum_{j=1}^{n} I(Y_j \le X_i)$

and consequently

$r(X_{(i)}) = i + \sum_{j=1}^{n} I(Y_j \le X_{(i)})$

for all i = 1, . . . , m. If H0 : FX = FY is correct, then all ranks between 1 and m + n are equally probable, P(r(Xi) = j) = 1/(m + n) for all j ∈ {1, . . . , m + n}. More precisely, under H0, r(X1), . . . , r(Xm) can be interpreted as m numbers randomly drawn without replacement from the set {1, 2, . . . , m + n}. All possible sequences of these m numbers are equally probable. This will not be true under the alternative.

1.6.1 The Kolmogorov-Smirnov two-sample test

Note:

• The empirical distribution functions FX,m and FY,n are unbiased and consistent estimators of FX and FY, respectively.

• If the null hypothesis H0 : FX = FY is correct, all differences |FX,m(x) − FY,n(x)| are purely random and should be sufficiently small. This motivates the two-sample test of Kolmogorov and Smirnov for testing H0 : FX = FY.

Test statistic:

$D_{m,n} = \sup_{x\in IR} |F_{X,m}(x) - F_{Y,n}(x)|$

H0 is rejected if Dm,n > dm,n,1−α, where dm,n,1−α is the 1−α- quantile of the distribution of Dm,n under the null hypothesis.

a) Under H0 : FX = FY, the test statistic Dm,n is distribution-free. Critical values can be obtained by straightforward combinatorics. Recall that ties do not play any role in the theoretical analysis, since they have probability 0. We obtain

$D_{m,n} = \max\left\{\max_{i=1,\dots,m} |F_{X,m}(X_i) - F_{Y,n}(X_i)|,\ \max_{j=1,\dots,n} |F_{X,m}(Y_j) - F_{Y,n}(Y_j)|\right\}$

$\phantom{D_{m,n}} = \max\left\{\max_{i=1,\dots,m} \left|\frac{i}{m} - \frac{1}{n}\sum_{j=1}^{n} I(Y_j \le X_{(i)})\right|,\ \max_{i=1,\dots,n} \left|\frac{1}{m}\sum_{j=1}^{m} I(X_j \le Y_{(i)}) - \frac{i}{n}\right|\right\}$

The values of Dm,n thus only depend on the ranks of the Xi and Yj in the combined sample of all m + n observations. Since under H0 all rank configurations are equally probable, critical values are obtained by a simple counting procedure.

b) Asymptotic distribution (m, n large): For all λ > 0

$\lim_{m,n\to\infty} P\left(D_{m,n} \le \lambda \Big/ \sqrt{mn/(m+n)}\right) = 1 - 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2\lambda^2}$

c) The Kolmogorov-Smirnov test is consistent against all alternatives.
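For the coffee data (which contain no ties), Dm,n and a permutation p-value can be sketched as follows. Instead of counting all C(20, 10) = 184756 rank configurations, this sketch (our own simplification; function names, k = 2000 and the seed are assumptions) samples random reassignments of the pooled values, each of which is equally likely under H0:

```python
import random

def ks_two_sample(xs, ys):
    """D_{m,n} = sup_x |F_{X,m}(x) - F_{Y,n}(x)|, evaluated at all pooled observations."""
    m, n = len(xs), len(ys)
    best = 0
    for t in sorted(xs + ys):
        cx = sum(x <= t for x in xs)             # m * F_{X,m}(t)
        cy = sum(y <= t for y in ys)             # n * F_{Y,n}(t)
        best = max(best, abs(n * cx - m * cy))   # integer arithmetic avoids rounding
    return best / (m * n)

def ks_two_sample_pvalue_mc(xs, ys, k=2000, seed=5):
    """Monte Carlo permutation p-value: under H0 every assignment of the pooled
    values to the two groups is equally likely."""
    rng = random.Random(seed)
    d_obs = ks_two_sample(xs, ys)
    pooled = xs + ys
    hits = 0
    for _ in range(k):
        rng.shuffle(pooled)
        if ks_two_sample(pooled[:len(xs)], pooled[len(xs):]) >= d_obs - 1e-12:
            hits += 1
    return d_obs, hits / k

no_caffeine = [242.8, 245.3, 244.0, 240.2, 247.1, 248.3, 241.7, 244.7, 246.5, 240.4]
caffeine = [246.4, 251.1, 250.2, 252.3, 248.0, 250.9, 246.1, 248.2, 245.6, 250.0]
d_obs, p_mc = ks_two_sample_pvalue_mc(no_caffeine, caffeine)
print(d_obs, p_mc)   # D = 0.7, attained at x = 245.3; p-value well below 0.05
```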

1.6.2 Linear rank statistics

• Rank tests are explicitly constructed on the basis of the ranks of Xi and Yj in the combined sample of all N = m + n observations.

• Under H0 : FX = FY the combined sample can be interpreted as an i.i.d. random sample of size N := m + n from a population with distribution function FX = FY. If there are no ties, the ranks are random permutations of the natural numbers between 1 and N. Rank tests then aim to verify whether the distribution of ranks is indeed purely random, or whether there are systematic differences between the ranks of the X and Y variables which indicate that FX ≠ FY.

Most commonly used rank tests for the two-sample problem can be classified together as linear combinations of indicator variables for the combined (ordered) sample. Such statistics are often called linear rank statistics. For the following theoretical analysis we will assume that FX and FY are continuous and that there are no ties in the samples. Let

Vi := 1 if the i-th variable in the combined, ordered sample is an X-variable, and Vi := 0 otherwise.

Linear rank statistics can now generally be written in the form

$L_N = \sum_{i=1}^{N} a_i V_i$,

where a1, a2, ..., aN are pre-specified weights (“scores”). Different test procedures use different specifications of the scores ai.

• (V1, V2, ..., VN) is a vector consisting of m ones and n zeros. There are $\binom{N}{m}$ different possible combinations of these m ones and n zeros, each of which has the same probability under H0.

• Under H0 : FX = FY, LN is distribution-free. Critical values can be determined by straightforward combinatorics:

$P(L_N = c \mid H_0) = \frac{q(c)}{\binom{N}{m}}$,

where q(c) denotes the number of vectors (V1, ..., VN) satisfying $L_N = \sum_{i=1}^{N} a_i V_i = c$.

• Moments under H0:
– E(Vi) = m/N
– Var(Vi) = mn/N²
– Cov(Vi, Vj) = −mn/(N²(N − 1)) for i ≠ j
This implies
– $E(L_N) = \frac{m}{N}\sum_{i=1}^{N} a_i$
– $Var(L_N) = \frac{mn}{N^2(N-1)}\left(N\sum_{i=1}^{N} a_i^2 - \Big(\sum_{i=1}^{N} a_i\Big)^2\right)$

• Asymptotic distribution (m, n large):

$Z_N = \frac{L_N - E(L_N)}{\sqrt{Var(L_N)}} \sim AN(0, 1)$

Tests based on linear rank statistics are not consistent against all possible alternatives. However, they can be constructed in such a way that they are particularly powerful in detecting some important types of alternatives, as for example shifts in location. The point is that in many practically relevant situations the structures of the distributions FX and FY are quite similar, but there exists a shift in the centers of these distributions (different medians, means).

Mathematically this can be formalized by the concept of stochastic dominance.

Definition: A real random variable X (first order) stochastically dominates a real random variable Y (written X ≥FSD Y) if P(X > z) ≥ P(Y > z) for all z, or equivalently

FX (z) ≤ FY (z) for all z

If X ≥FSD Y, then µX,med ≥ µY,med, where µX,med and µY,med denote the medians of X and Y, respectively. Moreover, if E(X) and E(Y) exist, then E(X) ≥ E(Y). Tests for the location problem are particularly powerful against alternatives of the form FX(z) < FY(z) or FX(z) > FY(z). Location tests based on linear rank statistics rely on specifying scores such that a1 < a2 < ··· < aN is a strictly monotonically increasing sequence. Note that the following tests may also be able to detect alternatives where stochastic dominance of one variable is not exactly satisfied. They will, however, not be consistent against alternatives where the centers of the distributions are equal and the only difference lies in the fact that one variable is more dispersed than the other.

The Wilcoxon-Mann-Whitney test (Mann-Whitney U test):

The best known two-sample location test is the Wilcoxon-Mann-Whitney test. The test statistic is a special linear rank statistic with scores ai = i, i = 1, . . . , N:

$W_N = \sum_{i=1}^{N} i \cdot V_i = \sum_{j=1}^{m} r(X_j)$

For α > 0 let ωN,α denote the α-quantile of the distribution of WN under H0.

• Two-sided test (H0 : FX = FY against H1 : FX ≠ FY ):

H0 is rejected if WN ≤ ωN,α/2 or WN ≥ ωN,1−α/2.

• One-sided test (H0 : FX = FY against H1 : FX (z) < FY (z) for all z):

H0 is rejected if WN ≥ ωN,1−α.

• One-sided test (H0 : FX = FY against H1 : FX (z) > FY (z) for all z):

H0 is rejected if WN ≤ ωN,α.

• Under H0, WN is distribution-free. Critical values can be obtained in a combinatorial way (see above).
• E(WN) = m(N + 1)/2, Var(WN) = mn(N + 1)/12

• Asymptotic approximation (m, n large): WN is approximately normal with mean m(N + 1)/2 and variance mn(N + 1)/12.
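The exact null distribution P(LN = c | H0) = q(c)/C(N, m) can be obtained by brute-force enumeration for small samples, which also allows a check of the general moment formulas. A sketch for m = 3, n = 4 with the Wilcoxon scores ai = i (function names and the sample sizes are our own choices):

```python
from collections import Counter
from itertools import combinations
from math import comb

def linear_rank_null_distribution(scores, m):
    """Exact null distribution of L_N = sum_i a_i V_i: each choice of the m positions
    of the X-variables among N = len(scores) has probability 1 / C(N, m)."""
    N = len(scores)
    q = Counter(sum(scores[i] for i in pos) for pos in combinations(range(N), m))
    return {c: q[c] / comb(N, m) for c in sorted(q)}

def null_moments(scores, m):
    """E(L_N) and Var(L_N) from the closed-form expressions."""
    N = len(scores)
    n = N - m
    s1, s2 = sum(scores), sum(a * a for a in scores)
    return m / N * s1, m * n / (N ** 2 * (N - 1)) * (N * s2 - s1 ** 2)

m, n = 3, 4
scores = list(range(1, m + n + 1))           # a_i = i: Wilcoxon-Mann-Whitney scores
dist = linear_rank_null_distribution(scores, m)
mean = sum(c * p for c, p in dist.items())
var = sum((c - mean) ** 2 * p for c, p in dist.items())
print(mean, var)                   # approximately 12 and 8
print(null_moments(scores, m))     # m(N+1)/2 = 12 and mn(N+1)/12 = 8, up to rounding
print(sum(p for c, p in dist.items() if c >= 17))   # P(W_N >= 17 | H0) = 2/35
```

Enumeration grows like C(N, m), so for larger samples one switches to the normal approximation or to recursive counting.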

Note: The theoretical derivation of the null distribution relies on the assumption of continuous random variables (probability of ties equal to zero). Ties may of course exist in practice. Then the above distributions are only approximately valid, and the accuracy of the approximation decreases with the number of ties. Formulas providing corrected critical values in the presence of ties can be found in the literature.
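For the coffee data (20 distinct values, so no tie correction is needed) the Wilcoxon-Mann-Whitney statistic and its normal approximation can be computed as below. This is a sketch; the function name is our own, and the equivalent Mann-Whitney form U = WN − m(m + 1)/2 is printed as well.

```python
import math
from statistics import NormalDist

def wmw_statistic(xs, ys):
    """W_N = sum of the ranks of the X-observations in the pooled sample (no ties)."""
    pooled = sorted(xs + ys)
    rank = {v: r for r, v in enumerate(pooled, start=1)}
    assert len(rank) == len(pooled), "ties present: average ranks would be needed"
    return sum(rank[x] for x in xs)

no_caffeine = [242.8, 245.3, 244.0, 240.2, 247.1, 248.3, 241.7, 244.7, 246.5, 240.4]
caffeine = [246.4, 251.1, 250.2, 252.3, 248.0, 250.9, 246.1, 248.2, 245.6, 250.0]
m, n = len(no_caffeine), len(caffeine)
N = m + n
w = wmw_statistic(no_caffeine, caffeine)
mean, var = m * (N + 1) / 2, m * n * (N + 1) / 12
z = (w - mean) / math.sqrt(var)
p = 2 * NormalDist().cdf(-abs(z))     # two-sided, normal approximation
print(w, mean, var)                   # 66 105.0 175.0
print(w - m * (m + 1) // 2)           # Mann-Whitney U = 11
print(round(z, 3), p)                 # z = -2.948
```

The small value of WN (the X-observations occupy mostly low ranks) points to lower typing speeds without caffeine.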

The test by van der Waerden

The van der Waerden test relies on a special linear rank statistic with scores $a_i = \Phi^{-1}\left(\frac{i}{N+1}\right)$. Here, Φ is the distribution function of the standard normal distribution. This leads to the test statistic

$VW_N = \sum_{i=1}^{N} \Phi^{-1}\left(\frac{i}{N+1}\right) \cdot V_i = \sum_{j=1}^{m} \Phi^{-1}\left(\frac{r(X_j)}{N+1}\right)$

Critical values can again be obtained by using the general results for linear rank statistics.
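A sketch of the van der Waerden statistic for the coffee data, standardized with the general moment formulas for linear rank statistics (the scores are symmetric, so their sum and hence E(VWN) vanish up to rounding). The function name is our own, and the sketch assumes no ties:

```python
import math
from statistics import NormalDist

def van_der_waerden(xs, ys):
    """VW_N = sum_j Phi^{-1}(r(X_j)/(N+1)) and its standardized version (no ties)."""
    m, n = len(xs), len(ys)
    N = m + n
    pooled = sorted(xs + ys)
    rank = {v: r for r, v in enumerate(pooled, start=1)}
    phi_inv = NormalDist().inv_cdf
    scores = [phi_inv(i / (N + 1)) for i in range(1, N + 1)]      # a_1, ..., a_N
    vw = sum(scores[rank[x] - 1] for x in xs)
    # General moments of a linear rank statistic under H0
    s1, s2 = sum(scores), sum(a * a for a in scores)
    mean = m / N * s1
    var = m * n / (N ** 2 * (N - 1)) * (N * s2 - s1 ** 2)
    return vw, (vw - mean) / math.sqrt(var)

no_caffeine = [242.8, 245.3, 244.0, 240.2, 247.1, 248.3, 241.7, 244.7, 246.5, 240.4]
caffeine = [246.4, 251.1, 250.2, 252.3, 248.0, 250.9, 246.1, 248.2, 245.6, 250.0]
vw, z = van_der_waerden(no_caffeine, caffeine)
print(vw, z)    # roughly -5.8 and -2.9
```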

Both tests mentioned in this section have considerable power for detecting shifts in location. Asymptotic relative efficiencies are calculated with respect to restricted, parametrized classes of alternatives $H_1: F_X(t) = F_Y(t - \delta)$ for some $\delta \in \mathbb{R}$ and all $t \in \mathbb{R}$.

Power for detecting alternatives:
• Parametric t-test: Additional assumption: normal distributions with equal variances, $X \sim N(\mu_1, \sigma^2)$ and $Y \sim N(\mu_2, \sigma^2)$ ⇒ two-sample t-test with test statistic
$$T = \frac{\bar X - \bar Y}{S\sqrt{1/n + 1/m}},$$
where $S^2$ is the pooled sample variance of the two samples.

Under $H_0$ the statistic T follows a Student t-distribution with N − 2 degrees of freedom (rejection of $H_0$ if |T| is too large).
• The asymptotic relative efficiency of the Wilcoxon-Mann-Whitney test relative to the t-test is 0.955 if the underlying distributions are normal. The Wilcoxon-Mann-Whitney test is more efficient than the t-test for strongly skewed or heavy-tailed distributions. A lower bound for the asymptotic relative efficiency is 0.864; an upper bound does not exist.
• Assuming normal distributions, the asymptotic relative efficiency of the van der Waerden test relative to the t-test is equal to 1. If the distributions have heavy tails, then the Wilcoxon-Mann-Whitney test is more powerful than the van der Waerden test.
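The efficiency figures above can be checked numerically with the classical Hodges-Lehmann formula $\mathrm{ARE} = 12\sigma^2\left(\int f^2(x)\,dx\right)^2$ for the shift model $F_X(t) = F_Y(t-\delta)$, where f is the density and $\sigma^2$ the variance. This formula is not derived in these notes and is used here as an outside assumption; the sketch integrates $f^2$ with a Simpson rule and compares the normal case with the heavier-tailed Laplace distribution.

```python
import math


def simpson(g, a, b, steps=20_000):
    """Composite Simpson rule on [a, b] (steps must be even)."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def are_wilcoxon_vs_t(density, variance, limit=40.0):
    """ARE = 12 * sigma^2 * (integral of f^2)^2 for shift alternatives."""
    int_f2 = simpson(lambda t: density(t) ** 2, -limit, limit)
    return 12 * variance * int_f2 ** 2


def normal(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def laplace(t):
    return 0.5 * math.exp(-abs(t))  # heavier tails than the normal; variance 2


are_normal = are_wilcoxon_vs_t(normal, 1.0)    # close to 3/pi, about 0.955
are_laplace = are_wilcoxon_vs_t(laplace, 2.0)  # close to 1.5
print(round(are_normal, 3), round(are_laplace, 3))
```

A value above 1 for the Laplace distribution illustrates the statement that the rank test gains efficiency under heavy tails.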

Scale alternatives: There are also rank tests which are specialized to detect whether one random variable is more dispersed than the other (scale alternative). Such tests rely on the assumption that the centers of the distributions are equal, i.e., $\mu_{X,med} = \mu_{Y,med}$ (which may be tested using a location test). Test statistics are linear rank statistics which assign small values $a_i$ to very small and very large observations, and large values $a_i$ to observations in the center of the distribution. The best known test in this context is the Siegel-Tukey test. It is based on the test statistic

$$S_N = \sum_{i=1}^{N} a_i \cdot V_i,$$
where the weights $a_i$ are calculated as follows:

$$a_1 = 1,\ a_N = 2,\ a_{N-1} = 3,\ a_2 = 4,\ a_3 = 5,\ a_{N-2} = 6,\ a_{N-3} = 7,\ a_4 = 8,\ a_5 = 9,\ a_{N-4} = 10,\ \dots$$

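Since the scores $a_1, \dots, a_N$ are a permutation of $1, \dots, N$, the null distribution of $S_N$ coincides with that of $W_N$; hence the critical values of the Siegel-Tukey test coincide with the critical values of the Wilcoxon-Mann-Whitney test.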
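The score pattern above (1 to the smallest observation, then alternately two scores to the largest remaining and two to the smallest remaining observations) can be generated as in the following sketch. It assumes no ties; for odd N some texts use a slightly different convention for the middle observation, so treat the handling of that case here as an assumption. The function names are hypothetical.

```python
def siegel_tukey_scores(N):
    """Scores a_1, ..., a_N: a_1 = 1, a_N = 2, a_{N-1} = 3, a_2 = 4, a_3 = 5, ..."""
    scores = [0] * (N + 1)  # 1-based positions; index 0 is unused
    lo, hi, score = 1, N, 1
    scores[lo] = score      # the smallest observation gets score 1
    lo, score = lo + 1, score + 1
    take_high = True        # then alternate: two from the top, two from the bottom, ...
    while lo <= hi:
        for _ in range(2):
            if lo > hi:
                break
            if take_high:
                scores[hi] = score
                hi -= 1
            else:
                scores[lo] = score
                lo += 1
            score += 1
        take_high = not take_high
    return scores[1:]


def siegel_tukey_statistic(x, y):
    """S_N = sum of the scores a_i over pooled positions occupied by the x-sample (no ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    a = siegel_tukey_scores(len(pooled))
    return sum(s for s, (_, group) in zip(a, pooled) if group == 0)


print(siegel_tukey_scores(10))  # [1, 4, 5, 8, 9, 10, 7, 6, 3, 2]
```

A small value of $S_N$ indicates that the x-observations sit in both tails of the pooled sample, i.e. that X is more dispersed than Y.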

1.7 Multiple comparisons

In statistics, the multiple comparisons (multiplicity, multiple testing) problem occurs when one considers a set of statistical inferences simultaneously, or infers a subset of parameters selected on the basis of the observed values. Errors in inference, including confidence intervals that fail to include their corresponding population parameters and hypothesis tests that incorrectly reject the null hypothesis, are more likely to occur when one considers the set as a whole.
This is an important, although largely ignored, problem in applied econometric work. In empirical studies often dozens or even hundreds of tests are performed on the same data set. When searching for significant test results, one may come up with false discoveries.
• Multiple tests: In a study many different tests are performed simultaneously.
• Example: m different, independent tests of significance level α > 0 (independence means that the test statistics used are mutually independent; this is usually not true in practice).

Assume that the respective null hypothesis $H_0$ holds for each of the m tests. Then
$$P(\text{type I error by at least one of the } m \text{ tests}) = 1 - (1-\alpha)^m =: \alpha_m > \alpha \quad \text{for } m > 1.$$

For α = 0.05:

  m      α_m
  1      0.05
  3      0.143
  5      0.226
  10     0.401
  100    0.994 (!)

⇒ Interpretation of significant results?

• Analogous problem: Construction of m independent (1 − α)-confidence intervals:
$$P(\text{at least one of the } m \text{ confidence intervals does not contain the true parameter value}) = 1 - (1-\alpha)^m > \alpha$$
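The family-wise error probability $\alpha_m = 1-(1-\alpha)^m$ for independent tests is simple enough to tabulate directly; the helper name below is hypothetical.

```python
def familywise_error(alpha, m):
    """P(at least one type I error) for m independent level-alpha tests when all H0 hold."""
    return 1 - (1 - alpha) ** m


for m in (1, 3, 5, 10, 100):
    print(m, round(familywise_error(0.05, m), 3))
```

The printed values reproduce the table for α = 0.05.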

This represents the general problem of multiple comparisons. In practice, it will not be true that all test statistics used are mutually independent. This further complicates the problem. We will still have the effect that the probability of at least one falsely significant result increases with the number m of tests, but it will not be equal to $1 - (1-\alpha)^m$.
A statistically rigorous solution of this problem consists in modifying the construction of tests or confidence intervals in order to arrive at simultaneous tests or simultaneous confidence intervals:

$$P(\text{type I error by at least one of the } m \text{ tests}) \le \alpha$$

or
$$P(\text{all confidence intervals simultaneously contain the true parameter values}) \ge 1 - \alpha.$$

For certain problems (e.g. analysis of variance) there exist specific procedures for constructing simultaneous confidence intervals. The only generally applicable procedure seems to be the Bonferroni correction. It is based on Boole's inequality.

Theorem (Boole): Let $A_1, A_2, \dots, A_m$ denote m different events. Then

$$P(A_1 \cup A_2 \cup \dots \cup A_m) \le \sum_{i=1}^{m} P(A_i).$$

Applying this inequality to the complementary events $\bar A_i$ ("not $A_i$") yields
$$P(A_1 \cap A_2 \cap \dots \cap A_m) = 1 - P(\bar A_1 \cup \dots \cup \bar A_m) \ge 1 - \sum_{i=1}^{m} P(\bar A_i).$$

Application: Bonferroni adjustment
• m different tests, each of level $\alpha^* = \frac{\alpha}{m}$:
$$P(\text{type I error by at least one of the } m \text{ tests}) \le \sum_{i=1}^{m} \frac{\alpha}{m} = \alpha$$

• Analogously: Construction of m $(1-\alpha^*)$-confidence intervals, $\alpha^* = \frac{\alpha}{m}$:
$$P(\text{all confidence intervals simultaneously contain the true parameter values}) \ge 1 - \sum_{i=1}^{m} \frac{\alpha}{m} = 1 - \alpha$$
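A small Monte Carlo sketch illustrates the effect of the Bonferroni adjustment. It assumes m independent tests whose null hypotheses all hold, so that the p-values are i.i.d. U(0, 1); independence is only a convenient special case here, since Boole's inequality does not need it.

```python
import random


def simulated_fwer(m, alpha, reps=20_000, seed=0, bonferroni=False):
    """Monte Carlo family-wise error rate for m independent tests with all H0 true."""
    rng = random.Random(seed)
    level = alpha / m if bonferroni else alpha  # Bonferroni: each test at level alpha/m
    hits = 0
    for _ in range(reps):
        if any(rng.random() <= level for _ in range(m)):  # at least one false rejection
            hits += 1
    return hits / reps


unadjusted = simulated_fwer(10, 0.05)                  # close to 1 - 0.95**10 = 0.401
adjusted = simulated_fwer(10, 0.05, bonferroni=True)   # at most about 0.05
print(unadjusted, adjusted)
```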

Example: Regression analysis
For n = 40 US corporations a multiple regression model is used to model the observed return on capital Y as a function of 12 explanatory variables. After eliminating two outliers, the following table provides the results of the regression analysis.

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   0.24883     0.14386    1.730   0.09603  .
WCFTCL        1.11519     0.36955    3.018   0.00579  **
WCFTDT       -0.21457     0.39528   -0.543   0.59206
GEARRAT      -0.01992     0.10610   -0.188   0.85261
LOGSALE       0.49969     0.18335    2.725   0.01156  *
LOGASST      -0.48743     0.17500   -2.785   0.01005  *
NFATAST      -0.30425     0.15446   -1.970   0.06003  .
CAPINT       -0.08022     0.03706   -2.165   0.04017  *
FATTOT       -0.11086     0.09125   -1.215   0.23571
INVTAST       0.23047     0.23588    0.977   0.33790
PAYOUT        0.00168     0.01717    0.098   0.92284
QUIKRAT       0.08012     0.10827    0.740   0.46617
CURRAT       -0.18976     0.09244   -2.053   0.05070  .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0552 on 25 degrees of freedom
Multiple R-Squared: 0.6958, Adjusted R-squared: 0.5498
F-statistic: 4.765 on 12 and 25 DF, p-value: 0.0004878
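One way to apply the Bonferroni adjustment to this table is sketched below. Treating the 12 slope tests as the family (and leaving out the intercept) is an assumption made for illustration; the p-values are copied from the regression output.

```python
# Two-sided p-values of the 12 slope coefficients, copied from the regression output
p_values = {
    "WCFTCL": 0.00579, "WCFTDT": 0.59206, "GEARRAT": 0.85261, "LOGSALE": 0.01156,
    "LOGASST": 0.01005, "NFATAST": 0.06003, "CAPINT": 0.04017, "FATTOT": 0.23571,
    "INVTAST": 0.33790, "PAYOUT": 0.92284, "QUIKRAT": 0.46617, "CURRAT": 0.05070,
}
alpha = 0.05
m = len(p_values)

unadjusted = sorted(k for k, p in p_values.items() if p <= alpha)
bonferroni = sorted(k for k, p in p_values.items() if p <= alpha / m)   # level 0.05/12
adjusted_p = {k: min(1.0, m * p) for k, p in p_values.items()}          # Bonferroni-adjusted p-values

print("significant at 5% without correction:", unadjusted)
print("significant after Bonferroni:", bonferroni)
print("smallest adjusted p-value:", round(min(adjusted_p.values()), 4))
```

Four coefficients are individually significant at the 5% level, but none survives the Bonferroni threshold 0.05/12 ≈ 0.0042; the smallest adjusted p-value is about 0.069.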

1.8 Maxima of a finite sequence of random variables

The problem of multiple comparisons is closely connected with the problem of bounding $\max_{i=1,\dots,m}|X_i|$ for a sequence of random variables. When the probability distribution of each random variable $X_i$ is known, Boole's inequality can be used in order to obtain (fairly rough) stochastic bounds for $\max_{i=1,\dots,m}|X_i|$. More precise results can, however, be obtained for some practically important special cases.

1.8.1 Maximum of a sample of bounded random variables

Let $X_1, \dots, X_n$ be an i.i.d. sample, and assume that for some (unknown) $\theta \in \mathbb{R}$ the underlying distribution possesses a density f with the following properties:
• $f(\theta) > 0$ and $f(x) = 0$ for all $x > \theta$
• For some ϵ > 0, f is continuous in the interval $[\theta - \epsilon, \theta]$.
This implies that all possible values of X are bounded by θ: $P(X > \theta) = 0$.
Problem: Estimate θ.
A natural estimator of θ is given by
$$\hat\theta_n := X_{(n)} = \max_{i=1,\dots,n} X_i.$$

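Note that with probability 1 we have $\hat\theta_n \le \theta$; $\hat\theta_n$ is therefore a (downward) biased estimator of θ. It is fairly straightforward to derive the asymptotic distribution of $\hat\theta_n$. For any c > 0 we obtain
$$P(n(\theta - \hat\theta_n) \le c) = P\left(X_i \in [\theta - \tfrac{c}{n}, \theta] \text{ for some } i = 1, \dots, n\right) = 1 - P\left(\sum_{i=1}^{n} I\left(X_i \in [\theta - \tfrac{c}{n}, \theta]\right) = 0\right).$$
But $\sum_{i=1}^{n} I(X_i \in [\theta - \frac{c}{n}, \theta])$ has a binomial distribution with parameters n and $P(X \in [\theta - \frac{c}{n}, \theta])$. Therefore,
$$P(n(\theta - \hat\theta_n) \le c) = 1 - \left(1 - P\left(X \in [\theta - \tfrac{c}{n}, \theta]\right)\right)^n.$$
But as $n \to \infty$, $P(X \in [\theta - \frac{c}{n}, \theta]) = f(\theta)\frac{c}{n} + o(\frac{1}{n})$. Furthermore, it is well known that for any λ > 0
$$\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^n = \exp(-\lambda),$$
and consequently,
$$\lim_{n\to\infty}\left(1 - P\left(X \in [\theta - \tfrac{c}{n}, \theta]\right)\right)^n = \lim_{n\to\infty}\left(1 - \frac{f(\theta)c}{n}\right)^n = \exp(-f(\theta)c).$$
We can conclude that as $n \to \infty$ the asymptotic distribution of $n(\theta - \hat\theta_n)$ is an exponential distribution with parameter (rate) f(θ):
$$n(\theta - \hat\theta_n) \to_D \mathrm{Exp}(f(\theta)).$$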
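A quick simulation makes the exponential limit visible. It assumes, purely for illustration, that $X_i \sim U(0, \theta)$ with θ = 2, so that $f(\theta) = 1/2$ and the limit $\mathrm{Exp}(1/2)$ has mean 2.

```python
import math
import random


def scaled_gaps(theta=2.0, n=500, reps=1000, seed=1):
    """Draws of n * (theta - max X_i) for X_i ~ U(0, theta), so f(theta) = 1/theta."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(reps):
        theta_hat = max(rng.uniform(0.0, theta) for _ in range(n))
        gaps.append(n * (theta - theta_hat))
    return gaps


theta = 2.0
f_theta = 1 / theta
gaps = scaled_gaps(theta)
mean_gap = sum(gaps) / len(gaps)          # Exp(f(theta)) has mean 1/f(theta) = 2
c = 1.0
empirical_cdf = sum(g <= c for g in gaps) / len(gaps)
limit_cdf = 1 - math.exp(-f_theta * c)    # limiting P(n(theta - theta_hat) <= c)
print(round(mean_gap, 3), round(empirical_cdf, 3), round(limit_cdf, 3))
```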

This type of problem is quite important in economics. The above setup represents a simple form of an extreme value problem; such problems are, for example, important in finance. The estimation of (conditional) maxima of observations is the subject of production frontier analysis. The setup of frontier analysis can be described as follows: In an industrial sector there are usually a large number of competing companies. Each firm produces a production output X on the basis of several production inputs $z \in \mathbb{R}^p$. For a given input vector z there is a maximal output g(z) which can be produced based on the current state of technology. The function g(z) is called the production function. A firm with input vector z is efficient if its output equals g(z), and it is (to some degree) inefficient if the output is smaller than g(z). For a sample of measured production outputs $X_1, \dots, X_n$ the basic model can then be written as

$$X_i = g(Z_i) + u_i, \quad i = 1, \dots, n,$$
where $u_i$ is a non-positive random variable, i.e. $P(u_i \le 0) = 1$, which measures the degree of inefficiency. This in turn implies that $P(X_i > g(z) \mid Z_i = z) = 0$. The situation described above corresponds to the trivial case p = 0 with no inputs and $g(z) \equiv \theta$ being a fixed constant. In practice, there will of course always exist a number p > 0 of important input variables, which leads to the much more complicated problem of estimating conditional maxima. Different estimation methods (e.g. data envelopment analysis) have been developed in deterministic frontier analysis. Procedures of stochastic frontier analysis are based on a variant of the above model which adds a normally distributed measurement error $\epsilon_i$, i.e. it is assumed that $X_i = g(Z_i) + u_i + \epsilon_i$, $i = 1, \dots, n$. For an overview see e.g.
• Cooper, Seiford and Tone (2006): Introduction to data envelopment analysis and its uses, Springer Verlag
• Kumbhakar and Lovell (2000): Stochastic frontier analysis, Cambridge University Press

[Figure: Example of stochastic frontier analysis (p = 1)]

1.8.2 Maximum of normal variables

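Let $X_1, \dots, X_m$ be a collection of standard normal random variables, i.e. $X_i \sim N(0, 1)$. Note that it is not assumed that the variables are independent.
Problem: Establish a bound for $\sup_{i=1,\dots,m}|X_i|$ which is valid for large m.
We first establish a simple tail bound for a standard normal variable X: For any c > 0, using $t/c \ge 1$ for $t \ge c$,
$$P(X \ge c) = \frac{1}{\sqrt{2\pi}}\int_c^\infty \exp\left(-\frac{t^2}{2}\right)dt \le \frac{1}{\sqrt{2\pi}}\int_c^\infty \frac{t}{c}\exp\left(-\frac{t^2}{2}\right)dt = \frac{1}{c\sqrt{2\pi}}\left[-\exp\left(-\frac{t^2}{2}\right)\right]_c^\infty = \frac{1}{c\sqrt{2\pi}}\exp\left(-\frac{c^2}{2}\right).$$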

Let A be some constant with $A > \sqrt{2}$. Using Boole's inequality (together with $P(|X_i| > c) = 2P(X_i > c)$) we can then infer from the above bound that
$$\sup_{i=1,\dots,m}|X_i| \le A\sqrt{\log m}$$
holds with probability at least
$$1 - \frac{2}{A\sqrt{\log m}\sqrt{2\pi}}\, m^{-\frac{A^2}{2}+1}.$$
Note that as $m \to \infty$ this probability converges to 1.
This bound is heavily used in wavelet regression and in high-dimensional model selection procedures like the Lasso. For example, assume a standard linear regression model with normal errors and a very large number $m \approx n$ of explanatory variables. For the estimated regression coefficients $\hat\beta_j$ we have
$$\frac{\sqrt{n}(\hat\beta_j - \beta_j)}{\sigma\sqrt{q_{jj}}} \sim N(0, 1), \quad j = 1, \dots, m,$$
where $\sigma^2$ is the error variance, and $q_{jj}$ is the j-th diagonal element of the matrix $(\frac{1}{n}X^TX)^{-1}$, where in this case X is the $n \times m$ dimensional matrix of regressors. Hence, whenever $\beta_j = 0$ we have
$$\frac{\sqrt{n}\hat\beta_j}{\sigma\sqrt{q_{jj}}} \sim N(0, 1),$$
and the above bound implies that
$$\frac{\sqrt{n}|\hat\beta_j|}{\sigma\sqrt{q_{jj}}} \le A\sqrt{\log m} \quad \text{for all } j \in \{1, \dots, m\} \text{ with } \beta_j = 0$$
holds with high probability if m is large.
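The two bounds of this subsection can be checked numerically: the normal tail bound against the exact tail probability, and the Boole-based coverage probability of $A\sqrt{\log m}$ against a simulation. The simulation uses independent standard normals, which is only one admissible special case since the bound does not require independence; A = 2 and m = 100 are illustrative choices.

```python
import math
import random


def normal_upper_tail(c):
    """Exact P(X >= c) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(c / math.sqrt(2))


def tail_bound(c):
    """Upper bound exp(-c^2/2) / (c * sqrt(2*pi)) for c > 0."""
    return math.exp(-c * c / 2) / (c * math.sqrt(2 * math.pi))


def coverage_lower_bound(A, m):
    """Boole-based lower bound for P(max_i |X_i| <= A * sqrt(log m)), m >= 2."""
    c = A * math.sqrt(math.log(m))
    return 1 - 2 / (c * math.sqrt(2 * math.pi)) * m ** (1 - A * A / 2)


def simulated_coverage(A, m, reps=2000, seed=3):
    """Monte Carlo coverage with independent N(0, 1) variables."""
    rng = random.Random(seed)
    threshold = A * math.sqrt(math.log(m))
    inside = sum(max(abs(rng.gauss(0.0, 1.0)) for _ in range(m)) <= threshold for _ in range(reps))
    return inside / reps


for c in (0.5, 1, 2, 3, 5):
    print(c, normal_upper_tail(c), tail_bound(c))  # the bound becomes tight as c grows
print(coverage_lower_bound(2.0, 100), simulated_coverage(2.0, 100))
```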

1.9 More on quantiles

Quantiles and quantile regression are an important empirical tool in risk analysis. For non-normal data quantile regression oﬀers a robust alternative to usual least squares methods.

1.9.1 The check function

It is well known that if $E(X^2) < \infty$ the mean µ = E(X) is obtained by minimizing squared loss:
$$\mu = \arg\min_{c\in\mathbb{R}} E\left((X - c)^2\right).$$

If $E(|X|) < \infty$, then the median is obtained by minimizing $L_1$-loss (absolute deviations):
$$\mu_{med} = \arg\min_{c\in\mathbb{R}} E(|X - c|).$$
The condition $E(|X|) < \infty$ can be avoided by rewriting the minimization problem in the (otherwise equivalent) form
$$\mu_{med} = \arg\min_{c\in\mathbb{R}} E(|X - c| - |X|).$$
Note that $E(|X - c| - |X|)$ is finite for any real-valued random variable X and every $c \in \mathbb{R}$, since $\big||X - c| - |X|\big| \le |c|$.
In general, for every τ ∈ (0, 1) the τ-quantile Q(τ) can be obtained by minimizing expected loss with respect to an asymmetric linear loss function based on the

Check function: $\rho_\tau(u) = (\tau - I(u < 0))\,u$, $u \in \mathbb{R}$

Q(τ) then minimizes
$$V_\tau(q) := E(\rho_\tau(X - q)) = \tau E(|X - q| \cdot I(X > q)) + (1 - \tau)E(|X - q| \cdot I(X < q)) = \tau\int_{x>q}|x - q|\,dF(x) + (1 - \tau)\int_{x<q}|x - q|\,dF(x).$$
Moment conditions on X can be avoided by formally considering the (otherwise equivalent) problem of minimizing $E(\rho_\tau(X - q)) - E(\rho_\tau(X))$ with respect to q.
In order to verify that Q(τ) is indeed the minimizer of the above minimization problem, let us analyze the structure of $V_\tau(q)$. The following arguments also apply to the modified version $V_\tau(q) = E(\rho_\tau(X - q)) - E(\rho_\tau(X))$ (to be used if E(|X|) does not exist). It is easily seen that

• Vτ (q) = E (ρτ (X − q)) is a continuous function of q.

• If F is continuous at q, then $V_\tau$ is differentiable at q.
• If P(X = q) > 0, then F(u) has a jump at u = q, while $V_\tau(u)$ has a kink (i.e. is not differentiable) at u = q. One can, however, always define directional derivatives, i.e. right and left derivatives, by considering the limits of $V_\tau(q + |\Delta|)$ and $V_\tau(q - |\Delta|)$ as Δ → 0. More precisely, for the right derivative we have
$$\frac{\partial V_\tau(q)}{\partial q^+} = \lim_{\Delta\to 0}\frac{V_\tau(q + |\Delta|) - V_\tau(q)}{|\Delta|} = -\tau P(X > q) + (1 - \tau)P(X \le q) = -\tau + P(X \le q),$$
while the left derivative is given by
$$\frac{\partial V_\tau(q)}{\partial q^-} = \lim_{\Delta\to 0}\frac{V_\tau(q - |\Delta|) - V_\tau(q)}{-|\Delta|} = -\tau P(X \ge q) + (1 - \tau)P(X < q) = -\tau + P(X < q).$$

Now let Q(τ) denote a τ-quantile of X. Note that for any $q \in \mathbb{R}$ with $F(q) = P(X \le q) > F(Q(\tau)) \ge \tau$ we necessarily have q > Q(τ), and hence also $P(X < q) \ge F(Q(\tau)) \ge \tau$. Therefore,

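• $\frac{\partial V_\tau(q)}{\partial q^+} > 0$ and $\frac{\partial V_\tau(q)}{\partial q^-} \ge 0$ for any $q \in \mathbb{R}$ with $F(q) > F(Q(\tau))$

• $\frac{\partial V_\tau(q)}{\partial q^+} < 0$ and $\frac{\partial V_\tau(q)}{\partial q^-} < 0$ for any $q \in \mathbb{R}$ with $F(q) < F(Q(\tau))$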

This implies that Q(τ) minimizes Vτ (q). If X is a continuous random variable, then necessarily F (Q(τ)) = τ, and any solution hence satisﬁes the ﬁrst order condition

$$0 = -\tau + P(X \le Q(\tau)) = -\tau + F(Q(\tau)).$$

Recall from the deﬁnition of quantiles that the solution is not necessarily unique. If F has constant segments, there may exist an interval of possible values for Q(τ). But Q(τ) is necessarily unique if F is continuous and if the corresponding density f satisﬁes f(Q(τ)) > 0.

Let $X_1, \dots, X_n$ be an i.i.d. random sample from X. The above arguments also imply that sample quantiles $Q_n(\tau)$, τ ∈ (0, 1), can be obtained by minimizing the expected check loss with respect to the empirical distribution function $F_n$. Any possible value $Q_n(\tau)$ minimizes
$$V_{\tau,n}(q) := \frac{1}{n}\sum_{i=1}^{n}\rho_\tau(X_i - q) = \tau\frac{1}{n}\sum_{i: X_i > q}|X_i - q| + (1 - \tau)\frac{1}{n}\sum_{i: X_i < q}|X_i - q| = \tau\int_{x>q}|x - q|\,dF_n(x) + (1 - \tau)\int_{x<q}|x - q|\,dF_n(x).$$
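The empirical check-loss characterization of sample quantiles can be sketched directly. Because $V_{\tau,n}$ is convex and piecewise linear with kinks at the observations, a minimizer is always attained at a data point, so it suffices to search over the sample; the function names are hypothetical.

```python
def check_loss(u, tau):
    """rho_tau(u) = (tau - I(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def empirical_check_risk(q, xs, tau):
    """V_{tau,n}(q) = (1/n) * sum_i rho_tau(X_i - q)."""
    return sum(check_loss(x - q, tau) for x in xs) / len(xs)


def sample_quantile_by_check_loss(xs, tau):
    """Smallest data point minimizing V_{tau,n} (min returns the first minimizer found)."""
    return min(sorted(xs), key=lambda q: empirical_check_risk(q, xs, tau))


xs = list(range(1, 11))
print(sample_quantile_by_check_loss(xs, 0.25))  # unique minimizer
print(sample_quantile_by_check_loss(xs, 0.5))   # every point of [5, 6] minimizes; 5 is returned
```

The τ = 0.5 case shows the non-uniqueness mentioned above: for an even sample size the whole interval between the two middle observations minimizes the empirical check loss.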

Assume that the distribution of X possesses a density f with f(Q(τ)) > 0. Then Q(τ) is unique, and it can be shown that $Q_n(\tau)$ is a consistent estimator of Q(τ). Furthermore,
$$\sqrt{n}(Q_n(\tau) - Q(\tau)) \to_D N\left(0, \frac{\tau(1 - \tau)}{f(Q(\tau))^2}\right).$$
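For the sample median of standard normal data (τ = 1/2, Q(τ) = 0, f(0) = 1/√(2π)) the asymptotic variance equals π/2 ≈ 1.571. The following Monte Carlo sketch compares this value with n times the simulated variance of the sample median; the sample size, number of replications and seed are illustrative choices.

```python
import math
import random
import statistics


def scaled_median_variance(n=401, reps=2000, seed=7):
    """n * Var(sample median) for N(0, 1) samples, estimated by Monte Carlo."""
    rng = random.Random(seed)
    medians = [statistics.median([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    return n * statistics.pvariance(medians)


tau = 0.5
f_at_median = 1 / math.sqrt(2 * math.pi)           # N(0, 1) density at Q(0.5) = 0
asymptotic = tau * (1 - tau) / f_at_median ** 2    # = pi / 2
simulated = scaled_median_variance()
print(round(asymptotic, 3), round(simulated, 3))
```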

1.9.2 Quantile regression

Quantile regression plays an increasingly important role in econometrics. It opens a way to explore regression relationships in depth. Much more information can be obtained than by using traditional least squares regression, which only aims to quantify a conditional mean. Furthermore, a crucial property is robustness. In particular, median regression is preferable to least squares regression when dealing with heavy-tailed distributions.

Assume an i.i.d. sample $(Y_1, X_1), \dots, (Y_n, X_n)$, where $Y_i \in \mathbb{R}$ is a response variable of interest, while $X_i \in \mathbb{R}^k$ is a vector of explanatory variables. We are now interested in determining quantiles of the conditional distribution of Y given X. For any vector $x \in \mathbb{R}^k$ there is a conditional distribution function $F_{Y|X=x}(y) = P(Y \le y \mid X = x)$

and a corresponding conditional quantile function $Q_{Y|X=x}(\tau)$, τ ∈ (0, 1). In the following we will assume that all conditional distribution functions are absolutely continuous, i.e. that conditional densities $f_{Y|X=x}(\cdot)$ exist.

Note that if Y and X are independent, then $F_{Y|X=x} = F_Y$ and $Q_{Y|X=x}(\cdot) = Q_Y(\cdot)$ for all $x \in \mathbb{R}^k$, where $F_Y$ and $Q_Y$ denote the (marginal) distribution and quantile functions of Y, respectively.

Otherwise, FY |X=x and QY |X=x will depend on the value X = x. Standard quantile regression now rests upon the assumption that for a given τ ∈ (0, 1)

$$Q_{Y|X=x}(\tau) = x^T\beta_\tau \quad \text{for some } \beta_\tau \in \mathbb{R}^k$$

If this assumption holds for all τ ∈ (0, 1), we arrive at the general model $Y_i = X_i^T\beta(Z_i)$, where the random variable $Z_i \sim U(0, 1)$ is independent of $X_i$, and $\beta: (0, 1) \to \mathbb{R}^k$ is a measurable function such that $\beta(\tau) = \beta_\tau$. Special cases:

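1) Simple OLS model with $X_i = (1, X_{i2}, \dots, X_{ik})^T$:
$$Y_i = \beta_1 + \sum_{j=2}^{k}\beta_j X_{ij} + \epsilon_i, \quad i = 1, \dots, n,$$
where $\epsilon_1, \dots, \epsilon_n$ are i.i.d. errors, independent of the regressors, with continuous strictly monotonically increasing distribution function $F_\epsilon$. Then
$$Q_{Y|X=X_i}(\tau) = \beta_1 + \sum_{j=2}^{k}\beta_j X_{ij} + F_\epsilon^{-1}(\tau) = \underbrace{\beta_1 + F_\epsilon^{-1}(\tau)}_{\beta_{\tau,1}} + \sum_{j=2}^{k}\beta_j X_{ij}.$$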

2) Heteroskedastic errors:
$$Y_i = \sum_{j=1}^{k}\beta_j X_{ij} + \Big(\sum_{j=1}^{k}\gamma_j X_{ij}\Big)\epsilon_i, \quad i = 1, \dots, n,$$
where $\epsilon_1, \dots, \epsilon_n$ are i.i.d. errors, independent of the regressors, with continuous strictly monotonically increasing distribution function $F_\epsilon$, and $\beta_j, \gamma_j \in \mathbb{R}$, $j = 1, \dots, k$, with $\sum_{j=1}^{k}\gamma_j X_{ij} > 0$. Then
$$Q_{Y|X=X_i}(\tau) = \sum_{j=1}^{k}\beta_j X_{ij} + \Big(\sum_{j=1}^{k}\gamma_j X_{ij}\Big)F_\epsilon^{-1}(\tau) = \sum_{j=1}^{k}\beta_{\tau,j}X_{ij}$$
for $\beta_{\tau,j} = \beta_j + \gamma_j F_\epsilon^{-1}(\tau)$, $j = 1, \dots, k$.
A remarkable property of this approach is its equivariance to monotonic transformations: For a nondecreasing function h we have $Q_{h(Y)|X=X_i}(\tau) = h(Q_{Y|X=X_i}(\tau))$. For example, if $\alpha_\tau + x^T\beta_\tau$ is the τ-th conditional quantile of log Y, then $\exp(\alpha_\tau + x^T\beta_\tau)$ is the τ-th conditional quantile of Y.
The coefficients $\beta_\tau \in \mathbb{R}^k$ can be estimated by using the check-function approach. Estimates $\hat\beta_\tau$ are determined by minimizing
$$V_{\tau,n}(\beta) := \sum_{i=1}^{n}\rho_\tau(Y_i - X_i^T\beta) = \tau\sum_{i: Y_i > X_i^T\beta}|Y_i - X_i^T\beta| + (1 - \tau)\sum_{i: Y_i < X_i^T\beta}|Y_i - X_i^T\beta|.$$
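For a single regressor plus intercept (k = 2) the minimization of $V_{\tau,n}(\beta)$ can be illustrated without a linear programming solver. The sketch below relies on the linear-programming fact that some optimal solution interpolates k observations, and therefore enumerates all lines through two observations; this brute-force approach is O(n³) and is only a stand-in for the efficient LP algorithms, suitable for tiny illustrative samples. It also computes the goodness-of-fit measure $R^1(\tau)$. The data are hypothetical: an exact line with one gross outlier.

```python
from itertools import combinations


def check_loss(u, tau):
    return (tau - (1.0 if u < 0 else 0.0)) * u


def objective(a, b, x, y, tau):
    """V_{tau,n}(beta) for the line a + b*x (no 1/n factor, as in the text)."""
    return sum(check_loss(yi - a - b * xi, tau) for xi, yi in zip(x, y))


def quantile_regression_line(x, y, tau):
    """Exact minimizer for k = 2 by enumerating lines through two observations."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        v = objective(a, b, x, y, tau)
        if best is None or v < best[0]:
            best = (v, a, b)
    return best[1], best[2], best[0]


def r1(x, y, tau):
    """R^1(tau) = 1 - V(beta_hat) / V~(Q_{y,n}(tau)); the intercept-only fit is a data point."""
    _, _, v_fit = quantile_regression_line(x, y, tau)
    v_const = min(sum(check_loss(yi - q, tau) for yi in y) for q in y)
    return 1 - v_fit / v_const


# Hypothetical data: y = 1 + 2x exactly, except for one gross outlier at x = 9
x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[-1] = 100

a, b, v = quantile_regression_line(x, y, tau=0.5)   # median regression ignores the outlier
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
ols_slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
print(a, b, round(ols_slope, 3), round(r1(x, y, 0.5), 3))
```

The median regression line recovers intercept 1 and slope 2, while the least squares slope is pulled to about 6.4 by the single outlier, illustrating the robustness property discussed below.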

• The structure of $V_{\tau,n}(\beta)$ is similar to the structure of the function $V_{\tau,n}(q)$ analyzed before. Since $V_{\tau,n}(\beta)$ is not everywhere differentiable with respect to β, the estimator does not have a closed analytical form. There exist, however, very efficient linear programming algorithms which allow one to determine estimates numerically.
• Since the loss function grows only linearly, $\hat\beta_\tau$ is much more robust to outliers than the least squares estimator.
• Quantile regression is not the same as regression based on split samples, because every quantile regression utilizes all sample data (with different weights). Thus, quantile regression also avoids the sample selection problem arising from sample splitting.
It is possible to calculate a measure of goodness-of-fit of a quantile regression model by generalizing the usual notion of $R^2$ and calculating
$$R^1(\tau) = 1 - \frac{V_{\tau,n}(\hat\beta_\tau)}{\tilde V_{\tau,n}(Q_{y,n}(\tau))},$$
where $Q_{y,n}(\tau)$ is the τ-th sample quantile of $Y_1, \dots, Y_n$, and $\tilde V_{\tau,n}(Q_{y,n}(\tau)) = \sum_{i=1}^{n}\rho_\tau(Y_i - Q_{y,n}(\tau))$.
Standard asymptotic theory for quantile regression has to be based on the following assumptions:

a) $(Y_1, X_1), \dots, (Y_n, X_n)$ is an i.i.d. random sample from (Y, X)
b) The regressors have finite second moments, i.e. $E(\|X_i\|_2^2) < \infty$

c) For any $x \in \mathbb{R}^k$, the conditional distribution of the "error terms" $\epsilon_i := Y_i - x^T\beta_\tau$ given X = x has a density $f_\tau(\epsilon \mid X = x)$ satisfying $\int_{-\infty}^{0} f_\tau(\epsilon \mid X = x)\,d\epsilon = \tau$. Note that necessarily $f_\tau(0 \mid X = x) = f_{Y|X=x}(x^T\beta_\tau)$.
d) The regressors and error density are such that the k × k matrix $C_\tau := E(f_\tau(0 \mid X_i)X_iX_i^T)$ is positive definite.
Under these conditions it can be shown that $\hat\beta_\tau$ is a (weakly) consistent estimator of $\beta_\tau$, and with $M := E(X_iX_i^T)$ we obtain
$$\sqrt{n}(\hat\beta_\tau - \beta_\tau) \to_D N\left(0, \tau(1 - \tau)C_\tau^{-1}MC_\tau^{-1}\right).$$

If $f_\tau(0 \mid X = x) \equiv f_\tau(0)$ does not depend on x, i.e., under conditional homogeneity, the result simplifies to
$$\sqrt{n}(\hat\beta_\tau - \beta_\tau) \to_D N\left(0, \frac{\tau(1 - \tau)}{f_\tau(0)^2}M^{-1}\right).$$
Inference is based either on estimates of the covariance matrix or (more frequently) on the bootstrap.
The difficulty in approximating the asymptotic covariance matrix consists in estimating the values $f_\tau(0 \mid X_i)$ of the conditional densities (or $f_\tau(0)$ under a homogeneity assumption). One possibility is to use (conditional) kernel density estimators. Other procedures have, for example, been proposed by Hendricks and Koenker (1991), or Powell (1991).
Let us finally compare the efficiency of ordinary least squares estimators with that of the estimators obtained by LAD (τ = 1/2) under the simple model

$$Y_i = X_i^T\beta + \epsilon_i, \quad i = 1, \dots, n,$$

with i.i.d. errors $\epsilon_1, \dots, \epsilon_n$ that are independent of the regressors. Also assume that the error distribution is symmetric around 0 and has a density f with f(0) > 0.

For τ = 1/2 we then have $\beta_\tau = \beta$. The model implies conditional homogeneity, and therefore
$$\sqrt{n}(\hat\beta_\tau - \beta) \to_D N\left(0, \frac{1}{4f(0)^2}M^{-1}\right).$$
Under the additional moment conditions necessary to derive asymptotics for the least squares estimator $\hat\beta$ we obtain
$$\sqrt{n}(\hat\beta - \beta) \to_D N\left(0, \sigma^2M^{-1}\right),$$

where $\sigma^2 = \mathrm{Var}(\epsilon_i)$.
For normally distributed errors the least squares estimator $\hat\beta$ is an asymptotically more efficient estimator than $\hat\beta_\tau$. In this case, $f(0) = \frac{1}{\sigma\sqrt{2\pi}}$, and hence the asymptotic variance of $\hat\beta_\tau$ is $\frac{2\pi\sigma^2}{4}M^{-1} \approx 1.57\sigma^2M^{-1}$, which is larger than the asymptotic variance of $\hat\beta$.
The situation changes, however, when considering heavy-tailed error distributions. Examples of heavy-tailed, symmetric distributions are the Student t(ν)-distributions for small values of ν.

• Assume that $\epsilon_i \sim t(1)$. This is, of course, a very extreme case, since the t(1)-distribution (which is also a particular case of the so-called Cauchy distribution) does not possess any moments. Even $E(|\epsilon_i|)$ does not exist. This implies that the ordinary least squares estimator is not even consistent in this case (no law of large numbers or central limit arguments apply). Median regression is still applicable and asymptotic normality still holds. We then have $f(0) = \frac{1}{\pi}$, so that the asymptotic variance of $\hat\beta_\tau$ equals $\frac{\pi^2}{4}M^{-1} \approx 2.47M^{-1}$.

• Assume that $\epsilon_i \sim t(3)$. Then $\mathrm{Var}(\epsilon_i) = 3$ and the asymptotic variance of $\hat\beta$ is thus equal to $3M^{-1}$. At the same time, we have $f(0) = \frac{6\sqrt{3}}{9\pi}$, and the asymptotic variance of $\hat\beta_\tau$ is approximately $1.85M^{-1} < 3M^{-1}$.
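The variance factors in this comparison follow from $1/(4f(0)^2)$ and the Student t density at zero, $f(0) = \Gamma(\frac{\nu+1}{2}) / (\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2}))$. A short sketch evaluates them (all values are multiples of $M^{-1}$; the normal case uses σ = 1 as an illustrative choice):

```python
import math


def t_density_at_zero(nu):
    """Density of the Student t(nu) distribution at 0."""
    return math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)


def lad_variance_factor(f0):
    """Asymptotic variance of the LAD estimator in units of M^{-1}: 1 / (4 f(0)^2)."""
    return 1 / (4 * f0 ** 2)


normal_f0 = 1 / math.sqrt(2 * math.pi)  # N(0, 1) errors
cases = {
    "N(0,1)": (lad_variance_factor(normal_f0), 1.0),              # (LAD, OLS)
    "t(3)": (lad_variance_factor(t_density_at_zero(3)), 3.0),
    "t(1)": (lad_variance_factor(t_density_at_zero(1)), None),    # OLS variance does not exist
}
for name, (lad, ols) in cases.items():
    ols_text = "not defined" if ols is None else f"{ols:.3f}"
    print(f"{name}: LAD {lad:.3f}, OLS {ols_text}")
```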

1.10 Appendix: Statistical test procedures

1.10.1 Basic concepts

Assume a random sample $X_1, \dots, X_n$ whose distribution depends on some unknown parameter $\theta \in \Omega$, where Ω is some parameter space.

General Testing problem:

H0 : θ ∈ Ω0 against H1 : θ ∈ Ω1.

$H_0$ is the null hypothesis, while $H_1$ is the alternative. $\Omega_0 \subset \Omega$ and $\Omega_1 \subset \Omega$ are used to denote the possible values of θ under $H_0$ and $H_1$, respectively. Necessarily, $\Omega_0 \cap \Omega_1 = \emptyset$.
For a large number of tests we have $\Omega = \mathbb{R}$, and the respective null hypothesis states that θ has a specific value $\theta_0 \in \mathbb{R}$, i.e., $\Omega_0 = \{\theta_0\}$ and $H_0: \theta = \theta_0$. Depending on the alternative one then often distinguishes between one-sided ($\Omega_1 = (\theta_0, \infty)$ or $\Omega_1 = (-\infty, \theta_0)$) and two-sided tests ($\Omega_1 = \{\theta \in \mathbb{R} \mid \theta \ne \theta_0\}$).
Statistical hypothesis testing: The data are used in order to decide whether to accept or to reject $H_0$.

Test statistic: Every hypothesis test relies on a corresponding test statistic $T = T(X_1, \dots, X_n)$. Any test statistic is a real-valued random variable, and for given observations the resulting observed value $T_{obs}$ is used to decide between $H_0$ and $H_1$. Generally, the distribution of T under $H_0$ is analyzed in order to define a rejection region C:

• Tobs ̸∈ C ⇒ H0 is not rejected

• Tobs ∈ C ⇒ H0 is rejected

Typically C is of the form $(-\infty, c_0]$, $[c_1, \infty)$ or $(-\infty, c_0] \cup [c_1, \infty)$. The limits of the respective intervals are called critical values, and are obtained from quantiles of the null distribution (null distribution ≡ distribution of T under $H_0$).

Type I error: H0 is rejected when it is true

Type II error: the test fails to reject a false H0

In a statistical signiﬁcance test, the probability of a type I error is controlled by the signiﬁcance level: Signiﬁcance level α (e.g. α = 5%)

P ( type I error ) = P (T ∈ C| H0 true) ≤ α

Note: $\sup_{\theta\in\Omega_0} P_\theta(T \in C)$ is called the size of the test. The preselected significance level is a bound for the size, which may not be attained if, e.g., the relevant probability function is discrete.
Practically important significance levels:
• α = 0.05: It is common to say that a test result is "significant" if a hypothesis test of level α = 0.05 rejects $H_0$.
• α = 0.01: It is common to say that a test result is "strongly significant" if a hypothesis test of level α = 0.01 rejects $H_0$.
Statistical software usually determines the p-value of a test.
p-value = probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true

Interpretation:
• The p-value is random since it depends on the observed data. Different random samples will lead to different p-values.
• For given data, having determined the p-value of a test we also know the test decisions for all possible levels α:

α > p-value ⇒ H0 is rejected

α < p-value ⇒ H0 is accepted

Example: $X \sim N(\mu, \sigma^2)$; observations from an i.i.d. sample of size n = 5:

$X_1 = 19.20$, $X_2 = 17.40$, $X_3 = 18.50$, $X_4 = 16.50$, $X_5 = 18.90$ ⇒ $\bar X = 18.1$

Testing problem:

$H_0: \mu = 17$ against $H_1: \mu \ne 17$ (two-sided test)
Since the variance is unknown, we have to use a t-test in order to test $H_0$. Test statistic of the t-test:
$$T = \frac{\sqrt{n}(\bar X - \mu_0)}{S},$$
where $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar X)^2$ is an unbiased estimator of $\sigma^2$.

$$T_{obs} = \frac{\sqrt{5}(18.1 - 17)}{1.125} = 2.187 \quad \Rightarrow \quad \text{p-value} = P(|T_{n-1}| \ge 2.187) = 0.094$$

t-test for different significance levels α:

α = 0.2 ⇒ 2.187 > t4,0.9 = 1.533 ⇒ H0 is rejected

α = 0.1 ⇒ 2.187 > t4,0.95 = 2.132 ⇒ H0 is rejected

α = 0.094 = p-value ⇒ 2.187 = t4,0.953 = 2.187 ⇒ H0 is rejected

α = 0.05 ⇒ 2.187 < t4,0.975 = 2.776 ⇒ H0 is accepted

α = 0.01 ⇒ 2.187 < t4,0.995 = 4.604 ⇒ H0 is accepted
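The numbers in this example can be reproduced with the standard library alone. In this sketch the two-sided p-value is obtained by integrating the Student t density with a Simpson rule instead of calling a statistics package; the helper names are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(x, nu):
    """Student t(nu) density."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def two_sided_p_value(t_obs, nu, steps=2000):
    """P(|T_nu| >= |t_obs|) = 1 - 2 * integral_0^{|t_obs|} t_pdf, via composite Simpson rule."""
    b = abs(t_obs)
    h = b / steps
    total = t_pdf(0.0, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 1 - 2 * total * h / 3


x = [19.20, 17.40, 18.50, 16.50, 18.90]
mu0 = 17.0
n = len(x)
t_obs = math.sqrt(n) * (mean(x) - mu0) / stdev(x)   # stdev uses the 1/(n-1) definition of S^2
p = two_sided_p_value(t_obs, n - 1)
decisions = {alpha: ("reject H0" if p <= alpha else "do not reject H0") for alpha in (0.2, 0.1, 0.05, 0.01)}
print(round(t_obs, 3), round(p, 3), decisions)
```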

1.10.2 The power function

For every possible value θ ∈ Ω0 ∪ Ω1, all sample sizes n and each signiﬁcance level α the corresponding value of the power function β is given by the probability

$$\beta_{n,\alpha}(\theta) := P(H_0 \text{ is rejected, if the true parameter value equals } \theta)$$

Obviously, $\beta_{n,\alpha}(\theta) \le \alpha$ for all $\theta \in \Omega_0$. For any $\theta \in \Omega_1$, $1 - \beta_{n,\alpha}(\theta)$ is the probability of committing a type II error. The power function is an important tool for assessing the quality of a test and for comparing different test procedures. Obviously, the power of a test depends on the true value $\theta \in \Omega$, the sample size n, and on the significance level α.
Some important terminology:
• If possible, a test is constructed in such a way that size equals level, i.e., $\beta_{n,\alpha}(\theta) = \alpha$ for some $\theta \in \Omega_0$. In some cases, however, as for discrete test statistics or complex, composite null hypotheses, it is not possible to reach the level, and $\sup_{\theta\in\Omega_0}\beta_{n,\alpha}(\theta) < \alpha$. In this case the test is called "conservative".
• Unbiased test: A significance test of level α > 0 is called "unbiased" if $\beta_{n,\alpha}(\theta) \ge \alpha$ for all $\theta \in \Omega_1$.
• Consistent test: A significance test of level α > 0 is called "consistent" if
$$\lim_{n\to\infty}\beta_{n,\alpha}(\theta) = 1$$
for all $\theta \in \Omega_1$.
When choosing between different testing procedures for the same testing problem, one will usually prefer the "most powerful" test. Consider a fixed sample size n.

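• For a specified $\theta \in \Omega_1$, a test with power function $\beta_{n,\alpha}(\theta)$ is said to be most powerful for θ if for any alternative test of the same level with power function $\beta^*_{n,\alpha}(\theta)$,
$$\beta_{n,\alpha}(\theta) \ge \beta^*_{n,\alpha}(\theta)$$
holds for all levels α > 0.

• A test with power function $\beta_{n,\alpha}(\theta)$ is said to be uniformly most powerful against the set of alternatives $\Omega_1$ if for any alternative test of the same level with power function $\beta^*_{n,\alpha}(\theta)$,
$$\beta_{n,\alpha}(\theta) \ge \beta^*_{n,\alpha}(\theta)$$
holds for all $\theta \in \Omega_1$, α > 0.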

Unfortunately, uniformly most powerful tests only exist for very special testing problems.

Example: Let $X_1, \dots, X_n$ be an i.i.d. random sample from X. Assume that n = 9, and that $X \sim N(\mu, 0.18^2)$. Hence, in this simple example only the mean µ = E(X) is unknown, while the standard deviation has the known value σ = 0.18.

Testing problem:

$H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$ for $\mu_0 = 18.3$. Since the variance is known, a test may rely on the test statistic
$$Z = \frac{\sqrt{n}(\bar X - \mu_0)}{\sigma} = \frac{3(\bar X - 18.3)}{0.18}$$

Under H0 we have Z ∼ N(0, 1), and for the signiﬁcance level α = 0.05 the null hypothesis is rejected if

|Z| ≥ z1−α/2 = 1.96

Here $z_{1-\alpha/2}$ denotes the corresponding quantile of the standard normal distribution. Note that the size of this test equals its level α = 0.05.
For determining the rejection region of a test it suffices to determine the distribution of the test statistic under $H_0$. But in order to calculate the power function one needs to quantify the distribution of the test statistic for all possible values θ ∈ Ω. For many important problems this is a formidable task. In our example it is, however, quite easy. Note that for any $\mu \in \mathbb{R}$ the corresponding distribution of $Z \equiv Z_\mu$ is
$$Z_\mu = \frac{\sqrt{n}(\mu - \mu_0)}{\sigma} + \frac{\sqrt{n}(\bar X - \mu)}{\sigma} \sim N\left(\frac{\sqrt{n}(\mu - \mu_0)}{\sigma}, 1\right).$$
This implies that, with Φ denoting the distribution function of the standard normal distribution, we obtain
$$\beta_{n,\alpha}(\mu) = P(|Z_\mu| > z_{1-\alpha/2}) = 1 - \Phi\left(z_{1-\alpha/2} - \frac{\sqrt{n}(\mu - \mu_0)}{\sigma}\right) + \Phi\left(-z_{1-\alpha/2} - \frac{\sqrt{n}(\mu - \mu_0)}{\sigma}\right).$$

This example illustrates the power function of a "good" test. Under $H_0: \mu = \mu_0$ we have $\beta_{n,\alpha}(\mu_0) = \alpha$. The test is unbiased, since $\beta_{n,\alpha}(\mu) > \alpha$ for any $\mu \ne \mu_0$. Furthermore, the test is consistent, since $\lim_{n\to\infty}\beta_{n,\alpha}(\mu) = 1$ for every fixed $\mu \ne \mu_0$.

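For fixed sample size n, $\beta_{n,\alpha}(\mu)$ increases as the distance $|\mu - \mu_0|$ increases: If $|\mu - \mu_0| > |\mu^* - \mu_0|$, then $\beta_{n,\alpha}(\mu) > \beta_{n,\alpha}(\mu^*)$. On the other hand, $\beta_{n,\alpha}(\mu)$ decreases as the size α of the test decreases: If $\alpha > \alpha^*$, then $\beta_{n,\alpha}(\mu) > \beta_{n,\alpha^*}(\mu)$.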

The example values µ0 = 18.3, n = 9 and σ = 0.18 lead to

• µ = 18.36 and α = 0.05 ⇒ $\beta_{9,0.05}(18.36) \approx 0.170$

• µ = 18.48 and α = 0.05 ⇒ $\beta_{9,0.05}(18.48) \approx 0.851$

• µ = 18.48 and α = 0.01 ⇒ $\beta_{9,0.01}(18.48) \approx 0.664$
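The closed-form power function of the two-sided Gauss test can be evaluated with `statistics.NormalDist` from the Python standard library. This sketch computes it at the three example settings and also illustrates consistency by increasing n; the function name is hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution


def power(mu, mu0=18.3, n=9, sigma=0.18, alpha=0.05):
    """beta_{n,alpha}(mu) = 1 - Phi(z - delta) + Phi(-z - delta), delta = sqrt(n)(mu - mu0)/sigma."""
    z = PHI.inv_cdf(1 - alpha / 2)
    delta = math.sqrt(n) * (mu - mu0) / sigma
    return 1 - PHI.cdf(z - delta) + PHI.cdf(-z - delta)


for mu, alpha in [(18.36, 0.05), (18.48, 0.05), (18.48, 0.01)]:
    print(mu, alpha, round(power(mu, alpha=alpha), 3))
print(round(power(18.36, n=900), 6))  # consistency: power tends to 1 as n grows
```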

1.10.3 Asymptotic relative eﬃciency

In the statistical literature, power comparisons of different tests are most frequently based on asymptotic theory. Explicit power calculations can usually only be done for simply structured hypotheses. We will thus only consider the case that the testing problem concerns the value of some real-valued parameter θ:

H0 : θ = θ0 against H1 : θ > θ0, for some pre-speciﬁed θ0 ∈ IR. Two-sided tests can be analyzed analogously.

Any corresponding test is based on a test statistic T, and $H_0$ is rejected if the observed value of T is too large. The distribution of this test statistic will depend on the sample size n and on the true value θ, i.e., $T \equiv T_n(\theta)$. A first assumption is that, as usual, $T_n(\theta)$ is asymptotically normal. More precisely, we now assume that there exist some functions $e_T(\theta)$ and $\sigma_T(\theta)$ such that for all $\theta \in \Omega = [\theta_0, \infty)$
$$\frac{\sqrt{n}(T_n(\theta) - e_T(\theta))}{\sigma_T(\theta)} \to_D N(0, 1)$$
as $n \to \infty$. We will require that $\sigma_T(\theta)$ is continuous, while $e_T(\theta)$ is strictly monotonically increasing and continuously differentiable in θ with $e_T'(\theta_0) > 0$. One can conclude that, as $n \to \infty$, an asymptotic approximation of the power function of the test is given by
$$\beta_{n,\alpha}(\theta) = P\left(\frac{\sqrt{n}(T_n(\theta) - e_T(\theta_0))}{\sigma_T(\theta_0)} > z_{1-\alpha/2}\right) = P\left(\frac{\sqrt{n}(T_n(\theta) - e_T(\theta))}{\sigma_T(\theta)} + \frac{\sqrt{n}(e_T(\theta) - e_T(\theta_0))}{\sigma_T(\theta)} > \frac{\sigma_T(\theta_0)}{\sigma_T(\theta)}z_{1-\alpha/2}\right) \approx 1 - \Phi\left(\frac{\sigma_T(\theta_0)}{\sigma_T(\theta)}z_{1-\alpha/2} - \frac{\sqrt{n}(e_T(\theta) - e_T(\theta_0))}{\sigma_T(\theta)}\right)$$
for any given significance level α > 0.
Consider an alternative test with test statistic $\tilde T_n(\theta)$ possessing similar properties, such that as $n \to \infty$ its power function can be approximated by
$$\tilde\beta_{n,\alpha}(\theta) = P\left(\frac{\sqrt{n}(\tilde T_n(\theta) - e_{\tilde T}(\theta_0))}{\sigma_{\tilde T}(\theta_0)} > z_{1-\alpha/2}\right) \approx 1 - \Phi\left(\frac{\sigma_{\tilde T}(\theta_0)}{\sigma_{\tilde T}(\theta)}z_{1-\alpha/2} - \frac{\sqrt{n}(e_{\tilde T}(\theta) - e_{\tilde T}(\theta_0))}{\sigma_{\tilde T}(\theta)}\right)$$

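We again assume that $\sigma_{\tilde T}(\theta)$ is continuous, while $e_{\tilde T}(\theta)$ is strictly monotonically increasing and continuously differentiable in $\theta$ with $e_{\tilde T}'(\theta_0) > 0$. These approximations already show that asymptotically it does not make much sense to compare the efficiency of the two tests on the basis of a fixed alternative $\theta > \theta_0$: then $\sqrt{n}\,\bigl(e_T(\theta) - e_T(\theta_0)\bigr) \to \infty$ and $\sqrt{n}\,\bigl(e_{\tilde T}(\theta) - e_{\tilde T}(\theta_0)\bigr) \to \infty$, so that $\lim_{n\to\infty} \beta_{n,\alpha}(\theta) = 1$ as well as $\lim_{n\to\infty} \tilde\beta_{n,\alpha}(\theta) = 1$. The idea is therefore to consider sequences of local alternatives $\theta_m$ with $\theta_m \to \theta_0$ as $m \to \infty$.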
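
The contrast between fixed and local alternatives can be made concrete under the assumed sample-mean example ($e_T(\theta) = \theta$, $\sigma_T(\theta) = 1$, $\alpha = 0.05$). For simplicity the sketch indexes the local alternatives by the sample size itself, taking $\theta_n = \theta_0 + d/\sqrt{n}$; the fixed alternative $\theta = 0.2$ and the constant $d = 2$ are arbitrary illustrative choices, and `power_mean_test` is a hypothetical name.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
ALPHA = 0.05
Z = STD_NORMAL.inv_cdf(1 - ALPHA / 2)  # z_{1-alpha/2}


def power_mean_test(theta, theta0, n):
    """Approximate power assuming e_T(theta) = theta and sigma_T(theta) = 1."""
    return 1 - STD_NORMAL.cdf(Z - math.sqrt(n) * (theta - theta0))


theta0, theta_fixed, d = 0.0, 0.2, 2.0
for n in (10, 100, 1_000, 10_000):
    fixed = power_mean_test(theta_fixed, theta0, n)                # tends to 1
    local = power_mean_test(theta0 + d / math.sqrt(n), theta0, n)  # stays constant
    print(f"n={n:>6}: fixed alternative {fixed:.4f}, local alternative {local:.4f}")
```

At the fixed alternative the power approaches 1, while along the local alternatives it stays at $1 - \Phi(z_{1-\alpha/2} - d) \approx 0.516$ for every $n$, which leaves room for a meaningful comparison of tests.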

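For an arbitrary $d > 0$ define a sequence $\{\theta_m\}_{m=1,2,\dots} \subset \Omega_1$ such that $\theta_m - \theta_0 = d/\sqrt{m}$ for all $m = 1, 2, \dots$. Asymptotic efficiency analysis then poses the following questions for arbitrary $\alpha < \beta < 1$ and large $m$:

• When using the first test, how many observations $n(m)$ do we need such that the probability of rejecting $H_0$ is at least $\beta$ when $\theta = \theta_m$?

• When using the second test, how many observations $\tilde n(m)$ do we need such that the probability of rejecting $H_0$ is at least $\beta$ when $\theta = \theta_m$?

Formally, this requires determining $n(m)$ and $\tilde n(m)$ such that $\beta_{n(m),\alpha}(\theta_m) \approx \beta$ and $\tilde\beta_{\tilde n(m),\alpha}(\theta_m) \approx \beta$. Taylor expansions of $e_T(\theta)$ and $e_{\tilde T}(\theta)$ around $\theta_0$, together with the continuity of $\sigma_T$ and $\sigma_{\tilde T}$, yield

$$\begin{aligned}
\frac{\sqrt{n(m)}\,\bigl(e_T(\theta_m) - e_T(\theta_0)\bigr)}{\sigma_T(\theta_m)} &= \frac{\sqrt{n(m)}\,e_T'(\theta_0)(\theta_m - \theta_0)}{\sigma_T(\theta_0)} + o\left(\sqrt{\frac{n(m)}{m}}\right),\\
\frac{\sqrt{\tilde n(m)}\,\bigl(e_{\tilde T}(\theta_m) - e_{\tilde T}(\theta_0)\bigr)}{\sigma_{\tilde T}(\theta_m)} &= \frac{\sqrt{\tilde n(m)}\,e_{\tilde T}'(\theta_0)(\theta_m - \theta_0)}{\sigma_{\tilde T}(\theta_0)} + o\left(\sqrt{\frac{\tilde n(m)}{m}}\right).
\end{aligned}$$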

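On the other hand, $\frac{\sigma_T(\theta_0)}{\sigma_T(\theta_m)}\, z_{1-\alpha/2} \to z_{1-\alpha/2}$ as well as $\frac{\sigma_{\tilde T}(\theta_0)}{\sigma_{\tilde T}(\theta_m)}\, z_{1-\alpha/2} \to z_{1-\alpha/2}$ as $m \to \infty$. Furthermore, with $\gamma(\alpha,\beta) := z_{1-\alpha/2} - z_{1-\beta}$ we obtain $1 - \Phi\bigl(z_{1-\alpha/2} - \gamma(\alpha,\beta)\bigr) = 1 - \Phi(z_{1-\beta}) = \beta$. Therefore, with

$$n(m) = m\,\frac{\gamma(\alpha,\beta)^2\,\sigma_T(\theta_0)^2}{e_T'(\theta_0)^2\,d^2} \quad\text{and}\quad \tilde n(m) = m\,\frac{\gamma(\alpha,\beta)^2\,\sigma_{\tilde T}(\theta_0)^2}{e_{\tilde T}'(\theta_0)^2\,d^2},$$

we have

$$\lim_{m\to\infty} \frac{\sqrt{n(m)}\,\bigl(e_T(\theta_m) - e_T(\theta_0)\bigr)}{\sigma_T(\theta_m)} = \gamma(\alpha,\beta), \qquad \lim_{m\to\infty} \frac{\sqrt{\tilde n(m)}\,\bigl(e_{\tilde T}(\theta_m) - e_{\tilde T}(\theta_0)\bigr)}{\sigma_{\tilde T}(\theta_m)} = \gamma(\alpha,\beta),$$

and we can conclude that, for these choices of $n(m)$ and $\tilde n(m)$,

$$\lim_{m\to\infty} \beta_{n(m),\alpha}(\theta_m) = \beta \quad\text{and}\quad \lim_{m\to\infty} \tilde\beta_{\tilde n(m),\alpha}(\theta_m) = \beta.$$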
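
These limit statements can be checked numerically. The sketch computes $\gamma(\alpha,\beta)$ and $n(m)$ and evaluates the approximate power at $\theta_m = \theta_0 + d/\sqrt{m}$ with $n(m)$ observations. The functions $e_T(\theta) = e^{\theta}$ and $\sigma_T(\theta) = 1 + \theta$ with $\theta_0 = 0$ are hypothetical choices satisfying the assumptions above ($e_T'(0) = 1$, $\sigma_T$ continuous), as are $\alpha = 0.05$, $\beta = 0.8$ and $d = 1.5$; $n(m)$ is not rounded to an integer here.

```python
import math
from statistics import NormalDist

N01 = NormalDist()
theta0, alpha, beta, d = 0.0, 0.05, 0.8, 1.5


def e_T(theta):
    return math.exp(theta)   # hypothetical centering function, e_T'(0) = 1


def sigma_T(theta):
    return 1.0 + theta       # hypothetical continuous scale, sigma_T(0) = 1


def gamma(alpha, beta):
    """gamma(alpha, beta) = z_{1-alpha/2} - z_{1-beta}."""
    return N01.inv_cdf(1 - alpha / 2) - N01.inv_cdf(1 - beta)


def n_of_m(m, e_prime0=1.0):
    """n(m) = m * gamma^2 * sigma_T(theta0)^2 / (e_T'(theta0)^2 * d^2)."""
    return m * gamma(alpha, beta) ** 2 * sigma_T(theta0) ** 2 / (e_prime0 ** 2 * d ** 2)


def approx_power(theta, n):
    """Normal approximation to beta_{n,alpha}(theta) for the assumed e_T and sigma_T."""
    z = N01.inv_cdf(1 - alpha / 2)
    arg = (sigma_T(theta0) / sigma_T(theta) * z
           - math.sqrt(n) * (e_T(theta) - e_T(theta0)) / sigma_T(theta))
    return 1 - N01.cdf(arg)


def power_along_sequence(m):
    """Approximate power at theta_m = theta0 + d / sqrt(m) using n(m) observations."""
    theta_m = theta0 + d / math.sqrt(m)
    return approx_power(theta_m, n_of_m(m))


for m in (10, 1_000, 100_000, 1_000_000):
    print(f"m={m:>9}: n(m)={n_of_m(m):14.1f}, power={power_along_sequence(m):.4f}")
```

The power at $\theta_m$ moves towards $\beta = 0.8$ as $m$ grows; for small $m$ the Taylor remainder is not yet negligible.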

The quotient $\tilde n(m)/n(m)$ is a fixed number which obviously does not depend on $\alpha$, $\beta$ or the specific choice of $d$ used to construct $\{\theta_m\}_{m=1,2,\dots}$. This quotient defines the asymptotic relative efficiency of test $T$ relative to test $\tilde T$:

$$\mathrm{ARE}(T,\tilde T) = \lim_{m\to\infty} \frac{\tilde n(m)}{n(m)} = \frac{e_T'(\theta_0)^2\,\sigma_{\tilde T}(\theta_0)^2}{e_{\tilde T}'(\theta_0)^2\,\sigma_T(\theta_0)^2}.$$

Interpretation:

• $\mathrm{ARE}(T,\tilde T) = 1$ $\Rightarrow$ both tests are equally efficient (for detecting local alternatives).

• $\mathrm{ARE}(T,\tilde T) = c < 1$ $\Rightarrow$ test $\tilde T$ is more efficient than test $T$! In order to achieve (approximately) identical local power, $\tilde T$ needs fewer observations than $T$ (by the factor $c$).

• $\mathrm{ARE}(T,\tilde T) = c > 1$ $\Rightarrow$ test $T$ is more efficient than test $\tilde T$! In order to achieve (approximately) identical local power, $\tilde T$ needs more observations than $T$ (by the factor $c$).
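
That $\tilde n(m)/n(m)$ does not depend on $\alpha$, $\beta$ or $d$ can be seen directly in a small sketch. The values of $e'(\theta_0)$ and $\sigma(\theta_0)$ assigned to the two tests are hypothetical and serve only as an illustration; `required_n` and `are` are illustrative names.

```python
from statistics import NormalDist

N01 = NormalDist()


def required_n(m, alpha, beta, d, e_prime0, sigma0):
    """n(m) = m * gamma(alpha, beta)^2 * sigma(theta0)^2 / (e'(theta0)^2 * d^2)."""
    gamma = N01.inv_cdf(1 - alpha / 2) - N01.inv_cdf(1 - beta)
    return m * gamma ** 2 * sigma0 ** 2 / (e_prime0 ** 2 * d ** 2)


def are(e_prime_T, sigma_T, e_prime_T_tilde, sigma_T_tilde):
    """ARE(T, T~) = e_T'^2 * sigma_T~^2 / (e_T~'^2 * sigma_T^2), all evaluated at theta0."""
    return (e_prime_T ** 2 * sigma_T_tilde ** 2) / (e_prime_T_tilde ** 2 * sigma_T ** 2)


# Hypothetical characteristics (e'(theta0), sigma(theta0)) of the two tests
T = {"e_prime0": 1.0, "sigma0": 1.0}
T_tilde = {"e_prime0": 0.8, "sigma0": 1.2}

ratios = []
for alpha, beta, d in [(0.05, 0.8, 1.0), (0.01, 0.9, 2.5), (0.10, 0.5, 0.3)]:
    ratio = required_n(50, alpha, beta, d, **T_tilde) / required_n(50, alpha, beta, d, **T)
    ratios.append(ratio)
    print(f"alpha={alpha}, beta={beta}, d={d}: n~(m)/n(m) = {ratio:.4f}")

print(f"ARE(T, T~) = {are(1.0, 1.0, 0.8, 1.2):.4f}")
```

In this illustration $\mathrm{ARE}(T,\tilde T) = 2.25 > 1$, so $T$ is the more efficient test and $\tilde T$ needs 2.25 times as many observations for the same local power.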

Remark: If a test statistic is based on an (asymptotically) unbiased estimator of the true value of $\theta$, i.e., $E(T_n(\theta)) = \theta$, so that $e_T(\theta) = \theta$, then $e_T(\theta_m) - e_T(\theta_0) = \theta_m - \theta_0$ and hence $e_T'(\theta_0) = 1$. When comparing two tests based on different unbiased estimators, the ARE therefore reduces to

$$\mathrm{ARE}(T,\tilde T) = \frac{\sigma_{\tilde T}(\theta_0)^2}{\sigma_T(\theta_0)^2}.$$

The "better" estimator with a smaller asymptotic variance thus also defines the more efficient test.
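
A standard illustration of the remark: for $N(\mu, \sigma^2)$ data, consider a test for $\mu$ based on the sample mean ($T$) and one based on the sample median ($\tilde T$). Both estimators are unbiased for $\mu$, with asymptotic variances $\sigma_T^2 = \sigma^2$ and $\sigma_{\tilde T}^2 = \pi\sigma^2/2$, so $\mathrm{ARE}(T,\tilde T) = \pi/2 \approx 1.571$ and the mean-based test is more efficient. The Monte Carlo sketch below approximates this ratio by simulated variances; the sample size, number of replications, seed and the function name are arbitrary assumptions.

```python
import math
import random
import statistics


def variance_ratio_median_vs_mean(n=101, reps=2000, seed=1):
    """Monte Carlo estimate of Var(median) / Var(mean) for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(medians) / statistics.variance(means)


ratio = variance_ratio_median_vs_mean()
print(f"simulated ratio {ratio:.3f} vs. asymptotic value pi/2 = {math.pi / 2:.3f}")
```

For finite $n$ and a finite number of replications the simulated ratio is only close to $\pi/2$, not equal to it.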
