Statistical Inference and Methods
David A. Stephens
Department of Mathematics Imperial College London
[email protected] http://stats.ma.ic.ac.uk/~das01/
13th December 2005
David A. Stephens Statistical Inference and Methods Methods of Inference
Part II
Session 2: Methods of Inference
Frequentist considerations
Likelihood
Quasi-likelihood
Estimating Equations
Generalized Method of Moments
Bayesian
Repeated observation of random variables X1, X2,..., Xn yields data x1, x2,..., xn.
Parametric probability model (pdf): $f_{X|\theta}(x;\theta)$. The objective is inference about the parameter $\theta$, a parameter in $p$ dimensions lying in parameter space $\Theta \subseteq \mathbb{R}^p$. We seek a procedure for producing estimators of $\theta$ that have desirable properties.
Estimators: An estimator, $T_n$, derived from a random sample of size $n$ is a statistic, that is, any function of the random variables to be observed $X_1,\ldots,X_n$:

$$T_n = t(X_1,\ldots,X_n)$$

An estimate, $t_n$, is a real value determined as the observed value of an estimator from data $x_1,\ldots,x_n$:

$$t_n = t(x_1,\ldots,x_n)$$

We will assess the worth of an estimator, and the procedure used to produce it, by inspecting its frequentist (empirical) properties, assuming, for example, that the proposed model is correct.
Desirable Properties of Estimators: Suppose the true model has $\theta = \theta_0$.

Unbiasedness:
$$E_{X|\theta_0}[T_n] = \theta_0$$

Asymptotic unbiasedness:
$$\lim_{n\to\infty} E_{X|\theta_0}[T_n] = \theta_0$$
Consistency: as $n \to \infty$,

Weak consistency:
$$T_n \xrightarrow{p} \theta_0$$

Strong consistency:
$$T_n \xrightarrow{a.s.} \theta_0$$
Laws of Large Numbers: under regularity conditions, for a function $g$, as $n \to \infty$,

Weak Law:
$$\frac{1}{n}\sum_{i=1}^n g(X_i) \xrightarrow{p} E_{X|\theta_0}[g(X)]$$

Strong Law:
$$\frac{1}{n}\sum_{i=1}^n g(X_i) \xrightarrow{a.s.} E_{X|\theta_0}[g(X)]$$
For unbiased or asymptotically unbiased estimators, an estimator $T_n^\star$ is efficient if it has smaller variance than all other unbiased estimators.

Efficiency:
$$\mathrm{Var}_{X|\theta_0}[T_n^\star] \le \mathrm{Var}_{X|\theta_0}[T_n]$$

Asymptotic efficiency:
$$\lim_{n\to\infty} \mathrm{Var}_{X|\theta_0}[T_n^\star] \le \lim_{n\to\infty} \mathrm{Var}_{X|\theta_0}[T_n]$$
In summary, an estimator should have good frequentist properties:

it should recover the true value of the parameter as the sample size becomes infinite (consistency);

if an estimator is inconsistent, it may at least be asymptotically unbiased, in which case the asymptotic distribution should have low variance (efficiency).

However, consistency and asymptotic unbiasedness are not in themselves sufficient ...
EXAMPLE: Let $X_1,\ldots,X_n \sim N(\theta, 1)$. Then the two estimators

$$T_{1n} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \qquad T_{2n} = T_{1n} + \frac{100100}{n}$$

are both consistent and asymptotically unbiased for $\theta$, with the same asymptotic variance. However, their finite sample behaviours are somewhat different ...
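The contrast shows up immediately in simulation. The following sketch is illustrative (the choices $\theta = 0$, $n = 100$, and the replication count are assumptions, not from the slides):

```python
import numpy as np

# Compare the finite-sample behaviour of T1n = Xbar and
# T2n = Xbar + 100100/n for X_i ~ N(theta, 1).
rng = np.random.default_rng(0)
theta, n, reps = 0.0, 100, 2000

x = rng.normal(theta, 1.0, size=(reps, n))
t1 = x.mean(axis=1)            # T1n: unbiased for theta at every n
t2 = t1 + 100100 / n           # T2n: biased by 100100/n at finite n

bias_t1 = t1.mean() - theta
bias_t2 = t2.mean() - theta
print(bias_t1, bias_t2)
```

Both estimators are consistent, yet at $n = 100$ the second is off by $100100/100 = 1001$: consistency alone says nothing about behaviour at any fixed sample size.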
Finite sample behaviour is also crucial. Could consider
the sampling distribution of $T_n$ for finite $n$, that is, the empirical behaviour of $T_n$ over different random samples of size $n$;

an asymptotic approximation to this distribution suitable for large $n$.

For example, we typically construct an asymptotic Normal approximation to this distribution:

$$T_n \sim AN(\mu_n, \sigma_n^2)$$

for suitable values of $\mu_n$ and $\sigma_n$.
The standard error of an estimator $T_n$ of parameter $\theta$ is

$$\mathrm{s.e.}(T_n;\theta) = \sqrt{\mathrm{Var}_{f_{X|\theta}}[T_n]} = \mathrm{se}(\theta)$$

for some function $\mathrm{se}$. The estimated standard error is

$$\mathrm{e.s.e.}(T_n) = \mathrm{se}(\hat\theta_n)$$
Likelihood Methods
We seek a general method for producing estimators/estimates from data under a presumed model that utilizes the observed information in the most effective fashion.
The likelihood function:
$$L(\theta; x) = f_{X|\theta}(x_1,\ldots,x_n;\theta)$$

and under independence

$$L(\theta; x) = \prod_{i=1}^n f_{X|\theta}(x_i;\theta)$$

The log-likelihood function:

$$l(\theta; x) = \sum_{i=1}^n \log f_{X|\theta}(x_i;\theta)$$
Objective: Inference about θ via L or l
Assertion: The likelihood contains all relevant information about parameter θ represented by the data.
Maximum Likelihood: Estimate $\theta$ by $\hat\theta_n = t(x_1,\ldots,x_n)$ where

$$\hat\theta_n(x_1,\ldots,x_n) = \arg\max_{\theta\in\Theta} l(\theta; x)$$

with corresponding estimator

$$\hat\theta_n(X_1,\ldots,X_n)$$

The maximum likelihood estimate/estimator (mle) $\hat\theta_n$ is often computed as a zero-crossing of the first derivative of $l(\theta; x)$.
Let

$$\dot{l}(\theta; x) = \nabla l(\theta; x) = \left(\frac{\partial l}{\partial\theta_1},\ldots,\frac{\partial l}{\partial\theta_p}\right)^T$$

be the vector of first partial derivatives. Then $\hat\theta_n$ solves

$$\dot{l}(\theta; x) = 0.$$

Score function: $S_\theta(X) = \dot{l}(\theta; X)$. Note: in many models

$$E_{X|\theta}[S_\theta(X)] = 0.$$
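As a concrete sketch (the Poisson model, sample size, and seed here are illustrative assumptions), the mle can be located numerically and checked against the score equation:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# For X_i ~ Poisson(theta), l(theta) = sum(x) log(theta) - n theta + const,
# and the zero of l'(theta) is the sample mean.
rng = np.random.default_rng(1)
x = rng.poisson(3.0, size=500)

negloglik = lambda th: -(np.sum(x) * np.log(th) - len(x) * th)
theta_hat = minimize_scalar(negloglik, bounds=(1e-6, 20.0),
                            method="bounded").x

score = np.sum(x) / theta_hat - len(x)   # l'(theta) at the numerical mle
print(theta_hat, score)
```

The numerical maximizer agrees with the analytic mle $\bar{x}$, and the score is (numerically) zero there.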
Hessian Matrix: let

$$H(\theta; x) = \ddot{l}(\theta; x) = \left[\frac{\partial^2 l}{\partial\theta_i\,\partial\theta_j}\right]$$

be the $p \times p$ matrix of second partial derivatives. Define

$$\Psi^A_\theta(X) = -\ddot{l}(\theta; X).$$

Consider also

$$\Psi^B_\theta(X) = S_\theta(X)\,S_\theta(X)^T.$$
Then for many models
$$E_{X|\theta}[\Psi^A_\theta(X)] = E_{X|\theta}[\Psi^B_\theta(X)] = n\,\mathcal{I}(\theta)$$

where $\mathcal{I}(\theta)$ is the unit Fisher Information for the model. $\mathcal{I}(\theta)$ is a positive definite (hence non-singular) symmetric matrix. Let

$$\mathcal{J}(\theta) = \mathcal{I}(\theta)^{-1}.$$
We can consider sample-based versions of these quantities.

Observed Score: $S_\theta(x) = \dot{l}(\theta; x)$.

Observed Unit Information:

$$I^A_n(n,\theta) = \frac{1}{n}\sum_{i=1}^n \Psi^A_\theta(x_i) \qquad\text{or}\qquad I^B_n(n,\theta) = \frac{1}{n}\sum_{i=1}^n S_\theta(x_i)\,S_\theta(x_i)^T$$
Note: by the Laws of Large Numbers, as n −→ ∞,
$$I^A_n(n,\theta_0) \xrightarrow{p} \mathcal{I}(\theta_0) \qquad I^B_n(n,\theta_0) \xrightarrow{p} \mathcal{I}(\theta_0)$$
Cramér-Rao Efficiency Bound

An efficiency bound for unbiased estimators: if $T_n$ is unbiased, then under regularity conditions,

$$\mathrm{Var}_{X|\theta}[T_n] \ge \left(E_{X|\theta}[\Psi^A_\theta(X)]\right)^{-1} = \left(E_{X|\theta}[\Psi^B_\theta(X)]\right)^{-1}$$

This is the Cramér-Rao Lower Bound.
Properties of mles: under regularity conditions, the mle is

consistent

asymptotically unbiased

asymptotically efficient, with asymptotic variance $\mathcal{J}(\theta_0)$ equal to the Cramér-Rao lower bound

invariant: if $\hat\theta_n$ estimates $\theta$, and $\phi = g(\theta)$, then $\hat\phi_n = g(\hat\theta_n)$.
Asymptotic Normality Using the CLT,
$$\sqrt{n}\,(\hat\theta_n - \theta_0) \xrightarrow{L} N(0, \mathcal{J}(\theta_0))$$

or

$$\hat\theta_n \sim AN(\theta_0, n^{-1}\mathcal{J}(\theta_0))$$

i.e. $\hat\theta_n$ converges to $\theta_0$ at rate $\sqrt{n}$.
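This convergence can be checked by simulation. A minimal sketch, assuming a Poisson($\theta_0$) model, for which the mle is $\bar{X}$, $\mathcal{I}(\theta) = 1/\theta$, and hence $\mathcal{J}(\theta_0) = \theta_0$:

```python
import numpy as np

# Empirical distribution of sqrt(n)(theta_hat - theta0) for the Poisson mle.
rng = np.random.default_rng(2)
theta0, n, reps = 4.0, 400, 4000

mles = rng.poisson(theta0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (mles - theta0)

print(z.mean(), z.var())   # should be near 0 and near J(theta0) = 4
```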
Hypothesis Testing & Confidence Intervals
We seek a general method for testing a specific hypothesis about parameters in probability models.
$$H_0: \theta = c_0 \qquad H_1: \theta \neq c_0$$
There are five crucial components to a hypothesis test, namely

Test Statistic, $T_n$

Null Distribution: the distribution of $T_n$ under $H_0$

Critical Region, $R$, and Critical Value(s) $(C_{R_1}, C_{R_2})$:
$$T_n \in R \implies \text{Reject } H_0$$

Significance Level, $\alpha$:
$$\alpha = P[T_n \in R \mid H_0]$$

P-Value, $p$:
$$p = P[\,|T_n| \ge |t(x)| \mid H_0\,]$$
The Likelihood Ratio Test

The likelihood ratio test statistic for testing $H_0$ against $H_1$ is

$$T_n = \frac{\displaystyle\sup_{\theta\in\Theta_1} f_{X|\theta}(X;\theta)}{\displaystyle\sup_{\theta\in\Theta_0} f_{X|\theta}(X;\theta)}$$

where $H_0$ is rejected if $T_n$ is too large, that is, if $T_n \ge k$ where $P[T_n \ge k \mid H_0] = \alpha$.

If $H_0$ imposes $q$ independent constraints on $H_1$, then, as $n \to \infty$,

$$2\log T_n \overset{A}{\sim} \chi^2_q. \tag{1}$$
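A minimal worked sketch (assuming a Poisson model with $H_0: \theta = 3$ against an unrestricted alternative, so $q = 1$):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
theta0 = 3.0
x = rng.poisson(theta0, size=200)

def loglik(th):
    # Poisson log-likelihood, dropping terms not involving theta
    return np.sum(x) * np.log(th) - len(x) * th

theta_hat = x.mean()                        # unrestricted mle
lrt = 2 * (loglik(theta_hat) - loglik(theta0))
p_value = chi2.sf(lrt, df=1)                # compare to chi-squared, q = 1
print(lrt, p_value)
```

Since the data were generated under $H_0$, a small value of the statistic (and an unremarkable p-value) is the typical outcome.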
The Rao/Score/Lagrange Multiplier Test

The Rao/Score/Lagrange Multiplier statistic, $R_n$, for testing

$$H_0: \theta = \theta_0 \qquad H_1: \theta \neq \theta_0$$

is defined by

$$R_n = Z_n^T\,[\mathcal{I}(\theta_0)]^{-1}\,Z_n \tag{2}$$

where

$$Z_n = \frac{1}{\sqrt{n}}\,\dot{l}(X; \theta_0).$$
For large n, if H0 is true,
$$R_n \overset{A}{\sim} \chi^2_p$$

and $H_0$ is rejected if $R_n$ is too large, that is, if $R_n \ge C$, where

$$P[R_n \ge C \mid H_0] = \alpha$$

for significance level $\alpha$.
Interpretation and Explanation: The score test uses these results; if $H_0$ is true,

$$S_{\theta_0}(X) \overset{A}{\sim} N(0, n\,\mathcal{I}(\theta_0))$$

so that the standardized score

$$V_n = L_{\theta_0}^{-1} S_{\theta_0}(X) \overset{A}{\sim} N(0, I_p)$$

where $I_p$ is the $p \times p$ identity matrix, and where the matrix $L_{\theta_0}$ is given by

$$L_{\theta_0} L_{\theta_0}^T = n\,\mathcal{I}(\theta_0).$$
Hence, by the usual normal distribution theory,

$$R_n = V_n^T V_n = Z_n^T \{\mathcal{I}(\theta_0)\}^{-1} Z_n \overset{A}{\sim} \chi^2_p$$

so that the observed test statistic

$$r_n = z_n^T \{\mathcal{I}(\theta_0)\}^{-1} z_n \qquad\text{where}\qquad z_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n S_{\theta_0}(x_i)$$

should be an observation from a $\chi^2_p$ distribution.
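For a one-parameter illustration (assuming a Poisson model, where $S_\theta(x) = \sum x_i/\theta - n$ and $\mathcal{I}(\theta) = 1/\theta$, so $p = 1$):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
theta0 = 3.0
x = rng.poisson(theta0, size=200)
n = len(x)

score = np.sum(x) / theta0 - n       # S_{theta0}(x), evaluated under H0 only
z_n = score / np.sqrt(n)
r_n = z_n**2 * theta0                # z_n^T I(theta0)^{-1} z_n, I = 1/theta0
p_value = chi2.sf(r_n, df=1)
print(r_n, p_value)
```

Note that, unlike the Wald and likelihood ratio tests, no maximization is required: everything is evaluated at the hypothesized $\theta_0$.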
Extension: It is legitimate, if required, to replace $\mathcal{I}(\theta_0)$ by a suitable estimate $\hat{I}_n(\tilde\theta_n)$. For example

$$\hat{I}_n(\tilde\theta_n) = \mathcal{I}(\tilde\theta_n), \quad I^A_n(n,\tilde\theta_n), \quad\text{or}\quad I^B_n(n,\tilde\theta_n). \tag{3}$$
The Wald Test

The Wald Test statistic, $W_n$, for testing $H_0$ against $H_1: \theta \neq \theta_0$ is defined by

$$W_n = \sqrt{n}\,(\tilde\theta_n - \theta_0)^T \left[\hat{I}_n(\tilde\theta_n)\right] \sqrt{n}\,(\tilde\theta_n - \theta_0) \tag{4}$$

Then, for large $n$, if $H_0$ is true,

$$W_n \overset{A}{\sim} \chi^2_p$$

and $H_0$ is rejected if $W_n$ is too large, that is, if $W_n \ge C$, where $P[W_n \ge C \mid H_0] = \alpha$ for significance level $\alpha$.
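A matching sketch for the Wald statistic under the same illustrative Poisson setup (assumption: $\mathcal{I}(\theta) = 1/\theta$, estimated by plug-in at the mle):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
theta0 = 3.0
x = rng.poisson(theta0, size=200)
n = len(x)

theta_hat = x.mean()                  # mle under the alternative
info_hat = 1.0 / theta_hat            # estimated unit Fisher information
w_n = n * (theta_hat - theta0)**2 * info_hat
p_value = chi2.sf(w_n, df=1)
print(w_n, p_value)
```

Under $H_0$ the Wald, score, and likelihood ratio statistics are asymptotically equivalent, so for large $n$ they should give similar answers on the same data.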
Interpretation and Explanation: the logic of the Wald test depends on the asymptotic Normal distribution of the estimates derived from the score equation,

$$\sqrt{n}\,(\tilde\theta_n - \theta_0) \xrightarrow{d} N(0, \mathcal{I}(\theta_0)^{-1})$$

so that

$$\tilde\theta_n \overset{A}{\sim} N(\theta_0, n^{-1}\mathcal{I}(\theta_0)^{-1})$$

Again, estimates of the Fisher Information such as those in (3) can be substituted for $\mathcal{I}(\theta_0)$ in (4).
Extension to tests for components of θ. The theory above concerns tests for the whole parameter vector θ. Often it is of interest to consider components of θ, that is, if θ = (θ1, θ2), we might wish to test
$$H_0: \theta_1 = \theta_{10}, \text{ with } \theta_2 \text{ unspecified}$$

$$H_1: \theta_1 \neq \theta_{10}, \text{ with } \theta_2 \text{ unspecified}$$
The Rao Score and Wald tests can be developed to allow for testing in this slightly different context.
Suppose that $\theta_1$ has dimension $m$ and $\theta_2$ has dimension $p - m$. Let the Fisher information matrix $\mathcal{I}(\theta)$ and its inverse be partitioned as

$$\mathcal{I}(\theta) = \begin{bmatrix} \mathcal{I}_{11} & \mathcal{I}_{12} \\ \mathcal{I}_{21} & \mathcal{I}_{22} \end{bmatrix} \qquad \mathcal{J}(\theta) = \begin{bmatrix} [\mathcal{I}_{11.2}]^{-1} & -[\mathcal{I}_{11.2}]^{-1}\mathcal{I}_{12}[\mathcal{I}_{22}]^{-1} \\ -[\mathcal{I}_{22.1}]^{-1}\mathcal{I}_{21}[\mathcal{I}_{11}]^{-1} & [\mathcal{I}_{22.1}]^{-1} \end{bmatrix}$$

where

$$\mathcal{I}_{11.2} = \mathcal{I}_{11} - \mathcal{I}_{12}[\mathcal{I}_{22}]^{-1}\mathcal{I}_{21} \qquad \mathcal{I}_{22.1} = \mathcal{I}_{22} - \mathcal{I}_{21}[\mathcal{I}_{11}]^{-1}\mathcal{I}_{12}$$

and all quantities depend on $\theta$.
The Rao/score/LM statistic is given by

$$R_n = Z_{n0}^T \left[\hat{I}_n(\tilde\theta^{(0)}_n)\right]^{-1} Z_{n0} \overset{A}{\sim} \chi^2_m$$

where $\tilde\theta^{(0)}_n$ is the estimate of $\theta$ under $H_0$, and $\hat{I}_n(\tilde\theta^{(0)}_n)$ is the estimated Fisher information for $\theta_1$, evaluated at $\tilde\theta^{(0)}_n$, obtained using any of the estimates in (3).
The Wald statistic is given by

$$W_n = \sqrt{n}\,(\tilde\theta_{n1} - \theta_{10})^T \left[\hat{I}^{(11.2)}_n(\tilde\theta_n)\right] \sqrt{n}\,(\tilde\theta_{n1} - \theta_{10}) \overset{A}{\sim} \chi^2_m$$

where $\tilde\theta_{n1}$ is the vector component of $\tilde\theta_n$ corresponding to $\theta_1$ under $H_1$, and $\hat{I}^{(11.2)}_n(\tilde\theta_n)$ is the estimated version of $\mathcal{I}_{11.2}$ (using the sample data, under $H_1$) evaluated at $\tilde\theta_n$, obtained using any of the estimates in (3).
Confidence Intervals

A $100(1-\gamma)\%$ confidence interval (CI) $C(X)$ for parameter $\theta$ is an interval such that

$$P[\theta \in C(X)] = 1 - \gamma$$

under assumptions made about $X = (X_1, X_2,\ldots,X_n)$ from model $f_{X|\theta}$. In most cases this corresponds to an interval $C(X) \equiv (L(X), U(X))$ such that

$$P[L(X) \le \theta \le U(X)] = 1 - \gamma$$

under $f_{X|\theta}$. Notice that $C(X)$ is a random interval that can be estimated for real data $x$ by $C(x)$.
Example: If $X_1,\ldots,X_n \sim N(\mu, \sigma^2)$ with $\sigma$ known, then

$$\frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \sim N(0, 1)$$

so that, under this model, if $\gamma = 0.05$,

$$P\left[-1.96 \le \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \le 1.96\right] = 1 - \gamma$$

and a $100(1-\gamma)\%$ CI is given by

$$L(X) = \bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \qquad U(X) = \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}$$
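The interval can be computed directly; a sketch under the example's assumptions ($\sigma$ known, $\gamma = 0.05$; the data and $\mu$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, n = 10.0, 1.0, 50
x = rng.normal(mu, sigma, size=n)

xbar = x.mean()
half_width = 1.96 * sigma / np.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width
print(lower, upper)        # a realized 95% CI for mu
```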
Note: Connection with Testing
Under $H_0: \theta = \theta_0$, for test statistic $T_n$ and Critical Region $R$,

$$\alpha = P[T_n \in R \mid H_0].$$
Typically, Tn is a pivotal quantity whose form depends on θ, but whose distribution does not. It can be shown that the 100(1 − γ)% CI is the range of values of θ0 that can be hypothesized under H0 such that the hypothesis is not rejected at significance level γ.
The coverage probability of any random interval C(X ) is the probability P[θ ∈ C(X )]
computed under the true model fX |θ. Thus the coverage probability for a true 100(1 − γ)% CI is (1 − γ). In complicated estimation problems, confidence intervals and coverage probabilities are typically verified using simulation.
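Such a simulation check is straightforward; a minimal sketch for the normal-mean interval above (all settings illustrative):

```python
import numpy as np

# Estimate the coverage of xbar +/- 1.96/sqrt(n) for N(mu, 1) data.
rng = np.random.default_rng(7)
mu, n, reps = 0.0, 30, 5000

x = rng.normal(mu, 1.0, size=(reps, n))
xbar = x.mean(axis=1)
half = 1.96 / np.sqrt(n)
covered = (xbar - half <= mu) & (mu <= xbar + half)
print(covered.mean())      # should be close to the nominal 0.95
```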
Quasi-Likelihood
Quasi-likelihood (QL) methods were introduced to extend the class of models that can be fitted to data. The origin of QL methods lies in attempts to extend the normal linear model to non-normal data, that is, to extend to
Generalized Linear Models
We again begin with a parametric probability model, fY |θ.
Exponential Family of Distributions: Suppose

$$f_{Y|\xi}(y;\xi) = \exp\left\{\frac{a_\xi(\xi)\,b_\xi(y) - c_\xi(\xi)}{\phi} + d(y,\phi)\right\}$$

or equivalently, in canonical form, writing $\theta = a_\xi(\xi)$, we have

$$f_{Y|\theta}(y;\theta) = \exp\left\{\frac{\theta\,b(y) - c(\theta)}{\phi} + d(y,\phi)\right\}$$

Without loss of generality, we assume $b(y) = y$. Then

$$E_{Y|\theta}[Y] = \dot{c}(\theta) = \mu \qquad \mathrm{Var}_{Y|\theta}[Y] = \phi\,\ddot{c}(\theta) = \phi\,V(\mu),$$

say; that is, expectation and variance are functionally related.
A slight generalization allows different data points to be weighted by potentially different weights, $w_i$, that is, the likelihood becomes

$$f_{Y|\theta}(y_i;\theta) = \exp\left\{w_i\,\frac{\theta y_i - c(\theta)}{\phi} + d(y_i, \phi/w_i)\right\}$$

so that $w_i$ is a known constant that changes the scale of datum $i$. Then

$$E_{Y|\theta}[Y] = \mu \qquad \mathrm{Var}_{Y|\theta}[Y] = \phi\,V(\mu)/w.$$
A Generalized Linear Model is a model such that the expectation is modelled as a function of predictors $X$, that is

$$\mu = \dot{c}(\theta) = g^{-1}(X\beta)$$

for some link function, $g$, a monotone function onto $\mathbb{R}$. The canonical link is the link such that

$$g(\dot{c}(\theta)) = \theta.$$
The term X β is the linear predictor.
For an exponential family GLM, the log-likelihood in the canonical parameterization is

$$l(\beta; y) = \text{constant} + \sum_{i=1}^n \left\{ w_i\,\frac{\theta_i y_i - c(\theta_i)}{\phi} + d(y_i, \phi/w_i) \right\}$$

Partial differentiation with respect to $\beta_j$ yields a score equation:

$$\frac{\partial l(\beta; y)}{\partial \beta_j} = \frac{1}{\phi}\sum_{i=1}^n w_i\,\frac{\partial\theta_i}{\partial\beta_j}\,(y_i - \dot{c}(\theta_i)) = \frac{1}{\phi}\sum_{i=1}^n w_i\,\frac{\partial\theta_i}{\partial\beta_j}\,(y_i - \mu_i)$$
But, with link function $g$, we have

$$g(\mu_i) = g(\dot{c}(\theta_i)) = X_i\beta = \eta_i$$

thus

$$\frac{\partial\eta_i}{\partial\beta_j} = \frac{\partial g(\dot{c}(\theta_i))}{\partial\beta_j} = \dot{g}(\dot{c}(\theta_i))\,\ddot{c}(\theta_i)\,\frac{\partial\theta_i}{\partial\beta_j}$$

and hence, as $\ddot{c}(\theta_i) = V(\mu_i)$,

$$\frac{\partial\eta_i}{\partial\beta_j} = \dot{g}(\mu_i)\,V(\mu_i)\,\frac{\partial\theta_i}{\partial\beta_j}.$$
But

$$\frac{\partial\eta_i}{\partial\beta_j} = X_{ij}.$$

Thus we have for the $j$th ($j = 1,\ldots,p$) score equation

$$\frac{\partial l(\beta; y)}{\partial\beta_j} = \sum_{i=1}^n \frac{w_i}{\phi}\,\frac{(y_i - \mu_i)X_{ij}}{\dot{g}(\mu_i)\,V(\mu_i)} = 0,$$

where, recall, $\mu_i = g^{-1}(X_i\beta)$.

Estimation of $(\beta_1,\ldots,\beta_p)$ can be achieved by solution of this system of equations. Note that $\phi$ can be omitted from this system as $\phi > 0$ by assumption.
If the canonical link function is used
$$\theta_i = X_i\beta \implies \frac{\partial\theta_i}{\partial\beta_j} = X_{ij}$$

and the score equations become

$$\frac{\partial l(\beta; y)}{\partial\beta_j} = \sum_{i=1}^n w_i\,(y_i - \mu_i)\,X_{ij} = 0 \qquad j = 1,\ldots,p.$$
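As a sketch, the canonical-link score equations can be solved directly with a root-finder (the Poisson/log-link model, design, and true coefficients here are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import root

# Poisson GLM with canonical log link, w_i = 1, phi = 1: the score
# equations are sum_i (y_i - mu_i) x_ij = 0 with mu_i = exp(x_i beta).
rng = np.random.default_rng(8)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

def score(beta):
    mu = np.exp(X @ beta)
    return X.T @ (y - mu)

beta_hat = root(score, x0=np.zeros(2)).x
print(beta_hat)            # close to beta_true for large n
```

In practice these equations are usually solved by iteratively reweighted least squares, but any root-finder applied to the score works for a small example like this.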
In this model, the assumptions about the specific form of fY |θ allowed the construction of the score equations; for different probability models, the different components take different forms:
$f_{Y|\theta}(y;\theta) \equiv \text{Poisson}(\lambda)$: canonical parameter $\theta = \log\lambda$, canonical link $g(t) = \log(t)$, $\mu = \lambda = \exp(\theta)$, $V(\mu) = \lambda = \mu$ (so that $V(t) = t$), $w_i = 1$, $\phi = 1$.
$f_{Y|\theta}(y;\theta) \equiv \text{Binomial}(n, \xi)$: canonical parameter $\theta = \log(\xi/(1-\xi))$, canonical link $g(t) = \log(t/(1-t))$, $\mu = \xi = \exp(\theta)/(1+\exp(\theta))$, $V(\mu) = \xi(1-\xi) = \mu(1-\mu)$ (so that $V(t) = t(1-t)$), $w_i = n_i$, $\phi = 1$.

Note: $y_i$ is presumed to be modelled in proportionate form, that is, if $Z \sim \text{Binomial}(n, \xi)$, we model $Y = Z/n$.
$f_{Y|\theta}(y;\theta) \equiv \text{Normal}(\xi, \sigma^2)$: canonical parameter $\theta = \xi$, canonical link $g(t) = t$ (the identity), $\mu = \xi$, $V(\mu) = 1$ (so that $V(t) = 1$), $w_i = 1$, $\phi = \sigma^2$. Note: here mean and variance are modelled orthogonally.
Quasi-Likelihood: The score equations are key in the estimation, and are derived directly from the probabilistic assumptions:

$$\sum_{i=1}^n w_i\,\frac{(y_i - \mu_i)X_{ij}}{\dot{g}(\mu_i)\,V(\mu_i)} = 0 \qquad j = 1,\ldots,p.$$

However, these equations can be used as the basis for estimation even if they are not motivated by probabilistic modelling.
We can propose forms for V (µi ) directly without reference to any specific model. This is the basis of quasi-likelihood methods.
Examples:

$V(\mu_i) = \mu_i^2$, the constant coefficient of variation model, where

$$\frac{E[Y_i]}{\mathrm{Var}[Y_i]^{1/2}} = \frac{\mu_i}{\phi^{1/2}\mu_i} = \frac{1}{\phi^{1/2}}$$

$V(\mu_i) = \phi_i\,\mu_i(1-\mu_i)$ (an overdispersed binomial model)

$V(\mu_i) = \phi_i\,\mu_i$ (an overdispersed Poisson model)

$V(\mu_i) = \phi_i\,\mu_i^2$ (an overdispersed Exponential model).
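A sketch of the overdispersed Poisson case (all modelling choices here are illustrative assumptions): with log link and working variance $V(\mu) = \mu$, the estimating equations for $\beta$ coincide with the Poisson score equations, and $\phi$ can then be estimated from the Pearson statistic.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(9)
n = 400
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
mu_true = np.exp(X @ beta_true)
# Negative binomial counts: Var = mu + mu^2/5 > mu, i.e. overdispersed
y = rng.negative_binomial(5, 5 / (5 + mu_true))

def quasi_score(beta):
    mu = np.exp(X @ beta)
    return X.T @ (y - mu)          # same form as the Poisson score

beta_hat = root(quasi_score, x0=np.zeros(2)).x
mu_hat = np.exp(X @ beta_hat)
p = X.shape[1]
phi_hat = np.sum((y - mu_hat)**2 / mu_hat) / (n - p)   # Pearson estimate
print(beta_hat, phi_hat)           # phi_hat > 1 signals overdispersion
```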
Estimating Equations
The quasi-likelihood approach is a special case of a general approach to estimation based on estimating equations. An estimating function is a function
$$G(\theta) = \sum_{i=1}^n G(\theta, Y_i) = \sum_{i=1}^n G_i(\theta) \tag{5}$$

of the same dimension as $\theta$ for which

$$E[G(\theta)] = 0. \tag{6}$$
The estimating function $G(\theta)$ is a random variable because it is a function of $Y$. The corresponding estimating equation that defines the estimator $\hat\theta$ has the form

$$G(\hat\theta) = \sum_{i=1}^n G_i(\hat\theta) = 0. \tag{7}$$
For inference, the frequentist properties of the estimating function are derived and are then transferred to the resultant estimator. The estimating function may be constructed to be a simple function of the data, while the estimator of the parameter that solves (7) will often be unavailable in closed form. The estimating function (5) is a sum of random variables, which provides the opportunity to evaluate its asymptotic properties via a central limit theorem. The art of constructing estimating functions is to make them dependent on distribution-free quantities, for example, the population moments of the data. The following theorem forms the basis for asymptotic inference.
Theorem: The estimator $\hat\theta_n$ which is the solution to

$$G(\hat\theta_n) = \sum_{i=1}^n G_i(\hat\theta_n) = 0$$

has asymptotic distribution

$$\hat\theta_n \overset{A}{\sim} N\!\left(\theta,\; A^{-1}B\,[A^T]^{-1}\right),$$

where

$$A = A_n(\theta) = E\!\left[\frac{\partial G}{\partial\theta^T}\right] = \sum_{i=1}^n E\!\left[\frac{\partial G_i(\theta)}{\partial\theta^T}\right] \qquad B = B_n(\theta) = \mathrm{Cov}(G) = \sum_{i=1}^n \mathrm{Cov}\{G_i(\theta)\}.$$
The covariance of the estimator here has the sandwich form: the covariance of the estimating function, flanked by the inverse of the Jacobian of the transformation from the estimating function to the parameter.
In practice, $A = A_n(\theta)$ and $B = B_n(\theta)$ are replaced by $\hat{A} = A_n(\hat\theta_n)$ and $\hat{B} = B_n(\hat\theta_n)$, respectively. In this case, we have

$$\hat\theta_n \overset{A}{\sim} N_p\!\left(\theta,\; \hat{A}^{-1}\hat{B}\,[\hat{A}^T]^{-1}\right), \tag{8}$$

since $\hat{A} \xrightarrow{p} A$ and $\hat{B} \xrightarrow{p} B$.
The accuracy of the asymptotic approximation to the sampling distribution of the estimator is dependent on the parameterization adopted. A rule of thumb is to obtain the confidence interval on a reparameterization which takes the parameter onto the real line (for example, the logistic transform for a probability, or the logarithmic transform for a dispersion parameter), and then to transform to the more interpretable scale. Estimators for functions of interest, φ = g(θ), may be obtained via φb = g( θb), and the asymptotic distribution may be found using the delta method.
Sandwich Estimation

A general method of avoiding stringent modelling conditions when the variance of an estimator is calculated is provided by sandwich estimation. The basic idea is to estimate the variance of the data empirically with minimum modelling assumptions, and to incorporate this in the estimation of the variance of an estimator.
We have seen that when the estimating function corresponds to a score equation derived from a probability model, then under the model $\mathcal{I} = -A = B$, so that

$$\mathrm{Var}(\hat\theta) = A(\theta)^{-1}B(\theta)\,[A(\theta)^T]^{-1} = \mathcal{I}(\theta)^{-1}.$$

If the model is not correct then this equality does not hold, and the variance estimator will be incorrect.
An alternative is to evaluate the variance empirically via

$$\hat{A} = \sum_{i=1}^n \frac{\partial}{\partial\theta}G(\hat\theta, Y_i) \qquad\text{and}\qquad \hat{B} = \sum_{i=1}^n G(\hat\theta, Y_i)\,G(\hat\theta, Y_i)^T.$$

This method is general and can be applied to any estimating function, not just those arising from a score equation.
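The simplest possible instance (an illustrative assumption, not from the slides) is the estimating function $G(\theta) = \sum_i (Y_i - \theta)$, whose root is the sample mean; the sandwich then collapses to the familiar empirical variance of the mean:

```python
import numpy as np

rng = np.random.default_rng(10)
y = rng.exponential(scale=2.0, size=1000)   # skewed, non-normal data

theta_hat = y.mean()                     # solves sum_i (y_i - theta) = 0
A_hat = -len(y)                          # sum_i d/dtheta G(theta, y_i)
B_hat = np.sum((y - theta_hat)**2)       # sum_i G(theta_hat, y_i)^2
var_sandwich = B_hat / A_hat**2          # A^{-1} B A^{-T} in one dimension
print(theta_hat, var_sandwich)
```

Here no distributional form was assumed for $Y$, yet the sandwich reproduces the usual empirical variance of the sample mean.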
Suppose we assume E[Y] = µ and Var(Y) = φV, with Var(Y_i) = φV(µ_i) and Cov(Y_i, Y_j) = 0 for i ≠ j, as a working covariance model. Under this specification it is natural to take the quasi-likelihood score U(β) as an estimating function, in which case

Cov{U(β)} = D^T V^{-1} Cov(Y) V^{-1} D / φ²

to give

Var_s(β̂) = (D^T V^{-1} D)^{-1} D^T V^{-1} Cov(Y) V^{-1} D (D^T V^{-1} D)^{-1},

and so the appropriate variance is obtained by substituting in the correct form for Cov(Y), which is, of course, unknown.
However, a simple “sandwich” estimator of the variance is given by

Var_s(β̂) = (D^T V^{-1} D)^{-1} D^T V^{-1} R R^T V^{-1} D (D^T V^{-1} D)^{-1},

where R = (R_1, ..., R_n)^T is the n × 1 vector with R_i = Y_i − µ_i(β̂). This estimator is consistent for the variance of β̂ under correct specification of the mean and with uncorrelated data. There is finite-sample bias in R_i as an estimate of Y_i − µ_i(β), and versions that adjust for the estimation of the parameters β are also available.
The great advantage of sandwich estimation is that it provides a consistent estimator of the variance in very broad situations. There are two things to bear in mind:
For small sample sizes, the sandwich estimator may be highly unstable, and in terms of mean squared error model-based estimators may be preferable for small to medium sized n; “empirical” is a better description of the estimator than “robust”.
If the model is correct, then the model-based estimators are more efficient.
Generalized Estimating Equations The models above focus on independent data only. However, the methods can be extended to dependent data. Recall the Normal linear (regression) model Y = Xβ + ε, where Y is n × 1, X is n × p, β is p × 1, ε is n × 1, and ε ∼ N(0, σ²I_n) for independent, identically distributed errors.
More generally, we can assume ε ∼ N(0, Σ) and model the observable quantities as dependent repeated observations on a series of N experimental units that are modelled independently, so that Σ is block diagonal:

Σ = diag(Σ_1, Σ_2, ..., Σ_N),

with each block Σ_i corresponding to correlated data from one stochastic process.
In this model, we have the ML (and GLS) estimate (conditional on Σ) given by

β̂_n = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} y,

and it follows that β̂_n is unbiased and Normally distributed with variance (X^T Σ^{-1} X)^{-1}.
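A numerical check of this formula (hypothetical diagonal Σ): the GLS estimate must agree with ordinary least squares applied to the whitened system Σ^{-1/2}y = Σ^{-1/2}Xβ + Σ^{-1/2}ε:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sig2 = rng.uniform(0.5, 4.0, size=n)           # hypothetical known variances
Sigma = np.diag(sig2)
beta_true = np.array([1.0, -2.0])
y = X @ beta_true + rng.normal(size=n) * np.sqrt(sig2)

# GLS / ML estimate conditional on Sigma.
Si = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)

# Same answer from OLS on the whitened system L y = L X beta + L eps.
L = np.diag(1 / np.sqrt(sig2))
beta_white, *_ = np.linalg.lstsq(L @ X, L @ y, rcond=None)

var_gls = np.linalg.inv(X.T @ Si @ X)          # (X' Sigma^{-1} X)^{-1}
print(beta_gls, np.diag(var_gls))
```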
Typically Σ is not known, and possibly contains unknown parameters α. It can be estimated in a number of ways:
ML (distribution-based);
REML (distribution-based);
robust (sandwich) estimation (model-free).
An estimating equation approach can be used to view this form of estimation in a distribution-free context. We consider the Generalized Estimating Equation given by

G(β) = X^T W^{-1} (y − Xβ)

for a symmetric, non-singular matrix W (that is, a matrix version of the independent case given above). Then E[G(β)] = 0, and

β̂_W = (X^T W^{-1} X)^{-1} X^T W^{-1} y

is unbiased, with

Var(β̂_W) = (X^T W^{-1} X)^{-1} X^T W^{-1} Σ W^{-1} X (X^T W^{-1} X)^{-1}.
We could choose W = Σ, so that

Var(β̂_W) = (X^T Σ^{-1} X)^{-1},

or W = I, so that

Var(β̂_W) = (X^T X)^{-1} X^T Σ X (X^T X)^{-1}.

We still need to estimate Σ = Σ(α).
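The efficiency comparison between these two choices can be verified numerically: for any Σ, the W = Σ variance is no larger, in the positive-semidefinite ordering, than the W = I sandwich variance. A sketch with a hypothetical heteroscedastic Σ:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Sigma = np.diag(rng.uniform(0.2, 5.0, size=n))   # hypothetical heteroscedastic errors

Si = np.linalg.inv(Sigma)
var_opt = np.linalg.inv(X.T @ Si @ X)            # W = Sigma
XtX_inv = np.linalg.inv(X.T @ X)
var_id = XtX_inv @ X.T @ Sigma @ X @ XtX_inv     # W = I sandwich

# W = Sigma is efficient: var_id - var_opt should be positive semidefinite.
eigs = np.linalg.eigvalsh(var_id - var_opt)
print(eigs)
```

This is the Gauss–Markov/GLS efficiency result: the smallest eigenvalue of the difference is (numerically) non-negative, so every linear combination of the coefficients has a smaller variance under W = Σ.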
We focus on the repeated measures case, where independent units i = 1,...,N have n_1,...,n_N observations respectively.
Suppose that Σ_i = W_i, a known constant “working” covariance matrix. Then we have

β̂_W = (∑_{i=1}^{N} X_i^T W_i^{-1} X_i)^{-1} (∑_{i=1}^{N} X_i^T W_i^{-1} y_i)

with

Var(β̂_W) = (X^T W^{-1} X)^{-1} = (∑_{i=1}^{N} X_i^T W_i^{-1} X_i)^{-1}.
If W = W(α, β), then the corresponding estimating function is

G(α, β) = ∑_{i=1}^{N} X_i^T W_i^{-1}(α, β)(y_i − X_i β).

If α is consistently estimated by α̂_n, then we can substitute it in, leaving the estimating function

G(β) = ∑_{i=1}^{N} X_i^T W_i^{-1}(α̂_n, β)(y_i − X_i β),

and β̂_W can be found using the usual iterative schemes.
In this case, the estimated variance of β̂_W is given by

V̂ar(β̂_W) = (∑_{i=1}^{N} X_i^T W_i^{-1}(α̂_n, β̂_n) X_i)^{-1}
          × (∑_{i=1}^{N} X_i^T W_i^{-1}(α̂_n, β̂_n) Σ̂_i W_i^{-1}(α̂_n, β̂_n) X_i)
          × (∑_{i=1}^{N} X_i^T W_i^{-1}(α̂_n, β̂_n) X_i)^{-1},

where it still remains to estimate Σ_i by Σ̂_i.
We use estimates based on the quantities

(y_i − X_i β̂_n)(y_i − X_i β̂_n)^T,   i = 1,...,N.

For example, for a balanced design (all n_i equal to M), with common covariances, for equally-spaced data, we estimate

[Σ̂_i]_{jj} = (1/(NM)) ∑_{i=1}^{N} ∑_{j=1}^{M} (y_{ij} − X_{ij} β̂_n)²

and

[Σ̂_i]_{jk} = (1/N) ∑_{i=1}^{N} (y_{ij} − X_{ij} β̂_n)(y_{ik} − X_{ik} β̂_n).
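As a sketch of these pooled estimates (hypothetical N × M residual matrix with rows as units and columns as time points, simulated with variance 2 on the diagonal and covariance 1 off the diagonal):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 500, 4                                  # N units, M observations each

# Hypothetical residuals r_ij = y_ij - X_ij beta_hat, with an exchangeable
# structure: variance 2 on the diagonal, covariance 1 off the diagonal.
shared = rng.normal(size=(N, 1))
r = shared + rng.normal(size=(N, M))

# Pooled diagonal estimate: one common variance over all units and times.
var_hat = np.sum(r ** 2) / (N * M)

# Off-diagonal (j, k) estimate: average residual product across units.
Sigma_hat = (r.T @ r) / N
np.fill_diagonal(Sigma_hat, var_hat)

print(var_hat, Sigma_hat[0, 1])
```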
The model can be extended by the inclusion of a link function h such that µ_i = h(X_i β), in which case the estimating function is

G(β) = ∑_{i=1}^{N} D_i^T W_i^{-1}(α, β)(y_i − µ_i),

where D_i is the n_i × p matrix of partial derivatives

[D_i]_{jk} = ∂µ_{ij}/∂β_k

for j = 1,...,n_i, k = 1,...,p.
Summary: the GEE is given by the estimating function

G(α, β) = ∑_{i=1}^{N} D_i^T W_i^{-1} (y_i − µ_i),

where

µ_i = h(X_i β),    D_i = ∂µ_i/∂β^T (which reduces to X_i under the identity link),

and W_i is a working covariance model.
With residuals r̂_i = y_i − µ̂_i = y_i − h(X_i β̂), and D̂_i denoting D_i evaluated at µ̂_i, Â is given by

Â = ∑_{i=1}^{N} D̂_i^T W_i^{-1} D̂_i,

B̂ is given by

B̂ = ∑_{i=1}^{N} D̂_i^T W_i^{-1} r̂_i r̂_i^T W_i^{-1} D̂_i,

and the estimated variance of β̂ is Â^{-1} B̂ Â^{-1}.
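Assembled in code (a sketch under simplifying assumptions: identity link so D̂_i = X_i, working independence W_i = I, and hypothetical clustered data generated with a shared random intercept per cluster):

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 300, 3                               # N clusters of size M

beta_true = np.array([1.0, 0.5])
X = [np.column_stack([np.ones(M), rng.normal(size=M)]) for _ in range(N)]
# Cluster-correlated responses via a shared random intercept per cluster.
Y = [Xi @ beta_true + rng.normal() + rng.normal(size=M) for Xi in X]

# Identity link, working independence: beta_hat is pooled OLS.
XtX = sum(Xi.T @ Xi for Xi in X)
Xty = sum(Xi.T @ Yi for Xi, Yi in zip(X, Y))
beta_hat = np.linalg.solve(XtX, Xty)

# Sandwich pieces: A = sum D_i' W_i^{-1} D_i, B = sum D_i' W_i^{-1} r_i r_i' W_i^{-1} D_i.
A = XtX
B = sum(np.outer(Xi.T @ (Yi - Xi @ beta_hat), Xi.T @ (Yi - Xi @ beta_hat))
        for Xi, Yi in zip(X, Y))
Ainv = np.linalg.inv(A)
var_sand = Ainv @ B @ Ainv                  # A^{-1} B A^{-1}

print(beta_hat, np.sqrt(np.diag(var_sand)))
```

The sandwich standard errors remain valid despite the within-cluster correlation that the working independence model ignores.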
Generalized Method of Moments
The Generalized Method of Moments (GMM) approach to estimation is designed to produce estimates that satisfy moment conditions appropriate to the modelling situation. It is an extension of the conventional method of moments (MM), in which theoretical and empirical moments are matched. See Hall (2005), Generalized Method of Moments, Oxford Advanced Texts in Econometrics.
Example: X_1,...,X_n ∼ Normal(µ, σ²), independent. We have

E_{X|θ}[X] = µ,    E_{X|θ}[X²] = µ² + σ².

We equate to the first two empirical moments

m_1 = (1/n) ∑_{i=1}^{n} x_i = x̄,    m_2 = (1/n) ∑_{i=1}^{n} x_i²,

yielding equations for estimation m_1 = µ and m_2 = µ² + σ², or equivalently

m_1 − µ = 0
m_2 − (µ² + σ²) = 0     (9)
Example: X_1,...,X_n ∼ Gamma(α, β), independent. We have

E_{X|θ}[X] = α/β,    E_{X|θ}[X²] = α(α + 1)/β².

We equate to the first two empirical moments

m_1 = (1/n) ∑_{i=1}^{n} x_i = x̄,    m_2 = (1/n) ∑_{i=1}^{n} x_i²,

yielding

α̂_n = m_1²/(m_2 − m_1²),    β̂_n = m_1/(m_2 − m_1²).
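A quick numeric sketch of these estimators on simulated Gamma data (shape α = 3, rate β = 2):

```python
import numpy as np

rng = np.random.default_rng(5)
alpha_true, beta_true = 3.0, 2.0
# numpy's gamma sampler is parameterized by shape and *scale* = 1/rate.
x = rng.gamma(shape=alpha_true, scale=1 / beta_true, size=50_000)

m1 = x.mean()                 # first empirical moment
m2 = np.mean(x ** 2)          # second empirical moment

alpha_hat = m1 ** 2 / (m2 - m1 ** 2)   # MM estimator of the shape
beta_hat = m1 / (m2 - m1 ** 2)         # MM estimator of the rate
print(alpha_hat, beta_hat)
```

Note that m_2 − m_1² is just the empirical variance, so these are the familiar mean²/variance and mean/variance estimators.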
A problem with this approach is that typically p is finite (that is, we have a finite number of parameters to estimate), and (often) an infinite number of moments to select from; for example, we could use

E_{X|θ}[X³] = α(α + 1)(α + 2)/β³,    E_{X|θ}[X⁴] = α(α + 1)(α + 2)(α + 3)/β⁴

as two equations to estimate α and β. That is, the MM estimator is not uniquely defined.
Econometric Model Suppose, for t = 1, 2,..., we have

y_t^D = α_0 x_t + ε_t^D
y_t^S = β_{01} n_t + β_{02} x_t + ε_t^S     (10)
y_t^D = y_t^S (= y_t, say)

where, in year t,
y_t^D is the demand,
y_t^S is the supply,
x_t is the price,
n_t is a factor influencing supply.
We wish to estimate α_0, given pairs (x_t, y_t), t = 1,...,T.
OLS does not work effectively in this model; the estimates are typically biased. Instead, suppose that there is an observable variable z_t related to x_t, but such that

Cov[z_t, ε_t^D] = 0,

e.g. any of the factors that affect supply, such as n_t. It is typical to assume that E[ε_t^D] = 0, so that

E[y_t^D] = α_0 E[x_t].
Then, taking expectations through equation (10), we have the following relationship

E[z_t y_t] − α_0 E[z_t x_t] = 0     (11)

and thus an MM estimate is

α̂_T = (∑_{t=1}^{T} z_t y_t) / (∑_{t=1}^{T} z_t x_t).
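A simulation sketch (hypothetical coefficients: α_0 = 2, with z_t a valid instrument correlated with the price x_t but not with the demand error): the MM ratio recovers α_0, while the OLS slope is biased by the endogeneity of x_t:

```python
import numpy as np

rng = np.random.default_rng(6)
T, alpha0 = 5000, 2.0

z = rng.normal(size=T)                      # instrument, e.g. a supply factor n_t
eps = rng.normal(size=T)                    # demand shock eps_t^D
x = z + eps + 0.5 * rng.normal(size=T)      # price depends on the shock: endogenous
y = alpha0 * x + eps                        # demand equation

alpha_iv = np.sum(z * y) / np.sum(z * x)    # MM / instrumental-variable estimate
alpha_ols = np.sum(x * y) / np.sum(x * x)   # biased: Cov(x, eps) != 0

print(alpha_iv, alpha_ols)
```

Here Cov(x_t, ε_t^D) = 1 by construction, so the OLS slope converges to α_0 + Cov(x, ε)/Var(x) ≈ 2.44 rather than 2.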
GMM proceeds as follows: we define a population moment condition via a vector function g as

E_{V|θ}[g(V_t, θ)] = 0.

For example, in the Normal example above, from equation (9), we have

g(v_t, θ) = ( v_t − µ, v_t² − µ² − σ² )^T.
In the econometric supply/demand model (11) we have

g(v_t, θ) = z_t y_t − α_0 z_t x_t,

so that v_t = (z_t, y_t, x_t)^T and θ = α_0.
The Generalized Method of Moments (GMM) estimator of the parameter θ based on q moment conditions

E[g(V_t, θ)] = 0

is given as the value of θ that minimizes

Q_T(θ) = g_T(θ)^T W_T g_T(θ),    where g_T(θ) = (1/T) ∑_{t=1}^{T} g(v_t, θ),

and W_T is a positive semidefinite matrix such that W_T → W in probability as T → ∞, where W is a constant positive definite matrix.
Note: in general q ≥ p. If q = p, the system is just identified; if q > p, it is over-identified. Over-identification is what distinguishes GMM from MM. Some mild regularity conditions are needed to ensure that the estimation procedure works effectively.
Regularity Conditions:
strict stationarity;
g is continuous in θ for all v_t, with finite expectation that is continuous on Θ;
q population moment constraints

E[g(V_t, θ_0)] = 0     (q × 1);

global identification:

E[g(V_t, θ*)] ≠ 0 for θ* ≠ θ_0.
Conditions on the derivative of g: the (q × p) matrix of derivatives of g with respect to the elements of θ,

[∂g_j/∂θ_k],

exists; θ_0 is an interior point of Θ; and the expectation matrix

E[∂g/∂θ^T]

exists, is finite, and has rank p when evaluated at θ_0.
Recall that the estimator θ̂_T is the value that minimizes

Q_T(θ) = g_T(θ)^T W_T g_T(θ),    where g_T(θ) = (1/T) ∑_{t=1}^{T} g(v_t, θ)     (q × 1).

Under the regularity conditions, we have the first-order condition

D(v, θ̂_T)^T W_T g_T(θ̂_T) = 0     (p × 1),

where

D(v, θ) = (1/T) ∑_{t=1}^{T} ∂g(v_t, θ)/∂θ^T     (q × p).
Alternative representation: the moment condition can be written

F(θ_0)^T W^{1/2} E[g(v_t, θ_0)] = 0,    where F(θ_0) = W^{1/2} E[∂g(v_t, θ_0)/∂θ^T].

We have rank{F(θ_0)} = p.
Identifying restrictions:

F(θ_0) (F(θ_0)^T F(θ_0))^{-1} F(θ_0)^T W^{1/2} E[g(v_t, θ_0)] = 0.

Overidentifying restrictions:

(I_q − F(θ_0) (F(θ_0)^T F(θ_0))^{-1} F(θ_0)^T) W^{1/2} E[g(v_t, θ_0)] = 0.
Asymptotic Properties: under mild regularity conditions, θ̂_T → θ_0 in probability (uniformly on Θ), and

T^{1/2} (θ̂_T − θ_0) → N(0, M S M^T) in distribution,

where

S = lim_{T→∞} Var[T^{1/2} g_T(θ_0)],    M = (D_0^T W D_0)^{-1} D_0^T W,

with

D_0 = E[∂g(v_t, θ_0)/∂θ^T].
M can be estimated using

M̂_T = (D_T(θ̂_T)^T W_T D_T(θ̂_T))^{-1} D_T(θ̂_T)^T W_T,

where

D_T(θ) = (1/T) ∑_{t=1}^{T} ∂g(v_t, θ)/∂θ^T.

Estimation of S is more complicated; a number of different methods exist to produce an estimate Ŝ, depending on the context.
Optimal choice of W: it can be shown that the optimal choice of W is S^{-1}, so in the finite-sample case we use W_T = Ŝ_T^{-1}. In practice, an iterative procedure can be used:
At step 1, set W_T = I_q. Estimate θ̂_T(1), and then Ŝ_T(1).
At steps i = 2, 3,..., set W_T = Ŝ_T(i − 1)^{-1}. Estimate θ̂_T(i), and then Ŝ_T(i).
Iterate until convergence.
This algorithm typically converges in relatively few steps.
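A sketch of the first two steps of this procedure on an over-identified linear instrumental-variables problem (hypothetical data; q = 3 moment conditions, p = 2 parameters), where each minimization of Q_T(θ) has a closed form:

```python
import numpy as np

rng = np.random.default_rng(7)
T, theta0 = 4000, np.array([1.0, 2.0])

Z = rng.normal(size=(T, 3))                         # q = 3 instruments
shock = rng.normal(size=T)                          # source of endogeneity
X = np.column_stack([Z[:, 0] + shock + rng.normal(size=T),
                     Z[:, 1] + rng.normal(size=T)])
u = shock + rng.normal(size=T)                      # error correlated with X[:, 0]
y = X @ theta0 + u

def gmm(W):
    # Minimizer of g_T(theta)' W g_T(theta) with g_T = Z'(y - X theta)/T:
    # theta = (X'Z W Z'X)^{-1} X'Z W Z'y.
    return np.linalg.solve(X.T @ Z @ W @ Z.T @ X, X.T @ Z @ W @ Z.T @ y)

theta1 = gmm(np.eye(3))                             # step 1: W_T = I_q
u1 = y - X @ theta1
S_hat = (Z * (u1 ** 2)[:, None]).T @ Z / T          # (1/T) sum u_t^2 z_t z_t'
theta2 = gmm(np.linalg.inv(S_hat))                  # step 2: W_T = S_hat^{-1}

print(theta1, theta2)
```

With linear moment conditions, setting the gradient of Q_T to zero gives the estimator directly, so no numerical optimizer is needed; for nonlinear g each step would instead require a numerical minimization.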
Implementation in the Linear Regression Model. Consider the model

y_t = x_t^T θ_0 + u_t,    t = 1,...,T,

with instruments z_t, where
x_t is p × 1,
z_t is q × 1.
Define

u_t(θ) = y_t − x_t^T θ.
Assumptions: (strict) stationarity; z_t satisfies the population moment condition (PMC)

E[z_t u_t(θ_0)] = 0

(an orthogonality condition).
Fundamental Decomposition:

E[z_t u_t(θ)] = E[z_t u_t(θ_0)] + E[z_t x_t^T](θ_0 − θ),

so that, via the PMC,

E[z_t u_t(θ)] = E[z_t x_t^T](θ_0 − θ),

so θ_0 is identified if

E[z_t x_t^T](θ_0 − θ) ≠ 0 whenever θ ≠ θ_0.

Note that this is a linear system; E[z_t x_t^T] is a (q × p) matrix, so we need rank{E[z_t x_t^T]} = p.
Estimator: We have (in matrix form)

Q_T(θ) = {T^{-1} U(θ)^T Z} W_T {T^{-1} Z^T U(θ)},

where now y is (T × 1), X is (T × p), Z is (T × q), U is (T × 1), and U(θ) = y − Xθ. Then