
Microeconometrics. Based on the textbooks Verbeek: A Guide to Modern Econometrics, and Cameron and Trivedi: Microeconometrics

Robert M. Kunst [email protected]

University of Vienna and Institute for Advanced Studies Vienna

October 21, 2013

Microeconometrics University of Vienna and Institute for Advanced Studies Vienna Basics Endogenous regressors Maximum likelihood Limited dependent variables Panel data

Outline

Basics
 Ordinary least squares
 The model
 Goodness of fit
 Restriction tests on coefficients
 Asymptotic properties of OLS
Endogenous regressors
Maximum likelihood
Limited dependent variables
Panel data


What is micro-econometrics?

Micro-econometrics is concerned with the statistical analysis of individual (not aggregated) economic data. Some aspects of such data imply special emphasis:
◮ Samples are larger than for economic aggregates; often thousands of observations are available;
◮ Non-linear functional dependence may be identified from the data (for aggregates, linear or pre-specified functions must suffice);
◮ Qualitative or limited-dependent variables are common;
◮ Individuals can often be regarded as independent, so random-sampling assumptions can be plausible (no serial correlation);
◮ Individuals are often heterogeneous: heteroskedasticity;
◮ The logical sequences in time are unavailable. Cause and effect are identified from theory, not from the data.


Ordinary least squares Linear regression:

There are N observations on a 'dependent' variable y and on K − 1 'explanatory' variables (regressors) x_2, \dots, x_K. The best-fitting linear combination minimizes

S(\tilde\beta) = \sum_{i=1}^{N} (y_i - x_i'\tilde\beta)^2,

and the minimizer is the OLS (ordinary least squares) vector b = (b_1, \dots, b_K)'.
Notation: x_{i1} \equiv 1, so x_i = (1, x_{i2}, \dots, x_{iK})'.


Ordinary least squares The normal equations of OLS

To minimize S(β˜), equate its derivative to 0 (‘normal equations’):

-2 \sum_{i=1}^{N} x_i (y_i - x_i'b) = 0, \quad\text{hence}\quad \sum_{i=1}^{N} x_i x_i' b = \sum_{i=1}^{N} x_i y_i,

or simply, in matrix and vector notation, using y = (y_1, \dots, y_N)' and X = (x_1, \dots, x_N)',

b = (X'X)^{-1} X'y.

Notation: Here, x_i = (1, x_{i2}, \dots, x_{iK})'.
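As a minimal NumPy sketch of the closed form b = (X'X)^{-1}X'y (the data here are simulated purely for illustration; names such as `X`, `y`, `b` are ours, not from the slides), solving the normal equations and cross-checking against NumPy's least-squares routine:

```python
import numpy as np

# Hypothetical data: N = 100 observations, K = 3 regressors incl. the constant
rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 2.0, -0.5])          # true coefficients (chosen for the sketch)
y = X @ beta + rng.normal(size=N)

# b = (X'X)^{-1} X'y, computed by solving the normal equations X'X b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against the library least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```

Solving the linear system is numerically preferable to forming the inverse explicitly.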


Ordinary least squares ŷ and residuals

The (‘systematic’ or ‘predicted’) value

\hat y_i = x_i' b

is the best linear approximation of y from x_2, \dots, x_K, the best approximation by linear combinations. The difference between y and \hat y is called the residual

e_i = y_i - \hat y_i = y_i - x_i' b.


Ordinary least squares
The function S(·) evaluated at its minimum,

S(b) = \sum_{i=1}^{N} (y_i - \hat y_i)^2 = \sum_{i=1}^{N} e_i^2,

is the residual sum of squares (RSS). Because of the normal equations,

\sum_{i=1}^{N} x_i (y_i - x_i'b) = \sum_{i=1}^{N} x_i e_i = 0,

such that residuals and regressors are orthogonal. Also,

\sum_{i=1}^{N} e_i = 0 = \sum_{i=1}^{N} (y_i - x_i'b), \quad\text{so}\quad \bar y = \bar x' b:

the averages are on the regression hyperplane.
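Both facts are easy to verify numerically. A small sketch with simulated data (variable names are ours): the residual vector is orthogonal to every regressor column, and the sample averages lie on the fitted hyperplane.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = 1.0 + 0.5 * X[:, 1] + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

# Normal equations: X'e = 0, i.e. residuals orthogonal to all regressors
print(X.T @ e)                 # numerically a zero vector

# Averages on the regression hyperplane: ybar = xbar'b
xbar = X.mean(axis=0)
print(y.mean(), xbar @ b)      # the two numbers coincide
```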


Ordinary least squares

The case K = 2 is called the simple linear regression. Geometrically, a straight line is fitted through the points (x_i, y_i). Note the property

b_1 = \bar y - b_2 \bar x

for the intercept. Starting from (X'X)^{-1}X'y, some algebra yields

b_2 = \frac{\sum_{i=1}^{N} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{N} (x_i - \bar x)^2}

for the slope.
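The textbook formulas for b_2 and b_1 reproduce the general matrix solution exactly. A sketch with simulated data (names are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 3.0 + 1.5 * x + rng.normal(size=200)

# Slope and intercept from the simple-regression formulas
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()

# Same numbers from the general OLS formula b = (X'X)^{-1} X'y
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b1, b2, b)
```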


Ordinary least squares A dummy regressor

If x_i = 1 for some observations and x_i = 0 otherwise (man/woman, employed/unemployed, before/after 1989), x is called a dummy variable. \bar x will be the share of 1's in the sample.

b_1 is the average y for the 0 individuals, \bar y[0], and b_2 is the difference between \bar y[1] and \bar y[0].
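This interpretation of the coefficients holds exactly in any sample, which a quick simulation confirms (data and names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
d = rng.integers(0, 2, size=300)          # dummy regressor (0/1)
y = 10.0 + 2.0 * d + rng.normal(size=300)

# Regress y on a constant and the dummy
X = np.column_stack([np.ones(len(d)), d.astype(float)])
b = np.linalg.solve(X.T @ X, X.T @ y)

# b1 equals the group-0 mean; b2 equals the difference of group means
print(b[0], y[d == 0].mean())
print(b[1], y[d == 1].mean() - y[d == 0].mean())
```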


Ordinary least squares Linear regression and matrix notation

Most issues are simpler and more compact in matrix notation:

X = \begin{pmatrix} 1 & x_{12} & \dots & x_{1K} \\ \vdots & \vdots & & \vdots \\ 1 & x_{N2} & \dots & x_{NK} \end{pmatrix} = \begin{pmatrix} x_1' \\ \vdots \\ x_N' \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}.

The sum of squares to be minimized is now

S(\tilde\beta) = (y - X\tilde\beta)'(y - X\tilde\beta) = y'y - 2 y'X\tilde\beta + \tilde\beta' X'X \tilde\beta,

and \partial S(\tilde\beta)/\partial\tilde\beta = 0 yields immediately b = (X'X)^{-1}X'y.


Ordinary least squares The projection matrix

The vector of observations is the sum of its systematic part and the residuals

y = Xb + e = X(X'X)^{-1}X'y + e = \hat y + e.

The matrix P_X that transforms y into its systematic part

\hat y = X(X'X)^{-1}X'y = P_X y

is called the projection matrix. It is singular, N × N of rank K, and P_X P_X = P_X. Note the orthogonality

P_X (I - P_X) = 0

to the matrix M_X = I - P_X that transforms y into the residuals e.
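The stated properties of P_X and M_X (idempotence, rank K, mutual orthogonality, and the decomposition y = P_X y + M_X y) can all be checked numerically. A sketch with simulated data (names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = rng.normal(size=N)

P = X @ np.linalg.solve(X.T @ X, X.T)    # P_X = X (X'X)^{-1} X'
M = np.eye(N) - P                         # M_X = I - P_X

print(np.allclose(P @ P, P))              # idempotent
print(np.linalg.matrix_rank(P))           # rank K, although P is N x N
print(np.allclose(P @ M, 0, atol=1e-8))   # P_X M_X = 0 (orthogonality)
print(np.allclose(P @ y + M @ y, y))      # y = yhat + e
```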

The linear regression model Linear regression in a statistical frame

◮ Without a statistical frame, regression is a mere descriptive device. The OLS coefficients are not estimates; they can be calculated as long as X'X is non-singular (no multicollinearity);
◮ The classical didactic regression model assumes stochastic (random) errors and non-stochastic regressors. Current textbooks tend to avoid that model;
◮ The stochastic regression model assumes X and y to be realizations of random variables. Under conditions, operations can be conducted conditional on X, and most classical results can be retrieved.


The linear regression model Main ingredients of the statistical regression model

In the regression in matrix notation

y = X β + ε,

y and X collect realizations of observable random variables, β contains K unobserved non-random parameters, and ε collects realizations of an unobserved random variable, the error term. The assumption of random sampling implies that observations are independent across the index i (the rows). A non-stochastic regressor has the same values in a different sample. A deterministic regressor has known values before sampling (constant, trend).


The linear regression model Exogenous regressors

y is often called the dependent variable, x_k, k = 1, \dots, K, are the regressors or covariates (usage of the word 'independent variable' is discouraged). If the exogeneity assumption on the regressors

E[\varepsilon_i \mid x_i] = 0

holds, it follows that E[y_i \mid x_i] = x_i'\beta and also that E[\varepsilon_i] = 0. With non-stochastic regressors, this exogeneity assumption is unnecessary. (There are also other definitions of exogeneity.)


The linear regression model Estimator and estimate

An estimate is a statistic obtained from a sample that is designed to approximate an unknown parameter closely: b is an estimate for β. The unknown but fixed β is called a K–dimensional parameter or a K–vector of scalar parameters (real numbers). In regression, b may be called the coefficient estimates, β may be called the coefficients.

An estimator is a rule that says how to obtain the estimate from data. OLS is an estimator.

The residual e_i approximates the true error term ε_i, but it is not an estimate, as ε_i is not a parameter. Don't be sloppy; never confuse errors and residuals.


The linear regression model Gauss-Markov conditions

The Gauss-Markov conditions imply nice properties for the OLS estimator. In particular, for the linear regression model y_i = x_i'\beta + \varepsilon_i, assume

A1 E\varepsilon_i = 0, i = 1, \dots, N;
A2 \{\varepsilon_1, \dots, \varepsilon_N\} and \{x_1, \dots, x_N\} are independent;
A3 V\varepsilon_i = \sigma^2, i = 1, \dots, N;
A4 \mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0, i, j = 1, \dots, N, i \ne j.

(A1) identifies the intercept, (A2) is an exogeneity assumption, (A3) is called the homoskedasticity assumption, (A4) assumes the absence of autocorrelation.


The linear regression model Some implications of the Gauss-Markov assumptions

◮ (A1), (A3), (A4) imply for the vector random variable ε that E\varepsilon = 0 and V(\varepsilon) = \sigma^2 I_N, a 'scalar matrix';
◮ (A1)–(A4) imply that also E(\varepsilon \mid X) = 0 and V(\varepsilon \mid X) = \sigma^2 I_N, other types of exogeneity assumptions;
◮ (A1) and (A2) imply that E(y \mid X) = X\beta.


The linear regression model Unbiasedness of OLS

Under assumptions (A1) and (A2), the OLS estimator is unbiased, i.e. Eb = β, because of

Eb = E\{(X'X)^{-1}X'y\} = E\{(X'X)^{-1}X'(X\beta + \varepsilon)\} = \beta + E\{(X'X)^{-1}X'\varepsilon\} = \beta.
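Unbiasedness is a statement about the average of b across repeated samples. A small Monte Carlo sketch (simulated data; names are ours) holds X fixed, redraws the errors many times, and averages the OLS estimates:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 40
beta = np.array([1.0, -2.0])
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # held fixed across draws

# Average the OLS estimate over many error draws: the mean approaches beta
draws = np.empty((5000, 2))
for r in range(5000):
    y = X @ beta + rng.normal(size=N)
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)
print(draws.mean(axis=0))      # close to (1.0, -2.0)
```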


The linear regression model The variance of OLS

Under assumptions (A1)–(A4), the variance of the OLS estimator follows the simple formula

V(b \mid X) = \sigma^2 (X'X)^{-1},

because of

V(b \mid X) = E\{(b - \beta)(b - \beta)' \mid X\}
            = E\{(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X\}
            = (X'X)^{-1}X' E(\varepsilon\varepsilon' \mid X) X (X'X)^{-1}
            = \sigma^2 (X'X)^{-1}.


The linear regression model The Gauss-Markov Theorem

Theorem. Under the assumptions (A1)–(A4), the OLS estimator is the best linear unbiased estimator, i.e. the estimator with the smallest variance among the linear unbiased ones.

◮ There are usually non-linear and/or biased estimators with a smaller variance;
◮ Smallness of variance is defined via definiteness, i.e. \Sigma_1 \le \Sigma_2 if \Sigma_2 - \Sigma_1 is non-negative definite;
◮ The acronym BLUE stands for 'best linear unbiased estimator'. Gauss-Markov says that OLS is BLUE.


The linear regression model Estimating the OLS variance

The OLS variance \sigma^2(X'X)^{-1} is estimated by plugging in an estimate for \sigma^2 = E(\varepsilon_i^2) that uses the residuals:

\tilde s^2 = \frac{1}{N-1} \sum_{i=1}^{N} e_i^2, \qquad s^2 = \frac{1}{N-K} \sum_{i=1}^{N} e_i^2,

with s^2 often preferred as it is unbiased. Note that Es^2 = \sigma^2, but Es \ne \sigma. The square roots of the diagonal elements of s^2(X'X)^{-1} are the standard errors of the coefficients b_k.
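Putting the pieces together, standard errors are computed from the residuals in a few lines (simulated data; names are ours):

```python
import numpy as np

rng = np.random.default_rng(6)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

s2 = e @ e / (N - K)                     # unbiased estimate of sigma^2
cov_b = s2 * np.linalg.inv(X.T @ X)      # estimated V(b | X)
se = np.sqrt(np.diag(cov_b))             # standard errors of the b_k
print(se)
```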


Goodness of fit The R2 The most customary goodness-of-fit measure is the R2 defined by

R^2 = \frac{\sum_{i=1}^{N} (\hat y_i - \bar y)^2}{\sum_{i=1}^{N} (y_i - \bar y)^2} = \frac{\hat V(\hat y)}{\hat V(y)},

with \hat V denoting empirical variance and \bar y the sample mean. Orthogonality between \hat y and e implies that

\hat V(y) = \hat V(\hat y) + \hat V(e),

such that

R^2 = 1 - \frac{\hat V(e)}{\hat V(y)} = 1 - \frac{\sum_{i=1}^{N} e_i^2}{\sum_{i=1}^{N} (y_i - \bar y)^2}.
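The two expressions for R² coincide whenever the regression contains an intercept, since then the residuals sum to zero and the orthogonality decomposition applies. A numerical sketch (simulated data; names are ours):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 80
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = 2.0 + 1.0 * X[:, 1] + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
yhat = X @ b
e = y - yhat

# Two equivalent definitions (equivalent because an intercept is included)
r2_var = np.var(yhat) / np.var(y)                       # Vhat(yhat) / Vhat(y)
r2_rss = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)    # 1 - RSS / TSS
print(r2_var, r2_rss)
```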

Goodness of fit Variants of R^2

If a regression is run without an intercept ('homogeneous regression'), the uncentered R^2

R_0^2 = 1 - \frac{\sum_{i=1}^{N} e_i^2}{\sum_{i=1}^{N} y_i^2}

becomes attractive. Comparison with the usual R^2 becomes difficult. If R^2 is interpreted as an estimate of the squared correlation coefficient of y and \hat y, it is severely biased. The adjusted R^2

\bar R^2 = 1 - \frac{\frac{1}{N-K} \sum_{i=1}^{N} e_i^2}{\frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar y)^2}

has less bias. It is not a panacea, however, for the problem that R^2 cannot be used for model selection.

Restriction tests on coefficients The t–statistic

In addition to (A1)–(A4), assume

A5 The errors \varepsilon_i follow a normal distribution.

Then, it follows that the vector \varepsilon \sim N(0, \sigma^2 I_N). Clearly, y \mid X \sim N(X\beta, \sigma^2 I_N), and also the OLS coefficient b follows a normal distribution. It can be shown that the ratio

t_k = \frac{b_k - \beta_k}{s \sqrt{c_{kk}}}

is t-distributed with N - K degrees of freedom. Here, c_{kk} denotes the k-th diagonal element of (X'X)^{-1}.


Restriction tests on coefficients The t–test

The null hypothesis of interest is H_0: \beta_k = \beta_k^0. Then, under the null hypothesis, the t-statistic

t_k = \frac{b_k - \beta_k^0}{s \sqrt{c_{kk}}}

is t_{N-K} distributed. As N - K increases, t_{N-K} approaches the standard normal N(0, 1) distribution. It is customary to say that 'a coefficient is significant' whenever its |t_k| for H_0: \beta_k = 0 is larger than 1.96 or even 2.
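The t-statistics for H_0: β_k = 0 follow directly from b, s, and the diagonal of (X'X)^{-1}. A sketch with simulated data (names are ours; the second regressor has a clearly non-zero coefficient by construction):

```python
import numpy as np

rng = np.random.default_rng(8)
N, K = 120, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.8, 0.0])          # third coefficient truly zero
y = X @ beta + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
s = np.sqrt(e @ e / (N - K))              # estimate of sigma
c = np.diag(np.linalg.inv(X.T @ X))       # the c_kk

t = b / (s * np.sqrt(c))                  # t-statistics for H0: beta_k = 0
print(t)                                  # compare |t_k| with 1.96
```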


Restriction tests on coefficients The F–test Assume the null hypothesis of interest is

H_0: \beta_{K-J+1} = \dots = \beta_K = 0,

i.e. J restrictions on the coefficients. Let S_0 denote the residual sum of squares if OLS is applied to the restricted model with K - J regressors, and S_1 the RSS for the unrestricted model with K regressors. One can show that the F-statistic

F = \frac{(S_0 - S_1)/J}{S_1/(N - K)}

is under H_0 distributed F(J, N - K). As N gets large, the F-test becomes essentially a \chi^2(J) test.
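The statistic only requires the two residual sums of squares. A sketch (simulated data; the helper `rss` and all names are ours), testing J = 2 zero restrictions:

```python
import numpy as np

rng = np.random.default_rng(9)
N, K, J = 100, 4, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=N)

def rss(Xm, y):
    """Residual sum of squares of an OLS fit of y on Xm."""
    b = np.linalg.solve(Xm.T @ Xm, Xm.T @ y)
    e = y - Xm @ b
    return e @ e

S1 = rss(X, y)               # unrestricted model: all K regressors
S0 = rss(X[:, :K - J], y)    # restricted model: last J coefficients set to 0

F = ((S0 - S1) / J) / (S1 / (N - K))
print(F)                     # compare with critical values of F(J, N - K)
```

Since the restricted model is nested in the unrestricted one, S_0 ≥ S_1 and F ≥ 0 always.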


Restriction tests on coefficients Variants of the F–test

◮ For J = 1, F is the square of the t-statistic;
◮ For J = K - 1, the F-statistic is a statistical counterpart of the purely descriptive R^2: S_1 is the regression RSS, S_0 is the total sum of squares \sum (y_i - \bar y)^2. This is the only F shown in a standard regression printout;
◮ For any linear combinations of restrictions, such as \beta_2 = \beta_3, the concept is easily generalized, with S_0 corresponding to an OLS regression under the specified restrictions and J the number of linearly independent restrictions.


Restriction tests on coefficients t-test, F-test, and Wald test

Often, the t-test and the F-test in linear regression are referred to as 'the Wald test'. This is not quite accurate:
◮ The Wald test is one out of three basic construction principles for hypothesis tests in parametric inference; the other two are the likelihood-ratio test and the Lagrange multiplier test. These are not specific tests, but construction principles for tests;
◮ The main idea of the Wald test is to estimate the model under the general ('maintained') hypothesis and to check whether the null hypothesis is fulfilled;
◮ The Wald principle is attractive when the general unrestricted model is easy to estimate but estimating the restricted model is costly. In linear regression, all models are easily estimated.


Restriction tests on coefficients p–values

Currently, the significance of a test decision is reported in one of two ways:
1. The value of the test statistic is compared to critical values of the null distribution (often 1%, 5%, 10%). If the statistic transgresses the 1% point, it receives more asterisks (stars) than if it only transgresses the 10% point: 'star wars';
2. The marginal significance level is calculated at which the decision whether to reject the null or not becomes inconclusive: the p-value. If the p-value is less than 0.05, the evidence against the null is significant at 5%.

Remember that the p-value is not the probability of the null hypothesis. It should not be called 'prob-value'.


Restriction tests on coefficients Some language for hypothesis tests

◮ Rejecting H_0 although it is correct is a type I error;
◮ The probability of a type I error is called the size of the test. If the size is not constant under H_0, the maximum may be called the size;
◮ Not rejecting H_0 although it is incorrect is a type II error;
◮ One minus the probability of a type II error (the probability of correct rejection) is called the power of the test. The power will typically not be constant on the alternative. It will increase with the distance to the null. Close to the null, the power will be close to the size.


Asymptotic properties of OLS Consistency of OLS

Introduce the assumption

A6 \lim_{N\to\infty} X'X/N = \Sigma_X, a finite non-singular matrix,

which is stronger than no multicollinearity. It excludes 'asymptotic multicollinearity' as well as increasing regressors.

Theorem. Under the assumptions (A1)–(A4), (A6), it holds that plim b = \beta.

This property is also called in short ’b is consistent for β’. Formally, it is defined by

\lim_{N\to\infty} P\{|b_k - \beta_k| > \delta\} = 0 \quad \forall \delta > 0, \ \forall k.

Proof uses the Chebyshev inequality.
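Consistency can be illustrated by letting the sample size grow: the estimate concentrates around β. A sketch (simulated data; the helper `ols_estimate` and all names are ours):

```python
import numpy as np

rng = np.random.default_rng(10)
beta = np.array([1.0, 2.0])

def ols_estimate(N):
    """Draw a sample of size N and return the OLS estimate of beta."""
    X = np.column_stack([np.ones(N), rng.normal(size=N)])
    y = X @ beta + rng.normal(size=N)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Maximum absolute estimation error shrinks as N grows
errors = [np.abs(ols_estimate(N) - beta).max() for N in (100, 10_000, 1_000_000)]
print(errors)
```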


Asymptotic properties of OLS Remarks on OLS consistency

◮ If g(·) is a continuous function, then the consistency of b for β implies the consistency of g(b) for g(β);
◮ Consider the representation

b = (N^{-1}X'X)^{-1} N^{-1}X'y = \beta + (N^{-1}X'X)^{-1} N^{-1}X'\varepsilon,

such that b converges to β if the last term converges to 0. The factor N^{-1}X'X converges to a fixed matrix \Sigma_X by assumption. There will be consistency if N^{-1}X'\varepsilon converges to 0, and this will hold true under conditions weaker than the Gauss-Markov conditions, admitting some correlation among errors and some heteroskedasticity.


Asymptotic properties of OLS Asymptotic normality of OLS

Theorem. Under assumptions (A1)–(A4), (A6), it holds that

\sqrt{N}(b - \beta) \to N(0, \sigma^2 \Sigma_X^{-1}).

Again, for this result conditions weaker than the Gauss-Markov conditions would suffice. Note that the normality assumption (A5) is not needed, according to the central limit theorem. The theorem implies that b is asymptotically distributed as N(\beta, \sigma^2(X'X)^{-1}). Similarly, the corresponding t-statistics will be asymptotically N(0, 1) distributed, etc.
