Chapter 2. Fixed Effects Models

© 2003 by Edward W. Frees. All rights reserved

Abstract. This chapter introduces the analysis of longitudinal and panel data using the general linear model framework. Here, longitudinal data modeling is cast as a regression problem by using fixed parameters to represent the heterogeneity; nonrandom quantities that account for the heterogeneity are known as fixed effects. In this way, ideas of model representation and data exploration are introduced using regression analysis, a toolkit that is widely known. Analysis of covariance, from the general linear model, easily handles the many parameters that are needed to represent the heterogeneity. Although longitudinal and panel data can be analyzed using regression techniques, it is also important to emphasize the special features of these data. Specifically, the chapter emphasizes the wide cross-section and the short time series of many longitudinal and panel data sets, as well as the special model specification and diagnostic tools needed to handle these features.

2.1 Basic fixed effects model

Data

Suppose that we are interested in explaining hospital costs for each state in terms of measures of utilization, such as the number of discharged patients and the average hospital stay per discharge. Here, we consider the state to be the unit of observation, or subject. We differentiate among states with the index i, where i may range from 1 to n, and n is the number of subjects. Each state is observed T_i times and we use the index t to differentiate the observation times. With these indices, let y_it denote the response of the ith subject at the tth time point.

Associated with each response y_it is a set of explanatory variables, or covariates. For example, for state hospital costs, these explanatory variables include the number of discharged patients and the average hospital stay per discharge. In general, we assume there are K explanatory variables x_it,1, x_it,2, ..., x_it,K that may vary by subject i and time t. We achieve a more compact notational form by expressing the K explanatory variables as a K × 1 column vector

$$ \mathbf{x}_{it} = \begin{pmatrix} x_{it,1} \\ x_{it,2} \\ \vdots \\ x_{it,K} \end{pmatrix}. $$

To save space, it is customary to use the alternate expression x_it = (x_it,1, x_it,2, ..., x_it,K)′, where the prime "′" means transpose. (You will find that some sources prefer to use a superscript "T" for transpose. Here, T will refer to the number of time replications.) Thus, the data for the ith subject consist of

$$ \{x_{i1,1}, \ldots, x_{i1,K}, y_{i1}\}, \;\ldots,\; \{x_{iT_i,1}, \ldots, x_{iT_i,K}, y_{iT_i}\}, $$

which can be expressed more compactly as

$$ \{\mathbf{x}_{i1}', y_{i1}\}, \;\ldots,\; \{\mathbf{x}_{iT_i}', y_{iT_i}\}. $$

Unless specified otherwise, we allow the number of responses to vary by subject, indicated with the notation T_i. This is known as the unbalanced case. We use the notation T = max{T_1, T_2, ..., T_n} for the maximal number of responses for a subject. Recall from Section 1.1 that the case T_i = T for each i is called the balanced case.

Basic models

To analyze relationships among variables, the relationships between the response and the explanatory variables are summarized through the regression function

$$ \mathrm{E}\, y_{it} = \alpha + \beta_1 x_{it,1} + \beta_2 x_{it,2} + \cdots + \beta_K x_{it,K}, \qquad (2.1) $$

that is linear in the parameters α, β_1, ..., β_K. For applications where the explanatory variables are nonrandom, the only restriction of equation (2.1) is that we believe that the variables enter linearly.
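To make the data layout and the homogeneous model in equation (2.1) concrete, here is a minimal sketch in Python. The unbalanced panel, the parameter values, and the use of numpy are illustrative assumptions for this sketch, not taken from the text.

```python
# Sketch: simulate an unbalanced panel {x_it, y_it} and fit the pooled
# (homogeneous) regression of equation (2.1) by ordinary least squares.
import numpy as np

rng = np.random.default_rng(0)
n, K = 10, 2                       # n subjects, K explanatory variables
Ti = rng.integers(2, 6, size=n)    # unbalanced case: T_i varies by subject
alpha, beta, sigma = 1.0, np.array([0.5, -0.3]), 0.2  # assumed values

X_rows, y_rows = [], []
for i in range(n):
    for t in range(Ti[i]):
        x_it = rng.normal(size=K)              # covariate vector x_it
        X_rows.append(x_it)
        # E y_it = alpha + x_it' beta; noise gives Var y_it = sigma^2
        y_rows.append(alpha + x_it @ beta + sigma * rng.normal())

X = np.column_stack([np.ones(len(y_rows)), np.array(X_rows)])  # intercept + covariates
y = np.array(y_rows)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # [alpha_hat, beta_1_hat, ..., beta_K_hat]
print(coef)
```

Note that the pooled fit stacks all subjects' observations into one design matrix; nothing in it yet exploits the repeated measurements on a subject.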
As we will see in Chapter 6, for applications where the explanatory variables are random, we may interpret the expectation in equation (2.1) as conditional on the observed explanatory variables. We focus attention on assumptions that concern the observable variables, {x_it,1, ..., x_it,K, y_it}.

Assumptions of the Observables Representation of the Linear Regression Model
F1. E y_it = α + β_1 x_it,1 + β_2 x_it,2 + ... + β_K x_it,K.
F2. {x_it,1, ..., x_it,K} are nonstochastic variables.
F3. Var y_it = σ².
F4. {y_it} are independent random variables.

The "observables representation" is based on the idea of conditional linear expectations (see Goldberger, 1991, for additional background). One can motivate assumption F1 by thinking of (x_it,1, ..., x_it,K, y_it) as a draw from a population, where the mean of the conditional distribution of y_it given {x_it,1, ..., x_it,K} is linear in the explanatory variables. Inference about the distribution of y is conditional on the observed explanatory variables, so that we may treat {x_it,1, ..., x_it,K} as nonstochastic variables. When considering types of sampling mechanisms for thinking of (x_it,1, ..., x_it,K, y_it) as a draw from a population, it is convenient to think of a stratified random sampling scheme, where values of {x_it,1, ..., x_it,K} are treated as the strata. That is, for each value of {x_it,1, ..., x_it,K}, we draw a random sample of responses from a population. This sampling scheme also provides motivation for assumption F4, the independence among responses. To illustrate, when drawing from a database of firms to understand stock return performance (y), one can choose large firms (measured by asset size), focus on an industry (measured by standard industrial classification), and so forth. One should not select firms with the largest stock return performance, because this is stratifying based on the response, not the explanatory variables.

A fifth assumption that is often implicitly required in the linear regression model is:
F5. {y_it} is normally distributed.

This assumption is not required for all statistical inference procedures because central limit theorems provide approximate normality for many statistics of interest. However, formal justification for some procedures, such as t-statistics, does require this additional assumption.

In contrast to the observables representation, the classical formulation of the linear regression model focuses attention on the "errors" in the regression, defined as ε_it = y_it − (α + β_1 x_it,1 + β_2 x_it,2 + ... + β_K x_it,K).

Assumptions of the Error Representation of the Linear Regression Model
E1. y_it = α + β_1 x_it,1 + β_2 x_it,2 + ... + β_K x_it,K + ε_it, where E ε_it = 0.
E2. {x_it,1, ..., x_it,K} are nonstochastic variables.
E3. Var ε_it = σ².
E4. {ε_it} are independent random variables.

The "error representation" is based on the Gaussian theory of errors (see Stigler, 1986, for a historical background). As described above, the linear regression function incorporates the additional knowledge from independent variables through the relation E y_it = α + β_1 x_it,1 + β_2 x_it,2 + ... + β_K x_it,K. Other unobserved variables that influence the measurement of y are encapsulated in the "error" term ε_it, which is also known as the "disturbance" term. The independence of errors, E4, can be motivated by assuming that {ε_it} are realized through a simple random sample from an unknown population of errors. Assumptions E1–E4 are equivalent to assumptions F1–F4.
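One direction of this equivalence is a short computation, sketched below for concreteness; it is a standard argument, not reproduced from the text.

```latex
% Under E1--E4, using E2 (the x's are nonstochastic):
\begin{align*}
\mathrm{E}\, y_{it} &= \alpha + \beta_1 x_{it,1} + \cdots + \beta_K x_{it,K}
                       + \mathrm{E}\,\varepsilon_{it}
                     = \alpha + \beta_1 x_{it,1} + \cdots + \beta_K x_{it,K},
                       && \text{(F1)} \\
\mathrm{Var}\, y_{it} &= \mathrm{Var}\,\varepsilon_{it} = \sigma^2.
                       && \text{(F3)}
\end{align*}
% F4 follows because each y_{it} differs from \varepsilon_{it} only by a
% nonstochastic shift, so independence of the errors carries over to the
% responses. For the converse, define \varepsilon_{it} as in the text.
```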
The error representation provides a useful springboard for motivating goodness of fit measures. However, a drawback of the error representation is that it shifts attention from the observable quantities (x_it,1, ..., x_it,K, y_it) to an unobservable quantity, {ε_it}. To illustrate, the sampling basis, viewing {ε_it} as a simple random sample, is not directly verifiable because one cannot directly observe the sample {ε_it}. Moreover, the assumption of additive errors in E1 will be troublesome when we consider nonlinear regression models in Part II. Our treatment focuses on the observables representation in Assumptions F1–F4.

In assumption F1, the slope parameters β_1, β_2, ..., β_K are associated with the K explanatory variables. For a more compact expression, we summarize the parameters as a column vector of dimension K × 1, denoted by

$$ \boldsymbol{\beta} = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}. $$

With this notation, we may rewrite assumption F1 as

$$ \mathrm{E}\, y_{it} = \alpha + \mathbf{x}_{it}' \boldsymbol{\beta}, \qquad (2.2) $$

because of the relation x_it′ β = β_1 x_it,1 + β_2 x_it,2 + ... + β_K x_it,K. We call the representation in equation (2.2) cross-sectional because, although it relates the explanatory variables to the response, it does not use the information in the repeated measurements on a subject. Because it also does not include (subject-specific) heterogeneous terms, we also refer to the equation (2.2) representation as part of a homogeneous model.

Our first representation that uses the information in the repeated measurements on a subject is

$$ \mathrm{E}\, y_{it} = \alpha_i + \mathbf{x}_{it}' \boldsymbol{\beta}. \qquad (2.3) $$

Equation (2.3) and assumptions F2–F4 comprise the basic fixed effects model. Unlike equation (2.2), in equation (2.3) the intercept terms, α_i, are allowed to vary by subject.

Parameters of interest

The parameters {β_j} are common to each subject and are called global, or population, parameters. The parameters {α_i} vary by subject and are known as individual, or subject-specific, parameters. In many applications, we will see that population parameters capture broad relationships of interest and hence are the parameters of interest. The subject-specific parameters account for the different features of subjects, not broad population patterns. Hence, they are often of secondary interest and are called nuisance parameters.

As we saw in Section 1.3, the subject-specific parameters represent our first device that helps control for the heterogeneity among subjects. We will see that estimators of these parameters use information in the repeated measurements on a subject. Conversely, the parameters {α_i} are non-estimable in cross-sectional regression models without repeated observations. That is, with T_i = 1, the model

$$ y_{i1} = \alpha_i + \beta_1 x_{i1,1} + \beta_2 x_{i1,2} + \cdots + \beta_K x_{i1,K} + \varepsilon_{i1} $$

has more parameters (n + K) than observations (n), and thus we cannot identify all the parameters. Typically, the disturbance term ε_it includes the information in α_i in cross-sectional regression models.
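To make estimation of the basic fixed effects model concrete, here is a minimal sketch: least squares with one indicator (dummy) variable per subject. The simulated balanced panel, the parameter values, and the use of numpy are illustrative assumptions for this sketch, not part of the text.

```python
# Sketch: fit E y_it = alpha_i + x_it' beta (equation 2.3) by least squares
# with a dummy column for each subject.
import numpy as np

rng = np.random.default_rng(1)
n, T, K = 5, 4, 2                      # balanced case for simplicity
alpha_i = rng.normal(size=n)           # subject-specific (nuisance) intercepts
beta = np.array([0.5, -0.3])           # global (population) parameters

subj = np.repeat(np.arange(n), T)      # subject index i for each observation
X = rng.normal(size=(n * T, K))        # covariates x_it
y = alpha_i[subj] + X @ beta + 0.2 * rng.normal(size=n * T)

# Design matrix: n indicator columns (one per subject), then the K covariates.
D = (subj[:, None] == np.arange(n)).astype(float)
Z = np.hstack([D, X])
est, *_ = np.linalg.lstsq(Z, y, rcond=None)
print("alpha_i estimates:", est[:n])   # estimable because each T_i >= 2
print("beta estimates:   ", est[n:])
```

Consistent with the identifiability discussion above, if T_i = 1 for every subject the design matrix Z would have n + K columns but only n rows, and least squares could not separate the intercepts α_i from the disturbances.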