ESTIMATION IN THE FIXED-EFFECTS ORDERED MODEL Chris Muris*

Abstract—This paper introduces a new estimator for the fixed-effects Second, the new estimator for the regression coefficient is ordered logit model. The proposed method has two advantages over existing estimators. First, it estimates the differences in the cut points along with the more efficient than existing estimators, despite the additional regression coefficient, leading to provide bounds on partial effects. Second, cut-point differences that need to be estimated. I show a strict the proposed estimator for the regression coefficient is more efficient. I use efficiency gain with respect to the most efficient estimator the fact that the ordered logit model with J outcomes and T observations 4 can be converted to a binary choice logit model in (J − 1)T ways. As an currently available. A simulation study suggests that the empirical illustration, I examine the income-health gradient for children efficiency gain can be substantial. using the Medical Expenditure Panel Survey. The proposed estimator is based on the observation that the ordered logit model with J outcomes and T observa- I. Introduction tions can be converted to a binary choice logit model in (J − 1)T ways. The conditional maximum likelihood estima- HE fixed-effects ordered logit model is widely used in tor in Chamberlain (1980) can be applied to each of these T empirical research in economics.1 The model allows a binary choice models. The proposed procedure optimally researcher with panel data and an ordinal dependent variable combines the information from all these binary choice mod- to control for time-invariant unobserved heterogeneity that els. Existing methods do one of the following: they use only is correlated with the observed covariates in an unrestricted (J −1) transformations or they collapse the ordered variable way. to a binary choice variable, which corresponds to using only In this paper, I propose a new estimator for the fixed- one such transformation. These procedures are less efficient effects ordered logit model. This procedure has two advan- for the regression coefficient and do not provide an estimate tages over existing methods. First, it simultaneously esti- of the cut-point differences. mates the differences in the cut points and the regression To fix ideas, consider the following simple example. Let coefficient. The cut-point differences can be used to bound yt ∈ {1, 2, 3} be an ordered random variable indexed by time. a marginal effect.2 Existing estimators do not estimate the Choose category 1 as the cutoff category and consider the cut-point differences and consequently cannot be used to transformation dt,1 = 1 {yt ≤ 1}. Since the transformed vari- 3 examine the magnitude of the regression coefficient. able dt,1 is a binary choice variable, the conditional logit estimator in Chamberlain (1980) can be applied to dt,1. The Received for publication December 8, 2014. Revision accepted for same procedure can be repeated for a transformation based publication April 21, 2016. Editor: Bryan S. Graham. on the other cutoff category d = 1 {y ≤ 2}. For efficiency, * Simon Fraser University. t,2 t I thank Irene Botosaru, Heng Chen, Bryan Graham, Brian Krauth, Krishna one could then combine the estimators based on dt,1 and dt,2. Pendakur, Pedro Portugal, Pedro Raposo, Marcel Voia, and Nathanael The first paper to point this out is Das and van Soest (1999). Vellekoop for very helpful comments and suggestions. This paper has ben- I show that there are two additional transformations when efited greatly from the feedback of two anonymous referees and from seminar participants at the 2014 Seattle-Vancouver Econometrics Con- we allow the cutoff category to vary over time, ference, Catolica Lisbon, University of Kentucky, the 2015 Canadian Economic Association meetings, Bristol University, Cemmap, and Maas- 1 {y1 ≤ 1} if t = 1, tricht University. dt,(1,2) = 1 {yt ≤ t} = A supplemental appendix is available online at http://www.mitpress 1 {y2 ≤ 2} if t = 2, journals.org/doi/suppl/10.1162/REST_a_00617. 1 Examples of empirical research using panel data with ordered responses and dt,(2,1) = 1 {yt ≤ 3 − t}. These transformations provide can be found in many areas of economics. In health economics, Carman (2013) looks at the intergenerational transfers and health, and Frijters, additional information about the regression coefficient, and Haisken-DeNew, and Shields (2005) and Khanam, Nghiem, and Connelly about how far apart the categories are. Therefore, combining (2014) look at the relationship between income and health. In the context the conditional logit estimators based on these transforma- of the economics of educations, Fairlie, Hoffmann, and Oreopoulos (2014) estimate the effect of same-minority teachers on student outcomes. Allen tion allows us to estimate the regression coefficient more and Allnutt (2013) are interested in the effect of the Teach First program efficiently and simultaneously gives us an estimator for the on student achievement. In labor economics, examples of authors using the cut-point differences. fixed-effects ordered logit model are Das and van Soest (1999), Hamermesh (2001), and Booth and van Ours (2008, 2013). An area of research where The remainder of this paper is organized as follows. fixed-effects ordered logit models are heavily used is the empirical research Section II surveys the related literature. Section III intro- of life satisfaction; see, among many others, Ferrer-i-Carbonell and Frijters duces the fixed-effects ordered logit model and presents the (2004), Frijters et al. (2004), and Blanchflower and Oswald (2004). Finally, J − T the fixed-effects order logit model is useful for the analysis of (sovereign) main result concerning the ( 1) sufficient . In credit ratings, for example, Amato and Furfine (2003) and Afonso, Gomes, section IIIC, I discuss how the cut-point differences can be and Rother (2013). used to bound a certain marginal effect that may be rel- 2 Without the cut-point differences, partial effects cannot be computed. This is closely related to an analogous drawback in the context of the fixed- evant to empirical practice. In section IV, I introduce the effects binary choice logit model. See the discussion in Chamberlain (1984), Honoré (2002), and Wooldridge (2010). 4 This estimator was introduced by Das and van Soest (1999). Their 3 See Baetschmann, Staub, and Winkelmann (2015) for a recent contribu- estimator, and some variations on it, are discussed in Baetschmann et al. tion and an overview of the existing procedures. (2015).

The Review of Economics and Statistics, July 2017, 99(3): 465–477 © 2017 by the President and Fellows of Harvard College and the Massachusetts Institute of Technology doi:10.1162/REST_a_00617

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 466 THE REVIEW OF ECONOMICS AND STATISTICS

conditional maximum likelihood estimator based on a sin- Magnac (2004) relaxes the serial independence assumption gle transformation and establish its asymptotic properties. on the error terms. In section V, I show how to efficiently combine the infor- This paper is not the first to consider estimation in the mation from all such estimators into a single estimator for fixed-effects ordered logit model. Das and van Soest (1999) the regression coefficient and cut-point differences. I show discuss how to combine the information from several binary that the estimator is at least as efficient as currently avail- choice models into one estimator. Baetschmann et al. (2015) able procedures. Section VI discusses the implementation of analyze the estimator in Das and van Soest and discuss dif- the estimator, focusing on its implementation in Stata. I also ferent ways of aggregating the information from the binary discuss a composite likelihood procedure designed to over- choice models considered by Das and van Soest. They also come finite-sample issues. Section VII contains an empirical introduce a composite likelihood estimator for the regres- illustration on health satisfaction of children as it relates to sion coefficient that is asymptotically less efficient than the family income. Section VIII concludes. A supplementary estimator in Das and van Soest but is shown to be preferable online appendix contains additional proofs and derivations in small samples in an extensive simulation study. They also (appendix A) and documents the results of a simulation study show that the procedure in Ferrer-i-Carbonell and Frijters (appendix B). (2004) is inconsistent. Of the fourteen empirical papers listed in note 1, nine use methods developed for fixed-effects model in ordered II. Related Literature response models: Baetschmann et al. (2015) appears three times, Ferrer-i-Carbonell and Frijters’s (2004) procedure is This paper contributes to the literature on estimation used by five papers, and the Chamberlain (2010) approach in nonlinear, parametric, large-n fixed-T panel data mod- and the procedure in Das and van Soest are each used els with fixed effects by providing an estimator for the once. Four papers use linear panel data models, ignoring parameters in the fixed-effects ordered logit model. the discrete nature of the dependent variable. In such models, estimation is complicated due to the inci- A (correlated) random effects ((C)RE) approach is used in dental parameters problem (see, e.g., Lancaster, 2000). For five papers. In a CRE model, the unobserved heterogeneity a small class of models, model-specific solutions are avail- is modeled as a function of time-invariant characteristics, able. The binary choice logit model is discussed in detail in including time-averaged regressors, with an additive error the next paragraph. Machado (2004) analyzes the binomial term that is assumed to be independent of the regressors regression model with logistic link function. Truncated and in the model. That model is more restrictive than a fixed- censored regression models are discussed by Honoré (1992), effects model, which does not impose any restrictions on the and count data models are discussed in Hausman, Hall, and relationship between the unobserved heterogeneity and the Griliches (1984). regressors. The main drawback of the CRE approach is that Results for restricted classes of models are sometimes misspecification of the model for the unobserved heterogene- available. Hahn (1997) provides results on efficient esti- ity will produce an inconsistent estimator. Honoré (2002) mation in panel data models if the density belongs to the points out some further drawbacks of the CRE approach in exponential class. Wooldridge (1999) considers two classes the context of nonlinear models. of models with multiplicative unobserved heterogeneity. Lancaster (2002) proposes a likelihood-based approach on III. Model and Main Result models that allow an orthogonal reparameterization of the incidental parameter. Bonhomme (2012) proposes a func- Section IIIA introduces the fixed-effects ordered logit tional differencing approach and shows that it is useful for model. Section IIIB shows that the model can be transformed a class of models including random coefficient models and into a fixed-effects binary choice logit model by using poten- models. Excellent reviews for the non- tially time-varying cutoff categories. This leads to the main linear panel data literature are in Arellano and Honoré (2001) result, which establishes the existence of a sufficient statistic and Arellano and Bonhomme (2012). for the unobserved heterogeneity in each of the transformed Estimation in the fixed-effects ordered logit model is models. Finally, section IIIC discusses how the differences closely related to the literature on fixed-effects binary choice of the cut-point parameters in the ordered model can be used logit models. Building on Andersen (1970), Chamberlain to bound a marginal effect, thus providing an interpretation (1980) discusses the conditional maximum likelihood esti- for the magnitude of the regression coefficient. mator (CMLE) in the fixed-effects binary choice logit model and in an unordered logistic model. The A. Fixed-Effects Ordered Logit Model logistic case is special. Chamberlain (2010) shows that in the fixed-effects binary choice model with unbounded and Consider the ordered logit model with additive unobserved strictly√ exogenous regressors, and i.i.d., unit-variance error heterogeneity in the latent variable, terms, n consistent estimation is possible only if the error ∗ = α + β + = ··· ≥ terms follow a logistic distribution. In a two-period setting, yit i Xit uit, t 1, , T 2, (1)

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 ESTIMATION IN THE FIXED-EFFECTS ORDERED LOGIT MODEL 467

1×K for an individual i with a vector of regressors Xit ∈ R , shows that such a sufficient statistic for the unobserved het- K×1 T and associated vector of regression coefficients β ∈ R . erogeneity parameter αi is available for (J − 1) different The error terms are assumed to be serially independent transformations of the fixed-effects ordered logit model in conditional on the regressors Xi = (Xi1, ··· , XiT ) and the equations (1) to (3). unobserved heterogeneity αi ∈ R, and to follow a standard The ordered dependent variable yit can be transformed into logistic distribution a binary variable by checking whether yit ≤ π (t) for any time series of cutoff categories π (t), t = 1, ··· , T. There ··· | α ∼ T (ui1, , uiT ) ( i, Xi) iid LOG (0, 1).(2)are (J − 1) ways of constructing such a transformation π = π T ( (t))t=1. Denote by di,π the binary time series that results The observed, ordered dependent variable yit ∈ {1, ··· , J} is ∗ from applying transformation π to observation i: linked to the latent variable y through cut points γ ∈ R in it j  the following way: = = { ≤ π } = ··· ⎧ di,π di,t,π 1 yit (t) , t 1, , T . ⎪ ∗ γ ⎪1if yit < 1, ⎪ The transformed observation d π follows the model: ⎨ γ ≤ ∗ γ i, 2if1 yit < 2, y = ∗ it ⎪. . (3) y = α + X β + u , (6) ⎪. . it i it it ⎩⎪ | γ ≤ ∗ ui (αi, Xi) ∼ iid LOG (0, 1), (7) J if J−1 yit. di,t,π = 1 αi + Xitβ + uit < γπ(t) . (8) We will refer to the model in equations (1) to (3) as the fixed-effects ordered logit model. I will refer to the model in equations (6) to (8) as the π- = ··· A random sample of size n on (yi, Xi) (yi1, , yiT , transformed fixed-effects binary choice logit model. Denote ··· Xi1, , XiT ) is available for the estimation of the regression the number of observations below or at the associated cutoff coefficient and the cut points. In the asymptotic analysis, categories by the number of cross-section units diverges to infinity. The number of time periods T can be small as long as T ≥ 2. T ¯ = Conditional on the covariates Xi and the unobserved het- di,π di,t,π. = erogeneity αi, the probability that the ordered dependent t 1 variable yit assumes a particular value j is Furthermore, denote by p π (d) the probability distribu- i, ¯ = | α tion of d π conditional on d π, written as a function of P (yit j Xi, i)  i, i,  the regression coefficient β of interest and the cut points = P γj−1 < αi + Xitβ + uit < γj Xit, αi   γ = (γ , ··· , γ − ): = Λ γ − α − β − Λ γ − α − β 1 J 1 j i Xit j−1 i Xit ,(4)  ¯ ¯ p π (d|β, γ) ≡ P d π = d|d π = d, X , α . where Λ (x) = exp (x) / (1 + exp (x)) is the CDF of the i, i, i, i i γ = logistic distribution, and we have implicitly set 0 Finally, denote by F¯ the set of all binary T—vectors that −∞ γ =+∞ d and J . The maximum likelihood estima- set exactly d¯ elements to 1: tor is affected by the incidental parameters problem (see   Lancaster, 2000) since the number of parameters in the = ∈ { }T ¯ = ¯ Fd¯ f 0, 1 such that f d . likelihood function, ¯ n T J  The following theorem formalizes that di,π is a sufficient statistic for α in the π-transformed fixed effects binary Ln (β, δ, α) = Λ γj − αi − Xitβ i = = = choice logit model, equations (6) to (8). i 1 t 1 j 1 1{yit =j} − Λ γ − − α − X β , (5) j 1 i it Theorem 1. If the random vector (yi, Xi) follows the fixed- n β effects ordered logit model in equations (1)–(3), then for any grows with the sample size . The estimator for based on π equation (5) will be inconsistent for n →∞if T is fixed. transformation , the conditional probability distribution of the π-transformed dependent variable di,π is given by B. Transformations 1 pi,π (d| β, γ) =     The incidental parameters problem can be avoided in ∈ exp ( ft − dt) γπ(t) − Xitβ f Fd¯ t models for which a sufficient statistic for the incidental (9) parameters is available.5 The main result in this section for any d ∈ {0, 1}T . This conditional probability distribution 5 Andersen (1970) derives conditions under which a CMLE may be consis- does not depend on α . tent for the common parameters. I use the sufficient statistic in Chamberlain i (1980), who extends the insight in Andersen (1970) to the fixed-effects binary choice logit model. The proof is in appendix A.1.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 468 THE REVIEW OF ECONOMICS AND STATISTICS

Table 1.—An Illustration of the Relationship between the Ordered transformed observations has variation, but observation y3 Variables yit , the Transformed Variables di,t,π, and the Sum of the ¯ does, since y3 = (1, 3) is transformed into d3,(1,2) = (1, 0). Transformed Variables di,π for the Case J = 3, T = 2, and Three Different Cross-Section Units This last transformation, π = (1, 2), can easily be seen to discard information as it reduces the effective sample tyi,t di,t,(1,1) di,t,(2,2) di,t,(1,2) di,t,(2,1) size from three to one: only one of the three observations Observation 1 111111has variation in the transformed dependent variable. Trans- 220110formation π = (2, 1) (column 6 in table 1) is inefficient ¯ d1,π 1221because all the observations are equivalent after applying Observation 2 that transformation. These examples highlight that the choice 120101of transformation determines the source of variation the 230000 ¯ researcher uses when estimating the parameters of the model. d2,π 0101 Observation 3 Interestingly, for observations with no variation in the 111111ordered dependent variable, there may be transformations 230000 ¯ that induce variation in the transformed dependent variable. d3,π 1111This happens when the ordered variable is constant, and not equal to 1 or J. An example is y4 = (2, 2) and π = (2, 1). = The remainder of this section discusses the main result. Then d4,(2,2) (1, 0), so that the transformed variable is not If the cutoff categories are time invariant, then theorem 1 constant over time, even though we started from an ordered simplifies to variable that did not exhibit any variation.

1 C. Cut Points p π (d| β) =    , i, − − β f ∈F¯ exp t ( ft dt) Xit d In this section, I show that one can use knowledge of the cut-point differences to estimate the minimum required which does not depend on the cut points γj. This expression is identical to the conditional likelihood contribution in the change in the mth regressor to move an arbitrary cross- = fixed-effects binary choice logit model (Chamberlain, 1980). section unit with yit j to a category higher than j. For a transformation with time-varying cutoff categories, An interpretation of the magnitude of the regression the conditional probability depends on the differences in the coefficient is not available when cut-point differences are cut points. Consider the case T = 2, and let π = ( j, k) with unknown. This is related to a well-known drawback of the j = k. The (inverses of the) conditional probabilities are fixed-effects binary choice logit model (see the discussion   in Chamberlain, 1984 and Wooldridge, 2010). Consider the −1 | β γ = + γ − γ − − β pi,( j,k) ((1, 0) , ) 1 exp k j (Xi2 Xi1) , marginal effect of a ceteris paribus change in regressor m on (10) the probability that the dependent variable for individual i in   −1 period t is at or below j, p ((0, 1)| β, γ) = 1 + exp γj − γk − (Xi1 − Xi2) β . i,( j,k)  (11) ∂P (yit ≤ j| Xit, αi) = β Λ α + X β − γ ∂X m i it j Table 1 illustrates this case with J = 3. Four transformations it,m  × 1 − Λ α + X β − γ , (12) are available. The first one is dt,(1,1) = 1 {yt ≤ 1}, where i it j category 1 is chosen as the cutoff category in both time where β is the regression coefficient associated with the periods. Consider two observations from table 1: y = (1, 2) m 1 mth regressor. This marginal effect depends on the unob- and y = (2, 3). The transformed data are in column d . 2 i,t,(1,1) served heterogeneity α . Although the sign of the regression For observation y , only the y is at or below the cutoff, i 1 11 coefficient determines the sign of equation (12), the mag- so that d = 1 and d = 0. For y , both entries 1,1,(1,1) 1,2,(1,1) 2 nitude of the marginal effect is unknown in the absence of exceed the cutoff category, so that d = d = 0. 2,1,(1,1) 2,2,(1,1) knowledge on α .6 The inability to compute marginal effects For observation 2, there is no variation in the dependent i for the binary choice model is a serious obstacle to its use variable over time, which implies that this transformed in empirical applications. observation provides no information on the parameters. The Consider the ordered logit model with J = 2 (binary table also presents (in column 4) the other time-invariant choice). Knowledge of yit tells us whether the latent vari- transformation, π = (2, 2). ∗ able y is to the left or to the right of the cut-point γ , but it Next, consider the time-varying transformation π = (1, 2), it 1 does not tell us how far away the latent variable is from that which chooses cutoff categories 1 in period 1, and 2 cut point. Consequentially, for an arbitrarily large change in in period 2. For observation y = (1, 2), the ordered 1 X , we cannot be sure that the counterfactual latent variable variable is at the cutoff category in both periods, so it crosses the cut point. that d1,1,(1,2) = d1,2,(1,2) = 1. For observation y2 = (2, 3), the dependent variable exceeds the cutoff category in both 6 The expected marginal effect would require knowledge of the conditional = = periods, so that d2,1,(1,2) d2,2,(1,2) 0. Neither of those distribution of αi.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 ESTIMATION IN THE FIXED-EFFECTS ORDERED LOGIT MODEL 469

In the ordered logit model with J > 2, the same reasoning The first component states that we can be certain that the applies to an observation that falls in one of the extreme cat- counterfactual dependent variable is greater than j if the 9 egories, yit ∈ {1, J}. For such an observation, the associated change in the latent variable due to Δx is large enough. ∗ Δ latent variable yit can be arbitrarily far away from the clos- To turn this into a useful quantity, consider a change of xm est cut point (γ1 or γJ−1). In the absence of knowledge on in the mth regressor Xitm, with positive regression coefficient αi + uit, there is no change in Xitβ large enough to guarantee βm > 0. Introduce the quantity that the latent variable moves across the cut point. γ − γ δ j = j j−1 The situation is different when we observe a depen- m . (15) βm dent variable in an intermediate category, yit = j ∈ {2, ··· , J − 1}. For such an observation, we know that the The result in equation (14) implies that ∗ γ γ latent variable yit is in the finite interval j−1, j . This Δ δ j ⇒ | = = allows us to bound the marginal effect. For example, a ceteris xm > m P (y˜it > j yit j, Xit) 1. ∗ paribus change in X β that exceeds γ − γ − will move y it j j 1 it j across one of the cut points. By considering the required As a result, we can interpret δm as the minimum required = change in Xit to guarantee such a crossing, this bound pro- change in Xitm to move an arbitrary observation yit j to a j vides an interpretation of the strength of the relationship higher category. A small value of δm means that the relation- between Xit and the yit. This opportunity is not available for ship between Xit and yit is strong at category j. Larger values j currently available estimators because they do not yield a for δm can be due to the distance from category j, which is consistent estimator of the cut-point differences. far away from j + 1, or because the mth regressor has little 10 To formalize this interpretation, consider an individual i impact on yit. who is in an intermediate category j at time t: yit = j ∈ {2, ··· , J − 1}.7 For such an individual, IV. Conditional Maximum Likelihood Estimation  ∗ = β + ∈ γ γ yit Xit vit j−1, j , (13) This section analyzes identification and estimation in one arbitrary π-transformed fixed effects binary choice model where vit is the composite error term vit = αi + uit.A Δ equations (6) to (8). ceteris paribus change in the regressors of x induces the The conditional probability, equation (9), in theorem 1 counterfactual variables depends on γ only through the differences in the cut points used by transformation π. To see this, note that the number X˜ it = Xit + Δx, ∗ = ∗ + Δ β of nonzero entries for any y˜it yit ( x)   = + Δ β + ∈ = ∈ { }T ¯ = ¯ (Xit x) vit, f Fd¯ f 0, 1 such that f d J   = · γ ∗ γ is equal to the number of nonzero entries in d. For any y˜it j 1 j−1 < y˜it < j . f ∈ F¯ , we have that γπ ( f − d ) = 0. This allows us = d (1) t t t j 1 to rewrite the conditional probability in terms of cut-point The object of interest is the distribution of the counterfactual differences γπ(t) − γπ(1): dependent variable, conditional on the observable random p−1 (d| β, γ) variables yit and Xit. Let Fv denote the unknown distribution i,π  of the composite error term vit = αi + uit, conditional on Xi. In appendix A.2, I obtain the following result for the condi- = exp ( ft − dt) γπ(t) − ( ft − dt) Xitβ ∈ t t tional probability that the counterfactual dependent variable f Fd¯ 8 y˜it is in a category strictly larger than j: 9 The second and third components of this display are not informative. The | = P (y˜it ⎧> j yit j, Xit) second component states that if we start from category j and decrease the ⎪ Δ β γ value of the latent variable, the counterfactual dependent variable will not ⎪1if( x) > j be greater than j. The third component is uninformative because it requires ⎨⎪ several evaluations of the unknown function F . For this reason, we will −γ − , v = j 1 focus on the first component. ⎪0if(Δx) β < 0, 10 An alternative interpretation of the cut-point differences is available. ⎪ γ − β − γ − +Δ β Denote the odds ratio for category j by ⎩⎪ Fv( j Xit ) Fv( j (Xit x) ) otherwise. ≤ | α  Fv(γj−Xit β)−Fv(γj−1−Xit β) P ( yit j Xit , i) ωj (Xit ) = = exp γj − αi − Xit β . (14) P ( yit > j| Xit , αi) The ratio of the odds ratios of two categories j and k, j = k is a function of 7 the cut-point difference only: The interpretation is available only for a dependent variable in the   intermediate categories, not for y ∈ {1, J}. it ωj (Xit ) 8 = γ − γ The appendix contains some additional results. For example, an log ω j k . expression similar to the one that follows can be obtained for k (Xit ) P ( y˜it = k| yit = j, Xit = x). Here, we focus on the conditional probability This formalizes the idea that, conditional on Xit , the cut-point differences P ( y˜it > j| yit = j, Xit ) because it leads to an easily interpretable quantity. measure the distance between two categories.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 470 THE REVIEW OF ECONOMICS AND STATISTICS   2 | θ = − γ − γ ∂ ln pi,π (d π) exp ( ft dt) π(t) π(1) Hi,π (d| θπ) = , ∂θπ∂θ f ∈F¯ t π d  be the contribution to the score vector and the Hessian matrix − ( ft − dt) Xitβ . (16) for an individual i with di = d. In appendix A.3, I show that t  − θ − ∈ ¯ exp (( f d) Ziπ π) Ziπ ( f d) The estimand in the π-transformed fixed-effects binary f Fd si,π (d| θπ) =−  choice logit model, equations (6) to (8), therefore consists ∈ exp (( f − d) Ziπθπ) f Fd¯ of the regression coefficient β and a subset of the differences (19) of the cut points. To formally define the estimand, let and  Hi,π (d| θπ) γ˜ π,Δ = γπ(1) − γπ(1), ··· , γπ(T) − γπ(1)  { − πθπ} { − πθπ} 1 f ,g∈F¯ exp ( f d) Zi exp (g d) Zi be the T × 1 vector of cut-point differences that appear in =− d   2  2 equation (16). Denote by nπ the number of unique, nonzero ∈ exp (( f − d) Ziπθπ) f Fd¯ elements in γ˜ π,Δ. Collect those elements in an nπ × 1 vector 11 × Z π (g − f ) (g − f ) Ziπ. (20) γπ,Δ. The T ×nπ selection matrix Sπ transforms the unique i, γ elements in π,Δ into the time series of cut-point differences The second derivative, equation (20), is negative semidefi- γ ˜ π,Δ: nite: both the denominator and the numerator are positive, and the nonscalar part is an outer product. It follows that the γ˜ π Δ = Sπγπ Δ. , , criterion function, equation (18), is globally concave, which Adjoin the selection matrix Sπ to the stacked regressor helps me to establish consistency and asymptotic normality matrix to obtain the T× (K + nπ) set of augmented regres- under fairly weak conditions. To state a sufficient condition for consistency and asymp- sors Ziπ = −Xi | Sπ . The associated (K + nπ) × 1 vector  totic normality of the CMLE for θπ,0, let of augmented regression coefficients θπ = β, γπ,Δ is the ⎡ ⎤

parameter of interest for the π-transformed fixed-effects X   ⎢ i1 ⎥ binary choice model. The conditional probability in theorem . X˜ = vec X = ⎢ . ⎥ 1 can now be rewritten as a function of the estimand: i i ⎣ . ⎦

X −1 iT p (d| θπ) i,π    be the KT × 1 column vector that stacks the T blocks of K = exp ( ft − dt) γπ(t) − γπ(1) − Xitβ regressors for individual i. f ∈F¯ t d Assumption 1. The variance matrix of the regressors, = exp {( f − d) Ziπθπ}. (17) Var X˜ i , exists and is positive definite. ∈ f Fd¯ Assumption 1 guarantees that there is sufficient variation Then the CMLE can be defined as in the regressors for a given individual across time. In par-  n ticular, it implies that for any two time periods s = t, the θˆ = βˆ γ = 1 { = } | θ π , ˆ π,Δ arg max 1 di d lnpiπ (d π) variance matrix of the difference X − X is of full rank. RK ×Rnπ n it is i=1 This rules out regressors that are constant across a subset (18) of the sample period and rules out sets of regressors that  perfectly covary. It rules out time dummies. This assump- as an estimator for θπ = β , γπ Δ , the true value of ,0 0 , ,0 tion could be relaxed, as identification of the regression the parameters in the π-transformed binary choice model coefficient requires only two time periods. implied by the ordered logit model with true parameter To state the main result in this section, some additional values (β , γ ). 0 0 notation is required. First, denote by The upcoming proof of consistency of the CMLE relies on the concavity of the criterion function in equation (18). si,π (θπ) = 1 {di = d} si,π (d| θπ) Let d∈{0,1}T ∂ ln pi,π (d| θπ) s π d| θπ = the score contribution of individual i, and let Σπ be the i, ( ) θ , ∂ π variance of the score at the true value of the parameters:     11 A time-invariant transformation π sets π (t) = j for all t, so that the Σπ = E si,π θπ,0 si,π θπ,0 . (21) cut-point difference parameter γπ,Δ is empty, and nπ = 0.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 ESTIMATION IN THE FIXED-EFFECTS ORDERED LOGIT MODEL 471

Denote the Hessian by The representation in terms of the first-order conditions is ⎛ ⎞ the first-order condition  ⎝ ⎠ π θπ = Hπ (θπ) = E 1 {di = d} Hi,π (d| θπ) (22) E si ,0 0. (26) T d∈{0,1} It will be useful to separate the role of the regression  coefficient from that of the cut-point differences: and write Hπ = Hπ θπ when the Hessian is evaluated at ,0  θπ,0. si,π,β β, γπ,Δ E [s π (θπ)] = E  i β γ { } = ··· si,π,γ , π,Δ Theorem 2. Let ( yi, Xi , i 1, , n) be a random sam-  ple from the fixed-effects ordered logit model, equations (1) β γ β = ∂ ln piπ , π, Δ /∂ to (3), with true parameter values (β0, γ0), and let π be an E . ∂ ln p π β, γπ Δ /∂γπ Δ arbitrary transformation. If assumption 1 holds, then (i) the i , , θˆ θ CMLE π in equation (18) is consistent for π,0; From this perspective, a transformation π provides K + nπ first-order conditions for K + nπ parameters.12 These ˆ p θπ → θπ,0 as n →∞; first-order conditions are moment conditions in the GMM framework. In what follows, we will gather the moment (ii) a central limit theorem applies to the score, conditions from all transformations. The set of all (J − 1)T transformations can be separated 1  d √ s π θπ → N (0, Σπ); (23) into a set of J − 1 time-invariant transformation,13 and the n i, ,0 i remaining, time-varying, transformations. We will consider them separately, starting with the time-invariant transforma- θ θ (iii) the Hessian Hπ ( π) exists and is invertible for all π; tions. For a time-invariant transformation, the number of θˆ and (iv) the CMLE estimator π in equation (18) has the unique cut-point differences is nπ = 0, so that the param- following limiting distribution: eter of interest consists only of the regression coefficient √   θπ,0 = β0. For a single time-invariant transformation, the θˆ − θ →d N −1Σ −1 →∞ n π,n π,0 0, Hπ πHπ as n . restriction on the K × 1 vector of scores si,π,β exactly identi- (24) fies β0. Taken together, the (J − 1) sets of restrictions on the score vectors from the time-invariant transformations over- The proof is in part i in appendix A.4 and parts ii to iv in identify the regression coefficient β0 through the K (J − 1) appendix A.5. moment conditions, Results i and iv of theorem 2 describe the asymptotic ⎡ ⎤ behavior of the CMLE based on a single transformation si,(1,··· ,1) (β0) ⎢ ⎥ π. Parts ii and iii are intermediate results for a standard β = ⎢ . ⎥ = E si,1,β ( 0) E ⎣ . ⎦ 0, (27) proof of asymptotic normality of an extremum estimator. They are stated here because they are essential ingredients si,(J−1,··· ,J−1) (β0) for the efficiency result in the next section. if J > 2. A GMM estimator based on equation (27) takes the form V. Efficiency β˜ = ¯ β ¯ β W1,n arg min s1,n ( ) W1,ns1,n ( ), (28) In this section, I introduce a class of generalized method of  1 n moments (GMM) estimators that incorporate the information where s¯ (β) = s β (β), and W is a weight − T 1,n n i=1 i,1, 1,n from all (J 1) transformations that can be applied to the matrix. fixed-effects ordered logit model. The optimal estimator in Existing consistent estimators for the regression coef- this class yields an estimator for the regression coefficient ficient correspond to different choices for W1,n. Setting β that is at least as efficient as existing estimators. I also 0 W1,n = ej ⊗ IK corresponds to using the time-invariant show that it is asymptotically equivalent to the optimal linear transformation π (t) = j. The blow-up-and-cluster (BUC) combination of all (J − 1)T CMLE estimators or optimal estimator advocated by Baetschmann et al. (2015) sets W1,n minimum distance estimator (OMD). equal to a block diagonal matrix with its jth block equal π For an arbitrary transformation , the estimand targeted to the inverse of the Hessian associated with transforma- by the CMLE is the maximizer tion π (t) = j (Baetschmann et al., 2015). Denote by β˜ ∗ the ⎡ ⎤ 12 The derivative siπ,β gives K moment conditions, and there are nπ moment ⎣ ⎦ β θπ,0 = arg max E 1 [di = d] ln pπ (d| θπ) . conditions from siπ,γ. There are K elements in the parameter and nπ RK ×Rnπ elements in γπ Δ. ∈{ }T , d 0,1 13 A time-invariant transformation is one that sets π (t) = j, j ∈ (25) {1, ··· , J − 1} for all t.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 472 THE REVIEW OF ECONOMICS AND STATISTICS

asymptotically efficient GMM estimator in the class (28), the variance of the restricted set of scores from time-invariant  − 1 transformations in equation (27).15 The diagonal blocks are which sets plimW1,n = E si,1,β (β0) si,1,β (β0) . That esti- the transformation-specific variance matrices Σπ from equa- mator is asymptotically equivalent to the minimum distance tion (21). Furthermore, denote by H the expected derivative estimator proposed by Das and van Soest (1999; see also of the scores in equation (27) with respect to (β, γΔ), evalu- remark 1). ated at the true values of the parameter.16 Finally, H consists The moment conditions implied by the time-invariant 1 of the top left K (J − 1) × K block of H, which stacks the transformations, equation (27), do not exhaust the infor- Hessians Hπ of the time-invariant transformations. mation in the fixed-effects ordered logit model. Each time- varying transformation implies K + nπ additional moment Theorem 3. Let ({y , X }, i = 1, ··· , n) be a random sam- conditions of the form (26). These moment conditions i i β γ ple from the fixed-effects ordered logit model, equations (1) to involve both 0 and π,Δ,0. Gather the cut-point difference β γ γ = (3), with true parameter values ( 0, 0), and let assumption parameters from all time-varying transformations in Δ →∞ 14 1 hold. Then, as n , γπ,Δ , a column vector with nγ ≡ π nπ elements. π √  Similarly, collect the scores β˜ ∗ − β →d N   n 0 (0, V1), ∗ si,2,γ (β, γΔ) = si,π,γ β, γπ,Δ , π : nπ ≥ 1 √ βˆ β − 0 →d N n ∗ (0, V), γ γΔ in a nγ × 1 vector. The scores for the regression coefficients ˆ Δ ,0 from the time-varying transformations are collected in   where   si,2,β (β, γΔ) = si,π,β β, γπ,Δ , π : nπ ≥ 1 . −1 = Σ−1 V1 H1 1 H1 , Taking together the restrictions on the scores from all   −1 (J − 1)T transformations, we obtain V = H Σ−1H . ⎡ ⎤ s β (β ) ×  ⎢ i,1, 0 ⎥ Furthermore, let Vβ be the top-left K K block of V. Then E s β , γΔ = E ⎣ β β γΔ ⎦ = 0, (29) V − Vβ is positive semidefinite. i 0 ,0 si,2,  0, ,0 1 si,2,γ β0, γΔ,0

T Proof. The restrictions on si,2,γ exactly identify the cut- which has dimension K (J − 1) + nγ for the K + nγ- point differences γΔ,0. Therefore, by theorem 1 in Ahn and dimensional parameter (β, γΔ). A GMM estimator based on Schmidt (1995), estimation of β0 using the restrictions on equation (29) takes the form s β is equivalent to estimation of β , γπ using the restric-  i,1, 0 ,0 tions on si,1,β and si,2,γ. The restrictions on si,2,β therefore βˆ , γˆ Δ = arg min s¯ (β, γΔ) W s¯ (β, γΔ), (30) W,n ,W,n n n n provide additional information on β that is not used for  0 1 n estimation of the additional cut-point parameters. A detailed ¯ β γΔ = β γΔ where sn ( , ) n i=1 si ( , ) is the sample analog ∗ ∗ proof is in appendix A.6. of the moments in equation (29). Denote by βˆ , γˆ Δ the asymptotically efficient estimator in the class of estimators Theorem 3 states that for the regression coefficient β0, esti- of the form of equation (30). mation based on all transformations is at least as efficient as  Theorem 3 establishes the asymptotic distribution of estimation based on the time-invariant transformations only. ∗ ∗ ∗ ∗ βˆ , γˆ Δ and shows that βˆ is at least as efficient as β˜ , so that Therefore, the amount of information gained by considering adding the moment conditions from the time-varying trans- the first-order conditions from the time-invariant transforma- formations reduces the asymptotic variance of the regression tions is greater than or equal to the amount of information coefficient estimator. For a formal result, some additional required for estimating the additional cut-point parameters. ∗ notation is needed. First, denote by Note that this procedure also yields an estimator γˆ Δ for the   17   cut-point differences. Finally, note that the estimation pro- Σ = E si β0, γΔ,0 si β0, γΔ,0 (31) cedure can likely be improved by taking into account the relationship between the components of γΔ,0. the variance of the entire set of scores, equation (29), and by     Σ = β γ β γ 15 1 E si,1,β 0, Δ,0 si,1,β 0, Δ,0 The matrix Σ1 is the top-left K (J − 1) × K (J − 1) corner of Σ. 16 This matrix has a specific structure; it is omitted for the sake of brevity which is in appendix A.7.1. 14 Taking into account the relationship between the components in γΔ may 17 An anonymous referee pointed out that assumption 1 rules out time lead to a more efficient estimation procedure. For the purpose of this section dummies, but that using information from multiple transformations may (showing efficiency increases for the regression coefficient β), it is useful allow one to jointly identify time dummies and cut-point differences. I do to consider each γπ,Δ as a separate parameter. not provide a formal proof here and leave this for future research.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 ESTIMATION IN THE FIXED-EFFECTS ORDERED LOGIT MODEL 473

Remark 1. In appendix A.7, I show that the optimal the clogit command.19 The OMD estimator can be obtained minimum distance (MD) estimator based on all (J − 1)T using the suest command after several calls to clogit. The CMLE estimators is asymptotically equivalent to the opti- CLE can be obtained by running clogit after duplicating ∗ ∗ mal GMM estimator βˆ , γˆ Δ in theorem 3. This provides an the data set using expand. This extends the insight by alternative way to combine the information in all the trans- Baetschmann et al. (2015) to time-varying transformations. formations, which will prove useful for implementation (see section VI).18 A. CMLE and OMD

Remark 2. By inspecting the expression for the optimal If a binary dependent variable di and regressors Xi follow a weights in appendix A.7, it can be seen that the weight given fixed-effects binary choice logit model, the conditional prob- to βˆ π is ability that forms the basis for the CMLE in Chamberlain ⎡ ⎤ (1980), and implemented in Stata’s clogit is given by 0  ⎢ ⎥ ⎢ . ⎥ −1 | β = + − − β ⎢ . ⎥ pi (d ) 1 exp ( ft dt) Xit . (32) ⎢ . ⎥ ⎢ ⎥ t ⎢ 0 ⎥   ⎢ βγ ⎥ −1 ⎢ Hπ ⎥ Now consider an observation (yi, Xi) from the fixed-effects ≡ Σ−1 Σ−1 ⎢ J ⎥ Wπ H H H ⎢ γ ⎥. ordered logit model. The conditional probability associ- ⎢ HπJ ⎥ π ⎢ ⎥ ated with the -transformed model is given by theorem 1, ⎢ 0 ⎥ ⎢ ⎥ equation (9): ⎢ . ⎥  ⎣ . ⎦  −1 | β γ = − γ − β 0 pi,π (d , ) exp ( ft dt) π(t) Xit . ∈ t f Fd¯ This weight will not be equal to 0 since H and Σ have full (33) rank because of the logistic errors. Since the OMD estima- π γ tor assigns nonzero weights to the CMLE for the regression For a time-invariant transformation , the cut points π(t) coefficient from time-varying transformations, this means drop out, and the conditional probability, equation (33), is that it is strictly more efficient than the OMD procedure identical to that of the fixed-effects binary choice logit model in Das and van Soest (1999), which implicitly assigns zero in equation (32). In that case, the CMLE for the regression π weight to the estimators based on time-varying transforma- coefficient using the -transformed model is identical to the tions. By remark 1, the efficient GMM and OMD estimators CMLE for the binary choice model applied to diπ. π based on all transformations are strictly more efficient than For a time-varying transformation , the conditional the GMM and OMD estimators based on the time-invariant probability in equation (33) features the additional term − γ transformations. t ( ft dt) π(t). In section 4, equation (17), I showed that the conditional probability can be rewritten as

VI. Implementation −1 | θ = { − θ } pi,π (d π) exp ( f d) Ziπ π , ∈ This section describes the implementation of the estima- f Fd¯ tors described in sections IV and V. First, I show how the = CMLE estimator based on a single, potentially time-varying where Ziπ is the set of augmented regressors Ziπ transformation from section IV can be implemented when a −Xi | Sπ . As a result, the conditional probability for the computer program for the fixed-effects binary choice logit π-transformed model is equivalent to the CMLE for a binary model is available. Second, I discuss how the asymptotically choice fixed-effects logit model with the columns in Sπ as efficient minimum distance estimator in section V, remark 1 additional regressors. These additional regressors indicate can be implemented using standard methods for minimum which cut-points are used to π-transform the dependent vari- distance estimators. Additionally, I discuss a CLE that is not able. The coefficient estimates for the elements of Sπ are asymptotically efficient but addresses a drawback of the opti- estimates for the cut-point differences γπ(t) − γπ(1). mal minimum distance (OMD) estimator. This CLE extends The OMD estimator in section V, remark 1, is the opti- the blow-up-and-cluster (BUC) estimator in Baetschmann mal linear combination of all CMLE estimators. The optimal et al. (2015). All three estimators are easy to implement in Stata (Stata- 19 The clogit command comes with an option that allows the empirical Corp, 2015). The CMLE estimator can be obtained using researcher to cluster standard errors. This clustering will carry over to the ordered logit estimators discussed in this paper. However, care should be exercised when choosing this option. Refer to section VII.C.2 in Cameron 18 This extends the insight in Baetschmann et al. (2015), who consider only and Miller (2015) for a discussion of cluster-robust inference in nonlinear time-invariant transformations. They show that the MD estimator in Das and fixed-effects models. Additionally, note that serial correlation is explicitly van Soest (1999) is equivalent to the GMM estimator β˜ ∗ in theorem 3. ruled out by the fixed-effects ordered logit model discussed here.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 474 THE REVIEW OF ECONOMICS AND STATISTICS

weights depend on the variance matrix of vector of all The CLE is an extension of the blow-up and cluster (BUC) CMLE estimators. Stata’s suest command can be used to estimator described in Baetschmann et al. (2015). The differ- estimate that variance matrix. Therefore, one can produce ence between the CLE and the BUC estimator is that the CLE the OMD estimator using a simple program that uses suest estimator takes into account all transformations, whereas after (J − 1)T calls to clogit. A more detailed description the BUC estimator uses time-invariant transformations only. can be found in appendix A.7. Baetschmann et al. (2015) discuss the properties of the BUC estimator. By examining the first-order conditions of the B. Composite Likelihood Estimator objective function, they show that the BUC estimator is a consistent and asymptotically normal GMM estimator with T The number of CMLE estimators (J − 1) can be very nonoptimal weight matrix. As a result, the CLE is not asymp- large, even for moderate values of J and T. The OMD esti- totically efficient. However, it is easy to implement, avoids mator requires an estimate of the variance matrix of these the estimation of the Hessians or variance matrix for use estimators. When the number of entries in that variance in a second step, and has excellent finite-sample properties. matrix approaches or exceeds the number of observations These insights extend to the CLE estimator. n, we expect poor finite-sample performance of the OMD In Stata, the proposed estimator can be implemented by (a) estimator. duplicating each observation (J − 1)T times using expand; Recent contributions to the literature on estimation with (b) generating binary choice variables by applying a different many moments and on the combination of many estimators transformation π to each duplicate of the original data; or (c) include Han and Phillips (2006) and Chen, Jacho-Chávez, applying clogit on these binary choice variables and the aug- and Linton (2016). The OMD procedure discussed in Chen mented regressors discussed in section VIA. Baetschmann et et al. (2016) is similar to the estimator introduced in section al. (2015) provide more details. In particular, they discuss the V, the implementation of which is described in the previous need to use cluster-robust standard errors to account for the subsection. Chen et al. (2016) discuss conditions on the rate fact that observations are highly correlated by the definition of growth of the number of estimators with the sample size of the transformed dependent variable. under which standard inference applies (see, e.g., their theo- In appendix B, I document the usefulness of the CLE rem 3). In this paper, I assume that the number of categories approach using a simulation study. I show that the CLE J and the number of time periods T are fixed, so that the avoids the finite sample bias in situations where (J − 1)T conditions in their paper are trivially satisfied. is large relative to n. In those cases, the finite sample bias The CLE introduced in this section is an alternative for the OMD estimator is large. procedure that also incorporates the information from all transformations but avoids the estimation of the large vari- VII. Empirical Illustration ance matrix. The CLE has the added advantage that it imposes the relationship between the elements of γΔ.20 In this section, I investigate the relationship between The drawback of the CLE is that it sacrifices asymptotic reported (subjective) children’s health status and total house- efficiency. hold income using the conditional likelihood estimator that The CLE is defined as the maximizer of the sum of the uses all (J − 1)T transformations. The analysis in this section criterion functions for all the CMLEs: follows that in Murasko (2008) and Khanam et al. (2014).21 n In an influential paper, Case, Lubotsky, and Paxson (2002) θˆ = − use pooled cross-section data from the United States to doc- cle arg max 1{di=d} RK ×Rnπ ument that reported children’s health is positively related to π i=1   household income. They also find that this relationship is × ln exp ( ft − dt) γπ(t) − γπ(1) − Xitβ . stronger for older children. Currie and Stabile (2003) repli- ∈ t cate these findings using the panel data from Canada. They f Fd¯ (34) use the availability of panel data to investigate the effect of past health shocks but do not control for fixed effects. Fur- By imposing the normalization restriction γ1 = 0, we can ther empirical evidence for the findings in Case et al. (2002) interpret γj as the difference of the jth cut point with respect comes from Murasko (2008). Using the Medical Expenditure to γ1. This estimator, Panel Survey (MEPS) from the United States, he documents  that the income-health relationship seems to be weaker in θˆcle = βˆ cle, γˆ 2,cle, ··· , γˆ J−1,cle , MEPS than it is in the cross-sectional NHIS data used by Case et al. (2002). Murasko confirms that the income-health estimates the regression coefficient and all the cut-point relationship is stronger for older children. He does not use differences. fixed effects to control for unobserved heterogeneity.

20 For example, if J = 3, T = 2, then γ(1,2),Δ = γ2 −γ1 and γ(2,1),Δ = γ1 − 21 γ2 so that γ(1,2),Δ = γ(2,1),Δ. The CLE discussed in this section automatically Stata code for this example can be downloaded from my website: www imposes these restrictions. .sfu.ca/∼cmuris.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 ESTIMATION IN THE FIXED-EFFECTS ORDERED LOGIT MODEL 475

Two studies cast doubt on the finding that the health- Table 2.—Estimated Health-Income Relationship for Children Using MEPS Panel 16 income relationship becomes stronger with age. First, using pooled cross-section data from England, Currie, Shields, RE CRE (3, 3) BUC CLE

and Price (2007) confirm the positive relationship between log(Income)it −0.38 −0.10 0.33 −0.09 −0.06 income and health. However, they find that magnitude of the (0.03)(0.06)(0.26)(0.07)(0.08) Age× log(Income)it −0.014 0.017 0.083 0.021 0.035 relationship is much smaller than that reported in Case et al. (0.007)(0.013)(0.049)(0.015)(0.016) (2002), and they do not find evidence that the strength of the Family size 0.09 −0.19 −0.46 −0.20 −0.23 relationship increases with the age of the child. Khanam et al. (0.04)(0.14)(0.60)(0.15)(0.16) γ2 − γ1 2.01 2.02 – – 1.87 (2014) use Australian data. This study appears to be the first (0.05)(0.05)(0.05) that controls for time-invariant unobserved heterogeneity in γ3 − γ2 2.96 2.97 – – 2.92 this literature. They use the BUC estimator in Baetschmann (0.09)(0.09)(0.12) γ4 − γ3 2.73 2.74 – – 2.61 et al. (2015) and do not find evidence that the strength of the (0.23)(0.23)(0.29) health-income relationship increases with age. The panel consists of 4,131 children, each observed in two years. Age, log(Income), and Family size My results are based on panel 16 of the MEPS, a rotat- were normalized to have zero sample mean. The first two columns report the results from random effects (RE) and CRE estimator. For the CRE model, we model αi as a linear function of the average of the ing panel collected by the Agency for Healthcare Research time-varying regressors, gender, age, and dummies for race and region. Column (3, 3) uses Chamberlain’s conditional logit estimator using “Good” as the cutoff category. Column BUC reports the results from a Quality since 1996. It gathers information on demographic composite likelihood procedure using the time-invariant transformations. Column CLE uses the composite and socioeconomic variables and on health and health care likelihood procedure proposed by this paper. use from a nationally representative sample of households. Data from the household’s medical provider and employer- based health insurance supplement the household data. The household data are obtained through questionnaires. Impor- in this paper. Age, log(Income), and Family Size have been tantly, all household members are interviewed regarding normalized to have zero sample mean. their health status. There does not seem to be enough evidence in the data to Panel 16 of the MEPS contains data on 4,131 children find a positive effect of income on health at the average sam- gathered across two years (2011 and 2012). A child is any ple age (coefficient on log(Income)). The CRE estimator and interviewee who does not reach age 18 by the end of the the FE estimators (columns 2–4) do not document a statisti- interview period. I remove observations with nonpositive cally significant relationship. The random-effects estimator values for household income, age, and those with family size does find a statistically significant negative relationship, but less than two. I also remove the richest 5% of families. The this is likely due to omitted variable bias. The data, however, dependent variable is self-reported health status (RTHLTH) do support the finding that the income-health effect increases reported in rounds 2 and 4.22 Subjective health is reported with age. The CLE estimator proposed in this paper is the on a scale of 1 to 5, where 1 corresponds to Poor, 2 to Fair, only estimator that is efficient enough to pick this effect up. 3 to Good, 4 to Very Good, and 5 to Excellent. The explana- Although the other estimators find the same, positive, sign, tory variable of interest is the log of total household income, the results for those estimators are not statistically signifi- and the interaction between age and income. The interaction cantly different from 0 at standard significance levels. None term between age category and family income allows us of the estimators finds a statistically significant effect of fam- to determine whether the relationship between income and ily size, although all point estimates, with the exception of health changes with age. Following Murasko, I also control the estimate from the random-effects model, produce a neg- for family size (capped at 5). The reported specification does ative point estimate. It is likely that the this is due to the not include a year dummy; including it does not change the omitted variable bias, which is avoided by the correlated results. The results are reported in table 2. random-effects and fixed-effects procedures. I report results for five estimators. The first two columns In this context, the CLE is clearly to be preferred over report the results from the random effects (RE) and corre- the BUC and CMLE estimators; it is the only estimator lated random effects (CRE) estimator. For the CRE estimator that detects that the relationship between income and health model, I model αi as a linear function of the average of the changes with age. It is also the only fixed-effects estimator time-varying regressors, gender, age, and dummies for race that produces the cut-point differences. The CLE is also to and region. The “(3, 3)” column uses Chamberlain’s condi- be preferred over the RE estimator, which seems to be biased tional logit estimator using Good as the cutoff category. BUC due to the omitted variable bias that is captured by the fixed column reports the results from a composite likelihood pro- effects approach. cedure using the time-invariant transformations. The CLE It is also interesting to compare the CLE results to those column uses the composite likelihood procedure proposed from the CRE. The coefficient estimates from the CRE esti- mator are very similar to those from the CLE, which suggests that the model for the unobserved heterogeneity used by the 22 Reported health is available for all five interview rounds. However, we CRE is close to correctly specified. However, this model only have annual income data. We chose rounds 2 and 4 based on the timing of the interview rounds: the uneven interview rounds are conducted over a involves so many covariates that the additional restrictions period that span multiple years. imposed on the unobserved heterogeneity do not reduce

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 476 THE REVIEW OF ECONOMICS AND STATISTICS

the standard errors much in comparison to the CLE. The Allen, Rebecca, and Jay Allnutt, “Matched Panel Data Estimates of the standard errors are only slightly smaller. Impact of Teach First on School and Departmental Performance,” DoQSS working papers 13-11, Institute of Education, University of In section IIIC, I discussed an interpretation for the mag- London (2013). nitude of the regression coefficient based on the cut-point Amato, Jeffery D., and Craig H. Furfine, “Are Credit Ratings Procyclical?” j Journal of Banking and Finance 28 (2003), 2641–2677. differences. The quantity δm in equation (15) can be inter- Andersen, Erling B., “Asymptotic Properties of Conditional Maximum- preted as the minimum required change in Xitm to move an Likelihood Estimators,” Journal of the Royal Statistical Society, arbitrary observation yit = j to a higher category. For the series B 32 (1970), 283–301. CLE results in this model, consider the amount of family Arellano, Manuel, and Stephane Bonhomme, “Nonlinear Panel Data Analysis,” Annual Review of Economics 3 (2012), 395–424. income required to move a 15-year-old who is currently Arellano, Manuel, and Bo Honoré, “Panel Data Models: Some Recent reported to have Fair health to have at least Good health. Developments” (pp. 3229–3296), in J. Heckman and E. Leamer, eds., The appropriate regression coefficient for a 15-year-old is Handbook of Econometrics, vol. 5 (Amsterdam: Elsevier, 2001). 7.60×0.035−0.064 ≈ 0.20. As a result, an income increase Baetschmann, Gregori, Kevin E. Staub, and Rainer Winkelmann, “Consistent Estimation of the Fixed Effects Ordered Logit Model,” of more than 900 is required to achieve this. This suggests Journal of the Royal Statistical Society, Series A 178 (2015), that either the quantity is a loose bound or the relationship 685–703. between income and health is weak. To get some idea, we can Blanchflower, David G., and Andrew J. Oswald, “Well-Being over Time in Britain and the USA,” Journal of Public Economics 88 (2004), compare the results from the CRE model. If we look at the 1359–1386. CRE’s predictions for children of age 15 who currently have Bonhomme, Stephane, “Functional Differencing,” Econometrica 80 (2012), Fair health, we see that a 100% income increase changes 1337–1385. Booth, Alison L., and Jan C. van Ours, “Job Satisfaction and Family Happi- their predicted probability of being in category Good or ness: The Part-Time Work Puzzle,” Economic Journal 118 (2008), above from 0.1415 to 0.1447. This suggests that the rela- F77–F99. tionship between income and health at age 15, although ——— “Part-Time Jobs: What Women Want?” Journal of Population Economics 26:1 (2013), 263–283. statistically significant, is not very strong. Cameron, A. Colin, and Douglas L. Miller, “A Practitioner’s Guide to Cluster-Robust Inference,” Journal of Human Resources 50 (2015), VIII. Conclusion 317–373. Carman, Katherine G., “Inheritances, Intergenerational Transfers, and the Accumulation of Health,” American Economic Review, Papers and I propose a new estimator for the regression coefficient and Proceedings 23 (2013), 223–237. the difference in the cut points in the fixed-effects ordered Case, Anne, Darren Lubotsky, and Christina Paxson, “Economic Status logit model. The estimator uses (J − 1)T transformations of and Health in Childhood: The Origins of the Gradient,” American a time series of ordered discrete outcomes into time series of Economic Review 92 (2002), 1308–1334. Chamberlain, Gary, “Analysis of Covariance with Qualitative Data,” Review binary outcomes. Taking into account all these transforma- of Economic Studies 47 (1980), 225–238. tions allows me to construct an estimator for the regression ——— “Panel Data” (pp. 1247–1318), in Z. Griliches and M. D. Intriligator, coefficient that is more efficient than currently available ed., Handbook of Econometrics, vol. 2 (Amsterdam: Elsevier, 1984). ——— “Binary Response Models for Panel Data: Identification and estimators. Furthermore, the difference in the cut points is Information,” Econometrica 78 (2010), 159–168. estimated, which provides a bound on a marginal effect that Chen, Xiaohong, David T. Jacho-Chávez, and Oliver Linton, “Averag- is useful in empirical practice. ing of an Increasing Number of Moment Condition Estimators,” Econometric Theory 32:1 (2016), 1–41. It may be possible to extend the main result in this paper Currie, Janet, and Mark Stabile, “Socioeconomic Status and Child Health: to a more general setting: the fixed-effects ordered choice Why Is the Relationship Stronger for Older Children?” American model that relaxes the logistic assumption and the serial Economic Review 93 (2003), 1813–1823. Currie, Janet, Michael A. Shields, and Stephen Wheatley Price, “The Child independence assumption. For example, consider the fixed- Health/Family Income Gradient: Evidence from England,” Journal effects ordered logit model in equations (1) to (3), but replace of Health Economics 26 (2007), 213–232. the conditional logit assumption by the assumption that the Das, Marcel, and Arthur van Soest, “A Panel Data Model for Subjective conditional distributed of the error terms is identically dis- Information on Household Income Growth,” Journal of Economic Behavior and Organization 40 (1999), 409–426. tributed in each error term. One could turn this ordered model Fairlie, Robert W., Florian Hoffmann, and Philip Oreopoulos, “A Com- into (J − 1)T instances of the semiparametric binary choice munity College Instructor Like Me: Race and Ethnicity Interac- model that Manski (1987) studied. An alternative approach tions in the Classroom,” American Economic Review 104 (2014), 2567–2591. would be to use a pairwise differencing approach (see Ahn Ferrer-i-Carbonell, Ada, and Paul Frijters, “How Important Is Methodology et al., 2015). for the Estimates of the Determinants of Happiness?” Economic Journal 114 (2004), 641–659. REFERENCES Frijters, Paul, John P. Haisken-DeNew, and Michael A. Shields, “Money Does Matter! Evidence from Increasing Real Income and Life Afonso, Antonio, Pedro Gomes, and Philipp Rother, “Short- and Long- Satisfaction in East Germany following Reunification,” American Run Determinants of Sovereign Debt Credit Ratings,” International Economic Review 94 (2004), 730–740. Journal of Finance and Economics 16 (2013), 1–15. ——— “The Causal Effect of Income on Health: Evidence from German Ahn, Hyungtaik, Hidehiko Ichimura, James L. Powell, and Paul A. Ruud, Reunification,” Journal of Health Economics 24 (2005), 997–1017. “Simple Estimator for Invertible Index Models,” working paper Hahn, Jinyong, “A Note on the Efficient Semiparametric Estimation of (2015). Some Exponential Panel Models,” Econometric Theory 13 (1997), Ahn, Seung C., and Peter Schmidt, “A Separability Result for GMM Estima- 583–588. tion, with Applications to GLS Prediction and Conditional Moment Hall, Alastair R., Generalized Method of Moments (Oxford: Oxford Tests,” Econometric Reviews 14 (1995), 19–34. University Press, 2005).

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021 ESTIMATION IN THE FIXED-EFFECTS ORDERED LOGIT MODEL 477

Hamermesh, Daniel S., “The Changing Distribution of Job Satisfaction,” Machado, Matilde P., “A Consistent Estimator for the Binomial Distribu- Journal of Human Resources 36 (2001), 1–30. tion in the Presence of ‘Incidental Parameters’: An Application to Han, Chirok, and Peter C. B. Phillips, “GMM with Many Moment Patent Data,” Journal of Econometrics 119 (2004), 73–98. Conditions,” Econometrica 74 (2006), 147–192. Magnac, Thierry, “Panel Binary Variables and Sufficiency: Generalizing Hausman, Jerry A., Bronwyn H. Hall, and Zvi Griliches, “Econometric Conditional Logit,” Econometrica 72 (2004), 1859–1876. Models for Count Data with an Application to the Patents-R&D Manski, Charles F., “Semiparametric Analysis of Random Effects Linear Relationship,” Econometrica 52 (1984), 909–938. Models from Binary Panel Data,” Econometrica 55 (1987), 357–362. Honoré, Bo, “Trimmed Lad and Estimation of Truncated Murasko, Jason E., “An Evaluation of the Age-Profile in the Relationship and Censored Regression Models with Fixed Effects,” Econometrica between Household Income and the Health of Children in the United 60 (1992), 533–565. States,” Journal of Health Economics 27 (2008), 1489–1502. ——— “Nonlinear Models with Panel Data,” PortugueseEconomic Journal Newey, Whitney K., and Daniel L. McFadden, “Large Sample Estimation 1 (2002), 163–179. and Hypothesis Testing” (pp. 2111–2245), in R. F. Engle and D. L. Khanam R., H. S. Nghiem, and L. B. Connelly, “What Roles Do Contempo- McFadden, eds., Handbook of Econometrics, vol. 4 (Amsterdam: raneous and Cumulative Incomes Play in the Income–Child Health Elsevier, 1994). Gradient for Young Children? Evidence from an Australian Panel,” StataCorp, Stata Statistical Software: Release 14 (College Station, TX: Health Economics 23 (2014), 879–893, StataCorp LP, 2015). Lancaster, Tony, “The Incidental Parameter Problem since 1948,” Journal Wooldridge, Jeffrey M., “Distribution-Free Estimation of Some Nonlinear of Econometrics 95 (2000), 391–413. Panel Data Models,” Journal of Econometrics 90 (1999), 77–97. ——— “Orthogonal Parameters and Panel Data,” Review of Economic ——— Econometric Analysis of Cross Section and Panel Data, 2nd ed. Studies 69 (2002), 647–666. (Cambridge, Massachusetts: MIT Press, 2010).

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00617 by guest on 02 October 2021