Linear Regression Estimation of Discrete Choice Models With

-JOFBS3FHSFTTJPO&TUJNBUJPOPG%JTDSFUF$IPJDF.PEFMTXJUI /POQBSBNFUSJD%JTUSJCVUJPOTPG3BOEPN$PFGàDJFOUT By P#$%&'( B#)#%&, J*%*+, T. F-., #/0 S$*12*/ P. R,#/* In a discrete choice demand model, consumer The parametric density f(b Z g) re!ects the dis- i chooses product j out of the set J if ui, j . tribution of the unobserved heterogeneity, the ui, k 5k 5 1, … , J, k Z j. A standard setup has tastes bi. The goal is to estimate the parameters a vector of D characteristics xi, j for product j, g in this density. and utility is parametrized as ui, j 5 x9i, j b 1 ei, j, Estimation usually proceeds by simulation: where b is a vector of parameters to estimate maximum likelihood or the method of moments. that re!ect the marginal utility of the product The consumer i-speci"c numerical integral is of characteristics, and ei, j is a product-speci"c dimension D. The likelihood must be evaluated error term. Typically ei, j is assumed to have a repeatedly at trial guesses of g. The nonlinear logit or normal marginal distribution. search over g can suffer from multiple local In industrial organization, marketing, and maxes, resulting in the need to try many start- transportation economics, hundreds of papers ing values. The dimension of g can be large—g use random coef"cient models to estimate both often contains variance matrices of multivariate individual and aggregate demand. For some normal distributions. examples, see Hayden J. Boyd and Robert E. Hierarchical Bayesian estimation is an alter- Mellman (1980); N. Scott Cardell and Frederick native (Peter E. Rossi, Greg M. Allenby, and C. Dunbar (1980); Dean A. Follmann and Diane Robert McCulloch 2005). Computationally ef- Lambert (1989); Pradeep K. Chintagunta, Dipak "cient Gibbs sampling requires training in C. Jain, and Naufel J. Vilcassim (1991); Steven conjugate, family relationships that are needed Berry, James Levinsohn, and Ariel Pakes (1995); for ef"cient random number generation. These Aviv Nevo (2001); Amil Petrin (2002); and conjugate families require restrictive model Kenneth E. Train (2003). assumptions that can break down with small Random coef"cients generalize the model so model perturbations. Gibbs sampling itself that ui, j 5 x9i, j bi 1 ei, j, where the D-vector bi requires training and monitoring by the user. is speci"c to consumer i. Adding random coef- This paper describes a method for estimat- "cients allows consumers to substitute (as prices ing random coef"cient discrete choice models in xi, j change) between products with similar that is both !exible and simple to compute. We nonprice observables in xi, j. Daniel McFadden demonstrate that, with a "nite number of types, and Train (2000) show the more general mixed choice probabilities are a linear function of the logit can !exibly approximate choice patterns. model parameters. Because of this linearity, Random coef"cient estimators are dif"cult to our model can be estimated using linear regres- compute. The likelihood for data on the choices sion, subject to inequality constraints. We can ji of consumers i 5 1, … , N is approximate an arbitrary distribution of random coef"cients by allowing the number of types to N exp1x r b2 i,j be suf"ciently large. Therefore, we say our esti- L(g) 5 3 i f(b Z g) db. q J mator is nonparametric for the distribution of g i51 b k51 exp1xir, k b2 heterogeneity. * Bajari: Department of Economics, University of Minnesota, Twin Cities, 1035 Heller Hall, 271 19th Ave. I. Review of Series Estimators South, Minneapolis, MN 55455, and National Bureau of Economic Research (e-mail: [email protected]); Let yj, t be the market share of product j in Fox: Department of Economics, University of Chicago, market t, x the characteristics of all J products 1126 E. 59th St., Chicago, IL 60637 (e-mail: fox@uchi- t cago.edu); Ryan: Massachusetts Institute of Technol- in market t, and hj, t measurement error in market ogy, 50 Memorial Drive, E52-262C, Cambridge, MA 02142, shares. Let yj, t 5 gj 1xt2 1 hj, t. A series estimator and NBER (e-mail: [email protected]). approximates an unknown function gj(xt) with 459 460 AEA PAPERS AND PROCEEDINGS MAY 2007 R the approximation gj(x) L g r51 hr (x) ur . Here, each product and each product or T # J regres- R 5hr 1x2 6r51 is a known basis of R functions cho- sion observations. The number of unknown sen to ease mathematical approximation, and ur parameters is the number of types, R. The term is an approximation weight on the function r. 1yj, t 2 P (j Z t)2 re!ects the measurement error in The key behind series estimation is that the market shares. r unknown parameters ur enter the market share Once the u ’s are estimated, we can predict approximation linearly. Estimation just regresses out-of-sample market shares by varying the R yj, t on 5hr 1xt 2 6r51 for various markets xt. Donald product covariates xj, t in the logit choice prob- W. K. Andrews (1991) shows that estimators of abilities that enter the equation for P ( j Z t ). r gj(x) for a given x and some functions of gj(x) Typically the R random coef"cient vectors b are asymptotically normal. are unknown. We view our estimator for the ur ’s as a nonparametric approximation to an under- II. Estimating Random Coef!cient Logit Models lying, possibly continuous density of random coef"cients. So before running the regression, We now show how linear regression can esti- we "rst draw or deterministically choose R ran- mate the random coef"cients logit model for dom coef"cients br. The estimator is not par- market share data. Assume, for now, that there ticularly sensitive to the scheme used to pick the really are r known, discrete consumer types. br’s, as the ur ’s are completely !exible parame- Each type r is distinguished by a known, ran- ters to be estimated. We do require that we span dom coef"cient vector br. Let ur be the fraction the domain of the underlying true, random coef- of consumers of type r in the population. "cient distribution in the limit as R grows. Let P( j Z t) be the no-measurement error mar- There is no way to impose that the types have ket share of product j in market t. Market shares some restrictive parametric distribution. Given are the sum of the individual choice probabili- that we know of no empirical applications where ties of each type in the marketplace, researchers would actually know the types have some distribution, we see no reason why a R exp ( brx ) researcher would not want to be nonparametric P (j Z t) 5 j, t ur. on the weights over the R types. a q J r r r51 g b k51 exp1 xk, t 2 R does not have to be too large. For two dimensions of b (D 5 2), R 5 200 works well. R r Type r’s logit choice probabilities are weighted One option imposes the constraints g r51u 5 by its frequency ur. The basis functions are not 1 and ur $ 0 for r 5 1, … , R. If so, the closed- the !exible mathematical functions from tradi- form regression becomes an inequality con- tional series estimators, but the predictions of strained least squares (ICLS) estimator, which an individual choice model for consumer type r. requires numerical optimization. The ICLS min- No unknown parameters enter the logit choice imization problem is convex, so a standard least probabilities. Each br represents all the utility squares algorithm will "nd a global optimum. parameters for type r. The unknown frequencies The minimization problem is much simpler than ur are structural objects, not just the approxima- minimizing over both the R br ’s and the R ur ’s tion weights from series estimation. (James J. Heckman and Burton H. Singer 1984). The key idea is that the type frequencies ur Our approach shares the intuition of span- enter the market shares linearly and can be esti- ning the space of economic models with the mated from a linear regression of shares on logit importance sampling estimator of Daniel A. choice probabilities for all types. Ackerberg (2001). His importance sampling With actual data yj, t on market shares, we estimator requires parametric type densities, a estimate the regression equation change of variables assumption, and numerical optimization, however. R r exp ( b xj, t) r yj, t 5 a q r u 1 1yj, t 2 P (j Z t)2 g J r III. Extensions r51 k51 exp1b xk, t 2 to estimate the R ur ’s. Let T be the number of A type r could involve values for the choice- r markets. There is one regression observation for speci"c errors of the form ej in addition to VOL. 97 NO. 2 LINEAR REGRESSION ESTIMATION OF DISCRETE CHOICE MODELS 461 r andom coef"cients br. Shares are still the sum has a closed form once the choice-speci"c value of decisions of R types of consumers, functions are computed. R IV. Monte Carlo yj, t L a 1{type r buys j Z xt} ur , r51 The following Monte Carlo explores the per- where the indicator 1 {type r buys j Z xt} says formance of three estimators on three differ- that consumer type r would buy product j when ent random coef"cient logit fake data designs. faced with the choice characteristics xt. The Each choice set has ten choices and an outside symbol “L” abstracts away from approximation option with utility 0.

Linear Regression Estimation of Discrete Choice Models With

Economic Choices

Small-Sample Properties of Tests for Heteroscedasticity in the Conditional Logit Model

Estimation of Discrete Choice Models Including Socio-Economic Explanatory Variables, Diskussionsbeiträge - Serie II, No

Discrete Choice Models As Structural Models of Demand: Some Economic Implications of Common Approaches∗

Discrete Choice Models with Multiple Unobserved ∗ Choice Characteristics

Chapter 3: Discrete Choice

Generalized Linear Models

Discrete Choice Models and Methods.2

Specification of the Utility Function in Discrete Choice Experiments

Models for Heterogeneous Choices

Discrete Choice Modeling William Greene Stern School of Business, New York University

Lecture 5 Multiple Choice Models Part I – MNL, Nested Logit