RS – Lecture 17

Lecture 6 Multiple Choice Models Part II – MN Probit, Ordered Choice


DCM: Different Models

• Popular Models:
  1. …
  2. Binary Model
  3. Multinomial Logit Model
  4. Nested Logit Model
  5. … Model

• Relevant literature:
  - Train (2003): Discrete Choice Methods with Simulation
  - Franses and Paap (2001): Quantitative Models in Marketing Research
  - Hensher, Rose and Greene (2005): Applied Choice Analysis


MNL Model – IIA: Alternative Models

• In the MNL model we assumed independent εnj with extreme value distributions. This assumption is what creates the IIA property.

• This is the main weakness of the MNL model.

• The solution to the IIA problem is to relax the independence between the unobserved components of the latent utility, εi.

• Solutions to IIA
  – Nested Logit Model, allowing correlation between some choices.
  – Models allowing correlation among the εi's, such as MP Models.
  – Mixed or random coefficients models, where the marginal utilities associated with choice characteristics vary between individuals.

Multinomial Probit Model

• Changing the distribution of the error term in the RUM equation leads to alternative models.

• A popular alternative: the εij's follow independent standard normal distributions for all i, j.

• We retain independence across subjects, but we allow dependence across alternatives, assuming that the vector εi = (εi1, εi2, ..., εiJ) follows a multivariate normal distribution with an arbitrary covariance matrix Ω.


Multinomial Probit Model

• The vector εi = (εi1, εi2, ..., εiJ) follows a multivariate normal distribution with an arbitrary covariance matrix Ω.

• The model is called the Multinomial Probit Model. It produces results similar to the MNL model after standardization.

• Some restrictions (normalization) on Ω are needed.

• As usual with latent variable formulations, the variance of the error term cannot be separated from the regression coefficients. Setting the variances to one means that we work with a correlation matrix rather than a covariance matrix.

MP Model – Pros & Cons

• Main advantages:
  - Using ML, joint estimation of all parameters is possible.
  - It allows correlation between the utilities that an individual assigns to the various alternatives (relaxes IIA).
  - It does not rely on grouping choices. No restrictions on which choices are close substitutes.
  - It can also allow for heterogeneity in the (marginal) distributions for εi.

• Main difficulty: Estimation.
  - ML estimation involves evaluating probabilities given by multidimensional normal integrals, a limitation that restricts practical applications to a few alternatives (J = 3, 4). Quadrature methods can be used to approximate the integral but, for large J, they are often imprecise.


MP Model – Estimation

• Probit Problem:

$$P_{nj} = \text{Prob}[Y_j = 1 \mid X] = \int \cdots \int I[\,V_{nj} - V_{ni} > \varepsilon_{ni} - \varepsilon_{nj}\ \ \forall i \neq j\,]\, f(\varepsilon_n)\, d\varepsilon_n$$

The (J−1)-dimensional integral involves ξjk = εk − εj, which is normally distributed with covariance matrix Ω. We can rewrite the probability as:

P[yj=1|X] = P(ξj < Vj )

where Vj is the vector with kth element Vjk= xj’β-xk’β.

Let θ={β,Ω}. To get the MLE, we need to evaluate this integral for any β and Ω. The MLE of θ maximizes

L = Σn Σj ynj logP(ξj< Vj ) <= we need to integrate

MP Model – Estimation

• We need to integrate to get log P(ξj < Vj). If J = 3, we need to evaluate a bivariate normal – no problem. If J > 3, we need to evaluate an integral of dimension 3 or higher. A usual approach is to use Gaussian quadrature (recall Math Review, Lecture 12).

Most current software programs use the Butler and Moffitt (1982) method, based on Hermite quadrature.

Practical considerations: If J > 4, numerical procedures get complicated and, often, imprecise. For these cases, we rely on simulation-based estimation – simulated maximum likelihood, or SML.


Review: Gaussian Quadratures

• Newton-Cotes Formulae
  – Nodes: use evenly-spaced functional values.
  – Weights: use Lagrange interpolation. Best, given the nodes.
  – It can explode for large n (Runge's phenomenon).

• Gaussian Quadratures
  – Select functional values at non-uniformly distributed points to achieve higher accuracy. The values are not predetermined, but unknowns to be determined.
  – Nodes and weights are both "best": they give an exact answer if f is a (2n−1)th-order polynomial. Legendre polynomials are used.
  – Change of variables ⇒ the interval of integration is [−1, 1].

Review: Gaussian Quadratures

• The Gauss–Legendre quadrature formula is stated as

$$\int_{-1}^{1} f(x)\,dx = \sum_{i=1}^{n} c_i f(x_i) + \varepsilon$$

the ci's are called the weights, the xi's are called the quadrature nodes. The approximation error term, ε, is called the truncation error for integration.

For Gauss-Legendre quadrature, the nodes are chosen to be zeros of certain Legendre (orthogonal) polynomials.



Change of Interval for Gaussian Quadrature

• Coordinate transformation from [a, b] to [−1, 1]. This can be done by an affine transformation on t and a change of variables:

$$t = \frac{b-a}{2}x + \frac{b+a}{2}, \qquad dt = \frac{b-a}{2}\,dx$$
$$x = -1 \Rightarrow t = a, \qquad x = 1 \Rightarrow t = b$$

$$\int_a^b f(t)\,dt = \int_{-1}^{1} f\!\left(\frac{b-a}{2}x + \frac{b+a}{2}\right)\frac{b-a}{2}\,dx \approx \frac{b-a}{2}\sum_{i=1}^{n} c_i f(x_i)$$

Review: Gaussian Quadrature on [-1, 1]

• Gauss quadrature, general formulation:

$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} c_i f(x_i) = c_1 f(x_1) + c_2 f(x_2) + \cdots + c_n f(x_n)$$

• For n = 2:

$$\int_{-1}^{1} f(x)\,dx \approx c_1 f(x_1) + c_2 f(x_2)$$

• For n = 2, we have four unknowns (c1, c2, x1, x2). These are found by assuming that the formula gives exact results for integrating a general 3rd-order polynomial. Equivalently, (c1, c2, x1, x2) are chosen so that the formula yields the exact integral for f(x) = x⁰, x¹, x², x³.


Review: Gaussian Quadrature on [-1, 1]

Case n = 2: $\int_{-1}^{1} f(x)\,dx \approx c_1 f(x_1) + c_2 f(x_2)$

Exact integral for f = x⁰, x¹, x², x³ – four equations for four unknowns:

$$f = 1:\quad \int_{-1}^{1} 1\,dx = 2 = c_1 + c_2$$
$$f = x:\quad \int_{-1}^{1} x\,dx = 0 = c_1 x_1 + c_2 x_2$$
$$f = x^2:\quad \int_{-1}^{1} x^2\,dx = \tfrac{2}{3} = c_1 x_1^2 + c_2 x_2^2$$
$$f = x^3:\quad \int_{-1}^{1} x^3\,dx = 0 = c_1 x_1^3 + c_2 x_2^3$$
$$\Rightarrow\quad c_1 = c_2 = 1, \qquad x_1 = -\tfrac{1}{\sqrt{3}}, \qquad x_2 = \tfrac{1}{\sqrt{3}}$$

$$I = \int_{-1}^{1} f(x)\,dx \approx f\!\left(-\tfrac{1}{\sqrt{3}}\right) + f\!\left(\tfrac{1}{\sqrt{3}}\right)$$

Review: Gaussian Quadrature on [-1, 1]

Case n = 3: $\int_{-1}^{1} f(x)\,dx \approx c_1 f(x_1) + c_2 f(x_2) + c_3 f(x_3)$

• Now, choose (c1, c2, c3, x1, x2, x3) such that the method yields the exact integral for f(x) = x⁰, x¹, x², x³, x⁴, x⁵. (Again, they are calculated by assuming the formula gives exact results for integrating a fifth-order polynomial.)


Review: Gaussian Quadrature on [-1, 1]

1 f  1  xdx  2  c  c  c  1 2 3 1 1 f  x  xdx  0  c x  c x  c x  1 1 2 2 3 3 c1  5/ 9 1  1 c2  8/ 9 2 2 2 2 2 2  f  x  x dx   c1x1  c2 x2  c3 x3 c 5/ 9  3  3  1   1 x1   3/ 5 f  x3  x3dx  0  c x3  c x3  c x3  1 1 2 2 3 3 x  0 1  2 1 2 x  3/ 5 f  x 4  x 4dx   c x 4  c x 4  c x 4  3  1 1 2 2 3 3 1 5 1 5 5 5 5 5 f  x  x dx  0  c1x1  c2 x2  c3 x3  15 1

Review: Gaussian Quadrature on [-1, 1]

• Approximation formula for n=3

$$I = \int_{-1}^{1} f(x)\,dx \approx \tfrac{5}{9}\, f\!\left(-\sqrt{\tfrac{3}{5}}\right) + \tfrac{8}{9}\, f(0) + \tfrac{5}{9}\, f\!\left(\sqrt{\tfrac{3}{5}}\right)$$



Review: Gaussian Quadrature – Example 1

• Evaluate:

$$I = \int_0^4 t\,e^{2t}\,dt = 5216.926477$$

- Coordinate transformation:

$$t = \frac{b-a}{2}x + \frac{b+a}{2} = 2x + 2, \qquad dt = 2\,dx$$
$$I = \int_0^4 t\,e^{2t}\,dt = \int_{-1}^{1} (4x+4)\,e^{4x+4}\,dx = \int_{-1}^{1} f(x)\,dx$$

- Two-point formula (n = 2):

$$I = \int_{-1}^{1} f(x)\,dx \approx f\!\left(-\tfrac{1}{\sqrt{3}}\right) + f\!\left(\tfrac{1}{\sqrt{3}}\right) = \left(4 - \tfrac{4}{\sqrt{3}}\right)e^{4 - 4/\sqrt{3}} + \left(4 + \tfrac{4}{\sqrt{3}}\right)e^{4 + 4/\sqrt{3}}$$
$$= 9.167657324 + 3468.376279 = 3477.543936 \quad (\varepsilon = 33.34\%)$$

Review: Gaussian Quadrature – Example 1

- Three-point formula (n = 3):

$$I = \int_{-1}^{1} f(x)\,dx \approx \tfrac{5}{9}\, f(-\sqrt{0.6}) + \tfrac{8}{9}\, f(0) + \tfrac{5}{9}\, f(\sqrt{0.6})$$
$$= \tfrac{5}{9}\left(4 - 4\sqrt{0.6}\right)e^{4 - 4\sqrt{0.6}} + \tfrac{8}{9}\,(4)\,e^{4} + \tfrac{5}{9}\left(4 + 4\sqrt{0.6}\right)e^{4 + 4\sqrt{0.6}}$$
$$= \tfrac{5}{9}(2.221191545) + \tfrac{8}{9}(218.3926001) + \tfrac{5}{9}(8589.142689) = 4967.106689 \quad (\varepsilon = 4.79\%)$$

- Four-point formula (n = 4):

$$I = \int_{-1}^{1} f(x)\,dx \approx 0.34785\,[\,f(0.861136) + f(-0.861136)\,] + 0.652145\,[\,f(0.339981) + f(-0.339981)\,]$$
$$= 5197.54375 \quad (\varepsilon = 0.37\%)$$
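A quick way to check these numbers is to let a library supply the Legendre nodes and weights. Below is a minimal sketch (assuming NumPy is available) that reproduces Example 1 for n = 2, 3, 4; the helper name gauss_legendre is ours, not part of any package.

```python
import numpy as np

def gauss_legendre(f, a, b, n):
    # Nodes/weights on [-1, 1], then the affine change of variables to [a, b].
    x, w = np.polynomial.legendre.leggauss(n)
    t = 0.5 * (b - a) * x + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(w * f(t))

f = lambda t: t * np.exp(2 * t)
exact = 5216.926477
for n in (2, 3, 4):
    approx = gauss_legendre(f, 0.0, 4.0, n)
    print(n, round(approx, 4), f"error = {100 * abs(approx - exact) / exact:.2f}%")
```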


Review: Gaussian Quadrature – Example 2

• Evaluate

$$I = \int_0^{1.64} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = 0.44949742$$

- Coordinate transformation:

$$t = \frac{b-a}{2}x + \frac{b+a}{2} = 0.82x + 0.82 = 0.82(1+x), \qquad dt = 0.82\,dx$$
$$I = \int_0^{1.64} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = \frac{0.82}{\sqrt{2\pi}} \int_{-1}^{1} e^{-[0.82(1+x)]^2/2}\,dx = \int_{-1}^{1} f(x)\,dx$$

Review: Gaussian Quadrature – Example 2

- Two-point formula (n = 2):

$$I = \int_{-1}^{1} f(x)\,dx \approx \frac{0.82}{\sqrt{2\pi}}\left[ e^{-\left[0.82\left(1 - \frac{1}{\sqrt{3}}\right)\right]^2/2} + e^{-\left[0.82\left(1 + \frac{1}{\sqrt{3}}\right)\right]^2/2}\right]$$
$$= 0.32713267 \times (0.94171147 + 0.43323413) = 0.44978962 \quad (\varepsilon = 0.065\%)$$

- Three-point formula (n = 3):

$$I = \int_{-1}^{1} f(x)\,dx \approx \frac{0.82}{\sqrt{2\pi}}\left[\tfrac{5}{9}\, e^{-[0.82(1-\sqrt{0.6})]^2/2} + \tfrac{8}{9}\, e^{-[0.82(1+0)]^2/2} + \tfrac{5}{9}\, e^{-[0.82(1+\sqrt{0.6})]^2/2}\right]$$
$$= 0.32713267 \times (0.54614659 + 0.63509351 + 0.19271450) = 0.44946544 \quad (\varepsilon = 0.007\%)$$


Review: Multidimensional Integrals

• In the review, we concentrated on one-dimensional integrals. For integration in multiple dimensions, one approach is to phrase the multiple integral as repeated one-dimensional integrals.

• But, eventually, we run into the so-called curse of dimensionality. Four or more dimensions are complicated and, often, imprecise.

• There are two methods that work well:
1. Monte Carlo: based on repeated function evaluations, not repeated integrations using one-dimensional methods. A popular algorithm is Markov chain Monte Carlo (MCMC), which includes the Metropolis-Hastings algorithm and Gibbs sampling.
2. Sparse grids: based on a one-dimensional quadrature rule, but using a recursive combination of univariate results.

Hermite Quadrature (Greene)

• Hermite (or Gauss–Hermite) quadrature is an extension of the Gaussian quadrature method for approximating integrals of the form

$$I = \int_{-\infty}^{\infty} e^{-t^2} f(t)\,dt \approx \sum_{i=1}^{n} w_i f(x_i)$$

• It is a method well adapted to the kind of integral we see when we assume normality for f(ε), as in probit models.

• Useful approximation to compute moments of a normal distribution.

• The nodes xi are the zeros of the Hermite polynomial Hn, and the weights wi are given by:

$$w_i = \frac{2^{\,n-1}\, n!\, \sqrt{\pi}}{n^2\,[H_{n-1}(x_i)]^2}$$


Hermite Quadrature (Greene)

• The problem: approximating an integral involving exp(−v²):

$$\int f(x, v)\, e^{-v^2}\,dv \approx \sum_{h=1}^{H} f(x, v_h)\, W_h$$

• Adapt to integrating out a normal variable:

$$f(x) = \int f(x, v)\, \frac{\exp\!\left(-\tfrac{1}{2}(v/\sigma)^2\right)}{\sigma\sqrt{2\pi}}\,dv$$

Change the variable to $z = v/(\sigma\sqrt{2})$, so $v = \sigma\sqrt{2}\,z$ and $dv = \sigma\sqrt{2}\,dz$:

$$f(x) = \frac{1}{\sqrt{\pi}} \int f(x, \sigma\sqrt{2}\,z)\, e^{-z^2}\,dz$$

This can be accurately approximated by Hermite quadrature:

$$f(x) \approx \frac{1}{\sqrt{\pi}} \sum_{h=1}^{H} f(x, \sigma\sqrt{2}\,z_h)\, W_h$$

Hermite Quadrature (Greene)

Example (Butler and Moffitt’s Approach): Random Effects Log Likelihood Function

$$\log L = \sum_{i=1}^{N} \log \int_{-\infty}^{\infty} \left[\prod_{t=1}^{T} g(y_{it},\, x_{it}'\beta + v_i)\right] h(v_i)\,dv_i$$

Butler and Moffitt: compute this integral by Hermite quadrature:

$$\int_{-\infty}^{\infty} f(v_i)\, h(v_i)\,dv_i \approx \sum_{h=1}^{H} f(z_h)\, w_h \qquad \text{when } h(v_i) \text{ is the normal density}$$

z_h = quadrature node; w_h = quadrature weight; v_i = σ z_h, where σ is estimated together with β.
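As an illustration, the sketch below (assuming NumPy and SciPy; the function name and the panel data are hypothetical) evaluates one individual's likelihood contribution in a random-effects probit by Gauss–Hermite quadrature, using the change of variables shown above.

```python
import numpy as np
from scipy.stats import norm

# Gauss-Hermite rule: integral of exp(-z^2) g(z) dz ~ sum_h w_h g(z_h)
z, w = np.polynomial.hermite.hermgauss(8)

def re_probit_contribution(y, xb, sigma):
    """Likelihood contribution of one individual in a random-effects probit,
    integrating the normal random effect out by Hermite quadrature."""
    # change of variables v = sqrt(2)*sigma*z matches the N(0, sigma^2) density
    # to the exp(-z^2) kernel; the 1/sqrt(pi) factor comes from that change
    v = np.sqrt(2.0) * sigma * z                                 # (H,)
    probs = norm.cdf((2 * y[:, None] - 1) * (xb[:, None] + v))   # (T, H)
    return np.sum(w * probs.prod(axis=0)) / np.sqrt(np.pi)

y = np.array([1, 0, 1, 1])            # hypothetical panel of T=4 binary outcomes
xb = np.array([0.2, -0.1, 0.4, 0.0])  # hypothetical index values x_it' beta
print(re_probit_contribution(y, xb, sigma=0.5))
```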


Hermite Quadrature (Greene) - Example

Example: Nodes for 8 point Hermite Quadrature: (Use both signs, + and -) 0.381186990207322000, 1.15719371244677990 1.98165675669584300 2.93063742025714410

Weights for 8 point Hermite Quadrature 0.661147012558199960, 0.20780232581489999, 0.0170779830074100010, 0.000199604072211400010

MP Model – Simulation-based Estimation

• ML estimation is complicated due to the multidimensional integration problem. Simulation-based methods approximate the integral and are relatively easy to apply.

• Simulation provides a solution for dealing with problems involving an integral. For example: E[h(U)] = ∫ h(u) f(u) du

• All GMM and many ML problems require the evaluation of an expectation. In many cases, an analytic solution or a precise numerical solution is not possible. But, we can always simulate E[h(U)]:
- Steps:
  – Draw R pseudo-RVs from f(u): u¹, u², ..., u^R (R: repetitions)
  – Compute Ê[h(U)] = (1/R) Σ_r h(u^r)


MP Model – Simulation-based Estimation

• We call Ê[h(U)] a simulator.

• If h(.) is continuous and differentiable, then Ê[h(U)] will be continuous and differentiable.

• Under general conditions, Ê[h(U)] provides an unbiased (and, most of the time, consistent) estimator for E[h(U)].

• The variance of Ê[h(U)] is equal to Var[h(U)]/R.

Review: The Probability Integral Transformation

• This transformation allows one to convert observations that come from a uniform distribution on [0, 1] into observations that come from an arbitrary distribution.

Let U denote an observation having a uniform distribution on [0, 1]:

$$g(u) = \begin{cases} 1 & 0 \le u \le 1 \\ 0 & \text{elsewhere}\end{cases}$$

Let f(x) denote an arbitrary pdf and F(x) its corresponding CDF. Let X = F⁻¹(U).

We want to find the distribution of X.


Review: The Probability Integral Transformation

• Find the distribution of X.

$$G(x) = P(X \le x) = P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)$$

Hence: $g(x) = G'(x) = F'(x) = f(x)$.

Thus, if U has a Uniform distribution on [0, 1], then X = F⁻¹(U) has density f(x).

Review: The Probability Integral Transformation

• The goal of some estimation methods is to simulate an expectation, say E[h(Z)]. To do this, we need to simulate Z from its distribution. The probability integral transformation is very handy for this task.

Example: Exponential distribution Let U ~ Uniform(0,1). Let F(x) = 1 – exp(- λx) –i.e., the exponential distribution. Then, -log(1 – U )/ λ ~ F (exponential distribution)

Example: If F is the standard normal, F⁻¹ has no closed-form solution. Most computer programs have a routine to approximate F⁻¹ for the standard normal distribution.
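As a small illustration of the exponential example (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
u = rng.uniform(size=100_000)
x = -np.log(1.0 - u) / lam      # inverse-CDF draw: F(x) = 1 - exp(-lam*x)
print(x.mean(), 1.0 / lam)      # sample mean should be close to 1/lam
```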


Review: The Probability Integral Transformation

• Truncated RVs can be simulated along these lines.

Example: U ~ N(μ,σ2), but it is truncated between a and b. Then,

U can be simulated by letting F(u) = Z, with Z ~ Uniform(0, 1), and solving for u:

$$u = \mu + \sigma\,\Phi^{-1}\!\left[\Phi\!\left(\tfrac{a-\mu}{\sigma}\right) + Z\left(\Phi\!\left(\tfrac{b-\mu}{\sigma}\right) - \Phi\!\left(\tfrac{a-\mu}{\sigma}\right)\right)\right]$$
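A minimal sketch of that inverse-CDF recipe for the truncated normal (assuming NumPy and SciPy; rtruncnorm is our own helper name):

```python
import numpy as np
from scipy.stats import norm

def rtruncnorm(mu, sigma, a, b, size, rng):
    """Draw from N(mu, sigma^2) truncated to [a, b] by inverting the CDF."""
    Fa = norm.cdf((a - mu) / sigma)
    Fb = norm.cdf((b - mu) / sigma)
    z = rng.uniform(size=size)                 # Z ~ Uniform(0, 1)
    return mu + sigma * norm.ppf(Fa + z * (Fb - Fa))

rng = np.random.default_rng(1)
draws = rtruncnorm(0.0, 1.0, -1.0, 2.0, 50_000, rng)
print(draws.min(), draws.max())                # all draws lie inside [-1, 2]
```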

MP Model – Simulation-based Estimation

• Probit Problem:

- We write the probability of choice j as: P[yj=1|X] = P(ξj < Vj ),

where Vj is the vector with kth element Vjk= xj’β-xk’β.

- Let θ={β,Ω}. The MLE of θ maximizes

(1/N) Σn Σj ynj logP(ξj< Vj ) <= we need to integrate

We need to integrate to get log P(ξj < Vj). If J = 3, we need to evaluate a bivariate normal – no problem. If J = 4, we need to evaluate a 3-dimensional integral. Possible using Gaussian quadrature – see Butler and Moffitt (1982). If J > 4, numerical procedures get complicated and, often, imprecise.


MP Model – Simulation-based Estimation

• We need to integrate to get log P(ξj< Vj )

• A simulation can work well, approximating

$$P[y_j = 1 \mid X] = P(\xi_j < V_j) \approx \frac{1}{R} \sum_{r=1}^{R} I[\xi_j^r < V_j],$$

where we draw ξ_j^r as i.i.d. N(0, Ω), R times.

• This simulator is called the frequency simulator. It is unbiased and lies between [0, 1]. But its derivatives (zero or undefined) complicate calculations.
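A minimal sketch of the frequency simulator for one choice probability (assuming NumPy; the covariance matrix and the utility differences below are made up for illustration):

```python
import numpy as np

def frequency_simulator(V, Omega, R, rng):
    """Approximate P(xi < V) for xi ~ N(0, Omega) by the share of draws
    falling below every element of V."""
    L = np.linalg.cholesky(Omega)
    draws = rng.standard_normal((R, len(V))) @ L.T   # R draws of xi
    return np.mean(np.all(draws < V, axis=1))

rng = np.random.default_rng(2)
Omega = np.array([[1.0, 0.3], [0.3, 1.0]])           # hypothetical covariance
V = np.array([0.5, 1.0])                             # hypothetical utility differences
print(frequency_simulator(V, Omega, R=10_000, rng=rng))
```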

MP Model – Simulation-based Estimation

• Let’s go over a detailed example of this simple simulator.

Example 1: Binary (0,1) Probit
- Step 1:
  – For each observation n = 1, ..., N, draw ε^r ~ N(0, 1), r = 1, ..., R (R: repetitions).
  – Initialize y_count = 0.
  – Set starting values β_t.
  – Compute y*_n^r = x_n'β_t + L ε^r;  L = Cholesky factor (LL' = Ω).
  – Evaluate: if y*_n^r > 0, then y_count = y_count + 1.
  – Repeat R times.


MP Model – Simulation-based Estimation

Example 1: Binary Probit (continuation)
- Step 2: Calculate the probabilities: P_n|β_t = y_count/R.
- Step 3: Form the simulated log-likelihood function:
  SLL = Σ_n [ y_n ln(P_n|β_t) + (1 − y_n) ln(1 − P_n|β_t) ]
- Step 4: Check convergence.
  – Criterion: SLL(β_t) − SLL(β_{t−1}) < 0.0001
- Step 5: If there is no convergence, update the parameter β_t:
  β_{t+1} = β_t + update

- Repeat until convergence.
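The sketch below strings Steps 1–5 together for the binary probit case (assuming NumPy and SciPy; the simulated data and names are hypothetical). Because the frequency simulator is a step function of β, a derivative-free optimizer is used.

```python
import numpy as np
from scipy.optimize import minimize

def simulated_loglik(beta, y, X, eps):
    """Negative simulated log-likelihood of a binary probit using a
    frequency simulator; eps holds R fixed standard-normal draws per obs."""
    v = X @ beta                                   # (N,)
    p = np.mean((v[:, None] + eps) > 0, axis=1)    # simulated P(y=1), (N,)
    p = np.clip(p, 1e-6, 1 - 1e-6)                 # keep the logs finite
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(3)
N, R = 500, 200
X = np.column_stack([np.ones(N), rng.standard_normal(N)])
y = (X @ np.array([0.5, 1.0]) + rng.standard_normal(N) > 0).astype(float)
eps = rng.standard_normal((N, R))                  # common random numbers
res = minimize(simulated_loglik, x0=np.zeros(2), args=(y, X, eps),
               method="Nelder-Mead")               # derivative-free search
print(res.x)                                       # should be near (0.5, 1.0)
```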

MP Model – Simulation-based Estimation

• A simulation for the multinomial choice problem follows the same steps.

Example 2: Multivariate Probit

- Draw εi from a multivariate normal distribution.
- Calculate the probability of choice j as the fraction of draws for which choice j had the highest utility.
- Calculate the simulated likelihood.
(With many choices (J > 5) this method does not work well.)

• There are many other simulators that improve on the frequency simulator: smaller variance, smoother, more efficient computations.


MP Model – Simulation-based Estimation

• One of these simulation methods is Importance Sampling.

- Consider the integral E[h(U)] = ∫ h(u) f(u) du. Suppose it is difficult to draw U from F or h(.) is not smooth. We can always write: E[h(u)] = ∫ {h(u) f(u)/g(u)} g(u) du

where g(u) is a density with the following properties:
a) it is easy to draw U from g(.);
b) g(.) and f(.) have the same support;
c) it is easy to evaluate {h(u) f(u)/g(u)};
d) {h(u) f(u)/g(u)} is bounded and smooth over the support of U.

Note: E[h(u)] = E[h(u) f(u)/g(u)], where U ~ g(.)

MP Model – Simulation-based Estimation

• The importance sampling simulator:

Ê[h(U)] = (1/R) Σ_r [h(u^r) f(u^r)/g(u^r)],

where the u^r are R i.i.d. draws from g(.).
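A toy sketch of the importance-sampling simulator (assuming NumPy and SciPy): the target is E[U²] = 1 for U ~ N(0, 1), with a heavier-tailed t(5) proposal so the ratio f/g stays bounded.

```python
import numpy as np
from scipy.stats import norm, t as t_dist

rng = np.random.default_rng(4)
R = 100_000

# Target: E[h(U)] with U ~ N(0,1) and h(u) = u**2 (true value = 1).
# Proposal g: a t(5) density, easy to draw from, same support, fatter tails.
u = t_dist.rvs(df=5, size=R, random_state=rng)
weights = norm.pdf(u) / t_dist.pdf(u, df=5)        # f(u)/g(u)
print(np.mean(u**2 * weights))                     # should be close to 1
```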

• Conditions (a) and (c) are there to increase computation speed. Condition (d) produces a variance bound and smoothness.

• Condition (d) is the complicated one. For example, if g(.) is an i.i.d. truncated normal, the ratio may not be bounded if the variance, Ω, has large off-diagonal terms.

• The Geweke-Hajivassiliou-Keane (GHK) simulator satisfies (a) to (d).


MP Model – Simulation-based Estimation

• The GHK simulator switches back and forth between:
a) Set initial values for the parameters. Set P* = 1.
b) Draw from a (simulated) truncated normal ⇒ ξ_j^r.
c) Compute γ = P(ξ_j^r < V_j) analytically. Reset P* = P* × γ.
d) Compute (analytically) the distribution of ξ_j^r, conditional on the draws ⇒ get values for the parameters.
e) Iterate.

• P* is the GHK simulator. It is bounded (between 0 and 1) and continuously differentiable, since each factor is continuous and differentiable, and its variance is smaller than that of the frequency simulator – each draw of the frequency simulator was either zero or one.
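A compact sketch of the GHK recursion for P(ξ < V) with ξ ~ N(0, Ω) (assuming NumPy and SciPy; the ghk name and the numbers are ours): each factor γ is computed analytically and each draw comes from the implied truncated normal.

```python
import numpy as np
from scipy.stats import norm

def ghk(V, Omega, R, rng):
    """GHK simulator for P(xi < V), xi ~ N(0, Omega): a minimal sketch."""
    L = np.linalg.cholesky(Omega)
    J = len(V)
    probs = np.ones(R)
    eta = np.zeros((R, J))
    for j in range(J):
        upper = (V[j] - eta[:, :j] @ L[j, :j]) / L[j, j]   # conditional bound
        p_j = norm.cdf(upper)                              # analytic factor gamma
        probs *= p_j                                       # P* = P* x gamma
        u = rng.uniform(size=R)
        eta[:, j] = norm.ppf(u * p_j)                      # truncated-normal draw below `upper`
    return probs.mean()

rng = np.random.default_rng(5)
Omega = np.array([[1.0, 0.4], [0.4, 1.0]])
V = np.array([0.3, 0.8])
print(ghk(V, Omega, R=5000, rng=rng))
```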

MP Model – Quadrature or Simulation (Greene)

• Computationally, comparably difficult.
• Numerically, essentially the same answer. SML is consistent in R.
• Advantages of simulation:
  – Can integrate over any distribution, not just the normal.
  – Can integrate over multiple random variables. Quadrature is largely unable to do this.
  – Models based on simulation are being extended in many directions.
  – Simulation-based estimators allow estimation of conditional means ⇒ essentially the same as Bayesian posterior means.


MP Model – Bayesian Estimation

• Bayesian estimation. - Drawing from the posterior distribution of β and Ω is straightforward. The key is setting up the vector of unobserved RVs as:

θ = (β, Ω, Un1, Un2,... UnJ) and, then, defining the most convenient partition of this vector.

• Given the parameters, drawing the unobserved utilities can be done sequentially: for each unobserved utility, given the others, we draw from a truncated normal distribution, which is straightforward – see McCulloch, Polson, and Rossi (2000).

MP Model – More on Estimation

• Additional estimation problem: We need to estimate a large number of parameters --all elements in the (J + 1) × (J + 1) dimensional covariance matrix of latent utilities, minus some that are fixed by normalizations and symmetry restrictions.

- Difficult with the sample sizes typically available.


Multinomial Choice Models: Probit or Logit?

• There is a trade-off between tractability and flexibility:
  – Closed-form expression of the integral for the Logit, not for Probit models.
  – Logit has the IIA property: only very restrictive substitution patterns are allowed.
  – The Logit model is easy to estimate.
  – Probit allows for random taste variation, can capture any substitution pattern, allows for correlated error terms and unequal error variances.
  – But the Probit model is complicated to estimate.

⇒ The choice depends on the specifics of the situation. Is substitution important?

Random Effects Model

• A third possibility to get around the IIA property is to allow for unobserved heterogeneity in the slope coefficients.

• Why do we think that, if Houston Grand Opera's (HGO) prices go up, a person who was planning to go to HGO would go to the Houston Ballet instead, rather than to Lollapalooza?

• We think individuals who have a taste for HGO are likely to have a taste for close substitutes in terms of observable characteristics, like the Houston Ballet. There is individual heterogeneity in the utility functions.

• This effect can be modeled by allowing the utilities to vary with each person, say by making the parameters dependent on n –i.e., person n.


Random Effects Model

• We allow the marginal utilities to vary at the individual level:

Unj = X'nj βn + εnj,   βn ~ N(b, Σ)   – like a random effect!

• We can also write this as:

Unj = X'nj b + νnj,

where νnj = εnj + X'nj(βn − b) is no longer independent across choices.

Note: The key ingredient is the vector of individual-specific taste parameters βn. We have random taste variation.

• We can assume the existence of a finite number (k) of types of individuals:

βn ϵ {b1,b2, ... bk}

with Pr(βn = bk|Wn) given by a logit model => Finite mixture model.

Random Effects Model

• Alternatively, we can assume

βn | Wn ~ N(Wn'γ, Ω),

where we use a normal (continuous) mixture of taste parameters.

• Using simulation methods or Gibbs sampling, with the unobserved βn treated as additional unobserved random variables, may be an effective way of doing inference.

Remark: Models with random coefficients can generate more realistic predictions for new choices (predictions will be dependent on presence of similar choices).
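A small sketch of how such a random-coefficients (mixed) logit choice probability is simulated (assuming NumPy; the attributes, mean b and covariance W below are hypothetical):

```python
import numpy as np

def mixed_logit_prob(X, b, W, R, rng):
    """Simulated choice probabilities for a mixed (random-coefficients) logit:
    beta_n ~ N(b, W); logit probabilities averaged over R taste draws."""
    L = np.linalg.cholesky(W)
    draws = b + rng.standard_normal((R, len(b))) @ L.T   # (R, K) taste draws
    v = X @ draws.T                                      # (J, R) utilities
    p = np.exp(v) / np.exp(v).sum(axis=0)                # logit prob per draw
    return p.mean(axis=1)                                # average over draws

rng = np.random.default_rng(6)
X = np.array([[1.0, 0.5], [0.2, 1.0], [0.0, 0.0]])       # hypothetical: 3 choices, 2 attributes
b = np.array([0.8, -0.4])
W = np.array([[0.5, 0.1], [0.1, 0.3]])
print(mixed_logit_prob(X, b, W, R=2000, rng=rng))        # probabilities sum to 1
```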


Berry-Levinsohn-Pakes Model

• BLP extended the random effects logit models to allow for:
  - unobserved product characteristics,
  - endogeneity of choice characteristics,
  - estimation with only aggregate choice data,
  - large numbers of choices.

• Model used in I.O. to model demand for differentiated products.

• The utility is indexed by individual, product and market:

Unjt = X'jt βn + ξjt + εnjt,

- ξjt = unobserved product characteristic, allowed to vary by market, t, and by product, j.

- εnjt = unobserved component, independent Gumbel across n, j, t.

Berry-Levinsohn-Pakes Model

• The random coefficients βn are related to individual observable characteristics:

βn = β + Zn'Γ + ηn,   ηn | Zn ~ N(0, Ω)

• BLP estimate this model without individual-level data. They use market-level data (aggregates) in combination with estimates of the distribution of the Zn's.

• The data consist of:
  – estimated shares ŝ_jt for each choice j in each market t,
  – observations from the marginal distribution of individual characteristics (the Zn's) for each market, often from representative data sets.


Berry-Levinsohn-Pakes Model

• First, write the latent utilities as

Unjt = δjt + vnjt + εnjt,

with δjt = X'jt β + ξjt, and vnjt = X'jt (Zn'Γ + ηn)

• Second, for fixed Γ, Ω, and δjt, calculate the implied market share for product j in market t. This can be done analytically or, more generally, by simulation.

• Next, fixing only Γ and Ω, for each value of δjt find the implied market share. Using aggregate market share data, find the δjt such that the implied market shares equal the observed market shares.
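The inner step – finding the δjt that match the observed shares – is usually done with the BLP contraction mapping, δ ← δ + log(s_obs) − log(s_pred(δ)). A minimal sketch (assuming NumPy; predict_shares is a user-supplied function returning the model's shares for a candidate δ, and the toy check below uses plain logit shares):

```python
import numpy as np

def blp_contraction(delta0, s_obs, predict_shares, tol=1e-12, max_iter=1000):
    """Berry/BLP contraction: iterate delta <- delta + log(s_obs) - log(s_pred)
    until the model's shares match the observed shares."""
    delta = delta0.copy()
    for _ in range(max_iter):
        delta_new = delta + np.log(s_obs) - np.log(predict_shares(delta))
        if np.max(np.abs(delta_new - delta)) < tol:
            break
        delta = delta_new
    return delta

# toy check with plain logit shares (outside good normalized to delta = 0)
def logit_shares(delta):
    e = np.exp(delta)
    return e / (1.0 + e.sum())

s_obs = np.array([0.2, 0.3, 0.1])                 # hypothetical observed shares
delta = blp_contraction(np.zeros(3), s_obs, logit_shares)
print(logit_shares(delta), s_obs)                 # the two should coincide
```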

• Given δjt(s, Γ, Ω), calculate the residuals (ξjt): δjt − X'jt β = wjt

Berry-Levinsohn-Pakes Model

• Then, assume ξjt and εnjt are uncorrelated with observed characteristics (other than price). We can use GMM or IV estimation to get β.

• GMM will also give us the standard errors for this procedure.


MP Model – Example 1

Example (Kamakura and Srivastava 1984):

Random utility components εni, εnj are more (less) highly correlated when i and j are more (less) similar on important attributes. We need to define a metric for "similar":

$$r_{ij} = K e^{-d_{ij}} \qquad (d_{ij} = \text{weighted Euclidean distance between } i \text{ and } j)$$

$$\Omega = \begin{pmatrix} 1 & & & \\ K e^{-d_{12}} & 1 & & \\ K e^{-d_{13}} & K e^{-d_{23}} & 1 & \\ \vdots & \vdots & & \ddots \\ K e^{-d_{1J}} & K e^{-d_{2J}} & \cdots & 1 \end{pmatrix}$$

MP Model – Example 1

• Examples
– Choice models at the brand-size level: correlation between different sizes of the same brand (Chintagunta 1992).
– The MNL model gives biased estimates of price elasticities.


MP Model – Example 2

Example: Firm innovation (Harris et al. 2003)
• Binary probit model for innovative status (innovation occurred or not)
• Based on panel data ⇒ correlation of innovative status over time: unobserved heterogeneity related to management ability and/or strategy

MP Model – Example 2

Models (2)–(4) account for unobserved heterogeneity (ρ) ⇒ superior results


MP Model – Example 3

Example: Dynamics of individual health (Contoyannis, Jones and Rice 2004)
• Binary probit model for health status (healthy or not)
• Survey data for several years
  - Correlation over time (state dependence)
  - Individual-specific (time-invariant) random coefficient

MP Model – Example 3

Example: Choice of transportation mode (Linardakis and Dellaportas 2003)

⇒ Non-IIA substitution patterns


Ordered Logit Model

• Now, the order matters. There is information (hierarchy) in the order. Examples: taste test (1 to 10), credit rating, preference scale ('dislike very much' to 'like very much'), purchase of 1, 2 or more units, etc.

• Random preferences: There is an underlying continuous preference scale, which maps to observed choices. The strength of preferences is reflected in the discrete outcome

• Choice between J>2 ordered ‘alternatives.’

• Ordinal dependent variable y = 1, 2, ... J, with rank(1) < rank(2) < ... < rank(J)

Ordered Logit Model – Example (Greene)

• Movie ratings from IMDB.com


Ordered Logit Model

• We follow McFadden's approach.
- Suppose y*_n is a continuous latent variable which is a linear function of the explanatory variables:

  y*_n = V_n + ε_n = X_n'β + ε_n     (y*_n = latent utility)

- Preferences can be 'mapped' onto an ordered multinomial variable as follows:

  y_n = 1 if μ_0 < y*_n ≤ μ_1 (Region 1)
  y_n = j if μ_{j−1} < y*_n ≤ μ_j (Region j)
  y_n = J if μ_{J−1} < y*_n (Region J)

  μ_0 < μ_1 < ... < μ_j < ... < μ_J – the μ's are called thresholds.

Ordered Logit Model

• We observe outcome j if the utility is in region j. Probability of outcome = probability of cell:

$$P(Y_n = j \mid X_n) = P(\mu_{j-1} < y_n^* \le \mu_j) = P(\mu_{j-1} - X_n'\beta < \varepsilon_n \le \mu_j - X_n'\beta) = F(\mu_j - X_n'\beta) - F(\mu_{j-1} - X_n'\beta)$$

• To continue we need a probability model. We use the logistic distribution ⇒ Ordered logit model:

$$F(\mu_j - X_n'\beta) = \frac{\exp(\mu_j - X_n'\beta)}{1 + \exp(\mu_j - X_n'\beta)}$$

• In general, μ_0 is set equal to zero and μ_J to a large number (+∞) (also, μ_{−1} = −∞). Different normalizations affect the estimation of the constant.


Ordered Logit Model – Parallel Regressions

• Let's look back at the construction of the regions:
  y_n = 1 if μ_0 < y*_n = X_n'β + ε_n ≤ μ_1 (Region 1)
  y_n = j if μ_{j−1} < y*_n = X_n'β + ε_n ≤ μ_j (Region j)
  y_n = J if μ_{J−1} < y*_n = X_n'β + ε_n ≤ μ_J (Region J)

• The β’s are the same for each region (choice). That is, the coefficients that describe the relationship between, say, the lowest versus all higher categories of the response variable are the same as those that describe the relationship between the next lowest category and all higher categories, etc.

• This is called the proportional odds assumption or the parallel regression assumption. It simplifies the estimation. It may not be realistic.

Ordered Logit Model – Example (Greene)


Generalized Ordered Logit Model

• We can generalize the model:
  y_n = 1 if μ_0 < y*_n = X_n'β_1 + ε_n ≤ μ_1 (Region 1)
  y_n = j if μ_{j−1} < y*_n = X_n'β_j + ε_n ≤ μ_j (Region j)
  y_n = J if μ_{J−1} < y*_n = X_n'β_J + ε_n ≤ μ_J (Region J)

• The β's are different for each region (choice). This model is called the Generalized Ordered Choice Model. To make it a generalized ordered logit model, we need to assume the Gumbel distribution for the ε_n's.

• We can be more general by making the thresholds heterogeneous:

  μ_nj = θ_j + Z_n'δ_j   – a linear function.

This can create identification problems if z_nk is also in x_n (same variable). It is difficult to disentangle the effects: F(μ_nj − X_n'β) = F(θ_j + Z_n'δ_j − X_n'β).

Generalized Ordered Logit Model

• We can also use non-linear functions to model thresholds heterogeneity:

nj = exp(θj + Znj δj)

It will be easier to identify effects in the Generalized Ordered Choice Model.

• An internally consistent restricted modification of the model is:

  μ_nj = exp(θ_j + Z_n'δ_j), where θ_j = θ_{j−1} + exp(φ_j)

This model is called the Hierarchical Ordered Probit (HOPit) model. See Harris and Zhao (2000).


Ordered Logit Model - Estimation

• Given the logit distribution, ML is simple.

$$L(\theta) = \prod_{n=1}^{N} \prod_{j=1}^{J} P(Y_n = j \mid X_n)^{I(y_n = j)} = \prod_{n=1}^{N} \prod_{j=1}^{J} \left[ F(\mu_j - X_n'\beta) - F(\mu_{j-1} - X_n'\beta) \right]^{I(y_n = j)}$$

$$\log L(\theta) = \sum_{n=1}^{N} \sum_{j=1}^{J} I[y_n = j] \ln\left[ F(\mu_j - X_n'\beta) - F(\mu_{j-1} - X_n'\beta) \right]$$

• The β’s are the same for each choice. This is the parallel regression assumption. It is a restriction on the model. It simplifies the estimation.

• This restriction can be tested (LR or Wald tests easy to construct).
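A minimal sketch of this log-likelihood in code (assuming NumPy and SciPy; the parameter layout and normalization – free, increasing thresholds with outer limits at ±∞ – are one of several equivalent choices). Maximizing the negative of this function, e.g. with scipy.optimize.minimize, gives the MLE.

```python
import numpy as np
from scipy.special import expit   # logistic CDF

def ordered_logit_loglik(params, y, X, J):
    """Ordered-logit log-likelihood; y is an integer array in {1, ..., J}.
    params = (beta, mu_1 < ... < mu_{J-1}); mu_0 = -inf and mu_J = +inf."""
    K = X.shape[1]
    beta = params[:K]
    mu = np.concatenate(([-np.inf], params[K:], [np.inf]))
    xb = X @ beta
    # P(y = j) = F(mu_j - x'b) - F(mu_{j-1} - x'b), F the logistic CDF
    p = expit(mu[y] - xb) - expit(mu[y - 1] - xb)
    return np.sum(np.log(np.clip(p, 1e-12, None)))
```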

Ordered Probit Model

• We can also use the normal distribution as the probability model. In this case, the probability of cell j:

$$P(Y_n = j \mid X_n) = \Phi(\mu_j - X_n'\beta) - \Phi(\mu_{j-1} - X_n'\beta)$$

This is the Ordered Probit Model.

• As before, we require a normalization: either no constant or μ_0 = 0.

• Estimation: Maximum likelihood

$$\log L(\theta) = \sum_{n=1}^{N} \sum_{j=1}^{J} I[y_n = j] \ln\left[ \Phi(\mu_j - X_n'\beta) - \Phi(\mu_{j-1} - X_n'\beta) \right]$$


Ordered Logit Model – Estimation (Greene) Example: Model for Health Satisfaction

+------+ | Ordered Probability Model | | Dependent variable HSAT | | Number of observations 27326 | | Underlying probabilities based on Normal | | Cell frequencies for outcomes | | Y Count Freq Y Count Freq Y Count Freq | | 0 447 .016 1 255 .009 2 642 .023 | | 3 1173 .042 4 1390 .050 5 4233 .154 | | 6 2530 .092 7 4231 .154 8 6172 .225 | | 9 3061 .112 10 3192 .116 | +------+ +------+------+------+------+------+------+ |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X| +------+------+------+------+------+------+ Index function for probability Constant 2.61335825 .04658496 56.099 .0000 FEMALE -.05840486 .01259442 -4.637 .0000 .47877479 EDUC .03390552 .00284332 11.925 .0000 11.3206310 AGE -.01997327 .00059487 -33.576 .0000 43.5256898 HHNINC .25914964 .03631951 7.135 .0000 .35208362 HHKIDS .06314906 .01350176 4.677 .0000 .40273000 Threshold parameters for index Mu(1) .19352076 .01002714 19.300 .0000 Mu(2) .49955053 .01087525 45.935 .0000 Mu(3) .83593441 .00990420 84.402 .0000 Mu(4) 1.10524187 .00908506 121.655 .0000 Mu(5) 1.66256620 .00801113 207.532 .0000 Mu(6) 1.92729096 .00774122 248.965 .0000 Mu(7) 2.33879408 .00777041 300.987 .0000 Mu(8) 2.99432165 .00851090 351.822 .0000 Mu(9) 3.45366015 .01017554 339.408 .0000

Ordered Logit Model – Partial Effects

• As usual, there is a non-linearity. The β’s do not have the usual interpretation. We will look at partial effects:

$$\frac{\partial P(Y_n = j)}{\partial x_{nk}} = \left[ f(\mu_j - X_n'\beta) - f(\mu_{j-1} - X_n'\beta) \right](-\beta_k) = \left[ f(\mu_{j-1} - X_n'\beta) - f(\mu_j - X_n'\beta) \right]\beta_k$$

• The partial effects depend on the data (x) and the coefficients. The sign depends on the densities evaluated at two points.
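A short sketch of these partial effects for an ordered probit at a given x (assuming NumPy and SciPy; the coefficients and thresholds below are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def ordered_probit_partial_effects(beta, mu, x):
    """Partial effects dP(y=j)/dx_k at a point x for an ordered probit.
    mu must include the outer limits: (-inf, mu_1, ..., mu_{J-1}, +inf)."""
    xb = x @ beta
    dens = norm.pdf(mu - xb)           # densities at the thresholds (0 at +/- inf)
    # dP(y=j)/dx_k = [f(mu_{j-1} - x'b) - f(mu_j - x'b)] * beta_k
    return np.outer(dens[:-1] - dens[1:], beta)     # (J, K) matrix of effects

beta = np.array([0.5, -0.2])                        # hypothetical coefficients
mu = np.array([-np.inf, 0.0, 0.8, 1.7, np.inf])     # hypothetical thresholds (J=4)
x = np.array([1.0, 0.3])
pe = ordered_probit_partial_effects(beta, mu, x)
print(pe.sum(axis=0))                               # effects sum to zero across outcomes
```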


Ordered Logit Model – Partial Effects (Greene)

Assume that βk is positive and that xk increases. Then β'x increases, and μj − β'x shifts to the left for all 5 cells:
- Prob[y=0] decreases.
- Prob[y=1] decreases – the mass shifted out is larger than the mass shifted in.
- Prob[y=3] increases – same reason in reverse.
- Prob[y=4] must increase.
When βk > 0, an increase in xk decreases Prob[y=0] and increases Prob[y=J]. Intermediate cells are ambiguous, but there is only one sign change in the marginal effects going from 0 to 1 to … to J.

Ordered Logit Model – Partial Effects (Greene)

Example: Partial Effects of 8 Years of Education


Ordered Probability Effects – Example (Greene)

+------+ | Marginal effects for ordered probability model | | M.E.s for dummy variables are Pr[y|x=1]-Pr[y|x=0] | | Names for dummy variables are marked by *. | +------+ +------+------+------+------+------+------+ |Variable | Coefficient | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X| +------+------+------+------+------+------+ These are the effects on Prob[Y=00] at means. *FEMALE .00200414 .00043473 4.610 .0000 .47877479 EDUC -.00115962 .986135D-04 -11.759 .0000 11.3206310 AGE .00068311 .224205D-04 30.468 .0000 43.5256898 HHNINC -.00886328 .00124869 -7.098 .0000 .35208362 *HHKIDS -.00213193 .00045119 -4.725 .0000 .40273000 These are the effects on Prob[Y=01] at means. *FEMALE .00101533 .00021973 4.621 .0000 .47877479 EDUC -.00058810 .496973D-04 -11.834 .0000 11.3206310 AGE .00034644 .108937D-04 31.802 .0000 43.5256898 HHNINC -.00449505 .00063180 -7.115 .0000 .35208362 *HHKIDS -.00108460 .00022994 -4.717 .0000 .40273000 ... repeated for all 11 outcomes These are the effects on Prob[Y=10] at means. *FEMALE -.01082419 .00233746 -4.631 .0000 .47877479 EDUC .00629289 .00053706 11.717 .0000 11.3206310 AGE -.00370705 .00012547 -29.545 .0000 43.5256898 HHNINC .04809836 .00678434 7.090 .0000 .35208362 *HHKIDS .01181070 .00255177 4.628 .0000 .40273000

Ordered Probit Marginal Effects (Greene)


+------+------+ | Summary of Marginal Effects for Ordered Probability Model | | Effects computed at means. Effects for binary variables are | | computed as differences of probabilities, other variables at means. | +------+------+------+ | Probit | Logit | |Outcome | Effect dPy<=nn/dX dPy>=nn/dX| Effect dPy<=nn/dX dPy>=nn/dX| +------+------+------+ | | Continuous Variable AGE | |Y = 00 | .00173 .00173 .00000 | .00145 .00145 .00000 | |Y = 01 | .00450 .00623 -.00173 | .00521 .00666 -.00145 | |Y = 02 | -.00124 .00499 -.00623 | -.00166 .00500 -.00666 | |Y = 03 | -.00216 .00283 -.00499 | -.00250 .00250 -.00500 | |Y = 04 | -.00283 .00000 -.00283 | -.00250 .00000 -.00250 | +------+------+------+ | | Continuous Variable EDUC | |Y = 00 | -.00340 -.00340 .00000 | -.00291 -.00291 .00000 | |Y = 01 | -.00885 -.01225 .00340 | -.01046 -.01337 .00291 | |Y = 02 | .00244 -.00982 .01225 | .00333 -.01004 .01337 | |Y = 03 | .00424 -.00557 .00982 | .00502 -.00502 .01004 | |Y = 04 | .00557 .00000 .00557 | .00502 .00000 .00502 | +------+------+------+ | | Continuous Variable INCOME | |Y = 00 | -.02476 -.02476 .00000 | -.01922 -.01922 .00000 | |Y = 01 | -.06438 -.08914 .02476 | -.06908 -.08830 .01922 | |Y = 02 | .01774 -.07141 .08914 | .02197 -.06632 .08830 | |Y = 03 | .03085 -.04055 .07141 | .03315 -.03318 .06632 | |Y = 04 | .04055 .00000 .04055 | .03318 .00000 .03318 | +------+------+------+ | | Binary(0/1) Variable MARRIED | |Y = 00 | .00293 .00293 .00000 | .00287 .00287 .00000 | |Y = 01 | .00771 .01064 -.00293 | .01041 .01327 -.00287 | |Y = 02 | -.00202 .00861 -.01064 | -.00313 .01014 -.01327 | |Y = 03 | -.00370 .00491 -.00861 | -.00505 .00509 -.01014 | |Y = 04 | -.00491 .00000 -.00491 | -.00509 .00000 -.00509 | +------+------+------+

OP: The Single Crossing Effect (Greene)

The marginal effect for EDUC is negative for Prob(0),…,Prob(7), then positive for Prob(8)…Prob(10). One “crossing.”


Ordered Probit Model: Nonlinearity (Greene)

Ordered Probit Model: Model Evaluation

• Different ways to judge a model:
- Partial effects (do they make sense?)
- Fit measures (log-likelihood based measures, such as pseudo-R²)
- Predicted probabilities
  – Averaged: they match the sample proportions.
  – By observation
  – Segments of the sample
  – Related to particular variables


Ordered Probit Model: Model Evaluation

• Log Likelihood Based Fit Measures

OP Model: Model Evaluation

• Predictions of the Model: Kids

+------+ |Variable Mean Std.Dev. Minimum Maximum | +------+ |Stratum is KIDS = 0.000. Nobs.= 2782.000 | +------+------+ |P0 | .059586 .028182 .009561 .125545 | |P1 | .268398 .063415 .106526 .374712 | |P2 | .489603 .024370 .419003 .515906 | |P3 | .101163 .030157 .052589 .181065 | |P4 | .081250 .041250 .028152 .237842 | +------+ |Stratum is KIDS = 1.000. Nobs.= 1701.000 | +------+------+ |P0 | .036392 .013926 .010954 .105794 | |P1 | .217619 .039662 .115439 .354036 | |P2 | .509830 .009048 .443130 .515906 | |P3 | .125049 .019454 .061673 .176725 | |P4 | .111111 .030413 .035368 .222307 | +------+ |All 4483 observations in current sample | +------+------+ |P0 | .050786 .026325 .009561 .125545 | |P1 | .249130 .060821 .106526 .374712 | |P2 | .497278 .022269 .419003 .515906 | |P3 | .110226 .029021 .052589 .181065 | |P4 | .092580 .040207 .028152 .237842 | +------+


OP Model: Model Evaluation (Greene)

• Aggregate Prediction Measure


Ordered Logit Model - Cons

• Disadvantages (Borooah 2002)
- Assumption of equal slopes β_k.
- Biased estimates if the assumption of strictly ordered outcomes does not hold ⇒ treat outcomes as non-ordered unless there are good reasons for imposing a ranking.

Ordered Logit Model - Application

Example (from Kim and Kim (2004)): Effectiveness of better public transit as a way to reduce automobile congestion and air pollution in urban areas.
- Research objective: develop and estimate models to measure how public transit affects automobile ownership and miles driven.
- Data: Nationwide Personal Transportation Survey (42,033 households): socio-demographics, automobile ownership and use, public transportation availability.


Ordered Logit Model - Application

- Dependent variable in the ownership model = number of cars (k = 0, 1, 2, ≥3) ⇒ ordinal variable
- C*_i = latent variable: automobile ownership propensity of household i
- Relation to observed automobile ownership:

  C_i = k if μ_{k−1} < β'x_i + ε < μ_k

- P(C_i = k) = F(μ_k − β'x_i) − F(μ_{k−1} − β'x_i)

Ordered Logit Model - Application


Ordered Logit Model - Application

Ordered Logit Model - Application


Ordered Logit Model – More Applications

• Examples
• Occupational outcome as a function of socio-demographic characteristics – Borooah (2002)
  – Unskilled/semiskilled
  – Skilled manual/non-manual
  – Professional/managerial/technical
• School performance – Sawkins (2002)
  – Grade 1 to 5
  – Function of school, teacher and student characteristics
• Level of insurance coverage

Brant Test for Parallel Regressions (Greene)

• We can test the parallel regression assumption: all β’s are the same across regions. The alternative hypothesis is the Generalized Ordered Logit Model. • This specification test or test for parameter constancy (across regions) is called the Brant Test:

Reformulate the J "models":

Prob[y > j | x] = F(β'x − μ_j) = F(α_j + β_j'x),   j = 0, 1, ..., J − 1,   (α_j = −μ_j)

This produces J binary choice models based on y > j.

H0: The slope vector is the same in all "models." (This is implied by the ordered choice model.)

Test: Estimate the J binary choice models. Use a Wald test to test H0: β_0 = β_1 = ... = β_{J−1}.


Brant Test for Parallel Regressions (Greene)

Q: What failure of the model specification is indicated by rejection?

Heterogeneity in Ordered Choice Models

• Observed heterogeneity - Easy case, , which produces scale heterogeneity.

• Unobserved heterogeneity – Over decision makers - Random coefficients Models - E.g. Model (see Train) – Over segments - Latent class Models


Heteroscedasticity in OC Models (Greene)

• Not difficult to introduce heteroscedasticity in the OC Models. It produces scale changes: a GLS-type correction.

• As usual, we need a model for the heteroscedasticity. For example, an exponential form: σ_i = exp(γ'h_i). Then, for the Probit and Logit models:

$$\text{Prob}(y_i = j \mid x_i, h_i) = F\!\left(\frac{\mu_j - x_i'\beta}{\exp(\gamma' h_i)}\right) - F\!\left(\frac{\mu_{j-1} - x_i'\beta}{\exp(\gamma' h_i)}\right)$$

$$\text{Prob}(y_i = j \mid x_i, h_i) = F\!\left(\frac{\mu_j + z_i'\delta_j - x_i'\beta}{\exp(\gamma' h_i)}\right) - F\!\left(\frac{\mu_{j-1} + z_i'\delta_{j-1} - x_i'\beta}{\exp(\gamma' h_i)}\right)$$

• As usual, partial effects will also be affected.

Heteroscedasticity in OC Models (Greene)


Heteroscedasticity in OC Models (Greene)

Heterogeneity: Latent Class Models

• Assumption: consumers can be placed into a small number of (homogeneous) segments, which differ in choice behavior (different response parameters – i.e., the β's).

• The relative size of segment s (s = 1, 2, ..., M) is given by

  f_s = exp(λ_s) / Σ_{s'} exp(λ_{s'})

• The probability of choosing brand j, conditional on consumer n being a member of segment s, is given by a logit:

  P^s(y_n = j | X_n) = exp(X_jn'β^s) / Σ_l exp(X_ln'β^s)

• The unconditional probability that consumer n chooses brand j:

  P(y_n = j | X_n) = Σ_s f_s P^s(y_n = j | X_n) = Σ_s [exp(λ_s)/Σ_{s'} exp(λ_{s'})] [exp(X_jn'β^s)/Σ_l exp(X_ln'β^s)]


Heterogeneity: Latent Class Models

• Estimation: Maximum Likelihood

• Likelihood of a household’s choice history Hn

L(H_n) = Σ_s [ exp(λ_s) L(H_n|s) / Σ_{s'} exp(λ_{s'}) ],   with
L(H_n|s) = Π_t P^s(y_nt = c(t) | X_nt),
c(t) = index of the chosen option at time t.

• Maximize the likelihood over all households: Π_n L(H_n).

• We need to decide on how to form the segments (classes).

Heterogeneity: Latent Class Models

Segment analysis
• Based on parameter estimates, say, differences in price sensitivity.
• Based on segment profiles
– Post-hoc: based on the assignment of consumers to segments. Probability that consumer n belongs to segment s:

P(n ∈ s | H_n) = L(H_n|s) f_s / Σ_{s'} [L(H_n|s') f_{s'}]

Analyze the characteristics of the different segments.

–A priori: make fs a function of variables that may explain segment membership. For example, income for segments which differ in price sensitivity.


Heterogeneity: Latent Class Models (Greene)

Heterogeneity: Latent Class Models

• Example (from Bucklin and Gupta 1992): PIM – heterogeneity in price sensitivity.


Heterogeneity: Latent class est.

Choice segments: segment 1 = more sensitive to price and promotion.
Incidence segments: segments 2 and 4 = more sensitive to changes in category attractiveness (change in price/promo).
⇒ Confirms that there are different combinations of choice/incidence price sensitivity.

Heterogeneity: Latent class est.

• Segment analysis: price elasticity


Heterogeneity: Latent class est.

• Segment analysis: socio-demographic profile
