Discrete Choice Analysis I

Moshe Ben-Akiva
1.201 / 11.545 / ESD.210 Transportation Systems Analysis: Demand & Economics
Fall 2008

Outline of 2 Lectures on Discrete Choice
● Introduction
● A Simple Example
● The Random Utility Model
● Specification and Estimation
● Forecasting
● IIA Property
● Nested Logit

Outline of this Lecture
● Introduction
● A simple example: route choice
● The Random Utility Model
  – Systematic utility
  – Random components
● Derivation of the Probit and Logit models
  – Binary Probit
  – Binary Logit
  – Multinomial Logit

Continuous vs. Discrete Goods
[Figure: two panels. "Continuous Goods" shows indifference curves u1, u2, u3 in the (x1, x2) plane; "Discrete Goods" has axes bus and auto.]

Discrete Choice Framework
● Decision-Maker
  – Individual (person/household)
  – Socio-economic characteristics (e.g., age, gender, income, vehicle ownership)
● Alternatives
  – Decision-maker n selects one and only one alternative from a choice set $C_n = \{1, 2, \ldots, i, \ldots, J_n\}$ with $J_n$ alternatives
● Attributes of alternatives (e.g., travel time, cost)
● Decision Rule
  – Dominance, satisfaction, utility, etc.

Choice: Travel Mode to Work
• Decision maker: an individual worker
• Choice: whether to drive to work or take the bus to work
• Goods: bus, auto
• Utility function: U(X) = U(bus, auto)
• Consumption bundles:
  {1,0} (person takes bus)
  {0,1} (person drives)

Consumer Choice
• Consumers maximize utility
  – Choose the alternative that has the maximum utility (and falls within the income constraint)
  If U(bus) > U(auto) → choose bus
  If U(bus) < U(auto) → choose auto
  U(bus) = ? U(auto) = ?

Constructing the Utility Function
● U(bus) = U(walk time, in-vehicle time, fare, …)
  U(auto) = U(travel time, parking cost, …)
● Assume linear (in the parameters):
  $U(\text{bus}) = \beta_1 \times (\text{walk time}) + \beta_2 \times (\text{in-vehicle time}) + \ldots$
● Parameters represent tastes, which may vary over people.
  – Include socio-economic characteristics (e.g., age, gender, income)
  – $U(\text{bus}) = \beta_1 \times (\text{walk time}) + \beta_2 \times (\text{in-vehicle time}) + \beta_3 \times (\text{cost/income}) + \ldots$

Deterministic Binary Choice
● If U(bus) - U(auto) > 0, Probability(bus) = 1
  If U(bus) - U(auto) < 0, Probability(bus) = 0
[Figure: P(bus), from 0 to 1, plotted against U(bus) - U(auto).]

Probabilistic Choice
● Random utility model
  $U_i = V(\text{attributes of } i;\ \text{parameters}) + \varepsilon_i$
● What is in the epsilon? Analysts' imperfect knowledge:
  – Unobserved attributes
  – Unobserved taste variations
  – Measurement errors
  – Use of proxy variables
● $U(\text{bus}) = \beta_1 \times (\text{walk time}) + \beta_2 \times (\text{in-vehicle time}) + \beta_3 \times (\text{cost/income}) + \ldots + \varepsilon_{\text{bus}}$

Probabilistic Binary Choice
[Figure: P(bus), from 0 to 1, plotted against V(bus) - V(auto).]

A Simple Example: Route Choice
– Sample size: N = 600
– Alternatives: Tolled, Free
– Income: Low, Medium, High

| Route choice | Low (k=1) | Medium (k=2) | High (k=3) | Total |
|--------------|-----------|--------------|------------|-------|
| Tolled (i=1) | 10        | 100          | 90         | 200   |
| Free (i=2)   | 140       | 200          | 60         | 400   |
| Total        | 150       | 300          | 150        | 600   |

A Simple Example: Route Choice Probabilities
● (Marginal) probability of choosing the toll road, P(i = 1):
  $\hat{P}(i = 1) = 200 / 600 = 1/3$
● (Joint) probability of choosing the toll road and having medium income, P(i = 1, k = 2):
  $\hat{P}(i = 1, k = 2) = 100 / 600 = 1/6$
  $\sum_{i=1}^{2} \sum_{k=1}^{3} P(i, k) = 1$

Conditional Probability P(i|k)
  $P(i, k) = P(i) \cdot P(k \mid i) = P(k) \cdot P(i \mid k)$
  $P(i) = \sum_{k} P(i, k)$
  $P(k) = \sum_{i} P(i, k)$
  $P(k \mid i) = \frac{P(i, k)}{P(i)}, \quad P(i) \neq 0$
  $P(i \mid k) = \frac{P(i, k)}{P(k)}, \quad P(k) \neq 0$
  Independence: $P(i \mid k) = P(i)$ and $P(k \mid i) = P(k)$

Model: P(i|k)
● Behavioral model ~ Probability(Route Choice | Income) = P(i|k)
● Unknown parameters
  $P(i = 1 \mid k = 1) = \pi_1$
  $P(i = 1 \mid k = 2) = \pi_2$
  $P(i = 1 \mid k = 3) = \pi_3$

Example: Model Estimation
● Estimation
  $\hat{\pi}_1 = \frac{1}{15} = 0.067, \quad \hat{\pi}_2 = \frac{1}{3} = 0.333, \quad \hat{\pi}_3 = \frac{3}{5} = 0.6$
[Figure: sampling distribution of $\hat{\pi}_2$ (vertical axis: frequency), marked at 1/3; N = 600.]
● Standard errors
  $s_1 = \sqrt{\frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{N_1}} = \sqrt{\frac{1/15 \cdot (1 - 1/15)}{150}} = 0.020$
  $s_2 = \sqrt{\frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{N_2}} = \sqrt{\frac{1/3 \cdot (1 - 1/3)}{300}} = 0.027$
  $s_3 = \sqrt{\frac{\hat{\pi}_3 (1 - \hat{\pi}_3)}{N_3}} = \sqrt{\frac{3/5 \cdot (1 - 3/5)}{150}} = 0.040$
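The estimation step can be reproduced in a few lines. The sketch below recomputes the conditional toll-road shares and their binomial standard errors from the counts in the route choice table; the dictionary layout and the function name `estimate_toll_shares` are illustrative choices, not part of the lecture.

```python
import math

# Observed counts from the route choice table: counts[route][income group].
counts = {
    "tolled": {"low": 10, "medium": 100, "high": 90},
    "free": {"low": 140, "medium": 200, "high": 60},
}


def estimate_toll_shares(counts):
    """Return {income group: (pi_hat, std_err)} for P(tolled | income)."""
    results = {}
    for k in counts["tolled"]:
        n_k = counts["tolled"][k] + counts["free"][k]      # group size N_k
        pi_hat = counts["tolled"][k] / n_k                 # sample share
        std_err = math.sqrt(pi_hat * (1 - pi_hat) / n_k)   # binomial standard error
        results[k] = (pi_hat, std_err)
    return results


estimates = estimate_toll_shares(counts)
for k, (pi_hat, se) in estimates.items():
    print(f"{k}: pi_hat = {pi_hat:.3f}, s = {se:.3f}")
```

Rounded to three decimals, the values agree with the estimates and standard errors listed on the slide.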
Example: Forecasting
● Toll road share under existing income distribution: 33%
● New income distribution

| Route choice                 | Low (k=1)     | Medium (k=2)    | High (k=3)      | Total | Share |
|------------------------------|---------------|-----------------|-----------------|-------|-------|
| Tolled (i=1)                 | 1/15 × 45 = 3 | 1/3 × 300 = 100 | 3/5 × 255 = 153 | 256   | 43%   |
| Free (i=2)                   | 42            | 200             | 102             | 344   | 57%   |
| New income distribution      | 45            | 300             | 255             | 600   |       |
| Existing income distribution | 150           | 300             | 150             | 600   |       |

● Toll road share: 33% → 43%

The Random Utility Model
● Decision rule: Utility maximization
  – Decision maker n selects the alternative i with the highest utility $U_{in}$ among the $J_n$ alternatives in the choice set $C_n$.
  $U_{in} = V_{in} + \varepsilon_{in}$
  $V_{in}$ = systematic utility: function of observable variables
  $\varepsilon_{in}$ = random utility component

The Random Utility Model
● Choice probability:
  $P(i \mid C_n) = P(U_{in} \geq U_{jn},\ \forall j \in C_n)$
  $\quad = P(U_{in} - U_{jn} \geq 0,\ \forall j \in C_n)$
  $\quad = P(U_{in} = \max_{j \in C_n} U_{jn})$
● For binary choice:
  $P_n(1) = P(U_{1n} \geq U_{2n}) = P(U_{1n} - U_{2n} \geq 0)$

The Random Utility Model

| Routes       | Travel time (t) | Travel cost (c) | Utility (utils) |
|--------------|-----------------|-----------------|-----------------|
| Tolled (i=1) | t1              | c1              | U1              |
| Free (i=2)   | t2              | c2              | U2              |

  $U_1 = -\beta_1 t_1 - \beta_2 c_1 + \varepsilon_1$
  $U_2 = -\beta_1 t_2 - \beta_2 c_2 + \varepsilon_2$
  $\beta_1, \beta_2 > 0$

The Random Utility Model
● Ordinal utility
  – Decisions are based on utility differences
  – Unique up to an order-preserving transformation
  $U_1 = (-\beta_1 t_1 - \beta_2 c_1 + \varepsilon_1 + K)\,\lambda$
  $U_2 = (-\beta_1 t_2 - \beta_2 c_2 + \varepsilon_2 + K)\,\lambda$
  $\beta_1, \beta_2, \lambda > 0$

The Random Utility Model
[Figure: observed choices plotted in the plane of $t_1 - t_2$ (horizontal) and $c_1 - c_2$ (vertical); • marks Choice = 1 and + marks Choice = 2. The line $V_1 = V_2$ separates the region $V_1 > V_2$ from the region $V_1 < V_2$; the corners are labeled "Alt. 1 is dominant" and "Alt. 2 is dominant". The slope of the line is labeled $\beta$.]
  $U_1 = -\frac{\beta_1}{\beta_2} \cdot t_1 - c_1 + \varepsilon_1$
  $U_2 = -\frac{\beta_1}{\beta_2} \cdot t_2 - c_2 + \varepsilon_2$
  $\beta = \frac{\beta_1}{\beta_2}$ = "value of time"
  $U_1 - U_2 = -\frac{\beta_1}{\beta_2} \cdot (t_1 - t_2) - (c_1 - c_2) + (\varepsilon_1 - \varepsilon_2)$

The Systematic Utility
● Attributes: describing the alternative
  – Generic vs. Specific
    • Examples: travel time, travel cost, frequency
  – Quantitative vs. Qualitative
    • Examples: comfort, reliability, level of service
  – Perception
  – Data availability
● Characteristics: describing the decision-maker
  – Socio-economic variables
    • Examples: income, gender, education

Random Terms
● Capture imperfect information
● Distribution of epsilons
● Variance/covariance structure
  – Correlation between alternatives
  – Multidimensional decision
    • Example: Mode and departure time choice
● Typical models
  – Logit model (i.i.d. "Extreme Value" error terms, a.k.a. Gumbel)
  – Probit model (normal error terms)

Binary Choice
● Choice set $C_n = \{1, 2\}\ \forall n$
  $P_n(1) = P(1 \mid C_n) = P(U_{1n} \geq U_{2n})$
  $\quad = P(V_{1n} + \varepsilon_{1n} \geq V_{2n} + \varepsilon_{2n})$
  $\quad = P(V_{1n} - V_{2n} \geq \varepsilon_{2n} - \varepsilon_{1n})$
  $\quad = P(V_{1n} - V_{2n} \geq \varepsilon_n) = P(V_n \geq \varepsilon_n) = F_\varepsilon(V_n)$
  where $V_n = V_{1n} - V_{2n}$ and $\varepsilon_n = \varepsilon_{2n} - \varepsilon_{1n}$.

Binary Probit
● "Probit" name comes from Probability Unit
  $\varepsilon_{1n} \sim N(0, \sigma_1^2)$
  $\varepsilon_{2n} \sim N(0, \sigma_2^2)$
  $\varepsilon_n \sim N(0, \sigma^2)$ where $\sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\sigma_{12}$
  $f(\varepsilon) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{\varepsilon}{\sigma} \right)^2}$
  $P_n(1) = F_\varepsilon(V_n) = \int_{-\infty}^{V_n} \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{\varepsilon}{\sigma} \right)^2} d\varepsilon = \Phi\!\left( \frac{V_n}{\sigma} \right)$
  where $\Phi(z)$ is the standardized cumulative normal distribution

Binary Probit Normalization
● Relationship between utility scale $\mu^*$ and scale parameter $\sigma$:
  $\mathrm{Var}(\mu^* \varepsilon_n) = 1$ iff $\mu^{*2} = \frac{1}{\mathrm{Var}(\varepsilon_n)} \Rightarrow \mu^* = \frac{1}{\sqrt{\mathrm{Var}(\varepsilon_n)}} = \frac{1}{\sigma}$
● Usual normalization: $\sigma = 1$, implying $\mu^* = 1$

Binary Logit Model
● "Logit" name comes from Logistic Probability Unit
  $\varepsilon_{1n} \sim \text{ExtremeValue}(0, \mu), \quad F_\varepsilon(\varepsilon_{1n}) = \exp\left[ -e^{-\mu \varepsilon_{1n}} \right]$
  $\varepsilon_{2n} \sim \text{ExtremeValue}(0, \mu), \quad F_\varepsilon(\varepsilon_{2n}) = \exp\left[ -e^{-\mu \varepsilon_{2n}} \right]$
  $\varepsilon_n \sim \text{Logistic}(0, \mu), \quad F_\varepsilon(\varepsilon_n) = \frac{1}{1 + e^{-\mu \varepsilon_n}}$
  $P_n(1) = F_\varepsilon(V_n) = \frac{1}{1 + e^{-\mu V_n}}$

Why Logit?
● Probit does not have a closed form: the choice probability is an integral.
● The logistic distribution is used because:
  – It approximates a normal distribution quite well.
  – It is analytically convenient.
  – Gumbel can also be "justified" as an extreme value distribution.
● Logit does have "fatter" tails than a normal distribution.
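The comparison between the two binary models can be checked numerically. The sketch below evaluates the binary probit and binary logit probabilities on a small grid of $V_n$ values. The function names and the grid are illustrative. Setting $\mu = \pi/\sqrt{3}$ is an assumption made here so that the logistic error has unit variance and can be compared with a probit using $\sigma = 1$.

```python
import math


def probit_prob(v, sigma=1.0):
    """Binary probit: P_n(1) = Phi(V_n / sigma), via the error function."""
    return 0.5 * (1.0 + math.erf(v / (sigma * math.sqrt(2.0))))


def logit_prob(v, mu=1.0):
    """Binary logit: P_n(1) = 1 / (1 + exp(-mu * V_n))."""
    return 1.0 / (1.0 + math.exp(-mu * v))


# Assumption for this comparison: a logistic error with scale mu has variance
# pi^2 / (3 mu^2), so mu = pi / sqrt(3) gives unit variance (matches sigma = 1).
MU_MATCHED = math.pi / math.sqrt(3.0)

for v in (-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0):
    p_probit = probit_prob(v)
    p_logit = logit_prob(v, MU_MATCHED)
    print(f"V_n = {v:+.1f}  probit = {p_probit:.4f}  logit = {p_logit:.4f}")
```

On this grid the two curves differ by at most a few hundredths, and at $V_n = \pm 4$ the logit puts more probability on the less likely alternative than the probit does, which is the "fatter tails" property.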

Logit Model Normalization
● Relationship between utility scale $\mu^*$ and scale parameter $\mu$:
  $\mathrm{Var}(\mu^* \varepsilon_n) = 1$ iff $\mu^* = \frac{1}{\sqrt{\mathrm{Var}(\varepsilon_n)}}$
  where $\mathrm{Var}(\varepsilon_n) = \mathrm{Var}(\varepsilon_{2n} - \varepsilon_{1n}) = \frac{2\pi^2}{6\mu^2}$

Logit Model Normalization
● Usual normalization: $\mu = 1$, implying $\mu^* = \frac{\sqrt{3}}{\pi}$
● Utility scale different from probit
  – Need to multiply probit coefficients by $\frac{\pi}{\sqrt{3}}$ to be comparable to logit coefficients

Limiting Cases
● Recall: $P_n(1) = P(V_n \geq \varepsilon_n) = F_\varepsilon(V_{1n} - V_{2n})$
● With logit, $F_\varepsilon(V_n) = \frac{1}{1 + e^{-\mu V_n}} = \frac{e^{\mu V_{1n}}}{e^{\mu V_{1n}} + e^{\mu V_{2n}}}$
● What happens as $\mu \to \infty$?
● What happens as $\mu \to 0$?
[Figure: $P_n(1)$ plotted against $V_n = V_{1n} - V_{2n}$ for $\mu = 10$, $\mu = 1$, and $\mu = 0.1$.]

Re-formulation
● $P_n(i) = P(U_{in} \geq U_{jn}) = \frac{1}{1 + e^{-\mu (V_{in} - V_{jn})}} = \frac{e^{\mu V_{in}}}{e^{\mu V_{in}} + e^{\mu V_{jn}}}$
● If $V_{in}$ and $V_{jn}$ are linear in their parameters:
  $P_n(i) = \frac{e^{\mu \beta' x_{in}}}{e^{\mu \beta' x_{in}} + e^{\mu \beta' x_{jn}}}$

Multiple Choice
● Choice set $C_n$: $J_n$ alternatives, $J_n \geq 2$
  $P(i \mid C_n) = P[V_{in} + \varepsilon_{in} \geq V_{jn} + \varepsilon_{jn},\ \forall j \in C_n]$
  $\quad = P[(V_{in} + \varepsilon_{in}) = \max_{j \in C_n} (V_{jn} + \varepsilon_{jn})]$
  $\quad = P[\varepsilon_{jn} - \varepsilon_{in} \leq V_{in} - V_{jn},\ \forall j \in C_n]$

Multiple Choice
● Multinomial Logit Model
  – $\varepsilon_{jn}$ independent and identically distributed (i.i.d.)
  – $\varepsilon_{jn} \sim \text{ExtremeValue}(0, \mu)\ \forall j$
    $F(\varepsilon) = \exp[-e^{-\mu \varepsilon}], \quad \mu > 0$
    $f(\varepsilon) = \mu e^{-\mu \varepsilon} \exp[-e^{-\mu \varepsilon}]$
  – Variance: $\frac{\pi^2}{6\mu^2}$
  $P(i \mid C_n) = \frac{e^{\mu V_{in}}}{\sum_{j \in C_n} e^{\mu V_{jn}}}$

Multiple Choice: An Example
● Choice set $C_n = \{\text{auto}, \text{bus}, \text{walk}\}\ \forall n$
  $P(\text{auto} \mid C_n) = \frac{e^{\mu V_{\text{auto},n}}}{e^{\mu V_{\text{auto},n}} + e^{\mu V_{\text{bus},n}} + e^{\mu V_{\text{walk},n}}}$

Next Lecture
● Model specification and estimation
● Aggregation and forecasting
● Independence from Irrelevant Alternatives (IIA) property
  – Motivation for Nested Logit
● Nested Logit: specification, estimation, and an example

MIT OpenCourseWare
http://ocw.mit.edu
1.201J / 11.545J / ESD.210J Transportation Systems Analysis: Demand and Economics
Fall 2008
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
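As a supplement to the multinomial logit formula and the limiting-case questions in the slides, the sketch below computes $P(i \mid C_n)$ for the auto/bus/walk choice set at several values of the scale parameter $\mu$. The systematic utilities in `V` are hypothetical numbers chosen only for illustration, and the function name `mnl_probabilities` is likewise an assumption of this sketch.

```python
import math


def mnl_probabilities(v, mu=1.0):
    """Multinomial logit: P(i) = exp(mu * V_i) / sum_j exp(mu * V_j)."""
    # Subtract the largest utility before exponentiating. Only utility
    # differences matter, so the probabilities are unchanged, and this
    # avoids overflow for large mu.
    v_max = max(v.values())
    exp_v = {i: math.exp(mu * (v_i - v_max)) for i, v_i in v.items()}
    total = sum(exp_v.values())
    return {i: e / total for i, e in exp_v.items()}


# Hypothetical systematic utilities, for illustration only.
V = {"auto": -1.0, "bus": -1.5, "walk": -3.0}

for mu in (0.1, 1.0, 10.0):
    p = mnl_probabilities(V, mu)
    print(f"mu = {mu}: " + ", ".join(f"{i} = {p_i:.3f}" for i, p_i in p.items()))
```

With a small $\mu$ the three probabilities move toward equal shares, while with a large $\mu$ nearly all probability goes to the alternative with the highest systematic utility, approaching the deterministic choice rule. With two alternatives the function reduces to the binary logit expression $1 / (1 + e^{-\mu (V_{in} - V_{jn})})$.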
