Discrete Choice Analysis I
Total Page:16
File Type:pdf, Size:1020Kb
Discrete Choice Analysis I Moshe Ben-Akiva 1.201 / 11.545 / ESD.210 Transportation Systems Analysis: Demand & Economics Fall 2008 Outline of 2 Lectures on Discrete Choice ● Introduction ● A Simple Example ● The Random Utility Model ● Specification and Estimation ● Forecasting ● IIA Property ● Nested Logit 2 Outline of this Lecture ● Introduction ● A simple example – route choice ● The Random Utility Model – Systematic utility – Random components ● Derivation of the Probit and Logit models – Binary Probit – Binary Logit – Multinomial Logit 3 Continuous vs. Discrete Goods Continuous Goods Discrete Goods auto x2 Indifference curves u3 u2 u1 x 1 bus 4 Discrete Choice Framework ● Decision-Maker – Individual (person/household) – Socio-economic characteristics (e.g. Age, gender,income, vehicle ownership) ● Alternatives – Decision-maker n selects one and only one alternative from a choice set Cn={1,2,…,i,…,Jn} with Jn alternatives ● Attributes of alternatives (e.g.Travel time, cost) ● Decision Rule – Dominance, satisfaction, utility etc. 5 Choice: Travel Mode to Work • Decision maker: an individual worker • Choice: whether to drive to work or take the bus to work • Goods: bus, auto • Utility function: U(X) = U(bus, auto) • Consumption bundles: {1,0} (person takes bus) {0,1} (person drives) 6 Consumer Choice • Consumers maximize utility – Choose the alternative that has the maximum utility (and falls within the income constraint) If U(bus) > U(auto) �choose bus If U(bus) < U(auto) �choose auto U(bus)=? U(auto)=? 7 Constructing the Utility Function ● U(bus) = U(walk time, in-vehicle time, fare, …) U(auto) = U(travel time, parking cost, …) ● Assume linear (in the parameters) β × β × U(bus) = 1 (walk time) + 2 (in-vehicle time) + … ● Parameters represent tastes, which may vary over people. – Include socio-economic characteristics (e.g., age, gender, income) β × β × – U(bus) = 1 (walk time) + 2 (in-vehicle time) + β × 3 (cost/income) + … 8 Deterministic Binary Choice ● If U(bus) - U(auto) > 0 , Probability(bus) = 1 If U(bus) - U(auto) < 0 , Probability(bus) = 0 P(bus) 1 0 0 U(bus)-U(auto) 9 Probabilistic Choice ● Random utility model Ui = V(attributes of i; parameters) + epsiloni ● What is in the epsilon? Analysts’ imperfect knowledge: – Unobserved attributes – Unobserved taste variations – Measurement errors – Use of proxy variables β × β × ● U(bus) = 1 (walk time) + 2 (in-vehicle time + β × 3 (cost/income) + … + epsilon_bus 10 Probabilistic Binary Choice P(bus) 1 0 0 V(bus)-V(auto) 11 A Simple Example: Route Choice – Sample size: N = 600 – Alternatives: Tolled, Free – Income: Low, Medium, High Route Income choice Low (k=1) Medium (k=2) High (k=3) Tolled (i=1) 10 100 90 200 Free (i=2) 140 200 60 400 150 300 150 600 12 A Simple Example: Route Choice Probabilities ● (Marginal) probability of choosing toll road P(i = 1) Pˆ(i = 1) = 200 / 600 = 1/3 ● (Joint) probability of choosing toll road and having medium income: P(i=1, k=2) Pˆ(i = 1, k = 2) = 100 / 600 = 1/6 2 3 P(i , k ) = 1 ∑∑ i=1 k =1 13 Conditional Probability P(i|k) P(i, k) = P(i) ⋅ P(k | i) = P(k) ⋅ P(i | k) = Independence P(i ) ∑ P(i, k ) k = P(k ) = ∑ P(i, k ) P(i | k) P(i) i P(i, k ) P(k | i) = P(k) P( k | i ) = , P(i ) ≠ 0 P(i ) P(i, k ) P(i | k ) = , P(k ) ≠ 0 P(k ) 14 Model : P(i|k) ● Behavioral Model~ Probability (Route Choice|Income) = P(i|k) ● Unknown parameters = = = π P(i 1| k 1) 1 = = = π P(i 1| k 2) 2 = = = π P(i 1| k 3) 3 15 Example: Model Estimation ● Estimation y c Sampling n e π = 1 π = 1 π = 3 u distribution ˆ , ˆ , ˆ q 1 15 2 3 3 5 e r f = 0.067 = 0.333 = 0.6 N=600 πˆ ⋅ ( −πˆ ) ⋅ − = 1 1 1 = 1/15 (1 1/15) = s1 0.020 N 1 150 1/3 π ˆ 2 πˆ ⋅ ( −πˆ ) ⋅ − = 2 1 2 = 1/ 3 (1 1/ 3) = s2 0.027 Standard errors N2 300 πˆ ⋅ ( −πˆ ) ⋅ − = 3 1 3 = 3 / 5 (1 3 / 5) = s3 0.040 N3 150 16 Example: Forecasting ● Toll Road share under existing income distribution: 33% ● New income distribution Route Income cho ic e Low (k=1) Medium (k=2) High (k=3) Tolled (i=1) 1/15*45=3 1/3*300=100 3/5*255=153 256 43% F ree (i =2) 42 200 102 344 57% New income 45 300 255 600 distribution Existing income 150 300 150 600 di strib uti on ● Toll road share: 33%�43% 17 The Random Utility Model ● Decision rule: Utility maximization – Decision maker n selects the alternative i with the highest utility Uin among Jn alternatives in the choice set Cn. ε Uin = Vin + in Vin =Systematic utility : function of observable variables ε in =Random utility 18 The Random Utility Model ● Choice probability: ≥ ∀ ∈ P(i|C n) = P(Uin Ujn, j Cn) ≥ ∀ ∈ = P(Uin - Ujn 0, j Cn) ∀ ∈ = P(Uin = maxj Ujn, j Cn) ● For binary choice: ≥ Pn(1) = P(U1n U2n) ≥ = P(U1n – U2n 0) 19 The Random Utility Model Routes Attributes Utility Travel time (t) Travel cost (c) (utils) Tolled (i=1) t1 c1 U1 Free (i=2) t2 c2 U2 = −β − β + ε U1 1t1 2c1 1 = −β − β + ε U 2 1t2 2c2 2 β β > 1, 2 0 20 The Random Utility Model ● Ordinal utility - Decisions are based on utility differences - Unique up to order preserving transformation = −β − β + ε + λ U1 ( 1t1 2c1 1 K ) = −β − β + ε + λ U 2 ( 1t 2 2c2 2 K) β β λ > 1, 2 , 0 21 The Random Utility Model c1-c2 V1 < V2 Alt. 2 is dominant + + + + + + + β + • + + = − 1 ⋅ − + ε • + + U1 β t1 c1 1 • + + • + 2 V1 > V2 • • + t -t • + • • + 1 2 β = − 1 ⋅ − + ε U β t c β • + 2 2 2 2 2 + + • • • 1 + β • • • + β = 1 = • + • β "value of time" • • • • • 2 • + Alt. 1 is dominant V1 = V2 • Choice = 1 β + Choice = 2 − = − 1 ⋅ − − − + ε − ε U U β (t t ) (c c ) ( ) 1 2 2 1 2 1 2 1 2 22 The Systematic Utility ● Attributes: describing the alternative – Generic vs. Specific • Examples: travel time, travel cost, frequency – Quantitative vs. Qualitative • Examples: comfort, reliability, level of service – Perception – Data availability ● Characteristics: describing the decision-maker – Socio-economic variables • Examples: income,gender,education 23 Random Terms ● Capture imperfectness of information ● Distribution of epsilons ● Variance/covariance structure – Correlation between alternatives – Multidimensional decision • Example: Mode and departure time choice ● Typical models – Logit model (i.i.d. “Extreme Value” error terms, a.k.a Gumbel) – Probit model (normal error terms) 24 Binary Choice ∀ ● Choice set Cn = {1,2} n ≥ Pn(1) = P(1|Cn) = P(U1n U2n) ε ≥ ε = P(V1n + 1n V2n + 2n) ≥ ε ε = P(V1n - V2n 2n - 1n) ≥ ε ≥ ε = P(V1n - V2n n) = P(Vn n) = Fε(Vn) 25 Binary Probit ● “Probit” name comes from Probability Unit ε σ 2 1n ~ N(0, 1 ) ε σ 2 2n ~ N(0, 2 ) ε σ2 σ 2 = σ 2 + σ 2 − σ n ~ N(0, ) where 1 2 2 12 1 ε 2 − 1 σ f (ε ) = e 2 σ 2π 1 ε 2 − V n 1 σ V = = 2 ε = Φ n Pn (1) Fε (Vn ) ∫ e d −∞ σ 2π σ where Φ ( z) is the standardized cumulative normal distribution 26 Binary Probit Normalization ● Relationship between Utility scale µ* and Scale Parameter σ : µ ε Var( * n) = 1 iff µ 2 ε = * var( n ) 1 1 1 ⇒ µ* = = ε σ Var( n ) ● Usual normalization: σ = 1, implying µ*= 1 27 Binary Logit Model ● “Logit” name comes from Logistic Probability Unit −µε ε µ ε = [− 1n ] 1n ~ ExtremeValue (0, ) Fε ( 1n ) exp e −µε ε µ ε = [− 2 n ] 2n ~ ExtremeValue (0, ) Fε ( 2n ) exp e 1 ε ~ Logistic (0,µ) ε = n Fε ( n ) −µε 1+ e n = = 1 Pn (1) Fε (Vn ) −µ 1+ e Vn 28 Why Logit? ● Probit does not have a closed form – the choice probability is an integral. ● The logistic distribution is used because: – It approximates a normal distribution quite well. – It is analytically convenient – Gumbel can also be “justified” as an extreme value distribution ● Logit does have “fatter” tails than a normal distribution. 29 Logit Model Normalization ● Relationship between Utility Scale µ* and Scale Parameter µ µ ε Var( * n) = 1 iff 1 µ* = ε Var( n ) ε ε ε π2 µ2 where Var( n)=Var( 2n - 1n)=2 /6 30 Logit Model Normalization � Usual normalization: µ =1, implying µ*= 3 π � Utility scale different from probit π – Need to multiply probit coefficients by 3 to be comparable to logit coefficients 31 Limiting Cases ● Recall: Pn(1) = P(Vn ≥ εn) = Fε(V1n –V2n) µ = 10 µ V 1n = 1 = e ● With logit, Fε (Vn ) −µ µ µ 1+ e V n e V 1n + e V 2n ● What happens as µ ∞ ? µ = 1 Pn(1) ● What happens as µ 0 ? µ = .1 Vn = V1n –V2n 32 Re-formulation ● Pn(i) = P(Uin ≥ Ujn) 1 = − µ − 1 + e (V in V jn ) µ V in = e µ µ V e V in + e jn ● If Vin and Vjn are linear in their parameters: µβ 'xin = e Pn (i) µβ µβ 'x e 'xin + e jn 33 Multiple Choice ≥ ● Choice set Cn: Jn alternatives, Jn 2 ( ) = + ε ≥ + ε ∀ ∈ P i | Cn P[Vin in Vjn jn , j Cn ] = P[(V + ε ) = max ∈ (V + ε )] in in j Cn jn jn = [ε − ε ≤ − ∀ ∈ ] P jn in Vin Vjn , j Cn 34 Multiple Choice ● Multinomial Logit Model ε – jn independent and identically distributed (i.i.d.) ε µ ∀ – jn ~ ExtremeValue(0, ) j F(ε ) = exp[ − e −µε ], µ > 0 f (ε ) = µe−µε exp[ − e−µε ] – Variance: π2/6µ2 µ Vin ( ) = e P i | Cn µ ∑e V jn ∈ j Cn 35 Multiple Choice – An Example ∀ ● Choice Set Cn = {auto ,bus, walk} n µ Vauto ,n = e P(auto | Cn ) µ µ µ e Vauto ,n + e Vbus ,n + e Vwalk ,n 36 Next Lecture ● Model specification and estimation ● Aggregation and forecasting ● Independence from Irrelevant Alternatives (IIA) property – Motivation for Nested Logit ● Nested Logit – specification, estimation and an example 37 MIT OpenCourseWare http://ocw.mit.edu 1.201J / 11.545J / ESD.210J Transportation Systems Analysis: Demand and Economics Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.