Discrete Choice Analysis I

Moshe Ben-Akiva

1.201 / 11.545 / ESD.210 Transportation Systems Analysis: Demand &

Fall 2008 Outline of 2 Lectures on Discrete Choice

● Introduction ● A Simple Example ● The Random Model ● Specification and Estimation ● Forecasting ● IIA Property ● Nested

2 Outline of this Lecture

● Introduction ● A simple example – route choice ● The Random Utility Model – Systematic utility – Random components ● Derivation of the Probit and Logit models – Binary Probit – Binary Logit – Multinomial Logit

3 Continuous vs. Discrete Goods

Continuous Goods Discrete Goods

auto x2

Indifference curves u3 u2 u1 x 1 bus

4 Discrete Choice Framework

● Decision-Maker – Individual (person/household) – Socio-economic characteristics (e.g. Age, gender,income, vehicle ownership) ● Alternatives – Decision-maker n selects one and only one alternative from a choice set Cn={1,2,…,i,…,Jn} with Jn alternatives ● Attributes of alternatives (e.g.Travel time, cost) ● Decision Rule – Dominance, satisfaction, utility etc.

5 Choice: Travel Mode to Work

• Decision maker: an individual worker • Choice: whether to drive to work or take the bus to work • Goods: bus, auto • Utility function: U(X) = U(bus, auto) • Consumption bundles: {1,0} (person takes bus) {0,1} (person drives)

6

• Consumers maximize utility – Choose the alternative that has the maximum utility (and falls within the income constraint)

If U(bus) > U(auto) �choose bus If U(bus) < U(auto) �choose auto

U(bus)=? U(auto)=?

7 Constructing the Utility Function

● U(bus) = U(walk time, in-vehicle time, fare, …) U(auto) = U(travel time, parking cost, …) ● Assume linear (in the parameters) β × β × U(bus) = 1 (walk time) + 2 (in-vehicle time) + … ● Parameters represent tastes, which may vary over people. – Include socio-economic characteristics (e.g., age, gender, income) β × β × – U(bus) = 1 (walk time) + 2 (in-vehicle time) + β × 3 (cost/income) + …

8 Deterministic Binary Choice

● If U(bus) - U(auto) > 0 , Probability(bus) = 1 If U(bus) - U(auto) < 0 , Probability(bus) = 0

P(bus) 1

0 0 U(bus)-U(auto)

9 Probabilistic Choice

● Random utility model

Ui = V(attributes of i; parameters) + epsiloni ● What is in the epsilon? Analysts’ imperfect knowledge: – Unobserved attributes – Unobserved taste variations – Measurement errors – Use of proxy variables β × β × ● U(bus) = 1 (walk time) + 2 (in-vehicle time + β × 3 (cost/income) + … + epsilon_bus

10 Probabilistic Binary Choice

P(bus)

1

0 0 V(bus)-V(auto)

11 A Simple Example: Route Choice

– Sample size: N = 600 – Alternatives: Tolled, Free – Income: Low, Medium, High

Route Income choice Low (k=1) Medium (k=2) High (k=3) Tolled (i=1) 10 100 90 200 Free (i=2) 140 200 60 400 150 300 150 600

12 A Simple Example: Route Choice

Probabilities ● (Marginal) probability of choosing toll road P(i = 1) Pˆ(i = 1) = 200 / 600 = 1/3 ● (Joint) probability of choosing toll road and having medium income: P(i=1, k=2) Pˆ(i = 1, k = 2) = 100 / 600 = 1/6

2 3 ∑∑ P(i , k ) = 1 i=1 k =1

13 Conditional Probability P(i|k)

P(i, k) = P(i) ⋅ P(k | i) = P(k) ⋅ P(i | k) = Independence P(i ) ∑ P(i, k ) k = P(k ) = ∑ P(i, k ) P(i | k) P(i) i P(i, k ) P(k | i) = P(k) P( k | i ) = , P(i ) ≠ 0 P(i ) P(i, k ) P(i | k ) = , P(k ) ≠ 0 P(k )

14 Model : P(i|k)

● Behavioral Model~ Probability (Route Choice|Income) = P(i|k) ● Unknown parameters = = = π P(i 1| k 1) 1 = = = π P(i 1| k 2) 2 = = = π P(i 1| k 3) 3

15 Example: Model Estimation

● Estimation y c Sampling n e

π = 1 π = 1 π = 3 u distribution ˆ , ˆ , ˆ q

1 15 2 3 3 5 e r f = 0.067 = 0.333 = 0.6 N=600

πˆ ⋅ ( −πˆ ) ⋅ − = 1 1 1 = 1/15 (1 1/15) = s1 0.020 N 1 150 1/3 π ˆ 2 πˆ ⋅ ( −πˆ ) ⋅ − = 2 1 2 = 1/ 3 (1 1/ 3) = s2 0.027 Standard errors N2 300 πˆ ⋅ ( −πˆ ) ⋅ − = 3 1 3 = 3 / 5 (1 3 / 5) = s3 0.040 N3 150

16 Example: Forecasting

● Toll Road share under existing income distribution: 33% ● New income distribution Route Income cho ic e Low (k=1) Medium (k=2) High (k=3) Tolled (i=1) 1/15*45=3 1/3*300=100 3/5*255=153 256 43% F ree (i =2) 42 200 102 344 57% New income 45 300 255 600 distribution Existing income 150 300 150 600 di strib uti on ● Toll road share: 33%�43%

17 The Random Utility Model

● Decision rule: Utility maximization – Decision maker n selects the alternative i with the highest

utility Uin among Jn alternatives in the choice set Cn. ε Uin = Vin + in

Vin =Systematic utility : function of observable variables ε in =Random utility

18 The Random Utility Model

● Choice probability: ≥ ∀ ∈ P(i|C n) = P(Uin Ujn, j Cn) ≥ ∀ ∈ = P(Uin - Ujn 0, j Cn) ∀ ∈ = P(Uin = maxj Ujn, j Cn) ● For binary choice: ≥ Pn(1) = P(U1n U2n) ≥ = P(U1n – U2n 0)

19 The Random Utility Model

Routes Attributes Utility Travel time (t) Travel cost (c) (utils)

Tolled (i=1) t1 c1 U1

Free (i=2) t2 c2 U2 = −β − β + ε U1 1t1 2c1 1 = −β − β + ε U 2 1t2 2c2 2 β β > 1, 2 0

20 The Random Utility Model

● Ordinal utility - Decisions are based on utility differences - Unique up to order preserving transformation

= −β − β + ε + λ U1 ( 1t1 2c1 1 K ) = −β − β + ε + λ U 2 ( 1t 2 2c2 2 K) β β λ > 1, 2 , 0

21 The Random Utility Model

c1-c2

V1 < V2 Alt. 2 is dominant + + + + + + + β + • + + = − 1 ⋅ − + ε • + + U1 β t1 c1 1 • + + • + 2 V1 > V2 • • + t -t • + • • + 1 2 β = − 1 ⋅ − + ε U β t c β • + 2 2 2 2 2 + + • • • 1 + β • • • + β = 1 = • + • β "value of time" • • • • • 2 • + Alt. 1 is dominant

V1 = V2 • Choice = 1 β + Choice = 2 − = − 1 ⋅ − − − + ε − ε U U β (t t ) (c c ) ( ) 1 2 2 1 2 1 2 1 2

22 The Systematic Utility

● Attributes: describing the alternative – Generic vs. Specific • Examples: travel time, travel cost, frequency – Quantitative vs. Qualitative • Examples: comfort, reliability, level of service – Perception – Data availability ● Characteristics: describing the decision-maker – Socio-economic variables • Examples: income,gender,education

23 Random Terms

● Capture imperfectness of information ● Distribution of epsilons ● Variance/covariance structure – Correlation between alternatives – Multidimensional decision • Example: Mode and departure time choice ● Typical models – Logit model (i.i.d. “Extreme Value” error terms, a.k.a Gumbel) – (normal error terms)

24 Binary Choice

∀ ● Choice set Cn = {1,2} n ≥ Pn(1) = P(1|Cn) = P(U1n U2n) ε ≥ ε = P(V1n + 1n V2n + 2n) ≥ ε ε = P(V1n - V2n 2n - 1n) ≥ ε ≥ ε = P(V1n - V2n n) = P(Vn n) = Fε(Vn)

25 Binary Probit

● “Probit” name comes from Probability Unit ε σ 2 1n ~ N(0, 1 ) ε σ 2 2n ~ N(0, 2 ) ε σ2 σ 2 = σ 2 + σ 2 − σ n ~ N(0, ) where 1 2 2 12

1 ε 2 −   1 σ f (ε ) = e 2   σ 2π 1  ε 2 −   V n 1 σ V  = = 2   ε = Φ n Pn (1) Fε (Vn ) ∫ e d   −∞ σ 2π  σ  where Φ (z )i s the standardized cumulative

26 Binary Probit Normalization

● Relationship between Utility scale µ* and Scale Parameter σ : µ ε Var( * n) = 1 iff µ 2 ε = * var( n ) 1 1 1 ⇒ µ* = = ε σ Var( n ) ● Usual normalization: σ = 1, implying µ*= 1

27 Binary Logit Model

● “Logit” name comes from Logistic Probability Unit −µε ε µ ε = [− 1n ] 1n ~ ExtremeValue (0, ) Fε ( 1n ) exp e −µε ε µ ε = [− 2 n ] 2n ~ ExtremeValue (0, ) Fε ( 2n ) exp e 1 ε ~ Logistic (0,µ) ε = n Fε ( n ) −µε 1+ e n

= = 1 Pn (1) Fε (Vn ) −µ 1+ e Vn

28 Why Logit?

● Probit does not have a closed form – the choice probability is an integral. ● The is used because: – It approximates a normal distribution quite well. – It is analytically convenient – Gumbel can also be “justified” as an extreme value distribution ● Logit does have “fatter” tails than a normal distribution.

29 Logit Model Normalization

● Relationship between Utility Scale µ* and Scale Parameter µ µ ε Var( * n) = 1 iff 1 µ* = ε Var( n ) ε ε ε π2 µ2 where Var( n)=Var( 2n - 1n)=2 /6

30 Logit Model Normalization

• Usual normalization: µ =1, implying µ*= 3 π

• Utility scale different from probit π – Need to multiply probit coefficients by 3 to be comparable to logit coefficients

31 Limiting Cases

● Recall: Pn(1) = P(Vn ≥ εn)

= Fε(V1n –V2n) µ = 10

µ V 1n = 1 = e ● With logit, Fε (Vn ) −µ µ µ 1+ e V n e V 1n + e V 2n

● What happens as  ∞ ? µ = 1 Pn(1)

● What happens as  0 ?

µ = .1

Vn = V1n –V2n

32 Re-formulation

● Pn(i) = P(Uin ≥ Ujn) 1 = − µ − 1 + e (V in V jn ) µ V in = e µ µ V e V in + e jn

● If Vin and Vjn are linear in their parameters: µβ 'xin = e Pn (i) µβ µβ 'x e 'xin + e jn

33 Multiple Choice

≥ ● Choice set Cn: Jn alternatives, Jn 2 ( )= + ε ≥ + ε ∀ ∈ P i | Cn P[Vin in Vjn jn , j Cn ]

= P[(V + ε )= max ∈ (V + ε )] in in j Cn jn jn = [ε − ε ≤ − ∀ ∈ ] P jn in Vin Vjn , j Cn

34 Multiple Choice

● Multinomial Logit Model ε – jn independent and identically distributed (i.i.d.) ε µ ∀ – jn ~ ExtremeValue(0, ) j F(ε ) = exp[ − e −µε ], µ > 0

f (ε ) = µe−µε exp[ − e−µε ] – Variance: π2/6µ2 µ Vin ( )= e P i | Cn µ ∑e V jn ∈ j Cn

35 Multiple Choice – An Example

∀ ● Choice Set Cn = {auto ,bus, walk} n

µ Vauto ,n = e P(auto | Cn ) µ µ µ e Vauto ,n + e Vbus ,n + e Vwalk ,n

36 Next Lecture

● Model specification and estimation ● Aggregation and forecasting ● Independence from Irrelevant Alternatives (IIA) property – Motivation for Nested Logit ● Nested Logit – specification, estimation and an example

37 MIT OpenCourseWare http://ocw.mit.edu

1.201J / 11.545J / ESD.210J Transportation Systems Analysis: Demand and Economics Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.