Single Agent Dynamics: Dynamic Discrete Choice Models Part I: Theory and Solution Methods
Günter J. Hitsch The University of Chicago Booth School of Business
2013
Overview: Goals of this lecture
1. Discuss the structure and theory of dynamic discrete choice models
2. Survey numerical solution methods
3. Discuss goals of statistical inference: reduced vs. structural form of the model
4. Model estimation
  ▶ Nested fixed point (NFP) estimators
  ▶ MPEC approach
  ▶ Heterogeneity and unobserved state variables
Overview
Review: Infinite Horizon, Discounted Markov Decision Processes
The Storable Goods Demand Problem
Dynamic Discrete Choice Models
The Storable Goods Demand Problem (Cont.)
The Durable Goods Adoption Problem
Numerical Solution of a Dynamic Decision Process
  Approximation and Interpolation
  Numerical Integration
The Durable Goods Adoption Problem (Cont.)
References
Review: Infinite Horizon, Discounted Markov Decision Processes
Elements of a Markov Decision Process
▶ Time: $t = 0, 1, \dots$ (infinite horizon)
▶ A set of states, $s \in S$
  ▶ $S$ is the state space
▶ A set of actions, $a \in \mathcal{A}(s)$
  ▶ The set of feasible actions may depend on the current state, although this is more general than what we typically need in dynamic discrete choice
Elements of a Markov Decision Process
▶ A decision maker chooses some action $a \in \mathcal{A}(s)$ in period $t$, given the current state $s$, and then receives the utility $U(s, a)$
▶ The state evolves according to a probability distribution $p(\cdot \mid s, a)$
  ▶ If $S$ is discrete, then $\Pr\{s_{t+1} = j \mid s_t, a_t\} = p(j \mid s_t, a_t)$ for $j \in S$
  ▶ If $S$ is continuous, then $p(\cdot \mid s_t, a_t)$ is the conditional density of $s_{t+1}$
Elements of a Markov Decision Process
▶ The decision process is called “Markov” because the utility function and transition probabilities depend only on the current state and actions, not on the whole history of the process
▶ Utilities, $U(s, a)$, and transition probabilities, $p(\cdot \mid s, a)$, are stationary, i.e., they do not depend on the time period $t$ when an action is taken
▶ Note: this does not imply that the time series process $\{s_t\}_{t=0}^{\infty}$ is stationary; conversely, stationarity of $\{s_t\}_{t=0}^{\infty}$ is not required in this class of models
▶ Example: A decision maker gets tired of a product as time progresses. Could this be captured by a Markov Decision Process? Answer: yes
  ▶ $U_t(s, a) = -t$ is not allowed
  ▶ Instead, let $S = \{0, 1, 2, \dots\}$, and for any $s \in S$ let $p(s + 1 \mid s, a) = 1$, i.e., $s_{t+1} = s_t + 1$. Then define
$$U(s, a) = -s$$
Actions, decision rules, and policies
▶ We examine decision problems where the decision maker uses current (and possibly past) information to choose actions in order to maximize some overall objective function
▶ Decision rules: $d_t : S \to \bigcup_{s \in S} \mathcal{A}(s)$, with $d_t(s) \in \mathcal{A}(s)$
  ▶ Decision rules associate a current state with a feasible action
▶ Policies, strategies, or plans: $\pi = (d_0, d_1, \dots) = (d_t)_{t=0}^{\infty}$
▶ A policy is a sequence of decision rules: a plan for how to choose actions at each point in time, given the state at that point in time, $s_t$
Expected total discounted reward
▶ A given policy $\pi = (d_t)_{t=0}^{\infty}$ induces a probability distribution on the sequence of states and actions, $(s_t, a_t)_{t=0}^{\infty}$
▶ For a given policy $\pi$, the expected total discounted reward (utility) is given by
$$v^{\pi}(s_0) = E^{\pi}\left[\sum_{t=0}^{\infty} \beta^t\, U(s_t, d_t(s_t)) \,\Big|\, s_0\right]$$
▶ Future utilities are discounted with the discount factor $0 \le \beta < 1$ (in finite horizon problems we could allow for $\beta = 1$)
Expected total discounted reward
▶ We will assume that $v^{\pi}$ is well-defined
▶ One necessary condition if the state space is discrete: the series inside the expectation converges
  ▶ Ensured if $U$ is bounded, because $0 \le \beta < 1$
  ▶ Note, however, that boundedness is not a necessary condition for convergence
▶ If the state space is continuous, certain technical conditions need to hold for the expectation to exist (see Stokey and Lucas 1989)
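To make the object $v^{\pi}$ concrete, here is a minimal Monte Carlo sketch for a finite state and action space. The array names (U for utilities $U(s,a)$, P for transition probabilities, d for a stationary decision rule) are illustrative assumptions, not notation from the lecture; the infinite sum is truncated at $T$ periods, which is harmless when $U$ is bounded because the tail is of order $\beta^T$.

```python
import numpy as np

def simulate_v_pi(U, P, d, beta, s0, T=500, n_paths=5000, seed=0):
    """Monte Carlo estimate of v^pi(s0) for a stationary policy pi = (d, d, ...).

    U : (S, A) array of utilities U(s, a)
    P : (A, S, S) array, P[a, s, s2] = p(s2 | s, a)
    d : length-S integer array, d[s] = action chosen in state s
    """
    rng = np.random.default_rng(seed)
    S = U.shape[0]
    total = 0.0
    for _ in range(n_paths):
        s, v, disc = s0, 0.0, 1.0
        for _ in range(T):
            a = d[s]
            v += disc * U[s, a]            # discounted flow utility
            disc *= beta
            s = rng.choice(S, p=P[a, s])   # draw s_{t+1} ~ p(. | s_t, a_t)
        total += v
    return total / n_paths
```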
Optimal policies
▶ Let $\Pi$ be the set of all policies. $\pi^* \in \Pi$ is optimal if
$$v^{\pi^*}(s) \ge v^{\pi}(s), \quad s \in S$$
for all $\pi \in \Pi$
▶ The value of the Markov decision problem is given by
$$v^*(s) = \sup_{\pi \in \Pi} \{v^{\pi}(s)\}$$
▶ We call $v^*$ the (optimal) value function
▶ An optimal policy $\pi^*$ is a policy such that
$$v^{\pi^*}(s) = v^*(s) = \max_{\pi \in \Pi} \{v^{\pi}(s)\}$$
Bellman Equation
▶ Intuition:
  ▶ Due to the stationarity of the utilities and transition probabilities, and given that there is always an infinite decision horizon, the (optimal) value function should only depend on the current state, not on $t$
  ▶ Hence, for any current state $s$, the expected total discounted reward under optimal decision making should satisfy the relationship
$$v^*(s) = \sup_{a \in \mathcal{A}(s)} \left\{ U(s, a) + \beta \int p(s' \mid s, a)\, v^*(s')\, ds' \right\}$$
Bellman Equation
▶ The equation above is called a Bellman equation
▶ Note that the Bellman equation is a functional equation, with potential solutions of the form $v : S \to \mathbb{R}$
▶ More formally, let $\mathcal{B} = \{f : S \to \mathbb{R} : \|f\| < \infty\}$ be the metric space of real-valued, bounded functions on $S$, where $\|f\| = \sup_{s \in S} |f(s)|$ is the supremum norm. Define the operator $L : \mathcal{B} \to \mathcal{B}$,
$$Lv(s) \equiv \sup_{a \in \mathcal{A}(s)} \left\{ U(s, a) + \beta \int p(s' \mid s, a)\, v(s')\, ds' \right\}$$
▶ A solution of the Bellman equation is a fixed point of $L$, $v = Lv$
Questions to be addressed
1. Is a solution of the Bellman equation the optimal value function?
2. Is there a solution of the Bellman equation, and how many solutions exist in general?
The main theorem
Theorem
If $v$ is a solution of the Bellman equation, $v = Lv$, then $v = v^*$. Furthermore, under the model assumptions stated above a (unique) solution of the Bellman equation always exists.

▶ The proof of this theorem is based on the property that $L$ is a contraction mapping. According to the Banach fixed-point theorem, a contraction mapping (defined on a complete metric space) has a unique fixed point. The contraction mapping property crucially depends on the additive separability of the total discounted reward function, and in particular on discounting, $\beta < 1$
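Because $L$ is a contraction, successive approximations $v_{n+1} = L v_n$ converge to the unique fixed point from any starting guess. A minimal sketch for a finite state and action space, in the same hypothetical array notation as above:

```python
import numpy as np

def value_iteration(U, P, beta, tol=1e-10, max_iter=10_000):
    """Compute v* by iterating the Bellman operator L until approximate convergence.

    U : (S, A) array of utilities U(s, a)
    P : (A, S, S) array of transition probabilities p(s' | s, a)
    """
    S, A = U.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # Lv(s) = max_a { U(s, a) + beta * sum_{s'} p(s' | s, a) v(s') }
        Q = U + beta * (P @ v).T       # (S, A) array of action values
        v_new = Q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v
```

The sup-norm error shrinks geometrically at rate $\beta$, which is exactly the contraction modulus.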
Optimal policies
▶ Assume that the supremum in the Bellman equation can be attained, and define the decision rule
$$d^*(s) \in \arg\max_{a \in \mathcal{A}(s)} \left\{ U(s, a) + \beta \int p(s' \mid s, a)\, v^*(s')\, ds' \right\}$$
▶ Let $\pi^* = (d^*, d^*, \dots)$ be a corresponding policy

Theorem
Let $\pi^*$ be as defined above. Then $\pi^*$ is an optimal policy, such that $v^{\pi^*} = v^*$
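Given $v^*$, the theorem amounts to reading off a stationary optimal policy as the per-state maximizer; a sketch continuing the hypothetical array notation used above:

```python
def optimal_decision_rule(U, P, beta, v_star):
    """d*(s) = argmax_a { U(s, a) + beta * sum_{s'} p(s' | s, a) v*(s') }."""
    Q = U + beta * (P @ v_star).T      # (S, A) action values evaluated at v*
    return Q.argmax(axis=1)            # one action per state: a stationary rule
```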
Optimal policies
▶ Note that the theorem on the previous slide shows us how to find an optimal policy
▶ In particular, the theorem shows that if the supremum in the Bellman equation can be attained, then there is a stationary optimal policy
Optimal Policies: Notes
▶ A sufficient condition for the existence of an optimal policy:
  ▶ $\mathcal{A}(s)$ is finite
  ▶ Satisfied (by definition) in dynamic discrete choice problems
▶ Decision rules that depend on the past history of states and actions do not improve the total expected reward
  ▶ Hence, basing decisions only on current, not past, information is not restrictive in Markov decision processes
The Storable Goods Demand Problem
Kellogg’s Raisin Bran, 20 oz pack size
[Figure: two time-series panels, 2008–2012. Top panel: price of the 20 oz pack, ranging roughly from $2.00 to $4.00; bottom panel: log (base 10) units sold.]
Overview: Sales promotions
▶ Many CPG products are sold both at regular (base) prices and at promotional prices
▶ We often observe sales spikes during a promotion
  ▶ Could be due to brand switching or increased consumption
  ▶ Could be due to stockpiling if the good is storable
▶ Develop a simple model of stockpiling: storable goods demand
▶ Goal: predict how demand is affected by the distribution of prices at which the storable good can be bought
Model outline
▶ At the beginning of each period a consumer goes to a store and chooses among $K$ different pack sizes of a storable good
  ▶ Pack size $k$ contains $n_k$ units of the product ($n_k \in \{1, 2, \dots\}$)
  ▶ No purchase: $n_0 = 0$
▶ The choice in period $t$ is $a_t \in \mathcal{A} = \{0, 1, \dots, K\}$
▶ $i_t$ is the number of units in the consumer's inventory at the beginning of period $t$
▶ After the shopping trip, if $a_t = k$ the consumer has $i_t + n_k$ units of the product at her disposal
▶ The consumer has a consumption need of one unit in each period
  ▶ Will consume one unit if $i_t + n_k \ge 1$, and will not consume otherwise
Evolution of inventory
▶ The consumer can store at most $I$ units
▶ Conditional on $i_t$ and $a_t = k$, inventory evolves as
$$i_{t+1} = \begin{cases} 0 & \text{if } i_t + n_k = 0 \\ i_t + n_k - 1 & \text{if } 1 \le i_t + n_k \le I \\ I & \text{if } i_t + n_k > I \end{cases}$$
▶ Note: units in excess of $I + 1$ can neither be consumed nor stored
▶ The evolution of inventory is deterministic, and we can write $i_{t+1} = \psi(i_t, a_t)$
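A minimal sketch of this deterministic transition $\psi$; the function and argument names are illustrative, not from the lecture:

```python
def next_inventory(i_t, k, n, I_max):
    """Deterministic inventory transition i_{t+1} = psi(i_t, a_t = k).

    i_t   : units on hand at the start of the period
    k     : chosen pack size index, with n[0] = 0 for no purchase
    n     : list of units per pack size, n[k]
    I_max : storage capacity I
    """
    holdings = i_t + n[k]              # units available after the shopping trip
    if holdings == 0:
        return 0                       # nothing to consume, nothing to store
    return min(holdings - 1, I_max)    # consume one unit, store at most I units
```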
Utilities and disutilities
▶ The purchase utility (rather: disutility) from buying pack size $k$ is $-\alpha P_{kt} - \tau(n_k)$
  ▶ $P_{kt}$ is the price of pack size $k$ and $\alpha$ is the price sensitivity parameter
  ▶ $\tau(n_k)$ is the hassle cost of purchasing (or transporting) pack size $k$
  ▶ Outside option: $P_{0t} = 0$, $\tau(n_0) = \tau(0) = 0$
▶ The consumption utility, given that there is at least one unit to consume ($i_t + n_k \ge 1$), is $\gamma$, and $0$ otherwise
▶ Inventory holding cost: $c(i_{t+1})$
  ▶ $c(0) = 0$
Notation
▶ Define $x_t \equiv (i_t, P_t)$, where $P_t = (P_{1t}, \dots, P_{Kt})$
▶ Define utility as a function of $x_t$ and of the chosen pack size $a_t = k$:
$$u_k(x_t) = \begin{cases} -\alpha P_{kt} - \tau(n_k) + \gamma - c(i_{t+1}) & \text{if } k \ne 0 \\ \gamma - c(i_{t+1}) & \text{if } k = 0 \text{ and } i_t \ge 1 \\ 0 & \text{if } k = 0 \text{ and } i_t = 0 \end{cases}$$
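A sketch of this flow utility, reusing next_inventory from above; the parameter names alpha, tau, gamma, and c are illustrative stand-ins for $\alpha$, $\tau(\cdot)$, $\gamma$, and $c(\cdot)$:

```python
def flow_utility(i_t, P_t, k, n, I_max, alpha, tau, gamma, c):
    """Flow utility u_k(x_t) for x_t = (i_t, P_t) and choice a_t = k.

    P_t   : prices by pack size, with P_t[0] = 0 (outside option)
    tau   : hassle cost function tau(n_k), with tau(0) = 0
    gamma : per-period consumption utility
    c     : inventory holding cost function, with c(0) = 0
    """
    i_next = next_inventory(i_t, k, n, I_max)   # deterministic transition psi
    if i_t + n[k] == 0:
        return 0.0                              # no purchase and nothing to consume
    return -alpha * P_t[k] - tau(n[k]) + gamma - c(i_next)
```

For $k \ne 0$ the purchase and consumption terms are both active; for $k = 0$ the price and hassle terms vanish by construction, reproducing the three cases above.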
Distribution of prices
▶ The price vectors $P_t$ are drawn from a discrete distribution
  ▶ $P_t$ can take one of $L$ values $P^{(1)}, \dots, P^{(L)}$, with probability $\Pr\{P_t = P^{(l)}\} = \pi_l$
  ▶ In this specification prices are i.i.d. over time
▶ Generalization: prices could be drawn from a Markov process where current prices depend on past price realizations
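A small sketch contrasting the i.i.d. specification with the Markov generalization; the array names (prob for the $\pi_l$, Pi for a hypothetical $L \times L$ price transition matrix) are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def draw_price_iid(prob):
    """Draw the index l of next period's price vector P^(l), i.i.d. with Pr = pi_l."""
    return rng.choice(len(prob), p=prob)

def draw_price_markov(l_t, Pi):
    """Markov generalization: row l_t of the L x L matrix Pi gives the
    distribution of l_{t+1} conditional on the current price state l_t."""
    return rng.choice(Pi.shape[1], p=Pi[l_t])
```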
State vector and utility function
▶ A random utility term enters the consumer's preferences over each pack size choice
  ▶ $\epsilon_t = (\epsilon_{0t}, \dots, \epsilon_{Kt})$, where $\epsilon_{kt}$ is i.i.d. Type I Extreme Value distributed
  ▶ $g(\epsilon)$ is the pdf of $\epsilon_t$
▶ The state of this decision process is $s_t = (x_t, \epsilon_t)$
▶ The (full) utility function of the consumer:
$$U(s_t, a_t = k) = u_k(x_t) + \epsilon_{kt}$$
State evolution
▶ The evolution of $x_t$ is given by the pmf $f$:
$$f(i, P^{(l)} \mid x_t, a_t) \equiv \Pr\{i_{t+1} = i,\, P_{t+1} = P^{(l)} \mid x_t, a_t\} = \mathbb{I}\{i = \psi(i_t, a_t)\} \cdot \pi_l$$
▶ Defined for $0 \le i \le I$, $1 \le l \le L$
▶ Then the evolution of the state, $s = (x, \epsilon)$, is given by
$$p(s_{t+1} \mid s_t, a_t) = f(x_{t+1} \mid x_t, a_t) \cdot g(\epsilon_{t+1})$$
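On the discretized state space $x = (i, l)$ the pmf $f$ can be stored as an array; a sketch reusing next_inventory, where flattening $(i, l)$ into a single index is an implementation choice, not part of the model:

```python
import numpy as np

def build_transition(I_max, prob, n, K):
    """Transition pmf f over discrete states x = (i, l), i = 0..I, l = 0..L-1.

    prob : length-L array of price probabilities pi_l (prices i.i.d. over time)
    Returns F with F[k, s, s2] = I{i2 = psi(i, k)} * pi_{l2}.
    """
    L = len(prob)
    S = (I_max + 1) * L                       # number of discrete states x
    F = np.zeros((K + 1, S, S))
    for k in range(K + 1):
        for i in range(I_max + 1):
            i_next = next_inventory(i, k, n, I_max)   # deterministic inventory part
            for l in range(L):
                s = i * L + l                 # flatten (i, l) into one index
                for l_next in range(L):
                    F[k, s, i_next * L + l_next] = prob[l_next]
    return F
```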
Discussion
▶ The storable goods demand model has a specific structure due to the way the random utility terms $\epsilon_{kt}$ enter the problem:
  ▶ The state has two separate components, $x_t$ and $\epsilon_t$
  ▶ The utility function has the additive form
$$U(s_t, a_t = k) = u_k(x_t) + \epsilon_{kt}$$
  ▶ The transition probability factors into two separate components:
$$p(s_{t+1} \mid s_t, a_t) = f(x_{t+1} \mid x_t, a_t) \cdot g(\epsilon_{t+1})$$
▶ In the next section we will discuss Markov decision processes with this special structure
Dynamic Discrete Choice Models
Structure of the decision problem
▶ A decision maker (typically a consumer or household) chooses among $K + 1$ alternatives (goods or services), $a \in \mathcal{A} = \{0, 1, \dots, K\}$, in each period $t = 0, 1, \dots$
  ▶ Alternative 0 typically denotes the no-purchase option
▶ The state vector, $s \in S$, has the following structure:
$$s = (x, \epsilon), \quad \text{where } x \in X,\ \epsilon \in \mathbb{R}^{K+1}$$
▶ The transition probability is $p(s_{t+1} \mid s_t, a_t)$
Main assumptions
1. Additive separability. The utility from choosing action $a_t = j$ in state $s_t = (x_t, \epsilon_t)$ has the following additive structure:
$$U(s_t, a_t) = u_j(x_t) + \epsilon_{jt}$$
2. Conditional independence. Given $x_t$ and $a_t$, current realizations of $\epsilon_t$ do not influence the realizations of future states $x_{t+1}$. Hence, the transition probability of $x$ can be written as $f(x_{t+1} \mid x_t, a_t)$
3. i.i.d. $\epsilon_{jt}$. $\epsilon_{jt}$ is i.i.d. across actions and time periods.
  ▶ $g(\epsilon)$ is the pdf of $\epsilon = (\epsilon_0, \dots, \epsilon_K)$, and we typically assume that the support of $\epsilon_j$ is $\mathbb{R}$
  ▶ We could allow for $g(\epsilon \mid x)$, although this more general specification is rarely used in practice
Discussion of model structure
▶ While $x$ can contain both observed and unobserved components, $\epsilon$ is assumed to be unobservable to the researcher
  ▶ $\epsilon_j$ is an unobserved state variable
▶ Additive separability and the assumptions on $\epsilon_j$ will allow us to rationalize any observed choice, and thus ensure that the likelihood function is positive on the whole parameter space
▶ Given the conditional independence and i.i.d. assumptions, the transition probability can be written as follows:
$$p(s_{t+1} \mid s_t, a_t) = f(x_{t+1} \mid x_t, a_t) \cdot g(\epsilon_{t+1})$$
Decision rules and rewards
▶ In each period, the decision maker chooses an action according to a decision rule $d : X \times \mathbb{R}^{K+1} \to \mathcal{A}$, $a_t = d(x_t, \epsilon_t)$
▶ The expected present discounted value of utilities under the policy $\pi = (d, d, \dots)$ is
$$v^{\pi}(x_0, \epsilon_0) = E\left[\sum_{t=0}^{\infty} \beta^t\, U(x_t, \epsilon_t, d(x_t, \epsilon_t)) \,\Big|\, x_0, \epsilon_0\right]$$
Optimal behavior and the Bellman Equation
▶ Given that $\mathcal{A}$ is finite, there is an optimal decision rule $d^*(x, \epsilon)$
▶ The associated value function satisfies
$$\begin{aligned} v(x, \epsilon) &= \max_{j \in \mathcal{A}} \left\{ u_j(x) + \epsilon_j + \beta E\left[v(x', \epsilon') \mid x, a = j\right] \right\} \\ &= \max_{j \in \mathcal{A}} \left\{ u_j(x) + \epsilon_j + \beta \int v(x', \epsilon')\, f(x' \mid x, j)\, g(\epsilon')\, dx'\, d\epsilon' \right\} \end{aligned}$$
The expected value function
▶ Goal:
  ▶ Characterize the optimality condition as a function of $x$ only
  ▶ $\Rightarrow$ faster computation of the model solution
▶ Define the expected or integrated value function,
$$w(x) = \int v(x, \epsilon)\, g(\epsilon)\, d\epsilon$$
▶ $w(x)$ is the value that the decision maker expects to receive before the unobserved states, $\epsilon$, are realized
▶ Using the definition of $w$, we can re-write the value function:
$$v(x, \epsilon) = \max_{j \in \mathcal{A}} \left\{ u_j(x) + \epsilon_j + \beta \int w(x')\, f(x' \mid x, j)\, dx' \right\}$$
The expected value function
▶ Taking the expectation with respect to $\epsilon$ on both sides of the equation on the last slide, we obtain the integrated Bellman equation
$$w(x) = \int \max_{j \in \mathcal{A}} \left\{ u_j(x) + \epsilon_j + \beta \int w(x')\, f(x' \mid x, j)\, dx' \right\} g(\epsilon)\, d\epsilon$$
▶ The right-hand side of this equation defines a contraction mapping, $\Gamma : \mathcal{B} \to \mathcal{B}$, on the space of bounded functions, $\mathcal{B}$
▶ Thus, the equation has a unique solution, which must equal the expected value function, $w = \Gamma(w)$
▶ Note the role of the conditional independence and i.i.d. $\epsilon$ assumptions in the derivation of this equation
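When the $\epsilon_j$ are i.i.d. Type I Extreme Value, as in the storable goods example, the outer integral has the standard closed form $\int \max_j \{v_j(x) + \epsilon_j\}\, g(\epsilon)\, d\epsilon = \gamma_E + \log \sum_j \exp(v_j(x))$, where $\gamma_E \approx 0.5772$ is the Euler–Mascheroni constant. A minimal sketch that iterates $w = \Gamma(w)$ on a discretized $X$, with hypothetical arrays u (flow utilities) and F (transition pmfs, e.g., as built in the storable goods example above):

```python
import numpy as np
from scipy.special import logsumexp

EULER_GAMMA = 0.5772156649015329   # Euler–Mascheroni constant

def solve_w(u, F, beta, tol=1e-10, max_iter=10_000):
    """Solve the integrated Bellman equation w = Gamma(w) by successive approximation.

    u : (S, K+1) array of flow utilities u_j(x) on the discretized X
    F : (K+1, S, S) array of transition pmfs f(x' | x, j)
    """
    w = np.zeros(u.shape[0])
    for _ in range(max_iter):
        # choice-specific values: v_j(x) = u_j(x) + beta * E[w(x') | x, j]
        v_choice = u + beta * (F @ w).T
        # Gamma(w)(x) = E[max_j {v_j(x) + eps_j}] = gamma_E + logsumexp_j v_j(x)
        w_new = EULER_GAMMA + logsumexp(v_choice, axis=1)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w
```

A fixed-point iteration of this kind is the inner-loop computation behind the nested fixed point (NFP) estimators listed in the lecture overview.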
Blackwell’s sufficient conditions for a contraction
Theorem
Let $\mathcal{B} = \{f : S \to \mathbb{R} : \|f\| < \infty\}$ be the metric space of real-valued, bounded functions on $S$. Suppose the operator $L : \mathcal{B} \to \mathcal{B}$ satisfies the following two conditions:
(1) Monotonicity: If $f, g \in \mathcal{B}$ and $f(x) \le g(x)$ for all $x \in S$, then $(Lf)(x) \le (Lg)(x)$ for all $x \in S$.
(2) Discounting: There is a number $\beta \in (0, 1)$ such that $(L(f + a))(x) \le (Lf)(x) + \beta a$ for all $f \in \mathcal{B}$, $a > 0$, and $x \in S$.
Then $L$ is a contraction mapping with modulus $\beta$.
$\Gamma$ is a contraction mapping
Proof.
If $g \le h$, then
$$\int g(x')\, f(x' \mid x, j)\, dx' \le \int h(x')\, f(x' \mid x, j)\, dx'$$
for all $x \in X$ and $j \in \mathcal{A}$. It follows that $\Gamma g \le \Gamma h$. Let $a \ge 0$. Then