Dynamic Discrete Choice Models Overview: Goals of This Lecture
Single Agent Dynamics: Dynamic Discrete Choice Models
Part I: Theory and Solution Methods
Günter J. Hitsch, The University of Chicago Booth School of Business, 2013

Overview: Goals of this lecture
1. Discuss the structure and theory of dynamic discrete choice models
2. Survey numerical solution methods
3. Discuss the goals of statistical inference: reduced vs. structural form of the model
4. Model estimation
   - Nested fixed point (NFP) estimators
   - MPEC approach
   - Heterogeneity and unobserved state variables

Overview
- Review: Infinite Horizon, Discounted Markov Decision Processes
- The Storable Goods Demand Problem
- Dynamic Discrete Choice Models
- The Storable Goods Demand Problem (Cont.)
- The Durable Goods Adoption Problem
- Numerical Solution of a Dynamic Decision Process
- Approximation and Interpolation
- Numerical Integration
- The Durable Goods Adoption Problem (Cont.)
- References

Elements of a Markov Decision Process
- Time: $t = 0, 1, \ldots$ (infinite horizon)
- A set of states, $s \in S$
  - $S$ is the state space
- A set of actions, $a \in A(s)$
  - The set of feasible actions may depend on the current state, although this is more general than what we typically need in dynamic discrete choice
- The decision maker chooses some action $a \in A(s)$ in period $t$, given the current state $s$, and then receives the utility $U(s, a)$
- The state evolves according to a probability distribution $p(\cdot \mid s, a)$
  - If $S$ is discrete, then $\Pr\{s_{t+1} = j \mid s_t, a_t\} = p(j \mid s_t, a_t)$ for $j \in S$
  - If $S$ is continuous, then $p(\cdot \mid s_t, a_t)$ is the conditional density of $s_{t+1}$
- The decision process is called "Markov" because the utility function and transition probabilities depend only on the current state and actions, not on the whole history of the process
- Utilities, $U(s, a)$, and transition probabilities, $p(\cdot \mid s, a)$, are stationary, i.e., they do not depend on the time period $t$ when an action is taken
  - Note: this does not imply that the time series process $\{s_t\}_{t=0}^{\infty}$ is stationary; conversely, stationarity of $\{s_t\}_{t=0}^{\infty}$ is not required in this class of models
- Example: A decision maker gets tired of a product as time progresses. Could this be captured by a Markov decision process? Answer: yes
  - $U_t(s, a) = -t$ is not allowed
  - Instead let $S = \{0, 1, 2, \ldots\}$, and for any $s \in S$ let $p(s + 1 \mid s, a) = 1$, i.e., $s_{t+1} = s_t + 1$. Then define $U(s, a) = -s$. A sketch of this example appears below.
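The example can be written directly as a stationary Markov decision process. The following is a minimal Python sketch, not part of the lecture: the state counts elapsed periods, the transition is deterministic, and $U(s, a) = -s$, so the growing fatigue is captured through the state rather than through a time-dependent utility.

```python
# Minimal sketch of the "gets tired of the product" example as a stationary MDP.
# The state s counts elapsed periods; the action is irrelevant in this example.

def utility(s, a):
    """Stationary utility U(s, a) = -s: later periods yield lower utility."""
    return -s

def next_state(s, a):
    """Deterministic transition: p(s + 1 | s, a) = 1, i.e., s_{t+1} = s_t + 1."""
    return s + 1

s = 0
for t in range(5):
    a = 0  # any feasible action; utility and transition do not depend on it here
    print(f"t={t}, state={s}, utility={utility(s, a)}")
    s = next_state(s, a)
```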
Actions, decision rules, and policies
- We examine decision problems where the decision maker uses current (and possibly past) information to choose actions in order to maximize some overall objective function
- Decision rules: $d_t : S \to \bigcup_{s \in S} A(s)$, with $d_t(s) \in A(s)$
- A decision rule associates a current state with a feasible action
- Policies, strategies, or plans: $\pi = (d_0, d_1, \ldots) = (d_t)_{t=0}^{\infty}$
- A policy is a sequence of decision rules: a plan for how to choose actions at each point in time, given the state $s_t$ at that point in time

Expected total discounted reward
- A given policy $\pi = (d_t)_{t=0}^{\infty}$ induces a probability distribution on the sequence of states and actions, $(s_t, a_t)_{t=0}^{\infty}$
- For a given policy $\pi$, the expected total discounted reward (utility) is given by
  $$v^{\pi}(s_0) = E^{\pi}\left[\sum_{t=0}^{\infty} \beta^t U(s_t, d_t(s_t)) \,\Big|\, s_0\right]$$
- Future utilities are discounted using the discount factor $0 \le \beta < 1$ (in finite horizon problems we could allow for $\beta = 1$)
- We will assume that $v^{\pi}$ is well defined
- One necessary condition if the state space is discrete: the series inside the expectation converges
  - Ensured if $U$ is bounded, because $0 \le \beta < 1$
  - Note, however, that boundedness is not a necessary condition for convergence
- If the state space is continuous, certain technical conditions need to hold for the expectation to exist (see Stokey and Lucas 1989)

Optimal policies
- Let $\Pi$ be the set of all policies. $\pi^* \in \Pi$ is optimal if $v^{\pi^*}(s) \ge v^{\pi}(s)$ for all $s \in S$ and all $\pi \in \Pi$
- The value of the Markov decision problem is given by $v^*(s) = \sup_{\pi \in \Pi} \{v^{\pi}(s)\}$
- We call $v^*$ the (optimal) value function
- An optimal policy $\pi^*$ is a policy such that $v^{\pi^*}(s) = v^*(s) = \max_{\pi \in \Pi} \{v^{\pi}(s)\}$

Bellman Equation
- Intuition:
  - Due to the stationarity of the utilities and transition probabilities, and because the remaining decision horizon is always infinite, the (optimal) value function should depend only on the current state, not on $t$
  - Hence, for any current state $s$, the expected total discounted reward under optimal decision making should satisfy the relationship
    $$v^*(s) = \sup_{a \in A(s)} \left\{ U(s, a) + \beta \int p(s' \mid s, a)\, v^*(s')\, ds' \right\}$$
- The equation above is called a Bellman equation
- Note that the Bellman equation is a functional equation with potential solutions of the form $v : S \to \mathbb{R}$
- More formally, let $B = \{f : S \to \mathbb{R} : \|f\| < \infty\}$ be the metric space of real-valued, bounded functions on $S$, where $\|f\| = \sup_{s \in S} |f(s)|$ is the supremum norm. Define the operator $L : B \to B$,
  $$Lv(s) \equiv \sup_{a \in A(s)} \left\{ U(s, a) + \beta \int p(s' \mid s, a)\, v(s')\, ds' \right\}$$
- A solution of the Bellman equation is a fixed point of $L$, $v = Lv$

Questions to be addressed
1. Is a solution of the Bellman equation the optimal value function?
2. Does a solution of the Bellman equation exist, and how many solutions exist in general?

The main theorem
Theorem. If $v$ is a solution of the Bellman equation, $v = Lv$, then $v = v^*$. Furthermore, under the model assumptions stated above a (unique) solution of the Bellman equation always exists.
- The proof of this theorem is based on the property that $L$ is a contraction mapping. According to the Banach fixed-point theorem, a contraction mapping (defined on a complete metric space) has a unique fixed point. The contraction mapping property crucially depends on the additive separability of the total discounted reward function, and in particular on discounting, $\beta < 1$. A value-iteration sketch based on this property follows below.
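Because $L$ is a contraction, repeatedly applying it to any starting guess converges to the unique fixed point $v^*$. The following is a minimal Python sketch of this successive-approximation (value iteration) idea for a finite state and action space; the arrays U (utilities), P (transition probabilities), and the tolerance are illustrative assumptions, not objects defined in the lecture.

```python
import numpy as np

def bellman_operator(v, U, P, beta):
    """Apply L to v: (Lv)(s) = max_a { U[s, a] + beta * sum_s' P[s, a, s'] * v[s'] }."""
    EV = P @ v                                # expected continuation value, shape (n_states, n_actions)
    return np.max(U + beta * EV, axis=1)

def solve_value_function(U, P, beta, tol=1e-10, max_iter=10_000):
    """Iterate v <- Lv starting from v = 0 until the sup-norm change falls below tol."""
    v = np.zeros(U.shape[0])
    for _ in range(max_iter):
        v_new = bellman_operator(v, U, P, beta)
        if np.max(np.abs(v_new - v)) < tol:   # sup-norm stopping rule
            break
        v = v_new
    return v_new
```

At the fixed point, np.argmax(U + beta * (P @ v), axis=1) recovers a maximizing action in each state, which corresponds to the decision rule $d^*$ defined next.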
Optimal policies
- Assume that the supremum in the Bellman equation can be attained, and define the decision rule
  $$d^*(s) \in \arg\max_{a \in A(s)} \left\{ U(s, a) + \beta \int p(s' \mid s, a)\, v^*(s')\, ds' \right\}$$
- Let $\pi^* = (d^*, d^*, \ldots)$ be a corresponding policy

Theorem. Let $\pi^*$ be as defined above. Then $\pi^*$ is an optimal policy, such that $v^{\pi^*} = v^*$.

- Note that this theorem shows us how to find an optimal policy
- In particular, it shows that if the supremum in the Bellman equation can be attained, then there is a stationary optimal policy

Optimal Policies: Notes
- A sufficient condition for the existence of an optimal policy is that $A(s)$ is finite
  - Satisfied (by definition) in dynamic discrete choice problems
- Decision rules that depend on the past history of states and actions do not improve the total expected reward
- Hence, basing decisions only on current, not past, information is not restrictive in Markov decision processes

The Storable Goods Demand Problem

[Figure: Kellogg's Raisin Bran, 20 oz pack size. Top panel: price; bottom panel: log (base 10) units sold; 2008 to 2012.]

Overview: Sales promotions
- Many CPG products are sold at both regular (base) prices and at promotional prices
- We often observe sales spikes during a promotion
  - Could be due to brand switching or increased consumption
  - Could be due to stockpiling if the good is storable
- Develop a simple model of stockpiling, i.e., storable goods demand
- Goal: predict how demand is affected by the distribution of prices at which the storable good can be bought

Model outline
- At the beginning of each period a consumer goes to a store and chooses among $K$ different pack sizes of a storable good
- Pack size $k$ contains $n_k$ units of the product ($n_k \in \{1, 2, \ldots\}$); no purchase corresponds to $n_0 = 0$
- The choice in period $t$ is $a_t \in A = \{0, 1, \ldots, K\}$
- $i_t$ is the number of units in the consumer's inventory at the beginning of period $t$
- After the shopping trip, if $a_t = k$ the consumer has $i_t + n_k$ units of the product at her disposal
- The consumer has a consumption need of one unit in each period
- She will consume one unit if $i_t + n_k \ge 1$, and will not consume otherwise

Evolution of inventory
- The consumer can store at most $I$ units
- Conditional on $i_t$ and $a_t = k$, inventory evolves as follows (a code sketch appears below):
  $$i_{t+1} = \begin{cases} 0 & \text{if } i_t + n_k = 0, \\ \min\{i_t + n_k - 1,\, I\} & \text{if } i_t + n_k \ge 1 \end{cases}$$
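To make the inventory dynamics concrete, here is a minimal Python sketch of the transition just described. The storage capacity and pack sizes used here are illustrative assumptions, not values from the lecture.

```python
I_MAX = 10                 # storage capacity I (illustrative value)
PACK_SIZES = [0, 2, 4]     # n_0 = 0 is the no-purchase option; the other sizes are hypothetical

def next_inventory(i_t, k):
    """Inventory at the start of period t+1, given inventory i_t and choice a_t = k."""
    stock = i_t + PACK_SIZES[k]      # units at the consumer's disposal after the shopping trip
    if stock == 0:
        return 0                     # nothing available, so nothing is consumed
    return min(stock - 1, I_MAX)     # consume one unit, store the rest up to capacity I

# Example: holding 3 units and buying the 4-unit pack leaves min(3 + 4 - 1, 10) = 6 units.
assert next_inventory(3, 2) == 6
```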