
Iskhakov, Fedor; Jørgensen, Thomas Høgholm; Rust, John; Schjerning, Bertel

Article The endogenous grid method for discrete-continuous dynamic choice models with (or without) taste shocks

Quantitative Economics

Provided in Cooperation with: The Econometric Society

Suggested Citation: Iskhakov, Fedor; Jørgensen, Thomas Høgholm; Rust, John; Schjerning, Bertel (2017) : The endogenous grid method for discrete-continuous dynamic choice models with (or without) taste shocks, Quantitative Economics, ISSN 1759-7331, Wiley, Hoboken, NJ, Vol. 8, Iss. 2, pp. 317-365, http://dx.doi.org/10.3982/QE643

This Version is available at: http://hdl.handle.net/10419/195542


Quantitative Economics 8 (2017), 317–365

The endogenous grid method for discrete-continuous dynamic choice models with (or without) taste shocks

Fedor Iskhakov Research School of Economics, Australian National University and ARC Centre of Excellence in Population Ageing Research, University of New South Wales

Thomas H. Jørgensen Department of Economics, University of Copenhagen

John Rust Department of Economics, Georgetown University

Bertel Schjerning Department of Economics, University of Copenhagen

We present a fast and accurate computational method for solving and estimating a class of models with discrete and continuous choice variables. The solution method we develop for structural estimation extends the endogenous grid-point method (EGM) to discrete-continuous (DC) problems. Discrete choices can lead to kinks in the value functions and discontinuities in the optimal policy rules, greatly complicating the solution of the model. We show how these problems are ameliorated in the presence of additive choice-specific independent and identically distributed extreme value taste shocks that are typically interpreted as “unobserved state variables” in structural econometric applications, or serve as “random noise” to smooth out kinks in the value functions in numerical applications. We present Monte Carlo experiments that demonstrate the reliability and efficiency of the DC-EGM algorithm and the associated maximum likelihood estimator for structural estimation of a life-cycle model of consumption with discrete retirement decisions.

We acknowledge helpful comments from Chris Carroll, Giulio Fella, and many other people, and from participants at seminars at the University of New South Wales, the University of Copenhagen, the 2012 conferences of the Society of Economic Dynamics and the Society for Computational Economics, and the Initiative for Computational Economics at Zurich (ZICE 2014, 2015, 2017). This paper is part of the Intelligent road user charging (IRUC) research project financed by the Danish Council for Strategic Research (DSF). Iskhakov, Rust, and Schjerning gratefully acknowledge this support. Iskhakov gratefully acknowledges financial support from the Australian Research Council Centre of Excellence in Population Ageing Research (project CE110001029) and Michael P. Keane’s Australian Research Council Laureate Fellowship (project FL110100247). Jørgensen gratefully acknowledges financial support from the Danish Council for Independent Research in Social Sciences (FSE, Grant 4091-00040).

Copyright © 2017 The Authors. Quantitative Economics. The Econometric Society. Licensed under the Creative Commons Attribution-NonCommercial License 4.0. Available at http://www.qeconomics.org. DOI: 10.3982/QE643

Keywords. Life-cycle model, discrete and continuous choice, Euler equation, retirement choice, endogenous grid-point method, nested fixed point algorithm, extreme value taste shocks, smoothed max function, structural estimation.

JEL classification. C13, C63, D91.

1. Introduction

This paper develops a fast new solution algorithm for structural estimation of dynamic programming models with discrete and continuous choices. The algorithm we propose extends the endogenous grid method (EGM) by Carroll (2006) to discrete-continuous (DC) models. We refer to it as the DC-EGM algorithm. We embed the DC-EGM algorithm in the inner loop of the nested fixed point (NFXP) algorithm (Rust (1987)), and show that the resulting maximum likelihood estimator produces accurate estimates of the structural parameters at low computational cost.

There is an extensive literature on static models of discrete/continuous choice: a classic example is Dubin and McFadden (1984). However, the focus of our paper is on dynamic DC models. A classic example is the life-cycle model with discrete retirement and continuous consumption decisions. While there is a well developed literature on solution and estimation of dynamic discrete choice models, and a separate literature on estimation of life-cycle models without discrete choices, there has been far less work on solution and estimation of DC models.¹

There is a good reason why DC models are much less common in the literature: they are substantially harder to solve. The value functions of models with only continuous choices are typically concave, and the optimal policy function can be found from the Euler equation. EGM avoids the need to numerically solve the nonlinear Euler equation for the optimal continuous choice at each grid point in the state space. Instead, EGM specifies an exogenous grid over an endogenous quantity (e.g., savings) to analytically calculate the optimal policy rule (e.g., consumption) and endogenously determine the predecision state (e.g., beginning-of-period resources).² DC-EGM retains the main desirable properties of EGM, namely it avoids the bulk of costly root-finding operations and handles borrowing constraints efficiently.
¹There are relatively few examples of structural estimation or numerical solution of DC models. Some prominent examples include the model of optimal nondurable consumption and housing purchases (Carroll and Dunn (1997)), optimal saving and retirement (French and Jones (2011)), and optimal saving, labor supply, and fertility (Adda, Dustmann, and Stevens (2017)).

²The EGM is in fact a specific application of what is referred to as “controlling the postdecision state” in operations research and engineering (Bertsekas, Lee, van Roy, and Tsitsiklis (1997)). Carroll (2006) introduced the idea in economics by developing the EGM algorithm with an application to the buffer-stock precautionary savings model. Since then the idea has become widespread in economics. Further generalizations of EGM include Barillas and Fernández-Villaverde (2007), Hintermaier and Koeniger (2010), Ludwig and Schön (2013), Fella (2014), and Iskhakov (2015). Jørgensen (2013) compares the performance of EGM to mathematical programming with equilibrium constraints (MPEC).

Dynamic programs that have only discrete choices are substantially easier to solve, since the optimal decision rule is simply the alternative with the highest choice-specific value. However, solving dynamic programming problems that combine continuous and discrete choices is substantially more complicated, since discrete choices introduce kinks and nonconcave regions in the value function that lead to discontinuities in the policy function of the continuous choice (consumption). This can lead to situations where the Euler equation has multiple solutions for consumption, so that it is only a necessary rather than a sufficient condition for the optimal consumption rule (Clausen and Strub (2013)). This inherent feature of DC problems complicates any method one might consider for solving DC models.

We illustrate how DC-EGM deals with these complications using a life-cycle model with a continuous consumption choice and a binary retirement choice, with and without taste shocks. Our example is a simple extension of the classic life-cycle model of Phelps (1962) where, in the absence of a retirement decision, the optimal consumption rule could hardly be simpler: a linear function of resources. However, once the discrete retirement decision is added to the consumption–savings problem (in our case allowing a worker with logarithmic utility to also make a binary irreversible retirement decision), the consumption function becomes unexpectedly complex, with multiple discontinuities in the optimal consumption rule. We derive an analytic solution for this model and use it to demonstrate the accuracy of the solution obtained numerically by DC-EGM. We then show how DC-EGM can be used to solve DC models with taste shocks and investigate its performance as a nested solution method for structural estimation of a DP model of retirement.

Fella (2014) showed how the EGM could be adapted to solve nonconcave problems, including models with discrete and continuous choices. In this paper we focus on discrete choices and show that introducing independent and identically distributed (i.i.d.)
extreme value type I choice-specific taste shocks not only facilitates maximum likelihood estimation, but also smooths out some of the kinks in the value functions, thereby simplifying the numerical solution of the model. This approach results in multinomial logit formulas for the conditional choice probabilities of the discrete choices and a closed-form expression for the expectation of the value function with respect to these taste shocks.³

In econometric applications, continuously distributed taste shocks are essential for generating predictions from dynamic programming models that are statistically nondegenerate. Such predictions assign a positive (however small) choice probability to every alternative, and therefore preclude zero-likelihood observations. These shocks are interpreted as unobserved state variables, that is, idiosyncratic shocks observed by agents but not by the econometrician. In numerical or theoretical applications, however, taste shocks can serve as a smoothing device (homotopy perturbation) that facilitates the numerical solution of more advanced DC models that may have excessively many kinks and discontinuities, for example, those caused by a large number of discrete choices.
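The two closed-form objects mentioned above can be sketched as follows. This is a minimal illustration, assuming i.i.d. extreme value type I shocks with scale σ whose location is normalized so that the expected maximum of v_d + σε_d equals the log-sum σ log Σ_d exp(v_d/σ); with a standard Gumbel location an additive constant σγ (γ ≈ 0.5772) would appear as well. The function names `logsum` and `choice_probs` and the example values are hypothetical.

```python
import numpy as np


def logsum(v, sigma):
    """Smoothed max sigma * log(sum_d exp(v_d / sigma)) of choice-specific values v.

    The largest value is factored out first so that exp() cannot overflow
    when sigma is small.
    """
    v = np.asarray(v, dtype=float)
    vmax = v.max()
    return vmax + sigma * np.log(np.sum(np.exp((v - vmax) / sigma)))


def choice_probs(v, sigma):
    """Multinomial logit probabilities exp(v_d / sigma) / sum_k exp(v_k / sigma)."""
    v = np.asarray(v, dtype=float)
    z = np.exp((v - v.max()) / sigma)
    return z / z.sum()


v = [1.0, 1.2]  # hypothetical values of retiring (d = 0) and working (d = 1)
for sigma in (1.0, 0.1, 0.01):
    gap = logsum(v, sigma) - max(v)  # smoothing error, between 0 and sigma * log(2)
    print(f"sigma={sigma}: gap={gap:.6f}, P(d)={choice_probs(v, sigma).round(4)}")
```

For two alternatives the gap between the log-sum and the true maximum lies between 0 and σ log 2, so it vanishes as σ → 0 while the choice probabilities collapse to a deterministic choice. This is one simple way to see why the error from extreme value smoothing can be uniformly bounded.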

³In principle, the extreme value assumption could be relaxed to allow for other distributions at the cost of numerical approximation of the choice probabilities and the conditional expectation of the value function. For example, Bound, Stinebrickner, and Waidmann (2010) assume that the discrete choice-specific taste shocks are Normal rather than extreme value. We nonetheless follow the long tradition of discrete choice modeling dating back to McFadden (1973) and Rust (1987).

The inclusion of extreme value type I taste shocks has a long history in discrete choice modeling, dating back to the seminal work by McFadden (1973). This assumption is typically invoked in microeconometric analyses of dynamic discrete choice models where numerical performance boosted by closed-form choice probabilities is particularly important; see, for example, Rust (1994) and the recent survey by Aguirregabiria and Mira (2010). Some recent studies of DC models with extreme value taste shocks include Casanova (2010), Ejrnæs and Jørgensen (2016), Iskhakov and Keane (2016), Oswald (2016), and Adda, Dustmann, and Stevens (2017).

At first glance, the addition of stochastic shocks would appear to make the problem harder to solve, since both the optimal discrete and continuous decision rules will necessarily be functions of these stochastic shocks. However, we show that a variety of stochastic variables in DC models smooth out many of the kinks in the value functions and the discontinuities in the optimal consumption rules. In the absence of smoothing, we show that every kink induced by the comparison of the discrete choice-specific value functions in any period t propagates backward in time to all previous periods as a manifestation of the decision maker’s anticipation of the future discrete action. The resulting accumulation of kinks during backward induction presents the most significant challenge for the numerical solution of DC models. In the presence of taste shocks, the decision maker can only anticipate a particular future discrete action to be more or less probable, and thus the primary reason for the accumulation of kinks disappears.
Thus, the combination of taste shocks and the stochastic variables in the model is perhaps the most powerful device to prevent the propagation and accumulation of kinks.⁴ In the case where extreme value taste shocks are used as a logit smoothing device for an underlying deterministic model of interest, we show that the latter problem can be approximated by the smoothed model to any desired degree of precision. The scale parameter σ ≥ 0 of the corresponding extreme value distribution then serves as a homotopy or smoothing parameter. When σ is sufficiently large, the nonconcave regions near the kinks in the nonsmoothed value function disappear and the value functions become globally concave. But even small values of σ smooth out many of the kinks in the value functions and suppress their accumulation in the process of backward induction, as noted above. An additional benefit of the taste shocks is that standard integration methods, such as quadrature rules, apply when the expected value function is smooth.

We run a series of Monte Carlo simulations to investigate the performance of DC-EGM for structural estimation of the life-cycle model with the discrete retirement decision. We find that a maximum likelihood estimator that nests the DC-EGM algorithm performs well. It quickly produces accurate estimates of the structural parameters of the model even when fairly coarse grids over wealth are used. We find the cost of “oversmoothing” to be negligible in the sense that the parameters of a perturbed model with stochastic taste shocks are estimated very accurately even if the true model does not have taste shocks. Thus, even in the case where the addition of taste shocks results in a misspecification of the model, the presence of these shocks improves the accuracy of the solution and reduces computation time without increasing the approximation bias significantly. Even when very few grid points are used to solve the model, we find that smoothing the problem improves the root mean square error (RMSE). In particular, with an appropriate degree of smoothing (σ), we can reduce the number of grid points by an order of magnitude without much increase in the RMSE of the parameter estimates.

⁴Unlike the macro literature, which uses stochastic elements such as employment lotteries (Rogerson (1988), Prescott (2005), Ljungqvist and Sargent (2005)) to smooth out nonconvexities, the taste shocks we introduce in DC models do not in general fully convexify the problem.

DC-EGM is applicable to many fields of economics and has been implemented in several recent empirical applications. Ameriks, Briggs, Caplin, Shapiro, and Tonetti (2015) study how the need for long-term care and bequest motives interact with government-provided support to shape the wealth profile of the elderly. They use an endogenous grid method similar to DC-EGM to solve and estimate the corresponding nonconcave model. Iskhakov and Keane (2016) employ DC-EGM to estimate a life-cycle model of discrete labor supply, human capital accumulation, and savings for the Australian population. They use the model to evaluate Australia’s defined contribution pension scheme with a means-tested minimal pension, and quantify the effects of anticipated and unanticipated policy changes. Yao, Fagereng, and Natvik (2015) use DC-EGM to analyze how housing and mortgage debt affect consumers’ marginal propensity to consume.
They estimate a model in which households hold debt, financial assets, and illiquid housing, and find that a substantial fraction of households are likely to behave in a “hand-to-mouth” fashion despite having significant wealth holdings. Druedahl and Jørgensen (2015) employ a modified version of DC-EGM to analyze the credit card debt puzzle. They solve a model of optimal consumption and debt holdings, and show how, for some parameterizations of the model, a large group of consumers find it optimal to simultaneously hold positive gross debt and positive gross assets even though the interest rate on the debt is much higher than the rate on the assets. Ejrnæs and Jørgensen (2016) use DC-EGM to estimate a model of optimal consumption and saving with a fertility choice to analyze saving behavior around intended and unintended child births. They model the fertility process as a discrete choice over effort to conceive a child subject to a biological fecundity constraint and allow for the possibility of unintended child births through imperfect contraceptive control.

In the next section we present a simple extension of the life-cycle model of consumption and savings with logarithmic utility studied by Phelps (1962) and Deaton (1991), in which we allow for a discrete retirement decision. We derive a closed-form solution to this problem and discuss its properties. Using this simple model we demonstrate the accuracy of the deterministic version of DC-EGM. We then introduce extreme value taste shocks and show how the implied smoothing affects the value functions and the optimal policy rules. In particular, we show that the error introduced by “extreme value smoothing” is uniformly bounded, and prove that the solution of the smoothed DP problem with taste shocks converges to the solution of the DP problem without taste shocks as the scale of the shocks approaches zero. Section 3 presents the full DC-EGM algorithm.
In Section 4 we show how it is incorporated in the nested fixed point algorithm for maximum likelihood estimation of the structural parameters in the retirement model. We present the results of a series of Monte Carlo experiments in which we explore the performance of the estimator in a variety of settings. We conclude with a short discussion of the range of models to which DC-EGM is applicable and discuss some open issues with this method.

2. An illustrative problem: Consumption and retirement

This section extends the classic life-cycle consumption–savings model of Phelps (1962) and Deaton (1991) to allow for a discrete retirement decision. We derive an analytic solution to this problem with logarithmic utility both to illustrate the complexity caused by the addition of a discrete retirement choice and to show how DC-EGM computes this solution. While we focus on this simple example for expositional clarity, DC-EGM can be applied to a much more general class of problems that include taste and income shocks. We will discuss these extensions in Section 3 and show how the addition of shocks can actually simplify the solution of the model using DC-EGM.

2.1 Deterministic model of consumption–savings and retirement

Consider the discrete-continuous (DC) dynamic optimization problem

$$
\max_{\{c_t,\,d_t\}_{t=1}^{T}} \; \sum_{t=1}^{T} \beta^{t} \bigl( \log(c_t) - \delta d_t \bigr) \tag{1}
$$

involving choices of consumption $c_t$ and whether to retire, $d_t$, to maximize lifetime discounted utility. Let $d_t = 0$ denote retirement, let $d_t = 1$ denote continued work, and let $\delta > 0$ be the disutility of work. To simplify the exposition, we assume retirement is absorbing, that is, a retiree cannot return to work.⁵ We solve (1) subject to a sequence of period-specific borrowing constraints, $c_t \le M_t$, where $M_t = R(M_{t-1} - c_{t-1}) + y\,d_{t-1}$ is the consumer’s consumable resources (wealth) at the beginning of period $t$. There is a fixed, nonstochastic gross interest rate $R$ and labor income $y$ for workers. The continuous consumption decision and the discrete retirement decision are made at the start of each period, whereas interest earnings and labor income are paid at the end of the period.⁶

Let $V_t(M)$ and $W_t(M)$ be the expected discounted lifetime utility of a worker and a retiree, respectively, in period $t$ of their life. The choice problem of the worker can be expressed recursively through the Bellman equation as

$$
V_t(M) = \max\bigl\{ v_t(M,0),\; v_t(M,1) \bigr\} \tag{2}
$$

⁵This allows us to focus primarily on the worker’s problem. In the absence of absorbing retirement, the retiree’s problem involves a discrete choice (returning to work or staying retired), and can be solved by DC-EGM similarly to the worker’s problem.

⁶This timing convention is standard in the literature, and it removes the need to include income as a separate state variable when we extend this model to a much wider class of problems with stochastic R and/or y.

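where the choice-specific value functions $v_t(M,d)$, $d \in \{0,1\}$, are given by

$$
v_t(M,0) = \max_{0 \le c \le M} \Bigl\{ \log(c) + \beta W_{t+1}\bigl(R(M-c)\bigr) \Bigr\} \tag{3}
$$

$$
v_t(M,1) = \max_{0 \le c \le M} \Bigl\{ \log(c) - \delta + \beta V_{t+1}\bigl(R(M-c) + y\bigr) \Bigr\} \tag{4}
$$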

The Bellman equation for the retiree’s problem, which does not involve a discrete choice, is much simpler and can be written as

$$
W_t(M) = \max_{0 \le c \le M} \Bigl\{ \log(c) + \beta W_{t+1}\bigl(R(M-c)\bigr) \Bigr\} \tag{5}
$$
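To see what EGM avoids, here is a brute-force sketch of the Bellman equations (3) and (4) at t = T − 1, maximizing over a fine grid of consumption values. It assumes the terminal values W_T(M) = V_T(M) = log(M) (everyone consumes all wealth and retires in the last period, as discussed in Section 2.2) and the illustrative parameter values β = 0.98, R = 1, y = 20, δ = 1, which are our assumption rather than values stated here. The function name `solve_T1_by_grid_search` is hypothetical.

```python
import numpy as np

# Assumed parameter values: discount factor beta, gross interest rate R,
# labor income y, disutility of work delta.
beta, R, y, delta = 0.98, 1.0, 20.0, 1.0


def V_T(x):
    """Terminal value: everyone consumes all wealth in T, so W_T(M) = V_T(M) = log(M)."""
    return np.log(x)


W_T = V_T


def solve_T1_by_grid_search(M, d, n=20001):
    """Brute-force maximization of (3) (d = 0) or (4) (d = 1) at t = T - 1."""
    c = np.linspace(1e-6, M, n)  # candidate consumption levels on (0, M]
    with np.errstate(divide="ignore"):  # log(0) = -inf when c = M and d = 0
        if d == 0:
            obj = np.log(c) + beta * W_T(R * (M - c))
        else:
            obj = np.log(c) - delta + beta * V_T(R * (M - c) + y)
    i = np.argmax(obj)
    return c[i], obj[i]


for M, d in [(50.0, 0), (50.0, 1), (10.0, 1)]:
    c_opt, v_opt = solve_T1_by_grid_search(M, d)
    print(f"M={M}, d={d}: c={c_opt:.4f}, v={v_opt:.4f}")
```

Under these assumptions the search recovers M/(1 + β) for a retiree and (M + y/R)/(1 + β) for an unconstrained worker up to the grid spacing, and c = M for a worker whose wealth is below y/(Rβ). Repeating such a search (or a root-finding step) at every point of a wealth grid in every period is the cost that EGM is designed to avoid.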

The value function for a retiree, $W_t(M)$, has a closed-form solution given by Phelps (1962, p. 742), so we focus on solving the worker’s problem, that is, solving for the value function $V_t(M)$ and the optimal consumption rule $c_t(M)$. That the future looks the same from the point of view of the retiree and from the point of view of the worker who decides to retire can be verified from the fact that the right-hand side of (3) is identical to that of (5). Therefore, $W_t(M) = v_t(M,0)$, and the consumption function of the retiree is identical to the choice-specific consumption function of the worker who decides to retire, $c_t(M,0)$, where the second argument denotes the retirement choice.

Note that even if $v_t(M,0)$ and $v_t(M,1)$ are concave functions of $M$, the value function $V_t(M)$ is, by (2), the maximum of these two concave functions and will generally not be globally concave (Clausen and Strub (2013)). Further, $V_t(M)$ will generally have a kink at the value $M = \bar{M}_t$ where the two choice-specific value functions cross, that is, $v_t(\bar{M}_t,1) = v_t(\bar{M}_t,0)$. We refer to these as primary kinks because they constitute optimal retirement thresholds for the worker in each period $t$. The optimal retirement rule is given by $d_t(M) = 1$ if $M < \bar{M}_t$ and $d_t(M) = 0$ if $M \ge \bar{M}_t$.
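The upper envelope in (2) and the resulting retirement rule can be illustrated at t = T − 1, where the choice-specific consumption rules are known in closed form: c_{T−1}(M,0) = M/(1 + β) and, for M ≥ y/(Rβ), c_{T−1}(M,1) = (M + y/R)/(1 + β) (both derived in Section 2.2). The sketch below substitutes these rules into (3) and (4) with W_T = V_T = log, assumes the parameter values β = 0.98, R = 1, y = 20, δ = 1, locates the switch in the discrete choice on a wealth grid, and checks that V_{T−1} fails to be concave at the kink.

```python
import numpy as np

beta, R, y, delta = 0.98, 1.0, 20.0, 1.0  # assumed parameter values


def v0(M):
    """v_{T-1}(M, 0): retire now and consume c = M / (1 + beta)."""
    c = M / (1 + beta)
    return np.log(c) + beta * np.log(R * (M - c))


def v1(M):
    """v_{T-1}(M, 1): keep working and consume c = (M + y/R) / (1 + beta).

    Valid for M >= y / (R * beta), where the credit constraint does not bind.
    """
    c = (M + y / R) / (1 + beta)
    return np.log(c) - delta + beta * np.log(R * (M - c) + y)


M = np.linspace(21.0, 60.0, 3901)  # step 0.01, above y / (R * beta)
V = np.maximum(v0(M), v1(M))       # upper envelope, equation (2)
d = (v1(M) > v0(M)).astype(int)    # retirement rule: 1 = work, 0 = retire

# Primary kink: the grid interval in which the optimal discrete choice switches
k = np.flatnonzero(np.diff(d) != 0)[0]
print(f"bar M_(T-1) lies in [{M[k]:.2f}, {M[k + 1]:.2f}]")

# A concave function has nonpositive second differences; V has a positive one
second_diff = V[2:] - 2.0 * V[1:-1] + V[:-2]
print("largest second difference:", second_diff.max())
```

The positive second differences appear only at grid points adjacent to the kink: this is the local nonconcavity that results from taking the maximum of two concave functions.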

Theorem 1 (Analytical Solution to the Retirement Problem). Assume that income and disutility of work are time-invariant, the discount factor β and the disutility of work δ are not too large, that is,

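$$
\beta R \le 1 \quad\text{and}\quad \delta < (1+\beta)\log(1+\beta), \tag{6}
$$

and instantaneous utility is given by $u(c) = \log(c)$. Then for $\tau \in \{1,\dots,T\}$ the optimal consumption rule in the worker’s problem (2)–(4) is given by

$$
c_{T-\tau}(M) =
\begin{cases}
M & \text{if } M \le y/(R\beta), \\
\bigl[M + y/R\bigr]/(1+\beta) & \text{if } y/(R\beta) \le M \le M^{l_1}_{T-\tau}, \\
\bigl[M + y(1/R + 1/R^{2})\bigr]/(1+\beta+\beta^{2}) & \text{if } M^{l_1}_{T-\tau} \le M \le M^{l_2}_{T-\tau}, \\
\qquad\cdots & \qquad\cdots \\
\bigl[M + y\textstyle\sum_{i=1}^{\tau-1} R^{-i}\bigr]\bigl(\textstyle\sum_{i=0}^{\tau-1} \beta^{i}\bigr)^{-1} & \text{if } M^{l_{\tau-2}}_{T-\tau} \le M \le M^{l_{\tau-1}}_{T-\tau}, \\
\bigl[M + y\textstyle\sum_{i=1}^{\tau} R^{-i}\bigr]\bigl(\textstyle\sum_{i=0}^{\tau} \beta^{i}\bigr)^{-1} & \text{if } M^{l_{\tau-1}}_{T-\tau} \le M \le M^{r_{\tau-1}}_{T-\tau}, \\
\qquad\cdots & \qquad\cdots \\
M\bigl(\textstyle\sum_{i=0}^{\tau} \beta^{i}\bigr)^{-1} & \text{if } M \ge \bar{M}_{T-\tau}.
\end{cases}
\tag{7}
$$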

The segment boundaries are totally ordered with

$$
y/(R\beta) < M^{l_1}_{T-\tau} < \cdots < M^{l_{\tau-1}}_{T-\tau} < M^{r_{\tau-1}}_{T-\tau} < \cdots < M^{r_1}_{T-\tau} < \bar{M}_{T-\tau}, \tag{8}
$$

and the rightmost threshold $\bar{M}_{T-\tau}$, given by

$$
\bar{M}_{T-\tau} = \frac{(y/R)\,e^{-K}}{1 - e^{-K}}, \quad\text{where}\quad K = \delta \Bigl( \sum_{i=0}^{\tau} \beta^{i} \Bigr)^{-1}, \tag{9}
$$

defines the smallest level of wealth sufficient to induce the consumer to retire at age $t = T - \tau$.
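Equation (9) is easy to evaluate directly. The sketch below computes it and, for τ = 1, cross-checks it against a bisection on the difference of the closed-form T − 1 choice-specific values, which reduces to (1 + β) log((M + y/R)/M) − δ once the common constants cancel. The function names and the parameter values β = 0.98, R = 1, y = 20, δ = 1 are assumptions for illustration.

```python
import math


def retirement_threshold(tau, beta, R, y, delta):
    """Rightmost threshold bar M_{T - tau} from equation (9)."""
    K = delta / sum(beta**i for i in range(tau + 1))
    return (y / R) * math.exp(-K) / (1.0 - math.exp(-K))


beta, R, y, delta = 0.98, 1.0, 20.0, 1.0  # assumed parameter values


def value_gap(M):
    """v_{T-1}(M, 1) - v_{T-1}(M, 0) for M >= y / (R * beta), from the T-1 closed forms."""
    return (1 + beta) * math.log((M + y / R) / M) - delta


# Bisection on the sign change: value_gap > 0 at y/(R*beta), < 0 for very large M
lo, hi = y / (R * beta), 1e4
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if value_gap(mid) > 0:
        lo = mid
    else:
        hi = mid

print(retirement_threshold(1, beta, R, y, delta), lo)
print([round(retirement_threshold(tau, beta, R, y, delta), 4) for tau in (1, 2, 3)])
```

At M = y/(Rβ) the value difference equals (1 + β) log(1 + β) − δ, so for τ = 1 the second part of condition (6) is what guarantees that a worker at the credit-constrained kink prefers to keep working, which places the retirement threshold to the right of y/(Rβ) as in (8). With the assumed parameters the τ = 1 threshold matches the primary kink reported for the Figure 1 example (30.4382), and the threshold increases with τ.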

The proof of Theorem 1, in particular the expressions for the kink points $M^{l_i}_{T-\tau}$ and $M^{r_i}_{T-\tau}$, is available in a supplementary file on the journal website, http://qeconomics.org/supp/643/supplement.pdf. However, we show how this solution is derived when we introduce the DC-EGM algorithm in the next section.⁷

⁷Note that the assumptions on the parameters β, δ, and R are needed to ensure the ordering of the boundaries (8). Modified versions of Theorem 1 hold under weaker conditions, including a version where income and the disutility of work are age-dependent. However, depending on the paths of income and disutility of work, some of the intermediate thresholds in Theorem 1 may not exist or may be equal to each other.

Theorem 1 establishes that the optimal consumption rule of the worker, $c_{T-\tau}(M)$, is piecewise linear in $M$, and in period $t$ consists of $2(T-t)+1$ segments. The first segment, where $M \le y/(R\beta)$, is the credit-constrained region in which the worker consumes all available wealth.

2.2 DC-EGM for problems without taste shocks

We are now in a position to introduce a generalization of the EGM algorithm for solving discrete-continuous problems that we call the DC-EGM algorithm. We describe DC-EGM by showing how it can be used to solve for the optimal consumption rule in the last three periods of the worker’s problem. As the original EGM constitutes a building block of DC-EGM, we illustrate it as well using the consumption choice problem of the retiree. After explaining the DC-EGM algorithm, we compare its numerical performance and show that DC-EGM can closely approximate the analytic solution in Theorem 1.

DC-EGM is a backward induction algorithm that uses the inverted Euler equation to sequentially compute (potentially without root-finding) the choice-specific value functions $v_t(M,d)$ and the corresponding choice-specific consumption functions $c_t(M,d)$, starting at the last period of life, $T$. Note that in a generic period $t$ of the backward induction, the Bellman equation (4) of a worker who remains working implies the following first-order condition for the consumption choice, known as the Euler equation:

$$
0 = u'(c) - \beta R\, u'\bigl(c_{t+1}(R(M-c) + y)\bigr) = \frac{1}{c} - \frac{\beta R}{c_{t+1}\bigl(R(M-c) + y\bigr)}. \tag{10}
$$
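To make (10) concrete, the sketch below evaluates its right-hand side for log utility at t = T − 1, where c_T(M) = M, and compares the closed-form interior solution (M + y/R)/(1 + β) with a bisection root-finder. The parameter values (β = 0.98, R = 1, y = 20) and function names are assumptions for illustration; the bisection loop is the per-grid-point root-finding that EGM replaces.

```python
beta, R, y = 0.98, 1.0, 20.0  # assumed parameter values


def c_T(M):
    """Terminal consumption rule c_T(M) = M."""
    return M


def euler_residual_work(c, M, c_next=c_T):
    """Right-hand side of (10) with u(c) = log(c)."""
    return 1.0 / c - beta * R / c_next(R * (M - c) + y)


def solve_euler_by_bisection(M, tol=1e-12):
    """Root-finding for (10) on (0, M].

    The residual is decreasing in c, positive near c = 0 and, for
    M > y / (R * beta), negative at c = M.
    """
    lo, hi = 1e-12, M
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if euler_residual_work(mid, M) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


M = 50.0
c_closed_form = (M + y / R) / (1 + beta)  # interior T-1 worker rule from the text
print(euler_residual_work(c_closed_form, M), solve_euler_by_bisection(M))
```

A positive residual means the marginal utility of consuming today still exceeds the discounted marginal utility of saving, so consumption should rise; the bisection exploits exactly this monotonicity.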

Similarly, the Bellman equation (3) for the worker who decides to retire implies the Euler equation

$$
0 = u'(c) - \beta R\, u'\bigl(c^{r}_{t+1}(R(M-c))\bigr) = \frac{1}{c} - \frac{\beta R}{c^{r}_{t+1}\bigl(R(M-c)\bigr)}, \tag{11}
$$

where the second equality shows the case of $u(c) = \log(c)$. Given the period $t+1$ optimal consumption functions $c_{t+1}(M)$ for workers and $c^{r}_{t+1}(M)$ for retirees, the solutions to these Euler equations yield the period $t$ choice-specific consumption functions of the worker, $c_t(M,d)$.⁸ The solutions are computed by applying the inverse of the marginal utility to the second component in (10) and (11). When this inverse function is analytical, specifying an exogenous grid over end-of-period savings $A = M - c$ makes it possible to solve for optimal current consumption in closed form without resorting to iterative numerical methods. This is the idea behind the endogenous grid method (EGM) proposed by Carroll (2006) that we build on.

Consider the terminal period $T$. The optimal consumption rule is to consume all available wealth and, thus, is given by $c_T(M,d) = M$. With positive disutility of working, all agents retire (i.e., $\bar{M}_T = 0$) since income is paid at the end of the period, so it follows that $d_T(M) = 0$. This solution provides the base for backward induction.

Now consider a retiree in period $T-1$. Because the Bellman equation of the retiree (5) is identical to that of the worker who decides to retire (3), the Euler equation (11) also characterizes the optimal consumption choices of the retiree, $c^{r}_t(M)$. In period $T-1$ the closed-form solution of (11) is $c^{r}_{T-1}(M) = M/(1+\beta)$. Consider how this solution is computed using the original EGM algorithm by Carroll (2006). EGM uses the Euler equation (11) to construct an endogenous grid over $M$ from an exogenous grid over savings $A = M - c$. Let $\vec{A} = \{A_1, \dots, A_J\}$ denote the exogenous grid over savings. Because savings is a sufficient statistic, that is, it carries all the information about wealth and consumption in the period, the Euler equation (11) can be solved for $c$ at each point $A_j$. As mentioned above, when the marginal utility $u'(c)$ is analytically invertible, the solution is also analytical. For the case $u(c) = \log(c)$, the solution is easily seen to be $c^{r}_{T-1}(M_{j,T-1}) = A_j/\beta$, where $M_{j,T-1}$ is an element of the endogenous grid $\vec{M}_{T-1}$ implied by the exogenous grid over savings $\vec{A}$ in period $T-1$. We have $M_{j,T-1} = A_j + c^{r}_{T-1}(M_{j,T-1}) = A_j(1 + 1/\beta)$, which implies the EGM solution $c^{r}_{T-1}(M_{j,T-1}) = M_{j,T-1}/(1+\beta)$.

Thus, at the points of the endogenous grid $\vec{M}_t$, EGM produces an exact solution in the sense that the Euler residuals are exactly zero. Between these points, calculating the optimal consumption requires function approximation, typically linear interpolation. With the latter, EGM produces the exact (linear) solution for the consumption function of a retiree.

⁸Note the distinction between $c_{t+1}(M)$ and $c^{r}_{t+1}(M)$, which are the state-specific optimal consumption functions at time $t+1$, and $c_t(M,0)$ and $c_t(M,1)$, which are the decision-specific consumption functions at time $t$ for the worker. The distinction can be confusing since “work” and “retirement” are both states and decisions, but it is important. We focus on the worker’s problem and skip the Euler equation for retirees, who have no additional discrete choice over retirement due to our assumption that retirement is absorbing.

Now consider a worker in period $T-1$. The worker must solve for two consumption functions, $c_{T-1}(M,0)$ and $c_{T-1}(M,1)$, corresponding to the decision to retire or not, respectively. However, no special complications arise at this point: we simply apply standard EGM to solve for the optimal consumption rule of a worker $c_{T-1}(M,d)$ for each of the discrete decisions $d$, using the Euler equation (10) or (11) just as we described for the case of a retiree above. As before, with linear interpolation this also results in exact solutions $c_{T-1}(M,1) = (M + y/R)/(1+\beta)$ and $c_{T-1}(M,0) = M/(1+\beta)$.

To ensure that the credit constraint $c_{T-1} \le M$ is satisfied in the presence of noncapital income $y$, standard EGM has an additional step.⁹ Namely, from (10) it follows that invoking the EGM algorithm with zero savings, $A_j = 0$, produces the endogenous point $M_{j,T-1} = y/(R\beta)$. As we show below in Theorem 2, savings as a function of wealth must be nondecreasing, and, therefore, for $M \le y/(R\beta)$ savings must remain zero, that is, $c_{T-1}(M,1) = M$. To add this additional credit-constrained segment to the optimal consumption function, it is sufficient to add a point $M_{0,T-1} = 0$ to the endogenous grid $\vec{M}_{T-1}$ and set the corresponding optimal consumption $c_{T-1}(M_{0,T-1},1) = M_{0,T-1} = 0$. This way, when linear interpolation is used, EGM finds the first two segments of the true solution $c_{T-1}(M,1)$ given in equation (7) of Theorem 1 exactly, including the location of the first kink point.

In summary, when there are discrete choices, DC-EGM invokes the EGM algorithm to calculate, via simple linear interpolation as described above, piecewise linear approximations of the decision-specific consumption functions $c_t(M^{d}_{j,t}, d)$ defined over decision-specific endogenous grids $\vec{M}^{d}_t = \{M^{d}_{1,t}, \dots, M^{d}_{J,t}\}$.
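A minimal sketch of the worker’s EGM step for d = 1 at T − 1, including the extra point that creates the credit-constrained segment. The parameter values (β = 0.98, R = 1, y = 20) and function names are assumptions; the d = 0 step is identical to the retiree’s. The comparison target is the first two segments of (7) with τ = 1.

```python
import numpy as np

beta, R, y = 0.98, 1.0, 20.0  # assumed parameter values


def c_T(M):
    """Terminal consumption rule c_T(M) = M."""
    return M


A = np.linspace(0.0, 100.0, 201)   # exogenous savings grid, including A = 0
c = c_T(R * A + y) / (beta * R)    # inverted Euler equation (10) for a worker
M_endog = A + c                    # A = 0 gives the endogenous point y/(R*beta)

# Credit-constrained segment: prepend M = 0 with c = 0, so that linear
# interpolation gives c = M on [0, y/(R*beta)]
M_endog = np.concatenate(([0.0], M_endog))
c = np.concatenate(([0.0], c))


def c_work_egm(M):
    """EGM approximation of c_{T-1}(M, 1) by linear interpolation."""
    return np.interp(M, M_endog, c)


def c_work_exact(M):
    """First two segments of (7) for tau = 1."""
    return np.where(M <= y / (R * beta), M, (M + y / R) / (1 + beta))


M_query = np.linspace(0.0, 150.0, 1501)
print("first endogenous point:", M_endog[1], "= y/(R*beta) =", y / (R * beta))
print("max abs error:", np.max(np.abs(c_work_egm(M_query) - c_work_exact(M_query))))
```

Because both segments of the true rule are linear and the kink at y/(Rβ) is itself a grid point, interpolation reproduces the credit-constrained and unconstrained parts of c_{T−1}(M, 1) up to rounding error.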
However, what is different about DC-EGM is that we need to compare the choice-specific value functions $v_t(M,0)$ and $v_t(M,1)$ so as to locate the threshold level of wealth at which it becomes optimal to retire, $\bar{M}_t$. DC-EGM constructs approximations to $v_t(M,0)$ and $v_t(M,1)$ over the respective endogenous grids $\vec{M}^0_t$ and $\vec{M}^1_t$ alongside the calculation of the optimal consumption functions $c_t(M,0)$ and $c_t(M,1)$, by substituting the latter into the Bellman equations (3) and (4). Using the interpolated decision-specific value functions, we then find the optimal retirement threshold (primary kink) $\bar{M}_t$ as the point of intersection of the two decision-specific value functions, $v_t(\bar{M}_t,0) = v_t(\bar{M}_t,1)$. The overall value function of the worker, $V_t(M)$, is then computed as the upper envelope of the two choice-specific value functions $v_t(M,d)$, each defined over the endogenous grid $\vec{M}^d_t$. Similarly, the overall consumption function of the worker, $c_{T-1}(M)$, is assembled from the choice-specific consumption functions $c_{T-1}(M,1)$ and $c_{T-1}(M,0)$, depending on whether the level of wealth $M$ is below or above the primary kink point $\bar{M}_{T-1}$, fully in line with formula (7) of Theorem 1 for $\tau = 1$.

So far DC-EGM seems to be a rather straightforward extension of standard EGM, but at period $T-2$ we encounter an important additional complication: the emergence of secondary kinks due to multiple local optima for $c$ in the Bellman equation (4). Recall that $V_t(M)$ is the maximum of decision-specific value functions and is not globally concave. In particular, $V_{T-1}(M)$ has a nonconcave region near $\bar{M}_{T-1}$, where the decision-specific value functions $v_{T-1}(M,0)$ and $v_{T-1}(M,1)$ cross. This implies that at time $T-2$, when we search over $c$ to maximize $\log(c) + \beta V_{T-1}(R(M-c)+y)$ in (4), for some levels of $M$ there will be multiple local optima for $c$, with correspondingly multiple solutions to the Euler equation. Thus, DC-EGM must also take care to select the solution to the Euler equation that corresponds to the globally optimal consumption value. This is achieved by calculating the upper envelope over the overlapping segments of the decision-specific value functions that are produced from different solutions. The dominated grid points are then eliminated from the endogenous grid in a way we describe below.

We illustrate this crucial step in the DC-EGM algorithm by showing how suboptimal endogenous grid points are eliminated in the worker's decision-specific value function $v_{T-2}(M,1)$ in Figure 1.¹⁰ With the parameter values listed in the Figure 1 legend, the primary kink in period $T-1$ is $\bar{M}_{T-1} = 30.4382$. In panel (a) we plot the maximand of the Bellman equation (4), $\log(c) + \beta V_{T-1}(R(M-c)+y)$, in period $T-2$ for different values of $M$. The kink in $V_{T-1}(M)$ and the nonconcave region around it translate directly into a kink and a nonconcave region around it in the maximand for various levels of $M$. The location of the latter kink depends on $M$; for example, in the lowest dotted line, plotted for $M = 28$, the kink at $\bar{M}_{T-1}$ in $V_{T-1}(M)$ induces a kink at $c = 17.5618$. In general, the kink in $c$ occurs at $c = M - (\bar{M}_{T-1} - y)/R$ and thus increases monotonically in $M$. With the kink at $c = M - (\bar{M}_{T-1} - y)/R$, the maximand $\log(c) + \beta V_{T-1}(R(M-c)+y)$ has multiple locally optimal values of $c$ on either side of the former.

9 Note that the credit constraint never binds in the retiree's problem.
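The two numbers quoted above, $\bar{M}_{T-1} = 30.4382$ and the kink at $c = 17.5618$ for $M = 28$, can be checked with a short Python sketch. It assumes log utility, $R = 1$, $\beta = 0.98$, $\delta = 1$, $y = 20$, and the closed-form period $T-1$ policies $c_{T-1}(M,1) = (M + y/R)/(1+\beta)$ and $c_{T-1}(M,0) = M/(1+\beta)$, so the working branch is only valid for $M \ge y/(R\beta)$. It mimics the DC-EGM step of locating $\bar{M}_{T-1}$ as the crossing of the two choice-specific value functions on a grid, using linear interpolation of their difference; the helper names are hypothetical.

```python
import numpy as np

R, BETA, DELTA, Y = 1.0, 0.98, 1.0, 20.0   # Figure 1 parameters

def v_work(M):
    """v_{T-1}(M, 1) on the unconstrained region M >= y/(R*beta)."""
    c = (M + Y / R) / (1.0 + BETA)
    return np.log(c) - DELTA + BETA * np.log(R * (M - c) + Y)

def v_retire(M):
    """v_{T-1}(M, 0): the retiree has no labor income at T."""
    c = M / (1.0 + BETA)
    return np.log(c) + BETA * np.log(R * (M - c))

def crossing(f, g, grid):
    """First point where f - g changes sign, by linear interpolation."""
    diff = f(grid) - g(grid)
    k = np.flatnonzero(np.sign(diff[:-1]) != np.sign(diff[1:]))[0]
    x0, x1 = grid[k], grid[k + 1]
    return x0 - diff[k] * (x1 - x0) / (diff[k + 1] - diff[k])

grid = np.linspace(Y / (R * BETA), 60.0, 4001)
M_bar = crossing(v_work, v_retire, grid)   # primary kink, about 30.4382
c_kink = 28.0 - (M_bar - Y) / R            # kink in c at M = 28, about 17.5618
```

In this special case the crossing also has a closed form, $\bar{M}_{T-1} = (y/R)/(\exp(\delta/(1+\beta)) - 1)$, since $v_{T-1}(M,1) - v_{T-1}(M,0) = (1+\beta)\log((M + y/R)/M) - \delta$.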
Panel (a) shows that for $M < 30.5626 = M^{r1}_{T-2}$ (notation of Theorem 1) the global optimum is to the right of the kink point, and vice versa for $M > M^{r1}_{T-2}$. At $M = M^{r1}_{T-2}$ the consumer is indifferent between the two locally optimal solutions. The multiplicity of locally optimal solutions for $c$ in the region near the secondary kink $M^{r1}_{T-2}$ is also reflected in multiple solutions to the corresponding Euler equation, as shown in panel (c) of Figure 1. The discontinuities in the Euler residual functions are located at the same kink points.

Panel (b) of Figure 1 shows the implied consumption function and endogenous grid $\vec{M}^1_{T-2}$ that result from the application of the standard EGM method in period $T-2$. We label each of the points $(M_{j,T-2}, c_{j,T-2})$ resulting from the application of EGM to the first 20 exogenous saving grid points $A_1 = 0$, $A_2 = 1$, up to $A_{20} = 19$. The striking result is that EGM produces a nonmonotonic endogenous grid $\vec{M}^1_{T-2}$, as indicated by the dotted line that connects $(M_{11,T-2}, c_{11,T-2}) = (35.76, 25.76)$ to the point $(M_{12,T-2}, c_{12,T-2}) = (26.98, 15.98)$. Evidently, this reflects both a discontinuity and a drop in both $M$ and $c$. Note also that EGM has produced a consumption correspondence rather than a consumption function, because of the two possible consumption values at the endogenous grid points $(M_{12,T-2},\dots,M_{18,T-2})$. In addition, the jump in this consumption correspondence at $M_{11,T-2}$ going backward to an endogenous grid point $M_{12,T-2}$

10 In fact, multiple solutions to the Euler equation cause the standard EGM loop to produce a "value correspondence" rather than a value function, while the elimination of suboptimal grid points converts this correspondence back to a proper function.

Figure 1. DC-EGM in period $T-2$ of the retirement problem, where $T = 20$. The plots illustrate how suboptimal endogenous points in the value function of the worker who in period $T-2$ decides to continue working, $v_{T-2}(M,1)$, are eliminated in the DC-EGM algorithm. Parameters of the model are $T = 20$, $R = 1$, $\beta = 0.98$, $\delta = 1$, and $y = 20$. Panel (a) plots the maximand of the Bellman equation (4); panel (b) shows the points of the optimal consumption functions computed by standard EGM, corresponding to the endogenous grid points in panel (d); panel (c) showcases the discontinuous Euler equation by plotting Euler residuals for several values of wealth $M$, corresponding to panel (a); panel (d) plots the "value correspondence" produced by standard EGM in the nonconcave region of the problem and the location of the kink point found by DC-EGM.

Theorem 2 (Monotonicity of the Saving Function). Let $A_t(M,d) = M - c_t(M,d)$ denote the savings function implied by the optimal consumption function $c_t(M,d)$. If $u(c)$ is a concave function, then for each $t \in \{1,\dots,T\}$ and each discrete choice $d \in \{0,1\}$ the optimal saving function $A_t(M,d) = M - c_t(M,d)$ is monotone nondecreasing in $M$.
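Theorem 2 yields a cheap diagnostic for a tabulated policy. The Python sketch below (the function name is hypothetical) checks that implied savings $M - c(M)$ never fall as $M$ rises. It also shows why the theorem permits downward jumps in consumption, like those at the secondary kinks, but rules out upward ones: a downward jump in $c$ is an upward jump in savings.

```python
import numpy as np

def savings_nondecreasing(M, c, tol=1e-12):
    """Theorem 2 check: A(M) = M - c(M) must be nondecreasing in M."""
    M = np.asarray(M, dtype=float)
    c = np.asarray(c, dtype=float)
    order = np.argsort(M)              # sort by wealth before differencing
    A = M[order] - c[order]
    return bool(np.all(np.diff(A) >= -tol))

M = np.linspace(0.0, 60.0, 601)
# closed-form worker policy at T-1 with R = 1, beta = 0.98, y = 20
c_T1 = np.minimum(M, (M + 20.0) / 1.98)
c_drop = np.where(M < 30.0, 0.5 * M, 0.5 * M - 2.0)   # downward jump in c
c_rise = np.where(M < 30.0, 0.5 * M, 0.5 * M + 2.0)   # upward jump in c
```

The first two policies pass the check; the third, whose consumption jumps up at $M = 30$, fails it.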

The proof of a more general version of Theorem 2 for arbitrary DC models is given in Appendix A. It implies that the nonmonotonic endogenous grid $\vec{M}^1_{T-2}$ illustrated in panel (b) of Figure 1 is inconsistent with an optimal solution to the problem. How can this be rectified? In panel (d) of Figure 1 we illustrate the key second step of DC-EGM: refinement of the endogenous grid to discard the suboptimal points produced by the EGM step. This is achieved by constructing the upper envelope over the segments of the discrete choice-specific value function correspondence in the region of $M$ where multiple solutions were detected. The detection itself relies on checking the endogenous grid for monotonicity. In panel (d) of Figure 1, $M_{12,T-2}$ […] $M^{r1}_{T-2}$ optimal consumption is given by the lower line marked with "◦" symbols. There is a discontinuous downward jump in consumption from endogenous grid point $M_{7,T-2}$ to the next point $M_{15,T-2}$, unless the kink point $M^{r1}_{T-2}$ is also added, marking the point of vertical drop in the consumption function. Using this refined monotonic endogenous grid $\vec{M}^{*1}_{T-2}$ (even with a small number of grid points $J$), DC-EGM produces very accurate approximations to the true solutions $V_{T-2}(M)$ and $c_{T-2}(M)$ that capture both the kink in the former and the discontinuity in the latter.

This completes our description of DC-EGM: the procedure is repeated for all periods $t$ to solve the retirement problem via backward induction on the Euler and Bellman equations. In the next section we verify that DC-EGM produces accurate approximations to $c_t(M,d)$, $d = 1$, at all periods $t \in \{1,\dots,T\}$. It also generates accurate estimates of the secondary kink points $M^{rj}_t$, which capture discontinuous reductions in consumption reflecting the anticipated primary kink $\bar{M}_{T-1}$ under the optimal consumption policy.
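The upper-envelope refinement can be illustrated with a small Python sketch. It is a simplified stand-in, not the paper's implementation: each monotone segment of the value correspondence is represented by hypothetical arrays `(M, v, c)`, every segment is linearly interpolated on a common wealth grid (without extrapolation), and at each grid point the segment with the highest value is kept, so dominated points drop out. A bisection on the difference of two segments then locates the kink where the uppermost segments intersect, which is the point DC-EGM adds to the refined grid.

```python
import numpy as np

def upper_envelope(segments, grid):
    """segments: list of (M, v, c) arrays, each with increasing M.
    Returns envelope value, consumption, and index of the winning segment."""
    V = np.full((len(segments), grid.size), -np.inf)
    C = np.full((len(segments), grid.size), np.nan)
    for k, (M, v, c) in enumerate(segments):
        inside = (grid >= M[0]) & (grid <= M[-1])   # no extrapolation
        V[k, inside] = np.interp(grid[inside], M, v)
        C[k, inside] = np.interp(grid[inside], M, c)
    best = np.argmax(V, axis=0)
    cols = np.arange(grid.size)
    return V[best, cols], C[best, cols], best

def intersection(seg_a, seg_b, lo, hi, tol=1e-10):
    """Bisection on v_a - v_b over [lo, hi], assuming one sign change."""
    def diff(x):
        return np.interp(x, seg_a[0], seg_a[1]) - np.interp(x, seg_b[0], seg_b[1])
    f_lo = diff(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = diff(mid)
        if np.sign(f_mid) == np.sign(f_lo):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# toy correspondence: two crossing segments over the same wealth range
seg1 = (np.array([0.0, 10.0]), np.array([0.0, 10.0]), np.array([1.0, 6.0]))
seg2 = (np.array([0.0, 10.0]), np.array([5.0, 6.0]), np.array([4.0, 4.0]))
grid = np.linspace(0.0, 10.0, 6)
v_env, c_env, which = upper_envelope([seg1, seg2], grid)
kink = intersection(seg1, seg2, 0.0, 10.0)   # 50/9: where 5 + 0.1*M = M
```

Evaluating on a fixed grid is the simplest way to see which points are dominated; adding the computed `kink` to the grid is what lets linear interpolation reproduce the vertical drop in consumption.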

2.3 Numerical performance of the DC-EGM

Figure 2 displays the optimal consumption function (7) and compares it to the numerical solution produced by DC-EGM, as well as the numerical solution produced by a naive brute-force implementation of value function iteration (VFI). VFI solves the Bellman equations (3) and (4) by backward induction over an exogenous grid on $M$, using numerical optimization to search for optimal consumption at each $M$ grid point.¹¹ With a sufficient number of grid points, DC-EGM is able to accurately locate all the discontinuities of the analytical consumption rules ($M^{rj}_t$) and the boundary of the credit-constrained region $y/(R\beta)$. Yet, because the kink points associated with the credit constraint in future periods, $M^{lj}_t$, are not located precisely by DC-EGM, panel (b) of Figure 2 shows small relative errors on the order of $10^{-4}$ in the intervals $(y/(R\beta), M^{l,\tau-1}_{T-\tau})$ in each period $T-\tau$. Overall, the numerical solution by DC-EGM replicates the analytical solution remarkably well.¹²

Panels (c) and (d) of Figure 2 show the solution produced by the VFI method, where the optimal consumption levels are found by a fine grid search. This implementation of VFI could admittedly be thought of as too simplistic, with possible improvements in how the grid points are located and spaced, which computational methods are employed to search for optimal consumption at each grid point, and so forth. Yet, the point we wish to make is that a standard "off the shelf" version of the VFI method may have serious difficulties when solving DC problems, due to its failure to adequately capture the secondary kinks in the value function, which get "papered over" by naive application of the standard method of linear interpolation of the value functions. The bottom panels of Figure 2 show that the VFI solution results in significant approximation errors and is unable to fully capture the numerous discontinuities in the consumption function.

Figure 2. Optimal consumption functions. The plots show optimal consumption rules of the worker in the consumption–savings model with $R = 1$, $\beta = 0.98$, $y = 20$, and $T = 20$. Panel (a) illustrates the analytical solution (which is indistinguishable from the numerical solution produced by DC-EGM), panel (b) illustrates the numerical error of the solution found by DC-EGM, panel (c) shows the numerical solution found by VFI, and panel (d) shows the associated numerical errors. Both the VFI and DC-EGM solutions were generated using 2000 points in the $M$-grid. For VFI, grid points are equally spaced, the maximum level of wealth is 600, and 10,000 equally spaced points of consumption between zero and $M_t$ are used to solve the maximization problem in the Bellman equation.

11 Simple linear interpolation of the value function at the exogenous grid points over $M$ was used to implement numerical optimization for values of $c$ where the implied next period resources $R(M-c)+y$ do not lie on the predefined exogenous grid over $M$. To enable a fair comparison with DC-EGM, we used the analytical expressions for the value functions and consumption functions in the retiree problem.

Figure 3 plots the optimal consumption functions and simulated consumption paths under the same assumptions as in Figure 2, except that in this case we set $R = 1/\beta = 1.02$.
The theoretical prediction for the consumption–savings model without retirement is that, with $R\beta = 1$, simulated consumption paths should be completely flat. Yet, the consumption functions shown in the left panel display numerous discontinuities that accumulate backward from the final period $T = 20$. Beyond the important economic message that discontinuous consumption functions are not incompatible with consumption smoothing, this also illustrates the remarkable precision of the DC-EGM algorithm. In fact, when we simulate consumption trajectories implied by this highly complex numerical solution, the simulated consumption profiles are still perfectly flat. Before we describe in detail how DC-EGM works, we now illustrate how the incorporation of various types of uncertainty, including extreme value taste shocks, renders the accumulation of kinks in the value function and discontinuities in the consumption function considerably less severe.
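The flat-path prediction stated above is a one-line consequence of the Euler equation. A worked version, assuming log utility, no retirement choice, no uncertainty, and an interior (unconstrained) solution in periods $t$ and $t+1$:

```latex
\frac{1}{c_t} = \beta R \,\frac{1}{c_{t+1}}
\quad\Longrightarrow\quad
c_{t+1} = \beta R \, c_t ,
\qquad\text{so}\quad R\beta = 1 \;\Longrightarrow\; c_{t+1} = c_t .
```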

12 With 2000 points on the endogenous grid over wealth, it took our Matlab/C implementation around 0.17 seconds on a Lenovo ThinkPad laptop with an Intel® Core™ i7-4600M central processing unit (CPU) at 2.10 GHz and 8 GB random access memory (RAM) to generate the numerical solution by DC-EGM. This is about 20 times faster than VFI, which we implemented in Matlab with 500 fixed grid points over wealth. The discretization of consumption is a brute force approach to ensure that the global optimum is found. We used 400 equally spaced guesses for each level of wealth. That EGM offers a speedup of 1–2 orders of magnitude relative to VFI is a well established finding in the literature; see, for example, Barillas and Fernández-Villaverde (2007), Jørgensen (2013), Fella (2014), and Ameriks et al. (2015).

Figure 3. Discontinuous consumption function and smooth consumption paths. The plots show optimal consumption functions of the worker in the consumption–savings model with $T = 20$, $d_t = 1$, $y = 20$, $\beta = 0.98$, and $R = 1/\beta = 1.02$. The left panel illustrates the solution for $t = 1, 10, 18$, while the right panel presents consumption paths simulated over the whole life cycle for several initial levels of wealth. The model was solved by the DC-EGM algorithm.

3. DC-EGM for problems with taste shocks

In this section we introduce income and taste shocks into the consumption and retirement model and show how DC-EGM is modified to accommodate these shocks. We show how primary and secondary kinks are smoothed in the presence of the random factors in the model, and explain how the numerical solution is simplified even though the problem remains nonconvex in general.

Three effects are at play. First, because the discrete choice policy is expressed in probabilistic terms, the calculation of primary kink points is no longer needed. Second, as a result, the process of accumulation of secondary kinks is perturbed: the perturbations caused by the primary kinks, which remain throughout the backward induction process in the deterministic setting, "fade out" in the presence of shocks. Third, the calculation of expectations over random income in the problem with taste shocks can be performed with standard numerical algorithms, as opposed to the setting with random income but without taste shocks. We discuss each of these points in detail below.

3.1 Taste and income shocks in the retirement problem

Consider an extension of the model presented in Section 2 where the consumer faces income uncertainty and where choices are affected by discrete choice-specific taste shocks. More specifically, assume that income when working is $y_t = y\eta_t$, where $\eta_t$ is a log-normally distributed multiplicative idiosyncratic income shock, $\log\eta_t \sim N(-\sigma^2_\eta/2, \sigma^2_\eta)$.¹³ Following a vast literature on discrete choice modeling originating with McFadden (1973), we assume that the random choice-specific taste shifters $\sigma_\varepsilon\varepsilon_t(d)$, $d \in \{0,1\}$, are additively separable, i.i.d., and have an extreme value distribution with scale parameter $\sigma_\varepsilon$.
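Expectations over $\eta$, such as the integral in (13), are typically computed with a quadrature rule. A minimal Python sketch using Gauss–Hermite nodes from NumPy (`numpy.polynomial.hermite.hermgauss`), with the mean of $\log\eta$ set to $-\sigma^2_\eta/2$ as above so that $E[\eta] = 1$; the function name and node count are illustrative assumptions. This is one standard choice; the experiment in Figure 5 instead uses equiprobable discrete points.

```python
import numpy as np

def income_shock_nodes(sigma_eta, n=10):
    """Nodes and probability weights for eta with log(eta) ~ N(-s^2/2, s^2)."""
    x, w = np.polynomial.hermite.hermgauss(n)   # rule for weight exp(-x^2)
    # change of variables: log(eta) = mu + sqrt(2)*sigma*x with mu = -sigma^2/2
    eta = np.exp(-0.5 * sigma_eta**2 + np.sqrt(2.0) * sigma_eta * x)
    return eta, w / np.sqrt(np.pi)              # weights now sum to one

eta, w = income_shock_nodes(np.sqrt(0.005))
```

An expectation $\int g(\eta) f(d\eta)$ is then approximated by `np.dot(w, g(eta))`.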

13 As mentioned above, we follow the literature in assuming that idiosyncratic income shocks are realized after the labor supply choice is made, which is equivalent to allowing income to depend on a lagged choice of labor supply.

These shocks can be interpreted in the structural sense as the information relevant for the discrete choices that is observed by the agents but not by the econometrician. In this case, the scale parameter $\sigma_\varepsilon$ can be estimated from the data. Alternatively, if the true model is deterministic, $\sigma_\varepsilon$ can be interpreted as a (logit) smoothing parameter that can be chosen to approximate the true model with arbitrary precision; see Theorem 3 below.

Because discrete choice-specific taste shocks as well as income shocks enter only the worker problem, the solution of the retiree problem (5) remains the same. As for the worker problem, the inclusion of taste shocks requires us to rewrite the Bellman equation (2) as

$$V_t(M,\varepsilon) = \max\big\{v_t(M,0) + \sigma_\varepsilon\varepsilon(0),\; v_t(M,1) + \sigma_\varepsilon\varepsilon(1)\big\}, \qquad (12)$$

where the value function conditional on the choice to retire, $v_t(M,0)$, is still given by (3). However, the value function conditional on the choice to remain working, $v_t(M,1)$, is modified to account for the taste and income shocks in the following period, namely

$$v_t(M,1) = \max_{0\le c\le M}\Big\{\log(c) - \delta + \beta\int EV^{\sigma_\varepsilon}_{t+1}\big(R(M-c)+y\eta,\,1\big)\,f(d\eta)\Big\}. \qquad (13)$$

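Because the taste shocks are independent extreme value distributed random variables, the expected value function $EV^{\sigma_\varepsilon}_{t+1}$ is given by the well known log-sum formula (McFadden (1973)):

$$EV^{\sigma_\varepsilon}_{t+1}(M,1) = E\max\big\{v_{t+1}(M,0)+\sigma_\varepsilon\varepsilon(0),\; v_{t+1}(M,1)+\sigma_\varepsilon\varepsilon(1)\big\} = \sigma_\varepsilon\log\big[\exp\big(v_{t+1}(M,0)/\sigma_\varepsilon\big) + \exp\big(v_{t+1}(M,1)/\sigma_\varepsilon\big)\big]. \qquad (14)$$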
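Evaluated naively, the exponentials in (14) overflow once $v/\sigma_\varepsilon$ is large, which is exactly the small-$\sigma_\varepsilon$ case of interest. A minimal Python sketch of (14) exactly as written (no extra location constant), using NumPy's `logaddexp` for numerical stability; the function name is illustrative.

```python
import numpy as np

def logsum(v0, v1, sigma):
    """Expected max in (14): sigma*log(exp(v0/sigma) + exp(v1/sigma)).

    np.logaddexp avoids overflow when v/sigma is large (small sigma)."""
    return sigma * np.logaddexp(v0 / sigma, v1 / sigma)
```

For small $\sigma_\varepsilon$ the result approaches $\max(v_0, v_1)$, and when the two values are equal it equals the common value plus $\sigma_\varepsilon\log 2$; for very large $\sigma_\varepsilon$ the ratio `logsum / sigma` approaches $\log 2$.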

The immediate effect of introducing extreme value taste shocks is the complete elimination of the primary kinks, because the location of the indifference point in (12) is now probabilistic from the point of view of the econometrician. Instead of calculating the location of the primary kink $\bar{M}_t$ in the value function of the worker $V(M)$, the discrete choice policy function is now given by the logit choice probabilities $P_t(d|M)$ that arise from the distributional assumption for the taste shocks:

$$P_t(d|M) = \frac{\exp\big(v_t(M,d)/\sigma_\varepsilon\big)}{\exp\big(v_t(M,1)/\sigma_\varepsilon\big) + \exp\big(v_t(M,0)/\sigma_\varepsilon\big)}, \qquad d \in \{0,1\}. \qquad (15)$$
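A Python sketch of (15) for the working alternative, written as $\exp(v_1/\sigma_\varepsilon - \text{logsum}/\sigma_\varepsilon)$ so the exponent is never positive and nothing overflows for small $\sigma_\varepsilon$ (function name illustrative). Applied to the terminal-period values $v_T(M,0) = \log M$ and $v_T(M,1) = \log M - \delta$, it returns the constant $1/(1 + \exp(\delta/\sigma_\varepsilon))$, the terminal probability used in Section 3.2.

```python
import numpy as np

def prob_work(v0, v1, sigma):
    """P_t(1|M) from (15), computed as exp(v1/sigma - logsum/sigma).

    The exponent is never positive, so nothing overflows for small sigma."""
    return np.exp(v1 / sigma - np.logaddexp(v0 / sigma, v1 / sigma))

M = np.linspace(1.0, 50.0, 50)
sigma, delta = 0.05, 1.0
# terminal period: v_T(M,0) = log(M) and v_T(M,1) = log(M) - delta
p1_T = prob_work(np.log(M), np.log(M) - delta, sigma)
```

Swapping the arguments gives $P_t(0|M)$, so the two probabilities sum to one by construction.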

Because $\varepsilon$ is unobserved, it becomes impractical to carry out the calculations in terms of the overall value function $V(M)$ and the overall consumption function $c(M)$. Instead, in the presence of taste shocks, DC-EGM operates on the discrete choice-specific value functions $v(M,d)$ and consumption functions $c(M,d)$, and computes the choice probabilities (15) at each time period $t$. It is worth noting that the problem is still not globally concave in general, so the upper envelope calculation and the elimination of the suboptimal endogenous grid points are still performed as described in Section 2.2. However, when $\sigma_\varepsilon$ is sufficiently large, the value function $V_t(M)$ in (12) eventually becomes globally concave. To see this, note that as the variance of the taste shocks increases, the choice-specific value functions are dominated by the noise and the differences between the alternatives become relatively less important. In turn, the components in (12) become similar, and $\lim_{\sigma_\varepsilon\uparrow\infty} EV^{\sigma_\varepsilon}_t(M)/\sigma_\varepsilon = \log(2)$. It then follows from (12) that the value function $v_t(M,1)$ inherits its globally concave shape from the utility function.

The income shocks¹⁴ in the model also smooth out the secondary kinks during backward induction. When the agent cannot perfectly anticipate having next period wealth exactly at the kink point, the secondary kinks are not replicated perfectly in the prior periods. Together, these two sources of uncertainty make the numerical solution significantly easier by reducing the number of times the secondary envelope routine is called by DC-EGM to refine the nonmonotonic endogenous grid, as described in Section 2.2 and Algorithm 2 below.

Figure 4 shows the consumption function $c_t(M,1)$ for a worker conditional on the choice to continue working for different values of the taste shock scale parameter $\sigma_\varepsilon \in \{0, 0.01, 0.05, 0.10, 0.15\}$. The left panel plots the optimal consumption in the absence of income uncertainty ($\sigma_\eta = 0$), while income uncertainty ($\sigma_\eta = \sqrt{0.005}$) is added in the right panel. The plots are drawn for period $T-5$, corresponding to four discontinuities of the choice-specific policy function (in line with Theorem 1, not counting the one at the primary kink point $\bar{M}_{T-5}$). It is evident that taste shocks of larger scale ($\sigma_\varepsilon \ge 0.05$) smooth the function completely, eliminating all four discontinuities in the consumption function. Yet, for $\sigma_\varepsilon = 0.01$ only the last (rightmost) discontinuity is visibly smoothed out. When the model has other stochastic elements, such as wage shocks or random market returns, the accumulation of secondary kinks may be less pronounced due to the additional smoothing. Yet, in the absence of taste shocks, the primary kinks cannot be avoided even if all secondary kinks are eliminated by a sufficiently high degree of uncertainty in the model. It is in this setup, which also appears to be the one most used in practical applications, that the introduction of extreme value distributed taste shocks is especially beneficial. The taste shocks and other structural shocks together reduce the number of secondary kinks and alleviate the issue of their multiplication and accumulation. It is clear from the right panel of Figure 4 that discontinuities in the consumption function can be eliminated with a smaller taste shock ($\sigma_\varepsilon = 0.01$) when additional smoothing through other types of uncertainty is present in the model.

Because the expected value function in (14) is a smooth function of $M$ around the point where $v_t(M,1) = v_t(M,0)$, the maximand in (13) is also smooth in this region. This leads to an additional benefit of including taste shocks, namely that standard numerical integration algorithms for smooth functions can be used when calculating the integral in (13). When $\sigma_\varepsilon = 0$, using procedures like Gaussian quadrature typically results in spurious discontinuities, as shown in the left panel of Figure 5. This is due to the integrand not being a smooth function; see Appendix B for a detailed discussion. The right panel of Figure 5 illustrates how moderate smoothing ($\sigma_\varepsilon = 0.05$) significantly reduces this approximation error and removes the artificial kinks.

When the taste shocks $\varepsilon_t$ are interpreted as stochastic noise introduced to help solve a difficult DC dynamic program by making it smoother, $\sigma_\varepsilon$ is the amount of smoothing and has to be chosen and fixed prior to estimation. Theorem 3 shows that the level of $\sigma_\varepsilon$ can always be chosen in such a way that the perturbed model approximates the underlying deterministic model with an arbitrary degree of precision.

Figure 4. Optimal consumption rules for the worker who remains working, $c_t(M,1)$. The plots show optimal consumption rules of the worker who decides to continue working in the consumption–savings model with retirement in period $t = T-5$ for a set of taste shock scales $\sigma_\varepsilon$ in the absence of income uncertainty, $\sigma_\eta = 0$ (left panel), and in the presence of income uncertainty, $\sigma_\eta = \sqrt{0.005}$ (right panel). The rest of the model parameters are $R = 1$, $\delta = 1$, $\beta = 0.98$, and $y = 20$.

14 Equivalently, random returns or other stochastic factors in the intertemporal budget constraint.

Figure 5. Artificial discontinuities in consumption functions, $\sigma^2_\eta = 0.01$, $t = T-3$. The figure illustrates how the number of discrete points used to approximate expectations regarding future income affects the consumption functions from value function iteration (VFI) and DC-EGM. Panel (a) illustrates how using few (10) discrete equiprobable points to approximate expectations produces severe approximation error when there are no taste shocks. Panel (b) illustrates how moderate smoothing ($\sigma_\varepsilon = 0.05$) significantly reduces this approximation error.

Theorem 3 (Extreme Value Homotopy Principle). In every time period, the (expected) value function of the consumption and retirement problem with extreme value taste shocks, $EV^{\sigma_\varepsilon}_t(M,1)$ defined in (14), converges uniformly to the value function of the same problem without taste shocks, $V_t(M,1)$ defined in (2), as the scale of these shocks approaches zero. The uniform bound

$$\forall t:\quad \sup_{M\ge 0}\big|EV^{\sigma_\varepsilon}_t(M,1) - V_t(M,1)\big| \le \sigma_\varepsilon\sum_{j=0}^{T-t}\beta^j\log(2) \qquad (16)$$

holds. Consequently, as $\sigma_\varepsilon \downarrow 0$, both the continuous and the discrete decision rules of the smoothed model with taste shocks converge pointwise to those of the deterministic model.
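A quick numerical look at the ingredients of (16), offered as intuition rather than as the proof in Appendix D. In a single period, the log-sum (14) exceeds the maximum of the two choice-specific values by $\sigma_\varepsilon\log(1 + e^{-|v_0 - v_1|/\sigma_\varepsilon})$, which lies between $0$ and $\sigma_\varepsilon\log 2$; discounting such per-period gaps over the remaining $T - t + 1$ periods gives the right-hand side of (16). The Python function name is hypothetical.

```python
import numpy as np

def homotopy_bound(sigma, beta, T, t):
    """Right-hand side of (16): sigma * log(2) * sum_{j=0}^{T-t} beta**j."""
    j = np.arange(T - t + 1)
    return sigma * np.log(2.0) * np.sum(beta ** j)

# one-period building block: 0 <= logsum - max <= sigma*log(2) for two choices
rng = np.random.default_rng(0)
v0, v1 = rng.normal(size=1000), rng.normal(size=1000)
sigma = 0.05
gap = sigma * np.logaddexp(v0 / sigma, v1 / sigma) - np.maximum(v0, v1)
```

Because the bound is linear in $\sigma_\varepsilon$, halving the taste-shock scale halves the worst-case gap in every period.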

In Appendix D we prove a more general version of Theorem 3 that holds under weak conditions for arbitrary DC models. Theorem 3 formalizes the results presented graphically in Figure 4, and justifies our claim that extreme value smoothing can be regarded as a homotopy method for solving the nonsmooth limiting problem without taste shocks by solving a smooth "nearby" problem with extreme value taste shocks. The extreme value scale parameter $\sigma_\varepsilon$ plays the role of the "homotopy parameter."

3.2 Extending DC-EGM for taste shocks and income uncertainty

In the presence of taste shocks and income uncertainty, the DC-EGM algorithm remains largely the same, and is potentially even simpler, because the calculation of the primary kinks $\bar{M}_t$ is replaced by the calculation of choice probabilities using formula (15). The biggest difference is that with taste shocks, DC-EGM operates on the choice-specific values, as mentioned in Section 3.1. We present the pseudo-code of the full algorithm in this section.

If we continue to assume that retirement is an absorbing state, the problem of the retiree remains the same, and we focus again on the worker's problem. With taste and income shocks it is given by (12), (3), and (13), with the modified "smoothed" Euler equation taking the form¹⁵

$$\begin{aligned} 0 &= u'(c) - \beta R\int\Big[u'\big(c_{t+1}(R(M-c)+y\eta,1)\big)\,P_{t+1}\big(1|R(M-c)+y\eta\big) + u'\big(c_{t+1}(R(M-c)+y\eta,0)\big)\,P_{t+1}\big(0|R(M-c)+y\eta\big)\Big]f(d\eta) \\ &= \frac{1}{c} - \beta R\int\Big[\frac{P_{t+1}\big(1|R(M-c)+y\eta\big)}{c_{t+1}\big(R(M-c)+y\eta,1\big)} + \frac{P_{t+1}\big(0|R(M-c)+y\eta\big)}{c_{t+1}\big(R(M-c)+y\eta,0\big)}\Big]f(d\eta), \end{aligned} \qquad (17)$$

where $P_t(1|M)$ and $P_t(0|M)$ are the choice probabilities (15), and the last line is written for the case $u(c) = \log(c)$. Given the next period choice-specific consumption functions of the worker, $c_{t+1}(M,0)$ and $c_{t+1}(M,1)$, we solve (17) via the same backward induction process as described in Section 2.2 for the retirement problem without stochastic shocks. As before, the induction starts at the terminal period $T$ with the easily derived consumption functions $c_T(M,0) = c_T(M,1) = M$, choice-specific value functions $v_T(M,0) = u(M) = v_T(M,1) + \delta$, and the probability of remaining working $P_T(1|M) = 1/(1+\exp(\delta/\sigma_\varepsilon))$. Then, working backward, we choose an exogenous grid over saving $\vec{A} = (A_1,\dots,A_J)$ (which remains fixed throughout the backward induction process here for notational simplicity) and compute optimal consumption $\{c_t(A_1,d),\dots,c_t(A_J,d)\}$ for each point $A_j$ and each discrete choice $d$ by calculating the inverse marginal utility of the right hand side of the Euler equations (17) and (11) for $d = 1$ and $d = 0$, respectively. Using these consumption functions, we construct the endogenous grid over $M$ as $\vec{M}^d_t = (M^d_{1t},\dots,M^d_{Jt})$, where $M^d_{jt} = c_t(A_j,d) + A_j$, $j \in \{1,\dots,J\}$. For every $d$, if the resulting grid points form a monotonically increasing sequence, then no violation of monotonicity of the saving function as per Theorem 2 is indicated, and DC-EGM automatically reverts to the standard EGM method of Carroll (2006). However, if $\vec{M}^d_t$ is not a monotonically increasing sequence, we apply the same upper envelope procedure as described in Section 2.2 to eliminate the suboptimal elements of $\vec{M}^d_t$ and add a point where the disjoint segments of the value function intersect. The last step requires calculating the choice-specific value functions $v(M,d)$ alongside the consumption functions, which is achieved by plugging the computed $c_t(A_j,d)$ into the maximand of the Bellman equation (13) for each point $A_j$ of the exogenous grid on savings. After the choice-specific consumption and value functions $c(M,d)$ and $v(M,d)$ are computed on the monotonic endogenous grids $\vec{M}^{*d}_t$, the period $t$ iteration of the DC-EGM algorithm is complete.

15 See Lemma 1 in Appendix A.
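A minimal Python sketch of the EGM step for $d = 1$ with taste and income shocks, inverting the log-utility line of (17). The next-period policies are passed as hypothetical callables `c_next(M, d)` and `p_work_next(M)`, and `(eta, w)` are quadrature nodes and probability weights for the income shock (for instance, Gauss–Hermite). Computing $v(M,d)$ on the grid and the upper envelope are omitted; the function only reports whether the endogenous grid is monotone, which is the trigger for the upper envelope described above.

```python
import numpy as np

def egm_step_work(A, c_next, p_work_next, eta, w, R=1.0, beta=0.98, y=20.0):
    """Invert the smoothed Euler equation (17) for log utility and d = 1.

    Returns the endogenous grid M_j = c_j + A_j, consumption c_j, and a flag
    telling whether the grid is monotone (if not, run the upper envelope)."""
    A = np.asarray(A, dtype=float)
    M_next = R * A[:, None] + y * eta[None, :]   # rows: A_j, columns: eta nodes
    p1 = p_work_next(M_next)
    marg = p1 / c_next(M_next, 1) + (1.0 - p1) / c_next(M_next, 0)
    rhs = beta * R * (marg @ w)                  # beta*R*E[...] per grid point
    c = 1.0 / rhs                                # (u')^{-1}(x) = 1/x for log
    M = c + A
    return M, c, bool(np.all(np.diff(M) > 0))

# terminal period as the starting point: c_T(M, d) = M, P_T(1|M) constant
sigma, delta = 0.05, 1.0
p_T = 1.0 / (1.0 + np.exp(delta / sigma))
A = np.linspace(0.0, 50.0, 51)
M, c, monotone = egm_step_work(
    A,
    c_next=lambda M_next, d: M_next,
    p_work_next=lambda M_next: np.full_like(M_next, p_T),
    eta=np.array([1.0]),
    w=np.array([1.0]),
)
```

With a single income node and $c_T(M,d) = M$, the probabilities cancel and the step reduces to $c_{T-1}(A_j,1) = (RA_j + y)/(\beta R)$, the deterministic EGM result of Section 2.2.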
Unlike in Section 2.2, in the presence of taste shocks the calculation of the upper envelope over the choice-specific value functions is not performed. Instead, the discrete choice probabilities may be calculated using (15) for any level of wealth $M$ by interpolating the choice-specific values $v(M,0)$ and $v(M,1)$. We continue this procedure from period $T-1$ backward to period $t = 1$, at which point we have fully approximated the solution to the life-cycle retirement problem by DC-EGM. Note that, as before, in the formulation with taste shocks none of the steps of the DC-EGM algorithm require iterative root-finding operations.

Algorithms 1, 2, and 3 provide the pseudo-code for the complete DC-EGM algorithm for the problems with discrete shocks. The full DC-EGM algorithm (Algorithm 1) invokes the EGM step (Algorithm 2) repeatedly to compute the value function correspondences for all discrete choices, and then finds and removes all suboptimal points on the returned endogenous grids by calling the upper envelope module (Algorithm 3). By Theorem 3, it can also approximate the solution for problems without taste shocks if the scale parameter $\sigma_\varepsilon$ is fixed at a sufficiently small value.

While DC-EGM is similar to the approach proposed in Fella (2014), we explicitly allow for extreme value type I taste shocks to preferences and show how they help with the computational issues specific to models of discrete-continuous choice. The approach in Fella (2014) does not readily apply to the class of models with taste shocks but should be adjusted along the lines described here. In particular, DC-EGM operates with discrete choice-specific value functions and optimal consumption rules,

Algorithm 1 The DC-EGM algorithm
Input: Structural parameters, utility function $u(c)$, number of time periods $T$, number of grid points $J$, upper bound on wealth $\bar{M}$.
1: Fix the grid over savings $\vec{A} = \{A_1,\dots,A_J\}$ such that $A_1 = 0$ and $A_j$

Algorithm 2 The EGM step, adaptation of the standard EGM algorithm (Carroll (2006))
Input: Structural parameters, utility function $u(c)$, current period $t$

3.3 Credit constraints

Before turning to the Monte Carlo results, we briefly discuss how DC-EGM handles the credit constraints, $c \le M$. During the EGM step, the credit constraints are dealt with in exactly the same manner as in Carroll (2006), as described in Section 2.2. Let the smallest possible end-of-period resources $A_1 = 0$ be the first point in the exogenous grid over saving $\vec{A}$. Assuming that the corresponding point of the endogenous grid $M_t(A_1,d)$ is nonnegative,¹⁶ it holds that $A(M,d) = 0$ for all $M \le M_t(A_1,d)$ due to the monotonicity of the saving function $A(M,d) = M - c_t(M,d)$ (see Theorem 2). Therefore, the optimal consumption in this region is given by $c_t(M,d) = M$, and the choice-specific value function is

$$v_t(M,d) = \log(M) - d\delta + \beta\int EV_{t+1}(d\,y\eta)\,f(d\eta), \qquad M \le M_t(A_1,d). \qquad (18)$$

Note that the third component of (18) is the expected value of having zero savings. It is calculated within the EGM step for the point $A_1 = 0$, and should be saved separately as a constant that depends on $d$ but not on $M$. Once this constant is computed, $v_t(M, d)$ essentially has analytical form in the interval $[0, M_t(A_1, d)]$, and thus can be directly evaluated at any point.

When the per-period utility function is additively separable in consumption and discrete choice as in the retirement model we consider, (18) holds for all $d$ in the interval $0 \le M \le \min_d M_t(A_1, d)$. In other words, the choice-specific value functions for low wealth have the same shape, which is shifted vertically with $d_t$-specific coefficients. This implies that the logistic choice probabilities $P_t(d \mid M)$ are constant in this interval and have to be calculated only once.

16 It is not hard to show that this holds as long as the per-period utility function satisfies the Inada conditions.

Algorithm 3 Upper envelope
Input: Endogenous grid $\vec M_t^d$, choice-specific consumption and value functions $c_t(\vec M_t^d, d)$, $v_t(\vec M_t^d, d)$ calculated on $\vec M_t^d$
1: for $j = 1, \dots, J-1$ do
2:   if $M_{j,t}^d > M_{j+1,t}^d$ then ▷ Detect nonmonotonicity in the endogenous grid
3:     Find $j' > j$ such that $M_{j'+1,t}^d > M_{j',t}^d$
4:     Define partitions $N_1 = \{1, \dots, j\}$, $N_2 = \{j, \dots, j'\}$, $N_3 = \{j', \dots, J\}$
5:     Run upper envelope calculation over segments of the “value function correspondence” $v_t(N_1, d)$, $v_t(N_2, d)$ and $v_t(N_3, d)$ computed on the partitions in the previous step, as described in Section 2.2 and illustrated in Figure 1, panels (b) and (d).
6:     Determine a set of suboptimal grid points $Q$
7:     Refine the endogenous grid by removing suboptimal points, so that $\vec M_t^{*d} = \vec M_t^d \setminus \vec M_t^Q$
8:     Remove the corresponding points $c_t(\vec M_t^Q, d)$, $v_t(\vec M_t^Q, d)$
9:     [Optional] Add the kink point(s) $\bar M$ where the uppermost segments in Step 5 intersect and add the corresponding interpolated values of $c_t(\bar M, d)$, $v_t(\bar M, d)$
10:  end if
11: end for
Output: Refined monotonic grid $\vec M_t^{*d}$, consumption and value functions $c_t(\vec M_t^{*d}, d)$, $v_t(\vec M_t^{*d}, d)$ calculated on $\vec M_t^{*d}$
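The pruning idea of Algorithm 3 can be sketched in a few lines. This is a brute-force illustration, not the authors' implementation: it splits the endogenous grid into monotone runs (playing the role of the partitions $N_1, N_2, N_3$), marks a point as suboptimal (the set $Q$) when the linear interpolant of another run lies strictly above it at the same $M$, and skips the optional kink-point Step 9. Its cost is $O(J^2)$, and the functions `monotone_segments` and `upper_envelope` as well as the example data are hypothetical.

```python
import numpy as np

def monotone_segments(m):
    """Split grid indices into maximal monotone runs; consecutive runs share a turning point."""
    segments, start = [], 0
    direction = np.sign(m[1] - m[0])
    for j in range(1, len(m) - 1):
        step = np.sign(m[j + 1] - m[j])
        if step != 0 and step != direction:
            segments.append(np.arange(start, j + 1))
            start, direction = j, step
    segments.append(np.arange(start, len(m)))
    return segments

def upper_envelope(m, c, v, tol=1e-12):
    """Drop endogenous grid points dominated by another run at the same M, then sort."""
    m, c, v = (np.asarray(x, dtype=float) for x in (m, c, v))
    if len(m) < 2 or np.all(np.diff(m) > 0):
        return m, c, v                        # grid already monotone: nothing to remove
    segments = monotone_segments(m)
    keep = np.ones(len(m), dtype=bool)
    for j in range(len(m)):
        for seg in segments:
            if j in seg:
                continue                      # never compare a point with its own run
            order = np.argsort(m[seg])
            ms, vs = m[seg][order], v[seg][order]
            if ms[0] <= m[j] <= ms[-1] and np.interp(m[j], ms, vs) > v[j] + tol:
                keep[j] = False               # point belongs to the suboptimal set Q
                break
    idx = np.flatnonzero(keep)
    idx = idx[np.argsort(m[idx], kind="stable")]
    return m[idx], c[idx], v[idx]

# Hypothetical value correspondence: branch 0.5*M + 1 (left), a dominated backward run,
# and branch M (right); the two valid branches cross at M = 2.
m_raw = np.array([0, 1, 2, 3, 2.5, 1.5, 2.5, 4, 5], dtype=float)
v_raw = np.array([1, 1.5, 2, 2.5, 1.0, 1.5, 2.5, 4, 5], dtype=float)
c_raw = 0.5 * m_raw                           # placeholder consumption values
m_out, c_out, v_out = upper_envelope(m_raw, c_raw, v_raw)
```

For this input the refined grid is `[0, 1, 2, 2.5, 4, 5]`: the backward-bending run and the points of each branch lying below the other branch are removed, leaving a monotone grid.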

4. Monte Carlo results

In this section we investigate the properties of the approximate maximum likelihood estimator (MLE) that we obtain using the DC-EGM to approximate the model solution in the inner loop of the nested fixed point algorithm. We specifically focus on the role of income uncertainty and taste shocks for the approximation bias induced by a numerical solution with a finite number of grid points; in particular, how approximation bias depends on the number of grid points in smooth as well as nonsmooth problems. After a description of the data generating process (DGP), we present the results from a series of Monte Carlo experiments, and show that models used in typical empirical applications are sufficiently smooth to almost eliminate approximation bias using relatively few grid points.

4.1 Data generation process

For the Monte Carlo we consider a slightly more general formulation of the consumption–savings and retirement problem defined in (1) with constant relative risk aversion (CRRA) utility

$$\max_{\{c_t, d_t\}_{t=1}^{T}} \sum_{t=1}^{T} \beta^t \left( \frac{c_t^{1-\rho} - 1}{1-\rho} - \delta_t d_t + \sigma_\varepsilon \varepsilon(d_t) \right), \tag{19}$$

where $\rho$ is the CRRA coefficient. To simulate synthetic data from the DGP consistent with the model and the vector of true parameter values, we solve the model very accurately with 2000 grid points using the DC-EGM. We refer to this solution as the true solution even though this is of course only an accurate finite approximation of the value function.$^{17}$

We consider several specifications of the model in the Monte Carlo experiments below to study various aspects of the performance of the estimator. Our Monte Carlo experiments focus on estimation of the parameter $\delta$ for the disutility of work,$^{18}$ which we assume is time-invariant with true values shown in Table 1. We perform 200 Monte Carlo replications for each of the combinations of the other parameters, which are treated as known and fixed at their true values listed in Table 1. The exception is Section 4.4, where the true value of $\sigma_\varepsilon$ used to generate the data is zero, but where we impose that $\sigma_\varepsilon$ is either 0.01 or 0.05. This enables us to also study the effect of model misspecification on the Monte Carlo performance of the NFXP estimator using the DC-EGM algorithm.

For each specification of the model, 50,000 individuals are simulated for $T = 44$ periods. Each individual $i$ is initiated as a full-time worker, $s_{i1}^d = 1$, where we have used $s_{it}^d \in \{0, 1\}$ to denote the labor market state, that is, whether an individual is retired ($s_{it}^d = 0$) or working ($s_{it}^d = 1$). Each worker's initial wealth $M_{i1}^d$ is drawn from a uniform distribution on the interval $[0, 100]$. At the beginning of each time period $t$, a random log-normal labor market income shock $\eta_t$ with variance parameter $\sigma_\eta$ is drawn if individual $i$ is working, and the individual's resources $M_{it}^d$ are calculated.
Table 1. Baseline true parameter values.

Description                      Value           Description            Value
Time horizon                     T = 44          Disutility of work     δ ∈ {0.1, 0.5}
Gross interest rate              R = 1.03        Discount factor        β = 0.97
Full time employment income      y = 1.0         CRRA coefficient       ρ = 2.0
Income variance                  σ_η = 0         Taste shocks scale     σ_ε ∈ {0, 0.01, 0.05}

17 As a spot check, we have also compared this solution with the traditional value function iteration approach, where we used a grid search over 1000 discrete points on the interval $[0, M_t]$ to locate the optimal consumption for each value of wealth. We find that results are essentially identical.
18 Other parameters, including the scale of taste shocks $\sigma_\varepsilon$, could have been estimated as well, but for clarity of exposition we focus on a single parameter.

Given the level of resources, discrete choice-specific value functions and choice probabilities are computed, and a random draw determines which discrete labor market option $d_{it}^d$ is chosen. The labor force participation decision at $t$ becomes the labor market state at $t+1$, $s_{i,t+1}^d = d_{it}^d$. The optimal level of consumption, $c_{it}$, is then computed conditional on $d_{it}^d$, and the end-of-period wealth is calculated and stored to be used for calculation of resources available at the beginning of period $t+1$, $M_{i,t+1}^d$. We then add normal additive measurement error with standard deviation $\sigma_\xi = 1$ to get the simulated consumption data, $c_{it}^d$. This produces simulated panel data $(M_{it}^d, s_{it}^d, d_{it}^d, c_{it}^d)$ for each individual $i \in \{1, \dots, 50{,}000\}$ in all time periods $t \in \{1, \dots, 44\}$.
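The simulation steps above can be sketched as follows. The callables `v` and `c` are hypothetical stand-ins for the choice-specific value and consumption functions that a DC-EGM solution would deliver. The specification $\log \eta \sim N(0, \sigma_\eta^2)$ (so `sigma_eta` is a standard deviation here) and the timing of income (earned at $t+1$ by those who chose to work at $t$) are assumptions made for this illustration; the default arguments follow Table 1 only loosely.

```python
import numpy as np

def simulate_panel(v, c, n=1000, T=44, R=1.03, y=1.0, sigma_eps=0.05,
                   sigma_eta=0.0, sigma_xi=1.0, m0_range=(0.0, 100.0), seed=0):
    """Simulate panels (M, s, d, c_obs); v(t, M, d) and c(t, M, d) are hypothetical callables."""
    rng = np.random.default_rng(seed)
    M = np.empty((n, T))
    s = np.empty((n, T), dtype=int)
    d = np.empty((n, T), dtype=int)
    c_obs = np.empty((n, T))
    s[:, 0] = 1                                   # everyone starts as a full-time worker
    M[:, 0] = rng.uniform(*m0_range, size=n)      # initial wealth ~ U[0, 100]
    for t in range(T):
        working = s[:, t] == 1
        # Binary logit probability of continuing to work, written with tanh to avoid overflow
        dv = (v(t, M[:, t], 1) - v(t, M[:, t], 0)) / sigma_eps
        p_work = 0.5 * (1.0 + np.tanh(0.5 * dv))
        # Retirement is absorbing: retirees (s = 0) always have d = 0
        d[:, t] = np.where(working, rng.uniform(size=n) < p_work, 0)
        c_opt = c(t, M[:, t], d[:, t])
        c_obs[:, t] = c_opt + sigma_xi * rng.standard_normal(n)   # additive measurement error
        if t + 1 < T:
            s[:, t + 1] = d[:, t]                 # today's decision is tomorrow's state
            eta = np.exp(sigma_eta * rng.standard_normal(n))      # assumed log-normal shock
            M[:, t + 1] = R * (M[:, t] - c_opt) + y * eta * d[:, t]
    return M, s, d, c_obs

# Hypothetical placeholders, used only to exercise the simulator
def v_placeholder(t, M, d):
    return np.log1p(M) + d * (0.3 - 0.02 * t)

def c_placeholder(t, M, d):
    return 0.5 * M

M, s, d, c_obs = simulate_panel(v_placeholder, c_placeholder, n=500, T=10)
```

Because the state at $t+1$ is set equal to the decision at $t$ and retirees are forced to $d = 0$, the simulated labor market state is nonincreasing over time, as required by absorbing retirement.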

4.2 Maximum likelihood estimation

We implement a discrete-continuous version of the nested fixed point (NFXP) maximum likelihood estimator devised in Rust (1987, 1988), where we augment the original discrete choice estimator with a measurement error approach when assessing the likelihood of the observed continuous choices.

Assume that a panel data set $\{(M_{it}^d, s_{it}^d, d_{it}^d, c_{it}^d)\}_{i=1,\dots,N;\ t=1,\dots,T}$ is available, containing observations on wealth, labor market state, and discrete and continuous choices of individuals $i = 1, \dots, N$ in time periods $t = 1, \dots, T$. Let $c_t(M_t, s_t, d_t \mid \theta)$ denote the consumption policy function computed by the DC-EGM for a given vector of model parameters $\theta = (\delta, \beta, \rho, \sigma_\eta, \sigma_\varepsilon)$. We assume that consumption is observed with additive Gaussian measurement error,

$$c_{it}^d = c_t(M_{it}^d, s_{it}^d, d_{it}^d \mid \theta) + \xi_{it}, \qquad \xi_{it} \sim N(0, \sigma_\xi^2) \text{ i.i.d. } \forall i, t. \tag{20}$$

Let $\xi_{it}^d(\theta) = c_{it}^d - c_t(M_{it}^d, s_{it}^d, d_{it}^d \mid \theta)$ denote the difference between the observed and the predicted consumption. We assume that the measurement error, $\xi_{it}$, is independent of the taste shocks, $\varepsilon_t(d_t)$, and, thus, the joint likelihood of observation $i$ in period $t$ is given by

$$\ell_{it}(\theta, \sigma_\xi) = P(d_{it}^d \mid M_{it}^d, s_{it}^d, \theta)\, \frac{\phi\big(\xi_{it}^d(\theta)/\sigma_\xi\big)}{\sigma_\xi}, \tag{21}$$

where $\phi(\cdot)$ is the density function of the standard normal distribution. We have ignored the controlled transition probability for the retirement status $s_{it}^d$, since $P_{tr}(s_{it}^d \mid s_{i,t-1}^d, d_{i,t-1}^d)$ is always 1 in the data when retirement is absorbing and the labor market state is perfectly controlled by the decision.

The choice probabilities for the workers ($s_{it}^d = 1$) are standard logits,

$$P(d_{it}^d \mid M_{it}^d, s_{it}^d, \theta) = \frac{\exp\big(v_t(M_{it}^d, s_{it}^d, d_{it}^d \mid \theta)/\sigma_\varepsilon\big)}{\sum_{j=0}^{1} \exp\big(v_t(M_{it}^d, s_{it}^d, j \mid \theta)/\sigma_\varepsilon\big)}, \tag{22}$$

and are computed from the discrete choice-specific value functions $v_t(M_{it}^d, s_{it}^d, d_{it}^d \mid \theta)$ found by the DC-EGM given a particular value of the parameter vector $\theta$, evaluated at the data. Because retirement is absorbing and thus retirees do not have any discrete choice to make, the first component of the individual likelihood contribution (21) drops out when $s_{it}^d = 0$.

The joint log-likelihood function is given by $\tilde{\mathcal L}(\theta, \sigma_\xi) = \sum_{i=1}^{N}\sum_{t=1}^{T} \log \ell_{it}(\theta, \sigma_\xi)$, where rearranging the first order condition with respect to $\sigma_\xi^2$ yields the standard MLE for the measurement error variance, $\sigma_\xi^2(\theta) = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} \xi_{it}^d(\theta)^2$. The concentrated log-likelihood function is, therefore, equal, up to an additive constant, to

$$\mathcal L(\theta) = \sum_{i=1}^{N}\sum_{t=1}^{T} \frac{s_{it}^d}{\sigma_\varepsilon}\Big[v_t(M_{it}^d, s_{it}^d, d_{it}^d \mid \theta) - EV_t(M_{it}^d, s_{it}^d \mid \theta)\Big] - \frac{NT}{2}\log\Big(\sum_{i=1}^{N}\sum_{t=1}^{T} \xi_{it}^d(\theta)^2\Big), \tag{23}$$

where $EV_t(M_{it}^d, s_{it}^d \mid \theta)$ is the log sum given in (14) evaluated at parameter value $\theta$.$^{19}$ The parameter vector $\hat\theta$ that maximizes (23) is the MLE of the model parameters.
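Substituting $\sigma_\xi^2(\theta)$ back into the sum of $\log \ell_{it}$ from (21) leaves the logit term plus $-\tfrac{NT}{2}\log \sum_{i,t} \xi_{it}^d(\theta)^2$, up to an additive constant. The sketch below checks this numerically. The array `v[i, t, j]` is a hypothetical stand-in for the DC-EGM values $v_t(M_{it}^d, s_{it}^d, j \mid \theta)$ evaluated at the data, `xi` stands in for $\xi_{it}^d(\theta)$, and the log sum is taken to be $\sigma_\varepsilon \log \sum_j \exp(v_j/\sigma_\varepsilon)$.

```python
import numpy as np

def logsum(v, sigma_eps):
    """EV = sigma_eps * log(sum_j exp(v_j / sigma_eps)) over the last axis, stabilized by the max."""
    vmax = v.max(axis=-1, keepdims=True)
    return (vmax + sigma_eps * np.log(np.exp((v - vmax) / sigma_eps).sum(axis=-1, keepdims=True)))[..., 0]

def chosen(v, d_obs):
    """Value of the observed discrete choice."""
    return np.take_along_axis(v, d_obs[..., None], axis=-1)[..., 0]

def full_loglik(v, d_obs, s_obs, xi, sigma_eps, sigma_xi):
    """Sum of log l_it from (21); the logit term enters only for workers (s = 1)."""
    log_p = (chosen(v, d_obs) - logsum(v, sigma_eps)) / sigma_eps
    log_phi = -0.5 * np.log(2.0 * np.pi) - 0.5 * (xi / sigma_xi) ** 2 - np.log(sigma_xi)
    return np.sum(s_obs * log_p) + np.sum(log_phi)

def concentrated_loglik(v, d_obs, s_obs, xi, sigma_eps):
    """Log-likelihood with sigma_xi^2 replaced by sum(xi^2)/(NT), up to an additive constant."""
    logit_part = np.sum(s_obs * (chosen(v, d_obs) - logsum(v, sigma_eps))) / sigma_eps
    return logit_part - 0.5 * xi.size * np.log(np.sum(xi ** 2))

# Hypothetical inputs standing in for model output evaluated at the data
rng = np.random.default_rng(1)
N, T = 200, 5
v = rng.normal(size=(N, T, 2))
s_obs = (rng.uniform(size=(N, T)) < 0.7).astype(int)
d_obs = np.where(s_obs == 1, rng.integers(0, 2, size=(N, T)), 0)
xi = rng.normal(scale=0.3, size=(N, T))
sigma_eps = 0.5
```

At $\hat\sigma_\xi$ the full and concentrated versions differ only by the constant $\tfrac{NT}{2}(\log NT - 1 - \log 2\pi)$, so maximizing either over $\theta$ gives the same estimate.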

4.3 Taste shocks as unobserved state variables

We are now ready to investigate the effects of smoothing on the accuracy of the MLE based on the DC-EGM algorithm. We conduct two Monte Carlo experiments where we vary the degree of smoothing induced by extreme value taste shocks and income uncertainty, respectively. Throughout we focus on estimating the parameter that indexes disutility of work, $\delta$, while keeping all others fixed at their true values. Appendix C contains the average estimation time for DC-EGM.

Taste shocks and approximation error. Figure 6 displays the root mean square error (RMSE) of the parameter estimates for the disutility of work, $\hat\delta$. Results are shown for varying degrees of smoothing, $\sigma_\varepsilon \in \{0.01, 0.05\}$, and different values of the disutility of work parameter, $\delta \in \{0.1, 0.5\}$. With RMSE around $10^{-3}$, the proposed estimator is already accurate with 50 grid points and rapidly improves as the number of grid points increases from 50 through 1000. Note that standard errors will of course increase with $\sigma_\varepsilon$ due to the increased amount of unexplained variation in the error term, and RMSE reflects this too. Bearing this in mind, it is evident that the approximation bias decreases as the degree of smoothing increases, that is, for larger values of $\sigma_\varepsilon$. For higher levels of smoothing, problems with multiplicity of the Euler equation solutions disappear and few grid points are needed to approximate the (smooth) consumption function. This is particularly true when the disutility from work is large ($\delta = 0.5$) because the nonconcave regions are larger in this case. We also calculated the Monte Carlo standard deviation (MCSD),$^{20}$ which is on the order of $10^{-4}$ irrespective of the number of grid points used.

Figure 6. Monte Carlo results: disutility of work. The plots illustrate the root mean square error (RMSE) of $\hat\delta$. Results are shown for varying degrees of smoothing, $\sigma_\varepsilon \in \{0.01, 0.05\}$, and different values of the disutility of work, $\delta \in \{0.1, 0.5\}$. The rest of the parameters are at their baseline levels; see Table 1.

19 Following (23), the log sum only has to be evaluated for workers, $s_{it}^d = 1$.

Income uncertainty. Additional uncertainty about, for example, future labor market income tends to smooth out secondary kinks stemming from multiple solutions to the Euler equations. To illustrate how that additional smoothing affects the proposed estimator, Figure 7 displays RMSE when introducing income uncertainty. We report results from two different values of the income variance,$^{21}$ $\sigma_\eta^2 \in \{0.001, 0.05\}$. The first level, 0.001, does not completely smooth out secondary kinks while the significantly more uncertain income process with $\sigma_\eta^2 = 0.05$ does (see the right panel of Figure 4). Income uncertainty together with taste shocks smooths the problem to such a degree that the RMSE drops by an order of magnitude when increasing the income variance from 0.001 to 0.05. Hence, using only a few grid points when estimating such a model will result in only minor approximation errors.

Figure 7. Monte Carlo results: income uncertainty. The plots illustrate the root mean square error (RMSE). Results are shown for varying degrees of smoothing, $\sigma_\varepsilon \in \{0.01, 0.05\}$, and different values of the income variance, $\sigma_\eta^2 \in \{0.001, 0.05\}$. The rest of the parameters are at their baseline levels; see Table 1.

20 MCSD is defined as the standard deviation of the difference between the estimates and true values across Monte Carlo runs; results not shown.
21 The values of the income variance we use correspond well to the empirical findings, for example, in Gourinchas and Parker (2002), Meghir and Pistaferri (2004), and Imai and Keane (2004).

As mentioned, standard errors will of course increase with $\sigma_\varepsilon$ due to the increased amount of unexplained variation. The MCSD is quite small and unaffected by the degree of income uncertainty as well as the number of grid points, but increases from 0.00023 to 0.00045 as $\sigma_\varepsilon$ increases from 0.01 to 0.05. This is the main explanation for why RMSE is only smaller for a small number of grid points. Sorting out this effect, it is clear that increasing $\sigma_\varepsilon$ decreases the amount of pure approximation bias, especially when the number of grid points is small. Note that MCSD is very small, in part due to a relatively large sample size, but also because the variance of the i.i.d. extreme value error term is extremely small. In most empirical applications, $\sigma_\varepsilon$ would be larger, leading to an even smoother problem than the one we consider here. Hence, with relatively few grid points we can expect to obtain an even smaller approximation bias induced by the finite grid approximation in the DC-EGM.
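The argument above separates RMSE into approximation bias and sampling dispersion. If MCSD is computed as the population standard deviation (ddof = 0, an assumption, since footnote 20 does not state the normalization), the decomposition $\text{RMSE}^2 = \text{bias}^2 + \text{MCSD}^2$ holds exactly. A minimal sketch with hypothetical estimates:

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Bias, MCSD (std of estimate minus truth, ddof=0) and RMSE over Monte Carlo replications."""
    err = np.asarray(estimates, dtype=float) - true_value
    bias = err.mean()
    mcsd = err.std()                     # population standard deviation (ddof=0)
    rmse = np.sqrt(np.mean(err ** 2))
    return bias, mcsd, rmse

# Hypothetical estimates of delta from 200 replications with a small upward bias
rng = np.random.default_rng(2)
delta_true = 0.5
estimates = delta_true + 0.002 + 0.0003 * rng.standard_normal(200)
bias, mcsd, rmse = mc_summary(estimates, delta_true)
```

With a fixed MCSD, any reduction in RMSE as $\sigma_\varepsilon$ or the number of grid points changes must come from the bias term, which is the effect being isolated in the text.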

4.4 Taste shocks as logit smoother

Until now we have assumed that the correct model has unobserved state variables and, thus, $\sigma_\varepsilon > 0$ has to be estimated. To investigate how the proposed estimator performs if the data stem from a model in which there are no unobserved states, we estimate versions of the model where we impose $\sigma_\varepsilon > 0$ and, thus, estimate a misspecified model. This is interesting because if researchers have reasons to believe that the underlying model has no shocks, the inclusion of these shocks acts as a smooth approximation to the true deterministic model. As argued above, the smoothed model requires fewer grid points to solve accurately and is, thus, much faster to estimate.

Figure 8 illustrates the RMSE of $\delta$ when using 50, 100, and 500 grid points for various levels of smoothing $\sigma_\varepsilon \in [0.001, 0.05]$, while the correct level is $\sigma_\varepsilon = 0$. Intuitively, as the model becomes “more” misspecified (increasing the imposed $\sigma_\varepsilon$), the RMSE and the MCSD increase. Interestingly, for a given number of discrete grid points, the RMSE is minimized by a $\sigma_\varepsilon > 0$. While a large degree of smoothing induces significant approximation bias, the bias is initially falling in $\sigma_\varepsilon$ until some point at which the RMSE increases again. The minimum of the RMSE is attained for lower levels of smoothing if additional stochasticity (i.e., income shocks) is present in the model. This is expected because the income uncertainty smooths the problem and less logit smoothing is needed to obtain the optimal smooth approximation. It is worth noting, however, that the optimal amount of logit smoothing may not be sufficient to completely eliminate the nonconvexities in the model. It is therefore essential for the solution method to be able to robustly solve optimization problems with multiple local solutions, a task that DC-EGM performs particularly well.

These results show the potential for great speed gains by smoothing. Using only 50 grid points and imposing $\sigma_\varepsilon = 0.01$ produces an RMSE of around the same level as using 500 grid points and imposing $\sigma_\varepsilon \approx 0$, close to the true model. We can reduce the number of grid points by an order of magnitude without increasing the root mean square error significantly simply by choosing the degree of smoothing appropriately. Note, however, that there is naturally a trade-off between lowering the computational cost (by increasing smoothing and decreasing the number of grid points) and the accuracy of the resulting solution compared to the true solution of the nonsmooth model.

Figure 8. Monte Carlo results: true model without taste shocks (misspecified). The plots illustrate the root mean square error (RMSE) from estimation of a misspecified model. The model from which data are simulated is deterministic, $\sigma_\varepsilon = 0$, while the model used to estimate the disutility of work imposes $\sigma_\varepsilon > 0$. Results are shown for varying degrees of imposed smoothing, $\sigma_\varepsilon \in [0.001, 0.05]$, on the horizontal axes, different levels of income shocks, $\sigma_\eta \in \{0, 0.05\}$, and different numbers of grid points. The rest of the parameters are at their baseline levels; see Table 1.
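Logit smoothing replaces the hard maximum over discrete alternatives by the log-sum $\sigma_\varepsilon \log \sum_d \exp(v_d/\sigma_\varepsilon)$. A minimal sketch with two hypothetical choice-specific value functions that cross at $M = 1$ shows the trade-off discussed above: the smoothed envelope rounds off the kink, and it lies above the hard maximum by at most $\sigma_\varepsilon \log 2$ (in general $\sigma_\varepsilon \log D$ for $D$ alternatives), so the smoothing bias vanishes as $\sigma_\varepsilon \to 0$.

```python
import numpy as np

def smoothed_max(v, sigma):
    """Log-sum sigma * log(sum_j exp(v_j / sigma)) over the last axis: a smooth stand-in for max."""
    v = np.asarray(v, dtype=float)
    m = v.max(axis=-1, keepdims=True)
    return (m + sigma * np.log(np.exp((v - m) / sigma).sum(axis=-1, keepdims=True)))[..., 0]

# Two hypothetical choice-specific value functions crossing at M = 1 (a kink in their max)
M = np.linspace(0.0, 2.0, 201)
v = np.stack([M, 1.0 + 0.5 * (M - 1.0)], axis=-1)
hard = v.max(axis=-1)
for sigma in (0.05, 0.01, 0.001):
    gap = smoothed_max(v, sigma) - hard
    print(f"sigma={sigma}: max gap {gap.max():.5f}, bound sigma*log(2) = {sigma * np.log(2):.5f}")
```

The gap is largest exactly where the two alternatives are tied, which is where the nonsmooth problem has its kink.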

5. Discussion and conclusions

In this paper we have shown how complications from numerous discontinuities in the consumption function of a life-cycle model with discrete and continuous choices can be avoided by smoothing the problem and using the DC-EGM algorithm. The proposed algorithm retains all the nice features of the original EGM method, namely that it typically does not require any iterative root-finding operations, and is equally efficient in dealing with borrowing constraints. Moreover, we show that the smoothed model can be successfully estimated by the NFXP estimator based on the DC-EGM algorithm even with a small number of grid points and even when the true DGP is nonsmooth.

For expositional clarity, we focused on a simple illustrative example when explaining the details of the DC-EGM algorithm. This also allows us to derive an analytical solution that we can compare to the numerical one. The analytical solution provides economic intuition for why first and second order kinks appear, and permits direct evaluation of the precision of the DC-EGM algorithm. Admittedly, the illustrative model of consumption and retirement is very stylized, and the reader may wonder if DC-EGM can be used to solve and estimate larger, more complex, and realistic models with more state variables, multiple discrete alternatives, heterogeneous agents, institutional constraints, and so forth. The answer is positive. As shown in Appendix A, the DC-EGM method can be applied to a much more general class of problems as long as the postdecision state variable is a sufficient statistic for the continuous choice in the current period, and the marginal utility function and intratemporal budget constraint are invertible. When the marginal utility function is analytically invertible, DC-EGM also avoids the bulk of costly root-finding operations.$^{22}$

22 The DC-EGM algorithm can also be generalized for other specifications, including models with a large state space and multidimensional discrete choice. White (2015), Iskhakov (2015), and Druedahl and Jørgensen (2017) present theoretical foundations for extending endogenous grid methods to multidimensional models.

The DC-EGM method has been implemented in several recent empirical applications, where it has proven to be a powerful tool for solving and estimating more complex DC models in various fields: labor supply, human capital accumulation and saving (Iskhakov and Keane (2016)); joint retirement decisions of couples (Jørgensen (2014)); consumption, housing purchases, and housing debt (Yao, Fagereng, and Natvik (2015)); saving decisions and fertility (Ejrnæs and Jørgensen (2016)); precautionary borrowing and credit card debt (Druedahl and Jørgensen (2015)).

We have demonstrated in the Monte Carlo experiments that the NFXP maximum likelihood estimator based on the DC-EGM solution algorithm performs very well when decisions are made under uncertainty, for example, in the presence of extreme value taste shocks and income uncertainty. Even when the true model is deterministic, taste shocks can be used as a powerful smoothing device to simplify the solution without much approximation bias due to oversmoothing.

The addition of extreme value taste shocks is not only a convenient smoothing device that simplifies the solution of DC models, it is also an empirically relevant extension required to avoid statistical degeneracy of the model. In empirical applications the variance of these shocks is typically much larger compared to what we have considered here. This makes models smooth enough to almost eliminate approximation bias in parameter estimates even with relatively few grid points. We therefore conclude that DC-EGM is practical and appears to be a fast and accurate method for use in actual empirical applications.

Appendix A: Theoretical foundations of DC-EGM

For the purpose of this Appendix we consider the following more general formulation of the consumption–savings and retirement problem. Let $M_t$ denote consumable wealth that is a continuous state variable with a particular motion rule described below, and let $s_t$ denote a vector of additional discrete or discretized state variables. Let $c_t$ be the scalar continuous decision (consumption) and let $d_t$ be a scalar discrete decision variable with a finite set of values that could encode multiple discrete decisions if needed. Consider the dynamic discrete-continuous choice problem given by the Bellman equation,

$$V_t(M_t, s_t) = \max_{\substack{0 \le c_t \le M_t \\ d_t \in D_t}} \Big\{ u(c_t, d_t, s_t) + \sigma_\varepsilon \varepsilon_t(d_t) + \beta_t E_t\big[V_{t+1}(M_{t+1}, s_{t+1}) \,\big|\, A_t, d_t\big] \Big\}, \tag{24}$$

where $t = 1, \dots, T-1$, and the last component of the maximand is absent for $t = T$. The choices in the model are restricted by the credit constraint $c_t \le M_t$ and feasibility sets $D_t$. The per-period utility includes scaled taste shocks $\sigma_\varepsilon \varepsilon_t(d_t)$, where $\varepsilon_t$ is a vector of i.i.d. extreme value (type I) distributed random variables. The dimension of $\varepsilon_t$ is equal to the number of alternatives that the discrete choice variable may take, and $\varepsilon_t(d_t)$ denotes the component that corresponds to a particular discrete decision. In the general case the discount factor $\beta_t$ is time-specific to allow for the probability of survival. The expectation is taken over the taste shocks $\varepsilon_{t+1}$ and the transition probabilities of the state process $s_t$, as well as any serially uncorrelated (or idiosyncratic) shocks that may affect $M_{t+1}$ and $s_{t+1}$. The expectation is taken conditional on the choices in period $t$ using the sufficient statistic $A_t = M_t - c_t$ in place of the continuous (consumption) choice. Using the well known representation of the expectation of the maximum of extreme value distributed random variables, the Bellman equation (24) can be written in terms of the deterministic choice-specific value functions as

$$\begin{aligned} v_t(M_t, s_t \mid d_t) &= \max_{0 \le c_t \le M_t} \Big\{ u(c_t, d_t, s_t) + \beta_t E_t\big[V_{t+1}(M_{t+1}, s_{t+1}) \,\big|\, A_t, d_t\big] \Big\} \qquad (25)\\ &= \max_{0 \le c_t \le M_t} \Big\{ u(c_t, d_t, s_t) + \beta_t E_t\big[\phi\big(v_{t+1}(M_{t+1}, s_{t+1} \mid d_{t+1}), D_{t+1}, \sigma_\varepsilon\big) \,\big|\, A_t, d_t\big] \Big\}, \qquad (26) \end{aligned}$$

where $\phi(x_j, J, \sigma) = \sigma \log\big[\sum_{j \in J} \exp(x_j/\sigma)\big]$ is the log-sum function. The expectation in (26) is now only taken with respect to (w.r.t.) state transitions and idiosyncratic shocks, unlike in (24) and (25).

The crucial assumption for the DC-EGM is that the postdecision state $A_t$ constitutes the sufficient statistic for the continuous choice in period $t$, that is, that transition probabilities/densities of the state process $(M_t, s_t)$ depend on $A_t$ rather than on $M_t$ or $c_t$ directly. It is also required that $A_t$ as a function of $M_t$ is (analytically) invertible. For our case, assume for concreteness that $A_t = M_t - c_t$, and that $M_{t+1} = R A_t + y(d_t)$, where $R$ is a gross return and $y(d_t)$ is discrete choice-specific income. We also assume that the utility function $u(c_t, d_t, s_t)$ satisfies the following condition.

Assumption A (Concave Utility).$^{23}$ The instantaneous utility $u(c_t, d_t, s_t)$ is concave in $c_t$ and has a monotonic derivative w.r.t. $c_t$ that is (analytically) invertible.

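Lemma 1 (Smoothed Euler Equation). The Euler equation for the problem (24) takes the form

$$u'(c_t, d_t, s_t) = \beta_t R\, E_t\Big[\sum_{d_{t+1} \in D_{t+1}} u'\big(c_{t+1}(M_{t+1}, s_{t+1} \mid d_{t+1}), d_{t+1}, s_{t+1}\big)\, P_{t+1}(d_{t+1} \mid M_{t+1}, s_{t+1})\Big], \tag{27}$$

where $u'(c_t, d_t, s_t)$ is the partial derivative of the utility function w.r.t. $c_t$, $c_{t+1}(M_{t+1}, s_{t+1} \mid d_{t+1})$ is the choice-specific consumption function in period $t+1$, and $P_{t+1}(d_{t+1} \mid M_{t+1}, s_{t+1})$ is the conditional discrete choice probability in period $t+1$, given by

$$P_t(d_t \mid M_t, s_t) = \frac{\exp\big(v_t(M_t, s_t \mid d_t)/\sigma_\varepsilon\big)}{\sum_{d \in D_t} \exp\big(v_t(M_t, s_t \mid d)/\sigma_\varepsilon\big)}. \tag{28}$$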

23 More precisely, a weaker condition is sufficient, namely for every $c_t$ and arbitrary $\Delta_1 > 0$ and $\Delta_2 > 0$ it must hold that $u(c_t + \Delta_1, d_t, s_t) - u(c_t, d_t, s_t) \ge u(c_t + \Delta_1 + \Delta_2, d_t, s_t) - u(c_t + \Delta_2, d_t, s_t)$; see Theorem 2.

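Proof. Discrete choice-specific consumption functions $c_t(M_t, s_t \mid d_t)$ satisfy the first order conditions for the maximization problems in (25), given by

$$u'(c_t, d_t, s_t) + \beta_t E\Big[\frac{\partial V_{t+1}(M_{t+1}, s_{t+1})}{\partial M_{t+1}} \frac{\partial M_{t+1}}{\partial c_t}\Big] = 0 \tag{29}$$

for every value of $d_t \in D_t$. The envelope conditions for (25) are

$$\frac{\partial v_t(M_t, s_t \mid d_t)}{\partial M_t} = \beta_t E\Big[\frac{\partial V_{t+1}(M_{t+1}, s_{t+1})}{\partial M_{t+1}} \frac{\partial M_{t+1}}{\partial M_t}\Big], \tag{30}$$

and because $\partial M_{t+1}(d_t)/\partial M_t = R = -\partial M_{t+1}(d_t)/\partial c_t$, it holds for all $d_t$ and $t = 1, \dots, T-1$ that

$$u'(c_t, d_t, s_t) = \frac{\partial v_t(M_t, s_t \mid d_t)}{\partial M_t}. \tag{31}$$

The first order condition for (26) is

$$u'(c_t, d_t, s_t) = \beta_t R\, E_t\Big[\sum_{d_{t+1} \in D_{t+1}} P_{t+1}(d_{t+1} \mid M_{t+1}, s_{t+1}) \frac{\partial v_{t+1}(M_{t+1}, s_{t+1} \mid d_{t+1})}{\partial M_{t+1}}\Big], \tag{32}$$

where the choice probabilities $P_{t+1}(d_{t+1} \mid M_{t+1}, s_{t+1})$ are given by (28). Plugging (31) into (32) completes the proof. □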
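Equation (32) relies on the fact that the derivative of the log-sum $\phi$ in (26) with respect to $M_{t+1}$ is the average of the choice-specific derivatives weighted by the probabilities (28). A quick numerical check of that identity is sketched below; the two choice-specific value functions are hypothetical stand-ins for $v_{t+1}$, and the scale $\sigma_\varepsilon = 0.05$ is arbitrary.

```python
import numpy as np

def logsum(x, sigma):
    """phi(x, J, sigma) = sigma * log(sum_j exp(x_j / sigma)) over the last axis, stabilized by the max."""
    x = np.asarray(x, dtype=float)
    m = x.max(axis=-1, keepdims=True)
    return (m + sigma * np.log(np.exp((x - m) / sigma).sum(axis=-1, keepdims=True)))[..., 0]

def choice_probs(x, sigma):
    """Logit probabilities (28) over the last axis."""
    x = np.asarray(x, dtype=float)
    z = np.exp((x - x.max(axis=-1, keepdims=True)) / sigma)
    return z / z.sum(axis=-1, keepdims=True)

# Hypothetical choice-specific value functions of next-period wealth and their derivatives
def v(M):
    return np.stack([np.log(M), np.log(M + 1.0) - 0.3], axis=-1)

def dv(M):
    return np.stack([1.0 / M, 1.0 / (M + 1.0)], axis=-1)

sigma = 0.05
M = np.linspace(0.5, 5.0, 50)
h = 1e-6
numeric = (logsum(v(M + h), sigma) - logsum(v(M - h), sigma)) / (2 * h)   # central difference
analytic = (choice_probs(v(M), sigma) * dv(M)).sum(axis=-1)               # probability-weighted
```

The two arrays agree to finite-difference accuracy, including around $M \approx 2.86$, where the two alternatives are tied and the probabilities change fastest.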

The DC-EGM algorithm outlined in Algorithm 1 is readily applicable to the general formulation of the discrete-continuous problem (24), except for the extra loop that has to be taken over all additional states $s_t$ in Step 3 (Algorithm 1). The expectation over the transition probabilities of the state process is calculated together with the expectation over the other stochastic elements of the model in Algorithm 2. The criteria for selecting the solutions of the Euler equation that correspond to the optimal behavior in the model are based on the monotonicity of the savings function, which is established by the following theorem.$^{24}$

Theorem 4 (Monotonicity of Savings Function). Denote by $A_t(M_t, s_t \mid d_t) = M_t - c_t(M_t, s_t \mid d_t)$ a discrete choice-specific savings function in period $t$. Under Assumption A, the function $A_t(M, s_t \mid d_t)$ is monotone nondecreasing in $M$ for all $t$, $s_t$, and $d_t \in D_t$.

Proof. Theorem 4 is an application of Theorem 4 in Milgrom and Shannon (1994) to the current problem. The conditional savings function $A_t(M_t, s_t \mid d_t)$ is a maximizer in the expression similar to (25) for the discrete choice-specific value function $v_t(M_t, s_t \mid d_t)$. As a function of $M$ and $A$, the maximand in this expression is given by

$$f(A, M) = u(M - A, d_t, s_t) + \beta_t E_t\big[V_{t+1}\big(M_{t+1}(A), s_{t+1}\big)\big], \tag{33}$$

where $M_{t+1}(A)$ is next period wealth as an increasing function of $A$. It is necessary and sufficient to show that $f(A, M)$ is quasi-supermodular in $A$ and satisfies the single crossing property in $(A, M)$. The former is trivial because $A$ is a scalar. For the latter, consider $A' > A$, $M' > M$ and assume $f(A', M) > f(A, M)$. Then

$$\begin{aligned} f(A', M') - f(A, M') &= u(M' - A', d_t, s_t) - u(M' - A, d_t, s_t) + \beta_t\big[EV_{t+1}\big(M_{t+1}(A'), s_{t+1}\big) - EV_{t+1}\big(M_{t+1}(A), s_{t+1}\big)\big] \\ &\ge u(M - A', d_t, s_t) - u(M - A, d_t, s_t) + \beta_t\big[EV_{t+1}\big(M_{t+1}(A'), s_{t+1}\big) - EV_{t+1}\big(M_{t+1}(A), s_{t+1}\big)\big] \\ &= f(A', M) - f(A, M) > 0. \end{aligned} \tag{34}$$

For the first inequality we use

$$\begin{aligned} &u(M' - A', d_t, s_t) - u(M' - A, d_t, s_t) \ge u(M - A', d_t, s_t) - u(M - A, d_t, s_t) \\ \Leftrightarrow\ &u(M' - A', d_t, s_t) - u(M - A', d_t, s_t) \ge u(M' - A, d_t, s_t) - u(M - A, d_t, s_t) \\ \Leftrightarrow\ &u(z, d_t, s_t) - u(z - \Delta M, d_t, s_t) \ge u(z + \Delta A, d_t, s_t) - u(z + \Delta A - \Delta M, d_t, s_t), \end{aligned} \tag{35}$$

where $z = M' - A'$, $\Delta A = A' - A > 0$, and $\Delta M = M' - M > 0$; the last inequality holds by Assumption A, that is, concavity of the utility function. It follows that $f(A', M') > f(A, M')$. Similarly, the assumption $f(A', M) \ge f(A, M)$ leads to $f(A', M') \ge f(A, M')$ and, thus, $f(A, M)$ satisfies the single crossing property, and the monotonicity theorem in Milgrom and Shannon (1994) applies. □

24 A similar monotonicity result is also used in Fella (2014).
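Theorem 4 can be illustrated numerically. Under the hypothetical choices below (log utility, a savings grid, and a continuation value $W(A)$ with a jump at $A = 5$, so that the problem is not concave), the smallest grid maximizer of $f(A, M) = u(M - A) + \beta W(A)$ jumps as $M$ rises but never decreases, which is what the single crossing argument predicts. This is an illustration only, not part of the proof.

```python
import numpy as np

BETA = 0.95
A_grid = np.linspace(0.0, 20.0, 401)               # savings grid (hypothetical)

def W(A):
    """Hypothetical nonconcave continuation value: a jump at A = 5 (e.g., a discrete switch)."""
    return np.log1p(A) + 0.5 * (A > 5.0)

def argmax_savings(M):
    """Smallest maximizer of f(A, M) = log(M - A) + BETA * W(A) over feasible grid points A < M."""
    feasible = A_grid < M
    f = np.full(A_grid.shape, -np.inf)
    f[feasible] = np.log(M - A_grid[feasible]) + BETA * W(A_grid[feasible])
    return A_grid[np.argmax(f)]                    # np.argmax returns the first (smallest) maximizer

M_values = np.linspace(0.1, 30.0, 300)
A_star = np.array([argmax_savings(M) for M in M_values])
```

Savings jump discontinuously (from roughly 3 to just above 5) once the bonus in $W$ becomes worth reaching, yet `A_star` remains nondecreasing in $M$; this monotonicity is what DC-EGM uses to discard suboptimal Euler equation solutions.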

Appendix B: Spurious discontinuities from numerical integration

To illustrate how naive numerical quadrature integration can produce spurious discontinuities in the policy function, we here focus on the illustrative model without smoothing. In particular, for working households, the smoothed Euler equation in (17) collapses to

$$\begin{aligned} u'\big(c_t(M_t \mid d_t)\big) = {}&\beta \int_0^\infty R\, u'\big(c_{t+1}(M_{t+1} \mid d_{t+1} = 1)\big) \cdot 1\{M_{t+1} \le \bar M_{t+1}\}\, f(d\eta) \\ &+ \beta \int_0^\infty R\, u'\big(c_{t+1}(M_{t+1} \mid d_{t+1} = 0)\big) \cdot 1\{M_{t+1} > \bar M_{t+1}\}\, f(d\eta), \end{aligned} \tag{36}$$

where we recall that $M_{t+1} = R\big(M_t - c_t(M_t \mid d_t)\big) + y\eta$. With the change of variables $q = f(\eta)$, we can write the Euler equation (36) as

$$\begin{aligned} u'\big(c_t(M_t \mid d_t)\big) = {}&\beta \int_0^{q_t} R\, u'\Big(c_{t+1}\big(R(M_t - c_t(M_t \mid d_t)) + y f^{-1}(q) \,\big|\, d_{t+1} = 1\big)\Big)\, dq \\ &+ \beta \int_{q_t}^1 R\, u'\Big(c_{t+1}\big(R(M_t - c_t(M_t \mid d_t)) + y f^{-1}(q) \,\big|\, d_{t+1} = 0\big)\Big)\, dq, \end{aligned} \tag{37}$$

where the threshold $q_t$ is given by

$$q_t = f\left(\frac{\bar M_{t+1} - R\big(M_t - c_t(M_t \mid d_t)\big)}{y}\right). \tag{38}$$

As long as the income shock distribution is not degenerate, the resulting Euler equation (37) is continuous and smooth in $c_t(M_t \mid W)$ through $M_{t+1}$, in spite of the discontinuity in the consumption function $c_{t+1}(M_{t+1} \mid W)$ at $M_{t+1} = \bar M_{t+1}$. In turn, this suggests that numerical integration should be done twice, once for each case, to ensure that the integral is well behaved.

In contrast, the naive Euler equation in (36) is discontinuous in $c_t(M_t \mid W)$. When using numerical quadrature to evaluate the integral, for a given level of resources, some of the nodes will result in $M_{t+1} \le \bar M_{t+1}$ while others will result in the opposite case. For concreteness, say that 10 nodes are used and the five lowest nodes result in $M_{t+1} \le \bar M_{t+1}$. Say also that for a slightly larger value of current resources perhaps only four nodes satisfy $M_{t+1} \le \bar M_{t+1}$ while now six invoke the alternative. When comparing the solutions found at the two (close) values of current period resources, there will be a discontinuous change in the optimal consumption. In the current model, this would result in spurious downward kinks in the consumption function around a secondary kink.
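The contrast between (36) and (37) can be reproduced with a stylized integrand. In the sketch below, which uses hypothetical integrands `a` and `b` for the two regimes and $\log \eta \sim N(0, 0.2^2)$, a 10-node Gauss-Hermite rule applied to the discontinuous integrand jumps whenever the threshold crosses a node. Changing variables to $q = F(\eta)$ (the CDF) and applying Gauss-Legendre on each side of the threshold $q_t$ instead gives a result that moves continuously with the threshold. It uses `numpy.polynomial` quadrature rules and `statistics.NormalDist` from the standard library.

```python
import numpy as np
from statistics import NormalDist

SIGMA = 0.2                  # hypothetical standard deviation of log(eta)
STD_NORMAL = NormalDist()

def a(eta):                  # integrand when next-period wealth is below the threshold (hypothetical)
    return 1.0 / (1.0 + eta)

def b(eta):                  # integrand above the threshold (hypothetical)
    return 0.5 / (1.0 + eta)

def naive(eta_star, n=10):
    """Gauss-Hermite quadrature applied directly to the discontinuous integrand, as in (36)."""
    x, w = np.polynomial.hermite.hermgauss(n)
    eta = np.exp(np.sqrt(2.0) * SIGMA * x)          # nodes for log(eta) ~ N(0, SIGMA^2)
    g = np.where(eta <= eta_star, a(eta), b(eta))
    return np.sum(w * g) / np.sqrt(np.pi)

def split(eta_star, n=10):
    """Integrate over q = F(eta) separately on each side of the threshold, as in (37)."""
    q_star = STD_NORMAL.cdf(np.log(eta_star) / SIGMA)   # threshold in probability units
    x, w = np.polynomial.legendre.leggauss(n)           # Gauss-Legendre nodes on [-1, 1]

    def integrate(lo, hi, h):
        q = 0.5 * (hi - lo) * x + 0.5 * (hi + lo)       # map nodes into (lo, hi)
        eta = np.exp(SIGMA * np.array([STD_NORMAL.inv_cdf(qi) for qi in q]))
        return 0.5 * (hi - lo) * np.sum(w * h(eta))

    return integrate(0.0, q_star, a) + integrate(q_star, 1.0, b)

thresholds = np.linspace(0.8, 1.25, 400)
naive_vals = np.array([naive(e) for e in thresholds])
split_vals = np.array([split(e) for e in thresholds])
```

`naive_vals` is a step function that takes only a few distinct values over this range, while `split_vals` changes smoothly. In an Euler equation solver, the steps would translate into the spurious jumps in consumption described above.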

Appendix C: DC-EGM run times

Figure 9 illustrates the average time spent to estimate $\hat\delta$. Results are shown for varying degrees of income uncertainty, $\sigma_\eta^2 \in \{0.001, 0.05\}$, and different values of the disutility of work parameter, $\delta \in \{0.1, 0.5\}$.

Figure 9. Timing: income uncertainty. The plots illustrate the time spent to estimate the model. Results are shown for varying degrees of smoothing, $\sigma_\varepsilon \in \{0.01, 0.05\}$, and different values of the income variance, $\sigma_\eta^2 \in \{0.001, 0.05\}$. The rest of the parameters are at their baseline levels; see Table 1.

Appendix D: Proof of extreme value homotopy principle

This appendix proves Theorem 3, which states that the value function and optimal decision rules in the presence of type I extreme value distributed taste shocks converge (in an appropriate sense to be defined below) to the value functions and decision rules of a limiting problem without taste shocks. We prove Theorem 3 for a more general class of problems than just the retirement consumption model, and therefore restate it below. Let $\varepsilon$ be a random variable having a standardized type I extreme value distribution with the cumulative distribution function (CDF) given by

$$F(\varepsilon) = \exp\big(-\exp\{-\varepsilon\}\big). \tag{39}$$
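The distributional facts stated next, and the log-sum representation used in Appendix A, can be checked by simulation. The sketch below is illustrative: the values `v`, the scale $\sigma = 0.5$, and the sample size are arbitrary choices. It uses NumPy's `Generator.gumbel`, whose default (loc = 0, scale = 1) is the standard type I extreme value distribution with CDF (39). Because these shocks have mean $\gamma$, the expected maximum equals the log-sum $\phi$ of (26) plus $\sigma\gamma$.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

rng = np.random.default_rng(0)
sigma = 0.5
v = np.array([1.0, 1.3, 0.7])                                   # hypothetical choice-specific values
eps = rng.gumbel(loc=0.0, scale=1.0, size=(400_000, v.size))    # standard type I EV draws

# Moments of sigma * eps: mean sigma * gamma and variance sigma^2 * pi^2 / 6
scaled = sigma * eps[:, 0]
print(scaled.mean(), sigma * EULER_GAMMA)
print(scaled.var(), sigma ** 2 * np.pi ** 2 / 6)

# Expected maximum of v_j + sigma * eps_j: log-sum plus sigma * gamma
emax_mc = np.max(v + sigma * eps, axis=1).mean()
emax_formula = sigma * np.log(np.exp(v / sigma).sum()) + sigma * EULER_GAMMA
```

Whether the constant $\sigma\gamma$ appears depends only on how the shocks are centered; it does not affect choice probabilities or decision rules.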

We have $E\{\varepsilon\} = \gamma$, where $\gamma = 0.577\ldots$ is Euler's constant, and $\operatorname{var}(\varepsilon) = \pi^2/6$. Then if $\sigma$ is a positive scaling constant, $\sigma\varepsilon$ will also have a type I extreme value distribution with expected value $\sigma\gamma$ and variance $\sigma^2\pi^2/6$. In the notation of the illustrative model in the paper, $\sigma$ corresponds to the scaling parameter of the “perturbed” model, $\sigma_\varepsilon$. The homotopy convergence result we prove below holds for a considerably more general class of dynamic programming (DP) problems than the simple retirement example we analyzed in Section 2.1 or even the class defined in Appendix A, where we assumed the continuous choice is a unidimensional variable and we imposed additional assumptions to ensure monotonicity of the savings function. In this appendix we consider a more general class of problems, though we do not strive for maximum possible generality so as to make our proof as straightforward as possible.

Consider a finite horizon DP problem without type I extreme value taste shocks that we also refer to as the “unperturbed” DP problem. In the last period, $T$, the agent chooses a vector of $k$ continuous choice variables $c \in C_T(d, s)$, where $C_T(d, s)$ is a compact subset of $\mathbb R^k$, $d$ is one of the discrete choices, and $s$ is a potentially multidimensional vector of state variables in some Borel subset $S$ of a finite dimensional Euclidean space. We assume that the discrete choice $d$ is an element of a finite choice set $D_T(s)$. Let $u_T(d, c, s)$ be a utility function that is continuous in $c$ for each $s$ and each $d \in D_T(s)$, and a Borel measurable function of $s$ for each $c$ and $d$. Then the value function in period $T$ is given by

$$V_T(s) = \max_{d \in D_T(s)} \max_{c \in C_T(d,s)} u_T(d, c, s). \tag{40}$$

Now consider time $T-1$ and let $p_T(s' \mid d, c, s)$ be a Markov transition probability providing the conditional probability distribution over the state $s'$ at time $T$ given that the state vector at time $T-1$ is $s$, the discrete choice is $d$, and the continuous choice is $c$. Define the conditional expectation of $V_T$, $EV_T(d, c, s)$, by

$$EV_T(d, c, s) = \int V_T(s')\, p_T(\partial s' \mid d, c, s), \tag{41}$$

where we use $\partial s'$ to indicate the stochastic next period state variables over which this expectation is taken. In Assumption C below, we assume that this conditional expectation exists and is continuous in $c$ for each $s \in S$ and $d \in D_{T-1}(s)$. Then by backward induction we can define the value function $V_{T-1}(s)$ and, continuing for each $t \in \{T-1, T-2, \ldots, 0\}$, we can define the sequence of functions $\{V_t\}$ recursively using Bellman's equation,

$$V_t(s) = \max_{d \in D_t(s)} \max_{c \in C_t(d,s)} \Bigl[ u_t(d, c, s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid d, c, s) \Bigr], \tag{42}$$

where $\beta \ge 0$ is the agent's discount factor.
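To make the backward induction in (40)-(42) concrete, here is a minimal Python sketch under simplifying assumptions that are not part of the setup above: the state space is a small finite set, the continuous choice is searched over a finite grid (standing in for the compact set $C_t(d, s)$), and transitions are given as dictionaries of probabilities. The function name `solve_unperturbed` and the toy primitives are hypothetical and serve only as an illustration.

```python
import math
from typing import Callable, Dict, List, Sequence


def solve_unperturbed(
    T: int,
    states: Sequence[int],
    discrete_choices: Callable[[int, int], Sequence[int]],
    c_grid: Callable[[int, int, int], Sequence[float]],
    utility: Callable[[int, int, float, int], float],
    transition: Callable[[int, int, float, int], Dict[int, float]],
    beta: float,
) -> List[Dict[int, float]]:
    """Backward induction for V_t(s), t = T, ..., 0, following (40)-(42).

    transition(t, d, c, s) returns {s_next: prob}, standing in for p_t(s' | d, c, s).
    The continuous choice is searched over a finite grid, an approximation
    to the maximization over the compact set C_t(d, s).
    """
    V: List[Dict[int, float]] = [{} for _ in range(T + 1)]
    for t in range(T, -1, -1):  # t = T, T-1, ..., 0
        for s in states:
            best = -math.inf
            for d in discrete_choices(t, s):
                for c in c_grid(t, d, s):
                    value = utility(t, d, c, s)
                    if t < T:
                        # beta times the conditional expectation of V_{t+1}, as in (42)
                        value += beta * sum(
                            p * V[t + 1][s_next]
                            for s_next, p in transition(t + 1, d, c, s).items()
                        )
                    best = max(best, value)
            V[t][s] = best
    return V


# Hypothetical toy problem: states {0, 1, 2}, d in {0, 1}, c on a 3-point grid.
def toy_transition(t: int, d: int, c: float, s: int) -> Dict[int, float]:
    """Stay put with probability 0.5; otherwise move up by d (capped at state 2)."""
    up = min(s + d, 2)
    probs = {s: 0.5}
    probs[up] = probs.get(up, 0.0) + 0.5
    return probs


BETA = 0.9
V = solve_unperturbed(
    T=2,
    states=[0, 1, 2],
    discrete_choices=lambda t, s: [0, 1],
    c_grid=lambda t, d, s: [0.0, 0.5, 1.0],
    utility=lambda t, d, c, s: math.log(1.0 + c) - 0.3 * d + 0.1 * s,
    transition=toy_transition,
    beta=BETA,
)
```

For these toy primitives, $V_T(0) = \log 2$ (choose $d = 0$ and $c = 1$), and because state 2 is absorbing, $V_{T-1}(2) = (1 + \beta)(\log 2 + 0.2)$.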

We make the following assumptions on this limiting DP problem without taste shocks, which are sufficient to guarantee the existence of a well defined solution.

Assumption B. The choice sets $D_t(s)$ are all finite with a uniformly bounded number of elements $D$ given by

$$D = \max_{t \in \{0, 1, \ldots, T\}} \sup_{s \in S} |D_t(s)| < \infty, \tag{43}$$

where $|D_t(s)|$ denotes the number of elements in the finite set $D_t(s)$.

Assumption C. For each $t \in \{0, 1, \ldots, T\}$, each $s \in S$, and each $d \in D_t(s)$, the function $u_t(d, c, s)$ is continuous in $c$, and for each $t \in \{1, 2, \ldots, T-1\}$, $s \in S$, and $d \in D_{t-1}(s)$, the function $EV_t(d, c, s)$ given by

$$EV_t(d, c, s) = \int V_t(s')\, p_t(\partial s' \mid d, c, s) \tag{44}$$

is finite and continuous in $c$.

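Define the discrete choice-specific continuous choice function $c_t(d, s)$ by

$$c_t(d, s) = \operatorname*{argmax}_{c \in C_t(d,s)} \bigl[ u_t(d, c, s) + \beta EV_{t+1}(d, c, s) \bigr] \tag{45}$$

and the optimal discrete decision rule $\delta_t(s)$ by

$$\delta_t(s) = \operatorname*{argmax}_{d \in D_t(s)} \bigl[ u_t(d, c_t(d, s), s) + \beta EV_{t+1}(d, c_t(d, s), s) \bigr]. \tag{46}$$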

The overall optimal continuous decision rule $c_t(s)$ is then given by

$$c_t(s) = c_t(\delta_t(s), s). \tag{47}$$
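At a single state, the rules (45)-(47) amount to two nested argmax operations over a table of choice-specific values. The sketch below assumes, hypothetically, that $u_t(d, c, s) + \beta EV_{t+1}(d, c, s)$ has already been tabulated on a finite grid of $c$ for each $d$; the function name `decision_rules` and the numbers are illustrative. Ties are broken by taking the first maximizer, which selects one element when an argmax is a set.

```python
from typing import Dict, Tuple


def decision_rules(
    v: Dict[int, Dict[float, float]],
) -> Tuple[Dict[int, float], int, float]:
    """Decision rules (45)-(47) at one fixed state s.

    v[d][c] holds u_t(d, c, s) + beta * EV_{t+1}(d, c, s) on a finite grid of c.
    Ties are broken in favour of the first maximizer encountered, so each
    returned value is one element of the corresponding argmax set.
    """
    # (45): choice-specific continuous choice c_t(d, s) for every d
    c_of_d = {d: max(row, key=row.__getitem__) for d, row in v.items()}
    # (46): discrete choice, comparing each d at its own optimal c
    delta = max(v, key=lambda d: v[d][c_of_d[d]])
    # (47): overall continuous rule c_t(s) = c_t(delta_t(s), s)
    return c_of_d, delta, c_of_d[delta]


# Hypothetical choice-specific values at one state
v = {
    0: {0.0: 1.0, 0.5: 1.4, 1.0: 1.2},
    1: {0.0: 0.9, 0.5: 1.1, 1.0: 1.5},
}
c_of_d, delta, c = decision_rules(v)
```

Here $d = 0$ is best at $c = 0.5$ and $d = 1$ at $c = 1.0$; comparing the two choice-specific optima (1.4 versus 1.5) selects $d = 1$, so the overall continuous choice is $c = 1.0$.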

The solution to the DP problem is given by the collection $\Gamma$ of the $T+1$ value functions $\{V_0, V_1, \ldots, V_T\}$, the $T+1$ optimal continuous decision rules $\{c_0, c_1, \ldots, c_T\}$, and the $T+1$ optimal discrete decision rules $\{\delta_0, \delta_1, \ldots, \delta_T\}$.

Now we define a family of perturbed DP problems indexed by $\sigma$, the scale parameter of the type I extreme value distribution. Let $\varepsilon$ denote a vector of i.i.d. extreme value random variables with dimension $|D_t(s)|$, the number of elements in the finite choice set $D_t(s)$. Assume the elements of $D_t(s)$ are ordered in some fashion and let $\varepsilon(d)$ be the component of the vector $\varepsilon$ corresponding to the choice of alternative $d \in D_t(s)$. We will refer to $\varepsilon(d)$ as the $d$th taste shock. Now consider the last period $T$. The value function $V_{\sigma T}(s, \varepsilon)$ is given by

$$V_{\sigma T}(s, \varepsilon) = \max_{d \in D_T(s)} \max_{c \in C_T(d,s)} \bigl[ u_T(d, c, s) + \sigma\varepsilon(d) \bigr]. \tag{48}$$

Notice that $V_{\sigma T}$ is now a function of the vector $s$ and the vector $\varepsilon \in \mathbb{R}^{|D_T(s)|}$. If the number of elements of $D_T(s)$ varies with $s \in S$, we can embed the vector $\varepsilon$ in $\mathbb{R}^D$, where $D$ is the upper bound on the number of discrete choices by Assumption B. We can use the convention that if $|D_T(s)|$ …

$$F(\varepsilon(1), \ldots, \varepsilon(D)) = \prod_{d=1}^{D} \exp\{-\exp\{-\varepsilon(d)\}\}. \tag{49}$$

To compute the expected value of $V_{\sigma T}(s, \varepsilon)$, we apply multivariate integration to get

$$\begin{aligned} EV_{\sigma T}(d, c, s) &= \int_{s'} \int_{\varepsilon'} V_{\sigma T}(s', \varepsilon')\, F(\partial\varepsilon')\, p_T(\partial s' \mid d, c, s) \\ &= \int_{s'} \sigma \log \Bigl( \sum_{d' \in D_T(s')} \exp\bigl\{ u_T(d', c_T(d', s'), s') / \sigma \bigr\} \Bigr)\, p_T(\partial s' \mid d, c, s), \end{aligned} \tag{50}$$

where $c_T(d, s) = \operatorname*{argmax}_{c \in C_T(d,s)} u_T(d, c, s)$ is the choice-specific continuous choice function. The closed-form expression for the expectation over $\varepsilon$ (the type I extreme value random variables) is a consequence of a property of extreme value random variables known as max stability, that is, the maximum of a finite collection of independent type I extreme value random variables with a common scale has a (shifted) type I extreme value distribution. We refer to the log-sum formula inside the integral of the lower equation in (50) as the smoothed max function. We now prove a key lemma that establishes a bound between the usual max function and the smoothed max function.
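The max-stability property and the moments stated after (39) can be checked by simulation. The sketch below is illustrative rather than part of the method: it draws from (39) by inverting the CDF, $\varepsilon = -\log(-\log U)$ with $U$ uniform on $(0, 1)$, and compares a Monte Carlo estimate of $E\{\max_d [v_d + \sigma\varepsilon(d)]\}$ with the log-sum formula. Because the shocks in (39) have mean $\gamma$, the exact expectation is $\sigma\gamma + \sigma\log\sum_d \exp\{v_d/\sigma\}$; the constant $\sigma\gamma$ is common to all alternatives and does not affect choices, and the code adds it explicitly when comparing. The helper names are hypothetical.

```python
import math
import random
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def smoothed_max(v: Sequence[float], sigma: float) -> float:
    """sigma * log(sum_d exp(v_d / sigma)), computed after factoring out max(v)."""
    m = max(v)
    return m + sigma * math.log(sum(math.exp((x - m) / sigma) for x in v))


def draw_ev1(rng: random.Random) -> float:
    """One draw from F(e) = exp(-exp(-e)) in (39) by inverting the CDF."""
    u = rng.random()  # uniform on [0, 1)
    while u == 0.0:  # log(0) is undefined, so redraw
        u = rng.random()
    return -math.log(-math.log(u))


def simulated_expected_max(
    v: Sequence[float], sigma: float, n_draws: int, seed: int = 0
) -> float:
    """Monte Carlo estimate of E[max_d (v_d + sigma * eps(d))] with i.i.d. eps(d)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += max(x + sigma * draw_ev1(rng) for x in v)
    return total / n_draws


v = [1.0, 0.5, -0.2]
sigma = 0.5
exact = sigma * EULER_GAMMA + smoothed_max(v, sigma)  # max stability
estimate = simulated_expected_max(v, sigma, n_draws=100_000)
```

With a single alternative $v = [0]$ and $\sigma = 1$, the same routine estimates $E\{\varepsilon\} = \gamma$. Factoring out $\max(v)$ before exponentiating avoids overflow when $\sigma$ is small, which matters in the limit $\sigma \downarrow 0$ studied below.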

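Lemma 2 (Log-Sum Error Bounds). Let $\{v_1, \ldots, v_D\}$ be any finite set of $D$ real numbers and let $\sigma > 0$ be a constant. Then we have

$$0 \le \sigma \log \Bigl( \sum_{d=1}^{D} \exp\{v_d/\sigma\} \Bigr) - \max\{v_1, \ldots, v_D\} \le \sigma \log(D). \tag{51}$$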

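Proof. Consider the shifted values $v_d - \max\{v_1, \ldots, v_D\} \le 0$. It follows that

$$\log \Bigl( \sum_{d=1}^{D} \exp\bigl\{ (v_d - \max\{v_1, \ldots, v_D\})/\sigma \bigr\} \Bigr) \le \log \Bigl( \sum_{d=1}^{D} \exp\{0\} \Bigr) = \log(D). \tag{52}$$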

Define $d^* = \operatorname*{argmax}_d v_d$ and let $J \ge 1$ denote the number of indices $d \in \{1, \ldots, D\}$ for which $v_d = v_{d^*}$. The lower bound is obtained from observing that

$$\log \Bigl( J + \sum_{d:\, v_d < v_{d^*}} \exp\bigl\{ (v_d - \max\{v_1, \ldots, v_D\})/\sigma \bigr\} \Bigr) \ge 0. \tag{53}$$

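Combining (52) and (53) with the identity

$$\sigma \log \Bigl( \sum_{d=1}^{D} \exp\bigl\{ (v_d - \max\{v_1, \ldots, v_D\})/\sigma \bigr\} \Bigr) = \sigma \log \Bigl( \sum_{d=1}^{D} \exp\{v_d/\sigma\} \Bigr) - \max\{v_1, \ldots, v_D\} \tag{54}$$

concludes the proof. ∎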
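A quick numerical check of (51), assuming nothing beyond the lemma itself: for random vectors $v$ and scales $\sigma$, the gap between the smoothed max and the max should lie in $[0, \sigma\log(D)]$, with the upper bound attained when all $D$ values tie and the gap vanishing when one value dominates and $\sigma$ is small. The function names and the sampling ranges are illustrative choices.

```python
import math
import random
from typing import Sequence


def smoothed_max(v: Sequence[float], sigma: float) -> float:
    """sigma * log(sum_d exp(v_d / sigma)), computed after factoring out max(v)."""
    m = max(v)
    return m + sigma * math.log(sum(math.exp((x - m) / sigma) for x in v))


def lemma2_gap(v: Sequence[float], sigma: float) -> float:
    """Smoothed max minus max; Lemma 2 says 0 <= gap <= sigma * log(D)."""
    return smoothed_max(v, sigma) - max(v)


def check_lemma2(n_trials: int = 5000, seed: int = 0, tol: float = 1e-12) -> bool:
    """Test (51) on random vectors of length D in 1..8 and scales sigma in [1e-3, 10]."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        D = rng.randint(1, 8)
        sigma = 10.0 ** rng.uniform(-3.0, 1.0)
        v = [rng.uniform(-5.0, 5.0) for _ in range(D)]
        gap = lemma2_gap(v, sigma)
        if not (0.0 <= gap <= sigma * math.log(D) + tol):
            return False
    return True


# With D tied values the upper bound is attained: gap = sigma * log(D).
tie_gap = lemma2_gap([2.0, 2.0, 2.0, 2.0], 0.3)
```

The tie case mirrors the proof: with $J = D$ the sum in (53) is empty and the logarithm equals $\log(D)$, so both inequalities in (51) are sharp.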

Lemma 2 is the key to all of our subsequent results, including Theorem 5, since it shows that the difference between the max function and the smoothed max function is bounded by $\sigma \log(D)$, which tends to 0 as $\sigma \downarrow 0$. This will imply that the differences between the value functions and decision rules of the unperturbed limiting DP problem and those of the family of perturbed DP problems with extreme value distributed taste shocks converge to zero as the scale of the extreme value taste shocks, $\sigma$, converges to 0.

We can now define the value functions at all time periods for the perturbed problem as the sequence $\{V_{\sigma 0}, \ldots, V_{\sigma T}\}$, where $V_{\sigma T}$ is given by equation (48) and the other value functions are given by the Bellman recursion

$$V_{\sigma t}(s, \varepsilon) = \max_{d \in D_t(s)} \max_{c \in C_t(d,s)} \bigl[ u_t(d, c, s) + \sigma\varepsilon(d) + \beta EV_{\sigma, t+1}(d, c, s) \bigr], \tag{55}$$

where $EV_{\sigma, t+1}(d, c, s)$ is the conditional expectation of $V_{\sigma, t+1}(s', \varepsilon')$ and is given by

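$$EV_{\sigma, t+1}(d, c, s) = \int_{s'} \sigma \log \Bigl( \sum_{d' \in D_{t+1}(s')} \exp\bigl\{ v_{\sigma, t+1}(d', c_{\sigma, t+1}(d', s'), s') / \sigma \bigr\} \Bigr)\, p_{t+1}(\partial s' \mid d, c, s), \tag{56}$$

where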

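$$v_{\sigma, t+1}(d, c, s) = u_{t+1}(d, c, s) + \beta EV_{\sigma, t+2}(d, c, s) \tag{57}$$

and $c_{\sigma, t+1}(d, s)$ is the choice-specific continuous choice rule given by

$$c_{\sigma, t+1}(d, s) = \operatorname*{argmax}_{c \in C_{t+1}(d,s)} v_{\sigma, t+1}(d, c, s). \tag{58}$$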

Note that we used the max-stability property again to obtain the expression for $EV_{\sigma, t+1}(d, c, s)$ in equation (56). We also note that because the taste shocks are assumed to be not only contemporaneously independent across different discrete choices $d$ but also intertemporally independent processes, the value of the $\varepsilon$ state vector at time $t$ does not affect the conditional expectation of $V_{\sigma, t+1}$ and, hence, does not enter $EV_{\sigma, t+1}(d, c, s)$. This conditional independence restriction on the $\varepsilon$ shocks is critical to all of the results that follow.

Having defined the set of value functions for the family of perturbed problems, we can define the full solution of the perturbed problem as the collection $\Gamma_\sigma$ consisting of the value functions $(V_{\sigma 0}, \ldots, V_{\sigma T})$, the continuous decision rules $(c_{\sigma 0}, \ldots, c_{\sigma T})$, and the discrete decision rules $(\delta_{\sigma 0}, \ldots, \delta_{\sigma T})$. Note that all of these objects depend on both $s$ and $\varepsilon$, which constitute the full vector of state variables in the perturbed problem. In particular, the discrete decision rule $\delta_{\sigma t}(s, \varepsilon)$ can be defined using the choice-specific continuous choice rule $c_{\sigma t}(d, s)$ as

$$\delta_{\sigma t}(s, \varepsilon) = \operatorname*{argmax}_{d \in D_t(s)} \bigl[ v_{\sigma t}(d, c_{\sigma t}(d, s), s) + \sigma\varepsilon(d) \bigr], \tag{59}$$

and the unconditional or continuous decision rule can be defined using the choice-specific continuous choice rules by

$$c_{\sigma t}(s, \varepsilon) = c_{\sigma t}(\delta_{\sigma t}(s, \varepsilon), s). \tag{60}$$

To define a notion of convergence of the solution $\Gamma_\sigma$ of the family of perturbed DP problems to the solution $\Gamma$ of the limiting unperturbed problem, we have to confront the difficulty that the state space for the family of perturbed problems is the set of points of the form $(s, \varepsilon)$ for $s \in S$ and $\varepsilon \in \mathbb{R}^D$, whereas the state space of the limiting unperturbed problem is just $S$. We start by noting the representation for the value functions of the perturbed problem,

$$V_{\sigma t}(s, \varepsilon) = \max_{d \in D_t(s)} \bigl[ v_{\sigma t}(d, c_{\sigma t}(d, s), s) + \sigma\varepsilon(d) \bigr], \tag{61}$$

which follows directly from the Bellman equation (55) and the definition of the $v_{\sigma t}$ function in equation (57). We now compute a partial expectation of the value functions $V_{\sigma t}(s, \varepsilon)$ over $\varepsilon$, holding the state variable $s$ fixed. That is, we define the partial expectation $EV_{\sigma t}(s)$ as the function given by

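$$EV_{\sigma t}(s) = \int V_{\sigma t}(s, \varepsilon)\, F(\partial\varepsilon) = \sigma \log \Bigl( \sum_{d \in D_t(s)} \exp\bigl\{ v_{\sigma t}(d, c_{\sigma t}(d, s), s) / \sigma \bigr\} \Bigr). \tag{62}$$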

We are now in a position to state the main result, which is a reformulation of Theorem 3 for a more general class of DC models than the consumption retirement model in Section 2.

Theorem 5 (Extreme Value Homotopy Principle). Under Assumptions B and C, let

$$\Gamma = \bigl\{ (V_0, \ldots, V_T), (\delta_0, \ldots, \delta_T), (c_0, \ldots, c_T) \bigr\} \tag{63}$$

be the solution to the limiting DP problem without taste shocks given in equations (40), (42), (45), and (46) above. Similarly, let

$$\Gamma_\sigma = \bigl\{ (V_{\sigma 0}, \ldots, V_{\sigma T}), (\delta_{\sigma 0}, \ldots, \delta_{\sigma T}), (c_{\sigma 0}, \ldots, c_{\sigma T}) \bigr\} \tag{64}$$

be the solution to the perturbed DP problem with type I extreme value taste shocks with scale parameter $\sigma > 0$ given in equations (48), (55), (56), (59), and (60). Then as $\sigma \to 0$ we have

$$\lim_{\sigma \downarrow 0} \Gamma_\sigma = \Gamma, \tag{65}$$

where the convergence of value functions is defined in terms of the partial expectations of the value functions for the perturbed problems with taste shocks, with $EV_{\sigma t}(s)$ given in equation (62). It follows that the uniform bound

$$\forall t: \quad \sup_{s \in S} \bigl| EV_{\sigma t}(s) - V_t(s) \bigr| \le \sigma \sum_{j=0}^{T-t} \beta^j \log(D) \tag{66}$$

holds, and the decision rules converge pointwise for all $(s, \varepsilon)$, $s \in S$, and $\varepsilon \in \mathbb{R}^D$, that is,

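$$\lim_{\sigma \downarrow 0} \delta_{\sigma t}(s, \varepsilon) = \delta_t(s), \qquad \lim_{\sigma \downarrow 0} c_{\sigma t}(s, \varepsilon) = c_t(s), \tag{67}$$

assuming that the decision rules of the limiting problem, $\delta_t(s)$ and $c_t(s)$, are singletons; otherwise the limits are elements of the sets $(\delta_t(s), c_t(s))$.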

Proof. We prove Theorem 5 in three steps. First, we prove (66) by induction using Lemma 2 and show that the bounds are independent of $s$. Second, we prove convergence of decision rules assuming that the limiting problem $\Gamma$ has a unique solution. Third, we extend the latter result to nonsingleton solution sets.

Lemma 3 (DP Error Bounds). Let $V_t(s)$ be the value function for the unperturbed DP problem and let $EV_{\sigma t}(s)$ be the partial expectation of the value function $V_{\sigma t}(s, \varepsilon)$ of the perturbed DP problem. Then we have

$$\forall t, s: \quad 0 \le EV_{\sigma t}(s) - V_t(s) \le \sigma \sum_{j=0}^{T-t} \beta^j \log(D). \tag{68}$$

Proof. Lemma 3 can be proved by induction using Lemma 2. We work out the first several steps of the inductive argument, starting at period $T$. In period $T$, $V_T(s)$ is given by equation (40), which can be rewritten in terms of the choice-specific continuous choice rule as

$$V_T(s) = \max_{d \in D_T(s)} u_T(d, c_T(d, s), s) \tag{69}$$

and, similarly, we have $EV_{\sigma T}(s)$ given by

$$EV_{\sigma T}(s) = \sigma \log \Bigl( \sum_{d \in D_T(s)} \exp\bigl\{ u_T(d, c_T(d, s), s) / \sigma \bigr\} \Bigr), \tag{70}$$

since it is easy to see that $c_T(d, s) = c_{\sigma T}(d, s)$ in the final period $T$. Using Lemma 2, we obtain the bounds

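$$0 \le EV_{\sigma T}(s) - V_T(s) \le \sigma \log(D) \quad \forall s \in S, \tag{71}$$

which establish the base case for our induction proof. Now suppose the inductive hypothesis holds, that is, the error bounds are given by equation (68) at periods $T, T-1, \ldots, t+1$. We now want to show that they also hold at period $t$. We have

$$V_t(s) = \max_{d \in D_t(s)} \Bigl[ u_t(d, c_t(d, s), s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid d, c_t(d, s), s) \Bigr] \tag{72}$$

and

$$EV_{\sigma t}(s) = \sigma \log \Bigl( \sum_{d \in D_t(s)} \exp\Bigl\{ \frac{1}{\sigma} \Bigl[ u_t(d, c_{\sigma t}(d, s), s) + \beta \int EV_{\sigma, t+1}(s')\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d, s), s) \Bigr] \Bigr\} \Bigr). \tag{73}$$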

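Note that $c_{\sigma t}(d, s)$ is the choice-specific continuous decision rule for the perturbed problem. Define a function $\tilde V_t(s)$ by substituting $c_{\sigma t}(d, s)$ for $c_t(d, s)$ in equation (72):

$$\tilde V_t(s) = \max_{d \in D_t(s)} \Bigl[ u_t(d, c_{\sigma t}(d, s), s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d, s), s) \Bigr]. \tag{74}$$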

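Since $c_{\sigma t}(d, s)$ is not necessarily an optimal choice-specific consumption for the unperturbed problem, it follows that

$$\tilde V_t(s) \le V_t(s) \quad \forall s \in S. \tag{75}$$

Similarly, define the function $\widetilde{EV}_{\sigma t}(s)$ by substituting the conditional expectation of $V_{t+1}$ for the conditional expectation of $EV_{\sigma, t+1}$ in the formula for $EV_{\sigma t}(s)$ in equation (73). We have

$$\widetilde{EV}_{\sigma t}(s) = \sigma \log \Bigl( \sum_{d \in D_t(s)} \exp\Bigl\{ \frac{1}{\sigma} \Bigl[ u_t(d, c_{\sigma t}(d, s), s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d, s), s) \Bigr] \Bigr\} \Bigr). \tag{76}$$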

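Note that we can write

$$\begin{aligned} EV_{\sigma t}(s) = \sigma \log \Bigl( \sum_{d \in D_t(s)} \exp\Bigl\{ \frac{1}{\sigma} \Bigl[ & u_t(d, c_{\sigma t}(d, s), s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d, s), s) \\ & + \beta \int \bigl[ EV_{\sigma, t+1}(s') - V_{t+1}(s') \bigr]\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d, s), s) \Bigr] \Bigr\} \Bigr). \end{aligned} \tag{77}$$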

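By the inductive hypothesis, it follows that

$$\beta \int \bigl[ EV_{\sigma, t+1}(s') - V_{t+1}(s') \bigr]\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d, s), s) \le \sigma\beta \sum_{j=0}^{T-t-1} \beta^j \log(D). \tag{78}$$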
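The passage from (78) to (79), and then to the upper bound in (81), uses two elementary properties of the smoothed max $\sigma\log\sum_d \exp\{x_d/\sigma\}$: it is nondecreasing in each $x_d$, and adding the same constant $K$ to every $x_d$ adds $K$ to the result. Writing $a_d$ for the bracketed terms that enter (76) and $b_d$ for the correction term in (77), the steps can be spelled out as follows (the shorthand $a_d$, $b_d$, $K$ is introduced here only for this derivation).

```latex
\[
\begin{aligned}
a_d &= u_t(d, c_{\sigma t}(d,s), s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d,s), s), \\
b_d &= \beta \int \bigl[ EV_{\sigma,t+1}(s') - V_{t+1}(s') \bigr]\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d,s), s)
      \;\le\; K \equiv \sigma\beta \sum_{j=0}^{T-t-1} \beta^j \log(D) \quad \text{by (78)}, \\
EV_{\sigma t}(s) &= \sigma \log \sum_{d \in D_t(s)} \exp\{(a_d + b_d)/\sigma\}
  \;\le\; \sigma \log \sum_{d \in D_t(s)} \exp\{(a_d + K)/\sigma\}
  \;=\; \widetilde{EV}_{\sigma t}(s) + K, \\
EV_{\sigma t}(s) - V_t(s) &\le \widetilde{EV}_{\sigma t}(s) - \tilde V_t(s) + K
  \;\le\; \sigma\log(D) + K \;=\; \sigma \sum_{j=0}^{T-t} \beta^j \log(D).
\end{aligned}
\]
```

The third line is (79); the last line combines it with (75) and (80), and the final equality is the geometric-sum identity $\log(D) + \beta\sum_{j=0}^{T-t-1}\beta^j\log(D) = \sum_{j=0}^{T-t}\beta^j\log(D)$.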

Thus, it follows from inequality (78) that the inequality

$$EV_{\sigma t}(s) \le \widetilde{EV}_{\sigma t}(s) + \sigma\beta \sum_{j=0}^{T-t-1} \beta^j \log(D) \tag{79}$$

holds. From Lemma 2 we have

$$\widetilde{EV}_{\sigma t}(s) - \tilde V_t(s) \le \sigma \log(D). \tag{80}$$

Using inequalities (75), (79), and (80), it follows that

$$0 \le EV_{\sigma t}(s) - V_t(s) \le \sigma \sum_{j=0}^{T-t} \beta^j \log(D), \tag{81}$$

completing the induction step of the argument. It follows by mathematical induction that inequality (68) holds for all $t \in \{0, 1, \ldots, T\}$, so Lemma 3 is proved. ∎
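The bound (68) can also be checked numerically on a small problem. The sketch below is illustrative and rests on assumptions not made in the text: a hypothetical three-state toy problem, a finite grid standing in for $C_t(d, s)$, and a single backward-induction routine in which the aggregation over $d$ is the hard max when $\sigma = 0$ (giving $V_t$ as in (42)) and the smoothed max when $\sigma > 0$ (giving $EV_{\sigma t}$ as in (56)-(58) and (62)). It then verifies $0 \le EV_{\sigma t}(s) - V_t(s) \le \sigma\sum_{j=0}^{T-t}\beta^j\log(D)$ at every $(t, s)$. All names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical toy problem, for illustration only.
STATES = [0, 1, 2]
CHOICES = [0, 1]           # D = 2 discrete alternatives in every state
C_GRID = [0.0, 0.5, 1.0]   # finite grid standing in for C_t(d, s)
BETA = 0.9


def utility(d: int, c: float, s: int) -> float:
    return math.log(1.0 + c) - 0.3 * d + 0.1 * s


def transition(d: int, c: float, s: int) -> Dict[int, float]:
    """Stay put with probability 0.5; otherwise move up by d (capped at state 2)."""
    up = min(s + d, 2)
    probs = {s: 0.5}
    probs[up] = probs.get(up, 0.0) + 0.5
    return probs


def aggregate(values: List[float], sigma: float) -> float:
    """Hard max over d if sigma == 0, smoothed max (log-sum) if sigma > 0."""
    m = max(values)
    if sigma == 0.0:
        return m
    return m + sigma * math.log(sum(math.exp((x - m) / sigma) for x in values))


def solve(T: int, sigma: float) -> List[Dict[int, float]]:
    """V_t(s) from (42) if sigma == 0; partial expectations EV_{sigma t}(s) from (62) otherwise."""
    W: List[Dict[int, float]] = [{} for _ in range(T + 1)]
    for t in range(T, -1, -1):
        for s in STATES:
            per_choice = []
            for d in CHOICES:
                # inner maximization over c for this d, as in (45) or (58)
                best = -math.inf
                for c in C_GRID:
                    value = utility(d, c, s)
                    if t < T:
                        value += BETA * sum(
                            p * W[t + 1][s_next] for s_next, p in transition(d, c, s).items()
                        )
                    best = max(best, value)
                per_choice.append(best)
            W[t][s] = aggregate(per_choice, sigma)
    return W


def lemma3_holds(T: int, sigma: float, tol: float = 1e-12) -> bool:
    """Check 0 <= EV_{sigma t}(s) - V_t(s) <= sigma * sum_{j=0}^{T-t} beta^j * log(D) for all t, s."""
    V = solve(T, 0.0)
    EV = solve(T, sigma)
    D = len(CHOICES)
    for t in range(T + 1):
        bound = sigma * sum(BETA ** j for j in range(T - t + 1)) * math.log(D)
        for s in STATES:
            gap = EV[t][s] - V[t][s]
            if not (-tol <= gap <= bound + tol):
                return False
    return True


def sup_gap(T: int, sigma: float) -> float:
    """Largest value of EV_{sigma t}(s) - V_t(s) over all t and s."""
    V, EV = solve(T, 0.0), solve(T, sigma)
    return max(EV[t][s] - V[t][s] for t in range(T + 1) for s in STATES)
```

Treating the unperturbed problem as the $\sigma = 0$ case of the same routine mirrors the homotopy idea: only the aggregation over discrete choices changes, while the inner optimization over $c$ is identical. As $\sigma$ shrinks, `sup_gap` shrinks in proportion, consistent with the linear-in-$\sigma$ right-hand side of (68).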

Note that the bound (68) is uniform over all states $s \in S$ since the right-hand side of the inequality does not depend on $s$. In particular, we do not need to rely on any continuity or boundedness assumptions about $V_t(s)$: this function could potentially be nonsmooth or even discontinuous in $s$ and an unbounded function of $s$, something typical in many economic problems with consumption and saving, including the retirement problem we analyzed in Section 2. It follows from the uniformity of bound (68) that (66) holds.

We turn now to establishing that the decision rules $\delta_{\sigma t}(s, \varepsilon)$ and $c_{\sigma t}(s, \varepsilon)$ in the perturbed problem converge to the optimal decision rules $\delta_t(s)$ and $c_t(s)$ in the limiting unperturbed DP problem for $t \in \{0, 1, \ldots, T\}$. We will allow for the possibility that there are multiple values of $d$ and $c$ that attain the optimum values in equations (45) and (46) above, so in general we can interpret $c_t(s)$ and $\delta_t(s)$ as correspondences (i.e., set-valued functions of $s$). However, the pointwise argument is simplest in the case where there is a unique discrete and continuous decision attaining the optimum, so we first present the argument for this case in Lemma 4 below.

Lemma 4 (Policy Convergence 1). Consider a point $s \in S$ for which $\delta_t(s)$ is a single element $d \in D_t(s)$ and $c_t(s)$ is a single element of the set of feasible continuous choices $C_t(\delta_t(s), s)$ that attains the optimum. Then (67) holds for any $\varepsilon \in \mathbb{R}^D$.

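Proof. Since the pair of decisions $(\delta_t(s), c_t(s))$ is the unique optimizer of the Bellman equation in state $s \in S$, we have

$$\begin{aligned} & u_t(\delta_t(s), c_t(s), s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid \delta_t(s), c_t(s), s) \\ &\quad = u_t\bigl(\delta_t(s), c_t(\delta_t(s), s), s\bigr) + \beta \int V_{t+1}(s')\, p_{t+1}\bigl(\partial s' \mid \delta_t(s), c_t(\delta_t(s), s), s\bigr) \\ &\quad > u_t(d, c, s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid d, c, s) \end{aligned} \tag{82}$$

for all $d \in D_t(s)$ and $c \in C_t(d, s)$ with $(d, c) \ne (\delta_t(s), c_t(s))$.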

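Let $d$ be any limit point of the sequence $\{\delta_{\sigma t}(s, \varepsilon)\}$. Since feasibility requires $\delta_{\sigma t}(s, \varepsilon) \in D_t(s)$ and $D_t(s)$ is a finite set, at least one limit point must exist. Similarly, let $c$ be a limit point of the continuous decision rule $c_{\sigma t}(s, \varepsilon) = c_{\sigma t}(\delta_{\sigma t}(s, \varepsilon), s)$. At least one such limit point must also exist, since feasibility requires that $c_{\sigma t}(s, \varepsilon) \in C_t(\delta_{\sigma t}(s, \varepsilon), s)$, where the latter is a compact set due to Assumption C. This follows since we are considering a subsequence $\{\delta_{\sigma_n t}(s, \varepsilon)\}$ that converges to a particular choice $d \in D_t(s)$, so for sufficiently large $n$ (or sufficiently small $\sigma_n$) the elements of the sequence $\{c_{\sigma_n t}(s, \varepsilon)\}$ must belong to the compact set $C_t(d, s)$ and thus must have at least one limit point. Now we show that $d = \delta_t(s)$ and $c = c_t(s)$, since otherwise we would have a contradiction of the strict optimality of the decisions $(\delta_t(s), c_t(s))$ in inequality (82). We have

$$\begin{aligned} EV_{\sigma t}(s) &= \sigma \log \Bigl( \sum_{d \in D_t(s)} \exp\Bigl\{ \frac{1}{\sigma} \Bigl[ u_t(d, c_{\sigma t}(d, s), s) + \beta \int EV_{\sigma, t+1}(s')\, p_{t+1}(\partial s' \mid d, c_{\sigma t}(d, s), s) \Bigr] \Bigr\} \Bigr) \\ &= \int V_{\sigma t}(s, \varepsilon)\, F(\partial\varepsilon) \\ &= \int \Bigl[ u_t\bigl(\delta_{\sigma t}(s, \varepsilon), c_{\sigma t}(\delta_{\sigma t}(s, \varepsilon), s), s\bigr) + \sigma\varepsilon\bigl(\delta_{\sigma t}(s, \varepsilon)\bigr) \\ &\qquad + \beta \int EV_{\sigma, t+1}(s')\, p_{t+1}\bigl(\partial s' \mid \delta_{\sigma t}(s, \varepsilon), c_{\sigma t}(\delta_{\sigma t}(s, \varepsilon), s), s\bigr) \Bigr] F(\partial\varepsilon). \end{aligned} \tag{83}$$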

By Lemma 3 we have that, uniformly for each $t \in \{0, 1, \ldots, T\}$ and all $s \in S$,

$$\lim_{\sigma \downarrow 0} EV_{\sigma t}(s) = V_t(s). \tag{84}$$

Using the fact that for a subsequence $\{\sigma_n\}$ converging to zero we have

$$\lim_{\sigma_n \downarrow 0} \delta_{\sigma_n t}(s, \varepsilon) = d, \qquad \lim_{\sigma_n \downarrow 0} c_{\sigma_n t}(\delta_{\sigma_n t}(s, \varepsilon), s) = c, \tag{85}$$

these limits, together with the representation of $EV_{\sigma t}(s)$ in the last equation of (83), imply that

$$V_t(s) = u_t(d, c, s) + \beta \int V_{t+1}(s')\, p_{t+1}(\partial s' \mid d, c, s). \tag{86}$$

Because $\delta_t(s)$ and $c_t(s)$ are the unique optimizers of the Bellman equation in (82) above, it follows that $d = \delta_t(s)$ and $c = c_t(s)$. This argument holds for all cluster points of $\{\delta_{\sigma t}(s, \varepsilon)\}$ and $\{c_{\sigma t}(s, \varepsilon)\}$, so for any sequence $\{\sigma_n\}$ with $\lim_n \sigma_n = 0$, the sequences $\{\delta_{\sigma_n t}(s, \varepsilon)\}$ and $\{c_{\sigma_n t}(s, \varepsilon)\}$ converge to $\delta_t(s)$ and $c_t(s)$, respectively. This proves the limits claimed in equation (67), which is the statement of Lemma 4. ∎
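For a fixed draw of $\varepsilon$, the convergence of the discrete rule in Lemma 4 can be seen directly. The sketch holds the choice-specific values $v_d$ fixed, which is a simplification (in the full problem $v_{\sigma t}$ also depends on $\sigma$ through $EV_{\sigma, t+1}$, which Lemma 3 controls), so it isolates the effect of the term $\sigma\varepsilon(d)$ in (59). When the unperturbed maximizer $\delta$ is unique, $\delta_\sigma = \delta$ exactly when $\sigma(\varepsilon(d) - \varepsilon(\delta)) < v_\delta - v_d$ for all $d \ne \delta$, so the perturbed choice coincides with the limit once $\sigma$ falls below a threshold that depends on $\varepsilon$. All function names and values are hypothetical.

```python
import math
import random
from typing import List, Sequence


def perturbed_discrete_choice(v: Sequence[float], eps: Sequence[float], sigma: float) -> int:
    """delta_{sigma t}(s, eps) as in (59), with the choice-specific values v_d held fixed."""
    return max(range(len(v)), key=lambda d: v[d] + sigma * eps[d])


def switch_threshold(v: Sequence[float], eps: Sequence[float]) -> float:
    """sigma_bar such that the perturbed choice equals argmax_d v_d for all sigma < sigma_bar.

    Assumes the unperturbed maximizer is unique, as in Lemma 4.
    Returns math.inf if no alternative can overtake it.
    """
    best = max(range(len(v)), key=lambda d: v[d])
    sigma_bar = math.inf
    for d in range(len(v)):
        if d != best and eps[d] > eps[best]:
            sigma_bar = min(sigma_bar, (v[best] - v[d]) / (eps[d] - eps[best]))
    return sigma_bar


def draw_ev1_vector(D: int, rng: random.Random) -> List[float]:
    """D i.i.d. draws from F(e) = exp(-exp(-e)) in (39), by inverting the CDF."""
    draws: List[float] = []
    while len(draws) < D:
        u = rng.random()
        if u > 0.0:  # skip u == 0, where log(0) is undefined
            draws.append(-math.log(-math.log(u)))
    return draws


rng = random.Random(3)
v = [1.0, 1.2, 0.7]  # hypothetical values; the unperturbed choice is d = 1
eps = draw_ev1_vector(len(v), rng)
sigma_bar = switch_threshold(v, eps)
# For every sigma below sigma_bar the perturbed choice already equals the limit in (67).
choice_small_sigma = perturbed_discrete_choice(v, eps, 0.5 * min(sigma_bar, 1.0))
```

In this simplified setting the convergence of the discrete rule is eventually exact rather than merely asymptotic: the threshold is smaller when an inferior alternative receives a large taste shock relative to $\varepsilon(\delta)$, but it is positive for every fixed $\varepsilon$.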

Finally, we consider the case where $\delta_t(s)$ and/or $c_t(s)$ are not singletons. We also allow the optimal decision rules of the perturbed problem, $\delta_{\sigma t}(s, \varepsilon)$ and $c_{\sigma t}(s, \varepsilon)$, to be correspondences (corresponding to the case where multiple choices attain the optimum in the Bellman equation). The fact that the extreme value taste shocks are continuously distributed over the entire real line implies that for almost all $\varepsilon$, $\delta_{\sigma t}(s, \varepsilon)$ will be a singleton (i.e., there will be a unique discrete choice that maximizes the agent's utility). We now show in Lemma 5 that even when we allow for nonuniqueness in the optimizing choices of $(d, c)$ in both the perturbed problem and the limiting unperturbed problem, the correspondences $\delta_{\sigma t}(s, \varepsilon)$ and $c_{\sigma t}(s, \varepsilon)$ are upper hemicontinuous, that is, if we have limits given by

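$$\lim_{\sigma \downarrow 0} \delta_{\sigma t}(s, \varepsilon) = d, \qquad \lim_{\sigma \downarrow 0} c_{\sigma t}(s, \varepsilon) = c, \tag{87}$$

where we now allow for the possibility that the limits $d$ and $c$ are actual sets, upper hemicontinuity requires that $d \subset \delta_t(s)$ and $c \subset c_t(s)$.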

Lemma 5 (Policy Convergence 2). Consider a point $s \in S$ where the decision rules $\delta_t(s)$ and $c_t(s)$ are potentially nonunique, that is, they may be sets of points in $D_t(s)$ and $C_t(\delta_t(s), s)$, respectively. Then the correspondences $\delta_{\sigma t}(s, \varepsilon)$ and $c_{\sigma t}(s, \varepsilon)$ are upper hemicontinuous, and for almost all $\varepsilon$, $\delta_{\sigma t}(s, \varepsilon)$ is a singleton, which implies that its limit $d$ is a single element of $\delta_t(s)$.

Proof. The proof is similar to that of Lemma 4, except that we now allow for the possibility that in the limiting DP model without taste shocks there may be multiple values of $d \in D_t(s)$ and $c \in C_t(\delta_t(s), s)$ that attain the maximum of the Bellman equation in equations (45) and (46) above. Since the extreme value distribution is continuous, the probability that there are any ties in the perturbed DP problem with taste shocks is zero (with respect to the extreme value distribution), and thus for almost all $(s, \varepsilon)$, $\delta_{\sigma t}(s, \varepsilon)$ is a singleton and its limit $d$ is a singleton. Following the reasoning of Lemma 4, if $c$ is a limit point of $c_{\sigma t}(s, \varepsilon)$, then as $\sigma \to 0$ we can represent $c$ as

$$c \in \lim_{\sigma \downarrow 0} c_{\sigma t}(d, s), \tag{88}$$

that is, $c$ is one of the limit points of $\{c_{\sigma t}(s, \varepsilon)\}$. Now suppose that the pair $(d, c)$ is not optimal, that is, $d \notin \delta_t(s)$ or $c \notin c_t(s)$. Then the same argument as in Lemma 4 shows that equation (86) holds; but if $(d, c)$ is not optimal, this contradicts the fact that $V_t(s)$ attains the maximum over all feasible $(d, c)$ values in equations (45) and (46). ∎

This concludes the proof of Theorem 5. ∎


Co-editor Karl Schmedders handled this manuscript.

Manuscript received 30 November, 2015; final version accepted 14 November, 2016; available online 27 December, 2016.