Optimal Taxation and Human Capital Policies Over the Life Cycle

NBER WORKING PAPER SERIES


Stefanie Stantcheva

Working Paper 21207 http://www.nber.org/papers/w21207

NATIONAL BUREAU OF ECONOMIC RESEARCH 1050 Massachusetts Avenue Cambridge, MA 02138 May 2015

I want to thank James Poterba and Ivan Werning for invaluable advice, guidance, and encouragement throughout this project. Esther Duflo and Robert Townsend provided helpful and generous support. I benefited very much from Emmanuel Saez's insightful comments. I also thank seminar participants at Berkeley, Booth, Chicago, Harvard, Michigan, MIT, Penn, Princeton, Santa Barbara, Stanford, Stanford GSB, Wharton, and Yale for their useful feedback and comments. The views expressed herein are those of the author and do not necessarily reflect the views of the National Bureau of Economic Research.

NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.

© 2015 by Stefanie Stantcheva. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

Optimal Taxation and Human Capital Policies over the Life Cycle
Stefanie Stantcheva
NBER Working Paper No. 21207
May 2015
JEL No. H21,H23,I21,I22,I24

ABSTRACT

This paper derives optimal income tax and human capital policies in a dynamic life cycle model of labor supply and risky human capital formation. The wage is a function of both stochastic, persistent, and exogenous “ability” and endogenous human capital. Human capital is acquired throughout life through monetary expenses. The government faces asymmetric information regarding the initial ability of agents and the lifetime evolution of ability, as well as the labor supply. The optimal subsidy on human capital expenses is determined by three considerations: counterbalancing distortions to human capital investment from the taxation of wage and capital income, encouraging labor supply, and providing insurance against adverse draws from the productivity distribution. When the wage elasticity with respect to ability is increasing in human capital, the optimal subsidy involves less than full deductibility of human capital expenses from the tax base, and falls with age. I consider two ways to implement the optimum: income contingent loans, and a tax scheme that allows for a deferred deductibility of human capital expenses. Numerical results suggest that full dynamic risk-adjusted deductibility of expenses might be close to optimal, and that simple linear age-dependent policies can achieve most of the welfare gain from the second best.

Stefanie Stantcheva
Department of Economics
Littauer Center 232
Harvard University
Cambridge, MA 02138
and NBER

An online appendix is available at: http://www.nber.org/data-appendix/w21207

1 Introduction

Investments in human capital, in the form of both time and money, play a key role in most people’s lives. Children and young adults acquire education, and human capital accumulation continues throughout life through job training. There is a two-way interaction between human capital and the tax system. On the one hand, investments in human capital are influenced by tax policy, a point recognized early on by Schultz (1961).1 Taxes on labor income discourage investment by capturing part of the return to human capital, yet they also help insure against earnings risk, thereby encouraging investment in risky human capital. Capital taxes affect the choice between physical and human capital. On the other hand, investments in human capital directly affect the available tax base and are a major determinant of the pre-tax income distribution. Policies to stimulate human capital acquisition, which vary greatly across countries, shape the skill distribution of workers, a crucial input into optimal income taxation models. This two-way feedback calls for a joint analysis of optimal income taxes and optimal human capital policies over the life cycle, which is the goal of this paper.

The vast majority of optimal tax research assumes that productivity is exogenously determined, rather than being the product of investment decisions made throughout life. This paper therefore addresses the following questions. First, how, if at all, should the tax and social insurance system take human capital acquisition into account? Should human capital expenses be tax deductible? Second, which parameters matter for setting optimal human capital policies, such as subsidies, and how do optimal policies evolve over time? Finally, what combination of policy instruments implements the optimum? Can simple policies yield a level of welfare close to that achieved with complex systems?
Specifically, this paper jointly determines optimal tax and human capital policies over the life cycle, and incorporates essential characteristics of the human capital acquisition process. First, human capital pays off over long periods of time, so its returns are inherently uncertain: skills can be rendered obsolete by unpredictable changes in technology, industry shocks, or macroeconomic contractions. Yet private markets for insurance against personal productivity shocks are limited. Second, there are large and growing financial costs to human capital acquisition, which can deter efficient investment in human capital. Finally, individuals have heterogeneous intrinsic abilities, which may differentially affect their returns to human capital investment.2 Accordingly, in the model, each individual’s wage is a function of endogenous human capital and stochastic “ability.” Ability, as in the standard Mirrlees (1971) income taxation model, is a comprehensive measure of the exogenous component of productivity. Agents have heterogeneous innate abilities, which are subject to persistent and privately uninsurable shocks. Throughout their lives, they can invest in human capital with risky returns by spending money. The government maximizes a standard social welfare function under asymmetric information about each agent’s ability (both its initial level and its evolution over life) as well as labor effort. This requires imposing incentive compatibility constraints in the dynamic mechanism designed by the government.

1 “Our tax laws everywhere discriminate against human capital. Although the stock of such capital has become large and even though it is obvious that human capital, like other forms of reproducible capital, depreciates, becomes obsolete and entails maintenance, our tax laws are all but blind on these matters.” Schultz (1961), page 17.
2 The empirical evidence on this issue is reviewed in section 5.1.

To describe the distortions in the resulting constrained efficient allocations, I analyze the wedges, or implicit taxes and subsidies. Despite the complexity of the model, a very simple relation between the optimal human capital and labor wedges is derived. The implicit subsidy for human capital expenses is determined by three goals. The first is to counterbalance the distortions to human capital that stem indirectly from the labor and savings distortions. When these distortions are perfectly counterbalanced, the tax system is neutral with respect to human capital investments. I introduce the notion of full dynamic risk-adjusted deductibility, which ensures this neutrality. The second goal is to stimulate labor supply by increasing the wage, i.e., the return to labor. The third is to redistribute and provide insurance, taking into account the differential effect of human capital on the pre-tax income of high and low ability people. When the percentage (or proportional) change in the wage of high ability agents from human capital is not larger than that of low ability agents, human capital has a positive redistributive effect on after-tax income and a positive insurance value.
It is then optimal to subsidize human capital expenses beyond simply ensuring a tax system that is neutral with respect to human capital expenses, i.e., beyond making human capital expenses fully tax deductible in a dynamic, risk-adjusted fashion. In this case, the human capital wedge also drifts up with age. The persistence of ability shocks translates directly into persistence of the optimal human capital wedge over time. While the sign of the human capital subsidy is determined exclusively by the complementarity between human capital and ability, its magnitude is modulated by the strength of the insurance and redistributive motives and by the persistence of ability shocks. This paper considers two ways of implementing the constrained efficient allocations: Income Contingent Loans (ICLs) and a “Deferred Deductibility” scheme. For ICLs, the loan repayment schedules are contingent on the past history of earnings and human capital investments. In the Deferred Deductibility scheme, only part of current investment in human capital can be deducted from current taxable income. The remainder is deducted from future taxable incomes, to account for the risk and the nonlinearity of the tax schedule. I calibrate the model to U.S. data to illustrate the optimal policies under different assumptions regarding the complementarity between human capital and ability. When human capital has a positive redistributive or insurance value, the net stimulus to human capital is small and positive, and grows with age. It is not optimal to deviate much from a neutral system with respect to human capital, a type of “production efficiency” result; hence, full dynamic risk-adjusted deductibility is close to optimal. Simple linear age-dependent human capital subsidies, together with income and savings taxes, achieve almost the entire welfare gain of the full second-best optimum for the calibrations studied.

1.1 Related literature

The complex process of human capital acquisition has been studied in a long-standing literature, starting with Becker (1964), Ben-Porath (1967), and Heckman (1976). The model in this paper tries to adopt, in a stylized way, some of this literature’s main findings. The structural branch of the literature (Cunha, Heckman, Lochner, and Masterov, 2006; Cunha and Heckman, 2007, 2008) emphasizes that human capital acquisition occurs throughout life, underscoring the need for a life cycle model. Both ex ante heterogeneity in the returns to human capital and uncertainty matter. A large body of empirical work documents the importance of human capital as a determinant of earnings (Card, 1995; Goldin and Katz, 2008; Huggett and Kaplan, 2011), and the financial and other factors shaping individuals’ decisions to acquire human capital (Lochner and Monge-Naranjo, 2011). The subset of this literature which studies the interaction between ability and schooling for earnings – a crucial consideration for optimal policies in this paper – is reviewed in detail in section 5.1. On the other hand, the optimal taxation literature, dating back to Mirrlees (1971), and developed more recently by Saez (2001), Kocherlakota (2005), Albanesi and Sleet (2006), Golosov, Tsyvinski, and Werning (2006), Battaglini and Coate (2008), Scheuer (2014), Golosov, Troshkin, and Tsyvinski (2013), and Farhi and Werning (2013) typically assumes exogenous ability, thus abstracting from endogenous human capital investments. Therefore, this paper builds on the lifecycle framework in Farhi and Werning (2013), and introduces endogenous stochastic productivity as the result of human capital acquisition by agents. A series of papers, evolving from static to dynamic, have considered optimal taxation jointly with education policies. 
Bovenberg and Jacobs (2005), using a static taxation model, find that education subsidies and income taxes are “Siamese Twins” and should always be set equal to each other, which is equivalent to making human capital expenses fully tax deductible. A few subsequent static papers emphasize the importance of the complementarity between intrinsic ability and human capital (Maldonado (2008), with two types, and Jacobs and Bovenberg (2011), with a continuum of types), or between risk and human capital (Da Costa and Maestri, 2007).

Several recent dynamic optimal tax papers examine the impact of taxation on human capital, with important differences from the current paper. Previous dynamic models allowed for heterogeneity across agents but not uncertainty (Bohacek and Kapicka, 2008; Kapicka, 2013), or for uncertainty but not heterogeneity (Anderberg, 2009; Grochulski and Piskorski, 2010), which precludes a discussion of redistributive policies. Findeisen and Sachs (2012) include both heterogeneity and uncertainty, but focus on a one-shot investment during “college,” before the work life of the agent starts, with a one-time realization of uncertainty. By contrast, this paper features life cycle investment in human capital, through both expenses and time, and a progressive realization of uncertainty throughout life. A complementary analysis is Kapicka and Neira (2014), who posit a different human capital accumulation process with time investments and a fixed ability, and consider the case in which effort spent to acquire human capital is unobservable. Also complementary is the work by Krueger and Ludwig (2013), who adopt a Ramsey approach by specifying ex ante the instruments available to the government, in contrast to the Mirrlees approach adopted here, which considers an unrestricted direct revelation mechanism. In their overlapping generations general equilibrium model, “education” is a binary decision that occurs exclusively before entry into the labor market. The lifecycle analysis also addresses the issue of age-dependent taxation, as explored in Kremer (2002), Weinzierl (2011), and Mirrlees et al. (2011).3

The rest of the paper is organized as follows. Section 2 presents the dynamic lifecycle model and the full information benchmark. Section 3 sets up a recursive mechanism design program using the first-order approach. Section 4 solves for the optimal policies and interprets the results. Section 5 contains the numerical analysis. Section 6 discusses the implementation of the optimal policies using Income Contingent Loans (ICLs) and a Deferred Deductibility scheme. Section 7 concludes and discusses three alternative applications of the model: to intergenerational transfers and bequest taxation, to entrepreneurial taxation, and to health investments.

2 A Lifecycle Model of Human Capital Acquisition and Labor Supply

The economy consists of agents who live for T years, during which they work and acquire human capital. Agents who work $l_t \ge 0$ hours in period t at a wage rate $w_t$ earn a gross income $y_t = w_t l_t$. Each period, agents can build their stock of human capital by spending money. A monetary investment of amount $M_t(e_t)$ generates an increase in human capital $e_t \ge 0$. The cost function satisfies $M_t'(e_t) > 0$ for all $e_t > 0$, $M_t(0) = 0$, and $M_t''(e_t) \ge 0$ for all $e_t \ge 0$. These monetary investments add to a stock of human capital acquired by expenses (“expenses” for short),4 $s_t$, which evolves according to $s_t = s_{t-1} + e_t$. Expenses can be thought of as the necessary material inputs into the production of human capital, such as books, tuition fees, or living costs while at college, net of the cost of living elsewhere. The disutility cost to an agent of supplying labor effort $l_t$ is $\phi_t(l_t)$, where $\phi_t$ is strictly increasing and convex.

3 This paper is more generally related to the dynamic mechanism design literature, as developed by, among many others, Fernandes and Phelan (2000), Doepke and Townsend (2006), and Pavan, Segal, and Toikka (2014).

The wage rate wt is determined by the stock of human capital built until time t and stochastic ability θt:

$$w_t = w_t(\theta_t, s_t)$$

$w_t$ is strictly increasing and concave in each of its arguments ($\partial w_t/\partial m > 0$ and $\partial^2 w_t/\partial m^2 \le 0$ for $m = \theta, s$). Importantly, though, no restrictions are placed on the cross-partials. This formulation allows human capital to affect the wage differently at different ages.5

Agents are born at time t = 1 with a heterogeneous earning ability $\theta_1$ with distribution $f^1(\theta_1)$. Earning ability in each period is private information, and evolves according to a Markov process with a time-varying transition function $f^t(\theta_t|\theta_{t-1})$ over a fixed support $\Theta \equiv [\underline{\theta}, \bar{\theta}]$. There are several possible interpretations for $\theta_t$, such as stochastic productivity or stochastic returns to human capital.

For example, with a separable wage form $w_t = \theta_t + h_t(s_t)$, for some increasing, concave function $h_t$, $\theta_t$ resembles a stochastic version of productivity from the static Mirrlees (1971) model. With a wage such as $w_t = \theta_t h_t(s_t)$, $\theta_t$ is perhaps more naturally interpreted as the stochastic return to human capital. In keeping with the tradition in the literature, $\theta_t$ will be called “ability” throughout. Ability to earn income can be stochastic because of, among other things, health shocks, individual labor market idiosyncrasies, or luck. The agent’s per-period utility is separable in consumption and effort (both labor and training):

$$\tilde{u}_t(c_t, y_t, s_t; \theta_t) = u_t(c_t) - \phi_t\left(\frac{y_t}{w_t(\theta_t, s_t)}\right)$$

where $u_t$ is increasing, twice continuously differentiable, and concave. Denote by $\theta^t$ the history of ability shocks up to period t, by $\Theta^t$ the set of possible histories at t, and by $P(\theta^t)$ the probability of a history $\theta^t$, $P(\theta^t) \equiv f^t(\theta_t|\theta_{t-1}) \cdots f^2(\theta_2|\theta_1)\, f^1(\theta_1)$. An allocation $\{x_t\}_t$ specifies consumption, output, and the expenses and training stocks for each period t, conditional on the history $\theta^t$, i.e., $x_t = \{x(\theta^t)\}_{\Theta^t} = \{c(\theta^t), y(\theta^t), s(\theta^t)\}_{\Theta^t}$. The expected lifetime utility from an allocation, discounted by a factor β, is given by:

4 The agent cannot wilfully destroy human capital; hence $e_t \ge 0$. The ability shock θ described below can partially account for stochastic depreciation. Deterministic depreciation would enter as a scaling factor on the next period’s human capital stock ($s_{t+1}$) in all formulas.
5 Note that human capital yields an immediate benefit in the period in which it is acquired, as well as in the future. This reduces the uncertainty by one period and simplifies the optimal formulas below.

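$$U\left(\{c(\theta^t), y(\theta^t), s(\theta^t)\}\right) = \sum_{t=1}^{T} \beta^{t-1} \int \left[u_t\left(c(\theta^t)\right) - \phi_t\left(\frac{y(\theta^t)}{w_t(\theta_t, s(\theta^t))}\right)\right] P(\theta^t)\, d\theta^t \qquad (1)$$

where, with some abuse of notation, $d\theta^t \equiv d\theta_t \cdots d\theta_1$.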
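To make equation (1) concrete, the sketch below evaluates it for a discretized version of the model, replacing the integrals over $\theta^t$ by sums over enumerated histories. The primitives are illustrative assumptions, not the paper's calibration: two ability levels, a time-invariant Markov transition matrix, log utility, isoelastic disutility $\phi(l) = l^{1+\gamma}/(1+\gamma)$, and a wage $w = \theta\sqrt{1+s}$. The names `prob` and `lifetime_utility` are hypothetical.

```python
import itertools
import math

# Illustrative primitives (assumptions, not the paper's calibration).
BETA = 0.95
THETAS = (0.5, 1.5)                        # two ability levels
F1 = {0.5: 0.5, 1.5: 0.5}                  # initial distribution f^1
TRANS = {0.5: {0.5: 0.8, 1.5: 0.2},        # f^t(theta_t | theta_{t-1}), same for all t
         1.5: {0.5: 0.2, 1.5: 0.8}}


def u(c):
    return math.log(c)


def phi(l, gamma=2.0):
    return l ** (1.0 + gamma) / (1.0 + gamma)


def wage(theta, s):
    return theta * math.sqrt(1.0 + s)     # multiplicative form w = theta * h(s)


def prob(history):
    """P(theta^t) = f^t(theta_t|theta_{t-1}) ... f^2(theta_2|theta_1) f^1(theta_1)."""
    p = F1[history[0]]
    for prev, cur in zip(history, history[1:]):
        p *= TRANS[prev][cur]
    return p


def lifetime_utility(c, y, s, T):
    """Discrete analogue of (1); c, y, s map a history tuple to a number."""
    total = 0.0
    for t in range(1, T + 1):
        for hist in itertools.product(THETAS, repeat=t):
            l = y(hist) / wage(hist[-1], s(hist))     # hours l = y / w
            total += BETA ** (t - 1) * (u(c(hist)) - phi(l)) * prob(hist)
    return total


# Agents consume their output, c = y = theta_t, and never invest (s = 0).
U = lifetime_utility(lambda h: h[-1], lambda h: h[-1], lambda h: 0.0, T=3)
print(U)
```

With this allocation, hours equal one in every period, so each period contributes $E[\log\theta_t] - 1/3$. The number of histories grows as $2^t$, so brute-force enumeration of this kind only suits very short horizons.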

Let $w_{m,t}$ denote the partial derivative of the wage function with respect to argument $m \in \{\theta, s\}$, and $w_{mn,t}$ the second-order partial with respect to arguments $(m, n) \in \{\theta, s\} \times \{\theta, s\}$. One crucial parameter is the Hicksian coefficient of complementarity between ability and human capital in the wage function at time t (Hicks, 1970; Samuelson, 1974), denoted by $\rho_{\theta s,t}$:

$$\rho_{\theta s,t} \equiv \frac{w_{\theta s,t}\, w_t}{w_{s,t}\, w_{\theta,t}} \qquad (2)$$

A positive Hicksian complementarity between human capital s and ability θ means that higher ability agents have a higher marginal benefit from human capital ($w_{\theta s} \ge 0$).6 Put differently, human capital compounds the agent’s exposure to stochastic ability and hence to risk. A Hicksian complementarity greater than 1 means that higher ability agents have a higher proportional benefit from human capital; equivalently, the wage elasticity with respect to ability is increasing in human capital:7

$$\frac{\partial}{\partial s}\left(\frac{\partial w}{\partial \theta}\frac{\theta}{w}\right) = \frac{\theta\, w_\theta\, w_s}{w^2}\left(\rho_{\theta s} - 1\right) \ge 0.$$

A separable wage function of the form $w_t = \theta_t + h_t(s_t)$ for some function $h_t$ implies that $\rho_{\theta s,t} = 0$. A multiplicative form $w_t = \theta_t h_t(s_t)$, the one typically used in the taxation literature, implies that $\rho_{\theta s,t} = 1$. Finally, with a CES wage function of the form

$$w_t = \left[\alpha_{1t}\,\theta_t^{1-\rho_t} + \alpha_{2t}\, s_t^{1-\rho_t}\right]^{\frac{1}{1-\rho_t}} \qquad (3)$$

ability and human capital can be substituted for one another at a constant, but potentially time-varying, elasticity of substitution $1/\rho_t$, and $\rho_{\theta s,t} = \rho_t$.
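A quick numerical check of definition (2) can confirm the three benchmark cases. The hypothetical helper `hicksian_rho` below approximates $w_\theta$, $w_s$, and $w_{\theta s}$ by central finite differences; the choice $h(s) = \sqrt{s}$ and the CES weights $\alpha_1 = 0.6$, $\alpha_2 = 0.4$ are arbitrary assumptions for illustration.

```python
import math


def hicksian_rho(w, theta, s, h=1e-4):
    """rho = w_{theta s} * w / (w_s * w_theta), partials by central differences."""
    w0 = w(theta, s)
    w_th = (w(theta + h, s) - w(theta - h, s)) / (2 * h)
    w_s = (w(theta, s + h) - w(theta, s - h)) / (2 * h)
    w_ths = (w(theta + h, s + h) - w(theta + h, s - h)
             - w(theta - h, s + h) + w(theta - h, s - h)) / (4 * h * h)
    return w_ths * w0 / (w_s * w_th)


def separable(theta, s):
    return theta + math.sqrt(s)            # w = theta + h(s)


def multiplicative(theta, s):
    return theta * math.sqrt(s)            # w = theta * h(s)


def ces(rho, a1=0.6, a2=0.4):
    """CES wage (3) with exponent 1 - rho; requires rho != 1."""
    r = 1.0 - rho

    def w(theta, s):
        return (a1 * theta ** r + a2 * s ** r) ** (1.0 / r)

    return w


for name, w in [("separable", separable), ("multiplicative", multiplicative),
                ("CES rho=0.5", ces(0.5)), ("CES rho=2", ces(2.0))]:
    print(name, round(hicksian_rho(w, theta=1.2, s=2.0), 4))
```

The CES helper requires $\rho_t \neq 1$, since the exponent $1/(1-\rho_t)$ is undefined there; $\rho_t = 1$ corresponds to the Cobb-Douglas limit, which, like the multiplicative form, has unit Hicksian complementarity.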

3 The Planning Problem

Every period, the planner can observe an agent’s choices of output $y_t$, consumption $c_t$, and human capital $s_t$. The informational problem is that he cannot see ability $\theta_t$ in any period. Hence, while the planner knows the wage function, he knows neither the wage realization $w_t(\theta_t, s_t)$ nor labor supply $l_t = y_t/w_t$, since both depend on the unobserved $\theta_t$. Put differently, when he sees a low output produced by an agent, he cannot tell whether it was due to the agent’s low labor effort or to a bad ability (and, hence, wage) shock.

6 $\rho_{\theta s}$ is also the Hicksian complementarity coefficient between education and ability in earnings y.
7 Equivalently, the wage elasticity with respect to human capital is increasing in ability.

This technical section sets up the planning problem, starting from the sequential problem and defining incentive compatibility. It then goes through two steps to make this problem tractable, following the recent procedure proposed for dynamic Mirrlees models by Farhi and Werning (2013), augmented here with human capital. First, a relaxed problem based on the first-order approach is written out, which replaces the full set of incentive compatibility constraints by the agent’s envelope condition. This relaxed program is then turned into a recursive dynamic programming problem through a suitable definition of state variables.

3.1 Incentive compatibility

To solve for the constrained efficient allocations, suppose that the planner designs a direct revelation mechanism in which, each period, agents report their ability $\theta_t$. Denote a reporting strategy, specifying a reported type $r_t$ after each history, by $r = \{r_t(\theta^t)\}_{t=1}^{T}$. Let $\mathcal{R}$ be the set of all possible reporting strategies and $r^t = (r_1(\theta_1), \ldots, r_t(\theta^t))$ be the history of reports generated by reporting strategy r. Because output, savings, and human capital are observable, the planner can directly specify allocations as functions of the history of reports, according to some allocation rules $c(r^t)$, $y(r^t)$, $s(r^t)$.8 Let the continuation value after history $\theta^t$ under a reporting strategy r, denoted by $\omega^r(\theta^t)$, be the solution to:

$$\omega^r(\theta^t) = u_t\left(c(r^t(\theta^t))\right) - \phi_t\left(\frac{y(r^t(\theta^t))}{w_t(\theta_t, s(r^t(\theta^t)))}\right) + \beta\int \omega^r(\theta^{t+1})\, f^{t+1}(\theta_{t+1}|\theta_t)\, d\theta_{t+1}$$

The continuation value under truthful revelation, $\omega(\theta^t)$, is the unique solution to:

$$\omega(\theta^t) = u_t\left(c(\theta^t)\right) - \phi_t\left(\frac{y(\theta^t)}{w_t(\theta_t, s(\theta^t))}\right) + \beta\int \omega(\theta^{t+1})\, f^{t+1}(\theta_{t+1}|\theta_t)\, d\theta_{t+1}$$

Incentive compatibility requires that truth-telling yields a weakly higher continuation utility than any reporting strategy r:

$$\text{(IC):}\quad \omega(\theta_1) \ge \omega^r(\theta_1) \quad \forall \theta_1,\ \forall r \qquad (4)$$
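In the one-period special case (T = 1), a reporting strategy collapses to a single report, and with discrete types condition (4) reduces to a finite set of comparisons. The sketch below checks them by brute force. The two-type grid, log utility, disutility $\phi(l) = l^3/3$, the wage $w = \theta\sqrt{1+s}$, and the example allocations are all assumptions for illustration. As in the definition of $\omega^r$, a misreporting agent receives the bundle $(c, y, s)$ intended for the reported type but earns the wage of his true type.

```python
import math

GAMMA = 2.0  # curvature of the assumed disutility phi(l) = l**(1+GAMMA)/(1+GAMMA)


def wage(theta, s):
    return theta * math.sqrt(1.0 + s)


def utility(true_theta, bundle):
    """Utility of a type-true_theta agent who receives bundle (c, y, s)."""
    c, y, s = bundle
    l = y / wage(true_theta, s)            # hours needed at the TRUE wage
    return math.log(c) - l ** (1.0 + GAMMA) / (1.0 + GAMMA)


def ic_violations(allocation):
    """allocation maps each type to (c, y, s); returns profitable (true, report, gain)."""
    found = []
    for true_theta, own_bundle in allocation.items():
        honest = utility(true_theta, own_bundle)
        for report, bundle in allocation.items():
            gain = utility(true_theta, bundle) - honest
            if gain > 1e-12:
                found.append((true_theta, report, gain))
    return found


# Hypothetical allocations over the types {1.0, 2.0}.
pooling = {1.0: (1.0, 1.0, 0.0), 2.0: (1.0, 1.0, 0.0)}
separating = {1.0: (1.0, 1.0, 0.0), 2.0: (2.0, 2.5, 0.0)}
tempting = {1.0: (1.2, 1.0, 0.0), 2.0: (1.5, 2.0, 0.0)}

print(ic_violations(pooling), ic_violations(separating), ic_violations(tempting))
```

`pooling` gives every type the same bundle and is trivially incentive compatible; in `tempting`, the high type gains by mimicking the low type, who is asked to produce much less output for only slightly less consumption.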

Denote by $X^{IC}$ the set of allocations that satisfy incentive compatibility condition (4). To solve this dynamic problem, a version of the first-order approach is used, which requires the following assumptions:

Assumption 1 i) $\tilde{u}_t(c, y, s; \theta)$ and $\frac{\partial \phi_t(l)}{\partial l}\frac{\partial w_t(\theta, s)}{\partial \theta}\frac{l}{w_t}$ are bounded. ii) $\frac{\partial f^t(\theta_t|\theta_{t-1})}{\partial \theta_{t-1}}$ exists and is bounded.9 iii) $f^t(\theta_t|\theta_{t-1})$ has full support on Θ.

8 Hours of work are determined residually by $l(r^t) = y(r^t)/w_t(\theta_t, s(r^t))$.
9 For some distributions, this derivative is not bounded, and assumption 3 in Kapička (2013) could be used instead, namely that for $F^t(\theta_t|\theta_{t-1}) \equiv \int_{\underline{\theta}}^{\theta_t} f^t(\theta_s|\theta_{t-1})\, d\theta_s$, we have $\frac{\partial}{\partial \theta_{t-1}} F^t(\theta_t|\theta_{t-1}) \le 0$ and $F^t(\theta_t|\theta_{t-1})$ either concave or convex.

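Suppose the agent has witnessed a history of shocks $\theta^t$. Consider one particular deviation strategy $\tilde{r}$, under which he reports truthfully until period t ($\tilde{r}_s(\theta^s) = \theta_s$ for all $s \le t-1$) and lies in period t by reporting $\tilde{r}_t(\theta^t) = \theta' \neq \theta_t$. The continuation utility under this strategy is the solution to:

$$\omega^{\tilde{r}}(\theta^t) = u_t\left(c(\theta^{t-1}, \theta')\right) - \phi_t\left(\frac{y(\theta^{t-1}, \theta')}{w_t(\theta_t, s(\theta^{t-1}, \theta'))}\right) + \beta\int \omega^{\tilde{r}}(\theta^{t-1}, \theta', \theta_{t+1})\, f^{t+1}(\theta_{t+1}|\theta_t)\, d\theta_{t+1}$$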

Incentive compatibility in (4) implies that, after almost all θt, the temporal incentive constraint holds:

$$\omega(\theta^t) = \max_{\theta'} \omega^{\tilde{r}}(\theta^t) \qquad (5)$$

Conversely, if (5) holds after all $\theta^{t-1}$ and for almost all $\theta_t$, then (4) also holds (see Kapička (2013), Lemma 1). Differentiating promised utility with respect to (true) ability yields two direct effects, namely on the wage (higher types have higher wages) and on the Markov transition $f^{t+1}(\theta_{t+1}|\theta_t)$, as well as indirect effects on the allocation through the report. By the agent’s first-order condition, all indirect effects that operate through the report and the allocation jointly sum to zero, and only the two direct effects remain. This yields the envelope condition of the agent:

$$\dot{\omega}(\theta^t) := \frac{\partial \omega(\theta^t)}{\partial \theta_t} = \frac{w_{\theta,t}}{w_t}\, l(\theta^t)\, \phi_{l,t}\left(l(\theta^t)\right) + \beta\int \omega(\theta^{t+1})\, \frac{\partial f^{t+1}(\theta_{t+1}|\theta_t)}{\partial \theta_t}\, d\theta_{t+1} \qquad (6)$$
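The static part of the envelope condition (6) can be checked numerically: holding the report, and hence $(y, s)$, fixed, differentiate $-\phi_t(y/w_t(\theta, s))$ with respect to the true type and compare with $\frac{w_{\theta,t}}{w_t}\, l\, \phi_{l,t}(l)$. With i.i.d. shocks the dynamic term drops out. The wage $w = \theta\sqrt{1+s}$, the disutility $\phi(l) = l^3/3$, and the function names are illustrative assumptions.

```python
import math


def wage(theta, s):
    return theta * math.sqrt(1.0 + s)      # assumed wage, so w_theta = sqrt(1 + s)


def w_theta(theta, s):
    return math.sqrt(1.0 + s)


def phi(l):
    return l ** 3 / 3.0                    # assumed disutility, phi'(l) = l**2


def phi_l(l):
    return l ** 2


def static_rent_numeric(theta, y, s, h=1e-6):
    """d/dtheta of -phi(y / w(theta, s)), holding the reported bundle (y, s) fixed."""
    up = -phi(y / wage(theta + h, s))
    down = -phi(y / wage(theta - h, s))
    return (up - down) / (2 * h)


def static_rent_formula(theta, y, s):
    """First term of (6): (w_theta / w) * l * phi_l(l)."""
    l = y / wage(theta, s)
    return w_theta(theta, s) / wage(theta, s) * l * phi_l(l)


print(static_rent_numeric(1.3, 2.0, 0.5), static_rent_formula(1.3, 2.0, 0.5))
```

The two numbers agree up to finite-difference error, which is the sense in which a higher true type needs fewer hours to produce the output assigned to any given report.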

The envelope condition tells us how promised utility changes with type at incentive compatible allocations, i.e., how the informational rents evolve. The first term is the static rent, familiar from screening models (Mirrlees, 1971), while the second is the dynamic rent that arises because the agent has some advance information about his type tomorrow (this second term disappears with i.i.d. shocks). Let $X^{FOA}$ denote the set of allocations that satisfy the envelope condition (6). It can be shown that this is a necessary condition for incentive compatibility.10

10 An application of Theorem 2 in Milgrom and Segal (2002), under assumption 1.

The analysis is in partial equilibrium. There is a physical capital asset that yields a fixed gross interest rate R. Investments in this physical capital are called “savings.” The planner’s objective is to minimize the expected discounted cost of providing an allocation, subject to incentive compatibility as defined in (4) and to the expected lifetime utility of each (initial) type θ being above a threshold U(θ). The relaxed planning problem, denoted by $P^{FOA}$, replaces the incentive constraint by the envelope condition, and is given by:

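$$\min_{\{c,y,s\}} \Pi\left(\{c, y, s\}; (U(\theta))_{\Theta}\right) = \sum_{t=1}^{T}\left(\frac{1}{R}\right)^{t-1}\int_{\Theta^t}\left[c(\theta^t) - y(\theta^t) + M_t\left(s(\theta^t) - s(\theta^{t-1})\right)\right] P(\theta^t)\, d\theta^t \qquad (7)$$

s.t.:

$$U(\{c, y, s\}; \theta) \ge U(\theta)$$

$$y(\theta^t) \ge 0, \quad s(\theta^t) \ge s(\theta^{t-1}), \quad c(\theta^t) \ge 0$$

$$\{c, y, s\} \in X^{FOA}$$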
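The objective in (7) is the present value of net resource costs. The sketch below evaluates it for a discretized ability process by enumerating histories. The quadratic cost $M_t(e) = e^2/2$, the two-state transition matrix, $R = 1.04$, the initial stock $s_0 = 0$, and the function names are assumptions for illustration only.

```python
import itertools

R = 1.04
THETAS = (0.5, 1.5)
F1 = {0.5: 0.5, 1.5: 0.5}
TRANS = {0.5: {0.5: 0.8, 1.5: 0.2},
         1.5: {0.5: 0.2, 1.5: 0.8}}
S0 = 0.0                                  # common initial stock s(theta^0) = s_0


def M(e):
    return 0.5 * e * e                    # assumed convex cost M_t(e) = e^2 / 2


def prob(history):
    """P(theta^t) for the assumed Markov chain."""
    p = F1[history[0]]
    for prev, cur in zip(history, history[1:]):
        p *= TRANS[prev][cur]
    return p


def planner_cost(c, y, s, T):
    """Discrete analogue of the objective in (7)."""
    total = 0.0
    for t in range(1, T + 1):
        for hist in itertools.product(THETAS, repeat=t):
            s_prev = s(hist[:-1]) if t > 1 else S0
            net = c(hist) - y(hist) + M(s(hist) - s_prev)
            total += (1.0 / R) ** (t - 1) * net * prob(hist)
    return total


# Zero net transfers (c = y) and a deterministic investment of 0.1 per period.
cost = planner_cost(lambda h: h[-1], lambda h: h[-1], lambda h: 0.1 * len(h), T=3)
print(cost)
```

Because consumption equals output in this example, the only cost is the investment outlay $M(0.1) = 0.005$ per period, discounted at R.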

3.2 Recursive formulation of the relaxed program

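To write the problem recursively, let the future marginal rent (the second term in the envelope condition) be denoted by:

$$\Delta(\theta^t) \equiv \int \omega(\theta^{t+1})\, \frac{\partial f^{t+1}(\theta_{t+1}|\theta_t)}{\partial \theta_t}\, d\theta_{t+1} \qquad (8)$$

The envelope condition can then be rewritten as:

$$\dot{\omega}(\theta^t) = \frac{w_{\theta,t}}{w_t}\, l(\theta^t)\, \phi_{l,t}\left(l(\theta^t)\right) + \beta\Delta(\theta^t) \qquad (9)$$

Let $v(\theta^t)$ be the expected future continuation utility:

$$v(\theta^t) \equiv \int \omega(\theta^{t+1})\, f^{t+1}(\theta_{t+1}|\theta_t)\, d\theta_{t+1} \qquad (10)$$

Continuation utility $\omega(\theta^t)$ can hence be rewritten as:

$$\omega(\theta^t) = u_t\left(c(\theta^t)\right) - \phi_t\left(\frac{y(\theta^t)}{w_t(\theta_t, s(\theta^t))}\right) + \beta v(\theta^t) \qquad (11)$$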
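For intuition about $v$ and $\Delta$ in (8) and (10), consider a hypothetical transition density on $\Theta = [0, 1]$, $f(\theta'|\theta) = 1 + a(\theta - 1/2)(2\theta' - 1)$, which integrates to one and stays nonnegative for $|a| \le 2$. Its derivative $\partial f/\partial\theta = a(2\theta' - 1)$ integrates to zero, and $\Delta$ equals the derivative of $v$ with respect to $\theta_t$ taken through the density, holding the continuation utilities fixed. The sketch uses Simpson's rule and an arbitrary continuation utility $\omega(\theta') = \theta'$, for which $v(\theta) = 1/2 + a(\theta - 1/2)/6$ and $\Delta = a/6$; the density and all names are assumptions for illustration.

```python
A = 1.5  # hypothetical persistence parameter; |A| <= 2 keeps f nonnegative


def f(x, theta):
    """Transition density f(theta' = x | theta) on [0, 1]."""
    return 1.0 + A * (theta - 0.5) * (2.0 * x - 1.0)


def df_dtheta(x, theta):
    return A * (2.0 * x - 1.0)


def simpson(g, n=200):
    """Composite Simpson's rule on [0, 1]; n must be even."""
    h = 1.0 / n
    total = g(0.0) + g(1.0)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(i * h)
    return total * h / 3.0


def v(omega, theta):
    """Expected continuation utility, as in (10)."""
    return simpson(lambda x: omega(x) * f(x, theta))


def delta(omega, theta):
    """Future marginal rent, as in (8)."""
    return simpson(lambda x: omega(x) * df_dtheta(x, theta))


def omega(x):
    return x                               # arbitrary continuation utility


print(v(omega, 0.7), delta(omega, 0.7))
```

Simpson's rule is exact for the quadratic integrands that arise here, so the printed values match the closed forms up to rounding.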

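Define the expected continuation cost of the planner at time t, given $v_{t-1}$, $\Delta_{t-1}$, $\theta_{t-1}$, and $s_{t-1}$:

$$K(v_{t-1}, \Delta_{t-1}, \theta_{t-1}, s_{t-1}, t) = \min \sum_{\tau=t}^{T}\left(\frac{1}{R}\right)^{\tau-t}\int\left[c(\theta^\tau) - y(\theta^\tau) + M_\tau\left(s(\theta^\tau) - s(\theta^{\tau-1})\right)\right] P(\theta^{\tau-t})\, d\theta^{\tau-t}$$

where, with some abuse of notation, $d\theta^{\tau-t} = d\theta_\tau\, d\theta_{\tau-1} \cdots d\theta_t$ and $P(\theta^{\tau-t}) = f^\tau(\theta_\tau|\theta_{\tau-1}) \cdots f^t(\theta_t|\theta_{t-1})$. A recursive formulation of the relaxed program is then, for $t \ge 2$:

$$K(v, \Delta, \theta_-, s_-, t) = \min \int\left(c(\theta) + M_t\left(s(\theta) - s_-\right) - w_t\left(\theta, s(\theta)\right) l(\theta) + \frac{1}{R}K\left(v(\theta), \Delta(\theta), \theta, s(\theta), t+1\right)\right) f^t(\theta|\theta_-)\, d\theta \qquad (12)$$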

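subject to:

$$\omega(\theta) = u_t\left(c(\theta)\right) - \phi_t\left(l(\theta)\right) + \beta v(\theta)$$

$$\dot{\omega}(\theta) = \frac{w_{\theta,t}}{w_t}\, l(\theta)\, \phi_{l,t}\left(l(\theta)\right) + \beta\Delta(\theta)$$

$$v = \int \omega(\theta)\, f^t(\theta|\theta_-)\, d\theta$$

$$\Delta = \int \omega(\theta)\, \frac{\partial f^t(\theta|\theta_-)}{\partial \theta_-}\, d\theta$$

where the minimization is over the functions $(c(\theta), l(\theta), s(\theta), \omega(\theta), v(\theta), \Delta(\theta))$. For period t = 1, the problem needs to be reformulated. Suppose all agents have identical initial human capital levels $s_0$. The problem for t = 1 is then indexed by $(U(\theta))_\Theta$, the set of target lifetime utilities $U(\theta)$ for each type θ: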

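$$K\left((U(\theta))_\Theta, 1\right) = \min \int\left(c(\theta) + M_1\left(s(\theta) - s_0\right) - w_1\left(\theta, s(\theta)\right) l(\theta) + \frac{1}{R}K\left(v(\theta), \Delta(\theta), \theta, s(\theta), 2\right)\right) f^1(\theta)\, d\theta$$

s.t.:

$$\omega(\theta) = u_1\left(c(\theta)\right) - \phi_1\left(l(\theta)\right) + \beta v(\theta)$$

$$\dot{\omega}(\theta) = \frac{w_{\theta,1}}{w_1}\, l(\theta)\, \phi_{l,1}\left(l(\theta)\right) + \beta\Delta(\theta)$$

$$\omega(\theta) \ge U(\theta)$$

$$\Delta = \int \omega(\theta)\, \frac{\partial f^t(\theta|\theta_-)}{\partial \theta_-}\, d\theta$$

where the minimization is now over $(c(\theta), l(\theta), \omega(\theta), s(\theta), v(\theta), \Delta(\theta), \Delta)$. Note that Δ is now a free variable that is chosen optimally. The set of constrained efficient allocations that solve the planning problem is indexed by the set of utilities $(U(\theta))_\Theta$ and denoted by $X^{*,FOA}\left((U(\theta))_\Theta\right)$.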

3.3 Validity of the ﬁrst-order approach and assumptions

The solution to the relaxed program might not be a solution to the full program, because the envelope condition is only a necessary condition. In the static taxation model (Mirrlees, 1971), the validity of the first-order approach is guaranteed if the utility function satisfies the standard Spence-Mirrlees single-crossing property and the allocation satisfies a simple monotonicity condition. In the dynamic case, however, the conditions imposed on the allocations are more involved (see Golosov, Troshkin, and Tsyvinski (2013) or Pavan, Segal, and Toikka (2014)) and do not always provide much simplification. Hence, for the proposed calibrations in section 5, incentive compatibility of the candidate allocation, as well as any omitted non-negativity constraints, are checked numerically, using a procedure in the spirit of Farhi and Werning (2013).11

Technical assumptions: The following assumptions are used only as sufficient conditions to determine the sign of the optimal wedges. All derived formulas still apply without them.12

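Assumption 2 i) $\int_{\underline{\theta}}^{\theta'} f^t(\theta|\theta_b)\, d\theta \le \int_{\underline{\theta}}^{\theta'} f^t(\theta|\theta_s)\, d\theta$ for all t, θ′, and $\theta_b > \theta_s$; ii) $\frac{\partial}{\partial \theta_t}\left(\frac{\partial f^t(\theta_t|\theta_{t-1})}{\partial \theta_{t-1}}\frac{1}{f^t(\theta_t|\theta_{t-1})}\right) \ge 0$ for all t and $\theta^t$; iii) $\frac{\partial v(\theta^{t-1})}{\partial \theta} > 0$ for all θ; iv) $\frac{\partial}{\partial v}K \ge 0$ and $\frac{\partial^2}{\partial v^2}K \ge 0$.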

The first-order stochastic dominance assumption in i) ensures that, for any future payoff function increasing in θ, higher types today have a higher expected payoff. Assumption ii) introduces a form of monotone likelihood ratio property. Assumption iii) guarantees that the expected continuation utility is increasing in the type. Assumption iv) states that the resource cost is increasing and convex in promised utility. To reduce notational clutter throughout the paper, the dependence on the full history is often left implicit, e.g., $c_t = c(\theta^t)$ and $\tau_{Lt} = \tau_{Lt}(\theta^t)$. Similarly, function arguments are sometimes left out, e.g., $w_{z,t} = \frac{\partial}{\partial z} w_t\left(\theta_t, s(\theta^t)\right)$ and $u(c_t) = u_t$. $E_t$ denotes the expectation at time t, conditional on $\theta^t$.
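The hypothetical density $f(\theta'|\theta) = 1 + a(\theta - 1/2)(2\theta' - 1)$ on $[0, 1]$ can illustrate what Assumption 2 i) and ii) require, with $\theta'$ playing the role of $\theta_t$ and $\theta$ that of $\theta_{t-1}$. Its CDF is $F(x|\theta) = x + a(\theta - 1/2)(x^2 - x)$, and the ratio $(\partial f/\partial\theta)/f$ has derivative in $\theta'$ with the sign of a, so both properties hold when $a \ge 0$ and fail when $a < 0$. The sketch below is a numerical spot check on a grid under these assumptions, not a proof; the function names are illustrative.

```python
def cdf(x, theta, a):
    """F(x | theta) for the hypothetical density f = 1 + a(theta - 1/2)(2x - 1) on [0, 1]."""
    return x + a * (theta - 0.5) * (x * x - x)


def fosd_holds(a, n=50):
    """Assumption 2 i): F(x | theta_b) <= F(x | theta_s) whenever theta_b > theta_s."""
    grid = [i / n for i in range(n + 1)]
    return all(cdf(x, tb, a) <= cdf(x, ts, a) + 1e-12
               for x in grid for ts in grid for tb in grid if tb > ts)


def mlrp_holds(a, n=50, h=1e-6):
    """Assumption 2 ii): (df/dtheta) / f is nondecreasing in the new type x."""
    def score(x, theta):
        return a * (2 * x - 1) / (1 + a * (theta - 0.5) * (2 * x - 1))

    interior = [i / n for i in range(1, n)]   # keep x +/- h inside [0, 1]
    return all(score(x + h, th) - score(x - h, th) >= -1e-12
               for x in interior for th in interior)


for a in (1.5, 0.0, -1.5):
    print(a, fosd_holds(a), mlrp_holds(a))
```

A positive a captures positive persistence of ability, which is the case in which these sufficient conditions for signing the wedges are satisfied.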

4 Optimal Human Capital Policies

This section characterizes the optimal allocations, obtained as solutions to the relaxed program $P^{FOA}$ above, using wedges, or implicit taxes and subsidies.

4.1 The wedges or implicit taxes and subsidies

In the second best, marginal distortions in agents’ choices can be described using “wedges.” Since the agent has three possible choices (working, saving, and investing in human capital), there are three marginal distortions. For any allocation, define the intratemporal wedge on labor $\tau_L(\theta^t)$, the intertemporal wedge on savings (also called capital) $\tau_K(\theta^t)$, and the human capital wedge $\tau_S(\theta^t)$ as follows:

$$\tau_L(\theta^t) \equiv 1 - \frac{\phi_{l,t}(l_t)}{w_t\, u_t'(c_t)} \qquad (13)$$

$$\tau_K(\theta^t) \equiv 1 - \frac{1}{R\beta}\,\frac{u_t'(c_t)}{E_t\left(u_{t+1}'(c_{t+1})\right)} \qquad (14)$$

$$\tau_S(\theta^t) \equiv -\left(1 - \tau_L(\theta^t)\right) w_{s,t}\, l_t + M_t'(e_t) - \beta E_t\left[\frac{u_{t+1}'(c_{t+1})}{u_t'(c_t)}\left(M_{t+1}'(e_{t+1}) - \tau_{S,t+1}\right)\right] \qquad (15)$$

Wedges are akin to locally linear subsidies and taxes, and would all be zero absent government intervention. They measure the amount and direction of distortion at an allocation relative to the laissez-faire allocation. The labor wedge, which is standard in the dynamic taxation literature (Golosov, Tsyvinski, and Werning, 2006), is defined as the gap between the marginal rate of substitution and the marginal rate of transformation between consumption and labor. In the laissez-faire, it would be zero, since the agent would equate the marginal rates of substitution and transformation. On the other hand, imagine that the planner imposes a linear tax equal to $\tau_L(\theta^t)$ and lets the agent of type $\theta^t$ choose his labor supply, conditional on human capital and savings, in a neighborhood around $l(\theta^t)$. If the agent’s problem is convex, he would set his labor supply so that equation (13) holds. Hence, a positive labor wedge means that labor is distorted downwards. Similarly, the savings wedge $\tau_K$ is defined as the gap between the expected marginal rate of intertemporal substitution and the return on savings.

11 An alternative could be the random contracts or “lotteries” approach (e.g., Karaivanov and Townsend, 2014), which circumvents the sufficiency problem and has explored increasingly sophisticated methods to increase computational efficiency and counter the “curse of dimensionality,” which arises from the exponential growth of incentive constraints when adding hidden states or actions. For optimal tax analysis, however, it is appealing to have analytical formulas, which the first-order approach, when valid, can deliver, and which build intuition for the solution.
12 In addition, all theoretical results on the signs of the wedges are indeed satisfied in the simulations (Section 5).
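Definitions (13)-(15) translate directly into code once functional forms are fixed. The sketch below assumes CRRA utility with $u'(c) = c^{-\sigma}$ and isoelastic disutility with $\phi'(l) = l^\gamma$; next-period uncertainty is a short list of states with probabilities. The function names, and the convention that `nxt` is empty in the last period, are illustrative assumptions. Because (15) is recursive, `tau_S` takes next period's wedges as inputs, so one would evaluate it backward from T.

```python
def tau_L(c, l, w, sigma, gamma):
    """Labor wedge (13): 1 - phi'(l) / (w u'(c)) with u'(c) = c**-sigma, phi'(l) = l**gamma."""
    return 1.0 - l ** gamma / (w * c ** (-sigma))


def tau_K(c, c_next, probs, R, beta, sigma):
    """Savings wedge (14); c_next and probs describe next-period consumption states."""
    expected_mu = sum(p * cn ** (-sigma) for p, cn in zip(probs, c_next))
    return 1.0 - c ** (-sigma) / (R * beta * expected_mu)


def tau_S(c, l, w_s, e, tau_l, beta, sigma, dM, nxt=()):
    """Human capital wedge (15).

    nxt lists next-period states as (prob, c', e', tau_S'); leave it empty in period T.
    dM is the marginal cost M'(e).
    """
    future = sum(p * (cn ** (-sigma) / c ** (-sigma)) * (dM(en) - tsn)
                 for p, cn, en, tsn in nxt)
    return -(1.0 - tau_l) * w_s * l + dM(e) - beta * future


def marginal_cost(e):
    return e                               # assumed M(e) = e**2 / 2


print(tau_L(c=2.0, l=1.0, w=2.0, sigma=1.0, gamma=1.0))
print(tau_K(c=2.0, c_next=[1.0, 3.0], probs=[0.5, 0.5], R=1.0, beta=1.0, sigma=1.0))
print(tau_S(c=2.0, l=1.0, w_s=0.5, e=0.5, tau_l=0.0, beta=0.9, sigma=1.0,
            dM=marginal_cost, nxt=[(1.0, 2.0, 0.5, 0.0)]))
```

With $\sigma = \gamma = 1$, $w = c = 2$, and $l = 1$, the labor wedge is zero; replacing a certain next-period consumption of 2 by the mean-preserving spread {1, 3}, with $R\beta = 1$, yields a positive savings wedge of 0.25.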

The implicit subsidy on human capital expenses $\tau_S$ is such that the agent's net marginal cost of investing in human capital is locally reduced to $M'_t(e_t) - \tau_{St}$. Like any subsidy, it equals the gap between marginal cost and marginal benefit. Human capital, however, yields benefits in all future periods, so the discounted expected stream of future marginal benefits is needed. I write it here recursively, replacing that stream by the next period's net marginal cost; solving forward would yield the full stream of future marginal benefits. The paper sometimes loosely refers to the wedges as taxes and subsidies and appeals to intuitions related to a standard tax system.13 The relation between wedges and explicit taxes is studied in Section 6, which addresses implementation. The following definitions will be needed for the formulas below. For any variable $x$, define the “insurance factor” of $x$, $\xi_{x,t+1}$:

$$\xi_{x,t+1} \equiv \frac{\mathrm{Cov}\left(-\beta\frac{u'_{t+1}}{u'_t},\, x_{t+1}\right)}{E_t\left(\beta\frac{u'_{t+1}}{u'_t}\right) E_t(x_{t+1})}$$

13The wedges can also have natural interpretations as marginal taxes or combinations of marginal taxes, for instance, in an implementation with a complex tax function that depends on the full history of savings, output, and human capital. The labor wedge would be the gradient of the tax function with respect to income. The map between marginal taxes and the human capital or savings wedges is more complex and studied in detail in Subsection 6.3.

with $\xi_{x,t+1} \in [-1, 1]$. If $x$ is a flow to the agent, it is a good hedge if $\xi < 0$ and a bad hedge otherwise. With some abuse of notation, define also:

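$$\xi'_{x,t+1} \equiv -\frac{\mathrm{Cov}\left(\beta\frac{u'_{t+1}}{u'_t} - 1,\, x_{t+1}\right)}{E_t\left(\beta\frac{u'_{t+1}}{u'_t} - 1\right) E_t(x_{t+1})}$$

which, up to an additive constant, captures the same risk properties as $\xi_{x,t+1}$. Denote by $\varepsilon_{xy,t}$ the elasticity of $x_t$ with respect to $y_t$, $\varepsilon_{xy,t} \equiv d\log(x_t)/d\log(y_t)$. Let $\varepsilon^u_t$ and $\varepsilon^c_t$ be the uncompensated and compensated labor supply elasticities with respect to the net wage, holding savings fixed.14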
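The insurance factor can be computed directly for a discrete set of states. The sketch below uses a hypothetical helper with two equally likely states and assumed marginal utilities, and a probability-weighted covariance, to check the sign convention: a flow that pays off when marginal utility is high is a good hedge, with $\xi < 0$.

```python
"""Insurance factor xi_{x,t+1} for a finite set of next-period states (sketch)."""


def insurance_factor(x, mu_next, mu_now, probs, beta):
    """xi = Cov(-beta u'_{t+1}/u'_t, x) / (E[beta u'_{t+1}/u'_t] * E[x]).

    x       : payoff of the flow in each next-period state
    mu_next : marginal utility u'_{t+1} in each state
    mu_now  : current marginal utility u'_t
    probs   : state probabilities (assumed to sum to 1)
    """
    sdf = [beta * mn / mu_now for mn in mu_next]  # discount factor per state
    e_sdf = sum(p * m for p, m in zip(probs, sdf))
    e_x = sum(p * xv for p, xv in zip(probs, x))
    cov = sum(p * (m - e_sdf) * (xv - e_x) for p, m, xv in zip(probs, sdf, x))
    return -cov / (e_sdf * e_x)


# A flow paying off in the high-marginal-utility state is a good hedge (xi < 0).
good = insurance_factor([1.0, 0.0], [2.0, 0.5], 1.0, [0.5, 0.5], 0.95)
bad = insurance_factor([0.0, 1.0], [2.0, 0.5], 1.0, [0.5, 0.5], 0.95)
print(good, bad)  # approximately -0.6 and 0.6
```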

4.2 Optimal labor and savings wedges

Before characterizing the optimal human capital subsidy, it is worth describing how the two standard wedges, the labor and the savings wedges, are set in this model with human capital. Both results are derived in the Appendix. The labor wedge (Appendix Proposition 6) looks as in the standard model without human capital in Farhi and Werning (2013) or Golosov, Troshkin, and Tsyvinski (2013), with one exception: the wage elasticity with respect to ability, $\varepsilon_{w\theta,t}$, is not constant at 1 but depends on human capital. For instance, with a CES wage as in (3), $\varepsilon_{w\theta,t} = \left(w_t(\theta_t,s_t)/\theta_t\right)^{\rho-1}$. A higher elasticity amplifies the labor wedge, as it increases the value of insurance and redistribution. The standard zero-distortion-at-the-bottom-and-top results from the static Mirrlees model continue to apply in the presence of human capital. The labor wedge at any age is inversely related to the elasticity of labor supply prevailing at that time. The presence of observable human capital does not change a standard result in dynamic moral hazard models with observable savings and separable utility (Rogerson, 1985): Appendix Proposition 7 shows that, at the optimum, the Inverse Euler Equation holds, so that there is a positive savings wedge $\tau_K$. Note that human capital is an alternative way of transferring resources to the future and a substitute for savings. Hence, in the absence of additional redistributive or insurance effects from human capital (i.e., if $\rho_{\theta s} = 1$), the human capital subsidy and the savings wedge move in opposite directions.
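Two claims in this paragraph can be checked numerically under stated assumptions. The sketch below assumes the CES wage has the same form as the calibrated wage (26), with a hypothetical scale `B_S = 0.5`, and compares the closed-form elasticity $(w/\theta)^{\rho-1}$ with a finite-difference estimate of $d\log w/d\log\theta$. It also shows that when $u'_t$ satisfies the Inverse Euler Equation, $1/u'_t = E_t[1/u'_{t+1}]/(R\beta)$, the savings wedge (14) reduces to $1 - 1/(E[1/u'_{t+1}]\,E[u'_{t+1}])$, which is non-negative by Jensen's inequality.

```python
import math

# Hypothetical scale parameter; the CES form mirrors the calibrated wage (26).
B_S = 0.5


def ces_wage(theta, s, rho, b_s=B_S):
    """w = (theta^(1-rho) + b_s * s^(1-rho))^(1/(1-rho)), with rho != 1."""
    return (theta ** (1 - rho) + b_s * s ** (1 - rho)) ** (1 / (1 - rho))


def elasticity_closed_form(theta, s, rho):
    """epsilon_{w theta} = (w / theta)^(rho - 1)."""
    return (ces_wage(theta, s, rho) / theta) ** (rho - 1)


def elasticity_numeric(theta, s, rho, h=1e-6):
    """Central difference of log w with respect to log theta."""
    up = math.log(ces_wage(theta * math.exp(h), s, rho))
    down = math.log(ces_wage(theta * math.exp(-h), s, rho))
    return (up - down) / (2 * h)


def savings_wedge_under_iee(mu_next, probs):
    """tau_K when u'_t satisfies the Inverse Euler Equation.

    R * beta cancels, leaving tau_K = 1 - 1 / (E[1/u'_{t+1}] * E[u'_{t+1}]).
    """
    e_inv = sum(p / m for p, m in zip(probs, mu_next))
    e_mu = sum(p * m for p, m in zip(probs, mu_next))
    return 1.0 - 1.0 / (e_inv * e_mu)


for rho in (0.2, 1.2):
    print(rho, elasticity_closed_form(1.0, 2.0, rho), elasticity_numeric(1.0, 2.0, rho))
print(savings_wedge_under_iee([2.0, 0.5], [0.5, 0.5]))  # approximately 0.36
```

With no uncertainty in marginal utility (for example `[1.0, 1.0]`), the same helper returns zero, consistent with a positive savings wedge arising only from risk.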

4.3 The net human capital subsidy

The previously defined wedges are in general sufficient to characterize an allocation. However, in the presence of human capital there are several simultaneous distortions: a zero human capital wedge, $\tau_S$,

14I.e., $\varepsilon^c$ and $\varepsilon^u$ are defined as in the static framework (Saez, 2001), at constant savings:

$$\varepsilon^u = \frac{\phi_l(l)/l + \frac{\phi_l(l)^2}{u'(c)^2}u''(c)}{\phi_{ll}(l) - \frac{\phi_l(l)^2}{u'(c)^2}u''(c)}, \qquad \varepsilon^c = \frac{\phi_l(l)/l}{\phi_{ll}(l) - \frac{\phi_l(l)^2}{u'(c)^2}u''(c)}$$

With per-period utility separable in consumption and labor, $\varepsilon^c_t/(1+\varepsilon^u_t-\varepsilon^c_t)$ is the Frisch elasticity of labor.

does not imply that human capital is undistorted. For instance, if there is a positive labor wedge, human capital is indirectly distorted downwards, since part of its return is taxed away. Similarly, if there is a positive savings wedge, human capital investment could be distorted upwards, since it allows the agent to transfer resources to the future without being subject to the savings tax. Hence, part of the subsidy on human capital simply undoes some of the effects of the labor and capital distortions on human capital. It is thus useful to find a measure of the net distortion on human capital. A natural benchmark is the human capital subsidy that makes the tax system neutral with respect to human capital. To build intuition, consider first a one-period version of the model, with $s = e$ and linear taxes and subsidies. An agent of type $\theta$ solves:

$$\max_{s,l}\; u\big(w(s,\theta)\,l\,(1-\tau_L) - M(s) + \tau_S s\big) - \phi(l)$$

The first-order condition with respect to human capital yields $w_s l(1-\tau_L) - M'(s) + \tau_S = 0$. Suppose we set the subsidy to $\tau_S = \tau_L M'(s)$. Then $(1-\tau_L)(w_s l - M'(s)) = 0$, i.e., conditional on the labor choice, the tax system is neutral with respect to human capital; put differently, human capital investment is efficient. Setting the subsidy equal to the marginal cost times the labor tax rate is equivalent to making human capital expenses fully tax deductible, i.e., taxable income is only $wl - M(s)$. The “net subsidy” on human capital is really (appropriately scaled)

$$ts \equiv \frac{\tau_S - \tau_L M'(s)}{\left(M'(s) - \tau_S\right)(1-\tau_L)}$$

As just shown, this net subsidy is zero under full deductibility and positive when human capital is encouraged more than under full deductibility. In this more complex multi-period model, I introduce a similar concept of full dynamic, risk-adjusted deductibility. Standard contemporaneous full deductibility no longer guarantees neutrality of the tax system with respect to human capital, as it did in the one-period model above. Instead, four additional elements must be accounted for. First, marginal utility varies across states because of imperfect insurance, so the agent does not value one dollar of transfer equally across states. Accordingly, any monetary amount has to be adjusted for risk using the insurance factors $\xi$. Second, the marginal benefits of human capital last into the future. An agent should only be allowed to deduct from his taxable income today his cost today minus the cost he saves by not having to invest in the next period, i.e., the deductible amount is:

$$M_t(s_t - s_{t-1}) - \frac{1-\xi_{M'}}{R(1-\tau_K)}\, E_t\left(M_{t+1}(s_{t+1}-s_t)\right)$$

for any choice of $s_t$, rather than just $M_t(s_t - s_{t-1})$. Third, a savings wedge distorts intertemporal transfers. To avoid arbitrage between transferring resources as savings and transferring them as human capital, the agent pays a capital tax on the transfer of resources he would need to invest $(s_{t+1} - s_t)$ tomorrow, which is $(1-\tau_{Lt})\frac{\tau_K}{R(1-\tau_K)}\left(1-\xi_{M'}\right)E_t\left(M'_{t+1}(s_{t+1}-s_t)\right)$.

Finally, because there is also a human capital subsidy in the next period, $\tau_{S,t+1}$, the agent receives a compensation in this period, for any investment $s_t$, of the amount $\frac{1-\xi_{\tau_S}}{R(1-\tau_K)}E_t(\tau_{S,t+1})(s_{t+1}-s_t)$. One can check that this ensures that, conditional on current labor supply $l_t$ and future human capital $s_{t+1}$, the tax system is neutral with respect to human capital.15 Hence, I define the “net wedge” as the gross wedge from which all the parts just described, which only compensate for the other distortions, are filtered out.

Definition 1 Define the net wedge on human capital expenses, $ts_t$, as:

$$ts_t \equiv \frac{\tau^d_{St} - \tau_{Lt}M'^d_t + P_t}{\left(M'^d_t - \tau^d_{St}\right)(1-\tau_{Lt})} \tag{16}$$

where $\tau^d_{St} \equiv \tau_{St} - \frac{1-\xi_{\tau_S}}{R(1-\tau_K)}E_t(\tau_{S,t+1})$ is the dynamic risk-adjusted subsidy, $M'^d_t \equiv M'_t - \frac{1-\xi_{M'}}{R(1-\tau_K)}E_t(M'_{t+1})$ denotes the dynamic, risk-adjusted cost, and $P_t \equiv (1-\tau_{Lt})\frac{\tau_K}{R(1-\tau_K)}\left(1-\xi_{M'}\right)E_t(M'_{t+1})$ captures the risk-adjusted savings distortion.

A zero net wedge tst again means that the tax system is neutral with respect to human capital and that there is full dynamic risk-adjusted deductibility.16
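A minimal sketch of Definition 1 in Python follows, together with a numerical check of the one-period neutrality argument. The one-period primitives $w(s,\theta) = \theta + A\sqrt{s}$, $M(s) = s + s^2/2$ and fixed labor $l = 1$ are hypothetical choices, made so that the untaxed investment is exactly $s = 1$; bisection is used only because the first-order condition is monotone in $s$ under these assumptions.

```python
import math


def net_wedge(tau_S, tau_L, m_prime, tau_K=0.0, R=1.0,
              exp_tau_S_next=0.0, xi_tau_S=0.0,
              exp_m_prime_next=0.0, xi_m_prime=0.0):
    """Net wedge ts_t of Definition 1, eq. (16).

    With all next-period terms left at zero, it reduces to the one-period
    net subsidy (tau_S - tau_L M') / ((M' - tau_S)(1 - tau_L)).
    """
    disc = 1.0 / (R * (1.0 - tau_K))
    tau_S_d = tau_S - (1.0 - xi_tau_S) * disc * exp_tau_S_next          # dynamic risk-adjusted subsidy
    m_prime_d = m_prime - (1.0 - xi_m_prime) * disc * exp_m_prime_next  # dynamic risk-adjusted cost
    p_t = (1.0 - tau_L) * tau_K * disc * (1.0 - xi_m_prime) * exp_m_prime_next  # savings distortion
    return (tau_S_d - tau_L * m_prime_d + p_t) / ((m_prime_d - tau_S_d) * (1.0 - tau_L))


# Hypothetical one-period primitives (not from the paper):
# w(s, theta) = theta + A * sqrt(s), M(s) = s + s**2 / 2, labor fixed at L.
A, L = 4.0, 1.0


def m_prime(s):
    return 1.0 + s


def chosen_s(tau_L, tau_S, lo=1e-9, hi=100.0):
    """Agent's s from the FOC (1 - tau_L) w_s l - M'(s) + tau_S = 0, by bisection."""
    def foc(s):
        return (1.0 - tau_L) * A * L / (2.0 * math.sqrt(s)) - m_prime(s) + tau_S
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if foc(mid) > 0:   # FOC is decreasing in s, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s_eff = chosen_s(0.0, 0.0)                         # no taxes: efficient choice
tau_L = 0.3
s_full = chosen_s(tau_L, tau_L * m_prime(s_eff))   # full deductibility
s_none = chosen_s(tau_L, 0.0)                      # labor tax, no subsidy
print(s_eff, s_full, s_none)
print(net_wedge(tau_L * m_prime(s_eff), tau_L, m_prime(s_eff)))  # 0.0
```

Setting $\tau_S = \tau_L M'(s)$ reproduces the untaxed choice and a zero net wedge, while a labor tax with no subsidy lowers investment. With next-period terms switched on and $\tau_K = 0$, $\xi = 0$, a subsidy equal to $\tau_L$ times marginal cost in both periods also yields $ts_t = 0$.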

4.4 The optimal human capital subsidy

Both the labor wedge and the net human capital wedge arise because of redistribution and insurance. At the optimum, they are constrained to follow a specific relation, which is akin to a type of inverse elasticity rule. The next two propositions, which contain the two main theoretical results of the paper, describe the relation between these two wedges and determine the sign of the net human capital wedge.

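Proposition 1 At the optimum and at each history, the labor and human capital wedges need to satisfy the following relation:

$$ts^*_t = \frac{\tau^*_{Lt}}{1-\tau^*_{Lt}}\,\frac{\varepsilon^c_t}{1+\varepsilon^u_t}\,(1-\rho_{\theta s,t}) \tag{17}$$

In addition, $ts^*_t(\theta^{t-1}, \underline{\theta}) = ts^*_t(\theta^{t-1}, \bar{\theta}) = 0$ for all $t$.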

Despite the complexity of the model, Formula (17) gives us a clear link between the labor wedge and the net human capital wedge. The two wedges need to co-move if and only if ρθs < 1. A particularly

15In all these terms, the insurance factors are evaluated at the efficient investment level and taken as given by the agent. See the optimization problem of the agent in Appendix equation (30).
16In the one-period model above, or in the last period $t = T$, all terms in the numerator except $\tau_S - M'\tau_L$ drop out.

appealing relation arises if the wage is a CES function as in (3) and disutility is separable and isoelastic:

$$\phi(l) = \frac{1}{\gamma}\,l^{\gamma} \quad (\gamma > 1) \tag{18}$$

In this special case, the ratio of the net human capital wedge to the labor wedge has to be constant over time and across agents:

$$ts^*_t \Big/ \frac{\tau^*_{Lt}}{1-\tau^*_{Lt}} = \frac{1-\rho}{\gamma}$$

This relation can be used as a simple check of the optimality of an existing tax and subsidy system. The zero-distortion-at-the-bottom-and-top result, familiar for the labor wedge, holds here for the net human capital wedge. It does not hold for the gross wedge $\tau_{St}$, underscoring again that the true incentive effects are captured by $ts_t$, not $\tau_{St}$. If Assumption 2 holds, which entails that the optimal labor wedge is non-negative at all histories (see the proof in the Appendix), the sign of the net human capital wedge is determined by the Hicksian coefficient of complementarity $\rho_{\theta s}$: the net human capital wedge is positive if and only if $\rho_{\theta s} < 1$.

Proposition 2 If Assumption 2 holds, then:

$$ts^*_t(\theta^t) \ge 0 \Leftrightarrow \rho_{\theta s,t} \le 1$$

An alternative, less stringent condition than Assumption 2 would be that $\tau^*_{Lt}(\theta^t) \ge 0$.

The Hicksian coefficient of complementarity $\rho_{\theta s}$ hence plays a key role in determining the sign of the net human capital wedge and its positive or negative relation to the labor wedge in (17). This key parameter and the critical threshold of 1 are discussed next.
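Formula (17) and Proposition 2 can be illustrated in a few lines. The labor wedge and elasticity values below are hypothetical; the point is only that, for a given non-negative labor wedge, the sign of $ts^*_t$ follows the sign of $1-\rho_{\theta s,t}$.

```python
def optimal_net_wedge(tau_L, eps_c, eps_u, rho):
    """Formula (17): ts* = tau_L/(1 - tau_L) * eps_c/(1 + eps_u) * (1 - rho)."""
    return tau_L / (1.0 - tau_L) * eps_c / (1.0 + eps_u) * (1.0 - rho)


# Proposition 2: with tau_L >= 0, sign(ts*) follows sign(1 - rho).
for rho in (0.2, 1.0, 1.2):
    print(rho, optimal_net_wedge(tau_L=0.4, eps_c=0.5, eps_u=0.5, rho=rho))
```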

4.5 The redistributive and insurance values of human capital

The optimal net wedge results from the balance of two effects that human capital has on social welfare. First, because it increases returns to work, it encourages labor supply: the “Labor Supply Effect.”17 This is beneficial in the presence of a distortion in the labor decision. At the same time, if $\rho_{\theta s} > 0$, which is equivalent to ability being complementary to human capital in the wage ($\partial^2 w/\partial\theta\,\partial s > 0$), human capital mostly benefits already able agents and hence compounds existing inequality due to intrinsic differences in $\theta_t$. The opposite occurs if $\rho_{\theta s} < 0$, in which case human capital reduces inequality. This effect will be labeled the “Inequality effect.” Because ability is stochastic, it is equivalent to say that if

$\rho_{\theta s} > 0$, human capital increases exposure to risk, because it mostly benefits agents when they receive

17As is clear from Formula (17), this effect would disappear if there were no distortion on labor. It is different from the direct effect of human capital on earnings through the increase in the wage, which would exist even with no distortion on labor, or with fixed labor supply, and which is filtered out from the net subsidy.

high productivity shocks. Put differently, if $\rho_{\theta s} > 0$, a human capital subsidy increases pre-tax income inequality. This does not mean human capital investments are undesirable: the question is what happens to post-tax inequality. The answer is that, when $\rho_{\theta s} < 1$, the positive labor supply effect dominates any potential inequality effect, and human capital reduces post-tax inequality. In this case, high ability agents do not disproportionately benefit from human capital, and it is beneficial to encourage human capital on net. Human capital is then said to have a positive insurance effect, or a positive redistributive effect on after-tax income inequality.18 Technically, the inequality effect is the result of a rent transfer, which arises from the need to satisfy agents' incentive compatibility constraints. If high productivity agents benefit more from a marginal increase in human capital than lower productivity agents ($\rho_{\theta s} > 0$), an increase in their human capital tightens their incentive constraints. What matters for social welfare is whether, when human capital is encouraged, the increase in resources from more labor is completely absorbed by the additional rents that must be ceded to high productivity agents to satisfy their incentive compatibility constraints, or whether there are resources left over. The latter case occurs when $\rho_{\theta s} < 1$, so that human capital investments generate net resources to be used for redistribution and insurance of all agents. A special case is the multiplicatively separable wage $w = \theta h(s)$ for some function $h$ of human capital. This wage function entails $\rho_{\theta s} = 1$ and, hence, a zero net wedge at the optimum. This is an application of the Atkinson and Stiglitz (1976) result on the non-optimality of differential commodity taxation when preferences satisfy a form of separability between goods and labor.
Interestingly, this result does not apply to the gross human capital wedge $\tau_S$, since the latter does not capture the full distortion on human capital. That a zero net subsidy is optimal when $\rho_{\theta s} = 1$ does not depend on the optimality of the labor or intertemporal wedges.19 Indeed, when $\rho_{\theta s} = 1$, the choice of education does not reveal any additional information on ability, as all types benefit equally from it in proportional terms. Returning to the relation between the labor and net human capital wedges in (17), while the labor wedge typically has a positive redistributive or insurance value,20 the net human capital wedge has a positive redistributive value if and only if $(1-\rho_{\theta s}) > 0$. The optimal policies must be consistent with each other: if the labor wedge is higher so as to provide more insurance, the net human capital wedge must also be higher if and only if $(1-\rho_{\theta s}) > 0$.
18Importantly, the social objective assigns non-negative weights to all agents, and hence all consumption gains arising from higher resources are positively weighted, even if they increase inequality. Any Pareto-improving rise in human capital would be stimulated. But the subsidy does not encourage rises in human capital that benefit some agents at the expense of drawing resources from other agents, with a resulting negative change in total social welfare.
19As in the direct proofs of the Atkinson-Stiglitz result by Laroque (2005).
20As long as $\tau_{Lt} \ge 0$, i.e., Assumption 2 holds.
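The Hicksian coefficient of complementarity can be approximated with central finite differences, using the standard definition $\rho_{\theta s} = w\,w_{\theta s}/(w_\theta w_s)$. The sketch below (hypothetical helper names and parameter values) should return approximately $\rho$ for a CES wage of the form (26), 1 for the multiplicatively separable case $w = \theta h(s)$ discussed above, and 0 for an additive wage with no complementarity.

```python
import math


def hicksian_coefficient(w, theta, s, h=1e-4):
    """rho_{theta s} = w * w_{theta s} / (w_theta * w_s), by central differences."""
    w0 = w(theta, s)
    w_theta = (w(theta + h, s) - w(theta - h, s)) / (2 * h)
    w_s = (w(theta, s + h) - w(theta, s - h)) / (2 * h)
    w_theta_s = (w(theta + h, s + h) - w(theta + h, s - h)
                 - w(theta - h, s + h) + w(theta - h, s - h)) / (4 * h * h)
    return w0 * w_theta_s / (w_theta * w_s)


def ces(rho, b_s=0.5):
    """CES wage of the form (26); b_s is a hypothetical scale."""
    return lambda theta, s: (theta ** (1 - rho) + b_s * s ** (1 - rho)) ** (1 / (1 - rho))


def separable(theta, s):
    """Multiplicatively separable wage w = theta * h(s), with h(s) = sqrt(s)."""
    return theta * math.sqrt(s)


def additive(theta, s):
    """Additive wage w = theta + sqrt(s): no complementarity."""
    return theta + math.sqrt(s)


for name, w in [("CES rho=0.2", ces(0.2)), ("CES rho=1.2", ces(1.2)),
                ("theta*h(s)", separable), ("theta+h(s)", additive)]:
    print(name, round(hicksian_coefficient(w, 1.0, 2.0), 6))
```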

Finally, if we extend the analysis to several types of human capital, $s_1, \dots, s_J$, with different Hicksian coefficients of complementarity, denoted by $\rho_{\theta s_j}$, $j = 1, \dots, J$, formula (17) applies to each type of human capital, so that at the optimum:

$$\frac{ts^*_{s_j,t}}{1-\rho_{\theta s_j,t}} = \frac{ts^*_{s_i,t}}{1-\rho_{\theta s_i,t}} \quad \forall (i,j) \tag{19}$$

All else equal, the human capital types with the highest redistributive and insurance effects would be more subsidized on net.

4.6 A recursive representation of the human capital wedge

While formula (17) links the labor and human capital wedges, it is also informative to express the optimal net wedge as a function of more primitive factors, as is done in the next proposition.

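Proposition 3 i) At the optimum, the net wedge can be written as:

$$ts^*_t(\theta^t) = \frac{\mu(\theta^t)\, u'_t(c(\theta^t))}{f(\theta_t|\theta_{t-1})}\,\frac{\varepsilon_{w\theta,t}}{\theta_t}\,(1-\rho_{\theta s,t}) \tag{20}$$

where the multiplier $\mu(\theta^t)$ on the envelope condition can be written as:

$$\mu(\theta^t) = \kappa(\theta^t) + \eta(\theta^t) \tag{21}$$

$$\kappa(\theta^t) = \int_{\theta_t}^{\bar\theta}(1-g_s)\,\frac{1}{u'_t(c(\theta^{t-1},\theta_s))}\,f(\theta_s|\theta_{t-1})\,d\theta_s \tag{22}$$

with $g_s = u'_t(c(\theta^{t-1},\theta_s))\,\lambda_{t-1}$ and $\lambda_{t-1} = \int_{\underline\theta}^{\bar\theta}\frac{1}{u'_t(c(\theta^{t-1},\theta_m))}\,f(\theta_m|\theta_{t-1})\,d\theta_m$, and

$$\eta(\theta^t) = ts^*_{t-1}(\theta^{t-1})\,\frac{R\beta}{u'_{t-1}(c(\theta^{t-1}))}\,\frac{1}{1-\rho_{\theta s,t-1}}\,\frac{\theta_{t-1}}{\varepsilon_{w\theta,t-1}}\int_{\theta_t}^{\bar\theta}\frac{\partial f(\theta_s|\theta_{t-1})}{\partial\theta_{t-1}}\,d\theta_s \tag{23}$$

ii) $ts^*_t(\theta^t) = 0$ if $u'_t(c_t) = 1$ and $\theta_t$ is iid.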

This representation makes it clear that, while the coefficient of complementarity $\rho_{\theta s}$ determines the sign of the net wedge, the two terms $\kappa$ and $\eta$ modulate its amplitude. The insurance motive is captured in $\kappa(\theta^t)$, familiar from the static taxation literature; it would be zero with linear utility. $g_s$ is the marginal social welfare weight on an agent of type $\theta_s$, measuring the social value of one more dollar transferred to that individual, and $1/\lambda_{t-1}$ is the social cost of public funds. The novel term $\eta(\theta^t)$ captures the previous period's net wedge, hence indirectly the previous period's insurance motive, weighted by a measure of ability persistence. Recall that there can be a redistributive motive in the first period if there is initial heterogeneity.21 This motive remains effective

21In the first period, heterogeneity in $\theta_1$ leads to:

$$\mu(\theta_1) = \int_{\theta_1}^{\bar\theta}\left(1 - \lambda_0(\theta_s)\,u'_1(c(\theta_s))\right)\frac{1}{u'_1(c(\theta_s))}\,f(\theta_s)\,d\theta_s$$

through $\eta(\theta^t)$, the more so if types are more persistent, but vanishes as skills become less persistent. In the limit, if $\theta_t$ is identically and independently distributed (iid), only the contemporaneous insurance motive $\kappa(\theta^t)$ matters. If, in addition to iid shocks, utility is linear in consumption, the optimal net subsidy is zero.22
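A discrete analogue of (22) makes the insurance term concrete. The sketch below assumes log utility (so $1/u'(c) = c$), a hypothetical five-type distribution with consumption increasing in ability, and iid shocks, so that $\eta = 0$ and $\mu = \kappa$ in (21). It shows that $\kappa$ is zero at the bottom type and positive above it, so that, by (20), the sign of the net wedge is governed by $1-\rho_{\theta s}$.

```python
def kappa_schedule(consumption, probs):
    """Discrete analogue of (22) under an assumed log utility, where 1/u'(c) = c.

    Types are ordered from lowest to highest ability. kappa[i] sums
    (1 - g_j) * (1 / u'(c_j)) * p_j over all types j >= i, with
    g_j = u'(c_j) * lam and lam = sum_j p_j / u'(c_j).
    """
    inv_mu = list(consumption)                         # 1/u'(c) = c under log utility
    lam = sum(p * im for p, im in zip(probs, inv_mu))
    g = [lam / im for im in inv_mu]                    # welfare weights g_j = u'(c_j) * lam
    terms = [(1.0 - gj) * im * p for gj, im, p in zip(g, inv_mu, probs)]
    kappa = []
    tail = 0.0
    for term in reversed(terms):                       # tail sums from the top type down
        tail += term
        kappa.append(tail)
    return kappa[::-1]


k = kappa_schedule([1.0, 1.5, 2.0, 2.5, 3.0], [0.2] * 5)
print([round(x, 10) for x in k])
```

With constant consumption across types, every welfare weight equals 1 and the whole schedule is zero, mirroring the statement that the insurance motive vanishes without variation in marginal utility.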

4.7 Age-dependency

In Proposition 3 (equation (20)), the optimal wedge was expressed recursively and pointwise as a function of the previous period's wedges. One can also rewrite the formulas in terms of a weighted expectation across types at time $t$, using some weighting function $\pi(\theta)$.23 For the sake of exposition, ability is assumed to follow a log autoregressive process:

$$\log(\theta_t) = p\,\log(\theta_{t-1}) + \psi_t \tag{24}$$

where $\psi_t$ has density $f^{\psi}(\psi|\theta_{t-1})$, with $E(\psi|\theta_{t-1}) = 0$. The general formulas for any stochastic process, for both the labor wedge and the net human capital wedge, are in the Appendix.

Corollary 1 The optimal net subsidy evolves over time according to:

$$E_{t-1}\left[ts_t\,\frac{\varepsilon_{w\theta,t-1}(1-\rho_{\theta s,t-1})}{\varepsilon_{w\theta,t}(1-\rho_{\theta s,t})}\,\frac{1}{R\beta}\frac{u'_{t-1}}{u'_t}\right] = \varepsilon_{w\theta,t-1}(1-\rho_{\theta s,t-1})\,\mathrm{Cov}\left(\frac{1}{R\beta}\frac{u'_{t-1}}{u'_t},\,\log(\theta_t)\right) + p\,ts_{t-1} \tag{25}$$

Formula (25) exhibits a drift and an autoregressive term, like the labor wedge in Farhi and Werning (2013). Dynamic incentive compatibility constraints cause a positive covariance between consumption growth and productivity: by promising higher consumption growth to higher ability agents, the government induces them to truthfully reveal their types. This, however, makes insurance valuable, which is captured by the drift term. The insurance motive is magnified by the sensitivity of the wage to ability

$\varepsilon_{w\theta,t}$ and by the redistributive or insurance factor of human capital, $(1-\rho_{\theta s})$. The drift term inherits the sign of the latter: when $\rho_{\theta s} \le 1$, human capital has a positive insurance effect, which caters well to the rising need for insurance. The Hicksian complementarity $\rho_{\theta s}$ might vary over life. If it decreases faster, the net subsidy will rise faster or fall more slowly over the life cycle. The sensitivity of the wage amplifies the effect of the Hicksian complementarity coefficient in either direction. The persistence of the shock, $p$, translates into persistence of the net subsidy. Over time, in the optimal mechanism,

where $\lambda_0(\theta_s)$ is the multiplier (scaled by $f(\theta_s)$) on type $\theta_s$'s target utility. With linear utility: $1 = \int_{\underline\theta}^{\bar\theta}\lambda_0(\theta_s)\,f(\theta_s)\,d\theta_s$.
22Except in the first period, with different utility thresholds for different agents.
23Different weighting functions $\pi(\theta)$ lead to different recursive relations, which must hold at the optimum, and some weighting functions bring out particularly enlightening effects. Such a reformulation of the optimal labor wedge formula was proposed by Farhi and Werning (2013) with the convenient weighting function $\pi(\theta) = 1$.

there will be “subsidy smoothing”: the net subsidy becomes more strongly correlated over time as age increases, because the variance of consumption growth falls to zero, which makes the drift term in the formula above vanish.

4.8 Some special cases and link to the existing literature

How do the results of recent papers that try to relax the assumption of exogenous wages relate to these findings? First, although the human capital and labor literatures have long studied the interaction between human capital and ability (see the summary in Section 5.1), the optimal taxation literature has mostly adopted the more restrictive wage form $w = \theta s$. As shown, this is a special case that entails a zero net wedge; hence, any subsidy $\tau_S$ obtained at the optimum merely serves to counteract the labor and savings distortions and does not fulfill any additional redistributive or insurance role. This is the “Siamese Twins” result in Bovenberg and Jacobs (2005). Second, Jacobs and Bovenberg (2011), Maldonado (2008), Da Costa and Maestri (2007), and Anderberg (2009) do emphasize the role of the complementarity between human capital and ability (or risk, depending on their setting). This dynamic model nests their findings. The definition of the net wedge and its interpretation differ from those in static models: the net wedge here needs to filter out the dynamic cost, adjust for risk, and take into account the savings wedge. It is interpreted as the distance of the tax system from full risk-adjusted dynamic deductibility, rather than from simple contemporaneous deductibility. But subject to this suitable redefinition of the net wedge, what is striking is that the sign of the net wedge is still determined as it would be in a completely static model,24 by the sign of $1-\rho_{\theta s}$. Of course, this sign can now vary over the life cycle. Indeed, all dynamic considerations are loaded onto the multiplicative term $\mu$ ($= \kappa + \eta$), which modulates the amplitude of the net wedge. The insurance and persistence terms highlight the role of the stochastic process of productivity, and especially of its persistence over time, which can amplify or dampen the effects of the complementarity between human capital and ability.
In addition, the evolution of the net wedge over the life cycle is affected by the coefficient of complementarity, as shown in Section 4.7. Third, in the special case with heterogeneity but no uncertainty, as in Bohacek and Kapicka (2008), the evolution of the net subsidy in (25) would become:25

$$ts^*_t\,\frac{\varepsilon_{w\theta,t-1}(1-\rho_{\theta s,t-1})}{\varepsilon_{w\theta,t}(1-\rho_{\theta s,t})}\,\frac{1}{R\beta}\frac{u'_{t-1}}{u'_t} = ts^*_{t-1}$$

24This result also requires stronger assumptions (Assumption 2) than in the static model, and its proof is more complex.
25The comparison to Bohacek and Kapicka (2008) is, however, difficult, as these authors consider time investments in human capital rather than monetary investments.

If shocks are iid, the redistributive and persistence term $\eta$ disappears and $ts^*_t$ is no longer explicitly correlated with $ts^*_{t-1}$. Finally, in the real world, education policies are often set independently of taxes. If the tax system is fixed and only human capital policies can be optimized, the net subsidy is set as before, but with additional terms that capture the indirect effect of human capital on labor supply (see formula (40) in the Appendix).
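The deterministic recursion in the Bohacek and Kapicka (2008) special case can be iterated directly by solving it for $ts^*_t$. In the sketch below, the paths for the wage elasticity, the Hicksian coefficient, and marginal utility are hypothetical; with constant consumption, $R\beta = 1$ and a falling $\rho_{\theta s}$, the net subsidy rises over the life cycle, in line with the discussion in Section 4.7.

```python
def net_subsidy_path(ts0, eps, rho, mu, R_beta=1.0):
    """Iterate the no-uncertainty recursion implied by (25):

    ts_t = ts_{t-1} * eps_t (1 - rho_t) / (eps_{t-1} (1 - rho_{t-1}))
                    * R_beta * u'_t / u'_{t-1}

    eps, rho, mu are per-period paths of eps_{w theta,t}, rho_{theta s,t}, u'_t.
    """
    path = [ts0]
    for t in range(1, len(eps)):
        growth = (eps[t] * (1 - rho[t])) / (eps[t - 1] * (1 - rho[t - 1]))
        path.append(path[-1] * growth * R_beta * mu[t] / mu[t - 1])
    return path


# Hypothetical paths: constant wage elasticity and consumption, falling rho.
print(net_subsidy_path(0.1, eps=[1.0, 1.0, 1.0], rho=[0.5, 0.4, 0.3], mu=[1.0, 1.0, 1.0]))
```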

5 Numerical Analysis

This numerical analysis strikes a middle ground between a simple illustration of the qualitative features of the optimal mechanism and a more careful calibration with quantitative implications for the optimal wedges and their life cycle patterns. The computational procedure, calibration details, and additional results for alternative calibrations are in the online Computational Appendix.26 The numerical analysis has four goals: first, to highlight the quantitative importance of the Hicksian coefficient of complementarity $\rho_{\theta s}$; second, to compare the labor and capital wedges to those arising in a standard model without human capital; third, to highlight the phenomenon of subsidy smoothing over the life cycle; and fourth, to study the progressivity of the human capital subsidy and the labor wedge. Before presenting simulation results, I discuss the empirical evidence on the model's crucial parameter, namely the complementarity between human capital and ability in the wage.

5.1 Empirical evidence on the complementarity between human capital and ability

The formulas for the optimal net wedge and its evolution (see (16) and (25)) highlighted the importance of the complementarity between ability and human capital. Unfortunately, the evidence to date does not yield a conclusive answer regarding its magnitude, and further empirical work on this key parameter seems needed.27 Several studies show that college education might mostly benefit already able students, implying that $\rho_{\theta s} > 0$, and that $\rho_{\theta s} > 1$ is possible. Cunha, Heckman, Lochner, and Masterov (2006) (hereafter, CHLM, 2006) estimate that the return to one year of college is around 16% at the 5th percentile of the math test score distribution, as opposed to 26% at the 95th percentile. There is only scarce evidence on the complementarity between on-the-job training and ability, but the same authors show that on-the-job training is mostly taken up by those with higher AFQT scores, which might, all else equal, signal that they have a higher marginal return from it.28 The OECD (2004) reports that training mostly benefits skilled workers in terms of higher wages, but benefits low-skilled workers in terms of job security. Huggett, Ventura, and Yaron (2011) use a multiplicatively separable functional form for the wage in their structural model of time investments in human capital (implying $\rho_{\theta s} = 1$), which generates a life cycle path of earnings that matches the data well.

26Available at https://dl.dropboxusercontent.com/u/12222201/Computational_Appendix_S_Stantcheva.pdf.
27There is some evidence for early childhood investments. Ashenfelter and Rouse (1998) suggest that lower ability children benefit more from schooling, a finding also in line with Cunha, Heckman, Lochner, and Masterov (2006) for early childhood interventions. This is consistent with $\rho_{\theta s} \le 1$ for childhood investments. There is evidence that the Hicksian complementarity can change over life, as suggested by the structural literature on human capital formation (e.g., Cunha and Heckman (2007)), so that estimates for primary and secondary schooling may be of only limited use for the analysis of higher education or job training (which are the focus of this paper).

5.2 Calibration

Calibration to US data: To calibrate the model, I construct a “baseline economy,” which has the same primitives as in Section 2, but no social planner and, hence, no optimal tax system. Instead, the linear labor taxes, capital (here, savings) taxes, and human capital subsidies are set to their current averages in the US. Public and private subsidies in the US cover around 50% of total resource costs of formal higher education. However, in the model, human capital expenses are a comprehensive measure, including all types of formal and informal investments, and some mostly unsubsidized expenses (e.g., textbooks, computers). In addition, there are practically no subsidies beyond the initial 4 years of higher education. Hence, the linear subsidy in the baseline model, applicable to all expenses, is reduced to 35% for the first 2 periods and to 0 thereafter.29 The linear labor tax rate is set to 13%, and the savings tax to 25%.30 Agents can borrow and save at a constant interest rate and start with zero asset holdings.31 In this baseline economy, some parameters are set exogenously based on the existing literature, while others are set to endogenously match two key moments from the data, namely, a wage premium and a ratio of human capital expenses to lifetime income, as explained in more detail next.

Functional Forms: Agents live for T = 30 periods, each representing roughly 2 years of life. They work for 20 periods and spend 10 periods in retirement. Preferences during working years are given by:

$$\tilde u(c_t, y_t, s_t, \theta_t) = \log(c_t) - \frac{\kappa}{\gamma}\left(\frac{y_t}{w_t(\theta, s)}\right)^{\gamma}, \quad \kappa > 0,\ \gamma > 1$$

During retirement, utility is simply $\tilde u^R(c_t) = \log(c_t)$. The literature reviewed above suggests that $\rho_{\theta s} > 0$. The wage is assumed to be CES, which allows us to cleanly focus on the (constant)

28It is also not evident that the test scores used as measures of “ability” are themselves exogenous, especially at later ages.
29See indicators B in OECD, “Education at a Glance 2013.”
30Only interest income and short-term capital gains are taxed at ordinary income rates. Taxes on many dividends and on long-term capital gains are capped at 20% (plus possibly 2.9% for the new Medicare tax above $250K of total income, and various state taxes).
31Because of a natural debt limit consideration, agents do not choose to hold negative assets.

Hicksian coefficient of complementarity $\rho_{\theta s} = \rho$:

$$w_t(\theta, s) = \left(\theta^{1-\rho} + b_s\, s^{1-\rho}\right)^{\frac{1}{1-\rho}} \tag{26}$$

where $b_s > 0$ is a scaling factor. Two values of $\rho$ are studied in the main text, namely $\rho = 0.2$ and $\rho = 1.2$, with additional values considered in the Computational Appendix. The cost function of human capital contains an adjustment term and takes the form

$$M_t(s_{t-1}, e_t) = b_l e_t + b_a \frac{e_t^2}{s_{t-1}}$$

with $b_l$ and $b_a$ the linear and the adjustment components of cost. Consistent with the high persistence in earnings documented in Storesletten, Telmer, and Yaron (2004), ability is assumed to follow a geometric random walk:

$$\log\theta_t = \log\theta_{t-1} + \psi_t$$

with $\psi_t \sim N\left(-\frac{1}{2}\sigma_\psi^2,\ \sigma_\psi^2\right)$.

Exogenously calibrated parameters: The baseline model has $\gamma = 3$ and $\kappa = 1$, which implies a Frisch elasticity of 0.5 (Chetty, 2012). The discount factor is set to $\beta = 0.95$ and the net interest rate to 5%. The adjustment cost is normalized to $b_a = 2$. Empirical estimates of the variance of ability, $\sigma_\psi^2$, vary widely. A medium-range value of 0.0095 (Heathcote, Storesletten, and Violante (2005), hereafter HSV 2005) is adopted, with several alternative values for this parameter explored in the online Computational Appendix.
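Two of the exogenous calibration choices can be checked numerically. The sketch below applies the footnote 14 formulas for $\varepsilon^c$ and $\varepsilon^u$ to the working-age preferences above (log utility, disutility $\kappa l^\gamma/\gamma$ with $l = y/w$), recovering a Frisch elasticity of $1/(\gamma-1) = 0.5$ at $\gamma = 3$ for any $(l, c)$, and confirms by simulation that the mean correction $-\sigma_\psi^2/2$ makes $E[\theta_t/\theta_{t-1}] = 1$. Function names and the sample size are illustrative choices.

```python
import math
import random


def frisch_from_footnote_14(l, c, gamma, kappa=1.0):
    """eps_c and eps_u from footnote 14 for phi(l) = kappa l^gamma / gamma and
    log utility; returns the Frisch elasticity eps_c / (1 + eps_u - eps_c)."""
    phi_l = kappa * l ** (gamma - 1)
    phi_ll = kappa * (gamma - 1) * l ** (gamma - 2)
    u1, u2 = 1.0 / c, -1.0 / c ** 2              # u'(c), u''(c) for log utility
    income_term = phi_l ** 2 / u1 ** 2 * u2
    denom = phi_ll - income_term
    eps_u = (phi_l / l + income_term) / denom
    eps_c = (phi_l / l) / denom
    return eps_c / (1.0 + eps_u - eps_c)


def mean_ability_growth(sigma2, n=200_000, seed=0):
    """Monte Carlo estimate of E[exp(psi)], psi ~ N(-sigma2/2, sigma2); exact value 1."""
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    return sum(math.exp(rng.gauss(-0.5 * sigma2, sd)) for _ in range(n)) / n


print(frisch_from_footnote_14(l=1.0, c=1.0, gamma=3.0))  # 0.5 up to rounding
print(mean_ability_growth(0.0095))                        # close to 1
```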

Endogenously matched parameters: The scaling factor for human capital $b_s$ and the linear cost parameter $b_l$ are set to match two statistics in the data: a wage premium and a ratio of human capital expenses to income. One complication is that the model does not a priori restrict investments to occur only during the traditional “college” years. Indeed, one of the motivations for this study is the lifelong nature of human capital investments. However, most available estimates of wage premiums in the literature are for college education, with scarce evidence on job training. This difficulty can be overcome by redefining “college” appropriately for the model. Following Autor, Katz, et al. (1998) (hereafter AKK, 1998), who find 42.7% full-time college equivalents, the top 42.7% of the population of the baseline economy, ranked by educational expenses, are assumed to represent real-life college-goers. Their average wage relative to the bottom 42.7% is set to match the college wage premium estimated in the literature.32 These estimates range from 1.58 in Murphy and Welch (1992), to between 1.66 and 1.73 in AKK (1998), to above 1.80 in Heathcote, Perri, and Violante (2010). The calibration targets a mid-range value of 1.7.

32The middle 14.6% are omitted to clearly delineate between college-goers and others. Indeed, because of the continuous investments in the model, there is no sharp distinction between “college” and “no-college,” and in particular, no notion of “a degree.”
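The ranking step used to define “college-goers” can be sketched as follows. This is a hypothetical helper, not the paper's code: it sorts simulated agents by human capital expenses, drops the middle group, and returns the ratio of mean wages; the calibration would then adjust $b_s$ until this statistic reaches the 1.7 target. Ties in expenses are broken by Python's stable sort order, an assumption the paper does not address.

```python
def college_wage_premium(expenses, wages, share=0.427):
    """Mean wage of the top `share` of agents ranked by expenses, divided by the
    mean wage of the bottom `share`; the middle group is dropped."""
    n = round(share * len(expenses))
    if n == 0 or 2 * n > len(expenses):
        raise ValueError("share must select two non-overlapping, non-empty groups")
    order = sorted(range(len(expenses)), key=lambda i: expenses[i])
    bottom = [wages[i] for i in order[:n]]
    top = [wages[i] for i in order[-n:]]
    return (sum(top) / n) / (sum(bottom) / n)


# Toy data (hypothetical): 10 agents, wages jump with expenses.
print(college_wage_premium(list(range(10)), [1.0] * 5 + [2.0] * 5, share=0.4))  # 2.0
```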

Table 1: Calibration

Parameter | Definition | Sim 1 | Sim 2 | Source/Target
Exogenously calibrated or normalized
ρ | Hicksian complementarity | 0.2 | 1.2 | CHLM (2006)
κ | Disutility of work scale | 1 | 1 |
γ | Disutility elasticity | 3 | 3 | Chetty (2012)
σψ² | Variance of productivity | 0.0095 | 0.0095 | HSV (2005)
T | Working periods | 20 | 20 |
Tr | Retirement periods | 10 | 10 |
β | Discount factor | 0.95 | 0.95 |
R | Gross interest rate | 1.053 | 1.053 |
Endogenously calibrated in baseline economy
bs | Scale of HC in wage | 0.09 | 0.1 | Wage premium (AKK, 1998)
bl | Linear cost | 0.5 | 0.5 | (OECD, 2013, US Dept. Educ, 2010)

Turning to the second target, the net present value of higher education expenses over the net present value of lifetime income in the data is 13%.33 However, since agents invest beyond the traditional college years in the model, there needs to be an allowance for later-in-life investments. It is assumed that college costs represent 2/3 of all lifetime investments in human capital, so that the target ratio of the net present value of lifetime human capital expenses to the net present value of lifetime income is 19%. Table 1 summarizes the resulting parameters. Using the computed policy functions, a Monte Carlo simulation with 100,000 draws is performed for each value of ρθs. The initial states are set to yield a zero present-value resource cost for the allocation. This ensures comparability across simulations, and gives a sense of the outcomes achievable without outside government revenue. To judge how well the baseline model fits the data, the Online Appendix contains graphs of mean consumption, income, human capital expenses, and assets, as well as the variances of wages, income, and consumption over the life cycle. The processes for earnings and wages match the data well to a first order. In particular, earnings and the wage follow a near-random walk (as reported in Heathcote, Storesletten, and Violante (2005)), and the variances of log wages and earnings match those in Heathcote, Perri, and Violante (2010).

33 Computed using data on attendance at different types of colleges and costs (Chang Wei, 2010; OECD, 2013). See the detailed calculations in the online Computational Appendix.
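The second target is a ratio of present values. The following is a minimal sketch. It assumes flows are indexed by period and discounted at the gross rate R = 1.053 from Table 1. The function names are hypothetical.

```python
from typing import Sequence


def npv(flows: Sequence[float], gross_rate: float) -> float:
    """Present value at t = 1 of flows received at t = 1, 2, ..."""
    return sum(x / gross_rate ** t for t, x in enumerate(flows))


def hc_expense_share(expenses: Sequence[float], income: Sequence[float],
                     gross_rate: float = 1.053) -> float:
    """NPV of lifetime human capital expenses over NPV of lifetime income."""
    return npv(expenses, gross_rate) / npv(income, gross_rate)
```

In a calibration loop, bl would be adjusted until this ratio, computed on the simulated population, reaches the 19% target. The text does not say whether this means a ratio of averages or an average of individual ratios, so this sketch leaves that aggregation to the caller.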

5.3 Results

A brief note on the presentation of the wedges is required. When agents are at the corner solution of no investment, which in the baseline calibration occurs for most simulated paths after approximately period 12 or 13, the subsidy is indeterminate: any value that stays below the upper bound at which agents would start to invest is consistent with the allocation. Hence, the policy function for the gross wedge is set to zero for agents after they stop investing.34 To make it as comparable as possible to an explicit subsidy, it is presented as a fraction of the marginal cost, i.e., $\tau_{St}/M_t'(e_t)$.

The role of the Hicksian complementarity ρθs: Figure 1 presents the human capital wedges. For this figure only, the focus should be on the periods before the dashed lines, since many agents no longer invest in human capital later in life and wedges are normalized thereafter, as just explained. First, panel (a) shows that the optimal gross wedge on human capital is higher and grows faster when human capital has a positive insurance or redistributive effect (ρθs < 1). When ρθs = 0.2, the wedge starts at 1% and grows to 19%; for ρθs = 1.2, it instead starts slightly negative and grows to only 11%. Panel (b) illustrates that with ρθs = 1.2, the net subsidy is negative, so that human capital expenses are made less than fully deductible. Conversely, when ρθs = 0.2, human capital expenses are subsidized on net beyond pure deductibility. The comparison between panels (a) and (b) once more highlights that, when several wedges are present, the true incentive effect differs from the gross wedge. Finally, the net subsidy grows when ρθs < 1 and declines otherwise, as seen in the drift term of formula (25). However, the net wedges are very small, so the overall system remains very close to neutral with respect to human capital expenses. Put differently, full dynamic risk-adjusted deductibility is very close to optimal.35 This is akin to a “production efficiency” result: human capital is an intertemporal decision with persistent effects, and distorting it for redistributive or insurance reasons is relatively costly unless the redistributive or insurance effects are very strong. The complementarity between human capital and ability and the volatility of ability clearly matter for this result. Moving further away from a multiplicatively separable wage, i.e., increasing |ρθs − 1|, leads to larger net wedges in absolute value. Similarly, a higher volatility increases the value of insurance and yields a higher optimal net wedge if human capital has a positive insurance value (ρθs < 1), and a lower optimal net wedge if not.

34 The net wedge cannot be innocuously normalized to zero; hence, when agents no longer invest, it is set to the right-hand side of (16), a level that does not artificially induce agents to invest.

35 The gross wedge is correspondingly smaller than the 50% real-world subsidy. Bear in mind, however, that the gross wedge here is more comprehensive: it covers all human capital expenses, even those unsubsidized in the real world, and is available throughout life, not exclusively during the college years.

Figure 1: Average human capital wedges over time. Panel (a): gross wedge τSt; panel (b): net wedge tst. Both panels plot the wedge against t (0 to 20) for ρ = 0.2 and ρ = 1.2.

Dashed lines mark the time after which most agents no longer invest in human capital. (a) The gross wedge is higher and grows faster when human capital has positive redistributive and insurance effects (ρθs < 1). The wedge is normalized to zero for zero investment (a corner solution). (b) The provision of dynamic incentives also creates a value for insurance.

If ρθs < 1, human capital has positive redistributive and insurance values, and expenses are subsidized on net at a rising rate. Conversely, if ρθs > 1, expenses are only partially deductible, and deductibility decreases over time.

The production efficiency result hinges on the fact that the government jointly optimizes the full system, and insurance can also be achieved through the labor wedge. In particular, through transfers, the planner can endogenously relieve the “credit constraints” that might be motivating high subsidies in the real world, without having to distort human capital acquisition at the margin. Indeed, Figure 2 panel (a) shows that it is the optimal labor wedge that is large and rises over time to provide insurance against widening income dispersion. Hence, a large part of the growing gross human capital wedge merely compensates for this growing disincentive to accumulate human capital. Most of the redistribution and insurance goes through the intratemporal labor wedge rather than the intertemporal net human capital wedge. This model ignores the general equilibrium effects of human capital accumulation. Positive spillovers from education, through peer effects, innovation, or social and political channels, could lead to larger subsidies for human capital.

The labor and savings wedges with and without human capital: Figure 2 compares the labor and savings wedges that arise at the optimum for each value of ρθs to those that arise in a model without human capital, where the wage equals exogenous ability, wt = θt. The process for θ is rescaled so that both models generate the same variance of wages over the life cycle (which, as noted, is consistent with the paths in Heathcote, Perri, and Violante (2010)). Hence, the exercise consists in holding observed wage volatility fixed and asking how optimal policies would change if that path of wages were generated by endogenous human capital or by a purely exogenous process.

Figure 2: Average labor and savings wedges over time. Panel (a): labor wedge; panel (b): savings wedge. Both panels plot the wedge against t (0 to 20) for ρ = 0.2, ρ = 1.2, and the case without human capital (“no HC”).

“No HC” denotes the case without human capital. (a) The labor wedge is lower and grows more slowly over time in the presence of human capital if human capital has positive redistributive and insurance effects (ρθs < 1). (b) The savings wedge is also lower in the presence of human capital with a redistributive and insurance value (ρθs < 1), but larger otherwise.

The labor wedge is smaller and grows more slowly when ρθs < 1. This is because human capital fulfills part of the redistributive and insurance role and takes some of the burden away from the labor wedge. On the other hand, when ρθs > 1, human capital does not fulfill any insurance or redistributive role and the full burden is again on the labor wedge, which is almost unaffected.36

Panel (b) plots the savings wedge over time. For ρθs = 0.2, it starts at 0.5% of the gross return on savings, which corresponds to a 10% tax on net interest, and declines to zero.37 It is higher when ρθs > 1. Relative to a case with no human capital, savings are less distorted in the presence of human capital with a redistributive value (ρθs < 1) since then, again, human capital helps the planner to screen agents. On the other hand, savings are more distorted when human capital increases the informational rents of high-ability agents (ρθs > 1).
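The conversion in footnote 37 rearranges to τ̃Kt = R τKt/(R − 1). The net return R − 1 is only about 5% of the gross return, so a small wedge on gross returns translates into a much larger tax on net interest. The following is a short check with the Table 1 value R = 1.053; the function name is illustrative.

```python
def tax_on_net_interest(tau_k: float, gross_rate: float) -> float:
    """Solve 1 + (R - 1)(1 - tau_tilde) = R(1 - tau_k) for tau_tilde."""
    return gross_rate * tau_k / (gross_rate - 1.0)


if __name__ == "__main__":
    # Savings wedge of 0.5% of the gross return at R = 1.053.
    print(f"{tax_on_net_interest(0.005, 1.053):.3f}")  # 0.099, i.e. about 10%
```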

36 It is clear from Formula (32) in the Appendix that the effects of human capital on the labor wedge are ambiguous.

37 The equivalent tax on net interest, $\tilde{\tau}_{Kt}$, solves $1 + (R-1)(1-\tilde{\tau}_{Kt}) = R(1-\tau_{Kt})$.

Subsidy smoothing over the life cycle: Figure 3 plots the net subsidy in period t against the net subsidy at t − 1, for young adults (t = 5 in panel (a)) and for middle-aged workers (t = 13 in panel (b)). Earlier in life, the net wedge is more volatile from one period to the next, but it becomes more deterministic over time, leading to a “subsidy smoothing” result. The dynamic taxation literature has highlighted a similar “tax smoothing” result for the labor wedge, which continues to hold in the presence of human capital. The intuitions for the two results are the same. A persistent productivity shock early in life has repercussions over many periods, leading to a larger present-value change in the income flow than a later shock. Consumption in early years therefore reacts strongly to unexpected changes in ability, as the agent attempts to smooth out the shock. Accordingly, the variance of consumption growth is initially large, but decreases to zero over time. The drift term in the net subsidy formula (see (25)), which is proportional to the covariance between ability and consumption growth, tends to zero towards retirement. Then, only the autoregressive term of the random walk remains.
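The present-value argument can be illustrated with a deliberately stylized calculation. It assumes, unlike the model (where shocks are persistent and retirement income differs), that a shock shifts income by one unit in every remaining working period. Income is discounted at R. The helper below is hypothetical.

```python
def pv_of_permanent_shock(t: int, last_period: int, gross_rate: float) -> float:
    """Present value, as of period t, of a unit income change that persists
    in every period from t through last_period (inclusive)."""
    return sum(gross_rate ** -(s - t) for s in range(t, last_period + 1))


if __name__ == "__main__":
    gross_rate, last = 1.053, 20  # gross rate and working periods (Table 1)
    for t in (5, 13):             # the two ages shown in Figure 3
        print(t, round(pv_of_permanent_shock(t, last, gross_rate), 2))
```

The value at t = 5 exceeds that at t = 13. This is why early shocks move consumption, and hence the drift term of the net subsidy, by more.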

Figure 3: Subsidy smoothing over life. Panel (a): tst against tst−1 at t = 5; panel (b): tst against tst−1 at t = 13.

The net human capital wedge becomes more correlated from one period to the next as age increases, because the variance of consumption growth, which drives changes in the subsidy over time, vanishes. Figures are for ρθs = 0.2.

Progressivity or Regressivity of the net human capital and labor wedges: Figure 4 plots τLt against the contemporaneous productivity shock θt at t = 19, and Figure 5 does the same for tst. The net human capital subsidy is always regressive as long as ρθs > 0, because higher-ability people benefit more from human capital. When ρθs > 1, the labor wedge is regressive in the short run, as it is for a similar parameterization of the problem without human capital. On the other hand, when ρθs < 1, the labor wedge exhibits short-run progressivity. The reason for this reverse pattern is that both the labor wedge and the net subsidy are tools to insure against earnings risk. Along the optimal path, they need to evolve consistently, according to the key relation in (17). The labor wedge always has positive insurance and redistributive effects. The same is true for the net subsidy only if ρθs < 1.

Accordingly, the two instruments co-move positively when ρθs < 1 and negatively when ρθs > 1.

Figure 4: Progressivity and regressivity of the labor wedge. Panel (a): labor wedge τLt against contemporaneous ability θt for ρθs = 0.2; panel (b): the same for ρθs = 1.2.

The labor wedge exhibits short-run progressivity when ρθs < 1, but short-run regressivity when ρθs > 1.

Figure 5: Regressivity of the net human capital wedge. Panel (a): net wedge tst against contemporaneous ability θt for ρθs = 0.2; panel (b): the same for ρθs = 1.2.

The net wedge is always regressive in the short run, but more so when ρθs > 1.

6 Implementation and Policy Implications

In this section, we step away from a direct revelation mechanism in which agents report their types and instead consider what policy tools can implement those same allocations. There are many possible implementations and the theory does not provide guidance as to which one to choose, unless we include additional considerations such as political or administrative constraints. I present two implementations which are particularly appealing because they are variations on policies that already exist.

6.1 Income contingent loans

Before the income contingent loans (hereafter, ICLs) are presented, the decentralized economy is described and some notation is introduced. In the decentralized economy, agents choose their human capital expenses $e_t$, income $y_t$, and savings $b_t$ in a risk-free account at a gross rate R. Initial wealth is zero,38 and initial human capital is $s_0$. The government can observe and keep record of the histories of consumption, output, human capital, and wealth. Denote by $m_t^*(\theta^t)$ the optimal allocation of the social planner’s problem after history $\theta^t$ for any choice variable $m \in \{c, y, b, e\}$ (the Online Appendix shows how to construct these from the recursive allocations derived above). For any history $\theta^{t-1}$ and subset of variables $m \subset \{c, y, b, e\}$, let $Q_t^m(\theta^{t-1})$ be the set of values for these variables at time t that could arise in the planner’s problem after history $\theta^{t-1}$, i.e., such that $m_t = m_t^*(\theta^{t-1}, \theta_t)$ for some $\theta_t \in \Theta$. For a history of observed choices $m^t$, denote by $\Theta^t(m^t)$ the set of all histories $\theta^t$ consistent with these choices, i.e., all $\theta^t$ such that $m_s = m_s^*(\theta^s)$ for all $s \le t$. Assumption 3 guarantees that, in the planner’s problem, the histories $(y^t, e^t)$ can be uniquely inverted to identify the history of abilities $\theta^t$.

Assumption 3 $\Theta^t(y^t, e^t)$ is either the empty set or a singleton for all histories $(y^t, e^t)$.
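On a finite grid of ability types, Assumption 3 can be checked by brute force. The procedure enumerates every ability history, records the observable path of income and human capital expenses it generates, and verifies at every date that no observable path is shared by two histories. The sketch below assumes a discrete type space. It also uses a hypothetical `policy` function standing in for the planner's optimal allocation. The model itself has a continuum of types, so this is only an illustration.

```python
from itertools import product
from typing import Callable, Dict, Hashable, List, Tuple

History = Tuple[Hashable, ...]
Observed = Tuple[Tuple[float, float], ...]
Policy = Callable[[History], Tuple[float, float]]


def invert_histories(types: Tuple[Hashable, ...], horizon: int,
                     policy: Policy) -> Dict[Observed, List[History]]:
    """Group ability histories theta^t (t = horizon) by the observed (y_s, e_s), s <= t.

    `policy` is a hypothetical stand-in for the planner's allocation: it maps
    a history theta^s = (theta_1, ..., theta_s) to (y_s, e_s).
    Observed paths that never occur are simply absent (empty preimage).
    """
    preimages: Dict[Observed, List[History]] = {}
    for theta in product(types, repeat=horizon):
        observed = tuple(policy(theta[: s + 1]) for s in range(horizon))
        preimages.setdefault(observed, []).append(theta)
    return preimages


def satisfies_assumption_3(types: Tuple[Hashable, ...], horizon: int,
                           policy: Policy) -> bool:
    """True if, at every date t <= horizon, each observed path has at most one preimage."""
    return all(
        all(len(hs) <= 1 for hs in invert_histories(types, t, policy).values())
        for t in range(1, horizon + 1)
    )
```

A policy in which income reveals current ability passes the check, while a pooling policy that gives every type the same (y, e) fails it.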

In the proposed ICL scheme, loans are combined with a standard income tax based on contemporaneous income, $T_Y(y_t)$, and a history-independent savings tax $T_K(b_t)$. In each period, the agent is offered a government loan $L_t(e_t)$ as a function of his human capital expenses, and is required to make a history-contingent repayment $D_t(L^{t-1}, y^{t-1}, e_t, y_t)$ as a function of the full history of past loans and earnings, as well as of current income and human capital expenses. The agent’s problem is then to select $\{c_t(\theta^t), y_t(\theta^t), b_t(\theta^t), e_t(\theta^t)\}_{\theta^t}$ to attain the supremum in:

38 Initial wealth and human capital can be heterogeneous as long as they are observable, and will enter the proposed repayment schedule as additional conditioning variables.

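$$V_1(b_0, \theta_0) = \sup \sum_{t=1}^{T} \int \left[ u\left(c_t(\theta^t)\right) - \phi\left( \frac{y_t(\theta^t)}{w_t\left(\theta_t, s_{t-1}(\theta^{t-1}) + e_t(\theta^t)\right)} \right) \right] P(\theta^t)\, d\theta^t \tag{27}$$

s.t.:

$$c_t(\theta^t) + \frac{1}{R} b_t(\theta^t) + M_t\left(e_t(\theta^t)\right) - b_{t-1}(\theta^{t-1}) - L_t\left(e_t(\theta^t)\right) \le y_t(\theta^t) - D_t\left(L^{t-1}(\theta^{t-1}), y^{t-1}(\theta^{t-1}), e_t(\theta^t), y_t(\theta^t)\right) - T_Y\left(y_t(\theta^t)\right) - T_K\left(b_t(\theta^t)\right)$$

$$s_t(\theta^t) = s_{t-1}(\theta^{t-1}) + e_t(\theta^t), \quad s_0 \text{ given}, \quad e_t(\theta^t) \ge 0, \quad b_0 = 0, \quad b_T \ge 0$$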
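The per-period constraints in (27) can be written as a feasibility check. In this sketch, the repayment Dt is passed in as a number already evaluated on the agent's history of loans and earnings. M, Lt, TY, and TK are arbitrary callables. The linear schedules used in the check are purely hypothetical, not the optimal ones.

```python
from typing import Callable

Schedule = Callable[[float], float]


def budget_slack(c: float, b: float, b_prev: float, e: float, y: float,
                 repayment: float, R: float,
                 M: Schedule, L: Schedule, T_Y: Schedule, T_K: Schedule) -> float:
    """Resources left over in one period of the ICL economy.

    Implements c + b/R + M(e) - b_prev - L(e) <= y - D - T_Y(y) - T_K(b);
    a non-negative result means the choice is affordable.
    """
    spending = c + b / R + M(e) - b_prev - L(e)
    net_income = y - repayment - T_Y(y) - T_K(b)
    return net_income - spending


def next_human_capital(s_prev: float, e: float) -> float:
    """Law of motion s_t = s_{t-1} + e_t, with e_t >= 0."""
    if e < 0:
        raise ValueError("human capital expenses must be non-negative")
    return s_prev + e
```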

Proposition 4 The optimum can be implemented through human capital loans $L_t(e_t)$, with repayments $D_t(L^{t-1}, y^{t-1}, e_t, y_t)$ contingent on the history of loans and earnings, current income, and human capital expenses, together with a history-independent savings tax $T_K(b_t)$ and an income tax on contemporaneous income $T_Y(y_t)$.

6.1.1 Features of the ICL system:

Figure 6 illustrates the implementation through ICLs by plotting the average loan received and the average consolidated repayment made, each as a fraction of contemporaneous income. The loan received naturally declines over life, as less human capital investment is needed. The exact level of the loan would change if agents faced credit constraints: more stringent credit constraints would naturally lead to larger loans. The repayment is very mildly hump-shaped, initially increasing with age, which illustrates both the insurance provided by the contingent repayment schedule and the increasing ability to pay over life. Figure 7 highlights the insurance role of the repayment schedule: repayments, as a fraction of income, are increasing in the contemporaneous income realization. The income-history-contingent nature of repayments is clearly seen in their large dispersion at a given contemporaneous income: repayments depend on the full past, not only on current income. Repayments are larger when ρθs > 1, highlighting the need to provide more insurance when human capital does not have a redistributive and insurance value. Repayments are also higher and steeper in income when volatility is higher, as the need for insurance then increases (not shown here, but clear from the κ terms in the formulas).

Figure 8 shows that the implied interest rate, defined as the ratio of the repayment to the outstanding balance, is also increasing in the contemporaneous income realization for both values of ρθs. Clearly, the interest rate is not fixed throughout life or across agents: it is history dependent. In the Online Appendix, Figures 9 and 10 show that the interest rate is declining in the current loan size and in the outstanding loan balance, respectively, so that there is a “quantity discount.” As with repayments, a larger variance of ability would make the interest rate steeper in income.
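The implied interest rate is a simple ratio, but it requires a definition of the outstanding balance. The sketch below assumes that the balance is cumulative loans minus cumulative past repayments, measured after the current loan and before the current repayment, with no separate interest accrual. The text does not specify this definition. Consolidated repayments need not sum to the amount borrowed, so the balance can turn non-positive; in that case the rate is reported as undefined (NaN).

```python
import math
from typing import List, Sequence


def implied_interest_rates(loans: Sequence[float],
                           repayments: Sequence[float]) -> List[float]:
    """Repayment divided by the outstanding balance, period by period.

    Assumed balance: cumulative loans net of cumulative past repayments,
    after this period's loan and before this period's repayment.
    """
    rates: List[float] = []
    balance = 0.0
    for loan, pay in zip(loans, repayments):
        balance += loan
        rates.append(pay / balance if balance > 0 else math.nan)
        balance -= pay
    return rates
```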

Figure 6: Income history contingent loans. Average loans and repayments, as a fraction of contemporaneous income, plotted against t (0 to 20) for ρ = 0.2 and ρ = 1.2.

Loans and Repayments as a fraction of contemporaneous income. Loans are high early in life, while repayments increase to provide insurance.

Figure 7: Insurance through contingent repayments. Panel (a): repayment against contemporaneous income yt for ρθs = 0.2; panel (b): repayment against contemporaneous income yt for ρθs = 1.2.

The repayment schedule provides insurance: repayments are higher when income is higher. The history-contingent nature is seen in the dispersion of the repayment at a given contemporaneous income.

Figure 8: Interest rate on income-contingent loans. Panel (a): interest rate against contemporaneous income yt for ρθs = 0.2; panel (b): the same for ρθs = 1.2.

The interest rate provides insurance: interest paid is higher when income is higher.

6.1.2 Comparison of the proposed implementation to existing ICLs

Certain types of ICLs for college education are used in several countries, including the U.S., New Zealand, Australia, the U.K., Chile, South Africa, Sweden, and Thailand. They have been growing in popularity as a tool to reduce public spending on education while guaranteeing equality of access and providing partial insurance in economic hardship (see Chapman (2006) for an overview).39 The loans sometimes depend on the level of education acquired and on the type of degree or field, and are indexed to the costs of education, which mirrors the proposed loan above. Several features of this optimal scheme have counterparts in existing policies. First, in some countries (e.g., Australia and New Zealand), repayments are collected directly through the tax authority and integrated with the tax system. The coercive tax power of the government is required to prevent agents from dropping out after the realization of their incomes, because more successful earners cross-subsidize less successful ones ex post. This is made clear by the prominent failure of the so-called “Yale Plan,” an attempt at risk-pooling within cohorts of students by Yale University in the 1970s. The plan suffered from a typical adverse selection problem: students with the best earnings prospects did not join or dropped out (Palacios, 2004). Second, in the optimal scheme, repayments are consolidated repayments for all past loans and need not, in any way, equal the total loan amount for a given agent. An implicit subsidy or tax is fully integrated into the system for insurance and redistribution purposes. Many existing systems also have a substantial insurance component (see, e.g., Dearden et al. (2008)), although their repayments are typically more tightly linked to the actual amount borrowed and do not redistribute much. Third, until recently, ICL schemes observed in practice focused almost exclusively on the downside: repayments could be deferred or forgiven in times of economic hardship, but they did not necessarily increase after a good stream of earnings. In Australia, however, there is a clear income contingency, with different repayment schemes at different income thresholds (Chapman, 2006). The U.K. has been moving closer to the Australian scheme: while repayments were previously fixed for any income above a threshold, since 2012 the loans have been repaid with an income-contingent interest rate. There are two differences between the optimal scheme proposed and its real-world counterparts. First, loans are optimally made available throughout life, not only to young adults in university, for instance for expenses related to job training or continuing education.40 Second, the optimal repayment schedules depend not only on current income and the outstanding loan balance, but on the full history of earnings and past loans.41 There is very little history dependence in real-world ICLs, with the possible exception of Sweden, which uses a two-year averaging of earnings to determine repayments. Note that other policies, for instance social security and some types of tax-free savings accounts, exhibit exactly the kind of history dependence required here. However, the numerical analysis below shows that, for the chosen calibration, the gain from history-dependent policies relative to simpler history-independent (but age-dependent) policies is not very large, implying that history-independent ICLs might be close to optimal.

39 In the U.S., an important rationale seems to have been the fear that fixed-repayment loans would discourage students from careers in the lower-paying public sector (Brody, 1994). The first schemes, introduced in 1994, were Income Contingent Repayments (ICRs) for public sector jobs. In 2007, the College Cost Reduction and Access Act (CCRAA) introduced Income Based Repayment (IBR) beyond public sector jobs. Australia has been a success story since 1989 with its nationwide scheme (the “Higher Education Contribution Scheme”), which automatically enrolls students in an ICL, with repayments collected directly through the tax system.

6.2 Implementation with iid shocks and wealth dependence

A natural question is when the history dependence of the optimal policies proposed above can be reduced. In the special case of independently and identically distributed (iid) shocks, wealth and the starting stock of human capital each period can serve as sufficient statistics for the full past. A similar implementation for physical capital, in the absence of human capital, is studied in Albanesi and Sleet (2006).

Proposition 5 If θ is iid, the optimum can be implemented in an economy with borrowing constraints, an initial assignment of wealth, and an income tax schedule Tt (bt−1, st−1, yt, et) that depends on the beginning-of-period wealth and human capital stocks, as well as on contemporaneous income and human capital investment.

40 This discrepancy could, however, be bridged if the cost function M is such that human capital investments only occur early in life.

41 In particular, the sum of past loans is not a sufficient statistic for the full sequence of loans $L^{t-1}$.

The intuition for this result is that, conditional on human capital st−1, there is a direct mapping between the social planner’s cost of providing the optimal allocation to an agent with promised utility vt−1 and the beginning-of-period wealth bt−1 of the agent in the decentralized equilibrium. This recursive implementation allows a relatively simple map between the derivatives of the proposed tax function Tt and the optimal wedges. While the labor wedge is equal to the marginal income tax, the human capital wedge is not in general equal to the marginal subsidy, that is, the expected reduction in tax from an incremental investment in human capital. The relation between the human capital wedge and the tax system is:

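$$\tau^d_{St} = -\frac{\partial T_t}{\partial e_t} + E_t\left[\beta\frac{u'_{t+1}}{u'_t}\right] E_t\left[\frac{\partial T_{t+1}}{\partial e_{t+1}} - \frac{\partial T_{t+1}}{\partial s_t}\right] + \operatorname{Cov}\left(\beta\frac{u'_{t+1}}{u'_t}, \frac{\partial T_{t+1}}{\partial e_{t+1}} - \frac{\partial T_{t+1}}{\partial s_t}\right)$$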

where recall that $\tau^d_{St} \equiv \tau_{St} - E_t(\tau_{St+1})\frac{1-\xi_{\tau S}}{R(1-\tau_K)}$. A positive wedge does not necessarily imply a positive marginal subsidy: it can be engineered either directly, through expected positive marginal subsidies, or more indirectly, through the risk properties of the optimal tax schedule. If the marginal tax reduction from human capital, $\frac{\partial T_{t+1}}{\partial e_{t+1}} - \frac{\partial T_{t+1}}{\partial s_t}$, is high when the marginal utility of consumption is high, human capital is a good hedge, and the covariance term is positive. It is then possible in theory for the overall wedge to be positive even if expected marginal subsidies are zero.42 Note that this tax system can instead be reformulated as a means-tested grant G such that:

$$G_t(y_t, e_t \mid b_{t-1}, s_{t-1}) = -T_t(b_{t-1}, s_{t-1}, y_t, e_t)$$
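The covariance channel in the decomposition of τ^d_St given above can be made concrete with a discrete state space. In the sketch below, the two states and their values are hypothetical. The expected marginal subsidy next period is zero. The wedge is nonetheless positive, because the subsidy is high in the state where the stochastic discount factor β u′t+1/u′t is high.

```python
from typing import Dict, Sequence


def expectation(xs: Sequence[float], probs: Sequence[float]) -> float:
    """Probability-weighted mean over a discrete set of states."""
    return sum(p * x for p, x in zip(probs, xs))


def hc_wedge_decomposition(dT_de_t: float, sdf: Sequence[float],
                           subsidy_next: Sequence[float],
                           probs: Sequence[float]) -> Dict[str, float]:
    """Split tau^d_St into direct, expected-subsidy, and covariance terms.

    sdf[i]          = beta * u'(c_{t+1}) / u'(c_t) in state i
    subsidy_next[i] = dT_{t+1}/de_{t+1} - dT_{t+1}/ds_t in state i
    """
    e_sdf = expectation(sdf, probs)
    e_sub = expectation(subsidy_next, probs)
    e_prod = expectation([m * x for m, x in zip(sdf, subsidy_next)], probs)
    cov = e_prod - e_sdf * e_sub  # Cov(m, x) = E[mx] - E[m]E[x]
    return {
        "direct": -dT_de_t,
        "expected": e_sdf * e_sub,
        "covariance": cov,
        "total": -dT_de_t + e_sdf * e_sub + cov,
    }


if __name__ == "__main__":
    # Two equally likely states; the subsidy pays off when marginal utility is high.
    print(hc_wedge_decomposition(0.0, [1.2, 0.8], [0.1, -0.1], [0.5, 0.5]))
```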

Means-tested grants for higher education are very common in many countries, while wealth-contingent income taxes are not. In the US, for instance, Pell grants take assets as well as contemporaneous income into account.43 With iid shocks, the previous implementation through ICLs can also be modified to use wealth, instead of the full history of earnings and loans, to determine the repayments. The wealth- and income-contingent repayment schedules $D_t(b_{t-1}, s_{t-1}, y_t, e_t)$ are such that:

$$T_Y(y_t) - L_t(e_t) + D_t(b_{t-1}, s_{t-1}, y_t, e_t) = T_t(b_{t-1}, s_{t-1}, y_t, e_t)$$
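Rearranging this identity gives the repayment that makes the ICL scheme collect the same net amount as the wealth-contingent tax, Dt = Tt − TY + Lt. The corresponding grant is simply Gt = −Tt. The following is a minimal sketch with all schedules passed in as callables. The linear schedules used in the check are hypothetical.

```python
from typing import Callable

TaxFn = Callable[[float, float, float, float], float]  # (b_prev, s_prev, y, e)


def replicating_repayment(T_t: TaxFn, T_Y: Callable[[float], float],
                          L_t: Callable[[float], float]) -> TaxFn:
    """Return D_t(b_prev, s_prev, y, e) = T_t(b_prev, s_prev, y, e) - T_Y(y) + L_t(e)."""
    def D_t(b_prev: float, s_prev: float, y: float, e: float) -> float:
        return T_t(b_prev, s_prev, y, e) - T_Y(y) + L_t(e)
    return D_t


def grant_from_tax(T_t: TaxFn) -> TaxFn:
    """Means-tested grant G_t = -T_t (arguments in the order b_prev, s_prev, y, e)."""
    def G_t(b_prev: float, s_prev: float, y: float, e: float) -> float:
        return -T_t(b_prev, s_prev, y, e)
    return G_t
```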

42 See Kocherlakota (2005) for the case of savings, where the income tax depends on the full history of incomes and wealth carries no additional information about the past. In general, however, expected marginal subsidies are not zero: human capital, like wealth in Albanesi and Sleet (2006), carries information about the past, which restricts the marginal subsidies at the optimum.

43 Beyond higher education, grants for job training exist for the unemployed (hence, somewhat means-tested), for youth at risk (“YouthBuild” in the US), or for difficult-to-employ seniors (the “Senior Aide Program”), most often in the form of a direct provision of training. Some programs, such as the “Adult and Dislocated Worker Program” or “Trade Adjustment Assistance,” do provide funds for training based on need.

6.3 A deferred, risk-adjusted human capital expense deductibility scheme

This implementation directly addresses the debate about whether education expenses should be tax deductible. Boskin (1975) has argued that a true economic depreciation of educational expenses, for which the net present value of the deduction equals the expense, would restore the neutrality of the tax system with respect to human capital. Even starker is the argument by Bovenberg and Jacobs (2005) that a purely contemporaneous deduction of education expenses from taxable income is sufficient.

Assume that ρθs = 1, so that the optimal net wedge is zero, and the focus is on the human capital subsidy that aims to neutralize the distortionary effects of income and savings taxes. If the proposed tax system Tt (bt−1, st−1, yt, et) is differentiable in all its arguments (at least over the range of equilibrium path values), then it must satisfy:44

$$-\frac{\partial T_t}{\partial e_t} + \beta E_t\left[\frac{u'_{t+1}}{u'_t}\left(\frac{\partial T_{t+1}}{\partial e_{t+1}} - \frac{\partial T_{t+1}}{\partial s_t}\right)\right] = \frac{\partial T_t}{\partial y_t}\left(M'_t - \frac{1}{R}E_t M'_{t+1}\right) - \frac{1}{R}\left(1-\xi_{M',t+1}\right) E_t\left[\beta R\frac{u'_{t+1}}{u'_t}\frac{\partial T_{t+1}}{\partial b_t}\right] E_t M'_{t+1} \tag{28}$$

On the other hand, the pure contemporaneous deductibility scheme proposed by Bovenberg and Jacobs (2005) would imply that $-\frac{\partial T_t}{\partial e_t} = M'_t(e_t)\frac{\partial T_t}{\partial y_t}$ at all dates t.45 In this dynamic model, this is only true for the last period T, in which agents face a static problem. In all earlier periods, the four additional effects discussed in Section 4.3 that appear in the definition of the net wedge tst (the risk adjustment, the dynamic cost, the savings wedge, and the one-period-ahead subsidy) also come into play. The policy counterpart to the concept of full risk-adjusted dynamic deductibility introduced in Section 4.3 is a risk-adjusted deferred deductibility scheme. For the sake of exposition, assume that $\beta = 1/R$ and $M'_t(e_t) = 1$. Start from period T, in which simple deductibility of expenses is sufficient, with $-\frac{\partial T_T}{\partial e_T} = \frac{\partial T_T}{\partial y_T}$, and work backwards to rewrite the total change in the tax burden from an incremental investment at t as:

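$$-\frac{\partial T_t}{\partial e_t} = (1-\beta)\sum_{j=1}^{T-t}\beta^{j-1}E_t\left[\frac{u'_{t+j-1}}{u'_t}\frac{\partial T_{t+j-1}}{\partial y_{t+j-1}}\right] + \beta^{T-t}E_t\left[\frac{u'_T}{u'_t}\frac{\partial T_T}{\partial y_T}\right] - \sum_{j=1}^{T-t}\beta^j E_t\left[\frac{u'_{t+j}}{u'_t}\left(\frac{\partial T_{t+j}}{\partial b_{t+j-1}} - \frac{\partial T_{t+j}}{\partial s_{t+j-1}}\right)\right] \tag{29}$$

The optimal subsidy is hence equivalent to a deferred deductibility scheme, in which a fraction (1 − β) of the human capital expense of time t is deducted from taxable income in each subsequent period.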
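The backward induction behind (29) can be checked in a deterministic special case. Beyond the text's β = 1/R and M′t = 1, the sketch assumes constant consumption, so that u′t+1/u′t = 1 when βR = 1. It also assumes the ξ term in (28) plays no role (ξ = 0). The marginal rates along the path are hypothetical inputs: tau_y[t] stands for ∂Tt/∂yt, tau_b[t] for ∂Tt/∂bt−1, and tau_s[t] for ∂Tt/∂st−1. The code compares the recursion implied by (28) with the closed form (29).

```python
from typing import List, Sequence


def deduction_value_recursive(beta: float, tau_y: Sequence[float],
                              tau_b: Sequence[float],
                              tau_s: Sequence[float]) -> List[float]:
    """-dT_t/de_t for every period via backward recursion on (28).

    Deterministic sketch: u'_{t+1}/u'_t = 1, M' = 1, xi = 0.
    Index 0 of tau_b and tau_s is unused (no b_{-1} or s_{-1}).
    """
    n = len(tau_y)
    A = [0.0] * n
    A[-1] = tau_y[-1]  # last period: plain contemporaneous deductibility
    for t in range(n - 2, -1, -1):
        A[t] = ((1 - beta) * tau_y[t] + beta * A[t + 1]
                - beta * (tau_b[t + 1] - tau_s[t + 1]))
    return A


def deduction_value_closed_form(beta: float, tau_y: Sequence[float],
                                tau_b: Sequence[float], tau_s: Sequence[float],
                                t: int) -> float:
    """Deterministic version of formula (29) for period t."""
    T = len(tau_y) - 1  # index of the last period
    spread = (1 - beta) * sum(beta ** (j - 1) * tau_y[t + j - 1] for j in range(1, T - t + 1))
    final = beta ** (T - t) * tau_y[T]
    no_arbitrage = sum(beta ** j * (tau_b[t + j] - tau_s[t + j]) for j in range(1, T - t + 1))
    return spread + final - no_arbitrage
```

The weights (1 − β)β^(j−1) and β^(T−t) sum to one. With constant marginal rates and no savings or stock terms, the deduction therefore equals the contemporaneous rate. With rising marginal rates it exceeds the current rate, which is the case of the poor student discussed below.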

44 Obtained by applying Formula (20) to this tax system. See appendix formula (38) for a rewriting.

45 The discrepancy between this recommendation and the one in Bovenberg and Jacobs (2005) does not come from the restrictive wage function assumed there, because the argument made in this subsection is for the case in which ρθs = 1. Adding back ρθs ≠ 1 would push the optimum even further away from pure contemporaneous deductibility, as an additional net encouragement or discouragement of human capital would be desirable. The discrepancy arises instead from dynamics and risk.

36 Intuitively, with changing income tax rates ∂Tt , a non-dynamic deductibility scheme would mean that ∂yt the expense of time t would be deducted at time t0s marginal tax rate, but the returns to this investment would accrue in the future when the agent faces potentially different marginal tax rates. If income is growing over time and marginal tax rates are increasing, as in a progressive tax system, there would be insufficient incentives to invest in human capital. A poor student would see little benefit from deducting his tuition fees from his low income, only to pay high marginal tax rates in the future. In addition, there is a “no arbitrage” term, (the last term in (29)), which takes into account the relative shift in the future tax schedule from more savings versus more human capital stock. Since savings and human capital are two ways to transfer resources intertemporally, there should at the optimum be no incentive to substitute from one to the other because of a tax advantage. If full deductibility is not the target, (i.e., ρθs 6= 1), implementing the optimal allocation requires adding back the optimal net subsidy each period, on top of this scheme.46 Tax incentives in the form of deduction schemes for higher education expenses are common, but are usually contemporaneous to the expense. In the US, the American Opportunity Credit and the Lifetime Learning Credit allow families to claim a deduction up to a certain level per student per year for college, as well as for books, supplies, and required equipment. The deferred deductibility scheme sets the right incentives in utility terms. In monetary terms, Figure9 illustrates that the scheme is progressive and provides insurance. The figure plots the fraction of the net present value of human capital expenses that the agent cannot deduct against the net present value of lifetime income. 
Lower income agents deduct more than they spend on human capital, while higher income agents deduct less and hence implicitly cross-subsidize lower income agents.
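The timing argument above can be illustrated with a small numerical sketch. It assumes a linear cost $M$ per unit of human capital, a deterministic path of marginal tax rates, no savings tax, and no risk adjustment, all of which are simplifications relative to the scheme in the text; the function names are hypothetical. Under these assumptions, deferred deductibility deducts the per-period user cost $M(1-1/R)$ in every period except the last, where $M$ is deducted. The present value of the deductions therefore equals $M$, but each piece is taken at the marginal rate in force when it is deducted.

```python
# Sketch under stated assumptions: linear cost M, deterministic marginal tax
# rates, no savings tax, no risk adjustment. Names are hypothetical.


def deferred_deductions(M: float, R: float, n_periods: int) -> list[float]:
    """Deductions for one unit of human capital bought at cost M in period 0.

    Periods 0..n_periods-2 deduct the per-period user cost M * (1 - 1/R);
    the final period deducts M, so the schedule's present value is M.
    """
    if n_periods < 1:
        raise ValueError("need at least one period")
    return [M * (1 - 1 / R)] * (n_periods - 1) + [M]


def present_value(flows: list[float], R: float) -> float:
    """Discount flows received in periods 0, 1, 2, ... back to period 0."""
    return sum(f / R**j for j, f in enumerate(flows))


def tax_value(deductions: list[float], marginal_rates: list[float], R: float) -> float:
    """Present value of tax savings when each deduction is taken at that period's rate."""
    return present_value([d * tau for d, tau in zip(deductions, marginal_rates)], R)


if __name__ == "__main__":
    M, R = 1.0, 1.04
    rising = [0.10, 0.20, 0.30, 0.35, 0.40]  # illustrative progressive profile
    deferred = tax_value(deferred_deductions(M, R, len(rising)), rising, R)
    contemporaneous = rising[0] * M  # full deduction in period 0 only
    print(f"deferred: {deferred:.4f}, contemporaneous: {contemporaneous:.4f}")
```

With flat marginal rates the two schemes deliver the same tax value $\tau M$. With rising marginal rates the deferred scheme delivers more, which is the sense in which contemporaneous deduction under-rewards investment by an agent on a rising income profile.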

6.4 Welfare gains and simple age-dependent policies

What are the welfare gains from the optimal mechanism, and how do they compare to the welfare gains from simpler, linear, but age-dependent policies? The first line of Table 2 shows the welfare gains from the second best relative to the laissez-faire economy, with no taxes or subsidies, in which agents can borrow and save without constraint at the gross interest rate R. Four cases are distinguished, according to the value of the complementarity coefficient between human capital and ability, ρθs, and the volatility of the productivity shock, $\sigma^2_\psi$.47 Welfare gains are expressed as the percentage increase in consumption which, if received every year after all histories, would yield the same gain in lifetime utility. Given the clear age trends in the above figures and in the optimal formulas, it is natural to compare

46 The linear cost is for exposition only, since we want interior solutions in general. See appendix formula (37) for the general case. 47 The high volatility (0.0161) is from Storesletten, Telmer, and Yaron (2004).

Figure 9: Progressivity of the deferred deductibility scheme

[Figure: percentage of the NPV of lifetime human capital cost not deducted (vertical axis) plotted against the NPV of lifetime income (horizontal axis).]

Lower income people can deduct more than they spend on human capital, while higher income people deduct less.

Table 2: Welfare gains from simpler policies

                                                      ρθs = 0.2            ρθs = 1.2
Volatility                                          Medium    High       Medium    High
Welfare gain from second best                        0.85%    1.60%       0.98%    1.76%
Welfare gain from linear age-dependent policies      0.79%    1.53%       0.94%    1.74%
  as % of second best                                93%      95.6%       95.5%    98.5%

Medium volatility is 0.0095, high volatility is 0.0161. Line 1 expresses the gain from the second best, relative to the laissez-faire economy, in terms of the equivalent increase in consumption after all histories. Welfare gains are higher when human capital has negative redistributive and insurance values (ρθs > 1). Line 2 shows the gain from linear age-dependent policies relative to the laissez-faire, while line 3 expresses this gain as a fraction of the gain from the second best. Age-dependent linear policies achieve a very large fraction of the welfare gain from the second best.

the full optimum to simple age-dependent policies. The policy under consideration sets the linear human capital subsidy, the linear income tax rate, and the linear capital tax rate at each age equal to their cross-sectional averages at that age. It is numerically challenging to optimize precisely over age-dependent tax rates, given the number of periods and the presence of three instruments; hence, this procedure delivers a lower bound on the welfare gains from age-dependent linear policies. This lower bound turns out to be very tight. Indeed, the third line of Table 2 shows that the welfare gains as a fraction of the second-best gains range from 93% in the medium-volatility, low-ρθs case to a surprising 98.5% in the high-volatility, high-ρθs scenario. This suggests that, for these particular calibrations, the history-dependent policies can be informative about simpler, history-independent policies, and that the bulk of the gain comes from the age trend of optimal policies.48 These findings are reminiscent of Mirrlees' (1971) conclusion that static optimal income tax schedules appear close to linear. They also closely echo recent findings from the dynamic taxation literature (Farhi and Werning, 2013), suggesting that the addition of human capital per se does not change this result.49
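The consumption-equivalent metric behind Table 2 has a closed form when period utility is logarithmic in consumption and additively separable from the disutility of labor. Log utility is the case referred to in footnote 48; separability is an assumption of this sketch. Scaling consumption by $(1+\Delta)$ in every period and after every history raises lifetime utility by $\log(1+\Delta)\sum_t\beta^t$, so $\Delta = \exp\left((U^{new}-U^{old})/\sum_t\beta^t\right)-1$. The helper names are hypothetical, and the example uses a single deterministic path rather than an expectation over histories.

```python
import math

# Sketch assuming period utility log(c) - v(l), additively separable.
# Helper names are hypothetical.


def lifetime_utility(consumption, labor_disutility, beta):
    """Discounted sum of log(c_t) - v_t along one path, t = 0, 1, ..."""
    return sum(
        beta**t * (math.log(c) - v)
        for t, (c, v) in enumerate(zip(consumption, labor_disutility))
    )


def consumption_equivalent(U_new, U_old, beta, T):
    """Uniform proportional consumption increase Delta satisfying
    log(1 + Delta) * sum_t beta**t = U_new - U_old over T periods."""
    discount_sum = sum(beta**t for t in range(T))
    return math.exp((U_new - U_old) / discount_sum) - 1.0
```

Because the labor term cancels in the difference $U^{new}-U^{old}$ only through the consumption term, the formula relies on separability; with non-separable preferences the equivalent variation would have to be solved numerically.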

7 Conclusion

This paper studies optimal dynamic taxation and human capital policies over the life cycle in a dynamic Mirrlees model with heterogeneous, stochastic, and persistent ability. The government aims to provide redistribution and insurance against adverse draws from the ability distribution. However, the government faces asymmetric information about agents' ability (both its initial level and its stochastic evolution over life) and about labor supply. The constrained efficient allocations are obtained using a dynamic first-order approach and are characterized by wedges, or implicit taxes and subsidies. Formulas are derived for the optimal labor and human capital wedges, as well as for their evolution over time, together with a simple relation between the two wedges. A crucial consideration for the design of optimal policies is whether human capital has overall positive redistributive and insurance effects. If human capital stimulates labor supply, and hence generates additional resources more than it amplifies existing pre-tax inequality, it reduces after-tax income inequality on

48 A word of caution is needed. Given the order of magnitude of 10e-5, it is actually very challenging to estimate these welfare comparisons between the second best and age-dependent policies precisely, especially over longer horizons. The numbers should only be taken as evidence of small welfare gains, not as precise welfare calculations. The author is currently studying which other factors are most important for the welfare gains. For instance, stepping away from log utility and from a log-normal process for θ can increase welfare gains. 49 They can also justify the use of simpler instruments in papers such as Krueger and Ludwig (2013). Note that allowing for nonlinear, age-dependent instruments, such as a repayment scheme that conditions only on current income and the current loan, would further increase the welfare gains.

balance. This occurs when the elasticity of the wage with respect to ability is decreasing in human capital or, put differently, when high ability agents do not disproportionately benefit from human capital. In this case, the optimal net subsidy on human capital expenses is positive and increasing over time. The optimal allocations can be implemented with income contingent loans, the repayment schedules of which depend on the full history of human capital investments and earnings. If shocks to ability are independently and identically distributed, a Deferred Deductibility scheme, in which part of current human capital expenses can be deducted from future years' incomes, can also implement the optimum. The simulations reveal that the optimal net human capital wedges are small, which implies that neutrality of the tax system relative to human capital expenses is close to optimal for the proposed calibrations. In addition, simple age-dependent linear taxes and subsidies can achieve almost the entire welfare gain from the full second best relative to the laissez-faire outcome. Further numerical work could shed light on whether this result remains true with different preferences, in particular with higher risk aversion. This analysis can also shed some light on three further questions. First, should the tax system preserve neutrality with respect to the choice between bequests and human capital spending, two important ways in which parents can transfer resources to their children? The life cycle can be reinterpreted as a dynastic household, in which parents finance their children's human capital, with persistence in stochastic ability, and partial or full depreciation of human capital across generations. A reinterpretation of the optimal formulas derived here shows that the optimal subsidy on parents' investments in children's human capital increases with their children's expected labor taxes, and decreases with the bequest tax.
If the elasticity of the children’s wage with respect to ability is decreasing in human capital, the positive redistributive and insurance effects of human capital transfers on the next generation push the optimal subsidy higher.50 Second, should productivity-enhancing investments by entrepreneurs or the self-employed be made tax-deductible? Workers in the model can instead be viewed as entrepreneurs or self-employed, who can invest in their businesses’ productivity through expenses for research, knowledge acquisition, or training of the workforce, generating risky and persistent profits.51 If innovation and productivity expenses disproportionately increase risk, they should be made less than fully deductible.52 Finally, the analysis could inform the study of optimal

50 Higher than the subsidy needed to simply guarantee tax neutrality with respect to the choice between bequests and human capital transfers. 51 These productivity investments, embodied in people, are distinct from investment in physical machines or financial assets. 52 In the sense that the elasticity of business earnings to risk is increasing in innovative activity. In practice, many expenses for the self-employed can be deducted, but there is no special category for innovation or productivity-enhancing expenses.

policies towards people's investments in health (another type of human capital), which also involve both time and monetary costs, as well as heterogeneity and uncertainty over life. This theoretical research points to two important empirical explorations that could shed light on the mechanisms behind, and the magnitudes of, the optimal policies. First, how does the complementarity between ability and human capital change over life? In contrast to schooling or higher education, there is little evidence on this for human capital investments later in life, such as job training. This creates a fruitful link between optimal taxation and the long-standing empirical labor literature on this issue. Second, it has not yet been fully documented how strongly people react to current and future expected taxes when making their human capital investment decisions. While challenging, estimating the long-term effects of taxation on human capital accumulation appears very important.

References

Albanesi, S., and C. Sleet (2006): “Dynamic optimal taxation with private information,” The Review of Economic Studies, 73(1), 1–30.

Anderberg, D. (2009): “Optimal Policy and the Risk Properties of Human Capital Reconsidered,” Journal of Public Economics, 93(9-10), 1017–1026.

Ashenfelter, O., and C. Rouse (1998): “Income, Schooling, And Ability: Evidence From A New Sample Of Identical Twins,” The Quarterly Journal of Economics, 113(1), 253–284.

Atkinson, A. B., and J. E. Stiglitz (1976): “The design of tax structure: direct versus indirect taxation,” Journal of Public Economics, 6(1), 55–75.

Autor, D., L. Katz, et al. (1998): “Computing Inequality: Have computers changed the labor market?,” Quarterly Journal of Economics, 113(4), 1169–1213.

Battaglini, M., and S. Coate (2008): “Pareto efficient income taxation with stochastic abilities,” Journal of Public Economics, 92(3), 844–868.

Becker, G. S. (1964): Human capital: A theoretical and empirical analysis, with special reference to education. University of Chicago Press.

Ben-Porath, Y. (1967): “The production of human capital and the life cycle of earnings,” The Journal of Political Economy, 75(4), 352–365.

Bohacek, R., and M. Kapicka (2008): “Optimal human capital policies,” Journal of Monetary Economics, 55(1), 1–16.

Boskin, M. J. (1975): “Notes on the tax treatment of human capital.”

Bovenberg, L. A., and B. Jacobs (2005): “Redistribution and education subsidies are Siamese twins,” Journal of Public Economics, 89(11).

Brody, E. (1994): “Paying back your country through income-contingent student loans,” San Diego L. Rev., 31, 449.

Card, D. (1995): “Earnings, Schooling, and Ability Revisited,” Research in Labor Economics, 35, 111–136.

Chang Wei, C. (2010): “What is the Price of College?,” Discussion paper, National Center for Education Statistics Department of Education (NCES 2011-175).

Chapman, B. (2006): “Income Contingent Loans for Higher Education: International Reforms,” in Handbook of the Economics of Education, Vol. 2, ed. by E. A. Hanushek, and F. Welch, pp. 1437–1503. Amsterdam: Elsevier.

Chetty, R. (2012): “Bounds on elasticities with optimization frictions: A synthesis of micro and macro evidence on labor supply,” Econometrica, 80(3), 969–1018.

Cunha, F., and J. J. Heckman (2007): “Identifying and estimating the distributions of ex post and ex ante returns to schooling: A survey of recent developments,” Labour Economics, 43(4), 738–782.

Cunha, F., and J. J. Heckman (2008): “Formulating, identifying and estimating the technology of cognitive and noncognitive skill formation,” Journal of Human Resources, 43(4), 738–782.

Cunha, F., J. J. Heckman, L. Lochner, and D. V. Masterov (2006): “Interpreting the evidence on life cycle skill formation,” Handbook of the Economics of Education, 1, 697–812.

Da Costa, C. E., and L. J. Maestri (2007): “The risk properties of human capital and the design of government policies,” European Economic Review, 51(3), 695–713.

Dearden, L., A. Goodman, E. Fitzsimons, and G. Kaplan (2008): “Higher Education Funding Reforms in England: the Distributional Effects and the Shifting Balance of Costs,” Economic Journal, 118, 100–125.

Doepke, M., and R. M. Townsend (2006): “Dynamic mechanism design with hidden income and hidden actions,” Journal of Economic Theory, 126(1), 235–285.

Farhi, E., and I. Werning (2013): “Insurance and taxation over the life cycle,” The Review of Economic Studies, 80(2), 596–635.

Fernandes, A., and C. Phelan (2000): “A recursive formulation for repeated agency with history dependence,” Journal of Economic Theory, 19(2), 223–247.

Findeisen, S., and D. Sachs (2012): “Education and optimal dynamic taxation: The role of income-contingent student loans,” Discussion paper, Working Paper Series, Department of Economics, University of Zurich.

Goldin, C. D., and L. F. Katz (2008): The race between education and technology. Harvard University Press.

Golosov, M., M. Troshkin, and A. Tsyvinski (2013): “Redistribution and Social Insurance,” NBER Working Paper, (17642).

Golosov, M., A. Tsyvinski, and I. Werning (2006): “New dynamic public finance: a user’s guide,” in NBER Macroeconomics Annual 2006, Volume 21, pp. 317–388. MIT Press.

Grochulski, B., and T. Piskorski (2010): “Risky human capital and deferred capital income taxation,” Journal of Economic Theory, 145(3), 908–943.

Heathcote, J., F. Perri, and G. L. Violante (2010): “Unequal we stand: An empirical analysis of economic inequality in the United States, 1967–2006,” Review of Economic Dynamics, 13(1), 15–51.

Heathcote, J., K. Storesletten, and G. L. Violante (2005): “Two views of inequality over the life cycle,” Journal of the European Economic Association, 3(2-3), 765–775.

Heckman, J. J. (1976): “A life-cycle model of earnings, learning, and consumption,” Journal of Political Economy, 84(4), S9–S44.

Hicks, J. (1970): “Elasticity of substitution again: substitutes and complements,” Oxford Economic Papers, 25, 289–296.

Huggett, M., and G. Kaplan (2011): “Human capital values and returns: Bounds implied by earnings and asset returns data,” Journal of Economic Theory, 146(3), 897–919.

Huggett, M., G. Ventura, and A. Yaron (2011): “Sources of Lifetime Inequality,” American Economic Review, 101(7), 2923–54.

Jacobs, B., and A. Bovenberg (2011): “Optimal taxation of human capital and the earnings function,” Journal of Public Economic Theory, 13(6), 957–971.

Kapička, M. (2013): “Efficient allocations in dynamic private information economies with persistent shocks: A first-order approach,” The Review of Economic Studies, 80(3), 1027–1054.

Kapička, M. (2013): “Optimal mirrleesean taxation with unobservable human capital formation,” Discussion paper, Working paper, UC Santa Barbara.

Kapička, M., and J. Neira (2014): “Optimal taxation in a life-cycle economy with endogenous human capital formation,” Discussion paper, Working paper, UC Santa Barbara.

Karaivanov, A., and R. M. Townsend (2014): “Dynamic financial constraints: Distinguishing mechanism design from exogenously incomplete regimes,” Econometrica, 82(3), 887–959.

Kocherlakota, N. R. (2005): “Zero expected wealth taxes: A Mirrlees approach to dynamic optimal taxation,” Econometrica, 73(5), 1587–1621.

Kremer, M. (2002): “Should taxes be independent of age?,” Unpublished paper, Harvard University.

Krueger, D., and A. Ludwig (2013): “Optimal Progressive Labor Income Taxation and Education Subsidies When Education Decisions and Intergenerational Transfers Are Endogenous,” American Economic Review, Papers and Proceedings, 103(3), 496–501.

Laroque, G. R. (2005): “Indirect taxation is superfluous under separability and taste homogeneity: A simple proof,” Economics Letters, 87(1), 141–144.

Lochner, L. J., and A. Monge-Naranjo (2011): “The Nature of Credit Constraints and Human Capital,” The American Economic Review, 101(6), 2487–2529.

Maldonado, D. (2008): “Education policies and optimal taxation,” International Tax and Public Finance, 15(2), 131–143.

Milgrom, P., and I. Segal (2002): “Envelope theorems for arbitrary choice sets,” Econometrica, 70(2), 583–601.

Mirrlees, J. (1971): “An exploration in the theory of optimum income taxation,” The review of economic studies, 38, 175–208.

Mirrlees, J., S. Adam, T. Besley, R. Blundell, S. Bond, R. Chote, M. Gammie, P. Johnson, G. Myles, and J. Poterba (2011): Tax by design: The Mirrlees review. Oxford University Press.

Murphy, K. M., and F. Welch (1992): “The structure of wages,” The Quarterly Journal of Economics, pp. 285–326.

OECD (2004): “Improving Skills for More and Better Jobs: Does Training Make a Difference?,” Discussion paper, OECD Employment Outlook.

OECD (2013): “Education at a Glance,” Discussion paper, OECD Employment Outlook.

Palacios, M. (2004): Investing in Human Capital: A Capital Markets Approach to Higher Education Funding. Cambridge: Cambridge University Press.

Pavan, A., I. Segal, and J. Toikka (2014): “Dynamic mechanism design: A myersonian approach,” Econometrica, 82(2), 601–653.

Rogerson, W. P. (1985): “Repeated Moral Hazard,” Econometrica, 53, 69–76.

Saez, E. (2001): “Using elasticities to derive optimal income tax rates,” The review of economic studies, 68(1), 205–229.

Samuelson, P. A. (1974): “Complementarity: An essay on the 40th anniversary of the Hicks-Allen revolution in demand theory,” Journal of Economic Literature, 12(4), 1255–1289.

Scheuer, F. (2014): “Entrepreneurial Taxation with Endogenous Entry,” American Economic Journal: Economic Policy, 6(2), 126–163.

Schultz, T. W. (1961): “Investment in human capital,” The American Economic Review, 6, 1–17.

Storesletten, K., C. I. Telmer, and A. Yaron (2004): “Consumption and risk sharing over the life cycle,” Journal of Monetary Economics, 51(3), 609–633.

Weinzierl, M. (2011): “The surprising power of age-dependent taxes,” The Review of Economic Studies, 78(4), 1490–1518.

Werning, I. (2011): “Nonlinear capital taxation,” Unpublished.

Appendix

A Additional Results

Explaining full dynamic risk-adjusted deductibility

Imagine there is a linear subsidy $\tau_S$, a linear income tax $\tau_L$, and a linear savings tax $\tau_K$. The full dynamic risk-adjusted deductibility scheme conceptually allows the agent to deduct
$$M_t(s_t-s_{t-1}) - \frac{1-\xi_{M'}}{R(1-\tau_K)}E_t\left(M_{t+1}(s_{t+1}-s_t)\right)$$
from his taxable income, makes him pay the capital tax on
$$\frac{1}{R(1-\tau_K)}(1-\tau_{Lt})(1-\xi'_{M'})E_t\left(M_{t+1}(s_{t+1}-s_t)\right),$$
and compensates him for next period's subsidy through a transfer $\frac{1-\xi_{\tau_S}}{R(1-\tau_K)}E_t\left(\tau_{S,t+1}(s_{t+1}-s_t)\right)$. The insurance factors $\xi_{\tau_S}$, $\xi_{M'}$, and $\xi'_{M'}$ are evaluated at the efficient choice. Let $b_{t-1}$ denote savings brought into period $t$. The agent in period $t$ with shock $\theta_t$, human capital $s_{t-1}$, and wealth $b_{t-1}$ hence solves (writing out only periods $t$ and $t+1$):

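$$\begin{aligned}\max_{s_t,l_t,b_t}\Big\{&u_t\Big(Rb_{t-1}(1-\tau_{K,t-1}) + w_t(\theta_t,s_t)l_t(1-\tau_{Lt}) - M_t(s_t-s_{t-1})\\
&+\tau_{Lt}\Big(M_t(s_t-s_{t-1}) - \frac{1-\xi_{M'}}{R(1-\tau_K)}E_t\big(M_{t+1}(s_{t+1}-s_t)\big)\Big)\\
&-\tau_{Kt}\frac{1}{R(1-\tau_K)}(1-\tau_{Lt})(1-\xi'_{M'})E_t\big(M_{t+1}(s_{t+1}-s_t)\big)\\
&+\frac{1-\xi_{\tau_S}}{R(1-\tau_K)}E_t\big(\tau_{S,t+1}(s_{t+1}-s_t)\big)\Big)\\
&+\beta E_t\Big(u_{t+1}\big(Rb_t(1-\tau_{Kt}) + w_{t+1}(\theta_{t+1},s_{t+1})l_{t+1}(1-\tau_{Lt}) - M_{t+1}(s_{t+1}-s_t) + \tau_{S,t+1}(s_{t+1}-s_t)\big)\Big)\Big\}\end{aligned} \qquad (30)$$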

Taking the FOC of the agent with respect to $s_t$ and using the forms of the insurance factors at the efficient choices shows that this will induce, conditional on labor and on future human capital $s_{t+1}$, an efficient choice of human capital:

$$l_t\,w_{s,t}(\theta_t,s_t) = M'_t(s_t-s_{t-1}) - \frac{1}{R}E_t\left(M'_{t+1}(s_{t+1}-s_t)\right) \qquad (31)$$

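Proposition 6 i) At the optimum, the labor wedge is equal to:
$$\frac{\tau^*_{Lt}(\theta^t)}{1-\tau^*_{Lt}(\theta^t)} = \frac{\mu(\theta^t)\,u'_t(c(\theta^t))\,\varepsilon_{w\theta,t}}{f(\theta_t|\theta_{t-1})\,\theta_t}\,\frac{1+\varepsilon^u_t}{\varepsilon^c_t} \qquad (32)$$
with $\mu(\theta^t) = \eta(\theta^t) + \kappa(\theta^t)$ as in (21), where $\eta(\theta^t)$ can be rewritten recursively as a function of the past labor wedge, $\tau_{L,t-1}$:
$$\eta(\theta^t) = \frac{\tau^*_{L,t-1}(\theta^{t-1})}{1-\tau^*_{L,t-1}(\theta^{t-1})}\,\frac{R\beta}{u'_{t-1}(c(\theta^{t-1}))}\,\frac{\varepsilon^c_{t-1}}{1+\varepsilon^u_{t-1}}\,\frac{\theta_{t-1}}{\varepsilon_{w\theta,t-1}}\int_{\theta_t}^{\bar\theta}\frac{\partial f(\theta_s|\theta_{t-1})}{\partial\theta_{t-1}}\,d\theta_s$$
ii) $\tau^*_{Lt}(\theta^{t-1},\underline{\theta}) = \tau^*_{Lt}(\theta^{t-1},\bar\theta) = 0,\ \forall t$.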

Proposition 7 At the optimum, the inverse Euler equation holds:
$$\frac{R\beta}{u'_t(c(\theta^t))} = \int_{\underline{\theta}}^{\bar\theta}\frac{1}{u'_{t+1}(c(\theta^{t+1}))}\,f(\theta_{t+1}|\theta_t)\,d\theta_{t+1} \qquad (33)$$

Note that both Anderberg (2009) and Da Costa and Maestri (2007) find versions of the zero distortion at the top and the inverse Euler equation.
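A minimal sketch of what the inverse Euler equation (33) implies, assuming log utility so that $1/u'(c)=c$ and (33) reads $R\beta\,c_t = E_t[c_{t+1}]$. Because $E[1/c]\ge 1/E[c]$ by Jensen's inequality, the standard Euler equation then fails in the direction $u'(c_t) \le \beta R\,E_t[u'(c_{t+1})]$, strictly so whenever $c_{t+1}$ is risky. In other words, at the optimal allocation the agent would like to save more, which is the usual positive implicit savings wedge. The two-state distribution and the parameter values are illustrative assumptions.

```python
# Sketch assuming log utility, u'(c) = 1/c, and a discrete distribution for
# next-period consumption. Names and numbers are illustrative.


def iee_consumption_today(c_next, probs, R, beta):
    """Solve (33) for c_t: R*beta / u'(c_t) = E[1 / u'(c_{t+1})], i.e. R*beta*c_t = E[c_{t+1}]."""
    return sum(p * c for p, c in zip(probs, c_next)) / (R * beta)


def euler_gap(c_today, c_next, probs, R, beta):
    """u'(c_t) - beta*R*E[u'(c_{t+1})]; a negative value means the agent
    would gain from saving more at the planner's allocation."""
    return 1.0 / c_today - beta * R * sum(p / c for p, c in zip(probs, c_next))
```

With a degenerate next-period distribution, the inverse and standard Euler equations coincide and the gap is zero.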

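Corollary 2 The labor wedge evolves over time according to:
$$E_{t-1}\left[\frac{\tau_{Lt}}{1-\tau_{Lt}}\,\frac{\varepsilon_{w\theta,t-1}}{\varepsilon_{w\theta,t}}\,\frac{\varepsilon^c_t}{1+\varepsilon^u_t}\,\frac{1+\varepsilon^u_{t-1}}{\varepsilon^c_{t-1}}\,\frac{1}{R\beta}\frac{u'_{t-1}}{u'_t}\right] = \varepsilon_{w\theta,t-1}\,\frac{1+\varepsilon^u_{t-1}}{\varepsilon^c_{t-1}}\,\mathrm{Cov}\left(\frac{1}{R\beta}\frac{u'_{t-1}}{u'_t},\,\log(\theta_t)\right) + p\,\frac{\tau_{L,t-1}}{1-\tau_{L,t-1}} \qquad (34)$$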

Construction of the ICL schedule. The proof of Proposition 4 is in the Online Appendix. First, the loan is set to exactly cover the cost of human capital:

$$L_t(e_t) = M_t(e_t) \quad \forall t,\ \forall e_t \qquad (35)$$

The savings tax $T_K(b_t)$ is constructed to guarantee zero private wealth holdings.53 The repayment schedule $D$ and income tax $T_Y$ are such that, along the equilibrium path, the optimal allocations from the social planner's problem are affordable for each agent after all histories, given zero asset holdings:

$$D_t\left(L^{t-1}, y^{t-1}, e^*_t(\theta^{t-1},\theta), y^*_t(\theta^{t-1},\theta)\right) + T_Y\left(y^*_t(\theta^{t-1},\theta)\right) = y^*_t(\theta^{t-1},\theta) - c^*_t(\theta^{t-1},\theta)$$

for all $L^{t-1}, y^{t-1}$ such that $\theta^{t-1}\in\Theta^{t-1}\left(M_1^{-1}(L_1),\dots,M_{t-1}^{-1}(L_{t-1}),y^{t-1}\right)\neq\emptyset$, and all $\theta\in\Theta$, where the history of education $e^{t-1}$ is inverted from $L^{t-1}$ using (35). The repayment schedule on off-equilibrium allocations (those allocations which are not optimally assigned to any type in the social planner's program) is set to be sufficiently unattractive to ensure that agents do not select them. Intuitively, then, conditional on entering a period with no savings and with a given history of loans and output, agents only face the choice of allocations available in the planner's problem after ability histories which, up to this period, are consistent with the observed choices. By the temporal incentive compatibility of the constrained efficient allocation, they will choose the allocation designed for them. This set of instruments defines a decentralized allocation rule, which, to an agent with past history $(L^{t-1}, y^{t-1})$, assigns an allocation:

$$\left\{\hat c_t(L^{t-1},y^{t-1},\theta_t),\ \hat y_t(L^{t-1},y^{t-1},\theta_t),\ \hat b_t(L^{t-1},y^{t-1},\theta_t),\ \hat e_t(L^{t-1},y^{t-1},\theta_t)\right\}_{t,\theta_t}$$

53 The construction builds on Werning (2011), who also shows that the savings tax can be redefined to implement nonzero savings, at the expense of modifying the repayment schedule. The repayment scheme could also allow for private savings and directly condition on their history ($b^{t-1}$). See the next implementation proposed, with nonzero private wealth holdings.

The equilibrium allocation as a function of ability histories, $\left\{\hat c_t(\theta^t), \hat y_t(\theta^t), \hat b_t(\theta^t), \hat e_t(\theta^t)\right\}_t$, can be deduced from the decentralization rule using the recursive relation:

$$\hat m_t(\theta^t) = \hat m_t\left(L^{t-1}(\theta^{t-1}),\, y^{t-1}(\theta^{t-1}),\, \theta_t\right) \quad \text{for } m\in\{c,y,b,e\}$$

where $\theta^{t-1}\in\Theta^{t-1}\left(M_1^{-1}(L_1),\dots,M_{t-1}^{-1}(L_{t-1}),y^{t-1}\right)$ is unique by assumption 3. The decentralization rule is said to implement the optimum from the planner's problem for a given set of promised utilities $(U(\theta))_\Theta$ if, for all $t$ and $\theta^t$, the decentralized allocations under this rule coincide with the social planner's optimal allocations, i.e., $\hat m_t(\theta^t) = m^*_t(\theta^t)$ for $m\in\{c,y,b,e\}$.54
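The construction of the repayment schedule can be sketched as a lookup table. In this hypothetical representation, each planner-assigned allocation records the observable history (loans $L^{t-1}$, incomes $y^{t-1}$), current education $e_t$, output $y_t$, and consumption $c_t$. The repayment is set so that $D_t + T_Y(y_t) = y_t - c_t$ on these allocations. Any observable combination not assigned to some type receives a large repayment, a stand-in for the "sufficiently unattractive" off-equilibrium schedule. Zero private savings, the dictionary format, and the penalty value are assumptions of the sketch. It does not check incentive compatibility, which the text obtains from the planner's allocation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical encoding: observables before period t are (loan history,
# income history); the period-t observables are education e_t and output y_t.
History = Tuple[float, ...]
Key = Tuple[History, History, float, float]


def build_repayment_schedule(
    planner_allocations: Dict[History, dict],
    income_tax: Callable[[float], float],
    penalty: float = 1e6,
) -> Callable[[History, History, float, float], float]:
    """Return D_t(L^{t-1}, y^{t-1}, e_t, y_t) with D_t + T_Y(y_t) = y_t - c_t
    on planner-assigned allocations and D_t = penalty off the equilibrium path.

    planner_allocations maps an ability history to a dict with keys
    'L_hist', 'y_hist', 'e', 'y', 'c' (an assumed format for this sketch).
    """
    table: Dict[Key, float] = {}
    for alloc in planner_allocations.values():
        key = (alloc["L_hist"], alloc["y_hist"], alloc["e"], alloc["y"])
        repayment = alloc["y"] - alloc["c"] - income_tax(alloc["y"])
        # Observables must pin down a unique allocation (cf. assumption 3).
        if key in table and abs(table[key] - repayment) > 1e-12:
            raise ValueError(f"observables {key} map to conflicting allocations")
        table[key] = repayment

    def repayment_schedule(L_hist: History, y_hist: History, e: float, y: float) -> float:
        # Unassigned (off-equilibrium) combinations get a prohibitive repayment.
        return table.get((L_hist, y_hist, e, y), penalty)

    return repayment_schedule
```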

Implementation with iid shocks:

The recursive problem with iid shocks is nested in the formulation in section 2 if the states $\theta_{t-1}$ and $\Delta_{t-1}$, which account for persistence, are omitted and the distribution of shocks is $f(\theta)$ each period.

Allocations can be expressed as functions of the reduced state space (vt−1, st−1) for each θt, and the government’s continuation cost is K (vt−1, st−1, t).

For this implementation, interpret the initial ability θ1 as uncertainty, like all other shocks θt, rather than intrinsic heterogeneity. The government selects an initial promised utility U1. All agents again start with the same human capital s0 and receive an initial wealth level b0 assigned by the government.

The government also sets borrowing limits $\underline{b}_t$, and wealth is constrained by $b_t\in B_t\equiv[\underline{b}_t,\infty)$. The proposed decentralization rule allocates $\hat c_t, \hat y_t, \hat b_t, \hat e_t$ to each agent type, following some mappings from observed initial wealth $b_{t-1}$ and human capital $s_{t-1}$:

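$$\hat c_t, \hat y_t, \hat e_t : B_{t-1}\times\mathbb{R}_+\times\Theta\to\mathbb{R}_+ \quad\text{and}\quad \hat b_t : B_{t-1}\times\mathbb{R}_+\times\Theta\to B_t$$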

Starting from an initial wealth level b0, the recursive decentralization rule can be mapped into a sequential allocation for all θt.

Let $V_t(b,s)$ denote the value of an agent with beginning-of-period wealth $b$ and human capital $s$. A decentralization rule $(\hat c_t, \hat y_t, \hat b_t, \hat e_t)$, an initial assignment of wealth $b_0$, a sequence of borrowing limits $\{\underline{b}_t\}_{t=1}^T$, and an initial human capital level $s_0$ form a decentralized equilibrium if, in all periods $t$, $(\hat c_t, \hat y_t, \hat b_t, \hat e_t)$ attains the supremum in the agent's problem in (36):

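$$V_t(b,s) = \sup_{c,y,b',e}\int\left[u_t(c(\theta)) - \phi_t\left(\frac{y(\theta)}{w(\theta,s+e(\theta))}\right) + \beta V_{t+1}\left(b'(\theta), s+e(\theta)\right)\right]f(\theta)\,d\theta \qquad (36)$$
$$\text{s.t.}\quad c(\theta) + M_t(e(\theta)) + \frac{1}{R}b'(\theta) = y(\theta) - T_t\left(b,s,y(\theta),e(\theta)\right) + b \quad \forall\theta$$
$$c,y,e:\Theta\to\mathbb{R}_+ \quad\text{and}\quad b':\Theta\to B_t = [\underline{b}_t,\infty), \quad\text{with } V_{T+1}\equiv 0,\ \underline{b}_T\equiv 0.$$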

A constrained efficient allocation from the planner's problem is implemented as a decentralized equilibrium if it arises as an equilibrium choice of agents in the above problem and delivers expected lifetime utility $V(b_0, s_0) = U_1$.

54 In the planner's problem, savings are indeterminate when consumption is controlled, and, without loss of generality, agents could be saving zero.

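The link between the human capital wedge and the explicit tax system is:

$$\tau_{St}(v_{t-1},s_{t-1},\theta_t) = -\frac{\partial T_t\left(K(v_{t-1},s_{t-1},t),\, s_{t-1},\, y^*_t,\, e^*_t\right)}{\partial e_t} + \beta E_t\left[\frac{u'_{t+1}(c^*_{t+1})}{u'_t(c^*_t)}\left(\frac{\partial T_{t+1}}{\partial e_{t+1}} - \frac{\partial T_{t+1}}{\partial s_t}\right)\right]$$

where $c^*_t = c^*_t(v_{t-1},s_{t-1},\theta_t)$, $c^*_{t+1} = c^*_{t+1}(v^*_t, s_{t-1}+e^*_t, \theta_{t+1})$, and $T_{t+1} = T_{t+1}(b_t, s_t, y_{t+1}, e_{t+1})$ is evaluated at:

$$v^*_t = v^*_t(v_{t-1},s_{t-1},\theta_t),\quad b_t = K(v^*_t, s_{t-1}+e^*_t, t+1),\quad s_t = s_{t-1}+e^*_t,$$
$$y^*_{t+1} = y^*_{t+1}(v^*_t, s_{t-1}+e^*_t, \theta_{t+1}),\quad e^*_{t+1} = e^*_{t+1}(v^*_t, s_{t-1}+e^*_t, \theta_{t+1})$$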

General Deductibility Scheme: With nonlinear cost, the general expression for the deferred deductibility scheme is:

$$\begin{aligned}-\frac{\partial T_t}{\partial e_t} = &\sum_{j=1}^{T-t}\beta^{j-1}E_t\left[\frac{u'_{t+j-1}}{u'_t}\frac{\partial T_{t+j-1}}{\partial y_{t+j-1}}\left(M'_{t+j-1}-\frac{1}{R}M'_{t+j}\right)\right] + \beta^{T-t}E_t\left[\frac{u'_T}{u'_t}\frac{\partial T_T}{\partial y_T}M'_T\right]\\
&-\sum_{j=1}^{T-t}\beta^{j}E_t\left[\frac{u'_{t+j}}{u'_t}\left(\left(1-\xi_{M',t+j}\right)E_{t+j-1}\left(M'_{t+j}\right)\frac{\partial T_{t+j}}{\partial b_{t+j-1}} - \frac{\partial T_{t+j}}{\partial s_{t+j-1}}\right)\right]\end{aligned} \qquad (37)$$

The first set of terms captures the deferred deductibility from the income tax base. Because the marginal cost is no longer constant, the deduction in period $t+j$ occurs at the dynamic marginal cost effective in that period, $(M'_{t+j} - \frac{1}{R}M'_{t+j+1})$, not at the "historic" marginal cost $M'_t$ faced by the agent at the time of the purchase; i.e., a purchase of $\Delta e$ at time $t$ is deducted as $(M'_{t+j} - \frac{1}{R}M'_{t+j+1})\Delta e$ from $y_{t+j}$ at $t+j$.55 Otherwise, there would be arbitrage possibilities. Similarly to the text, the "no-arbitrage" term takes into account the differential tax increases from physical capital versus human capital, except that now the nonlinear, risk-adjusted cost $(1-\xi_{M',t+j})E_{t+j-1}(M'_{t+j})$ enters the picture. The deduction is risk adjusted, as witnessed by the insurance factors $\xi_{M',t+1} \equiv -\mathrm{Cov}\left(\frac{\beta u'_{t+1}}{u'_t} - 1,\, M'_{t+1}\right)\Big/\left(E_t\left(\frac{\beta u'_{t+1}}{u'_t} - 1\right)E_t\left(M'_{t+1}\right)\right)$, defined in the text. As stated in the text, formula (28) can be rewritten in terms of the risk-adjusted, dynamic cost:

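$$-\frac{\partial T_t}{\partial e_t} + \beta E_t\left[\frac{u'_{t+1}}{u'_t}\left(\frac{\partial T_{t+1}}{\partial e_{t+1}} - \frac{\partial T_{t+1}}{\partial s_t}\right)\right] = \frac{\partial T_t}{\partial y_t}M'^{d}_t - \left(1-\frac{\partial T_t}{\partial y_t}\right)\frac{1}{R}\left(1-\xi_{M',t+1}\right)E_t\left[\beta R\,\frac{u'_{t+1}}{u'_t}\frac{\partial T_{t+1}}{\partial b_t}\right]E_t\left(M'_{t+1}\right) \qquad (38)$$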

Solving this relation forward yields the analogue of (37) with $M'^{d}_t$.
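The dynamic marginal cost in (37) can be checked numerically in a deterministic special case with discounting at $R$ and no risk adjustment (an assumption of this sketch). Deducting $M'_{t+j}-M'_{t+j+1}/R$ in each period $t+j$, with $M'_{T+1}=0$ so that the last period deducts $M'_T$, telescopes to a present value of exactly $M'_t$ per unit purchased at $t$. With a constant marginal cost it reproduces the linear schedule of footnote 55: $(1-1/R)$ in every period before $T$ and 1 at $T$. Function names are hypothetical.

```python
# Deterministic sketch: marginal_costs[j] is M'_{t+j} for j = 0, ..., T-t,
# discounting at the gross interest rate R. Names are hypothetical.


def dynamic_deductions(marginal_costs, R):
    """Deduction in period t+j: M'_{t+j} - M'_{t+j+1}/R, with M'_{T+1} = 0."""
    following = list(marginal_costs[1:]) + [0.0]
    return [m - m_next / R for m, m_next in zip(marginal_costs, following)]


def present_value(flows, R):
    """Discount flows in periods t, t+1, ... back to period t."""
    return sum(f / R**j for j, f in enumerate(flows))
```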

Optimal human capital subsidy for a given income tax system

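Suppose there is a linear tax on earnings $\tau_L$. Define the consumption function $\tilde c(\theta) = c(\beta v(\theta)-\omega(\theta), \theta, l(\theta))$. The objective is now:

$$K(v_-,\Delta_-,s_-,\theta_-,t) = \min\left\{\tilde c(\beta v(\theta)-\omega(\theta),\theta,l(\theta)) - w_t l(\theta) + M_t(s(\theta)-s_-) + \frac{1}{R}K(v(\theta),\Delta(\theta),s(\theta),\theta,t+1)\right\}$$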

55 Note that with linear cost, as in the main text, this is just $(1-\beta)$ with $\beta = \frac{1}{R}$ for all $t<T$, and 1 for $t=T$.

The envelope condition is unchanged. Note that the first-order condition of the agent with respect to labor needs to hold as an additional constraint in the problem (since labor can no longer be chosen directly):

$$w_t u'_t(\tilde c(\theta))(1-\tau_L) = \phi_{l,t}(l(\theta))$$

From this equation:

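$$w_{s,t}\,u'_t(\tilde c(\theta))(1-\tau_L)\,ds(\theta) + w_t\,u''_t(\tilde c(\theta))(1-\tau_L)\frac{d\tilde c(\theta)}{dl(\theta)}\,dl(\theta) = \phi_{ll,t}(l(\theta))\,dl(\theta)$$

and hence, using $d\tilde c/dl = \phi_{l,t}(l(\theta))/u'_t(\tilde c(\theta))$ from the expenditure function (current utility held fixed):

$$\frac{dl}{ds} = \frac{w_{s,t}\,u'_t(\tilde c(\theta))(1-\tau_L)}{\phi_{ll,t}(l(\theta)) - w_t\,u''_t(\tilde c(\theta))(1-\tau_L)\dfrac{\phi_{l,t}(l(\theta))}{u'_t(\tilde c(\theta))}} \qquad (39)$$

The first-order condition with respect to education now yields: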

u µt wθ,t 1 µt wθ,t wt 1 + ε dl tst = t 2 φl,t(l(θ))(1 − ρθs,t) − τLwt + t 2 φl,t(l(θ)) c (40) f (θt|θt−1) wt ws,tl(θ) f (θt|θt−1) wt ws,tl(θ) ε ds

with $dl/ds$ as given by (39).
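Formula (39) can be sanity-checked by finite differences under hypothetical primitives that are not the paper's calibration: log utility, disutility $\phi(l) = l^{1+1/\varepsilon}/(1+1/\varepsilon)$, and wage $w(s) = \theta s^{\alpha}$. Holding current utility $\tilde u = \log c - \phi(l)$ fixed, as the expenditure function $\tilde c$ does, consumption is $c = \exp(\tilde u + \phi(l))$. Labor then solves the agent's condition $w(s)(1-\tau_L)u'(c) = \phi'(l)$, found here by bisection. The numerical derivative of the solved labor supply with respect to $s$ should match the right-hand side of (39).

```python
import math

# Hypothetical primitives (not the paper's calibration): log utility,
# phi(l) = l**(1 + 1/eps) / (1 + 1/eps), and wage w(s) = theta * s**alpha.


def _phi(l, eps):
    return l ** (1 + 1 / eps) / (1 + 1 / eps)


def solve_labor(s, theta, alpha, tau, eps, u_tilde):
    """Labor solving w(s)(1 - tau) u'(c) = phi'(l), with c = exp(u_tilde + phi(l))
    so that current utility log(c) - phi(l) stays at u_tilde."""
    w = theta * s ** alpha

    def g(l):  # strictly decreasing in l
        return w * (1 - tau) * math.exp(-u_tilde - _phi(l, eps)) - l ** (1 / eps)

    lo, hi = 1e-12, 100.0  # g(lo) > 0 > g(hi) for the parameters used below
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dl_ds_formula(s, theta, alpha, tau, eps, u_tilde):
    """Right-hand side of (39), evaluated at the solved labor supply."""
    l = solve_labor(s, theta, alpha, tau, eps, u_tilde)
    w = theta * s ** alpha
    w_s = alpha * theta * s ** (alpha - 1)
    c = math.exp(u_tilde + _phi(l, eps))
    u1, u2 = 1 / c, -1 / c ** 2  # u'(c), u''(c) for log utility
    phi_l = l ** (1 / eps)
    phi_ll = (1 / eps) * l ** (1 / eps - 1)
    return w_s * u1 * (1 - tau) / (phi_ll - w * u2 * (1 - tau) * phi_l / u1)


def dl_ds_numeric(s, theta, alpha, tau, eps, u_tilde, h=1e-5):
    """Central finite difference of the solved labor supply in s."""
    up = solve_labor(s + h, theta, alpha, tau, eps, u_tilde)
    dn = solve_labor(s - h, theta, alpha, tau, eps, u_tilde)
    return (up - dn) / (2 * h)
```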

B Derivations and Proofs

Additional Proofs are in the Online Appendix.

Proof of Propositions 1, 3, and 6: The expenditure function $\tilde c(l, \omega-\beta v, \theta)$ defines consumption indirectly as a function of labor $l$, current-period utility ($\tilde u = \omega - \beta v$), and the current realization of the type (note that, conditional on these variables, consumption does not depend on human capital $s$). Then, $\omega(\theta) = u_t(c(\theta)) - \phi_t(l(\theta)) + \beta v(\theta)$ becomes redundant as a constraint, and the choice variables are $(l(\theta), s(\theta), \omega(\theta), v(\theta), \Delta(\theta))$.

Let the multipliers in program (12) be (in the order of the constraints there) $\mu(\theta)$, $\lambda_-$, and $\gamma_-$. The problem is solved using the optimal control approach, where the "types" play the role of the running variable, $\omega(\theta)$ is the state (and $\dot\omega(\theta)$ its law of motion), and the controls are $l(\theta)$, $v(\theta)$, $s(\theta)$, and $\Delta(\theta)$. The Hamiltonian is:

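$$\begin{aligned}&\left(\tilde c\left(l(\theta),\omega(\theta)-\beta v(\theta),\theta\right) + M_t(s(\theta)-s_-) - w_t(\theta,s(\theta))\,l(\theta)\right)f^t(\theta|\theta_-) + \frac{1}{R}K\left(v(\theta),\Delta(\theta),\theta,s(\theta),t+1\right)f^t(\theta|\theta_-)\\
&+\lambda_-\left(v_- - \omega(\theta)f^t(\theta|\theta_-)\right) + \gamma_-\left(\Delta_- - \omega(\theta)\frac{\partial f^t(\theta|\theta_-)}{\partial\theta_-}\right) + \mu(\theta)\left(l(\theta)\,\phi_{l,t}(l(\theta))\frac{w_{\theta,t}}{w_t} + \beta\Delta(\theta)\right)\end{aligned}$$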

with boundary conditions $\lim_{\theta\to\bar\theta}\mu(\theta) = \lim_{\theta\to\underline{\theta}}\mu(\theta) = 0$. Taking the first-order conditions (hereafter, FOC) of the recursive planning problem yields (the variable with respect to which the FOC is taken appears in brackets):

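$$[l(\theta)]:\quad \frac{\tau^*_L(\theta)}{1-\tau^*_L(\theta)} = \frac{\mu(\theta)}{f^t(\theta|\theta_-)}\frac{w_{\theta,t}}{w_t}\,u'_t(c(\theta))\left(1+\frac{l(\theta)\,\phi_{ll,t}(l(\theta))}{\phi_{l,t}(l(\theta))}\right)$$

using the definitions of $\varepsilon^c$, $\varepsilon^u$, and $\varepsilon$ in the text: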

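$$\frac{\tau^*_L(\theta)}{1-\tau^*_L(\theta)} = \frac{\mu(\theta)}{f^t(\theta|\theta_-)}\frac{\varepsilon_{w\theta}}{\theta}\,u'_t(c(\theta))\,\frac{1+\varepsilon^u}{\varepsilon^c}$$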

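$$[s(\theta)]:\quad -M'_t(s(\theta)-s_-) + l(\theta)\,w_{s,t} + \frac{1}{R}\frac{\partial K(v(\theta),\Delta(\theta),\theta,s(\theta),t+1)}{\partial s(\theta)} = \frac{\mu(\theta)}{f^t(\theta|\theta_-)}\,l(\theta)\,\phi_{l,t}(l(\theta))\,\frac{1}{w_t^2}\,w_{\theta,t}\,w_{s,t}\,(\rho_{\theta s,t}-1)$$

where (letting $\theta'$ and $s'$ be next period's type and human capital, respectively):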

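$$\frac{\partial K}{\partial s(\theta)} = \int M'_{t+1}\left(s'(\theta')-s(\theta)\right)f^{t+1}(\theta'|\theta)\,d\theta'$$

so that:

$$-M'_t(s(\theta)-s_-) + l(\theta)\,w_{s,t} + \frac{1}{R}\int M'_{t+1}\left(s'(\theta')-s(\theta)\right)f^{t+1}(\theta'|\theta)\,d\theta' = \frac{\mu(\theta)}{f^t(\theta|\theta_-)}\,l(\theta)\,\phi_{l,t}(l(\theta))\,\frac{1}{w_t^2}\,w_{\theta,t}\,w_{s,t}\,(\rho_{\theta s,t}-1)$$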

Use the expression for $w_{s,t}l_t$ from the definition of the human capital wedge $\tau_{St}$ in (15) to write the first-order condition as a function of the modified wedge $ts_t$:

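$$ts^*_t(\theta^t) = \frac{\mu(\theta^t)}{f(\theta_t|\theta_{t-1})}\,u'_t\left(c^*_t(\theta^t)\right)\frac{\varepsilon_{w\theta,t}}{\theta_t}\,(1-\rho_{\theta s,t})$$

From this, we can immediately deduce the relation between the modified wedge and the tax rate in the text:

$$ts_t = (1-\rho_{\theta s,t})\,\frac{\tau_{Lt}}{1-\tau_{Lt}}\,\frac{\varepsilon^c_t}{1+\varepsilon^u_t}$$

The law of motion for the co-state $\mu(\theta)$ comes from the first-order condition with respect to the state variable $\omega(\theta)$: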

t 1 ∂f (θ|θ−) 1 t [ω (θ)] : − 0 + (λ−) + (γ−) t f (θ|θ−) =µ ˙ (θ) (41) ut (c (θ)) ∂θ− f (θ|θ−) Integrating this and using the boundary condition µ θ¯ = 0, yields:

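$$\mu(\theta) = \int_{\theta}^{\bar\theta}\left(\frac{1}{u_t'\big(c(\tilde\theta)\big)} - \lambda_- - \gamma_-\,\frac{\partial f^t(\tilde\theta|\theta_-)}{\partial\theta_-}\,\frac{1}{f^t(\tilde\theta|\theta_-)}\right) f^t(\tilde\theta|\theta_-)\, d\tilde\theta \tag{42}$$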

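Integrating (41) over the whole support, using both boundary conditions, and noting that $\int \partial f^t(\theta|\theta_-)/\partial\theta_-\, d\theta = \frac{\partial}{\partial\theta_-}\int f^t(\theta|\theta_-)\, d\theta = 0$, yields: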

$$\lambda_- = \int_{\underline{\theta}}^{\bar\theta}\frac{1}{u_t'\big(c(\theta)\big)}\, f^t(\theta|\theta_-)\, d\theta \tag{43}$$
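Equations (42) and (43) can be checked numerically in the iid case, where $\partial f^t/\partial\theta_- = 0$ and the $\gamma_-$ term drops out. The minimal Python sketch below rests on assumed primitives that do not come from the paper: types uniform on [1, 2] (so f = 1), log utility (so $1/u'(c) = c$), and a hypothetical consumption schedule $c(\theta) = \theta$. It computes $\lambda_-$ from (43) by the trapezoid rule, integrates (42) downward from $\bar\theta$, and checks that $\mu$ also vanishes at $\underline{\theta}$. Because $1/u'(c(\theta))$ is increasing in this example, $\mu$ also comes out non-negative, in line with Lemma 1 below. The helper names are hypothetical.

```python
def trapezoid(ys, xs):
    """Trapezoid-rule integral of ys over the grid xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1))


def costate_iid(grid, f, inv_uprime):
    """Return (lam, mu) from (43) and (42) when gamma_- = 0 (iid types).

    grid: increasing type grid from theta_lower to theta_upper
    f: density on the grid; inv_uprime: 1/u'(c(theta)) on the grid
    """
    lam = trapezoid([a * b for a, b in zip(inv_uprime, f)], grid)
    integrand = [(a - lam) * b for a, b in zip(inv_uprime, f)]
    mu = [0.0] * len(grid)  # boundary condition mu(theta_upper) = 0
    for i in range(len(grid) - 2, -1, -1):
        step = grid[i + 1] - grid[i]
        mu[i] = mu[i + 1] + 0.5 * (integrand[i] + integrand[i + 1]) * step
    return lam, mu


grid = [1.0 + i / 200 for i in range(201)]
f = [1.0] * len(grid)       # uniform density on [1, 2]
inv_uprime = list(grid)     # log utility with c(theta) = theta
lam, mu = costate_iid(grid, f, inv_uprime)
print(round(lam, 6), round(mu[100], 6))       # 1.5 0.125
print(abs(mu[0]) < 1e-9, min(mu) > -1e-12)    # True True
```

In this linear example the closed form is $\mu(\theta) = -1 - \theta^2/2 + 1.5\,\theta$, so $\mu(1.5) = 0.125$ and $\mu(1) = \mu(2) = 0$.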

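Using the envelope conditions $\partial K\big(v(\theta), \Delta(\theta), \theta, s(\theta), t+1\big)/\partial v(\theta) = \lambda(\theta)$ and $\partial K\big(v(\theta), \Delta(\theta), \theta, s(\theta), t+1\big)/\partial \Delta(\theta) = \gamma(\theta)$, the first-order conditions with respect to $v(\theta)$ and $\Delta(\theta)$, respectively, lead to: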

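$$[v(\theta)]:\quad \frac{1}{u'(c)} = \frac{\lambda(\theta)}{R\beta} \tag{44}$$

and

$$[\Delta(\theta)]:\quad -\frac{\gamma(\theta)}{R\beta} = \frac{\mu(\theta)}{f^t(\theta|\theta_-)} \tag{45}$$

Using (43) and (45) in the expression for $\mu(\theta)$ from (42) yields $\mu(\theta^t) = \kappa(\theta^t) + \eta(\theta^t)$, where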

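$$\kappa(\theta^t) = \int_{\theta_t}^{\bar\theta}\frac{1}{u_t'\big(c(\theta)\big)}\left(1 - u_t'\big(c(\theta)\big)\int_{\underline{\theta}}^{\bar\theta}\frac{1}{u_t'\big(c(m)\big)}\, f^t(m|\theta_-)\, dm\right) f^t(\theta|\theta_-)\, d\theta$$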

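$$\eta(\theta^t) = -\int_{\theta_t}^{\bar\theta}\gamma_-\,\frac{\partial f^t(\theta|\theta_-)}{\partial\theta_-}\, d\theta = R\beta\,\frac{\mu(\theta^{t-1})}{f^{t-1}(\theta_{t-1}|\theta_{t-2})}\int_{\theta_t}^{\bar\theta}\frac{\partial f^t(\theta|\theta_-)}{\partial\theta_-}\, d\theta$$

where the last equality uses the lag of (45). The multiplier $\mu(\theta^{t-1})$ is replaced by last period's $t_{s,t-1}^*$ (respectively, $\tau_{L,t-1}^*$) using the optimal formulas to obtain the expressions in Proposition 3 (respectively, 6). The zero net wedge result at the top and bottom follows immediately from the boundary conditions $\mu(\underline{\theta}) = \mu(\bar\theta) = 0$.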
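The $\eta$ term depends on the stochastic process only through $\int_{\theta_t}^{\bar\theta}\partial f^t(\theta|\theta_-)/\partial\theta_-\,d\theta = -\partial F^t(\theta_t|\theta_-)/\partial\theta_-$, where $F^t$ is the conditional CDF. To make this concrete, the sketch below assumes a hypothetical example with unbounded support ($\bar\theta = \infty$): $\log\theta_t = \rho\log\theta_{t-1} + \sigma z_t$ with $z_t$ standard normal. In that case $-\partial F/\partial\theta_- = \phi(z)\,\rho/(\sigma\theta_-)$, which is zero when $\rho = 0$ (iid). The Python code compares this closed form with a finite-difference derivative of the CDF. All function names and parameter values are illustrative assumptions.

```python
import math


def cond_cdf(theta, theta_prev, rho, sigma):
    """F(theta | theta_prev) when log theta = rho * log theta_prev + sigma * z, z ~ N(0, 1)."""
    z = (math.log(theta) - rho * math.log(theta_prev)) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def upper_tail_shift(theta, theta_prev, rho, sigma):
    """Closed form of -dF(theta | theta_prev)/d theta_prev, i.e. the integral of
    df/d theta_prev over [theta, infinity), for the log-AR(1) example."""
    z = (math.log(theta) - rho * math.log(theta_prev)) / sigma
    pdf_z = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return pdf_z * rho / (sigma * theta_prev)


def eta(theta, theta_prev, rho, sigma, R, beta, mu_over_f_prev):
    """eta(theta^t) = R * beta * [mu(theta^{t-1}) / f(theta_{t-1}|theta_{t-2})] * upper-tail shift."""
    return R * beta * mu_over_f_prev * upper_tail_shift(theta, theta_prev, rho, sigma)


h = 1e-6
fd = -(cond_cdf(1.3, 1.2 + h, 0.9, 0.4) - cond_cdf(1.3, 1.2 - h, 0.9, 0.4)) / (2 * h)
print(abs(fd - upper_tail_shift(1.3, 1.2, 0.9, 0.4)) < 1e-6)  # True
print(eta(1.3, 1.2, 0.0, 0.4, 1.02, 0.98, 0.7))  # 0.0: iid case, no threat-keeping term
```

In this example, a higher past type shifts the whole conditional distribution up when $\rho > 0$. So with $\mu(\theta^{t-1}) \ge 0$ the $\eta$ term is non-negative.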

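Part ii) of Proposition 3: If $\theta$ is iid, then $\partial f^t(\theta|\theta_-)/\partial\theta_- = 0$, so $\gamma_- = 0$ and $\eta(\theta^t) = 0$ for all $t$. In addition, if $u_t'(c_t) = 1$ for all $t$, then $\kappa(\theta^t) = 0$ as well.

Proof of Proposition 2: From the expression for $t_s$, the proof is immediate by inspection as long as $\mu(\theta^t) \ge 0$ for all $t$ and all $\theta^t$, which is now proved.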
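The sign argument in the proof of Proposition 2 uses the relation $t_{st} = (1-\rho_{\theta s,t})\frac{\tau_{Lt}}{1-\tau_{Lt}}\frac{\varepsilon_t^c}{1+\varepsilon_t^u}$. Once $\mu \ge 0$, and hence $\tau_{Lt} \ge 0$, this relation implies that $t_{st}$ has the sign of $1 - \rho_{\theta s,t}$. The following minimal Python sketch evaluates the relation. The function name and the inputs ($\tau_L = 0.3$, unit elasticities) are hypothetical illustrations, not values from the paper:

```python
def human_capital_wedge(tau_L, eps_c, eps_u, rho_theta_s):
    """Modified human capital wedge t_s implied by the labor wedge tau_L."""
    return (1.0 - rho_theta_s) * tau_L / (1.0 - tau_L) * eps_c / (1.0 + eps_u)


for rho in (0.5, 1.0, 1.5):
    # tau_L = 0.3 and unit elasticities are illustrative numbers only
    print(rho, human_capital_wedge(0.3, 1.0, 1.0, rho))
```

The wedge is positive for $\rho_{\theta s} < 1$ and exactly zero at $\rho_{\theta s} = 1$, which is the case for the multiplicative wage $w = \theta s$. It is negative for $\rho_{\theta s} > 1$.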

Lemma 1. Under Assumption 2, $\mu(\theta^t) \ge 0$ for all $t$ and all $\theta^t$.

Proof of Lemma 1: The proof is close to the one in Golosov, Troshkin, and Tsyvinski (2013), adapted here to a separable utility function and to the presence of human capital. From the envelope condition and the FOC for $v(\theta)$ in (44):

$$\frac{\partial K}{\partial v} = \lambda(\theta) = \frac{R\beta}{u_t'\big(c(\theta)\big)}$$

Since by assumption $v(\theta)$ is increasing in $\theta$ and $K(\cdot)$ is increasing and convex in $v$, $\partial K/\partial v$ must be increasing in $\theta$, so that $1/u_t'\big(c(\theta)\big)$ is increasing in $\theta$ as well.

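Start in period $t = 1$. In this case, since $\theta_0$ has a degenerate distribution, $\partial f(\theta_1|\theta_0)/\partial\theta_0 = 0$ and

$$\mu(\theta_1) = \int_{\theta_1}^{\bar\theta}\left(\frac{1}{u_1'\big(c(\tilde\theta_1)\big)} - \lambda_-\right) f(\tilde\theta_1)\, d\tilde\theta_1$$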

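Choose $\theta_1^0$ such that $1/u_1'\big(c(\theta_1^0)\big) = \lambda_-$. Since $1/u_1'\big(c(\theta)\big)$ is increasing in $\theta$, $\mu(\theta_1) \ge 0$ for $\theta_1 \ge \theta_1^0$ (the integral is then over non-negative numbers only). Using the boundary condition $\mu(\underline{\theta}) = 0$, $\mu(\theta_1)$ can also be rewritten as:

$$\mu(\theta_1) = \int_{\underline{\theta}}^{\theta_1}\left(-\frac{1}{u_1'\big(c(\tilde\theta_1)\big)} + \lambda_-\right) f(\tilde\theta_1)\, d\tilde\theta_1$$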

Since $1/u_1'\big(c(\theta_1)\big) \le \lambda_-$ for $\theta_1 \le \theta_1^0$, we again have $\mu(\theta_1) \ge 0$. Thus, $\mu(\theta_1) \ge 0$ for all $\theta_1$. By the first-order condition for $\Delta$ in (45):

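$$-\frac{\gamma(\theta_1)}{R\beta} = \frac{\mu(\theta_1)}{f(\theta_1)}$$

so that $\gamma(\theta_1) \le 0$ for all $\theta_1$. Note that $\mu(\theta_2)$ is equal to:

$$\mu(\theta_2) = \int_{\theta_2}^{\bar\theta}\left(\frac{1}{u_2'\big(c(\tilde\theta_2)\big)} - \lambda_- - \gamma_-\,\frac{\partial f(\tilde\theta_2|\theta_1)/\partial\theta_1}{f(\tilde\theta_2|\theta_1)}\right) f(\tilde\theta_2|\theta_1)\, d\tilde\theta_2$$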

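Since by Assumption 2 iii) $g(\tilde\theta_2|\theta_1)/f(\tilde\theta_2|\theta_1)$ is increasing in $\tilde\theta_2$, $\gamma_- = \gamma(\theta_1) \le 0$, and we already showed that $1/u_2'\big(c(\tilde\theta_2)\big)$ is increasing in $\tilde\theta_2$, the integrand is increasing in $\tilde\theta_2$ and there is a $\theta_2^0$ such that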

$$\frac{1}{u_2'\big(c(\theta_2^0)\big)} - \gamma_-\,\frac{\partial f(\theta_2^0|\theta_1)/\partial\theta_1}{f(\theta_2^0|\theta_1)} = \lambda_-$$

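and such that $\mu(\theta_2) \ge 0$ for $\theta_2 \ge \theta_2^0$ (since the integral is over non-negative numbers only). Rewriting $\mu(\theta_2)$ as an integral from $\underline{\theta}$ to $\theta_2$ and using the boundary condition $\mu(\underline{\theta}) = 0$, we can again show that $\mu(\theta_2) \ge 0$ also for $\theta_2 \le \theta_2^0$. Proceeding in the same way for all periods up to $T$ shows the result.

Proof that $t_{st} = 0$ when $\rho_{\theta s} = 1$ without using the first-order approach: Consider a separable wage $w = \theta s$, a history $\theta^t$, and a perturbation of the allocation for all $\tilde\theta^t$ in a neighborhood of $\theta^t$, $|\theta^t - \tilde\theta^t| \le \eta$, such that $s^\delta(\tilde\theta^t) = s(\tilde\theta^t) + \delta$ and $y^\delta(\tilde\theta^t) = y(\tilde\theta^t) + dy(\tilde\theta^t)$, with $y^\delta(\tilde\theta^t)/s^\delta(\tilde\theta^t) = y(\tilde\theta^t)/s(\tilde\theta^t)$, i.e., $dy(\tilde\theta^t) = \delta\, y(\tilde\theta^t)/s(\tilde\theta^t)$. Because labor $l = y/(\theta s)$ depends on output and human capital only through $y/s$, this perturbation leaves utilities and incentive compatibility constraints unaffected. The change in the resource cost must hence be zero in this neighborhood, and letting $\eta \to 0$, we obtain

$$-\frac{y(\theta)}{s(\theta)} + M_t'\big(e(\theta)\big) - \frac{1}{R}\, E_t\big[M_{t+1}'\big(e(\theta')\big)\big] = 0,$$

which is equivalent to $t_{st} = 0$ for the multiplicative wage.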
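The perturbation argument rests on one fact about the wage $w = \theta s$. If $s$ rises by $\delta$ and $y$ rises by $\delta\,y/s$, then the labor $l = y/(\theta s)$ needed to deliver the allocation is unchanged, both for the true type and for any type that might mimic it. The minimal Python sketch below checks this and computes the first-order change in net resource cost per unit of $\delta$. The function names are hypothetical, the marginal-cost numbers are made up for illustration, and `M_prime_now` and `E_M_prime_next` stand in for $M_t'(e(\theta))$ and $E_t[M_{t+1}'(e(\theta'))]$:

```python
def labor(y, theta, s):
    """Hours needed to produce income y with the multiplicative wage w = theta * s."""
    return y / (theta * s)


def perturb(y, s, delta):
    """Raise human capital by delta and income proportionally, keeping y/s fixed."""
    return y + delta * y / s, s + delta


def cost_change_per_delta(y, s, M_prime_now, E_M_prime_next, R):
    """First-order change in net resource cost per unit of delta:
    -dy/delta + M_t' - E_t[M_{t+1}'] / R, with dy/delta = y/s."""
    return -y / s + M_prime_now - E_M_prime_next / R


y, s, delta = 2.0, 4.0, 0.01
y_new, s_new = perturb(y, s, delta)
for theta in (0.8, 1.0, 1.3):  # the true type and possible mimickers
    print(abs(labor(y, theta, s) - labor(y_new, theta, s_new)) < 1e-12)  # True
# At an optimum the change must be zero, i.e. y/s = M_t' - E_t[M_{t+1}'] / R.
print(cost_change_per_delta(2.0, 4.0, 1.5, 1.1, 1.1))  # 0.0
```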

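Proof of Proposition 7: Integrating $\dot\mu(\theta)$ in equation (41) between the two boundaries $\underline{\theta}$ and $\bar\theta$, and using the boundary conditions $\mu(\bar\theta) = \mu(\underline{\theta}) = 0$ as well as the expression for $\lambda_-$ from (44), lagged by one period, yields the inverse Euler equation in (33):

$$\frac{1}{u_{t-1}'\big(c(\theta^{t-1})\big)} = \frac{1}{R\beta}\, E_{t-1}\left[\frac{1}{u_t'\big(c(\theta^t)\big)}\right]$$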
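As an illustration of the inverse Euler equation, the minimal Python sketch below assumes log utility (so $1/u'(c) = c$) and a hypothetical two-state distribution of period-$t$ consumption, neither taken from the paper. It computes the implied $c_{t-1}$. It then shows, by Jensen's inequality, that the standard Euler equation fails in the direction $u'_{t-1} < R\beta\,E_{t-1}[u'_t]$ whenever period-$t$ consumption is risky:

```python
def inverse_euler_previous_consumption(c_next, probs, R, beta):
    """Log utility: 1/u'(c) = c, so (43)-(44) give c_{t-1} = E_{t-1}[c_t] / (R * beta)."""
    return sum(p * c for p, c in zip(probs, c_next)) / (R * beta)


c_next, probs, R, beta = [1.0, 3.0], [0.5, 0.5], 1.0, 1.0
c_prev = inverse_euler_previous_consumption(c_next, probs, R, beta)
u_prime_prev = 1.0 / c_prev
expected_u_prime_next = sum(p / c for p, c in zip(probs, c_next))
print(c_prev)                                            # 2.0
print(u_prime_prev < R * beta * expected_u_prime_next)   # True
# Inverse Euler in ratio form: E[u'_{t-1} / u'_t] = R * beta
print(sum(p * c / c_prev for p, c in zip(probs, c_next)))  # 1.0
```

The ratio form $E_{t-1}[u'_{t-1}/u'_t] = R\beta$ is the one used with the covariance term in (46).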

Proof of Corollary 2: The derivation of the time evolution of the labor wedge follows Farhi and Werning (2013). Take any weighting function $\pi(\theta) > 0$ and let $\Pi(\theta)$ denote a primitive of $\pi(\theta)/\theta$. Starting from the expression for the optimal labor wedge in (32), multiply both sides by $\pi(\theta_t)/\theta_t$ and integrate over $\theta_t$. Integrating by parts and using $\mu(\underline{\theta}) = \mu(\bar\theta) = 0$ yields:

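$$
\begin{aligned}
\int \frac{\tau_{Lt}(\theta^t)}{1-\tau_{Lt}(\theta^t)}\,\frac{\theta_t}{\varepsilon_{w\theta,t}}\,\frac{\varepsilon_t^c}{1+\varepsilon_t^u}\,\frac{1}{u_t'\big(c(\theta^t)\big)}\,\frac{\pi(\theta_t)}{\theta_t}\, f(\theta_t|\theta_{t-1})\, d\theta_t
&= \int \mu(\theta^t)\,\frac{\pi(\theta_t)}{\theta_t}\, d\theta_t \\
&= -\int \dot\mu(\theta^t)\,\Pi(\theta_t)\, d\theta_t \\
&= \int\left(\frac{1}{u_t'\big(c(\theta^t)\big)} - \frac{R\beta}{u_{t-1}'\big(c(\theta^{t-1})\big)} + \frac{R\beta\,\mu(\theta^{t-1})}{f(\theta_{t-1}|\theta_{t-2})}\,\frac{\partial f(\theta_t|\theta_{t-1})}{\partial\theta_{t-1}}\,\frac{1}{f(\theta_t|\theta_{t-1})}\right) f(\theta_t|\theta_{t-1})\,\Pi(\theta_t)\, d\theta_t \\
&= \int\left(\frac{1}{u_t'} - \frac{R\beta}{u_{t-1}'} + \frac{R\beta\,\tau_{L,t-1}}{1-\tau_{L,t-1}}\,\frac{1}{u_{t-1}'}\,\frac{\theta_{t-1}}{\varepsilon_{w\theta,t-1}}\,\frac{\varepsilon_{t-1}^c}{1+\varepsilon_{t-1}^u}\,\frac{\partial f(\theta_t|\theta_{t-1})}{\partial\theta_{t-1}}\,\frac{1}{f(\theta_t|\theta_{t-1})}\right) f(\theta_t|\theta_{t-1})\,\Pi(\theta_t)\, d\theta_t
\end{aligned}
$$

where the third line uses the expressions for $\lambda_-$ and $\gamma_-$ from (44) and (45), respectively, evaluated at $t-1$. The fourth line uses the optimal wedge from (32) at time $t-1$ to substitute for the multiplier $\mu(\theta^{t-1})$. Using the inverse Euler equation yields a general formula for any stochastic process and weighting function: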

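$$
\begin{aligned}
&E_{t-1}\left[\frac{\tau_{Lt}(\theta^t)}{1-\tau_{Lt}(\theta^t)}\,\frac{\theta_t}{\varepsilon_{w\theta,t}}\,\frac{\varepsilon_t^c}{1+\varepsilon_t^u}\,\frac{u_{t-1}'\big(c(\theta^{t-1})\big)}{u_t'\big(c(\theta^t)\big)}\,\frac{\pi(\theta_t)}{\theta_t}\right] \\
&\qquad = \mathrm{Cov}\left(\frac{u_{t-1}'\big(c(\theta^{t-1})\big)}{u_t'\big(c_t(\theta^t)\big)},\ \Pi(\theta_t)\right) + \frac{R\beta\,\tau_{L,t-1}}{1-\tau_{L,t-1}}\,\frac{\theta_{t-1}}{\varepsilon_{w\theta,t-1}}\,\frac{\varepsilon_{t-1}^c}{1+\varepsilon_{t-1}^u}\int\frac{\partial f(\theta_t|\theta_{t-1})}{\partial\theta_{t-1}}\,\Pi(\theta_t)\, d\theta_t
\end{aligned}
\tag{46}
$$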

For the particular weighting function $\pi(\theta_t) = 1$ (with $\Pi(\theta_t) = \log(\theta_t)$), and with the AR(1) process assumed for $\log(\theta_t)$, the formula reduces to (34).
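The integration-by-parts step in the proof above uses only the fact that $\mu$ vanishes at both boundaries, so that $\int \mu\,\pi/\theta\,d\theta = -\int \dot\mu\,\Pi\,d\theta$. The minimal numerical check below is written in Python. It assumes a hypothetical multiplier profile $\mu(\theta) = (\theta-1)(2-\theta)$ on [1, 2] and the weighting $\pi(\theta) = 1$ used for (34), so that $\Pi(\theta) = \log\theta$. Both sides should then equal $1.5 - 2\log 2$:

```python
import math


def trapezoid(ys, xs):
    """Trapezoid-rule integral of ys over the grid xs."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1))


n = 4001
grid = [1.0 + i / (n - 1) for i in range(n)]
mu = [(t - 1.0) * (2.0 - t) for t in grid]   # hypothetical mu with mu(1) = mu(2) = 0
mu_dot = [3.0 - 2.0 * t for t in grid]       # its derivative in theta
lhs = trapezoid([m / t for m, t in zip(mu, grid)], grid)                   # ∫ mu * pi/theta
rhs = -trapezoid([d * math.log(t) for d, t in zip(mu_dot, grid)], grid)    # -∫ mu_dot * Pi
exact = 1.5 - 2.0 * math.log(2.0)
print(abs(lhs - exact) < 1e-6, abs(rhs - exact) < 1e-6)  # True True
```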

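Proof of Corollary 1: The net wedge on human capital can be rewritten as in the proof of Proposition 2, using the same weighting function $\pi(\theta) = 1$:

$$
\begin{aligned}
\int \frac{t_{st}}{1-\rho_{\theta s,t}}\,\frac{1}{u_t'\big(c_t(\theta^t)\big)\,\varepsilon_{w\theta,t}}\, f(\theta_t|\theta_{t-1})\, d\theta_t
&= \int\left(\frac{1}{u_t'\big(c_t(\theta^t)\big)} - \frac{R\beta}{u_{t-1}'\big(c(\theta^{t-1})\big)}\right) f(\theta_t|\theta_{t-1})\log(\theta_t)\, d\theta_t \\
&\quad + R\beta\,\frac{\mu(\theta^{t-1})}{f(\theta_{t-1}|\theta_{t-2})}\int\frac{\partial f(\theta_t|\theta_{t-1})}{\partial\theta_{t-1}}\log(\theta_t)\, d\theta_t
\end{aligned}
$$

where the right-hand side uses the expressions for $\lambda_-$ and $\gamma_-$ from (44) and (45), respectively, evaluated at $t-1$. Using the optimal wedge from (20) at time $t-1$ to substitute for the multiplier $\mu(\theta^{t-1})$, the boundary conditions for $\mu$ (and the resulting inverse Euler equation), and the log AR(1) process for $\theta_t$, formula (25) in the text is obtained.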
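Both (34) and (25) require evaluating $\int \partial f(\theta_t|\theta_{t-1})/\partial\theta_{t-1}\,\log(\theta_t)\,d\theta_t$ under the log AR(1) process. For illustration, assume Gaussian innovations, $\log\theta_t = \rho\log\theta_{t-1} + \sigma z_t$. The integral then equals $\frac{\partial}{\partial\theta_{t-1}}E_{t-1}[\log\theta_t] = \rho/\theta_{t-1}$. The Python sketch below confirms this numerically by integrating the derivative of the conditional density after changing variables to $x = \log\theta$. The function name and parameter values are hypothetical:

```python
import math


def score_log_integral(theta_prev, rho, sigma, n=4001, width=10.0):
    """Approximate the integral of d f(theta | theta_prev)/d theta_prev * log(theta) d theta
    for log theta = rho * log theta_prev + sigma * z, z ~ N(0, 1).

    In x = log theta the conditional density is N(m, sigma^2) with m = rho * log(theta_prev);
    the trapezoid rule is applied on m +/- width * sigma.
    """
    m = rho * math.log(theta_prev)
    lo = m - width * sigma
    h = 2.0 * width * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * h
        pdf = math.exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        # derivative of the normal density w.r.t. theta_prev: pdf * (x - m)/sigma^2 * dm/dtheta_prev
        d_pdf = pdf * (x - m) / sigma ** 2 * (rho / theta_prev)
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * d_pdf * x * h
    return total


print(round(score_log_integral(2.0, 0.9, 0.3), 6))  # 0.45, i.e. rho / theta_prev
```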