
ECONG011 Public Microeconomics

Ian Preston

1 Introduction

2 Individuals

We consider as starting point a competitive economy without government. Suppose that there are

m different consumption goods

n different types of labour

There are H individuals, h = 1,...,H, who

have endowments of goods $\omega^h$ and consume quantities $q^h$

each have an endowment of time, normalised to 1, from which they supply labour $L^h$

Individual preferences are captured in functions $u^h(q^h, L^h)$

3 Firms

There are K firms, k = 1,...,K, which undertake production plans which involve

using labour $l^k$

to produce quantities of goods $y^k$

according to technological requirements, say $G^k(y^k, l^k) \le 0$.

4 Trade

To simplify, we assume that

each firm produces only one type of good

each individual supplies only one type of labour

Furthermore the only types of trade that occur are

sales of labour from individuals to firms and

sales of goods from firms to individuals

In particular this avoids complications concerned with the treatment of trades in goods or labour between firms and firms or between individuals and individuals.

5 Prices

We begin without any government. Suppose both firms and consumers behave as price takers. Let

the pretax price vector for goods be $p^0$

the pretax wage vector be w

6 Competitive behaviour

Firms maximise profits given technology

$$\max_{l^k,\, y^k} \pi^k = p^{0\prime} y^k - w' l^k \quad \text{s.t. } G^k(y^k, l^k) \le 0.$$

Profits are then shared among individuals according to ownership shares $\delta_{hk}$. Individuals choose goods demands and labour supplies to maximise utility given their budget

$$\max_{q^h,\, L^h} u^h(q^h, L^h) \quad \text{s.t. } p^{0\prime}(q^h - \omega^h) - \sum_k \delta_{hk}\pi^k - w^h L^h \le 0.$$

If we assume constant returns to scale then profits are zero in equilibrium.

7 Competitive equilibrium

A competitive equilibrium consists in prices and wages that lead to a feasible allocation of goods and labour

$$\sum_h (q^h - \omega^h) - \sum_k y^k \le 0$$

$$\sum_h L^h - \sum_k l^k \ge 0.$$

Existence of an equilibrium is guaranteed given convexity of preferences and technology.

8 Welfare theorems

By the First Fundamental Theorem any such equilibrium is Pareto efficient. By the Second Fundamental Theorem, any Pareto efficient allocation can be sustained in such an economy as a competitive equilibrium given an appropriate redistribution of endowments. These are standard results of earlier microeconomics courses.

9 Minimal role for government

Trade between agents in a competitive economy needs the protection of a legal system defining rights and enforcing their recognition. This itself requires a form of government with expenses which need to be covered by the raising of public resources. The security offered by a functioning judicial system can be considered as a foundational example of a public good.

10 Public goods and externalities

The incorporation into the model of the existence of other public goods raises further issues about the economic role of government. Public goods can be privately provided but there are strong economic reasons to think it may be more efficient for government to act as provider. The existence of externalities associated with private goods raises related issues.

11 Equity

The particular competitive outcome associated with a specific initial distribution of endowments and abilities may well be considered unacceptably inequitable compared to others that might follow from a redistribution of resources. Government may arise as the agent effecting such a redistribution through taxation and disbursement of public funds. To do so effectively the government needs to collect information. The manner in which it implements taxation should not be such as to discourage individuals from revealing that information where it is needed.

12 Other roles for government

The assumption of price-taking behaviour may be inappropriate and the existence of monopoly power raises a case for government regulation. The assumption that the economy settles naturally into equilibrium may also be unwarranted and point towards a case for macroeconomic intervention. These are important issues concerned with the role of government but they are dealt with in other courses.

13 Social welfare, inequality and poverty

14 Social choice

15 Social choice

Before proceeding to discussion of the design of schemes for taxation and public provision, we need to establish a criterion to judge the outcomes of government intervention. Suppose then that the government has to choose a social state $x$ drawn from a choice set $X$. These could be thought of as

defining points in an Edgeworth box in a purely competitive economy with private goods

distinguished by things such as tax schedules or levels of public provision of some good

16 Social choice relation

Individuals have preferences $\succsim_h$, $h = 1, \dots, H$, over those states as captured in utility functions

$$U = (u^1, u^2, \dots, u^H)$$

What we want is to determine a social choice relation $\succsim^*$ over $X$ as a function of the individual utilities $U$.

17 Welfarism

The view that only satisfaction of preferences matters to social evaluation is known as welfarism. It is often taken for granted in economic discussion but it is restrictive. It rules out consideration of certain things sometimes considered important, such as rights, duties, etc.

High rates of tax on alcohol may be motivated by moral disapproval of drinking

Taxation of labour may be influenced by views on the virtue of work

Certain libertarian perspectives take a view of property rights that makes them regard redistributive taxation as

18 Impossibility of a Paretian liberal

Problems arise if preferences can have regard to the activities of others. As an example, we can take Sen’s proof of the impossibility of a Paretian liberal. Suppose there are two individuals, a puritan P and a libertine L. There is a salacious novel and we consider social choice over three states

the novel is read by no one, $x_0$

the novel is read by the puritan alone, $x_P$

the novel is read by the libertine alone, $x_L$

19 Impossibility of a Paretian liberal: Preferences

The puritan would rather no one read the novel but if anyone is going to read it then he would rather it were him than the libertine:

$$x_0 \succ_P x_P \succ_P x_L$$

The libertine would least prefer that the book be unread but he also prefers that the puritan read it than that he himself does:

$$x_P \succ_L x_L \succ_L x_0$$

20 Impossibility of a Paretian liberal: Social choice

By the (welfarist) Pareto principle $x_P \succ^* x_L$ since everyone shares that preference. But this is inconsistent with the liberal view that it is a matter only for the individual concerned to choose whether or not to read the book if the alternative is that no one do so:

$$x_L \succ^* x_0 \qquad x_0 \succ^* x_P$$

since these views together generate a cycle in social preferences:

$$x_0 \succ^* x_P \succ^* x_L \succ^* x_0$$
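The cycle can be checked mechanically. The following is a minimal sketch, not part of the original notes: the dictionary `prefs` and the helper names are illustrative assumptions, and the liberal principle is encoded simply by letting each person decide the pair that concerns only his own reading.

```python
# Individual rankings from the example, best first.
prefs = {
    "puritan": ["x0", "xP", "xL"],
    "libertine": ["xP", "xL", "x0"],
}


def prefers(person, a, b):
    """True if `person` ranks state a strictly above state b."""
    order = prefs[person]
    return order.index(a) < order.index(b)


def pareto_prefers(a, b):
    """True if everyone strictly prefers a to b."""
    return all(prefers(person, a, b) for person in prefs)


def has_cycle(pairs):
    """True if the strict relation given as (better, worse) pairs contains a cycle."""
    worse_than = {}
    for better, worse in pairs:
        worse_than.setdefault(better, set()).add(worse)

    def reaches(start, target, seen):
        for nxt in worse_than.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen and reaches(nxt, target, seen | {nxt}):
                return True
        return False

    return any(reaches(state, state, frozenset()) for state in worse_than)


social = set()
if pareto_prefers("xP", "xL"):
    social.add(("xP", "xL"))  # Pareto principle
if prefers("libertine", "xL", "x0"):
    social.add(("xL", "x0"))  # the libertine decides between xL and x0
if prefers("puritan", "x0", "xP"):
    social.add(("x0", "xP"))  # the puritan decides between x0 and xP

print(sorted(social), has_cycle(social))
```

Dropping any one of the three social judgments removes the cycle, which is the sense in which the Pareto principle and the two liberal claims cannot all be honoured at once.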

21 Invariance

The options for aggregation of individual preferences depend upon the information assumed to be contained in the individual utilities. A convenient way of capturing this is by defining classes of transformations under which the social choice relation is invariant. We specify the information content of utilities by requiring

$$\succsim^*\left(\phi_1(U^1), \phi_2(U^2), \dots, \phi_H(U^H); X\right) = \succsim^*\left(U^1, U^2, \dots, U^H; X\right)$$

for all $\phi_1, \phi_2, \dots \in \Phi$ where $\Phi$ is some class of transformations.

22 Ordinal comparability assumptions

Ordinal Noncomparability, ONC: Φ contains all increasing φi

Individual preference orderings are known but no interpersonal comparisons of preference intensity are permitted

Corresponds to the assumption that we know no more than we can identify from individual choice behaviour

Ordinal Level Comparability, OLC: Φ contains all common increasing φi

Restriction that transformations must be common means that we can say whether one individual is better off or worse off than another

23 Cardinal comparability assumptions

Cardinal Noncomparability, CNC: Φ contains all increasing $\phi_i = a_i + b_i U$

Affine transformations are permitted but since parameters can be individual specific this is not very different from ONC

Cardinal Unit Comparability, CUC: Φ contains all increasing $\phi_i = a_i + bU$

Cardinal Full Comparability, CFC: Φ contains all increasing $\phi_i = a + bU$

Cardinal Ratio Scale Comparability, CRS: Φ contains all increasing $\phi_i = bU$

Requiring common parameters in admissible affine transformations strengthens comparability

24 Arrow’s Theorem

Arrow’s General Possibility Theorem shows that ONC severely restricts the possibilities for social choice. Arrow proved that no social choice relation can satisfy all of the following:

• Universal Domain

• Pareto Principle

• Independence of irrelevant alternatives

• Nondictatorship

25 Arrow’s requirements I

• Universal Domain: The social choice relation should be complete and transitive for any choice set $X$.

For all $x_A, x_B$ in any $X$, either $x_A \succsim^* x_B$ or $x_B \succsim^* x_A$

For all $x_A, x_B, x_C$ in any $X$, if $x_A \succsim^* x_B$ and $x_B \succsim^* x_C$ then $x_A \succsim^* x_C$

• Pareto Principle: The social choice relation should respect unanimous preference.

$x_A \succsim^* x_B$ if $x_A \succsim_h x_B$ for all $h = 1, \dots, H$

26 Arrow’s requirements II

• Independence of irrelevant alternatives: The restriction of the social choice relation to any pair of outcomes should be independent of the wider choice set $X$.

If $x_A \succsim^* x_B$ when $X = \{x_A, x_B\}$ then $x_A \succsim^* x_B$ whenever $X \supseteq \{x_A, x_B\}$

• Nondictatorship: No one individual should decide the social choice relation.

There is no $h$ such that $x_A \succsim^* x_B$ if and only if $x_A \succsim_h x_B$

27 Interpreting Arrow

Sen interprets the result as arising from "informational famine":

Firstly, welfarism demands that you allow only utility information to enter into social choice decisions

Secondly, assumptions are made so that that utility information is utterly impoverished

28 Outline of a proof

The proof of the theorem can be loosely summarised as follows:

• Take two outcomes and suppose opinion differs between two groups which exhaust the population

• Social preference has to follow the opinion of one or other group

• Their opinion is decisive over this pair and over any choice where opinion is similarly split

• It cannot matter to their decisiveness that their opinion is opposed by the others

• Within the group there must be a decisive subgroup

• It is possible to keep dividing until you arrive at an eventual dictatorship.

29 Almost decisiveness

Consider options x and y.

Suppose the population divides into two groups $A$ and $B$ such that $x \succ_i y$ for $i \in A$ and $y \succ_i x$ for $i \in B$. By ONC this is all the information that social choice can use. If social choice favours $x$ over $y$ then we say that $A$ is almost decisive over $\{x, y\}$. This means that their opinion prevails over $\{x, y\}$ whenever it is unanimous and they are opposed by everyone else.

30 Almost decisiveness implies decisiveness

In fact their opinion must prevail over any pair of outcomes where preferences are similarly split. Suppose there were outcomes $a$ and $b$ such that

$a \succ_i x \succ_i y \succ_i b$ for $i \in A$

$y \succ_i b \succ_i a \succ_i x$ for $i \in B$

Then $x \succsim^* y$ since $A$ is almost decisive over $\{x, y\}$. But $a \succsim^* x$ and $y \succsim^* b$ by the Pareto principle. Therefore $a \succsim^* b$ by transitivity. Preferences over $\{x, y\}$ cannot have mattered to this by IIA. Therefore $A$ must also be almost decisive over $\{a, b\}$.

31 Opposition is irrelevant

The Pareto principle says that social choice is positively responsive to individual preferences so it surely cannot be important to the decisiveness of $A$ that their opinion is opposed by everyone else. Suppose that there is a third outcome $z$ and suppose that

$x \succ_i y \succ_i z$ for $i \in A$

$y \succ_i x$, $y \succ_i z$ for $i \in B$

By the Pareto principle, $y \succ^* z$. Therefore, since $x \succsim^* y$ by almost decisiveness of $A$ and social choice is transitive, it must be that $x \succ^* z$. However we have said nothing about preferences in $B$ between $x$ and $z$. Hence $A$ is decisive over any pair whether opposed or not.

32 There must be a decisive subgroup

Suppose there is a third option $z$ and that we can divide $A$ into two groups $A_1$ and $A_2$ such that

$x \succ_i y \succ_i z$ for $i \in A_1$

$z \succ_i x \succ_i y$ for $i \in A_2$

$y \succ_i z \succ_i x$ for $i \in B$

Notice these are just the sort of preferences that create a majority voting cycle if none of $A_1$, $A_2$ and $B$ constitute a majority. Now $x \succsim^* y$ by decisiveness of $A$. If $z \succsim^* y$ then $A_2$ is decisive since they are the only group with this preference. On the other hand, if $y \succsim^* z$ then, by transitivity, $x \succsim^* z$ and $A_1$ is decisive. Either way, some subgroup is decisive.

33 There must be a dictator

Repeat these arguments until the decisive subgroup has shrunk to a single individual. This individual is therefore a dictator. The proof is complete.

34 Responses to Arrow’s Theorem: Drop transitivity

We can drop the requirement that the social choice relation be transitive so as to allow for example that

• $x \sim^* y$ and $y \sim^* z$ but $x \succ^* z$ (which is allowed by quasitransitivity)

• $x \succ^* y$ and $y \succ^* z$ but $x \sim^* z$ (which is allowed by acyclicity)

This would allow, for example, the Pareto rule which says that

$x \succ^* y$ if everyone prefers $x$ to $y$ but $x \sim^* y$ otherwise

Such a rule is not transitive but it is quasitransitive: if $H = 2$, $x \succ_1 z \succ_1 y$ but $y \succ_2 x \succ_2 z$ then $x \sim^* y$, $y \sim^* z$ but $x \succ^* z$. Other related forms of group dictatorship would be allowed such as saying that $x \succ^* y$ if and only if everyone with blue eyes preferred $x$ to $y$.

35 Responses to Arrow’s Theorem: Restrict the domain

Ruling out certain classes of individual preference orderings would allow nondictatorial social choice relations satisfying Arrow’s other axioms. In particular, majority voting gives a social choice relation which is intransitive for certain configurations of individual preferences – those that give rise to majority voting cycles – but it is possible to rule these out by prohibiting certain preferences at the individual level. Ruling these out means that the step in the proof whereby any decisive group can be shrunk down to a smaller one is not possible.

36 Single-peakedness

Particularly important are single-peaked preferences. Suppose that

the options to be considered can be ordered along a single dimension X

each individual $i$ has a bliss point $\xi_i^* \in X$

if comparing any two outcomes on the same side of $\xi_i^*$ they prefer the one nearer to $\xi_i^*$

$$(x - \xi_i^*)(y - \xi_i^*) > 0 \;\Rightarrow\; x \succsim_i y \text{ iff } |x - \xi_i^*| \le |y - \xi_i^*|$$

37 Impossibility of majority voting cycles

Under single-peakedness, any triple can be ordered in such a way, say $x < y < z$, that the middle option $y$ is never the least preferred. The population can therefore be split into four groups (neglecting indifference):

$A: x \succ_i y \succ_i z$

$B: z \succ_i y \succ_i x$

$C: y \succ_i x \succ_i z$

$D: y \succ_i z \succ_i x$

Pairwise majority voting over these three options cannot produce a cycle. If either $A$ or $B$ has a majority of the population then its preferences prevail and are obviously transitive. If neither has a majority then there are majorities for $y$ over both $x$ and $z$ and there cannot therefore be a cycle.

38 Condorcet winner

An option which beats every other in pairwise votes is said to be a Condorcet winner. If preferences are single peaked then the bliss point of the median voter is a Condorcet winner. This is the median voter theorem of Black. Actual public choice mechanisms will not necessarily select such an outcome, however, since

voting is rarely over single-dimensional issues

the mechanism for aggregating votes may not pick a Condorcet winner even if it is and one exists
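The median voter result can be illustrated numerically. The sketch below is not from the notes: the voters, the options and the distance-based preference rule are all hypothetical, and distance-based preferences are a stronger assumption than single-peakedness itself.

```python
from statistics import median_low


def prefers(bliss, a, b):
    """Assumed preference rule: the option closer to the bliss point is better."""
    return abs(a - bliss) < abs(b - bliss)


def majority_beats(bliss_points, a, b):
    """True if strictly more voters prefer a to b than prefer b to a."""
    votes_a = sum(prefers(bliss, a, b) for bliss in bliss_points)
    votes_b = sum(prefers(bliss, b, a) for bliss in bliss_points)
    return votes_a > votes_b


def condorcet_winner(bliss_points, options):
    """Return the option that beats every other in pairwise votes, or None."""
    for a in options:
        if all(majority_beats(bliss_points, a, b) for b in options if b != a):
            return a
    return None


bliss_points = [1, 2, 4, 7, 9]         # hypothetical voters' bliss points
options = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical one-dimensional options

winner = condorcet_winner(bliss_points, options)
print(winner, median_low(bliss_points))
```

With an odd number of voters the winner coincides with the median bliss point; with an even number, ties become possible and the function may return `None`.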

39 Responses to Arrow’s Theorem: Drop independence

IIA rules out social choice relations such as the Borda rule, plurality voting or instant run-off voting.

• The Borda rule

has each individual rank the alternatives

assigns scores according to the position in the ranking

adds these scores across individuals as the basis for social choice

This is transitive for any preferences within any choice set but the social preference between two elements $x$ and $y$ varies as other elements are added to or subtracted from $X$.

• Plurality voting judges one outcome better than another if it is the most preferred element within $X$ of more people

40 Violations of IIA

Suppose

$\#\{x \succ_i y \succ_i z\} = 3$

$\#\{y \succ_i z \succ_i x\} = 4$

$\#\{z \succ_i x \succ_i y\} = 2$

A majority prefer $x$ to $y$ so if the choice set contains only $x$ and $y$ then application of plurality voting or the Borda rule obviously judges $x \succ^* y$. Suppose however that choice is made from the set $\{x, y, z\}$.

Since y is the most preferred choice of more voters than x in this set, plurality voting puts y above x

The Borda score for y is 20 (ie 2 × 3 + 3 × 4 + 1 × 2 ) and the Borda score for x is 17 (ie 3 × 3 + 1 × 4 + 2 × 2) so the Borda rule also puts y above x.
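The arithmetic in this example can be reproduced with a short script. This is a sketch under the scoring convention used above (the top-ranked of $n$ options scores $n$, the bottom scores 1); the function names are illustrative.

```python
from collections import Counter

# Preference profile from the example: ranking (best first) -> number of voters.
profile = {
    ("x", "y", "z"): 3,
    ("y", "z", "x"): 4,
    ("z", "x", "y"): 2,
}


def restrict(ranking, choice_set):
    """Keep only the options in the choice set, preserving their order."""
    return tuple(option for option in ranking if option in choice_set)


def borda_scores(profile, choice_set):
    """Borda scores when the top of n options scores n and the bottom scores 1."""
    scores = Counter()
    for ranking, voters in profile.items():
        restricted = restrict(ranking, choice_set)
        for position, option in enumerate(restricted):
            scores[option] += voters * (len(restricted) - position)
    return dict(scores)


def plurality_counts(profile, choice_set):
    """Number of voters for whom each option is the most preferred in the set."""
    counts = Counter({option: 0 for option in choice_set})
    for ranking, voters in profile.items():
        counts[restrict(ranking, choice_set)[0]] += voters
    return dict(counts)


for choice_set in ({"x", "y"}, {"x", "y", "z"}):
    print(sorted(choice_set),
          borda_scores(profile, choice_set),
          plurality_counts(profile, choice_set))
```

Both rules rank $x$ above $y$ on the pair alone and $y$ above $x$ once $z$ is added, which is exactly the dependence on the wider choice set that IIA forbids.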

41 Responses to Arrow’s Theorem: Relax invariance

Enriching the quality of the utility information is a final alternative.

If we relax ordinal noncomparability (ONC) to ordinal level comparability (OLC) then it becomes possible to compare levels of utility. This admits, for example, dictatorship by position in a welfare ranking so that social choice can be according to what is preferred by the least well-off person (or the median person).

42 Social welfare functions

Cardinal unit comparability (CUC) is the strongest invariance requirement that still allows the sum of utilities to be used as a social choice criterion. If $\sum_i U_i(x) \ge \sum_i U_i(y)$ then, for any $b > 0$,

$$\sum_i [a_i + bU_i(x)] \ge \sum_i [a_i + bU_i(y)]$$

so the ranking by the sum of utilities is unaffected by admissible transformations.

Cardinal ratio-scale comparability (CRS) allows general homothetic social welfare aggregates such as $\frac{1}{\rho}\sum_i U_i^\rho$.

Numerical full comparability (NFC), which rules out any non-identity transformations at all, allows a general social welfare function $W(U_1, U_2, \dots, U_H)$.

From now onwards this last case will be assumed.

43 Inequality

44 Income based social welfare functions

Suppose now that we are in a situation where outcomes can be compared according to individual utilities depending on a single monetary measure which we call income, $y_i$. We can therefore write social welfare as a function of the vector of incomes

$$\Omega(y_1, y_2, \dots, y_H) = W(U_1(y_1), U_2(y_2), \dots, U_H(y_H)).$$

To simplify exposition, let us assume that individuals are ranked by income so that $y_i \ge y_j$ if $i > j$. To be compatible with the Pareto criterion, social welfare should be increasing in each individual utility. Since utility is increasing in income, $\Omega(\cdot)$ should be increasing in each income.

45 Inequality and income gaps

One aspect of social judgment that we want to build into social welfare is aversion to inequality. To do that we need to decide what constitutes a reduction in inequality. One very strong criterion is the closing up of all income gaps, concertina-fashion,

either in the relative sense that ratios of incomes all become nearer to 1

or in the absolute sense that all income gaps become closer to zero

46 Pigou-Dalton criterion

Another common criterion is the so-called Pigou-Dalton condition:

inequality is reduced by any transfer of income from a richer to a poorer person (Pigou-Dalton transfer or Robin Hood transfer).

This is often regarded as uncontroversial even though Pigou-Dalton transfers do not uniformly close up income gaps. If we start from an income vector (1, 3, 5) and transfer income from the richest to the poorest person (in the true spirit of Robin Hood) so as to get (2, 3, 4) then no one would deny that inequality has fallen. If we were to transfer from the richest to the middle so as to get (1, 4, 4) it would be clear that we had reduced inequality in the upper half of incomes but the poorest person would now be further behind the next poorest.

47 Generalised Lorenz curve

Define the generalised Lorenz curve $G(i)$ as a function of position in the income ranking by cumulating incomes and dividing by population $H$

$$G(i) = \frac{1}{H}\sum_{j=1}^{i} y_j$$

The generalised Lorenz curve therefore runs from 0 to $\bar{y}$. It is convex by construction since the slope is proportional to income and incomes are ranked.

48 Lorenz curve

The Lorenz curve is constructed in the same way but dividing through by mean income $\bar{y} = \frac{1}{H}\sum_i y_i$

$$L(i) = \frac{1}{H\bar{y}}\sum_{j=1}^{i} y_j$$

The Lorenz curve runs from 0 to 1 and is similarly convex.

49 Effect of Pigou-Dalton transfers

A Pigou-Dalton transfer from i to j

leaves the Lorenz curves unchanged outside of the range i to j

but raises the Lorenz curve at all points in between

We say that the vector after the change Lorenz dominates that before the change, meaning that the Lorenz curve is nowhere lower and somewhere higher. If Lorenz dominance holds then, wherever you divide the income ranking, the poorer fraction of the population have a greater share of income in one case than in the other
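The definitions of the Lorenz and generalised Lorenz curves, and of Lorenz dominance, can be made concrete with the income vectors (1, 3, 5), (2, 3, 4) and (1, 4, 4) used earlier. The following is a minimal sketch; the function names and the numerical tolerance are illustrative choices rather than anything specified in the notes.

```python
def generalised_lorenz(incomes):
    """G(i) = (1/H) * (sum of the i lowest incomes), for i = 0, ..., H."""
    ranked = sorted(incomes)
    H = len(ranked)
    curve, running_total = [0.0], 0.0
    for income in ranked:
        running_total += income
        curve.append(running_total / H)
    return curve


def lorenz(incomes):
    """L(i) = G(i) / mean income, so the curve runs from 0 to 1."""
    g = generalised_lorenz(incomes)
    mean_income = g[-1]
    return [value / mean_income for value in g]


def dominates(curve_a, curve_b, tol=1e-12):
    """True if curve_a is nowhere lower and somewhere higher than curve_b."""
    nowhere_lower = all(a >= b - tol for a, b in zip(curve_a, curve_b))
    somewhere_higher = any(a > b + tol for a, b in zip(curve_a, curve_b))
    return nowhere_lower and somewhere_higher


before = [1, 3, 5]
richest_to_poorest = [2, 3, 4]
richest_to_middle = [1, 4, 4]

print(lorenz(before))
print(lorenz(richest_to_poorest))
print(lorenz(richest_to_middle))
```

Both transfers produce a Lorenz curve that dominates the original one, even though the second leaves the poorest person's share unchanged.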

50 Lorenz curves and inequality with fixed mean incomes

Not only do Pigou-Dalton transfers lead to Lorenz dominance but

if mean incomes are the same, then one Lorenz curve dominates another only if it is possible to get from the one to the other by a series of Pigou-Dalton transfers

Also, if mean incomes are unchanged, then there will be Lorenz dominance if either all relative gaps or all absolute gaps are closed up.

51 Pigou-Dalton transfers and social welfare functions

A social welfare function which increases in response to Pigou-Dalton transfers is called Schur-concave.

This will be true if

$$\left(\frac{\partial\Omega}{\partial y_i} - \frac{\partial\Omega}{\partial y_j}\right)(y_i - y_j) \le 0$$

For obvious reasons, such social welfare functions are also referred to as Lorenz-consistent. If the social welfare function is additive and anonymous

$$\Omega(y_1, \dots, y_H) = \sum_i \phi(y_i)$$

then Schur-concavity is equivalent to concavity of the individual utility function $\phi(y)$.

52 Generalised Lorenz dominance

We want to extend comparisons of inequality and social welfare to cases where mean incomes differ. There is a result due to Shorrocks that shows that the only changes which increase all social welfare functions which are increasing and Schur concave are changes which raise the generalised Lorenz curve. If generalised Lorenz dominance holds then what that says is that, wherever you divide the income ranking, the poorer fraction of the population have a greater total income in one case than in the other.

53 Relative and absolute inequality

To extend inequality comparisons we need to make a judgment as to what sort of changes, among those that do not leave the mean unchanged, nonetheless do not affect inequality. The most common view is the relative one under which scaling up incomes by a common factor leaves inequality unchanged. There is also an absolute view under which it is equal translations of all incomes that leave inequality unchanged. If we accept the relative view then we can continue to use Lorenz dominance since scaling of all incomes leaves the Lorenz curve unaltered.

54 Relative inequality measures

A relative inequality measure F (·) is any function that is

Schur convex (ie −F (·) is Schur concave)

invariant to scaling (ie homogeneous of degree zero)

There are many examples such as:

• the coefficient of variation – in other words, the ratio of the standard deviation to the mean

$$\frac{1}{\bar{y}}\sqrt{\frac{1}{H}\sum_i (y_i - \bar{y})^2}$$

55 Gini coefficient

The Gini coefficient is twice the area between the Lorenz curve and the diagonal along which the Lorenz curve would lie if all incomes were the same

$$\frac{2}{H^2\bar{y}}\sum_i i\,(y_i - \bar{y})$$

The link to income gaps can be seen by reexpressing it as the mean relative income gap, with each pair of individuals counted once in the sum

$$\frac{1}{H^2\bar{y}}\sum_i \sum_{j<i} |y_i - y_j|$$
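As a numerical check on the equivalence of these expressions, the sketch below computes the Gini coefficient three ways for a small income vector. The function names are illustrative, and the area version assumes the Lorenz curve is drawn piecewise linearly between the points $i/H$.

```python
def gini_rank(incomes):
    """(2 / (H^2 * mean)) * sum_i i * (y_i - mean), incomes ranked ascending."""
    ranked = sorted(incomes)
    H = len(ranked)
    mean = sum(ranked) / H
    weighted = sum(i * (y - mean) for i, y in enumerate(ranked, start=1))
    return 2.0 * weighted / (H ** 2 * mean)


def gini_gaps(incomes):
    """(1 / (H^2 * mean)) * sum of |y_i - y_j| over pairs counted once."""
    values = list(incomes)
    H = len(values)
    mean = sum(values) / H
    gaps = sum(abs(a - b) for k, a in enumerate(values) for b in values[:k])
    return gaps / (H ** 2 * mean)


def gini_area(incomes):
    """Twice the area between the diagonal and a piecewise-linear Lorenz curve."""
    ranked = sorted(incomes)
    H = len(ranked)
    total = sum(ranked)
    lorenz, running = [0.0], 0.0
    for y in ranked:
        running += y
        lorenz.append(running / total)
    # Trapezoid rule on a grid of width 1/H.
    area_under = sum(lorenz[i - 1] + lorenz[i] for i in range(1, H + 1)) / (2 * H)
    return 1.0 - 2.0 * area_under


incomes = [1, 3, 5]
print(gini_rank(incomes), gini_gaps(incomes), gini_area(incomes))
```

For the vector (1, 3, 5) all three give 8/27, and the progressive transfer to (2, 3, 4) halves the value.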

56 Counter-example: the variance of logarithms

An example of a zero-degree-homogeneous function which might be expected to be Schur-convex but turns out not to be is the variance of logarithms

$$\frac{1}{H}\sum_i \left(\ln y_i - \frac{1}{H}\sum_j \ln y_j\right)^2$$

This fails to be Schur-convex because of the way that the log of the geometric mean $\frac{1}{H}\sum_j \ln y_j$ can be changed by progressive transfers.
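A concrete instance of the failure is easy to construct. In the sketch below the income vectors are hypothetical, chosen so that the transfer takes place between two people who are both far above the geometric mean.

```python
import math


def variance_of_logs(incomes):
    """Mean squared deviation of log incomes from their mean."""
    logs = [math.log(y) for y in incomes]
    mean_log = sum(logs) / len(logs)
    return sum((value - mean_log) ** 2 for value in logs) / len(logs)


before = [1, 1, 1, 50, 100]
after = [1, 1, 1, 60, 90]  # Pigou-Dalton transfer of 10 from the richest to the next richest

print(variance_of_logs(before), variance_of_logs(after))
```

The transfer is progressive and preserves the ranking, yet the variance of logarithms rises (from roughly 4.40 to roughly 4.45), so the measure records an increase in inequality where any Schur-convex measure would record a fall.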

57 Equally distributed equivalent income

Suppose there is a homothetic social choice relation. This can be represented by a linearly homogeneous social welfare function Ω(y). Then we can define the equally distributed equivalent income ξ as that income which if given to everyone would generate the same social welfare as the actual income vector

Ω(y1, y2, . . . , yH) = Ω(ξ, ξ, . . . , ξ) = ξΩ(1, 1,..., 1)

The equally distributed equivalent income $\xi$ is itself in fact a particular homogeneous social welfare function representing the given social choice relation.

58 Atkinson-Kolm-Sen inequality index

Now we can construct a relative inequality index as

I = 1 − ξ/y¯

This index is Schur convex and homogeneous of degree zero as required. It can be thought of as the fraction of income wasted from a social welfare perspective as a consequence of inequality. The idea is attributed to Atkinson, Kolm and Sen, all writing separately.

59 Atkinson inequality index

Atkinson’s particular measure proceeds from a social welfare specification

$$\Omega = \frac{1}{1-\epsilon}\sum_i y_i^{1-\epsilon}$$

where $\epsilon > 0$ is interpreted as an inequality aversion parameter. The corresponding inequality index is

$$1 - \frac{1}{\bar{y}}\left[\frac{1}{H}\sum_i y_i^{1-\epsilon}\right]^{1/(1-\epsilon)}$$
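The index is straightforward to compute. The sketch below follows the formula above; the treatment of $\epsilon = 1$ as the geometric-mean limit is a standard convention added here as an assumption, since the formula itself is undefined at that value.

```python
import math


def equally_distributed_equivalent(incomes, eps):
    """Income which, if given to everyone, yields the same Atkinson welfare."""
    H = len(incomes)
    if eps == 1:
        # Assumed limiting case: the geometric mean.
        return math.exp(sum(math.log(y) for y in incomes) / H)
    return (sum(y ** (1 - eps) for y in incomes) / H) ** (1 / (1 - eps))


def atkinson_index(incomes, eps):
    """I = 1 - xi / mean income."""
    mean = sum(incomes) / len(incomes)
    return 1 - equally_distributed_equivalent(incomes, eps) / mean


incomes = [1, 3, 5]
for eps in (0.5, 1, 2):
    print(eps, round(atkinson_index(incomes, eps), 4))
```

The index is zero for an equal distribution, unchanged when all incomes are scaled by a common factor, and larger the higher the inequality aversion parameter.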

60 Equity and efficiency

One thing that is neat about this is the existence of a social welfare measure

$$\xi = \bar{y}(1 - I)$$

conveniently represented as the product of mean income and an equality measure. This nicely captures the equity and efficiency aspects to social welfare measurement. The whole reasoning here can also be reversed so that one can start with an inequality measure and derive a corresponding social welfare measure using the same formula. If we begin with the Gini coefficient, for example, then we derive a social welfare measure which is proportional to (in fact twice) the area under the generalised Lorenz curve.

61 Taxation and inequality

It is important to establish which sorts of taxes reduce inequality. For the moment we ignore behavioural responses and assume a tax function $T(y)$ applied to fixed incomes $y_i$, $i = 1, \dots, H$. We assume marginal tax rates everywhere between 0 and 1 so that tax payments increase with incomes but the pretax rich remain the posttax rich.

62 Progressive taxation

We say that a tax is progressive if T (y)/y is increasing in y so that the average tax rate rises with income. Equivalently, progressive taxes

have an elasticity of taxes to incomes which is greater than one

have marginal tax rates $T'(y)$ everywhere greater than average tax rates $T(y)/y$

63 Progressive taxation and inequality

There are two points to note about such taxes. Firstly

$$\frac{T(y_j)}{T(y_i)} > \frac{y_j}{y_i} \quad \text{if } y_j > y_i$$

so tax payments are more unequal – which means more heavily concentrated on the rich – than the incomes to which they are applied. Secondly

$$\frac{y_j - T(y_j)}{y_i - T(y_i)} < \frac{y_j}{y_i} \quad \text{if } y_j > y_i$$

so incomes after tax are more equal than incomes before tax. Progressive taxes are the only sorts of taxes which ensure that these facts are true whatever the pretax income distribution.

64 Progressive taxation and the Lorenz curve

As a consequence, given that the ranking of individuals by taxes, by incomes before and after tax all coincide, if taxes are progressive then

the Lorenz curve for incomes after tax $L_{y-T}(i)$ lies above the Lorenz curve for incomes before tax $L_y(i)$

the Lorenz curve for tax payments $L_T(i)$ lies below the Lorenz curve for incomes before tax $L_y(i)$

65 Redistributive effect and departure from proportionality

If we let $\bar{T}$ be the mean tax payment then

$$L_{y-T}(i) = \frac{1}{1 - \bar{T}/\bar{y}}L_y(i) - \frac{\bar{T}/\bar{y}}{1 - \bar{T}/\bar{y}}L_T(i)$$

$$\Rightarrow\; L_{y-T}(i) - L_y(i) = \frac{\bar{T}/\bar{y}}{1 - \bar{T}/\bar{y}}\left[L_y(i) - L_T(i)\right]$$

so that

the redistributive effect $L_{y-T}(i) - L_y(i)$ can be linked to

the departure from proportionality $L_y(i) - L_T(i)$ and the average tax rate $\bar{T}/\bar{y}$
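The decomposition can be verified numerically. The sketch below uses a hypothetical tax schedule (30% on income above an allowance of 10) and hypothetical incomes, neither of which comes from the notes; the schedule has an average rate that is nondecreasing in income and a marginal rate between 0 and 1.

```python
def lorenz(values):
    """Lorenz ordinates at i = 0, ..., H for values listed in the common ranking."""
    total = sum(values)
    ordinates, running = [0.0], 0.0
    for value in values:
        running += value
        ordinates.append(running / total)
    return ordinates


def tax(y):
    """Hypothetical progressive schedule: 30% on income above an allowance of 10."""
    return 0.3 * max(y - 10.0, 0.0)


incomes = sorted([8.0, 15.0, 30.0, 60.0, 120.0])  # hypothetical pretax incomes
taxes = [tax(y) for y in incomes]
net_incomes = [y - t for y, t in zip(incomes, taxes)]
average_rate = sum(taxes) / sum(incomes)  # mean tax over mean income

L_y, L_T, L_net = lorenz(incomes), lorenz(taxes), lorenz(net_incomes)

redistributive_effect = [net - pre for net, pre in zip(L_net, L_y)]
scaled_departure = [average_rate / (1 - average_rate) * (pre - t)
                    for pre, t in zip(L_y, L_T)]

for lhs, rhs in zip(redistributive_effect, scaled_departure):
    print(round(lhs, 6), round(rhs, 6))
```

The two columns agree at every point of the ranking, and both are nonnegative, reflecting that the post-tax Lorenz curve lies above the pretax one while the Lorenz curve for tax payments lies below it.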

66 Poverty

67 Poverty

Poverty is concerned with the failure of incomes to meet basic needs at the bottom end. To operationalise this we need to identify an income level (the poverty line, $z$) minimally sufficient to cover those needs. Whether or not $z$ should itself depend on the income distribution is a debated issue. There are

some needs (food, shelter, etc) that are not particularly dependent on the incomes of others

some needs (dignity, self-respect, etc) that are so

68 Headcount ratio

One obvious measure is the headcount ratio which simply records the proportion of the population below $z$

$$P = \frac{1}{H}\#(y_i < z)$$

This is a superficially attractive, because simple, measure and evidently a popular focus of public debate on the issue. It is a measure however which fails to satisfy some basic

69 Poverty axioms I

An increase in the income of any poor person ought to decrease poverty. The headcount fails to satisfy this since it is completely insensitive to how poor the poor are. An increase in the income of a poor person reduces the headcount only if it takes that person across the poverty line.

70 Shortfall index

A measure that avoids this weakness is the shortfall index $Q$ which is based on reckoning up the total gap between the incomes of the poor and the poverty line. Define for each individual a censored income

$$\tilde{y}_i = \min[y_i, z]$$

and a poverty gap

$$g_i = \max[z - y_i, 0] = z - \tilde{y}_i$$

then

$$Q = \frac{1}{H}\sum_i g_i/z = 1 - \frac{1}{H}\sum_i \tilde{y}_i/z.$$

If $m$ is the mean income of the poor then $Q = P(1 - m/z)$.
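The headcount ratio and shortfall index can be compared on a small example. The incomes and the poverty line below are hypothetical, and the function names are illustrative.

```python
def headcount_ratio(incomes, z):
    """P: proportion of the population with income below the poverty line z."""
    return sum(y < z for y in incomes) / len(incomes)


def shortfall_index(incomes, z):
    """Q: mean poverty gap as a fraction of the poverty line."""
    return sum(max(z - y, 0.0) for y in incomes) / (len(incomes) * z)


z = 8.0                                   # hypothetical poverty line
incomes = [2.0, 4.0, 6.0, 12.0, 20.0]     # hypothetical incomes
poor = [y for y in incomes if y < z]
m = sum(poor) / len(poor)                 # mean income of the poor

P = headcount_ratio(incomes, z)
Q = shortfall_index(incomes, z)
print(P, Q, P * (1 - m / z))

# Raise the income of the poorest person without lifting them over the line.
raised = [3.0, 4.0, 6.0, 12.0, 20.0]
print(headcount_ratio(raised, z), shortfall_index(raised, z))
```

The identity $Q = P(1 - m/z)$ holds, and raising the income of the poorest person lowers $Q$ while leaving $P$ unchanged, which is the insensitivity of the headcount noted above.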

71 Poverty axioms II

A transfer of income from a more to a less poor person reduces poverty. This is a less obvious requirement but one that neither the headcount ratio nor the shortfall index satisfies. It is essentially a demand that the poverty measure be sensitive to the inequality among the poor. Alternatively it can be seen as a demand that the needs of the poorest be recognised as being in more urgent need of relief.

72 Poverty indices I: Foster-Greer-Thorbecke

Foster, Greer and Thorbecke are associated with the proposal to measure poverty by the mean of some convex transformation of poverty gaps

$$R_1 = \frac{1}{H}\sum_i \phi(g_i)$$

for some convex $\phi$

73 Poverty indices II: Clark-Hemming-Ulph

Clark, Hemming and Ulph are associated with the proposal to measure poverty by taking the equally distributed equivalent income of the truncated distribution, say $\tilde{\xi}$, and calculating

$$R_2 = 1 - \tilde{\xi}/z.$$

If we let $\tilde{I}$ be the inequality index of the truncated distribution and $\bar{\tilde{y}}$ its mean then

$$R_2 = 1 - \left(\tilde{\xi}/\bar{\tilde{y}}\right)\left(\bar{\tilde{y}}/z\right) = 1 - (1 - \tilde{I})(1 - Q)$$
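Both indices can be sketched in a few lines. The choices below are assumptions made for illustration only: $\phi(g) = (g/z)^2$ for the Foster-Greer-Thorbecke measure, an Atkinson-type equally distributed equivalent income with parameter `eps` for the Clark-Hemming-Ulph measure, and a hypothetical income vector and poverty line.

```python
def fgt_index(incomes, z):
    """R1 with the assumed convex transformation phi(g) = (g / z) ** 2."""
    gaps = [max(z - y, 0.0) for y in incomes]
    return sum((g / z) ** 2 for g in gaps) / len(incomes)


def chu_components(incomes, z, eps=0.5):
    """Return (R2, I_tilde, Q) for an assumed Atkinson-type welfare function."""
    censored = [min(y, z) for y in incomes]
    H = len(censored)
    mean_censored = sum(censored) / H
    xi = (sum(y ** (1 - eps) for y in censored) / H) ** (1 / (1 - eps))
    R2 = 1 - xi / z
    I_tilde = 1 - xi / mean_censored
    Q = 1 - mean_censored / z
    return R2, I_tilde, Q


z = 8.0
before = [2.0, 4.0, 6.0, 12.0, 20.0]
after = [3.0, 4.0, 5.0, 12.0, 20.0]  # transfer of 1 from a less poor to a poorer person

R2_before, I_before, Q_before = chu_components(before, z)
R2_after, I_after, Q_after = chu_components(after, z)

print(fgt_index(before, z), fgt_index(after, z))
print(R2_before, R2_after, Q_before, Q_after)
```

The transfer leaves the shortfall index unchanged but lowers both $R_1$ and $R_2$, so these measures satisfy the transfer axiom in this example, and the decomposition $R_2 = 1 - (1 - \tilde{I})(1 - Q)$ holds by construction.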

74 Commodity taxation

75 Equivalences and normalisations

76 Linear taxes

Goods priced at p0 before tax are subject to ad valorem taxes at rates

t = (t1, t2, . . . , tn)

Labour is subject to a linear income tax at rate $\tau$ and individuals are paid a lump sum grant $G$. Thus

$$\sum_i p_i^0 (1 + t_i) q_i^h \le w^h(1 - \tau)L^h + G$$

77 Numéraires and untaxed goods

It is important to be clear about issues of normalisation. Only relative prices are determined in equilibrium. Any one price can be set to unity before and after tax (we call such a good a numéraire). That good is untaxed by construction as a normalisation and not a restriction. Typically we take that good to be labour when discussing commodity taxes but that does not mean that there is a restriction prohibiting taxation of leisure for which commodity taxes need to correct. It also makes no sense to talk about which goods are taxed and which subsidised at the optimum except relative to a particular normalisation.

78 Equivalences

What matter are individual budget sets. Suppose taxes on goods and labour are as described so that

$$\sum_i p_i^0 (1 + t_i) q_i^h \le w^h(1 - \tau)L^h + G$$

An identical budget set is achieved with no labour income tax, goods taxes $(t_i + \tau)/(1 - \tau)$ and grant $G/(1 - \tau)$

$$\sum_i p_i^0 \left(\frac{1 + t_i}{1 - \tau}\right) q_i^h \le w^h L^h + \frac{G}{1 - \tau}.$$

Note that a pure labour income tax at rate $\tau$ with grant $G$ is therefore equivalent to a uniform commodity tax at rate $t = \tau/(1 - \tau)$ with a grant $G/(1 - \tau)$.
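The equivalence amounts to dividing the budget constraint through by $1 - \tau$, which can be confirmed numerically. All prices, tax rates and bundles below are hypothetical, and the function names are illustrative.

```python
def slack_original(q, L, w, p0, t, tau, G):
    """Income minus spending under goods taxes t, income tax tau and grant G."""
    spending = sum(p * (1 + ti) * qi for p, ti, qi in zip(p0, t, q))
    return w * (1 - tau) * L + G - spending


def slack_equivalent(q, L, w, p0, t, tau, G):
    """Same bundle under no income tax, goods taxes (t+tau)/(1-tau), grant G/(1-tau)."""
    t_equiv = [(ti + tau) / (1 - tau) for ti in t]
    G_equiv = G / (1 - tau)
    spending = sum(p * (1 + te) * qi for p, te, qi in zip(p0, t_equiv, q))
    return w * L + G_equiv - spending


p0 = [1.0, 2.0, 0.5]   # hypothetical pretax prices
t = [0.10, 0.00, 0.25]  # hypothetical ad valorem goods taxes
tau, G, w = 0.2, 3.0, 10.0

bundles = [
    ([1.0, 1.0, 1.0], 0.5),   # (goods quantities, labour supply)
    ([4.0, 2.0, 6.0], 0.9),
    ([0.0, 0.0, 0.0], 0.0),
]

for q, L in bundles:
    a = slack_original(q, L, w, p0, t, tau, G)
    b = slack_equivalent(q, L, w, p0, t, tau, G)
    print(round(a, 6), round(b * (1 - tau), 6))
```

The budget slack under the second scheme is always the first scaled by $1/(1 - \tau)$, so any bundle is affordable under one scheme exactly when it is affordable under the other.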

79 Normalising pretax prices

From now onward, we assume pretax prices are all equal to unity, $p^0 = 1$. This is another harmless normalisation rather than a loss of generality, achieved by choice of units of measurement for the goods.

80 Welfare analysis of tax reforms

81 Welfare analysis of small tax increase

Consider the marginal effect of raising the tax rate on the $k$th good, $t_k$

$$\frac{\partial}{\partial t_k}V(w, p, G) = -\frac{\partial}{\partial G}V(w, p, G)\,q_k = -\theta q_k$$

where $\theta = \partial V/\partial G$, by Roy’s identity. The first order welfare effect is proportional to consumption of the good. Revenue is $R = \sum_k t_k q_k - G$ so

$$\frac{\partial}{\partial t_k}R(w, p, G) = q_k + \sum_i t_i \frac{\partial f_i}{\partial p_k}$$

82 Comparing tax raising options

Marginal welfare loss per unit of revenue gained is therefore

$$\lambda_k = -\frac{\partial V/\partial t_k}{\partial R/\partial t_k} = \theta\,\frac{q_k}{q_k + \sum_i t_i\,\partial f_i/\partial p_k}$$

This offers a means of comparison of different tax raising options while also being suggestive of optimum design.

If λi > λj then there exists a marginal shift of taxation from good i to good j which can raise welfare without losing revenue.

It is only if all λi, i = 1, . . . , n are equal that no improvement is possible.
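The comparison of tax raising options can be illustrated with a small calculation. This is a sketch under assumed, purely illustrative demand derivatives; the function name `marginal_welfare_cost` is hypothetical.

```python
def marginal_welfare_cost(theta, q, t, dfdp):
    """Return lambda_k = theta * q_k / (q_k + sum_i t_i * df_i/dp_k) for each good k.

    theta : marginal utility of income, dV/dG
    q     : quantities demanded
    t     : tax rates
    dfdp  : dfdp[i][k] is the uncompensated derivative of demand i w.r.t. price k
    """
    n = len(q)
    lam = []
    for k in range(n):
        revenue_effect = q[k] + sum(t[i] * dfdp[i][k] for i in range(n))
        lam.append(theta * q[k] / revenue_effect)
    return lam


# Assumed numbers: good 0 is much more price responsive than good 1.
lam = marginal_welfare_cost(
    theta=1.0,
    q=[10.0, 10.0],
    t=[0.2, 0.2],
    dfdp=[[-5.0, 0.0], [0.0, -1.0]],
)
print(lam)  # lam[0] exceeds lam[1]
```

With these numbers $\lambda$ is higher for the first good, so a marginal shift of taxation from the first good to the second would raise welfare without losing revenue.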

83 Welfare analysis of larger tax increase

For a non-marginal change
\[ \frac{\Delta V}{\Delta t_k} = -\theta q_k \left[ 1 + \frac{1}{2} \frac{\Delta t_k}{t_k} \left( \frac{\partial \ln \theta}{\partial \ln p_k} + \frac{\partial \ln f_k}{\partial \ln p_k} \right) \right]. \]
The higher order approximation brings in terms relating to demand elasticities $\partial \ln f_k/\partial \ln p_k$. Empirical investigation with actual demand estimates suggests such higher order terms may be important for getting the distribution of welfare effects correct.

84 Optimum commodity taxation

85 Optimum taxation of a homogeneous population

Suppose there is a population of identical individuals so that distributional issues can be put aside. The government can raise its revenue requirement $\bar{R}$ only through commodity taxes and therefore tries to solve
\[ \max_t V(w, 1 + t, 0) \quad \text{s.t.} \quad R = \bar{R}. \]
Its first order condition is
\[ \frac{\partial V}{\partial t_k} + \lambda \frac{\partial R}{\partial t_k} = 0 \]
where $\lambda$ is a Lagrange multiplier for the revenue constraint.

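Thus $\lambda_k$ as defined above is equated across goods:
\[ \lambda_k = -\frac{\partial V/\partial t_k}{\partial R/\partial t_k} = \theta\, \frac{q_k}{q_k + \sum_i t_i\, \partial f_i/\partial p_k} = \lambda. \]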

86 Ramsey rules

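Using some demand theory (the Slutsky decomposition) to develop the implications,
\begin{align*}
0 &= -\theta q_k + \lambda \left[ q_k + \sum_i t_i \frac{\partial f_i}{\partial p_k} \right] \\
  &= -\theta q_k + \lambda \left[ q_k + \sum_i t_i \left( \frac{\partial g_i}{\partial p_k} - q_k \frac{\partial f_i}{\partial y} \right) \right].
\end{align*}
Using Slutsky symmetry, $\partial g_i/\partial p_k = \partial g_k/\partial p_i$, this can be rearranged to give an expression
\[ \sum_i t_i \frac{\partial g_k}{\partial p_i} = -q_k \left( 1 - \frac{\theta}{\lambda} - \sum_i t_i \frac{\partial f_i}{\partial y} \right) \equiv (b - 1)\, q_k \]
where $b \equiv (\theta/\lambda) + \sum_i t_i\, \partial f_i/\partial y$ is the marginal social value of income adjusted for the value of any demand-related change in tax revenue. These expressions are the Ramsey rules for optimum commodity taxation.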

87 Marginal social value of income

Multiplying by $t_k$ and summing gives
\[ \sum_i \sum_k t_i t_k \frac{\partial g_k}{\partial p_i} = (b - 1) \sum_k t_k q_k = (b - 1) \bar{R}. \]
The left hand side expression is nonpositive by negativity of the Slutsky matrix. Thus $1 - b$ has the same sign as the revenue requirement $\bar{R}$.

88 Samuelson interpretation

Optimal taxes are zero if $\bar{R} = 0$ since there is no point causing deadweight loss if no revenue needs to be raised. If $\bar{R} > 0$ then
\[ \sum_i \frac{t_i}{1 + t_i}\, \eta^*_{ki} = b - 1 < 0 \]
where $\eta^*_{ki}$ is the compensated elasticity of demand for good $k$ with respect to the price of good $i$, so that taxes are so designed that there are equal proportional compensated falls in demand at the margin, in the interpretation credited to Samuelson. The left hand side is called by Mirrlees an "index of discouragement", which is equated across goods.

89 Inverse elasticity rule

If it were the case that compensated cross-price effects were small, so that $\partial g_k/\partial p_i \simeq 0$ for $i \ne k$, then
\[ \frac{t_i}{1 + t_i} \simeq \frac{b - 1}{\eta^*_{ii}}. \]
Taxes are highest on goods with lowest compensated own price elasticity. This is the so-called inverse elasticity rule. Deadweight loss is lowest where taxes are placed on goods least responsive to taxes. If there are no cross-price elasticities to consider then deadweight loss is approximately given for each good by the area of a triangle beneath a compensated demand function and increases with the square of the tax rate. The inverse elasticity rule minimises the sum of these triangles.
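The inverse elasticity rule can be sketched in a few lines. The value of $b$ and the elasticities below are assumed for illustration only, and the function name `inverse_elasticity_rates` is hypothetical.

```python
def inverse_elasticity_rates(b, own_elasticities):
    """Tax rates solving t_i / (1 + t_i) = (b - 1) / eta_ii for each good.

    b                : net marginal social value of income (b < 1 when revenue is needed)
    own_elasticities : compensated own-price elasticities (negative numbers)
    """
    rates = []
    for eta in own_elasticities:
        x = (b - 1.0) / eta          # this is t_i / (1 + t_i)
        rates.append(x / (1.0 - x))  # invert to recover t_i
    return rates


# Assumed values: good 0 is inelastic, good 1 is elastic.
rates = inverse_elasticity_rates(b=0.9, own_elasticities=[-0.5, -2.0])
print(rates)  # the less elastic good carries the higher rate
```

In a full solution $b$ is itself determined by the revenue requirement, so this only shows how rates vary across goods for a given $b$.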

90 Optimum lump sum taxation

Suppose the government can now use the uniform grant $G$. The first order condition would be
\[ 0 = \theta + \lambda \left[ \sum_i t_i \frac{\partial f_i}{\partial y} - 1 \right] = \lambda (b - 1). \]

Thus b = 1 and ti = 0 for all taxes. All revenue would be raised through the lump sum tax to avoid deadweight loss.

91 Optimum taxation of a heterogeneous population

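For a many-person economy the marginal effect of raising $t_k$ on social welfare is
\[ \frac{\partial}{\partial t_k} W(\cdot) = -\sum_h \frac{\partial W}{\partial V^h} \frac{d}{dy} V^h(\cdot)\, q_k^h = -\sum_h \beta^h q_k^h \]
by Roy's identity. Even in the utilitarian case where $\partial W/\partial V^h = 1$, $\beta^h$ will vary across households because of variation in $\theta^h$. Revenue is $R = \sum_h \sum_k t_k q_k^h - HG$ so
\[ \frac{\partial R}{\partial t_k} = \sum_h q_k^h + \sum_i t_i \sum_h \frac{\partial q_i^h}{\partial p_k}. \]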

92 First order conditions

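The Lagrangean for the optimum tax problem is
\[ W(V^1, V^2, \dots, V^H) + \lambda \left[ \sum_i t_i \sum_h q_i^h - HG - \bar{R} \right] \]
and first order conditions for solution imply
\begin{align*}
0 &= -\sum_h \beta^h q_k^h + \lambda \left[ \sum_h q_k^h + \sum_i t_i \sum_h \frac{\partial f_i^h}{\partial p_k} \right] \\
  &= -\sum_h \beta^h q_k^h + \lambda \left[ \sum_h q_k^h + \sum_i t_i \sum_h \left( \frac{\partial g_i^h}{\partial p_k} - q_k^h \frac{\partial f_i^h}{\partial y} \right) \right].
\end{align*}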

93 Modified Ramsey rules

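Therefore
\begin{align*}
\sum_i t_i \sum_h \frac{\partial g_k^h}{\partial p_i} &= -\sum_h q_k^h \left( 1 - \frac{\beta^h}{\lambda} - \sum_i t_i \frac{\partial f_i^h}{\partial y} \right) \\
&\equiv \sum_h (b^h - 1)\, q_k^h
\end{align*}
where $b^h \equiv (\beta^h/\lambda) + \sum_i t_i\, \partial f_i^h/\partial y$ is the net marginal social value of income. These are the modified Ramsey rules applying to a many-person economy.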

94 Optimum setting of uniform grant

The first order condition with respect to $G$ is now
\[ 0 = \sum_h \beta^h + \lambda \left[ \sum_i t_i \sum_h \frac{\partial f_i^h}{\partial y} - H \right] = \lambda \sum_h (b^h - 1) \]
so that $b^h$ only needs to equal one on average across the population. Uniform lump sum taxes are no longer an optimal way to raise revenue.

95 Taxing according to distributional characteristics

The covariance between $b^h$ and $q_k^h$ matters, reflecting the distributional quality of the good. If a good is consumed heavily by people with a low net social marginal valuation of income (in other words, those whose needs are considered less socially pressing) then that is reason to tax it heavily in the optimum scheme. There is a reason, in other words, to tax luxuries heavily. This still needs to be weighed against the Ramsey efficiency considerations regarding sensitivity of compensated demands to distortion.

96 Linear Engel curves

97 Optimum uniform taxation

Optimal commodity taxes will generally not be uniform but there are certain classes of preferences which can be shown to imply uniformity of the optimum. In particular, Deaton has shown that the assumptions of

• weak separability of commodities and leisure

• linearity of Engel curves

are alone sufficient to imply uniform optimal tax rates.

98 Preferences with linear Engel curves

Such preferences correspond to an indirect utility function of the form
\[ V(w^h, p, G) = \max_L \phi\left( L, \frac{w^h L + G + B(p)}{A(p)} \right) \]
where both $A(p)$ and $B(p)$ are homogeneous of degree one. Labour supply choice solves
\[ \phi_1\left( L, \frac{w^h L + G + B(p)}{A(p)} \right) + \frac{w^h}{A(p)}\, \phi_2\left( L, \frac{w^h L + G + B(p)}{A(p)} \right) = 0 \]
enabling us to write a labour supply function
\[ L(p, w^h, G) = \zeta\left( \frac{w^h}{A(p)}, \frac{G + B(p)}{A(p)} \right) \]
for some $\zeta$.

99 Indirect utility and expenditure function under linearity

The indirect utility function therefore has the form
\[ v(p, w^h, G) = \psi\left( \frac{w^h}{A(p)}, \frac{G + B(p)}{A(p)} \right) \]
for some $\psi$. Inverting gives an expenditure function of the form
\[ e(p, w^h, \upsilon) = A(p)\, \gamma\left( \frac{w^h}{A(p)}, \upsilon \right) - B(p) \]
for some $\gamma$.

100 Hicksian and Marshallian demands under linearity

By Shephard's Lemma the Hicksian demands for goods and supply of labour are
\begin{align*}
g_k(p, w^h, \upsilon) &= A_k \left[ \gamma - \frac{w^h}{A} \gamma_1 \right] - B_k \\
\chi(p, w^h, \upsilon) &= -\gamma_1.
\end{align*}
By adding up and by the homogeneity properties of $A(p)$ and $B(p)$,
\[ \frac{w^h L + G + B}{A} = \gamma - \frac{w^h}{A} \gamma_1. \]
Substituting this into the compensated demand functions above gives Marshallian demands for goods
\[ f_k(w^h, p, G) = \frac{A_k}{A} \left[ w^h L + G + B \right] - B_k \]
with $L$ given as above, so that Engel curves are all linear.

101 Many-person Ramsey rules under linearity I

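Now, noting which terms do and do not vary with $h$, use these expressions to substitute into the two sides of the modified Ramsey rule expression. Firstly, since $\sum_h (b^h - 1) = 0$ when the grant is set optimally, the terms in demand that are common to all $h$ drop out and
\[ \sum_h (b^h - 1)\, q_k^h = \frac{A_k}{A} \sum_h (b^h - 1)\, w^h L^h. \]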

102 Many-person Ramsey rules under linearity II

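Secondly, note that homogeneity requires
\[ \sum_i (1 + t_i) \frac{\partial g_k^h}{\partial p_i} + w^h \frac{\partial g_k^h}{\partial w^h} = 0 \]
so that if there are uniform taxes, $t_i = t$ for all $i$, then
\begin{align*}
\sum_i t_i \sum_h \frac{\partial g_k^h}{\partial p_i} &= \frac{t}{1 + t} \sum_i \sum_h (1 + t) \frac{\partial g_k^h}{\partial p_i} \\
&= -\frac{t}{1 + t} \sum_h w^h \frac{\partial g_k^h}{\partial w^h} \\
&= \frac{t}{1 + t} \sum_h w^h \frac{\partial \chi^h}{\partial p_k} \\
&= \frac{t}{1 + t}\, A_k \sum_h \left( \frac{w^h}{A} \right)^2 \gamma_{11}\left( \frac{w^h}{A}, \upsilon \right)
\end{align*}
where the third line uses the symmetry $\partial g_k^h/\partial w^h = -\partial \chi^h/\partial p_k$.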

103 Optimality of uniformity under linearity

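The only term varying with $k$ on either side is the common factor $A_k$ and therefore equating of the two sides reduces to the same condition for each of the goods
\begin{align*}
& \sum_h (b^h - 1) \frac{w^h L^h}{A} = \frac{t}{1 + t} \sum_h \left( \frac{w^h}{A} \right)^2 \gamma_{11}\left( \frac{w^h}{A}, \upsilon \right) \\
\Rightarrow\quad & \frac{t}{1 + t} = -\frac{\sum_h (b^h - 1)\, w^h L^h}{\sum_h w^h L^h \eta^h_{LL}}
\end{align*}
where $\eta^h_{LL} = \partial \ln \chi^h/\partial \ln w^h = -(w^h/(A L^h))\, \gamma_{11}$ is a compensated labour supply elasticity. Uniform taxes are therefore optimal.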

104 Welfare improving movement towards uniformity

Deaton has also shown that if you add the assumption of

• additive separability across all goods

then not only does the optimum involve uniform taxes but any local movement towards uniformity is necessarily welfare-improving. It would, of course, therefore be redundant to go to data with a demand system of this sort to analyse such a problem.

105 Demand estimation

106 Working-Leser Engel curves

An example of a flexible specification is provided by PIGLOG preferences. The origin of such a specification lies in the so-called Working-Leser form for uncompensated budget share equations. If we let $y$ denote total spending on goods then this specification has

wi(y, p) = ai(p) + bi(p) ln y

This is a specification placing no restriction on price responses but forcing budget shares to be linear in the logarithm of total commodity spending. It has proved to fit many data sets well.

107 PIGLOG preferences

The corresponding indirect utility function is

\[ \max_L \phi(L, V(wL + G, p)) \]
where the separable subutility corresponding to goods has the form
\[ V(y, p) = \frac{\ln [y/A(p)]}{B(p)} \]
where $A(p)$ is homogeneous of degree one and $B(p)$ homogeneous of degree zero. Note that if $B(p) = 1$ then subpreferences over goods are homothetic. Any dependence of budget shares on total spending is determined by the properties of $B(p)$.

108 PIGLOG budget shares

By Roy’s identity

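\begin{align*}
w_i(y, p) &= -\frac{\partial V/\partial \ln p_i}{\partial V/\partial \ln y} \\
&= -B \left[ -\frac{1}{B} \frac{\partial \ln A}{\partial \ln p_i} - \frac{\ln (y/A)}{B^2} \frac{\partial B}{\partial \ln p_i} \right] \\
&= \frac{\partial \ln A}{\partial \ln p_i} + \ln (y/A)\, \frac{\partial \ln B}{\partial \ln p_i}
\end{align*}
so that $a_i(p)$ and $b_i(p)$ can be identified with the price elasticities of the indices $A(p)$ and $B(p)$. Note that if $B(p)$ is constant then budget shares depend only on prices, compatibly with the earlier observations about homotheticity.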

109 Almost Ideal Demand System

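The Almost Ideal Demand System (AIDS) is an example of PIGLOG preferences formed by choosing functional forms for $A(p)$ and $B(p)$:
\begin{align*}
\ln A(p) &= \alpha_0 + \sum_i \alpha_i \ln p_i + \frac{1}{2} \sum_i \sum_j \gamma_{ij} \ln p_i \ln p_j \\
\ln B(p) &= \sum_i \beta_i \ln p_i
\end{align*}
so that
\[ w_i(y, p) = \alpha_i + \sum_j \gamma^*_{ij} \ln p_j + \beta_i \ln [y/A(p)] \]
where $\gamma^*_{ij} = \frac{1}{2} (\gamma_{ij} + \gamma_{ji})$.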
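The AIDS budget share equations are straightforward to evaluate once parameters are given. The following is a minimal sketch with made-up parameter values; the function name `aids_shares` and all numbers are assumptions for illustration, not estimates.

```python
import math


def aids_shares(y, p, alpha0, alpha, beta, gamma):
    """Budget shares w_i = alpha_i + sum_j gamma*_ij ln p_j + beta_i ln[y / A(p)]."""
    n = len(p)
    lnp = [math.log(pi) for pi in p]
    # Translog price index ln A(p)
    ln_A = (
        alpha0
        + sum(alpha[i] * lnp[i] for i in range(n))
        + 0.5 * sum(gamma[i][j] * lnp[i] * lnp[j] for i in range(n) for j in range(n))
    )
    ln_real = math.log(y) - ln_A
    # Symmetrised price coefficients gamma*_ij = (gamma_ij + gamma_ji) / 2
    gstar = [[0.5 * (gamma[i][j] + gamma[j][i]) for j in range(n)] for i in range(n)]
    return [
        alpha[i] + sum(gstar[i][j] * lnp[j] for j in range(n)) + beta[i] * ln_real
        for i in range(n)
    ]


# Assumed parameters: alphas sum to one, betas and gamma rows/columns sum to zero.
params = dict(
    alpha0=0.0,
    alpha=[0.6, 0.4],
    beta=[0.1, -0.1],
    gamma=[[0.05, -0.05], [-0.05, 0.05]],
)
shares = aids_shares(100.0, [1.2, 0.9], **params)
print(shares, sum(shares))
```

Because the assumed parameters sum appropriately across goods, the computed shares add to one, and scaling $y$ and $p$ by a common factor leaves them unchanged.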

110 Integrability restrictions for AIDS I

One advantage of such a functional form is that satisfaction of integrability restrictions is relatively easily imposed.

• Adding Up: If $\sum_i w_i(y, p) = 1$ for any $y$ and $p$ then
\[ \sum_i \alpha_i = 1, \qquad \sum_i \beta_i = \sum_i \gamma^*_{ji} = 0. \]

• Homogeneity: In order that $w_i(\lambda y, \lambda p) = w_i(y, p)$ for any $y$ and $p$ it must be that
\[ \sum_j \gamma^*_{ij} = 0 \]
and that $A(p)$ is linearly homogeneous so that also
\[ \sum_i \alpha_i = 1, \qquad \sum_i \gamma^*_{ji} = 0. \]

111 Integrability restrictions for AIDS II

• Symmetry: Symmetry is satisfied if $\gamma^*_{ij} = \gamma^*_{ji}$.

• Negativity: Negativity is an inequality restriction and is typically checked to hold given the range of variation of $y$ and $p$ in the data.

Note that these restrictions are clearly not independent – for example, adding up and symmetry imply homogeneity.

112 Estimating AIDS

The equations to be estimated would be linear in parameters were it not for the $A(p)$ term scaling total budget. It is possible to estimate by beginning with an approximation to $\ln A(p)$ such as the Stone price index
\[ \ln S(p) = \sum_i \bar{w}_i \ln p_i \]
where $\bar{w}_i$ is the population mean budget share, substituting into the budget share equations and estimating the parameters linearly. These estimates can then be used to update the price index and proceed iteratively until convergence (which is guaranteed).
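The starting point of the iteration, the Stone index, can be computed directly from observed budget shares. This is a minimal sketch of that first step only, with hypothetical function names and made-up data; the linear regression and updating steps are not shown.

```python
import math


def mean_budget_shares(shares_by_household):
    """Average each good's budget share across households."""
    n_obs = len(shares_by_household)
    n_goods = len(shares_by_household[0])
    return [sum(row[i] for row in shares_by_household) / n_obs for i in range(n_goods)]


def log_stone_index(prices, mean_shares):
    """ln S(p) = sum_i wbar_i ln p_i, used as a first approximation to ln A(p)."""
    return sum(w * math.log(p) for w, p in zip(mean_shares, prices))


# Made-up data: two households, two goods.
wbar = mean_budget_shares([[0.5, 0.5], [0.7, 0.3]])
ln_S = log_stone_index([1.1, 0.95], wbar)
ln_real_spending = math.log(100.0) - ln_S  # regressor replacing ln[y / A(p)]
print(wbar, ln_S, ln_real_spending)
```

With $\ln[y/S(p)]$ in place of $\ln[y/A(p)]$ each share equation is linear in $\alpha_i$, $\gamma^*_{ij}$ and $\beta_i$.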

113 Imposing integrability

Adding up does not need to be imposed since budget shares add to 1 in the data. Homogeneity is typically imposed by expressing all nominal quantities relative to the price of a certain numéraire good. Symmetry is either imposed as a restriction on multivariate estimation or imposed, say by minimum distance techniques, after estimating the equations for different goods separately.

114 Relaxing linearity

Linearity of Engel curves in $\ln y$ fits well for some goods but not for all. The PIGLOG specification can be extended to more sophisticated income dependence but integrability is quite restrictive with regard to how this can be done. Gorman shows that the only function of income that can be added linearly is a quadratic term and that that is the limit to further terms which can be added. Such a specification arises from an indirect utility of the form
\[ V(y, p) = \frac{\ln [y/A(p)]}{B(p) + \Phi(p) \ln [y/A(p)]} \]
where $\Phi(p)$ is a further homogeneous-of-degree-zero price index.

115 QUAIDS

Keeping the AIDS specifications for $A(p)$ and $B(p)$ and choosing the convenient form
\[ \Phi(p) = \sum_i \phi_i \ln p_i \]
gives the QUAIDS specification
\[ w_i(y, p) = \alpha_i + \sum_j \gamma^*_{ij} \ln p_j + \beta_i \ln [y/A(p)] + \frac{\phi_i}{B(p)} \ln [y/A(p)]^2 \]
which can again be estimated iteratively, though iterating now over $B(p)$ as well as $A(p)$.

116 Income taxation

117 Optimum linear income taxation

118 Income taxation

Drop the distinction between different goods as objects of taxation. Single consumption good c with a price set to one. The tax function on labour income is denoted T (wL) and the individual therefore chooses hours to solve

max u(c, L) s.t. c = wL − T (wL)

Assume a continuous wage distribution according to distribution function F

119 Linear income taxation

Suppose a constant marginal tax rate τ and uniform grant component G so that c = w(1 − τ)L + G

Let the uncompensated labour supply function be H(w(1 − τ),G) and the compensated labour supply function be χ(w(1 − τ), υ). Very similar to the indirect tax case, given that a linear income tax and a uniform tax on commodities are effectively equivalent.

120 Optimum linear income tax problem

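The government seeks to solve
\[ \max_{\tau, G} \int W(V(w(1 - \tau), G))\, dF \quad \text{s.t.} \quad \tau \int wL\, dF - G - \bar{R} = 0 \]
with Lagrangean
\[ \int \left\{ W(V) + \lambda \left( \tau wL - G - \bar{R} \right) \right\} dF. \]

First order conditions are
\begin{align*}
\int \left\{ W' \frac{\partial V}{\partial G} + \lambda \left( \tau w \frac{\partial L}{\partial G} - 1 \right) \right\} dF &= 0 \\
\int \left\{ W' \frac{\partial V}{\partial \tau} + \lambda \left( \tau w \frac{\partial L}{\partial \tau} + wL \right) \right\} dF &= 0.
\end{align*}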

121 Optimum linear income tax rules

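Noting that
\begin{align*}
\frac{\partial L}{\partial \tau} &= -w \frac{\partial H}{\partial w(1 - \tau)} = -w \frac{\partial \chi}{\partial w(1 - \tau)} - wL \frac{\partial H}{\partial G} \\
\frac{\partial V}{\partial \tau} &= -\theta wL
\end{align*}
and defining the net social marginal valuation of income $b = W'\theta/\lambda + \tau w\, \partial H/\partial G$ with $\theta = \partial V/\partial G$, we have
\begin{align*}
\int (b - 1)\, dF &= 0 \\
\int wL \left( b - 1 + \frac{\tau w}{L} \frac{\partial \chi}{\partial w(1 - \tau)} \right) dF &= 0.
\end{align*}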

122 Interpreting optimum linear income tax rules

The former condition tells us that the net social marginal valuation of income averages unity. Rearranging the latter gives a condition
\[ \frac{\tau}{1 - \tau} = -\frac{\int wL (b - 1)\, dF}{\int wL\, \eta_{LL}\, dF} \]
where $\eta_{LL} = \partial \ln \chi/\partial \ln w(1 - \tau)$ is a compensated labour supply elasticity. This expression only implicitly defines the optimum tax rate $\tau$, since $b$, $L$ and $\eta_{LL}$ all depend on $\tau$.
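For a discrete sample the formula can be evaluated directly if $b$ and $\eta_{LL}$ are treated as given. The sketch below does only that and does not solve the implicit problem; the function name `linear_tax_rate` and the numbers are assumed for illustration.

```python
def linear_tax_rate(earnings, b, eta):
    """Evaluate tau from tau / (1 - tau) = - sum wL (b - 1) / sum wL eta_LL.

    earnings : pretax earnings wL for each individual
    b        : net social marginal valuations of income (averaging one)
    eta      : compensated labour supply elasticities
    """
    numerator = sum(z * (bi - 1.0) for z, bi in zip(earnings, b))
    denominator = sum(z * e for z, e in zip(earnings, eta))
    ratio = -numerator / denominator  # this is tau / (1 - tau)
    return ratio / (1.0 + ratio)


# Assumed values: the low earner has the higher social valuation of income.
tau = linear_tax_rate(earnings=[10.0, 30.0], b=[1.2, 0.8], eta=[0.25, 0.25])
print(tau)  # about 0.286
```

Raising the assumed elasticities lowers the rate, while making $b$ fall more steeply with earnings raises it.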

123 Equity and efficiency

Numerator and denominator of this expression can be seen as conveniently capturing equity and efficiency considerations.

High pretax inequality, as reflected in a covariance between $b$ and pretax earnings $wL$ that is large in magnitude,
\[ \int wL (b - 1)\, dF, \]
is associated with high optimum tax rates.

High labour supply elasticities $\eta_{LL}$, and therefore high deadweight costs of labour taxation, as reflected in
\[ \int wL\, \eta_{LL}\, dF, \]
are associated with low rates.

124 Comparing with optimum commodity tax formulae

Notice that this optimum tax formula is essentially the same as that derived for the optimum uniform tax rate in the case of separability between goods and leisure and linear Engel curves:
\[ \frac{\tau}{1 - \tau} = -\frac{\int wL (b - 1)\, dF}{\int wL\, \eta_{LL}\, dF}, \qquad \frac{t}{1 + t} = -\frac{\int (b - 1)\, wL\, dF}{\int wL\, \eta_{LL}\, dF}. \]

125 Optimum nonlinear income taxation

126 Two ability types

Suppose there are only two ability types, assumed for simplicity to be equally numerous.

The more able type has productivity and therefore pretax wage w1 and the other w2 < w1. Suppose also that utility is additively separable

\[ U(c, L) = u(c) - v(L) \]
where $u$ is concave and $v$ convex so that preferences are convex. The government objective is to maximise utilitarian social welfare

u(c1) + u(c2) − v(L1) − v(L2)

127 First best taxation

Suppose the government can observe productivity types and impose consumption and labour supplies so as to achieve its objective subject to raising revenue per person of $\bar{R}$:

max u(c1) + u(c2) − v(L1) − v(L2) c1,c2,L1,L2

w1L1 + w2L2 − c1 − c2 ≥ 2R¯

First order conditions require

\begin{align*}
u'(c_1) &= \lambda & u'(c_2) &= \lambda \\
v'(L_1) &= w_1 \lambda & v'(L_2) &= w_2 \lambda
\end{align*}
where $\lambda$ is the Lagrange multiplier on the revenue constraint.

128 First best allocation

Therefore

\begin{align*}
u'(c_1) &= u'(c_2) \\
v'(L_1) &= w_1 v'(L_2)/w_2.
\end{align*}

Thus less and more able individuals consume the same, $c_1 = c_2$. Given convexity of the disutility of labour, the more able are expected to work longer hours, $L_1 > L_2$. Both types consume resources at the same rate to generate utility whereas the more able generate more resources for each unit of utility given up in hours of work.
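A closed-form example may help fix ideas. The sketch below assumes the particular forms $u(c) = \ln c$ and $v(L) = L^2/2$, which are not in the notes, with hours in arbitrary units and no upper bound imposed; the function name `first_best` is hypothetical. Under these assumptions $c_i = 1/\lambda$ and $L_i = w_i \lambda$, and the binding revenue constraint gives a quadratic in $\lambda$.

```python
import math


def first_best(w1, w2, R):
    """First best allocation for u(c) = ln c, v(L) = L**2 / 2 (assumed forms).

    First order conditions: 1/c_i = lam and L_i = w_i * lam.
    Revenue constraint: (w1**2 + w2**2) * lam - 2 / lam = 2 * R.
    """
    S = w1 ** 2 + w2 ** 2
    lam = (R + math.sqrt(R ** 2 + 2.0 * S)) / S  # positive root of the quadratic
    c = 1.0 / lam                                # common consumption level
    return c, w1 * lam, w2 * lam, lam


c, L1, L2, lam = first_best(w1=2.0, w2=1.0, R=0.5)
U1 = math.log(c) - L1 ** 2 / 2.0
U2 = math.log(c) - L2 ** 2 / 2.0
print(c, L1, L2, U1, U2)
```

Both types consume the same amount, the more able type works longer hours, and the more able type therefore ends up with lower utility.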

129 Incentive compatibility

The more able are therefore left worse off at the optimum. The assumed ability of the government to observe productivity is therefore critical to its ability to implement the optimum. If it cannot, then the more able have no incentive to reveal themselves since they will be penalised for doing so.

130 Zero marginal tax rates

For the given preferences the marginal rate of substitution between consumption and leisure is given by the ratio of marginal utilities $v'(L)/u'(c)$. For both types the first order conditions also imply
\[ v'(L_i)/u'(c_i) = w_i, \quad i = 1, 2 \]
so that the marginal rate of substitution equals the pretax wage. At the margin each individual's preparedness to give up hours of leisure for additional consumption is equal to the marginal rate of transformation of hours of work into output of the consumption good. In effect, both types are placed at consumption-leisure combinations which would be chosen if facing budget constraints with a zero marginal tax rate.

131 Second best taxation

Suppose that the government can only set individuals a choice of consumption-earnings combinations, each of which must be offered to both types of individual. Then it cannot make the more able worse off than they would be if emulating the less able by earning the same but working the fewer hours necessary for them to reach that earnings level.

u(c1) − v(L1) ≥ u(c2) − v(w2L2/w1).

Note that u(c2)−v(w2L2/w1) ≥ u(c2)−v(L2) so satisfaction of this constraint guarantees that the more able are better off than the less able.

132 Second best problem

The Lagrangean for the optimum tax problem is

\begin{align*}
& u(c_1) + u(c_2) - v(L_1) - v(L_2) \\
& + \lambda \left[ w_1 L_1 + w_2 L_2 - c_1 - c_2 - 2\bar{R} \right] \\
& + \mu \left[ u(c_1) - v(L_1) - u(c_2) + v(w_2 L_2/w_1) \right]
\end{align*}

First order conditions are

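\begin{align*}
(1 + \mu)\, u'(c_1) &= \lambda & (1 - \mu)\, u'(c_2) &= \lambda \\
(1 + \mu)\, v'(L_1) &= w_1 \lambda & v'(L_2) - \mu \frac{w_2}{w_1} v'\left( \frac{w_2 L_2}{w_1} \right) &= w_2 \lambda.
\end{align*}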

133 The more able are better off

We know the incentive compatibility constraint to bind so that $\mu > 0$. From the first two of these conditions
\[ u'(c_1) = \lambda/(1 + \mu) < \lambda/(1 - \mu) = u'(c_2) \]
and therefore $c_1 > c_2$. The more able work longer hours but they are rewarded in higher consumption for doing so.

134 Marginal tax rates

If we look at marginal tax rates we see that

\[ v'(L_1)/u'(c_1) = w_1 \]
so that the labour supply choices of the more able are still undistorted at the margin, exactly as in the first best. However,
\[ \frac{v'(L_2)}{u'(c_2)} = w_2\, \frac{1 - \mu}{1 - \mu \dfrac{w_2}{w_1} \dfrac{v'(w_2 L_2/w_1)}{v'(L_2)}} < w_2 \]
since $w_2/w_1 < 1$ and $v'(w_2 L_2/w_1) < v'(L_2)$. The marginal rate of substitution of the less able is below their pretax wage. The distortion to their labour supply is a necessary feature of observing incentive compatibility at the optimum.

135 Explaining optimality of labour supply distortion I

Suppose that labour supply of more and less able were both undistorted so that
\[ v'(L_1)/u'(c_1) = w_1, \qquad v'(L_2)/u'(c_2) = w_2. \]
Look for the possibility of a welfare-improving deviation consisting of infinitesimal changes $\Delta c_1$, $\Delta L_1$, $\Delta c_2$ and $\Delta L_2$. The change in social welfare would, using the undistortedness, be
\begin{align*}
& u'(c_1) \Delta c_1 - v'(L_1) \Delta L_1 + u'(c_2) \Delta c_2 - v'(L_2) \Delta L_2 \\
&= u'(c_1) \left[ \Delta c_1 - w_1 \Delta L_1 \right] + u'(c_2) \left[ \Delta c_2 - w_2 \Delta L_2 \right].
\end{align*}

136 Explaining optimality of labour supply distortion II

But if the government budget constraint were respected then

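\[ \left[ \Delta c_1 - w_1 \Delta L_1 \right] + \left[ \Delta c_2 - w_2 \Delta L_2 \right] = 0 \]
so that the change in social welfare would be
\[ \left( u'(c_2) - u'(c_1) \right) \left[ \Delta c_2 - w_2 \Delta L_2 \right]. \]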

137 Explaining optimality of labour supply distortion III

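Differentiating the incentive compatibility constraint gives
\[ u'(c_1) \Delta c_1 - v'(L_1) \Delta L_1 - u'(c_2) \Delta c_2 + \frac{w_2}{w_1} v'\left( \frac{w_2}{w_1} L_2 \right) \Delta L_2 = 0 \]
from which, using the government budget constraint and lack of distortion again,
\[ -\left( u'(c_2) + u'(c_1) \right) \left[ \Delta c_2 - w_2 \Delta L_2 \right] = \left[ v'(L_2) - \frac{w_2}{w_1} v'\left( \frac{w_2}{w_1} L_2 \right) \right] \Delta L_2 \]
so that the change in social welfare would be
\[ -\frac{u'(c_2) - u'(c_1)}{u'(c_2) + u'(c_1)} \left[ v'(L_2) - \frac{w_2}{w_1} v'\left( \frac{w_2}{w_1} L_2 \right) \right] \Delta L_2. \]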

138 Explaining optimality of labour supply distortion IV

If there is any desire to redistribute from more to less able then

\[ u'(c_2) - u'(c_1) > 0. \]

A reduction in labour supply of the less able ∆L2 < 0 would raise social welfare if implemented compatibly with the given constraints.

139 Extending to the continuous case

Extending this sort of analysis to more types means adding more incentive compatibility constraints. Most of these are redundant, however, under fairly weak conditions: it is necessary only to ensure that each individual is dissuaded from posing as the next least able person to ensure all constraints are satisfied. The limiting case as one adds more and more incentive compatibility constraints for individuals nearer and nearer in abilities is a differential condition. Write
\[ U(w_1) = u(c_1) - v(L_1) = u(c_2) - v(w_2 L_2/w_1) = U(w_2) + v(L_2) - v(w_2 L_2/w_1) \]
and let $w_2 - w_1$ approach zero to get
\[ dU/dw \ge L v'(L)/w. \]

140 Equivalence to optimum labour supply response

The same condition could be reached by differentiating the utility function and using $v'(L)/u'(c) = w (1 - T'(wL))$.

Differentiating gives
\begin{align*}
\frac{dU}{dw} &= u'(c) \frac{dc}{dw} - v'(L) \frac{dL}{dw} \\
&= \left[ u'(c)\, w (1 - T'(wL)) - v'(L) \right] \frac{dL}{dw} + (1 - T'(wL))\, L u'(c) \\
&= (1 - T'(wL))\, L u'(c) \\
&= L v'(L)/w.
\end{align*}

141 Optimum control formulation

As we reach the continuous case the problem becomes one of maximising
\[ \int \left[ u(c) - v(L) \right] dF \]
subject to budget constraint
\[ \int \left[ wL - c - \bar{R} \right] dF = 0. \]

This can be treated as an optimum control problem, for example by

• taking labour supply L as control variable

• taking utility U = u(c) − v(L) as state variable

• letting the incentive compatibility condition give an equation of motion for U

142 Hamiltonian formulation

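The Hamiltonian for the problem is
\[ \mathcal{H} = \left\{ U + \lambda \left[ wL - c - \bar{R} \right] \right\} f(w) + \mu L v'(L)/w \]
with first order condition
\begin{align*}
0 = \frac{\partial \mathcal{H}}{\partial L} &= \lambda \left\{ w - \left. \frac{\partial c}{\partial L} \right|_U \right\} f(w) + \mu\, \frac{L v''(L) + v'(L)}{w} \\
&= \lambda w T'(wL) f(w) + \mu u'(c) \left( 1 - T'(wL) \right) \left( 1 + \frac{L v''(L)}{v'(L)} \right).
\end{align*}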

143 Costate variable

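The evolution of the costate variable $\mu$ follows
\[ \frac{d\mu}{dw} = -\frac{\partial \mathcal{H}}{\partial U} = -\left( 1 - \frac{\lambda}{u'(c)} \right) f(w) \]
and satisfies the endpoint condition
\[ \mu(w_{\max}) = 0. \]
Thus
\[ \mu = \int_w^{w_{\max}} \left( 1 - \frac{\lambda}{u'(c)} \right) dF. \]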

144 Optimum marginal rates I

Rearranging the first order condition gives
\[ \frac{T'(wL)}{1 - T'(wL)} = -\frac{\mu u'(c)}{\lambda w f(w)} \left( 1 + \frac{L v''(L)}{v'(L)} \right). \]

The endpoint condition $\mu(w_{\max}) = 0$ ensures that the optimum marginal tax rate is zero at the top of the distribution just as in the discrete case. At lower ability levels we would expect a positive $\mu$ and therefore a marginal tax rate between 0 and 1. The variation in $\mu$ across ability levels will depend upon the strength of redistributive social preferences.

145 Optimum marginal rates II

For lower values of w we also have an indication of other factors determining the optimum marginal tax rate. T 0(wL) is higher at points in the distribution where

• wf(w) is low: there is a low concentration of the highly able

• $\partial v'(L)/\partial L$ is high: hours choices are relatively insensitive to after-tax wages.

146 Combining income and commodity taxation

Allowing for nonlinear taxation of labour income has dramatic consequences for the optimum structure of commodity taxes. The earlier discussion of commodity taxation showed that nonuniform commodity taxation could be useful to a redistributive government even if

• a linear labour income tax were available

• labour supply were separable from commodities in preferences

This is no longer true if we remove the restriction of linearity on labour income taxes. The result is due to Atkinson and Stiglitz.

147 Combining income and commodity tax with discrete types

Take the setting with two ability types above and let utility be

\[ U(q, L) = u(q) - v(L) \]
where $q$ is a vector of goods rather than a scalar consumption aggregate, with pretax prices $p^0$. We can rewrite the Lagrangean for the optimum tax problem as
\begin{align*}
& u(q_1) + u(q_2) - v(L_1) - v(L_2) \\
& + \lambda \left[ w_1 L_1 + w_2 L_2 - p^{0\prime} (q_1 + q_2) - 2\bar{R} \right] \\
& + \mu \left[ u(q_1) - v(L_1) - u(q_2) + v(w_2 L_2/w_1) \right]
\end{align*}

148 Optimal commodity taxes

First order conditions with respect to individual commodity demands qhi, h = 1, 2, i = 1, . . . , m are

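\[ (1 + \mu)\, \partial u/\partial q_{1i} = \lambda p_i^0, \qquad (1 - \mu)\, \partial u/\partial q_{2i} = \lambda p_i^0 \]
so that, for both types $h = 1, 2$ and for any two goods $i, j = 1, \dots, m$, the marginal rate of substitution is optimally set equal to the pretax price ratio
\[ \frac{\partial u/\partial q_{hi}}{\partial u/\partial q_{hj}} = p_i^0/p_j^0. \]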

149 Interpreting optimal uniformity

In fact, this implies the even stronger result that even nonlinear commodity taxation would serve no purpose. The result is critically driven by the facts that

• individuals face common pretax prices for commodities

• utility is separable between commodities and leisure

The only restriction on achieving the first best is the incentive compatibility constraint arising because of the government's inability to observe individual ability types. Separability means that observing commodity demands is of no use in alleviating that.

150 Combining income and commodity tax with continuous ability

Obviously this extends also to a continuous ability distribution. Firstly, using the budget constraint and first order conditions for optimum labour supply and commodity choice, we derive the equation of motion for $U$:
\begin{align*}
\frac{dU}{dw} &= \sum_i \frac{\partial u}{\partial q_i} \frac{dq_i}{dw} - v'(L) \frac{dL}{dw} \\
&= \left[ \sum_i p_i \frac{dq_i}{dw} - w (1 - T'(wL)) \frac{dL}{dw} \right] \frac{v'(L)}{w (1 - T'(wL))} \\
&= L v'(L)/w
\end{align*}
which is unchanged. Now rewrite the Hamiltonian
\[ \mathcal{H} = \left\{ U + \lambda \left[ wL - p^{0\prime} q - \bar{R} \right] \right\} f(w) + \mu L v'(L)/w. \]

151 Optimal uniform taxes, again

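First order conditions with respect to $L$ and $q_i$, $i = 2, \dots, m$ include
\begin{align*}
0 = \frac{\partial \mathcal{H}}{\partial q_i} &= \lambda \left\{ -p_1^0 \left. \frac{\partial q_1}{\partial q_i} \right|_U - p_i^0 \right\} f(w) \\
&= \lambda \left\{ p_1^0\, \frac{\partial u/\partial q_i}{\partial u/\partial q_1} - p_i^0 \right\} f(w)
\end{align*}
from which it follows that
\[ p_1/p_i = \frac{\partial u/\partial q_1}{\partial u/\partial q_i} = p_1^0/p_i^0 \]
so that commodity choice is again undistorted.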

152 Practical optimum tax formulae

Optimum tax formulae involve solving some rather challenging simultaneous differential equations. It is perhaps better to regard these formulae as informative about general principles rather than as a practical toolbox. Recent developments have sought to bridge the gap to practical application. One approach (associated with Saez):

• constrain the tax function to be linear over an interval

• allow it to be set unconstrainedly optimally outside

• consider factors determining the optimal constant marginal rate over that interval

153 Optimum high end tax rates

The easiest case to consider is a tax linear above a threshold. Suppose we impose a linear tax above wL = ζ with rate τ and implicit grant G. Consider a change leaving posttax income the same at ζ so that

∆G = ζ∆τ

154 Nonbehavioural revenue effects

First assess the revenue effects.

If no one changes behaviour in response then each person above $\zeta$ pays an additional $(wL - \zeta) \Delta\tau$ so that the mean effect is
\[ \Delta\tau \int_{wL \ge \zeta} (wL - \zeta)\, dF. \]

155 Behavioural revenue effects

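But behaviour does change. Each person above $\zeta$ responds so that earnings change by
\begin{align*}
w \Delta L &= -wL\, \frac{\partial \ln L}{\partial \ln w(1 - \tau)} \frac{\Delta\tau}{1 - \tau} + \zeta\, \frac{w(1 - \tau) L}{G} \frac{\partial \ln L}{\partial \ln G} \frac{\Delta\tau}{1 - \tau} \\
&= -\frac{\Delta\tau}{1 - \tau} (wL\eta - \phi\zeta)
\end{align*}
where $\eta = \partial \ln L/\partial \ln w(1 - \tau)$ and $\phi = (w(1 - \tau) L/G)\, \partial \ln L/\partial \ln G$. Hence there is a mean revenue gain from behavioural responses of
\[ -\frac{\tau \Delta\tau}{1 - \tau} \int_{wL \ge \zeta} (wL\eta - \phi\zeta)\, dF. \]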

156 Welfare effects

Against this revenue loss we need to balance the welfare effects. By the envelope theorem we can ignore behavioural effects here. Suppose that a unit of high earners' income is valued at $\gamma$ times a unit of public funds. The welfare loss is then
\[ \gamma \Delta\tau \int_{wL \ge \zeta} (wL - \zeta)\, dF. \]

157 Optimum balance of revenue and welfare effects

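At the optimum
\[ (1 - \gamma) \Delta\tau \int_{wL \ge \zeta} (wL - \zeta)\, dF - \frac{\tau \Delta\tau}{1 - \tau} \int_{wL \ge \zeta} (wL\eta - \phi\zeta)\, dF = 0 \]
and therefore
\[ \frac{\tau}{1 - \tau} = \frac{(1 - \gamma) \int_{wL \ge \zeta} (wL - \zeta)\, dF}{\int_{wL \ge \zeta} (wL\eta - \phi\zeta)\, dF}. \]
Notice here that $\gamma$ may be a matter of value judgment but everything else is observable or estimable.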

158 Practical formulae

If there is a known highest earnings level then the tax rate goes to zero at that point. Suppose not, but
\[ \frac{\int_{wL \ge \zeta} (wL - \zeta)\, dF}{\zeta (1 - F(\zeta))} \to \alpha - 1 \]
as $\zeta \to \infty$, and $\gamma$, $\phi$ and $\eta$ are roughly constant above $\zeta$. Then we have an applicable formula at high income levels

τ/(1 − τ) → (1 − γ)(α − 1)/(αη − φ)
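The limiting formula is easy to evaluate for given parameter values. The sketch below uses assumed, purely illustrative numbers; the function name `top_marginal_rate` is hypothetical.

```python
def top_marginal_rate(gamma, alpha, eta, phi):
    """Solve tau / (1 - tau) = (1 - gamma) (alpha - 1) / (alpha * eta - phi) for tau.

    gamma : social value of high earners' income relative to public funds
    alpha : limiting ratio of mean earnings above the threshold to the threshold
    eta   : uncompensated elasticity of labour supply w.r.t. the net wage
    phi   : income effect term as defined in the text
    """
    ratio = (1.0 - gamma) * (alpha - 1.0) / (alpha * eta - phi)
    return ratio / (1.0 + ratio)


# Assumed values: no social weight on top incomes and no income effects.
print(top_marginal_rate(gamma=0.0, alpha=2.0, eta=0.5, phi=0.0))  # 0.5
```

A higher $\gamma$ or a higher elasticity $\eta$ lowers the implied top rate.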

159