ECONG011 Public Microeconomics
Ian Preston
1 Introduction
2 Individuals
We consider as starting point a competitive economy without government. Suppose that there are
m different consumption goods
n different types of labour
There are H individuals, h = 1,...,H, who
have endowments of goods $\omega^h$ and consume quantities $q^h$
each have an endowment of time, normalised to 1, from which they supply labour $L^h$
Individual preferences are captured in utility functions $u^h(q^h, L^h)$
3 Firms
There are K firms, k = 1,...,K, which undertake production plans which involve
using labour $l^k$
to produce quantities of goods $y^k$
according to technological requirements, say $G^k(y^k, l^k) \le 0$.
4 Trade
To simplify, we assume that
each firm produces only one type of good
each individual supplies only one type of labour
Furthermore the only types of trade that occur are
sales of labour from individuals to firms and
sales of goods from firms to individuals
In particular this avoids complications concerned with the tax treatment of trades in goods or labour between firms and firms or between individuals and individuals.
5 Prices
We begin without any government. Suppose both firms and consumers behave as price takers. Let
the pretax price vector for goods be p0
the pretax wage vector be w
6 Competitive behaviour
Firms maximise profits given technology
$$\max_{l^k,\,y^k} \pi^k = p^0_k y^k - w' l^k \quad \text{s.t.} \quad G^k(y^k, l^k) \le 0.$$
Profits are then shared among individuals according to ownership shares $\delta_{hk}$.
Individuals choose goods demands and labour supplies to maximise utility given their budget
$$\max_{q^h,\,L^h} u^h(q^h, L^h) \quad \text{s.t.} \quad p^{0\prime}(q^h - \omega^h) - \sum_k \delta_{hk}\pi^k - w^h L^h \le 0.$$
If we assume constant returns to scale then profits are zero in equilibrium.
7 Competitive equilibrium
A competitive equilibrium consists in prices and wages that lead to a feasible allocation of goods and labour
$$\sum_h (q^h - \omega^h) - \sum_k y^k \le 0$$
$$\sum_h L^h - \sum_k l^k \ge 0.$$
Existence of an equilibrium is guaranteed given convexity of preferences and technology.
8 Welfare theorems
By the First Fundamental Theorem any such equilibrium is Pareto efficient.
By the Second Fundamental Theorem, any Pareto efficient allocation can be sustained in such an economy as a competitive equilibrium given an appropriate redistribution of endowments.
These are standard results of earlier microeconomics courses.
9 Minimal role for government
Trade between agents in a competitive economy needs the protection of a legal system defining property rights and enforcing their recognition. This itself requires a form of government with expenses which need to be covered by the raising of public resources. The security offered by a functioning judicial system can be considered as a foundational example of a public good.
10 Public goods and externalities
The incorporation into the model of the existence of other public goods raises further issues about the economic role of government. Public goods can be privately provided but there are strong economic reasons to think it may be more efficient for government to act as provider. The existence of externalities associated with private goods raises related issues.
11 Equity
The particular competitive outcome associated with a specific initial distribution of endowments and abilities may well be considered unacceptably inequitable compared to others that might follow from a redistribution of resources. Government may arise as the agent effecting such a redistribution through taxation and disbursement of public funds. To do so effectively the government needs to collect information. The manner in which it implements taxation should not be such as to discourage individuals from revealing that information where it is needed.
12 Other roles for government
The assumption of price-taking behaviour may be inappropriate and the existence of monopoly power raises a case for government regulation. The assumption that the economy settles naturally into equilibrium may also be unwarranted and point towards a case for macroeconomic intervention. These are important issues concerned with the role of government but dealt with in other courses.
13 Social welfare, inequality and poverty
14 Social choice
15 Social choice
Before proceeding to discussion of the design of schemes for taxation and public provision, we need to establish a criterion to judge the outcomes of government intervention. Suppose then that the government has to choose a social state x drawn from a choice set X. These states could be thought of as defining points in an Edgeworth box in a purely competitive economy with private goods, or as distinguished by things such as
tax schedules
levels of public provision of some good
16 Social choice relation
Individuals have preferences $\succsim_h$, $h = 1, \ldots, H$, over those states as captured in utility functions
$$U = (u^1, u^2, \ldots, u^H)$$
What we want is to determine a social choice relation $\succsim^*$ over X as a function of the individual utilities U.
17 Welfarism
The view that only satisfaction of preferences matters to social evaluation is known as welfarism. It is often taken for granted in economic discussion but it is restrictive: it rules out consideration of certain things sometimes considered important, such as rights and duties. For example:
High rates of tax on alcohol may be motivated by moral disapproval of drinking
Taxation of labour may be influenced by views on the virtue of work
Certain libertarian perspectives take a view of property rights that makes them regard redistributive taxation as theft
18 Impossibility of a Paretian liberal
Problems arise if preferences can have regard to the activities of others. As an example, we can take Sen’s proof of the impossibility of a Paretian liberal. Suppose there are two individuals, a puritan P and a libertine L. There is a salacious novel and we consider social choice over three states
the novel is read by no one, $x_0$
the novel is read by the puritan alone, $x_P$
the novel is read by the libertine alone, $x_L$
19 Impossibility of a Paretian liberal: Preferences
The puritan would rather no one read the novel but if anyone is going to read it then he would rather it were him than the libertine:
$$x_0 \succ_P x_P \succ_P x_L$$
The libertine would least prefer that the book be unread but he also prefers that the puritan read it than that he himself does:
$$x_P \succ_L x_L \succ_L x_0$$
20 Impossibility of a Paretian liberal: Social choice
By the (welfarist) Pareto principle $x_P \succ^* x_L$ since everyone shares that preference. But this is inconsistent with the liberal view that it is a matter only for the individual concerned to choose whether or not to read the book if the alternative is that no one do so:
$$x_L \succ^* x_0 \qquad x_0 \succ^* x_P$$
since these views together generate a cycle in social preferences:
$$x_0 \succ^* x_P \succ^* x_L \succ^* x_0$$
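The cycle can be checked mechanically. The following is a minimal sketch, not part of the original notes: the state labels, the function names and the way the liberal judgements are encoded are all assumptions made for illustration.

```python
# Individual rankings from the slides, best first (state labels are just strings).
puritan = ["x0", "xP", "xL"]
libertine = ["xP", "xL", "x0"]

def pareto_pairs(rankings):
    """All (better, worse) pairs on which every individual agrees."""
    states = rankings[0]
    return {
        (a, b)
        for a in states
        for b in states
        if a != b and all(r.index(a) < r.index(b) for r in rankings)
    }

# Liberal judgements: each person alone decides whether he himself reads the book.
liberal_pairs = {
    ("xL", "x0"),  # the libertine prefers reading it himself to nobody reading it
    ("x0", "xP"),  # the puritan prefers nobody reading it to reading it himself
}

social_strict = pareto_pairs([puritan, libertine]) | liberal_pairs

def has_cycle(pairs):
    """Return True if the strict relation given by `pairs` contains a cycle."""
    graph = {}
    for better, worse in pairs:
        graph.setdefault(better, set()).add(worse)

    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reachable(nxt, target, seen):
                    return True
        return False

    return any(reachable(node, node, set()) for node in graph)
```

The liberal judgements alone are acyclic; it is only once the single unanimous (Pareto) judgement is added that no state is left socially undominated.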
21 Invariance
The options for aggregation of individual preferences depend upon the information assumed to be contained in the individual utilities. A convenient way of capturing this is by defining classes of transformations under which the social choice relation is invariant. We specify the information content of utilities by requiring
$$\succsim^*\left(\phi_1(U^1), \phi_2(U^2), \ldots, \phi_H(U^H); X\right) = \succsim^*\left(U^1, U^2, \ldots, U^H; X\right)$$
for all $\phi_1, \phi_2, \ldots \in \Phi$ where $\Phi$ is some class of transformations.
22 Ordinal comparability assumptions
Ordinal Noncomparability, ONC: Φ contains all increasing φi
Individual preference orderings are known but no interpersonal comparisons of preference intensity are permitted
Corresponds to the assumption that we know no more than we can identify from individual choice behaviour
Ordinal Level Comparability, OLC: Φ contains all common increasing φi
Restriction that transformations must be common means that we can say whether one individual is better off or worse off than another
23 Cardinal comparability assumptions
Cardinal Noncomparability CNC: Φ contains all increasing φi = ai + biU
Affine transformations are permitted but since parameters can be individual specific this is not very different from ONC
Cardinal Unit Comparability CUC: Φ contains all increasing φi = ai + bU
Cardinal Full Comparability CFC: Φ contains all increasing φi = a + bU
Cardinal Ratio Scale Comparability CRS: Φ contains all increasing φi = bU
Requiring common parameters in admissible affine transformations strengthens comparability
24 Arrow’s Theorem
Arrow’s General Possibility Theorem shows that ONC severely restricts the possibility for social choice. Arrow proved that no social choice relation can satisfy all of the following:
• Universal Domain
• Pareto Principle
• Independence of irrelevant alternatives
• Nondictatorship
25 Arrow’s requirements I
• Universal Domain: The social choice relation should be complete and transitive for any choice set X.
For all $x_A, x_B$ in any X, either $x_A \succsim^* x_B$ or $x_B \succsim^* x_A$
For all $x_A, x_B, x_C$ in any X, if $x_A \succsim^* x_B$ and $x_B \succsim^* x_C$ then $x_A \succsim^* x_C$
• Pareto Principle: The social choice relation should respect unanimous preference.
$x_A \succsim^* x_B$ if $x_A \succsim_h x_B$ for all $h = 1, \ldots, H$
26 Arrow’s requirements II
• Independence of irrelevant alternatives: The restriction of the social choice relation to any pair of outcomes should be independent of the wider choice set X.
If $x_A \succsim^* x_B$ when $X = \{x_A, x_B\}$ then $x_A \succsim^* x_B$ whenever $X \supseteq \{x_A, x_B\}$
• Nondictatorship: No one individual should decide the social choice relation.
There is no h such that $x_A \succsim^* x_B$ if and only if $x_A \succsim_h x_B$
27 Interpreting Arrow
Sen interprets the result as arising from "informational famine":
Firstly, welfarism demands that you allow only utility information to enter into social choice decisions
Secondly, assumptions are made so that that utility information is utterly impoverished
28 Outline of a proof
The proof of the theorem can be loosely summarised as follows:
• Take two outcomes and suppose opinion differs between two groups which exhaust the population
• Social preference has to follow the opinion of one or other group
• Their opinion is decisive over this pair and over any choice where opinion is similarly split
• It cannot matter to their decisiveness that their opinion is opposed by the others
• Within the group there must be a decisive subgroup
• It is possible to keep dividing until you arrive at an eventual dictatorship.
29 Almost decisiveness
Consider options x and y.
Suppose the population divides into two groups A and B such that $x \succ_i y$ for $i \in A$ and $y \succ_i x$ for $i \in B$. By ONC this is all the information that social choice can use. If social choice favours x over y then we say that A is almost decisive over {x, y}. This means that their opinion prevails over {x, y} whenever it is unanimous and they are opposed by everyone else.
30 Almost decisiveness implies decisiveness
In fact their opinion must prevail over any pair of outcomes where preferences are similarly split. Suppose there were outcomes a and b such that
$a \succ_i x \succ_i y \succ_i b$ for $i \in A$
$y \succ_i b \succ_i a \succ_i x$ for $i \in B$
Then $x \succsim^* y$ since A is almost decisive over {x, y}. But $a \succsim^* x$ and $y \succsim^* b$ by the Pareto principle. Therefore $a \succsim^* b$ by transitivity. Preferences over {x, y} cannot have mattered to this by IIA. Therefore A must also be almost decisive over {a, b}.
31 Opposition is irrelevant
The Pareto principle says that social choice is positively responsive to individual preferences so it surely cannot be important to the decisiveness of A that their opinion is opposed by everyone else. Suppose that there is a third outcome z and suppose that
$x \succ_i y \succ_i z$ for $i \in A$
$y \succ_i x$, $y \succ_i z$ for $i \in B$
By the Pareto principle, $y \succ^* z$. Therefore, since $x \succsim^* y$ by almost decisiveness of A and social choice is transitive, it must be that $x \succ^* z$. However we have said nothing about preferences in B between x and z. Hence A is decisive over any pair whether opposed or not.
32 There must be a decisive subgroup
Suppose there is a third option z and that we can divide A into two groups A1 and A2 such that
$x \succ_i y \succ_i z$ for $i \in A_1$
$z \succ_i x \succ_i y$ for $i \in A_2$
$y \succ_i z \succ_i x$ for $i \in B$
Notice these are just the sort of preferences that create a majority voting cycle if none of A1, A2 and B constitute a majority. Now $x \succsim^* y$ by decisiveness of A. If $z \succsim^* y$ then A2 is decisive since they are the only group with this preference. On the other hand, if $y \succsim^* z$ then, by transitivity, $x \succsim^* z$ and A1 is decisive. Either way, some subgroup is decisive.
33 There must be a dictator
Repeat these arguments until the decisive subgroup has shrunk to a single individual. This individual is therefore a dictator. The proof is complete.
34 Responses to Arrow’s Theorem: Drop transitivity
We can drop the requirement that the social choice relation be transitive so as to allow for example that
• $x \sim^* y$ and $y \sim^* z$ but $x \succ^* z$ (which is allowed by quasitransitivity)
• $x \succ^* y$ and $y \succ^* z$ but $x \sim^* z$ (which is allowed by acyclicity)
This would allow, for example, the Pareto rule which says that
$x \succ^* y$ if everyone prefers x to y but $x \sim^* y$ otherwise
Such a rule is not transitive but it is quasitransitive: if H = 2, $x \succ_1 z \succ_1 y$ but $y \succ_2 x \succ_2 z$ then $x \sim^* y$, $y \sim^* z$ but $x \succ^* z$. Other related forms of group dictatorship would be allowed such as saying that $x \succ^* y$ if and only if everyone with blue eyes preferred x to y.
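The two-person example can be reproduced directly. This is a small sketch under the assumption that rankings are strict and listed best first; the function names are made up for the illustration.

```python
from itertools import permutations

def pareto_rule(rankings, a, b):
    """Return '>' if everyone ranks a above b, '<' if everyone ranks b above a, else '~'."""
    if all(r.index(a) < r.index(b) for r in rankings):
        return ">"
    if all(r.index(b) < r.index(a) for r in rankings):
        return "<"
    return "~"

def strict_part_is_transitive(rankings, options):
    """Quasitransitivity: a > b and b > c must imply a > c."""
    for a, b, c in permutations(options, 3):
        if pareto_rule(rankings, a, b) == ">" and pareto_rule(rankings, b, c) == ">":
            if pareto_rule(rankings, a, c) != ">":
                return False
    return True

# Profile from the slide: person 1 has x > z > y, person 2 has y > x > z.
rankings = [["x", "z", "y"], ["y", "x", "z"]]
```

Indifference fails to be transitive here (x is indifferent to y and y to z, yet x is strictly preferred to z), while the strict part of the relation remains transitive for every two-person profile over three options.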
35 Responses to Arrow’s Theorem: Restrict the domain
Ruling out certain classes of individual preference orderings would allow nondictatorial social choice relations satisfying Arrow’s other axioms. In particular, majority voting gives a social choice relation which is intransitive for certain configurations of individual preferences – those that give rise to majority voting cycles – but it is possible to rule these out by prohibiting certain preferences at the individual level. Ruling these out means that the step in the proof whereby any decisive group can be shrunk down to a smaller one is not possible.
36 Single-peakedness
Particularly important are single-peaked preferences. Suppose that
the options to be considered can be ordered along a single dimension X
each individual i has a bliss point $\xi_i^* \in X$
if comparing any two outcomes on the same side of $\xi_i^*$ they prefer the one nearer to $\xi_i^*$
$$(x - \xi_i^*)(y - \xi_i^*) > 0 \;\Rightarrow\; x \succsim_i y \text{ iff } |x - \xi_i^*| \le |y - \xi_i^*|$$
37 Impossibility of majority voting cycles
Under single-peakedness, any triple can be ordered, say x < y < z, such that the middle option y is never the least preferred. The population can therefore be split into four groups (neglecting indifference):
A : $x \succ_i y \succ_i z$
B : $z \succ_i y \succ_i x$
C : $y \succ_i x \succ_i z$
D : $y \succ_i z \succ_i x$
Pairwise majority voting over these three options cannot produce a cycle.
If either A or B has a majority of the population then its preferences prevail and are obviously transitive
If neither has a majority then there are majorities for y over both x and z and there cannot therefore be a cycle
38 Condorcet winner
An option which beats every other in pairwise votes is said to be a Condorcet winner. If preferences are single peaked then the bliss point of the median voter is a Condorcet winner. This is the median voter theorem of Black. Actual public choice mechanisms will not necessarily select such an outcome, however, since
voting is rarely over single-dimensional issues
the mechanism for aggregating votes may not pick a Condorcet winner even if the issue is single-dimensional and a Condorcet winner exists
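The median voter result can be illustrated with a toy electorate. The sketch below assumes symmetric, distance-based preferences (one special case of single-peakedness, stronger than the definition given earlier) and an odd number of voters; the options and bliss points are invented for the example.

```python
from statistics import median_low

def prefers(bliss, x, y):
    """True if a voter with the given bliss point strictly prefers x to y
    (assumption: utility falls with distance from the bliss point)."""
    return abs(x - bliss) < abs(y - bliss)

def beats(x, y, bliss_points):
    """True if x wins a pairwise majority vote against y."""
    votes_x = sum(prefers(b, x, y) for b in bliss_points)
    votes_y = sum(prefers(b, y, x) for b in bliss_points)
    return votes_x > votes_y

def condorcet_winner(options, bliss_points):
    """Return the option that beats every other option pairwise, or None."""
    for x in options:
        if all(beats(x, y, bliss_points) for y in options if y != x):
            return x
    return None

options = list(range(11))          # hypothetical one-dimensional policy options
bliss_points = [1, 2, 4, 7, 9]     # hypothetical voters, one bliss point each
```

With these numbers the median bliss point wins every pairwise contest, because the median voter together with everyone on one side always forms a majority against any option on the other side.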
39 Responses to Arrow’s Theorem: Drop independence
IIA rules out social choice relations such as the Borda rule, plurality voting or instant run-off voting.
• The Borda rule
has each individual rank the alternatives
assigns scores according to the position in the ranking
adds these scores across individuals as the basis for social choice
This is transitive for any preferences within any choice set but the social preference between two elements x and y varies as other elements are added to or removed from X.
• Plurality voting judges one outcome better than another if it is the most preferred element within X of more people
40 Violations of IIA
Suppose
$\#\{x \succ_i y \succ_i z\} = 3$
$\#\{y \succ_i z \succ_i x\} = 4$
$\#\{z \succ_i x \succ_i y\} = 2$
A majority prefer x to y so if the choice set contains only x and y then application of plurality voting or the Borda rule obviously judges $x \succ^* y$. Suppose however that choice is made from the set {x, y, z}.
Since y is the most preferred choice of more voters than x in this set, plurality voting puts y above x
The Borda score for y is 20 (ie 2 × 3 + 3 × 4 + 1 × 2 ) and the Borda score for x is 17 (ie 3 × 3 + 1 × 4 + 2 × 2) so the Borda rule also puts y above x.
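The scores in this example can be recomputed for both choice sets. The following is a sketch only: the helper names are hypothetical, and the Borda scoring convention (n points for first place down to 1 for last) is the one implied by the arithmetic on the slide.

```python
# Preference profile from the slide: ranking (best first) -> number of voters.
profile = {
    ("x", "y", "z"): 3,
    ("y", "z", "x"): 4,
    ("z", "x", "y"): 2,
}

def restrict(profile, options):
    """Restrict every ranking to the options in the choice set."""
    restricted = {}
    for ranking, count in profile.items():
        key = tuple(o for o in ranking if o in options)
        restricted[key] = restricted.get(key, 0) + count
    return restricted

def borda_scores(profile, options):
    """Borda scores: with n options, first place scores n and last place scores 1."""
    scores = {o: 0 for o in options}
    for ranking, count in restrict(profile, options).items():
        n = len(ranking)
        for position, option in enumerate(ranking):
            scores[option] += (n - position) * count
    return scores

def plurality_scores(profile, options):
    """Number of voters whose most preferred element of the choice set is each option."""
    scores = {o: 0 for o in options}
    for ranking, count in restrict(profile, options).items():
        scores[ranking[0]] += count
    return scores
```

On {x, y} both rules rank x above y; on {x, y, z} both rank y above x, which is exactly the dependence on the wider choice set that IIA prohibits.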
41 Responses to Arrow’s Theorem: Relax invariance
Enriching the quality of the utility information is a final alternative.
If we relax ordinal noncomparability (ONC) to ordinal level comparability (OLC) then it becomes possible to compare levels of utility. This admits, for example, dictatorship by position in a welfare ranking so that social choice can be according to what is preferred by the least well-off person (or the median person).
42 Social welfare functions
Cardinal unit comparability (CUC) is the strongest invariance requirement allowing utilitarianism. If $\sum_i U_i(x) \ge \sum_i U_i(y)$ then, for any $b > 0$,
$$\sum_i [a_i + bU_i(x)] \ge \sum_i [a_i + bU_i(y)]$$
so the sum of utilities can be used as a social choice criterion.
Cardinal ratio-scale comparability (CRS) allows general homothetic social welfare aggregates such as $\frac{1}{\rho}\sum_i U_i^\rho$.
Numerical full comparability (NFC), which rules out any non-identity transformations at all, allows a general social welfare function $W(U_1, U_2, \ldots, U_H)$. From now onwards this last case will be assumed.
43 Inequality
44 Income based social welfare functions
Suppose now that we are in a situation where outcomes can be compared according to individual utilities depending on a single monetary measure which we call income, $y_i$. We can therefore write social welfare as a function of the vector of incomes
$$\Omega(y_1, y_2, \ldots, y_H) = W(U_1(y_1), U_2(y_2), \ldots, U_H(y_H)).$$
To simplify exposition, let us assume that individuals are ranked by income so that $y_i \ge y_j$ if $i > j$. To be compatible with the Pareto criterion, social welfare should be increasing in each individual utility; since utility is increasing in income, $\Omega(\cdot)$ should be increasing in each income.
45 Inequality and income gaps
One aspect of social judgment that we want to build into social welfare is aversion to inequality. To do that we need to decide what constitutes a reduction in inequality. One very strong criterion is the closing up of all income gaps, concertina-fashion,
either in the relative sense that ratios of incomes all become nearer to 1
or in the absolute sense that all income gaps become closer to zero
46 Pigou-Dalton criterion
Another common criterion is the so-called Pigou-Dalton condition:
inequality is reduced by any transfer of income from a richer to a poorer person (Pigou-Dalton transfer or Robin Hood transfer).
This is often regarded as uncontroversial even though Pigou-Dalton transfers do not uniformly close up income gaps. If we start from an income vector (1, 3, 5) and transfer income from the richest to the poorest person (in the true spirit of Robin Hood) so as to get (2, 3, 4) then no-one would deny that inequality has fallen. If we were to transfer from the richest to the middle so as to get (1, 4, 4) it would be clear that we had reduced inequality in the upper half of incomes but the poorest person would now be further behind the next poorest.
47 Generalised Lorenz curve
Define the generalised Lorenz curve G(i) as a function of position in the income distribution by cumulating incomes and dividing by population H
$$G(i) = \frac{1}{H}\sum_{j=1}^{i} y_j$$
The generalised Lorenz curve therefore runs from 0 to $\bar{y}$. It is convex by construction since the slope is proportional to income and incomes are ranked.
48 Lorenz curve
The Lorenz curve is constructed in the same way but dividing through by mean income $\bar{y} = \frac{1}{H}\sum_i y_i$
$$L(i) = \frac{1}{H\bar{y}}\sum_{j=1}^{i} y_j$$
The Lorenz curve runs from 0 to 1 and is similarly convex.
49 Effect of Pigou-Dalton transfers
A Pigou-Dalton transfer from i to j
leaves the Lorenz curves unchanged outside of the range i to j
but raises the Lorenz curve at all points in between
We say that the vector after the change Lorenz dominates that before the change, meaning that the Lorenz curve is nowhere lower and somewhere higher. If Lorenz dominance holds then, wherever you divide the income ranking, the poorer fraction of the population have a greater share of income in one case than in the other
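The earlier example vectors (1, 3, 5), (2, 3, 4) and (1, 4, 4) can be used to see Lorenz dominance at work. This is an illustrative sketch with hypothetical function names; exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def generalised_lorenz(incomes):
    """G(i) = (1/H) * sum of the i lowest incomes, for i = 0, 1, ..., H."""
    ranked = sorted(incomes)
    H = len(ranked)
    curve, total = [Fraction(0)], Fraction(0)
    for y in ranked:
        total += y
        curve.append(total / H)
    return curve

def lorenz(incomes):
    """L(i) = G(i) / mean income, so the curve runs from 0 to 1."""
    G = generalised_lorenz(incomes)
    mean = G[-1]
    return [g / mean for g in G]

def lorenz_dominates(a, b):
    """True if the Lorenz curve of a is nowhere below and somewhere above that of b."""
    La, Lb = lorenz(a), lorenz(b)
    return all(p >= q for p, q in zip(La, Lb)) and any(p > q for p, q in zip(La, Lb))
```

Both (2, 3, 4) and (1, 4, 4) Lorenz dominate (1, 3, 5), even though the second transfer leaves the poorest person further behind the next poorest.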
50 Lorenz curves and inequality with fixed mean incomes
Not only do Pigou-Dalton transfers lead to Lorenz dominance but
if mean incomes are the same, then one Lorenz curve dominates another only if it is possible to get from the one to the other by a series of Pigou-Dalton transfers
Also, if mean incomes are unchanged, then there will be Lorenz dominance if either all relative gaps or all absolute gaps are closed up.
51 Pigou-Dalton transfers and social welfare functions
A social welfare function which increases in response to Pigou-Dalton transfers is called Schur-concave.
This will be true if
$$\left(\frac{\partial \Omega}{\partial y_i} - \frac{\partial \Omega}{\partial y_j}\right)(y_i - y_j) \le 0$$
For obvious reasons, such social welfare functions are also referred to as Lorenz-consistent. If the social welfare function is additive and anonymous
$$\Omega(y_1, \ldots, y_H) = \sum_i \phi(y_i)$$
then Schur-concavity is equivalent to concavity of the individual utility function $\phi(y)$.
52 Generalised Lorenz dominance
We want to extend comparisons of inequality and social welfare to cases where mean incomes differ. There is a result due to Shorrocks that shows that the only changes which increase all social welfare functions which are increasing and Schur concave are changes which raise the generalised Lorenz curve. If generalised Lorenz dominance holds then what that says is that, wherever you divide the income ranking, the poorer fraction of the population have a greater total income in one case than in the other.
53 Relative and absolute inequality
To extend inequality comparisons we need to make a judgment as to what sort of changes, among those that do not leave the mean unchanged, nonetheless do not affect inequality. The most common view is the relative one under which scaling up incomes by a common factor leaves inequality unchanged. There is also an absolute view under which it is equal translations of all incomes that leave inequality unchanged. If we accept the relative view then we can continue to use Lorenz dominance since scaling of all incomes leaves the Lorenz curve unaltered.
54 Relative inequality measures
A relative inequality measure F (·) is any function that is
Schur convex (ie −F (·) is Schur concave)
invariant to scaling (ie homogeneous of degree zero)
There are many examples such as:
• the coefficient of variation – in other words, the ratio of the standard deviation to the mean
$$\frac{1}{\bar{y}}\sqrt{\frac{1}{H}\sum_i (y_i - \bar{y})^2}$$
55 Gini coefficient
The Gini coefficient is twice the area between the Lorenz curve and the diagonal along which the Lorenz curve would lie if all incomes were the same
$$\frac{2}{H^2\bar{y}}\sum_i i\,(y_i - \bar{y})$$
The link to income gaps can be seen by reexpressing it as half the mean relative income gap
$$\frac{1}{2H^2\bar{y}}\sum_i\sum_j |y_i - y_j|$$
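As a check on the two expressions, the sketch below computes the Gini coefficient both ways for integer incomes. The function names are hypothetical, and the pairwise-gap version is written with the factor 1/(2H²ȳ), which is the scaling under which it coincides with the rank-based expression.

```python
from fractions import Fraction

def gini_rank_formula(incomes):
    """Gini = 2 / (H^2 * mean) * sum_i i * (y_i - mean), with incomes ranked ascending.
    Incomes are assumed to be integers (or Fractions) so the arithmetic is exact."""
    ranked = sorted(incomes)
    H = len(ranked)
    mean = Fraction(sum(ranked), H)
    return 2 * sum(i * (y - mean) for i, y in enumerate(ranked, start=1)) / (H**2 * mean)

def gini_gap_formula(incomes):
    """Gini = 1 / (2 * H^2 * mean) * sum_i sum_j |y_i - y_j|."""
    H = len(incomes)
    mean = Fraction(sum(incomes), H)
    return sum(abs(a - b) for a in incomes for b in incomes) / (2 * H**2 * mean)
```

For the vector (1, 3, 5) both functions return 8/27, and the Pigou-Dalton transfer to (2, 3, 4) halves the coefficient to 4/27.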
56 Counter-example: the variance of logarithms
An example of a zero-degree-homogeneous function which might be expected to be Schur-convex but turns out not to be is the variance of logarithms
$$\frac{1}{H}\sum_i \left( \ln y_i - \frac{1}{H}\sum_j \ln y_j \right)^2$$
This fails to be Schur-convex because of the way that the logarithm of the geometric mean, $\frac{1}{H}\sum_j \ln y_j$, can be changed by progressive transfers.
57 Equally distributed equivalent income
Suppose there is a homothetic social choice relation. This can be represented by a linearly homogeneous social welfare function Ω(y). Then we can define the equally distributed equivalent income ξ as that income which if given to everyone would generate the same social welfare as the actual income vector
Ω(y1, y2, . . . , yH) = Ω(ξ, ξ, . . . , ξ) = ξΩ(1, 1,..., 1)
The equally distributed equivalent income ξ is itself in fact a particular homogeneous social welfare function representing the given social choice relation.
58 Atkinson-Kolm-Sen inequality index
Now we can construct a relative inequality index as
I = 1 − ξ/y¯
This index is Schur convex and homogeneous of degree zero as required. It can be thought of as the fraction of income wasted from a social welfare perspective as a consequence of inequality. The idea is attributed to Atkinson, Kolm and Sen, all writing separately.
59 Atkinson inequality index
Atkinson’s particular measure proceeds from a social welfare specification
$$\Omega = \frac{1}{1 - \epsilon}\sum_i y_i^{1-\epsilon}$$
where $\epsilon > 0$ is interpreted as an inequality aversion parameter. The corresponding inequality index is
$$1 - \frac{1}{\bar{y}}\left[\frac{1}{H}\sum_i y_i^{1-\epsilon}\right]^{1/(1-\epsilon)}$$
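A short numerical sketch of the Atkinson index follows. The function names are hypothetical, and the treatment of the case ε = 1 through the geometric mean is the standard limiting case, added here only so that the function is defined for every ε > 0.

```python
import math

def equally_distributed_equivalent(incomes, eps):
    """Income which, if given to everyone, yields the same Atkinson social welfare."""
    H = len(incomes)
    if eps == 1:
        # Limiting case as eps -> 1: the geometric mean (an addition for completeness).
        return math.exp(sum(math.log(y) for y in incomes) / H)
    return (sum(y ** (1 - eps) for y in incomes) / H) ** (1 / (1 - eps))

def atkinson_index(incomes, eps):
    """I = 1 - xi / mean, the share of income 'wasted' through inequality."""
    mean = sum(incomes) / len(incomes)
    return 1 - equally_distributed_equivalent(incomes, eps) / mean
```

For incomes (1, 3, 5) and ε = 2 the equally distributed equivalent income is 45/23, so the index is 1 − 15/23; the index is unchanged when all incomes are scaled by a common factor and rises with ε.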
60 Equity and efficiency
One thing that is neat about this is the existence of a social welfare measure
$$\xi = \bar{y}(1 - I)$$
conveniently represented as the product of mean income and an equality measure. This nicely captures the equity and efficiency aspects of social welfare measurement. The whole reasoning here can also be reversed so that one can start with an inequality measure and derive a corresponding social welfare measure using the same formula. If we begin with the Gini coefficient, for example, then we derive a social welfare measure, mean income times one minus the Gini coefficient, which is twice the area under the generalised Lorenz curve.
61 Taxes and inequality
It is important to establish which sorts of taxes reduce inequality. For the moment we ignore behavioural responses and assume a tax function $T(y)$ applied to fixed incomes $y_i$, $i = 1, \ldots, H$. We assume marginal tax rates everywhere between 0 and 1 so that tax payments increase with incomes but the pretax rich remain the posttax rich.
62 Progressive taxation
We say that a tax is progressive if T (y)/y is increasing in y so that the average tax rate rises with income. Equivalently, progressive taxes
have an elasticity of taxes to incomes which is greater than one
have marginal tax rates $T'(y)$ everywhere greater than average tax rates $T(y)/y$
63 Progressive taxation and inequality
There are two points to note about such taxes. Firstly
$$\frac{T(y_j)}{T(y_i)} > \frac{y_j}{y_i} \quad \text{if } y_j > y_i$$
so tax payments are more unequal – which means more heavily concentrated on the rich – than the incomes to which they are applied. Secondly
$$\frac{y_j - T(y_j)}{y_i - T(y_i)} < \frac{y_j}{y_i} \quad \text{if } y_j > y_i$$
so incomes after tax are more equal than incomes before tax. Progressive taxes are the only sorts of taxes which ensure that these facts are true whatever the pretax income distribution.
64 Progressive taxation and the Lorenz curve
As a consequence, given that the rankings of individuals by taxes and by incomes before and after tax all coincide, if taxes are progressive then
the Lorenz curve for incomes after tax $L_{y-T}(i)$ lies above the Lorenz curve for incomes before tax $L_y(i)$
the Lorenz curve for tax payments $L_T(i)$ lies below the Lorenz curve for incomes before tax $L_y(i)$
65 Redistributive effect and departure from proportionality
If we let $\bar{T}$ be the mean tax payment then
$$L_{y-T}(i) = \frac{1}{1 - \bar{T}/\bar{y}} L_y(i) - \frac{\bar{T}/\bar{y}}{1 - \bar{T}/\bar{y}} L_T(i)$$
$$\Rightarrow\quad L_{y-T}(i) - L_y(i) = \frac{\bar{T}/\bar{y}}{1 - \bar{T}/\bar{y}} \left[ L_y(i) - L_T(i) \right]$$
so that
the redistributive effect $L_{y-T}(i) - L_y(i)$ can be linked to
the departure from proportionality $L_y(i) - L_T(i)$ and the average tax rate $\bar{T}/\bar{y}$
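The decomposition can be verified on a small example. Everything specific below is an assumption made for illustration: the income vector, and a hypothetical tax schedule charging 40% on income above an allowance of 2, which has a rising average rate and so is progressive in the sense used here.

```python
from fractions import Fraction

def lorenz(values):
    """Lorenz ordinates L(0), ..., L(H) for values listed in income-rank order."""
    total = sum(values)
    curve, running = [Fraction(0)], Fraction(0)
    for v in values:
        running += v
        curve.append(running / total)
    return curve

def tax(y):
    """Hypothetical progressive schedule: 40% on income above an allowance of 2."""
    return Fraction(2, 5) * max(y - 2, 0)

incomes = [1, 3, 5, 11]                      # already ranked, illustrative only
taxes = [tax(y) for y in incomes]
net = [y - t for y, t in zip(incomes, taxes)]

L_y, L_T, L_net = lorenz(incomes), lorenz(taxes), lorenz(net)
avg_rate = sum(taxes) / sum(incomes)         # average tax rate, T-bar / y-bar
factor = avg_rate / (1 - avg_rate)
```

At every point of the ranking the gap between the post-tax and pre-tax Lorenz curves equals `factor` times the gap between the pre-tax and tax Lorenz curves, so a given departure from proportionality redistributes more the higher the average tax rate.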
66 Poverty
67 Poverty
Poverty is concerned with the failure of incomes to meet basic needs at the bottom end. To operationalise this we need to identify an income level (the poverty line, z) minimally sufficient to cover those needs. Whether or not z should itself depend on the income distribution is a debated issue. There are
some needs (food, shelter, etc) that are not particularly dependent on the incomes of others
some needs (dignity, self-respect, etc) that are so
68 Headcount ratio
One obvious measure is the headcount ratio which simply records the proportion of the population below z
$$P = \frac{1}{H}\#(y_i < z)$$
This is a superficially attractive, because simple, measure and evidently a popular focus of public debate on the issue. It is a measure, however, which fails to satisfy some basic properties.
69 Poverty axioms I
An increase in the income of any poor person ought to decrease poverty. The headcount fails to satisfy this since it is completely insensitive to how poor the poor are. An increase in the income of a poor person reduces the headcount only if it takes that person across the poverty line.
70 Shortfall index
A measure that avoids this weakness is the shortfall index Q which is based on reckoning up the total gap between the incomes of the poor and the poverty line. Define for each individual a censored income
$$\tilde{y}_i = \min[y_i, z]$$
and a poverty gap
$$g_i = \max[z - y_i, 0] = z - \tilde{y}_i$$
then
$$Q = \frac{1}{H}\sum_i g_i/z = 1 - \frac{1}{H}\sum_i \tilde{y}_i/z.$$
If m is the mean income of the poor then $Q = P(1 - m/z)$.
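The headcount ratio, the shortfall index and the relation Q = P(1 − m/z) can be checked on a toy distribution. The income vector and poverty line below are invented for the illustration, and integer incomes are assumed so that exact fractions can be used.

```python
from fractions import Fraction

def headcount_ratio(incomes, z):
    """P: proportion of the population with income strictly below the poverty line z."""
    return Fraction(sum(1 for y in incomes if y < z), len(incomes))

def shortfall_index(incomes, z):
    """Q: mean poverty gap as a fraction of the poverty line."""
    gaps = [max(z - y, 0) for y in incomes]
    return Fraction(sum(gaps), len(incomes) * z)

incomes = [2, 4, 6, 10, 18]   # hypothetical incomes
z = 8                         # hypothetical poverty line
poor = [y for y in incomes if y < z]
m = Fraction(sum(poor), len(poor))   # mean income of the poor
```

Raising the poorest income from 2 to 3 leaves the headcount ratio at 3/5 but lowers the shortfall index from 12/40 to 11/40, which is the sensitivity the headcount lacks.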
71 Poverty axioms II
A transfer of income from a less poor to a poorer person reduces poverty. This is a less obvious requirement but one that neither the headcount ratio nor the shortfall index satisfies. It is essentially a demand that the poverty measure be sensitive to the inequality among the poor. Alternatively it can be seen as a demand that the poorest be recognised as in more urgent need of relief.
72 Poverty indices I: Foster-Greer-Thorbecke
Foster, Greer and Thorbecke are associated with the proposal to measure poverty by the mean of some convex transformation of poverty gaps
$$R_1 = \frac{1}{H}\sum_i \phi(g_i)$$
for some convex $\phi$
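To see how convexity delivers sensitivity to transfers among the poor, the sketch below uses the power form φ(g) = (g/z)^α. That functional form, the default α = 2 and the numbers are assumptions chosen for the example, not something specified in the notes; with α = 1 the measure reduces to the shortfall index.

```python
from fractions import Fraction

def fgt(incomes, z, alpha=2):
    """Mean of phi(g_i) with phi(g) = (g / z) ** alpha, an assumed convex choice (alpha >= 1)."""
    gaps = [max(z - y, 0) for y in incomes]
    return sum(Fraction(g, z) ** alpha for g in gaps) / len(incomes)

before = [2, 4, 6, 10, 18]   # hypothetical incomes, poverty line z = 8
after = [3, 3, 6, 10, 18]    # one unit moved from the person on 4 to the person on 2
```

The transfer leaves the headcount and the shortfall index (α = 1) unchanged but lowers the strictly convex measure (α = 2) from 28/160 to 27/160.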
73 Poverty indices II: Clark-Hemming-Ulph
Clark, Hemming and Ulph are associated with the proposal to measure poverty by taking the equally distributed equivalent income of the truncated distribution, say $\tilde{\xi}$, and calculating
$$R_2 = 1 - \tilde{\xi}/z.$$
If we let $\tilde{I}$ be the inequality index of the truncated distribution, with mean $\bar{\tilde{y}}$, then
$$R_2 = 1 - \left(\tilde{\xi}/\bar{\tilde{y}}\right)\left(\bar{\tilde{y}}/z\right) = 1 - (1 - \tilde{I})(1 - Q)$$
74 Commodity taxation
75 Equivalences and normalisations
76 Linear taxes
Goods priced at p0 before tax are subject to ad valorem taxes at rates
t = (t1, t2, . . . , tn)
Labour is subject to a linear income tax at rate τ and individuals are paid a lump sum grant G. Thus
$$\sum_i p^0_i (1 + t_i) q^h_i \le w^h (1 - \tau) L^h + G$$
77 Numéraires and untaxed goods
It is important to be clear about issues of normalisation. Only relative prices are determined in equilibrium. Any one price can be set to unity before and after tax (we call such a good a numéraire). That good is untaxed by construction, as a normalisation and not a restriction. Typically we take that good to be labour when discussing commodity taxes, but that does not mean that there is a restriction prohibiting taxation of leisure for which commodity taxes need to correct. It also makes no sense to talk about which goods are taxed and which subsidised at the optimum except relative to a particular normalisation.
78 Equivalences
What matter are individual budget sets. Suppose taxes on goods and labour are as described so that
$$\sum_i p^0_i (1 + t_i) q^h_i \le w^h (1 - \tau) L^h + G$$
An identical budget set is achieved with no labour income tax, goods taxes $(t_i + \tau)/(1 - \tau)$ and grant $G/(1 - \tau)$
$$\sum_i p^0_i \left(\frac{1 + t_i}{1 - \tau}\right) q^h_i \le w^h L^h + \frac{G}{1 - \tau}.$$
Note that a pure labour income tax at rate τ with grant G is therefore equivalent to a uniform commodity tax at rate $t = \tau/(1 - \tau)$ with a grant $G/(1 - \tau)$.
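The equivalence amounts to dividing the budget constraint through by (1 − τ), which the following sketch confirms numerically. The function names and all the numbers in the usage are hypothetical; exact fractions are used so that the two constraints can be compared without rounding.

```python
from fractions import Fraction

def budget_slack(q, L, w, p0, t, tau, G):
    """Slack in the budget constraint: w(1 - tau)L + G - sum_i p0_i (1 + t_i) q_i."""
    cost = sum(p * (1 + ti) * qi for p, ti, qi in zip(p0, t, q))
    return w * (1 - tau) * L + G - cost

def equivalent_commodity_tax_system(t, tau, G):
    """Goods taxes and grant that give the same budget set with no labour income tax."""
    t_new = [(ti + tau) / (1 - tau) for ti in t]
    return t_new, G / (1 - tau)

# Hypothetical two-good economy.
p0 = [1, 2]
t = [Fraction(1, 10), Fraction(1, 4)]
tau, G, w = Fraction(1, 5), 3, 10
t_new, G_new = equivalent_commodity_tax_system(t, tau, G)
```

For any bundle the slack under the second system is the slack under the first divided by (1 − τ), so the two constraints admit exactly the same bundles. With no goods taxes to begin with, the new rate on every good is τ/(1 − τ).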
79 Normalising pretax prices
From now onward, we assume pretax prices are all equal to unity, $p^0 = 1$. This is another harmless normalisation rather than a loss of generality, achieved by choice of units of measurement for the goods.
80 Welfare analysis of tax reforms
81 Welfare analysis of small tax increase
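Consider the marginal effect of raising the tax rate on the kth good, $t_k$. By Roy’s identity
$$\frac{\partial}{\partial t_k} V(w, p, G) = -\frac{\partial}{\partial G} V(w, p, G)\, q_k = -\theta q_k$$
where $\theta = \partial V/\partial G$. The first order welfare effect is proportional to consumption of the good. Revenue is $R = \sum_k t_k q_k - G$ so, writing $f_i$ for the demand function for good i,
$$\frac{\partial}{\partial t_k} R(w, p, G) = q_k + \sum_i t_i \frac{\partial f_i}{\partial p_k}$$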
82 Comparing tax raising options
Marginal welfare loss per unit of revenue gained is therefore
$$\lambda_k = -\frac{\partial V/\partial t_k}{\partial R/\partial t_k} = \theta \frac{q_k}{q_k + \sum_i t_i\, \partial f_i/\partial p_k}$$
This offers a means of comparison of different tax raising options while also being suggestive of optimum design.
If λi > λj then there exists a marginal shift of taxation from good i to good j which can raise welfare without losing revenue.
It is only if all λi, i = 1, . . . , n are equal that no improvement is possible.
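A small numerical sketch shows how the comparison works. The function name is hypothetical and the quantities, tax rates and price derivatives are invented for illustration, with `dfdp[i][k]` standing for the derivative of demand for good i with respect to the price of good k.

```python
from fractions import Fraction

def marginal_welfare_cost(k, theta, q, t, dfdp):
    """lambda_k = theta * q_k / (q_k + sum_i t_i * dfdp[i][k])."""
    revenue_effect = q[k] + sum(t[i] * dfdp[i][k] for i in range(len(q)))
    return theta * q[k] / revenue_effect

# Hypothetical two-good example with independent demands (zero cross-price effects).
theta = 1
q = [10, 10]
t = [Fraction(1, 5), Fraction(1, 5)]
dfdp = [[-20, 0],
        [0, -5]]

lambdas = [marginal_welfare_cost(k, theta, q, t, dfdp) for k in range(2)]
```

Here the first good has the more price-responsive demand, so raising its tax erodes more of the revenue gained and its marginal welfare cost is higher (5/3 against 10/9); shifting taxation at the margin from the first good to the second would raise welfare at unchanged revenue.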
83 Welfare analysis of larger tax increase
For a non-marginal change
$$\frac{\Delta V}{\Delta t_k} = -\theta q_k \left[ 1 + \frac{1}{2} \frac{\Delta t_k}{t_k} \left( \frac{\partial \ln \theta}{\partial \ln p_k} + \frac{\partial \ln f_k}{\partial \ln p_k} \right) \right].$$
The higher order approximation brings in terms relating to demand elasticities $\partial \ln f_k/\partial \ln p_k$. Empirical investigation with actual demand estimates suggests such higher order terms may be important for getting the distribution of welfare effects correct.
84 Optimum commodity taxation
85 Optimum taxation of a homogeneous population
Suppose there is a population of identical individuals so that distributional issues can be put aside. The government can raise its revenue requirement $\bar R$ only through commodity taxes and therefore tries to solve
$$\max_t V(w, 1 + t, 0) \quad \text{s.t.} \quad R = \bar R.$$
Its first order condition is
$$\frac{\partial V}{\partial t_k} + \lambda \frac{\partial R}{\partial t_k} = 0$$
where $\lambda$ is a Lagrange multiplier for the revenue constraint.
Thus $\lambda_k$ as defined above is equated across goods:
$$\lambda_k = -\frac{\partial V/\partial t_k}{\partial R/\partial t_k} = \theta\, \frac{q_k}{q_k + \sum_i t_i\, \partial f_i/\partial p_k}.$$
86 Ramsey rules
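Using some demand theory (the Slutsky decomposition) to develop the implications,
$$\begin{aligned} 0 &= -\theta q_k + \lambda \left[ q_k + \sum_i t_i \frac{\partial f_i}{\partial p_k} \right] \\ &= -\theta q_k + \lambda \left[ q_k + \sum_i t_i \left( \frac{\partial g_i}{\partial p_k} - q_k \frac{\partial f_i}{\partial y} \right) \right]. \end{aligned}$$
Using Slutsky symmetry, $\partial g_i/\partial p_k = \partial g_k/\partial p_i$, this can be rearranged to give an expression
$$\sum_i t_i \frac{\partial g_k}{\partial p_i} = -q_k \left( 1 - \frac{\theta}{\lambda} - \sum_i t_i \frac{\partial f_i}{\partial y} \right) \equiv (b - 1) q_k$$
where $b \equiv (\theta/\lambda) + \sum_i t_i\, \partial f_i/\partial y$ is the marginal social value of income adjusted for the value of any demand-related change in tax revenue. These expressions are the Ramsey rules for optimum commodity taxation.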
87 Marginal social value of income
Multiplying by $t_k$ and summing gives
$$\sum_i \sum_k t_i t_k \frac{\partial g_k}{\partial p_i} = (b - 1) \sum_k t_k q_k = (b - 1) \bar R.$$
The left hand side expression is nonpositive by negativity of the Slutsky matrix. Thus $1 - b$ has the same sign as the revenue requirement $\bar R$.
88 Samuelson interpretation
Optimal taxes are zero if $\bar R = 0$ since there is no point causing deadweight loss if no revenue needs to be raised. If $\bar R > 0$ then
$$\sum_i \frac{t_i}{1 + t_i}\, \eta^*_{ki} = b - 1 < 0$$
where $\eta^*_{ki} = \partial \ln g_k/\partial \ln p_i$ is a compensated price elasticity, so that taxes are so designed that there are equal proportional compensated falls in demand for all goods at the margin, in the interpretation credited to Samuelson. The left hand side is called by Mirrlees an “index of discouragement” which is equated across goods.
89 Inverse elasticity rule
If it were the case that compensated cross-price effects were small so that $\partial g_k/\partial p_i \simeq 0$ for $i \ne k$ then
$$\frac{t_i}{1 + t_i} \simeq \frac{b - 1}{\eta^*_{ii}}.$$
Taxes are highest on goods with lowest compensated own price elasticity. This is the so-called inverse elasticity rule. Deadweight loss is lowest where taxes are placed on goods least responsive to taxes. If there are no cross-price elasticities to consider then deadweight loss is approximately given for each good by the area of a triangle beneath a compensated demand function and increases with the square of the tax rate. The inverse elasticity rule minimises the sum of these triangles.
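A small numerical sketch of the inverse elasticity rule follows. The function name and the values of $b$ and the elasticities are assumptions for illustration; in the full problem $b$ is determined jointly with the taxes by the revenue requirement rather than being given.

```python
def inverse_elasticity_tax(b, own_elasticities):
    """Solve t/(1 + t) = (b - 1)/eta_ii for t, good by good.

    b: adjusted marginal social value of income (b < 1 when revenue must be raised).
    own_elasticities: compensated own price elasticities (negative numbers).
    """
    rates = []
    for eta in own_elasticities:
        x = (b - 1.0) / eta          # this is t/(1 + t)
        rates.append(x / (1.0 - x))  # invert to recover t
    return rates


# Assumed values: good 0 is less elastic than good 1.
rates = inverse_elasticity_tax(b=0.9, own_elasticities=[-0.5, -1.0])
print(rates)  # the less elastic good carries the higher rate
```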
90 Optimum lump sum taxation
Suppose the government can now use the uniform grant $G$. The first order condition would be
$$0 = \theta + \lambda \left[ \sum_i t_i \frac{\partial f_i}{\partial y} - 1 \right] = \lambda (b - 1).$$
Thus $b = 1$ and $t_i = 0$ for all goods $i$. All revenue would be raised through the lump sum tax to avoid deadweight loss.
91 Optimum taxation of a heterogeneous population
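For a many-person economy the marginal effect of raising $t_k$ on social welfare is
$$\frac{\partial}{\partial t_k} W(\cdot) = -\sum_h \frac{\partial W}{\partial V^h} \frac{d}{dy^h} V^h(\cdot)\, q_k^h = -\sum_h \beta^h q_k^h$$
by Roy’s identity. Even in the utilitarian case where $\partial W/\partial V^h = 1$, $\beta^h$ will vary across households because of variation in $\theta^h$.
Revenue is $R = \sum_h \sum_k t_k q_k^h - HG$ so
$$\frac{\partial}{\partial t_k} R = \sum_h q_k^h + \sum_i t_i \sum_h \frac{\partial q_i^h}{\partial p_k}.$$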
92 First order conditions
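The Lagrangean for the optimum tax problem is
$$W(V^1, V^2, \dots, V^H) + \lambda \left[ \sum_i t_i \sum_h q_i^h - HG - \bar R \right]$$
and first order conditions for solution imply
$$\begin{aligned} 0 &= -\sum_h \beta^h q_k^h + \lambda \left[ \sum_h q_k^h + \sum_i t_i \sum_h \frac{\partial f_i^h}{\partial p_k} \right] \\ &= -\sum_h \beta^h q_k^h + \lambda \left[ \sum_h q_k^h + \sum_i t_i \sum_h \left( \frac{\partial g_i^h}{\partial p_k} - q_k^h \frac{\partial f_i^h}{\partial y} \right) \right]. \end{aligned}$$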
93 Modified Ramsey rules
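Therefore
$$\sum_i t_i \sum_h \frac{\partial g_k^h}{\partial p_i} = -\sum_h q_k^h \left( 1 - \frac{\beta^h}{\lambda} - \sum_i t_i \frac{\partial f_i^h}{\partial y} \right) \equiv \sum_h (b^h - 1) q_k^h$$
where $b^h \equiv \beta^h/\lambda + \sum_i t_i\, \partial f_i^h/\partial y$ is the net marginal social value of income. These are the modified Ramsey rules applying to a many-person economy.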
94 Optimum setting of uniform grant
The first order condition with respect to $G$ is now
$$0 = \sum_h \beta^h + \lambda \left[ \sum_i t_i \sum_h \frac{\partial f_i^h}{\partial y} - H \right] = \lambda \sum_h (b^h - 1)$$
so that $b^h$ only needs to equal one on average across the population. Uniform lump sum taxes are no longer an optimal way to raise revenue.
95 Taxing according to distributional characteristics
The covariance between $b^h$ and $q_k^h$ matters, reflecting the distributional quality of the good: since $b^h - 1$ averages zero at the optimum grant, $\sum_h (b^h - 1) q_k^h$ is $H$ times that covariance. If a good is consumed heavily by people with a low net social marginal valuation of income – in other words, those whose needs are considered less socially pressing – then that is reason to tax it heavily in the optimum scheme. There is a reason, in other words, to tax luxuries heavily. This still needs to be weighed against the Ramsey efficiency considerations regarding sensitivity of compensated demands to distortion.
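The role of the covariance can be seen in a toy calculation. This is a sketch under assumed numbers: three households whose $b^h$ average one, a hypothetical luxury consumed mainly by the household with low $b^h$, and a hypothetical necessity consumed fairly evenly. The helper names are invented for the example.

```python
def distributional_term(b, q_k):
    """Right-hand side of the modified Ramsey rule for good k: sum_h (b^h - 1) q_k^h."""
    return sum((bh - 1.0) * qh for bh, qh in zip(b, q_k))


def covariance(x, y):
    """Population covariance between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (c - my) for a, c in zip(x, y)) / n


# Assumed data: b^h averages one, as required by the optimum grant condition.
b = [1.4, 1.0, 0.6]
luxury = [1.0, 3.0, 8.0]      # consumed mostly where b^h is low
necessity = [4.0, 4.5, 5.0]   # consumed fairly evenly

H = len(b)
for name, q in [("luxury", luxury), ("necessity", necessity)]:
    # The two columns should coincide: sum_h (b^h - 1) q^h = H * cov(b, q).
    print(name, distributional_term(b, q), H * covariance(b, q))
```

The term is more negative for the luxury, so the optimum calls for a larger proportional reduction in its compensated demand, that is, heavier taxation, other things equal.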
96 Linear Engel curves
97 Optimum uniform taxation
Optimal commodity taxes will generally not be uniform but there are certain classes of preferences which can be shown to imply uniformity of the optimum. In particular, Deaton has shown that the assumptions of
• weak separability of commodities and leisure
• linearity of Engel curves
are alone sufficient to imply uniform optimal tax rates.
98 Preferences with linear Engel curves
Such preferences correspond to an indirect utility function of the form
$$V(w^h, p, G) = \max_L \phi\left( L, \frac{w^h L + G + B(p)}{A(p)} \right)$$
where both $A(p)$ and $B(p)$ are homogeneous of degree one. Labour supply choice solves
$$\phi_1\left( L, \frac{w^h L + G + B(p)}{A(p)} \right) + \frac{w^h}{A(p)}\, \phi_2\left( L, \frac{w^h L + G + B(p)}{A(p)} \right) = 0$$
enabling us to write a labour supply function
$$L(p, w^h, G) = \zeta\left( \frac{w^h}{A(p)}, \frac{G + B(p)}{A(p)} \right)$$
for some $\zeta$.
99 Indirect utility and expenditure function under linearity
The indirect utility function therefore has the form
$$v(p, w^h, G) = \psi\left( \frac{w^h}{A(p)}, \frac{G + B(p)}{A(p)} \right)$$
for some $\psi$. Inverting gives an expenditure function of the form
$$e(p, w^h, \upsilon) = A(p)\, \gamma\left( \frac{w^h}{A(p)}, \upsilon \right) - B(p)$$
for some $\gamma$.
100 Hicksian and Marshallian demands under linearity
By Shephard’s Lemma the Hicksian demands for goods and supply of labour are
$$g_k(p, w^h, \upsilon) = A_k \left[ \gamma - \frac{w^h}{A} \gamma_1 \right] - B_k$$
$$\chi(p, w^h, \upsilon) = -\gamma_1.$$
By adding up and by the homogeneity properties of $A(p)$ and $B(p)$
$$\frac{w^h L + G + B}{A} = \gamma - \frac{w^h}{A} \gamma_1.$$
Substituting this into the compensated demand functions above gives Marshallian demands for goods
$$f_k(w^h, p, G) = \frac{A_k}{A} \left[ w^h L + G + B \right] - B_k$$
with $L$ given as above, so that Engel curves are all linear.
101 Many-person Ramsey rules under linearity I
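Now, noting which terms do and do not vary with $h$, use these expressions to substitute into the two sides of the modified Ramsey rule expression. Firstly, since $\sum_h (b^h - 1) = 0$ at the optimum grant, the terms in $q_k^h$ which do not vary with $h$ drop out and
$$\sum_h (b^h - 1) q_k^h = \frac{A_k}{A} \sum_h (b^h - 1) w^h L^h.$$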
102 Many-person Ramsey rules under linearity II
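Secondly, note that homogeneity requires
$$\sum_i (1 + t_i)\, \frac{\partial g_k^h}{\partial p_i} + w^h \frac{\partial g_k^h}{\partial w^h} = 0$$
so that if there are uniform taxes, $t_i = t$ for all $i$, then, using the symmetry $\partial g_k^h/\partial w^h = -\partial \chi^h/\partial p_k$,
$$\begin{aligned} \sum_i t_i \sum_h \frac{\partial g_k^h}{\partial p_i} &= \frac{t}{1 + t} \sum_i \sum_h (1 + t) \frac{\partial g_k^h}{\partial p_i} \\ &= -\frac{t}{1 + t} \sum_h w^h \frac{\partial g_k^h}{\partial w^h} \\ &= \frac{t}{1 + t} \sum_h w^h \frac{\partial \chi^h}{\partial p_k} \\ &= \frac{t}{1 + t}\, A_k \sum_h \left( \frac{w^h}{A} \right)^2 \gamma_{11}\left( \frac{w^h}{A}, \upsilon \right). \end{aligned}$$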
103 Optimality of uniformity under linearity
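The only term varying with $k$ on either side is the common factor $A_k$ and therefore equating of the two sides reduces to the same condition for each of the $n$ goods
$$\sum_h (b^h - 1) \frac{w^h L^h}{A} = \frac{t}{1 + t} \sum_h \left( \frac{w^h}{A} \right)^2 \gamma_{11}\left( \frac{w^h}{A}, \upsilon \right) \quad \Rightarrow \quad \frac{t}{1 + t} = -\frac{\sum_h (b^h - 1) w^h L^h}{\sum_h w^h L^h \eta^h_{LL}}$$
where $\eta^h_{LL} = \partial \ln \chi^h/\partial \ln w^h$ is a compensated labour supply elasticity, so that $w^h L^h \eta^h_{LL} = -A\, (w^h/A)^2 \gamma_{11}$. Uniform taxes are therefore optimal.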
104 Welfare improving movement towards uniformity
Deaton has also shown that if you add the assumption of
• additive separability across all goods
then not only does the optimum involve uniform taxes but any local movement towards uniformity is necessarily welfare-improving. It would, of course, therefore be redundant to go to data with a demand system of this sort to analyse such a problem.
105 Demand estimation
106 Working-Leser Engel curves
An example of a flexible specification is provided by PIGLOG preferences. The origin of such a specification lies in the so-called Working-Leser form for uncompensated budget share equations. If we let $y$ denote total spending on goods then this specification has
$$w_i(y, p) = a_i(p) + b_i(p) \ln y.$$
This is a specification placing no restriction on price responses but forcing budget shares to be linear in the logarithm of total commodity spending. It has proved to fit many data sets well.
107 PIGLOG preferences
The corresponding indirect utility function is
$$\max_L \phi(L, V(wL + G, p))$$
where the separable subutility corresponding to goods has the form
$$V(y, p) = \frac{\ln [y/A(p)]}{B(p)}$$
where $A(p)$ is homogeneous of degree one and $B(p)$ homogeneous of degree zero. Note that if $B(p) = 1$ then subpreferences over goods are homothetic. Any dependence of budget shares on total spending is determined by the properties of $B(p)$.
108 PIGLOG budget shares
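By Roy’s identity
$$\begin{aligned} w_i(y, p) &= -\frac{\partial V/\partial \ln p_i}{\partial V/\partial \ln y} \\ &= -B \left[ -\frac{1}{B} \frac{\partial \ln A}{\partial \ln p_i} - \frac{\ln (y/A)}{B^2} \frac{\partial B}{\partial \ln p_i} \right] \\ &= \frac{\partial \ln A}{\partial \ln p_i} + \ln (y/A)\, \frac{\partial \ln B}{\partial \ln p_i} \end{aligned}$$
so that $a_i(p)$ and $b_i(p)$ can be identified with the price elasticities of the indices $A(p)$ and $B(p)$. Note that if $B(p)$ is constant then budget shares depend only on prices, compatibly with the earlier observations about homotheticity.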
109 Almost Ideal Demand System
The Almost Ideal Demand System (AIDS) is an example of PIGLOG preferences formed by choosing functional forms for $A(p)$ and $B(p)$
$$\ln A(p) = \alpha_0 + \sum_i \alpha_i \ln p_i + \frac{1}{2} \sum_i \sum_j \gamma_{ij} \ln p_i \ln p_j$$
$$\ln B(p) = \sum_i \beta_i \ln p_i$$
so that
$$w_i(y, p) = \alpha_i + \sum_j \gamma^*_{ij} \ln p_j + \beta_i \ln [y/A(p)]$$
where $\gamma^*_{ij} = \frac{1}{2} (\gamma_{ij} + \gamma_{ji})$.
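The AIDS share equations are straightforward to evaluate. The sketch below uses invented parameter values for a two-good system, chosen only so that the shares add up and are homogeneous of degree zero in prices and total spending; the function name is an assumption of the example.

```python
import math


def aids_shares(y, p, alpha0, alpha, beta, gamma):
    """Budget shares w_i = alpha_i + sum_j gamma*_ij ln p_j + beta_i ln(y / A(p))."""
    n = len(p)
    lnp = [math.log(pi) for pi in p]
    # Translog price index ln A(p).
    ln_A = (alpha0
            + sum(alpha[i] * lnp[i] for i in range(n))
            + 0.5 * sum(gamma[i][j] * lnp[i] * lnp[j]
                        for i in range(n) for j in range(n)))
    ln_real = math.log(y) - ln_A
    shares = []
    for i in range(n):
        # Symmetrised price coefficients gamma*_ij = (gamma_ij + gamma_ji) / 2.
        gstar = [0.5 * (gamma[i][j] + gamma[j][i]) for j in range(n)]
        shares.append(alpha[i]
                      + sum(gstar[j] * lnp[j] for j in range(n))
                      + beta[i] * ln_real)
    return shares


# Assumed parameters: alphas sum to one, betas to zero, gamma rows and columns to zero.
alpha0 = 0.0
alpha = [0.3, 0.7]
beta = [0.05, -0.05]
gamma = [[0.1, -0.1],
         [-0.1, 0.1]]

w = aids_shares(100.0, [1.2, 0.9], alpha0, alpha, beta, gamma)
w_scaled = aids_shares(200.0, [2.4, 1.8], alpha0, alpha, beta, gamma)
print(w, sum(w))   # shares sum to one
print(w_scaled)    # unchanged when y and p are doubled
```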
110 Integrability restrictions for AIDS I
One advantage of such a functional form is that satisfaction of integrability restrictions is relatively easily imposed.
• Adding Up: If $\sum_i w_i(y, p) = 1$ for any $y$ and $p$ then
$$\sum_i \alpha_i = 1, \qquad \sum_i \beta_i = \sum_i \gamma^*_{ji} = 0$$
• Homogeneity: In order that $w_i(\lambda y, \lambda p) = w_i(y, p)$ for any $y$ and $p$ it must be that
$$\sum_j \gamma^*_{ij} = 0$$
and that $A(p)$ is linearly homogeneous so that also
$$\sum_i \alpha_i = 1, \qquad \sum_i \gamma^*_{ji} = 0$$
111 Integrability restrictions for AIDS II
• Symmetry: Symmetry is satisfied if $\gamma^*_{ij} = \gamma^*_{ji}$
• Negativity: Negativity is an inequality restriction and is typically checked to hold given the range of variation of $y$ and $p$ in the data.
Note that these restrictions are clearly not independent – for example, adding up and symmetry imply homogeneity.
112 Estimating AIDS
The equations to be estimated would be linear in parameters were it not for the $A(p)$ term scaling total budget. It is possible to estimate by beginning with an approximation to $A(p)$ such as the Stone price index
$$S(p) = \sum_i \bar w_i \ln p_i$$
where $\bar w_i$ is the population mean budget share, substituting into the budget share equations and estimating the parameters linearly. These estimates can then be used to update the price index and proceed iteratively until convergence (which is guaranteed).
113 Imposing integrability
Adding up does not need to be imposed since budget shares add to 1 in the data. Homogeneity is typically imposed by expressing all nominal quantities relative to the price of a certain numéraire good. Symmetry is either imposed as a restriction on multivariate estimation or imposed, say by minimum distance techniques, after estimating the equations for different goods separately.
114 Relaxing linearity
Linearity of Engel curves in $\ln y$ fits well for some goods but not for all. The PIGLOG specification can be extended to more sophisticated income dependence but integrability is quite restrictive with regard to how this can be done. Gorman shows that the only function of income that can be added linearly is a quadratic term and that that is the limit to further terms which can be added. Such a specification arises from an indirect utility of the form
$$V(y, p) = \frac{\ln [y/A(p)]}{B(p) + \Phi(p) \ln [y/A(p)]}$$
where $\Phi(p)$ is a further homogeneous-of-degree-zero price index.
115 QUAIDS
Keeping the AIDS specifications for $A(p)$ and $B(p)$ and choosing the convenient form
$$\Phi(p) = \sum_i \phi_i \ln p_i$$
gives the QUAIDS specification
$$w_i(y, p) = \alpha_i + \sum_j \gamma^*_{ij} \ln p_j + \beta_i \ln [y/A(p)] + \frac{\phi_i}{B(p)} \left( \ln [y/A(p)] \right)^2$$
which can again be estimated iteratively, though iterating now over $B(p)$ as well as $A(p)$.
116 Income taxation
117 Optimum linear income taxation
118 Income taxation
Drop the distinction between different goods as objects of taxation. Single consumption good c with a price set to one. The tax function on labour income is denoted T (wL) and the individual therefore chooses hours to solve
$$\max_{c, L} u(c, L) \quad \text{s.t.} \quad c = wL - T(wL).$$
Assume a continuous wage distribution according to distribution function $F$.
119 Linear income taxation
Suppose a constant marginal tax rate τ and uniform grant component G so that c = w(1 − τ)L + G
Let the uncompensated labour supply function be H(w(1 − τ),G) and the compensated labour supply function be χ(w(1 − τ), υ). Very similar to the indirect tax case, given that a linear income tax and a uniform tax on commodities are effectively equivalent.
120 Optimum linear income tax problem
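The government seeks to solve
$$\max_{\tau, G} \int W(V(w(1 - \tau), G, Q))\, dF \quad \text{s.t.} \quad \tau \int wL\, dF - G - \bar R = 0$$
with Lagrangean
$$\int \left\{ W(V) + \lambda \left( \tau w L - G - \bar R \right) \right\} dF.$$
First order conditions are
$$\int \left\{ W' \frac{\partial V}{\partial G} + \lambda \left( \tau w \frac{\partial L}{\partial G} - 1 \right) \right\} dF = 0$$
$$\int \left\{ W' \frac{\partial V}{\partial \tau} + \lambda \left( \tau w \frac{\partial L}{\partial \tau} + wL \right) \right\} dF = 0.$$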
121 Optimum linear income tax rules
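Using
$$\frac{\partial L}{\partial \tau} = -w \frac{\partial H}{\partial w(1 - \tau)} = -w \frac{\partial \chi}{\partial w(1 - \tau)} - wL \frac{\partial H}{\partial G}, \qquad \frac{\partial V}{\partial \tau} = -\theta w L$$
and defining the net social marginal valuation of income $b = W'\theta/\lambda + \tau w\, \partial H/\partial G$ with $\theta = \partial V/\partial G$, we have
$$\int (b - 1)\, dF = 0$$
$$\int wL \left( b - 1 + \frac{\tau w}{L} \frac{\partial \chi}{\partial w(1 - \tau)} \right) dF = 0.$$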
122 Interpreting optimum linear income tax rules
The former condition tells us that the net social marginal valuation of income averages unity. Rearranging the latter gives a condition
$$\frac{\tau}{1 - \tau} = -\frac{\int wL (b - 1)\, dF}{\int wL\, \eta_{LL}\, dF}$$
where $\eta_{LL} = \partial \ln \chi/\partial \ln w(1 - \tau)$ is a compensated labour supply elasticity. This expression only implicitly defines the optimum tax rate $\tau$, since $b$, $L$ and $\eta_{LL}$ themselves depend on $\tau$.
123 Equity and efficiency
Numerator and denominator of this expression can be seen as conveniently capturing equity and efficiency considerations. High pretax inequality as reflected in a large (negative) covariance between $b$ and pretax earnings $wL$,
$$\int wL (b - 1)\, dF,$$
is associated with high optimum tax rates.
High labour supply elasticities $\eta_{LL}$, and therefore high deadweight costs of labour taxation,
$$\int wL\, \eta_{LL}\, dF,$$
are associated with low rates.
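To see how the equity and efficiency terms trade off, the right-hand side of the formula can be evaluated on a discrete sample. This is only a sketch: the earnings, welfare weights and elasticity below are invented, and because $b$ and $\eta_{LL}$ depend on $\tau$ in the full problem, a single evaluation like this is not a solution of the optimum tax problem.

```python
def linear_tax_rate(earnings, b, eta):
    """Evaluate tau from tau/(1 - tau) = -sum wL (b - 1) / sum wL eta.

    earnings: pretax earnings wL for each individual (equal population weights).
    b: net social marginal valuations of income (should average one).
    eta: compensated labour supply elasticity, taken as common to all (assumption).
    """
    equity = -sum(z * (bh - 1.0) for z, bh in zip(earnings, b))
    efficiency = sum(z * eta for z in earnings)
    x = equity / efficiency   # this is tau/(1 - tau)
    return x / (1.0 + x)


# Assumed sample: b falls with earnings and averages one.
earnings = [10.0, 20.0, 60.0]
b = [1.3, 1.1, 0.6]

tau_low_elasticity = linear_tax_rate(earnings, b, eta=0.3)
tau_high_elasticity = linear_tax_rate(earnings, b, eta=0.6)
print(tau_low_elasticity, tau_high_elasticity)  # higher elasticity, lower rate
```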
124 Comparing with optimum commodity tax formulae
Notice that this optimum tax formula is essentially the same as that derived for the optimum uniform tax rate in the case of separability between goods and leisure and linear Engel curves:
$$\frac{\tau}{1 - \tau} = -\frac{\int wL (b - 1)\, dF}{\int wL\, \eta_{LL}\, dF}, \qquad \frac{t}{1 + t} = -\frac{\int (b - 1) wL\, dF}{\int wL\, \eta_{LL}\, dF}.$$
125 Optimum nonlinear income taxation
126 Two ability types
Suppose there are only two ability types, assumed for simplicity to be equally numerous.
The more able type has productivity and therefore pretax wage w1 and the other w2 < w1. Suppose also that utility is additively separable
U(c, L) = u(c) − v(L) where u is concave and v convex so that preferences are convex. The government objective is to maximise utilitarian social welfare
u(c1) + u(c2) − v(L1) − v(L2)
127 First best taxation
Suppose the government can observe productivity types and impose consumption and labour supplies so as to achieve its objective subject to raising revenue per person of $\bar R$:
$$\max_{c_1, c_2, L_1, L_2} u(c_1) + u(c_2) - v(L_1) - v(L_2) \quad \text{s.t.} \quad w_1 L_1 + w_2 L_2 - c_1 - c_2 \ge 2 \bar R.$$
First order conditions require
$$u'(c_1) = \lambda \qquad u'(c_2) = \lambda$$
$$v'(L_1) = w_1 \lambda \qquad v'(L_2) = w_2 \lambda$$
where $\lambda$ is the Lagrange multiplier on the revenue constraint.
128 First best allocation
Therefore
$$u'(c_1) = u'(c_2)$$
$$v'(L_1) = w_1 v'(L_2)/w_2.$$
Thus less and more able individuals consume the same, $c_1 = c_2$. Given convexity of the disutility of labour, the more able are expected to work longer hours, $L_1 > L_2$. Both types consume resources at the same rate to generate utility whereas the more able generate more resources for each unit of utility given up in hours of work.
129 Incentive compatibility
The more able are therefore left worse off at the optimum. The assumed ability of the government to observe productivity is therefore critical to its ability to implement the optimum. If it cannot, then the more able have no incentive to reveal themselves since they will be penalised for doing so.
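A closed-form example makes the first best allocation concrete. The sketch assumes particular functional forms that are not in the notes, $u(c) = \ln c$ and $v(L) = L^2/2$, under which the first order conditions give $c = 1/\lambda$ and $L_i = w_i \lambda$, and the revenue constraint becomes a quadratic in $\lambda$. Wages and the revenue requirement are illustrative, and hours are left unbounded for simplicity.

```python
import math


def first_best(w1, w2, R_bar):
    """First best with u(c) = ln c and v(L) = L**2 / 2 (assumed functional forms).

    The binding revenue constraint (w1**2 + w2**2) * lam - 2 / lam = 2 * R_bar
    is a quadratic in lam; the positive root is taken.
    """
    S = w1 ** 2 + w2 ** 2
    lam = (R_bar + math.sqrt(R_bar ** 2 + 2.0 * S)) / S
    c = 1.0 / lam                  # both types consume the same amount
    L1, L2 = w1 * lam, w2 * lam    # the more able work longer hours
    return c, L1, L2, lam


w1, w2, R_bar = 2.0, 1.0, 0.5      # illustrative values
c, L1, L2, lam = first_best(w1, w2, R_bar)

utility_more_able = math.log(c) - L1 ** 2 / 2.0
utility_less_able = math.log(c) - L2 ** 2 / 2.0
revenue = w1 * L1 + w2 * L2 - 2.0 * c
print(c, L1, L2, revenue, utility_more_able, utility_less_able)
```

Equal consumption with longer hours leaves the more able with lower utility, which is the source of the incentive problem once types are unobserved.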
130 Zero marginal tax rates
For the given preferences the marginal rate of substitution between consumption and leisure is given by the ratio of marginal utilities $v'(L)/u'(c)$. For both types the first order conditions also imply
$$v'(L_i)/u'(c_i) = w_i, \quad i = 1, 2$$
so that the marginal rate of substitution equals the pretax wage. At the margin each individual’s preparedness to give up hours of leisure for additional consumption is equal to the marginal rate of transformation of hours of work into output of the consumption good. In effect, both types are placed at consumption-leisure combinations which would be chosen if facing budget constraints with a zero marginal tax rate.
131 Second best taxation
Suppose that the government can only set individuals a choice of consumption-earnings combinations, each of which must be offered to both types of individual. Then it cannot make the more able worse off than they would be if emulating the less able by earning the same but working the fewer hours necessary for them to reach that earnings level.
u(c1) − v(L1) ≥ u(c2) − v(w2L2/w1).
Note that u(c2)−v(w2L2/w1) ≥ u(c2)−v(L2) so satisfaction of this constraint guarantees that the more able are better off than the less able.
132 Second best problem
The Lagrangean for the optimum tax problem is
$$u(c_1) + u(c_2) - v(L_1) - v(L_2) + \lambda \left[ w_1 L_1 + w_2 L_2 - c_1 - c_2 - 2 \bar R \right] + \mu \left[ u(c_1) - v(L_1) - u(c_2) + v(w_2 L_2/w_1) \right]$$
First order conditions are
$$(1 + \mu) u'(c_1) = \lambda \qquad (1 - \mu) u'(c_2) = \lambda$$
$$(1 + \mu) v'(L_1) = w_1 \lambda \qquad v'(L_2) - \mu \frac{w_2}{w_1} v'\left( \frac{w_2 L_2}{w_1} \right) = w_2 \lambda.$$
133 The more able are better off
We know the incentive compatibility constraint to bind so that $\mu > 0$. From the first two of these conditions
$$u'(c_1) = \lambda/(1 + \mu) < \lambda/(1 - \mu) = u'(c_2)$$
and therefore $c_1 > c_2$. The more able work longer hours but they are rewarded in higher consumption for doing so.
134 Marginal tax rates
If we look at marginal tax rates we see that
$$v'(L_1)/u'(c_1) = w_1$$
so that the labour supply choices of the more able are still undistorted at the margin exactly as in the first best. However
$$v'(L_2)/u'(c_2) = w_2\, \frac{1 - \mu}{1 - \mu \dfrac{w_2}{w_1} \dfrac{v'(w_2 L_2/w_1)}{v'(L_2)}} < w_2$$
since $w_2/w_1 < 1$ and $v'(w_2 L_2/w_1) < v'(L_2)$.
The marginal rate of substitution of the less able is below their pretax wage. The distortion to their labour supply is a necessary feature of observing incentive compatibility at the optimum.
135 Explaining optimality of labour supply distortion I
Suppose that labour supply of more and less able were both undistorted so that
$$v'(L_1)/u'(c_1) = w_1 \qquad v'(L_2)/u'(c_2) = w_2.$$
Look for the possibility of a welfare-improving deviation consisting of infinitesimal changes $\Delta c_1$, $\Delta L_1$, $\Delta c_2$ and $\Delta L_2$. The change in social welfare would, using the undistortedness, be
$$\begin{aligned} & u'(c_1) \Delta c_1 - v'(L_1) \Delta L_1 + u'(c_2) \Delta c_2 - v'(L_2) \Delta L_2 \\ &= u'(c_1) \left[ \Delta c_1 - w_1 \Delta L_1 \right] + u'(c_2) \left[ \Delta c_2 - w_2 \Delta L_2 \right]. \end{aligned}$$
136 Explaining optimality of labour supply distortion II
But if the government budget constraint were respected then
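$$\left[ \Delta c_1 - w_1 \Delta L_1 \right] + \left[ \Delta c_2 - w_2 \Delta L_2 \right] = 0$$
so that the change in social welfare would be
$$\left( u'(c_2) - u'(c_1) \right) \left[ \Delta c_2 - w_2 \Delta L_2 \right].$$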
137 Explaining optimality of labour supply distortion III
Differentiating the incentive compatibility constraint gives
$$u'(c_1) \Delta c_1 - v'(L_1) \Delta L_1 - u'(c_2) \Delta c_2 + \frac{w_2}{w_1} v'\left( \frac{w_2}{w_1} L_2 \right) \Delta L_2 = 0$$
from which, using the government budget constraint and lack of distortion again,
$$-\left( u'(c_2) + u'(c_1) \right) \left[ \Delta c_2 - w_2 \Delta L_2 \right] = \left[ v'(L_2) - \frac{w_2}{w_1} v'\left( \frac{w_2}{w_1} L_2 \right) \right] \Delta L_2$$
so that the change in social welfare would be
$$-\frac{u'(c_2) - u'(c_1)}{u'(c_2) + u'(c_1)} \left[ v'(L_2) - \frac{w_2}{w_1} v'\left( \frac{w_2}{w_1} L_2 \right) \right] \Delta L_2.$$
138 Explaining optimality of labour supply distortion IV
If there is any desire to redistribute from more to less able then
$$u'(c_2) - u'(c_1) > 0.$$
A reduction in labour supply of the less able, $\Delta L_2 < 0$, would raise social welfare if implemented compatibly with the given constraints.
139 Extending to the continuous case
Extending this sort of analysis to more types means adding more incentive compatibility constraints. Most of these are redundant, however, under fairly weak conditions: it is necessary only to ensure that each individual is dissuaded from posing as the next least able person to ensure all constraints are satisfied.
The limiting case as one adds more and more incentive compatibility constraints for individuals nearer and nearer in abilities is a differential condition. Write
$$U(w_1) = u(c_1) - v(L_1) = u(c_2) - v(w_2 L_2/w_1) = U(w_2) + v(L_2) - v(w_2 L_2/w_1)$$
and let $w_2 - w_1$ approach zero to get
$$dU/dw \ge L v'(L)/w.$$
140 Equivalence to optimum labour supply response
The same condition could be reached by differentiating the utility function and using $v'(L)/u'(c) = w (1 - T'(wL))$.
Differentiating gives
$$\begin{aligned} \frac{dU}{dw} &= u'(c) \frac{dc}{dw} - v'(L) \frac{dL}{dw} \\ &= \left[ u'(c)\, w (1 - T'(wL)) - v'(L) \right] \frac{dL}{dw} + (1 - T'(wL))\, L u'(c) \\ &= (1 - T'(wL))\, L u'(c) \\ &= L v'(L)/w. \end{aligned}$$
141 Optimum control formulation
As we reach the continuous case the problem becomes one of maximising
$$\int \left[ u(c) - v(L) \right] dF$$
subject to budget constraint
$$\int \left[ wL - c - \bar R \right] dF = 0.$$
This can be treated as an optimum control problem, for example by
• taking labour supply L as control variable
• taking utility U = u(c) − v(L) as state variable
• letting the incentive compatibility condition give an equation of motion for U
142 Hamiltonian formulation
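The Hamiltonian for the problem is
$$H = \left\{ U + \lambda \left[ wL - c - \bar R \right] \right\} f(w) + \mu L v'(L)/w$$
with first order condition
$$\begin{aligned} 0 = \frac{\partial H}{\partial L} &= \lambda \left\{ w - \left. \frac{\partial c}{\partial L} \right|_U \right\} f(w) + \mu\, \frac{L v''(L) + v'(L)}{w} \\ &= \lambda w T'(wL) f(w) + \mu u'(c) \left( 1 - T'(wL) \right) \left( 1 + \frac{L v''(L)}{v'(L)} \right). \end{aligned}$$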
143 Costate variable
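The evolution of the costate variable $\mu$ follows
$$\frac{d\mu}{dw} = -\frac{\partial H}{\partial U} = -\left( 1 - \frac{\lambda}{u'(c)} \right) f(w)$$
and satisfies the endpoint condition
$$\mu(w_{\max}) = 0.$$
Thus
$$\mu = \int_w^{w_{\max}} \left( 1 - \frac{\lambda}{u'(c)} \right) dF.$$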
144 Optimum marginal rates I
Rearranging the first order condition gives
$$\frac{T'(wL)}{1 - T'(wL)} = -\frac{\mu u'(c)}{\lambda w f(w)} \left( 1 + \frac{L v''(L)}{v'(L)} \right).$$
The endpoint condition $\mu(w_{\max}) = 0$ ensures that the optimum marginal tax rate is zero at the top of the distribution just as in the discrete case. At lower ability levels we would expect a negative $\mu$ (the integrand $1 - \lambda/u'(c)$ is negative among the most able, whose marginal utility of consumption is low) and therefore a marginal tax rate between 0 and 1. The variation in $\mu$ across ability levels will depend upon the strength of redistributive social preferences.
145 Optimum marginal rates II
For lower values of $w$ we also have an indication of other factors determining the optimum marginal tax rate. $T'(wL)$ is higher at points in the distribution where
• $w f(w)$ is low: there is a low concentration of the highly able
• $\partial v'(L)/\partial L$ is high: hours choices are relatively insensitive to after-tax wages.
146 Combining income and commodity taxation
Allowing for nonlinear taxation of labour income has dramatic consequences for the optimum structure of commodity taxes.
The earlier discussion of commodity taxation showed that nonuniform commodity taxation could be useful to a redistributive government even if
• a linear labour income tax were available
• labour supply were separable from commodities in preferences
This is no longer true if we remove the restriction of linearity on labour income taxes. The result is due to Atkinson and Stiglitz.
147 Combining income and commodity tax with discrete types
Take the setting with two ability types above and let utility be
$$U(q, L) = u(q) - v(L)$$
where $q$ is a vector of goods rather than a scalar consumption aggregate, with pretax prices $p^0$. We can rewrite the Lagrangean for the optimum tax problem as
$$u(q_1) + u(q_2) - v(L_1) - v(L_2) + \lambda \left[ w_1 L_1 + w_2 L_2 - p^{0\prime} (q_1 + q_2) - 2 \bar R \right] + \mu \left[ u(q_1) - v(L_1) - u(q_2) + v(w_2 L_2/w_1) \right]$$
148 Optimal commodity taxes
First order conditions with respect to individual commodity demands $q_{hi}$, $h = 1, 2$, $i = 1, \dots, m$ are
$$(1 + \mu)\, \partial u/\partial q_{1i} = \lambda p_i^0 \qquad (1 - \mu)\, \partial u/\partial q_{2i} = \lambda p_i^0$$
so that, for both types $h = 1, 2$ and for any two goods $i, j = 1, \dots, m$, the marginal rate of substitution is optimally set equal to the pretax price ratio
$$\frac{\partial u/\partial q_{hi}}{\partial u/\partial q_{hj}} = p_i^0/p_j^0.$$
149 Interpreting optimal uniformity
In fact, this implies the even stronger result that even nonlinear commodity taxation would serve no purpose. This is critically driven by the facts that
• individuals face common pretax prices for commodities
• utility is separable between commodities and leisure
The only restriction on achieving the first best is the incentive compatibility constraint arising because of the government’s inability to observe individual ability types. Separability means that observing commodity demands is of no use in alleviating that.
150 Combining income and commodity tax with continuous ability
Obviously this extends also to a continuous ability distribution. Firstly, using the budget constraint and first order conditions for optimum labour supply and commodity choice, we derive the equation of motion for $U$
$$\begin{aligned} \frac{dU}{dw} &= \sum_i \frac{\partial u}{\partial q_i} \frac{dq_i}{dw} - v'(L) \frac{dL}{dw} \\ &= \left[ \sum_i p_i \frac{dq_i}{dw} - w (1 - T'(wL)) \frac{dL}{dw} \right] \frac{v'(L)}{w (1 - T'(wL))} \\ &= L v'(L)/w \end{aligned}$$
which is unchanged. Now rewrite the Hamiltonian
$$H = \left\{ U + \lambda \left[ wL - p^{0\prime} q - \bar R \right] \right\} f(w) + \mu L v'(L)/w.$$
151 Optimal uniform taxes, again
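First order conditions with respect to $L$ and $q_i$, $i = 2, \dots, m$ include
$$\begin{aligned} 0 = \frac{\partial H}{\partial q_i} &= \lambda \left\{ -p_1^0 \left. \frac{\partial q_1}{\partial q_i} \right|_U - p_i^0 \right\} f(w) \\ &= \lambda \left\{ p_1^0\, \frac{\partial u/\partial q_i}{\partial u/\partial q_1} - p_i^0 \right\} f(w) \end{aligned}$$
from which it follows that
$$p_1/p_i = \frac{\partial u/\partial q_1}{\partial u/\partial q_i} = p_1^0/p_i^0$$
so that commodity choice is again undistorted.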
152 Practical optimum tax formulae
Optimum tax formulae involve solving some rather challenging simultaneous differential equations. It is perhaps better to regard these formulae as informative about general principles rather than as a practical toolbox. Recent developments have sought to bridge the gap to practical application. One approach (associated with Saez):
• constrain the tax function to be linear over an interval
• allow it to be set unconstrainedly optimally outside
• consider factors determining the optimal constant marginal rate over that interval
153 Optimum high end tax rates
The easiest case to consider is a tax linear above a threshold. Suppose we impose a linear tax above wL = ζ with rate τ and implicit grant G. Consider a change leaving posttax income the same at ζ so that
∆G = ζ∆τ
154 Nonbehavioural revenue effects
First assess the revenue effects.
If no one changes behaviour in response then each person above $\zeta$ pays an additional $(wL - \zeta) \Delta\tau$ so that the mean effect is
$$\Delta\tau \int_{wL \ge \zeta} (wL - \zeta)\, dF.$$
155 Behavioural revenue effects
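But behaviour does change. Each person above $\zeta$ responds so that earnings change by
$$\begin{aligned} w \Delta L &= -wL\, \frac{\partial \ln L}{\partial \ln w(1 - \tau)} \frac{\Delta\tau}{1 - \tau} + \zeta\, \frac{w(1 - \tau) L}{G} \frac{\partial \ln L}{\partial \ln G} \frac{\Delta\tau}{1 - \tau} \\ &= -\frac{\Delta\tau}{1 - \tau} (wL\eta - \phi\zeta) \end{aligned}$$
where $\eta = \partial \ln L/\partial \ln w(1 - \tau)$ and $\phi = (w(1 - \tau)L/G)\, \partial \ln L/\partial \ln G$. Hence there is a mean revenue gain from behavioural responses of
$$-\frac{\tau \Delta\tau}{1 - \tau} \int_{wL \ge \zeta} (wL\eta - \phi\zeta)\, dF.$$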
156 Welfare effects
Against this revenue loss we need to balance the welfare effects. By the envelope theorem we can ignore behavioural effects here. Suppose that we value high earners’ incomes at $\gamma$ times the value of public funds. The welfare loss is then
$$\gamma \Delta\tau \int_{wL \ge \zeta} (wL - \zeta)\, dF.$$
157 Optimum balance of revenue and welfare effects
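At the optimum
$$(1 - \gamma) \Delta\tau \int_{wL \ge \zeta} (wL - \zeta)\, dF - \frac{\tau \Delta\tau}{1 - \tau} \int_{wL \ge \zeta} (wL\eta - \phi\zeta)\, dF = 0$$
and therefore
$$\frac{\tau}{1 - \tau} = \frac{(1 - \gamma) \int_{wL \ge \zeta} (wL - \zeta)\, dF}{\int_{wL \ge \zeta} (wL\eta - \phi\zeta)\, dF}.$$
Notice here that $\gamma$ may be a matter of value judgment but everything else is observable or estimable.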
158 Practical formulae
If there is a known highest earnings level then the tax rate goes to zero at that point. Suppose not, but
$$\int_{wL \ge \zeta} (wL - \zeta)\, dF \Big/ \zeta (1 - F(\zeta)) \to (\alpha - 1)$$
as $\zeta \to \infty$ and $\gamma$, $\phi$ and $\eta$ are roughly constant above $\zeta$. Then we have an applicable formula at high income levels
$$\tau/(1 - \tau) \to (1 - \gamma)(\alpha - 1)/(\alpha\eta - \phi).$$
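The limiting formula is easy to evaluate for candidate parameter values. The sketch below is illustrative only: the function name and the parameter values are assumptions, and $\alpha$ is used as defined above, so that mean earnings above the threshold are $\alpha$ times the threshold.

```python
def top_marginal_rate(gamma, alpha, eta, phi=0.0):
    """Solve tau/(1 - tau) = (1 - gamma)(alpha - 1)/(alpha * eta - phi) for tau.

    gamma: social value of high earners' income relative to public funds.
    alpha: ratio of mean earnings above the threshold to the threshold.
    eta:   elasticity of hours with respect to the after-tax wage.
    phi:   income effect term as defined in the text (zero if no income effects).
    """
    x = (1.0 - gamma) * (alpha - 1.0) / (alpha * eta - phi)
    return x / (1.0 + x)


# Assumed values for illustration.
rate_no_weight = top_marginal_rate(gamma=0.0, alpha=2.0, eta=0.5)
rate_some_weight = top_marginal_rate(gamma=0.5, alpha=2.0, eta=0.5)
print(rate_no_weight, rate_some_weight)
```

Placing more social weight on high earners' incomes (higher $\gamma$) or assuming a more elastic response (higher $\eta$) lowers the implied rate.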
159