<<

James Mirrlees' Contributions to the Theory of Information and Incentives Author(s): and Timothy Besley Source: The Scandinavian Journal of Economics, Vol. 99, No. 2 (Jun., 1997), pp. 207-235 Published by: Blackwell Publishing on behalf of The Scandinavian Journal of Economics Stable URL: http://www.jstor.org/stable/3440553 . Accessed: 09/03/2011 03:25

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use.

Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at . http://www.jstor.org/action/showPublisher?publisherCode=black. .

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

Blackwell Publishing and The Scandinavian Journal of Economics are collaborating with JSTOR to digitize, preserve and extend access to The Scandinavian Journal of Economics.

http://www.jstor.org Scand.J. of Economics99(2), 207-235, 1997 James Mirrlees' Contributions to the Theory of Information and Incentives*

Avinash Dixit PrincetonUniversity, Princeton, NJ 08544-1021,USA and TimothyBesley London School of Economics,London WC2A 2AE, England I. Introduction The theory of informationand incentives has revolutionizedeconomics. We now understandhow markets and other modes of private economic transaction,as well as public policy, are affected when different partici- pants have differentinformation, or when the actions of some participants are not observable to others. Mirrlees has made pathbreakingcontribu- tions - arguablythe two most importantcontributions - to the develop- ment of these ideas and methods of analysis. His Review of Economic Studiesarticle [1971c]on optimal nonlinearincome taxationintroduced a technique that evolved into the "revelationprinciple", which fully charac- terizes all policies that are "incentivecompatible", or feasible under infor- mation asymmetry. This technique is now the workhorse model for numerous applications.His Bell Journalof Economics article [1976a] on hierarchiesand organizations,and some of his importantunpublished (yet widely circulated)papers referred to below, played a similar role for the theoryof the design of incentivesto induce effort. He identifiedthe precise way in which observableoutcomes convey probabilisticinformation about the underlyingunobservable action, and therebyclarified how rewardsor punishmentsbased on outcomes can serve as incentivesto action. Mirrlees developed these ideas in the context of specific applications. However, they are truly fundamentalto economic theory, and they have profoundlyinfluenced subsequent thinking and researchon the design of * We are very gratefulto , , Bengt Holmstrom,Kevin Roberts, RobertWilson, and last and most important,Richard Zeckhauser, for theirvery detailed and helpful commentson the preliminaryversion, given under extremetime pressure. Dates in bracketsrefer to publicationsby Mirrlees discussed in this paper (see Biblio- graphy) and dates in parentheses to works by other authors listed at the end of this article. Biographicalnote: James A. Mirrleeswas bornin 1936in Minnigaff,Scotland. He received his M.S. in Mathematics in Edinburghin 1957, and his Ph.D. from the University of Cambridgein 1963. He was Edgeworth Professor of Economics at Oxford University between 1969 and 1995, and currentlyholds the professorshipof PoliticalEconomy at the Universityof .

? The editors of the ScandinavianJournal of Economics 1997. Publishedby BlackwellPublishers, 108 Cowley Road, OxfordOX4 1JF, UK and 350 Main Street, Maiden, MA 02148, USA. 208 A. Dixit and T.Besley incentives in numerous contexts. Robert Wilson has expressed this eloquently:1"For twenty years the general theory of taxation has been founded substantiallyon the seminal contributionby James Mirrlees.It is also the foundation for related topics that have become mature fields of study: the theory of regulation, price discriminationof various forms, nonlinear pricing and Ramsey pricing generally, contracting (labor contracts,principal-agent relationships,procurement, risk-sharing). Like the implicationsof Einstein'stheory of relativityfor physics,the methodo- logical implications of Mirrlees' formulation and analysis infuse every topic in microeconomictheory where incentives and/or private informa- tion are relevant." Mirrlees is also known to a broader spectrumof the economics profes- sion for his contributionsto the theory of public finance, and to develop- ment economics.In the formerarea, in collaborationwith Peter Diamond, he offered a thorough and definitive treatment of optimal commodity taxation. This both formalized and extended previous work by Frank Ramsey and . The Diamond-Mirrlees [1971a,b] papers also spurred a great deal of subsequent work, with the germs of ideas developed later by many others found within them. Most importantly,they proved that even in the absence of personalized lump-sumtransfers, it is generallydesirable to main efficiencyin produc- tion. Any necessarydistortions should take the form of taxes or subsidies levied at the final stage of sales to consumers. In particular,for a small country that does not have any monopoly power in internationaltrade, world prices constitute the correct social opportunitycosts. This formed the basis of Mirrlees' work with Ian Little in development economics, whichhas had a majorpractical influence on the way in whichinternational development agencies conduct cost benefit analyses. This essay focuses on Mirrlees'work on informationand incentives.We begin with a brief review of the issues, and the limitationsof traditional economic theory in confrontingthem. We keep this part very non-techni- cal and elementary, so that the general reader can obtain an adequate knowledge of the issues. We then discuss Mirrlees' pioneering articles, explainhow they evolved into the general methods of solution of problems of the design of incentives,and give a brief summaryof later developments of the theory and its applications.Here we sketch a few technical details. We then briefly discuss Mirrlees'other contributions.

II. An Overview of Information Economics It has long been recognized that the participantsin the numerous trans- actions that go on in any complex economy have diverse information ' Privatecommunication.

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 209 pertaining to these transactionsand their effects. Managersknow more about a firm'sproduction than do shareholders;workers know more about their own abilities than do employers;taxpayers know more about their actual and potential income than does the government'srevenue service. Traditionaleconomic theory argued that the price system automatically fully aggregates this dispersed information.The reasoning is as follows. Each producer or worker uses his information to calculate the action (supply or demand) that is most profitableor beneficial to himself, and these decisionscollectively determine the marketprices. Therefore market pricesshould reflect all the privateinformation as it pertainsto the scarcity of various goods and services. However, this analysisapplies only to marketswith a large number of participantswhere the buyer and seller do not care about each other's identities. Many real-life transactionsare not "arm's-length"or "anony- mous" in this sense, and thereforethey have featuresthat orthodoxtheory cannot explain. Bilateralbargaining offers the most obvious example;the informationa seller has about the quality of his product matters to the potentialbuyer, who has to decide whetherto concludea deal, and if so, on what terms. Similarly,an employer tryingto secure more or better effort from a particularworker or groups of workersis not dealing in an anony- mous labormarket. And likewise,a bankthat lends to a companyfaces the risk of default from that particularcompany, not from an anonymous capital market.Some of these relationshipsmay be competitiveex ante:an employermay choose from amongseveral job applicantsand a workermay choose among several jobs; there are many potential borrowers and lenders in the capital market. But once a deal is struck,the relationship becomes bilateral and non-anonymous;therefore the parties must take into account this future fixity and its implications,even while they are consideringthe choice from among numerouspotential partners.2 When private information bears on transactionsthat are not anony- mous, each participanthas the opportunityand the natural incentive to manipulate his information to his own advantage.A job applicant may overstatehis abilityor qualifications,and an intense workerwhen the boss is present may slack off in his absence.An insuranceapplicant may under- state the risk he is seeking to insure; if an insured risk materializes,the claimantwill try to hide his lack of precautionarymeasures.

2If one partymust make some sunk investmentex ante, and the contractas struckex ante is not fully enforceableby an externalauthority, then the other partycan demanda renegotia- tion more favorable to itself; this problem of "opportunism"has been emphasized by Williamson(1985, Chs. 2, 7). The problemwe discussis different;it arisesbecause informa- tionalasymmetries restrict the set of feasiblecontracts, even when feasiblecontracts are fully enforceable.

? The editors of the Scandinavian Journal of Economics 1997. 210 A. Dixit and T Besley

Of course the partieson the other side of these transactionshave equally clear incentives to try to elicit the truth about quality or risk. One way would be to improve observabilityand make informationsymmetric. A bank that has lent to a companycan obtain representationon the board of directorsto monitor how the company is using the money; health or life insurancecompanies assess the riskinessof applicantsby subjectingthem to medicalexaminations; tax authoritiesconduct audits. But all such direct checks are costly and imperfect. Even after they have been used to the point where their marginalbenefits no longer justify the marginalcost, information asymmetriespersist, and they are often quite substantial. Therefore devices that rely on the informedparties' own incentivesmust be used. The general idea is to structurethe terms of the contractbetween the two asymmetricallyinformed parties in such a way that concealment becomes less possible or more costly for the informed party. In other words, that party's incentives must be realigned so that they are more congruent.In the context of insurance,the most obvious such devices are deductiblesand coinsurance,where the insured has to bear a part of the loss; see Pauly (1968). Further analysisreveals many other, more subtle, mechanismsof this kind that are useful for more complexsituations. These are discussed in Sections III and IV below. Mirrlees'work has helped lay the foundationsof this general theory of the design of incentives. The emphasis on the lack of anonymityin transactionswhere private informationis important,and on the resultingopportunities for strategic manipulationof information,should also make it clear that concepts from non-cooperativegame theory should play a central role in the theory of informationeconomics. Indeed, it is just this combinationof game theory and the theoryof informationthat has producedsome of the most exciting and fruitfulresearch in economics duringthe last two decades. The prizes awardedto three pioneers of game theory - Nash, Harsanyi,and Selten - in 1994, and to two pioneers of informationeconomics this year- Mirrleesand Vickrey- are just and fittingtestimony to the importanceof this research. Indeed, Harsanyiand Selten's award-winningwork was to extend and refine the concept of Nash equilibriumto games of imperfect and incomplete information;this provides much of the game-theoretic basis for the moderndevelopments of the theoryof mechanismdesign that stem from the work of Mirrleesand Vickrey.

A BriefHistory When economic theorists began to examine the role of information in detail, the first approach was to think of information as an economic commodity. Thus statistical theories of the value of informationwould

? The editors of the Scandinavian Journal of Economics 1997. JamesMirrlees and the theoryof informationand incentives 211 enable us to find the demand for information.Some useful progressalong this road was made, but a different route proved much more fruitful. Instead of regardinginformation as a commodityby itself, people asked how the availabilityand the dispersalof informationpertaining to ordinary economic commoditieswould alter the workingof the marketsand other allocation mechanismsfor those commodities. Arrow (1963, 1970) organized the economic analysis of asymmetric informationin two categories:unobservable action ( in the jargon of the insuranceliterature) and hidden information(adverse selec- tion in the same jargon).Both arise because the insuredhas more informa- tion about the risks he seeks to insure than does the insurancecompany. Moral hazard arises when the insured can take actions (such as always lockingthe car) that will reduce the levels of risk.The insurancecompany cannot observethese actions,and the availabilityof insuranceweakens the insured'sincentives to take the precautions.Adverse selection ariseswhen the risk depends on some innate characteristic(such as a health condition or general carelessness)that, althoughnot under control of the insured,is knownto him but not to the insurancecompany. Then a contractwith any given ratio of premiumsto coveragewill attractexactly those clientswhose risk is high enough that they find the ratio attractive;hence the termin- ology. Arrow'sanalysis of these problemswas mostly informal,but it is a tribute to his insight that these two concepts continue to provide an organizingframework for the subject.3 The next best known early work is Akerlof (1971). He showed how a marketcould collapse altogetherunder the weight of the adverseselection problem.The seller of a used car knowsthe qualityof the particularcar he is selling; buyers know only the average quality on the market, not the qualityof a particularcar offered to them. Buyerswill not pay more than what the average car on the market is worth, but then only those with below-averagecars will put them on the market, which will lower the average,and so on, until no one is willingto sell and the marketcollapses. This was a dramaticand provocativeexample, and spurredmuch further research, but by itself it is ultimately of somewhat limited value. Many markets suffer from some adverse selection, but most of them do not collapse. They find ways of coping, by devisingcontracts and other incen- tives that partiallysolve the problems posed by the asymmetricinforma- tion. Therefore it is far more importantto understandhow such coping takes place, namelyhow to design such contractsand incentivesthat reveal some or all of the information,usually at some cost.

3The terminology is somewhat unfortunate;no moral judgement need be made when someone takes an unobservableaction that optimizeshis own objective,and selection need not alwaysbe adverse.But the usage has become common, and we will adopt it here.

? The editors of the Scandinavian Journal of Economics 1997. 212 A. Dixit and T Besley

There are two distinctbut related ways of thinkingabout the resulting theory. One considers the context of the informationasymmetry, distin- guishing market interactionsfrom non-marketones. The other, concep- tually more basic, classifies the problems according to whether the initiative for coping comes from the less informed party or the more informed. Methods that the less informedparty can employ to elicit desired infor- mation or action from the more informed party are generally called "screening"devices. Direct tests are screening by examination;schemes that alter the informedparty's own incentivesin the desired directionare called screening by self-selection. Mirrlees'and Vickrey'swork provides foundationsof the latter branchof theory.It is of particularimportance in non-market situations - the internal organization of a firm, and the government'seconomic policy and planning.Of course screeningby self- selection does arise in marketcontexts too; an importantearly model for insurance markets is in Rothschild and Stiglitz (1976) and Wilson (1977). The more informedparty may wish to take the initiativeand reveal its informationif this will work to its advantage,but then the other partywill treat such revelations skeptically.A highly skilled or productiveworker would like a potential employer to know the fact; so would a good car driver like the insurancecompany to recognize his skill. The difficultyis that a less skilled or productiveworker would try to pretend to have high skills or productivity,and a worse driverwould claim to be better. There- fore strategicallyaware employers or insurers would not believe mere assertions.The trulyskilled or carefulhave to find observableactions that those with less skill or care would find it too costlyor impossibleto imitate, and which therefore crediblyconvey their truthfulinformation to the less informed party in the transaction. Spence (1974), who pioneered this branch of the theory of asymmetricinformation, labelled such actions "signals".For example, a high-skillworker may get education to a level that someone with less skill would find it too expensiveto achieve.Even if the last years of education have no direct effect on the worker'sproduc- tivity,they can still have value as a signal.Signalling is particularlyimport- ant in market interactions, although it does arise in organization or planning,too. This very brief discussionof signallingmerely serves to clarifyhow that branch of the theory differs from the one where the uninformedparty takes the initiative,namely screening,which is our main focus here. We now proceed to examine in detail the design of incentive mechanismsfor screening,first for the case where some innate characteristicof one party is not known to the other (adverseselection), and then for the case where one party'saction is not observableto the other (moral hazard).

? The editors of the Scandinavian Journal of Economics 1997. JamesMirrlees and the theoryof informationand incentives 213

III. Incentives under Adverse Selection To recapitulate,the concept of adverseselection in informationeconomics arises most directlyin an insurancecontext. If some innate characteristic that affects the risk class of an individualis not observableto the insurance company,then an insurancepolicy will selectivelyattract clients in bad risk classes. More generally,the term refersto a situationwhere one partyto an economic relatinshiphas advanceprivate information about some param- eter that affects the payoffs from the relationship. An importantapplication of adverse selection is the case of redistribu- tive taxationwhere the governmentcannot observe individuals'earnings abilities. Mirrlees'[1971c] income tax model was the first general analysis of optimal policy design that took account of such informationalasym- metry. Edgeworth (1897) had observed that an egalitarian objective implies very high tax rates - even 100 per cent at the margin.Although economists were aware of the implied disincentives,little progress had been made on how to bring these into the analysis,weighing up the gains from equity against any efficiency loss. Vickrey (1945) made a valiant attemptto model this, but did not get any results.There were attemptsthat specified a parametrizedtax schedule, for example affine in Sheshinski (1972), quadraticin Zeckhauser(1969), or isoelasticin Wesson (1972), but these remained isolated examples that failed to bring out the general principles.Mirrlees provided a formulation. The following is a somewhat simplified version of his model. Let x denote the consumptionor after-taxincome of a person, and z the output or before-tax income he produces. People differ in their productiveeffi- ciency. These differences are captured by an index n over a continuum range [1l,h].Normalize the total population to unity. Let f(n) denote the density function of the population accordingto their types, and F(n) the cumulative distributionfunction of types. A person with higher n can produce a given outputz with less effort or disutility.Choose the scale of n so that the utilityof personn is U(x,z/n), increasingin the firstargument, decreasingin the second, and concave. A simple interpretationis that by workingy clock hours, person n can generate z = ny efficiency units of labor. The government wants to maximize a standard Utilitarian welfare function

U (n),( )f(n)dn. (1)

This is subject to the budget constraint,or equivalentlya total resource constraint,

? The editors of the Scandinavian Journal of Economics 1997. 214 A. Dixit and T Besley

'h [z(n)-x(n)]f(n) dn >0. (2) . I If each person's type were publicly observable,the governmentcould impose individualized lump-sum taxes. This is formally equivalent to orderingperson n to produce outputz(n), and give him consumptionx(n), these functionsbeing chosen to maximize(1) subjectto (2). The first-order conditions are

U, (x(n), --U2(x -- , (3) n n \ n where ) is the Lagrange multiplier for the constraint (2). This is a hypothetical first best, assuming away information problems. It is essentially the solution that Edgeworth had in mind, since the marginal utility of consumptionis equalized across individuals. To see why such informationproblems are endemic, differentiate (3) totally with respect to n, solve for the derivativesx'(n) and y'(n), and substitute to find the effect on the utility V(n) = U(x(n), z(n)/n). As long as leisure is a normal good, this implies V'(n) <0, so that high ability individualsare worse off than those of low ability.Since the government wants to equalize marginalutilities of consumptionand effort, the most productive individualsmust work harder,while not enjoying sufficiently increasedconsumption to compensate them. This starkly highlights the incentive problem: if n is not publicly observable, and the governmenttries to implement the above first-best, people will have an incentive to pretend to be less productivethan they actually are by reducing their labor supply. This makes the first best infeasible. The central question then becomes what kind of a second best can be achieved using an income tax schedule common to all persons. Such a schedule can be equivalently specified by making the consumption or after-taxincome x a function of the productionor pre-tax income z, say x = X(z). Each personn choosesz(n) to maximizeutility U(x, z/n) subjectto x = X(z). The schedule is chosen by the government.Properties of the tax sched- ule, such as progessivity,must be derivedfrom the government'soptimiza- tion; they cannot be assumedfrom the outset. Thereforevirtually none of the usual convexityconditions for the consumer'schoice can be imposed. This makes the problem seem intractable.Mirrlees' ingenious device was that under certain conditionsthe whole problemcould be reduced to just one optimization - the government's - with a subsidiary constraint to ensure that the z(n) in the government'ssolution would also be chosen by person n.

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 215

We shall state Mirrlees' theorem, and then in the subsections that follow, relate it to later developments in this area.4 Suppose that the function (x, y) = -yU2(x, y)/U,(x, y) (4) is increasingin y for each fixedx. The consumption,production and utility levels x(n), z(n) and V(n) = U(x(n), z(n)/n) over the range of n arise from individualmaximization under some tax functionx = X(z) if and only if two conditionshold: (i) z(n) is a non-decreasingfunction, and (ii) utilityvaries with individualtype accordingto the differentialequation V'(n) = -z(n)U2(x(n), z(n)/n)/n2. (5) This theorem reduces the very difficultproblem of designingthe sched- ule X(z) to a simplerand more standardoptimal control problem,namely the maximizationof (1) subject to the constraint (5). Regard n as the independentvariable, V as the state variable,and z as the controlvariable; given Vandz,x is solved using the relation V = U(x,z/n). Then (5) becomes the equation for the system dynamics.Note that since U2(') is negative, V'(n) is positive. Thus the higher types must be given sufficientlymore utility to remove their incentiveto pretend to be lower types. This control problemcan be solved whereasthe originalone degeneratesinto a morass of uninterpretable conditions. This transformation was Mirrlees' breakthrough.5 Mirrlees' equivalence theorem contains within it the seeds of much subsequentwork on mechanismdesign with hidden information.The next two subsectionsexplain this by relatingthe theoremto two lines of general- ization that have followed.

TheRevelation Principle The most importantinnovation in Mirrlees'solution is that instead of the income tax schedule the government can specify the consumption and output schedulesx(n) and z(n) subjectto certainconditions. The differen- tial equation (5) capturesthe conditionthat the assignedquantities should be equivalent to the ones individualswould choose optimallyalong a tax

4Actually,our statement of the theorem ignores several subtle matters to do with the definitionof the consumptionset, the differentiabilityof V(n), etc. But the simplifiedversion sufficesto bringout all the points that are relevanthere. 5 More generally,this is knownas solvingthe problemby focusingonly on "local"incentive compatibilityconditions which concern only whether a particularindividual wishes to masqueradeas an adjacenttype. In this analysis,local incentivecompatibility implies global incentivecompatibility, i.e., not wishingto masqueradeas an adjacenttype guaranteesthat one would not wish to masqueradeas any other type. See the discussionbelow.

? The editors of the Scandinavian Journal of Economics 1997. 216 A. Dixit and T Besley schedule. Since person n could have made the choice (x(m),z(m)), we must have U(x(m), z(m)/n) - U(x(n), z(n)/n) < 0. (6) Omittingsome technicalproblems of discontinuitiesfor ease of exposition, suppose that (x(n), z(n)) varies smoothlywith n along the schedule.Divide (6) by (m-n) and let m->n. This gives

U, n) x(n)+- U2 (X(n) z'(n) = 0. (7) (x(n),kx z(n n n n Then V'(n) = dU(x(n),z(n)/n)/dn

z(n) n 1 n z(n) n (n - z(n) n )xx'(n)?-(U2x(n), n n )

z(n) -z(n) -- U2 x(n), ( ). n2 n

Using (7) in this, we get (5); this is just an "envelope"result. All this works as if the governmentsimply asked people to declare their private information (their type n) and then assigned them appropriate production targets and consumption levels. The functionsx(n), z(n) are laid down in advance;they constitutethe government'scommitment about how it will use the informationthat people supply.Then (6) becomes the condition that the functions should be such that people find it optimal to reveal their informationtruthfully. Once the resultingcontrol problem is solved, the optimal policy can be expressed as a tax schedule x = X(z) obtained by eliminatingn between the optimal control functionsx = x(n) and z = z(n). These ideas were later formalized by others, most notably Myerson (1982), into some terminolgy and a theorem. A mechanism that simply asks people to reveal their information is called "direct"; if truthful revelation is optimal, such a mechanismis called "incentivecompatible". Then the general theorem, labelled the revelation principle,says that all feasible allocations of any mechanism are the same as those of a direct incentive-compatiblemechanism.6 This principle dramaticallysimplified the problem by permitting us to restrict our search for an optimal

6Once again we leave out many details, most importantlythe specificationof equilibrium concept under each mechanism.In the Mirrleesmodel, we can use the most compelling concept, namelydominant strategy equilibria.

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 217 mechanismwithout loss of generalityto a much smaller class of feasible mechanisms.

The Single-CrossingProperty The simplificationafforded by the revelationprinciple goes only partof the way. The complete or global incentivecompatibility condition is (6). It has to hold for all n and m. This is still a continuumof constraints;maximizing (1) subjectto all of them at once would be too complex.The furthertrick in reducingthe problem to a standardcontrol frameworkis the abilityto replace this set of conditions by the local or first-orderconditions (7), or equivalently,the envelope result (5). That is achievedby the condition on the function 4(x,y) defined in (4) above. For a fixedn, considerthe tradeoffbetween x andz along an indifference curve. The implicitfunction theorem gives -dx- U2(x, z/n) (1/n)

dz_ u = constant U, (x, z/n) 1 =- (x, y), (8) z where y = z/n. Note that the curves are upward sloping because z is a "bad";they are convex because U is quasi-concave.Now compare the slopes of the indifference curves for different n that intersect at a given point (z, x). A highern with givenz means a lowery. Since +(x, y) has been assumed to increase as y increasesfor a givenx, (8) shows that of the two individualswhose indifference curves cross through the point (z,x), the one with the higher n has the flatter slope. In particular,any pair of indifference curves for different n can cross only once; hence the name single-crossingproperty. This assumptionensures that the functionsx(n) andz(n) obtainedusing the first-orderenvelope propertyin (5) do achieve the desired individual maximafor all n. In terms of a tax schedule, the idea is that higher-ntypes would find their optimum farther to the right, and thus would choose a largerz(n). We omit the details. The single-crossingproperty underlies virtuallyall models of adverse selection, and Mirrleeswas the first to recognize it and pinpoint its role. The work of Spence (1974), Rothschild and Stiglitz (1976) and others furtherclarified its role, and popularizedits use. Single-crossingallows us to replace the global incentive-compatibility condition (6) by the local (7). But it also implies that higher-nindividuals will choose a largerz(n); therefore the government'soptimization must

? The editors of the ScandintavianJournal of Economics 1997. 218 A. Dixit and T.Besley recognize this as a constraint on feasible or implementable functions (x(n), z(n)). This needs some care. The usual startingpoint is to ignore this condition and see if the solution automaticallygenerates an increasing z(n).7 If it does not, then the range of n has to be split into intervalswhere z(n) is increasingand ones where it is constant.This is called the "ironing procedure";see Baron and Myerson (1982) and Wilson (1993, p. 85) for details.

The OptimalityConditions Even with the importantconceptual simplification afforded by the revela- tion principle,the optimal income tax model remains quite complicated, and Mirrlees' [1971c] solution of it required much additional skill and effort, of both economic and mathematicalvarieties. Here we want to elucidate the issue of incentives in the simplest possible way, so we examine a very special case. We assume a utility function linear in consumption: U(x, y) = x-C(y) where the disutilityC(y) of the clock hoursy is an increasingand convex function.This is a poor model for the income tax problemas such,because in conjunctionwith the utilitarianobjective (1) it eliminates the equity considerationsthat motivatethe problemin the first place. The important compensatingadvantage for us is that the aspect of incentivesstands out most clearly and simply, and brings out the close parallel with other frequentuses of incentivecompatible mechanisms as in industrialorganiz- ation where income effects can be neglected; see Fudenberg and Tirole (1991, pp. 262-5). To restore any role for any income tax,we suppose that the government has a fixed revenue requirementR; thus the tax schedule is to be designed to raise this revenue while imposing as small a dead-weight burden as possible. There is one furtherproblem. When utility is linear in consump- tion and social welfare does not value equality,incentive compatibility can be met costlesslyby drivingdown the utility of the low types very far and then giving sufficientlymore to each higher type. Therefore we suppose that there is some lowerbound on consumptionwhich becomes bindingfor a range of the lowest types. We do not treat this in detail below, but focus on the more interestingfirst-order conditions for the types who are above this floor level.

7 Subsequentwork has shownthat this holds in manyapplications if the "hazardrate" of the distributionfunction of types,f(n)/[1-F(n)], is a monotonicfunction of n.

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 219

Regardingthe utility V(n) as the state variable,and formallytakingy(n) to be a control variable instead of z(n), the other control variablex(n) is linked to these by V(n) = x(n)-C(y(n)). (9) Then the budget constraint(2) becomes

"h [ny(n)-C(y(n))-V(n)] f(n) dn = R. (10) . I The incentive compatibilitycondition is V'(n) =y(n)C'(y(n))/n. (11) Subjectto these conditions,we wish to maximize,

V(n)f(n) dn. (12) . 1 While this can be solved using Pontryagin'smaximum principle, it is even easier to think of it as an ordinaryLagrangian problem with a contin- uum of choice variables x(n) and V(n), subject to a single revenue constraint(10) and a continuumof local incentivecompatibility conditions (11). Let A denote the Lagrangemultiplier for the former and /,(n) those for the latter. Then the Lagrangianis

"h = {V(n) - [ny(n)-C(y(n))-V(n)]}f(n) dn + /(h)V(h) - (l)V(l)

h ^h - f'(n)V(n) dn- f(n)[y(n)/n]C'(y(n)) dn, where in the second line we have integrated the product @(n)V'(n)by parts. The first-orderconditions for V(n) are (1-2) f(n)-'(n) = 0. Since V(h) is unconstrained,the terminalor transversalitycondition at the upper end-point h is V(h) = 0. Then we have ,(n) = (2-1) [1-F(n)]. (13) If the revenue constaint has bite, relaxing it should make it possible to increase social welfare, which is being measured in consumption units, more than one-for-one. Therefore we may take 2 > 1.

? The editors of the Scandinavian Journal of Economics 1997. 220 A. Dixit and T.Besley

Next, the first-orderconditions for y(n) are 2[n -C'(y(n))]f(n) - (n) [C'(y(n)) +y(n)C"(y(n))]/n = 0. (14) These have a very intuitiveand useful interpretation,which we show using Figure 1. We violate the mathematicalformalism somewhat, and suppose that the people are located at close but discrete points n, n + dn, ..., the number at n is f(n) dn, and the numbers at n +dn and beyond total [1 -F(n)]. In the figure, type n is located at point N, and type (n +dn) at point D. Their indifferencecurves are labelledI(n) and I(n + dn), and are shown as straight lines because this is a "local" picture covering only a small range of output and consumption quantities. The single-crossing propertyensures that I(n + dn) is flatter than I(n). Also, I(n + dn) passes through the point N; therefore type (n + dn) is only just prevented from wantingto masqueradeas n; this is how the incentivecompatibility condi- tion binds in the optimumsolution. Now suppose the governmentasks all personsof type n to work dy more clock hours. This raises their output by n dy. They are given enough extra

x 1 45?/

I (n)

I 1(n+dn)

I (n+dn) A D

z(n) z(n) + n dy z(n+dn) z Fig. 1.

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 221 consumptionto keep them indifferent;this moves them from point N to point N' in the figure.Using (8) above, the slope of the indifferencecurve I(n) is C'(y)/n, so the extraconsumption to each is n dyC'(y)/n = C'(y) dy. Therefore the governmentraises extra revenue [n-C'(y)] dy from each (correspondingto the height AN' in the figure). There are f(n) dn such people, and each unit of revenue is valued at i, so the increase in the government'sobjective is X[n-C'(y(n))]f(n) dn dy. This is just dn dy times the first term in (14) above. But the action has additionalconsequences beyond n. Considerthe type- (n + dn) people at D. They can now pretend to be of type n, and get the higher utility at N', because they can produce the requiredhigher output n(y(n) + dy) at a smallerdisutility of C(n(y(n) + dy)/(n+ dn)).8To prevent this, their consumptionmust be raised. In the figure,they must be moved up from D to D'. We have assumedutility to be linear in consumption,so DD' = BN'. The latter height equals the horizontaldistance n dy timesthe difference between the slopes of I(n) and I(n + dn), type type-n and type- (n + dn) indifferences curves at N. The slope of I(n) is C'(z/n)/n, and holding z fixed at its value at the point N, the change in the slope as n increasesby dn is

d (C'(z/n)N nC"(z/n) (-z/n2)-C'(zln) n n dn dn \ n )dn=- n2

C'(y) +yC"(y) = dn. ni2

Multiplyingthis by n dy gives the extra consumption required for types (n + dn). That increasesthe utility of each person receivingit by an equal amount,but it is also a revenue loss to the government,and that is valued at i per unit. Moreover, the same consumption increase must be granted (equal because utilityis linear in consumption)to the people of type (n + 2 dn) to preventthem from pretendingto be type (n + dn), and so on.9 There are in all [1-F(n)] people to the right of n. Giving them all this extra consumptionlowers the government'sobjective by

'Remember that the clock hours are not directly observable. 9For more general utility functions, these compensation amounts for successively higher types of people will be all different, and more complicated calculations will have to be made to determine the correct amounts and then to integrate them over types from n to h. The resulting formula will be more complex, but the principle of solution is the same as in our special case.

? The editors of the Scandinavian Journal of Economics 1997. 222 A. Dixit and T Besley

(2-1) [1-F(n)] (dy dn n which is just dy dn times the second term in the optimalitycondition (14) above. Putting together the direct effect to do with the people at n, and the indirect effect via the people to the right of n, we see that the condition requiresthe full marginaleffect to be zero. This is just as it should be.

FurtherResults The transversalitycondition V,(h)= 0 has an importantby-product. The first-ordercondition (14) at that point becomes h -C'(y(h)) = 0. The most productiveperson is asked to work to the point where the marginaldisutil- ity equals the productivity.In other words, there is no distortionof effort at that end-point. In the language of the tax schedule, this has a dramatic implication. Differentiatingthe utilityrelation (9) and using the incentivecompatibility condition (11), we have x'(n) = V'(n)+C'(y(n))y'(n) = [y(n)/n]C'(y(n)) + C'(y(n))y'(n) = C'(y(n)) [y(n) +ny'(n)]/n = C'(y(n))z'(n)/n. Along the income tax schedulex = X(z), dx/dz =x'(n)/z'(n) = C'(y(n))/n. Therefore at the upper end-pointn = h, we have dx/dz = 1, so the optimal marginaltax rate at the top is zero. This may seem strange, but it has a very intuitive interpretationthat follows from our discussionof the condition (14) above. Inducingperson n to work a little harder by offering more consumptionhas the additional bad consequence that all the people above n must also be offered higher consumptionto preservetheir incentivecompatibility. This is irrelevantfor the person at the top end of the range. Therefore it is optimal to induce him to work harder so long as any positive surpluscan be extracted;this stops being desirable only when the point is reached where the marginal product of the top person's effort equals its marginaldisutility. This intuition shows that the "no-distortion-at-the-top"result is far more general than the specificcontext illustratedhere. We ignoredequity considerationsby makingutility linear in consumption,but the result does not depend on this simplification.If a tax schedule does not have this property, another feasible and Pareto superior schedule can be

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 223 constructed,proving the first one non-optimal;see Phelps (1973), Sadka (1976) and Seade (1977). The result has also proved very important in other applications,most particularlyfor non-linearpricing. This result on the zero marginaltax rate at the top is, however, often misconstrued.First, its applicabilityis found to be limited in the numerical exercises discussed below. It also applies only when the support of the distributionis known for sure. Second, it is a distractionfrom the main issue for effecting redistribution,which concernsthe averagetax rate. The average tax rate for the richest person may be quite high, and rising through most of the range of n even when the marginalrate at the top is zero. The precise assumptionsabout what is observablemake an important difference for the incentive compatibilitycondition and therefore for the optimum tax schedule. Throughoutthe above analysis,we have assumed that the governmentcould observea person'sgross outputor incomez, but not the effort or clock hours y. A different solution results if y can be observed.Then the governmentcan announcefunctionsx(n) andy(n). The truthfulrevelation constraints become U(x(n),y(n)) > U(x(m),y(m)) for all n, m. These implyequal utilityfor all persons.The mechanismsimply strings out everyone along a common indifference curve, with the higher-n types working longer hours and getting just enough more consumption to compensate them. Mirrlees [1974b] pointed this out, but also remarked that in realityclock hourswere not a true indicationof effort. The transla- tion of clock hoursy into effective output z requiredsome unobservable effort, and a mechanismbased ony gave no incentiveto the higher-ntypes to make this effort.

SubsequentDevelopments and Applications Mirrlees' pioneering work on incentive compatibilityand the design of optimal rewardschedules under informationasymmetry has led to numer- ous applications,and has transformedour understandingof manyissues in many fields of economics. We offer a very brief and selective summaryof these subsequentdevelopments.

Taxation:Mirrlees was motivatedby the desire to understandproperties of just tax systems.For practicalapplication, it is importantto discernproper- ties of the tax schedule more fully, since the first-orderconditions tell us very little on their own. The firstset of simulationresults were presentedin Mirrlees[1971c]. He found the tax scheduleto be roughlylinear, with fairly low marginal tax rates. Stern (1976), beginning from Mirrlees' original

? The editors of the Scandinavian Journal of Economics 1997. 224 A. Dixit and T Besley finding of near linearity,conducted a more comprehensiveanalysis of a linear tax scheme. He tried differentspecifications of the productionfunc- tion or the objective function to see how high or progressive the tax schedulecould become, with mixed success. Mirrlees'simulations imposed strong assumptionsabout preferences and they were later augmentedby the more comprehensiveexercise of Tuomala (1984). He confirmedthat zero at the top is a not particularlyimportant in applicationswith rather high marginalrates being found near to the top of the distribution,then going rapidly to zero. He also found considerable non-linearityin the optimal tax schedule. Hammond (1979) broughtout some of the links between optimal taxa- tion and more clearly. He examined informational feasibilityin greaterdepth, and establishedthe importantresult that in an economy with a continuumof types, personalizedlump-sum transfers are not incentivecompatible. This gives a theoreticalbasis for the usual asser- tion that such transfers,despite their theoreticallydesirable propertyof being non-distorting,are not a practicaltool of policy. In an economywith a finite numberof discrete types, limited lump-sumtransfers are feasible; Stiglitz (1981) examines this possibilityand its consequences. Atkinson and Stiglitz (1980, pp. 412--22) give a simple expositionof the theory of optimumnon-linear income taxationwith finite numbersas well as a continuumof types, and offer an interpretationof the condition (14) that is specificallytailored to this context. Guesnerie (1995) is a recent detailed and masterly analysis of all aspects of the theory of tax design under asymmetricinformation. A furtherimportant line of researchconcerns when an optimal income tax constitutesthe limits on the amountof redistributionthat a society can undertake. Part of this issue concerns whether income taxes should be supplemented with commodity taxes and/or public provision of private goods to gain even more redistributivepossibilities. Beginning with Atkin- son and Stiglitz (1976) and Mirrlees [1976a], this has been shown to depend upon the separabilitybetween leisure and other goods in the utility function. Intuitively,a good that is relativelymore complementarywith labor supply ought to be subsidized. However, if all goods are equally substitutableor complementarywith laborsupply (as in the case of separa- bility), then income taxes can serve all redistributiveends.

TheRevelation Principle: While Mirrlees[1971c] already contains the idea, the generalityand the power of this principlebecame clear only gradually in the course of researchover several years. The same idea also emerged from anothersource, namelythe analysisof manipulabilityof social choice rules by Gibbard(1973), which in turngrew out of a conjectureof Vickrey (1960). More general formulations,and different equilibriumconcepts

? The editors of the Scandinavian Journal of Economics 1997. JamesMirrlees and the theoryof informationand incentives 225

(dominantstrategy, Bayesian Nash) emerged with the work of Dasgupta, Hammond and Maskin (1979) and Myerson(1982).

Non-linearPricing: Spence (1977) was probablythe first to recognize the generality and wider applicabilityof Mirrlees' income tax model.10He adaptedthe technique- regardingutility as a state variable,whose differ- ential equation captures the local incentive compatibilitycondition, and imposingthe single crossingcondition to ensure global incentivecompati- bility- to characterizea monopolist'soptimum non-linear price schedule when he does not know the demand function of any individualconsumer. Indeed, the single-crossingproperty is now often called the Mirrlees- Spence condition. Mussa and Rosen (1978) in an early and related model treat the question of a multi-productmonopolist's choice of qualitiesand prices. This line of research has progressed far in industrialeconomics. The masterlybook by Wilson (1993) is a notable recent contribution.

Regulation:Baron and Myerson(1982) showed how a regulatorwould set prices when the costs of the firm were private information.Just as the marginaltax rate in Mirrlees'problem has to be small enough to provide highly productiveindividuals to reveal their abilitythrough their income- leisure choice, so the regulator has to allow low-cost firms sufficiently higher profit (share the surplusor rent with them) to prevent them from slackening or pretending to have higher costs. Apart from its specific contribution to the theory of regulation, this paper was important in explainingthis idea to a much wider audience,and therebyspurring much more researchusing the revelationprinciple. This line of researchculmi- nated in the comprehensivetreatise by Laffont and Tirole (1993), which also initiated an extension to the politics of regulation.

Public Goods andAuctions: An individualbeneficiary of a publicgood has better knowledge of his own preferencesthan does the government.The consumer has the incentive to understatehis willingnessto pay, and the governmentwants to elicit the truth. The same problem arises between each bidder and the auctioneer when a private good is being auctioned. The theoryof design of truth-revealingmechanisms in these situationsgot its start from the work of Vickrey. Later developments of this research used the revelationprinciple, and its associatedtechniques, more formally. Myerson (1981) developed and applied the principle in the context of

'"Spencedoes not cite Mirrlees[1971c] directly, but he cites a manuscriptversion of Atkin- son and Stiglitz(1980), whose discussionof income taxationis explicitlybased on Mirrlees' paper.

? The editors of the Scandinavian Journal of Economics 1997. 226 A. Dixit and T.Besley auctions. Green and Laffont (1979) gave a thorough treatment of the theory of mechanismdesign for demand revelationand efficient provision of public goods.

IV. Incentives under Moral Hazard The essence of moral hazardis that one partyto an economic relationship takes an action that cannot be observed or inferred by another affected party.Thus an observableoutcome x is a functionx = g(a, 0) of two unob- servables,an action a and a state of nature 0. The action is chosen before the state of nature is realized. In the context of insurance,where the term moral hazardoriginates, a could be the care exercisedby the insured,and x the amount of loss in an accident.In a principal-agentrelationship, a is the agent'seffort andx the total profitor surplusfor the pair. In each case, 0 is a randomvariable that affects the outcome, making it impossibleto infer a perfectlyby observingx. Following the general discussion of insurance by Arrow (1963), the problemwas modelled in this way by Spence and Zeckhauser(1971) and Ross (1973), although the latter contributionfocused on the case where effort was not costly. We lay out this formulationbriefly, and then turn to Mirrlees'reformulation. The contractbetween the principaland the agent specifies the amount s(x) the principal is to pay the agent if the outcome x is observed. Let H(x-s(x)) be the principal'sutility, and U(s(x))-V(a) the agent's, net of his disutilityof effort. Let 4(0) denote the densityfunction of the random variable 0. The principal is to choose the contract s(x) to maximize his expected utility

H(g(a, 0)-s(g(a, 0)))/b(0) dO. (15) subject to two constraints. First, he must ensure that the agent has sufficient expected utility u0 to induce him to accept the contract (the participationconstraint)

[U(s(g(a, 0)))-V(a)] O(O)dO uo. (16)

Second, the principalmust recognizethat, given the contacts(x), the agent chooses a to maximizehis own expectedutility (the incentivecompatibility constraint).Thus aearg max [U(s(g(a, 0)))-V(a)]p(O0) dO. (17)

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 227

In practice,this condition must be replacedby the first-ordercondition of the agent's optimization.But that condition requiresthe derivativeof the paymentfunction s(x); by using it we are imposingdifferentiability on the principal'schoice function whereas any such properties,if they are valid, should emerge as a part of the solution. A second drawbackof the above method is of greater practicalimportance: the conditions for the princi- pal's optimization problem become unwieldy and yield little general insight, although Spence and Zeckhauser (1971) and Ross (1973) examined several special cases and obtained some useful inferences. This set the stage for another importantbreakthrough by Mirrlees.We offer a very brief exposition.The details can be found in Mirrlees [1974b, 1975a, 1976a, 1979a]; see also the survey by Hart and Holmstrom (1987). Mirrlees' crucial analyticaladvance came from the fact that for each level a of effort or care, the probabilitydistribution of the randomstate 0 gives rise to a probabilitydistribution on the outcome x. Therefore the choice of a can be regarded as a choice among the availableprobability distributionsof x.11Write F(x, a) for the cumulativedistribution of x given a, andf(x, a) for the correspondingdensity function. More effort produces better outcomes in the sense that as a increases,the distributionof x shifts to the right in the sense of first-orderstochastic dominance. Now the expression(15) for the principal'sexpected utility gets replaced by

H(x-s(x))f(x, a) dx, (18) the participationconstraint (16) is replaced by

[U(s(x))-V(a)]f(x, a) dx >uo,, (19) and the incentive compatibilityconstraint (17) becomes aearg max [U(s(x))-V(a)]f(x, a) dx. (20)

Note that a no longer appears as an argument of s(.); therefore the conditionof the optimalchoice of a can be characterizedwithout imposing

" Spence and Zeckhauser(1971, footnote 2) mentionthis possibility,but do not conductany analysisusing it.

? The editors of the ScandinavianJournal of Economics 1997. 228 A. Dixit and T Besley a priori assumptionson the rewardfunction. We turn to the results and interpretationsthat emerge from this formulation.

The TradeoffBetween Incentives and Risk-Sharing Suppose the unorthodoxconstraint (20) can be replaced by a customary equality constraint, namely the first-order condition of the agent's maximization,

- V'(a) + [u(s(x))- V(a)]fa(x, a) dx = 0. (21)

Now the principalwants to choose s(x) to maximize(18) subjectto (20) and (21). This is a simpler variationalproblem, and its first-ordercondition is

H'(x-s(x)) fa(x, a) (22) U'(s(x)) f(x, a) where 2 and p are the respectiveLagrange multipliers for the constraints (19) and (21). One can prove that p >0; see Holmstrom(1979) and Roger- son (1985) for further discussion. This has an immediate interpretation.If the incentive compatibility constraintwere absent, the second term on the r.h.s.would be absent, and the ratio of the marginalutilities of income to the two parties would be equal in all outcomes. This is the Arrow-Borch risk sharingresult, which would yield the standardfirst-best Pareto optimum of an Arrow-Debreu model of complete markets in the absence of informationalasymmetry. Thereforethe second term on the r.h.s.of (22) capturesthe extent to which risk-sharingmust be limited to give the agent an incentive to exert the unobservableeffort. The existence of a tradeoff between risk and incentives was noted by earlier writers.In fact an analogous condition appears in Ross (1973, eq. (10)). But Mirrlees'result makes the idea much more precise and amen- able to a very intuitive interpretationwhich is due to Holmstrom(1979). Note that fa(x, a)/f(x, a) = Q[lnf(x, a)]/aa. (23) This is the derivative of the log likelihood function of outcome, with respect to the effort level viewed as an unknownparameter. It gives the principal probabilistic information based on the observable x about

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 229 whetherthe agent took the unobservableaction a. If the derivativeis large and positive, the principalcan quite reliablyinfer that the agent's action was unlikely to be locally smaller than a. Therefore he can more confi- dently rewardthe agent based on this x. That is just what (22) shows. The larger is the right-handside, the larger is the principal'smarginal utility relative to that of the agent, and therefore the larger the agent's reward s(x) from the outcomex. In other words,the more informativeisx abouta, the fartherwill the incentive payment for x depart from that under first- best risk-sharing. This elegantly simple result not only gave economists a clear intuition about the trade-offbetween risk-sharingand incentives,but also paved the way for furtherunderstanding of the role of other observableand informa- tive variablesin contractdesign. At this level of generality,one cannot say much about the shape of the optimal incentive scheme. However, mono- tonicity in x such that high observed returnsare associatedwith a higher payoffwould appearto be a naturalproperty. This will hold providedthat such high returnsare more likely to be due to effort ratherthan pure lack, since a monotonicincentive scheme will then motivatethe agent to provide more effort. Differentiating(22) one soon finds that the formal require- ment is that the likelihood ratio fa(x, a)/f(x, a) is increasing in x, which is knownas the monotone likelihoodratio property.Milgrom (1981) demon- stratedhow the conditioncould be interpretedto mean that higherreturns convey "good news" about the agent's effort.

Approximationto the First-Best If the likelihood derivativein (23) becomes unboundedas the outcomes tend toward the worst, and the agent's utility U(s(x)) is also unbounded below, incentivecompatible contracts can approacharbitrarily close to the first-bestby setting s(x) at extremelypunishing levels for such outcomes. Of course the exact first-bestis not attainable,so technicallythe problem has no solution. But in practice, one can say that moral hazard can be almost fully overcome under these conditions. Mirrlees [1974b] gave an example of this, and Mirrlees [1975a]proved the general result. The intuitionis clear from the likelihoodinterpretation. If the likelihood derivativegoes to infinityfor values of x close to its lower limit, this means that bad outcomes are extremely informativeabout the agent's having taken a suboptimal action. When he takes the optimal action, the bad outcomes are very unlikely;therefore the agent need be given only a small offsetting rewardin the case of other outcomes to fulfill his participation constraint.But if he takes a suboptimalaction, the bad outcomes become much more likely, and the severe penalties associatedwith these outcomes

? The editors of the Scandinavian Journal of Economics 1997. 230 A. Dixit and T.Besley give the agent a large incentive to avoid such actions. Thus the principal can induce first-besteffort at near-zerocost.

Validityof the First-OrderApproach Before Mirrlees,the replacementof the agent's maximizationby its first- order condition was done routinely and without thought. But Mirrlees [1975a, 1982c] showed that this procedurewas in fact very problematic. Under standardassumptions, an agent'sproblem under moral hazard need not be convex. Hence, the agent's optimal action may occur at an extreme of the feasible set of actions. There may also be multiple solutions to the first-orderconditions, with some of the solutionsyielding local minimaor stationarypoints that are neithermaxima nor minima.Mirrlees also asked when the first-orderapproach is valid. He correctlyidentified two condi- tions that are of use, the monotone likelihood ratio property, and the convex distributionfunction property,but his proof was faulty. Further work by Rogerson (1985) and Jewitt (1988) has now completedthis aspect of the theory.

Applicationto Hierarchies Mirrlees [1976a] applied this general frameworkto study the internal organizationof firms,specifically the use of incentivewage schedules and the role of hierarchies.This is an importantcontribution to the formal- ization of Coase's idea that the size of the firmis determinedby tradingoff the costs of internal control against the costs of using the market. A particularlyintriguing result in Mirrlees'spaper is that paymentschedules at lower levels of the firm's internal hierarchyshould be more like fixed wages for following particular instructions, and at higher levels there should be more profit-sharingincentives. This paper has also inspireda lot of furtherwork.

LaterDevelopments and Applications As with the revelation principle for the case of adverse selection, the theory of incentive design under moral hazard has become a part of everydayeconomic theory. We select a few examplesfor special mention. For an expositionwith a somewhatdifferent perspective and using some- what different techniques,see Stiglitz (1983). The basic idea that outcomes conveyprobabilistic information about the underlyingeffort, and that incentiveschemes can be based on any variable that is informativein this manner,has verywide applicability.For example, Holmstrom(1982) uses it to show how, when a principalhas severalagents

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 231 and the errors in their outcomes are correlated, the optimal incentive scheme for each agent should use the outcomes of all other agents. When only rank order information about individuals'performance is known, contests emerge as optimal mechanisms,with payment accordingto the order in which individualsperform. Grossman and Hart (1983) construct a very general model, using an alternative that circumventsthe difficulties of the first-orderapproach. However,the analysisis at such a level of generalitythat few specificresults emerge. On the other hand, Holmstromand Milgrom(1987) offer a very specific model based on constant absolute risk aversion and a normal distributionof outcomes that arises from aggregationof numeroussmall shocks through time. This enables them to obtain and characterizean optimalincentive scheme that is linearin outcomes. Its coefficientsdepend in a naturalway on the agent'srisk aversionand disutilityof effort, and on the error with which the observed outcome reflects the unobservable effort. If the outcome is a more accuratemeasure of the effort, then the optimal incentivescheme will be "more powerful"in the sense of givinga higher marginalreward for output. Analysisof moral hazardin the context discussedby Mirrleeshas found many applications.Mirrlees focused at length on the internalworkings of the firm. Later work on this theme has considered more sophisticated incentiveschemes that includeauditing; see, for example,Dye (1986). This has a large numberof applicationsin the accountingliterature. The work has also been developed to studyoptimal financial structure from a princi- pal agent point of view. Formal developments of ideas in Jensen and Meckling's(1976) essay owes a great deal to the ideas developed in Mirrl- ees' work. In all of these applications,the trade-off between risk sharing and moral hazardis a central concern. A majordrawback with the above formulationis that effort has only one dimension. In real world situations agents typicallyhave many different dimensions of discretion. Output too may be multi-dimensionalwith differenttypes of output being differentiallyresponsive to particularkinds of effort. Holmstromand Milgrom(1987, 1991) initiate a studysuch multi- task environments.Of particularimportance is the interaction between different kinds of activities. Compare two activities, whose outcomes measure their inputs with different degrees of accuracy.Considered in isolation, the more accurately measured activity should have a more powerful incentive scheme. But suppose the same agent performsboth, and they are substitutesin the sense that more effort devoted to one comes at the cost of effort devoted to the other. Offering a higher-powered incentive scheme to the more accuratelymeasured activity will divert the agent'seffort awayfrom the other. Thereforethe principalfinds it optimal to weaken the incentivesfor this activitytoo. This work is provingto be one

? The editors of the Scandinavian Journal of Economics 1997. 232 A. Dixit and T Besley of the most fruitful extensions of the basic moral hazard model; for example, Dixit (1996, appendix)uses it to explain the weakness of incen- tives in governmentbureaucracies.

V. Mirrlees' Other Contributions While the contributionsdiscussed above go to the very foundations of economic theory, Mirrlees is perhaps even better known to most of the economics profession for his specific contributionsto and developmenteconomics. His joint work with Peter Diamond on commoditytaxation was a major advanceupon previoustreatments of FrankRamsey and Paul Samuelson; the Diamond-Mirrlees model now stands as the definitivestatement and analysisof this branchof public economics. The second of the two Diamond-Mirrlees papers [1971a] and [1971b] elaboratesRamsey-Samuelson type inverse-elasticitytax formulas,and is probably the more widely known. Many subsequent papers have been written,expressing the formulasslightly differently, examining their impli- cations in special contexts like intertemporalallocation or international trade, and makingthem amenable to empiricalwork. But the firstDiamond-Mirrlees paper contains a far more fundamental result. Since commoditytaxation drives a wedge between consumer and producerprices, it necessarilyleaves us in a second-bestworld. The general theoryof such an economy offers little hope for any clear prescriptions;we know from the negative result of Lipsey and Lancaster that once one condition of Pareto optimality is violated, then it may be desirable to violate others. But Diamond and Mirrleesoffer a positive resultof surpris- ing generality:under a wide range of conditions it remains desirable to keep the production sector as a whole operating efficiently.Thus inter- mediate inputs should not be taxed or subsidized,a small countryshould keep its production sector exposed to world prices, and in the inter- temporalcontext interest should be taxed or subsidized,if so desired,only at the stage where consumersor savers receive such income. The argument is remarkablysimple. Make two assumptions:(i) the aggregatedemand vector varies cont;inuouslywith respect to the set of tax instruments,and (ii) the set of instrumentsis sufficientlyrich that a varia- tion that potentially benefits all consumers can be found. Suppose a tax equilibriumleaves the aggregatedemand point in the interiorof the feasi- ble aggregateproduction set. Then by (ii) a direction of variationcan be found to benefit all consumers,and by (i) a small move in this directionis feasible. Therefore the initial point cannot have been optimal.A uniform lump-sum subsidy satisfies these conditions. More interestingly,so does

? The editors of the Scandinavian Journal of Economics 1997. James Mirrleesand the theoryof informationand incentives 233 commodity taxation, provided there is a commodity (which may be a Hicksian composite) that is not simultaneouslypositively demanded by one consumer and positivelysupplied by another.An increase in its price must be Pareto improving if no consumer is a net demander, and a decrease works if none is a net supplier. In later work they extended this result. For example Mirrlees [1972a] consideredhow restrictedability to tax profitscan justify departuresfrom productionefficiency, and Diamond and Mirrlees [1976b]built on earlier work by both authorsto show that under constantreturns to scale, private production should continue to make zero profits at the correct shadow prices. This formulationprovided the most general theoretical underpin- ning for Mirrlees'work on cost benefit analysiswith Ian Little. Little and Mirrlees applied these ideas to derive very detailed criteria and proceduresfor evaluatingdevelopment projects. The productioneffi- ciency result has important policy implications, especially for less developed countrieswhere productiondistortions abound. Their manual [1974c] had significant impact on the thinking and practices of inter- national organizations,particularly the World Bank, which evaluate and supportsuch projects.

VI. ConcludingRemarks Mirrlees'work has enhanced our understandingof a range of important problems.While setting out to study specificapplications, mostly in public economics and developmenteconomics, he has developedanalytic insights that had beneficial consequences far beyond the problem at hand. Thus, not only has he solved importantproblems, he has providedus with a novel way to address different problemsin the future. The fact that Mirrlees began with applications, and not with the expressedaim of developinggeneral theories is an importantaspect of his approach.Thus, his attemptsto put the welfare economics of government policy on a rigorous footing stem from an intrinsicinterest in providing policy guidance. Hence, while one learns a great deal about economic analysisin general from studyinghis contributions,one is also remindedof his vision that one reason for doing economics is the possibilityof making the world a better place.

References

Akerlof, G.: The market for lemons: Qualitative uncertainty and the market mechanism. QuarterlyJournal of Economics 84, 488-500, 1971. Arrow, K. J.: Uncertainty and the welfare economics of medical care. 53, 941-69, 1963.

? The editors of the Scandinavian Journal of Economics 1997. 234 A. Dixit and T. Besley

Arrow, K. J.: Political and economic evaluation of social effects and externalities.In J. Margolis (ed.), The Analysisof Public Output,Columbia UniversityPress, New York, 1970. Atkinson,A. B. and Stiglitz,J. E.: The designof tax structure:Direct versus indirect taxation. Journalof PublicEconomics 6, 55-75, 1976. Atkinson,A. B. and Stiglitz,J. E.: PublicEconomics. McGraw-Hill, New York, 1980. Baron,D. and Myerson,R.: Regulatinga monopolistwith unknowncosts. Econometrica50, 911-30, 1982. Dasgupta,P. S., Hammond,P. J. and Maskin,E.: The implementationof social choice rules: Some general rules on incentivecompatibility. Review of EconomicStudies 46, 185-216, 1979. Dixit, A. K.: The Makingof EconomicPolicy: A Transaction-CostPolitics Perspective. MIT Press, Cambridge,MA, 1996. Dye, R. A.: Optimal monitoring policies in agencies. Rand Journal of Economics 17, 339-50. Fudenbeg,D. and Tirole, J.: Game Theory.MIT Press, Cambridge,MA, 1991. Gibbard, A.: Manipulation of voting schemes: A general result. Econometrica 41, 587-602. Green, J. and Laffont, J.-J.: Incentives in Public Decision Making. North-Holland, Amsterdam,1979. Grossman,S. J. and Hart, 0. D.: An analysisof the principal-agentproblem. Econometrica 51, 7-45, 1983. Guesnerie, R.: A Contributionto the Pure Theoryof Taxation.Cambridge University Press, Cambridge,UK, 1995. Hammond, P. J.: Straightforwardindividual incentive compatibilityin large economies. Reviewof EconomicStudies 46, 263-82, 1979. Hart, 0. and Holmstrom, B.: The theory of contracts. In T. Bewley (ed.), Advances in Economic Theory:Fifth WorldCongress of the EconometricSociety, Cambridge University Press, 71-155, 1987. Holmstrom, B.: Moral hazard and observability.Bell Journal of Economics 10, 74-92, 1979. Holmstrom,B.: Moral hazardin teams. Bell Journalof Economics13, 324-40, 1982. Holmstrom,B. and Milgrom,P.: Aggregationand linearityin the provisionof intertemporal incentives.Econometrica 55, 303-28, 1987. Holmstrom,B. and Milgrom,P.: MultitaskPrincipal-Agent Analyses: Incentive Contracts, Asset Ownership,and Job Design.Journal of Law Economicsand Organization7, 24-52, 1991. Jensen, M. C. and Meckling,W. H.: Theory of the firm, managerialbehavior, agency costs and ownershipstructure. Journal of FinancialEconomics 3, 305-60, 1976. Jewitt,I.: Justifyingthe first-orderapproach to principal-agentproblems. Econometrica 56, 1177-90, 1988. Laffont, J.-J. and Maskin,E.: The theory of incentives:An overview.In W. Hildenbrand (ed.), Advancesin Economic Theory:Invited Papers for the FourthWorld Congress of the EconometricSociety, Cambridge University Press, 1982. Laffont,J.-J. and Tirole,J.: A Theoryof Incentivesin Procurementand Regulation.MIT Press, Cambridge,MA, 1993. Milgrom, P.: Good news and bad news: Representationtheorems and applications.Bell Journalof Economics12, 380-91, 1981. Mussa, M. and Rosen, S.: Monopolyand productquality. Journal of Economic Theory18, 301-17, 1978. Myerson, R. B.: Optimal auction design. Mathematicsof OperationsResearch 6, 58-73,

? The editors of the Scandinavian Journal of Economics 1997. James Mirrlees and the theory of information and incentives 235

1981. Myerson,R. B.: Optimalcoordination mechanisms in generalizedprincipal-agent problems. Journalof MathematicalEconomics 10, 67-81, 1982. Pauly,M.: The economicsof moralhazard: Comment. American Economic Review 58,531-6, 1968. Phelps, E. S.: Taxation and wage income for economic justice. QuarterlyJournal of Economics87, 331-54, 1973. Rogerson, W.: The first order approach to principal-agent problems.Econometrica 53, 1357-68, 1985. Ross, S.: The economic theory of agency: The principal'sproblem. American Economic Review63, 134-9, 1973. Rothschild,M. and Stiglitz,J. E.: Equilibriumin competitiveinsurance markets: An essayon the economics of imperfect information.Quarterly Journal of Economics 90, 629-50, 1976. Sadka,E.: On income distribution,incentive effects, and optimalincome taxation.Review of EconomicStudies 43, 261-8, 1976. Seade, J. K.: On the shape of optimal tax schedules.Journal of PublicEconomics 7, 203-35, 1977. Sheshinski,E.: The optimal linear income tax. Reviewof Economic Studies39, 179-208, 1972. Spence, A. M.: MarketSignalling. Harvard University Press, Cambridge,MA, 1974. Spence,A. M.: Nonlinearprices and economicwelfare. Journal of PublicEconomics 8, 1-18, 1977. Spence,A. M.: and Zeckhauser,R.: Insurance,information and individualaction. American EconomicReview (Papers and Proceedings)61, 380-7, 1971. Stern, N. H.: On the specificationof models of optimumincome taxation.Journal of Public Economics6, 123-62, 1976. Stiglitz,J. E.: Pareto optimalityand competition.Journal of Finance36, 235-51, 1981. Stiglitz, J. E.: Risk, incentives and insurance:The pure theory of moral hazard. Geneva Paperson Riskand Insurance8, 4-33, 1983. Tuomala,M.: On optimalincome taxation:Some furthernumerical results. Journal of Public Economics23, 351-66, 1984. Vickrey, W.: Measuringmarginal utility by reactions to risk. Econometrica31, 319-33, 1945. Vickrey,W.: Utility, strategy,and social decision rules. QuarterlyJournal of Economics74, 507-35, 1960. Wesson,J.: On the distributionof personalincomes. Review of EconomicStudies 39, 77-86, 1972. Williamson,0. E.: TheEconomic Institutions of Capitalism.Free Press, New York, 1985. Wilson,C. A.: Model of insurancewith incompleteinformation. Journal of EconomicTheory 16, 167-207, 1977. Wilson, R. B.: NonlinearPricing. Oxford University Press, New York, 1993. Zeckhauser,R. J.: Uncertaintyand the analysisand evaluationof publicexpenditures: The PPB system. Joint Economic Committee, U.S. Congress, Vol. 8, U.S. Government PrintingOffice, WashingtonDC, 149-66, 1969.

? The editors of the Scandinavian Journal of Economics 1997.