
Information Projection: Model and Applications.∗

Kristóf Madarász

University of California, Berkeley.†

First Version: March 2007. This Version: April 2008.

Abstract

Evidence from both the laboratory and the field shows that people systematically underestimate informational differences. I model such information projection by assuming that after processing a signal, a person overestimates the probability with which this signal is available to others. I apply the model to agency and communication settings. When learning about an expert's skill using ex-post information, a biased evaluator exaggerates how much a skilled expert could have known ex ante, and hence underestimates the expert on average. To minimize such underestimation, experts will ex ante be too reluctant to produce useful information that will be seen more clearly by the evaluator ex post, and too eager to gather information that the evaluator will independently learn ex post. I also show that information projection introduces noise into evaluations, decreasing an expert's incentive to exert effort and lowering the optimal use of monitoring relative to the Bayesian case. Evidence from, and applications to, medical malpractice, liability regulation, and effective communication are discussed.

Keywords: Hindsight Bias, Curse of Knowledge, Internal Labor Markets, Medical Malpractice, Communication, Retrospectroscope.

∗I am grateful to for his constant encouragement and suggestions. I also owe special thanks to Botond Kőszegi for his advice and support. I benefited from conversations with George Akerlof, Jerker Denrell, Erik Eyster, Marina Halac, Ulrike Malmendier, David Rosenberg, Adam Szeidl, Daniel Teknos and seminar participants at LSE, LBS, Cambridge, Oxford, UC San Diego, Yahoo! Research, Google, UC Berkeley, CEU and HAS. †Contact: [email protected] or 549 Evans Hall # 3880, UC Berkeley, Berkeley CA 94720-3880.

1 Introduction

The study of how asymmetric information affects economic activity typically builds on the assumption that people perceive informational differences correctly. Evidence shows, however, that people systematically mispredict informational differences and exaggerate the similarity between the information they have and the information available to others. For example, having learned novel information about a patient, one may well exaggerate the extent to which an attentive physician should have been able to diagnose the cancer earlier. In this paper, I model such information projection by assuming that having processed a signal, a biased person exaggerates the probability with which this signal is also available to others. I show that as a result, a biased examiner will be too pessimistic about the skill of the agents she evaluates. In turn, agents will have incentives to change the type of information they produce to mitigate the adverse effects of hindsight bias on their reputation. I also investigate how this bias affects the optimal use of incentives, communication and social inference.

In the context of financial markets, Camerer, Loewenstein, and Weber (1989) provide laboratory evidence that better informed traders overestimate how much uninformed traders know, and that such curse of knowledge affects market outcomes. In Section 2, I review both controlled laboratory and more stylized field evidence to support my claim that information projection is a widespread phenomenon. In Section 3, I develop a formal model of information projection building on CLW (1989).

In Sections 4 and 5, I turn to the main application of the paper: the influence of information projection on performance evaluation. To illustrate the results, consider a radiologist who diagnoses a patient based on an ambiguous X-ray. After the diagnosis is made, the patient returns with novel symptoms and an evaluator is asked to assess the radiologist's original diagnosis. A biased evaluator projects ex-post information, and acts as if all radiologists should have guessed the symptoms earlier. This leads to two types of inferential mistakes: underestimation, and over- and under-inference. Assume that radiologists differ in skill: skilled ones understand the X-ray and unskilled ones do not. If more ex-ante information increases the chances of an ex-post successful treatment, a biased evaluator exaggerates the success rate for both types of radiologists. In hindsight, a successful treatment becomes the norm and a failed one becomes a surprise. If the probability of failure decreases with skill, the evaluator thus underestimates the radiologist's skill on average. The 'surprisingly' high failure-to-success ratio is perceived to be the result of the lack of skill, rather than the lack of sufficient ex-ante

information. While the evaluator underestimates the agent on average, information projection will typically affect her conditional beliefs as well. Whenever knowing the symptoms ex ante would have increased the chances of a successful diagnosis more for a skilled type than for an unskilled type, the evaluator over-infers skill from performance. For example, if the symptoms alone are uninformative, but combined with the X-ray they are perfectly indicative of cancer, a biased evaluator perceives differences in luck to be differences in skill. If, however, knowing the symptoms alone is almost perfectly informative, and hence the probability of a failed treatment depends very little on understanding the X-ray, the evaluator perceives differences in performance to be due to differences in luck. Here, she under-infers skill from performance.

Given these results, a natural question to ask is how the radiologist might change his behavior to minimize the adverse effects of information projection on his reputation. Evidence suggests that experts are often aware that those evaluating them are biased. For example, it is argued that "defensive medicine", medical procedures designed to minimize false liability rather than maximize cost-effective health care, is due partly to the fear of experts that those evaluating them will suffer from hindsight bias. To study such behavior, assume that the radiologist can decide what radiographs to order ex ante. I show that if a radiograph is a substitute for the ex-post information, i.e., it provides information ex ante that the evaluator will independently learn ex post, then the radiologist has an incentive to over-produce this radiograph. Overly costly MRIs might be ordered for all patients if such MRIs produce the same information that the evaluator inevitably learns ex post. At the same time, the radiologist is too reluctant to produce complement information, i.e., radiographs that help him make a good diagnosis but can be interpreted better in light of ex-post information. He might avoid ordering a mammogram that helps detect breast cancer if he fears it can be interpreted much better in hindsight than in foresight. Thus as a result of information projection, increasing the likelihood of ex-post evaluations could increase production costs, lower productivity and exacerbate the types of over- and underproduction of information that observers have attributed to medical malpractice regulation.1

If the management of a hospital is aware of evaluators' propensity for hindsight bias, it can correct this mistake to some extent. In important situations, however, even the perfect anticipation of the evaluator's

1See Studdert et al. (2005), Kessler and McClellan (2002) and Jackson and Righi (2006).

bias cannot eliminate inefficiencies. To show this, in Section 5, I turn to a context where the amount of information that the radiologist learns from an X-ray is a function of effort rather than of skill. To motivate the radiologist, a hospital may provide incentives to encourage a careful reading of the X-ray. When the radiologist's effort is not observed, however, he might be rewarded and punished based

solely on whether the patient's condition improved or deteriorated. In cases of limited liability or risk-averse radiologists, no such reward scheme can be first-best optimal. A second-best scheme may instead involve monitoring whether the radiologist's diagnosis accorded with the information that was available to him ex ante. A biased evaluator, however, is prone to judge the correctness of the diagnosis not on the basis of the

ex-ante available information, but on the basis of both the ex-ante and the ex-post information. Thus the radiologist is punished too often for bad luck and rewarded too rarely for good decisions. As a result, the radiologist's incentive to carefully understand the X-ray is lower than under a Bayesian evaluator. More generally, an agent who is de jure facing a negligence rule is de facto punished and rewarded according to strict liability if he is assessed by a biased judge or jury. I show that the report of a biased evaluator contains too much noise, and hence even if the hospital anticipates the bias, it has a reason to monitor less often than in the rational case. I also show that if the hospital does rely on biased reports, it nevertheless decides to induce lower levels of effort to save on incentives that are appropriate in the rational case, but too strong in the biased case.

In Section 6, I turn to the influence of information projection on communication. I show that a listener who projects his private information will be too credulous of a speaker's advice because he overestimates how much the speaker knows. I also show that a speaker who projects information on her non-communicable background knowledge will mistakenly send messages that are too ambiguous for her audience to interpret. I identify conditions for over- and under-communication.

Finally, in Section 7, I conclude with a brief discussion of some further implications and extensions of my model. I discuss how information projection might affect social inferences in networks, causing hostility between groups, as well as the possibility of extending my model to capture the related phenomenon of ignorance projection, where a person who does not observe a signal underestimates the probability that this signal is available to others.

2 Evidence and Related Literature

Folk wisdom has long recognized the existence of what I call information projection, as noted by the common refrain, "hindsight is 20/20". I begin this section by discussing both laboratory and more stylized evidence on two closely related phenomena: hindsight bias – the phenomenon that people form biased judgements in hindsight relative to foresight – and the curse of knowledge – the phenomenon that informed people overestimate the information of those who are uninformed.2 I then turn to a brief summary of some evidence on related biases lending support to the existence of the projection of various forms of private information. Although individual studies are often subject to alternative interpretations, the sum total of the studies provides a compelling case for the widespread existence of this phenomenon.

The presence of information projection in experimental financial markets was demonstrated by Camerer, Loewenstein, and Weber (1989). A group of Wharton and Chicago MBA students traded assets of eight different companies in a double-oral auction. Traders were divided into two groups. In the first group, traders were presented with the past earnings history of the companies (not including 1980) and traded assets that yielded returns in proportion to the actual 1980 earnings of these companies. In the second group, traders received the same information, and in addition they also learned the actual 1980 earnings of the companies. By design, returns for traders in the second group depended on the market price established by those in the first group. Therefore, to maximize earnings, better-informed traders had to guess as correctly as possible the market price at which less-informed traders traded these assets. If traders in the second group project their information, then their guesses, and hence the price at which they trade, are significantly different from the market price established by the first group. CLW

(1989) find that the guesses of better-informed traders were biased by 60% towards the actual 1980 earnings and that market prices were biased by 30%.3 The reason why the bias in the market was lower than in judgements is that traders with a smaller bias traded more aggressively. Less biased traders behaved as if they had anticipated that others would project information. Further evidence comes from the experimental study of Loewenstein, Moore, and Weber (2006), who build on CLW (1989). They study the curse of knowledge using a set of visual recognition tasks.

2Hindsight bias studies involve both between-subject and within-subject designs. In the latter, participants have to recall their own prior estimates after being presented with new evidence. Since my focus in this paper is on interpersonal information projection, I concentrate on the between-subject designs. 3CLW do not report the numbers explicitly, only graphically, so these are approximate. See CLW (1989), p. 1241.

Figure 1: From Loewenstein, Moore, and Weber (2006).

In these tasks, subjects are presented with two pictures that differ in one crucial detail. LMW (2006) divide subjects into three groups: uninformed, informed, and choice. In the uninformed condition, no additional information is available besides the two pictures. In the informed condition, the difference between the pictures is highlighted for the subjects. In the choice condition, subjects could decide whether to obtain additional information for a small fee, or remain uninformed. After looking at the pictures, the subjects in each group are asked to guess what fraction of people in the uninformed group could tell the difference between the two pictures. Subjects are compensated based on how well they predicted this fraction. As Figure 1 indicates, the informed subjects' mean estimate was significantly higher than the uninformed subjects' mean estimate. Importantly, a significant portion of the people in the choice condition

paid for additional information. In this group, the mean estimate was 55.4%, while the mean estimate of subjects who chose to remain uninformed was 34.6%. Hence people not only projected their additional information, but also paid for information that biased their judgements in a way that lowered their earnings.

The work of Fischhoff (1975) initiated research on hindsight bias. He showed that reporting an outcome of an uncertain historical event increases the perceived ex-ante likelihood of the reported outcome occurring. Fischhoff's findings were replicated by a plethora of studies, and most of these find a strong presence of such hindsight bias, often larger than the one found in this initial study. These studies and the meta-analyses building on them also show that the presence of hindsight bias is robust to a great number of debiasing techniques. A robust comparative static result is that the more informative the outcome, the greater is the bias (Harley et al., 2004). As I demonstrate in Section 3, my model of information projection exhibits the same monotonicity.

Less controlled evidence comes from more explicit field studies. In the context of liability judgements, there is a wealth of evidence that juries and experienced judges fail to ignore superior information

and instead form judgements as if the defendant had information that was unavailable at the time he acted. Experiments have demonstrated the existence of information projection in the evaluation of ex-ante judgements of various experts. Anderson et al. (1997) documented the existence of the bias in judges deciding on cases of auditors' liability where auditors failed to predict the financial problems of their audit clients. Caplan, Posner, and Cheney (1991) conducted a study with 112 practicing anesthesiologists. Here physicians saw identical case histories but were either told that the case ended in minor damages or were told that it ended in severe damages. Those who were told that severe damage occurred were more likely to judge the ex-ante care to be negligent. In certain cases, the difference in the frequency of ruling negligence was as great as 51%. Bukszar and Terry (1988) demonstrate hindsight bias in the solution of business case studies, and Hastie, Schkade, and Payne (1999) document very serious biases in jurors' judgement of punitive liability. Strong effects were found, among others, in experiments on the assessment of railroad accidents, the legality of searches, and the evaluation of military officers. For survey articles on the evidence, see e.g., Harley (2007).4

A large set of other psychological findings further indicates that people project various types of private information. For example, a study by Gilovich, Medvec, and Savitsky (1998) shows that people greatly overestimate the probability that their lies, once made, are detected by others.5 Such overestimation was also documented in the context of communication. In a set of experiments, Kruger et al. (2005) found that when people communicate through email, they overestimate how well their intent is transmitted through their messages.6 Here, senders had to make serious and sarcastic statements either through email or voice recording, and then guess the probability that receivers would be able to understand their intent. As Figure 2 shows, the mean estimate for both those sending an email and those sending a voice recording was 78%, while the actual probabilities were 73% in the voice condition and 58% in the email condition. Kruger et al. (2005) also conduct an experiment where they ask subjects in the email condition to vocalize their messages before sending them. Senders are again randomly divided into two groups; some are asked to vocalize the message in the same tone as the tone of their email, and others

4The legal profession has long recognized this bias and developed certain procedures to mitigate its effects. One such procedure is the bifurcation of trials, where ex-post evidence is suppressed at the initial phases of the trial. For more on this, see Rachlinski (1998). 5Illusion of transparency was also studied in the context of negotiations, Van Boven, Gilovich, and Medvec (2003). Here the results are harder to interpret. 6See also Newton (1990) on tappers and listeners.

Figure 2: From Kruger, Epley, Parker, and Ng (2005).

are asked to vocalize it in the opposite tone. Senders in both groups overestimate how easy it would be to understand their messages, yet such overestimation decreased significantly in the case where senders vocalized in the opposite tone. While some of these results may be due to general overconfidence about one's ability to communicate, the evidence is more consistent with the interpretation of information projection.

My paper builds closely on the experimental results of Camerer, Loewenstein, and Weber (1989). CLW offer a preliminary model of this bias by assuming that a better informed trader's estimate of the mean of a less informed trader's estimate of the value of an asset is a linear combination of the better informed trader's estimate of this mean value and the less informed trader's estimate of this mean value. Biais and Weber (2007) build on this formalization of CLW and assume that after observing a realization of a random variable, a person misperceives the mean of her prior on this variable to be the mean of her own posterior. Biais and Weber then study whether this formulation of within-person hindsight bias can explain trading behavior consistent with underreaction to news. They also test their hypothesis using psychometric and investment data from a sample of investment bankers in Frankfurt and London.

In the context of predicting future changes in one's taste, the phenomenon of projection has also been studied by Loewenstein, O'Donoghue, and Rabin (2003) and Conlin, O'Donoghue, and Vogelsang (2007). In contrast to the projection of taste, the projection of information is most relevant in the interpersonal domain where people think about what others might or might not know, and hence it is primarily a social bias.

Several other papers, with no explicitly developed model, have argued that information projection, under the rubric of hindsight bias or curse of knowledge, might have important economic consequences. Among others, Viscusi and Zeckhauser (2005), Camerer and Malmendier (2006), Heath and Heath

(2007), and Rachlinski (1998) argue that information projection might be an important factor in economic settings, affecting both judgements and the functioning of organizations. The model also belongs to the small but growing literature on quasi-Bayesian models of individual biases, e.g., Rabin (2002) and Mullainathan (2002), and to the literature on social biases, e.g., DeMarzo, Vayanos, and Zwiebel (2003).

The evidence I summarized in this section is indicative of the fact that people project various forms of information. Although this evidence comes from a diverse set of experimental paradigms that use different methods of identification and classify information projection under a variety of rubrics, the model that I present in the next section provides a framework to study this phenomenon in a more unified manner. It also provides a setup to make more precise predictions about the implications of information projection in organizations and labor markets and to test such predictions.

3 Model

Consider an environment where people observe signals about the underlying physical state ω ∈ Ω, where Ω is bounded. An example of ω could be the fundamental value of a company's stock, the medical condition of a patient, or the geophysical conditions of a place where an engineer is commissioned to build a bridge. Let there be M people and N different signals {s_j}_{j=1}^N. A signal is a function s_j(ω): Ω → ∆Z from the set of states to the set of lotteries over a realization space Z. These signals provide information about the state through the correlation between the observed outcome from Z and the state ω ∈ Ω. Information is interpreted given a common prior σ(ω) over Ω, where this prior also determines people's shared view about ω absent any signals.

Let p^j_m denote the probability with which signal s_j is available to person m. If p^j_m = 0, then s_j is surely not available to her, and if p^j_m = 1, it surely is. The collection of these parameters for all m and all j is given by p = {{p^j_m}_{j=1}^N}_{m=1}^M. The elements of this vector describe the correct Bayesian estimates of the distribution of information. The informational environment is then summarized by the tuple {Ω, {s_j}_{j=1}^N, σ, p}.

In what follows, I distinguish between the availability and the processing of a signal. Availability refers to the fact that a signal is 'present', while processing refers to the fact that its information content is actually understood. As an illustration, note that only someone who has training in medicine knows what to infer from a radiograph. In cases where this distinction applies, I assume that p^j_m concerns the availability of a signal. We can now define information projection in the following way: a person who projects information exaggerates the probability that the signals she processed are available to others. To measure the extent of this mistake, I introduce a parameter ρ ∈ [0, 1] which denotes the degree of information projection.

Definition 1 Person i exhibits interpersonal information projection of universal degree ρ if, after processing signal s_j, her perception of p^j_k is given by p^{j,ρ}_k ≡ p^j_k(1 − ρ) + ρ for all k ∈ M, k ≠ i.
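As a minimal numerical sketch of this definition (Python; the availability value 0.3 used below is a hypothetical choice, not a number from the paper):

```python
def perceived_availability(p_true: float, rho: float) -> float:
    """Definition 1: a rho-biased person who processed a signal perceives its
    availability to another person as p_true*(1 - rho) + rho."""
    return p_true * (1.0 - rho) + rho

# Hypothetical illustration: the true availability of the processed signal is 0.3.
for rho in (0.0, 0.5, 1.0):
    print(rho, perceived_availability(0.3, rho))
# rho = 0.0 -> 0.30 (Bayesian), rho = 0.5 -> 0.65, rho = 1.0 -> 1.00 (full projection)
```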

Information projection by person i corresponds to the overestimation of the probability that a signal person i processed is available to others. Such overestimation is increasing in ρ_i. If ρ_i = 0, then the person has correct Bayesian perceptions and does not exaggerate the availability of signals. If ρ_i = 1, then she exhibits full information projection and her perception of the probability that the signals she processed are available to others equals 1. In cases where 0 < ρ_i < 1, the person exhibits partial information projection.7 The above definition captures the key feature of the evidence: people underestimate informational differences to an extent not warranted by Bayesian reasoning.

Intuition suggests that certain pieces of information are projected more than others, and that the extent to which a particular piece of information is projected depends on a number of factors. In the

above definition, I allow for heterogeneity in projection by allowing ρ to vary across signals and across individuals. If ρ^j_i denotes the degree to which person i projects signal s_j, then such heterogeneity exists whenever ρ^j_i ≠ ρ^l_i for some j, l ∈ N or ρ^j_i ≠ ρ^j_k for i, k ∈ M. Here, I do not attempt to pin down the factors determining the value of ρ. My claim, though, is that the evidence suggests that in a number of economically important domains ρ > 0. More research is needed to get a better understanding of why certain signals are projected more than others.

While full information projection is not sensitive to the re-description of signals, partial information projection is. For example, if two signals are collapsed into one, then partial projection of the combined signal induces a different probability distribution over the information of player m than if the two signals were projected individually. In most relevant applications, though, there is a natural way to break down

7In certain contexts, for greater psychological realism or for issues of measurability, the following transformation of the true probabilities into perceived probabilities might be more appropriate: p^{j,ρ}_k = p^j_k/[(1 − ρ) + ρ p^j_k]. This functional form preserves the same properties as the previous one for all p^j_k > 0, but assumes that if p^j_k = 0, then p^{j,ρ}_k = 0 for all ρ.

information into distinct signals or groups of signals. For example, in the case of hindsight bias in performance evaluation, where information projection happens over time, the timing of information already suggests a way to break down information into distinct signals. Importantly, however, almost all results in this paper are qualitative and do not depend on the use of partial projection. I indicate it in the text when a result holds under partial information projection but not under full information projection.

There is another sense in which the exact separation of signals matters in my setup. This concerns the distinction between availability and processing. If one signal requires skill to be processed and the other does not, then my model has different implications when these two signals are collapsed into one or considered to be separate. Here, I always assume that the degree to which a signal requires skill to be processed is fixed.

As mentioned in Section 2, evidence suggests that in important contexts, people anticipate that others are biased. Since I build on this fact in the applications, I define such anticipation formally. Let the probability density function f_{i,k}(ρ) describe the beliefs of person i concerning the extent to which person k ≠ i projects her information. If f_{i,k} is not concentrated on 0, person i believes that there is a non-zero probability that person k is biased. Two types of anticipation are of special interest. First, if person i believes that person k is not biased, then the cdf generated by f_{i,k} is such that F_{i,k}(0) = 1. Second, if person i has a perfect guess of person k's bias, then F_{i,k}(ρ_k) = 1 and F_{i,k}(ρ) = 0 for all ρ < ρ_k.

3.1 A Dinner

Many of the paper's results follow from two ways in which information projection biases inference. To demonstrate these, consider a dinner invitation from Mrs. Robinson (the host) to Mr. Keynes (the guest). At the dinner, Robinson offers either fish or meat to Keynes, and let her choice be denoted by y ∈ {M, F}. Assume that Keynes either prefers fish, ω_F, or meat, ω_M. Robinson observes a noisy signal s_r about his taste, where Pr(s_r = ω | ω) = h ≥ 0.5. Keynes also observes a private signal s_k about his preference for the evening. Keynes is better informed about his own taste, and thus s_k is such that Pr(s_k = ω | ω) = z > h. The point of this example is to see how Keynes's views about Robinson change after the dinner.

Consider a case where Robinson has one of four possible types θ. She is either kind, and follows her signal, or she is mean, and follows only her own taste. In addition, she either prefers meat or fish. Assume for a

11 moment that Keynes knows his taste, and set z = 1. Assume also that Robinson observes only a noisy signal where h = 2/3. Let the prior belief of Keynes be π0(θ), and assume that he initially believes that each type is equally likely. The following table summarizes a Bayesian versus a fully biased guest’s beliefs about the kindness of the host after being served the meal he likes and after being served the meal

he dislikes:

Posterior                    Bayesian, ρ = 0                         Biased, ρ = 1
π_1(kind | y = s_k)          (2/3 + 2/3)/(2/3 + 2/3 + 1) = 4/7       (1 + 1)/(1 + 1 + 1) = 2/3
π_1(kind | y ≠ s_k)          (1/3 + 1/3)/(1/3 + 1/3 + 1) = 2/5       0/1 = 0
E[π_1(kind)]                 (7/12)(4/7) + (5/12)(2/5) = 1/2         (7/12)(2/3) = 7/18
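A minimal Python sketch reproducing these numbers (it hard-codes the example's parameters, z = 1 and h = 2/3, and models full projection, as in the table, by letting the guest treat the kind host's information about his taste as perfect):

```python
from fractions import Fraction as F

h = F(2, 3)                      # precision of Robinson's signal about Keynes's taste
keynes_taste = "fish"            # chosen w.l.o.g.; Keynes knows his own taste (z = 1)
types = [("kind", "meat"), ("kind", "fish"), ("mean", "meat"), ("mean", "fish")]
prior = {t: F(1, 4) for t in types}

def p_right(t, h_eff):
    """Probability that host type t serves Keynes's preferred meal, given the
    (possibly projected) precision h_eff of her information about his taste."""
    kindness, own_taste = t
    if kindness == "kind":
        return h_eff                                    # kind host follows her signal
    return F(1) if own_taste == keynes_taste else F(0)  # mean host follows her own taste

def posterior_kind(served_right, h_eff):
    """Posterior probability that the host is kind after observing the served meal."""
    like = {t: (p_right(t, h_eff) if served_right else 1 - p_right(t, h_eff)) for t in types}
    total = sum(prior[t] * like[t] for t in types)
    kind_mass = sum(prior[t] * like[t] for t in types if t[0] == "kind")
    return kind_mass / total

p_right_true = sum(prior[t] * p_right(t, h) for t in types)   # true prob. of the right meal (7/12)
for label, h_eff in (("Bayesian (rho = 0)", h), ("fully biased (rho = 1)", F(1))):
    post_r, post_w = posterior_kind(True, h_eff), posterior_kind(False, h_eff)
    avg = p_right_true * post_r + (1 - p_right_true) * post_w  # expectation under the TRUE distribution
    print(label, post_r, post_w, avg)
# Bayesian: 4/7, 2/5, 1/2.  Fully biased: 2/3, 0, 7/18.
```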

Note that a biased guest overestimates kindness if he is served the meal he likes, and underestimates it if he is served the meal he dislikes. In the former case, he believes that a kind host serves the right meal with probability one. In the latter case, he believes that a kind host serves the wrong meal with probability zero. Hence in both situations a biased Keynes reads too much into Robinson's choice. More generally, whenever the guest's signal about his taste is more precise than the host's, a biased guest overestimates how well different types separate. Let π^ρ_1(θ) be the posterior of a ρ-biased guest. The following proposition shows that the guest over-infers from the host's choice.

Proposition 1 For all π_0, π^ρ_1(θ_kind | y = s_k) is increasing in ρ and π^ρ_1(θ_kind | y ≠ s_k) is decreasing in ρ.

The above proposition holds independently of whether Keynes actually observes the realization of s_r, or just knows that Robinson observed s_r. In both cases, he exaggerates how much she knew, and in expected terms infers too much from her choice. As the third row of the above table shows, these two over-attributions do not cancel out; rather, on average Keynes comes to believe that Robinson is mean. Note that relative to his prior, a biased guest overestimates the probability that he will be served the meal he prefers. A biased guest's estimate is 3/4 while the true probability is 7/12. This means that Keynes will be disappointed in Robinson on average.

More generally, a biased guest who knows more about his own taste than the host does overestimates the probability with which the host can serve his preferred meal if she wants to. This implies that on average he underestimates the probability that the host cares about his taste. As the following proposition shows, such underestimation holds for all z > h.

Proposition 2 For all π_0, E[π^ρ_1(θ_kind)] is decreasing in ρ, where expectations are taken with respect to the true distribution of signals.

To further illustrate this point, consider a case where the guest and the host receive i.i.d. noisy signals about the state. Assume that the guest mistakenly believes that the host observed the exact same signal realization as he did. As long as the two signals are i.i.d., the expected posterior of a biased observer

and that of a Bayesian one are the same. Underestimation happens only if the biased guest has more information than the host. Even in this case, however, over-inference has an interesting implication. Suppose the host happens to have the same taste as the guest. Here a fully biased guest on average infers that the host is kind. In contrast, if the host happens to have a different taste, then a fully biased guest on average infers that the host is hostile. Thus a biased guest misattributes differences in taste to differences in intentions.

4 Skill Assessment

Let's now turn to the main application of the paper and consider the impact of information projection on performance evaluation. This application is motivated both by the key role performance evaluation plays in labor markets, organizations, medicine, and law, and by the evidence which indicates that this bias is common in such contexts.8 In this section, I focus on a problem of skill assessment where a supervisor wants to learn about the competence of her agent. In the next section, I focus on the problem of incentives and monitoring, where a principal wants to motivate the agent to exert effort.

Consider an agent who is hired by a principal to process and act upon information available to him. Since agents differ in their ability to understand information, a supervisor is asked to review the agent's performance on a task and assess the agent's competence. Such assessment then forms the basis of compensation, firing or job-allocation decisions. Assume, as is typically the case, that when evaluating his performance the supervisor has access to some ex-post information that was not available to the agent.9

8On the former, see e.g., Alchian and Demsetz (1972) or Lazear (2000). 9Berlin (2000) argues that in the case of lung cancer, the generally accepted error rate for radiological detection is between 20% and 50%. For radiographs that were previously evaluated as normal but where the patient later developed lung carcinoma, however, the carcinoma is seen in retrospect in as many as 90% of the cases.

Consider for example a social worker who is assigned a case of foster care. After the injury of the child, the state commissions a supervisor to investigate whether the social worker was effective at preventing this outcome. All the home visits and the phone calls of the social worker are reviewed to establish whether the social worker acted appropriately given his information. In doing so, a biased supervisor projects information that becomes available only through learning that the child was injured. A similar evaluation might happen when a CEO is assessed by a Board that knows market conditions that were still uncertain at the time the CEO had to decide how to allocate funds among various projects.

Agent            Supervisor             Agent's info                    Supervisor's ex-post info
Radiologist      Medical examiner       Patient's X-ray                 Subsequent case history
Social worker    Government official    Child's family history          Child's reaction to treatment
CEO              Board                  Firm's investment projects      Market conditions

I first show that a biased supervisor underestimates the skill of the agent on average. Since both higher skill and more information lead to a higher probability of success in my setup, exaggerating how much information the social worker had leads to underestimation. The second result identifies conditions under which the supervisor will infer too little or too much from performance. I conclude this section by showing how increasing the frequency of monitoring changes the behavior of an agent who anticipates the bias. I derive predictions on the types of information that will be over-produced and the types that will be under-produced.

4.1 Setup

The radiologist's (agent's) task is to offer a treatment recommendation y which maximizes the probability of a successful outcome. Before taking y, he receives a set of signals s_0 about the patient's condition ω, where s_0 consists both of signals that all radiologists understand and of signals that require skill to be processed. The probability that a radiologist understands skill-intensive signals depends on his type θ ∈ [0, 1]. A radiologist of type θ understands such signals a fraction θ of the time: the most skilled radiologist (θ = 1) always understands the X-ray, while the incompetent one (θ = 0) never does. If he does not understand these signals, he infers nothing from them.10

10In some specifications, a more natural interpretation of θ would refer to the fraction of signals the agent understands/receives or to his ability to distinguish between important and unimportant signals. Importantly, the results in this section require only that the probability of success is increasing in type for any given set of signals.

After the radiologist takes y, the supervisor observes whether a success (S) or a failure (F) occurred, along with a set of novel signals s_1 about ω. In most cases, observing success or failure alone carries information about the state ω, and it is key to our analysis that the supervisor does learn something novel about the patient's medical condition that was not available to the radiologist ex ante.

Assume that understanding more signals in s_0 increases the probability that the radiologist's ex-ante optimal choice leads to a success ex post. Furthermore, assume that if he could use the signals in s_1 in addition, this probability would be even higher for all types. As long as the principal cares about success, she prefers to employ a high-type radiologist over a low-type one. Assume finally that neither the supervisor nor the agent observes θ, but they share a common prior π_0 with full support over [0, 1].11 The uncertainty about the radiologist's skill motivates the skill assessment of the supervisor, because what the supervisor learns about θ can then form the basis of employment, compensation and allocation decisions by the principal.

4.2 Example

To illustrate the formal setup, consider first a specific information and task structure. Let Ω = {1, 2, 3, 4} and let an ex-ante signal s_0 provide noisy information on whether the state is an even or an odd number. Formally, let Pr(s_0 = z | z) = h, where z ∈ {even, odd} and h ∈ (0.5, 1). Let a second ex-ante signal s'_0 give precise information on whether the state is low (ω ≤ 2) or high (ω > 2). Assume that s'_0 requires skill to be processed but s_0 does not. Assume that a success occurs if y = ω and a failure occurs otherwise. Since s'_0 is processed with probability θ by an agent of type θ, the true probability of success for a type θ agent is:

Pr(S | θ, h) = (h/2)(1 + θ)     (1)

Assume that the supervisor observes an ex-post signal s_1, which tells her precisely whether the state is an even or an odd number. If the supervisor projects s_1, then her perceived probability of a success given a projection of degree ρ is:

Pr(S | θ, h)_ρ = (1/2)(ρ + (1 − ρ)h)(1 + θ)     (2)

where the subscript ρ refers to the degree of the bias. This equation shows that the supervisor's expectation of the probability with which a type θ agent should succeed is increasing in ρ. Assume that the supervisor observes success or failure, but not y. Let π^ρ_1(S) and π^ρ_1(F) denote a ρ-biased supervisor's updated beliefs after observing success and failure, respectively. The following claim shows that biased inference after a success is the same as the unbiased one, but a biased supervisor is more pessimistic after a failure than a Bayesian one.

11The assumption that the agent does not know his type is for simplicity only. It is otherwise standard in the career concerns literature, e.g. Holmström (1999). Since in Sections 4.3-4.6 the agent is passive, this assumption plays no role in the results there.

Claim 1 For all π_0, π^ρ_1(S) does not change in ρ, and π^ρ_1(F) is decreasing in ρ in the sense of first-order stochastic dominance (FOSD).

If recruitment and firing decisions are based on the supervisor’s assessment, then this claim implies that the agent will be fired too often after a failure but not after a success.
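A short numerical sketch of this example (Python; the uniform prior over θ and the value h = 0.75 are hypothetical choices, while the success probabilities are Eqs. (1) and (2)):

```python
import numpy as np

h = 0.75                                   # hypothetical precision of the non-skill signal s_0
theta = np.linspace(0.0, 1.0, 101)         # grid standing in for a full-support prior on [0, 1]
prior = np.ones_like(theta) / theta.size   # uniform prior (hypothetical)

def perceived_success(rho):
    return 0.5 * (rho + (1.0 - rho) * h) * (1.0 + theta)   # Eq. (2); rho = 0 gives Eq. (1)

for rho in (0.0, 0.5, 1.0):
    pS = perceived_success(rho)
    post_S = prior * pS / np.sum(prior * pS)                   # beliefs over theta after a success
    post_F = prior * (1.0 - pS) / np.sum(prior * (1.0 - pS))   # beliefs over theta after a failure
    print(f"rho={rho}: E[theta | S] = {np.sum(theta * post_S):.3f}, "
          f"E[theta | F] = {np.sum(theta * post_F):.3f}")
# Consistent with Claim 1: the posterior after a success is unaffected by rho,
# while the posterior after a failure shifts toward lower types as rho grows.
```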

4.3 Underestimation

Although the above example was illustrative of the setup, it did not explicate how information projection changes the supervisor’s assessment more generally. To identify this mechanism, let me now turn to the more general case. The first result is also the main result of this section. It claims that if the supervisor projects productive information, she underestimates the radiologist’s skill level on average.

Proposition 3 Taking expectations based on the true distribution of signals, E[π^ρ_1] FOSD E[π^{ρ'}_1] iff ρ' ≥ ρ, for all π_0.

Information projection leads to the systematic underestimation of the agent's skill. Since the supervisor projects productive information, she overestimates the overall probability of a success and underestimates the overall probability of a failure. Hence she is more surprised observing a failure and less surprised observing a success than in the unbiased case. As a result, a biased supervisor puts more weight on the information revealed by a failure and less weight on the information revealed by a success than a Bayesian supervisor. Since lower types are more likely to fail than higher types, this leads to underestimation on average. Note that in the Bayesian case the expected posterior always equals the prior, and hence the above proposition also implies that the expected biased posterior is lower than the prior, and that this is true for any prior.

Proposition 3 shows that if the supervisor has access to information that the agent did not have, the supervisor is negatively biased in her assessment. Call an increase in s_1, the ex-post information, a change that increases the extent to which knowing and acting upon s_1 raises the probability of a successful outcome after an optimal decision. The next corollary shows that an increase in the projected signal leads to further underestimation.

Corollary 1 For all ρ > 0, E[π^ρ_1] is decreasing in s_1 in the sense of FOSD.

In the analysis above, I assumed that the supervisor’s inference is based on a performance measure which consists of either a success or a failure and the knowledge of what the set of ex-ante signals is. In many situations, the supervisor has more detailed information, such as observing the exact realizations of the signals in s0 or the agent’s action y. In a Bayesian setting, such information leads to more precise estimates of θ. In a world of biased evaluators, it might well increase underestimation. Thus in-depth investigations might be welfare reducing in a biased case, even if they are welfare improving in the Bayesian one.
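Sticking with the Section 4.2 example, the average effect in Proposition 3 can also be checked numerically: the expected posterior mean of θ, computed under the true distribution of outcomes, equals the prior mean when ρ = 0 and falls as ρ grows (again a sketch with a hypothetical h and a uniform prior):

```python
import numpy as np

h = 0.75
theta = np.linspace(0.0, 1.0, 101)
prior = np.ones_like(theta) / theta.size          # uniform prior; prior mean is 0.5

def expected_posterior_mean(rho):
    p_true = 0.5 * h * (1.0 + theta)                          # Eq. (1): true success probability
    p_perc = 0.5 * (rho + (1.0 - rho) * h) * (1.0 + theta)    # Eq. (2): perceived success probability
    post_S = prior * p_perc / np.sum(prior * p_perc)          # biased posterior after a success
    post_F = prior * (1.0 - p_perc) / np.sum(prior * (1.0 - p_perc))
    pS = np.sum(prior * p_true)                               # true overall probability of success
    return pS * np.sum(theta * post_S) + (1.0 - pS) * np.sum(theta * post_F)

for rho in (0.0, 0.25, 0.5, 1.0):
    print(rho, round(expected_posterior_mean(rho), 3))
# rho = 0 returns the prior mean (0.5); larger rho yields a lower expected assessment.
```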

4.4 Over- and Under-inference

Proposition 3 is consistent with the general wisdom that hindsight bias leads to too much ex-post blame. Although this result is true on average, it does not follow that the supervisor assigns too much blame both after a success and after a failure. Conditional beliefs will depend on the exact nature of the information projected. If the bias leads to a perception where the marginal return to skill is higher than in the Bayesian case, the supervisor over-infers skill from performance. This happens, for example, in a case where in the absence of ex-ante information the outcome is determined only by chance, but in hindsight an able radiologist should have detected cancer had he not only understood the X-ray but also known what symptoms the patient would develop later. In contrast, if information projection leads to a perception where the marginal return to skill is lower than in the Bayesian case, the supervisor under-infers skill from performance. This is the case where ex-post information completely substitutes for the skill-intensive information, and hence in retrospect all differences in performance are due to differences in luck. The following two examples show how conditional beliefs depend on the nature of the projected information.

Let ω = ε_1 ε_2, where ε_i ∈ {1, −1} for i = 1, 2, and let there be a symmetric prior σ both on ε_1 and ε_2. Let s_0 be a signal about ε_1 that is true with probability h. Let s'_0 be a signal about ε_2 that is always true. Assume that skill is necessary for the understanding of s'_0 but not of s_0. In this case, the true probability of success for a type θ agent is:

Pr(S | θ) = θh + (1 − θ)/2     (3)

Let the ex-post information be given by s_1, where s_1 reveals ε_1. Hence the perceived probability of success of type θ with information projection of degree ρ is:

Pr(S | θ)_ρ = ρθ + (1 − ρ)θh + (1 − θ)/2.     (4)

It is easy to see that for all ρ > 0 the marginal return on skill in the biased case is higher than in the Bayesian case: h − 1/2 + ρ(1 − h) versus h − 1/2. As a result, the supervisor exaggerates the extent to which performance is influenced by skill. In the limit where h = 1/2, performance is not informative about skill, yet a biased supervisor infers skill from performance. Here, full information projection leads to the complete illusion of talent.12

Let everything be as before, except the ex-post information. Let s_1 now tell the true value of ε_2 with probability z. Here, in contrast to the previous example, the productivity of s_1 depends on whether the agent processed s'_0 or not. If s'_0 was processed, s_1 adds no information. If it was not, s_1 increases the agent's chances of producing a successful outcome. The true probability of success for a type θ agent is

Pr(S | θ) = θh + (1 − θ)/2     (5)

and the perceived probability with full information projection is

Pr(S | θ)_1 = θh + (1 − θ)[hz + (1 − h)(1 − z)].     (6)

12For other mechanisms leading to the illusion of talent based on false beliefs, see Rabin (2002) or Spiegler (2006).

In contrast to the previous case, here the marginal return on skill is lower in the biased case. In the limit where z = 1, the perceived probability of success equals h for all types, and hence the marginal return is perceived to be zero. This means that a fully biased supervisor does not update her beliefs after observing performance, because she believes that differences in performance are due entirely to differences in luck.
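A small sketch contrasting the two examples (Python; the values of h and z are hypothetical, and full projection, ρ = 1, is used so that Eqs. (4) and (6) apply directly):

```python
h, z = 0.8, 0.9     # hypothetical parameters; both signals are informative (h, z > 1/2)

def example1(theta):
    """Ex-post signal reveals eps_1; Eqs. (3) and (4) at rho = 1."""
    true = theta * h + (1.0 - theta) * 0.5
    perceived = theta * 1.0 + (1.0 - theta) * 0.5
    return true, perceived

def example2(theta):
    """Ex-post signal reveals eps_2 with prob. z; Eqs. (5) and (6)."""
    true = theta * h + (1.0 - theta) * 0.5
    perceived = theta * h + (1.0 - theta) * (h * z + (1.0 - h) * (1.0 - z))
    return true, perceived

for name, f in (("example 1", example1), ("example 2", example2)):
    # Success probabilities are linear in theta, so the marginal return to skill
    # is just the difference between theta = 1 and theta = 0.
    (t0, p0), (t1, p1) = f(0.0), f(1.0)
    print(f"{name}: true return to skill {t1 - t0:.2f}, perceived {p1 - p0:.2f}")
# Example 1: the perceived return exceeds the true one (over-inference).
# Example 2: the perceived return falls short of the true one (under-inference).
```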

Given a bias of degree ρ, let λ^ρ(θ) = Pr(S | θ)/Pr(S | θ)_ρ be a measure of the exaggeration of the probability of success for a type θ agent. We can now state the following more general result:

Proposition 4 For all π_0 and ρ > 0, if λ^ρ(θ) is decreasing in θ, then π^ρ_1(S) FOSD π_1(S); if λ^ρ(θ) is increasing in θ, then π_1(S) FOSD π^ρ_1(S).

The above proposition specifies the effects of information projection on the supervisor's assessment after a success as a function of the projected information.13 The impact on the assessment after a failure is the outcome of the net effect of two forces: underestimation, and over- and under-inference. In the case of over-inference, these two point in the same direction and the supervisor is always too pessimistic after a failure. In the case of under-inference, they point in different directions and the net effect is ambiguous. As the second example above shows, it is possible that the under-inference effect dominates and a biased supervisor is too optimistic after a failure.

4.5 Production of Information

As the evidence suggests, many professionals do anticipate the presence of hindsight bias. In our model, this suggests that the agent might respond strategically to the supervisor's bias. It follows from the above analysis that if the agent prefers a higher assessment to a lower one, information projection decreases his welfare on average. To avoid such underestimation, the agent has incentives to reduce the information gap that exists between the ex-ante and the ex-post environment. This incentive might motivate the agent to change the set of signals he has, and also to avoid certain tasks if possible.

Consider a specification where the radiologist has access to a set of signals s_0 and can decide to produce an additional ex-ante radiograph s'_0.14 The cost of producing this radiograph is a, and the benefit

13In the case where the marginal return to skill is the same both in the true and in the biased perception, only underestimation has an effect. See the example of Section 4.2.
14I do not require that s'_0 needs skill to be processed, only that there are some skill-intensive signals in s_0. For ease of notation, I suppress s_0 in what follows.

is the increased probability of a successful treatment. Assume that the radiologist bears the full cost of a, and also that his compensation w_0 depends on the outcome. This compensation is w_{0,S} > 0 after a success, and 0 after a failure. Assume that the radiologist's utility depends on the supervisor's evaluation

as well. In particular, assume that the radiologist’s future wage w1 equals the mean of the evaluator’s

ex-post assessment, i.e., w^ρ_1 = E[θ | π^ρ_1]. Future wages could be interpreted as a reduced form for the radiologist's future employment opportunities.15 Formally, consider the following von Neumann-Morgenstern utility function for the agent:

U(w, a) = w_0 + w^ρ_1 − a · 1{s'_0 is produced}     (7)

The above specification assumes risk neutrality over assessments, i.e., that ex ante the agent cares only about the expected beliefs of the supervisor, and does not care about the difference between the conditional beliefs after a success or a failure. This assumption is mainly for expository purposes, and later in this section I discuss the case where it is relaxed.

Let m denote the frequency of monitoring. This frequency corresponds to the ex-ante probability with which the agent is evaluated. Since the supervisor's assessment of θ changes only conditional on an assessment, it follows that for a fixed m and ρ, the radiologist's optimal choice of whether to produce the additional radiograph is determined by the following inequality:

[Pr(S, s'_0) − Pr(S)] w_{0,S} − a ≥ m E[w^ρ_1 − w^ρ_1(s'_0)]     (8)

The left-hand side is the direct benefit minus the direct cost of producing signal s'_0. The right-hand side is the loss or gain in expected wages from producing this additional radiograph. In the Bayesian case, the expectation of the right-hand side of Eq. (8) is always zero. This is true because the supervisor's expected posterior always equals her prior under Bayesian learning, and this holds independently of what signals the radiologist did or did not see. As a result, given the assumption of risk neutrality, the choice of producing the additional radiograph is independent of the frequency of monitoring. Furthermore, even if we relax the assumption of risk neutrality, the radiologist's choice should be independent of the ex-post information. Since such information does not influence the agent's

15The specific assumption that w^ρ_1 = E[θ | π^ρ_1] is without loss of generality in the sense that the results hold for all utility functions that are increasing in π^ρ_1 in the sense of FOSD.

ex-ante productivity, it should not affect the supervisor's assessment in the Bayesian case. In the biased case, at the same time, the posterior is decreasing in the information gap between the ex-ante and the ex-post stage. Here the choice whether to produce the additional radiograph depends crucially on the relationship of this signal and the ex-post information. To see this, let me distinguish

between two ways the productivity of these two signals could be linked. I call signals s'_0 and s_1 substitutes if processing s'_0 decreases the productivity gain from having s_1. I call these two signals complements if processing s'_0 increases the productivity gain from having s_1. The following definition introduces these two properties formally:

Definition 2 Let λ^ρ = Pr(S)/Pr^ρ(S) and λ'^ρ = Pr(S | s'_0)/Pr^ρ(S | s'_0). Signals s'_0 and s_1 are substitutes if λ'^ρ > λ^ρ for all ρ. Signals s'_0 and s_1 are complements if λ'^ρ < λ^ρ for all ρ.

When signals are substitutes, the radiologist can reduce the information gap between ex ante and ex post by ordering an additional radiograph. When signals are complements, ordering a new radiograph

increases this information gap. For a given m and ρ, let a(m, ρ) denote the cost at which Eq. (8) holds with equality. According to the next proposition, an increase in the probability of monitoring leads to an increase in the production of substitute information and to a decrease in the production of complement information.
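Rearranging Eq. (8), the threshold cost can be written out explicitly, keeping the paper's notation, as

a(m, ρ) = [Pr(S, s'_0) − Pr(S)] w_{0,S} − m E[w^ρ_1 − w^ρ_1(s'_0)],

so that the radiologist produces the additional radiograph whenever a ≤ a(m, ρ); the comparative static in m is therefore governed by the sign of the expected-wage term on the right.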

Proposition 5 If s'_0 and s_1 are substitutes, then a(m, ρ) is increasing in m iff ρ > 0. If s'_0 and s_1 are complements, then a(m, ρ) is decreasing in m iff ρ > 0.

A radiologist has additional incentives to undertake diagnostic procedures that substitute for ex-post information. The reason is that such diagnostic procedures reduce the probability of unwarranted ex-post blame. Even when such procedures are socially inefficient, because they are too costly or simply undesirable, for example because they expose the patient to too much radiation, a radiologist will undertake them to maintain a good reputation. As a result, the more he is monitored, the more expensive and potentially more harmful his activities will be on such tasks. At the same time, a radiologist has additional incentives to avoid information that can be interpreted much better in hindsight than in foresight. Even if the production of such information increases productivity more than it increases costs, the radiologist is better off not producing such information, because this way he can avoid developing

a bad reputation. Both effects are increasing in the frequency of monitoring.16

Proposition 5 provides distinct predictions on the over- and under-production of information as a function of the environment. Stylized evidence provides support for the existence of both over- and under-production of information. For example, Studdert et al. (2005) survey physicians (in the areas of surgery, radiology and emergency medicine) in Pennsylvania. In their sample, 43% of the physicians report using imaging technology in clinically unnecessary circumstances, and 42% of them claim they took steps to restrict their practices, which included eliminating procedures prone to future complications. Kessler and McClellan (2002) show that changes in defensive medicine that result from medical liability reforms are primarily on diagnostic rather than on therapeutic practices. Intuition suggests that there is typically more room for information projection in the former than in the latter context. While many argue for a direct link between defensive medicine and hindsight bias, further evidence is needed to test for the mechanism described.

Note that Proposition 5 rests crucially on the assumption that the supervisor conditions her inference of the agent's competence on all signals produced by the radiologist. She does not need to observe the realization of these ex-ante signals, but the results depend on the fact that she does observe whether s'_0 was or was not produced.

Alternatively, one could imagine a situation where the radiologist can produce s'_0 secretly, i.e., in

a way that the supervisor does not learn about s'_0. In this case, the radiologist's expectation of his future wages is independent of his production choice. Even here, the anticipation of the supervisor's bias might lead to distortions in production choices. To see these effects, let's return for a moment to Proposition 4. In environments where the supervisor over-infers skill from performance, wages are too high after a success and too low after a failure. It follows that here the radiologist wants to over-produce ex-ante information secretly. In environments where the supervisor under-infers skill from performance, wages are too low after a success, and hence the radiologist might want to under-produce ex-ante information secretly. These deviations from the Bayesian incentives disappear if such under- and over-production is detected, unlike in the case of Proposition 5.

Given the focus on skill assessment, I assumed that skill-intensive signals are always present, as is

16A corollary of the underproduction result is that the radiologist is also averse to ordering tests that deliver results after his recommendation is made. These tests increase the information gap between ex-ante and ex-post, and hence increase the extent of underestimation.

true in most situations. In situations where the agent cannot eliminate, or sufficiently reduce, the informativeness of novel ex-post signals, ceteris paribus he would like to avoid procedures that involve inference about his skill, to protect himself from underestimation. A similar incentive might well be present in a Bayesian setting where the radiologist is risk-averse over future wages. Absent skill-intensive signals, the agent is not exposed to wage fluctuations that result from the supervisor's updating.17 Information projection amplifies this aversion, so that even a risk-neutral agent will exhibit such preferences. Importantly though, the two mechanisms are distinct and the incentives that can alter them are also different.

5 Reward and Punishment

In the previous section, I showed that a biased supervisor underestimates the agent's skill on average. A principal responsible for employment decisions can to some extent correct the supervisor's mistake if she anticipates that the supervisor's reports are too negative on average. In most situations, however, a principal does not have as detailed information about the agent's task as the supervisor. Hence such corrections might introduce other forms of inefficiencies, and might not eliminate the incentives of the agent to act against underestimation.

In this section, I turn from a context where the amount of information that the agent learns from a signal is a function of his skill to situations where it is a function of how much effort he exerts. How often the radiologist understands X-rays depends on how carefully he evaluates them. A careful evaluation is costly because it requires the radiologist to exert effort. To provide incentives for the radiologist, the principal offers him a contract that rewards the agent for a good health outcome and punishes him for a bad one. If the health outcome is only a noisy measure of the correctness of the radiologist's diagnosis, and effort is unobservable, better incentives can be provided if the principal hires a supervisor to monitor the radiologist. This way, the principal can tie reward and punishment closer to whether the radiologist made the correct diagnosis given the information available to him ex ante.18

The main result of this section shows that if the supervisor projects ex-post information, the efficiency gains from monitoring are decreased. I show that if the supervisor believes that the agent could

17On this logic in the Bayesian setting, see Hermalin (1993). 18For this classic insight, that increasing observability reduces inefficiency in the context of moral hazard, see Holmström (1979) and Shapiro and Stiglitz (1984).

have learned the true state, the radiologist is punished too often and exerts less effort than in the Bayesian case. I also show that when the principal designing incentives anticipates the supervisor's bias, she wants to monitor less often. Even if she decides to monitor, she induces less effort on the part of the agent than in the Bayesian case. The reason is that information projection, even if anticipated by the principal, introduces noise in the supervisor's reports, and hence decreases the efficiency of monitoring.

5.1 Effort

Assume that the agent’s level of the effort determines the probability with which he understands signal s0. Let p(a) be the probability that s0 is understands s0 when the agent exerts effort a. If he does not

understand it, he infers nothing from s0. I assume decreasing returns to effort in terms of the processing

   probability. Formally, p (a) > 0 and p (a) < 0. I also assume that lima→0 p (a) = ∞ and lima→∞ p(a) = 0.

Let s0 be such that Pr(s0 = ω | ω) = h. Assume that the probability of a success conditional on the fact that the agent’s action equals the state, , is . Assume that the probability of success for  y = ω k actions different from the state, y = ω, is z where k > z. Finally, assume that if the agent does not

process s0, he is equally likely to take any action y ∈ Ω and the probability that such a random action matches the state is b where b < h. For simplicity, assume that both the agent and the principal are risk neutral. Let the agent’s utility  function again be U(w, a) = w0 − a and the principal’s utility function be V (r, w) = r − w0, where r is the revenue to the principal from the task. Let the revenue of the principal be 1 after a success and 0 after a failure.
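To fix ideas, here is a minimal numerical sketch of this effort technology. The functional form p(a) = 1 − exp(−4√a) and the parameter values (h, b, k, z) are illustrative assumptions of mine, chosen only so that p satisfies the monotonicity, concavity, and limit conditions above; they are not part of the model.

```python
import numpy as np

# Illustrative effort technology (my assumption, not part of the paper's setup):
# p(a) = 1 - exp(-lam * sqrt(a)) satisfies p'(a) > 0, p''(a) < 0,
# p'(a) -> infinity as a -> 0, and p'(a) -> 0 as a -> infinity.
lam = 4.0

def p(a):
    return 1.0 - np.exp(-lam * np.sqrt(a))

# Illustrative parameter values with b < h and z < k.
h, b, k, z = 0.9, 0.1, 1.0, 0.0

def success_prob(a):
    """Unconditional probability of a success given effort a."""
    prob_right_action = p(a) * h + (1.0 - p(a)) * b  # Pr(y = omega)
    return prob_right_action * k + (1.0 - prob_right_action) * z

print(round(success_prob(0.05), 3))  # success probability at an arbitrary effort level
```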

5.2 Performance Contract

As the benchmark, I characterize the first-best effort level, where the marginal social benefit from exerting effort equals the marginal social cost. The first-best effort level, af*, is then defined implicitly by the following equality:

qp′(af*) = 1

where q = (h − b)(k − z) measures the productivity gain from processing signal s0.19 This productivity increases in h, the precision of the agent's signal, and in k, the probability of success conditional on an optimal action. The productivity decreases in b, the probability of making the right choice by chance, and in z, the probability of success conditional on a non-optimal choice. With a slight abuse of notation, let the vector q denote the collection of the parameters h, b, k, z.

Let's now turn to the case where the agent's effort is unobservable. Assume that the agent is protected by limited liability,20 i.e., the wage w ≥ 0 has to hold in all contingencies. Let the agent's outside option be 0. Given the assumption of risk-neutrality, the principal's optimal contract is one that offers the lowest compensation possible after a failure. This implies that the compensation after a failure is wF = 0.

Let wS denote the compensation offered to the agent upon a success. In light of these considerations, the principal’s problem is to maximize her expected utility:

max_{a, wS} V(r(a, q), w) = [p(a)q + bk + (1 − b)z](1 − wS)   (9)

subject to the agent’s incentive compatibility constraint:

an(q, w) = arg max_a [p(a)q + bk + (1 − b)z]wS − a.   (10)

Given the agent’s utility function, we can replace this incentive compatibility constraint with its

first-order condition. To guarantee that there is a unique stable equilibrium, I assume that p(a) ≤ 0 for

∗ all a. The optimal effort level, an(q), which solves this constrained maximization problem is defined implicitly by following equation:

p(p + (bk + (1 − b)z)/q) qp = 1 − . (11) (p)2

Let wn*(q) denote the corresponding optimal wage.

Note that an*(q) is always smaller than af*(q). The reason is that the principal faces a trade-off: implementing a higher level of effort is only feasible at the cost of leaving a higher rent for the agent.

19 I assume that the solution is always interior. Furthermore, h = h + (1 − h) b |Ω|/(|Ω| − 1), where |Ω| is the cardinality of the action space. 20 On the use of limited liability contracts, see e.g., Innes (1990) and Dewatripont and Bolton (2005). I believe that the results of this section hold a fortiori given a risk-averse radiologist.

Thus effort is lower and the agent's rent is higher than in the first-best. A simple comparative static result follows from Eq. (11). Increasing h or k increases the productivity of processing information and thus generates higher utility for the principal given a contract. Since p′ > 1 is always true in equilibrium, a higher h or k allows for cheaper incentives, and thus the principal wants to induce more effort, implying that effort is increasing in h and k.

Lemma 1 An increase in h or k increases the equilibrium effort level an*(q) and the payoff to the principal.
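A quick numerical check of Lemma 1, and of the wedge between first-best and second-best effort, can be obtained by solving the first-best condition qp′(a) = 1 and Eq. (11) with a root finder. The sketch below reuses the illustrative technology and parameters from the earlier sketch; all numbers are assumptions chosen for illustration only.

```python
import numpy as np
from scipy.optimize import brentq

lam = 4.0
def p(a):   return 1.0 - np.exp(-lam * np.sqrt(a))
def dp(a):  return lam * np.exp(-lam * np.sqrt(a)) / (2.0 * np.sqrt(a))                 # p'
def d2p(a): return -(lam / 4.0) * np.exp(-lam * np.sqrt(a)) * (lam / a + a ** -1.5)     # p''

def efforts(h, b, k, z):
    q, C = (h - b) * (k - z), b * k + (1 - b) * z
    # First best: q p'(a) = 1.
    a_first_best = brentq(lambda a: q * dp(a) - 1.0, 1e-12, 1.0)
    # Second best, Eq. (11): q p'(a) = 1 - p''(a)(p(a) + C/q)/p'(a)^2.
    a_second_best = brentq(
        lambda a: q * dp(a) - 1.0 + d2p(a) * (p(a) + C / q) / dp(a) ** 2,
        1e-12, 1.0)
    return a_first_best, a_second_best

a_fb, a_n = efforts(h=0.90, b=0.1, k=1.0, z=0.0)
_, a_n_hi = efforts(h=0.95, b=0.1, k=1.0, z=0.0)
print(a_n < a_fb)     # effort under the performance contract is below first best
print(a_n_hi > a_n)   # Lemma 1: equilibrium effort rises with h
```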

5.3 Bayesian Monitoring

The effort level characterized by Eq. (11) is optimal given that the supervisor observes a performance measure that consists only of success and failure, but obtaining more precise reports about the agent's action allows the principal to induce the same level of effort at a lower cost. Suppose that the principal can monitor the agent by learning the agent's action and the information that was available to him. Under such monitoring, the optimal contract rewards the agent if his action is the one suggested by the information available to him and punishes the agent otherwise. Since whether a success happens or not does not contain additional information, it is easy to see that such a compensation scheme is optimal. Given such a reward scheme, the agent's incentive compatibility constraint can now be expressed by the following condition:

am(q, w) = arg max_a p(a)(1 − b)wS + bwS − a   (12)

and the optimal contract induces an equilibrium effort level, am*(q), defined implicitly by the following condition:

p′q = 1 − p″(p + b/(1 − b))/(p′)².   (13)

Let wm*(q) denote the corresponding optimal wage.

The equilibrium effort under monitoring, am*(q), is always greater than the equilibrium effort without monitoring, an*(q). The reason is that monitoring improves the trade-off between providing incentives and leaving a positive rent for the agent. It rewards good decisions rather than good luck. As a result, if the principal monitors the agent, she can induce the same level of effort at a lower cost, and hence for any given level of effort she realizes a greater expected profit. The fact that it becomes cheaper for the principal to induce effort means that the principal is willing to pay for monitoring.

Lemma 2 The equilibrium under monitoring induces a higher effort, an*(q) < am*(q), and the principal is better-off with the option of monitoring.
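To see Lemma 2 at work numerically, the sketch below solves Eq. (11) and Eq. (13) for the same illustrative technology and parameters used above (all of them my assumptions) and compares the induced effort levels.

```python
import numpy as np
from scipy.optimize import brentq

lam = 4.0
def p(a):   return 1.0 - np.exp(-lam * np.sqrt(a))
def dp(a):  return lam * np.exp(-lam * np.sqrt(a)) / (2.0 * np.sqrt(a))
def d2p(a): return -(lam / 4.0) * np.exp(-lam * np.sqrt(a)) * (lam / a + a ** -1.5)

h, b, k, z = 0.9, 0.1, 1.0, 0.0
q, C = (h - b) * (k - z), b * k + (1 - b) * z

def induced_effort(rent):
    """Solve q p'(a) = 1 - p''(a)(p(a) + rent)/p'(a)^2 for the equilibrium effort."""
    return brentq(lambda a: q * dp(a) - 1.0 + d2p(a) * (p(a) + rent) / dp(a) ** 2,
                  1e-12, 1.0)

a_n = induced_effort(C / q)          # performance contract, Eq. (11)
a_m = induced_effort(b / (1.0 - b))  # monitoring, Eq. (13)
print(a_m > a_n)                     # Lemma 2: monitoring induces higher effort
```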

5.4 Biased Monitoring

Let the supervisor’s ex-post signal be s1 and assume that the projected information is such that along with s0 it perfectly reveals the state but alone its uninformative. This means that a biased supervisor perceives the true problem as if h = 1 for all h ≤ 1. Furthermore, it also implies that upon not

processing s0 the supervisor still believes that the probability that the agent can take the right action is b .The consequence of such information projection is that the supervisor makes wrong attributions from the agent’s choice. Whenever y = ω, the supervisor concludes that the agent did not successfully

read the information available to her. Hence, if the agent did read and follow s0, but this information

21 turned out to be ’incorrect’ ex-post, the supervisor mistakenly infers that the agent did not read s0. The probability of this mistake is p(a)(1 − h), i.e., the probability that s0 is processed times the probability

that s0 did not suggest the right action. Assume that the agent correctly predicts the bias of the supervisor. In this case, the agent’s effort is given by the solution of the following maximization problem:

am^1(q, w) = arg max_a p(a)h(1 − b)wS + bwS − a.   (14)

Comparing this condition with that of Eq. (12), it is clear that the return to effort is lower in the biased case. The reason is that an unbiased supervisor can distinguish – up to probability (1 − b) – between a bad decision that is due to wrong ex-ante information and a bad decision that results from not processing a signal. In contrast, a biased supervisor mistakes a bad decision due to wrong ex-ante information for a bad decision that is due to not having processed the available information. This implies that for any given compensation wS the agent exerts less effort in the biased case.

21 Note that the 'inference' of the supervisor is only about whether the agent's effort was successful or not. In a moral hazard context, there is no inference about the agent's effort a.

Proposition 6 Suppose h < 1. Then am^1(q, w) < am(q, w), and am^1(q, w) is increasing in h, with am^1(q, w) = am(q, w) if h = 1.

The above proposition shows that if a negligence-based reward scheme is enforced by a biased evaluator, then it becomes closer to strict liability, and in our setup this reduces care. A possible corollary to the above proposition is that a negligence rule might actually backfire. The reason is that under monitoring the radiologist is offered a lower compensation in equilibrium, which is outweighed by the increased probability of reward in the Bayesian case. Since the probability of a reward is reduced in the biased case, care might be lower under biased monitoring than under the simple performance contract. As a final scenario, consider the case where the bias of the supervisor is common knowledge between the principal and the agent. If the principal is aware of the supervisor's bias, she knows that at times the supervisor comes to the wrong conclusion. Since the principal can only determine the probability of this mistake, and not whether the supervisor's report is actually wrong or right, information projection adds noise to the supervisor's reports. Thus the data obtained by monitoring contain more noise than in the Bayesian case. This decreases the efficiency of monitoring. As a result, the principal decides to induce less effort than she would had she believed that the supervisor had perfect Bayesian perception.

Let the optimal effort level induced be denoted by am^1*(q) and implicitly defined by:

p′q = ((p′)² − p″(p + b/(h(1 − b))))/(p′)².

Proposition 7 If the principal anticipates the bias, she induces effort am^1*(q) < am*(q), and am^1*(q) is increasing in h.
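The gap between biased and Bayesian monitoring can be illustrated the same way. The sketch below solves Eq. (13) and the anticipated-bias condition displayed above, again using the same illustrative technology and parameters (all assumptions of mine, not the paper's).

```python
import numpy as np
from scipy.optimize import brentq

lam = 4.0
def p(a):   return 1.0 - np.exp(-lam * np.sqrt(a))
def dp(a):  return lam * np.exp(-lam * np.sqrt(a)) / (2.0 * np.sqrt(a))
def d2p(a): return -(lam / 4.0) * np.exp(-lam * np.sqrt(a)) * (lam / a + a ** -1.5)

b, k, z = 0.1, 1.0, 0.0

def induced_effort(h, rent):
    q = (h - b) * (k - z)
    return brentq(lambda a: q * dp(a) - 1.0 + d2p(a) * (p(a) + rent) / dp(a) ** 2,
                  1e-12, 1.0)

h = 0.9
a_m      = induced_effort(h, b / (1.0 - b))        # Bayesian monitoring, Eq. (13)
a_m_bias = induced_effort(h, b / (h * (1.0 - b)))  # anticipated bias, condition above
print(a_m_bias < a_m)                                           # Proposition 7
print(induced_effort(0.95, b / (0.95 * (1.0 - b))) > a_m_bias)  # and it rises with h
```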

The analysis above has implications for the effect of hindsight bias on tort liability. It suggests that whenever unobservable effort is involved, information projection reduces rather than increases an injurer's incentive to exercise due care. This observation is in contrast with the common conjecture, e.g., Rachlinski (1998), that an agent anticipating hindsight bias will take too much precaution to avoid ex-post blame.22

22 A key difference between my setup and the standard setup for the study of optimal liability, e.g., Shavell (1980), is that the level of precaution (effort) is unobservable, and the action itself is not increasing in precaution; rather, it is the probability of taking the right action that increases in effort.

6 Communication

In the previous sections, I focused on the problem of performance evaluation, but information projection might affect other aspects of organizational life as well. One such domain is communication. Both intuition and the evidence presented in Section 2 indicate that when giving or taking advice, people assume too much about what the other party knows. In this section I demonstrate two ways in which information projection affects efficient information transmission between a speaker and a listener. These two themes are credulity and unintended ambiguity. Credulity refers to a case where a listener follows the recommendation of a speaker too closely because he assumes that the recommendation already incorporates his own private information. As a result, he fails to combine his private information with the speaker's recommendation and fails to sufficiently deviate from this recommendation even when he should. Unintended ambiguity refers to the case where a speaker sends a message that is too ambiguous for the listener. A biased speaker exaggerates the probability with which her background knowledge is shared with the listener, and hence overestimates how likely it is that the listener will be able to interpret her message. I show that depending on the messages available to the speaker, the speaker might communicate too often or too rarely.

6.1 Credulity

Consider a situation where an advisee has to take an action ye that is as close as possible to an unknown state ω, on which the shared prior is N(0, σ0²). This state could describe the optimal solution to a research problem, the best managerial approach to the organization of production, or the diagnosis of a patient.

The advisee has some private information about ω, given by se = ω + εe, where εe is a mean-zero Gaussian noise term such that the posterior on ω, given the prior and se, is N(s̃e, σ̃e²). The advisor also has some private information about ω, given by sr = ω + εr, where εr is a mean-zero Gaussian noise term such that the posterior on ω, given the prior and sr, is N(s̃r, σ̃r²).23 The advisor makes a recommendation yr equal to her posterior mean. The advisor cannot communicate the full distribution or the true signal directly. Such limits on communication might arise due to complexity considerations, or because

23 Formally, if εe ∼ N(0, σe²), then s̃e = σ0²/(σ0² + σe²)·se and σ̃e² = σ0²σe²/(σ0² + σe²). Similarly, if εr ∼ N(0, σr²), then s̃r = σ0²/(σ0² + σr²)·sr and σ̃r² = σ0²σr²/(σ0² + σr²).

max_{ye} −Eω(ye − ω)²   (15)

thus the advisee’s goal is to take an action that minimizes the distance between his action and the state.

Given the advisor’s recommendation yr, and the advisee’s private information se, a rational advisee takes

0 action ye such that: 0 0 0 ye = E[N(ω, c , v )] (16)

2 2 0 yrσe seσr where c = 2 2 + 2 2 and N(ω ; c, v) is a short form for a normally distributed random variable σe +σr σr +σe    with mean c and variance  v. This action is based on the correct perception of how information is distributed between the advisor and the advisee. This action efficiently aggregates the information in the

recommendation yr and the advisee’s private information se. Consider now the case where the advisee exhibits full information projection. Here, he believes that the advisor’s recommendation is based not only on the realization of sr, but also on se, and thus it

already incorporates all information available to the parties. As a result, he reacts to the advice yr by

1 taking action ye such that:

1 1 1 ye = E[N(ω, c , v )] (17)

1 1 0 where c = yr and v = v . It follows that if the advisee exhibits full information projection, he puts all the weight on what the advisor says and no weight on his private information. This way, his private information is lost. The following proposition shows that a biased advisee follows the recommendation of his advisor too closely.

Proposition 8 E|yr − ye^ρ| is decreasing in ρ and E|yr − ye^1| = 0, where expectations are taken with respect to the true distribution of signals.

This proposition follows from the discussion above. Note that the more precise the advisee's private information is, the greater is the loss relative to the unbiased case. In the biased case, information aggregation fails because the advisee fails to sufficiently update the advisor's recommendation given his private information.24 One way to eliminate this information loss is to invest in a technology that allows the advisor to communicate her posterior distribution. Another option is to block communication between the advisor and the advisee. Assuming full information projection, the advisee is ex ante better off without a recommendation if and only if his signal is more precise than the advisor's signal. More generally, the following corollary is true:

Corollary 2 There exists an indicator function k(ρ, σe, σr) ∈ {0, 1} such that the advisee is better-off

with a recommendation if k(ρ, σe, σr) = 0, and the advisee is better-off without a recommendation if k(ρ, σe, σr) = 1. The function k(ρ, σe, σr) is increasing in ρ and σr and decreasing in σe.
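The loss from credulity is easy to see in a small Monte Carlo experiment. In the sketch below, the biased advisee simply copies the recommendation (the ρ = 1 case), the rational benchmark uses standard normal–normal updating on both signals, and a third rule ignores the advisor altogether; all variances are illustrative assumptions. With a sufficiently precise own signal, copying the advice is worse than receiving no recommendation at all, in the spirit of Corollary 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def mse_by_rule(sig0, sig_e, sig_r, n=200_000):
    """Mean squared error of the advisee's action under three decision rules."""
    omega = rng.normal(0.0, sig0, n)
    s_e = omega + rng.normal(0.0, sig_e, n)
    s_r = omega + rng.normal(0.0, sig_r, n)

    y_r = sig0**2 / (sig0**2 + sig_r**2) * s_r             # advisor's recommendation
    prec = 1/sig0**2 + 1/sig_e**2 + 1/sig_r**2
    y_rational = (s_e / sig_e**2 + s_r / sig_r**2) / prec  # uses both signals
    y_biased = y_r                                         # full projection: copy the advice
    y_no_advice = sig0**2 / (sig0**2 + sig_e**2) * s_e     # ignore the advisor

    return {name: float(np.mean((y - omega) ** 2))
            for name, y in [("rational", y_rational),
                            ("biased", y_biased),
                            ("no advice", y_no_advice)]}

# Own signal more precise than the advisor's: the credulous advisee does worse
# than an advisee who receives no recommendation at all.
print(mse_by_rule(sig0=1.0, sig_e=0.3, sig_r=1.0))
```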

6.2 Ambiguity, Over- and Under-Communication

In the above context, information projection leads to credulity because the advisee projects his private information. Let's now turn to a context where the advisor projects her private information about the state ω. Consider an information structure analogous to the examples in Section 4.4. Let ω = ω1ω2 with ω1, ω2 ∈ {−1, 1}. Assume that s0 = ω1 is the advisor's background knowledge, which cannot be communicated to the advisee. Let s1 = ω2 be the signal that can be communicated to the advisee. As an example, consider a radiologist who speaks to a patient about the patient's medical condition ω. Signal s0 incorporates the radiologist's knowledge of medicine, such as the meaning of a complex medical term.

Signal s1 is a medical term that describes the condition of the patient. If the patient does not know the

meaning of a medical term, then s1 conveys no information to him. If the patient knows

the meaning of the medical term, he can interpret s1 in light of s0.

Let there be a third signal s2, Pr(s2 = ω | ω) = h where 0.5 < h < 1. This signal provides noisy

information about ω, but does not require the patient to know s0, the medical language. For simplicity, let

the true probability with which signals (s0, s1, s2) are available to the advisee be pe = (0, 0, 0). Assume that the patient has a symmetric prior on ω and that the advisor can send only one signal because sending two is prohibitively costly. Sending one message costs c. Let the payoff to the advisor be 1 if the advisee

24 The logic of why a biased advisee will be too credulous is also indicative of why information projection can result in irrational herding behavior. In the context of Banerjee (1992), for example, while rational information updating results in herding-type behavior only in contexts where the action space is not as fine as the signal space, information projection leads to herding even if the action space is as fine as the signal space, and hence where no rational herding should occur.

guesses ω correctly, and let it be 0 otherwise.

The advisor has three distinct options: remain silent, send signal s1, or send signal s2. The table below summarizes the advisor's perceived payoff from each option as a function of ρ:

Payoff / action     Silence                            Send s1                        Send s2
EV^0:               1/2                                1/2 − c                        h − c
EV^ρ:               ρ² + (1 − ρ²)(ρh + (1 − ρ)·0.5)    (1 − ρ)ρh + (ρ² + 1)/2 − c     ρ² + (1 − ρ²)h − c

Since an unbiased advisor knows that s1 does not convey any valuable information to the patient, she

never sends s1. Furthermore, she decides to spend time describing the state to the patient in lay terms

whenever h − c > 1/2, that is, when the expected benefit of talking is greater than its cost. In contrast, a biased medical advisor exaggerates the probability with which the medical term conveys valuable information to the patient, because she projects the knowledge of the medical language s0. Hence if she is sufficiently biased, she prefers to send s1 over s2. Formally, this happens when ρ ≥ 2h − 1. At the same time, a biased advisor also exaggerates the probability that the patient already knows both the medical and the lay description. As a result, she underestimates the return to sending a costly message in general. The net effect of these two forces depends on the degree to which the advisor is biased. If the advisor is fully biased, she always decides to remain silent, because she assumes that the advisee already knows ω anyway. If the advisor is only moderately biased, however, she might communicate even when a rational advisor would remain silent.

Proposition 9 If ρ < 2h − 1, the advisor sends s2 iff c ≤ k2(ρ, h) and is silent otherwise. The function k2(ρ, h) is increasing in h and decreasing in ρ. If ρ > 2h − 1, the advisor sends s1 iff c ≤ k1(ρ, h) and is silent otherwise. Furthermore, if h = 0.5, the advisor sends s1 iff c ≤ 0.5ρ(1 − ρ).
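The advisor's problem is simple enough to evaluate directly. The sketch below computes the perceived payoffs of the three options from the table above, together with the silence thresholds implied by that table, for illustrative values of ρ, h and c (all of them assumptions chosen only to exhibit the regimes described in Proposition 9).

```python
def perceived_payoffs(rho, h, c):
    """Biased advisor's perceived expected payoff of each option (table above)."""
    silence = rho**2 + (1 - rho**2) * (rho * h + (1 - rho) * 0.5)
    send_s1 = (1 - rho) * rho * h + (rho**2 + 1) / 2 - c
    send_s2 = rho**2 + (1 - rho**2) * h - c
    return {"silence": silence, "send s1": send_s1, "send s2": send_s2}

def advisor_choice(rho, h, c):
    payoffs = perceived_payoffs(rho, h, c)
    return max(payoffs, key=payoffs.get)

# Thresholds implied by the table (cf. Proposition 9).
def k2(rho, h): return (1 - rho**2) * (1 - rho) * (h - 0.5)        # send s2 vs. silence
def k1(rho, h): return rho**3 * (h - 0.5) + rho * (0.5 - rho * h)  # send s1 vs. silence

h, c = 0.6, 0.05
print(advisor_choice(0.10, h, c))  # mildly biased: sends the informative lay message s2
print(advisor_choice(0.50, h, c))  # more biased: sends the jargon message s1 instead
print(advisor_choice(0.95, h, c))  # nearly fully biased: stays silent
# Under-communication: a moderately biased advisor stays silent even though an
# unbiased advisor (h - c > 1/2) would send s2.
print(advisor_choice(0.50, 0.9, 0.35), 0.9 - 0.35 > 0.5)
```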

The above proposition shows not only that a biased advisor might send a dominated message, but it also offers some comparative statics on whether there will be too much or too little communication. If the advisor is only moderately biased and the lay description is sufficiently informative, then she communicates too rarely: here, underestimating the return to communication dominates her overestimation of how informative the medical description is. If the advisor is sufficiently biased, then depending on how informative the lay description is, she might communicate too often. Since she overestimates the probability that, following the medical description, the advisee will take the right action, she engages in costly communication even if it conveys no information to the advisee. Hence adding dominated communication options might decrease efficiency in the presence of information projection.

Results in the above proposition are consistent with the intuition that the curse of knowledge leads to too much ambiguity. For example, many argue that this is true for computer manuals written by experts but targeted at lay people. While in the case of computer manuals hiring a lay person rather than an expert to proof-read the manuscript could reduce the curse, in many other situations more explicit communication protocols might do more to improve welfare.

7 Conclusion

In this paper, I developed a model of information projection applicable to problems of asymmetric information. The applications in this paper are motivated by problems and evidence from labor markets, organizations, medicine, and law, but they are not exhaustive in any sense. I conclude the paper by considering some possible further applications and extensions. The results in Section 4 and Section 5 suggest that if debiasing is ineffective, special kinds of incentives might be necessary to mitigate the adverse effects of information projection on the production and processing of information. Novel insights might be gained in contexts where the radiologist can decide both what information to produce and how much effort to exert in understanding the information he produced. Another possible extension of the over-inference and underestimation results of Section 2 is to the analysis of group formation in social networks. Recall that a biased guest will be too optimistic about the kindness of the host if the host and the guest have similar tastes, and will be too pessimistic about the host's kindness if their tastes differ. This implies that if friendships are formed partly on the perception of social intentions, then members of a group might be too similar in taste. More importantly, such cliques will misperceive each other as hostile because they mistakenly attribute taste differences to hostile intentions. As a corollary of the underestimation result of Proposition 2, it might also be true that a social network will have too few links.

The underestimation result can also be extended to the settings of Section 6. A biased advisee might underestimate how attentive his advisor is to him, because he exaggerates the precision of the advice an attentive advisor could give if she wanted. Here attentiveness is defined as the probability that the advisor bases her recommendation on information rather than on noise. A biased advisor might underestimate how perceptive her advisee is, because she does not recognize how ambiguous her messages are. Here perceptiveness is defined as the probability that the advisee listens to the advisor's message. Such inferences can result in the breakdown of communication between parties who have a lot to share with each other but suffer from projection bias. Another direction in which to extend the ideas presented in this paper is to consider the related phenomenon of ignorance projection. Ignorance projection happens when someone who does not observe a signal underestimates the probability with which this signal is available to others. Though evidence on ignorance projection is not as strong as the evidence on information projection, it might still be a phenomenon worth studying, both empirically and theoretically. Finally, one could study information and ignorance projection in the intrapersonal domain, where people project their current information and their current ignorance onto their future selves, leading to distortions in prospective memory.

8 Appendix

Proof of Proposition 1. Note first that since z > h , the host follows sk if she observes sk. Without loss

of generality assume that sk = meat. The biased conditional likelihoods are given by

π1^ρ(θkind | y = meat) = (ρ + (1 − ρ)h)π0(θkind) / [(ρ + (1 − ρ)h)π0(θkind) + π0(θmean,meat)]   (18)

and

π1^ρ(θkind | y = fish) = (1 − ρ)(1 − h)π0(θkind) / [(1 − ρ)(1 − h)π0(θkind) + π0(θmean,fish)]   (19)

Since h ≥ 0.5, π1^ρ(θkind | y = sk) is increasing in ρ and π1^ρ(θkind | y ≠ sk) is decreasing in ρ.

Proof of Proposition 2. The guest's perception of the ex-ante likelihood of the event that y = sk is increasing in ρ. By virtue of the properties of Bayes' rule, the following relation holds for all ρ:

π0(θkind) = π1^ρ(θkind | y = sk) Pr^ρ(y = sk | π0) + π1^ρ(θkind | y ≠ sk) Pr^ρ(y ≠ sk | π0).

The expected posterior E[π1^ρ(θkind)], at the same time, is given by:

E[π1^ρ(θkind)] = [π^ρ(y = sk | θkind)π0(θkind)/Pr^ρ(y = sk)]·Pr(y = sk) + [π^ρ(y ≠ sk | θkind)π0(θkind)/Pr^ρ(y ≠ sk)]·Pr(y ≠ sk).   (20)

Since Pr^ρ(y = sk | π0) is increasing and Pr^ρ(y ≠ sk | π0) is decreasing in ρ, then given Proposition 1, it follows that E[π1^ρ(θkind)] is decreasing in ρ.

Proof of Claim 1. Note that

π1^ρ(θ | S) = (ρ + h(1 − ρ))·½(1 + θ)π0(θ) / ∫0^1 (ρ + h(1 − ρ))·½(1 + θ)π0(θ)dθ = (1 + θ)π0(θ) / ∫0^1 (1 + θ)π0(θ)dθ = ½h(1 + θ)π0(θ) / ∫0^1 ½h(1 + θ)π0(θ)dθ   (21)

hence it follows that π^ρ(θ | S) = π^0(θ | S) for all ρ and π0. The result on π^ρ(θ | F) follows from the Proof of Proposition 3 below.

Proof of Proposition 3. The expected posterior is the probability weighted average of the posterior after

a success and the posterior after a failure: E[π1^ρ | π0] = Pr^0(S)π1^ρ(S) + (1 − Pr^0(S))π1^ρ(F). For a given type θ this is equal to

E[π1^ρ(θ) | π0(θ)] = Pr^0(S)·[Pr^ρ(S | θ)π0(θ)/Pr^ρ(S)] + (1 − Pr^0(S))·[Pr^ρ(F | θ)π0(θ)/(1 − Pr^ρ(S))].   (22)

Note that E[π1^0 | π0] = π0.

Let's introduce two variables: λS^ρ = Pr(S)/Pr^ρ(S) and λF^ρ = (1 − Pr(S))/(1 − Pr^ρ(S)), where the probabilities are taken with respect to the expectations in π0. Note that λS^ρ < 1 and λF^ρ > 1, and that λS^ρ is decreasing and λF^ρ is increasing in ρ, given the assumption that Pr^ρ(S | θ) = ρ Pr(S | s1, θ) + (1 − ρ) Pr(S | θ). Since Pr^ρ(S | θ) is increasing in θ for all ρ, it follows that the expected weight on higher types is decreasing in ρ. Formally,

λS^ρ Pr^ρ(S | θ)π0(θ) + λF^ρ Pr^ρ(F | θ)π0(θ) = λS^ρ π0(θ) + (λF^ρ − λS^ρ) Pr^ρ(F | θ)π0(θ),

where the equality follows from the fact that Pr^ρ(S | θ) + Pr^ρ(F | θ) = 1 for all ρ. Hence, lower types are overweighted relative to higher types. Since Pr^ρ(F | θ) is decreasing in θ for all ρ, it follows that for any θ* < 1

∫0^θ* E[π1^ρ(θ) | π0]dθ > ∫0^θ* E[π1^0(θ) | π0]dθ.   (23)

Furthermore, since λF^ρ − λS^ρ is increasing in ρ, it follows that

∫0^θ* E[π1^ρ(θ) | π0]dθ > ∫0^θ* E[π1^ρ′(θ) | π0]dθ   (24)

whenever ρ > ρ′.

Proof of Corollary 1. If for a given s0, Pr(S | θ, s1) > Pr(S | θ, s1′) for all θ, then for all ρ,

λS^ρ{s1} = Pr(S)/Pr^ρ(S, s1) < Pr(S)/Pr^ρ(S, s1′) = λS^ρ{s1′}.

Since for both s1 and s1′, Pr^ρ(S | θ) is increasing in θ, the corollary follows from the above proof of Proposition 3.

Proof of Proposition 4. To show that π1^ρ(S) FOSD π1(S), we have to show that for all θ* < 1,

∫0^θ* π1(θ | S)dθ ≥ ∫0^θ* π1^ρ(θ | S)dθ.

One can re-write this inequality in the following way:

Pr^ρ(S)/Pr(S) ≥ [∫0^θ* Pr^ρ(S | θ)π0(θ)dθ] / [∫0^θ* Pr(S | θ)π0(θ)dθ].   (25)

If λ^ρ(θ) = Pr(S | θ)/Pr^ρ(S | θ) is decreasing in θ for all θ, with λ^ρ(0) ≥ λ^ρ(θ) ≥ λ^ρ(1), then this inequality holds since ∫0^1 π1(θ | S)dθ = ∫0^1 π1^ρ(θ | S)dθ = 1. If λ^ρ(θ) is increasing in θ, then the reverse inequality holds, and then π1(S) FOSD π1^ρ(S).

Proof of Proposition 5. Note first that in the Bayesian case the RHS of Eq. (9) is zero and does not depend on s1. In the biased case, w1^ρ = E[θ | π1^ρ] depends on s1 and is decreasing in ρ. The decision to produce s0 more often or less often than in the Bayesian case depends on whether the following expression is positive or negative:

E[w1^ρ] − E[w1^ρ | s0′]   (26)

It follows from the proof of Proposition 3 that if λ′S^ρ > λS^ρ, then E[w1^ρ] > E[w1^ρ | s0′] for all ρ > 0. This is true because underestimation is decreasing in Pr(S)/Pr^ρ(S) = λS^ρ. Similarly, if λ′S^ρ < λS^ρ, then E[w1^ρ] < E[w1^ρ | s0′] for all ρ > 0. It follows that if λ′S^ρ > λS^ρ, then a(m, ρ) is decreasing in m, and if λ′S^ρ < λS^ρ, then a(m, ρ) is increasing in m.

L(wS, a, µ) = (p(a)q + bk + (1 − b)z)(1 − wS) + µ(p′(a)qwS − 1)

The FOC with respect to a is given by p′q(1 − wS) + µp″qwS = 0, and with respect to wS it is −(p(a)q + bk + (1 − b)z) + µp′(a)q = 0. Solving for µ and substituting wS = 1/(p′(a)q), the equilibrium effort level is given by

p′q = 1 − p″(p + (bk + (1 − b)z)/q)/(p′)² = 1 − p″(p + b/(h − b) + z/q)/(p′)²   (27)

Let the solution of this equation be denoted by an*(q). Note that the second-order conditions are satisfied as long as p‴(a) ≤ 0. An increase in k or h increases q and hence increases the LHS of Eq. (27). An increase in k or h decreases the RHS of Eq. (27). Since p is increasing and concave and p‴ ≤ 0, it follows that this leads to a higher equilibrium effort level. To see the effects of an increase in k and h on the principal's welfare, note that for a given wS, (p(a)q + bk + (1 − b)z)(1 − wS) is increasing in a since wS < 1. Furthermore, the optimal wS after an increase cannot be larger than the original wS because h − b < 1 < p′ and k − z < 1 < p′.

Proof of Lemma 2. Let’s first derive the optimal contract given monitoring as given by Eq. (13). The principal’s maximization problem yields the following Lagrangian:

L(wS, a, µ) = (p(a)q + bk + (1 − b)z) − (p(a)(1 − b) + b)wS + µ(p′(a)(1 − b)wS − 1)

The first-order condition with respect to a is given by p′q − p′(1 − b)wS + µp″(1 − b)wS = 0 and the first-order condition with respect to wS is given by −(p(1 − b) + b) + µp′(1 − b) = 0. Solving for µ and substituting wS = 1/(p′(1 − b)), we get that the equilibrium effort level am* is determined by

p′q = 1 − p″(p + b/(1 − b))/(p′)²   (28)

To see the inequality in the lemma, note first that (bk + (1 − b)z)/((h − b)(k − z)) > b/(1 − b) ⟺ b(k − z) + z(1 − b) > bh(k − z), which is always true if h < 1. Compare now Eq. (28) with Eq. (27). Note that the LHS's of these two equations are the same and the RHS of Eq. (28) is smaller than the RHS of Eq. (27). Given the assumption that p‴ ≤ 0, it follows that a is greater under monitoring.

To show the increase in the principal's welfare, note that

EVn = p(an*)q + bk + (1 − b)z − (p(an*) + b/(h − b) + z/q)/p′(an*), and

EVm = p(am*)q + bk + (1 − b)z − (p(am*) + b/(1 − b))/p′(am*).

Since (1 − 1/p′(an*)) and (1 − 1/p′(am*)) are both positive because p′(an*), p′(am*) > 1, and because b/(h − b) + z/q > b/(1 − b) if h < 1, it follows that EVm > EVn.

Proof of Proposition 6. Let's fix a wage w. It follows that the agent's effort choice am^1(q, w) is given by the solution of the maximization problem:

am^1(q, w) = arg max_a p(a)h(1 − b)wS + bwS − a   (29)

The FOC is given by p′h(1 − b)wS = 1. In contrast, am(q, w) is defined by the FOC p′(1 − b)wS = 1. Hence, for any given wS, am^1(q, w) < am(q, w) as long as h < 1. Also, am^1(q, w) is increasing in h.

Proof of Proposition 7. To prove this proposition, consider the principal’s problem when she knows

that the agent's action is given by am^1(q, w). Here the principal's Lagrangian is given by

L(wS, a, µ) = p(a)q + (bk + (1 − b)z) − p(a)(h − b)wS − bwS + µ(p′(a)(h − b)wS − 1)

The first-order condition with respect to a is given by p′q − p′(h − b)wS + µp″(h − b)wS = 0 and the first-order condition with respect to wS is given by −p(h − b) − b + µp′(h − b) = 0. Solving for µ and substituting wS = 1/(p′(h − b)), we get that am^1*(q) is given by:

p′q = 1 − p″(p + b/(h − b))/(p′)²   (30)

Comparing Eq. (30) with Eq. (28), it follows that am^1* < am* as long as h < 1, because the RHS of (30) is always greater than the RHS of Eq. (28). Furthermore, since the RHS of (30) is decreasing in h while the LHS is increasing in h, am^1* is increasing in h.

Proof of Proposition 8. Since the noise terms εe and εr are independent, it follows that the joint distribution of ω, se and sr is given by a multivariate normal distribution with mean vector (0, 0, 0) and the corresponding covariance matrix C. Given the assumptions on C, it follows that E[ω | sr] = σ0²/(σ0² + σr²)·sr and E[ω | se, sr] is given by

E[ω | se, sr] = (σ0², σ0²) [σ0² + σe², σ0²; σ0², σ0² + σr²]⁻¹ (se, sr)′   (31)

Straightforward calculation shows that

E[ω | se, yr] = yr·σ̃e²/(σ̃e² + σ̃r²) + s̃e·σ̃r²/(σ̃r² + σ̃e²)

where s̃e = σ0²/(σ0² + σe²)·se, σ̃e² = σ0²σe²/(σ0² + σe²) and σ̃r² = σ0²σr²/(σ0² + σr²).

Consider now the biased case where ρ = 1. Here the advisee believes that yr = E[ω | se, sr] and hence takes action ye^1 = yr. For ρ < 1 the advisee believes that with probability ρ it is the case that yr = E[ω | se, sr] and with probability 1 − ρ it is the case that yr = E[ω | sr]. Hence it is always true that ye^ρ ∈ [min{ye^0, yr}, max{ye^0, yr}]. Furthermore, as the probability ρ increases, |ye^ρ − yr| decreases.

Proof of Corollary 2. Note first that −E(ye^ρ − ω)² is decreasing in ρ by virtue of Proposition 8, since the estimate of ω has the lowest variance given se and sr if ye = E[ω | se, yr]. Also, for a fixed ρ, E|yr − ye^ρ| is decreasing in σr and increasing in σe. Hence, if we fix σe < M < ∞, there always exists a sufficiently large σr such that −E(s̃e − ω)² > −E(ye^ρ − ω)². Similarly, for a fixed σr > 0 there always exists σe sufficiently small that −E(s̃e − ω)² > −E(ye^ρ − ω)². It follows that k(ρ, σe, σr) is decreasing in σe, increasing in σr, and increasing in ρ.

Proof of Proposition 9. Simple calculations show that s2 dominates s1 iff 2h − 1 > ρ. Here the advisor sends s2 iff c < (1 − ρ²)(1 − ρ)(h − 0.5) = k2(ρ, h). It follows that k2(ρ, h) is increasing in h and decreasing in ρ. If 2h − 1 < ρ, the advisor sends s1 iff c < ρ³(h − 0.5) + ρ(0.5 − ρh) = k1(ρ, h).

References

[1] Alchian, Armen, and Harold Demsetz. 1972. "Production, Information Costs and Economic Organization." American Economic Review, 62(5): 777-95.

[2] Anderson, John, Marianne Jennings, Jordan Lowe, and Philip Reckers. 1997. "The Mitigation of Hindsight Bias in Judges’ Evaluation of Auditor Decisions." Auditing: A Journal of Practice and Theory, 16(2): 20–39.

[3] Banerjee, Abhijit. 1992. "A Simple Model of Herd Behavior." Quarterly Journal of Economics, 107(3): 797-817.

[4] Berlin, Leonard. 2002. "Malpractice Issues in Radiology. Hindsight Bias." American Journal of Roentgenology, 175(3): 597-601.

[5] Biais, Bruno, and Martin Weber. 2007. "Hindsight Bias and Investment Performance." Mimeo, IDEI Toulouse.

[6] Bukszar, Ed, and Terry Connolly. 1988. "Hindsight Bias and Strategic Choice: Some Problems in Learning From Experience." Academy of Management Journal, 31(3): 628-641.

[7] Camerer, Colin, George Loewenstein, and Martin Weber. 1989. "The Curse of Knowledge in Economic Settings: An Experimental Analysis." Journal of Political Economy, 97(5): 1234-1254.

[8] Camerer, Colin, and Ulrike Malmendier. 2007. "Behavioral Economics of Organizations." In: P. Diamond and H. Vartiainen (eds.), Behavioral Economics and Its Applications. Princeton: Princeton University Press.

[9] Caplan, Robert, Karen Posner, and Frederick Cheney. 1991. "Effect of Outcome on Physicians' Judgments of Appropriateness of Care." Journal of the American Medical Association, 265(15): 1957-1960.

[10] Conlin, Mike, Ted O'Donoghue, and Timothy Vogelsang. 2007. "Projection Bias in Catalog Orders." American Economic Review, 97(4): 1217-1249.

[11] DeMarzo, Peter, Dimitri Vayanos, and Jeffrey Zwiebel. 2003. "Persuasion Bias, Social Influence, and Uni-dimensional Opinions." Quarterly Journal of Economics, 118(3): 909-968.

[12] Dewatripont, Mathias, and Patrick Bolton. 2005. Contract Theory. Cambridge: The MIT Press.

[13] Fischhoff, Baruch. 1975. "Hindsight ≠ Foresight: The Effect of Outcome Knowledge on Judgment Under Uncertainty." Journal of Experimental Psychology: Human Perception and Performance, 1(3): 288-299.

[14] Gilovich, Thomas, Kenneth Savitsky, and Victoria Medvec. 1998. "The Illusion of Transparency: Biased Assessment of Other’s Ability to Read Our Emotional States." Journal of Personality and Social Psychology, 75(2): 743-753.

[15] Kessler, Daniel, and Mark McClellan. 2002. "How Liability Law Affects Medical Productivity." Journal of Health Economics, 21(6): 931-955.

[16] Kruger, Justin, Nicholas Epley, Jason Parker, and Zhi-Wen Ng. 2005. "Egocentrism over E-mail: Can People Communicate as Well as They Think?" Journal of Personality and Social Psychology, 89(6): 925-936.

[17] Harley, Erin, Keri Carlsen, and Geoffrey Loftus. 2004. "The “Saw-It-All-Along” Effect: Demonstrations of Visual Hindsight Bias." Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(5): 960-968.

[18] Harley, Erin. 2007. "Hindsight Bias in Legal Decision Making." Social Cognition, 25(1): 48-63.

[19] Hastie, Reid, David Schkade, and John Payne. 1999. "Juror Judgments in Civil Cases: Hindsight Effects on Judgments of Liability for Punitive Damages." Law and Human Behavior, 23(5): 597-614.

[20] Heath, Chip, and Dan Heath. 2007. Made to Stick: Why Some Ideas Survive and Others Die. Random House.

[21] Hermalin, Benjamin. 1993. "Managerial Preferences Concerning Risky Projects." Journal of Law, Economics, & Organization, 9(1): 127-35.

[22] Holmström, Bengt. 1979. "Moral Hazard and Observability." Bell Journal of Economics, 10(1): 74-91.

[23] Holmström, Bengt. 1999. "Managerial Incentive Problems - A Dynamic Perspective." Review of Economic Studies, 66(1): 169-182.

[24] Innes, Robert. 1990. "Limited Liability and Incentive Contracting with Ex-ante Action Choices."

Journal of Economic Theory, 52(1): 45-67.

[25] Jackson, Rene, and Alberto Righi. 2006. Death of Mammography: How Our Best Defense Against Cancer is Being Driven to Extinction. Caveat Press.

[26] Lazear, Edward. 2000. "Performance Pay and Productivity." American Economic Review, 90(5): 1346-61.

[27] Loewenstein, George, Ted O’Donoghue, and Matthew Rabin. 2003. "Projection Bias in Predicting

Future Utility." Quarterly Journal of Economics, 118(4): 1209-1248.

[28] Loewenstein, George, Don Moore, and Roberto Weber. 2006. "Misperceiving the Value of Information in Predicting the Performance of Others." , 9(3): 281-295.

[29] Mullainathan, Sendhil. 2002. "A Memory-Based Model of Bounded Rationality." Quarterly Journal of Economics, 117(3): 735-774.

[30] Newton, Elizabeth. 1990. "Overconfidence in the Communication of Intent: Heard and Unheard Melodies." Unpublished Doctoral Dissertation, Stanford University, Stanford, CA.

[31] Rabin, Matthew. 2002. "Inference by Believers in the Law of Small Numbers." Quarterly Journal of Economics, 117(3): 775-816.

[32] Rachlinski, Jeffrey. 1998. "A Positive Psychological Theory of Judging in Hindsight." The University of Chicago Law Review, 65(2): 571-625.

[33] Shapiro, Carl, and Joseph Stiglitz. 1984. "Equilibrium Unemployment as a Worker Discipline Device." American Economic Review, 74(3): 433-444.

[34] Shavell, Steven. 1980. "Strict Liability Versus Negligence." Journal of Legal Studies, 9(1): 1-25.

[35] Spiegler, Roni. 2006. "The Market for Quacks." Review of Economic Studies, 73(4): 1113-1131.

[36] Studdert, David, Michelle Mello, William Sage, Catherine DesRoches, Jordon Peugh, Kinga Zapert, and Troyen Brennan. 2005. "Defensive Medicine Among High-Risk Specialist Physicians in a Volatile Malpractice Environment." Journal of the American Medical Association, 293(2): 2609-2617.

[37] Van Boven, Leaf, Thomas Gilovich, and Victoria Medvec. 2003. "The Illusion of Transparency in Negotiations." Negotiation Journal, 19(2): 117-131.
