Altruistic punishment: A closer look 1

In press: Proceedings of the Royal Society B: Biological Sciences, 280(1758).

Do humans really punish altruistically? A closer look

Eric J. Pedersen (a), Robert Kurzban (b, c), and Michael E. McCullough (a)

(a) Department of Psychology, University of Miami, FL 33124-0751 USA
(b) Department of Psychology, University of Pennsylvania, PA 19104-6241 USA
(c) Department of Economics, University of Alaska Anchorage, AK 99508 USA

Key Words: cooperation, altruistic punishment, third-party punishment, affective forecasting

Corresponding Author:
Michael E. McCullough
Department of Psychology
University of Miami
P.O. Box 248185
Coral Gables, FL 33124-0751
Phone: 305-284-8057
Fax: 305-284-3402
Email: [email protected]

Abstract

Some researchers have proposed that natural selection has given rise in humans to one or more adaptations for altruistically punishing on behalf of other individuals who have been treated unfairly, even when the punisher has no chance of benefiting via reciprocity or benefits to kin. However, empirical support for the altruistic punishment hypothesis depends on results from experiments that are vulnerable to potentially important experimental artefacts. Here we searched for evidence of altruistic punishment in an experiment that precluded these artefacts. In so doing, we found that victims of unfairness punished transgressors whereas witnesses of unfairness did not. Furthermore, witnesses’ emotional reactions to unfairness were characterized by envy of the unfair individual’s selfish gains rather than by moralistic anger toward the unfair behaviour. In a second experiment run independently in two separate samples, we found that previous evidence for altruistic punishment plausibly resulted from affective forecasting error, that is, limitations on humans’ abilities to accurately simulate how they would feel in hypothetical situations. Together, these findings suggest that the case for altruistic punishment in humans (a view that has gained increasing attention in the biological and social sciences) has been overstated.

Do humans really punish altruistically? A closer look

In many animal species, including humans, individuals punish conspecifics that have harmed them [1–3]. Some researchers have recently argued that humans, unlike other animals, also altruistically punish individuals who have harmed others, even when the punisher has no chance of benefiting via reciprocity or benefits to kin [4–6]. Results from several economics experiments appear to support this claim [4,6,7], but some scholars have questioned both the adaptationist logic behind such theoretical claims [8–10] and the interpretation of the empirical results [8,10–14]. Here we set aside these theoretical debates and instead investigate a more basic empirical question: Do people actually spontaneously punish individuals who have only harmed other individuals in anonymous settings in the laboratory? Put differently, do the empirical research findings often marshalled in support of the altruistic punishment hypothesis [e.g., 4,6] provide a reliable guide to the presence or absence of a propensity for altruistic punishment in humans?

In previous work, researchers claimed empirical support for the existence of altruistic punishment on the basis of results from public goods game experiments in which the individual being punished had harmed (or failed to help) the putative punisher as well as other victims [4], leaving open the possibility that the punishment was vengeful, rather than altruistic [10]. Results from similar experiments that exclude revenge as a possible motive suggest that investments in punishment in such contexts are conspicuously low [15]. Additional data frequently adduced in support of the altruistic punishment hypothesis come from third-party punishment games [6,7], in which a Dictator chooses to give some portion of a sum of money (or none) to a passive Recipient. A third player can punish the Dictator (at a cost) in response to the Dictator’s transfer to the Recipient. Many third parties in games with this structure pay a cost to punish stingy Dictators, despite receiving no financial benefit from doing so [6,7].

However, five methodological limitations of the standard third-party punishment game might conspire to yield inflated estimates of humans’ propensity to punish strangers for having behaved unfairly toward other strangers. First, in the standard game, subjects are assigned to a third-party role that implies their task is to determine how much to punish the Dictator: indeed, the only choices third parties can make are whether to punish the Dictator [16] and, if so, how much. Thus, any response error can only increase the estimated quantity of punishment. Second, punishment in the third-party punishment game is typically administered in the presence (or inferred presence) of an audience: punishment of the Dictator by the third party is witnessed by the initial victim because all players see the results of the game. The presence of an audience introduces reputational considerations that could motivate punishment as a means of pursuing indirect fitness benefits (e.g., by signaling one’s quality as a cooperative partner [17, 18]; or one’s formidability to prevent future exploitation of oneself [19, 20] or one’s friends and kin [21]). Indeed, it has been shown, though with a different paradigm, that observers of unfair treatment punish third parties significantly less when they are assured no one will see their decision [22; however, see also ref. 23].

Third, the third-party punishment game is typically conducted with the “strategy method” [24], which requires third parties to repeatedly respond to a series of hypothetical Dictator choices (in advance of learning of the Dictator’s actual choice) that are progressively more (or progressively less) unfair [6]. Such methods can cause subjects to infer that the experimenters expect them to vary their responses according to some feature that varies across the set of repeated scenarios [25]. Consequently, due to a well-known experimental artefact called demand, subjects might feel compelled to punish at least some of the time, calibrating those decisions to the only feature of the Dictators’ repeated choices that varies: how unfair they are. This is especially problematic in the standard third-party punishment game because rewarding is not allowed; the only way subjects can vary their responses is to vary their amount of punishment. In a notable exception, Almenberg et al. [26] did add a rewarding option to the typical third-party punishment game (conducted with the strategy method), and a small amount of third-party punishment was observed, on average, when Dictators transferred $0 (of $10) to the Recipient. We note, however, that subjects in this experiment were informed, before making their decisions, that it was possible they would not be paired with another subject; in such a case, their decisions would not be enacted and they would retain all of their money (i.e., participants’ decisions were somewhat hypothetical; see below).

Fourth, the strategy method also involves affective forecasting [27] inasmuch as it requires subjects to respond ex ante to Dictator actions that have not yet occurred. Such behavioural commitments can differ from the actual behaviours people enact after experiencing social situations directly because people frequently weight the features of social situations differently during conscious deliberation than they do after experiencing those social situations in real time [28]. For example, as forecasters, people severely overestimate how upset they would feel by (and subsequently, how much they would attempt to avoid interacting with) someone who had made a racist comment; in contrast, subjects who have actually observed another individual express strongly racist attitudes (versus those in control conditions) respond with relative indifference to the racist individual [29].

Fifth, previous claims that anger is the predominant emotional response of third-party punishers have relied on self-reports of anger in response to hypothetical scenarios [4,6]. Self-reports of anger are typically highly correlated with self-reports of other, similar emotions, including envy [30]. To the extent that the covariation between self-reported anger and self-reported envy is not statistically controlled, estimates of third parties’ anger toward unfair strangers might actually reflect envy, which can also motivate costly punishment in pursuit of goals that are quite distinct from putatively altruistic goals such as enforcing norms or delivering deterrence benefits to strangers [31]. Specifically, if third parties’ punishment of individuals who have treated another individual unfairly is motivated by envy, but not by anger, then the mechanisms that motivate third-party punishment might process cues that another individual has obtained better outcomes than the self, rather than cues that an individual has violated a norm or harmed an anonymous third party in whom the punisher has no fitness interest [e.g., 6,7].

Here we present two experiments designed to test whether subjects punish altruistically on behalf of strangers in a third-party punishment game that was designed to rectify the methodological problems noted above. We also examined whether previous findings could plausibly be explained as a product of affective forecasting errors. We note that our goal was not to estimate the unique influence of each of these five potential methodological problems; rather, our goal was to test whether the altruistic punishment hypothesis could survive falsification in an experiment that eliminated these problems. Experiment 1 was a modified third-party punishment game in which subjects could either punish or reward, thereby reducing experimental demand for punishment [25], the confounding of error and punishment, and potential audience effects. Also, subjects made decisions about giving money to or deducting money from Dictators after witnessing the Dictator’s decision, which enabled us to measure third-party punishment without the possibility of affective forecasting errors [27]. Additionally, our measures of emotion were fine-grained enough that it was possible to evaluate the unique motivational roles of anger and envy. In Experiments 2a and 2b, the same third-party punishment game was presented as a hypothetical vignette to subjects from two different research pools.

Methods

Subjects

Experiment 1: Subjects were 315 University of Miami undergraduates (mean age = 19.12, SD = 2.99; 57% female). They received partial course credit and monetary compensation (see below).

Experiment 2a: 538 subjects (mean age = 34.37, SD = 12.14; 60% female) were recruited via Amazon Mechanical Turk (http://www.mturk.com/mturk/welcome) and were paid $0.25 for their participation. Participation was restricted to users in the United States. Because participants merely had to read a vignette and then report their forecasts of how they would think, feel, and behave if the hypothetical situation had actually happened to them, participation generally took about 4 minutes.

Experiment 2b: We replicated Experiment 2a with University of Miami undergraduates; 394 subjects (mean age = 18.74, SD = 1.27; 53% female) participated for partial course credit.

Procedure

Experiment 1: Subjects were run in individual sessions at a computer station in an isolated room (see ESM S2.1). The entire experiment, including instructions, was conducted on a computer via E-Run with a script created in E-Prime version 2.0. After subjects provided informed consent, they were told they would be interacting with two other players in the building over the computer network and that it was important that those other people remain anonymous; in fact, they interacted with a pre-programmed computer script. Without this deception, the research would have been unfeasible (see ESM S2.1). Subjects were informed that they would be participating in an economic decision-making game that would last for multiple rounds and they would be paid based on the money they earned during the game. Because deception was involved, everyone was paid a flat rate of $9 at the end of the experiment following a debriefing. We used a “funnel debriefing” method designed to detect suspicion and explore subjects’ reactions to having been deceived [32]. Subjects flagged for suspicion were excluded from all analyses presented; re-including them in analyses did not qualitatively affect the results in any way (see ESM S1.2).

The decision-making game comprised two rounds (Fig. 1) in which each player was given $5 to use in each round and assigned to one of three roles: Decision-Maker, Receiver, or Observer. (We refer to these roles here as Dictator, Recipient, and Third Party, respectively, to be consistent with labels used in previous work on third-party punishment.) Subjects were not told the exact number of rounds, to avoid end-of-game effects [33], and were told that money earned during each round would be “banked” and thus unaffected by subjects’ behaviour during subsequent rounds. The Dictator ostensibly had the option to give any portion of his or her $5 to the Recipient, or take any portion of the Recipient’s $5; the Third Party would merely see the results of the round and would not be affected by the Dictator’s choice. Subjects were informed that in some rounds all players would be involved, and in other rounds some players might be excluded. Subjects were randomly assigned to be either the Third Party or the Recipient in the first round, and the (computer-programmed) Dictator took either $4 or $0 from the Recipient. The computer displayed a summary screen for the round showing the amount of money each player earned for the round. Following the round, subjects completed a lexical decision task (see ESM S1.3) and a series of self-report questions (see below).

Prior to role assignment for the second round, subjects were informed that there would be no Third Party in Round 2 [to avoid potential audience effects; 22,34]; one player would be assigned to a different task and be unable to see the results of the interaction. We note that the presence of the experimenter can also induce audience effects [e.g., 22]; we took great care to minimize this potential influence by (a) clearly informing participants during the consent process that their data would be stored completely anonymously and could not be connected to them in any way, and (b) minimizing contact with the experimenter by presenting all instructions electronically. Though we cannot rule out experimenter audience effects completely, our results are as insulated from them as we believe was possible in the context of this experiment.

All players were given another $5; because previous earnings had been “banked,” all players started Round 2 with $5. The subject was assigned the role of Dictator while the Dictator from Round 1 (who had treated either the subject or the other player fairly or unfairly) was assigned the role of the Recipient (ostensibly by chance). Players were identified consistently throughout, so subjects were aware that the Recipient in Round 2 was the same player that had been the Dictator in Round 1. Subjects were instructed that they could give any amount of their $5 to the Recipient, do nothing, or remove any amount of the Recipient’s $5 (the word “punishment” was never used). Removing money cost the subject one-fourth of the amount removed and, unlike in the first round, the removed money was not gained by the subject as income; it simply disappeared. Note that the cost of punishment used here, 1:4, was less expensive than the 1:3 cost typically used in the third-party punishment game; previous research has shown that punishment becomes more likely as the cost of punishment declines (see ref. [10] for review). Following the completion of the round, the experiment ended and the experimenter debriefed the subject through an extensive, staged process to assess the believability of the experiment and to explain why deception was necessary [32].
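The Round 2 payoff rules can be made concrete with a short sketch. The Python function below is illustrative only and is not the E-Prime script used in the experiment: the function and constant names are hypothetical, and it assumes that a gift moves dollar for dollar from the subject to the Recipient, while a removal costs the subject one-fourth of the amount removed, as described above.

```python
from fractions import Fraction

ENDOWMENT = Fraction(5)              # each player starts Round 2 with $5
REMOVAL_COST_RATIO = Fraction(1, 4)  # removing $x costs the subject $x/4


def round2_payoffs(transfer):
    """Return (subject_payoff, recipient_payoff) in dollars for Round 2.

    transfer > 0: the subject gives that many dollars to the Recipient
                  (assumed to transfer dollar for dollar).
    transfer < 0: the subject removes |transfer| dollars from the Recipient;
                  the removed money disappears instead of going to the subject.
    transfer == 0: the subject does nothing.
    """
    transfer = Fraction(transfer)
    if not -ENDOWMENT <= transfer <= ENDOWMENT:
        raise ValueError("transfer must be between -5 and 5 dollars")
    if transfer >= 0:
        return ENDOWMENT - transfer, ENDOWMENT + transfer
    removed = -transfer
    return ENDOWMENT - removed * REMOVAL_COST_RATIO, ENDOWMENT - removed
```

Under these assumptions, removing the full $4 that an unfair Dictator took in Round 1 costs the subject $1: `round2_payoffs(-4)` leaves the subject with $4 and the Recipient with $1.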

Experiments 2a and 2b: After providing consent, subjects were instructed to imagine themselves “…in a particular situation in our laboratory. Please try to picture yourself in the situation we are describing. We will ask you to complete a series of questions regarding how you think you would think, feel, and act in this situation.” The layout and instructions of the game were presented as they were in Experiment 1, and the rounds of the game and the self-report measures were the same. However, subjects did not complete a lexical decision task following the first round and were not debriefed following the completion of the experiment (because no deception was involved).

Psychometric information regarding the self-report measures

Self-rated emotions toward the other players: Subjects were asked to describe their emotional responses toward both of the other players after the first round. They described their feelings toward both players to avoid demand effects that might have occurred by probing only about the Dictator. (Emotional reactions to the other player were not of theoretical interest here and so, in the interest of brevity, we do not report them.)

- Anger: 3-item composite of ratings, on a scale from 0 (not at all) to 5 (extremely), of the extent to which the subject was “angry,” “mad,” and “outraged” at the Dictator (Cronbach’s α = .94).

- Envy: 2-item composite of ratings, on a scale from 0 (not at all) to 5 (extremely), of the extent to which the subject was “envious” and “jealous” of the Dictator (Cronbach’s α = .84).

Fairness/moral wrongness of the Round 1 Dictator’s behaviour: Subjects were asked to rate both how “fair” and how “morally wrong” the Dictator’s behaviour was toward the Recipient in Round 1 on a scale from 1 (not at all) to 9 (totally).

Results

Experiment 1

Third parties did not punish on behalf of strangers: A one-sample Wilcoxon test (used because distributions were non-normal) revealed that the sample median of the distribution of dollars punished or rewarded in Round 2 (in terms of the effect on the Recipient, not the cost to the subject) by third-party witnesses of unfairness did not differ significantly from a hypothesized median of zero, Z = -1.48, p = .140, N = 65 (all p-values throughout manuscript are two-tailed). In contrast, victims of unfairness punished a nonzero amount, Z = -3.52, p < .001, N = 61, significantly more than mere witnesses of unfairness, p = .026, N = 126 (two-sample median test; Fig. 2).

If the function of punishment is to deter harmdoers from imposing costs on oneself (i.e., to bargain for better treatment for oneself [10,13,20]) or others in the future, or even if its function is to enforce adherence to social norms [6], then the punishment must be strong enough to erase unfairly gained benefits [1,35]. Otherwise, the harmdoer retains a net profit from the transgression, and thus will retain an incentive to continue to behave unfairly toward others in the future. Because unfair Dictators took $4 from Recipients in Round 1 of Experiment 1, $4 was also the minimum amount of punishment that would be expected to deter unfair Dictators from behaving unfairly in the future. Third-party punishment of this magnitude was extremely rare: only 2 of 65 (3%) witnesses imposed at least $4 worth of punishment on unfair Dictators, a proportion not significantly different from the proportion for witnesses of fairness (0 of 80; p = .199, Fisher’s Exact Test). In contrast, 13 of 61 (21%) victims of unfairness punished at least $4, a proportion significantly greater than that for both recipients of fairness (0 of 64; p < .001) and witnesses of unfairness (p = .002). Indeed, most victims of unfairness who punished (13 of 21) imposed at least $4 worth of punishment.
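The proportion comparisons above can be checked directly from the reported counts. The following is a minimal sketch of a two-sided Fisher's exact test in plain Python; it assumes the common two-sided definition (summing the probabilities of all tables with the same margins that are no more likely than the observed one), which may not be exactly what the authors' statistics package computed. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 "punishers" fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


# Witnesses of unfairness (2 of 65) vs. witnesses of fairness (0 of 80).
p_witnesses = fisher_exact_two_sided(2, 63, 0, 80)

# Victims of unfairness (13 of 61) vs. witnesses of unfairness (2 of 65).
p_victims_vs_witnesses = fisher_exact_two_sided(13, 48, 2, 63)
```

With the counts for witnesses of unfairness versus witnesses of fairness, this sketch gives a p-value of about .199, in line with the value reported in the text.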

According to the self-report measures of emotion, third parties did not become angry at unfairness: when controlling for envy (which was highly correlated with anger, r = .637, p < .001; see ESM S1.5), a 2 (Target: Self, Other) × 2 (Treatment: Fair, Unfair) ANCOVA revealed a significant Target × Treatment interaction for anger, F(1, 265) = 17.11, p < .001. Witnesses of unfairness (M = .533, SE = .095, N = 65) did not report more anger than did witnesses of fairness (M = .414, SE = .084, N = 80), p = .363, partial η² = .00, but victims of unfairness (M = 1.28, SE = .098, N = 61) did report more anger than their fairly-treated counterparts (M = .418, SE = .093, N = 64), p < .001, partial η² = .13 (Fig. 3; see ESM S1.3 and Fig. S1 for a replication with an implicit measure based on reaction time data). Thus, people became angry when treated unfairly but not when they only witnessed the unfair treatment of a stranger [cf. 36]. Importantly, this difference in the anger of witnesses versus victims of unfairness was not due to different perceptions of the transgression’s fairness or moral wrongness (see ESM S1.4 and Fig. S2).

Eleven of sixty-five witnesses of unfairness paid some cost to impose costs on the unfair Dictator; with a much larger sample, one might argue, we therefore might have found statistical evidence for mild third-party punishment. However, because the witnesses of unfairness were not angry at the Dictator (see above), we suspected that the predominant emotional response among witnesses of unfairness was envy, given that they had observed the unfair Dictator obtain a higher payoff ($9) than they themselves had received [$5; see ref. 37]. We found a significant Target × Treatment interaction for envy (with anger partialed out), F(1, 265) = 4.53, p = .034: witnesses of unfairness were more envious of the Dictator than were the witnesses of fairness, p < .001, partial η² = .07. In contrast, victims of unfairness were no more envious than were their fairly-treated counterparts, p = .306, partial η² = .00. Thus, had we observed a significant amount of third-party punishment among witnesses of unfairness, it plausibly could have been motivated by envy toward the unfair Dictator rather than by moralistic anger. This difference in the emotions of the witnesses and victims of the Dictator’s unfairness explains why third-party punishment was quite rare and mild: witnesses of unfairness were envious of the Dictator’s ill-gotten gains, but not angry, and so they likely were reluctant to spend their own money to punish the Dictator.

During Experiment 1, we also ran a small fifth condition (N = 45; see ESM S1.1) in which witnesses of unfairness started Round 1 with $9, enabling us to test whether witnesses’ economic disadvantage relative to unfair Dictators explained their surplus envy. Witnesses who started Round 1 with $9 were significantly less envious of unfair Dictators (controlling for anger) than were witnesses who started with $5, F(1, 107) = 8.35, p = .005, partial η² = .07. Self-reported anger (controlling for envy) did not differ between groups, F(1, 107) = .581, p = .448, partial η² = .01. Witnesses of unfairness with $9 did not punish an amount significantly different from zero, Z = -.879, p = .379, N = 45, nor differently than did the witnesses of unfairness with $5, p = .475, N = 110, even though doing so would have cost a smaller proportion of their stake. Thus, the emotional reactions of witnesses of unfairness were characterized by envy rather than moralistic anger.
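The effect sizes reported with the F tests above are partial eta squared, which can be recovered from an F statistic and its degrees of freedom through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies that identity to the two F tests from the fifth condition; the function and variable names are hypothetical, and the calculation is a consistency check on the reported values, not a reanalysis of the data.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Envy (controlling for anger): F(1, 107) = 8.35
envy_effect = partial_eta_squared(8.35, 1, 107)

# Anger (controlling for envy): F(1, 107) = .581
anger_effect = partial_eta_squared(0.581, 1, 107)
```

Rounded to two decimals, these come to .07 and .01, matching the effect sizes reported for the fifth condition.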

Experiment 2a

In Experiments 2a (online sample) and 2b (undergraduate sample), we investigated how subjects’ affective and behavioural forecasts in a hypothetical scenario would compare to the results from Experiment 1 (see Table S1 and Table S2 for descriptive statistics for both experiments). These experiments were conducted not because we thought that participants’ hypothetical responses would provide a reliable assay of how they would behave in a real-life situation, such as the one we explored in Experiment 1, but because we wished to compare participants’ forecasts of how they might behave and feel with participants’ actual behaviour and emotional reactions in Experiment 1. The rounds of the game were identical to those of Experiment 1, except that subjects were instructed to report how they believed they would act and feel in response to a hypothetical vignette.

In Experiment 2a, and in contrast to Experiment 1, witnesses of hypothetical unfairness forecast that they would administer a greater-than-zero amount of punishment, Z = -2.38, p = .017, N = 137, as did victims of hypothetical unfairness, Z = -2.66, p = .008, N = 148; witnesses and victims of hypothetical unfairness did not differ significantly, p = .391. Four additional results suggest that a different psychological process was at work in Experiment 2a than in Experiment 1. First, witnesses of hypothetical unfairness forecast a much higher likelihood of punishing at least $4 (the critical threshold for efficacious punishment) than did witnesses of hypothetical fairness (p = .002, Fisher’s Exact Test). This proportion (22 of 137; 16%) was significantly larger than what we saw in the actual behaviour of Experiment 1 subjects (2 of 65; 3%), p = .009, Fisher’s Exact Test (Fig. 4). Second, the proportion of witnesses (16%) and victims (31 of 148; 21%) of hypothetical unfairness that forecast punishing at least $4 did not differ, p = .361, contrary to Experiment 1. Third, witnesses of hypothetical unfairness forecast a significant amount of anger toward unfair Dictators in Experiment 2a, again in contrast to Experiment 1: there was a significant Target × Treatment interaction (controlling for envy), F(1, 285) = 5.87, p = .016. Witnesses of hypothetical unfairness (M = 1.59, SE = .109, N = 133) forecast more anger than did witnesses of hypothetical fairness (M = .419, SE = .116, N = 147), p < .001, partial η² = .16. Likewise, victims of hypothetical unfairness (M = 2.04, SE = .110, N = 139) forecast more anger than did recipients of hypothetical fairness (M = .345, SE = .108, N = 141), p < .001, partial η² = .28. Fourth, both witnesses (F(1, 131) = 25.48, p < .001, partial η² = .16) and victims (F(1, 138) = 10.69, p = .001, partial η² = .07) of hypothetical unfairness forecast significantly more anger (controlling for envy) toward the unfair Dictator than their counterparts reported in Experiment 1 (Fig. 3). These results are therefore consistent with proposals that norm violations elicit “negative emotions” [4,6], which in turn motivate altruistic punishment, but here they resulted from affective forecasting rather than from responding to real-time events. (Recall, in contrast, that in Experiment 1, which involved real-time behaviour and emotional responses rather than forecasting, no such moral outrage was found.)

Experiment 2b

The pattern of punishment results for Experiment 2b was virtually identical to that of Experiment 2a: the proportion of witnesses of hypothetical unfairness that forecast they would punish at least $4 (5 of 85; 6%) did not differ from that of victims of hypothetical unfairness (9 of 101; 9%), p = .580, Fisher’s Exact Test, a pattern that was similar to Experiment 2a but contrary to Experiment 1. Interestingly, neither witnesses, Z = .183, p = .855, N = 85, nor victims of hypothetical unfairness, Z = -.162, p = .872, N = 101, reported they would administer a greater-than-zero amount of punishment. Nevertheless, both witnesses of hypothetical unfairness (p = .031) and victims of hypothetical unfairness (p = .012) forecast they would punish significantly more than did their (hypothetically) fairly-treated counterparts: this is because both witnesses, Z = 2.33, p = .020, N = 97, and recipients of hypothetical fairness, Z = 2.98, p = .003, N = 85, rewarded a greater-than-zero amount. Furthermore, and importantly, witnesses and victims of hypothetical unfairness did not forecast different amounts of punishment, p = .743. Thus, notwithstanding the fact that hypothetical punishment of unfairness appeared largely to have taken the form of withdrawing reward (rather than imposing costs) for subjects in Experiment 2b, the punishment results largely replicated those obtained in Experiment 2a (see Fig. 4).

Moreover, the pattern of emotion results of Experiment 2b was identical to that of Experiment 2a: witnesses of hypothetical unfairness (M = 1.43, SE = .100, N = 94) forecast more anger (controlling for envy) than did witnesses of hypothetical fairness (M = .392, SE = .104, N = 92), p < .001, partial η² = .13. Likewise, victims of hypothetical unfairness (M = 1.44, SE = .096, N = 109) forecast more anger than did recipients of hypothetical fairness (M = .259, SE = .098, N = 99), p < .001, partial η² = .16. Witnesses (F(1, 144) = 18.53, p < .001, partial η² = .11) but not victims (F(1, 157) = .005, p = .945, partial η² = .00) of hypothetical unfairness also forecast significantly more anger (controlling for envy) toward the unfair Dictator than the subjects in Experiment 1 actually experienced (Fig. 3). The overall pattern of forecast behaviour and emotion in Experiment 2b suggests that the students who were the subjects in Experiment 2b had a slight tendency to believe that they would reward fair distributions, which the non-student subjects in Experiment 2a did not share, but in every other way the results are identical to those of Experiment 2a: subjects forecast that both experiencing and witnessing unfairness would cause them to become angry and to punish Dictators to a greater extent than did subjects who forecast their responses to either receiving or witnessing fair treatment. Furthermore, both experiencers and witnesses of unfairness forecast equivalent likelihoods of punishing at least $4.


374 Discussion

375 Experiment 1 indicates that, under the conditions we investigated, humans do not impose

376 meaningful amounts of third-party punishment on behalf of absolute strangers. The nominal and

377 statistically non-significant amount of punishment we did observe was apparently motivated by

378 envy because of a comparatively unfavourable personal outcome rather than by moralistic anger

379 on behalf of a mistreated stranger. Our finding that the emotional reaction to witnessing

380 unfairness is characterized by envy rather than moralistic anger is particularly inconvenient for

381 the altruistic punishment hypothesis: to categorize a behaviour as an adaptation for altruistic

382 benefit delivery, one needs to provide evidence that the psychological mechanisms that produce

383 the behaviour in question have been designed for that specific function. That is, one needs to

384 demonstrate that the behaviour is not caused by mechanisms designed for a different function.

385 The presence of envy, rather than moralistic anger, in response to witnessing unfairness, suggests

386 that the psychological mechanisms involved in third-party punishment are, at least in part,

387 designed to process cues that another individual has obtained better outcomes than oneself [38].

388 In contrast, we found no evidence that they are designed to process cues that an anonymous

389 stranger has been harmed. We do not mean to imply that humans do not impose any third-party

390 punishment: under some circumstances, they do [22,35,39]. However, our results cast doubt on

391 the proposal that the mechanisms that motivate third-party punishment are altruistic benefit-

392 delivery systems that are motivated proximately by moralistic anger.

393 Experiments 2a and 2b show furthermore that people inaccurately forecast their affective

394 and behavioural responses to unfairness in experimental games: in particular, subjects who

395 imagined themselves witnessing (rather than experiencing) unfair treatment forecast both more

396 anger and punishment (and, in the case of Experiment 2b, withdrawal of rewarding) than is


397 observed among people who witness unfair treatment in the laboratory. This dissociation

398 between hypothetical and actual third-party punishment raises the possibility that punishment

399 imposed by mere witnesses of unfairness found in prior work resulted from demand

400 characteristics, affective forecasting errors, and the other methodological shortcomings we have

401 cited here [8,25,27].

402 Limitations

403 As mentioned at the outset, the goal of the experiments presented herein was not to

404 systematically identify which specific methodological conventions were responsible for previous

405 findings of third-party punishment on behalf of strangers. Rather, the goal was to test whether a

406 suite of methodological conventions that are commonly applied within the third-party

407 punishment game collude to create more third-party punishment in that experimental realization

408 than would actually obtain in experiments that remediated those methodological shortcomings.

409 As such, our results cannot speak with certainty to the effects of particular aspects of previous

410 designs, such as the strategy method. A recent survey of studies comparing the strategy method

411 to the direct-response method across a variety of paradigms found that evidence surrounding the

412 effect of using the strategy method is mixed [40], but importantly, no study has been conducted

413 to directly compare the amounts of third-party punishment elicited in experiments using the

414 strategy method versus the direct response method [40]. Therefore, further work is needed to

415 determine the unique contribution of the use of the strategy method (and the other potential

416 methodological artefacts we have identified herein) to the apparently exaggerated evidence for

417 altruistic third-party punishment that previous work has revealed: we emphasize again that doing

418 so was not our goal here. Despite this limitation, our results do strongly suggest that subjects’


419 forecasts of their likely anger and punishment in response to witnessing unfairness in the

420 standard third-party punishment game [e.g., 6] are exaggerated.

421 Conclusion

422 These findings are of broad significance in the study of human cooperation because many

423 researchers have proceeded under the assumption that altruistic punishment is a robust

424 phenomenon that requires an adaptationist explanation. Indeed, two scientific “problems” for

425 which cooperation researchers over the past decades have been seeking adaptationist solutions

426 might not be problems at all. Consider the puzzle framed by proponents of “strong reciprocity,”

427 such as Gintis [41], who claimed that humans are “strong reciprocators” who are “predisposed to

428 cooperate with others and punish non co-operators, even when this behaviour cannot be justified

429 in terms of self-interest, extended kinship, or reciprocal altruism” (p. 169).

430 With respect to the former problem—“unjustified” predispositions to cooperate without

431 apparent individual benefit—results from recent models suggest that a bias to cooperate, even

432 when one faces cues of an interaction being one-shot, should be expected to coevolve with

433 reciprocity. This is because mistaking a one-shot interaction for a repeated interaction is a less

434 costly error than the reverse [42]. In terms of the latter problem (i.e., the claim that people punish

435 non co-operators even when the punisher does not stand to benefit individually), the results from

436 Experiment 1 call into question the claim that people engage in altruistic third-party punishment

437 at all [see also 10,13]. We think another way forward in the study of third-party punishment in

438 humans, as in the study of the evolved mechanisms that motivate human cooperation in general,

439 is to intensify the search for direct or indirect benefits for punishers that outweigh the costs of

440 punishment, consistent with all known cases of third-party intervention in non-human animals

441 [43,44].


442 Acknowledgements

443 We thank Max Burton-Chellew, , Stuart A. West, and Richard W. Wrangham for

444 their feedback on a previous draft, and David G. Rand for kind assistance with his data. Research

445 supported by grants from the Air Force Office of Scientific Research (Award #FA9550-12-1-

446 0179) to M.E.M and R.K., the Arsht Research on Ethics and Community Program at the

447 University of Miami to M.E.M, and an NSF Graduate Research Fellowship to E.J.P.


448 References

1 Clutton-Brock, T. H. & Parker, G. A. 1995 Punishment in animal societies. Nature 373, 209-216. (DOI:10.1038/373209a0)

2 West, S. A., Griffin, A. S. & Gardner, A. 2007 Social semantics: altruism, cooperation, mutualism, strong reciprocity and group selection. J Evol Biol 20, 415-432. (DOI:10.1111/j.1420-9101.2006.01258.x)

3 Bshary, A. & Bshary, R. 2010 Self-serving punishment of a common enemy creates a public good in reef fishes. Curr Biol 20, 2032-2035. (DOI:10.1016/j.cub.2010.10.027)

4 Fehr, E. & Gächter, S. 2002 Altruistic punishment in humans. Nature 415, 137-140. (DOI:10.1038/415137a)

5 Boyd, R., Gintis, H., Bowles, S. & Richerson, P. J. 2003 The evolution of altruistic punishment. Proc Natl Acad Sci USA 100, 3531-3535. (DOI:10.1073/pnas.0630443100)

6 Fehr, E. & Fischbacher, U. 2004 Third-party punishment and social norms. Evol Hum Behav 25, 63-87. (DOI:10.1016/S1090-5138(04)00005-4)

7 Henrich, J., McElreath, R., Barr, A., Ensminger, J., Barrett, C., Bolyanatz, A., Cardenas, J. C., Gurven, M., Gwako, E., Henrich, N. et al. 2006 Costly punishment across human societies. Science 312, 1767-70. (DOI:10.1126/science.1127333)

8 West, S. A., El Mouden, C. & Gardner, A. 2011 Sixteen common misconceptions about the evolution of cooperation in humans. Evol Hum Behav 32, 231-262. (DOI:10.1016/j.evolhumbehav.2010.08.001)

9 Burnham, T. C. & Johnson, D. D. P. 2005 The biological and evolutionary logic of human cooperation. Analyse and Kritik 27, 113-135.

10 McCullough, M. E., Kurzban, R. & Tabak, B. A. In press. Cognitive systems for revenge and forgiveness. Behav Brain Sci

11 Hagen, E. H. & Hammerstein, P. R. 2006 Game theory and human evolution: A critique of some recent interpretations of experimental games. Theor Popul Biol 69, 339-348. (DOI:10.1016/j.tpb.2005.09.005)

12 Kümmerli, R., Burton-Chellew, M. N., Ross-Gillespie, A. & West, S. A. 2010 Resistance to extreme strategies, rather than prosocial preferences, can explain human cooperation in public goods games. Proc Natl Acad Sci USA 107, 10125-30. (DOI:10.1073/pnas.1000829107)

13 Krasnow, M. M., Cosmides, L., Pedersen, E. J. & Tooby, J. 2012 What are punishment and reputation for? PLoS ONE 7, e45662. (DOI:10.1371/journal.pone.0045662)


14 McCullough, M. E., Pedersen, E. J., Schroder, J. M., Tabak, B. A. & Carver, C. S. 2012 Harsh childhood environmental characteristics predict exploitation and retaliation in humans. Proc R Soc B 20122104. (DOI:10.1098/rspb.2012.2104)

15 Carpenter, J. P. & Matthews, P. H. In press. Norm enforcement: Anger, indignation, or reciprocity? J Eur Econ Assoc

16 Orne, M. 1962 On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. Am Psychol 17, 776-783.

17 Dana, J., Weber, R. A. & Kuang, J. 2007 Exploiting moral wiggle room: Experiments demonstrating an illusory preference for fairness. Econ Theor 33, 67-80.

18 Nelissen, R. M. A. 2008 The price you pay: cost-dependent reputation effects of altruistic punishment. Evol Hum Behav 29, 242-248.

19 Johnstone, R. A. & Bshary, R. 2004 The evolution of spiteful behavior. Proc R Soc B 271, 1917-1922.

20 Sell, A., Tooby, J. & Cosmides, L. 2009 Formidability and the logic of human anger. Proc Natl Acad Sci USA 106, 15073-15078.

21 Lieberman, D. & Linke, L. 2007 The effect of social category on third party punishment. Evol Psychol 5, 289-305.

22 Kurzban, R., DeScioli, P. & O’Brien, E. 2007 Audience effects on moralistic punishment. Evol Hum Behav 28, 75-84. (DOI:10.1016/j.evolhumbehav.2006.06.001)

23 Bolton, G. E. & Zwick, R. 1995 Anonymity versus punishment in ultimatum bargaining. Games and Economic Behavior 10, 95-121.

24 Selten, R. 1967 Die Strategiemethode zur Erforschung des eingeschränkt rationalen Verhaltens im Rahmen eines Oligopolexperiments. In Beiträge zur experimentellen Wirtschaftsforschung (ed H. Sauermann), pp. 136-168.

25 Weber, S. J. & Cook, T. D. 1972 Subject effects in laboratory research: An examination of subject roles, demand characteristics, and valid inference. Psychol Bull 77, 273-295. (DOI:10.1037/h0032351)

26 Almenberg, J., Dreber, A., Apicella, C. L. & Rand, D. G. 2011 Third party reward and punishment: Group size, efficiency and public goods. In Psychology of Punishment (eds N. M. Palmetti et al.). Nova Science Publishers.

27 Wilson, T. D. & Gilbert, D. T. 2005 Affective forecasting: Knowing what to want. Curr Dir Psychol Sci 14, 131-134.


28 Cook, K. S. & Yamagishi, T. 2008 A defense of deception on scientific grounds. Soc Psychol Q 71, 215-221.

29 Kawakami, K., Dunn, E., Karmali, F. & Dovidio, J. 2009 Mispredicting affective and behavioral responses to racism. Science 323, 276-278. (DOI:10.1126/science.1164951)

30 Hareli, S. & Weiner, B. 2002 Dislike and envy as antecedents of pleasure at another’s misfortune. Motiv Emot 26, 257-277.

31 Reuben, E. & van Winden, F. 2008 Social ties and coordination on negative reciprocity: The role of affect. J Public Econ 92, 34-53. (DOI:10.1016/j.jpubeco.2007.04.012)

32 Aronson, E., Ellsworth, P., Carlsmith, J. & Gonzales, M. 1990 Methods of research in social psychology. New York: McGraw-Hill.

33 Camerer, C. 2003 Behavioral game theory. New York: Princeton University Press.

34 Ernest-Jones, M., Nettle, D. & Bateson, M. 2011 Effects of eye images on everyday cooperative behavior: A field experiment. Evol Hum Behav 32, 172-178. (DOI:10.1016/j.evolhumbehav.2010.10.006)

35 Petersen, M. B., Sell, A., Tooby, J. & Cosmides, L. 2010 Evolutionary psychology and criminal justice: A recalibrational theory of punishment and reconciliation. In Human Morality and Sociality: Evolutionary and comparative perspectives (ed H. Høgh-Olesen), pp. 72-131. New York: Palgrave MacMillan.

36 Batson, C. D., Kennedy, C. L., Nord, L., Stocks, E. L., Fleming, D. A., Marzette, C. M., Lishner, D. A., Hayes, R. E., Kolchinsky, L. M. & Zerger, T. 2007 Anger at unfairness: Is it moral outrage? Eur J Soc Psychol 37, 1272-1285. (DOI:10.1002/ejsp)

37 Zizzo, D. & Oswald, A. 2001 Are people willing to pay to reduce others’ incomes? Annales d’Economie et de Statistique 63-64, 39-65.

38 Price, M. E., Cosmides, L. & Tooby, J. 2002 Punitive sentiment as an anti-free rider psychological device. Evol Hum Behav 23, 203-231.

39 Phillips, S., Cooney, M., Carr, T. & Frady, B. 2005 Aiding peace, abetting violence: Third parties and the management of conflict. Am Sociol Rev 70, 334-354.

40 Brandts, J. & Charness, G. 2011 The strategy versus the direct-response method: a first survey of experimental comparisons. Exp Econ 14, 375-398.

41 Gintis, H. 2000 Strong reciprocity and human sociality. J Theor Biol 206, 169-179. (DOI:10.1006/jtbi.2000.2111)


42 Delton, A. W., Krasnow, M. M., Cosmides, L. & Tooby, J. 2011 Evolution of direct reciprocity under uncertainty can explain human generosity in one-shot encounters. Proc Natl Acad Sci USA (DOI:10.1073/pnas.1102131108)

43 Raihani, N. J., Grutter, A. S. & Bshary, R. 2010 Punishers benefit from third-party punishment in fish. Science 327, 171. (DOI:10.1126/science.1183068)

44 Smith, J. E., Van Horn, R. C., Powning, K. S., Cole, A. R., Graham, K. E., Memenis, S. K. & Holekamp, K. E. 2010 Evolutionary forces favoring intragroup coalitions among spotted hyenas and other animals. Behav Ecol 21, 284-303. (DOI:10.1093/beheco/arp181)

553


554 Figure Captions

555

556

557 Figure 1. Game structure

558 (a.) Round 1. All players started with $5. Subject was either Recipient or 3rd Party. The Dictator

559 (a fictitious player whose “decisions” were determined by computer script) either took $0 (fair

560 conditions; unbolded) or $4 (unfair conditions; bolded) from Recipient. (b.) Round 2. Players

561 started with $5. Subject was Dictator; previous “Dictator” was Recipient; 3rd party was excluded.

562 Subject was allowed to give any portion of $5, do nothing, or pay a 1:4 cost to deduct money

563 from Recipient. Money deducted from Recipient in Round 2 was “burned”—it was not gained by

564 Dictators as income.
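The Round 2 payoff rules in this caption can be sketched as a small function. This is an illustrative Python sketch only, not the software used in the experiment: the names are hypothetical, the ±$5 bound on the decision is an assumption taken from the give-take range described in the supplementary notes, and Round 1 earnings are ignored.

```python
from dataclasses import dataclass

ENDOWMENT = 5.0          # each player's Round 2 starting amount (Figure 1)
PUNISH_COST_RATIO = 4.0  # 1:4 ratio: $1 paid by the subject deducts $4
MAX_EFFECT = 5.0         # assumed bound on giving or deducting


@dataclass
class Round2Outcome:
    subject: float    # subject's Round 2 money after the decision
    recipient: float  # Round 1 Dictator's Round 2 money after the decision
    burned: float     # deducted money, which nobody receives


def round2_outcome(effect_on_recipient: float) -> Round2Outcome:
    """Payoffs for one decision, signed as in Figure 2:
    positive = reward (1:1 cost), negative = punishment (1:4 cost)."""
    if abs(effect_on_recipient) > MAX_EFFECT:
        raise ValueError("decision outside the assumed -5..+5 range")
    if effect_on_recipient >= 0:
        cost, burned = effect_on_recipient, 0.0
    else:
        burned = -effect_on_recipient
        cost = burned / PUNISH_COST_RATIO
    return Round2Outcome(
        subject=ENDOWMENT - cost,
        recipient=ENDOWMENT + effect_on_recipient,
        burned=burned,
    )


print(round2_outcome(-4.0))  # punishing $4 costs the subject $1
print(round2_outcome(2.0))   # rewarding $2 costs the subject $2
```

Under these rules, removing the $4 an unfair Dictator took in Round 1 costs the punisher $1, which is why punishment of at least $4 is a natural threshold for Figure 4.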

565


566

567 Figure 2. Punishment/reward distributions (Experiment 1)

568 Amount of money (in $) the subject punished (negative values) or rewarded (positive values) the

569 Round 1 Dictator (N = 270). Values are in terms of the effect on the Round 1 Dictator, not the

570 cost to the subject (cost-to-punish ratio = 1:4; cost-to-reward ratio = 1:1).

571


572

573 Figure 3. Self-reported anger (Experiments 1, 2a and 2b)

574 Self-reported anger (scale: 0-5) toward Dictator following Round 1, controlling for envy (N =

575 943). Error bars = +/- 1 SE.

576


577

578 Figure 4. Punishment ≥ $4 (Experiments 1, 2a and 2b)

579 Proportion of subjects in unfair conditions that punished (Experiment 1) or reported they would

580 punish (Experiments 2a and 2b) ≥ $4 (N = 597). Error bars = +/- 1 SE.

581


Electronic Supplementary Material

S1. Supporting Methods and Results

S1.1 Random assignment to experimental condition and cell sizes

Experiment 1: The condition in which witnesses of unfairness started with $9 was significantly smaller than the other conditions because we stopped enrolling subjects in this condition to increase power for statistical analyses involving the four cells of the 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) design. This decision was made prior to any analysis of the data from this condition and was based solely on an attempt to maximize the number of subjects in each of the other four cells, given the rate at which we managed to recruit subjects into the study. All other cell size differences are due to the random nature of the assignments.

Experiments 2a and 2b: Cell size differences are due to random assignment. The discrepancy in sample sizes between punishment/rewarding analyses and emotion analyses is a result of our adding the emotion questions halfway through data collection.

S1.2 Data excluded from Experiment 1 analyses based on debriefing responses

Data from 26 subjects (12 female; Age: M = 18.85, SD = 1.62) were excluded from all analyses, figures, and tables (including ESM) because they expressed scepticism during debriefing (see ESM Appendix for debriefing script) that they had been interacting with other people. Decisions to exclude individual subjects were made without knowledge of their experimental data. The number of subjects excluded did not vary by condition, χ² (4, N = 341) = .459, p = .977, suggesting that none of the conditions induced greater scepticism than any other.

Additionally, we re-ran all of the analyses with flagged subjects included and found that the results did not change qualitatively in any way: all of the significant relationships presented in the main text remained significant when flagged subjects were added to the analyses.

S1.3 Lexical decision task measure of implicit anger following round 1 (Experiment 1): Method and results

In addition to the self-report anger data reported in the main paper for Experiment 1, we collected data from a lexical decision task (LDT) following Round 1: Subjects decided, as quickly as possible, whether a string of letters was a word or a non-word; reaction times to 15 hostility-related words (e.g., “angry,” “kill”) in the LDT serve as an implicit measure of anger (with faster reaction times indicating more anger; see refs 34,35).

A 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) ANOVA revealed a significant target*treatment interaction for LDT anger following Round 1, F (1, 261) = 4.66, p = .032, partial η² = .02 (see Fig. S1). Main effects were not significant. The reaction times to hostility-related words for subjects who witnessed another person being treated unfairly (natural log-transformed ms, M = 6.70, SD = 0.28, N = 62) did not differ from those who witnessed another being treated fairly (M = 6.66, SD = 0.27, N = 80), p = .416, partial η² = .00. However, subjects


who were treated unfairly (M = 6.58, SD = 0.29, N = 59) had faster reaction times than did subjects who were treated fairly (M = 6.69, SD = 0.32, N = 64), p = .029, partial η² = .02, suggesting greater anger among those who had been treated unfairly.

Reaction times to neutral words are typically controlled for in analysing LDT data [34,35]. However, we did not control for reaction times to neutral words in our analysis because (a) reaction times to neutral words did not differ across conditions, F (3, 261) = 1.54, p = .205, partial η² = .02, and (b) the inclusion of reaction times to neutral words as a covariate eliminated the significant interaction. Because statistical control of a non-significant covariate decreases power by subtracting degrees of freedom from the error term while not removing adequate sums of squares for error [36], we dropped it from our analysis.

S1.4 Fairness/moral wrongness of the round 1 dictator’s behavior (Experiment 1): Witnesses and receivers view the transgression identically.

A possible explanation for the differences between the emotional and behavioral responses of recipients and witnesses of unfairness is that recipients paid more attention to the Dictator’s unfair behavior. However, the target*treatment interaction was not significant for perceived fairness (F (1, 266) = 1.73, p = .190) or moral wrongness (F (1, 266) = 1.33, p = .249) of the Dictator’s decision (see Fig. S2). There were, however, significant effects of treatment: Unfair Dictators’ behavior was perceived as more unfair, F (1, 266) = 354.13, p < .001, and morally wrong, F (1, 266) = 182.03, p < .001, than fair Dictators’ behavior. Thus, subjects in both unfair Dictator conditions clearly judged that a transgression had occurred, but they were only angered and motivated to punish when they personally were targets of unfairness, consistent with previous findings [27].

S1.5 Effects of round 1 dictator behavior on self-reported anger toward the round 1 dictator, envy uncontrolled

When envy was not controlled, witnesses of unfairness did appear to get angry at unfair Dictators, relative to witnesses’ anger in response to fair Dictators: A 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) ANOVA revealed significant main effects of both target, F (1, 266) = 10.81, p = .001, and treatment, F (1, 266) = 85.09, p < .001, for anger, and a significant target*treatment interaction for anger, F (1, 266) = 12.44, p < .001. Witnesses of unfairness (M = .813, SE = .110, N = 65) reported more anger than did witnesses of fairness (M = .198, SE = .098, N = 80), p < .001, partial η² = .06, and recipients of unfairness (M = 1.55, SE = .115, N = 61) reported more anger than recipients of fairness (M = .172, SE = .108, N = 64), p < .001, partial η² = .22. However, as discussed in the main text, witnesses of unfairness reported no more anger than witnesses of fairness when envy was statistically controlled. Thus, witnesses’ anger at unfair Dictators can be attributed to envy rather than to moralistic anger.


S2. Supplementary Notes

S2.1 The use of deception

Though experimental economists typically resist the use of deception in experiments, its use here is justified: there was no practical way we could have obtained a sufficiently large sample size without deception. We sought to gather data from at least 50 subjects per cell of our main 2x2 design in order to have adequate statistical power. Without the use of deception, we would have had to rely on a minimum of 100 subjects, in the role of the Round 1 Dictator, to take exactly $4.00 from the Recipient (unfair conditions) and at least 100 more subjects to take exactly $0.00 from the Recipient (fair conditions). Assume that Round 1 Dictators would be equally likely to make one of the 11 possible choices on the give-take continuum, from giving $5 to taking $5, in whole-dollar increments.* We would need N = 1,100 (100 subjects per choice × 11 choices) to achieve the same statistical power without the use of deception as with 200 subjects in our actual paradigm. Considering that our interest lies entirely with how subjects responded to Dictator actions, and not the actions themselves, such a design would be wasteful of subjects’ time (thereby altering the ratio of benefits to risks of the experiment, and thus its ethicality) and resources.

Below we address some possible concerns with the use of deception in our method, and why these concerns do not affect the validity of our results.

Authenticity of debriefing responses. It could be argued that our debriefing process was inadequate to identify all cases of scepticism of our deception among our participants because participants were being compensated both with money and partial course credit; that is, perhaps they were coerced into responding that they believed the deception because they felt they would not receive compensation if they responded that they did not. This is highly unlikely. Participants read and sign consent forms prior to participation in any experiment at the University of Miami (as is the case with any IRB-approved study involving human subjects in the United States) that explicitly state that participants cannot be denied compensation based on their responses; actual amount of money earned may vary based on decisions in experimental tasks, but course credit is required to be granted once a consent form is signed.

Possible contamination of the subject pool. If participants are regularly subjected to experiments in which deception is used, it could be argued that perhaps they come to expect to be deceived in future experiments in which they participate [37], and consequently provide responses that do not accurately reflect natural behavior. This is unlikely to be a significant issue in the psychology department’s subject pool at the University of Miami for several reasons. First, our subject pool consists only of undergraduates currently enrolled in a specific introductory psychology course; once they have completed the course they are no longer eligible to participate in experiments. Second, members of the subject pool only participate in five total hours of experiments, which translates to participation in approximately two to four experiments. Third, many of the experiments that members of our subject pool participate in do not involve deception. Thus, it is

* It is probably unrealistic to assume that subjects would be equally likely to choose any one of the 11 options, but this oversimplification is more than adequate to demonstrate the present point. Also, in our actual design with predetermined Round 1 Dictator actions, subjects in Round 2 were allowed to reward or punish any amount in this range, down to the nearest cent: they made their decision by typing in a precise amount.


unlikely that participants in our subject pool have become accustomed to participating in experiments involving deception to the point where their behavior is altered by its expectation.

Running subjects in individual sessions. The decision to run subjects either individually or in groups presented a trade-off between experimental control and a more realistic setting. We decided to run subjects individually as the benefits of doing so greatly outweighed the potential costs of the less realistic setting. First, though subjects may have been more sceptical that they were actually interacting with others since there were no other people in the room, the scenario we presented was very plausible: subjects were told that two other participants were located elsewhere in the psychology building (a five-story building with lots of foot traffic). Second, running a third-party punishment study with two other people in the room introduces variables related to their physical appearance, including sex, clothing style, coalitional markings (e.g., fraternity symbols, sports team logos), perceived formidability, friendliness, ethnicity, etc. Even though subjects’ identities are anonymous in the game and subjects are separated with partitions during play, subjects will inevitably interact sometime during the experimental session. Thus, running subjects individually under the perfectly plausible guise of their interacting with other people in another room avoids the potential experimental noise that can be introduced by these other factors.

Concern has previously been raised that running subjects in isolated rooms may potentially increase scepticism that interactions are legitimate, thereby influencing behavioural results. Frohlich et al. [38] empirically tested whether running a dictator game with isolated subjects led to different results as compared with running the game with subjects in the same room. In their discussion, Frohlich et al. state, “Contrary to the hypothesis, most measures of subjects’ uncertainties were not significantly different as a result of the change in the number of rooms. Indeed, only doubt that the money left in the envelope would be given to the paired other was significantly reduced in the One Room experiments.” We note that this effect was only statistically significant by one-tailed test (p = .038), whereas the effects on other, similar, dependent variables were not statistically significant (even by one-tailed test: Did not view experiment as a game; Not sure that description was accurate; and, most importantly, Not sure that there were real people paired).

S2.2 Inconsistency with previous experimental results

It could be argued that the reason we found no altruistic punishment in Experiment 1 is because we changed multiple aspects of the standard design of the third-party punishment game at once. Indeed, we made several changes in an attempt to minimize or eliminate (1) audience effects, (2) experimental demand, (3) affective forecasting, and (4) potential extraneous variables introduced by brief interactions with other participants. However, there are several reasons this argument is not compelling. First, we observed a significant amount of second-party punishment so, clearly, there was not a general suppression of punishment in our design. Second, the cost of punishment in our design (1:4) was less expensive than the 1:3 cost-to-punishment ratio typically used in the third-party punishment game [e.g., 6]. Indeed, our design should have encouraged punishment relative to previous designs. Third, our purpose here was to test for the presence of altruistic third-party punishment in a well-controlled experimental design, not to iteratively make changes to a design that contained several features that likely produced artefactual results; to achieve this well-controlled design, all of our changes needed to be made simultaneously.


References

34 Ayduk, O., Mischel, W. & Downey, G. 2002 Attentional mechanisms linking rejection to hostile reactivity: The role of “hot” versus “cool” focus. Psychol Sci 13, 443-448.

35 Gollwitzer, M. & Denzler, M. 2009 What makes revenge sweet: Seeing the offender suffer or delivering a message? J Exp Soc Psychol 45, 840-844. (DOI:10.1016/j.jesp.2009.03.001)

36 Tabachnick, B. G. & Fidell, L. S. 1989 Using Multivariate Statistics. New York: Harper & Row.

37 Hertwig, R. & Ortmann, A. 2001 Experimental practices in economics: A methodological challenge for psychologists? Behav Brain Sci 24, 383-451.

38 Frohlich, N., Oppenheimer, J. & Bernard Moore, J. 2001 Some doubts about measuring self-interest using dictator experiments: the costs of anonymity. J Econ Behav Organ 46, 271-290.


771 S3. Supplementary Figures and Captions

Figure S1. Reaction times to hostility-related words in the lexical decision task.


Figure S2. Ratings of the fairness and moral wrongness of the Round 1 Dictator’s decision. Scale from 1 (Not at all fair/morally wrong) to 9 (Totally fair/morally wrong).


S4. Supplementary Tables

| Variable | Overall Mean | Overall SD | Fair-Recipient Mean | Fair-Recipient SD | Unfair-Recipient Mean | Unfair-Recipient SD | Fair-Witness Mean | Fair-Witness SD | Unfair-Witness ($5) Mean | Unfair-Witness ($5) SD | Unfair-Witness ($9) Mean | Unfair-Witness ($9) SD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $ Punished/Rewarded* | -0.18 | 1.46 | 0.17 | 0.68 | -1.12 | 2.08 | 0.34 | 1.05 | -0.24 | 1.37 | -0.22 | 1.42 |
| LDT RT** | 6.65 | 0.29 | 6.69 | 0.32 | 6.58 | 0.29 | 6.66 | 0.28 | 6.70 | 0.28 | 6.61 | 0.31 |
| Moral wrongness | 3.34 | 2.67 | 1.31 | 1.19 | 4.48 | 2.49 | 1.61 | 1.79 | 5.18 | 2.30 | 5.13 | 2.31 |
| Fairness | 6.07 | 2.95 | 8.61 | 1.08 | 3.64 | 2.40 | 8.48 | 1.24 | 4.26 | 2.43 | 4.07 | 2.00 |
| Anger | 0.64 | 1.01 | 0.14 | 0.44 | 1.54 | 1.29 | 0.17 | 0.46 | 0.84 | 1.07 | 0.67 | 0.88 |
| Envy | 0.85 | 1.17 | 0.27 | 0.61 | 1.47 | 1.40 | 0.36 | 0.76 | 1.48 | 1.27 | 0.80 | 1.02 |

* Negative values indicate punishment
** Response time to hostility-related words in lexical decision task, ln-transformed ms

Table S1. Variable summary statistics – Experiment 1.
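Because the LDT RT row in Table S1 is reported in ln-transformed milliseconds, readers may find it useful to back-transform a value to the millisecond scale. The sketch below is illustrative only; the helper name `ln_ms_to_ms` is hypothetical. Note that exponentiating a mean of ln-transformed values yields a geometric mean, which is not the same as the arithmetic mean of the raw response times.

```python
import math


def ln_ms_to_ms(ln_rt: float) -> float:
    """Back-transform a ln-millisecond value to milliseconds."""
    return math.exp(ln_rt)


# Overall mean LDT RT from Table S1, in ln-transformed ms.
overall_ln_rt = 6.65
overall_ms = ln_ms_to_ms(overall_ln_rt)

print(f"{overall_ln_rt} ln ms is about {overall_ms:.0f} ms")
```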


| Variable | Overall Mean | Overall SD | Fair-Recipient Mean | Fair-Recipient SD | Unfair-Recipient Mean | Unfair-Recipient SD | Fair-Witness Mean | Fair-Witness SD | Unfair-Witness Mean | Unfair-Witness SD |
|---|---|---|---|---|---|---|---|---|---|---|
| $ Punished/Rewarded* | -0.23 | 2.08 | 0.05 | 1.69 | -0.59 | 2.48 | 0.17 | 1.51 | -0.45 | 2.28 |
| Moral wrongness | 3.08 | 3.23 | 1.04 | 2.13 | 5.32 | 3.04 | 0.79 | 1.81 | 4.72 | 2.68 |
| Fairness | 5.28 | 3.38 | 7.85 | 2.21 | 2.64 | 2.57 | 7.74 | 2.01 | 3.37 | 2.58 |
| Anger | 1.12 | 1.32 | 0.16 | 0.52 | 2.25 | 1.20 | 0.28 | 0.64 | 1.63 | 1.13 |
| Envy | 0.98 | 1.25 | 0.46 | 0.92 | 1.85 | 1.39 | 0.34 | 0.69 | 1.12 | 1.16 |

* Negative values indicate punishment

Table S2. Variable summary statistics – Experiment 2a.


| Variable | Overall Mean | Overall SD | Fair-Recipient Mean | Fair-Recipient SD | Unfair-Recipient Mean | Unfair-Recipient SD | Fair-Witness Mean | Fair-Witness SD | Unfair-Witness Mean | Unfair-Witness SD |
|---|---|---|---|---|---|---|---|---|---|---|
| $ Punished/Rewarded* | 0.25 | 1.83 | 0.42 | 1.71 | -0.04 | 1.99 | 0.58 | 1.74 | 0.06 | 1.79 |
| Moral wrongness | 2.87 | 2.88 | 0.75 | 1.79 | 4.56 | 2.41 | 1.00 | 1.92 | 4.97 | 2.32 |
| Fairness | 5.40 | 3.25 | 7.92 | 2.17 | 3.24 | 2.24 | 7.70 | 2.34 | 3.02 | 2.30 |
| Anger | 0.90 | 1.15 | 0.15 | 0.51 | 1.60 | 1.21 | 0.24 | 0.62 | 1.51 | 1.11 |
| Envy | 0.92 | 1.26 | 0.39 | 0.91 | 1.68 | 1.28 | 0.21 | 0.53 | 1.27 | 1.42 |

* Negative values indicate punishment

Table S3. Variable summary statistics – Experiment 2b.


Appendix

Debriefing script followed by experimenter:

(1) Tell participant that the study is over. Ask if he/she has any questions. If the questions are about hypotheses or the deceptive elements of the experiment, explain that you will address those specific questions in just a few moments.

(2) Ask whether the entire experiment was clear in its overall purpose, and whether all aspects of the procedure made sense. Was there anything that the participant found confusing or unclear? “Were you, at any point, unsure about what we were asking you to do?”

(3) We would find it very helpful to hear about any of your personal feelings and reactions to the experiment. Probe about what made person feel the way they felt.

(4) Today’s experiment was designed to help us test some very specific hypotheses about human behavior. Do you have any idea what those hypotheses were? If you had to guess, what would you say were the hypotheses we were testing today? We would like to know as many of your guesses about our hypotheses as you can come up with.

(5) Ask whether participant found any aspect of the procedure odd, upsetting or disturbing.

(6) Did you wonder at any point whether there was more than meets the eye to any of the procedures that we had you complete today? That is, do you think that there might have been any information that I held back from explaining to you about the experiment until now? Ask participant to say more about their suspicions, and to elaborate on their questions about the procedure.

(7) Ask how participant thinks (the suspicions he/she mentioned) affected his or her behavior during the study.

The experimenter then fully explained the nature of the deception to the participant and why it was a necessary part of the experiment. The experimenter also discussed the aims of the research with the participant, and answered any questions the participant had about their experience. Lastly, the experimenter and participant discussed possible ways for the participant to talk about the experiment with his or her peers in a manner that, while honest, would not spoil the deception for others in the subject pool.

Our experimenters reported anecdotally that participants generally found the experiment fun and interesting, and were typically fascinated with the study design after the deception was revealed in debriefing. Furthermore, the experimenters’ discussion of how to talk to others about the study helped to, in a sense, bring participants in as collaborators on the research such that (a) they would not feel that they had been taken advantage of, and (b) they would not spoil the study for others in the subject pool.