Altruistic punishment: A closer look 1
In press: Proceedings of the Royal Society B: Biological Sciences, 280(1758).

Do humans really punish altruistically? A closer look

Eric J. Pedersen (a), Robert Kurzban (b, c), and Michael E. McCullough (a)

(a) Department of Psychology, University of Miami, FL 33124-0751 USA
(b) Department of Psychology, University of Pennsylvania, PA 19104-6241 USA
(c) Department of Economics, University of Alaska Anchorage, AK 99508 USA

Key Words: cooperation, altruistic punishment, third-party punishment, affective forecasting, evolutionary psychology

Corresponding Author:
Michael E. McCullough
Department of Psychology
University of Miami
P.O. Box 248185
Coral Gables, FL 33124-0751
Phone: 305-284-8057
Fax: 305-284-3402
Email: [email protected]
Abstract
Some researchers have proposed that natural selection has given rise in humans to one or more adaptations for altruistically punishing on behalf of other individuals who have been treated unfairly, even when the punisher has no chance of benefiting via reciprocity or benefits to kin. However, empirical support for the altruistic punishment hypothesis depends on results from experiments that are vulnerable to potentially important experimental artefacts. Here we searched for evidence of altruistic punishment in an experiment that precluded these artefacts. In so doing, we found that victims of unfairness punished transgressors whereas witnesses of unfairness did not. Furthermore, witnesses’ emotional reactions to unfairness were characterized by envy of the unfair individual’s selfish gains rather than by moralistic anger toward the unfair behaviour. In a second experiment run independently in two separate samples, we found that previous evidence for altruistic punishment plausibly resulted from affective forecasting error, that is, limitations on humans’ abilities to accurately simulate how they would feel in hypothetical situations. Together, these findings suggest that the case for altruistic punishment in humans (a view that has gained increasing attention in the biological and social sciences) has been overstated.
Do humans really punish altruistically? A closer look
In many animal species, including humans, individuals punish conspecifics that have harmed them [1–3]. Some researchers have recently argued that humans, unlike other animals, also altruistically punish individuals who have harmed others, even when the punisher has no chance of benefiting via reciprocity or benefits to kin [4–6]. Results from several economics experiments appear to support this claim [4,6,7], but some scholars have questioned both the adaptationist logic behind such theoretical claims [8–10] and the interpretation of the empirical results [8,10–14]. Here we set aside these theoretical debates and instead investigate a more basic empirical question: Do people actually spontaneously punish individuals who have only harmed other individuals in anonymous settings in the laboratory? Put differently, do the empirical research findings often marshalled in support of the altruistic punishment hypothesis [e.g., 4,6] provide a reliable guide to the presence or absence of a propensity for altruistic punishment in humans?
In previous work, researchers claimed empirical support for the existence of altruistic punishment on the basis of results from public goods game experiments in which the individual being punished had harmed (or failed to help) the putative punisher as well as other victims [4], leaving open the possibility that the punishment was vengeful, rather than altruistic [10]. Results from similar experiments that exclude revenge as a possible motive suggest that investments in punishment in such contexts are conspicuously low [15]. Additional data frequently adduced in support of the altruistic punishment hypothesis come from third-party punishment games [6,7], in which a Dictator chooses to give some portion of a sum of money (or none) to a passive Recipient. A third player can punish the Dictator (at a cost) in response to the Dictator’s transfer to the Recipient. Many third parties in games with this structure pay a cost to punish stingy Dictators, despite receiving no financial benefit from doing so [6,7].
However, five methodological limitations of the standard third-party punishment game might conspire to yield inflated estimates of humans’ propensity to punish strangers for having behaved unfairly toward other strangers. First, in the standard game, subjects are assigned to a third-party role that implies their task is to determine how much to punish the Dictator: Indeed, the only choices third parties can make are whether to punish the Dictator [16] and, if so, how much. Thus, any response error can only increase the estimated quantity of punishment. Second, punishment in the third-party punishment game is typically administered in the presence (or inferred presence) of an audience: punishment of the Dictator by the third party is witnessed by the initial victim because all players see the results of the game. The presence of an audience introduces reputational considerations that could motivate punishment as a means of pursuing indirect fitness benefits (e.g., by signalling one’s quality as a cooperative partner [17,18], or one’s formidability to prevent future exploitation of oneself [19,20] or one’s friends and kin [21]). Indeed, it has been shown, though with a different paradigm, that observers of unfair treatment punish third parties significantly less when they are assured no one will see their decision [22; however, see also ref. 23].
Third, the third-party punishment game is typically conducted with the “strategy method” [24], which requires third parties to repeatedly respond to a series of hypothetical Dictator choices, in advance of learning of the Dictator’s actual choice, that are progressively more (or progressively less) unfair [6]. Such methods can cause subjects to infer that the experimenters expect them to vary their responses according to some feature that varies across the set of repeated scenarios [25]. Consequently, due to a well-known experimental artefact called demand, subjects might feel compelled to punish at least some of the time, calibrating those decisions to the only feature of the Dictators’ repeated choices that varies: how unfair they are. This is especially problematic in the standard third-party punishment game because rewarding is not allowed; the only way subjects can vary their responses is to vary their amount of punishment. In a notable exception, Alemberg et al. [26] did add a rewarding option to the typical third-party punishment game (conducted with the strategy method), and a small amount of third-party punishment was observed, on average, when Dictators transferred $0 (of $10) to the Recipient. We note, however, that subjects in this experiment were informed, before making their decisions, that it was possible they would not be paired with another subject; in such a case, their decisions would not be enacted and they would retain all of their money (i.e., participants’ decisions were somewhat hypothetical; see below).
Fourth, the strategy method also involves affective forecasting [27] inasmuch as it requires subjects to respond ex ante to Dictator actions that have not yet occurred. Such behavioural commitments can differ from the actual behaviours people enact after experiencing social situations directly because people frequently weight the features of social situations differently during conscious deliberation than they do after experiencing those social situations in real time [28]. For example, as forecasters, people severely overestimate how upset they would feel by (and subsequently, how much they would attempt to avoid interacting with) someone who had made a racist comment; in contrast, subjects who have actually observed another individual express strongly racist attitudes (versus those in control conditions) respond with relative indifference to the racist individual [29].
Fifth, previous claims that anger is the predominant emotional response of third-party punishers have relied on self-reports of anger in response to hypothetical scenarios [4,6]. Self-reports of anger are typically highly correlated with self-reports of other, similar emotions, including envy [30]. To the extent that the covariation between self-reported anger and self-reported envy is not statistically controlled, estimates of third parties’ anger toward unfair strangers might actually reflect envy, which can also motivate costly punishment in pursuit of goals that are quite distinct from putatively altruistic goals such as enforcing norms or delivering deterrence benefits to strangers [31]. Specifically, if third parties’ punishment of individuals who have treated another individual unfairly is motivated by envy, but not by anger, then the mechanisms that motivate third-party punishment might process cues that another individual has obtained better outcomes than the self, rather than cues that an individual has violated a norm or harmed an anonymous third party in whom the punisher has no fitness interest [e.g., 6,7].
Here we present two experiments designed to test whether subjects punish altruistically on behalf of strangers in a third-party punishment game that was designed to rectify the methodological problems noted above. We also examined whether previous findings could plausibly be explained as a product of affective forecasting errors. We note that our goal was not to estimate the unique influence of each of these five potential methodological problems; rather, our goal was to test whether the altruistic punishment hypothesis could survive an attempt at falsification in an experiment that eliminated these problems. Experiment 1 was a modified third-party punishment game in which subjects could either punish or reward, thereby reducing experimental demand for punishment [25], the confounding of error and punishment, and potential audience effects. Also, subjects made decisions about giving money to or deducting money from Dictators after witnessing the Dictator’s decision, which enabled us to measure third-party punishment without the possibility of affective forecasting errors [27]. Additionally, our measures of emotion were fine-grained enough that it was possible to evaluate the unique motivational roles of anger and envy.
In Experiments 2a and 2b, the same third-party punishment game was presented as a hypothetical vignette to subjects from two different research pools.
Methods
Subjects
Experiment 1: Subjects were 315 University of Miami undergraduates (mean age = 19.12, SD = 2.99; 57% female). They received partial course credit and monetary compensation (see below).
Experiment 2a: 538 subjects (mean age = 34.37, SD = 12.14; 60% female) were recruited via Amazon Mechanical Turk (http://www.mturk.com/mturk/welcome) and were paid $0.25 for their participation. Participation was restricted to users in the United States. Because participants merely had to read a vignette and then report their forecasts of how they would think, feel, and behave if the hypothetical situation had actually happened to them, participation generally took about 4 minutes.
Experiment 2b: We replicated Experiment 2a with University of Miami undergraduates; 394 subjects (mean age = 18.74, SD = 1.27; 53% female) participated for partial course credit.
Procedure
Experiment 1: Subjects were run in individual sessions at a computer station in an isolated room (see ESM S2.1). The entire experiment, including instructions, was conducted on a computer via E-Run with a script created in E-Prime version 2.0. After subjects provided informed consent, they were told they would be interacting with two other players in the building over the computer network and that it was important that those other people remain anonymous; in fact, they interacted with a pre-programmed computer script. Without this deception, the research would have been unfeasible (see ESM S2.1). Subjects were informed that they would be participating in an economic decision-making game that would last for multiple rounds and they would be paid based on the money they earned during the game. Because deception was involved, everyone was paid a flat rate of $9 at the end of the experiment following a debriefing. We used a “funnel debriefing” method designed to detect suspicion and explore subjects’ reactions to having been deceived [32]. Subjects flagged for suspicion were excluded from all analyses presented; re-including them in analyses did not qualitatively affect the results in any way (see ESM S1.2).
The decision-making game comprised two rounds (Fig. 1) in which each player was given $5 to use in each round and assigned to one of three roles: Decision-Maker, Receiver, or Observer. (We refer to these roles here as Dictator, Recipient, and Third Party, respectively, to be consistent with labels used in previous work on third-party punishment.) Subjects were not told the exact number of rounds to avoid end-of-game effects [33], and were told that money earned during each round would be “banked” and thus unaffected by subjects’ behaviour during subsequent rounds. The Dictator ostensibly had the option to give any portion of his or her $5 to the Recipient, or take any portion of the Recipient’s $5; the Third Party would merely see the results of the round and would not be affected by the Dictator’s choice. Subjects were informed that in some rounds all players would be involved, and in other rounds some players might be excluded. Subjects were randomly assigned to be either the Third Party or the Recipient in the first round, and the (computer-programmed) Dictator took either $4 or $0 from the Recipient. The computer displayed a summary screen for the round showing the amount of money each player earned for the round. Following the round, subjects completed a lexical decision task (see ESM S1.3) and a series of self-report questions (see below).
Prior to role assignment for the second round, subjects were informed that there would be no Third Party in Round 2 [to avoid potential audience effects; 22,34]; one player would be assigned to a different task and be unable to see the results of the interaction. We note that the presence of the experimenter can also induce audience effects [e.g., 22]; we took great care to minimize this potential influence by (a) clearly informing participants during the consent process that their data would be stored completely anonymously and could not be connected to them in any way, and (b) minimizing contact with the experimenter by presenting all instructions electronically. Though we cannot rule out experimenter audience effects completely, we believe our results are as insulated from them as was possible in the context of this experiment.
All players were given another $5; because previous earnings had been “banked,” all players started Round 2 with $5. The subject was assigned the role of Dictator while the Dictator from Round 1 (who had treated either the subject or the other player fairly or unfairly) was assigned the role of the Recipient (ostensibly by chance). Players were identified consistently throughout, so subjects were aware that the Recipient in Round 2 was the same player that had been the Dictator in Round 1. Subjects were instructed that they could give any amount of their $5 to the Recipient, do nothing, or remove any amount of the Recipient’s $5 (the word “punishment” was never used). Removing money cost the subject one-fourth of the amount removed, and the removed money, unlike money taken in the first round, was not gained by the subject as income; it simply disappeared. Note that the cost of punishment used here, 1:4, was less expensive than the 1:3 cost typically used in the third-party punishment game; previous research has shown that punishment becomes more likely as the cost of punishment declines (see ref. [10] for review). Following the completion of the round, the experiment ended and the experimenter debriefed the subject through an extensive, staged process to assess the believability of the experiment and to explain why deception was necessary [32].
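The Round 2 payoff rules described above can be made concrete with a short sketch. It is illustrative only: the function name `round2_payoffs`, the sign convention for `transfer`, and the assumption that giving moves money one-for-one are ours, not part of the original E-Prime script.

```python
ENDOWMENT = 5.0            # each player starts Round 2 with $5
REMOVAL_COST_RATIO = 0.25  # removing $1 costs the subject $0.25 (the 1:4 ratio)


def round2_payoffs(transfer):
    """Return Round 2 payoffs for the subject (acting as Dictator) and the Recipient.

    transfer > 0: the subject gives that amount to the Recipient
                  (assumed here to move one-for-one).
    transfer == 0: the subject does nothing.
    transfer < 0: the subject removes that amount from the Recipient;
                  the removed money disappears rather than going to the subject.
    """
    if abs(transfer) > ENDOWMENT:
        raise ValueError("cannot give or remove more than the $5 endowment")
    if transfer >= 0:
        return {"subject": ENDOWMENT - transfer, "recipient": ENDOWMENT + transfer}
    removed = -transfer
    return {
        "subject": ENDOWMENT - removed * REMOVAL_COST_RATIO,
        "recipient": ENDOWMENT - removed,
    }


# Removing $4 from the Recipient costs the subject $1.
print(round2_payoffs(-4.0))
```

Under these assumptions, removing the Recipient's entire $5 would cost the subject $1.25.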
Experiments 2a and 2b: After providing consent, subjects were instructed to imagine themselves “…in a particular situation in our laboratory. Please try to picture yourself in the situation we are describing. We will ask you to complete a series of questions regarding how you think you would think, feel, and act in this situation.” The layout and instructions of the game were presented as they were in Experiment 1, and the rounds of the game and the self-report measures were the same. However, subjects did not complete a lexical decision task following the first round and were not debriefed following the completion of the experiment (because no deception was involved).
Psychometric information regarding the self-report measures
Self-rated emotions toward the other players: Subjects were asked to describe their emotional responses toward both of the other players after the first round. They described their feelings toward both players to avoid demand effects that might have occurred by probing only about the Dictator. (Emotional reactions to the other player were not of theoretical interest here and so, in the interest of brevity, we do not report them.)
- Anger: 3-item composite of ratings, on a scale from 0 (not at all) to 5 (extremely), of the extent to which the subject was “angry,” “mad,” and “outraged” at the Dictator (Cronbach’s α = .94).
- Envy: 2-item composite of ratings, on a scale from 0 (not at all) to 5 (extremely), of the extent to which the subject was “envious” and “jealous” of the Dictator (Cronbach’s α = .84).
Fairness/moral wrongness of the Round 1 Dictator’s behaviour: Subjects were asked to rate both how “fair” and how “morally wrong” the Dictator’s behaviour was toward the Recipient in Round 1 on a scale from 1 (not at all) to 9 (totally).
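For readers unfamiliar with the reliability coefficient reported for the anger and envy composites, the following minimal sketch computes Cronbach's alpha using the standard formula. The ratings are hypothetical values invented for illustration (they are not data from these experiments), and the function name `cronbach_alpha` is our own.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of ratings
    (one rating per respondent, same respondent order in every item)."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one rating per respondent")
    # Composite score per respondent: the sum of that respondent's item ratings.
    totals = [sum(ratings) for ratings in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical 0-5 ratings from six respondents on the three anger items.
angry = [0, 1, 4, 2, 5, 0]
mad = [0, 1, 5, 2, 4, 1]
outraged = [0, 0, 4, 1, 5, 0]
print(round(cronbach_alpha([angry, mad, outraged]), 2))
```

Values near 1 indicate that the items move together across respondents, which is the property that justifies averaging them into a single composite.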
Results
Experiment 1
Third parties did not punish on behalf of strangers: A one-sample Wilcoxon test (used because distributions were non-normal) revealed that the sample median of the distribution of dollars punished or rewarded in Round 2 (in terms of the effect on the Recipient, not the cost to the subject) by third-party witnesses of unfairness did not differ significantly from a hypothesized median of zero, Z = -1.48, p = .140, N = 65 (all p-values throughout the manuscript are two-tailed). In contrast, victims of unfairness punished a nonzero amount, Z = -3.52, p < .001, N = 61, significantly more than mere witnesses of unfairness, p = .026, N = 126 (two-sample median test; Fig. 2).
If the function of punishment is to deter harmdoers from imposing costs on oneself (i.e., to bargain for better treatment for oneself [10,13,20]) or others in the future, or even if its function is to enforce adherence to social norms [6], then the punishment must be strong enough to erase unfairly gained benefits [1,35]. Otherwise, the harmdoer retains a net profit from the transgression, and thus will retain an incentive to continue to behave unfairly toward others in the future. Because unfair Dictators took $4 from Recipients in Round 1 of Experiment 1, $4 was also the minimum amount of punishment that would be expected to deter unfair Dictators from behaving unfairly in the future. Third-party punishment of this magnitude was extremely rare: only 2 of 65 (3%) witnesses imposed at least $4 worth of punishment on unfair Dictators, a proportion not significantly different from the proportion for witnesses of fairness (0 of 80; p = .199, Fisher’s Exact Test). In contrast, 13 of 61 (21%) victims of unfairness punished at least $4, a proportion significantly greater than that for both recipients of fairness (0 of 64; p < .001) and witnesses of unfairness (p = .002). Indeed, most victims of unfairness who punished (13 of 21) imposed at least $4 worth of punishment.
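As a check on the first comparison above, the two-sided Fisher's Exact Test can be reproduced from the reported counts alone. The sketch below uses only the Python standard library; the function name is ours, and it applies the common rule of summing all tables that are no more probable than the observed one, which we assume (but cannot confirm) matches the software the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed one.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Witnesses of unfairness: 2 of 65 punished at least $4; witnesses of fairness: 0 of 80.
p = fisher_exact_two_sided(2, 63, 0, 80)
print(round(p, 3))  # 0.199, matching the reported p = .199
```

The same function applied to victims of unfairness versus recipients of fairness (13 of 61 versus 0 of 64) gives a value well below .001, consistent with the text.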
According to the self-report measures of emotion, third parties did not become angry at unfairness: when controlling for envy (which was highly correlated with anger, r = .637, p < .001; see ESM S1.5), a 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) ANCOVA revealed a significant target*treatment interaction for anger, F(1, 265) = 17.11, p < .001. Witnesses of unfairness (M = .533, SE = .095, N = 65) did not report more anger than did witnesses of fairness (M = .414, SE = .084, N = 80), p = .363, partial η² = .00, but victims of unfairness (M = 1.28, SE = .098, N = 61) did report more anger than their fairly-treated counterparts (M = .418, SE = .093, N = 64), p < .001, partial η² = .13 (Fig. 3; see ESM S1.3 and Fig. S1 for a replication with an implicit measure based on reaction time data). Thus, people became angry when treated unfairly but not when they only witnessed the unfair treatment of a stranger [cf. 36]. Importantly, this difference in the anger of witnesses versus victims of unfairness was not due to different perceptions of the transgression’s fairness or moral wrongness (see ESM S1.4 and Fig. S2).
Eleven of sixty-five witnesses of unfairness paid some cost to impose costs on the unfair Dictator; with a much larger sample, one might argue, we therefore might have found statistical evidence for mild third-party punishment. However, because the witnesses of unfairness were not angry at the Dictator (see above), we suspected that the predominant emotional response among witnesses of unfairness was envy, given that they had observed the unfair Dictator obtain a higher payoff ($9) than they themselves had received [$5; see ref. 37]. We found a significant target*treatment interaction for envy (with anger partialed out), F(1, 265) = 4.53, p = .034: witnesses of unfairness were more envious of the Dictator than were the witnesses of fairness, p < .001, partial η² = .07. In contrast, victims of unfairness were no more envious than were their fairly-treated counterparts, p = .306, partial η² = .00. Thus, had we observed a significant amount of third-party punishment among witnesses of unfairness, it plausibly could have been motivated by envy toward the unfair Dictator rather than by moralistic anger. This difference in the emotions of the witnesses and victims of the Dictator’s unfairness explains why third-party punishment was quite rare and mild: witnesses of unfairness were envious of the Dictator’s ill-gotten gains, but not angry, and so they likely were reluctant to spend their own money to punish the Dictator.
During Experiment 1, we also ran a small fifth condition (N = 45; see ESM S1.1) in which witnesses of unfairness started Round 1 with $9, enabling us to test whether witnesses’ economic disadvantage relative to unfair Dictators explained their surplus envy. Witnesses who started Round 1 with $9 were significantly less envious of unfair Dictators (controlling for anger) than were witnesses who started with $5, F(1, 107) = 8.35, p = .005, partial η² = .07. Self-reported anger (controlling for envy) did not differ between groups, F(1, 107) = .581, p = .448, partial η² = .01. Witnesses of unfairness with $9 did not punish an amount significantly different from zero, Z = -.879, p = .379, N = 45, nor differently than did the witnesses of unfairness with $5, p = .475, N = 110, even though doing so would have cost a smaller proportion of their stake. Thus, the emotional reactions of witnesses of unfairness were characterized by envy rather than moralistic anger.
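The partial eta-squared effect sizes reported with these F tests can be recovered from the F statistic and its degrees of freedom, which offers a quick consistency check on the reported values. The sketch below uses the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error); the function name is ours.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# The two F tests reported for the $9 versus $5 witness comparison.
envy = partial_eta_squared(8.35, 1, 107)
anger = partial_eta_squared(0.581, 1, 107)
print(round(envy, 2), round(anger, 2))  # 0.07 0.01, matching the reported values
```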
Experiment 2a
In Experiments 2a (online sample) and 2b (undergraduate sample), we investigated how subjects’ affective and behavioural forecasts in a hypothetical scenario would compare to the results from Experiment 1 (see Table S1 and Table S2 for descriptive statistics for both experiments). These experiments were conducted not because we thought that participants’ hypothetical responses would provide a reliable assay of how they would behave in a real-life situation, such as the one we explored in Experiment 1, but because we wished to compare participants’ forecasts of how they might behave and feel with participants’ actual behaviour and emotional reactions in Experiment 1. The rounds of the game were identical to those of Experiment 1, except that subjects were instructed to report how they believed they would act and feel in response to a hypothetical vignette.
In Experiment 2a, and in contrast to Experiment 1, witnesses of hypothetical unfairness forecast that they would administer a greater-than-zero amount of punishment, Z = -2.38, p = .017, N = 137, as did victims of hypothetical unfairness, Z = -2.66, p = .008, N = 148; witnesses and victims of hypothetical unfairness did not differ significantly, p = .391. Four additional results suggest that a different psychological process was at work in Experiment 2a than in Experiment 1. First, witnesses of hypothetical unfairness forecast a much higher likelihood of punishing at least $4 (the critical threshold for efficacious punishment) than did witnesses of hypothetical fairness (p = .002, Fisher’s Exact Test). This proportion (22 of 137; 16%) was significantly larger than what we saw in the actual behaviour of Experiment 1 subjects (2 of 65; 3%), p = .009, Fisher’s Exact Test (Fig. 4). Second, the proportion of witnesses (16%) and victims (31 of 148; 21%) of hypothetical unfairness that forecast punishing at least $4 did not differ, p = .361, contrary to Experiment 1. Third, witnesses of hypothetical unfairness forecast a significant amount of anger toward unfair Dictators in Experiment 2a, again in contrast to Experiment 1: there was a significant target*treatment interaction (controlling for envy), F(1, 285) = 5.87, p = .016. Witnesses of hypothetical unfairness (M = 1.59, SE = .109, N = 133) forecast more anger than did witnesses of hypothetical fairness (M = .419, SE = .116, N = 147), p < .001, partial η² = .16. Likewise, victims of hypothetical unfairness (M = 2.04, SE = .110, N = 139) forecast more anger than did recipients of hypothetical fairness (M = .345, SE = .108, N = 141), p < .001, partial η² = .28. Fourth, both witnesses (F(1, 131) = 25.48, p < .001, partial η² = .16) and victims (F(1, 138) = 10.69, p = .001, partial η² = .07) of hypothetical unfairness forecast significantly more anger (controlling for envy) toward the unfair Dictator than their counterparts reported in Experiment 1 (Fig. 3). These results are therefore consistent with proposals that norm violations elicit “negative emotions” [4,6], which in turn motivate altruistic punishment, but here they resulted from affective forecasting rather than from responding to real-time events. (Recall, in contrast, that in Experiment 1, which involved real-time behaviour and emotional responses rather than forecasting, no such moral outrage was found.)
Experiment 2b
The pattern of punishment results for Experiment 2b was virtually identical to that of Experiment 2a: the proportion of witnesses of hypothetical unfairness that forecast they would punish at least $4 (5 of 85; 6%) did not differ from that of victims of hypothetical unfairness (9 of 101; 9%), p = .580, Fisher’s Exact Test, a pattern that was similar to Experiment 2a but contrary to Experiment 1. Interestingly, neither witnesses, Z = .183, p = .855, N = 85, nor victims of hypothetical unfairness, Z = -.162, p = .872, N = 101, reported they would administer a greater-than-zero amount of punishment. Nevertheless, both witnesses of hypothetical unfairness (p = .031) and victims of hypothetical unfairness (p = .012) forecast they would punish significantly more than did their (hypothetically) fairly-treated counterparts: this is because both witnesses, Z = 2.33, p = .020, N = 97, and recipients of hypothetical fairness, Z = 2.98, p = .003, N = 85, rewarded a greater-than-zero amount. Furthermore, and importantly, witnesses and victims of hypothetical unfairness did not forecast different amounts of punishment, p = .743. Thus, notwithstanding the fact that hypothetical punishment of unfairness appeared largely to have taken the form of withdrawing reward (rather than imposing costs) for subjects in Experiment 2b, the punishment results largely replicated those obtained in Experiment 2a (see Fig. 4).
Moreover, the pattern of emotion results of Experiment 2b was identical to that of Experiment 2a: witnesses of hypothetical unfairness (M = 1.43, SE = .100, N = 94) forecast more anger (controlling for envy) than did witnesses of hypothetical fairness (M = .392, SE = .104, N = 92), p < .001, partial η² = .13. Likewise, victims of hypothetical unfairness (M = 1.44, SE = .096, N = 109) forecast more anger than did recipients of hypothetical fairness (M = .259, SE = .098, N = 99), p < .001, partial η² = .16. Witnesses (F(1, 144) = 18.53, p < .001, partial η² = .11) but not victims (F(1, 157) = .005, p = .945, partial η² = .00) of hypothetical unfairness also forecast significantly more anger (controlling for envy) toward the unfair Dictator than the subjects in Experiment 1 actually experienced (Fig. 3). The overall pattern of forecast behaviour and emotion in Experiment 2b suggests that the students who were the subjects in Experiment 2b had a slight tendency to believe that they would reward fair distributions, which the non-student subjects in Experiment 2a did not share, but in every other way the results are identical to those of Experiment 2a: subjects forecast that both experiencing and witnessing unfairness would cause them to become angry and to punish Dictators to a greater extent than did subjects who forecast their responses to either receiving or witnessing fair treatment. Furthermore, both experiencers and witnesses of unfairness forecast equivalent likelihoods of punishing at least $4.
Discussion
Experiment 1 indicates that, under the conditions we investigated, humans do not impose meaningful amounts of third-party punishment on behalf of absolute strangers. The nominal and statistically non-significant amount of punishment we did observe was apparently motivated by envy because of a comparatively unfavourable personal outcome rather than by moralistic anger on behalf of a mistreated stranger. Our finding that the emotional reaction to witnessing unfairness is characterized by envy rather than moralistic anger is particularly inconvenient for the altruistic punishment hypothesis: to categorize a behaviour as an adaptation for altruistic benefit delivery, one needs to provide evidence that the psychological mechanisms that produce the behaviour in question have been designed for that specific function. That is, one needs to demonstrate that the behaviour is not caused by mechanisms designed for a different function. The presence of envy, rather than moralistic anger, in response to witnessing unfairness suggests that the psychological mechanisms involved in third-party punishment are, at least in part, designed to process cues that another individual has obtained better outcomes than oneself [38]. In contrast, we found no evidence that they are designed to process cues that an anonymous stranger has been harmed. We do not mean to imply that humans do not impose any third-party punishment: under some circumstances, they do [22,35,39]. However, our results cast doubt on the proposal that the mechanisms that motivate third-party punishment are altruistic benefit-delivery systems that are motivated proximately by moralistic anger.
Experiments 2a and 2b further show that people inaccurately forecast their affective
and behavioural responses to unfairness in experimental games: in particular, subjects who
imagined themselves witnessing (rather than experiencing) unfair treatment forecast more
anger and more punishment (and, in the case of Experiment 2b, more withdrawal of rewarding) than is
observed among people who witness unfair treatment in the laboratory. This dissociation
between hypothetical and actual third-party punishment raises the possibility that punishment
imposed by mere witnesses of unfairness found in prior work resulted from demand
characteristics, affective forecasting errors, and the other methodological shortcomings we have
cited here [8,25,27].
Limitations
As mentioned at the outset, the goal of the experiments presented herein was not to
systematically identify which specific methodological conventions were responsible for previous
findings of third-party punishment on behalf of strangers. Rather, the goal was to test whether a
suite of methodological conventions that are commonly applied within the third-party
punishment game combine to produce more third-party punishment in that experimental realization
than would actually obtain in experiments that remediated those methodological shortcomings.
As such, our results cannot speak with certainty to the effects of particular aspects of previous
designs, such as the strategy method. A recent survey of studies comparing the strategy method
to the direct-response method across a variety of paradigms found that evidence surrounding the
effect of using the strategy method is mixed [40], but importantly, no study has been conducted
to directly compare the amounts of third-party punishment elicited in experiments using the
strategy method versus the direct-response method [40]. Therefore, further work is needed to
determine the unique contribution of the use of the strategy method (and the other potential
methodological artefacts we have identified herein) to the apparently exaggerated evidence for
altruistic third-party punishment that previous work has revealed: we emphasize again that doing
so was not our goal here. Despite this limitation, our results do strongly suggest that subjects’
forecasts of their likely anger and punishment in response to witnessing unfairness in the
standard third-party punishment game [e.g., 6] are exaggerated.
Conclusion
These findings are of broad significance in the study of human cooperation because many
researchers have proceeded under the assumption that altruistic punishment is a robust
phenomenon that requires an adaptationist explanation. Indeed, two scientific “problems” for
which cooperation researchers over the past decades have been seeking adaptationist solutions
might not be problems at all. Consider the puzzle framed by proponents of “strong reciprocity,”
such as Gintis [41], who claimed that humans are “strong reciprocators” who are “predisposed to
cooperate with others and punish non co-operators, even when this behaviour cannot be justified
in terms of self-interest, extended kinship, or reciprocal altruism” (p. 169).

With respect to the former problem (“unjustified” predispositions to cooperate without
apparent individual benefit), results from recent models suggest that a bias to cooperate, even
when one faces cues of an interaction being one-shot, should be expected to coevolve with
reciprocity. This is because mistaking a one-shot interaction for a repeated interaction is a less
costly error than the reverse [42]. In terms of the latter problem (i.e., the claim that people punish
non-cooperators even when the punisher does not stand to benefit individually), the results from
Experiment 1 call into question the claim that people engage in altruistic third-party punishment
at all [see also 10,13]. We think another way forward in the study of third-party punishment in
humans, as in the study of the evolved mechanisms that motivate human cooperation in general,
is to intensify the search for direct or indirect benefits for punishers that outweigh the costs of
punishment, consistent with all known cases of third-party intervention in non-human animals
[43,44].
Acknowledgements
We thank Max Burton-Chellew, Steven Pinker, Stuart A. West, and Richard W. Wrangham for
their feedback on a previous draft, and David G. Rand for kind assistance with his data. This research
was supported by grants from the Air Force Office of Scientific Research (Award #FA9550-12-1-0179)
to M.E.M. and R.K., the Arsht Research on Ethics and Community Program at the
University of Miami to M.E.M., and an NSF Graduate Research Fellowship to E.J.P.
References
1 Clutton-Brock, T. H. & Parker, G. A. 1995 Punishment in animal societies. Nature 373, 209-216. (DOI:10.1038/373209a0)
2 West, S. A., Griffin, A. S. & Gardner, A. 2007 Social semantics: altruism, cooperation, mutualism, strong reciprocity and group selection. J Evol Biol 20, 415-432. (DOI:10.1111/j.1420-9101.2006.01258.x)
3 Bshary, A. & Bshary, R. 2010 Self-serving punishment of a common enemy creates a public good in reef fishes. Curr Biol 20, 2032-2035. (DOI:10.1016/j.cub.2010.10.027)
4 Fehr, E. & Gächter, S. 2002 Altruistic punishment in humans. Nature 415, 137-140. (DOI:10.1038/415137a)
5 Boyd, R., Gintis, H., Bowles, S. & Richerson, P. J. 2003 The evolution of altruistic punishment. Proc Natl Acad Sci USA 100, 3531-3535. (DOI:10.1073/pnas.0630443100)
6 Fehr, E. & Fischbacher, U. 2004 Third-party punishment and social norms. Evol Hum Behav 25, 63-87. (DOI:10.1016/S1090-5138(04)00005-4)
7 Henrich, J., McElreath, R., Barr, A., Ensminger, J., Barrett, C., Bolyanatz, A., Cardenas, J. C., Gurven, M., Gwako, E., Henrich, N. et al. 2006 Costly punishment across human societies. Science 312, 1767-1770. (DOI:10.1126/science.1127333)
8 West, S. A., El Mouden, C. & Gardner, A. 2011 Sixteen common misconceptions about the evolution of cooperation in humans. Evol Hum Behav 32, 231-262. (DOI:10.1016/j.evolhumbehav.2010.08.001)
9 Burnham, T. C. & Johnson, D. D. P. 2005 The biological and evolutionary logic of human cooperation. Analyse and Kritik 27, 113-135.
10 McCullough, M. E., Kurzban, R. & Tabak, B. A. In press. Cognitive systems for revenge and forgiveness. Behav Brain Sci
11 Hagen, E. H. & Hammerstein, P. R. 2006 Game theory and human evolution: A critique of some recent interpretations of experimental games. Theor Popul Biol 69, 339-348. (DOI:10.1016/j.tpb.2005.09.005)
12 Kümmerli, R., Burton-Chellew, M. N., Ross-Gillespie, A. & West, S. A. 2010 Resistance to extreme strategies, rather than prosocial preferences, can explain human cooperation in public goods games. Proc Natl Acad Sci USA 107, 10125-10130. (DOI:10.1073/pnas.1000829107)
13 Krasnow, M. M., Cosmides, L., Pedersen, E. J. & Tooby, J. 2012 What are punishment and reputation for? PLoS ONE 7, e45662. (DOI:10.1371/journal.pone.0045662)
14 McCullough, M. E., Pedersen, E. J., Schroder, J. M., Tabak, B. A. & Carver, C. S. 2012 Harsh childhood environmental characteristics predict exploitation and retaliation in humans. Proc R Soc B 20122104. (DOI:10.1098/rspb.2012.2104)
15 Carpenter, J. P. & Matthews, P. H. In press. Norm enforcement: Anger, indignation, or reciprocity? J Eur Econ Assoc
16 Orne, M. 1962 On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. Am Psychol 17, 776-783.
17 Dana, J., Weber, R. A. & Kuang, J. 2007 Exploiting moral wiggle room: Experiments demonstrating an illusory preference for fairness. Econ Theor 33, 67-80.
18 Nelissen, R. M. A. 2008 The price you pay: cost-dependent reputation effects of altruistic punishment. Evol Hum Behav 29, 242-248.
19 Johnstone, R. A. & Bshary, R. 2004 The evolution of spiteful behavior. Proc R Soc B 271, 1917-1922.
20 Sell, A., Tooby, J. & Cosmides, L. 2009 Formidability and the logic of human anger. Proc Natl Acad Sci USA 106, 15073-15078.
21 Lieberman, D. & Linke, L. 2007 The effect of social category on third party punishment. Evol Psychol 5, 289-305.
22 Kurzban, R., DeScioli, P. & O’Brien, E. 2007 Audience effects on moralistic punishment. Evol Hum Behav 28, 75-84. (DOI:10.1016/j.evolhumbehav.2006.06.001)
23 Bolton, G. E. & Zwick, R. 1995 Anonymity versus punishment in ultimatum bargaining. Games and Economic Behavior 10, 95-121.
24 Selten, R. 1967 Die Strategiemethode zur Erforschung des eingeschränkt rationalen Verhaltens im Rahmen eines Oligopolexperiments. In Beiträge zur experimentellen Wirtschaftsforschung (ed H. Sauermann), pp. 136-168.
25 Weber, S. J. & Cook, T. D. 1972 Subject effects in laboratory research: An examination of subject roles, demand characteristics, and valid inference. Psychol Bull 77, 273-295. (DOI:10.1037/h0032351)
26 Almenberg, J., Dreber, A., Apicella, C. L. & Rand, D. G. 2011 Third party reward and punishment: Group size, efficiency and public goods. In Psychology of Punishment (eds N. M. Palmetti et al.). Nova Science Publishers.
27 Wilson, T. D. & Gilbert, D. T. 2005 Affective forecasting: Knowing what to want. Curr Dir Psychol Sci 14, 131-134.
28 Cook, K. S. & Yamagishi, T. 2008 A defense of deception on scientific grounds. Soc Psychol Q 71, 215-221.
29 Kawakami, K., Dunn, E., Karmali, F. & Dovidio, J. 2009 Mispredicting affective and behavioral responses to racism. Science 323, 276-278. (DOI:10.1126/science.1164951)
30 Hareli, S. & Weiner, B. 2002 Dislike and envy as antecedents of pleasure at another’s misfortune. Motiv Emot 26, 257-277.
31 Reuben, E. & van Winden, F. 2008 Social ties and coordination on negative reciprocity: The role of affect. J Public Econ 92, 34-53. (DOI:10.1016/j.jpubeco.2007.04.012)
32 Aronson, E., Ellsworth, P., Carlsmith, J. & Gonzales, M. 1990 Methods of research in social psychology. New York: McGraw-Hill.
33 Camerer, C. 2003 Behavioral game theory. New York: Princeton University Press.
34 Ernest-Jones, M., Nettle, D. & Bateson, M. 2011 Effects of eye images on everyday cooperative behavior: A field experiment. Evol Hum Behav 32, 172-178. (DOI:10.1016/j.evolhumbehav.2010.10.006)
35 Petersen, M. B., Sell, A., Tooby, J. & Cosmides, L. 2010 Evolutionary psychology and criminal justice: A recalibrational theory of punishment and reconciliation. In Human Morality and Sociality: Evolutionary and comparative perspectives (ed H. Høgh-Oleson), pp. 72-131. New York: Palgrave MacMillan.
36 Batson, C. D., Kennedy, C. L., Nord, L., Stocks, E. L., Fleming, D. A., Marzette, C. M., Lishner, D. A., Hayes, R. E., Kolchinsky, L. M. & Zerger, T. 2007 Anger at unfairness: Is it moral outrage? Eur J Soc Psychol 1285, 1272-1285. (DOI:10.1002/ejsp)
37 Zizzo, D. & Oswald, A. 2001 Are people willing to pay to reduce others’ incomes? Annales d’Economie et de Statistique 63-64, 39-65.
38 Price, M. E., Cosmides, L. & Tooby, J. 2002 Punitive sentiment as an anti-free rider psychological device. Evol Hum Behav 23, 203-231.
39 Phillips, S., Cooney, M., Carr, T. & Frady, B. 2005 Aiding peace, abetting violence: Third parties and the management of conflict. Am Sociol Rev 70, 334-354.
40 Brandts, J. & Charness, G. 2011 The strategy versus the direct-response method: a first survey of experimental comparisons. Exp Econ 14, 375-398.
41 Gintis, H. 2000 Strong reciprocity and human sociality. J Theor Biol 206, 169-179. (DOI:10.1006/jtbi.2000.2111)
42 Delton, A. W., Krasnow, M. M., Cosmides, L. & Tooby, J. 2011 Evolution of direct reciprocity under uncertainty can explain human generosity in one-shot encounters. Proc Natl Acad Sci USA (DOI:10.1073/pnas.1102131108)
43 Raihani, N. J., Grutter, A. S. & Bshary, R. 2010 Punishers benefit from third-party punishment in fish. Science 327, 171. (DOI:10.1126/science.1183068)
44 Smith, J. E., Van Horn, R. C., Powning, K. S., Cole, A. R., Graham, K. E., Memenis, S. K. & Holekamp, K. E. 2010 Evolutionary forces favoring intragroup coalitions among spotted hyenas and other animals. Behav Ecol 21, 284-303. (DOI:10.1093/beheco/arp181)
Figure Captions

Figure 1. Game structure
(a.) Round 1. All players started with $5. Subject was either Recipient or 3rd Party. The Dictator
(a fictitious player whose “decisions” were determined by computer script) either took $0 (fair
conditions; unbolded) or $4 (unfair conditions; bolded) from Recipient. (b.) Round 2. Players
started with $5. Subject was Dictator; previous “Dictator” was Recipient; 3rd Party was excluded.
Subject was allowed to give any portion of $5, do nothing, or pay a 1:4 cost to deduct money
from Recipient. Money deducted from Recipient in Round 2 was “burned”: it was not gained by
Dictators as income.

Figure 2. Punishment/reward distributions (Experiment 1)
Amount of money (in $) the subject punished (negative values) or rewarded (positive values) the
Round 1 Dictator (N = 270). Values are in terms of the effect on the Round 1 Dictator, not the
cost to the subject (cost-to-punish ratio = 1:4; cost-to-reward ratio = 1:1).
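
Because the values in Figure 2 are expressed as effects on the Round 1 Dictator, the amount the subject actually paid has to be derived from the two ratios given in the caption. The Python sketch below illustrates that conversion under the stated ratios (1:4 for punishment, 1:1 for reward). The function name is hypothetical and the code is not part of the original materials.

```python
def cost_to_subject(effect_on_dictator: float) -> float:
    """Dollars the subject pays to change the Round 1 Dictator's payoff.

    Negative effects are punishment (1:4 cost-to-punish ratio);
    positive effects are rewards (1:1 cost-to-reward ratio).
    """
    if effect_on_dictator < 0:
        return abs(effect_on_dictator) / 4.0  # paying $1 deducts $4
    return float(effect_on_dictator)          # paying $1 gives $1


for effect in (-4.0, -1.0, 0.0, 2.5):
    print(f"effect {effect:+.2f} -> cost {cost_to_subject(effect):.2f}")
```

For example, fully reversing the Dictator's $4 gain (an effect of -4) would cost the subject $1 under these ratios.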

Figure 3. Self-reported anger (Experiments 1, 2a and 2b)
Self-reported anger (scale: 0-5) toward Dictator following Round 1, controlling for envy (N =
943). Error bars = +/- 1 SE.

Figure 4. Punishment ≥ $4 (Experiments 1, 2a and 2b)
Proportion of subjects in unfair conditions that punished (Experiment 1) or reported they would
punish (Experiments 2a and 2b) ≥ $4 (N = 597). Error bars = +/- 1 SE.
Electronic Supplementary Material

S1. Supporting Methods and Results

S1.1 Random assignment to experimental condition and cell sizes

Experiment 1: The condition in which witnesses of unfairness started with $9 was significantly smaller than the other conditions because we stopped enrolling subjects in this condition to increase power for statistical analyses involving the four cells of the 2 (Target: Self, Other) × 2 (Treatment: Fair, Unfair) design. This decision was made prior to any analysis of the data from this condition and was based solely on an attempt to maximize the number of subjects in each of the other four cells, given the rate at which we managed to recruit subjects into the study. All other cell size differences are due to the random nature of the assignments.

Experiments 2a and 2b: Cell size differences are due to random assignment. The discrepancy in sample sizes between punishment/rewarding analyses and emotion analyses is a result of our adding the emotion questions halfway through data collection.

S1.2 Data excluded from Experiment 1 analyses based on debriefing responses

Data from 26 subjects (12 female; Age: M = 18.85, SD = 1.62) were excluded from all analyses, figures, and tables (including ESM) because they expressed scepticism during debriefing (see ESM Appendix for debriefing script) that they had been interacting with other people. Decisions to exclude individual subjects were made without knowledge of their experimental data. The number of subjects excluded did not vary by condition, χ² (4, N = 341) = .459, p = .977, suggesting that none of the conditions induced greater scepticism than any other.

Additionally, we re-ran all of the analyses with flagged subjects included and found that the results did not change qualitatively in any way: all of the significant relationships presented in the main text remained significant when flagged subjects were added to the analyses.

S1.3 Lexical decision task measure of implicit anger following round 1 (Experiment 1): Method and results

In addition to the self-report anger data reported in the main paper for Experiment 1, we collected data from a lexical decision task (LDT) following Round 1: Subjects decided, as quickly as possible, whether a string of letters was a word or a non-word; reaction times to 15 hostility-related words (e.g., “angry,” “kill”) in the LDT serve as an implicit measure of anger (with faster reaction times indicating more anger; see refs 34,35).

A 2 (Target: Self, Other) × 2 (Treatment: Fair, Unfair) ANOVA revealed a significant target × treatment interaction for LDT anger following Round 1, F (1, 261) = 4.66, p = .032, partial η² = .02 (see Fig. S1). Main effects were not significant. The reaction times to hostility-related words for subjects who witnessed another person being treated unfairly (natural log-transformed ms, M = 6.70, SD = 0.28, N = 62) did not differ from those who witnessed another being treated fairly (M = 6.66, SD = 0.27, N = 80), p = .416, partial η² = .00. However, subjects who were treated unfairly (M = 6.58, SD = 0.29, N = 59) had faster reaction times than did subjects who were treated fairly (M = 6.69, SD = 0.32, N = 64), p = .029, partial η² = .02, suggesting greater anger among those who had been treated unfairly.

Reaction times to neutral words are typically controlled for in analysing LDT data [34,35]. However, we did not control for reaction times to neutral words in our analysis because (a) reaction times to neutral words did not differ across conditions, F (3, 261) = 1.54, p = .205, partial η² = .02, and (b) the inclusion of reaction times to neutral words as a covariate eliminated the significant interaction. Because statistical control of a non-significant covariate decreases power by subtracting degrees of freedom from the error term while not removing adequate sums of squares for error [36], we dropped it from our analysis.

S1.4 Fairness/moral wrongness of the round 1 dictator’s behavior (Experiment 1): Witnesses and receivers view the transgression identically.

A possible explanation for the differences between the emotional and behavioral responses of recipients and witnesses of unfairness is that recipients paid more attention to the Dictator’s unfair behavior. However, the target × treatment interaction was not significant for perceived fairness (F (1, 266) = 1.73, p = .190) or moral wrongness (F (1, 266) = 1.33, p = .249) of the Dictator’s decision (see Fig. S2). There were, however, significant effects of treatment: Unfair Dictators’ behavior was perceived as more unfair, F (1, 266) = 354.13, p < .001, and more morally wrong, F (1, 266) = 182.03, p < .001, than fair Dictators’ behavior. Thus, subjects in both unfair Dictator conditions clearly judged that a transgression had occurred, but they were only angered and motivated to punish when they personally were targets of unfairness, consistent with previous findings [27].

S1.5 Effects of round 1 dictator behavior on self-reported anger toward the round 1 dictator, envy uncontrolled

When envy was not controlled, witnesses of unfairness did appear to get angry at unfair Dictators, relative to witnesses’ anger in response to fair Dictators: A 2 (Target: Self, Other) × 2 (Treatment: Fair, Unfair) ANOVA revealed significant main effects of both target, F (1, 266) = 10.81, p = .001, and treatment, F (1, 266) = 85.09, p < .001, for anger, and a significant target × treatment interaction for anger, F (1, 266) = 12.44, p < .001. Witnesses of unfairness (M = .813, SE = .110, N = 65) reported more anger than did witnesses of fairness (M = .198, SE = .098, N = 80), p < .001, partial η² = .06, and recipients of unfairness (M = 1.55, SE = .115, N = 61) reported more anger than recipients of fairness (M = .172, SE = .108, N = 64), p < .001, partial η² = .22. However, as discussed in the main text, witnesses of unfairness reported no more anger than witnesses of fairness when envy was statistically controlled. Thus, witnesses’ anger at unfair Dictators can be attributed to envy rather than to moralistic anger.
S2. Supplementary Notes

S2.1 The use of deception

Though experimental economists typically resist the use of deception in experiments, its use here is justified: there was no practical way we could have obtained a sufficiently large sample size without deception. We sought to gather data from at least 50 subjects per cell of our main 2 × 2 design in order to have adequate statistical power. Without the use of deception, we would have had to rely on a minimum of 100 subjects, in the role of the Round 1 Dictator, to take exactly $4.00 from the Recipient (unfair conditions) and at least 100 more subjects to take exactly $0.00 from the Recipient (fair conditions). Assume that Round 1 Dictators would be equally likely to make one of the 11 possible choices on the give-take continuum, from giving $5 to taking $5, in whole-dollar increments.* We would need N = 1,100 (100 subjects per choice × 11 choices) to achieve the same statistical power without the use of deception as with 200 subjects in our actual paradigm. Considering that our interest lies entirely with how subjects responded to Dictator actions, and not the actions themselves, such a design would be wasteful of subjects’ time (thereby altering the ratio of benefits to risks of the experiment, and thus its ethicality) and resources.

* It is probably unrealistic to assume that subjects would be equally likely to choose any one of the 11 options, but this oversimplification is more than adequate to demonstrate the present point. Also, in our actual design with predetermined Round 1 Dictator actions, subjects in Round 2 were allowed to reward or punish any amount in this range, down to the nearest cent: they made their decision by typing in a precise amount.

Below we address some possible concerns with the use of deception in our method, and why these concerns do not affect the validity of our results.

Authenticity of debriefing responses. It could be argued that our debriefing process was inadequate to identify all cases of scepticism of our deception among our participants because participants were being compensated both with money and partial course credit; that is, perhaps they were coerced into responding that they believed the deception because they felt they would not receive compensation if they responded that they did not. This is highly unlikely. Participants read and sign consent forms prior to participation in any experiment at the University of Miami (as is the case with any IRB-approved study involving human subjects in the United States) that explicitly state that participants cannot be denied compensation based on their responses; actual amount of money earned may vary based on decisions in experimental tasks, but course credit is required to be granted once a consent form is signed.

Possible contamination of the subject pool. If participants are regularly subjected to experiments in which deception is used, it could be argued that perhaps they come to expect to be deceived in future experiments in which they participate [37], and consequently provide responses that do not accurately reflect natural behavior. This is unlikely to be a significant issue in the psychology department’s subject pool at the University of Miami for several reasons. First, our subject pool consists only of undergraduates currently enrolled in a specific introductory psychology course; once they have completed the course they are no longer eligible to participate in experiments. Second, members of the subject pool only participate in five total hours of experiments, which translates to participation in approximately two to four experiments. Third, many of the experiments that members of our subject pool participate in do not involve deception. Thus, it is unlikely that participants in our subject pool have become accustomed to participating in experiments involving deception to the point where their behavior is altered by its expectation.
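
The sample-size argument made earlier in S2.1 can be checked with a few lines of arithmetic. The Python sketch below is illustrative only; it adopts the simplifying assumption stated in the text (all 11 Dictator choices equally likely), and the variable names are ours.

```python
from fractions import Fraction

N_CHOICES = 11           # giving $5 ... taking $5, in whole-dollar increments
TARGET_PER_CHOICE = 100  # Dictators needed at each of the two choices of interest

# Assumption taken from the text: every choice is equally likely.
p_choice = Fraction(1, N_CHOICES)

# Number of real Dictators at which the expected count of any given
# choice (e.g., "take $4") reaches the target.
without_deception = int(TARGET_PER_CHOICE / p_choice)

# With scripted Dictators, only the two target actions are ever presented.
with_deception = 2 * TARGET_PER_CHOICE

print(without_deception, with_deception, without_deception / with_deception)
```

Under the uniform-choice assumption, a design without deception would therefore need 5.5 times as many subjects as the 200 used for the fair and unfair conditions in the actual paradigm.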
Running subjects in individual sessions. The decision to run subjects either individually or in groups presented a trade-off between experimental control and a more realistic setting. We decided to run subjects individually as the benefits of doing so greatly outweighed the potential costs of the less realistic setting. First, though subjects may have been more sceptical that they were actually interacting with others since there were no other people in the room, the scenario we presented was very plausible: subjects were told that two other participants were located elsewhere in the psychology building (a five-story building with lots of foot traffic). Second, running a third-party punishment study with two other people in the room introduces variables related to their physical appearances, including sex, clothing style, coalitional markings (e.g., fraternity symbols, sports team logos), perceived formidability, friendliness, ethnicity, etc. Even though subjects’ identities are anonymous in the game and subjects are separated with partitions during play, subjects will inevitably interact sometime during the experimental session. Thus, running subjects individually under the perfectly plausible guise of their interacting with other people in another room avoids the potential experimental noise that can be introduced by these other factors.

Concern has previously been raised that running subjects in isolated rooms may potentially increase scepticism that interactions are legitimate, thereby influencing behavioural results. Frohlich et al. [38] empirically tested whether running a dictator game with isolated subjects led to different results as compared with running the game with subjects in the same room. In their discussion, Frohlich et al. state, “Contrary to the hypothesis, most measures of subjects’ uncertainties were not significantly different as a result of the change in the number of rooms. Indeed, only doubt that the money left in the envelope would be given to the paired other was significantly reduced in the One Room experiments.” We note that this effect was only statistically significant by one-tailed test (p = .038), whereas the effects on other, similar, dependent variables were not statistically significant (even by one-tailed test: Did not view experiment as a game; Not sure that description was accurate; and, most importantly, Not sure that there were real people paired).

S2.2 Inconsistency with previous experimental results

It could be argued that the reason we found no altruistic punishment in Experiment 1 is that we changed multiple aspects of the standard design of the third-party punishment game at once. Indeed, we made several changes in an attempt to minimize or eliminate (1) audience effects, (2) experimental demand, (3) affective forecasting, and (4) potential extraneous variables introduced by brief interactions with other participants. However, there are several reasons this argument is not compelling. First, we observed a significant amount of second-party punishment so, clearly, there was not a general suppression of punishment in our design. Second, punishment in our design (1:4) was cheaper than under the 1:3 cost-to-punishment ratio typically used in the third-party punishment game [e.g., 6]. Indeed, our design should have encouraged punishment relative to previous designs. Third, our purpose here was to test for the presence of altruistic third-party punishment in a well-controlled experimental design, not to iteratively make changes to a design that contained several features that likely produced artefactual results; to achieve this well-controlled design, all of our changes needed to be made simultaneously.

References
34 Ayduk, O., Mischel, W. & Downey, G. 2002 Attentional mechanisms linking rejection to hostile reactivity: The role of “hot” versus “cool” focus. Psychol Sci 13, 443-448.
35 Gollwitzer, M. & Denzler, M. 2009 What makes revenge sweet: Seeing the offender suffer or delivering a message? J Exp Soc Psychol 45, 840-844. (DOI:10.1016/j.jesp.2009.03.001)
36 Tabachnick, B. G. & Fidell, L. S. 1989 Using Multivariate Statistics. New York: Harper & Row.
37 Hertwig, R. & Ortmann, A. 2001 Experimental practices in economics: A methodological challenge for psychologists? Behav Brain Sci 24, 383-451.
38 Frohlich, N., Oppenheimer, J. & Bernard Moore, J. 2001 Some doubts about measuring self-interest using dictator experiments: the costs of anonymity. J Econ Behav Organ 46, 271-290.

S3. Supplementary Figures and Captions

Figure S1. Reaction times to hostility-related words in the lexical decision task.

Figure S2. Ratings of the fairness and moral wrongness of the Round 1 Dictator’s decision. Scale from 1 (Not at all fair/morally wrong) to 9 (Totally fair/morally wrong).

S4. Supplementary Tables

                        Overall        Fair -         Unfair -       Fair -         Unfair -       Unfair -
                                       Recipient      Recipient      Witness        Witness ($5)   Witness ($9)
Variable                 Mean     SD    Mean     SD    Mean     SD    Mean     SD    Mean     SD    Mean     SD
$ Punished/Rewarded*    -0.18   1.46    0.17   0.68   -1.12   2.08    0.34   1.05   -0.24   1.37   -0.22   1.42
LDT RT**                 6.65   0.29    6.69   0.32    6.58   0.29    6.66   0.28    6.70   0.28    6.61   0.31
Moral wrongness          3.34   2.67    1.31   1.19    4.48   2.49    1.61   1.79    5.18   2.30    5.13   2.31
Fairness                 6.07   2.95    8.61   1.08    3.64   2.40    8.48   1.24    4.26   2.43    4.07   2.00
Anger                    0.64   1.01    0.14   0.44    1.54   1.29    0.17   0.46    0.84   1.07    0.67   0.88
Envy                     0.85   1.17    0.27   0.61    1.47   1.40    0.36   0.76    1.48   1.27    0.80   1.02

* Negative values indicate punishment
** Response time to hostility-related words in lexical decision task, ln-transformed ms

Table S1. Variable summary statistics – Experiment 1.
                      Overall        Fair -         Unfair -       Fair -         Unfair -
                                     Recipient      Recipient      Witness        Witness
Variable               Mean     SD    Mean     SD    Mean     SD    Mean     SD    Mean     SD
$ Punished/Rewarded*  -0.23   2.08    0.05   1.69   -0.59   2.48    0.17   1.51   -0.45   2.28
Moral wrongness        3.08   3.23    1.04   2.13    5.32   3.04    0.79   1.81    4.72   2.68
Fairness               5.28   3.38    7.85   2.21    2.64   2.57    7.74   2.01    3.37   2.58
Anger                  1.12   1.32    0.16   0.52    2.25   1.20    0.28   0.64    1.63   1.13
Envy                   0.98   1.25    0.46   0.92    1.85   1.39    0.34   0.69    1.12   1.16

* Negative values indicate punishment

Table S2. Variable summary statistics – Experiment 2a.
                      Overall        Fair -         Unfair -       Fair -         Unfair -
                                     Recipient      Recipient      Witness        Witness
Variable               Mean     SD    Mean     SD    Mean     SD    Mean     SD    Mean     SD
$ Punished/Rewarded*   0.25   1.83    0.42   1.71   -0.04   1.99    0.58   1.74    0.06   1.79
Moral wrongness        2.87   2.88    0.75   1.79    4.56   2.41    1.00   1.92    4.97   2.32
Fairness               5.40   3.25    7.92   2.17    3.24   2.24    7.70   2.34    3.02   2.30
Anger                  0.90   1.15    0.15   0.51    1.60   1.21    0.24   0.62    1.51   1.11
Envy                   0.92   1.26    0.39   0.91    1.68   1.28    0.21   0.53    1.27   1.42

* Negative values indicate punishment

Table S3. Variable summary statistics – Experiment 2b.
Appendix

Debriefing script followed by experimenter:

(1) Tell participant that the study is over. Ask if he/she has any questions. If the questions are about hypotheses or the deceptive elements of the experiment, explain that you will address those specific questions in just a few moments.
(2) Ask whether the entire experiment was clear in its overall purpose, and whether all aspects of the procedure made sense. Was there anything that the participant found confusing or unclear? “Were you, at any point, unsure about what we were asking you to do?”
(3) We would find it very helpful to hear about any of your personal feelings and reactions to the experiment. Probe about what made person feel the way they felt.
(4) Today’s experiment was designed to help us test some very specific hypotheses about human behavior. Do you have any idea what those hypotheses were? If you had to guess, what would you say were the hypotheses we were testing today? We would like to know as many of your guesses about our hypotheses as you can come up with.
(5) Ask whether participant found any aspect of the procedure odd, upsetting or disturbing.
(6) Did you wonder at any point whether there was more than meets the eye to any of the procedures that we had you complete today? That is, do you think that there might have been any information that I held back from explaining to you about the experiment until now? Ask participant to say more about their suspicions, and to elaborate on their questions about the procedure.
(7) Ask how participant thinks (the suspicions he/she mentioned) affected his or her behavior during the study.
The experimenter then fully explained the nature of the deception to the participant and why it was a necessary part of the experiment. The experimenter also discussed the aims of the research with the participant, and answered any questions the participant had about their experience. Lastly, the experimenter and participant discussed possible ways for the participant to talk about the experiment with his or her peers in a manner that, while honest, would not spoil the deception for others in the subject pool.
Our experimenters reported anecdotally that participants generally found the experiment fun and interesting, and were typically fascinated with the study design after the deception was revealed in debriefing. Furthermore, the experimenters’ discussion of how to talk to others about the study helped to, in a sense, bring participants in as collaborators on the research such that (a) they would not feel that they had been taken advantage of, and (b) they would not spoil the study for others in the subject pool.