COUNT TO TEN FIRST

ON PUNISHMENT AND THE HOT-COLD EFFECT

by Cora Hollander, 10215115
Second version: October 28, 2013
MSc Behavioral and
Supervisor: prof. dr. F.A.A.M. van Winden

“When angry, count to ten before you speak. If very angry, count to one hundred.” Thomas Jefferson

1. INTRODUCTION

“When you are upset, count to ten before you react.” This piece of folk wisdom captures the idea that how we react in the heat of a debate or dispute might not be the reaction we would prefer in a more thoughtful state. Moreover, the heated reaction may be costly to oneself and something we later regret. Does this advice contain a truth? Would you behave differently if you counted to ten first? This question is relevant not only in social life, but for economists as well. Every economic transaction takes place in a relationship between two or more parties, for example an employer and an employee or a customer and a supplier. Trust is a vital element of such a relationship. In the absence of complete contract enforcement, parties rely on each other to stick to their part of the deal. When trust is violated, there often exists an opportunity for retaliation or punishment. This can take many forms. Think for example of a misled customer who starts to spread negative rumors about a supplier, or of an employer leaving bad references for a former employee. Retaliating behavior often destroys wealth, meaning that both (or even more) parties suffer from it. However, its existence is not all bad. The threat of punishment can help deter exploitative behavior. Understanding when and why people punish helps to explain this inefficient behavior and may help in designing mechanisms that prevent it or harness its power to stimulate cooperation. This study investigates one particular feature of punishment, namely the timing of the punishment decision. Provided that no new information arrives during the delay, there seems to be no reason why a difference in timing should influence the decision. This point of view disregards the impact of the emotional state the decision maker is in. Emotions, or more generally visceral factors, are known for

their ability to override deliberative reasoning and have been shown to be very influential in decision making (Zajonc, 1984; Damasio, 1994; LeDoux, 1996; Loewenstein, 1996). When in a state of high emotional intensity, or 'hot' state, people often act in ways they themselves view as against their self-interest. When in a less intense, or 'cold', state, it is hard to imagine how one would behave in a hot state, and vice versa. This phenomenon has been dubbed the 'hot-cold empathy gap' and is thought to be particularly present for negative emotions (Loewenstein, 2000). The motivation for punishment arises from a negative experience, when people feel that someone misbehaved. This can come with feelings of disappointment, anger, or wrath and therefore induce the urge to punish even when doing so is costly to oneself. When the flood of emotions has waned, so may the urge to punish. Investigating that suggestion is the focus of this study. I aim to answer the question: Does costly punishment occur less frequently when a time lag is imposed on the punishment decision?

To answer this question properly we have to establish, first, that people engage in a trust relationship; second, that violation of trust gives a distinct emotional response; and third, that this emotional response is a decisive factor in the consideration of punishment. In order to study this, I conducted a behavioral laboratory experiment. Exactly how trust and its violation are measured will be explained in Section 2. To reflect on the impact of emotions, I need a situation in which people can feel angry or disappointed about another person's behavior and subsequently have the option to act on these feelings in a costly manner. The trust game or investment game (Kreps, 1990; Berg et al., 1995) is a very useful vehicle for this. The game allows for engaging in a relationship of trust, and for the violation of this trust. The main outline of the game applied in this study is as follows (see Section 3 for more details). Two players are anonymously matched in a pair. One of them – called the Investor – is endowed with an amount of money and has to decide how much to transfer to the other participant – called the Trustee. The transferred amount is multiplied by three. The Trustee then has to decide how much to transfer back to the Investor. A variant of the game adds a third stage in which the Investor has the possibility to punish the Trustee by giving up some of his/her own money in order to reduce the other person's earnings. When the decision to punish has to be made directly after observing the Trustee's transfer, the reaction to this transfer and the decision to punish almost coincide. In other words, when contemplating punishment the Investor is still in the emotional hot state that might arise from observing his/her counterpart's choice. In the control condition, the game is played exactly as just described. In the treatment

condition, participants perform an intermediate task before Investors have the opportunity to punish. This task consists of three reasoning questions and takes approximately five minutes. Its main purpose is to cause a small delay between observing the outcome of the mutual transfers and the punishment decision. Specifically, the intermediate task serves to make time for any emotions to wear off and to bring the Investors into a cold state before they decide on punishment. If punishment occurs significantly less often in the treatment condition, the folk wisdom to count to ten first indeed contains a truth.

This study is a pilot study, which means that limited resources were available in terms of location, money, and time. I will discuss the consequences of these practical restrictions in Section 7. I originally planned to collect 48 observations (48 people in 24 pairs playing the game twice). If Trustees sent back zero about half the time (as is common), there would be ten situations per treatment that are likely to provoke punishment. This is still a very small sample, but it could be enough to offer an indication of the hypothesized treatment effect. While running the sessions, I noticed that Trustees almost always chose the equal split, leaving no motivation for punishment. Since it was clear that with this pattern I would never observe enough punishment situations to be able to do statistical testing, I terminated the experiment. Hence the amount of data collected is not sufficient to answer the research question. To answer the question whether there is a difference in the frequency of punishment between treatments, a first requirement is that punishment is observed at all. I observed zero instances of punishment, so I cannot tell whether there is a systematic difference between the control and treatment conditions. The presentation of the results will therefore focus more on how the observed behavior can be explained than on delivering an answer to the research question.
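As a back-of-the-envelope illustration of what such a small sample could, in principle, detect, the following sketch (my own illustration, not part of the thesis materials; the punishment counts are hypothetical) computes a one-sided Fisher exact test on a 2x2 table of punish/not-punish counts per condition:

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]:
    probability of a table at least as extreme (a' >= a) under independence."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    p = 0.0
    # Sum hypergeometric probabilities over tables with a' >= a
    # (i.e. at least as much punishment in the hot condition).
    for a2 in range(a, min(row1, col1) + 1):
        c2 = col1 - a2
        if 0 <= c2 <= row2:
            p += comb(row1, a2) * comb(row2, c2) / total
    return p

# Hypothetical counts: 10 punishment-provoking situations per condition;
# 6 punished when hot, 1 punished when cold.
p = fisher_exact_one_sided(6, 4, 1, 9)
print(round(p, 3))  # roughly 0.029, i.e. significant at the 5% level
```

This suggests that with ten provoking situations per treatment, only a fairly large difference in punishment frequency would have been statistically detectable.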

The remainder of this paper is organized as follows. Section 2 presents an overview of the existing literature on trust games, punishments, and hot-versus-cold studies. Section 3 describes the experimental design and derives hypotheses. Section 4 provides a theoretical analysis. Section 5 proposes a set-up for empirical analysis. Section 6 presents the results. Section 7 discusses the results and procedures of this study and provides suggestions for improvements in future research. Section 8 concludes.

2. LITERATURE REVIEW

Traditionally, economics has assumed that people are rational decision makers with perfect foresight who pursue only their self-interest. It is now widely recognized that considerations other than selfish ones play an essential part in both social and business life. A large body of experimental evidence has established that people do take trust, reciprocity, and fairness into account (see Camerer, 2003 for a review). Several simple games have been developed that mimic the key features of economic interactions and that can measure social preferences (i.e. behavior that deviates from the rational and selfish choice). Consider for example the ultimatum game, in which one player, the proposer, offers a split of a certain amount of money between himself and a second player, the responder. The responder can then accept this split or reject it, in which case both players receive nothing. Although accepting any offer is always the payoff-maximizing choice in the traditional economic sense, people often reject offers in which they would get only a small part of the pie. Rejection in the ultimatum game can be viewed as punishment, since the responder pays a cost (foregoing a positive amount of money) such that the other player is left with no earnings as well. Many proposers seem to anticipate this and offer (close to) a fair split. Proposers offer substantial amounts even in dictator games, where the responder does not have the option to reject (see again Camerer, 2003). In gift-exchange games, two players sequentially choose how large a gift they want to send to each other. Second movers often reimburse large amounts sent by first movers (e.g. Charness, 2004; Fehr and Gächter, 2000). Fehr, Gächter and Kirchsteiger (1997) added a third stage to the game in which first movers can reward or punish the other party. First movers utilized this option frequently, and the amount sent by second movers increased dramatically.
The availability of punishment has been shown to sustain cooperation in public good games as well (Fehr and Gächter, 2000). Egas and Riedl (2008) find similar results, but stress that this is only the case if the cost-impact ratio (the ratio between the cost borne by the punisher and the impact on the player being punished) is low enough. Taken together, these results clearly demonstrate that people have a taste for equality or fairness, at least to some extent. This taste is such an important motivation that people are willing to incur losses (i.e. pay for punishment) to express it.

Since this study is particularly concerned with trust and with punishing a violation of trust, I will look at these concepts more closely. Recall that the trust game allows one to measure trust and trustworthiness. In defining trust, I follow Lukas and Walgenbach (2010): “Trust in persons means accepting that they may take

advantage of a trustor although the trustor expects them not to do so.” This definition captures two distinctive features of trust. First, to trust means to recognize that there is a chance of a loss and still show the “willingness to be vulnerable” (Mayer et al., 1993) by engaging in the act. Second, a trustor can only form an expectation about the trustworthiness of the trustee and can never be sure of what the trustee will do. In other words, there exists “irreducible uncertainty” (Nootenboom, 2002). An illustration: if you knew that it is in the other party's own interest to repay your 'trust', your act would be merely calculated cooperation rather than trust. Knowing that repayment is in the other's self-interest reduces the uncertainty characterizing trust. Note that trust in this definition requires the repayment of trust not only to be uncertain, but also to be an act that deviates from selfish maximization of monetary payoff. For an act to truly be of a trusting nature, then, the actor must recognize its riskiness and nevertheless take the leap of faith in the other party's willingness to share the benefit of the exchange. In the trust game, the Investor faces a chance of a gain (receiving more than his endowment) and a chance of a loss (losing his endowment) if he decides to transfer the money. It is clear that it is mutually beneficial to engage in a relationship, but there is no explicit rule that obliges either party to share the money. Therefore, this game is suitable for measuring trust. Some models (see Watson (1999; 2002) and Sobel (1985)) explain cooperation in trust settings as the equilibrium behavior of rational self-regarding actors, because sustained exchange of trust and trustworthiness is beneficial to each individual actor as well. However, it is again questionable whether this can be called trust and trustworthiness if it is motivated by self-interest.
Such a theory of conditional cooperation (“I trust you as long as you are trustworthy” and vice versa) cannot explain trust in one-shot interactions, where there is no 'sustained exchange'. The Trustee faces a social dilemma: s/he can either be fully self-regarding and display opportunistic behavior, or s/he can display trustworthiness and share the proceeds of the investment with the Investor. Note again that the Trustee has no monetary incentive to be trustworthy. This is a classic example of moral hazard, because there is no formal enforcement mechanism. I will call a Trustee trustworthy if his/her action realizes a gain for the Investor, and untrustworthy if it realizes a loss. In case the Trustee pays back exactly the amount invested by the Investor, I call this mildly trustworthy, since it does not make the Investor worse off than s/he was.

Numerous studies employing the trust game typically observe the following pattern (see again Camerer, 2003 for an extensive review). The large majority of Investors

send a positive amount of money, on average half of their endowment. Trustees display a widely dispersed pattern, with substantial numbers of people acting purely selfishly as well as trustworthily. On average they send back the original investment, which means that trust does not pay in most cases. Recall that the rational and selfish choice is to send no money at all in both roles. That theory clearly cannot explain the general findings. Why do people trust and repay trust? Engaging in a trust relationship is mutually beneficial. Under the right circumstances, actors with a tendency to be trusting and trustworthy also in one-shot interactions have an evolutionary advantage (see for a recent argument Manapata et al., 2013). Even when the circumstances in the laboratory are different, participants bring such a mechanism with them. The evolutionary explanation is strengthened by the finding that there exists a biological basis for trust. Kosfeld et al. (2005) found that Investors who inhaled oxytocin, a hormone implicated in social bonding, transferred larger proportions of their endowment. Trustee behavior was unaffected by oxytocin. Trustworthiness in trust games is often explained in terms of altruism, social norm adherence, or reciprocity. Direct reciprocity seems to explain only a small part, since the amount Trustees send back is only slightly larger than the amounts they transfer as Dictators (Cox, 2009).

The availability of punishment significantly enhances both trust and trustworthiness (at least when the punishment is severe enough), and this option is commonly used (see e.g. Andreoni et al., 2003; Rigdon, 2009). Where does this demand for punishment come from? In repeated interactions, punishment can be used for strategic reasons. If the punishment is severe enough, a displayed willingness to punish can convince another party to cooperate with you in the future. Punishment in a one-shot interaction has no such function. It is a wealth-destroying act that serves the punisher no benefit other than perhaps the gratification of releasing his/her anger, and still it is frequently observed. There is evidence that punishing is indeed used for such reasons. Based on self-reports, unfair decisions are often punished because they lead to an angry reaction (Camerer, 2003). Anger is found to be the predominant driver of punishment in social dilemma games, with the intensity of anger related to fairness perceptions (Hopfensitz and Reuben, 2009). In support of this view, responders in an ultimatum game rejected significantly less often when they had another way to display their emotions, namely by sending a free message along with their decision (Xiao and Houser, 2005). These messages were indeed used to express negative emotions following an unfair offer. Further evidence stems from Sanfey et al. (2003), who looked at brain activity during the decision to reject or accept an offer in, again, the

ultimatum game. Of particular interest here is the pattern found in the anterior insula, a brain region known to be involved in processing negative emotions in particular. Anterior insula activation was responsive to the degree of unfairness, with more unfair offers leading to more activation. Moreover, activity was predictive of rejection rates: subjects with higher anterior insula activation to unfair offers rejected those offers more often. This suggests that an unfair outcome triggers an emotional reaction, and that this reaction indeed influences the decision to punish. De Quervain et al. (2004) found that people derive satisfaction from punishing others who violated their trust. Investors punishing untrustworthy Trustees showed elevated activation in the dorsal striatum, a brain area implicated in the processing of rewards. Also, Investors with higher activations were willing to pay more for punishment.

The literature discussed in the previous paragraph seems to agree that feelings of anger and satisfaction are important determinants of the decision to punish. How does the timing of the punishment decision come into play? From a fully rational point of view, postponement as such (without new information) should not influence a decision. However, it has been shown that the timing of a decision does make a difference. For example, the willingness to invest in risky gambles increases when the decision is made for multiple periods in the future compared to each period in turn (Gneezy and Potter, 2003). Evidence from several social dilemma games shows that contributions are higher when people decide simultaneously rather than sequentially, even without information on the prior moves (e.g. Rapoport, 1997; Abele and Ehrhart, 2005). The hot-cold state paradigm relates timing to emotional state. The influence of emotional state is formally included in dual-system theories of decision making that distinguish between two types of cognitive processes (Epstein, 1994; Stanovich and West, 2000; Kahneman, 2011). The first type of reasoning is fast, fairly effortless, and automatic - coined “System 1” processes - while the second type is slow, intentional, and requires conscious attention - coined “System 2” processes. System 1 is primarily driven by intuition and is prone to heuristics and emotions, whereas System 2 is driven by rules and logic and moderates deliberative reasoning. Timing can influence which system has the upper hand. Imposing high time pressure, thereby suppressing System 2, resulted in participants being more likely to reject offers in an ultimatum game (Cappelletti et al., 2008). Delaying a decision is thought to mitigate the influence of System 1 and allow more time for System 2 to kick in, thus leading to more rational decisions (e.g. Adler, Rosen, and Silverstein, 1998).

The distinction between hot and cold states is generally created by one of two methods. Some studies define as a hot state the situation in which the decision maker moves after observing the actual action of the first mover. Those studies apply the strategy method to observe cold-state behavior, i.e. they ask decision makers to state their action in response to hypothetical first-mover actions. Other studies aim to induce a cold state by imposing a time lag between the observation of the first mover's action and the final response to that action. Without a time lag, decision makers are said to be still in a hot state. Results on the effect of 'temperature' (hot versus cold) on decision making are mixed, both between and within studies using the methods just explained. The effect seems to depend on other experimental parameters as well. Employing the strategy method, Charness and Brandts (1998) studied two sequential games that can measure the extent to which players give a positive (cooperative) or negative (selfish) response to a positive or negative action. They found no systematic difference in behavior between states. In a study with comparable games, also applying the strategy method, Brosig and Weimann (2003) do observe more punishment (negative responses) with decision makers in a hot state, but only when the cost of punishment is low enough. In my opinion the strategy method might fail to create a distinction between a hot and a cold state. Participants using the strategy method have to decide immediately after being presented with a scenario. If they consider the hypothetical cases by vividly imagining themselves in such a position, they could still be in a hot state. The time lag method better reflects the notion that emotions arise quickly and then wear off over time. Still, the results here are mixed as well. Oechssler et al.
(2008) found that the rejection rate of unequal splits in an ultimatum game dropped by 25 percent when responders were given the unanticipated opportunity to revise their choice 24 hours later. This effect was present only when the stakes were high-prize lottery tickets, and not when they were small amounts of cash. 1 In a similar vein, Grimm and Mengel (2011) observed rejection rates going down by almost half with a cooling-off period of ten minutes. This suggests that a cooling-off period induces more rational behavior, at least when the incentive is high enough. In contrast, Bosman et al. (2001) do not observe a cooling-off effect. Here, half of the responders in an ultimatum game were asked to make their decision one hour after reviewing the proposer's offer. Neither behavior nor self-reported emotions upon deciding differed from those who responded directly. This suggests that the impact of emotions on decision making can show up again with the

1 The expected value of both incentive schemes was the same.

same magnitude when one actually has to make a decision. In line with this finding, a ten-minute cooling-off period did not make players respond closer to the profit-maximizing choice in a Stackelberg game (Cardella and Chi, 2012). These authors themselves suggest that the intended cooling-off period may have acted as a heating-up period instead, by fostering the emotional response rather than damming it. Neo et al. (2013) do find an effect of imposing a time delay on responder behavior in ultimatum games, but they do not observe such an effect in trust games. Which mechanism is precisely at work during an intended cooling-off period is not well articulated in most studies. The general idea rests on the so-called “primacy of affect”: the notion that people can identify an affective reaction more rapidly than they can produce a deliberate response (Zajonc, 1984). Inserting a cooling-off period is one of the most commonly applied methods to reduce anger, in research and in practice (Adler et al., 1998). Taking a break allows deliberation to catch up with affect. This will not work if the time is spent reliving the event that triggered the emotion (Goleman, 1995). VanOyen Witvliet et al. (2001) examined the emotional and physiological effects of rehearsing hurtful memories and harboring grudges (i.e. unforgiving thoughts) compared to taking perspective and granting forgiveness. They found that unforgiving thoughts induced more aversive emotions, as revealed by significantly higher corrugator (brow) electromyography (EMG) activity, skin conductance, heart rate, and blood pressure. This affirms that active remembrance has an emotional impact. For a cooling-off period to have the intended effect, actors need to devote their attention to other matters, to reason away the urge of an outburst, or to focus on calming their nervous system by recalling pleasant experiences (Doner, 1996).
One explanation for the mixed results on the effect of a time lag described above (cooling off versus heating up) could then be that some studies did not succeed in diverting people's attention away from the other player's behavior. A second explanation could be that the impact of emotions on decision making is more robust than their fast and flaring nature suggests. This is consistent with the “law of conservation of emotional momentum” by Frijda (1988), which states that “emotional events retain their power to elicit emotions indefinitely, unless counteracted by repetitive exposures that permit extinction or habituation to the extent that these are possible.” This means that emotional impact cannot be mitigated by time as such, but that re-evaluation of the eliciting event can mitigate it. 2

The literature reviewed here shows that the hot-cold effect is not well understood

2 It would be interesting to incorporate such intermediate activities into future research.

yet. It is not clear under which circumstances a cooling-off period is effective. Several questions call for investigation. How long a time lag is sufficient? Is the activity during the break of importance? Does it matter whether the decision is delayed or whether it can be revised? Is a certain type of decision making (for example competition, or bargaining) perhaps less susceptible to a cooling-off effect? The current research aims to contribute to this quest by focusing on punishment and the effect of delaying the decision with a short reasoning task.

3. EXPERIMENTAL DESIGN AND HYPOTHESES

In this section I describe the experimental design and give a detailed account of the recruiting procedures, the setting, and the communication during the sessions. All original statements, both written and verbal, that were communicated to the participants at any time during the experiment are available to the reader in the Appendices (in Dutch). These include the instructions, answer sheets, and my spoken notifications.

3.1 What is measured

The experiment is designed to measure money transfers by Investors and Trustees (reflecting trust and trustworthiness respectively) and punishment by Investors, as will be explained in more detail in Section 3.2. Aside from those, I incorporated two additional measurements: the Self-Assessment Manikin (SAM) and the Cognitive Reflection Test (CRT). The SAM is a very quick and easy method to measure an affective response. It asks respondents to indicate their present state on two pictorial scales, representing valence (the degree to which a respondent is happy) and arousal (the degree to which a respondent is excited). 3 The SAM method has been shown to be as good as or even better than more extensive and time-consuming methods (Bradley and Lang, 1994). The brief and straightforward nature of the SAM makes it well suited for use here, for two reasons. First, a short question is less likely to influence the subsequent punishment decision than a longer list of questions on emotional state. Second, it allows me to introduce the SAM only to the Investors, at the time they are asked to complete it. This way, it does not draw the Trustees' explicit attention to the potential emotional response of the Investors. Of course Trustees can think of the emotional response of their counterpart themselves, but then this happens in a natural manner. Also, by concealing the SAM from the Trustees, they do not receive unnecessary information during the instructions

3 The original SAM also includes a third scale for 'dominance'. I left this scale out to keep the time needed to complete the SAM as short as possible. Dominance is not relevant in the current context.

which could be confusing. Note that this is not deception, since I do not hide information about action sets or payoff schemes. The answers to the SAM have no consequences for the outcome of the interaction at all. The CRT is a three-item test designed to assess a specific cognitive ability: the ability to suppress an intuitive and spontaneous but wrong answer in favor of a reflective and deliberate right answer (Frederick, 2005). Consider for example the following question: “A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?”. The immediate intuitive answer is 10 cents, but this is easily found to be wrong if one takes a moment to check that answer. 4 The test aims to measure the degree of cognitive reflectiveness, with low scores pointing to 'cognitive impulsiveness'. This spectrum from reflectiveness to impulsiveness relates to the dual-system theory mentioned in Section 2. High (low) CRT performance corresponds to a high (low) ability to control the input from System 1. CRT performance has been shown to correlate with several behavioral traits. For example, people with high scores seem to be more patient (i.e. have lower implied discount rates) and more risk neutral (Frederick, 2005). High scorers are also less susceptible to a number of behavioral biases such as the reflection effect, the conjunction fallacy, conservatism bias, the base rate fallacy, the endowment effect, overconfidence, and anchoring (Kahneman and Frederick, 2006; Hoppe and Kusterer, 2010).
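The bat-and-ball arithmetic can be made explicit: the constraints are ball + bat = 1.10 and bat = ball + 1.00, so the ball costs (1.10 - 1.00) / 2 = 0.05. A minimal check (my own illustration; the function name is mine, not part of the CRT materials):

```python
# Verify a candidate answer to the bat-and-ball question against
# both stated constraints: ball + bat = 1.10 and bat = ball + 1.00.
def check(ball):
    bat = ball + 1.00
    return abs(ball + bat - 1.10) < 1e-9

print(check(0.10))  # intuitive answer: False (total would be $1.20)
print(check(0.05))  # reflective answer: True
```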

Finally, participants are asked to complete a questionnaire covering background information on age, gender, education, familiarity with game theory, and fairness opinions. I also asked participants to state their considerations for each decision they made. This turned out to be insightful for understanding the observed behavior, as I will describe in Section 6.

3.2. Parameters of the game and treatment

The experiment features two conditions: the control condition, which I call T0 or 'Hot', and the treatment condition, which I call T1 or 'Cold'. Each participant plays either T0 or T1, which entails a between-participants design. A within-participants design would open up the possibility of round effects and learning effects that blur the treatment effect. These could in principle be checked for by applying different orders, but the modest number of participants makes it hard to reliably test for order effects. Which condition was assigned to a session was simply determined by the schedule. The first session was assigned T0, and thereafter conditions alternated.

4 The correct answer is 5 cents.

One session consists of two parts. Participants are informed that there will be a Part 2, but not what that part constitutes until Part 1 is over. In fact, Part 2 consists of the same game but with a different partner. At the beginning of Part 1, participants are explicitly informed that they will be matched randomly with another person in the room, and that they will encounter this person only once during the entire experiment. This information reveals something about Part 2, namely that participants will again be matched with someone. I nevertheless share this information to emphasize that the interaction in both parts is one-shot. At the start of Part 2, it is again made clear that everyone will meet someone other than their Part 1 partner.

All participants are assigned a show-up fee of 2 euros. Investors are endowed with an additional 5 euros of 'game-play money'. 5 The game is explained in steps. First, I describe the control condition T0:

- In Step 1, the Investor has to decide whether s/he wants to transfer the game-play endowment to the Trustee or not. If s/he transfers, this amount is tripled by the experimenter, so the Trustee receives 15 euros. If the Investor does not transfer, the game ends. Both players do still participate in Step 4.
- In Step 2, the Trustee is informed about the Investor's decision. If the endowment is transferred, the Trustee has to decide how much money s/he wants to transfer back to the Investor. S/He can choose to send back 0, 5, or 7.50 euros. The Investor is directly informed about the Trustee's decision.
- In Step 3, the Investor can decide to punish the Trustee, provided that s/he transferred in Step 1. 6 First, his/her emotional response to the Trustee's action is measured by means of a SAM. Then the Investor decides whether to punish. If s/he does, the punishment lowers the Trustee's payoff by 3 euros at a cost of 1 euro to the Investor. The Investor can always do this, even if s/he transferred the endowment and got nothing in return; in that case, the show-up fee prevents him/her from ending up with negative earnings. This is carefully explained to the participants. Any proceeds from punishment are paid back to the experimenter.
- In Step 4, all participants individually answer three quiz questions (more details follow in Section 3.3).
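The payoff structure described in these steps can be sketched as follows (my own illustration; the function and variable names are mine, not part of the experimental materials; amounts are in euros and include the show-up fees):

```python
def payoffs(transfer, back, punish):
    """Final earnings (Investor, Trustee) for one play of the game.
    transfer: True/False; back: 0, 5, or 7.5; punish: True/False."""
    investor, trustee = 2 + 5, 2   # show-up fees; Investor also holds the 5-euro endowment
    if transfer:
        investor -= 5
        trustee += 15              # the 5-euro transfer is tripled by the experimenter
        investor += back
        trustee -= back
        if punish:
            investor -= 1          # cost of punishing, paid by the Investor
            trustee -= 3           # impact of punishment on the Trustee
    return investor, trustee

print(payoffs(True, 7.5, False))   # equal split: (9.5, 9.5)
print(payoffs(True, 0, True))      # betrayal punished: (1, 14)
print(payoffs(False, 0, False))    # no transfer: (7, 2)
```

Note that even in the worst case for the Investor (transfer, nothing back, punish), earnings stay at 1 euro, which is how the show-up fee rules out negative earnings.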

The action space for all three decisions (transfer, back transfer, punishment) is deliberately restricted to specific amounts instead of letting participants freely

5 In retrospect, the chosen endowment setting creates an unbalanced playing field and is thereby possibly a confounding factor. I come back to this in Section 7. 6 Note that the rare but theoretical case in which the Investor does not transfer the game-play endowment but nevertheless punishes is hereby ruled out. This protects the Trustee from negative earnings.

choose an amount in an interval. I acknowledge this restricts freedom and might nudge people toward a particular decision by making certain allocations focal (especially for Trustees). The advantage of a restricted action space is that it is easier to interpret what a given action could mean, both for the actor's partner and for me as a researcher. The first transfer is binary (transfer the full endowment or nothing) so that a transfer is clearly a sign of trust, or at least of willingness to engage in a relationship. Any other positive transfer is open to multiple interpretations: on the one hand it is a sign of trust because money is given out of hand, but on the other hand a certain amount is kept from the other player, which can be seen as a sign of distrust. Regarding the second transfer, I chose these amounts because they correspond to different motivations the Trustee may have: not reciprocating/maximizing own earnings (0), reimbursing the investment (5), and sharing the profit and/or equating the payoffs (7.5). 7 The punishment decision is again binary. This cannot reveal the strength of the desire to punish, but that is not the goal. The main objective of this research is to see whether the timing of the punishment decision influences the propensity to punish, not whether it influences the degree to which people punish. 8 Also, with the chosen punishment structure it is easy to prevent negative earnings without the need to give Investors a much larger endowment than Trustees.

The treatment and control condition differ in one respect: the order of the last two steps. In T0, the punishment decision is carried out directly after Step 2, as described above. In T1, the punishment decision is delayed until after the answering of the three questions, i.e. Steps 3 and 4 change places. Hence, the steps in T1 look like this:

- In Step 1, the Investor decides whether to transfer the game-play endowment. If s/he transfers, the Trustee receives the tripled amount of 15 euros. If the endowment is not transferred, the game ends. Both players still participate in Step 3.
- In Step 2, the Trustee is informed about the Investor's decision and decides how much to transfer back. The Investor is directly informed about the Trustee's decision.
- In Step 3, all participants individually answer three quiz questions.
- In Step 4, the Investor completes the SAM and can decide to punish the Trustee.

7 See Section 7 for comments on the possible effects of this action space and its alternatives. 8 This is an interesting follow-up question for future research.

The time devoted to the three individual questions serves in T1 as a cooling-off period. Solving the questions is a short but engaging task, meant to pass some time and take especially the Investor's mind off the game for a while. The Trustees also perform this task, although they are not the ones who need to cool off. They cannot leave the room, since that would reveal information about who played which role; doing nothing might be boring, and the task provides them with something to do. In T0, the three questions are added for two reasons: 1) to keep the duration of the session and the maximum earning possibilities equal between the control and treatment group, and 2) to have data on CRT performance in the control group too.

3.3. The quiz questions
For the intended cooling-off effect to take place in T1, we need an intermediate task that takes a couple of minutes and is engaging enough to spend effort on. The quizzes are therefore incentivized. Subjects can earn 0.50 euro per correct answer, that is, 1.50 euro in total for this task per part. I do not provide feedback on performance. 9 Because participants play the game twice (Parts 1 and 2), I need two different but comparable assignments to serve as the intermediate task. In Part 1, I use the Cognitive Reflection Test (see above). In Part 2, I use another set of questions that also appeals to reasoning and is also famous for pointing at common failures in logical reasoning (adopted from Kahneman, 2011).

3.4. Recruitment and procedures
Participants were recruited through my own social network. I aimed to recruit people only indirectly related to me, to prevent potential experimenter effects and a too informal atmosphere as much as possible. Most participants were second-degree connections, i.e. colleagues, friends, or acquaintances of my friends and family members. As a consequence, participants within (but not across) each session knew each other (which will later turn out to be potentially problematic, see Section 7). People were invited by e-mail and could sign up through a website I moderated. All sessions were carried out between July 30 and August 13, 2013 and were held at locations close to the participants. Before each session, I requested an empty room with desks and chairs (a large office or a meeting room) to be available for me to create a temporary laboratory. I arranged the seats such that participants could not observe each other's instructions and actions. As I prepared the room, participants gathered in

9 Subjects were very eager to hear the correct answers to the questions. This is a clear sign of engagement with the task. With those interested, I discussed the correct answers afterwards.

another room or in the hallway. I then started the experiment in a formal manner, as is common in research labs. All participants adhered to the instructions very carefully, and remained silent and concentrated throughout the whole experiment.

Upon entering the lab, participants are randomly assigned to a seat by blindly drawing a card with a table number from a stack. For each seat I predetermined the role and matching protocol. Participants are privately informed about their role: half play the part of Investor and half play the part of Trustee. 10 All keep their role throughout the whole experiment. Investors are matched with Trustees into pairs. I use a strangers matching protocol, which means that players are anonymously matched with a different counterpart in each part of the experiment. Each pair therefore has only a single-shot interaction, although the game itself is repeated. Such a matching protocol rules out strategic motives such as reputation building.

The experiment was conducted manually; participants indicated their answers and decisions on paper answer sheets. When applicable, participants were privately informed by me about their counterpart's actions. I transferred the information by ticking a box on the answer sheets. 11 Trustees received no feedback about the punishment decision. The instructions were read out loud in order to create common knowledge and to urge participants to take the time to go through them. Participants were invited to ask questions for clarification; any questions were answered privately. 12 In the questionnaire at the end of each session, I explicitly asked whether any part of the instructions or answer sheets was unclear. Nobody indicated so. 13 After the instructions, each participant receives the answer sheets for the part at hand. A set of answer sheets consists of three to four pages stapled together.
The first page displays Steps 1 and 2; the second and third pages display Steps 3 and 4 in the applicable order. In both the control and the treatment version, participants are able to review the history of the game in that part by flipping back to previous pages. This possibility is indicated on the answer sheet for the punishment decision. The answers of Part 1 are collected before the start of Part 2 so that participants cannot compare both parts on paper, only from memory. The interested reader is again referred to the Appendix to review the

10 To keep the language used in the experiment as neutral as possible, the roles of Investor and Trustee were labeled Player A and Player B respectively. 11 The interested reader is again referred to the appendices for the instructions and sample answer sheets. 12 I received only one question: 'What goal should I pursue? Making the most money, or …'. I answered: 'That is up to you. You may choose a goal yourself.' 13 There were two comments posted here: 1) one participant found it unclear what his objective had to be (this was not the same participant that asked me about this), 2) one participant wondered why I measured 'arousal' in the SAM.

original answer sheets. At the end of the experiment, I randomly selected one pair in one part of the experiment for payment according to their earnings in that part. Normally one prefers to pay all participants, but my budget did not allow for that. Theoretically, the incentive to act according to one's true preferences stays in place, since the relative outcomes of the game are not altered by the random payment scheme. However, I acknowledge that the earning prospect is heavily diluted by this additional lottery. Payment was executed privately. All participants received an envelope containing a receipt stating their payment and (if applicable) their payment in cash. Participants were asked to check their payment and sign the receipt. This concluded the session.
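The strangers matching described in this section can be sketched in a few lines. This is my own illustration, not the thesis's procedure: the thesis only states that each seat's matching is predetermined and that no pair meets twice, so the round-robin shift below is one hypothetical way to guarantee that.

```python
def stranger_matching(n_pairs):
    """Sketch of a strangers matching protocol: every Investor meets a
    different Trustee in Part 2 than in Part 1.

    A simple round-robin shift guarantees that no (Investor, Trustee) pair
    is repeated; the actual seat-based protocol used in the experiment may
    differ.
    """
    if n_pairs < 2:
        raise ValueError("need at least two pairs to avoid rematching")
    part1 = [(i, i) for i in range(n_pairs)]                  # (Investor, Trustee)
    part2 = [(i, (i + 1) % n_pairs) for i in range(n_pairs)]  # shift by one seat
    return part1, part2
```

With three pairs per session (six participants), `stranger_matching(3)` yields disjoint pairings for the two parts, so every interaction stays one-shot.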

3.5. Hypotheses
Investors who have entrusted their endowment have the possibility to punish the Trustee. Even though punishment is costly, Investors may have a desire to punish when they feel Trustees have mistreated them. That is, punishment is assumed to be mainly an emotional reaction to the violation of the Investor's norm or expectation. Delaying the decision is thought to bring the Investor into a cold state and thereby mitigate the influence of emotions. Therefore, I expect to observe less punishment when a time lag is imposed on the punishment decision.

Hypothesis I: Punishment occurs less frequently in T1 than in T0.

If punishment is indeed assigned less frequently in T1, the next step is to figure out which mechanism causes this. As explained earlier, the effect of a time lag is assumed to run through emotional state. If this is correct, this pathway can be identified in two ways. First, participants who punish are expected to be more emotional than those who do not punish. This should show up in the SAM scores regardless of treatment. Second, participants experiencing the time lag are expected to be in a more neutral emotional state than those who experience no delay. The SAM scores should reflect this as well, regardless of the punishment decision. Since both punishment and emotional response are likely to vary with Trustee behavior, these expectations ought to be tested conditional on the Trustee's transfer.

Hypothesis IIa: SAM-scores of non-punishers are closer to zero than the SAM-scores of punishers, controlling for Trustee behavior.

Hypothesis IIb: SAM-scores in T1 are closer to zero than SAM-scores in T0, controlling for Trustee behavior.

The design allows for one more way to reveal the mechanisms underlying punishment behavior by using the data on CRT performance. Dual-system theory describes the emotional system as an impulsive system. If the decision to punish (importantly, in the absence of strategic motives 14 ) is mainly induced by an intuitive and emotional System 1-response, participants who are more capable of suppressing impulses are expected to punish less (across treatments).

Hypothesis III: CRT score is negatively correlated with punishment.

I do need to make a precautionary remark concerning the last hypothesis. In the treatment condition T1, the CRT takes place before the punishment decision and hence compromises the purity of the measurement of emotional response. It is possible that the CRT induces emotion in itself and that this subsequently alters punishment behavior. If this is the case, it would show as different correlations of CRT score and punishment between the treatment and the control condition. To fully address this concern, one should separately test whether the CRT on its own induces changes in emotional state or can be regarded as 'neutral' in terms of emotions.

4. THEORETICAL ANALYSIS
Let me present a formal description of the game. Investors and Trustees are called A and B, respectively. Their respective transfers are denoted by X and Y.

Punishment is denoted by Z. Qi states the number of quiz questions answered correctly by player i in Step 3 (in T1) or Step 4 (in T0). Note that a description of one part suffices, since each part represents a single-shot interaction and payment is effectuated in only one part. Below I define the set of players N, the set of all possible histories of the game H 15, the player function P that assigns a player to each subhistory, and the payoff functions πi for all i in N.

Players: N = {A, B}
Histories: H = {(Ø), (0), (5), (5,0), (5,5), (5,7.5), (5,0,0), (5,0,1), (5,5,0), (5,5,1), (5,7.5,0), (5,7.5,1)}

14 In the presence of strategic motives, for example incentives for reputation building, punishing can also be a sign of advanced forward-looking behavior. 15 A terminal history (i.e. a sequence completing all steps) has the general form {X, Y|X, Z|X,Y}.

Player functions: P(Ø) = P(5,0) = P(5,5) = P(5,7.5) = A; P(5) = B

Payoff functions: πA = 2 + 5 – X + Y – Z + 0.5QA and πB = 2 + 3X – Y – 3Z + 0.5QB, with X ∈ {0, 5}, Y ∈ {0, 5, 7.5}, Z ∈ {0, 1}, Qi ∈ {0, 1, 2, 3}

Since each player answers the quiz questions independently, Q can be viewed as an exogenously determined constant with no interaction with X, Y, or Z. However, the possibility that quiz performance somehow influences behavior in the game, or vice versa, cannot be excluded. Consider first the possible influence of quiz performance on behavior. It is not clear in which direction such an influence would run. Participants know that the quiz questions are coming, but do not know what kind of questions they will be or how well they will perform. On the one hand, good performance can enable punishment. Recall that the show-up fee ensures Investors have budget for punishment at all times. Still, imagine that a participant adopts some kind of threshold strategy ("I want to earn at least C euros"). Quiz performance may determine whether this threshold is met and hence whether this participant has a 'surplus' to spend on punishment. On the other hand, good performance may lessen punishment indirectly. The prospect of earning money in the quiz later may make the threshold easier to reach. Trustees may then be inclined to share more money, thereby removing grounds for punishment. Consider second the possible influence of behavior on quiz performance. Participants adopting a threshold strategy who have already met their threshold may be less intrigued by the quiz. This in turn could mitigate the effect of the time lag and hence reduce its effect on the punishment decision. The same type of reasoning goes for participants who have not met their threshold by Step 2. Although this reasoning may seem far-fetched, it cannot be ruled out entirely. To keep the analysis simple, I will nonetheless assume that Q is determined outside the game and can be disregarded in setting X, Y, and Z.

Simplified payoff functions: πA = 2 + 5 – X + Y – Z and πB = 2 + 3X – Y – 3Z
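To make the payoff structure concrete, here is a minimal sketch of the payoff functions (my own illustration; the function and variable names are not part of the thesis's notation). Quiz earnings are included via optional arguments that default to zero, which reduces to the simplified payoffs:

```python
def payoffs(X, Y, Z, Q_A=0, Q_B=0):
    """Payoffs for Investor (A) and Trustee (B), show-up fee of 2 included.

    X: Investor transfer (0 or 5); Y: back transfer (0, 5, or 7.5, only
    meaningful when X = 5); Z: punishment (0 or 1); Q_i: correct quiz answers.
    """
    assert X in (0, 5) and Y in (0, 5, 7.5) and Z in (0, 1)
    pi_A = 2 + 5 - X + Y - Z + 0.5 * Q_A
    pi_B = 2 + 3 * X - Y - 3 * Z + 0.5 * Q_B
    return pi_A, pi_B
```

For example, no exchange at all gives the outcome (7, 2) mentioned below, while full transfer, a back transfer of 7.50, and no punishment gives the equal split (9.5, 9.5).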

4.1. Four common frameworks
The next step in the analysis requires an understanding of the functional form of a player's utility function, i.e. of his/her preferences. Here is where different theoretical frameworks depart from each other. I will discuss four of them.

Standard game theory
The traditional game-theoretic perspective (von Neumann and Morgenstern, 1947; Selten, 1965) assumes actors to be rational, selfish, and utility maximizing (with

one's own monetary payoff as the only argument of the utility function). Provided that these preferences are common knowledge, a Trustee keeps everything s/he receives, and anticipating this, an Investor will transfer no positive amount to begin with. No Investor will punish either, because it is costly to himself and serves no purpose later on, since players meet each other only once. In the subgame-perfect Nash equilibrium (SPNE), then, X = Y = Z = 0 for every player in every part. The corresponding outcome for A and B respectively is (7, 2).

Prediction SPNE: X = Y = Z = 0. Standard game theory predicts no transfers and no punishment.
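The backward-induction logic behind this prediction can be made explicit in a short sketch (my own illustration, using the simplified payoff functions; names are hypothetical):

```python
def spne_outcome():
    """Backward induction under purely selfish preferences (a sketch).

    Uses the simplified payoffs pi_A = 2 + 5 - X + Y - Z and
    pi_B = 2 + 3X - Y - 3Z; returns the equilibrium actions and payoffs.
    """
    # Step 3: punishment costs the Investor 1 euro and buys nothing in a
    # one-shot game, so a selfish Investor never punishes.
    Z = 0
    # Step 2: the Trustee maximizes 2 + 15 - Y over {0, 5, 7.5}, so keeps all.
    Y = max((0, 5, 7.5), key=lambda y: 2 + 15 - y)   # -> 0
    # Step 1: anticipating Y = 0 and Z = 0, the Investor compares payoffs.
    pi_transfer = 2 + 5 - 5 + Y - Z   # = 2
    pi_keep = 2 + 5                   # = 7
    X = 5 if pi_transfer > pi_keep else 0
    pi_A = 2 + 5 - X + (Y if X else 0) - Z
    pi_B = 2 + 3 * X - (Y if X else 0) - 3 * Z
    return (X, Y if X else 0, Z), (pi_A, pi_B)
```

Running the sketch reproduces the no-exchange equilibrium with the outcome (7, 2).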

The observation that people in fact frequently display non-selfish behavior violates the traditional assumptions. This evidence is too abundant to be explained by trembling hands or insufficiently high stakes (Camerer, 2003). This has led theorists to develop models that account for motivational mechanisms other than pure self-interest, i.e. other-regarding preferences. The way decision makers are assumed to have a regard for others (positively or negatively) differs among approaches. I will briefly describe three frameworks that have been shown to be helpful in explaining experimental data (see e.g. Lopez-Perez, 2008; Loewenstein, 2000, for more formal summaries of the modeling). 16

Distributional concerns
Models that capture distributional concerns assume that utility is increasing in own income, but is also affected by a player's income compared to others'. Utility is often decreasing in the difference between the payoff of oneself and the other. Famous formalizations are the theory of inequity aversion by Fehr and Schmidt (1999) and ERC theory by Bolton and Ockenfels (2000). These models feature utility functions that balance advantageous and disadvantageous inequalities in income. Inequity is called advantageous when you earn more than the other player, and disadvantageous when you earn less. Usually it is assumed that the weight put on an advantageous difference in payoff is smaller than the weight put on a disadvantageous difference. Both versions predict that people will sacrifice money to make incomes more equal, at least under certain parameter values. In the present game, ERC theory predicts that participants share the proceeds. The prediction of inequity aversion depends on the exact values of the parameters (i.e. the weights in the utility functions mentioned above). It is certainly consistent with trust and trustworthiness given that the parameter values are high enough. With

16 I exclude here a (plain) care for efficiency, altruism, and warm glow since these notions cannot account for the richness of the experimental evidence.

very low parameters (i.e. almost no care for inequality), no trust can also be an equilibrium. Both theories can also explain punishment in the case of an unequal split, because punishment reduces the difference between payoffs. 17 Under the assumed preferences, this increases utility.

Prediction Inequity Aversion/ERC: i) X = 5, Y = 7.5, Z = 1 iff Y ∈ {0, 5}, provided that the weights on advantageous (for the decisions on X and Y) and disadvantageous (for the decision on Z) income inequalities are sufficiently high

ii) X = Y = Z = 0 in case of sufficiently low weights on both income inequalities

Fairness and reciprocity
Models that capture reciprocity let the regard for the other depend on the other's behavior. Utility functions include terms measuring the kindness and fairness of an offer, such that people prefer being nice to those who have been nice to them, and act unkindly towards those who acted unkindly before. Rabin (1993) showed that a mutually cooperative 'fairness equilibrium' exists when the care for fairness is sufficiently high. Again, for very low parameter values no exchange is also an equilibrium. This approach has been extended by several authors, for instance to sequential games (Dufwenberg and Kirchsteiger, 2004), by including intentions (Falk and Fischbacher, 1998), and by including social welfare considerations (Charness and Rabin, 2002). These models explain trust, trustworthiness, and punishment in terms of positive and negative reciprocity.

Prediction theories of reciprocity: i) X = 5, Y = 7.5, Z = 1 iff Y ∈ {0, 5}, provided that the care for fairness and the degree of intentionality are high enough

ii) X = Y = Z = 0 in case of sufficiently low parameter values

Role of affect
While the models of inequity aversion and reciprocity can explain punishment, neither can explain a hot-cold effect. Although some authors acknowledge that emotional satisfaction may be the proxy mechanism that guides people to behave reciprocally, the three frameworks discussed above do not differentiate between decision-making in

17 A numerical example: the Investor sends the endowment and the Trustee sends back 5 euros. The intermediate outcome (excluding show-up fees) is (5, 10). When punishment is applied, the final outcome is more equal, namely (4, 7). Punishing is explained by sufficient weight put on disadvantageous inequity (under inequity aversion) or by having earnings closer to the average (under ERC). The same reasoning applies when the Trustee sends back 0 euros.

various emotional states. That is, all three frameworks discussed above predict no effect of temperature. With the rising interest of economists in the role of affect, several models have been developed that incorporate dual-system theories of behavior. Most of these attempts are concerned with intertemporal choice and/or self-control (Shefrin and Thaler, 1988; Metcalfe and Mischel, 1999; Caplin and Leahy, 2001; Bernheim and Rangel, 2003). Loewenstein and O'Donoghue (2007) provide a framework that models the hot-cold concept more generally. Behavior depends on the interplay of a cool deliberative system and a hot affective system. In the domain of social interactions, the regard for the other person is determined by the weighted average of two motivational functions. The deliberative system puts a stable weight on the other person's payoff (on some moral grounds), and the affective system puts a variable weight on the other person's payoff that depends on how sympathetic one feels toward the other. The main implication is that ultimately the affective system is decisive. Little sympathy in the affective system moves behavior closer to self-interest, and high sympathy pushes behavior towards altruism.

Although Loewenstein and O'Donoghue do not discuss the case of negative sympathy (antipathy), I see potential to extend their framework in that direction. Negative affect would also push behavior away from self-interest, but in the direction of spiteful behavior. According to this reasoning, costly punishment in one-shot interactions can be explained by a high intensity of negative affect. Lowering the affective intensity – for instance by a cooling-off period – should then move behavior back towards self-interest (no punishment).

5. EMPIRICAL ANALYSIS: A SETUP
Although I do not estimate an empirical model because of insufficient data, I would still like to present my setup for the empirical analysis. As a first concern, I have to account for the fact that each participant plays the game twice. This means that not all observations are independent and that decisions of any one player can be correlated. This causes problems in the estimation of the standard errors and hence of the test statistics. To control for correlated errors, I form twenty clusters, one for each player. These clusters allow for correlated standard errors within each participant, but not between participants. Second, I have to choose a specification. Recall that the main research question asks whether punishment occurs less frequently when a time lag is imposed on the punishment decision. The assumed underlying force of the treatment effect is emotional state. In other words, the treatment effect is assumed

to be mediated by emotional state. Therefore, I would perform a mediation analysis (Baron and Kenny, 1986) to investigate whether variation in punishment can be explained by imposing a time lag and, if so, whether this effect can be (partially) explained by variation in emotional state. Below I prepare the steps of the proposed analysis.

The first step is to test whether there is an effect at all that may be mediated, i.e. whether there is a relationship between punishment and the time lag. The outcome variable is the dummy variable Z, which equals 1 if punishment occurred and 0 otherwise. The variable of interest or predictor is the treatment variable D (for delayed), which equals 1 if a time lag was imposed and 0 otherwise. Emotional state is characterized by valence and arousal, each measured on a separate nine-point scale. The scales run from very happy or excited (+4) to very unhappy or calm (−4), respectively. Let SAM-A denote the score on arousal and SAM-V the score on valence. Notice that a score of zero corresponds to a neutral emotional state in that dimension. I combine the two SAM scores into one measure called A, for Affect, by summing the absolute values of SAM-A and SAM-V. A high A then indicates a non-neutral emotional state (in one or both dimensions), while a low A indicates a state closer to neutrality. The variable A is assumed to be the mediator. The transfer by the Trustee, Y, is included as an explanatory variable, since punishment is foremost a response to Y. A possible treatment effect is assumed to interact with Y, since the time lag is thought to diminish the initial emotional response to the Trustee's choice. To capture this, I include an interaction term of D and Y 18:

Specification I Z = β0 + β1 D + β2 Y + β3 D*Y + ε

Since the dependent variable is binary, a probit estimation is appropriate. To test whether there is a treatment effect, I would then test the following hypothesis:

H0: β1 = β3 = 0

H1: β1 ≠ 0 and/or β3 ≠ 0

If the null hypothesis can be rejected and the sum of the coefficients β1 and β3 is negative, then I would have found statistical support for the notion that imposing a cooling-off period reduces the propensity to punish. If the null hypothesis is rejected but the sum of the coefficients is positive, the time lag had the opposite effect and stirred up the urge to punish. If the coefficients are not significantly different from zero, I would have found no evidence for a treatment effect and the analysis would stop here.

18 Note that X, the initial transfer by the Investor, need not be included, since punishment is possible only when the endowment is trusted.
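The probability model behind Specification I can be written down explicitly. The sketch below (my own illustration, standard library only; in practice one would hand the specification to a statistics package, and clustered standard errors would still have to be computed separately) spells out the probit log-likelihood that an estimator would maximize:

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function (no external packages)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_loglik(beta, data):
    """Log-likelihood of Specification I:
    P(Z = 1) = Phi(b0 + b1*D + b2*Y + b3*D*Y).

    `data` is a list of (Z, D, Y) observations with Z in {0, 1}.
    """
    b0, b1, b2, b3 = beta
    ll = 0.0
    for Z, D, Y in data:
        p = norm_cdf(b0 + b1 * D + b2 * Y + b3 * D * Y)
        ll += math.log(p) if Z == 1 else math.log(1.0 - p)
    return ll
```

At beta = (0, 0, 0, 0) every observation gets probability 0.5, which is a useful sanity check on any implementation.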

If I do find a treatment effect, the next question is why a time lag influences punishment behavior and, in particular, whether the emotional state of the punisher can explain the effect. Therefore, the second step is to see whether there exists a relationship between the assumed mediator A and the predictor D (while again controlling for Y):

Specification II A = γ0 + γ1 D + γ2 Y + γ3 D*Y + ε

This specification can be estimated by ordinary least squares regression. A minimum requirement for a mediation effect of emotional state is a significant relationship between A and D. To verify this, I would test the following hypothesis:

H0: γ1 = γ3 = 0

H1: γ1 ≠ 0 and/or γ3 ≠ 0

The requirement is met if the null hypothesis can be rejected and the sum of the coefficients γ1 and γ3 is non-zero. I would then proceed to the third step. If this requirement is not met, I would have found no evidence for a (non-trivial) mediation effect of emotional state. The hypothesis that a time lag serves as a cooling-off period predicts that the sum of γ1 and γ3 is negative, i.e. that a time lag 'neutralizes' emotion.

In the third step, I test a full model. For a mediation effect to exist, the mediator not only needs to be related to the predictor but also to the outcome variable. Furthermore, the effect of A on Z could depend on Y. This relation could already have shown up as a jointly significant non-zero sum of γ2 and γ3 in Specification II. To capture this possibility, I add an interaction term of A and Y. The full model then looks as follows 19:

Specification III Z = δ0 + δ1 D + δ2 A + δ3 Y + δ4 D*Y + δ5 A*Y + ε

Now the final test on the mediation effect can be performed by testing the following hypothesis:

H0: δ2 = δ5 = 0

H1: δ2 ≠ 0 and/or δ5 ≠ 0

Rejection of the null hypothesis would confirm that emotional state mediates the treatment effect. Emotional state completely mediates the effect of the time lag if the coefficients δ1 and δ4 are jointly insignificant and/or their sum is zero. In that case, the terms involving D can be dropped. Evidence of this mediation effect would support the idea that the emotional state of the decision maker is a determinant of punishment. The economic importance of this can be judged by the

19 Note that an interaction of A and D is not included since their interdependence is investigated in earlier steps of the mediation analysis.

size of the sum of δ2 and δ5. The hypothesis that decision makers in a more neutral emotional state punish less predicts that the sum of δ2 and δ5 is negative.
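The decision logic of the three estimation steps can be condensed into a small sketch (my own summary of the Baron-Kenny-style procedure, not part of the thesis's formal analysis; the flag names are hypothetical):

```python
def mediation_verdict(treatment_effect, lag_affects_emotion,
                      emotion_affects_punishment, direct_effect_remains):
    """Condensed decision logic of the three-step mediation analysis.

    Each flag is True iff the corresponding null hypothesis was rejected:
      treatment_effect           -- Spec I:   beta1 = beta3 = 0 rejected
      lag_affects_emotion        -- Spec II:  gamma1 = gamma3 = 0 rejected
      emotion_affects_punishment -- Spec III: delta2 = delta5 = 0 rejected
      direct_effect_remains      -- Spec III: delta1, delta4 jointly significant
    """
    if not treatment_effect:
        return "no treatment effect; analysis stops"
    if not lag_affects_emotion:
        return "treatment effect, but emotional state not affected by the lag"
    if not emotion_affects_punishment:
        return "treatment effect, but no mediation by emotional state"
    if direct_effect_remains:
        return "partial mediation by emotional state"
    return "complete mediation by emotional state"
```

For instance, rejecting all three nulls while the direct terms involving D become insignificant corresponds to complete mediation by emotional state.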

The specifications above could be further refined by controlling for additional measures. Differences in punishment rates or emotional state can also stem from differences in personal characteristics, which are now captured by the error term. Treatment is assigned (quasi-)randomly and is therefore expected not to correlate with any personal characteristics. For those characteristics that are measured, auxiliary regressions can be run to check whether they are correlated with the explanatory variables. Those that do correlate should be included in the regression to prevent omitted variable bias. If the full model tests significant, it would be interesting to look at whether arousal, valence, or both are affected by the delay. For this purpose, I would re-run Specification II, but now with SAM-A and SAM-V each in turn as the dependent variable. This could indicate in which dimension(s) emotional state is altered by a time lag.

6. RESULTS
As explained in Section 1, the number of sessions was smaller than originally planned, in light of insufficient variation in behavior. I ran three sessions, two with six participants and one with eight, twenty people in total. Since each participant plays the game twice (Parts 1 and 2), this gives twenty observations (though not all independent, see Section 5). A session took approximately forty minutes, with ten minutes spent on reading the instructions. All participants were native Dutch and had enjoyed higher education (higher professional education, in Dutch 'HBO', or university). The average age was 34.7 years (sd 12.187) 20. Sixty percent were male. Thirty percent were familiar with game theory. Six out of twenty participants were selected for payment, earning 10.42 euros on average. Although I cannot answer my main research question, some other insights can be derived. In informal talks with participants after each session, I learned that many of them immediately recognized the tension between greed and fairness. To me this is a sign that these kinds of simple laboratory games do succeed in mimicking 'real' situations, even with people who are completely unfamiliar with experimental economics. I also learned that the questionnaires in which participants were asked to state their considerations during decision making were very helpful. Participants answered these questions seriously; only one person did

20 'sd' is short for standard deviation.

Count to ten first page 24 of 37
not write anything down here. There is little evidence of subgame-perfect gameplay, consistent with many earlier studies. Investors transferred their endowment 80 percent of the time (only two unique 21 Investors did not). The average transfer by Trustees was 6.56 euros (sd 1.983), meaning that trust was generally repaid. Of the sixteen occasions on which Investors trusted, 75 percent of the Trustee decisions showed trustworthiness (nine unique) and 18.75 percent mild trustworthiness (three unique). Untrustworthiness occurred only once, and punishment was not observed at all. In this data set, this can be explained by the high degree of trustworthiness: when a Trustee transfers back the maximum s/he can, there is no rationale to punish him/her. 22 The question then is how the high degrees of trust and trustworthiness can be explained. The high return rate of the Trustees can have several explanations. Trustees might anticipate that Investors will punish them in case of low transfers, and send a high amount to prevent that. Alternatively, they might have some form of other-regarding preferences, such as a desire to equalize payoffs, to adhere to an implicit norm (social or private), or to reciprocate the Investor's trust. Lastly, the choice set employed might have nudged Trustees toward high trustworthiness, since transferring 7.50 euros was the only way to make the exchange pay off for both players 23. To understand the high degrees of trust and trustworthiness, I looked at the self-stated considerations in the questionnaires. I asked for these in a neutral and open manner, without hinting at any particular motives. The statements align remarkably well with the common frameworks of social games. I categorized the answers as follows. Responses that read “try to earn as much as possible myself” and “maximize profits” were coded as 'maximize own earnings'.
Motivations that included “share equally” and “increase joint earnings” were coded as 'share earnings'. Lastly, phrases like “I trust that the other will pay me back” and “return the gesture” were coded as 'reciprocate'. When participants gave an account of different considerations in Part 2 than in Part 1 (when they took different actions in the two parts), these were recorded separately. If they were the same (which participants could indicate in the questionnaire), I counted that explanation twice. All but six decisions (three unique) could be coded in this manner. Three people could not be linked to one of the motives coded above. One participant said

21 By 'unique' I mean that one individual participant made such a decision. For example: I observed four times that the initial endowment was kept. This was done by two Investors who kept it in both parts, hence two unique decision makers. I report the number of unique decision makers to put the frequency with which a particular choice or answer occurred in the right perspective. 22 Note that punishment after receiving 7.50 euros is allowed. In case of extremely competitive preferences (wanting to be ahead more than wanting to earn the most), people may want to do so. 23 I will come back to this notion in Section 7.

Count to ten first page 25 of 37 to offer an equal split out of fear for punishment later. One said to “choose something that I would not choose otherwise”. The third said to consider that s/he “ knew the other people in the room”. Looking at the 34 decisions I could link to one of the three motives, self- stated explanations and behavior matches fairly well (especially for the latter two motives). Again there is little evidence for the prediction of traditional economics; maximizing the own earnings is said to be the objective for only 20.6 percent (seven decisions, four unique). The Investors who state this motivation still transfer their endowment four out of six times (two out of three unique). At first sight, this seems inconsistent since a rational profit-maximizer is expected to keep its endowment. However, transferring your endowment can still be in line with maximizing your own earnings if you believe that the chance the Trustee will return more then the endowment's worth is sufficiently high. The one Trustee who reported he aimed to maximize earnings indeed kept the 15 euros. This was also the only time this occurred. About 47 percent of the coded considerations indicated the desire to 'share earnings'. Trustees who reported so (nine times, seven unique) all indeed decided on an equal split. Investors aiming for an equal share (four times, two unique) all trusted to meet a like-minded person, this was the case twice. The remaining 32.4 percent stated reciprocal motives. Investors betting on their trust being reciprocated all sent their endowments (four times, two unique). They all received at least their endowment back. Trustees who said to reciprocate gave back 5 euro (one time) or 7.50 euros (three times, two unique).

The SAM proved easy to use, even when introduced as a surprise. Recall that valence and arousal were each indicated on a separate nine-point scale, from very happy or excited (+4) to very unhappy or calm (-4) respectively. As is to be expected, valence is highly and significantly correlated with the Trustee's transfer (r=0.6562, p=0.0058, n=16). Responses to a given transfer varied widely among individuals. Scores for arousal after receiving 7.50 euros (n=12) varied between -4 and 1, on average -0.58 (sd 1.564). Scores for valence were more alike, on average 3.33 (sd 0.985). A wide dispersion can also be found in the fairness opinions. Participants (n=20) were asked to indicate how fair they thought the three possible actions by the Trustee were, on a scale from very unfair (1) to very fair (7). The equal split was regarded as fair, with a mean score of 6.3 (sd 1.017). Participants agreed less on the fairness of sending back 0 or 5 euros. Both options received scores between 1 and 6, on average 2.35 (sd 1.545). Investors judged the two lowest options as

slightly less fair than Trustees did, but not significantly so. Quiz performance suggests that neither set of questions was too easy to serve as a distraction. Participants performed slightly (but insignificantly) better on Quiz 2, with on average 1.825 (sd 0.864) correct answers compared to 1.65 (sd 0.813) on Quiz 1. 24 Surprisingly, Quiz 1 performance is negatively correlated with Quiz 2 performance (r=-0.49, p=0.001, n=20). This suggests that the quizzes appealed to different reasoning skills, which does not seem problematic for the current purpose. Even if the two quizzes are not comparable in terms of difficulty, what matters for this research is whether they are comparable in providing a distraction from the game played. I have no indication (such as completion time or average performance) that the quizzes were incomparable in this respect.
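For completeness, the Pearson correlations reported in this section can be computed as follows. The scores below are illustrative placeholders chosen to be negatively related, as in the quiz finding; they are not the actual data:

```python
# Computing a Pearson correlation as reported in the text. The score lists
# are illustrative placeholders, not the actual data.
from scipy.stats import pearsonr

quiz1 = [2, 1, 3, 2, 1, 2, 0, 3, 2, 1, 2, 3, 1, 2, 2, 1, 3, 0, 2, 2]
quiz2 = [1, 3, 1, 2, 3, 2, 3, 0, 2, 3, 2, 1, 3, 2, 1, 3, 1, 3, 2, 1]

r, p = pearsonr(quiz1, quiz2)
print(f"r={r:.2f}, p={p:.3f}")  # negative r mirrors the quiz finding here
```

The same call applies to the valence-transfer correlation, with the sixteen valence scores and back-transfers as inputs.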

7. DISCUSSION AND SUGGESTIONS FOR FUTURE RESEARCH I observed an unexpectedly high degree of trustworthiness. This could simply be a matter of sample size. The number of data points collected is likely too small to observe enough variation to replicate the common finding that a small but substantial proportion of Trustees sends back nothing or very little. The solution then would be to repeat the experiment on a larger scale. Besides the sample size, I have identified five other factors that might have contributed to high trust and trustworthiness and the lack of punishment: the endowments, the Trustee's action space, participants being acquaintances, the degree of anonymity, and the size of the incentives. I will discuss each in turn. The first factor concerns the unequal endowments. Investors had an endowment of 7 euros whereas Trustees were endowed with 2 euros. This could explain high transfers by Investors if they want to make up for the installed inequity. There exists some modest support for this argument (Johnson and Mislin, 2011). The asymmetric endowments cannot directly explain high trustworthiness here. There might exist an indirect route, if Trustees are motivated by reciprocity and respond with high transfers to Investors' trust when that trust in turn is motivated by solving the endowment inequity. To prevent this argument from playing a role, the endowments should be chosen such that earnings are equal if the Investor decides not to transfer any money. To enable punishment in all scenarios, the Investor needs a positive cash balance next to any earnings from the game; the amount that can be transferred therefore needs to be smaller than the endowment. For example, endow both players with 7 euros, of which the Investor can transfer 5.
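The arithmetic of the proposed balanced design can be sketched as follows. The multiplier of 3 matches the tripling in the original game; the function and parameter names are my own illustration:

```python
# Payoffs in the proposed balanced design: both players endowed with 7
# euros, the Investor may transfer 5, and the transfer is assumed to be
# tripled as in the original game.
ENDOWMENT = 7.0
TRANSFER = 5.0
MULTIPLIER = 3.0

def payoffs(invest: bool, back: float) -> tuple[float, float]:
    """Earnings (Investor, Trustee) for one play of the proposed game."""
    if not invest:
        return ENDOWMENT, ENDOWMENT        # equal earnings without trust
    pot = MULTIPLIER * TRANSFER            # 15 euros reach the Trustee
    return ENDOWMENT - TRANSFER + back, ENDOWMENT + pot - back

print(payoffs(False, 0.0))   # (7.0, 7.0): no transfer leaves earnings equal
print(payoffs(True, 10.0))   # (12.0, 12.0): trust plus an equal split
```

Note that even after transferring, the Investor retains 2 euros, which satisfies the positive-cash-balance requirement for punishment in all scenarios.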

24 Unfortunately I cannot comment on how Quiz 1 (CRT) performance relates to punishment.

The second factor is the action space. In particular, the Trustee's action space could be responsible for both the high 'trustworthiness' and the lack of punishment. As mentioned in Section 6, Trustees can only reward the Investor's trust by sending back 7.50 euros. In this respect, the large gap between sending back 5 or 7.50 euros might have increased the number of 7.50 transfers. This could in turn have been anticipated by Investors and hence explain high trust. An improvement would be to let the Trustee choose among sending back 0, 6, or 7.50 euros. The middle option then also rewards the Investor and could therefore be attractive to Trustees who prefer to reward the Investor but do not necessarily want to equate earnings. If this option is indeed chosen more often, it reduces the degree of trustworthiness and possibly provokes more punishment. The Trustee's action space gives rise to another explanation for the low incidence of punishment. Investors have a rationale for punishment when a Trustee decides upon an 'unfair' split of the tripled amount of money. In the current setting, sending back 0 or 5 euros can be regarded as unfair. Many Trustees, however, chose the equal split, thereby creating almost no situations prone to punishment. To steer Trustee behavior a bit in the desired direction, the action space could be chosen such that an unequal split is more tempting. 25 The action space proposed in the previous paragraph is one example. For a more extreme case, let Trustees choose between sending 3 and keeping 12 versus sending 9 and keeping 6. Such a choice set sharpens the tension between personal gain and cooperative behavior. This might nudge Trustees to choose an unequal split more often, and hence create more situations where there is ground for punishment. The third factor concerns the participants.
Although I did not know most participants directly, within a session participants knew each other. This could have increased cooperative behavior in two ways. First, acquaintances have social ties that make the other person's payoff weigh more heavily in their own utility function. It has been shown that participants are more willing to help each other if there exists a positive social tie between them (van Dijk et al., 2002). Also, friends are found to be better at coordinating their actions (Reuben and van Winden, 2008). Hence, when participants are positively tied (as I gauge to be the case in this pool, though this is of course speculative), this can explain the high degree of cooperation. This brings me to the fourth factor, anonymity. My participants will certainly encounter each other again after the experiment. Although I made sure not to reveal any information about actions and payoffs, it is highly probable that participants foresaw talking about the experiment afterwards and asking each other about their actions. This makes behavior more visible and less anonymous than desirable,

25 Choosing the action space to influence the gameplays participants face is common practice, especially when the researcher is interested in behavior in particular situations.

which opens a window for reputation considerations to enter. Furthermore, there could be a Hawthorne effect (also called an observer effect) operating, since the experiment was only single-blind. Participants may have acted in a more 'socially desirable' way because I, as the experimenter, could directly observe their behavior during the experiment. 26 This argument has a strong intuitive appeal, but double-blindness has been shown to have hardly any effect on behavior (Johnson and Mislin, 2011). The fifth and last factor I want to discuss here is the incentives. A participant could earn at most 11 euros as an Investor and 19.50 euros as a Trustee. For a forty-minute session, this corresponds to an hourly wage of 16.50 to 29.25 euros, which is fairly in line with pay rates at university research labs. However, the actual payoffs were diluted in two ways. First, there was only a 25-33 percent chance that payment was effectuated. This also meant that participants did not receive a guaranteed show-up fee (as is common). This factor cannot, however, explain the high degree of trust and trustworthiness, since transfers are found to be smaller under random payment schemes (Johnson and Mislin, 2011). Secondly, the converted hourly pay (even disregarding the former point) did not match the opportunity costs of all participants, since most of them were well-paid employed adults. Taken together, the true stakes were smaller than the absolute number of 19.50 euros suggests. People may act less in accordance with their true preferences when the stakes are small: it is easier to share money when you do not lose a lot by doing so. Studies on the effect of stakes, however, find only modest differences in behavior (Camerer, 2003, p. 63; Johnson and Mislin, 2011).

It is an empirical question whether the factors discussed above have a significant impact on behavior. I do suggest that a future follow-up study at least improves on sample size, the securing of anonymity, and maintaining a proper incentive level. This can be realized by recruiting participants from larger pools, by applying a double-blind procedure (as used in Berg et al., 1995), and by paying all participants according to their earnings. If the subject pool again consists of highly educated professionals, the opportunity costs of participation can be matched by raising the stakes and/or reducing the duration of a session (e.g. by playing one part only). In addition, the experimental design could be adjusted to set balanced endowments and to increase the number of gameplays that are likely to give rise to punishment. Also, I believe that measuring beliefs about the other player's action would be a valuable addition. This enables the experimenter to assess the degree of strategic

26 I did not record names or other personal details, and this was clearly communicated to the participants. Nevertheless, during a session I looked over people's shoulders to record their behavior.

thinking a participant displays. For example, if an Investor is known to fully believe that the Trustee will reciprocate, then sending over the endowment qualifies for him as a rational payoff-maximizing decision. 27 Furthermore, the current design can be expanded to explore the effect of varying the length of the time lag and of how this time is spent (active perspective taking, mere waiting, provoked grudge-holding, etc.). Another interesting extension is to operationalize the cooling-off period as an option to revise the punishment decision (instead of delaying the decision). Lastly, it would be interesting to see whether a cooling-off effect is constant over other parameters, such as punishment multiples and the height of the stakes.

8. CONCLUSION Most if not all economic transactions are embedded in a relationship of trust. When trust is violated, there often exist possibilities for punishment. I designed a behavioral experiment to investigate the effect of imposing a time lag between the observation of a possible violation of trust and the decision to punish. Common theories of trust and punishment, such as reciprocity or inequity aversion, cannot explain an effect of timing; others that stress the role of affect, such as dual-process theories, can. This study aimed to test the hypothesis that the delay transports decision makers from a hot emotional state to a cool, thoughtful state of mind, which in turn is expected to lower the demand for punishment. Unfortunately, the data collected were insufficient to answer the research question. The practical restrictions of this pilot study may have acted as confounding factors. With those restrictions lifted, the design and the insights I did obtain show enough potential for further investigation of this topic.

27 Other preference structures, such as caring about joint payoffs, can also rationalize trusting behavior.

REFERENCES

Abele, S. and Ehrhart, K.M. (2005) The timing effect in public good games. Journal of Experimental Social Psychology, 41(5), p. 470-481

Adler, R.S., Rosen, B., and Silverstein, E.M. (1998) Emotions in negotiation: how to manage fear and anger. Negotiation Journal, vol. 14(2), p. 161-179

Andreoni, J., Harbaugh, W., and Vesterlund, L. (2003) The Carrot or the Stick: Rewards, Punishments, and Cooperation. The American Economic Review, vol. 93(3), p. 893-902

Baron, R.M. and Kenny, D.A. (1986) The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality and Social Psychology, vol. 51(6), p. 1173-1182

Berg, J., Dickhaut, J., and McCabe, K. (1995) Trust, Reciprocity, and Social History. Games and Economic Behavior, vol. 10(1), p. 122-142

Bernheim, B.D. and Rangel, A. (2003) Emotions, cognition, and savings: Theory and policy. Mimeo, Stanford University

Bolton, G. and Ockenfels, A. (2000) ERC: A Theory of Equity, Reciprocity and Competition. American Economic Review, vol. 90, p. 166-193

Bosman, R., Sonnemans, J., and Zeelenberg, M. (2001) Emotions, rejections, and cooling off in the ultimatum game. Unpublished, http://www1.fee.uva.nl/creed/pdffiles/coolingoff.pdf

Bradley, M.M. and Lang, P.J. (1994) Measuring emotion: the Self-Assessment Manikin and the Semantic Differential. Journal of Behavior Therapy and Experimental Psychiatry, 25, p. 49-59

Brosig, J. and Weimann, J. (2003) The hot versus cold effect in a simple bargaining experiment. Experimental Economics, vol. 6, p. 75-90

Camerer, C.F. (2003) Behavioral Game Theory. Princeton, NJ: Princeton University Press

Caplin, A. and Leahy, J. (2001) Psychological expected utility theory and anticipatory feelings. Quarterly Journal of Economics, 116(1), p. 55-79

Cappelletti, D., Güth, W., and Ploner, M. (2008) Being of two minds: An ultimatum experiment investigating affective processes. Jena Economic Research Papers 2008-048

Cardella, E. and Chiu, R. (2012) Stackelberg in the lab: The effect of group decision making and "cooling-off" periods. Journal of Economic Psychology, vol. 33, p. 1070-1083

Charness, G.B. (2004) Attribution and Reciprocity in an Experimental Labor Market. Journal of Labor Economics, 22, p. 665-688

Charness, G.B. and Brandts, J. (1998) Hot vs. Cold: Sequential responses and preference stability in experimental games. Experimental Economics, vol. 2(3), p. 227-238

Cox, J. (2009) Trust and reciprocity: implications of game triads and social context. New Zealand Economic Papers, vol. 43(2), p. 89-104

Damasio, A.R. (1994) Descartes' error: emotion, reason, and the human brain. New York: Putnam

Dijk, F. van, Sonnemans, J.H., and Winden, F.A.A.M. van (2002) Social ties in a public good experiment. Journal of Public Economics, vol. 85(2), p. 275-299

Doner, K. (1996) Heal your angry heart. American Health, vol. 15(7), p. 74-78

Dufwenberg, M. and Kirchsteiger, G. (2004) A theory of sequential reciprocity. Games and Economic Behavior, 47(2), p. 268-298

Egas, M. and Riedl, A. (2008) The economics of altruistic punishment and the maintenance of cooperation. Proceedings of the Royal Society B, 275, p. 871-878

Epstein, S. (1994) Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49, p. 709-724

Falk, A. and Fischbacher, U. (1998) A theory of reciprocity. Working paper No. 6, University of Zürich

Fehr, E. and Gächter, S. (2000) Cooperation and punishment in public good experiments. American Economic Review, 90(4), p. 980-994

Fehr, E., Gächter, S., and Kirchsteiger, G. (1997) Reciprocity as a Contract Enforcement Device: Experimental Evidence. Econometrica, 65(4), p. 833-860

Fehr, E. and Schmidt, K.M. (1999) A Theory of Fairness, Competition, and Cooperation. Quarterly Journal of Economics, vol. 114, p. 817-868

Frederick, S. (2005) Cognitive reflection and decision making. Journal of Economic Perspectives, vol. 19, p. 25-42

Frijda, N. (1988) The Laws of Emotion. American Psychologist, vol. 43(5), p. 349-358

Gneezy, U. and Potters, J. (1997) An Experiment on Risk Taking and Evaluation Periods. The Quarterly Journal of Economics, 112, p. 631-645

Goleman, D. (1995) Emotional intelligence. New York: Bantam Books

Grimm, V. and Mengel, F. (2011) Let me sleep on it: Delay reduces rejection rates in ultimatum games. Economics Letters, 111(2), p. 113-115

Hopfensitz, A. and Reuben, E. (2009) The Importance of Emotions for the Effectiveness of Social Punishment. The Economic Journal, 119, p. 1534-1559

Hoppe, E.I. and Kusterer, D.J. (2010) Behavioral biases and cognitive reflection. Cologne Graduate School Working Paper, 1(3)

Johnson, N.D. and Mislin, A.A. (2011) Trust games: a meta-analysis. Journal of Economic Psychology, vol. 32(5), p. 865-889

Kahneman, D. (2011) Thinking, fast and slow. New York: Farrar, Straus and Giroux

Kahneman, D. and Frederick, S. (2006) Frames and brains: elicitation and control of response tendencies. Trends in Cognitive Sciences, vol. 11(2)

Kosfeld, M., Heinrichs, M., Zak, P., Fischbacher, U., and Fehr, E. (2005) Oxytocin Increases Trust in Humans. Nature, 435, p. 673-676

Kreps, D.M. (1990) Corporate Culture and Economic Theory. In: Alt, J. and Shepsle, K. (eds.) Perspectives on Positive Political Economy. Cambridge: Cambridge University Press

LeDoux, J. (1996) The emotional brain. New York: Simon & Schuster

Lopez-Perez, R. (2008) Aversion-to-norm-breaking: a model. Games and Economic Behavior, vol. 64(1), p. 237-267

Loewenstein, G. (1996) Out of Control: Visceral Influences on Behavior. Organizational Behavior and Human Decision Processes, vol. 65, p. 272-292

Loewenstein, G. (2000) Emotions in Economic Theory and Economic Behavior. The American Economic Review, vol. 90(2), Papers and Proceedings of the One Hundred Twelfth Annual Meeting of the American Economic Association, p. 426-432

Loewenstein, G. and O'Donoghue, T. (2007) The Heat of the Moment: Modeling Interactions Between Affect and Deliberation. Unpublished manuscript, Cornell University

Lukas, C. and Walgenbach, P. (2010) Trust me: it is high trust: on trust and its measurement. University of Konstanz, working paper series 2010-09

Manapat, M.L., Nowak, M.A., and Rand, D.G. (2013) Information, irrationality, and the evolution of trust. Journal of Economic Behavior and Organization, 90(suppl.), p. S57-S75

Mayer, R.C., Davis, J.H., and Schoorman, F.D. (1995) An integrative model of organizational trust. Academy of Management Review, 20, p. 709-734

Metcalfe, J. and Mischel, W. (1999) A hot/cool-system analysis of delay of gratification: Dynamics of willpower. Psychological Review, 106(1), p. 3-19

Neo, W.S., Yu, M., Weber, R.A., and Gonzalez, C. (2013) The effects of time delay in reciprocity games. Journal of Economic Psychology, vol. 34, p. 20-35

Nooteboom, B. (2002) Trust: Forms, Foundations, Functions, Failures and Figures. Cheltenham: Edward Elgar

Oechssler, J., Roider, A., and Schmitz, P.W. (2008) Cooling-Off in Negotiations - Does It Work? CEPR discussion paper no. 6807

Quervain, D. de, Fischbacher, U., Treyer, V., Schellhammer, M., Schnyder, U., Buck, A., and Fehr, E. (2004) The Neural Basis of Altruistic Punishment. Science, 305, p. 1254-1258

Rabin, M. (1993) Incorporating Fairness into Game Theory and Economics. American Economic Review, 83, p. 1281-1302

Rapoport, A. (1997) Order of Play in Strategically Equivalent Games in Extensive Form. International Journal of Game Theory, vol. 26, p. 113-136

Reuben, E. and Winden, F.A.A.M. van (2008) Social Ties and Coordination on Negative Reciprocity: The Role of Affect. Journal of Public Economics, vol. 92, p. 34-53

Rigdon, M. (2009) Trust and reciprocity in incentive contracting. Journal of Economic Behavior and Organization, 70, p. 93-105

Sanfey, A.G., Rilling, J.K., Aronson, J.A., Nystrom, L.E., and Cohen, J.D. (2003) The neural basis of economic decision-making in the ultimatum game. Science, 300, p. 1755-1758

Shefrin, H.M. and Thaler, R.H. (1988) The behavioral life-cycle hypothesis. Economic Inquiry, vol. 26, p. 609-643

Selten, R. (1965) Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit. Zeitschrift für die gesamte Staatswissenschaft, 121, p. 301-324 and 667-689

Sobel, J. (1985) A theory of credibility. The Review of Economic Studies, vol. 52(4), p. 557-573

Stanovich, K.E. and West, R.F. (2000) Individual differences in reasoning: implications for the rationality debate? Behavioral and Brain Sciences, 23, p. 645-726

VanOyen Witvliet, C., Ludwig, T.E., and Van der Laan, K.L. (2001) Granting Forgiveness or Harboring Grudges: Implications for Emotion, Physiology, and Health. Psychological Science, vol. 12(2), p. 117-123

Von Neumann, J. and Morgenstern, O. (1947) The theory of games and economic behavior. Princeton, NJ: Princeton University Press

Watson, J. (1999) Starting Small and Renegotiation. Journal of Economic Theory, vol. 85(1), p. 52-90

Watson, J. (2002) Starting Small and Commitment. Games and Economic Behavior, vol. 38(1), p. 176-199

Xiao, E. and Houser, D. (2005) Emotion Expression in Human Punishment Behavior. Proceedings of the National Academy of Sciences of the United States of America, vol. 102(20), p. 7398-7401

Zajonc, R.B. (1984) On the primacy of affect. American Psychologist, vol. 39(2), p. 117-123

APPENDICES

To allow the reader to personally review the material the participants faced, I supply here the following documents:

A1. Spoken text experimenter (in Dutch)
A2. Quiz questions and answers (in English)
A3. Instructions (in Dutch)
A4. Example answer forms (in Dutch)

Appendices A1 and A2 are included on the following pages. Please find A3 and A4 attached as separate documents.

A1. Spoken text experimenter (translated from Dutch)

Upon entering the lab

I will now begin the formal part.

Good afternoon. I am Cora Hollander, the leader of today's experiment. This experiment is part of my research on decision making. The total duration of the experiment is approximately 40 minutes.

I have a number of cards here. In a moment I will ask you to pick one card and enter the room. Each card carries a number, and the tables are numbered as well. You are asked to take a seat at the table whose number matches the number on your card. This table number is used to identify each participant during and after the experiment.

It is important for the research that different sessions are comparable. It is therefore important that you read the instructions carefully and follow them exactly. In particular, it is important that from this moment on you do not communicate with other participants in any way. No more information should be shared than the instructions indicate.

Once everyone is seated at the correct table, further instructions will follow.

Are there any questions at this point?

Okay. Then everyone may now draw a card and take a seat.

During a session (example T0)

In front of you lies a set of answer forms for Part 1. On it you will find your role. It also briefly restates the instructions for each step. Steps 3 and 4 start on a new page. Whenever you have a decision to make, you can indicate your choice in the column “Uw antwoord” (your answer). Whenever you receive information about the choice of the participant you are paired with, I will fill it in in the column “ingevuld door leiding experiment” (filled in by the experimenter).

Part 1 Start of Part 1, step 1. Players A, make your choice. You have 20 seconds. Time is up. I will come around to convey the decisions.

Step 2. Players B, make your choice. You have 20 seconds. Time is up. I will come around to convey the decisions.

Turn to Step 3. Players A, make your choice. You have 2 minutes. Half of the time has passed. Time is up.

Turn to Step 4. All players, go ahead. You have 4 minutes. Half of the time has passed. Time is up.

I will now collect the answer forms for Part 1.

I will now hand out the instructions for Part 2.

Part 2 Start of Part 2, step 1. Players A, make your choice. You have 20 seconds. Time is up. I will come around to convey the decisions.

Step 2. Players B, make your choice. You have 20 seconds. Time is up. I will come around to convey the decisions.

Turn to Step 3. Players A, make your choice. You have 2 minutes. Half of the time has passed. Time is up.

Turn to Step 4. All players, go ahead. You have 4 minutes. Half of the time has passed. Time is up. I will now collect the answer forms for Part 2.

I will now hand out the questionnaire forms.

I will now hand out the envelopes with the payments. I ask everyone to sign the receipt in the envelope. I will come around with this tray to collect the receipts.

The experiment is now over. Thank you again for your participation.

A2. Quiz questions and answers

Part 1: the CRT consists of the following three questions:

• A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost? (intuitive answer: 10 cents; correct answer: 5 cents)
• If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets? (intuitive answer: 100 minutes; correct answer: 5 minutes)
• In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake? (intuitive answer: 24 days; correct answer: 47 days)

Part 2: the intermediate task consists of the following three questions:

• A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day. In the smaller hospital about 15 babies are born each day. As you know, about 50 percent of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50 percent, sometimes lower. For a period of 1 year, both hospitals recorded the days on which more than 60 percent of the babies born were boys. Which hospital do you think recorded more such days? a) the larger hospital b) the smaller hospital c) about the same (the correct answer is b: outlier observations occur more often in small samples)
• Imagine tossing a fair coin (i.e. a coin that has an equal chance of coming up heads or tails). It has just come up heads 5 times in a row. For the sixth toss do you think that: a) it is more likely that tails will come up b) it is more likely that heads will come up c) heads and tails are equally probable on the sixth toss (the correct answer is c: each toss is independent)
• Please read the description below and then state which of the two statements you find most probable. “Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a student she was deeply concerned with issues of discrimination and social justice and also participated in antinuclear demonstrations.” 1. Linda is a bank teller. 2. Linda is a bank teller and is active in the feminist movement. (the correct answer is 1: by the conjunction rule, a single-feature claim is at least as likely as the same claim with an added feature; statement 2 describes a subset of the cases covered by statement 1)
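As a small aside, the stated answer to the hospital question can be verified with the binomial distribution (a sketch; scipy's survival function gives the upper tail):

```python
# Verifying the hospital question: days with more than 60 percent boys are
# more likely at the small hospital, because extreme proportions occur more
# often in small samples.
from scipy.stats import binom

small = binom.sf(9, 15, 0.5)   # P(10 or more boys out of 15), i.e. > 60%
large = binom.sf(27, 45, 0.5)  # P(28 or more boys out of 45), i.e. > 60%
print(f"small hospital: {small:.3f}, large hospital: {large:.3f}")
```

The small hospital's tail probability is roughly three times as large, which is exactly why answer b is correct.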
