Outcome bias and Football 1

Running head: OUTCOME BIAS AND FOOTBALL

Outcome Bias in Subjective Ratings of Performance:

Evidence from the (Football) Field

Edgar E. Kausel School of Management Pontificia Universidad Católica de Chile Av.Vicuña Mackenna 4860 Macul, Santiago, Chile. Phone: (56-2) 235-44018 [email protected]

Santiago Ventura School of Management Pontificia Universidad Católica de Chile Av.Vicuña Mackenna 4860 Macul, Santiago, Chile. Phone: (56-2) 235-47303 [email protected]

Arturo Rodríguez Faculty of Economics and Business University of Chile Diagonal Paraguay 257 Santiago, Chile 6510015 Phone: (56-2) 978-3770 Fax: (56-2) 222-0639 [email protected]


Abstract

Outcome bias occurs when people assess others’ decision-making process or performance and place unwarranted weight on the outcomes. This bias has important implications for both the judgment and choice literature and the performance appraisal literature. However, virtually every extant study has been conducted in the lab, likely because of endogeneity concerns in the field. Penalty shoot-outs in association football (‘soccer’) offer an interesting way of studying outcome bias, as recent research suggests that their outcome is unrelated to in-game performance. We use Goal (goal.com) to study subjective performance ratings given by reporters to 1,157 players in 43 games from important football competitions. Using both multilevel mixed-effects and fixed-effects (within-players design) modeling, we found that winning on penalties was linked to higher performance ratings. This result persisted even after we removed players who took part in the penalty shoot-outs, which supports the idea of outcome bias. We discuss implications for applied settings.

Keywords: Judgment and Decision Making; Behavioral Economics; Heuristics and Biases; Outcome Bias; Football (Soccer); Performance Appraisal.


Outcome Bias in Subjective Ratings of Performance:

Evidence from the (Football) Field

This is the lottery of penalty shootouts (Franz Beckenbauer during the 1990 World Cup; Vecsey, 1990)

For me, the national team is over (Lionel Messi’s reaction after losing the Copa América final against Chile in a penalty shoot-out; Das, 2016)

Some people use a similar argument to amplify a behavior after a win and to condemn the same behavior after a defeat. For example, if Neymar recovers a ball and then the team scores, journalists say: ‘Ah, the coach tamed Neymar and made him play collectively.’ But if the team loses, they say: ‘Useless coach, how can he make Neymar chase the wing-back instead of making him play close to the penalty box!’ The media specializes in corrupting human beings depending on a win or a loss. (Coach Marcelo “El Loco” Bielsa; El Dinamo, 2017)

1. Introduction

From a normative perspective, we should distinguish the decision-making process from its consequences (Skitka & Tetlock, 1993; Hastie & Dawes, 2009). Given that real decisions are made in an uncertain world, an appropriate decision process may lead to poor results, and a poor decision process may lead to positive outcomes (Baron, 1985; Bazerman & Moore, 2013; Brockner, 2015). For example, even if a well-intentioned physician follows a procedure flawlessly and exercises sound judgment, the operation may lead to patient death. More generally, we should be able to differentiate behavior from outcomes. This is consistent with Kahneman’s (2011) observation that “because luck plays a large role, the quality of leadership and management practices cannot be inferred reliably from observations of success” (p. 207).

Indeed, the behavior vs. outcome issue is important in performance assessment within the industrial-organizational and educational psychology literatures. The aim in performance appraisal is to judge an individual’s behavior, not factors beyond his or her control (Cascio & Aguinis, 2010). This is one of the reasons many managers and school principals emphasize subjective ratings when assessing employees or teachers, as opposed (or in addition) to so-called ‘objective measures’ (Murphy & Cleveland, 1995).

Although objective measures can tap job performance, they are more often indicators of outcomes, which are likely to be affected by external factors. For example, an objective measure of scholarly performance is the number of papers published. However, this outcome is affected by external factors such as journal editors’ attitudes toward the scholar’s specific field of study. Thus, researchers agree that subjective ratings are more likely to tap job performance¹ than objective measures are (Landy & Conte, 2004). In particular, subjective ratings are said to cover psychologically relevant aspects of job performance given the context and task at hand (Campbell, 2012; Landy & Farr, 1983). Also, subjective ratings can capture different dimensions of job performance, such as task performance, contextual performance, and counterproductive behavior (Rotundo & Sackett, 2002).

¹ We should underscore that performance is defined as behavior, that is, something done by an individual (Campbell, McHenry, & Wise, 1990).

A relevant issue regarding subjective ratings is, however, the potential influence of outcome bias (Savani & King, 2015; Gino, Moore, & Bazerman, 2009). Outcome bias occurs when outcome knowledge affects the evaluation of how well an individual made a decision or performed a task (Baron & Hershey, 1988; Johnston & Marshall, 2016). For example, Baron and Hershey showed two groups of participants the same description of a medical decision-making process but different outcomes. People who were told that the outcome was positive perceived the decision-making process more favorably than those told that the outcome was negative.

Yet, although several published studies show evidence of outcome bias using subjective ratings in the laboratory, where outcome information can be randomized, there are virtually no studies in field settings (cf. Lefgren, Platt, & Price, 2014²). This is unfortunate because, as the authors of a seminal paper on outcome bias recognized, the “difficulty with the method we used is that it is restricted to somewhat artificial situations” (Hershey & Baron, 1992, p. 91). The reason for the lack of field studies on outcome bias is that, in the field, an individual’s performance is correlated with his or her outcomes (Bommer, Johnson, Rich, Podsakoff, & Mackenzie, 1995; Ilgen & Favero, 1985). As such, it is difficult to distinguish whether correspondence between subjective ratings and outcomes is the result of a biased assessment by the rater or actual performance by the ratee. In other words, there is a problem of endogeneity, whereby an individual’s performance may have an effect on both subjective ratings and outcomes.

² Lefgren et al. is a notable exception. Using American Football games, they found that coaches were more likely to change their team strategy after a close loss than after a close win. Although impressive in terms of the data used, this result could be explained by action bias rather than outcome bias (Bar-Eli, Azar, Ritov, Keidar-Levin, & Schein, 2007). After losing, coaches tend to take action because this ‘looks better’ and thus is more justifiable (Connolly & Zeelenberg, 2002). Relatedly, the study by Lefgren et al. does not directly include subjective performance ratings, which are the focus of our study.

2. Penalty shoot-outs and performance in the field

Association football (‘soccer’) gives us an interesting setting to examine outcome bias in subjective performance ratings. In some competitions, when a game ends in a draw, the teams proceed to a penalty shoot-out to determine the winner. Many coaches and journalists believe that chance plays a crucial role in the outcome of penalty shoot-outs, as reflected by Franz Beckenbauer’s quote at the outset of the present paper.³

This belief seems to be supported by data. Nate Silver, from the popular FiveThirtyEight website, examined 37 official games from 2005 to 2013 played in the knockout stages of different continental championships among men’s national teams (Silver, 2014). For each game, he classified teams as favorites and underdogs using the teams’ ELO rating, a measure of team skill that has relatively strong predictive power for football outcomes (Lasek, Szlávik, & Bhulai, 2013). Silver found that favorites and underdogs had roughly equal chances of winning on penalties. Using Silver’s data, we also found no significant difference in ELO rating between teams that won (M = 1,803; SD = 172) and teams that lost (M = 1,800; SD = 153) on penalty shoot-outs (t[36] = .12, p = .91).

A different study with much more data at the individual level is consistent with this finding. The Economist’s Data Team (2017) analyzed 2,788 penalty kicks awarded in three European leagues between 2007 and 2016. They found no statistically significant relationship between a player’s past conversion rate and his future success (Data Team, 2017). In sum, the outcome of a penalty shoot-out (i.e., who wins and who loses) seems to be unrelated to in-game skill and performance at both the team and player levels of analysis. We provide more evidence regarding this issue in our results section.

Note that we are not claiming that, in games decided by penalty shoot-outs, the teams’ in-game performances are equal. Indeed, we report below some evidence suggesting that there are some systematic differences in performance between teams in these games (i.e., favorites tend to perform better than underdogs). What we are claiming is that, whatever the differences between two teams in this type of game are, they are unrelated to the penalty shoot-out outcome.

³ We included two other quotes to reflect (a) how people in football are intuitively aware of the outcome bias and its problems (Marcelo Bielsa’s quote) and (b) how dramatic the outcome of a penalty shoot-out can be. Messi’s quote reflects the latter: right after Argentina lost that game on penalties, Messi announced his retirement from the national team.

We exploit this independence between performance and outcome to study outcome bias in football players’ subjective performance ratings. We used subjective ratings from Goal (goal.com), a large online football publication whose reporters consistently rated players once games had finished. We focused on games that were determined by penalty shoot-outs, after the game ended in a draw during regular (and in some cases extra) time.

The prediction we tested was that players on a winning team would receive better performance ratings than players on a losing team. Note again that the teams we analyzed tied during regular time; a quasi-exogenous factor (the penalty shoot-out) determined the winner. We also conducted analyses including and excluding those players who participated in the penalty shoot-outs. Our study is the first to test outcome bias in subjective performance ratings using data that allow dealing with the endogeneity problem mentioned above.

3. Methodology and data

3.1. Overview

Our dataset is based on Goal (goal.com) ratings of football players’ performances in different games. From 2008 to early 2017, reporters from goal.com gave subjective ratings of players’ performance after games from several competitions.⁴ These ratings were included on the website, in addition to lineups, reports including the games’ major events, and basic statistics such as possession percentage. Although other websites such as The Guardian’s or ESPN’s also sometimes include subjective ratings, we decided to use Goal because its ratings were provided more consistently (i.e., not limited just to finals or games between popular teams) than on other websites. Furthermore, Goal provided information from games going back as far as 2008, whereas other websites were more restricted in this respect.

⁴ Since June 2017, with the launch of a new website, Goal has ceased providing players’ subjective ratings after games. In fact, Goal removed all ratings from previous games on its new website, leaving only lineups, basic statistics, and reports. However, we retained files with screenshots of all the ratings included in the analysis.

Upon an initial examination of the website, we realized that Goal reporters consistently rated players in the following tournaments: FIFA World Cup, UEFA European Championship, Cup of Nations, America Cup (Copa América), CONCACAF Gold Cup, Confederations Cup, UEFA Champions League, UEFA Europa League, and EFL (League) Cup.

We were interested in games that (a) had ended in a draw and (b) had proceeded to a penalty shoot-out to determine the winner. This implied that league games (e.g., games from La Liga or Bundesliga) and games from group phases were excluded, as their outcomes are not determined by penalty shoot-outs. In other words, only games from knockout phases that ended in a draw were included. Also, games that were decided by penalty shoot-outs but had ended during regular time with one team beating the other (second-leg games from the Champions or Europa leagues) were excluded. Using these criteria, we found data from 1,157 individual performances taken from 43 games. The list of the specific games is included in Appendix A.

3.2. Variables

3.2.1. Dependent Variable: Players’ Performance Ratings

The dependent variable of this study was the players’ performance ratings made by reporters after games. Between 2008 and 2010, the ratings could range from 1 to 10 in increments of .5; between 2011 and 2016, the ratings could range from 1 to 5 in increments of .5. In all years, higher ratings meant better performance. Most of the available ratings came from the 2011-2016 period (1,024 ratings). We transformed the 2008-2010 ratings (only 133 in total) into a variable ranging from 1 to 5 by (a) subtracting 1 unit, (b) dividing by 2, and (c) adding 1 unit [transformed_score = (original_score − 1)/2 + 1]. Regarding the distribution of the ratings, although they were skewed neither positively nor negatively, they were clustered around the midpoint (i.e., a leptokurtic distribution). Around 80% of the ratings were between 2.5 and 3.5.
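The rescaling of the 2008-2010 ratings can be sketched in a few lines of Python. This is only an illustration of the formula as stated in the text, not the authors’ code, and the function name `rescale_rating` is hypothetical. Applied literally, the formula maps an original score of 10 to 5.5, so the sketch returns that value as computed rather than clipping it.

```python
def rescale_rating(original_score: float) -> float:
    """Rescale a rating given on the 1-10 scale using the stated formula.

    Implements transformed_score = (original_score - 1) / 2 + 1.
    The function name is hypothetical and used for illustration only.
    """
    if not 1 <= original_score <= 10:
        raise ValueError("original_score must lie between 1 and 10")
    return (original_score - 1) / 2 + 1


if __name__ == "__main__":
    # Print the transformed value for a few original scores.
    for score in (1, 5.5, 7, 10):
        print(score, "->", rescale_rating(score))
```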

Although we attempted to identify which reporters rated which games, and emailed Goal to request this information, we were unable to obtain it.

3.2.2. Independent variable and controls

The main predictor of the players’ ratings was whether a player’s team had won or lost the penalty shootouts.

We controlled for a number of variables that could be related to the players’ ratings. We first controlled for the difference between the teams’ ELO ratings on the date the game was played. Larger differences imply that the game has a more salient favorite and underdog. As noted above, the ELO rating is a measure of a team’s skill based on previous performance (Lasek et al., 2013). In addition to capturing actual skill, this control was included to take into account raters’ expectations (e.g., after a draw, they might rate the favorite team differently from the underdog, even if both teams’ performance was similar). We decided to use ELO ratings because these scores are easily available on a game-to-game basis across time. In addition, given the way ELO ratings are computed, they allow comparing teams across different time periods and competitions (e.g., national teams and club teams).

We also controlled for in-game performance. This is a complex, multifaceted construct in association football, which probably requires several measures. Despite this, we decided to include at least one measure of in-game performance. We decided to look for a statistic that would be a strong predictor of football outcomes. Several authors have argued that shots on target (also known as shots on goal) is one of the best predictors available. A shot on target has been defined as “an attempt to score a goal which required intervention to stop it going in or resulted in a goal/shot which would go in without being diverted” (Whoscored, 2018).

For example, Liu, Gomez, Lago-Peñas, and Sampaio (2015) examined the relationship between 24 in-game statistics and the game outcome (win, draw, loss) from 48 matches of the group stage of the 2014 FIFA World Cup in Brazil. They found that shots on target was the single best predictor of game outcome. A similar result has been found in games from the Korea/Japan 2002, Germany 2006, and South Africa 2010 World Cups (Castellano, Casamichana, & Lago, 2012), the UEFA Champions League (Lago-Peñas, Lago-Ballesteros, & Rey, 2011), and the Spanish Professional Football League (Lago-Peñas, Lago-Ballesteros, Dellal, & Gómez, 2010). Liu et al. (2015) also examined a subset of games they called “close matches,” in which the difference in goals of the final outcome was no more than one. In these 38 games, they found that shots on target, as for the full sample, was the best predictor of the outcome. This is particularly relevant for our purposes, as the games in our sample are all close matches. As such, we decided to use the difference in shots on target between the teams in the game (henceforth, simply ‘shots on target’) as our measure of in-game performance.

For reasons similar to those for skill (i.e., expectations that the home team would win), we controlled for home advantage. Other controls included at the player level were whether the player scored a goal during the game, whether the player participated in the penalty shoot-out, and whether the player scored during the penalty shoot-out. Given that goalkeepers often end up being praised and are seldom blamed after penalty shoot-outs (Germany’s goalkeeper Kevin Trapp has reportedly said “You have nothing to lose in this situation”; Ahmed & Burn-Murdoch, 2017), we also controlled for whether the player was a goalkeeper. Additionally, we controlled for whether the player was a member of the team that took the first kick, which has been found to predict the shoot-out’s outcome (Apesteguia & Palacios-Huerta, 2010).⁵

Finally, we controlled for a dummy variable indicating which team (if any) had scored the last goal during the game (1 = scored last goal; 0 = did not score last goal). As a reviewer noted, Froese and Plessner (2010) found that the team scoring last during the game was more likely to win on penalty kicks than the other team. Note that in cases where both teams failed to score during regular or extra time, both teams received a zero on this variable.

4. Results

4.1. Initial analysis

As an initial analysis, we used clustered t-tests to compare the teams that won with those that lost on several relevant variables. As shown in Table 1, there were no significant differences in ELO rating difference, home advantage, who took the first kick in the penalty shoot-out (percentage), who scored the last goal, or the teams’ difference in shots on target.⁶ This suggests that our independent variable of interest is indeed orthogonal to other potential confounds. There were differences, however, in the average player ratings, as expected.

⁵ Indeed, we initially hoped to use this as an instrumental variable. Unfortunately, consistent with Kocher, Lenz, and Sutter (2012), we found that taking the first kick was unrelated to the penalty shoot-out’s outcome (and unrelated to the players’ ratings).

We also found that the correlation between shots on target and the teams’ ELO ratings was 0.32 (p = .04). More specifically, an increase of 105 ELO points was associated with an increase of 1 in the shots-on-target difference between two teams that end up deciding the game on penalties. This suggests that favorite teams played better than underdogs in the games we analyzed; thus, it lends some convergent validity to our measure of in-game performance (i.e., evidently better teams do play better on the field).

Table 1. Differences in relevant variables based on penalty shoot-out outcome.

Variable                                  Lost      Won       Difference   SE Diff   p
Player subjective rating                  2.81      2.96      .15***       .04       .00
Teams’ ELO rating difference              -12.66    18.90     31.56        48.86     .52
Home advantage (1 = Yes)                  .16       .21       .05          .09       .61
First kick in shootout (1 = Yes)          .47       .54       .07          .15       .66
Team scored last during game (1 = Yes)    .37       .19       -.18         .11       .11
Teams’ shots on target difference         -1.11     .75       1.87         1.52      .23

Notes. N = 1,157. SE Diff = Standard error of the difference. The differences were tested using clustered t-tests. *** p < .001

⁶ Of course, t-tests are generally not appropriate for binary dependent variables such as home advantage, first kick, and last goal, but we just wanted to have a first comparison. Clustered chi-square tests yielded similar results in terms of significance, although with lower p-values.

4.2. Main analyses

Our main analysis strategy was linear multilevel mixed-effects modelling (Rabe-Hesketh & Skrondal, 2008; Raudenbush & Bryk, 2002) fit by maximum likelihood with robust standard errors. Individual players’ performance ratings were specified at level 1, teams at level 2, and games (i.e., pairs of teams) at level 3. Note, however, that these data are not perfectly nested, as they would be in a strictly hierarchical structure. Some players had multiple team memberships. For example, in our dataset, Cristiano Ronaldo played in the Portugal vs. Poland game (Euro 2016); he also played in the Real Madrid vs. Atlético Madrid game (Champions League 2016). The same occurs with teams having multiple game memberships. For example, we have the Portugal vs. Poland 2016 game and also the Portugal vs. Spain 2012 game (see Appendix A). Considering this, we specified a model allowing crossed (as opposed to nested) effects (Rabe-Hesketh & Skrondal, 2008).

Thus, we used a 3-level model with random intercepts taking the following form:

$$Y_{ijk} = \beta_0 + \beta_1 \mathrm{Won}_{jk} + \beta_2 X'_{jk} + \beta_3 X''_{ijk} + V_k + U_j + e_{ijk} \qquad (1)$$

where $Y_{ijk}$ represents the performance rating for the $i$th player from the $j$th team in the $k$th game, $\mathrm{Won}_{jk}$ is our main independent variable at level 2, $X'$ is a set of controls at level 2 (Rating Difference, Home Advantage, First Kick, Last Goal), $X''$ is a set of controls at level 1 (Scored during Game, Participated in Shoot-out, Scored in Shoot-out, Goalkeeper), $V_k$ is the random intercept for the $k$th game at level 3, $U_j$ is the random intercept for the $j$th team at level 2, and $e_{ijk}$ is the overall residual.
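To make the structure of Equation (1) concrete, the following minimal sketch simulates ratings from a simplified version of the model using only the Python standard library. It is not the authors’ estimation procedure (they fit a mixed-effects model by maximum likelihood). All parameter values and function names are hypothetical, the controls are omitted, and, unlike in the paper’s data, every simulated team appears in only one game.

```python
import random
from statistics import mean


def simulate_ratings(n_games=400, players_per_team=13, b0=2.8, b1=0.15,
                     sd_game=0.10, sd_team=0.10, sd_resid=0.40, seed=1):
    """Simulate Y_ijk = b0 + b1 * Won_jk + V_k + U_j + e_ijk.

    All parameter values are hypothetical. Controls are omitted and each
    team plays exactly one game (a simplification of the crossed design).
    Returns a list of (game, team, won, rating) tuples.
    """
    rng = random.Random(seed)
    rows = []
    for k in range(n_games):
        v_k = rng.gauss(0, sd_game)        # random intercept for game k
        winner = rng.randint(0, 1)         # shoot-out winner assigned at random
        for side in (0, 1):
            u_j = rng.gauss(0, sd_team)    # random intercept for team j
            won = int(side == winner)
            for _ in range(players_per_team):
                e_ijk = rng.gauss(0, sd_resid)  # player-level residual
                rating = b0 + b1 * won + v_k + u_j + e_ijk
                rows.append((k, 2 * k + side, won, rating))
    return rows


def raw_win_effect(rows):
    """Difference in mean rating between winning and losing teams."""
    won = [y for _, _, w, y in rows if w == 1]
    lost = [y for _, _, w, y in rows if w == 0]
    return mean(won) - mean(lost)


if __name__ == "__main__":
    print(round(raw_win_effect(simulate_ratings()), 3))
```

Because the simulated winner is assigned independently of the random intercepts, the raw difference in means recovers the assumed coefficient on winning up to sampling noise. A mixed-effects model additionally estimates the variance components and accounts for the clustering when computing standard errors.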

Results are shown in Table 2. We first look at the controls in the right panel of the table. Interestingly, the ELO rating difference between the favorite and the underdog teams negatively predicted the players’ ratings. Something similar occurs with home advantage, although the coefficient fails to reach statistical significance (p = .098). Perhaps unsurprisingly, players who scored goals either during the game or in the penalty shoot-out received higher ratings. Goalkeepers also received higher ratings. Participating in the penalty shoot-out was unrelated to the ratings. This suggests that raters do not distinguish between those who participate in the shoot-out and those who do not, except for those who score in the shoot-out and the goalkeeper.

Our main independent variable, winning the penalty shoot-out, remained significant both with and without the controls. In other words, players whose team won the penalty shoot-out received higher ratings than those whose team lost. Figure 1 depicts the distribution of ratings for players whose team lost or won. We computed Cohen’s d and the effect size was 0.25, which is considered a small but not negligible effect size (Cohen, 1988; Bosco, Aguinis, Singh, & Field, 2015).
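As an illustration of how such an effect size can be computed, the sketch below implements Cohen’s d with a pooled standard deviation. The text does not state which variant of d was used, so the pooled-SD formula is an assumption, and the ratings in the example are hypothetical rather than data from the study. If d was computed from the raw means, the difference of .15 in Table 1 together with d = 0.25 would correspond to a pooled standard deviation of roughly 0.6 rating points.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (denominator n - 1).
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical ratings for illustration only, not data from the study.
    won = [3.0, 3.5, 2.5, 3.0, 3.5]
    lost = [3.0, 2.5, 2.5, 3.0, 3.0]
    print(round(cohens_d(won, lost), 2))
```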

Table 2. Linear multilevel mixed-effects regression using ‘won on penalty shoot-outs’ as the main independent variable and the players’ ratings as the dependent variable, all players in the data set.

                                              (1)         (2)
                                              Only IV     IV + controls
Intercept                                     2.82***     2.71***
Won on Penalties (1 = Yes)                    0.15***     0.13***
Teams’ ELO rating difference                              -0.00**
Home advantage (1 = Yes)                                  -0.08
First kick in shootout (1 = Yes)                          0.01
Player scored during game (1 = Yes)                       0.56***
Player participated in Shoot-out (1 = Yes)                -0.02
Player scored in Shoot-out (1 = Yes)                      0.23***
Player is goalkeeper (1 = Yes)                            0.48***
Teams’ shots on target difference                         -0.00
Team scored last during game (1 = Yes)                    0.05
Number of observations                        1157        1074
AIC                                           1859        1554
BIC                                           1889        1629
Log Likelihood                                -924        -762

Note. AIC = Akaike information criterion. BIC = Bayesian information criterion. Robust standard errors were used. *** p < .001 ** p < .01

Figure 1. Distribution of Player Ratings by Penalty Shootout Outcome

4.3. Further analyses.

4.3.1. Analyses removing players who participated in the penalty shoot-out.

We conducted analyses similar to those in the previous section but removed the players who had participated in the penalty shoot-out. We excluded controls that did not apply given the sample (e.g., we excluded the ‘scored in shoot-out’ control, as players who did not participate in the shoot-out could not score in it). Table 3 shows these results.


Table 3. Linear multilevel mixed-effects regression using ‘won on penalty shoot-outs’ as the main independent variable and the players’ ratings as the dependent variable for players who did not participate in the penalty shoot-out.

                                              (1)         (2)
                                              Only IV     IV + controls
Intercept                                     2.80***     2.90***
Won on Penalties (1 = Yes)                    0.22***     0.25***
Teams’ ELO rating difference                              -0.00
Home advantage (1 = Yes)                                  0.02
First kick in shootout (1 = Yes)                          0.04
Player scored during game (1 = Yes)                       0.50***
Teams’ shots on target difference                         -0.01
Team scored last during game (1 = Yes)                    0.15*
Number of observations                        527         464
AIC                                           877         728
BIC                                           902         778
Log Likelihood                                -432        -352

Note. AIC = Akaike information criterion. BIC = Bayesian information criterion. Robust standard errors were used. *** p < .001 ** p < .01 * p < .05

Even when we retained only players who did not take part in the penalty shoot-out, reporters gave higher ratings to players on the winning team than to those on the losing team.

4.3.2. Fixed-effects regression analyses

We then conducted a different set of analyses to complement the above: fixed-effects regression analyses (Allison, 2009). The specification was as follows:

$$Y_{ijt} = \beta_0 + \beta_1 \mathrm{Won}_{jt} + \beta_2 X'_{jt} + \beta_3 X''_{ijt} + a'_{ij} + a''_{i} + e_{ijt} \qquad (2)$$

where $Y_{ijt}$ represents the performance rating for the $i$th player from the $j$th team playing at the $t$th time. This is similar to the specification in the main analysis, but with three crucial differences. First, there are only two levels of analysis. Second, $t$ indexes the different times a player played matches. Third, $a'_{ij}$ and $a''_{i}$ represent the fixed, unknown factors at the player and team levels (i.e., the unmeasured effects are fixed). This specification uses a within estimator, akin to a repeated-measures analysis of variance.

[We removed the third level of analysis (games or pairs of teams) because virtually none of the teams played twice against each other (e.g., we had only one Portugal vs. Poland game)]. Fixed-effects regression is likely to remove the effect of time-invariant characteristics related to teams and players; this way, we can assess the net effect of the predictors on the outcome variable. One advantage of this analysis is that, given that between-observation variance is not used in the estimation of the regression coefficients, fixed-effects regression is less likely to be affected by omitted-variable bias (Rabe-Hesketh & Skrondal, 2008).

As an example, our dataset includes 5 observations from Lionel Messi while playing for Argentina: against Uruguay (2011, lost), against the Netherlands (2014, won), against Colombia (2015, won), and twice against Chile (2015 and 2016; lost both times). It is thus possible to compare the differences in Messi’s ratings (i.e., the within-player changes) when Argentina lost and won. Fixed-effects regression basically averages these within-player differences across all players in the sample, resulting in an average “treatment effect” of winning on penalties.
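The within-player logic can be sketched as follows. This is a deliberately simplified illustration with a single predictor (won vs. lost) and a single player fixed effect; it omits the controls and the team-level effect, and it is not the multi-way fixed-effects procedure used for the reported estimates. The function name and all ratings are hypothetical; only the win/loss pattern of player "A" mirrors the five Messi observations listed above.

```python
from collections import defaultdict


def within_estimate(observations):
    """Within (fixed-effects) slope of rating on won, with player effects.

    observations: iterable of (player, won, rating) tuples, won in {0, 1}.
    Both variables are demeaned within player, and the slope is the OLS
    coefficient on the demeaned values. Players whose teams always won or
    always lost have no within variation and contribute nothing.
    """
    by_player = defaultdict(list)
    for player, won, rating in observations:
        by_player[player].append((won, rating))

    num = den = 0.0
    for rows in by_player.values():
        mean_won = sum(w for w, _ in rows) / len(rows)
        mean_rating = sum(r for _, r in rows) / len(rows)
        for w, r in rows:
            num += (w - mean_won) * (r - mean_rating)
            den += (w - mean_won) ** 2
    if den == 0:
        raise ValueError("no within-player variation in the outcome indicator")
    return num / den


if __name__ == "__main__":
    # Hypothetical ratings for illustration only.
    data = [
        ("A", 0, 3.0), ("A", 1, 3.5), ("A", 1, 3.0), ("A", 0, 3.0), ("A", 0, 2.5),
        ("B", 1, 3.0), ("B", 0, 2.5),
    ]
    print(round(within_estimate(data), 3))
```

Demeaning within player removes anything constant about a player (for example, overall ability or reputation), which is why only changes in a player’s ratings between won and lost shoot-outs drive the estimate.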


Table 4. Linear fixed-effects regression using ‘won penalty shoot-outs’ as the main independent variable and the players’ ratings as the dependent variable, for players who played in at least two games.

                                              (1)            (2)
                                              All players    Did not participate
                                                             in shoot-out
Won on Penalties (1 = Yes)                    0.22**         0.32*
Teams’ ELO rating difference                  -0.00***       -0.00
Home advantage (1 = Yes)                      -0.11          -0.23
First Kick (1 = Yes)                          -0.07          -0.16
Player scored during game (1 = Yes)           0.66***        0.90***
Player Participated in Shoot-out (1 = Yes)    0.01
Scored in Shoot-out (1 = Yes)                 0.25*
Teams’ shots on target difference             0.02           0.01
Team scored last during game (1 = Yes)        0.05           0.29
Number of observations                        537            199
AIC                                           545            185
BIC                                           584            208
Log Likelihood                                -263           -85

Note. The left panel (1) includes all players who played in at least two games; the right panel (2) includes players who played in at least two games and did not participate in the penalty shoot-out. AIC = Akaike information criterion. BIC = Bayesian information criterion. *** p < .001 ** p < .01 * p < .05

We conducted fixed-effects regression for multilevel, multiple-membership observations using Correia’s (2016) procedure and formulas. We removed variables without within-player variation from the analyses (e.g., whether the player was a goalkeeper). In addition, as required by this type of analysis, we included only players who participated in at least two games.

Table 4 shows these analyses, both when including all players (left panel) and when restricting the sample to players who did not participate in the penalty shoot-out (right panel). As before, the fixed-effects regression analyses reveal that reporters tended to give a higher rating to players when their team won on penalties than when their team lost on penalties.

5. Discussion

A central tenet in judgment and choice research is that a good outcome does not necessarily imply a good decision process (Baron, 1985; Hastie & Dawes, 2009). It follows that it is unwise to assess decisions based solely on outcomes. Consistent with this, a strong argument for using subjective ratings in performance appraisals in industrial-organizational and educational psychology is that ratings are able to capture actual behavior (what performers do; Landy & Conte, 2004). In contrast, ‘objective measures’ tend to tap mostly outcomes (Murphy & Cleveland, 1995). This tends to be a problem because outcomes are influenced by external forces, such as the environment or mere luck. The use of subjective ratings, therefore, may be considered fairer than using objective measures due to their focus on performance. But is it true that raters and their subjective ratings focus only on behavior? Researchers who have studied the outcome bias suggest that this is unlikely. These authors have found that both positive and negative outcomes tend to sway raters in their evaluation of others’ behavior or decision-making process (Baron & Hershey, 1988; Sezer, Zhang, Gino, & Bazerman, 2016; Savani & King, 2015).

In this paper, we extended results on the outcome bias found in laboratory studies by exploiting a specific feature of association football (soccer). Part of the reason there are few, if any, field studies examining the outcome bias is endogeneity. In the field, performance ratings are naturally correlated with outcomes because of a common variable: actual performance. This makes it very difficult to discriminate outcome bias from accurate performance ratings. We used the outcomes of penalty shoot-outs as external shocks, orthogonal to actual performance, that affect players’ ratings. This is justified by previous research showing that penalty shoot-out outcomes seem to be unrelated to skill or previous penalty records, at both the team and individual levels (Silver, 2014; The Economist’s Data Team, 2017). Our preliminary analysis also showed that winning on penalties was unrelated to important variables in predicting football outcomes, such as ELO rating, home advantage, or in-game performance (shots on target).

We found that winning by penalties did have an effect on reporters’ performance ratings of football players after controlling for several variables. Furthermore, the outcome effect remained even after we excluded from the sample those players who had participated in the penalty shoot-out. We found this not only when comparing ratings between teams that had played against each other but also, using fixed-effects regression, when examining changes in ratings within footballers playing at different times.

However, the effect was relatively small: On average, teams that won by penalties had ratings a fourth of a standard deviation higher than those that did not.
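To make the within-player logic concrete, the following is a minimal sketch of a fixed-effects (within) estimator in Python. It is an illustration under stated assumptions, not the specification estimated here: the function name `within_slope`, the variable names, and the six toy observations are all hypothetical, and the control variables used in our analyses are omitted. Each player's own mean is subtracted from both the shoot-out outcome and the rating, so stable differences between players drop out and only changes within the same player identify the slope.

```python
from collections import defaultdict


def within_slope(player_ids, won, rating):
    """Fixed-effects (within) slope of rating on won, demeaning by player."""
    # Accumulate per-player sums of won and rating, plus observation counts.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for pid, w, r in zip(player_ids, won, rating):
        totals[pid][0] += w
        totals[pid][1] += r
        totals[pid][2] += 1

    # Regress demeaned rating on demeaned won (OLS through the origin).
    num = 0.0
    den = 0.0
    for pid, w, r in zip(player_ids, won, rating):
        sum_w, sum_r, n = totals[pid]
        dw = w - sum_w / n
        dr = r - sum_r / n
        num += dw * dr
        den += dw * dw
    if den == 0.0:
        raise ValueError("no within-player variation in the outcome variable")
    return num / den


# Hypothetical toy data: three players, each rated once after winning and
# once after losing a shoot-out. Baseline ratings differ across players
# (6.0, 7.0, 5.0); a win adds an arbitrary 0.5 points by construction.
player_ids = ["A", "A", "B", "B", "C", "C"]
won = [1, 0, 0, 1, 1, 0]
rating = [6.5, 6.0, 7.0, 7.5, 5.5, 5.0]

slope = within_slope(player_ids, won, rating)
print(slope)
```

Because the toy ratings contain no noise, the recovered slope equals the 0.5-point difference built into the data, regardless of the players' different baselines. The value 0.5 is arbitrary and is in raw rating points, not the standardized effect reported above.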

One implication of these results is that they may weaken arguments in favor of subjective ratings. Although, normatively speaking, human beings can discriminate between behavior and outcomes, our results suggest that, descriptively, people do put unwarranted weight on outcomes. This is consistent with the extensive literature on biases in performance ratings (Landy & Conte, 2004; Murphy & Cleveland, 1995). As such, our study shows an important limitation of subjective ratings of overall performance. However, there is a limitation in our study, which may prevent generalizing these findings to organizational or work settings, as we suggest above. An important reason subjective ratings are used in organizations is that they are psychologically relevant (Campbell, 2012; Landy & Farr, 1983) and often domain-specific (Rotundo & Sackett, 2002). Some researchers have proposed and found that using different performance dimensions improves the quality of subjective ratings. For example, Arkes, González-Vallejo, Bonham, Kung, and Bailey (2009) found that when people used different criteria or performance dimensions (i.e., disaggregated ratings of performance), they were more likely to assign accurate performance ratings than when they used a single, overall criterion (i.e., holistic ratings of performance). As a reviewer noted, because Goal reporters only used overall ratings of performance, they likely assigned less accurate performance ratings than those given by managers after a careful job analysis. A better dependent variable may have been judgments about specific performance dimensions (e.g., passing efficiency). We should also note that the outcome bias may be more important in some jobs than others. For example, among salespeople, the outcome bias is likely to be strong, because sales volume is such an important outcome in that job. In other jobs, such as receptionist, which usually lack explicit outcomes, the outcome bias is less likely to be strong.

In sum, these limitations notwithstanding, in this paper we found evidence of the outcome bias in the (football) field. We hope to have contributed to the economic psychology literature by showing that decision biases are alive and well, beyond the laboratory.

References

Ahmed, M., & Burn-Murdoch, J. (2017, March 17). How to save a penalty: the truth about football’s toughest shot. Retrieved November 30, 2017, from https://www.ft.com/content/8a190a16-0a3f-11e7-97d1-5e720a26771b

Allison, P. D. (2009). Fixed effects regression models (1st ed.). Los Angeles: SAGE Publications, Inc.

Apesteguia, J., & Palacios-Huerta, I. (2010). Psychological pressure in competitive environments: Evidence from a randomized natural experiment. American Economic Review, 100(5), 2548–2564.

Arkes, H. R., González-Vallejo, C., Bonham, A. J., Kung, Y.-H., & Bailey, N. (n.d.). Assessing the merits and faults of holistic and disaggregated judgments. Journal of Behavioral Decision Making, 23(3), 250–270. https://doi.org/10.1002/bdm.655

Bar-Eli, M., Azar, O. H., Ritov, I., Keidar-Levin, Y., & Schein, G. (2007). Action bias among elite soccer goalkeepers: The case of penalty kicks. Journal of Economic Psychology, 28(5), 606–621.

Baron, J. (1985). Rationality and intelligence. Cambridge, UK: Cambridge University Press. https://doi.org/10.1017/CBO9780511571275

Baron, J., & Hershey, J. C. (1988). Outcome bias in decision evaluation. Journal of Personality and Social Psychology, 54(4), 569–579.

Bazerman, M. H., & Moore, D. A. (2012). Judgment in managerial decision making (8th ed.). Hoboken, NJ: Wiley.

Bommer, W. H., Johnson, J. L., Rich, G. A., Podsakoff, P. M., & MacKenzie, S. B. (1995). On the interchangeability of objective and subjective measures of employee performance: A meta-analysis. Personnel Psychology, 48(3), 587–605.

Bosco, F. A., Aguinis, H., Singh, K., Field, J. G., & Pierce, C. A. (2015). Correlational effect size benchmarks. Journal of Applied Psychology, 100(2), 431–449. https://doi.org/10.1037/a0038047

Campbell, J. P. (2012). Behavior, performance, and effectiveness in the twenty-first century. In S. W. J. Kozlowski (Ed.), The Oxford handbook of organizational psychology, Volume 1 (pp. 159–196). New York, NY: Oxford University Press.

Campbell, J. P., McHenry, J. J., & Wise, L. L. (1990). Modeling job performance in a population of jobs. Personnel Psychology, 43(2), 313–333.

Cascio, W. F., & Aguinis, H. (2010). Applied psychology in human resource management (7th ed.). Edinburgh: Pearson Education.

Castellano, J., Casamichana, D., & Lago, C. (2012). The use of match statistics that discriminate between successful and unsuccessful soccer teams. Journal of Human Kinetics, 31, 139–147. https://doi.org/10.2478/v10078-012-0015-7

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Connolly, T., & Zeelenberg, M. (2002). Regret in decision making. Current Directions in Psychological Science, 11(6), 212–216. https://doi.org/10.1111/1467-8721.00203

Correia, S. (2016). reghdfe: Estimating linear models with multi-way fixed effects (2016 Stata Conference). Stata Users Group. Retrieved from https://econpapers.repec.org/paper/bocscon16/24.htm

Das, A. (2016, June 27). Messi says his Argentina career is over. The New York Times. Retrieved from messi-says-his-argentina-career-is-over

El Dinamo. (2017, September 21). Marcelo Bielsa a periodistas franceses: “La compañía de ustedes es siempre despreciable.” Retrieved November 30, 2017, from http://www.eldinamo.cl/nacional/2017/09/21/marcelo-bielsa-a-periodistas-franceses-la-compania-de-ustedes-es-siempre-despreciable/

Gino, F., Moore, D. A., & Bazerman, M. H. (2009). No harm, no foul: The outcome bias in ethical judgments. Harvard Business School. Retrieved from https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1099464

Hastie, R., & Dawes, R. M. (2009). Rational choice in an uncertain world: The psychology of judgment and decision making (2nd ed.). Los Angeles, CA: Sage Publications, Inc.

Hershey, J. C., & Baron, J. (1992). Judgment by outcomes: When is it justified? Organizational Behavior and Human Decision Processes, 53(1), 89–93.

Ilgen, D. R., & Favero, J. L. (1985). Limits in generalization from psychological research to performance appraisal processes. Academy of Management Review, 10(2), 311–321.

Johnston, M. W., & Marshall, G. W. (2016). Sales force management: Leadership, innovation, technology. New York, NY: Routledge.

Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.

Lago-Peñas, C., Lago-Ballesteros, J., Dellal, A., & Gómez, M. (2010). Game-related statistics that discriminated winning, drawing and losing teams from the Spanish soccer league. Journal of Sports Science & Medicine, 9(2), 288–293.

Lago-Peñas, C., Lago-Ballesteros, J., & Rey, E. (2011). Differences in performance indicators between winning and losing teams in the UEFA Champions League. Journal of Human Kinetics, 27(1), 135–146. https://doi.org/10.2478/v10078-011-0011-3

Landy, F. J., & Conte, J. M. (2004). Work in the 21st century. New York, NY: Blackwell.

Landy, F. J., & Farr, J. L. (1983). The measurement of work performance: Methods, theory, and applications. New York: Academic Press.

Lasek, J., Szlávik, Z., & Bhulai, S. (2013). The predictive power of ranking systems in association football. International Journal of Applied Pattern Recognition, 1(1), 27–46.

Lefgren, L., Platt, B., & Price, J. (2014). Sticking with what (barely) worked: A test of outcome bias. Management Science, 61(5), 1121–1136. https://doi.org/10.1287/mnsc.2014.1966

Liu, H., Gomez, M.-Á., Lago-Peñas, C., & Sampaio, J. (2015). Match statistics related to winning in the group stage of 2014 Brazil FIFA World Cup. Journal of Sports Sciences, 33(12), 1205–1213. https://doi.org/10.1080/02640414.2015.1022578

Murphy, K. R., & Cleveland, J. N. (1995). Understanding performance appraisal: Social, organizational, and goal-based perspectives. Thousand Oaks, CA: SAGE Publications.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (Vol. 1). New Jersey: Sage.

Rotundo, M., & Sackett, P. R. (2002). The relative importance of task, citizenship, and counterproductive performance to global ratings of job performance: A policy-capturing approach. Journal of Applied Psychology, 87(1), 66–80.

Savani, K., & King, D. (2015). Perceiving outcomes as determined by external forces: The role of event construal in attenuating the outcome bias. Organizational Behavior and Human Decision Processes, 130, 136–146. https://doi.org/10.1016/j.obhdp.2015.05.002

Sezer, O., Zhang, T., Gino, F., & Bazerman, M. H. (2016). Overcoming the outcome bias: Making intentions matter. Organizational Behavior and Human Decision Processes, 137, 13–26. https://doi.org/10.1016/j.obhdp.2016.07.001

Silver, N. (2014, June 27). Extra time isn’t a crapshoot in the knockout round, but penalties are. Retrieved November 30, 2017, from https://fivethirtyeight.com/features/extra-time-isnt-a-crapshoot-in-the-knockout-round-but-penalties-are/

Skitka, L. J., & Tetlock, P. E. (1993). Providing public assistance: Cognitive and motivational processes underlying liberal and conservative policy preferences. Journal of Personality and Social Psychology, 65(6), 1205.

Skrondal, A., & Rabe-Hesketh, S. (2008). Multilevel and related models for longitudinal data. In J. de Leeuw & E. Meijer (Eds.), Handbook of multilevel analysis (pp. 275–299). New York, NY: Springer.

The Economist. (2017, August 11). Even with the ABBA format, penalty shootouts remain a lottery. Retrieved November 30, 2017, from https://www.economist.com/blogs/gametheory/2017/08/take-chance-me

Vecsey, G. (1990, July 5). Soccer, like life, is a lottery. The New York Times. Retrieved from http://www.nytimes.com/1990/07/05/sports/sports-of-the-times-soccer-like-life-is-a-lottery.html

Whoscored. (2018). Glossary. Retrieved July 25, 2018, from https://www.whoscored.com/Glossary

Appendix A

List of Games Included in the Analyses

Year  Tournament             Teams
2008  UEFA Champions League  Chelsea vs Manchester United
2008  European Championship  Croatia vs Turkey
2008  European Championship  Spain vs Italy
2010  World Cup              Paraguay vs Japan
2010  World Cup              Uruguay vs Ghana
2011  Copa América           Argentina vs Uruguay
2011  Copa América           Brazil vs Paraguay
2011  Copa América           Paraguay vs Venezuela
2012  UEFA Champions League  Bayern München vs Chelsea
2012  League Cup             Bradford City vs Arsenal
2012  League Cup             Cardiff City vs Liverpool
2012  European Championship  England vs Italy
2012  European Championship  Portugal vs Spain
2012  Africa Cup Of Nations  Zambia vs Côte D'Ivoire
2013  UEFA Europa League     Basel vs Tottenham
2013  Africa Cup Of Nations  Burkina Faso vs Ghana
2013  Africa Cup Of Nations  South Africa vs Mali
2013  Confederations Cup     Spain vs Italy
2013  Confederations Cup     Uruguay vs Italy
2014  World Cup              Brazil vs Chile
2014  World Cup              Costa Rica vs Greece
2014  League Cup             Liverpool vs Middlesbrough
2014  World Cup              Netherlands vs Argentina
2014  World Cup              Netherlands vs Costa Rica
2014  UEFA Europa League     Sevilla vs Benfica
2015  Copa América           Argentina vs Colombia
2015  Copa América           Brazil vs Paraguay
2015  Copa América           Chile vs Argentina
2015  Africa Cup Of Nations  Côte D'Ivoire vs Ghana
2015  Africa Cup Of Nations  DR Congo vs Equatorial Guinea
2015  League Cup             Liverpool vs Carlisle United
2015  League Cup             Manchester United vs Middlesbrough
2015  League Cup             Stoke City vs Chelsea
2015  Concacaf Gold Cup      United States vs Panama
2016  Copa América           Argentina vs Chile
2016  UEFA Champions League  Atlético Madrid vs PSV Eindhoven
2016  European Championship  Germany vs Italy
2016  League Cup             Hull City vs Newcastle United
2016  League Cup             Liverpool vs Manchester City
2016  Copa América           Peru vs Colombia
2016  European Championship  Poland vs Portugal
2016  UEFA Champions League  Real Madrid vs Atlético Madrid
2016  European Championship  Switzerland vs Poland