Submitted by Matthias Nikolaus Hilgarth, BSc

Submitted at Department of Economics

Supervisor Dr. Mario Lackner

October 2020

Better wear red? The influence of the color of sportswear on the outcome of Olympic sport competitions

Master Thesis to obtain the academic degree of Master of Science in the Master’s Program Economics

JOHANNES KEPLER UNIVERSITY LINZ, Altenbergerstraße 69, 4040 Linz, Austria, www.jku.at, DVR 0093696

Sworn Declaration

I, Matthias Nikolaus Hilgarth, hereby declare under oath that the thesis submitted is my own unaided work, that I have not used sources other than the ones indicated, and that all direct and indirect sources are acknowledged as references. This printed thesis is identical with the electronic version submitted.

Linz, Place and Date Matthias Nikolaus Hilgarth

"All I am or can be I owe to my angel mother." Abraham Lincoln

For Mom

Acknowledgments

First and foremost, I would like to express my deepest gratitude to Dr. Mario Lackner for providing me with the topic of this Master’s thesis, for trusting me to work independently, and for his patience in giving me time and support whenever I needed it.

Moreover, I am very grateful to Alexander Ahammer, PhD for his assistance throughout the whole process. Additionally, I would like to thank Univ.-Prof. Dr. Martin Halla for his valuable suggestions. Thanks to the whole Department of Economics for providing such an enjoyable working environment. Your efforts to make students part of the department are truly appreciated.

A special thanks to my partner Martina, for your kindness and encouragement. Your love protected me in my darkest time. To my friends Andreas, Daniela, Petra, Andreas, Gernot, Tobias, Wolfgang, Anna, Isabel, Martina, Monika, Julia, Peter, Rene and Sophie, thank you for the way you enriched my time and for being there whenever I needed a friend. I could not have done it without you.

With boundless love and appreciation, I would like to thank my family. The unconditional support of my mother Gertrude, my father Klaus and my sister Sophie allows me to chase my dreams. No words could ever describe my gratitude for your trust and eternal love.

Contents

Abstract

1 Introduction

2 Theory and Related Literature
2.1 The Economics of Contests and Competitions
2.2 Replication Crisis in Economics

3 Influence on Competition Outcome
3.1 First Mover Advantage
3.2 Home Field Advantage
3.3 Incentive Distribution
3.4 Heterogeneity of the Contestants
3.5 Color Influences on Perception and Performance

4 Institutional Setting
4.1 Boxing
4.2 Taekwondo
4.3 Wrestling

5 Data
5.1 Descriptive Statistics
5.2 Replication of Hill and Barton (2005a)
5.2.1 Win percentage per Sport
5.2.2 Win percentage per Degree of Asymmetry
5.2.3 Criticism of Hill and Barton (2005a)

6 Methods and Empirical Analysis
6.1 General Analysis
6.2 Effects on Sex
6.3 Effects on Random
6.4 Cultural Effects
6.5 Expand Hill and Barton (2005a)

7 Heterogeneity on Sports
7.1 Boxing Results
7.2 Taekwondo Results
7.3 Wrestling Results

8 Conclusion

9 Literature

A Appendix - Data per Sport
A.1 Boxing Data
A.2 Taekwondo Data
A.3 Wrestling Data

B Robustness Checks for 6.1 General Analysis

C Results and Robustness checks for 6.4 Cultural Effects
C.1 Africa
C.2 Americas
C.3 Asia
C.4 Europe
C.5 Oceania

D Robustness Checks for 7 Heterogeneity on Sports

List of Figures

4.1 Sample image of men boxing at the Olympics 2008
4.2 Sample image of men boxing at the Olympics 2016
4.3 Sample image of women boxing at the Olympics 2016
4.4 Sample image of a taekwondo bout in the Olympics 2016
4.5 Sample image of wrestling at the Olympics 2012

5.1 Win percentage per color, Replication of Hill and Barton (2005a)
5.2 Win percentage per color, Replication of Hill and Barton (2005a)
5.3 Win percentage per degree of asymmetry
5.4 Same as 5.3, data: Hill and Barton (2005a)
5.5 Win percentage per degree of asymmetry, from Hill and Barton (2005a, p. 293)

6.1 Visual help to understand split in non-random, random first and random later
6.2 Win percentage per color and sport, Replication of Hill and Barton (2005a)

List of Tables

5.1 Summary of selected variables in the data
5.2 Distribution of bouts by sex and sports in the data
5.3 Distribution of bouts by seeding and sports in the data
5.4 Distribution of bouts by Olympics and sports in the data
5.5 Tables of χ²-results per sport in 2004 Olympics, only male competitors
5.6 Tables of χ²-results per sport in 2004 Olympics, only female competitors
5.7 Sign-test of rounds, only male competitors in the 2004 Olympics
5.8 Sign-test of weight classes, only male competitors in the 2004 Olympics
5.9 Tables of χ²-results per degree of asymmetry in 2004 Olympics, only male competitors

6.1 Effects of the color of the sportswear on winning the bout
6.2 Effects of the color of the sportswear on winning - sex splits
6.3 Effects of the color of the sportswear on winning - random splits
6.4 Effects of the color of the sportswear on winning - continent splits
6.5 Tables of χ²-results per sport in all tournaments without seeding, only male competitors

7.1 Boxing - points per round and points per bout
7.2 Taekwondo - points per round and points per bout
7.3 Taekwondo - Attack and Penalty Points
7.4 Wrestling - points per round and points per bout

Abstract

The existence of biological and psychological effects of colors has been well documented in animals and humans. Based on these findings, various researchers have postulated that the color of the sportswear affects the outcome of sporting competitions. If true, this would have enormous consequences for tournaments. The aim of this thesis was to expand previous research in order to test this claim. To do so, data for three different sports (boxing, taekwondo and wrestling) was gathered for the five Olympic tournaments between 2000 and 2016. This thesis does not find any evidence supporting the claim that red sportswear is a competitive advantage in competitions. The results are consistent across different sports, sexes and geographical distinctions. While this conclusion is reassuring for the design of existing tournaments, it also highlights the need for replications in the scientific process.

Chapter 1

Introduction

Popular media often portray red as a color with superpowers. While most reporting on scientific results leaves a lot to be desired, these reports are based on genuine research on the effects of color. One example is the red dress effect, which states that men perceive women dressed in red as more attractive (Strain and Daniel, 27.02.2012). Another frequent claim is that red enhances performance. Hill and Barton (2005a) stated that the color red has a significant effect on the outcome of competitions in boxing, taekwondo and wrestling. People use such scientific evidence as a foundation for their decisions. An example is Jürgen Klinsmann, former coach of the German national team, who wanted to change the primary color of the team to red, as red allegedly contributes to positive thinking and a more aggressive style of play (Zelustek and Niklaus, 01.02.2006).

What are the implications for sports tournaments? Should we exclude the color red from sports tournaments because it gives an unwanted advantage? From the literature, we know that several other factors can influence the outcome of contests. The best known of these are the home-field advantage and differences in effort that depend on the incentive distribution. It would therefore not be surprising if the color of the sportswear also influenced the outcome. A biased tournament design would have large effects on sports tournaments such as the Olympics, and on betting markets.

Sport is not the only area of importance. Many problems and situations relevant for economists can be described as tournaments. Tournaments can be seen as competitions where the relative performance at a given stage decides the winner. Contestants have to choose their effort depending on the prizes. Notable fields in economics that can be described with the help of tournament theory include the labor market, sales, and research and innovation tournaments. If the color someone is wearing has a genuine effect, the question arises whether the approach in these fields has to change as well.

Before discussing implications, the reliability of the results that indicate a competitive advantage of the color red has to be questioned. Various failed attempts to replicate research have led to claims of a replication crisis in the social sciences. Several factors contribute to it: small sample sizes, sloppy research designs and a publishing culture that suffers from publication bias. Therefore, it is highly important that already published results are replicated, preferably with larger data samples or different methods, to ensure that they are genuine advancements in human knowledge.

The aim of this thesis is to answer the question of whether competing in red comes with an advantage. Data was gathered on all Olympic competitions in boxing, taekwondo and wrestling since 2000. These sports are well suited for this study, as competitors wear either red or blue and the color is assigned by a draw. Since Hill and Barton (2005a) found a positive effect of the color red for these sports in 2004, their study was extended to rule out the possibility of a statistical fluke in their smaller sample.

Chapter 2 gives an overview of the relevant literature. Section 2.1 introduces some of the basic concepts of tournament theory, as well as some examples of fields where tournaments can be used. The importance of replications in the scientific process is illustrated in section 2.2. In chapter 3, some aspects that influence the outcome of tournaments are highlighted. While this list makes no claim to be complete, it gives an overview of other factors that influence the outcome of competitions. The literature regarding the effects of colors is discussed in section 3.5. The sports used in this thesis are introduced in chapter 4. In boxing, taekwondo and wrestling all competitors have to compete either in red or in blue, which is an ideal setting to investigate a color effect.

The data used is described in chapter 5. This chapter features two main sections. First, all the important descriptive statistics are given, and necessary manipulations of the data, like the exclusion of the 2016 wrestling data, are explained. In the second section, the work by Hill and Barton (2005a) is replicated. This section also features an extensive discussion of publications that criticize their work. Chapter 6 presents the results of the analysis, as well as a description of the main methods used. The data allows for a more detailed analysis if the sample is split into sub-samples for the different sports. These results are described in chapter 7. The conclusion, with a short discussion of the results, can be found in chapter 8.

Chapter 2

Theory and Related Literature

This thesis primarily builds on and contributes to tournament theory by performing a replication and an extension of previous research. In this chapter the relevant theory and literature are discussed. Section 2.1 outlines various fields in economics where competition design, especially tournament theory, is of major importance. Section 2.2 highlights the need for replications in economics and the social sciences.

2.1 The Economics of Contests and Competitions

"[. . . ] tournament theory has stood the test of time and been supported by many subsequent pieces of empirical research. It also passes the smell test: The more grotesque your boss’s pay and the less he has to do to earn it, the bigger the motivation for you to work for a promotion." (Harford, 2006)

"Whom to hire, promote, admit into elite universities, elect, or issue government contracts to are all determined in a tournament-like (winner-take-all) structure." (Fryer and Loury, 2005, p. 263)

Many fields can be described as tournaments, ranging from managers and workers on the labor market and financial markets all the way to sports competitions. The winner of a tournament is decided by the best relative performance, regardless of absolute performance levels. The prize, and by extension the prize spread, is the main motivation for a participant to compete and to choose the optimal effort to invest in the tournament. An optimal prize ensures the best output of the tournament, including all of the individual outputs. Therefore, competition design, and specifically the design of tournaments, is a major area of interest within the field of economics.

Since the groundbreaking work of Lazear and Rosen (1981), the importance of tournaments as a research object has grown. They showed that using a rank-order tournament to determine pay can create the same performance incentives for risk-neutral workers as a piece rate. This is especially useful in fields where the marginal product of a worker is much more difficult to observe than the worker's relative rank in the company. Nevertheless, if workers are heterogeneous and firms cannot distinguish between different types of workers, an efficient allocation of workers through competitive contests cannot be assumed automatically.
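The incentive mechanism of a rank-order tournament can be sketched in a few lines. The notation below is assumed here for illustration (two identical risk-neutral workers j and k, additive noise, winner prize W_1, loser prize W_2, effort cost C) and is a textbook-style simplification rather than a quotation from Lazear and Rosen (1981). G denotes the distribution function of the noise difference \varepsilon_k - \varepsilon_j and g its density.

```latex
\begin{align}
  q_j &= \mu_j + \varepsilon_j
      && \text{(output = effort + noise)} \\
  P(\mu_j, \mu_k) &= \Pr(q_j > q_k) = G(\mu_j - \mu_k)
      && \text{(probability that $j$ wins)} \\
  \max_{\mu_j}\; & P(\mu_j, \mu_k)\, W_1 + \bigl(1 - P(\mu_j, \mu_k)\bigr)\, W_2 - C(\mu_j)
      && \text{(worker $j$'s problem)} \\
  (W_1 - W_2)\, g(\mu_j - \mu_k) &= C'(\mu_j)
      && \text{(first-order condition)} \\
  (W_1 - W_2)\, g(0) &= C'(\mu^{*})
      && \text{(symmetric equilibrium, $\mu_j = \mu_k = \mu^{*}$)}
\end{align}
```

Under these assumptions, only the prize spread W_1 - W_2 enters the effort condition, not the level of the prizes.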

Green and Stokey (1983) compared tournaments and individual contracts. In their theoretical model, they assumed that output is the only observable variable and that output is a function of effort plus an additional shock. If there is no shock, individual (and optimal) contracts are better than the (optimal) tournament. This relation is reversed if the distribution of the shock is sufficiently diffuse. If a large enough number of agents enter a tournament, the optimal tournament is as suitable as individual contracts used with knowledge of the individual shocks.

Bull, Schotter, and Weigelt (1987) used experiments to test some assumptions about incentives arising from the literature. In their comparison of tournaments and piece rates, the latter resulted in the expected effort, and therefore productivity, with little variance across the participants. This low variance was not found in tournaments. In uneven tournaments, where one group is advantaged, the advantaged participants chose effort levels similar to the theoretical predictions. The other group chose effort levels higher than the theoretical equilibrium level.

Rosen (1986) argued that the difference in prize money or salary at the top end of the competition ensures effort in later stages. As a result, salaries and prize money have to be a convex function, to compensate for the risk aversion of contestants, who could easily be satisfied with the position already achieved. These high salaries at the top also have incentive effects on those at lower stages of the same competition.

Carmichael (1983) looked at a different model, where the productivity of the worker is not known to the worker and job satisfaction is not known to the firm. In this model, a fixed number of highly paid jobs allocated to the most senior workers can ensure efficiency in the internal labor market.

Knoeber and Thurman (1994) empirically tested some of the predictions that arise from the tournament literature. In the broiler chicken market, they found evidence for all of the claims they tested. First, prize changes that do not change the prize differentials for better relative performance do not change the effort of the competitors. Second, those with less ability take higher risks. In general, producers with lower average production had a higher variance over the studied period. Last, tournament organizers do use a form of sorting and handicapping in order to minimize the negative effects of mixed tournaments, where the competitors have very heterogeneous levels of ability.

To give a better understanding of how tournaments are used, some applications are discussed below. These examples should make it easier to see where we encounter tournaments and how common they are. First, promotion tournaments within firms. According to Bognanno (2001), several criteria have to be fulfilled for a firm's internal job market to resemble a tournament. There has to be a high promotion rate within the firm, the pay gaps have to increase with increasing rank, the hierarchy has to be an important factor for the salary, and promotions have to come with rewards. The study used data on salaries and bonuses from about 600 companies for up to 8 years, which gives information on the income of 25,000 managers and executives per year. It found evidence for the tournament structure of corporations, since most positions were filled through internal promotions, and these promotions and the subsequent rank had a large effect on pay. Additionally, the pay of the CEO rises with the number of employees. This is consistent with the predictions of the literature, because the prize of a tournament should rise with the number of competitors entering, to guarantee an optimal outcome.

De Varo (2006) used cross-section data from the Multi-City Study of Urban Inequality. Using a structural model, he showed that some theoretical predictions of tournament theory hold. The effort of workers increases with the wage spread implemented through promotions, and relative performance is the driving factor for promotions.

Further predictions of tournament theory were tested by Conyon, Peck, and Sadler (2001) with data from 100 of the top 150 companies listed at the London Stock Exchange, which represent 63% of the market value. They found the expected convex relationship between executive compensation and hierarchical levels. The pay of the CEO rose with the number of competing executives. On average, an additional executive on the board increases the pay of the CEO by 3%. Nevertheless, they found no evidence that executive pay has a robust impact on the performance of the company.

Second, tournaments are also applied in sales. Casas-Arce and Martínez-Jerez (2009) looked at regional tournaments among retailers of a commodities producer. These contests took place over multiple phases, resulting in a multiperiod contest. After monthly intermediate rankings, equal prizes were awarded to a fixed number of participants at the end of the tournament. They found that participants who gained an early lead decreased their effort as their lead was extended, while the trailing participants only reduced their effort when the gap to the leaders became very large. All in all, effort increased with the introduction of the tournaments, although this incentive weakened as the number of participants increased.

Delfgaauw et al. (2015) also used a natural field experiment to test the predictions of tournament theory. They used a retail chain in the Netherlands that sells computer games, music, and movies. The tournaments were held in two stages, where the prize of the first stage was a monetary bonus and the possibility to compete again in the second stage, where the prize was again an added bonus. They found that an increased convexity in the prize structure, that is, a higher prize in the second stage compared to the first stage, increased the performance in the second round at the expense of first-round performance. This result is in line with the theoretical predictions.

Deller and Sandino (2019) looked at an enhanced version of sales tournaments. In addition to sales as an objective criterion, winners also had to comply with the company’s values and long-term goals. This soft factor was assessed by a well-respected senior manager of the company. This sales tournament incorporating managerial discretion resulted in increased sales relative to a control group. The increase in sales held for both winners and losers of the tournament. Such an approach can produce more desirable outcomes for the sponsors of the tournament, but the assessment of the subjective part can lead to undesired effects.

The last field for which examples of tournament research are discussed is research and innovation tournaments. In this field, the public sector is the biggest sponsor of tournaments, but they are increasingly used in academia and by private corporations. Tournaments are particularly well suited to research, as research inputs cannot be observed. In his theoretical work, Taylor (1995) shows that the ideal strategy for each participant would be to stop research when innovation exceeds a critical point, which should be sufficient for winning. Nevertheless, there is no global stopping criterion for most research tournaments. As a result, in many cases at least one of the participants will overshoot the criteria and the goals of the tournament. The sponsor of the contest can exploit this by setting the goal below the original aim and hoping that one of the contestants overshoots this goal and reaches the original research aim. With this strategy, there is a risk that nobody overshoots and the original, unstated aim is never reached. He also claims that free entry is not optimal, because an increasing number of participants lowers the average effort of the participants.

Among other results, this effect on the average performance was also shown empirically by Boudreau, Lacetera, and Lakhani (2011). They used data from TopCoder Software Contests, where software development competitions are regularly hosted. The negative effect of added competitors is smaller for the top performances than for the entire sample. Therefore, for problems with higher degrees of uncertainty, a higher number of participants will have a positive impact on the top performance. The authors call this a parallel path, where more participants try more possible solutions simultaneously, increasing the chance of extreme outcomes.

Examples of possible applications of tournament theory, like the ones given above, highlight the relevance of tournaments for economists. There are additional reasons why economists should be interested in the outcomes of (Olympic) sports tournaments specifically. First, sports tournaments are a great way to test the predictions of tournament theory. The criteria to win a specific round, as well as the whole tournament, are in general clearer and more transparent than in, for example, promotion tournaments in firms. Competitors are also more likely to take the contest seriously in professional sporting competitions than in experiments or other applications. For example, we can safely assume that every player wants to win the tournament he or she is entering, while not everyone in a given company wants to be promoted to the next highest level, as the tasks and responsibilities are not the same at each level. Examples of research testing the theoretical predictions of tournament theory with sports data can be found in chapter 3.

Second, one of the fastest growing markets relies on sports tournaments. In sports betting, bettors and bookmakers rely on fair and unbiased circumstances in the contests. With the global sports betting market projected to be worth 155 billion US dollars, there is an enormous financial interest in unbiased competitions (Zion Market Research, 2018). If there is a bias that favors some competitors in the tournament, this is equally interesting for the participants of the betting market, as shown by numerous works within the betting market literature.

Third, the Olympic tournaments themselves are a relevant economic factor. The Olympic Games are the biggest multi-sport events in the world. For the majority of the participating sports, they are the most prestigious tournament. To give an impression of their importance, 28 % of the world population watched the Winter Games in 2018 (IOC, 2019). In 2016, at the latest Summer Games, which are far more popular than the Winter Games, over 11 000 athletes were viewed by 50 % of the world population (IOC, 2017). The International Olympic Committee (IOC) reported revenue of 5.7 billion US dollars in the last full Olympic cycle (2013-2016; IOC, 2017). These reported revenues only concern the IOC directly, but the money at stake is far higher, considering bonuses from sponsors, endorsement deals and other non-financial perks of competing and succeeding among the best in the world. Considering the importance of these events, there is an enormous interest in designing the competitions in a fair way.

Apart from the financial reasons listed, there is an idealistic value in having fair competitions, which arguably should be even more relevant for the Olympics, given the history of the games.

2.2 Replication Crisis in Economics

"Distinct from most sciences, economics has not fully embraced the scientific method; in particular, there is no tradition of replication in economics. Results published in economics journals are accepted at face value and rarely subjected to the independent verification that is the cornerstone of the scientific method." (McCullough, McGeary, and Harrison, 2006, p. 1093)

"Economists treat replication the way teenagers treat chastity – as an ideal to be professed but not to be practised." (Hamermesh, 2007, p. 715)

In recent years, many scientists from different fields have claimed that there is a replication crisis in their field. The best-known example is psychology, although the problems are prevalent in all social sciences. Shrout and Rodgers (2018) mention three different reasons why people see a replication crisis in psychology. These can be summarized as fraud by certain researchers, questionable practices that result in false positives, and the inability to replicate research results for a significant number of publications. One specific issue that fulfills two of the mentioned criteria is small sample sizes. In cases where data is widely available, they are a result of questionable research design. The major problem with a smaller sample size is the inherent increase in variance. Smaller sample sizes increase the possibility that a sample is fundamentally different from the population. Results arising from such samples cannot be replicated when better samples are used. This leads to an increased number of papers that cannot be replicated.

The so-called publication bias is also a huge problem with regard to the replication crisis. Sterling (1959, p. 30) wrote already 60 years ago that there is "[. . . ] evidence that in fields where statistical tests of significance are commonly used, research which yields nonsignificant results is not published [. . . ] until eventually by chance a significant result occurs-an ’error of the first kind’-and is published." Therefore, "[. . . ] the concept of accepting articles based solely on importance of research question and quality of research design, before research results are known, has rapidly expanded to include a large number of journals in psychology, political science, and other disciplines, including many of the top journals in the fields. The Center for Open Science keeps a running count of the number of journals that have adopted registered reports. At the time of this writing, there were 41 journals. No economics journals appear on the list." (Duvendack, Palmer-Jones, and Reed, 2017, p. 49)
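The link between sample size and spurious results can be made concrete with a short calculation. The Python sketch below assumes a hypothetical setting without any color effect: every bout is won by red with probability 0.5 and bouts are independent. The function names, the sample sizes and the 60 % threshold are illustrative assumptions and are not taken from any of the cited studies.

```python
from math import ceil, comb, sqrt


def win_share_standard_error(n_bouts: int, p_red: float = 0.5) -> float:
    """Standard error of the observed red win share over n_bouts independent bouts."""
    return sqrt(p_red * (1.0 - p_red) / n_bouts)


def prob_red_share_at_least(share: float, n_bouts: int, p_red: float = 0.5) -> float:
    """Exact binomial probability that red wins at least `share` of n_bouts."""
    # Smallest number of red wins that reaches the share (guard against float noise).
    k_min = ceil(share * n_bouts - 1e-9)
    return sum(
        comb(n_bouts, k) * p_red**k * (1.0 - p_red) ** (n_bouts - k)
        for k in range(k_min, n_bouts + 1)
    )


if __name__ == "__main__":
    # Hypothetical sample sizes; no color effect is present by construction.
    for n in (20, 100, 500):
        se = win_share_standard_error(n)
        tail = prob_red_share_at_least(0.6, n)
        print(f"n={n:4d}  standard error={se:.4f}  P(red share >= 60%)={tail:.4f}")
```

Under these assumptions, a red win share of 60 % or more occurs by chance in roughly a quarter of all samples of 20 bouts, and the probability shrinks quickly as the number of bouts grows.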

There are several types of replications, and no standard to assess whether research is reproducible or replicable (National Academy of Sciences, 2016). In the literature, authors distinguish between different types of replications. Pesaran (2003) mentions two types of replications: replications in the "narrow sense", where the author of the replication study only checks for errors in the original study, and in the "wide sense", where other data is used. Hamermesh (2007) uses three distinctions: "pure replications", in which the same data set and methods are used; "statistical replications", where the study differs from the original publication in at least one point, for example in data, methods or software; and "scientific replications", where different theoretical or conceptual approaches are used. Four different categories are proposed by Clemens (2017). On the one hand, there are "replications", which can be divided into "verifications" (using the same samples and methods) and "reproductions" (new sample of the same population, same methods). On the other hand, there are "robustness tests", where he also distinguishes between two types: a "reanalysis", using the same data or at least the same population with different methods and code, and an "extension", using new data. While these examples are not exhaustive, they show that although replication is an ideal in research, it is not clearly defined in the social sciences. Without going into the detail of possible advantages and disadvantages of the different forms of replication studies, all of them are at the heart of the scientific process.

To realize the scope of the problem, it is important to understand why replicability of research is so fundamental. As McCullough, McGeary, and Harrison (2006, p. 1093) stated, the "[. . . ] lack of emphasis on replication in the economics profession is regrettable because the importance of replication in the scientific process cannot be understated: ’It is attempts at replication that check whether a genuine advance in knowledge has been made or a puzzle encountered, or whether either mistake or fraud lies behind the results’ (O’Brien 1992, p. 263)". In his essay debating whether economics is on its way to becoming a hard science, Mayer (1980) stated that natural scientists rank replicability as the most essential criterion for research, more important than, for example, originality or logical rigor.

20 Replication ensures that the correctness of results can be accessed (McCullough, 2009). For this, the data and the methods used in the research have to be clearly stated and be verifiable. With data and code available, this can be checked easily by every researcher. If this is not the case, the amount of time needed to check results of other researcher would not be feasible for a majority of papers. There are several reasons why replications are needed. First, frequent replications provide incentives for researchers to avoid sloppiness (McCullough, 2009). Mistakes can happen to everyone, also to the top researchers in the profession. As an example McCullough, McGeary, and Harrison (2006) mention Levitt (1997), where findings that police hiring reduces crime could not be replicated by McCrary (2002). Additionally, there is always the possibility of flawed or incorrect data, which can change the outcome significantly (Duvendack, Palmer- Jones, and Reed, 2017). Second, the possibility of fraud can never be excluded (McCullough, 2009; Duvendack, Palmer-Jones, and Reed, 2017). This is a relatively rare phenomenon, but the effects are the most damaging to science. Third, "HARKing" the process of hypothesizing after the results are known, can lead to biased research (Duvendack, Palmer-Jones, and Reed, 2017). A similar approach is the so- called "p-hacking" where the fundamental aim of the work with data and estimations is to get results which are significant at the p < 0.05-level (Duvendack, Palmer-Jones, and Reed, 2017). In both of these cases, "HARKing" and "p-hacking", the researcher changes the ques- tions or methods to improve the probability of being published (Duvendack, Palmer-Jones, and Reed, 2017). Fourth, the use of different software packages can lead to dissimilar results of the same com- putational problem. One of the authors highlighting this problem is Stokes (2004). 
Fifth, the scientific publication process is far from perfect, and false results will not necessarily be corrected by other similar studies (McCullough, McGeary, and Harrison, 2006). The impact of one wrong paper on the publication process should not be underestimated. Subsequent papers have to explain their different findings without having proof that the first article was wrong. This is especially true if the publication with the false results appeared in a top-tier journal. Furthermore, correcting one article is more resource-efficient than publishing several articles to outweigh it.

The issue with failed replications in economic research is not particularly new. In July of 1982, the Journal of Money, Credit and Banking (JMCB) started the JMCB Data Storage and Evaluation Project. In the publishing process, authors had to consent to provide data and programs to other researchers on request. In December 1984, Dewald, Thursby, and Anderson (1986) asked for data and programs in order to examine the data sets and to replicate some of the papers. The number of positive responses was limited. The original authors did not provide data for six of the 27 papers already accepted and for 16 of the 28 papers which were in review at the time. The reasons for not providing data and methods ranged from confidentiality issues to the deletion of data and programs. When checking for completeness, citation of sources, explanation of data transformations and data descriptions, only seven met the quality criteria. Afterwards, they tried to replicate nine of the papers. Two could be fully replicated, and for another two articles they found similar results. They found computational errors in the remaining ones, and two articles could not be replicated at all.

Many of the problems described by Dewald, Thursby, and Anderson (1986) are less severe today, as there is a greater emphasis on the value of data and more storage capacity available. Still, the problems when trying to replicate research remain. One example of this is the replication study conducted by Chang and Li (2017). Within 13 journals, they selected papers that had an empirical component, used only US data, had a key result regarding output measured by US GDP, and were published between July 2008 and October 2013. 59 papers remained in the sample after excluding those using confidential data (6) and those where Chang and Li (2017) did not have access to the specific software used (2). Chang and Li (2017) could replicate 29 publications, 22 of those without the help of the authors. Although only half of the papers were replicable, in the sub-sample of papers published in journals with a data and code policy, 23 of 34 papers could be replicated successfully. For two thirds of the papers which could not be replicated, the authors were not able to get data and code. For the other third of the non-replicated papers, the results contradicted the published results or the obtained files did not work.
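The shares implied by these counts can be recomputed in a few lines. The following Python sketch only illustrates the arithmetic: it uses the counts reported above, the variable names are chosen here for readability, and the figure for journals without a policy is derived under the assumption that the 34 papers from journals with a policy are a subset of the 59 sampled papers.

```python
# Counts as reported in the text for Chang and Li (2017).
papers_in_sample = 59
replicated_total = 29
replicated_without_help = 22
policy_papers = 34
policy_replicated = 23


def share(successes: int, total: int) -> float:
    """Return successes / total as a percentage, rounded to one decimal."""
    return round(100 * successes / total, 1)


overall_rate = share(replicated_total, papers_in_sample)
unaided_rate = share(replicated_without_help, papers_in_sample)
policy_rate = share(policy_replicated, policy_papers)

# Derived, not reported in the text: papers from journals without a policy.
# Assumes the 34 policy-journal papers are part of the 59 sampled papers.
no_policy_papers = papers_in_sample - policy_papers
no_policy_replicated = replicated_total - policy_replicated
no_policy_rate = share(no_policy_replicated, no_policy_papers)

print(f"overall: {overall_rate}%, without author help: {unaided_rate}%")
print(f"with policy: {policy_rate}%, without policy: {no_policy_rate}%")
```

Under the stated assumption, the gap between the two sub-samples is what motivates the discussion of data and code policies.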

While results obtained by Chang and Li (2017) indicate that data and code policies make research easier to reproduce, they also have to be enforced. Since 1996, the JMCB has required authors to deposit their data and replication code. In 2003, McCullough, McGeary, and Harrison (2006) checked the JMCB archive for data and code replication files, and the compliance of authors was surprisingly low. From 1996 to the third issue of 2003, 266 articles were published in the JMCB. Of those, 193 articles should have been in the archive according to McCullough, McGeary, and Harrison (2006), but only 69 were. This highlights the need for publishers to enforce their policies more strictly, as non-enforced data and replication code policies do not have the desired effects.

Although there is an inherent risk in the lack of replications in economics, the reasons why this problem persists are not being addressed. First, as Dewald, Thursby, and Anderson (1986) note, there is a preconception that researchers replicating studies of other authors lack imagination and creativity. Additionally, they mention a negative stigma of replications, as they can be interpreted as a lack of trust in the original researchers’ integrity and abilities. A showcase of this thinking is a term used by Longo and Drazen (2016), two editors of the New England Journal of Medicine, which highlights the negative opinion some researchers have about replications and free access to data: "research parasites". They fear that some researchers "use the data to try to disprove what the original investigators had posited". Although they have a point in proposing more credit for data-related work, the displayed aversion to replications misses the point of scientific research (Fecher and Wagner, 2016). Second, in academia, the reward from publishing articles is much higher than the reward from documenting the research or providing data and methods. Every minute spent on data documentation and on making replication code more accessible to other researchers is a minute not spent on the next publication. Third, Feigenbaum and Levy (1993) argue that journals do not want to publish replications. Their interests are aligned with those of researchers: both want as many citations as possible. This reinforces the process described in the previous point. The scope of the problem is highlighted by Vlaeminck and Herrmann (2015). They looked at 346 journals and found only 49 with a policy in place that the authors called a "data availability policy". While they found that 37 of these 49 journals require their authors to provide data and other materials to the journal prior to publication, only five of these journals published replications.

With an increased availability of data and replication code, the field would profit. Besides the additional reliability gained because existing research can be easily checked, McCullough (2009) argues that scientific progress as a whole would be faster. In a scientific environment which provides easy access to replication code and data, researchers do not have to collect the same or similar data to build on the work of another researcher. The whole field could allocate resources better if the needed files and data were more easily accessible. There are also indications that publishing the data can be beneficial for an individual researcher. Gleditsch, Metelits, and Strand (2003) looked at 416 articles published in the Journal of Peace Research and, while stating that it is difficult to quantify the quality of an article, found that articles that made their data available were cited twice as often. However, this is an area which needs further research.

All in all, the replication crisis is a problem within the scientific research process, especially in the social sciences. Being able to replicate research is essential to determine whether an article is a genuine addition to human knowledge or just describes a statistical coincidence. Nevertheless, researchers and journals have made huge strides toward replicable science in recent years. McCullough (2009) found ten economic journals with data availability policies in 2009, while Vlaeminck and Herrmann (2015) already found 49 in 2015. Still, these numbers leave a lot to be desired, especially when remembering: "No theory journal would dream of publishing a result (theorem) without a demonstration (proof) that the reader can trust the result, yet applied journals do it all the time." (McCullough, 2009, p. 117).

Chapter 3

Influence on Competition Outcome

The design of a competition can influence the outcome in several ways. Only a selected group of them is discussed in this work, with the main focus on the influence of color on perception and performance (section 3.5). Examples of omitted aspects are the grouping (and by extension, seeding) of contestants (Moldovanu and Sela, 2006; Leitner, Zeileis, and Hornik, 2010) or the sabotage of rivals (Carpenter, Matthews, and Schirm, 2010; Garicano and Palacios-Huerta, 2005). The phenomena discussed are the first mover advantage, the home field advantage, the incentive distribution and the heterogeneity of the contestants.

3.1 First Mover Advantage

Several competition designs favor the team or individual who goes first. Examples include football (soccer) penalty shootouts and (American) football overtimes (Che and Hendershott, 2008). For simplicity, in this chapter only the case of football penalty shootouts is discussed in detail.

In contests where the conditions (with the exception of the order in which the teams shoot) are similar, an ex-post analysis can still reveal an unfair advantage for one of the competitors (Apesteguia and Palacios-Huerta, 2010). The authors analyzed 269 penalty shootouts between 1970 and 2008, of which the team kicking first won 60%. In their model, only two factors significantly influence the outcome of a single kick: whether a team is lagging behind and whether the team kicks first or second. In an accompanying survey, 96% of 242 players and coaches from professional and amateur leagues in Spain stated that they would choose to shoot first. They justify this with the added pressure on the opposing kickers. These results suggest that this form of psychological pressure has negative effects on performance. In contrast, Kocher, Lenz, and Sutter (2012) extended the data set used in the main analysis (shootouts between 1970 and 2003) by Apesteguia and Palacios-Huerta (2010). Kocher, Lenz, and Sutter (2012) do not find a statistically significant advantage for the team kicking first, although these teams still won about 53% of the observed penalty shootouts. Palacios-Huerta (2014) extended the database of the two mentioned studies, finding a statistically significant advantage in the same ballpark as Apesteguia and Palacios-Huerta (2010). These discussions show the need for further research in this area, especially given the uncertainty about the possible causes of the potential "choking under pressure" effect.
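To illustrate how such a first mover advantage can be tested, the following Python sketch computes an exact one-sided binomial test against the null hypothesis that both teams win a shootout with equal probability. It is an illustration only and not the method used by the cited authors. The number of wins is not reported above, so it is approximated here from the rounded share of 60%; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


shootouts = 269
# Assumption: the win count is reconstructed from the rounded 60% share.
first_kicker_wins = round(0.60 * shootouts)

# Probability of seeing at least this many wins if kicking order did not matter.
p_value = binomial_upper_tail(first_kicker_wins, shootouts)
print(f"{first_kicker_wins} of {shootouts} wins, one-sided p-value: {p_value:.5f}")
```

A small p-value in this simple test only indicates that the observed share is unlikely under a fair coin; it does not control for team strength or other factors considered in the literature.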

Several researchers proposed mechanisms which would result in fairer penalty shootouts. In the "ABBA" mode, one team starts, then the other team shoots twice, followed by the first team shooting two times in a row. This mode is also used in tennis tiebreaks. An improvement over the "ABBA" mode is the "Behind first" or "Catch-up" mode, where the team trailing after the last round shoots first in the next round (Anbarci, Sun, and Uenver, 2015; Brams and Ismail, 2018). Football associations tested the "ABBA" mode in several tournaments and decided against it because of its complexity1.
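The two shooting orders can be sketched in Python as follows. This is a minimal illustration, not the exact rule set of the cited papers: the function names are hypothetical, and the handling of a tied score in the catch-up variant (the previous order is kept) is an assumption, since the description above does not cover that case.

```python
def abba_sequence(rounds: int) -> list[str]:
    """Return the kick order under ABBA: A, B, B, A, A, B, ..."""
    sequence = []
    for round_index in range(rounds):
        # The team kicking first alternates from round to round.
        pair = ["A", "B"] if round_index % 2 == 0 else ["B", "A"]
        sequence.extend(pair)
    return sequence


def catch_up_first_kickers(
    round_results: list[tuple[bool, bool]], starter: str = "A"
) -> list[str]:
    """Return the team kicking first in each round under a catch-up rule.

    round_results holds (team A scored, team B scored) for every round.
    """
    goals = {"A": 0, "B": 0}
    first = starter
    order = []
    for scored_a, scored_b in round_results:
        order.append(first)
        goals["A"] += int(scored_a)
        goals["B"] += int(scored_b)
        if goals["A"] < goals["B"]:
            first = "A"  # A trails, so A kicks first in the next round
        elif goals["B"] < goals["A"]:
            first = "B"  # B trails, so B kicks first in the next round
        # Assumption: with a tied score the previous order is kept.
    return order


print(abba_sequence(3))
print(catch_up_first_kickers([(True, False), (False, True), (True, True)]))
```

The catch-up order depends on the course of the shootout, whereas the ABBA order is fixed in advance.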

3.2 Home Field Advantage

Another effect leading to a biased outcome is the well-documented home field advantage (e.g. for football: Pollard, 1986, 2008; Clarke and Norman, 1995). Pollard (1986) found that in the English association football league, home teams gain about 75% more points than teams playing away. According to this study, the home field advantage adds about 0.6 goals per game for the home team. Pollard (1986, 2008) has hypothesized possible causes including local crowd support, travel fatigue, familiarity with local conditions, referee bias, special tactics (e.g. playing more defensively as the visitor) and other psychological factors, including the belief that the home advantage exists. Other sports where the home field advantage was found include, among others, American football, hockey, boxing and tennis (Jamieson, 2010).

1 https://www.fifa.com/about-fifa/who-we-are/news/ifab-s-133rd-annual-business-meeting-recommends-fine-tuning-laws-for-the-benefit, retrieved on 18.12.2019

To counter this home field advantage, many competitions include home and away games against the same opponent (two-stage knock-out competitions), and the aggregate result decides who advances to the next round. While this procedure is fairer than having one opponent play at home and the other away, an unbiased result is still not ensured. Page and Page (2007) analyzed 51 years of international association football tournaments and found that a significantly higher percentage of the teams advancing to the next round had played the second game at home. Like the home advantage, the second leg home advantage has decreased over time. Possible explanations for this effect include the causes discussed for the "normal" home field advantage combined with the higher stakes of the second leg.

3.3 Incentive Distribution

The effort and risk-taking of the participants are also influenced by the structure of the payoff. According to Rosen (1986), the difference in the prize distribution should increase in each stage to guarantee high effort until the last stage of the tournament. This mechanism compensates for the fact that, in the last stage, there are no further stages to advance to. These payments for the winners also have indirect incentive effects on the effort chosen in the lower ranks and lower stages of competitions.

The positive effects of better and higher prizes on performance have been shown empirically numerous times. For golf, Ehrenberg and Bognanno (1990a,b) show that higher prize money attracts more skilled players, increasing the competition. Higher prizes also lead to better performance, but this effect is driven by the results of the last round and is stronger when there is a higher marginal return for this better performance. Maloney and McCormick (2000) find similar results for foot races. Higher prizes attract more skilled competition, and higher spreads of the prize money incentivize better performance. This increased performance is present for the whole sample, as well as for the individual racer compared to the personal average.

3.4 Heterogeneity of the Contestants

The heterogeneity of the contestants is another important factor which influences the outcome. Dixit (1987) shows theoretically that the ability of a player changes the incentive to invest effort. Better players have a strategic incentive to overexert effort to ensure their victory. Contrary to the popular belief that underdogs try harder, his result shows that the underdog has an incentive to reduce effort compared to the situation before knowing the opponent. This result does not rule out the possibility that an underdog invests more effort than the favorite. The use of riskier strategies by the underdog is also not affected by this theoretical incentive. This is consistent with the findings of Brown (2011), who shows that the presence of a superstar decreases the performance of the other contestants. This was shown with data from the PGA Tour. In tournaments where Tiger Woods did not compete, the rest of the field played about 0.2 strokes better. This effect is not driven by riskier strategies, as the variance of the non-superstar players does not change with or without Woods. For this effect to occur, the superstar has to live up to expectations. The effect vanishes when Woods performed poorly by his standards. This is consistent with the theoretical and empirical findings of Lackner et al. (2015). They use NBA and NCAA data to underpin the theory that competitors reduce effort if the heterogeneity of ability in the current stage increases. If the current game is a pivotal game, which can decide the outcome of a series (best-of-5 or best-of-7), the effect of the heterogeneity is not as strong. If a series can be decided, the incentive to reduce effort is lower. Independent of effort, the choice of strategy also depends on the heterogeneity of the contestants.
Genakos and Pagliero (2012) used data from international weightlifting competitions, where the riskiness of the strategies is more easily observed than in many other tournaments. Competitors who rank close to the favorite take more risks in their strategies to catch up to the current leader. With decreasing interim rank, the competitors also reduce the risk they are taking, resulting in an inverted-U relationship between risk and interim rank. More intense competition and higher prestige also lead to strategies with higher risks. The same research also concludes that in all situations where riskier strategies are employed, performance decreases. The prestige and the level of competition in a tournament and being ranked closer to the top all decrease performance. This is explained with psychological pressure. In association football, Grund and Gürtler (2005) show that trailing teams switch to riskier strategies. They analyzed the 2003/2004 season of the German football Bundesliga and treat substitutions as a form of risk management. While trailing teams do use riskier strategies (e.g. substituting in a player in a more offensive position), this risk does not pay off.

In addition to influences in the current stage of a tournament, future opponents also influence the effort and risk-taking of competitors. Competitors are forward-looking; therefore, they will increase their effort if their chance of winning at the next stage is higher (Lackner et al., 2015). This is explained by the higher payoff which can be obtained in case of winning. This effect is stronger when the future can be predicted more easily, e.g. in basketball, if the opponent of the next series is already fixed in advance of a game. This forward-looking behavior leads to interesting conclusions, as shown by Brown and Minor (2014). Using tennis data, they underpin their two theoretical findings. The better the expected opponent in the next round, the more likely the lower-rated player is to win the current round. This can be explained by different levels of effort, where the higher-ranked player reduces effort more strongly than the one who is already the underdog in the current stage. Additionally, previous effort levels the playing field, resulting in a higher chance for the lower-ranked player to win, given similar negative effort spillovers from the previous rounds.

3.5 Color Influences on Perception and Performance

There is a large body of scientific literature which claims a link between the color red and perceived dominance, sexual potency and aggressiveness. Especially in recent years, an increasing number of papers argue that these effects found in biology and psychology research also affect tournament outcomes.

In biology, Waitt et al. (2003) experimented with rhesus macaques and found that females looked longer at the redder faces. This effect held for five of the six female participants, while the last one looked equally long at both versions of the color-manipulated pictures of male macaques. For male mandrills, the color red could be used as a "badge of status", where the brightness of the red color on the face, rump and genitalia is closely related to the status of the individual. The top-dominant male is also the one with the brightest red color (Setchell and Dixson, 2001; Setchell and Jean Wickings, 2005). The color red is also used as a dominance signal by red-collared widowbirds. Widowbirds with a red collar dominated those with orange, brown and blue ones. In addition to obtaining territories, they defended bigger territories and did not engage in as many fights as their non-red collared counterparts. This effect operates mostly among males, as the main criterion for the females’ choice is the length of the tail (Andersson et al., 2002; Pryke, 2002). For humans, Dreiskaemper et al. (2013) examined athletes in a controlled situation and found influences of the color red. 14 pairs of men had to fight against each other in a red or blue taekwondo outfit. While neither of the two groups performed better in fighting ability, the authors found effects on strength and heart rate. Before the fights, the participants had to do a strength test. Those dressed in red were significantly stronger than those in blue or in the control condition without specifically colored sportswear. Another interesting result of this study is the higher heart rate of the red competitor during the 30 seconds of combat. No such differences were found before and after the fight.

In the field of psychology, some studies suggest that red also influences perceived dominance, self-confidence and anxiety in humans. Feltman and Elliot (2011) found a higher perceived dominance when wearing red in an imagined taekwondo fight, as well as a lower one when imagining an opponent dressed in red. Although most of the biological literature focuses on the effects on males, their main result holds for men and women. The results of Feltman and Elliot (2011) are in line with the research of Recours and Briki (2015), who found higher self-confidence when wearing red sportswear and higher anxiety when facing it in a boxing video game. In their study, 60 students were assigned to play “Fight Night Round 4”, facing either red or blue opponents. They did not find an effect of sex on perceived dominance, regardless of whether it was the perceived dominance of themselves or of the imagined opponent. Wiedemann et al. (2015) add a more gender-specific aspect to the discussion. While they find that men wearing red are rated as more dominant and more aggressive than men wearing blue or gray, this effect is driven by male raters. For female participants, there was no statistically significant difference in the perception of men wearing either red, blue or gray. A major flaw of this study is that participants only rated pictures of men, so it was not tested whether there are similar effects in the perception of women.

In theory, these principles could also influence outcomes of competitions. Hill and Barton (2005a) found that in Olympic combat sports, where the participants compete in either red or blue sportswear, the red competitor wins more often. They attributed these results to the biological function of the color red as a signal of dominance and aggressiveness. They examined the data for the 2004 Olympics and found statistically significant results for boxing, taekwondo and wrestling. This effect held over all rounds and weight classes. According to the authors, this "red-blue" effect is strongest in close bouts. Only in the most lopsided bouts were there more blue winners than red ones. Many aspects of this analysis can be criticized, e.g. the backward causation problem in identifying lopsided bouts or the limited database of only the 2004 Olympics. A more detailed criticism of the work of Hill and Barton (2005a) can be found in section 5.2.3. Rowe, Harris, and Roberts (2005) copied the approach of Hill and Barton (2005a) and analyzed the judo competitions of the 2004 Olympics. There, competitors compete in either blue or white, and for the men’s competition, there was a significant advantage for the competitors dressed in blue. The authors argued that the results could be explained by the easier visibility of the white judogis. No such "blue-white" color effect could be found by Dijkstra and Preenen (2008), who increased the database to 72 tournaments. Hagemann, Strauss, and Leissing (2008) rejected the explanations given by both Hill and Barton (2005a) and Rowe, Harris, and Roberts (2005). In their experiment, taekwondo referees awarded points to red or blue fighters after watching tape of short training sequences. They watched two blocks with the same video sequences, but with the colors switched in one of them. In both the original and the color-altered setting, the red fighters were awarded more points than the blue ones.
This leads to the conclusion that referees are influenced by the perception of color, and therefore their actions can also lead to a biased outcome. Frank and Gilovich (1988) claimed a similar effect of the color black. They found that teams in black uniforms were among the most penalized teams in the NFL and NHL between 1970 and 1986. The authors gave three explanations for this effect. First, the players in black sportswear chose a more aggressive approach to playing the respective games. Second, referees have a bias against teams playing in black, because black is universally associated with evil. Third, some team owners and managers would like their teams to be aggressive. Therefore, they chose sportswear which is perceived to be more aggressive, in addition to selecting more aggressive players to compete for their team. Caldwell and Burger (2010) contradict this research. After a rule change in the NHL, they could observe the same teams playing against the same opponents in differently colored jerseys. They found no effect on the number of penalties awarded or on severe penalties. This result holds for black and for red jerseys, which they analyzed separately.

Ilie et al. (2008) found a highly significant effect of color in the online first-person shooter Unreal Tournament 2004. Over a three-month period, the red team won about 54% of the contests. However, only outcomes of the games of the top 10 players worldwide were included, which raises questions about robustness, considering that personal color preferences could have influenced the outcome severely. Sorokowski and Szmajke (2011) provide another explanation for differences in contests between differently colored contestants. In a virtual arcade game, it was easier to hit red-colored moving objects, while no difference in evading colorful objects was found.

A different sport where a positive influence of red sportswear was reported is association football. Attrill et al. (2008) found that English football teams with red as their home jersey color won the league more often than expected under the assumption that each team has an equal chance of winning the league. This assumption is easily dismissed because the economic possibilities, which influence the outcome over a year substantially, are not evenly distributed. Another point to consider is that the analysis is only based on the home jerseys. This ignores the home field advantage, which is described in section 3.2. Hill and Barton (2005a) also suggested an effect of red jerseys in football, examining the results of the five teams playing games in red at the Euro 2004. However, Hill and Barton (2005a) only compared the results of a team playing in red to the other matches of the same team playing in the alternate uniform. Those were white for four teams and blue for the fifth. Such a superficial examination, ignoring important factors like opponent and stage of the tournament, can be disregarded. No such effect was found in the Spanish football league (García-Rubio, Picazo-Tadeo, and González-Gómez, 2011) or the Australian Rugby League (Piatti, Savage, and Torgler, 2012). Both García-Rubio, Picazo-Tadeo, and González-Gómez (2011) and Piatti, Savage, and Torgler (2012) suggested that the results of Attrill et al. (2008) were driven by the fact that the three most successful teams in the history of English football (Arsenal, Liverpool, Manchester United) wear red as their main color.
According to Greenlees et al. (2008), penalty takers were perceived by goalkeepers to possess more positive characteristics when wearing red. Moreover, Greenlees, Eynon, and Thelwell (2013) found that penalty takers were less successful against goalkeepers wearing red. Nevertheless, both of these studies rely on experiments with a very small sample size; therefore, more research on this topic is needed.

Chapter 4

Institutional Setting

In order to test the assumption that the color red influences the outcome of (sport) tournaments, three different sports are used: boxing, taekwondo and wrestling. The following sections describe the most important rules per sport, as well as the various changes regarding the rules and the tournament design over the five different Olympic tournaments.

4.1 Boxing

For men, boxing has been part of the Olympic program since the 1904 Summer Olympics in St. Louis. Since then, boxing has always been a fixture in the program, with the exception of the 1912 Olympics in Stockholm, where boxing was outlawed at the time (SOCOG, 2001). The 2012 Olympics were the first which also featured competitions for women (LOCOG, 2013). For the 2016 Olympics, for the first time in Olympic history, professional boxers were able to qualify. Until then, only amateurs were allowed to compete (“Professional boxers will be allowed to compete at Rio Olympics”, 01.06.2016).

For the Olympic tournaments between 2000 and 2008, bouts consisted of four rounds of two minutes each. In 2012 and 2016, this format was used for the women’s competition, while the men’s format was changed back to three rounds of three minutes each, as it was before 2000. In 2000, there were 12 different weight classes, which were reduced to 11 for the 2004 and 2008 games. With the inclusion of three different weight classes for women’s boxing, the weight classes for men were reduced to ten.

In boxing, two contestants compete against each other in a square-shaped ring. The aim is to land punches on the head and upper body region of the opponent. A referee ensures that the contestants comply with the rules in the ring. Judges determine the winner in case the fight is not stopped early. The main reason to end a bout early is that the referee stopped the contest (RSC), which can be subdivided into four types: the fighter outclassed the opponent (RSCO), the fighter outscored the opponent (RSCOS), the opponent had a head injury (RSCH) or the opponent had another injury (RSCI). Other reasons why a fight can be stopped early are a knockout (KO) or a disqualification (DSQ). At the 2016 Olympic games, the scoring system was changed to the "10 Point Must System". In each round, the five judges score the performance of the boxers. The winner of the round receives ten points, the loser between 6 and 9, depending on the closeness of the round. Just before the fight, an electronic system selects three judges at random, and only their scores count. For the Olympic boxing tournaments before 2016, a punch was recorded if the white area on the front of the boxing glove made full contact with the front of the head or with the torso of the opponent. If at least three of the five judges recognized the punch within a second, the punch was counted. At the end of the bout, the boxer with the most legal punches was awarded the win.
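The 2016 scoring procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the official rule text: the function names and the example scorecards are hypothetical, and deciding the bout by a majority of the three counting judges is an assumption, as the description above only states that their scores count.

```python
import random


def is_valid_round(red: int, blue: int) -> bool:
    """Check that the round winner has 10 points and the loser 6 to 9."""
    return (red == 10 and 6 <= blue <= 9) or (blue == 10 and 6 <= red <= 9)


def select_counting_judges(
    rng: random.Random, n_judges: int = 5, n_counted: int = 3
) -> list[int]:
    """Randomly pick the judges whose scorecards count."""
    return sorted(rng.sample(range(n_judges), n_counted))


def bout_winner(scorecards, counting_judges) -> str:
    """Return "red", "blue" or "undecided".

    Assumption: each counting judge votes for the boxer with the higher
    point total, and the majority of these votes decides the bout.
    """
    votes = {"red": 0, "blue": 0}
    for judge in counting_judges:
        rounds = scorecards[judge]
        if not all(is_valid_round(red, blue) for red, blue in rounds):
            raise ValueError(f"invalid round score on card of judge {judge}")
        red_total = sum(red for red, _ in rounds)
        blue_total = sum(blue for _, blue in rounds)
        if red_total != blue_total:
            votes["red" if red_total > blue_total else "blue"] += 1
    if votes["red"] == votes["blue"]:
        return "undecided"
    return "red" if votes["red"] > votes["blue"] else "blue"


# Hypothetical scorecards: one list of (red, blue) round scores per judge.
scorecards = [
    [(10, 9), (10, 9), (9, 10)],
    [(9, 10), (9, 10), (10, 9)],
    [(10, 9), (9, 10), (10, 9)],
    [(10, 8), (10, 9), (10, 9)],
    [(9, 10), (10, 9), (9, 10)],
]
counting = select_counting_judges(random.Random(2016))
print(counting, bout_winner(scorecards, counting))
```

Because only three of the five scorecards count, the same bout can produce a different winner depending on the random selection, which the example scorecards are constructed to show.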

In Olympic boxing, the competitors have to wear shorts, shirt, gloves and headgear in the assigned color. The color is determined by the draw, with the fighter placed above his opponent in the competition sheet wearing red. The white area at the front of the boxing gloves, used for the determination of a punch, was removed in 2012. In the 2016 Olympics, wearing headgear was optional for the men, but still required in the women’s competition.

In the draws for the competitions in London and Rio, some of the competitors were seeded. This ensures that the athletes who are perceived to be the best in the competition do not have to compete against each other in the early stages of the tournament. In the tournaments before, the draw was completely random for all contestants.

Figure 4.1: Sample image of men boxing at the Olympics 2008

Note: Joe Murray (GB) and Yu Gu (CHN) during their first round bout (54 kg). Source: https://www.telegraph.co.uk/sport/olympics/2544949/Joe-Murray-falls-at-first-hurdle-Beijing-Olympics-2008-boxing-news.html, last time accessed: 10 Sep 2019

Figure 4.2: Sample image of men boxing at the Olympics 2016

Note: Tony Yoka (FRA) and Joe Joyce (GB) during the Men’s Super Heavy (+91kg) Final. Source: https://www.olympic.org/photos/rio-2016/boxing/-91kg-super-heavy-weight-men, last time accessed: 10 Sep 2019

Figure 4.3: Sample image of women boxing at the Olympics 2016

Note: Sarah Ourahmoune (FRA) punches in direction of Ingrit Valencia Victoria (COL). Source: https://www.olympic.org/photos/rio-2016/boxing/48-to-51kg-fly-weight-women, last time accessed: 10 Sep 2019

4.2 Taekwondo

The name taekwondo consists of the three Korean terms "tae" ("to stomp, trample"), "kwon" ("fist") and "do" ("way, discipline"). This already shows the aim of the sport: punching and kicking the opponent. Based on more than 2000-year-old traditions of Korean martial arts, the sport was formalized in the middle of the 20th century. After being a demonstration sport at the Olympics 1988 and 1992, taekwondo made its first official Olympic appearance in 2000.

The aim of taekwondo is to score more points than the opponent. Points are awarded by judges for different successful attacks. Attacks which require higher skill (like a spinning kick to the opponent's head) are worth more points than easier attacks. A bout consists of three rounds of two minutes each, with the possibility of a fourth round if the bout is tied after three rounds. In the fourth round, a golden point rule is in place, ending the bout with the first points scored. In previous Olympics, the fourth round was only used in gold medal matches; all the other bouts were decided by jury decision (superiority, SUP). Since 2012, taekwondo's Protector and Scoring System (PSS) has been used in the Olympics to determine whether an attack was successful. This system features electronic impact sensors in the protective gear (in 2012 only Trunk PSS) and socks. The Head PSS was used at the 2016 Olympics for the first time. Before that, a point was valid if at least two of the judges recognized it. Since the 2016 Olympics, the shape of the competition area has changed from a rectangle to an octagon.

In taekwondo, the sportswear is light-colored, predominantly white. The colors relevant to this research are the colors of the protective gear for the head and the torso. The colors are decided in the draw, with the contestant placed above the opponent competing in blue. Since 2012, some athletes have been seeded; for 2000-2008 the draw was completely random (Streif, 23.03.2020).

Figure 4.4: Sample image of a taekwondo bout in the Olympics 2016

Note: Mahama Cho (GB) and Maicon Siqueira (BRA) compete against each other in the Men's +80kg Bronze Medal match. Source: https://www.olympic.org/photos/rio-2016/taekwondo, last time accessed: 11 Sep 2019

4.3 Wrestling

Wrestling is one of the oldest sports in human history. There are more than 5000-year-old cave drawings of wrestlers, and wrestling was introduced in the ancient Olympics in 708 BC. In the first modern Olympics in 1896, Greco-Roman wrestling was included, as it was viewed as the reincarnation of the original, ancient wrestling style. After there was no wrestling competition in 1900, the 1904 Olympics were the first to include freestyle wrestling. Since 1920, both styles have always been part of the Olympic program. In 2004, a women's competition was introduced in freestyle wrestling. Greco-Roman wrestling is still a men-only sport at the Olympics.

In wrestling, the competitors try to gain (and maintain) a superior position against the opponent. This is achieved when the other competitor is thrown to the ground and pinned there. The main difference between freestyle and Greco-Roman wrestling is the use of the feet. In the Greco-Roman style, holds below the waist are forbidden and trips cannot be used to bring the opponent to the ground. Therefore, only the arms and upper body are used to attack. In contrast, freestyle wrestling allows the use of the feet. Since the Olympics 2008, bouts are fought in a best-of-three mode in both styles. To win a bout, a competitor has to win two rounds consisting of three minutes each. There are two main ways to end a bout prematurely: with a "fall", where both of the opponent's shoulders are pinned to the ground, or by superiority, when one wrestler is outscoring the other by a large margin. To outscore the opponent, the competitors are awarded points for each successful attack, which depend on the difficulty and risk of the attack. A result of this system is that the wrestler who gains the most points during a match is not necessarily the winner.² At the Olympics 2000 and 2004, the contestant with the most points won, regardless of the round in which they were scored. Another anomaly in those two Olympic tournaments was the group stage at the beginning of the tournament. The winners of the groups advanced to the next round. If a round is tied, there are tiebreak rules to determine a winner. These include the number of warnings given, the highest single score from one action and who scored the last point. In freestyle wrestling, if a round is tied 0-0 at the end of the two-minute period, there is a 30-second tiebreaker, in which one competitor has a more favorable position. If the wrestler in the superior position cannot score, the other one is awarded the point. According to a superficial newspaper analysis, this superior position is a major deciding factor in the bouts (Bell, 09.08.2012). This possibility does not exist in Greco-Roman wrestling, because after one minute of wrestling in neutral position, both wrestlers compete for 30 seconds from a superior position. If they cannot score within these 30 seconds, their opponent is awarded a point, allowing the normal tiebreakers to determine the winner.

² For example, wrestler A wins 1-0, 0-4, 1-0. Wrestler B has scored twice as many points as wrestler A, but because wrestler B won only one round, wrestler A wins the bout.
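The difference between the best-of-three mode and the aggregate-points rule can be made concrete with the round scores from the footnote example. The following Python sketch is an illustration only; the function names are hypothetical, and ties (which the official tiebreak rules resolve) are deliberately ignored.

```python
def winner_best_of_three(rounds):
    """Winner under the best-of-three mode: whoever wins more rounds.

    `rounds` is a list of (points_a, points_b) tuples, one per round.
    Tied rounds and tied bouts are not handled (assumption of this sketch).
    """
    rounds_a = sum(1 for a, b in rounds if a > b)
    rounds_b = sum(1 for a, b in rounds if b > a)
    return "A" if rounds_a > rounds_b else "B"


def winner_total_points(rounds):
    """Winner under the aggregate rule: whoever scores more points overall."""
    total_a = sum(a for a, _ in rounds)
    total_b = sum(b for _, b in rounds)
    return "A" if total_a > total_b else "B"


# Round scores from the footnote example: 1-0, 0-4, 1-0
example_rounds = [(1, 0), (0, 4), (1, 0)]
print(winner_best_of_three(example_rounds))  # A (two rounds to one)
print(winner_total_points(example_rounds))   # B (four points to two)
```

The same bout therefore produces a different winner depending on which of the two rules is in force.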

Figure 4.5: Sample image of wrestling at the Olympics 2012

Note: Clarissa Chun (USA) wrestling with Irini Meleni (UKR) in their 48kg Bronze Medal match. Source: https://www.olympic.org/photos/london-2012/wrestling-freestyle, last time accessed: 11 Sep 2019

Chapter 5

Data

In order to test the theories presented in section 3.5, data from the Olympics between 2000 and 2016 is used. In boxing, taekwondo and both styles of wrestling (Greco-Roman and freestyle), one competitor has to compete in red and the other one in blue. This makes it possible to check whether the described effects of the color red change the outcome of the bouts and therefore lead to a biased tournament structure. In each of these sports, the color is determined by the relative position in the tournament bracket, so personal preferences, country of origin or other factors can be ignored.

The source of the data is the official Olympic reports provided by the IOC. The 2016 wrestling competition was excluded because the requirements for the sportswear were relaxed, so that it would be a subjective call whether a certain sportswear is still "red enough" to qualify as red. This is explained in more detail in chapter A.3. Bouts in which a disqualification happened before the actual bout are coded both as a disqualification and as a withdrawal. A bout is coded as a retirement if one of the athletes decided to resign during the bout. If the bout (or round) was decided by withdrawal, retirement, walkover or disqualification, the bout (or round) is coded as not_relevant for the analysis.

5.1 Descriptive Statistics

Table 5.1: Summary of selected variables in the data

Variable         Share   Min   Max    Sum
male             0.828     0     1   6500
color            0.500     0     1   3927
red_win          0.508     0     1   3988
not_relevant     0.023     0     1    182
seeded           0.092     0     1    722
red_and_seed     0.047     0     1    367
blue_and_seed    0.045     0     1    355
both_seeded      0.034     0     1    268

Note: N=7854 competitors, throughout 3927 (=7854/2) bouts. Data is at the person level, which means that the sum is equal to the number of observations with said trait. E.g., 6500 of the observations in the data are male. The only exceptions are red_win and not_relevant; there the dummy is the same for both competitors, e.g. 1994 (=3988/2) bouts were won by the one competing in red. not_relevant marks those bouts which were decided by withdrawal, retirement, walkover or disqualification. "seeded" indicates that a person was seeded in the bout; "red_and_seed" ("blue_and_seed") indicates that a person was seeded and competed in red (blue). When both were seeded, "both_seeded" is coded as 1.

A quick overview of selected variables can be found in table 5.1. As seen in the third row, 50.8 % of all bouts were won by the athlete competing in red, which is close to the 50 % we would expect if the color had no effect. The shares in the last four rows of table 5.1 show that the vast majority of bouts did not include seeded competitors. The Min/Max columns show that every one of these variables is binary.
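The shares in table 5.1 follow directly from the sums and the number of observations. The short Python sketch below is an illustration added for the reader, not the original STATA code; the dictionary and variable names are chosen for this example. It recomputes the shares and the bout-level counts mentioned in the table note.

```python
N_COMPETITORS = 7854  # person-level observations (two per bout)

# Column "Sum" of table 5.1
sums = {
    "male": 6500,
    "color": 3927,
    "red_win": 3988,
    "not_relevant": 182,
    "seeded": 722,
    "red_and_seed": 367,
    "blue_and_seed": 355,
    "both_seeded": 268,
}

# Share = sum / number of observations, rounded as in the table
shares = {name: round(total / N_COMPETITORS, 3) for name, total in sums.items()}

# red_win and not_relevant are identical for both competitors of a bout,
# so the number of bouts is half the person-level sum.
bouts_total = N_COMPETITORS // 2
bouts_won_in_red = sums["red_win"] // 2

# Observations left after dropping the not_relevant ones
relevant_observations = N_COMPETITORS - sums["not_relevant"]

for name, share in shares.items():
    print(f"{name:15s} {share:.3f}")
print(bouts_total, bouts_won_in_red, relevant_observations)
```

The last value, 7672, equals the number of observations reported for the regressions in chapter 6.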

Table 5.2 highlights the over-representation of men's competitions in the Olympics and in the data. Taekwondo is the only sport where the number of bouts per sex is close to equal. This can be explained by the fact that the sport was introduced at the 2000 Olympics, when the zeitgeist was already in favor of equal participation. The two older sports, which have been part of the Olympic program for over a century, started opening up to female competitors at the Olympics 2004 and 2012. This results in a smaller number of observations for women's competitions.

Table 5.2: Distribution of bouts by sex and sports in the data

Sex       BOX        TKD        WFS        WGR        Total
female    66         362        249        0          677
          (4.76)     (48.92)    (24.60)    (0.00)     (17.24)
male      1321       378        763        788        3250
          (95.24)    (51.08)    (75.40)    (100.00)   (82.76)
Total     1387       740        1012       788        3927
          (100.00)   (100.00)   (100.00)   (100.00)   (100.00)

Note: Distribution of bouts by sex and sport. The abbreviations stand for boxing (BOX), taekwondo (TKD), freestyle wrestling (WFS) and Greco-Roman wrestling (WGR). Percentages are given in brackets.

For the majority of the bouts in the data, the competitors were randomly assigned, as table 5.3 shows. For boxing and taekwondo, the 2012 Olympics were the first in which the top athletes were seeded before the draw. For both styles of wrestling, the data does not include bouts in which athletes were seeded. It is important to remember that the Olympic wrestling competitions featured group stages in 2000 and 2004, which are considered as not seeded here.

Table 5.3: Distribution of bouts by seeding and sports in the data

Seedings  BOX        TKD        WFS        WGR        Total
0         1079       460        1012       788        3339
          (77.79)    (62.16)    (100.00)   (100.00)   (85.03)
1         308        280        0          0          588
          (22.21)    (37.84)    (0.00)     (0.00)     (14.97)
Total     1387       740        1012       788        3927
          (100.00)   (100.00)   (100.00)   (100.00)   (100.00)

Note: Distribution of bouts by seeding and sport. For the purpose of this table, if one of the two competitors is seeded, the bout is considered as seeded. Wrestling (WFS and WGR) had group stages in the first round of the tournaments in 2000 and 2004, which are considered as not seeded. The abbreviations stand for boxing (BOX), taekwondo (TKD), freestyle wrestling (WFS) and Greco-Roman wrestling (WGR). Percentages are given in brackets.

Table 5.4 shows the number of bouts per sport and Olympic tournament. The numbers show that the IOC tried to balance the inclusion of women's competitions (for freestyle wrestling in 2004, for boxing in 2012) with a smaller number of women's weight classes and a reduction of weight classes for men's competitions. Wrestling was excluded in 2016 because there was a rule change regarding the color of the sportswear. This exclusion is explained in further detail in chapter A.3.

Table 5.4: Distribution of bouts by Olympics and sports in the data

Olympics  BOX        TKD        WFS        WGR        Total
Sydney    298        129        233        245        905
          (21.49)    (17.43)    (23.02)    (31.09)    (23.05)
Athens    272        155        292        220        939
          (19.61)    (20.95)    (28.85)    (27.92)    (23.91)
Beijing   272        152        243        164        831
          (19.61)    (20.54)    (24.01)    (20.81)    (21.16)
London    272        152        244        159        827
          (19.61)    (20.54)    (24.11)    (20.18)    (21.06)
Rio       273        152        0          0          425
          (19.68)    (20.54)    (0.00)     (0.00)     (10.82)
Total     1387       740        1012       788        3927
          (100.00)   (100.00)   (100.00)   (100.00)   (100.00)

Note: Distribution of bouts by Olympics and sport. The abbreviations stand for boxing (BOX), taekwondo (TKD), freestyle wrestling (WFS) and Greco-Roman wrestling (WGR). Percentages are given in brackets.

5.2 Replication of Hill and Barton (2005a)

Hill and Barton (2005a) analyzed four combat sports (boxing, taekwondo, and both freestyle and Greco-Roman wrestling) at the 2004 Olympic Games. Their results led them to reject the null hypothesis that color does not have an influence on the outcome.

5.2.1 Win percentage per Sport

Hill and Barton (2005a) published only the χ²-results of all sports combined, without giving the partial outcomes of the different sports separately. When their methods are replicated,³ the results are not as clear as suggested in their paper. The results are not statistically significant at the 5%-level for any of the four sports examined. Especially in freestyle wrestling (WFS) and Greco-Roman wrestling (WGR), the wins are nearly the same as expected under the null hypothesis that color has no effect on the contest outcome. In taekwondo (TKD), the results are also clearly not significantly different from the null hypothesis. In boxing (BOX), the results are significant at the 10%-level. Only when the data is aggregated over all four sports is the effect statistically significant at the 5%-level. Figure 5.1 depicts this graphically. Table 5.5 shows the results of the χ²-analysis:

Table 5.5: Tables of χ²-results per sport in 2004 Olympics, only male competitors

Sport              RED-Winner   BLUE-Winner   χ²      P       df
Hill & Barton:
  ALL                                         4.19    0.041   1
own calculation:
  ALL              242          199           4.192   0.041   1
  BOX              147          120           2.730   0.098   1
  TKD              43           32            1.052   0.305   1
  WFS              27           24            0.177   0.674   1
  WGR              25           23            0.083   0.773   1

Note: Hill and Barton (2005a) never specify the exact number of wins per color in their paper.

³ Hill and Barton (2005a) did not include wins by walkover in their research. For the data of this work, this exclusion criterion includes wins by walkover, withdrawal and retirement.
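The χ²-statistic used here compares the observed red and blue wins with an equal split under the null hypothesis. The following Python sketch is an illustration under that assumption (a goodness-of-fit test with one degree of freedom); it is not the original STATA code, and the function name is hypothetical. It is applied to the aggregate row of table 5.5.

```python
import math


def chi2_equal_split(red_wins, blue_wins):
    """Chi-square goodness-of-fit test against a 50/50 split (df = 1).

    Returns the test statistic and the p-value. For one degree of freedom
    the chi-square survival function equals erfc(sqrt(statistic / 2)).
    """
    expected = (red_wins + blue_wins) / 2
    statistic = ((red_wins - expected) ** 2 + (blue_wins - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Aggregate over all four sports, male competitors, 2004 Olympics
statistic, p_value = chi2_equal_split(242, 199)
print(f"chi2 = {statistic:.2f}, p = {p_value:.3f}")  # chi2 = 4.19, p = 0.041
```

The same function can be applied to the counts of an individual sport; small deviations in the third decimal are possible depending on rounding.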

Figure 5.1: Win percentage per color, Replication of Hill and Barton (2005a)

[Bar chart "Win percentage per sport, Olympics 2004 - Only male competitors": bars for blue and red with 95% CI; y-axis: percentage of contests won; x-axis: ALL, BOX, TKD, WFS, WGR.]

Note: These results only include male competitors in the 2004 Olympics. Wins by walkover, withdrawal and retirement were excluded.

Using the same approach for female competitors, no significant effects were found. This is illustrated in figure 5.2; table 5.6 provides the numbers. This gender-specific effect was also reported by Hill and Barton (2005b) in their answer to Rowe, Harris, and Roberts (2005). However, the analysis of female bouts by Hill and Barton (2005b) included 155 bouts. To arrive at this number of bouts, one has to include the bouts from the group stage in freestyle wrestling, contradicting the method laid out in their original paper (Hill and Barton, 2005a).

Table 5.6: Tables of χ²-results per sport in 2004 Olympics, only female competitors

Sport              RED-Winner   BLUE-Winner   χ²      P       df
Hill & Barton:
  ALL              N = 155                    0.32    > 0.5   1
own calculation:
  ALL              47           54            0.485   0.486   1
  TKD              33           41            0.865   0.352   1
  WFS              14           13            0.037   0.847   1

Note: As mentioned, Hill and Barton (2005b) did not follow their own method for female competitors. The numbers stated under "own calculation" use the methods described in Hill and Barton (2005a).

Figure 5.2: Win percentage per color, Replication of Hill and Barton (2005a)

[Bar chart "Win percentage per sport, Olympics 2004 - Only female competitors": bars for blue and red with 95% CI; y-axis: percentage of contests won; x-axis: ALL, TKD, WFS.]

Note: These results only include female competitors in the 2004 Olympics. Wins by walkover, withdrawal and retirement were excluded.

Hill and Barton (2005a) stated that their results are consistent across rounds and weight classes. A replication of the analysis yields the same results. In 16 of 21 rounds, the majority of the winners wore red. There was only one round per sport which had more blue winners than red winners, with one balanced round.⁴ In 19 of the 29 weight classes, the fighters with the red sportswear won the majority of the contests. In 6 weight classes the blue competitors won more, leaving 4 weight classes with an even distribution of wins between the two colors.⁵ Nevertheless, the results of both analyses are not significant in any single sport, only in the aggregate.

Table 5.7: Sign-test of rounds, only male competitors in the 2004 Olympics

Sport              RED-rounds   BLUE-rounds   Balanced-rounds   P
Hill & Barton:
  ALL              16           4             1                 0.012
own calculation:
  ALL              16           4             1                 0.012
  BOX              4            1             0                 0.375
  TKD              5            1             0                 0.219
  WFS              4            1             0                 0.375
  WGR              3            1             1                 0.625

Table 5.8: Sign-test of weight classes, only male competitors in the 2004 Olympics

Sport              RED-classes   BLUE-classes   Balanced-classes   P
Hill & Barton:
  ALL              19            6              4                  0.015
own calculation:
  ALL              19            6              4                  0.015
  BOX              8             3              0                  0.227
  TKD              4             0              0                  0.125
  WFS              4             1              2                  0.375
  WGR              3             2              2                  1

⁴ For detailed results see table 5.7.
⁵ For detailed results see table 5.8.
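The P-values in tables 5.7 and 5.8 are consistent with a two-sided sign test in which balanced rounds or weight classes are dropped. The Python sketch below illustrates this reading; it is an assumption about the procedure, not the original STATA code, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(red_majorities, blue_majorities):
    """Two-sided exact sign test under H0: red and blue are equally likely.

    Balanced units are assumed to have been dropped beforehand. The p-value
    is twice the binomial tail probability of the smaller count, capped at 1.
    """
    n = red_majorities + blue_majorities
    k = min(red_majorities, blue_majorities)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(round(sign_test_p(16, 4), 3))  # rounds, all sports: 0.012
print(round(sign_test_p(19, 6), 3))  # weight classes, all sports: 0.015
print(round(sign_test_p(4, 1), 3))   # e.g. boxing rounds: 0.375
```

With only four to eleven units per sport, even a clear majority of red rounds or classes cannot reach conventional significance levels, which is why only the aggregate is significant.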

5.2.2 Win percentage per Degree of Asymmetry

The second part of the analysis by Hill and Barton (2005a) focused on the significance of the positive effect for the competitor dressed in red. They used the difference in the points scored to divide the fights into four groups per sport, assuming that the difference in the points scored is a viable estimate of the difference in ability. There is one major problem with this approach. If the color of the sportswear does indeed affect the outcome of the bouts, it also affects the difference in the points scored, which leads to a serious backward causation problem.

Ignoring this, the result of Hill and Barton (2005a) could not be replicated. The result of this replication is depicted in figure 5.3. Figure 5.4 shows the attempt to replicate the results by Hill and Barton (2005a) using their data, while figure 5.5 shows the graph from their paper. While the difference is smaller, the results are still not the same. As mentioned in section 2.2, the use of different software packages can explain different results when using the same data. Hill and Barton (2005a) did not state which program they used for their analysis, so this possibility could not be checked.

Figure 5.3: Win percentage per degree of asymmetry

[Bar chart "Win percentage per degree of asymmetry, Olympics 2004 - Only male competitors": bars for blue and red with 95% CI; y-axis: percentage of contests won; x-axis: None, Small, Medium, Large.]

Figure 5.4: Same as 5.3, data: Hill and Barton (2005a)

[Bar chart "Win percentage per degree of asymmetry, data from Hill and Barton (2005a), Olympics 2004 - Only male competitors": bars for blue and red with 95% CI; y-axis: percentage of contests won; x-axis: None, Small, Medium, Large.]

Figure 5.5: Win percentage per degree of asymmetry, from Hill and Barton (2005a, p. 293)

Table 5.9: Tables of χ²-results per degree of asymmetry in 2004 Olympics, only male competitors

Degree of asymmetry   RED-Winner   BLUE-Winner   χ²       P       df
Hill & Barton:
  None                                           6.07     0.014   1
  Small                                          2.21     0.14    1
  Medium                                         0.47     0.5     1
  Large                                          0.21     0.64    1
own calculation:
  None                106          60            12.747   0.000   1
  Small               41           44            0.106    0.745   1
  Medium              50           42            0.696    0.404   1
  Large               53           45            0.653    0.419   1

Note: Hill and Barton (2005a) never specify the exact number of wins per degree of asymmetry in their paper.

5.2.3 Criticism of Hill and Barton (2005a)

Several authors (Matsumoto et al., 2007; Seife, n.d.) have expressed doubts about the methodology used by Hill and Barton (2005a).

First, the draw of the competition table is not always random. In most sports competitions, the top athletes are seeded to avoid a match-up between them in the early stages, which would result in one of them dropping out. Although there were no seeded athletes in the four disciplines at the 2004 Olympics, Seife (n.d.) argues that the assignment is not necessarily strictly random. For example, in the boxing competitions there are first-round byes, potentially violating the assumption of equal conditions.

Second, only data from one Olympic tournament was used as the basis for the research (Matsumoto et al., 2007). When Seife (n.d.) performed the same analysis with the data of the 2008 Olympics, the results by Hill and Barton (2005a) could not be confirmed. There was no statistically significant effect in three of the four sports, and in the fourth, freestyle wrestling, the blue athletes won over 60 % of the contests. Additionally, Curby (2016) analyzed the 2015 wrestling World Championships and found no effect of the color of the sportswear on the outcome of the bouts in this competition.

Third, the explanation based on the color red as a marker of quality in male animals, and subsequently also in humans, was questioned. Rowe, Harris, and Roberts (2005, 2006) used judo data from the 2004 Olympics, where the athletes compete in either a white or a blue judogi. They copied the methodology of Hill and Barton (2005a) and found a significant advantage for the (male) competitor in the blue judogi. Matsumoto et al. (2007) identified many of the flaws of the studies by Hill and Barton (2005a) and Rowe, Harris, and Roberts (2005), and analyzed four major judo competitions (World Championships 2001, 2003, 2005 and the 2004 Olympics) with a different, improved method. They found a significant win bias for male athletes in the blue judogi, but no effect for women. The effect increased in later stages of the tournament. Nevertheless, they assumed that after the first round, where the contestants are seeded, the color of the sportswear is random. Because the top-seeded competitors do not have to change their colors as long as they continue winning, the assumption that the color is random after the first round should be questioned. In contrast to the results by Rowe, Harris, and Roberts (2005, 2006) and Matsumoto et al. (2007), Dijkstra and Preenen (2008) found no effect of the sportswear for judo in the 72 tournaments they analyzed.

Fourth, most of the mentioned researchers operate under the implicit assumption that the structure of the tournament does not influence the outcome. Fortunato and Clauset (2016) included two forms of incompleteness of the tournament structure in their analysis. When taking account of byes and walkovers, a Monte Carlo simulation of the 2004 Olympics tournament structure resulted in a bias toward the competitors wearing red when the level of skill variance was increased. For the 2008 Olympics, the findings were in favor of the blue competitors. Therefore, the results of Fortunato and Clauset (2016) are consistent with the findings of Hill and Barton (2005a) and Seife (n.d.), while providing a different explanation. Nevertheless, this analysis does not tackle the question of a fair competition design, because the organizer cannot influence which bouts will not take place due to a walkover.

Fifth, while the assumption that the color should only influence close bouts appears reasonable, the methodology used to prove this is questionable. Hill and Barton (2005a) use the results to divide the bouts into different degrees of asymmetry. Only in those bouts where the results indicate a high asymmetry in ability between the competitors do the blue athletes win more than half of the contests. With this method, the question of backward causation arises. If the color of the sportswear indeed has an effect on the outcome of the bout, the point difference of these bouts can never be an unbiased estimator of the difference in ability of the athletes. Therefore, all of the results based on this split should be disregarded.

Sixth, Hagemann, Strauss, and Leissing (2008) give another possible explanation for the effect Hill and Barton (2005a) found for the 2004 Olympics. In this study, 42 different judges awarded points for taekwondo bouts shown to them. The bouts were presented in two identical sets, once with the original sportswear colors (red and blue) and once with reversed colors. In both sets, the competitors wearing red were awarded more points, suggesting that a bias of the referees could explain why red fighters won more bouts at the 2004 Olympics. A color-driven bias in the perceived actions of athletes is nothing new. In the NFL and the NHL, teams wearing black were perceived as more aggressive and were more likely to be penalized (Frank and Gilovich, 1988).

Chapter 6

Methods and Empirical Analysis

As discussed in section 2.2, replications and extensions of existing research are very important to assess whether previous results are reliable. The various examples of contradictory research results regarding the effects of color on sporting outcomes in particular highlight this need for replications and more research. Therefore, the data used by Hill and Barton (2005a) was extended to rule out chance as an explanation, and the methods used were expanded. Their results could not be replicated using a bigger sample of tournaments.

Overall, no effect of the color of the sportswear on the outcome of the bouts was found. This result is consistent over several sub-samples. No significant effects were found for any single sport or for either sex. When splitting the data along differences in the degree of randomness of the pairing, the results stay insignificant. The same is true for geographical splits, which were used as a proxy for cultural effects. The only case in which color had a statistically significant influence on the outcome was for (North-)African men, who won 29 % of their bouts in red and 22 % of their bouts in blue. These raw numbers do not suggest that (North-)African men have a competitive advantage when wearing red. However, we see that there will always be a sub-group in which a statistically significant effect can be found, due to chance or over-specification.

The empirical analysis, which was conducted using STATA 13, is split into four major parts. First, section 6.1 focuses on the influence of the color of the sportswear on winning the bout. All of the available bouts are used for this, but because the different sports have different scoring systems, no analysis of the points per bout or points per round can be performed. Second, the data was split into regional samples to control for different cultural effects. Third, to test whether the results by Hill and Barton (2005a) are a result of the methods used, their analysis is replicated with all of the tournaments which did not include seeding of athletes.

6.1 General Analysis

To analyze if the color of the sportswear does have an impact on the outcome of the bout, a Linear Probability Model (LPM) is used. The foundation of this linear regression is

\[ \text{win} = \beta_0 + \beta_1\,\text{red} + \beta_2\,\text{seeded} + \varepsilon \tag{6.1} \]

where win is the binary outcome variable, red is the binary indicator of whether the competitor wears red or blue, seeded is the binary indicator of whether the competitor is seeded for this particular tournament, and ε is the error term. To account for the possibility that either both or none of the competitors in a particular bout are seeded, better_seeded, a binary variable that is equal to 1 if someone is better seeded than the opponent and 0 in all other cases, is used in some variations. male, a binary variable indicating the sex of the contestants, is used in some interactions to account for the possibility that only one sex is influenced by the color. For this analysis, the shortcomings of the LPM, especially nonsensical predictions, are not that important, as this analysis does not aim to predict the winner of a given bout. All LPM regressions were also conducted using a Logit approach to check the robustness of the findings.
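To illustrate how a specification like equation (6.1) is estimated on person-level data, the following Python sketch fits the LPM by ordinary least squares using only the standard library. It is a minimal illustration, not the STATA code used for this thesis: the bout counts are hypothetical, the helper functions are written for this example, and the clustering of standard errors at the bout level is omitted.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical bouts: (red_seeded, blue_seeded, red_won, number_of_bouts)
bouts = [
    (0, 0, 1, 52), (0, 0, 0, 48),  # neither competitor seeded
    (1, 0, 1, 14), (1, 0, 0, 6),   # only the red competitor seeded
    (0, 1, 1, 7), (0, 1, 0, 13),   # only the blue competitor seeded
]

# Expand to person level: two observations per bout, columns [1, red, seeded]
x_rows, y = [], []
for red_seeded, blue_seeded, red_won, count in bouts:
    for _ in range(count):
        x_rows.append([1, 1, red_seeded])   # competitor in red
        y.append(red_won)
        x_rows.append([1, 0, blue_seeded])  # competitor in blue
        y.append(1 - red_won)

b0, b_red, b_seeded = ols(x_rows, y)
print(f"constant = {b0:.3f}, red = {b_red:.3f}, seeded = {b_seeded:.3f}")
```

Because every bout contributes exactly one winner and one loser, the two observations of a bout are not independent, which is the reason for clustering the standard errors at the bout level in the actual regressions.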

Table 6.1 depicts the results of the linear probability model on winning a given bout. While in specification (1) only seeded was used as a control, specifications (2) and (3) include additional interactions for red and seeded (2) and interactions for red, male and seeded (3). Specifications (4) and (5) replace the dummy for seeded competitors in (2) and (3) with a dummy for competitors who are seeded higher than their opponent. None of these specifications shows a significant impact of the color. The only variables which have an impact on the outcome are the dummies which indicate whether an athlete was seeded in the tournament or was seeded higher than their opponent.

Table 6.1: Effects of the color of the sportswear on winning the bout

Linear Probability Model
                        (1)        (2)        (3)        (4)        (5)
                        win        win        win        win        win
                        b/se       b/se       b/se       b/se       b/se
red                     0.018      0.016      -0.014     0.016      -0.010
                        (0.02)     (0.02)     (0.03)     (0.02)     (0.03)
seeded                  0.156***   0.261***   0.292***
                        (0.02)     (0.04)     (0.04)
red × seeded                       0.028      0.033
                                   (0.04)     (0.04)
male × red                                    0.036                 0.032
                                              (0.03)                (0.03)
male × seeded                                 -0.057
                                              (0.04)
better seeded                                            0.322***   0.363***
                                                         (0.04)     (0.05)
red × better seeded                                      0.025      0.029
                                                         (0.04)     (0.04)
male × better seeded                                                -0.065
                                                                    (0.05)
Tournament Dummy        No         Yes        No         Yes        No
Olympics                No         No         Yes        No         Yes
Sport                   No         No         Yes        No         Yes
R-sqr                   0.009      0.015      0.015      0.021      0.022

Note: N=7672 competitors, throughout 3836 (=7672/2) bouts. Dependent variable for all five Linear Probability Model (LPM) specifications is win, a dummy indicating if the individual competitor won, and all regressions are clustered at the bout level. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. In specifications (3) and (5), Olympics, sport styles and the interactions between them were used as controls. As a robustness check, the same analysis was conducted using a Logit approach. The results can be found in table B.1. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

6.2 Effects on Sex

When splitting the sample according to sex, as in table 6.2, we see the same pattern. While the sign of the coefficients changes for the two sexes, the color does not have a significant influence on the probability of winning the bout. The dummies indicating if and how high competitors were seeded are the only variables which do have a significant effect.

Table 6.2: Effects of the color of the sportswear on winning - sex splits

                        Both sexes             Male                   Female
                        (1)        (2)         (3)        (4)         (5)        (6)
                        win        win         win        win         win        win
                        b/se       b/se        b/se       b/se        b/se       b/se
red                     0.016      0.016       0.023      0.023       -0.024     -0.021
                        (0.02)     (0.02)      (0.02)     (0.02)      (0.04)     (0.04)
seeded                  0.261***               0.234***               0.319***
                        (0.04)                 (0.04)                 (0.07)
red × seeded            0.028                  0.042                  0.022
                        (0.04)                 (0.05)                 (0.08)
better seeded                      0.322***               0.297***               0.372***
                                   (0.04)                 (0.04)                 (0.07)
red × better seeded                0.025                  0.035                  0.026
                                   (0.04)                 (0.05)                 (0.07)
Tournament Dummy        Yes        Yes         Yes        Yes         Yes        Yes
N                       7672       7672        6324       6324        1348       1348
R-sqr                   0.015      0.021       0.012      0.017       0.032      0.044

Note: Dependent variable for all six Linear Probability Model (LPM) specifications is win, a dummy indicating if the individual competitor won, and all regressions are clustered at the bout level. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. The male sample (used in (3) & (4)) and the female sample (used in (5) & (6)) are subsamples of the whole sample, which was used in (1) & (2). The data is given at the "individual per bout" level, therefore N gives the number of competitors, so N/2 gives the number of bouts used in the regressions. As a robustness check, the same analysis was conducted using a Logit approach. The results can be found in table B.2. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

6.3 Effects on Random

Splitting the sample into three sub-samples, (1) where the pairing of the contestants was random and in the first round, (2) where the pairing was random and in a later stage of the tournament and (3) where the pairing was non-random, also leads to no significant effect of the colors. A pairing is defined as non-random if a seeded athlete could be in this bout if all bouts were won by the favorite. "Random first round" is defined as first-round bouts where neither competitor was seeded, and "random later round" is defined as non-first-round bouts where every bout leading to the bout in question was either "random first round" or "random later round". Figure 6.1 offers a graphical explanation.

Figure 6.1: Visual help to understand split in non-random, random first and random later

Note: Non-random is defined as every bout in which a seeded athlete could appear if all bouts were won by the favorite, i.e. the seeded athlete. First-round bouts where neither of the athletes was seeded are defined as "random first round". "Random later round" is every bout where the previous bouts are either "random first round" or "random later round".
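The classification described above can be sketched as a short Python function. This is an illustration under simplifying assumptions that are not part of the original coding: the bracket is a single-elimination tree whose size is a power of two (no byes), and the function name and data layout are hypothetical. A bout is labeled non-random as soon as any slot feeding into it holds a seeded athlete.

```python
def classify_bouts(seeded_slots):
    """Label every bout of a single-elimination bracket.

    `seeded_slots` lists, in draw order, whether the athlete in each bracket
    slot is seeded. Returns a dict mapping (round, bout index) to one of
    "non-random", "random first round" or "random later round".
    Assumes a bracket size that is a power of two, i.e. no byes.
    """
    n = len(seeded_slots)
    if n < 2 or n & (n - 1):
        raise ValueError("bracket size must be a power of two (no byes)")
    labels = {}
    block, round_number = 2, 1
    while block <= n:
        for start in range(0, n, block):
            # All slots whose athletes could reach this bout
            if any(seeded_slots[start:start + block]):
                label = "non-random"
            elif round_number == 1:
                label = "random first round"
            else:
                label = "random later round"
            labels[(round_number, start // block)] = label
        block *= 2
        round_number += 1
    return labels


# Eight athletes, only the athlete in the first slot is seeded
labels = classify_bouts([True, False, False, False, False, False, False, False])
for key in sorted(labels):
    print(key, labels[key])
```

In this example the upper half of the bracket and the final are non-random, while the lower half contains two "random first round" bouts and one "random later round" bout.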

Table 6.3: Effects of the color of the sportswear on winning - random splits

                  Non-random    Random first round    Random later rounds
                  (1)           (2)                   (3)
                  win           win                   win
                  b/se          b/se                  b/se
red               0.022         0.020                 -0.055
                  (0.04)        (0.05)                (0.04)
male × red        0.030         0.013                 0.030
                  (0.04)        (0.05)                (0.04)
Olympics          Yes           Yes                   Yes
Sport             Yes           Yes                   Yes
N                 3152          2188                  2332
R-sqr             0.002         0.001                 0.001

Note: Dependent variable for all three Linear Probability Model (LPM)-specifications is win, a dummy indicating if the individual competitor won, and all regressions are clustered at the bout-level. The "Non-random"-sample (1) consists of all bouts where at least one seeded competitor was involved, either at the stage of the bout or at one of the preceding stages of the tournament. E.g., if a seeded competitor lost in the first round, all following bouts in the tournament tree where this (seeded) competitor could have been if he or she had won are also coded as "Non-random". "Random first round" (used in (2)) contains only those bouts which are the first bout in the tournament for both competitors and in which none of them was seeded. "Random later rounds" (used in (3)) consists of all bouts where all of the first round bouts leading to this specific bout in the tournament were part of the "Random first round" sample. In all three specifications Olympics, sport styles and the interactions between them were used as controls. The data is given at the "individual per bout"-level, therefore N gives the number of competitors, so N/2 gives the number of bouts used in the regressions. As a robustness check, the same analysis was conducted using a Logit approach. The results can be found in table B.3. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

In table 6.3 we see a change in the sign of the coefficient red when comparing non-random bouts with those which were random in the later rounds, but this effect is not statistically significant. Therefore we can exclude the possibility that the process of seeding compromises the data in ways we cannot control for.

6.4 Cultural Effects

None of the results above suggests that color influences the outcome of a bout. In all of the specifications, the only significant factors were seeded and better seeded. An influence of the seeding on the outcome is to be expected.

Cultural differences in the perception of colors could still influence the outcome. Therefore, the data was split by region and sub-region, using the United Nations categorization (Methodology: Standard country or area codes for statistical use (M49)). Using this, the data was split into five sub-samples (Africa, Americas, Asia, Europe, Oceania) and an LPM-analysis like in equation (6.1) was conducted.

The results are depicted in table 6.4. The number of observations varies from region to region, therefore not all of the results are as dependable as desirable. Looking at (1), there is a statistically significant effect for African men who wear red. The low number of observations (684 for both sexes) makes it more likely that this is a statistical fluke than a genuine effect of the color red. When looking at the distribution of wins per color for African men in table C.3, the raw win percentage per color is 22% when wearing blue and 29% when wearing red. These numbers are way too low to argue that competitors wearing the color red have an advantage over their opponents. The same effect can be seen in the analysis of the sub-sample which only includes athletes from Northern Africa (see C.4), where the same argumentation applies. Surprisingly, the seeding is not significant in every sub-sample. This can be explained by the considerable number of tournaments in the data which did not include seeding at all. Table C.2 includes a different specification using better seeded instead of seeded, and there the better seeded dummy is significant.

Table 6.4: Effects of the color of the sportswear on winning - continent splits

Linear Probability Models
                  (1)         (2)         (3)         (4)         (5)
                  Africa      Americas    Asia        Europe      Oceania
                  b/se        b/se        b/se        b/se        b/se
red               -0.115      0.004       0.001       -0.036      0.111
                  (0.06)      (0.05)      (0.04)      (0.04)      (0.11)
seeded            0.149       0.222       0.195*      0.342***    0.573
                  (0.15)      (0.12)      (0.08)      (0.07)      (0.44)
red × seeded      -0.049      0.049       0.090       -0.013      -0.145
                  (0.14)      (0.09)      (0.07)      (0.06)      (0.45)
male × red        0.199**     0.019       0.007       0.055       -0.059
                  (0.06)      (0.05)      (0.04)      (0.04)      (0.11)
male × seeded     0.081       -0.011      -0.040      -0.100      -0.051
                  (0.15)      (0.11)      (0.07)      (0.06)      (0.36)
Olympics          Yes         Yes         Yes         Yes         Yes
Sport             Yes         Yes         Yes         Yes         Yes
N                 684         1480        2800        2690        198
R-sqr             0.079       0.028       0.013       0.022       0.176

Note: Dependent variable for all five Linear Probability Model (LPM)-specifications is win, a dummy indicating if the individual competitor won. In all five specifications Olympics, sport styles and the interactions between them were used as controls. The data is given at the "individual per bout"-level, therefore N gives the number of competitors. As a robustness check, the same analysis was conducted using a Logit approach. The results can be found in table C.1. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

This regional categorization along continents could be too broad, therefore the same analysis was also conducted using 14 sub-regions (see footnote 6). The results are in the Appendix, chapter C. The only exception is the already mentioned effect for (North-)African men wearing red, where the argument against a genuine effect of the color is the same as for the effect for African men above. The other results of this sub-region analysis are consistent with the main results in this work. The seeding is the only factor which influences the outcome of a bout, and the effect of color is insignificant.

6 The number of sub-regions per region is given in brackets: Africa (2), Americas (2), Asia (5), Europe (4), Oceania (1). The contestants of the region of Oceania were predominantly athletes from Australia and New Zealand. Only 31 bouts in the whole data included athletes from the sub-regions of Melanesia (8 bouts), Micronesia (11) and Polynesia (12), therefore no further analysis was conducted for these sub-regions.

As a result of this, cultural effects regarding the influence of color can be ruled out. However, this does not rule out cultural effects on wins per se. A higher popularity of a sport in a particular region should lead to more competition in that region, resulting in better athletes from the region participating in the Olympic tournaments.

6.5 Expand Hill and Barton (2005a)

The previous results suggest that the color does not have an influence on the outcome. Nevertheless, according to Hill and Barton (2005a) there is such an effect. To exclude the possibility that this is a result of the methods used, their approach was expanded. Using every tournament where there was no seeding (see footnote 7), the same analysis as in Hill and Barton (2005a) was conducted. Excluding tournaments which include seeding is a requirement for applying the methods of Hill and Barton (2005a) correctly.

No statistically significant effect of the color was found in this analysis. This is not surprising, because Hill and Barton (2005a) do not find significant results for three of the four sports they analyzed (see footnote 8).

Figure 6.2 shows a clear trend towards a 50/50 split between the colors. Table 6.5 gives the exact numbers per sport and color. The χ2-analysis shows no statistically significant results, indicating again that there is no influence of the color of the sportswear on the outcome. The same was done for the women’s competitions in the same tournaments. The results are again not significant and can be found in figure D.1 and table D.5. As stated in section 5.2.3, the post-analysis of the win percentage depending on the degree of asymmetry has serious backward causation problems, and was therefore not repeated with more data. In summary, these results indicate again, that the effect reported by Hill and Barton (2005a) was a statistical fluke.

7 This includes every wrestling tournament in the data (excluding the group stage), and the boxing and taekwondo tournaments before 2012 (2000-2008).
8 See section 5.2.1, especially table 5.5.

Figure 6.2: Win percentage per color and sport, Replication of Hill and Barton (2005a)

[Figure: "Win percentage per sport. All tournaments without seeding - Only male competitors". Bars for blue and red with 95% CI. Vertical axis: percentage of contests won (35 to 65). Horizontal axis: ALL, BOX, TKD, WFS, WGR.]

Note: These results include the men’s competitions in boxing and taekwondo in the Olympics 2000 - 2008, and the men’s competitions in wrestling in the Olympics 2000 - 2012. Bouts in the group stage and wins by walkover, withdrawal and retirement were excluded. The exact numbers per sport can be found in table 6.5.

Table 6.5: Tables of χ2-results per sport in all tournaments without seeding, only male competitors

Sport     RED-Winner    BLUE-Winner    χ2       P        df
Hill & Barton:
ALL                                    4.19     0.041    1
All tournaments without seeding:
ALL       959           924            0.650    0.420    1
BOX       416           411            0.030    0.862    1
TKD       119           98             2.032    0.154    1
WFS       204           210            0.087    0.768    1
WGR       220           205            0.529    0.467    1

Note: These results include the men’s competitions in boxing and taekwondo in the Olympics 2000 - 2008, and the men’s competitions in wrestling in the Olympics 2000 - 2012. Bouts in the group stage and wins by walkover, withdrawal and retirement were excluded.
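The test statistics in table 6.5 can be checked with a short calculation. The sketch below assumes that the χ2-analysis is a goodness-of-fit test of the observed red and blue wins against an equal split with one degree of freedom, which is an assumption about the procedure rather than a description of the code actually used. The function name is hypothetical; the win counts are taken from table 6.5.

```python
import math


def chi2_equal_split(red_wins, blue_wins):
    """Chi-square goodness-of-fit test against a 50/50 split (df = 1)."""
    total = red_wins + blue_wins
    expected = total / 2
    chi2 = ((red_wins - expected) ** 2 + (blue_wins - expected) ** 2) / expected
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Red and blue wins per sport, copied from table 6.5.
WINS = {
    "ALL": (959, 924),
    "BOX": (416, 411),
    "TKD": (119, 98),
    "WFS": (204, 210),
    "WGR": (220, 205),
}

RESULTS = {sport: chi2_equal_split(red, blue) for sport, (red, blue) in WINS.items()}

for sport, (chi2, p_value) in RESULTS.items():
    print(f"{sport}: chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under this assumption the computed values agree with the table up to rounding in the third decimal place.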

Chapter 7

Heterogeneity on Sports

Although the results of section 6.1 show no effect of the color of the sportswear on the outcome, it could still be the case that the effect is only prevalent in certain sports. Therefore the next sections will look at boxing, taekwondo and wrestling in more detail. The linear regressions used are

$$\text{points} = \beta_0 + \beta_1\,\text{red} + \beta_2\,\text{male} + \beta_3\,\text{seeded} + \varepsilon \tag{7.1}$$

where points are the number of points per round or per bout, and red, male, seeded and the error term ε are the same variables as in equation (6.1). Again, better seeded is used in some variations of equation (7.1) to account for the possibility that both athletes were seeded. To check the robustness of the results, the analysis was also done using negative binomial regressions.
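The regressions in this chapter report standard errors clustered per match. As a rough illustration of what that involves, the following sketch fits a stripped-down version of equation (7.1) with only an intercept and the red dummy, and computes cluster-robust standard errors by hand. The point scores are invented for the example, the function name is hypothetical, and no small-sample correction is applied, so a statistics package may report slightly different standard errors.

```python
from collections import defaultdict


def ols_clustered(y, x, cluster):
    """OLS of y on an intercept and one regressor x.

    Returns the coefficients (b0, b1) and their cluster-robust
    standard errors (sandwich estimator, no small-sample correction).
    """
    n = len(y)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))

    # Inverse of X'X = [[n, sx], [sx, sxx]] for design rows (1, x_i).
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # Sum the scores (1, x_i) * residual_i within each cluster (bout).
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, g in zip(y, x, cluster):
        residual = yi - b0 - b1 * xi
        scores[g][0] += residual
        scores[g][1] += xi * residual

    # "Meat" of the sandwich: sum over clusters of s_g s_g'.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for s in scores.values():
        for i in range(2):
            for j in range(2):
                meat[i][j] += s[i] * s[j]

    # Variance = inv * meat * inv.
    tmp = [[sum(inv[i][k] * meat[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    var = [[sum(tmp[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    return (b0, b1), (var[0][0] ** 0.5, var[1][1] ** 0.5)


# Invented data: four bouts, one red and one blue competitor each.
POINTS = [5, 3, 2, 4, 6, 6, 3, 1]
RED = [1, 0, 1, 0, 1, 0, 1, 0]
BOUT = [1, 1, 2, 2, 3, 3, 4, 4]

COEF, SE = ols_clustered(POINTS, RED, BOUT)
print(f"constant = {COEF[0]:.3f}, red = {COEF[1]:.3f}")
print(f"clustered s.e.: constant = {SE[0]:.3f}, red = {SE[1]:.3f}")
```

With a single dummy regressor the coefficient on red is simply the difference between the mean points of red and blue competitors. Clustering by bout accounts for the fact that the two observations from the same bout are not independent.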

7.1 Boxing Results

For boxing, the points per round are available for the years 2004-2016, and the overall points per bout are available for 2000-2012. In every specification, the color of the sportswear does not have an effect on the points scored. The variables which do influence the outcome are those dependent on the seeding of the athletes, which are positive and highly significant. Although the sex is also statistically significant, the results should be interpreted with great caution. The first Olympics which included women’s boxing were the 2012 Olympics in London. Therefore specifications (1) and (2) include 66 bouts with female boxers, and specifications (3) and (4) include 33.

Table 7.1 shows no significant effect of the color red on the points per round or the overall points per bout. The change of the sign of the male-coefficient can be explained by the low number of bouts in the women’s competitions in 2012 and 2016.

Table 7.1: Boxing - points per round and points per bout

                    2004-2016                             2000-2012
                    (1)                (2)                (3)            (4)
                    Points per round   Points per round   ovr. Points    ovr. Points
                    b/se               b/se               b/se           b/se
red                 0.029              0.030              0.020          0.022
                    (0.10)             (0.10)             (0.30)         (0.30)
male                1.666***           1.673***           -4.382**       -4.362**
                    (0.44)             (0.44)             (1.35)         (1.35)
seeded              0.864***                              2.648***
                    (0.14)                                (0.44)
better seeded                          0.903***                          2.760***
                                       (0.14)                            (0.45)
constant            3.749***           3.742***           15.191***      15.169***
                    (0.28)             (0.28)             (1.07)         (1.08)
Tournament Dummy    Yes                Yes                Yes            Yes
N                   5690               5690               1922           1922
R-sqr               0.362              0.362              0.485          0.485

Note: For all four specifications, a linear regression was used. In specifications (1) and (2) the data is at a per round level for each individual. Therefore 5690 individual rounds are used in (1) and (2). "Points per round" was only available for the Olympics 2004 - 2016. For specifications (3) and (4) "ovr. Points" were used. These are the sum of the points gained per round, and are given at a per bout level for each individual. Therefore 961 (N/2) bouts were used in the analysis. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. As a robustness check, the same analysis was conducted using negative binomial regressions. The results can be found in table D.1. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

7.2 Taekwondo Results

The available data for taekwondo has some advantages compared with the data for the other two sports. As stated before, the results per round were available for every tournament, which allows for a much more detailed analysis. The effects on the points earned per round are analyzed in specifications (1) and (2) in table 7.2. The points earned per round only include points which are earned by attack. Points which were awarded because of a rule violation of the opponent, or were subtracted because of an own rule violation, are not included in this analysis. There are two main effects. In general, men score on average 0.66 points more per round than women. If a competitor is seeded, or respectively better seeded than their opponent, the scored points per round are also significantly higher.

Table 7.2: Taekwondo - points per round and points per bout

                    (1)             (2)             (3)            (4)
                    points/round    points/round    points/bout    points/bout
                    b/se            b/se            b/se           b/se
red                 -0.088          -0.086          -0.279         -0.263
                    (0.06)          (0.06)          (0.17)         (0.17)
male                0.659*          0.666*          2.482**        2.539**
                    (0.30)          (0.29)          (0.85)         (0.83)
seeded              0.611***                        1.936***
                    (0.12)                          (0.35)
better seeded                       0.707***                       2.144***
                                    (0.11)                         (0.34)
Tournament Dummy    Yes             Yes             Yes            Yes
N                   4408            4408            1428           1428
R-sqr               0.118           0.122           0.249          0.256

Note: For all four specifications, a linear regression was used. For specifications (1) and (2), the data is at a per round level, therefore 4408 individual rounds are used. "Points per round" only include the points gained by attack; points deducted for rule-breaking or points awarded for the rule-breaking of the opponent are not included. For specifications (3) & (4), the data is coded at the individual per bout-level. For those, N gives the number of competitors, so N/2 gives the number of bouts (714) used in the regressions. For the Olympics 2000-2012, "Ovr. Points" is the sum of the points earned by attacks over all rounds, minus the points deducted for own rule-breaking. In 2016 rule violations of one competitor resulted in points for his or her opponent. For 2016, "Ovr. Points" gives the number of points earned by own attacks plus the points earned by rule violations of the opponent. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. For the regressions, the data was clustered per match. As a robustness check, the same analysis was conducted using negative binomial regressions. The results can be found in table D.2. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

The results for a similar analysis with the overall points scored per bout can be found in (3) and (4) in table 7.2. Unsurprisingly, the same variables affect the results in a significant way. The majority of bouts in the data were fought over three rounds, therefore the size of the effects concerning the seeding ("seeded" and "better seeded") is about three times the effect in the points per round analysis. For the sex dummy, the effect is about four times higher. This results from men violating the rules more heavily than women, as we see in specifications (3) and (4) of table 7.3. The analysis regarding the attack points in specifications (1) and (2) shows similar results as before. Seeding and sex are the only variables that influence the outcome.
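The "about three times" and "about four times" statements can be verified directly from the coefficients in table 7.2. The short calculation below only divides the per bout coefficients by the per round coefficients; the variable names are chosen for this illustration.

```python
# Coefficients copied from table 7.2: specifications (1)/(2) are per round,
# specifications (3)/(4) are per bout.
PER_ROUND = {"seeded": 0.611, "better seeded": 0.707,
             "male (seeded spec.)": 0.659, "male (better seeded spec.)": 0.666}
PER_BOUT = {"seeded": 1.936, "better seeded": 2.144,
            "male (seeded spec.)": 2.482, "male (better seeded spec.)": 2.539}

RATIOS = {name: PER_BOUT[name] / PER_ROUND[name] for name in PER_ROUND}

for name, ratio in RATIOS.items():
    print(f"{name}: per bout / per round = {ratio:.2f}")
```

The seeding ratios come out slightly above three, and the ratios for the sex dummy lie a little below four.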

What all eight specifications have in common is that color does not have a significant effect on points per round, points per bout, points earned by attack and points deducted for rule violations. Table 7.3 is especially important for analyzing the effects of the color. If the color alters the behavior of the athletes, this should be reflected in either the attack points or the penalty points. We see neither, indicating that the effect found by Hill and Barton (2005a) was a statistical fluke.

Table 7.3: Taekwondo - Attack and Penalty Points

Taekwondo
                    (1)              (2)              (3)               (4)
                    Attack Points    Attack Points    Penalty Points    Penalty Points
                    b/se             b/se             b/se              b/se
red                 -0.197           -0.181           0.069             0.068
                    (0.18)           (0.18)           (0.04)            (0.04)
male                1.288            1.314            0.595**           0.597**
                    (0.77)           (0.76)           (0.20)            (0.19)
seeded              1.831***                          -0.077
                    (0.34)                            (0.08)
better seeded                        2.083***                           -0.094
                                     (0.34)                             (0.07)
constant            2.798***         2.790***         0.332**           0.333**
                    (0.28)           (0.28)           (0.12)            (0.12)
Tournament Dummy    Yes              Yes              Yes               Yes
N                   1430             1430             1448              1448
R-sqr               0.221            0.229            0.134             0.134

Note: For all four specifications, a linear regression was used. The data is at a per bout level, so N gives the number of competitors, so N/2 gives the number of bouts (714) used in the regressions. "Attack points", used in specifications (1) and (2), are only the ones earned by own attacks. "Penalty points", used in (3) and (4), are the points deducted for own rule violations between 2000 and 2012, respectively the number of points the opponent earned by own rule violations in 2016. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. For the regressions, the data was clustered per match. As a robustness check, the same analysis was conducted using negative binomial regressions. The results can be found in table D.3. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

7.3 Wrestling Results

For wrestling, the data for technical points and overall points is not available for all four tournaments. For the Olympic tournaments 2000 and 2004, technical points (specifications (1) and (3)) were used for the analysis; for 2008 and 2012, overall points (specifications (2) and (4)) were used. As table 7.4 shows, there is no significant effect of the color, as is the case for boxing and taekwondo. As there was a group stage in 2000 and 2004, and no seeding in 2008 and 2012, controlling for seeding was not possible. However, this ensures that the assignment of colors is random. These non-significant results from "random bouts" strengthen the point that the color is not a major influence on the outcome of the bouts.

Table 7.4: Wrestling - points per round and points per bout

                    Free-style wrestling            Greco-roman wrestling
                    (1)             (2)             (3)             (4)
                    Tech. Points    Ovr. Points     Tech. Points    Ovr. Points
                    b/se            b/se            b/se            b/se
red                 0.203           -0.162          0.196           -0.098
                    (0.25)          (0.10)          (0.28)          (0.11)
male                -1.011          -0.686*
                    (0.54)          (0.27)
constant            3.649***        1.756***        1.835***        1.520***
                    (0.34)          (0.24)          (0.25)          (0.12)
Tournament Dummy    Yes             Yes             Yes             Yes
N                   1006            2050            888             1492
R-sqr               0.027           0.029           0.045           0.082

Note: For all four specifications, a linear regression was used. Specifications (1) and (3) use "Tech. Points". These were used in the Olympics 2000 and 2004. In 2008 and 2012 (specifications (2) and (4)), "Ovr. Points" were used, which are the sum of the points per round. The data is given at the "individual per bout"-level, therefore N gives the number of competitors, so N/2 gives the number of bouts used in the regressions. There were no women’s competitions for Greco-Roman wrestling, and no women’s competitions for the 2000 free-style wrestling competitions. As a robustness check, the same analysis was conducted using negative binomial regressions. The results can be found in table D.4. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Chapter 8

Conclusion

The main goal of this thesis was to determine if the color red has an impact on the outcome of sports competitions. Although there is research regarding this topic using team sports, a lot of noise is introduced by teams and their team chemistry. This means sports in which individuals compete in predetermined colors provide the better research setting.

The claim that the color red could potentially interfere in tournaments is plausible. In biological and psychological research there is a lot of evidence that the color red has an influence on the behavior of animals and humans. The biological effects described range from more dominant behavior in macaques and widowbirds to different physical reactions when fighting in men (Andersson et al., 2002, Dreiskaemper et al., 2013, Setchell and Dixson, 2001, Setchell and Jean Wickings, 2005). Experiments also suggest that humans perceive people dressed in red differently. For example, people had a higher perceived dominance when imagining competing in red (Feltman and Elliot, 2011).

If such an effect of the color red were real, this could have enormous consequences, especially for sports competitions. Every tournament which features athletes competing in red could possibly be biased. This would have large effects on tournaments like the Olympic Games themselves and on betting markets. While these possibilities are studied by scientists of multiple fields, there is no unanimous view in the literature. Most of the papers which examine this question leave a lot to be desired regarding their sample size. Shrout and Rodgers (2018) state that small sample sizes are one of the major reasons for the so-called replication crisis. This describes the phenomenon that a surprisingly high number of research papers in the social sciences cannot be replicated by others. As papers with statistically significant effects are favored over no-effect papers, replications are an important measure to prevent the inclusion of inaccurate findings into the (scientific) knowledge. Therefore, this work also includes a replication of the work by Hill and Barton (2005a). Their main results could be replicated entirely for the 2004 Olympics, although these are only statistically significant for boxing and not for the other sport styles. The second part of their research, the win percentage dependent on the degree of asymmetry, could not be replicated, and major flaws in this approach were exposed. To tackle the issue regarding the sample size, this work uses data from five different Olympic tournaments and three different sports (boxing, taekwondo and wrestling). When extending their approach to a bigger sample, no effect of the color red was found.

The results of this research speak heavily against a potential effect of the color red on the outcome of sporting contests. There is no indication of an effect of the color red, neither in the whole sample, nor in various sub-samples split along the lines of sex, degree of randomness in the draw or culture. The results remain this way when splitting the data into sub-samples per sport. If there were a genuine effect of the color red, this should be observable in detailed settings, for example in higher scores per round. To conclude, the data of the last five Olympic tournaments give no indication that competitors dressed in red have an advantage. The most likely explanation for previous research which found such an effect is that these instances were a statistical fluke. While we see effects for certain sub-samples, with a large enough number of observations the results come close to the expected value. Of all the bouts in the data of this thesis, 50.8 % were won by the competitor in red, which is very close to the expected 50/50-split. All in all, this thesis shows that the competitors dressed in red sportswear did not win significantly more bouts than their counterparts. Based on these results, we can exclude the possibility of such an effect with high certainty.
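How close 50.8 % is to an even split can be illustrated with a simple normal-approximation test of a proportion. The sketch below assumes 3836 bouts, which is N/2 of the full regression sample in table 6.2; whether "all the bouts in the data" refers to exactly this number is an assumption of this example, and the share of 50.8 % is itself rounded.

```python
import math

# Assumptions for illustration: bout count taken as N/2 from table 6.2,
# red win share as reported in the text (rounded to one decimal).
N_BOUTS = 3836
RED_SHARE = 0.508

# Standard error of a proportion under the null hypothesis of a 50/50 split.
STD_ERROR = math.sqrt(0.5 * 0.5 / N_BOUTS)
Z_SCORE = (RED_SHARE - 0.5) / STD_ERROR

# Two-sided p-value from the standard normal distribution.
P_VALUE = math.erfc(abs(Z_SCORE) / math.sqrt(2))

print(f"z = {Z_SCORE:.2f}, two-sided p = {P_VALUE:.2f}")
```

Under these assumptions the deviation from 50 % is roughly one standard error, which is well within what chance alone would produce.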

Chapter 9

Literature

Anbarci, Nejat, Ching-Jen Sun, and M. Utku Uenver (2015). “Designing Fair Tiebreak Mechanisms: The Case of FIFA Penalty Shootouts”. In: SSRN Electronic Journal. issn: 1556-5068.
Andersson, Staffan et al. (2002). “Multiple receivers, multiple ornaments, and a trade-off between agonistic and epigamic signaling in a widowbird”. In: The American naturalist 160.5, pp. 683–691.
Apesteguia, Jose and Ignacio Palacios-Huerta (2010). “Psychological Pressure in Competitive Environments: Evidence from a Randomized Natural Experiment”. In: American Economic Review 100.5, pp. 2548–2564. issn: 0002-8282.
Attrill, Martin J. et al. (2008). “Red shirt colour is associated with long-term team success in English football”. In: Journal of sports sciences 26.6, pp. 577–582. issn: 0264-0414.
Bell, Terry (9.08.2012). “London 2012: With the ball draw is it Olympic wrestling or Olympic roulette?” In: The Province. url: https://theprovince.com/sports/london-2012-with-the-ball-draw-is-it-olympic-wrestling-or-olympic-roulette.
Bognanno, Michael L. (2001). “Corporate Tournaments”. In: Journal of Labor Economics 19.2, pp. 290–315. issn: 0734-306X.
Boudreau, Kevin J., Nicola Lacetera, and Karim R. Lakhani (2011). “Incentives and Problem Uncertainty in Innovation Contests: An Empirical Analysis”. In: Management Science 57.5, pp. 843–863. issn: 0025-1909.
Brams, Steven J. and Mehmet S. Ismail (2018). “Making the Rules of Sports Fairer”. In: SIAM Review 60.1, pp. 181–202. issn: 0036-1445.

Brown, Jennifer (2011). “Quitters Never Win: The (Adverse) Incentive Effects of Competing with Superstars”. In: Journal of Political Economy 119.5, pp. 982–1013. issn: 0022-3808.
Brown, Jennifer and Dylan B. Minor (2014). “Selecting the Best? Spillover and Shadows in Elimination Tournaments”. In: Management Science 60.12, pp. 3087–3102. issn: 0025-1909.
Bull, Clive, Andrew Schotter, and Keith Weigelt (1987). “Tournaments and Piece Rates: An Experimental Study”. In: Journal of Political Economy 95.1, pp. 1–33. issn: 0022-3808.
Caldwell, David F. and Jerry M. Burger (2010). “On Thin Ice”. In: Social Psychological and Personality Science 2.3, pp. 306–310. issn: 1948-5506.
Carmichael, Lorne (1983). “Firm-Specific Human Capital and Promotion Ladders”. In: The Bell Journal of Economics 14.1, p. 251. issn: 0361915X.
Carpenter, Jeffrey, Peter H. Matthews, and John Schirm (2010). “Tournaments and Office Politics: Evidence from a Real Effort Experiment”. In: The American Economic Review 100.1, pp. 504–517.
Casas-Arce, Pablo and F. Asís Martínez-Jerez (2009). “Relative Performance Compensation, Contests, and Dynamic Incentives”. In: Management Science 55.8, pp. 1306–1320. issn: 0025-1909.
Chang, Andrew C. and Phillip Li (2017). “A Preanalysis Plan to Replicate Sixty Economics Research Papers That Worked Half of the Time”. In: American Economic Review 107.5, pp. 60–64. issn: 0002-8282.
Che, Yeon-Koo and Terrence Hendershott (2008). “How to divide the possession of a football?” In: Economics Letters 99.3, pp. 561–565. issn: 0165-1765. url: http://www.sciencedirect.com/science/article/pii/S0165176507003783.
Clarke, Stephen R. and John M. Norman (1995). “Home ground advantage of individual clubs in English soccer”. In: The Statistician, pp. 509–521.
Clemens, Michael A. (2017). “The Meaning of Failed Replications: A Review and Proposal”. In: Journal of Economic Surveys 31.1, pp. 326–342. issn: 09500804.
Conyon, Martin J., Simon I. Peck, and Graham V. Sadler (2001). “Corporate tournaments and executive compensation: Evidence from the U.K”. In: Strategic Management Journal 22.8, pp. 805–815. issn: 0143-2095.

Curby, David G. (2016). “Effect of Uniform Color on Outcome of Match at Senior World Wrestling Championships 2015”. In: International Journal of Wrestling Science 6.1, pp. 62–64. issn: 2161-5667.
De Varo, Jed (2006). “Internal Promotion Competitions in Firms”. In: The RAND Journal of Economics 37.3, pp. 521–542. issn: 07416261.
Delfgaauw, Josse et al. (2015). “The Effects of Prize Spread and Noise in Elimination Tournaments: A Natural Field Experiment”. In: Journal of Labor Economics 33.3, pp. 521–569. issn: 0734-306X.
Deller, Carolyn and Tatiana Sandino (2019). “Effects of a Tournament Incentive Plan Incorporating Managerial Discretion in a Geographically Dispersed Organization”. In: Management Science. issn: 0025-1909.
Dewald, William G., Jerry G. Thursby, and Richard G. Anderson (1986). “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project”. In: The American Economic Review 76.4, pp. 587–603.
Dijkstra, Peter D. and Paul T. Y. Preenen (2008). “No effect of blue on winning contests in judo”. In: Proceedings. Biological sciences 275.1639, pp. 1157–1162. issn: 0962-8452.
Dixit, Avinash (1987). “Strategic Behavior in Contests”. In: American Economic Review 77.5, pp. 891–898. issn: 0002-8282.
Dreiskaemper, Dennis et al. (2013). “Influence of Red Jersey Color on Physical Parameters in Combat Sports”. In: Journal of Sport and Exercise Psychology 35.1, pp. 44–49. issn: 0895-2779.
Duvendack, Maren, Richard Palmer-Jones, and W. Robert Reed (2017). “What Is Meant by “Replication” and Why Does It Encounter Resistance in Economics?” In: American Economic Review 107.5, pp. 46–51. issn: 0002-8282.
Ehrenberg, Ronald G. and Michael L. Bognanno (1990a). “Do Tournaments Have Incentive Effects?” In: Journal of Political Economy 98.6, pp. 1307–1324. issn: 0022-3808.
— (1990b). “The Incentive Effects of Tournaments Revisited: Evidence from the European PGA Tour”. In: ILR Review 43.3, 74–S–88–S. issn: 0019-7939.
Fecher, Benedikt and Gert G. Wagner (2016). “A research symbiont”. In: Science (New York, N.Y.) 351.6280, pp. 1405–1406.
Feigenbaum, Susan and David M. Levy (1993). “The market for (ir)reproducible econometrics”. In: Social Epistemology 7.3, pp. 215–232. issn: 0269-1728.

Feltman, Roger and Andrew J. Elliot (2011). “The Influence of Red on Perceptions of Relative Dominance and Threat in a Competitive Context”. In: Journal of Sport and Exercise Psychology 33.2, pp. 308–314. issn: 0895-2779.
Fortunato, Laura and Aaron Clauset (2016). “Revisiting the effect of red on competition in humans”. In:
Frank, Mark G. and Thomas Gilovich (1988). “The dark side of self- and social perception: Black uniforms and aggression in professional sports”. In: Journal of Personality and Social Psychology 54.1, pp. 74–85. issn: 1939-1315.
Fryer, Roland G. and Glenn C. Loury (2005). “Affirmative action in winner-take-all markets”. In: The Journal of Economic Inequality 3.3, pp. 263–280.
García-Rubio, Miguel A., Andrés J. Picazo-Tadeo, and Francisco González-Gómez (2011). “Does a red shirt improve sporting performance? Evidence from Spanish football”. In: Applied Economics Letters 18.11, pp. 1001–1004. issn: 1350-4851.
Garciano, Luis and Ignacio Palacios-Huerta (2005). Sabotage in Tournaments: Making the Beautiful Game a Bit Less Beautiful.
Genakos, Christos and Mario Pagliero (2012). “Interim Rank, Risk Taking, and Performance in Dynamic Tournaments”. In: Journal of Political Economy 120.4, pp. 782–813. issn: 0022-3808.
Gleditsch, Nils P., Claire Metelits, and Havard Strand (2003). “Posting your data: Will you be scooped or will you be famous”. In: International Studies Perspectives 4.1, pp. 89–97.
Green, Jerry R. and Nancy L. Stokey (1983). “A Comparison of Tournaments and Contracts”. In: Journal of Political Economy 91.3, pp. 349–364. issn: 0022-3808.
Greenlees, Iain A., Michael Eynon, and Richard C. Thelwell (2013). “Color of Soccer Goalkeepers’ Uniforms Influences the Outcome of Penalty Kicks”. In: Perceptual and Motor Skills 117.1, pp. 1–10. issn: 0031-5125.
Greenlees, Iain et al. (2008). “Soccer penalty takers’ uniform colour and pre-penalty kick gaze affect the impressions formed of them by opposing goalkeepers”. In: Journal of sports sciences 26.6, pp. 569–576. issn: 0264-0414.
Grund, Christian and Oliver Gürtler (2005). “An empirical study on risk-taking in tournaments”. In: Applied Economics Letters 12.8, pp. 457–461. issn: 1350-4851.
Hagemann, Norbert, Bernd Strauss, and Jan Leissing (2008). “When the referee sees red”. In: Psychological science 19.8, pp. 769–771. issn: 1467-9280.

74 Hamermesh, Daniel S. (2007). “Viewpoint: Replication in economics”. In: Canadian Journal of Economics/Revue canadienne d’économique 40.3, pp. 715–733. issn: 00084085. Harford, Tim (2006). Why Your Boss Is Overpaid. Forbes. url: https://www.forbes.com/ 2009/02/19/pay- boss- compensation- leadership- compensation_tim_harford. html#1ca2765d50b3. Hill, Russell A. and Robert A. Barton (2005a). “Red enhances human performance in con- tests”. In: Nature 435.7040, p. 293. issn: 1476-4687. — (2005b). “Seeing red? Putting sportswear in context (reply)”. In: Nature 437.7063, E10– E11. issn: 1476-4687. IOC (2017). IOC Annual Report 2016: Credibility, Sustainability and Youth. url: https: //stillmed.olympic.org/media/Document%20Library/OlympicOrg/Documents/IOC- Annual- Report/IOC- Annual- Report- 2016.pdf#_ga=2.137559216.1273528106. 1565770430-643804071.1565770430. — (2019). IOC Annual Report 2018: Credibility, Sustainability and Youth. url: https : //stillmed.olympic.org/media/Document%20Library/OlympicOrg/Documents/IOC- Annual- Report/IOC- ANNUAL- REPORT- 2018.pdf#_ga=2.137559216.1273528106. 1565770430-643804071.1565770430. Ilie, Andrei et al. (2008). “Better to be red than blue in virtual competition”. In: Cyberpsy- chology & behavior : the impact of the Internet, multimedia and virtual reality on behavior and society 11.3, pp. 375–377. issn: 1557-8364. Jamieson, Jeremy P. (2010). “The Home Field Advantage in Athletics: A Meta-Analysis”. In: Journal of Applied Social Psychology 40.7, pp. 1819–1848. issn: 00219029. Knoeber, Charles R. and Walter N. Thurman (1994). “Testing the Theory of Tournaments: An Empirical Analysis of Broiler Production”. In: Journal of Labor Economics 12.2, pp. 155–179. issn: 0734-306X. Kocher, Martin G., Marc V. Lenz, and Matthias Sutter (2012). “Psychological Pressure in Competitive Environments: New Evidence from Randomized Natural Experiments”. In: Management Science 58.8, pp. 1585–1591. issn: 0025-1909. LOCOG (2013). 
London 2012 Olympic Games Official Report. London: London Organising Comitee of the Olympic Games and Paralympic Games (LOCOG). Lackner, Mario et al. (2015). Are Competitors Forward Looking in Strategic Interactions? Evidence from the Field. url: http://hdl.handle.net/10419/126653.

75 Lazear, Edward P. and Sherwin Rosen (1981). “Rank-Order Tournaments as Optimum Labor Contracts”. In: Journal of Political Economy 89.5, pp. 841–864. issn: 0022-3808. Leitner, Christoph, Achim Zeileis, and Kurt Hornik (2010). “Forecasting sports tournaments by ratings of (prob)abilities: A comparison for the EURO 2008”. In: International Journal of Forecasting 26.3, pp. 471–481. issn: 01692070. Levitt, Steven D. (1997). “Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime”. In: American Economic Review 87.3, pp. 270–290. issn: 0002-8282. Longo, Dan L. and Jeffrey M. Drazen (2016). “Data Sharing”. In: The New England journal of medicine 374.3, pp. 276–277. Maloney, Michael T. and Robert E. McCormick (2000). “The Response of Workers to Wages in Tournaments”. In: Journal of Sports Economics 1.2, pp. 99–123. issn: 1527-0025. Matsumoto, David et al. (2007). “Blue Judogis may Bias Competition Outcomes”. In: Re- search Journal of Budo 39.3, pp. 1–7. Mayer, Thomas (1980). “Economics as a Hard Science: Realistic Goal or Wishful Thinking?” In: Economic Inquiry 18.2, pp. 165–178. issn: 1465-7295. McCrary, Justin (2002). “Using Electoral Cycles in Police Hiring to Estimate the Effect of Police on Crime: Comment”. In: American Economic Review 92.4, pp. 1236–1243. issn: 0002-8282. McCullough, B. D. (2009). “Open Access Economics Journals and the Market for Repro- ducible Economic Research”. In: Economic Analysis and Policy 39.1, pp. 117–126. issn: 03135926. McCullough, B. D., Kerry Anne McGeary, and Teresa D. Harrison (2006). “Lessons from the JMCB Archive”. In: Journal of Money, Credit, and Banking 38.4, pp. 1093–1107. issn: 1538-4616. url: http://dx.doi.org/10.1353/mcb.2006.0061. Moldovanu, Benny and Aner Sela (2006). “Contest architecture”. In: Journal of Economic Theory 126.1, pp. 70–96. issn: 00220531. National Academy of Sciences (2016). Statistical Challenges in Assessing and Fostering Re- producibility of Scientific: Summary of a Workshop. 
Washington. Page, Lionel and Katie Page (2007). “The second leg home advantage: Evidence from Eu- ropean football cup competitions”. In: Journal of sports sciences 25.14, pp. 1547–1556. issn: 0264-0414.

76 Palacios-Huerta, Ignacio (2014). Beautiful game theory: How soccer can help economics. Princeton: Princeton University Press. isbn: 9781400850310. url: http://www.jstor. org/stable/10.2307/j.ctt6wq05z. Pesaran, Hashem (2003). “Introducing a replication section”. In: Journal of Applied Econo- metrics 18.1, p. 111. issn: 0883-7252. Piatti, Marco, David A. Savage, and Benno Torgler (2012). “The red mist? Red shirts, success and team sports”. In: Sport in Society 15.9, pp. 1209–1227. issn: 1743-0437. Pollard, Richard (1986). “Home advantage in soccer: A retrospective analysis”. In: Journal of sports sciences 4.3, pp. 237–248. issn: 0264-0414. — (2008). “Home Advantage in Football: A Current Review of an Unsolved Puzzle”. In: The Open Sports Sciences Journal 1.1, pp. 12–14. “Professional boxers will be allowed to compete at Rio Olympics” (1.06.2016). In: The Guardian. url: https://www.theguardian.com/sport/2016/jun/01/professional- boxers-allowed-compete-at-rio-olympics. Pryke, S. R. (2002). “Carotenoid status signaling in captive and wild red-collared widowbirds: independent effects of badge size and color”. In: Behavioral Ecology 13.5, pp. 622–631. Recours, Robin and Walid Briki (2015). “The effect of red and blue uniforms on competitive anxiety and self-confidence in virtual sports contests”. In: Revue Européenne de Psycholo- gie Appliquée/European Review of Applied Psychology 65.2, pp. 67–69. issn: 11629088. Rosen, Sherwin (1986). “Prizes and Incentives in Elimination Tournaments”. In: The Amer- ican Economic Review 76.4, pp. 701–715. Rowe, Candy, Julie M. Harris, and S. Craig Roberts (2005). “Sporting contests: Seeing red? Putting sportswear in context”. In: Nature 437.7063, E10; discussion E10–1. issn: 1476- 4687. — (2006). “Corrigendum: Seeing red? Putting sportswear into context”. In: Nature 441.7090, E3–E3. issn: 1476-4687. SOCOG (2001). Official report of the XXVII Olympiad: Sydney 2000 Olympic Games ; 15. September - 1 October 2000. 
Sydney: Sydney Organising Committee for the Olympic Games (SOCOG). isbn: 9780957961609. Seife, Charles (n.d.). Red does not enhance human performance in the Olympics. url: http: //www.users.cloud9.net/~cgseife/SeifeOlympicsManuscript12February.pdf.

77 Setchell, J. M. and A. F. Dixson (2001). “Arrested development of secondary sexual adorn- ments in subordinate adult male mandrills (Mandrillus sphinx)”. In: American journal of physical anthropology 115.3, pp. 245–252. issn: 0002-9483. Setchell, Joanna M. and E. Jean Wickings (2005). “Dominance, Status Signals and Coloration in Male Mandrills (Mandrillus sphinx)”. In: 111.1, pp. 25–50. issn: 0179-1613. Shrout, Patrick E. and Joseph L. Rodgers (2018). “Psychology, Science, and Knowledge Construction: Broadening Perspectives from the Replication Crisis”. In: Annual review of psychology 69, pp. 487–510. Sorokowski, Piotr and Andrzej Szmajke (2011). “The Influence of the "Red Win" Effect in Sports: A Hypothesis of Erroneous Perception of Opponents Dressed in Red - Preliminary Test”. In: Human Movement 12.4. issn: 1899-1955. Sterling, Theodore D. (1959). “Publication Decisions and their Possible Effects on Inferences Drawn from Tests of Significance—or Vice Versa”. In: Journal of the American Statistical Association 54.285, pp. 30–34. issn: 0162-1459. Stokes, Houston H. (2004). “On the advantage of using two or more econometric software systems to solve the same problem”. In: Journal of Economic and Social Measurement 29.1-3, pp. 307–320. issn: 07479662. Strain and Daniel (27.02.2012). “The Red-Dress Effect”. In: Science. url: https://www. sciencemag.org/news/2012/02/red-dress-effect#. Streif, Georg (23.03.2020). Informationen für Masterarbeit: E-mail message. Ed. by Matthias Nikolaus Hilgarth. Taylor, Curtis R. (1995). “Digging for Golden Carrots: An Analysis of Research Tourna- ments”. In: The American Economic Review 85.4, pp. 872–890. United Nations. Methodology: Standard country or area codes for statistical use (M49). url: https://unstats.un.org/unsd/methodology/m49/overview. Vlaeminck, Sven and Lisa-Kristin Herrmann (2015). 
“Data Policies and Data Archives: A New Paradigm for Academic Publishing in Economic Sciences?” In: New Avenues for Electronic Publishing in the Age of Infinite Collections and Citizen Science. Ed. by Birgit Schmidt and Milena Dobreva. Waitt, Corri et al. (2003). “Evidence from rhesus macaques suggests that male coloration plays a role in female mate choice”. In: Proceedings. Biological sciences 270 Suppl 2, S144–6. issn: 0962-8452.

78 Wiedemann, Diana et al. (2015). “Red clothing increases perceived dominance, aggression and anger”. In: Biology letters 11.5, p. 20150166. Zelustek, Jürgen and Thomas Niklaus (1.02.2006). “Klinsmann steht auf Rot”. In: Spiegel Online. url: https://www.spiegel.de/sport/fussball/traditionstrikot-vor- dem-aus-klinsmann-steht-auf-rot-a-398580.html. Zion Market Research (2018). Global Sports Betting Market Will Reach USD 155.49 Billion By 2024: Zion Market Research. GlobeNewswire. url: https://www.globenewswire. com/news-release/2018/12/24/1678117/0/en/Global-Sports-Betting-Market- Will-Reach-USD-155-49-Billion-By-2024-Zion-Market-Research.html.

Appendix A

Appendix - Data per Sport

A.1 Boxing Data

Women’s boxing was first included in the Olympics in London 2012 and featured only three weight classes in 2012 and 2016. Therefore, only 66 bouts of women’s boxing are included, which makes the women’s boxing data highly vulnerable to statistical outliers. Regarding seeding, boxers were drawn randomly in the years 2000-2008. Afterwards, up to 8 contestants were seeded. For the tournaments in 2004, 2008 and 2012, additional data was available on a per-round basis. For the other two tournaments in the sample, only the result of the full bout was obtainable.

Table A.1: Summary of selected variables for boxing

                 Share   Min   Max    Sum
male             0.952     0     1   2642
color            0.500     0     1   1387
red_win          0.516     0     1   1432
not_relevant     0.018     0     1     50
seeded           0.132     0     1    367
red_and_seed     0.068     0     1    189
blue_and_seed    0.064     0     1    178
both_seeded      0.043     0     1    118

Note: N=2774 competitors across 1387 (=2774/2) bouts. The data is at the person level, so each sum equals the number of observations with the given trait; e.g., 2642 of the observations are male. The only exceptions are red_win and not_relevant, where the dummy is the same for both competitors; e.g., 716 (=1432/2) bouts were won by the competitor in red. not_relevant marks bouts that were decided by withdrawal, retirement, walkover or disqualification. "seeded" indicates that a person was seeded in the bout; "red_and_seed" ("blue_and_seed") indicates that a person was seeded and competed in red (blue). When both competitors were seeded, "both_seeded" is coded as 1.
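
The relation between person-level sums, shares and bout counts described in the note can be checked with a few lines of arithmetic. The following is a minimal sketch: the sums are copied from Table A.1, while the function and variable names are illustrative assumptions and not part of the original analysis code.

```python
# Person-level sums as printed in Table A.1 (boxing).
N_COMPETITORS = 2774
sums = {
    "male": 2642,
    "color": 1387,
    "red_win": 1432,
    "not_relevant": 50,
    "seeded": 367,
}


def share(total: int, n: int = N_COMPETITORS) -> float:
    """Share of person-level observations with the trait, rounded to three decimals."""
    return round(total / n, 3)


def bouts(person_level_sum: int) -> int:
    """Convert a person-level sum of a bout-level dummy into a number of bouts.

    Every bout appears twice in the data (once per competitor), so the sum is halved.
    """
    return person_level_sum // 2


shares = {name: share(total) for name, total in sums.items()}
n_bouts = bouts(N_COMPETITORS)      # number of bouts in the boxing data
red_wins = bouts(sums["red_win"])   # bouts won by the competitor in red

print(shares)
print(n_bouts, red_wins)
```

Halving applies only to dummies that take the same value for both competitors of a bout (red_win, not_relevant); for person-level traits such as male or seeded the sum is already a count of competitors.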

Table A.2: Distribution of bouts by seeding and sex for boxing

Seedings      female       male      Total
0                 33       1046       1079
             (50.00)    (79.18)    (77.79)
1                 33        275        308
             (50.00)    (20.82)    (22.21)
Total             66       1321       1387
            (100.00)   (100.00)   (100.00)

Note: Distribution of bouts by seeding and sex for boxing. For the purpose of this table, if one of the two competitors is seeded, the bout is considered as "seeded". Percentages are given in brackets.
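
The bracketed values in Table A.2 are column percentages, i.e. each count divided by the total of its column. A short sketch of that calculation follows; the counts are taken from the table, and the dictionary layout and function name are assumptions made for illustration.

```python
# Bout counts from Table A.2 (boxing), keyed by sex and seeding status.
counts = {
    "female": {"unseeded": 33, "seeded": 33},
    "male": {"unseeded": 1046, "seeded": 275},
}


def column_percentages(column: dict[str, int]) -> dict[str, float]:
    """Express each cell as a percentage of its column total, rounded to two decimals."""
    total = sum(column.values())
    return {row: round(100 * n / total, 2) for row, n in column.items()}


# The "Total" column is the row-wise sum over both sexes.
counts["Total"] = {
    row: counts["female"][row] + counts["male"][row]
    for row in ("unseeded", "seeded")
}

percentages = {sex: column_percentages(col) for sex, col in counts.items()}
print(percentages)
```

The same function applies unchanged to the other count tables in this appendix, since they all report column percentages.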

Table A.3: Distribution of bouts by sex and Olympics for boxing

Olympics      female       male      Total
Sydney             0        298        298
              (0.00)    (22.56)    (21.49)
Athens             0        272        272
              (0.00)    (20.59)    (19.61)
Beijing            0        272        272
              (0.00)    (20.59)    (19.61)
London            33        239        272
             (50.00)    (18.09)    (19.61)
Rio               33        240        273
             (50.00)    (18.17)    (19.68)
Total             66       1321       1387
            (100.00)   (100.00)   (100.00)

Note: Distribution of bouts by sex and Olympics. Percentages are given in brackets.

A.2 Taekwondo Data

Taekwondo is the only sport in the data in which the two sexes are evenly represented. Since its first inclusion in the 2000 Olympics, there have been four weight classes each for men and women. For all five Olympic tournaments, the data was available per bout as well as per round, which allows for a more detailed analysis.

Table A.4: Summary of selected variables for taekwondo

                 Share   Min   Max    Sum
male             0.511     0     1    756
color            0.500     0     1    740
red_win          0.503     0     1    744
not_relevant     0.022     0     1     32
seeded           0.240     0     1    355
red_and_seed     0.120     0     1    178
blue_and_seed    0.120     0     1    177
both_seeded      0.101     0     1    150

Note: N=1480 competitors across 740 (=1480/2) bouts. The data is at the person level, so each sum equals the number of observations with the given trait; e.g., 756 of the observations are male. The only exceptions are red_win and not_relevant, where the dummy is the same for both competitors; e.g., 372 (=744/2) bouts were won by the competitor in red. not_relevant marks bouts that were decided by withdrawal, retirement, walkover or disqualification. "seeded" indicates that a person was seeded in the bout; "red_and_seed" ("blue_and_seed") indicates that a person was seeded and competed in red (blue). When both competitors were seeded, "both_seeded" is coded as 1.

Table A.5: Distribution of bouts by seeding and sex for taekwondo

Seedings      female       male      Total
0                222        238        460
             (61.33)    (62.96)    (62.16)
1                140        140        280
             (38.67)    (37.04)    (37.84)
Total            362        378        740
            (100.00)   (100.00)   (100.00)

Note: Distribution of bouts by seeding and sex for taekwondo. For the purpose of this table, if one of the two competitors is seeded, the bout is considered as "seeded". Percentages are given in brackets.

Table A.6: Distribution of bouts by sex and Olympics for taekwondo

Olympics      female       male      Total
Sydney            59         70        129
             (16.30)    (18.52)    (17.43)
Athens            75         80        155
             (20.72)    (21.16)    (20.95)
Beijing           76         76        152
             (20.99)    (20.11)    (20.54)
London            76         76        152
             (20.99)    (20.11)    (20.54)
Rio               76         76        152
             (20.99)    (20.11)    (20.54)
Total            362        378        740
            (100.00)   (100.00)   (100.00)

Note: Distribution of bouts by sex and Olympics. Percentages are given in brackets.

A.3 Wrestling Data

The 2000 Olympic tournament is the only one in the data in which no women competed in wrestling. In 2004, a women’s competition was introduced for free-style wrestling. For Greco-Roman wrestling, there are still no Olympic competitions for women. There was no seeding in the time frame of this work. While the tournaments in 2000 and 2004 began with a group stage, the tournaments in 2008 and 2012 used a random draw.

The competitions at the 2016 Olympics were excluded for both styles of wrestling, because the sportswear cannot always be classified as red or blue. An example is given in Figure A.1. While it is still clear that Rodriguez (CUB, left) competes in red and Rahimi (IRI, right) in blue, the colors are not dominant enough to claim an influence on the outcome. Obtaining footage of each fight and deciding case by case whether the sportswear is still mostly red or blue was not feasible and would not be objective. Using the 2016 competitions as a placebo, on the grounds that competitors are still assigned red or blue sportswear, is also not possible, because many fights were still fought in sportswear that is entirely red or blue.

Figure A.1: Sample image of men’s freestyle wrestling at the 2016 Olympics

Note: Hassan Rahimi (IRI) wrestling Yowlys Rodriguez (CUB) in their 57kg Bronze Medal bout. Source: http://www.payvand.com/news/16/aug/1115.html, last accessed: 11 Sep 2019

Table A.7: Distribution of bouts by sex and Olympics for free-style wrestling

Olympics      female       male      Total
Sydney             0        233        233
              (0.00)    (30.54)    (23.02)
Athens            82        210        292
             (32.93)    (27.52)    (28.85)
Beijing           78        165        243
             (31.33)    (21.63)    (24.01)
London            89        155        244
             (35.74)    (20.31)    (24.11)
Total            249        763       1012
            (100.00)   (100.00)   (100.00)

Note: Distribution of bouts by sex and Olympics for free-style wrestling. Percentages are given in brackets.

Appendix B

Robustness Checks for 6.1 General Analysis

Table B.1: Robustness - Logit Main Regression

                                              Logit
                          (1)        (2)        (3)        (4)        (5)
                          win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se

red                     0.074      0.063     -0.059      0.064     -0.042
                       (0.06)     (0.07)     (0.11)     (0.07)     (0.11)
seeded                  0.641***   1.077***   1.203***
                       (0.07)     (0.15)     (0.19)
red × seeded                       0.128      0.149
                                  (0.19)     (0.19)
male × red                                    0.145                 0.128
                                             (0.11)                (0.11)
male × seeded                                -0.239
                                             (0.16)
better seeded                                            1.348***   1.533***
                                                        (0.17)     (0.24)
red × better seeded                                      0.134      0.152
                                                        (0.19)     (0.18)
male × better seeded                                               -0.291
                                                                   (0.23)
Tournament Dummy           No        Yes         No        Yes         No
Olympics                   No         No        Yes         No        Yes
Sport                      No         No        Yes         No        Yes

Note: N = 7672, i.e. two observations for each of the 3836 bouts. The dependent variable in all five Logit specifications is win, a dummy indicating whether the individual competitor won. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. In specifications (3) and (5), Olympics, sport styles and the interactions between them were used as controls. Standard errors, which are clustered at the bout level, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.
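
Logit coefficients are changes in the log-odds of winning, so exponentiating a coefficient gives an odds ratio, and the ratio of a coefficient to its standard error gives an approximate z-statistic. The sketch below illustrates both steps for column (1) of Table B.1. It assumes a normal approximation for the p-value and uses the rounded values printed in the table, so stars recomputed this way may differ from the table near a threshold; the function names are illustrative and not taken from the original analysis.

```python
import math


def odds_ratio(coef: float) -> float:
    """Multiplicative change in the odds of winning implied by a Logit coefficient."""
    return math.exp(coef)


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value from a normal approximation to the z-statistic coef / se."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))


def stars(coef: float, se: float) -> str:
    """Significance stars using the thresholds stated in the table note."""
    p = two_sided_p(coef, se)
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Column (1) of Table B.1: coefficients and standard errors as printed.
print("seeded:", round(odds_ratio(0.641), 2), stars(0.641, 0.07))
print("red:   ", round(odds_ratio(0.074), 2), stars(0.074, 0.06))
```

Read this way, a coefficient of 0.641 on seeded corresponds to odds of winning that are roughly 1.9 times those of an unseeded competitor, whereas the coefficient on red is small relative to its standard error.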

Table B.2: Robustness - Logit Sex Splits

                           Both sexes               Male                 Female
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                     0.063      0.064      0.093      0.094     -0.098     -0.087
                       (0.07)     (0.07)     (0.07)     (0.07)     (0.17)     (0.16)
seeded                  1.077***              0.957***              1.344***
                       (0.15)                (0.18)                (0.30)
red × seeded            0.128                 0.190                 0.088
                       (0.19)                (0.23)                (0.36)
better seeded                      1.348***              1.233***              1.591***
                                  (0.17)                (0.20)                (0.34)
red × better seeded                0.134                 0.184                 0.112
                                  (0.19)                (0.22)                (0.34)
Tournament Dummy          Yes        Yes        Yes        Yes        Yes        Yes
N                        7672       7672       6324       6324       1348       1348

Note: The dependent variable in all six Logit specifications is win, a dummy indicating whether the individual competitor won. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. The male sample (used in (3) & (4)) and the female sample (used in (5) & (6)) are subsamples of the whole sample, which was used in (1) & (2). The data is given at the "individual per bout" level, so N gives the number of competitors and N/2 the number of bouts used in the regressions. Standard errors, which are clustered at the bout level, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Table B.3: Robustness - Logit Random Splits

                     Non-random    Random first round   Random later rounds
                            (1)                   (2)                   (3)
                            win                   win                   win
                           b/se                  b/se                  b/se

red                       0.088                 0.080                -0.219
                         (0.16)                (0.20)                (0.18)
male × red                0.120                 0.052                 0.120
                         (0.15)                (0.19)                (0.17)
Olympics                    Yes                   Yes                   Yes
Sport                       Yes                   Yes                   Yes
N                          3152                  2188                  2332

Note: The dependent variable in all three Logit specifications is win, a dummy indicating whether the individual competitor won. The "Non-random" sample (1) consists of all bouts in which at least one seeded competitor was involved, either at the stage of the bout or at one of the preceding stages of the tournament. E.g., if a seeded competitor lost in the first round, all following bouts in the tournament tree that this (seeded) competitor could have reached by winning are also coded as "Non-random". "Random first round" (used in (2)) contains only those bouts that are the first bout of the tournament for both competitors and in which none of them was seeded. "Random later rounds" (used in (3)) consists of all bouts for which all of the first-round bouts leading to this specific bout in the tournament were part of the "Random first round" sample. In all three specifications, Olympics, sport styles and the interactions between them were used as controls. The data is given at the "individual per bout" level, so N gives the number of competitors and N/2 the number of bouts used in the regressions. Standard errors, which are clustered at the bout level, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Appendix C

Results and Robustness checks for 6.4 Cultural Effects

Table C.1: Effects of the color of the sportswear on winning - continent splits

                                              Logit
                          (1)        (2)        (3)        (4)        (5)
                          win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se

red                    -0.841      0.017      0.006     -0.150      0.732
                       (0.52)     (0.20)     (0.17)     (0.17)     (0.66)
seeded                  0.872      0.926      0.804*     1.479***  16.667***
                       (0.69)     (0.50)     (0.33)     (0.31)     (1.70)
red × seeded           -0.285      0.222      0.410     -0.068     -0.837
                       (0.66)     (0.41)     (0.30)     (0.29)     (1.72)
male × red              1.329*     0.077      0.026      0.227     -0.306
                       (0.52)     (0.20)     (0.17)     (0.17)     (0.68)
male × seeded           0.224     -0.050     -0.178     -0.452     -0.158
                       (0.71)     (0.45)     (0.30)     (0.28)     (1.45)
Olympics                  Yes        Yes        Yes        Yes        Yes
Sport                     Yes        Yes        Yes        Yes        Yes
N                         649       1480       2800       2690        158

Note: The dependent variable in all five Logit specifications is win, a dummy indicating whether the individual competitor won. In all five specifications, Olympics, sport styles and the interactions between them were used as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

C.1 Africa

Table C.2: Regional analysis - Africa

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                     0.050     -0.115     -0.123*     0.279     -0.841     -0.885
                       (0.04)     (0.06)     (0.06)     (0.20)     (0.52)     (0.52)
seeded                  0.215*     0.149                 1.022**    0.872
                       (0.09)     (0.15)                (0.38)     (0.69)
red × seeded           -0.024     -0.049                -0.174     -0.285
                       (0.14)     (0.14)                (0.60)     (0.66)
male × red                         0.199**    0.205**               1.329*     1.363**
                                  (0.06)     (0.06)                (0.52)     (0.51)
male × seeded                      0.081                            0.224
                                  (0.15)                           (0.71)
better seeded                                 0.259*                           1.251**
                                             (0.11)                           (0.48)
red × better seeded                          -0.011                           -0.103
                                             (0.16)                           (0.70)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                         684        684        684        684        649        649
R-sqr                   0.018      0.079      0.084

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.
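
The LPM and Logit columns are on different scales: an LPM coefficient is a change in the probability of winning, whereas a Logit coefficient is a change in the log-odds. A rough way to put them side by side is the derivative of the logistic function, coef · p · (1 − p), which is largest at p = 0.5, where it equals coef / 4. The sketch below applies this to the coefficient on seeded in Table C.2. The evaluation point p is an assumption chosen for illustration, the approximation is only indicative for dummy regressors, and the function name is not taken from the original analysis.

```python
def logit_marginal_effect(coef: float, p: float = 0.5) -> float:
    """Approximate change in the win probability per unit of the regressor.

    Uses the derivative of the logistic function, coef * p * (1 - p),
    evaluated at the baseline win probability p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return coef * p * (1.0 - p)


# Table C.2, coefficient on seeded: LPM column (1) and Logit column (4).
lpm_seeded = 0.215
logit_seeded = 1.022

upper_bound = logit_marginal_effect(logit_seeded)       # evaluated at p = 0.5
at_lower_p = logit_marginal_effect(logit_seeded, 0.25)  # hypothetical baseline

print(lpm_seeded, round(upper_bound, 3), round(at_lower_p, 3))
```

Under these assumptions the Logit estimate translates into a probability change of the same order of magnitude as the LPM coefficient, which is the sense in which the Logit columns serve as a robustness check.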

Table C.3: Wins by color for men from Africa

win             blue        red      Total
0                230        204        434
             (77.70)    (70.59)    (74.19)
1                 66         85        151
             (22.30)    (29.41)    (25.81)
Total            296        289        585
            (100.00)   (100.00)   (100.00)

Note: Distribution of wins by color for men from Africa. Percentages are given in brackets.

Table C.4: Regional analysis - Northern Africa

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                     0.069     -0.201**   -0.182*     0.350     -1.884*    -1.610*
                       (0.05)     (0.08)     (0.08)     (0.26)     (0.94)     (0.81)
seeded                  0.178      0.329                 0.822      1.901*
                       (0.11)     (0.18)                (0.45)     (0.97)
red × seeded           -0.178     -0.084                -0.824     -0.176
                       (0.17)     (0.18)                (0.77)     (0.92)
male × red                         0.332***   0.313***              2.574**    2.307**
                                  (0.08)     (0.08)                (0.94)     (0.82)
male × seeded                     -0.144                           -1.022
                                  (0.18)                           (0.98)
better seeded                                 0.291*                           1.438*
                                             (0.13)                           (0.60)
red × better seeded                          -0.070                           -0.246
                                             (0.19)                           (1.01)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                         355        355        355        355        338        338
R-sqr                   0.012      0.111      0.117

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Table C.5: Wins by color for men from Northern Africa

win             blue        red      Total
0                118         94        212
             (75.16)    (64.83)    (70.20)
1                 39         51         90
             (24.84)    (35.17)    (29.80)
Total            157        145        302
            (100.00)   (100.00)   (100.00)

Note: Distribution of wins by color for men from Northern Africa. Percentages are given in brackets.

Table C.6: Regional analysis - Sub-Saharan Africa

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                     0.031     -0.024     -0.022      0.196     -0.139     -0.134
                       (0.05)     (0.10)     (0.10)     (0.30)     (0.66)     (0.66)
seeded                  0.265     -0.249*                1.297    -11.731***
                       (0.17)     (0.12)                (0.70)     (1.16)
red × seeded            0.239      0.103                 0.943      0.356
                       (0.24)     (0.26)                (1.11)     (1.22)
male × red                         0.073      0.066                 0.467      0.428
                                  (0.10)     (0.10)                (0.66)     (0.66)
male × seeded                      0.503*                          12.946***
                                  (0.20)                           (1.31)
better seeded                                 0.235                            1.096
                                             (0.22)                           (0.97)
red × better seeded                           0.225                            1.066
                                             (0.28)                           (1.54)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                         329        329        329        329        303        303
R-sqr                   0.044      0.097      0.098

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

C.2 Americas

Table C.7: Regional analysis - Americas

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                     0.020      0.004      0.004      0.079      0.017      0.012
                       (0.03)     (0.05)     (0.05)     (0.12)     (0.20)     (0.20)
seeded                  0.093      0.222                 0.379      0.926
                       (0.07)     (0.12)                (0.28)     (0.50)
red × seeded            0.041      0.049                 0.183      0.222
                       (0.09)     (0.09)                (0.40)     (0.41)
male × red                         0.019      0.021                 0.077      0.089
                                  (0.05)     (0.05)                (0.20)     (0.20)
male × seeded                     -0.011                           -0.050
                                  (0.11)                           (0.45)
better seeded                                 0.278***                         1.178***
                                             (0.08)                           (0.35)
red × better seeded                           0.061                            0.308
                                             (0.09)                           (0.43)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                        1480       1480       1480       1480       1480       1480
R-sqr                   0.005      0.028      0.035

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Table C.8: Regional analysis - Northern America

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                     0.065     -0.002      0.013      0.263     -0.006      0.053
                       (0.05)     (0.08)     (0.08)     (0.19)     (0.31)     (0.30)
seeded                  0.118      0.425                 0.488      1.839
                       (0.15)     (0.28)                (0.64)     (1.27)
red × seeded           -0.118     -0.161                -0.486     -0.640
                       (0.21)     (0.24)                (0.88)     (1.04)
male × red                         0.082      0.064                 0.343      0.266
                                  (0.08)     (0.08)                (0.32)     (0.31)
male × seeded                     -0.390                           -1.712
                                  (0.24)                           (1.11)
better seeded                                 0.305                            1.395
                                             (0.17)                           (0.88)
red × better seeded                          -0.107                           -0.566
                                             (0.22)                           (1.08)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                         504        504        504        504        504        504
R-sqr                   0.005      0.035      0.036

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Table C.9: Regional analysis - Latin America and the Caribbean

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                    -0.008     -0.006     -0.015     -0.031     -0.027     -0.068
                       (0.04)     (0.07)     (0.06)     (0.14)     (0.27)     (0.27)
seeded                  0.092      0.149                 0.370      0.629
                       (0.08)     (0.14)                (0.31)     (0.60)
red × seeded            0.093      0.100                 0.397      0.447
                       (0.10)     (0.10)                (0.44)     (0.46)
male × red                         0.002      0.015                 0.010      0.070
                                  (0.07)     (0.06)                (0.27)     (0.27)
male × seeded                      0.081                            0.336
                                  (0.13)                           (0.55)
better seeded                                 0.261**                          1.107**
                                             (0.09)                           (0.39)
red × better seeded                           0.123                            0.592
                                             (0.10)                           (0.48)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                         976        976        976        976        976        976
R-sqr                   0.009      0.039      0.046

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

C.3 Asia

Table C.10: Regional analysis - Asia

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                     0.006      0.001      0.009      0.024      0.006      0.037
                       (0.02)     (0.04)     (0.04)     (0.09)     (0.17)     (0.17)
seeded                  0.075      0.195*                0.308      0.804*
                       (0.05)     (0.08)                (0.19)     (0.33)
red × seeded            0.098      0.090                 0.439      0.410
                       (0.07)     (0.07)                (0.30)     (0.30)
male × red                         0.007     -0.002                 0.026     -0.005
                                  (0.04)     (0.04)                (0.17)     (0.17)
male × seeded                     -0.040                           -0.178
                                  (0.07)                           (0.30)
better seeded                                 0.233***                         0.967***
                                             (0.06)                           (0.26)
red × better seeded                           0.096                            0.496
                                             (0.07)                           (0.33)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                        2800       2800       2800       2800       2800       2800
R-sqr                   0.006      0.013      0.019

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Table C.11: Regional analysis - Western Asia

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                     0.057      0.145      0.134      0.228      0.616      0.594
                       (0.04)     (0.10)     (0.09)     (0.15)     (0.44)     (0.42)
seeded                  0.009     -0.049                 0.037     -0.206
                       (0.10)     (0.17)                (0.38)     (0.74)
red × seeded            0.151      0.142                 0.659      0.654
                       (0.14)     (0.14)                (0.60)     (0.63)
male × red                        -0.093     -0.080                -0.404     -0.374
                                  (0.10)     (0.09)                (0.44)     (0.43)
male × seeded                      0.128                            0.532
                                  (0.16)                           (0.72)
better seeded                                 0.144                            0.595
                                             (0.12)                           (0.48)
red × better seeded                           0.208                            1.163
                                             (0.14)                           (0.78)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                         802        802        802        802        802        802
R-sqr                   0.007      0.026      0.032

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Table C.12: Regional analysis - Central Asia

                                     LPM                            Logit
                          (1)        (2)        (3)        (4)        (5)        (6)
                          win        win        win        win        win        win
                         b/se       b/se       b/se       b/se       b/se       b/se

red                    -0.050     -0.132     -0.120     -0.202     -0.629     -0.533
                       (0.04)     (0.09)     (0.09)     (0.17)     (0.43)     (0.43)
seeded                  0.078     -0.256                 0.323    -13.619***
                       (0.09)     (0.17)                (0.36)     (1.07)
red × seeded            0.151      0.155                 0.661      0.678
                       (0.12)     (0.11)                (0.54)     (0.59)
male × red                         0.086      0.081                 0.434      0.368
                                  (0.09)     (0.09)                (0.43)     (0.44)
male × seeded                      0.346                           14.025***
                                  (0.18)                           (1.08)
better seeded                                 0.262**                          1.378*
                                             (0.10)                           (0.57)
red × better seeded                           0.110                            0.581
                                             (0.11)                           (0.72)
Olympics                   No        Yes        Yes         No        Yes        Yes
Sport                      No        Yes        Yes         No        Yes        Yes
N                         659        659        659        659        648        648
R-sqr                   0.013      0.070      0.083

Note: The dependent variable in all specifications is win, a dummy indicating whether the individual competitor won. Models (1)-(3) use an LPM framework; models (4)-(6) are the matching Logit specifications, included as a robustness check. Models (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level, so N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: * p<.05, ** p<.01, *** p<.001.

Table C.13: Regional analysis - Eastern Asia

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | -0.018 | 0.026 | 0.029 | -0.074 | 0.131 | 0.155 |
| | (0.04) | (0.06) | (0.06) | (0.16) | (0.27) | (0.26) |
| seeded | | 0.099 | 0.128 | | 0.431 | 0.556 |
| | | (0.08) | (0.12) | | (0.35) | (0.54) |
| red × seeded | | 0.073 | 0.084 | | 0.344 | 0.410 |
| | | (0.10) | (0.11) | | (0.50) | (0.52) |
| male × red | | -0.076 | -0.076 | | -0.350 | -0.363 |
| | | (0.06) | (0.06) | | (0.28) | (0.27) |
| male × seeded | | | 0.029 | | | 0.106 |
| | | | (0.10) | | | (0.51) |
| better seeded | | | 0.175 | | | 0.776 |
| | | | (0.10) | | | (0.49) |
| red × better seeded | | | 0.063 | | | 0.325 |
| | | | (0.11) | | | (0.57) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 824 | 824 | 824 | 824 | 824 | 824 |
| R-sqr | 0.009 | 0.069 | 0.070 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Table C.14: Regional analysis - South-eastern Asia

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | -0.019 | -0.104 | -0.108 | -0.075 | -0.464 | -0.483 |
| | (0.08) | (0.12) | (0.12) | (0.32) | (0.55) | (0.53) |
| seeded | | 0.091 | 0.523∗ | | 0.366 | 2.529 |
| | | (0.18) | (0.25) | | (0.71) | (1.64) |
| red × seeded | | -0.137 | -0.281 | | -0.554 | -1.503 |
| | | (0.29) | (0.29) | | (1.18) | (1.70) |
| male × red | | 0.097 | 0.094 | | 0.433 | 0.423 |
| | | (0.14) | (0.14) | | (0.60) | (0.59) |
| male × seeded | | | -0.556∗ | | | -2.712 |
| | | | (0.26) | | | (1.64) |
| better seeded | | | 0.148 | | | 0.625 |
| | | | (0.22) | | | (0.90) |
| red × better seeded | | | 0.044 | | | 0.195 |
| | | | (0.35) | | | (1.38) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 181 | 181 | 181 | 181 | 180 | 180 |
| R-sqr | 0.003 | 0.083 | 0.059 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Table C.15: Regional analysis - Southern Asia

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | 0.055 | -0.302 | -0.289 | 0.219 | -1.404 | -1.361 |
| | (0.06) | (0.18) | (0.18) | (0.23) | (0.97) | (0.99) |
| seeded | | 0.012 | 0.054 | | 0.049 | 0.237 |
| | | (0.21) | (0.26) | | (0.83) | (1.06) |
| red × seeded | | 0.082 | 0.082 | | 0.340 | 0.317 |
| | | (0.27) | (0.29) | | (1.10) | (1.21) |
| male × red | | 0.407∗ | 0.386∗ | | 1.870 | 1.791 |
| | | (0.18) | (0.18) | | (0.97) | (0.99) |
| male × seeded | | | 0.000 | | | 0.000 |
| | | | (.) | | | (.) |
| better seeded | | | 0.071 | | | 0.304 |
| | | | (0.32) | | | (1.32) |
| red × better seeded | | | 0.230 | | | 1.057 |
| | | | (0.34) | | | (1.58) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 334 | 334 | 334 | 334 | 334 | 334 |
| R-sqr | 0.005 | 0.104 | 0.109 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

C.4 Europe

Table C.16: Regional analysis - Europe

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | 0.009 | -0.036 | -0.015 | 0.037 | -0.150 | -0.062 |
| | (0.02) | (0.04) | (0.04) | (0.09) | (0.17) | (0.17) |
| seeded | | 0.185∗∗∗ | 0.342∗∗∗ | | 0.780∗∗∗ | 1.479∗∗∗ |
| | | (0.04) | (0.07) | | (0.19) | (0.31) |
| red × seeded | | -0.036 | -0.013 | | -0.157 | -0.068 |
| | | (0.06) | (0.06) | | (0.29) | (0.29) |
| male × red | | 0.055 | 0.030 | | 0.227 | 0.124 |
| | | (0.04) | (0.04) | | (0.17) | (0.16) |
| male × seeded | | | -0.100 | | | -0.452 |
| | | | (0.06) | | | (0.28) |
| better seeded | | | 0.358∗∗∗ | | | 1.605∗∗∗ |
| | | | (0.05) | | | (0.28) |
| red × better seeded | | | -0.061 | | | -0.327 |
| | | | (0.06) | | | (0.32) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 2690 | 2690 | 2690 | 2690 | 2690 | 2690 |
| R-sqr | 0.011 | 0.022 | 0.029 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.
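The note describes the Logit columns as robustness checks for the LPM columns, but the two sets of coefficients are on different scales. A rough way to compare them is to convert a Logit coefficient into an approximate change in win probability, beta * p * (1 - p). The sketch below assumes a baseline win probability of p = 0.5, which is an assumption made here for illustration and not a figure from the thesis. The function name is hypothetical, and the coefficient pairs are the seeded and better seeded values reported for Europe in Table C.16.

```python
def logit_marginal_effect(beta: float, p: float = 0.5) -> float:
    """Approximate probability change implied by a logit coefficient.

    Uses the derivative of the logistic function at baseline probability p,
    i.e. beta * p * (1 - p). With p = 0.5 (an assumption) this is beta / 4.
    """
    return beta * p * (1.0 - p)


# (LPM coefficient, Logit coefficient) pairs as reported in Table C.16.
europe_pairs = {
    "seeded, first LPM vs. first Logit value": (0.185, 0.780),
    "seeded, second LPM vs. second Logit value": (0.342, 1.479),
    "better seeded": (0.358, 1.605),
}

for label, (lpm_coef, logit_coef) in europe_pairs.items():
    approx = logit_marginal_effect(logit_coef)
    print(f"{label}: LPM {lpm_coef:.3f}, Logit-implied {approx:.3f}")
```

Under this assumption the Logit-implied effects land close to the LPM coefficients, which is the sense in which the two frameworks can be read side by side. The approximation degrades when the baseline probability is far from 0.5 or when coefficients are very large.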

Table C.17: Regional analysis - Northern Europe

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | 0.087 | 0.023 | 0.068 | 0.350 | 0.064 | 0.288 |
| | (0.05) | (0.09) | (0.09) | (0.21) | (0.41) | (0.37) |
| seeded | | 0.308∗∗∗ | 0.427∗∗∗ | | 1.321∗∗ | 2.024∗∗ |
| | | (0.09) | (0.12) | | (0.44) | (0.68) |
| red × seeded | | -0.177 | -0.174 | | -0.778 | -0.838 |
| | | (0.12) | (0.12) | | (0.56) | (0.58) |
| male × red | | 0.106 | 0.046 | | 0.486 | 0.191 |
| | | (0.10) | (0.09) | | (0.42) | (0.37) |
| male × seeded | | | -0.212 | | | -1.058 |
| | | | (0.12) | | | (0.61) |
| better seeded | | | 0.366∗∗∗ | | | 1.772∗∗ |
| | | | (0.11) | | | (0.63) |
| red × better seeded | | | -0.216 | | | -1.118 |
| | | | (0.12) | | | (0.69) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 447 | 447 | 447 | 447 | 447 | 447 |
| R-sqr | 0.033 | 0.088 | 0.092 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Table C.18: Regional analysis - Eastern Europe

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | 0.001 | -0.017 | -0.016 | 0.002 | -0.069 | -0.066 |
| | (0.03) | (0.07) | (0.07) | (0.12) | (0.28) | (0.28) |
| seeded | | 0.093 | 0.212 | | 0.386 | 0.885 |
| | | (0.08) | (0.15) | | (0.36) | (0.64) |
| red × seeded | | 0.053 | 0.045 | | 0.238 | 0.215 |
| | | (0.11) | (0.12) | | (0.51) | (0.57) |
| male × red | | 0.016 | 0.013 | | 0.065 | 0.055 |
| | | (0.07) | (0.07) | | (0.29) | (0.28) |
| male × seeded | | | 0.060 | | | 0.283 |
| | | | (0.16) | | | (0.68) |
| better seeded | | | 0.260∗∗ | | | 1.112∗ |
| | | | (0.10) | | | (0.45) |
| red × better seeded | | | 0.070 | | | 0.342 |
| | | | (0.12) | | | (0.58) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 1373 | 1373 | 1373 | 1373 | 1373 | 1373 |
| R-sqr | 0.003 | 0.020 | 0.021 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Table C.19: Regional analysis - Western Europe

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | 0.015 | 0.064 | 0.125 | 0.060 | 0.272 | 0.547 |
| | (0.05) | (0.10) | (0.10) | (0.21) | (0.43) | (0.41) |
| seeded | | 0.115 | 0.278 | | 0.465 | 1.220 |
| | | (0.10) | (0.15) | | (0.42) | (0.68) |
| red × seeded | | 0.059 | 0.115 | | 0.259 | 0.522 |
| | | (0.14) | (0.15) | | (0.58) | (0.69) |
| male × red | | -0.058 | -0.129 | | -0.247 | -0.561 |
| | | (0.11) | (0.10) | | (0.43) | (0.41) |
| male × seeded | | | -0.207 | | | -0.920 |
| | | | (0.16) | | | (0.73) |
| better seeded | | | 0.411∗∗∗ | | | 1.924∗∗ |
| | | | (0.12) | | | (0.66) |
| red × better seeded | | | -0.054 | | | -0.240 |
| | | | (0.13) | | | (0.71) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 437 | 437 | 437 | 437 | 430 | 430 |
| R-sqr | 0.011 | 0.062 | 0.078 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Table C.20: Regional analysis - Southern Europe

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | -0.017 | -0.116 | -0.088 | -0.070 | -0.504 | -0.395 |
| | (0.05) | (0.08) | (0.08) | (0.22) | (0.36) | (0.34) |
| seeded | | 0.307∗∗∗ | 0.414∗∗ | | 1.334∗∗∗ | 1.845∗∗ |
| | | (0.07) | (0.14) | | (0.38) | (0.71) |
| red × seeded | | -0.102 | -0.065 | | -0.499 | -0.343 |
| | | (0.13) | (0.14) | | (0.61) | (0.64) |
| male × red | | 0.122 | 0.094 | | 0.533 | 0.421 |
| | | (0.08) | (0.08) | | (0.37) | (0.36) |
| male × seeded | | | -0.003 | | | -0.009 |
| | | | (0.12) | | | (0.64) |
| better seeded | | | 0.472∗∗∗ | | | 2.363∗∗∗ |
| | | | (0.09) | | | (0.60) |
| red × better seeded | | | -0.103 | | | -0.706 |
| | | | (0.14) | | | (0.81) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 433 | 433 | 433 | 433 | 433 | 433 |
| R-sqr | 0.042 | 0.080 | 0.094 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

C.5 Oceania

Table C.21: Regional analysis - Oceania

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | 0.069 | 0.111 | 0.103 | 0.468 | 0.732 | 0.707 |
| | (0.06) | (0.11) | (0.11) | (0.39) | (0.66) | (0.65) |
| seeded | | 0.354 | 0.573 | | 1.764 | 16.667∗∗∗ |
| | | (0.36) | (0.44) | | (1.45) | (1.70) |
| red × seeded | | -0.140 | -0.145 | | -0.756 | -0.837 |
| | | (0.41) | (0.45) | | (1.66) | (1.72) |
| male × red | | -0.059 | -0.051 | | -0.306 | -0.274 |
| | | (0.11) | (0.11) | | (0.68) | (0.66) |
| male × seeded | | | -0.051 | | | -0.158 |
| | | | (0.36) | | | (1.45) |
| better seeded | | | 1.048∗∗∗ | | | 34.759∗∗∗ |
| | | | (0.04) | | | (1.12) |
| red × better seeded | | | -0.328 | | | -16.875∗∗∗ |
| | | | (0.24) | | | (1.62) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 198 | 198 | 198 | 198 | 158 | 158 |
| R-sqr | 0.026 | 0.176 | 0.222 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Table C.22: Regional analysis - Australia and New Zealand

| b/se | (1) LPM | (2) LPM | (3) LPM | (4) Logit | (5) Logit | (6) Logit |
|---|---|---|---|---|---|---|
| Dependent variable | win | win | win | win | win | win |
| red | 0.106 | 0.151 | 0.137 | 0.634 | 0.813 | 0.786 |
| | (0.07) | (0.14) | (0.13) | (0.40) | (0.67) | (0.66) |
| seeded | | 0.336 | 0.590 | | 1.626 | 16.668∗∗∗ |
| | | (0.36) | (0.43) | | (1.45) | (1.77) |
| red × seeded | | -0.178 | -0.173 | | -0.922 | -0.910 |
| | | (0.41) | (0.44) | | (1.66) | (1.72) |
| male × red | | -0.081 | -0.068 | | -0.322 | -0.287 |
| | | (0.15) | (0.14) | | (0.71) | (0.69) |
| male × seeded | | | -0.034 | | | -0.145 |
| | | | (0.37) | | | (1.46) |
| better seeded | | | 1.069∗∗∗ | | | 32.999∗∗∗ |
| | | | (0.06) | | | (1.15) |
| red × better seeded | | | -0.353 | | | -15.936∗∗∗ |
| | | | (0.24) | | | (1.59) |
| Olympics | No | Yes | Yes | No | Yes | Yes |
| Sport | No | Yes | Yes | No | Yes | Yes |
| N | 167 | 167 | 167 | 167 | 140 | 140 |
| R-sqr | 0.029 | 0.157 | 0.205 | | | |

Note: Dependent variable for all specifications is win, a dummy indicating if the individual competitor won. Models (1)-(3) use an LPM framework, and models (4)-(6) are the matching Logit specifications to check for robustness. (2), (3), (5) and (6) use Olympics, sport styles and the interactions between them as controls. The data is given at the "individual per bout" level; therefore N gives the number of competitors. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Appendix D

Robustness Checks for 7 Heterogeneity on Sports

Table D.1: Robustness - Boxing - points per round and points per bout

| b/se | (1) Points per round | (2) Points per round | (3) Ovr. Points | (4) Ovr. Points |
|---|---|---|---|---|
| Olympics covered | 2004-2016 | 2004-2016 | 2000-2012 | 2000-2012 |
| red | 0.008 | 0.008 | 0.002 | 0.003 |
| | (0.02) | (0.02) | (0.02) | (0.02) |
| male | 0.361∗∗∗ | 0.362∗∗∗ | -0.341∗∗∗ | -0.340∗∗∗ |
| | (0.09) | (0.09) | (0.10) | (0.10) |
| seeded | 0.191∗∗∗ | | 0.190∗∗∗ | |
| | (0.03) | | (0.03) | |
| better seeded | | 0.197∗∗∗ | | 0.196∗∗∗ |
| | | (0.03) | | (0.03) |
| Tournament Dummy | Yes | Yes | Yes | Yes |
| N | 5690 | 5690 | 1922 | 1922 |

Note: For all four specifications, a negative binomial regression framework was used. In specifications (1) and (2), the data is at a per round level for each individual; therefore 5690 individual rounds are used. "Points per round" was only available for the Olympics 2004 - 2016. For specifications (3) and (4), "Ovr. Points" were used. These are the sum of the points gained per round and are given at a per bout level for each individual; therefore 961 (N/2) bouts were used in the analysis. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.
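Because the specifications in this appendix are negative binomial regressions, the coefficients are not point differences. Assuming they are reported on the log scale, as is the default in common statistical software (the thesis does not state this explicitly here), exponentiating a coefficient gives an incidence rate ratio: the multiplicative change in expected points. The helper name below is hypothetical; the inputs are the red and seeded coefficients from specification (1) of Table D.1.

```python
import math


def incidence_rate_ratio(coef: float) -> float:
    """Convert a log-scale count-model coefficient into a rate ratio."""
    return math.exp(coef)


# Coefficients from specification (1) of Table D.1 (boxing, points per round).
for name, coef in {"red": 0.008, "seeded": 0.191}.items():
    irr = incidence_rate_ratio(coef)
    print(f"{name}: coefficient {coef:+.3f}, rate ratio {irr:.3f}, "
          f"change {100 * (irr - 1):+.1f}%")
```

Read this way, the red coefficient corresponds to a change in expected points per round of under one percent, while the seeded coefficient corresponds to roughly a fifth more points, again under the log-scale assumption.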

Table D.2: Robustness - Taekwondo - points per round and points per bout

| b/se | (1) points/bout | (2) points/bout | (3) points/bout | (4) points/bout |
|---|---|---|---|---|
| red | -0.049 | -0.046 | -0.051 | -0.047 |
| | (0.04) | (0.04) | (0.04) | (0.04) |
| male | 0.559∗∗ | 0.561∗∗ | 0.731∗∗∗ | 0.741∗∗∗ |
| | (0.18) | (0.17) | (0.19) | (0.18) |
| seeded | 0.332∗∗∗ | | 0.348∗∗∗ | |
| | (0.06) | | (0.06) | |
| better seeded | | 0.377∗∗∗ | | 0.376∗∗∗ |
| | | (0.06) | | (0.06) |
| Tournament Dummy | Yes | Yes | Yes | Yes |
| N | 4408 | 4408 | 1428 | 1428 |

Note: For all four specifications, a negative binomial regression framework was used. For specifications (1) and (2), the data is at a per round level; therefore 4408 individual rounds are used. "Points per round" only includes the points gained by attack; points deducted for rule-breaking or points awarded for the rule-breaking of the opponent are not included. For specifications (3) and (4), the data is coded at the individual per bout level. For those, N gives the number of competitors, so N/2 gives the number of bouts (714) used in the regressions. For the Olympics 2000-2012, "Ovr. Points" is the sum of the points earned by attacks over all rounds, minus the points deducted for own rule-breaking. In 2016, rule violations of one competitor resulted in points for his or her opponent. For 2016, "Ovr. Points" gives the number of points earned by own attacks plus the points earned by rule violations of the opponent. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Table D.5: χ2-results per sport in all tournaments without seeding, only female competitors

| Sport | RED-Winner | BLUE-Winner | χ2 | P | df |
|---|---|---|---|---|---|
| Hill & Barton: | | | | | |
| ALL | | | 4.19 | 0.041 | 1 |
| All tournaments without seeding: | | | | | |
| ALL | 195 | 207 | 0.358 | 0.550 | 1 |
| TKD | 97 | 111 | 0.942 | 0.332 | 1 |
| WFS | 98 | 96 | 0.020 | 0.886 | 1 |

Note: These results include the women’s competitions in taekwondo in the Olympics 2000 - 2008, and the women’s competitions in freestyle wrestling in the Olympics 2000 - 2012. Bouts in the group stage and wins by walkover, withdrawal and retirement were excluded.

Figure D.1: Win percentage per color, Replication of Hill and Barton (2005a)

[Figure: Win percentage per sport. All tournaments without seeding - Only male competitors. Series: blue, red, with 95% CI. Vertical axis: percentage of contests won (35 to 65). Horizontal axis: ALL, BOX, TKD, WFS, WGR.]

Note: These results include the women’s competitions in taekwondo in the Olympics 2000 - 2008, and the women’s competitions in freestyle wrestling in the Olympics 2000 - 2012. Bouts in the group stage and wins by walkover, withdrawal and retirement were excluded. The exact numbers per sport can be found in table D.5.
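The figure reports win percentages with 95% confidence intervals, but the interval method is not stated. As a minimal sketch, the normal-approximation (Wald) interval for a win share is shown below. The choice of method, the function name and the z-value of 1.96 are assumptions made here for illustration. The example input is the "ALL" row of Table D.5 (195 red wins in 402 bouts).

```python
import math


def win_share_ci(wins: int, bouts: int, z: float = 1.96) -> tuple[float, float, float]:
    """Win share with a normal-approximation (Wald) confidence interval.

    Returns (share, lower bound, upper bound) as fractions between 0 and 1.
    """
    share = wins / bouts
    half_width = z * math.sqrt(share * (1.0 - share) / bouts)
    return share, share - half_width, share + half_width


share, lower, upper = win_share_ci(195, 195 + 207)
print(f"red win share: {100 * share:.1f}% "
      f"(95% CI {100 * lower:.1f}% to {100 * upper:.1f}%)")
```

With these counts the interval contains 50%, which matches the non-significant χ2 result for the same row.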

Table D.3: Robustness - Taekwondo - Attack and Penalty Points

| b/se | (1) Attack Points | (2) Attack Points | (3) Penalty Points | (4) Penalty Points |
|---|---|---|---|---|
| red | -0.037 | -0.034 | 0.130 | 0.130 |
| | (0.04) | (0.04) | (0.07) | (0.07) |
| male | 0.405∗ | 0.411∗ | 0.976∗∗ | 0.980∗∗ |
| | (0.17) | (0.17) | (0.37) | (0.37) |
| seeded | 0.381∗∗∗ | | -0.103 | |
| | (0.07) | | (0.10) | |
| better seeded | | 0.422∗∗∗ | | -0.130 |
| | | (0.07) | | (0.10) |
| constant | 1.012∗∗∗ | 1.010∗∗∗ | -1.070∗∗ | -1.070∗∗ |
| | (0.10) | (0.10) | (0.33) | (0.33) |
| | | | | |
| constant | -1.053∗∗∗ | -1.069∗∗∗ | -14.901∗∗∗ | -14.333∗∗∗ |
| | (0.07) | (0.07) | (2.39) | (3.24) |
| Tournament Dummy | Yes | Yes | Yes | Yes |
| N | 1430 | 1430 | 1448 | 1448 |

Note: For all four specifications, a negative binomial regression framework was used. The data is at a per bout level, so N gives the number of competitors and N/2 gives the number of bouts (714) used in the regressions. "Attack points", used in specifications (1) and (2), are only the ones earned by own attacks. "Penalty points", used in (3) and (4), are the points deducted for own rule violations between 2000 and 2012 and, for 2016, the number of points the opponent earned through the competitor's own rule violations. The Tournament Dummy distinguishes along sex, sport, Olympics and weight classes. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.

Table D.4: Wrestling - points per round and points per bout

| b/se | (1) Free-style: Tech. Points | (2) Free-style: Ovr. Points | (3) Greco-Roman: Tech. Points | (4) Greco-Roman: Ovr. Points |
|---|---|---|---|---|
| red | 0.060 | -0.106 | 0.060 | -0.057 |
| | (0.07) | (0.06) | (0.08) | (0.08) |
| male | -0.318 | -0.532∗∗ | | |
| | (0.18) | (0.19) | | |
| Tournament Dummy | Yes | Yes | Yes | Yes |
| N | 1006 | 2050 | 888 | 1492 |

Note: For all four specifications, a negative binomial regression framework was used. Specifications (1) and (3) use "Tech. Points". These were used in the Olympics 2000 and 2004. In 2008 and 2012 (specifications (2) and (4)), "Ovr. Points" were used, which are the sum of the points per round. The data is given at the "individual per bout" level; therefore N gives the number of competitors, and N/2 gives the number of bouts used in the regressions. There were no women’s competitions for Greco-Roman wrestling, and no women’s competitions for the 2000 free-style wrestling competitions. Standard errors, which are clustered per match, are presented in parentheses. Stars indicate significance: ∗ p<.05, ∗∗ p<.01, ∗∗∗ p<.001.
