POLITECNICO DI MILANO School of Industrial and Information Engineering

Master of Science in Management Engineering

Predicting : Do Macro variables still matter? Insight from a Prediction Model applied to both genders

Supervisor: Prof. Emanuele Lettieri

Co-Supervisor: Prof. Francesco Braghin, Andrea Di Francesco

Master thesis of: Luca Tamagni, ID: 905705; Stefano Tettamanti, ID: 905480

Academic Year 2019/20

ABSTRACT

The Olympic Games are one of the best-known and most followed events in the world. The first Olympic Games were held in Greece in 776 BC in the city of Olympia and consisted of a single running race among the local population. Nowadays, every four years, the best athletes in the world compete in all the principal sports practiced on the five main continents.

Due to the prestige associated with this event, all participating countries are interested in achieving the highest possible medal count, guaranteeing professional preparation for their athletes and sometimes also awarding them monetary prizes in case of success.

Since the end of the 1960s, researchers have considered it of great interest to understand whether macro variables could explain the number of medals won by each country, and they found that population and GDP were the main factors contributing to Olympic triumphs.

In recent years, macro variables seem to give worse results than in the past. The objective of this thesis is to understand whether there are new macro variables that can predict the number of medals won by each nation during the Olympic Games, and to deepen the analysis by considering gender differences through three clusters: female, male and aggregated genders. Three Olympic editions (2004, 2008 and 2012) were used as the training set, and the 2016 edition as the test set. The predictions of the Tobit and multiple linear regression models were compared with each other and with the reference research of Bernard and Busse (2000). Finally, the models were evaluated through MAE and accuracy to determine the best model for all of the clusters.


ABSTRACT (translated from the Italian version)

The Olympic Games are one of the best-known and most followed events in the world. The first Olympic Games took place in Greece in 776 BC in the city of Olympia and at the time consisted of a single running race contested among the local population. Today, every four years, the best athletes in the world compete against each other in all the main disciplines practiced on the five major continents.

Because of the prestige associated with this event, all participating states are interested in winning the largest number of medals, guaranteeing professional preparation for their athletes and sometimes also giving them cash prizes in case of success.

From the end of the 1960s it was considered of great interest to understand whether macro variables could explain the number of medals won by each nation, and it was found that population and GDP were the main factors contributing to Olympic triumphs.

In recent years, macro variables seem to perform worse than in the past. The objective of this thesis is to understand whether there are new macro variables capable of predicting the number of medals won by each nation during the Olympic Games, and to deepen the analysis by treating the genders separately through three clusters: women, men and the combination of the two. The three Olympic editions of 2004, 2008 and 2012 were used as the training set, and the 2016 edition as the test set. The predictions made with Tobit and multiple linear regression were compared with the reference research of Bernard & Busse (2000). Finally, the models were evaluated through the MAE (Mean Absolute Error) and the accuracy (percentage of exact predictions) to determine the best model for all the clusters.


Executive summary

The Summer Olympic Games are one of the most followed and important events in the world and, as a consequence, they have always been a subject of research and investigation. The most interesting aspect is knowing the results of the medal table in advance; in other words, who is going to win the Olympic Games?

Attention to this theme started a long time ago: in 1964 Jokl carried out the first official research into the factors that lead to Olympic success. Interest in this prediction has grown over the years, especially in the last two decades, together with the economic interests related to the event. The turning point came with Bernard and Busse’s research in 2000, in which the authors identified the macro socio-economic factors that contribute most to determining nations’ Olympic performances in terms of medals won.

That research became the reference work and, starting from it, several authors tried to improve or deepen the analysis. Over time, however, the accuracy of the models based on the structure provided by Bernard and Busse decreased, and the Macro-level partially lost its importance. New studies started to consider the investments and policies of single nations, or small clusters of them, in order to understand the winning strategies.

Nevertheless, the loss of accuracy of Macro-level predictions may be related to the huge changes that have taken place in recent years, and it is possible that the introduction of new macro variables may restore the previous explanatory precision of the model. In addition, the performances achieved by the different genders have never been deeply investigated, and the effect of the same factors on men and women has to be taken into account to provide a more accurate prediction.

The thesis begins in chapter 1 by showing the evolution of the Olympic Games over the years, starting from their origins. A general overview is provided to stress the importance of the event and the interest that has developed around it. In particular, the history of the Olympic Games is traced by considering their evolution in several respects: the number of participating athletes, the increasing number of sponsorships, the coverage of the event and female participation in the Games.

Chapter 2 concentrates on the academic literature review. The review considers all the factors that can determine Olympic success and is split into three parts: the Micro-level, which investigates the perspective of the athlete and the coach; the Meso-level, which analyses the sport policies undertaken by single countries to achieve better results; and the Macro-level, which considers the socio-economic factors that determine the chances of success of all participating nations. At the Micro-level, it turned out that, from the athlete’s perspective, personal commitment and the environment in which the athlete lives are the key factors for success. The Meso-level, instead, showed that a well-developed economy is just the starting point for building Olympic success: the process that transforms a talented child into an Olympic champion needs to be carefully planned, and only with the right path is it possible to exploit each nation’s pool of natural talent. At the Macro-level, Population and Gross Domestic Product appear to largely predict the number of medals won during the Games; however, this correlation is fading with time.

Chapter 3 focuses on the gap in the literature and on the purpose of the research. Although many researchers have tried to predict the medal table of the Olympic Games, all of them used Population and GDP (Gross Domestic Product) as the starting point of their analysis, and it has never been investigated whether new variables could replace them and improve the results. In addition, although the female gender has been analyzed separately by a few authors, nobody has looked at the different impact that the same macro variables could have on the two genders. The purpose of this research is to build, by means of Tobit and multiple linear regression, a reliable model to predict the medal table considering the overall, male and female results. The Olympic editions of 2004, 2008 and 2012 are used as the training set, and the prediction is tested on the 2016 Olympic Games. Thus, at the end of the research it will be possible to answer the question: “Do Macro variables still predict the Olympic Games? And how do they impact the different sexes?”


Chapter 4 explains how the data collection was designed. The macro variables were chosen both by analyzing those already selected by previous authors and by introducing new ones that, under given hypotheses, could explain the model. Data for the 201 participating countries were collected from official online databases.

Chapter 5 is dedicated to the methodology and clarifies the techniques used to develop the predictive model of the Olympic medal table. Both the Tobit model and multiple linear regression were used. Previous authors demonstrated the superiority of the Tobit model for this kind of prediction; however, multiple regression has been used by several authors, and the last comparison between the two dates back to 2000. Since this thesis assumes that Bernard and Busse’s model is obsolete, it was also necessary to re-examine the choice of regression model.
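For reference, the two model families mentioned above can be written in their textbook form. This is only a sketch under stated assumptions: the symbols $y_i$ (medals of nation $i$), $\mathbf{x}_i$ (its macro variables), $\boldsymbol{\beta}$, $\varepsilon_i$ and $\sigma$ are notation introduced here, and left-censoring at zero is assumed because a nation cannot win a negative number of medals. The exact specification estimated in chapter 5 may differ.

```latex
% Multiple linear regression (assumed notation)
y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)

% Tobit model with left-censoring at zero (assumed notation)
y_i^{*} = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\qquad y_i = \max\left(0,\; y_i^{*}\right)
```

In the Tobit form the latent variable $y_i^{*}$ can be negative, while the observed outcome $y_i$ is zero for every nation whose latent value falls below the censoring point.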

Then, in chapter 6, the results obtained by both models are presented. For each cluster, the outputs of the stepwise analysis are shown together with the results of the single editions. The results are organized by model rather than by cluster, because this makes it possible to see the different influence that the variables have on the three groups. In particular, the cluster that is best explained is the female one, which reaches the highest R-squared in the multiple linear regression and the highest pseudo-R-squared in the Tobit. The male cluster, instead, worsens its performance over time even though the significance of most of the considered variables is very high, which is the opposite of the behavior of the female cluster: the women’s model is explained by only a few significant variables.

In chapter 7 the results are discussed; in particular, the contribution of each variable is analyzed. Because standardized coefficients cannot be estimated for the Tobit model, the analysis of the effects is split into three parts: the significance, which assesses that the impact is not due to chance; the average effect that the variable contributes to the model; and the magnitude of the impact, that is, the maximum surplus that each variable provides to the medal count. In other words, the magnitude expresses the difference in predicted medals between the nation with the highest value of a single variable and the nation with the lowest one. Subsequently, taking into account the prediction of the 2016 Olympic Games, the Mean Absolute Error (MAE) was calculated for the Tobit, multiple linear regression and Bernard & Busse (B&B) models; this made it possible to determine the superiority of the Tobit model with the new variables. In addition, to provide a clearer picture of the results, the MAE of the most successful countries (i.e. the nations that won at least 5 medals), referred to as MAE5, was also calculated, and this indicator also shows a consistent improvement. To make a fair comparison of the three models, the lagged medal share variable of B&B’s model, which consists of the percentage of medals won by each nation in the previous Olympic Games, was removed. The improvement varies with the cluster considered: the best results concern the female cluster, which improves both MAE and MAE5 by more than 20%, whereas the male cluster improves on both indicators but only by 10%.
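The two indicators described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the thesis: the function names, the choice to filter MAE5 on the medals actually won, and the numbers in the example are all assumptions introduced here.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between medals won and medals predicted."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mae_top_winners(actual: Sequence[float], predicted: Sequence[float],
                    min_medals: int = 5) -> float:
    """MAE restricted to nations that actually won at least `min_medals` medals.

    Assumption: the filter is applied to the medals actually won, as the text
    describes the nations "that won at least 5 medals".
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a >= min_medals]
    if not pairs:
        raise ValueError("no nation reaches the medal threshold")
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Share of the baseline error removed by the new model (0.2 means 20%)."""
    return (baseline_error - new_error) / baseline_error


# Purely illustrative numbers, not taken from the thesis data.
actual_medals = [10, 6, 0, 2]
baseline_prediction = [14, 9, 1, 0]   # hypothetical benchmark model
new_prediction = [12, 7, 0, 1]        # hypothetical new model

mae_baseline = mean_absolute_error(actual_medals, baseline_prediction)  # 2.5
mae_new = mean_absolute_error(actual_medals, new_prediction)            # 1.0
mae5_baseline = mae_top_winners(actual_medals, baseline_prediction)     # 3.5
mae5_new = mae_top_winners(actual_medals, new_prediction)               # 1.5
improvement = relative_improvement(mae_baseline, mae_new)
```

Reporting MAE5 next to MAE is useful because many nations win no medals at all, so the overall MAE can look small even when the predictions for the leading nations are far off.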

Chapter 8 summarizes the implications of the results. The main achievement of this thesis is the consistent improvement of the prediction, as is clearly visible in the table in the annex. The medals predicted by the Tobit model are in fact closer to reality than those of the older model, which took into account Population and GDP. At present, the level of Energy Consumption is a wealth indicator that is highly correlated with performance at the Olympic Games, much more so than GDP. As for gender differences, the male cluster is less affected by macro variables, contrary to what happens in the female one. The difference may be a consequence of the gender issues that prevent many women in the world from practicing physical activity or participating in global events. For this reason the level of competition is reduced and macro variables still play the biggest role; with time, however, a loss of performance is also expected in the female cluster as a consequence of the ever higher level of competition.

Finally, in chapter 9, some limitations of this dissertation are identified, and future research directions are proposed.


Table of Contents

1 Contextual background

1.1 Introduction to games

1.2 Access to the Olympic Games by women

1.3 Creation of the Paralympic Games

1.4 The Olympic Games today

1.5 Boycotts of the games

1.6 Olympic symbols

1.7 Other objectives of the Olympics

1.7.1 Political

1.7.2 Economical

1.7.3 Ethical-moral

1.8 Sponsorships

1.9 Broadcasting

1.9.1 Broadcasting in different countries

1.10 Licensing and Merchandising

2 State of the art

2.1 Literature methodology

2.2 The Olympic performance explanation

2.2.1 Micro-level

2.2.2 Meso-level

2.2.3 Macro-level

2.2.3.1 Population and GDP

2.2.3.2 Ex-Communist regime

2.2.3.3 Host effect

2.2.3.4 Previous results and team size

2.2.3.5 Natural environment

2.2.3.6 Gender Impact

2.2.3.7 Other variables

2.3 Data measures

2.3.1 Success measurement

2.4 Loss of performance of Macro variables

3 Gap in literature review

4 Materials

5 Methods

5.1 Stepwise

5.2 Tobit

5.3 Multiple Linear regression

5.4 Prediction Models

5.5 Mean Absolute Error

5.6 Receiver Operating Characteristics

6 Results

6.1 Multiple Linear Regression

6.1.1 Aggregated genders

6.1.2 Female gender

6.1.3 Male gender

6.2 Tobit Regression

6.2.1 Aggregated genders

6.2.2 Female gender

6.2.3 Male gender

7 Discussion

7.1 Multiple Linear Regression

7.1.1 Aggregated genders

7.1.2 Female gender

7.1.3 Male gender

7.2 Tobit regression

7.2.1 Aggregated gender

7.2.2 Female gender

7.2.3 Male gender

7.3 Benchmark of performances

7.3.1 Aggregated genders

7.3.2 Female gender

7.3.3 Male gender

8 Conclusions

9 Limitations and future research directions

10 References

11 List of Figures

12 List of Tables

13 Annex

13.1 Aggregated genders

13.2 Female gender

13.3 Male gender

14 Acknowledgements


1 Contextual background

1.1 Introduction to games

The first edition of the modern Olympic Games was held in Athens in 1896. The idea of organizing an international sports event based on the Olympic Games of ancient Greece came from Baron Pierre de Coubertin. The baron was a French historian and pedagogue and, while trying to find an explanation for the French defeat in the Franco-Prussian War, he came to the conclusion that the French had lost because they had not received adequate physical education. Besides, de Coubertin wanted to find a way to bring the countries of the world closer together and to allow the nations to face each other in a sporting event and not in a war.

In 1892, the year of the fifth anniversary of the foundation of the Union des sociétés françaises de sports athlétiques, de Coubertin gave a speech at the Sorbonne in Paris before intellectuals and illustrious French figures of the time. He explained his desire to attach greater importance to sport in schools and, above all, concluded the speech with an appeal to revive the ancient Olympic competition. Two years later, again at the Sorbonne, it was announced that the first edition of the modern Olympic Games would be held in Athens in 1896. This meeting became known as the “first Olympic Congress”.

The congress had 2000 participants from 78 delegations. Among the illustrious participants were the King of Belgium Leopold II, the Prince of Wales Edward and the Greek Crown Prince Constantine. De Coubertin proposed that Athens should be the first city to host the modern Olympic Games, probably as a sign of recognition for the homeland of the ancient Olympics. The proposal was approved unanimously. The International Olympic Committee (IOC) was also founded during the first Olympic Congress. The IOC had the function of organizing the sporting event and promoting both the event and the Olympic message.

The first Olympic Games were attended by 285 athletes from 14 nations who competed in 9 different disciplines divided into 43 events. The country with the largest number of representatives was Greece with 169 athletes, but despite this the United States topped the medal table. No women took part in the competitions, because de Coubertin was influenced by the culture of the Victorian age, in which women were considered inferior.

“The participation of women would be impractical, uninteresting, incorrect and anti-aesthetic.” Pierre de Coubertin

There was no Olympic village: the athletes had to pay for their own food and lodging as well as their travel expenses, and for this reason many nations declined to participate. Six photographers and 35 journalists were invited to document the achievements of the athletes. Since the first edition of the Olympic Games in Athens in 1896, only five sports have appeared in all editions: athletics, cycling, fencing, artistic gymnastics and swimming.

1.2 Access to the Olympic Games by women

Over the years the regulations of the Olympics evolved, and in 1900 women finally had the opportunity to participate in the Games, although unofficially, and could compete in only four sports: tennis, sailing, croquet and golf. The real turning point came with the 1912 Stockholm Olympics, when women were allowed to take part in swimming competitions. Since then, the number of competitions open to women has grown steadily. Female participation has gone from 1.7% of the athletes present at the 1908 London Olympics to over 45% of the athletes at the last Summer Olympics, held in Rio de Janeiro in 2016.

1.3 Creation of the Paralympic Games

The Paralympic Games are the equivalent of the Olympic Games for athletes with disabilities. The first Games recognized as Paralympic Games were held in Rome in 1960, although they were initially referred to as the “X International Paraplegic Games” or the “ninth international edition of the Games of Stoke Mandeville”. It was not until 1984 that the International Olympic Committee approved the name Paralympic Games.


The initial idea came from the German neurosurgeon Ludwig Guttmann, who in 1948 organized the first competition for British war veterans with spinal injuries. In 1952 the field of athletes expanded because the Dutch also joined the event. The competition was named after Stoke Mandeville (the place that hosted the games) and was held annually. In 1958 the Italian doctor Antonio Maglio (director of the National Institute for Insurance against Accidents at Work) proposed to Guttmann that the 1960 edition be held in Rome, since in the same year the city would host the seventeenth edition of the Olympic Games. Twenty-four years later, the ninth edition of the Stoke Mandeville Games was recognized as the first edition of the Paralympic Games.

1.4 The Olympic Games today

Today the numbers of the Olympics are very different from those of the first edition in Athens in 1896. The number of athletes participating in the last Olympics in Rio was 11,184 (men and women combined) from 207 countries. The number of disciplines in which the athletes competed rose to 34, with 306 events. However, the figures that best show how the Games have changed are others, such as the number of accredited media, volunteers and tickets sold. The number of accredited media in Rio was 25,000, which is impressive compared with the 41 in Athens in 1896. There were 36,000 volunteers at the last Summer Olympics, and 6.2 million tickets were sold for a total of $321 million in revenue.

1.5 Boycotts of the games

Over the course of Olympic history there have been at least three notable boycotts. The first mass boycott took place in Montreal in 1976, when the Olympics were boycotted by 27 African countries, one Asian and one American. The gesture was a protest against New Zealand and in particular its rugby team, which according to some sources had toured apartheid South Africa despite the ongoing sports boycott and had played there against teams composed exclusively of white players. The IOC justified its position by saying that rugby was no longer an Olympic sport; it therefore preferred not to intervene and to let the problem be solved outside the organization of the Games. On the opening day of the Games, the 29 states boycotted the event.


The second boycott took place in Moscow in 1980, when large countries such as the United States and China stayed away from the Games along with 63 other nations. Many European countries (France, Italy, Belgium, Great Britain) participated without their flag and, in case of victory, without their national anthem, because they competed under the banner of the IOC. The reason for the boycott was the Soviet invasion of Afghanistan, with which the USSR intended to safeguard its Asian republics against the possible expansion of the Islamic revolution that had begun in Iran.

The third notable boycott occurred at the next edition of the Olympics, in Los Angeles in 1984. In total, 14 countries boycotted this edition of the Games. Most of them were part of the Soviet bloc and stayed away in retaliation for the boycott of the Moscow Olympics.

Figure 1.1 Number of participating countries (x-axis: Olympic Games editions from 1896 to 2016; y-axis: number of countries, from 0 to 250)

This graph shows how the number of countries participating in the Games has changed over time, including in the boycott editions.


1.6 Olympic symbols

Among the most famous symbols of the Olympic Games, four deserve to be highlighted: the flag, the motto, the flame and the opening and closing ceremonies. The flag consists of five interlocking rings on a white background. According to the official declaration of the IOC in 1914, the five rings identify the continents and are intertwined to represent the meeting of athletes from all over the world. Pierre de Coubertin chose these colors (including the background) because at the time they were the ones used in the flags of the world, so that the nations would feel united under a single flag during the Games.

The Olympic motto, “Citius! Altius! Fortius!”, consists of Latin words that mean “Faster! Higher! Stronger!”. It has been the Olympic motto since the founding of the IOC in 1894. These words are an exhortation to athletes to always strive to overcome their competitive limits. As is often the case with mottos, it has become a philosophy of life. The phrase “The most important thing in the Olympic Games is not to win but to take part, just as the most important thing in life is not the triumph but the struggle. The essential thing is not to have conquered but to have fought well.” is often mistaken for the Olympic motto and is attributed directly to Baron de Coubertin. This phrase emphasizes the spirit inherent in Olympic competition.

The Olympic flame comes from a tradition of the Olympics of ancient Greece, where the fire was kept burning for the entire period of the Olympic celebration. In the modern Olympics, the Olympic fire was reintroduced in 1928 and has since been part of the opening and closing ceremonies of the Games. The Olympic torch was introduced later, in 1936 (at the Olympics hosted by Germany in the National Socialist period). The torch has the function of bringing the Olympic fire from Olympia to the host country of the Games through a relay of torchbearers. Traditionally, torchbearers carry the torch on foot, but other means of transport may be used if necessary. The relay culminates on the opening day of the Games, when the last torchbearer uses the torch to light the Olympic flame.

The opening and closing ceremonies of the Olympic Games have developed over time into a well-established format. The opening ceremony begins with a show organized by the host country, consisting of songs, dances and choreography inspired by the history of the country. Then the parade of nations begins, carried out in alphabetical order with only two exceptions: Greece enters first and the host country parades last. Each delegation is headed by a flag bearer carrying the flag of its nation. At the end of the parade there are the speeches of the chairman of the organizing committee and of the president of the IOC, and then the head of state of the host country formally opens the Olympic event. The Olympic anthem is then played and the flag with the five rings is raised. Before the Olympic flame is lit, the Olympic oath is pronounced, in which a representative of the athletes and one of the judges commit to respecting the rules that govern the Olympic Games.

During the closing ceremony the athletes’ parade is no longer divided by nation but mixed. Then the flags of the host country, of Greece and of the country that will host the next Olympic edition are hoisted. After the closing speeches of the president of the organizing committee and of the president of the IOC, the Olympic flag is lowered and the Olympic Games are declared closed. It is also customary to organize a show with references to the next host country of the Games, and since 2004 the winner of the marathon has been awarded during the closing ceremony. Finally, the Olympic flame is extinguished.

Over time a new Olympic symbol has been added: the medal table. The Olympic medal table shows all the medals that the participating nations have won during the competition. It is a symbol because, ultimately, all countries want to reach the highest possible position in order to obtain prominence and prestige.

1.7 Other objectives of the Olympics

At the moment of their rebirth the Olympics had a purely social objective: to bring the nations together in a competition in which all the countries of the world could participate. Over the years, however, some nations have used the Olympic Games for other purposes: political and economic ones, and to circumvent the ethical-moral principles that govern the Games.


1.7.1 Political

As for political ends, it is necessary to highlight the behavior of Germany at the Olympics hosted in Berlin in 1936. The sporting event faded into the background and the Games became a propaganda tool to show the dominance and power of Germany at that time. Hitler had defined sport as physical and mental preparation for battles and wars, and thus as a means of patriotic education (“A day should not pass without the child receiving at least one hour in the morning and one hour in the evening of physical education, any kind of sport and gymnastics; In the army it is better to judge how much security in the strength of the body develops courage and awakens the impulse of assault.”).

Specifically, the Germans topped the medal table of the 1936 edition thanks to so-called “state amateurism” (the athletes had in fact been financed by the state and had to go through a gruelling preparatory period in the Black Forest), and also because sports in which the Germans excelled were introduced in that edition. The final outcome of the Olympics was considered by Hitler to be another demonstration of the superiority of the Aryan race and a further opportunity to highlight the concept of the “super-human” expressed by Nietzsche and then instrumentalized by the Führer.

Another example of how the Games became a potential tool of political propaganda occurred in 1972 during the Munich Olympics. In this case, a group of Palestinian terrorists took the opportunity to carry out an attack against the Israeli delegation, causing the death of 18 people.

The most recent example of Olympic Games used for political purposes is the 2008 edition in Beijing. In this edition two different agendas were pursued, one by the host nation and the other by the participating countries. China, after intense propaganda carried out through newspapers, radio and television, wanted to give an image of efficiency and pragmatism and show the world its “new face”: that of an open, friendly country that is victorious in sporting and economic matters. Many members of the participating countries, on the other hand, wanted to bring to light what they saw as the true face of China: that of a country with a dictatorial government that suppresses both civil and religious rights and therefore runs counter to the values that inspire the Olympics.


1.7.2 Economical

With each edition the resonance of the Games grew, which led the Olympics to sign more and more sponsorship contracts in order to support the development of the Games. In particular, the Olympic Games held in Atlanta in 1996 need to be highlighted. They took place in this American city because one of the historic sponsors of the Olympic event, Coca-Cola, was based in Atlanta. Today companies fight real battles to become partners of the Olympics, thanks to the enormous resonance of the event and the resulting advantage in terms of visibility, given that the Olympics are the most followed event in the world. The topic of sponsors related to the Olympic Games is developed further later in this chapter.

1.7.3 Ethical-moral

When it comes to the Olympics, the beauty and fairness of the competition immediately come to mind, but in the course of the 1970s the philosophy behind them shifted from de Coubertin’s “The important thing is not to win, but to participate” to that of winning at any cost. The countries with a communist regime began to practice so-called “state doping”, in particular Russia (at the time still called the USSR). Thanks to this practice they achieved incredible results, especially in women’s events and in particular in athletics, where even today there are still world records dating back to that period.

On the subject of doping, the Seoul Olympics held in 1988 have to be mentioned: after this edition the IOC began a tough battle against doping, which continues to this day and is far from being won. As for why athletes began to use doping substances, in those years the number of sponsorship contracts began to grow, and winning an Olympic gold almost certainly meant earnings in the millions. The most emblematic case was the positive test of the Canadian athlete Ben Johnson, who was disqualified for doping two days after his victory in the 100 metres.

1.8 Sponsorships

Since the first edition of 1896 the Olympic Games have always been able to count on commercial partners for their organization and success. The role of the sponsors, however, has expanded over time: from supporting the success of the Olympics, it has grown to include helping athletes from different countries take part in the largest sporting event on earth. This is the case of Visa, whose program has, since its start in 2004, supported Olympic and Paralympic athletes, regardless of origin and social background, so that they could reach their full potential.

In return for their support, sponsors benefit from global exposure, showing their brand to billions of people from different nations and social backgrounds thanks to the unique marketing opportunities that the Olympics provide. Consider Samsung, a partner of the Games since the Winter Olympics in Nagano: at the most recent opening ceremonies (both summer and winter), athletes entered the stadiums holding the Korean brand’s smartphones to film the crowd. In addition, sponsors can associate their brands with the five Olympic rings, one of the most recognized brands in the world. In the words of IOC President Thomas Bach: “Our relationship with the Worldwide Olympic Partners is more than a commercial relationship, it is a partnership.” In terms of figures, sponsorship revenues grew by 7.6% from the four years preceding the London Olympics to the four years leading up to the Rio Olympics.

It should be noted, however, that sponsorship is not entirely straightforward, because the Olympic Committee imposes strict restrictions on advertising in the official competition venues. Companies therefore concentrate their efforts outside the Olympic stadiums, customizing spaces to attract the public’s interest. For example, the watch manufacturer Omega converted a cultural center overlooking the beach into a celebrity bar and watch museum, and in a luxury hotel along Copacabana Beach Visa showcased a number of wearable payment devices, such as rings and cuffs.

The Games host not only the official sponsors but also those of the federations. The most striking case is Heineken, which sponsors some federations rather than the Olympics directly. Heineken began planning its presence in Rio three years in advance, sending observers to visit several sites before choosing a location on the city’s lagoon, where the rowing competitions were held. The structure had 300 paid employees in addition to volunteers, two floors, several dining areas, a swimming pool and a nightclub. The Dutch brand was everywhere: there was no shortage of beer for visitors and, when the Olympic Committee of the Netherlands celebrated the country’s medal winners, the athletes made their triumphant entry through a Heineken-style star on the wall.

1.9 Broadcasting

The Olympic Games have become the largest event of global interest; this section presents some figures that help convey the magnitude of the event.

“With half of the world’s population watching the Games, Rio 2016 were the most consumed Olympic Games ever. These figures show the great appeal and the relevance of the Olympic Games.”

These were the words of IOC President Thomas Bach at the end of the most recent Summer Olympics. The Rio Olympics were the most viewed thanks to the expansion of broadcast services, not only on television but also on digital platforms, and to the interest generated through social networks. The average viewer watched over 20% more Olympic content than during London 2012. Television coverage increased by 13.5% compared with the previous Olympics, while total coverage (television plus streaming) went from 181,523 hours to about 357,000 hours.

[Bar chart of total coverage hours by edition (vertical axis: Hours; horizontal axis: Hosting country): Athens 2004, 44,000; Beijing 2008, 71,719; London 2012, 181,523; Rio 2016, 356,924]

Figure 1.2 The growth in the Olympic Games coverage

“The Record-Breaking digital coverage of Rio shows that watching the Olympic Games no longer means simply turning on the tv, with more and more fans choosing to stream content on their connected devices wherever and whenever they want.” Timo Lumme, Managing Director, IOC Television and Marketing Services.

The most interesting figure is the growth in digital coverage of the event, which increased by 198.6% compared to London. Coverage on digital platforms reached 243,000 hours, almost three times the figure recorded in London, and led to a 10% increase in audience.
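The percentages quoted in this section can be cross-checked with simple arithmetic. The short sketch below uses only figures reported in the text and in Figure 1.2; the function name `pct_change` is an illustrative choice, and the London digital figure is back-calculated from the stated growth rate rather than taken from a source.

```python
def pct_change(old, new):
    """Percentage change from an old value to a new value."""
    return (new - old) / old * 100.0

# Total coverage hours (television plus streaming) for London 2012 and Rio 2016.
london_total, rio_total = 181_523, 356_924
total_growth = pct_change(london_total, rio_total)

# Rio digital hours and the stated 198.6% increase imply the London digital hours.
rio_digital = 243_000
london_digital = rio_digital / (1 + 198.6 / 100)

print(f"Total coverage growth: {total_growth:.1f}%")
print(f"Implied London digital hours: {london_digital:.0f}")
```

Under these figures total coverage roughly doubled, far more than the 13.5% growth of television alone, which indicates that most of the increase came from digital platforms.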

1.9.1 Broadcasting in different countries

This section describes how the Games were followed in different nations, starting from the host country, Brazil. Unsurprisingly, viewing figures grew enormously compared with London, since popular interest was much stronger for the home Olympics. The event most followed by Brazilians was the final of the football tournament, in which Brazil triumphed against Germany, seen by 47.1 million people. More generally, Brazilians watched the Rio Olympics far more than the London ones: views of the event increased by 117% compared to London (with an audience of 86 million). Looking at the other countries of the world, the largest audiences are linked to events with fellow countrymen in competition. For example, the most viewed event in the United States was Michael Phelps’s gold medal in the 200-meter butterfly, which recorded an audience of 42.55 million. In addition, digital streaming, with almost 50 million users, was used more in the USA than in any other country.

In Europe the highest average number of spectators was recorded in the United Kingdom, with an audience of 45.24 million people; the most followed event was the victory of Jason Kenny in the keirin (a track cycling discipline). In Oceania the nation with the largest number of spectators was Australia, where the hours of free-to-air broadcasting increased from about 600 to 903. The most followed event was the opening ceremony, with 2.263 million spectators. In Asia, Japan, the future host nation, showed great interest in the Games. The Japanese were particularly interested in women’s competitions: the three most viewed events all featured women, and the most viewed of all was the women’s marathon. The remarkable growth in viewing in Africa, in particular sub-Saharan Africa, should also be underlined: the Olympics were watched by over 300 million people, an increase of 75% compared to the previous London Olympics. To sum up, the broadcasting section can be concluded with a statement made in 2016 by the chairman of the NBC Sports Group:

“There is no event that aggregates audiences on a such massive scale for so long and across as many platforms as the Olympics.”

1.10 Licensing and Merchandising

This brief section is devoted to the sale of Olympic-related products. As interest in the event grew, the IOC took the opportunity to sell products bearing the Olympic logo. At the last Olympic Games, more than 40,000 stores throughout Brazil were authorized to sell Olympic merchandise, including 132 official Rio 2016 stores in Rio itself. More than 5,000 products carried the Olympic brand, and their sale generated a profit of 300 million dollars. The most popular product was a line of flip-flops made by a well-known Brazilian brand, with 2.5 million pairs sold. The most impressive megastores were located in the Barra Olympic Park and on Copacabana beach; the former had an interior space of 4,200 square meters. In total, the megastores received 3.5 million visitors.

STATE OF ART

2 State of art

2.1 Literature methodology

The first part of the work consisted of searching for academic articles related to the Olympic Games. The main tools were websites dedicated to researchers and scholars, where authors can upload and share their work: Scopus, ResearchGate and Google Scholar. Many papers were found in this phase because of the wide range of analyses that can be carried out on the Olympic Games; those on social aspects, economic aspects, and the consequences and impact of the event were the most numerous. The keyword used to find the articles was “Olympic Games”.

The second part of the research concerned the performance of athletes during major sport events, and in particular the factors that can affect the final result. The search was not restricted to the Olympic Games but considered all comparable events in which a large part of the global population is involved: FIFA World Cup, UEFA Champions League, Super Bowl, NBA Finals and many others. As a result many papers were found, although the aspects investigated were very similar across different sports. The authors concluded that an athlete’s performance can be affected by several factors, psychological or physical, each of which is important. In this phase the keywords were “Sport performance”, “Athlete Performance”, “Sport event performance” and the names of the sport events.

In the final phase the focus was on performance and results at the Olympic Games. The authors fall into two categories: those who wanted to understand the Games as a whole, and those who concentrated on single events. Some of the articles also provided a prediction of the results, using macro variables to investigate the reasons for the success of different countries. The number of papers on this topic was smaller than in the previous phases, but more than 30 articles were still found, since some of them date back to the 1970s. The keywords used were “Olympic Games performance”, “Olympic Games results”, “Olympic Games prediction” and “Olympic Games success”.

2.2 The Olympic performance explanation

Seppänen (1981) defines performances in top-level sports as a combination of genetic qualities and the environmental and physical circumstances in which people live. It is very difficult to detect all the factors that affect an athlete’s performance because of their different nature.

Shibli et al. (2006) tried to classify them into three levels (Macro, Meso and Micro) in order to explain the phenomenon. In particular, the question they wanted to answer was how similar countries could perform so differently at the Olympic Games.

Figure 2.1 Relationship between factors determining individual and national success

2.2.1 Micro-level

The deepest level of analysis concerns the athlete and his or her coach, a key figure in the development of an elite athlete. Some researchers tried to determine which factors influence an athlete’s individual success by investigating what athletes themselves consider important during their development (Conzelmann & Nagel, 2003; Duffy, Lyons, Moran et al., 2001; Gibbons, McConnel, Forster et al., 2003; Greenleaf, Gould & Diefen, 2001; Nys, De Knop & De Bosscher, 2002; Unierzyski, Wielinski & Zhanel, 2003; Van Bottenburg, 2000).

These studies are nationally oriented because of the difficulty of collecting data: they are surveys filled out by athletes, who indicate what they consider the key factors for reaching success. De Knop et al. (2004) found that in Flanders 97% of athletes cite personal motivation and persistence as one of the keys to success, followed by a good personal environment (i.e. parents, family, friends; 83.6%) and the expertise and quality of coaches (61.4%). Other factors, such as financial support and facilities, are considered less important by respondents.

Gibbons et al. (2003) focused on US champions and found that financial support is considered important by only 11.5% of athletes and that, as for Flanders’s elite athletes, dedication is the most important aspect, cited by 58.1%.

Different values emerge for the Irish athletes studied by Duffy et al. (2001), in terms of percentages but not of ranking: the first three factors are still personal factors (37.2%), social support (36.2%) and coaching (31.4%). The most interesting aspect is that, from the athletes’ perspective, natural talent is not considered a key success factor, contrary to common belief.

The limitation of these papers lies in the comparability of the countries analyzed: all of them belong to Western culture and have a high level of wealth, and the same survey carried out in African or South American countries would probably give different results. This would confirm that macro aspects exist that affect the environment in which athletes live, changing their approach to sport, competition and personal success; investigating them is the purpose of this thesis. In addition, the authors do not distinguish athletes by gender but, as will be analyzed later, considerable differences exist depending on the country in which they live: Muslim countries, for example, tend to discourage women from practicing sport because of culture and religion. For these reasons the Micro-level can provide only a small piece of the whole story.

2.2.2 Meso-level

The Meso-level comprises all the factors affected by sport policies and politics. Different countries use different strategies to become more competitive; according to Shibli et al. (2006), elite athletes have a greater chance of success depending on the effectiveness of policy and the investment made in elite sport.

Studies on the Meso-level exist but are fewer than those on the Macro-level, because of the complexity of data gathering and the non-scalability of the model: it is impossible to collect this kind of data for all 203 participating countries. The largest study ever done, SPLISS (Sport Policy Factors Leading to International Sporting Success, 2006), compared only 8 countries and identified 9 pillars, the factors that should affect countries’ performance at the Olympic Games. These pillars are the result of all the studies carried out on the Meso-level in previous years:

- Financial support: The financial aspect is the base of the whole pyramid; without this crucial pillar the rest of the theory does not hold. It takes different forms: support for the athletes themselves in terms of funding and sponsorship, as highlighted by Gibbons, McConnel, Forster et al. (2003); support for the NGBs (National Governing Bodies), that is, the budget allocated by the country to sport and elite sport development (De Bosscher & De Knop, 2004); and financial support for training centers and personnel (Clumpner, 1994).

- Integrated approach to policy development: Oakley and Green (2001) found that a crucial factor in improving performance is the ability to recognize talent, which has costs in terms of dedicated personnel and structures, as already highlighted by Larose & Haggerty (1996). In addition, according to Clumpner (1994), obtaining better performance requires focusing funding on the sports most likely to bring medals and keeping administration simple, with common sporting and political boundaries (Wells, 1991). Finally, the NGB should focus on long-term results and cooperate with regional departments and clubs (De Bosscher & De Knop, 2004).

- Sport participation: The more people practice sport in a country, the higher the chance of having a potential champion. For this reason Riordan (1989) showed the importance of access to sport for the whole population and of recognizing physical education and sport in constitutional law. This view was supported by De Bosscher & De Knop (2002), who added another important aspect, the absence of specialization at an early age: they stated that specializing too early can waste a talent. Practicing a variety of sports during childhood and early adulthood makes it easier to understand the strengths and weaknesses of a potential athlete and to identify the sport with the highest chance of victory.

- Talent identification and development system: Since most Olympic athletes are aged between 18 and 30, timing in detecting talent is crucial. Sedlacek, Matousek, Holcek et al. (1994) addressed this topic, showing that talent identification through schools (typical of former communist nations), together with statistical identification and monitoring of the progress of talented athletes, plays an important role in improving the chances of winning medals. Douyin (1988) focused on another aspect, the importance of training the identified talent optimally: a high frequency of training starting from school (typical of former communist countries), the possibility of training in high-level facilities and, to prepare the athlete for competition, the organization of races during talent development.

- Athletic and post career support: One of the difficulties of being a potential Olympic athlete is economic: the career ends at an early age and the income earned during the active years depends mainly on the results achieved. Moreover, because of the time spent on training, most athletes cannot graduate, and for these reasons many potential talents are lost. Green and Houlihan (2005) highlight as an important success factor the emergence of the “full-time” athlete, with higher earnings and rewards but also preparation for life after sport.

- Training facilities: Many authors agree on the importance of high-level training facilities for developing a complete talent (Green and Houlihan, 2005; Oakley and Green, 2001; De Bosscher and De Knop, 2004; Clumpner, 1994; Wells, 1991). In particular, it is important to have a large number of sporting facilities spread throughout the country, as well as sports centers with equipment for elite sport and a national training center. In addition, these facilities must be highly accessible to athletes.

- Coaching provision and coach development: Coaching, as athletes themselves indicated at the Micro-level, boosts the preparation of talents. Semotiuk (1990) showed that professional, full-time coaches were decisive and that it is important to have a large number of highly qualified coaches. Gibbons, McConnel, Forster et al. (2003) added the importance of coach education and of the presence of qualified coaches at all ages, not only in adulthood but from the very beginning of a potential athlete’s life.

- National and international competitions: Another pillar is the habit of competing at a high level, so as not to suffer the pressure of the big event. Johnson and Ali (2002) point to the importance of hosting international events within the country, Green and Houlihan (2005) stress athletes’ participation in international competitions, while Green and Oakley (2001) focus on the organization of professional national competitions.

- Scientific research and sport medicine support: De Bosscher and De Knop (2004) identify the use of scientific methods to seek talent, the scientific organization of training programs, applied research geared to specific sports, the development of techniques in particular sports and the improvement of sporting equipment and facilities as key factors in maximizing a nation’s potential output of talent.

2.2.3 Macro-level

The Macro-level focuses on objective aspects, that is, the conditions in which people live, such as economic welfare, population, geographic and climatic variation, degree of urbanization, political system and cultural system. The advantage of this kind of analysis is that data are easy to find, and for this reason many studies have been carried out over the years.

Figure 2.2 The nine pillars which determine sport success

The necessary assumption for a macro-level analysis of the Olympic Games is that talent is equally distributed throughout the world, so that every nation has equal opportunities to produce competitive elite athletes (Grimes, Kelly & Rubin, 1974; Levine, 1974; Kiviaho & Mäkelä, 1978; Morton, 2002). Several authors nevertheless highlight the impact of two independent macroeconomic variables, the Gross National Product of a nation and its population (Bernard & Busse, 2000; De Bosscher, De Knop & Heyndels, 2003 a & b; Jokl, 1964; Johnson & Ali, 2002; Kiviaho & Mäkelä, 1978; Levine, 1974; Morton, 2002; Novikov & Maximenko, 1972; Suen, 1992; Van Bottenburg, 2000; Stamm & Lamprecht, 2002; Hoffman et al., 2002; Estellita Lins et al., 2002; Kuper et al., 2003; Bian, 2005; Andreff & Andreff, 2008; Forrest, Sanz & Tena, 2010; Vagenas et al., 2012; Leeds & Leeds, 2012; Trivedi & Zimmer, 2014; Lowen, Deaner & Schmitt, 2014; Noland & Stahler, 2015; Bredtmann, Crede & Otten, 2016; Blais-Morisset et al., 2017; Vagenas et al., 2019; Andreff & Andreff, 2020). Over the years, however, several other variables have been considered and analyzed; the results obtained are discussed below.

2.2.3.1 Population and GDP

Almost all the academic papers that carried out macro-level analyses considered the population and GDP of a country as the starting point for prediction. Figure 2.3 reports the percentage of authors who chose to include these variables in their models.

[Bar chart with two bars, Population and GDP; vertical axis from 0% to 100%; data labels 91.67% and 80.56%]

Figure 2.3 Percentage of authors who decided to include Population and GDP in their models

According to Bernard and Busse (2000), the larger the population of a country, the higher the chance of discovering a talented athlete. The reasoning of the authors who use the population variable is that, if talent is equally distributed, a larger population is more likely to contain a greater number of talented athletes. However, having potential champions is not enough: the critical issue raised by De Bosscher (2007) and confirmed by Shibli (2012) is talent detection. If a nation can produce a large number of potential athletes but lacks the resources to find them and provide the necessary training structures, a large part of that talent is lost. Clear examples are countries like China and India, which underperform relative to their population and their potential to produce talented athletes.

In addition, the number of athletes a nation can send to the Olympic Games is not unlimited: it is determined by the IOC (International Olympic Committee), which negotiates with the countries. In other words, if the ten strongest table tennis players are Chinese but the IOC denies participation to seven of them, China’s chances of winning all the available medals are reduced.

One final comment is that in medal counts a team event counts as one medal, even though the country has to provide a greater number of talented athletes. To sum up, even if a country is able to send athletes in proportion to its size, it may still win a smaller share of medals than its size would predict (Bernard and Busse, 2000).

The authors claimed that population alone was not enough to explain the medal table, and most of them (De Bosscher, De Knop & Heyndels, 2003 a & b; Jokl, 1964; Johnson & Ali, 2002; Kiviaho & Mäkelä, 1978; Levine, 1974; Morton, 2002; Novikov & Maximenko, 1972; Suen, 1992; Van Bottenburg, 2000; Bian, 2005; Andreff, Andreff & Popaux, 2008; Forrest, Sanz & Tena, 2010; Vagenas et al., 2012; Leeds & Leeds, 2012; Trivedi & Zimmer, 2014; Lowen, Deaner & Schmitt, 2014; Noland & Stahler, 2015; Bredtmann, Crede & Otten, 2016; Blais-Morisset et al., 2017; Vagenas et al., 2019; Andreff & Andreff, 2020) identified another crucial aspect, the wealth of a country. In particular, the macro variable chosen as a good indicator of wealth was GDP (Gross Domestic Product), expressed in total or per capita.

By introducing this variable in their research, Bernard and Busse (2000) observed a dramatic improvement in the results: they correctly predicted whether a country would win at least one medal in 72% of cases. Together with population, this variable completes the model, because it explains the differences among countries with comparable populations.

A broad consensus exists on the influence of these two factors, since they tell a large part of the story, but not the whole story. Other important variables have been analyzed, but population and wealth explained the medal table well up to the 2000s. Bian (2005) argues that the reason behind the correlation between GDP and medals won is that high-income countries specialize less; in other words, they win medals in a more diversified range of sports. Smaller countries, by contrast, mostly perform well in a small cluster of disciplines because of tradition or genetics, such as long-distance running for Ethiopians and Kenyans or rugby for New Zealand. In recent decades, however, the accuracy of predictions based on population and GDP has decreased; one purpose of this thesis is to understand the reasons for this and provide an explanation.
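As a minimal sketch of how population and GDP can enter a medal model, the function below evaluates the log-likelihood of a Tobit regression censored at zero, the model family that Bernard and Busse (2000) applied to Olympic results. The regressor layout (intercept, log population, log GDP per capita), the function names and the toy numbers are assumptions made for illustration; they are not taken from the reviewed papers.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def features(population, gdp_per_capita):
    """Assumed regressors: intercept, log population, log GDP per capita."""
    return [1.0, math.log(population), math.log(gdp_per_capita)]

def tobit_loglik(beta, sigma, rows, medal_shares):
    """Log-likelihood of a Tobit model left-censored at zero.

    Countries with a positive medal share contribute a normal density term;
    countries with no medals contribute the probability that the latent
    (unobserved) share is at or below zero.
    """
    total = 0.0
    for x, y in zip(rows, medal_shares):
        mu = sum(b * v for b, v in zip(beta, x))
        if y > 0:
            total += math.log(normal_pdf((y - mu) / sigma) / sigma)
        else:
            total += math.log(normal_cdf(-mu / sigma))
    return total

# Toy data: hypothetical values, not taken from any dataset.
rows = [features(60e6, 30_000), features(5e6, 2_000)]
shares = [0.03, 0.0]
print(tobit_loglik([-0.5, 0.02, 0.02], 0.05, rows, shares))
```

In practice the coefficients and sigma would be chosen by maximizing this function with a numerical optimizer. The censoring term is what distinguishes the Tobit model from a multiple linear regression, which would treat the many zero-medal countries as ordinary observations.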

2.2.3.2 Ex-Communist regime

As already seen at the Meso-level, communist countries wanted to excel at the Olympic Games, and for this reason the preparation of potential athletes started in schools. The motivation was strictly political: they wanted to show the world the superiority of their political system over the capitalist one. This attitude nevertheless had consequences in terms of performance and improved the results of the nations involved.

The first author to introduce this variable was Levine (1974), who wanted to highlight the different performance of communist and capitalist countries. He partially succeeded, since the communist variable turned out to be highly significant (unlike the capitalist one), and the large majority of subsequent studies introduced the “communist factor” in their analysis.

Later, Bernard and Busse (2000) tried to deepen Levine’s analysis by further dividing the (ex-)communist nations. In their opinion the Soviet Union played a different role from the other communist regimes (like Cuba or China). They therefore split the old “Communist” variable into “Ex-Soviet”, for countries that had been under the Soviet regime, and “Planned”, for nations controlled by a centralized government. Both variables were significant, but their effects were very similar, so most authors keep using the old communist variable, while some, like Forrest et al. (2010) and Andreff et al. (2020), preferred Bernard and Busse’s division.

The interesting aspect of this variable is its evolution over the years: when Levine introduced it in 1974 it was significant at the 1% level, whereas in the most recent research by Andreff in 2020 it was still significant but only at the 10% level. The reason is that the “communist effect” is fading over time, an aspect that our analysis will highlight.
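The two ways of coding the communist variable discussed in this section can be sketched as dummy (0/1) variables. The country sets below are small illustrative examples, not the classification used by any of the cited authors, and the function name `communist_dummies` is hypothetical.

```python
# Illustrative membership sets only; a real study would use a full classification.
EX_SOVIET = {"Russia", "Ukraine", "Kazakhstan"}
PLANNED = {"China", "Cuba"}

def communist_dummies(country, split=True):
    """Return dummy variables for a country.

    split=True  -> separate "ex_soviet" and "planned" dummies
                   (the Bernard and Busse style division).
    split=False -> a single "communist" dummy (the older Levine style coding).
    """
    soviet = int(country in EX_SOVIET)
    planned = int(country in PLANNED)
    if split:
        return {"ex_soviet": soviet, "planned": planned}
    return {"communist": int(soviet or planned)}

print(communist_dummies("Cuba"))
print(communist_dummies("Russia", split=False))
```

Because the two split dummies had very similar effects in the literature, switching `split` off loses little information while saving one parameter.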

2.2.3.3 Host effect

The most discussed variable is surely the one that captures the positive effect of hosting the Olympic Games. Bernard and Busse (2000) were the first to introduce it: analyzing over 30 years of Games with a Tobit model and flagging the host country, they found a positive impact on the medal count. Bian (2005) explains the effect as follows:

“The hosting country is allowed to participate in all events. In addition, the crowd of home spectators will support the performing athletes. More resources are likely to be devoted to training in preparation for the game that will attract so much attention within the home country” (X. Bian, 2005).

The positive impact of this phenomenon is beyond question, as confirmed by all the authors who introduced this variable (Bernard et al., 2000; Johnson et al., 2002; Kuper et al., 2003; Bian, 2005; Andreff et al., 2008; Forrest et al., 2010; Schmitt et al., 2014; Noland et al., 2015; Bredtmann, 2016; Blais-Morisset et al., 2017; Andreff et al., 2020). In addition, Forrest et al. (2010) analyzed the ante-host effect, also considering the countries that are going to host the following edition of the Games and that, under their hypothesis, already start to benefit from the hosting effect.
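The host and ante-host flags can be sketched as two indicator variables built from a table of host countries. The table covers only the Summer editions mentioned in this work plus Tokyo 2020; the function name `host_flags` and the four-year `cycle` parameter are assumptions made for illustration.

```python
# Host country of each Summer Games edition, keyed by edition year.
HOSTS = {
    2004: "Greece",
    2008: "China",
    2012: "United Kingdom",
    2016: "Brazil",
    2020: "Japan",
}

def host_flags(country, year, cycle=4):
    """Return (host, ante_host) indicators for a country in a given edition.

    host      = 1 if the country hosts the edition held in `year`.
    ante_host = 1 if the country hosts the following edition (year + cycle).
    """
    host = int(HOSTS.get(year) == country)
    ante_host = int(HOSTS.get(year + cycle) == country)
    return host, ante_host

print(host_flags("Brazil", 2016))  # host of the 2016 edition
print(host_flags("Brazil", 2012))  # future host during the 2012 edition
```

Both flags can then be added as regressors next to population and GDP, so that their coefficients measure the extra medals associated with hosting or preparing to host.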

However, researchers measure the effect differently. Bernard and Busse’s approach has been replicated by several authors, but a deeper analysis was carried out by Clarke (2000), who analyzed all the Olympic Games since 1896 and found that the hosting effect is random and in most cases depends on the average performance of the country: the higher the average number of medals, the lower the effect. At the same time, the average increase is around 10%, much larger than the 1.8% estimated by the Tobit regression.

Finally, Johnson and Ali (2002) proposed another possible effect, that of neighboring countries: they tested this variable after observing that countries close to the host tended to send more athletes than their average. However, the variable turned out not to be significant for the Winter Games, to be significant at the 5% level with multiple regression, and not to be significant with the Probit model. For this reason the variable has not been used since.

2.2.3.4 Previous results and team size

Depending on the aim of the research, authors choose whether to include past results, that is, the number (or percentage) of medals won by each country in the previous edition of the Games. There is no doubt about the effectiveness of this variable: once introduced, it explains a large part of the model and dramatically improves the results, as shown by Andreff and Andreff (2008). However, if the aim is not to predict medals but to understand the phenomenon, introducing this variable is counterproductive, since it tends to hide secondary effects in terms of significance. Johnson and Ali (2002) state:

“We chose not to include lagged performance variables (i.e., past Olympic participation or medal success) in any of our analyses. Although lagged performance variables are highly effective explanations and predictors (Bernard and Busse, 2000), our interest is in the effect of purely economic and political variables.”

Another factor that significantly affects the model was identified by Vagenas et al. (2012): the number of athletes that a country is able to qualify for a single edition of the Olympic Games. The reason is related to the qualification process. Each country can enter only a fixed number of athletes in each discipline, and qualification for the Games most of the time depends on the performance achieved by the athletes during the year, so that only the world's top performers take part. This rule prevents a single country from bringing too many participants in a single discipline and also prevents the largest countries from monopolizing the Games. The only way for a country to bring more athletes is to diversify its portfolio of participants: by being competitive in all sports, a nation can increase the number of its athletes and consequently its chances of winning a medal. However, subsequent studies did not consider this variable because it performed like the lagged medal share but with less precision and significance.

2.2.3.5 Natural environment

Levine (1974) was the first to try to determine whether the location in which an athlete lives could affect his or her performance and, in order to do so, introduced in his model the surface area of each nation expressed in square kilometers. It turned out that countries with a larger surface area were more competitive.

Surface area captures several aspects. Bigger countries have access to a larger quantity of natural resources to sustain population and economic growth; in addition, they have more internal differences in terms of culture, genetics and natural environment, so that more sports can be practiced. A clear example is the USA, where many different cultures and peoples compete under the same flag, which is arguably one of the keys to their success. Owing to the high significance of this variable, many authors used it in their research (Levine 1974, Condon et al. 1999, Van Bottenburg 2000, Tcha et al. 2003, De Bosscher et al. 2003, Vagenas et al. 2012).

As regards temperature and distance from the Equator, Vagenas (2012) and Noland (2016) respectively tried to include these variables in their models, but the results showed that they did not affect the chances of winning a medal. Tcha et al. (2003) added the coastal length and the average altitude of the country without finding any correlation. For these reasons, the only natural-environment variable found to affect performance at the Olympic Games was the surface area of the country.

2.2.3.6 Gender Impact

Less attention has been given to gender; only in recent years have some researchers attempted to include this aspect in their models. However, their intention was to identify variables that could explain the model only for women; because of this, the aspects analyzed were most of the time related to human rights and religion. Leeds and Leeds (2012) in particular introduced an Arab variable, owing to the limited rights granted to women in these nations, and also took into account the fertility rate and the female share of the labor force. The aim of their research was to find variables specific to women, but the model was also applied to men. Only the fertility rate was significant, and it was significant for both genders; the effect therefore does not explain gender differences but can be read as similar to that of the population variable, a demographic factor rather than a social one.

Schmitt et al. (2014) tried to use the GII (i.e. Gender Inequality Index), but the significance of the variable disappeared when logged population and logged GDP were introduced in the model. These studies suggest that women's rights do influence women's performance at the Olympic Games, but other macro variables explain the model better and cancel this effect.

2.2.3.7 Other variables

Over the years, different authors tried to include new variables in their models. One of these was education, suggested by Levine et al. (1974), who considered several indicators of the cultural level of a nation, in particular illiteracy percentage, expenditure on education, primary and secondary education, higher-education students and newspaper circulation, but no significance was found.

Condon et al. (1999) considered the scale of a nation's infrastructure, introducing the number of airports, the rail track length and the paved highway length, as well as its economic power, measured by export value, import value, electricity production and electricity consumption. However, the analysis considered only a single edition of the Olympic Games (1996), whereas other studies analyzed several editions to find a consistent correlation; because of this, the results cannot be considered as reliable as those of the other studies.

Some authors understood the importance of the Meso-level and the correlation between investment and the number of medals won. In particular, Forrest et al. (2010) identified recreational expenditure as an adequate proxy, since part of it goes to sport. Using five European countries for which data were available, they estimated the percentage of recreational expenditure devoted to sport and used it to estimate the value for all the other countries. However, the limitation was twofold: only 53 countries were analyzed in total and, of these, only five had an actual observed value.

Even though many studies looked for determinants of Olympic success, in the end the model always relies on demographic variables (in particular population), economic variables (GDP or GDP per capita), political regime and the hosting effect. Table 2.1 lists all the variables used by the authors; most of the remaining ones were used only once or a few times, owing to their lack of correlation with the medals won at the Olympic Games.

Studies covered (table columns): Jokl 1964; Seppänen 1970; Novikov et al. 1972; Grimes et al. 1974; Levine 1974; Shaw et al. 1976; Kiviaho et al. 1978; Gillis 1980; Gärtner 1989; Suen 1992; Suen 1994; Der Butter et al. 1995; De Koning et al. 1996; Condon et al. 1999; Bernard et al. 2000; Van Bottenburg 2000; Stamm & Lamprecht 2000, 2001; Hoffman et al. 2002; Johnson et al. 2002; Lins et al. 2002; Morton et al. 2002; De Bosscher et al. 2003; Kuper et al. 2003; Tcha et al. 2003; Bian 2005; Andreff et al. 2008; Forrest et al. 2010; Leeds et al. 2012; Vagenas et al. 2012; Schmitt et al. 2014; Trivedi et al. 2014; Noland et al. 2015; Bredtmann 2016; Blais-Morisset et al. 2017; Vagenas et al. 2019; Andreff et al. 2020.

Variables (table rows; a "1" marks each study that used the variable): Population; Population density; Population growth rate; Average life expectancy; Death rate; Birth rate; Infant mortality; GDP per capita; GDP; Income per capita; Unemployment; Labor force; Export value; Import value; Electricity production; Electricity consumption; Ex-communist country; Capitalist country; Ex-soviet; Planned; Political system dummy; Polity2; Host; Ante-Host; Next host; World cup host; Neighbouring countries; Medal share t-4; Team size; Athlete share; Duration of IOC membership; Surface; Dummy region; Average temperature; Temperature dummy; Coast length; Altitude; Latitude; Urbanization; Number of airports; Rail track length; Paved highways length; Religion; Islamic; Muslim share; Illiteracy percentage; Expenditure on education; Primary & secondary education; Higher education student; Newspaper circulation; Years of school; Female share; Gender inequality index; Boycotts; Doping; Recreational expenditure; Health expenditure; Investment per capita; Military expenditures; Human development index; Quality of life index; Calorific consumption; Media; Romanic speaking country; Number of Olympic sports in schools; Year.

The most frequently marked rows are Population, GDP per capita, GDP, Ex-communist country and Host.

Table 2.1 Analysis of the macro-level scientific papers

2.3 Data measures

Depending on the final aim of the research, authors used several methods to measure the success of a nation at the Olympic Games, but some of the most interesting assumptions concern the way in which variables were used in the models. It is not enough to identify which factors have an impact on the prediction; it is also important to choose the correct time horizon in order to increase the accuracy and reliability of the model.

2.3.1 Success measurement

The conventional ranking used by the IOC (i.e. International Olympic Committee) sorts countries by the number of gold medals won by their athletes. In the event of a tie in the number of gold medals, the number of silver medals is taken into account, and then the number of bronze medals.
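The tie-break rule described above amounts to a lexicographic sort on (gold, silver, bronze). The following is a minimal sketch with made-up country labels and tallies; the function name `ioc_ranking` is illustrative and not taken from the thesis.

```python
# Hypothetical medal tallies: country -> (gold, silver, bronze).
tallies = {
    "A": (10, 5, 2),
    "B": (10, 7, 0),
    "C": (3, 20, 20),
    "D": (10, 7, 4),
}

def ioc_ranking(tallies):
    """Order countries by gold, then silver, then bronze (all descending)."""
    # Tuples compare element by element, which mirrors the tie-break rule.
    return sorted(tallies, key=lambda country: tallies[country], reverse=True)

ranking = ioc_ranking(tallies)
print(ranking)
```

Country C holds the most medals overall but ranks last here, which illustrates why a gold-first ranking and a total medal count can disagree.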

However, authors did not aim for this level of detail: at this level of competition the difference between a gold and a silver medal is too small, and modeling it would have led to mistakes in predictions and analysis. For these reasons, four measures of success have been used by the authors:

- Medal Count (MC): The overall number of medals won by each country during the Olympic Games. No distinctions are made in the value of the medals, neither in terms of quality (Gold, Silver, Bronze) nor in terms of number of athletes involved (individual sports and team sports).

- Medal Share (MS): The percentage of medals won by each country considering all the available medals, in formula:

\[ \text{Medal Share}_i = \frac{\text{Medals Won}_i}{\sum \text{Medals Available}} \]

- Number of athletes qualified (AQ): The number of athletes that each nation brings to the Olympic Games; the more athletes a nation can qualify for the Games, the higher its chances of winning a medal.

- Number of gold medals (NG): The number of gold medals won by each nation, without taking into account silver and bronze medals.
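As an illustration of the first two measures, the sketch below computes the Medal Count and the Medal Share from made-up tallies. It assumes that the medals available equal the total medals awarded across the listed countries; the function names are illustrative.

```python
def medal_count(gold, silver, bronze):
    """Medal Count (MC): all medals count equally, whatever their colour."""
    return gold + silver + bronze

def medal_shares(counts):
    """Medal Share (MS): each country's medals over all medals awarded."""
    total = sum(counts.values())  # assumption: medals available = medals awarded
    return {country: n / total for country, n in counts.items()}

# Hypothetical tallies (gold, silver, bronze) for three countries.
counts = {
    "A": medal_count(10, 12, 8),
    "B": medal_count(5, 5, 5),
    "C": medal_count(1, 2, 2),
}
shares = medal_shares(counts)
print(counts, shares)
```

With these numbers country A holds 30 of the 50 medals, a share of 0.6.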

The first two measures (MC, MS) have different meanings: the number of available medals increases from edition to edition, so maintaining the same medal share means winning more medals than in the previous edition, which can be considered a success for the nation. However, they lead to the same results, and authors choose one or the other depending on which aspect they consider most important to show: the final ranking, or the improvement or worsening of results compared to the previous edition.

The number of gold medals (NG) has been used several times by different authors, most of the time in comparison with the total medals won, to understand whether the variables affect the quality of the medal differently, but the results did not highlight anything concrete.

Johnson and Ali (2002) undertook the same analysis of the macro variables using the number of athletes qualified for the Olympic Games (AQ) instead of the medals. They concluded that macro variables explain the number of participants just as well as the number of medals won. In this thesis the Medal Count will be used as the dependent variable in order to carry out the ROC (i.e. Receiver Operating Characteristic) analysis.

2.3.2 Time and resources to plan

The Olympic Games are held every four years; this means that as soon as one edition is over, athletes have to start competing to qualify for the next one. Most authors did not consider the evolution of a nation over the years but introduced in their models only the most recent data available. The assumption is that, whatever path led a nation to a given socio-economic situation, the only thing that matters is the current potential expressed by the macro variables.

The first author to change this perspective was Andreff (2008), who introduced an important novelty in his research: input data were lagged by four years:

“Under the assumption that four years are required to build up, train, prepare and make an Olympic team the most competitive in due time, four years later”

The reasoning is simple: lagged data represent not the current situation of a country but its potential. However, this way of interpreting data is limited because it does not consider drastic changes that may happen in the following years. These two methods are the only ones ever used by authors, but both seem limited, offering only a partial view of the resources available to a country to prepare for the Olympic Games. For this reason, this thesis introduces a third method that combines the two existing ones by considering the average values of the four years preceding the Olympic Games.

2.4 Loss of performance of Macro variables

It is commonly held that macro variables have been the best method to forecast results at the Olympic Games up to recent years; however, authors have drawn attention to the model's loss of performance. According to Bernard and Busse (2002) and Stamm and Lamprecht (2000 & 2001), the importance of factors at the Macro-level has decreased over the years. In the latter studies, for example, the authors found they could explain 57% of international sporting success using macro-level factors in the period 1964-1980, whereas the percentage decreased to 45% after 1980.

One possible explanation for these findings is that an increasing number of nations have taken a state-sponsored strategic approach to the development of medal-winning elites; it is the efficacy of these production systems, rather than non-controllable macro-economic variables, that is becoming increasingly important. In other words, medal winning is increasingly managed through government investment. However, most of the time the nations that adopt this approach are those with a well-defined and developed economic structure. In addition, considering this kind of data would mean shifting to a Meso-level approach and, owing to the difficulty of gathering data, the analysis would cover only a limited number of participating nations.

On the other hand, the world has faced huge changes during the last two decades, while the variables considered have remained substantially the same as 30-40 years ago. A possible interpretation is that there are new macro variables that can better explain the modern editions of the Olympic Games.

3 Gap in literature review

The state of the art provided a clear picture of what has been done during the last decades concerning the analysis and prediction of the medal table at the Olympic Games. In addition, it showed that authors have mostly followed a similar path, trying to add new variables to the model but not obtaining significant results.

Few studies analyzed the impact of gender on nations' performance, and their purpose was not to show how the variables affected the results but to look for specific variables that differentiated women from men. In addition, it has been shown that more democratic countries, where gender inequalities are less pronounced, tend to perform better than less democratic ones. However, this effect only partially explains the number of medals won by women, and other socio-economic variables have a larger impact. It is nonetheless a fact that men and women of each nation perform very differently at the Games in terms of qualified athletes, number of medals won and disciplines contested.

As already said, Population and GDP still play the most important role in determining the potential of a nation in terms of talent creation and, consequently, results obtained. But the world has changed in recent decades: countries with similar characteristics are starting to perform very differently, and these macro variables are becoming less accurate predictors.

The purpose of this thesis is twofold. The first aim is to search for new macro variables that better explain today's society and capture the potential of a nation in terms of sporting competitiveness. In our opinion, the issue is not a loss of performance of macro variables but a failure to update the model. New variables will be combined with old ones; in this way it will be possible to understand which factors still affect the model and which are losing weight.

The second aim is to deepen the gender analysis: by analyzing consecutive editions of the Olympic Games it will be possible to trace the path of the two genders, look for similarities and differences and provide an explanation for them.

The models used will be Multiple Linear Regression and Tobit regression. Even though the superiority of the Tobit model has already been demonstrated in the literature, updating the variables could also affect the choice of model, so the results of the two will be compared.

In the light of the above, the questions that this thesis wants to answer are:

“Do Macro-variables still explain nations’ results at the Olympic Games? And how differently do they affect the male and female genders?”

MATERIALS AND METHODS

4 Materials

This section explains how the variables were chosen and where the data were taken from. The first step was to analyze the scientific papers in the main article databases: ResearchGate, Scopus and Google Scholar. In this phase, seven variables used by multiple authors were identified:

- Gross Domestic Product (GDP): The monetary value of the finished goods and services produced during a given time period. GDP per capita is simply GDP divided by the number of inhabitants of the nation. This economic variable highlights a country's financial possibilities. Note that in the regressions GDP per capita is expressed as a natural logarithm, because the positive effect of the variable diminishes as its value increases.

- Population: The number of people living in a nation in a given period; it is a social indicator. In the regressions this variable is also expressed as a natural logarithm because, as for GDP, the positive effect diminishes as the value increases.

- Ex-Communist regime: Indicates whether a nation is or has been a communist regime. It is a socio-economic dummy variable that takes value 1 when a nation is or has been a communist regime and 0 otherwise.

- Surface: The total area of a country expressed in square kilometers.

- Host: Indicates whether a country is hosting the Olympic Games. This variable is used in two different ways. The first is as a dummy variable that takes value 1 for the nation hosting the Olympic Games and 0 for all other countries. The second, the most used in existing studies, applies the host effect outside the regression: since hosting was observed to increase the number of medals won by the host nation by 10% on average, the variable is not included among the regressors but directly raises the predicted medal count of the host country by the observed percentage.

- Urban population: The percentage of a nation's population living in urban areas.

- Latitude: The distance of a country from the equator.
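Two of the transformations described in the list above can be sketched in a few lines: the natural-log transform applied to GDP per capita and Population, and the host adjustment applied to the predicted medal count after the regression. The 10% uplift is the average quoted in the text; the function names and sample figures are illustrative assumptions.

```python
import math

HOST_UPLIFT = 0.10  # average host effect quoted in the text

def log_regressor(value):
    """Natural-log transform used for GDP per capita and Population."""
    return math.log(value)

def apply_host_effect(predicted_medals, is_host, uplift=HOST_UPLIFT):
    """Raise the predicted medal count of the host country by the uplift."""
    return predicted_medals * (1.0 + uplift) if is_host else predicted_medals

# Doubling the input adds the same amount on the log scale at any level,
# which is how the diminishing effect of large values is captured.
step_low = log_regressor(2_000) - log_regressor(1_000)
step_high = log_regressor(20_000) - log_regressor(10_000)

host_prediction = apply_host_effect(50.0, is_host=True)
other_prediction = apply_host_effect(50.0, is_host=False)
print(step_low, step_high, host_prediction, other_prediction)
```

Because the adjustment is applied after the regression, the host effect never enters the estimated coefficients.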

The number of medals won by a nation in the previous edition of the Olympic Games is missing from the variables listed above. The purpose of this thesis is to verify the effectiveness of macro variables, and including past results would mask the effect of the other variables. In the second selection phase, new macro variables were chosen to broaden the understanding of the model. Their definitions, units of measurement and the reasons for choosing them are explained below:

- Internet users: Expressed as a percentage, this variable represents the share of a country's population with access to internet services. The internet is defined as a worldwide public computer network that provides access to several communication services, including the World Wide Web, and carries email, news, entertainment and data files. This variable indicates the level of development of a country; in addition, as reported by Amiri (2013), it takes different values in countries with similar GDP and, as a consequence, could help explain the differences in medals won by comparable countries.

- Energy Consumption: Primary energy consumption measures the total energy demand of a country. It covers consumption of the energy sector itself, losses during transformation (e.g. oil or gas into electricity), distribution of energy, and the final consumption by end users. The unit of measurement is the quadrillion BTU. The BTU is the British Thermal Unit, defined as the amount of heat needed to raise the temperature of one pound of water by one degree Fahrenheit.

This variable was selected when considering different ways of assessing the economic power of a nation, looking for economic variables other than GDP per capita. It will be essential to evaluate whether energy consumption improves or worsens the forecasting of medals compared to GDP, or whether the two variables together improve the explanatory power of the model. It is important to check whether the countries that win the most medals at the Olympic Games are also those that consume the most energy, counting not only urban but also industrial consumption.

- Migration: Net migration is the difference between the number of immigrants (people coming into an area) and the number of emigrants (people leaving an area) throughout the year. When immigrants outnumber emigrants, net migration is positive. This variable is important for two reasons. The first is the attractiveness of a nation: high-level athletes may choose to move to countries with state-of-the-art training facilities that can provide good salaries. The second concerns the genetic characteristics of athletes, which migration may carry outside their country of origin.

- Alcohol: The amount of pure alcohol consumed within a country, measured in liters and considering only the population over the age of 15. Alcohol consumption is an important indicator because its value can be interpreted in two ways. The first concerns health: high alcohol consumption has been shown to be correlated with several diseases. The second concerns the wealth of a country and its ability to spend on alcohol: the higher the consumption, the higher the wealth.

- Smoking: The percentage of people over the age of 15 who use tobacco. This variable gives an idea of a country's level of health. Tobacco use has been shown to cause many lung diseases even at a young age, and thus it may reduce a nation's pool of potential athletes.

- BMI (i.e. Body Mass Index): A biometric measure, expressed as the ratio of an individual's weight to the square of his or her height, used as an indicator of weight status. This variable can give a priori information about the possible athletic abilities of the inhabitants of a nation and therefore about the chances of winning a medal at the Olympic Games.

- Democracy index: An index compiled by the Economist Intelligence Unit (EIU), first published in 2006, which aims to measure the state of democracy in the world. The index is a set of 60 indicators grouped into five categories: electoral process and pluralism, functioning of government, political participation, political culture and civil liberties. In addition to the numerical score, the index classifies each country into one of four regime types: full democracies, flawed democracies, hybrid regimes and authoritarian regimes.

The tabulated value of each variable is obtained by averaging the values of the four years leading up to the Olympic competition, so as to capture the resources available to the participating countries in the period leading up to the Games (e.g. GDP in 2004 is the average of 2001, 2002, 2003 and 2004).
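The averaging rule can be sketched as follows. The yearly figures are made up for illustration, and `four_year_average` is a hypothetical helper name rather than code from the thesis.

```python
def four_year_average(series, games_year):
    """Average a yearly series over the four years up to and including the Games."""
    years = range(games_year - 3, games_year + 1)  # e.g. 2001..2004 for Athens
    return sum(series[year] for year in years) / 4

# Hypothetical yearly values of one variable for one country.
gdp_series = {2000: 90.0, 2001: 100.0, 2002: 110.0, 2003: 120.0, 2004: 130.0}

gdp_2004_input = four_year_average(gdp_series, 2004)
print(gdp_2004_input)
```

The value for 2000 is ignored, since only the four years leading up to the 2004 Games enter the average.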

During the analysis, three clusters were created to better understand the impact of macro variables on the prediction of Olympic Games results. The first consists of all the medals won at the Olympic Games regardless of the gender of the winner, the second of the medals won by women and the third of the medals won by men. The values of some variables differ depending on the cluster examined; the gender-specific variables are Population, Alcohol, Smoking and BMI. Data were collected from database sites such as Statista, World Bank Data and Knoema.

5 Methods

The main tool for analyzing the selected variables is Tobit regression; in addition, multiple linear regression was used as a benchmark for the results of the primary model. The quality of the predictions was also assessed with two statistical tools: MAE (Mean Absolute Error) and ROC (Receiver Operating Characteristic).

5.1 Stepwise

The stepwise algorithm is an automatic variable-selection method for regression, used in exploratory studies with numerous variables to find the subset of "significant" ones. There are three possible strategies for applying the stepwise algorithm: forward selection, backward selection and bidirectional elimination.

- Forward selection builds a model starting with no variables and then adds them one at a time according to their contribution to the model. The selection process ends when the stopping criterion is met (i.e. the algorithm stops when no remaining variable reaches the selected significance level).

- Backward selection starts with all the variables in the model and successively eliminates those that show a poor correlation with the dependent variable. The algorithm ends when the stopping criterion is met, in other words when all the variables remaining in the model have the requested significance.

- Bidirectional elimination is a mixture of the previous strategies: a variable that was initially included because of its significance can be excluded if the introduction of another variable makes it lose its effectiveness.

This thesis uses the backward selection strategy.
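The backward strategy can be sketched as a simple loop. This is an illustration under stated assumptions, not the thesis implementation: the elimination criterion is taken to be a p-value threshold (`alpha = 0.05` is an assumed value), and `fit_p_values` is a hypothetical caller-supplied function that refits the model on the current variables and returns their p-values. A fixed toy table stands in for real model fits.

```python
def backward_selection(variables, fit_p_values, alpha=0.05):
    """Drop the least significant variable until all p-values are <= alpha.

    fit_p_values is a hypothetical callable: it refits the model on the
    given variables and returns a dict {variable: p_value}.
    """
    selected = list(variables)
    while selected:
        p_values = fit_p_values(selected)
        worst = max(selected, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break  # stopping criterion: every remaining variable is significant
        selected.remove(worst)
    return selected

def toy_p_values(selected):
    """Stand-in for a real model fit, with made-up p-values."""
    table = {"log_gdp": 0.001, "log_population": 0.003,
             "latitude": 0.40, "smoking": 0.20}
    return {name: table[name] for name in selected}

kept = backward_selection(
    ["log_gdp", "log_population", "latitude", "smoking"], toy_p_values
)
print(kept)
```

In a real run the p-values change after each refit, which is why the model is re-estimated inside the loop rather than once at the start.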

5.2 Tobit

Tobit is a statistical model proposed by James Tobin in 1958. It is also called “censored regression” because observations are censored at zero: the observed dependent variable Y is a censored version of an underlying variable Y*. The model assumes that there is a latent (unobservable) variable Y* that depends linearly on the independent variables through a parameter vector, as in a linear model.

In addition, a normally distributed error term captures random influences on this relationship. Because the dependent variable of the model is latent (i.e. not observed), it is not possible to obtain standardized coefficients unless a special procedure is applied (Long, 1997, pp. 207-208). Although standardized coefficients are usually preferred by psychologists, economists (and particularly econometricians) tend to avoid them.
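To make the censoring concrete, the sketch below evaluates the log-likelihood of the standard textbook Tobit model, with latent variable y* = x'β + ε, ε ~ N(0, σ²), and observed y = max(0, y*). It is an assumption-laden illustration, not the estimation routine used in the thesis: the function names are hypothetical, and in practice this function would be maximized numerically over β and σ.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood of a Tobit model left-censored at zero."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        mu = sum(b * x for b, x in zip(beta, x_i))  # linear index x_i' beta
        if y_i <= 0:
            # Censored observation (zero medals): P(y* <= 0) = Phi(-mu / sigma)
            total += math.log(norm_cdf(-mu / sigma))
        else:
            # Uncensored observation: normal density of the residual
            total += norm_logpdf((y_i - mu) / sigma) - math.log(sigma)
    return total

# Two toy observations with a single regressor (a constant).
ll_censored = tobit_loglik([0.0], 1.0, [[1.0]], [0.0])
ll_observed = tobit_loglik([1.0], 1.0, [[1.0]], [1.0])
print(ll_censored, ll_observed)
```

Countries that win no medals contribute through the probability of being censored, rather than through a residual, which is what distinguishes Tobit from ordinary least squares on the same data.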

This model uses a different kind of R-squared from MLR, called Pseudo R-squared or McFadden R-squared after its creator. In a chapter of the book “Behavioural Travel Modeling”, edited by David Hensher and Peter Stopher, McFadden explains the concept of the pseudo R-squared:

“While the R2 index is a more familiar concept for the designer who has experience in OLS, it does not behave as well as the rho-square measure, for estimating ML. Those who are not familiar with the rho-square need to be warned that its values tend to be considerably lower than those of the R2 index ... For example, values from 0.2 to 0.4 for rho-square represent an excellent adaptation.”
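For reference, McFadden's rho-square is usually written as below. The symbols are not in the original text and are introduced here as assumptions: $\ln \hat{L}_{M}$ is the maximized log-likelihood of the fitted model and $\ln \hat{L}_{0}$ that of a model containing only the intercept.

```latex
\[
\rho^2 = 1 - \frac{\ln \hat{L}_{M}}{\ln \hat{L}_{0}}
\]
```

As a purely hypothetical example, a fitted model with log-likelihood $-80$ against an intercept-only log-likelihood of $-120$ gives $\rho^2 = 1 - 80/120 \approx 0.33$, which falls within the 0.2 to 0.4 range mentioned in the quotation.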

5.3 Multiple Linear regression

Multiple linear regression, also known as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. The goal of multiple linear regression is to model the linear relationship between the explanatory (independent) variables and response (dependent) variable. In particular, multiple regression is the extension of ordinary least-squares (OLS) regression that involves more than one explanatory variable.

R-squared is a statistical metric that measures how much of the variation in the outcome is explained by the variation in the independent variables. It lies between 0 and 1, where 0 indicates that the outcome cannot be predicted by any of the independent variables and 1 indicates that it can be predicted without error. Importantly, adding regressors to the model can never decrease R-squared; an increase therefore does not mean that the model is better, only that the fit to the data has been artificially improved.
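A minimal sketch of the computation follows, using made-up observed and fitted values. The adjusted R-squared is not used in the text; it is included only as a common way of penalizing extra regressors, and both function names are illustrative.

```python
def r_squared(y, y_hat):
    """Share of the variance of y explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_regressors):
    """R-squared penalized for the number of regressors (an addition to the text)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Hypothetical observed and fitted medal counts.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(y, y_hat)
adj_one = adjusted_r_squared(r2, n_obs=4, n_regressors=1)
adj_two = adjusted_r_squared(r2, n_obs=4, n_regressors=2)
print(r2, adj_one, adj_two)
```

For the same fit, the adjusted value falls as the number of regressors grows, which counters the mechanical increase described above.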

5.4 Prediction Models

This section explains how the process leading to the results is structured. Note that the procedure is the same for both Tobit regression and multiple regression, so as to identify the model that provides the better results.

- Stepwise phase: The first step consists in applying the stepwise algorithm to the aggregate dataset, which makes it possible to identify the best variables to describe the model. The aggregate dataset is obtained by merging the datasets of all the single editions' regressions.

- Training phase: This part consists in monitoring the behavior of the variables over time; here, the training period covers the Olympic Games of 2004, 2008 and 2012. It is important to remember that in this phase the dependent variable is the number of Olympic medals that each country won in the respective edition.

- Testing phase: In this phase the assumptions validated in the training phase are tested by predicting the results of the most recent Olympic Games and comparing the prediction with the actual results. To do so, the medals won by each country in the penultimate Olympic Games are used (i.e. to predict the results of 2016, the Olympic medals of 2012 must be used). The predictive capability of the model is analyzed through the Mean Absolute Error and the ROC curve, and the results are also compared with those obtained using the Bernard & Busse model.

- Prediction phase: This final phase is reached only if the results of the analyses carried out in the testing phase are satisfactory. In the prediction phase the model is used to forecast the number of medals that the countries will win at the next Olympic Games. It is important to underline that, in order to predict the 2020 Olympic Games, the input data will be the average of the four years leading up to the event.
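The data handling implied by the phases above can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function names and the sample values are hypothetical, and "the four years that lead to the event" is interpreted here as the four calendar years before the Games.

```python
from statistics import mean

TRAIN_EDITIONS = (2004, 2008, 2012)  # training set used in the thesis
TEST_EDITION = 2016                  # testing set used in the thesis


def split_by_edition(rows):
    """Split records (dicts with an 'edition' key) into training and testing sets."""
    train = [r for r in rows if r["edition"] in TRAIN_EDITIONS]
    test = [r for r in rows if r["edition"] == TEST_EDITION]
    return train, test


def four_year_average(yearly_values, games_year):
    """Average a macro variable over the four years before the Games.

    Assumption: the window is games_year - 4 .. games_year - 1;
    years without data are skipped.
    """
    window = [yearly_values[y] for y in range(games_year - 4, games_year)
              if y in yearly_values]
    if not window:
        raise ValueError("no data in the four-year window")
    return mean(window)


# Hypothetical records: country code, edition, medals won.
rows = [
    {"country": "AAA", "edition": 2004, "medals": 10},
    {"country": "AAA", "edition": 2008, "medals": 12},
    {"country": "AAA", "edition": 2012, "medals": 11},
    {"country": "AAA", "edition": 2016, "medals": 13},
]
train, test = split_by_edition(rows)

# Hypothetical yearly population (millions) used as input for a 2020 forecast.
population = {2016: 60.0, 2017: 60.2, 2018: 60.4, 2019: 60.6}
print(len(train), len(test), four_year_average(population, 2020))
```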

5.5 Mean Absolute Error

The Mean Absolute Error (MAE) represents the average distance between the predicted value and the actual value. The closer the value is to 0, the better the model. The absolute value is essential: without it, positive and negative errors would cancel each other out, bringing the result close to 0, so that a model with enormous errors could be mistaken for an optimal one.

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| PM_i - OM_i \right|$$

Here $PM_i$ is the number of predicted medals and $OM_i$ the number of real Olympic medals for observation $i$, and $n$ is the number of observations.

The Mean Squared Error (MSE) has not been used to analyze the results among the selected variables, because the goal of the prediction is to check the effectiveness of macro variables and the squared terms of the MSE would overemphasize the presence of outliers. The MSE is commonly used when the margin of error between the predicted value and the real value is very small.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( PM_i - OM_i \right)^2$$
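A small numerical sketch makes the two points above concrete: signed errors can cancel out, and squaring inflates large errors. The medal counts below are hypothetical and chosen only for illustration.

```python
def mean_signed_error(predicted, observed):
    """Average of (PM - OM) without the absolute value."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(observed)


def mean_absolute_error(predicted, observed):
    """MAE: average of |PM - OM|."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def mean_squared_error(predicted, observed):
    """MSE: average of (PM - OM)^2."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical medal counts for four nations.
predicted = [10, 0, 5, 40]
observed = [0, 10, 5, 40]

signed = mean_signed_error(predicted, observed)   # errors +10 and -10 cancel
mae = mean_absolute_error(predicted, observed)
mse = mean_squared_error(predicted, observed)
print(signed, mae, mse)
```

The signed average suggests a perfect model even though two nations are off by ten medals each, while the MSE is dominated by those two large errors.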


5.6 Receiver Operating Characteristics

In decision theory, the Receiver Operating Characteristic (ROC) is a graphical tool for testing the diagnostic ability of a binary classifier. The two axes of the ROC curve are the Sensitivity and “1 − Specificity”, which represent respectively the TPR (i.e. True Positive Rate) and the FPR (i.e. False Positive Rate). Applied to this thesis, the ROC framework is used to find the relationship between correctly and wrongly predicted values.

$$\text{Sensitivity} = TPR = \frac{TP}{P} = \frac{TP}{TP + FN}$$

$$1 - \text{Specificity} = FPR = \frac{FP}{N} = \frac{FP}{FP + TN}$$

Before calculating the TPR and FPR values, however, the meaning of True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) has to be defined.

A TP occurs when the difference between the number of real Olympic medals and the medals predicted by the regression is zero and the nation wins at least one medal.

$$TP:\; OM - PM = 0 \,\wedge\, OM > 0$$

A TN occurs when a nation does not win any medal and the prediction confirms the failure.

$$TN:\; OM - PM = 0 \,\wedge\, OM = 0$$

A FP occurs when the number of Olympic medals won by a nation is greater than the number of predicted medals.

$$FP:\; OM - PM > 0$$

Finally, a FN occurs when the number of Olympic medals won by a nation is less than the number of medals predicted by the model.

$$FN:\; OM - PM < 0$$


| | Real value: Positive | Real value: Negative |
|---|---|---|
| Predicted value: Positive | True positive | False positive |
| Predicted value: Negative | False negative | True negative |

Table 5.1 ROC contingency table

Once the parameters described above have been obtained, the accuracy can be calculated:

$$\text{Accuracy} = ACC = \frac{TP + TN}{P + N}$$

However, the ROC curve is commonly used with a binary output in order to assess the precision of the model in terms of obtaining a compliant or non-compliant result, whereas the output of this analysis is not binary. The aim is to predict the exact number of medals won by each nation, so the output is not a score between 0 and 1 but a value that can range from 0 to 120. Nevertheless, the scheme of the ROC curve is used to determine the quality of the prediction in terms of Accuracy.
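The classification scheme of this section can be sketched in a few lines. A correctly predicted zero is treated as a true negative, and a correctly predicted positive count as a true positive. How the raw regression output is turned into a medal count is not stated in this section, so the rounding and the floor at zero in `to_medal_count` are assumptions, as are the function names and the sample data.

```python
from collections import Counter


def to_medal_count(raw_prediction: float) -> int:
    """Assumption: the raw regression output is rounded and floored at zero."""
    return max(0, round(raw_prediction))


def classify(observed: int, predicted: int) -> str:
    """Apply the TP / TN / FP / FN rules of this section to one nation."""
    diff = observed - predicted      # OM - PM
    if diff == 0:
        return "TP" if observed > 0 else "TN"
    return "FP" if diff > 0 else "FN"


def accuracy(observed_medals, predicted_medals):
    """ACC = (TP + TN) / (P + N), where P + N is the number of nations."""
    counts = Counter(classify(o, p)
                     for o, p in zip(observed_medals, predicted_medals))
    total = sum(counts.values())
    return (counts["TP"] + counts["TN"]) / total, counts


# Hypothetical nations: real medals and raw model output.
observed = [0, 3, 10, 0, 46]
raw = [-0.4, 3.2, 7.6, 1.8, 50.1]
predicted = [to_medal_count(r) for r in raw]

acc, counts = accuracy(observed, predicted)
print(predicted, dict(counts), acc)
```

Because only exact matches count as correct, this accuracy is a demanding criterion: a prediction that misses by one medal is scored the same as one that misses by twenty, which is why it is read alongside the MAE.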


RESULTS


6 Results

The results are presented separately for the MLR model and the Tobit model. Each model analyzes the number of medals won by each nation, looking for a correlation with the chosen independent variables, first with male and female participants together and then with the genders split.

6.1 Multiple Linear Regression

Even though many authors consider Multiple Linear Regression inferior to the Tobit model in terms of accuracy in predicting Olympic medals, some researchers have used it anyway. Since the aim of this thesis is to improve on past results, it was necessary to consider both of the most widely used methods of analysis and compare the results obtained. The model that considers male and female athletes together will be called Aggregated genders.

6.1.1 Aggregated genders

A stepwise analysis was executed on all the considered variables, using all the Olympic Games editions from 2004 to 2016; the outputs are shown in Figure 6.1.

The most striking result is the removal of GDP as an independent variable from the model, together with Democracy Index and Urban population. The number of observations is lower than the total number of participating countries over the four editions because data are not available for smaller nations. However, this is not a problem, since none of those countries has ever won an Olympic medal and their absence does not affect the model.

The number of significant independent variables is very high, and so is their level of significance: all of them reach the 1% level, with the exception of BMI (i.e. Body Mass Index) and Latitude. Because of this, a loss of significance is expected in the analysis of the single editions.


Figure 6.1 MLR aggregated genders stepwise

The coefficients are all positive except for BMI, meaning that an increase in each of the other variables makes a positive contribution to the medal count. As a consequence the constant is negative, with a value of -17; otherwise every nation would win a certain number of medals by default. This is the second unexpected result: in light of the reasoning followed when choosing the independent variables, a negative correlation was expected for BMI, Smoking and Alcohol consumption, and only one of the three was confirmed by the regression.

Once the stepwise procedure had been performed, the variables with a confirmed significance were tested edition by edition from 2004 to 2016; the results are shown in Table 6.1. Energy consumption, Surface and Migration remain significant at the 1% level in all the editions and, together with Population and Alcohol, are the variables that contribute the most to the model. Latitude, BMI and Smoking totally or partially lose their significance compared to the aggregated stepwise. Internet users and the Ex-communist regime show opposite behavior: the former becomes significant in recent editions, while the latter loses its significance with time. The explanatory power of the model is high: the adjusted R-squared grows slightly, from a value of 0,7954 in 2012 to 0,8042 in 2016, with a peak in 2008. This is a great success, because neither the lagged medal share nor the number of participating athletes has been included in the model.

Table 6.1 MLR aggregated genders Output edition by edition


6.1.2 Female gender

Similar results were obtained by running the stepwise analysis on only the medals won by the women of each nation; the removed variables are the same as in the aggregated stepwise. The number of observations changes slightly because some variables are gender-specific and more data were available for the female gender.

Figure 6.2 MLR female gender stepwise

The positive correlation of all variables with the medal count, except for BMI, is confirmed. In addition, more than half of the correlated variables are significant at the 1% level. BMI and the Ex-communist dummy variable have a lower significance than the other factors, so a loss of significance in the single editions is expected.

The investigation of the consecutive Olympic Games editions shows large differences compared to the aggregated model: a large part of the independent variables lose their significance, and eventually only Energy consumption and Alcohol seem to behave consistently. Smoking and Population have an impact on the model but never reach the 1% level of significance, while, interestingly, Migration seems to acquire importance with time, becoming very significant in 2012 and confirming the result in 2016. The adjusted R-squared increases by 10% from the 2004 edition to the 2008 edition and stays at 86% in the following editions; this means that fewer variables achieve a higher level of explanation than in the aggregated model.

Table 6.2 MLR female gender Output edition by edition


6.1.3 Male gender

The results obtained for the male cluster are unique and differ from the other two classifications. The removed variables, in addition to Democracy index and Urban population, are Internet users, Latitude and Smoking. For the first time the Gross Domestic Product maintains its significance in the model, even though the adjusted R-squared of the overall regression is the lowest among the clusters, with a value of 68,07%.

Figure 6.3 MLR male gender stepwise

The lower number of independent variables is compensated by a higher significance of the remaining ones: all of them are significant at the 1% level and, consistently with the previous results, the only factor with a negative impact remains BMI. For the first time the adjusted R-squared worsens over time, from a starting level of 77,61% in 2004, which is still the lowest of the three models, to a final level of 67,89%, the worst result obtained with the Multiple Linear Regression. On the other hand, the number of significant variables is higher and more constant than in the previous results. In fact, Migration, Energy consumption and Population never lose the 1% level of significance. Alcohol and the Ex-communist factor behave similarly, losing significance over time. Once again BMI is never significant, while GDP regains influence on the model in recent years; it is interesting to observe that the only cluster in which this variable is significant has the poorest result in terms of explanation of the model.

Table 6.3 MLR male gender Output edition by edition

6.2 Tobit Regression

Many authors consider Tobit regression the reference model for predicting the Olympic medal table. Using this model therefore allows a direct comparison with the results of previously published research and makes it possible to verify the effect of the newly selected variables. It is important to note that the Tobit model uses another type of R-squared, namely the pseudo R-squared (McFadden's R-squared), and this value cannot be compared with the MLR R-squared. As for the multiple regression, the model that considers men and women together will be called Aggregated genders. In the output tables of the Tobit regression the uncensored observations are indicated as U.O.
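To make the role of censoring and of the pseudo R-squared more concrete, the following sketch evaluates the log-likelihood of a standard Tobit model left-censored at zero and computes McFadden's pseudo R-squared from two log-likelihood values. It is not the estimation code used for the thesis: the data, the parameter values and the two log-likelihoods passed to `mcfadden_pseudo_r2` are hypothetical, and in practice both log-likelihoods would come from maximum-likelihood fits of the full and intercept-only models.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_log_likelihood(y, x_rows, beta, sigma):
    """Log-likelihood of a Tobit model left-censored at zero."""
    total = 0.0
    for y_i, x_i in zip(y, x_rows):
        xb = sum(b * x for b, x in zip(beta, x_i))
        if y_i > 0:
            # Uncensored observation (U.O.): normal log-density of the residual.
            z = (y_i - xb) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
        else:
            # Censored observation: probability that the latent value is <= 0.
            total += math.log(normal_cdf(-xb / sigma))
    return total


def mcfadden_pseudo_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(null)."""
    return 1.0 - loglik_model / loglik_null


# Hypothetical data: medals and [intercept, one regressor] per nation.
y = [0, 0, 2, 5, 12]
x_rows = [[1, 0.5], [1, 1.0], [1, 2.0], [1, 3.0], [1, 6.0]]
uncensored = sum(1 for v in y if v > 0)

ll_fit = tobit_log_likelihood(y, x_rows, beta=[-1.0, 2.0], sigma=1.0)
ll_flat = tobit_log_likelihood(y, x_rows, beta=[0.0, 0.0], sigma=1.0)
print(uncensored, ll_fit > ll_flat)

# Hypothetical log-likelihood values, only to show the formula.
print(mcfadden_pseudo_r2(-310.0, -420.0))
```

Since this ratio of log-likelihoods is not a share of explained variance, values that look low by MLR standards are not directly comparable with an ordinary R-squared.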


6.2.1 Aggregated genders

As in the case of the MLR, we start with the stepwise algorithm and, very surprisingly, GDP is eliminated with the Tobit model as well. BMI and Democracy Index are also discarded, and all the other variables selected by the algorithm are significant at the 1% level, except for Latitude, which is significant at the 5% level. Note that the Tobit model censors all observations whose dependent variable does not exceed zero; for this reason the useful observations are fewer than in the MLR model. Moreover, all the coefficients are positive, so all the independent variables selected by the stepwise make a positive contribution to the model. Recall that the number of observations is lower than it should be because some data, especially from smaller countries, are not available. This does not compromise the goodness of the model, because the missing data belong to nations that do not win any medal and would therefore be censored by Tobit in any case.

Figure 6.4 Tobit aggregated genders stepwise


As can be seen from Table 6.4, Population, Migration, Alcohol and Energy Consumption are significant at the 1% level in all of the years analyzed and therefore make the major contributions to explaining the model. Three variables, namely Latitude, Urban Population and Smoking, never reach acceptable significance values in the analyzed editions. All the variables have positive coefficients. The pseudo R-squared shows a slight decreasing trend, declining at every edition. The observations considered valid by Tobit are instead growing, because more nations win at least one medal.

Table 6.4 Tobit aggregated genders Output edition by edition


6.2.2 Female gender

The stepwise algorithm discards more variables than in the aggregated case. GDP is no longer eliminated, while BMI and Democracy Index are discarded again. In addition, Latitude, Smoking and Urban Population are removed as well. All the coefficients of the selected variables are positive. As regards significance, two variables, Internet users and Surface, are significant at the 5% level, while all the others reach the 1% level. The pseudo R-squared is also greater than in the aggregated cluster. The left-censored observations (i.e. those censored because the dependent variable is equal to zero) are more numerous than in the aggregated case and, therefore, the number of available observations is smaller. This means that within the female cluster fewer nations are expected to win medals.

Figure 6.5 Tobit female gender stepwise


The analysis of the single editions shows that the variables with a 1% level of significance are Population and Energy Consumption. The Ex-communist regime is also always significant at the 1% level, except in 2008 when it reaches only the 5% level. Gross Domestic Product and Migration show opposite behavior: the former loses significance with time, while the latter acquires it. The coefficients are all positive, except in 2004 when Internet users has a negative coefficient. The pseudo R-squared values are constant in 2004 and 2008, grow by almost 5% in 2012 and remain at that level in 2016. The number of valid observations rises over time.

Table 6.5 Tobit female gender Output edition by edition


6.2.3 Male gender

For men, the stepwise backward selection eliminates the same variables as in the female case, with two exceptions: Internet users, which becomes non-significant, and Smoking, which joins the set of significant variables. The pseudo R-squared returns in line with that of the aggregated model, after the much higher value of the female model. All the variables selected by the algorithm are significant at the 1% level and, again, the coefficients are all positive. Compared to the female model there are fewer left-censored observations; this means that more observations are available and, consequently, more countries are predicted to win medals.

Figure 6.6 Tobit male gender stepwise


Considering the results edition by edition, numerous variables are significant at the 1% level: Gross Domestic Product, Population, Migration, Ex-communist regime and Energy Consumption. Alcohol is also always significant at the 1% level, except for the 2016 edition when it reaches only the 5% level. Surface and Smoking behave alike, losing significance with the passage of time. The coefficients are all positive. The pseudo R-squared has a decreasing trend: it starts from a value of 31.67% and decreases steadily to 23.82% in 2016. The number of valid observations peaked in 2008, when 80 countries won at least one medal, and dropped to 76 in 2012.

Table 6.6 Tobit male gender Output edition by edition


DISCUSSION AND CONCLUSIONS


7 Discussion

In the discussion section the analysis is deepened to better understand the phenomenon, to consider the statistical implications and to see how the models performed. Because standardized coefficients cannot be used in the Tobit regression, three different perspectives are considered in order to compare the results of the analysis; in this way it is possible to understand the real impact of the independent variables and their evolution over time.

The first aspect investigated is the average effect, calculated as the coefficient of each variable multiplied by the mean value of the observations over all the nations (e.g. for Energy consumption, the coefficient multiplied by the average Energy consumption of the participating countries). This kind of analysis provides a general overview of which variables, on average, have the greatest impact on the number of medals won. For the Ex-communist dummy variable, the coefficient is multiplied by one in order to obtain the impact associated with the nations that have been communist. Latitude and Migration, instead, have observations with both positive and negative values; because of this, their average effect is expected to be close to zero.

For these reasons the average effect provides only a partial explanation of the model: a low average impact does not mean a small impact. The second level of analysis therefore expresses the magnitude of the effect: the coefficient of each variable is multiplied by the difference between the maximum and the minimum values of all the available observations. The resulting value is the largest surplus of medals that the country with the highest value of that variable can gain compared with the country with the lowest one; the higher the value, the higher the potential impact of the variable. Only significant variables are displayed, because if a variable is not significant its impact on the model may be random and, as a consequence, meaningless. Finally, a third evaluation considers the level of significance reached by the variables across the editions: the 1% level of significance obtains a score of three, the 5% level a score of two and the 10% level a score of one. In this way the analysis also takes into account the reliability of the model.
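The three perspectives described above reduce to simple arithmetic on a coefficient and its observations. The sketch below uses hypothetical values; the function names are illustrative, and the score of zero for variables that do not reach the 10% level is an assumption, since the text only defines the scores from one to three.

```python
from statistics import mean


def average_effect(coefficient: float, values) -> float:
    """Coefficient multiplied by the mean of the observed values."""
    return coefficient * mean(values)


def magnitude_of_effect(coefficient: float, values) -> float:
    """Coefficient multiplied by (maximum - minimum) of the observed values."""
    return coefficient * (max(values) - min(values))


def significance_score(p_value: float) -> int:
    """3 for the 1% level, 2 for 5%, 1 for 10%; 0 otherwise (assumption)."""
    if p_value < 0.01:
        return 3
    if p_value < 0.05:
        return 2
    if p_value < 0.10:
        return 1
    return 0


# Hypothetical net-migration observations and coefficient.
migration = [-4.0, -1.0, 0.0, 2.0, 3.0]
coef = 0.5

print(average_effect(coef, migration))       # mean is zero, so the effect is zero
print(magnitude_of_effect(coef, migration))  # 0.5 * (3.0 - (-4.0))
print(significance_score(0.003), significance_score(0.03), significance_score(0.2))
```

The example reproduces the situation described for Migration: values of opposite sign cancel in the average effect, while the magnitude still shows a sizeable gap between the two extreme countries.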


As regards performance, the MLR model and the Tobit model are used to predict the medals won in 2016. In addition, the Bernard and Busse model is taken into account and, in order to make the three models comparable, the lagged medal share variable is removed.

7.1 Multiple Linear Regression

7.1.1 Aggregated genders

In the model that considers the results of both male and female athletes, Population is clearly the variable with the highest average effect; even considering the small fall in 2008, the trend seems to grow together with the global population. However, the magnitude remains constant and low across all the editions, which means that the benefit of having a large population is not as large as expected, while the high levels of significance confirm that this factor is not random.

The exact opposite happens with Energy consumption. Its average effect is far from being as large as that of Population, but this is due to the large number of countries with small values, which pull the average down. The magnitude, instead, shows clearly how important this variable is because, contrary to Population, a high value of Energy consumption ensures a great increase in predicted medals. There is a clear growth in importance from the 2004 edition to the successive ones, and the value seems to be stabilizing at 80/90 medals; what impresses the most is that this variable also reaches the highest level of significance edition by edition.

As already said, the average effect of Migration tends naturally to be close to zero, since the number of arrivals in one country corresponds to the number of departures from another. Because of this, the only aspect investigated is the magnitude and, even if a positive correlation was predictable, the importance of Migration exceeds expectations. In fact, together with its constantly high level of significance, in terms of magnitude it is second only to Energy consumption, and in 2012 it increases the prediction by up to 50 medals. No trend is found, probably because of the large changes in the observed values from edition to edition.

A negative trend is observable for the Ex-communist variable at all three levels of analysis. This is reasonable, because the culture and the implications of having been a communist nation tend to fade away with time. Alcohol and Surface behave similarly: both reach their highest level of impact in 2004, followed by a marked fall and then a very slight growing trend. In addition, Alcohol also shows a contraction in terms of significance and is expected to lose importance with time, contrary to Surface, which maintains a small but constant impact. The number of Internet users seems to acquire importance, but the values are not enough to determine a clear and confirmed impact on the model.

Surprisingly, Smoking, Latitude and BMI are never significant, and because of this their effect is not considered. If we also take into account the positive correlation of Alcohol with the number of medals, it is clear that the health aspect is less important than the economic one in determining the number of medals; otherwise these variables would have been significant, with a greater impact and a negative correlation. To sum up, there appear to be variables that determine whether or not a country is going to win Olympic medals, namely Population and Alcohol, and variables that assess the medal count of the most successful countries, namely Migration, Surface and Energy consumption.


Figure 7.1 MLR aggregated genders average effect

Figure 7.2 MLR aggregated genders magnitude of the effect (by edition: 2004, 2008, 2012, 2016)

Figure 7.3 MLR aggregated genders significant variables (by edition: 2004, 2008, 2012, 2016)


7.1.2 Female gender

Shifting to the female cluster, the results change drastically and are more difficult to interpret because of the reduced number of significant variables. Only four maintain their significance in at least three editions: Energy consumption, Surface, Migration and Alcohol.

The highest average effect belongs to Population, but only for the editions of 2012 and 2016, where the variable is taken into account, and its magnitude is very small: the country with the largest population obtains just three more medals than the one with the smallest. Alcohol behaves more consistently in all three aspects investigated: its significance is high for almost all the editions, with a small decrease in 2016. Its impact on the model is highest in 2004, but the magnitude always remains fairly small and the overall effect remains small.

The most interesting variable is surely Migration: it starts to be significant in 2008 and its impact on the model grows together with its level of significance. It is clear that this factor is becoming more and more important, and it will be interesting to compare it with the male gender. Energy consumption is the variable with the greatest impact on the model, explaining a large part of it, as demonstrated by the magnitude graph; as in the aggregated genders, its impact increases strongly from the 2004 to the 2008 edition and then stabilizes with a slight decreasing trend. A different situation appears for Surface: even though it is the third variable in terms of magnitude of impact, it loses significance in 2008 and its effect is neither stable nor shows a clear trend. For the Ex-communist variable no specific considerations can be made; however, considering the decreasing trend highlighted in the aggregated cluster, it is reasonable to think that the impact of the communist regime was mainly on the male gender and that, because of this, it faded away sooner in the female gender, as confirmed by the earlier loss of significance. The last variable with an impact on the model is Smoking; however, its impact at all three levels is negligible compared to the other significant variables.


The female gender analysis executed with the Multiple Linear Regression shows that the Energy consumption macro variable largely explains the model. It must not be forgotten that the results in terms of R-squared are the best among the three models, as already shown in the Results section. In addition, Migration appears to have started to have an impact later than in the male and aggregated genders. Considering what different authors have stated about the loss of importance of macro variables, it can be affirmed that this consideration is not valid for the female cluster. At the moment, macro variables can predict a large part of Olympic Games success, and a probable explanation concerns the gender issue. Because of this problematic reality, all over the world fewer women than men can choose to participate in the Olympic Games and, as a consequence, the level of competition tends to be lower. For these reasons the impact of the Meso-level on the results obtained by the nations is smaller, and a large part of the performance is related to the Macro-level. This aspect will be analyzed in more depth in the Conclusion section.


Figure 7.4 MLR female gender average effect (by edition: 2004, 2008, 2012, 2016)

Figure 7.5 MLR female gender magnitude of the effect (by edition: 2004, 2008, 2012, 2016)

Figure 7.6 MLR female gender significant variables (by edition: 2004, 2008, 2012, 2016)


7.1.3 Male gender

As in the aggregated cluster, Population is significant in all of the editions and has by far the highest average effect, with a growing trend even considering a small fall in 2008. The magnitude follows the same trend as the average effect but with lower values. Energy Consumption, on the contrary, has an average effect close to zero but the highest magnitude, which means that it provides the biggest contribution to the medal prediction; in particular, it reaches its highest value in 2008, with 38 medals. The significance of this variable is constant over time. Migration, like Energy Consumption, has the usual average effect close to zero but a high magnitude; no trend can be identified, but the magnitude remains high over time, with a great contribution to the medal prediction. The importance of the variable is also underlined by the fact that it is significant in all the editions. GDP increases its effects with the passing of the editions; specifically, in 2012 and 2016 it has the second highest average effect. Its magnitude also increases, but the contribution of medals that this variable supplies to countries with a high GDP is fairly small, with a maximum surplus of 6 medals. The Ex-communist variable has constant average and magnitude effects and a slightly decreasing trend in significance, but in general terms it has more impact than in the female cluster, perhaps because the male gender was more influenced by the regime over the years. Alcohol and Surface both have a decreasing trend at all three levels of analysis; the difference is that the former remains significant throughout and has a higher average effect, while the latter loses significance with time but has a greater magnitude. BMI is not considered in the analysis because it is never significant.


Figure 7.7 MLR male gender average effect (by edition: 2004, 2008, 2012, 2016)

Figure 7.8 MLR male gender magnitude of the effect (by edition: 2004, 2008, 2012, 2016)

Figure 7.9 MLR male gender significant variables (by edition: 2004, 2008, 2012, 2016)


7.2 Tobit regression

7.2.1 Aggregated gender

In the model that considers the results of both male and female athletes, Population is clearly the variable with the highest average effect: it peaks in 2004, falls in the successive edition and then follows an increasing trend. The magnitude behaves in the same way, although in smaller proportion. The importance of this variable is also confirmed by its high level of significance.

Unlike Population, Energy consumption has a very poor average effect, because the large number of countries with small values pulls the mean down. However, it has the highest magnitude in all the editions, which means that countries with large energy consumption will be at the top of the predicted medal table. As said before for the MLR, the average effect of Migration is close to zero, since the number of arrivals in one country corresponds to the departures from another; as a consequence, the only aspect analyzed is the magnitude, whose impact exceeds expectations. The variable can increase the number of predicted medals by up to 57 in countries with large immigration, and its importance is underlined by its continuous significance over time.

A negative trend is observable for the Ex-communist variable at all three levels of analysis, because the impact of having been a communist nation is waning over time. Alcohol is always significant and has constant average and magnitude effects; in particular, it is important to underline the positive correlation between this variable and the number of medals, which indicates that the economic aspect outweighs the health aspect. Internet users, excluding the 2004 edition, has a growing trend with a good level of significance; however, its impact on the model is smaller compared to the other significant variables, as highlighted in the average effect chart.

Urban Population, Latitude and Smoking are never significant, and because of this their effect is not considered.


Figure 7.10 Tobit aggregated genders average effect (by edition: 2004, 2008, 2012, 2016)

Figure 7.11 Tobit aggregated genders magnitude of the effect (by edition: 2004, 2008, 2012, 2016)

Figure 7.12 Tobit aggregated genders significant variables (by edition: 2004, 2008, 2012, 2016)


7.2.2 Female gender

In the female cluster, Population follows the same trend as in the aggregated model: it is always the variable with the highest average effect, with a peak in 2004, a fall in the following edition and an increasing trend afterwards. Its magnitude behaves in the same way, although on a smaller scale. For Energy Consumption the average effect is always close to zero, but the magnitude shows that this variable gives the largest contribution in terms of medals, up to 47. The lowest value is registered in 2004; the magnitude then grows in 2008 and starts a slight decreasing trend. The importance of the variable is also underlined by the fact that it is significant in all the editions. Migration acquires significance as the editions pass, and its magnitude makes clear that it contributes substantially to the prediction of medals, even though the values are lower than in the aggregated model. Its average effect is not informative, because immigration and emigration summed over all the countries are close to zero. Alcohol is always significant and has a constant average effect, while its magnitude increases over time; countries with a high alcohol consumption therefore receive a larger medal contribution from this variable. In the female model too, the Ex-Communist variable loses predictive capability across the editions, although it remains significant throughout. Gross Domestic Product has the second-highest average effect and a large magnitude, with a contribution of up to 17 medals in countries with a high GDP; its weakness is that it reaches an acceptable significance level only in 2004. Internet Users has a low average effect, while its magnitude gives an acceptable contribution in the only edition in which the variable reaches an acceptable level of significance.
Surface is significant in only two editions and has an average effect close to zero because of its small coefficient; when the variable is significant, its magnitude provides a medal gain that is larger for countries with a large surface. To sum up, in the female Tobit model the variables that are crucial to determine whether or not a country wins a medal are Population and GDP (when significant), while the variables that give an idea of the quantity of medals won are Energy Consumption, Migration and Population.
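The two indicators used throughout this section can be illustrated with a minimal sketch. Their definitions are not restated here, so the code assumes that the average effect is the coefficient multiplied by the mean value of the variable across countries, and that the magnitude is the largest single-country contribution (coefficient multiplied by that country's value). The coefficient and the net migration figures are hypothetical, chosen only to reproduce the pattern described for Migration: an average effect of zero together with a sizeable magnitude.

```python
from statistics import mean


def average_effect(coefficient, values):
    """Assumed definition: coefficient times the cross-country mean."""
    return coefficient * mean(values)


def magnitude(coefficient, values):
    """Assumed definition: largest contribution among all countries."""
    return max(coefficient * value for value in values)


# Hypothetical data: net migration (millions) for three countries.
# Inflows and outflows cancel out, so the mean is zero.
net_migration = [-2.0, 0.5, 1.5]
migration_coefficient = 4.0  # hypothetical medals per million migrants

avg = average_effect(migration_coefficient, net_migration)
mag = magnitude(migration_coefficient, net_migration)
print(f"average effect: {avg:.1f} medals, magnitude: {mag:.1f} medals")
```

Under these assumptions a variable can look irrelevant on average and still move several medals for the countries at the top of its range, which is why both indicators are read together.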

Figure 7.13 Tobit female gender average effect [chart by edition: 2004, 2008, 2012, 2016]

Figure 7.14 Tobit female gender magnitude of the effect [chart by edition: 2004, 2008, 2012, 2016]

Figure 7.15 Tobit female gender significant variables [chart by edition: 2004, 2008, 2012, 2016]

7.2.3 Male gender

In the male cluster, GDP is significant in all the editions; its average effect and magnitude follow the same positive trend, even though a lower value is observable in 2008. The contribution of GDP in the richest countries is up to 18 medals. Population follows the trend already observed for the aggregated and female clusters: a decrease after the 2004 edition followed by a positive trend starting from 2008. As explained above, Migration and Energy Consumption have an average effect close to zero, but their magnitude shows that they are of great importance in terms of the medals they can provide: Migration can bring up to 31 medals and Energy Consumption up to 30. In addition, the significance of these variables is constant over time, which increases their importance. As in the other clusters, in the male one the Ex-Communist variable shows a decreasing trend in both average effect and magnitude, although, differently from the previous results, it is significant in all the editions. This may be due to the greater impact that communism had on the male gender during the years of the regime. Smoking has an average effect close to zero and a magnitude whose impact decreases over time. Its significance follows the trend of the magnitude: the Smoking variable is significant only in 2004 and 2008. Surface keeps its significance only in the first year of analysis and, as in the previous clusters, has no impact on the average effect. In 2004 its magnitude gives a large medal contribution to the countries with a large surface; the loss of significance in the following editions, however, would be in line with the thesis of several authors who claimed that the surface of a country has no impact on the medals won at the Olympic Games.
Alcohol behaves as in the female cluster, with a constant significance and average effect, while its magnitude increases over time up to 8 medals in countries with a high alcohol consumption. This confirms that Alcohol has a constant impact on the model. Therefore, in the male Tobit model there are more variables with an acceptable level of significance; in addition, it is possible to understand whether a country is going to win a medal by looking at its Population and GDP values. To get an idea of the quantity of medals, instead, it is necessary to look at Migration, Energy Consumption, Population and GDP.

Figure 7.16 Tobit male gender average effect [chart by edition: 2004, 2008, 2012, 2016]

Figure 7.17 Tobit male gender magnitude of the effect [chart by edition: 2004, 2008, 2012, 2016]

Figure 7.18 Tobit male gender significant variables [chart by edition: 2004, 2008, 2012, 2016]

7.3 Benchmark of performances

Once the contribution of the independent variables to the models had been analyzed, the models were tested by predicting the medal table of the 2016 Olympic Games. In particular, the prediction obtained with the Multiple Linear Regression (MLR) was compared with the one obtained with the Tobit regression and with Bernard & Busse's (B&B) model. This made it possible to assess the strengths and weaknesses of each model, and to check whether the new models improve or worsen on the reference one. Three indicators are analyzed: the MAE (i.e. Mean Absolute Error), the MAE5 (i.e. the Mean Absolute Error computed over the countries that won at least five medals) and the accuracy reached by the three models.
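As a minimal sketch of how the two error indicators are computed, the example below applies them to a handful of rows copied from Table 13.1 in the annex (real medals and Tobit prediction). The function name and the `min_medals` parameter are illustrative choices, and the seven countries are only a small subset, so the resulting numbers are not the MAE and MAE5 reported for the full set of countries.

```python
# Rows copied from Table 13.1 (annex): country, real medals, Tobit prediction.
sample = [
    ("United States", 117, 103),
    ("China", 69, 94),
    ("United Kingdom", 66, 27),
    ("Russian Federation", 56, 56),
    ("Japan", 41, 28),
    ("Fiji", 1, 0),
    ("Kiribati", 0, 0),
]


def mean_absolute_error(rows, min_medals=0):
    """Mean of |real - predicted| over countries with real >= min_medals."""
    kept = [(real, pred) for _, real, pred in rows if real >= min_medals]
    if not kept:
        raise ValueError("no country satisfies the medal threshold")
    return sum(abs(real - pred) for real, pred in kept) / len(kept)


mae = mean_absolute_error(sample)                  # all countries
mae5 = mean_absolute_error(sample, min_medals=5)   # at least five medals
print(f"MAE  = {mae:.2f}")
print(f"MAE5 = {mae5:.2f}")
```

Restricting the average to the countries with at least five medals removes the many exact zero-medal predictions, which is why the MAE5 is typically much larger than the MAE.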

7.3.1 Aggregated genders

As reported in the figure below, the Tobit model performs better than Bernard & Busse's in terms of both MAE and MAE5, whereas the MLR performs better only on the indicator that considers the most successful countries (MAE5); its MAE is aligned with B&B's. Both new models therefore improve on the past model in at least one indicator: the Tobit lowers the MAE by about 0,6 points (from 2,98 to 2,35), an improvement of 21%, while the improvement in MAE5 is about 15% for both the Tobit and the Multiple Linear Regression.
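The percentages quoted above can be checked with a short calculation. The sketch below assumes the bar values of the aggregated chart are read as follows: B&B 2,98 (MAE) and 9,00 (MAE5), Tobit 2,35 and 7,64, MLR 7,67 (MAE5); this assignment of values to bars is an assumption, supported only by the fact that it reproduces the quoted percentages.

```python
def relative_improvement(baseline, new):
    """Share of the baseline error removed by the new model."""
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - new) / baseline


# Values as read from the aggregated MAE / MAE5 chart (assumed assignment).
bb_mae, tobit_mae = 2.98, 2.35
bb_mae5, tobit_mae5, mlr_mae5 = 9.00, 7.64, 7.67

tobit_mae_gain = relative_improvement(bb_mae, tobit_mae)
tobit_mae5_gain = relative_improvement(bb_mae5, tobit_mae5)
mlr_mae5_gain = relative_improvement(bb_mae5, mlr_mae5)

print(f"Tobit MAE improvement:  {tobit_mae_gain:.1%}")
print(f"Tobit MAE5 improvement: {tobit_mae5_gain:.1%}")
print(f"MLR MAE5 improvement:   {mlr_mae5_gain:.1%}")
```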

Figure 7.19 Aggregated genders MAE and MAE5 [bar chart of MAE and MAE>=5 for B&B (Tobit), Tobit and OLS]

The number of exact predictions makes it possible to analyze the results obtained by the models and to better understand which of them provides the best output.

Figure 7.20 Aggregated genders ROC Output for each model [bar chart of true positive, true negative, false positive and false negative counts for B&B, Tobit and OLS]

The two Tobit models (B&B and the new Tobit) behave almost identically; the only difference is that the new Tobit model tends to underestimate the number of medals won, as shown by its higher number of false negative observations. The Multiple Linear Regression has far fewer true negatives than the other models; on the other hand, it obtains the best result in terms of true positive predictions, although it remains close to the Tobit result. Its number of false positive predictions clearly shows that the model tends to overestimate the number of medals.

Figure 7.21 Aggregated genders accuracy index [bar chart for B&B, Tobit and OLS]

In light of the above, the accuracy confirms that the Tobit model achieves the best performance (0,58), followed by B&B (0,55), which remains considerably better than the MLR prediction (0,39).
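The four outcome classes and the accuracy index can be sketched as follows. The contingency rule is not restated in this section, so the code assumes the reading suggested by the discussion above: an exact prediction counts as a true positive when the country wins medals and as a true negative when it wins none, an overestimate counts as a false positive and an underestimate as a false negative. The rows are copied from Table 13.1 (real medals and Tobit prediction); the function names are illustrative.

```python
from collections import Counter


def classify(real, predicted):
    """Assumed rule: exact hits are 'true', over/underestimates are 'false'."""
    if predicted == real:
        return "true positive" if real > 0 else "true negative"
    return "false positive" if predicted > real else "false negative"


def accuracy_index(rows):
    """Share of exact predictions, plus the count of each outcome class."""
    counts = Counter(classify(real, pred) for _, real, pred in rows)
    exact = counts["true positive"] + counts["true negative"]
    return exact / len(rows), counts


# Rows copied from Table 13.1: country, real medals, Tobit prediction.
sample = [
    ("Russian Federation", 56, 56),  # exact, medals won
    ("Canada", 22, 34),              # overestimated
    ("United Kingdom", 66, 27),      # underestimated
    ("Kiribati", 0, 0),              # exact, no medals
    ("Latvia", 0, 6),                # overestimated
]

accuracy, counts = accuracy_index(sample)
print(f"accuracy = {accuracy:.2f}")
print(dict(counts))
```

Because only exact predictions count as correct, a model can reduce its MAE without improving this index, a point that becomes relevant in the female cluster.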

7.3.2 Female gender

In the second cluster the performance of the MLR model worsens. The Tobit model once again improves on B&B in both MAE and MAE5, by 22,7% and 22,8% respectively, whereas the MLR keeps a MAE similar to B&B's and improves the MAE5 by 20,3%. Once again both new models provide better results and, as in the aggregated cluster, the biggest improvement concerns the MAE.

Figure 7.22 Female gender MAE and MAE5 [bar chart of MAE and MAE>=5 for B&B (Tobit), Tobit and OLS]

Analyzing the predictions in detail, it is striking how similar the results of the B&B and Tobit models are; without the MAE analysis it would wrongly appear that no progress has been made. For the MLR model, instead, the number of true positives is 2,5 times that of the Tobit models (10 against 4), but at the same time the number of true negatives falls by 31%, which dramatically reduces the accuracy. In addition, the number of false positives exceeds the number of false negatives, revealing a clear overestimation of the medals won. The accuracy confirms the worse performance of the MLR model.

Figure 7.23 Female gender ROC Output for each model [bar chart of true positive, true negative, false positive and false negative counts for B&B, Tobit and OLS]

To conclude, the better results achieved by both new models are confirmed; however, neither of them improves the accuracy, owing both to the difficulty of predicting the exact number of medals won by each nation and to the fact that the countries that do not win any medal were already detected by the B&B model. On the other hand, the reduction in MAE confirms that, even though the accuracy has not improved, the prediction is closer to reality, with an error reduction of 21%.

Figure 7.24 Female gender accuracy index [bar chart for B&B, Tobit and OLS]

7.3.3 Male gender

Moving to the male cluster, for the third time the Tobit model improves both MAE and MAE5, although the improvement is not as pronounced as in the other two groups: the MAE decreases by only 13% and the MAE5 by 8,7%. The Multiple Linear Regression, instead, improves B&B's prediction by 8,7% when all the countries are considered and, in addition, obtains the best value in terms of MAE5: the improvement is substantial and reduces the error by 10,3%.

Figure 7.25 Male gender MAE and MAE5 [bar chart of MAE and MAE>=5 for B&B (Tobit), Tobit and OLS]

Looking at the number of exact and wrong predictions, it can be noted that the Tobit model slightly improves all the indicators while keeping a balance between false positive and false negative predictions. The MLR model reaches the highest number of true positives, 12. As in the other clusters, this model tends to overestimate the number of medals won by each nation. The accuracy confirms all the previous considerations.

Figure 7.26 Male gender ROC Output for each model [bar chart of true positive, true negative, false positive and false negative counts for B&B, Tobit and OLS]

Figure 7.27 Male gender accuracy index [bar chart for B&B (Tobit), Tobit and OLS]

8 Conclusions

Thanks to the analysis carried out on different clusters and with different models, it has been possible to better understand the impact that socio-economic factors have on the capability of a nation to win a given number of medals at the Olympic Games. The result can be considered positive: the initial purpose was to show that there could be new macro variables capable of providing better results than in the past, and both models confirm that Energy Consumption and Migration have taken the place of GDP, with the only exception of the Tobit male cluster.

Energy Consumption drastically changed the results obtained, increasing the R-squared, and its effect is also much larger than that of the second historical variable, Population. The introduction of the other new variables, such as Internet Users and Alcohol, slightly improved the results, providing a small but constant and significant contribution to the model.

Considering the different models used, it is interesting to note the improvement that both of them provide, but the superiority of the Tobit model, already claimed by several authors, is confirmed. It is able to recognize the countries that are not going to win any medal, and the introduction of the new variables increases its performance in all three clusters analyzed. This means that it provides a clearer general overview of the medal-winning phenomenon and better captures the potential of all the participating nations. On the other hand, the Multiple Linear Regression tends to overestimate a country's chances of winning medals and, as a consequence, fails to detect the countries that do not win at all. However, it is superior to the old B&B model, which means that the new variables better explain the phenomenon.

Both models and all the clusters tend to rely on similar variables, which shows that the results are not due to chance. In light of the above, it can be affirmed that macro variables are still good predictors of the Olympic Games; however, some considerations have to be made:

- Importance of Meso-Level: The results make evident that the Meso-level is gaining importance in the Olympic performance of the participating countries. In particular, the Macro-level defines the potential of each nation, which, without specific actions, can secure a certain number of medals. However, countries that decide to boost their chances of winning medals with specific investments and policies can obtain an enormous increase in the number of medals won. The prediction table in the annex clearly shows that some countries, like the United Kingdom or Canada, over-perform and, as already demonstrated by different authors, this is due to the specific sport policies they adopted for the Olympic Games. On the contrary, the United States is aligned with the prediction, given the great number of medals won, but this alignment does not mean that Meso-level strategies are not implemented, since the USA puts great effort into sport policies. The reason probably lies in the rules imposed by the IOC (i.e. only a given number of athletes per nation is allowed to compete), which try to prevent the monopolization of the Games by a single country. In other words, while the first limit, the latent potential of each nation, can be overcome by implementing Meso-level strategies, the second limit is imposed by the IOC and, as a consequence, can only be accepted.

- Differences in gender: The most interesting result is surely a consequence of the distinction made by the gender of the athletes who took part in the Olympic Games. Even if the behavior can be considered similar, some interesting differences emerged. First of all, the male performances turned out to be a composition of multiple effects: Population, together with Energy Consumption and Migration, are the main actors, with a similar importance. This does not happen in the female cluster, where the biggest role is played almost entirely by the Energy Consumption of each country; this aspect appears to reflect the potential of each nation. In other words, the wealth of each nation, represented by its level of energy consumption, largely predicts its capability of winning Olympic medals. In addition, the trends of the two genders are opposite: a decreasing R-squared value was observed in the male cluster with both methods and, on the contrary, a growing one in the female cluster. The explanation this thesis proposes is a time gap between the two genders. Because of gender inequality, women in some parts of the world have achieved the same rights as men only in recent decades; unfortunately, the issue is far from solved and, as a consequence, global female sports participation is not as developed as it could and should be. It follows that the level of competition is not as high as among men, because the pool of talented athletes is reduced: some of them are prevented or hampered from participating. In this scenario, the Macro-level largely explains the phenomenon, as happened in past years for the male gender. However, thanks to the progress being made on this important issue, an ever higher level of competition is expected and, consequently, a decreasing importance of the Macro-level in favor of the Meso-level.

9 Limitations and future research directions

Obviously, this thesis is not exempt from limitations. A first possible issue is that only 4 editions of the Olympic Games (from 2004 to 2016) were taken into account in the research process, whereas the study of Bernard & Busse (the most cited work) considered 10 editions (from 1960 to 1996); this thesis therefore shows the effect of the variables over a limited time window. The choice of this restricted period is due to the fact that data for many variables were difficult to find for the years before 2000, so it was not possible to gather them from complete databases. A second restriction concerns the choice of the variables used within the model: the best ones were identified using the stepwise algorithm with the backward selection strategy. It was not checked whether the other two approaches, forward selection and bidirectional selection, would identify different optimal variables and therefore lead to different results. A final limitation regards the pool of candidate variables: besides the newly proposed ones, only variables used in studies that start from Bernard & Busse's model were selected. So, among the variables used by other authors and not tested here, there may be some able to explain the model with the same accuracy.

In addition, two different lines for future studies can be identified starting from this research. As mentioned several times, the primary objective of this thesis was to verify whether macro variables are still able to predict the Olympic medal table; in order to increase the performance of the model, meso-level variables could be added to the macro-level ones. In particular, if data on the investments that each nation dedicates to sport, and specifically to the Olympic federations, were available, it could be verified whether there is a correlation between the number of medals won by a nation at the Olympic Games and its investments in sport. A second line of development would be to verify whether the macro variables selected in this study retain their predictive ability on smaller clusters. In other words, it would be of great interest to check whether the categories of sport that take place at the Olympic Games (e.g. athletics, combat sports, swimming, team sports) lead to a variation in the significant variables, or to look for different behaviors of athletes' performances in relation to the macro aspects.
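The backward selection strategy mentioned among the limitations can be sketched in a few lines. The example below is a generic illustration and not the procedure used in this thesis: it starts from all candidate variables and repeatedly drops the one whose removal most improves a score, whereas stepwise routines in statistical packages usually decide removals from coefficient p-values. The `toy_score` function and its usefulness values are hypothetical and stand in for a real model-fit criterion.

```python
def backward_selection(candidates, score):
    """Greedy backward elimination driven by a user-supplied score."""
    selected = list(candidates)
    best = score(selected)
    while len(selected) > 1:
        # Score every model obtained by dropping exactly one variable.
        trials = [
            (score([v for v in selected if v != dropped]), dropped)
            for dropped in selected
        ]
        trial_score, dropped = max(trials)
        if trial_score <= best:
            break  # no single removal improves the score
        selected.remove(dropped)
        best = trial_score
    return selected


# Hypothetical usefulness of each variable, penalised by model size.
usefulness = {"Population": 5.0, "GDP": 3.0, "Smoking": 0.2, "Surface": 0.1}


def toy_score(variables):
    return sum(usefulness[v] for v in variables) - 0.5 * len(variables)


kept = backward_selection(list(usefulness), toy_score)
print(kept)
```

A forward or bidirectional variant would explore the candidate set in a different order and may stop at a different subset, which is the reason the comparison is listed above as an open check.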

10 References

Andreff, Madeleine, Wladimir Andreff, and Sandrine Poupaux. 2008. “Les Déterminants Economiques de la Performance Sportive: Prévision des Médailles Gagnées aux Jeux de Pékin” [“Economic Determinants of Sport Performance: Forecasting Medals Won at Beijing Games”]. Revue d’Economie Politique.

Baimbridge, M. (1998) Outcome uncertainty in sporting competition: The Olympic Games 1896 – 1996, Applied Economics Letters, 5, 161–164.

Balmer, N. J., Nevill, A. M. and Williams, A. M. (2001) Home advantage in the Winter Olympics (1908–1998), Journal of Sports Sciences, 19, 129–139.

berkeley.edu. (2020). Mc Fadden pseudo R-squared. Retrieved from https://eml.berkeley.edu/~mcfadden/travel.html

Bernard, A.B., & Busse, M.R. (2000). Who wins the Olympic Games: Economic development and medal totals. Retrieved October 20, 2000, from: http://papers.ssrn.com.

Bian, X. (2005) Predicting Olympic medal counts: The effects of economic development on Olympic performance, The Park Place Economist.

Celik, Onur Burak, and Mark Gius. 2014. “Estimating the Determinants of Summer Olympic Game Performance.” International Journal of Applied Economics.

Chatfield, C. (1995) "Model uncertainty, data mining and statistical inference," J. R. Statist. Soc. A 158, Part 3, pp. 419–466.

Clumpner, R. A. (1994), 21st century success in international competition. In R. Wilcox (Ed.), Sport in the global village, Morgantown, WV: FIT, pp. 298–303.

David Hensher and Peter Stopher. (1979). Behavioural Travel Modelling, ch. 15, p. 306.

De Bosscher, V. (2007) Sport policy factors leading to international sporting success, dissertation presented in partial fulfilment of the requirements for the degree of Doctor in Physical Education, Free University, Brussels.

De Bosscher, V., De Knop, P., Heyndels, B.(2003a). Comparing relative sporting success among countries: create equal opportunities in sport. Journal for Comparative Physical Education and Sport.

De Bosscher, V., De Knop, P., Heyndels, B.(2003b). Comparing tennis success among countries. International sport studies.

Draper, N. and Smith, H. (1981) Applied Regression Analysis, 2d Edition, New York: John Wiley & Sons, Inc.

Efroymson,M. A. (1960) "Multiple regression analysis," Mathematical Methods for Digital Computers, Ralston A. and Wilf,H. S., (eds.), Wiley, New York.

Forrest, David, Ian G. McHale, Ismael Sanz, and J. D. Tena. 2015. “Determinants of National Medals Totals at the : An Analysis Disaggregated by Sport.” Pp. 166–84 in Placido Rodriguez, Stefan Késenne, and Ruud Koning, eds., The Economics of Competitive Sport. Cheltenham: Edward Elgar.

Forrest, David, Ismael Sanz, and J. D. Tena. 2010. “Forecasting National Team Medal Totals at the Summer Olympic Games.” International Journal of Forecasting.

Grimes, A., Kelly, W., & Rubin, P. (1974). A socio-economic model of national Olympic performance. Social science quarterly.

Heinilä, K. (1982). The totalisation process in international sport. Toward a theory of the totalization of competition in top-level sport. Sportwissenschaft.

IOC.(2020). https://www.olympic.org/the-ioc

Jiang M. & L.C. Xu (2005), Medals in Transition: Explaining Medal Performance and Inequality of Chinese Provinces, Journal of Comparative Economics.

Johnson, K.N., & Ali, A. (2002). A tale of two seasons: participation and medal counts at the summer and . Retrieved February 15, 2003, from http://www.wellesley.edu/economics/wkpapers/wellwp_0010.pdf. USA: Wellesley college, Massachusetts.

Jokl, E. (1964). Health, wealth, and athletics. In E. Simin (Eds.), International research in sport and physical education. Springfield: Thomas.

Jolliffe, Ian T. (1982). "A Note on the Use of Principal Components in Regression". Journal of the Royal Statistical Society, Series C, pp. 300–303.

Kiviaho, P., & Mäkelä, P. (1978). Olympic Success: a sum of non-material and material factors. International Review of Sport Sociology.

knoema.com. (2020). No Title. Retrieved from https://knoema.com/

Krüger, A. (1989), The sportification of the world: are there any differences left? Journal of comparative physical education and sport, 2, 5–6.

Leeds, Eva Marikova, and Michael A. Leeds. 2012. “Gold, Silver, and Bronze: Determining National Success in Men’s and Women’s Summer Olympic Events.” Journal of Economics and Statistics.

Levine, N. (1974). Why do countries win olympic medals – some structural correlates of olympic games success. Sociology and Social Research.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks, CA: Sage Publications.

McDonald, J. F. and Moffitt, R. A. 1980. The Uses of Tobit Analysis. The Review of Economics and Statistics

Morton, R.H. (2002). Who won the Sydney 2000 Olympics? An allometric approach. The Statistician.

Noland, Marcus, and Kevin Stahler. 2016. “What Goes into a Medal: Women’s Inclusion and Success at the Olympic Games.” Social Science Quarterly.

Novikov, A.D, & Maximenko, A.M. (1972). The influence of selected socio-economic factors on the level of sports achievements in the various countries. International Review of Sport sociology.

Otamendi, F. Javier, and Luis Miguel Doncel. 2014a. “Medal Shares in Winter Olympic Games by Sport: Socioeconomic Analysis After Vancouver 2010.” Social Science Quarterly.

Pfau W.D. (2006), Predicting the Medal Wins by Country at the 2006 Winter Olympic Games: An Econometric Approach, National Graduate Institute for Policy Studies, Tokyo, January, mimeo.

qastack.it. (2020). Interpretation of the pseudo R-squared. Retrieved from https://qastack.it/stats/82105/mcfaddens-pseudo-r2-interpretation

Rencher, Alvin C.; Christensen, William F. (2012), "Chapter 10, Multivariate regression – Section 10.1, Introduction", Methods of Multivariate Analysis, Wiley Series in Probability and Statistics, 709 (3rd ed.), John Wiley & Sons, pp. 19

Seppänen, P. (1970). The role of competitive sports in different societies. Proceedings of the 7th World Congress of Sociology. Varna, Bulgaria.

Shaw, S. and Pooley, J. (1976) National success at the Olympics: An explanation. In C. Lessard, J. P. Massicotte and E. Leduc (eds) Proceedings of the 6th International Seminar: History of Physical Education and Sport, Quebec, Trois Rivieres, pp. 1–27.

stata.com. (2020). Stepwise manual on stata. Retrieved from https://www.stata.com/manuals13/rstepwise.pdf

stata.com. (2020). Multiple linear regression manual. Retrieved from https://www.stata.com/manuals13/rregress.pdf

stata.com. (2020). Tobit manual on stata. Retrieved from https://www.stata.com/manuals/rtobit.pdf

statista.com. (2020). No Title. Retrieved from https://www.statista.com/

Suen, W. (1992). Men, money and medals: an econometric analysis of the Olympic Games. Discussion Paper from the University of Hong Kong.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica

Vagenas, George, and Dimitria Palaiothodorou. 2019. “Climatic Origin Is Unrelated to National Olympic Success and Specialization: An Analysis of Six Successive Games (1996–2016) Using 12 Dissimilar Sports Categories.” Sport in Society: Cultures, Commerce, Media, Politics.

Vagenas, George, and Eleni Vlachokyriakou. 2012. “Olympic Medals and Demo-Economic Factors: Novel Predictors, the Ex-Host Effect, the Exact Role of Team Size, and the ‘Population-GDP’ Model Revisited.” Sport Management Review.

van Bottenburg, M. (2000). Het topsportklimaat in Nederland [The elite sports climate in the Netherlands]. ’s Hertogenbosch: Diopter-Janssens en van Bottenburg.

wikipedia.com (2020). Rio 2016 Olympic medal table. Retrieved from https://en.wikipedia.org/wiki/2016_Summer_Olympics_medal_table

Willmott, Cort J.; Matsuura, Kenji. (2005). "Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance".

worldbank.org. (2020). No Title. Retrieved from https://data.worldbank.org/

11 List of Figures

Figure 1.1 Number of participating countries
Figure 1.2 The growth in the Olympic Games coverage
Figure 2.1 Relationship between factors determining individual and national success
Figure 2.2 The nine pillars which determine sport success
Figure 2.3 Percentage of authors who decided to include Population and GDP in their models
Figure 6.1 MLR aggregated genders stepwise
Figure 6.2 MLR female gender stepwise
Figure 6.3 MLR male gender stepwise
Figure 6.4 Tobit aggregated genders stepwise
Figure 6.5 Tobit female gender stepwise
Figure 6.6 Tobit male gender stepwise
Figure 7.1 MLR aggregated genders average effect
Figure 7.2 MLR aggregated genders magnitude of the effect
Figure 7.3 MLR aggregated genders significant variables
Figure 7.4 MLR female gender average effect
Figure 7.5 MLR female gender magnitude of the effect
Figure 7.6 MLR female gender significant variables
Figure 7.7 MLR male gender average effect
Figure 7.8 MLR male gender magnitude of the effect
Figure 7.9 MLR male gender significant variables
Figure 7.10 Tobit aggregated genders average effect
Figure 7.11 Tobit aggregated genders magnitude of the effect
Figure 7.12 Tobit aggregated genders significant variables
Figure 7.13 Tobit female gender average effect
Figure 7.14 Tobit female gender magnitude of the effect
Figure 7.15 Tobit female gender significant variables
Figure 7.16 Tobit male gender average effect
Figure 7.17 Tobit male gender magnitude of the effect
Figure 7.18 Tobit male gender significant variables
Figure 7.19 Aggregated genders MAE and MAE5
Figure 7.20 Aggregated genders ROC Output for each model
Figure 7.21 Aggregated genders accuracy index
Figure 7.22 Female gender MAE and MAE5
Figure 7.23 Female gender ROC Output for each model
Figure 7.24 Female gender accuracy index
Figure 7.25 Male gender MAE and MAE5
Figure 7.26 Male gender ROC Output for each model
Figure 7.27 Male gender accuracy index

12 List of Tables

Table 2.1 Analysis of the macro-level scientific papers
Table 5.1 ROC contingent table
Table 6.1 MLR aggregated genders Output edition by edition
Table 6.2 MLR female gender Output edition by edition
Table 6.3 MLR male gender Output edition by edition
Table 6.4 Tobit aggregated genders Output edition by edition
Table 6.5 Tobit female gender Output edition by edition
Table 6.6 Tobit male gender Output edition by edition
Table 13.1 2016 Aggregated genders medal Prediction
Table 13.2 2016 Female gender medal prediction
Table 13.3 2016 Male gender medal prediction

13 Annex

This section shows the predictions made in the testing phase. In order, the tables display the medals actually won by each country at the 2016 Olympic Games (Real), the predictions made with the Tobit (Tobit) and the Multiple Linear Regression (MLR) and, last, the prediction obtained with Bernard & Busse's model (B&B).

13.1 Aggregated genders

Country Real Tobit MLR B&B | Country Real Tobit MLR B&B | Country Real Tobit MLR B&B
United States 117 103 104 41 | Fiji 1 0 0 0 | Kiribati 0 0 0 0
China 69 94 96 50 | Finland 1 7 7 7 | Kyrgyz Republic 0 0 3 0
United Kingdom 66 27 21 26 | Grenada 1 0 0 0 | Lao PDR 0 0 3 0
Russian Federation 56 56 52 37 | Jordan 1 0 2 0 | Latvia 0 6 7 5
Japan 41 28 23 30 | Kosovo 1 0 0 0 | Lebanon 0 0 3 0
France 39 22 18 25 | Morocco 1 0 3 0 | Lesotho 0 0 0 0
Germany 36 42 34 45 | Niger 1 0 0 0 | Liberia 0 0 0 0
Australia 28 22 22 21 | Nigeria 1 5 6 7 | Libya 0 0 0 0
Italy 28 15 15 23 | Philippines 1 2 5 3 | Liechtenstein 0 0 0 0
Canada 22 34 32 22 | Portugal 1 5 7 4 | Luxembourg 0 2 5 0
Korea, Rep. 21 20 16 20 | Puerto Rico 1 0 0 0 | Madagascar 0 0 0 0
Brazil 19 26 26 21 | Qatar 1 0 4 5 | Malawi 0 0 0 0
Netherlands 19 13 10 17 | Singapore 1 1 5 9 | Maldives 0 0 0 0
Azerbaijan 18 2 4 9 | Tajikistan 1 0 1 0 | Mali 0 0 0 0
Kazakhstan 18 11 11 19 | Trinidad and Tobago 1 0 2 0 | Malta 0 0 2 0
New Zealand 18 3 6 4 | United Arab Emirates 1 6 8 10 | Marshall Islands 0 0 0 0
Spain 17 15 13 19 | Afghanistan 0 0 0 0 | Mauritania 0 0 0 0
Denmark 15 8 7 9 | Albania 0 0 4 0 | Mauritius 0 0 0 0
Hungary 15 12 9 16 | American Samoa 0 0 0 0 | Micronesia, Fed. Sts 0 0 0 0
Kenya 13 0 2 0 | Andorra 0 0 0 0 | Moldova 0 3 6 0
Uzbekistan 13 1 4 9 | Angola 0 0 4 0 | Monaco 0 0 0 0
Cuba 11 1 4 12 | Antigua and Barbuda 0 0 0 0 | Montenegro 0 0 4 0
Jamaica 11 0 0 0 | Aruba 0 0 0 0 | Mozambique 0 0 1 0
Poland 11 15 11 27 | Austria 0 10 11 11 | Myanmar 0 0 0 0
Sweden 11 10 8 13 | Bangladesh 0 0 0 0 | Namibia 0 0 2 0
Ukraine 11 12 10 14 | Barbados 0 0 1 0 | Nauru 0 0 0 0
Croatia 10 3 7 0 | Belize 0 0 0 0 | Nepal 0 0 2 0
South Africa 10 11 14 6 | Benin 0 0 0 0 | Nicaragua 0 0 0 0
Belarus 9 11 8 10 | Bermuda 0 0 0 0 | North Macedonia 0 0 4 0
Czech Republic 9 15 11 20 | Bhutan 0 0 0 0 | Oman 0 0 3 0
Colombia 8 9 11 6 | Bolivia 0 0 0 0 | Pakistan 0 0 0 1
Ethiopia 8 0 3 0 | Bosnia and Herzeg. 0 0 4 0 | Palau 0 0 0 0
Iran 8 6 9 7 | Botswana 0 0 1 0 | Panama 0 0 2 0
Serbia 8 7 8 6 | British Virgin Islands 0 0 0 0 | Papua New Guinea 0 0 1 0
Turkey 8 14 16 15 | Brunei Darussalam 0 0 0 0 | Paraguay 0 0 2 0
Georgia 7 0 4 0 | Burkina Faso 0 0 1 0 | Peru 0 3 7 1
North Korea 7 5 5 0 | Cabo Verde 0 0 0 0 | Rwanda 0 0 1 0
Switzerland 7 9 9 16 | Cambodia 0 0 0 0 | Samoa 0 0 0 0
Belgium 6 13 10 12 | Cameroon 0 0 3 0 | San Marino 0 0 0 0
Greece 6 4 7 4 | Cayman Islands 0 0 0 0 | Sao Tome and Principe 0 0 0 0
Thailand 6 6 10 7 | Central African Rep 0 0 0 0 | Saudi Arabia 0 12 14 14
Mexico 5 11 11 17 | Chad 0 0 0 0 | Senegal 0 0 0 0
Armenia 4 0 3 0 | Chile 0 0 1 5 | Seychelles 0 0 0 0
Lithuania 4 9 8 8 | Comoros 0 0 0 0 | Sierra Leone 0 0 1 0
Malaysia 4 3 7 6 | Congo, Dem. Rep 0 0 4 0 | Solomon Islands 0 0 0 0
Norway 4 7 7 13 | Congo, Rep. 0 0 1 0 | Somalia 0 0 0 0
Romania 4 8 7 19 | Costa Rica 0 0 1 0 | South Sudan 0 0 0 0
Slovak Republic 4 9 8 14 | Cyprus 0 0 5 0 | Sri Lanka 0 0 0 0
Slovenia 4 0 5 0 | Djibouti 0 0 0 0 | St. Kitts and Nevis 0 0 0 0
Argentina 3 11 12 11 | Dominica 0 0 0 0 | St. Lucia 0 0 0 0
Bulgaria 3 8 8 8 | Ecuador 0 0 3 0 | St. Vincent 0 0 0 0
Egypt, Arab Rep. 3 0 5 4 | El Salvador 0 0 0 0 | Sudan 0 0 1 0
Tunisia 3 0 3 0 | Equatorial Guinea 0 0 3 0 | Suriname 0 0 0 0
Venezuela, RB 3 0 0 9 | Eritrea 0 0 0 0 | Syrian Arab Republic 0 0 0 0
Algeria 2 0 5 1 | Gabon 0 0 2 0 | Tanzania 0 0 3 0
Bahamas, The 2 0 1 0 | Gambia, The 0 0 0 0 | Timor-Leste 0 0 0 0
Bahrain 2 0 4 0 | Ghana 0 0 0 0 | Togo 0 0 0 0
Cote d'Ivoire 2 0 2 0 | Guam 0 0 0 0 | Tonga 0 0 0 0
India 2 11 13 18 | Guatemala 0 0 1 0 | Turkmenistan 0 0 4 6
Indonesia 2 3 9 12 | Guinea 0 0 0 0 | Tuvalu 0 0 0 0
Ireland 2 6 7 8 | Guinea-Bissau 0 0 0 0 | Uganda 0 0 8 0
Israel 2 2 4 8 | Guyana 0 0 0 0 | Uruguay 0 0 4 0
Mongolia 2 0 4 0 | Haiti 0 0 0 0 | Vanuatu 0 0 0 0
Vietnam 2 0 5 0 | Honduras 0 0 0 0 | Virgin Islands (U.S.) 0 0 0 0
Burundi 1 0 0 0 | Hong Kong 0 0 0 8 | Yemen, Rep. 0 0 0 0
Dominican Rep 1 0 2 0 | Iceland 0 0 2 0 | Zambia 0 0 1 0
Estonia 1 11 9 4 | Iraq 0 0 3 2 | Zimbabwe 0 0 0 0

Table 13.1 2016 aggregated genders medal prediction
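The abstract states that the models are evaluated through MAE. As a minimal sketch of how that metric could be computed from a table like the one above, the snippet below uses only the first five rows of Table 13.1. The names `rows`, `mean_absolute_error` and `mae_by_model` are hypothetical and chosen for illustration; the five-row subset is far too small to reproduce the MAE values reported in the thesis and only shows the arithmetic.

```python
# Illustrative subset: the first five rows of Table 13.1.
# Each tuple is (country, real, Tobit, MLR, B&B).
rows = [
    ("United States", 117, 103, 104, 41),
    ("China", 69, 94, 96, 50),
    ("United Kingdom", 66, 27, 21, 26),
    ("Russian Federation", 56, 56, 52, 37),
    ("Japan", 41, 28, 23, 30),
]


def mean_absolute_error(actual, predicted):
    """Return the average of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Real medal counts are in column 1; model predictions in columns 2 to 4.
real = [r[1] for r in rows]
mae_by_model = {
    name: mean_absolute_error(real, [r[col] for r in rows])
    for name, col in (("Tobit", 2), ("MLR", 3), ("B&B", 4))
}

for name, mae in mae_by_model.items():
    print(f"{name}: MAE = {mae:.1f}")
```

Extending `rows` to every country in the table would give the MAE over the full 2016 test set for each model, under the assumption that the thesis averages absolute errors over all listed countries.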


13.2 Female gender

Country Real Tobit MLR B&B | Country Real Tobit MLR B&B | Country Real Tobit MLR B&B
United States 61 57 56 24 | American Samoa 0 0 0 0 | Malawi 0 0 0 0
China 41 54 56 32 | Andorra 0 0 0 0 | Maldives 0 0 0 0
Russian Federation 29 30 28 22 | Angola 0 0 2 0 | Mali 0 0 0 0
United Kingdom 25 12 9 14 | Antigua and Barbuda 0 0 0 0 | Malta 0 0 1 0
Japan 18 13 10 17 | Armenia 0 0 0 0 | Marshall Islands 0 0 0 0
Canada 16 16 16 10 | Aruba 0 0 0 0 | Mauritania 0 0 0 0
Germany 15 22 16 27 | Austria 0 3 4 2 | Mauritius 0 0 0 0
Australia 12 11 11 9 | Bangladesh 0 0 0 0 | Micronesia, Fed. Sts 0 0 0 0
Netherlands 12 4 4 6 | Barbados 0 0 0 0 | Moldova 0 1 2 0
France 11 9 8 13 | Belize 0 0 0 0 | Monaco 0 0 0 0
New Zealand 11 1 3 0 | Benin 0 0 0 0 | Montenegro 0 0 3 0
Hungary 10 4 4 7 | Bermuda 0 0 0 0 | Morocco 0 0 0 0
Italy 10 6 6 12 | Bhutan 0 0 0 0 | Mozambique 0 0 1 0
Korea, Rep. 9 8 7 9 | Bolivia 0 0 0 0 | Myanmar 0 0 0 0
Spain 9 6 6 9 | Bosnia and Herzeg. 0 0 2 0 | Namibia 0 0 1 0
Denmark 8 2 3 1 | Botswana 0 0 0 0 | Nauru 0 0 0 0
Kazakhstan 8 5 4 9 | British Virgin Islands 0 0 0 0 | Nepal 0 0 1 0
Poland 8 7 5 15 | Brunei Darussalam 0 0 0 0 | Nicaragua 0 0 0 0
Sweden 8 3 4 4 | Burkina Faso 0 0 0 0 | Niger 0 0 0 0
Kenya 7 0 0 0 | Cabo Verde 0 0 0 0 | Nigeria 0 1 2 2
Jamaica 6 0 0 0 | Cambodia 0 0 0 0 | North Macedonia 0 0 1 0
Brazil 5 12 13 11 | Cameroon 0 0 1 0 | Oman 0 0 1 0
Ethiopia 5 0 1 0 | Cayman Islands 0 0 0 0 | Pakistan 0 0 0 0
Azerbaijan 4 0 0 2 | Central African Rep 0 0 0 0 | Palau 0 0 0 0
Belarus 4 4 3 3 | Chad 0 0 0 0 | Panama 0 0 0 0
Colombia 4 3 4 1 | Chile 0 2 0 0 | Papua New Guinea 0 0 1 0
North Korea 4 1 2 2 | Comoros 0 0 0 0 | Paraguay 0 0 1 0
Serbia 4 2 4 0 | Congo, Dem. Rep 0 0 2 0 | Peru 0 0 3 0
Switzerland 4 3 4 5 | Congo, Rep. 0 0 0 0 | Qatar 0 0 0 0
Thailand 4 1 3 2 | Costa Rica 0 0 0 0 | Rwanda 0 0 1 0
Bulgaria 3 2 4 1 | Cyprus 0 0 2 0 | Samoa 0 0 0 0
Croatia 3 0 2 0 | Djibouti 0 0 0 0 | San Marino 0 0 0 0
Czech Republic 3 6 5 9 | Dominica 0 0 0 0 | Sao Tome and Principe 0 0 0 0
Greece 3 0 3 0 | Dominican Rep 0 0 1 0 | Saudi Arabia 0 5 6 4
Ukraine 3 5 5 7 | Ecuador 0 0 1 0 | Senegal 0 0 0 0
Bahrain 2 0 0 0 | El Salvador 0 0 0 0 | Seychelles 0 0 0 0
Belgium 2 3 4 3 | Equatorial Guinea 0 0 1 0 | Sierra Leone 0 0 0 0
Cuba 2 0 1 4 | Eritrea 0 0 0 0 | Singapore 0 0 2 0
Egypt, Arab Rep. 2 0 1 0 | Estonia 0 1 3 0 | Slovak Republic 0 3 3 5
India 2 6 9 10 | Fiji 0 0 0 0 | Solomon Islands 0 0 0 0
Mexico 2 4 5 8 | Gabon 0 0 1 0 | Somalia 0 0 0 0
Romania 2 3 3 9 | Gambia, The 0 0 0 0 | South Sudan 0 0 0 0
Slovenia 2 0 2 0 | Georgia 0 0 1 0 | Sri Lanka 0 0 0 0
South Africa 2 4 6 0 | Ghana 0 0 0 0 | St. Kitts and Nevis 0 0 0 0
Tunisia 2 0 0 0 | Grenada 0 0 0 0 | St. Lucia 0 0 0 0
Venezuela, RB 2 0 0 2 | Guam 0 0 0 0 | St. Vincent 0 0 0 0
Argentina 1 4 5 4 | Guatemala 0 0 0 0 | Sudan 0 0 0 0
Bahamas, The 1 0 0 0 | Guinea 0 0 0 0 | Suriname 0 0 0 0
Burundi 1 0 0 0 | Guinea-Bissau 0 0 0 0 | Syrian Arab Republic 0 0 0 0
Cote d'Ivoire 1 0 1 0 | Guyana 0 0 0 0 | Tajikistan 0 0 1 0
Finland 1 1 3 0 | Haiti 0 0 0 0 | Tanzania 0 0 1 0
Indonesia 1 0 3 6 | Honduras 0 0 0 0 | Timor-Leste 0 0 0 0
Iran 1 2 5 2 | Hong Kong 0 0 1 1 | Togo 0 0 0 0
Ireland 1 1 3 0 | Iceland 0 0 1 0 | Tonga 0 0 0 0
Israel 1 0 1 0 | Iraq 0 0 1 0 | Trinidad and Tobago 0 0 1 0
Kosovo 1 0 0 0 | Jordan 0 0 0 0 | Turkmenistan 0 0 2 0
Lithuania 1 2 3 1 | Kiribati 0 0 0 0 | Tuvalu 0 0 0 0
Malaysia 1 0 2 0 | Kyrgyz Republic 0 0 1 0 | Uganda 0 0 3 0
Mongolia 1 0 1 0 | Lao PDR 0 0 1 0 | United Arab Emirates 0 0 2 0
Norway 1 2 3 3 | Latvia 0 1 3 0 | Uruguay 0 0 2 0
Philippines 1 0 1 0 | Lebanon 0 0 1 0 | Uzbekistan 0 0 1 3
Portugal 1 0 3 0 | Lesotho 0 0 0 0 | Vanuatu 0 0 0 0
Puerto Rico 1 0 0 0 | Liberia 0 0 0 0 | Vietnam 0 0 1 0
Turkey 1 6 7 6 | Libya 0 0 0 0 | Virgin Islands (U.S.) 0 0 0 0
Afghanistan 0 0 0 0 | Liechtenstein 0 0 0 0 | Yemen, Rep. 0 0 0 0
Albania 0 0 1 0 | Luxembourg 0 0 2 0 | Zambia 0 0 0 0
Algeria 0 0 2 0 | Madagascar 0 0 0 0 | Zimbabwe 0 0 0 0

Table 13.2 2016 female gender medal prediction


13.3 Male gender

Country Real Tobit MLR B&B | Country Real Tobit MLR B&B | Country Real Tobit MLR B&B
United States 56 48 48 22 | Qatar 1 1 3 5 | Latvia 0 3 4 2
United Kingdom 41 15 12 14 | Singapore 1 2 4 5 | Lebanon 0 0 0 0
China 28 38 40 26 | Tajikistan 1 0 1 0 | Lesotho 0 0 0 0
France 28 13 9 13 | Trinidad and Tobago 1 0 2 0 | Liberia 0 0 0 0
Russian Federation 27 27 24 19 | Tunisia 1 0 1 0 | Libya 0 0 0 0
Japan 23 15 12 16 | United Arab Emirates 1 4 5 7 | Liechtenstein 0 0 0 0
Germany 21 24 20 23 | Venezuela, RB 1 0 0 5 | Luxembourg 0 1 3 0
Italy 18 11 9 12 | Afghanistan 0 0 0 0 | Madagascar 0 0 0 0
Australia 16 13 12 11 | Albania 0 0 2 0 | Malawi 0 0 0 0
Azerbaijan 14 1 3 4 | American Samoa 0 0 0 0 | Maldives 0 0 0 0
Brazil 14 14 13 11 | Andorra 0 0 0 0 | Mali 0 0 0 0
Uzbekistan 13 0 3 4 | Angola 0 0 4 0 | Malta 0 0 1 0
Korea, Rep. 12 10 8 10 | Antigua and Barbuda 0 0 0 0 | Marshall Islands 0 0 0 0
Kazakhstan 10 6 6 10 | Aruba 0 0 0 0 | Mauritania 0 0 0 0
Cuba 9 3 4 6 | Austria 0 7 6 6 | Mauritius 0 0 0 0
South Africa 8 6 8 3 | Bahrain 0 0 2 0 | Micronesia, Fed. Sts 0 0 0 0
Spain 8 9 7 10 | Bangladesh 0 0 0 0 | Moldova 0 0 3 0
Ukraine 8 5 5 6 | Barbados 0 0 1 0 | Monaco 0 0 0 0
Croatia 7 1 3 0 | Belize 0 0 0 0 | Montenegro 0 0 2 0
Denmark 7 4 5 5 | Benin 0 0 0 0 | Mozambique 0 0 0 0
Georgia 7 0 3 0 | Bermuda 0 0 0 0 | Myanmar 0 0 0 0
Iran 7 2 5 4 | Bhutan 0 0 0 0 | Namibia 0 0 2 0
Netherlands 7 7 6 9 | Bolivia 0 0 1 0 | Nauru 0 0 0 0
New Zealand 7 2 4 2 | Bosnia and Herzeg. 0 0 2 0 | Nepal 0 0 1 0
Turkey 7 8 9 8 | Botswana 0 0 1 0 | Nicaragua 0 0 0 0
Canada 6 16 16 12 | British Virgin Islands 0 0 0 0 | North Macedonia 0 0 2 0
Czech Republic 6 9 7 10 | Brunei Darussalam 0 0 0 0 | Oman 0 0 3 0
Kenya 6 0 1 0 | Bulgaria 0 5 5 4 | Pakistan 0 0 0 0
Belarus 5 6 5 4 | Burkina Faso 0 0 0 0 | Palau 0 0 0 0
Hungary 5 6 5 8 | Burundi 0 0 0 0 | Panama 0 0 2 0
Jamaica 5 0 0 0 | Cabo Verde 0 0 0 0 | Papua New Guinea 0 0 0 0
Armenia 4 0 2 0 | Cambodia 0 0 0 0 | Paraguay 0 0 2 0
Belgium 4 7 6 7 | Cameroon 0 0 2 0 | Peru 0 2 5 1
Colombia 4 4 7 3 | Cayman Islands 0 0 0 0 | Philippines 0 1 3 1
Serbia 4 3 4 3 | Central African Rep 0 0 0 0 | Portugal 0 3 4 2
Slovak Republic 4 6 5 7 | Chad 0 0 0 0 | Puerto Rico 0 0 1 0
Ethiopia 3 0 2 0 | Chile 0 5 6 3 | Rwanda 0 0 1 0
Greece 3 2 3 2 | Comoros 0 0 0 0 | Samoa 0 0 0 0
North Korea 3 1 3 3 | Congo, Dem. Rep 0 0 2 0 | San Marino 0 0 0 0
Lithuania 3 5 5 4 | Congo, Rep. 0 0 1 0 | Sao Tome and Principe 0 0 0 0
Malaysia 3 2 4 3 | Costa Rica 0 0 2 0 | Saudi Arabia 0 6 8 8
Mexico 3 6 6 9 | Cyprus 0 0 2 0 | Senegal 0 0 0 0
Norway 3 4 5 7 | Djibouti 0 0 0 0 | Seychelles 0 0 0 0
Poland 3 10 7 14 | Dominica 0 0 0 0 | Sierra Leone 0 0 0 0
Sweden 3 5 5 7 | Ecuador 0 0 3 0 | Solomon Islands 0 0 0 0
Switzerland 3 7 6 9 | El Salvador 0 0 0 0 | Somalia 0 0 0 0
Algeria 2 0 3 0 | Equatorial Guinea 0 0 3 0 | South Sudan 0 0 0 0
Argentina 2 6 6 6 | Eritrea 0 0 0 0 | Sri Lanka 0 0 0 0
Romania 2 7 5 9 | Finland 0 4 4 4 | St. Kitts and Nevis 0 0 0 0
Slovenia 2 0 3 0 | Gabon 0 0 2 0 | St. Lucia 0 0 0 0
Thailand 2 5 6 3 | Gambia, The 0 0 0 0 | St. Vincent 0 0 0 0
Vietnam 2 0 2 0 | Ghana 0 0 1 0 | Sudan 0 0 1 0
Bahamas, The 1 0 1 0 | Guam 0 0 0 0 | Suriname 0 0 0 0
Cote d'Ivoire 1 0 1 0 | Guatemala 0 0 1 0 | Syrian Arab Republic 0 0 0 0
Dominican Rep 1 0 2 0 | Guinea 0 0 0 0 | Tanzania 0 0 2 0
Egypt, Arab Rep. 1 0 3 2 | Guinea-Bissau 0 0 0 0 | Timor-Leste 0 0 0 0
Estonia 1 4 5 1 | Guyana 0 0 0 0 | Togo 0 0 0 0
Fiji 1 0 0 0 | Haiti 0 0 0 0 | Tonga 0 0 0 0
Grenada 1 0 0 0 | Honduras 0 0 0 0 | Turkmenistan 0 1 3 2
Indonesia 1 2 4 6 | Hong Kong 0 0 0 4 | Tuvalu 0 0 0 0
Ireland 1 5 5 4 | Iceland 0 0 2 0 | Uganda 0 0 4 0
Israel 1 1 3 4 | India 0 7 5 9 | Uruguay 0 0 2 0
Jordan 1 0 0 0 | Iraq 0 0 2 1 | Vanuatu 0 0 0 0
Mongolia 1 0 3 0 | Kiribati 0 0 0 0 | Virgin Islands (U.S.) 0 0 0 0
Morocco 1 0 1 0 | Kosovo 0 0 0 0 | Yemen, Rep. 0 0 0 0
Niger 1 0 0 0 | Kyrgyz Republic 0 0 2 0 | Zambia 0 0 1 0
Nigeria 1 3 4 3 | Lao PDR 0 0 1 0 | Zimbabwe 0 0 0 0

Table 13.3 2016 male gender medal prediction


14 Acknowledgements

My first thanks certainly go to Professor Emanuele Lettieri, for proposing an extremely stimulating topic tied to the world of sport, which has always been a passion of mine, and to Andrea Di Francesco who, together with Professor Braghin, followed, advised and accompanied us along this far from easy path.

My second thanks surely go to my parents, who gave me the opportunity to set out on this journey and helped me stay focused and aware of the goals to be reached.

Thank you to my sister Giulia, who has always stayed close to me and rejoiced with me at every single milestone, even the most trivial.

Thank you to my aunt and uncle, Nadia and Daniele, always caring and ready to support me throughout my studies, and to all my other relatives, who not only made sure I never lacked anything but also always showed me what it means to be part of a family.

My greatest thanks go to my grandparents, Antonietta, Alice and Virginio: I think no one has waited for this moment and cheered for me more than they have. See, Virgin, I made it in the end?

I thank all my friends. I cannot name every one of you, but I am sure each of you knows how important you have been throughout this journey and, more generally, in my life. Thank you.

Finally, thank you to my friend Luca, my true companion in this adventure at the Politecnico, with whom I shared the most important moments as well as the hardest ones. I would never have forgiven myself if I had not finished this journey with you. Thank you.

Stefano


A first and well-deserved thank you goes to Professor Emanuele Lettieri, for guiding us through the writing of this thesis and for giving us the opportunity to carry out an interesting piece of research.

I would also like to thank Professor Francesco Braghin and Andrea Di Francesco for always providing us with interesting ideas for moving the work forward.

Special thanks go to my family, who allowed me to complete my studies with peace of mind and were always ready to support me in times of need.

I also thank my friends who have stood by me over these years. In particular I would like to thank Stefano, a true friend with whom I shared the entire journey at the Politecnico. I remember the drive home after the physics exam as if it were yesterday…

Luca
