POLITECNICO DI MILANO School of Industrial and Information Engineering
Master of Science in Management Engineering

Predicting Olympic Games: Do Macro variables still matter? Insight from a Prediction Model applied to both genders

Supervisor: Prof. Emanuele Lettieri
Co-Supervisor: Prof. Francesco Braghin
Andrea Di Francesco

Master thesis of:
Luca Tamagni, ID: 905705
Stefano Tettamanti, ID: 905480

Academic Year 2019/20

ABSTRACT

The Olympic Games are one of the best-known and most followed events in the world. The first Olympic Games were held in Greece in 776 BC in the city of Olympia and consisted of a single running race among the local population. Nowadays, every four years, the best athletes in the world compete in all the principal sports practised on the five main continents. Because of the prestige associated with this event, all participating countries are interested in winning the highest medal count, guaranteeing professional preparation to their athletes and sometimes also awarding them monetary prizes in case of success. Since the end of the 1960s, researchers have been interested in whether macro variables could explain the number of medals won by each country, and they found that population and GDP were the main factors contributing to Olympic triumphs. In recent years, however, macro variables seem to give worse results than in the past. The objective of this thesis is to understand whether there are new macro variables that can predict the number of medals won by each nation during the Olympic Games, and to deepen the analysis by considering gender differences through three clusters: female, male and aggregated genders. Three editions of the Olympic Games (2004, 2008 and 2012) have been used as the training set and the 2016 edition as the test set. The predictions of the Tobit model and of multiple linear regression have been compared with each other and with the reference research of Bernard and Busse (2000).
Finally, the models have been evaluated through the MAE (mean absolute error) and the accuracy (the percentage of exact predictions) to determine the best model for each cluster.
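The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function names and the sample numbers are hypothetical, and rounding each predicted medal count to the nearest integer before checking for an exact match is an assumption, not necessarily the procedure followed in the thesis.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute gap between actual and predicted medal counts."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def exact_accuracy(actual: Sequence[int], predicted: Sequence[float]) -> float:
    """Share of countries whose predicted medal count is exactly right.

    Assumption: predictions are rounded to the nearest integer first.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if round(p) == a)
    return hits / len(actual)


# Made-up medal counts for four countries, for illustration only.
actual_medals = [10, 0, 3, 1]
predicted_medals = [8.6, 0.2, 3.4, 2.7]

mae = mean_absolute_error(actual_medals, predicted_medals)
accuracy = exact_accuracy(actual_medals, predicted_medals)
```

The two metrics answer different questions: the MAE measures how far the predictions are from the true counts on average, while the accuracy only rewards exact hits, so a model can do well on one and poorly on the other.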
Executive summary

The Summer Olympic Games are one of the most followed and important events in the world and, as a consequence, they have always been a subject of research and investigation. The most interesting aspect is knowing the medal table in advance: in other words, who is going to win the Olympic Games? Attention to this theme started a long time ago, and in 1964 Jokl carried out the first official research into the factors that lead to Olympic success. Interest in this prediction has grown over the years, especially in the last two decades, together with the economic interests related to the event. The turning point came with Bernard and Busse’s research in 2000, in which the authors identified the macro socio-economic factors that contributed most to nations’ Olympic performance in terms of medals won. That research became the reference work and, starting from it, several authors tried to improve or deepen the analysis.

Over time, however, the accuracy of the models built on Bernard and Busse’s structure decreased, and the Macro-level partially lost its importance. New studies started to consider the investments and policies adopted by single nations, or by small clusters of them, in order to understand the winning strategies. Nevertheless, the loss of accuracy of Macro-level predictions may be related to the huge changes that have taken place in recent years, and it is possible that introducing new macro variables would restore the explanatory power of the model. In addition, the performance achieved by the two genders has never been deeply investigated, and the effect of the same factors on men and women has to be taken into account to provide a more accurate prediction.
Chapter 1 traces the evolution of the Olympic Games over the years, starting from their origins. A general overview stresses the importance of the event and the interest that has developed around it. In particular, the history of the Olympic Games is traced through several aspects: the number of participating athletes, the increasing number of sponsorships, the coverage of the event and female participation at the Games.

Chapter 2 concentrates on the academic literature review. The review considers all the factors that can determine Olympic success and is split into three parts: the Micro-level, which investigates the perspective of the athlete and the coach; the Meso-level, which analyses the sport policies undertaken by single countries to achieve better results; and the Macro-level, which considers the socio-economic factors that determine the chances of success of all participating nations. At the Micro-level, it turned out that, from the athlete’s perspective, personal commitment and the environment in which the athlete lives are the key factors for success. The Meso-level, instead, showed that a well-developed economy is just the starting point for building Olympic success. In particular, the process that transforms a talented child into an Olympic champion needs to be carefully planned, and only with a correct path is it possible to exploit each nation’s pool of natural talent. At the Macro-level, population and gross domestic product (GDP) appear to largely predict the number of medals won during the Games; however, this correlation is fading with time.

Chapter 3 focuses on the gap in the literature and on the purpose of the research. Although many researchers have tried to predict the medal table of the Olympic Games, all of them used population and GDP as the starting point of their analysis, and it has never been investigated whether new variables could replace them and improve the results. In addition, although a few authors have analysed the female gender separately, nobody has looked at the different impact that the same macro variables could have on the two genders. The purpose of this research is to build, by means of Tobit and multiple linear regression, a reliable model to predict the Olympic medal table considering the overall, male and female results. The Olympic editions of 2004, 2008 and 2012 are used as the training set and the prediction is tested on the 2016 Olympic Games. Thus, at the end of the research it will be possible to answer the questions: “Do Macro variables still predict the Olympic Games? And how do they impact on the different sexes?”

Chapter 4 explains how the data collection was designed. The macro variables were chosen both by analysing those already selected by previous authors and by introducing new ones that, under specific hypotheses, could explain the model. Data for the 201 participating countries were collected from official online databases.

Chapter 5 is dedicated to the methodology and clarifies the technique used to develop the predictive model of the Olympic medal table. Both the Tobit model and multiple linear regression were used. Previous authors demonstrated the superiority of the Tobit model for this kind of prediction; however, multiple regression has been used by several authors, and the last comparison that established the superiority of the Tobit model dates back to 2000. Since this thesis assumes that Bernard and Busse’s model is obsolete, it was necessary to call the choice of regression model into question as well.

Then, in chapter 6, the results obtained by both models are presented.
For each cluster, the outputs of the stepwise analysis are shown together with the results for the single editions. The results are organized by model rather than by cluster, because this makes it possible to compare the different influence that the variables had on the three groups.
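To make the difference between the two model families concrete, the following minimal sketch writes out the log-likelihood that a standard Tobit model maximizes. It assumes left-censoring at zero (a country cannot win fewer than zero medals) and normally distributed errors; the function names, the feature layout and the sample numbers are hypothetical and are not taken from the thesis.

```python
import math
from typing import Sequence


def normal_log_pdf(z: float) -> float:
    """Log-density of the standard normal distribution at z."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def tobit_log_likelihood(
    medals: Sequence[float],
    features: Sequence[Sequence[float]],
    beta: Sequence[float],
    sigma: float,
) -> float:
    """Log-likelihood of a Tobit model left-censored at zero (an assumption).

    Latent outcome: y* = x . beta + error, with error ~ N(0, sigma^2);
    the observed medal count is max(0, y*).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for y, x in zip(medals, features):
        mean = sum(b * v for b, v in zip(beta, x))
        if y > 0:
            # Uncensored observation: ordinary normal density, as in linear regression.
            total += normal_log_pdf((y - mean) / sigma) - math.log(sigma)
        else:
            # Censored observation: probability that the latent outcome is at or below zero.
            # The floor avoids log(0) when that probability underflows.
            total += math.log(max(normal_cdf(-mean / sigma), 1e-300))
    return total


# Illustration with made-up data: one intercept column and one macro variable.
sample_features = [[1.0, 0.0], [1.0, 1.0]]
sample_medals = [0.0, 1.0]
log_lik = tobit_log_likelihood(sample_medals, sample_features, beta=[0.0, 1.0], sigma=1.0)
```

Multiple linear regression treats the zero-medal countries as ordinary observations, whereas the Tobit likelihood above handles them through the censored term. In practice the coefficients and sigma would be obtained by maximizing this function with a numerical optimizer.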