Composing the Optimal Football Squad an Ordered Probit Approach on Changing the World of Football

UNIVERSITY OF AMSTERDAM MASTER THESIS ECONOMETRICS

Composing the Optimal Football Squad: An ordered probit approach on changing the world of football

Thesis presented for the degree of Master of Science in Econometrics

Author: Gijs Kruikemeier
Supervisor: Dr. J. C. M. van Ophem

Student number: 10750754
Second Reader: Dr. M. J. van der Leij

Track: Econometrics
Date: July 25th, 2018

Abstract

Every football club in the world is remembered by its heroic victories. The manager and sporting director of the club can have great influence when it comes to winning prizes. With data-driven analyses likely being the future of the football landscape, models that help clubs manage their teams become more and more relevant. Therefore, this thesis presents a management tool that can optimize a team’s chances of fulfilling its sporting ambitions by adjusting its squad. Over 3,700 Premier League matches from the past ten years are used to estimate an ordered probit model on match outcome. The differences in footballing abilities between the two opposing teams, as measured by the Euro Player Index (EPI), are used as the main explanatory variables. It is found that only the differences in EPI between the central midfielders, the left midfielders, and the substitutes have a significant influence on match outcome. Additionally, a model for player market value, with EPI as explanatory variable, is presented. The results of the ordered probit model for match outcome and the model for player market value are then combined to create the management tool. Subject to a budget constraint, the tool maximises the probabilities of winning points. Given the squads of the opponents in the coming season, the most efficient budget distribution across the team can be obtained. With that, the club can optimise the probability of ending the season at the desired place in the league table.

Statement of Originality

This document is written by Gijs Kruikemeier who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Contents

Abstract
Introduction
2 Literature Review
2.1 Sporting ambitions
2.1.1 Manager performance
2.1.2 Squad performance
2.1.3 Player performance
2.2 Club management
2.3 Individual wages
2.4 Transfers
2.5 Summary and implications
3 Data and Variables
3.1 The Euro Player Index
3.2 Players in the ordered probit dataset
3.3 Matches in the ordered probit dataset
3.4 Relation between EPI and market value dataset
4 Model Specification
4.1 Ordered probit model for match outcome
4.2 Three versions of the model
4.3 Specification tests
4.4 Manager tool
5 Results and Analysis
5.1 “Difference in EPI per position” results
5.2 “Difference in EPI direct opponents” results
5.3 “EPI evaluated team” results
5.4 Model comparison and tests
5.5 The final model
5.6 Manager tool and market value estimation results
6 Summary and Conclusions
References

Introduction

“European football is unquestionably the world’s most popular sport” (Matheson, 2003). While this may seem like a rather bold claim, it can be supported by numbers and facts. The 2017 final of the most prestigious football club tournament, the Champions League, had an estimated global TV audience of 350 million people (Bentley, 2017). In comparison, the estimated number of viewers for the Super Bowl¹ was not even half of that. The European competition for countries in 2016 is even estimated to have attracted two billion television viewers (USA Today, 2016).

A sport this big naturally has a lot of money involved. Football clubs across the globe spent an estimated 6.37 billion dollars on buying players in 2017 (Eurosport, 2018). With clubs spending that much money on transfer fees, let alone the salaries and bonuses they have to pay, it can be questioned whether football clubs are still profitable. Frick (2007, p. 426) indeed states that most clubs try to maximize sporting success instead of business success. He claims that the revenues from ticket sales, merchandise and the sale of television broadcasting rights are directly spent in order to achieve more on the pitch (Frick, 2007, p. 426). It is therefore of great importance that, when a club takes maximizing utility, or sporting success, as its main goal, this optimizing process is thoroughly investigated.

Consequently, there has been a great deal of research in the field of optimizing team performance: the papers of Carmichael, Thomas & Ward (2000), Oberstone (2011), Dobson & Goddard (2003) and Kern & Süssmuth (2005), just to name a few. However, team performance is not the sole means when it comes to achieving sporting ambitions. For a football team to perform, it needs the right people in the right spots. The team’s manager must be able to lead all the individual players and forge them into a solid squad. Moreover, a striker is expected to score goals, a midfielder to give key passes and a keeper to keep clean sheets. Therefore, individual manager and player performance are both important aspects for a club to address. As for manager performance, Audas, Dobson & Goddard (2002), Kern & Süssmuth (2005) and Koning (2003) have all written articles that address that matter. Furthermore, player performance is, amongst others, analysed in McHale, Scarf & Folker (2012), McHale & Szczepański (2014) and Schultze & Wellbrock (2018).

However, despite the findings in these papers, it can be questioned whether all professional football clubs have managed to maximize their sporting success given their budget. Sporting directors still tend to rely on their own intuition or that of a scout when deciding how and with whom to improve their first team’s squad. The reason for this may be that relying on the old scouting ways works well enough for them. However, another explanation is that a model that brings team performance, individual player performance and managerial decisions together is not yet present in the current literature.

Therefore, this thesis aims to create a model that can optimize a team’s chances of fulfilling its sporting ambitions by adjusting its squad. Furthermore, it tries to answer the question at what point the squad is vulnerable and needs improvement. In this process, the question of how and how much achieving sporting ambitions depends on the squad of a football club is central. Consequently, investigating which positions in the field are vital and how player quality influences a team’s result become relevant concepts.

The approach to this matter is that of a three-alternative ordered probit² model. The starting point is a single latent variable that determines whether the observed match outcome is a “win”, “draw” or “loss”. For 3734 matches, divided over ten seasons, in the Premier League³, the outcome of the match is explained by the difference in Euro Player Index⁴ (EPI) for every position, and the difference in average squad age. The Euro Player Index assigns to every player in the dataset a value that represents their footballing abilities. The index stands for the quality of a football player regardless of his position, making comparison between players at different positions possible.

In the model developed in this thesis, the EPI of the left back of one team is compared to the EPI of the right forward of the other. This analysis is done from the point of view that having ascendancy at the most positions, or at the most critical positions, in the field is key in claiming match victory. Another perspective is that the difference in EPI of players at the same positions of both teams is key in winning the match, that is, the EPI of the left back of the one team compared to the EPI of the left back of the other. To this end, a model in which the EPIs of the left backs of both teams, the central backs, the right backs, etcetera are compared is also estimated. The model corrects for team-specific home advantage and the opponent that is faced. In order to be able to correct for home advantage, the outcome of the match is modelled for half of the matches in the dataset where teams played at home, and the other half of the matches where teams played away.

Additionally, next to a model that describes the data, this thesis also presents a managerial tool for clubs to make transfer decisions. This tool is created using the estimated parameters from the probit model for match outcome. In general, the sporting goal of a football club is to attain a certain position (or finish within a certain range of positions) in the league table. For example, clubs like Manchester United, Manchester City and Chelsea may aim to become league winners, whereas clubs like Swansea City, Stoke City and West Bromwich Albion may have avoiding relegation from the Premier League as their sporting ambition.

In the manager tool, given the squads of the opponents in the coming season, the expected number of points that a team will collect in this coming season is optimized over the players in its own squad, given the budget restriction of the club. By observing what number of points resulted in which position in the league table over the past decade, it can be estimated what number of points results in which position in the coming season. By estimating the number of points it will collect, a club can answer the question whether or not it is going to fulfil its sporting ambitions in the coming season. An important side note is that, because the manager tool takes the squads of the other teams in the league as given, the estimated number of points that results from the optimisation is relative to the other teams rather than absolute. Consequently, the tool can only be used for one club per season.

The outline of this thesis is as follows. Chapter 2 gives an overview of the literature that is available on football team performance, individual player performance and manager performance. Chapter 3 explains the dataset and its variables and presents relevant descriptive statistics. Chapter 4 describes the model and estimation procedure. Chapter 5 presents and analyses the results. In chapter 6, a summary of the thesis is presented. Additionally, a few points of discussion are raised, and the implications of the findings are given.

¹ The championship game in the National Football League (American football) in the USA.

² The three-alternative ordered probit model is explained more extensively in chapter 4.

³ The highest-level football competition in England.

⁴ The EPI is developed by Hypercube and owned by Remiqz. Both are football data analytics companies, located in Utrecht and Amsterdam respectively. This is further explained in chapter 3.
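The link between the ordered probit model and the expected number of points described in this introduction can be illustrated with a minimal sketch. It assumes a standard normal error term, two thresholds separating loss, draw and win, and the usual award of three points for a win and one for a draw. The function names, the index values and the threshold values are hypothetical and are not the estimates of this thesis; the actual specification is given in chapter 4.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def outcome_probabilities(index: float, cut_low: float, cut_high: float) -> dict:
    """Ordered probit probabilities of loss, draw and win for one match.

    index    : linear predictor (hypothetical, e.g. built from EPI differences)
    cut_low  : threshold between loss and draw
    cut_high : threshold between draw and win (must exceed cut_low)
    """
    if cut_high <= cut_low:
        raise ValueError("cut_high must be larger than cut_low")
    p_loss = normal_cdf(cut_low - index)         # latent variable below cut_low
    p_win = 1.0 - normal_cdf(cut_high - index)   # latent variable above cut_high
    p_draw = 1.0 - p_loss - p_win                # latent variable in between
    return {"loss": p_loss, "draw": p_draw, "win": p_win}


def expected_points(probs: dict) -> float:
    """Expected points for one match: 3 for a win, 1 for a draw, 0 for a loss."""
    return 3.0 * probs["win"] + 1.0 * probs["draw"]


def expected_season_points(indices, cut_low: float, cut_high: float) -> float:
    """Sum of expected points over all matches of a season."""
    return sum(
        expected_points(outcome_probabilities(i, cut_low, cut_high))
        for i in indices
    )


# Hypothetical example: three matches with illustrative index values.
season = expected_season_points([0.4, -0.2, 0.0], cut_low=-0.5, cut_high=0.5)
print(round(season, 3))
```

In this sketch, a change in the squad would enter through the index of every match, so that comparing the season totals of two candidate squads gives a simple way to rank them under the stated assumptions.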

2 Literature Review

As stated in the introduction, extensive research has been done on sporting achievements in general. The influence of player performance, team performance and manager performance on the sporting ambitions of a football club has been thoroughly investigated in the past decades. This chapter gives an overview of the relevant papers regarding these subjects. Furthermore, the methods and findings of the most important papers are explained. Chapter 2 is organized as follows. It starts by addressing the sporting ambitions of football clubs in section 2.1. The general management of a football club and its revenues are discussed in section 2.2. In section 2.3, the individual wages of players and coaches are considered. The relevant articles on football player transfers are discussed in section 2.4. Finally, chapter 2 is summarized and discussed in section 2.5.

2.1 Sporting ambitions

There has been a great deal of research regarding the performance of sports teams. In other sports, like baseball, the performance and with that the statistics of teams have been investigated extensively. As early as 1982, James (1982) introduced Sabermetrics: the mathematical and statistical analysis of baseball. Since then, many statistics-based books on baseball analysis have been written. Dewan (2006), Albert & Bennet (2003), Keri (2006) and Lewis (2003) are just a few examples. The latter is a book based on the true and fascinating story of the Oakland Athletics and their road to victory. Based on Sabermetrics, their manager composed a team that broke the record of most consecutive wins in American League history. Anderson & Sally (2013) investigate football statistics. Questions like “How valuable are corners?” and “Which goal matters most?” are considered.

As for sporting ambitions in football, to achieve success, the ambitions of a club must match the abilities of the squad. Furthermore, as a football squad cannot perform without a capable manager, his abilities must also be in line with the club’s ambitions. This section about sporting ambitions is divided into section 2.1.1, which contains literature on manager performance, section 2.1.2, which elaborates on squad performance, and section 2.1.3, which regards individual player performance.

2.1.1 Manager performance

The methods of a manager naturally influence the performance of the team. Eventually, the eleven players on the pitch and the substitutes have to finish the job. However, the manager is always responsible for the result. Koning (2003) evaluates the effect of firing a coach on team performance. Based on data from the Eredivisie⁵ from 1993 to 1998, he presents a model in which he controls for the difference in quality of the opponents faced by the old and the new coach. He finds that the performance of a team does not always improve when a coach is fired (Koning, 2003, p. 561).

Another paper that investigates the influence of discharging a manager is that of Audas et al. (2002). They estimate a model based on match-level data. The researchers find that a manager change within the season results in worse short-term performances by the squad. Audas et al. (2002) go on to explain that this may be because players have to adapt to the playing style of the new manager. They state that it may take up to sixteen matches (approximately three months) for a team to unlock its full potential after a within-season manager change (Audas et al., 2002, p. 644).

⁵ The highest-level football competition in The Netherlands.

2.1.2 Squad performance

As stated in the previous section, in the end it is the team that has to deliver. The performance of the squad is decisive for the result. On this matter, a wide variety of research has been done. McHale & Davies (2007) find that simply taking the FIFA world rankings⁶ to predict the outcome of international matches does not work well. These rankings do not adjust fast enough to reflect a team’s current performance.

Audas et al. (2002) estimate the outcome of a football match with an ordered probit model. In their model, the latent variable is explained by the home team’s average win ratio over recent seasons, the results of the recent home matches of the home team and the results of the recent away matches played by the home team. Furthermore, the model corrects, amongst other variables, for being the home or the away team, and for the geographical distance between the two clubs. The researchers find that most of the explanatory variables are significant. For one, the home team’s average win ratios of the past two seasons are significant. The match results of both teams up to three home and three away matches back (i.e. approximately six matches in total) are also of significant influence on the estimated match outcome. Additionally, the variables match significance and geographical distance are significant.

Next to Audas et al. (2002), Koning (2000) also creates an ordered probit model to predict the result of a game. In his model, match outcome is determined by home advantage and the difference in quality between the two opposing teams (Koning, 2000). Another article that uses ordered probit to model match result is that of Kuypers (2000).

Papers that handle football match results in another way are present in the literature as well. Oberstone (2009) investigates team performance in the Premier League to distinguish the top clubs from the rest. McHale & Scarf (2007) create a bivariate model for home and away team shots on target, finding a negative correlation between the two. They find that “playing the beautiful game”⁷ is an effective strategy (2007, p. 444). Successful teams are characterized by their tendency to play beautiful football.

Pollard (2006) presents, based on data on competitions between 1997 and 2003, a measure of home advantage per football league in the world. His main finding is that in the main leagues in Europe⁸, home advantage does exist. In these leagues, the home advantage is between 60% and 65%, that is, 60% to 65% of the points won between 1997 and 2003 were won at home. For the Balkan countries, where up to 78% of the points were won playing at home, home advantage is even more pronounced. On the other hand, countries like San Marino and Andorra, which have small football competitions and very small stadiums, do not seem to display any advantage of playing in the home stadium. As playing at home is known beforehand, and can therefore be used to predict match outcomes as well as to describe them, home advantage is accounted for in the model of this thesis.

Another paper that compares leagues from different countries is that of Oberstone (2011). The researcher presents the main differences between La Liga⁹, Serie A¹⁰ and the Premier League. The findings that are most relevant for this thesis are the following. The Premier League has a significantly lower percentage of shots on target than the Serie A and La Liga. This may be a consequence of the tighter man marking in the Premier League (Oberstone, 2011, p. 11). Furthermore, players in the Serie A have a passing accuracy that is significantly better than that of players in the other two leagues. Additionally, the Serie A has the highest percentage of successful tackles, and both Serie A and the Premier League have a higher average number of tackles per game than La Liga. Lastly, of the three leagues, the Premier League has the lowest number of fouls, yellow cards and red cards (Oberstone, 2011, p. 11).

The results of Pollard (2006) and Oberstone (2011) give reason to develop different models for different countries. In this thesis, only a model for the Premier League is considered. Consequently, as implied by Pollard (2006) and Oberstone (2011), the results in this thesis need not hold for other football competitions.

Dobson & Goddard (2003) investigate whether persistence in sequences of football match results is an issue in the Premier League. They conclude that there is no persistence for sequences of consecutive matches without a loss, and sequences of consecutive losses. However, the researchers find negative persistence for sequences of consecutive wins and sequences of consecutive matches without a win. Furthermore, Dobson & Goddard (2003) state that their results reveal little to nothing about the true existence of a persistence effect. This is due to a selection effect that is present in their model. Teams that have long sequences of not winning are generally the weaker clubs. Thus, their chances of not winning again are based not only on their current dry spell, but also on the fact that their team is weak.

The model that this thesis aims to create is a predictive one. It does not solely focus on the first upcoming game; it predicts the outcome probabilities for all the matches in the coming season. Therefore, if the mood of a football team (i.e. whether a team is experiencing a sequence of consecutive wins or losses) is included in the model, then, as the season progresses, the estimate for the next match depends on the outcomes of the past matches. If these past matches were not forecast correctly, the estimated mood after a few matches is not right either. The estimated outcome probabilities of the next match after this wrongly predicted sequence could become severely distorted. As the season progresses, this problem could get worse. Therefore, the mood of a football team is not included in the model of this thesis.

The last paper that is discussed in this section is that of Carmichael et al. (2000). These researchers present a model for team performance per match. Their dependent variable is the observed team’s goals minus those of the opponent. The variables that this match outcome is regressed upon are, amongst many others, shots hitting the woodwork, clearances, blocks and interceptions, tackles, percentage of successful passes, playing at home, and red cards. Also, a fixed effect per opposing team is added to the model. Except for the dummy variable for home advantage and the fixed effect dummies, all variables are in differences between the opposing teams. The variables mentioned here are all found to have a significant effect on match outcome. Also, as expected, all variables except for red cards have a positive effect on match outcome.

⁶ This ranking system was introduced in 1992 and aims to determine which footballing country is the best in the world based on recent results.

⁷ “The beautiful game” is characterized by a lot of passing and crossing. This type of football is considered to be joyful to watch.

⁸ The main leagues that are considered here are those of Spain, France, Germany, Italy, England and The Netherlands.

⁹ The highest-level football competition in Spain.

¹⁰ The highest-level football competition in Italy.
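The home advantage measure attributed to Pollard (2006) in this section, the share of all points that is won at home, can be illustrated with a short sketch. The function name and the example scores are hypothetical, and the point system of three points for a win and one for a draw is an assumption of this sketch rather than a detail taken from Pollard (2006).

```python
def home_points_share(matches):
    """Share of all points in a set of matches that was won by the home teams.

    matches: iterable of (home_goals, away_goals) tuples.
    Assumes 3 points for a win and 1 point each for a draw.
    """
    home_points = 0
    away_points = 0
    for home_goals, away_goals in matches:
        if home_goals > away_goals:
            home_points += 3
        elif home_goals < away_goals:
            away_points += 3
        else:
            home_points += 1
            away_points += 1
    total = home_points + away_points
    if total == 0:
        raise ValueError("at least one match is needed")
    return home_points / total


# Hypothetical scores, only to show the calculation.
example = [(2, 0), (1, 1), (0, 1), (3, 1)]
print(home_points_share(example))
```

A share of 0.5 would indicate no home advantage under this measure, while values above 0.5 indicate that more points are won at home than away.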

2.1.3 Player performance

With any professional sports team consisting of individual athletes, it is important to investigate sports performance at the individual level. The performance of the individual athlete has been investigated extensively. However, most of the papers on this subject concern athletes who compete in an individual sport, such as tennis or golf. McHale & Forrest (2005) create a model to predict professional golf tournaments, whereas McHale & Morton (2011) investigate tennis match outcomes. The contributions of a player to the result of a match are much clearer if the sport played is an individual one.

Still, research on individual player performance in team sports is available. The first index that rated players in a team sport regardless of their position was the EA Sports player performance index. McHale et al. (2012) analyse the construction of this index. They explain that it is a weighted average of match contributions, winning performance, match appearances, goals scored, assists and clean sheets. Next to McHale et al. (2012), another article on individual player performance is that of Lewis (2005). In his paper, he presents a measure of player performance in cricket. Further, books about this subject in baseball have been written by Goldman & Kahrl (2010) and James & Henzler (2002).

Sill (2010) wrote a book about an adjusted plus/minus (APM) metric in the NBA¹¹. This APM metric starts from the assumption that what matters most is a player’s contribution to the victories of the team. Additionally, it corrects for the teammates and opponents present while the player is on the field. This APM system is also used in hockey, as described by Macdonald (2011), and, of course, in football. For example, Schultze & Wellbrock (2018) used data from the 2012/2013 Bundesliga¹² season and created an individual player performance index that is built up as follows. Whenever a team scores, its players on the field are awarded points. When the team concedes, points are subtracted. This model is then corrected for the strength of the opponent and the timing of the goal. If a goal is scored in crunch time (important goals), it is much more valuable than if it is scored in garbage time (when the outcome of the game is already decided). The identifying assumption in Schultze & Wellbrock (2018) is that all players on the field contribute equally to the result of the match, corrected for the minutes they played (2018, p. 122). Their plus/minus metric assigns a value or index to every player in the system based on their contributions to the results in the 2012/2013 season. The researchers state that “This shows that the plus/minus metric has a dual nature, as it can be used both as an evaluation tool for one team and as a scouting tool for another” (Schultze & Wellbrock, 2018, p. 125).

A system that is similar to the plus/minus metric, but more advanced, is the Euro Player Index. The EPI is used in the model of this thesis as the most important explanatory variable and is explained in the next chapter.

¹¹ National Basketball Association, the highest basketball league in North America.

¹² The highest-level football competition in Germany.
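The basic plus/minus idea discussed in this section can be sketched as follows. This is a deliberately simplified illustration and not the metric of Schultze & Wellbrock (2018): it ignores their corrections for opponent strength and goal timing, and the scaling to 90 minutes, the function name and the example data are assumptions made here.

```python
from collections import defaultdict


def plus_minus_per_90(goal_events, minutes_played):
    """Unadjusted plus/minus per 90 minutes for every player.

    goal_events    : list of (players_on_scoring_side, players_on_conceding_side),
                     each a list of the players on the field at the time of the goal
    minutes_played : dict mapping player name to total minutes played
    """
    net_goals = defaultdict(int)
    for scoring_side, conceding_side in goal_events:
        for player in scoring_side:
            net_goals[player] += 1   # on the field when the team scores
        for player in conceding_side:
            net_goals[player] -= 1   # on the field when the team concedes
    # Every player on the field is treated equally; scale by playing time.
    return {
        player: 90.0 * net_goals[player] / minutes
        for player, minutes in minutes_played.items()
        if minutes > 0
    }


# Hypothetical data: two goals and four players.
events = [(["A", "B"], ["X", "Y"]), (["X"], ["A"])]
minutes = {"A": 90, "B": 45, "X": 90, "Y": 90}
print(plus_minus_per_90(events, minutes))
```

The equal treatment of all players on the field mirrors the identifying assumption mentioned above; any weighting by opponent strength or match situation would have to be added on top of this sketch.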

2.2 Club management

From the point of view of managing any organization, it is important to establish the relationship between the inputs used in production and their relative contributions to output (Carmichael et al., 2000, p. 31). In this approach, a football club is no more than an organization with inputs and outputs. Kern & Süssmuth (2005) state that “Of course most clubs still consider success on the pitch and the glory of victory as their main business objective” (2005, p. 486). Every football club in history is remembered by its great victories, not by its net profit. Consequently, if a football club is considered as an organization, it can boldly be claimed that the input is money, and the output is sporting success. Furthermore, Kern & Süssmuth (2005) state that “Clubs invest in players, coaches and management in order to succeed in the several competitions in which they take part and thereby increase revenue from the gate, broadcasting rights, merchandising and sponsoring” (2005, pp. 485-486). Managing a football club can be seen as a continuing cycle of increasing revenues and investing those revenues in the improvement of the first team squad¹³. A better team is then expected to perform better and increase profits, which can in turn be invested to improve the squad.

Stene (2016) develops a strategic management tool for managers in European professional football. He states that clubs nowadays thrive on a point-maximizing mentality instead of a profit-maximizing mentality. While Stene (2016) does believe that his tool provides managers with a better understanding of the problems they are facing, he concludes by stating that modelling a football club as a business in its entirety is a complicated challenge.

Kern & Süssmuth (2005) examine the economic output of a football club. The researchers use clubs from the Bundesliga to run a pooled regression using the data of two seasons: 1999/2000 and 2000/2001. A Cobb-Douglas type production function is estimated with the log of the club’s adjusted total revenues, ln(REV), as output. They find that participation in the Champions League¹⁴ has a positive influence on revenues. Of course, increased revenues in turn possibly have a positive influence on the probability of entering the Champions League, which may create endogeneity issues. Moreover, if a club has a big fanbase, it generally has a higher income. Kern & Süssmuth (2005) present results in which the logs of the ex-ante estimates of the wage bills of the players and of the coaches have a significant influence on ln(REV). In their final estimate, a 1% increase in player wages results in a 0.52% increase in revenues. If the wage of the coach increases by 1%, the club’s revenues will, according to the model, increase by 0.27%.

However, the researchers also estimate a model in which sporting performance is the dependent variable. For every team, this is a weighted aggregated point index based on up to four competitions in which the team can compete. The weights are determined by the difference in importance between the competitions (e.g. the Champions League is more important than the national cup). Here, they find that the wages of the players as well as that of the coach do not have a significant influence on athletic output. In short, Kern & Süssmuth (2005) find that player and coach wages significantly influence economic output, but not sporting output.

The relation between the input in the football industry, money, and the output, sporting success, is complex. Examining which factors influence individual player salaries may give insight into this complex relation.

¹³ Not all revenues are directly invested in the first team squad. A football club often has a youth academy in which investments also need to be made. Naturally, these investments in the youth academy are also, albeit indirectly, aimed at eventually improving the first team’s results. However, in this thesis, the focus is on directly improving the first team by means of buying and selling players.

¹⁴ The biggest football club tournament in Europe.
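The revenue elasticities reported for Kern & Süssmuth (2005) in this section can be read through a log-log specification. The notation below is an assumption made for illustration (with $W^{P}$ and $W^{C}$ denoting the wage bills of players and coach, and $z$ collecting the remaining regressors, such as Champions League participation); it is not a reproduction of their exact equation.

```latex
\[
\ln(\mathit{REV}_{it}) = \alpha + \beta_{P}\,\ln(W^{P}_{it}) + \beta_{C}\,\ln(W^{C}_{it}) + \gamma' z_{it} + \varepsilon_{it},
\qquad
\beta_{P} = \frac{\partial \ln(\mathit{REV})}{\partial \ln(W^{P})} \approx 0.52,
\quad
\beta_{C} = \frac{\partial \ln(\mathit{REV})}{\partial \ln(W^{C})} \approx 0.27 .
\]
% Worked example for a larger change: a 10% rise in player wages,
% holding everything else fixed, scales revenues by
\[
1.10^{\,0.52} \approx 1.051 ,
\]
% that is, by roughly 5.1%, slightly less than the 5.2% obtained by
% extrapolating the 1% rule linearly.
```

Because both sides are in logs, the coefficients are elasticities, which is why the text can state them directly as percentage responses to a 1% change.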

2.3 Individual wages

For the management of a football club, it is important to investigate the wages of coaches and players, and what variables influence these wages. Battré et al. (2008) present a model for football player wages. Their main objective is to find whether and to what extent performance influences salary. In their paper, performance is measured in terms of, amongst others, career games and goals, the number of games played and goals scored in the last season, and a dummy for team captain. For the period 1995 to 2007, the researchers estimate an equation for 1993 different players in which ln(wage) is the dependent variable. They find that the variable that has the biggest influence is age. The positive influence of age on salary is also found in Lehmann & Schulze (2005), Feess, Frick & Muehlheusser (2004), Lucifora & Simmons (2003) and Huebl & Swieter (2002). Another finding is that players from South America and Western Europe receive a considerable pay premium in comparison to players from the rest of the world. However, this may be a consequence of their way of modelling. It is indeed more logical that a player gets paid based on his abilities and not his country of origin.

Furthermore, a player’s position has a big influence on his wage. Forwards earn the most, then midfielders, then defenders, and goalkeepers earn the least of the squad. Because Battré et al. (2008) also control for the goals scored in the past years, goal scoring cannot be the reason that forwards have the highest salary, followed by midfielders. As for the influence of goals scored on player remuneration, the goals that are scored in the last season have a far greater influence than career goals, that is, recent performance is far more important than past performance. This also holds for the variable games played. The positive effect of player performance on salary is also found by Lucifora & Simmons (2003). They use a cross section from the Serie A and find that the number of games played and the number of goals scored have a significant positive effect on player wages.

Lastly, Battré et al. (2008) find that a so-called “superstar effect” is present. This is the effect that causes the wage of a player to increase because spectators come to the stadium or watch television just to see him play. Battré et al. (2008) conclude by stating that their models explain player salary quite well from the various performance measures.

2.4 Transfers

An important aspect for any sporting director is the transfer value of a desired player. In his negotiation efforts, he will always try to keep the price as low as possible, while the selling party will do the opposite. The question then becomes: what determines transfer value? What aspects of a player make him more valuable than his colleague? A paper that investigated these questions is that of Eschweiler & Vieth (2004). They investigated 254 transfers in the Bundesliga from 1997 to 2003. The researchers find that factors that positively influence a transfer fee are, amongst others, age, not being a goalkeeper, the FIFA coefficient of the country of origin and the number of international caps. Eschweiler & Vieth (2004) also find that age squared and international caps squared negatively influence transfer fee. This indicates that as a player ages, the positive effect of age on transfer fee diminishes. Carmichael, Forrest & Simmons (1999) use a Tobit model with transfer fee as dependent variable to estimate the transfer fee of football players. These researchers find that variables that positively influence transfer fee are age, number of appearances for former and current clubs and number of goals (1999, p. 143). They also find that age squared negatively influences transfer fee. A more recent paper on football transfers is that of Ruijg & Ophem (2015). The researchers create a model that corrects for the selectivity problem that not all transfer fees are observed and thus that the sample used may not be random. In their estimates, they find that the most important variables that influence the transfer value are age, average minutes played and not being a goalkeeper (Ruijg & Ophem, 2015, p. 19). In conclusion, the important variables that are found by all papers to positively influence transfer fee are age, playing matches and not being a goalkeeper.

2.5 Summary and implications

Chapter 2 gave an overview of the literature regarding football team performance, individual player performance and manager performance. Additionally, club management, individual wages and transfers were discussed. In summary, the main implications of the literature for the model of this thesis are the following. In section 2.1.2, it is found that an ordered probit model works well in predicting match outcome. An ordered probit approach to forecasting match results is also the approach of this thesis. However, the difference is that the researchers in section 2.1.2 explain match result based on the difference between teams as a whole, whereas this thesis explains match result based on the differences between the individual players of opposing teams. Furthermore, Carmichael et al. (2000) create a model in which they explain team performance by various variables. Almost all their explanatory variables are expressed as differences between the two teams. Combined with the assumption that player performance is very important in determining match result, this thesis uses differences in EPIs per position as main explanatory variables for match outcome. Additionally, this thesis presents different ways of explaining match outcome based on differences in Euro Player Index. A model in which direct opponents (left back against right winger) are compared is reviewed against a model in which players with the same positions are compared (left back against left back). Furthermore, a model in which just the EPIs of the players of the evaluated team are used as explanatory variables is created. Pollard (2006) finds that home advantage is not the same in the big European leagues but is definitely something that influences match outcome. Therefore, the model of this thesis corrects for home advantage. The technical specification of the model is explained in detail in chapter 4. First, in chapter 3, the datasets that are used in this thesis are described in detail.

3 Data and Variables

In this thesis, an ordered probit model for match outcome is presented. Additionally, a model for player market value with EPI as main explanatory variable is estimated. These two models are then combined in the creation of a manager tool. For the estimation of the ordered probit model on match outcome, and for the estimation of the relation between EPI and market value, two different datasets are used. In this chapter, the variables in the two datasets are explained. First, in section 3.1, the most important variable in both datasets, the Euro Player Index, is explained. Then, in section 3.2, the dataset that is used for the estimation of the ordered probit model is described. Section 3.3 elaborates on the different team formations that were used in the matches in the dataset. Finally, in section 3.4, the dataset that is used for the estimation of the relation between EPI and individual player market value is described.

3.1 The Euro Player Index

In this thesis, the variable that is key in explaining and forecasting match outcome is the Euro Player Index. This index is developed by Hypercube [15] and used in their football analytical models. The construction of the EPI is, based on information from Hypercube and Remiqz [16], explained in general terms in this section. A precise description of the models behind the EPI cannot be given as this information is confidential. To start off this explanation, a short definition of the Euro Club Index (ECI), also developed by Hypercube, is given. The Euro Club Index is a single value given to each club based on their recent performance. After each match the club has played, the index is adjusted based on the result.

[15] Hypercube is, as mentioned before, a football data analytics company located in Utrecht.
[16] Remiqz is, as mentioned before, a football data analytics company located in Amsterdam that works closely with Hypercube.

This adjustment accounts for the ECI of the opponent. If, for example, AFC Ajax, a club with a relatively high ECI, were to play against NAC Breda, a club with a considerably lower ECI, the index of AFC Ajax would not increase a lot if they won the match, as this was already expected based on the ECI of the two teams. On the contrary, if Ajax lose, their index will drop a great deal, because they then lost against a weaker club. Furthermore, the system also corrects for the competition a team is in. If a Premier League team loses against a team from La Liga, all the teams in the Premier League will get a small negative correction because their competition is now of a lower level than before, in comparison with other competitions. Also, all teams in La Liga get a small positive correction.

In July 2007, the Euro Player Index system started. Back then, because the ECI system was already operational, the EPI started by giving every player the starting value of the ECI of their club. After about a year, the indices of all the players were calibrated, and the EPI system was up and running. The Euro Player Index aims to assign a single value to each player in the system based on their footballing abilities. It is an incremental system that updates the EPI for each player after each game they played, and it works as follows. Before every game, the Euro Selection Index (ESI) of both teams is determined. The ESI is the average of the EPIs of the eighteen best players from that club at that particular time. Where the ECI is an index that represents the performance of the club in the past years, the ESI represents the ability of the current squad. Thus, for every game, the expected match outcome is determined based on this ESI. Match outcome is coded as one if the home team wins, zero for a draw, and minus one if the away team wins. Given the historical results in matches between teams with similar ESIs, corrected for home advantage, an expected result is established.
This expected result is a value between minus one and one. As stated by Pollard (2006), the measure of home advantage is not the same for all competitions. Consequently, in this correction for home advantage, it is considered which competition the match is played in. Then, based on the predicted match outcome, the personal EPI, and the ratio of the personal EPI against that of his teammates, an expected value of each individual player is

determined. This individual expected value is, just like the expected match outcome, a number between minus one and one. While the match is played, the expected value of the match result changes depending on whether or not goals are being scored. If, for example, the expected value of the result was 0.3 before the match and no goals are scored, the expected value will go linearly to zero (a draw) as the minutes pass. However, if the home team scores at 0-0 in the 85th minute, the expected match result will jump upwards and will then go linearly to one. The system that assigns an individual expected value to every player is implemented so as to obtain different EPI changes for players in the same team. With this system, the EPI of the best players will rise more slowly and decline faster. For the EPI of the weakest players, the contrary holds. The change in EPI per player per match then depends on the change in his individual expected value between the times he stepped on and off the pitch. Furthermore, goals and their timing (how important was the goal?) are taken into account when determining the change in EPI. Additionally, assists, and yellow and red cards are also taken into account. Say, for example, that Matthijs de Ligt started a match for AFC Ajax. While he was on the field, the score changed from 0-0 to 2-0 in favour of Ajax. If he is then substituted off in the 60th minute and Ajax lose the game 2-3, this is not the fault of De Ligt. While he was on the field, the chances of Ajax winning the game went up. Therefore, his personal contribution to the match outcome is positive and his EPI goes up. With the EPI, a system is created in which all players are assigned an index that is comparable across players, regardless of their positions. So, the footballing ability of a central back can, based on the EPI, be compared to that of a left forward.

To summarize, the change in Euro Player Index for a particular player in a particular match depends on his contribution in trying to favourably change the outcome of the game. As stated before, since the EPI is owned by Remiqz and therefore confidential, a precise explanation of the models behind the construction of the EPI cannot be given here.
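Two of the rules described in this section are simple enough to illustrate in code. The sketch below is not the confidential EPI implementation; it only mirrors the verbal description of the ESI (the average EPI of the eighteen best players) and of the linear decay of the expected result in a goalless match. The function names, the 90-minute match length and the example numbers are assumptions made for illustration.

```python
from statistics import mean


def euro_selection_index(player_epis, squad_size=18):
    """ESI as described in the text: average EPI of the best `squad_size` players."""
    best = sorted(player_epis, reverse=True)[:squad_size]
    return mean(best)


def expected_result_without_goals(pre_match_expectation, minute, match_length=90):
    """Expected match result (between -1 and 1) while no goal has been scored.

    Assumption: the expectation moves linearly from its pre-match value
    towards 0 (a draw), which it reaches at full time.
    """
    remaining_share = max(0.0, 1.0 - minute / match_length)
    return pre_match_expectation * remaining_share


if __name__ == "__main__":
    # Hypothetical squad of 20 players with EPIs 1, 2, ..., 20.
    print(euro_selection_index(list(range(1, 21))))
    # Pre-match expectation of 0.3, evaluated at half time.
    print(expected_result_without_goals(0.3, 45))
```

How the expectation jumps after a goal is not specified in the text and is therefore left out of the sketch.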

3.2 Players in the ordered probit dataset

The dataset that is used in this thesis for the estimation of the ordered probit model is gathered by Gracenote [17]. It was then delivered to Remiqz via Hypercube. The dataset contains information about 3734 matches in the Premier League. The matches were played in the seasons 2008/2009, 2009/2010, … , 2017/2018. In each of these ten seasons, 380 matches are scheduled. In the last season, 2017/2018, the last 66 fixtures had not yet been played when this dataset was created. Consequently, these matches are not in it. After deleting matches that were not fully documented, 3729 matches remained. These matches were played by 36 unique teams, consisting of 1867 unique players. Of every match, the two opposing teams, the stadium, and the final score are known. Additionally, for every match, the players that played in that match are known. That is, there is data only about the players who were on the pitch at some point in the match. Nothing is known about bench players that did not make an appearance. Of the players that were in the starting eleven or were substituted on, their age, their position and the number of minutes they played are known. The players that were substituted on have the position label “SUB”. Thus, of those players it is not clear which position they played. While it is possible to just give the substitute the position label of the player they replaced, it can be argued that this implicitly assumes that a substitution is always made because the player on the field performs badly. It can be questioned whether this is always the case. A manager can choose to make a tactical change in his team, for example when, in the last phase of the match, he wants to favourably change the score by replacing a defender with an attacker. Labelling the player that is substituted on as a defender is then wrong. Furthermore, of all the players, EPIs before and after the match are known. 
Since, as explained in section 3.1, the EPI system is incremental, the EPI of a player is never the same before and after the match.

[17] Gracenote is an American music, video and sports metadata provider.

Table 1 contains the relevant descriptive statistics of the ordered probit dataset. The first thing that stands out is the minimum EPI of -274.29. As can be deduced from the average EPI and its standard deviation, this is extremely low for a player in the Premier League. However, it is not likely to be a representative value for this player’s qualities at that time. For players that are not yet in the system, it takes some time to calibrate their EPIs. The starting value of this particular player (-274.29) is, amongst other things, based on the ECI of the club he came from. Thus, if he transferred from a very weak club to a club in the Premier League, he probably was one of the better players in his former team. Nevertheless, his starting EPI is based on his former club, and is therefore very low. However, after one match, his EPI was 1026.53. This value, as adjusted after one match, probably already reflects his footballing abilities better.

The next thing that stands out is that defenders have a somewhat lower average EPI than midfielders and attackers. However, the standard deviation of their EPI is lower. Furthermore, most players in the dataset play in central midfield. With the highest average EPI of all the positions in the defence and midfield, the central midfielders seem to have an important role in most teams. The lowest average EPIs are those of the right and the left backs.

3.3 Matches in the ordered probit dataset

In the 3729 matches in the dataset, teams played in seven different formations. The formation that is used most (2936 times) is 4-5-1. This means that the team plays with four defenders, five midfielders and one striker. The next most used one (2753 times) is 4-4-2, with four defenders, four midfielders and two strikers. With 1221 and 262 times, the third and fourth most used formations are, respectively, 4-3-3 and 3-4-3. The three least played formations are 3-5-2 (197 times), 5-3-2 (56 times) and 5-4-1 (33 times). It is hard to realistically compare players per position of two teams that play very different formations. Because any such matching is rather arbitrary, little can sensibly be said about which players are direct opponents of each other. To make realistic comparisons possible between two teams on the pitch, matches in which at least one of the teams used one of the three least used formations are dropped. This leaves a dataset of 3459 matches with only 4-5-1, 4-4-2, 4-3-3 and 3-4-3 as used formations. These formations are respectively clarified in Figure 1, Figure 2, Figure 3 and Figure 4.

Subsequently, the formations 4-3-3 and 3-4-3, which are considerably less frequently used than 4-5-1 and 4-4-2, are written as if they were 4-5-1. This is done to make comparisons between teams with different formations less complicated. For the 4-3-3 formation in Figure 3, the Left Midfielder (LM) and the Right Midfielder (RM) are transformed into Central Midfielders (CM). Also, the Left Forward (LF) and the Right Forward (RF) are transformed into LM and RM, respectively. Lastly, the Central Forward (CF) becomes the Striker (ST). This former 4-3-3 formation is now transformed into a 4-5-1 formation. For the 3-4-3 formation in Figure 4, the Central Back (CB) becomes a CM, and the Left Back (LB) and the Right Back (RB) become CB. Furthermore, the LM and RM become LB and RB, respectively. The LF and RF respectively become LM and RM, while the CF is again transformed into an ST, resulting in a 4-5-1 formation. After these transformations, the dataset consists of 4241 starting formations 4-5-1 and 2677 starting formations 4-4-2. Naturally, the decisions made in these transformation processes are somewhat arbitrary. It can be questioned whether writing formations as other, comparable formations is the right way to represent the actual match events.
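The relabelling rules above can be sketched as a simple lookup table. The code below is only an illustration of the verbal description; the line-ups used as input, in particular the assumption that the 3-4-3 formation of Figure 4 has one CB and two CMs, and the function name are hypothetical.

```python
from collections import Counter

# Position relabelling per original formation, following the rules in the text.
TO_451 = {
    "4-3-3": {"LM": "CM", "RM": "CM", "LF": "LM", "RF": "RM", "CF": "ST"},
    "3-4-3": {"CB": "CM", "LB": "CB", "RB": "CB", "LM": "LB", "RM": "RB",
              "LF": "LM", "RF": "RM", "CF": "ST"},
}


def rewrite_as_451(formation, positions):
    """Relabel the starting eleven of a 4-3-3 or 3-4-3 as if it were a 4-5-1."""
    mapping = TO_451[formation]
    # Labels without a rule (for example GK) are kept unchanged.
    return [mapping.get(label, label) for label in positions]


# Assumed position labels of the two original formations.
LINEUP_433 = ["GK", "LB", "CB", "CB", "RB", "LM", "CM", "RM", "LF", "CF", "RF"]
LINEUP_343 = ["GK", "LB", "CB", "RB", "LM", "CM", "CM", "RM", "LF", "CF", "RF"]

if __name__ == "__main__":
    print(Counter(rewrite_as_451("4-3-3", LINEUP_433)))
    print(Counter(rewrite_as_451("3-4-3", LINEUP_343)))
```

Under these assumptions, both line-ups end up with four defenders (LB, two CB, RB), five midfielders (LM, three CM, RM) and one striker.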

3.4 Relation between EPI and market value dataset

The dataset that is used in this thesis to estimate the relation between EPI and market value of football players is gathered by Hypercube, from whom Remiqz received it. The dataset consists of 5259 market value estimates at different points in time of 1769 unique players from 35 different Premier League clubs. Of these players, the age and EPI at the time of the estimation of the market value are known. Furthermore, the club for which they played at that moment is also known. These estimates cover the period from 2008 until 2017. Table 2 contains the relevant descriptive statistics of the EPI and market value dataset. The estimated market values range from 25 thousand to 80 million. With an average of 6.47 million and a standard deviation of 8.36 million, there is great variation in estimated market values. Figure 5 gives a visual representation of the relation between EPI and the logarithm of the estimated market values. The red line is a best-fit third-degree polynomial. This best fit is, while being a third-degree polynomial, almost a straight line, indicating that the relation between the logarithm of market value and EPI is close to linear. With that, the relation between market value and EPI is likely to be exponential.
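The step from a near-linear relation in logarithms to an exponential relation in levels can be written out as follows. Here MV denotes market value, and a and b are hypothetical intercept and slope parameters used only for illustration; they are not estimates from this thesis.

\[
\ln(MV) \approx a + b \cdot \mathrm{EPI} \quad \Longrightarrow \quad MV \approx e^{a} \cdot e^{b \cdot \mathrm{EPI}}
\]

Under this approximation, every additional EPI point multiplies the market value by the same factor $e^{b}$, rather than adding a fixed amount.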

4 Model Specification

The specifications of the model for match outcome and the model for the market value of football players are explained in this chapter. Additionally, the manager tool is described. Match outcome is estimated with an ordered probit model. Three versions of the ordered probit model, all with different explanatory variables, are described in this chapter. The outline of this chapter is as follows. Section 4.1 clarifies the ordered probit model for match outcome. The three different versions of explanatory variables are explained in section 4.2. Then, in section 4.3, the marginal effects for the match outcome model and the Wald specification tests are discussed. Lastly, the manager tool that this thesis aims to provide and the model for market value are described in section 4.4.

4.1 Ordered probit model for match outcome

In this section, the basic model for match outcome is described. In this model, the observed match outcome for the evaluated team is either “win”, “draw” or “loss”. In the three-alternative ordered probit model, this observed outcome is determined by a continuous latent variable. The starting point is the following model for this latent variable.

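\[
y_i^* = x_i'\beta + \varepsilon_i \quad \text{for } i = 1, \ldots, N \text{ football matches}
\]

The three-alternative ordered probit model is then created.

\[
y_i = j \quad \text{if } \alpha_{j-1} < y_i^* \le \alpha_j \quad \text{with } j = 1, 2 \text{ or } 3
\]

Here, j = 1 stands for a loss, j = 2 for a draw and j = 3 for a win. Then, with $\alpha_0 = -\infty$ and $\alpha_3 = \infty$, the probability that the evaluated team in match i gets match outcome j is determined as follows.

\[
p_{ij} := P[y_i = j] = P[\alpha_{j-1} < y_i^* \le \alpha_j] = F(\alpha_j - x_i'\beta) - F(\alpha_{j-1} - x_i'\beta)
\]

Where F is the CDF of $\varepsilon_i$, the standard normal CDF. Furthermore, three binary variables, for each observation in y, are introduced.

\[
y_{ij} = \begin{cases} 1 & \text{if } y_i = j \\ 0 & \text{if } y_i \ne j \end{cases}
\]

Finally, the parameters $\beta$, $\alpha_1$ and $\alpha_2$ are estimated by maximizing the following log-likelihood.

\[
\ln(L) = \sum_{i=1}^{N} \sum_{j=1}^{3} y_{ij} \ln(p_{ij})
\]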
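The outcome probabilities and the log-likelihood of the three-alternative ordered probit model can be sketched in a few lines. The code below is a minimal illustration, not the estimation routine used in this thesis: it takes the linear index of each match and the two thresholds as given, and the function names and example values are assumptions.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written F in the text."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def outcome_probabilities(index, threshold_1, threshold_2):
    """Return (P[loss], P[draw], P[win]) for a given linear index of a match.

    The outer thresholds are minus and plus infinity, so F equals 0 and 1 there.
    """
    p_loss = normal_cdf(threshold_1 - index)
    p_draw = normal_cdf(threshold_2 - index) - p_loss
    p_win = 1.0 - normal_cdf(threshold_2 - index)
    return p_loss, p_draw, p_win


def log_likelihood(indices, outcomes, threshold_1, threshold_2):
    """Sum of log probabilities of the observed outcomes (1 = loss, 2 = draw, 3 = win)."""
    total = 0.0
    for index, outcome in zip(indices, outcomes):
        probabilities = outcome_probabilities(index, threshold_1, threshold_2)
        total += math.log(probabilities[outcome - 1])
    return total


if __name__ == "__main__":
    # Hypothetical thresholds and linear indices for three matches.
    print(outcome_probabilities(0.4, -0.5, 0.5))
    print(log_likelihood([0.4, -0.2, 0.0], [3, 1, 2], -0.5, 0.5))
```

In an actual estimation, the coefficients and thresholds would be chosen to maximise `log_likelihood`, for example with a numerical optimiser.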

For the 3459 matches in the dataset, the evaluated team is alternately chosen to be the home or the away team. So, for half of the matches (1730), the outcome of the match is evaluated from the home team’s perspective, and for the other half (1729), the outcome is evaluated from the away team’s perspective. The outcome of the match is then explained by variables ($x_i$) that are different for every version of the model. However, some control variables are used in every model. It can be reasoned that, for a team that is battling against relegation, a match against a top team is mentally a very different match than one against another relegation candidate. In most cases, it is not unusual for a small club to lose against a top club, whereas the players of the small club are expected to at least draw against another small club. To this end, for every team in the dataset, the opponent of the evaluated team is controlled for by a dummy variable that takes the value one if the opponent of the evaluated team is that particular team, and zero otherwise. Moreover, as mentioned in section 2.1.2, Pollard (2006) finds that home advantage in the Premier League is present and has to be accounted for in predicting match outcome. It can be argued that home advantage is not the same for Manchester United as for AFC Bournemouth. The first club plays their matches at Old Trafford, which has more than 75,000 seats, while the latter plays at Dean Court and can have a maximum support of around 11,500 fans in the stadium. Consequently, the three versions of the model all control for a playing-at-home dummy per team. This is a dummy variable that is equal to one for the home playing team in that match.

4.2 Three versions of the model

In this section, the three versions of the ordered probit model for match outcome are described. They differ in their view on match events and, with that, in their explanatory variables. The first model that is considered is the “Difference in EPI per position” model. This model takes the point of view that the difference in EPI per position is key in explaining match outcome; that is, that it matters which team has the best left back or the best striker. To this end, every player in the evaluated team is compared to the player of the opponent that plays in the same position, for example, the left back of the one team against the left back of the other team. Note, however, that in the formations in Figure 1 and Figure 2, some players have the same label. The data cannot distinguish between the two central backs, the two or three central midfielders and the two strikers. Thus, for example, it is not clear which one of the central backs directly takes on the striker and which one supports the entire defensive line. Consequently, it is not correct to simply compare a random central back of the evaluated team to a random central back of the opponent. Hence, the EPIs of players with the same label in the dataset are pooled together into one average. For both the 4-5-1 and the 4-4-2 formation, this results in eight pooled EPI values per team, namely goalkeeper, left back, central backs, right back, left midfielder, central midfielders, right midfielder and striker(s). The first explanatory variable for the first model is then the EPI of the goalkeeper of the evaluated team minus the EPI of the goalkeeper of the opponent. The second is the EPI of the left back of the evaluated team minus the EPI of the left back of the opponent. This is repeated for all positions so as to obtain eight explanatory variables. Additionally, most teams bring in at least one substitute in every match. Because not all substitutes play the same number of minutes, a weighted average of their EPIs is taken. As mentioned before, of players with the label “SUB”, it is not clear which position they played on the field. So, the explanatory variable that is added to the model is simply the weighted substitute EPI of the evaluated team minus the weighted substitute EPI of the opponent. For the 43 matches in the dataset where at least one of the two teams did not bring in a substitute, this “SUB-SUB” variable is set equal to zero. 
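The construction of these explanatory variables can be sketched as follows. This is an illustration under stated assumptions, not the code used for this thesis: players are represented as hypothetical (position, EPI, minutes) tuples, and the substitute EPIs are assumed to be weighted by minutes played, which the text suggests but does not state explicitly.

```python
from statistics import mean

POSITIONS = ["GK", "LB", "CB", "RB", "LM", "CM", "RM", "ST"]


def pooled_epi(players):
    """Average EPI per position label; players are (position, epi, minutes) tuples."""
    return {pos: mean(epi for label, epi, _ in players if label == pos)
            for pos in POSITIONS}


def weighted_sub_epi(players):
    """Minutes-weighted average EPI of the substitutes, or None without substitutes."""
    subs = [(epi, minutes) for label, epi, minutes in players if label == "SUB"]
    total_minutes = sum(minutes for _, minutes in subs)
    if total_minutes == 0:
        return None
    return sum(epi * minutes for epi, minutes in subs) / total_minutes


def per_position_differences(team, opponent):
    """Eight position differences plus the SUB-SUB difference for one match."""
    own, opp = pooled_epi(team), pooled_epi(opponent)
    features = {f"{pos}-{pos}": own[pos] - opp[pos] for pos in POSITIONS}
    sub_own, sub_opp = weighted_sub_epi(team), weighted_sub_epi(opponent)
    # As in the text: zero when at least one team made no substitution.
    if sub_own is None or sub_opp is None:
        features["SUB-SUB"] = 0.0
    else:
        features["SUB-SUB"] = sub_own - sub_opp
    return features
```

The difference in average age and the control dummies would be added to the resulting dictionary in the same way.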
Furthermore, of both teams in every match, the average age is determined. The last explanatory variable that is added to the model is the average age of the evaluated team minus the average age of the opponent. Also, the control variables, as explained in section 4.1, are added to the model.

The second model is the “Difference in EPI direct opponents” model. Here, the angle is taken that the difference in EPI between direct opponents on the field is crucial. So, if every direct battle on the field is won by one team, that team is likely to win the match. Thus, players of both teams that play in the same area of the field, and will therefore have a lot of direct confrontations, are compared. The three combinations of formations that can play against each other are 4-4-2 against 4-4-2, 4-4-2 against 4-5-1, and 4-5-1 against 4-5-1. For all these combinations, players on the pitch are compared as follows. Since keepers do not have direct battles with a particular player of the opponent, a good comparison with a field player will likely not exist. Therefore, the first explanatory variable is the EPI of the keeper of the evaluated team minus the EPI of the opposing keeper. For the next variable, a direct comparison with an opposing field player is possible. This variable is the EPI of the left back of the evaluated team minus the EPI of the right midfielder of the opposing team. For the other side of the defence, the EPI of the right back of the evaluated team minus the EPI of the left midfielder of the opponent is taken as explanatory variable. The other explanatory variables are the EPI of the central backs of the evaluated team minus the EPI of the striker(s) of the opposing team, the EPI of the left midfielder of the evaluated team minus the EPI of the right back of the opposing team, the EPI of the central midfielders of the evaluated team minus the EPI of the central midfielders of the opposing team, the EPI of the right midfielder of the evaluated team minus the EPI of the left back of the opposing team and, finally, the EPI of the striker(s) of the evaluated team minus the EPI of the central backs of the opposing team. Additionally, the EPI difference of the substitutes and the difference in average age are added to the model in the same way as in the “Difference in EPI per position” model. Also, the control variables are added as explained in section 4.1. 
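The pairing of direct opponents listed above can be summarised in a lookup table. The sketch below assumes that the pooled EPI per position label of both teams is already available as a dictionary; the names are hypothetical and only serve as illustration.

```python
# Own position -> position of the direct opponent, as listed in the text.
DIRECT_OPPONENT = {
    "GK": "GK", "LB": "RM", "RB": "LM", "CB": "ST",
    "LM": "RB", "CM": "CM", "RM": "LB", "ST": "CB",
}


def direct_opponent_differences(own_epi, opponent_epi):
    """EPI of each own position minus the pooled EPI of its direct opponent.

    Both arguments map position labels to pooled EPI values.
    """
    return {f"{pos}-{rival}": own_epi[pos] - opponent_epi[rival]
            for pos, rival in DIRECT_OPPONENT.items()}
```

The table is symmetric: if position A faces position B, then B faces A, so evaluating a match from the other team’s perspective uses the same pairs.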
The explanatory variables in this version of the model are only logical because, as stated in section 3.1, EPIs of players are, regardless of their positions, comparable to each other. Because the explanatory variables in these first two models are differences between the two teams, the models are restrictive in their parameters: the coefficient on the EPI of the opponent is restricted to be the negative of the coefficient on the EPI of the evaluated team. However, the idea of these specifications is to see whether the difference in EPI between players with the same position on the field, or between opposing players, significantly influences match outcome. Because the ordered probit model is not linear, it can be questioned whether it is possible to simply use the EPIs of all players on the field as sole explanatory variables, take the difference in the estimated coefficients and then test for joint significance. The model in which only the EPIs of the evaluated team are used as explanatory variables is considered next.

The third and last model is the “EPI evaluated team” model. The perspective that only the players of the evaluated team make the difference between winning and losing is taken in this model. The explanatory variables in this model are simply the EPIs of every position and the weighted EPI of the substitutes. The EPIs of the players of the opponent are not considered in this version of the model. Only the dummy control variable for the opponent is used here. Moreover, the average age of the evaluated team and the control variables, as explained in section 4.1, are also added.

4.3 Specification tests

In this section, the tests and further analyses that are executed on the models from section 4.2 are explained. First, the three versions of the ordered probit model on match outcome are compared to each other based on their pseudo R-squared value. An R-squared that can be used to express model performance for models estimated by maximum likelihood, such as logistic and probit regressions, is McFadden’s pseudo R-squared (McFadden, 1974). This pseudo R-squared depends on the log-likelihoods of the model without any covariates and of the model as estimated. It is defined as follows.

\[
R^2_{McF} = 1 - \frac{\ln(L_M)}{\ln(L_0)}
\]

Here, $L_M$ is the likelihood of the model that is estimated and $L_0$ that of the model with no predictors. Clearly, this R-squared is not the same as that of an OLS regression. Where the R-squared of an OLS regression is the proportion of variance that is explained by the covariates, McFadden’s R-squared is not to be interpreted in the same way. Consequently, great care is advised in the interpretation of this R-squared. The models in this thesis are compared to each other based on McFadden’s R-squared, but no conclusions are drawn from it as to the explanatory power of these models.

On the model that performs best, the following analyses are executed. Wald tests for the joint insignificance of groups of variables are executed. If the tested variables appear to be jointly insignificant, they are removed from the model. Then, the model is estimated again with only the variables that have not been removed after the Wald tests. Of the variables in this final model, the marginal effects are determined and analysed.
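McFadden’s pseudo R-squared is straightforward to compute once both log-likelihoods are available. The sketch below is an illustration with hypothetical numbers. It uses the fact that an ordered model containing only thresholds reproduces the observed outcome shares, so the log-likelihood of the model without predictors follows directly from the outcome counts.

```python
import math


def null_log_likelihood(outcome_counts):
    """Log-likelihood of the model with thresholds only.

    Without covariates, the fitted probability of each outcome equals its
    sample share, so only the counts of losses, draws and wins are needed.
    """
    total = sum(outcome_counts)
    return sum(n * math.log(n / total) for n in outcome_counts if n > 0)


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden's pseudo R-squared: 1 - ln(L_M) / ln(L_0)."""
    return 1.0 - loglik_model / loglik_null


if __name__ == "__main__":
    # Hypothetical counts of losses, draws and wins, and a hypothetical model fit.
    ll_null = null_log_likelihood([300, 250, 450])
    print(mcfadden_r2(0.9 * ll_null, ll_null))
```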

4.4 Manager tool

The ultimate goal of this thesis is to create a manager tool. This tool can be used by sporting directors and football managers to determine how to divide their budget more efficiently across their squad. In this sense, efficient means that, with a certain budget, the probability for a club to attain their sporting ambitions is maximised. For this manager tool, the final model as described in section 4.3 is used. The construction of the tool is addressed in this section. After the executed tests and analyses described in section 4.3, a final model is obtained. In this final model, the thresholds and coefficients of the explanatory variables are estimated. Given the squads of the evaluated team and the opponent, the probability for each match outcome can then be determined as follows.

∗ �(�) ≔ �[� = �] = �� < � ≤ � = �� − �� − �(� − ��)

Here, the thresholds (�, for j = 1 or 2) and coefficients (�) are known from the estimation of the final model. Using these probabilities, the basis manager tool looks as follows.

ℒ�(�), � = ∑[1 ∗ �(�) + 3 ∗ �(�)] − ��(��) − ��(��)

In this manager tool, the aggregated weighted probabilities of collecting points during the coming season are optimised over the EPIs of the players in the evaluated squad, and lambda.

36 In the tool, � is a vector that is a function of the EPIs of all the positions in the evaluated squad and the average age of the squad. These EPIs and the average age are variables that are not yet known because they will result from the optimisation of the Lagrange function.

Furthermore, $P_{ij}$ also depends on the EPIs of the opponent's players in match i and on their average age. The precise form of $P_{ij}$ is not yet determined in this chapter. It depends on which model specification performs best and on which positions are significant in determining match outcome. The results in chapter 5 lead to a definitive form of $P_{ij}$.

The average EPI of the needed squad is not yet known at the start of the optimisation, as it depends on the variables over which the Lagrangian is optimised. The average EPI of the current squad, however, is known at the start of the optimisation, as it is simply the average of the EPIs of the starting eleven in the current squad of the evaluated team. The probability of winning is multiplied by three because winning a match yields three points and drawing yields one, so it is assumed that any team values a win three times as much as a draw.

The constraint of this basis model, regarding the average EPIs of the current and the needed squad, is added so that the club’s budget is, in a way, accounted for. Given that a football club does not receive a sudden cash injection, it can only distribute its financial resources better across the squad. However, the model above assumes that using the average EPI of the current squad as upper bound for the average EPI of the needed squad represents a relevant budget restriction. It may be more reasonable to estimate the market value of the players in the current squad from their EPIs and to use that value as upper bound for the market value of the players in the needed squad. That upper bound can then be used as budget constraint. With that, the evaluated club can distribute its actual budget better across the squad, rather than only its EPI points.

Figure 5 indicates that the relation between player market value and EPI is likely to be exponential. Therefore, an OLS model with the logarithm of player market value as dependent variable is estimated. In Figure 5, a third-degree polynomial is fitted through the data. As this visually seems a good specification, the starting explanatory variables for the logarithm of player market value are $EPI$, $EPI^2$ and $EPI^3$. After a model for market value as explained by EPI is obtained, the restriction in the Lagrange function is adapted according to the estimated relation. The improved version of the manager tool then looks as follows.

$$\mathcal{L}(x, \lambda) = \sum_i \left[ 1 \cdot P_{i2}(x) + 3 \cdot P_{i3}(x) \right] - \lambda \left( MV_N - MV_C \right)$$

Here, the market value that is needed for the new squad ($MV_N$) depends on the EPIs over which the entire Lagrangian is optimised. It is the sum of the estimated market values of all players that are needed for the squad of the coming year. The aggregated market value of the players in the current squad ($MV_C$) depends on their EPIs, which are known at the time of the optimisation.

5 Results and Analysis

In this chapter, the results for the models explained in chapter 4 are presented and analysed. The results for the three different ordered probit models are given, and the best model in terms of pseudo R-squared value is selected. On this selected model, different Wald tests for simultaneous insignificance are performed. With the variables that remain significant after the Wald tests, a final model is estimated and the marginal effects are computed. Additionally, this chapter presents the results of the OLS estimation of market value using EPI as explanatory variable. Lastly, the final version of the manager tool is presented.

The outline of this chapter is as follows. In section 5.1, the results for the “Difference in EPI per position” model are presented and analysed. The results for the “Difference in EPI direct opponents” model are presented and analysed in section 5.2. Then, in section 5.3, the results for the last model, the “EPI evaluated team” model, are considered. Section 5.4 elaborates on the model selection and the Wald specification tests. In section 5.5, the final specification of the ordered probit model is presented, and the marginal effects of its variables are given and analysed. Furthermore, the results of the OLS estimation for player market value and the final manager tool are presented in section 5.6, which also gives a summary of chapter 5.

5.1 “Difference in EPI per position” results

In this section, the results of the first version of the model, the “Difference in EPI per position” model, are described and analysed. Table 3 presents the results of this model. It is an ordered probit with the match outcome of the evaluated team as dependent variable. Here, match outcome is either “win”, “draw” or “loss”. The explanatory variables in this model are based on the difference in quality between players of the opposing teams who occupy the same position in the formation. So, the variable GK-GK is the EPI of the keeper of the evaluated team minus the EPI of the keeper of the opponent. The same holds for CB-CB, LB-LB, etc. The estimated coefficients and standard errors in the first and second column, respectively, result from an estimation that corrects for neither the opponent nor a team specific home advantage dummy. The model from which the estimates in the third and fourth column result does control for these factors. Since the ordered probit model is not linear, the magnitudes of the estimated coefficients are not straightforwardly interpretable. Therefore, only the significance of the estimates is considered.

In both the models that do and do not control for home advantage and the opponent, the only estimated coefficients that are significant are those of CM-CM, LM-LM and SUB-SUB. These results imply that in most Premier League matches, the outcome is determined by the quality of the midfielders and the substitutions. The team that wins that “midfield” battle is likely to come out on top. All teams in the dataset play either 4-4-2 or 4-5-1. In both these formations, the midfield is heavily occupied compared to the third most used formation, 4-3-3. With two and three central midfielders in the 4-4-2 and 4-5-1 formations, respectively, the finding that the central midfield is rather important in determining match outcome is a logical one.

Furthermore, the fact that left midfielders have a significant estimated coefficient can be a bit misleading. From chapter 3, it is clear that the LFs in the 4-3-3 and 3-4-3 formations become LMs when these formations are transformed into 4-5-1. Additionally, in the 4-5-1 and the 4-4-2 formations, it can be argued that left midfielders play a role in the team’s attacks. As they do not have a left forward in front of them, they are likely to deliver many crosses (high passes from the side of the field into the box) and with that be important in the goal scoring process. According to the model, having a better left midfielder than the opponent increases the chances of a positive result. It is not clear why the right midfielders do not play a significant role in determining match outcome.

Lastly, the weighted average EPI of a team’s substitutions relative to that of the opponent is important. Apparently, the substitutions as determined by the manager are likely to make the difference in the outcome of a match. The pseudo R-squared that is given in Table 3 is McFadden’s R-squared (McFadden, 1974). The interpretation of the pseudo R-squared values in Table 3, and the comparison with the values for the other model specifications, are considered in section 5.4.
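As a brief illustration of how McFadden’s pseudo R-squared is computed, the sketch below applies the usual definition, one minus the ratio of the log-likelihood of the fitted model to that of a model with thresholds only. The function name and both log-likelihood values are hypothetical and chosen only to show the arithmetic; they are not taken from Table 3.

```python
def mcfadden_pseudo_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo R-squared: 1 - lnL(model) / lnL(null)."""
    if loglik_null >= 0:
        raise ValueError("the null log-likelihood of a discrete-outcome model is negative")
    return 1.0 - loglik_model / loglik_null


# Hypothetical log-likelihoods, chosen only to illustrate the calculation.
ll_null = -4000.0   # model with thresholds only
ll_model = -3600.0  # model with explanatory variables
pseudo_r2 = mcfadden_pseudo_r2(ll_model, ll_null)
print(round(pseudo_r2, 4))
```

Because the measure compares log-likelihoods rather than explained variance, values around 0.10 are not directly comparable to an OLS R-squared.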

5.2 “Difference in EPI direct opponents” results

In this section, the results of the second version of the model, the “Difference in EPI direct opponents” model, are described and analysed. Table 4 presents the results of this model. The model in this section only differs from the model in section 5.1 in its explanatory variables. Here, these variables are based on direct confrontations on the pitch. The central backs of the evaluated team will likely have a lot of personal duels with the striker(s) of the opponent. Hence, the variable CB-ST is the average EPI of the central backs of the evaluated team minus the (average) EPI of the striker(s) of the opponent. The other variables are constructed in the same way.

The only estimated coefficients that are significant in both the models that do and do not control for team specific home advantage and the opponent are those of CM-CM, LM-RB and SUB-SUB. Additionally, the difference in average age between the two teams is significant at the 10% level in the model that does not correct for home advantage and opponent. The negative sign of this coefficient indicates that having a younger team positively contributes to match outcome. The significant estimate of the coefficient of the variable CM-CM is again an indicator that the difference in footballing ability between the central midfielders of the two teams is often decisive in determining match outcome. Additionally, if the left midfielder of the evaluated team has a higher EPI than the right back of the opposing team, the chances of a positive result for the evaluated team are greater. As mentioned before, the pool of left midfielders also includes left forwards. Additionally, the left midfielders that are originally classified as such do not have a teammate in front of them. Therefore, they could be participating in a significant number of attacks as a left forward in disguise. The difference in the weighted average EPI of the substitutions between the two teams is, just as in the model in section 5.1, likely to be decisive in match outcome. The pseudo R-squared values are considered in section 5.4.

5.3 “EPI evaluated team” results

In this section, the results of the third version of the model, the “EPI evaluated team” model, are described and analysed. Table 5 presents the results of this model. Here, the explanatory variables are the EPIs of the players of the evaluated team. With that, the sole effect of the position specific EPI contribution to match outcome is estimated. The estimated coefficients of the EPI of the goalkeeper and that of the defence are not significant. However, because only Premier League matches are in the dataset, this set is subject to a selection effect. The EPIs of goalkeepers and defenders in the Premier League are generally not of an exceptionally low level, simply because they play in the Premier League. Concluding from the insignificance of these variables that the EPI of defenders and goalkeepers does not matter for match outcome may therefore not be right.

The estimated coefficients of the central and the left midfielders are significant in both the corrected and the uncorrected model at the five percent level or better. Regardless of the direct opponent or the opposing player in the same position, these positions are still key in determining match outcome. Additionally, the coefficient of the weighted average EPI of the substitutions is also significant in the corrected and the uncorrected model. The pseudo R-squared values of Table 5 are considered in section 5.4.

5.4 Model comparison and tests

In this section, the three models are compared in terms of their pseudo R-squared value. On the model that performs best, a number of Wald tests for simultaneous insignificance are performed. Variables that are simultaneously insignificant are not used in the estimation of the final model.

In every specification of the model, the pseudo R-squared values of the models that do not correct for team specific home advantage and the opponent are lower than those of the models that do. Thus, the best performing model is one that does correct for team specific home advantage and opponents. Table 3, in which the “Difference in EPI per position” model is considered, presents a pseudo R-squared value of 0.1024. In Table 4, the “Difference in EPI direct opponents” model performs only slightly worse, with a pseudo R-squared of 0.1022. The last one, the “EPI evaluated team” model, performs worst, with a pseudo R-squared value of 0.0999. As mentioned before, these values do not represent the share of explained variation in the same way as the R-squared of an OLS regression would. They are merely used to compare the models in this thesis to each other.

With the “Difference in EPI per position” model that corrects for team specific home advantage and the opponent slightly outperforming the other models, the Wald tests for simultaneous insignificance are executed on that particular model. From Table 3, it is clear that the only variables that significantly influence match outcome are CM-CM, LM-LM and SUB-SUB. In this section, it is tested whether the coefficients of certain groups of insignificant variables are simultaneously equal to zero. Table 6 presents the results of these tests.

First, it is tested whether the coefficients of the differences in EPI per position in the defence (central backs, left back and right back) are simultaneously equal to zero. The test statistic of 2.69 has a p-value of 44.27% under a chi-squared distribution with three degrees of freedom, so the null hypothesis that these variables are simultaneously insignificant cannot be rejected. Then, it is tested whether the defence and goalkeeper differences between the two teams are simultaneously insignificant. Here too, the null hypothesis of simultaneous insignificance cannot be rejected. The same holds for the full set of difference in EPI variables that are individually insignificant: central back, left back, right back, goalkeeper, right midfielder and striker. Lastly, the difference in age variable, which is individually insignificant, is added. This does not lead to rejecting the null hypothesis either. The results in Table 6 lead to the removal from the model of the variables GK-GK, CB-CB, LB-LB, RB-RB, RM-RM, ST-ST and AGE DIFF. Section 5.5 elaborates on the estimation of the final version of the ordered probit model and the marginal effects of the remaining variables.
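The p-value of the first Wald test can be checked by hand. The sketch below uses the closed-form upper-tail probability of a chi-squared distribution with three degrees of freedom, so only the Python standard library is needed. The function name is an assumption made for this illustration; the statistic 2.69 is the rounded value quoted above, so the result agrees with the reported 44.27% only up to rounding.

```python
import math


def chi2_sf_df3(stat: float) -> float:
    """Upper-tail probability P(X >= stat) for X ~ chi-squared with 3 degrees of freedom.

    Closed form: erfc(sqrt(stat / 2)) + sqrt(2 * stat / pi) * exp(-stat / 2).
    """
    if stat < 0:
        raise ValueError("a Wald statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0)) + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0)


# Wald statistic for the joint test on CB-CB, LB-LB and RB-RB, as quoted in the text.
p_value = chi2_sf_df3(2.69)
print(f"p-value: {p_value:.3f}")
```

A p-value this far above conventional significance levels is what motivates dropping the three defensive variables as a group.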

5.5 The final model

In this section, the final ordered probit model is presented. Additionally, the marginal effects of the explanatory variables in the final model are determined. Table 7 presents the estimation results of the model that resulted from section 5.4. The variables that were found to be both individually and simultaneously insignificant in determining match outcome are dropped. This results in an ordered probit model in which match outcome is determined by the difference in EPI of the central midfielders, the left midfielders and the substitutions between the evaluated and the opposing team. The estimated coefficients of these variables are all positive and significant at the one percent level. Their positive sign means that if the evaluated team has a better player or better players at that position, its winning chances grow with the difference. Because substitution is not really a predefined position, this coefficient is not interpreted in terms of position in the field. It is merely a sign that the team with the better bench players, or substitutions, is more likely to obtain a positive match result as the EPI difference between its substitutions and those of the opponent becomes larger. The pseudo R-squared value of the final model is 0.1015, which is marginally smaller than that of the model in Table 3.

As mentioned before, the estimated coefficients in an ordered probit model specification are not easy to interpret. That is, the estimated coefficients do not directly say something about the outcome probabilities of a certain match. It is more informative to obtain marginal effects for every variable and for every match outcome, as marginal effects are more straightforward to interpret. These marginal effects show how the outcome probabilities change with a change in the explanatory variables.

Table 8 presents the marginal effects for the final model specification. The marginal effect of -0.0000436 that is estimated for the variable CM-CM and the match outcome “loss” indicates that if the evaluated team has an average central midfield that is one EPI point better than that of the opponent, it is 0.00436 percentage points less likely to lose the match. For all possible match outcomes, the variable CM-CM has the most influence on the change in probability of obtaining a certain outcome, followed by LM-LM and SUB-SUB. The latter has a considerably lower marginal effect on match outcome than the first two. For all variables, the logical conclusion can be drawn that investing in better players at these positions leads to increasing win probabilities and decreasing draw and loss probabilities, given that the opponent’s squad stays the same. The implementation of the final ordered probit model in the manager tool, and the extra constraints that the final model imposes on this tool, are considered in section 5.6.
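To make the link between coefficients and marginal effects concrete, the sketch below evaluates the standard ordered probit marginal effect, [f(α_{j-1} − x'β) − f(α_j − x'β)]·β_k with f the standard normal density, using the thresholds and coefficients reported in section 5.6. It is an illustration under assumptions: the function names are hypothetical, the effects are evaluated at zero EPI differences, and the home advantage and opponent controls are ignored, so the numbers need not reproduce Table 8.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
# Thresholds and coefficients as reported in section 5.6 (Table 7).
ALPHA = (-math.inf, -0.9627, -0.1845, math.inf)
BETA = {"CM-CM": 0.0001359, "LM-LM": 0.0001348, "SUB-SUB": 0.0000998}
OUTCOMES = ("loss", "draw", "win")  # j = 1, 2, 3


def density(z: float) -> float:
    """Standard normal density, defined as zero at plus or minus infinity."""
    return 0.0 if math.isinf(z) else STD_NORMAL.pdf(z)


def marginal_effects(x: dict) -> dict:
    """Marginal effect of each variable on each outcome probability at the point x."""
    index = sum(BETA[name] * x[name] for name in BETA)
    effects = {}
    for j, outcome in enumerate(OUTCOMES, start=1):
        scale = density(ALPHA[j - 1] - index) - density(ALPHA[j] - index)
        effects[outcome] = {name: scale * coef for name, coef in BETA.items()}
    return effects


# Evaluation point assumed for illustration: both teams equally strong.
effects = marginal_effects({"CM-CM": 0.0, "LM-LM": 0.0, "SUB-SUB": 0.0})
for outcome in OUTCOMES:
    print(outcome, {name: f"{value:.7f}" for name, value in effects[outcome].items()})
```

For each variable the three effects sum to zero, since a gain in the win probability must be offset by lower draw and loss probabilities.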

5.6 Manager tool and market value estimation results

In this section, the model for the market value of players is presented. EPI and its square and cube are used as explanatory variables in an OLS regression for market value. Then, using these regression results, the final version of the manager tool is presented.

The restriction that is implemented in the manager tool of this thesis ensures that only a different distribution of capital across the squad can be obtained. So, if a player with a higher EPI is needed for some position, a player at another position must be sold to raise the capital for the new transfer target. The tool thus assumes that there is no large capital injection into the transfer budget; accommodating one requires adjusting the constraint. The market value of a player is not a predefined price tag. It is possible for clubs to disagree on the value of a player. Consequently, it can be argued that an estimate of the market value based on a player’s footballing abilities, instead of subjective opinions, is the better option.

In Table 9, the results of a model for market value with EPI, and its square and cube, as explanatory variables are presented. The estimated coefficients of two of the three EPI terms are significant at the one percent level, and the coefficient of the third is significant at the ten percent level. The R-squared value indicates that 62.79% of the variation in the logarithm of market value is explained by the model. The estimated coefficients in Table 9 are used in the budget restriction in the final version of the manager tool.

In the first version of the manager tool, as proposed in section 4.4, all positions in the field are used in the set of variables over which the Lagrange function is optimised. However, from the results in section 5.4, it became clear that some of these positions do not have a significant effect on match outcome. Therefore, section 5.5 presents the final estimated probit model, in which only the differences in EPI values between the two opposing teams for the central midfielders, the left midfielder and the substitution(s) are used as explanatory variables. However, as briefly mentioned in section 5.3, the dataset that is used in this thesis is subject to a selection effect. While the results of section 5.4 indicate that the quality of some players in the squad relative to the opponent does not matter, it can be argued that below a certain point, it does matter how bad a player is. If a Premier League manager decides to put a random left back from the fourth division in England in his starting eleven, it can be questioned whether the quality of that left back relative to the quality of the opponent’s left back still makes no difference in determining the outcome of the match. The insignificance of some estimated coefficients in the regression results of Table 3 may be caused by the fact that players in the Premier League seldom have an extremely low EPI value. They are almost always better than a certain lower bound when it comes to footballing abilities. Therefore, simply leaving the EPIs of the players that seemingly have no influence on match outcome out of the manager tool may not lead to a relevant Lagrange optimisation. The restriction that these players still have to be better than a certain lower bound has to be accounted for in the optimisation process.

As lower bound for every position, the fifth percentile of all the EPI values in the dataset for that position is taken. These lower bounds are presented in Table 10. Additionally, for reference, the first and tenth percentiles are also reported. It can be argued that the lower bound for a right back at Manchester United is higher than the lower bound for a right back at Huddersfield. Consequently, when the manager tool is implemented at a specific club, the lower bound can be altered to comply with the wishes and intentions of that club. The fifth percentile as lower bound merely gives an idea of how the manager tool could be used in practice.

The final version of the manager tool uses the estimated coefficients from the ordered probit regression in Table 7 and the estimated coefficients from the OLS regression in Table 9. Additionally, this final version uses the fifth percentiles in Table 10 as lower bounds for the insignificant positions. The final version of the manager tool looks as follows.

$$\mathcal{L}(x, \lambda) = \sum_i \left[ 1 \cdot P_{i2}(x) + 3 \cdot P_{i3}(x) \right] - \lambda' g$$

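Here, $P_{ij}(x)$ is the probability that match i ends in result j for the evaluated team. For j = 2, the match ends in a draw, and for j = 3, the match is won by the evaluated team. These probabilities depend on the EPI values of the players of the two opposing teams. In these probabilities, the estimated coefficients and thresholds from Table 7 are used. $P_{ij}(x)$ is then determined as follows.

$$P_{ij}(x) := P[y_i = j] = P[\alpha_{j-1} < y_i^* \le \alpha_j] = F(\alpha_j - x'\beta) - F(\alpha_{j-1} - x'\beta)$$

Here, $\alpha_j$ is element j in the vector of estimated thresholds $\alpha$ and F is the cumulative distribution function of the standard normal distribution. The vectors $\alpha$, $x$ and $\beta$ are defined as follows.

$$\alpha = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} -\infty \\ -0.9627 \\ -0.1845 \\ \infty \end{pmatrix}, \qquad x = \begin{pmatrix} CM_E - CM_O \\ LM_E - LM_O \\ SUB_E - SUB_O \end{pmatrix}, \qquad \beta = \begin{pmatrix} 0.0001359 \\ 0.0001348 \\ 0.0000998 \end{pmatrix}$$

In $x$, the subscript E denotes the evaluated team and O its opponent in match i.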
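The probability formula and the vectors above can be turned into a small calculation. The sketch below computes the three outcome probabilities and the objective term 1·P(draw) + 3·P(win) for a single match, using only the thresholds and coefficients given above. The function names and the example EPI differences are assumptions for illustration, and the home advantage and opponent controls of the estimated model are left out.

```python
import math
from statistics import NormalDist

F = NormalDist().cdf  # standard normal cumulative distribution function
ALPHA = (-math.inf, -0.9627, -0.1845, math.inf)  # thresholds alpha_0 .. alpha_3
BETA = (0.0001359, 0.0001348, 0.0000998)         # CM, LM, SUB coefficients


def outcome_probabilities(diffs):
    """Return P(loss), P(draw), P(win) for EPI differences (CM, LM, SUB)."""
    index = sum(b * d for b, d in zip(BETA, diffs))
    cumulative = [0.0 if a == -math.inf else 1.0 if a == math.inf else F(a - index)
                  for a in ALPHA]
    loss, draw, win = (cumulative[j] - cumulative[j - 1] for j in (1, 2, 3))
    return {"loss": loss, "draw": draw, "win": win}


def expected_points(diffs):
    """Objective term for one match: one point for a draw, three for a win."""
    p = outcome_probabilities(diffs)
    return 1.0 * p["draw"] + 3.0 * p["win"]


# Hypothetical EPI differences, chosen only for illustration.
equal = outcome_probabilities((0.0, 0.0, 0.0))
stronger = outcome_probabilities((500.0, 300.0, 100.0))
print(equal)
print(stronger)
print(expected_points((0.0, 0.0, 0.0)), expected_points((500.0, 300.0, 100.0)))
```

Summing `expected_points` over all matches of the season gives the bracketed part of the Lagrangian.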

The EPI values of the opponents in the matches of the coming season are those of the players that have the highest EPI in that position in the opponent’s squad. Consequently, the optimisation gives the best results if the squads of the other teams in the competition are final. So, the closer to the transfer deadline, the better the results the manager tool will provide. Furthermore, $g$ is the vector of constraints that the model is subject to, with $\lambda$ being the vector of Lagrange multipliers. Both vectors look as follows.

$$g = \begin{pmatrix} MV_N - MV_C \\ P5_{GK} - GK \\ P5_{CB} - CB \\ P5_{LB} - LB \\ P5_{RB} - RB \\ P5_{RM} - RM \\ P5_{ST} - ST \end{pmatrix}, \qquad \lambda = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \\ \lambda_4 \\ \lambda_5 \\ \lambda_6 \\ \lambda_7 \end{pmatrix}$$
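The lower-bound constraints can be illustrated with a short calculation. The sketch below takes the fifth percentile of a list of EPI values for one position and checks whether a proposed squad respects the bounds. All EPI values, the helper names and the choice of the “inclusive” interpolation method are assumptions for this example; the actual bounds are those in Table 10.

```python
from statistics import quantiles


def fifth_percentile(values):
    """Fifth percentile: the first of the 19 cut points that split the data into 20 parts."""
    return quantiles(values, n=20, method="inclusive")[0]


def violated_bounds(squad, bounds):
    """Positions whose EPI lies below the lower bound for that position."""
    return [pos for pos, bound in bounds.items() if squad[pos] < bound]


# Hypothetical EPI observations per position (not taken from the dataset).
observed = {
    "GK": [400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900],
    "LB": [300, 350, 420, 480, 510, 560, 610, 640, 700, 760, 820],
}
bounds = {pos: fifth_percentile(values) for pos, values in observed.items()}

# Hypothetical proposed squad: the left back falls below the bound.
proposed = {"GK": 620, "LB": 310}
print(bounds)
print(violated_bounds(proposed, bounds))
```

A club with higher ambitions could swap in a stricter percentile by changing the cut point that `fifth_percentile` returns.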

The fifth percentiles of the different positions are taken from Table 10. The market value of the squad that results from the optimisation of the Lagrangian ($MV_N$) depends on the EPI values of the players in that optimal squad. The market value of the current squad ($MV_C$) is simply estimated from the EPIs of the players in the current squad. The model for this market value estimation looks as follows.

$$MV = \exp\left( \gamma_0 + \gamma_1 \cdot EPI + \gamma_2 \cdot EPI^2 + \gamma_3 \cdot EPI^3 \right)$$

Here, the gamma values are the estimated coefficients from Table 9.

$$\gamma = \begin{pmatrix} \gamma_0 \\ \gamma_1 \\ \gamma_2 \\ \gamma_3 \end{pmatrix} = \begin{pmatrix} 11.61 \\ 0.0012 \\ 1.13 \cdot 10^{\ldots} \\ -2.53 \cdot 10^{\ldots} \end{pmatrix}$$
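The market value model translates directly into a function. The sketch below evaluates the exponential of the cubic in EPI for a single player and sums it over a squad. The function names and the EPI values are hypothetical. Only the first two coefficients are taken from the text; the exponents of the quadratic and cubic coefficients are not legible here, so those two terms are set to zero as a placeholder and the printed values are purely illustrative.

```python
import math


def market_value(epi, gamma):
    """Estimated market value: exp(g0 + g1*EPI + g2*EPI^2 + g3*EPI^3)."""
    g0, g1, g2, g3 = gamma
    return math.exp(g0 + g1 * epi + g2 * epi ** 2 + g3 * epi ** 3)


def squad_value(epis, gamma):
    """Aggregated estimated market value of all players in a squad."""
    return sum(market_value(epi, gamma) for epi in epis)


# gamma_0 and gamma_1 as reported; gamma_2 and gamma_3 are PLACEHOLDERS (zero),
# because their orders of magnitude cannot be read from the text.
GAMMA_ILLUSTRATIVE = (11.61, 0.0012, 0.0, 0.0)

# Hypothetical EPI values for three players.
current_squad = [800.0, 950.0, 1100.0]
budget = squad_value(current_squad, GAMMA_ILLUSTRATIVE)
print(f"estimated value of current squad: {budget:,.0f}")
```

With the full coefficient vector from Table 9 passed as `gamma`, `squad_value` gives the quantities $MV_N$ and $MV_C$ used in the budget constraint.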

The estimated market value of the squad is simply the sum of the estimated market values of the players in the squad. Optimising the Lagrangian with respect to the EPI values in $x$ of the evaluated team and to $\lambda$ results in a budget distribution across the squad that optimises the probabilities of winning points and, with that, the probability of reaching the sporting ambitions of the club.

In this chapter, the results for the three versions of the ordered probit model were presented. In all of the models, the variables that were (some sort of transformation of) central midfielders, left midfielders and substitutions were the significant variables. The other positions had no significant influence on the result of the match. Based on the pseudo R-squared value, the three models were compared. However slightly, the “Difference in EPI per position” model that corrects for a team specific home advantage and the opponent performed best. On this model, Wald tests for simultaneous insignificance of several groups of variables were performed. The tested groups were all simultaneously insignificant. Then, a final ordered probit model was estimated with only the variables that had a significant influence on match result. The last section of the chapter presented the results for an OLS model that estimated player market value with EPI, EPI squared and EPI cubed as explanatory variables. With the results of the market value model and of the final ordered probit specification, the final version of the manager tool was presented.
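As a rough end-to-end illustration of the manager tool in section 5.6, the sketch below searches for the squad that maximises expected points without exceeding the estimated value of the current squad. It deliberately swaps the Lagrangian optimisation for a brute-force grid search, which is easier to follow but not the method of this thesis. All EPI values, the grid and the function names are hypothetical, each position is treated as a single player, the lower-bound constraints are omitted, and the quadratic and cubic market value coefficients are set to zero as placeholders.

```python
import itertools
import math
from statistics import NormalDist

CDF = NormalDist().cdf
ALPHA_1, ALPHA_2 = -0.9627, -0.1845        # finite thresholds from the text
BETA = (0.0001359, 0.0001348, 0.0000998)   # CM, LM, SUB coefficients
GAMMA = (11.61, 0.0012, 0.0, 0.0)          # last two entries are placeholders


def expected_points(own, opponent):
    """1 * P(draw) + 3 * P(win) for one match, given EPIs for (CM, LM, SUB)."""
    index = sum(b * (o - p) for b, o, p in zip(BETA, own, opponent))
    p_loss = CDF(ALPHA_1 - index)
    p_win = 1.0 - CDF(ALPHA_2 - index)
    p_draw = 1.0 - p_loss - p_win
    return 1.0 * p_draw + 3.0 * p_win


def season_points(own, opponents):
    return sum(expected_points(own, opp) for opp in opponents)


def squad_value(epis):
    g0, g1, g2, g3 = GAMMA
    return sum(math.exp(g0 + g1 * e + g2 * e ** 2 + g3 * e ** 3) for e in epis)


def best_allocation(current, opponents, grid):
    """Grid search over (CM, LM, SUB) EPIs whose value stays within the current budget."""
    budget = squad_value(current)
    best, best_points = tuple(current), season_points(current, opponents)
    for candidate in itertools.product(grid, repeat=len(BETA)):
        if squad_value(candidate) > budget:
            continue  # violates the budget constraint MV_N <= MV_C
        points = season_points(candidate, opponents)
        if points > best_points:
            best, best_points = candidate, points
    return best, best_points


# Hypothetical current squad and opponents, as (CM, LM, SUB) EPI values.
current = (1000, 800, 900)
opponents = [(1100, 900, 800), (900, 1000, 1000), (1200, 700, 900)]
best, best_points = best_allocation(current, opponents, range(500, 1501, 100))
print(best, round(best_points, 4), round(season_points(current, opponents), 4))
```

A continuous optimiser with the full constraint vector would replace the grid search in practice; the grid only makes the trade-off between positions visible.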

6 Summary and Conclusions

To summarize, in this thesis, the following processes were carried out and conclusions were drawn. An extensive literature review was presented in chapter 2. As suggested by Audas et al. (2002), Koning (2000) and Kuypers (2000), an ordered probit model specification for the estimation of match result was proposed in chapter 2. Additionally, Carmichael et al. (2000) use a model for team performance in which nearly all of the explanatory variables are differences between two opposing teams. The assumption that individual player performance is important in determining match result was then combined with the findings of these papers. This resulted in an ordered probit model for match outcome with the differences in EPI values between the opposing teams as main explanatory variables. Also, Pollard (2006) finds that home advantage influences match outcome. Thus, next to a team fixed effect, the model of this thesis corrected for a team specific home advantage.

Chapter 3 described the datasets for the ordered probit model and the market value model. For the ordered probit model, the formations in the 3729 matches, divided over ten seasons, were converted into only 4-4-2 and 4-5-1 to make a comparison between opposing teams possible. As explained in chapter 4, three different versions of explanatory variables were used for the ordered probit model. The results for the “Difference in EPI per position”, “Difference in EPI direct opponents” and “EPI evaluated team” models were presented in chapter 5. There, it was found that only the positions of central midfielder, left midfielder and the substitutions were of significant influence on match outcome. The result that central midfielders significantly influence the outcome of the match may be caused by the fact that the only possible formations have two or three central midfielders. Also, caution is needed in concluding from the significance of the estimated coefficient of the left midfielder that this position is vital. The 4-4-2 and 4-5-1 formations both have no left winger. So, all the actual left wingers in the dataset were transformed into left midfielders. It was concluded in chapter 5 that the significant effect of the left midfielders may be caused by their role as hidden left winger. A reason for the insignificance of the right midfielders, while the left midfielders do have a significant effect, was not found.

The fact that the other positions do not appear to affect the outcome of the match may be caused by a selection effect in the dataset, which only contains Premier League teams. Because teams in the Premier League generally have a certain EPI value that serves as lower bound for every first team player, a possible negative effect of putting a really bad player in the starting eleven can never be found. Therefore, restrictions on the lower bound of the EPI values of the players in the insignificant positions were added to the manager tool in chapter 5. The other constraint that the manager tool is subject to is the budget constraint. This constraint makes sure that the budget of the club is distributed more efficiently, rather than that a big cash input is used to buy new players. For clubs that do have a big incoming cash flow for the coming season, the budget constraint needs some adaptation. The market value of the squad that is used in this constraint was estimated by an OLS regression with EPI, EPI squared and EPI cubed as explanatory variables. These variables were all found to have a significant influence on the market value, and an R-squared value of 0.6279 was obtained for this regression.

In this thesis, a manager tool was created that optimises a team’s chances of fulfilling its sporting ambitions by adjusting its squad. With this, the question at what point the squad is vulnerable and needs improvement was implicitly answered. That is, for every club that uses the manager tool, different shortcomings in its squad will come to light. To this end, it was investigated which positions in the field are vital and how player experience and quality influence the result of a match. The most important positions in determining the outcome of a football match are found to be central midfielder, left midfielder and substitution. Given that the EPI value is above a certain lower bound, the EPI values of players at other positions do not significantly influence the match result.

However, the fact that the dataset is restricted to Premier League players, who are of a certain high level, is a limitation of this thesis. Further research should include leagues of different levels to overcome this limitation. Also, the conclusions of this thesis only hold for Premier League clubs that play specific formations. Finding a way to include different formations would be a useful extension of this thesis. Furthermore, including different competitions from different countries could result in interesting findings. Lastly, a more specific dataset that makes a distinction between the two central backs, the central midfielders and the strikers is a way to improve the models of this thesis.

References

Albert, J. & Bennett, J. (2003). Curve ball, Springer, New York.
Anderson, C. & Sally, D. (2013). The numbers game: why everything you know about football is wrong. Penguin, UK.
Audas, R., Dobson, S. & Goddard, J. (2002). The impact of managerial change on team performance in professional sports, Journal of Economics and Business, 54, 633-650.
Battré, M., Deutscher, C. & Frick, B. (2008). Salary determination in the German “Bundesliga”: A Panel Study, Universität Paderborn.
Bentley, C. (2017, June). The 2017 Champions League final explainer. Retrieved from https://www.golfdigest.com/story/the-2017-champions-league-final-explainer.
Cameron, A. C. & Trivedi, P. K. (2005). Microeconometrics methods and applications, Cambridge University Press, New York.
Carmichael, F., Forrest, D. & Simmons, R. (1999). The labour market in association football: Who gets transferred and for how much?, Bulletin of Economic Research, 51, 125–50.
Carmichael, F., Thomas, D. & Ward, R. (2000). Team performance: The case of English Premiership football, Managerial and Decision Economics, 21, 31–45.
Dewan, J. (2006). The Fielding Bible. ACTA Sports, Chicago.
Dobson, S. & Goddard, J. (2003). Persistence in sequences of football match results: A Monte Carlo analysis, European Journal of Operational Research, 148(2), 247-256.
Eschweiler, M. & Vieth, M. (2004). Preisdeterminanten bei spielertransfers in der Fussball-Bundesliga, Die Betriebswirtschaft, 64, 671–92.
Eurosport (2018, January). English clubs top world spending list despite record-breaking Neymar deal. Retrieved from https://www.eurosport.com/football/premier-league/2017-2018/english-clubs-top-world-s-spending-list-despite-record-breaking-Neymar-deal_sto6505839/story.shtml.
Feess, E., Frick, B. & Muehlheusser, G. (2004). Legal restrictions on outside trade clauses – theory and evidence from German soccer. Discussion paper, Institut Zukunft der Arbeit, 1140.
Frick, B. (2007). The football players’ labor market: Empirical evidence from the major European leagues, Scottish Journal of Political Economy, 54, 422–46.
Goldman, S. & Kahrl, C. (2010). Baseball prospectus 2010: The essential guide to the 2010 baseball season, John Wiley & Sons, Inc., Hoboken, NJ.
Huebl, L. & Swieter, D. (2002). Der spielermarkt in der Fussball-Bundesliga. Zeitschrift für Betriebswirtschaft, Ergänzungsheft 4, Sportökonomie, 72, 105–25.
James, B. (1982). The Bill James baseball abstract, Ballentine, New York.
James, B. & Henzler, J. (2002). Win shares, STATS Publishing Inc.
Keri, J. (2006). Baseball between the numbers, Basic Books, New York.
Kern, M. & Süssmuth, B. (2005). Managerial efficiency in German top league soccer: An econometric analysis of club performances on and off the pitch, German Economic Review, 6(4), 485-506.
Koning, R. H. (2000). Balance in competition in Dutch soccer. The Statistician, 49, 419–431.
Koning, R. H. (2003). An econometric evaluation of the effect of firing a coach on team performance, Applied Economics, 35, 555–564.
Kuypers, T. (2000). Information and efficiency: An empirical study of a fixed odds betting market, Applied Economics, 32, 1353–1363.
Lehmann, E. & Schulze, G. (2005). What does it take to be a star? The role of performance and the media for German soccer players, Mimeo, Department of Economics and Management, University of Augsburg.
Lewis, A. J. (2005). Towards fairer measures of player performance in one-day cricket, Journal of the Operational Research Society, 56(7), 804–815.
Lewis, M. M. (2003). Moneyball: The art of winning an unfair game. W. W. Norton and Co., New York.
Lucifora, C. & Simmons, R. (2003). Superstar effects in sports: Evidence from Italian soccer, Journal of Sports Economics, 4, 35-55.
Macdonald, B. (2011). A regression-based adjusted plus-minus statistic for NHL players, Journal of Quantitative Analysis in Sports, 7(3), 1-31.
Matheson, V. (2003). European football: A survey of the literature, Mimeo, Department of Economics, Williams College.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (ed.), Frontiers in Econometrics (pp. 105-142).
McHale, I. G. & Davies, S. M. (2007). Statistical analysis of the FIFA world rankings.
McHale, I. G. & Forrest, D. (2005). The importance of recent scores in a forecasting model for professional golf tournaments, IMA Journal of Management Mathematics, 16(2), 131–140.
McHale, I. G. & Morton, A. (2011). A Bradley-Terry type model for forecasting tennis match results, International Journal of Forecasting, 27(2), 619–630.
McHale, I. G. & Scarf, P. A. (2007). Modelling soccer matches using bivariate discrete distributions with general dependence structure, Statistica Neerlandica, 61(4), 432-445.
McHale, I. G., Scarf, P. A. & Folker, D. E. (2012). On the development of a soccer player performance rating system for the English Premier League, Interfaces, 42(4), 339-351.
McHale, I. G. & Szczepański, Ł. (2014). A mixed effects model for identifying goal scoring ability of footballers, Journal of the Royal Statistical Society: Series A (Statistics in Society), 177(2), 397-417.
Oberstone, J. (2009). Differentiating the top English premier league football clubs from the rest of the pack: Identifying the keys to success, Journal of Quantitative Analysis in Sports, 5(3), 1-29.
Oberstone, J. (2011). Comparing team performance of the English Premier League, Serie A, and La Liga for the 2008-2009 Season, Journal of Quantitative Analysis in Sports, 7(1), 1-16.
Pollard, R. (2006). Worldwide regional variations in home advantage in association football, Journal of Sports Sciences, 24(3), 231-240.
Ruijg, J. & Ophem, H. van (2015). Determinants of football transfers, Applied Economics Letters, 22(1), 12–19.
Schultze, S. R. & Wellbrock, C. (2018). A weighted plus/minus metric for individual soccer player performance, Journal of Sports Analytics, 4, 121-131.
Sill, J. (2010). Improved NBA adjusted +/- using regularization and out-of-sample testing. Proceedings of the 2010, MIT Sloan Sports Analytics Conference.
Stene, C. (2016). A football management simulator, Norwegian University of Life Sciences.
USA Today (2016, December). Euro 2016 seen by 2 billion on TV; 600 Million tune in for final. Retrieved from https://www.usatoday.com/story/sports/soccer/2016/12/15/euro-2016-seen-by-2bn-people-on-tv-600mn-tune-in-for-final/95462450/.