Modeling Team-Compatibility Factors Using a Semi-Markov Decision Process: A Framework for Performance Analysis in Soccer

By Ali Jarvandi

B.S. in Systems Engineering, May 2009, George Mason University
M.S. in Operations Research, May 2010, George Mason University

A Dissertation Submitted to

The Faculty of The School of Engineering and Applied Science of The George Washington University in partial satisfaction of the requirements for the degree of Doctor of Philosophy

January 31, 2014

Dissertation directed by

Shahram Sarkani Professor of Engineering Management & Systems Engineering

Thomas A. Mazzuchi Professor of Engineering Management & Systems Engineering

The School of Engineering and Applied Science of The George Washington University certifies that Ali Jarvandi has passed the Final Examination for the degree of Doctor of Philosophy or Doctor of Science as of 5 November 2013. This is the final and approved form of the dissertation.

Modeling Team-Compatibility Factors Using a Semi-Markov Decision Process: A Framework for Performance Analysis in Soccer

Ali Jarvandi

Dissertation Research Committee:

Shahram Sarkani, Professor of Engineering Management & Systems Engineering, Dissertation Co-Director

Thomas Mazzuchi, Professor of Engineering Management & Systems Engineering, Dissertation Co-Director

Edward Lile Murphree, Professor Emeritus of Engineering Management & Systems Engineering, Committee Member

Bereket Tanju, Adjunct Professor of Engineering Management & Systems Engineering, Committee Member

Pavel Fomin, Adjunct Professor of Engineering Management & Systems Engineering, Committee Member


Acknowledgements

Dr. David Rico, for providing guidance in generating the original idea.

Dr. Jeffrey Ohlmann, for conducting several detailed reviews and providing invaluable technical feedback.

Dr. Shahram Sarkani and Dr. Thomas Mazzuchi for supporting this research idea and providing guidance throughout the dissertation process.


Abstract

Modeling Team-Compatibility Factors Using a Semi-Markov Decision Process: A Framework for Performance Analysis in Soccer

Soccer is the most popular sport worldwide. Over time, the importance of soccer has grown beyond the sports domain, making it a large industry, a source of national pride, and the center of public attention in most countries. Given this increased significance, it is highly important for soccer teams at both the club and national levels to invest in sciences that provide a competitive edge over opponents. Quantitative analysis of soccer is one of the domains that has enjoyed sharp growth in recent years. With advanced data collection and analysis tools, it has become possible to implement more sophisticated performance analysis methodologies. In this study, a model has been developed to anticipate collective team performance based on the attributes of the individual players. The model is then used to predict how the hiring of new players affects team performance. The data used for this study were collected from the English Premier League between the 2008/09 and 2011/12 seasons. Using the model, team performance can be predicted with an average error of 7.857 units of goal differential. Also, the effect of a new player on team performance can be predicted with an average error of 18.912 units of goal differential. Using a classification strategy, the model was able to correctly predict the direction of change in team performance caused by a new player 85.6% of the time. This provides a minimum of 20% increase in accuracy compared to the current transfer success rate at the highest level of club soccer. Therefore, using this model is expected to save clubs large amounts of money while enhancing performance.


Contents

Acknowledgements
Abstract
List of Figures
List of Tables
1. Soccer as an Ever-Growing Industry
    1.1 Initial Development and Growth of Soccer
    1.2 From a Sport to an Industry
    1.3 Soccer in the Recent Decades
2. Great Expectations, Great Risks
    2.1 Emotional Involvements
    2.2 Social and Political Implications
    2.3 Winning: Great Value and High Uncertainty
3. Quantitative Analysis: A New Way Forward
    3.1 Why and How?
    3.2 Performance Analysis in Soccer
    3.3 Existing Models
    3.4 The Problem of Team Compatibility
4. Markov Models, Simulation, and Stochastic Modeling
    4.1 Stochastic Models and Markov Decision Process
    4.2 Simulation
5. Team Compatibility Model
    5.1 Team Performance Measures
    5.2 Data
    5.3 Game-flow
    5.4 Scoring and Conceding Goals
6. Output Analysis and Results
    6.1 Baseline Model Accuracy
    6.2 Transfer Analysis Strategy
    6.3 Performance Variability
    6.4 Transfer Analysis Results
    6.5 Multiple Changes in a Transfer Season
    6.6 Suitability Analysis
    6.7 Player Classification Method
    6.8 Sensitivity Analysis
7. Conclusion
    7.1 Overview of Model, Results, and Benefits
    7.2 Limitations and Future Directions
8. Bibliography
9. Appendices


List of Figures

Figure 1 - Business Value System for Football Clubs

Figure 2 – Wage Bill (£) for a Bottom-table Second Division English Club

Figure 3 – FIFA’s annual revenue between years 2007 and 2012

Figure 4 – Market Share for Different Sports in the Sports Events Market

Figure 5 – Market Size and Compound Annual Growth Rate for Different Sports

Figure 6 – The process of quantitative analysis in soccer

Figure 7 – Player performance database

Figure 8 – Visualization of passing characteristics

Figure 9 – Visualization of player movements

Figure 10 – Transition from a State to Another under a Given Decision

Figure 11 – Points versus Goal Differential in La Liga

Figure 12 – Model Structure and Inputs

Figure 13 – Model flow in each stage of the SMDP

Figure 14 – Goal Scoring Process

Figure 15 – Percentage of Goals Conceded versus Number of Passes Leading to the Goals

Figure 16 - Estimated Goal Differential From the Model Compared to Actual Goal Differential

Figure 17 - Adjusted Estimated Goal Differential Compared to Actual Goal Differential

Figure 18 - Adjusted Estimated Goal Differential Based on Regression Analysis of the Previous Season

Figure 19 - Actual and Expected Contributions for Transfers Completed in Summer of 2009

Figure 20 - Transferred Players’ Actual Contribution to Their new Teams’ Performance

Figure 21 - Comparison Between Absolute Prediction Errors With and Without the Model

Figure 22 - Effectiveness Threshold per Games Played

Figure 23 - Classification Accuracy for Different Base Effectiveness Thresholds

Figure 24 - Changes in the effectiveness thresholds for different number of games played by changing the base threshold


List of Tables

Table 1 – Prediction Accuracy of Points Based on Goal Differential

Table 2 - Player Attributes Used in Building the Model

Table 3 – Available Decisions for Players in each State

Table 4 – Decision Likelihoods and Transition Probabilities in State 1 for Cesc Fabregas

Table 5 – Percentage of Goals Conceded per Position of Lost Possession

Table 6 – Probability of Conceding per Lost Possession in Each Third for Arsenal (2010/11)

Table 7 – Linear Regression Equations and Prediction Accuracy of Actual Goal Differential Based on Estimated Goal Differential

Table 8 - Expected and Actual Team Performance

Table 9 - Absolute Prediction Error for Teams with a Given Number of Transfers in a Transfer Season


1. Soccer as an Ever-Growing Industry

1.1 Initial Development and Growth of Soccer

The historical development of soccer can be traced to at least half a dozen games played by the Chinese, Japanese, Romans, and Greeks, some as early as the second and third centuries B.C. (FIFA, 2013). Since before medieval times, “folk football” games were played in different towns and villages according to local customs and with minimal rules (Britannica, 2013). Industrialization and urbanization, which reduced the free time and space available to the working class, along with a history of legal prohibition of particularly violent forms of folk football, undermined the status of the game in England. The new form of football started as a winter game between residents of public schools such as Winchester, Charterhouse, and Eton. The set of rules used in each school still differed. In 1843, an attempt was made to standardize the rules of the game at the University of Cambridge. These rules were then adopted elsewhere when Cambridge graduates moved on to form their own soccer teams. In 1863, a series of meetings between clubs in the London metropolitan area and surrounding areas led to the first printed set of rules, which prohibited carrying the ball. As a result of this set of accepted rules, the first Football Association (FA) was born in England (Britannica, 2013). Soon after, football associations were formed in other areas of England. These associations would often play against each other, until 1877 when they reached an agreement and the London Football Association (FA) emerged as the sole authority for the game in England. Having only 10 local association members in 1867, the London FA reached 1,000 members in 1888 and 10,000 members in 1905 (Murray, 1996, p. 6). As standardized soccer gained huge popularity in England, it was quickly exported to other European countries. Many of the top non-English soccer clubs, such as Real Madrid C.F. (1902), F.C. Barcelona (1899), Juventus (1897), A.C. Milan (1899), and Bayern Munich (1900), were founded during that era (FIFA, 2013). As a result of this fast growth, there was a need for an international governing body for the game. So, a group of football associations and club representatives from France, Belgium, Denmark, the Netherlands, Spain, Sweden, and Switzerland formed the Fédération Internationale de Football Association (FIFA) in 1904 in Paris.

With the start of the First World War, soccer leagues across the world were suspended. The FA Cup final in 1915 was the last professional game played before the end of the war. Prior to the First World War, soccer was a sport played and watched almost exclusively by the top social class in most countries other than England. During the war, many of the barriers between different social classes were removed and soccer found its way to the working class. Soon after the war, many European countries started professional soccer leagues. Also, the possibility of radio broadcasting made the game known to a larger portion of the population. While professional leagues were established in many European countries in the 1920s, many components of professionalism such as team training, player selection, and tactics were non-existent. These concepts were first introduced by Herbert Chapman, who joined Arsenal as a coach in 1925. Chapman defined specific positions for players and developed programs for the physical and technical development of players in each position. He developed actual tactics for his team and called for lights in stadiums, use of white balls, artificial pitches, numbered jerseys, and stadium clocks. Despite initial rejection of his requests by the English FA, these ideas were later accepted across the world and became playing standards.

Although FIFA was created in the early years of the 20th century and many international games were being played under its supervision, there was no international soccer tournament in the 1920s other than the Olympics. In an attempt to create a soccer tournament similar to the Olympics, Jules Rimet, who became president of FIFA in 1921, led the effort to create the World Cup. The first World Cup was held in Uruguay in 1930 and the cup was named after Jules Rimet. It was then decided that the World Cup would be played every four years. However, the start of World War II created a 12-year gap between the 1938 and 1950 World Cups. During the years of World War II, although professional soccer slowed significantly in the countries involved in the war, the game grew considerably in South America as top players from that region were not leaving for European leagues. Also, neutral European countries held soccer competitions consistently throughout the war. So, at the end of the war some traditional soccer powerhouses had fallen behind the countries not involved in the war. This shift was clear in the 1950 World Cup, as the first four places went to Uruguay, Brazil, Sweden, and Spain. World War II can be considered the last obstacle to soccer’s international growth, a process that picked up soon after the war and has continued to this date (The People History, 2013).

1.2 From a Sport to an Industry

During the 1950s and 1960s, Brazil was the leader in international soccer. However, club soccer was led by Europeans in the same period. With the increasing number of European member associations in FIFA, Europeans decided to form a continental association in response to CONMEBOL, the South American association established about 40 years earlier. The new European soccer association was named UEFA and was formed in 1955. Within a year, UEFA established a continental club tournament, the European Cup (not yet known as the Champions League). The European Cup was held every year and gained increased popularity among soccer fans. Another important event in this era was the introduction of television broadcasting. This change made it possible for people to watch soccer stars from different locations and become more attached to the game. It also became possible to keep video tapes as historical records. As a result of the increased popularity, the number of FIFA members increased from 59 countries to 140 between the end of World War II and the start of João Havelange’s term as FIFA president in 1974.

João Havelange’s tenure as FIFA president marks significant progress in the business aspect of international soccer. Before Havelange took office, soccer was already very popular and the World Cup had been broadcast in color for the first time in 1970. However, there was a lack of ideas for generating revenue from the public interest in the game. Havelange came up with a paradigm including four components for the sponsorship of the World Cup:

1. The World Cup would be interested in the largest sponsors with global reach.

2. The sponsorships would be divided by type, with only one of each type present in the World Cup. For example, there would be only one soft drink sponsor, and one beer sponsor.

3. FIFA would have full control over TV rights and advertising.

4. FIFA would not directly negotiate the sponsorship deals. Instead, a middleman would do the negotiations for a guaranteed amount of money.

This strategy was first tried in the 1982 World Cup in Spain. At the same time, the number of teams in the World Cup was increased from 16 to 24. With the newly implemented business approach, a total of 42 million Swiss francs were generated from sponsorships. In the next two World Cups, TV viewership over the course of the competition increased to 10 billion (1986) and 20 billion (1990). That allowed the total sponsorships to reach 100 million Swiss francs (The People History, 2013).

The strategies Havelange implemented in international soccer were also critical in club competitions, where broadcasting and sponsorships could be similarly negotiated. The difference was that club competitions were played year-round, whereas international competitions were played much less frequently. So, all of these sources of income could be exploited to a much greater degree by clubs. In addition, clubs could use merchandising and player transfers as sources of income. At the same time, the increased popularity of either club or international soccer would directly benefit the other, as each increased the popularity of the game and its stars. Figure 1 shows the relationship between the different elements of a club’s business cycle.


Figure 1 - Business Value System for Football Clubs (Grundy, 1998, p. 129)

The popularity of soccer was built around its stars. More attractive players could increase game attendance and therefore gate takings. They would also increase the demand to watch a team’s games on TV, buy its products, or sign sponsorship deals with it. While players in the early 20th century hardly received large sums of money, the new climate increased the competition for signing players. That meant a sharp increase in player salaries. Figure 2 shows the annual amount of player salaries paid by an English club in or around the relegation spots in the second division between 1974 and 1989. The figure shows that the effect of the industrial view of the game was not limited to the top few clubs and international teams.


Figure 2 – Wage Bill (£) for a Bottom-table Second Division English Club (Szymanski & Smith, 2006, p. 138)

[Chart: Wage Bill (£), on an axis from 0 to 1,000,000, plotted by Year from 1974 to 1989.]

The 1970s and 1980s can be considered transitional years for soccer. The foundation that was created in these decades set the stage for sharp changes in the following decades. This was primarily thanks to TV broadcasting and the opportunities it presented.


1.3 Soccer in the Recent Decades

The perspective change of the 1970s and 1980s set the stage for quick revenue expansions in the 1990s. In 1991, a group of top English teams met and formed the English Premier League. Up to that point, soccer matches in England had been shown exclusively by the BBC, with small shares allocated to the clubs. The new Premier League sold the broadcasting rights to Sky Sports for £304 million for three years. This amount was increased to £670 million in 1996 and £1.1 billion in 2001. A similar approach was taken in Germany, France, Spain, and Italy, and it increased club revenues significantly. In the 2011/12 season, Real Madrid and F.C. Barcelona each had approximately €500 million in revenue (Fontevecchia, 2012). At the continental level, UEFA also tried to grow its brand through more attractive club competitions. In 1992, the European Cup was renamed the UEFA Champions League and a group stage was added in order to make the competition larger. This created additional potential to secure larger financial deals (The People History, 2013). The estimated gross commercial revenue of the UEFA Champions League and the UEFA Super Cup (a single match between the winners of the UEFA Champions League and the UEFA Europa League) was €1.34 billion in the 2012/2013 season (UEFA, 2012).

At the international level, soccer has followed a similar path to increase involvement and revenue. João Havelange, who had previously increased the number of teams in the World Cup to 24, increased the number to 32 for the 1998 World Cup. This decision helped continents like Asia, Africa, and Oceania to have a stronger influence in the competition. It also helped the event financially, both by involving more nations and by increasing the number of games. One of the consequences of this decision was that China qualified for the World Cup for the first time in 2002. This exposed the largest nation on earth to soccer more than ever before, opening a new market for the game.

FIFA currently has 209 member associations (FIFA, 2013), more than the 193 member states of the United Nations. This shows the magnitude of influence the game of soccer has had in all parts of the world. Along with the progress in expanding the game, FIFA has experienced a continuous increase in revenue despite volatile economic conditions. Figure 3 shows FIFA’s annual revenue between 2007 and 2012. Allowing for the cyclical nature of some activities, particularly the World Cup, the graph shows steady growth in revenue generation.

Figure 3 – FIFA’s annual revenue between years 2007 and 2012 (FIFA, 2012, p. 15)

[Chart: FIFA Revenue (USD Million) by Year, 2007 to 2012. Value labels recovered from the chart, not in year order: 1291, 1166, 1059, 1070, 957, 882.]

The continuous progress of the last several decades has made soccer by far the most successful sport. Despite the great popularity of several other sports in specific areas of the world, such as American football in the United States and cricket in India, soccer has been able to capture the largest international audience and therefore draw the largest revenue. Figure 4 shows the market share for different sports in the sports events market, including ticketing, media, and marketing revenue, in 2009.

Figure 4 – Market Share for Different Sports in the Sports Events Market (A.T. Kearney, 2011, p. 2)

In addition to currently holding the largest market share, soccer is still growing at a high rate. Figure 5 shows both market size and compound annual growth rate for different sports between 2005 and 2009.


Figure 5 – Market Size and Compound Annual Growth Rate for Different Sports (A.T. Kearney, 2011, p. 2)

With the passion of many nations invested in it, soccer carries the title of “The World’s Game” 150 years after its foundation (Murray, 1996). With that come many financial, social, and political implications, making it increasingly important to study the different aspects of this sport.


2. Great Expectations, Great Risks

2.1 Emotional Involvements

Emotional involvement is one of the notable characteristics of sports fans. Scenes of fans celebrating or crying at the end of important soccer matches are frequently shown on TV. Once in a while there is also news about fans having heart attacks during or at the end of matches. With the continuous increase in the size of soccer’s fan base, there has also been an increase in the degree of fans’ emotional ties to their teams. This has given soccer the power to influence people’s lives not just as a form of entertainment, but as an important form of identification. Fans around the world consider their clubs and international teams part of their lives; they arrange their daily schedules around following their teams and consider their teams’ success or failure their own. This huge passion among fans is directly driven by their team’s performance and can be released in both positive and negative ways. The level of emotional ties between fans and teams has sharply increased as the industrial side of the game has demanded it. Realizing that fans are the lifeblood of the sports business, the different parties involved in the game, particularly the media, have moved towards further emotionalizing the game (Ismer, 2011). This has increased the potential effects of teams’ performance on their fans in a region, a country, and around the world.

There have been many instances of fans’ emotional reactions to negative team performances. Fans’ anger over team results has often led to riots during or at the end of soccer matches. During the twentieth century, 276 people lost their lives in disasters on soccer grounds in the UK alone (Johnes, 2004, p. 1). The term “Soccer Hooligans” has often been used to describe the angry fans forming riots. While this term was initially used to refer to British fans, fan riots quickly expanded to other parts of the world and in many cases claimed lives. The riots at the Liverpool vs. Juventus (1985), Catania vs. Palermo (2007), and Rangers vs. Zenit (2008) matches are examples of fan misbehavior in soccer (Jones, 2009). While each of these events happened under unique circumstances, the frequency of fan misbehavior at soccer matches shows the game’s unique potential for moving people’s emotions in a specific way. This is another reason that makes soccer an important phenomenon in our world.

On the positive side, emotional ties with soccer have made this sport a source of much positive energy in various societies. Scenes from national celebrations when teams perform well in big competitions are frequently shown on TV. In recent years, important matches have been shown on big screens in city centers so large crowds can watch the game together and share their emotions. One of soccer’s unique potentials is its high international visibility. That means both clubs and nations view soccer tournaments as a way of displaying their values and power. In the 2010 World Cup, even though the host country, South Africa, was not successful on the pitch, the people of South Africa generally felt very proud, as the competition provided an opportunity for the world to see their country’s capability to hold a competition of that magnitude. In the days following the World Cup, Sowetan, which was one of the most strongly anti-apartheid newspapers in pre-democracy times, wrote: “What a glorious 31 days it has been! . . . Not even the release of Nelson Mandela and the first democratic elections were as electrifying. Important and epoch making as both these events were, they were not as unifying as this World Cup seems to have been” (Ismer, 2011, p. 548). This type of reaction to a soccer event shows the amount of energy this sport is capable of generating in a group of people or a nation. Clearly, any club or national team wants to present this positive energy to its fans. This requires strategies for success both on and off the pitch.

2.2 Social and Political Implications

Given the large influence of sports on the masses, there have been many studies on the social and political significance of sports. One of the important potentials of popular sports lies in promoting a national identity among the people of a certain region or country and in exporting ideological values to other countries (Tuñón & Brey, 2012). Soccer, as the most popular international sport, has become a national identity builder in many parts of the world, particularly in South America, Africa, and parts of Europe. Given the high significance of soccer, it has sometimes played an important role in international conflicts. In 1969, the riot at the end of a soccer match between Honduras and El Salvador contributed to triggering a war between the two countries. That war is now known as the Soccer War (Football War). Similarly, the events of the soccer match between Dinamo Zagreb and Red Star Belgrade in 1990 played an important role in the Balkan conflicts. These are just examples of soccer’s impact on international politics through mobilizing large groups of people. Moreover, soccer has slowly found a traditional and symbolic role in developing and promoting national and regional identities. This is due to the special characteristics of sports, such as the ability to transcend social classes and create a feeling of solidarity (Tuñón & Brey, 2012).


In addition to the role that soccer teams play in unifying nations and representing cultures, some soccer teams are perceived as important cultural elements for their region or country. These teams make a significant contribution to forming a culture. An example of this type of team at the highest level of soccer is F.C. Barcelona. “Academia and the popular press recognize that Football Club (FC) Barcelona (often called by its nickname Barca) functioned as an important vehicle for the expression of Catalan identity and Catalan national sentiments under the authoritarian regime of Francisco Franco” (Shobe, 2008, p. 87). Franco’s regime denied Catalonia its political, cultural, and linguistic institutions; Catalan symbols, including the flag, were banned. During that period, Camp Nou (F.C. Barcelona’s stadium, which had a capacity of 100,000) turned into a place to express emotions against the political system, and F.C. Barcelona became a symbol of Catalan identity. This was particularly true in the games against Real Madrid, which symbolized the centralized political power. The club is owned solely by its supporter members and the president is elected through a democratic process. F.C. Barcelona still serves as one of the symbols of Catalan identity and its results have a strong psychological influence on the people of Catalonia. In the 2013/14 season, the club changed the colors of its away kit to those of the Catalan flag in order to emphasize its political role.

Soccer has often been influential in politics across the world. Political candidates received endorsements from soccer stars in England as far back as the early 1900s. In Italy, Benito Mussolini’s regime used the victory in the 1934 World Cup as a major tool for propaganda. Hitler had a similar view of football. He tried to dominate Europe in all ways possible, including on the pitch. In South America, Argentina’s national team pulled out of the 1949 Copa América and the 1950 World Cup, as Juan Perón’s regime thought a defeat in soccer on the international stage could weaken the government. Finally, during the apartheid era, South Africa’s all-white soccer association was barred from the first African Cup of Nations in 1957. Its membership in both the Confederation of African Football (CAF) and FIFA was later suspended. However, after an investigation by FIFA’s president, the association was allowed to represent South Africa in the World Cup despite calls for sanctions from other African countries (The People History, 2013). There are many other examples of the close relationship between soccer and politics, which show the extraordinary significance of this sport in the lives of nations.

2.3 Winning: Great Value and High Uncertainty

Previous sections described how soccer can be important to governments, investors, and ordinary people. There are many ways soccer can impact people’s lives positively or negatively. However, the key to achieving the financial, social, and political benefits of soccer discussed previously is the establishment of good results. It is with good results on the pitch that teams can participate in important tournaments, gain more visibility, create a larger fan base, and attract more sponsors. Good results are also necessary for creating positive emotions among fans, creating a sense of power in nations, and sending political messages. Among soccer clubs, the ones with the largest fan base and greatest financial potential are the ones that are most successful on the pitch. For instance, in Spain’s top division (La Liga), Real Madrid and F.C. Barcelona (the two most successful teams) together collect about half of the TV revenue for the league while the other 18 teams share the other half. This is simply because these two teams attract the majority of customers to the TV channels purchasing the rights to La Liga. With such large potential gains at stake, it is highly important for soccer teams at both the club and international levels to obtain the best results possible on the pitch. It is because of these various potential benefits that clubs and countries are willing to compete fiercely against each other in order to win soccer matches.

In the complex world of professional soccer, success cannot be achieved easily. The quality of soccer teams is influenced by a large number of factors such as infrastructure, youth academy, fan base, talent pool, recruiting system, marketing and sponsorships, and club management. Clearly, some of these elements can in turn be strengthened over time by club success. Nonetheless, long-term success requires special attention to all of these elements. With the large amount of investment and planning required for success, the return on investment is always an important factor. Developing investment strategies in soccer is particularly difficult, as financial resources need to be divided between different club needs, each serving the ultimate goal of success on the pitch. By far the largest share of each professional club’s budget is spent on player salaries. This means the quality of investment in players is the most important factor in determining clubs’ return on investment. Keeping in mind that players have the most direct role in determining match results, good player selection can be considered a critical success factor for soccer teams.

Considering the large amounts of investment made in the process of forming soccer teams, it is clear that knowing the expected return on investment has a high value to decision makers. In soccer, however, it is very difficult to develop accurate estimates of return (in terms of results) on investment (in terms of dollar value). This is due to the following characteristics of the game:

1. The stochastic nature of the game: Looking at the different events in the game of soccer, there is always a degree of uncertainty about the outcome. Part of this uncertainty is related to the natural variability in human performance. The rest can be associated with the surroundings including the ball, pitch, and weather, each adding a degree of uncertainty to the game.

2. Inconsistency in player performances: In addition to the natural variability in human performance, soccer players can have major inconsistencies. This, added to potential injuries, suspensions, and other events, makes it very difficult to estimate the return on investment on a specific player.

3. Multidimensional performance: Despite the simple measure for determining the winner of a game (scoring and conceding goals), there are multiple dimensions associated with player and team performance. Factors such as fitness, creativity, passing, shooting, positioning, headers, mental strength, etc. all contribute to performance. This makes it a large challenge to derive the specific utility of each player for the team and then perform cost/benefit analysis.

4. Lack of specific measures for some attributes: Some of the factors contributing to team results are extremely difficult to measure. For instance, a player’s positioning in different conditions, the effectiveness of moves without the ball, mental strength, and the motivational effect of teammates on each other are qualities that are very difficult to measure, let alone their utility.


With the difficulty of predicting the return, investments in soccer have often had less than successful outcomes. Yet, due to its large international market, soccer has become increasingly popular among investors of different types. In this context, any progress in enhancing team performance for a fixed amount of investment can be a significant help to soccer teams by increasing the return on investment.

3. Quantitative Analysis: A New Way Forward

3.1 Why and How?

The role of coaching in soccer has gradually increased over time. Top coaches play a critical role in club success by developing tactical plans for individual games and seasons, forming the squad, managing investments on player transfers, and directing all activities related to their team’s physical, technical, tactical, and mental preparation. The value of a good coach is in efficiently utilizing club potential towards positive on-the-pitch results. This is the key to increasing return on investment for clubs. In recent years, quantitative analysis has become one of the important resources available to coaches for increasing return on investment. The term “quantitative analysis of sports” refers to the activities related to collecting data on player and team performance and utilizing them towards team success. The way quantitative analysis is used can differ significantly between sports. However, the ultimate goal of this type of analysis is to provide unbiased insight into the game and help teams enhance their performance. Traditional coaching in sports has been done based on experience. That means coaches (mostly former players) would perform their tasks based on what they had learned from being exposed to the game over the years. The integration of quantitative analysis with traditional coaching has grown considerably in recent years as this type of analysis has gained acceptance in the coaching community.

Early forms of quantitative analysis in soccer were performed by coaches themselves trying to manually collect data on players’ actions and use that for decision making. The sharp growth in computer technology in the final decades of the 20th century triggered several changes in the quantitative analysis of sports. On one hand, the possibility of data collection using computer tools significantly increased both the types of data that can be collected and the volume of data on players and teams. On the other hand, the ability to process the collected data has largely increased. However, utilizing the large volumes of data for the creation of performance improvement strategies has proven to be highly complex. Given all the progress in data collection and processing, there are fundamental questions that need to be answered for effective performance analysis in soccer:

1. What parameters contribute to winning a game and how much?

2. How can we collect data on those parameters?

3. What is the nature of the relationship between player and team performance parameters and the overall team performance?

While the questions in this field are very clear, there have not been clear answers to them due to the limitations in data collection, large number of player and individual attributes, and the highly dynamic setup of the game.

The work in quantitative analysis of sports can be viewed in terms of three components: data collection, modeling, and communication. Any successful end-to-end analysis requires that data is collected correctly and accurately, a robust model is developed, and results are communicated to the coaching team in an effective manner. Nonetheless, due to the large scope of the work, research teams often limit their efforts to one of these components. To obtain a clear understanding of the current state of performance analysis in soccer, it is important to review the state of each of these components. It will then be possible to discuss the problem of team compatibility as it can be associated with all three components. Figure 6 shows the end-to-end process of quantitative analysis in soccer.

Figure 6 – The process of quantitative analysis in soccer

3.2 Performance Analysis in Soccer

Data collection is the starting point of performance analysis. The large increase in the amount of available data has enabled significantly more in-depth performance analysis and has drawn more attention to quantitative analysis of sports. The data collected on athletes is generally divided into data on physical conditions and technical performance. Data collection on athletes’ physical conditions has been routine at different levels of professional sports thanks to advances in medical science. This type of data collection is relatively easy, as each player’s fitness is measured against well-defined quantitative criteria and is generally unaffected by the surroundings. Another reason that fitness tests are so popular is that little effort is needed to translate the results into meaningful information for coaches. In addition to general fitness tests for athletes, there are well-established tests specifically designed for soccer players. These tests are designed to assist with developing a physical profile for players, which in turn helps with evaluating the effects of different training programs (Rösch, et al., 2000).

Traditionally, collecting a player’s technical data was viewed as a tedious task with little benefit. This was primarily due to the fact that there was a lack of ability to collect data on many of the players’ attributes as well as a lack of clear relationships between player attributes and team results. The sharp growth of computers and electronic devices in the late 20th century made it possible to significantly increase both the types of data and the volume of data collected from players. The innovative tools have made it possible to track players during the entire match, record their measurable actions with high accuracy, and save large amounts of data easily. Figures 7, 8, and 9 show examples of collected data using new technologies.


Figure 7 – Player performance database (EPL Index, 2013)

Figure 8 – Visualization of passing characteristics (Four Four Two, 2012)


Figure 9 – Visualization of player movements (Figueira, 2013)

The increased data collection capabilities have triggered extensive efforts both in academia and industry to utilize the collected data towards team success. The result has been a large number of models for predicting team performance based on collected data.

With the large amounts of performance data easily accessible, modeling is the largest short-term challenge for successful performance analysis. A performance model can be described as any mathematical framework developed with the purpose of making sense out of performance data. This “requires mapping the real game process into an abstract representation formulated in features specifically designed for game analysis objectives” (Beetz, Kirchlechner, & Lames, 2005, p. 33). Traditionally, there have been different approaches to analyzing the game of soccer. Given that the game raises many unique questions about team performance, a variety of models have been developed for different types of performance analysis. In addition, the highly stochastic nature of the game has encouraged researchers to also spend time on smaller models covering very specific parts of the game with the purpose of reducing the effects of the inherent performance variability on their results. In this context, a large number of studies have been performed on sports analytics and specifically soccer, aiming at making a breakthrough in utilizing available data in an effective way. Generally, the quantitative models for soccer can be divided into the following categories:

1. Player-based Models: These models are developed with the purpose of modeling individual players’ tendencies and enhancing player performance in specific settings. These models are highly popular among professional clubs as they do not contain a lot of variability and their results are easy to communicate to players and coaches.

2. Scenario-based Models: This type of model is developed with the purpose of gaining insight into team behavior in specific scenarios (possession under pressure, defending in own half, etc.) and enhancing performance in each scenario. These models are often used by coaches for analyzing opponents’ tactics and developing game plans.

3. Team-based Models: This type of model can be a collection of scenario-based models and player-based models. The purpose of these models is to enhance the overall team performance in the long run.

4. Match-based Models: These models take into account the dynamics of the game between two teams in modeling a specific game. This type of modeling is rarely applied in professional soccer due to the high variability leading to lack of accuracy. Computer games are examples of this type of modeling.

One of the key features of performance analysis models is the quality of communicating results to the end user. Given the high internal complexity of computer-based performance models, the strategy implemented for communicating model output to users can play an important part in a model’s success. Depending on the type of model output, there are generally two types of strategies for enhancing the quality of communication with the user: visual communication and contextual communication. Visual communication is currently used by a large number of models and is ideal for representing data in a meaningful way. Examples of that were shown in Figures 7 and 8. In the case of predictive performance models, however, outputs are mostly in the form of pure numbers. In this case it is highly important that the numbers are presented in an appropriate context so they cannot be misinterpreted. While there has been considerable progress in the visual representation of data in recent years, contextual representation of model outputs has proven challenging as there is little common ground between the developers of performance models and the users. This can be viewed as one of the reasons that there has been a degree of hesitancy among coaches to utilize more in-depth quantitative analysis.

3.3 Existing Models

In recent years, a large number of models have been developed with the purpose of quantifying team and player performance. These models often differ considerably from each other in their theoretical approach and in whether they model parts of performance or its entirety. Kang, Hwang, and Li (2006) studied the relationship between players through trajectory analysis. This type of analysis has become possible through real-time positioning systems, including one developed by Beetz, Kirchlechner, and Lames (2005). There are also a large number of models looking at the role of playing tactics in team performance. Dobson and Goddard (2010) developed a model for optimizing playing strategy based on the conditions of the game including score, player dismissal, and playing home or away. Their playing strategies included offensive, defensive, violent, and non-violent. The study by Tenga, Holme, Ronglan, and Bahr (2009) as well as the study by Beck and Meyer (2011) provide examples of performing statistical analysis on the effects of playing strategies on team performance.

Finally, a group of studies use position-based analysis. These studies generally attempt to find performance indicators for different playing positions and measure player performance based on that. The study by Hughes et al. (2011) provides a full classification of players based on their positions and the required attributes for each playing position. Boon and Sierksma (2003) approached this problem by asking coaches to score the different qualities of each player, the qualities needed for each position, and the relative importance of each position all from 0 to 10. They used this information to run a linear programming model for optimizing team lineup.

3.4 The Problem of Team Compatibility

Player selection is an important part of every professional soccer club’s strategy for achieving technical and financial success. In recent years, the international transfer market has been hit hard by the worldwide economic crisis, which has made it difficult for average clubs to continue paying players at the old rates; at the same time, the introduction of wealthy investors at some of the top clubs has made it virtually impossible for others to compete over hiring the most skilled players. As a result, the gap between the financial resources of the few top clubs and the rest has considerably widened. In the new financial climate, the quality of transfer decisions can determine the future of clubs. Therefore, it is critical that proper tools are developed and used in order to increase the quality of hiring decisions.

Quantitative analysis is currently used in different forms by the majority of first division clubs in top European leagues. Large volumes of data are collected on players’ physical, technical, and decision making attributes in every game. Among other purposes, this data is used by clubs to analyze their prospective players and make better transfer decisions. In reality however, a large number of costly transfers have proven unsuccessful for the hiring clubs. This identifies a need for a new type of performance analysis that leads clubs to a more accurate prediction of the changes in team performance caused by the introduction of a new player. Given the amount of financial resources at stake, this type of analysis can greatly serve clubs if an increase in transfer success rate is achieved.

Looking back at some of the most widely known unsuccessful transfers, the number one cause for players’ poor performances in their new teams is perceived to be lack of compatibility with new teammates. While player-oriented performance analysis methods provide many ways of analyzing players’ individual characteristics, the difference in the same player’s performance with the same level of fitness at two different clubs cannot be explained without analyzing the context in which the player performs.

Therefore, it is highly important in performance analysis that the links between player attributes and the quality of team performance are identified so player attributes can be assessed as they impact team performance (Young, 2010). This team-oriented approach to performance analysis can also limit potential bias towards players’ less decisive attributes. A robust model will help soccer clubs quantify the effects of team compatibility on performance in order to obtain a more accurate prediction of new players’ potential contribution to team success before making a transfer decision.

To accurately measure the effect of team compatibility on performance, we need a theoretical framework that captures the interactions between players. Soccer can be described as a continuous process of simultaneous decision making and execution by multiple players. It is clear that the decisions made by teammates may or may not complement each other at each instant. Also, each decision may or may not be followed by a perfect execution. Due to the highly stochastic nature of the game, each decision can only be made when the previous set of “decision and execution” has been completed and the outcome is known. This means the theoretical framework considered for modeling the game shall allow for frequent decision making and change of state. The next chapter discusses the theoretical foundation needed for developing a team compatibility model for soccer.

4. Markov Models, Simulation, and Stochastic Modeling

4.1 Stochastic Models and Markov Decision Process

“Any realistic model of a real-world phenomenon must take into account the possibility of randomness. That is, more often than not, the quantities we are interested in will not be predictable in advance but, rather, will exhibit an inherent variation that should be taken into account by the model” (Ross, 2007, p. 1). The game of soccer is a highly stochastic, multidimensional process involving many layers of randomness. This randomness is created by a variety of sources, most importantly the natural variability in human performance, particularly in dynamic settings. Therefore, a realistic modeling approach to soccer needs to incorporate the variability in performance. That is, it needs to recognize the randomness in both the input provided to the model and the outcome that the model is predicting.

Markov models are a group of stochastic models that are widely used in a variety of domains. These models are used for performance analysis in various sports (Goldner, 2012). A Semi-Markov Decision Process (SMDP) is a model of decision making under uncertainty (Denardo, 2003). See Appendix 1 for details of SMDP. This type of modeling was initially used for solving resource allocation and inventory optimization problems. In recent years, this approach has been used in a variety of technical fields to solve problems related to decision and risk analysis. Some domains benefiting from semi-Markov models are economics, robotics, and manufacturing. Typically, SMDP is utilized as an optimization approach. However, in this study SMDP is used to model the process of players’ decision making and execution in a simulation context. Given the characteristics of the game of soccer, SMDP is a well-suited modeling framework as it provides the following advantages:

• It allows for repetitive decision epochs.

• It defines a clear framework for states and transition probabilities.

• It allows for simultaneous decision processes within the logic of the game.

• With a large enough number of iterations it is capable of modeling a continuous system in a near-continuous manner.

The SMDP model used in this study consists of four main components: stage, state, decision, and transition probabilities. These components are defined as follows:


Stage: In our near-continuous model, a game consists of a large number of stages, each representing a set of decision and execution for all players. For instance, in stage one, each player makes a decision and the outcome is computed before the game moves on to stage two.

State: A state is a player’s condition with respect to the game flow at a decision epoch. This determines the set of decisions available to the player in that stage. For example, a player who has possession of the ball is in a different state and has a different decision set than a player whose opponent has possession of the ball.

Decision: Based on a player’s state at each decision epoch, there are multiple decisions available to that player. For instance, a player who has possession of the ball may decide to execute a short pass, a long pass, a shot, or dribble.

Transition Probability: The likelihood that a given decision takes the player to a specific state in the next stage of the game. Figure 10 shows a sample transition from State 1 to States 1 through 4 under a given decision.

Figure 10 – Transition from a State to Another under a Given Decision
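The state, decision, and transition probability components can be sketched as a small data structure. This is only an illustration: the decision names and every probability below are assumed values, not figures from the study's dataset. The sketch shows how one transition row is validated and then sampled for a player in State 1.

```python
import random

# Hypothetical transition probabilities for one player in State 1
# (player has possession). Each row gives the probability of reaching
# States 1-4 in the next stage. All numbers are made up for illustration.
TRANSITIONS = {
    "short_pass": [0.00, 0.85, 0.10, 0.05],
    "long_pass":  [0.00, 0.55, 0.30, 0.15],
    "dribble":    [0.60, 0.00, 0.30, 0.10],
}

def check_rows(transitions, tol=1e-9):
    """Every row must be a probability distribution over the four states."""
    for decision, row in transitions.items():
        if len(row) != 4 or any(p < 0 for p in row):
            raise ValueError(f"bad row for {decision}")
        if abs(sum(row) - 1.0) > tol:
            raise ValueError(f"row for {decision} does not sum to 1")

def next_state(decision, rng):
    """Sample the state (1-4) reached after executing a decision."""
    row = TRANSITIONS[decision]
    return rng.choices([1, 2, 3, 4], weights=row, k=1)[0]

check_rows(TRANSITIONS)
rng = random.Random(0)
sample = [next_state("short_pass", rng) for _ in range(5)]
```

Keeping one row per decision mirrors the idea that the same state can lead to different next-state distributions depending on what the player chooses to do.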


4.2 Simulation

“A simulation is the imitation of the operation of a real-world process or system over time. Whether done by hand or on a computer, simulation involves the generation of an artificial history of a system and the observation of that artificial history to draw inferences concerning the operating characteristics of the real system” (Banks, Carson, Nelson, & Nicol, 2005, p. 3). Depending on system characteristics, a simulation can be discrete or continuous. Unlike some sports (e.g. baseball, American football), soccer is a continuous process where the game is rarely stopped (only when a foul has occurred, a goal has been scored, or the ball has gone out). That means decisions can be made at any instant during the game. Nonetheless, simulating the game requires a framework that accounts for state transitions of all players on the field as a result of a set of actions. That means a discrete-event model suits the system better. To account for both the continuity of the game and the state transition requirement, the model can be developed in a near-continuous fashion where transition times are very short.

Given the probabilistic decisions and outcomes in the SMDP model, a Monte-Carlo Simulation approach was chosen. This type of simulation works based on Monte-Carlo Sampling, which is a methodology for generating probabilistic input based on the model’s probability distributions in each stage of the simulation (Winston, 2004, p. 1153). Using this approach, the data on players’ decision likelihoods and transition probabilities can be fully utilized in the simulation model.
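As a minimal sketch of Monte Carlo sampling in this setting, the example below draws a decision from a decision-likelihood distribution, then draws its outcome from a success rate, and repeats this many times. The decision names, likelihoods, and success rates are assumptions chosen for illustration. Because this toy case has a closed-form answer, the simulated estimate can be compared with the exact value.

```python
import random

# Hypothetical decision likelihoods and success rates for a player in
# possession; none of these numbers come from the study's data.
DECISION_LIKELIHOOD = {"short_pass": 0.7, "long_pass": 0.2, "dribble": 0.1}
SUCCESS_RATE = {"short_pass": 0.85, "long_pass": 0.55, "dribble": 0.60}

def simulate_stage(rng):
    """One Monte Carlo draw: sample a decision, then sample its outcome.

    Returns True if the team keeps the ball after the stage.
    """
    decisions = list(DECISION_LIKELIHOOD)
    weights = [DECISION_LIKELIHOOD[d] for d in decisions]
    decision = rng.choices(decisions, weights=weights, k=1)[0]
    return rng.random() < SUCCESS_RATE[decision]

def estimate_retention(n_iterations, seed=0):
    """Estimate P(team keeps the ball) from repeated sampled stages."""
    rng = random.Random(seed)
    kept = sum(simulate_stage(rng) for _ in range(n_iterations))
    return kept / n_iterations

# Exact value for comparison: sum over decisions of likelihood * success.
exact = sum(DECISION_LIKELIHOOD[d] * SUCCESS_RATE[d] for d in DECISION_LIKELIHOOD)
estimate = estimate_retention(100_000)
```

The estimate approaches the exact value as the number of iterations grows. The approach becomes useful when, as in a full game model, no closed form is available.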


5. Team Compatibility Model

5.1 Team Performance Measures

Modeling player and team performance requires a clear definition of performance measures. We know that in a single game, the ultimate goal for each team is winning. However, individual games are normally part of a larger competition. Therefore, the larger objective is winning the competitions, which can be a domestic league, domestic cup, continental cup, etc. Looking at the number of games played in each competition, domestic leagues include the largest number of games (38 games in most top leagues). They have a high priority for clubs and provide the largest amount of data and therefore are most representative of players’ real qualities. The only true performance measure for teams in domestic leagues is their standing in the league table at the end of the season.

Clearly, this attribute also depends on the other teams’ performance. For instance, Manchester United finished second in the English Premier League with 85 points in the 2009/10 season but finished first with 80 points in the 2010/11 season. This means that from each team’s perspective the goal is to maximize the probability of winning in each game in order to collect the largest number of points possible and then hope for the best standing in the league table. The implication is that team performance should be primarily assessed based on the number of points collected over the course of the season (Oberstone, 2011, p. 2).

The next question to be answered is: what events translate into points for teams? We know that the only performance measures in a single game are the number of goals scored and conceded, and the difference between the two determines the number of points each team collects from the game. These two performance measures are easy to work with since they are strictly quantitative. However, we need to find out if there is a strong enough correlation between the single-game performance measure (goal differential) and the season-long performance measure (number of points). Although the relationship between goal differential and points might sound trivial, it is important to consider that teams may win games by close margins but lose with larger margins, or the opposite. Therefore, it is important to confirm that the total number of goals scored and conceded (combined in goal differential) is a good indicator of the total number of points at the end of the season. To answer this question, a regression analysis was performed for goal differential versus points in the English Premier League (EPL), Italian Serie A, and Spanish La Liga in the 2009/10 season. The linear fit obtained from the regression analysis along with the goal differentials from the 2010/11 season was used to approximate the points obtained in 2010/11. Finally, the coefficient of determination ($R^2$) was computed for predicted points versus actual points. Table 1 shows the prediction accuracy of points based on goal differential. Figure 11 shows the small change in the linear fit for points versus goal differential in La Liga in two consecutive seasons. Numeric values of actual and estimated points using this method are listed in Appendix 2.


Table 1 – Prediction Accuracy of Points Based on Goal Differential

| League | Linear Fit | R-Squared of Linear Fit Against 10/11 Points | R-Squared of Linear Fit Against 09/10 Points |
|---|---|---|---|
| EPL | Points = 48.7 + 0.566*Goal Differential | 0.940 | 0.909 |
| Serie A | Points = 51.9 + 0.825*Goal Differential | 0.912 | 0.930 |
| La Liga | Points = 52.3 + 0.645*Goal Differential | 0.956 | 0.965 |

Figure 11 – Points versus Goal Differential in La Liga (2009/10 versus 2010/11). The figure shows the small difference between the linear fits in two consecutive seasons.

As shown above, the $R^2$ for predicting the number of points in a season based on goal differential is greater than 0.9 for all three leagues. The important implication of this relationship for this research is that goal differential can be substituted for the number of points as the teams’ ultimate performance measure in a domestic league season. This means that each player’s contribution to team performance can be defined in terms of its effects on the expected number of goals scored and conceded by the team in a season.
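The fit-then-predict procedure can be sketched as follows. The (goal differential, points) pairs are invented stand-ins for two seasons, not the league data in Appendix 2. The coefficient of determination is computed here as 1 - SS_res/SS_tot, which is an assumption about the definition used. The last line applies the EPL fit reported in Table 1 to a hypothetical team with a goal differential of +20.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(actual, predicted):
    """Coefficient of determination of predictions against actual values."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up (goal differential, points) pairs standing in for two seasons.
gd_fit, pts_fit = [40, 15, 0, -10, -30], [80, 62, 50, 44, 30]
gd_next, pts_next = [35, 20, -5, -15, -25], [78, 63, 47, 41, 33]

intercept, slope = fit_line(gd_fit, pts_fit)           # fit on season one
predicted = [intercept + slope * g for g in gd_next]   # apply to season two
out_of_sample_r2 = r_squared(pts_next, predicted)

# The EPL fit reported in Table 1, applied to a hypothetical +20 team.
epl_points = 48.7 + 0.566 * 20
```

Under the EPL fit, each additional goal of differential is worth roughly 0.57 points, so the hypothetical +20 team would be expected to collect about 60 points.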

5.2 Data

The data for this study was received from the EPL Index. The dataset included season-by-season performance data for all players in the EPL between the 2008/09 and 2011/12 seasons. The raw attributes used by the model are shown in Table 2.

Table 2 - Player Attributes Used in Building the Model

| Category | Player Performance Data |
|---|---|
| Tackling | Total Tackles; Tackles Won; Total Ground Duels; Ground Duels Won; Total Aerial Duels; Aerial Duels Won |
| Defending | Total Clearances; Headed Clearances; Interceptions |
| Passing | Total Passes; Accurate Passes; Total Long Balls; Accurate Long Balls |
| Creativity | Total Dribble Attempts; Dribble Accuracy; Total Chances Created |
| Possession | Total Touches; Unsuccessful Touches; Dispossessed; Ball Overrun |
| Attacking | Total Shots; Goals; Shots on Target; Shots off Target; Chances Received |
| Miscellaneous | Minutes Played |


The players’ technical data were used to model the performance of destination teams before and after a transfer and determine the expected change. With this dataset, it became possible to analyze the transfers within the EPL in the transfer seasons of 2009, 2010, and 2011. For each player moving from team “A” to team “B” at the end of season $n$, performance data from team A in season $n$, team B in season $n$, and team B in season $n+1$ was needed. That meant transfers to and from teams promoted to the EPL at the beginning of season $n+1$, as well as transfers to teams relegated at the end of season $n$, could not be analyzed. In addition, players who had not played or had played very little (generally less than 500 minutes) in the season before or after their transfer could not be considered for analysis as they did not provide a reliable amount of data. Also, goalkeepers were excluded from the analysis as their functions are largely different from those of outfield players. Using the above criteria, all transfers that occurred within the EPL were scanned and 69 were found to be eligible for analysis using the available dataset. The set of input data derived from the raw data and used for running the model is shown in Figure 12. Appendix 3 includes details on deriving input attributes from raw data.
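The eligibility criteria above can be expressed as a simple filter. The record layout, field names, and position coding below are hypothetical and are not taken from the EPL Index dataset. The 500-minute threshold is the approximate cutoff mentioned in the text, treated here as a hard limit for simplicity.

```python
from dataclasses import dataclass

MIN_MINUTES = 500  # approximate cutoff from the text, used as a hard limit

@dataclass
class Transfer:
    """Hypothetical record for one transfer between two clubs."""
    player: str
    position: str               # assumed coding: "GK", "DF", "MF", "FW"
    origin_in_epl_before: bool  # origin club in the EPL the season before
    dest_in_epl_before: bool    # destination club in the EPL the season before
    dest_in_epl_after: bool     # destination club in the EPL the season after
    minutes_before: int         # minutes played in the season before
    minutes_after: int          # minutes played in the season after

def is_eligible(t: Transfer) -> bool:
    """Apply the exclusion rules: league data, playing time, goalkeepers."""
    has_team_data = (
        t.origin_in_epl_before and t.dest_in_epl_before and t.dest_in_epl_after
    )
    enough_minutes = (
        t.minutes_before >= MIN_MINUTES and t.minutes_after >= MIN_MINUTES
    )
    return t.position != "GK" and has_team_data and enough_minutes

# Made-up examples: one eligible outfield player, one goalkeeper,
# and one player with too little playing time after the move.
candidates = [
    Transfer("Player X", "MF", True, True, True, 2400, 1900),
    Transfer("Player Y", "GK", True, True, True, 3000, 3000),
    Transfer("Player Z", "FW", True, True, True, 1800, 320),
]
eligible = [t.player for t in candidates if is_eligible(t)]
```

Writing the rules as one predicate makes it easy to count how many transfers each exclusion removes when applied to a real transfer list.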


Figure 12 – Model Structure and Inputs. The figure shows the data required for running the model.

Performance data obtained from players is used to determine the likelihood of selecting each of the available decisions in each state and the transition probabilities associated with that decision. General model flow in one stage of the SMDP is shown in Figure 13. Appendix 4 describes the model’s detailed flow.


Figure 13 – Model flow in each stage of the SMDP

5.3 Game-flow

The data collected from players reflects their performances throughout the season. It is clear that some games in a domestic league are more difficult than others. However, the overall contribution of a player to team performance is the sum of his contributions in all games. Therefore, it was decided to develop the model in a way that is neutral to the quality of the opponent. In fact, certain attributes in players’ performance data (such as pass completion, duel wins, dribble success, etc.) help shape an average opponent for the entire length of the season. To demonstrate this concept, assume Team A has 100% passing accuracy. This means once they gain possession of the ball, the opponent will never have the ball again if players in Team A decide to pass the ball in every stage. Clearly, no team has 100% passing accuracy. So if Team A has 80% passing accuracy, the interpretation is that the average quality of opponents throughout the season along with the imperfections in Team A has led to this number. This means that the pass accuracy percentage has captured not only the quality of Team A but also the quality of the average opponent in intercepting passes and forcing errors. This concept works similarly for other parameters of individual and team performance and is an important element in developing a model for the entire season rather than creating a different model for each game.

In order to perform a discrete-event simulation of a seemingly continuous system, it is highly important to represent the frequency and the number of decision epochs accurately. The goal in this study was to capture the largest possible portion of player decisions in the model. Therefore, it was assumed that every time the ball is touched, a decision has been made. Knowing that in SMDP each stage consists of a decision followed by an action, we can say that the total number of stages in the model will be equal to the number of decisions and therefore equal to the number of times the ball is touched. Since each team makes only a portion of total touches on the ball in a game, the total number of decision epochs in any given time period can be computed as:

(1)


Using this formula, the number of decision epochs for Arsenal in the 2008/09 season of the EPL was computed to be 55,581. With the 38 games in the season consisting of 3,420 minutes, this translates to a decision epoch every 3.7 seconds. The resulting numbers for other teams show little variation in the number of decision epochs.
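The interval between decision epochs quoted above follows from dividing the total playing time by the number of epochs. A minimal sketch of that arithmetic, using only the figures stated in the text (the variable names are illustrative, and stoppage time is assumed to be ignored):

```python
# Figures quoted in the text for Arsenal, EPL 2008/09.
decision_epochs = 55_581
games = 38
minutes_per_game = 90  # assumption: regulation time only

total_minutes = games * minutes_per_game
seconds_per_epoch = total_minutes * 60 / decision_epochs

print(f"{total_minutes} minutes, "
      f"one decision epoch every {seconds_per_epoch:.1f} seconds")
```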

In order to measure the effects of team compatibility on performance, it is important that the different forms of interaction between players are captured in the model. The Semi-Markov Model developed in this study is based on a detailed game flow with sub-models for scoring and conceding goals. The purpose of the game flow portion of the model is to provide an accurate picture of the types of effort needed from a player to complement the process of decision making and execution of teammates. This will enable an evaluation of a player’s qualities with respect to specific team requirements.

The first step in developing a game flow was to define the different states of the game. Since this model is a decision process, it views the game from the players’ perspective, so it is important that the defined states reflect that. Looking at the game as Player A on the pitch, there are four possible states as listed below:

1. Player A has possession of the ball.

2. One of the teammates of Player A has possession of the ball.

3. One of the opposition players has possession of the ball.

4. No player on either side has full possession of the ball.

Once the states have been defined, the decisions available to each player can also be defined. Table 3 shows the decisions available in each state.


Table 3 – Available Decisions for Players in each State

| State | Available Decisions |
|---|---|
| State 1 | Short Pass, Long Pass, Shoot, Dribble |
| State 2 | ATR* Short Pass, ATR Long Pass, Make Space |
| State 3 | Contain, Intercept |
| State 4 | Contain, Challenge |

* ATR = Attempt to Receive

For each state, the likelihood of taking each available decision is computed and a probability is defined for transition to each of the States 1 through 4 as a result of executing that decision. Table 4 shows sample decision likelihood and transition probability matrices in State 1. The decision likelihoods are computed by dividing the number of attempts at each action by the total number of touches a player has had on the ball. The transition probabilities for each decision are computed using different methods to reflect the dynamics of the game. For instance, the transition probabilities associated with short passes, long passes, and dribbles are simply obtained from the success rates of those actions for each player. However, in the case of shots, off-target attempts, saved attempts, and scoring shots take the system to State 3, while blocked shots take the system to State 4. That means the shot types that lead to the same state are grouped together and each group’s count is divided by the total number of shots. The decision likelihoods and transition probabilities for other states are computed using a similar approach.


Table 4 – Decision Likelihoods and Transition Probabilities in State 1 for Cesc Fabregas (2010/11)

| Likelihood | Decision | To State 1 | To State 2 | To State 3 | To State 4 |
|---|---|---|---|---|---|
| 0.857 | Short Pass | 0 | 0.824 | 0.176 | 0 |
| 0.076 | Long Pass | 0 | 0.595 | 0.405 | 0 |
| 0.041 | Shoot | 0 | 0 | 0.704 | 0.296 |
| 0.026 | Dribble | 0.550 | 0 | 0.450 | 0 |
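The construction of a State 1 row set such as Table 4 can be sketched as follows. This is a minimal illustration rather than the study’s implementation: the function names and the season counts are hypothetical, and it assumes that every touch corresponds to exactly one of the four decisions, so that the attempts sum to the player’s total touches.

```python
def decision_likelihoods(attempts):
    """Share of ball touches spent on each decision in State 1."""
    total_touches = sum(attempts.values())  # assumption: one decision per touch
    return {decision: n / total_touches for decision, n in attempts.items()}


def success_transitions(success_rate, success_state):
    """Passes and dribbles: success keeps the ball, failure leads to State 3."""
    probs = {1: 0.0, 2: 0.0, 3: 1.0 - success_rate, 4: 0.0}
    probs[success_state] = success_rate
    return probs


def shot_transitions(off_target, saved, goals, blocked):
    """Shots: off-target, saved, and scoring shots lead to State 3,
    blocked shots lead to State 4."""
    total = off_target + saved + goals + blocked
    return {1: 0.0, 2: 0.0,
            3: (off_target + saved + goals) / total,
            4: blocked / total}


# Hypothetical season counts, NOT taken from the dissertation's dataset.
attempts = {"short_pass": 1700, "long_pass": 150, "shoot": 80, "dribble": 70}
print(decision_likelihoods(attempts))
print(success_transitions(0.824, success_state=2))  # short pass to a teammate
print(success_transitions(0.550, success_state=1))  # dribble, same player keeps the ball
print(shot_transitions(off_target=30, saved=20, goals=6, blocked=24))
```

Each returned dictionary corresponds to one row of the transition matrix and sums to 1.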

In addition to each player’s decisions for the type of action, it is important to capture the links between players in order to model ball distribution in a team. To do this, two matrices are created, one for short passes and one for long passes exchanged within the team. Each of the rows in these 11x11 matrices represents the probability of a given player passing the ball to each of the teammates, given that he decides to pass the ball. These probabilities can be computed by dividing the number of passes the sender has made to that specific receiver by the total number of passes the sender has executed.

Since this data was not always available, the percentage of passes each player received was often estimated by the number of that player’s touches on the ball divided by the total number of touches by all players on the team. This estimate ensures that each player still has a realistic portion of the ball possession.
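The two ways of filling a row of the pass matrix described above can be sketched as follows. The function names and the three-player example are hypothetical. Excluding the sender and renormalising the remaining touch shares in the fallback is an assumption made here so that each row sums to 1; the text only states that receiving shares were estimated from each player’s share of team touches.

```python
def pass_row_from_counts(passes_to):
    """Row of the pass matrix when sender-to-receiver counts are available."""
    total = sum(passes_to.values())
    return {receiver: n / total for receiver, n in passes_to.items()}


def pass_row_from_touches(sender, touches):
    """Fallback row estimated from each receiver's share of team touches.

    ASSUMPTION (not stated in the source): the sender is excluded and the
    remaining shares are renormalised so that the row sums to 1.
    """
    others = {p: t for p, t in touches.items() if p != sender}
    total = sum(others.values())
    return {p: t / total for p, t in others.items()}


# Hypothetical three-player example; each real matrix is 11x11.
touches = {"A": 900, "B": 600, "C": 300}
print(pass_row_from_counts({"B": 40, "C": 10}))
print(pass_row_from_touches("A", touches))
```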


5.4 Scoring and Conceding Goals

An accurate ball distribution model in the game flow assists with a realistic simulation of the process of scoring. In this sub-model, three new parameters derived from historic data are imported into the model. These parameters are chance creation rate, chance distribution, and chance conversion rate. The process of goal scoring is triggered when a chance is created by the player in possession of the ball. This means the goal scoring process can be part of each stage of the model but is only triggered when a chance is created. Once used in the context of the game flow, this process determines not only how chances are created and who receives them but also what type of chances are created (header, low ball, etc.). Finally, the conversion probabilities are used to determine whether or not a goal is scored. Figure 14 shows the process of scoring a goal.

Figure 14 – Goal Scoring Process. The figure shows the three stages of the buildup for scoring a goal.
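The three-stage process (chance creation, chance distribution, chance conversion) can be illustrated with a short simulation sketch. All names and parameter values below are hypothetical and are not taken from the project dataset; the sketch only shows the order in which the three parameters are applied within a single stage of the game flow.

```python
import random


def simulate_stage(rng, chance_creation_rate, chance_distribution, conversion_rate):
    """One stage of the goal-scoring sub-model (illustrative only).

    Returns the scorer's name if a goal results from this stage, else None.
    """
    # 1. Chance creation: is a chance created at this stage?
    if rng.random() >= chance_creation_rate:
        return None
    # 2. Chance distribution: who receives the chance, and of what type?
    options = list(chance_distribution)
    weights = [chance_distribution[o] for o in options]
    receiver, chance_type = rng.choices(options, weights=weights, k=1)[0]
    # 3. Chance conversion: does the receiver score from this type of chance?
    if rng.random() < conversion_rate[(receiver, chance_type)]:
        return receiver
    return None


# Hypothetical parameters, for illustration only.
distribution = {("Striker", "low ball"): 0.6,
                ("Striker", "header"): 0.1,
                ("Winger", "low ball"): 0.3}
conversion = {("Striker", "low ball"): 0.20,
              ("Striker", "header"): 0.10,
              ("Winger", "low ball"): 0.12}

rng = random.Random(0)  # fixed seed so the run is repeatable
goals = [simulate_stage(rng, 0.02, distribution, conversion) for _ in range(50_000)]
print(sum(g is not None for g in goals), "goals in 50,000 simulated stages")
```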


The process of goal scoring can be considered easier to model than conceding goals. Regardless of whether or not goals are scored as a result of teamwork, there is always only one person scoring the goal. Therefore, it is possible to model goal scoring as an individual process within the context of the game flow. In the case of conceding, however, there is no specific person conceding the goal (the argument that the goalkeeper is the one and only person at fault does not hold). Instead, it is the team that has conceded. It is also important to note that unlike game simulation tools (such as computer games), our model does not simulate a game between Teams A and B, as that would add unnecessary variability to the results. Instead, it simulates the performance of Team A for the entire length of the season using only data from Team A. This means that the process of conceding goals cannot be modeled as scoring by the opposition. Considering the characteristics of conceding a goal, the only realistic way to model it is as a team process. To accomplish this, the data and results from a study by Tenga and Sigmunstad (2011) were used. This study analyzed 997 goals based on the starting position of the goal scoring possession and the number of passes leading to the goal. An analysis of their data provided the probability of conceding a goal based on the position where possession is lost and the number of passes exchanged between the opposition prior to that stage. Table 5 shows the percentage of goals conceded as a result of losing possession in each area of the pitch.


Table 5 – Percentage of Goals Conceded per Position of Lost Possession

| Loss of Possession | Percentage of Goals Conceded |
|---|---|
| Attacking Third | 0.379 |
| Midfield Third | 0.531 |
| Defensive Third | 0.090 |

If the total number of goals conceded in a season and the total number of balls lost in each third are known, the probability of conceding a goal when losing possession in each third can be computed. Table 6 shows the probability of conceding goals per lost possession in each third of the field for Arsenal in 2010/11.

Table 6 – Probability of Conceding per Lost Possession in Each Third for Arsenal (2010/11)

| Loss of Possession | Number of Lost Possessions | Percentage of Goals Conceded | Probability of Conceding per Lost Possession |
|---|---|---|---|
| Attacking Third | 2490 | 0.379 | 0.005 |
| Midfield Third | 1826 | 0.531 | 0.010 |
| Defensive Third | 217 | 0.090 | 0.015 |

Goals Conceded: 37
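One plausible reading of how the last column of Table 6 is obtained is to allocate the season’s conceded goals to each third by the Table 5 shares and divide by the number of possessions lost in that third. The sketch below follows that reading, which is an assumption; the variable names are illustrative.

```python
goals_conceded = 37  # Arsenal, 2010/11 (Table 6)

# third: (number of lost possessions, share of conceded goals from Table 5)
thirds = {
    "attacking": (2490, 0.379),
    "midfield": (1826, 0.531),
    "defensive": (217, 0.090),
}

p_concede = {
    third: goals_conceded * share / lost
    for third, (lost, share) in thirds.items()
}

for third, p in p_concede.items():
    print(f"{third:>9}: {p:.4f}")
```

Under this reading the computed values agree with the tabulated probabilities to within about 0.001.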

In addition to the probability of conceding a goal over the entire length of the opponent’s possession, we need to know the probability of conceding as a function of the number of stages in which the opposition keeps possession of the ball. This will lead us to the defensive utility of obtaining possession at each moment. By breaking down the data from Tenga and Sigmunstad’s study, we know that 66% of goals are scored with 4 passes or fewer. Given that each attacking decision (including passing) can be viewed as a trial for scoring, the decisions leading to a goal can be viewed as a Bernoulli process. In this case, since possession is stopped once a goal is scored, the number of passes in a goal scoring possession will follow a geometric distribution. This modeling approach provides us with the shrinking probability needed to reward the defending team for keeping the danger away despite not gaining possession. It also helps take into account the stronger defensive organization against longer possessions and the greater defensive utility of taking back possession in the early stages of the opposition’s possession. Given that the cumulative distribution function of a geometric distribution is approximately 0.67 at the mean, which is close to the observed 66%, we can set the mean of the geometric distribution to 4.
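The value of the cumulative distribution function at the mean can be checked directly. The text does not state which convention of the geometric distribution is used, so the sketch below evaluates both common ones for a mean of 4 passes; the function names are illustrative.

```python
def geometric_cdf_failures(k, p):
    """P(X <= k) when X counts failures before the first success (0, 1, 2, ...)."""
    return 1.0 - (1.0 - p) ** (k + 1)


def geometric_cdf_trials(k, p):
    """P(X <= k) when X counts trials up to and including the success (1, 2, ...)."""
    return 1.0 - (1.0 - p) ** k


mean_passes = 4

# Convention A: passes are the failures before the goal; mean = (1 - p) / p.
p_a = 1.0 / (mean_passes + 1)
# Convention B: passes are the trials themselves; mean = 1 / p.
p_b = 1.0 / mean_passes

print(f"A: p = {p_a:.2f}, P(passes <= 4) = {geometric_cdf_failures(4, p_a):.3f}")
print(f"B: p = {p_b:.2f}, P(passes <= 4) = {geometric_cdf_trials(4, p_b):.3f}")
```

Both conventions place roughly two thirds of the probability mass at or below the mean, in line with the 66% of goals scored with 4 passes or fewer.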

To test the validity of this distribution, goodness of fit was measured against data from a study by Hughes and Franks (2007). That study identified the number of passes leading to a goal in the 1990 and 1994 World Cups and included a total of 244 goals. The proposed geometric distribution produced a goodness-of-fit value of 0.86 against that data, which implies a strong fit. A comparison between the geometric distribution with mean = 4 passes and the data from Hughes and Franks is shown in Figure 15.


Figure 15 – Percentage of Goals Conceded versus Number of Passes Leading to the Goals. The figure shows the conditional probability of conceding a goal in a given iteration of the opposition’s possession, given that a goal is conceded during that possession.

Using the project dataset, we know that passes make up approximately two thirds of the decisions in the game. Therefore, we can say that the number of iterations in a goal scoring possession follows a geometric distribution with mean = 6 stages. Given the properties of the geometric distribution, we know that the conditional probability of conceding a goal in a given iteration of the opposition’s possession can be computed as follows:


(2)

Where:

By combining the two probabilities shown in Table 6 and Figure 15, we can compute the probability of conceding a goal in a given iteration of an opposition’s infinitely long possession as:

(3)

Using this method, conceding can be modeled as a team process in which the goal is to gain back possession of the ball in order to cut down the probability of conceding. It is important to note that in the case of conceding goals, it is the conditional probability of conceding that follows a geometric distribution. However, the actual probability of conceding in a given iteration of the opposition’s possession does not follow a Bernoulli process.
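A minimal sketch of how the two probabilities might be combined is given below. It assumes that iterations are numbered from 1, that the geometric parameter is the reciprocal of the 6-stage mean, and that the combination is a simple product of the Table 6 probability and the conditional geometric probability. These are assumptions made for illustration, and the function names are hypothetical.

```python
def conditional_pmf(n, mean_stages=6):
    """P(goal in iteration n | a goal is conceded during this possession).

    ASSUMPTION: iterations are numbered 1, 2, ... and p = 1 / mean_stages.
    """
    p = 1.0 / mean_stages
    return (1.0 - p) ** (n - 1) * p


def p_concede_at(n, p_concede_possession, mean_stages=6):
    """Probability of conceding in iteration n of the opposition's possession."""
    return p_concede_possession * conditional_pmf(n, mean_stages)


p_midfield = 0.010  # Table 6: possession lost in the midfield third
for n in (1, 2, 6, 12):
    print(n, round(p_concede_at(n, p_midfield), 5))

# Over an arbitrarily long possession the per-iteration terms add up to the
# Table 6 value, so regaining the ball early removes the most risk.
total = sum(p_concede_at(n, p_midfield) for n in range(1, 2000))
print(round(total, 6))
```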


6. Output Analysis and Results

6.1 Baseline Model Accuracy

It was previously shown that the entire length of the season can be simulated in approximately 55,581 iterations of the SMDP model. Due to the highly stochastic nature of the game, a single simulation run provides a standard deviation of approximately 10 units of goal differential. To reduce the variability in output, in each scenario the model was set to simulate the length of the season 100 times and take the average of the expected goal differential. That means a run of the model consists of completing 55,581 iterations of the SMDP 100 times and averaging the results. With 100 simulation runs, the standard error falls below 0.1.

An important part of validating the model is to measure its accuracy in capturing the dynamics of the game. To do this, the model was used to simulate performance and estimate the goal differential for all teams hiring any of the 69 players in this study in the season prior to the transfers. Then, a regression analysis was performed for estimated goal differentials versus actual goal differentials. Table 7 shows the results of the regression analysis, and Figure 16 shows the estimated goal differentials versus the actual goal differentials. As shown in the figure, the model accurately captures the trends in goal differential. Note that the model output seems to underestimate goal differential. This is likely due to accidental goals (own goals, deflections, defensive mistakes, etc.), which are not entirely caused by the scoring team’s actions and therefore are not captured by the goal scoring process. Conceding these types of goals is, however, automatically accounted for due to the mechanics of the conceding process. By placing estimated goal differentials in the regression equation for all transfers from Table 7, we get highly accurate predictions shown in Figure 17, where average absolute prediction error is 7.217 units of goal differential. Appendix 5 lists the numeric values of actual and estimated goal differential using this method.
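The effect of averaging repeated simulations on output variability can be illustrated with the usual standard-error formula. The sketch assumes that simulated seasons are independent and that the standard deviation of about 10 refers to a single simulated season. Reading a standard error near 0.1 as the result of 100 runs of the 100-season model (10,000 seasons in total) is an assumption, not something the text states explicitly.

```python
import math


def standard_error(sd_single_season, n_seasons):
    """Standard error of the mean of n independent simulated seasons."""
    return sd_single_season / math.sqrt(n_seasons)


sd = 10.0  # approximate per-season standard deviation quoted in the text
for n in (1, 100, 10_000):
    print(f"{n:>6} seasons averaged: standard error ~ {standard_error(sd, n):.2f}")
```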

Table 7 – Linear Regression Equations and Prediction Accuracy of Actual Goal Differential Based on Estimated Goal Differential

| Year | Regression Equation | R-Squared |
|---|---|---|
| 2009 | Actual Goal Differential = 26.5 + 0.952 * Estimated Goal Differential | 0.868 |
| 2010 | Actual Goal Differential = 22.7 + 0.801 * Estimated Goal Differential | 0.947 |
| 2011 | Actual Goal Differential = 16.0 + 0.571 * Estimated Goal Differential | 0.858 |
| Total | Actual Goal Differential = 20.5 + 0.739 * Estimated Goal Differential | 0.856 |
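The adjustment of raw model output with the regression equations in Table 7 can be sketched as follows. The coefficients are those listed in the table; the function names and the sample inputs are hypothetical and serve only to show how an average absolute prediction error would be computed.

```python
# (intercept, slope) pairs from Table 7.
REGRESSIONS = {
    "2009": (26.5, 0.952),
    "2010": (22.7, 0.801),
    "2011": (16.0, 0.571),
    "total": (20.5, 0.739),
}


def adjust(estimated_gd, year="total"):
    """Map a raw model estimate to an adjusted goal differential."""
    intercept, slope = REGRESSIONS[year]
    return intercept + slope * estimated_gd


def mean_absolute_error(predicted, actual):
    """Average absolute prediction error in units of goal differential."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical raw estimates and actual values, for illustration only.
raw = [-40.0, -10.0, 15.0]
actual = [-12.0, 14.0, 30.0]
adjusted = [adjust(x) for x in raw]
print([round(x, 2) for x in adjusted])
print(round(mean_absolute_error(adjusted, actual), 3))
```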


Figure 16 - Estimated Goal Differential From the Model Compared to Actual Goal Differential. Black line shows teams’ actual goal differential and gray line shows the model’s estimate.

[Figure 16: line chart; vertical axis: Goal Differential (-80 to 60); horizontal axis: teams, grouped by season (2009, 2010, 2011).]

Figure 17 - Adjusted Estimated Goal Differential Compared to Actual Goal Differential. Black line shows the teams’ actual goal differential and gray line shows the adjusted estimate.

[Figure 17: line chart; vertical axis: Goal Differential (-40 to 60); horizontal axis: teams, grouped by season (2009, 2010, 2011).]


Using the results of regression analysis for each year, it becomes possible to make highly accurate estimations of the teams’ goal differential in the following year. Figure 18 shows estimated goal differential in the 2009/10 and 2010/11 seasons by inputting performance data from those seasons and placing model output in the regression equation obtained from the 2008/09 and 2009/10 seasons respectively. The average error of these estimates is 7.857, which is only slightly higher than the average error of adjusted results based on the regression equation for all transfers.

Figure 18 - Adjusted Estimated Goal Differential Based on Regression Analysis of the Previous Season. Black line shows actual goal differential and gray line shows model’s estimation.

[Figure 18: line chart; vertical axis: Goal Differential (-60 to 80); horizontal axis: teams, grouped by season (2010, 2011).]


The important implication of Figures 16, 17, and 18 is that the model can successfully detect the trends in team performance using its input attributes. Among many potential applications, this provides the possibility of predicting future changes in team performance by changing players. It is important to note that players’ attributes can often change in different environments. This is particularly true regarding decision making attributes under different tactics and with a different set of surrounding players. Performance data collected while a player was playing for the current team is therefore more reliable. However, when predicting the results of player transfers, this data is not available, so predictions have to be made on the basis of the player’s performance data in the previous team and with a larger margin of error. The general strategy in computing the change in team performance is to measure a team’s performance before and after a given transfer using the model. That means, instead of using the team’s actual goal differential prior to hiring a player, the estimated goal differential provided by the model is used as the baseline. Using this value helps reduce variability and evaluate the model based only on how accurately it predicts the changes in performance.

6.2 Transfer Analysis Strategy

To simulate the effect of a prospective player from Team A on the performance of Team B in season n+1, the model is first run for Team B in season n. Then the prospective player’s data is placed in the dataset for Team B, the model is run again, and the average numbers of goals scored and conceded are computed. The difference between the team’s expected goal differential with and without that player determines the expected contribution of the player to team performance. The model also outputs many individual results such as the number of touches, shots, goals, and interceptions for each player, which are considered secondary for the purpose of this research.

The model output represents the expected change in team performance as a result of changing one player in the team. To measure the accuracy of the model, this number needs to be compared with the actual change in team performance after the transfer was made. To accomplish this for a transfer completed at the end of season n, the difference between team performances in season n+1 with the new player on and off the pitch is computed. This number represents the actual contribution of that player to team performance. A comparison between the expected and actual change in team performance determines model accuracy.

To mimic the real process of player selection, the model treats each hiring decision made by a team independently from the other hiring decisions in the same transfer season and computes the expected performance. That means each player is placed into a context created only by players who played for the club in the previous season. This strategy provides an unbiased approach to all transfer decisions from the team’s perspective prior to entering a transfer season, without including future information. It also helps account for new players who may enter the team from outside the available database. Specifically, the model does not estimate performance in season n+1 with all the new players. Instead, it places each prospective player in the context of the players who played for the team in season n. This allows for including only the players within the database. In terms of measuring actual performance, it is important to note that one of the reasons for computing this parameter as the difference between team performances in season n+1 with and without the new player, as opposed to the difference between team performances in seasons n and n+1, is to make the analysis neutral to other potential changes in a team. This approach ensures that actual performance is measured as the difference a player makes in a team in a given season by accounting for all players in the squad, yet regardless of the number of changes compared to the previous year.

6.3 Performance Variability

To draw valid conclusions from the model output, it is necessary to understand the variability in individual and team performance. With the high level of uncertainty involved in different aspects of soccer, it is common sense that if the same match is played multiple times in similar conditions it will likely end with different results. Therefore, any modeling effort for soccer needs to address the variability in actual outcome, commonly known as “luck.” While there are many representations of luck in the game, the most symbolic and measurable type of uncertainty in results is related to hitting the woodwork. This refers to the scenarios in which the ball hits the crossbar or either of the goalposts. The statistics on hitting the woodwork can uncover part of the variability in match outcomes. In the 2011/12 season of the EPL, each team hit the woodwork an average of 14.5 times. This means the difference between a team’s goal differential in the two extremely “lucky” and “unlucky” scenarios is 29. While these extreme scenarios are unlikely, this view of the game helps understand the variability in a team’s performance in fixed conditions.


6.4 Transfer Analysis Results

The goal of the transfer analysis performed using the compatibility model is to predict the impact of a new player on the team’s goal differential over the course of a season. This means the quality of the model’s predictions is determined by the accuracy of estimated change in goal differential compared to actual change. This can be simply measured by absolute prediction error in terms of units of goal differential. Table 8 shows the four parameters computed in order to measure the quality of the model’s prediction on each player’s performance. Appendices 6 and 7 show the same parameters for transfers completed in the summers of 2010 and 2011.

Table 8 - Expected and Actual Team Performance (Transfers Completed in Summer of 2009)

| Player | Expected Differential Before Transfer | Expected Differential After Transfer | Scaled Differential With | Scaled Differential Without |
|---|---|---|---|---|
| Roque Santa Cruz | -7.100 | -10.210 | 21.509 | 29.483 |
| Glen Johnson | 18.400 | 14.160 | 36.740 | 8.022 |
| Antonio Valencia | 18.950 | 30.630 | 44.467 | 101.963 |
| Daniel Sturridge | -4.480 | 24.120 | 106.505 | 67.723 |
| Carlos Tévez | -7.100 | -20.180 | 29.812 | 18.587 |
| Stewart Downing | -28.430 | -29.410 | 14.841 | 10.163 |
| Emmanuel Adebayor | -7.100 | -8.150 | 30.922 | 22.649 |
| Dean Whitehead | -54.940 | -52.000 | -10.617 | -32.821 |
| Zat Knight | -43.450 | -43.930 | -23.886 | -38.000 |
| [name missing] | -43.450 | -47.111 | -41.034 | 2.729 |
| Peter Crouch | -19.050 | -15.833 | 30.964 | 16.945 |
| Kolo Touré | -7.100 | -8.333 | 30.299 | 19.241 |
| Seyi Olofinjana | -57.110 | -66.670 | -54.526 | -34.334 |
| Sebastien Bassong | -19.050 | -16.520 | 29.817 | 18.224 |
| Lee Cattermole | -38.620 | -31.857 | 9.157 | -21.308 |
| Damien Duff | -15.920 | -12.680 | -2.690 | -19.498 |
| Jonathan Greening | -15.920 | -14.400 | -6.835 | -7.129 |
| Joleon Lescott | -7.100 | -16.930 | 17.482 | 36.873 |
| Robert Huth | -54.940 | -54.820 | -13.715 | -15.155 |
| Stephen Warnock | -28.430 | -25.560 | 13.174 | 12.451 |
| Michael Brown | -41.760 | -36.429 | -12.398 | -57.421 |
| Sylvain Distin | -1.680 | 2.580 | 18.486 | -12.361 |
| Jamie O'Hara | -41.760 | -43.040 | -30.319 | -35.258 |
| Tuncay Şanlı | -54.940 | -63.390 | -18.645 | -11.208 |
| Michael Turner | -38.620 | -33.940 | -1.342 | -27.454 |
| Danny Collins | -54.940 | -58.318 | -23.041 | 0.000 |
| James Collins | -28.430 | -33.600 | 5.779 | 29.231 |
| Richard Dunne | -28.430 | -26.980 | 10.920 | 35.625 |
| Niko Kranjčar | -19.050 | -12.460 | 39.145 | 13.602 |
| Ibrahima Sonko | -57.110 | -66.670 | -54.889 | -36.690 |

In Table 8, parameters “Expected Differential Before Transfer” and “Expected Differential After Transfer” are provided by the model. The difference between these two numbers determines the Expected Contribution. This is the expected change in team performance caused by replacing a specific player in the lineup by the new player. The parameters “Scaled Differential With” and “Scaled Differential Without” represent the team’s scaled goal differential in season n+1 with and without that player and are computed by normalizing the data from the number of goals scored and those conceded with a given player on and off the pitch using the following equations:

(4)


(5)

The difference between these two numbers determines the Actual Contribution.

Figure 19 shows that the model generally detects the patterns of player contribution to team performance accurately. Appendices 8 and 9 show a similar trend for transfers completed in summers of 2010 and 2011.
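The Expected Contribution and Actual Contribution defined above can be computed directly from a row of Table 8. The sketch below uses two rows from the table; the function name is illustrative, and taking the absolute prediction error as the absolute difference between the two contributions is an assumption about how that measure is defined.

```python
def contributions(before, after, scaled_with, scaled_without):
    """Return (expected contribution, actual contribution, absolute error)."""
    expected = after - before              # model output after vs. before transfer
    actual = scaled_with - scaled_without  # season with vs. without the player
    return expected, actual, abs(expected - actual)


# Rows taken from Table 8:
# (expected before, expected after, scaled with, scaled without)
rows = {
    "Glen Johnson": (18.400, 14.160, 36.740, 8.022),
    "Robert Huth": (-54.940, -54.820, -13.715, -15.155),
}

for player, row in rows.items():
    expected, actual, error = contributions(*row)
    print(f"{player:<13} expected {expected:+7.3f}  "
          f"actual {actual:+7.3f}  error {error:6.3f}")
```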

Figure 19 - Actual and Expected Contributions for Transfers Completed in Summer of 2009. Gray line shows actual contribution and black line shows expected contribution.

[Figure 19: line chart; vertical axis: Goal Differential (-80 to 60).]

To determine the prediction quality of the compatibility model, it is important to gain an understanding of the current transfer success rate for EPL clubs. Figure 20 shows the actual contribution of the players analyzed in this study to team performance in their new teams.


Figure 20 - Transferred Players’ Actual Contribution to Their New Teams’ Performance.

[Figure 20: horizontal bar chart of Actual Contribution; vertical axis: Player; horizontal axis: Units of Goal Differential (-80 to 80).]


Looking at the players’ actual contribution, only 39 of the 69 players (approximately 57%) made a positive impact on team performance. This can be an indicator of the current transfer success rate. To verify this, a survey was conducted among 25 soccer coaches, journalists, and fans asking whether or not they considered each of the signings of 10 elite European clubs in the transfer seasons of 2010 and 2011 to be successful (Appendix 10). The results showed a 63% overall success rate, which is broadly consistent with the results from the dataset. To measure the model’s accuracy, expected contribution was computed for each of the transfers previously identified and absolute prediction error was calculated. The model provided an average absolute prediction error of 18.912 for all transfers. Of the 69 transfers, 55 were found to have an absolute prediction error of less than 29. That means, for approximately 80% of the transfers, the absolute prediction error was within the performance variability range.

To put the results into perspective, note that at this level of absolute prediction error, the negative effect of 22 of the 30 transfers ultimately leading to negative change in team performance was correctly detected in the model’s output. This provides a great potential for improving the quality of transfer decisions.

6.5 Multiple Changes in a Transfer Season

It is common for professional soccer clubs to hire more than one new player in a transfer season. This can introduce additional variability in team performance. However, it is often the case that clubs cannot sign all of their transfer targets for a variety of reasons. Therefore, designing the model to analyze batches of transfers instead of single transfers could prove unrealistic, as the information on the teams’ exact signings is only acquired when the transfer season has ended. Table 9 shows the model output for teams with a given number of transfers within the EPL in a transfer season.

Table 9 - Absolute Prediction Error for Teams with a Given Number of Transfers in a Transfer Season.

| Number of Signings Made by a Team in a Single Transfer Season | Number of Transfers in Each Category | Mean Absolute Prediction Error | Standard Deviation of Absolute Prediction Error |
|---|---|---|---|
| 1 | 17 | 30.067 | 20.234 |
| 2 | 24 | 18.085 | 15.615 |
| 3 | 3 | 12.939 | 5.279 |
| 4 | 20 | 13.030 | 10.864 |
| 5 | 5 | 12.069 | 7.340 |
| Total | 69 | 18.912 | 15.910 |
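As a consistency check on Table 9, the overall mean absolute prediction error can be recovered as the transfer-weighted average of the category means. The sketch below uses only the values in the table; the variable names are illustrative.

```python
# Table 9: signings per team -> (number of transfers, mean absolute error)
categories = {
    1: (17, 30.067),
    2: (24, 18.085),
    3: (3, 12.939),
    4: (20, 13.030),
    5: (5, 12.069),
}

n_total = sum(n for n, _ in categories.values())
pooled_mean = sum(n * mean for n, mean in categories.values()) / n_total

print(n_total, round(pooled_mean, 3))
```

The weighted average reproduces the "Total" row of the table (69 transfers, mean error of about 18.912).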

While a larger dataset is needed to determine the exact effect of multiple transfers on model accuracy, the table suggests that model accuracy increases as teams make more hires from within the same league. Looking at the transfers in the three transfer seasons covered in this study, all EPL teams hired 14 or more players over those three years (Transfer League, 2013). That means the teams with the lowest number of signings hire an average of approximately 5 players per year. So, for teams hiring only 1 or 2 players from the EPL, we can generally expect a large number of signings from foreign leagues and lower English leagues. The lower model accuracy for teams with fewer EPL signings can be attributed to the tactical differences that players from outside the EPL bring with them. This can cause a larger change in team dynamics and reduce model accuracy. The consistency of the model in providing high accuracy for teams with 3 or more EPL signings shows the success of this approach in utilizing team dynamics for predicting the outcome of player transfers.

6.6 Suitability Analysis

To demonstrate the value of the model, we can compare the results with other approaches to player selection. Here, we compare the model’s absolute prediction error to a simpler method where, for Player P playing in Team A in season n, we evaluate the predicted impact on Team B's goal differential in season n+1 by computing:

(Team A's scaled differential with Player P in season n) – (Team A's scaled differential without Player P in season n)

This method provides an average absolute prediction error of 38.546. Considering the model’s 18.912 average absolute prediction error, we can say that using the model reduces the error by approximately 51%. Figure 21 compares the results from this approach with the model output.
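The relative reduction in error quoted above follows directly from the two average absolute prediction errors. A minimal sketch of that arithmetic (variable names are illustrative):

```python
baseline_error = 38.546  # simpler scaled-differential method
model_error = 18.912     # compatibility model

reduction = (baseline_error - model_error) / baseline_error
print(f"relative reduction in error: {reduction:.1%}")
```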


Figure 21 - Comparison Between Absolute Prediction Errors With and Without the Model. Gray bars show error from data analysis and black bars show error from the model.

[Figure 21: horizontal bar chart; vertical axis: Player; horizontal axis: Prediction Error in Units of Goal Differential (0 to 160); series: Error with the Model, Error Without the Model.]


6.7 Player Classification Method

A complementary approach to measuring model accuracy in terms of the continuous measure of goal differential is to use a player classification methodology. Using this approach, model accuracy can be measured in terms of the percentage of accurate classifications. This method can be particularly useful for translating the output into actionable information for coaches. The goal of the classification approach is to accurately predict the direction of change in team performance, given that the actual numerical outcome of team performance may largely change under different circumstances. In this approach, since the final objective of a decision support tool is a prediction of player contribution, players are divided into three categories: players with positive impact, players with negative impact, and players with no considerable impact on team performance. The borderlines between these categories can be referred to as the “effectiveness threshold”. To keep the analysis unbiased, this study defines the effectiveness threshold for players as a function of the number of minutes played during the season. There is also a subjective base effectiveness threshold defined as the difference between the team’s normalized goal differentials with and without a player who played exactly half of the season. This number is adjusted to find the effectiveness threshold for players with a different number of minutes played using the following formula:

(6)

(7)

The effectiveness threshold for players with different numbers of games played, using a base threshold of 8, is shown in Figure 22. Each game in the figure stands for 90 minutes of playing time. This formula helps neutralize the effect of variability for players playing or missing very few minutes. The red line shows the maximum change in goal differential that hitting the woodwork can potentially cause. As shown in the figure, for the majority of players the effectiveness threshold is well below the red line. This demonstrates the high accuracy delivered by the model given the variability in team performance.

Figure 22 - Effectiveness Threshold per Games Played. Blue line shows the required difference between “Normalized Differential With” and “Normalized Differential Without” parameters to exceed the effectiveness threshold. Red line indicates maximum variability caused by hitting the woodwork.

[Figure: x-axis Games Played (1 to 37); y-axis Change in Goal Differential (0 to 160). Series: Effectiveness Threshold, Woodwork Difference.]

In addition to the classification of players’ actual contribution, there is a need to create player classifications based on the expected performance estimated by the model.

It is important to note that unlike measuring the players’ actual contribution, we can reduce the effect of performance variability in prediction of the expected contribution by running the model for a large enough number of iterations until the outputs converge.

This means we need a much smaller effectiveness threshold for the expected performance. Another difference between the classification of the expected performance and the actual performance is that the classification strategy of expected performance is a function of the model’s internal characteristics and can be a fixed number, while classification of the actual performance reflects the performance requirements set by coaches and can change under different circumstances. To account for the small variability in the model outputs, the effectiveness threshold for expected performance was set to 3. This means that only a change of 3 or more units of goal differential in the model output was considered a positive or negative impact on team performance.
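The classification rule for expected performance can be illustrated with a minimal sketch. It assumes that the change in model output is the expected differential after the transfer minus the expected differential before it; the function name and category labels are hypothetical and are not taken from the study's implementation. The three example inputs are the expected differentials listed in Appendix 6 for Martin Petrov, Joe Cole, and Titus Bramble.

```python
# Threshold for expected performance, as stated in the text.
EXPECTED_THRESHOLD = 3.0


def classify_expected(before: float, after: float,
                      threshold: float = EXPECTED_THRESHOLD) -> str:
    """Classify a player's expected impact from the change in model output.

    Assumption: change = expected differential after transfer
                         - expected differential before transfer.
    """
    change = after - before
    if change >= threshold:
        return "positive"
    if change <= -threshold:
        return "negative"
    return "none"  # no considerable impact


# Expected differentials (before, after) as listed in Appendix 6.
examples = {
    "Martin Petrov": (-55.450, -35.490),
    "Joe Cole": (22.140, 10.070),
    "Titus Bramble": (-33.180, -31.750),
}

for player, (before, after) in examples.items():
    print(player, classify_expected(before, after))
```

The same comparison could be applied to the actual contribution, but in that case the threshold would depend on minutes played rather than being a fixed number.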

Using a base effectiveness threshold of 8, the model correctly predicted the performance category of 59 players in their future teams. This provides 85.6% prediction accuracy. It means the model provides a large gain in classification accuracy compared to the success rate of current transfer decisions as previously described.

6.8 Sensitivity Analysis

Knowing that the base effectiveness threshold can vary for different hiring decisions, an analysis was performed to determine the sensitivity of model accuracy to the value of this attribute. Figure 23 shows model accuracy for different values of base effectiveness threshold.

Figure 23 - Classification Accuracy for Different Base Effectiveness Thresholds.

[Figure: x-axis Base Effectiveness Threshold (4 to 20); y-axis Model Accuracy (0 to 0.9).]

The sensitivity analysis shows that model accuracy is at its highest when the base effectiveness threshold is between 7 and 10. It is also important to note that even for very high thresholds that are less realistic, the model accuracy remains above the current level of transfer success. Looking at the effectiveness threshold for different numbers of games played in a season in Figure 24, we can see that increasing the base threshold from 8 to 12 raises the effectiveness threshold by a large amount for players who play or miss a small number of games. That means many players with a positive impact on team performance can be deemed ineffective. Similarly, decreasing the base threshold from 8 to 4 lowers the effectiveness threshold by a large amount for players who play or miss a small number of games. That means it becomes very easy to classify those players incorrectly due to variability in data. Therefore, it seems that the optimal range for the model’s classification accuracy is also the most realistic range in which to set the base effectiveness threshold.

Figure 24 - Changes in the effectiveness thresholds for different number of games played by changing the base threshold

[Figure: x-axis Games Played (1 to 37); y-axis Change in Goal Differential (0 to 250). Series: Base Threshold = 4, Base Threshold = 8, Base Threshold = 12.]

7. Conclusion

7.1 Overview of Model, Results, and Benefits

The SMDP model presented in this study provides a foundation for using player attributes in estimating overall team performance. It was shown that the model is capable of estimating goal differential with an average error of 7.857. The error goes up to 18.912 on average when predicting performance with newly hired players who have not played with the rest of the team yet. However, the average error drops to approximately 13 or less for teams hiring 3 or more players from the same league. This implies that the changes in team dynamics can be smaller if fewer players from other leagues are hired.

The model also provides 85.6% classification accuracy when the goal is to identify the direction of change in team performance caused by hiring a new player.

It is a clear objective for all professional clubs to limit the number of transfers that lead to a negative impact on team performance. The model developed in this study can considerably reduce the loss resulting from poor transfers if used as a decision support tool for estimating team performance under specific hiring decisions.

One of the advantages of a strictly quantitative approach to performance analysis is that personal opinions are excluded from the conclusions. This type of application is meant to serve as a decision support tool for decision makers who ultimately make judgment-based decisions. Therefore, it is best for the supporting system to rely on quantitative sources in order to bring a different kind of insight into the process of decision making. Looking at the model outputs, some of the results may be surprising.

Among players with poor performance there are ones who have received positive reviews and vice versa. This difference in assessment is caused by the fact that in this simulation model, player actions only have value as they contribute to scoring goals and preventing the opposition from scoring. For instance, if a striker is strong in the air but does not receive many good crosses, he cannot use that skill to contribute to team performance.

Similarly, if a striker scores a good number of goals but does not perform well defensively, his net contribution to team performance will be reduced. In this case, the public may judge the striker’s performance primarily based on his goal scoring record and assign less value to his other less visible but highly important attributes. With the large amount of on-the-ball data available from players in top leagues, the key to more accurate performance analysis is identifying and capturing the right types of off-the-ball data.

One of the goals of this model is to capture the effect of those less visible actions on team performance. Given the presented results, it seems highly important to dedicate more effort to understanding the impact of players’ off-the-ball actions on team performance.

Soccer is one of the most difficult sports to analyze due to its continuity and players’ dynamic positions. These attributes along with the highly stochastic nature of the game have caused most of the analysis efforts to be focused on individual aspects of the game that can be generally modeled with higher confidence rather than the entire game as a whole (Hirotsu & Wright, 2003) (Shafizadeh, Gray, Sproule, & McMorris, 2012).

While it is a reality that soccer as a process is too complex to fully model, many player characteristics can only be modeled and evaluated in the context of team performance.

This study was aimed not at developing a perfect model for the game of soccer, but at applying the large amount of available player data in the context of a compatibility model with the purpose of predicting team performance more accurately than is currently done.

The large improvement in prediction accuracy using this model shows the potential of compatibility analysis for impacting team performance at the highest level of club soccer.

7.2 Limitations and Future Directions

The main limitation to quantitative analysis of soccer is the types of available data. Despite considerable advances in data collection tools and methodologies, the main challenge is that many player actions remain difficult to quantify. Some people have argued that the inability to quantify player actions makes quantitative analysis of sports less relevant. However, there is a general realization that quantitative analysis, despite its current imperfections, can lead to performance improvement. As data limitations seem to remain an issue in the near future, the largest challenge ahead for researchers is to develop methodologies that capture the effects of qualitative attributes despite the inability to directly measure them. The team-oriented model presented in this study is an effort to capture team dynamics that cannot be directly measured but which still influence the team’s overall performance.

Another important challenge in the quantitative analysis of soccer is to design studies so that they generate actionable results. The key to growing the field of performance analysis of sports is in earning the coaches’ trust. However, coaches often complain that the statistics presented to them do not reflect the realities on the field. For instance, if a striker fails to put himself in goal scoring positions but scores the few chances he gets, providing the striker’s conversion rate or the midfielders’ chance creation rate can be misleading. Similarly, if a striker keeps the defenders busy while other players create an opportunity and score a goal, using only the data on the goal and the assist does not reflect the reality. These are examples of the types of problems that the quantitative analysis of soccer should address in the future. Given the limitations in data collection, solving these problems will only be possible through developing creative methodologies to indirectly measure the hidden attributes.

8. Bibliography

A.T. Kearney. (2011). The Sports Market, Major Trends and Challenges in an Industry Full of Passion. Chicago: A.T. Kearney.

Banks, J., Carson, J. S., Nelson, B. L., & Nicol, D. M. (2005). Discrete-Event System Simulation. New Jersey: Pearson Education Inc.

Beck, N., & Meyer, M. (2011). Modeling team performance; Theoretical and empirical annotations on the analysis of football. Empirical Economics, 335–356.

Beetz, M., Kirchlechner, B., & Lames, M. (2005). Computerized Real-Time Analysis of Football Games. Pervasive Computing, 33-39.

Boon, B. H., & Sierksma, G. (2003). Team formation: Matching quality supply and quality demand. European Journal of Operational Research, 277-292.

Britannica. (2013). Encyclopedia Britannica. Retrieved May 26, 2013, from Britannica: http://www.britannica.com/EBchecked/topic/550852/football

Denardo, E. V. (2003). Dynamic Programming Models and Application. Mineola: Dover Publications Inc.

Dobson, S., & Goddard, J. (2010). Optimizing strategic behaviour in a dynamic setting in professional team sports. European Journal of Operational Research, 661-669.

EPL Index. (2013). EPL Index Stats Center. Retrieved May 26, 2013, from EPL Index: http://www.eplindex.com/stats/

FIFA. (2012). FIFA Financial Report. Zurich: FIFA.

FIFA. (2013). FIFA Associations. Retrieved May 26, 2013, from FIFA: http://www.fifa.com/aboutfifa/organisation/associations.html

FIFA. (2013). FIFA, Clubs. Retrieved May 26, 2013, from FIFA: http://www.fifa.com/classicfootball/clubs/europe/index.html

FIFA. (2013). History of Football - The Origins. Retrieved May 26, 2013, from FIFA: http://www.fifa.com/classicfootball/history/the-game/origins.html

Figueira, J. (2013, January 30). Stats and Info. Retrieved May 26, 2013, from ESPN Deportes: http://espndeportes.espn.go.com/blogs/index?entryID=1712628&name=stats_and_info&cc=3888

Fontevecchia, A. (2012, July 20). FC Barcelona Raking In The Big Bucks: Revenue Hits Record €494 Million. Retrieved May 26, 2013, from Forbes: http://www.forbes.com/sites/afontevecchia/2012/07/20/fc-barcelona-raking-in-the-big-bucks-revenue-hits-record-602-million/

Four Four Two. (2012, April 25). Stats Zone analysis: How Chelsea shocked Barça and made it to Munich. Retrieved May 26, 2013, from Four Four Two: http://fourfourtwo.com/blogs/statszone/archive/2012/04/25/stats-zone-analysis-how-chelsea-shocked-barca-and-made-it-to-munich.aspx

Goldner, K. (2012). A Markov Model of Football: Using Stochastic Processes to Model a Football Drive. Journal of Quantitative Analysis in Sports, Article 1.

Grundy, T. (1998). Strategy, value and change in the football industry. Strategic Change, 127-138.

Hirotsu, N., & Wright, M. (2003). Determining the best strategy for changing the configuration of a football team. Journal of the Operational Research Society, 878–887.

Hughes, M., & Franks, I. (2007). Analysis of passing sequences, shots and goals in soccer. Journal of Sports Sciences, 509-514.

Hughes, M., Caudrelier, T., James, N., Redwood-Brown, A., Donnelly, I., Kirkbride, A., et al. (2011). Moneyball and soccer - an analysis of the key performance indicators of elite male soccer players by position. Journal of Human Sport & Exercise, 402-412.

Ismer, S. (2011). Embodying the nation: football, emotions and the construction of collective identity. The Journal of Nationalism and Ethnicity, 547-565.

Johnes, M. (2004). ‘Heads in the Sand’: Football, Politics and Crowd Disasters in Twentieth-Century Britain. Soccer and Society, 134–151.

Jones, N. (2009, March 5). Football Violence & Top 10 Worst Football Riots. Retrieved May 26, 2013, from Soccer Lens: http://soccerlens.com/football-violence-worst-football-riots/23093/

Kang, C.-H., Hwang, J.-R., & Li, K.-J. (2006). Trajectory Analysis for Soccer Players. Sixth IEEE International Conference on Data Mining. IEEE Computer Society.

Murray, W. J. (1996). The World's Game: A History of Soccer. Illinois: Board of Trustees of the University of Illinois.

Oberstone, J. (2011). Comparing Team Performance of the English Premier League, Serie A, and La Liga for the 2008-2009 Season. Journal of Quantitative Analysis in Sports, Article 2.

Rösch, D., Hodgson, R., Peterson, L., Graf-Baumann, T., Junge, A., Chomiak, J., et al. (2000). Assessment and Evaluation of Football Performance. American Journal of Sports Medicine, 29-39.

Ross, S. (2007). Introduction to Probability Models. Burlington: Academic Press.

Shafizadeh, M., Gray, S., Sproule, J., & McMorris, T. (2012). An exploratory analysis of losing possession in professional soccer. International Journal of Performance Analysis in Sport, 14-23.

Shobe, H. (2008). Football and the politics of place: Football Club Barcelona and Catalonia, 1975–2005. Journal of Cultural Geography, 87-105.

Szymanski, S., & Smith, R. (2006). The English Football Industry: profit, performance and industrial structure. International Review of Applied Economics, 135-153.

Tenga, A., & Sigmundstad, E. (2011). Characteristics of goal-scoring possessions in open play: Comparing the top, in-between and bottom teams from professional soccer league. International Journal of Performance Analysis in Sport, 545-552.

Tenga, A., Holme, I., Ronglan, L. T., & Bahr, R. (2010). Effect of playing tactics on goal scoring in Norwegian professional soccer. Journal of Sports Sciences, 237–244.

The People History. (2013). History Of Soccer / . Retrieved May 26, 2013, from The People History: http://www.thepeoplehistory.com/soccerhistory.html

Transfer League. (2013). Premiership Transfers. Retrieved August 09, 2013, from Transfer League: http://www.transferleague.co.uk/premiership-transfers

Tuñón, J., & Brey, E. (2012). Sports and Politics in Spain – Football and Nationalist Attitudes within the Basque Country and Catalonia. European Journal for Sport and Society, 7-32.

UEFA. (2012, August 10). UEFA Champions League revenue distribution. Retrieved May 26, 2013, from UEFA: http://www.uefa.com/uefa/management/finance/news/newsid=1845591.html

Winston, W. L. (2004). Operations Research Applications and Algorithms. Belmont: Thomson Learning.

Young, W. A. (2010). A Team-Compatibility Decision Support System to Model the NFL Knapsack Problem: An Introduction to HEART. Ann Arbor: ProQuest LLC.

9. Appendices

Appendix 1 – Overview of Semi-Markov Decision Process

Discrete Time Markov Process (Markov Chain)

Markov chains are the processes describing transition probabilities over discrete time intervals in stochastic systems with multiple available states. There are three attributes in a Markov chain:

State: To use a Markov chain, it is necessary to clearly define the possible states of a system. A state describes the condition a system is in at a certain point in time.

Transition Probabilities: Once all possible states are defined, a probability can be computed for transitioning from a given state to any other state over a discrete time period. These probabilities can be computed from the system’s historical data or technical specifications.

Stage: Since Markov chains are used for modeling discrete systems, it is possible to count the number of transitions between two points in time. In Markov models, stage refers to the number of transitions. For instance, if we are interested in a system’s state after four discrete time intervals, we can say we are computing the probability of being in a certain state after four stages.

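Based on the descriptions above, transition probability matrix P can be constructed with the set of transition probabilities ($P_{i,j}$) from state “i” to state “j” over one stage for a system with a total of “n” possible states. Rows correspond to the current state and columns to the next state:

$$
P =
\begin{bmatrix}
P_{1,1} & P_{1,2} & \cdots & P_{1,j} & \cdots & P_{1,n} \\
P_{2,1} & P_{2,2} & \cdots & P_{2,j} & \cdots & P_{2,n} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
P_{i,1} & P_{i,2} & \cdots & P_{i,j} & \cdots & P_{i,n} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
P_{n,1} & P_{n,2} & \cdots & P_{n,j} & \cdots & P_{n,n}
\end{bmatrix}
$$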
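As an illustration of the stage concept, the probability of being in each state after several stages can be obtained by multiplying the one-stage matrix by itself, which is a standard result for Markov chains with fixed transition probabilities. The sketch below uses a hypothetical two-state matrix; the numbers and function names are illustrative assumptions and do not come from the soccer model.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def n_stage_matrix(p, stages):
    """Return the transition matrix over `stages` stages (stages >= 1)."""
    result = p
    for _ in range(stages - 1):
        result = mat_mul(result, p)
    return result


# Hypothetical one-stage transition matrix for a two-state system.
# Row i holds the probabilities of moving from state i to each state j.
P = [[0.9, 0.1],
     [0.5, 0.5]]

P4 = n_stage_matrix(P, 4)
# P4[i][j] is the probability of being in state j after four stages,
# given that the system started in state i.
print(P4[0])
```

Each row of the resulting matrix still sums to 1, since it describes a full probability distribution over the states.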

Semi-Markov Decision Process

When using Markov chains to describe a system’s behavior, the fundamental assumption is that the system’s state in the next stage is determined only by the current state and the corresponding transition probabilities. There are systems that possess the general attributes used in Markov chains but do not meet this underlying assumption.

A semi-Markov decision process is used to model probabilistic systems where the user can control the system by making decisions in each stage. In this type of model there are two or more decisions available in each stage. This means the system’s state in the next stage depends on its current state, the decision made in the current stage, and the transition probabilities associated with that decision. As a result, there will be a different transition probability matrix under each decision.

Based on the description above, a transition probability matrix can be constructed with the set of transition probabilities ($P_{i,j,k}$) from state “i” to state “j” under decision “k” over one stage for a system with a total of “n” possible states:

$$
\begin{bmatrix}
P_{1,1,k} & P_{1,2,k} & \cdots & P_{1,j,k} & \cdots & P_{1,n,k} \\
P_{2,1,k} & P_{2,2,k} & \cdots & P_{2,j,k} & \cdots & P_{2,n,k} \\
\vdots    & \vdots    &        & \vdots    &        & \vdots    \\
P_{i,1,k} & P_{i,2,k} & \cdots & P_{i,j,k} & \cdots & P_{i,n,k} \\
\vdots    & \vdots    &        & \vdots    &        & \vdots    \\
P_{n,1,k} & P_{n,2,k} & \cdots & P_{n,j,k} & \cdots & P_{n,n,k}
\end{bmatrix}
$$

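In this type of system there is a probability associated with each available decision. In each stage of the SMDP the decision likelihoods determine which decision is taken. The decision likelihoods for a system with “m” available decisions can be shown as below, with rows corresponding to states and columns to decisions:

$$
L =
\begin{bmatrix}
L_{1,1} & L_{1,2} & \cdots & L_{1,k} & \cdots & L_{1,m} \\
L_{2,1} & L_{2,2} & \cdots & L_{2,k} & \cdots & L_{2,m} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
L_{i,1} & L_{i,2} & \cdots & L_{i,k} & \cdots & L_{i,m} \\
\vdots  & \vdots  &        & \vdots  &        & \vdots  \\
L_{n,1} & L_{n,2} & \cdots & L_{n,k} & \cdots & L_{n,m}
\end{bmatrix}
$$

As a result, the probability of transitioning from state “i” to state “j” under decision “k” with likelihood “$L_{i,k}$” can be computed as: $(L_{i,k} \times P_{i,j,k})$.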
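The product of a decision likelihood and a transition probability can be illustrated with a small sketch. The two-state, two-decision system below is hypothetical, and all values and function names are assumptions made for illustration. The second function, which sums the product over all decisions to obtain an overall transition probability, is an application of the law of total probability and is not a step described in this appendix.

```python
# L[i][k]: likelihood of taking decision k while in state i (rows sum to 1).
L = [[0.7, 0.3],
     [0.4, 0.6]]

# P[k][i][j]: probability of moving from state i to state j under decision k.
P = [
    [[0.8, 0.2], [0.1, 0.9]],  # decision 0
    [[0.5, 0.5], [0.6, 0.4]],  # decision 1
]


def joint_probability(i, j, k):
    """Probability of choosing decision k in state i and then moving to j."""
    return L[i][k] * P[k][i][j]


def overall_transition(i, j):
    """Probability of moving from i to j, summed over all decisions."""
    return sum(joint_probability(i, j, k) for k in range(len(P)))


print(joint_probability(0, 1, 0))
print(overall_transition(0, 1))
```

For a given starting state, the overall transition probabilities across all destination states sum to 1, which is a convenient consistency check on the input matrices.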

Appendix 2 – Regression Analysis Results for EPL, La Liga, and Serie A

Regression Analysis: Points versus Goal Differential in EPL

The regression equation is Points10 = 48.7 + 0.566 Goal Differential10

Estimated points based on goal differential

EPL 2010/11

Actual Points    Estimated Points
73               70.774
70               70.208
64               66.246
59               60.586
55               52.662
52               55.492
48               51.530
46               48.700
45               52.662
43               48.700
43               40.776
41               49.266
41               41.342
41               41.342
38               40.776
38               39.078
35               36.248
35               35.682
33               35.682
32               36.248
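The estimated values in the table follow directly from the regression equation. As a minimal sketch, the function below applies an intercept and slope to a goal differential. The function name is hypothetical, and the input of 39 is an assumed goal differential chosen so that the result can be compared with the first estimated value in the table.

```python
def estimate_points(goal_differential, intercept, slope):
    """Estimate season points from goal differential with a linear fit."""
    return intercept + slope * goal_differential


# Coefficients of the EPL regression equation stated above.
EPL_INTERCEPT = 48.7
EPL_SLOPE = 0.566

# Assumed input: a goal differential of 39.
print(round(estimate_points(39, EPL_INTERCEPT, EPL_SLOPE), 3))
```

The same function can be reused for the other leagues by passing the intercept and slope from their respective regression equations.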

Regression Analysis: Points versus Goal Differential in La Liga

The regression equation is Points11 = 53.1 + 0.603 Goal Differential11

Estimated points based on goal differential

La Liga 2010/11

Actual Points    Estimated Points
96               100.030
92               96.805
71               65.200
62               58.750
58               52.945
58               54.880
58               58.105
49               46.495
47               51.655
47               47.785
46               43.270
46               42.625
45               43.915
45               45.205
45               41.335
44               45.205
44               42.625
43               41.980
35               36.820
30               30.370

Regression Analysis: Points versus Goal Differential in Serie A

The regression equation is Points10 = 51.9 + 0.825 Goal Differential10

Estimated points based on goal differential

Serie A 2010/11

Actual Points    Estimated Points
82               85.725
76               74.175
70               68.400
66               70.050
66               65.100
63               57.675
58               60.150
56               47.775
51               56.025
51               50.250
46               50.250
46               45.300
46               42.000
45               46.125
43               42.000
42               37.875
41               35.400
36               38.700
32               37.050
24               27.975

Appendix 3 – Deriving Input Attributes From Raw Data

Player Attributes

Each entry below gives the action, its likelihood, and its transition probability.

Action: Short Pass
    Likelihood: (Total Passes – Total Long Balls)/(Total State 1 Touches)
    Transition Probability: P(State 1) = (Accurate Passes - Accurate Long Balls)/(Total Passes - Total Long Balls); P(State 3) = 1 - P(State 1)

Action: Long Pass
    Likelihood: (Total Long Balls)/(Total State 1 Touches)
    Transition Probability: P(State 1) = (Accurate Long Balls)/(Total Long Balls); P(State 3) = 1 - P(State 1)

Action: Shot
    Likelihood: (Total Shots)/(Total State 1 Touches)
    Transition Probability: P(State 3) = (Goals + Shots Off Target)/(Total Shots); P(State 4) = 1 - P(State 3)

Action: Dribble
    Likelihood: (Total Dribble Attempts)/(Total State 1 Touches)
    Transition Probability: P(State 1) = (Successful Dribbles)/(Dribble Attempts); P(State 3) = 1 - P(State 1)

Action: Possession Errors
    Likelihood: (Unsuccessful Touches + Dispossessed + Ball Overrun)/(Total State 1 Touches)
    Transition Probability: P(State 3) = 1

Action: Attempting to Receive a Short/Long Pass
    Likelihood: (Total Touches)/(Total Team Touches)
    Transition Probability: Transition is determined by the player in state 1. This action only controls ball distribution within the team.

Action: Ending Opposition’s Possession
    Likelihood: (Interceptions + Clearances)/(Total Opponent Touches)
    Transition Probability: P(State 1) = (Interceptions)/(Interceptions + Clearances); P(State 4) = 1 - P(State 1)

Action: Challenging for the Ball
    Likelihood: (Total Tackles + Total Ground Duels + Total Aerial Duels)/(Total Team Challenges)
    Transition Probability: P(State 1) = (Successful Tackles + Successful Ground Duels + Successful Aerial Duels)/(Total Tackles + Total Ground Duels + Total Aerial Duels); P(State 3) = 1 - P(State 1)

Action: Chance Creation
    Likelihood: (Chances Created)/(Total Touches)
    Transition Probability: This attribute only detects chance creation and does not represent a specific action. Therefore, it is not associated with a specific transition function.

Action: Chance Distribution
    Likelihood: (Chances Received)/(Total Chances)
    Transition Probability: This attribute only detects chance distribution and does not represent a specific action. Therefore, it is not associated with a specific transition function.

Action: Chance Conversion
    Likelihood: (Goals)/(Chances Received)
    Transition Probability: This attribute only detects chance conversion and does not represent a specific action. Therefore, it is not associated with a specific transition function.

Total State 1 Touches = Total Touches – (Successful Tackles + Ground Duels Won + Aerial Duels Won)

Note: All player attributes are scaled by multiplying by (Total Minutes/Played Minutes)
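As an illustration of how these definitions translate raw counts into model inputs, the sketch below derives the short pass likelihood and transition probabilities for one player. All counts are hypothetical, the function names are assumptions made for illustration, and the scaling by minutes played is omitted for brevity.

```python
def state1_touches(total_touches, successful_tackles,
                   ground_duels_won, aerial_duels_won):
    """Total State 1 Touches, as defined at the end of the table."""
    return total_touches - (successful_tackles + ground_duels_won
                            + aerial_duels_won)


def short_pass_attributes(total_passes, total_long_balls, accurate_passes,
                          accurate_long_balls, touches_in_state1):
    """Return (likelihood, P(State 1), P(State 3)) for the short pass action."""
    short_attempts = total_passes - total_long_balls
    likelihood = short_attempts / touches_in_state1
    p_state1 = (accurate_passes - accurate_long_balls) / short_attempts
    return likelihood, p_state1, 1 - p_state1


# Hypothetical season totals for a single player.
touches = state1_touches(total_touches=2000, successful_tackles=50,
                         ground_duels_won=30, aerial_duels_won=20)
likelihood, p1, p3 = short_pass_attributes(
    total_passes=1000, total_long_balls=100,
    accurate_passes=800, accurate_long_balls=50,
    touches_in_state1=touches)

print(touches, round(likelihood, 4), round(p1, 4), round(p3, 4))
```

The other actions in the table follow the same pattern: a likelihood computed as a share of touches or challenges, and a transition probability computed as a success rate.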

Appendix 4 – Detailed Model Flow

Game Flow

• Set initial states
    o If a player is in state 1, set state for other players to 2
    o If a player is in state 3, set state for other players to 3
    o If a player is in state 4, set state for other players to 4
• Set action
    o If any of the players is in state 1:
        ▪ Using a random number, select one of the 4 available actions in that state based on the decision likelihoods for that player
    o If any of the players is in state 2:
        ▪ Using a random number, select one of the 3 available actions in that state based on the decision likelihoods for that player
    o If any of the players is in state 3:
        ▪ Using a random number, select one of the 3 available actions in that state based on the decision likelihoods for that player
    o If any of the players is in state 4:
        ▪ Using a random number, select one of the 3 available actions in that state based on the decision likelihoods for that player
• Set state transitions
    o If a player was in state 1:
        ▪ If a possession error is made, new state is 2
        ▪ Using a random number, select the new state based on the transition probability for the selected action for that player
        ▪ If the player’s new state is 1, set other players’ states to 2
        ▪ If the player’s new state is 3, set other players’ states to 3
        ▪ If the player’s new state is 4, set other players’ states to 4
        ▪ If the player’s new state is 2 (which means he successfully completed a short pass or a long pass):
            • If at least one player with initial state 2 has attempted to receive the same type of pass as executed:
                o Identify the action he took (whether short pass or long pass)
                o Identify the players with initial state of 2 whose action was to attempt to receive the same type of pass as executed (we can refer to these players as pass candidates)
                o Each player has a percentage of the team’s ball possession. Divide the percentage of each candidate’s possession by the sum of their percentages to find the probability of each candidate receiving the pass
                o Using a random number, select the pass receiver based on the probabilities computed in the previous step
            • If none of the players with initial state 2 have attempted to receive the same type of pass as executed:
                o Using a random number, select the pass receiver based on each player’s percentage of total ball possession
    o If a player’s initial state was 3 (implying all players were in state 3):
        ▪ Using random numbers, identify the players that press the ball based on each player’s probability of making a block
        ▪ Using the probability of each block leading to a transition to state 1 or 4, compute the overall probability of transitioning to state 1 or 4
        ▪ Using random numbers, determine whether the transition will lead to state 1, 3, or 4
        ▪ If transition to state 1:
            • Select the player who will be in state one by dividing the percentage of each blocker’s possession by the sum of their percentages of overall possession.
            • Set the states for other players to 2
        ▪ If transition to state 3:
            • Set the states for all players to 3
        ▪ If transition to state 4:
            • Set the states for all players to 4
    o If a player’s initial state was 4 (implying all players were in state 4):
        ▪ Using random numbers, identify the players that challenge for the ball
        ▪ Using the probability of each block leading to a transition to state 1 or 4, compute the overall probability of transitioning to state 1 or 4
        ▪ Using random numbers, determine whether the transition will lead to state 1, 3, or 4
        ▪ If transition to state 1:
            • Select the player who will be in state one by dividing the percentage of each blocker’s possession by the sum of their percentages of overall possession.
            • Set the states for other players to 2
        ▪ If transition to state 3:
            • Set the states for all players to 3
        ▪ If transition to state 4:
            • Set the states for all players to 4
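Two steps recur throughout the game flow above: selecting an option with a random number according to a set of likelihoods, and normalizing possession percentages among pass candidates. The sketch below shows one way these steps could be written. It is not the study's implementation; the function names, player labels, action labels, and all numeric values are hypothetical.

```python
import random


def pick_weighted(options, weights, rng):
    """Select one option with probability proportional to its weight,
    using a single random number and cumulative sums."""
    total = sum(weights)
    threshold = rng.random() * total
    cumulative = 0.0
    for option, weight in zip(options, weights):
        cumulative += weight
        if threshold < cumulative:
            return option
    return options[-1]  # guard against floating-point round-off


def pass_receiver_probabilities(possession_share, candidates):
    """Divide each candidate's possession share by the candidates' total."""
    total = sum(possession_share[name] for name in candidates)
    return {name: possession_share[name] / total for name in candidates}


rng = random.Random(0)  # fixed seed so that runs are repeatable

# Hypothetical decision likelihoods for a player in state 1.
action_likelihoods = {"short pass": 0.6, "long pass": 0.1,
                      "shot": 0.1, "dribble": 0.2}
action = pick_weighted(list(action_likelihoods),
                       list(action_likelihoods.values()), rng)

# Hypothetical shares of team possession; A and B are the pass candidates.
possession_share = {"A": 0.10, "B": 0.15, "C": 0.05}
probs = pass_receiver_probabilities(possession_share, ["A", "B"])
receiver = pick_weighted(list(probs), list(probs.values()), rng)

print(action, probs, receiver)
```

When no teammate has attempted to receive the executed pass type, the same selection function could be called with every teammate's possession share instead of the candidates' shares.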

Scoring

• If a player is in state 1 in stage n:
    o Using each player’s chance creation rate, determine if a chance is created when the player touched the ball (in state 1)
    o If a chance is created, use chance distribution data to determine the receiver of the chance
    o If a player is in state 1 in stage n+1:
        ▪ Using players’ conversion rates, determine whether or not a goal is scored

Conceding

• If the players are in state 3:
    o Identify the player who has lost possession (defender, midfielder, attacker)
    o Determine the number of consecutive stages in which the players have been in state 3 (including the current stage)
    o Compute the probability of conceding as: (The probability of conceding a goal as a result of losing possession in a given area of the pitch) * (The probability of success in the nth iteration of a geometric distribution, where n is the number of iterations the players have been in state 3)
    o Using random numbers and based on the computed probability of conceding, determine if a goal is conceded
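The conceding probability described above can be sketched as the product of an area-based probability and a geometric probability mass. The geometric success parameter is not stated in this appendix, so the value of q used below is purely hypothetical, as are the function names and the area probability.

```python
import random


def geometric_pmf(n, q):
    """Probability that the first success occurs on iteration n (n >= 1)
    of a geometric distribution with success probability q."""
    return (1 - q) ** (n - 1) * q


def conceding_probability(p_area, stages_in_state3, q):
    """Area-based conceding probability times the geometric term."""
    return p_area * geometric_pmf(stages_in_state3, q)


def goal_conceded(p_area, stages_in_state3, q, rng):
    """Draw a random number and compare it with the conceding probability."""
    return rng.random() < conceding_probability(p_area, stages_in_state3, q)


# Hypothetical inputs: possession lost in an area with a 0.1 conceding
# probability, players in state 3 for two consecutive stages, q = 0.5.
print(conceding_probability(p_area=0.1, stages_in_state3=2, q=0.5))
print(goal_conceded(0.1, 2, 0.5, random.Random(0)))
```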

Appendix 5 – Model’s Estimated Goal Differential Versus Actual Differential

                       Actual Goal   Estimated Goal  Adjusted Estimated
CLUB                   Differential  Differential    Goal Differential

2011
Manchester United      39            27.440          40.778
Chelsea                38            22.600          37.201
Arsenal                31            19.120          34.630
Manchester City        21            23.180          37.630
Tottenham Hotspur      7             -11.880         11.721
Liverpool              12            9.760           27.713
Bolton Wanderers       0             -38.990         -8.314
Stoke City             0             -44.850         -12.644
Newcastle United       1             -17.390         7.649
Aston Villa            -13           -45.380         -13.036
Sunderland             -13           -44.480         -12.371
Blackburn Rovers       -14           -66.150         -28.385
Wolverhampton          -23           -46.220         -13.657

2010
Manchester United      53            36.020          47.119
Tottenham Hotspur      26            -7.040          15.297
Manchester City        27            -5.010          16.798
Liverpool              28            22.140          36.861
Sunderland             -6            -33.180         -4.020
Stoke City             -11           -49.090         -15.778
Bolton Wanderers       -25           -55.450         -20.478
Wolverhampton          -23           -48.150         -15.083
Wigan Athletic         -34           -68.470         -30.099
West Ham United        -18           -53.470         -19.014

2009
Manchester United      44            18.950          34.504
Liverpool              50            18.400          34.098
Chelsea                44            -4.480          17.189
Everton                18            -1.680          19.258
Aston Villa            6             -28.430         -0.510
Fulham                 5             -15.920         8.735
Tottenham Hotspur      0             -19.050         6.422
Manchester City        8             -7.100          15.253
Stoke City             -17           -54.940         -20.101
Bolton Wanderers       -12           -43.450         -11.610
Portsmouth             -19           -41.760         -10.361
Sunderland             -20           -38.620         -8.040
Hull City              -25           -57.110         -21.704

Appendix 6 – Expected and Actual Team Performance for Transfers Completed in Summer of 2010

Transfers Completed in Summer of 2010

                     Expected       Expected
                     Differential   Differential   Scaled         Scaled
                     Before         After          Differential   Differential
Player               Transfer       Transfer       With           Without
Steven Fletcher      -48.150        -45.150        -17.161        -21.956
Stephen Hunt         -48.150        -41.000        -15.797        -22.574
Martin Petrov        -55.450        -35.490        -9.312         0.000
Chris Smalling       36.020         30.710         50.573         36.570
Joe Cole             22.140         10.070         -34.372        29.977
Titus Bramble        -33.180        -31.750        -10.255        -12.051
Kenwyne Jones        -49.090        -36.040        12.236         -65.664
Nedum Onuoha         -33.180        -30.350        -7.677         -22.892
James Milner         -5.010         -8.600         5.007          59.869
William Gallas       -7.040         -13.500        2.924          22.146
Franco Di Santo      -68.470        -72.160        -26.250        -19.091
Lars Jacobsen        -53.470        -56.500        -17.160        -40.743
Paul Konchesky       22.140         18.840         0.000          23.662

Appendix 7 – Expected and Actual Team Performance for Transfers Completed in Summer of 2011

Transfers Completed in Summer of 2011

                     Expected       Expected
                     Differential   Differential   Scaled         Scaled
                     Before         After          Differential   Differential
Player               Transfer       Transfer       With           Without
Phil Jones           27.440         15.560         41.864         79.167
Gael Clichy          23.180         20.940         67.355         55.043
Ashley Young         27.440         17.530         52.214         59.221
Mikel Arteta         19.120         25.730         42.836         -21.714
Jordan Henderson     9.760          10.170         10.274         -4.518
Charlie Adam         9.760          10.050         6.290          8.241
Samir Nasri          23.180         29.040         56.685         76.731
Peter Crouch         -44.850        -47.640        -19.078        -9.357
Scott Parker         -11.880        -3.300         37.309         -7.238
Wes Brown            -44.480        -34.320        3.940          -6.093
John O'Shea          -44.480        -35.130        -1.401         0.000
Stewart Downing      9.760          7.900          9.665          0.000
Jose Enrique         9.760          12.490         5.715          15.981
Raul Meireles        22.600         27.620         14.161         25.282
Demba Ba             -17.390        -20.450        -2.505         34.696
Sebastian Larsson    -44.480        -44.590        -3.786         9.634
Craig Gardner        -44.480        -46.630        0.000          -2.615
Roger Johnson        -46.220        -48.600        -39.194        -48.214
Charles N'Zogbia     -45.380        -39.910        -3.284         -35.812
Matthew Upson        -44.850        -48.950        -33.562        -9.971
Scott Dann           -66.150        -71.350        -35.581        -16.814
Alan Hutton          -45.380        -44.840        -13.189        -24.813
Cameron Jerome       -44.850        -51.750        -53.118        -5.296
David N'Gog          -38.990        -41.650        -28.416        -34.847
Wilson Palacios      -44.850        -48.800        -42.750        -9.137
Yakubu               -66.150        -75.330        -19.527        -56.529

Appendix 8 – Expected and Actual Contributions for Transfers Completed in Summer of 2010. Gray line shows actual contribution and black line shows expected contribution.

[Figure: line chart. Vertical axis: Goal Differential, from -80.000 to 100.000.]

Appendix 9 – Expected and Actual Contributions for Transfers Completed in Summer of 2011. Gray line shows actual contribution and black line shows expected contribution.

[Figure: line chart. Vertical axis: Goal Differential, from -60.000 to 80.000. Horizontal axis: transferred players, one tick label per player.]

Appendix 10 – Survey for Approximating Transfer Success Rate

Do you consider each of the following hiring decisions successful in terms of the overall outcome?

EPL

Manchester United   Yes  No   Chelsea             Yes  No   Arsenal             Yes  No   Manchester City     Yes  No
De Gea                        David Luiz                    Mertesacker                   Aguero
Jones                         Cahill                        Koscielny                     Nasri
Smalling                      Benayoun                      Arteta                        Milner
Hernandez                     Mata                          Park                          Dzeko
Young                         Torres                        Andre Santos                  Kolarov
                              Meireles                      Squillaci                     Savic
                              Lukaku                        Gervinho                      Clichy
                              Romeu                         Chamakh                       Yaya Toure
                                                                                          Balotelli
                                                                                          Boateng

Serie A

Juventus            Yes  No   A.C. Milan          Yes  No   Inter Milan         Yes  No
Vucinic                       Van Bommel                    Pazzini
Elia                          Mexes                         Forlan
Quagliarella                  Amelia                        Alvarez
Barzagli                      Ibrahimovic                   Guarin
Bonucci                       Mesbah                        Obi
Padoin                        Aquilani                      Ranocchia
Pirlo                         Nocerino                      Zarate
Vidal                         Boateng                       Nagatomo
Giaccherini                   Emanuelson                    Coutinho
Lichtsteiner                  Robinho
Krasic                        Yepes
Estigarribia                  El Shaarawy
Storari                       Cassano
Matri                         Taiwo
Pepe                          Aquilani
Luca Toni

La Liga

Real Madrid         Yes  No   Barcelona           Yes  No
Sahin                         Fabregas
Khedira                       Sanchez
Ozil                          Mascherano
Coentrao                      Villa
Altintop                      Adriano
Varane                        Afellay
Callejon
Di Maria
Pedro Leon
Canales

Bundesliga

Bayern Munich       Yes  No
Neuer
Jerome Boateng
Alaba
Gustavo
