
Eindhoven University of Technology

MASTER

Does data know more than the market? data-based goal prediction in association football, tested on the Asian betting market

Neggers, R.

Award date: 2016




Eindhoven, October 2016

Does data know more than the market? Data-based goal prediction in association football, tested on the Asian betting market.

by Rob Neggers

BEng Mechanical Engineering
Student identity number 0721387

In partial fulfilment of the requirements for the degree of: Master of Science (MSc) in Operations Management & Logistics

First supervisor: Dr. S.S. Dabadghao
Second supervisor: Dr. M. Udenio
Third assessor: Dr. A.M. Wilbik
TU/e School of Industrial Engineering
Series Master Thesis Operations Management and Logistics

Subject headings: Association football, Fuzzy inference systems, Neural networks, Prediction, Structural equation modelling.

Abstract

In this thesis, variables from preceding matches are used to predict the outcome of association football games. To do so, three different model types are used: partial least squares structural equation modelling (PLS-SEM), artificial neural networks and fuzzy inference systems. These three models return predictions with comparable errors that outperform simpler heuristics. Investigation of the Asian betting market shows no biases large enough to make blindly betting on a particular type of bet significantly profitable. It was, however, found that betting on favourites yields significantly higher returns than betting on underdogs. When the predictions of the different models were used to select bets against the closing lines of the Asian betting market, no significantly positive returns were obtained under any setting. This implies that the models used are not accurate enough to make a profit from their predictions.

Preface & Acknowledgements

This thesis report marks the end of my time in the master’s programme of Operations Management & Logistics and my time as a student in general. It is the turning point from student life to working life. I owe gratitude to the people who made this period challenging, interesting and, most of all, fun.

First of all, I would like to thank dr. Shaunak Dabadghao, my first supervisor, not only for providing me with the right feedback and ideas at the right times, but even more so for accepting my proposal to do my thesis on the application of operations research methods to sports: a field in which master thesis projects are rarely done at the faculty of Industrial Engineering, but one that interests me greatly and that I am truly passionate about. I would also like to thank my second university supervisor, dr. Maximiliano Udenio, for helping me out with my questions related to statistical methods and testing and for taking the time to provide extensive feedback on my final report. I want to thank Joris Bekkers, who showed me the relative ease of using Python to collect data from Squawka. Without this insight and the starting points he provided, the data I used would have had much less depth and would have been a lot more time consuming to gather.

I would like to thank all friends I have made during my time at the TU/e and all my team mates of University Racing Eindhoven. With you all coming from different backgrounds varying from economics to electrical engineering, I learned much more from working with you than just the curriculum and I am truly grateful for all this knowledge outside of the borders of my own field. I would like to thank Tanja, for her mental support and understanding of my occasional lack of time while writing my thesis.

Finally, I want to thank my parents, for supporting me unconditionally throughout my whole time as a student and giving me the opportunity and freedom to pursue the education I wanted.

Rob Neggers Eindhoven, October 2016

Summary

Introduction & motivation

Previous research has used mathematics to predict association football games (for example with a Poisson distribution (Maher, 1982) and with Bayesian networks (Constantinou, Fenton, & Neil, 2012)), and mathematical models have also been used to make predictions for the sports betting market (for example by Dixon and Coles (1997) and Constantinou, Fenton, and Neil (2013)). However, these models only used goals/points or psychological and situational factors as prediction variables. For American football, research was done on using a regression model based on statistics from previous games (Zuber, Gander, & Bowers, 1985). This model was able to generate profits on the betting market. Therefore, this thesis focussed on using game statistics as input for forecasting models that predict the number of goals in association football games. Three different model types were used for forecasting: partial least squares structural equation modelling (PLS-SEM), neural networks and fuzzy inference systems. The forecasts of these models were then tested against the Asian betting market to see if they are able to generate a profit. Summarizing, this led to the following main research question:

What type of forecasting model based on match statistics is the most accurate for association football and how does this model’s performance compare to the Asian betting market?

Approach

To answer this question, it was first investigated which variables should be included in the models. Data was gathered from Squawka’s website using a Python script. Different types of rolling averages were calculated for all gathered variables, and regression analyses were performed with these averages as inputs and the goals scored and conceded in the next match as output variables. An average over the 5 matches preceding the predicted game turned out to have the most significant relationship with the predicted variables. Therefore, 5-match average values of the variables were used as inputs for the prediction models.
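As an illustration of the feature construction described above, the sketch below computes trailing 5-match averages for one statistic. The function name, data layout and example values are hypothetical, not taken from the thesis.

```python
from collections import defaultdict, deque

def trailing_averages(matches, window=5):
    """For each match (in chronological order), return the average of a team's
    statistic over its previous `window` matches, or None when the team has
    fewer than `window` prior observations (excluded from model fitting)."""
    history = defaultdict(lambda: deque(maxlen=window))  # team -> recent values
    features = []
    for team, value in matches:  # matches: list of (team, stat_value) pairs
        if len(history[team]) == window:
            features.append(sum(history[team]) / window)
        else:
            features.append(None)  # not enough history yet
        history[team].append(value)  # update AFTER predicting: no leakage
    return features

# Example: one team's shots on target over eight matches (made-up numbers)
data = [("Ajax", v) for v in [5, 7, 4, 6, 8, 5, 9, 3]]
print(trailing_averages(data))  # [None, None, None, None, None, 6.0, 6.0, 6.4]
```

Updating the history only after the feature is recorded ensures the average never includes the match being predicted.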

The three different models were fitted individually and then used to predict the number of goals a team would score for an independent sample of 1200 matches (resulting in 2400 predictions, one per team per match). For these predictions, the mean absolute error (MAE) and the root mean squared error (RMSE) were calculated. These error measures were also calculated for a number of heuristics. All of these values are shown in Table 1.
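For reference, the two error measures reported in Table 1 can be computed as follows; this is a minimal sketch with made-up goal counts and predictions, not the thesis's actual evaluation code.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

goals = [0, 2, 1, 3]           # goals actually scored (made-up)
preds = [0.9, 1.4, 1.1, 1.8]   # model predictions (made-up)
print(round(mae(goals, preds), 4))   # 0.7
print(round(rmse(goals, preds), 4))  # 0.8093
```

Because RMSE squares the errors before averaging, it is always at least as large as MAE on the same predictions, which is consistent with the pattern in Table 1.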

Table 1: MAE and RMSE for created models and various heuristics

Model/heuristic                                                     MAE     RMSE
Structural equation model                                           0.9203  1.1661
Neural network                                                      0.9083  1.1561
Fuzzy inference system                                              0.9217  1.1689
General average goals scored                                        1.0048  1.2484
Average goals scored for home and away team separate                0.9815  1.2350
General average goals scored per league                             0.9988  1.2446
Average goals scored per league for home and away team separate     0.9754  1.2309
Average of a team's goals scored in last 5 matches                  1.0223  1.3003
Average of upcoming opponent's goals conceded in last 5 matches     1.0441  1.3341
Average of scored and opponent's conceded goals in past 5 matches   0.9755  1.2383

As can be seen in this table, the error measures of the three models are quite comparable and outperform the other heuristics. Statistical testing found no significant differences between the absolute errors of the models, implying that there is not enough evidence to claim that one model is better than another.

As a next step, the Asian betting market was analysed. Data from PinnacleSports was used, the bookmaker with the highest limits in the world offering Asian-style markets. It was checked whether blindly betting on all bets matching a certain situation is significantly profitable, and whether bets matching a certain situation are more profitable than the opposing bets. No type of bet generates a significant profit when blindly betting on it; as expected, nearly all bet types (apart from favourites, and especially away favourites) result in a significant loss when blindly betting on them. When investigating opposing bets, it was found that blindly betting on favourites returns a significantly higher profit (or rather, a smaller loss) than blindly betting on underdogs. This holds in particular for betting on home favourites as opposed to away underdogs and for betting on away favourites as opposed to home underdogs.
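Asian-style bets do not simply win or lose: whole lines can push (stake refunded) and quarter lines split the stake over two adjacent lines, which affects the returns such a bias analysis aggregates. The sketch below shows one plausible way to settle a handicap bet; the function and the example line and odds are illustrative assumptions, not the thesis's actual settlement code.

```python
def settle_handicap(margin, line, odds, stake=1.0):
    """Net profit of a handicap bet on the home team.
    `margin` = home goals minus away goals; `line` = handicap applied to the
    home team (e.g. -0.5). Quarter lines (-0.75, -1.25, ...) are settled as
    two half-stakes on the adjacent half/whole lines."""
    if line % 0.5 != 0:  # quarter line: split the stake
        return (settle_handicap(margin, line - 0.25, odds, stake / 2)
                + settle_handicap(margin, line + 0.25, odds, stake / 2))
    adjusted = margin + line
    if adjusted > 0:
        return stake * (odds - 1)  # win: profit at decimal odds
    if adjusted == 0:
        return 0.0                 # push: stake refunded
    return -stake                  # loss

# Home favourite at -0.75, odds 1.95: a one-goal win pushes half the stake
# and wins the other half, so the profit is 0.5 * 0.95 = 0.475 per unit.
print(settle_handicap(1, -0.75, 1.95))
```

Averaging such settled profits over all bets matching a situation (divided by the total stake) gives the ROI figures the bias analysis compares.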

The most interesting finding at this stage is that bets with higher odds (and thus higher volatility) have lower returns (bigger losses) than bets with smaller odds (lower volatility). This conflicts with the risk premium phenomenon present in regular investment markets, where investments with higher volatility carry a higher expected return.

Results

In the final stage of the project, the predictions made by the models were tested on the Asian betting market. Implied quality differences between the teams, as well as an expectation for the total number of goals, were calculated both from the models and from the odds of the bookmaker. Different selectivity settings were used in the simulations, defined as the minimum required difference between the model's prediction and the bookmaker's implied prediction before a bet is placed. The returns on investment for betting with the different threshold settings were calculated and are depicted in Figure 1.
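One hedged sketch of such a selection rule: strip the bookmaker's margin from a two-way market to obtain implied probabilities, and only bet when the model's probability differs from the implied one by more than the threshold. The function names, odds and threshold value below are illustrative assumptions, not the thesis's actual bet criteria.

```python
def fair_probabilities(odds_a, odds_b):
    """Strip the bookmaker's margin from a two-way market by normalizing
    the inverse decimal odds into implied 'fair' probabilities."""
    raw = [1 / odds_a, 1 / odds_b]
    overround = sum(raw)  # > 1; the excess is the bookmaker's margin
    return [r / overround for r in raw]

def select_bet(model_prob_over, odds_over, odds_under, threshold):
    """Bet only when the model's probability exceeds the bookmaker's implied
    probability by more than `threshold` (the selectivity setting)."""
    implied_over, implied_under = fair_probabilities(odds_over, odds_under)
    if model_prob_over - implied_over > threshold:
        return "over"
    if (1 - model_prob_over) - implied_under > threshold:
        return "under"
    return None  # difference too small: no bet placed

print(fair_probabilities(1.90, 2.00))      # roughly [0.513, 0.487]
print(select_bet(0.60, 1.90, 2.00, 0.05))  # over
```

Raising the threshold makes the strategy more selective: fewer bets, each requiring a larger disagreement with the market.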

Figure 1: ROI results for betting with varying thresholds

As can be seen in this graph, increasing the selectivity of the models generally only decreases their performance. This implies that the models are not better at predicting matches than the betting market. This claim is supported by significance tests checking whether the returns are significantly different from zero. For no model is there a threshold setting that results in median returns significantly higher than zero, while several models have median returns significantly lower than zero. As a last check of the models' performance, it was tested whether they could generate returns with a median significantly different from the bookmaker's profit margin. Under certain threshold settings this is the case for the models betting on the total number of goals, implying that they are better than randomly picking bets.

Conclusion and contributions

In conclusion, the developed models, based purely on game statistics, are not accurate enough to generate significant profits when their predictions are used to select bets based on the closing odds of the Asian betting market. However, these closing odds are the sharpest lines in the world, as they have been adjusted for all available last-minute information such as weather, injuries and managers revealing tactical plans in pre-game interviews. It is still possible that the models could generate profits on odds taken earlier before the start of the match. Apart from these practical implications, this project contributes to science in the following ways:

• No previous research had been done on using game statistics from previous matches as input for prediction models in association football

• For betting on association football, only betting on the outright winner and correct scores of games had been researched. This research adds to science by having focussed on the handicap and goal totals markets.

• Biases and behaviour of the Asian betting market for association football had not been researched previously

Contents

1 Introduction 1
1.1 Motivation of study ...... 1
1.2 Summary of Literature review ...... 2
1.2.1 Prediction of association football games ...... 2
1.2.2 Dynamics and biases of betting markets ...... 4
1.2.3 Predictive modelling applied to sports betting ...... 5
1.3 Gaps in the literature and research questions ...... 7
1.4 Relevance to the field of operations research ...... 8
1.5 Report structure ...... 9

2 Data acquisition and treatment 10
2.1 Data acquisition ...... 10
2.2 Data preparation ...... 10
2.2.1 Handling missing values ...... 10
2.2.2 Outlier analysis ...... 11
2.2.3 Calculating rest before a game ...... 11
2.2.4 ELO-ratings ...... 12

3 Selection of variables 14
3.1 The influence of rest ...... 14
3.2 Indicators of team quality ...... 16
3.2.1 Averages of match statistics ...... 16
3.2.2 Variable selection ...... 17

4 Modelling 19
4.1 PLS-SEM ...... 19
4.1.1 Baseline model ...... 21
4.1.2 Evaluation of measurement models ...... 22

4.1.3 Evaluation of structural model ...... 24
4.1.4 First iteration ...... 26
4.1.5 Second iteration ...... 27
4.1.6 League-specific results ...... 29
4.2 Artificial neural networks ...... 30
4.2.1 Single layer neural networks ...... 32
4.2.2 Two layer neural networks ...... 35
4.2.3 Three layer neural networks ...... 35
4.2.4 League-specific results ...... 36
4.3 Fuzzy inference systems ...... 36
4.3.1 Performance in literature of input methods ...... 38
4.3.2 Variable selection ...... 39
4.3.3 Evaluation of model performance ...... 39
4.3.4 Grid partition ...... 40
4.3.5 Subtractive clustering ...... 41
4.3.6 Fuzzy c-means clustering ...... 41
4.3.7 League-specific results ...... 42
4.4 Comparison of models and heuristics ...... 42
4.4.1 Significance of prediction model differences ...... 43

5 Biases in the betting market 44
5.1 Asian handicaps ...... 44
5.2 Main lines ...... 46
5.2.1 Separate league results ...... 48
5.3 The effect of handicap size ...... 50
5.4 High and low risk lines ...... 52

6 Betting simulations with model results 54
6.1 Bet criteria ...... 54
6.2 Results ...... 57
6.3 Significance of results ...... 58
6.3.1 Testing difference from zero ...... 58
6.3.2 Comparison against random betting ...... 59

7 Conclusion 61

7.1 Conclusions of research questions ...... 61
7.2 Limitations and ideas for future research ...... 62

References 64

Appendices 69

A Collected match statistics 70

B Significance of variables 72

C SEM results 75

D Fuzzy system variable selection 81

E Fuzzy system SRIC values 85

F Skewness values and confidence intervals of bias return vectors 89

G Number of bets simulated for different models 91

H Skewness values and confidence intervals for bet simulation returns 92

List of Figures

1 ROI results for betting with varying thresholds ...... v

1.1 Schematic overview of thesis project ...... 8

4.1 Path diagram baseline model ...... 22
4.2 Path diagram of first iteration ...... 26
4.3 Path diagram of second iteration SEM model ...... 27

4.4 Model of an artificial neuron (from Kantardzic, 2011) ...... 30

4.5 Multilayer perceptron (Kantardzic, 2011) ...... 31

4.6 AIC values for single layer networks ...... 33
4.7 AIC values for different datasets ...... 34

5.1 Regression on different line ROI’s (sides) ...... 50

5.2 Regression on different line ROI’s (home teams) ...... 51

5.3 Regression on different line ROI’s (away teams) ...... 51

5.4 Regression on different line ROI’s (overs) ...... 52

6.1 Linear relation between line and odd ...... 55
6.2 Polynomial fit between line and odd ...... 56
6.3 Exponential fit between line and odd ...... 56
6.4 ROI results for betting with varying thresholds ...... 57
6.5 ROI results for betting predictions trained on a maximum of 4 goals ...... 58

D.1 RMSE results for home goals with 1 variable fuzzy inference systems ...... 81
D.2 RMSE results for away goals with 1 variable fuzzy inference systems ...... 82

List of Tables

1 MAE and RMSE for created models and various heuristics ...... iv

3.1 Influence of rest for full dataset ...... 15
3.2 Influence of rest for top teams ...... 15
3.3 Influence of rest for non-top teams ...... 15
3.4 Influence of rest for mid-table teams ...... 16
3.5 Mean p-values for different averages ...... 17
3.6 Variables included in the models ...... 18

4.1 Coefficients of determination for different models ...... 25
4.2 Effect sizes of relationships ...... 25
4.3 Coefficients of determination for first iteration model after removing relationships ...... 26
4.4 Effect sizes of relationships for model after first iteration ...... 27
4.5 Coefficients of determination for first iteration model after removing relationships ...... 29
4.6 Effect sizes of relationships for model after first iteration ...... 29
4.7 Separate League MAE and RMSE values for the Iteration 2 SEM model ...... 29

4.8 Test set RMSE averages (1 to 10 nodes) for different regularization parameters . 34

4.9 AIC values for 2 layer network ...... 35
4.10 AIC values for 3 layer network ...... 35
4.11 Separate League MAE and RMSE values for a three layer neural network ...... 36
4.12 RMSE results for away goals for a 3 variable fuzzy inference system ...... 40

4.13 SRIC results for using varying membership function types ...... 41

4.14 Separate League MAE and RMSE values for a fuzzy c-means clustering FIS ...... 42
4.15 MAE and RMSE for created models and various heuristics ...... 43
4.16 Significances for model differences ...... 43

5.1 Closing side lines and odds at PinnacleSports for Ajax - FC Utrecht ...... 44
5.2 Closing total goals lines and odds at PinnacleSports for Ajax - FC Utrecht ...... 45

5.3 ROI results for all bets in certain situations (main lines) ...... 48

5.4 p-value results for opposing bets ...... 48

5.5 ROI per league for all bets in certain situations for different leagues (main line) . 49

5.6 p-values for different type of bets per league (Wilcoxon signed rank test) . . . . . 49

5.7 Regression results line/ROI (sides) ...... 50

5.8 Regression results line/ROI (home teams) ...... 51

5.9 Regression results line/ROI (away teams) ...... 52

5.10 Regression results line/ROI (overs) ...... 52

5.11 ROI results for all bets in certain situations for highest and lowest odds offered ...... 53
5.12 T-test p values for ROI's of main lines and lines with the highest and lowest odds ...... 53

6.1 Sides betting ROI’s and significance of return (compared to zero) ...... 59

6.2 Totals betting ROI’s and significance of return (compared to zero) ...... 59

6.3 Sides betting ROI’s and significance of return (compared to bookmaker’s margin) 60

6.4 Totals betting ROI’s and significance of return (compared to bookmaker’s margin) 60

B.1 Abbreviations of averages ...... 72
B.2 Significance of relationship of match variable averages with goals scored ...... 73
B.3 Significance of relationship of match variable averages with goals conceded ...... 74

C.1 Values and significance of weights and loadings for measurement model ...... 75
C.2 VIF values for constructs ...... 77
C.3 Weights and significances constructs of baseline model ...... 78
C.4 Weights and significances constructs of model after first iteration ...... 79
C.5 Weights and significances constructs of model after second iteration ...... 80

D.1 RMSE results for home goals for a 3 variable fuzzy inference system ...... 83
D.2 RMSE results for away goals for a 3 variable fuzzy inference system ...... 84

E.1 SRIC results for a grid partition FIS with different variables ...... 86
E.2 SRIC results for a subtractive clustering FIS with different variables ...... 87
E.3 SRIC results for a fuzzy c-means clustering FIS with different variables ...... 88

F.1 Skewness values and confidence intervals for main lines ...... 89
F.2 95% confidence limits for skewness of returns per situation for different leagues ...... 90

G.1 Number of bets made by the models on sides for different thresholds ...... 91
G.2 Number of bets made by the models on totals for different thresholds ...... 91

H.1 Skewness 95% confidence intervals for returns when betting on sides ...... 92

H.2 Skewness 95% confidence intervals for returns when betting on totals ...... 92

Abbreviations & Terminology

AIC: Akaike information criterion
Asian handicap: Virtual lead or deficit that a team starts a match with in order to obtain betting possibilities for both sides with odds of approximately 2
FCM: Fuzzy c-means clustering
GP: Grid partition
Main line: Betting line for a side or total with the odds closest to 2. This line usually allows the highest betting limits of the market
MF: Membership function
NIC: Network information criterion
Over (in betting): Betting on the number of goals scored being higher than the total goals line as set by the bookmaker
PLS-SEM: Partial least squares structural equation modelling
Sides (in betting): Betting market related to the score difference between two teams in a particular match
SEM: Structural equation modelling
SC: Subtractive clustering
SIC: Schwarz information criterion
SRIC: Schwarz-Rissanen information criterion
Total (in betting): Betting market related to the total number of goals scored in a particular match
Under (in betting): Betting on the number of goals scored being lower than the total goals line as set by the bookmaker

Chapter 1

Introduction

The interest in predicting events is something that naturally comes with sports. Most people play a football pool with friends, family or colleagues during a World Cup or European Championship, but a vast amount of betting opportunities is offered online by bookmakers too. This master thesis is about goal prediction in association football, based on statistics from previous matches. In various other applications, computational models have already proven able to make better and more accurate complex decisions than humans based on the available data. Take the recent example of one of the top Go players in the world, Lee Sedol, who lost 3 games in a row to AlphaGo, a program driven by Google’s DeepMind AI (Morris, 2016). In this thesis, statistical methods and computational intelligence will be used to predict the number of goals a team will score. These models will then be used as a ’selector’ of betting opportunities to evaluate their accuracy relative to the bookmaker’s odds, which are a product of human judgement (namely that of the bookmaker and the betting public, whose behaviour causes the odds to change).

1.1 Motivation of study

Accurate predictions for football matches can be utilized in various applications. The main application can be found in the sports betting market, where predictions are needed on both sides. On one side, bookmakers need predictions to set their odds correctly in order to get the betting behaviour they desire from the public. On the other side, bettors can utilize models to make predictions for sports events and compare these predictions to the betting odds set by the bookmaker. When the difference between the bettor's prediction and the bookmaker's implied prediction (reflected by the odds) is big enough, the bettor can profit from it, provided his prediction is accurate enough.

Another way it could be used in practice is by football clubs, who can utilize it to adapt their tactics or use it as input for their training schedules. They can use the model to simulate upcoming matches and perform sensitivity analyses to see what the effects would be if their team improved on a certain variable. It also gives them an idea whether their recent performances are in line with expectations, or whether they were lucky or unlucky. Coaches can use this to see if they need to change things tactically. Chairmen can use this to decide whether the club is performing well or not, rather than just reacting emotionally to losses or wins based on the number of goals that were scored (and firing the coach).

Other potential applications could lie in the allocation of different goods related to football games. For example, people who watch football on TV prefer to see games that are exciting to watch and have a lot of goals. The predictions could help to select a game to broadcast on TV in order to reach a maximum number of interested viewers. In the longer term (looking ahead more than one match), they can be used for scheduling game times. When certain ‘strong’ teams have been identified and are expected to compete for the title later in the season, the football associations that schedule the games can ensure that these teams play at the same moment during the last games of the season to keep the competition as exciting and fair as possible.

Furthermore, it could be used by companies that are interested in participating in football, for example sponsors or technology partners. Rather than just going for the teams that are on top of the table at that moment, they could identify teams that are underachieving in the table but playing better than their ranking suggests (and are therefore expected to do better soon). If they choose to cooperate with these clubs, they can most likely get a better deal for their money than with overperforming clubs higher in the table, while getting the same amount of exposure in the end. This increases the efficiency of their investment.

1.2 Summary of Literature review

1.2.1 Prediction of association football games

The first research found on predicting the outcome of association football games was done by Maher (1982). He adapted a model from research done by Moroney (1951), who used a negative binomial distribution to model the scores in an association football game. Maher introduced four parameters in his model: home team offensive quality (α), away team defensive quality (β), away team offensive quality (δ) and home team defensive quality (γ). Data was obtained for each of England’s highest four football divisions. To estimate these parameters, he used maximum likelihood estimates based on the number of goals the teams had scored and conceded in their respective home and away games. After testing various models varying the aforementioned parameters, Maher found that using just α and β led to the most accurate model, implying that home advantage is equal for all teams.
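Written out, the independent-Poisson structure underlying this approach takes the following form, where $X_{ij}$ and $Y_{ij}$ are the goals scored by home team $i$ and away team $j$ respectively (notation follows the parameter definitions above; this is a sketch of the structure rather than Maher's exact specification, which tested several variants of these parameters):

```latex
X_{ij} \sim \mathrm{Poisson}(\alpha_i \, \beta_j), \qquad
Y_{ij} \sim \mathrm{Poisson}(\delta_j \, \gamma_i)
```

The home team's expected score thus combines its own offensive quality with the visitor's defensive quality, and vice versa for the away team.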

The research done by Maher was further adapted by Dixon and Robinson (1998) in order to estimate how goals are distributed within a single match and what influences this distribution. The model was fitted with league and cup goal time data from 1993 to 1996. It was found that a noticeably high number of goals is scored in the last part of each half, and that the scoring rate increases throughout the match. Using a birth process, the scoring rates until the next goal were analysed for various current scores. A Poisson distribution was used to model the independent scoring processes of both teams. Both findings mentioned earlier are incorporated in the model, in combination with the parameters introduced by Maher (1982). From that, it was found that the scoring rates depend on the current score in a game. When the home team leads, the home team’s scoring rate decreases and the away team’s scoring rate increases. When the away team is leading, both scoring rates increase. The authors suggest that even a draw being a good result for the away team is a possible explanation for this. The scoring rates generally increase once the first goal has been scored. It is also shown that after a team scores, it is less likely to concede a goal in the next couple of minutes.

Cheng, Chu, Fan, Zhou, and Lu (2003) researched the application of neural networks to the prediction of football games. They developed a model consisting of three different neural networks that each have their own weights for the variables: a network for when the home team is stronger, a network for when the away team is stronger and a network for when the teams are approximately equal in quality. The authors did this because they believe that some teams overperform against teams that are classified as stronger. The model automatically selects the neural network to use based on the class information (difference in points and difference in goals between the two teams). For training the network, the following parameters are used: win ratio, draw ratio, lose ratio, average goals scored and conceded, morale (form) and home advantage. Data from the Italian Serie A 2001/2002 season was used to test the model and compare it to existing statistical methods. The model was able to predict 52.3% of the matches correctly, which is better than the other methods it was compared with: the ELO ranking (Jones, 2000), the goal ratio ranking (Jackson, 1990) and the ‘latest 6 match results’ compare model, a rule-based system simply based on the goals scored by the two teams in their last 6 matches.

The model by Cheng et al. was later extended by Aslan and Inceoglu (2007). They tested two different input vectors, named LVQa (learning vector quantization) and LVQb, of which LVQb is a simplified version of LVQa. LVQa uses the home and away ratings of both teams, while LVQb uses just the home rating of the home team and the away rating of the away team. The same dataset was used as by Cheng et al. in order to get a fair comparison. LVQa has an overall prediction accuracy of 51.29% while LVQb has an overall prediction accuracy of 53.29%, also outperforming the model proposed by Cheng et al. (2003). This could be caused by over-fitting of the more complex model on the training data. Interestingly, LVQa has a higher prediction accuracy on away wins and draws, while the higher accuracy of LVQb is largely caused by its good prediction of home wins.

Research on using fuzzy modelling to predict the goal difference between two teams in an association football game was done by Rotshtein, Posner, and Rakityanskaya (2005). This model used the goals scored in the five previous matches of both teams as inputs, as well as the last two meetings between the teams playing the game. To tune the model, data from 1056 matches in the highest Finnish football league was used, because this league is characterized by a minimum number of upsets. The prediction results were best for the extreme classes of decisions (high-score wins or losses) and worst for low-score wins or losses. The model proved quite successful in predicting draws.

A comparative study of different methods was done by Tsakonas, Dounias, Shtovba, and Vivdyuk (2002). They predicted the outcome of association football matches using a fuzzy model, a neural network and a genetic programming model. All three models could predict the game winner, while the fuzzy model and the neural network could also predict the score of games. The following features were used as input for their models: number of suspended and injured players, form, league ranking, home advantage and goal difference in direct matches between the two involved teams over the last 10 years. When it came to predicting the outcome of games, the genetic programming model outperformed both the neural network and the fuzzy model, which were equal in performance. However, the genetic programming model was the heaviest in terms of required computing power. For predicting the scores, the fuzzy model slightly outperformed the neural network.

The application of Bayesian networks to predict outcomes of individual matches was researched by Constantinou et al. (2012). A sample of all English Premier League matches from 1993/1994 to 2010/2011 was used (6244 matches) to make predictions for the 380 matches of the 2010/2011 season. The model that was built used four components to base its predictions on: strength, form, psychology and fatigue. Strength is measured by points from previous seasons (modelling higher uncertainty for older seasons) and the current season, as well as optional subjective information that the model cannot know (for example the signing of new superstar players). The team form component consists of the availability of key players, the current form (consisting of home and general form) and possible first-team players returning after injuries. The psychological component is a subjective component which holds team spirit, managerial issues, motivation and possible head-to-head biases (rivalries, for example). Fatigue is modelled by the toughness of the previous match, the number of days that have passed since the last match, the number of first-team players rested and the participation of first-team players in matches of their national teams. The objective model proved inferior both to the subjective model and to normalized predictions by bookmakers. The subjective model, however, was comparable in performance to the predictions of the normalized bookmakers' odds.

1.2.2 Dynamics and biases of betting markets

The first article found about bookmaking dates back to 1950, in which Lawrence (1950) discusses the history of bookmakers in America, starting with the growing prominence of horse race tracks in the period 1850-1875. He writes that 'a perfect mathematical booking of any race would ensure a profit regardless of the outcome', suggesting that a 'balanced book' approach (equalizing the bookmaker's profits across the various outcomes of the game) would be the optimal way to run the business.

Later, Levitt (2004) compared the dynamics of the betting market to those of financial markets. First, he notes that the role of the bookmaker differs from that of a market maker in a financial market, who merely matches sellers and buyers; bookmakers take the risk themselves. Three scenarios are mentioned in which bookmakers can sustain profits. The first is when a bookmaker is really good at setting prices in such a way that bettors spread their money over the different outcomes (leading to balanced books, as discussed earlier). The second is when a bookmaker is simply better than bettors at predicting game outcomes. The third combines the previous two: the bookmaker is good at predicting bettor behaviour and can predict game outcomes better than the bettors. In this way, bookmakers could set the odds 'right' and adjust them to exploit bettors' preferences to make even more profit. To research how bookmakers react to betting action, wagers placed by bettors as part of a contest at an online sportsbook during the 2001/2002 NFL season were used. It was found that on average almost 60% of the bets were placed on the home team spread (which should have roughly an equal chance of being a winning bet as the away team spread) if the home team was favoured in that match. In games where the away team was the favourite, the percentage of bets placed on this favourite was even higher (almost 70%). The percentage of winning bets, however, was 57.7% for teams that were underdogs in away games, and 50.4% for teams that were underdogs in home games, showing a favourite bias among the bettors. This indicates that bookmakers might not be interested in 'balancing the books' at all, as they can exploit the favourite bias of the betting public in order to make even more profit.

This was further researched by Paul and Weinbach (2007), who used actual betting data of NFL games from internet bookmaker sportsbook.com (which publishes the percentage of dollars bet on the different sides of a game), rather than data from a tipping competition. They found the same as Levitt (2004): considerably more money comes in on the favourite than on the underdog. As the bookmaker is willing to accept this, they do not mind incurring risks when attempting to maximize their own profits. It was found that a 3-point favourite would have more than 56% of the dollars wagered on the team, and a 7-point favourite approximately 61% of the share of bets. For road teams, these percentages increase by another 16%. The same sort of behaviour was found regarding point totals. For a total points line of 31, it is predicted that 55% of the bets are placed on the 'over'. For the average total line of 40.5, 63% of the bets are predicted to be placed on the over, and for a line of 55, 76% of the betting volume.

Comparable research was done for the National Basketball Association (NBA) (Paul & Weinbach, 2008). Unlike in their previous research on the NFL, the number of bets was used rather than the amounts wagered in dollars. A regression analysis comparable to the NFL research was done, and it was again found that bigger favourites had a larger share of bets placed on them. As Levitt (2004) found, especially road favourites were being overbet, with the dummy variable for the road favourite being positive and significant. For the game totals, it was again found that bettors preferred the over rather than the under. Again, this shows that bookmakers do not necessarily have an interest in the 'balanced-book' approach when taking wagers. While fading the market (betting against public sentiment) proved profitable for NFL games (Paul & Weinbach, 2007), this was not the case for the NBA as found in this paper.

Forrest and Simmons (2008) researched bettor bias (sentiments) in the Spanish football market from the 2001/2002 season to the 2004/2005 season, with the 2005/2006 season retained as a hold-out sample. Odds were taken from Interwetten, with a bookmaker commission on stake of 15.8%, which the authors perceive as quite high (personal note: Interwetten is one of the slowest-reacting bookmakers on the market when it comes to news; they simply cannot afford a lower commission). They use a multivariate probit regression model to forecast the probability that a certain team wins, based on the probability implied by the bookmaker, the home advantage, and the influence of sentiment (measured as the difference in attendance between the subject team and its opponent). The results of the regression analysis implied a negative longshot bias, but also that betting on home teams would be favourable. As favourites are usually the home team, these two cancel each other out in many cases. Finally, it was also found that betting on 'unpopular' clubs was less profitable than betting on popular clubs (with more supporters). This conflicts with research by Avery and Chevalier (1999) on American football, who found the same effect but in the opposite direction. This could have to do with the way rivalries work in the two sports. In Spanish football, there are two superpowers (Barcelona and Real Madrid) that are disliked by the rest of the country aside from their own supporters. This could cause the public to bet against these teams hoping that they will lose. In American football, rivalries might be spread out more, with people mainly betting on their own team, resulting in teams with more fans being over-bet.

1.2.3 Predictive modelling applied to sports betting

Dixon and Coles (1997) used the earlier mentioned research by Maher (1982) to make predictions for betting, but extended it in order to predict low-scoring games more accurately. Another extension to Maher's model is that they use cup games to rank all teams on attack and defence parameters, including quality differences between teams from different leagues. In this way, not only league games but also cup games can be predicted. Finally, they introduce dynamic rather than static parameters for offensive and defensive quality. This means that the quality parameters change every game based on the team's performance in the last game, built up in such a way that recent performances count more heavily than earlier ones. Using the implied probabilities from the bookmakers and from the model, the model is used to select betting opportunities. For different thresholds (ratios between model probabilities and bookmaker probabilities), simulations were done in order to see what mean return would be obtained. It was found that the model had a positive return for any threshold higher than 1.1 (which makes sense given the 91% pay-out percentage the bookmakers used).

Dixon and Pope (2004) extend this research to include betting on correct scores rather than just match outcomes, and perform a more in-depth analysis of the odds offered by bookmakers. While the probability density of match outcomes predicted by the model follows a fairly normal distribution, the probability density implied by bookmakers' odds does not. Relative to the model-estimated distributions, the bookmakers' home-odds density is high in the probability range 0.3-0.4 and low in the range 0.4-0.5. For away odds, the density is low in the range 0.2-0.35 and high in the range 0.35-0.45. The same was found for correct-score odds: the differences between bookmakers are small, but the difference between bookmakers and the model is large. Aside from the bookmakers' pay-out percentage on match outcomes obtained in previous research, the bookmakers' take on correct-score odds was estimated in this research as well. A bookmakers' take of 26% was found (standard error 1%), resulting in a pay-out percentage of 74%. Again, it is shown for both match-outcome and correct-score bets that if the ratio between implied model probability and bookmaker-implied probability is high enough, a positive return is observed. Another interesting finding in this research is that when bets are placed on all possible outcomes in all different matches, the return for low-probability outcomes is much higher than for high-probability outcomes, implying that bookmakers artificially under-price favourites. This confirms the findings of Woodland and Woodland (2001, 2003), who found a similar longshot bias in baseball and ice hockey. However, the opposite was found to be true for horse racing and greyhound racing betting markets (Vaughan Williams, 1999).
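The bookmakers' take and pay-out percentage discussed above follow directly from the quoted decimal odds. The sketch below (function names are my own; the 1X2 odds are invented for illustration) shows how the raw implied probabilities, the overround, and the resulting take can be computed:

```python
def implied_probabilities(decimal_odds):
    """Raw implied probabilities (1/o) for a set of decimal odds,
    normalized to sum to one, plus the bookmaker's take."""
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)            # > 1 for a book with a margin
    take = 1.0 - 1.0 / overround    # fraction of turnover kept by the bookmaker
    normalized = [r / overround for r in raw]
    return normalized, take

# Hypothetical 1X2 odds for a single match:
probs, take = implied_probabilities([2.10, 3.40, 3.80])
print(round(sum(probs), 10))  # 1.0
```

With a take of 26% as estimated by Dixon and Pope, the pay-out percentage is 1 − 0.26 = 74%, matching the definition used here.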

The use of Elo-ratings for football match result prediction was researched by Hvattum and Arntzen (2010). Their regression model estimates probabilities for the different match outcomes. Predictions are made for both the baseline Elo-ratings and the goal-based Elo-ratings, and these predictions are then used to simulate betting on past matches. A bet is placed on every opportunity that provides a value bet (probability times odds is higher than one). Three different betting strategies are tested: the unit bet (one unit flat stake on every opportunity), the unit win (choosing the stake such that a successful bet yields one unit of profit) and the Kelly criterion (Kelly, 1956). The Kelly criterion is captured in the following formula, in which o is the decimal odds of the bet and p is the predicted probability:

Betsize = (o · p − 1) / (o − 1)    (1.1)

When the Kelly criterion is used, the size of the bet grows as the implied value of the bet increases. To build the models and tune the parameters, data from a total of 30,000 matches of the four highest English association football divisions was used. For testing against the bookmakers' odds, data from 16,288 matches from the same divisions was used. Matches for which arbitrage opportunities existed were excluded from the database, leaving 16,015 matches (8 seasons) to be used for testing. The unit win strategy returned higher profits than the unit bet strategy, indicating that betting on favourites was more profitable than betting on underdogs. This conflicts with prior research done by, for example, Dixon and Pope (2004) as mentioned earlier. Interestingly, the Kelly criterion does not result in profitable returns either. This implies that both developed models are simply not good enough to beat the betting market. Even increasing the required value of the bet (odds multiplied by probability) did not return a profit when applying the models to the betting market.
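The staking rules above can be sketched in a few lines of Python. This is a minimal illustration of Eq. 1.1 and the value-bet condition; the function names are my own:

```python
def kelly_fraction(odds, p):
    """Kelly criterion (Eq. 1.1): fraction of the bankroll to stake,
    given decimal odds o and predicted probability p."""
    return (odds * p - 1.0) / (odds - 1.0)

def unit_win_stake(odds):
    """Unit-win strategy: stake chosen so a winning bet yields one unit."""
    return 1.0 / (odds - 1.0)

def is_value_bet(odds, p):
    """A value bet exists when probability times odds exceeds one."""
    return p * odds > 1.0

# Example: decimal odds 2.50 with model probability 0.45 has
# expected value 2.5 * 0.45 = 1.125 > 1, so a small Kelly stake follows.
print(is_value_bet(2.5, 0.45))              # True
print(round(kelly_fraction(2.5, 0.45), 4))  # 0.0833
```

Note how the Kelly stake scales with the edge: as p · o approaches 1 from above, the stake shrinks towards zero, which is exactly the behaviour described above.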

A more advanced method was used by Constantinou et al. (2013). They used a simplified version of the earlier mentioned model by Constantinou et al. (2012) to see if it could be successfully applied to the sports betting market. The model presented in their previous research is split into three levels: level one contains just the 'strength' component; on level two, the 'form' component is added; on level three, fatigue and motivation are added as well. Overall, it is shown that this model was able to beat the betting market. Different betting strategies were tested with this model. The strategies where the bet size depended on the level of discrepancy (the difference between model probability and bookmaker-implied probability) were the most profitable. It is shown that the 'form' component in level 2 makes the model much more successful than the level 1 model alone. For the level 3 model including 'fatigue', the authors suggest that they most likely overestimated the negative effect of fatigue as a predictor of performance (confirming research by Lago-Peñas, Rey, Lago-Ballesteros, Casáis, and Domínguez (2011)). The profit that was made depended heavily on 5 teams, of which two were historical 'big name' teams (Arsenal and Liverpool) who were underperforming in the last part of the season. Another team that returned excessive profits was Newcastle United, who had been promoted just a season earlier and ended up finishing 5th, and were most likely underrated by the bookmakers (or the market) because of this.

1.3 Gaps in the literature and research questions

A lot of research has been done in the past on predicting association football games, and on using these predictions against bookmakers, based on the goals scored in previous matches and situational factors such as home advantage. However, no research has been done before on using secondary statistics from previous matches to predict match outcomes. This leaves interesting opportunities, because significant relationships have been found between some of these statistics and a team's game performance (Moura, Martins, & Cunha, 2014; Yue, Broich, Mester, & Seifriz, 2014; Liu, Gomez, Lago-Peñas, & Sampaio, 2014).

A model based on game statistics has already been shown to be profitable when employed on the American football betting market (Zuber et al., 1985). This could imply that there is information in these underlying statistics that bookmakers do not incorporate in their odds, and that odds are mainly based on match results. This is supported by research showing that large winning and losing streaks of a certain team influenced the bookmakers' spread in a subsequent game, but not so much the actual game outcome (Wagonner, Wines, Soebbing, Seifried, & Martinez, 2014).

In association football betting, previous research has focused on betting on match outcomes (betting on which team will win the match) and correct-score betting. While research on point spread betting has been done for American sports (basketball and American football), no prior research has been done on betting on the so-called Asian handicap1 or on goal totals in association football. The Asian handicap is essentially point spread betting for association football, adjusted and divided into quarter points to make up for association football's low scores. This leaves interesting opportunities, because the highest limits in the sports betting market are offered on these types of bets by so-called Asian-style bookmakers such as Pinnacle Sports or SBOBET (who, unlike their European and American counterparts, do not limit or ban players who are long-term winners2 (Paul & Weinbach, 2007)).

1 http://en.wikipedia.org/wiki/Asian_handicap
2 http://www.theguardian.com/global/2015/aug/02/betting-horses-gambling-bookmakers-accounts-closed

To fill these research gaps, the following research questions were identified:

1. What game statistics and other variables from previous matches are relevant predictors for the number of goals a team will score in the upcoming game?

2. What type of model is the most accurate when predicting the number of goals scored by a team in an association football game based on statistics from previous games?

3. Are there biases in the Asian association football betting market that can be exploited?

4. Can a model based on secondary statistics in previously played association football games beat the bookmakers in Asian handicap and total goals betting?

Combined, these four research questions lead to the main research question:

What type of forecasting model based on match statistics is the most accurate for association football and how does this model’s performance compare to the Asian betting market?

Schematically, the full thesis project is shown in Figure 1.1.

Figure 1.1: Schematic overview of thesis project

1.4 Relevance to the field of operations research

As this master thesis is meant as the final examination for the master's programme in Operations Management & Logistics, it should be relevant to the field of operations research. This relevance lies in multiple aspects. The main relevance lies in the methods used: the modelling section applies methods that are commonly used in operations research, and this thesis researches their applicability and usefulness for the prediction of sports games. Furthermore, the behaviour and dynamics of the Asian betting market were researched. This behaviour was subsequently compared to the behaviour of regular investment markets to see whether assumptions that generally hold for regular investment markets are also valid for the Asian betting market.

1.5 Report structure

The report structure is as follows: Chapter 2 explains which data was used for the research, how this data was collected and which treatments were applied before the data was ready to be used. Furthermore, it explains the variables that were not directly importable but had to be calculated manually.

Chapter 3 matches research question 1 and evaluates the influence of the different variables that were calculated and collected. This chapter shows which variables were significant enough to be included in the modelling phase.

In chapter 4, which matches research question 2, it is shown how three different models were used to predict the goals scored in association football games. The following models were used: partial least squares structural equation modelling (PLS-SEM), which combines factor analysis and linear regression, neural networks and fuzzy systems.

Chapter 5 treats research question 3 and evaluates the researched biases in the Asian betting market for association football. Finally, chapter 6 is about research question 4. In this chapter, the predictions as made by the models in chapter 4 are tested against closing odds that were directly taken from the Asian betting market, which is the betting market with the highest limits and volume traded in the world and is therefore seen as the hardest to beat (and therefore an accurate predictor of match outcomes).

Chapter 2

Data acquisition and treatment

In order to start modelling, data had to be obtained and cleaned first. As no in-depth data in exported format was publicly available, a script was written in order to collect this data from sources on the internet. This chapter will explain what kind of data was needed, how this data was acquired and what transformations were done to make the data suitable for modelling.

2.1 Data acquisition

The website Squawka (http://www.squawka.com/) offers in-depth match reports for different leagues in European football. Their match reports include all relevant match statistics such as attempted shots, tackles, passes and crosses, as well as their outcomes. An example of such a match report can be found at: http://la-liga.squawka.com/barcelona-vs-deportivo-de-la-coruna/23-05-2015/spanish-la-liga/matches. The underlying data used to generate such match reports is stored in a more convenient way on pure text pages, which for this game can be found at: http://s3-irl-laliga.squawka.com/dp/ingame/10362. The different events that can happen in a match are grouped by type in these text pages; there is, for example, a part that holds all shots, a part that holds all passes and a part that holds all tackles.

This data was automatically imported from Squawka with a self-written Python script, using the Selenium plug-in to simulate a web environment and the BeautifulSoup library to parse the source code of the opened pages and extract the desired data. The statistics that were collected for both teams can be found in Appendix A. These statistics were collected for six European leagues: the English Premier League, the Spanish La Liga, the Italian Serie A, the German Bundesliga, the French Ligue 1 and the Dutch Eredivisie. This was done for the full 2013/2014 and 2014/2015 seasons as well as the 2015/2016 season up to April 9, 2016, resulting in a sample of 6156 matches.
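The parsing step can be illustrated with a small self-contained sketch. The XML fragment and field names below are invented for the example and do not reflect Squawka's actual feed structure; the idea is merely that events are grouped by type and counted per team:

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the tags and attributes are hypothetical,
# mimicking a per-match data feed in which events are grouped by type.
SAMPLE_FEED = """
<match>
  <goals_attempts>
    <event team="home" type="shot"/>
    <event team="home" type="goal"/>
    <event team="away" type="shot"/>
  </goals_attempts>
  <tackles>
    <event team="away" outcome="won"/>
  </tackles>
</match>
"""

def count_events(xml_text, group, team):
    """Count the events of one group (e.g. shots) credited to one team."""
    node = ET.fromstring(xml_text).find(group)
    if node is None:
        return 0
    return sum(1 for ev in node.findall("event") if ev.get("team") == team)

print(count_events(SAMPLE_FEED, "goals_attempts", "home"))  # 2
```

In the actual pipeline, Selenium supplies the downloaded page source and BeautifulSoup takes the role of the parser; the counting logic stays the same.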

2.2 Data preparation

2.2.1 Handling missing values

Some of the match reports on Squawka's website had dead links, were not coupled correctly to the underlying file with match statistics, or were simply incomplete. Two possibilities exist for handling missing values: case deletion or imputation based on comparable cases. As imputation would have led to inaccurate predictions in this case (a lot of values would have had to be imputed, and teams that appear comparable could have a completely different playing style), the choice was made to delete the cases with missing values. As this concerned only 11 cases in the dataset, a dataset of 6144 matches still remained.

For some variables, certain situations could lead to division by zero (for example, if no shots were taken in a match, calculating the percentage of shots on target leads to division by zero). If this happened, the variable was assigned a 'NaN' (not a number) value.
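This guard can be expressed as a small helper; a minimal sketch (the function name is my own):

```python
import math

def safe_ratio(numerator, denominator):
    """Ratio of two match statistics, returning NaN when the denominator
    is zero (e.g. shots-on-target percentage when no shots were taken)."""
    return float("nan") if denominator == 0 else numerator / denominator

print(safe_ratio(4, 10))             # 0.4
print(math.isnan(safe_ratio(0, 0)))  # True
```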

2.2.2 Outlier analysis

In the context of this research, the only outliers of interest were entry mistakes. Real observations that would be classified as outliers under certain criteria (for example the z-score outlier rule) are still results that happened in games and are therefore identifiers of a team's quality (or its lack thereof), so they did not need to be deleted.

Unfortunately, finding entry mistakes was hard in this case. The collected values are either averages or counted totals of certain events occurring in a game. If the person entering the data added a duplicate event (or forgot one), this would influence the counted total of that event slightly, but it would go unnoticed unless a lot of mistakes were made for a certain match. Entry mistakes in event variables used for calculating averages (such as shot distance or pass length) were also quite hard to spot, especially if many such events happened in one match (because the magnitude of the outlier would be spread over the number of occurrences of the event). However, big entry mistakes could still be spotted with outlier analysis.

Therefore, z-score outlier analyses were conducted on all the collected variables. No hard thresholds were used for deleting cases with extreme z-scores; the z-scores were merely used as an indicator for further investigation of extreme cases. One extreme case was found (with a z-score of 12.3): an average shot distance for Hertha BSC in their game against Hoffenheim on 22/11/15 of 57 meters. Cross-checking this match with the match report on www.whoscored.com1, which also offers detailed match reports for association football matches, showed that this indeed was an entry mistake. Therefore, this value of 57 meters was manually corrected to 16 meters, the approximate distance of Hertha's only shot in that game. Other relatively high z-scores were manually cross-checked with both the WhoScored and Squawka match reports, but these were simply high realized values in the games and therefore required no further treatment.
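The screening procedure can be sketched in pure Python. This is an illustration only (function names are my own; the screening threshold of 4 is an arbitrary example value, not the one used in the thesis, which applied no hard threshold):

```python
def z_scores(values):
    """Z-score of each observation: (x - mean) / population std."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    return [(x - mean) / std for x in values]

def flag_for_review(values, threshold=4.0):
    """Indices of observations with |z| above a screening threshold.
    Flagged cases are cross-checked manually, never deleted outright."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]

# 50 ordinary average shot distances plus one suspicious 57-metre value:
distances = [16.0] * 50 + [57.0]
print(flag_for_review(distances))  # [50]
```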

2.2.3 Calculating rest before a game

As the effect of rest on a team's performance is evaluated at a later stage, the number of days of rest a team had before a particular game had to be calculated. In order to do so, both domestic cup games and European cup games had to be added to the data, after which the number of days of rest a team a had before game n is calculated in the following way:

rest^a_n = d^a_n − d^a_{n−1}    (2.1)

1 http://www.whoscored.com/Matches/969682/MatchReport

In which d is the serial value of the date, equal to the number of days that have passed since the 1st of January, 1990, and n−1 is the index of the previous match team a played.
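Eq. 2.1 translates directly into code; a minimal sketch (function names are my own, and the example dates are invented):

```python
from datetime import date

EPOCH = date(1990, 1, 1)  # serial dates are counted from 1 January 1990

def serial_day(d):
    """Serial value of a date: days passed since the epoch."""
    return (d - EPOCH).days

def rest_days(match_dates):
    """Eq. 2.1: rest before game n equals d_n - d_{n-1}; the first
    game in the list has no predecessor, hence None."""
    s = [serial_day(d) for d in match_dates]
    return [None] + [s[i] - s[i - 1] for i in range(1, len(s))]

print(rest_days([date(2015, 8, 8), date(2015, 8, 15), date(2015, 8, 18)]))
# [None, 7, 3]
```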

2.2.4 Elo-ratings

To evaluate what a team's performance in the previous game(s) was worth, the strength of the opposition has to be taken into account. This was done with the Elo-method (Elo, 1978), which was shown to be easily applicable to association football by Hvattum and Arntzen (2010). An advantage of the Elo-method is that it is straightforward, comparable to an exponential smoothing model, and does not require solving matrix systems, unlike the methods proposed by Colley (2002) and Massey (1997). The Elo-rating and its updating procedure work as follows:

ℓ^H_1 = ℓ^H_0 + k(α^H − γ^H)   and   ℓ^A_1 = ℓ^A_0 + k(α^A − γ^A)    (2.2)

With:

γ^H = 1 / (1 + c^((ℓ^A_0 − ℓ^H_0)/d))   and   γ^A = 1 − γ^H    (2.3)

α^H = 1 if the home team wins, 0.5 if the match is a draw, 0 if the away team wins,   and   α^A = 1 − α^H    (2.4)

In which superscripts H and A refer to the home and away team respectively. ℓ_1 is the Elo-rating after a match and ℓ_0 is the Elo-rating for a team prior to this match. c and d are parameters that set the scale of the ratings, and k is the parameter that determines how recent performance is valued: if k is high, the ratings depend more strongly on the most recent performance; if k is low, less so. It should therefore be set with care.

Hvattum and Arntzen further researched which values are most suitable for the parameters mentioned earlier. They first set the scale parameters to c = 10 and d = 400, and then found k = 20 to be a suitable value. These update parameters were therefore used in the later stages of the project. α is the value that is given for a win, draw or loss. γ is the factor that adjusts for the quality difference between the two teams and gives worth to the performance of the two teams relative to each other.
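With c = 10, d = 400 and k = 20, one rating update per Eqs. 2.2-2.4 can be sketched as follows (function names are my own):

```python
def gamma_home(elo_home, elo_away, c=10.0, d=400.0):
    """Expected score of the home team, gamma^H in Eq. 2.3."""
    return 1.0 / (1.0 + c ** ((elo_away - elo_home) / d))

def update_elo(elo_home, elo_away, result, k=20.0):
    """One rating update (Eq. 2.2); result is 'H', 'D' or 'A' (Eq. 2.4)."""
    g_h = gamma_home(elo_home, elo_away)
    a_h = {"H": 1.0, "D": 0.5, "A": 0.0}[result]
    return (elo_home + k * (a_h - g_h),
            elo_away + k * ((1.0 - a_h) - (1.0 - g_h)))

# Two equally rated teams; the home win moves 10 rating points across:
print(update_elo(500.0, 500.0, "H"))  # (510.0, 490.0)
```

Note that the update is zero-sum: whatever the winner gains, the loser loses, so the total rating mass in a league stays constant.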

With the update parameters known, the initial values had to be set before the update procedure could be applied. As the goal is not to predict matches between teams from different leagues, the relative strength of the different leagues did not have to be taken into account when setting the initial values. Therefore, for every league, the highest-ranked team of the previous season was assigned an initial Elo-rating of 700 points, and the lowest-ranked team of the previous season was assigned an initial Elo-rating of 200 points. The initial Elo-ratings for the other teams were then scaled linearly in the following way:

ℓ_i = 700 − ((700 − 200)/(n − 1)) · (i − 1)    (2.5)

In which i is the place the team ended in the previous season relative to the other teams in the league this season, and n denotes the number of teams in the league. In case multiple teams were promoted, they were ranked on their league ranking of the previous season in the second tier. As 2013/2014 is the first season in the used dataset, the league rankings from 2012/2013 were used to initialize the Elo-ratings for the different leagues. Finally, promoted teams in subsequent seasons were automatically assigned an Elo-rating of 225.
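Eq. 2.5 is a simple linear interpolation between the two anchor ratings; a minimal sketch (the function name is my own):

```python
def initial_elo(i, n, top=700.0, bottom=200.0):
    """Eq. 2.5: linearly scaled initial rating for the team that finished
    in place i (1 = champion) of a league with n teams last season."""
    return top - (top - bottom) / (n - 1) * (i - 1)

# A 20-team league: champion gets 700, the bottom club 200.
print(initial_elo(1, 20))             # 700.0
print(round(initial_elo(20, 20), 6))  # 200.0
```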

Chapter 3

Selection of variables

With the data cleaned and ready to be used, the variables in the collected dataset had to be evaluated. Before this was done, the influence of rest was evaluated to check whether the number of days of rest a team has before a game influences its performance. For the different collected variables, different rolling averages were calculated, which serve as indicators of a team's ability on these measures. Then, the significance of the relationship between these averages and performance in the upcoming match was evaluated to decide which measures should be used in the prediction models.

3.1 The influence of rest

The influence of rest on an association football team's performance has been evaluated in prior research by several researchers, with mixed results. Both Dupont et al. (2010) and Lago-Peñas et al. (2011) found no statistical decrease in performance when the teams they analysed played more than one match per week because of involvement in European cups. More in-depth research was done by Scoppa (2015), who also found insufficient evidence for a linear relationship between rest and performance. However, he did find that reducing the time between two matches to 3 days or less negatively affected the performance of teams.

To check whether this holds for the dataset used in this research as well, the average numbers of goals teams score and concede after shorter versus longer rest periods were compared, and the difference was tested for significance. For the full dataset, these averages and the p-values obtained with a comparative Mann-Whitney U-test are shown in Table 3.1. The Mann-Whitney U-test (also known as the Wilcoxon rank-sum test) was chosen because a regular t-test assumes normally distributed data, which is not the case here, as the number of goals cannot be negative. The large share of zero-goal observations also made it impossible to transform the data to normality with regular transformations such as taking the square, square root or logarithm.
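The rank-based comparison can be sketched in pure Python. The function below computes only the U statistic with tie-corrected average ranks (important here, since goal counts contain many tied values such as 0 and 1); in practice a library routine such as scipy's `mannwhitneyu` would also supply the p-value:

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U statistic of sample a versus sample b, using
    average ranks for tied values."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # average of 1-based ranks i+1 .. j
        for t in range(i, j):
            ranks[t] = avg_rank
        i = j
    rank_sum_a = sum(r for r, (_, src) in zip(ranks, pooled) if src == 0)
    return rank_sum_a - len(a) * (len(a) + 1) / 2.0

# Complete separation yields the extreme values 0 and n_a * n_b:
print(mann_whitney_u([0, 1, 1], [2, 3, 4]))  # 0.0
print(mann_whitney_u([2, 3, 4], [0, 1, 1]))  # 9.0
```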

As can be seen from this table, teams that have less rest perform significantly better than teams that have more rest. This is caused by the fact that teams that play more games (and thus have less rest) are usually better-quality teams, as they are involved in European games. Therefore, the dataset was filtered. First, a sample of top teams (teams that played in European cup games in all 3 seasons in the dataset) was taken, after which the values were compared in the same way as for the full dataset. The results of these tests can be seen in Table 3.2.

14 Table 3.1: Influence of rest for full dataset

                 Mean goals scored                  Mean goals conceded
Rest threshold   More rest   Less/equal rest   p        More rest   Less/equal rest   p
3 days           1.345       1.466             0.000    1.392       1.221             0.000
4 days           1.326       1.464             0.000    1.413       1.239             0.000
5 days           1.319       1.453             0.000    1.423       1.251             0.000
6 days           1.313       1.421             0.000    1.426       1.298             0.000

Table 3.2: Influence of rest for top teams

                 Mean goals scored                  Mean goals conceded
Rest threshold   More rest   Less/equal rest   p        More rest   Less/equal rest   p
3 days           1.985       2.051             0.124    0.980       0.914             0.128
4 days           1.957       2.055             0.103    1.009       0.909             0.024
5 days           1.974       2.027             0.360    1.025       0.911             0.010
6 days           1.951       2.032             0.178    1.014       0.933             0.058

When it comes to scoring, the top teams do not perform statistically differently after less rest compared to more rest. For goals conceded, the teams do not perform differently around the 3-day threshold; however, they do perform better defensively after less rest for thresholds of 4 and 5 days. This could be caused by the top teams having top facilities and knowledge, for example related to physiology. Therefore, the same U-test was done for all teams in the dataset with the top teams excluded. The p-values of these U-tests can be seen in Table 3.3. The same conclusions can be drawn here as for the top teams: the teams concede significantly fewer goals after short rest periods.

Table 3.3: Influence of rest for non-top teams

                  Mean goals scored                    Mean goals conceded
Rest threshold    More rest   Less/equal rest   p      More rest   Less/equal rest   p
3 days            1.228       1.224             0.884  1.467       1.347             0.000
4 days            1.232       1.214             0.532  1.473       1.378             0.000
5 days            1.228       1.227             0.862  1.478       1.385             0.000
6 days            1.225       1.230             0.696  1.482       1.411             0.012

To check whether these results might be caused by the sample, the same U-test was done for a random selection of mid-table teams, some of whom may have played European games in a season, but never structurally. The results of this test can be seen in Table 3.4. Again, there is no statistical difference in the goals they score after shorter or longer rest periods. However, just like in the samples of top teams and non-top teams, for rest thresholds of 4 or 5 days the teams concede significantly fewer goals on average after shorter rest periods.

Table 3.4: Influence of rest for mid-table teams

                  Mean goals scored                    Mean goals conceded
Rest threshold    More rest   Less/equal rest   p      More rest   Less/equal rest   p
3 days            1.253       1.297             0.541  1.415       1.344             0.327
4 days            1.267       1.230             0.486  1.433       1.311             0.031
5 days            1.266       1.240             0.596  1.439       1.319             0.023
6 days            1.249       1.271             0.714  1.436       1.365             0.128

These results could be caused by teams employing a more defensive (counter-attacking) style of play when they have to play two games a week: they may approach games more cautiously after a short rest period and play more freely after a longer one. Either way, it can be concluded that shorter rest periods do not have a negative effect on performance, and even have a slightly positive effect in some specific cases. The number of days of rest a team had should therefore be taken into account in the modelling phase.

3.2 Indicators of team quality

3.2.1 Averages of match statistics

To reduce the model's sensitivity to a single match, averages over multiple matches were used for all gathered variables. To see which kind of average is most suitable, linear relationships between each variable average and the goals scored and conceded in the next match were evaluated for different averaging schemes. This was done one by one, with a new regression analysis for each average as single input and the goals scored or conceded as output. The mean p-values for these relationships are shown in Table 3.5. These mean p-values were calculated to decide which kind of rolling average should be used for the match variables in the modelling stage.

Most of the match averages are self-explanatory. The 'last x matches' averages are average values of all variables over the last x matches a team played before the game with which they are correlated. There is also an average over a team's last home match and last away match, and an average over the last 2 home matches and last 2 away matches a team played. Finally, there is a weighted average, in which a team's last 2 games each contribute 35% and the games played 3, 4 and 5 matches back each contribute 10%.
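The rolling and weighted averages described above can be sketched in pandas. The shot counts below are hypothetical; the `.shift(1)` ensures that each match only uses information from matches played before it:

```python
import pandas as pd
import numpy as np

# Hypothetical shot counts for one team, oldest match first.
shots = pd.Series([12, 9, 15, 11, 8, 14, 10])

# 'Last 5 matches' average, shifted so each match only sees earlier matches.
last5 = shots.rolling(window=5).mean().shift(1)

# Weighted average: 35% each for the two most recent matches,
# 10% each for the three before that (weights ordered oldest -> newest).
w = np.array([0.10, 0.10, 0.10, 0.35, 0.35])
weighted = shots.rolling(window=5).apply(lambda x: np.dot(x, w)).shift(1)
print(last5.iloc[-1], weighted.iloc[-1])
```

In the modelling stage such a shifted average would be computed per team for every match variable, so the predictor for a match never leaks information from that match itself.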

As can be seen in Table 3.5, the lowest mean p-values are obtained for the 'last 5 matches' average, both for the relationship with goals scored and with goals conceded. Therefore, the averages over the last 5 matches of the different variables were used in the prediction models.

Table 3.5: Mean p-values for different averages

Average type                                  Mean p (goals scored)   Mean p (goals conceded)
Last 2 matches                                0.0927                  0.1260
Last 3 matches                                0.0724                  0.1229
Last 4 matches                                0.0728                  0.0745
Last 5 matches                                0.0618                  0.0649
Last home match & last away match             0.0819                  0.1059
Last 2 home matches & last 2 away matches     0.0674                  0.0678
Weighted average                              0.0749                  0.0950

3.2.2 Variable selection

A full table with the p-values of all evaluated relationships between different variables and the number of goals scored/conceded in the next match can be found in Appendix B in Tables B.2 and B.3. In these tables it can be seen that nearly all variables (for the 5 match average) have a significant relationship with either goals scored or conceded (or both). The following variables have no significant relationship with either goals scored or goals conceded:

• % of blocked own shots

• Successful tackles

• Mean pass angle from opponent goal.

Therefore, these variables were not used in the modelling stage. Furthermore, some of the variables are combinations of other variables; when the variables such a combination is built from are both significant, there is no need to include the combination variable. The following variables were therefore also not included in the modelling stage:

• Crossing success %

• Passing success %

• Forward passing success %

• % of passes intercepted

• % of successful take-ons.

While most of these variables are indicators of a team's offensive ability, defensive ability should be taken into account as well. This was done by using the shooting, passing and crossing variables of the opposing teams in past games. Furthermore, the average Elo-rating of past opponents was included in the models, so that performance on the different variables is given context; otherwise, a team that has played a weak set of opponents in its last five games would be overrated by the model. The Elo-ratings of both teams themselves were also included in the model.

In conclusion, all game-related variables that were included in the different models are shown in Table 3.6.

Table 3.6: Variables included in the models

Yellow cards received
Red cards received
Possession
Fouls made
Shots taken
Shots on target taken
Shots blocked
Mean shot distance to center of goal
Mean shot angle from center of goal
Keeper save %
Tackles made
Tackling success %
Crosses attempted
Successful crosses
Mean cross length
Mean cross distance from center of goal
Mean cross angle from center of goal
Header success %
Passes attempted
Passes successful
Mean pass length
Mean pass distance from center of goal
Forward passes attempted
Forward passes successful
Successful interceptions
Corners taken
Take-ons attempted
Take-ons successful
Shots conceded
Shots on target conceded
Opponent’s shots blocked
Mean distance from goal of shots conceded
Mean angle from goal of shots conceded
Attempted crosses conceded
Successful crosses conceded
Mean length of crosses conceded
Mean distance from goal of crosses conceded
Mean angle from goal of crosses conceded
Attempted passes conceded
Successful passes conceded
Mean length of passes conceded
Mean distance from goal of passes conceded
Attempted forward passes conceded
Successful forward passes conceded
Own Elo-rating
Mean Elo-rating of past opponents
Number of days rest since last game

Chapter 4

Modelling

The next step was to use the significant variables found in different models, in order to find out which model can predict the goals scored by both teams in the next match most accurately. Three different model types were tested: partial least squares structural equation modelling (PLS-SEM), artificial neural networks and fuzzy inference systems.

4.1 PLS-SEM

For the modelling of the PLS-SEM model, the book ‘A primer on partial least squares structural equation modeling (PLS-SEM)’ by Hair, Hult, Ringle, and Sarstedt (2014) was used as a guideline. This book describes PLS-SEM as a combination of factor analysis and linear least squares regression. The mathematical analyses with the PLS-SEM algorithm were done with the PLS-SEM Toolbox1 for MATLAB, developed by Massimo Aria (associate professor at the University of Naples).

Hair et al. (2014) describe a PLS-SEM project as having the following steps:

1. Specifying the structural model: Defining the relationships between the different constructs. The endogenous and exogenous relationships have to be defined and then the path model has to be drawn.

2. Specifying the measurement model: Defining relationships between constructs and the variables that represent them. Relationships between constructs and variables can be both reflective (constructs have an effect on variables) or formative (constructs are determined by the underlying variables).

3. Data collection and examination: Usually in this stage, missing data and outliers are evaluated. However, this was already done previously and does not need to be done again.

4. Model estimation with PLS-SEM algorithm: The PLS-SEM algorithm estimates the unknown elements in the path model. To do so, the algorithm first determines the scores of constructs Y that are used as input for the partial regression models within the path model. These construct scores are based on the values of the variables they are related with. The construct scores are calculated in the following way: for a formative

1. http://www.mathworks.com/matlabcentral/fileexchange/54147-pls-sem-toolbox

(constructs are formed by variables) measurement model, the weights of the variables are estimated by a partial multiple regression analysis in which Y (the construct value) is the dependent variable and the related x values are the independent variables. For a reflective measurement model (variables are calculated from the construct), the outer loadings are calculated through single independent regressions (one for each of the variables related to the construct). With these estimates for the constructs, the partial regression equation for every endogenous variable is set up and the loadings of the paths are calculated, as well as the resulting R2 values of these variables. This is an iterative process that runs until a stopping criterion is reached; to this end, the researcher has to determine a threshold value (stop criterion) for the degree of improvement as well as the maximum number of iterations.

5. Evaluation of measurement models: Evaluating the validity and the reliability of the construct measures. This is done in two stages. The first stage is assessing the results of the reflective measurement models. The first step here is to evaluate the internal consistency reliability. This can be done with Cronbach’s alpha, but it is advised to use composite reliability, which also takes into account the different outer loadings of the indicator variables. The next step is determining the convergent validity, which is the extent to which the measures correlate positively with other measures of the same construct. A regular measure for this is the average variance extracted (AVE). Then, discriminant validity has to be checked: the extent to which a construct is distinct from the other constructs. This can be done by either examining the cross loadings or using the Fornell-Larcker criterion.

The formative measurement models are assessed in a different way and should not be assessed with the methods listed above for reflective measurement models. For formative measurement models, the first step is to assess the convergent validity. This is done to measure to what extent a variable correlates with other measures of the same construct. The second step is to assess for collinearity issues. A way to do this is the variance inflation factor (VIF), defined as the reciprocal of tolerance. Finally, the outer weights of the indicators have to be assessed. The outer weight is the result of a multiple regression with the construct as dependent variable and the variables as independent variables. This way, the relative contributions of the different variables to the construct can be assessed, and it needs to be checked that their contributions are significantly different from zero. Non-significant variables should not necessarily be deleted; the absolute contribution to the construct (the information a variable provides without the other variables) should also be considered.

6. Evaluation of structural model: Finally, the structural model itself is evaluated. The first step is assessing collinearity. This is done with the same measure (VIF) as treated earlier for the formative measurement models. The second step is assessing the significance of the path estimates; this can be done with a t-test where the t-value is obtained by dividing the path coefficient estimate by the bootstrap standard error. Then, R2 has to be assessed. The problem with R2 is that its value will increase even when non-significant variables are added to the model. Hence, the adjusted R2 can be used to prevent bias towards complex models. The next step is to check the effect sizes of the different constructs.

Mathematically, a structural relationship in SEM looks as follows:

Y_j = \sum_{i=1}^{n} X_i \cdot r_{ij} \quad (4.1)

In which Yj is the value of the predicted construct j, Xi is the value of predictor construct i and rij is the weight of the path between the independent construct Xi and dependent construct Yj.

Construct values are calculated in the following way for formative measurement models:

X_j = \sum_{i=1}^{n} x_i \cdot W_{ij} \quad (4.2)

In which Xj is the value of construct j, xi is the z-score value of variable i belonging to construct j. Wij is the weight of variable i on construct j.

Construct values for reflective constructs are simply correlated to the different individual variables measuring the construct:

Xj = xi · Lij (4.3)

In which Xj is the value of construct j, xi is the z-score value of variable i belonging to construct j. Lij is the loading of variable i on construct j.
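Equation 4.2 can be illustrated with a small NumPy sketch. The indicator values and outer weights below are made up for illustration; in practice the weights are estimated by the PLS-SEM algorithm:

```python
import numpy as np

def zscore(x):
    """Standardize a variable to zero mean and unit variance."""
    return (x - x.mean()) / x.std()

# Hypothetical indicator data (rows = matches, columns = indicators of one
# formative construct) and illustrative outer weights W_ij, following
# Equation 4.2: X_j = sum_i x_i * W_ij on z-scored variables.
raw = np.array([[410.0, 320.0],
                [380.0, 290.0],
                [450.0, 360.0]])   # e.g. passes attempted, passes successful
weights = np.array([0.4, 0.7])     # illustrative weights, not fitted values

z = np.apply_along_axis(zscore, 0, raw)  # z-score each column
construct_scores = z @ weights           # one construct score per match
print(construct_scores)
```

Because the indicators are z-scored per column, the construct scores are centred around zero, which is why the final predictions later need to be denormalized back to goal counts.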

4.1.1 Baseline model

A first baseline model (see Figure 4.1) was made using constructs based on my own insight. The goals for each team were made dependent on the following constructs (with the variables used to measure each construct in brackets):

1. Own duelling power (header %, attempted take-ons, successful take-ons)
2. Own discipline (yellow cards, red cards, fouls)
3. Own defence (keeper save %, attempted tackles, successful tackles, interceptions)
4. Own crosses (attempted, successful, length, distance and angle from goal and corners)
5. Own passing/control (attempted, successful, length, distance, forward attempted, forward successful and ball possession)
6. Own shooting (total shots, shots on target, blocked shots, mean distance and angle)
7. Opponent’s duelling power (header %, attempted take-ons, successful take-ons)
8. Opponent’s discipline (yellow cards, red cards, fouls)
9. Opponent’s passing/control (attempted, successful, length, distance, forward attempted, forward successful and ball possession)
10. Opponent’s defence (keeper save %, attempted tackles, successful tackles, interceptions)
11. Strength of opponent (ELO prior to match)
12. Strength of own past opponents (average ELO of last 5 opponents)
13. Rest of opponent (number of days since last match)
14. Opponent’s crosses conceded (attempted, successful, length, distance and angle from goal and corners)

15. Opponent’s shots conceded (total shots, shots on target, blocked shots, mean distance and angle)
16. Opponent’s passes conceded (attempted, successful, length, distance, forward attempted, forward successful and ball possession)

Figure 4.1: Path diagram baseline model

4.1.2 Evaluation of measurement models

All measurement models for the different constructs are formative (the measurement variables do not measure exactly the same phenomenon but are indicators of the construct). Therefore all of them can be assessed in the same way. Normally, the first step in assessing measurement models for formative constructs is checking the convergent validity (or redundancy analysis). This is done by checking the correlation between the obtained construct and an independent indicator for the same construct. However, for the constructs used in this research, there are no other independent, objective measures that could be used. Therefore, the choice was made to skip the evaluation of convergent validity.

The next step is assessing collinearity. This can be done by checking the variance inflation factor (VIF), which is automatically generated by the PLS-SEM plug-in and calculated with the following procedure:

1. Take a formative indicator and use it as the dependent variable in a multivariate regression analysis, with the other indicators of this construct as independent variables, to calculate R2.

2. Calculate the tolerance (TOL) for this indicator as TOL = 1 − R2.

VIF is then simply the reciprocal of tolerance:

VIF = \frac{1}{1 - R^2} \quad (4.4)
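The VIF procedure above can be sketched directly in NumPy. The data below are synthetic, with one indicator constructed to be nearly collinear with another so that the high-VIF case shows up:

```python
import numpy as np

def vif(X, j):
    """VIF of indicator j: regress column j on the remaining columns,
    then return 1 / (1 - R^2), as in Equation 4.4."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])   # add intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)   # nearly collinear with x1
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])
```

The two collinear indicators get VIF values far above the treatment threshold of 5, while the independent one stays near 1, mirroring the situation found for the passing-related variables.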

VIF values above 5 should be treated. Using this criterion, some very high values were found among the measurement variables used. For all four passing-related constructs, VIF values of over 1000 were obtained for total passes, successful passes, total forward passes and successful forward passes. Furthermore, possession had a VIF value of approximately 6. The four ‘shots’ constructs (own shooting for home and away and opponent’s shots conceded for home and away) also had one variable which exceeded the VIF threshold of 5: the total shots variable. To solve these problems of high collinearity, the ‘total shots’ variable was removed from the shots constructs. For the ‘passes’ constructs, ‘total passes’, ‘successful passes’ and ‘total attempted forward passes’ were removed; this was done because ‘successful forward passes’ has the highest correlation with goals scored and is therefore the best predictor among the involved variables. After rerunning the model, both ‘possession’ and ‘successful forward passes’ still had VIF values above 5, after which ‘possession’ was deleted as well. This resulted in all VIF values being under 5.

The final step in evaluating the measurement models was evaluating the significance and rele- vance of the different indicators (measurement variables). This was done in multiple steps: first, the outer weights were tested for significance. If the outer weight (the relative contribution to the construct) is significant, the indicator can be kept. If the outer weight is insignificant, the outer loading (the absolute contribution to the construct) should be evaluated. If the outer loading is above 0.5, the indicator can be kept. If it is below 0.5 and it is also insignificant, it should be removed. If it is below 0.5 but significant, the researcher should consider removing it but this is not a necessity. The significances were calculated with the following t-test:

t = \frac{w_1}{se^{*}_{w_1}} \quad (4.5)

The standard error is the bootstrapping standard error, which is obtained in the following way: a number of times (in this case 5000, as advised by Hair et al. (2014)), a random sample equal in size to the original sample is drawn with replacement (drawn cases are put back in the original dataset after they are picked) from the dataset. The weights and loadings are then re-estimated with this sample, and the standard errors are calculated over the values obtained from all repetitions. The results of this procedure are shown in Table C.1 in Appendix C.
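The bootstrapping procedure can be sketched as follows. For brevity the estimator here is a simple mean over synthetic values standing in for re-estimated outer weights; in the actual procedure the whole PLS-SEM model would be re-estimated on each resample:

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=5000, seed=0):
    """Bootstrap standard error: resample with replacement, re-estimate,
    and take the standard deviation of the re-estimates."""
    rng = np.random.default_rng(seed)
    n = len(data)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]  # draw with replacement
        estimates[b] = estimator(sample)
    return estimates.std(ddof=1)

rng = np.random.default_rng(42)
# Synthetic stand-in for per-case contributions to an outer weight.
weights_proxy = rng.normal(loc=0.3, scale=0.1, size=100)
se = bootstrap_se(weights_proxy, np.mean)
t = weights_proxy.mean() / se      # t-statistic as in Equation 4.5
print(round(se, 4))
```

The resulting t-statistic can then be compared against a critical value to decide whether the weight differs significantly from zero.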

Following the described procedure and a p-value cut-off of 0.05 for the t-tests, the following variables were deleted from the model:

• Keeper save % (both for home and away)
• Tackles attempted (both for home and away)
• Tackling success % (both for home and away)
• All cross related variables except corners (both for home and away)

• All variables related to conceded crosses (both for home and away)
• Mean angle of conceded shots (both for home and away)

Also, the following indicators were considered to be removed from the model:

• Red cards (both for home and away)
• Mean shot angle (both for home and away)

The choice was made to keep both red cards and mean shot angle, as their loadings were highly significant. The other listed variables were subsequently removed from the model (leading to the complete removal of the ’crosses conceded’ constructs).

4.1.3 Evaluation of structural model

The first step in evaluating the structural model is checking collinearity. This was done with VIF values in the same manner as was done for the measurement models. The VIF values for the different constructs can be seen in Table C.2 in Appendix C. From this table it can be seen that there are no VIF values above 5 and that there are no issues with collinearity with the used constructs.

The next step is evaluating the significance of the structural path coefficients. This was done with bootstrapping in the same manner as for the measurement models. The path coefficients and their respective significances can be seen in Table C.3 in Appendix C. As can be seen in this table, the following relationships had significant path weights for both home goals and away goals:

• Passing and control on goals scored
• Shooting on goals scored
• Conceded shots on goals conceded
• Opponent Elo on goals scored

These constructs and relationships were therefore kept. The following relationships did not have a significant effect for either the home or away team and were deleted from the model:

• Crosses on goals scored
• Defence on goals scored
• Defence on goals conceded
• Discipline on goals scored
• Discipline on goals conceded
• Duelling power on goals conceded
• Passes conceded on goals conceded

The following relationships were significant for either the home or away teams and insignificant for the other and were therefore investigated more thoroughly:

• Rest on goals conceded
• Duelling power on goals scored
• Average strength of last 5 opponents on goals scored

This was done by using other criteria described by Hair et al. (2014). The first is the coefficient of determination R2 and its adjusted version, which corrects for more complex models and is calculated in the following way:

R^2_{adjusted} = 1 - (1 - R^2) \cdot \frac{n-1}{n-k-1} \quad (4.6)

In which n is the number of cases and k is the number of constructs influencing the dependent construct. The adjusted R2 corrects the value for more complex models with variables that have almost no effect on the model. The coefficients of determination before and after removing the different constructs can be seen in Table 4.1. In this table it can be seen that removing the relationships barely influenced the coefficients of determination; for the home goals part of the model, the coefficient of determination did not change at all.
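Equation 4.6 can be computed directly. The function below is a sketch; the sample size n = 2000 is hypothetical, as the number of cases is not restated in this section:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 as in Equation 4.6: penalizes the k constructs
    influencing the dependent construct, relative to n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Illustrative only: starting-model home-goals R^2 = 0.1258 with a
# hypothetical n = 2000 and k = 16 constructs.
print(round(adjusted_r2(0.1258, 2000, 16), 4))
```

Note that the penalty shrinks as n grows and grows with k, so for a fixed dataset, simpler models lose less in the adjustment.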

Table 4.1: Coefficients of determination for different models

                                                      Home goals            Away goals
Model                                                 R2       R2 adj.      R2       R2 adj.
Starting model                                        0.1258   0.1243       0.0964   0.0949
Removed: rest → goals conceded                        0.1257   0.1244       0.0958   0.0945
Removed: past opponent average ELO → goals scored     0.1244   0.1231       0.0955   0.0942
Removed: duelling power → goals scored                0.1250   0.1237       0.0942   0.0929

The next step is assessing the effect sizes f2, which are calculated in the following way:

f^2 = \frac{R^2_{included} - R^2_{excluded}}{1 - R^2_{included}} \quad (4.7)

The effect sizes are shown in Table 4.2. According to Cohen (1988), effect sizes of 0.02, 0.15 and 0.35 can be considered small, medium and large respectively. As the effect sizes of the variables investigated were not even close to reaching these values, they were considered to be minimal. Therefore, based on these effect sizes, the choice was made to remove these relationships from the model in order to obtain a first iteration model.
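As a check on Equation 4.7, the reported R2 values can be plugged in: removing the rest → goals conceded relationship moves the home-goals R2 from 0.1258 to 0.1257 (Table 4.1), reproducing the f2 of roughly 0.0001 reported in Table 4.2:

```python
def effect_size_f2(r2_included, r2_excluded):
    """Cohen's f^2 as in Equation 4.7: change in R^2 when a construct is
    removed, scaled by the unexplained variance of the full model."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)

# Rest -> goals conceded, on home goals (values from Table 4.1).
f2 = effect_size_f2(0.1258, 0.1257)
print(round(f2, 4))  # → 0.0001
```

Against Cohen's (1988) benchmarks of 0.02/0.15/0.35, such a value is negligible, which is why the relationship was removed.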

Table 4.2: Effect sizes of relationships

Relationship                                 f2 (on home goals)   f2 (on away goals)
Rest → goals conceded                        0.0001               0.0007
Past opponent average ELO → goals scored     0.0016               0.0010
Duelling power → goals scored                0.0009               0.0024

25 4.1.4 First iteration

The second version (and thus first iteration) of the SEM model is shown in Figure 4.2. It is much simpler than the initial baseline model, showing that the most important factors in a football match are control and creating shooting opportunities while avoiding shooting opportunities for the opposition. The significances of this model’s path weights were re-tested and are shown in Table C.4 in Appendix C. The effect of shots conceded by the away team on the goals scored by the home team became insignificant, as did the effect of home passes & control on away team goals.

Figure 4.2: Path diagram of first iteration

To further investigate this, the measures presented earlier were calculated again; these are presented in Tables 4.3 and 4.4. This full first iteration model has R2 values of 0.1235 and 0.0928 for home and away goals respectively; the adjusted R2 values are 0.1226 and 0.0919 respectively.

Table 4.3: Coefficients of determination for first iteration model after removing relationships

                                                      Home goals            Away goals
Model                                                 R2       R2 adj.      R2       R2 adj.
Iteration 1 model                                     0.1235   0.1226       0.0928   0.0919
Removed: opponent Elo → goals scored                  0.1181   0.1174       0.0836   0.3828
Removed: shots conceded → goals conceded              0.1193   0.1186       0.0900   0.0892
Removed: shots made → goals scored                    0.1084   0.1077       0.0788   0.0780
Removed: own passes & control → goals scored          0.1018   0.1011       0.0805   0.0797
Removed: own passes & control → goals conceded        0.1207   0.1200       0.0927   0.0920

As can be seen from Table 4.4, the effect sizes were still all quite small, implying that the constructs used do not contain information that makes them unique compared to the other constructs. Especially the effects of shots conceded on goals conceded and of passes & control on goals conceded seem minimal. Therefore, both of these relationships were removed, leading to the second iteration model.

26 Table 4.4: Effect sizes of relationships for model after first iteration

Relationship                               f2 (on home goals)   f2 (on away goals)
Opponent Elo → goals scored                0.0062               0.0101
Shots conceded → goals conceded            0.0048               0.0031
Shots made → goals scored                  0.0172               0.0154
Own passes & control → goals scored        0.0248               0.0136
Own passes & control → goals conceded      0.0032               0.0001

4.1.5 Second iteration

The path diagram of this second iteration SEM model is shown in Figure 4.3. The significances of the path weights were evaluated again and are shown in Table C.5 in Appendix C. One insignificant path relation remained: the effect of passes & control on goals for the away side.

Figure 4.3: Path diagram of second iteration SEM model

The other measures were calculated again, as well as the effect of removing the different relationships from the model. These results are shown in Tables 4.5 and 4.6. Despite the low effect sizes, the choice was made to keep all of the remaining variables. For the opponent ELO, the path weights are highly significant while the effect sizes both exceed 0.02. For shots made, both path weights were also highly significant. Finally, the choice whether to keep passes & control was the hardest one; as it was significant for the home team and had an effect size exceeding 0.02 for the home team, the choice was made to keep this variable in the model. Furthermore, the effects on R2 were quite high as well: for all of these variables, removing them causes a drop of at least 10%, even for the complexity-corrected adjusted R2.

This makes this iteration 2 model the final model, with an MAE of 0.9203 and an RMSE of 1.1661 for the test set.

Using Equations 4.1 and 4.2 and the weights and coefficients obtained from the PLS-SEM algorithm, the final model can mathematically be described as follows:

Yhomegoals = −0.181XawayELO − 0.170Xhomepassing + 0.154Xhomeshots (4.8)

Yawaygoals = −0.174XhomeELO − 0.132Xawaypassing + 0.146Xawayshots (4.9)

With the construct scores defined in the following way:

XawayELO = xawayELO (4.10)

Xhomepassing = −0.123x1 + 0.244x2 − 0.932x3 (4.11)

In which x1, x2 and x3 are mean home pass length, mean home pass distance from goal and successful forward passes in previous matches of the home side respectively.

Xhomeshots = 0.862x4 + 0.182x5 − 0.210x6 + 0.048x7 (4.12)

With x4 the shots on target of the home team in previous matches, x5 the number of shots taken by the home team that were blocked, x6 the mean distance from goal of the home team’s shots and x7 the mean angle from the middle of the goal for the home team’s shots.

XhomeELO = xhomeELO (4.13)

Xawaypassing = −0.133x8 + 0.202x9 − 0.9650x10 (4.14)

With x8, x9 and x10 being mean away pass length, mean away pass distance from goal and successful forward passes in previous matches of the away side respectively.

Xawayshots = 0.862x11 + 0.142x12 − 0.221x13 − 0.063x14 (4.15)

Again, with x11 the shots on target of the away team in previous matches, x12 the number of shots taken by the away team that were blocked, x13 the mean distance from goal of the away team’s shots and x14 the mean angle from the middle of the goal for the away team’s shots.

As the home and away goals constructs only have one variable, the variable values are equal to the construct values:

Yhomegoals = yhomegoals (4.16)

Yawaygoals = yawaygoals (4.17)

The numbers of goals are finally calculated by denormalizing the z-scores, using the mean numbers of home and away goals as well as their standard deviations:

Homegoals = σhomegoals · yhomegoals + µhomegoals (4.18)

Awaygoals = σawaygoals · yawaygoals + µawaygoals (4.19)
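Equations 4.18 and 4.19 amount to undoing the z-score normalization. In the sketch below, the mean and standard deviation are hypothetical values, not the thesis's fitted ones:

```python
# Denormalizing a predicted z-score back to a goal count, as in
# Equations 4.18-4.19. mu and sigma below are hypothetical stand-ins
# for the sample mean and standard deviation of home goals.
mu_home, sigma_home = 1.55, 1.30
y_home_z = 0.4                       # model output on the z-score scale
home_goals = sigma_home * y_home_z + mu_home
print(home_goals)
```

The same transformation with the away-goal mean and standard deviation recovers the away prediction, after which error measures such as MAE and RMSE can be computed on the goal scale.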

Table 4.5: Coefficients of determination for second iteration model after removing relationships

                                                Home goals            Away goals
Model                                           R2       R2 adj.      R2       R2 adj.
Iteration 2 model                               0.1153   0.1147       0.0897   0.0891
Removed: opponent Elo → goals scored            0.0826   0.0822       0.0595   0.0591
Removed: shots made → goals scored              0.0990   0.0987       0.0752   0.0748
Removed: own passes & control → goals scored    0.0955   0.0951       0.0778   0.0775

Table 4.6: Effect sizes of relationships for model after second iteration

Relationship                             f2 (on home goals)   f2 (on away goals)
Opponent Elo → goals scored              0.0370               0.0332
Shots made → goals scored                0.0184               0.0159
Own passes & control → goals scored      0.0224               0.0131

4.1.6 League-specific results

The model of the previous sections used data from all competitions. This might lead to over-generalization and cause league-specific relationships to be missed. Therefore, the model obtained in the previous section was re-estimated 6 times, each time using just the matches from one league. The error measures on the independent test set are shown in Table 4.7.

Table 4.7: Separate League MAE and RMSE values for the Iteration 2 SEM model

League                   MAE      RMSE
German Bundesliga        0.9814   1.1783
Dutch Eredivisie         0.9582   1.2144
English Premier League   0.9120   1.1220
French Ligue 1           0.9460   1.2198
Italian Serie A          0.8742   1.0910
Spanish La Liga          0.8783   1.1672
Average                  0.9250   1.1655

On average, the performance of the one-league models is almost identical to that of the model based on all leagues grouped. Noteworthy are the very low error measures for the Italian Serie A. However, as these single-league models are on average no improvement over the multi-league model, the multi-league model was used in later stages of the project.

4.2 Artificial neural networks

Artificial neural networks are based on the way the human brain works. An artificial network contains several nodes (neurons, see Figure 4.4) that receive inputs xi (synapses), each with its own weight wki. From these, the summed product vk is calculated (Kantardzic, 2011):

v_k = \sum_{i=0}^{m} x_i w_{ki} \quad (4.20)

The neuron then computes its output yk based on a certain activation function f:

y_k = f(v_k)    (4.21)

Figure 4.4: Model of an artificial neuron (from Kantardzic, 2011)

These outputs either go to a next neuron or are linked directly to an output variable. There are several types of activation functions; some have a fixed output, while others depend on the input.
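The computation in equations (4.20) and (4.21) can be sketched in a few lines of Python (an illustrative example; the input values, weights and tanh activation are arbitrary choices, not taken from the thesis):

```python
import math

def neuron_output(inputs, weights, activation=math.tanh):
    """y_k = f(v_k) with v_k the summed product of inputs and weights
    (equations 4.20 and 4.21); a bias can be modelled as a fixed input x_0 = 1."""
    v_k = sum(x * w for x, w in zip(inputs, weights))  # eq. (4.20)
    return activation(v_k)                             # eq. (4.21)

# Two inputs with hand-picked weights: v_k = 0.5*0.8 + (-1.0)*0.3 = 0.1
y = neuron_output([0.5, -1.0], [0.8, 0.3])
```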

Roughly, two types of neural networks can be distinguished: feed-forward networks and recurrent networks. In a feed-forward network, information propagates from the input side to the output side without any loops or feedback. In a layered network, there are no links between neurons in the same layer. In recurrent networks, feedback loops exist.

The most common neural network, and also the one used in this project, is the multilayer perceptron (see Figure 4.5). A multilayer perceptron has three distinctive characteristics: the neurons have a non-linear activation function, the network contains one or more hidden layers of neurons, and the network exhibits a high degree of connectivity from one layer to the next. The learning algorithm of neural networks is called back-propagation and has two steps: a forward pass and a backward pass. In the forward pass, cases are sent through the network and the error at the end is calculated. The weights of the network are then adjusted in the backward pass, after which the cases are again sent through the network (again, a forward pass). This is repeated until the improvement in accuracy falls below a certain threshold value; when no further improvement is found, the algorithm stops and the neural network is final.

Mathematically, this works as follows. First, the error e is defined as the difference between the output of neuron j at iteration n and the actual value d:

e_j(n) = d_j(n) − y_j(n)    (4.22)

The correction of the weights after a step is then:

Δw_{ji}(n) = η · δ_j(n) · x_i(n)    (4.23)

In which η is the learning-rate parameter defined by the researcher, and δ_j(n) is the local gradient, defined as follows, with φ′ being the derivative of the activation function of neuron j:

δ_j(n) = e_j(n) · φ′(v_j(n))    (4.24)
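For a single output neuron with a tanh activation, one forward/backward pass following (4.22) to (4.24) might look as follows (a minimal sketch; the learning rate and target value are illustrative, and this is not the thesis's MatLab implementation):

```python
import math

def backprop_step(weights, x, d, eta=0.1):
    """One forward and backward pass for a single tanh output neuron:
    e_j = d_j - y_j (4.22), delta_j = e_j * phi'(v_j) (4.24),
    and the update Delta w_ji = eta * delta_j * x_i (4.23)."""
    v = sum(xi * wi for xi, wi in zip(x, weights))  # forward pass, eq. (4.20)
    y = math.tanh(v)
    e = d - y                                       # prediction error (4.22)
    delta = e * (1.0 - y * y)                       # tanh'(v) = 1 - tanh(v)^2
    return [wi + eta * delta * xi for wi, xi in zip(weights, x)]  # (4.23)
```

Repeating this step shrinks the error on the presented case, which is exactly the behaviour the text describes for repeated forward and backward passes.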

Figure 4.5: Multilayer perceptron (Kantardzic, 2011)

When using artificial neural networks for prediction purposes, the topology should be selected carefully. A network that is too simple will not be able to model a complex problem accurately. A network that is too complex, however, will lead to over-fitting on the training data and will return poor results when it is presented with new data. While previous research has been done regarding the optimal topology of a neural network, the results are contradictory. Hecht-Nielsen (1987), for example, presents the following simple heuristic based on the number of input variables (denoted by n):

nodes = 2n + 1 (4.25)

However, this heuristic is based on a specific type of activation function, and generalizing it should be done with care. Kurkova (1992) suggests that for normal activation functions, at least 2 hidden layers should be used. If this is not done, the number of required hidden nodes can be extremely big, up to the number of training samples presented to the model. Turban, Sharda, and Delen (2010) present a heuristic that the number of hidden nodes used is typically the average of the number of input and output nodes. As there is no uniform answer to the question of how many nodes or layers should be used, it was determined experimentally. Three main types of evaluation criteria can be used to evaluate the suitability of a certain topology of a neural network: hypothesis testing, cross-validation and information criteria (Anders & Korn, 1999).

As hypothesis testing is complex and labour-intensive (and the model used in this case has over 90 input variables), it was disregarded. Due to the nature of the dataset, it is impossible to use cross-validation (as described by Barrow and Crone (2013)). Cross-validation re-estimates the weights of the network k times (therefore also sometimes being referred to as k-fold cross-validation). Every time the weights are re-estimated, a different, independent test set is used. The cross-validation error is then the average of the errors of the k test sets. If this method were used in this context, the variable averages could contain future data. Therefore, information criteria were used in this project to evaluate the different topologies of neural networks.

There are three commonly used information criteria in neural networks: the Schwarz Information Criterion (SIC), the Network Information Criterion (NIC) and the Akaike Information Criterion (AIC). The AIC was found to outperform the NIC by Anders and Korn (1999). Swanson and White (1995) found the SIC models to ’not appear as a reliable guide to out of sample performance’. Heafke and Helmenstein (1996) found that the models selected by the SIC were generally underdimensioned (and thus underfitted) and the models selected by the AIC were overdimensioned. As overfitting can be avoided by the use of regularization (Singh & RoyChowdhury, 2001), the choice was made to use the AIC to select the most suitable model.

The AIC for neural networks is defined as follows:

AIC = log(MSE) + 2W/N    (4.26)

In which N is the number of observations and W is the number of weights in the neural network. A lower value for the AIC means a better fitting model.
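As a small illustration, the criterion in (4.26) can be computed directly (a sketch; the function and variable names are my own):

```python
import math

def aic_nn(mse, n_weights, n_obs):
    """AIC for a neural network: log(MSE) + 2W/N (equation 4.26)."""
    return math.log(mse) + 2.0 * n_weights / n_obs
```

With equal training MSE, a topology with fewer weights gets the lower (and thus better) AIC, so the criterion explicitly penalizes network size.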

Regularization adds a term to the error measure of the model in the training phase. This additional term is the sum of a function of all weights in the network multiplied by a scaling parameter λ. MatLab 2015b, in which the neural networks were modelled, uses the following adjusted formula for regularization:

MSE_reg = (1 − λ) · (1/N · Σ_{i=1}^{N} e_i²) + λ · (1/n · Σ_{j=1}^{n} w_j²)    (4.27)

In which e_i is the difference between the predicted value and the real value for case i, w_j is the value of weight j, and n is the total number of weights.
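Equation (4.27) can be sketched as follows (an illustration only; MatLab's internal implementation may differ in details):

```python
def mse_regularized(errors, weights, lam):
    """Regularized performance measure of eq. (4.27): a (1 - lambda) weighted
    mean squared error plus a lambda weighted mean of the squared weights."""
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error term
    msw = sum(w * w for w in weights) / len(weights) # mean squared weight term
    return (1.0 - lam) * mse + lam * msw
```

A larger λ shifts the emphasis from fitting the data to keeping the weights small, which is how regularization discourages over-fitting.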

As training algorithm, the choice was made to use trainscg (scaled conjugate gradient back-propagation) over trainlm (Levenberg-Marquardt back-propagation), as the former requires less computing time for bigger networks in MatLab while the performance of the two is comparable (and these two algorithms obtain the best results, as found by Sharma and Venugopalan (2014)).

4.2.1 Single layer neural networks

The first simulations were run for a single layer network varying from 1 node (neuron) to 50 nodes in this layer (loosely based on the research done by Turban et al.). As another measure to avoid over-fitting, the data is split into three different sets: a training set, a validation set and a test set. The training set is used to set the weights of the net. After every iteration, the

validation set is used to check if the new weights performed better than the previous weights. Finally, the test set is an independent sample used to evaluate the performance of the obtained network (and, as these cases are independent from the training set, they can later be used to check the performance of the model type against the odds and lines of the betting market).

Crowther and Cox (2005) suggest that 50% to 70% of the data should be used as training set. Therefore, the choice was made to use 60% of the dataset for training, 20% for validation and 20% for testing. As time-dependent data was used, the first 60% of cases was used as training data, the next 20% as validation data, and the final 20% as test data. As initial guess for regularization, λ = 0.1 was used.
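Because the data is time ordered, the split has to be chronological rather than random; a minimal sketch (the 60/20/20 fractions follow the text, the function name is my own):

```python
def chronological_split(cases, train_frac=0.6, val_frac=0.2):
    """Split time-ordered cases into training, validation and test sets
    without shuffling, so no future information leaks into earlier sets."""
    n = len(cases)
    i = round(n * train_frac)       # end of the training set
    j = i + round(n * val_frac)     # end of the validation set
    return cases[:i], cases[i:j], cases[j:]

# 60/20/20 split of ten chronologically ordered cases
train, val, test = chronological_split(list(range(10)))
```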

These first results can be seen in Figure 4.6.

Figure 4.6: AIC values for single layer networks

As can be seen in this figure, the relationship between the number of artificial neurons in the network and the value of the AIC is almost linear. A network with only 2 artificial nodes had the lowest AIC and would therefore be classified as the most suitable network.

However, this might have been caused by the 2W/N term in the AIC formula, which penalizes the number of weights of the network. It is possible that a simpler model (with fewer variables and thus fewer weights) would return a lower AIC value if there are many variables that add little information to the model. Therefore, correlations were checked for all the different variables. For different thresholds of the correlation coefficient, the least significant of each pair of correlated variables was removed from the model. These simulations were only run for models up to 5 nodes and the results are shown in Figure 4.7.

As can be seen in this figure, the best AIC results were obtained by using the dataset in which one variable of each pair correlating higher than 0.6 has been removed. When more variables were removed, the AIC rose again, which shows that the model is under-fitting. Again, a 2 neuron network was found to be the optimal topology.

The next step was finding the optimal regularization scale parameter λ. The goal of regularization is to avoid over-fitting, which is done by avoiding high weights.

Figure 4.7: AIC values for different datasets

As the AIC is measured on the training set errors, it is not a useful way to determine a suitable regularization parameter, since the goal of regularization is to make the fitted network more suitable for new, unseen datasets. Therefore, simulations were done with values for λ from 0 to 0.3 for single layer networks from 1 to 10 nodes (with the correlation threshold for the variables at r = 0.6). Then, the RMSE values for the test sets were averaged for each value of λ. The results are shown in Table 4.8. In this table, it can be seen that without regularization, the RMSE is slightly higher (because of over-fitting at higher-order networks) than with suitable regularization (λ = 0.05). If λ becomes higher, the network starts under-fitting and the RMSE increases again.

Table 4.8: Test set RMSE averages (1 to 10 nodes) for different regularization parameters

λ       RMSE average
0.00    1.1576
0.05    1.1571
0.10    1.1584
0.15    1.1594
0.20    1.1650
0.25    1.1667
0.30    1.1676

With the transfer function initially used for the output layer (a hyperbolic tangent sigmoid transfer function, or tansig in MatLab), it is possible that negative outputs are generated by the neural network. Therefore a positive linear transfer function (poslin in MatLab), which cannot return negative values, was tested. However, the results for poslin were much worse than for the tansig transfer function. Therefore, the choice was made to manually correct the negative values to zero after the model finished simulating.

Concluding, for a 1 layer neural network, the optimal topology is a two neuron network with a value for the regularization parameter of 0.05. This leads to an AIC value of 0.3185.

4.2.2 Two layer neural networks

As a next step, two layer neural networks were simulated with the number of nodes varying from 1 to 4 for each of the layers. For regularization, λ = 0.05 was used. The resulting AIC values are shown in Table 4.9.

Table 4.9: AIC values for 2 layer network

nodes (layer 1)   nodes (layer 2)   AIC
1                 1                 0.3288
1                 2                 0.3272
1                 3                 0.3277
1                 4                 0.3288
2                 1                 0.3415
2                 2                 0.3378
2                 3                 0.3342
2                 4                 0.3324
3                 1                 0.3598
3                 2                 0.3496
3                 3                 0.3361
3                 4                 0.3418
4                 1                 0.3791
4                 2                 0.3562
4                 3                 0.3578
4                 4                 0.3566

As can be seen, all the networks with 2 layers have a higher AIC value than the best networks with 1 layer and are therefore not considered as an improvement over the 1 layer model.

4.2.3 Three layer neural networks

Finally, three layer networks were investigated too, following the same procedure (1 to 4 neurons per layer, λ = 0.05). The AIC values for the five best performing networks are shown below in Table 4.10.

Table 4.10: AIC values for 3 layer network

nodes (layer 1)   nodes (layer 2)   nodes (layer 3)   AIC
1                 3                 2                 0.3321
1                 2                 3                 0.3323
2                 3                 4                 0.3327
2                 1                 1                 0.3441
2                 2                 4                 0.3446

Just like the two layer networks, the three layer networks show no improvement in the AIC with regard to the single layer networks.

Concluding, the neural network with 2 nodes in 1 layer (with a regularization parameter of λ = 0.05 and correlating variables with a coefficient higher than 0.6 removed) is the best performing network. For the independent test set, this network has an MAE of 0.9083 and an RMSE of 1.1561.

4.2.4 League-specific results

The errors when using a model with only cases from one league were calculated again, as was done for the SEM model. These results are shown below in Table 4.11.

Table 4.11: Separate League MAE and RMSE values for a three layer neural network

League                    MAE      RMSE
German Bundesliga         0.9763   1.1851
Dutch Eredivisie          0.9353   1.1952
English Premier League    0.9022   1.1741
French Ligue 1            0.9250   1.1520
Italian Serie A           0.8948   1.1277
Spanish La Liga           0.8880   1.1799
Average                   0.9203   1.1686

In this case, the single-league models on average performed slightly worse than the multi-league model. As these league-specific models are again no improvement over the already obtained model, the multi-league model’s predictions were used in the later stages of this project.

4.3 Fuzzy inference systems

Fuzzy models (or fuzzy inference systems) are based on the idea that most real-world things cannot be measured precisely. Kantardzic (2011) gives the example of rain, which is hard to describe precisely but can be described in terms such as light rain, moderate rain or heavy rain. In normal set theory, there is a crisp line between different sets. For example, in crisp sets, there is a set of short people and a set of tall people, divided by a threshold at 175 cm. This would mean someone that is 174 cm would be classified as short and someone that is 175 cm as tall. A fuzzy set does not have such a crisp boundary, but is based on membership functions. Instead of a case belonging to a set or not, it gives a degree between 0 and 1 to which a certain case belongs to a set. In the example, this could mean that someone that is 174 cm belongs to the group ‘short people’ with a degree of 0.52 and to the group ‘tall people’ with a degree of 0.48.
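The short/tall example can be illustrated with sigmoid membership functions (the 175 cm centre follows the example; the slope is an arbitrary choice that happens to reproduce degrees close to 0.52/0.48 at 174 cm):

```python
import math

def membership_short(height_cm, center=175.0, slope=0.1):
    """Degree of membership of the fuzzy set 'short people' (sigmoid shape)."""
    return 1.0 / (1.0 + math.exp(slope * (height_cm - center)))

def membership_tall(height_cm):
    """Complementary degree for the fuzzy set 'tall people'."""
    return 1.0 - membership_short(height_cm)
```

Unlike a crisp threshold at 175 cm, the degrees change gradually with height while always summing to 1.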

The formal notation can be described as follows, with X being a collection of objects x. A fuzzy set A in X is then defined as:

A = {(x, µ_A(x)) | x ∈ X}    (4.28)

Where µ_A(x) is the membership function for fuzzy set A. Just like a neural network has different types of activation functions, there are different types of membership functions in fuzzy systems, all with their own shapes. The most commonly used shapes are triangular, trapezoidal and Gaussian functions. Based on these membership functions, relationships between different variables can be derived by the fuzzy model. These relationships can be seen as linguistic if-then rules such as ’if pressure is high, then volume is small’. A fuzzy relationship depends on the membership functions of the involved variables:

µ_R(x, y) = f(µ_A(x), µ_B(y))    (4.29)

Now let A, A’ and B be fuzzy sets respectively on the X, X and Y domain. Fuzzy set B’ induced by A’ is then defined as:

µ_B′(y) = max_x min[µ_A′(x), µ_R(x, y)]    (4.30)

The fuzzy inference system transforms the input variable values into degrees of membership of fuzzy membership functions. Based on the rules, the membership functions a variable belongs to, and the degree to which it belongs to these functions, the fuzzy inference system calculates the output (Jang, Sun, & Mizutani, 1997).

The output can be calculated in two different ways. The first way is modelling the output as fuzzy sets with their own membership functions. This type of modelling is called Mamdani-type fuzzy inference, after research done by Mamdani and Assilian (1975). The second way was first presented in research by Takagi and Sugeno (1985) and is called Takagi-Sugeno fuzzy inference. This type of fuzzy inference system models the output as either a constant or a linear function dependent on the input values.

The input membership grades can be generated in three different ways: grid partition (GP), subtractive clustering (SC) and fuzzy c-means clustering (FCM). Grid partition divides the variable data into a user-defined number of equally sized (in terms of range) subspaces, with each subspace being linked to a membership function (Kantardzic, 2011).

Subtractive clustering was introduced by Chiu (1994), based on the mountain clustering algorithm presented by Yager and Filev (1994), and works as follows: each data point x_i is considered a potential cluster center, and its potential value is defined as:

P_i = Σ_{j=1}^{n} e^(−α‖x_i − x_j‖²)    (4.31)

With:

α = 4 / r_a²    (4.32)

Where r_a is a positive constant, effectively the radius defining a neighbourhood. So, the potential of a point is a function of its distances to all other points. After the potential for every point is calculated, the point with the highest potential is selected as the center of the first cluster.

After the k-th cluster center has been defined, the potential values are updated according to the following formula, in which x_k* is the location of the k-th cluster center and P_k* its potential value:

37 ∗ 2 ∗ −β||xi−x || Pi = Pi − Pk e k (4.33)

With:

β = 4 / r_b²    (4.34)

Where r_b again is a positive constant. The authors suggest that it should be approximately equal to 1.5·r_a.

This is repeated until the stopping criterion is reached (when the potential value of the newly found cluster is significantly smaller than the first found potential value).
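The procedure in (4.31) to (4.34) can be sketched for one-dimensional data (an illustration with a simplified stopping threshold of my own choosing; Chiu's full algorithm uses additional accept/reject criteria):

```python
import math

def subtractive_clustering(points, r_a=0.5, stop_ratio=0.15):
    """Sketch of subtractive clustering for 1-D data (eqs. 4.31-4.34).
    Stops when the best remaining potential drops below stop_ratio times
    the potential of the first cluster (a simplified stopping rule)."""
    alpha = 4.0 / r_a ** 2                       # eq. (4.32)
    beta = 4.0 / (1.5 * r_a) ** 2                # r_b = 1.5 * r_a, eq. (4.34)
    # Potential of each point: Gaussian-weighted distances to all points (4.31)
    pot = [sum(math.exp(-alpha * (x - y) ** 2) for y in points) for x in points]
    centers = []
    first_peak = max(pot)
    while True:
        k = max(range(len(points)), key=lambda i: pot[i])
        if centers and pot[k] < stop_ratio * first_peak:
            break
        centers.append(points[k])
        peak, xc = pot[k], points[k]
        # Subtract the chosen cluster's influence from every potential (4.33)
        pot = [p - peak * math.exp(-beta * (x - xc) ** 2)
               for p, x in zip(pot, points)]
    return centers
```

On two well-separated groups of points, the sketch picks one center inside each group and then stops.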

Fuzzy c-means clustering was first introduced by Dunn (1973) and later extended to its current form by Bezdec (1981) and aims to minimize this objective function:

J_m = Σ_{k=1}^{n} Σ_{i=1}^{c} (u_ik)^m ‖x_k − v_i‖²_A    (4.35)

In which n is the number of cases, c is the number of clusters, m is a parameter that defines the overlap of the fuzzy sets, x_k is data point k, v_i is the center of cluster i and u_ik is the degree of membership of x_k in cluster i.

The algorithm is then executed as follows:

First, u_ik is randomly initialized, after which the cluster centres are calculated:

v_i = Σ_{k=1}^{n} (u_ik)^m x_k / Σ_{k=1}^{n} (u_ik)^m    (4.36)

Then, u_ik is updated:

u_ik = 1 / Σ_{j=1}^{c} (‖x_k − v_i‖ / ‖x_k − v_j‖)^(2/(m−1))    (4.37)

Once the difference between the newly obtained value for u_ik and its previous value is small enough, the process is ended.
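A bare-bones version of this alternating update for one-dimensional data might look like this (a sketch using a plain Euclidean distance instead of the general A-norm; the initialization and tolerance are my own choices):

```python
import random

def fuzzy_c_means(points, c=2, m=2.0, tol=1e-6, seed=0):
    """Sketch of fuzzy c-means for 1-D data (eqs. 4.35-4.37)."""
    rng = random.Random(seed)
    n = len(points)
    # Randomly initialize the membership matrix u[i][k], normalised per point
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    while True:
        # Cluster centres as membership-weighted means, eq. (4.36)
        v = [sum((u[i][k] ** m) * points[k] for k in range(n)) /
             sum(u[i][k] ** m for k in range(n)) for i in range(c)]
        # Updated memberships, eq. (4.37); tiny floor avoids division by zero
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            for i in range(c):
                d_ik = abs(points[k] - v[i]) or 1e-12
                new_u[i][k] = 1.0 / sum(
                    (d_ik / (abs(points[k] - v[j]) or 1e-12)) ** (2.0 / (m - 1))
                    for j in range(c))
        diff = max(abs(new_u[i][k] - u[i][k])
                   for i in range(c) for k in range(n))
        u = new_u
        if diff < tol:          # stop once the membership change is small
            return v, u
```

On clearly separated data, the centres converge close to the group means, and each point's memberships still sum to 1.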

4.3.1 Performance in literature of input methods

In previous research, the results of these different input methods differ. In a study on earthquake prediction, a fuzzy model based on FCM inputs scored higher than GP and SC fuzzy systems (Mirrashid, 2014). In a study on predicting evaporation of water in Turkey, a GP model outperformed an SC model (Kisi & Zounemat-Kermani, 2014). Research on using fuzzy inference systems to model lake fluctuations was done by Sanikhani, Kisi, Kiafar, Zaman, and Ghavidel (2015), in which a GP and an SC model were used. The best model in this research was heavily dependent on the timespan being investigated.

4.3.2 Variable selection

As the fuzzy inference systems are not able to handle all variables in one model due to the high computational power demand, a selection of variables had to be made. This was done in multiple steps using the MatLab function exhsrch (exhaustive search). This function generates a single iteration fuzzy system for all variables (or combinations of a user-defined number of variables) and checks their RMSE on an independent sample. This way, the variables that are the best predictors show up with the lowest RMSE values. In this model, the same variables as in the baseline SEM models were used. However, the variables that were highly collinear (as found in Paragraph 4.1.2) were removed.

The 20 best performing predictor variables (and their training and test set RMSEs) are shown in Appendix D in Figures D.1 and D.2 for home and away goals respectively.

Interestingly, the model for away goals has a lower RMSE than the model for home goals. The following variables result in top 10 RMSEs for both home and away goals and were investigated further:

• Successful passes forward made
• Shots on target made
• ELO-rating of upcoming opponent
• Mean pass distance from goal of own passes
• Length of own passes
• Previous opponent’s successful forward passes made
• Upcoming opponent’s successful forward passes made
• Number of own successful take-ons

This further investigation was again done with MatLab’s exhsrch function, but this time creating models with 3 variables instead of 1. Tables D.1 and D.2 in Appendix D show the 10 best performing models and their training and test set RMSE values for the 3 variable models for home and away goals respectively.

The number of appearances of each variable was used to determine its importance to the models that were subsequently created. The different variables and the number of times they were included in the models are shown in Table 4.12. For the different fuzzy models that were subsequently modelled, these variables were introduced stepwise until the computational demands exceeded the available computational power or until model performance decreased (due to over-fitting).

4.3.3 Evaluation of model performance

Just like for the neural networks, an information criterion was used for evaluating the different models, and this determined the stopping criterion for the complexity of the model. Different information criteria for fuzzy systems were researched by Yen and Wang (1998), who found the Schwarz-Rissanen information criterion (SRIC) to be the most suitable one. The SRIC is defined as follows:

SRIC = log(MSE) + log(N)·m / N    (4.38)

Table 4.12: Number of appearances of the variables in the best performing 3 variable fuzzy inference systems

Variable                                           Number of appearances
Own successful forward passes                      16
Own shots on target                                12
Opponent ELO                                       12
Mean pass distance from goal of own passes         6
Upcoming opponent’s successful forward passes      6
Mean length of own passes                          5
Own successful take-ons                            2
Previous opponent’s successful forward passes      1

In which m is the number of parameters to be adjusted.

Yen and Wang suggest that for the application of fuzzy systems, the parameter m should be replaced by the following complexity function:

s = m_a + m_r + c·m_r    (4.39)

In which m_a is the number of antecedent (input) parameters, m_c is the number of consequent (output) parameters and m_r is the number of fuzzy rules in the model; c is merely a scaling parameter. The authors suggest using a value for c between 2 and 5, using 3 themselves. A value of 3 for c was used in this research as well, resulting in the following formula for the adjusted SRIC:

SRIC = log(MSE) + log(N)·(m_a + m_r + 3m_r) / N    (4.40)
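The adjusted criterion can then be computed directly (a transcription of equation (4.40) as it appears above; the function name is my own):

```python
import math

def sric_adjusted(mse, m_a, m_r, n_obs, c=3):
    """Adjusted SRIC: log(MSE) + log(N) * (m_a + m_r + c * m_r) / N,
    with c = 3 as chosen in this research (equation 4.40)."""
    return math.log(mse) + math.log(n_obs) * (m_a + m_r + c * m_r) / n_obs
```

As with the AIC, a lower SRIC is better, and extra rules or parameters raise the penalty term.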

4.3.4 Grid partition

For the Takagi-Sugeno type grid partition model, the variables presented in the previous section were entered stepwise, after which the performance was evaluated. This was again done using 3 different sets: a training set of 60%, a validation set of 20% and an independent test set of 20%, for which the error measures are calculated. Fuzzy systems with 2 and 3 Gaussian membership functions (MF) for each variable were used in this stage.

The SRIC values for the grid partition fuzzy systems can be seen in Table E.1 in Appendix E. In this table, it can be seen that a 3 variable model is the best performer for systems with 2 membership functions and a 2 variable model is the best performer for fuzzy systems with 3 membership functions for each variable. Furthermore, the models with 2 membership functions outperform the models with 3 membership functions for any combination of variables. Finally, a 2 membership function model with 3 variables results in the lowest SRIC value of 0.3468.

Selection of membership functions

In order to further increase model performance, different functions were evaluated for both the input and output variables using the best performing model as selected previously. The results of using the different function types for the input variables are shown below in Table 4.13. In this table it can be seen that changing the membership function has very little influence on the performance of the model.

Table 4.13: SRIC results for varying membership function types

Function type                   SRIC
Gaussian (previously used)      0.3468
Triangular                      0.3476
Generalized bell-shape          0.3478
Trapezoid                       0.3500

4.3.5 Subtractive clustering

In the subtractive clustering model (which is a Takagi-Sugeno model), the same steps were taken as for the grid partition model. However, instead of varying the number of membership functions, the radii of the clusters have to be varied when using subtractive clustering for the input space. For the radii of the clusters, values of 0.25 and 0.5 were used, which means the radii of the clusters are 0.25 or 0.5 times the width of the data space of the variable in which the cases are clustered. The results of the simulations are shown in Table E.2 in Appendix E. Dashes in the table indicate that the simulation could not be done due to not enough rules being generated.

Again, the most complex models tested performed worse than some models with less complexity. A model with 4 included variables was found to be the best performer, with an SRIC value of 0.3238, lower than that of the grid partition fuzzy systems.

4.3.6 Fuzzy c-means clustering

The FCM model is basically an adapted SC model. MatLab’s FCM algorithm uses the results from SC (with radii of 0.5) as starting point and optimizes from there. Unlike the GP and SC models, for which MatLab only supports Takagi-Sugeno models, FCM models can also be modelled with Mamdani outputs in MatLab. Simulations for both these output types were done, and the results are shown in Table E.3 in Appendix E.

For the Takagi-Sugeno model, the most accurate model is a 6 variable model, which outperforms the grid partition model and results in an SRIC value of 0.3301. However, this is worse than the subtractive clustering model. The performance of the Mamdani model does not come close to that of the Takagi-Sugeno models for the used combinations of variables.

Therefore, the 4 variable subtractive clustering model is chosen as the final fuzzy system. This final model has a test set MAE of 0.9217 and a test set RMSE of 1.1689, which is worse than both the neural network and the PLS-SEM model.

4.3.7 League-specific results

Again, to investigate if there are league-specific effects that the full model over-generalizes, the model was refitted with data from the separate leagues. The MAE and RMSE values are shown in Table 4.14.

Table 4.14: Separate League MAE and RMSE values for a fuzzy c-means clustering FIS

League                    MAE      RMSE
German Bundesliga         1.0046   1.1966
Dutch Eredivisie          0.9578   1.2148
English Premier League    0.9147   1.1292
French Ligue 1            0.9367   1.2173
Italian Serie A           0.8730   1.0960
Spanish La Liga           0.8813   1.1755
Average                   0.9280   1.1716

On average, these league-specific models perform worse than the general model using all leagues, although for the Dutch Eredivisie, English Premier League and Italian Serie A the single-league models outperform the general model. As the single-league models are again no improvement on average compared to the multi-league model, the multi-league model was kept and used in later stages of the project.

4.4 Comparison of models and heuristics

Of the obtained models, the neural network models the goals scored by the different teams most accurately. The SEM model is second best and the fuzzy inference system is the worst performing model. To get an idea of the performance of the developed models compared to other prediction methods, the error measures were also calculated for a number of heuristics. The following heuristics were used:

• General average goals scored
• Average goals scored for home and away team separate
• General average goals scored per league
• Average goals scored per league for home and away team separate
• Average of a team’s goals scored in last 5 matches
• Average of upcoming opponent’s goals conceded in last 5 matches
• Average of a team’s goals scored in the last 5 matches and upcoming opponent’s goals conceded in the last 5 matches
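All models and heuristics were compared on the same two error measures, which can be computed as follows (a straightforward sketch):

```python
import math

def mae_rmse(predicted, actual):
    """Mean absolute error and root mean squared error of a set of predictions."""
    errors = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse
```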

The results are shown in Table 4.15. As can be seen in this table, the developed models all have lower error measures than the heuristics. The neural network has both the lowest RMSE and MAE, followed by the SEM and Fuzzy System which are quite comparable in performance.

Table 4.15: MAE and RMSE for created models and various heuristics

Model/heuristic                                                       MAE      RMSE
Structural equation model                                             0.9203   1.1661
Neural network                                                        0.9083   1.1561
Fuzzy inference system                                                0.9217   1.1689
General average goals scored                                          1.0048   1.2484
Average goals scored for home and away team separate                  0.9815   1.2350
General average goals scored per league                               0.9988   1.2446
Average goals scored per league for home and away team separate       0.9754   1.2309
Average of a team’s goals scored in last 5 matches                    1.0223   1.3003
Average of upcoming opponent’s goals conceded in last 5 matches       1.0441   1.3341
Average of scored and opponent’s conceded goals in past 5 matches     0.9755   1.2383

4.4.1 Significance of prediction model differences

As seen, the neural network’s predictions return slightly smaller error measures than the predictions of the fuzzy system and the structural equation model. However, without testing significance, no conclusion could be made as to whether one model outperforms another. Therefore, Wilcoxon rank-sum tests were done to determine whether the distributions of the absolute errors had different medians. Normal t-tests were again unsuitable because of the non-normality of the data. The results are shown in Table 4.16. In this table, it can be seen that there are no significant differences in median between the absolute errors of the predictions of the different models. There is therefore not enough proof to conclude that one model outperforms another.

Table 4.16: Significances for model differences

Models                            p
Neural network vs. SEM            0.5446
SEM vs. Fuzzy system              0.9971
Fuzzy system vs. Neural network   0.5470
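The normal approximation behind the rank-sum test can be sketched as follows (an illustration without tie correction; the thesis used MatLab's ranksum, which also handles ties and exact p-values):

```python
import math

def ranksum_z(x, y):
    """Wilcoxon rank-sum z-statistic (normal approximation, no tie correction).
    Ranks the pooled samples and compares the rank sum of x with its
    expectation under the null hypothesis of equal medians."""
    combined = sorted((val, 0 if i < len(x) else 1)
                      for i, val in enumerate(list(x) + list(y)))
    # Rank sum of sample x; ranks are 1-based positions in the pooled ordering
    w = sum(pos + 1 for pos, (val, src) in enumerate(combined) if src == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (w - mean_w) / sd_w

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A large p-value, as in Table 4.16, means the rank sums are close to what chance alone would produce, so the medians cannot be called different.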

Chapter 5

Biases in the betting market

As shown in the literature review, several studies have found biases in American betting markets. In this chapter, the handicap and total goal markets for 6 different European association football leagues are checked for biases.

5.1 Asian handicaps

To understand the rest of this chapter, knowledge of how the Asian betting market works is essential. As mentioned earlier, the Asian handicap is comparable to the spread in American sports, but with quarter-goal intervals between the lines to account for association football’s low scores. These kinds of bets are offered so that both sides of the bet have an approximately 50% chance of winning, which is not the case in outright win markets when there is a big quality difference between the two teams. Aside from bets on the point/goal difference between two teams (also called side handicap betting or Asian handicap betting), bets are also offered on the total number of goals/points to be scored (also called Asian total betting). For a single match, more than 1 line is usually offered for both the Asian handicap for the sides and the total number of goals scored. This is illustrated with an example. Tables 5.1 and 5.2 show the lines offered by PinnacleSports (the highest limit Asian-style bookmaker) at the start of the match for both the side and the total lines for Ajax - FC Utrecht on October 2, 2016.

Table 5.1: Closing side lines and odds at PinnacleSports for Ajax - FC Utrecht

Line for home team   Odd for home team   Line for away team   Odd for away team
-2                   2.93                +2                   1.44
-1.75                2.38                +1.75                1.65
-1.5                 2.08                +1.5                 1.85
-1.25                1.83                +1.25                2.10
-1                   1.56                +1                   2.57

The most important Asian handicap for every match is the main line. The main line is defined as the Asian handicap or total goal line whose odds are closest to 2 (or, in other words, for which the absolute difference between the odds of the two bets on this line is minimal). Considering the Ajax - Utrecht example, the main lines are -1.5 and 3 for the side and total goals respectively.

Table 5.2: Closing total goals lines and odds at PinnacleSports for Ajax - FC Utrecht

Line   Odd for over   Odd for under
2.5    1.55           2.54
2.75   1.69           2.26
3      1.92           1.98
3.25   2.20           1.73
3.5    2.39           1.62

Let’s now consider the main line for the sides for this Ajax-Utrecht match:

• Ajax -1.5 @ 2.08

• Utrecht +1.5 @ 1.85

This game ended in a 3-2 victory for Ajax. Consider a bet on Ajax -1.5. After the match, the handicap is applied to the score, resulting in a handicap result of 1.5-2. At this score, Ajax -1.5 was a lost bet. Applying the Utrecht handicap results in 3-3.5, making Utrecht +1.5 a winning bet. So for the Ajax handicap bet to win, Ajax had to win by at least two goals; otherwise the Utrecht +1.5 bet would be the winner. The returns of a winning bet are calculated by multiplying the size of the bet by the odds. For example, a €10 bet on Utrecht +1.5 would have returned €18.50, a profit of €8.50. This is how full handicaps (ending in .5 or .0) work: if the score of the team that was bet on is higher than the opponent's after applying the handicap, the bet is won. If it is lower than the opponent's after applying the handicap, it is lost. If the scores after applying the handicap are equal, the player gets exactly his or her money back.

In addition to the full handicaps, there are also quarter (and three-quarter) handicaps, ending in .25 or .75. These handicaps are essentially split handicaps: a +0.25 handicap, for example, equals half of the stake on the +0.0 handicap and half of the stake on the +0.5 handicap. Consider a €10 bet on the Ajax -1.25 handicap @ 1.83 in the Ajax-Utrecht game. This equals half of the stake (€5) on Ajax -1 @ 1.83 and half of the stake (€5) on Ajax -1.5 @ 1.83. As Ajax won the game 3-2, half the bet is lost (the -1.5 part) while the other half is returned (the -1 part). This results in a total pay-out of €5, a loss of €5. A bet on Utrecht +1.25, on the other hand, would have resulted in a half win (winning the +1.5 part and getting the money back for the +1 part).

If Ajax had won 4-2, the bet on Ajax -1.75 would have been half won and the bet on Utrecht +1.75 half lost.

Summarizing:

• If the score difference after applying the handicap is 0 for your team, you get your money back.

• If the score difference after applying the handicap is -0.25 for your team, you lose half your money.

• If the score difference after applying the handicap is -0.5 or less for your team, you lose your complete bet.

• If the score difference after applying the handicap is +0.25 for your team, you receive half the net profit of a fully won bet. Mathematically, the odd at which a half-won bet is settled is calculated as follows: Odd_{half won} = \frac{Odd_{original} - 1}{2} + 1

• If the score difference after applying the handicap is +0.5 or more for your team, you fully win the bet.

Betting on the goal total line works in an identical way. When betting on overs:

• If the total goals minus the line is 0.5 or higher, the bet is fully won.

• If the total goals minus the line is 0.25, the bet is half won.

• If the total number of goals is equal to the line, the player's stake is returned.

• If the total goals minus the line is -0.25, the player loses half of the stake.

• If the total goals minus the line is -0.5 or lower, the bet is fully lost.

When betting on unders:

• If the total goals minus the line is 0.5 or higher, the bet is fully lost.

• If the total goals minus the line is 0.25, the bet is half lost.

• If the total number of goals is equal to the line, the player's stake is returned.

• If the total goals minus the line is -0.25, the bet is half won.

• If the total goals minus the line is -0.5 or lower, the bet is fully won.
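The settlement rules above can be sketched as a small recursive function. This is a minimal sketch; the function name and interface are illustrative and not taken from the thesis:

```python
def settle_handicap(bet_line, odds, team_goals, opp_goals, stake=1.0):
    """Total pay-out (stake included) of an Asian handicap bet.

    Quarter lines (.25/.75) are settled as two half-stake bets on the
    adjacent half-goal lines, exactly as described above.
    """
    # Detect quarter lines: the line times 4 is odd for .25/.75 lines.
    if abs(bet_line * 4) % 2 == 1:
        return (settle_handicap(bet_line - 0.25, odds, team_goals, opp_goals, stake / 2)
                + settle_handicap(bet_line + 0.25, odds, team_goals, opp_goals, stake / 2))
    margin = team_goals + bet_line - opp_goals
    if margin > 0:       # win after applying the handicap
        return stake * odds
    if margin == 0:      # push: the stake is returned
        return stake
    return 0.0           # loss
```

A total (over/under) bet can be settled with the same function: an over at line L with final total T is `settle_handicap(-L, odds, T, 0)`, and the corresponding under is `settle_handicap(L, odds, 0, T)`. For the Ajax - Utrecht example, a €10 bet on Utrecht +1.5 @ 1.85 pays out €18.50 and the €10 bet on Ajax -1.25 @ 1.83 pays out €5, matching the worked example.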

5.2 Main lines

First, the biases in the main lines were investigated. The following opposing situations were examined to see whether the odds offered by PinnacleSports contain biases:

• Home teams vs. Away teams

• Over (total goals) vs. Under (total goals)

• Favourites vs. Underdogs

• Home favourites vs. Away underdogs

• Home underdogs vs. Away favourites

Underdogs and favourites were defined in the following way: if the main line for a certain team was -0.25 or lower, they were marked as favourite. If the main line for a certain team was 0.25 or higher, they were marked as underdog. Consequently, if the main line was 0 for the side Asian handicap, this match was disregarded for the favourite/underdog analysis.

The biases were analysed in the following way: for each team classified as being in a certain situation, a virtual Euro (or Dollar) was placed on the main line for that situation. For each situation, the return on investment (ROI) of betting on all qualifying teams was calculated as follows:

ROI = \frac{\text{Total Profit}}{\text{Total amount wagered}} \qquad (5.1)

The total ROIs of these bets are shown in Table 5.3. Aside from the ROI for these bets, the p-value of the Wilcoxon signed-rank test is shown. The Wilcoxon signed-rank test tests whether the median of a set of values differs significantly from zero. This test was used because, unlike the t-test, it does not require the data to be normally distributed. As the data used here has many observations at -1 (full loss) and around +1 (full win) and very few at 0 (money back), the data is not normally distributed and a t-test could not be used. The signed-rank test does, however, assume that the data is symmetric. To test this assumption, confidence intervals for the skewness of the return vectors were calculated. To do so, first the sample skewness was calculated (Cramer, 1997):

s = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^3}{\sqrt{\left(\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^3}} \qquad (5.2)

The skewness standard error can be calculated in the following way:

se_s = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}} \qquad (5.3)

Finally, the upper and lower 95% confidence interval values can be calculated as follows:

s \pm 1.96 \cdot se_s \qquad (5.4)

These skewness values and their confidence intervals for the return vectors of the different situations can be seen in Table F.1 in Appendix F. For interpreting skewness values, Bulmer (1979) states that if the skewness is between -0.5 and 0.5, the distribution is approximately symmetric. As the confidence limits all fall within -0.15 and 0.15, the data for all situations is assumed to be symmetric, making it suitable for the Wilcoxon signed-rank test.
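Equations (5.2)-(5.4) can be computed directly. A minimal sketch in Python (the function name is illustrative):

```python
import numpy as np

def skewness_ci(x, z=1.96):
    """Sample skewness (eq. 5.2) with its standard error (eq. 5.3)
    and the resulting 95% confidence interval (eq. 5.4)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    s = (d**3).mean() / np.sqrt(((d**2).mean())**3)          # eq. (5.2)
    se = np.sqrt(6.0*n*(n - 1) / ((n - 2)*(n + 1)*(n + 3)))  # eq. (5.3)
    return s, s - z*se, s + z*se                             # eq. (5.4)

# A perfectly symmetric return-like vector has skewness zero.
x = np.concatenate([np.full(100, -1.0), np.zeros(100), np.full(100, 1.0)])
s, lo, hi = skewness_ci(x)
```

If the interval (lo, hi) stays inside the (-0.5, 0.5) band from Bulmer (1979), the symmetry assumption of the signed-rank test is considered acceptable.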

As can be seen in the table, the median returns of betting on home teams, away teams, overs, unders, general underdogs, home favourites, home underdogs and away underdogs are all significantly lower than zero at a 5% significance level. At a 10% significance level, the median return of betting on favourites is significantly lower than zero too. While away favourites are the only bet type with a positive ROI, their median return is not significantly higher than zero. These results indicate that, unlike what Paul and Weinbach (2007) found for the NFL, there are no biases in the Asian association football betting market big enough to return a profit by blindly betting on every team in a certain situation.

The next step was to test whether there are biases that are too small to be profitable, but that can still indicate whether bookmakers slightly under- or overprice certain bets in order to attract more money on one side than the other.

Table 5.3: ROI results for all bets in certain situations (main lines)

Situation         Real ROI   p (Wilcoxon signed-rank test)
Home teams        -0.0194    0.0000
Away teams        -0.0148    0.0000
Over              -0.0274    0.0000
Under             -0.0169    0.0000
Favourites        -0.0102    0.0846
Underdogs         -0.0217    0.0000
Home favourites   -0.0148    0.0342
Home underdogs    -0.0313    0.0000
Away favourites    0.0019    0.8824
Away underdogs    -0.0180    0.0000

This was done by comparing the pay-out vectors of the opposing situations with the Wilcoxon rank-sum test. The resulting p-values are shown in Table 5.4. The table shows no significant difference in medians for home vs. away teams or for overs vs. unders. There is, however, a significant difference in medians for favourites compared to underdogs, which is also reflected in the specific results for home and away favourites/underdogs. This could indicate that the betting public prefers to back underdogs, so that Pinnacle has to artificially raise the odds of the favourites in order to attract the desired betting volume on them.

Table 5.4: p-value results for opposing bets

Situation                            p (Wilcoxon rank-sum test)
Home teams vs. Away teams            0.6207
Overs vs. Unders                     0.2552
Favourites vs. Underdogs             0.0001
Home favourites vs. Away underdogs   0.0023
Away favourites vs. Home underdogs   0.0000
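The rank-sum comparison could be reproduced along the following lines; the return vectors here are hypothetical stand-ins for the realised PinnacleSports pay-outs:

```python
import numpy as np
from scipy.stats import ranksums

# Hypothetical per-bet net returns for two opposing situations
# (-1.0 = full loss; positive values = net win at the taken odds).
rng = np.random.default_rng(0)
favourite_returns = rng.choice([-1.0, 0.83], size=500)
underdog_returns = rng.choice([-1.0, 0.95], size=500)

# Two-sided Wilcoxon rank-sum test for a difference in medians.
stat, p = ranksums(favourite_returns, underdog_returns)
```

A p-value below 0.05, as in the favourites vs. underdogs row of Table 5.4, would indicate that the two situations have significantly different median returns.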

5.2.1 Separate league results

As a next step, the results per league were assessed. For all different types of bets, the ROIs were calculated per league; they are shown in Table 5.5.

Two results in this table stand out: the 16.6% ROI on away favourites in the French Ligue 1 and the 6.5% ROI on home underdogs in the German Bundesliga. The result in the French Ligue 1 is the reason that away favourites in general returned a slightly positive ROI. However, as the number of away favourites is so low, it is not enough to make betting on away teams in general profitable (even though the ROI for away underdogs, at -2.2%, is not extremely low).

To assess whether the medians of these returns differ significantly from zero, the Wilcoxon signed-rank test was applied to the return vectors behind these ROIs. First, the skewness confidence intervals were calculated again; these are shown in Table F.2 in Appendix F. In that table it can be seen that the home underdogs in all leagues apart from the English Premier League, and the away favourites in all leagues apart from the Spanish La Liga, have skewness confidence limits that come close to the threshold at which the distribution is no longer considered symmetric.

Table 5.5: ROI per league for all bets in certain situations (main line)

Situation         Bundesliga   Eredivisie   Premier League   Serie A   Ligue 1   La Liga
Home teams         0.0038      -0.0217      -0.0286          -0.0236   -0.0348   -0.0084
Away teams        -0.0463      -0.0219       0.0024          -0.0103   -0.0004   -0.0189
Over              -0.0273      -0.0519      -0.0306          -0.0170   -0.0094   -0.0337
Under             -0.0234      -0.0005      -0.0090          -0.0260   -0.0359   -0.0037
Favourites        -0.0490      -0.0209      -0.0029           0.0069    0.0256   -0.0309
Underdogs          0.0150      -0.0141      -0.0227          -0.0406   -0.0628    0.0036
Home favourites   -0.0336      -0.0151      -0.0109           0.0022   -0.0177   -0.0191
Home underdogs     0.0646       0.0045      -0.0389          -0.0590   -0.1956    0.0372
Away favourites   -0.0902      -0.0360       0.0159           0.0182    0.1664   -0.0627
Away underdogs    -0.0036      -0.0213      -0.0158          -0.0330   -0.0220   -0.0089

The p-values obtained from the signed-rank test are shown in Table 5.6. As can be seen in this table, most returns have a median significantly lower than zero. While the home underdogs in the Bundesliga have an ROI higher than zero, the median of their returns is not significantly higher than zero. The away favourites in the French Ligue 1, however, do have returns with a median significantly higher than zero. Also surprising is the significant value for the home underdogs in the Dutch Eredivisie, which have a positive ROI. However, both of these results could be caused by a violation of the symmetry assumption, as explained earlier.

Table 5.6: p-values for different type of bets per league (Wilcoxon signed rank test)

Situation         Bundesliga   Eredivisie   Premier League   Serie A   Ligue 1   La Liga
Home teams        0.3813       0.0017       0.1743           0.0084    0.0022    0.0457
Away teams        0.0000       0.0128       0.0438           0.0019    0.0070    0.0726
Over              0.0003       0.0000       0.0002           0.0052    0.1076    0.0029
Under             0.0003       0.0417       0.0001           0.0000    0.0000    0.0000
Favourites        0.0624       0.0449       0.8225           0.6874    0.7741    0.4703
Underdogs         0.0018       0.0011       0.0005           0.0000    0.0000    0.0080
Home favourites   0.2106       0.0636       0.9795           0.8059    0.2368    0.4210
Home underdogs    0.7385       0.0493       0.1677           0.0002    0.0000    0.1135
Away favourites   0.1264       0.4216       0.6505           0.7263    0.0052    0.9416
Away underdogs    0.0001       0.0090       0.0011           0.0001    0.0002    0.0362

Therefore, the sign test was used for these two league-specific situations. Unlike the Wilcoxon signed-rank test, the sign test does not assume a symmetric distribution; it does, however, have less statistical power, which is why it was only used here as a second option. The p-values obtained by the sign test are 0.3397 for the home underdogs in the Eredivisie and 0.1281 for the away favourites in the French Ligue 1. The significant signed-rank result for away favourites in the French Ligue 1, despite the highly positive ROI, was most likely caused by the small sample: because of the home advantage there are generally very few away favourites, and this could be especially true in the French Ligue 1. Given the sign test p-values, there is not enough support to claim that blindly betting on teams in a certain situation is profitable in any particular league.
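The sign test used above reduces to an exact binomial test on the signs of the returns. A minimal sketch (the helper name is illustrative; scipy has no built-in sign test):

```python
import numpy as np
from scipy.stats import binomtest

def sign_test(x):
    """Two-sided sign test for a median of zero: exact zeros are
    discarded and the positive/negative counts are compared with an
    exact binomial test. No symmetry assumption is required."""
    x = np.asarray(x, dtype=float)
    pos = int(np.sum(x > 0))
    neg = int(np.sum(x < 0))
    return binomtest(pos, n=pos + neg, p=0.5).pvalue
```

Because the test only uses the signs of the observations, it is robust to the asymmetric pay-out distributions encountered here, at the cost of statistical power.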

5.3 The effect of handicap size

In Section 5.2 it was shown that the median returns of betting on favourites are higher than the median returns of betting on underdogs. As a next step, it was investigated whether the size of the handicap also has a significant influence on the returns.

This was done by calculating the ROI per line (for lines from -3 up to +3) as if a bet had been placed on every team, and subsequently running an ordinary least squares regression on these line/ROI data points.
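The regression step can be sketched as follows. The demonstration data below is hypothetical (a known slope plus noise) and stands in for the realised line/ROI points; the function name is illustrative:

```python
import numpy as np
from scipy import stats

def ols_line_roi(lines, rois):
    """OLS fit of ROI = b0 + b1*line with standard errors, t statistics
    and two-sided p-values, as reported in Tables 5.7-5.10."""
    lines = np.asarray(lines, dtype=float)
    rois = np.asarray(rois, dtype=float)
    X = np.column_stack([np.ones_like(lines), lines])
    beta, *_ = np.linalg.lstsq(X, rois, rcond=None)
    n, k = X.shape
    resid = rois - X @ beta
    s2 = resid @ resid / (n - k)                        # residual variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), df=n - k)
    return beta, se, t, p

# Hypothetical line/ROI points: intercept -0.02, slope 0.01, plus noise.
rng = np.random.default_rng(1)
lines = np.arange(-3, 3.25, 0.25)
rois = -0.02 + 0.01 * lines + rng.normal(0, 0.002, lines.size)
beta, se, t, p = ols_line_roi(lines, rois)
```

In the thesis's notation, beta[0] corresponds to β0 (the intercept) and beta[1] to β1 (the effect of the line).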

The result of this regression is shown in Figure 5.1 and Table 5.7, in which β0 is the intercept (the ROI for a line of 0) and β1 is the effect of the line. As can be seen, the line parameter is insignificant. Therefore, there is no support for the claim that bigger favourites have higher ROIs than smaller favourites.

Figure 5.1: Regression on different line ROIs (sides)

Table 5.7: Regression results line/ROI (sides)

      Estimate   SE       t         p
β0    -0.0157    0.0147   -1.0660   0.2975
β1    -0.0098    0.0082   -1.2011   0.2419

While there is no significant effect for home and away teams combined, there could be an effect for either of them individually. Hence, regression analyses were also done on the ROIs per line for home and away teams separately. As home teams are usually favoured, lines from -3 to +1.5 were used for them, and lines from -1.5 to +3 for the away teams. The results of these regression analyses can be seen in Figure 5.2 and Table 5.8 for the home teams and in Figure 5.3 and Table 5.9 for the away teams.

Figure 5.2: Regression on different line ROIs (home teams)

Table 5.8: Regression results line/ROI (home teams)

      Estimate   SE       t        p
β0    0.0069     0.0242   0.2834   0.7803
β1    0.0089     0.0155   0.5765   0.5718

Figure 5.3: Regression on different line ROIs (away teams)

As can be seen in the tables, the coefficients for the line component β1 are insignificant for both home and away teams, indicating that there is no significant relationship between the margin by which a team is favoured and the ROI of betting on it.

Table 5.9: Regression results line/ROI (away teams)

      Estimate   SE       t         p
β0    -0.0398    0.0233   -1.7063   0.1062
β1     0.0097    0.0149    0.6490   0.5250

The same was done for the overs, to check whether there is a significant difference in the ROI of betting on low over lines compared to high over lines (and thus also for high under lines compared to low under lines). This was done for over lines from 2 to 3.5; the results are shown in Figure 5.4 and Table 5.10, where β1 again represents the influence of the line. As can be seen, it is again insignificant. Thus, there is no proof of a significant difference in return between betting on higher over lines and betting on lower over lines.

Figure 5.4: Regression on different line ROIs (overs)

Table 5.10: Regression results line/ROI (overs)

      Estimate   SE       t         p
β0    -0.0776    0.0683   -1.1351   0.3078
β1    -0.0190    0.0244   -1.7790   0.4714

5.4 High and low risk lines

Aside from betting on the main lines, it is also possible to bet on lines with higher odds (more volatile and thus riskier) or on lines with lower odds (less volatile and thus less risky). This was simulated under the same conditions as for the main lines. The results are shown in Table 5.11. As can be seen in this table, the ROIs for the highest odds are all lower than the ROIs for the lowest odds, which in turn are all lower than the ROIs for the main lines. This is caused by the bookmaker margin, which is lowest for the main line.

The ROI values of home teams, away teams, overs and unders (which together cover all bets) for the different odds types can be used in a comparative two-sample t-test. This way, it can be checked whether there are significant differences between betting at the different types of odds, as these ROI values can be regarded as normally distributed. The p-values for these t-tests are shown in Table 5.12.

Table 5.11: ROI results for all bets in certain situations for highest and lowest odds offered

Situation         ROI (highest odd)   ROI (lowest odd)   ROI (main line)
Home teams        -0.0458             -0.0269            -0.0194
Away teams        -0.0364             -0.0204            -0.0148
Over              -0.0454             -0.0320            -0.0274
Under             -0.0443             -0.0280            -0.0169
Favourites        -0.0256             -0.0183            -0.0102
Underdogs         -0.0492             -0.0264            -0.0217
Home favourites   -0.0311             -0.0230            -0.0148
Home underdogs    -0.0783             -0.0353            -0.0313
Away favourites   -0.0109             -0.0059             0.0019
Away underdogs    -0.0381             -0.0230            -0.0180

Table 5.12: T-test p values for ROI’s of main lines and lines with the highest and lowest odds

Situation                                p (t-test)
Main lines vs. Highest odd lines         0.0006
Main lines vs. Lowest odd lines          0.0973
Highest odd lines vs. Lowest odd lines   0.0026

In this table, it can be seen that the difference between the ROI for the main lines and the lines with the highest odds is significant at the 5% level. The same goes for the difference between the lines with the highest odds and the lines with the lowest odds. This means that a riskier bet does not only result in more volatile returns, but also in a lower expected return. This contradicts the idea of the risk premium in normal investment markets, where increasing the risk (volatility) should lead to a higher expected return on investment. In this case, increasing the risk leads to a lower expected return.

The difference between the return of the main lines and the lowest odd lines is not significant at a 5% level, but is significant at a 10% level.
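Using the four main-line situation ROIs (home, away, over, under) from Table 5.3 and the corresponding highest-odd ROIs from Table 5.11, the comparison can be sketched as:

```python
import numpy as np
from scipy.stats import ttest_ind

# Situation ROIs (home teams, away teams, over, under) taken from
# Tables 5.3 and 5.11.
roi_main = np.array([-0.0194, -0.0148, -0.0274, -0.0169])
roi_highest = np.array([-0.0458, -0.0364, -0.0454, -0.0443])

# Two-sample t-test (pooled variance) on the two sets of ROI values.
stat, p = ttest_ind(roi_main, roi_highest)
```

With these inputs, the resulting p-value is in line with the 0.0006 reported for main lines vs. highest odd lines in Table 5.12.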

Chapter 6

Betting simulations with model results

As a final step, the predictions obtained by the three different models were tested on actual betting odds offered by PinnacleSports for the Asian handicap and Asian totals markets. Closing odds were used, i.e. the odds available at the moment the match started. These markets were used because their limits are the highest in the world when it comes to sports betting. PinnacleSports in particular was used as it is the bookmaker offering the highest limits on these Asian betting markets. As high limits attract high-stakes players (who are generally good bettors), these are the most accurate betting markets in the world (and thus the hardest to beat). Therefore, using these odds and lines as a 'challenge' for the models gives a good indication of the potential profitability of their predictions.

6.1 Bet criteria

To decide which matches the models should bet on, a comparison between the expectation of the bookmaker and the expectation of a model has to be made. Betting on Asian handicaps for sides is essentially betting on the quality difference between two teams. Therefore, the model-implied spread handicap S between the two teams in a certain match can simply be calculated as follows for the home and away team respectively:

S_{m,home} = -h_m + a_m \qquad (6.1)

S_{m,away} = h_m - a_m \qquad (6.2)

where S is the spread, the subscript m denotes the model, and h and a are the expected home and away goals respectively.

The model-implied total number of goals can simply be calculated as the sum of the expected home and away goals:

T_m = h_m + a_m \qquad (6.3)

With the expected values for the spreads and totals predicted by the model known, a benchmark from the bookmaker is needed to determine whether to play a bet or not.

A virtual line matching odds of 2.00 was calculated for every bet situation. This was needed because odds of 2 correspond to an implied probability of 50%. If the model line mismatches these implied bookmaker lines on the right side, it indicates an opportunity to make a profitable bet, provided the model's predictions are better than the bookmaker's. As bookmakers only offer lines in quarter-point steps, a relationship had to be found between the odds offered by the bookmaker and the expected line with odds of 2 that can serve as a comparison for the model. This was done with data obtained from TxOdds, a company that collects data from almost all bookmakers in the world and has multiple lines with different odds per match for PinnacleSports. A suitable type of relationship between line and odds had to be found: as lines can be negative in some cases (for favoured teams), power and logarithmic fits were deemed unsuitable in advance.

Linear relationships were fitted first; an example (Arsenal - Sunderland in the English Premier League on August 8, 2012) is shown in Figure 6.1. As can be seen, the fits are fairly good, but there is one problem: the implied under line would be higher than the implied over line. This would imply an arbitrage opportunity, which is impossible as both odds are offered by the same bookmaker.

Figure 6.1: Linear relation between line and odd

Figure 6.2 shows a second-order polynomial fit. While the fit of the points is good, there are problems when the line is extrapolated: for the under lines, the odds increase when the line increases, which should not happen. Figure 6.3 shows an exponential fit, which exhibits neither of the problems encountered with the polynomial and linear fits. Therefore, the relationship between line and odds was fitted with an exponential function, and the lines matching odds of 2 were calculated from this relationship.
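Using the home-side lines and odds from Table 5.1, the exponential fit and the implied even-money line can be sketched as follows. The parametrisation a·e^{bL} is an assumption for illustration; the thesis does not state its exact functional form:

```python
import numpy as np
from scipy.optimize import curve_fit

def expo(line, a, b):
    """Assumed exponential relation between handicap line and odds."""
    return a * np.exp(b * line)

# Home-side lines and odds from Table 5.1 (Ajax - FC Utrecht).
lines = np.array([-2.0, -1.75, -1.5, -1.25, -1.0])
odds = np.array([2.93, 2.38, 2.08, 1.83, 1.56])

(a, b), _ = curve_fit(expo, lines, odds, p0=(2.0, -0.5))
# Implied line at which the fitted odds equal 2.00: solve a*exp(b*L) = 2.
line_at_even = np.log(2.0 / a) / b
```

For these data the implied even-money line lands between -1.5 and -1.25, consistent with -1.5 @ 2.08 being the main line in Table 5.1.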

Figure 6.2: Polynomial fit between line and odd

Figure 6.3: Exponential fit between line and odd

With both the implied lines of the model and of the bookmaker known, the decision criterion for betting could be developed. The following criterion was used for Asian handicap side betting:

S_m + t < S_{bookmaker} \qquad (6.4)

in which S_{bookmaker} is the implied line from the bookmaker found by exponential fitting, and t is a threshold parameter to be defined by the user. This parameter essentially determines how selective the model is: if it is set to zero, the model plays any bet for which it thinks the bookmaker has set the line incorrectly; as t is increased, the model becomes more selective and only looks for opportunities that hold bigger value according to its predictions.

For betting on overs, the following criterion had to hold for the model to play the bet, with O_{bookmaker} being the bookmaker's implied line for the over:

T_m - t > O_{bookmaker} \qquad (6.5)

And finally, for unders, a bet was played if the following condition was satisfied, with U_{bookmaker} being the bookmaker's implied line for the under:

T_m + t < U_{bookmaker} \qquad (6.6)
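The decision rules of equations (6.1)-(6.6) can be sketched as two small functions (names and interfaces illustrative):

```python
def pick_side_bet(h_pred, a_pred, s_book_home, s_book_away, t):
    """Apply eq. (6.4) to both sides using the model-implied spreads
    of eqs. (6.1) and (6.2). Returns the side to back, or None."""
    s_home = -h_pred + a_pred          # eq. (6.1)
    s_away = h_pred - a_pred           # eq. (6.2)
    if s_home + t < s_book_home:
        return "home"
    if s_away + t < s_book_away:
        return "away"
    return None

def pick_total_bet(h_pred, a_pred, o_book, u_book, t):
    """Apply eqs. (6.5) and (6.6) to the model-implied total (6.3)."""
    total = h_pred + a_pred            # eq. (6.3)
    if total - t > o_book:             # eq. (6.5): back the over
        return "over"
    if total + t < u_book:             # eq. (6.6): back the under
        return "under"
    return None
```

For example, a model predicting 2.0-1.0 against a bookmaker-implied home line of -0.75 backs the home side for any threshold below 0.25.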

6.2 Results

The results of the bet simulations, with thresholds increasing in steps of 0.1 from 0 to 1, are shown in Figure 6.4. The sides and totals are shown separately because the thresholds have a different scale for the sides than for the totals, as the number of goals in a game is higher than the goal difference between the two teams (unless it ends 0-0). If the models were objectively better than the predictions made by the bookmaker and the betting public, the ROIs would be expected to improve as the threshold increases (because the higher the threshold, the more selective the model becomes). Unfortunately, this is not the case for any of the models developed in the previous chapters. On the contrary, model performance generally only decreases as the threshold increases.

Figure 6.4: ROI results for betting with varying thresholds

The numbers of bets played on home teams, away teams, overs and unders for the different thresholds are shown in Tables G.1 and G.2 in Appendix G. These tables show that the models had a small bias towards betting on home teams and a large bias towards playing overs rather than unders. This could be caused by too many outliers in terms of scores in the training data. Therefore, the maximum number of goals in the training data was capped at 4; in other words, all values for goals scored in the training data higher than 4 were reduced to 4. Predictions were then made again by the three models and used again for bet simulation. The results of these simulations can be seen in Figure 6.5.

Figure 6.5: ROI results for betting predictions trained on a maximum of 4 goals

As can be seen in Figure 6.5, capping the training data at a maximum of 4 goals leads to no real improvement in the predictions. Therefore, these results were disregarded and the rest of the analysis in this chapter focuses on the predictions made with the uncapped training data.

6.3 Significance of results

With the results known, their significance was assessed in two steps. First, it was checked whether the returns are significantly different from zero; if so, there is proof that the model is actually a losing model and that the losses are not down to (bad) luck. In the second stage, the performance of the models' selections is compared against the bookmaker's profit margin.

6.3.1 Testing difference from zero

To test for a significant difference from zero, the Wilcoxon signed-rank test was again used to test whether the medians of the returns differ significantly from zero. To justify this test, skewness was again assessed with the formulae presented in the previous chapter. The skewness values and their 95% confidence intervals are shown in Tables H.1 and H.2 in Appendix H for the bets on the sides and totals respectively. As can be seen in these tables, the returns of the models up to a threshold of 0.4 can be regarded as symmetric for both the sides and the totals, as their confidence intervals fall between -0.5 and 0.5. Therefore, the returns for thresholds up to 0.4 were tested for significance with Wilcoxon's signed-rank test, and the returns for thresholds of 0.5 and higher were tested with the standard sign test. The ROIs and p-values are shown in Tables 6.1 and 6.2.

Table 6.1: Sides betting ROIs and significance of return (compared to zero)

Threshold   Neural network (ROI, p)   SEM (ROI, p)        Fuzzy model (ROI, p)
0.0         -0.0036, 0.0489           -0.0065, 0.0256     -0.0118, 0.0076
0.1         -0.0039, 0.0585           -0.0158, 0.0227     -0.0116, 0.0290
0.2         -0.0220, 0.1389           -0.0230, 0.0248     -0.0152, 0.0521
0.3         -0.0460, 0.1255           -0.0526, 0.0110     -0.0167, 0.0245
0.4         -0.0531, 0.2403           -0.0488, 0.0348     -0.0505, 0.0075
0.5         -0.0827, 0.2386           -0.0831, 0.3299     -0.1055, 0.0993
0.6         -0.0832, 0.4657           -0.1250, 0.1839     -0.1424, 0.0906
0.7         -0.0911, 0.6718           -0.1313, 0.2820     -0.1291, 0.2615
0.8         -0.2205, 0.4421            0.0128, 1.0000     -0.0715, 0.7838
0.9         -0.0584, 1.0000           -0.1589, 0.5572     -0.0719, 0.8714
1.0          0.1623, 0.7539           -0.2010, 0.6291      0.0402, 0.8450

Table 6.2: Totals betting ROIs and significance of return (compared to zero)

Threshold   Neural network (ROI, p)   SEM (ROI, p)        Fuzzy model (ROI, p)
0.0         -0.0200, 0.0070           -0.0191, 0.0047     -0.0162, 0.0079
0.1         -0.0076, 0.1515           -0.0295, 0.0040     -0.0104, 0.0597
0.2         -0.0093, 0.2630            0.0075, 0.2199     -0.0324, 0.0264
0.3          0.0092, 0.7636            0.0014, 0.4756     -0.0394, 0.1278
0.4         -0.0507, 0.1136           -0.0240, 0.2286     -0.0175, 0.4882
0.5         -0.1027, 0.1150            0.0082, 0.6831     -0.0271, 0.2405
0.6         -0.0950, 0.2945           -0.0005, 0.8286     -0.0769, 0.2207
0.7         -0.2283, 0.1114           -0.0428, 0.7067     -0.1228, 0.2031
0.8         -0.4055, 0.0636           -0.1646, 0.1686     -0.1175, 0.4104
0.9         -0.3150, 0.7266           -0.2491, 0.1433     -0.1153, 0.4869
1.0         -0.6038, 0.6250           -0.0651, 1.0000     -0.1366, 1.0000

As can be seen in these tables, there is no combination of a threshold and a model for which a significantly positive result is obtained, indicating that none of the models is able to deliver a significant profit under any circumstances. For many of the lower thresholds (especially for the SEM and Fuzzy models), returns with significantly negative medians are obtained, indicating that these models deliver significant losses when played. For all higher thresholds (t > 0.4), no significant differences of the median returns from zero are found, which is most likely caused by the small number of bets and the low power of the sign test.

6.3.2 Comparison against random betting

To check whether the models perform better than random betting, signed-rank tests were done comparing the return vectors against a median equal to the negative of the bookmaker's profit margin: -0.0171 for the sides and -0.02215 for the totals. The p-values for these signed-rank tests are shown in Tables 6.3 and 6.4 for betting on sides and totals respectively.

In Table 6.3, it can be seen that none of the models for the sides has a return that differs significantly from the bookmaker's margin, implying that there is no proof that any of these models is better than randomly picking bets. For the totals, as can be seen in Table 6.4, some significant p-values are obtained. For a threshold of 0.1, the neural network and the fuzzy model have a significantly higher ROI than the bookmaker's margin. For a threshold of 0.2, the neural network and SEM predictions have a significantly higher median return than the bookmaker's margin at a 5% significance level, and the same holds for the neural network and SEM predictions at a threshold of 0.3. This implies that, for certain settings, using the models to bet on game totals is better than randomly selecting bets.

Table 6.3: Sides betting ROI’s and significance of return (compared to bookmaker’s margin)

Threshold   Neural network (ROI, p)   SEM (ROI, p)        Fuzzy model (ROI, p)
0.0         -0.0036, 0.6564           -0.0065, 0.6524     -0.0118, 0.9461
0.1         -0.0039, 0.7326           -0.0158, 0.9075     -0.0116, 0.8596
0.2         -0.0220, 0.7527           -0.0230, 0.8556     -0.0152, 0.8464
0.3         -0.0460, 0.7202           -0.0526, 0.4350     -0.0167, 0.8426
0.4         -0.0531, 0.7861           -0.0488, 0.5215     -0.0505, 0.3492
0.5         -0.0827, 0.4291           -0.0831, 0.4324     -0.1055, 0.1472
0.6         -0.0832, 0.4974           -0.1250, 0.3713     -0.1424, 0.2487
0.7         -0.0911, 0.2575           -0.1313, 0.2289     -0.1291, 0.2806
0.8         -0.2205, 0.1460            0.0128, 0.8923     -0.0715, 0.8001
0.9         -0.0584, 0.4013           -0.1589, 0.1437     -0.0719, 0.7560
1.0          0.1623, 0.9043           -0.2010, 0.1671      0.0402, 0.9693

Table 6.4: Totals betting ROI’s and significance of return (compared to bookmaker’s margin)

Threshold   Neural network (ROI, p)   SEM (ROI, p)        Fuzzy model (ROI, p)
0.0         -0.0200, 0.0734           -0.0191, 0.1249     -0.0162, 0.0813
0.1         -0.0076, 0.0319           -0.0295, 0.2102     -0.0104, 0.0415
0.2         -0.0093, 0.0460            0.0075, 0.0134     -0.0324, 0.1865
0.3          0.0092, 0.0395            0.0014, 0.0247     -0.0394, 0.1396
0.4         -0.0507, 0.8288           -0.0240, 0.2280     -0.0175, 0.1095
0.5         -0.1027, 0.5736            0.0082, 0.1197     -0.0271, 0.2623
0.6         -0.0950, 0.5967           -0.0005, 0.2612     -0.0769, 0.5546
0.7         -0.2283, 0.3284           -0.0428, 0.9680     -0.1228, 0.4507
0.8         -0.4055, 0.1012           -0.1646, 0.3198     -0.1175, 0.4840
0.9         -0.3150, 0.3750           -0.2491, 0.0629     -0.1153, 0.5267
1.0         -0.6038, 0.2500           -0.0651, 1.0000     -0.1366, 0.6748

Chapter 7

Conclusion

With the main research done, conclusions can be drawn. This is done per research question and finally for the main research question. Then, the limitations of this research are assessed and ideas for future research on this subject are given.

7.1 Conclusions of research questions

The first research question aimed to find which game statistics and other variables from previous matches are relevant predictors of the number of goals a team will score in the upcoming game. This question was answered in two stages. At the end of Chapter 3 (in Table 3.6), a list is given of all variables whose last-5-match averages have a significant relationship with the number of goals a team scores or concedes in the next match. However, in Chapter 4, when developing the models, it is shown that most of these variables add little value compared to the main variables. Based on the variables included in the final SEM and Fuzzy models, the following can be seen as the most relevant predictors of the goals a team will score in the following game:

• Elo rating of upcoming opponent

• Shots on target made

• Mean angle of shots made

• Mean distance from opponent’s goal of shots made

• Number of successful forward passes

• Mean pass distance from the opponent’s goal of passes made

• Mean length of passes made

• Successful take-ons made
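For concreteness, the predictor set above can be collected into one feature vector per team-match. A minimal sketch with hypothetical column names (the thesis does not prescribe these identifiers; the averages are taken over the preceding matches):

```python
# Hypothetical column names for the predictors listed above; the averages
# are computed over a team's preceding matches.
GOAL_PREDICTORS = [
    'opponent_elo',                    # Elo rating of upcoming opponent
    'shots_on_target',                 # shots on target made
    'mean_shot_angle',                 # mean angle of shots made
    'mean_shot_distance',              # mean distance from goal of shots made
    'successful_forward_passes',       # number of successful forward passes
    'mean_pass_distance_from_goal',    # mean pass distance from opponent's goal
    'mean_pass_length',                # mean length of passes made
    'successful_take_ons',             # successful take-ons made
]

def feature_vector(team_stats):
    """Extract the predictor vector from a dict of match-statistic averages."""
    return [team_stats[name] for name in GOAL_PREDICTORS]
```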

The second research question looked for the most accurate type of model for predicting goals scored by association football teams based on statistics from previous games. All three models researched were able to outperform the heuristic measures. While the neural network predicted the goal scores for the matches in the test data with the highest accuracy, no significant differences in the absolute errors were found compared to the other models. Thus there is not enough proof that the neural network is the better model. Even if the neural network were significantly better, researching the problem with the two other models has still been useful, as the neural network provides no insight into the mechanism behind its predictions, unlike the SEM and fuzzy system, in which the relationships between the different variables are directly quantified.
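The "no significant difference" claim rests on a paired comparison of per-match absolute errors between two models. A self-contained sketch of such a test statistic (hypothetical inputs, statistic only; the exact test used in the thesis may differ):

```python
import math

def paired_t_statistic(abs_err_a, abs_err_b):
    """Paired t-statistic for the per-match difference in absolute errors
    of two models. With large samples, |t| below roughly 1.96 means the
    two models' errors are not significantly different at the 5% level.
    Inputs are lists of |predicted - actual| goals per match.
    """
    diffs = [a - b for a, b in zip(abs_err_a, abs_err_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```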

Research question 3 investigated biases in the Asian betting market for European football. It was found that blindly betting on favourites resulted in a significantly higher ROI than betting on underdogs. The same held when comparing home favourites with away underdogs, and away favourites with home underdogs. However, these biases are not big enough to exploit for a profit: no returns significantly higher than 0 were found for blindly betting on any team or bet that falls under a given situation. When assessing different lines, it was found that betting on the lines with the lowest odds (which therefore have a high chance of occurring and thus low volatility) returned a significantly higher ROI than betting on the highest odds available for a certain team (which have high volatility). This contradicts the 'risk premium' phenomenon in conventional investment markets, where higher-volatility investments should yield a higher expected return.
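The ROI comparisons above reduce to flat-stake accounting over settled Asian handicap bets. A minimal illustrative helper (not the thesis code), including the half-win/half-loss settlements that quarter lines produce:

```python
def flat_stake_roi(bets):
    """ROI of flat 1-unit stakes on settled Asian handicap bets.

    bets: list of (decimal_odds, result) tuples, with result one of
    'win', 'push', 'loss', 'half-win', 'half-loss'. Pushes return the
    stake; half results settle half the stake (quarter lines).
    """
    profit = 0.0
    for odds, result in bets:
        if result == 'win':
            profit += odds - 1.0
        elif result == 'half-win':
            profit += (odds - 1.0) / 2.0
        elif result == 'loss':
            profit -= 1.0
        elif result == 'half-loss':
            profit -= 0.5
        # 'push' contributes nothing
    return profit / len(bets)
```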

Finally, research question 4 checked whether the predictions of the developed models were accurate enough to make a profit when betting on the closing lines of the Asian betting market. This was not the case: none of the models could secure a profit significantly higher than zero under any of the used selectivity levels. However, the models betting on the total number of goals scored in a match were able to generate returns significantly higher than the bookmaker's margin for specific selectivity levels. This implies that, while not able to make a significant profit, the developed models are better than random betting when it comes to the totals market.
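The selectivity levels mentioned above amount to a threshold rule: bet only when the model's predicted total differs from the bookmaker's line by more than the threshold. A sketch of that selection step (hypothetical names; the thesis' staking rules are not reproduced here):

```python
def select_total_bets(predictions, lines, threshold):
    """Pick over/under bets where the model's predicted total goals
    differs from the bookmaker's total line by more than `threshold`.
    Returns (match_index, 'over'/'under') pairs; a higher threshold
    means a more selective betting strategy.
    """
    picks = []
    for i, (pred, line) in enumerate(zip(predictions, lines)):
        edge = pred - line
        if edge > threshold:
            picks.append((i, 'over'))
        elif edge < -threshold:
            picks.append((i, 'under'))
    return picks
```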

7.2 Limitations and ideas for future research

This research has a few limitations. The first is that no psychological or motivational factors are included in the prediction models. A top team may play a match on Sunday against a weaker side while having a very important Champions League game on the following Tuesday or Wednesday. In such cases, the manager often rests a few key players and cares less about winning by a big margin. This was not included in the models developed in this thesis and could be an interesting addition for future research. It can be done by adding two variables: the number of days until the next match and the ELO-rating of the upcoming opponent (indicating the importance of that match).
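The two proposed variables are cheap to derive from fixture data. A minimal sketch (hypothetical feature names, not part of the thesis models):

```python
from datetime import date

def rotation_risk_features(current_match, next_match, next_opponent_elo):
    """Two hypothetical squad-rotation features: days until the team's
    next fixture, and the ELO-rating of that next opponent (a proxy
    for how important the upcoming game is to the team).
    """
    return {
        'days_until_next_match': (next_match - current_match).days,
        'next_opponent_elo': next_opponent_elo,
    }
```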

The second limitation concerns injuries and suspensions of individual players. If a team has to miss one of its key players, this will influence its performance. Consider, for example, a team missing its top goalscorer, who scores 50% of its goals, with a younger, inexperienced player stepping in. This heavily influences the team's performance and the number of goals it is expected to score in the next match. While this information was not used in the models developed here, the big syndicates betting on the Asian markets do know it and will use it to place their bets, which could be a reason the models' predictions were not profitable.

Environmental factors were also not taken into account. When the temperature is very high, playing costs much more energy and the number of goals will probably be lower. The same goes for playing in heavy rain, which impedes the movement of the ball on the pitch and will also influence the number of goals. While adding weather data per match would be hard, it would be possible to add dummy variables for the different seasons. These could then be combined with dummy variables for the leagues, as Spanish summers are generally hotter than English summers and autumn in England will most likely be wetter than autumn in Italy.
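The proposed league-by-season dummies are plain one-hot interaction terms. A sketch with illustrative league and season names (none of these identifiers come from the thesis):

```python
def climate_dummies(league, season,
                    leagues=('England', 'Spain', 'Italy'),
                    seasons=('spring', 'summer', 'autumn', 'winter')):
    """One-hot league x season interaction dummies as a cheap stand-in
    for per-match weather data: e.g. 'Spain_summer' flags the hot
    Spanish summer, 'England_autumn' the wet English autumn.
    """
    return {f'{l}_{s}': int(l == league and s == season)
            for l in leagues for s in seasons}
```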

Finally, closing odds taken from the Asian betting market might not be a good indicator of the profitability of the models. These closing odds are the sharpest odds in the market and have been adjusted by the betting public for all available information (weather, last-minute missing players, coaches revealing tactical plans in interviews). It is possible that the models are profitable when betting earlier before the match starts. This could be simulated by using, for example, opening odds from Pinnacle Sports.

References

Anders, U., & Korn, O. (1999). Model selection in neural networks. Neural Networks, 12 , 309-323.

Aslan, B., & Inceoglu, M. M. (2007). A comparative study on neural network based soccer result prediction. In Proceedings of the Seventh International Conference on Intelligent Systems Design and Applications (p. 545-550).

Avery, C., & Chevalier, J. (1999). Identifying investor sentiment from price paths: the case of football betting. Journal of Business, 72 , 493-521.

Barrow, D., & Crone, S. (2013). Crogging (cross-validation aggregation) for forecasting – a novel algorithm of neural network ensembles on time series subsamples. In The 2013 international joint conference on neural networks (IJCNN) (p. 1-8).

Bezdek, J. (1981). Pattern recognition with fuzzy objective function algorithms. New York: Plenum Press.

Bulmer, M. (1979). Principles of statistics. Dover publications.

Cheng, T., Chu, D., Fan, Z., Zhou, J., & Lu, S. (2003). A new model to forecast the results of matches based on hybrid neural networks in the soccer rating system. In Computational Intelligence and Multimedia Applications, ICCIMA 2003 Proceedings (p. 308-313).

Chiu, S. (1994). Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems, 2, 267-279.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Mahwah, NJ: Lawrence Erlbaum.

Colley, W. (2002). Colley’s bias free college football ranking method: The Colley matrix ex- plained. (http://www.colleyrankings.com/matrate.pdf)

Constantinou, A., Fenton, N., & Neil, M. (2012). A Bayesian network model for forecasting association football match outcomes. Knowledge-Based Systems, 36, 322-339.

Constantinou, A., Fenton, N., & Neil, M. (2013). Profiting from an inefficient association football market: Prediction, risk and uncertainty using Bayesian networks. Knowledge-Based Systems, 50, 60-86.

Cramer, D. (1997). Basic statistics for social research. Routledge.

Crowther, P., & Cox, R. (2005). A method for optimal division of data sets for use in neural networks. In Knowledge-Based Intelligent Information and Engineering Systems: 9th International Conference, KES 2005, Melbourne, Australia, September 14-16, 2005, Proceedings, Part IV (p. 1-7). Springer-Verlag.

Dixon, M., & Coles, S. (1997). Modelling association football scores and inefficiencies in the football betting market. Applied Statistics, 46 (2), 265-280.

Dixon, M., & Pope, P. (2004). The value of statistical forecasts in the UK association football betting market. International Journal of Forecasting, 20 , 697-711.

Dixon, M., & Robinson, M. (1998). A birth process for association football matches. The Statistician, 47 (3), 523-538.

Dunn, J. (1973). A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. Journal of Cybernetics, 3 (3), 32-57.

Dupont, G., Nedelec, N., McCall, A., McCormack, D., Berthoin, S., & Wisloff, U. (2010). Effect of 2 soccer matches in a week on physical performance and injury rate. The American Journal of Sports Medicine, 38 (9).

Elo, A. E. (1978). The rating of chessplayers, past and present. Arco.

Forrest, D., & Simmons, R. (2008). Sentiment in the betting market on Spanish football. Applied Economics, 40 , 119-126.

Hair., J., Hult, M., Tomas, G., Ringle, C., & Sarstedt, M. (2014). A primer on partial least squares structural equation modeling (PLS-SEM). SAGE.

Heafke, C., & Helmenstein, C. (1996). The applicability of information criteria for neural network architecture selection. Computational Intelligence for Financial Engineering, 1996., Proceedings of the IEEE/IAFE 1996 Conference.

Hecht-Nielsen, R. (1987). Kolmogorov’s mapping neural network existence theorem. In IEEE first annual international conference on neural networks (Vol. 3, p. 11-13).

Hvattum, L., & Arntzen, H. (2010). Using ELO ratings for match result prediction in association football. International Journal of Forecasting, 26 , 460-470.

Jackson, D. (1990). The parameter gaming in sports. In Proceedings of the International gaming conference. (p. 308-313).

Jang, J., Sun, C., & Mizutani, E. (1997). Neuro-fuzzy and soft computing. Prentice Hill.

Jones, A. (2000). International soccer ranks and ratings. Statistics in sports, 2 (1).

Kantardzic, M. (2011). Data mining: Concepts, models, methods, and algorithms (2nd ed.). John Wiley & Sons Inc.

Kelly, J. L., Jr. (1956). A new interpretation of information rate. The Bell System Technical Journal, 35, 917-926.

Kisi, O., & Zounemat-Kermani, M. (2014). Comparison of two different adaptive neuro-fuzzy inference systems in modelling daily reference evapotranspiration. Water Resources Management, 28.

Kurkova, V. (1992). Kolmogorov’s theorem and multilayer neural networks. Neural Networks, 5 .

Lago-Peñas, C., Rey, E., Lago-Ballesteros, J., Casáis, L., & Domínguez, E. (2011). Influence of a congested calendar on physical performance in elite soccer. Journal of Strength and Conditioning Research, 25(8), 2111-2117.

Lawrence, L. (1950). Bookmaking. The Annals of the American Academy of Political and Social Science, 269 , 46-54.

Levitt, S. (2004). Why are gambling markets organised so differently from financial markets? The Economic Journal, 114 , 223-246.

Liu, H., Gomez, M., Lago-Peñas, C., & Sampaio, J. (2014). Match statistics related to winning in the group stage of the 2014 Brazil FIFA World Cup. Journal of Sports Sciences, 33(12), 1205-1213.

Maher, M. (1982). Modelling association football scores. Statistica Neerlandica, 36 (3), 109-118.

Mamdani, E., & Assilian, S. (1975). An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies, 7, 1-13.

Massey, K. (1997). Statistical models applied to the rating of sports teams. Honors project.

Mirrashid, M. (2014). Earthquake magnitude prediction by adaptive neuro-fuzzy inference system (ANFIS) based on fuzzy c-means algorithm. Natural Hazards, 74.

Moroney, M. (1951). Facts from figures. Pelican.

Morris, D. Z. (2016). Google’s go computer beats top-ranked human. Webpage. (Retrieved from http://fortune.com/2016/03/12/googles-go-computer-vs-human)

Moura, F., Martins, L., & Cunha, S. (2014). Analysis of football game-related statistics using multivariate techniques. Journal of Sports Sciences, 32(20), 1881-1887.

Paul, R., & Weinbach, A. (2007). Does sportsbook.com set pointspreads to maximize profits? tests of the Levitt model of sportsbook behaviour. The Journal of Prediction Markets, 1 (3), 209-218.

Paul, R., & Weinbach, A. (2008). Price setting in the NBA gambling market: Test of the Levitt model of sportsbook behaviour. International Journal of Sport Finance, 3 , 137-145.

Rotshtein, A., Posner, M., & Rakityanskaya, A. B. (2005). Football predictions based on a fuzzy model with genetic and neural tuning. Cybernetics and Systems Analysis, 41 (4), 619-630.

Sanikhani, H., Kisi, O., Kiafar, H., Zaman, S., & Ghavidel, Z. (2015). Comparison of different data-driven approaches for modeling lake level fluctuations: The case of Manyas and Tuz lakes (Turkey). Water Resources Management, 29.

Scoppa, V. (2015). Fatigue and team performance in soccer: evidence from the FIFA world cup and the UEFA European championship. Journal of Sports Economics, 16 (5), 482-507.

Sharma, B., & Venugopalan, K. (2014). Comparison of neural network training functions for hematoma classification in brain CT images. IOSR Journal of Computer Engineering, 16(1).

Singh, Y., & RoyChowdhury, P. (2001). Dynamic tunneling based regularization in feedforward neural networks. Artificial Intelligence, 131 , 55-71.

Swanson, N., & White, H. (1995). A model-selection approach to assessing the information in the term structure using linear models and artificial neural networks. Journal of Business & Economic Statistics, 13 (3), 265-275.

Takagi, T., & Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, 15.

Tsakonas, A., Dounias, G., Shtovba, S., & Vivdyuk, V. (2002). Soft computing-based result prediction of football games. In The 1st international conference on inductive modelling (p. 20-25).

Turban, E., Sharda, R., & Delen, D. (2010). Decision support and business intelligence systems. In (chap. 6). Prentice Hall.

Vaughan Williams, L. (1999). Information efficiency in betting markets: A survey. Bulletin of Economic Research, 51 , 1-30.

Waggoner, B., Wines, D., Soebbing, B. P., Seifried, C. S., & Martinez, J. (2014). "Hot hand" in the National Basketball Association point spread betting market: A 34-year analysis. International Journal of Financial Studies, 8, 359-370.

Woodland, L., & Woodland, B. (2001). Market efficiency and profitable wagering in the National Hockey League: Can bettors score on longshots. Southern Economic Journal, 67 , 983-995.

Woodland, L., & Woodland, B. (2003). The reverse favourite-longshot bias and market efficiency in Major League Baseball: An update. Bulletin of Economic Research, 55 , 113-123.

Yager, R., & Filev, D. (1994). Generation of fuzzy rules by mountain clustering. Journal of Intelligent and Fuzzy Systems, 2, 209-219.

Yen, J., & Wang, L. (1998). Application of statistical information criteria for optimal fuzzy model construction. IEEE Transactions on fuzzy systems, 6 (3).

Yue, Z., Broich, H., Mester, J., & Seifriz, F. (2014). Statistical analysis for the first Bundesliga in the current soccer season. Progress in Applied Mathematics, 7 (2), 1-8.

Zuber, R., Gander, J., & Bowers, B. (1985). Testing the efficiency of the gambling market for National Football League games. Journal of Political Economy, 93(4), 800-806.

Appendices

Appendix A

Collected match statistics

The following variables were collected from Squawka’s match reports:

• Team names
• Match date
• Ball possession
• Number of offsides
• Total shots
• Total shots on target
• Total shots blocked
• Average shot location (measured by average absolute angle and average distance from the centre of the goal)
• Tackles attempted
• Successful tackles
• Tackling success rate
• Crosses attempted
• Successful crosses
• Average cross starting position (again measured by angle and distance from the centre of the goal)
• Average cross distance
• Passes attempted
• Successful passes
• Passing success rate
• Forward passes attempted
• Successful forward passes
• Average starting position of passes (measured by angle and distance from the centre of the goal)
• Average pass distance
• Goalkeeper save percentage
• Percentage of headed duels won
• Number of interceptions
• Percentage of passes intercepted
• Attempted take-ons
• Successful take-ons
• Take-on success rate
• Corners
• Yellow cards
• Red cards
• Fouls

Appendix B

Significance of variables

This appendix contains the resulting p-values for the significance tests that were done to evaluate the relationship between the different match variable averages and the goals scored and conceded in the next match.
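Each p-value comes from a simple linear regression of next-match goals on a lagged match-statistic average, judged by the slope's t-statistic. A self-contained sketch of that statistic (hypothetical data; only the statistic, without the t-distribution lookup the p-values require, and the thesis' exact procedure may differ):

```python
import math

def slope_t_statistic(x, y):
    """t-statistic for the slope of a simple linear regression of
    next-match goals (y) on a lagged match-statistic average (x).
    A large |t| corresponds to a small p-value for the relationship.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    # residuals around the fitted line, then the slope's standard error
    resid = [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    return slope / se
```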

In the tables, abbreviations are used for the different averages. These abbreviations are explained in Table B.1.

Table B.1: Abbreviations of averages

Abbreviation   Meaning
L2             Average of last 2 matches
L3             Average of last 3 matches
L4             Average of last 4 matches
L5             Average of last 5 matches
WA             Weighted average (35% - 35% - 10% - 10% - 10% for the last 5 matches)
LHLA           Average of last home and last away game
L2HL2A         Average of last 2 home games and last 2 away games
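The WA weighting in Table B.1 can be computed directly; a short sketch, assuming the match values are ordered most recent first:

```python
def weighted_average(last_five, weights=(0.35, 0.35, 0.10, 0.10, 0.10)):
    """The 'WA' average from Table B.1: the two most recent matches
    carry 35% weight each, the three before that 10% each. `last_five`
    is assumed ordered most recent first.
    """
    return sum(w * x for w, x in zip(weights, last_five))
```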

Table B.2: Significance of relationship of match variable averages with goals scored

P-values for linear regression relationship on goals scored
Variable                       L2     L3     L4     L5     WA     LHLA   L2HL2A
Yellow cards received          0.000  0.000  0.000  0.000  0.000  0.000  0.000
Red cards received             0.036  0.006  0.000  0.000  0.001  0.012  0.000
Possession                     0.000  0.000  0.000  0.000  0.000  0.000  0.000
Fouls committed                0.000  0.000  0.000  0.000  0.000  0.000  0.000
Offsides committed             0.288  0.174  0.166  0.227  0.218  0.455  0.206
Own shots                      0.000  0.000  0.000  0.000  0.000  0.000  0.000
Own shots on target            0.000  0.000  0.000  0.000  0.000  0.000  0.000
Own shots, blocked             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean shot distance             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean angle of shots            0.009  0.008  0.002  0.000  0.001  0.025  0.000
% blocked own shots            0.181  0.341  0.210  0.067  0.085  0.176  0.237
Save % (own keeper)            0.045  0.004  0.003  0.000  0.005  0.040  0.001
Tackles attempted              0.102  0.047  0.024  0.012  0.025  0.061  0.045
Successful tackles             0.134  0.086  0.092  0.116  0.137  0.117  0.054
Tackling success %             0.002  0.001  0.000  0.000  0.000  0.000  0.001
Crosses attempted              0.001  0.038  0.003  0.009  0.009  0.001  0.002
Successful crosses             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Crossing success %             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean cross length              0.006  0.003  0.000  0.000  0.001  0.005  0.000
Mean cross distance            0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean cross angle               0.000  0.000  0.000  0.000  0.000  0.000  0.000
% headed duels won             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Passes tried                   0.000  0.000  0.000  0.000  0.000  0.000  0.000
Successful passes              0.000  0.000  0.000  0.000  0.000  0.000  0.000
% successful passes            0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean pass length               0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean pass distance             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean pass angle                0.373  0.212  0.410  0.112  0.118  0.540  0.332
Forward passes tried           0.000  0.000  0.000  0.000  0.000  0.000  0.000
Successful forward passes      0.000  0.000  0.000  0.000  0.000  0.000  0.000
% successful forward passes    0.000  0.000  0.000  0.000  0.000  0.000  0.000
Successful interceptions       0.000  0.000  0.000  0.000  0.000  0.000  0.000
% passes intercepted           0.040  0.041  0.019  0.030  0.039  0.055  0.022
Corners taken                  0.000  0.000  0.000  0.000  0.000  0.000  0.000
Take-ons attempted             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Take-ons successful            0.000  0.000  0.000  0.000  0.000  0.000  0.000
% take-ons successful          0.000  0.000  0.000  0.000  0.000  0.000  0.000

Table B.3: Significance of relationship of match variable averages with goals conceded

P-values for linear regression relationship on goals conceded
Variable                       L2     L3     L4     L5     WA     LHLA   L2HL2A
Yellow cards received          0.005  0.076  0.003  0.001  0.001  0.000  0.012
Red cards received             0.727  0.472  0.060  0.026  0.251  0.547  0.066
Possession                     0.000  0.000  0.000  0.000  0.000  0.000  0.000
Fouls committed                0.000  0.000  0.000  0.000  0.000  0.000  0.000
Offsides committed             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Own shots                      0.000  0.000  0.000  0.000  0.000  0.000  0.000
Own shots on target            0.000  0.000  0.000  0.000  0.000  0.000  0.000
Own shots, blocked             0.261  0.101  0.014  0.006  0.035  0.140  0.004
Mean shot distance             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean angle of shots            0.313  0.470  0.477  0.165  0.893  0.141  0.848
% blocked own shots            0.052  0.169  0.123  0.095  0.094  0.036  0.168
Save % (own keeper)            0.234  0.104  0.088  0.013  0.202  0.177  0.031
Tackles attempted              0.039  0.069  0.182  0.263  0.101  0.143  0.218
Successful tackles             0.643  0.840  0.922  0.710  0.765  0.856  0.832
Tackling success %             0.005  0.003  0.009  0.006  0.007  0.006  0.004
Crosses attempted              0.963  0.860  0.250  0.075  0.283  0.417  0.052
Successful crosses             0.003  0.158  0.004  0.002  0.004  0.003  0.000
Crossing success %             0.022  0.037  0.014  0.049  0.018  0.132  0.017
Mean cross length              0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean cross distance            0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean cross angle               0.417  0.134  0.087  0.044  0.139  0.277  0.091
% headed duels won             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Passes tried                   0.000  0.000  0.000  0.000  0.000  0.000  0.000
Successful passes              0.000  0.000  0.000  0.000  0.000  0.000  0.000
% successful passes            0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean pass length               0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean pass distance             0.000  0.000  0.000  0.000  0.000  0.000  0.000
Mean pass angle                0.592  0.755  0.402  0.286  0.352  0.224  0.272
Forward passes tried           0.000  0.000  0.000  0.000  0.000  0.000  0.000
Successful forward passes      0.000  0.000  0.000  0.000  0.000  0.000  0.000
% successful forward passes    0.000  0.000  0.000  0.000  0.000  0.000  0.000
Successful interceptions       0.081  0.016  0.004  0.006  0.021  0.071  0.004
% passes intercepted           0.163  0.250  0.122  0.054  0.070  0.117  0.090
Corners taken                  0.013  0.001  0.000  0.000  0.000  0.004  0.000
Take-ons attempted             0.006  0.008  0.000  0.000  0.000  0.008  0.000
Take-ons successful            0.000  0.000  0.000  0.000  0.000  0.000  0.000
% take-ons successful          0.001  0.002  0.002  0.004  0.002  0.004  0.002

Appendix C

SEM results

Table C.1: Values and significance of weights and loadings for measurement model

Indicator name                              W        p-value  L        p-value
Home yellow cards                           0.4800   0.0150   0.7990   0.0000
Home red cards                              0.1600   0.1860   0.3430   0.0170
Home fouls                                  0.6410   0.0000   0.8750   0.0000
Home shots on target                        0.8400   0.0000   0.9540   0.0000
Home shots blocked                          0.1900   0.0020   0.4530   0.0000
Home mean shot distance                    -0.2600   0.0000  -0.4430   0.0000
Home mean shot angle                        0.0170   0.3890  -0.1480   0.0060
Home keeper save %                         -0.4650   0.1600  -0.4910   0.1590
Home total tackles attempted               -0.0790   0.3170   0.1830   0.2050
Home tackling success %                    -0.3750   0.1620  -0.4780   0.1560
Home crosses attempted                      0.4910   0.1640  -0.2460   0.1630
Home successful crosses                    -0.3200   0.1700  -0.3540   0.1600
Home mean cross length                     -0.2250   0.1740   0.1870   0.1670
Home mean cross distance                    0.7600   0.1560   0.7480   0.1550
Home mean cross angle                      -0.1460   0.1980   0.4150   0.1570
Home header %                               0.5330   0.0000   0.6070   0.0000
Home mean pass length                      -0.0720   0.1710   0.5950   0.0000
Home mean pass distance                     0.2140   0.0010   0.6140   0.0000
Home successful passes forward             -0.9250   0.0000  -0.9850   0.0000
Home interceptions made                     0.7610   0.1510   0.7970   0.1500
Home corners taken                         -0.7390   0.1570  -0.7320   0.1560
Home take-ons attempted                     0.0680   0.3540   0.7310   0.0000
Home successful take-ons                    0.7390   0.0000   0.8480   0.0000
Home shots on target conceded               0.9690   0.0000   0.9970   0.0000
Home conceded shots blocked                 0.0690   0.2820   0.3170   0.0020
Home mean distance of shots conceded       -0.0120   0.4610  -0.2420   0.0170
Home mean angle of shots conceded          -0.0500   0.3370  -0.1770   0.0630
Home cross attempts conceded                0.4530   0.0070   0.4990   0.0000
Home successful crosses conceded            0.3140   0.0420   0.6490   0.0000
Home mean length of crosses conceded        0.6980   0.0000   0.5090   0.0000
Home mean distance of crosses conceded     -0.2780   0.0200  -0.1590   0.0880
Home mean angle of crosses conceded        -0.4340   0.0000  -0.3940   0.0000
Home mean length of passes conceded         0.9400   0.0000   0.2820   0.0210
Home mean distance of passes conceded      -0.6660   0.0030  -0.4620   0.0070
Home successful forward passes conceded     0.6750   0.0000   0.6330   0.0000
Away yellow cards                           0.6040   0.0000   0.8730   0.0000
Away red cards                              0.2280   0.0540   0.4060   0.0010
Away fouls                                  0.4840   0.0010   0.7860   0.0000
Away shots on target                        0.8760   0.0000   0.9710   0.0000
Away shots blocked                          0.1470   0.0430   0.4080   0.0000
Away mean shot distance                    -0.1690   0.0140  -0.3740   0.0000
Away mean shot angle                       -0.1090   0.0710  -0.2410   0.0000
Away keeper save %                         -0.4320   0.1300  -0.4520   0.1260
Away total tackles attempted                0.3430   0.1030   0.5630   0.0650
Away tackling success %                    -0.2860   0.1780  -0.4430   0.1240
Away crosses attempted                      0.7050   0.1320  -0.1710   0.1890
Away successful crosses                    -0.4470   0.1470  -0.3370   0.1540
Away mean cross length                     -0.1340   0.2160   0.2340   0.1330
Away mean cross distance                    0.7540   0.1250   0.7610   0.1220
Away mean cross angle                      -0.2160   0.1840   0.3860   0.1230
Away header %                               0.4120   0.0000   0.5080   0.0000
Away mean pass length                      -0.0500   0.2790   0.6140   0.0000
Away mean pass distance                     0.2260   0.0020   0.6220   0.0000
Away successful passes forward             -0.9060   0.0000  -0.9820   0.0000
Away interceptions made                     0.6390   0.0640   0.7590   0.0560
Away corners taken                         -0.7580   0.1360  -0.6740   0.1350
Away take-ons attempted                     0.0410   0.4030   0.7860   0.0000
Away successful take-ons                    0.8310   0.0000   0.9120   0.0000
Away shots on target conceded               0.7490   0.1130   0.8850   0.1140
Away conceded shots blocked                 0.0770   0.2460   0.2240   0.1160
Away mean distance of shots conceded       -0.4350   0.1360  -0.6370   0.1280
Away mean angle of shots conceded          -0.1330   0.2070  -0.3220   0.1460
Away cross attempts conceded                0.4660   0.0150   0.6210   0.0000
Away successful crosses conceded            0.4110   0.0250   0.7530   0.0000
Away mean length of crosses conceded        0.4620   0.0020   0.2890   0.0140
Away mean distance of crosses conceded     -0.2620   0.0640  -0.2930   0.0210
Away mean angle of crosses conceded        -0.4820   0.0010  -0.3960   0.0030
Away mean length of passes conceded         1.0350   0.0000   0.3970   0.0030
Away mean distance of passes conceded      -0.7740   0.0020  -0.4470   0.0100
Away successful forward passes conceded     0.5030   0.0030   0.4820   0.0040
Home ELO                                    1.0000   0.0000   1.0000   0.0000
Away ELO                                    1.0000   0.0000   1.0000   0.0000
Home rest                                   1.0000   0.0000   1.0000   0.0000
Away rest                                   1.0000   0.0000   1.0000   0.0000
Home mean ELO of past opponents             1.0000   0.0000   1.0000   0.0000
Away mean ELO of past opponents             1.0000   0.0000   1.0000   0.0000
Home goals                                  1.0000   0.0000   1.0000   0.0000
Away goals                                  1.0000   0.0000   1.0000   0.0000

Table C.2: VIF values for constructs

Construct                   VIF value
Home passes & control       2.7689
Home crosses                1.4392
Home rest                   3.4911
Home defence                1.1768
Home discipline             1.3229
Home duelling power         1.4539
Home shooting               1.8991
Home passes conceded        2.0054
Home shots conceded         1.4275
Away passes & control       2.6872
Away crosses                1.3962
Away rest                   3.4889
Away defence                1.1841
Away discipline             1.3066
Away duelling power         1.4516
Away shooting               1.8938
Away passes conceded        1.7809
Away shots conceded         1.3381
Home ELO                    2.2031
Away ELO                    2.2179
Home opponent average ELO   1.1650
Away opponent average ELO   1.1331
Home goals                  1.1617
Away goals                  1.1103

Table C.3: Weights and significances of constructs in the baseline model

Relationship                                  Path weight   p-value
Home passes & control → home goals            -0.1680       0.0000
Home passes & control → away goals             0.0510       0.0330
Home crosses → home goals                      0.0140       0.1950
Home rest → away goals                         0.0290       0.0260
Home defence → home goals                     -0.0160       0.1430
Home defence → away goals                      0.0020       0.4600
Home discipline → home goals                   0.0020       0.4500
Home discipline → away goals                  -0.0140       0.2400
Home duelling power → home goals               0.0110       0.2650
Home duelling power → away goals               0.0150       0.1990
Home shooting → home goals                     0.1500       0.0000
Home passes conceded → away goals             -0.0130       0.2780
Home shots conceded → away goals               0.0550       0.0010
Away passes & control → home goals             0.0440       0.0410
Away passes & control → away goals            -0.1060       0.0000
Away crosses → away goals                      0.0010       0.4660
Away rest → home goals                         0.0110       0.2140
Away defence → home goals                      0.0070       0.3250
Away defence → away goals                      0.0030       0.4310
Away discipline → home goals                   0.0160       0.1800
Away discipline → away goals                  -0.0170       0.1520
Away duelling power → home goals               0.0190       0.1310
Away duelling power → away goals               0.0580       0.0010
Away shooting → away goals                     0.1140       0.0000
Away passes conceded → home goals              0.0040       0.4270
Away shots conceded → home goals               0.0750       0.0000
Home ELO → away goals                         -0.1300       0.0000
Away ELO → home goals                         -0.1290       0.0000
Home opponent average ELO → home goals         0.0250       0.0440
Away opponent average ELO → away goals         0.0150       0.1590

Table C.4: Weights and significances of constructs in the model after the first iteration

Relationship                                  Path weight   p-value
Home passes & control → home goals            -0.1800       0.0000
Home passes & control → away goals             0.0240       0.0970
Home shooting → home goals                     0.1500       0.0000
Home shots conceded → away goals               0.0640       0.0000
Away passes & control → home goals             0.0740       0.0000
Away passes & control → away goals            -0.1350       0.0000
Away shooting → away goals                     0.1430       0.0000
Away shots conceded → home goals               0.0720       0.0680
Home ELO → away goals                         -0.1310       0.0000
Away ELO → home goals                         -0.1010       0.0000

Table C.5: Weights and significances of constructs in the model after the second iteration

Relationship                                  Path weight   p-value
Home passes & control → home goals            -0.1700       0.0080
Home shooting → home goals                     0.1540       0.0000
Away passes & control → away goals            -0.1320       0.1010
Away shooting → away goals                     0.1460       0.0000
Home ELO → away goals                         -0.1740       0.0000
Away ELO → home goals                         -0.1810       0.0000

Appendix D

Fuzzy system variable selection

Figure D.1: RMSE results for home goals with 1-variable fuzzy inference systems

Figure D.2: RMSE results for away goals with 1-variable fuzzy inference systems

Table D.1: RMSE results for home goals for a 3-variable fuzzy inference system

Variables (1; 2; 3) | RMSE training set | RMSE test set
Upcoming opponent's successful forward passes; Own successful forward passes; Own shots on target | 1.2216 | 1.2492
Own successful forward passes; Own shots on target; Opponent ELO | 1.2123 | 1.2515
Upcoming opponent's successful forward passes; Mean distance from goal of own passes; Own successful forward passes | 1.2301 | 1.2615
Upcoming opponent's successful forward passes; Own successful forward passes; Opponent ELO | 1.2223 | 1.2623
Mean distance from goal of own passes; Own successful forward passes; Own shots on target | 1.2271 | 1.2624
Mean distance from goal of own passes; Own shots on target; Opponent ELO | 1.2171 | 1.2625
Mean length of own passes; Own shots on target; Opponent ELO | 1.2173 | 1.2646
Mean length of own passes; Own successful forward passes; Own shots on target | 1.2274 | 1.2653
Mean length of own passes; Own successful forward passes; Opponent ELO | 1.2243 | 1.2673
Mean distance from goal of own passes; Own successful forward passes; Opponent ELO | 1.2220 | 1.2683

Table D.2: RMSE results for away goals for a 3-variable fuzzy inference system

Variables (1; 2; 3) | RMSE training set | RMSE test set
Upcoming opponent's successful forward passes; Own successful forward passes; Opponent ELO | 1.1076 | 1.1377
Upcoming opponent's successful forward passes; Own successful forward passes; Own successful take-ons | 1.1137 | 1.1384
Mean distance from goal of own passes; Own successful forward passes; Own shots on target | 1.1193 | 1.1391
Upcoming opponent's successful forward passes; Own successful forward passes; Own shots on target | 1.1080 | 1.1395
Previous opponent's successful forward passes; Own shots on target; Home ELO | 1.1043 | 1.1413
Mean length of own passes; Own shots on target; Opponent ELO | 1.1053 | 1.1422
Own successful forward passes; Own successful take-ons; Opponent ELO | 1.1051 | 1.1422
Mean length of own passes; Own successful forward passes; Own shots on target | 1.1211 | 1.1423
Mean distance from goal of own passes; Own successful forward passes; Opponent ELO | 1.1063 | 1.1423
Own successful forward passes; Own shots on target; Opponent ELO | 1.1070 | 1.1431

Appendix E

Fuzzy system SRIC values

Table E.1: SRIC results for a grid partition FIS with different variables

Included variables | SRIC (2 MF) | SRIC (3 MF)
Own successful forward passes | 0.3786 | 0.3838
Own successful forward passes; Own shots on target | 0.3681 | 0.3857
Own successful forward passes; Opponent Elo | 0.3519 | 0.3692
Own successful forward passes; Opponent Elo; Own shots on target | 0.3468 | 0.4085
Own successful forward passes; Opponent Elo; Own shots on target; Mean pass distance from goal of own passes | 0.3738 | 0.5878
Own successful forward passes; Opponent Elo; Own shots on target; Upcoming opponent's successful forward passes | 0.3728 | 0.5918
Own successful forward passes; Opponent Elo; Own shots on target; Upcoming opponent's successful forward passes; Mean pass distance from goal of own passes | 0.4272 | 1.1386
Own successful forward passes; Opponent Elo; Own shots on target; Upcoming opponent's successful forward passes; Mean pass distance from goal of own passes; Mean length of own passes | 0.5354 | unsolvable

Table E.2: SRIC results for a subtractive clustering FIS with different variables

Included variables | SRIC (r = 0.25) | SRIC (r = 0.5)
Own successful forward passes | 0.3865 | -
Own successful forward passes, Own shots on target | 0.3683 | -
Own successful forward passes, Opponent Elo | 0.3618 | -
Own successful forward passes, Opponent Elo, Own shots on target | 0.3446 | 0.3243
Own successful forward passes, Opponent Elo, Own shots on target, Mean pass distance from goal of own passes | 0.3428 | 0.3241
Own successful forward passes, Opponent Elo, Own shots on target, Upcoming opponent's successful forward passes | 0.3411 | 0.3238
Own successful forward passes, Opponent Elo, Own shots on target, Upcoming opponent's successful forward passes, Mean pass distance from goal of own passes | 0.3406 | 0.3242
Own successful forward passes, Opponent Elo, Own shots on target, Upcoming opponent's successful forward passes, Mean pass distance from goal of own passes, Mean length of own passes | 0.3505 | 0.3240

Table E.3: SRIC results for a fuzzy c-means clustering FIS with different variables

Included variables | SRIC (Takagi-Sugeno) | SRIC (Mamdani)
Own successful forward passes | - | -
Own successful forward passes, Own shots on target | - | -
Own successful forward passes, Opponent Elo | 0.3491 | 0.8232
Own successful forward passes, Opponent Elo, Own shots on target | 0.3313 | 1.5371
Own successful forward passes, Opponent Elo, Own shots on target, Mean pass distance from goal of own passes | 0.3317 | 0.8739
Own successful forward passes, Opponent Elo, Own shots on target, Upcoming opponent's successful forward passes | 0.3303 | 0.7364
Own successful forward passes, Opponent Elo, Own shots on target, Upcoming opponent's successful forward passes, Mean pass distance from goal of own passes | 0.3308 | 0.7674
Own successful forward passes, Opponent Elo, Own shots on target, Upcoming opponent's successful forward passes, Mean pass distance from goal of own passes, Mean length of own passes | 0.3301 | 0.9792
Own successful forward passes, Opponent Elo, Own shots on target, Upcoming opponent's successful forward passes, Mean pass distance from goal of own passes, Mean length of own passes, Own successful take-ons | 0.3313 | 0.8244

Appendix F

Skewness values and confidence intervals of bias return vectors

Table F.1: Skewness values and confidence intervals for main lines

Situation | Skewness value | Lower 95% confidence limit | Upper 95% confidence limit
Home teams | 0.0251 | -0.0361 | 0.0863
Away teams | -0.0028 | -0.0640 | 0.0585
Over | 0.0391 | -0.0222 | 0.1003
Under | -0.0219 | -0.0831 | 0.0394
Favourites | 0.0384 | -0.0271 | 0.1038
Underdogs | -0.0226 | -0.0880 | 0.0429
Home favourites | 0.0415 | -0.0353 | 0.1184
Home underdogs | -0.0126 | -0.1373 | 0.1121
Away favourites | 0.0304 | -0.0943 | 0.1552
Away underdogs | -0.0265 | -0.1034 | 0.0504
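Table F.1 pairs each skewness estimate with a 95% confidence interval. The thesis's exact construction is not restated in this appendix; one common approximation (stated here as an assumption, not necessarily the author's method) combines the adjusted sample skewness with the standard error of skewness:

```python
import math

def skewness(xs):
    """Adjusted Fisher-Pearson sample skewness of a return vector."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)

def skewness_ci(xs, z=1.96):
    """Approximate 95% confidence interval via the standard error of skewness."""
    n = len(xs)
    ses = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    s = skewness(xs)
    return s - z * ses, s + z * ses

# A perfectly symmetric sample has skewness 0, so its interval straddles zero
lo, hi = skewness_ci([1, 2, 3, 4, 5])
```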

Table F.2: 95% confidence limits for skewness of returns per situation for different leagues

Each cell shows [lower 95% limit, upper 95% limit].

Situation | Bundesliga | Eredivisie | Premier League | Serie A | Ligue 1 | La Liga
Home teams | [-0.1704, 0.1534] | [-0.1475, 0.1726] | [-0.0811, 0.2091] | [-0.1122, 0.1783] | [-0.0933, 0.1959] | [-0.1553, 0.1353]
Away teams | [-0.1222, 0.2017] | [-0.1408, 0.1793] | [-0.1859, 0.1043] | [-0.1591, 0.1313] | [-0.1802, 0.1090] | [-0.1192, 0.1714]
Over | [-0.1468, 0.1771] | [-0.1149, 0.2052] | [-0.1067, 0.1834] | [-0.1061, 0.1843] | [-0.1132, 0.1760] | [-0.0804, 0.2102]
Under | [-0.1437, 0.1802] | [-0.1753, 0.1448] | [-0.1708, 0.1193] | [-0.1691, 0.1213] | [-0.1698, 0.1195] | [-0.2008, 0.0898]
Favourites | [-0.0592, 0.2917] | [-0.1299, 0.2074] | [-0.1240, 0.1847] | [-0.1456, 0.1637] | [-0.1831, 0.1323] | [-0.0749, 0.2323]
Underdogs | [-0.2761, 0.0748] | [-0.1963, 0.1410] | [-0.1651, 0.1435] | [-0.1489, 0.1604] | [-0.1165, 0.1988] | [-0.2153, 0.919]
Home favourites | [-0.1172, 0.2941] | [-0.1820, 0.2146] | [-0.1340, 0.2345] | [-0.1674, 0.2003] | [-0.1330, 0.2274] | [-0.1407, 0.2189]
Home underdogs | [-0.5034, 0.1640] | [-0.4061, 0.2313] | [-0.2355, 0.3264] | [-0.2657, 0.3041] | [-0.0456, 0.6018] | [-0.1072, 0.1239]
Away favourites | [-0.1436, 0.5238] | [-0.2210, 0.4163] | [-0.2974, 0.2645] | [-0.2942, 0.2755] | [-0.5848, 0.0625] | [-0.1072, 0.4805]
Away underdogs | [-0.2804, 0.1309] | [-0.2032, 0.1935] | [-0.2190, 0.1495] | [-0.1839, 0.1835] | [-0.2121, 0.1483] | [-0.2022, 0.1574]

Appendix G

Number of bets simulated for different models

Table G.1: Number of bets made by the models on sides for different thresholds

Threshold | Neural network (home) | Neural network (away) | SEM (home) | SEM (away) | Fuzzy model (home) | Fuzzy model (away)
0.0 | 650 | 572 | 661 | 570 | 684 | 554
0.1 | 510 | 425 | 522 | 429 | 555 | 419
0.2 | 385 | 284 | 385 | 308 | 440 | 316
0.3 | 264 | 188 | 268 | 219 | 330 | 224
0.4 | 162 | 113 | 173 | 143 | 232 | 159
0.5 | 100 | 70 | 104 | 87 | 140 | 102
0.6 | 57 | 37 | 67 | 52 | 97 | 65
0.7 | 32 | 18 | 43 | 32 | 62 | 39
0.8 | 15 | 12 | 29 | 21 | 33 | 24
0.9 | 8 | 8 | 13 | 13 | 22 | 17
1.0 | 5 | 5 | 8 | 9 | 14 | 12

Table G.2: Number of bets made by the models on totals for different thresholds

Threshold | Neural network (over) | Neural network (under) | SEM (over) | SEM (under) | Fuzzy model (over) | Fuzzy model (under)
0.0 | 841 | 336 | 886 | 302 | 904 | 275
0.1 | 695 | 236 | 788 | 219 | 797 | 205
0.2 | 558 | 156 | 670 | 163 | 663 | 160
0.3 | 392 | 110 | 536 | 115 | 519 | 107
0.4 | 232 | 78 | 382 | 76 | 363 | 76
0.5 | 123 | 51 | 275 | 45 | 238 | 45
0.6 | 62 | 33 | 169 | 32 | 132 | 30
0.7 | 28 | 22 | 94 | 25 | 73 | 22
0.8 | 8 | 11 | 51 | 18 | 42 | 16
0.9 | 3 | 5 | 27 | 14 | 23 | 13
1.0 | 1 | 3 | 11 | 8 | 10 | 6
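The bet counts above shrink monotonically as the threshold rises, since a stricter threshold admits fewer bets. A plausible sketch of such a selection rule, assuming a bet is placed whenever the model's edge over the bookmaker's line exceeds the threshold (the function and numbers below are illustrative, not the thesis's exact rule):

```python
def count_bets(edges, thresholds):
    """Count, per threshold, how many model edges would trigger a bet.

    `edges` holds one value per match, e.g. the model's predicted goal
    count minus the bookmaker's line; an edge above the threshold
    triggers a bet.
    """
    return {t: sum(1 for e in edges if e > t) for t in thresholds}

edges = [0.05, 0.15, 0.35, 0.55, 0.95]
counts = count_bets(edges, [0.0, 0.2, 0.4, 1.0])
# counts per threshold: {0.0: 5, 0.2: 3, 0.4: 2, 1.0: 0}
```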

Appendix H

Skewness values and confidence intervals for bet simulation returns

Table H.1: Skewness 95% confidence intervals for returns when betting on sides

Threshold | Neural network LCL | Neural network UCL | SEM LCL | SEM UCL | Fuzzy model LCL | Fuzzy model UCL
0.0 | -0.1540 | 0.1202 | -0.1444 | 0.1290 | -0.1353 | 0.1373
0.1 | -0.1744 | 0.1392 | -0.1482 | 0.1627 | -0.1527 | 0.1545
0.2 | -0.1536 | 0.2168 | -0.1578 | 0.2061 | -0.1612 | 0.1873
0.3 | -0.1410 | 0.3092 | -0.1339 | 0.2999 | -0.1964 | 0.2105
0.4 | -0.1897 | 0.3862 | -0.1918 | 0.3458 | -0.1669 | 0.3169
0.5 | -0.2014 | 0.5287 | -0.1995 | 0.4899 | -0.1061 | 0.5074
0.6 | -0.3287 | 0.6463 | -0.1882 | 0.6812 | -0.1003 | 0.6473
0.7 | -0.5018 | 0.8177 | -0.2913 | 0.7961 | -0.2331 | 0.7086
0.8 | -0.4719 | 1.2837 | -0.7045 | 0.6150 | -0.5030 | 0.7370
0.9 | -1.0684 | 1.1436 | -0.6003 | 1.1855 | -0.6183 | 0.8643
1.0 | -1.7466 | 0.9466 | -0.7141 | 1.4409 | -1.0258 | 0.7600

Table H.2: Skewness 95% confidence intervals for returns when betting on totals

Threshold | Neural network LCL | Neural network UCL | SEM LCL | SEM UCL | Fuzzy model LCL | Fuzzy model UCL
0.0 | -0.0911 | 0.1884 | -0.0955 | 0.1827 | -0.1026 | 0.1762
0.1 | -0.1245 | 0.1897 | -0.0820 | 0.2201 | -0.1166 | 0.1863
0.2 | -0.1391 | 0.2195 | -0.1667 | 0.1654 | -0.0820 | 0.2521
0.3 | -0.2121 | 0.2152 | -0.1648 | 0.2107 | -0.0787 | 0.3042
0.4 | -0.1606 | 0.3822 | -0.1526 | 0.2946 | -0.1618 | 0.2950
0.5 | -0.1503 | 0.5715 | -0.2593 | 0.2749 | -0.1905 | 0.3772
0.6 | -0.2887 | 0.6814 | -0.3273 | 0.3450 | -0.2187 | 0.5289
0.7 | -0.1892 | 1.1303 | -0.3625 | 0.5069 | -0.2400 | 0.7300
0.8 | -0.1013 | 1.9519 | -0.2368 | 0.8950 | -0.3996 | 0.8402
0.9 | -0.9166 | 2.0316 | -0.2596 | 1.1888 | -0.5412 | 0.9976
1.0 | -0.8331 | 3.1425 | -0.9276 | 1.1256 | -0.9245 | 1.2875
