Data-Based Goal Prediction in Association Football, Tested on the Asian Betting Market
Eindhoven University of Technology

MASTER

Does data know more than the market? Data-based goal prediction in association football, tested on the Asian betting market

Neggers, R.

Award date: 2016

Disclaimer
This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights
Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.

Eindhoven, October 2016

Does data know more than the market? Data-based goal prediction in association football, tested on the Asian betting market.

by Rob Neggers

BEng Mechanical Engineering
Student identity number 0721387

In partial fulfilment of the requirement for the degree of:
Master of Science (MSc) in Operations Management & Logistics

First supervisor: Dr. S.S. Dabadghao
Second supervisor: Dr. M. Udenio
Third assessor: Dr. A.M. Wilbik

TUE. School of Industrial Engineering
Series Master Thesis Operations Management and Logistics

Subject headings: Association football, Fuzzy inference systems, Neural networks, Prediction of sports, Sports betting, Structural equation modelling.
Abstract

In this thesis, game variables from preceding games are used to predict the outcome of association football games. To do so, three different model types are used: partial least squares structural equation modelling (PLS-SEM), artificial neural networks and fuzzy inference systems. These three models return predictions that have comparable errors and outperform other heuristics. Investigation of the Asian betting market shows that there are no biases large enough to make blindly betting on a particular type of bet significantly profitable. It was, however, found that betting on favourites results in significantly higher returns than betting on underdogs. When the predictions of the different models were used to select bets at the closing lines of the Asian betting market, no significantly positive returns were obtained in any situation. This implies that the models used are not good enough to make a profit from their predictions.

Preface & Acknowledgements

This thesis report marks the end of my time in the master's programme of Operations Management & Logistics and my time as a student in general. It is the turning point from student life to working life. I owe gratitude to the people that made this period challenging, interesting and most of all fun.

First of all, I would like to thank dr. Shaunak Dabadghao, my first supervisor, not only for being my supervisor and providing me with the right feedback and ideas at the right times, but even more so for allowing me to write my thesis on the application of operations research methods to sports. This is a field in which master thesis projects are rarely done at the faculty of Industrial Engineering, but one that interests me greatly and that I am truly passionate about. I would also like to thank my second university supervisor, dr.
Maximiliano Udenio, for helping me out with my questions related to statistical methods and testing and for taking the time to provide extensive feedback on my final report.

I want to thank Joris Bekkers, who showed me the relative ease of using Python to collect data from Squawka. Without this insight and the starting points he provided, the data I used would have had much less depth and would have been a lot more time-consuming to gather.

I would like to thank all the friends I have made during my time at the TU/e and all my teammates of University Racing Eindhoven. With you all coming from different backgrounds, varying from economics to electrical engineering, I learned much more from working with you than just the curriculum, and I am truly grateful for all this knowledge from outside the borders of my own field.

I would like to thank Tanja, for her mental support and understanding of my occasional lack of time while writing my thesis.

Finally, I want to thank my parents, for supporting me unconditionally throughout my whole time as a student and giving me the opportunity and freedom to pursue the education I wanted.

Rob Neggers
Eindhoven, October 2016

Summary

Introduction & motivation

Previous research has used mathematics to predict association football games (for example with a Poisson distribution (Maher, 1982) and with Bayesian networks (Constantinou, Fenton, & Neil, 2012)), and has used mathematical models to make predictions that were applied to the sports betting market (for example Dixon and Coles (1997) and Constantinou, Fenton, and Neil (2013)). However, these models only used goals/points or psychological and situational factors as prediction variables. For American football, research was done on a regression model based on statistics from previous games (Zuber, Gander, & Bowers, 1985). This model was able to generate profits on the betting market.
Therefore, this thesis focussed on using game statistics as input for forecasting models to predict the number of goals in association football games. Three different model types were used for forecasting: partial least squares structural equation modelling (PLS-SEM), neural networks and fuzzy inference systems. The forecasts of these models were then tested against the Asian betting market to see whether they are able to generate a profit. In summary, this led to the following main research question:

What type of forecasting model based on match statistics is the most accurate for association football and how does this model's performance compare to the Asian betting market?

Approach

To answer this question, it was first investigated which variables should be included in the models. Data was gathered from Squawka's website using a Python script. Different types of rolling averages per team were calculated for all gathered variables, and regression analyses were done with these averages as inputs and the goals conceded and scored in the next match as output variables. It was found that, for the predictor variables, the average over the 5 matches preceding the predicted game had the most significant relationship with the predicted variables. Therefore, 5-match average values of the variables were used as inputs for the prediction models.

The three different models were fitted individually and then used to predict the number of goals a team would score, for an independent sample consisting of 1200 matches (resulting in 2400 predictions, one per team for each match). For these predictions, the mean absolute error (MAE) and the root mean squared error (RMSE) were calculated. The same error measures were also calculated for a number of heuristics. All of these values are shown in Table 1.
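As a minimal sketch of the two ingredients described above, the following Python snippet computes a 5-match trailing average that uses only matches before the predicted game, and then the MAE and RMSE of a forecast. The function names and the goal counts are hypothetical and purely illustrative. The sketch uses a team's own trailing goal average as the forecast, which is a simple stand-in and not one of the fitted models of this thesis.

```python
from math import sqrt


def trailing_average(values, window=5):
    """Average of the `window` values before each position.

    Returns None for positions that do not yet have `window` earlier matches,
    so the value of the predicted match itself is never used.
    """
    averages = []
    for i in range(len(values)):
        if i < window:
            averages.append(None)
        else:
            averages.append(sum(values[i - window:i]) / window)
    return averages


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up goals scored by one team in consecutive matches (not thesis data).
goals = [1, 2, 0, 3, 1, 2, 0, 1]

# Assumption: the trailing average itself serves as the forecast.
features = trailing_average(goals, window=5)

# Keep only matches for which a full 5-match history exists.
pairs = [(g, f) for g, f in zip(goals, features) if f is not None]
actual = [g for g, _ in pairs]
predicted = [f for _, f in pairs]

print("MAE: ", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))
```

Because RMSE squares the errors before averaging, it penalises large misses more heavily than MAE, which is why both measures are reported side by side.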
Table 1: MAE and RMSE for created models and various heuristics

Model/heuristic                                                      MAE     RMSE
Structural equation model                                            0.9203  1.1661
Neural network                                                       0.9083  1.1561
Fuzzy inference system                                               0.9217  1.1689
General average goals scored                                         1.0048  1.2484
Average goals scored for home and away team separate                 0.9815  1.2350
General average goals scored per league                              0.9988  1.2446
Average goals scored per league for home and away team separate      0.9754  1.2309
Average of a team's goals scored in last 5 matches                   1.0223  1.3003
Average of upcoming opponent's goals conceded in last 5 matches      1.0441  1.3341
Average of scored and opponent's conceded goals in past 5 matches    0.9755  1.2383

As can be seen in this table, the error measures of the three different models are quite comparable and are lower than those of all heuristics. Statistical testing found no significant differences between the absolute errors of the models. This implies that there is not enough evidence to claim that one model is better than another.

As a next step, the Asian betting market was analysed. Data was used from PinnacleSports, the bookmaker with the highest limits in the world offering Asian-style markets. It was checked whether betting on bets that fall under a certain situation is significantly profitable, and whether betting on such bets is more profitable than betting on the opposing bet in that situation. It was found that there is no type of bet that generates a significant profit when blindly betting on it. As expected, nearly all bets (apart from favourites and especially away favourites) result in a significant loss when blindly betting on them. When investigating opposing bets, it was found that blindly betting on favourites returns a significantly higher profit (or rather, a smaller loss) than blindly betting on underdogs.
This is particularly true for betting on home favourites as opposed to away underdogs, and for betting on away favourites as opposed to home underdogs.

The most interesting finding in this stage is that bets with higher odds (and thus higher volatility) have lower returns (bigger losses) than bets with lower odds (lower volatility). This conflicts with the risk premium that is present in conventional investment markets, where investments with a higher volatility are assumed to lead to a higher expected return.
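To make the notion of the return from blindly betting concrete, the sketch below computes the net return per unit stake of Asian-style bets and the average return of a group of bets. It assumes the usual settlement of Asian lines, where a bet can end as a win, half win, push, half loss or loss. The one-sample t-statistic against zero is an assumption for illustration, since this summary does not state which significance test was used. All names, odds and outcomes are made up.

```python
from math import sqrt
from statistics import mean, stdev

# Fraction of the stake that is won (positive) or lost (negative) per outcome.
SETTLEMENT_FRACTION = {
    "win": 1.0,
    "half_win": 0.5,
    "push": 0.0,
    "half_loss": -0.5,
    "loss": -1.0,
}


def bet_return(decimal_odds, outcome):
    """Net return per unit stake for a bet at the given decimal odds."""
    fraction = SETTLEMENT_FRACTION[outcome]
    if fraction > 0:
        # The winning part of the stake pays out at (odds - 1).
        return fraction * (decimal_odds - 1.0)
    # Push returns the stake (0.0); losses forfeit the losing part of the stake.
    return fraction


def mean_return_and_t(returns):
    """Mean return and a one-sample t-statistic against a mean of zero."""
    avg = mean(returns)
    t_stat = avg / (stdev(returns) / sqrt(len(returns)))
    return avg, t_stat


# Hypothetical bets (decimal odds, outcome), not taken from the thesis data.
bets = [
    (1.95, "win"),
    (1.95, "loss"),
    (2.05, "half_win"),
    (1.90, "push"),
    (1.98, "half_loss"),
    (1.92, "loss"),
]

returns = [bet_return(odds, outcome) for odds, outcome in bets]
avg, t_stat = mean_return_and_t(returns)
print("Mean return per unit stake:", avg)
print("t-statistic against zero:  ", t_stat)
```

Grouping bets by situation (for example favourites versus underdogs, or low versus high odds) and comparing the mean returns of the groups gives the kind of comparison described above.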