Statistical Assessment of Soccer Players
Total Page:16
File Type:pdf, Size:1020Kb
Statistical Assessment of Soccer Players by Mohammad-Amin Nabavi A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree Master of Science in Mathematics and Statistics Carleton University Ottawa, Ontario © 2019 Mohammad-Amin Nabavi ABSTRACT This thesis examines the use of neural network modelling and ordinal logistic regression on a single season of data (2015-16) to score or rank soccer players. These scores and ranks are then compared with ones from FIFA EASports that are based on a much more extensive data set that includes more variables as well as historical data. Using repeated measures one-way ANOVA and subsequent multiple comparison testing, we conclude that a neural network model with ten nodes in one hidden layer is able to generate scores statistically similar to FIFA EASports, while ordinal logistic regression cannot. We confirm this result be also examining the ranks obtained by both the neural network model and the ordinal logistic regression model. Spearman rank correlations are also examined and confirm that a neural network with 10 nodes in one hidden layer also generates ranks statistically similar to FIFA EASports. We also demonstrate the use of association rule mining on one team’s data from the 2015-16 season to identify players and combinations thereof that are associated with winning (or not winning) a match. Analyses are based on data from the Italian Serie A League 2015-2016 season. ii Acknowledgements These past couple of years have proven to be the hardest period of my life so far. I could not be here without the help of the people around me. I have been surrounded by the kindest people. I want to thank my parents for their support without any hesitation. I want to thank my brother and sister who have always been there for me. I want to thank Gray Barski and Shintara Hagiwara for many discussions about this research and for their moral encouragement. Last but not the least, I want to thank my supervisor, Dr. Mills. She supported me during this hard time and never gave up on me. I am blessed with being her student. iii Table of Contents Abstract……………………………………………………………..………………………………………..………………….……….. ii Acknowledgements……………………………………………..…………………………………………………………………… iii Table of Contents………………………………………………………..……………………………………………………………. iv List of Tables .................................................................................................................................. vii List of Figures ................................................................................................................................ viii 1. Introduction ................................................................................................................................. 1 2. Literature Review ........................................................................................................................ 4 3. Study Data and Exploratory Data Analyses ............................................................................... 25 3.1. Data ......................................................................................................................................... 25 3.1.1. Definition of the Variables ............................................................................................... 25 3.1.2. Structure of the Data Sets ................................................................................................ 26 3.1.3. Data Wrangling ................................................................................................................ 27 3.2. Exploratory Data Analyses ...................................................................................................... 28 3.2.1. Descriptive Statistics ............................................................................................................ 28 3.2.1.1. Defenders ...................................................................................................................... 29 Time played ............................................................................................................................ 29 Goals ...................................................................................................................................... 29 Overall Assists ........................................................................................................................ 30 iv Warning Cards ........................................................................................................................ 31 3.2.1.2. Midfielders .................................................................................................................... 31 Time played ............................................................................................................................ 32 Goals ...................................................................................................................................... 32 Overall Assists ........................................................................................................................ 33 3.2.1.3. Strikers .......................................................................................................................... 34 Time played ............................................................................................................................ 35 Goals ...................................................................................................................................... 35 Overall Assists ........................................................................................................................ 36 3.2.1. Visualization ......................................................................................................................... 37 3.2.1.1. Parallel Coordinates .................................................................................................. 37 Defenders ............................................................................................................................... 40 Midfielders ............................................................................................................................. 41 Strikers ................................................................................................................................... 42 4. Methodologies ........................................................................................................................... 44 4.1. Scoring/Ranking Players ...................................................................................................... 44 4.1.1. Neural Network Regression ......................................................................................... 44 4.1.2. Ordinal Logistic Regression .......................................................................................... 48 4.1.2.2. Ordinal Logistic Regression ....................................................................................... 50 4.1.2.3 Comparison of scoring and ranking methods ............................................................ 51 v 4.2. Methodology to address the second research question: Association rule mining ............ 52 4.2.1. The ApriorI Algorithm ...................................................................................................... 53 5. Results ...................................................................................................................................... 55 5.1. Scoring/Ranking Players Based on Performance during the 2015-16 Season .................... 55 5.1.1. Preparation of the Data for Scoring and Ranking ........................................................ 56 5.1.2. Scoring Players: Using Neural Networks ...................................................................... 59 5.1.3. Scoring/Ranking Model: Using Ordinal Regression ..................................................... 79 5.1.4. Comparison of Methods for Scoring and Ranking ....................................................... 95 5.1.5. Summary of ORM/NNET Findings .............................................................................. 102 5.2. Association rule mining ..................................................................................................... 103 5.2.1. Data preparation ........................................................................................................ 103 5.2.2. Results: Association rule mining ................................................................................ 105 6. Conclusions .............................................................................................................................. 119 7. References ............................................................................................................................... 123 vi List of Tables Table 1:Empirical studies with predominantly descriptive analysis. ............................................... 5 Table 2:Comparative studies with predominantly comparative analysis according to the different functional positions of the players. ................................................................................................. 7 Table 3:Empirical studies with predominantly comparative analysis based on the different competitive levels. ......................................................................................................................... 11 Table