Assessing the Skill of Football Players Using Statistical Methods
Total Page:16
File Type:pdf, Size:1020Kb
Assessing the skill of football players using statistical methods Łukasz Szczepanski´ SALFORD BUSINESS SCHOOL, UNIVERSITY OF SALFORD,SALFORD, UK SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS OF THE DEGREE OF DOCTOR OF PHILOSOPHY, JANUARY 2015 Contents 1 Introduction 1 1.1 Background . .1 1.2 Problem statement . .4 1.3 Contribution . .7 2 Player evaluation in team sports 11 2.1 Characteristics of player evaluation metrics . 12 2.1.1 Definitions . 12 2.1.2 Approaches to player evaluation . 13 2.2 Football research . 14 2.3 Research in other sports . 16 2.3.1 Baseball . 16 2.3.2 Invasion sports . 21 3 Problem statement 24 3.1 Team production model . 24 3.1.1 Individual and team performance . 25 3.1.2 Player valuation in the context of the model . 26 3.2 High I - high II approach for evaluating football players . 27 3.2.1 Regularised plus/minus model . 28 3.2.2 Introducing mixed effects Markov chain model for player eval- uation . 29 4 Data 32 5 Signal and noise in goalscoring statistics 38 5.1 Background and motivation . 39 5.2 Data . 41 5.3 Methods . 41 i CONTENTS ii 5.3.1 Generalized Linear Mixed Model . 43 5.3.2 Shot counts . 44 5.3.3 Shots to goals conversion rates . 47 5.3.4 Predicting future performance . 49 5.4 Results . 50 5.4.1 Model fit for shot counts . 50 5.4.2 Model fit for shots to goals conversion . 54 5.4.3 Comparing characteristics of the model fits . 55 5.4.4 Goals predictions . 56 5.5 Discussion . 60 6 Adding context to passing analysis 62 6.1 Background and motivation . 63 6.2 Data . 64 6.3 Methods . 64 6.3.1 Factors influencing pass success and their proxies . 65 6.3.2 Generalized Additive (Mixed) Model . 67 6.3.3 Pass outcome model . 69 6.3.4 Prediction types . 71 6.4 Results . 73 6.4.1 GAMM estimation results . 73 6.4.2 Ease of pass . 77 6.4.3 Evaluating players . 81 6.4.4 Comparing predictive utility . 82 6.5 Discussion . 84 7 Individual player contribution to team success 86 7.1 Introduction . 86 7.2 Data . 88 7.3 Methods . 88 7.3.1 Time-homogeneous Markov chain with a finite state space . 88 7.3.2 Basic model of the game . 89 7.3.3 Player specific model of the game . 93 7.4 Results . 104 7.4.1 Basic model of the game . 104 7.4.2 Player specific model of the game . 105 CONTENTS iii 7.4.3 Evaluating predictive utility . 118 7.5 Discussion . 119 8 Conclusions 122 8.1 Summary of the findings . 122 8.2 Limitations of the study . 124 8.3 Recommendations for future work . 125 A Some results from the estimation theory 126 A.1 Generalized Linear Models . 126 A.2 Iteratively Weighted Least Squares . 127 A.3 Penalized Iteratively Weighted Least Squares . 128 A.4 Laplace approximation . 129 A.5 Fisher scoring . 130 B Breakdown of transition probabilities 131 B.1 Choice node . 131 B.2 Pass node . 132 B.3 Shot node . 134 C Sensitivity analysis 136 D Code 146 List of Figures 4.1 Players’ success rate at various actions in two consecutive seasons of the English Premier League. 33 4.2 Players’ pass success rate in two consecutive seasons of the English Pre- mier League depending on the location of the pass origin. 35 4.3 Contour plot of anticipated player positions in games of the 2006/07 season. 37 5.1 Goals per 100 minutes played for 2007/08 versus goals per 100 minutes played for 2006/07. 40 5.2 Histogram of shots per player per game, with fitted Poisson distribution. 47 5.3 Histogram of shots per player per game and model based frequencies. 53 5.4 Average conversion rate residuals against the number of shots for the basic and the extended shot conversion model . 55 5.5 Average (basic) model implied predictions of shot generation and con- version versus the average observed values per player in the fitting sample. 56 5.6 Goals scored per 100 minutes in season 2007/08 versus the (extend- ed/basic) model player predictions. 59 6.1 Time related component smooth functions on the scale of the linear pre- dictor of pass success. 74 6.2 Smooth function of the executing player’s average position in the pass success model . 75 6.3 Value of the linear predictor of pass success with respect to the location of the pass origin and its target. 76 6.4 Team random effects prediction in the pass success model. 77 6.5 Ease of pass. 78 6.6 Average observed 2007/08 pass completion rate against the naive and the model predictions. 79 iv LIST OF FIGURES v 6.7 Estimated players’ passing ability against the observed pass completion rate, a proxy for the ease of pass and the number of passes in the fitting sample. 81 6.8 Home team goals supremacy in fixtures of 2007/08 season against the difference in the average predictor of the pass completion rate for the home and the away team players in that fixture. 83 7.1 State location for the pass events in the Markov chain model. 89 7.2 Structure of the transition matrix of the basic Markov model of the game. 92 7.3 Structure of the transition matrix of the player specific Markov model of the game. 96 7.4 Estimate of the one step transition matrix in the basic Markov chain model.105 7.5 State transition from n + 1 to n + 2 depending on the team in possession at n + 0................................... 106 7.6 Histogram of the predicted values of player random effects in the shot conversion model. 107 7.7 Histograms of player specific shot conversion rates predictions depend- ing on where the shot is taken from. 108 7.8 Predicted probability of a pass being directed to a given zone depending on its origin and the nominal position of the executing player. 109 7.9 Predicted pass completion rate of an average player depending on the zone of origin and the targeted zone . 111 7.10 Histogram of the predicted values of player random effects in the pass completion model. 111 7.11 Distributions of the predicted player specific pass completion rate de- pending on the zone of origin and the targeted zone. 112 7.12 Expected and observed frequency of events per game in a given zone. 114 7.13 Expected and observed goals per game by team. 115 7.14 Expected and observed goal supremacy per game by team. 116 7.15 Estimated probability distribution of passes in Manchester United games with Cristiano Ronaldo or his average replacement in the team. 116 7.16 Estimated probability distribution of shots in Manchester United games with Cristiano Ronaldo or his average replacement in the team. 117 7.17 Expected goal supremacy above an average player in their position in season 2006/07. 117 C.1 Pitch division into 4, 5, 6 and 7 zones. 137 LIST OF FIGURES vi C.2 Estimate of the one step transition matrix in the basic Markov chain model (different pitch divisions). 138 C.3 Predicted probability of a pass being directed to a given zone depending on its origin and the nominal position of the executing player. 139 C.4 Predicted pass completion rate of an average player (a goalkeeper or not) depending on the zone of origin and the targeted zone. 140 C.5 Expected and observed frequency of passes per game in a given zone. 141 C.6 Expected and observed frequency of shots per game in a given zone. 142 C.7 Expected and observed goal supremacy per game by team. 143 C.8 Top 20 players by the expected goal supremacy above an average player in their position in season 2006/07. 144 D.1 Outline of the code. 152 List of Tables 4.1 Sample of the Opta events data. 32 5.1 Parameter estimates of the shots count models. 52 5.2 Likelihood ratio test for the basic and extended shot count models. 53 5.3 Parameter estimates of the conversion rate model. 54 5.4 Likelihood ratio test for the basic and extended shot conversion models. 54 5.5 Performance of the models and the naive method in predicting goals. 57 5.6 Predicted and actual 2007/08 goals per 100 minutes for the top 15 model predicted goals per 100 minutes scorers based on season 2006/07 data. 58 6.1 Covariates used to proxy factors influencing pass success. 67 6.2 Estimates of the parametric terms in the pass success model. 73 6.3 Top 5 passers by position based on 2006/07 season performance. 80 7.1 Translation of a sample.