Sports Analytics Algorithms for Performance Prediction
Chazan – Pantzalis Victor
SID: 3308170004
Supervisor: Prof. Christos Tjortjis
Supervising Committee Members: Dr. Stavros Stavrinides, Dr. Dimitris Baltatzis
SCHOOL OF SCIENCE & TECHNOLOGY
A thesis submitted for the degree of Master of Science (MSc) in Data Science
DECEMBER 2019
THESSALONIKI – GREECE

Abstract

Sports Analytics is not a new idea, but the way it is implemented nowadays has revolutionized how teams, players, coaches and general managers, as well as reporters, betting agents and ordinary fans, look at statistics and at sports. In recent years, Machine Learning has also come to dominate business, and even society, through its technological innovation. Applications with machine learning algorithms at their core have produced implementations that much of everyday life now depends on. Inevitably, Machine Learning is also used in Sports Analytics. The most common applications of machine learning in sports analytics concern injury prediction and prevention, player evaluation in terms of potential skills or market value, and team or player performance prediction. The last of these is the problem that the present dissertation addresses. This dissertation is the final part of the MSc in Data Science, offered by the International Hellenic University.

Acknowledgements

I would like to thank my Supervisor, Professor Christos Tjortjis, for his valuable help in establishing the guidelines of the project, making essential comments and offering effective suggestions on issues that emerged. I would also like to thank him for his promptness and cooperation throughout the whole process.
I would like to thank my family for their support and their patience during the past few months. They have done everything possible for me and I could not have made it without them. This dissertation is dedicated to the little guy I do everything for, my greatest inspiration, my son, Vasilis.

“Research is to see what everybody else has seen and to think what nobody else has thought.”
Albert Szent–Gyorgyi

Chazan–Pantzalis Victor
25–11–2019

Contents

ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
1 CHAPTER 1
  INTRODUCTION
2 CHAPTER 2
  2.1 HISTORICAL BACKGROUND
    2.1.1 Baseball
    2.1.2 Tennis
    2.1.3 American Football
    2.1.4 Basketball
    2.1.5 Motorsports (Formula 1)
    2.1.6 Football (Soccer)
  2.2 LITERATURE REVIEW
    2.2.1 Definitions and Data Composition
    2.2.2 Game Result Predictive Models
    2.2.3 Game Result Comparative Models
    2.2.4 Rating Systems
    2.2.5 Expected Goals (xG) Models
    2.2.6 Long–Term Prediction Models
    2.2.7 Pass Effectiveness Models, Networks of Passes and Spatiotemporal Data
    2.2.8 Cameras and Wearable Devices
    2.2.9 Player Performance Prediction
    2.2.10 Player Injuries Prediction
    2.2.11 Uncertainty of Outcome, Competitive Balance and Competitive Intensity
    2.2.12 Outstanding Previous Work in other Sports
3 CHAPTER 3
  3.1 GENERAL TERMS
    3.1.1 Machine Learning
    3.1.2 Machine Learning Algorithm
    3.1.3 Data Mining
    3.1.4 Data Analysis
    3.1.5 Sport Analytics
    3.1.6 Performance Prediction in Sports
  3.2 ALGORITHMS AND TOOLS
    3.2.1 Decision Trees
    3.2.2 Random Forests
    3.2.3 Support Vector Machines (SVM)
    3.2.4 Linear Regression
    3.2.5 Neural Networks
    3.2.6 Jupyter Notebook
    3.2.7 Weka
4 CHAPTER 4
  4.1 PROBLEM DEFINITION
  4.2 APPROACH FOLLOWED
5 CHAPTER 5
  5.1 EXPERIMENTS
    5.1.1 1st Experiment: Team Performance Prediction
    5.1.2 2nd Experiment: Player Performance Prediction
  5.2 EVALUATION OF RESULTS
6 CHAPTER 6
  6.1 CONCLUSIONS
  6.2 FUTURE WORK
REFERENCES

List of Figures

Figure 1 – Branch Rickey formula
Figure 2 – MLB 2002 team salaries
Figure 3 – SportVU
Figure 4 – Houston Rockets 2019 shot chart
Figure 5 – Three points shot frequency
Figure 6 – Pit stop strategies
Figure 7 – Game statistics
Figure 8 – Live coverage game statistics
Figure 9 – Number and proportion of sport analytics journal articles
Figure 10 – pi–ratings update process
Figure 11 – Pass network