An Elo-Based Approach to Model Team Players and Predict the Outcome of Games
Total Page:16
File Type:pdf, Size:1020Kb
AN ELO-BASED APPROACH TO MODEL TEAM PLAYERS AND PREDICT THE OUTCOME OF GAMES A Thesis by RICHARD EINSTEIN DOSS MARVELDOSS Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE Chair of Committee, Jean Francios Chamberland Committee Members, Gregory H. Huff Thomas R. Ioerger Head of Department, Scott Miller August 2018 Major Subject: Computer Engineering Copyright 2018 Richard Einstein Doss Marveldoss ABSTRACT Sports data analytics has become a popular research area in recent years, with the advent of dif- ferent ways to capture information about a game or a player. Different statistical metrics have been created to quantify the performance of a player/team. A popular application of sport data anaytics is to generate a rating system for all the team/players involved in a tournament. The resulting rating system can be used to predict the outcome of future games, assess player performances, or come up with a tournament brackets. A popular rating system is the Elo rating system. It started as a rating system for chess tour- naments. It’s known for its simple yet elegant way to assign a rating to a particular individual. Over the last decade, several variations of the original Elo rating system have come into existence, collectively know as Elo-based rating systems. This has been applied in a variety of sports like baseball, basketball, football, etc. In this thesis, an Elo-based approach is employed to model an individual basketball player strength based on the plus-minus score of the player. The plus-minus score is a powerful metric because it quantifies the contribution of a player like good defense, setting up screens, or sledging the opposite team, which are not reflected by metrics that are pri- marily based on points. Then, the individual player ratings are combined to obtain a team rating, Team rating are compared pairwise to obtain the probability of a win by each of the teams during a matchup. This method not only predicts wins/losses, but offers more information than the Elo rating system as ratings are assigned to each individual player instead of just considering teams. This information includes for example, the effect of mid-season transfers or the impact of injuries to team strengths; these items are overlooked by the standard Elo algorithm. The performance of the proposed Elo-based rating system is compared to that of the standard Elo rating system for basketball by using sythetic data. The rating systems are also compared by running them over real-life data from past NBA seasons. ii DEDICATION This dissertation is dedicated to my family, professors and friends for their guidance and encouragement which made this work possible. iii ACKNOWLEDGMENTS I would like to thank Dr. Jean Francois Chamberland for his support and guidance throughout the course of my thesis. This phase has helped me understand what research really is and I have learnt life-long lessons. I would also like to thank Dr. Gregory Huff and Dr. Thomas Iorger who offered me the freedom to venture into the domain of my interest. iv CONTRIBUTORS AND FUNDING SOURCES Contributors This work was supported by a thesis committee consisting of Professor Jean Francios Cham- berland and Professor Gregory H. Huff of the Department of Electrical and Computer Engineering and Professor Thomas R. Ioerger of the Department of Computer Science. All other work conducted for the thesis was completed by the student independently. v NOMENCLATURE +/- Plus-minus score NBA National Basketball Association NCAA National Collegiate Athletic Association FIDE Fed´ eration´ Internationale des Echecs´ USCF United States Chess Federation ATL Atlanta Hawks BOS Boston Celtics BKN Broklyn Nets CHA Charlotte Hornets CHI Chicago Bulls CLE Cleveland Cavaliers DAL Dallas Mavericks DEN Denver Nuggets DET Detroit Pistons GSW Golden State Warriors HOU Houston Rockets IND Indiana Pacers LAC Los Angeles Clippers LAL Los Angeles Lakers MEM Memphis Grizzlies MIA Miami Heats MIL Millwakee Bucks vi MIN Minnesota Timberwolves NOP New Orleans Pelicans NYK New York Knicks OKC Okhalohoma Thunder ORL Orlando Magic PHI Philadelphia 76ers PHX Phoneix Suns POR Portland Trailblazers SAC Sacramento Kings SAS San Antonio Spurs TOR Toronto Raptors UTA Utah Jazz WAS Washington Wizards vii TABLE OF CONTENTS Page ABSTRACT ......................................................................................... ii DEDICATION....................................................................................... iii ACKNOWLEDGMENTS .......................................................................... iv CONTRIBUTORS AND FUNDING SOURCES ................................................. v NOMENCLATURE ................................................................................. vi TABLE OF CONTENTS ........................................................................... viii LIST OF FIGURES ................................................................................. x LIST OF TABLES................................................................................... xii 1. INTRODUCTION AND LITERATURE REVIEW ........................................... 1 1.1 Rating system ............................................................................. 1 1.1.1 Rating system for basketball tournaments....................................... 1 1.2 Elo algorithm .............................................................................. 3 1.3 Mathematical Model ...................................................................... 3 1.3.1 K-factor ........................................................................... 4 1.3.2 Actual Score (Sij) ................................................................ 4 1.3.3 Expected Score (µij) ............................................................. 4 1.4 Home advantage........................................................................... 7 1.5 Update Algorithm ......................................................................... 8 2. PROPOSED HYPOTHESIS .................................................................... 9 2.1 Motivation ................................................................................. 9 2.1.1 Plus-Minus Score (+/-) ........................................................... 9 2.2 Elo-Based Approach ...................................................................... 10 2.2.1 Strength of a Player .............................................................. 11 2.2.2 Expected Value ................................................................... 11 2.2.3 K-factor ........................................................................... 12 2.2.4 F(x)................................................................................ 13 2.2.5 Point Conservation ............................................................... 13 2.2.6 Outcome of a game ............................................................... 16 2.3 Update Algorithm ......................................................................... 17 viii 3. DATASET ....................................................................................... 18 3.1 Real-life Data.............................................................................. 18 3.1.1 Data Scrapping ................................................................... 18 3.2 Synthetic Data ............................................................................. 18 3.2.1 Initialization ...................................................................... 20 3.2.2 Metric Calculation................................................................ 21 4. NUMERICAL SIMULATIONS................................................................ 22 4.1 Synthetic Data ............................................................................. 22 4.1.1 Elo Algorithm .................................................................... 22 4.1.2 Proposed Algorithm .............................................................. 22 4.2 Real-life Data.............................................................................. 23 4.2.1 Prediction Discrepancy........................................................... 24 4.2.2 Testing on the NBA 2017-18 season ............................................ 26 4.2.3 Testing using bracket scores ..................................................... 27 4.2.3.1 March Madness NCAA brackets ..................................... 28 4.2.3.2 Bracket score prediction using the standard Elo algorithm.......... 30 4.2.3.3 Bracket score prediction using the proposed algorithm ............. 30 4.3 Player Performance ....................................................................... 30 5. SUMMARY AND CONCLUSIONS .......................................................... 36 REFERENCES ...................................................................................... 37 ix LIST OF FIGURES FIGURE Page 1.1 Example of the statistics collected for an NBA game, reprinted from [1]. ............. 2 1.2 This figure is the distribution of player strengths with player 1 having an average rating of 1500; and player 2, a rating 1900............................................... 5 1.3 This figure represents the distribution of difference in player strengths with player 1 having an Elo rating of 1500 and player 2 with an Elo rating 1900 respectively. .... 6 1.4 Comparsion of logisitc distrbution and normal distribution............................. 6 2.1 The