Quarterback Evaluation in the National Football League
Total Page:16
File Type:pdf, Size:1020Kb
Quarterback Evaluation in the National Football League by Matthew Reyers B.Sc., Simon Fraser University, 2018 Project Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in the Department of Statistics and Actuarial Science Faculty of Science c Matthew Reyers 2020 SIMON FRASER UNIVERSITY Summer 2020 Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation. Approval Name: Matthew Reyers Degree: Master of Science (Statistics) Title: Quarterback Evaluation in the National Football League Examining Committee: Chair: Gary Parker Professor Tim Swartz Senior Supervisor Professor Derek Bingham Committee Member Professor Harsha Perera External Examiner Lecturer Date Defended: August 20, 2020 ii Abstract This project evaluates quarterback performance in the National Football League. With the availability of player tracking data, there exists the capability to assess various options that are available to quarterbacks and the expected points resulting from each option. The quarterback’s execution is then measured against the optimal available option. Since decision making does not rely on the quality of teammates, a quarterback metric is introduced that provides a novel perspective on an understudied aspect of quarterback assessment. Keywords: Sports Analytics, Expected Points, Machine Learning, Model Validation, Player Tracking Data iii Dedication In loving memory of my grandfather, Clarke Akeroyd iv Acknowledgements This project would not have been possible without the support and mentorship of my supervisor Dr. Tim Swartz. His effort and patience have helped propel me through my graduate studies and my project. Despite an ever increasing workload, Dr. Swartz has been a stalwart in the creation of this work. I am beyond grateful. Further, an enormous thank you to Dr. Michael Lopez of Skidmore College and the NFL for his role in developing my fascination with Sports Analytics and in the facilitation of this project. All tracking data made available for this project comes by way of him and the NFL Big Data Bowl, the event that inspired this project. Thank you to the wonderful instructors I have had the chance to work with closely during my graduate studies including Dr. Derek Bingham, Dr. Richard Lockhart, Dr. Jinko Graham, Dr. Rachel Altman, Dr. Tom Loughin, and Ian Bercovitz. I also want to thank Charlene Bradbury, Sadika Jungic, Jay Young, and Kelly Jay for their undeniable ability to keep me organized and informed. From outside the department, a thank you to Dr. Tamon Stephen for his guidance and support. To my colleagues, thank you for making my time at SFU memorable. The lunch meetings, the late night discussions, and the ever expanding list of projects has only enhanced the experience. I am appreciative for all the time we had together. Finally, the biggest thank you of all to my family for their love, support, and laughter. No matter the place in sports analytics, no team is stronger than family! 1 Chapter 1 Introduction The National Football League (NFL) is the top revenue league in world sport (Raul 2016) with an average team revenue of $453,000,000 in the 2017 season (Gough 2018). Despite the big money nature of the NFL, football analytics trails some of the other “big” professional sports including basketball (the National Basketball Association), soccer (major European leagues) and baseball (Major League Baseball). For a survey of some of the work that has been done in sports analytics, see Albert, Glickman, Swartz and Koning (2017). The analytics landscape in the NFL is beginning to change as Next-Gen-Stats’ player tracking data was made available to all 32 NFL teams in 2019. Player tracking data is detailed spatio- temporal data where the locations of each player on the field are recorded 10 times per second. This type of data leads to analytics opportunities that were previously unthinkable in the era of boxscore data. Subsets of the data have been released by the NFL in a yearly competition known as the Big Data Bowl (https://operations.nfl.com/the-game/big-data-bowl/) which is an analytics event held in conjunction with the NFL Scouting Combine. The availability of the player tracking data has led to a flurry of recent NFL analytics research and includes Burke (2019), Chu et al. (2020), Deshpande and Evans (2020), Yam and Lopez (2020) and Yurko et al. (2020). A traditional NFL statistic that is widely reported and is endorsed by the NFL is the quarterback passer rating. In the 2019 season, Patrick Mahomes of the Super Bowl Cham- pion Kansas City Chiefs was one of the top quarterbacks in the NFL with a rating of 105.3 (see www.nfl.com/stats/categorystats?statisticCategory=PASSING). The quarterback passer rating (Zilavy 2018) is a complex formula for which there is a minimum rating of 0 and a maximum rating of 158.3. The components of the formula involve aspects of passing perfor- mance (e.g. completions, passing yards, interceptions, etc.) that all make “football sense”. Yet, these components are combined in an ad-hoc manner to yield a rating that does not admit straightforward interpretation. It is a curious fact that amongst the top 12 quarterbacks in 2019 (according to passer rating), all 12 of these quarterbacks played for the 12 NFL playoff teams. An immediate reaction to this observation is that a team must have an outstanding quarterback in order to make the playoffs. 2 Another explanation is that a quarterback’s rating is highly dependent on the quality of his team. This is the motivation for our research; we attempt to introduce a quarterback metric which is less dependent on the performance of one’s teammates. It is possible that there are some excellent quarterbacks who play on weaker teams and their worth is not fully appreciated. To investigate quarterback performance, we use the player tracking data previously men- tioned. On each passing down, we identify options that are available to the quarterback. This is a novel investigation as it requires the enumeration of options (both passing and running) prior to the quarterback’s actual decision, and even options that may have eventuated after his decision. Then, each of these options are assessed an EPV (expected point value) depending on the state of the game. Further, probabilities of the successful execution of these options are esti- mated. These components then allow us to formulate a metric which compares actual outcomes versus optimal options. An advantage of this approach is that the quarterback metric is less dependent on one’s teammates - some quarterbacks will have better options than other quar- terbacks, and the metric is formulated such that a quarterback’s performance is only compared against his available options. Another advantage is that the metric takes running into account unlike the traditional quarterback passing rating. Clearly, the running abilities of quarterbacks such as Russell Wilson, Lamar Jackson, Deshaun Watson and Michael Vick have been a great benefit to their respective NFL teams. In terms of completing a catch, there have been various investigations (Yurko et al. 2020, Deshpande and Evans 2020) and vendors (Next Gen Stats Team 2018) that provide probabilities of pass completion. These approaches typically look at the circumstances at the time of the catch (e.g. the location of defenders, where the ball is thrown, etc.). We emphasize an important novelty of our approach where completion probability is assessed at the time that the ball is thrown - this involves more uncertainty as it is unclear how the play will develop downfield. The only closely related approach to our work is that by Burke (2019). Burke (2019) uses neural networks to predict the targeted receiver. The covariates chosen are different from ours and consideration is only given to passing options at the instant that the pass is made. In assessing quarterbacks, Burke (2019) uses expected yards; we believe that EPV is a more relevant measure in football since it incorporates context. For example, gaining 7 yards on first down and 10 yards is a much better outcome than gaining 7 yards on third down and 10 yards. Unlike Burke (2019), we also introduce some stochasticity in our approach; this facilitates error assessment. Importantly, our approach also takes into account non-designed quarterback runs and interceptions; intercepted passes form a critical component of the outcome of matches. In Section 2, we begin with a broad overview of our approach. The basic idea is that we identify options that are available to a quarterback, we assign value to these options, and we then compare the actual results versus the optimal options. This results in a quarterback rating that is less dependent on teammates. In Section 3, details of the procedure are provided. We describe the player tracking data, propose covariates and then use machine learning algorithms to determine the probabilities associated with the options available to the quarterback. The 3 probabilities are validated against fresh data. In Section 4, the methods are applied and ratings are obtained for NFL quarterbacks. The ratings generally agree with popular opinion although they reveal some surprises; there are some quarterbacks held in high esteem who are not rated so highly, and vice-versa. We consider a discussion of results in Section 5, comparing our evaluation criterion with a process-based alternative. We conclude with a brief summary of outcomes and future considerations in Section 6. Further, the work presented in this MSc project is an expansion of Reyers and Swartz (2020). 4 Chapter 2 Overview of the Approach Consider a particular quarterback and all of his passing and running options on a play that was not a designed run. For the ith play, the quarterback executes a decision at time ti.