Shrinkage Estimation of NFL Field Goal Success Probabilities
Journal of Sports Analytics 3 (2017) 129–146
DOI 10.3233/JSA-16140
IOS Press

Jason A. Osborne^a,∗ and Richard A. Levine^b
^a Department of Statistics, North Carolina State University, NC, USA
^b Department of Mathematics and Statistics, San Diego State University, CA, USA

Abstract. National Football League (NFL) kickers have displayed improvement in both range and accuracy in recent years. NFL management in turn has displayed a rather low tolerance for missed field goals, particularly in game-deciding situations. However, these actions may be a consequence of a perceived appreciable variability in NFL kicker ability. In this paper, we consider shrinkage estimation of NFL kicker field goal success probabilities. The idea derives from the literature on estimating batting averages in baseball, though the classic James-Stein shrinkage approaches there do not apply to independent binary field goal attempt trials. We study a variety of weighting schemes for shrinking model-based kicker-specific field goal success probability estimates to a league-wide estimate, as a function of distance. As part of the development, we briefly detail collecting NFL play-by-play data with the R statistical software package, identify the complementary log-log link function as preferable to the more commonly applied logit link function in a generalized linear model for field goal success, and demonstrate the desired variance-reduction, both in and out of sample, enjoyed by our proposed shrinkage estimators. We illustrate our methods by ranking NFL kickers from 1998 to 2014, analyzing individual kicker success at mid-range and long-range field goal attempts, and studying kicker ability over the last decade. Stadium effects are added to the model and found to be highly significant and to have an impact on the kicker rankings.
Keywords: NFL kicker rankings, generalized linear models, weighted average, big leg kickers, web scraping, stadium effects

∗ Corresponding author: Jason A. Osborne, Department of Statistics, North Carolina State University, Box 8203, NCSU, Raleigh, NC, USA. Tel.: +1 919 515 1922; E-mail: [email protected].

ISSN 2215-020X/17/$35.00 © 2017 – IOS Press and the authors. All rights reserved. This article is published online with Open Access and distributed under the terms of the Creative Commons Attribution Non-Commercial License (CC BY-NC 4.0).

1. Introduction

Given the high frequency of personnel transactions involving field goal kickers in the National Football League (NFL), there appears to be a perception of considerable variability among their abilities to carry out their duties, at least among general managers. In particular, when kickers miss field goals or extra points that are perceived to cost their team a game, they are sometimes terminated shortly thereafter, presumably on the reasoning that a replacement would have a better chance of success from the same distance. Consider the 2010 New Orleans Saints. In Week 3, Garrett Hartley missed a 29 yard field goal attempt in an overtime loss. He was replaced by veteran John Carney, who in turn missed a 29 yard field goal attempt in another loss two weeks later. Hartley then regained his job the following week. The Saints are not the only team with a quick trigger on in-season kicker personnel decisions. Table 1 summarizes the number of kickers who have been released, for reasons not listed as injury, over the 2000-2014 timeframe. On average, there are more than 3 such kickers released per season. These counts were assembled by inspection of season summaries of kickers found on and subsequent post-hoc inspection of various websites that track transactions.

Table 1
Number of mid-season performance-based kicker changes over time

Year                       2001-2003   2004-2006   2007-2009   2010-2012   2013-2015
Number players released    12          10          11          6           10

Quantitative analyses (Morrison and Kalwani, 1993) have found surprisingly limited evidence of differences among kickers. In an investigation of weather and distance effects on kicking, Bilder and Loughin (1998) choose not to model individual kicker effects, describing kickers as “interchangeable parts.” In a study of whether “icing the kicker” affects kickers (it does!), Berry and Wood (2004) also assume the dependence of the log-odds of success on distance to be the same for all kickers. If there are differences among kickers in the way they are affected by field goal attempts of increasing distance, these effects may be subtle enough to go undetected without large samples. Due to these subtleties, estimation of success probabilities for particular kickers can be a difficult task.

We consider the problem of estimating probabilities of field goal success using play-by-play data available on the web. As with analogous investigations in the literature, we develop generalized linear models which assume kicks to be independent Bernoulli trials. However, we show that the complementary log-log (CLL) link provides a better fit than the logistic regression models typically applied in prior work. In models that include kicker effects, estimates of field goal success for inexperienced kickers may have high sampling variance due to small samples. We investigate a number of shrinkage techniques that borrow information from field goal attempts by other kickers in an effort to reduce the sampling variance in these kicker-specific estimates.

The textbook by Morris and Rolph (1981) uses the problem of estimating field goal success probabilities after regression on distance (grouped into intervals with 10 yard widths) as an introduction to logistic regression. Berry and Berry (1985) develop a generalized linear model for field goal and extra point success probabilities using a customized link function that is derived from consideration of three factors: distance, left-right error, and the chance of a non-goalpost related error (a fumble or blocked kick). Other work investigating additional factors beyond distance that may be associated with field goal success probabilities, such as weather and psychological pressure, includes Bilder and Loughin (1998), Pasteur and Cunningham-Rhoads (2014), Berry and Wood (2004), and Clark et al. (2013). All of these contributions are in agreement that the single most important factor in predicting field goal success is the distance of the attempt. Morrison and Kalwani (1993) fail to reject the hypothesis of equal success probability among all kickers. Pasteur and Cunningham-Rhoads (2014) use cross-validation to select variables and find that adding distance to an intercept-only model decreases a predictive mean squared error criterion by 14%, but that adding an additional 7 variables (including temperature, wind, kicker fatigue, defense quality, and whether the kick was in Denver) brings about a further reduction that is less than 2%. Since many failed extra point attempts are a consequence of poor snaps or holds, we restrict our attention to field goal attempts and do not consider extra point success probabilities or data in our analyses.

Our aim in this paper is not to select variables for the best model, nor to account for the various subtle (in comparison to distance) effects of weather or psychological pressure, but rather to demonstrate the potential advantages of shrinking regression-based estimators towards a central value with the goal of variance-reduction. Indeed, this work is motivated by the success of James-Stein shrinkage techniques (Efron and Morris, 1975; Brown, 2008) for the problem of predicting batting averages in baseball.

In Section 2, we describe our data collection procedure and consider data-driven selection of the link function in a generalized linear model for field goal success. In Section 3, we consider a variety of shrinkage estimators for the kicker and distance-specific success probabilities as functions of model parameters. In Section 4, we illustrate several applications of our shrinkage estimators, including a ranking of kickers and a discussion of long field goal attempts and stadium effects. We present a summary and conclude in Section 5.

2. Data and statistical models

2.1. Data collection: web scraping with R

Like most other work in the area, we assume kicks to be independent and model the dependence of success probability on explanatory variables using generalized linear models. The explanatory information we consider includes only the distance of the attempt and who the kicker is. Using the readLines() command and regular expressions functionality in the R statistical software package (R Core Team, 2014), the distance of a field goal attempt is gleaned from the website with url . This website has complete play-by-play information going back as far as 1998. More specifically, the string “yard field goal” is matched and the kicker and distance occurring in the string before the match are recorded along with the outcome that follows the match. Every field goal attempted, even those on which a penalty occurred, but for which a result was given in the play-by-play transcript, is included in the dataset. The task loops over seasons 1998-2014, weeks within season (including the NFL playoffs), games within week, and finally plays

The records online do not easily allow for determination of blame. An example of an entry that was scraped from the url (http://www.pro-football-reference.com/boxscores/199809060cin.htm) (a game from week 1 of the 1998-1999 season) appears in Fig. 1.

Fig. 1. Play-by-play item containing relevant information on a field goal attempt.

Occasionally, comparisons of total kick frequencies computed from play-by-play data with kicker career totals will reveal slight discrepancies, where certain plays are absent from the pro-football-reference play-by-play information.
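The matching step described in this subsection (locate the string "yard field goal", read the kicker and distance that precede it, and read the outcome that follows it) can be illustrated with a minimal sketch. The authors worked in R with readLines() and regular expressions; the Python analogue below is not their code. The layout of a play description, the outcome wording ("good" / "no good"), the pattern, the function name parse_field_goal, and the sample lines are all assumptions made for illustration, and the actual play-by-play text may be formatted differently.

```python
import re

# Assumed layout of a play description (hypothetical, not taken from the site):
#   "<Kicker Name> <distance> yard field goal <good|no good>"
FG_PATTERN = re.compile(
    r"(?P<kicker>[A-Z][\w.'-]*(?: [A-Z][\w.'-]*)+) "  # text before the match: kicker name
    r"(?P<distance>\d{1,2}) yard field goal "         # distance, then the matched string
    r"(?P<outcome>good|no good)"                      # text after the match: outcome
)


def parse_field_goal(description):
    """Return (kicker, distance, success), or None if the line is not a field goal attempt."""
    match = FG_PATTERN.search(description)
    if match is None:
        return None
    return (
        match.group("kicker"),
        int(match.group("distance")),
        match.group("outcome") == "good",
    )


# Made-up play descriptions in the assumed layout.
sample_lines = [
    "Pat Example 37 yard field goal good",
    "Sam Sample 52 yard field goal no good",
    "Jane Roe pass complete for 8 yards",
]

# Keep only the lines that describe a field goal attempt.
attempts = [rec for rec in map(parse_field_goal, sample_lines) if rec is not None]
```

In a full scraper this function would sit inside the nested loop over seasons, weeks, and games, applied to each line of a downloaded play-by-play page.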