Evaluating NBA Shooting Ability using Shot Location

Dennis Lock December 16, 2013

There are many statistics that evaluate the performance of NBA players, including some that attempt to measure a player's shooting ability (FG%, etc.). Most of the common shooting performance metrics fall short in that they do not account for the fact that some shots are more difficult than others. This paper demonstrates a method that uses spatial information to evaluate shooting ability while accounting for the difficulty of the shots taken based on location.

Articles have been written examining NBA shooting on the spatial level; however, the majority use spatial data to examine tendencies in specific shooters (Hickson (2003), Reich (2006)). Recently Kirk Goldsbury wrote a column on Grantland evaluating shooters by separating the NBA court into regions and measuring the effectiveness of a player in each individual region. While the concept of my approach is similar to Goldsbury's, it differs in that locations are treated separately rather than combined into regions, and the resulting metric measures overall shooting.

Examining how the difficulty of shots varies by location can help rule out some of the more common models. Figure 1 shows empirical estimates of shooting percentage by distance and by (x,y) coordinates from all shots during the 2012-13 NBA regular season. The trend in Figure 1(a) indicates that a logistic regression with a distance covariate will not fit the data well. Regardless, because of extrapolation, individually fit logit curves tend to overestimate the shooting ability of players with no long shots. Figure 1(b) also indicates that there may be information in the (x,y) coordinates beyond distance, especially close to the basket, that our model should account for. The natural first model to try for spatial point data may be a Kriging model; however, one of the major assumptions of Kriging is stationarity. Referring to Figure 1(b) again, this does not appear to be a reasonable assumption here. Specifically, the effect of a spatial distance measure is far greater for observations close to the basket than for observations in the rest of the sample space.
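The empirical estimates described above can be sketched as follows. This is a minimal illustration, not the code used for Figure 1: the `(x, y, made)` record layout, the function names, the one-foot distance bins, and the toy shots are all assumptions; only the basket location (25, 4) comes from the Figure 1 caption.

```python
import math
from collections import defaultdict

BASKET = (25, 4)  # basket location in feet, taken from the Figure 1 caption


def fg_pct_by_distance(shots, bin_width=1.0):
    """Empirical make rate per distance bin.

    `shots` is an iterable of (x, y, made) records (assumed layout).
    Returns {bin_start_in_feet: make_rate}.
    """
    made = defaultdict(int)
    taken = defaultdict(int)
    for x, y, is_made in shots:
        dist = math.hypot(x - BASKET[0], y - BASKET[1])
        b = int(dist // bin_width)
        taken[b] += 1
        made[b] += int(is_made)
    return {b * bin_width: made[b] / taken[b] for b in sorted(taken)}


def fg_pct_by_cell(shots):
    """Empirical make rate per integer (x, y) grid cell."""
    made = defaultdict(int)
    taken = defaultdict(int)
    for x, y, is_made in shots:
        taken[(x, y)] += 1
        made[(x, y)] += int(is_made)
    return {cell: made[cell] / taken[cell] for cell in taken}


# Toy records for illustration only (not real NBA data).
shots = [(25, 5, 1), (25, 5, 1), (25, 5, 0),
         (25, 27, 0), (25, 27, 1), (2, 4, 0)]
by_dist = fg_pct_by_distance(shots)
by_cell = fg_pct_by_cell(shots)
```

In the toy data the cells (25, 27) and (2, 4) are both 23 feet from the basket, so they share one distance bin while keeping separate per-cell rates. That is the kind of information beyond distance that a distance-only covariate would discard.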

The model fit here is more closely related to an areal data modeling approach, exploiting the fact that each (x,y) coordinate location is recorded as exact integers (in feet), making a grid of possible locations. The specific model fit is an alteration of an autologistic model (Heikkinen (1994)), and is presented below.

$$Y_{ij} \sim \mathrm{Bern}(p_{ij}), \qquad \mathrm{logit}(p_{ij}) = \phi\,[\mathrm{logit}(\bar{p}_{ij})] + \beta_j$$

$$\bar{p}_{ij} = \frac{\sum_{lk \in N_{ij}} y_{lk}}{n_{ij}}, \qquad \beta_j \sim \mathrm{Normal}(0, \sigma_\beta^2)$$

with $y_{ij} = 1$ if the shot resulted in a basket, $N_{ij}$ the neighborhood for observation $ij$, $n_{ij}$ the number of observations in the neighborhood of $ij$, and $\beta_j$ a random component indicating which player took the shot. The neighborhood for observation $ij$ is determined as shots from all locations sharing a border or corner (so most interior locations have 8 neighbors), as well as shots from the same location. Plotting the $\bar{p}$ values in Figure 2, we see a bimodal distribution of probabilities, which coincides with what would be expected from observing Figure 1(b). We place fairly non-informative, but positive restricting, priors on φ, the parameter associated with $\bar{p}$, and on $\sigma_\beta^2$, the variance of the random component for players:

$$\phi \sim G(1, 1), \qquad \sigma_\beta^2 \sim IG(1, 1).$$
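The neighborhood average p̄ defined above can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation behind Figure 2: the `(x, y, made)` record layout and the function name are hypothetical, and the sketch assumes every shot at a location (including the shot being modeled) counts toward that location's neighborhood, so p̄ is computed once per grid cell.

```python
from collections import defaultdict


def neighborhood_p_bar(shots):
    """Return {(x, y): p_bar} for every grid cell with at least one shot.

    The neighborhood of a cell is the cell itself plus the (up to) 8 cells
    sharing a border or corner with it; p_bar is makes / attempts pooled
    over that neighborhood.
    """
    made = defaultdict(int)
    taken = defaultdict(int)
    for x, y, is_made in shots:
        taken[(x, y)] += 1
        made[(x, y)] += int(is_made)

    p_bar = {}
    for (x, y) in list(taken):
        n = 0  # n_ij: attempts in the neighborhood
        s = 0  # sum of y_lk over the neighborhood
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cell = (x + dx, y + dy)
                n += taken.get(cell, 0)
                s += made.get(cell, 0)
        p_bar[(x, y)] = s / n
    return p_bar


# Toy records for illustration only (not real NBA data).
shots = [(10, 10, 1), (10, 10, 0), (11, 11, 1), (12, 12, 0), (20, 20, 1)]
p_bar = neighborhood_p_bar(shots)
```

An isolated cell such as (20, 20) gets a p̄ of exactly 0 or 1 when all of its few shots agree, which makes logit(p̄) infinite. A practical fit would need some handling of that case (for example, clipping p̄ away from 0 and 1), which the text does not specify.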

Performance of this model is demonstrated on players from the NBA champion Miami Heat. Posterior plots for the parameters φ and $\sigma_\beta^2$ are shown in Figure 3. We would expect φ to be around 1 by design; however, the median is φ = 1.13 with the model fit to the Heat players, possibly indicating that Miami shooters as a whole are slightly more affected by location than the league average. Note that φ works decently as a sensitivity-to-distance parameter, since the logit transformation splits the two modes in the distribution of $\bar{p}$ into positive and negative values.
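To make the role of φ and the priors concrete, the unnormalized log posterior implied by the model and priors stated earlier can be written out as below. This is a sketch, not the sampler used for Figure 3: the function names and the `(p_bar, player, made)` record layout are assumptions, G(1, 1) and IG(1, 1) are read as Gamma and inverse-Gamma with shape 1 and rate/scale 1, and p̄ is assumed to lie strictly between 0 and 1.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def log1p_exp(eta):
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def log_posterior(phi, sigma2, beta, shots):
    """Unnormalized log posterior of (phi, sigma2, beta).

    beta  : dict mapping player -> random effect beta_j
    shots : iterable of (p_bar, player, made) with 0 < p_bar < 1
    """
    if phi <= 0 or sigma2 <= 0:
        return -math.inf  # both priors restrict to positive values
    lp = -phi                                      # phi ~ Gamma(1, 1)
    lp += -2.0 * math.log(sigma2) - 1.0 / sigma2   # sigma2 ~ IG(1, 1)
    for b in beta.values():                        # beta_j ~ Normal(0, sigma2)
        lp += -0.5 * math.log(2 * math.pi * sigma2) - b * b / (2 * sigma2)
    for p_bar, player, made in shots:              # Bernoulli likelihood
        eta = phi * logit(p_bar) + beta[player]
        lp += made * eta - log1p_exp(eta)
    return lp


# logit maps the low-probability mode to negative values and the
# high-probability mode to positive values, so phi scales both away from 0.
low_mode, high_mode = logit(0.35), logit(0.65)
example = log_posterior(1.0, 1.0, {"a": 0.0}, [(0.5, "a", 1)])
```

Because logit(p̄) changes sign at p̄ = 0.5, a value of φ above 1 pushes difficult locations further down and easy locations further up on the log-odds scale, which matches the interpretation of φ as a sensitivity parameter.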

Table 1 shows results for the random component β after transformation for each of the Miami shooters (who took at least 100 shots), ordered by median value. The number of shots taken and raw shooting percentage (FG%) are also provided for each player. Note that the top three shooters by median β, and the only three players with credible intervals entirely above 0, are the players known as the “Big 3”, who are by far the highest paid players on the team. The next four shooters are all smaller guards known for shooting 3-point shots, which results in positive β values despite their relatively low FG%. Chris Andersen, the player with the highest FG% on the team, actually has a negative median β, most likely because Andersen only takes high-probability shots.

Player            Shots   FG%     Median β (95% credible interval)
LeBron James      1344    0.564    0.090 ( 0.061,  0.117) *
[name missing]     906    0.535    0.076 ( 0.043,  0.108) *
[name missing]    1082    0.525    0.044 ( 0.013,  0.072) *
Mike Miller        230    0.430    0.043 (-0.017,  0.103)
[name missing]     358    0.425    0.039 (-0.010,  0.094)
[name missing]     642    0.449    0.026 (-0.011,  0.063)
[name missing]     525    0.432    0.018 (-0.031,  0.060)
[name missing]     249    0.506    0.001 (-0.055,  0.062)
[name missing]     246    0.506   -0.001 (-0.071,  0.066)
Chris Andersen     123    0.577   -0.032 (-0.106,  0.062)
[name missing]     419    0.418   -0.060 (-0.110, -0.007) *

Table 1: Miami Heat players' shooting percentage and transformed β values (with 95% credible interval); * indicates the credible interval does not contain 0.
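Each row of Table 1 summarizes a set of posterior draws by a median and a 95% credible interval. A minimal sketch of that summary step follows, assuming an equal-tailed interval (the text does not say which kind is used); the function names are hypothetical and the draws are synthetic, generated only to exercise the code.

```python
import random
import statistics


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize(draws, level=0.95):
    """Median, equal-tailed credible interval, and whether it excludes 0."""
    vals = sorted(draws)
    tail = (1 - level) / 2
    lower = quantile(vals, tail)
    upper = quantile(vals, 1 - tail)
    return {
        "median": statistics.median(vals),
        "lower": lower,
        "upper": upper,
        "excludes_zero": lower > 0 or upper < 0,
    }


# Synthetic draws for illustration only (not output from the fitted model).
rng = random.Random(0)
draws = [rng.gauss(0.09, 0.014) for _ in range(10_000)]
summary = summarize(draws)
```

A player is flagged in the table when `excludes_zero` is true, whether the interval lies entirely above 0 or entirely below it.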

Comparing FG% to β values for all players in the league, we see a positive relationship (Figure 4), but still a wide array of possible β values at similar FG%. The correlation is presented in Table 2, as well as the correlation between free throw percentage (FT%) and both metrics. One of the easiest arguments against FG% as a shooting metric is that it is negatively correlated with FT%. By contrast, median β values are significantly positively correlated with both FG% and FT%. Due to computational challenges, and to obtain a fair metric across teams, the β values used here are fit individually for each player with φ fixed at 1.

        FG%      FT%      β
FG%     1       -0.192    0.653
FT%    -0.192    1        0.287
β       0.653    0.287    1

Table 2: Correlation matrix showing the relationship between FG%, FT%, and β from all players (with at least 100 shots).
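A correlation matrix like Table 2 can be computed from one value per player for each metric. The sketch below assumes Pearson correlation (the text does not name the coefficient); the function names are hypothetical and the per-player values are made-up toy numbers, not the league data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for {name: values}."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Toy per-player values for illustration only (not real NBA data).
metrics = {
    "FG%": [0.56, 0.43, 0.58, 0.45],
    "FT%": [0.75, 0.82, 0.68, 0.80],
    "beta": [0.09, 0.04, -0.03, 0.03],
}
corr = correlation_matrix(metrics)
```

The matrix is symmetric with ones on the diagonal, so only the three off-diagonal pairs carry information, as in Table 2.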

Shortcomings of this model include an inability to determine where a shooter is successful, since we obtain a single value to evaluate a shooter's overall ability to make shots. To determine how or where one shooter is better than another, the approach by Goldsbury would outperform this approach, as it specifically measures where a player is successful. Additionally, in both approaches shot difficulty is measured only by location, ignoring the possibility that two shots from the same location could differ greatly in difficulty because of a number of unobserved variables. Also, as noted previously, computational challenges arise in attempts to fit the model to all shooters at once, due to the large number of shooters and the very large number of shots.

While it is difficult to fully evaluate the worth of a shooting ability model since a comparative metric does not exist, my model appears to agree with conventional thinking. That is, the players most people would deem the best scorers on the Heat have the largest positive parameter values, and the values from all players are positively related to both FG% and FT%.


Figure 1: (a) Shooting percentage by distance (dotted line is start of 3 point arc), (b) shooting percentage heatmap using (x,y) coordinates, the basket is at (25,4).

Figure 2: Histogram of $\bar{p}$ values from all shots of the 2012-13 season.


Figure 3: Posterior plots from 10,000 draws of (a) φ and (b) $\sigma_\beta^2$.

Figure 4: Comparing FG% and median β values from all shooters who took at least 100 shots.
