Dynamic Player Significance (DPS): A new comprehensive statistic

David Hill
Media Lab, Massachusetts Institute of Technology
Cambridge, Massachusetts, USA

15.071 (The Analytics Edge) Final Project May 13, 2013

Abstract

Dynamic Player Significance (DPS) is a new basketball metric designed to measure each NBA player's importance, or significance, to his franchise. The metric attempts to define each player's role on the team and how it fits with the team's identity. Its key feature is that it depends on the on-court identity of each franchise, defined as the collection of factors that contribute most to a win by that team. These factors differ for every NBA team. Therefore, two players with identical stats and skill sets but on different teams will most likely have different significance values. Likewise, if a player is traded from one team to another, his significance will differ on the new team even if his production remains constant; hence, dynamic player significance. Here, I break down the components of the model and explore three case studies that show how teams' identities differ. I also explore player evaluations to show tendencies in the model under multiple conditions. The proposed statistic could help inform team personnel decisions and coaching strategies, in addition to gauging player effectiveness.

Motivation

In the NBA, one of the most important things for any team to establish is an identity: the set of traits that define the team. Often, identity is the major factor that governs all transactions, whether the team is looking for players, hiring a coach, or filling front office positions. Establishing that identity is key to achieving sustained success.

Until now, there have not been, to my knowledge, any basketball statistics that attempt to uncover the importance of team identity. Instead, most focus on gauging player effect with general metrics that evaluate players universally. While these metrics allow players to be compared consistently across the league, they are not well suited to bringing out each player's individual importance to his team. In fact, extremely talented players can go unnoticed, buried on the bench, because their strengths do not coincide with their current team's identity. Their presence would be much more significant on another team.

Here, we have built a model that uncovers team identity and uses it to derive a quantity describing a player's importance, or significance, to his team. It is called Dynamic Player Significance (DPS). "Dynamic" refers to the model's ability to be tailored to any given team, revealing each player's significant contributions. Therefore, each player's DPS will be different on different teams. This stat could be used to inform personnel decisions and coaching strategies and to gauge player effect.

Methods

Data


All data used for this project were pulled from www.basketball-reference.com. Databases were assembled in Excel and imported into R using the “gdata” library. R was then used to aggregate the data into more organized data frames. The dataset included the complete 2012-2013 game logs (82 games) for all 30 NBA teams. The variables included in the dataset are shown in Table 1.

Dataset (only relevant variables shown)

Category   Variables
General    Home/Away, Opponent, Win/Loss
Offense    Field Goals, Three Point Shots, Free Throws, Assists, Offensive Rebounds, Turnovers
Defense    Steals, Blocks, Defensive Rebounds, Personal Fouls

Table 1: Basic Team Statistics

Model

I set out to build a model, using the dataset described above (see Table 1), that could effectively predict wins for NBA franchises and form the basis of a reasonable metric for evaluating NBA players. The model is dynamic in that it is refit for each individual NBA franchise, extracting each team’s identity and judging the players on each roster according to the statistics that are significant for their specific team.

The programming language R was used to build all regression models. Specifically, logistic regression commands are a part of R’s basic library and the commands for Classification and Regression Tree (CART) modeling come from the “rpart” library.
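The paper's models were fit with R's built-in logistic regression. As a rough illustration of what such a fit does, the following is a minimal from-scratch sketch in Python, not the author's code. It fits a one-variable logistic regression by gradient ascent on the log-likelihood. The game data, the function names, and the learning-rate settings are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function: maps log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.05, steps=2000):
    """Fit P(win) = sigmoid(intercept + slope * x) by gradient ascent.

    The predictor is centered before fitting; the returned intercept is
    converted back to the original (uncentered) scale.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    centered = [x - mean_x for x in xs]
    intercept, slope = 0.0, 0.0
    for _ in range(steps):
        grad_intercept = 0.0
        grad_slope = 0.0
        for x, y in zip(centered, ys):
            error = y - sigmoid(intercept + slope * x)
            grad_intercept += error
            grad_slope += error * x
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept - slope * mean_x, slope


def accuracy(intercept, slope, xs, ys):
    """Share of games where the predicted outcome (p > 0.5) matches."""
    hits = 0
    for x, y in zip(xs, ys):
        predicted = 1 if sigmoid(intercept + slope * x) > 0.5 else 0
        hits += predicted == y
    return hits / len(xs)


# Hypothetical game log: field goals made and outcome (1 = win, 0 = loss).
fgm = [30, 32, 34, 36, 38, 40, 42, 44]
won = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = fit_logistic(fgm, won)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"training accuracy={accuracy(intercept, slope, fgm, won):.2f}")
```

A production fit would use a library routine with all of the Table 1 variables; the toy data here are perfectly separable, so the slope keeps growing with more steps and only its sign and the resulting classifications are meaningful.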

Results

Team Identity Determination

First, a general model was constructed to predict wins for the entire NBA. It was built using logistic regression. The model took the following form:

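\[
\begin{aligned}
\text{Win Probability} = {} & -15.16 + (0.21 \times \mathrm{FGM}) + (0.11 \times \mathrm{3PM}) + (0.13 \times \mathrm{FTM}) \\
& + (0.18 \times \mathrm{DRB}) + (0.04 \times \mathrm{AST}) + (0.18 \times \mathrm{STL}) \\
& + (0.10 \times \mathrm{BLK}) - (0.13 \times \mathrm{TO}) - (0.10 \times \mathrm{PF})
\end{aligned}
\tag{1}
\]

where the variables represent field goals made, three point shots made, free throws made, defensive rebounds, assists, steals, blocks, turnovers, and personal fouls (listed in the order that they appear in the equation). Because the model is a logistic regression, the right-hand side is the linear predictor (the log-odds of a win); applying the logistic function converts it to a probability. This model correctly predicts the outcome of NBA games 78% of the time, which holds up well when compared to other baseline models (Table 2).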
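To make eq. 1 concrete, here is a minimal Python sketch that evaluates the league-wide model for a single game. The coefficients are copied from eq. 1; the box score, the function names, and the use of the logistic function to turn the linear predictor into a probability (standard for logistic regression, but not spelled out in the text) are assumptions for illustration.

```python
import math

# Coefficients copied from eq. 1 (league-wide model).
INTERCEPT = -15.16
COEFFICIENTS = {
    "FGM": 0.21, "3PM": 0.11, "FTM": 0.13,
    "DRB": 0.18, "AST": 0.04, "STL": 0.18,
    "BLK": 0.10, "TO": -0.13, "PF": -0.10,
}


def log_odds(box_score):
    """Linear predictor of eq. 1 for one team's box score."""
    return INTERCEPT + sum(
        coef * box_score[stat] for stat, coef in COEFFICIENTS.items()
    )


def win_probability(box_score):
    """Logistic transform of the linear predictor."""
    return 1.0 / (1.0 + math.exp(-log_odds(box_score)))


# Hypothetical single-game team totals (not taken from the dataset).
box_score = {
    "FGM": 38, "3PM": 8, "FTM": 18, "DRB": 32, "AST": 22,
    "STL": 8, "BLK": 5, "TO": 14, "PF": 20,
}

z = log_odds(box_score)
p = win_probability(box_score)
print(f"log-odds={z:.2f}, win probability={p:.2f}")
```

For this hypothetical box score the linear predictor comes to about 0.80, which corresponds to a win probability of roughly 0.69, so the model would classify the game as a win.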

Model                     Prediction Accuracy
Home Court Advantage      0.61
Better Record             0.68
Original Model (eq. 1)    0.78

Table 2: Baseline Comparison
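The baselines in Table 2 can be read as simple decision rules scored by the share of games they call correctly. The sketch below illustrates that scoring under stated assumptions: the five games, the field names, and the exact form of each rule (always pick the home team; pick the team with the better record) are hypothetical and are not taken from the paper's dataset.

```python
def prediction_accuracy(predictions, outcomes):
    """Fraction of games where the predicted result equals the actual one."""
    correct = sum(1 for p, o in zip(predictions, outcomes) if p == o)
    return correct / len(outcomes)


# Hypothetical games, described from the home team's point of view.
games = [
    {"home_won": True,  "home_has_better_record": True},
    {"home_won": True,  "home_has_better_record": True},
    {"home_won": False, "home_has_better_record": False},
    {"home_won": True,  "home_has_better_record": True},
    {"home_won": False, "home_has_better_record": True},
]

outcomes = [g["home_won"] for g in games]

# Baseline 1: always predict a home win.
home_court_picks = [True for _ in games]

# Baseline 2: predict a home win only if the home team has the better record.
better_record_picks = [g["home_has_better_record"] for g in games]

home_court_accuracy = prediction_accuracy(home_court_picks, outcomes)
better_record_accuracy = prediction_accuracy(better_record_picks, outcomes)
print(home_court_accuracy, better_record_accuracy)
```

The regression model's accuracy can be scored the same way by turning each predicted probability into a win/loss call at a 0.5 threshold.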

However, when this general model is applied to individual teams, the results are less consistent. Consider three case studies: the Memphis Grizzlies, Miami Heat, and Charlotte Bobcats. Applying the general model to these teams yields prediction accuracies of 71%, 82%, and 79%, respectively. This variability exposes the flaw in using the general model to evaluate every team: the model does not work as well for some teams as it does for others, because each team wins, and equivalently loses, in different ways.

Instead, models for each of those teams were built to see how they compare to the general model. The models take the following forms:

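\[
\begin{aligned}
\mathrm{WP}_{\mathrm{MEM}} = {} & -36.22 + (0.67 \times \mathrm{FGM}) + (0.76 \times \mathrm{3PM}) + (0.36 \times \mathrm{FTM}) + (0.35 \times \mathrm{DRB}) + (0.39 \times \mathrm{STL}) \\
& + (0.74 \times \mathrm{BLK}) - (0.22 \times \mathrm{TO}) - (0.28 \times \mathrm{PF})
\end{aligned}
\tag{2}
\]

\[
\begin{aligned}
\mathrm{WP}_{\mathrm{MIA}} = {} & -39.08 + (0.49 \times \mathrm{FGM}) + (0.38 \times \mathrm{3PM}) + (0.38 \times \mathrm{FTM}) + (0.22 \times \mathrm{DRB}) + (0.53 \times \mathrm{AST}) \\
& + (0.55 \times \mathrm{STL})
\end{aligned}
\tag{3}
\]

\[
\begin{aligned}
\mathrm{WP}_{\mathrm{CHA}} = {} & -65.58 + (0.84 \times \mathrm{FGM}) + (0.56 \times \mathrm{FTM}) \\
& + (0.65 \times \mathrm{DRB}) + (0.47 \times \mathrm{STL}) + (0.36 \times \mathrm{BLK})
\end{aligned}
\tag{4}
\]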

These new models have prediction accuracies of 89%, 94%, and 90%, respectively. This underscores the importance of team identity and shows that each team has a slightly different formula for producing wins.
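The differences among eqs. 2-4 can be made explicit by comparing which statistics each team model retains and how the same box score is scored by each. This is a minimal Python sketch: the coefficients are copied from eqs. 2-4, while the box score and the helper names are hypothetical.

```python
# Coefficients copied from eqs. 2-4; statistics absent from an equation
# are simply not part of that team's model.
TEAM_MODELS = {
    "MEM": {"intercept": -36.22, "FGM": 0.67, "3PM": 0.76, "FTM": 0.36,
            "DRB": 0.35, "STL": 0.39, "BLK": 0.74, "TO": -0.22, "PF": -0.28},
    "MIA": {"intercept": -39.08, "FGM": 0.49, "3PM": 0.38, "FTM": 0.38,
            "DRB": 0.22, "AST": 0.53, "STL": 0.55},
    "CHA": {"intercept": -65.58, "FGM": 0.84, "FTM": 0.56,
            "DRB": 0.65, "STL": 0.47, "BLK": 0.36},
}


def model_stats(model):
    """Statistics that appear in a team's model (its 'identity')."""
    return {stat for stat in model if stat != "intercept"}


def linear_predictor(model, box_score):
    """Right-hand side of a team equation for one box score."""
    return model["intercept"] + sum(
        coef * box_score[stat]
        for stat, coef in model.items()
        if stat != "intercept"
    )


# Hypothetical single-game team totals (not taken from the dataset).
box_score = {
    "FGM": 38, "3PM": 8, "FTM": 18, "DRB": 32, "AST": 22,
    "STL": 8, "BLK": 5, "TO": 14, "PF": 20,
}

scores = {team: linear_predictor(m, box_score) for team, m in TEAM_MODELS.items()}
only_in_mia = model_stats(TEAM_MODELS["MIA"]) - model_stats(TEAM_MODELS["CHA"])

for team, score in scores.items():
    print(team, sorted(model_stats(TEAM_MODELS[team])), round(score, 2))
```

Under these assumptions, assists enter only the Miami model, while turnovers and personal fouls enter only the Memphis model, so the same box score produces a different linear predictor for each team.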

Player Evaluation

Finally, the same model used to determine team identity can be used to evaluate player performance. This evaluation is what I call Dynamic Player Significance. Using the most basic NBA statistics recorded for individual players, the models from above can be used to quantify each NBA player’s importance to his team. Table 3 shows DPS values for a few NBA players. More DPS ratings can be found in the appendix.
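The text does not spell out exactly how a player's stat line is combined with a team model. One plausible reading, sketched below in Python, is a coefficient-weighted sum of the player's per-game statistics using each team's coefficients from eqs. 2-4. Dropping the intercept, the stat line itself, and the function name are all assumptions, and the sketch does not attempt to reproduce the values in Table 3.

```python
# Team coefficients copied from eqs. 2-4 (intercepts omitted by assumption).
TEAM_WEIGHTS = {
    "MEM": {"FGM": 0.67, "3PM": 0.76, "FTM": 0.36, "DRB": 0.35,
            "STL": 0.39, "BLK": 0.74, "TO": -0.22, "PF": -0.28},
    "MIA": {"FGM": 0.49, "3PM": 0.38, "FTM": 0.38, "DRB": 0.22,
            "AST": 0.53, "STL": 0.55},
    "CHA": {"FGM": 0.84, "FTM": 0.56, "DRB": 0.65, "STL": 0.47, "BLK": 0.36},
}


def weighted_significance(weights, per_game_stats):
    """Hypothetical DPS-style score: team weights applied to a stat line."""
    return sum(
        weight * per_game_stats.get(stat, 0.0)
        for stat, weight in weights.items()
    )


# Hypothetical per-game averages for one player (not a real stat line).
player = {
    "FGM": 6.0, "3PM": 1.0, "FTM": 3.0, "DRB": 4.0, "AST": 5.0,
    "STL": 1.5, "BLK": 0.5, "TO": 2.0, "PF": 2.0,
}

ratings = {
    team: weighted_significance(weights, player)
    for team, weights in TEAM_WEIGHTS.items()
}
for team, rating in ratings.items():
    print(team, round(rating, 3))
```

Even with production held constant, the score changes from team to team because each team weights a different set of statistics, which is the "dynamic" behavior the metric is named for.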

Player                    Gen_DPS        MEM_DPS        MIA_DPS        CHA_DPS
LeBron James              4.120131579    11.82381579    13.20907895    16.06447368
Dwyane Wade               2.843188406    8.240434783    9.810869565    12.07072464
Zach Randolph             2.526842105    6.784210526    6.426842105    11.1775
Chris Bosh                2.437702703    7.381486486    6.50027027     10.60351351
Kemba Walker              2.379634146    7.080853659    9.249878049    9.430853659
Marc Gasol                2.297875       6.623375       7.278125       10.215
Rudy Gay                  2.238095238    6.930952381    6.911666667    9.551190476
Mike Conley               1.976875       5.82275        8.585375       7.77425
Gerald Henderson          1.928088235    5.766617647    6.364558824    8.393088235
Ramon Sessions            1.733442623    4.98           6.538032787    7.321311475
Byron Mullens             1.619056604    5.006226415    4.467924528    6.659245283
Michael Kidd-Gilchrist    1.441794872    4.129358974    4.045641026    6.538974359
Ray Allen                 1.318734177    4.504556962    4.523544304    5.190379747
Tony Allen                1.293797468    3.651772152    4.085696203    6.039113924
Josh McRoberts            1.253846154    3.436153846    4.100384615    5.510769231
Tayshaun Prince           1.164324324    3.348918919    3.598918919    4.502972973
Bismack Biyombo           1.0955         3.4355         2.309875       5.29675
Mario Chalmers            1.081298701    3.486103896    5.139480519    4.331428571
Ben Gordon                1.052          3.9084         4.200533333    4.663333333

Table 3: Selected DPS Ratings

Discussion

The results above reveal a few interesting traits of the model. The first centers on the team model. Comparing the team-specific models to the general model shows how much identity differs from team to team. That is, each team has certain statistics that increase (or decrease) the likelihood of a win, and the degree to which these key statistics affect the model can shift considerably. Prediction accuracy jumps when we move from the general model to the team identity models: from 71%, 82%, and 79% to 89%, 94%, and 90% for the three case-study teams. This shows that general models, while accurate, should not be used as blanket predictors for all teams. Additionally, the team models provide a blueprint for teams trying to win and a strategy for opponents preparing to stop them.

The next trait involves tying these findings to player evaluation. Pinpointing a player's true worth takes a dynamic approach, because that worth can vary with circumstances. The results above indicate this. From them, we can see how different players, positions, and skill sets are valued by teams with varying schemes. Superstars, like LeBron James, can basically go anywhere and their presence will be felt. On the other hand, second-tier players, like Zach Randolph or Mike Conley, become more or less valuable in different situations depending on how the team's needs and identity fit with their skill set. This observation illustrates the value of this sort of statistic. We can now begin to quantify "fit", which helps with targeting players for different roles on a franchise, given an established identity.

The work presented here does have a few limitations. Among them are a lack of more advanced statistics and an absence of efficiency stats. The model, as presented here, only takes into account the most basic NBA statistics (see Methods). There are several important metrics that are obtained or calculated by NBA teams that are not included in the calculation of DPS. More comprehensive datasets that include advanced statistics would greatly enhance our findings, providing a deeper view of the important aspects of team success. In addition, player efficiency is key in basketball and should be incorporated into the model to enhance the metric.

Several applications for this work exist. The basic application is for general evaluation of player performance. Through this statistic, one can quickly gauge a player’s importance to the team and how pivotal their game/season output is in influencing wins. Next, DPS can be used to determine 5-on-5 line-up combinations. DPS uncovers the factors that are most closely linked to each team’s success. Knowing these factors and which players affect them the most helps the coach determine which players to play, when to play them, and with whom to play them. The most important effects of DPS will be on personnel decisions. Since DPS differs from team to team, it could allow franchises to evaluate potential player transactions with its ability to quantify “fit” and bring out the importance of “specialists” numerically. Less obvious and perhaps less realistic is DPS’s potential role in picking coaches. Given a crop of NBA players, a franchise could use DPS to hire a coach that fits the roster. By examining the coach’s previous terms to discover his coaching identity, a franchise can weigh the coach’s strengths alongside the specialties of the current roster to determine whether the coach fits with the team.

Conclusion

I have proposed a new basketball metric for quantifying the contributions of NBA players in team-defined key areas. To my knowledge, this is the first basketball statistic that measures and incorporates team identity. We have examined case studies to show how DPS measures up in this past season. Potential applications include informing personnel decisions for NBA franchises, affecting 5-on-5 line-up combinations, and gauging the effect of each player’s performance on a nightly basis. Going forward, I would like to extend this work to address the previously discussed limitations and look farther back into NBA history to build a more complete analysis of previous players and teams.

Acknowledgements

This work was done as a part of The Analytics Edge class at the MIT Sloan School of Management. Special thanks to the instructors and TA for the course: Dimitris Bertsimas, Allison O’Hair, and John Silberholz.

Appendix A – Full DPS Chart

Player                    Gen_DPS         MEM_DPS         MIA_DPS         CHA_DPS
LeBron James              4.120131579     11.82381579     13.20907895     16.06447368
Dwyane Wade               2.843188406     8.240434783     9.810869565     12.07072464
Zach Randolph             2.526842105     6.784210526     6.426842105     11.1775
Chris Bosh                2.437702703     7.381486486     6.50027027      10.60351351
Kemba Walker              2.379634146     7.080853659     9.249878049     9.430853659
Marc Gasol                2.297875        6.623375        7.278125        10.215
Rudy Gay                  2.238095238     6.930952381     6.911666667     9.551190476
Mike Conley               1.976875        5.82275         8.585375        7.77425
Gerald Henderson          1.928088235     5.766617647     6.364558824     8.393088235
Ramon Sessions            1.733442623     4.98            6.538032787     7.321311475
Byron Mullens             1.619056604     5.006226415     4.467924528     6.659245283
Michael Kidd-Gilchrist    1.441794872     4.129358974     4.045641026     6.538974359
Ray Allen                 1.318734177     4.504556962     4.523544304     5.190379747
Tony Allen                1.293797468     3.651772152     4.085696203     6.039113924
Josh McRoberts            1.253846154     3.436153846     4.100384615     5.510769231
Tayshaun Prince           1.164324324     3.348918919     3.598918919     4.502972973
Bismack Biyombo           1.0955          3.4355          2.309875        5.29675
Mario Chalmers            1.081298701     3.486103896     5.139480519     4.331428571
Ben Gordon                1.052           3.9084          4.200533333     4.663333333
[name missing]            0.983875        3.058875        4.503875        4.298625
[name missing]            0.768888889     3.233194444     2.612638889     2.809583333
[name missing]            0.754933333     1.781866667     1.872933333     3.73
[name missing]            0.64925         2.13125         1.794           3.45525
Mike Miller               0.630847458     2.094915254     2.293220339     2.111186441
[name missing]            0.616779661     2.199661017     2.373220339     2.566779661
Jeff Taylor               0.61038961      2.100519481     2.270909091     2.762857143
[name missing]            0.59047619      2.00952381      1.512380952     3.03452381
[name missing]            0.576388889     2.117222222     1.258055556     2.98
[name missing]            0.553050847     1.904915254     1.824576271     3.013389831
[name missing]            0.547037037     1.668888889     1.855925926     2.753333333
[name missing]            0.535818182     1.997454545     1.66            2.142363636
[name missing]            0.52125         1.617           2.8075          2.700125
[name missing]            0.512884615     1.495769231     1.500769231     2.551538462
[name missing]            0.457540984     1.494098361     1.284590164     2.651639344
[name missing]            0.36275         1.553           1.58225         1.23225
Reggie Williams           0.193           0.8315          0.9435          0.54625
[name missing]            0.191612903     1.056774194     0.764193548     0.788064516
Tyrus Thomas              0.154615385     0.827692308     0.881153846     1.149230769
[name missing]            0.101612903     0.571612903     0.225322581     0.646290323
[name missing]            0.077777778     1.407222222     1.78            0.166666667
James Jones               -0.138947368    -0.039210526    -0.24           -0.774210526
Tony Wroten               -0.170571429    -0.230857143    0.488           -0.318571429
DeSagana Diop             -0.273636364    -0.445454545    -0.75           -1.095454545
[name missing]            -0.482105263    -1.106842105    -1.205263158    -2.024210526
Hamed Haddadi             -0.932307692    -2.04           -2.270769231    -3.646923077
[name missing]            -1.307          -2.989          -2.878          -5.243
Chris Johnson             -1.38375        -2.79125        -3.30625        -6.25
Keyon Dooling             -1.785714286    -3.737142857    -3.57           -7.528571429
[name missing]            -1.81           -4.251428571    -4.122857143    -7.345714286
[name missing]            -1.915          -4.4925         -4.55375        -7.59625
[name missing]            -1.971428571    -4.67           -4.808571429    -8.184285714
[name missing]            -2.16           -5.17           -5.45           -9.062857143
Cory Higgins              -2.176666667    -5.101666667    -5.106666667    -9.545
[name missing]            -2.311666667    -5.278333333    -5.821666667    -9.705
Dexter Pittman            -3.6625         -8.68           -9.2375         -15.2775
Matt Carroll              -15.22          -36.5           -38.55          -65.58
