Dynamic Player Significance (DPS): a New Comprehensive Basketball Statistic
David Hill
Media Lab, Massachusetts Institute of Technology
Cambridge, Massachusetts, USA
[email protected]

15.071 (The Analytics Edge) Final Project
May 13, 2013

Abstract

Dynamic player significance is a new basketball metric designed to measure each NBA player's importance, or significance, to his franchise. The metric attempts to define each player's role on the team and how it fits with the team's identity. Its key aspect is that it is influenced by the on-court identity of each franchise, defined as the collection of factors that contribute most to a win by a given team. These factors differ for every NBA team. Therefore, two players with identical stats and skill sets on different teams will most likely have different significance values. Likewise, if a player is traded from one team to another, his significance will differ on the new team even if his production remains constant. Hence the name: dynamic player significance. Here, I break down the components of the model and explore three case studies that show how teams' identities differ. Player evaluations are also explored to show tendencies in the model across multiple conditions. The proposed statistic could help inform team personnel decisions and coaching strategies, in addition to gauging player effectiveness.

Motivation

In the NBA, one of the most important things for any team to establish is an identity: the set of traits that define the team. Identity is often the major factor governing all transactions, whether the team is looking for players, hiring a coach, or filling front office positions. Establishing that identity is key to achieving sustained success. To my knowledge, no existing basketball statistic attempts to uncover the importance of team identity.
Instead, most existing statistics gauge player effect with general metrics that evaluate players universally. While these metrics provide an outstanding comparison of players that is consistent across the league, they are not the best at bringing out each player's individual importance to his team. In fact, extremely talented players can go unnoticed, buried on the bench, because their strengths do not coincide with their current team's identity. Their presence would be much more significant on another team.

Here, we have built a model that uncovers team identity and uses it to derive a quantity describing a player's importance, or significance, to his team. It is called Dynamic Player Significance (DPS). "Dynamic" refers to the model's ability to tailor itself to any given team, revealing each player's significant contributions. Each player's DPS will therefore be different on different teams. This stat could be used to inform personnel decisions, shape coaching strategies, and gauge player effect.

Methods

Data

All data used for this project were pulled from www.basketball-reference.com. Databases were formed in Excel and imported into R using the "gdata" library. R was used to aggregate the data into more organized data frames. The dataset included the complete 2012-2013 game logs (82 games) for all 30 NBA teams. The variables/stats included in the dataset are shown in Table 1.

| Category | Variables (only relevant variables shown) |
|---|---|
| General | Home/Away, Opponent, Win/Loss |
| Offense | Field Goals, Three Point Shots, Free Throws, Assists, Offensive Rebounds, Turnovers |
| Defense | Steals, Blocks, Defensive Rebounds, Personal Fouls |

Table 1: Basic Team Statistics

Model

I set out to build a model, using the dataset described above (see Table 1), that could effectively predict wins for NBA franchises and form the basis of a reasonable metric for evaluating NBA players.
The model is dynamic in that it is altered for each individual NBA franchise, extracting each team's identity and judging the players on each roster according to the statistics that are significant for their specific team. The programming language R was used to build all regression models. Specifically, the logistic regression commands are part of R's base installation, and the commands for Classification and Regression Tree (CART) modeling come from the "rpart" library.

Results

Team Identity Determination

First, a general model was constructed to predict wins for the entire NBA. It was built using logistic regression and took the following form:

$$
\begin{aligned}
\text{Win Probability} = {} & -15.16 + (0.21 \times \mathrm{FGM}) + (0.11 \times \mathrm{3PM}) + (0.13 \times \mathrm{FTM}) \\
& + (0.18 \times \mathrm{DRB}) + (0.04 \times \mathrm{AST}) + (0.18 \times \mathrm{STL}) \\
& + (0.10 \times \mathrm{BLK}) - (0.13 \times \mathrm{TO}) - (0.10 \times \mathrm{PF})
\end{aligned}
\tag{1}
$$

where the variables represent field goals made, three point shots made, free throws made, defensive rebounds, assists, steals, blocks, turnovers, and personal fouls (listed in the order in which they appear in the equation). Because the model is a logistic regression, the right-hand side is strictly the linear predictor (the log-odds of a win); applying the logistic function to it gives the win probability. This model correctly predicts the outcome of NBA games 78% of the time, which compares favorably with the baseline models (Table 2).

| Model | Prediction Accuracy |
|---|---|
| Home Court Advantage | 0.61 |
| Better Record | 0.68 |
| Original Model (Eq. 1) | 0.78 |

Table 2: Baseline Comparison

However, when this general model is applied to individual teams, the results become more questionable. Consider three separate case studies: the Memphis Grizzlies, Miami Heat, and Charlotte Bobcats. Applying the general model to these teams yields success rates of 71%, 82%, and 79%, respectively. This exposes the flaw in using the general model to evaluate every team. The variability in prediction accuracy across teams shows that the model may not work as well for some teams as it does for others, because each team wins, and equivalently loses, in different ways.
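As a minimal sketch of how Eq. (1) turns a box score into a prediction, the snippet below evaluates the linear predictor and passes it through the logistic function. The coefficients are copied from Eq. (1); the function names, the 0.5 decision threshold, and the single-game stat line are assumptions made for illustration and are not taken from the dataset. The original analysis was done in R; Python is used here only to keep the example self-contained.

```python
import math

# Coefficients of the league-wide model, copied from Eq. (1).
GENERAL_MODEL = {
    "intercept": -15.16,
    "FGM": 0.21, "3PM": 0.11, "FTM": 0.13, "DRB": 0.18, "AST": 0.04,
    "STL": 0.18, "BLK": 0.10, "TO": -0.13, "PF": -0.10,
}


def linear_predictor(model, box_score):
    """Intercept plus the coefficient-weighted sum of the team's game stats."""
    return model["intercept"] + sum(
        coef * box_score.get(stat, 0.0)
        for stat, coef in model.items()
        if stat != "intercept"
    )


def win_probability(model, box_score):
    """Map the linear predictor (log-odds) to a probability via the logistic function."""
    z = linear_predictor(model, box_score)
    return 1.0 / (1.0 + math.exp(-z))


def predict_win(model, box_score, threshold=0.5):
    """Classify the game as a win when the probability reaches the threshold (assumed 0.5)."""
    return win_probability(model, box_score) >= threshold


# Hypothetical single-game team totals, for illustration only (not real data).
example_game = {
    "FGM": 38, "3PM": 8, "FTM": 18, "DRB": 32, "AST": 22,
    "STL": 8, "BLK": 5, "TO": 14, "PF": 20,
}

if __name__ == "__main__":
    print(f"log-odds:        {linear_predictor(GENERAL_MODEL, example_game):.2f}")
    print(f"win probability: {win_probability(GENERAL_MODEL, example_game):.3f}")
    print(f"predicted win:   {predict_win(GENERAL_MODEL, example_game)}")
```

Keeping the coefficients in a dictionary means a team-specific model can be swapped in without changing the functions; statistics that a model omits simply contribute nothing.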
To test this, separate models were built for each of these three teams and compared with the general model. The models take the following forms:

$$
\begin{aligned}
\mathrm{WP}_{\mathrm{MEM}} = {} & -36.22 + (0.67 \times \mathrm{FGM}) + (0.76 \times \mathrm{3PM}) + (0.36 \times \mathrm{FTM}) \\
& + (0.35 \times \mathrm{DRB}) + (0.39 \times \mathrm{STL}) + (0.74 \times \mathrm{BLK}) \\
& - (0.22 \times \mathrm{TO}) - (0.28 \times \mathrm{PF})
\end{aligned}
\tag{2}
$$

$$
\begin{aligned}
\mathrm{WP}_{\mathrm{MIA}} = {} & -39.08 + (0.49 \times \mathrm{FGM}) + (0.38 \times \mathrm{3PM}) + (0.38 \times \mathrm{FTM}) \\
& + (0.22 \times \mathrm{DRB}) + (0.53 \times \mathrm{AST}) + (0.55 \times \mathrm{STL})
\end{aligned}
\tag{3}
$$

$$
\begin{aligned}
\mathrm{WP}_{\mathrm{CHA}} = {} & -65.58 + (0.84 \times \mathrm{FGM}) + (0.56 \times \mathrm{FTM}) \\
& + (0.65 \times \mathrm{DRB}) + (0.47 \times \mathrm{STL}) + (0.36 \times \mathrm{BLK})
\end{aligned}
\tag{4}
$$

These new models have prediction accuracies of 89%, 94%, and 90%, respectively. This underscores the importance of team identity and shows that each team has a slightly different formula for producing wins.

Player Evaluation

Finally, the same model used to determine team identity can be used to evaluate player performance. This evaluation is what I call Dynamic Player Significance. Using the most basic NBA statistics recorded for individual players, the models above can be used to quantify each NBA player's importance to his team. Table 3 shows DPS values for a few NBA players. More DPS ratings can be found in the appendix.
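The text does not spell out the arithmetic behind a DPS value. One plausible reading, assumed here and not confirmed by the source, is that each model's stat weights (without the intercept) are applied to a player's per-game averages. The sketch below follows that assumption: the coefficients come from Eqs. (1) to (4), while the function name `dps` and the player stat line are hypothetical.

```python
# Stat weights copied from Eqs. (1)-(4). Intercepts are left out because this
# sketch ASSUMES DPS uses only the weighted stats; the paper does not say so.
MODELS = {
    "GEN": {"FGM": 0.21, "3PM": 0.11, "FTM": 0.13, "DRB": 0.18, "AST": 0.04,
            "STL": 0.18, "BLK": 0.10, "TO": -0.13, "PF": -0.10},
    "MEM": {"FGM": 0.67, "3PM": 0.76, "FTM": 0.36, "DRB": 0.35,
            "STL": 0.39, "BLK": 0.74, "TO": -0.22, "PF": -0.28},
    "MIA": {"FGM": 0.49, "3PM": 0.38, "FTM": 0.38, "DRB": 0.22,
            "AST": 0.53, "STL": 0.55},
    "CHA": {"FGM": 0.84, "FTM": 0.56, "DRB": 0.65, "STL": 0.47, "BLK": 0.36},
}


def dps(weights, per_game):
    """Weighted sum of a player's per-game stats under one model's weights.

    Stats that the model does not include carry no weight, so a skill only
    counts on teams whose identity values it.
    """
    return sum(w * per_game.get(stat, 0.0) for stat, w in weights.items())


# Hypothetical per-game averages for an illustrative player (not real data).
player = {
    "FGM": 6.0, "3PM": 1.0, "FTM": 3.0, "DRB": 5.0, "AST": 4.0,
    "STL": 1.0, "BLK": 0.5, "TO": 2.0, "PF": 2.0,
}

if __name__ == "__main__":
    for team, weights in MODELS.items():
        print(f"{team}_DPS = {dps(weights, player):.2f}")
```

Under this reading, the same stat line scores differently under each team's weights, which is the behaviour the word "dynamic" describes. Because the models have different scales, values are best compared between players within one model rather than across models.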
| Player | Gen_DPS | MEM_DPS | MIA_DPS | CHA_DPS |
|---|---|---|---|---|
| LeBron James | 4.120131579 | 11.82381579 | 13.20907895 | 16.06447368 |
| Dwyane Wade | 2.843188406 | 8.240434783 | 9.810869565 | 12.07072464 |
| Zach Randolph | 2.526842105 | 6.784210526 | 6.426842105 | 11.1775 |
| Chris Bosh | 2.437702703 | 7.381486486 | 6.50027027 | 10.60351351 |
| Kemba Walker | 2.379634146 | 7.080853659 | 9.249878049 | 9.430853659 |
| Marc Gasol | 2.297875 | 6.623375 | 7.278125 | 10.215 |
| Rudy Gay | 2.238095238 | 6.930952381 | 6.911666667 | 9.551190476 |
| Mike Conley | 1.976875 | 5.82275 | 8.585375 | 7.77425 |
| Gerald Henderson | 1.928088235 | 5.766617647 | 6.364558824 | 8.393088235 |
| Ramon Sessions | 1.733442623 | 4.98 | 6.538032787 | 7.321311475 |
| Byron Mullens | 1.619056604 | 5.006226415 | 4.467924528 | 6.659245283 |
| Michael Kidd-Gilchrist | 1.441794872 | 4.129358974 | 4.045641026 | 6.538974359 |
| Ray Allen | 1.318734177 | 4.504556962 | 4.523544304 | 5.190379747 |
| Tony Allen | 1.293797468 | 3.651772152 | 4.085696203 | 6.039113924 |
| Josh McRoberts | 1.253846154 | 3.436153846 | 4.100384615 | 5.510769231 |
| Tayshaun Prince | 1.164324324 | 3.348918919 | 3.598918919 | 4.502972973 |
| Bismack Biyombo | 1.0955 | 3.4355 | 2.309875 | 5.29675 |
| Mario Chalmers | 1.081298701 | 3.486103896 | 5.139480519 | 4.331428571 |
| Ben Gordon | 1.052 | 3.9084 | 4.200533333 | 4.663333333 |

Table 3: Selected DPS Ratings

Discussion

The results above reveal a few interesting traits in the model. The first key trait centers on the team model. Comparing the team-specific models to the general model shows just how much identity differs from team to team. That is, each team has certain statistics that increase (or decrease) the likelihood of a win, and the degree to which these key statistics affect the model can shift quite a bit. Prediction accuracy makes a steep jump when we move from the general model to the team identity model. This shows that general models, while accurate, should not be used as blanket predictors for all teams. Additionally, it provides a blueprint for teams trying to win and a strategy for teams preparing to stop that win. The next trait involves tying these findings to player evaluation.
Pinpointing a player's true worth takes a dynamic approach, and that worth can vary depending on circumstances. The results above clearly indicate this. From these results, we can see how different players, positions, and skill sets are valued by teams with varying schemes. Superstars, like LeBron James, can go basically anywhere and their presence will be felt. On the other hand, second-tier players, like Zach Randolph or Mike Conley, become more or less valuable in different situations depending on how the team's needs and identity fit with their skill set. This observation exemplifies the true worth of this sort of statistic. We can now begin to quantify "fit", which helps with targeting players for different roles on a franchise, given an established identity. The work presented here does have a few limitations.