<<

University of Tennessee, Knoxville TRACE: Tennessee Research and Creative Exchange

Supervised Undergraduate Student Research Chancellor’s Honors Program Projects and Creative Work

Spring 5-1999

Predicting National Association (NBA) Player Salaries

Andrew Thomas Fleenor University of Tennessee - Knoxville

Follow this and additional works at: https://trace.tennessee.edu/utk_chanhonoproj

Recommended Citation Fleenor, Andrew Thomas, "Predicting National Basketball Association (NBA) Player Salaries" (1999). Chancellor’s Honors Program Projects. https://trace.tennessee.edu/utk_chanhonoproj/306

This is brought to you for free and open access by the Supervised Undergraduate Student Research and Creative Work at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Chancellor’s Honors Program Projects by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected]. UNIVERSITY HONORS PROGRAM

SENIOR PROJECT· APPROVAL

Name: _~.rJfJ_~~_J:lt$~S------______College: _B.lf~i.tl~.?~______D~partment: _Etl1_f11CL ______Faculty Mentor: __ ~b0.rE~A1~.!Q~!_t ______

PROJECT TITLE: _Jl.e.rJl'flj_.N~~~~~L_~5J:.GJ1'iU_t+s5.99~t2fL _ _(~~b_ll1~c~{Elifj------_------______------.------I have reviewed this completed senior honors thesis with this student and certify that it is a project commensurate with honors level undergraduate research in this field. Signed: -_~_K_~ ______, Faculty !>1entor

Da teo ----2j.JJi-+~------Comments (Optional): Predicting National Basketball Association (NBA) Player Salaries

Andy Fleenor Mentor: Sharon Neidert Abstract

The National Basketball Association (NBA) is the world's premier professional basketball league, and features some of the world's best athletes competing on some of the world's most distinguished professional franchises. These players play for their teams in attempts at winning the NBA , and they are compensated well.

Increasingly, NBA players have come under intense scrutiny for their large salaries.

Many questions have been asked, including why they are paid what they are and how are salary decisions made. Although these questions are beyond my scope, I have created a sufficient statistical model, using player statistics from the 1997-98 NBA season, to predict salaries. Only quantitative data were used, which flaws the model somewhat

since items such as fan appeal cannot be taken into account. However, this model does highlight some interesting trends and clearly shows the statistical data by which many

NBA player salaries are judged. Introduction

Following the 1997-98 NBA season, team owners locked out the players in what can best be called a simple labor dispute. Owners felt players were overpaid, players felt as if their earning power was restricted. The lockout lasted several months, and threatened to completely cancel the 1998-99 season. As a result of this labor dispute, the question ofNBA salaries became a hot topic of discussion both in the world and the economic community. So in an effort to at least get some idea of what players are worth, judging strictly from quantitative data, this model attempts to estimate salary based on on-the-court performance. I will also look, though not extensively, into the argument that there is unequal distribution of salaries across the board for the NBA players.

Data

This model cannot be fully explained without the detailing of the actual data used.

First, the data used were the individual statistics accumulated by each of the NBA players during the 1997-98 season. Included in this are the raw numbers as well as per game averages for each category to compensate for any lost statistics due to injuries or suspensions. Second, each player was sorted according to team. Third, data were collected as to how many games each player's team won and whether or not that team made the . Finally, the players were classified, via a dummy variable field, as to what position they played. Since forwards and centers generally are most useful to a team in rebounding whereas guards are called upon to generate assists, this field proved useful in the final model. 2

Here is a description of each data field:

Salary: Amount of money, on an annual basis, paid to the player by the team. Years: Number of years player has played in the league. Team Wins: Number of wins player's team had in 1997-98 season. Playoffs: Whether or not player's team made playoffs. Games: Number of regular season games (out of 82 total) player participated in. Minutes: Total number of minutes player was on court during games. MinslGame: Average number of minutes played per game. FGM: Total number offield goals (shots) made. FGM/Game: Average number of field goals made per game. FGA: Total number offield goals attempted. FGAIGame: Average number offield goals attempted per game. FG%: Percentage of made field goals (FGMlFGA). F1M: Total number of free throws (shots attempted by player after being fouled) made. F1MIGame: Average number of free throws made per game. FTA: Total number of free throws attempted. FTA/Game: Average number of free throws attempted per game. F'fO/6: Percentage offree throws made (FTMlFTA). 3FGM: Total number of three-point field goals made. 3FGMIGame: Average number of three-point field goals made per game. 3FGA: Total number of three-point field goals attempted. 3FGA/Game: Average number of three-point field goals attempted per game. 3FG%: Percentage of three-point field goals made. ORB: Total number of offensive rebounds ORB/Game: Average number of offensive rebounds per game. 1RB: Total number of rebounds (offensive and defensive). TRBIGame: Average number of rebounds per game. Assists: Total number of assists. AssistslGame: Average number of assists per game. Steals: Total number of steals. Steals/Game: Average number of steals per game. Points: Total number of points scored all season. Average: Average number of points scored per game. Position: Position they play, broken down into front court (1) and backcourt (0).

Before going any further, it should be noted that many players in this study signed long-term contracts well before the 1997-98 season. Ideally, the amount of these 3

contracts would not be dependent on their 1997-98 statistics, but rather their statistics in the year or years just prior to the signing of the contract. All this study does is attempt to predict 1997-98 salaries based solely on statistics from 1997-98.

Method

The first step in determining a statistical model to predict salary was to identify which data fields correlated most with salary. In other words, which pieces of statistical information were most related to how much money the players made? Exhibit I shows how each data field correlated with salary.

Judging from this set of correlations, it would appear that scoring average, free throws made per game, free throws attempted per game, and field goals made per game would have the most impact on explaining how much a player makes. This seems logical, as most of the best players are generally fouled more often to prevent them from making shots, thus increasing their free throw numbers. Also, players who score more are paid more because a team must put points on the board to win games. Also, players with high scoring average are often more "fan favorite" type players. Interestingly, the number of years a player has been in the league correlates with salary rather weakly, coming in at a value under 0.30. This is partially due to players who signed long-term deals in the early- to mid-nineties. Another interesting point to consider is that whether or not a player's team made the playoffs is correlated very weakly with how much money they make. At 0.11, this says that winning games has very little to do with how much players are paid. 4

Models

So with some idea as to what pieces of information might prove useful in designing a prediction model, the next step was to manipulate the data to generate the

model itself The first model (Exhibit II) created includes every player in the NBA, and

gives an R2 value of 0.4546. In other words, roughly 45 percent of the variation in

salaries is explained in the model by the variation in the eleven statistical fields used.

These fields (position, years, team wins, field goals made per game, field goal

percentage, free throws attempted per game, total rebounds, total rebounds per game,

assists, assists per game, and average) work together to create a linear equation which

predicts one's salary based on these eleven individual statistics. The equation would be

this: y=229453.25-878420.3(position)+ 187395.22(years)+24182.28(teamwins)

+ 1786771.9(FGM/Game)-5.4(FG%)+700596.56(FTNGame)-7305.1(TRB)+

673427.75(TRB/Game) + 103458.6(assists)-920822.6(assistslgame)-580874(average).

Using this equation and a mythical set of statistics, Exhibit III shows an example of what

this equation would produce. In this made-up set of player statistics, the player is a guard

who specializes in assists (point guard), scores an average amount, and plays for a

winning team. His salary, based on this statistics, is predicted to be $6,668,120. In the

model, the actual salaries vary quite a bit from the predicted, both on the downside and

the upside. What this says is that if this model is a base for what players are worth

(which it is not, completely, due to fan appeal and other non-quantitative data), then some

players are vastly overpaid while others are vastly underpaid. 5

After looking at the model made with every NBA player, it became readily apparent that there is indeed a statistical outlier in terms of salary. One player, Michael

Jordan, made over $33 million in 1997-98, far outdistancing the rest of the NBA. This outlier status caused problems in the ability to make a valid model, and so another was made without Jordan data included. The results ofthis model are found in Exhibit IV. In this model, the R2 value is over 0.5, at 0.511. In other words, 51 percent of the variation in salaries ofNBA players other than Jordan is explained by the variations in the statistical categories used in this model. This is a jump of six percent from the first model. As in the first, eleven categories proved useful in predicting salary, but only six overlapped from the first model with every player. The six overlapping were years, team wins, field goals made per game, free throws attempted per game, total rebounds, and total rebounds per game. The five different categories which proved useful in the second model are field goals made, offensive rebounds, offensive rebounds per game, steals and steals per game.

Analysis

Both the model using every player and the model without Jordan are useful in predicting salary, with each achieving a level of significance greater than 0.99. The

Jordan-less model does achieve a higher R2 value, indicating a stronger relationship, and the residual plot appears to be more normally distributed. In deciding which model to use, there are two schools of thought. The first is that for it to be as accurate as possible, every NBA player should be included to represent the entirety of statistics and salaries.

In other words, it is a bit more representative. The other school of thought would say that 6

Jordan represents an outlier and that his presence only skews the study. Jordan's salary is, in fact, over $12 million more than the next highest paid player, but his overall statistics are also clearly the best. In my final estimation, I feel it is important to include everyone, no matter how much more money they make than everyone else. Jordan does skew the study somewhat, but there is little debate about him being the best player, bar none, in the NBA. There is at least an argument that he deserves his pay. With that said, the best model to predict salary would be the whole model, everyone included. The variation in the statistical fields used explains nearly 46 percent of variation in salary, and the p-value is low, registering less than 0.0001. It does not pinpoint exactly how much a player should get paid, but it does provide an accurate range of salaries a player should get paid, based only on quantitative data.

As for the issue of players being greatly over- or underpaid, this model is unable to determine. Judging from the output of the fit line and the residual plot, however, it is clear that many players' actual salaries are far below their predicted salaries, while even more are far above their predicted salaries. Only nine of341 players earn more than $10 million a year, while 106 make below $1 million. Of those 106, 13 make the league minimum, which is $242,000. The average salary is $2.46 million, which 124 players exceed. The median salary is $1.75 million. These numbers suggest an uneven distribution, skewed toward the right, and can be seen more readily in Exhibit V.

Conclusion

As has been shown, one can create a useful model for predicting the salaries of

NBA players based solely on their statistical performance in anyone year. Using data 7

from the 1997-98 season, I have created a model which stands to explain how much players get paid according to their court performance. Certain categories of statistics one would expect to playa large role in determining salary, such as scoring average and the number of wins the player's team had over the course of the season. Other categories, such as assists and assists per game, generally categories dominated by guards, bear significance in determining salaries for all players. Other categories such as the number of minutes a player plays per game are not useful in the prediction model at all.

In all, players do not get paid according to their performance on the court only. A variety of other factors come into play, such as fan appeal and who represents them as an agent. Some team's management bodies tend to pay more, too. With that said, though, one can still produce a reasonably accurate statistical model predicting salary based solely on on-the-court performance, measured quantitatively. 8

Exhibit I

Correlations with Salaty Years 0.2952 FGA 0.4533 3FGM 0.0606 TRBIGame 0.4436 TeamWins 0.1394 FGAIGame 0.5248 3FGMlGame 0.0448 Assists 0.2174 Playoffs 0.1104 FG% 0.1554 3FGA 0.0805 Assists/Game 0.2173 Games 0.1391 FTM 0.4785 3FGAlGame 0.0642 Steals 0.2651 Minutes 0.3699 FTMIGame 0.5245 3FG% -0.05n Steals/Game 0.2719 MinsiGame 0.4635 FTA 0.4897 ORB 0.2969 Points 0.471 FGM 0.4707 FTAlGame 0.5337 ORB/Game 0.3169 Average 0.5457 FGM/Game 0.5248 FT% 0.0507 TRB 0.3867 Position 0.0293 9

Exhibit n

Response: Salary Summary of Fit RSquare 0.454696 RSquare Adj 0.436409 Root Mean Square Error 2248759 Mean of Response 2457739 Observations (or Sum Wgts) 340

Parameter Estimates Term Estimate Std Error t Ratio Prob>ltl Intercept 229453.25 1000247 0.23 0.8187 Position -878420.3 360107.9 -2.44 0.0152 Years 187395.22 31868.3 5.88 <.0001 TeamWins 24182.278 8591.193 2.81 0.0052 FGMgame 1786771.9 568734.9 3.14 0.0018 FG% -5.4421e6 2324909 -2.34 0.0198 FTAgame 700596.56 187101.3 3.74 0.0002 TRB -7305.113 2159.826 -3.38 0.0008 TRBgame 673427.75 172569.4 3.90 0.0001 Assists 10345.578 4387.798 2.36 0.0190 Assistsgame -920822.6 331394.5 -2.78 0.0058 Average -580874 244319.8 -2.38 0.0180

Effect Test Source Nparm DF Sum of Squares F Ratio Prob>F Position 1 1 3.00902e13 5.9503 0.0152 Years 1 1 1.74858e14 34.5780 <.0001 Tea mWi ns 1 1 4.00658e13 7.9230 0.0052 FGMgame 1 1 4.9911ge13 9.8700 0.0018 FG% 1 1 2.7707ge13 5.4792 0.0198 FTAgame 1 1 7.09036e13 14.0211 0.0002 TRB 1 1 5.7849ge13 11.4398 0.0008 TRBgame 1 1 7.70088e13 15.2284 0.0001 Assists 1 1 2.81127e13 5.5593 0.0190 Assistsgame 1 1 3.90434e13 7.7208 0.0058 Average 1 1 2.85846e13 5.6526 0.0180

WhOle-Model Test 35000000

30000000

25000000

20000000 i::'... iii rn 15000000

10000000

5000000

0 05000000 15000000 25000000 350000 salary predicted 10

Exhibit n (coot' d)

20000000 -

15000000 -

til 10000000 - • ;:, • "0 a::°iIJ 5000000 - .. >.- .: o - _:"~i-_mn'_nn __ n__ n_n ___ ,nnn_m __ - ' . •'. . -5000000 I I I I I I I o 5000000 15000000 25000000 3500000 Salary Predicted 11

Exhibit III

Prediction Field Player Stats Years 10 tearTMlins 50 fgm'garre 5.3 position o fg% 0.454 ftagarre 4.2 trb 164 trbgarre 2 assists 820 assistsgarre 10 average 14.6

14526876.07 -7858755.3 $6,668,120.77 12

Exhibit IV

Response: Salary Summary of Fit RSquare 0.511221 RSquareAdj 0.494627 Root Mean Square Error 1776153 Mean of Response 2367370 Observations (or Sum Wgts) 336

Parameter Estimates Term Estimate Std Error t Ratio Prob>ltl Intercept -1.5075e6 378477.1 -3.98 <.0001 Years 128057.91 25495.35 5.02 <.0001 TeamWins 13673.884 6729.671 2.03 0.0430 FGM -9709.217 3423.271 -2.84 0.0049 FGMgame 916804.39 249944.8 3.67 0.0003 FTAgame 280350.43 96406.74 2.91 0.0039 ORB 40421.366 12898.9 3.13 0.0019 ORBgame -3. 7274e6 862503.9 -4.32 <.0001 TRB -17142.46 5196.163 -3.30 0.0011 TRBgame 1746273.7 353726.6 4.94 <.0001 Steals 39856.516 11351.71 3.51 0.0005 Stlgame -2.9865e6 780946.4 -3.82 0.0002

Effect Test Source Nparm DF Sum of Squares F Ratio Prob>F Years 1 1 7.95887e13 25.2285 <.0001 TeamWins 1 1 1.30244e13 4.1285 0.0430 FGM 1 1 2.53774e13 8.0443 0.0049 FGMgame 1 1 4.2444ge13 13.4544 0.0003 FTAgame 1 1 2.66777e13 8.4564 0.0039 ORB 1 1 3.09797e13 9.8201 0.0019 ORBgame 1 1 5.89173e13 18.6759 <.0001 TRB 1 1 3.43354e13 10.8838 0.0011 TRBgame 1 1 7.68865e13 24.3719 <.0001 Steals 1 1 3.8889ge13 12.3275 0.0005 Stlgame 1 1 4.61352e13 14.6242 0.0002

Whole-Model Test

20000000

i 10000000 CI)

o o 10000000 20000000 salary predicted 13

Exhibit IV (cont' d)

10000000 -

....

.-..., -.- ...,,_. o -- .:~~~.:~: ......

\.~... ..': ..

I I I o 10000000 20000000 Salary Predicted 14

Exhibit V

Salary 35000000

30000000

25000000

20000000

15000000 '10 10000000 • 5000000 [l 0

Quantiles maximum 100.0% 3.314e7 99.5% 2.417e7 97.5% 1.07ge7 90.0% 5078000 quartile 75.0% 3169500 median 50.0% 1750000 quartile 25.0% 704000 10.0% 272250 2.5% 242000 0.5% 242000 minimum 0.0% 242000

Moments Mean 2458743 Std Dev 2991091 Std Error Mean 161977 Upper 95% Mean 2777350 Lower 95% Mean 2140136 N 341 Sum Weights 341 Sum 838431280 Variance 8. 9466e12 Skewness 5 Kurtosis 37 CV 122