
NBA MVP Prediction Model

Tongan Wu

Abstract

The NBA Most Valuable Player (MVP) award has been presented since the 1955-56 NBA season to the player judged to have performed best in each regular season. In 1981, the selection shifted from a vote of NBA players to a vote of sportswriters and broadcasters from the United States and Canada. For decades, the choice of MVP has been one of the most popular discussion topics in the league and among sportswriters, media outlets, and fans. This interest has also prompted many attempts to predict the winner.

Based on previous investigations and articles, many factors affect who wins the MVP award, including player efficiency rating, true shooting percentage, win shares, win shares per 48 minutes, assists, rebounds, fouls, turnovers, blocks, adjusted production, free throw percentage, and more.

In this research paper, I investigate the question: which factors predict who is awarded the NBA MVP? To do so, I conducted background research, graphed how different variables relate to the MVP award, built a stepwise logistic regression model that predicts the MVP from players' regular-season statistics, and tested the model against previous seasons' data and MVP results. I find that PER (Player Efficiency Rating), TS% (True Shooting Percentage), TRB% (Total Rebounds Percentage), AST% (Assists Percentage), USG% (Usage Percentage), WS (Win Shares), and Adjusted Production have a positive relationship with winning the MVP, while ORB% (Offensive Rebounds Percentage), DRB% (Defensive Rebounds Percentage), and TOV% (Turnover Percentage) have a negative one.

Finally, I discuss some unexpected results from the prediction and testing process, along with limitations and plans for future work.

Key Words: NBA, MVP prediction, logistic regression model, possibilities, variables, code

Contents

Abstract
Contents
Introduction
Data
Descriptive Analysis
Empirical Design
Results
Discussion
Conclusion
References
Acknowledgement

Introduction

On June 25th, 2019, Giannis Antetokounmpo was selected as the Most Valuable Player (MVP) for the 2018-2019 NBA regular season.

The NBA has presented the Most Valuable Player (MVP) award each year since the 1955-1956 season to acknowledge the player with the best performance in the regular season.

Chart 1.1 MVP in history

The Most Valuable Player Award was first presented by the National Basketball Association in 1955. The winner receives the Maurice Podoloff NBA Basketball Sports Trophy, also known as the NBA MVP Trophy, named in honor of the first commissioner and president of the NBA. Until the 1980-1981 season, the MVP was selected by a vote of NBA players. Since then, it has been decided by a group of sportswriters and broadcasters from North America.¹

The choice of MVP has been debated for years. People compare players' statistics and make predictions as the regular season goes on, and they eagerly await the result once it is over.

Interest for ‘nba mvp’ peaks each year in June when the award is announced. Source: Google Trends

Previously, I believed that the MVP award was only based on the ability of the player.

Among existing methods, one of the most effective measures of an NBA player's general basketball ability, derived from the player's basic statistics, is the Player Efficiency Rating (PER) developed by John Hollinger of ESPN. According to Hollinger, “The Player Efficiency Rating (PER) sums up all a player's positive accomplishments, subtracts the negative accomplishments, and returns a per-minute rating of a player's performance.”² The league-average PER is normalized to 15.00 every season. The higher a player's PER, the better he generally performed that season. Comparing each season's PER ranking with the MVP winner, I found a fairly consistent relationship. As Chart 1.2 below shows, the MVP winner had the highest PER of the season most of the time.

1 WIKIPEDIA.ORG: NBA Most Valuable Player Award. https://en.wikipedia.org/wiki/NBA_Most_Valuable_Player_Award

Season       MVP winner               Player with 1st PER      The MVP's PER Ranking
2006-2007    Dirk Nowitzki            Dwyane Wade              2
2007-2008    Kobe Bryant              LeBron James             7
2008-2009    LeBron James             LeBron James             1
2009-2010    LeBron James             LeBron James             1
2010-2011    Derrick Rose             LeBron James             9
2011-2012    LeBron James             LeBron James             1
2012-2013    LeBron James             LeBron James             1
2013-2014    Kevin Durant             Kevin Durant             1
2014-2015    Stephen Curry            Anthony Davis            3
2015-2016    Stephen Curry            Stephen Curry            1
2016-2017    Russell Westbrook        Russell Westbrook        1
2017-2018    James Harden             James Harden             1
2018-2019    Giannis Antetokounmpo    Giannis Antetokounmpo    1
Chart 1.2 The relationship of MVP and PER ranking

2 HOLLINGER, JOHN: “What is PER?” http://sports.espn.go.com/nba/columns/story?columnist=hollinger_john&id=2850240

However, the season's PER ranking does not by itself determine the winner of the MVP award, so a more precise model that takes more variables into account is needed. This research paper seeks to create an accurate NBA MVP prediction model that, at the very least, uncovers which variables are most predictive.

Among all the factors, the most important one found in the literature is team success, measured by the team's winning percentage (‘win%’).³ Among the 65 NBA MVPs from the 1955-1956 season to the 2018-2019 season, 41 were on the team with the best regular-season record, and 23 won the championship in their MVP year. Only 6 MVPs played for a team with a win% below .60, and more than two thirds of the MVPs' teams had a win% above .70. The two lowest win% among the 65 seasons were Bob Pettit's team at .458 in 1956 and Kareem Abdul-Jabbar's team at .488 in 1976. The two highest were Stephen Curry's team at .890 in 2016, with a record of 73-9, and Michael Jordan's team at .878 in 1996. Over the past thirty seasons, 29 of the MVPs played for a team with a win% above .65; the exception is Russell Westbrook, whose team had a win% of .573.
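The winning percentages quoted above are simply wins divided by games played. A minimal R sketch of that arithmetic follows. The helper name win_pct is illustrative, the 73-9 record comes from the text, and the 72-10 record behind the .878 figure for 1996 is an assumption that the text does not state.

```r
# Winning percentage = wins / (wins + losses).
# win_pct is an illustrative helper, not part of the paper's code.
win_pct <- function(wins, losses) wins / (wins + losses)

season_2016 <- win_pct(73, 9)    # 73-9 record quoted in the text
season_1996 <- win_pct(72, 10)   # assumed 72-10 record (not stated in the text)

# Rounded to three decimals, as win% is usually reported.
round(c(season_2016, season_1996), 3)
```

Rounded to three decimals, the two values match the .890 and .878 reported in the paragraph.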

However, winning is not the only factor that influences the MVP. Based on previous research, many factors and criteria contribute to the MVP decision, including team wins, games played, minutes played, points per game, true shooting percentage, free throws, three-pointers, assists, rebounds, blocks, steals, fouls, turnovers, and a player's contribution to the offense, defense, and overall result of each game. My model also suggests that win shares, offensive win shares per 48 minutes, and adjusted production are some of the most important variables in determining the NBA MVP.

3 LEUNG, STUART: “NBA MVP Criteria: How the NBA should choose MVP based on past winners” https://www.interbasket.net/news/22344/2018/06/nba-mvp-criteria-how-nba-choose-mvp-winning-team-record/

To carry out my analysis, I first did background research by reading articles and essays and searching the Internet. Second, I identified and summarized the variables that influence a player's chances of winning the MVP. Then I found an NBA dataset and downloaded it into Excel. Before creating the model, I graphed different variables to show how they relate to the MVP award. This descriptive analysis gave me more background on these factors and a deeper understanding of each one's role. After that, I entered the MVP results into the Excel sheet, coding ‘1’ for MVP and ‘0’ for NonMVP.

My analysis builds on the literature by broadening the NBA statistics considered in predicting the NBA MVP, using a dataset with over 50 variables.⁴ I used data from 67 seasons to create a model that predicts the probability of a player winning the MVP based on his statistics.

Creating an initial predictive model for the MVP is worthwhile because a precise model can help NBA players, commentators, and fans. For example, it could help those who choose the MVP think more carefully about the criteria they use.

Data

Sources of data

Kaggle is an online platform that hosts openly downloadable datasets for thousands of projects, covering a wide variety of subjects such as Government, Sports, Medicine, Fintech, Food, and much more.⁵ Through this platform, Omri Goldstein shared a collection of NBA statistics going back to the 1950 season, covering more than 3000 players, over 60 seasons, and more than 50 features per player. The data is free to download and provides a comprehensive source for further research.

4 LEUNG, STUART: “NBA MVP Criteria: How the NBA should choose MVP based on past winners” https://www.interbasket.net/news/22344/2018/06/nba-mvp-criteria-how-nba-choose-mvp-winning-team-record/
5 KAGGLE.COM: Your Home for Data Science https://www.kaggle.com/

The link of this project’s resource data is listed below:

https://www.kaggle.com/drgilermo/nba-players-stats#Seasons_Stats.csv

The variables in the data that are likely to influence a player's chances of winning the MVP are listed in Chart 2.1 below.

Variable   Description
PER        Player Efficiency Rating
G          Games
MP         Minutes Played
PTS        Points
TS%        True Shooting Percentage (=PTS / (2 * TSA), TSA=True Shooting Attempts)
FG         Field Goals
FG%        Field Goal Percentage (=FG/FGA, FGA=Field Goal Attempts)
FT         Free Throws
FT%        Free Throw Percentage (=FT/FTA, FTA=Free Throw Attempts)
3P         3-Point Field Goals
3P%        3-Point Field Goal Percentage (=3P/3PA, 3PA=3-Point Field Goal Attempts)
DRB        Defensive Rebounds
DRB%       Defensive Rebounds Percentage
ORB        Offensive Rebounds
ORB%       Offensive Rebounds Percentage, =100 * (ORB * (Tm MP / 5)) / (MP * (Tm ORB + Opp DRB))
TRB        Total Rebounds
TRB%       Total Rebounds Percentage, =100 * (TRB * (Tm MP / 5)) / (MP * (Tm TRB + Opp TRB))
AST        Assists
AST%       Assists Percentage, =100 * AST / (((MP / (Tm MP / 5)) * Tm FG) - FG)
TOV        Turnovers
TOV%       Turnover Percentage
BLK        Blocks
BLK%       Blocks Percentage, =100 * (BLK * (Tm MP / 5)) / (MP * (Opp FGA - Opp 3PA))
USG%       Usage Percentage, the share of team possessions a player ends (by a shot attempt, free throw attempt, or turnover) while he is on the floor
3PAr       3-Point Attempt Rate, the share of a player's field goal attempts taken from 3-point range
FTr        Free Throw Attempt Rate, the number of free throw attempts per field goal attempt
WS         Win Shares
Chart 2.1 Glossary for variables
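The glossary formulas can be written as small R functions. The sketch below is illustrative only: the function names and the sample numbers are made up, and the definition TSA = FGA + 0.44 * FTA is the usual Basketball-Reference convention, which the glossary itself does not spell out and is therefore an assumption here.

```r
# True Shooting Percentage: PTS / (2 * TSA).
# TSA = FGA + 0.44 * FTA is an assumed convention; the glossary does not define TSA.
ts_pct <- function(pts, fga, fta) {
  tsa <- fga + 0.44 * fta
  pts / (2 * tsa)
}

# Rebound percentage, following the ORB% / TRB% pattern in Chart 2.1:
# 100 * (REB * (Tm MP / 5)) / (MP * (Tm REB + Opp REB)).
reb_pct <- function(reb, mp, tm_mp, tm_reb, opp_reb) {
  100 * (reb * (tm_mp / 5)) / (mp * (tm_reb + opp_reb))
}

# Made-up season line, for illustration only.
ts_example  <- ts_pct(pts = 2000, fga = 1500, fta = 500)
trb_example <- reb_pct(reb = 800, mp = 3000, tm_mp = 19800,
                       tm_reb = 3500, opp_reb = 3400)
```

Note the parentheses around 2 * TSA: without them the expression would multiply by TSA instead of dividing by it.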

Descriptive Analysis

Before building the model to predict a player's chances of winning the MVP, I selected several individual variables and generated graphs to illustrate how each relates to the MVP award.

● Age range

The average ages of MVPs and NonMVPs are nearly the same, but their age ranges differ. Using the MAX and MIN formulas, I obtained the maximum and minimum ages of both groups. MVP ages run from 22 to 35, a range of 13, while NonMVP ages run from 18 to 44, a range of 26, twice that of MVPs. Thus, we can conclude that the ages of MVPs are more concentrated and the ages of NonMVPs are more scattered.

Age
           Maximum Age   Minimum Age   Average Age
MVP        35            22            27.74
NON-MVP    44            18            26.79
Chart 3.1 Age for MVPs and NonMVPs

Figure 3.1 Maximum and minimum age for MVPs and NonMVPs

● PER

PER (Player Efficiency Rating) measures the general basketball ability of an NBA player by summing up the player's accomplishments. Therefore, the higher the PER, the more capable the player. As mentioned before, even though PER does not by itself determine the MVP, the difference in average PER between MVPs and NonMVPs is apparent. The averages, calculated with the AVERAGEIF formula, are shown in the chart below. The average PER of MVPs is 26.26, while the NonMVPs' average PER is 12.44. This contrast demonstrates that the general regular-season performance of an MVP is much better than that of a NonMVP.

           PER
MVP        27.49
NON-MVP    12.43
Chart 3.2 PER for MVPs and NonMVPs

Figure 3.2 PER for MVPs and NonMVPs
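The Excel steps used in this section (AVERAGEIF, MAX, MIN by group) have direct equivalents in base R. The sketch below uses a small invented data frame, not the paper's dataset; the column names MVP, Age, and PER mirror the variables discussed above, and the object names are hypothetical.

```r
# Toy data, invented for illustration; the paper's data come from Seasons_Stats.csv.
toy <- data.frame(
  MVP = c(1, 0, 0, 0, 1, 0),
  Age = c(27, 22, 35, 19, 29, 31),
  PER = c(28.0, 12.0, 15.5, 9.5, 26.0, 13.0)
)

# AVERAGEIF equivalent: mean PER within each MVP group (named "0" and "1").
mean_per <- tapply(toy$PER, toy$MVP, mean)

# MIN / MAX equivalent: a 2-row matrix (min, max) with one column per group.
age_range <- sapply(split(toy$Age, toy$MVP), range)
```

Here mean_per[["1"]] is the MVP average and age_range[, "1"] holds the MVP minimum and maximum age.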


Fitting the linear model “linearMod <- lm(MVP ~ PER, data=NBA_datacs1)” in RStudio, the coefficient and the significance codes show that PER is a statistically significant predictor of MVP. As shown in the chart below, the three stars on the PER coefficient indicate significance at the 1% level.

                               MVP (0, 1)
PER                            0.0007867***
(Player Efficiency Rating)     (0.0000547)
Constant                       -0.0076057***
                               (0.0007686)
R^2                            0.04659
N                              17261
Chart 3.3 PER linear model
***p<0.01, **p<0.05, *p<0.1
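To make the coefficients in Chart 3.3 concrete, the sketch below plugs the two group means from Chart 3.2 into the fitted line. It is a worked illustration only; the function name predict_mvp is hypothetical and the numbers are copied from the charts.

```r
# Coefficients copied from Chart 3.3 (linear model MVP ~ PER).
intercept <- -0.0076057
slope_per <- 0.0007867

# Fitted value of the linear model at a given PER (hypothetical helper).
predict_mvp <- function(per) intercept + slope_per * per

# Group means from Chart 3.2.
fit_mvp_mean    <- predict_mvp(27.49)  # about 0.014
fit_nonmvp_mean <- predict_mvp(12.43)  # about 0.002
```

The fitted values are small because MVP seasons are a tiny fraction of all player-seasons. For a PER below roughly 9.7 the line gives a negative value, which is a known quirk of fitting a straight line to a 0/1 outcome.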

Using plot(NBA_datacs1$PER, NBA_datacs1$MVP, main = "MVP PER", xlab = "PER", ylab = "MVP or not") in RStudio, I also produced a scatter plot of PER for both MVPs and NonMVPs. The scatter plot suggests that the mean and median PER of MVPs (1.0 means MVP) are much higher than those of NonMVPs (0.0 means NonMVP).


Figure 3.3 PER scatter plot

● TS% (True Shooting Percentage)

I fit “linearMod <- lm(MVP ~TSper, data=NBA_datacs1)” in RStudio. The coefficient and significance codes show that True Shooting Percentage also has three stars, meaning it is statistically significant at the 1% level.

                               MVP (0, 1)
TS%                            0.019999***
(True Shooting Percentage)     (0.003622)
Constant                       -0.007847***
                               (0.001856)
R^2                            0.04692
N                              17195
Chart 3.4 True Shooting Percentage linear model
***p<0.01, **p<0.05, *p<0.1

● WS (Win shares)

Win Shares is a metric that estimates the number of wins a player produces for his team.⁶ A player's Win Shares can be calculated directly or by adding his Offensive Win Shares to his Defensive Win Shares. The WS of MVPs and NonMVPs differ enormously.

Using the AVERAGEIF function in Excel, I calculated the average WS for both MVPs and NonMVPs, shown below. The average for MVPs is 13.7, while the average for NonMVPs is only 2.4, far lower. This exhibits the positive relationship between winning the MVP and Win Shares.

           WS (Win Shares)
MVP        15.5
NON-MVP    2.42
Chart 3.5 Win Shares for MVPs and NonMVPs

6 SPORTINGCHARTS.COM: What is Win Shares - WS? https://www.sportingcharts.com/dictionary/nba/win-shares-ws.aspx


Figure 3.4 Win Shares for MVPs and NonMVPs

Moreover, fitting the linear model “linearMod <- lm(MVP ~ WS, data=NBA_datacs1)” in RStudio, the coefficient and significance codes show that Win Shares is a statistically significant predictor of MVP.

                 MVP (0, 1)
WS               0.0031673***
(Win Shares)     (0.0001160)
Constant         -0.0055820***
                 (0.0004508)
R^2              0.04589
N                17261
Chart 3.6 Win Shares linear model
***p<0.01, **p<0.05, *p<0.1

● Other scatter plots that show a positive relationship between two variables, including TS% vs WS and TS% vs PER, are shown in the graphs below.


Figure 3.5 Relationship of a player’s True shooting percentage and the Win Shares

Figure 3.6 Relationship of a player’s True shooting percentage and the PER

Empirical Design

To create a predictive model of the NBA MVP from player statistics, I first identified variables with some reasonable likelihood of influencing the award. Next, I entered them into a stepwise logistic regression model in RStudio using NBA data from the 1950-51 season to the 2015-16 season and generated the coefficient (and standard error) of each variable. By comparing the coefficients, I was able to distinguish the importance of each factor in determining the MVP.

To test the accuracy of the model, I then used it to predict each player's MVP probability for 2006-2016 and examined where the actual MVP of each year fell in that year's probability ranking.

Results

1. The creation of the model

I selected variables that plausibly affect a player's chances of winning the MVP, put them into the linear model code, and ran it in RStudio. The variable with the largest p-value in the output was dropped, and the remaining variables were run again. I repeated this cycle until every remaining variable had at least one significance star.
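The drop-and-refit cycle described above can be sketched as a short loop. This is a minimal illustration on simulated data, not the paper's actual script: the function name backward_by_p, the toy variables x1 to x3, and the threshold alpha = 0.1 (one star in the charts' legend) are all assumptions made for the example.

```r
set.seed(1)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Simulated 0/1 outcome that depends on x1 only; x2 and x3 are pure noise.
toy$y <- as.numeric(toy$x1 + rnorm(n) > 1.5)

# Refit, drop the predictor with the largest p-value, and repeat until
# every remaining predictor has p < alpha (or only one predictor is left).
backward_by_p <- function(data, response, predictors, alpha = 0.1) {
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    # p-values of the predictors (row 1 is the intercept, so it is skipped).
    p <- summary(fit)$coefficients[-1, "Pr(>|t|)", drop = FALSE]
    worst <- which.max(p[, 1])
    if (p[worst, 1] < alpha || length(predictors) == 1) return(fit)
    predictors <- setdiff(predictors, rownames(p)[worst])
  }
}

final_fit <- backward_by_p(toy, "y", c("x1", "x2", "x3"))
kept <- names(coef(final_fit))[-1]  # predictors that survived
```

Dropping by p-value one variable at a time is simple to follow, but the surviving set can depend on the order of removal when predictors are correlated, so the result is best read as exploratory.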

The final variables and the code used for the model are listed below:

> linearMod <- lm(MVP ~ PER + MP + TSper + ORBper + DRBper + TRBper + TOVper + USGper + WS + OWSper_fourtyeight + Adjusted_Production, data=NBA_datacs1_4_33_55_PM)
> summary(linearMod)

                                          MVP (0, 1)
PER                                       1.295e-03**
(Player Efficiency Rating)                (4.801e-04)
TS%                                       1.437e-01***
(True Shooting Percentage)                (1.682e-02)
MP                                        1.996e-05***
(Minutes Played)                          (1.222e-06)
ORB%                                      -1.321e-02***
(Offensive Rebounds Percentage)           (1.916e-03)
DRB%                                      -1.224e-02***
(Defensive Rebounds Percentage)           (1.929e-03)
TRB%                                      2.575e-02***
(Total Rebounds Percentage)               (3.830e-03)
AST%                                      1.225e-04*
(Assists Percentage)                      (5.713e-05)
TOV%                                      -8.627e-04***
(Turnover Percentage)                     (1.498e-04)
USG%                                      1.117e-03***
(Usage Percentage)                        (2.284e-04)
WS                                        1.850e-03***
(Win Share)                               (3.126e-04)
OWS/48                                    2.266***
(Offensive Win Shares Per 48 minutes)     (3.336e-02)
Adjusted Production                       8.064e-15***
                                          (1.644e-15)
Constant                                  3.542e-02***
                                          (8.264e-03)
R^2                                       0.05464
N                                         11870
Chart 4.1 MVP prediction model
***p<0.01, **p<0.05, *p<0.1

2. Testing the model

After creating the logistic model, I tested the prediction model against historical data using the following code in RStudio:

    # Install and load the packages used to read and write the data
    install.packages("readxl")
    install.packages("xlsx")
    library(readxl)
    library(xlsx)

    # Read the Excel sheet and drop rows with missing values
    NBA_datacs1 <- read_excel("Desktop/NBA datacs1.xlsx")
    nba_data <- na.omit(NBA_datacs1)
    View(nba_data)

    # Backward stepwise selection; family is an argument of glm(), not step()
    step(glm(MVP ~ PER + TSper + MP + ORBper + DRBper + TRBper + TOVper + USGper + WS + Wstwo + OWStwo + Height + Adjusted_Production + TrueSalary, data = nba_data, family = "binomial"), direction = "backward")

    # Fit the model and store each player's fitted probability
    model <- glm(formula = MVP ~ PER + TSper + MP + ORBper + DRBper + TRBper + TOVper + USGper + WS + Wstwo + OWStwo + Height + Adjusted_Production + TrueSalary, data = nba_data, family = "binomial")
    summary(model)
    nba_data$MVP_predict <- model$fitted.values

    # Save the data with the predictions for sorting in Excel
    write.csv(nba_data, file = "nba_prediction.csv")
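Whether glm() fits a logistic regression depends on its family argument: with no family it fits an ordinary (Gaussian) linear model, like lm(). The sketch below contrasts the two on simulated data; the data, the variable names, and the true coefficients are invented for illustration and say nothing about the paper's dataset.

```r
set.seed(42)
n <- 2000
sim <- data.frame(x = rnorm(n))
# Simulated rare 0/1 outcome whose true log-odds are -4 + 1.5 * x.
sim$y <- rbinom(n, size = 1, prob = plogis(-4 + 1.5 * sim$x))

lpm   <- lm(y ~ x, data = sim)                      # linear probability model
logit <- glm(y ~ x, data = sim, family = binomial)  # logistic regression

p_lpm   <- fitted(lpm)    # fitted values of a straight line
p_logit <- fitted(logit)  # fitted probabilities on the logistic curve

# Compare the spread of the two sets of fitted values.
range(p_lpm)
range(p_logit)
```

Logistic fitted values always lie strictly between 0 and 1, while a straight line fitted to a rare 0/1 outcome can dip below 0 for low values of the predictor. Either set of fitted values can still be used to rank players within a season.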

Each player's predicted probability of winning the MVP is listed in the new output file. Sorting each year's probabilities from largest to smallest gives the probability ranking.

The MVP winners from the 2006-2007 NBA season onward are listed in the chart together with their probabilities calculated from the model. Within each season, the players' probabilities are ranked from largest to smallest, and the rank of that season's MVP winner is shown in Chart 4.2 below.

Season              MVP              Prob(MVP) based on model   Ranking of that probability
2006-2007 Season    Dirk Nowitzki    8.8%                       1
2007-2008 Season    Kobe Bryant      6.6%                       3
2008-2009 Season    LeBron James     11.7%                      2
2009-2010 Season    LeBron James     10.2%                      1
2010-2011 Season    Derrick Rose     6.1%                       4
2011-2012 Season    LeBron James     7.8%                       1
2012-2013 Season    LeBron James     10.5%                      1
2013-2014 Season    Kevin Durant     9.8%                       1
2014-2015 Season    Stephen Curry    9.2%                       2
2015-2016 Season    Stephen Curry    10.0%                      1
Chart 4.2 Testing the model
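The sorting step behind Chart 4.2 can also be done directly in R instead of in a spreadsheet. The sketch below ranks players within each season on a small invented table; the column names Season, Player, MVP, and MVP_predict are assumptions about how the prediction file is laid out, and the values are made up.

```r
# Toy predictions, invented for illustration (not the paper's data).
pred <- data.frame(
  Season = c(2015, 2015, 2015, 2016, 2016, 2016),
  Player = c("A", "B", "C", "D", "E", "F"),
  MVP    = c(0, 1, 0, 1, 0, 0),
  MVP_predict = c(0.031, 0.092, 0.104, 0.100, 0.047, 0.012)
)

# Rank players within each season; 1 = highest predicted probability.
pred$Rank <- ave(-pred$MVP_predict, pred$Season, FUN = rank)

# Rank of the actual MVP in each season.
mvp_rank <- pred[pred$MVP == 1, c("Season", "Player", "MVP_predict", "Rank")]
```

In this toy table the 2015 MVP ranks second and the 2016 MVP ranks first, which is the same kind of summary Chart 4.2 reports.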

Based on the chart above, the actual MVP ranked first in the model's probability ranking in six of the ten seasons and no lower than fourth in any season, which indicates that the prediction model succeeds to some extent. The largest misses are Kobe Bryant in 2007-2008 (third) and Derrick Rose in 2010-2011 (fourth). These two exceptions, and the likely reasons they won the MVP award, are analyzed in the Discussion section below.

Discussion

Some surprising results emerged during the research process. The height of players, a variable I originally included but then excluded because of its high p-value, does not seem to be decisive for winning the MVP. In fact, the initial coefficient of height (before I excluded it) was negative. Moreover, Offensive Rebounds Percentage and Defensive Rebounds Percentage also have negative coefficients, yet they remain significant in determining the MVP. I was confused by these results at first, but I found an explanation by looking at the players' positions on the floor.

There are five positions on the floor: point guard (PG), shooting guard (SG), small forward (SF), power forward (PF), and center (C). The center, also known as the five or the big man, is usually the tallest player on a team. In the NBA, the center is usually 6 feet 10 inches (2.08 m) or taller and usually weighs 240 pounds (110 kg) or more.⁷ The center's most crucial jobs are rebounding and shot blocking, which call for excellent leaping ability; endurance and shooting ability also matter. The power forward's most important abilities are speed, defense, rebounding, and jockeying for position on rebounds. For the small forward, shooting, leaping ability, steals, three-point shooting, and rebounds are the most important. The responsibilities of the two backcourt positions differ from those of the other three. The point guard needs dribbling, passing, steals, shooting, speed, endurance, three-point shooting, and the ability to organize the attack. For the shooting guard, free throw and three-point shooting, speed, leaping ability, dribbling, passing, and stealing the ball are the most important.

7 WIKIPEDIA.ORG: Center (basketball) https://en.wikipedia.org/wiki/Center_(basketball)

Given the different roles of the positions, centers are usually the tallest players and contribute the most offensive and defensive rebounds. I believe this is one reason why height, ORB%, and DRB% have negative coefficients in the model. Since the start of the 21st century, the only center to win the MVP is Shaquille O'Neal, in the 1999-2000 season. Even though O'Neal is one of the best players in NBA history, he won the award only once. Apart from the fierce competition of that period, this also reflects the decline of the center position in the 21st century, as league trends have shown. However, TRB% (Total Rebounds Percentage) has a positive coefficient. A likely reason is that overall rebounding reflects a player's ability at center, power forward, or small forward, whereas excelling in only one type of rebound (ORB or DRB) does not by itself indicate strong ability and great performance.

Therefore, the surprising results for height, ORB%, and DRB% are probably due to the decline of the center, the position that usually contributes most to average height as well as to ORB% and DRB%.

Moreover, one variable that I did not include in the model but that might matter in some cases is years of experience, the length of time a player has been in the NBA. Experience is not the determining factor in the decision, since a long and varied career does not necessarily lead to better performance. However, it can be very useful when voters are choosing between two or more players who performed well.

This observation comes from an exception found while testing the model: the contested choice of MVP in the 2007-2008 season between Kobe Bryant and Chris Paul.

In the 2007-08 season, Kobe Bryant led the Lakers to first place in the West at 57-25, while Chris Paul's Hornets came second at 56-26.⁸ The two teams were nearly identical in record. Each player's basic statistics had their own strengths: Bryant was more focused on scoring, and Paul on organizing the offense. Bryant, as the leader of his team, maintained consistently excellent performance, while Paul, the indispensable core of his team, was the organizer and initiator of the attack. Paul also led the league in steals and assists that season and had an advantage in advanced statistics (WS of 17.8 versus Bryant's 13.8, and PER of 28.3 versus Bryant's 24.2).⁹

However, Bryant received 82 first-place votes in the 2008 regular-season MVP vote, while Paul received only 28.¹⁰ The main reason was the difference in experience. Bryant was widely regarded as the best player in the league, had posted striking scoring numbers in the previous two seasons, and had been selected to the All-NBA First Team and the All-Star Game many times. He had long since won championships and the All-Star MVP, and the regular-season MVP was the only honor he lacked. The public was also looking forward to his first MVP. Paul's situation was entirely different. The 2007-08 season was only his third; before it he had never been named to an All-NBA team, had never been an All-Star, had never finished in the top 15 of regular-season MVP voting, and had never played in the playoffs. As a result, the gap between the two players' résumés and public reputations tipped the MVP toward Bryant.

8 BASKETBALL-REFERENCE.COM: 2007-08 NBA Season Summary https://www.basketball-reference.com/leagues/NBA_2008.html
9 LOB CITY: “Why Chris Paul Got Robbed of MVP in 2008” https://aminoapps.com/c/nba/page/blog/why-chris-paul-got-robbed-of-mvp-in-2008/vdJZ_nQQFnullk42m8PXRXYJ2VLxLnnjYg
10 BASKETBALL-REFERENCE.COM: 2007-2008 NBA Awards Voting https://www.basketball-reference.com/awards/awards_2008.html

Another exception is Derrick Rose, who won the MVP in the 2010-2011 season and is the youngest MVP in NBA history. If Rose had not won that season, LeBron James would probably have won instead and become the first NBA player to win the MVP in five consecutive seasons. Judging from the data alone, Rose averaged 25 points and 7.7 assists per game in his MVP season, with a field goal percentage of only 44 per cent and a three-point percentage of only 33 per cent, well short of LeBron James's numbers in the 2010-11 season.¹¹ His scoring average ranked only seventh, and his assists ranked tenth. From this perspective alone, Rose did not seem to deserve the MVP. However, Rose's importance to his team that season was almost unmatched in the league. Although the Bulls appeared to have a good lineup, they were strong on defense but weak on offense, and Rose carried the team's offensive system almost alone. Therefore, although his statistics were not very impressive, his leadership, his visual impact, and his importance to the team earned Rose the MVP.

Admittedly, this study has limitations. First, the outcome is highly imbalanced: only one player wins the MVP each regular season, coded ‘1’ for the MVP variable, while every other player is coded ‘0’, so NonMVPs vastly outnumber MVPs, which may add uncertainty to the final results. Second, the criteria for choosing the MVP might have changed over time, yet players from all eras are treated equally in a single model.

11 BASKETBALL-REFERENCE.COM: Derrick Rose Stats https://www.basketball-reference.com/players/r/rosede01.html

Several steps remain for future research. Predicting other measures, such as NBA 2K ratings, could provide a better understanding of NBA players' data and its meaning, which could in turn help refine the MVP prediction. I am also planning to apply different regression methods. Stepwise regression is the main method used in this research, but a machine learning classification method might also be effective. Moreover, data from the 2016-2017 NBA season and later seasons should be analyzed to test the prediction model more precisely. Last but not least, future MVP probabilities can be calculated from each player's up-to-date data, so that the final choice of MVP can be predicted and then tested.

Conclusion

The MVP (Most Valuable Player) award has been one of the most popular discussion topics in the NBA. Each year, thousands of sportswriters, media reporters, and basketball fans take great interest in the final decision and make their own predictions about who will be named Most Valuable Player.

In this essay, I chose variables that influence a player's chances of winning the MVP based on previous investigations, examined the relationships between MVP and different variables, created a stepwise logistic regression model that predicts the Most Valuable Player in the NBA from players' regular-season data, and tested the model against historical data and MVP results.

Many factors enter into the choice of the Most Valuable Player, including player efficiency rating, true shooting percentage, win shares, win shares per 48 minutes, assists, rebounds, fouls, turnovers, blocks, adjusted production, free throw percentage, experience, and much more. Among them, PER (Player Efficiency Rating), TS% (True Shooting Percentage), MP (Minutes Played), TRB% (Total Rebounds Percentage), AST% (Assists Percentage), USG% (Usage Percentage), WS (Win Shares), OWS/48 (Offensive Win Shares Per 48 minutes), and Adjusted Production positively influence the probability of winning the MVP. In contrast, ORB% (Offensive Rebounds Percentage), DRB% (Defensive Rebounds Percentage), and TOV% (Turnover Percentage) have negative relationships with the MVP.

The model's probability rankings are reasonably accurate: they agree with most of the historical MVP results in the test seasons. Limitations, including the lack of comparison with other models and of more rigorous testing, require further research. Hopefully, additional research will expand on my work to create an even more accurate predictive model that, among other things, will help uncover the most important factors in being awarded the NBA MVP.

References

1. HOLLINGER, JOHN: “What is PER?” http://sports.espn.go.com/nba/columns/story?columnist=hollinger_john&id=2850240

2. FEIN, ZACH: Calculating PER. http://www.basketball-reference.com/about/per.html

3. WIKIPEDIA.ORG: NBA Most Valuable Player Award. https://en.wikipedia.org/wiki/NBA_Most_Valuable_Player_Award

4. BASKETBALL-REFERENCE.COM: NBA League Index. http://www.basketball-reference.com/leagues/

5. ESPN.COM: Hollinger NBA Player Statistics - All Players. http://insider.espn.com/nba/hollinger/statistics/_/year/2011

6. BASKETBALL-REFERENCE.COM: NBA MVP & ABA Most Valuable Player Award Winners https://www.basketball-reference.com/awards/mvp.html

7. LEUNG, STUART: “NBA MVP Criteria: How the NBA should choose MVP based on past winners” https://www.interbasket.net/news/22344/2018/06/nba-mvp-criteria-how-nba-choose-mvp-winning-team-record/

8. NBA.COM: Blogtable: What criteria matters most in making your MVP decision? https://www.nba.com/article/2017/04/12/blogtable-what-criteria-matters-most-making-mvp-decision

9. LI, PETER: “NBA MVP Prediction Model: Calculating Value based on Team and Individual Success” https://towardsdatascience.com/nba-mvp-predictor-c700e50e0917

10. SPORTINGCHARTS.COM: What is Win Shares - WS? https://www.sportingcharts.com/dictionary/nba/win-shares-ws.aspx

11. BASKETBALL-REFERENCE.COM: NBA Win shares https://www.basketball-reference.com/about/ws.html

12. BASKETBALL-REFERENCE.COM: Glossary https://www.basketball-reference.com/about/glossary.html#fga

13. GOLDSTEIN, OMRI: “NBA Players stats since 1950: 3000+ Players over 60+ Seasons, and 50+ features per player” https://www.kaggle.com/drgilermo/nba-players-stats#Seasons_Stats.csv

14. WIKIPEDIA.ORG: Center (basketball) https://en.wikipedia.org/wiki/Center_(basketball)

15. WIKIPEDIA.ORG: Basketball positions https://en.m.wikipedia.org/wiki/Basketball_positions

16. BASKETBALL-REFERENCE.COM: Derrick Rose Stats https://www.basketball-reference.com/players/r/rosede01.html

17. BASKETBALL-REFERENCE.COM: 2007-08 NBA Season Summary https://www.basketball-reference.com/leagues/NBA_2008.html

18. LOB CITY: “Why Chris Paul Got Robbed of MVP in 2008” https://aminoapps.com/c/nba/page/blog/why-chris-paul-got-robbed-of-mvp-in-2008/vdJZ_nQQFnullk42m8PXRXYJ2VLxLnnjYg

19. BASKETBALL-REFERENCE.COM: 2007-08 NBA Awards Voting https://www.basketball-reference.com/awards/awards_2008.html

20. BUI, ETHAN: “NBA MVP 2008: Kobe Bryant vs. LeBron James, Chris Paul” https://bleacherreport.com/articles/22122-nba-mvp-2008-kobe-bryant-vs-lebron-james-chris-paul

Acknowledgement

I would like to express my heartfelt thanks to all those who helped me in writing this paper.

I sincerely acknowledge the help of my supervisor, Sisi Wang, for her useful advice on my topic, as well as her help in instructing the generation of tables and graphs in this essay.

The topic grew out of my own interest. I am a big fan of basketball and the NBA. I have been watching the NBA for four years, and my favorite team is the . I became fully absorbed in NBA games and the MVP award when Stephen Curry burst onto the scene, led the Golden State Warriors to a regular-season record of 73 wins, and won the Most Valuable Player Award by a unanimous vote. As I watched Westbrook, Harden, and Antetokounmpo win the MVP award, I paid more and more attention to predicting the MVP. I had long had the idea of using mathematical methods to create a model that predicts a player's chances of winning the MVP, which later developed into the topic of this essay.
