NBA MVP Prediction Model
Tongan Wu

Abstract

The NBA Most Valuable Player (MVP) award has been presented since the 1955-56 NBA season to the player judged to have performed best in each regular season. In 1981, the selection shifted from a vote by NBA players to a decision by sportswriters and reporters from the United States and Canada. For decades, the final choice of MVP has been one of the most popular discussion topics in the league as well as among sportswriters, media outlets, and basketball fans. This interest has also prompted many attempts to predict the winner.

Based on previous investigations and articles, many factors affect who wins the MVP award, including average points per game, true shooting percentage, win shares, win shares per 48 minutes, assists, rebounds, fouls, turnovers, blocks, adjusted production, free throw percentage, and more. In this research paper, I investigate the question: which factors predict who is awarded the NBA MVP? To do so, I conducted background research, developed graphs of different variables and how they relate to the MVP award, created a stepwise logistic regression model to predict the NBA MVP from the players’ data for each regular season, and tested my model against previous data and MVP results.

I find that PER (Player Efficiency Rating), TS% (True Shooting Percentage), TRB% (Total Rebound Percentage), AST% (Assist Percentage), USG% (Usage Percentage), WS (Win Shares), and Adjusted Production have a positive relationship with winning the MVP, while ORB% (Offensive Rebound Percentage), DRB% (Defensive Rebound Percentage), and TOV% (Turnover Percentage) have a negative one. Finally, I discuss some unexpected results from the prediction and testing process, as well as some limitations and future plans.
Key Words: NBA, MVP prediction, logistic regression model, possibilities, variables, code

Contents

Abstract
Contents
Introduction
Data
Descriptive Analysis
Empirical Design
Results
Discussion
Conclusion
References
Acknowledgement

Introduction

On June 25th, 2019, Giannis Antetokounmpo was selected as the Most Valuable Player (MVP) for the 2018-2019 NBA regular season. The NBA has been awarding the MVP each year since the 1955-1956 season, with the purpose of acknowledging the player with the best performance of the regular season.

Chart 1.1 MVP in history

The Most Valuable Player Award was first presented by the National Basketball Association in 1955. The winner receives the Maurice Podoloff Trophy, also known as the NBA MVP Trophy, named in honor of the first commissioner and president of the NBA. Before the 1980-1981 season, the MVP was selected by a vote of NBA players. Since then, it has been decided by a group of sportswriters and broadcasters from North America.[1]

[1] WIKIPEDIA.ORG: NBA Most Valuable Player Award. https://en.wikipedia.org/wiki/NBA_Most_Valuable_Player_Award

The choice of MVP has been debated for years. People compare the players’ statistics and make predictions as the regular season goes on, and they eagerly wait for the result after the season is over.

[Figure: Interest for ‘nba mvp’ peaks each year in June when the award is announced. Source: Google Trends]

Previously, I believed that the MVP award was based only on the ability of the player. Among existing methods, one of the most effective and powerful ways to measure the general basketball ability of an NBA player from his basic statistics is the Player Efficiency Rating (PER), developed by John Hollinger of ESPN.
According to Hollinger, “The Player Efficiency Rating (PER) sums up all a player's positive accomplishments, subtracts the negative accomplishments, and returns a per-minute rating of a player's performance.”[2] The league average PER is normalized at 15.00 every season. The higher a player's PER, the better he generally performed in that season. Comparing the PER ranking list with the MVP winner of each season, I found a somewhat consistent correlation. As Chart 1.2 below shows, the MVP winner had the highest PER of the season in 9 of the 13 seasons listed.

Season     MVP winner             Player with 1st PER    The MVP’s PER Ranking
2006-2007  Dirk Nowitzki          Dwyane Wade            2
2007-2008  Kobe Bryant            LeBron James           7
2008-2009  LeBron James           LeBron James           1
2009-2010  LeBron James           LeBron James           1
2010-2011  Derrick Rose           LeBron James           9
2011-2012  LeBron James           LeBron James           1
2012-2013  LeBron James           LeBron James           1
2013-2014  Kevin Durant           Kevin Durant           1
2014-2015  Stephen Curry          Anthony Davis          3
2015-2016  Stephen Curry          Stephen Curry          1
2016-2017  Russell Westbrook      Russell Westbrook      1
2017-2018  James Harden           James Harden           1
2018-2019  Giannis Antetokounmpo  Giannis Antetokounmpo  1

Chart 1.2 The relationship of MVP and PER ranking

[2] HOLLINGER, JOHN: “What is PER?” http://sports.espn.go.com/nba/columns/story?columnist=hollinger_john&id=2850240

However, the chart also shows that the PER ranking does not necessarily determine the winner of the MVP award: Kobe Bryant ranked 7th and Derrick Rose 9th in PER in their MVP seasons. Therefore, a more precise model is needed that takes more variables into account. This research paper seeks to create an accurate NBA MVP prediction model that, at the very least, uncovers which variables are most predictive.

Among all the factors, the most important one identified in the literature is the team's winning percentage (i.e. ‘win%’).[3]
Among the 65 NBA MVPs from the 1955-1956 season to the 2018-2019 season, 41 were on the team that had the best regular season record, and 23 won the championship in the year they won the MVP. Only 6 MVPs played for a team with a season win% below .60, and more than two thirds of the MVPs’ teams had a win% above .70. The two lowest win% among these seasons were Bob Pettit’s team at .458 in 1956 and Kareem Abdul-Jabbar’s team at .488 in 1976, while the two highest were Stephen Curry’s team at .890 in 2016, with a record of 73-9, and Michael Jordan’s team at .878 in 1996. Over the past thirty seasons, 29 of the MVPs played for a team with a win% above .65; the exception is Russell Westbrook, whose team had a win% of .573.

However, winning is not the only factor that influences the MVP. Based on previous research, many factors and criteria contribute to the MVP decision, including team wins, games played, minutes played, points per game, true shooting%, free throws, three-pointers, assists, rebounds, blocks, steals, fouls, turnovers, offensive rating, defensive rating, and a player’s contribution to the offense, defense, and overall result of each game. My model also suggests that the win share, the offensive win share per 48 minutes, and the adjusted production are some of the most important variables in determining the NBA MVP.

[3] LEUNG, STUART: “NBA MVP Criteria: How the NBA should choose MVP based on past winners” https://www.interbasket.net/news/22344/2018/06/nba-mvp-criteria-how-nba-choose-mvp-winning-team-record/

To carry out my analysis, I first did some background research by reading articles and essays, as well as searching on the Internet. Secondly, I investigated and summarized the variables that influence a player's chances of winning the MVP. Then, I searched for an NBA dataset and downloaded it into Excel.
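As a quick arithmetic check, the win percentages quoted above follow directly from win-loss records. The sketch below is illustrative only: the 73-9 record is stated in the text, while the 72-10 and 47-35 records are not given in this paper and are filled in here as assumptions consistent with the quoted .878 and .573. The function name `win_pct` is hypothetical.

```python
# Worked check of the win percentages quoted in the text.
# Each record is (wins, losses) for an 82-game regular season.
records = {
    "Stephen Curry, 2016": (73, 9),        # record stated in the text
    "Michael Jordan, 1996": (72, 10),      # assumption, consistent with .878
    "Russell Westbrook, 2017": (47, 35),   # assumption, consistent with .573
}


def win_pct(wins, losses):
    """Return wins as a fraction of all games played."""
    return wins / (wins + losses)


for label, (wins, losses) in records.items():
    print(f"{label}: {win_pct(wins, losses):.3f}")
```

Dividing wins by total games is all that ‘win%’ means here, so a 73-9 season gives 73/82, which rounds to .890.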
Before I created the model, I developed graphs of different variables to show their relationships. This data analysis gave me more background information on these factors and a deeper understanding of each factor’s role. After that, I entered the MVP results into the Excel sheet, with ‘1’ for MVP and ‘0’ for NonMVP.

My analysis builds on the literature by broadening the NBA statistics considered in predicting the NBA MVP, using a dataset with over 50 variables.[4] I used data from 67 seasons and created a model that predicts the probability of a player winning the MVP based on his statistics. Creating an initial predictive model for the MVP is valuable because a precise model can help NBA players, commentators, and fans. For example, it could help those who choose the MVP think more carefully about the criteria they use.

[4] LEUNG, STUART: “NBA MVP Criteria: How the NBA should choose MVP based on past winners” https://www.interbasket.net/news/22344/2018/06/nba-mvp-criteria-how-nba-choose-mvp-winning-team-record/

Data

Sources of data

Kaggle is an online sharing platform that hosts openly downloadable datasets for thousands of projects, covering a huge variety of subjects such as Government, Sports, Medicine, Fintech, Food, and much more.[5] Through this platform, Omri Goldstein published a collection of NBA statistical data going back to the 1950 season, covering more than 3000 players, over 60 seasons, and more than 50 features per player. The data is freely downloadable and provides a comprehensive source that can be used in further research. The link to this project’s source data is listed below:

https://www.kaggle.com/drgilermo/nba-players-stats#Seasons_Stats.csv

[5] KAGGLE.COM: Your Home for Data Science https://www.kaggle.com/

The potential variables in the data that are likely to influence a player's chances of winning the MVP are listed in Chart 2.1 below.
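Before turning to the variable list, the ‘1’/‘0’ labeling described earlier in this section can be illustrated with a short sketch. This is not the Excel procedure actually used in this paper; it is a minimal Python equivalent under stated assumptions. The rows, the field names (`season`, `player`, `mvp`), and the function `label_mvp` are all hypothetical, and the two winners come from Chart 1.2.

```python
# Hypothetical player-season rows; a real run would take them from the
# downloaded dataset. Field names are illustrative assumptions.
rows = [
    {"season": "2016-2017", "player": "Russell Westbrook"},
    {"season": "2016-2017", "player": "James Harden"},
    {"season": "2017-2018", "player": "James Harden"},
    {"season": "2017-2018", "player": "LeBron James"},
]

# MVP winner for each season, as listed in Chart 1.2.
mvp_by_season = {
    "2016-2017": "Russell Westbrook",
    "2017-2018": "James Harden",
}


def label_mvp(rows, mvp_by_season):
    """Return copies of the rows with an 'mvp' field: 1 for the winner, 0 otherwise."""
    labeled = []
    for row in rows:
        is_mvp = mvp_by_season.get(row["season"]) == row["player"]
        labeled.append({**row, "mvp": int(is_mvp)})
    return labeled


labeled = label_mvp(rows, mvp_by_season)
for row in labeled:
    print(row["season"], row["player"], row["mvp"])
```

Because only one player per season receives a ‘1’, the resulting outcome column is heavily imbalanced, which is worth keeping in mind when a logistic regression is fitted to it.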