Predicting an MVP
Brian King, Derek Zhang, Juleen Graham, Erin Henning, Ryan Haney

How Is an MVP Selected?
◼ Through the 1979-80 season, NBA players voted for the MVP
◼ Since the 1980-81 season, votes come strictly from a panel of sportswriters and broadcasters
  - Voters come from the US and Canada, and each casts a vote for 1st through 5th place selections
◼ Since 2010, one ballot is cast by fan votes online
(https://en.wikipedia.org/wiki/NBA_Most_Valuable_Player_Award)

Trends?
◼ What caused a change in trend from Centers/Forwards to Guards/Forwards?

Questions
◼ What are the most important statistical criteria for choosing an MVP?
◼ Can we create a model to predict the probability of an individual winning the MVP award?

Procedures
◼ Data from the 1991-1992 season through the 2015-2016 season
  - The 150 players with the most playing time in each season
◼ Logistic Regression Model
◼ Used the data from the 1991-1992 to 2012-2013 seasons to fit the model
◼ Predicted on the 2013-2014 to 2015-2016 seasons
◼ Compared the “order” of prediction to the true voting order

The Logistic Regression Model
$$\log\frac{p}{1-p} = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$
Where X_i = predictor variable and p = probability that the player is the MVP

Assumptions
● Binary response variable (MVP or not)
● Continuous, independent explanatory variables

The Variables
◼ Points Per Game
◼ Blocks, Steals, Assists, Rebounds
◼ Effective Field Goal Percentage
◼ Position
◼ Personal Fouls, Age, Minutes Played, Turnovers, ...
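The fit-then-rank procedure described on the slides above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the training rows, the player names, and the helpers `fit_logistic` and `predict_proba` are all hypothetical, only two made-up standardized predictors are used, and plain gradient ascent stands in for whatever fitting routine was actually used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, x):
    """Predicted MVP probability for one player; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit [b0, b1, ..., bk] by batch gradient ascent on the log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)  # observed minus predicted
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical training rows: (points per game, assists per game), standardized.
train_X = [[2.0, 1.0], [1.5, 0.5], [0.2, 0.1], [-0.5, 0.3],
           [-1.0, -0.8], [0.0, -0.2], [1.8, 1.2], [-0.3, 0.9]]
train_y = [1, 0, 0, 0, 0, 0, 1, 0]  # 1 = won MVP, 0 = did not

weights = fit_logistic(train_X, train_y)

# Hypothetical held-out season: rank players by predicted probability.
season = {"Player A": [1.9, 1.1], "Player B": [0.1, 0.0], "Player C": [1.0, 0.6]}
probs = {name: predict_proba(weights, x) for name, x in season.items()}
ranking = sorted(probs, key=probs.get, reverse=True)
print(ranking)
```

In the setting of the slides, the model would be fit on the 1991-1992 to 2012-2013 seasons, and the resulting ranking for each later season would be compared with the actual voting order.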
2013-14 Season: All Stats

       Prediction           Actual
MVP    Kevin Love           Kevin Durant
2nd    LeBron James         LeBron James
3rd    Kevin Durant         Blake Griffin
4th    Stephen Curry        Joakim Noah
5th    LaMarcus Aldridge    James Harden

◼ Predicted rank of the other actual top-five finishers: Blake Griffin 12th, Joakim Noah 31st, James Harden 8th
◼ Actual rank of the other predicted top-five players: Kevin Love 11th, Stephen Curry 6th, LaMarcus Aldridge 10th

2013-14 Season: MVStats

       Prediction           Actual
MVP    Kevin Durant         Kevin Durant
2nd    LeBron James         LeBron James
3rd    Kevin Love           Blake Griffin
4th    Stephen Curry        Joakim Noah
5th    Chris Paul           James Harden

◼ Predicted rank of the other actual top-five finishers: Blake Griffin 7th, Joakim Noah 23rd, James Harden 9th
◼ Actual rank of the other predicted top-five players: Kevin Love 11th, Stephen Curry 6th, Chris Paul 7th

2014-15 Season

       Prediction           Actual
MVP    Russell Westbrook    Stephen Curry
2nd    LeBron James         James Harden
3rd    Chris Paul           LeBron James
4th    James Harden         Russell Westbrook
5th    Stephen Curry        Anthony Davis

◼ Predicted rank of the other actual top-five finisher: Anthony Davis 10th
◼ Actual rank of the other predicted top-five player: Chris Paul 6th

2015-16 Season

       Prediction           Actual
MVP    Stephen Curry        Stephen Curry
2nd    Russell Westbrook    Kawhi Leonard
3rd    LeBron James         LeBron James
4th    Kevin Durant         Russell Westbrook
5th    James Harden         Kevin Durant

◼ Predicted rank of the other actual top-five finisher: Kawhi Leonard 26th
◼ Actual rank of the other predicted top-five player: James Harden 9th

Random Forests
◼ Decision Tree Learning
◼ Bootstrap Aggregating
◼ Random Subspace Method

Decision Tree Learning
◼ The algorithm chooses the variable at each step that best splits the data into successes and failures
[Figure: example decision tree that splits first on Points Per Game (Pts < x / Pts > x), then on Assists Per Game (Assists < x / Assists > x), then on Rebounds Per Game]

Bootstrap Aggregating
◼ A random forest consists of b = 1, …, B randomized tree models
◼ Each model (tree) is built with a bootstrap sample of the original data (a sample of the same size as the original data, drawn with replacement)
◼ Training many trees on the same data set leads to problems (possibly recreating the same tree)
◼ Averaging the predictions from all the individual regression trees leads to better performance

Random Forest Interpretation
◼ Samples not included in any given bootstrap sample are called “out-of-bag” samples
◼ %IncMSE is roughly how much worse the predictions are when a permuted version of the variable is used instead of the true values
◼ Build the tree, make predictions using the “real” data values, and record the error (MSE)
◼ Permute the values of the variable in the out-of-bag sample, redo the predictions, and recompute the MSE
◼ %IncMSE is how much the error increases for the permuted samples vs. the true samples

Most Important MVP Variables

According to the Random Forest method    According to Logistic Regression
(% Increase MSE)                         (Z-score, absolute value)
PPG     0.0020350589                     PPG     5.167
APG     0.0010963324                     RPG     3.395
MPG     0.0010791207                     PFPG    3.08
SPG     0.0010197867                     APG     2.58
PFPG    0.0007518895                     Age     2.291
TPG     0.0007515411                     eFG%    2.104
eFG%    0.0007482838                     BPG     1.566
BPG     0.0006166867                     POS     1.459
Age     0.0002998612                     MPG     0.62
RPG     0.0001863932                     TPG     0.314
POS     0.0001672975                     SPG     0.128

Drawbacks to our models
◼ Only one MVP can be crowned every year
◼ Predictions using our models assume that the response variable (MVP or not) is independent between players
◼ As a result, the predicted probabilities for a season do not sum to 1
◼ Our models can rank players by likelihood of winning MVP, but cannot give explicit probabilities

Conclusions
◼ The most important variables are:
  - Points Per Game
  - Assists Per Game
  - Rebounds Per Game
◼ The least important variables include:
  - Blocks Per Game
  - Steals Per Game
◼ The problem with defensive production
◼ MVP Voting: Stat-Driven, but not completely
  - Steve Nash, 2005

Future Work
◼ Further research into possible interaction between variables
◼ Better interpretability of logistic regression predictions
◼ Impact of team on MVP prospects
◼ Change in MVP selection criteria over the years
◼ Changes in rules over the years
◼ Growing data set and possible outcomes

Thanks! Questions?
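As a supplement to the bootstrap and out-of-bag permutation steps described in the random forest slides, the idea can be sketched as below. This is an illustration under stated assumptions, not the authors' analysis: the data are synthetic, `stand_in_model` is a hypothetical fixed predictor used in place of a tree actually fitted on the bootstrap sample, and the helper reports the raw increase in MSE rather than the scaled value a random forest package may print.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of `model` over the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_increase(model, rows, targets, col, rng):
    """Increase in MSE when the values in column `col` are shuffled across rows."""
    base = mse(model, rows, targets)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return mse(model, permuted, targets) - base


rng = random.Random(0)

# Synthetic data (assumption): the target depends on column 0 only.
rows = [[rng.random(), rng.random()] for _ in range(200)]
targets = [3.0 * r[0] for r in rows]

# Bootstrap sample: same size as the data, drawn with replacement.
n = len(rows)
boot_idx = [rng.randrange(n) for _ in range(n)]
# Out-of-bag rows: those never drawn into the bootstrap sample.
oob_idx = sorted(set(range(n)) - set(boot_idx))


def stand_in_model(row):
    """Hypothetical predictor standing in for a tree fit on the bootstrap sample."""
    return 3.0 * row[0]


oob_rows = [rows[i] for i in oob_idx]
oob_targets = [targets[i] for i in oob_idx]

inc_col0 = permutation_increase(stand_in_model, oob_rows, oob_targets, 0, rng)
inc_col1 = permutation_increase(stand_in_model, oob_rows, oob_targets, 1, rng)
print(len(oob_idx), inc_col0, inc_col1)
```

Shuffling the column the predictor relies on raises the out-of-bag error, while shuffling the ignored column leaves it unchanged, which is the ordering the importance table is built from. A real forest would repeat this for every tree and average the increases.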