All code for analysis and data Twitter: @aabsblog fetching/preprocessing: Email: [email protected] Player Compatibility in the NBA https://hwchase17.github.io/sports/ By Harrison Chase (Kensho Technologies) Motivation Existing Graphical Models • Single number metrics (Plus-Minus variants) do not account for player compatibility • General ridge regression set up does not allow for easy estimation of interaction terms between • Fig 1 shows an abstraction of the typical players due to the sheer number of interaction terms (and corresponding increase in degrees of network most graphical models use freedom) that would have to be introduced. • The area boxed in red shows the part of • Graphical methods rarely attempt to account for a player’s influence on teammates the model I focus on: scoring. I focus on • Closest: Joseph Kuehn, Sloan Sports Conference 2016, Accounting for Complementary Skill Sets When this as I hypothesize compatibility is Evaluating NBA Players’ Values to a Specific Team Fig 1 most strong here. • Even the above player does not estimate player specific effects.

I model � �� ��) = ∑ � �� ��, ��) � �� ��) Terminology • � = ������ ������ �� ���������� � • � = �ℎ����� �� ���������� � • �, �, �, � = ��������� ��������� �� ���������� � • �, �, �, � , � = ���������� ��������� �� ���������� � • Let � = �, �, �, �, �, �, �, �, � , � • Want to model �(�|�) • Divide court into n locations, �,…., � • Can also add position dummies for players on court • {Guard, Forward, } • Caveat: do not model fouls at all

Estimate: Inference • Player’s {tendency to shoot, ability to score} from court locations • Player’s influence on teammates’ {tendency to shoot, ability to score} from court locations Offense (�) • Player’s influence on opponents’ {tendency to shoot, ability to score} from court locations Center • Position averages for all the above (See visualizations to right) Guard Forward Novel Features Shooting (�) Location Guard Forward Center “A player’s influence • Allows for independent on his teammates’ estimates of a players tendency tendency to shoot Location to shoot from a location & their from a given “A player’s ability to score from there location” • More granular court locations tendency to than usual shoot from a • Inclusion of position dummies, given location” Shooting % and estimation of their “A player’s influence coefficients on his teammates’ ability to score from Shooting % a given location” Metrics “A player’s ability to score from a Parameters: given location” • Season: 2017-18 Defense (�) • Train set is first 1100 games (189K shots) Guard Forward Center • Test set is last 130 games (23K shots) Location “A player’s influence on • Future Work For the purposes of equal comparison, all models are compared his opponents’ • using multiclass log loss. Utilize x,y coordinates of shot tendency to shoot from • The classes which the loss is calculated on are the most abstract • Combine location a given location” outcomes of a shot: tendencies/shooting ability into • Zero points scored one visualization • Two points scored • Model who will shoot • Three points scored Shooting % • When the modeling more granular outcomes than just these • Combine in larger graphical model “A player’s influence on three, those are aggregated up to these three classes his opponents’ ability to score from a given 0-PT 2-PT 3-PT AVG location” Model Features Model Target/s logloss logloss logloss logloss Miss None (Baseline) Made 2 pointer For estimate use train set means Made 3 pointer 0.6896 0.6393 0.3683 0.5657 Conditional on knowing the shooter: Case Study: LeBron James 0-PT 2-PT 3-PT AVG Model Features Model Target/s logloss logloss logloss logloss Which players would help LeBron score the most efficiently (controlling for position)? Location Shooting Location Shooting % Shooter dummy (multinomial, n=7) “Shot Model” – the Top Players: Bottom Players: Offensive teammate dummies For each location fit focus of this poster • DeMarcus Cousins • Joel Bolomboy Defensive opponents dummies separate make/miss • • Positional dummies classifier 0.6856 0.6144 0.3427 0.5475 • Jack Cooley • • Shooter dummy Marcus Georges-Hunt Can you just model • • Andre Ingram Offensive teammate dummies Miss/Made 2 pts/ Defensive opponents dummies Multinomial – Miss, Made 3 pts directly? Positional dummies Made 2pt, Made 3pt 0.6931 0.6201 0.3489 0.554 Location (multinomial, n=7) Do player position Shooter dummy For each location fit dummies help? Offensive teammate dummies separate make/miss Defensive opponents dummies classifier 0.6859 0.6155 0.3435 0.5483 Which players would LeBron help score the most efficiently (controlling for position)? Not conditional on knowing the shooter: Offensive Offensive 0-PT 2-PT 3-PT AVG Top Players: Location Shooting % Bottom Players: Model Features Model Target/s logloss logloss logloss logloss • DeAndre Liggins • To estimate P of each Location • Wesley Johnson • Andre Roberson player shooting uses Shooter dummy, offensive (multinomial, n=7), • Arron Afflalo • DeAndre Jordan naïve 20%, from there teammate dummies, defensive for each location fit • Rodney McGruder • Salah Mejri uses shot model opponents dummies, positional separate make/miss • dummies classifier 0.6893 0.6386 0.367 0.565 Offensive player dummies, Models outcome directly defensive opponents dummies, Multinomial – Miss, (worse than positional dummies Made 2pt, Made 3pt 0.692 0.6395 0.3682 0.5666 baseline!!!???)