Quantifying Offensive Player Types in the NBA with Non-Negative Matrix
Quantifying Offensive Player Types in the NBA

Andrew Miller (1), Harvard XY Hoops, Ryan Adams (2)
Harvard University
NESSIS 2013
September 23, 2013

(1) http://people.seas.harvard.edu/~acm/
(2) http://people.seas.harvard.edu/~rpa/

Data

[ Animation ]

Data available:
- (x, y) for all 10 players (and 3 refs) at 25 frames per second
- (x, y, z) for ball at 25 frames per second
- event annotations (made shot, missed shot, pass, dribble, etc.)

Motivation

What is the probability of Steph Curry ...
- making a corner three?
- making a corner three against the Celtics?
- taking a corner three against the Celtics?
- taking a corner three while being guarded?

Data analysis overview

1. Select data, expressive representation
2. Iterate:
   - Choose estimand, specify model
     - e.g. defense effects on spatial efficiency/frequency?
   - Fit model
   - Check model, interpret parameters, make predictions

Motivation

Practical concerns:
- Data volume:
  - 515 games from 2012-2013 season
  - about 72k moments per game
  - over 23 dimensions per moment
  - ≈ 800 million space-time data points
- Representation:
  - How can millions of spatio-temporal data points be used to characterize an athlete?
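The data-volume figures above can be sanity-checked with a little arithmetic. The sketch below assumes that the 23 dimensions per moment are the (x, y) coordinates of the 10 players plus the (x, y, z) coordinates of the ball; the slides do not spell this out, and the variable names are illustrative only.

```python
# Figures quoted on the slide.
games = 515
moments_per_game = 72_000  # "about 72k moments per game"

# Assumption: 23 dimensions = (x, y) for 10 players + (x, y, z) for the ball.
player_dims = 10 * 2
ball_dims = 3
dims_per_moment = player_dims + ball_dims

total_moments = games * moments_per_game
total_values = total_moments * dims_per_moment

print(f"dimensions per moment: {dims_per_moment}")
print(f"moments in the season sample: {total_moments:,}")
print(f"coordinate values: {total_values:,}")
```

Under these assumptions the product comes out at roughly 850 million values, the same order as the ≈ 800 million quoted (the per-game moment count is itself approximate).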

Representing Players: shot moments

[Figure: shot charts. LeBron James (315 shots), Stephen Curry (940 shots)]

Representing Players: shot moments

[Figure: shot charts. Steve Novak (290 shots), Brook Lopez (217 shots)]

Representing Players: shot moments

[Figure: shot charts. James Harden (965 shots), Tony Parker (692 shots)]

Representing Players

- For each player, $n$, model shots $(x_1, \dots, x_M)$ as a point process
- Fit intensity surface over the court

\[
\begin{aligned}
z_n(\cdot) &\sim \mathrm{GP}(0, K) && \text{smooth process on court} \\
\lambda_n(\cdot) &\sim \exp(z_n(\cdot)) && \text{intensity surface} \\
x_{n,1}, \dots, x_{n,M} &\sim \mathrm{PP}(\lambda_n(\cdot)) && \text{player $n$'s shots}
\end{aligned}
\]

where $\mathrm{GP}(0, K)$ indicates a Gaussian process with covariance function $K$.

Representing Players: LGCP

- Discretize shot counts over $V$ tiles

[Figure: LeBron James shot chart; LeBron James shot grid]

Representing Players: LGCP

- Fit (approximate) LGCP to smooth court, $\lambda_n$

[Figure: LeBron James shot chart; LeBron James intensity surface (posterior mean)]

Representing Players: LGCP

- Fit (approximate) LGCP to smooth court, $\lambda_n$

[Figure: Nate Robinson shot chart; Nate Robinson intensity surface (posterior mean)]

Methods

- Each player LGCP is $V$-dimensional vector ($\sim 1000$)
- All players as matrix

\[
\Lambda = [\bar{\lambda}_1, \dots, \bar{\lambda}_N]^T
\]

  where all elements of $\bar{\lambda}_i > 0$, and $\sum_v \bar{\lambda}_i(v) = 1$
- Goal: find a reduced space to capture offensive player 'types'

Nonnegative Matrix Factorization

Story: Each of the $N$ players is some combination of $K$ different 'shot types' (basis surfaces):

\[
\Lambda \approx W B \tag{1}
\]

low rank approximation:

[Diagram: $\Lambda$ ($N \times V$) $\approx$ $W$ ($N \times K$) times $B$ ($K \times V$)]

where $W \in \mathbb{R}_+^{N \times K}$ and $B \in \mathbb{R}_+^{K \times V}$

Optimize:

\[
\hat{W}, \hat{B} = \arg\min_{W, B > 0} \ell(\Lambda, WB) \tag{2}
\]

Nonnegative Matrix Factorization

Interpretation: player $n$'s shooting

\[
\hat{\lambda}_n = \sum_{k=1}^{K} W_{nk} B_k
\]

- $B_k$ is the $k$'th basis surface or 'shot type'
- $W_{nk}$ is player $n$'s loading onto $B_k$.

Nonnegative Matrix Factorization: basis, K = 3

[Figure: basis surfaces. (a) close-range, (b) mid-range, (c) long-range]

Nonnegative Matrix Factorization: basis, K = 5

[Figure: basis surfaces. (d) close-range, (e) mid-range, (f) long-range]

Nonnegative Matrix Factorization: basis, K = 10

[Figure: basis surfaces grouped as close-range, mid-range, long-range]

Nonnegative Matrix Factorization: basis, K = 20

[Figure: close-range]

Nonnegative Matrix Factorization: basis, K = 20

[Figure: mid-range]

Nonnegative Matrix Factorization: basis, K = 20

[Figure: mid-range]

NMF Weights

Individual player weights, $w_{n,k}$:

[Column headers on the slide are basis-surface thumbnails; columns are numbered here by position.]

                  Close range         Mid range                 Long range
Player            1     2     3       4     5     6     7       8     9     10
LeBron James      0.21  0.16  0.12    0.09  0.04  0.07  0.00    0.07  0.08  0.17
Brook Lopez       0.06  0.27  0.43    0.09  0.01  0.03  0.08    0.03  0.00  0.01
Tyson Chandler    0.26  0.65  0.03    0.00  0.01  0.02  0.01    0.01  0.02  0.01
Marc Gasol        0.19  0.02  0.17    0.01  0.33  0.25  0.00    0.01  0.00  0.03
Tony Parker       0.12  0.22  0.17    0.07  0.21  0.07  0.08    0.06  0.00  0.00
Kyrie Irving      0.13  0.10  0.09    0.13  0.16  0.02  0.13    0.00  0.10  0.14
Stephen Curry     0.08  0.03  0.07    0.01  0.10  0.08  0.22    0.05  0.10  0.24
James Harden      0.34  0.00  0.11    0.00  0.03  0.02  0.13    0.00  0.11  0.26
Steve Novak       0.00  0.01  0.00    0.02  0.00  0.00  0.01    0.27  0.35  0.34

(a) Close Range Weights: columns 1-3
(b) Mid Range Weights: columns 4-7
(c) Long Range Weights: columns 8-10

Data analysis overview

1. Select data, expressive representation
2. Iterate:
   - Choose estimand, specify model
     - e.g. defense effects on spatial efficiency/frequency?
   - Fit model
   - Check model, interpret parameters, make predictions

Current work: spatial field goal percentage

[Figure: Stephen Curry Posterior Court; axes xg1, xg2. Caption: Boston opponent field goal percentage, deviation from mean]

Future work: spatial opponent field goal percentage

[Figure: panels "Bos Mean Court" and "Bos Mean − Global Mean (Diff)"; axes xg1, xg2. Caption: Boston opponent field goal percentage, deviation from mean]

Thanks!
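
As a closing illustration, the factorization Λ ≈ WB from the NMF slides can be sketched in a few lines of NumPy. This is a minimal sketch and not the authors' implementation: the slides leave the loss ℓ unspecified, so squared (Frobenius) error with Lee-Seung multiplicative updates is assumed here, the function name `nmf` is hypothetical, and the input matrix is synthetic rather than real shot data.

```python
import numpy as np

def nmf(Lam, K, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative (N x V) matrix as W (N x K) @ B (K x V).

    Assumption: squared-error loss, minimized with Lee-Seung
    multiplicative updates, which keep W and B nonnegative.
    """
    rng = np.random.default_rng(seed)
    N, V = Lam.shape
    W = rng.random((N, K)) + eps
    B = rng.random((K, V)) + eps
    for _ in range(n_iter):
        # Update the basis surfaces, then the player loadings.
        B *= (W.T @ Lam) / (W.T @ W @ B + eps)
        W *= (Lam @ B.T) / (W @ B @ B.T + eps)
    return W, B

# Synthetic stand-in for the player-by-tile matrix (NOT real shot data).
rng = np.random.default_rng(1)
N, V, K = 30, 100, 3
Lam = rng.random((N, K)) @ rng.random((K, V))
Lam /= Lam.sum(axis=1, keepdims=True)  # each row sums to 1, as on the Methods slide

W, B = nmf(Lam, K)
rel_err = np.linalg.norm(Lam - W @ B) / np.linalg.norm(Lam)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Row `n` of `W` plays the role of player n's loadings and row `k` of `B` the k'th basis surface, so `W[n] @ B` is the reconstructed surface for player n. A different loss (for example a KL-type divergence) would change the update rules but not this interpretation.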