Real-Time Prediction of Outcomes Using High-Resolution Spatio-Temporal Tracking Data Daniel Cervone, Alex D’Amour, Luke Bornn, and Kirk Goldsberry Department of Statistics, Harvard University for Geographic Analysis, Harvard University

Goal Estimated EPV Curve Independence Assumptions

RASHARD James Using optical tracking data, estimate the value of 3 2 3 2 2 OFFENSE Assumptions necessary to make EPV estimation 3 2 A B 3 C 2 D POSSESSION LEWIS LEBRON JAMES POSSESSION 2 1 shot 2 3 1 Norris Cole 4 4 4 POSS. 2 3 4 3 5 1 4 1 2 every action in a basketbal game, including non- 3 4 15 4 4 Splits the defense; computationally tractable clear path to 1 3 15 basket 5 5 5 4 LeBron James Accelerates 5 terminal actions like passes and dribble penetrations 1 5 towards basket • Macrotransitions decouple the outcome from 1 5 Accelerates EPV into the that do not show up in the box score. Pass paint fine-grained information in F . Runs behind Pass t 2 1 2 1 2 2 DEFENSE 3 1 1 basket and 2 E 3 F G H { Slight dip in EPV after 3 2 defenders close 2 3 2 1 Slight dip in EPV after { crossing 3 line 1 1 3 1 1 3 crossing 3 point line EPV constant while • Coarsened process C is marginally Semi-Markov 3 2 t 4 4 pass is en route 5 4 4 4 4 4 3

4 0.8 1.0 1.2 1.4 1.6 3 5 5 5 5 (sojourn times not exponential). NBA Optical Tracking Data 4 5 5 5 5 0 3 6 9 12 15 18 time Hierarchical Modeling The SportVU optical tracking data consists of x, y Figure: possession against Nets with estimated EPV curve. Norris Cole wanders the perimeter (A) then locations on the court of all 10 players—and x, y, z drives toward the basket (B). Instead of taking the shot, he runs underneath the basket (C) and passes to Rashard Lewis (D), who Hierarchical modeling of unknown components in locations of the ball—25 times per second. A full passes to LeBron James (E). After entering the perimeter (F), James slips behind the defense (G) and scores an easy layup (H). our transition models provides shrinkage necessary season of data is over a billion points in space-time. Model Components: Two Transition Types for good predictive performance. • Functional basis representation of spatial surfaces Expected Possession Value (EPV) ` Macrotransition Model Microtransition Model ξj more appropriate for hierarchical models [3]. A player similarity network, learned by the spatial Expected Possession Value (ν ) is the number of Macrotransitions encapsulate discrete Microtransitions represent player movement • t distribution of players’ time on the court, provides points (X) the offense is expected to score on basketball actions (e.g. passing and shooting). between macrotransitions. WITH BALL DWIGHT HOWARD WITHOUT BALL CAR priors for model parameters across players. a particular possession, given its spatiotemporal Modeled using competing risks [4]. Assuming player ` is the ballcarrier, the hazard of • Inference performed using R-INLA software [5]. history up to time t (Ft): macrotransition j at time t is • Scale of data necessitates implementation on ν = [X|F ]. t E t ` ` 0 ` ` high-performance, distributed computing systems. log(λj(t)) = [Wj(t)] βj + ξj (z`(t)) , νt averages over all possible future courses of the ` ` possession. Temporally coherent estimation re- where Wj(t) are time-varying covariates and ξj a Results quies a stochastic process model of basketball. Gaussian Process-distributed spatial random effect. EPV curves (νt) calculated for every NBA possession Figure: Acceleration fields (µx, µy) for Dwight Howard. Multiresolution Modeling in 2013-2014 season, revealing the real-time “stock Classical dynamics motivates an ARI(1,1) model price” of a possession. Using these, analysts can for forecasting a player’s location (z(t)) conditional identify the most impactful actions during a posses- Jointly model full-resolution basketball process (Zt) on a fixed ballcarrier. Acceleration fields modeled sion and estimate novel ways in which players con- and coarsened view of the process (Ct = C (Zt)). with a Gaussian Process prior. tribute value to the offense. For more details, please Figure: Posterior mean and SD for spatial effect ξ in LeBron see the authors’ additional work on this project [1, 2]. Player 1 Player 2 Player 3 Player 4 Player 5 James’ shot-taking hazard model. Cposs References Pass to P5 Pass to P4 Pass to P3 Pass to P2 Evaluating EPV with Multiresolution Transitions

Ctrans Turnover Shot [1] D. Cervone, A. D’Amour, et al. “A Multiresolution Semi-Markov Letting τ and δ be the start and end times of the next macrotransition to complete, Mj(τ) be the identity Process Model for Predicting Basketball Possession Outcomes.” In Cend Made 2pt Made 3pt End of possession of this macrotransition, and Zs and Cs denote the full-resolution and coarsened states at time s, under preparation (2014). Figure: Schematic of coarsened state process Ct. Macrotransi- conditional independence assumptions, rewrite EPV as [2] D. Cervone, A. D’Amour, et al. “Pointwise: Predicting Points and Valuing Decisions in Real Time with NBA Optical Tracking Data.” tion states shaded in gray. Z ∞ Z  X MIT SSAC (2014). νt = E[X|Cδ = c] (Cδ = c, M(τ)|Zτ = z, τ = s) (Zτ = z, τ = s|Ft)dzds . t Z P P c∈C [3] F. Lindgren, H. Rue, S. Martino, et al. “An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic • [email protected] • Computed from long-run distribution of C, • Computed directly from Microtransition partial differential equation approach.” JRSSB 73.4 (2011): 423–498. • [email protected] derived from Macrotransition model. model. [4] R. Prentice, J. Kalbfleisch, et al. “The analysis of failure times in the presence of competing risks.” Biometrics (1978): 541–554. [email protected] • • Computed directly from Macrotransition • Integration by Monte Carlo. [5] H. Rue, et al. “Approximate Bayesian inference for latent Gaussian • [email protected] model. models by using integrated nested Laplace approximations.” JRSSB 71.2 (2009): 319–392.

1