The Immaculate Reception

Dimensionality-Reduced Receiver Route Optimization Problem How can we optimize routes so that we can increase expected yardage in any situation?

Shape Based Turn x,y coordinates of every player at every Clustering 1 moment into usable receiver routes.

Combine situational data with route Machine Learning 2 information to predict Yards and EPA. First Two Attempts Time series clustering and auto-encoding routes worked, but it didn’t give us the quality of insights we were hoping for.

Time series clusters for one game Examples of auto-encoded routes Shape-Based Clustering Shape Based Clustering Shape-Based Clustering: Example Routes

10 Yard RB Out Route

WR 71% TE 24% RB 5% WR 05% TE 05% RB 90% Shape Based Clustering

Odell Beckham Rob Gronkowski Ezekiel Elliott Double Model Approach Situational Variables

● Seconds Remaining in Game ● Yard Line ● Down and Distance Likelihood of Completion ● Score Difference 1 ● Offensive Formation ● # of Pass Rushers Accuracy 71% AUC .75 ●

Engineered Variables 2 Yards Gained Given Completion ● The routes run on the play Cor .51 RMSE 10.0 ● Position (WR,TE,etc…) of the player running the route Important Variables Routes are much more important than the Quarterback at predicting play success

Completion % Important Vars Yards Given Completion Important Vars

● Yard Line ● Seconds Remaining in Game ● Yard Line ● Score Difference ● Score Difference ● Number of Pass Rushers ● Seconds Remaining in Game

● Route Groups ● Route Groups ● … x65! ● … x16

● Matt Ryan ● EJ Manuel Importance of Routes on Predicted Yards

Broncos vs Bills Ravens vs Raiders

Predicted Yards = 5.8 Predicted Yards = 10.5 Actual Yards = 2 Actual Yards = 52 Optimizing Routes to Improve Yardage

Change TE (82) from “blocking” to “hitch”

Predicted Yards +1.65

4.73 → 6.38 Quick Insights

Run Short 5 Yard Hitches Good Routes >>>> Good

Top 20 Most Important Factors by Model

Completion % Yards Given Completion

Routes QBs Routes QBs 15 0 14 0 Completion % +39%

Yards +.15 Thank You

Jake Flancer - [email protected], @jakef1873 Jack Soslow - [email protected], @jack_soslow Andrew Castle - [email protected], @AndrewCastle510 Eric Dong - [email protected]

Special thanks to Professor Abraham J. Wyner of the Wharton School for his advice and assistance, and to Michael Lopez and Jay Reid.

References:

1. Keogh, Eamonn, and Jessica Lin. "Clustering of time-series subsequences is meaningless: implications for previous and future research." Knowledge and information systems 8.2 (2005): 154-177. 2. Steinbach, Michael, Levent Ertöz, and Vipin Kumar. "The challenges of clustering high dimensional data." New directions in statistical physics. Springer, Berlin, Heidelberg, 2004. 273-309. 3. Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." Proceedings of the 25th international conference on Machine learning. ACM, 2008. Appendix First Two Modeling Attempts Modeling helps us optimize routes for any situation as well as tells us what variables matter when predicting yards gained

Variable Importance Plot for XGboost Neural Net Diagnostic Charts Model One - Completion % Anova Model Two - Yards Given Completion Anova Starting Positions of Receivers Important Variables Routes are much more important than the Quarterback at predicting play success

Completion % Important Vars Yards Given Completion Important Vars Final Double Model Diagnostics

Cor = .15

RMSE = 10.9