Using Autoencoded Receiver Routes to Optimize Yardage
Total Page:16
File Type:pdf, Size:1020Kb
The Immaculate Reception Dimensionality-Reduced Receiver Route Optimization Problem How can we optimize routes so that we can increase expected yardage in any situation? Shape Based Turn x,y coordinates of every player at every Clustering 1 moment into usable receiver routes. Combine situational data with route Machine Learning 2 information to predict Yards and EPA. First Two Attempts Time series clustering and auto-encoding routes worked, but it didn’t give us the quality of insights we were hoping for. Time series clusters for one game Examples of auto-encoded routes Shape-Based Clustering Shape Based Clustering Shape-Based Clustering: Example Routes 10 Yard Crossing Route RB Out Route WR 71% TE 24% RB 5% WR 05% TE 05% RB 90% Shape Based Clustering Odell Beckham Rob Gronkowski Ezekiel Elliott Double Model Approach Situational Variables ● Seconds Remaining in Game ● Yard Line ● Down and Distance Likelihood of Completion ● Score Difference 1 ● Offensive Formation ● # of Pass Rushers Accuracy 71% AUC .75 ● Quarterback Engineered Variables 2 Yards Gained Given Completion ● The routes run on the play Cor .51 RMSE 10.0 ● Position (WR,TE,etc…) of the player running the route Important Variables Routes are much more important than the Quarterback at predicting play success Completion % Important Vars Yards Given Completion Important Vars ● Yard Line ● Seconds Remaining in Game ● Yard Line ● Score Difference ● Score Difference ● Number of Pass Rushers ● Seconds Remaining in Game ● Route Groups ● Route Groups ● … x65! ● … x16 ● Matt Ryan ● EJ Manuel Importance of Routes on Predicted Yards Broncos vs Bills Ravens vs Raiders Predicted Yards = 5.8 Predicted Yards = 10.5 Actual Yards = 2 Actual Yards = 52 Optimizing Routes to Improve Yardage Change TE (82) from “blocking” to “hitch” Predicted Yards +1.65 4.73 → 6.38 Quick Insights Run Short 5 Yard Hitches Good Routes >>>> Good Quarterbacks Top 20 Most Important Factors by Model Completion % Yards Given Completion Routes QBs Routes QBs 15 0 14 0 Completion % +39% Yards +.15 Thank You Jake Flancer - [email protected], @jakef1873 Jack Soslow - [email protected], @jack_soslow Andrew Castle - [email protected], @AndrewCastle510 Eric Dong - [email protected] Special thanks to Professor Abraham J. Wyner of the Wharton School for his advice and assistance, and to Michael Lopez and Jay Reid. References: 1. Keogh, Eamonn, and Jessica Lin. "Clustering of time-series subsequences is meaningless: implications for previous and future research." Knowledge and information systems 8.2 (2005): 154-177. 2. Steinbach, Michael, Levent Ertöz, and Vipin Kumar. "The challenges of clustering high dimensional data." New directions in statistical physics. Springer, Berlin, Heidelberg, 2004. 273-309. 3. Vincent, Pascal, et al. "Extracting and composing robust features with denoising autoencoders." Proceedings of the 25th international conference on Machine learning. ACM, 2008. Appendix First Two Modeling Attempts Modeling helps us optimize routes for any situation as well as tells us what variables matter when predicting yards gained Variable Importance Plot for XGboost Neural Net Diagnostic Charts Model One - Completion % Anova Model Two - Yards Given Completion Anova Starting Positions of Receivers Important Variables Routes are much more important than the Quarterback at predicting play success Completion % Important Vars Yards Given Completion Important Vars Final Double Model Diagnostics Cor = .15 RMSE = 10.9.