Incorporating Spatiotemporal Machine Learning into Major League Baseball and the National Football League

by

Jeremy H. Hochstedler

B.S. Electrical Engineering, Rose-Hulman Institute of Technology, 2006
M.S. Electrical Engineering, University of Notre Dame, 2008
M.S. Management Science and Engineering, Stanford University, 2012

Submitted to the Systems Design and Management Program in Partial Fulfillment of the Requirements for the Degree of

Master of Science in Engineering and Management

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2016

© 2016 Jeremy H. Hochstedler. All rights reserved.

The author hereby grants MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: (signature redacted)
Jeremy H. Hochstedler
System Design and Management Program
June 2016

Certified by: (signature redacted)
Dr. Sanjay Sarma
Dean, Digital Learning & Professor, Mechanical Engineering
Thesis Supervisor

Accepted by: (signature redacted)
Patrick Hale
Director, System Design and Management Program

Submitted to the MIT System Design and Management Program on May 11, 2016 in partial fulfillment of the requirements for the Degree of Master of Science in Engineering and Management

Abstract

Rich data sets exist in Major League Baseball (MLB) and the National Football League (NFL) that track players and equipment (i.e. the ball) in space and time. Using machine learning and other analytical techniques, this research explores the various data sets in each sport, providing advanced insights for team decision makers. Additionally, a framework will be presented on how the results can impact organizational decision-making.
Qualitative research methods (e.g. interviews with front office personnel) provide the analysis with both context and breadth, whereas various quantitative analyses supply depth. For example, the reader will be exposed to mathematical and computer science terms such as Kohonen Networks and Voronoi Tessellations. These are presented with care to simplify the concepts so that most readers can follow them. As this research is jointly supported by the engineering and management schools, certain topics are kept at a higher level for readability. For any questions, contact the author for further discussion.

Part I addresses the distinction between performance and production, briefly decomposes a typical MLB organizational structure, and finally shows how the results of these analyses can directly impact areas such as player evaluation, advance scouting, and in-game strategy. Part II similarly presents how machine learning analyses can impact opponent scouting and personnel evaluation in the NFL.

Thesis Supervisors

Dr. Sanjay Sarma
Dean, Digital Learning & Professor, Mechanical Engineering

Dr. Abel Sanchez
Executive Director, Geospatial Data Center & Lecturer, Computer Science

J. Hochstedler | MIT 2016

Contents

Chapter 1. Introduction
  1.1. Motivation
  1.2. Performance vs. Production
  1.3. Problem Statement
Chapter 2. Analysis & Decision Making in Major League Baseball
  2.1. Measuring Hitter Performance
  2.2. Evaluating Pitchers Using Neural Networks
    2.2.2. Identification of Similar Pitchers
    2.2.3. Predicting Future Production in Unproven Pitchers
  2.3. Incorporating Analyses Into MLB Decision-Making Processes
    2.3.1. Model Verification
      2.3.1.1. Mean Absolute Error
      2.3.1.2. Root Mean Square Error
      2.3.1.3. Competition Testing
      2.3.1.4. Modified Receiver Operating Characteristic
    2.3.2. MLB Organizational Structure
Chapter 3. Analysis & Decision Making in the National Football League
  3.1. Winning and Avoiding Injuries
  3.2. Data Collection
  3.3. Receiver Openness and QB Decision-Making
    3.3.1. Zone Size
    3.3.2. Zone Integrity
    3.3.3. Openness Classification
    3.3.4. Expected Gain
    3.3.5. Player Elusiveness
    3.3.6. QB Decision Analysis
  3.4. Play Identification Using Supervised Learning
    3.4.1. Formation Classification
    3.4.2. Action Classification
    3.4.3. Play Concept Classification and Similarity
  3.5. Incorporating Analyses Into NFL Decision-Making Processes
    3.5.1. NFL Organizational Structure
Chapter 4. Conclusion
Chapter 5. References
Chapter 6. Acknowledgements

Figures

Figure 1. Hitter Pitch FX Coordinate System Displaying Launch and Spray Angles
Figure 2. Distributions of Exit Speed and Launch Angle
Figure 3. Scatter Plot of Exit Speed and Launch Angle
Figure 4. Measuring Performance of Exit Speed and Launch Angle
Figure 5. Training Progress Over 100 Iterations
Figure 6. Kohonen Map of 415 RHP's from the 2015 MLB Season
Figure 7. Scatter Plot of Model A Predicted vs. Actual Hitter Values
Figure 8. Distributions of Hitter Projection Models
Figure 9. Gaussian Distribution of Four Player Projection Models with Actual Values
Figure 10. Gaussian Distribution After Removal of Small Sample Players
Figure 11. Performance Evaluation of Each Prediction Model
Figure 12. Zoomed Evaluation of Each Prediction Model
Figure 13. Simplified MLB Team Baseball Operations Organizational Structure
Figure 14. NFL QB Passer Rating vs. Receiver Concussions and Team Losses
Figure 15. Down and Distance Utility Function
Figure 16. Traditional Voronoi Tessellation
Figure 17. "Predictive" Voronoi Tessellation
Figure 18. Distribution of Predicted Voronoi Zone Size for Eligible Receivers
Figure 19. First 2014 Colts Play from Scrimmage
Figure 20. Expected Play Gain on 148 Completions
Figure 21. Segmented Euclidean Regression of Two Distinct #8 (Post) Routes
Figure 22. Three Segment Euclidean Regression of Multiple
