Incorporating Spatiotemporal Machine Learning into Major League Baseball and the National Football League

by

Jeremy H. Hochstedler

B.S. Electrical Engineering, Rose-Hulman Institute of Technology, 2006

M.S. Electrical Engineering, University of Notre Dame, 2008

M.S. Management Science and Engineering, Stanford University, 2012

Submitted to the Systems Design and Management Program In Partial Fulfillment of the Requirements for the Degree of

Master of Science in Engineering and Management at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2016

© 2016 Jeremy H. Hochstedler. All rights reserved. The author hereby grants MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part in any medium now known or hereafter created.

Signature of Author: [Signature redacted]
Jeremy H. Hochstedler
System Design and Management Program
June 2016

Certified by: [Signature redacted]
Dr. Sanjay Sarma
Dean, Digital Learning & Professor, Mechanical Engineering
Thesis Supervisor

Accepted by: [Signature redacted]
Patrick Hale
Director, System Design and Management Program

Incorporating Spatiotemporal Machine Learning into Major League Baseball and the National Football League

by

Jeremy H. Hochstedler

Submitted to the MIT System Design and Management Program on May 11, 2016 in partial fulfillment of the requirements for the Degree of Master of Science in Engineering and Management

Abstract

Rich data sets exist in Major League Baseball (MLB) and the National Football League (NFL) that track players and equipment (i.e. the ball) in space and time. Using machine learning and other analytical techniques, this research explores the various data sets in each sport, providing advanced insights for team decision makers. Additionally, a framework will be presented on how the results can impact organizational decision-making.

Qualitative research methods (e.g. interviews with front office personnel) provide the analysis with context and breadth, whereas various quantitative analyses supply depth. For example, the reader will be exposed to mathematical/computer science terms such as Kohonen Networks and Voronoi Tessellations. However, they are presented with great care to simplify the concepts, making them accessible to most readers. As this research is jointly supported by the engineering and management schools, certain topics are kept at a higher level for readability. For any questions, contact the author for further discussion.

Part I will address the distinction between performance and production, followed briefly by a decomposition of a typical MLB organizational structure, and finally display how the results of these analyses can directly impact areas such as player evaluation, advance scouting, and in-game strategy. Part II will similarly present how machine learning analyses can impact opponent scouting and personnel evaluation in the NFL.

Thesis Supervisors:

Dr. Sanjay Sarma, Dean, Digital Learning & Professor, Mechanical Engineering

Dr. Abel Sanchez, Executive Director, Geospatial Data Center & Lecturer, Computer Science



Contents

Chapter 1. Introduction ...... 8
1.1. Motivation ...... 8
1.2. Performance vs. Production ...... 8
1.3. Problem Statement ...... 9
Chapter 2. Analysis & Decision Making in Major League Baseball ...... 10
2.1. Measuring Hitter Performance ...... 10
2.2. Evaluating Pitchers Using Neural Networks ...... 14
2.2.2. Identification of Similar Pitchers ...... 16
2.2.3. Predicting Future Production in Unproven Pitchers ...... 19
2.3. Incorporating Analyses Into MLB Decision-Making Processes ...... 22
2.3.1. Model Verification ...... 22
2.3.1.1. Mean Absolute Error ...... 26
2.3.1.2. Root Mean Square Error ...... 27
2.3.1.3. Competition Testing ...... 27
2.3.1.4. Modified Receiver Operating Characteristic ...... 28
2.3.2. MLB Organizational Structure ...... 30
Chapter 3. Analysis & Decision Making in the National Football League ...... 33
3.1. Winning and Avoiding Injuries ...... 33
3.2. Data Collection ...... 34
3.3. Receiver Openness and QB Decision-Making ...... 35
3.3.1. Zone Size ...... 37
3.3.2. Zone Integrity ...... 37
3.3.3. Openness Classification ...... 38
3.3.4. Expected Gain ...... 39
3.3.5. Player Elusiveness ...... 40
3.3.6. QB Decision Analysis ...... 41
3.4. Play Identification Using Supervised Learning ...... 43
3.4.1. Formation Classification ...... 43
3.4.2. Action Classification ...... 43
3.4.3. Play Concept Classification and Similarity ...... 48
3.5. Incorporating Analyses Into NFL Decision-Making Processes ...... 50
3.5.1. NFL Organizational Structure ...... 51
Chapter 4. Conclusion ...... 52
Chapter 5. References ...... 53
Chapter 6. Acknowledgements ...... 55


Figures

Figure 1. Hitter Pitch FX Coordinate System Displaying Launch and Spray Angles ...... 10
Figure 2. Distributions of Exit Speed and Launch Angle ...... 11
Figure 3. Scatter Plot of Exit Speed and Launch Angle ...... 12
Figure 4. Measuring Performance of Exit Speed and Launch Angle ...... 13
Figure 5. Training Progress Over 100 Iterations ...... 16
Figure 6. Kohonen Map of 415 RHP's from the 2015 MLB Season ...... 17
Figure 7. Scatter Plot of Model A Predicted vs. Actual Hitter Values ...... 23
Figure 8. Distributions of Hitter Projection Models ...... 24
Figure 9. Gaussian Distribution of Four Player Projection Models with Actual Values ...... 25
Figure 10. Gaussian Distribution After Removal of Small Sample Players ...... 25
Figure 11. Performance Evaluation of Each Prediction Model ...... 28
Figure 12. Zoomed Evaluation of Each Prediction Model ...... 29
Figure 13. Simplified MLB Team Baseball Operations Organizational Structure ...... 30
Figure 14. NFL QB Passer Rating vs. Receiver Concussions and Team Losses ...... 34
Figure 15. and Distance Utility Function ...... 35
Figure 16. Traditional Voronoi Tessellation ...... 35
Figure 17. "Predictive" Voronoi Tessellation ...... 35
Figure 18. Distribution of Predicted Voronoi Zone Size for Eligible Receivers ...... 37
Figure 19. First 2014 Colts Play from Scrimmage ...... 39
Figure 20. Expected Play Gain on 148 Completions ...... 40
Figure 21. Segmented Euclidean Regression of Two Distinct #8 (Post) Routes ...... 44
Figure 22. Three Segment Euclidean Regression of Multiple Drag Routes ...... 45
Figure 23. Full Play Displaying Actual and Estimated Receiver Routes ...... 46
Figure 24. Confusion Matrix of Receiver Route Prediction Model ...... 47
Figure 25. Prediction Distribution of True #3 (Out) Routes ...... 48
Figure 26. Kohonen Map of 231 Offensive Plays ...... 49
Figure 27. Kohonen Map Displaying Seven (qty. 7) Clusters & Associated Tags ...... 49
Figure 28. Simplified NFL Team Football Operations Organizational Structure ...... 51


Tables

Table 1. Feature Set of Three RHP's Displaying Two Pitch Types ...... 16
Table 2. Similarity of Each Team's RHP Trio ...... 18
Table 3. Top 10 Starting RHP's from the 2014 - 2015 Seasons ...... 20
Table 4. Pitcher Kohonen Clusters (Nos. 4, 8, 19) Displaying Pitch Velocities ...... 21
Table 5. Mean Absolute Error Results ...... 26
Table 6. Root Mean Square Error Results ...... 27
Table 7. Competition Test Results ...... 28
Table 8. A. Luck 2014 Completion Percentage as Function of Receiver Openness ...... 38
Table 9. Receiver Elusiveness for 148 Completions from 2014 Colts Base Offense ...... 41
Table 10. QB (A. Luck) Decision Analysis Results ...... 42
Table 11. Outcome Analysis of Seven Play Types ...... 50


Chapter 1. Introduction

"A man can be as great as he wants to be. Ifyou believe in yourself and have the courage, the determination, the dedication, the competitive drive, and ifyou are willing to sacrifice the little things in life, it can be done." Vince Lombardi

1.1. Motivation

I love to compete. To me, as a former college player and coach, sports are the ultimate form of competition. I also find fulfillment in using math and science to answer interesting questions. Combine these and one can clearly see that my passion lies at the intersection of sports and data science.

Through this research, including the many conversations with coaches and front office personnel, I have gained a deeper understanding of the "inside" of professional sports organizations. It is my mission to be a leader in the sports industry, and it is my goal to be a part of two championships: one in baseball, one in football. This goal is driven out of a quest to experience the extraordinary. As this is a lifelong mission, this research is simply another step in the process.

1.2. Performance vs. Production

Growing up, when at the plate, I was taught by my Dad and other coaches to "find a way to get it done." If a bloop single scored the run, then I succeeded. It wasn't until I became a coach myself that my perspective changed. I began coaching in 2009 and was introduced to the concept of "Performance vs. Production," defined as:

Performance - the execution of an action. In baseball, a hitter's performance is the sum of all actions within his control. It's the process, or how he performs.

Production - output. In sports, it's the outcome of his performance, or what he produces.


The "Performance vs. Production" philosophy teaches us to focus on what we can control. A hitter who "squared up" a ball with a 100+ mph exit velocity at a +5 degree angle from horizontal may not get out of the box as the third baseman snares the line drive. In the old days, this scenario would have me believe that I did nothing but fail. However, I now realize how the hitter with the 100+ mph exit velocity actually performed at an extraordinarily high level, yet his production fell to some bad luck (or possibly a strategically well placed defender). The overarching philosophy of this thesis is to measure performance in lieu of production.

1.3. Problem Statement

Spatiotemporal data possess space and time characteristics. Examples include raw flight radar data, stored GPS data, and lunar tracking data. Here, the focus remains on data in sports, specifically baseball and American football. Thus, the question:

How can professional sports organizations gain stronger leverage through the analysis of spatiotemporal data?

This research applies new methods to multiple spatiotemporal data sets in MLB and the NFL. Specifically, MLB's PitchFX data and geospatial NFL data are analyzed in new ways to create new insights for decision-makers within their respective organizations.


Chapter 2. Analysis & Decision Making in Major League Baseball

While the analyses presented here are a sample of the power machine learning can bring to front offices, it is ultimately the decision of organizational leadership where to apply machine learning and other advanced techniques. For example, it is possible resources could be applied to better understanding biomechanics, stress, fatigue, psychology, camaraderie, or other existing elements that remain vastly unknown. Additionally, machine learning could be applied to better understand the most valuable unknown commodity in sports: the pitching arm (Passan, 2016). Regardless, the analyses that follow are merely provided as examples of what can be done.

2.1. Measuring Hitter Performance

When analyzing how a hitter performs, the outcome must be temporarily removed, allowing the analysis to focus on what the hitter can control. Fundamentally, ball exit velocity (speed and trajectory) off the hitter's bat is used to characterize his performance. In the sabermetric community, the exit velocity comprises three components: ball exit speed off the bat, vertical launch angle, and horizontal spray angle. Figure 1 illustrates the distinction between launch and spray angles.

Figure 1. Hitter Pitch FX Coordinate System Displaying Launch and Spray Angles


In developing this philosophy, an assumption is made that a hitter cannot control his spray angle at the "micro" level. Yes, spray charts prove that certain players (e.g. David Ortiz, Barry Bonds) predominantly pull the baseball at the macro level. However, on the micro level, hitters are taught to keep their bat through the hitting zone longer. That is, once a hitter's hands have started and his bat travels through the zone, his expected spray angle continuously changes with no ability to make an adjustment without severely sacrificing bat speed. Therefore, this discussion focuses on the two factors a hitter most certainly can control: ball exit speed and launch angle. Traditionally these are recognized as - and tied to - "bat speed" and "timing," respectively. Figure 2 uses the April 2009 publicly released HITf/x data to show a distribution of each of these factors.

Figure 2. Distributions of Exit Speed and Launch Angle

1 This analysis has been modified from the author's original work (Hochstedler, 2013).


Now plotting every batted ball's launch angle/exit velocity combination, the scatter plot in Figure 3 allows the two factors within a hitter's control to be observed in one image.


Figure 3. Scatter Plot of Exit Speed and Launch Angle

Although interesting, these two relationships must be quantified in order to maximize understanding. Long term, it is assumed that increased performance will lead to increased production; therefore, the focus on measuring performance is maintained. A hitter is unable to control how an opposing player fields his batted ball. Additionally, a hitter has virtually no control over reaching on errors. Even singles, doubles, and triples possess a significant defensive bias.

Therefore, luck, chance, ballpark effects, and defensive alignment/skill are removed to evaluate a hitter's performance. To do so, exit speed and launch angle combinations are measured against expected outcomes. Each factor is divided into 40 "bins" (exit speed into 4 mph increments; launch angle into 3 degree increments), yielding a 40x40 matrix of 1,600 bins (many are empty, shown grey). Within each bin, a certain number of batted balls are tracked and marked with a specific outcome (e.g. out, single, HR, etc.). Here linear weights (i.e. wOBA) are used, then averaged within each 4 mph x 3 degree bin. [Note: any outcome measurement such as batting average, on-base percentage (OBP), slugging percentage, etc. may be substituted.] Figure 4 provides a heat map displaying the outcome of the analysis. Red and yellow denote high wOBA (HRs and XBH's); teal/green depict medium wOBA (e.g. singles); blue represents poor wOBA (e.g. outs). Optimal hitting occurs approximately between 18-40 degrees at speeds in excess of 95 mph.


Figure 4. Measuring Performance of Exit Speed and Launch Angle Using Linear Weights

By aggregating all balls in play, the performance of each plate appearance can be measured on a continuous performance scale in lieu of the discrete production scale. Using OBP as the outcome measurement, for example, the outcome-based method would permit only a binary assignment of skill for each individual plate appearance ("1" for reaching base successfully, "0" for producing an out). By using the performance measures of ball exit speed and vertical launch angle, however, the likelihood of reaching base given those two parameters is used instead (a number between 0 and 1, depending on the bin associated with the two velocity characteristics measured for that specific plate appearance).
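As an illustrative sketch of the binning scheme described above, the following snippet averages the linear weight of batted balls within each 4 mph x 3 degree bin and scores a plate appearance by its bin's expected wOBA. The batted-ball records and linear-weight values are hypothetical:

```python
# Hypothetical batted-ball records: (exit speed in mph, launch angle in deg,
# linear-weight value of the actual outcome). Values are made up for illustration.
batted_balls = [
    (101.0, 22.0, 2.10),  # home run
    (99.5, 20.5, 1.27),   # double
    (88.0, 5.0, 0.00),    # lineout
    (87.5, 6.5, 0.90),    # single
]

def bin_index(speed, angle, speed_step=4.0, angle_step=3.0):
    """Map a (speed, angle) pair to its 4 mph x 3 degree bin."""
    return (int(speed // speed_step), int(angle // angle_step))

# Average the linear weight of every batted ball landing in each bin.
sums, counts = {}, {}
for speed, angle, woba in batted_balls:
    key = bin_index(speed, angle)
    sums[key] = sums.get(key, 0.0) + woba
    counts[key] = counts.get(key, 0) + 1
expected_woba = {k: sums[k] / counts[k] for k in sums}

def performance(speed, angle):
    """Score a plate appearance by its bin's expected wOBA, not its actual outcome."""
    return expected_woba.get(bin_index(speed, angle), 0.0)
```

With real data, each bin would contain many batted balls, so the per-bin average approximates the expected outcome for that speed/angle combination.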


Since the growth of quantitative analysis in professional baseball, team analytics staffs have been expanding (Lindbergh and Arthur, 2016). As such, the need for third-party data analysis is decreasing. For this reason, the author has entered into an exclusive agreement with a professional baseball organization that includes a strict confidentiality clause restricting the public disclosure of work performed. Therefore, this thesis is unfortunately unable to detail how portions of the author's analysis have evolved since their original publication in 2013.

However, measuring performance in lieu of production enables more precise player evaluation by eliminating factors outside the control of the hitter. Subsequently, this method will lead to more accurate player projections. This approach can also be used to evaluate pitchers and remains a core principle to any analysis performed in this research. In general, the author utilizes this philosophy in any research or analysis he performs.

2.2. Evaluating Pitchers Using Neural Networks

Understanding pitchers' pitch types, velocities, movement, and usage is vital to player evaluation and to developing a strategy to defeat the opposition. This analysis uses an artificial neural network to group pitchers by similarity. Specifically, using a Kohonen Network, pitchers are grouped by the performance of their pitches using 2015 Pitch FX data.

2.2.1. Data Collection

The process below identifies the steps taken to extract, transform, and prepare the data for analysis. First, data is scraped from MLBAM's 2015 Pitch FX database, returning information on more than 700,000 pitches. Features are then extracted for each pitch.


Specifically, the following pitch characteristics are retained:
1. Pitcher (name and ID)
2. Pitcher handedness
3. Pitch type (as classified by MLB Advanced Media's algorithm)
4. Speed (mph at 50' from plate)
5. Release point (x0, z0 at 50' from plate)
6. Break (pfx_x, pfx_z when ball crosses plate)
7. Play outcome (specifically to gain BB/9 as a measure of pitcher control)

The data is consolidated by averaging across individual pitchers. Pitch types are consolidated into five types: FF (four seam fastball), CH (change-up), CU/KC (curveball), SL/FC (slider/cutter), and SI/FT (sinker/two seam fastball). Known as the "feature engineering" step, the decisions here to group sliders with cut fastballs and sinkers with two seam fastballs are open to critique. However, aside from the traditional fastball, change-up, and breaking ball, characteristics of pitches that "cut" (i.e. SL/FC) are differentiated from those that "run/sink" (i.e. SI/FT). To gain a deeper level of fidelity in this analysis, left-handed pitchers (LHP) and knuckleballers (e.g. R. Dickey and S. Wright) are omitted. That is, knuckleballers are known to be unique, and comparing a LHP to his right-handed (RHP) counterparts from a physical standpoint would be odd, at best. Filtering for RHP's with more than 50 batters faced leaves 415 total pitchers.
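The consolidation step above can be sketched as follows. The grouping rules come from the text; the sample pitch rows are hypothetical, and only speed is shown (release point and break would be averaged the same way):

```python
from collections import defaultdict

# Consolidation rules from the text: curveballs together, pitches that "cut"
# together, pitches that "run/sink" together.
CONSOLIDATE = {
    "FF": "FF", "CH": "CH",
    "CU": "CU/KC", "KC": "CU/KC",
    "SL": "SL/FC", "FC": "SL/FC",
    "SI": "SI/FT", "FT": "SI/FT",
}

# Hypothetical per-pitch rows scraped from the Pitch FX database.
pitches = [
    {"pitcher": "Arrieta, Jake", "type": "SI", "speed": 94.7},
    {"pitcher": "Arrieta, Jake", "type": "FT", "speed": 94.5},
    {"pitcher": "Arrieta, Jake", "type": "FF", "speed": 94.6},
]

# Collect speeds per (pitcher, consolidated pitch type) pair.
acc = defaultdict(list)
for p in pitches:
    acc[(p["pitcher"], CONSOLIDATE[p["type"]])].append(p["speed"])

# One averaged feature per pitcher per consolidated pitch type.
avg_speed = {k: sum(v) / len(v) for k, v in acc.items()}
```

Here the SI and FT rows collapse into a single SI/FT feature, mirroring the sinker/two-seam grouping described above.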

This analysis focuses on the aspects that a pitcher can control, namely his "stuff": pitch release points, velocities, and breaks. The only feature element (of the 26) that partially violates this methodology is BB/PA; this is used for simplicity and further work could improve this analysis (e.g. consider pitchers' abilities to avoid belt high, center cut pitches, balls out of the strike zone when behind in the count, etc.). Table 1 displays three samples of the prepared feature set (due to space constraints, only FF and SL pitch types are presented).


Name              FF Speed  SL Speed  FF Rel.X  SL Rel.X  FF Rel.Z  SL Rel.Z  FF H.Brk  SL H.Brk  FF V.Brk  SL V.Brk  BB/PA
Arrieta, Jake     94.6      90.2      -2.8      -3.1      6.2       6.1       -5.2      3.0       8.7       2.3       0.048
Harvey, Matt      95.8      89.5      -1.0      -1.1      6.1       6.1       -6.1      0.7       9.1       2.7       0.046
Hernandez, Felix  92.2      84.1      -2.0      -2.2      5.9       5.9       -3.7      2.1       6.5       -3.9      0.060
(Speeds in mph; release points in ft; breaks in in)
Table 1. Feature Set of Three RHP's Displaying Two Pitch Types (FF & SL)

2.2.2. Identification of Similar Pitchers

Kohonen Networks use unsupervised neural-network learning to compare high-dimensional data sets. After scaling the 26 feature inputs that describe the performance of each pitcher, the feature set is inserted into the Kohonen competitive learning algorithm. After n iterations (here, n = 100; refer to Figure 5 below) of internal competition within the algorithm, the data self-organizes by minimizing the mean Euclidean distance between nearest units. Basically, this enables the most similar pitchers to cluster (group) together.
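As an illustrative sketch (not the thesis's actual implementation, which is not detailed), a minimal Kohonen self-organizing map can be trained in a few lines. The grid size, learning-rate schedule, and neighborhood schedule below are arbitrary assumed choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, grid=(5, 5), iters=100, lr0=0.5, sigma0=2.0):
    """Train a minimal Kohonen self-organizing map on scaled feature rows."""
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])],
                      dtype=float)
    w = rng.normal(size=(grid[0] * grid[1], data.shape[1]))
    for t in range(iters):
        lr = lr0 * (1 - t / iters)              # decaying learning rate
        sigma = sigma0 * (1 - t / iters) + 0.5  # decaying neighborhood radius
        for x in data:
            bmu = np.argmin(((w - x) ** 2).sum(axis=1))      # best-matching unit
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # squared grid distance
            h = np.exp(-d2 / (2 * sigma ** 2))               # neighborhood kernel
            w += lr * h[:, None] * (x - w)                   # pull units toward x
    return w, coords

def bmu_of(x, w):
    """Index of the map unit nearest to feature vector x."""
    return int(np.argmin(((w - x) ** 2).sum(axis=1)))
```

Pitchers mapped to the same or adjacent units would then be treated as most similar, as in the map in Figure 6.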


Figure 5. Training Progress Over 100 Iterations


Figure 6 displays the Kohonen map of the 415 pitchers analyzed. Each node (circle) encompasses 0-7 pitchers (refer to the scale at left). In a Kohonen map, the closer two nodes lie, the more similar the pitchers they contain (as described by their 26 features).


Figure 6. Kohonen Map of 415 RHP's from the 2015 MLB Season

Performing this analysis across the top 2015 RHP trio for each team (as identified by team depth charts on MLB.com), the New York Mets' trio is observed to be the most similar when evaluating physical performance. Table 2 displays the similarity results for all 30 teams. Note: where a third starting RHP did not qualify, the team's top reliever (closer) was used.


Rank  Team  Pitchers                          Euclidean Separation
1     NYM   Harvey/deGrom/Syndergaard         7.63
2     BOS   Porcello/Kelly/Buchholz           8.00
3     ATL   Miller/Teheran/Wisler             8.49
4     CLE   Kluber/Carrasco/Salazar           9.21
5     TEX   Gallardo/Lewis/Gonzalez           10.31
6     SD    Shields/Ross/Kennedy              10.37
7     SF    Cain/Leake/Heston                 10.51
8     ARI   De La Rosa/Anderson/Hellickson    10.78
9     OAK   Gray/Chavez/Bassitt               10.96
10    MIN   Hughes/Santana/Gibson             11.09
11    MIA   Fernandez/Koehler/Phelps          11.20
12    LAA   Richards/Weaver/Shoemaker         11.65
13    KC    Cueto/Volquez/Ventura             11.66
14    CHC   Arrieta/Hendricks/Hammel          12.24
15    WAS   Scherzer/Zimmermann/Strasburg     12.27
16    PHI   Harang/Nola/Eickhoff              12.75
17    SEA   Hernandez/Iwakuma/Walker          13.28
18    MIL   Nelson/Peralta/Jungmann           15.08
19    HOU   McHugh/Fiers/McCullers            15.29
20    NYY   Tanaka/Pineda/Warren              15.45
21    BAL   Tillman/Gonzalez/Jimenez          16.16
22    TB    Archer/Odorizzi/Ramirez           16.90
23    DET   Verlander/Simon/Sanchez           17.23
24    PIT   Cole/Burnett/Morton               17.29
25    TOR   Stroman/Estrada/Osuna             17.37
26    STL   Wainwright/Lackey/Wacha           17.61
27    CIN   DeSclafani/Iglesias/Smith         17.98
28    LAD   Greinke/Bolsinger/Jansen          18.14
29    CWS   Samardzija/Johnson/Robertson      22.13
30    COL   Kendrick/Bettis/Gray              23.90
Table 2. Similarity of Each Team's RHP Trio (Lower = More Similar)
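The thesis does not spell out exactly how the per-trio "Euclidean Separation" in Table 2 is aggregated; one plausible reading, the sum of pairwise Euclidean distances among the three pitchers' scaled feature vectors, can be sketched as:

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def trio_separation(vectors):
    """Sum of pairwise Euclidean distances among a trio's feature vectors;
    a lower total means a more homogeneous trio."""
    return sum(euclidean(a, b) for a, b in combinations(vectors, 2))

# Hypothetical 2-D feature vectors standing in for the 26-feature vectors.
trio = [(0.0, 0.0), (3.0, 4.0), (0.0, 0.0)]
```

Under this reading, ranking the 30 trios is simply a sort on `trio_separation` over each team's three vectors.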


While interesting, this analysis alone provides little impact to organizational decision-making. In order to benefit a ball club, it must be expanded to analyze pitcher transformation from season to season. For example, this approach can identify opportunities and predict how changes in mechanics such as release point, velocity, and spin could improve a pitcher. Communication of such mechanical adjustments to the player must be handled carefully, through a trusted source (Johnson, 2013). Other applications include predicting hitter performance based on the pitcher "type" he is facing (e.g. creating new segmentations of pitcher types, or "splits," beyond simple handedness). Finally, predicting the future performance of unproven pitchers based on their similarity to established pitchers could benefit organizational decision-making.

2.2.3. Predicting Future Production in Unproven Pitchers

Many attempts are made to predict player performance, particularly for young, unproven pitchers. This method identifies the styles, movement, and combinations that "work," then finds pitchers with similar characteristics. First, successful pitching styles must be identified. Using Fangraphs.com, the top 10 RHP's (min. 50 GS) from the 2014-2015 seasons are identified by their value per game started and presented in Table 3. Here, wins above replacement (WAR) is used to establish value, while normalizing by games started (GS) reduces the negative consequences of injuries.


MLB Rank  Name                Cluster  GS  IP     K/9    WAR   WAR/GS
1         Arrieta, Jake       19       58  385.2  9.40   12.3  0.21
2         Kluber, Corey       18       66  457.2  10.11  12.9  0.20
3         Scherzer, Max       4        66  449.0  10.58  11.6  0.18
4         deGrom, Jacob       8        52  331.1  9.48   8.7   0.17
5         Greinke, Zack       24       64  425.0  8.62   10.2  0.16
6         Cole, Gerrit        4        54  346.0  8.84   7.8   0.14
7         Strasburg, Stephen  19       57  342.1  10.44  7.8   0.14
8         Hernandez, Felix    23       65  437.2  9.03   8.8   0.14
9         Cueto, Johnny       8        66  455.2  8.26   8.7   0.13
10        Archer, Chris       2        66  406.2  9.41   8.5   0.13

Table 3. Top 10 Starting RHP's from the 2014 - 2015 Seasons

After filtering the Kohonen map derived in Section 2.2, three groups (out of 25) emerge as the most interesting: Cluster Nos. 4, 8, and 19 are the only groups with at least two top 10 RHP's (as determined by WAR/GS above). Table 4 displays the three clusters along with the velocities of their pitch repertoire (due to space constraints, their entire release, movement, and spin metrics were not included).


Cluster  Name                 CH_mph  SI_mph  FF_mph  SL_mph  CU_mph
4        Verlander, Justin    86.8    -       93.0    85.7    79.4
4        Scherzer, Max        84.9    -       94.4    86.3    79.5
4        Cole, Gerrit         88.1    -       95.6    87.2    81.6
4        Mitchell, Bryan      88.9    -       96.1    91.7    82.5
4        Velasquez, Vincent   86.4    -       94.5    84.1    81.5
8        Harvey, Matt         88.4    95.9    95.8    89.5    83.8
8        Cueto, Johnny        83.4    92.6    92.6    86.5    76.3
8        Frias, Carlos        87.0    94.4    94.3    89.6    79.6
8        Zimmermann, Jordan   86.6    93.8    93.1    87.8    80.6
8        Lorenzen, Michael    85.1    94.4    94.0    86.0    80.3
8        deGrom, Jacob        85.6    94.8    95.2    89.8    81.6
19       Arrieta, Jake        89.1    94.7    94.6    90.2    80.8
19       Salazar, Danny       85.3    94.3    94.8    85.0    79.0
19       Strasburg, Stephen   88.8    95.7    95.7    88.4    82.3
19       Foltynewicz, Mike    84.4    94.7    95.1    84.0    75.3
19       Sanchez, Aaron       89.2    95.6    94.8    91.0    79.3
Table 4. Pitcher Kohonen Clusters (Nos. 4, 8, 19) Displaying Pitch Velocities

Pitchers with "household" names such as Scherzer, Cole, Harvey, deGrom, Arrieta, and Strasburg are known to be successful based off their production; however relatively young, unknown pitchers such as Mitchell, Velasquez, Frias, Lorenzen, Foltynewicz, and Sanchez don't have significant major league experience and thus, scouts are limited in their evaluation. The pitchers within each group possess the most similar talent (i.e. performance as defined by their Pitch FX characteristics) to the other pitchers within their respective cluster. By measuring their talent/performance and matching them with other, known successful pitchers, we can predict that given the opportunity, their production will follow. That is, with mechanical or psychological adjustments (Mayne, 2008; Johnson, 2013) such as pitch command/execution improvement, look for pitchers like Mitchell, Velasquez, Frias, Lorenzen, Foltynewicz, and Sanchez to become successful, productive pitchers in 2016 or beyond.


2.3. Incorporating Analyses Into MLB Decision-Making Processes

The importance of communicating with data cannot be overstated. A pristine analysis will fail if it is not communicated clearly and effectively throughout the organization. This section outlines the verification and communication processes required to effectively transfer the knowledge created via these analyses.

2.3.1. Model Verification

Prior to providing input to organizational decision-makers, any prediction, model, or recommendation shall be independently verified. Verification may be reviewed qualitatively via peer review; however, quantitative test methods provide an objective, unbiased review of the recommendation.

For example, a quantitative method of testing player projection models shall be established prior to development of the actual model. Here, four methods are presented which test and verify four distinct player projection models from the 2015 MLB season. Because test methods should be developed prior to examination of the models themselves, each shall be anonymized to prevent analyst bias.

First, the data must be characterized and scaled to ensure alignment across models. As these models are used to predict individual performances, and less concern is present about systemic evaluation (i.e. this is not used to evaluate league averages year over year), each model was scaled by adding a fixed offset to each individual hitter's value such that each model's mean equated to the actual mean. Characterization of the models is displayed below in plate appearances and scaled hitting value.
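The mean-matching offset can be sketched with hypothetical values; the real inputs would be each model's per-hitter projections and the actual season values:

```python
# Hypothetical hitter values for the actual season and one projection model.
actual = [0.250, 0.270, 0.264]
model = [0.240, 0.268, 0.250]

# Shift every projection by one constant so the model mean equals the actual mean.
offset = sum(actual) / len(actual) - sum(model) / len(model)
scaled = [v + offset for v in model]
```

Because a single constant is added to every hitter, the spread and ordering of each model's projections are untouched; only the mean is aligned.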


Model    Plate Appearances  Mean Hitting Value (scaled)
Actual   169,577            0.2612
Model A  192,857            0.2612
Model B  173,470            0.2612
Model C  162,171            0.2612
Model D  183,498            0.2612

For further characterization, Figure 7 presents the predicted vs. actual hitter values of a single model, Model A.


Figure 7. Scatter Plot of Model A Predicted vs. Actual Hitter Values


Similarly for characterization purposes, Figure 8 presents the distributions of each hitter projection model.

[Figure: histograms of hitter value for each projection model; panel titles give standard deviations: Model A sd = 0.0244, Model B sd = 0.022, Model C sd = 0.0222, Model D sd = 0.0204. x-axis: Hitter Value.]

Figure 8. Distributions of Hitter Projection Models

Gaussian fits provide further comparison between the models and the actual values. Figure 9 presents this characterization. Note: because each model projects the identical player set (the areas under the curves are the same), the height of each Gaussian distribution allows the researcher to quickly understand the comparative differences in standard deviation.



Figure 9. Gaussian Distribution of Four Player Projection Models with Actual Values

The long tails that skew the Actual distribution fit are magnified by atypical hitter values caused by low PA sample sizes. Therefore, the data has been filtered to exclude hitters with fewer than 50 PA's, yielding the presentation in Figure 10.


Figure 10. Gaussian Distribution After Removal of Small Sample Players


Test method selection is critical. A review of available test methods and industry precedent shall be performed prior to determining the appropriate test method(s). Here, four distinct test methods are used: mean absolute error (MAE), root mean square error (RMSE), competition, and receiver operating characteristic (ROC) measurements. MAE and RMSE are straightforward metrics used within the industry (Wyers, 2011; Tango, 2011; Swartz, 2012; Paine, 2015), whereas the competition and ROC analyses are more sensitive to methodological choices.

2.3.1.1. Mean Absolute Error

MAE measures the mean magnitude of errors from a set of forecasts. Specifically, MAE averages the absolute differences between each forecast and its corresponding observation (actual value).

Symbolically, MAE is defined as:

MAE = (1/n) * Σ_{i=1..n} |P_i - A_i|

where:
P_i = predicted value
A_i = actual value
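As a concrete illustration, the MAE computation can be sketched in Python (the thesis's analyses used Python per the references); the projected and observed hitter values below are hypothetical, not drawn from the thesis data set.

```python
def mean_absolute_error(predicted, actual):
    """Average magnitude of the forecast errors |P_i - A_i|."""
    assert len(predicted) == len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical projected vs. actual hitter values for four players
projected = [0.320, 0.295, 0.340, 0.280]
observed = [0.310, 0.300, 0.325, 0.290]
print(mean_absolute_error(projected, observed))  # (0.010 + 0.005 + 0.015 + 0.010) / 4 = 0.010
```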

Performing this analysis yields the following results.

Model     MAE
Model A   0.0288
Model B   0.0283
Model C   0.0279
Model D   0.0269

Table 5. Mean Absolute Error Results

Here, Model D is observed to be the most accurate prediction algorithm as it maintains the lowest MAE.


2.3.1.2. Root Mean Square Error

RMSE averages the square of the difference between individual forecast-observation pairs; the square root of that average is then taken. Because errors are squared before averaging, RMSE places more emphasis on (is more negatively impacted by) outliers and riskier (higher standard deviation) projection systems.

Mathematically, RMSE is represented as:

RMSE = sqrt( (1/n) * Σ_{i=1..n} (P_i - A_i)² )

where:
P_i = predicted value
A_i = actual value
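A short Python sketch, with hypothetical residuals rather than thesis data, makes the outlier penalty concrete: two projection systems with identical MAE can differ sharply in RMSE.

```python
import math

def root_mean_square_error(predicted, actual):
    """Square each error, average, then take the square root."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

actual = [0.0, 0.0, 0.0, 0.0]
steady = [0.02, 0.02, 0.02, 0.02]    # four modest misses; MAE = 0.02
volatile = [0.00, 0.00, 0.00, 0.08]  # one large miss;    MAE = 0.02
print(root_mean_square_error(steady, actual))    # 0.02
print(root_mean_square_error(volatile, actual))  # 0.04, despite the identical MAE
```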

Model     RMSE
Model A   0.0372
Model B   0.0363
Model C   0.0363
Model D   0.0351

Table 6. Root Mean Square Error Results

As Table 6 displays, Model D once again emerges as the most accurate projection model.

2.3.1.3. Competition Testing

In this test method, all models "compete" to determine which method "wins" most often. Here, the closest prediction model is awarded a "W" (win) and the remaining three models are given an "L" (loss). Optional cutoffs may be used to distinguish wins, losses, and ties; if employed, a sensitivity analysis must follow. The results are presented in Table 7 and display that Model D is accurate more often than its three competitors.
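The competition method can be sketched in Python. This version awards one win per hitter to the closest model and omits the optional win/loss/tie cutoffs discussed above; all values are hypothetical.

```python
from collections import Counter

def competition_record(predictions, actuals):
    """predictions: {model name: list of predicted values}. Awards one 'W' per
    hitter to the model with the smallest absolute error on that hitter."""
    wins = Counter()
    for i, a in enumerate(actuals):
        closest = min(predictions, key=lambda m: abs(predictions[m][i] - a))
        wins[closest] += 1
    return wins

# Hypothetical projections for three hitters from two models
preds = {"Model A": [0.300, 0.280, 0.350],
         "Model B": [0.320, 0.272, 0.345]}
actual = [0.305, 0.275, 0.340]
wins = competition_record(preds, actual)
print(wins["Model A"], wins["Model B"])  # 1 2
```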

Page 27 of 55 J. Hochstedler I MIT 2016

Model     Win %
Model A   0.227
Model B   0.257
Model C   0.244
Model D   0.272

Table 7. Competition Test Results

Note: careful attention must be paid to this test method. Similar to a political election, two models may be extremely similar and "split" votes/competitions, allowing an inferior candidate/model to emerge victorious. Nevertheless, this test method was included, if for no other reason than to make this point.

2.3.1.4. Modified Receiver Operating Characteristic

ROC curves present the accuracy of a classification model by varying the discrimination threshold. Once generated, the overall performance is measured using the area under the curve (AUC). To ease communication with those unfamiliar with standard ROC curves, which traditionally possess true-positive and false-positive rates on the axes, the x-axis has been replaced with the threshold limits. Figure 11 displays the ROC curves for the four models.


Figure 11. Performance Evaluation of Each Prediction Model

Page 28 of 55 J. Hochstedler I MIT 2016

Zooming the curve to a more interesting window yields Figure 12. To aid in understanding the significance of these figures, observe the lower x-axis extreme. A threshold limit of 0 would require a projection model to predict a hitter's value exactly. As this is extremely unlikely, the accuracy rate is observed to be at or near zero. Conversely, if the threshold limits were expanded to the largest bound (say, +/- 0.200), each model would "win" on its projection of every hitter.
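The modified curve can be sketched in Python: for each threshold limit, compute the fraction of hitters whose projection falls within that threshold of the actual value, then integrate for an AUC-style summary. The five player values are hypothetical.

```python
import numpy as np

def modified_roc(predicted, actual, thresholds):
    """Accuracy rate (share of projections within the threshold) per threshold limit."""
    errors = np.abs(np.asarray(predicted) - np.asarray(actual))
    return np.array([(errors <= t).mean() for t in thresholds])

predicted = np.array([0.260, 0.305, 0.292, 0.328, 0.345])
actual = np.array([0.260, 0.300, 0.280, 0.300, 0.300])
thresholds = np.linspace(0.0, 0.05, 6)
curve = modified_roc(predicted, actual, thresholds)       # rises from 0.2 toward 1.0
auc = float(np.sum((curve[1:] + curve[:-1]) / 2 * np.diff(thresholds)))  # trapezoidal AUC
```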


Figure 12. Zoomed Evaluation of Each Prediction Model

In observing the results, it is clear that the four models are closely contested; however, Model D emerges as the victor in all four test methods. These models can be further analyzed across subgroups (e.g. separation by age, experience, performance types, etc.). After improving the best model (or using a hybrid model), these projections must be incorporated into the day-to-day operations of the MLB organization.


2.3.2. MLB Organizational Structure

Once appropriate verification of models and analysis has occurred, the results must be incorporated into the organization's decision-making process. Naturally, organizational structures vary considerably across teams. For example, an emerging trend among teams includes an additional link in the executive decision-making chain; specifically, some teams employ a President of Baseball Operations who operates between the Owner/Chairman and the General Manager. While many other titles exist, such as Vice President, Special Assistant, and Coordinator, the simplified structure presented in Figure 13 serves as a framework with which to integrate machine learning analyses into the decision-making process.

Owner / Chairman
  General Manager
    Assistant General Manager
      Director, Professional Scouting
      Director, Amateur Scouting
      Director, Research & Development
        Systems
        Quantitative Analysis
      Director, Baseball Operations

Figure 13. Simplified MLB Team Baseball Operations Organizational Structure

While reporting structures differ across organizations, each team shall maintain a common, organized database to house the analyses of individual players and teams. These can include quantitative data, player value, scouting information, and more. Any machine learning analysis performed shall be simplified and presented within the team database enabling efficient

consumption by team decision makers. Additionally, extreme care and security measures (such as IP address restrictions, VPN requirements, and two-factor authentication) shall be utilized to ensure the data is accessible to team employees only.




Chapter 3. Analysis & Decision Making in the National Football League

While spatiotemporal analysis can impact media (Hochstedler & Hurst, 2016), the focus here remains on competitive advantage and strategy. The importance of gameday preparation and player evaluation cannot be overstated (Polian, 2014). As the main objective of scouting is to obtain as much useful information about a future opponent as possible (Belichick, 1962), the analyses presented here are additional methods of securing useful information. That is, these methods are supplemental and shall not replace vital methods such as video scouting. It is important to note, however, that proper use of spatiotemporal football data will also enable efficiency gains in existing processes such as player evaluation video review.

3.1. Winning and Avoiding Injuries

Within football and media organizations, significant resources are consumed through the analysis of player performance and decision-making, especially at the quarterback (QB) position. Since 2014, radio-frequency identification (RFID) tracking technology has been used to continuously monitor the on-field locations of NFL players (Zebra Technologies, 2015). Using geospatial American football data, this research quantitatively evaluates receiver openness, player elusiveness, and QB decision-making.2

In addition to enhancing win probability, using NFL injury data, we have discovered how QB decisions and passing ability impact the likelihood of receiver concussions. Specifically, Figure 14 demonstrates how better passers reduce team losses and receiver concussions (Mrkvicka and Hochstedler, 2016). By making better decisions and finding the open receiver, QBs can put their receivers and their teams in better positions to succeed.

2 This section on receiver "openness" and QB decision-making has been published in the 2016 MIT Sloan Sports Analytics Conference (Hochstedler, 2016).


[Figure omitted: bar chart comparing WR/TE concussions and team losses per year (y-axis: 0.0-10.0) across passer-rating buckets < 80, 80-90, and > 90.]

Figure 14. NFL QB Passer Rating vs. Receiver Concussions and Team Losses from 2012 - 2015

3.2. Data Collection

NFL RFID data is not publicly available for analysis at this time. In order to perform similar analysis, spatial coordinates of all twenty-two on-field players were captured from game video at three frames per second, to the nearest 0.25 yard, for every base offensive pass play of the 2014 Indianapolis Colts (231 total plays). Here, the "base offense" is defined as 1st & 2nd down, less than a 15-point differential, greater than five minutes remaining in a half, and between the 20 yard lines.

In order to evaluate decision-making, utility must be defined. The success outcomes that this analysis considers are completions and yards gained. The base offensive pass plays for the 2014 Indianapolis Colts provide a significant sample that allows for an analysis of Andrew Luck's decision-making. Because strategy changes towards the end of a half and near the end zones, these play types are removed from consideration. Additionally, this dataset focuses on 1st and 2nd down plays since the marginal team benefit of additional yardage is more direct, which doesn't hold true on 3rd and 4th downs. In short, a significant benefit is earned when an offensive team achieves more yardage than the yards to gain (i.e. when the offensive team "picks up the first down"3) on 3rd and 4th downs. Figure 15 provides a visual representation of team utility.

3 While this utility function is assumed, in truth, there are slight benefits in achieving a first down on 1st and 2nd down plays; however, they are minimal compared to picking up a first down on 3rd or 4th down.


[Figure omitted: team utility as a function of yards gained for 1st & 2nd down versus 3rd & 4th down, with a jump at the distance to gain a first down.]

Figure 15. Down and Distance Utility Function

3.3. Receiver Openness and QB Decision-Making

Player velocities significantly impact receiver routes and the defense of those routes. Because player velocity is important in addition to positional data, a "predictive" Voronoi tessellation has been developed to quantify receiver "openness." That is, because the game relies on "where a player will be," not necessarily "where he currently stands," predictive methods more accurately reflect player decision-making as it relates to geospatial analysis. Figure 16 and Figure 17 outline the distinctions between a traditional and a "predictive" tessellation.


Figure 16. A traditional Voronoi tessellation where both players are immobile and assumed to possess identical acceleration profiles. The blue cell maintains the shortest Euclidean distance to every point shaded blue, whereas the orange player maintains the shortest distance to every point shaded orange.

Figure 17. A "predictive" Voronoi tessellation where the blue player possesses a non-zero velocity toward an immobile orange player. The blue player is moving fast enough that he now "owns" the ground behind the orange player, since he can reach those points more quickly (in time) even though the orange player is currently the closest player (in distance) to the points in that cell.


To assist in illustrating this concept, the first five frames from the Colts' first play of the 2014 season are illustrated below. Taken at 3 Hz, these figures commence 1.33 seconds after the snap, which coincides with the top of Quarterback Andrew Luck's dropback.

Frame 5. With this play designed to the offense's right side of the field (Wayne on the "over" or deep crossing route), Luck looks left to hold off the free safety.

Frame 6. With two receivers maintaining small zones (i.e. "covered"), Luck remains patient, allowing the play to develop.

Frame 7. With Hilton's inside release go-route, he successfully occupies the defender responsible for the right one-third of the field.

Frame 8. With the corner staying true to his assignment, Luck observes Wayne's zone starting to open.

Frame 9. Luck makes his decision to throw to Wayne just as his zone reaches its maximum (i.e. when he becomes "wide-open"). Play result: good decision, good outcome. That is, Luck threw to a wide-open Wayne (good decision) for a reception and gain of 21 yards (good outcome).4

4 Images obtained from NFL.com and reproduced here under 17 U.S. Code 107 (Fair Use)


3.3.1. Zone Size

By analyzing all twenty-two players' instantaneous positions and velocities, the Voronoi tessellation is performed after projecting each player's position forward two frames (2/3 of a second) and finding the respective zone area (yards²) for each player. Zone areas are then established for each eligible receiver for each frame throughout the play. Figure 18 displays the zone area distribution for every eligible receiver in 2014 Colts' pass plays at the moment of QB release. To further understand the "predictive" tessellation, refer to Appendix I, which displays the evolution of the Indianapolis Colts' first offensive play from scrimmage from the 2014 season. Using the geospatial coordinates captured, this analysis was performed on each base offensive pass play from the 2014 season.
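The projection-plus-tessellation step can be sketched in Python. Rather than constructing exact Voronoi polygons, this sketch discretizes the field into a grid and assigns each grid point to the nearest projected player, which approximates the same zone areas; the positions, velocities, and grid resolution are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def predictive_zone_areas(positions, velocities, dt=2 / 3.0,
                          width=53.3, length=120.0, res=0.5):
    """Approximate 'predictive' Voronoi zone areas (yards^2).

    Each player's position (yards) is projected dt seconds forward along his
    velocity (yards/sec); every grid point is then owned by the nearest
    projected player, and a zone area is that player's cell count times the
    cell area."""
    projected = np.asarray(positions) + dt * np.asarray(velocities)
    xs = np.arange(0.0, width, res)
    ys = np.arange(0.0, length, res)
    gx, gy = np.meshgrid(xs, ys)
    points = np.stack([gx.ravel(), gy.ravel()], axis=1)
    sq_dist = ((points[:, None, :] - projected[None, :, :]) ** 2).sum(axis=-1)
    owner = sq_dist.argmin(axis=1)  # nearest projected player per grid point
    return np.bincount(owner, minlength=len(projected)) * res * res

# Two players 10 yards apart (cf. Figures 16-17): both static vs. one closing fast
positions = [[20.0, 50.0], [20.0, 60.0]]
static = predictive_zone_areas(positions, [[0.0, 0.0], [0.0, 0.0]])
moving = predictive_zone_areas(positions, [[0.0, 9.0], [0.0, 0.0]])
print(moving[0] > static[0])  # True: the moving player 'owns' more ground
```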


Figure 18. Distribution of Predicted Voronoi Zone Size for All Eligible Receivers at QB Decision Point

3.3.2. Zone Integrity

While zone size describes how much of the field a receiver "owns," a defender may still be lurking nearby to ultimately break up the intended pass. Therefore, projected zone integrities are calculated for each eligible receiver and play frame. Zone integrity is measured as the projected distance from an eligible receiver to the nearest defender.
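A minimal Python sketch of the metric follows, with both the receiver and the defenders projected forward by the same two-frame (2/3-second) horizon used for zone size; the coordinates and velocities are hypothetical.

```python
import numpy as np

def zone_integrity(receiver_xy, receiver_vel, defenders_xy, defenders_vel, dt=2 / 3.0):
    """Projected distance (yards) from the receiver to the nearest defender."""
    receiver = np.asarray(receiver_xy, float) + dt * np.asarray(receiver_vel, float)
    defenders = np.asarray(defenders_xy, float) + dt * np.asarray(defenders_vel, float)
    return float(np.min(np.linalg.norm(defenders - receiver, axis=1)))

# Receiver sprinting downfield with a trailing safety and a flat-footed corner
integrity = zone_integrity([25.0, 40.0], [0.0, 9.0],
                           [[25.0, 50.0], [30.0, 38.0]],
                           [[0.0, 3.0], [0.0, 0.0]])
print(integrity)  # 6.0 yards: the trailing safety is the nearest projected defender
```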


3.3.3. Openness Classification

To enable classification of the openness of an eligible receiver, zone size and integrity are combined to simplify the analysis and explain the spirit of what can be measured using the geospatial data. Receiver openness is classified as wide-open, open, defended, or well-defended. Specifically:

- Wide-open = zone area > 200 yards² or integrity > 8 yards
- Open = zone area between 100-200 yards² or integrity between 4-8 yards
- Defended = zone area between 50-100 yards² or integrity between 2-4 yards
- Well-defended = zone area < 50 yards² or integrity < 2 yards
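A Python sketch of the classification follows. Because the rules combine zone area and integrity with "or," a receiver can satisfy two categories at once; this sketch resolves that by taking the more open of the two individual ratings, which is an interpretive assumption rather than the thesis's stated rule.

```python
LABELS = ["wide-open", "open", "defended", "well-defended"]

def _rating(value, cutoffs):
    """Index 0 (most open) through 3, given descending cutoffs."""
    for i, cutoff in enumerate(cutoffs):
        if value > cutoff:
            return i
    return len(cutoffs)

def classify_openness(zone_area, integrity):
    """Openness from zone area (yards^2) and integrity (yards); the more open
    of the two individual ratings wins (an assumption, see above)."""
    by_area = _rating(zone_area, [200, 100, 50])
    by_integrity = _rating(integrity, [8, 4, 2])
    return LABELS[min(by_area, by_integrity)]

print(classify_openness(250, 3.0))   # wide-open  (zone area > 200)
print(classify_openness(80, 5.0))    # open       (integrity between 4-8)
print(classify_openness(40, 1.5))    # well-defended
```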

Table 8 displays Andrew Luck's 2014 completion percentage for each targeted receiver's openness. Note: four plays were not included in the data set, as they did not have a clearly defined targeted receiver.

Openness of Targeted Receiver   Plays   Completions   Completion %
Wide-open                         69        49            71%
Open                              70        49            70%
Defended                          67        39            58%
Well-defended                     21        11            52%
N/A                                4         0             0%
Total                            231       148            64%

Table 8. Andrew Luck's 2014 Completion Percentage as a Function of Receiver Openness

As observed, Luck completed a higher percentage of passes to open and wide-open receivers than those who were defended or well-defended.


3.3.4. Expected Gain

For each frame throughout the play, expected yardage is also derived by observing the maximum y-value of that receiver's predicted zone. In a theoretical "pure play," where an ideal throw (on-time and on-target) is matched with an ideal catch and a defender making an ideal tackle (including reaction and pursuit angle), a receiver's maximum gain would occur at the point in his zone which is furthest downfield.

Figure 19 displays Luck's decision point from the Colts first offensive play from scrimmage from the 2014 season. Hilton possesses a smaller zone (red) while Wayne maintains a large zone (blue). If Luck were to make an ideal pass at this moment (with an ideal catch and a pure tackle), Wayne should be expected to obtain 26 yards on the play (northern-most point in his zone). On this particular play Luck slightly underthrows Wayne, causing Wayne to flatten out his route and carrying him out of bounds with a 21-yard gain.


Figure 19. First 2014 Colts Play from Scrimmage. Receiver Wayne Classified "Wide Open" (Blue Zone)5

5 Image obtained from NFL.com and reproduced here under 17 U.S. Code 107 (Fair Use)


Figure 20 compares the expected gain for every completed pass (148 in total) against the play's actual gain. Points above the regression line identify plays where the actual gain exceeded the expected gain.

[Figure omitted: scatter of actual gain versus expected gain (yards) at the moment of QB release, with regression line.]

Figure 20. Expected Play Gain on 148 Completions

3.3.5. Player Elusiveness

While each completion possesses an expected gain in yardage (whether positive or negative), the actions of the QB, receiver, and defenders ultimately define the actual yardage achieved. For example, a poorly executed pass that forces a receiver to adjust for the catch will likely reduce the actual yardage gained. Alternatively, several broken tackles after the catch will likely lead to a higher than expected actual gain. For each play, the difference between actual and expected yardage is measured for each individual receiver. In sum, the additional yardage gained beyond the expected yardage defines a player's "elusiveness." That is, the more yards a player gains than expected, the more elusive he is. Table 9 presents the player elusiveness for each position group on the 148 completions. Notice how running backs (RB) are more elusive than tight ends (TE) and wide receivers (WR).

Position         Additional Gain from Expectation (Yards)   Catches   Mean
All Colts RB's                   120                           29      4.14
All Colts TE's                    52                           34      1.53
All Colts WR's                    88                           85      1.04

Table 9. Receiver Elusiveness for 148 Completions from 2014 Colts Base Offense

3.3.6. QB Decision Analysis

Although plays are typically designed to have primary, secondary, and tertiary targets, the decision of which receiver to target ultimately comes down to the QB. A QB can check down his options and decide whom to target with his pass. This analysis attempts to model how those decisions are made based on geospatial elements created through zone size and integrity of eligible receivers. Here, the zone size and integrity are combined to quantify a receiver's openness along with his expected gain.

Using these factors, Andrew Luck's decision-making can be analyzed. For a given play, receiver options are skill players who do not block. Considering every eligible receiver at each frame (taken every 1/3 of a second during play development) prior to the pass release frame (final QB decision point), each receiver frame is assigned an openness (wide-open, open, defended, and well-defended) and an expected yardage of gain. These two factors are then combined to produce an expected utility if that option was chosen at that play frame.

The expected payoff (EP) is calculated as:

EP = P(C|O) × E

where:

P(C|O) = probability of pass completion given current state of receiver openness
E = expected gain (yards)
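Using the completion probabilities from Table 8, the expected payoff computation reduces to a one-line Python function. The 26-yard example mirrors the Wayne play discussed earlier, while the conditional probabilities are the empirical completion rates from Table 8.

```python
# Empirical P(C|O) values taken from Table 8
P_COMPLETION = {"wide-open": 0.71, "open": 0.70,
                "defended": 0.58, "well-defended": 0.52}

def expected_payoff(openness, expected_gain):
    """EP = P(C|O) x E, in expected yards."""
    return P_COMPLETION[openness] * expected_gain

print(expected_payoff("wide-open", 26.0))  # 0.71 * 26 = 18.46 expected yards
print(expected_payoff("defended", 26.0))   # 0.58 * 26 = 15.08: same gain, worse option
```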


The expected payoff yardage is then captured for all options prior to the target selection, and the target's payoff is compared to the play's population of options to determine the percentile of the target receiver's expected payoff.

Decision types as a percentile of optional targets for a given play:

- A percentile of 80 or above was classified as an ideal target decision.
- A percentile of 50-80 was classified as a preferred target decision.
- A percentile of 20-50 was classified as a neutral target decision.
- A percentile below 20 was classified as an undesirable target decision.
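The percentile mapping can be sketched in Python. The percentile convention (share of options at or below the target's expected payoff) and the inclusive cutoffs are assumptions, and the payoff values are hypothetical.

```python
def classify_decision(target_ep, option_eps):
    """Map the targeted receiver's expected payoff to a decision type by its
    percentile among all receiver-frame options on the play."""
    percentile = 100.0 * sum(ep <= target_ep for ep in option_eps) / len(option_eps)
    if percentile >= 80:
        return "ideal"
    if percentile >= 50:
        return "preferred"
    if percentile >= 20:
        return "neutral"
    return "undesirable"

# Nine hypothetical receiver-frame options (expected yards) on one play
options = [1.1, 2.0, 2.4, 3.5, 4.8, 5.0, 6.1, 7.2, 9.0]
print(classify_decision(7.2, options))  # ideal: 8 of 9 options sit at or below 7.2
print(classify_decision(1.1, options))  # undesirable: only 1 of 9 at or below
```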

In order to isolate QB decision-making, plays in which the intended receiver is unidentifiable (e.g. the QB is hit as he throws; four plays in total) are removed from this analysis. Table 10 displays the results of the decision analysis. Note how QB Andrew Luck made an ideal or preferred decision on more than 75% of the pass plays analyzed.

Decision Type   Plays   Percentage of Total Plays
Ideal             92       40.5%  } 75.8%
Preferred         80       35.2%
Neutral           45       19.8%  } 24.2%
Undesirable       10        4.4%
N/A                4
Total            231

Table 10. QB (A. Luck) Decision Analysis Results

Player tracking data that possess time and location characteristics can be leveraged for quantitative, unbiased analyses of on-field decisions; analyses such as the one presented here can undoubtedly impact decisions made off the field by organizational leaders.


3.4. Play Identification Using Supervised Learning

Machine-learning classifiers are used to identify formations, routes, and blocking strategies for each of a team's eleven offensive players. Key features are distilled from the analysis of player positioning, proximities, and trajectories. All machine learning algorithms have been trained on a synthetic data set provided by EA Sports and tested against the collected data for the 2014 Indianapolis Colts.

3.4.1. Formation Classification

Individual player starting locations are defined by proximity to teammates, opponents, various fixed field points (e.g. hashes, boundaries, and numbers), as well as floating points (e.g. line of scrimmage). Logic has been established to classify player alignment as one of ten distinct positions. These position classifications and additional data are used as features in a supervised machine-learning algorithm to categorize the overall offensive formation.
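A few of the proximity features described above can be sketched in Python. The field-width constant is a standard NFL dimension, while the function name and reduced feature set are illustrative rather than the thesis's full alignment logic.

```python
import math

FIELD_WIDTH = 53.3  # yards, sideline to sideline

def alignment_features(player_xy, ball_xy, teammates_xy):
    """Starting-alignment features for one player: distance to the line of
    scrimmage (through the ball), the nearest sideline, and the nearest
    teammate. x runs across the field, y runs downfield."""
    x, y = player_xy
    return {
        "dist_los": abs(y - ball_xy[1]),
        "dist_sideline": min(x, FIELD_WIDTH - x),
        "dist_teammate": min(math.hypot(tx - x, ty - y) for tx, ty in teammates_xy),
    }

# Hypothetical receiver split wide left, on the line of scrimmage
features = alignment_features((8.0, 30.0), (26.65, 30.0),
                              [(20.0, 30.0), (26.0, 25.0)])
print(features)  # {'dist_los': 0.0, 'dist_sideline': 8.0, 'dist_teammate': 12.0}
```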

3.4.2. Action Classification

Upon the start of play execution (the snap), player trajectories are analyzed for every position on the field in order to classify the player action. Described here, movements by players at the wide receiver (WR) position are classified.

Basic receiver routes are comprised of three essential components: a stem, a pivot, and a branch. The stem is the initial segment from the starting point (snap, t = 0) to the pivot (re-direction) point. The branch is the final component of the route, which the WR runs. Using a segmented Euclidean regression, features are distilled from each WR trajectory. That is, in order to describe the route with specific features, it must first be estimated in mathematical terms. Figure 21 displays a basic segmented Euclidean regression of two separate post routes (one from the left, one from the right) by Colts receiver T.Y. Hilton.
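A two-segment fit can be sketched in Python: try each interior frame as the pivot, score the split by total point-to-segment distance (the precision error), and derive the stem/branch features from the winning split. This brute-force search is an illustrative stand-in for the thesis's regression procedure, and the route coordinates are synthetic.

```python
import math

def _point_to_segment(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def fit_route(points):
    """Two-segment route fit. Tries every interior frame as the pivot and keeps
    the split with the lowest precision error; returns stem/branch headings
    (degrees), frame counts, distances (yards), and the precision error."""
    def error(k):
        stem = sum(_point_to_segment(p, points[0], points[k]) for p in points[:k + 1])
        branch = sum(_point_to_segment(p, points[k], points[-1]) for p in points[k:])
        return stem + branch
    pivot = min(range(1, len(points) - 1), key=error)
    def segment(a, b, frames):
        return {"heading": math.degrees(math.atan2(b[1] - a[1], b[0] - a[0])),
                "frames": frames,
                "distance": math.hypot(b[0] - a[0], b[1] - a[1])}
    return {"stem": segment(points[0], points[pivot], pivot),
            "branch": segment(points[pivot], points[-1], len(points) - 1 - pivot),
            "precision_error": error(pivot)}

# Synthetic post-like route: straight upfield, then a 45-degree break inside
route = [(0, 0), (0, 3), (0, 6), (0, 9), (0, 12), (-2, 14), (-4, 16), (-6, 18)]
fit = fit_route(route)
print(fit["stem"]["heading"], fit["branch"]["heading"])  # approx 90.0 and 135.0
```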


[Figure omitted: two panels plotting observed route points and fitted segments (distances in yards) for two #8 (post) routes by player #13; play ids 2014110300_0334 and 2014100900_3341.]

Figure 21. Segmented Euclidean Regression of Two Distinct #8 (Post) Routes

Each route is explicitly classified by its stem angle (heading), stem length, branch angle, and branch length. Euclidean "precision" error is defined as:

PE = Σ_{i=1..n} sqrt( (x2_i - x1_i)² + (y2_i - y1_i)² )

where:
(x1, y1) = observed coordinate (represented by a black dot)
(x2, y2) = estimated coordinate (represented by the nearest point on the red line)

Associated play features for each example route are outlined below.

play id = 2014100900_3341          play id = 2014110300_0334
stem_heading = 94.4                stem_heading = 91.9
stem_frames = 9                    stem_frames = 9
stem_distance = 20.59              stem_distance = 22.99
branch_heading = 59.06             branch_heading = 119.05
branch_frames = 4                  branch_frames = 3
branch_distance = 12.79            branch_distance = 9.75
precision_error = 4.54             precision_error = 2.11


Routes with greater complexity can be classified using additional techniques. For example, these methods can be expanded to a three-segment Euclidean regression. Figure 22 displays drag routes that would be difficult to classify using a simple two-segment regression. That is, because these routes possess a high precision error, an additional spline segment is needed.


Figure 22. Three Segment Euclidean Regression of Multiple Drag Routes

As every route run is classified using these methods, Figure 23 displays a full play with four receiver routes. Note: dashed black lines depict the observed route paths, while red lines display the fitted routes.


[Figure omitted: full play diagram (play id 20141228110404) with observed and fitted receiver routes.]

Figure 23. Full Play Displaying Actual and Estimated Receiver Routes

These features are then imported into a supervised learning algorithm to identify route type. Specifically, several classifiers (Random forest, SVM, K-NN, and Naive Bayes) were trained using synthetic data created by the EA Sports Madden engine and tested against the actual Colts 2014 data. Figure 24 displays the confusion matrix of the most accurate model.
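As an illustration of the supervised step, a minimal k-nearest-neighbors classifier (one of the four model families named above) can be written directly over the four route features; the training rows are toy values in the spirit of the Madden-derived set, not actual data.

```python
import numpy as np

def knn_route(train_features, train_labels, query, k=3):
    """Predict a route label as the majority vote of the k nearest training
    routes in feature space: [stem heading, stem distance, branch heading,
    branch distance]."""
    X = np.asarray(train_features, dtype=float)
    distances = np.linalg.norm(X - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return str(labels[counts.argmax()])

# Toy training routes: [stem_heading, stem_distance, branch_heading, branch_distance]
X = [[90, 20, 120, 10], [88, 19, 110, 9],   # post: break toward the middle
     [90, 28, 90, 8],   [88, 30, 91, 6],    # go: continue upfield
     [90, 10, 0, 6],    [91, 12, 180, 5]]   # out: break toward the boundary
y = ["post", "post", "go", "go", "out", "out"]

print(knn_route(X, y, [91, 21, 115, 11]))  # post
```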


[Figure omitted: normalized confusion matrix (accuracy = 68.0%) over true versus predicted route labels arrow, slant, out, dig, corner, post, and go; color scale 0.0-1.0.]

Figure 24. Confusion Matrix of Receiver Route Prediction Model

While a 68.0% accurate classifier leaves room for improvement, one important observation emerges from the confusion matrix: while the classifier is deemed "somewhat accurate" (68.0%) in its predictions, when it misses, it misses to the most similar route type. For example, the classifier predicts an "out" route as one of three types: arrow, out, or corner, which are all similar in that they are receiver routes with a heading toward the near boundary (away from the QB). Refer to Figure 25 for this distribution. Furthermore, notice the significant prediction error between #8 (post) and #9 (go) routes. As many deep concepts possess an option between the #8 and #9 routes (depending on the defensive alignment), this prediction error is not as critical. For example, against a "MOFO" (middle of field open) coverage type such as Cover 0 or Cover 2, a receiver may option his route to the middle of the field as a post, as opposed to continuing on his go route. (Brown, 2015)


Figure 25. Prediction Distribution of True #3 (Out) Routes

Admittedly, there are several flaws with this approach: first, the training was performed using a synthetic data set; second, this model used the simplified two-segment Euclidean regression classification method; finally, the training and testing sample sizes are relatively small, thus increasing the margin for error. However, the root methodology provides much promise on the road to machine classification of play types.

3.4.3. Play Concept Classification and Similarity

Features for each play are distilled from the spatiotemporal data. For example, formation, route types, QB drop-back depth, and blocking schemes are identified and quantitatively characterized. Utilizing a Kohonen network (a technique similar to that found in Sec. 2.2 of this thesis), the features were collected and inserted into the neural network to group plays by similarity. Figure 26 displays the clustering count results of the 2014 Indianapolis Colts base offense.
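A compact Kohonen (self-organizing map) sketch in Python shows the mechanics: grid nodes hold weight vectors, and the best-matching node and its neighbors are pulled toward each play's feature vector as the learning rate and neighborhood shrink. The grid size, decay schedule, and the two toy play features are illustrative assumptions; the thesis does not specify its network configuration.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small self-organizing map; returns node weights and grid coords."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    rows, cols = grid
    weights = rng.random((rows * cols, data.shape[1]))
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr, sigma = lr0 * decay, sigma0 * decay + 1e-3
        for x in data[rng.permutation(len(data))]:
            bmu = int(np.argmin(((weights - x) ** 2).sum(axis=1)))  # best-matching unit
            grid_d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            pull = np.exp(-grid_d2 / (2.0 * sigma ** 2))            # neighborhood weight
            weights += lr * pull[:, None] * (x - weights)
    return weights, coords

def map_to_nodes(features, weights):
    """Best-matching node index for each play's feature vector."""
    f = np.asarray(features, dtype=float)
    return ((f[:, None, :] - weights[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)

# Toy play features, e.g. [normalized QB drop depth, normalized mean route depth]
plays = np.array([[0.10, 0.10], [0.15, 0.12], [0.90, 0.85], [0.88, 0.90]])
weights, coords = train_som(plays)
nodes = map_to_nodes(plays, weights)
# Similar plays land on the same or nearby nodes; dissimilar plays land far apart
```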



Figure 26. Kohonen Map of 231 Offensive Plays

Through the development process of the Kohonen network, the author chose to cluster play types into seven distinct groups (defined by the seven colors). This is accomplished by minimizing the Euclidean distances between all nodes (individual circles). Figure 27 displays the grouped plays with a short description of their play type.

[Figure content: cluster tags include short/intermediate routes w/ run action; screens; intermediate & deep route concepts; spread, short routes; multiple routes, deep; and max protect, deep routes.]

Figure 27. Kohonen Map Displaying Seven (qty. 7) Clusters & Associated Tags


After clustering, play outcomes within each cluster are analyzed to understand successful versus non-effective play types. From Table 11, it is evident that certain play types are more effective than others. For example, max protection deep (i.e. green) play concepts yield the highest yards per completion with a lower than average completion percentage whereas screens (black) possess high completion percentages with the lowest expected yardage. While these two examples depict the extremes (short vs. deep concepts), this analysis will most benefit the intermediate play types. That is, blue-type plays should be further investigated via video to understand what leads to their success.

Cluster Type            Comp.   Plays   Yards   Yds/Play   Comp. %   Yds/Comp   INT
Max Protect, Deep         23      45      494     11.0       51.1      21.5      1
Intermediate              24      34      236      6.9       70.6       9.8      0
Multiple Routes, Deep     15      21      263     12.5       71.4      17.5      1
Short/Intermediate        32      52      391      7.5       61.5      12.2      0
Intermediate/Deep         22      36      254      7.1       61.1      11.5      3
Short/Intermediate        12      20      180      9.0       60.0      15.0      2
Screens                   20      23       93      4.0       87.0       4.7      0
All                      148     231     1911      8.3       64.1      12.9      7

Table 11. Outcome Analysis of Seven Play Types

3.5. Incorporating Analyses Into NFL Decision-Making Processes

As technology in the NFL continues to evolve, the author recommends a similar central system to convey the output of these technically deep analyses. Significant care must be taken to ensure that A) appropriate analyses are conducted and B) the results of each analysis are conveyed clearly and efficiently to team decision-makers. Just as in an MLB organization, NFL coaches and front office personnel operate with very distinct responsibilities.


3.5.1. NFL Organizational Structure

Similar to the MLB organizational structures, NFL teams possess scouting and operations verticals beneath the General Manager. However, as of the date of this research, only 14 of the NFL's 32 teams possess an "analytics" department, with many consisting of a single employee. Thus, with limited technical (mathematical, software) expertise, appropriate verification of any quantitative analysis is critical. While many other titles exist, such as Vice President, Scout, and Coordinator, the structure presented in Figure 28 provides a simplified structure with which to integrate machine learning and other geospatial analyses into the organization's decision-making process.

Owner / Chairman
  General Manager
    Director, Pro Personnel
    Director, College Scouting
    Director, Research & Development
      Systems
      Quantitative Analysis
    Director, Football Operations

Figure 28. Simplified NFL Team Football Operations Organizational Structure

Members within the research department must maintain the analytical skills to conduct the required quantitative analysis while also possessing the soft skills necessary to effectively communicate to broad football audiences. While specific knowledge of minute details is not necessary, a solid football foundation shall include knowledge of play concepts, coverage types, blocking schemes, and more. If both hard and soft skills exist within the R&D department, the organization will maximize value and gain significant leverage from spatiotemporal data and other important information sets.


Chapter 4. Conclusion

In summary, this research has provided several examples of machine learning analyses of spatiotemporal data in Major League Baseball and the National Football League. Although many MLB organizations are a decade deep into using analytical and quantitative methods to improve decision-making, NFL teams lag behind. While all teams should look to maximize value from data, this shall be carried out responsibly. Specifically, primary takeaways shall be distilled from any research conducted and presented to organizational decision makers in an efficient and straightforward manner. If conducted and communicated properly, NFL organizations will begin to incorporate more analytical decision-making processes, just as their older MLB brethren did more than a decade ago.


Chapter 5. References

Belichick, Steve. 1962. Football Scouting Methods. Ronald Sports Library.

Brown, Chris B. 2015. The Art of Smart Football. Chris B. Brown.

Hochstedler, Jeremy. "Performance vs. Production." DiamondChartsLLC.com. 24 April 2013. Web. Accessed 1 March 2016.

Hochstedler, Jeremy, and Kellen Hurst. "Telemetry Sports: Bringing Power to Data." TechCrunch & NFL's 1st and Future Startup Competition. 6 Feb 2016. Stanford University.

Hochstedler, Jeremy. 2016. "Finding the Open Receiver: A Quantitative Geospatial Analysis of Quarterback Decision-Making." MIT Sloan Sports Analytics Conference.

Johnson, Derek. 2013. The Complete Guide to Pitching. Human Kinetics.

Lindbergh, Ben, and Rob Arthur. "Statheads Are The Best Free Agent Bargains In Baseball." FiveThirtyEight.com. 26 April 2016. Web. Accessed 27 April 2016.

Mayne, Brent. 2008. The Art of Catching. Cleanline Books.

Mrkvicka, Neil, and J. Hochstedler. 2016. A subset of research first published in "Finding the Open Receiver: A Quantitative Geospatial Analysis of Quarterback Decision-Making." MIT Sloan Sports Analytics Conference.

Paine, Neil, and Rob Arthur. "Is 2015 The Year Baseball's Projections Failed?" FiveThirtyEight.com. 7 August 2015. Web. Accessed 1 March 2016.

Passan, Jeff. 2016. The Arm: Inside the Billion-Dollar Mystery of the Most Valuable Commodity in Sports. Harper-Collins Publishers.

Polian, Bill, and Vic Carucci. 2014. The Game Plan. Triumph Books.

Python Software Foundation. Python Language Reference, version 2.7. Available at http://www.python.org. Packages include: sklearn, matplotlib, numpy, scipy, & pymongo.

Swartz, Matt. "Testing Projections for 2011." Fangraphs.com. 9 February 2012. Web. Accessed 1 March 2016.

Tango, Tom. "Testing the 2007-2010 Forecasting Systems - Official Results." InsideTheBook.com. 15 February 2011. Web. Accessed 1 March 2016.


Wyers, Colin. "Reintroducing PECOTA." BaseballProspectus.com. 7 February 2011. Web. Accessed 1 March 2016.

Zebra Technologies. "Partners in Innovation." Zebra.com. Web. Accessed December 2015.


Chapter 6. Acknowledgements

This research is derived from the instruction and inspiration of many, but first I commend those who made portions of this research possible. Thank you to MIT, TechCrunch, the NFL, Stanford University, EA Sports, Katy Mrkvicka, Neil Mrkvicka, Troy Mrkvicka, and Craig Mrkvicka. Additionally, I would like to give special recognition to those closest to me for their guidance and assistance to date.

To those who have instructed: thank you.

* My first coaches, Dad and Jerry Hampton, my high school coach Dennis Kas, and my college coaches Sean Bendel and Jeff Jenkins taught me to compete.
* From Steve Sotir and many college coaches at the ABCA, I learned how to teach.
* Rick Espeset and Dan Sprunger welcomed me into - and taught me the meaning of - a program.
* Through many late-night discussions with colleagues Paul Gagnon, Zach Cardwell, Caleb Eiler, and Tyler Foxworthy, I was able to expand my technical abilities.
* To all the former players I have coached: I assure you I learned more from you than you did from me.
* Various front office personnel in both MLB and the NFL have guided portions of my development as an analyst.

Now, a hat tip to those who have inspired.

* To my college roommates and teammates Alex Decker, Kellen Hurst, Adam Knaack, Jimmy Murray, Matt Salisbury, and others: without you, my passion for sports wouldn't exist.
* The early believers in our ability to produce quality data analyses in sports, specifically Sean Bendel, Rob Smith, Adam Revellete, and Stu Fritz, provided the optimism to continue on this path.
* For those few doubters (whether by actions or words), your fuel played a small part in keeping this storm rolling.
* To my parents, sister, Sam, and others whose actions I observe: I cherish your teachings, both spoken and non-verbal.
* Finally, to my wife Amy and Grady, Tyler, and Molly: thank you, for everything; I love you.
