A Data-Driven Method for In-Game Decision Making in MLB
Total Page:16
File Type:pdf, Size:1020Kb
A Data-driven Method for In-game Decision Making in MLB When to Pull a Starting Pitcher Gartheeban Ganeshapillai John Guttag Massachusetts Institute of Technology Massachusetts Institute of Technology 77 Massachusetts Avenue, 77 Massachusetts Avenue, Cambridge, MA 02139 Cambridge, MA 02139 [email protected] [email protected] ABSTRACT 1. INTRODUCTION Professional sports is a roughly $500 billion dollar industry Perhaps the most important in-game decision a baseball that is increasingly data-driven. In this paper we show how manager makes is when to relieve a starting pitcher [2]. To- machine learning can be applied to generate a model that day, managers rely on various heuristics (e.g., pitch count) could lead to better on-field decisions by managers of pro- to decide when a starting pitcher should be relieved [2, 17]. fessional baseball teams. Specifically we show how to use In this paper, we derive from data a model that can be used regularized linear regression to learn pitcher-specific predic- to assist in making these decisions in a principled way. tive models that can be used to help decide when a starting We pose the problem as classification problem of predict- pitcher should be replaced. A key step in the process is our ing whether a starting pitcher would give up at least one run method of converting categorical variables (e.g., the venue if allowed to start the next inning. (This formulation seems in which a game is played) into continuous variables suit- appropriate late in close games, but is probably not the right able for the regression. Another key step is dealing with formulation early in a game or during a blowout.) Our situations in which there is an insufficient amount of data method uses information about the current at-bat, game to compute measures such as the effectiveness of a pitcher situation, and historical data. against specific batters. First, we reformulate our problem into a regression prob- For each season we trained on the first 80% of the games, lem using Pitcher’s Total Bases (PTB) [9] instead of runs and tested on the rest. The results suggest that using our allowed as the dependent variable because PTB is generally model could have led to better decisions than those made accepted as a better indicator of pitcher effectiveness. We by major league managers. Applying our model would have solve the regression problem using regularized least squares led to a different decision 48% of the time. For those games to produce a pitcher-specific predictor for the expected PTB in which a manager left a pitcher in that our model would in the next inning. The main technical interest of this work have removed, the pitcher ended up performing poorly 60% lies in our approach to constructing the variables used in the of the time. regularized regression. Categories and Subject Descriptors • Many of the important predictors that provide con- textual information such as the next batter, the venue I.2 [Artificial Intelligence]: Applications and Expert Sys- in which the game is played, etc., are categorical vari- tems; G.3 [Probability and Statistics]: Multivariate statis- ables. We convert these categorical variables into con- tics, Time series analysis tinuous variables by deriving associations between these variables and various statistics in the historical data. General Terms For example, the batting team is represented by prior statistics such as the average number of hits against Machine Learning Applications the current pitcher. We call these statistical associa- tions priors. Keywords • MLB, Predictive Modeling In those cases where the amount of data available to derive an association is small, we shrink the prior using global averages to prevent overfitting. We evaluate our method on the MLB 2006-2010 data set 1 from STATS Inc. , which contains a record of each pitch Permission to make digital or hard copies of all or part of this work for personal or thrown in MLB games in both the regular and post seasons. classroom use is granted without fee provided that copies are not made or distributed We train our model using the first 80% of the games of each for profit or commercial advantage and that copies bear this notice and the full cita- season and test it on the last 20%. tion on the first page. Copyrights for components of this work owned by others than We evaluate our model relative to what actually occurred ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- publish, to post on servers or to redistribute to lists, requires prior specific permission in the games in the test set. First, we learned a model that and/or a fee. Request permissions from [email protected]. closely reflects the actual decisions made by MLB managers. KDD’13, August 11–14, 2013, Chicago, Illinois, USA. 1 Copyright 2013 ACM 978-1-4503-2174-7/13/08 ...$15.00. http://www.stats.com/baseball.asp 973 It is well known that managers rely heavily on pitch count in 1986 [10]. Since then there have been many commercial and the opposing team’s scoring to determine when to re- systems developed to model the progression of players[1, 11, lieve the starting pitcher [2, 17, 12]. This manager model 3, 13, 15]. learns pitcher-specific parameters that fit these two variables Other than [6], which described a method for predicting against the manager’s decisions on the training data. the next type of pitch, almost all of the existing baseball sta- For the 76 pitchers who pitched at least 500 pitches in each tistical methods address high level problems that span multi- year, our manager model accurately models the manager’s ple games: projecting a player’s performance over the years, decision in 95%(±1.2%) of the innings, i.e., it is a reasonably evaluating a player’s contributions to the wins and losses, accurate model of what major league managers actually do. and optimizing a team budget. In contrast, we present a When the manager model is used to predict whether or not machine learning method to assist decision making during a run will be scored, it correctly predicts 75%(±4.6%) of the a game. In particular we present a method for producing innings. In contrast, our model makes a correct prediction predictive models that can be used to help decide whether for 81%(±4.9%) of the innings. Our model also outperforms a pitcher should be replaced at various points in the game. the manager model in F-score (0.41 vs 0.26) and odds ratio Pitchers are divided into two categories, starters (the first (3.2vs1.2). pitcher for each team) and relievers (all subsequent pitch- Second, we demonstrate that applying our model would ers). Starters rarely pitch the complete game, and deciding have led to a different decision 48% of the time after the when to replace the starter is often the most important in- fourth inning in close games. For those close game situa- game decision that a manager has to make. Early in the tions in which a manager actually left a pitcher in that the game the decision is based upon many factors, including the model would have removed, 60% of the time the pitcher sur- effectiveness of the starter and the managers desire to con- rendered a run. Unfortunately, it is impossible to say what serve the bullpen. Later in the game, the decision is based would have happened in those situations where a manager primarily on the manager’s estimate of the relative expected removed a pitcher that the model would have allowed to effectiveness of the starter and the pitcher who would replace continue, since we don’t know whether or not the removed him. Our work is designed to provide useful information in pitcher would have given up a run had he not been removed. the second half of close games. In Section 2 we present some background on baseball and related work. In Section 3 we describe our method. In Sec- 3. METHOD tion 4 we discuss the measures of performance used to eval- We first build a pitcher-specific predictor for the expected uate our method, and present the results of a series of tests PTB in the next inning. We then use this prediction to build in which comparisons are made using several performance a classifier that predcits whether a run will be given in the measures. Section 5 discusses the real world applications next inning. of our method. In Section 6 we discuss the limitations and outline future work. Section 7 provides a conclusion. 3.1 Features The independent variable x of our model incorporates the 2. RELATED WORK following groups of information available to a manager at Baseball was one of the first sports to attract statistical the time a decision has to be made: analysis. As a sport, it has all the necessary characteris- • Current game statistics: team scores, outs, inning num- tics for performing a quantitative analysis: a rich dataset ber, and pitch count, with detailed records for several decades, ordered game pro- gression that allows activity to be nicely segmented, and a • Averages of all previous innings in the season: strike reasonable amount of independence between events. This and ball count, base codes (bases advanced), home has led to the development of many statistical methods for runs, hits in play, walks (intentional walk discounted), assessing individual baseball players over the years [14, 1, steals, strike outs, and the distributions of pitch veloc- 11, 3]. ity and zones, Bill James used statistical data in the 1980s to analyze why teams win and lose.