A Data-driven Method for In-game Decision Making in MLB When to Pull a Starting Pitcher Gartheeban Ganeshapillai John Guttag Massachusetts Institute of Technology Massachusetts Institute of Technology 77 Massachusetts Avenue, 77 Massachusetts Avenue, Cambridge, MA 02139 Cambridge, MA 02139
[email protected] [email protected] ABSTRACT 1. INTRODUCTION Professional sports is a roughly $500 billion dollar industry Perhaps the most important in-game decision a baseball that is increasingly data-driven. In this paper we show how manager makes is when to relieve a starting pitcher [2]. To- machine learning can be applied to generate a model that day, managers rely on various heuristics (e.g., pitch count) could lead to better on-field decisions by managers of pro- to decide when a starting pitcher should be relieved [2, 17]. fessional baseball teams. Specifically we show how to use In this paper, we derive from data a model that can be used regularized linear regression to learn pitcher-specific predic- to assist in making these decisions in a principled way. tive models that can be used to help decide when a starting We pose the problem as classification problem of predict- pitcher should be replaced. A key step in the process is our ing whether a starting pitcher would give up at least one run method of converting categorical variables (e.g., the venue if allowed to start the next inning. (This formulation seems in which a game is played) into continuous variables suit- appropriate late in close games, but is probably not the right able for the regression.