Does Pitch Type - Zone Uncertainty Matter to a Pitcher’s Performance?
New Physics: Sae Mulli, Vol. 68, No. 6, June 2018, pp. 624∼629
http://dx.doi.org/10.3938/NPSM.68.624

Hyunuk Kim
Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang 37673, Korea

Woo-Sung Jung∗
Department of Industrial and Management Engineering & Department of Physics, Pohang University of Science and Technology, Pohang 37673, Korea

(Received 20 February 2018 : revised 30 March 2018 : accepted 1 May 2018)

∗E-mail: [email protected]

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Baseball is a game of numbers. Team managers use large-scale baseball data in their decision making. Recent studies with PITCHf/x, a system that tracks every pitch, provide new insights into the role of the pitch-type sequence in a pitcher’s performance. These studies are based on the assumption that pitch-type uncertainty puts the hitter at a disadvantage. However, pitch-zone uncertainty, another factor of pitching uncertainty, is neglected in many cases. Here, we introduce the normalized mutual information of pitch type and pitch zone as an indicator of pitching uncertainty. A pitcher with a smaller repertoire of type-zone combinations has high normalized mutual information in pitching. We calculate the pitch type-zone uncertainties of Major League Baseball starting pitchers and compare the results with field independent pitching, a metric of a pitcher’s performance. Our analysis shows that normalized mutual information is uncorrelated with performance in nine distinct subgroups extracted from revealed comparative advantage in pitch type. This result underlines the importance of a pitcher’s repertoire and ability for being competitive in professional baseball.

PACS numbers: 89.70.Cf, 89.75.-k
Keywords: Pitching uncertainty, Normalized mutual information, Major league baseball, Starting pitcher, Revealed comparative advantage

I. INTRODUCTION

The sports industry has grown with massive interest in players and teams. It is worth hundreds of billions of dollars, covering everything from commercial products to sports events [1]. Along with this trend, sports analytics has become more important for efficient decision making and for analyzing an individual player’s movement to predict future performance. Publicly available sports databases attract researchers to develop new approaches based on machine learning algorithms [2], complex networks [3,4], and Bayesian statistics [5].

Baseball is one of the popular sports for quantitative analysis. Sabermetrics, a field that examines game records and statistics, gained worldwide popularity through the book Moneyball: The Art of Winning an Unfair Game [6]. Sabermetric indicators for baseball players are used in practice and successfully describe a player’s style even though they consist of simple equations [7].

Cutting-edge camera, sensing, and visualization technologies have recently offered a great variety of resources for collecting information on game play. Since 2008, PITCHf/x, a system that tracks every pitch, has been exploited to investigate game events in detail (it was upgraded to “Statcast” with new radar and video technologies in 2015) [8]. The system infers pitch types from ball movement. High-resolution PITCHf/x data capture changes in pitching mechanisms [9,10] and help to estimate a player’s abilities: framing [11], hitting [12,13], and the quality of pitches [14]. PITCHf/x data also disclose the role of pitching uncertainty in a pitcher’s performance. Discussions of pitching uncertainty rely on the assumption that a pitcher’s uncertainty puzzles the hitter at bat. A pitcher’s performance, including the strikeout rate, is explained by pitch-type uncertainty [15] and pitch-type sequence [16–20].

Besides pitch type, deception and pitch zone are also important factors of uncertainty [21], but they are excluded in many studies. Here we examine pitching uncertainty by considering type and zone simultaneously through the concept of mutual information. Over ten seasons (2008 - 2017) of Major League Baseball (MLB), we compare the pitch type-zone uncertainty of starting pitchers with their performance to check whether uncertainty matters. The correlation between pitch type-zone uncertainty and performance is not significant in nine subgroups of pitchers clustered by revealed comparative advantage in pitch type. Our contribution is to suggest an information-theoretic indicator of pitch type-zone uncertainty and to confirm the importance of a powerful pitching repertoire for being competitive in a professional baseball league.

Fig. 1. (Color online) Classification of starting pitchers in the 2017 season, extracted by revealed comparative advantage in pitch type. Links whose correlation is higher than 0.74 are included in the figure. These clusters are consistent over ten seasons from 2008 to 2017. The figure is drawn with Gephi [24].

II. DATA AND STARTING PITCHER CLASSIFICATION

1. Data

We collect PITCHf/x data for ten seasons from 2008 to 2017 with the aid of the R package “baseballr” [22], a tool for automatic Major League Baseball data collection. The PITCHf/x data consist of 79 data fields such as game date, pitch type, release speed, zone, and event description. Our unit of analysis is the starting pitcher who was the first pitcher in more than 10 regular-season games and threw more than 1,000 pitches (excluding unidentified balls, intentional balls, and pitchouts) in a season, as we need enough samples to estimate pitching uncertainty. The number of starting pitchers of interest ranges from 166 to 184. Their pitching performance is downloaded from FanGraphs (https://www.fangraphs.com/).

2. Classifying starting pitchers

We minimize the effect of different pitching styles on the uncertainty analysis by classifying pitchers into subgroups by pitch type. Each subgroup corresponds to a starting pitcher’s ace pitch, the one that is effective throughout the season. The Pearson correlation of revealed comparative advantage (RCA) [23] identifies the pitch-type subgroups clearly (Fig. 1). RCA is calculated as in Eq. (1), where $A_{ij}$ is the frequency of pitch type $j$ for player $i$.

$$ \mathrm{RCA}_{ij} = \frac{A_{ij} \sum_{ij} A_{ij}}{\sum_{j} A_{ij} \, \sum_{i} A_{ij}} \tag{1} $$

Nine subgroups, Changeup (CH), Curveball (CU), Cutter (FC), Four-seam Fastball (FF), Splitter (FS), Two-seam Fastball (FT), Knuckle-curve (KC), Sinker (SI), and Slider (SL), are consistent for ten seasons (Table 1).

Table 1. The number of pitchers who throw more than 1,000 pitches as a starting pitcher in more than 10 regular-season games.
Number of starting pitchers
Group                     2008  2009  2010  2011  2012  2013  2014  2015  2016  2017
Total                      164   176   173   169   177   174   177   180   173   184
Changeup (CH)               11    13    11     9     9    15    11    16    11    17
Curveball (CU)              18    20    15    18    19    18    21    21    21    26
Cutter (FC)                 26    26    29    27    25    24    21    26    25    24
Four-seam Fastball (FF)     14    12     9    12    13    13     8    11    13    11
Splitter (FS)               15    15    16    15    14    13    15    15    14    14
Two-seam Fastball (FT)      31    30    29    28    29    23    24    22    24    23
Knuckle-curve (KC)           6     9    14    16    18    22    23    22    16    17
Sinker (SI)                 28    31    30    23    30    27    25    25    18    23
Slider (SL)                 13    19    17    19    19    17    27    20    29    28
Others (KN, EP, SC)          2     1     3     2     1     2     2     2     2     1

Each subgroup has a sufficient number of players for correlation analysis. Minor pitch types such as Knuckleball (KN), Eephus (EP), and Screwball (SC) are classified as Others.

III. PITCH TYPE-ZONE UNCERTAINTY

The mutual information of two variables measures the amount of information shared between them [Eq. (2)]. Two variables are independent when their mutual information is zero. We take pitch type and zone as the variables in normalized mutual information (NMI), which is calculated as in Eq. (3) for variables $X$ and $Y$ with entropies $H(X)$ and $H(Y)$, to measure pitching uncertainty.

$$ I(X;Y) = \sum_{x \in X,\, y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \tag{2} $$

$$ \mathrm{NMI}(X;Y) = \frac{I(X;Y)}{\sqrt{H(X)\,H(Y)}} \tag{3} $$

The pitching uncertainty indicator summarizes a pitcher’s repertoire in a comprehensive way. For instance, Hyun-Jin Ryu, a Korean starting pitcher in MLB, frequently throws changeups to zone 14 (Fig. 2), the bottom-right zone from the catcher’s perspective. His four-seam fastball (FF) is distributed throughout the entire zone, while his off-speed pitches are often located at the boundary of the strike zone (zone indices 11-14). A pitching profile like Fig. 2 is inefficient for comparing the uncertainty of starting pitchers. NMI gives a simple metric that can be used directly in sabermetric analysis. Hyun-Jin Ryu’s NMI in 2017 is about 0.075, slightly higher than the average NMI in the same year, 0.070.

Fig. 2. (Color online) Hyun-Jin Ryu’s pitching profile in 2017 with respect to zone and type. The X axis is the zone index and the Y axis is the pitch type. The value in a cell is the probability of throwing that type-zone combination. His pitch type-zone uncertainty, the normalized mutual information of pitch type and zone, is about 0.075.

Our subsequent question is whether a predictable pitcher performs worse than unpredictable ones. We compare NMI with field independent pitching (FIP), a measure of a pitcher’s performance that removes defensive factors such as fielding errors. FIP depends solely on the pitcher’s ability, as it is a linear combination of the numbers of home runs, unintentional walks, hit-by-pitches, and strikeouts. FIP is recognized as a better measure of a pitcher’s performance than earned run average (ERA).
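As an illustration of Eqs. (2) and (3), the following Python sketch estimates the pitch type-zone NMI of one pitcher from a list of (pitch type, zone) pairs. It is a minimal sketch under stated assumptions, not the procedure used in this paper: the probabilities are taken to be plain empirical frequencies, the function names `entropy` and `type_zone_nmi` are hypothetical, and returning 0 when one of the entropies vanishes is a convention chosen here. Because the logarithm appears in both the numerator and the denominator, the result does not depend on the base of the logarithm.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (natural log) of a distribution given as counts."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def type_zone_nmi(pitches):
    """Plug-in estimate of NMI(type; zone) from (pitch_type, zone) pairs.

    Assumption: probabilities are empirical frequencies of the pitches.
    """
    n = len(pitches)
    if n == 0:
        raise ValueError("at least one pitch is required")

    joint = Counter(pitches)                 # counts of (type, zone) cells
    types = Counter(t for t, _ in pitches)   # marginal counts of pitch type
    zones = Counter(z for _, z in pitches)   # marginal counts of zone

    # Eq. (2): p(x,y) / (p(x) p(y)) equals c * n / (c_type * c_zone).
    # Cells that never occur are absent from `joint` and contribute nothing.
    mi = sum(
        (c / n) * math.log(c * n / (types[t] * zones[z]))
        for (t, z), c in joint.items()
    )

    h_type, h_zone = entropy(types, n), entropy(zones, n)
    if h_type == 0.0 or h_zone == 0.0:
        # Assumed convention: a single pitch type or a single zone carries
        # no shared information, so the NMI is reported as 0.
        return 0.0

    # Eq. (3): normalize by the geometric mean of the two entropies.
    return mi / math.sqrt(h_type * h_zone)


# Type and zone chosen independently: every type goes to every zone equally.
independent = [(t, z) for t in ("FF", "CH") for z in (1, 14)]
# Type fully determines zone: the most predictable type-zone repertoire.
dependent = [("FF", 1)] * 5 + [("CH", 14)] * 5

print(round(type_zone_nmi(independent), 6))  # 0.0
print(round(type_zone_nmi(dependent), 6))    # 1.0
```

The two toy inputs mark the ends of the scale: a pitcher whose zone choice is unrelated to the pitch type sits at 0, and a pitcher who always pairs each type with one zone sits at 1.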