An Exploration of the First Pitch in Baseball
AN EXPLORATION OF THE FIRST PITCH IN BASEBALL

Ashley Spangler

A Thesis Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

May 2017

Committee: James Albert, Advisor; Christopher Rump; John Chen

© 2017 Ashley Spangler. All Rights Reserved.

ABSTRACT

James Albert, Advisor

Sabermetrics is the statistical analysis of baseball. This research began in the 1950s and has since become increasingly popular. In recent years, the availability of data within baseball has exploded, and three main sources now provide access to a vast array of statistics. This research investigates the importance of the count and the first pitch in baseball. The first pitch determines whether the hitter or the pitcher has the advantage in the at-bat and can set the precedent for the rest of the at-bat. Exploratory methods, including tables, contour plots, scatterplots, and line graphs, are used to investigate and summarize the relationships among variables.

As a pitcher's first-pitch strike percentage increases, the number of innings pitched per game increases, Walks plus Hits per Inning Pitched (WHIP) decreases, walk percentage decreases, and strikeout percentage increases. Sixty-four percent of first pitches are four-seam fastballs, two-seam fastballs, or sliders, all of which are fast pitches. Over 50% of first pitches are in the strike zone. Singles, doubles, triples, and home runs are more likely to be hit on the first pitch. Pitchers have their best statistics when the hitter swings and misses on the first pitch, compared with putting the ball in play, taking a called strike, or taking a ball. Hitters have their best statistics when the first pitch is a ball.

Generalized Additive Models (GAM) and Logistic Regression Models are used to discover the factors significant in predicting the probability that hitters swing.
Logistic models were created for all pitches and then for first pitches only, across all players. Next, four logistic models were created for four individual players. In the majority of the models, count type (whether the count favored the pitcher, favored the hitter, or was neutral), the distance in feet of the pitch from the center of the strike zone, and whether runners were on base were significant in predicting the probability of swinging. Overall, the results suggest that hitters follow different strategies and that swinging, on the first pitch or in general, depends on the hitter. More mature hitters tend not to swing on the first pitch.

In honor of you, Dad.

ACKNOWLEDGMENTS

I would like to express my deepest appreciation to my advisor, Dr. James Albert, for his overwhelming support, assistance, and encouragement throughout the whole project. I would not have been able to get through all the struggles and frustrations without his patience and extensive baseball knowledge. I would also like to thank Dr. Christopher Rump and Dr. John Chen for serving on my committee.

Thanks to my fellow Bowling Green State University graduate students and professors. I would not have been able to get through graduate school without all of your support, advice, and knowledge.

Finally, I must express my very profound gratitude to my family, fiancé, and mom for providing me with unfailing support and continuous encouragement throughout my years of school and through the process of researching and writing this thesis. This accomplishment would not have been possible without them.

TABLE OF CONTENTS

SECTION 1: INTRODUCTION
  1.1 Description of the Game of Baseball
  1.2 Connection between Statistics and Baseball
  1.3 Availability of Baseball Data
SECTION 2: IMPORTANCE OF PITCH COUNT AND THE FIRST PITCH
  2.1 Previous Work
  2.2 Different Philosophies for Hitters and Pitchers
SECTION 3: RESEARCH DESIGN
  3.1 Research Questions
  3.2 Methodological Design
SECTION 4: EXPLORATORY WORK OF THE FIRST PITCH
  4.1 Importance of the First Pitch Example
  4.2 Characteristics of the First Pitch
    4.2.1 Percentage of First Pitch Strikes Thrown and Swing Rate of First Pitches
    4.2.2 Pitch Type Classification
    4.2.3 Pitch Type Thrown on the First Pitch
    4.2.4 Percentage of Strikeouts and Walks for Plate Appearances Passing Through 0-1 vs 1-0 Count
  4.3 Pitch Location by Stance and Count
  4.4 Hit Quality
SECTION 5: FIRST PITCH-HITTER PERSPECTIVE
  5.1 Batting Average
  5.2 On-Base Percentage
  5.3 Slugging Percentage
  5.4 On-Base Plus Slugging
SECTION 6: FIRST PITCH-PITCHER PERSPECTIVE
  6.1 Batting Average on Balls in Play
  6.2 Left On-Base Percentage
  6.3 Walks per Hits per Innings Pitched
  6.4 Earned Runs Average
  6.5 Fielding Independent Pitching
SECTION 7: SWING VS NO SWING
  7.1 Logistic Regression Modeling
    7.1.1 Probability of Swinging Given the Count
    7.1.2 Probability of Swinging Given the Placement of the Runners
    7.1.3 Probability of Swinging Given the Position of the Hitter in the Batting Lineup
    7.1.4 Probability of Swinging Given the Inning
    7.1.5 Logistic Model for All Players
    7.1.6 Logistic Models for Certain Players
  7.2 Generalized Additive Modeling
    7.2.1 Probability of Swinging Based on the Location of the Pitch
    7.2.2 Probability of Swinging Based on the Hitter
SECTION 8: CONCLUSION
REFERENCES
APPENDIX A: PITCH TYPE DEFINITIONS
APPENDIX B: DESCRIPTION OF PITCHF/X VARIABLES FOR PITCH CLASSIFICATION
APPENDIX C: ABBREVIATIONS USED
APPENDIX D: HELPFUL R FUNCTIONS AND LIBRARIES

LIST OF FIGURES

2.1 The First Pitch Effect
2.2 Number of Strikeouts per Game from 1893 to 2015
2.3 Swing Rate on the First Pitch by Season
4.1 Corey Kluber’s Location of First Pitches for the First Game of the 2016 World Series
4.2 Trevor Bauer’s Location of First Pitches for the Second Game of the 2016 World Series
4.3 Overall 2015 Seasonal Percentage of First Pitch Strikes