Major Qualifying Project: Advanced Baseball Statistics

Total Page:16

File Type:pdf, Size:1020Kb

Major Qualifying Project: Advanced Baseball Statistics Major Qualifying Project: Advanced Baseball Statistics Matthew Boros, Elijah Ellis, Leah Mitchell Advisors: Jon Abraham and Barry Posterro April 30, 2020 Contents 1 Background 5 1.1 The History of Baseball . .5 1.2 Key Historical Figures . .7 1.2.1 Jerome Holtzman . .7 1.2.2 Bill James . .7 1.2.3 Nate Silver . .8 1.2.4 Joe Peta . .8 1.3 Explanation of Baseball Statistics . .9 1.3.1 Save . .9 1.3.2 OBP,SLG,ISO . 10 1.3.3 Earned Run Estimators . 10 1.3.4 Probability Based Statistics . 11 1.3.5 wOBA . 12 1.3.6 WAR . 12 1.3.7 Projection Systems . 13 2 Aggregated Baseball Database 15 2.1 Data Sources . 16 2.1.1 Retrosheet . 16 2.1.2 MLB.com . 17 2.1.3 PECOTA . 17 2.1.4 CBS Sports . 17 2.2 Table Structure . 17 2.2.1 Game Logs . 17 2.2.2 Play-by-Play . 17 2.2.3 Starting Lineups . 18 2.2.4 Team Schedules . 18 2.2.5 General Team Information . 18 2.2.6 Player - Game Participation . 18 2.2.7 Roster by Game . 18 2.2.8 Seasonal Rosters . 18 2.2.9 General Team Statistics . 18 2.2.10 Player and Team Specific Statistics Tables . 19 2.2.11 PECOTA Batting and Pitching . 20 2.2.12 Game State Counts by Year . 20 2.2.13 Game State Counts . 20 1 CONTENTS 2 2.3 Conclusion . 20 3 Cluster Luck 21 3.1 Quantifying Cluster Luck . 22 3.2 Circumventing Cluster Luck with Total Bases . 24 3.3 Conclusion . 26 4 Garbage Time 27 4.1 Calculating Situational Odds of Winning . 28 4.2 Quantifying Garbage Time . 30 4.3 Garbage-Adjusted Statistics . 31 5 Preseason Model 34 5.1 Calculating Cluster Luck Adjusted Wins . 34 5.2 Roster Changes . 36 5.3 Injury Adjustments . 37 5.4 Alternative Attempts at Creating the Preseason Model . 38 5.5 Alternative Methods to Creating Preseason Models . 39 5.5.1 PECOTA and ZIPS . 39 5.5.2 Total Bases . 39 5.5.3 Fractional Rosters . 39 5.6 Regression Results . 40 5.7 Second Regression . 42 6 In-Season Model 43 6.1 Runs . 43 6.2 Cluster Luck Runs . 44 6.3 Total Bases . 45 6.4 Wins Adapted In-Season . 46 6.5 Garbage Time Exlcuded . 47 6.6 Results . 48 7 Combination of the Two Models 51 7.1 Credibility Theory . 51 7.2 Linear . 52 7.3 Mixed . 53 7.4 Results . 54 8 Single Game Betting 55 8.1 Starting Pitcher, Relief Pitcher, and Lineup . 55 8.2 Games . 56 8.3 Results . 57 8.4 Conclusion . 58 9 2019 Testing 59 9.1 Preseason Testing . 59 9.2 Conclusion . 60 10 Conclusion 61 CONTENTS 3 A Database Technical Details 63 A.1 Design Decisions . 63 A.1.1 Choosing SQLite and SQL Alchemy . 63 A.1.2 Table Design and Code Reusability . 65 A.2 Retrieving and Cleaning Data . 66 A.2.1 Retrosheet . 66 A.2.2 MLB.com Rosters . 66 A.2.3 CBS Sports Injury Reports . 67 A.3 Using the Database . 69 A.4 Full Table List . 70 A.4.1 GameLog . 70 A.4.2 PlayByPlay . 70 A.4.3 SeasonalRoster . 74 A.4.4 StartingLineup . 74 A.4.5 TeamStats . 74 A.4.6 TeamBattingStats . 75 A.4.7 TeamPitchingStats . 75 A.4.8 NonGbgTeamBattingStats . 75 A.4.9 NonGbgTeamPitchingStats . 76 A.4.10 PlayerBattingStats . 76 A.4.11 PlayerPitchingStats . 77 A.4.12 NonGbgPlayerBattingStats . 77 A.4.13 NonGbgPlayerPitchingStats . 77 A.4.14 Game State Counts By Year ........................... 78 A.4.15 Game State Counts . 78 A.4.16 RosterByGame . 78 A.4.17 Schedule . 78 A.4.18 BettingOdds . ..
Recommended publications
  • NCAA Division I Baseball Records
    Division I Baseball Records Individual Records .................................................................. 2 Individual Leaders .................................................................. 4 Annual Individual Champions .......................................... 14 Team Records ........................................................................... 22 Team Leaders ............................................................................ 24 Annual Team Champions .................................................... 32 All-Time Winningest Teams ................................................ 38 Collegiate Baseball Division I Final Polls ....................... 42 Baseball America Division I Final Polls ........................... 45 USA Today Baseball Weekly/ESPN/ American Baseball Coaches Association Division I Final Polls ............................................................ 46 National Collegiate Baseball Writers Association Division I Final Polls ............................................................ 48 Statistical Trends ...................................................................... 49 No-Hitters and Perfect Games by Year .......................... 50 2 NCAA BASEBALL DIVISION I RECORDS THROUGH 2011 Official NCAA Division I baseball records began Season Career with the 1957 season and are based on informa- 39—Jason Krizan, Dallas Baptist, 2011 (62 games) 346—Jeff Ledbetter, Florida St., 1979-82 (262 games) tion submitted to the NCAA statistics service by Career RUNS BATTED IN PER GAME institutions
    [Show full text]
  • The Astros' Sign-Stealing Scandal
    The Astros’ Sign-Stealing Scandal Major League Baseball (MLB) fosters an extremely competitive environment. Tens of millions of dollars in salary (and endorsements) can hang in the balance, depending on whether a player performs well or poorly. Likewise, hundreds of millions of dollars of value are at stake for the owners as teams vie for World Series glory. Plus, fans, players and owners just want their team to win. And everyone hates to lose! It is no surprise, then, that the history of big-time baseball is dotted with cheating scandals ranging from the Black Sox scandal of 1919 (“Say it ain’t so, Joe!”), to Gaylord Perry’s spitter, to the corked bats of Albert Belle and Sammy Sosa, to the widespread use of performance enhancing drugs (PEDs) in the 1990s and early 2000s. Now, the Houston Astros have joined this inglorious list. Catchers signal to pitchers which type of pitch to throw, typically by holding down a certain number of fingers on their non-gloved hand between their legs as they crouch behind the plate. It is typically not as simple as just one finger for a fastball and two for a curve, but not a lot more complicated than that. In September 2016, an Astros intern named Derek Vigoa gave a PowerPoint presentation to general manager Jeff Luhnow that featured an Excel-based application that was programmed with an algorithm. The algorithm was designed to (and could) decode the pitching signs that opposing teams’ catchers flashed to their pitchers. The Astros called it “Codebreaker.” One Astros employee referred to the sign- stealing system that evolved as the “dark arts.”1 MLB rules allowed a runner standing on second base to steal signs and relay them to the batter, but the MLB rules strictly forbade using electronic means to decipher signs.
    [Show full text]
  • Sabermetrics: the Past, the Present, and the Future
    Sabermetrics: The Past, the Present, and the Future Jim Albert February 12, 2010 Abstract This article provides an overview of sabermetrics, the science of learn- ing about baseball through objective evidence. Statistics and baseball have always had a strong kinship, as many famous players are known by their famous statistical accomplishments such as Joe Dimaggio’s 56-game hitting streak and Ted Williams’ .406 batting average in the 1941 baseball season. We give an overview of how one measures performance in batting, pitching, and fielding. In baseball, the traditional measures are batting av- erage, slugging percentage, and on-base percentage, but modern measures such as OPS (on-base percentage plus slugging percentage) are better in predicting the number of runs a team will score in a game. Pitching is a harder aspect of performance to measure, since traditional measures such as winning percentage and earned run average are confounded by the abilities of the pitcher teammates. Modern measures of pitching such as DIPS (defense independent pitching statistics) are helpful in isolating the contributions of a pitcher that do not involve his teammates. It is also challenging to measure the quality of a player’s fielding ability, since the standard measure of fielding, the fielding percentage, is not helpful in understanding the range of a player in moving towards a batted ball. New measures of fielding have been developed that are useful in measuring a player’s fielding range. Major League Baseball is measuring the game in new ways, and sabermetrics is using this new data to find better mea- sures of player performance.
    [Show full text]
  • 2021 Sun Devil Baseball GAME NOTES - OREGON
    2021 Sun Devil Baseball GAME NOTES - OREGON GAMES 15-17 March 19-21 #23 ARIZONA STATE 4p.m./2 p.m./12 p.m. AZT #19 Oregon 11-3 (0-0 Pac-12) PK Park 8-3/0-0 Pac-12 Eugene, Ore. @ASU_Baseball Watch: Pac-12 Live Stream @OregonBaseball @TheSunDevils Radio: N/A @GoDucks Five -Time NCAA Champions (1965, 1967, 1969, 1977, 1981) | 22 College World Series Appearances | 21 Conference Championships | 128 All-Americans | 14 National Players of the Year | 12 College Baseball Hall of Famers MEDIA RELATIONS CONTACT ASU_BASEBALL SUN DEVIL BASEBALL ASU_BASEBALL Jeremy Hawkes 2021 @ASU_BASEBALL Schedule [email protected] | C: 520-403-0121 | O: 480-965-9544 Date Opponent Time/Score 19-Feb Sacramento State^ L, 2-4 20-Feb Sacramento State^ W, 2-1 #10THINGS (Twitter-Friendly Notes) #BYTHENUMBERS 21-Feb Sacramento State^ W, 3-1 ASU has held its opponents to 5 runs or fewer in 26-Feb Hawaii L, 2-3 27-Feb Hawaii W, 6-5 1. Dating back to last year, ASU has held oppo- 28 of the 31 games since Jason Kelly has come on as 27-Feb Hawaii W, 9-6 nents to five runs or fewer in 28 of the 31 games the pitching coach. For perspective, ASU gave up six or 2-Mar Nevada W, 13-4 5-Mar Utah W, 4-3 since Jason Kelly’s arrival. more runs in 25 of its 57 games in 2019. Even despite a 6-Mar Utah W, 4-1 tough 10-runs allowed against UNLV, ASU is 20th in the 7-Mar Utah W, 5-0 2.
    [Show full text]
  • 04-30-2015 Angels Game Notes
    ANGELS (10-11) @ ATHLETICS (9-13) RHP GARRETT RICHARDS (1-1, 3.75 ERA) vs. RHP JESSE CHAVEZ (0-1, 0.71 ERA) O.Co COLISEUM – 12:37 PM PDT TV – FOX SPORTS WEST RADIO – KLAA AM 830 THURSDAY, APRIL 30, 2015 GAME #22 (10-11) OAKLAND, CA ROAD GAME #12 (6-5) LEADING OFF: Today the Angels play the ‘rubber match’ GAME OF CRON: On Sunday, C.J. Cron’s four hits set a at Oakland and the third game (1-1) of a six-game Bay single-game career-high…He has recorded nine hits in his THIS DATE IN ANGELS HISTORY Area road trip to Oakland (April 28-30; 1-1) and San last 18 at-bats and has three doubles his last seven games. April 30 (2008) A day after Joe Saunders Francisco (May 1-3)…Halos are in a stretch of 25 moves to 5-0, Ervin Santana consecutive days in California (April 20-May 14)…LAA ABOUT FACE: Thru 21 games in 2014, David Freese tallied becomes the third pitcher in club went 4-3 on the seven-game home stand vs. Oakland (2- 13 hits, one double, one home run and six RBI…Thru 21 history to go 5-0 in April…The duo 2) and Texas (2-1)…Following this trip, Angels will host a games in 2015, owns 17 hits, four doubles, four home runs becomes just the second tandem nine-game home stand vs. Seattle (May 4-6), Houston and a team-leading 15 RBI. in MLB history to boast two (May 7-10) and Colorado (May 12-13).
    [Show full text]
  • A Statistical Study Nicholas Lambrianou 13' Dr. Nicko
    Examining if High-Team Payroll Leads to High-Team Performance in Baseball: A Statistical Study Nicholas Lambrianou 13' B.S. In Mathematics with Minors in English and Economics Dr. Nickolas Kintos Thesis Advisor Thesis submitted to: Honors Program of Saint Peter's University April 2013 Lambrianou 2 Table of Contents Chapter 1: The Study and its Questions 3 An Introduction to the project, its questions, and a breakdown of the chapters that follow Chapter 2: The Baseball Statistics 5 An explanation of the baseball statistics used for the study, including what the statistics measure, how they measure what they do, and their strengths and weaknesses Chapter 3: Statistical Methods and Procedures 16 An introduction to the statistical methods applied to each statistic and an explanation of what the possible results would mean Chapter 4: Results and the Tampa Bay Rays 22 The results of the study, what they mean against the possibilities and other results, and a short analysis of a team that stood out in the study Chapter 5: The Continuing Conclusion 39 A continuation of the results, followed by ideas for future study that continue to project or stem from it for future baseball analysis Appendix 41 References 42 Lambrianou 3 Chapter 1: The Study and its Questions Does high payroll necessarily mean higher performance for all baseball statistics? Major League Baseball (MLB) is a league of different teams in different cities all across the United States, and those locations strongly influence the market of the team and thus the payroll. Year after year, a certain amount of teams, including the usual ones in big markets, choose to spend a great amount on payroll in hopes of improving their team and its player value output, but at times the statistics produced by these teams may not match the difference in payroll with other teams.
    [Show full text]
  • Baseball Milestones: Barry, Alex, and Tom
    University of Central Florida STARS On Sport and Society Public History 8-17-2007 Baseball milestones: Barry, Alex, and Tom Richard C. Crepeau University of Central Florida, [email protected] Part of the Cultural History Commons, Journalism Studies Commons, Other History Commons, Sports Management Commons, and the Sports Studies Commons Find similar works at: https://stars.library.ucf.edu/onsportandsociety University of Central Florida Libraries http://library.ucf.edu This Commentary is brought to you for free and open access by the Public History at STARS. It has been accepted for inclusion in On Sport and Society by an authorized administrator of STARS. For more information, please contact [email protected]. Recommended Citation Crepeau, Richard C., "Baseball milestones: Barry, Alex, and Tom" (2007). On Sport and Society. 751. https://stars.library.ucf.edu/onsportandsociety/751 SPORT AND SOCIETY FOR H-ARETE Baseball milestones: Barry, Alex, and Tom AUGUST 17, 2007 Over two weeks ago before taking a short vacation to escape the heat and humidity of Florida, I saw the Barry Bonds home run that tied Henry Aaron. That same day Alex Rodriguez hit his 500th home run and all those fans with steroid anxiety suddenly discovered a new hero. Even before Bonds had officially passed Aaron, A-Rod became the Great Clean Hope who would surpass "Mr. Bonds," as Bud Selig so warmly called him, and bring the home run crown back to the kind of people who hold it by divine fiat. The next day Tom Glavine won his 300th game, becoming only the fifth left-hander and the twenty-third member of that exclusive club.
    [Show full text]
  • Lessons You Can Learn by Watching a Game
    Lessons You Can Learn by Watching a Game Good coaches no matter how old they are will watch a game and come away learning something. Even if they may be watching the game for enjoyment, there is always something they will see that could possibly help them in the future. A great teaching moment is to take your team to a game or watch a game on TV with them. Show your players during that game not only the good things that are happening but also the things that are done that may cost a run and eventually a game. Coaches can teach their players what to look for during the game like offensive and defensive weaknesses and tendencies. They can teach situations that come up during the game and can teach why something worked or why it didn’t work. Pictures are worth a thousand words. Even watching Major League Baseball games on TV will provide a lot of teachable moments. Lesson One: When Jason Wurth hit the winning walk off home run in the ninth inning during game four against the Cardinals, the Nationals went wild. Yes, it was a big game to win but it was not the Championship game. Watching them storm the field and jump up and down with excitement, made me shake my head. I have been on both sides of that scenario and that becomes bulletin board material. The Cardinals came back to win the next game and take the series. Side note: in case you have never heard that term, bulletin board material means that a player/team said or did something that could make the other team irritated at them to the point that it inspires that other team to do everything possible to beat the team.
    [Show full text]
  • Baseball/Softball
    July2006 ?fe Aatuated ScowS& For Basebatt/Softbatt Quick Keys: Batter keywords: Press this: To perform this menu function: Keyword: Situation: Keyword: Situation: a.Lt*s Balancescoresheet IB Single SAC Sacrificebunt ALT+D Show defense 2B Double SF Sacrifice fly eLt*B Edit plays 3B Triple RBI# # Runs batted in RLt*n Savea gamefile to disk HR Home run DP Hit into doubleplay crnl*n Load a gamefile from disk BB Walk GDP Groundedinto doubleplay alr*I Inning-by-inning summary IBB Intentionalwalk TP Hit into triple play nlr*r Lineupcards HP Hit by pitch PB Reachedon passedball crRL*t List substitutions FC Fielder'schoice WP Reachedon wild pitch alr*o Optionswindow CI Catcher interference E# Reachon error by # ALT+N Gamenotes window BI Batter interference BU,GR Bunt, ground-ruledouble nll*p Playswindow E# Reachedon error by DF Droppedfoul ball ALr*g Quit the program F# Flied out to # + Advanced I base alr*n Rosterwindow P# Poppedup to # -r-r Advanced2 bases CTRL+R Rosterwindow (edit profiles) L# Lined out to # +++ Advanced3 bases a,lr*s Statisticswindow FF# Fouledout to # +T Advancedon throw 4 J-l eLt*:t Turn the scoresheetpage tt- tt Groundedout # to # +E Advanced on effor l+1+1+ .ALr*u Updatestat counts trtrft Out with assists A# Assistto # p4 Sendbox score(to remotedisplay) #UA Unassistedputout O:# Setouts to # Ff, Edit defensivelineup K Struck out B:# Set batter to # F6 Pitchingchange KS Struck out swinging R:#,b Placebatter # on baseb r7 Pinchhitter KL Struck out looking t# Infield fly to # p8 Edit offensivelineup r9 Print the currentwindow alr*n1 Displayquick keyslist Runner keywords: nlr*p2 Displaymenu keys list Keyword: Situation: Keyword: Situation: SB Stolenbase + Adv one base Hit locations: PB Adv on passedball ++ Adv two bases WP Adv on wild pitch +++ Adv threebases Ke1+vord: Description: BK Adv on balk +E Adv on error 1..9 PositionsI thru 9 (p thru rf) CS Caughtstealing +E# Adv on error by # P.
    [Show full text]
  • Tuesday, March 28, 2017
    World Champions 1983, 1970, 1966 American League Champions 1983, 1979, 1971, 1970, 1969, 1966 American League East Division Champions 2014, 1997, 1983, 1979, 1974, 1973, 1971, 1970, 1969 American League Wild Card 2016, 2012, 1996 Tuesday, March 28, 2017 Game Stories: Tyler Wilson continues to trend upward in otherwise sloppy 11-9 loss to Red Sox The Sun 3/27 Orioles duo leads top prospect performers MLB.com 3/27 Wrapping up an 11-9 loss MASNsports.com 3/27 Columns: Chris Davis hoping to move on from year one of megadeal with Orioles The Sun 3/28 Kevin Gausman bringing back slider this spring to build on breakout season The Sun 3/28 Orioles grant Michael Bourn's request for release; Chris Johnson also released The Sun 3/27 Buck Showalter on Trey Mancini's roster chances: Complicated, but 'Trey has done his part' The Sun 3/27 Kevin Gausman named Orioles' Opening Day starter The Sun 3/27 Orioles lineup vs. Braves MASNsports.com 3/28 More roster talk as Orioles near end of camp MASNsports.com 3/28 Orioles release Bourn and Johnson MASNsports.com 3/27 For his latest trick, Cedric Mullins homered off Craig Kimbrel MASNsports.com 3/27 Orioles' Fifth Starter Quandary Becomes A Little Clearer PressBoxOnline.com 3/27 Kevin Gausman Hopes His Opening Day Start Is 'First Of Many' PressBoxOnline.com 3/27 Jayson Aquino Makes His Plea For The Fifth Starter's Job PressBoxOnline.com 3/27 Chris Tillman Throws For First Time Since Cortisone Shot PressBoxOnline.com 3/26 Orioles Fifth Starter Competition May Get Clearer PressBoxOnline.com
    [Show full text]
  • Name of the Game: Do Statistics Confirm the Labels of Professional Baseball Eras?
    NAME OF THE GAME: DO STATISTICS CONFIRM THE LABELS OF PROFESSIONAL BASEBALL ERAS? by Mitchell T. Woltring A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Leisure and Sport Management Middle Tennessee State University May 2013 Thesis Committee: Dr. Colby Jubenville Dr. Steven Estes ACKNOWLEDGEMENTS I would not be where I am if not for support I have received from many important people. First and foremost, I would like thank my wife, Sarah Woltring, for believing in me and supporting me in an incalculable manner. I would like to thank my parents, Tom and Julie Woltring, for always supporting and encouraging me to make myself a better person. I would be remiss to not personally thank Dr. Colby Jubenville and the entire Department at Middle Tennessee State University. Without Dr. Jubenville convincing me that MTSU was the place where I needed to come in order to thrive, I would not be in the position I am now. Furthermore, thank you to Dr. Elroy Sullivan for helping me run and understand the statistical analyses. Without your help I would not have been able to undertake the study at hand. Last, but certainly not least, thank you to all my family and friends, which are far too many to name. You have all helped shape me into the person I am and have played an integral role in my life. ii ABSTRACT A game defined and measured by hitting and pitching performances, baseball exists as the most statistical of all sports (Albert, 2003, p.
    [Show full text]
  • Here Comes the Strikeout
    LEVEL 2.0 7573 HERE COMES THE STRIKEOUT BY LEONARD KESSLER In the spring the birds sing. The grass is green. Boys and girls run to play BASEBALL. Bobby plays baseball too. He can run the bases fast. He can slide. He can catch the ball. But he cannot hit the ball. He has never hit the ball. “Twenty times at bat and twenty strikeouts,” said Bobby. “I am in a bad slump.” “Next time try my good-luck bat,” said Willie. “Thank you,” said Bobby. “I hope it will help me get a hit.” “Boo, Bobby,” yelled the other team. “Easy out. Easy out. Here comes the strikeout.” “He can’t hit.” “Give him the fast ball.” Bobby stood at home plate and waited. The first pitch was a fast ball. “Strike one.” The next pitch was slow. Bobby swung hard, but he missed. “Strike two.” “Boo!” Strike him out!” “I will hit it this time,” said Bobby. He stepped out of the batter’s box. He tapped the lucky bat on the ground. He stepped back into the batter’s box. He waited for the pitch. It was fast ball right over the plate. Bobby swung. “STRIKE TRHEE! You are OUT!” The game was over. Bobby’s team had lost the game. “I did it again,” said Bobby. “Twenty –one time at bat. Twenty-one strikeouts. Take back your lucky bat, Willie. It was not lucky for me.” It was not a good day for Bobby. He had missed two fly balls. One dropped out of his glove.
    [Show full text]