Package 'Sabermetrics'

February 19, 2015

Type              Package
Title             Sabermetrics Functions For Baseball Analytics
Version           1.0
Date              2015-02-06
Author            Peter Xenopoulos <www.peterxeno.com>
Maintainer        Peter Xenopoulos
Description       A collection of baseball analytics functions for sabermetrics purposes. Among these functions are popular metrics such as OBP, wOBA, and runs created, as well as fielding-independent pitching metrics.
License           GPL-3
NeedsCompilation  no
Repository        CRAN
Date/Publication  2015-02-07 00:55:03

R topics documented:
  sabermetrics-package
  dice
  eqa
  fip
  iso
  log5
  obp
  ops
  pyth
  rcBasic
  rcBasicSB
  rcPX
  rcTech
  secA
  slg
  wOBA

sabermetrics-package    Sabermetrics Functions For Baseball Analytics

Description
A collection of baseball analytics functions for sabermetrics purposes. Among these functions are popular metrics such as OBP, wOBA, and runs created, as well as fielding-independent pitching metrics.
Details
Package: Sabermetrics
Type: Package
Version: 1.0
Date: 2015-02-06
License: GPL-3

Author(s)
Peter Xenopoulos

References
Wikipedia: http://en.wikipedia.org/wiki/Sabermetrics#Examples
Reddit: http://www.reddit.com/r/Sabermetrics

dice    Defense-Independent Component ERA (DICE)

Description
Returns a number intended to predict a pitcher's ERA in the following year better than the pitcher's actual ERA in the current year does.

Usage
dice(HR, BB, HBP, K, IP)

Arguments
HR   Home runs allowed
BB   Walks allowed
HBP  Hit batters (hit by pitch)
K    Strikeouts
IP   Innings pitched

Value
Returns 3 + ((13*HR + 3*BB + 3*HBP - 2*K)/IP)

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/Defense_independent_pitching_statistics

Examples
## The Defense-Independent Component ERA (dice) function is currently defined as
dice <- function(HR, BB, HBP, K, IP) {
  defenseERA <- 3 + ((13 * HR + 3 * BB + 3 * HBP - 2 * K)/IP)
  return(defenseERA)
}
## Take Clayton Kershaw, the 2014 NL MVP, and find his DICE.
## Stats for Clayton Kershaw are available at
## http://www.baseball-reference.com/players/k/kershcl01-pitch.shtml
## In 2014, Kershaw allowed 9 HR, 31 BB, and 2 HBP, with 239 K in 198.1 IP
## Output should be 1.677436
dice(9, 31, 2, 239, 198.1)

eqa    Equivalent Average

Description
A baseball metric invented by Clay Davenport and intended to express the production of hitters in a context independent of park and league effects. EQA represents a hitter's productivity using the same scale as batting average.
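The dice example above (like the fip example) passes Kershaw's innings as 198.1, copied from box-score notation, in which the digit after the point counts outs: 198.1 means 198 and one-third innings. Both functions divide by IP as an ordinary decimal. A minimal sketch, assuming you want exact innings, converts the notation first; `ip_to_innings` is a hypothetical helper, not part of the package, and dice is redefined here with the formula from its Examples section.

```r
# Hypothetical helper, not part of the Sabermetrics package: converts
# box-score innings notation (the digit after the point counts outs,
# so it must be 0, 1, or 2) into decimal innings.
ip_to_innings <- function(ip) {
  whole <- floor(ip)
  outs <- round((ip - whole) * 10)  # e.g. 198.1 -> 1 out
  stopifnot(all(outs %in% 0:2))     # reject values such as 198.4
  whole + outs / 3
}

# DICE, using the same formula as the dice Examples section
dice <- function(HR, BB, HBP, K, IP) {
  3 + ((13 * HR + 3 * BB + 3 * HBP - 2 * K) / IP)
}

# Kershaw, 2014: 9 HR, 31 BB, 2 HBP, 239 K, 198.1 IP in box-score notation
dice(9, 31, 2, 239, 198.1)                 # IP read as decimal 198.1: 1.677436
dice(9, 31, 2, 239, ip_to_innings(198.1))  # IP read as 198 1/3 innings: about 1.679
```

The difference here is small, but it grows for pitchers with few innings, and the same conversion applies to the IP argument of fip.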
Usage
eqa(H, TB, BB, HBP, SB, SAC, SF, AB, CS)

Arguments
H    Hits
TB   Total bases
BB   Walks
HBP  Hit by pitch
SB   Stolen bases
SAC  Sacrifice hits (bunts)
SF   Sacrifice flies
AB   At bats
CS   Caught stealing

Value
Returns (H + TB + 1.5*(BB + HBP) + SB + SAC + SF)/(AB + BB + HBP + SAC + SF + CS + (SB/3))
Note that this raw ratio (about 0.95 in the example below) is not itself on the batting-average scale described above.

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/Equivalent_average

Examples
## The equivalent average (eqa) function is currently defined as
eqa <- function(H, TB, BB, HBP, SB, SAC, SF, AB, CS) {
  equivAvg <- (H + TB + 1.5 * (BB + HBP) + SB + SAC + SF)/(AB + BB + HBP + SAC + SF + CS + (SB/3))
  return(equivAvg)
}
## Take Mike Trout, the 2014 AL MVP, and find his EQA.
## Stats for Mike Trout are available at
## http://www.baseball-reference.com/players/t/troutmi01-bat.shtml
## In 2014, Trout had 173 H, 338 TB, 83 BB, 10 HBP, 16 SB, 0 SAC, 10 SF, 602 AB, and 2 CS
## Output should be 0.9496958
eqa(173, 338, 83, 10, 16, 0, 10, 602, 2)

fip    Fielding Independent Pitching (FIP)

Description
A defense-independent pitching metric similar to DICE (see dice).

Usage
fip(HR, BB, K, IP, C)

Arguments
HR   Home runs allowed
BB   Walks allowed
K    Strikeouts
IP   Innings pitched
C    Constant that puts the result on the ERA scale. This package describes C as the league-average ERA, and the example below passes 3.66; in the usual definition of FIP, C is instead chosen so that league-average FIP equals league-average ERA.

Value
Returns ((13*HR + 3*BB - 2*K)/IP) + C

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/Defense_independent_pitching_statistics

See Also
DICE (dice)

Examples
## The Fielding Independent Pitching (fip) function is currently defined as
fip <- function(HR, BB, K, IP, C) {
  fieldIndPitch <- ((13 * HR + 3 * BB - 2 * K)/IP) + C
  return(fieldIndPitch)
}
## Take Clayton Kershaw, the 2014 NL MVP, and find his FIP.
## Stats for Clayton Kershaw are available at
## http://www.baseball-reference.com/players/k/kershcl01-pitch.shtml
## In 2014, Kershaw allowed 9 HR and 31 BB, with 239 K in 198.1 IP; the league ERA (C) was 3.66
## Output should be 2.307148
fip(9, 31, 239, 198.1, 3.66)

iso    Isolated Power

Description
Isolated power is a statistic that measures a hitter's raw power.

Usage
iso(slg, avg)
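Because slugging percentage is TB/AB and batting average is H/AB, iso(slg, avg) can also be computed directly from counting stats as (TB - H)/AB, which avoids rounding the two rate stats to three decimals first. A minimal sketch using Mike Trout's 2014 line quoted in the eqa example (173 H, 338 TB, 602 AB); `iso_from_counts` is a hypothetical name, not a package function.

```r
# Isolated power as the package defines it: SLG minus AVG
iso <- function(slg, avg) {
  slg - avg
}

# Hypothetical helper (not in the package): the same quantity from counting
# stats, since SLG = TB / AB and AVG = H / AB, so ISO = (TB - H) / AB
iso_from_counts <- function(H, TB, AB) {
  (TB - H) / AB
}

# Mike Trout, 2014: 173 H, 338 TB, 602 AB
slg <- 338 / 602  # about 0.561
avg <- 173 / 602  # about 0.287
iso(slg, avg)                   # about 0.274
iso_from_counts(173, 338, 602)  # same value in one step
```

Both routes agree with the .274 shown in the iso example; the counting-stat form is convenient when only a player's raw totals are at hand.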
Arguments
slg  Slugging percentage (see slg)
avg  Batting average

Value
Returns slugging percentage minus batting average

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/Isolated_Power

See Also
Slugging Percentage (slg)

Examples
## The isolated power (iso) function is currently defined as
iso <- function(slg, avg) {
  isoPower <- slg - avg
  return(isoPower)
}
## Take Mike Trout, the 2014 AL MVP, and find his isolated power.
## Stats for Mike Trout are available at
## http://www.baseball-reference.com/players/t/troutmi01-bat.shtml
## In 2014, Trout had a SLG of .561 and an AVG of .287
## Output should be 0.274
iso(0.561, 0.287)

log5    Log5 Sabermetric Formula

Description
Log5 is a formula invented by Bill James to estimate the probability that team A will win a game against team B, based on the true winning percentages of teams A and B. It is equivalent to the Bradley-Terry-Luce model used for paired comparisons, the Elo rating system used in chess, and the Rasch model used in the analysis of categorical data.

Usage
log5(probA, probB, order)

Arguments
probA  Winning percentage (win probability) of team A
probB  Winning percentage (win probability) of team B
order  Selects which team's win probability to return.
       0 returns the win probability of A over B; 1 returns the reverse.

Value
Returns (probA - (probA*probB)) / (probA + probB - (2*probA*probB))

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/Log5

obp    On-Base Percentage

Description
Calculates the on-base percentage of a player or team.

Usage
obp(H, BB, HBP, AB, SF)

Arguments
H    Hits
BB   Walks
HBP  Hit by pitch
AB   At bats
SF   Sacrifice flies

Details
On-base percentage measures how often a player or team reaches base.

Value
Returns (H + BB + HBP)/(AB + BB + SF + HBP)

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/On-base_percentage

See Also
Slugging Percentage (slg), OPS (ops), and Isolated Power (iso)

Examples
## The on-base percentage (obp) function is currently defined as
obp <- function(H, BB, HBP, AB, SF) {
  onbase <- ((H + BB + HBP)/(AB + BB + SF + HBP))
  return(onbase)
}
## Take Mike Trout, the 2014 AL MVP, and find his on-base percentage.
## Stats for Mike Trout are available at
## http://www.baseball-reference.com/players/t/troutmi01-bat.shtml
## In 2014, Trout had 173 H, 83 BB, 10 HBP, 602 AB, and 10 SF
## Output should be 0.377305
obp(173, 83, 10, 602, 10)

ops    On-Base Plus Slugging

Description
Calculates on-base percentage plus slugging percentage, a measure of a hitter's ability both to hit for power and to get on base.

Usage
ops(slg, obp)

Arguments
slg  Slugging percentage (see slg)
obp  On-base percentage.
     (see obp)

Value
Returns on-base percentage plus slugging percentage

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/On-base_plus_slugging

See Also
On-base Percentage (obp) and Slugging Percentage (slg)

Examples
## The on-base plus slugging (ops) function is currently defined as
ops <- function(slg, obp) {
  onBasePlusSlg <- slg + obp
  return(onBasePlusSlg)
}
## Take Mike Trout, the 2014 AL MVP, and find his OPS.
## Stats for Mike Trout are available at
## http://www.baseball-reference.com/players/t/troutmi01-bat.shtml
## In 2014, Trout had a SLG of .561 and an OBP of .377
## Output should be 0.938
ops(0.561, 0.377)

pyth    Pythagorean Expectation

Description
Pythagorean expectation is a formula invented by Bill James to estimate how many games a baseball team "should" have won based on the number of runs it scored and allowed.

Usage
pyth(RS, RA)

Arguments
RS  Runs scored
RA  Runs allowed

Value
Returns (RS*RS)/((RS*RS) + (RA*RA)), the expected winning percentage (multiply by games played to get expected wins)

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/Pythagorean_expectation

rcBasic    Runs Created (Basic)

Description
A basic estimate of how many runs a hitter contributes to his team.

Usage
rcBasic(H, BB, TB, AB)

Arguments
H   Hits
BB  Walks
TB  Total bases
AB  At bats

Value
Returns ((H + BB)*TB)/(AB + BB)

Author(s)
Peter Xenopoulos

References
http://en.wikipedia.org/wiki/Runs_created

See Also
Runs Created (with stolen bases) (rcBasicSB) and Runs Created (Technical) (rcTech)

Examples
## This is a generic runs created formula.
## Let's see how many runs (keep in mind this is only an estimate)
## a batter creates with
## 100 hits, 7 walks (BB), 80 total bases, and 300 at bats
## (total bases can never be fewer than hits, so treat these inputs as illustrative only)
rcBasic <- function(H, BB, TB, AB) {
  rc <- ((H + BB) * TB)/(AB
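The log5 and pyth entries in the Sabermetrics manual give formulas but no Examples section. A minimal sketch of both, assuming plain numeric inputs: `log5_sketch` follows the log5 Value formula and reads `order` as documented (0 for A over B, 1 for B over A), though how the package itself implements `order` is not shown in the manual; `pyth_exp` is a hypothetical generalization of pyth with an exponent argument. With exponent = 2 it matches pyth; 1.83 is an exponent commonly used as a more accurate alternative. The run totals below are illustrative, not real team data.

```r
# Log5: probability that team A beats team B, given each team's true
# winning percentage, following the formula in the log5 Value section.
# 'order' follows the documented convention: 0 = A over B, 1 = B over A.
log5_sketch <- function(probA, probB, order = 0) {
  stopifnot(order %in% c(0, 1))
  pA <- (probA - probA * probB) / (probA + probB - 2 * probA * probB)
  if (order == 0) pA else 1 - pA  # the two outcomes are complementary
}

# Pythagorean expectation with a configurable exponent (hypothetical helper).
# exponent = 2 reproduces pyth(RS, RA).
pyth_exp <- function(RS, RA, exponent = 2) {
  RS^exponent / (RS^exponent + RA^exponent)
}

log5_sketch(0.6, 0.5)             # 0.6: against a .500 team, A keeps its own rate
log5_sketch(0.6, 0.4)             # about 0.692
log5_sketch(0.6, 0.4, order = 1)  # about 0.308, the complement

# Illustrative run totals only
pyth_exp(800, 700)                    # expected winning percentage, exponent 2
pyth_exp(800, 700, exponent = 1.83)
```

Returning 1 - pA for order = 1 gives the same value as swapping probA and probB in the formula, because the two numerators sum to the shared denominator.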