Simulation-Based Projections for Baseball Statistics


A Thesis Presented to the Faculty of California State Polytechnic University, Pomona

In Partial Fulfillment of the Requirements for the Degree Master of Science in Computer Science

By Daniel Adam Acevedo
2018

SIGNATURE PAGE

THESIS: SIMULATION-BASED PROJECTIONS FOR BASEBALL STATISTICS
AUTHOR: DANIEL ADAM ACEVEDO
TERM SUBMITTED: Spring 2018
Computer Science Department

Dr. Yu Sun, Thesis Committee Chair, Department of Computer Science
Dr. Abdelfattah Amamra, Department of Computer Science
Dr. Sampath Jayarathna, Department of Computer Science

ACKNOWLEDGEMENTS

I would like to thank my family for their love and support, and for all of the sacrifices they’ve made so that I could get an education, resulting in a Master’s Degree. I would like to thank my advisor, Dr. Yu Sun, for his guidance and help throughout my time conducting this thesis. I would also like to thank Dr. Sampath Jayarathna and Dr. Abdelfattah Amamra for being members of my committee.

ABSTRACT

Baseball is an unpredictable sport. The introduction of sabermetrics created an opening for applying computer science methods to the evaluation of the game. Every Major League Baseball organization has developed its own method of measuring players’ results and predicting what to expect from a player entering a season. While most industry models rely on their own statistical analysis to make predictions, this thesis introduces a new model that uses simulations in addition to statistical analysis. The results of this thesis show that this model is comparable to some of the best projection systems available.
TABLE OF CONTENTS

SIGNATURE PAGE
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES
1. INTRODUCTION
2. ACQUIRING DATA AND WEIGHTS
    2.1. Data Used
    2.2. Data Acquisition
        2.2.1. Setting up the Databases
        2.2.2. Obtaining the Most Recent Four Year Period
        2.2.3. Obtain Weights
        2.2.4. Applying Weights and Regression to the Mean
3. IMPLEMENTATION
    3.1. Explanation of a Simulation
    3.2. Creating a Prediction
    3.3. Apply Age Regression
    3.4. Prediction Example
4. ANALYSIS
    4.1. Metrics for Evaluation
    4.2. Explanation of Industry Projections
    4.3. Explanation of Metrics
    4.4. Predictions Comparison
        4.4.1. All Players
        4.4.2. Players with less than three years played
        4.4.3. Players with four or more years played
5. CONCLUSION

LIST OF FIGURES

Figure 1. Description of SQLite tables
Figure 2. Description of statistics considered for predictions
Figure 3. Spinner Board example featuring BIP vs. Not a BIP
Figure 4. Spinner Board example with numerous outcomes
Figure 5. Wil Myers' Generalized Spinner Board
Figure 6. Myers' Outcome Spinner Board
Figure 7. Myers' Probabilities of outcomes given certain events occurring
Figure 8. Myers' Resulting BIP vs Not a BIP
Figure 9. Myers' Resulting Outcomes
Figure 10. Myers' true rates and predicted values based on those rates
Figures 11-15. Any Years Played MAE of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions
Figures 16-20. Any Years Played RMSE of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions
Figures 21-25. Any Years Played R of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions
Figures 26-30. Less than Four Years Played MAE of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions
Figures 31-35. Less than Four Years Played RMSE of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions
Figures 36-40. Less than Four Years Played R of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions
Figures 41-45. Four or More Years Played MAE of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions
Figures 46-50. Four or More Years Played RMSE of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions
Figures 51-55. Four or More Years Played R of Hit, Home Run, Runs Scored, RBI, and WOBA Predictions

1. INTRODUCTION

Baseball is often called a game of failure: a player who hits the ball only three out of ten times is considered among the best players in the game. The prediction of baseball statistics could also be considered a game of failure. Prediction models draw on large amounts of data to project a player’s performance, but they will never be able to predict that performance perfectly and consistently.

Major League Baseball (MLB) organizations have a substantial interest in the performance of their projection systems: teams pay players salaries that are consistent with their performance, on the assumption that their success is reasonably sustainable in future years. In 2014, the Miami Marlins awarded Giancarlo Stanton the largest monetary contract in MLB history, worth $325,000,000 over a thirteen-year period [1]. These large contracts are risks: Stanton played worse than his pre-contract average over the next two years, before having the best statistical year of his career in 2017 [2]. The Marlins, under new ownership, then traded Stanton to the New York Yankees, as the team could no longer afford such a large contract. This highlights the importance of signing players to salaries that are consistent with their past performance, with an expectation that their performance will improve or stay the same, while staying within the team’s budget.

A well-known example of the importance of maintaining a budget is portrayed in the film Moneyball. Based on a true story, it follows Oakland Athletics’ general manager Billy Beane and his use of sabermetrics, the application of