A Survey of Baseball Machine Learning: a Technical Report
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Machine Learning Applications in Baseball: a Systematic Literature Review
This is an Accepted Manuscript of an article published by Taylor & Francis in Applied Artificial Intelligence on February 26 2018, available online: https://doi.org/10.1080/08839514.2018.1442991 Machine Learning Applications in Baseball: A Systematic Literature Review Kaan Koseler ([email protected]) and Matthew Stephan* ([email protected]) Miami University Department of Computer Science and Software Engineering 205 Benton Hall 510 E. High St. Oxford, OH 45056 Abstract Statistical analysis of baseball has long been popular, albeit only in limited capacity until relatively recently. In particular, analysts can now apply machine learning algorithms to large baseball data sets to derive meaningful insights into player and team performance. In the interest of stimulating new research and serving as a go-to resource for academic and industrial analysts, we perform a systematic literature review of machine learning applications in baseball analytics. The approaches employed in literature fall mainly under three problem class umbrellas: Regression, Binary Classification, and Multiclass Classification. We categorize these approaches, provide our insights on possible future ap- plications, and conclude with a summary our findings. We find two algorithms dominate the literature: 1) Support Vector Machines for classification problems and 2) k-Nearest Neighbors for both classification and Regression problems. We postulate that recent pro- liferation of neural networks in general machine learning research will soon carry over into baseball analytics. keywords: baseball, machine learning, systematic literature review, classification, regres- sion 1 Introduction Baseball analytics has experienced tremendous growth in the past two decades. Often referred to as \sabermetrics", a term popularized by Bill James, it has become a critical part of professional baseball leagues worldwide (Costa, Huber, and Saccoman 2007; James 1987). -
A Giant Whiff: Why the New CBA Fails Baseball's Smartest Small Market Franchises
DePaul Journal of Sports Law Volume 4 Issue 1 Summer 2007: Symposium - Regulation of Coaches' and Athletes' Behavior and Related Article 3 Contemporary Considerations A Giant Whiff: Why the New CBA Fails Baseball's Smartest Small Market Franchises Jon Berkon Follow this and additional works at: https://via.library.depaul.edu/jslcp Recommended Citation Jon Berkon, A Giant Whiff: Why the New CBA Fails Baseball's Smartest Small Market Franchises, 4 DePaul J. Sports L. & Contemp. Probs. 9 (2007) Available at: https://via.library.depaul.edu/jslcp/vol4/iss1/3 This Notes and Comments is brought to you for free and open access by the College of Law at Via Sapientiae. It has been accepted for inclusion in DePaul Journal of Sports Law by an authorized editor of Via Sapientiae. For more information, please contact [email protected]. A GIANT WHIFF: WHY THE NEW CBA FAILS BASEBALL'S SMARTEST SMALL MARKET FRANCHISES INTRODUCTION Just before Game 3 of the World Series, viewers saw something en- tirely unexpected. No, it wasn't the sight of the Cardinals and Tigers playing baseball in late October. Instead, it was Commissioner Bud Selig and Donald Fehr, the head of Major League Baseball Players' Association (MLBPA), gleefully announcing a new Collective Bar- gaining Agreement (CBA), thereby guaranteeing labor peace through 2011.1 The deal was struck a full two months before the 2002 CBA had expired, an occurrence once thought as likely as George Bush and Nancy Pelosi campaigning for each other in an election year.2 Baseball insiders attributed the deal to the sport's economic health. -
2013 Baseball Hall of Fame Natalie Weinberg University of Pennsylvania [email protected]
COMPARATIVE ADVANTAGE Winter 2014 MICROECONOMICS 2013 Baseball Hall of Fame Natalie Weinberg University of Pennsylvania [email protected] Abstract The purpose of this paper is to outline potential reasons why the 2013 election vote into the Baseball Hall of Game failed to elect a new player. The paper compares various voting rules, and analyzes specific statistics of players. 6 COMPARATIVE ADVANTAGE Winter 2014 MICROECONOMICS When a player is elected into nually (baseballhall.org). sdfsdf Each voter from the BBWAA the Baseball Hall of Fame, he The eligible candidate pool submits his or her top 10 pre- enters the club of the “immor- for the players ballot each year ferred candidates that he or she tals” (New York Times). The consists of all players who were feels is worthy to be inducted Hall of Fame in Cooperstown, part of Major League Baseball into the Hall from the list on New York, is a museum that (the MLB) for at least 10 con- the ballot (bbwaa.com). The honors and preserves the lega- secutive years and have been listed order is not relevant to 1 cy of outstanding baseball play- retired for at least five . Another the voting; each player in the ers throughout the decades. A committee narrows down this group of 10 is treated equally in player receives a great honor by pool to 200 players, and then the the count. In addition, a voter being voted in, and his career is 60-person BBWAA screening is only restricted to nominating stamped with a seal of approv- committee compiles the top 25 10 candidates, but he or she can al by the fans of the game. -
A's News Clips, Tuesday, February 14, 2012 Oakland A's Sign Cuban
A’s News Clips, Tuesday, February 14, 2012 Oakland A's sign Cuban outfielder Yoenis Cespedes By Joe Stiglich, Oakland Tribune The A's added another twist to their curious offseason Monday, agreeing to a four-year, $36 million contract with Cuban outfielder Yoenis Cespedes. Cespedes -- hyped as having excellent power, good speed and a strong arm -- was considered the top hitter on the international market this winter. But he couldn't be signed until he established residency in the Dominican Republic after defecting from Cuba. Cespedes, 26, still needs to obtain a worker's visa and pass a physical before his deal is completed. His agent, Adam Katz, would not speculate on whether Cespedes will be in training camp when A's position players report Feb. 24. Pitchers and catchers report Saturday. The A's hope they finally have filled a need for a young power-hitting outfielder. The right-handed Cespedes hit 33 homers in 90 games last season in the Cuban National Series, Cuba's premier league. He hit .458 in six games during the 2009 World Baseball Classic. "This kid is a physical presence," A's player personnel director Billy Owens told MLB Network Radio. "We've actually scouted him the last four or five years in international competition, and he blows you away with sheer physicality, running speed, the power potential." A's general manager Billy Beane declined to comment on Cespedes. It is unknown whether the A's will thrust him into the opening day lineup or give him time in the minors. Their projected outfield, left to right, is Seth Smith, Coco Crisp and Josh Reddick. -
The Stolen Base Is an Integral Part of the Game of Baseball
THE STOLEN BASE by Lindsay S. Parr A thesis submitted to the Faculty and the Board of Trustees of the Colorado School of Mines in partial fulfillment of the requirements for the degree of Master of Science (Applied Mathematics and Statistics). Golden, Colorado Date Signed: Lindsay S. Parr Signed: Dr. William C. Navidi Thesis Advisor Golden, Colorado Date Signed: Dr. Willy A. Hereman Professor and Head Department of Applied Mathematics and Statistics ii ABSTRACT The stolen base is an integral part of the game of baseball. As it is frequent that a player is in a situation where he could attempt to steal a base, it is important to determine when he should try to steal in order to obtain more wins per season for his team. I used a sample of games during the 2012 and 2013 Major League Baseball seasons to see how often players stole in given scenarios based on number of outs, pickoff attempts, runs until the end of the inning, left or right-handed batter/pitcher, run differential, and inning. New stolen base strategies were created using the percentage of opportunities attempted and the percentage of successful attempts for each scenario in the sample, a formula introduced by Bill James for batter/pitcher match-up, and run expectancy. After writing a program in R to simulate baseball games with the ability to change the stolen base strategy, I compared new strategies to the current strategy used to see if they would increase each Major League Baseball team’s average number of wins per season. I found that when using a strategy where a team steals 80% of the time it increases its run expectancy and 20% of the time that it does not, the average number of wins per season increases for a vast majority of teams over using the current strategy. -
356 Baseball for Dummies, 4Th Edition
Index 1B. See fi rst–base position American Association, 210 2B. See second–base position American League (AL), 207. 3B. See third–base position See also stadiums 40–40 club, 336 American Legion Baseball, 197 anabolic steroids, 282 • A • Angel Stadium of Anaheim, 280 appeal plays, 39, 328 Aaron, Hank, 322 appealing, 328 abbreviations appearances, defi ned, 328 player, 9 Arizona Diamondbacks, 265 scoring, 262 Arizona Fall League, 212 across the letters, 327 Arlett, Buzz, 213 activate, defi ned, 327 around the horn, defi ned, 328 adjudged, defi ned, 327 artifi cial turf, 168, 328 adjusted OPS (OPS+), 243–244 Asian leagues, 216 advance sale, 327 assists, 247, 263, 328 advance scouts, 233–234, 327 AT&T Park, 272, 280 advancing at-balls, 328 hitter, 67, 70, 327 at-bats, 8, 328 runner, 12, 32, 39, 91, 327 Atlanta Braves, 265–266 ahead in the count, defi ned, 327 attempts, 328. See also stealing bases airmailed, defi ned, 327 automatic outs, 328 AL (American League) teams, 207. away games, 328 See also stadiums alive balls, 32 • B • alive innings, 327 All American Amateur Baseball Babe Ruth League, 197 Association, 197 Babe Ruth’s curse, 328 alley (power alley; gap), 189, 327, 337 back through the box, defi ned, 328 alley hitters, 327 backdoor slide, 328 allowing, defi ned, 327COPYRIGHTEDbackdoor MATERIAL slider, 234, 328 All-Star, defi ned, 327 backhand plays, 178–179 All-Star Break, 327 backstops, 28, 329 All-Star Game, 252, 328 backup, 329 Alphonse and Gaston Act, 328 bad balls, 59, 329 aluminum bats, 19–20 bad bounces (bad hops), 272, 329 -
Sabermetrics Over Time: Persuasion and Symbolic Convergence Across a Diffusion of Innovations
SABERMETRICS OVER TIME: PERSUASION AND SYMBOLIC CONVERGENCE ACROSS A DIFFUSION OF INNOVATIONS BY NATHANIEL H. STOLTZ A Thesis Submitted to the Graduate Faculty of WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES in Partial Fulfillment of the Requirements for the Degree of MASTER OF ARTS Communication May 2014 Winston-Salem, North Carolina Approved By: Michael D. Hazen, Ph. D., Advisor John T. Llewellyn, Ph. D., Chair Todd A. McFall, Ph. D. ii Acknowledgments First and foremost, I would like to thank everyone who has assisted me along the way in what has not always been the smoothest of academic journeys. It begins with the wonderful group of faculty I encountered as an undergraduate in the James Madison Writing, Rhetoric, and Technical Communication department, especially my advisor, Cindy Allen. Without them, I would never have been prepared to complete my undergraduate studies, let alone take on the challenges of graduate work. I also want to thank the admissions committee at Wake Forest for giving me the opportunity to have a graduate school experience at a leading program. Further, I have unending gratitude for the guidance and patience of my thesis committee: Dr. Michael Hazen, who guided me from sitting in his office with no ideas all the way up to achieving a completed thesis, Dr. John Llewellyn, whose attention to detail helped me push myself and my writing to greater heights, and Dr. Todd McFall, who agreed to assist the project on short notice and contributed a number of interesting ideas. Finally, I have many to thank on a personal level. -
The Unintended Demise of Salary Arbitration in Major League Baseball: How the Law of Unintended Consequences Crippled the Salary Arbitration Remedy—And How to Fix It
HARVARD JSEL VOLUME 1, NUMBER 1 – SPRING 2010 Hardball Free Agency—The Unintended Demise of Salary Arbitration in Major League Baseball: How the Law of Unintended Consequences Crippled the Salary Arbitration Remedy—and How to Fix It Eldon L. Ham∗ & Jeffrey Malach† TABLE OF CONTENTS I. INTRODUCTION ............................................................................................................64 II. THE DEVELOPMENT OF BASEBALL SALARY ARBITRATION............................66 III. SALARY ARBITRATION IN ITS CURRENT FORM ...................................................75 A. The Process.....................................................................................................................75 B. Admissible Criteria.........................................................................................................76 C. Salary Arbitration in the National Hockey League ........................................................77 IV. BENEFITS OF THE FINAL OFFER ARBITRATION PROCESS...............................78 V. CURRENT PROBLEMS WITH SALARY ARBITRATION...........................................80 A. Failure to Assign Weight to Admissible Criteria Creates Inconsistent and Unpredictable Results ....................................................................................................80 B. Comparing Players with Different Years of Service Time..................................................82 ∗ Eldon L. Ham has taught Sports, Law & Society at Chicago-Kent College of Law for seventeen years. He has published -
The Twelve Year Rain Delay: Why a Change in Leadership Will Benefit the Game of Baseball
LAMME 1/24/2005 2:12:44 PM Originally published in the ALBANY LAW REVIEW . 68 ALB . L. REV . 155 (2004). Posted here with permission. All rights reserved. COMMENTS THE TWELVE YEAR RAIN DELAY: WHY A CHANGE IN LEADERSHIP WILL BENEFIT THE GAME OF BASEBALL Jacob F. Lamme * It breaks your heart. It is designed to break your heart. The game begins in the spring, when everything else begins again, and it blossoms in the summer, filling the afternoons and evenings, and then as soon as the chill rains come, it stops and leaves you to face the fall alone. 1 [E]xamining the business of baseball is like looking at the sun, you can‘t do it for very long before you have to turn away. 2 INTRODUCTION Baseball is a beautiful game. Despite the recent surge in the popularity of professional football and basketball, baseball is— and will always be—the greatest game ever played. Baseball is so graceful and elegant that even the United States Supreme Court could not resist professing its love for the game and its players: [T]here are the many names, celebrated for one reason or another, that have sparked the diamond and its environs and that have provided * B.A., University at Albany, 2002; Albany Law School, Class of 2005. The author wishes to thank Northeastern Law Professor Roger Abrams, Albany Law School Professor Joan Leary Matthews, Albany Law Review editor Jonathan Pall, National Baseball Hall of Fame and Museum Reference Librarian Claudette Burke, his family and friends, and especially Mylene Guantic for all her love and support. -
Baseball Hacks™ by Joseph Adler Copyright © 2006 O’Reilly Media, Inc
Other resources from O’Reilly Related titles Access Hacks PHP Hacks Excel Hacks Retro Gaming Hacks Gaming Hacks Spidering Hacks Google Hacks Programming Perl Google Maps Hacks MySQL Cookbook Hacks Series Home hacks.oreilly.com is a community site for developers and power users ofall stripes. Readers learn fromeach other as they share their favorite tips and tools for Mac OS X, Linux, Google, Windows XP, and more. oreilly.com oreilly.com is more than a complete catalog ofO’Reilly books. You’ll also find links to news, events, articles, weblogs, sample chapters, and code examples. oreillynet.com is the essential portal for developers inter- ested in open and emerging technologies, including new platforms, programming languages, and operating systems. Conferences O’Reilly brings diverse innovators together to nurture the ideas that spark revolutionary industries. We special- ize in documenting the latest tools and systems, translating the innovator’s knowledge into useful skills for those in the trenches. Visit conferences.oreilly.com for our upcoming events. Safari Bookshelf (safari.oreilly.com) is the premier online reference library for programmers and IT professionals. Conduct searches across more than 1,000 books. Sub- scribers can zero in on answers to time-critical questions in a matter ofseconds. Read the books on your Book- shelf from cover to cover or simply flip to the page you need. Try it today. BASEBALL HACKSTM Joseph Adler Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo Baseball Hacks™ by Joseph Adler Copyright © 2006 O’Reilly Media, Inc. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. -
Major Qualifying Project: Advanced Baseball Statistics
Major Qualifying Project: Advanced Baseball Statistics Matthew Boros, Elijah Ellis, Leah Mitchell Advisors: Jon Abraham and Barry Posterro April 30, 2020 Contents 1 Background 5 1.1 The History of Baseball . .5 1.2 Key Historical Figures . .7 1.2.1 Jerome Holtzman . .7 1.2.2 Bill James . .7 1.2.3 Nate Silver . .8 1.2.4 Joe Peta . .8 1.3 Explanation of Baseball Statistics . .9 1.3.1 Save . .9 1.3.2 OBP,SLG,ISO . 10 1.3.3 Earned Run Estimators . 10 1.3.4 Probability Based Statistics . 11 1.3.5 wOBA . 12 1.3.6 WAR . 12 1.3.7 Projection Systems . 13 2 Aggregated Baseball Database 15 2.1 Data Sources . 16 2.1.1 Retrosheet . 16 2.1.2 MLB.com . 17 2.1.3 PECOTA . 17 2.1.4 CBS Sports . 17 2.2 Table Structure . 17 2.2.1 Game Logs . 17 2.2.2 Play-by-Play . 17 2.2.3 Starting Lineups . 18 2.2.4 Team Schedules . 18 2.2.5 General Team Information . 18 2.2.6 Player - Game Participation . 18 2.2.7 Roster by Game . 18 2.2.8 Seasonal Rosters . 18 2.2.9 General Team Statistics . 18 2.2.10 Player and Team Specific Statistics Tables . 19 2.2.11 PECOTA Batting and Pitching . 20 2.2.12 Game State Counts by Year . 20 2.2.13 Game State Counts . 20 1 CONTENTS 2 2.3 Conclusion . 20 3 Cluster Luck 21 3.1 Quantifying Cluster Luck . 22 3.2 Circumventing Cluster Luck with Total Bases . -
Regimes in Baseball Players' Career Data
Data Min Knowl Disc (2017) 31:1580–1621 DOI 10.1007/s10618-017-0510-5 Regimes in baseball players’ career data Marcus Bendtsen1 Received: 28 February 2016 / Accepted: 12 May 2017 / Published online: 27 May 2017 © The Author(s) 2017. This article is an open access publication Abstract In this paper we investigate how we can use gated Bayesian networks, a type of probabilistic graphical model, to represent regimes in baseball players’ career data. We find that baseball players do indeed go through different regimes throughout their career, where each regime can be associated with a certain level of performance. We show that some of the transitions between regimes happen in conjunction with major events in the players’ career, such as being traded or injured, but that some transitions cannot be explained by such events. The resulting model is a tool for managers and coaches that can be used to identify where transitions have occurred, as well as an online monitoring tool to detect which regime the player currently is in. 1 Introduction Baseball has been the de facto national sport of the United States since the late nine- teenth century, and its popularity has spread to Central and South America as well as the Caribbean and East Asia. It has inspired numerous movies and other works of arts, and has also influenced the English language with expressions such as “hitting it out of the park” and “covering one’s bases”, as well as being the origin of the concept of a rain-check. As we all are aware, the omnipresent baseball cap is worn across the world.