Introduction to the Special Issue on Analytics in Sports, Part I: General Sports Applications


Interfaces, Vol. 42, No. 2, March–April 2012, pp. 105–108
ISSN 0092-2102 (print), ISSN 1526-551X (online)
http://dx.doi.org/10.1287/inte.1120.0633
© 2012 INFORMS

Michael J. Fry
Department of Operations, Business Analytics and Information Systems, University of Cincinnati, Cincinnati, Ohio 45221, [email protected]

Jeffrey W. Ohlmann
Department of Management Sciences, University of Iowa, Iowa City, Iowa 52242, [email protected]

Sports is a rapidly growing application area for analytics. The use of analytics is pervasive in the professional sports community as evidenced by the increased role for those practicing sports analytics in front-office management and coaching. Part I of this Special Issue on Analytics in Sports is devoted to the application of a variety of methodologies to a broad range of sports. The papers analyze golf, hockey, baseball, motorcycle racing, and college football. The methodologies employed draw from many areas of analytics including optimization, probabilistic modeling, and choice models.

Key words: sports; analytics.

Sports are big business. Most estimates place the total market value of spectator sports in the United States in the range of hundreds of billions of dollars. As one paper in this special issue notes, the market size of spectator sports in the United States has been estimated to be double that of the automotive industry and is easily one of the top-10 business markets globally. Given the size and impact of the sports industry, it should not be surprising that this is a fertile application area for operations research (OR) models.

Sports analytics has also been gaining widespread notoriety in the general populace. Michael Lewis’ entertaining story about the use of data analysis in baseball in Moneyball: The Art of Winning an Unfair Game (Lewis 2004) is arguably the most visible account of sports analytics. Many of the strategies documented in Moneyball, which were then employed by the small-market Oakland Athletics team to help it compete with teams with much larger payrolls, have been adopted in some form by many other Major League Baseball (MLB) teams and teams in other sports. Although Moneyball is clearly not the earliest example of applying analytics to baseball—Bill James (James 2001), George Lindsey (Lindsey 1959, Lindsey 1961, Lindsey 1963), and many others preceded the work described in Moneyball—this book was the catalyst for introducing the broader sports community to the potential benefits of quantitative analysis. Moneyball became a best seller; in 2011, a movie of the same name (starring Brad Pitt) achieved considerable box office success.

Recently, a new breed of management has become more prominent in sports organizations. There has been an increase in the number of front office personnel with quantitative training and (or) the appreciation for the power of analytics to help improve the performance of their teams both on the field of play and from a business perspective. Paul DePodesta, one of the central figures in Moneyball, earned an economics degree from Harvard; he has worked in the front offices of several MLB teams and currently is the vice president of player development and scouting for the New York Mets. Daryl Morey, general manager of the Houston Rockets in the National Basketball Association (NBA), has degrees in computer science and statistics from Northwestern University and an MBA from MIT; yet, he has little actual basketball playing experience. Morey has been an active crusader for sports analytics through his involvement with the annual MIT-Sloan Sports Analytics Conference, which brings academics and practitioners together to discuss the latest advances in sports analytics. Another proponent of quantitative methods in the NBA is Mark Cuban, owner of the 2011 NBA Champion Dallas Mavericks. Cuban has worked directly with Wayne Winston, John and Esther Reese Professor of Decision Sciences at Indiana University’s Kelley School of Business, to apply sports analytics concepts to on-court and off-court decisions made by the Dallas Mavericks. As these and other examples show, the sports world is teeming with opportunities for analytics to contribute to sports management.

This special issue of Interfaces seeks to add to the academic literature examining sports applications. Historically, Interfaces has been a leading outlet for sports analytics research, but this is the first special issue of Interfaces dedicated to this topic. Sticking with the overall spirit of the Interfaces journal, we present practice-oriented sports analytics papers. As evidenced by the overwhelming response to the call for papers for this special issue (we received over 40 submissions), many researchers in OR-related fields are working on sports-related applications.

Papers in Part I of This Special Issue

Because of the large number of high-quality submissions to this special issue, we publish it as a double issue. The first issue focuses on general (nonscheduling) sports applications; the second issue focuses solely on sports scheduling, which has emerged as a highly successful application area for OR. As seen by the variety of sports examined in the first issue, the application of analytics is not limited to any particular sport. Three of the papers examine golf applications. Sports analytics applied to golf has experienced increased interest recently because of the wealth of data made available through the PGA TOUR’s ShotLink database (Deason 2006). The other papers in this special issue discuss hockey, baseball, motorcycle racing, and college football.

Although several dedicated sports analytics journals have recently been established, Interfaces has traditionally been a leader in publishing papers on this topic. This, and other findings, are highlighted in “Identifying the ‘Players’ in Sports Analytics Research” by B. Jay Coleman. This paper provides an overview of the sports analytics research landscape: which universities are contributing research on this topic, where the papers are appearing, and which papers are most commonly cited. This paper is similar in spirit to the Rothkopf rankings (Fricker 2011) published by Interfaces, which identifies the universities and organizations that are the leading contributors to OR practice. Coleman shows that sports analytics is a fast-growing research area, outpacing other application areas in OR. The author finds that a relatively small group of researchers works on sport analytics, and that many of these researchers are based outside the United States.

Most team-based professional sports leagues assign newly eligible players to teams through some form of sports draft where each team selects players for its team from the pool of all draft-eligible players. In MLB, this is a particularly difficult task because 1,500 players are selected each year over 50 draft rounds. All these players must be evaluated by the baseball teams, but player evaluations are inherently imprecise. Multiple scouts evaluating the same player often have varying opinions on the player’s projected performance in the major leagues. In “A Major League Baseball Team Uses Operations Research to Improve Draft Preparation,” Streib et al. present an application of OR to improve an MLB team’s process of assessing baseball players. The authors develop an optimization tool that helps a team come to a consensus on rankings for draft-eligible players. The algorithm is based on voter-preference research, including the classic Kemeny-Young method; however, the size of the problem faced by MLB teams requires new methods and solution approaches. The authors’ tool has been used by one MLB team for several years, and other MLB teams have expressed interest in developing a similar tool. This paper presents an excellent example of using analytics to better utilize the invaluable subject matter expertise of personnel (scouts in this case).

The goal in team sports is to create an optimal lineup of players at different positions to defeat the opposing team. However, this problem is difficult because different player positions require different skills and a collection of the best individual players does not necessarily produce the best lineup because of interaction effects. In “Quantifying the Contribution of NHL Player Types to Team Performance,” Chan et al. examine the question of how to determine the optimal player lineups in ice hockey by using a clustering technique to define distinct player types for offense, defense, and goalies. These clusters are used in regression models to determine the relationship between team performance and the clustered player types. The authors show how the results from their models can be used to analyze trades in the National Hockey League, and they provide an Excel-based tool [...]

[...] academic papers and contradicts the commonly held belief characterized by the oft-repeated quote from golfer Bobby Locke that golfers “drive for show, but putt for dough” (Riach 2007).

In “A Proposal for Redesign of the FedEx Cup Playoff Series on the PGA TOUR,” Hall and Potts propose a new design for this season-ending event. The FedEx Cup playoff series was established in 2007 to bring the professional golf season to an exciting end, similar to the Super Bowl in the National Football League or the World Series in MLB. The FedEx Cup system is a season-long points competition that ends in a playoff series of four events where the top 125 point-scoring players are reduced each week until 30 players compete in a final 72-hole stroke-play tournament.
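
For readers unfamiliar with the Kemeny-Young method mentioned in connection with the draft-preparation paper, the following is a minimal sketch of the classic idea only, not the tool developed by Streib et al. It assumes that each scout supplies a complete ranking of the same small set of players; the player labels and the function names `kendall_tau_distance` and `kemeny_young` are hypothetical and chosen for illustration. The consensus is the ordering that minimizes the total number of pairwise disagreements with the scouts' rankings.

```python
from itertools import combinations, permutations


def kendall_tau_distance(ranking_a, ranking_b):
    """Count the pairs of players that the two rankings order differently."""
    pos_a = {player: i for i, player in enumerate(ranking_a)}
    pos_b = {player: i for i, player in enumerate(ranking_b)}
    return sum(
        1
        for x, y in combinations(ranking_a, 2)
        if (pos_a[x] < pos_a[y]) != (pos_b[x] < pos_b[y])
    )


def kemeny_young(rankings):
    """Brute-force consensus ranking: try every ordering of the players and
    keep the one with the smallest total distance to all input rankings."""
    players = sorted(rankings[0])
    best, best_score = None, None
    for candidate in permutations(players):
        score = sum(kendall_tau_distance(candidate, r) for r in rankings)
        if best_score is None or score < best_score:
            best, best_score = list(candidate), score
    return best, best_score


# Hypothetical rankings from three scouts (best player first).
scout_rankings = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["B", "A", "C", "D"],
]
consensus, score = kemeny_young(scout_rankings)
print(consensus, score)
```

Because the search enumerates every permutation, this sketch is practical only for a handful of players, which is consistent with the observation above that the size of the problem faced by MLB teams requires new methods and solution approaches.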