March Madness Model Meta-Analysis
Recommended publications
-
Incorporating the Effects of Designated Hitters in the Pythagorean Expectation
Abstract The Pythagorean Expectation is widely used in the field of sabermetrics to estimate a baseball team’s overall season winning percentage based on the number of runs scored and allowed in its games thus far. Bill James devised the simplest version of the formula through empirical observation as $\text{Winning Percentage} = \frac{(RS)^2}{(RS)^2 + (RA)^2}$, where RS and RA are runs scored and allowed, respectively. Statisticians later found 1.83 to be a more accurate exponent, estimating overall season wins within 3-4 games per season. Steven Miller provided a theoretical justification for the Pythagorean Expectation by modeling runs scored and allowed as independent continuous random variables drawn from Weibull distributions. This paper aims to first explain Miller’s methodology using recent data and then build upon Miller’s work by incorporating the effects of designated hitters, specifically on the distribution of runs scored by a team. Past studies have attempted to include other effects on run production such as ballpark factor, game state, and pitching power. The results indicate that incorporating information on designated hitters does not improve the error of the Pythagorean Expectation to better than 3-4 games per season.
Contents: Abstract; Acknowledgements; 1 Background; 1.1 Empirical Derivation; 1.2 Weibull Distribution; 1.3 Application to Other Sports; 2 Miller’s Model; 2.1 Model Assumptions; 2.1.1 Continuity of the Data; 2.1.2 Independence of Runs Scored and Allowed; 2.2 Pythagorean Won-Loss Formula -
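The formula in the abstract above can be tried out with a few lines of Python. This is only a minimal sketch: the function names, the default of 162 games, and the sample run totals are illustrative assumptions, not taken from the paper. The exponents 2 and 1.83 are the two values the abstract mentions.

```python
def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """Estimated winning percentage RS^e / (RS^e + RA^e)."""
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    if rs + ra == 0:
        raise ValueError("at least one run total must be positive")
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=1.83):
    """Scale the estimated winning percentage to a season of `games` games.

    The 162-game default is an assumption made for this illustration.
    """
    return games * pythagorean_expectation(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 runs allowed.
    print(pythagorean_expectation(800, 700))        # exponent 2
    print(pythagorean_expectation(800, 700, 1.83))  # exponent 1.83
    print(expected_wins(800, 700))
```

For a team that outscores its opponents, the smaller exponent pulls the estimate toward .500, so the two versions differ most for very strong or very weak teams.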
Introduction Predictive Vs. Earned Ranking Methods
An overview of some methods for ranking sports teams
Soren P. Sorensen, University of Tennessee, Knoxville, TN 38996-1200
Introduction
The purpose of this report is to argue for an open system for ranking sports teams, to review the history of ranking systems, and to document a particular open method for ranking sports teams against each other. To do this, mathematics is used extensively, which might make the text more difficult to read but ensures that the method is well documented and reproducible by others who might want to use it or derive another ranking method from it. The report is, on the other hand, also more detailed than a "typical" scientific paper and discusses details that would be omitted from a scientific paper intended for publication. We will in this report focus on NCAA 1-A football, but the methods described here are very general and can be applied to most other sports with only minor modifications.
Predictive vs. Earned Ranking Methods
In general, most ranking systems fall into one of two categories: predictive or earned rankings. The goal of an earned ranking is to rank the teams according to their past performance in the season in order to provide a method for selecting either a champ or a set of teams that should participate in a playoff (or bowl games). The goal of a predictive ranking method, on the other hand, is to provide the best possible prediction of the outcome of a future game between two teams. In an earned system, objective and well-publicized criteria should be used to rank the teams, such as who won, the score difference, or a combination of both. -
Chapter Two Massey’s Method
Chapter Two Massey’s Method
The Bowl Championship Series (BCS) is a rating system for NCAA college football that was designed to determine which teams are invited to play in which bowl games. The BCS has become famous, and perhaps notorious, for the ratings it generates for each team in the NCAA. These ratings are assembled from two sources, humans and computers. Human input comes from the opinions of coaches and media. Computer input comes from six computer and mathematical models; details are given in the aside on page 17. The BCS ratings for the 2001 and 2003 seasons are known as particularly controversial among sports fans and analysts. The flaws in the BCS selection system as opposed to a tournament playoff are familiar to most, including the President of the United States; read the aside on page 19.
Initial Massey Rating Method
In 1997, Kenneth Massey, then an undergraduate at Bluefield College, created a method for ranking college football teams. He wrote about this method, which uses the mathematical theory of least squares, as his honors thesis [52]. In this book we refer to this as the Massey method, despite the fact that there are actually several other methods attributed to Ken Massey. Massey has since become a mathematics professor at Carson-Newman College and today continues to refine his sports ranking models. Professor Massey has created various rating methods, one of which is used by the Bowl Championship Series (or BCS) system to select NCAA football bowl matchups; see the aside on page 17. The Colley method, described in the next chapter, is also one of the six computer rating systems used by the BCS. -
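The excerpt above says only that Massey's method rests on least squares; it does not give the equations. The sketch below therefore assumes one common formulation rather than quoting the book: each game contributes an equation "winner rating minus loser rating equals margin of victory", the normal equations of that system are formed, and one equation is replaced by the constraint that the ratings sum to zero so the system has a unique solution. The team names, the game list, and the function names are hypothetical, and the small Gaussian-elimination solver is included only to keep the example free of external libraries.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (is the schedule connected?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def massey_ratings(teams, games):
    """Least-squares ratings from (winner, loser, margin) tuples."""
    n = len(teams)
    index = {team: i for i, team in enumerate(teams)}
    m = [[0.0] * n for _ in range(n)]  # normal-equation matrix
    p = [0.0] * n                      # cumulative point differentials
    for winner, loser, margin in games:
        i, j = index[winner], index[loser]
        m[i][i] += 1
        m[j][j] += 1
        m[i][j] -= 1
        m[j][i] -= 1
        p[i] += margin
        p[j] -= margin
    # The matrix is singular, so swap the last equation for sum(r) = 0.
    m[-1] = [1.0] * n
    p[-1] = 0.0
    return dict(zip(teams, solve_linear_system(m, p)))


if __name__ == "__main__":
    # Hypothetical three-team season.
    games = [("A", "B", 10), ("B", "C", 5), ("A", "C", 15)]
    print(massey_ratings(["A", "B", "C"], games))
```

In this toy season the margins are mutually consistent, so the fitted rating differences reproduce every margin exactly; with real schedules the least-squares fit only approximates them.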
Quantifying the Influence of Deviations in Past NFL Standings on the Present
Can Losing Mean Winning in the NFL? Quantifying the Influence of Deviations in Past NFL Standings on the Present
The Harvard community has made this article openly available.
Citation: MacPhee, William. 2020. Can Losing Mean Winning in the NFL? Quantifying the Influence of Deviations in Past NFL Standings on the Present. Bachelor's thesis, Harvard College.
Citable link: https://nrs.harvard.edu/URN-3:HUL.INSTREPOS:37364661
Terms of Use: This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA
A thesis presented by William MacPhee to Applied Mathematics in partial fulfillment of the honors requirements for the degree of Bachelor of Arts, Harvard College, Cambridge, Massachusetts, November 15, 2019
Abstract Although plenty of research has studied competitiveness and re-distribution in professional sports leagues from a correlational perspective, the literature fails to provide evidence arguing causal mechanisms. This thesis aims to isolate these causal mechanisms within the National Football League (NFL) for four treatments in past seasons: win total, playoff level reached, playoff seed attained, and endowment obtained for the upcoming player selection draft. Causal inference is made possible due to employment of instrumental variables relating to random components of wins (both in the regular season and in the postseason) and the differential impact of tiebreaking metrics on teams in certain ties and teams not in such ties. -
Machine Learning Applications in Baseball: a Systematic Literature Review
This is an Accepted Manuscript of an article published by Taylor & Francis in Applied Artificial Intelligence on February 26 2018, available online: https://doi.org/10.1080/08839514.2018.1442991
Machine Learning Applications in Baseball: A Systematic Literature Review
Kaan Koseler and Matthew Stephan*, Miami University, Department of Computer Science and Software Engineering, 205 Benton Hall, 510 E. High St., Oxford, OH 45056
Abstract Statistical analysis of baseball has long been popular, albeit only in limited capacity until relatively recently. In particular, analysts can now apply machine learning algorithms to large baseball data sets to derive meaningful insights into player and team performance. In the interest of stimulating new research and serving as a go-to resource for academic and industrial analysts, we perform a systematic literature review of machine learning applications in baseball analytics. The approaches employed in the literature fall mainly under three problem class umbrellas: Regression, Binary Classification, and Multiclass Classification. We categorize these approaches, provide our insights on possible future applications, and conclude with a summary of our findings. We find two algorithms dominate the literature: 1) Support Vector Machines for classification problems and 2) k-Nearest Neighbors for both classification and regression problems. We postulate that the recent proliferation of neural networks in general machine learning research will soon carry over into baseball analytics.
keywords: baseball, machine learning, systematic literature review, classification, regression
1 Introduction Baseball analytics has experienced tremendous growth in the past two decades. Often referred to as “sabermetrics”, a term popularized by Bill James, it has become a critical part of professional baseball leagues worldwide (Costa, Huber, and Saccoman 2007; James 1987). -
Making It Pay to Be a Fan: the Political Economy of Digital Sports Fandom and the Sports Media Industry
City University of New York (CUNY) CUNY Academic Works
All Dissertations, Theses, and Capstone Projects, 9-2018
Making It Pay to be a Fan: The Political Economy of Digital Sports Fandom and the Sports Media Industry
Andrew McKinney, The Graduate Center, City University of New York
More information about this work at: https://academicworks.cuny.edu/gc_etds/2800
This work is made publicly available by the City University of New York (CUNY).
A dissertation submitted to the Graduate Faculty in Sociology in partial fulfillment of the requirements for the degree of Doctor of Philosophy, The City University of New York, 2018. ©2018 ANDREW G MCKINNEY All Rights Reserved
This manuscript has been read and accepted for the Graduate Faculty in Sociology in satisfaction of the dissertation requirement for the degree of Doctor of Philosophy. Chair of Examining Committee: William Kornblum. Executive Officer: Lynn Chancer. Supervisory Committee: William Kornblum, Stanley Aronowitz, Lynn Chancer.
ABSTRACT Making it Pay to be a Fan: The Political Economy of Digital Sport Fandom and the Sports Media Industry, by Andrew G McKinney. Advisor: William Kornblum. This dissertation is a series of case studies and sociological examinations of the role that the sports media industry and mediated sport fandom plays in the political economy of the Internet. -
Sports Analytics Algorithms for Performance Prediction
Sports Analytics Algorithms for Performance Prediction
Paschalis Koudoumas, SID: 3308190012
SCHOOL OF SCIENCE & TECHNOLOGY
A thesis submitted for the degree of Master of Science (MSc) in Data Science, JANUARY 2021, THESSALONIKI – GREECE
Supervisor: Assoc. Prof. Christos Tjortjis. Supervising Committee Members: Assoc. Prof. Maria Drakaki, Dr. Leonidas Akritidis
Abstract This dissertation was written as a part of the MSc in Data Science at the International Hellenic University. Sports Analytics has existed as a term and concept for many years, but nowadays it is implemented in a different way that affects how teams, players, managers, executives, betting companies and fans perceive statistics and sports. Machine Learning can have various applications in Sports Analytics. The most widely used are the prediction of match outcome, player or team performance, and the market value of a player, as well as injury prevention. This dissertation focuses on the quintessence of football, which is match outcome prediction. The main objective of this dissertation is to explore, develop and evaluate machine learning predictive models for English Premier League matches’ outcome prediction. A comparison was made between XGBoost Classifier, Logistic Regression and Support Vector Classifier. The results show that the XGBoost model can outperform the other models in terms of accuracy and prove that it is possible to achieve quite high accuracy using Extreme Gradient Boosting.
Acknowledgements At this point, I would like to thank my Supervisor, Professor Christos Tjortjis, for offering his help throughout the process and providing me with essential feedback and valuable suggestions to the issues that occurred. -
Loss Aversion and the Contract Year Effect in The
Gaming the System: Loss Aversion and the Contract Year Effect in the NBA By Ezekiel Shields Wald, UCSB 2/20/2016 Advisor: Professor Peter Kuhn, Ph.D. Abstract The contract year effect, which involves professional athletes strategically adjusting their effort levels to perform more effectively during the final year of a guaranteed contract, has been well documented in professional sports. I examine two types of heterogeneity in the National Basketball Association, a player’s value on the court relative to their salary, and the presence of several contract options that can be included in an NBA contract. Loss aversion suggests that players who are being paid more than they are worth may use their current salaries as a reference point, and be motivated to improve their performance in order to avoid a “loss” of wealth. The presence of contract options impacts the return to effort that the players are facing in their contract season, and can eliminate the contract year effect. I use a linear regression with player, year and team fixed effects to evaluate the impact of a contract year on relevant performance metrics, and find compelling evidence for a general contract year effect. I also develop a general empirical model of the contract year effect given loss aversion, which is absent from previous literature. The results of this study support loss-aversion as a primary motivator of the contract year effect, as only players who are marginally overvalued show a significant contract year effect. The presence of a team option in a player’s contract entirely eliminates any contract year effects they may otherwise show. -
Exploring Chance in NCAA Basketball
Exploring chance in NCAA basketball
Albrecht Zimmermann, INSA Lyon
Abstract. There seems to be an upper limit to predicting the outcome of matches in (semi-)professional sports. Recent work has proposed that this is due to chance and attempts have been made to simulate the distribution of win percentages to identify the most likely proportion of matches decided by chance. We argue that the approach that has been chosen so far makes some simplifying assumptions that cause its result to be of limited practical value. Instead, we propose to use clustering of statistical team profiles and observed scheduling information to derive limits on the predictive accuracy for particular seasons, which can be used to assess the performance of predictive models on those seasons. We show that the resulting simulated distributions are much closer to the observed distributions and give higher assessments of chance and tighter limits on predictive accuracy.
1 Introduction In our last work on the topic of NCAA basketball [7], we speculated about the existence of a “glass ceiling” in (semi-)professional sports match outcome prediction, noting that season-long accuracies in the mid-seventies seemed to be the best that could be achieved for college basketball, with similar results for other sports. One possible explanation for this phenomenon is that we are lacking the attributes to properly describe sports teams, having difficulty capturing player experience or synergies, for instance. While we still intend to explore this direction in future work, we consider a different question in this paper: the influence of chance on match outcomes. -
Predicting Outcomes of NCAA Basketball Tournament Games
Predicting Outcomes of NCAA Basketball Tournament Games
Aaron Cripe, March 12, 2008, Math Senior Project
Introduction
All my life I have really enjoyed watching college basketball. When I was younger my favorite month of the year was March, solely because the NCAA Division I Basketball Tournament took place in March. I would look forward to filling out a bracket every year to see how well I could predict the winning teams. I quickly realized that I was not an expert on predicting the results of the tournament games. Then I started wondering if anyone was really an expert in terms of predicting the results of the tournament games. For my project I decided to find out, or at least compare some of the top rating systems and see which one is most accurate in predicting the winner of each game in the Men’s Basketball Division I Tournament. For my project I compared five rating systems (Massey, Pomeroy, Sagarin, RPI) with the actual tournament seedings. I compared these systems by looking at the pre-tournament ratings and the tournament results for 2004 through 2007. The goal of my project was to determine which, if any, of these systems is the best predictor of the winning team in the tournament games.
Project Goals
Each system that I compared gave a rating to every team in the tournament. For my project I looked at each game and then compared the two teams’ ratings. In most cases the two teams had different ratings; however, there were a couple of games where the two teams had the same rating, which I will address later. -
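The comparison described above (look at each game, compare the two teams' ratings, see whether the better-rated team won) can be sketched as follows. The excerpt does not say how games between equally rated teams are handled, so this sketch simply sets them aside and reports how many there were; that choice, the function name, and all team names and numbers are assumptions made for illustration.

```python
def prediction_accuracy(ratings, games, higher_is_better=True):
    """Share of games won by the better-rated team.

    ratings: dict mapping team -> rating (or seed)
    games:   list of (winner, loser) pairs
    Games between equally rated teams are excluded from the accuracy
    and counted separately (an assumption of this sketch).
    Returns (accuracy or None, number_of_tied_games).
    """
    correct = decided = ties = 0
    for winner, loser in games:
        rw, rl = ratings[winner], ratings[loser]
        if rw == rl:
            ties += 1
            continue
        decided += 1
        winner_was_favored = rw > rl if higher_is_better else rw < rl
        if winner_was_favored:
            correct += 1
    accuracy = correct / decided if decided else None
    return accuracy, ties


if __name__ == "__main__":
    # Hypothetical four-team example.
    games = [("A", "D"), ("D", "B"), ("B", "C"), ("A", "B")]
    system = {"A": 90, "B": 85, "C": 85, "D": 70}   # higher is better
    seeds = {"A": 1, "B": 2, "C": 3, "D": 4}        # lower is better
    print(prediction_accuracy(system, games))
    print(prediction_accuracy(seeds, games, higher_is_better=False))
```

Seeds are passed with `higher_is_better=False` because a lower seed number marks the stronger team, which lets seedings be scored on the same footing as the rating systems.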
Open Evan Bittner Thesis.Pdf
THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE DEPARTMENT OF STATISTICS
PREDICTING MAJOR LEAGUE BASEBALL PLAYOFF PROBABILITIES USING LOGISTIC REGRESSION
EVAN J. BITTNER, FALL 2015
A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Statistics with honors in Statistics
Reviewed and approved* by the following: Andrew Wiesner, Lecturer of Statistics, Thesis Supervisor; Murali Haran, Associate Professor of Statistics, Honors Adviser
* Signatures are on file in the Schreyer Honors College.
ABSTRACT Major League Baseball teams are constantly assessing whether or not they think their teams will make the playoffs. Many sources publish playoff probabilities or odds throughout the season using advanced statistical methods. These methods are somewhat secretive and typically advanced and difficult to understand. The goal of this work is to determine a way to calculate playoff probabilities midseason that can easily be understood and applied. The goal is to develop a method and compare its predictive accuracy to the current methods published by statistical baseball sources such as Baseball Prospectus and Fangraphs.
TABLE OF CONTENTS: List of Figures; List of Tables; Acknowledgements -
Riding a Probabilistic Support Vector Machine to the Stanley Cup
J. Quant. Anal. Sports 2015; 11(4): 205–218
Simon Demers*
Riding a probabilistic support vector machine to the Stanley Cup
DOI 10.1515/jqas-2014-0093
Abstract: The predictive performance of various team metrics is compared in the context of 105 best-of-seven national hockey league (NHL) playoff series that took place between 2008 and 2014 inclusively. This analysis provides renewed support for traditional box score statistics such as goal differential, especially in the form of Pythagorean expectations. A parsimonious relevance vector machine (RVM) learning approach is compared with the more common support vector machine (SVM) algorithm. Despite the potential of the RVM approach, the SVM algorithm proved to be superior in the context of hockey playoffs. The probabilistic SVM results are used to derive playoff performance expectations for NHL teams and identify playoff under-achievers and over-achievers. The results suggest that the Arizona Coyotes and the Carolina Hurricanes can both be considered Round 2 over-achievers while the Nash-
[...] elimination rounds. Although predicting post-season playoff outcomes and identifying factors that increase the likelihood of playoff success are central objectives of sports analytics, few research results have been published on team performance during NHL playoff series. This has left an important knowledge gap, especially in the post-lockout, new-rules era of the NHL. This knowledge gap is especially deplorable because playoff success is pivotal for fans, team members and team owners alike, often both emotionally and economically (Vrooman 2012). A better understanding of playoff success has the potential to deliver new insights for researchers studying sports economics, competitive balance, home ice advantage, home-away playoff series sequencing, clutch performances and player talent. In the NHL, the Stanley Cup is granted to the winner of the championship after four playoff rounds, each consisting of a best-of-seven series.