Predicting NCAAB Match Outcomes Using ML Techniques – Some Results and Lessons Learned
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Incorporating the Effects of Designated Hitters in the Pythagorean Expectation
Abstract The Pythagorean Expectation is widely used in the field of sabermetrics to estimate a baseball team’s overall season winning percentage based on the number of runs scored and allowed in its games thus far. Bill James devised the simplest version RS 2 p q of the formula through empirical observation as W inning P ercentage RS 2 RA 2 “ p q `p q where RS and RA are runs scored and allowed, respectively. Statisticians later found 1.83 to be a more accurate exponent, estimating overall season wins within 3-4 games per season. Steven Miller provided a theoretical justification for the Pythagorean Expectation by modeling runs scored and allowed as independent continuous random variables drawn from Weibull distributions. This paper aims to first explain Miller’s methodology using recent data and then build upon Miller’s work by incorporating the e↵ects of designated hitters, specifically on the distribution of runs scored by a team. Past studies have attempted to include other e↵ects on run production such as ballpark factor, game state, and pitching power. The results indicate that incorporating information on designated hitters does not improve the error of the Pythagorean Expectation to better than 3-4 games per season. ii Contents Abstract ii Acknowledgements vi 1 Background 1 1.1 Empirical Derivation ........................... 2 1.2 Weibull Distribution ........................... 2 1.3 Application to Other Sports ....................... 4 2 Miller’s Model 5 2.1 Model Assumptions ............................ 5 2.1.1 Continuity of the Data ...................... 6 2.1.2 Independence of Runs Scored and Allowed ........... 7 2.2 Pythagorean Won-Loss Formula .................... -
Machine Learning Applications in Baseball: a Systematic Literature Review
This is an Accepted Manuscript of an article published by Taylor & Francis in Applied Artificial Intelligence on February 26 2018, available online: https://doi.org/10.1080/08839514.2018.1442991 Machine Learning Applications in Baseball: A Systematic Literature Review Kaan Koseler ([email protected]) and Matthew Stephan* ([email protected]) Miami University Department of Computer Science and Software Engineering 205 Benton Hall 510 E. High St. Oxford, OH 45056 Abstract Statistical analysis of baseball has long been popular, albeit only in limited capacity until relatively recently. In particular, analysts can now apply machine learning algorithms to large baseball data sets to derive meaningful insights into player and team performance. In the interest of stimulating new research and serving as a go-to resource for academic and industrial analysts, we perform a systematic literature review of machine learning applications in baseball analytics. The approaches employed in literature fall mainly under three problem class umbrellas: Regression, Binary Classification, and Multiclass Classification. We categorize these approaches, provide our insights on possible future ap- plications, and conclude with a summary our findings. We find two algorithms dominate the literature: 1) Support Vector Machines for classification problems and 2) k-Nearest Neighbors for both classification and Regression problems. We postulate that recent pro- liferation of neural networks in general machine learning research will soon carry over into baseball analytics. keywords: baseball, machine learning, systematic literature review, classification, regres- sion 1 Introduction Baseball analytics has experienced tremendous growth in the past two decades. Often referred to as \sabermetrics", a term popularized by Bill James, it has become a critical part of professional baseball leagues worldwide (Costa, Huber, and Saccoman 2007; James 1987). -
Loss Aversion and the Contract Year Effect in The
Gaming the System: Loss Aversion and the Contract Year Effect in the NBA By Ezekiel Shields Wald, UCSB 2/20/2016 Advisor: Professor Peter Kuhn, Ph.D. Abstract The contract year effect, which involves professional athletes strategically adjusting their effort levels to perform more effectively during the final year of a guaranteed contract, has been well documented in professional sports. I examine two types of heterogeneity in the National Basketball Association, a player’s value on the court relative to their salary, and the presence of several contract options that can be included in an NBA contract. Loss aversion suggests that players who are being paid more than they are worth may use their current salaries as a reference point, and be motivated to improve their performance in order to avoid a “loss” of wealth. The presence of contract options impacts the return to effort that the players are facing in their contract season, and can eliminate the contract year effect. I use a linear regression with player, year and team fixed effects to evaluate the impact of a contract year on relevant performance metrics, and find compelling evidence for a general contract year effect. I also develop a general empirical model of the contract year effect given loss aversion, which is absent from previous literature. The results of this study support loss-aversion as a primary motivator of the contract year effect, as only players who are marginally overvalued show a significant contract year effect. The presence of a team option in a player’s contract entirely eliminates any contract year effects they may otherwise show. -
Exploring Chance in NCAA Basketball
Exploring chance in NCAA basketball Albrecht Zimmermann [email protected] INSA Lyon Abstract. There seems to be an upper limit to predicting the outcome of matches in (semi-)professional sports. Recent work has proposed that this is due to chance and attempts have been made to simulate the distribution of win percentages to identify the most likely proportion of matches decided by chance. We argue that the approach that has been chosen so far makes some simplifying assumptions that cause its result to be of limited practical value. Instead, we propose to use clustering of statistical team profiles and observed scheduling information to derive limits on the predictive accuracy for particular seasons, which can be used to assess the performance of predictive models on those seasons. We show that the resulting simulated distributions are much closer to the observed distributions and give higher assessments of chance and tighter limits on predictive accuracy. 1 Introduction In our last work on the topic of NCAA basketball [7], we speculated about the existence of a “glass ceiling” in (semi-)professional sports match outcome prediction, noting that season-long accuracies in the mid-seventies seemed to be the best that could be achieved for college basketball, with similar results for other sports. One possible explanation for this phenomenon is that we are lacking the attributes to properly describe sports teams, having difficulties to capture player experience or synergies, for instance. While we still intend to explore this direction in future work,1 we consider a different question in this paper: the influence of chance on match outcomes. -
Predicting Outcomes of NCAA Basketball Tournament Games
Cripe 1 Predicting Outcomes of NCAA Basketball Tournament Games Aaron Cripe March 12, 2008 Math Senior Project Cripe 2 Introduction All my life I have really enjoyed watching college basketball. When I was younger my favorite month of the year was March, solely because of the fact that the NCAA Division I Basketball Tournament took place in March. I would look forward to filling out a bracket every year to see how well I could predict the winning teams. I quickly realized that I was not an expert on predicting the results of the tournament games. Then I started wondering if anyone was really an expert in terms of predicting the results of the tournament games. For my project I decided to find out, or at least compare some of the top rating systems and see which one is most accurate in predicting the winner of the each game in the Men’s Basketball Division I Tournament. For my project I compared five rating systems (Massey, Pomeroy, Sagarin, RPI) with the actual tournament seedings. I compared these systems by looking at the pre-tournament ratings and the tournament results for 2004 through 2007. The goal of my project was to determine which, if any, of these systems is the best predictor of the winning team in the tournament games. Project Goals Each system that I compared gave a rating to every team in the tournament. For my project I looked at each game and then compared the two team’s ratings. In most cases the two teams had a different rating; however there were a couple of games where the two teams had the same rating which I will address later. -
Riding a Probabilistic Support Vector Machine to the Stanley Cup
J. Quant. Anal. Sports 2015; 11(4): 205–218 Simon Demers* Riding a probabilistic support vector machine to the Stanley Cup DOI 10.1515/jqas-2014-0093 elimination rounds. Although predicting post-season playoff outcomes and identifying factors that increase Abstract: The predictive performance of various team the likelihood of playoff success are central objectives of metrics is compared in the context of 105 best-of-seven sports analytics, few research results have been published national hockey league (NHL) playoff series that took place on team performance during NHL playoff series. This has between 2008 and 2014 inclusively. This analysis provides left an important knowledge gap, especially in the post- renewed support for traditional box score statistics such lockout, new-rules era of the NHL. This knowledge gap is as goal differential, especially in the form of Pythagorean especially deplorable because playoff success is pivotal expectations. A parsimonious relevance vector machine for fans, team members and team owners alike, often both (RVM) learning approach is compared with the more com- emotionally and economically (Vrooman 2012). A better mon support vector machine (SVM) algorithm. Despite the understanding of playoff success has the potential to potential of the RVM approach, the SVM algorithm proved deliver new insights for researchers studying sports eco- to be superior in the context of hockey playoffs. The proba- nomics, competitive balance, home ice advantage, home- bilistic SVM results are used to derive playoff performance away playoff series sequencing, clutch performances and expectations for NHL teams and identify playoff under- player talent. achievers and over-achievers. The results suggest that the In the NHL, the Stanley Cup is granted to the winner Arizona Coyotes and the Carolina Hurricanes can both of the championship after four playoff rounds, each con- be considered Round 2 over-achievers while the Nash- sisting of a best-of-seven series. -
Rebound 18 10 26 #1
October 26, 2018 EPA Docket Center U.S. EPA, Mail Code 28221T 1200 Pennsylvania Ave., NW Washington, DC 20460 Submitted via Regulations.gov Docket: EPA-HQ-OAR-2018-0283 Docket: NHTSA-2018-0067 Re: Joint Comments of Health, Environmental, and Conservation Groups on EPA’s Proposed Rule: The Safer Affordable Fuel-Efficient (SAFE) Vehicles Rule for Model Years 2021–2026 Passenger Cars and Light Trucks, 83 Fed. Reg. 42,986 (Aug. 24, 2018) The Environmental Defense Fund and Union of Concerned Scientists (“Organizations”) hereby submit these joint comments in opposition to the Administrator’s proposal to roll back existing light-duty vehicle greenhouse gas emissions standards. See 83 Fed. Reg. 42,986 (Aug. 24, 2018) (“Rollback” or “Proposal”). These comments discuss certain of the Organizations’ objections to the analysis and application of the rebound effect in the Proposal. Some of the Organizations are also filing separate comment letters to provide more detail and to address additional issues. As described in detail in our comments (attached as Appendix A), the agencies’ analysis and application of the rebound effect is fatally flawed. The agencies fail to acknowledge or defend their changes in position. Specifically, they mischaracterize their own analysis of the appropriate rebound rate in their 2010 and 2012 final rules, and completely fail to mention the analyses in the 2016 Draft TAR and EPA’s 2016 Final TSD. The agencies also utilize unweighted, average 1 values of the studies they consider, contrary to prior acknowledgement that such a methodology is unreasonable and inadequate. Moreover, the agencies do not acknowledge that they previously considered 13 of the 16 studies listed in PRIA Table 8-8 and concluded that those studies supported the agencies’ previously adopted value for the rebound effect of 10%. -
Sports Analytics from a to Z
i Table of Contents About Victor Holman .................................................................................................................................... 1 About This Book ............................................................................................................................................ 2 Introduction to Analytic Methods................................................................................................................. 3 Sports Analytics Maturity Model .................................................................................................................. 4 Sports Analytics Maturity Model Phases .................................................................................................. 4 Sports Analytics Key Success Areas ........................................................................................................... 5 Allocative and Dynamic Efficiency ................................................................................................................ 7 Optimal Strategy in Basketball .................................................................................................................. 7 Backwards Selection Regression ................................................................................................................... 9 Competition between Sports Hurts TV Ratings: How to Shift League Calendars to Optimize Viewership ................................................................................................................................................................. -
Predicting College Basketball Match Outcomes Using Machine Learning
Predicting NCAAB match outcomes using ML techniques – some results and lessons learned Zifan Shi, Sruthi Moorthy, Albrecht Zimmermann⋆ KU Leuven, Belgium Abstract. Most existing work on predicting NCAAB matches has been developed in a statistical context. Trusting the capabilities of ML tech- niques, particularly classification learners, to uncover the importance of features and learn their relationships, we evaluated a number of different paradigms on this task. In this paper, we summarize our work, pointing out that attributes seem to be more important than models, and that there seems to be an upper limit to predictive quality. 1 Introduction Predicting the outcome of contests in organized sports can be attractive for a number of reasons such as betting on those outcomes, whether in organized sports betting or informally with colleagues and friends, or simply to stimulate conversations about who “should have won”. We would assume that this task is easier in professional leagues, such as Major League Baseball (MLB), the National Basketball Association (NBA), or the National Football Association (NFL), since there are only relatively few teams and their quality does not vary too widely. As an effect of this, match statistics should be meaningful early on since the competition is strong, and teams play the same opponents frequently. Additionally, professional leagues typically play more matches per team per season, e.g. 82 in the NBA or 162 in MLB, than in college or amateur leagues in which the sport in question is (supposed to be) only -
How to Win Your Ncaa Tournament Pool
HOW TO WIN YOUR NCAA TOURNAMENT POOL All Rights Reserved Copyright © 2015 The Power Rank, Inc. All Rights Reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the publisher, except in the case of brief quotations embodied in critical reviews and certain other noncommercial uses permitted by copyright law. For permission requests, write to the publisher, addressed “Attention: Permissions Coordinator,” at the address below. The Power Rank, Inc. 2390 Adare Road Ann Arbor, MI 48104 Legal Disclaimer: Tips, predictions, and strategy published in this book are only the opinions of the author and for entertainment purposes only. They are not defnite predictions or guaranteed strategies. The material contained herein is intended to inform and educate the reader and in no way represents an inducement to gamble legally or illegally. Betting is illegal in some jurisdictions. It is the sole responsibility of the reader to act in accordance with their local laws. Printed in the United States of America First Printing, 2015 HOW TO WIN YOUR NCAA TOURNAMENT POOL Can analytics predict the tourney? You want to win your March Madness pool. Every March, your friend sends that email about the pool for the NCAA men's basketball tournament. Winning the pool offers not only a significant pot of money but also bragging rights for a year. However, picking a bracket is challenging. Which 12 seed might pull off the upset over a 5 seed this year? Is there a 15 seed that will terminate a 2 seed in the round of 64? (Later, I'll discuss how neither of these matter much for winning your pool.) Unless you cover college basketball for a living, you don't have the time to know much about all 60-some college basketball teams that make the tourney. -
Modeling Basketball Games As Alternating Renewal-Reward Processes and Predicting Match Outcomes a Thesis Submitted to the Department of Mathematics for Honors
Modeling Basketball Games as Alternating Renewal-Reward Processes and Predicting Match Outcomes A thesis submitted to the Department of Mathematics for honors Conrad De Peuter April 22, 2013 Duke University, Durham NC Abstract We fit an Alternating Renewal-Reward Process model to basketball games and use the models to predict the outcomes of 1209 games in the 2012-2013 National Basketball Association (NBA) season. Using data collected from NBC play-by-play pages we fit various models for each team’s renewal process (time of possession) and reward process (points per possession). Using these estimated distributions we simulate the outcome of each of the 1209 games. We introduce four seperate models to predict these games. To evaulate our models we compared their predictions with that of other commonly used methods such as team record, Pythagorean Win Percentage, and Bookmaker odds. The research suggests that an Alternating Renewal-Reward Process is an appropriate fit to a basketball game. 1 Introduction Possession based statistics have been a cornerstone of most advanced statistical analysis of the National Basketball Association (NBA) in the past decade. Although possessions are not an official NBA statistic, their use is considered essential in advanced statistical analysis. A possession starts when a team gains control of the ball, and ends when the team loses control of the ball [9]. It is important to distinguish between possessions and plays. A play is an individual action to score, and a possession can have multiple plays. For example, a period in which a team shoots and misses, gets an o↵ensive rebound, then shoots and scores has two plays; however, it is only one possession since the opposing team never gained control of the ball. -
Game #5 // Illinois (3-1) Vs. the Citadel (0-3*) *Plays Nov. 19 Nov
2019-20 UNIVERSITY OF ILLINOIS FIGHTING ILLINI BASKETBALL Game #5 // Illinois (3-1) vs. The Citadel (0-3*) *plays Nov. 19 Nov. 20, 2019 // 8 p.m. CT // Champaign, Ill. // State Farm Center (15,544) TV: BTN – Kevin Kugler (Play-By-Play), Stephen Bardo (Analyst) Radio: Illini Sports Network – Brian Barnhart (Play-By-Play), Deon Thomas (Analyst) // Satellite Radio: Sirius/XM – 84, Streaming – 84 ILLINOIS 2019-20 SCHEDULE & RESULTS PROBABLE STARTERS (FROM THE LAST GAME) Pos. No. Name Ht. Wt. Yr. PPG RPG APG Note DATE OPPONENT TIME/RESULT TV/STREAM G 11 Ayo Dosunmu 6-5 185 So. 13.5 3.5 3.8 Wooden & Naismith Watch Lists NOVEMBER (3-1) G 10 Andres Feliz 6-2 195 Sr. 16.0 8.8 3.5 51.3% 2FG & 80.8% FT 5 NICHOLLS W, 78-70 (OT) BTN+ G 1 Trent Frazier 6-2 175 Jr. 9.8 2.5 1.8 Jerry West Watch List 8 at Grand Canyon W, 83-71 WCIX F 15 Giorgi Bezhanishvili 6-9 235 So. 9.0 5.0 1.3 50% FG & 57.1% 3FG 10 at #21 Arizona L, 69-90 PAC12 C 21 Kofi Cockburn 7-0 290 Fr. 14.3 11.5 0.5 3 double-doubles 18 HAWAII W, 66-53 ESPNU 20 THE CITADEL 8 p.m. BTN OFF THE BENCH 23 HAMPTON 7 p.m. BTN+ Pos. No. Name Ht. Wt. Yr. PPG RPG APG Note 26 LINDENWOOD 7 p.m. BTN+ G 0 Alan Griffin 6-5 195 So. 6.8 4.0 0.3 Leads team w/ 5 threes DECEMBER (0-0) F 2 Kipper Nichols 6-6 220 r-Sr.