Out of Left Field: Evidence For and Against Baseball’s Conventional Wisdom

Olin College
Christopher Joyce
10/4/2012

This project aims to map baseball’s conventional wisdom to facts, and to see whether the things that ‘everybody knows’ about baseball are supported or contradicted by statistical evidence. I found that most baseball conventional wisdom does not line up with the statistics, and in some cases is outright contradicted.

Contents
Background & Dataset
Exploration #1: Walks per Country
Exploration #2: Good-Fielding Shortstops
Exploration #3: Knuckleballers and Control
Conclusions
Bibliography

Background & Dataset

Baseball is a sport with a 140-year-old history, but it’s only in the past 30 years or so that there’s been much focus on analyzing baseball with statistics (an approach called sabermetrics), rather than relying on what the old hands think they know about the game. One of the first books about sabermetrics was written by an engineering professor at Johns Hopkins, Earnshaw Cook, in 1964 [1]. Teams had used statistics before this point, but the practice was somewhat controversial and back-alley, and the idea of baseball statistics wasn’t in the public eye.

For a few decades, there was a back and forth between those who believed in baseball by statistics and those who believed in baseball by feel. This debate was largely settled in the early 2000s, when Moneyball was published [2]. Theo Epstein, the GM of the Boston Red Sox, said, “That book hit The New York Times best-seller list. People who own baseball teams read The New York Times best-seller list. So they started asking questions about the processes their front offices were using, and it changed things really quickly.”

Despite this advance, statistics haven’t fully converted all of baseball’s old guard. While all 30 MLB clubs incorporate statistics into their player evaluations, only 15 to 20 of them rely on statistics “heavily” [2]. This means a third to a half of baseball teams still do things, in large part, by ‘feel’. Statistics have certainly informed what scouts look for, by redefining what a useful player is. For example, on-base percentage wasn’t a metric people looked at heavily as recently as 2002 [2], even though both a walk and a single will get a player on base and in a position to score a run.

It seems pretty clear, given all this, that much of baseball’s conventional wisdom may be downright wrong; however, fans and announcers still insist on what they ‘know’ of baseball, even without any facts [3]. For example, there’s an old baseball adage that ‘You can’t walk your way off the island’. It refers to players from Latin America who swing wildly at pitches, on the theory that it’s harder to get noticed by pro scouts by exercising restraint at the plate and fighting through grueling at-bats than it is by hitting balls way outside the strike zone for home runs [4]. This is something that ‘everybody knows’, but I have never seen any data to support it.

Some of baseball’s conventional wisdom is so pervasive that it’s actually not possible to do a statistical analysis of whether it is true. Lefty catchers are one example.
The perception exists – mostly unsubstantiated – that it’s a crippling disadvantage for a catcher to be left-handed, because they’d have a hard time making a pickoff move to first around a right-handed batter. This feeling is so pervasive that there have only been five left-handed catchers to play in at least 100 games [5]. The last catcher to play as a lefty in the MLB did so in 1989, and as of 2009 there was not a single left-handed catcher in the MLB or the minor leagues [6].

Teams eventually started making their decisions based on data rather than what ‘everyone knows’, in no small part because of the general manager of the Oakland A’s, Billy Beane. Beane was one of the first to make a real effort to go against the grain of what ‘everyone knew’. Rather than just incorporating statistics, where the conventional view disagreed with his math, he’d trust the math [7].

I used Sean Lahman’s compendium of data, which came in .csv format. It includes notable individual statistics dating all the way back to 1870, as well as team records. This dataset is free, easily importable into Python for analysis, and quite complete. There were a few mismatched tags I had to modify, but none of them affected numbers within the dataset. I only use part of the dataset: the player, batting, fielding, and pitching tables. The dataset is ripe for more analysis, but most baseball stereotypes are about individual players, not teams.

Seeing as this dataset came from the internet, I validated it by spot-checking certain player statistics. I checked the values that jumped out at me when reading through summaries I generated of the dataset, and found that they were true, just not expected. For example, I found that an individual (Ed Porray) whose birth country was listed as “A Boat on the Atlantic Ocean” was, in fact, born while at sea.
Given these tests, I feel confident that my dataset accurately represents what occurred in baseball, with some possible human error.

Exploration #1: Walks per Country

One common piece of conventional wisdom was discussed above: “You can’t walk your way off the island”. The stereotype is that players from Latin America, and the Dominican Republic specifically, swing at a lot of balls that aren’t hittable, with the thought that pro scouts don’t notice those who take long at-bats. To test this, I summed all plate appearances and walks by players born in each country, and computed walks per plate appearance for each nation that has sent a man to the majors.

First, I made a histogram of walks by country since 1970. I threw out all pre-1970 data so that this analysis covers only the modern era; it seems like a fairer comparison to look only at seasons after Hispanic players had been playing in the MLB at a high level for a decade or so.

Figure 1: Walks per at-bat, by country.

This chart seems to show that Germans are walking machines, while Afghans actively try not to walk. This comes across as odd right from the get-go, because the spread is so huge. Clearly, something is not quite right; most likely, countries with very few plate appearances are producing extreme rates. My next strategy was to set a minimum number of total plate appearances a country needs in order to be considered. I chose 5,000, because the average starting player will come to bat about 500 times in a season, and ten player-seasons seems like an appropriate lower bound on sample size for comparison.

Figure 2: Walks per at-bat, by country, 5,000 plate appearances per country cutoff.

This graph shows the difference much more clearly. The maximum difference is 4 walks per 100 plate appearances.
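
The per-country tally and the 5,000 plate appearance cutoff described above can be sketched as follows. This is a minimal illustration rather than the code used for the figures. It assumes column names in the style of the Lahman tables (`playerID`, `yearID`, `AB`, `BB`, `HBP`, `SH`, `SF`), approximates plate appearances as AB + BB + HBP + SH + SF, and runs on a few made-up rows instead of the real .csv files.

```python
from collections import defaultdict

def walk_rates_by_country(birth_country, batting_rows, min_year=1970, min_pa=5000):
    """Return {country: walks per plate appearance} for countries with enough data.

    birth_country: dict mapping playerID -> country of birth (assumed layout)
    batting_rows:  iterable of dicts, one per player-season (assumed layout)
    """
    walks = defaultdict(int)
    plate_apps = defaultdict(int)
    for row in batting_rows:
        if row["yearID"] < min_year:
            continue  # drop pre-cutoff seasons
        country = birth_country.get(row["playerID"])
        if country is None:
            continue  # unknown birthplace: skip rather than guess
        # Assumption: PA is approximated as AB + BB + HBP + SH + SF.
        pa = (row["AB"] + row["BB"] + row.get("HBP", 0)
              + row.get("SH", 0) + row.get("SF", 0))
        walks[country] += row["BB"]
        plate_apps[country] += pa
    return {
        country: walks[country] / plate_apps[country]
        for country in plate_apps
        if plate_apps[country] >= min_pa and plate_apps[country] > 0
    }

# Made-up rows, chosen only so the arithmetic is easy to follow.
toy_birth_country = {"a01": "USA", "b01": "D.R.", "c01": "Germany"}
toy_batting = [
    {"playerID": "a01", "yearID": 1985, "AB": 5400, "BB": 600},
    {"playerID": "b01", "yearID": 1990, "AB": 5520, "BB": 480},
    {"playerID": "c01", "yearID": 1995, "AB": 6, "BB": 4},      # tiny sample
    {"playerID": "a01", "yearID": 1965, "AB": 500, "BB": 10},   # before 1970
]
rates = walk_rates_by_country(toy_birth_country, toy_batting)
```

In the toy data the third player walks in 4 of 10 plate appearances, which is the kind of small-sample extreme that the cutoff removes.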
Interestingly, the Dominican Republic does not have the lowest walks-per-plate-appearance figure. It is about one walk in a hundred above Mexico, which has the lowest walks per 100 of any country. The league average is about one walk in a hundred above the DR.

Absolute numbers do indicate a difference in walks by Dominicans versus the league average. It’s hard to create a real test statistic for this data, because my dataset comprises all players who have ever taken a swing in the major leagues, so it is a full population rather than a sample, and extrapolating from it means very little. However, I can reasonably make the claim that, while an absolute difference does exist, it is not one that could reasonably be picked out without statistical analysis of the type done here.

Monte Carlo simulations indicate that the specific difference being analyzed, that of Dominicans versus the league average, has a probability of happening by chance of less than 1 in 10,000, based on ten thousand iterations. Given that, I am willing to call the bias statistically significant; however, that is a separate question from whether the stereotype is reasonably noticeable without statistical analysis.

The thing to note here, with respect to debunking or confirming the stereotype, is not the likelihood of the effect but the size of the effect. If this behavior were being picked out fairly by announcers, they would be more likely to refer to Canadians as walking machines, or to Mexicans as those who never walk, than to single out Dominicans. Those effects are substantially larger: about 2 walks in a hundred off the league average, compared to about 1 in 100 for Dominicans. If an effect twice as strong goes unnoticed, I think it’s reasonable to chalk this one up to confirmation bias: when someone sees a Dominican strike out, they add it as a data point supporting a stereotype they want to believe.
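
The text does not say how the Monte Carlo simulation was set up. One common way to ask "could a gap this large arise by chance?" is a permutation test, sketched below as an assumed stand-in rather than the method actually used. It shuffles which players carry the group label and counts how often the shuffled group falls at least as far below the league rate as the real one. The function names and toy numbers are hypothetical.

```python
import random

def walk_rate(players):
    """Walks per plate appearance for a list of (plate_appearances, walks) pairs."""
    total_pa = sum(pa for pa, _ in players)
    total_bb = sum(bb for _, bb in players)
    return total_bb / total_pa

def permutation_p_value(group, others, iterations=10_000, seed=0):
    """One-sided permutation test: how often does a random group of the same
    size fall at least as far below the league walk rate as the real group?"""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    pooled = list(group) + list(others)
    league_rate = walk_rate(pooled)  # unchanged by shuffling
    observed_gap = league_rate - walk_rate(group)
    at_least_as_large = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        shuffled_group = pooled[:len(group)]
        if league_rate - walk_rate(shuffled_group) >= observed_gap:
            at_least_as_large += 1
    return at_least_as_large / iterations

# Made-up careers as (plate appearances, walks); not real players.
toy_group = [(500, 30)] * 20
toy_others = [(500, 50)] * 200
p_value = permutation_p_value(toy_group, toy_others, iterations=2_000)
```

A result of zero hits in N iterations supports only the statement "less than about 1 in N", which matches how the figure above is phrased.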

Exploration #2: Good-Fielding Shortstops

Another common baseball stereotype is that the better a fielder your shortstop is, the worse a hitter he is. To analyze this, I ran two correlations: one between games per error and batting average, and one between assists per game and batting average. Both of these are good, simple measures of defensive quality. There are more complicated metrics in existence, but I chose to use errors and assists together because, if a shortstop has few errors but few assists as well, he may not actually be a good defensive shortstop; he might just be slow.
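
The two correlations described above can be sketched as follows, assuming one record per shortstop season with Lahman-style fields `G` (games), `E` (errors), `A` (assists), `H` (hits), and `AB` (at-bats). Two choices here are assumptions for illustration only: the use of the Pearson coefficient, and dropping seasons with zero errors, for which games per error is undefined. The sample rows are invented to make the arithmetic easy to check and say nothing about real shortstops.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def shortstop_correlations(seasons):
    """Return (r for games-per-error vs. average, r for assists-per-game vs. average)."""
    # Assumption: skip seasons where a ratio would divide by zero.
    usable = [s for s in seasons if s["E"] > 0 and s["G"] > 0 and s["AB"] > 0]
    batting_avg = [s["H"] / s["AB"] for s in usable]
    games_per_error = [s["G"] / s["E"] for s in usable]
    assists_per_game = [s["A"] / s["G"] for s in usable]
    return pearson(games_per_error, batting_avg), pearson(assists_per_game, batting_avg)

# Invented seasons, built to lie on a straight line.
toy_seasons = [
    {"G": 150, "E": 10, "A": 450, "H": 150, "AB": 600},
    {"G": 150, "E": 15, "A": 420, "H": 168, "AB": 600},
    {"G": 150, "E": 30, "A": 390, "H": 186, "AB": 600},
]
r_errors, r_assists = shortstop_correlations(toy_seasons)
```

Under the stereotype, both coefficients would come out negative: more games per error and more assists per game would go with a lower batting average.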