Examining if High-Team Payroll Leads to High-Team Performance in Baseball: A Statistical Study

Nicholas Lambrianou '13
B.S. in Mathematics with Minors in English and Economics

Dr. Nickolas Kintos
Thesis Advisor

Thesis submitted to: Honors Program of Saint Peter's University
April 2013

Table of Contents

Chapter 1: The Study and its Questions
An introduction to the project, its questions, and a breakdown of the chapters that follow

Chapter 2: The Baseball Statistics
An explanation of the baseball statistics used for the study, including what the statistics measure, how they measure what they do, and their strengths and weaknesses

Chapter 3: Statistical Methods and Procedures
An introduction to the statistical methods applied to each statistic and an explanation of what the possible results would mean

Chapter 4: Results and the Tampa Bay Rays
The results of the study, what they mean against the possibilities and other results, and a short analysis of a team that stood out in the study

Chapter 5: The Continuing Conclusion
A continuation of the results, followed by ideas for future study that continue the project or stem from it for future baseball analysis

Appendix

References

Chapter 1: The Study and its Questions

Does high payroll necessarily mean higher performance for all baseball statistics? Major League Baseball (MLB) is a league of different teams in different cities all across the United States, and those locations strongly influence the market of the team and thus the payroll. Year after year, a certain number of teams, including the usual ones in big markets, choose to spend a great amount on payroll in hopes of improving their team and its player value output, but at times the statistics produced by these teams may not match the difference in payroll with other teams. This observation invites a few questions for investigation. Are high-payroll teams actually seeing an improvement in results?
Are the results of high-payroll and non-high-payroll teams actually statistically different? Which statistics present the strongest relation with a high payroll? Which statistics present the weakest relation with payroll? The questions and possibilities are endless, so these are just the beginning, but the purpose of this study is to answer the questions raised above, to investigate whether high-payroll teams truly perform better, and then to interpret what the results actually mean.

To accomplish this, statistical methods will be utilized for the investigation. Chapter two, The Baseball Statistics, will give a brief description of each statistic used in the study, including how it is formed and what exactly it measures. This will include a detailed breakdown of the basic and more advanced baseball statistics that are used in the analysis. Chapter three, Statistical Methods and Procedures, will describe the statistical tests applied to the above statistics using team payroll as either a designation for groups or as a dependent variable for study. The chapter will also discuss what the possible results of each statistical test would imply about what is being tested. Chapter four, Results and the Tampa Bay Rays, will interpret the results of the analysis, and it will also profile the Tampa Bay Rays, a low-payroll team that manages to produce results that are equal to or better than those of teams on the high end of the payroll spectrum. Chapter five, The Continuing Conclusion, will include concluding thoughts, as well as a discussion of future analysis based on the procedures and results of the study.

Chapter 2: Statistics and Methods

Sabermetrics is the statistical analysis of baseball data.
Rather than focusing on the traditional statistics that have been flashed across the bottom of televised baseball games, scoreboards at ballparks, and sports news broadcasts for years, Sabermetrics has come to mean the study of more advanced baseball statistics. The goal of Sabermetrics is to objectively estimate the worth of a player, or of a specific aspect of that player's game, rather than focusing on data that could be immaterial. This thesis presents both traditional statistics and the more advanced baseball statistics associated with Sabermetrics. The results of the analysis, which are based on compiled team data, rely more heavily on Sabermetrics-associated statistics.

The baseball statistics that are used in this study mostly fall into two different categories: those drawn from position players and those drawn from pitching. The one exception is Team Winning Percentage, which takes into account contributions from both of the categories listed above and is computed as wins divided by total games played (wins plus losses). Within the two categories, position players and pitching, some of the statistics overlap. Some of the team statistics objectively improve upon others and are thus better measurements. Overall, the purpose of providing this variety is to see how teams might value certain things (or simply happen to excel in them), despite those things being less meaningful. Also, since no one measurement is foolproof, covering a range of aspects provides a more accurate overall picture.
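As a small illustration of the Team Winning Percentage described above, the following sketch computes wins over total games played. The function name and the season record are hypothetical and are not taken from the study's data.

```python
def winning_percentage(wins: int, losses: int) -> float:
    """Wins divided by total games played (wins plus losses)."""
    games = wins + losses
    if games == 0:
        raise ValueError("no games played")
    return wins / games


# Hypothetical season record, for illustration only.
print(round(winning_percentage(90, 72), 3))
```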
For position players, the team statistics that were utilized were:

Runs scored per year
On-Base Percentage (OBP)
Slugging Percentage (SLG)
Isolated Power (ISO)
Weighted On-Base Average (wOBA)
Weighted Runs Created Plus (wRC+)
Ultimate Base Running (UBR)
Weighted Stolen Base Runs (wSB)
Defensive Runs Saved (DRS)
Ultimate Zone Rating (UZR)
Wins Above Replacement (WAR) sorted by position players

The first statistic listed above (Runs scored per year) is explained by its name. It is calculated by taking the total number of runs that cross home plate over the designated number of years and dividing that total by the number of years to find an annual average.

The next two statistics, OBP and SLG, are slowly becoming more accepted by the majority of the baseball community. The first of these, OBP, is given by the formula[1]:

OBP = (H + BB + HBP) / (AB + BB + HBP + SF)   (1)

Please note at this point that all abbreviations, as well as any statistics appearing in formulas that have formulas of their own, may be found in the appendix. This statistic essentially measures how well a batter reaches base (excluding cases such as reaching on a fielding error or a fielder's choice). As Fangraphs terms it, OBP measures the “most important thing a batter can do at the plate: not make an out.”[2]

The formula for the next statistic, SLG, is[3]:

SLG = (1B + 2 × 2B + 3 × 3B + 4 × HR) / AB   (2)

The original purpose of SLG was to help measure a player's power output, but, as is apparent in the formula provided, it actually takes singles into account and makes assumptions about what each type of hit is worth relative to the others. The criticism of that last point is that, when studied, a double is not worth twice as much as a single, and the rest of the weights do not match up either, so SLG is an imperfect measure. Nevertheless, SLG is commonly added to OBP to produce the On-Base Plus Slugging (OPS) statistic.

[1] Fangraphs, "OBP." Online. http://www.fangraphs.com/library/offense/obp/.
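The three calculations described above can be sketched in a few lines. This is a minimal illustration that assumes the standard definitions of OBP (times on base over plate appearances counted for OBP) and SLG (total bases over at-bats); the function names and the team batting line are hypothetical and do not come from the study's data set.

```python
def runs_per_year(runs_by_year):
    """Average annual runs: total runs divided by the number of years."""
    return sum(runs_by_year) / len(runs_by_year)


def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(singles, doubles, triples, home_runs, ab):
    """SLG = total bases / AB, counting a single as 1 base up to a home run as 4."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / ab


# Hypothetical team season line, for illustration only.
singles, doubles, triples, home_runs = 950, 280, 20, 150
hits = singles + doubles + triples + home_runs
obp = on_base_percentage(h=hits, bb=500, hbp=50, ab=5500, sf=40)
slg = slugging_percentage(singles, doubles, triples, home_runs, ab=5500)
ops = obp + slg  # OPS simply adds the two, treating them as equal measures

print(round(obp, 3), round(slg, 3), round(ops, 3))
print(runs_per_year([700, 750, 800]))
```

Note that OPS adds two ratios with different denominators, which is part of why it is treated only as a quick snapshot.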
OPS provides a quick snapshot of a player's offensive contributions, but the statistic is not among those listed above. That is because SLG and OBP are typically not seen as equal measures, just as the hits in SLG are not equal (OBP is worth approximately 1.8 times as much as SLG). For these reasons, wOBA, the next statistic, is used instead of OPS as a statistic that “combines all the aspects of hitting into one metric...more accurately and comprehensively.”[4] The statistic recognizes that all “hits are not created equal,”[5] and thus, when combining everything into one number, it weights each event in proportion to its actual run value. These weights change annually. To provide an example of how it is typically calculated, the 2012 formula may be used[6]:

wOBA = (0.691 × uBB + 0.722 × HBP + 0.884 × 1B + 1.257 × 2B + 1.593 × 3B + 2.058 × HR) / (AB + BB - IBB + SF + HBP)   (3)

The formula makes clear that, based on the research of Tom Tango (author of The Book and developer of wOBA and other statistics), the weights from SLG do not carry over. For example, SLG treats a single as worth half as much as a double and one fourth as much as a home run. In contrast, the research behind wOBA found that a single is worth approximately 70% of the value of a double and 43% of the value of a home run. Equation (3) does not take stolen bases (SB) and “caught stealing” (CS) into account. Instead, it captures only the value produced strictly by performance at the plate.

[2] Fangraphs, "OBP."
[3] BaseballReference, "Slugging Percentage." Online. http://www.baseball-reference.com/bullpen/Slugging_percentage.
[4] Fangraphs, "wOBA." Online. http://www.fangraphs.com/library/offense/woba/.
[5] Fangraphs, "wOBA."
[6] Fangraphs, "wOBA."
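The calculation behind Equation (3) can be sketched as follows. The weights used here are assumed to be the 2012 values published in the Fangraphs wOBA entry cited above and should be checked against that source, since the weights change annually; the function and variable names are hypothetical. The last lines compare the relative value of a single under wOBA and under SLG, which reproduces the approximate 70% and 43% figures quoted in the text.

```python
# Assumed 2012 linear weights (verify against the cited Fangraphs entry).
WOBA_2012_WEIGHTS = {
    "uBB": 0.691,  # unintentional walk
    "HBP": 0.722,  # hit by pitch
    "1B": 0.884,
    "2B": 1.257,
    "3B": 1.593,
    "HR": 2.058,
}


def woba_2012(ubb, ibb, hbp, singles, doubles, triples, home_runs, ab, sf):
    """Weighted On-Base Average using the assumed 2012 weights.

    Stolen bases and caught stealing are deliberately not included.
    """
    w = WOBA_2012_WEIGHTS
    numerator = (
        w["uBB"] * ubb
        + w["HBP"] * hbp
        + w["1B"] * singles
        + w["2B"] * doubles
        + w["3B"] * triples
        + w["HR"] * home_runs
    )
    bb = ubb + ibb  # total walks
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


# Relative value of a single: wOBA weights versus SLG weights.
single_vs_double = WOBA_2012_WEIGHTS["1B"] / WOBA_2012_WEIGHTS["2B"]
single_vs_home_run = WOBA_2012_WEIGHTS["1B"] / WOBA_2012_WEIGHTS["HR"]
print(round(single_vs_double, 2), round(single_vs_home_run, 2))  # wOBA ratios
print(1 / 2, 1 / 4)  # SLG ratios
```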
Equation (3) can be used to “determine an ideal batting lineup.”[7] The modified version of Equation (3) used in this study, however, does take SB and CS into account, weighting each SB at about 0.20 to 0.25 and each CS at about -0.40 to -0.50.[8] A caught stealing is counted as more costly than a successful steal is beneficial, since the two events do not offset one to one in a team's run totals. The weights are determined via linear run estimators (more commonly called linear weights), which are calculated from analyzed sample data in order to measure how many runs a team can be expected to score as a result of an event. One of the few drawbacks of wOBA is that it is neither park-adjusted nor league-adjusted, but that is accounted for in the next measurement, wRC+, a statistic based on Bill James' ideas and again created by Tom Tango.
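To see why stolen bases and caught stealing do not offset one to one, the weights quoted above can be turned into a break-even success rate. The sketch below is only an illustration of the arithmetic: it is not the modified wOBA formula used in the study, and the function names and the sample totals are hypothetical.

```python
def running_runs(sb, cs, sb_weight=0.20, cs_weight=-0.40):
    """Net run value of stolen-base attempts under the given linear weights."""
    return sb_weight * sb + cs_weight * cs


def break_even_success_rate(sb_weight, cs_weight):
    """Success rate p at which p * sb_weight + (1 - p) * cs_weight equals zero."""
    cost = -cs_weight  # cs_weight is negative, so cost is positive
    return cost / (sb_weight + cost)


# Both ends of the quoted weight ranges imply the same break-even rate.
print(round(break_even_success_rate(0.20, -0.40), 3))
print(round(break_even_success_rate(0.25, -0.50), 3))

# Hypothetical team: 100 steals and 50 times caught nets roughly zero runs.
print(running_runs(100, 50))
```

Under these weights a team must succeed on about two out of every three attempts simply to break even, because each CS costs twice what each SB gains.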