LEFT-HANDED PITCHERS IN MAJOR LEAGUE BASEBALL

A THESIS

Presented to

The Faculty of the Department of Economics and Business

The Colorado College

In Partial Fulfillment of the Requirements for the Degree

Bachelor of Arts

By

Theodore S. Behrend

April 2012

LEFT-HANDED PITCHERS IN MAJOR LEAGUE BASEBALL

Theodore S. Behrend

April 2012

Economics

Abstract

This thesis is designed to explain the number of left-handed pitchers in Major League Baseball (MLB). Its main focus is to determine the optimal number of left-handed pitchers an MLB team should employ to maximize success. MLB has a far higher share of left-handed players than society as a whole, which leads us to ask why. The hypothesis of this thesis is that Major League Baseball teams should employ between 18% and 30.1% left-handed pitchers to maximize their success. I measure success by a team's regular-season earned runs against and its winning percentage. This study takes all 30 MLB teams into account over the last 20 years. There are two separate regressions in this thesis: one to estimate the effect of left-handed pitchers on earned runs against and one to evaluate their effect on winning percentage. I also use independent variables such as walks, home runs against, season payroll, attendance, pitchers' average age and many others to see what makes the largest difference to a team's success.

KEYWORDS: (Major League Baseball, Left-handed Pitchers, Winning Percentage, Earned Runs Against)

TABLE OF CONTENTS

ABSTRACT

ACKNOWLEDGEMENTS

1 INTRODUCTION

2 THEORY AND LITERATURE SURVEY
2.1 Monopsony
2.2 Determinants of Winning
2.3 Left-handed Pitchers
2.4 Conclusion

3 MODELS AND DATA
3.1 Model 1
3.2 Model 2
3.3 Data Resource
3.4 Data Analysis
3.5 Conclusion

4 RESULTS
4.1 Model 1
4.2 Model 2
4.3 Conclusion

5 CONCLUSION
5.1 Regression Conclusions
5.2 Limitations of the Study and Possible Future Research

SOURCES CONSULTED

LIST OF TABLES

2.1 Monopsony Graph

3.1 Descriptive Statistics

3.2 Correlation Table

3.3 Expected Signs for Independent Variables for Model 1

3.4 Expected Signs for Independent Variables for Model 2

4.1 OLS Regression Results for Model 1

4.2 Fixed Effects Regression for Model 1

4.3 OLS Regression Results for Model 2

4.4 Fixed Effects Regression for Model 2

ACKNOWLEDGEMENTS

I would like to thank my thesis advisor Vibha Kapuria-Foreman for her guidance throughout this thesis. I would also like to thank Jeff Moore for all of his support. Finally, I would like to thank my parents for everything they have sacrificed in order to support me. I am truly lucky to have you as parents.

CHAPTER I

INTRODUCTION

Major League Baseball is arguably the most popular professional sport in the United States and has been for over a century; baseball is often referred to as America's pastime. Many have said that defense wins championships, and in baseball, defense starts with pitching. The goal of every team is to win. This study will examine several aspects of left-handed pitchers in MLB. First, it will study how the number of left-handed pitchers affects a team's regular-season winning percentage, which shows how left-handed pitchers affect the team as a whole. Second, to isolate the defensive side of a team's success, it will look at how left-handed pitchers affect earned runs against. Lastly, a number of other variables will be taken into account to see their correlation with team winning percentage and earned runs against.

Left-handed players have an advantage in baseball, both in pitching and hitting. That may help explain why about 10% of the general population is left-handed, compared with 25% of MLB players. An article in The Japan Times1 presented statistics showing the success of left-handed pitchers over right-handed pitchers. The article stated that in the 2007 major league baseball season, “Left-handers [hit] .272 against right-handed pitchers. Righties vs. righties hit .261. Against left-handed pitching, righties hit .281, lefties just .251. But there were 122,053 at-bats against right-handed pitchers last season, nearly three times as many as the 45,730 against lefties”.

1 Peters, Dave. "Left-handers have Advantage in Baseball, Researcher Concludes." The Japan Times Online (July 2008).

These figures show that left-handed pitchers hold a significant advantage over left-handed batters, while right-handed batters hold an advantage over left-handed pitchers. This is common knowledge in baseball: the pitcher has the advantage when the batter and pitcher are both lefties or both righties. It is said that a left-handed batter can see the ball better when the pitch comes from a right-hander than from a left-hander.
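The platoon splits quoted above can be checked with simple arithmetic. The Python sketch below uses only the Japan Times figures; the dictionary layout and variable names are illustrative. It computes how many points of batting average each type of batter gains against an opposite-handed pitcher, and what share of 2007 at-bats came against left-handers.

```python
# Batting averages quoted from the 2007 season (The Japan Times article).
# Keys are (batter hand, pitcher hand).
splits = {
    ("L", "R"): 0.272,  # left-handed batters vs right-handed pitchers
    ("R", "R"): 0.261,  # right-handed batters vs right-handed pitchers
    ("R", "L"): 0.281,  # right-handed batters vs left-handed pitchers
    ("L", "L"): 0.251,  # left-handed batters vs left-handed pitchers
}
ab_vs_rhp = 122_053  # at-bats against right-handed pitchers
ab_vs_lhp = 45_730   # at-bats against left-handed pitchers


def platoon_gap(batter: str) -> int:
    """Points of batting average a batter gains against an opposite-handed pitcher."""
    opposite = "R" if batter == "L" else "L"
    return round(1000 * (splits[(batter, opposite)] - splits[(batter, batter)]))


lefty_gap = platoon_gap("L")   # .272 - .251
righty_gap = platoon_gap("R")  # .281 - .261
share_vs_lhp = ab_vs_lhp / (ab_vs_rhp + ab_vs_lhp)
ratio = ab_vs_rhp / ab_vs_lhp

print(f"Left-handed batters gain {lefty_gap} points against right-handers")
print(f"Right-handed batters gain {righty_gap} points against left-handers")
print(f"Share of at-bats against left-handed pitchers: {share_vs_lhp:.1%}")
print(f"At-bats vs RHP per at-bat vs LHP: {ratio:.2f}")
```

With these inputs, left-handed batters gain 21 points and right-handed batters 20 points against opposite-handed pitchers, and about 27.3% of at-bats came against left-handers (roughly 2.67 at-bats against right-handers for each one against a left-hander).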

When a left-handed pitcher throws a curveball, the ball curves toward a right-handed batter and away from a left-handed batter, which makes the pitch harder for a lefty to hit because of depth perception. It is easier for a batter to hit a ball that is curving toward him than one that is curving away from him.

Another clear advantage that left-handed pitchers have is the ability to watch runners on first base while pitching, since right-handed pitchers face third base from the stretch. Gerry Fraley wrote an article, published in the August 2000 issue of Baseball Digest, that explains this advantage among others:

The left-hander holds runners at first more effectively than right-handers because he faces the base from the stretch. The shorter lead decreases the possibility of a runner going from first to third on a hit. The left-hander usually faces right-handed hitters. Those batters are less likely to take advantage of the hole on the right side, caused by the first baseman holding the runner, than left-handed hitters. There is also the neutralization factor. Left-handed hitters have advantages equal to what left-handed pitchers enjoy. The best way to stop left-handed hitters is with left-handed pitchers. Through the last generation, left-handed hitters annually hit about 25 points lower against left-handed pitchers than against right-handed pitchers.2

General managers do not want an all-left-handed pitching staff, because that simply does not make sense when the majority of MLB batters are right-handed.

2 Gerry Fraley, “Southpaws,” Baseball Digest 59, no. 8 (August 2000): 52-59, p. 52.

The statistics from The Japan Times show that right-handed batters do significantly better against a left-handed pitcher than against a right-handed pitcher. The other problem with trying to employ only left-handed pitchers is that there are not many of them. It is much more difficult to find a group of skilled left-handed pitchers than a group of skilled right-handed pitchers. As stated before, only about one in every ten people throws left-handed, so, looking strictly at the numbers, for every talented left-handed pitcher there should be roughly nine equally talented right-handed pitchers.

Skilled left-handed starting pitchers are much rarer than skilled right-handed starting pitchers, so it makes sense that they are paid more. This is the law of supply and demand: the supply of left-handers is much smaller, but teams still demand them, so their price rises. The question is whether major league baseball teams actually win more because of these left-handed pitchers. This question will be assessed once all the data have been analyzed.

The two main goals of this thesis are to study the effects left-handed pitchers have on an MLB team's overall performance and to determine whether those pitchers should be hired or released simply because of their handedness. First, the effects of left-handed pitchers on winning percentage will be assessed. Second, the effects of left-handed pitchers on earned runs against will be evaluated. General managers could use this information to decide how many left-handed pitchers they want on their roster from year to year. They could also decide whether it is worth spending extra money just to have the number of left-handed pitchers they want on their team. This information could also tell coaches how many left-handed pitchers to keep in their starting rotation to maximize success. These are among the reasons this thesis is important.

Left-handed pitchers are an essential part of a baseball team and have become more common over the years. There have been many studies of professional sports relating player production to contracts, age, race, and an abundance of other characteristics. Despite the importance of left-handed pitchers, to my knowledge no study has directly assessed their value to a team's success. Previous literature on handedness, player production and baseball salaries will provide valuable background information for this thesis.

Many variables must be accounted for in order to find the true value of a left-handed pitcher. Besides the percentage of left-handed pitchers, numerous statistics will be evaluated, such as winning percentage, earned runs against, strikeouts, walks, hits against, home runs against, average pitcher age on a team, and defensive errors, along with others. Two variables that do not appear on a typical stat sheet, team payroll and season attendance, will also be examined.

In the end, the aim of this study is to examine how many left-handed pitchers should be employed to maximize success, measured here by regular-season winning percentage and earned runs against. The regressions in this thesis will test my hypothesis that teams should employ between 18% and 30.1% left-handed pitchers in order to maximize success. Most major league baseball teams have five pitchers in their starting rotation, but usually it is not the same five throughout the entire season. This study will account for the starting pitchers with the most games played.

This study will also look at the relieving pitchers with the most […]. Lastly, the main closer for each team will also be accounted for and included with the relieving pitchers. In the end, data for four to six starting pitchers and five relieving pitchers (including the closer) will be evaluated for each team. Essentially, my hypothesis is that a team should employ two or three left-handed pitchers if it has ten or eleven main pitchers, and two left-handed pitchers if it has nine main pitchers, in order to maximize success.
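Translating the percentage range into whole pitchers is a simple rounding exercise. The sketch below assumes the staff sizes of nine to eleven main pitchers described above (the function name `lefty_counts` is illustrative) and lists every whole number of left-handers whose share of the staff falls inside the hypothesized 18% to 30.1% range.

```python
import math

LOW, HIGH = 0.18, 0.301  # hypothesized range for the left-handed share of pitchers


def lefty_counts(staff_size: int) -> list[int]:
    """Whole numbers of left-handers whose share of the staff lies in [LOW, HIGH]."""
    first = math.ceil(LOW * staff_size)    # smallest count at or above 18%
    last = math.floor(HIGH * staff_size)   # largest count at or below 30.1%
    return list(range(first, last + 1))


for n in (9, 10, 11):
    counts = lefty_counts(n)
    shares = ", ".join(f"{k}/{n} = {k / n:.1%}" for k in counts)
    print(f"{n} main pitchers -> {counts} left-handers ({shares})")
```

For nine pitchers only two left-handers (22.2%) fit the range, since three would be 33.3%; for ten or eleven pitchers, either two or three fit. This matches the rule of thumb stated in the hypothesis.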

The following chapter will introduce and evaluate relevant economic theory and previous literature. Chapter three will describe the two models used in this thesis and explain all of their details, including the variables used in the regressions. The third chapter will also describe the data set that was collected and present statistical tables summarizing the data, the correlations among the variables and the expected results. Chapter four will then examine the regressions and the meaning of their results. The final chapter will conclude the thesis with an overview of the results and suggest possible future studies on the topic.

CHAPTER II

THEORY AND LITERATURE SURVEY

The purpose of this chapter is to introduce previous economic research relevant to monopsony in major league baseball, determinants of success in major league baseball, and left-handed pitchers in major league baseball. The following chapter will explain the regressions run in this thesis along with all of the variables used. Chapter three will also explain the data set used for this thesis.

Monopsony

Major league baseball teams, which act collectively, are the only firms that buy professional baseball players in the United States and Canada, which makes MLB a monopsony. A monopsony is the lone buyer in a market; it chooses the price-quantity combination on the industry supply curve that maximizes its profits. In a competitive market there are many buyers or employers, whereas in a monopsony the single buyer or employer is able to set the price. Jeffrey M. Perloff1 states, “A monopsony is the mirror image of a monopoly, and it exercises its market power by buying at a price below the price that a competitive buyer would pay”.

1 Jeffrey Perloff, Economics Third Edition, p. 141 (Berkeley: University of California, 2004).

Figure 2.1 shows a monopsony graph. Below the graph, a table shows which areas of the graph make up consumer surplus, producer surplus and welfare. In a monopsony, the marginal expenditure curve, which is the monopsony's marginal cost of buying one more unit, lies above the upward-sloping market supply curve.

A monopsony is in equilibrium where the marginal expenditure (ME) curve intersects the demand curve, which for labor is the marginal revenue product (MRP) curve. A competitive market, by contrast, is in equilibrium where the supply and demand curves intersect. Marginal expenditure is the additional amount spent to buy one more unit. Marginal revenue product is the additional revenue generated by one more unit of labor purchased. In Figure 2.1, Pm stands for the monopsony market price, Pc for the competitive market price, Qm for the monopsony market quantity and Qc for the competitive market quantity.

The graph in Figure 2.1 shows how price and quantity are lower in a monopsony than in a competitive market. A monopsony restricts price and quantity in this way to maximize its profits. The gap between marginal revenue product and Pm is referred to as monopsonistic exploitation.

FIGURE 2.1

MONOPSONY GRAPH

Measure | Competition | Monopsony | Change
Consumer Surplus, CS | A + B + C + D | A + B + C + E + F | ΔCS = (E + F) − D
Producer Surplus, PS | E + F + G + H + I | H + I | ΔPS = −E − F − G
Welfare, W = CS + PS | A + B + C + D + E + F + G + H + I | A + B + C + E + F + H + I | ΔW = −D − G = −DWL

Source: Jeffrey Perloff, Economics Third Edition (Berkeley: University of California, 2004), p. 143.

A monopsony creates a deadweight loss equal to areas D + G, which does not exist in a competitive market. Two examples of monopsonies are the federal government as the only legal buyer from a United States weapons manufacturer, and a mining company in a secluded company town where it is the only main employer of coal miners. In major league baseball the consumer surplus, in this case the team owners' surplus, is larger than the producer surplus, the baseball players' surplus.
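The mechanics of Figure 2.1 can be checked numerically with straight-line curves. The Python sketch below is an illustration only: the linear forms and the parameter values (a, b, c, d) are hypothetical and are not estimated from baseball data. The supply curve gives the wage players require, and the MRP curve plays the role of demand. The sketch computes the competitive and monopsony outcomes, the gap between MRP and the wage paid (the exploitation discussed above), and the deadweight loss, which corresponds to areas D + G in the figure.

```python
from dataclasses import dataclass


@dataclass
class LinearLaborMarket:
    """Hypothetical linear labor market.

    Supply (wage players require):     w = a + b * L
    Demand (marginal revenue product): w = c - d * L
    """
    a: float
    b: float
    c: float
    d: float

    def supply_wage(self, L: float) -> float:
        return self.a + self.b * L

    def mrp(self, L: float) -> float:
        return self.c - self.d * L

    def competitive(self) -> tuple[float, float]:
        # Supply meets demand: a + bL = c - dL
        L = (self.c - self.a) / (self.b + self.d)
        return L, self.supply_wage(L)

    def monopsony(self) -> tuple[float, float]:
        # Total expenditure is w(L) * L = aL + bL^2, so ME = a + 2bL.
        # The monopsonist hires until ME = MRP, then pays the supply-curve wage.
        L = (self.c - self.a) / (2 * self.b + self.d)
        return L, self.supply_wage(L)


market = LinearLaborMarket(a=10, b=1, c=100, d=2)  # hypothetical parameters
Lc, wc = market.competitive()
Lm, wm = market.monopsony()
exploitation = market.mrp(Lm) - wm                # gap between MRP and wage paid
deadweight_loss = 0.5 * (Lc - Lm) * exploitation  # triangle between MRP and supply

print(f"Competitive: L = {Lc}, w = {wc}")
print(f"Monopsony:   L = {Lm}, w = {wm}")
print(f"Exploitation per unit of labor: {exploitation}")
print(f"Deadweight loss: {deadweight_loss}")
```

With these made-up numbers, the monopsony hires 22.5 units at a wage of 32.5 instead of 30 units at 40, pays 22.5 less than the MRP of the last unit hired, and creates a deadweight loss of 84.375. Because both curves are straight lines, the deadweight loss is exactly a triangle; with curved schedules it would have to be integrated.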

Monopsony was very prevalent in baseball years ago, as Boal and Ransom explain in their article “Monopsony in American Labor Markets,” edited by Whaples:

A striking example of monopsony in an American labor market is professional baseball. Until 1976, the "reserve clause" in player contracts bound each player to a single team, an extreme form of collusion. As a result, teams did not compete for players. Estimates by Scully (1974) and others indicate that rate of monopsonistic exploitation was very high during this era -- players were paid less than half of the value of their contribution to output, and possibly as little as one-seventh. After the reserve clause was eliminated in 1976, players with at least six years' experience became free to negotiate with other teams. Salaries subsequently soared. By 1989, the rate of exploitation was estimated to have fallen close to zero.2

Today, major league baseball players are not restricted to staying with the same team. Players under free agency can negotiate with multiple teams over how much they will be paid and how long their contract will be before signing.

Gerald Scully, mentioned in the previous quote, wrote “Pay and Performance in Major League Baseball.”3 This article was published two years before the reserve clause was removed, so players were not yet allowed to become free agents and were confined to one team. The purpose of his study was to measure the economic loss to baseball players from the restrictions of the reserve clause, which gave owners monopsony power. Scully used marginal revenue product (MRP) to evaluate players and their salaries. He reasoned that each team plays a fixed number of games, but the quality of those games is measured by the team's winning percentage, which comes from two inputs: player skill and non-player inputs such as managers, coaches and team spirit. He then states, “Teams derive revenue essentially from two main sources: gate receipts and the sale of radio and television rights. Fans purchase tickets or watch televised games because they derive utility from seeing the home team win. We assert that both gate receipts and broadcast revenue are directly related to the team’s percent wins…fans attend or watch games to see the team win, not to see player skills per se”.4 Although Scully found that players in that era were paid below their marginal revenue product, today teams pay their players close to their marginal revenue product. This thesis will use the same logic: a team's winning percentage plays a major role in the revenue it brings in through ticket sales and television and radio broadcasts. In recent years, teams have also generated substantial revenue through other outlets such as the internet, smartphones and team apparel.

2 Boal, William and Michael Ransom, ed. Robert Whaples, Monopsony in American Labor Markets, http://eh.net/encyclopedia/article/boal.monopsony (February 2010).

3 Gerald Scully, “Pay and Performance in Major League Baseball,” The American Economic Review 64, no. 6 (December 1974): 915-930.
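Scully's logic chains two relationships: player performance raises winning percentage, and winning percentage raises revenue. The minimal sketch below illustrates that chain. Every coefficient is hypothetical rather than taken from Scully's estimates, the choice of team strikeout-to-walk ratio as the performance measure is an assumption for illustration, and "exploitation rate" is defined here as the share of MRP not paid to the player.

```python
def marginal_revenue_product(revenue_per_pct_point: float,
                             win_pct_points_per_unit: float,
                             performance_gain: float) -> float:
    """Two-stage MRP: performance -> winning percentage -> revenue (all hypothetical)."""
    win_pct_gain = win_pct_points_per_unit * performance_gain  # stage 1: extra win-pct points
    return revenue_per_pct_point * win_pct_gain                 # stage 2: extra revenue


def exploitation_rate(mrp: float, salary: float) -> float:
    """Share of the player's MRP that is not paid to him."""
    return (mrp - salary) / mrp


mrp = marginal_revenue_product(
    revenue_per_pct_point=1_000_000,  # hypothetical: dollars per +1 point of win percentage
    win_pct_points_per_unit=10,       # hypothetical: win-pct points per unit of team K/BB
    performance_gain=0.15,            # hypothetical: pitcher raises team K/BB by 0.15
)
rate = exploitation_rate(mrp, 600_000)  # hypothetical salary
print(f"MRP: ${mrp:,.0f}")
print(f"Exploitation rate at a $600,000 salary: {rate:.0%}")
```

Under these made-up numbers the pitcher's MRP is $1,500,000 and a $600,000 salary leaves 60% of it unpaid, the kind of gap Scully reported for the reserve-clause era. In an actual study, both stages would be estimated by regression rather than assumed.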

“The Underpayment of Restricted Players in North American Sports Leagues,”5 written in 2009, considers whether underpaying restricted players is common in the three largest sports leagues in North America. To be more precise, the study examined whether owners of professional NFL, NBA and MLB teams exercise monopsony power over their players. The results were consistent with the authors' hypothesis that owners of professional sports teams exercise monopsony power whenever they have the chance. In all three sports, including baseball, restricted players are underpaid, but as players' negotiating power rises, owners are less able to extract surplus.

4 Ibid., p. 917.

5 Krautmann, Anthony C., Peter von Allmen, and David Berri. "The Underpayment of Restricted Players In North American Sports Leagues." International Journal of Sport Finance 4, no. 3 (2009): 161-175.

Over the years, monopsony in baseball has clearly decreased, but it still applies to restricted players in the league. In MLB, all teams have the common goal of winning as many games as possible. With more wins, teams bring in more revenue through ticket sales, concessions, broadcast revenue and player memorabilia, because people like to see teams win. Players can help raise their team's winning percentage by playing well. This means that marginal revenue product increases, because each unit of labor now produces more wins and, in turn, more revenue. The following section focuses on determinants of winning in major league baseball.

Determinants of Winning

In 2010, Seth Gitter and Thomas Rhoads examined the “Determinants of Minor League Baseball Attendance.”6 In their study they looked at attendance and winning percentages for both minor league and major league baseball teams. Their hypothesis was that minor league attendance is affected by the winning percentage of the affiliated major league team. They found several interesting results. A minor league baseball team can be a substitute for a major league baseball team if the two are within 100 miles of each other. For example, if an MLB team were to raise its ticket prices, many fans would attend nearby minor league games instead. Because of this, they concluded that organizations should keep their major and minor league teams in close proximity to each other in order to maximize profits. Gitter and Rhoads also concluded that for all baseball teams, no matter what league they are in, attendance rises when winning percentage increases. This suggests that one way to get more people to come to baseball games is to win more, no matter what level teams play at. This supports Scully's study, and from it we can assume that when attendance increases, so do revenues.

6 Gitter, Seth and Thomas Rhoads, “Determinants of Minor League Baseball Attendance,” Journal of Sports Economics 6 (March 2010): 614-628.

A similar study conducted in 2008 evaluated “The Interaction between Baseball Attendance and Winning Percentage.”7 Davis examined whether attendance affects winning percentage or winning percentage affects attendance in major league baseball. He found that for most teams, winning percentage and attendance move together: when winning percentage is higher, so is attendance, and vice versa. Davis concluded that greater team success causes higher attendance, and not the other way around. As in Gitter and Rhoads's study, people go to see winning teams.

Eiji Yamamura examined another determinant of winning in “Team Payroll, Competitive Balance, and Team Performance of the Japan Professional Baseball League.”8 The main focus of this study was how salaries affected team performance, measured by winning percentage, in Japanese professional baseball. The study used panel data from 1993 to 2004 for both the Central and Pacific Leagues. Yamamura found that salaries had a positive correlation with team performance in only one of the leagues and no effect in the other. He concluded that the results differed because the popularity of one of the leagues reduces its teams' incentive to allocate resources efficiently with the objective of maximizing wins.

7 Davis, Michael C. "The Interaction between Baseball Attendance and Winning Percentage: A VAR Analysis." International Journal of Sport Finance 3, no. 1 (2008): 58-73.

A study by R. Schulz and C. Curnow9 examined the age at which superathletes reach peak performance in their sport. The study covered a number of sports, such as track and field, swimming, tennis, basketball, golf and, of course, baseball.

Schulz and Curnow found that over the last 90 years, the age of peak performance has stayed constant in track and field, swimming, baseball and tennis. They discovered that athletes reach their peak performance in their early twenties if their tasks require strength, speed, and explosive power, and in their late twenties or early thirties if their tasks require more endurance, acquired skill and knowledge.

Major league baseball pitchers must be strong, fast and explosive, traits associated with peaking at an earlier age, but they also need endurance, acquired skill and knowledge, traits associated with peaking at an older age. It will therefore be interesting to see how the average age of a team's pitchers affects the success of the overall team.

8 Yamamura, Eiji. "Team Payroll, Competitive Balance, and Team Performance of the Japan Professional Baseball League." Empirical Economics Letters 7, no. 9 (2008): 909-916.

9 Schulz, R., and C. Curnow. "Peak Performance and Age Among Superathletes: Track and Field, Swimming, Baseball, Tennis, and Golf." Journal of Gerontology 43, no. 5 (1988): 113-120.

The article “Value of Stealing Bases in Major League Baseball: ‘Stealing’ Runs and Wins,”10 written by Herman Demmink III, focused on how stealing bases can raise a team's chances of winning. A small but significant positive coefficient on stolen-base attempts, with team wins as the dependent variable, indicates that stealing bases helps teams win more games. By the same logic, a team that stops opposing players from trying to steal bases also improves its chances of winning. As quoted in the introduction, Gerry Fraley11 states, “The left-hander holds runners at first more effectively than right-handers because he faces the base from the stretch.” Demmink III adds, “From an economist’s point of view, more attendance equals more revenue and more revenue is a sign of a good business. In the larger scheme, as ticket sales increase so do food and merchandise sales inside the ballpark. It doesn’t stop there, as fans travel to see [their] ‘favorite’ team play, local economies also see an increase in sales volume”.12 Surprising as it may be, when a team tries to steal more bases, it not only helps the team win games but can also help the local economy.

Left-handed Pitchers

Addison DeBoer wrote a thesis at Colorado College in 2010 that evaluated left-handed hitters in major league baseball.13 He used 10 years of data from all 30 major league baseball teams and analyzed a team's runs scored and season-long winning percentage. His hypothesis was that a team should have between 33% and 55% left-handed hitters in order to achieve the optimal rate of success. His model showed that several variables significantly affect runs scored and winning percentage, but the results relating left-handed hitting to either dependent variable were inconclusive.

10 Demmink, Herman. "Value of Stealing Bases in Major League Baseball: 'Stealing' Runs and Wins." Public Choice 142, no. 3-4 (2010): 497-505.

11 Gerry Fraley, “Southpaws,” Baseball Digest 59, no. 8 (August 2000): 52-59.

12 Demmink, “Value of Stealing Bases,” p. 504.

13 Addison DeBoer, “Baseball and the Left Handed Hitter,” Colorado College Economics Thesis (2010).

In 1989, Wood and Aggleton wrote an article called “Handedness in ‘Fast Ball’ Sports: Do Left-handers Have an Innate Advantage?”14 The authors looked at cricket, tennis and soccer and examined whether left-handed athletes had any innate advantage over right-handed athletes. They found that professional cricket has a relatively high proportion of players who bowl left-handed. However, they concluded that there is no neurological advantage to being left-handed and that any excess of left-handers in these sports is due to the nature of the games. Based on this study, we can assume that left-handed pitchers do not have a neurological advantage over right-handed pitchers; the advantages lefty pitchers have in baseball come from the way the game is set up.

Fraley15 wrote an article in 2000 that focuses on the advantages of left-handed pitchers in baseball. He examines not only numbers but also non-statistical attributes of left-handers, quoting famous baseball players on their beliefs about lefty pitchers. In “Southpaws,” he adds statistics that display the benefits of left-handed pitchers, such as, “Left-handed starters win more often than right-handed starters. [O]ver the past three seasons (1997-1999), left-handed starters had a winning record (608-581) and a 4.49 ERA. In that same span, right-handed starters had a losing record (1,773-1,893) and a 4.98 ERA”.16 ERA stands for earned run average: the number of earned runs a pitcher, or a pitching staff, allows per nine innings pitched.

Earned runs against are the number of runs scored against a team, not counting runs that score because of errors in the field. Unlike ERA, which is a rate, earned runs against is a season total.

14 Wood, C.J. and J.P. Aggleton, “Handedness in ‘Fast Ball’ Sports: Do Left-handers have an Innate Advantage?,” British Journal of Psychology 80, no. 2 (May 1989): 227-241.

15 Gerry Fraley, “Southpaws,” Baseball Digest 59, no. 8 (August 2000): 52-57.
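Because ERA is a rate while earned runs against is a count, the two are worth keeping distinct. The short Python sketch below computes ERA from a hypothetical earned-run and innings total, and converts Fraley's 1997-1999 win-loss records for starters into winning percentages. The function names are illustrative.

```python
def earned_run_average(earned_runs: int, innings_pitched: float) -> float:
    """Earned runs allowed per nine innings pitched."""
    return 9 * earned_runs / innings_pitched


def winning_percentage(wins: int, losses: int) -> float:
    return wins / (wins + losses)


# Fraley's 1997-1999 records for starting pitchers, as quoted above
lefty_pct = winning_percentage(608, 581)
righty_pct = winning_percentage(1_773, 1_893)
lefty_share_of_decisions = (608 + 581) / (608 + 581 + 1_773 + 1_893)

print(f"Left-handed starters:  {lefty_pct:.3f}")   # 0.511
print(f"Right-handed starters: {righty_pct:.3f}")  # 0.484
print(f"Left-handers' share of starters' decisions: {lefty_share_of_decisions:.1%}")  # 24.5%

# Hypothetical staff: 80 earned runs allowed in 160 innings
print(f"ERA: {earned_run_average(80, 160):.2f}")  # 4.50
```

A staff that allows 80 earned runs has 80 earned runs against whether it pitches 160 or 200 innings, but its ERA falls from 4.50 to 3.60 over the larger workload, which is why the choice between the two measures matters.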

Another relevant article is J.C. Bradbury's “Do Southpaws Get a Fair Shake in MLB? Part Two: Pitchers.”17 Bradbury looks at salary differences between left-handed and right-handed pitchers, controlling for pitching performance with statistics such as strikeouts, walks, home runs, innings pitched and age. He finds that left-handed starting pitchers are on average paid more than right-handed starting pitchers in major league baseball: a left-handed starting pitcher makes on average $233,000 more than an equally skilled right-handed starter. He finds the reverse for relievers: on average, left-handed relieving pitchers are paid $209,000 less than equally skilled right-handed relievers.

Bradbury also conducted a 2007 study, “Does the Baseball Labor Market Properly Value Pitchers?,”18 which asked whether pitchers are paid according to their contributions on the field. He used the marginal revenue product of pitchers, based on player statistics, to determine deserved salaries, and concluded that pitchers are paid according to their individual contributions. Bradbury did not look at the handedness of pitchers or put pitchers into any groups, but looked at all pitchers as a whole. Depending on how left-handed pitchers perform in the data for this thesis, I may be able to assess whether it is worth it for teams to pay extra for a left-handed starting pitcher.

16 Ibid., p. 52.

17 J.C. Bradbury, “Do Southpaws Get a Fair Shake in MLB? Part Two: Pitchers,” Sabernomics (August 2006).

18 J.C. Bradbury, “Does the Baseball Labor Market Properly Value Pitchers?,” Journal of Sports Economics 8, no. 6 (2007): 616-632.

A study by Goldstein and Young19 looks at how the number of left-handed players in MLB rose and has begun to plateau over the last few decades. The study also explains the advantages of different handedness in baseball and tries to explain why the numbers of left- and right-handed players have changed over time. They found that this evolution most likely happened because of the advantages pitchers or batters gain from handedness. As explained before, whether the pitcher or the hitter has the advantage depends on whether the pitcher's throwing arm matches the side of the plate the batter hits from. Teams began to employ more left-handed batters to gain a competitive offensive advantage over pitchers, and as more left-handed hitters were employed, more left-handed pitchers were employed to compensate. Although Goldstein and Young did not explain why the numbers of left- and right-handed pitchers and hitters will not eventually level out at 50% each, there is reason to believe that this will never happen, because the supply of skilled left-handed players is much smaller than the supply of skilled right-handed players. This study is very relevant to the present thesis because it illuminates why the number of left-handers rose over the past century but has now become more constant. Because the percentage of lefty pitchers has plateaued over the last few decades, there is reason to believe that there is an optimal number of left-handed pitchers for a major league baseball team to employ. Their prediction is that left-handed pitchers will stabilize at 31%. Goldstein and Young could not conclude that this share will maximize teams' defensive success.

19 Goldstein, Stephen and Charlotte Young, “‘Evolutionary’ Stable Strategy of Handedness in Major League Baseball,” US: American Psychological Association 110, no. 2 (June 1996): 164-169.

“Game Theory and Professional Baseball: Mixed-Strategy Models,” an article written by Flanagan20 in 1998, uses two-person game theory to model the contest between batter and pitcher in Major League Baseball. Game theory has rarely been applied to sporting events, particularly baseball, even though most sporting events have all of the essential elements it requires. Flanagan modeled the matchup between pitcher and batter as a game in which the pitcher can throw with his left or right hand and the batter can hit from the left or right side of the plate. Using average data, he generated a solution in mixed strategies that rationally predicted the empirical proportions of right- and left-handed batters but was less successful at predicting the proportions of right- and left-handed pitchers. He found that the difficulty in predicting the proportion of pitchers stems from the shortage of left-handed pitchers. The shortage is less of a problem for batters, who can potentially change which side of the plate they hit from, whereas pitchers are essentially born throwing with either their right or left arm.
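To make the idea of a mixed-strategy solution concrete, the following is a minimal Python sketch of how a 2x2 zero-sum batter-versus-pitcher game can be solved. It is not Flanagan's model: the payoffs are hypothetical on-base probabilities chosen only so that each batter does better against an opposite-handed pitcher, and `solve_2x2_zero_sum` is an illustrative helper. The batter (row player) picks a side of the plate, the pitcher (column player) picks a throwing hand, and each side mixes so that the other is indifferent between its two choices.

```python
from dataclasses import dataclass


@dataclass
class MixedSolution:
    p_row1: float  # probability the batter (row player) chooses row 1
    q_col1: float  # probability the pitcher (column player) chooses column 1
    value: float   # batter's expected payoff when both sides play optimally


def solve_2x2_zero_sum(a: float, b: float, c: float, d: float) -> MixedSolution:
    """Solve the zero-sum game [[a, b], [c, d]] (payoffs to the row player).

    The row player maximizes and the column player minimizes. Only games
    without a pure-strategy saddle point are handled.
    """
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("game has a pure-strategy saddle point")
    denom = a - b - c + d
    # Each player mixes so that the opponent is indifferent between its two options.
    p = (d - c) / denom
    q = (d - b) / denom
    value = (a * d - b * c) / denom
    return MixedSolution(p, q, value)


# Hypothetical on-base probabilities for the batter.
# Rows: batter hits left / right.  Columns: pitcher throws left / right.
sol = solve_2x2_zero_sum(0.240, 0.280,
                         0.275, 0.250)
print(f"bat left: {sol.p_row1:.2f}, pitch left: {sol.q_col1:.2f}, value: {sol.value:.3f}")
```

With these made-up payoffs the batter would hit left-handed about 38% of the time and the pitcher would throw left-handed about 46% of the time. As Flanagan found, a prediction of this kind can fail for pitchers in practice because the supply of left-handed pitchers is limited.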

In the end, the previously mentioned studies play a major role in the theory used for this thesis. Without all the knowledge gained from these studies, this thesis would not be possible.

Conclusion

This chapter explained the relevant studies that were taken into account in order to develop the models presented in this thesis. Previous literature on monopsony in Major League Baseball, determinants of winning, and left-handed pitchers was examined. The following chapter will examine the models being used, along with all of their variables, and the data set for this study.

20 Thomas Flanagan, “Game Theory and Professional Baseball: Mixed-Strategy Models,” Journal of Sport Behavior 21 no. 2 (June 1998): 121-138.

CHAPTER III

MODELS AND DATA

This chapter explains the regressions run in this thesis. The models are described along with all of the variables used in them, followed by a description of the data set collected for this thesis. The chapter also presents tables summarizing the data, the correlations between variables, and the expected results.

Below is the equation used for model 1 with the dependent variable, winning percentage.

Model 1

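$$
\begin{aligned}
\text{Winning Percentage} = \alpha &+ \beta_1\,\text{Hits Allowed} + \beta_2\,\text{Homeruns Allowed} + \beta_3\,\text{Runs Allowed} \\
&+ \beta_4\,\text{Walks} + \beta_5\,\text{Strikeouts} + \beta_6\,\text{Errors} + \beta_7\,\text{Pitchers Average Age} \\
&+ \beta_8\,\text{Percent of Left-handed Starters} + \beta_9\,\text{Percent of Left-handed Relievers} \\
&+ \beta_{10}\,\text{Percent of Left-handed Pitchers} + \beta_{11}\,\text{Attendance} + \beta_{12}\,\text{Payroll} + \beta_{13}\,\text{Ratio} + \varepsilon
\end{aligned}
$$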

In the first model, the dependent variable is winning percentage (wp). Winning percentage is the number of games won divided by the number of games played in a season. This one simple statistic quickly shows how well a team performed that season.


Winning percentage is one of the dependent variables in this thesis because it will show how important all the independent variables are to a team’s success.
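The definition of winning percentage is a single division. As a small illustration (the function name is hypothetical), the records of the 2005 Yankees (95-67) and the 1992 Cleveland Indians (76-86), both discussed in the data analysis section of this chapter, give:

```python
def winning_percentage(wins: int, games_played: int) -> float:
    """Games won divided by games played, as defined in the text."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    if not 0 <= wins <= games_played:
        raise ValueError("wins must be between 0 and games_played")
    return wins / games_played


# 2005 Yankees: 95 wins in 162 games; 1992 Indians: 76 wins in 162 games
print(round(winning_percentage(95, 95 + 67), 3))  # 0.586
print(round(winning_percentage(76, 76 + 86), 3))  # 0.469
```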

The first independent variable, found in both models, is hits allowed (h). A hit occurs when a batter hits the ball into fair territory without the defensive team catching it before it touches the ground, and then safely reaches at least first base before being thrown out or tagged. The number of hits allowed is an important variable because it is closely related to how well a team's pitching staff performs. My prediction is that hits allowed will have a negative relationship with winning percentage.

The reasoning is simple: the more hits a team allows, the more base runners the opponent has who can score.

There are four types of hits in baseball. A single, described above, is when the batter safely reaches first base; a double is when the batter reaches second base; and a triple is when the batter reaches third base.

The fourth and most dangerous kind of hit is the homerun (hr), which is the second independent variable. A homerun can occur in two ways. The most common is when the hitter hits the ball over the fence in fair territory; the batter is then free to trot around all four bases, scoring a run with no chance of being tagged or called out. The other way is for the hitter to put the ball in play and run safely around all four bases without being put out, which is known as an inside-the-park homerun. As with hits, I expect homeruns allowed to have a negative relationship with winning percentage.

Runs allowed (r) is simply the number of runs allowed by the defense. The goal of the game is to score more runs than the other team, which is easier when a team does not allow many runs in the first place. Unlike earned runs, runs allowed counts every run, regardless of whether an error caused it to score. If a team is defensively sound, its runs allowed should not be much higher than its earned runs allowed. The expected sign for runs allowed is therefore negative for a team's winning percentage.

A walk (bb) is one of the independent variables in this thesis that reflects the competition between pitcher and batter. A walk is also referred to as a base on balls. It occurs when the pitcher throws four balls (pitches outside the strike zone that the batter does not swing at) during one plate appearance, and the batter is awarded first base. Intentional walks are not counted in this statistic, because when a pitcher purposely walks a hitter it reflects defensive strategy rather than the pitcher's performance. As with hits, homeruns and runs, I expect walks to have a negative relationship with winning percentage.

Another independent variable that relies solely on the competition between pitcher and batter is the strikeout (so). A strikeout occurs when the pitcher records three strikes against the batter, making an out. A dominant pitcher can record many outs by himself by striking out hitters. More strikeouts mean fewer opposing players reaching base, so I expect strikeouts to have a positive relationship with winning percentage.

In baseball an error (e) is committed when a defensive player misplays the ball in a manner that allows a batter or base runner to reach at least one additional base. For example, if a fly ball is hit directly to an outfielder and the outfielder misses the catch, allowing the batter to reach first base, the outfielder may be charged with an error. The official scorer uses his or her judgment to decide whether the defensive player should have made the play; if the scorer decides that the player could have made the play but did not, that player is charged with an error. The expected sign for errors is negative in the first model.

Pitchers average age (pitchage) is the next independent variable. It is simply the average age of all the pitchers on a team. This variable is included to show whether experience plays a role in how well a team performs. I assume that a pitching staff with a higher average age also has more experience, because those pitchers have had more time to gain it. I expect pitchers average age to have a positive relationship with winning percentage.

Each team in baseball has a starting rotation of pitchers. Most teams have four or five starting pitchers who rotate the games they pitch. For this study, the percentage of left-handed starting pitchers (plhstart) is the number of left-handed starting pitchers divided by the total number of starting pitchers. The same statistic is computed for left-handed relieving pitchers (plhrelievers), using the five relievers on each team with the most innings pitched. The team's overall percentage of left-handed pitchers (plhpitch) is also examined: the numbers of left-handed starting and relieving pitchers are added together and divided by the total number of starting and relieving pitchers.
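The three handedness variables can be sketched in Python as follows. This assumes a five-man rotation and a hypothetical list of relievers with their innings pitched; the pitcher labels, innings and helper names are made up for illustration. The numbers are chosen to mirror the 2008 Pittsburgh Pirates example in the data analysis section (four of five starters and three of the five top relievers left-handed), and the sixth reliever, who has the fewest innings, is excluded.

```python
from typing import Dict, List, NamedTuple


class Pitcher(NamedTuple):
    name: str       # hypothetical label
    throws: str     # "L" or "R"
    innings: float  # innings pitched in the season


def share_left(pitchers: List[Pitcher]) -> float:
    """Fraction of the given pitchers who throw left-handed."""
    if not pitchers:
        raise ValueError("need at least one pitcher")
    return sum(p.throws == "L" for p in pitchers) / len(pitchers)


def handedness_variables(starters: List[Pitcher], relievers: List[Pitcher],
                         n_relievers: int = 5) -> Dict[str, float]:
    # Keep only the relievers with the most innings pitched.
    top_relievers = sorted(relievers, key=lambda p: p.innings, reverse=True)[:n_relievers]
    return {
        "plhstart": share_left(starters),
        "plhrelievers": share_left(top_relievers),
        "plhpitch": share_left(list(starters) + top_relievers),
    }


starters = [Pitcher(f"S{i}", hand, 180.0) for i, hand in enumerate("LLLLR")]
relievers = [Pitcher("R1", "L", 70.0), Pitcher("R2", "L", 65.0), Pitcher("R3", "L", 60.0),
             Pitcher("R4", "R", 55.0), Pitcher("R5", "R", 50.0), Pitcher("R6", "L", 10.0)]
print(handedness_variables(starters, relievers))
# {'plhstart': 0.8, 'plhrelievers': 0.6, 'plhpitch': 0.7}
```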

Fielding percentage (fldp) is the percentage of times a defensive player correctly handles a batted or thrown ball. It was initially included in model 1, but it was dropped because of its very high correlation with errors. This is explained further in the data analysis section.

Attendance (attendance) and payroll (payroll) are the two variables in the first model that do not measure how a team plays on the field. Attendance is the total number of people who came to a team's home games in a season. Payroll is how much a team spent on players in a single season. These statistics may not show up on a player's stat sheet, but they may have an influence on winning percentage. For both variables, I expect a positive relationship with winning percentage.

The final variable is ratio, a dummy variable that indicates whether a team's percentage of left-handed pitchers is between 18% and 30.1%. As stated before, my hypothesis is that teams should employ between 18% and 30.1% left-handed pitchers to maximize team success. A team with between 18% and 30.1% left-handed pitchers receives a 1; a team outside that range receives a 0. This dummy variable will show whether having this range of left-handed pitchers has a positive, negative or no effect on winning percentage. Because left-handed batters have more trouble hitting against left-handed pitchers, I expect this variable to have a positive relationship with winning percentage.
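A minimal sketch of the ratio dummy, assuming the 18% and 30.1% bounds are inclusive (the text does not say) and that it is computed from the overall share plhpitch; the function name is hypothetical. For a ten-man staff of five starters and five relievers, only two or three left-handers fall inside the range.

```python
def ratio_dummy(plhpitch: float, low: float = 0.18, high: float = 0.301) -> int:
    """1 if the overall left-handed share lies in [low, high], else 0 (bounds assumed inclusive)."""
    if not 0.0 <= plhpitch <= 1.0:
        raise ValueError("plhpitch must be a proportion between 0 and 1")
    return 1 if low <= plhpitch <= high else 0


# Shares for 0 to 10 left-handers on a ten-man staff.
print([ratio_dummy(k / 10) for k in range(11)])
# [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
```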


Model 2

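$$
\begin{aligned}
\text{Earned Runs Allowed} = \alpha &+ \beta_1\,\text{Hits Allowed} + \beta_2\,\text{Homeruns Allowed} + \beta_3\,\text{Walks} + \beta_4\,\text{Strikeouts} \\
&+ \beta_5\,\text{Pitchers Average Age} + \beta_6\,\text{Percent of Left-handed Starters} + \beta_7\,\text{Percent of Left-handed Relievers} \\
&+ \beta_8\,\text{Percent of Left-handed Pitchers} + \beta_9\,\text{Attendance} + \beta_{10}\,\text{Payroll} + \beta_{11}\,\text{Ratio} + \varepsilon
\end{aligned}
$$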

In the second model, the dependent variable is earned runs allowed (er). An earned run is a run scored when a player on the offensive team makes it safely around all four bases without the help of an error by the defensive team. The pitcher is held accountable for earned runs because, with no errors committed by any of the defensive players on the field, the run is the result of pitching. Earned runs against therefore shows how well a team's pitching staff prevented other teams from scoring.

All of the independent variables were previously described in the Model 1 section. This thesis uses two separate dependent variables to research two similar but different things. Model 1 helps explain what determines a team's winning percentage, while model 2 illuminates what it takes for a team to have a successful pitching staff. As stated before, the dependent variable in model 2, earned runs allowed, expresses how well a pitching staff is performing.

Data Resource

All of the data used for this thesis were collected from the website www.baseball-reference.com.1 This website provided the most recent two decades of data (1992 to 2011) for all thirty teams. Some teams did not exist two decades ago, so their data were gathered from their first season through 2011. For example, the Colorado Rockies and Florida Marlins did not have a team until 1993, so there are 19 years of data for those two teams. This thesis includes statistics from regular season games only; preseason and playoff statistics were not included.

1 “Baseball-Reference.” Available from http://www.baseball-reference.com/.

Data Analysis

Table 3.1 is a descriptive statistics table for all of the variables used in this thesis. It shows the mean, standard deviation, minimum and maximum of each variable.

TABLE 3.1

DESCRIPTIVE STATISTICS

| Variable | Abbreviation | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| Winning Percentage | wp | 0.5 | 0.07 | 0.265 | 0.716 |
| Earned Runs Against | er | 686.32 | 92.32 | 407 | 1015 |
| Hits Against | h | 1442.47 | 125.63 | 929 | 1734 |
| Runs Against | r | 749.54 | 98.54 | 448 | 1103 |
| Homeruns Against | hr | 163.76 | 29.52 | 76 | 241 |
| Walks | bb | 535.06 | 72.95 | 288 | 784 |
| Strikeouts | so | 1032.14 | 137.05 | 560 | 1404 |
| (Hits + Walks)/Innings | whip | 1.40 | 0.09 | 1.17 | 1.733 |
| Errors | e | 107.07 | 18.57 | 57 | 174 |
| Average Pitchers Age | pitchage | 28.63 | 1.48 | 24.5 | 34.2 |
| Attendance | attendance | 2,341,532 | 747,692.5 | 642,745 | 4,483,350 |
| Payroll | payroll | $61,800,000 | $34,100,000 | $9,373,044 | $208,306,817 |
| Percent of Left-handed Starters | plhstart | .276 | .182 | 0 | .8 |
| Percent of Left-handed Relievers | plhrelievers | .287 | .148 | 0 | .8 |
| Overall Percent of Left-handed Pitchers | plhpitch | .281 | .114 | 0 | .7 |
| Between 18% and 30.1% Left-handed Pitchers Employed | Ratio | .576 | .494 | 0 | 1 |

Table 3.1 shows few abnormalities. One surprising finding is the very wide range between the minimum and maximum of the percentage of left-handed starting pitchers, the percentage of left-handed relievers and the overall percentage of left-handed pitchers. These statistics show that some teams used no left-handed pitchers at all and others used a majority of left-handed pitchers, but no team used all left-handed starting or relieving pitchers. In 2006 the did not have a single left-handed pitcher who played enough games to be considered in the starting rotation or one of their top five relieving pitchers.

On the other hand, in 2008 the Pittsburgh Pirates had a pitching staff of mostly lefties: four of their five starting pitchers and three of their five top relieving pitchers were left-handed.

Another interesting statistic is the standard deviation of payroll, $34,100,000. There is also a major difference between the minimum and maximum payroll: the difference between the two is $198,933,773, an enormous range. In 2005 the New York Yankees, known for their notoriously high payroll, spent over 208 million dollars on their players, while the 1992 Cleveland Indians spent just over 9.3 million. The Yankees went on to take first place in the AL East with a record of 95 wins and 67 losses, whereas the Indians posted a losing record of 76-86 with the lowest attendance in the league. Although the two seasons are 13 years apart, it is still a staggering statistic.

An additional surprising statistic in the descriptive statistics table is the difference between the maximum and minimum attendance. The 2001 Montreal Expos, the franchise that later became the Washington Nationals, had the lowest attendance of the last two decades, with just 642,745 fans. In 1993 the Colorado Rockies drew almost seven times as many fans, 4,483,350, in their first year in Major League Baseball. The rest of the descriptive statistics look as they were expected to look.

Table 3.2 is a correlation table that includes all of the variables used in this thesis.

TABLE 3.2

CORRELATION TABLE

| | wp | h | hr | r | bb | so | e |
|---|---|---|---|---|---|---|---|
| wp | 1.0000 | | | | | | |
| h | -0.3453 | 1.0000 | | | | | |
| hr | -0.3169 | 0.6588 | 1.0000 | | | | |
| r | -0.5388 | 0.8616 | 0.7770 | 1.0000 | | | |
| bb | -0.3793 | 0.4870 | 0.4003 | 0.6408 | 1.0000 | | |
| so | 0.2946 | 0.1298 | 0.0766 | -0.0396 | 0.1787 | 1.0000 | |
| e | -0.3263 | 0.3726 | 0.1093 | 0.3939 | 0.2978 | -0.0354 | 1.0000 |
| pitchage | 0.4051 | -0.0804 | -0.0579 | -0.1589 | -0.1683 | 0.1352 | -0.1479 |
| plhrelievers | 0.0303 | 0.0111 | -0.0306 | -0.0169 | 0.0143 | -0.0468 | -0.0103 |
| plhstart | 0.0344 | -0.0414 | -0.0636 | -0.0637 | -0.0591 | -0.0001 | -0.0723 |
| plhpitch | 0.0470 | -0.0268 | -0.0704 | -0.0624 | -0.0412 | -0.0324 | -0.0657 |
| attendance | 0.4695 | -0.0194 | -0.0412 | -0.1446 | -0.0410 | 0.3698 | -0.1924 |
| payroll | 0.3067 | 0.0358 | 0.0439 | -0.0808 | -0.0803 | 0.4513 | -0.3060 |
| er | -0.5151 | 0.8601 | 0.7995 | 0.9893 | 0.6346 | -0.0292 | 0.2919 |
| fldp | 0.3529 | -0.0832 | 0.0190 | -0.2373 | -0.1422 | 0.2491 | -0.9059 |
| whip | -0.6082 | 0.5119 | 0.5033 | 0.7780 | 0.6045 | -0.4107 | 0.2554 |
| Ratio | -0.0253 | 0.0531 | 0.0229 | 0.0547 | 0.0280 | -0.0325 | 0.1085 |

| | pitchage | plhrelievers | plhstart | plhpitch | attendance | payroll | er |
|---|---|---|---|---|---|---|---|
| pitchage | 1.0000 | | | | | | |
| plhrelievers | 0.0507 | 1.0000 | | | | | |
| plhstart | -0.0458 | -0.0570 | 1.0000 | | | | |
| plhpitch | -0.0043 | 0.5845 | 0.7738 | 1.0000 | | | |
| attendance | 0.5310 | 0.0077 | -0.0557 | -0.0459 | 1.0000 | | |
| payroll | 0.4832 | -0.0193 | -0.0074 | -0.0195 | 0.6247 | 1.0000 | |
| er | -0.1399 | -0.0092 | -0.0573 | -0.0527 | -0.1167 | -0.0431 | 1.0000 |
| fldp | 0.1588 | 0.0181 | 0.0764 | 0.0735 | 0.2858 | 0.3947 | -0.1324 |
| whip | -0.2092 | 0.0155 | -0.0573 | -0.0364 | -0.2870 | -0.2702 | 0.7692 |
| Ratio | -0.0864 | 0.0276 | -0.0726 | -0.0413 | -0.0239 | -0.0860 | 0.0459 |

| | fldp | whip | Ratio |
|---|---|---|---|
| fldp | 1.0000 | | |
| whip | -0.3167 | 1.0000 | |
| Ratio | -0.0900 | 0.0293 | 1.0000 |

Table 3.2 shows a very high correlation between earned runs against (er) and runs against (r): the correlation between the two variables is .9893. For this reason, earned runs against and runs against are not included in the same regression. The high correlation is understandable, because earned runs are the runs allowed that were not caused by errors, so the more runs a team allows, the more earned runs it usually allows as well.

Another high correlation was found between errors and fielding percentage. The initial model 1 included both of these variables until the high correlation was found. As explained earlier in this chapter, fielding percentage is the percentage of times a defensive player correctly handles a batted or thrown ball, so every error a defensive player makes lowers his fielding percentage. The correlation between the two variables is -0.9059. Now only errors is included in model 1, and fielding percentage was taken out.

The last variable that was initially in both models and was later omitted is whip, which stands for walks plus hits per inning pitched. This statistic was taken out because the two variables that make up whip, hits allowed and walks, are already in both of the models. There is no need for whip to be in either model as long as hits allowed and walks remain in the equations.

The next two tables, Table 3.3 and Table 3.4, list the predicted sign for each independent variable, that is, whether each independent variable is expected to have a positive or negative effect on the dependent variable. Because the dependent variable in model 1 is winning percentage, a negative entry in the Predicted Sign column of Table 3.3 means the independent variable is expected to lower winning percentage. The dependent variable in model 2 is earned runs against, so a positive entry in Table 3.4 means the independent variable is expected to increase earned runs against.


TABLE 3.3

EXPECTED SIGNS FOR INDEPENDENT VARIABLES FOR MODEL 1

| Independent Variable | Predicted Sign |
|---|---|
| Hits Allowed | Negative |
| Runs Allowed | Negative |
| Homeruns Allowed | Negative |
| Walks | Negative |
| Strikeouts | Positive |
| (Hits + Walks)/Inning | Negative |
| Errors | Negative |
| Fielding Percentage | Positive |
| Average Pitchers Age | Unknown |
| Attendance | Positive |
| Payroll | Positive |
| Percent of Left-handed Starting Pitchers | Positive |
| Percent of Left-handed Relieving Pitchers | Positive |
| Overall Percent of Left-handed Pitchers | Positive |
| Ratio (Between 18%-30.1% Left-handed Pitchers Employed) | Positive |

TABLE 3.4

EXPECTED SIGNS FOR INDEPENDENT VARIABLES FOR MODEL 2

| Independent Variable | Predicted Sign |
|---|---|
| Hits Allowed | Positive |
| Homeruns Allowed | Positive |
| Walks | Positive |
| Strikeouts | Negative |
| Hits and Walks Allowed Per Inning | Positive |
| Average Pitchers Age | Unknown |
| Attendance | Negative |
| Payroll | Negative |
| Percent of Left-handed Starting Pitchers | Negative |
| Percent of Left-handed Relieving Pitchers | Negative |
| Overall Percent of Left-handed Pitchers | Negative |
| Ratio (Between 18%-30.1% Left-handed Pitchers Employed) | Negative |

Conclusion

This chapter has explained the models being used and described every dependent and independent variable. It has also described the data set collected for this thesis and presented tables of descriptive statistics, correlations and expected signs. The following chapter will examine the results of the regressions.

CHAPTER IV

RESULTS

The purpose of this chapter is to examine the outcomes of the regressions described in the previous chapters. As explained before, two regressions were run, with winning percentage and earned runs against as the dependent variables. Model 1 has thirteen independent variables and model 2 has eleven. Using two decades of data, the regressions were run as described in the previous chapter. This chapter explains the results of the regressions and then discusses any problems that occurred while running the tests.

Model 1

Below is the equation used to run the model 1 regression, with winning percentage as the dependent variable and thirteen independent variables.

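$$
\begin{aligned}
\text{Winning Percentage} = \alpha &+ \beta_1\,\text{Hits Allowed} + \beta_2\,\text{Homeruns Allowed} + \beta_3\,\text{Runs Allowed} \\
&+ \beta_4\,\text{Walks} + \beta_5\,\text{Strikeouts} + \beta_6\,\text{Errors} + \beta_7\,\text{Pitchers Average Age} \\
&+ \beta_8\,\text{Percent of Left-handed Starters} + \beta_9\,\text{Percent of Left-handed Relievers} \\
&+ \beta_{10}\,\text{Percent of Left-handed Pitchers} + \beta_{11}\,\text{Attendance} + \beta_{12}\,\text{Payroll} + \beta_{13}\,\text{Ratio} + \varepsilon
\end{aligned}
$$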


The first test run with the model 1 equation was an ordinary least squares (OLS) regression on pooled data. Below are the results after the equation was run in Stata.

TABLE 4.1

OLS REGRESSION RESULTS FOR MODEL 1

Linear regression (dependent variable: wp): Number of obs = 586; F(12, 572) = .; Prob > F = .; R-squared = 0.5272; Root MSE = .04908

| Variable | Coef. | Robust Std. Err. | t | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
| h | .000139 | .0000372 | 3.74 | 0.000 | .0000659 | .000212 |
| hr | .0002637 | .0001228 | 2.15 | 0.032 | .0000225 | .0005048 |
| r | -.0004918 | .0000703 | -7.00 | 0.000 | -.0006299 | -.0003537 |
| bb | -.0000706 | .0000439 | -1.61 | 0.108 | -.0001569 | .0000156 |
| so | .0000885 | .0000189 | 4.69 | 0.000 | .0000514 | .0001256 |
| e | -.0003553 | .0001487 | -2.39 | 0.017 | -.0006474 | -.0000633 |
| pitchage | .0091983 | .001878 | 4.90 | 0.000 | .0055096 | .0128869 |
| plhstart | -.0797857 | .1349123 | -0.59 | 0.554 | -.3447696 | .1851982 |
| plhrelievers | -.0784586 | .1306966 | -0.60 | 0.549 | -.3351624 | .1782452 |
| plhpitch | .1743214 | .2637295 | 0.66 | 0.509 | -.3436749 | .6923178 |
| attendance | 2.69e-08 | 3.76e-09 | 7.17 | 0.000 | 1.96e-08 | 3.43e-08 |
| payroll | -2.96e-10 | 9.48e-11 | -3.12 | 0.002 | -4.82e-10 | -1.10e-10 |
| Ratio | .0038373 | .0041575 | 0.92 | 0.356 | -.0043287 | .0120032 |
| _cons | .2946437 | .0645295 | 4.57 | 0.000 | .1678999 | .4213874 |

To start off, the regression resulted in an R-squared of 0.5272 and an adjusted R-squared of 0.5166. This means that the independent variables explain about 52 percent of the variation in teams' winning percentages. An R-squared of around 50 percent makes sense, because all of the variables other than attendance and payroll describe only the defensive side of the game.
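Table 4.1 reports pooled OLS coefficients with robust standard errors. The plain-Python sketch below shows, on a tiny made-up data set, what that computation involves: coefficients from the normal equations, heteroskedasticity-robust standard errors in the HC1 form (the sandwich estimator scaled by n/(n - k)), and R-squared. It is an illustration under those assumptions, not the Stata code used for this thesis, and all helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for perfectly collinear regressors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_hc1(X, y):
    """Pooled OLS with a constant; returns (coefficients, HC1 robust SEs, R-squared).

    X is a list of observations, each a list of regressor values (no constant);
    coefficient 0 is the constant.
    """
    rows = [[1.0] + [float(v) for v in obs] for obs in X]
    n, k = len(rows), len(rows[0])
    if n <= k:
        raise ValueError("need more observations than coefficients")
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [[sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    inv = invert(xtx)
    beta = [b[0] for b in matmul(inv, xty)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    # "Meat" of the sandwich: sum over observations of e_i^2 * x_i x_i'
    meat = [[sum(e * e * r[i] * r[j] for r, e in zip(rows, resid)) for j in range(k)]
            for i in range(k)]
    cov = matmul(matmul(inv, meat), inv)
    scale = n / (n - k)  # HC1 small-sample correction
    se = [math.sqrt(max(0.0, scale * cov[i][i])) for i in range(k)]
    y_bar = sum(y) / n
    r2 = 1 - sum(e * e for e in resid) / sum((yi - y_bar) ** 2 for yi in y)
    return beta, se, r2


# Tiny made-up data set with one regressor.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
beta, se, r2 = ols_hc1([[v] for v in x], y)
print([round(b, 4) for b in beta])  # [0.05, 1.99]
```

In practice a statistics package should be used; the point of the sketch is only that robust standard errors leave the coefficients unchanged and alter the estimated uncertainty around them.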

The results show a number of significant variables. Hits allowed, homeruns allowed, runs against, strikeouts, errors, average pitchers age, attendance and payroll are all significant independent variables. Walks is only marginally significant, at roughly the 10% level. The independent variables that are not significant are percent of left-handed starters, percent of left-handed relievers, percent of left-handed pitchers and ratio.

Because percent of left-handed pitchers is built from the same pitchers as percent of left-handed starters and percent of left-handed relievers, including all three in the same regression could distort the results. The regression was therefore also run with only percent of left-handed starters and percent of left-handed relievers, leaving out percent of left-handed pitchers, and with only percent of left-handed pitchers, leaving out the other two. In every combination, the three variables remained insignificant and the other variables did not change. For this reason, all three were kept in the table above. These results mean that there is no statistically significant relationship between winning percentage and the share of left-handed starting pitchers, the share of left-handed relieving pitchers, or the overall share of left-handed pitchers. All three variables were predicted to have a positive effect on winning percentage, but none of them had a significant effect.

The ratio variable was not significant, although it was predicted to have a positive relationship with winning percentage. The hypothesis of this thesis is that Major League Baseball teams should employ between 18% and 30.1% left-handed pitchers. The descriptive statistics in Table 3.1 show that the mean of Ratio is .576, so a majority of team-seasons do fall within that range, but that does not necessarily give those teams a better chance to win.

Several of the significant variables had signs that differed from their predictions. Hits allowed was predicted to have a negative sign in model 1 but turned out to be positive and significant. Homeruns allowed was similar: it was also predicted to be negative but ended up positive.

The last significant variable with a surprising result is payroll. It was predicted to have a positive relationship with winning percentage, but the results show a negative association. Strikeouts, average pitchers age and attendance all turn out to have positive signs as predicted. Runs allowed and errors were also predicted correctly, with negative relationships with winning percentage.

The positive relationships between winning percentage and hits and homeruns allowed are hard to explain. There is no reason that allowing hits, and giving the other team opportunities to score, should help a team's winning percentage.

Homeruns allowed is even more puzzling, because every homerun gives the opponent at least one run. Since the team that scores more runs wins, allowing more homeruns and hits should not help a team's chances of winning. One possible explanation is that runs allowed is held constant in the regression, so these coefficients only capture hits and homeruns that did not lead to additional runs. At the least, although allowing them should not help a team win, the results suggest that teams may not need to worry as much about them hurting their chances of winning.

The variable payroll also had a surprising result. The reasoning behind predicting a positive relationship between payroll and winning percentage was that teams could pay more money to collect better players and in turn win more games. It turns out that spending more money on players does not simply win more games. The negative coefficient on payroll means that, holding the other variables constant, teams with lower payrolls have had more success than teams with higher payrolls.

Average pitchers age did not have a predicted sign because there was no logical reason to predict it one way or the other. The results show it to be positive and significant, so teams with older pitchers seem to have won more than teams with younger pitchers over the past two decades. The article “Peak Performance and Age Among Superathletes: Track and Field, Swimming, Baseball, Tennis, and Golf”1, introduced in the literature review, explained that most athletes who need explosive power, speed and strength reach their peak performance in their early twenties, while athletes who need endurance, acquired skill and knowledge usually peak in their late twenties or early thirties. The average age of a Major League Baseball pitcher over the past two decades was 28.63 years. This suggests that pitchers need endurance, acquired skill and knowledge more than explosive power, speed and strength in order to be successful. As pitchers reach their late twenties and early thirties, they have had time to gain those important traits and in turn help their teams win more games.

The next regression run on model 1 was a fixed effects regression. A fixed effects (within) estimator uses the panel structure of the data: each variable is expressed as a deviation from its team's average over the sample period, which removes the influence of team characteristics that do not change over time. In this thesis the group variable is the team (teamid). The results of the fixed effects regression on model 1 are shown in Table 4.2.
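To illustrate the within transformation, the sketch below demeans each variable by team and then runs OLS on the demeaned data for a single regressor. The team labels and numbers are hypothetical, built so that both teams share the same slope but have different intercepts, and `within_transform` and `fe_slope` are illustrative helper names, not Stata commands.

```python
from collections import defaultdict


def within_transform(groups, values):
    """Subtract each group's mean from its observations (the fixed-effects 'within' step)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


def fe_slope(groups, x, y):
    """Fixed-effects slope for one regressor: OLS on team-demeaned data."""
    x_w = within_transform(groups, x)
    y_w = within_transform(groups, y)
    sxx = sum(v * v for v in x_w)
    if sxx == 0:
        raise ValueError("x does not vary within any group")
    return sum(a * b for a, b in zip(x_w, y_w)) / sxx


teams = ["A", "A", "A", "B", "B", "B"]  # hypothetical team ids
x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 7, 13, 15, 17]               # y = 1 + 2x for team A, y = 5 + 2x for team B
print(fe_slope(teams, x, y))            # 2.0
```

Here the fixed-effects slope recovers the common slope of 2, while a pooled regression on the same six points would give a slope of about 3.03, because it mixes the difference between the two teams' levels into the slope. Packaged fixed-effects routines typically also report a constant term; this sketch leaves that step out.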

1 R. Schulz and C. Curnow, “Peak Performance and Age Among Superathletes: Track and Field, Swimming, Baseball, Tennis, and Golf,” Journal of Gerontology 43 no. 5 (1988): 113-120.

TABLE 4.2

FIXED EFFECTS REGRESSION ON MODEL 1

Fixed-effects (within) regression (dependent variable: wp); group variable: teamid

Number of obs = 586; number of groups = 30; obs per group: min = 14, avg = 19.5, max = 20

R-sq: within = 0.5287, between = 0.4676, overall = 0.5054

F(13, 543) = 46.86; Prob > F = 0.0000; corr(u_i, Xb) = -0.1470

| Variable | Coef. | Std. Err. | t | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
| h | .0001156 | .000034 | 3.40 | 0.001 | .0000488 | .0001824 |
| hr | .0005157 | .0001186 | 4.35 | 0.000 | .0002828 | .0007485 |
| r | -.0005556 | .000066 | -8.41 | 0.000 | -.0006853 | -.0004258 |
| bb | -.0000511 | .0000409 | -1.25 | 0.212 | -.0001315 | .0000292 |
| so | .0001322 | .0000197 | 6.71 | 0.000 | .0000935 | .0001708 |
| e | -.0003181 | .0001333 | -2.39 | 0.017 | -.00058 | -.0000562 |
| pitchage | .0072374 | .0016987 | 4.26 | 0.000 | .0039005 | .0105742 |
| plhstart | .051318 | .120099 | 0.43 | 0.669 | -.1845975 | .2872336 |
| plhrelievers | .0567451 | .1146059 | 0.50 | 0.621 | -.1683802 | .2818704 |
| plhpitch | -.096992 | .2346134 | -0.41 | 0.679 | -.5578531 | .363869 |
| attendance | 3.45e-08 | 4.12e-09 | 8.37 | 0.000 | 2.64e-08 | 4.25e-08 |
| payroll | -6.03e-10 | 8.95e-11 | -6.74 | 0.000 | -7.79e-10 | -4.27e-10 |
| Ratio | .002851 | .0038587 | 0.74 | 0.460 | -.0047288 | .0104307 |
| _cons | .334792 | .0544144 | 6.15 | 0.000 | .2279035 | .4416805 |

sigma_u = .02725822; sigma_e = .04347014; rho = .28222775 (fraction of variance due to u_i)

F test that all u_i = 0: F(29, 543) = 6.42; Prob > F = 0.0000

The results of the fixed effects regression are very similar to those of the OLS regression for model 1. Every independent variable that is significant in the pooled OLS regression is also significant in the fixed effects regression, and all of them keep the same sign.

To conclude for model 1, the hypothesis was tested and could not be supported because the ratio variable is insignificant. This means that teams who employ between 18% and 30.1% left-handed pitchers do not necessarily improve their chance of winning. Although ratio does not significantly affect winning percentage, several other variables do increase or decrease a team's winning percentage. Two diagnostic tests were run on these regressions. The White test indicated heteroskedasticity in model 1, so the regression was run with robust standard errors to correct for it. The Jarque-Bera test was also run, and it did not reject the hypothesis that the residuals are normally distributed.
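The Jarque-Bera test compares the skewness and kurtosis of the residuals with those of a normal distribution (skewness 0, kurtosis 3). A minimal sketch, assuming the residuals are already available as a list; the five values in the example are made up and the function name is hypothetical.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, p-value) using sample skewness and kurtosis."""
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    # Under normality JB is chi-square with 2 degrees of freedom, whose upper tail is exp(-x/2).
    return jb, math.exp(-jb / 2)


stat, p_value = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(stat, 3), round(p_value, 3))
```

A large p-value means normality is not rejected; it does not prove that the residuals are normal, and with very few observations the chi-square approximation is rough.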

Model 2

Below is the equation used to run the model 2 regression with the dependent variable earned runs allowed along with the eleven independent variables.

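$$
\begin{aligned}
\text{Earned Runs Allowed} = \alpha &+ \beta_1\,\text{Hits Allowed} + \beta_2\,\text{Homeruns Allowed} + \beta_3\,\text{Walks} + \beta_4\,\text{Strikeouts} \\
&+ \beta_5\,\text{Pitchers Average Age} + \beta_6\,\text{Percent of Left-handed Starters} + \beta_7\,\text{Percent of Left-handed Relievers} \\
&+ \beta_8\,\text{Percent of Left-handed Pitchers} + \beta_9\,\text{Attendance} + \beta_{10}\,\text{Payroll} + \beta_{11}\,\text{Ratio} + \varepsilon
\end{aligned}
$$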

The first test run with the model 2 equation was an ordinary least squares (OLS) regression on pooled data. Table 4.3 shows the results after the equation was run in Stata.

TABLE 4.3

OLS REGRESSION RESULT FOR MODEL 2

Linear regression                 Number of obs = 586
                                  F(11, 574)    = 526.51
                                  Prob > F      = 0.0000
                                  R-squared     = 0.9110
                                  Root MSE      = 27.803

er                  Coef.   Robust Std. Err.       t   P>|t|    [95% Conf. Interval]
h                .3713769           .0137078   27.09   0.000     .3444534     .3983004
hr               1.139358           .0531444   21.44   0.000     1.034977     1.243739
bb               .3494234            .018849   18.54   0.000      .312402     .3864448
so              -.1225847           .0098626  -12.43   0.000    -.1419558    -.1032136
pitchage        -.6996043           .9999415   -0.70   0.484    -2.663595     1.264386
plhstart         51.27947           74.15049    0.69   0.489    -94.35991     196.9189
plhrelievers     40.35783           71.65061    0.56   0.573    -100.3715     181.0872
plhpitch        -100.8597           145.0158   -0.70   0.487     -385.686     183.9666
attendance      -5.67e-06           2.39e-06   -2.37   0.018    -.0000104    -9.76e-07
payroll          1.65e-07           4.76e-08    3.46   0.001     7.14e-08     2.58e-07
Ratio            .1048559           2.306115    0.05   0.964    -4.424598      4.63431
_cons           -70.72065           33.59877   -2.10   0.036    -136.7122    -4.729137

This regression has an R-squared of 0.9110 and an adjusted R-squared of 0.9093. This means that the eleven independent variables in this equation account for about 91 percent of the variation in teams’ earned runs against. It was expected that model 2 would have a higher R-squared and adjusted R-squared than model 1, because the majority of the independent variables used in both equations describe the defensive side of baseball. Since earned runs against is a purely defensive statistic, these defensive regressors fit it more closely than they fit winning percentage, which also depends on a team’s offense.
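Adjusted R-squared penalizes R-squared for the number of regressors: adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of slope coefficients. Plugging in the values from Table 4.3 (n = 586, k = 11, R² = 0.9110) reproduces the reported 0.9093, as this short Python check shows:

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k slope coefficients (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values reported for the model 2 OLS regression in Table 4.3
adj = adjusted_r_squared(0.9110, n=586, k=11)
print(f"{adj:.4f}")  # 0.9093
```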

As in model 1, the model 2 equation includes three left-handed pitching variables: percent of left-handed starting pitchers, percent of left-handed relieving pitchers and percent of left-handed pitchers. These variables overlap, because the percent of left-handed pitchers combines the percent of left-handed starters and the percent of left-handed relievers, so the three are highly collinear. The same procedure used for model 1 was followed for model 2: several regressions were run with different combinations of the three variables, and all of them gave the same result. No matter which of the three variables were included or excluded, all three remained insignificant and the results for the other variables did not change. For this reason, all three variables are kept in Table 4.3 to show their estimates.
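To see why the three left-handed variables overlap, assume (hypothetically; the thesis may define the shares differently, for example by innings pitched) that each percentage is a headcount share. The overall share is then a weighted average of the starter and reliever shares, with the starters’ share of the staff as the weight. The counts below are illustrative only.

```python
def share(part: int, total: int) -> float:
    """Headcount share (an assumed definition of the percentage variables)."""
    return part / total


# Hypothetical pitching staff: counts are illustrative, not taken from the thesis data
starters, lh_starters = 5, 2
relievers, lh_relievers = 8, 2

plhstart = share(lh_starters, starters)
plhrelievers = share(lh_relievers, relievers)
plhpitch = share(lh_starters + lh_relievers, starters + relievers)

# The overall share equals a weighted average of the two sub-shares
w = starters / (starters + relievers)
reconstructed = w * plhstart + (1 - w) * plhrelievers
print(plhpitch, reconstructed)
```

Because the weight w changes from one team-season to the next, plhpitch is not an exact linear combination of the other two, so all three coefficients can still be estimated; their standard errors are inflated, however, which is consistent with all three being insignificant.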

Like the other left-handed pitching variables, ratio is insignificant. The hypothesis of this thesis was that Major League Baseball teams should employ between 18% and 30.1% left-handed pitchers in order to maximize success, where success is measured by winning percentage and earned runs against. Because ratio is insignificant, we cannot conclude that teams should employ between 18% and 30.1% left-handed pitchers to lower earned runs against.
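The construction of Ratio is not restated in this chapter. Assuming it is a 0/1 indicator equal to 1 when a team-season’s share of left-handed pitchers lies in the hypothesized band, with both endpoints included (an assumption), it could be built as in the hypothetical helper below. Its coefficient then measures the average difference in earned runs against between teams inside and outside the band, holding the other regressors constant.

```python
def ratio_indicator(lh_share: float, low: float = 0.18, high: float = 0.301) -> int:
    """Hypothetical reconstruction of Ratio: 1 if the left-handed share is in [low, high]."""
    return int(low <= lh_share <= high)


# Hypothetical team-season shares of left-handed pitchers
for lh_share in (0.15, 0.18, 0.25, 0.301, 0.35):
    print(lh_share, ratio_indicator(lh_share))
```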

Most of the independent variables are significant, but a noteworthy number are not. Six of the eleven variables are significant: hits allowed, homeruns allowed, walks, strikeouts, attendance and payroll. The insignificant variables are average pitcher age, percent of left-handed starters, percent of left-handed relievers, percent of left-handed pitchers and ratio.

For most of the significant independent variables, the estimated signs match the predicted signs. Hits allowed, homeruns allowed and walks all have a positive effect on earned runs against, while strikeouts and attendance are the two significant variables with a negative effect. Payroll is the only significant variable whose sign is opposite to the prediction: it was expected to have a negative effect on earned runs against, but the results show a positive effect.
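Each coefficient in Table 4.3 is the change in fitted earned runs against for a one-unit change in that variable, holding the others fixed. The sketch below plugs the Table 4.3 coefficients into the model 2 equation for a hypothetical team-season; the input values, and the assumption that attendance is a head count and payroll is in dollars, are illustrative only.

```python
# Pooled OLS coefficients for model 2, copied from Table 4.3
COEFS = {
    "h": 0.3713769,
    "hr": 1.139358,
    "bb": 0.3494234,
    "so": -0.1225847,
    "pitchage": -0.6996043,
    "plhstart": 51.27947,
    "plhrelievers": 40.35783,
    "plhpitch": -100.8597,
    "attendance": -5.67e-06,
    "payroll": 1.65e-07,
    "Ratio": 0.1048559,
}
INTERCEPT = -70.72065


def predicted_er(team: dict) -> float:
    """Fitted earned runs allowed for one team-season under the model 2 equation."""
    return INTERCEPT + sum(COEFS[name] * team[name] for name in COEFS)


# Hypothetical team-season; all inputs are illustrative, not observed data
team = {
    "h": 1400, "hr": 160, "bb": 500, "so": 1100, "pitchage": 29.0,
    "plhstart": 0.4, "plhrelievers": 0.3, "plhpitch": 0.33,
    "attendance": 2_500_000, "payroll": 90_000_000, "Ratio": 0,
}
base = predicted_er(team)
one_more_walk = predicted_er({**team, "bb": team["bb"] + 1})
print(f"fitted ER: {base:.1f}")
print(f"effect of one extra walk: {one_more_walk - base:.4f}")  # 0.3494
```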

Table 4.4 shows the fixed effects regression for model 2.


TABLE 4.4

FIXED EFFECTS REGRESSION ON MODEL 2

Fixed-effects (within) regression     Number of obs      = 586
Group variable: teamid                Number of groups   = 30

R-sq:  within  = 0.9016               Obs per group: min = 14
       between = 0.9579                              avg = 19.5
       overall = 0.9103                              max = 20

                                      F(11, 545)         = 453.74
corr(u_i, Xb) = 0.2580                Prob > F           = 0.0000

er                  Coef.   Std. Err.       t   P>|t|    [95% Conf. Interval]
h                 .353919    .0134322   26.35   0.000     .3275337     .3803042
hr               1.106106    .0547297   20.21   0.000     .9985994     1.213613
bb               .3450901    .0201029   17.17   0.000     .3056014     .3845789
so              -.1048602     .010983   -9.55   0.000    -.1264345     -.083286
pitchage         .0091312    1.026672    0.01   0.993    -2.007588      2.02585
plhstart          78.8449    72.58729    1.09   0.278    -63.74022       221.43
plhrelievers     58.70034    69.28763    0.85   0.397    -77.40317     194.8038
plhpitch        -145.0158    141.8124   -1.02   0.307    -423.5817     133.5501
attendance      -8.07e-06    2.46e-06   -3.28   0.001    -.0000129    -3.24e-06
payroll          1.49e-07    5.09e-08    2.93   0.004     4.91e-08     2.49e-07
Ratio            1.285684    2.332079    0.55   0.582    -3.295281     5.866648
_cons           -70.89669    32.81038   -2.16   0.031     -135.347    -6.446402

sigma_u 11.54615 sigma_e 26.284876 rho .16174745 (fraction of variance due to u_i)

F test that all u_i=0: F(29, 545) = 3.35 Prob > F = 0.0000
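The degrees of freedom on the F test that all u_i=0 follow from the panel dimensions: the numerator is the number of teams minus one, and the denominator is the number of observations minus the number of team effects minus the number of slope coefficients. This small Python helper (a hypothetical name) reproduces F(29, 543) for model 1, which has 13 regressors, and F(29, 545) for model 2, which has 11:

```python
def fe_degrees_of_freedom(n_obs: int, n_groups: int, k: int) -> tuple:
    """Degrees of freedom for the F test that all team effects u_i are zero."""
    numerator = n_groups - 1            # restrictions: team effects beyond the first
    denominator = n_obs - n_groups - k  # residual df after team means and k slopes
    return numerator, denominator


print(fe_degrees_of_freedom(586, 30, 13))  # model 1: (29, 543)
print(fe_degrees_of_freedom(586, 30, 11))  # model 2: (29, 545)
```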

The fixed effects regression on model 2 produced essentially the same results as the pooled OLS regression on model 2. We still cannot conclude that teams should employ between 18% and 30.1% left-handed pitchers to decrease earned runs against, because ratio remains insignificant. The variables that are significant in the OLS regression are also significant in the fixed effects regression, and every variable has the same sign in both regressions. The R-squared values are also nearly identical (0.9110 for OLS and 0.9103 overall for fixed effects), and the t-statistics are similar across the two regressions. Among the significant variables, hits allowed, homeruns allowed, walks and strikeouts have by far the largest coefficients.

The White test indicated heteroskedasticity in model 2, so the regression was run with robust standard errors, which keep the t-statistics and p-values valid when the error variance is not constant.

The Jarque-Bera test was also run, and it did not reject the hypothesis that the residuals are normally distributed.
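White’s test regresses the squared residuals on the regressors, their squares and their cross-products, and compares LM = n · R² from that auxiliary regression with a chi-square distribution. With eleven regressors the auxiliary regression is large, so the sketch below shows the mechanics for a single regressor x, where the auxiliary regression uses x and x² and LM is chi-square with 2 degrees of freedom. It is a plain-Python illustration on hypothetical data with illustrative function names, not the routine used for this thesis (in Stata, `estat imtest, white` after `regress` reports this test).

```python
import math


def ols_r_squared(y, columns):
    """R-squared from regressing y on an intercept plus the given columns (normal equations)."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda row_idx: abs(aug[row_idx][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][p] / aug[i][i] for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def white_test_one_regressor(residuals, x):
    """White's LM test with one regressor: regress e^2 on x and x^2; LM = n * R^2 ~ chi2(2)."""
    e2 = [e * e for e in residuals]
    r2 = ols_r_squared(e2, [x, [xi * xi for xi in x]])
    lm = len(residuals) * r2
    p_value = math.exp(-lm / 2)  # chi-square(2) survival function
    return lm, p_value


# Hypothetical residuals whose spread grows with x, so e^2 = 1 + 2x + 3x^2 exactly
xs = [float(i) for i in range(1, 11)]
resid = [(-1) ** i * math.sqrt(1 + 2 * x + 3 * x * x) for i, x in enumerate(xs)]
lm, p = white_test_one_regressor(resid, xs)
print(f"LM = {lm:.2f}, p = {p:.4f}")  # LM = 10.00 because the auxiliary R^2 is 1
```

A small p-value, as here, rejects constant error variance, which is the situation in which robust standard errors are used.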

Conclusion

This chapter presented the results of the equations for model 1 and model 2. For each model, an OLS regression with pooled data and a fixed effects regression were run. The next chapter draws the final conclusions from this research and suggests ideas for future research on this topic.

CHAPTER V

CONCLUSION

The purpose of this chapter is to draw conclusions from the regression results, explain the limitations of the study and discuss possible extensions that could improve it.

Regression Conclusions

This study focused on the handedness of pitchers in Major League Baseball. It was done to explore why there are so many more left-handed players in the MLB than in society as a whole, and whether there is an optimal percentage of left-handed pitchers to employ.

The main goal of this thesis was to test the hypothesis that Major League Baseball teams should employ between 18% and 30.1% left-handed pitchers. Two models were used to test this hypothesis. The first used winning percentage as the dependent variable.

Winning percentage was used because it is the best single statistic for measuring a team’s success: by looking at winning percentage alone, one can evaluate how well a team performed in a given year. The second regression used earned runs against as the dependent variable and was constructed to look solely at the success of a team’s defense. In both models the variable ratio is insignificant. Although this study did not show that teams should employ between 18% and 30.1% left-handed pitchers, several other interesting conclusions can be drawn from the other variables in the regressions.


In model 1, the R-squared value is 52.72%. This means that the independent variables explain about 53% of the variation in the dependent variable, winning percentage.

Eight independent variables are significant: hits allowed, homeruns allowed, runs against, strikeouts, errors, average pitcher age, attendance and payroll. Of these eight, hits allowed, homeruns allowed and payroll all have the opposite sign from what was predicted. Because runs against is held constant in model 1, the positive signs on hits and homeruns allowed describe their effect for a given number of runs allowed. With that caveat, it is possible that teams need not worry as much about allowing hits and homeruns, or about paying players more than their competitors. Average pitcher age had no predicted sign because there was no clear reason to expect one direction over the other. The results show it to be highly significant and positive, which suggests that teams should look for older, more experienced pitchers.

Model 2 has an R-squared value of 0.9110, meaning that the eleven independent variables in this model account for about 91% of the variation in teams’ earned runs against. Model 2 has a much higher R-squared than model 1, which was expected because most of the independent variables are defensive statistics. Six of the eleven independent variables are significant. Just as in model 1, payroll is significant with a sign opposite to the prediction, which reinforces the possibility that payroll matters less for success than is commonly assumed. Teams might do better to spend more money on scouts who find young talent to draft instead of buying very expensive veterans. This money could also be spent on improving their minor league programs to develop players through their own system.

Although this study did not find the optimal number of left-handed pitchers to employ to maximize success, it still produced relevant information. As explained in previous chapters, left-handed starting pitchers are paid more on average than right-handed pitchers. This study finds no evidence of a particular share of left-handed pitchers that improves success, so there is no real reason to pay more for a left-handed pitcher simply because he is left-handed.

Limitations of the Study and Possible Future Research

This study has a number of limitations. The main limitation was time. With more time to complete this thesis, more variables could have been examined; for instance, instead of testing only the 18% to 30.1% ratio, a number of other ratios could have been tested. More time would also have allowed more observations to be collected.

Further research could look at the effect of left-handed players on the success of a Major League Baseball team as a whole. Such a study would include defensive and offensive statistics and look at both pitchers and batters. By looking only at the defensive side of baseball, this study ignores half of the game.

Another useful extension of this study would be to account for every pitcher’s salary. If each individual pitcher’s salary were included, the pay difference between left-handed and right-handed pitchers could be examined over the past twenty years.

SOURCES CONSULTED

"Baseball-Reference." Available from http://www.baseball-reference.com/.

"USA TODAY Salaries Databases." in USA TODAY [database online]. Available from http://content.usatoday.com/sportsdata/baseball/mlb/salaries/team/2009.

Boal, William, and Michael Ransom. "Monopsony in American Labor Markets." EH.Net Encyclopedia, edited by Robert Whaples, January 23, 2002. Available from http://eh.net/encyclopedia/article/boal.monopsony.

Boal, William, and Michael R. Ransom. "Monopsony in the Labor Market." Journal of Economic Literature 35, no. 1 (1997): 86-112.

Bradbury, J. C. "Does the Baseball Labor Market Properly Value Pitchers?" Journal of Sports Economics 8, no. 6 (2007): 616-632.

Bradbury, J. C. "Do Southpaws Get a Fair Shake in MLB? Part Two: Pitchers." Sabernomics (2006).

Daley, Ken. "Baseball's BEST Left-Handed Pitchers." Baseball Digest 59, no. 8 (2000): 56.

Davis, Michael C. "The Interaction between Baseball Attendance and Winning Percentage: A VAR Analysis." International Journal of Sport Finance 3, no. 1 (2008): 58-73.

DeBoer, Addison. "Baseball and the Left Handed Hitter." (2010).

Dellaserra, Alysia. "Left Handed Batter VS. A Right Handed Pitcher." Livestrong (2010).

Demmink, Herman. "Value of stealing bases in Major League Baseball: 'Stealing' runs and wins." Public Choice 142, no. 3-4 (2010): 497-505.

Flanagan, Thomas. "Game theory and professional baseball: mixed-strategy models. / La theorie du jeu et le baseball professionnel: modeles a strategie mixte." Journal of Sport Behavior 21, no. 2 (1998): 121-138.

Fortenbaugh, Dave, and Monique Butcher-Mokha. "The biomechanics of situational baseball: Execution and perception of left-handed pitchers' simulated pick-off moves to first base." Sports Biomechanics 6, no. 1 (2007): 2-16.

Fraley, Gerry. "Southpaws." Baseball Digest 59, no. 8 (2000): 52.

Gitter, Seth, and Thomas Rhoads. "Determinants of Minor League Baseball Attendance." Journal of Sports Economics 11, no. 6 (2010): 614-628.

Goldstein, Stephen, and Charlotte Young. "'Evolutionary' stable strategy of handedness in major league baseball." Journal of Comparative Psychology 110, no. 2 (1996): 164-169.

Karp, Josie. "Low supply of left-handed pitchers creates demand." Baseball Digest 53, no. 11 (1994): 25.

Krautmann, Anthony C., E. Gustafson, and L. Hadley. "A note on the structural stability of salary equations: major league baseball pitchers." Journal of Sports Economics 4, no. 1 (2003): 56-63.

Krautmann, Anthony C., Peter von Allmen, and David Bern. "The Underpayment of Restricted Players In North American Sports Leagues." International Journal of Sport Finance 4, no. 3 (2009): 161-175.

Merrill, Everett J. "Left-Handed Starters: Do They Make a Difference?" Baseball Digest 59, no. 10 (2000): 54.

Perloff, Jeffrey. Microeconomics. Third ed. Pearson Addison Wesley, 2004.

Peters, Dave. "Left-handers have advantage in baseball, researcher concludes." The Japan Times Online.

Schulz, R., and C. Curnow. "Peak Performance and Age Among Superathletes: Track and Field, Swimming, Baseball, Tennis, and Golf." Journal of Gerontology 43, no. 5 (1988): 113-120.

Schwartz, Barry, and Stephen F. Barsky. "The Home Advantage." Social Forces 55, no. 3 (1977): 641-661.

Scully, Gerald. "Pay and Performance in Major League Baseball." The American Economic Review 64, no. 6 (1974): 915-930.

Wood, C. J., and J. P. Aggleton. "Handedness in 'fast ball' sports: Do left-handers have an innate advantage?" British Journal of Psychology 80, no. 2 (1989): 227.

Yamamura, Eiji. "Team Payroll, Competitive Balance, and Team Performance of the Japan Professional Baseball League." Empirical Economics Letters 7, no. 9 (2008): 909-916.