HAVERFORD COLLEGE DEPARTMENT OF ECONOMICS

Players Streak Too: The Economic Significance of an Extended Hitting Streak in Major League Baseball

Matthew Seskin 12/21/12

Seskin 2

Table of Contents

Abstract
I. Introduction
II. Literature Review
    A. Inter‐seasonal Factors
    B. Intra‐seasonal Factors
III. Methodology
    A. Data
    B. Model
    C. Time Aspect
    D. Without Joe
IV. Results
    A. Original Regression Results
    B. Time Aspect Regression Results
    C. Robustness: The Results without Joe
V. Conclusion
Bibliography
Appendix


Abstract

This paper analyzes the impact of a player’s extended hitting streak on his team’s home attendance. Adding to the previous literature, which predominantly looks at the effects that team performance, timing, and economic circumstances have on attendance, this study looks at how individual player effort can draw crowds to the games. Using OLS regression with fixed effects, the study analyzes 26 streaks that took place between 1941 and 2011. The data contain variables that control for other factors that have been shown to affect attendance, allowing the results to isolate the effect of the hitting streaks. The analysis also considers how fan reaction to streaks, at least in the form of attendance, has changed over time, and whether the results are still economically significant when the longest and most famous streak, Joe DiMaggio’s, is taken out of the equation. The study yields economically and statistically significant results supporting the hypothesis that hitting streaks have a positive effect on attendance.


I. Introduction

In over 100 years of existence, Major League Baseball (MLB) has experienced ups and downs in popularity. Yogi Berra, in one of his famous “Yogi‐isms,” once proclaimed, “If people don’t want to come out to the ballpark, nobody’s going to stop ‘em” (E. M. Knowles, 1999). However, by observing historical attendance data and presumed related variables across the seasons, economists have begun to understand what brings people to the ballpark, and, in the words of Yogi, what doesn’t stop them from not coming.

Much of the literature to this point, with the exception of Horowitz and Lackritz (2012), has analyzed factors that affect fan attendance from a non‐individualistic standpoint. Most of the variables that have been used to explain changes in attendance between seasons, and even within seasons, have nothing to do with individual player effort or talent. They are instead centered around the team’s performance, current economic atmosphere, timing of the game (time, day, month, year), and even the age of the home team’s stadium.

Every year, however, owners go out and spend millions of dollars to sign “superstar” players to their teams. Granted, much of the reason these owners justify shelling out this kind of capital is the hope that these players will help their teams win, which will, in turn, raise attendance and revenues. Presumably, however, owners also believe that superstar players’ individual flair and achievements alone would draw crowds to their games.

This speculation makes the gap in the literature startling. Few economists have focused on the degree to which the accomplishments of individual players in their pursuit, or breaking, of highly coveted and respected Major League Baseball records have affected fan attendance.

MLB fans have made it abundantly clear that they are enchanted by great feats in baseball. For example, according to NBC Sports, Barry Bonds’ record‐breaking career home run ball sold for $752,467 in 2007, and Mark McGwire’s record‐breaking single‐season home run ball sold at auction for $3 million in 1998 (“Bonds’ record ball fetches $752,467,” 2007). Although home run balls may make the biggest splash money‐wise, home run records are not the only player accomplishments fans care about.

On July 9, 2011, Derek Jeter’s 3,000th career hit soared out of Yankee Stadium and into the hands of Christian Lopez, a recent college graduate. Lopez could have kept the ball and sold it for an estimated $250,000, but instead decided to give the ball back to Jeter, saying, “Yeah, money is cool and all, but I’m only 23 years old. I have a lot of time to make that. His (Jeter’s) accomplishment is a milestone” (Matuszewski, 2011). This act again illustrates how meaningful these individual player efforts are to fans.

Many sports fans, writers, and players agree, however, that one of the most impressive performances in baseball is putting together an extended hitting streak. An extended hitting streak is defined as a stretch of consecutive games in which a player has at least one hit in each game, the most famous of which is New York Yankee Joe DiMaggio’s 56‐game hitting streak in 1941. Jayson Stark of ESPN called the now 71‐year‐old streak “the coolest, most romantic record in sports” (Jayson Stark, 2011). Since the day the streak ended, 25 players have had single‐season hitting streaks of at least 30 games, but none have come within 10 games of DiMaggio’s record (Pete Rose of the Cincinnati Reds came the closest in 1978 with a 44‐game streak). Paul Molitor, who had a 39‐game streak in 1987 with the Milwaukee Brewers, attempted to describe the majesty of a multi‐game hitting streak: “There’s something magical about a hitting streak. It’s the fact that it develops over time and the day‐to‐day pressures that come along as the streak mounts” (Jayson Stark, 2011).

There is no question that everyone in the baseball world is impressed by the individual player effort of an extended hitting streak. Although the financial value of a hitting streak cannot be ascertained through an auction price of the final‐hit‐ball, the economic impact of such a streak could be measured by analyzing how it affects game‐day attendance.

I am interested in finding the effect that a player’s extended hitting streak has on his team’s home attendance. Presumably, excitement over a hitting streak builds as the streak builds, with each added game drawing more and more fan interest. This effect is most likely observed through increasing attendance at games as the streak grows. Looking at 26 hitting streaks of at least 30 games by players in the modern era,¹ this study intends to isolate the effect that the extension of a streak has on game‐day attendance for the player’s home team. It will also try to determine if and why that effect has changed over time, as well as how the strength of the correlation is affected if the most famous and longest streak, Joe DiMaggio’s, is taken out of the equation.

Section II of this paper will offer a comprehensive look at previous, related studies. Section III, the methodology section, will outline details about the data and the model being used to answer the research question. Section IV will provide the study’s results, and section V will conclude.

¹ Seasons observed from 1941‐2011.

II. Literature Review

In order to find the true magnitude of this effect, the analysis needs to account for several other factors that also affect why fans choose to attend games. The literature described below has looked extensively at the effects of such factors, and this study’s analysis includes the variables with conclusive influence over attendance, as well as other presumably important factors not yet explored in the literature, as controls.

There has been a plethora of literature that has looked at why sports fans decide to go to games. Studies have used data from many different professional leagues. Most of the literature, however, deals with Major League Baseball attendance, mainly because it is the league with the richest and most accessible supply of statistical data. This study will also be analyzing factors affecting attendance using MLB statistics, so literature that has used MLB data will contextualize its contributions.

A wide range of factors that have an effect on attendance has been observed. For the purpose of this paper, the different types will be classified into two separate categories. First, there are the factors that change attendance levels from season to season. This inter‐seasonal category includes variables that would only affect attendance from one year to the next, and not within a single season. For example, city characteristics such as the current year’s GDP per capita in the city of interest would fall into this category. Because this study is only interested in analyzing how attendance fluctuates from game to game within a season, factors affecting seasonal attendance movements will not be considered when running the analysis.

The second category consists of variables that affect competitiveness and demand for attending a game within a season, or intra‐seasonal factors. These factors include, for example, the time of day the game is being played or how successful the team’s season has been up to the start of the game being observed.

Within the category of intra‐seasonal variables, a very small share of literature has dealt with individual player effort as a cause for attendance fluctuations. Individual players’ extended hitting streaks fall under this subcategory, and therefore literature regarding hitting streaks’ effects will be most closely related to this study.

Inter‐seasonal factors

Using annual seasonal data from 1950 to 2003 for both National League (NL) and American League (AL) teams, Krautmann and Hadley (2006) took a look at the effects of inter‐seasonal balances on Major League Baseball attendance. To quantify the effects of inter‐seasonal balance, the study used two approaches.

They first looked at the probability that a team that was successful in the previous year would be successful again in the current season. A Markov model was used to obtain sample proportions from the data for teams that are winners in consecutive seasons. The paper did not define exactly what qualifies a team as a “winner,” however. The logic behind this measure is that fans prefer a competitive league, and attendance will be higher when there are more teams with the potential to have successful seasons. The dependent variable for their study was game attendance.

The findings indicated that in the AL, where dynasties had more of a presence over the time period, a high probability of perennial winning had a negative effect, as they hypothesized. In the NL, however, the effect was positive. The paper suggested this might be because of the dominance of the Yankees dynasties in the American League over the decades studied (Krautmann & Hadley, 2006).

For the second approach, based on arguments from Quirk and Fort (1992), Krautmann and Hadley (2006) used the ratio of the league’s standard deviation of winning percentage divided by the standard deviation of the league’s “ideal” winning percentage as a measure of inter‐seasonal balance. The idealized standard deviation is defined as the standard deviation that would be seen if the teams’ winning percentages in the league were perfectly balanced for that season. The greater the ratio, the larger the imbalance is in the league. According to previous theory, namely the uncertainty of outcome hypothesis, a more imbalanced league is less interesting to the fans and therefore draws smaller crowds.
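As a minimal sketch of the balance ratio described above, the Python snippet below divides the observed standard deviation of team winning percentages by an idealized one. The formula for the idealized value is not given in this paper; the sketch assumes the common convention of 0.5/√G for a G‐game season in which every game is a coin flip. The function names and the sample winning percentages are hypothetical and serve only as illustration.

```python
import math
import statistics


def idealized_sd(games_per_team):
    """SD of winning percentage if every game were a fair coin flip.

    Assumption: the 0.5 / sqrt(G) convention, which this paper does not state.
    """
    return 0.5 / math.sqrt(games_per_team)


def balance_ratio(win_pcts, games_per_team):
    """Observed SD of winning percentage divided by the idealized SD."""
    observed = statistics.pstdev(win_pcts)  # population SD across the league's teams
    return observed / idealized_sd(games_per_team)


# Hypothetical six-team league playing 162 games each (made-up numbers).
league = [0.600, 0.550, 0.520, 0.480, 0.450, 0.400]
ratio = balance_ratio(league, 162)
print(round(ratio, 2))  # values above 1 indicate more spread than pure chance
```

Under this convention a ratio near 1 would indicate a league about as balanced as chance allows, while larger values indicate greater imbalance.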

Believers in the uncertainty of outcome hypothesis essentially attempt to prove that fans are interested in their home team winning, but only up to a certain point. If the probability of winning becomes too high (or too low), fans lose interest, and attendance levels will therefore suffer (G. Knowles et al., 1992).

Unfortunately, the study was unable to render significant results for the influence of the idealized standard deviation measure, suggesting that the size of the ratio didn’t strongly affect attendance. Perhaps the study could have used other measures of the single‐season factors’ influence on demand, instead, to compare how competitive balance played a role in the different leagues across time (Krautmann & Hadley, 2006).

Denaux, Denaux, & Yalcin (2011) also examined the main factors that affect how many fans attend games, but with a wider range of more specific variables that change attendance. Employing a vast array of independent variables, the study looked at demand for attendance at home games of “12 non‐expansion, non adjustment national league teams” (Astros, Braves, Cardinals, Cubs, Dodgers, Giants, Expos, Mets, Padres, Pirates, Phillies, and Reds) from 1979 to 2004. Unlike Krautmann and Hadley (2006), the study looked at both inter‐ and intra‐seasonal effects.

To evaluate effects of some inter‐seasonal factors, they included an estimate for attendance volatility and a dummy for each season. This accounted for new stadiums and fan interest in different years. They also found average ticket prices for the different seasons to see if the change in expenses of buying a ticket to a game affected attendance from season to season. Finally, variables characterizing the team’s city were examined. These included unemployment rate and per capita income. The dependent variable was, again, attendance.

The study was able to show that an increase in per capita income did have a positive effect on attendance. Specifically, they found that a $10,000 per year increase in income led to an increase in attendance of 114 fans. The regression estimated statistically insignificant coefficients on the unemployment and price variables, so the importance of these factors was inconclusive (Denaux, Denaux, & Yalcin, 2011).

For the purposes of this study, however, inter‐seasonal factors described by Krautmann and Hadley (2006) and Denaux et al. (2011) are inconsequential. It is unlikely that the team’s performance in previous seasons, or the team being a dynasty, has an effect on the day‐to‐day changes in attendance within the current season. It is also clear that city characteristics that are consistent over an entire season would not influence intra‐seasonal balance.

Intra‐seasonal factors

Denaux et al. (2011), discussed above, also took a look at how some intra‐seasonal factors affected attendance. The study employed very specific measures for intra‐seasonal variables, and was able to come up with significant results. To see how the time, day, and date of the game affected attendance, the study used dummy variables to indicate the time of day, day of the week, and month of the year in which the game was being played. They also included dummy variables for whether the game was the first of the season and whether the game was a part of interleague play.

The study found that most of these variables had statistically and economically significant effects on attendance. Some of the more interesting findings, not looked at in similar literature, are the considerable positive effects opening day had on attendance and how attendance fluctuates from month to month. Both of these factors should be considered when looking at the effects of hitting streaks on attendance (Denaux et al., 2011).

On the other hand, the variable used to describe to‐date performance in the current season could have been more dynamic than a dummy for a winning or losing record. Other studies, such as Lemke, Leonard, and Tlhokwane (2010), used the number of wins in the 10 games leading up to the game being observed. Another interesting statistic to include could be games back in the standings or normalized winning percentage to demonstrate how well a team is playing compared to the rest of the league. Also, the study did not incorporate a variable for competitiveness. Most other literature, such as Krautmann and Hadley (2006) and Knowles et al. (1992), agrees that uncertainty of outcome could largely sway fan interest in going to games. However, Lemke, Leonard, and Tlhokwane (2010), using factors from 2,196 games during the 2007 season to explain how numerous different intra‐seasonal variables affect home attendance, claim that the uncertainty of outcome theory is inaccurate. The data from the 2007 season showed that fan interest continued to rise as the probability of a home team victory rose (Lemke, Leonard, & Tlhokwane, 2010).

While most literature deals with the effects that general team statistics and variables have on attendance, few economic studies focus on the effects of individual player efforts. An investigation conducted by Horowitz and Lackritz (2012) is the exception. This study analyzed the impact on attendance of two of baseball’s most renowned hitting streaks: those of Joe DiMaggio of the New York Yankees in 1941, reaching 56 games, and of Pete Rose of the Cincinnati Reds in 1978, reaching 44 games.

The study controlled for some of the factors discussed in Denaux et al. (2011) such as day of the week, but took most of its independent variables from other literature. They account for games that are part of a double header,² and include a dummy for home and away games and a measure for competitiveness. After finding the effects of these variables over the course of the season in which the streak occurred, they then analyzed how attendance grew over the period of the streak.

They began observing the effects of DiMaggio’s streak when the New York Times’ headline read, “DIMAGGIO’S STREAK REACHES 30 GAMES!” (Horowitz & Lackritz, 2012). The period over which they looked at Rose’s streak began with the Reds’ first game back from the all‐star break in 1978, when Rose’s streak stood at 25 games.

During DiMaggio’s streak, the study found that attendance grew by an additional 288 people with each successive game. This means that in total 79,488³ more people attended Yankee games because of the streak. They estimated that the streak earned the Yankees $39,744 in ticket sales alone, which was 10% of the payroll that season and completely covered DiMaggio’s $37,500 salary. These figures do not take into account the value added to concessions, advertisements, and merchandising. Not bad for a couple of months’ work.

² At the time, fans were able to attend both games for the price of one, increasing the likelihood of a fan attending.

³ Equation used in study: 288 × (1 + 2 + 3 + … + 23) = 79,488

The study showed that Rose’s streak, although not as impressive as DiMaggio’s, still added 140 people to each game, totaling 26,600 additional attendees. They calculated the added value in ticket sales to be about $57,190, 15% of Rose’s salary ($375,000) and 1.6% of Cincinnati’s $3.3 million payroll. The study attributed some of the reason for the smaller effect to an increase in season ticket sales and more comprehensive local televising of games (Horowitz & Lackritz, 2012).
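The cumulative totals quoted above can be checked with a few lines of arithmetic. The Python sketch below reproduces the footnoted DiMaggio sum, in which the k‐th game of the observation window draws k × 288 extra fans. Applying the same form to Rose’s streak with 19 games is an inference made here, chosen because it reproduces the reported total of 26,600; it is not a window stated in the text. The function name is hypothetical.

```python
def cumulative_extra_attendance(per_game_gain, games):
    """Total extra fans when game k of the window draws k * per_game_gain more fans."""
    return per_game_gain * sum(range(1, games + 1))


# DiMaggio: 288 x (1 + 2 + ... + 23), as in the footnoted equation.
dimaggio_total = cumulative_extra_attendance(288, 23)

# Rose: 19 games is an inferred assumption that matches the reported 26,600.
rose_total = cumulative_extra_attendance(140, 19)

# Revenue per extra fan implied by the reported $39,744 in ticket sales.
implied_ticket_price = 39_744 / dimaggio_total

# Share of Rose's $375,000 salary covered by the reported $57,190.
rose_salary_share = 57_190 / 375_000

print(dimaggio_total, rose_total, implied_ticket_price, round(rose_salary_share, 4))
```

The implied figure of fifty cents per additional fan follows directly from the two numbers reported for DiMaggio’s streak.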

Horowitz and Lackritz’s study is incomplete for a few reasons, however. First, it only looks at the effects of the two longest streaks, ignoring the effects that other players’ extended hitting streaks have had on attendance. Second, the authors do not look at the streaks’ effects from the streaks’ beginnings to their ends. There is the possibility that fans knew about the streaks and attended games because of them before the streaks reached the point where Horowitz and Lackritz (2012) began observing. Third, the small number of observations limits the study’s ability to analyze how reactions to streaks have changed over time. As time has gone on, more and more factors that could sway fans’ attendance have become prevalent. For example, in a more modern context, even casual sports fans are given up‐to‐the‐minute updates on player statistics by sports networks and the news, both over the Internet and on their televisions, and are therefore more aware of what is happening around the league. This could mean that the effects a streak has on attendance might be observable earlier nowadays than in DiMaggio’s and Rose’s times.

III. Methodology

The analysis in this study attempts to explain the day‐to‐day variation in attendance at MLB games and takes into account factors deemed, either by measured significance in previous analyses or by intuition, to have influential effects over home attendance. Only intra‐seasonal factors which affect attendance from game to game within a season are taken into account, as inter‐seasonal factors are unimportant for the purposes of this study.

The independent variables are based partly on the variables and methods from the literature reviewed above, and partly on new variables and procedures that have not been tested in this context.

Data:

The data for the analysis consist of 2,086 observations of home games for 26 players with separate, single‐season hitting streaks of at least 30 games. For the season in which the streak occurred, observations for every home game for the streaking player’s⁴ team are included in the dataset. As previously mentioned, the first streak observed in the study, belonging to Joe DiMaggio, took place in the 1941 season. The most recent streaks, those of Dan Uggla of the Atlanta Braves and Andre Ethier of the Los Angeles Dodgers, took place during the 2011 season.

Table 1 below is a list of all 26 players whose streaks are the focus of the study, with some basic information regarding their streaks. Clearly the dataset is very diverse, with streaks coming from players of varying levels of popularity, on many different teams, and in many different years.

⁴ “Streaking player” means the player who had the streak.

Table 1: Players’ Streaks Information

#   Player              Team                   Streak     Season
1   Joe DiMaggio        New York Yankees       56         1941
2   Pete Rose           Cincinnati Reds        44         1978
3   Paul Molitor        Milwaukee Brewers      39         1987
4                                              36 (38)⁵   2005 (2006)
5                       Boston Braves          37         1945
6   Luis Castillo       Florida Marlins        35         2002
7                       Philadelphia Phillies  35         2006
8   Dom DiMaggio                               34         1949
9                                              34         1987
10  Dan Uggla           Atlanta Braves         33         2011
11                      Los Angeles Dodgers    31         1969
12                      Atlanta Braves         31         1970
13                                             31         1980
14  Vladimir Guerrero                          31         1999
15                      St. Louis Cardinals    30         1950
16                                             30         1980
17                                             30         1989
18  Sandy Alomar Jr.                           30         1997
19                      Boston Red Sox         30         1997
20                                             30         1998
21  Luis Gonzalez                              30         1999
22                      St. Louis Cardinals    30         2003
23                                             30         2006
24  Moises Alou                                30         2007
25                                             30         2009
26  Andre Ethier        Los Angeles Dodgers    30         2011

⁵ The parentheses indicate that the streak continued for 2 more games into the 2006 season. The 2006 numbers, however, are not taken into account during analysis.

Included in the dataset are variables that describe each home game that was played in the season in which the streak occurred. The important variables include attendance, home and away winning percentages, games back in the division for the home team, year of the season, number of the game in the season, number of games the streak has reached, and dummies for day, month, double‐header, and day/night game. The majority of the data was gathered from baseball‐reference.com (“Baseball Reference,” n.d.), baseball‐almanac.com (“Consecutive Games Hitting Streaks: 30+ Game Hitting Streaks in Baseball,” n.d.), and retrosheet.org (“Retrosheet Game Logs,” n.d.). Some of the attendance data for the 1941 New York Yankee season were also taken from the dataset used by Horowitz and Lackritz (2012). Below, in Table 2, are some interesting summary statistics for the observations in the dataset:

Table 2: Interesting Summary Statistics

Variable         # of Obs   Mean        Std. Dev.   Min    Max
WinPct.          2086       0.5112283   0.1112704   0      1
WinPct. (Away)   2086       0.4954819   0.114321    0      1
Games Back       2086       -6.400527   9.632034    -38    20.5
Attendance       2016       27144.81    13723.71    685    60498
Competitive      2081       0.513065    0.1487492   0      1

From Table 2, some important observations can be made about the data. It is clear that there are large gaps between minimum and maximum games back and attendance. The fact that there is a team that was, at some point during its season, 38 games out of first place, and another team that was 20.5 games ahead in first place, shows that the sample includes streaks which took place on teams with a wide range of success. There are very accomplished teams who led their division by a long shot, and there are also very unsuccessful teams who did not have a chance of making the playoffs. The large gap between minimum and maximum attendance, and the high standard deviation for attendance, show that attendance was clearly subject to sizeable swings.

Furthermore, we see that, on average, the streaking player’s team had an above .500 winning percentage. This means that they tended to win more games than they lost. This could be heavily swayed by Joe DiMaggio’s involvement, as that Yankee team had a very strong winning percentage throughout the 1941 season. By contrast, the away team had a below .500 winning percentage, indicating that, on average, the visiting teams were losing more games than they were winning.

These values correspond to the measure for competitiveness used. The measure comes from a model, designed by Bill James in 1981, that gives the probability of the home team winning the game (Tippett, 2002). The equation is:

\[ P_w = \frac{H_t - H_t A_t}{H_t + A_t - 2 H_t A_t} \]

where $P_w$ is the probability the home team will win, $H_t$ is the home team winning percentage at the beginning of the observed game, and $A_t$ is the away team winning percentage at the start of the game. The mean of this value for this dataset is just over 0.5, suggesting that the home team was on average slightly more likely to win. This is also in keeping with the uncertainty of outcome hypothesis, which states that fans are most interested in games that they do not perceive to be potential blowouts.
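To make the competitiveness measure concrete, the short Python sketch below evaluates the formula above for a pair of winning percentages. The function name is hypothetical, and the handling of the undefined case (both teams at 0 or both at 1, where the denominator is zero) is an assumption of this sketch rather than something the paper describes.

```python
def home_win_probability(home_pct, away_pct):
    """P_w = (H - H*A) / (H + A - 2*H*A), the formula quoted in the text."""
    denominator = home_pct + away_pct - 2 * home_pct * away_pct
    if denominator == 0:
        # Happens only when both teams are at 0 or both are at 1.
        # Raising an error here is this sketch's choice, not the paper's.
        raise ValueError("probability is undefined for these winning percentages")
    return (home_pct - home_pct * away_pct) / denominator


# Two evenly matched teams give a coin flip.
even_match = home_win_probability(0.5, 0.5)

# A .600 home team hosting a .400 visitor is favored.
favored_home = home_win_probability(0.6, 0.4)

print(even_match, round(favored_home, 4))
```

The measure equals 0.5 whenever the two teams have the same winning percentage strictly between 0 and 1, and it moves toward 1 as the home team’s record improves relative to the visitor’s.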

Model:

The basic regression model used for this analysis includes 26 independent variables and one dependent variable. Its econometric logic follows findings from Lemke et al. (2010) regarding the importance of different procedures to yield favorable results. Notes about their methodology and results are helpful in determining the best econometric approach to take while analyzing attendance fluctuations within a season. They determined that looking at the log of attendance versus looking at paid attendance as the dependent variable did not yield coefficients with different signs or significance. Furthermore, they discovered that results from OLS regression differed depending on whether fixed effects were included or omitted. Because they concluded that the factors that determine home attendance likely vary from team to team and from season to season, they put more emphasis on their results found using OLS regression with fixed effects (Lemke et al., 2010). This study’s OLS regression, therefore, uses fixed effects to account for differences in attendance data by team‐season. Below is the model’s equation:

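\[
\begin{aligned}
\mathrm{lnAttendance}_{t,i} = {} & \beta_0 + \beta_1 \mathrm{Streak}_{t,i} + \beta_2 \mathrm{HomeOpen}_{t,i} + \beta_3 \mathrm{Competitive}_{t,i} \\
& + \beta_4 \mathrm{Monday}_{t,i} + \beta_5 \mathrm{Tuesday}_{t,i} + \beta_6 \mathrm{Wednesday}_{t,i} + \beta_7 \mathrm{Thursday}_{t,i} + \beta_8 \mathrm{Friday}_{t,i} + \beta_9 \mathrm{Saturday}_{t,i} \\
& + \beta_{10} \mathrm{March}_{t,i} + \beta_{11} \mathrm{May}_{t,i} + \beta_{12} \mathrm{June}_{t,i} + \beta_{13} \mathrm{July}_{t,i} + \beta_{14} \mathrm{August}_{t,i} + \beta_{15} \mathrm{September}_{t,i} + \beta_{16} \mathrm{October}_{t,i} \\
& + \beta_{17} \mathrm{GB}_{t,i} + \beta_{18} \mathrm{MarchGB}_{t,i} + \beta_{19} \mathrm{MayGB}_{t,i} + \beta_{20} \mathrm{JuneGB}_{t,i} + \beta_{21} \mathrm{JulyGB}_{t,i} \\
& + \beta_{22} \mathrm{AugustGB}_{t,i} + \beta_{23} \mathrm{SeptemberGB}_{t,i} + \beta_{24} \mathrm{OctoberGB}_{t,i} \\
& + \beta_{25} \mathrm{Head1}_{t,i} + \beta_{26} \mathrm{Night1}_{t,i} + \varepsilon
\end{aligned}
\]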

The dependent variable, lnAttendance, is the log of attendance on the day of the game being played. The coefficient of interest for this analysis is β1, which is the coefficient on the variable Streak. Here, Streak is defined as the number of games that the hitting streak reaches during the observed game, t. Because the dependent variable is in logs, the coefficient represents the approximate proportional change in attendance (100 × β1 percent) given the addition of one more game to the player’s extended hitting streak. This coefficient is expected to be positive, because, hypothetically, the lengthening of a hitting streak will draw larger crowds than the game before, after holding all other measurable and influential factors equal.
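The following Python sketch illustrates the idea of OLS with fixed effects and the reading of β1 as a percentage change. It is a deliberately reduced version of the model: it keeps only the Streak regressor and absorbs team‐season fixed effects by subtracting group means (the standard "within" transformation), whereas the paper’s regression has 26 regressors. The data are toy values constructed for illustration, not observations from the study, and all names are hypothetical.

```python
import math


def demean_within_groups(values, groups):
    """Subtract each group's mean (the 'within' transformation for fixed effects)."""
    totals, counts = {}, {}
    for value, group in zip(values, groups):
        totals[group] = totals.get(group, 0.0) + value
        counts[group] = counts.get(group, 0) + 1
    return [value - totals[group] / counts[group] for value, group in zip(values, groups)]


def fixed_effects_slope(y, x, groups):
    """OLS slope of y on a single regressor x after absorbing group fixed effects."""
    y_dm = demean_within_groups(y, groups)
    x_dm = demean_within_groups(x, groups)
    sxx = sum(a * a for a in x_dm)
    sxy = sum(a * b for a, b in zip(x_dm, y_dm))
    return sxy / sxx


# Toy data (NOT from the paper): two hypothetical team-seasons with different
# baseline crowds and a built-in gain of 0.02 log points per streak game.
streak = [0, 1, 2, 3, 0, 1, 2, 3]
team_season = ["A"] * 4 + ["B"] * 4
baseline = {"A": math.log(20000), "B": math.log(35000)}
ln_attendance = [baseline[g] + 0.02 * s for s, g in zip(streak, team_season)]

beta1 = fixed_effects_slope(ln_attendance, streak, team_season)
# Exact percentage change implied by a log-linear coefficient.
pct_change = (math.exp(beta1) - 1) * 100
print(round(beta1, 4), round(pct_change, 2))
```

Because each team‐season’s mean is removed before the slope is computed, the very different baseline crowds of the two hypothetical teams do not distort the estimated streak effect, which is the reason for preferring fixed effects here.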

The rest of the factors are independent variables that are also likely to have affected how many people attended the game. HomeOpen is a dummy variable that equals one if the game is the first home game of the season and zero otherwise. Denaux et al. (2011) stressed that opening day has considerably higher attendance on average than the rest of the games in the season.

Competitive⁶ is a variable that predicts the home team’s probability of winning, as described earlier. It was used by Horowitz and Lackritz (2012) in their study with the thought that a more highly competitive game would yield higher attendance. This is, again, referred to as the uncertainty of outcome hypothesis. If fans believe the home team has only a slight edge, they are more likely to show up to the game. As the probability of winning becomes too large, or especially too small, attendance will decline.

⁶ $P_w = (H_t - H_t A_t)/(H_t + A_t - 2 H_t A_t)$

Each day of the week is also included in the regression, because, as Denaux et al. (2011), among other studies, suggested, it is likely that average crowd sizes differ depending on the day of the week. For the variables Monday through Saturday, the dummy variable is equal to one if the game is being played on that day and zero otherwise. Sunday was omitted to prevent multicollinearity.

Likewise, according to Denaux et al. (2011), the different months also draw different average numbers of fans for various reasons including, but not limited to, weather, fan schedule availability and prior obligations, and importance of games. In the same manner by which days of the week were accounted for, a dummy variable is used for each of the months March through October, equal to one if the game being observed is played during that month and zero otherwise. April was omitted to prevent multicollinearity.
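The day and month dummies described above can be sketched in Python as follows. One category in each set (Sunday, April) is left out as the baseline, since including every category alongside an intercept would make the dummies perfectly collinear. The helper name and the example game are hypothetical.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]
MONTHS = ["March", "April", "May", "June", "July", "August", "September", "October"]


def dummies(value, categories, baseline):
    """Return one 0/1 indicator per category, skipping the omitted baseline."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return {name: int(name == value) for name in categories if name != baseline}


# Hypothetical game played on a Friday in July.
day_dummies = dummies("Friday", DAYS, baseline="Sunday")
month_dummies = dummies("July", MONTHS, baseline="April")

# A baseline game (Sunday in April) has every dummy equal to zero.
baseline_game = {**dummies("Sunday", DAYS, "Sunday"), **dummies("April", MONTHS, "April")}

print(day_dummies)
print(month_dummies)
```

With this coding, each day or month coefficient is read relative to the omitted category, so the Friday coefficient measures the difference from a Sunday game.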

The variable GB represents the number of games back (or up if positive) the team is in its division at the start of game t. This variable is important to consider because a higher likelihood of the home team making the playoffs is likely to draw a higher number of fans to the game. If the team is leading the division, this value will be positive, and if it is behind, it will be negative. The interaction variables between month and games back, represented by the variables MarchGB through OctoberGB, are equal to the dummy variable representing the month in which the game is being played multiplied by the number of games back/up in the division. The inclusion of these interaction terms is important because, logically, the number of games back a team is in the division becomes more important to the fans as the season goes on. This means it is plausible that fans react more strongly to games back or games up later in the season than they do earlier in the season because the team has less of a chance to change its likelihood of missing or making the playoffs.
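As a small illustration of the month‐by‐games‐back interaction terms just described, the Python sketch below multiplies each month dummy by GB. The variable names follow the model equation; the function name and the example values are hypothetical.

```python
# Months that appear in the model; April is the omitted baseline month.
MONTHS_IN_MODEL = ["March", "May", "June", "July", "August", "September", "October"]


def games_back_interactions(month, games_back):
    """Return MarchGB ... OctoberGB: the month dummy times games back (or up)."""
    return {
        f"{name}GB": games_back * (1 if name == month else 0)
        for name in MONTHS_IN_MODEL
    }


# Hypothetical September game with the home team 6.5 games behind (GB = -6.5).
september_terms = games_back_interactions("September", -6.5)

# An April game leaves every interaction at zero, so only GB itself applies.
april_terms = games_back_interactions("April", 3.0)

print(september_terms["SeptemberGB"], september_terms["JuneGB"])
```

In a regression with these terms, the effect of one game in the standings during September would be the GB coefficient plus the SeptemberGB coefficient, while in April it would be the GB coefficient alone.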

Head1 is a dummy variable that is equal to one if the game being played is part of a scheduled double header, and zero otherwise. In this study, a scheduled double header is defined as a double header in which fans are able to attend both games for the price of one. The practice of two games for the price of one was altogether eliminated in MLB by 1989, however, and the data reflect this fact by having all Head1 variables equal to zero for games after 1989 (Light, 2005).⁷

Finally, a dummy variable for whether or not the game is played at night is included. Night games are likely to draw larger crowds, as fans are less likely to have the prior obligations that they have during the day (such as work or school).

Time Aspect:

Because the data set for this study includes streaks ranging from 1941 up until 2011, there is an opportunity to see how fans’ reactions to the streaks have changed as the historical context surrounding the streaks has changed over time. Many factors that have developed over time may have a measurable effect on how heavily streaks sway attendance. Intuitively, some of these factors would weaken the correlation while others would make it stronger.

7 Head1 even equals zero for games that were part of double headers after 1989, because, by this time, double-headers were mostly used to make up rainouts (and otherwise postponed games), and the owners no longer allowed double admission for the price of one ticket.

One example of a development that may weaken the correlation in recent decades is the overall increase in season ticket sales and in game attendance generally. This may make attendance less volatile from game to game, and would therefore cause streaks to have less of an effect. By looking at attendance volatility over the years, this study examines whether less erratic changes in attendance can partly explain why streaks' effects on home attendance have shrunk as the decades have passed. Regressing the standard deviation of attendance on year will show whether attendance volatility has been growing or shrinking over the years.

Furthermore, as the seasons have progressed, there has been an increase in local and national broadcasting of the games, both on the radio and on television. In recent years, every one of the games in which the player was extending his streak was at least locally televised and broadcast on the radio, if not nationally. The ability to simply watch a game from the comfort of the living room or listen on the radio while driving in a car could discourage fans from actually attending the game.

Although these factors seem like they may weaken the relationship between streak length and game attendance, other time-related factors could counteract these forces and intensify fan reactions to a home-team player's streak. With increased coverage of MLB through new sports networks on television, and eventually the Internet, fans have become more and more aware of player and team efforts, statistics, and records.

In DiMaggio's day, sports fans would have had to read every word of the sports section to come across a passing line that told the reader that DiMaggio was on his way to a 30-game hitting streak. As stated earlier, Horowitz and Lackritz (2012) did not begin observing the effects of DiMaggio's streak until it reached 30 games, because that is the point at which they deemed even casual sports fans would realize he was making history.

In today's era, however, people have up-to-the-minute facts and stats at their fingertips. Television channels such as ESPN (and affiliates), MLB Network, and local team-owned channels8 have devoted round-the-clock airtime to live updates from all around the league. A few clicks on a computer or tablet can tell users everything they need to know about each of their favorite players' most recent statistics. Smartphone apps send instant notifications directly to cell phones, and social media outlets such as Facebook and Twitter allow people to share news about their favorite players with their friends and families. The combination of this recent technological innovation and increased access means that fans know about streaks sooner, and have the ability to follow them more closely. This could help to strengthen the regression's correlation, especially earlier on in the streaks.

To look at how streak popularity has changed over time, this study will take two different approaches. One is the Decade Approach, which analyzes how the coefficients have changed by decade. The other is the Year Approach, which looks at how they have changed from year to year.

Decade Approach regression formula:

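$$
\begin{aligned}
\text{lnAttendance}_{t,i} ={}& \beta_0 + \beta_1\,\text{Streak}_{t,i} \\
&+ \beta_2\,\text{d50s}_{t,i} + \beta_3\,\text{d60s}_{t,i} + \beta_4\,\text{d70s}_{t,i} + \beta_5\,\text{d80s}_{t,i} + \beta_6\,\text{d90s}_{t,i} + \beta_7\,\text{d2000s}_{t,i} \\
&+ \beta_8\,\text{Streakd50s}_{t,i} + \beta_9\,\text{Streakd60s}_{t,i} + \beta_{10}\,\text{Streakd70s}_{t,i} + \beta_{11}\,\text{Streakd80s}_{t,i} + \beta_{12}\,\text{Streakd90s}_{t,i} + \beta_{13}\,\text{Streakd2000s}_{t,i} \\
&+ \beta_{14}\,\text{HomeOpen}_{t,i} + \beta_{15}\,\text{Competitive}_{t,i} \\
&+ \beta_{16}\,\text{Monday}_{t,i} + \beta_{17}\,\text{Tuesday}_{t,i} + \beta_{18}\,\text{Wednesday}_{t,i} + \beta_{19}\,\text{Thursday}_{t,i} + \beta_{20}\,\text{Friday}_{t,i} + \beta_{21}\,\text{Saturday}_{t,i} \\
&+ \beta_{22}\,\text{March}_{t,i} + \beta_{23}\,\text{May}_{t,i} + \beta_{24}\,\text{June}_{t,i} + \beta_{25}\,\text{July}_{t,i} + \beta_{26}\,\text{August}_{t,i} + \beta_{27}\,\text{September}_{t,i} + \beta_{28}\,\text{October}_{t,i} \\
&+ \beta_{29}\,\text{GB}_{t,i} + \beta_{30}\,\text{MarchGB}_{t,i} + \beta_{31}\,\text{MayGB}_{t,i} + \beta_{32}\,\text{JuneGB}_{t,i} + \beta_{33}\,\text{JulyGB}_{t,i} + \beta_{34}\,\text{AugustGB}_{t,i} + \beta_{35}\,\text{SeptemberGB}_{t,i} + \beta_{36}\,\text{OctoberGB}_{t,i} \\
&+ \beta_{37}\,\text{Head1}_{t,i} + \beta_{38}\,\text{Night1}_{t,i} + \varepsilon
\end{aligned}
$$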

8 Such as the Yankee-owned television channel, the YES network.

Here, the OLS fixed-effects regression is the same as the original model, except that variables are added to account for changes over time by decade. The variables d50s through d2000s are dummy variables equal to one if the game being observed was played during that decade, and zero otherwise.

Streakd50s through Streakd2000s are interaction terms between the decade dummy and the length of the streak. Their coefficients show how the effect of streaks in the observed decade relates to the effect of streaks in the omitted decade, the 1940s.
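A minimal Python sketch of how the decade dummies and their interactions with streak length could be built for a single game is given here. The helper names are hypothetical, and the grouping of 2010 and 2011 with the 2000s dummy is an assumption, since the text does not say how those two seasons are coded.

```python
# Decade dummies used in the Decade Approach; the 1940s are the omitted base decade.
DECADE_STARTS = [1950, 1960, 1970, 1980, 1990, 2000]


def decade_suffix(start):
    """Map 1950 -> '50s', ..., 2000 -> '2000s' to match the paper's variable names."""
    return "2000s" if start == 2000 else f"{start % 100}s"


def decade_features(year, streak_length):
    """Build Streak, the decade dummies, and the Streak-by-decade interactions."""
    if not 1941 <= year <= 2011:
        raise ValueError("year outside the 1941-2011 sample")
    # Assumption: 2010 and 2011 are grouped with the 2000s dummy.
    game_decade = min(year // 10 * 10, 2000)
    row = {"Streak": streak_length}
    for start in DECADE_STARTS:
        dummy = 1 if start == game_decade else 0
        row[f"d{decade_suffix(start)}"] = dummy
        # The interaction equals the streak length in the game's own decade.
        row[f"Streakd{decade_suffix(start)}"] = dummy * streak_length
    return row


# Hypothetical example: game 31 of a streak in 1978.
print(decade_features(1978, 31))
```

For a game in the 1940s every dummy and interaction is zero, so the coefficient on Streak alone describes the streak effect in that decade.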

Year Approach regression formula:

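$$
\begin{aligned}
\text{lnAttendance}_{t,i} ={}& \beta_0 + \beta_1\,\text{Streak}_{t,i} + \beta_2\,\text{Year}_{t,i} + \beta_3\,\text{StreakYear}_{t,i} \\
&+ \beta_4\,\text{HomeOpen}_{t,i} + \beta_5\,\text{Competitive}_{t,i} \\
&+ \beta_6\,\text{Monday}_{t,i} + \beta_7\,\text{Tuesday}_{t,i} + \beta_8\,\text{Wednesday}_{t,i} + \beta_9\,\text{Thursday}_{t,i} + \beta_{10}\,\text{Friday}_{t,i} + \beta_{11}\,\text{Saturday}_{t,i} \\
&+ \beta_{12}\,\text{March}_{t,i} + \beta_{13}\,\text{May}_{t,i} + \beta_{14}\,\text{June}_{t,i} + \beta_{15}\,\text{July}_{t,i} + \beta_{16}\,\text{August}_{t,i} + \beta_{17}\,\text{September}_{t,i} + \beta_{18}\,\text{October}_{t,i} \\
&+ \beta_{19}\,\text{GB}_{t,i} + \beta_{20}\,\text{MarchGB}_{t,i} + \beta_{21}\,\text{MayGB}_{t,i} + \beta_{22}\,\text{JuneGB}_{t,i} + \beta_{23}\,\text{JulyGB}_{t,i} + \beta_{24}\,\text{AugustGB}_{t,i} + \beta_{25}\,\text{SeptemberGB}_{t,i} + \beta_{26}\,\text{OctoberGB}_{t,i} \\
&+ \beta_{27}\,\text{Head1}_{t,i} + \beta_{28}\,\text{Night1}_{t,i} + \varepsilon
\end{aligned}
$$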

In this regression, the variable Year is the year in which the game being observed was played, and the variable StreakYear is an interaction variable between the year of the observation and how long the streak is at the time of the observation.

Without Joe:

Horowitz and Lackritz (2012) pointed out that Joe DiMaggio's streak had a profound effect on attendance, and no one has come close to breaking his record. It is possible, therefore, that DiMaggio's streak has too heavy an influence on the coefficient describing the importance of streaks to fan behavior, especially when the time aspect is taken into account. Because Joe DiMaggio's efforts could skew the regression results, this study will add to the analysis of the true effect of hitting streaks by running a second series of the same regressions, this time excluding DiMaggio's streak from the data set being analyzed.

IV. Results

The model was able to estimate statistically and economically significant results for almost all the independent variables in question. Much of what was found in previous studies and discussed in the literature review was confirmed by the analysis of this study’s dataset. Some of the intuition and hypotheses were also confirmed.

Each regression revealed interesting information, helping to understand the effects of hitting streaks on home attendance. The results also show how fan reaction to streaks, at least by way of attending games, has changed over time, as well as how the effect of hitting streaks on attendance holds up when the most impressive streak, Joe DiMaggio’s, is excluded from the analysis. All of the results from the analysis can be found in Table 3.

Original Regression Results:

The original OLS regression with fixed effects yielded statistically and economically significant coefficients on almost all of the independent variables. Table A in the appendix gives the name, the coefficient, the standard error, the p-value, and a 95% confidence interval for each explanatory variable.9

Confirming the findings from Denaux et al. (2011), it is clear that the home opener demands much higher attendance levels; on average almost twice as many people attend this game compared to the rest of the games in the season. Also validating results from Denaux et al. (2011), attendance did change from month to month. The analysis shows that April and March have the lowest attendance numbers, and June and July have the highest. On average, 21% more people attended a game played in the month of July than a game played in the month of April. June saw an 18% increase in attendance from April. Games in August, the other month with significant results, drew 15% more fans than games in April. Furthermore, double headers and night games attracted more fans, 52% and 16% respectively.

9 It is worth noting that it would be interesting to see the effects that adding a streak squared term would have. This may be able to tell us when in the streak the excitement really starts to build, and attendance begins to take off. When this term was included in the model, however, it made all the other results statistically insignificant. It was therefore excluded from the final analysis.
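The percentages quoted in this subsection read the coefficients of the log-attendance model directly as percentage changes. That reading is a common approximation that works well for small coefficients; the exact change implied by a coefficient β in a log-linear model is exp(β) − 1. The following Python sketch compares the two readings for a few coefficients copied from Table A. The function name is illustrative and is not part of the original analysis.

```python
import math


def exact_pct_change(coef):
    """Exact proportional change in attendance implied by a log-linear coefficient."""
    return math.exp(coef) - 1.0


# Coefficients copied from Table A in the appendix.
TABLE_A = {
    "Streak": 0.0040380,
    "Night1": 0.1576074,
    "July": 0.2064210,
    "HomeOpen": 0.9826803,
}

for name, coef in TABLE_A.items():
    print(f"{name}: direct reading {coef:.1%}, exact conversion {exact_pct_change(coef):.1%}")
```

For the Streak coefficient the two readings agree closely. For large coefficients such as HomeOpen the exact conversion is noticeably larger than the direct reading, so the direct reading is best treated as an approximation there.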

The negative coefficient on the measure for competitiveness confirms the uncertainty of outcome hypothesis, which says that as the probability that the home team will win becomes too large, the attendance suffers. By multiplying the standard deviation of competitiveness found in the summary statistics from Table 2 by the coefficient on the competitiveness variable from the results in Table A, we see that there is an 8.83% decrease in attendance for each standard deviation increase in the competitiveness measure.10

The interaction term between games back and month supports the intuition that as the season goes on, people's reaction to how far back their home team is in the division becomes stronger. The games back variable, GB, indicates the percentage increase in the number of people that attend a game if the team is one less game back (or one more game up) in the division. The coefficients on the interaction terms become more negative as the season goes on, indicating that the crowd is increasing by less and less as the season goes on from a decrease in games back. This means that they are reacting more strongly to the number of games back because they are losing hope that an increase in the standings of one game will matter at the end of the season, and are deciding to not attend games.

10 (0.1487492)(-0.5938669) = -0.08834

Table 3: Results of Streak and Time Coefficients from OLS Regressions

Columns A through D: With Joe DiMaggio. Columns E and F: Without Joe DiMaggio. Standard errors are in parentheses.

| Row | Variable Name | A: Original Regression | B: w/ Year | C: w/ Year & StreakYear | D: w/ Decades | E: Original Regression | F: w/ Year |
|---|---|---|---|---|---|---|---|
| 1 | Streak | 0.0040** (0.0017) | 0.0040** (0.0018) | 0.2306** (0.1134) | 0.0085** (0.0036) | 0.0024* (0.0014) | 0.0024* (0.0014) |
| 2 | Year | | 0.0167*** (0.0011) | 0.0175*** (0.0011) | | | 0.0323*** (0.0014) |
| 3 | StreakYear | | | -0.000114* (0.00006) | | | |
| 4 | 1950s Streak | | | | -0.0052 (0.005) | | |
| 5 | 1960s Streak | | | | -0.0006 (0.003) | | |
| 6 | 1970s Streak | | | | -0.0036 (0.005) | | |
| 7 | 1980s Streak | | | | -0.0065 (0.005) | | |
| 8 | 1990s Streak | | | | -0.0053 (0.006) | | |
| 9 | 2000s Streak | | | | -0.0071* (0.004) | | |

*** = 0.01 significance level; ** = 0.05 significance level; * = 0.1 significance level

The most important finding from the regression is the coefficient describing fans' reaction to extended hitting streaks. The coefficient, in column A, row 1 of Table 3, indicates that there is an increase in home game attendance of 0.4% for each game that the streak is extended. Compounded over the streak, this means that, on average, attendance at the 30th game of a hitting streak will be 12.7% higher than its pre-streak level.11 A hitting streak as iconic as Joe DiMaggio's, reaching 56 games, will increase home attendance by 25%12 on the last day of the streak. Assuming a streaking player's team starts with the average attendance from the sample, 27,145 fans, a 30-game hitting streak would bring an additional 52,498 fans13 to the home team's stadium over the course of the streak. If the streak were as long as DiMaggio's, the home team would accrue an additional 186,718 fans.14 These figures, however, presume that every game of the streak was played at the home stadium. Even if half of the games are away, the increase in cumulative home attendance is still very large, and would generate significant revenues for team owners. These figures also do not take into account increases in revenue from additional sales of concessions, merchandise, advertising, parking, etc.
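The compounding arithmetic behind these figures (footnotes 11 through 14) can be reproduced with a short Python sketch. It assumes, as the text does, a constant 0.4% effect per game, a baseline of 27,145 fans, and that every game of the streak is played at home. The function names are illustrative.

```python
def streak_multiplier(per_game_effect, games):
    """Attendance relative to the pre-streak baseline after `games` streak games."""
    return (1.0 + per_game_effect) ** games


def additional_fans(baseline, per_game_effect, games):
    """Cumulative extra attendance over a streak, assuming every game is at home."""
    total = sum(
        baseline * streak_multiplier(per_game_effect, k)
        for k in range(1, games + 1)
    )
    # Subtract what attendance would have been with no streak effect.
    return total - games * baseline


BASELINE = 27145   # sample average attendance reported in the text
EFFECT = 0.004     # rounded Streak coefficient from column A, row 1 of Table 3

for games in (30, 56):
    gain = streak_multiplier(EFFECT, games) - 1.0
    extra = additional_fans(BASELINE, EFFECT, games)
    print(f"{games} games: {gain:.1%} above baseline, {extra:,.0f} additional fans")
```

Halving the number of home games in the loop gives a rough sense of the more realistic case in which part of the streak is played on the road.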

Time Aspect Regression Results:

The coefficients estimated from the first time-oriented approach, which analyzed how the data changed from decade to decade, can be seen in Table B in the appendix. The coefficients on the decade variables indicate that as time has gone on, overall attendance at games has increased. An F-test of these coefficients strongly rejected the null hypothesis that they are jointly equal to zero. The coefficient on Streak represents the percentage effect on attendance that a one-game extension of a player's streak had in the 1940s. This coefficient is likely much higher than the one from the original regression because it partly describes Joe DiMaggio's streak in 1941.15 The coefficients on the interaction terms, which describe streaks' effects as they relate to the decade during which they occurred, tend to become more negative in more recent decades. This means that as time has gone on, the streak's effects on attendance have become smaller and smaller.16 An F-test of these coefficients yields a p-value of roughly 0.35, however, so the null hypothesis that these coefficients are jointly equal to zero cannot be rejected.

11 $X_0(1.004)^{30} = X_0(1.127)$ corresponds to a 12.7% increase in attendance.

12 $(1.004)^{56} = 1.2505$ corresponds to a 25% increase in attendance.

13 $(27{,}145)\,[(1.004) + (1.004)^2 + (1.004)^3 + \dots + (1.004)^{29} + (1.004)^{30}] - (30)(27{,}145) = 52{,}497.8$

14 $(27{,}145)\,[(1.004) + (1.004)^2 + (1.004)^3 + \dots + (1.004)^{55} + (1.004)^{56}] - (56)(27{,}145) = 186{,}718.26$

Adding the (negative) coefficient on the interaction term for a given decade to the Streak coefficient, which describes the effect of streaks in the 1940s, gives the per-game effect of a streak in that decade. For example, to find the effect of streaks in the 1980s, 0.0065 (the magnitude of the coefficient on the interaction term Streakd80s in column D, row 7 of Table 3) is subtracted from 0.0085 (the coefficient describing streaks' effects in the 1940s in column D, row 1 of Table 3). The result is a 0.2% increase in attendance for each game that a streak is extended in the 1980s. Unfortunately, however, many of the coefficients on the interaction terms are not strongly statistically significant. Therefore, the results of the streak-by-year approach may be more accurate.
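The same calculation can be applied to every decade at once. The Python sketch below uses the Streak and interaction coefficients reported in Table B of the appendix; the function name is illustrative, and the output inherits the weak statistical significance of the interaction terms noted above.

```python
# Coefficient on Streak (the 1940s effect) and the Streak-by-decade
# interaction coefficients, copied from Table B in the appendix.
STREAK_1940S = 0.0085014
INTERACTIONS = {
    "1950s": -0.0052029,
    "1960s": -0.0005822,
    "1970s": -0.0036277,
    "1980s": -0.0065230,
    "1990s": -0.0053213,
    "2000s": -0.0071002,
}


def decade_effects(base=STREAK_1940S, interactions=INTERACTIONS):
    """Per-game streak effect in each decade: base effect plus the interaction."""
    effects = {"1940s": base}
    for decade, coef in interactions.items():
        effects[decade] = base + coef
    return effects


for decade, effect in decade_effects().items():
    print(f"{decade}: {effect:.2%} per added streak game")
```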

15 This coefficient is located in column D, row 1 of Table 3.

16 Coefficients can be seen in column D, rows 4-9 of Table 3.

Although framed in a different way, the results from the OLS regression analyzing effects by year tell the same story. The results from this regression can be seen in Table C in the appendix. The positive coefficient on the explanatory variable, Year, suggests, just as do the coefficients on the decades, that home attendance has been increasing throughout time. More recent seasons have, on average, drawn more fans to the stadiums than earlier seasons. The coefficients on Streak and the interaction term StreakYear can be interpreted as follows:

Adding the coefficient on StreakYear (which is negative) multiplied by a given year to the coefficient on Streak gives the percentage effect on attendance that extending a streak by one game has in that year. For example, in 1941, the percentage change in attendance that comes about from the hitting streak reaching another game is equal to .0079.17 This means that one more game added on to a hitting streak in 1941 will increase attendance by 0.79%. This is close to the 1940s coefficient of 0.85% found through the decade approach. The results suggest that streaks occurring in 2000 had a much smaller effect, only adding 0.12% more fans per added streak-game. An F-test reveals that the null hypothesis that the coefficients on StreakYear and Year are jointly equal to zero can be rejected at a highly significant level.
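The year-specific effect is the sum of two terms that nearly cancel, so it is sensitive to how the coefficients are rounded. Footnote 17 uses the Streak coefficient rounded to 0.23, while Table C reports 0.2306304. The Python sketch below evaluates the formula with both values; the function name is illustrative, and the StreakYear coefficient is itself only available to the precision printed in Table C.

```python
STREAK_YEAR_COEF = -0.0001144  # StreakYear coefficient from Table C


def per_game_effect(year, streak_coef, streak_year_coef=STREAK_YEAR_COEF):
    """Effect of one more streak game in `year`: Streak + StreakYear * year."""
    return streak_coef + streak_year_coef * year


ROUNDED = 0.23        # Streak coefficient as used in footnote 17
REPORTED = 0.2306304  # Streak coefficient as printed in Table C

for year in (1941, 2000):
    print(
        f"{year}: {per_game_effect(year, ROUNDED):.2%} with the rounded coefficient, "
        f"{per_game_effect(year, REPORTED):.2%} with the Table C coefficient"
    )
```

Both versions show the effect shrinking between 1941 and 2000, which is the pattern the text describes.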

17 0.23 + (-0.0001144)*(1941) = .0079

As argued earlier, it is possible that a decrease in attendance volatility as the seasons have gone on could help explain why the more recent streaks have had less of an effect on home attendance. If attendance volatility is low, fans are reacting less to all factors that would persuade them to go, or not to go, to games. Lower levels of attendance volatility could be explained by the increase in advance sales of season tickets. It is also possible that, because of increased ticket prices, budgetary concerns have become more important to fans than the quality or convenience of the game. Regressing the standard deviation of attendance on Year yields a negative coefficient, which suggests that game attendance has, in fact, become less volatile year by year. The average standard deviation of attendance from 1941 to 2011 was about 7,524. By adding the (negative) coefficient on the variable Year multiplied by a given year to the Constant, we can see what the predicted standard deviation of attendance was in that year. For example, in 1941, the predicted standard deviation of attendance was roughly 10,964,18 about 3,440 fans more than the average for the 70-year span. In 2011, the standard deviation is estimated to have been about 5,798,19 which is roughly 1,726 fans fewer than the average for the entire time period and 5,166 fans fewer than the predicted value for 1941. The coefficients resulting from this regression can be seen below in Table 4.

Table 4: Standard Deviation of Attendance by Year – Attendance Volatility

| Variable Name | Coef. | Std. Err. | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Year | -73.8042 | 21.9257 | 0.003 | -119.0566 | -28.55178 |
| Constant | 154210.1 | 43579.8 | 0.002 | 64265.85 | 244154.4 |
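The fitted values quoted in the text follow directly from the two coefficients in Table 4. The Python sketch below uses the Year coefficient rounded to -73.8, as footnotes 18 and 19 do; the function name is illustrative, and using the unrounded -73.8042 from Table 4 shifts each prediction by a few fans.

```python
CONSTANT = 154210.1   # Constant from Table 4
YEAR_COEF = -73.8     # Year coefficient from Table 4, rounded as in footnotes 18 and 19


def predicted_sd(year, year_coef=YEAR_COEF, constant=CONSTANT):
    """Predicted standard deviation of attendance in `year` from the Table 4 fit."""
    return constant + year_coef * year


SAMPLE_AVERAGE_SD = 7524  # average standard deviation reported in the text

for year in (1941, 2011):
    sd = predicted_sd(year)
    print(f"{year}: predicted SD {sd:,.0f}, {sd - SAMPLE_AVERAGE_SD:+,.0f} vs. sample average")
```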

Robustness, the Results Without Joe:

Because DiMaggio's streak was so iconic and influential over home attendance, as shown in the time-oriented analysis, it is important to see if the positive effect that streaks have on home attendance still holds if DiMaggio is taken out of the equation. Excluding DiMaggio's streak from the dataset and rerunning the same regressions provides an appropriate robustness check. Table D in the appendix shows the results from this test. The resulting coefficients in columns E and F, row 1 of Table 3 confirm that streaks still have a positive effect on attendance when DiMaggio's streak is not considered. The 0.24% increase in attendance per game that a streak is extended is not as large as the number found with DiMaggio's streak included, but it is still economically significant, statistically significant at the 10% level, and in the predicted direction. The time-oriented regressions did not yield significant coefficients when DiMaggio's data was excluded from the set, but the results can be seen in Tables E and F in the appendix.

18 154210.1 + (-73.8)*(1941) = 10964.2

19 154210.1 + (-73.8)*(2011) = 5798.2

V. Conclusion

OLS regression analysis of 26 streaks ranging from 1941 to 2011 has revealed that adding a game to an extended hitting streak has a measurable positive economic impact on home attendance. This effect held when the most notable and longest streak, that of Joe DiMaggio in 1941, was excluded from the data. Furthermore, the data showed that while overall average attendance increased as the seasons passed, the increase in attendance due to streaks decreased.

The results from the time aspect of the analysis can be explained by a number of hypotheses. Firstly, the analysis of how the standard deviation of attendance has changed over the years has revealed that attendance volatility has decreased. This means that fans are reacting less to the factors that influence attendance, including hitting streaks. This may be because of an increase in advance season ticket sales or because of increased general attendance levels. Additionally, the local and national televising and radio broadcasting of games has increased tremendously, especially in recent decades. While in DiMaggio's time only a few games per year were televised, today fans can watch their team play every game on television. This may also weaken the correlation between hitting streaks and attendance.

While the relationship between hitting streaks and attendance may be weakening as the years have gone on, that does not mean that hitting streaks are having less of a positive economic effect. If one were to analyze how hitting streaks affect the television ratings of locally broadcast games, the number of people reading about the streak on the Internet, or how many listeners tune in on the radio, it is likely that there would be a positive causal relationship in these media as well. This means, because fans watch, listen to, and monitor the game on the television, radio, and Internet, that streaks are not only increasing ticket sales, concessions, and souvenir sales in the stadium, but are also likely increasing advertising revenue to networks, websites, and teams.

The question is, how has the balance of revenue changed over time? It would be interesting to continue investigating the effects of streaks over time to see where teams and networks should concentrate their efforts to make the most profit possible from the streak. This study has supported the hypothesis, however, that individual players’ efforts can affect home attendance, and it is an important step in the understanding of what brings fans to games.


Bibliography

Baseball Reference. (n.d.). Baseball-Reference.com. Retrieved from http://www.baseball-reference.com/

Bonds' record home run ball fetches $752,467 - Baseball - NBC Sports. (2007, September 15). Retrieved from http://nbcsports.msnbc.com/id/20794164/

Consecutive Games Hitting Streaks: 30+ Game Hitting Streaks in Baseball. (n.d.). Retrieved December 5, 2012, from http://www.baseball-almanac.com/feats/feats-streak.shtml

Denaux, Z. S., Denaux, D. A., & Yalcin, Y. (2011). Factors Affecting Attendance of Major League Baseball: Revisited. Atlantic Economic Journal, 39(2), 117–127. doi:http://dx.doi.org/10.1007/s11293-011-9274-2

Horowitz, & Lackritz. (2012). Jolting Joe and Charlie Hustle: The Immediate Economic Impact of An Extended Hitting Streak. American Economist, 57(12), 42–49.

Knowles, E. M. (1999). Oxford Dictionary Of Quotations (5th Edition). Oxford University Press.

Knowles, G., Sherony, K., & Haupert, M. (1992). The Demand for Major League Baseball: A Test of the Uncertainty of Outcome Hypothesis. The American Economist, 36(2), 72–80. doi:10.2307/25603930

Krautmann, A. C., & Hadley, L. (2006). Dynasties versus Pennant Races: Competitive Balance in Major League Baseball. Managerial and Decision Economics, 27(4), 287–292.

Lemke, R. J., Leonard, M., & Tlhokwane, K. (2010). Estimating Attendance at Major League Baseball Games for the 2007 Season. Journal of Sports Economics, 11(3), 316–348.

Light, J. (2005). The Cultural Encyclopedia of Baseball. Jefferson, N.C.: McFarland & Co.

Matuszewski, E. (2011, July 11). Jeter Fan Who Returned Baseball Leaves $180,000 on Table to Do Right Thing. Bloomberg. Retrieved from http://www.bloomberg.com/news/2011-07-11/jeter-fan-who-returned-3-000-hit-ball-gives-up-180-000-to-do-right-thing.html

Quirk, J. P., & Fort, R. D. (1992). Pay dirt: The business of professional team sports. Princeton, N.J: Princeton University Press.

Retrosheet Game Logs. (n.d.). Retrieved December 5, 2012, from http://www.retrosheet.org/gamelogs/index.html

Stark, J. (2011, May 15). Baseball's unbreakable record. ESPN.com. Retrieved from http://sports.espn.go.com/mlb/columns/story?columnist=stark_jayson&id=6539812

Tippett, T. (2002, October 1). May the Best Team Win ... At Least Some of the Time. Diamond Mind, Inc. Retrieved from http://207.56.97.150/articles/playoff2002.htm


Appendix

Table A – Original Regression Results

| Variable Name | Coef. | Std. Err. | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Streak | 0.0040380 | 0.0017952 | 0.034 | 0.0003407 | 0.0077353 |
| HomeOpen | 0.9826803 | 0.1409117 | 0.000 | 0.6924671 | 1.2728930 |
| Competitive | -0.5938669 | 0.1285470 | 0.000 | -0.8586144 | -0.3291194 |
| Monday | -0.4301922 | 0.0864279 | 0.000 | -0.6081938 | -0.2521905 |
| Tuesday | -0.4421506 | 0.0771971 | 0.000 | -0.6011410 | -0.2831602 |
| Wednesday | -0.4751516 | 0.0878907 | 0.000 | -0.6561658 | -0.2941374 |
| Thursday | -0.4683325 | 0.0969904 | 0.000 | -0.6680880 | -0.2685769 |
| Friday | -0.3125399 | 0.0986305 | 0.004 | -0.5156731 | -0.1094066 |
| Saturday | -0.1019777 | 0.0700916 | 0.158 | -0.2463340 | 0.0423786 |
| March | -0.0822954 | 0.1337434 | 0.544 | -0.3577452 | 0.1931543 |
| May | 0.0839997 | 0.0644140 | 0.204 | -0.0486635 | 0.2166628 |
| June | 0.1774466 | 0.0836169 | 0.044 | 0.0052344 | 0.3496589 |
| July | 0.2064210 | 0.0771785 | 0.013 | 0.0474689 | 0.3653730 |
| August | 0.1516692 | 0.0848730 | 0.086 | -0.0231299 | 0.3264684 |
| September | -0.0390157 | 0.1184633 | 0.745 | -0.2829954 | 0.2049640 |
| October | -0.0989122 | 0.1219097 | 0.425 | -0.3499898 | 0.1521655 |
| GB | 0.0554389 | 0.0217584 | 0.017 | 0.0106266 | 0.1002513 |
| MayGB | -0.0314702 | 0.0170448 | 0.077 | -0.0665746 | 0.0036342 |
| JuneGB | -0.0421845 | 0.0192319 | 0.038 | -0.0817933 | -0.0025757 |
| JulyGB | -0.0466494 | 0.0206763 | 0.033 | -0.0892331 | -0.0040656 |
| AugustGB | -0.0482781 | 0.0211436 | 0.031 | -0.0918242 | -0.0047319 |
| SeptemberGB | -0.0471687 | 0.0204644 | 0.030 | -0.0893160 | -0.0050215 |
| OctoberGB | -0.0517100 | 0.0185651 | 0.010 | -0.0899456 | -0.0134745 |
| Head1 | 0.5206694 | 0.1688839 | 0.005 | 0.1728466 | 0.8684923 |
| Night1 | 0.1576074 | 0.0576803 | 0.011 | 0.0388126 | 0.2764022 |


Table B – Regression with Decades Results

| Variable Name | Coef. | Std. Err. | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Streak | 0.0085014 | 0.0036118 | 0.027 | 0.0010629 | 0.0159400 |
| d50s | (omitted) | | | | |
| d60s | 0.5563343 | 0.0179593 | 0.000 | 0.5193464 | 0.5933222 |
| d70s | 0.9397664 | 0.0302808 | 0.000 | 0.8774020 | 1.0021310 |
| d80s | 0.7575488 | 0.0649990 | 0.000 | 0.6236808 | 0.8914167 |
| d90s | 1.4961790 | 0.0574090 | 0.000 | 1.3779430 | 1.6144150 |
| d2000s | 1.1554370 | 0.0269853 | 0.000 | 1.0998600 | 1.2110140 |
| Streakd50s | -0.0052029 | 0.0052845 | 0.334 | -0.0160867 | 0.0056808 |
| Streakd60s | -0.0005822 | 0.0032335 | 0.859 | -0.0072417 | 0.0060773 |
| Streakd70s | -0.0036277 | 0.0052365 | 0.495 | -0.0144124 | 0.0071570 |
| Streakd80s | -0.0065230 | 0.0045070 | 0.160 | -0.0158054 | 0.0027593 |
| Streakd90s | -0.0053213 | 0.0056036 | 0.351 | -0.0168622 | 0.0062196 |
| Streakd2000s | -0.0071002 | 0.0042399 | 0.106 | -0.0158325 | 0.0016321 |
| HomeOpen | 0.9827803 | 0.1404317 | 0.000 | 0.6935557 | 1.2720050 |
| Competitive | -0.5914816 | 0.1284452 | 0.000 | -0.8560194 | -0.3269438 |
| Monday | -0.4292756 | 0.0860797 | 0.000 | -0.6065601 | -0.2519911 |
| Tuesday | -0.4420810 | 0.0771861 | 0.000 | -0.6010488 | -0.2831132 |
| Wednesday | -0.4760047 | 0.0879513 | 0.000 | -0.6571437 | -0.2948657 |
| Thursday | -0.4686644 | 0.0968204 | 0.000 | -0.6680697 | -0.2692591 |
| Friday | -0.3128292 | 0.0985119 | 0.004 | -0.5157183 | -0.1099400 |
| Saturday | -0.1025702 | 0.0700269 | 0.155 | -0.2467933 | 0.0416530 |
| March | -0.0911355 | 0.1336885 | 0.502 | -0.3664721 | 0.1842012 |
| May | 0.0847966 | 0.0638786 | 0.196 | -0.0467639 | 0.2163571 |
| June | 0.1726000 | 0.0842140 | 0.051 | -0.0008421 | 0.3460420 |
| July | 0.2043925 | 0.0773320 | 0.014 | 0.0451243 | 0.3636607 |
| August | 0.1562526 | 0.0851637 | 0.078 | -0.0191454 | 0.3316505 |
| September | -0.0307523 | 0.1177091 | 0.796 | -0.2731788 | 0.2116742 |
| October | -0.0917498 | 0.1230330 | 0.463 | -0.3451409 | 0.1616413 |
| GB | 0.0550478 | 0.0212520 | 0.016 | 0.0112786 | 0.0988171 |
| MarchGB | (omitted) | | | | |
| MayGB | -0.0313050 | 0.0168381 | 0.075 | -0.0659838 | 0.0033737 |
| JuneGB | -0.0418092 | 0.0189068 | 0.036 | -0.0807484 | -0.0028700 |
| JulyGB | -0.0460997 | 0.0204042 | 0.033 | -0.0881231 | -0.0040764 |
| AugustGB | -0.0476519 | 0.0210078 | 0.032 | -0.0909182 | -0.0043856 |
| SeptemberGB | -0.0463240 | 0.0201638 | 0.030 | -0.0878522 | -0.0047958 |
| OctoberGB | -0.0520882 | 0.0180435 | 0.008 | -0.0892495 | -0.0149269 |
| Head1 | 0.5200625 | 0.1713170 | 0.006 | 0.1672285 | 0.8728965 |
| Night1 | 0.1579089 | 0.0575342 | 0.011 | 0.0394149 | 0.2764028 |


Table C – Regression with Streak by Year Results

| Variable Name | Coef. | Std. Err. | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Streak | 0.2306304 | 0.113430 | 0.053 | -0.0029831 | 0.4642440 |
| StreakYear | -0.0001144 | 0.000057 | 0.057 | -0.0002324 | 0.0000036 |
| Year | 0.0174702 | 0.001094 | 0.000 | 0.0152181 | 0.0197224 |
| HomeOpen | 0.9835561 | 0.140884 | 0.000 | 0.6934000 | 1.2737120 |
| Competitive | -0.5871320 | 0.129529 | 0.000 | -0.8539020 | -0.3203619 |
| Monday | -0.4299980 | 0.086127 | 0.000 | -0.6073794 | -0.2526166 |
| Tuesday | -0.4426337 | 0.077215 | 0.000 | -0.6016605 | -0.2836069 |
| Wednesday | -0.4764424 | 0.088016 | 0.000 | -0.6577149 | -0.2951698 |
| Thursday | -0.4693345 | 0.096884 | 0.000 | -0.6688715 | -0.2697974 |
| Friday | -0.3133481 | 0.098514 | 0.004 | -0.5162408 | -0.1104554 |
| Saturday | -0.1032040 | 0.070370 | 0.155 | -0.2481344 | 0.0417263 |
| March | -0.0932799 | 0.135866 | 0.499 | -0.3731016 | 0.1865419 |
| May | 0.0869955 | 0.064606 | 0.190 | -0.0460620 | 0.2200531 |
| June | 0.1741257 | 0.084081 | 0.049 | 0.0009572 | 0.3472942 |
| July | 0.2005012 | 0.076247 | 0.014 | 0.0434676 | 0.3575349 |
| August | 0.1560770 | 0.084631 | 0.077 | -0.0182233 | 0.3303773 |
| September | -0.0290867 | 0.115466 | 0.803 | -0.2668942 | 0.2087208 |
| October | -0.0948358 | 0.120993 | 0.441 | -0.3440258 | 0.1543542 |
| GB | 0.0541394 | 0.021668 | 0.019 | 0.0095127 | 0.0987661 |
| MarchGB | (omitted) | | | | |
| MayGB | -0.0302894 | 0.017053 | 0.088 | -0.0654105 | 0.0048316 |
| JuneGB | -0.0410376 | 0.019169 | 0.042 | -0.0805174 | -0.0015578 |
| JulyGB | -0.0457301 | 0.020613 | 0.036 | -0.0881824 | -0.0032779 |
| AugustGB | -0.0471317 | 0.021158 | 0.035 | -0.0907074 | -0.0035560 |
| SeptemberGB | -0.0454807 | 0.020469 | 0.036 | -0.0876370 | -0.0033244 |
| OctoberGB | -0.0508296 | 0.018522 | 0.011 | -0.0889770 | -0.0126822 |
| Head1 | 0.5159614 | 0.169786 | 0.005 | 0.1662811 | 0.8656417 |
| Night1 | 0.1582691 | 0.057529 | 0.011 | 0.0397857 | 0.2767526 |


Table D – Original Regression without DiMaggio Results

| Variable Name | Coef. | Std. Err. | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Streak | 0.0024176 | 0.0014039 | 0.098 | -0.0004800 | 0.0053151 |
| HomeOpen | 0.9098083 | 0.1290048 | 0.000 | 0.6435555 | 1.1760610 |
| Competitive | -0.5943888 | 0.1367364 | 0.000 | -0.8765989 | -0.3121788 |
| Monday | -0.3784380 | 0.0750551 | 0.000 | -0.5333441 | -0.2235320 |
| Tuesday | -0.4038383 | 0.0733828 | 0.000 | -0.5552931 | -0.2523836 |
| Wednesday | -0.4268091 | 0.0783567 | 0.000 | -0.5885294 | -0.2650889 |
| Thursday | -0.4245990 | 0.0913012 | 0.000 | -0.6130355 | -0.2361626 |
| Friday | -0.2526235 | 0.0849904 | 0.007 | -0.4280351 | -0.0772119 |
| Saturday | -0.0703500 | 0.0675106 | 0.308 | -0.2096850 | 0.0689849 |
| March | -0.0253568 | 0.1263909 | 0.843 | -0.2862148 | 0.2355012 |
| May | 0.0955039 | 0.0690268 | 0.179 | -0.0469605 | 0.2379682 |
| June | 0.1910704 | 0.0858443 | 0.036 | 0.0138964 | 0.3682444 |
| July | 0.2200450 | 0.0841216 | 0.015 | 0.0464265 | 0.3936636 |
| August | 0.1847221 | 0.0867628 | 0.044 | 0.0056525 | 0.3637918 |
| September | 0.0655554 | 0.0876469 | 0.462 | -0.1153390 | 0.2464497 |
| October | -0.0732886 | 0.1206321 | 0.549 | -0.3222610 | 0.1756838 |
| GB | 0.0608818 | 0.0220211 | 0.011 | 0.0154324 | 0.1063313 |
| MayGB | -0.0302318 | 0.0173380 | 0.094 | -0.0660157 | 0.0055521 |
| JuneGB | -0.0429860 | 0.0199024 | 0.041 | -0.0840626 | -0.0019094 |
| JulyGB | -0.0494228 | 0.0213688 | 0.030 | -0.0935259 | -0.0053197 |
| AugustGB | -0.0501036 | 0.0219266 | 0.031 | -0.0953579 | -0.0048492 |
| SeptemberGB | -0.0450534 | 0.0211653 | 0.044 | -0.0887365 | -0.0013703 |
| OctoberGB | -0.0550937 | 0.0190073 | 0.008 | -0.0943228 | -0.0158646 |
| Head1 | 0.4756555 | 0.1931891 | 0.021 | 0.0769327 | 0.8743783 |
| Night1 | 0.1297230 | 0.0538469 | 0.024 | 0.0185885 | 0.2408575 |


Table E – Decades Regression without DiMaggio Results

| Variable Name | Coef. | Std. Err. | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Streak | 0.0025779 | 0.0035602 | 0.476 | -0.0047699 | 0.0099257 |
| d50s | (omitted) | | | | |
| d60s | 1.5091680 | 0.1212722 | 0.000 | 1.2588750 | 1.7594620 |
| d70s | 1.8898950 | 0.1160544 | 0.000 | 1.6503710 | 2.1294200 |
| d80s | 1.3861880 | 0.1166305 | 0.000 | 1.1454740 | 1.6269010 |
| d90s | 2.1236260 | 0.1484211 | 0.000 | 1.8173000 | 2.4299520 |
| d2000s | 2.1351390 | 0.1195024 | 0.000 | 1.8884980 | 2.3817790 |
| Streakd50s | 0.0006933 | 0.0062701 | 0.913 | -0.0122474 | 0.0136341 |
| Streakd60s | 0.0038534 | 0.0032546 | 0.248 | -0.0028637 | 0.0105705 |
| Streakd70s | 0.0024883 | 0.0049003 | 0.616 | -0.0076255 | 0.0126021 |
| Streakd80s | -0.0006180 | 0.0044007 | 0.889 | -0.0097005 | 0.0084646 |
| Streakd90s | 0.0012206 | 0.0054999 | 0.826 | -0.0101306 | 0.0125718 |
| Streakd2000s | -0.0022013 | 0.0043606 | 0.618 | -0.0112012 | 0.0067985 |
| HomeOpen | 0.9111236 | 0.1289091 | 0.000 | 0.6450682 | 1.1771790 |
| Competitive | -0.5901574 | 0.1367613 | 0.000 | -0.8724188 | -0.3078961 |
| Monday | -0.3785748 | 0.0750164 | 0.000 | -0.5334011 | -0.2237485 |
| Tuesday | -0.4041299 | 0.0734120 | 0.000 | -0.5556448 | -0.2526151 |
| Wednesday | -0.4273756 | 0.0783664 | 0.000 | -0.5891158 | -0.2656354 |
| Thursday | -0.4250599 | 0.0912014 | 0.000 | -0.6132904 | -0.2368294 |
| Friday | -0.2530082 | 0.0849307 | 0.007 | -0.4282966 | -0.0777198 |
| Saturday | -0.0708300 | 0.0674906 | 0.304 | -0.2101237 | 0.0684637 |
| March | -0.0334245 | 0.1259913 | 0.793 | -0.2934576 | 0.2266087 |
| May | 0.0943940 | 0.0683914 | 0.180 | -0.0467589 | 0.2355468 |
| June | 0.1920229 | 0.0863950 | 0.036 | 0.0137124 | 0.3703334 |
| July | 0.2172251 | 0.0841334 | 0.016 | 0.0435823 | 0.3908679 |
| August | 0.1857823 | 0.0872499 | 0.044 | 0.0057074 | 0.3658572 |
| September | 0.0703272 | 0.0893103 | 0.439 | -0.1140002 | 0.2546547 |
| October | -0.0673618 | 0.1224915 | 0.587 | -0.3201719 | 0.1854482 |
| GB | 0.0602151 | 0.0220975 | 0.012 | 0.0146081 | 0.1058222 |
| MarchGB | (omitted) | | | | |
| MayGB | -0.0298610 | 0.0172549 | 0.096 | -0.0654733 | 0.0057513 |
| JuneGB | -0.0424464 | 0.0199369 | 0.044 | -0.0835941 | -0.0012986 |
| JulyGB | -0.0490593 | 0.0214780 | 0.032 | -0.0933876 | -0.0047309 |
| AugustGB | -0.0492703 | 0.0220832 | 0.035 | -0.0948479 | -0.0036928 |
| SeptemberGB | -0.0442741 | 0.0211352 | 0.047 | -0.0878949 | -0.0006533 |
| OctoberGB | -0.0546588 | 0.0190013 | 0.008 | -0.0938755 | -0.0154422 |
| Head1 | 0.4742276 | 0.1985329 | 0.025 | 0.0644758 | 0.8839793 |
| Night1 | 0.1306445 | 0.0539918 | 0.023 | 0.0192109 | 0.2420781 |


Table F – Regression with Streak by Year without DiMaggio Results

| Variable Name | Coef. | Std. Err. | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|
| Streak | 0.0968104 | 0.1227658 | 0.438 | -0.1565658 | 0.3501866 |
| StreakYear | -0.0000475 | 0.0000620 | 0.451 | -0.0001755 | 0.0000805 |
| Year | 0.0325491 | 0.0016418 | 0.000 | 0.0291606 | 0.0359377 |
| HomeOpen | 0.9104374 | 0.1289999 | 0.000 | 0.6441947 | 1.1766800 |
| Competitive | -0.5911501 | 0.1386490 | 0.000 | -0.8773075 | -0.3049926 |
| Monday | -0.3787906 | 0.0750226 | 0.000 | -0.5336296 | -0.2239517 |
| Tuesday | -0.4043176 | 0.0734130 | 0.000 | -0.5558346 | -0.2528006 |
| Wednesday | -0.4275480 | 0.0783285 | 0.000 | -0.5892100 | -0.2658860 |
| Thursday | -0.4252997 | 0.0912348 | 0.000 | -0.6135990 | -0.2370003 |
| Friday | -0.2534126 | 0.0849869 | 0.006 | -0.4288170 | -0.0780082 |
| Saturday | -0.0709743 | 0.0676058 | 0.304 | -0.2105059 | 0.0685572 |
| March | -0.0290303 | 0.1266927 | 0.821 | -0.2905111 | 0.2324506 |
| May | 0.0964810 | 0.0692165 | 0.176 | -0.0463749 | 0.2393368 |
| June | 0.1917973 | 0.0860221 | 0.035 | 0.0142564 | 0.3693382 |
| July | 0.2161035 | 0.0830292 | 0.016 | 0.0447397 | 0.3874673 |
| August | 0.1842098 | 0.0869496 | 0.045 | 0.0047546 | 0.3636651 |
| September | 0.0678969 | 0.0880158 | 0.448 | -0.1137587 | 0.2495525 |
| October | -0.0722462 | 0.1204341 | 0.554 | -0.3208099 | 0.1763176 |
| GB | 0.0599872 | 0.0223653 | 0.013 | 0.0138274 | 0.1061470 |
| MarchGB | (omitted) | | | | |
| MayGB | -0.0296091 | 0.0174983 | 0.104 | -0.0657238 | 0.0065057 |
| JuneGB | -0.0422496 | 0.0201030 | 0.046 | -0.0837402 | -0.0007589 |
| JulyGB | -0.0489162 | 0.0215299 | 0.032 | -0.0933517 | -0.0044806 |
| AugustGB | -0.0495205 | 0.0221221 | 0.035 | -0.0951783 | -0.0038626 |
| SeptemberGB | -0.0442126 | 0.0214257 | 0.050 | -0.0884330 | 0.0000078 |
| OctoberGB | -0.0543865 | 0.0193169 | 0.010 | -0.0942546 | -0.0145184 |
| Head1 | 0.4730516 | 0.1926521 | 0.022 | 0.0754372 | 0.8706660 |
| Night1 | 0.1300775 | 0.0536261 | 0.023 | 0.0193987 | 0.2407564 |