arXiv:2102.09689v1 [stat.AP] 19 Feb 2021

Do NHL goalies get hot in the playoffs? A multilevel logistic regression analysis

Likang Ding, Ivor Cribben, Armann Ingolfsson, Monica Tran

University of Alberta, Edmonton, AB T6G 2R6, Canada

February 22, 2021

Abstract

The hot-hand theory posits that an athlete who has performed well in the recent past will perform better in the present. We use multilevel logistic regression to test this theory for National Hockey League playoff goaltenders, controlling for a variety of shot-related and game-related characteristics. Our data consists of 48,431 shots for 93 goaltenders in the 2008-2016 playoffs. Using a wide range of shot-based and time-based windows to quantify recent save performance, we consistently find that good recent save performance has a negative effect on the next-shot save probability, which contradicts the hot-hand theory.

Keywords: Hot hand, ice hockey, goaltenders, National Hockey League, time-based window, shot-based window.

1 Introduction

The hot-hand phenomenon generally refers to an athlete who has performed well in the recent past performing better in the present. Having a “hot goalie” is seen as crucial to success in the National Hockey League (NHL) playoffs. A goaltender who keeps all pucks out of the net for 16 games (4 series of 4 wins) will obviously win his team the Stanley Cup. In this paper, we use data from the NHL playoffs to investigate whether goaltenders get hot, in the sense that if a goaltender has had a high recent save probability, then that goaltender will have a high save probability for the next shot that he faces.

NHL fans, coaches, and players appear to believe that goaltenders can get hot. A famous example is Bowman’s use of the backup Vernon as the starting goaltender for the Detroit Red Wings during the 1997 playoffs, even though another goaltender had started in goal during most of the regular season, because Bowman believed Vernon was the hot goaltender (Morrison and Schmittlein 1998). The Red Wings won the Stanley Cup that year for the first time in 42 years.

NHL goaltenders let in roughly one in ten shots. More precisely, during the 2018-19 regular season, 93 goaltenders playing for 31 teams faced a total of 79,540 shots, which resulted in 7,169 goals, for an average save percentage of 91.0%. Among goaltenders who played at least 20 games, the season-long save percentage varied from a high of 93.7% (Ben Bishop) to a low of 89.7% (Joonas Korpisalo), a range from 1.3 percentage points (pps) below to 2.7 pps above the overall average save percentage. In the playoffs of the same year, the overall average save percentage was 91.6%. Among goaltenders who played in two or more playoff games, the save percentage varied from 93.6% (Robin Lehner, New York Islanders) to 85.6%, a range from 6 pps below to 2 pps above the overall average save percentage.

It is crucial to determine whether the hot-hand phenomenon is real, for NHL goaltenders, in order to understand whether coaches are justified in making decisions about which of a team’s two goaltenders should start a particular game based on perceptions or estimates of whether that goaltender is hot. If the hot hand is real, then appropriate statistical models could potentially be used to predict the likely performance of a team’s two goaltenders in an upcoming game, or even in the remainder of a game that is in progress, during which the goaltender currently on the ice has performed poorly.

Our major finding is statistically significant negative slope coefficients for the variable of interest measuring the influence of the recent save performance on the probability of saving the next shot on goal; in other words, we have demonstrated that, contrary to the hot-hand theory, better past performance usually results in worse future performance. This negative impact of recent good performance is robust, according to our analysis, to both varying window sizes and defining the window size based on either time or number of shots.

The remainder of the paper is organized as follows: in Section 2, we review related literature; in Section 3, we describe our data set; in Section 4, we specify our regression models; and in Section 5, we present our results. Section 6 concludes.

2 Literature review

We summarize five streams of related work addressing the following: (1) whether the hot hand is a real phenomenon or a fallacy, (2) whether statistical methods have sufficient power to detect a hot hand, (3) whether offensive and defensive adjustments reduce the impact of a hot hand, (4) estimation of a hot-hand effect for different positions in a variety of sports, and (5) specification of statistical models to estimate the hot hand.

(1) Is the hot hand a real phenomenon or a fallacy? The hot hand was originally studied in the 1980s in the context of basketball shooting percentages (Gilovich, Vallone, and Tversky 1985; Tversky and Gilovich 1989a,b). These studies concluded that even though players, coaches, and fans all believed strongly in a hot-hand phenomenon, there was no convincing statistical evidence to support its existence. Instead, Gilovich, Vallone, and Tversky (1985) attributed beliefs in a hot hand to a psychological tendency to see patterns in random data, an explanation that has also been proposed for behavioral anomalies in various non-sports contexts, such as financial markets and gambling (Miller and Sanjurjo 2018). Contrary to these findings, recent papers by Miller and Sanjurjo (2018, 2019) demonstrate that the statistical methods used in the original studies were biased, and when their data is re-analyzed after correcting for the bias, strong evidence for a hot hand emerges.

(2) Do statistical methods have sufficient power to detect a hot hand? The Gilovich, Vallone, and Tversky (1985) study analyzed players individually. This approach may lack sufficient statistical power to detect a hot hand, even if it exists (Wardrop 1995, 1999). Multivariate approaches that pool data for multiple players have been proposed to increase power (Arkes 2010). We follow this approach, by pooling data for multiple NHL goaltenders over multiple playoffs.

(3) Do offensive and defensive adjustments reduce the impact of a hot hand? A hot hand, even if it is real, might not result in measurable improvement in performance if the hot player adapts by making riskier moves or if the opposing team adapts by defending the hot player more intensively. For example, a hot basketball player might attempt riskier shots and the opposing team might guard a player they believe to be hot more closely. The extent to which such adjustments can be made varies by sport, by position, and by the game situation. For example, there is little opportunity for such adjustments for basketball free throws (Gilovich, Vallone, and Tversky 1985) and there is less opportunity to shift defensive resources towards a single player in baseball than in basketball (Green and Zwiebel 2017), because the fielding team defends against a single batting team player at a time. An NHL goaltender must face the shots that are directed at him and thus has limited opportunities to make riskier moves if he feels that he is hot. Furthermore, the opposing team faces a single goaltender and therefore has limited opportunities to shift offensive resources away from other tasks and towards scoring on the goaltender. Therefore, NHL goaltenders provide an ideal setting in which to measure whether the hot-hand phenomenon occurs.

(4) Estimation of a hot-hand effect for different positions in a variety of sports. In addition to basketball shooters, the list of sports positions for which hot-hand effects have been investigated includes baseball batters and pitchers (Green and Zwiebel 2017), soccer penalty shooters (Ötting and Groll 2019), dart players (Ötting et al. 2020), and golfers (Livingston 2012). In ice hockey, a momentum effect has been investigated at the team level (Kniffin and Mihalek 2014). A hot-hand effect has been investigated for ice hockey shooters (Vesper 2015), but not for goaltenders, except for the study by Morrison and Schmittlein (1998). The latter study focused on the duration of NHL playoff series, noted a higher-than-expected number of short series, and proposed a goaltender hot-hand effect as a possible explanation. This study did not analyze shot-level data for goaltenders, as we do.

(5) Specification of statistical models to estimate the hot hand. Hot-hand researchers have used two main approaches in specifying their statistical models: (1) analyze success rates, conditional on outcomes of previous attempts (Albright 1993; Green and Zwiebel 2017) or (2) incorporate a latent variable or “state” that quantifies “hotness” (Green and Zwiebel 2017; Ötting and Groll 2019). We follow the former approach. With that approach, the history of past performance is typically summarized over a “window” that is defined in terms of a fixed number of past attempts (the “window size”). It is not clear how to choose the window size. We vary the window size over a range that covers the window sizes used in past work. Furthermore, in addition to shot-based windows, we also use time-based windows, an approach that complicates data preparation and has not been used by other investigators.

We contribute to the hot-hand literature by investigating NHL goaltenders, a position that has not been studied previously, and which provides a setting in which there are limited opportunities for either team to adapt their strategies in reaction to a perception that a goaltender is hot. In terms of methodology, we use multilevel logistic regression, which allows us to pool data across goaltender-seasons to increase statistical power, and we use a wide range of both shot-based and time-based windows to quantify a goaltender’s recent save performance.

3 Data and variables

Our data set consists of information about all shots on goal in the NHL playoffs from 2008 to 2016. The season-level data is from www.hockey-reference.com (Hockey Reference 2017) and the shot-level data is from corsica.hockey (Perry 2017). We have data for 48,431 shots, faced by 93 goaltenders, over 9 playoff seasons, with 91.64% of the shots resulting in a save.

We divided the data into 224 groups, containing from 2 to 849 shot observations, based on combinations of goaltender and playoff season. The data set includes 1,662 shot observations for which one or more variables have missing values. Removing those observations changes the average save proportion from 91.64% to 91.61% and the number of groups from 224 to 223. We exclude observations with missing values from our regression analysis but we include these observations when computing the variable of interest (recent save performance), as discussed in Subsection 3.2.

3.1 Dependent variable: Shot outcome

The dependent variable, $y_{ij}$, equals 1 if shot $i$ in group $j$ resulted in a save and 0 if the shot resulted in a goal for the opposing team. A shot that hits the crossbar or one of the goalposts is not counted as a shot on goal and is not included in our data set.

3.2 Variable of interest: Recent save performance

The primary independent variable of interest, $x_{ij}$, is the recent save performance, immediately before shot $i$, in goaltender-season group $j$. It is not obvious how to quantify this variable and therefore we investigate several possibilities. In all cases, we define the recent save performance as the ratio of the number of saves to the number of shots faced by the goaltender, over some “window”. We compare shot-based windows, over the last 5, 10, 15, 30, 60, 90, 120, and 150 shots faced by the goaltender, and time-based windows, over the last 10, 20, 30, 60, 120, 180, 240, and 300 minutes played by the goaltender. We chose these window sizes to make the shot-based and time-based window sizes comparable, given that a goaltender in the NHL playoffs faces an average of 30 shots on goal per 60 minutes of playing time. The largest window sizes (150 shots and 300 minutes) correspond, roughly, to 5 games, an interval length that Green and Zwiebel (2017) suggested was needed to determine whether a baseball player was hot.
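As an illustration of the shot-based definition, the following minimal sketch computes the save ratio over the k shots immediately preceding each shot in one goaltender-season group. The function name, the 0/1 outcome encoding, and the use of `None` for shots with fewer than k earlier shots are assumptions made for this sketch, not details taken from the authors' code.

```python
from typing import List, Optional


def shot_window_save_rate(outcomes: List[int], k: int) -> List[Optional[float]]:
    """Save ratio over the k shots immediately before each shot.

    outcomes[i] is 1 for a save and 0 for a goal, in chronological order
    for a single goaltender-season group (hypothetical input format).
    Entry i of the result is None when fewer than k earlier shots exist.
    """
    rates: List[Optional[float]] = []
    running = 0  # saves among the most recent (up to k) earlier shots
    for i, y in enumerate(outcomes):
        # The current shot is never part of its own window.
        rates.append(running / k if i >= k else None)
        running += y
        if i >= k:
            running -= outcomes[i - k]  # slide the window forward by one shot
    return rates
```

For example, with outcomes `[1, 1, 1, 0, 1]` and `k = 3`, the fourth shot sees a recent save ratio of 1.0 and the fifth shot sees 2/3.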

A window could include one or more intervals during which the group $j$ goaltender was replaced by a backup goaltender. Shots faced by the backup goaltender are excluded from the computation of $x_{ij}$. A window could include time periods between consecutive games, which could last several days.

For a given window, we exclude shot observations for which the number of shots or the time elapsed prior to the time of the shot is less than the window size. This reduces the number of shots used in the analysis by 20% to 50%, depending on the window size. As stated previously, we included shots with missing values in the other independent variables in the computation of $x_{ij}$ but we excluded those shots from the regression analysis that we describe in Section 4.
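A time-based window can be sketched in a similar way. The version below assumes that each shot carries the goaltender's cumulative playing time in minutes (so that intermissions, days between games, and backup appearances do not advance the clock); that input format, the function name, and the `None` marker for excluded shots are all assumptions of this sketch.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def time_window_save_rate(
    shots: List[Tuple[float, int]], t: float
) -> List[Optional[float]]:
    """Save ratio over the last t minutes played before each shot.

    shots is a chronological list of (minutes_played, outcome) pairs for one
    goaltender-season group, with outcome 1 for a save and 0 for a goal.
    A shot gets None when less than t minutes have been played so far, or
    when no earlier shot falls inside the window.
    """
    window: Deque[Tuple[float, int]] = deque()  # earlier shots still in window
    saves = 0
    rates: List[Optional[float]] = []
    for minute, y in shots:
        # Drop shots taken more than t minutes of playing time ago.
        while window and window[0][0] < minute - t:
            saves -= window[0][1]
            window.popleft()
        if minute < t or not window:
            rates.append(None)
        else:
            rates.append(saves / len(window))
        window.append((minute, y))
        saves += y
    return rates
```

Unlike the shot-based version, the denominator here varies from shot to shot, which is one reason time-based windows complicate data preparation.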

3.3 Control variables: Other influential factors

We include a vector, $Z_{ij}$, of six control variables for shot $i$ of group $j$ that we expect could have an impact on the shot outcome. The control variables are: Game score, Position, Home, Rebound, Strength, and Shot type. All of these variables are categorical, except Position, which can be expressed either as one categorical variable or as two numerical variables. In what follows, we elaborate on each of the control variables.

Game score indicates whether the goaltender’s team is Leading (base case), Tied, or Trailing in the game. Home is a binary variable indicating whether the goaltender was playing on home ice or not (base case). Rebound is a binary variable indicating whether the shot occurred within 2 seconds of uninterrupted game time of another shot from the same team (Perry 2017) or not (base case). Strength represents the difference between the number of players from the goaltender’s team on the ice and the number of players from the other team on the ice. Strength can take values of +2, +1, 0, −1, and −2 (base case), and we treat this variable as categorical, to allow for nonlinear effects. Shot type denotes the shot type: Backhand (base case), Deflected, Slap, Snap, Tip-in, Wrap-around, or Wrist.
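The base-case convention above corresponds to ordinary dummy (indicator) coding: each categorical variable contributes one 0/1 column per non-base level. A minimal sketch follows; the dictionary keys, level labels such as "Away" and "Non-rebound", and the function name are hypothetical, and Position is left out because its specification is discussed separately.

```python
# Levels for each categorical control; the first level listed is the base case.
LEVELS = {
    "Game score": ["Leading", "Tied", "Trailing"],
    "Home": ["Away", "Home"],
    "Rebound": ["Non-rebound", "Rebound"],
    "Strength": ["-2", "-1", "0", "+1", "+2"],
    "Shot type": ["Backhand", "Deflected", "Slap", "Snap",
                  "Tip-in", "Wrap-around", "Wrist"],
}


def dummy_code(shot: dict) -> dict:
    """Return one 0/1 indicator per non-base level of each control variable."""
    row = {}
    for variable, levels in LEVELS.items():
        value = shot[variable]
        if value not in levels:
            raise ValueError(f"unknown level {value!r} for {variable}")
        for level in levels[1:]:  # skip the base case
            row[f"{variable}: {level}"] = 1 if value == level else 0
    return row
```

A shot at the base case of every variable maps to a row of zeros, so the model intercept describes that reference shot.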

The numerical specification for Position is based on a line from the shot origin to the midpoint of the crossbar of the goal. We use $d$ (distance) for the length in feet of this line and $\alpha$ for the angle that this line makes with a line connecting the midpoints of the crossbars of the two goals (Figure 1a). A limitation of this specification is that the save probability could depend on $d$ and $\alpha$ in a highly nonlinear manner. Rather than introduce nonlinear terms, we also investigate a categorical specification, which we describe next.
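The two numerical Position variables can be computed from shot coordinates along the following lines. This is only a sketch under an assumed coordinate system: the origin is at center ice, x runs in feet along the line joining the two goals toward the defended goal, y is the lateral offset, and the goal line sits at x = 89 feet (a standard NHL rink). The coordinate convention, the default `goal_x`, and the function name are assumptions, not taken from the data source.

```python
import math
from typing import Tuple


def shot_distance_and_angle(
    x: float, y: float, goal_x: float = 89.0
) -> Tuple[float, float]:
    """Return (d, alpha): distance in feet and angle in degrees.

    d is the length of the line from the shot origin to the goal midpoint;
    alpha is the angle between that line and the line joining the two goals.
    """
    dx = goal_x - x  # distance to the goal line along the rink's long axis
    dy = abs(y)      # lateral offset; left and right are not distinguished
    d = math.hypot(dx, dy)
    alpha = math.degrees(math.atan2(dy, dx))
    return d, alpha
```

Taking the absolute value of the lateral offset mirrors the fact that the data do not record which side of the ice a shot came from.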

Figure 1: Numerical and categorical specifications for Position of shot origin. (a) Numerical specification, showing the distance $d$ and the angle $\alpha$ relative to the goal. (b) Categorical specification, showing the Top, Slot, and Corner regions.

For the categorical specification of Position, we divide the ice surface into three regions: Top, Slot, and Corner (base case) (Figure 1b), and categorize each shot based on the region from which the shot originated. We expect the save probability to be lower for shots from the Slot region than from the Top or Corner regions.

A limitation of both specifications for Position is that we do not have data on whether the shot originated from the left or right side of the ice.

4 Regression models

We used multilevel logistic regression, with partial pooling, also referred to as mixed effects modelling. We center the variable of interest (Enders and Tofighi 2007; Hox and Roberts 2011) by subtracting the group mean, that is, we set $x^*_{ij} = x_{ij} - \bar{x}_j$, where $\bar{x}_j$ is the average of $x_{ij}$ in group $j$. The centered variable $x^*_{ij}$ represents the deviation in performance of the goaltender-season group $j$ from his average performance for the current playoffs. Our interest is in whether such deviations persist over time.
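Group-mean centering can be sketched as follows, assuming the recent save performance values and their group labels are held in two parallel lists (a hypothetical layout chosen for this illustration).

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def center_within_group(
    groups: Sequence[Hashable], x: Sequence[float]
) -> List[float]:
    """Subtract each group's mean of x from the values in that group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, value in zip(groups, x):
        totals[g] += value
        counts[g] += 1
    # x*_ij = x_ij - mean of x over group j
    return [value - totals[g] / counts[g] for g, value in zip(groups, x)]
```

After centering, a value of 0 means the goaltender's recent save performance equals his own average for that playoff season, regardless of how strong that average is.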

We allow the intercepts and the slope coefficients of the variables of interest to vary by group, but the control variable slope coefficients are the same for all groups, as shown in the following partial pooling specification:

$$\Pr(y_{ij} = 1) = \operatorname{logit}^{-1}\left(\alpha_j + \beta_j x^*_{ij} + \gamma Z_{ij}\right), \tag{1}$$

where $\operatorname{logit}(p) \equiv \ln(p/(1 - p))$, $\alpha_j$ is the intercept for group $j$, $\beta_j$ is the slope for group $j$, and $\gamma$ is the global vector of coefficients for the control variables. We represent the intercept and slope of the variable of interest as the sum of a fixed effect and a random effect, that is: $\alpha_j = \bar{\alpha} + \alpha^*_j$ and $\beta_j = \bar{\beta} + \beta^*_j$.
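To make the structure of specification (1) concrete, the sketch below evaluates the save probability for a single shot from given parameter values. It only evaluates the model; it does not estimate anything, and the function and argument names are hypothetical.

```python
import math
from typing import Sequence


def inv_logit(eta: float) -> float:
    """Inverse of logit(p) = ln(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-eta))


def save_probability(
    alpha_bar: float, alpha_star_j: float,  # intercept: fixed + random effect
    beta_bar: float, beta_star_j: float,    # slope: fixed + random effect
    x_star: float,                          # centered recent save performance
    gamma: Sequence[float],                 # shared control coefficients
    z: Sequence[float],                     # control indicators for the shot
) -> float:
    alpha_j = alpha_bar + alpha_star_j
    beta_j = beta_bar + beta_star_j
    eta = alpha_j + beta_j * x_star + sum(g * zi for g, zi in zip(gamma, z))
    return inv_logit(eta)
```

With a negative slope, a shot that follows above-average recent performance (positive centered value) receives a lower predicted save probability than the same shot following below-average recent performance.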

All results that we report in Section 5 were obtained using Markov chain Monte Carlo (MCMC), using the rstan and rstanarm R packages. We used the default prior distributions for the rstanarm package. The default distributions are weakly informative: Normal distributions with mean 0 and scale 2.5 (for the coefficients) and 10 (for the intercepts) (Gabry and Goodrich 2019). We used default values for the number of chains (4), the number of iterations (2,000), and the warm-up period (1,000 iterations).

We also used maximum likelihood (ML), using the lme4 R package, and found the MCMC and ML estimates to be nearly identical, except for a few instances where the ML estimation algorithm did not converge. The lack of ML estimation convergence in some instances is consistent with the findings in Eager and Roy (2017). MCMC is considered a good surrogate in situations where an ML estimation algorithm has not been established (Wollack et al. 2002).

5 Results

In this section, we first provide detailed results for the longest window sizes, using the categorical specification for Position. Second, we investigate the robustness of our main finding to the window size, window type, individual goaltender-seasons, and which specification was used for Position. Third, we illustrate the estimated impact of the control variables on the save probability for the next shot. Fourth, we provide diagnostics for the MCMC estimation.

Table 1: Regression results for the 150-shot model and the 300-minute model

| Variable | k = 150 shots: Mean | k = 150 shots: 2.5% | k = 150 shots: 97.5% | t = 300 minutes: Mean | t = 300 minutes: 2.5% | t = 300 minutes: 97.5% |
|---|---|---|---|---|---|---|
| Intercept fixed effect | 2.10 | 1.64 | 2.57 | 2.13 | 1.67 | 2.62 |
| Recent save performance ($x^*_{ij}$) fixed effect | −8.45 | −11.64 | −5.31 | −8.59 | −11.49 | −5.53 |
| Game score: Tied | −0.05 | −0.16 | 0.06 | −0.03 | −0.15 | 0.08 |
| Game score: Trailing | −0.12 | −0.25 | 0.00 | −0.11 | −0.23 | 0.02 |
| Home | −0.02 | −0.12 | 0.09 | 0.02 | −0.08 | 0.12 |
| Rebound | −1.09 | −1.25 | −0.94 | −1.10 | −1.26 | −0.93 |
| Strength: +2 | 1.86 | −1.22 | 5.61 | 1.82 | −1.09 | 5.68 |
| Strength: +1 | 0.76 | 0.24 | 1.28 | 0.73 | 0.22 | 1.22 |
| Strength: 0 | 0.94 | 0.51 | 1.36 | 0.94 | 0.52 | 1.33 |
| Strength: −1 | 0.45 | 0.00 | 0.87 | 0.45 | 0.03 | 0.85 |
| Shot type: Deflected | −0.98 | −1.34 | −0.63 | −1.07 | −1.41 | −0.72 |
| Shot type: Slap | −0.03 | −0.24 | 0.18 | −0.12 | −0.34 | 0.09 |
| Shot type: Snap | −0.05 | −0.25 | 0.15 | −0.11 | −0.32 | 0.10 |
| Shot type: Tip-in | −0.56 | −0.78 | −0.34 | −0.62 | −0.85 | −0.40 |
| Shot type: Wrap-around | 0.40 | −0.10 | 0.93 | 0.31 | −0.19 | 0.83 |
| Shot type: Wrist | 0.02 | −0.15 | 0.19 | −0.04 | −0.21 | 0.13 |
| Position: Top | 0.70 | 0.51 | 0.89 | 0.68 | 0.49 | 0.86 |
| Position: Slot | −0.82 | −0.94 | −0.69 | −0.84 | −0.97 | −0.71 |

5.1 Baseline results

Table 1 provides means and 95% credible intervals for the intercept and slope fixed effects and for the control variable coefficients, for our baseline models: the models with 150-shot and 300-minute windows, and a categorical specification for Position. The window sizes for the baseline models are comparable to those used by Green and Zwiebel (2017).

Our main finding from the baseline models is that a goaltender’s recent save performance has a negative and statistically significant fixed effect value for both window sizes, which is contrary to the hot-hand theory.

The two baseline models give consistent results for the control variables. The only two control variable coefficients that disagree in sign, Home and Shot type: Wrist, have 95% credible intervals that contain zero. The posterior mean values for the significant control variables have the same sign, have similar magnitude, and are in the direction we expect, for both window types. Specifically, the posterior mean values indicate that a goaltender performs better when his team has more skaters on the ice, when facing a wrap-around shot, and when facing a shot from the top region. A goaltender performs worse when facing a rebound shot, a deflected shot, a slap shot, a snap shot, a tip-in shot, or a shot from the slot region.

Figure 2: Recent save performance fixed effects, for all window types and sizes. (Vertical axes: shot-based window size, 0 to 150 shots, and time-based window size, 0 to 300 minutes. Horizontal axis: fixed effect for recent save performance, −10 to 0.)

5.2 Robustness of main finding

Our main finding, that recent save performance has a negative fixed effect value, holds for all window sizes and types (Figure 2). (Although not shown in the figure, all of the fixed effects are statistically significant.) The fixed effect magnitude increases with the window size.

The fact that the slope fixed effects, $\hat{\bar{\beta}}$, are negative leaves open the possibility that the slopes for some individual goaltender-seasons are positive, but Figures 3-4 show that this is not the case. These figures show that all of the individual-group slope coefficients, $\hat{\beta}_j$, for both the longest window sizes (Figure 3) and the shortest window sizes (Figure 4), are strongly negative. Figures 3-4 also show positive correlation between the slope estimates from the shot-based vs. the time-based window models.

The coefficients of the significant control variables remained consistent in sign and similar in magnitude across all window sizes and types (Figure 5). The control variable point estimates for all window types and sizes are within a 95% Bayesian confidence interval (or a credible interval) for the 300-minute baseline model. Furthermore, changing the specification for Position from categorical to numerical, for the t = 300 baseline model, resulted in a slope fixed effect that remained negative and was similar in magnitude. The signs of the coefficients for all statistically significant control variables in this model variant agreed with the t = 300 baseline model.

Figure 3: Estimated slopes ($\hat{\beta}_j$s) for all groups, for k = 150 window vs. t = 300 window. (Horizontal axis: k = 150 shots. Vertical axis: t = 300 minutes. Fitted line: y = 0.3158x − 5.9192, R² = 0.4175.)

Figure 4: Estimated slopes ($\hat{\beta}_j$s) for all groups, for k = 15 window vs. t = 30 window. (Horizontal axis: k = 15 shots. Vertical axis: t = 30 minutes. Fitted line: y = 0.5601x − 0.2632, R² = 0.7383.)

Figure 5: Control variable slope coefficient estimates, for all window types and sizes. The point estimates are represented by small circles. 95% Bayesian confidence intervals from the t = 300-minute baseline model are included for comparison.

5.3 Magnitude of the impact of the independent variables on the save probability

Figures 6–7 illustrate the impact of recent save performance and the control variables on the estimated save probability for the next shot, using the t = 300-minute baseline model. In creating Figure 6, we set all control variables to their baseline values. In creating Figure 7, for each panel, we set $x^*_{ij} = 0$ and we set all control variables except the one being varied in the panel to their baseline values.
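As a rough check on how such estimated probabilities relate to the coefficients, the sketch below plugs the posterior means reported in Table 1 for the t = 300-minute model into the inverse logit, with the centered recent save performance set to 0 and all other controls at their base cases. Plugging in posterior means is an approximation chosen for this illustration; the authors may have computed their figures differently (for example, by averaging over posterior draws).

```python
import math

# Posterior means copied from Table 1, t = 300-minute model.
INTERCEPT = 2.13
COEF = {"Rebound": -1.10, "Position: Top": 0.68, "Position: Slot": -0.84}


def inv_logit(eta: float) -> float:
    return 1.0 / (1.0 + math.exp(-eta))


def save_pct(active=()) -> int:
    """Estimated save probability (rounded percent) with the named
    indicators switched on and every other control at its base case."""
    eta = INTERCEPT + sum(COEF[name] for name in active)
    return round(100 * inv_logit(eta))


baseline = save_pct()                    # base case for every control
rebound = save_pct(["Rebound"])
top = save_pct(["Position: Top"])
slot = save_pct(["Position: Slot"])
```

Under these assumptions the four values come out as 89%, 74%, 94%, and 78%, which agree with the Corner, Rebound, Top, and Slot values shown in Figure 7.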

From Figure 6, we see that as the deviation of a goaltender’s recent save performance from his current-playoff average increases from the 2.5th to the 97.5th percentile, his estimated save probability for the next shot decreases by 5 pps. For comparison, this range is larger than the 3.7 pp range of average save percentages during the 2018-19 regular season (as discussed in Section 1) but smaller than the 8 pp range of average save percentages during the playoffs of the same season.

Figure 6: Estimated save probability against the 2.5th, 50th, and 97.5th percentiles of $x^*_{ij}$, the recent save performance. (Estimated save probability: 92% at the 2.5th percentile, 89% at the 50th percentile, and 87% at the 97.5th percentile.)

Given that we define $x^*_{ij}$ to be the deviation in performance of the goaltender-season group $j$ from his average performance for the current playoffs, the percentiles for $x^*_{ij}$ correspond to different recent save performances for different groups. To illustrate the effect in more concrete terms, consider the largest group: the 699 shots faced by Thomas during the 2011 playoffs. The group average was $\bar{x}_j$ = 94%. We set each control variable category equal to its sample proportion in the group $j$ data. For shots where Thomas’ recent save performance was at the 2.5th percentile, corresponding to $x_{ij}$ = 91.3%, his estimated save probability for the next shot was 95.2%, a 3.9 pp increase. At the other extreme, for shots where the recent save performance was at the 97.5th percentile, corresponding to $x_{ij}$ = 97.2%, the estimated save probability for the next shot was 92.4%, a 4.8 pp decrease.

Figure 7 depicts the save probability against different values for the control variables. Home and Game score have minimal impact on the estimated save probability. Different values for Position, Rebound, Strength, and Shot type, in contrast, have a substantial impact on estimated save probability: a shot from the Slot is 16 pps less likely to be saved than a shot from the Top; a rebound shot is 15 pps less likely to be saved than a non-rebound shot; and a shot from a team that has a 2-man advantage is 9 pps less likely to be saved than a shot from a team that is 2 men short. For Shot type, the save probability decreases by 18 pps when moving from wrap-around (the shot type least likely to result in a goal) to a deflected shot (the type most likely to result in a goal).

Figure 7: Control variables versus estimated save probability.
(a) Rebound: Rebound 74%, Non-rebound 89%.
(b) Home: Away 89%, Home 90%.
(c) Position: Slot 78%, Corner 89%, Top 94%.
(d) Game score: Trailing 88%, Tied 89%, Leading 89%.
(e) Strength: −2 89%, −1 93%, 0 96%, +1 95%, +2 98%.
(f) Shot type: Deflected 74%, Tip-in 82%, Slap 88%, Snap 88%, Wrist 89%, Backhand 89%, Wrap-around 92%.

5.4 MCMC diagnostics

We computed two diagnostic statistics: $\hat{R}$ and $n_{\text{eff}}$. To check whether a chain has converged to the equilibrium distribution we can compare the chain’s behavior to other randomly initialized chains. The potential scale reduction statistic, $\hat{R}$, allows us to perform this comparison, by computing the ratio of the average variance of draws within each chain to the variance of the pooled draws across chains; if all chains are at equilibrium, the two variances are equal and $\hat{R} = 1$, and this is what we found for all of our models.
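For readers who want to see the idea in code, the sketch below implements the classic potential scale reduction formula as presented in Gelman et al. (2013), which compares a pooled variance estimate with the average within-chain variance. This is an illustrative assumption about the form of the statistic: the rstan package computes a more refined variant, so values from this sketch need not match its output exactly.

```python
import math
from typing import Sequence


def potential_scale_reduction(chains: Sequence[Sequence[float]]) -> float:
    """Classic (non-split) R-hat for one scalar parameter.

    chains holds m chains of n post-warm-up draws each.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # W: average of the within-chain sample variances.
    w = sum(
        sum((v - mu) ** 2 for v in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    if w == 0:
        raise ValueError("within-chain variance is zero")
    # B: between-chain variance of the chain means, scaled by n.
    b = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)
```

Chains that sit in different regions of the parameter space inflate the between-chain term and push the statistic well above 1, while chains that agree give values near 1 once the number of draws is large.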

The effective sample size, $n_{\text{eff}}$, is an estimate of the number of independent draws from the posterior distribution for the estimate of interest. The $n_{\text{eff}}$ metric computed by the rstan package is based on the ability of the draws to estimate the true mean value of the parameter. As the draws from a Markov chain are dependent, $n_{\text{eff}}$ is usually smaller than the total number of draws. Gelman et al. (2013) recommend running the simulation until $n_{\text{eff}}$ is at least 5 times the number of chains, or 5 × 4 = 20. This requirement is met for all parameters in all of our models.

6 Discussion and conclusion

We used multilevel logistic regression to investigate whether the performance of NHL goaltenders during the playoffs is consistent with a hot-hand effect. We used data from the 2008–2016 NHL playoffs. We measured past performance using both shot-based windows (as has been done in past research) and time-based windows (which has not been done before). Our window sizes spanned a wide range: from, roughly, half a game to 5 games. We allowed the intercept and the slope with respect to recent save performance to vary across goaltender-season combinations.

We found a significant negative impact of recent save performance on the next-shot save probability. This finding was consistent across all window types, window sizes, and goaltender-season combinations. This finding is inconsistent with a hot-hand effect and contrary to the findings for baseball in Green and Zwiebel (2017), who used a similar window size and hypothesized that skilled activity would generally demonstrate a hot-hand effect.

If a goaltender’s performance, after controlling for observable factors, was completely random, then we would expect a period of above-average or below-average recent save performance to be likely to be followed by a period of save performance that is closer to the average, because of regression to the mean. As we increase the sample size used to measure recent save performance (that is, increase the window size), we would expect the average amount by which performance moves toward the average to decrease. We observe the opposite (see Figure 2), which argues against our finding being driven by regression to the mean.
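The sample-size argument can be illustrated with a rough worked calculation under assumptions that are ours rather than the paper's: suppose every shot is saved independently with the same probability $p$, set here to the overall save proportion of 0.9164 reported in Section 3. Then the save proportion $\hat{p}_k$ over a window of $k$ shots has standard deviation

$$\operatorname{sd}(\hat{p}_k) = \sqrt{\frac{p(1-p)}{k}}, \qquad \operatorname{sd}(\hat{p}_{15}) = \sqrt{\frac{0.9164 \times 0.0836}{15}} \approx 0.071, \qquad \operatorname{sd}(\hat{p}_{150}) = \sqrt{\frac{0.9164 \times 0.0836}{150}} \approx 0.023.$$

Under this simplified null, a typical 150-shot window deviates from the long-run average by roughly one third as much as a typical 15-shot window, so there is less distance left to regress toward the mean as the window grows.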

A motivation effect provides one possible explanation for our finding. That is, if a goaltender’s recent save performance has been below his average for the current playoffs, then his motivation increases, resulting in increased effort and focus, causing the next-shot save probability to be higher. Conversely, if the recent save performance has been above average, then the goaltender’s motivation, effort, and focus could decrease, leading to a lower next-shot save probability. Bélanger et al. (2013) found support for the first of these effects (greater performance after failure) for “obsessively passionate individuals” but did not find support for the second effect (worse performance after success) for such individuals. The study found support for neither effect for “harmoniously passionate individuals.” These findings are consistent with Hall-of-Fame goaltender Ken Dryden’s (2019) sentiment that “if a shot beats you, make sure you stop the next one, even if it is harder to stop than the one before.” The psychological mechanisms underlying our finding could benefit from further study.

Although the estimated recent save performance coefficient is consistently negative, its magnitude varies and in particular, the magnitude increases sharply with the window size. We expect to see more reliable estimates with longer window sizes, but the increase in magnitude is surprising, given that we define the recent save performance as a scale-free save percentage.

One limitation of our study is that, in defining windows, we ignore the time that passes between games. Past research, such as Green and Zwiebel (2017), shares this limitation. This limitation could be particularly serious for backup goaltenders, for whom the interval between two successive appearances could be several days long.

References

Albright, S. Christian (1993). “A Statistical Analysis of Hitting Streaks in Baseball.” In: Journal of the American Statistical Association 88.424, pp. 1175–1183.

Arkes, Jeremy (2010). “Revisiting the Hot Hand Theory with Free Throw Data in a Multivariate Framework.” In: Journal of Quantitative Analysis in Sports 6.1.

Bélanger, Jocelyn J et al. (2013). “Driven by Fear: The Effect of Success and Failure Information on Passionate Individuals’ Performance.” In: Journal of Personality and Social Psychology 104.1, pp. 180–195. doi: 10.1037/a0029585.

Dryden, Ken (2019). Scotty: A Hockey Life Like No Other. McClelland & Stewart.

Eager, Christopher and Joseph Roy (2017). Mixed Effects Models are Sometimes Terrible. arXiv: 1701.04858 [stat.AP].

Enders, Craig K. and Davood Tofighi (2007). “Centering Predictor Variables in Cross-Sectional Multilevel Models: A New Look at an Old Issue.” In: Psychological Methods 12.2, p. 121.

Gabry, Jonah and Ben Goodrich (2019). Prior Distributions for rstanarm Models. https://cran.r-project.org/web/packages/rstanarm/vignettes/priors.html.

Gelman, Andrew et al. (2013). Bayesian Data Analysis. 3rd edition. Chapman and Hall/CRC.

Gilovich, Thomas, Robert Vallone, and Amos Tversky (1985). “The Hot Hand in Basketball: On the Misperception of Random Sequences.” In: Cognitive Psychology 17.3, pp. 295–314.

Green, Brett and Jeffrey Zwiebel (2017). “The Hot-Hand Fallacy: Cognitive Mistakes or Equilibrium Adjustments? Evidence from Major League Baseball.” In: Management Science 64.11, pp. 5315–5348.

Hockey Reference (2017). Season Data. http://www.hockey-reference.com/.

Hox, Joop and J. Kyle Roberts (2011). Handbook of Advanced Multilevel Analysis. Psychology Press.

Kniffin, Kevin M. and Vince Mihalek (2014). “Within-Series Momentum in Hockey: No Returns for Running up the Score.” In: Economics Letters 122.3, pp. 400–402.

Livingston, Jeffrey A (2012). “The Hot Hand and the Cold Hand in Professional Golf.” In: Journal of Economic Behavior & Organization 81.1, pp. 172–184.

Miller, Joshua B. and Adam Sanjurjo (2018). “Surprised by the Hot Hand Fallacy? A Truth in the Law of Small Numbers.” In: Econometrica 86.6, pp. 2019–2047.

Miller, Joshua B. and Adam Sanjurjo (2019). “A Bridge from Monty Hall to the Hot Hand: The Principle of Restricted Choice.” In: Journal of Economic Perspectives 33.3, pp. 144–62.

Morrison, Donald G. and David C. Schmittlein (1998). “It Takes a Hot Goalie to Raise the Stanley Cup.” In: Chance 11.1, pp. 3–7.

Ötting, Marius and Andreas Groll (2019). “A Regularized Hidden Markov Model for Analyzing the ‘Hot Shoe’ in Football.” In: arXiv preprint arXiv:1911.08138.

Ötting, Marius et al. (2020). “The Hot Hand in Professional Darts.” In: Journal of the Royal Statistical Society: Series A (Statistics in Society) 183.2, pp. 565–580.

Perry, Emmanuel (2017). Corsica Play-by-Play Data. http://www.corsica.hockey/data/.

Tversky, Amos and Thomas Gilovich (1989a). “The Cold Facts about the ‘Hot Hand’ in Basketball.” In: Chance 2.1, pp. 16–21.

Tversky, Amos and Thomas Gilovich (1989b). “The ‘Hot Hand’: Statistical Reality or Cognitive Illusion?” In: Chance 2.4, pp. 31–34.

Vesper, Andrew (2015). “Putting the ‘Hot Hand’ on Ice.” In: Chance 28.2, pp. 13–18.

Wardrop, Robert L (1995). “Simpson’s Paradox and the Hot Hand in Basketball.” In: The American Statistician 49.1, pp. 24–28. doi: 10.1080/00031305.1995.10476107.

Wardrop, Robert L (1999). “Statistical Tests for the Hot-Hand in Basketball in a Controlled Setting.” In: Unpublished manuscript 1, pp. 1–20.

Wollack, James A. et al. (2002). “Recovery of Item Parameters in the Nominal Response Model: A Comparison of Marginal Maximum Likelihood Estimation and Markov Chain Monte Carlo Estimation.” In: Applied Psychological Measurement 26.3, pp. 339–352.