THE PENNSYLVANIA STATE UNIVERSITY SCHREYER HONORS COLLEGE

DEPARTMENT OF INDUSTRIAL AND MANUFACTURING ENGINEERING

HOCKEY ANALYTICS: PREDICTIVE MODELING OF TEAM AND PLAYER PERFORMANCE

STEVEN BOLLENDORF SPRING 2018

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Industrial Engineering with honors in Industrial Engineering

Reviewed and approved* by the following:

Guodong (Gordon) Pang Associate Professor Harold and Inge Marcus Department of Industrial and Manufacturing Engineering Thesis Supervisor

Catherine Harmonosky Associate Professor and Associate Department Head of Harold and Inge Marcus Department of Industrial and Manufacturing Engineering Honors Adviser

* Signatures are on file in the Schreyer Honors College.


ABSTRACT

Analytics in hockey is growing in popularity. Knowing which game strategies to implement and which players make a team more competitive is extremely valuable to coaches and general managers (GMs) of National Hockey League (NHL) teams. The purpose of this applied research is to examine two aspects of the sport. The first aim is to find results that can inform in-game strategy. The second is to help NHL teams find the right players at reasonable salaries. This research looks at The Pennsylvania State University 2016-2017 hockey team and the 2015-2016 Pittsburgh Penguins to discover any scoring-rate patterns that winning hockey teams possess. A linear regression model of team points is also built from a team's Goals For (GF), the goals a team scores, and its Goals Against (GA), the goals scored against it. This model indicates that GF contribute less to a team's success than GA. In addition, data on every NHL player from four NHL seasons are used to cluster players into specific player types in order to predict their value to team success. The key clustering metric is the Corsi For Percentage (CF%), which measures a player's puck possession skill.

According to this research, elite forwards, second-line forwards, and defensive defensemen provide the most value to a team. Lastly, specific teams from the 2016-2017 season are analyzed to determine whether they underperformed or overperformed relative to the model's predicted team point total.


TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

Chapter 1: Introduction

1.1 History and Evolution of Hockey Analytics
1.2 Objectives

Chapter 2: Goal Interarrival Times

2.1 Literature Review
2.2 Introduction: Penn State Ice Hockey (2016-2017)
2.3 Methodology: Penn State Ice Hockey
2.4 Results: Penn State Ice Hockey
2.5 Penn State Hockey Future Considerations
2.6 Pittsburgh Penguins (2015-2016)
2.7 Methodology: Pittsburgh Penguins
2.8 Pittsburgh Penguins Descriptive Statistics
2.9 NHL Goal Regression Model
2.10 Pittsburgh Penguins' Goal Interarrival Times
2.11 Pittsburgh Penguins Future Considerations
2.12 Conclusion

Chapter 3: Evaluating Player Contribution to Team Success

3.1 Introduction
3.2 Literature Review
3.3 Methodology
3.3.1 Data
3.3.2 Clustering Model
3.3.3 Player Contribution Linear Regression Model
3.4 Results: Player Clusters
3.5 Results: Regression Model
3.6 Results: Bi-Criteria Optimization Model
3.6.1 Sensitivity Analysis
3.7 Future Work
3.8 Conclusion

Chapter 4: Conclusion

BIBLIOGRAPHY


LIST OF FIGURES

Figure 1: Penn State Hockey Goal Frequency per Minute in Game

Figure 2: Penn State Interarrival Goal Times

Figure 3: R fitting output of interarrival goal time data

Figure 4: Penn State Weibull distribution plot highlighting the percent of interarrival times between 300 and 600 seconds (5-10 min)

Figure 5: Empirical CDF for Penn State highlighting the 2.5, 5, and 97.5 percentile of interarrival goal data

Figure 6: Pittsburgh Penguins' goals per minute in game for 2015-16 season

Figure 7: Pittsburgh Penguins' goals per minute in game under Coach Mike Johnston

Figure 8: Pittsburgh Penguins' goals per minute in game under Coach Mike Sullivan

Figure 9: Distribution fitting for Pittsburgh Penguins' interarrival goal data (R Output)

Figure 10: Pittsburgh Penguins' Weibull distribution plot highlighting the percent of goal interarrival times between 5-10 minutes

Figure 11: Pittsburgh Penguins' Weibull distribution plot highlighting the 25th percentile of goal interarrival times

Figure 12: Empirical CDF for Penguins highlighting the 2.5, 5, and 97.5 percentiles of interarrival goal data

Figure 13: 2013-14 forwards cluster matrix plot ()

Figure 14: 2013-14 forwards cluster matrix plot (Tyler Toffoli)

Figure 15: 2013-14 defensemen cluster matrix plot (John Carlson)

Figure 16: 2013-14 defensemen cluster matrix plot (Jake Muzzin)

Figure 17: 2014-15 defensemen cluster matrix plot (Jake Muzzin)

Figure 18: 2014-15 defensemen cluster matrix plot (Kris Letang)

Figure 19: 2014-15 defensemen cluster matrix plot (Kevin Shattenkirk)

Figure 20: 2015-16 forwards cluster matrix plot (Nikita Kucherov)

Figure 21: 2016-17 forwards cluster matrix plot (Nikita Kucherov)


LIST OF TABLES

Table 1: Basic statistics on goal interarrival times in minutes and seconds

Table 2: Weibull distribution fitting results for Penn State's goal interarrival times

Table 3: Penguins' regulation goals under Mike Johnston (28 games)

Table 4: Penguins' regulation goals under Mike Sullivan (54 games)

Table 5: Summary of goals scored per period during 2015-16 season

Table 6: Points, GF, and GA for every 2015-16 playoff team (goals include shootout goals). Colors differentiate divisions.

Table 7: Points, GF, and GA for every 2015-16 non-playoff team (goals include shootout goals). Colors differentiate divisions.

Table 8: Linear regression model for points in 2015-16 NHL season based on GF and GA

Table 9: Summary of Penguins' wins and losses broken down by coach

Table 10: Basic statistics on Pittsburgh Penguins' goal interarrival times

Table 11: Weibull distribution fitting results for Penguins' goal interarrival times

Table 12: Normalized clustering statistics (GP = Games Played)

Table 13: 2013-14 forwards clustering results

Table 14: Sample forwards in 2013-14 clusters

Table 15: 2013-14 defensemen clustering results

Table 16: Sample defensemen in 2013-14 clusters

Table 17: 2014-15 forwards clustering results

Table 18: 2014-15 defensemen clustering results

Table 19: 2015-16 forwards clustering results

Table 20: 2015-16 defensemen clustering results

Table 21: 2016-17 forwards clustering results

Table 22: 2016-17 defensemen clustering results

Table 23: Team point regression model coefficient values (2016-17)

Table 24: Team Predicted Points vs. Actual Points (2016-17)

Table 25: Team points contributed by each player type (2016-17)

Table 26: Bi-Criteria Optimization Model

Table 27: Optimal team with a maximum $75 million salary cap

Table 28: Averages and standard deviations for cap hit per player type

Table 29: Optimal team when the salary cap was exactly $55.4 million

Table 30: Cap hit ($M) per contributed team point


ACKNOWLEDGEMENTS

I want to thank my thesis supervisor, Dr. Pang, for guiding me through the research process and for motivating me to develop new technical skills during the past two years. I want to acknowledge my honors advisor, Dr. Harmonosky, for her guidance and support during my undergraduate honors industrial engineering journey. Finally, I would like to thank my parents and family for supporting me during my academic and hockey career. You have granted me the opportunity to receive an industrial engineering degree from one of the greatest universities in the world. I am forever grateful, as my degree would not be possible without you.


Chapter 1: Introduction

This chapter discusses the evolution of hockey analytics and the primary objectives of this research.

1.1 History and Evolution of Hockey Analytics

Analytics is relatively new to the hockey world. Many scouts and National Hockey League (NHL) General Managers (GMs) are only slowly adapting to analytics and finding uses for it in professional hockey. Many teams have been reluctant to rely on anything but the eye test to scout and evaluate talent. More recently, teams like the Toronto Maple Leafs have hired analytical minds for their operations departments. In an attempt to focus more on analytics, the Maple Leafs named Kyle Dubas Assistant General Manager at the age of 28 on the strength of his previous work in hockey analytics (Parnass 2015).

While the hockey world still has some reservations, more teams are building their operations staff with analytical minds. As Tim Swartz observes, with "the influence of analytics in other sports and the availability of data, the state of analytics has begun to change in the NHL" (2017). Teams such as the Pittsburgh Penguins and Florida Panthers have academic statisticians who consult for the team. Analytics is undoubtedly growing in the hockey world, but much remains to be researched and discovered.

As many analysts know, data collection can be a problem. In the NHL, however, live data have been captured by the NHL's Real Time Scoring System (RTSS) since the 1980s (Swartz 2017). The availability of these data is allowing analytics to expand within the game. As Arik Parnass, an NHL.com journalist, points out, "Rather than exclusively trusting the eye test, condemning players for misfortune in small samples, or labeling players as lazy or enigmatic based on reputation or hearsay, analytics has provided the opportunity to scrutinize decision-making and avoid those characterizations" (2015). Analytics can supplement the traditional method of scouting players: scouts can use the numbers to either support or challenge their assessment of a player's skill. Analytics has a promising future in both professional and amateur hockey.

1.2 Objectives

Hockey analytics is a growing field in both the NHL and college hockey. A number of other sports have already recognized the impact of analytics, but hockey has been slower to do so. Recently, advanced statistics such as Fenwick and Corsi have placed an emphasis on describing the fast-paced nature of the game. However, the sport is sporadic and often unpredictable. This makes it difficult to rely solely on analytics to make game-time decisions or to predict future events in either professional or college hockey.

Research on hockey analytics is limited, and this thesis pursues answers to two questions. Which strategies help teams win? Which players help teams perform effectively during the regular season? The goal of every NHL team and every college hockey team is to make the playoffs. Put simply, a team that does not make the playoffs cannot win the championship, so organizations build teams to make the playoffs. Once teams make the playoffs, a new season begins. For this reason, NHL teams build to win in the regular season. Accordingly, the ensuing chapters of this thesis focus on the regular season for both the NHL and college hockey.

Analytics can help NHL teams answer important questions that affect their performance:

- Do coaches need to adjust their game plans to fully utilize their team's skill?
- What analytical methods can NHL GMs use to predict their team's future performance in order to make impactful changes?
- How many goals do teams need to make the playoffs?
- Which players have the most impact on team performance?

This research addresses these questions.

Moreover, it is important to categorize player types. In hockey, certain attributes mark players as elite. For example, elite goal scorers receive most of the recognition around the league. Defensive defensemen receive fewer accolades because their style of play is less glamorous, yet these players are essential for teams to win. The question, then, is what these players are worth and how their style of play can be categorized simply from the statistics they produce. Do certain players contribute more to a team's success? Clustering and optimization models offer one way to find which players are most valuable and what they are worth monetarily. Previous research has categorized players by their skill sets, but this research uses different descriptive statistics to describe players.

From this insight, teams can decide which players are worth paying more than others. The NHL has a hard salary cap, meaning that teams cannot spend more than the league allows. For the 2017-2018 season, the NHL salary cap was $75 million; in other words, teams had only $75 million to spend on players. Teams can acquire players in three ways: through free agency during the offseason, through trades with other teams, and through the draft. Player salary depends on a variety of factors, most notably a player's perceived impact on team winning. Elite players are paid more than average NHL players because they are expected to contribute more to team success.

However, are some elite players being miscategorized, and are some average players actually being undervalued? The statistics discussed in Chapter 3 allow players to be judged more appropriately.

Hockey analytics is a growing field, and there is much to study in the sport. The chapters of this thesis each examine a different aspect of the game.

The end goal is to be able to give GMs and coaches better methods and tools to evaluate players and team strategies. Through continued research in this field, no longer will scouts, general managers, and coaches need to rely solely on their instincts about players, but rather they can also rely on what the analytics are showing them.

Each of the following chapters has a different scope. Chapter 2 discusses the goal interarrival time data for both the Penn State NCAA hockey team and the Pittsburgh Penguins of the NHL. In addition, the goal-scoring data for the other 29 NHL teams are analyzed to predict future team performance. Chapter 3 then discusses the impact of specific player types on team performance and analyzes undervalued and overvalued players.


Chapter 2: Goal Interarrival Times

Goal interarrival times for Penn State's hockey team and the Pittsburgh Penguins are analyzed to find efficiencies in scoring. In addition, a predictive model of team points based on team scoring is created to estimate team success.

2.1 Literature Review

Professional NHL and college hockey teams are built to make the playoffs. To make the playoffs, teams need to win games, and to win games, they need to score more goals than their opponents. The arrival and interarrival times of goals are therefore crucial to winning. Teams that score consistently increase their chances of winning games and, in turn, their probability of making the playoffs.

The interarrival times of hockey goals can be hard to describe because goals are rare events in a 60-minute regulation game. As Andrew Thomas describes, hockey goals "are rare when compared to professional basketball… the mean number of goals in a 60-minute hockey game is roughly six" (2007). It can therefore be useful to find a distribution that describes the interarrival times of goals. Knowing the probability that another goal will be scored, given the time in the game, can be crucial for determining the strategy a team plays. As Thomas notes, "If the time between goals can be adequately modelled as Exponentially distributed, scoring times will be well described as a Poisson process" (2007). The research in this thesis tests these ideas on, and compares, the goal interarrival times of Penn State and the Pittsburgh Penguins.

Thomas looked only at even-strength goals for all NHL teams over four NHL seasons. This amounted to nearly 4,700 games, with an average of 5.48 goals per 60 minutes of play. He found several interesting trends that this thesis builds upon. One is that "The number of goals scored at even strength is of the same magnitude for every minute in the game except the last" (Thomas 2007). The reason is that teams change their style of play toward the end of the game.

For example, if team A trails team B by one goal, team A may elect to remove its goalie from the net. This gives team A an extra skater on the ice to help tie the game, but it also leaves team A's net unprotected. Team B can then score more easily without team A's goalie in the net. This is one reason for the spike in last-minute goals.

According to Thomas' results, fewer goals are scored at the beginning of periods. He attributes this to the fact that each period begins with a faceoff at center ice. Teams start the period in neither their defensive nor their offensive zone, so it takes time to develop a play and create scoring chances. Thus, the expected number of goals should be smaller. The Penn State and Pittsburgh Penguins data presented below are examined for a similar trend.

Andrew Thomas states that the variation in goals per minute of the game "suggest[s] that the simple Poisson model of the game, notably the consequence that goals are scored with an equal rate at any time of the game, is not sufficient to describe the game's dynamics" (2007). For a Poisson process, the times between goals are exponentially distributed. Thomas explains that the best model to describe the interarrival times of goals has yet to be determined. He considered the Weibull and the Plateau Hazard (PH) functions. The distribution of goal interarrival times initially appears exponential. However, as Thomas mentions, the lack of goals scored near zero minutes suggests that it is not an exponential distribution.
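Thomas' comparison can be made concrete by writing the two candidate models side by side. The block below uses standard textbook forms rather than Thomas' own notation. The symbols μ (goal rate), k (shape), λ (scale) and the hazard h(t), the instantaneous scoring rate given no goal yet, are introduced here only for illustration.

```latex
\begin{aligned}
&\text{Exponential (Poisson process with rate } \mu\text{):}
  && f(t) = \mu e^{-\mu t}, \qquad h(t) = \frac{f(t)}{1 - F(t)} = \mu, \\
&\text{Weibull (shape } k\text{, scale } \lambda\text{):}
  && f(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} e^{-(t/\lambda)^{k}}, \qquad
     h(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1}.
\end{aligned}
```

Setting k = 1 recovers the exponential with μ = 1/λ. For k > 1, f(0) = 0 and the hazard rises with t, so very short gaps between goals are unlikely. This matches the pattern Thomas observed near zero minutes. As a rough scale check, six goals per 60 minutes corresponds to μ = 0.1 goals per minute, or a mean of 10 minutes between goals under the exponential model.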

Furthermore, Thomas assigns winning probabilities when teams score at particular points in the game with certain goal differentials. In this research, a micro-level analysis of individual teams (Penn State and the Pittsburgh Penguins) is completed. The aim is to determine strategies they can implement at particular points in the game, given their scoring rates and the frequency of goals per minute in the game.

2.2 Introduction: Penn State Ice Hockey (2016-2017)

Thomas' goal interarrival research looked at all NHL teams over several NHL seasons, but a focused, team-by-team view of goal interarrival times would also be helpful. In this section, the 2016-2017 Penn State NCAA Division I ice hockey team is studied for the frequency and interarrival times of its goals. The purpose of this study is twofold. It seeks to identify where further efficiencies reside in Penn State's goal scoring, and to test whether its style of play can help predict its goal-scoring rate.

Penn State won the Big Ten Championship in 2016-2017. The team had a dominant season as the top team in its league, so its style of play may give further insight into how winning teams perform and how their scoring rates affect their success. When teams score frequently, they control the pace of the game and gain and maintain the game's momentum. Is this apparent in Penn State's interarrival data? Is Penn State implementing a strategy that is conducive to its success? These are the questions addressed in this research.

2.3 Methodology: Penn State Ice Hockey

During the 2016-2017 season, which included a Big Ten Championship and a trip to the NCAA Men's Division I Ice Hockey Tournament, Penn State scored 160 goals in 39 total games. The time of every goal and the player who scored it were recorded for every game, using data from www.gopsusports.com (2018). Initial data analysis was done in Excel, where every interarrival time was calculated and graphed to get an initial visualization of the data. In addition, the goals were graphed by the minute of the game in which Penn State scored. This helped identify any points during the game that were inefficient periods for goal scoring.
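The interarrival and per-minute calculations described above were done in Excel. The Python sketch below reproduces the same two computations for illustration only. Two details are assumptions, since the thesis does not specify how game boundaries were handled. First, the data layout, a dictionary mapping hypothetical game IDs to goal times in seconds from the opening faceoff. Second, the choice to measure gaps only between goals in the same game.

```python
from collections import Counter

def interarrival_times(goals_by_game):
    """Seconds between consecutive goals within each game.

    Assumption (hypothetical): gaps are measured only between goals in the
    same game, so the first goal of each game starts a new sequence.
    """
    gaps = []
    for times in goals_by_game.values():
        ordered = sorted(times)
        gaps.extend(later - earlier for earlier, later in zip(ordered, ordered[1:]))
    return gaps

def goals_per_minute(goals_by_game, regulation_minutes=60):
    """Count goals by game minute (1-60); later goals go to an overflow bin."""
    counts = Counter()
    for times in goals_by_game.values():
        for t in times:
            minute = int(t // 60) + 1  # 0:00-0:59 counts as minute 1
            # Bin 61 plays the role of Excel's "More" bin (e.g., overtime goals).
            counts[min(minute, regulation_minutes + 1)] += 1
    return counts

# Hypothetical example: goal times in seconds from the opening faceoff.
season = {
    "game_1": [312, 1450, 3555],
    "game_2": [75, 2410],
    "game_3": [3601],  # overtime goal lands in the overflow bin
}
print(interarrival_times(season))    # [1138, 2105, 2335]
print(goals_per_minute(season)[60])  # 1
```

Keeping the gaps within games avoids counting the idle time between games as a very long "interarrival" time.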

Finally, R was used to determine the distribution that best fit the interarrival data. The fitted distribution was then used to look further into the specifics of Penn State's goal-scoring rates and to analyze the efficiency of its goal-scoring times.

2.4 Results: Penn State Ice Hockey

Initial data analysis involved looking at the basic statistics of the interarrival times for the 160 Penn State goals during the season. These are shown in Table 1 in both minutes and seconds. The average time between goals scored by Penn State is 11.1 minutes, or roughly one goal every half period, since periods are 20 minutes long. The interarrival times also vary widely. The standard deviation is 9.25 minutes, which is large relative to the mean (a coefficient of variation of about 0.83).

While this may seem extreme, it makes sense because hockey is a random, sporadic sport. Goals are random, unevenly spaced, and unpredictable. For this reason, hockey is difficult to describe with basic statistics.

Table 1: Basic statistics on goal interarrival times in minutes and seconds

Statistic                       Minutes    Seconds
Mean                            11.10      666.07
Variance (squared units)        85.49      307,764.80
Standard Deviation              9.25       554.77


Moreover, each goal scored by Penn State was recorded by the minute of the game in which it was scored; each game consists of 60 regulation minutes. A histogram of goals scored per minute of the game was created to evaluate fluctuations in scoring. Such a view can help coaches answer questions such as which game strategies to implement. Figure 1 shows the scoring frequency per minute of the game for the 160 goals scored over the 39 games played by Penn State. The red lines in the histogram mark the first minute of the second and third periods.

[Figure 1 plot: "Goals Per Minute in Game"; x-axis: Minute in Game (1-60, plus "More"); y-axis: Goal Frequency (0-8). *Red line denotes first minute of period.]

Figure 1: Penn State Hockey Goal Frequency per Minute in Game

The histogram of raw scoring times offers a few takeaways that could inform coaching decisions for Penn State. First, scoring appears to decline in the second period, specifically in the middle of the period. This is likely due to what is referred to in hockey as the "long change" effect. Teams switch ends for the second period, so each team's bench is farther from the net it is defending. As a result, teams might elect to play more conservatively and take fewer offensive risks. They fear that defensemen might get caught on the ice for too long and not make it back to the bench for a line change.

In addition, there is a clear rise in late third-period goals. As Thomas states, "This difference is plausibly explained by the strategy of a team losing by one goal to replace their goalie with an extra skater in a desperate attempt to tie the game" (2007). According to www.gopsusports.com (2018), Penn State had 25 wins, 160 Goals For (GF), or goals scored, and 108 Goals Against (GA), or goals scored against, during the season. A spike in late third-period goals is to be expected for a team with a winning record. When Penn State leads games into the final minutes, opposing teams take risks, such as pulling their goalie to give their offense more chances to tie the game. These calculated risks give Penn State better offensive opportunities because the opponent's defensive zone is less protected.

Nevertheless, the histogram can help coaches and players understand which strategies are working to increase scoring. The specific strategies Penn State used in the second period are not available. Even so, teams can use the data in Figure 1 to adjust their strategies to increase scoring. For example, consider teams playing a more conservative forecheck (an attacking style of offense), such as a 1-3-1 trap forecheck. These teams will generate fewer offensive chances because of the defensive nature of their system. If they move to a more aggressive forecheck, they can generate more scoring opportunities, but they risk giving up more goals.

Moreover, the interarrival times of goals were plotted in Excel with a bin size of 30 seconds, as shown in Figure 2. The 150-to-180-second bin (2.5 to 3 minutes) contains the most interarrival times, with 10. Figure 2 also shows that goals rarely occur very close together. This is consistent with Thomas' findings for his NHL data. Thomas notes that "Preliminary inspection shows that goals are not scored quickly in succession - that is, scoring times are not maximized near zero, as expected in the Exponential distribution" (Thomas 2007). The frequency of interarrival times peaks at around 2.5 to 3 minutes rather than near zero. For this reason, the Weibull and Lognormal distributions were tested for the best fit to the data.

[Figure 2 plot: "Penn State Goal Interarrival Times (Seconds)"; x-axis: Seconds (30-second bins up to 3,150, plus "More"); y-axis: Frequency (0-12).]

Figure 2: Penn State Interarrival Goal Times

In R, the Weibull and Lognormal distributions were fit to the data using the distribution-fitting package by Delignette-Muller and Dutang (2015). Figure 3 summarizes the fit of each distribution to the interarrival data. Based on these results, the Weibull distribution appears to be the better fit. The empirical fit of the Weibull distribution (top-left graph in Figure 3) is appropriate. The Q-Q plot reflects the fit in the tails, and the P-P plot reflects the fit in the center of the data. Both are also appropriate for the Weibull distribution. Finally, the Anderson-Darling (AD) test p-value for the Weibull distribution is greater than 0.250. The test therefore does not reject the Weibull distribution as a model for the data. Table 2 summarizes the results for the Weibull distribution.
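The fitting itself was done in R. To illustrate what a maximum-likelihood Weibull fit involves, the sketch below implements one in plain Python rather than with the R package used in the thesis. It does not claim to reproduce that package's exact procedure or settings. The function name `fit_weibull_mle`, the bisection bounds, and the synthetic test data are hypothetical. The sketch solves the standard likelihood equation for the shape parameter, then computes the scale in closed form.

```python
import math
import random

def fit_weibull_mle(data, lo=0.01, hi=100.0, iters=60):
    """Maximum-likelihood (shape, scale) for a two-parameter Weibull.

    Solves the profile-likelihood equation for the shape k by bisection,
    then recovers the scale in closed form. Data must be positive.
    """
    if any(x <= 0 for x in data):
        raise ValueError("Weibull fitting requires positive observations")
    m = max(data)
    y = [x / m for x in data]  # rescale to (0, 1]; the shape equation is scale-invariant
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Increasing in k; its root is the MLE of the shape parameter.
        w = [v ** k for v in y]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            hi = mid  # root lies below mid
        else:
            lo = mid  # root lies above mid
    k = (lo + hi) / 2.0
    scale = m * (sum(v ** k for v in y) / len(y)) ** (1.0 / k)
    return k, scale

# Sanity check on synthetic data drawn from a known Weibull
# (random.weibullvariate takes the scale first, then the shape).
rng = random.Random(2018)
sample = [rng.weibullvariate(706.88, 1.19) for _ in range(10000)]
shape, scale = fit_weibull_mle(sample)
print(round(shape, 2), round(scale))  # should land near 1.19 and 707
```

Rescaling by the largest observation keeps `v ** k` at or below 1, which avoids floating-point overflow for large shape values during the search.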

Table 2: Weibull distribution fitting results for Penn State’s goal interarrival times

Measure                              Value
Anderson-Darling (AD) statistic      0.226
P-value                              > 0.250
Shape                                1.19186
Scale (seconds)                      706.87688

Figure 3: R fitting output of interarrival goal time data

Once the Weibull distribution was deemed the best fit for the data, further analysis was conducted on the fitted distribution. As the density plot in Figure 4 shows, 25.83% of Penn State goal interarrival times fall between 300 and 600 seconds (5-10 minutes). Likewise, as seen in Figure 5, 95% of the times fall between 32 and 2,113 seconds (0.53-35.22 minutes). This is a large range, but again it can be explained by the inherent variability of the sport. If teams can shift this range toward shorter interarrival times, they should have a better chance of winning games, since scoring would occur at a more frequent pace. These data tell a story about the type of team Penn State was this season. When about 25% of goals arrive within 5-10 minutes of the previous goal, a team is positioned to capitalize on in-game momentum shifts. Future studies could test this claim with below-average teams. One might hypothesize that the percentage of goals occurring within 5-10 minutes of each other is much lower for such teams.
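The percentages quoted above can be computed directly from the Weibull parameters in Table 2. The sketch below illustrates this using only Python's standard library. The function names are hypothetical, and the calculation assumes the two-parameter Weibull (no location shift) implied by the reported shape and scale.

```python
import math

# Fitted parameters reported in Table 2 (times in seconds).
SHAPE = 1.19186
SCALE = 706.87688

def weibull_cdf(t, shape=SHAPE, scale=SCALE):
    """P(T <= t) for a two-parameter Weibull."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_quantile(p, shape=SHAPE, scale=SCALE):
    """Inverse CDF: the time t such that P(T <= t) = p."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

def weibull_mean(shape=SHAPE, scale=SCALE):
    """Mean interarrival time: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)

share_5_to_10 = weibull_cdf(600) - weibull_cdf(300)
low, high = weibull_quantile(0.025), weibull_quantile(0.975)

print(f"P(300 s < T < 600 s) = {share_5_to_10:.4f}")         # 0.2583
print(f"central 95% range: {low:.0f} to {high:.0f} seconds")  # 32 to 2113
print(f"model mean: {weibull_mean():.0f} seconds")            # 666
```

These values agree with the 25.83% and the 32 to 2,113 second range quoted above. The model mean also sits close to the sample mean of 666.07 seconds in Table 1.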

Figure 4: Penn State Weibull distribution plot highlighting the percent of interarrival times between 300 and 600 seconds (5-10 min)

Figure 5: Empirical CDF for Penn State highlighting the 2.5, 5, and 97.5 percentile of interarrival goal data

2.5 Penn State Hockey Future Considerations

Knowing that Penn State's goal interarrival times can be described by the Weibull distribution can help describe the value of goals in the game. Additional research on the game situations in which Penn State scores could help coaches adjust their game strategy.

For example, a goal might almost ensure a victory at a certain point in the game, based on the probability that the opposing team will not score within the remaining time. In that case, Penn State can play a more defensive style to secure the win. More research and data would be needed to complete this study.

Finally, it would be beneficial to look at below-average college hockey teams to determine what percentage of their goals are scored within 5-10 minutes of each other. This range is crucial because goals spaced 5-10 minutes apart correspond to a pace of roughly 6 or more goals per 60 minutes. Such a pace would win most games; for comparison, Penn State averaged 4.10 goals per game during its Big Ten Championship season. Comparing other teams to the Penn State goal data would be an interesting future study to further validate the importance of scoring frequently.

2.6 Pittsburgh Penguins (2015-2016)

Penn State was studied in the previous sections; however, the NHL is a higher level of hockey, with the best hockey players in the world. The pace of play, skill, and style of play of many NHL teams are arguably different from those of college hockey teams. This section takes the same focused approach to the Pittsburgh Penguins' goal scoring.

There are two reasons to study the goal interarrival distribution of the 2015-2016 Pittsburgh Penguins. First, the Penguins were the NHL Champions in the 2015-2016 season. As the best team in the NHL, their style of play is worth studying to see what other teams could emulate. Second, and possibly more interesting, the Penguins fired their head coach on December 12, 2015 (Rosen 2017). Mike Johnston was fired 28 games into the season, and Mike Sullivan immediately took over as head coach beginning with game 29. The Penguins were off to a slow start a little more than a quarter of the way into the season and decided to make a change. Because of their championship season and their coaching change, the 2015-2016 Penguins are an interesting team to study.

2.7 Methodology: Pittsburgh Penguins

The Pittsburgh Penguins study of goal interarrival times was conducted using Excel and R. Data on the 241 goals scored in 82 regular-season games were collected from www.hockey-reference.com (2018). The data were sorted by game and imported into Excel, where basic statistics and histograms were created to describe the goal times. A deeper analysis of goals per minute of the game was also conducted. More specifically, Mike Johnston and Mike Sullivan were compared in Excel. The comparison used the goals from the first 28 games of the season and the goals from games 29 through 82.

In addition, the regular-season goal totals of the 2015-2016 NHL playoff teams were analyzed. Mike Johnston’s 82-game goal-scoring pace was compared to every playoff team’s total in order to predict whether Johnston’s style of play would have led the Penguins to the playoffs had he remained head coach.

The interarrival times between goals were calculated with formulas built manually in Excel. The interarrival times were then imported into R and Minitab. The same R package for fitting distributions used in the Penn State study was used to find the distribution that best fit the data (Delignette & Dutang 2015). After the data summaries and visualizations were complete, the data were analyzed in several ways, including a percentile analysis to determine where the majority of goal interarrival times fall.
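The interarrival times above were computed with hand-built Excel formulas that are not shown. A minimal Python sketch of the same calculation follows. It assumes a hypothetical goal log of (game number, seconds since the opening faceoff) pairs and counts only the gap between consecutive goals within the same game; whether the original formulas also measured the time from puck drop to each game’s first goal is not stated, so that choice is an assumption.

```python
from collections import defaultdict


def interarrival_times(goals):
    """Return gaps (seconds) between consecutive goals in the same game.

    goals: iterable of (game_id, elapsed_seconds) pairs for one team's goals.
    Assumption: the first goal of each game has no predecessor and is skipped.
    """
    by_game = defaultdict(list)
    for game_id, t in goals:
        by_game[game_id].append(t)
    gaps = []
    for game_id in sorted(by_game):
        times = sorted(by_game[game_id])
        # Pair each goal with the next one in the same game
        gaps.extend(later - earlier for earlier, later in zip(times, times[1:]))
    return gaps


# Hypothetical goal log: (game number, seconds since opening faceoff)
goals = [(1, 312), (1, 1505), (1, 2890), (2, 640), (3, 100), (3, 700)]
print(interarrival_times(goals))  # [1193, 1385, 600]
```

Dividing each gap by 60 gives the minute-based units used in Table 10.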

2.8 Pittsburgh Penguins Descriptive Statistics

The Pittsburgh Penguins’ season did not start the way a superstar-loaded team would have expected. In the first 28 games under Mike Johnston, the Penguins’ record was 15-10-3, meaning that they had 15 wins, 10 regulation losses, and 3 overtime or shootout losses (13 losses in total). The team was not performing up to expectations. At the time of Johnston’s firing, Penguins General Manager Jim Rutherford explained, “I felt it was time for a coaching change because our team has underachieved. Our expectations are much higher with this group of players” (Chiari 2015). What went into the firing is not public information, but the goal data collected show that the Penguins were not on pace to score enough goals to make the playoffs.

Under Mike Johnston, the Penguins scored 64 regulation goals in his 28 games as head coach. In fact, they did not score a first-period goal until the fifth game of the season. The summary statistics of goals scored under Johnston, along with his 82-game pace, are displayed in Table 3. Had Johnston remained head coach after game 28, the Penguins were on pace to score 187.4 goals over 82 games.

Table 3: Penguins’ regulation goals under Mike Johnston (28 games)

| Period | Goals (28 games) | Goals (82-game pace) |
|---|---|---|
| 1 | 16 | 46.9 |
| 2 | 28 | 82 |
| 3 | 20 | 58.6 |
| Total | 64 | 187.4 |

Once Mike Johnston was fired, Mike Sullivan took over as head coach, and the team went on to win 33 games and, eventually, the Stanley Cup (Rosen 2017). As Table 4 summarizes, the Penguins would have been on pace to score 261.2 goals had Mike Sullivan coached the entire 82-game season. This is a difference of nearly 74 goals between Johnston’s and Sullivan’s projected season totals. Table 5 records the Penguins’ actual season totals (data from www.hockey-reference.com). The Penguins scored 241 total goals, with 108 coming in the second period. This is notable because the second period is the period with the “long change” discussed in the Penn State section of this chapter. The “long-change” effect does not appear to have hurt the Penguins’ scoring in the second period.
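The 82-game paces in Tables 3 and 4 are a simple rate projection: goals per game played, multiplied by 82. The short Python sketch below reproduces that arithmetic; the function name is hypothetical, and the inputs are the regulation goal totals from the two tables.

```python
SEASON_GAMES = 82


def season_pace(goals, games_played, season_games=SEASON_GAMES):
    """Project a goal total to a full season at the same goals-per-game rate."""
    return goals / games_played * season_games


johnston = season_pace(64, 28)    # Table 3: regulation goals in 28 games
sullivan = season_pace(172, 54)   # Table 4: regulation goals in 54 games
print(round(johnston, 1), round(sullivan, 1), round(sullivan - johnston, 1))
# 187.4 261.2 73.8
```

The same function applied to a single period (for example, 16 first-period goals in 28 games) gives the per-period paces in the tables.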


Table 4: Penguins’ regulation goals under Mike Sullivan (54 games)

| Period | Goals (54 games) | Goals (82-game pace) |
|---|---|---|
| 1 | 40 | 60.7 |
| 2 | 80 | 121.5 |
| 3 | 52 | 78.96 |
| Total | 172 | 261.2 |

Table 5: Summary of goals scored per period during 2015-16 season

| Goals per period | 1 | 2 | 3 | OT | Total |
|---|---|---|---|---|---|
| Pittsburgh Penguins | 56 | 108 | 72 | 5 | 241 |
| Opponent | 67 | 65 | 63 | 4 | 199 |

As the histogram in Figure 6 shows, the Penguins’ scoring spikes in the second period. This contrasts with Penn State, whose scoring dropped in the second period during its successful 2016-2017 season, whereas the Penguins thrived in the second period. While many teams choose to play a little more conservatively in the second period, the Penguins appear to have leaned into their offensive game. This pattern is consistent with the style of play Sullivan implemented. As one sports journalist points out, “Mike Sullivan convinced players to embrace a simple philosophy: Don't get cute with the puck in the neutral zone” (West 2016). The scoring became more consistent and more evenly spread throughout games.

The histograms in Figure 7 and Figure 8 make the difference between Johnston and Sullivan apparent: under Sullivan, the Penguins began to use their speed and played more complete games, outshooting and outscoring their opponents from start to finish. The scoring rate under Sullivan is more uniform and consistent throughout the entirety of the game than under Johnston. To win games in the NHL, the data suggest that teams should play a balanced game for the full 60 minutes. This is exactly what Sullivan preached in order to find consistency in his star players’ goal production.

[Histogram: "PIT 15-16 Regular Season: Goals Scored per Minute in Game"; x-axis: minute in game (1-60); y-axis: frequency; red lines denote the first minute of each period]

Figure 6: Pittsburgh Penguins’ goals per minute in game for 2015-16 season

[Histogram: "PIT 15-16: Goals under Mike Johnston (28 Games)"; x-axis: minute in game (1-60); y-axis: frequency; red lines denote the first minute of each period]

Figure 7: Pittsburgh Penguins’ goals per minute in game under Coach Mike Johnston


[Histogram: "PIT 15-16: Goals under Mike Sullivan (54 Games)"; x-axis: minute in game (1-60); y-axis: frequency; red lines denote the first minute of each period]

Figure 8: Pittsburgh Penguins’ goals per minute in game under Coach Mike Sullivan

2.9 NHL Goal Regression Model

The scoring of every NHL team was analyzed to determine a team’s likelihood of earning a playoff spot. NHL teams are built to make the playoffs, because only playoff teams have a chance to win the Stanley Cup. As discussed in later chapters, certain players score and contribute more than others. Nonetheless, teams must score goals to win. But how many goals should a team score in order to win enough games to make the playoffs? Which players will contribute most to team scoring? This study seeks the number of goals needed to earn a playoff spot in the 2015-2016 NHL season. The Penguins scored 241 goals in the regular season; however, including their four shootout wins, they scored 245 goals, because each shootout win adds one goal to a team’s season goal total. Sixteen teams make the playoffs each year (8 from the Eastern Conference and 8 from the Western Conference). The 16 playoff teams, labeled by their team abbreviations, are shown in Table 6, along with each team’s season point total. Teams earn two points for a win, one point for an overtime or shootout loss, and zero points for a regulation loss. The average number of goals scored by a playoff team was 231.9. In the Eastern Conference (the Penguins’ conference), the 8 playoff teams averaged 232 goals, with the Detroit Red Wings (DET) scoring the fewest (211 goals). The Red Wings were also the last Eastern Conference playoff team, with 93 points, according to www.hockey-reference.com (2018). Returning to Mike Johnston’s projected season goal total (187.4 goals), the Penguins likely would not have scored enough goals to earn a playoff spot. The only team that would have scored fewer goals than the Penguins was the New Jersey Devils (NJD), with 184 goals. Table 7 shows that non-playoff teams averaged 211.6 goals, with the New Jersey Devils at the bottom of the NHL.

Table 6: Points, GF, and GA for every 2015-16 playoff team (goals include shootout goals). Colors differentiate divisions.

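| Team | Points | Goals For | Goals Against |
|---|---|---|---|
| FLA | 103 | 239 | 203 |
| TBL | 97 | 227 | 201 |
| DET | 93 | 211 | 224 |
| WSH | 120 | 252 | 193 |
| PIT | 104 | 245 | 203 |
| NYR | 101 | 236 | 217 |
| NYI | 100 | 232 | 216 |
| PHI | 96 | 214 | 218 |
| DAL | 109 | 267 | 230 |
| STL | 107 | 224 | 201 |
| CHI | 103 | 235 | 209 |
| NSH | 96 | 228 | 215 |
| MIN | 87 | 216 | 206 |
| ANA | 103 | 218 | 192 |
| LAK | 102 | 225 | 195 |
| SJS | 98 | 241 | 210 |
| Average | | 231.9 | 208.3 |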


Table 7: Points, GF, and GA for every 2015-16 non-playoff team (goals include shootout goals). Colors differentiate divisions.

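| Team | Points | Goals For | Goals Against |
|---|---|---|---|
| BOS | 93 | 240 | 230 |
| OTT | 85 | 236 | 247 |
| MTL | 82 | 221 | 236 |
| BUF | 81 | 201 | 222 |
| TOR | 69 | 198 | 246 |
| CAR | 86 | 198 | 226 |
| NJD | 84 | 184 | 208 |
| CBJ | 76 | 219 | 252 |
| COL | 82 | 216 | 240 |
| WPG | 78 | 215 | 239 |
| ARI | 78 | 209 | 245 |
| CGY | 77 | 231 | 260 |
| VAN | 75 | 191 | 243 |
| EDM | 70 | 203 | 245 |
| Average | | 211.6 | 238.5 |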

Using the goal and point data for all 30 NHL teams in the 2015-2016 season, a linear regression model was created in Minitab to predict the Penguins’ chances of making the playoffs under Johnston and Sullivan. According to the model shown in Table 8, both Goals For (GF) and Goals Against (GA) are statistically significant predictors of season points. In addition, the high R-squared value indicates that 90.64% of the variability in the model’s output (season points) is explained by the two input parameters (GF and GA), so the model appears to predict total points fairly well. For the Penguins’ 241 GF and 199 GA, the model’s 95% confidence interval runs from 105.6 to 110.7 points. The Penguins actually finished with 104 points, slightly below that interval, so the model is a useful but imperfect tool for predicting points. Notably, GA affect team points more than GF: according to the coefficients in Table 8, each goal allowed costs about 0.43 points, whereas each goal scored adds about 0.37 points. Teams can therefore build their rosters around players who both score goals and prevent goals against.
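The Table 8 model is an ordinary least-squares regression of season points on GF and GA. The Python sketch below refits that form with NumPy on the 30 rows transcribed from Tables 6 and 7. It assumes those rows (goals including shootout goals) are the same inputs given to Minitab, which the section implies but does not list explicitly, so its printed coefficients are a check to compare against Table 8 rather than a reproduction of the original output.

```python
import numpy as np

# (points, goals for, goals against), transcribed from Tables 6 and 7
TEAMS = {
    "FLA": (103, 239, 203), "TBL": (97, 227, 201), "DET": (93, 211, 224),
    "WSH": (120, 252, 193), "PIT": (104, 245, 203), "NYR": (101, 236, 217),
    "NYI": (100, 232, 216), "PHI": (96, 214, 218), "DAL": (109, 267, 230),
    "STL": (107, 224, 201), "CHI": (103, 235, 209), "NSH": (96, 228, 215),
    "MIN": (87, 216, 206), "ANA": (103, 218, 192), "LAK": (102, 225, 195),
    "SJS": (98, 241, 210), "BOS": (93, 240, 230), "OTT": (85, 236, 247),
    "MTL": (82, 221, 236), "BUF": (81, 201, 222), "TOR": (69, 198, 246),
    "CAR": (86, 198, 226), "NJD": (84, 184, 208), "CBJ": (76, 219, 252),
    "COL": (82, 216, 240), "WPG": (78, 215, 239), "ARI": (78, 209, 245),
    "CGY": (77, 231, 260), "VAN": (75, 191, 243), "EDM": (70, 203, 245),
}


def fit_points_model(teams):
    """OLS fit of points = b0 + b1*GF + b2*GA; returns (coefficients, R^2)."""
    data = np.array(list(teams.values()), dtype=float)
    points, gf, ga = data[:, 0], data[:, 1], data[:, 2]
    X = np.column_stack([np.ones_like(gf), gf, ga])  # intercept + two predictors
    coef, *_ = np.linalg.lstsq(X, points, rcond=None)
    residuals = points - X @ coef
    r2 = 1 - residuals @ residuals / np.sum((points - points.mean()) ** 2)
    return coef, r2


coef, r2 = fit_points_model(TEAMS)
print(f"Points = {coef[0]:.1f} {coef[1]:+.4f} GF {coef[2]:+.4f} GA, R-sq = {100 * r2:.2f}%")
```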

Table 8: Linear regression model for points in 2015-16 NHL season based on GF and GA

| Regression equation | Points = 106.2 + 0.3669 GF - 0.4347 GA |
|---|---|
| R-sq value | 90.64% |

Using this model and the projected GF and GA under Mike Johnston for a full 2015-2016 season (187.4 GF, 196.2 GA), the Penguins would have finished with 85.4 to 94.1 points with 95% confidence. The Detroit Red Wings were the last Eastern Conference team to make the playoffs, with 93 points, suggesting that the Penguins would have been on the verge of missing the playoffs under Mike Johnston, given the assumptions made. Thus, replacing Johnston with Sullivan was a shrewd move.

Likewise, if Sullivan had coached the entire season and the Penguins had scored at the same pace, they would have had 261.2 GF and 206.51 GA. Using this linear regression model, the Penguins would have finished with an estimated 112 points, about 8 points higher than their actual season total. The Penguins still would have been a playoff team, but with a higher seed. Further studies can be done to validate the GF and GA regression model for overall points.
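The point estimates in this section come from plugging GF and GA into the Table 8 equation. The Python sketch below does only that, using the coefficients from Table 8 and a hypothetical function name; it reproduces the centers of the quoted ranges, not the 95% intervals themselves, which would also require the model’s standard errors and are not recomputed here.

```python
def predicted_points(gf, ga, b0=106.2, b_gf=0.3669, b_ga=0.4347):
    """Season points from the Table 8 regression: b0 + b_gf*GF - b_ga*GA."""
    return b0 + b_gf * gf - b_ga * ga


scenarios = {
    "Actual season": (241, 199),
    "Johnston pace": (187.4, 196.2),
    "Sullivan pace": (261.2, 206.51),
}
for name, (gf, ga) in scenarios.items():
    print(f"{name}: {predicted_points(gf, ga):.1f} points")
# Actual season: 108.1 points
# Johnston pace: 89.7 points
# Sullivan pace: 112.3 points
```

Each estimate sits near the middle of the corresponding range quoted in the text (105.6 to 110.7 and 85.4 to 94.1 points), and the Sullivan figure rounds to the 112 points cited above.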

More specifically, what effect did shots on goal have on winning and losing for the Penguins? Did the Penguins win when outshooting teams under each coach? In hockey, the only way to score is to shoot the puck, so one would naturally assume that more shots lead to more goals. But is this actually true for the Penguins under Johnston and Sullivan?

As Table 9 shows, the Penguins outshot their opponents in 70.83% of the games they won. They put an average of 33.67 shots on net in games they won, with a shot differential of +4.35 per game. In games they lost, they outshot their opponents 70.59% of the time and put an average of 32.53 shots on net, with a shot differential of +2.26 per game. In other words, they put slightly fewer pucks on goal in the games they lost. From a strategy standpoint, the summary statistics suggest that teams should aim to put as many pucks on goal as possible in order to win.

Table 9: Shot summary of Penguins’ wins and losses broken down by coach

| | Johnston (28 games) | Sullivan (54 games) | Total season |
|---|---|---|---|
| Wins: outshoot | 53.3% | 78.8% | 70.8% |
| Wins: average shots | 30.67 | 35.03 | 33.67 |
| Wins: shot differential | -0.467 | +6.55 | +4.35 |
| Losses: outshoot | 53.80% | 81.00% | 70.59% |
| Losses: average shots | 30.62 | 33.71 | 32.53 |
| Losses: shot differential | -1.85 | +4.81 | +2.26 |
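The three summaries in Table 9 (percentage of games in which the Penguins outshot the opponent, average shots, and average shot differential) can be computed from per-game shot totals. The game-by-game shots are not listed in this chapter, so the Python sketch below uses four hypothetical wins; the function name is hypothetical, and counting a tie in shots as not outshooting the opponent is an assumption.

```python
from statistics import mean


def shot_summary(games):
    """games: list of (team_shots, opponent_shots) pairs for one subset of games.

    Returns (% of games outshooting the opponent, average shots, average differential).
    Ties in shots are counted as not outshooting (assumption).
    """
    outshot = sum(1 for us, them in games if us > them)
    return (
        100 * outshot / len(games),
        mean(us for us, _ in games),
        mean(us - them for us, them in games),
    )


# Hypothetical shot totals for four wins: (team shots, opponent shots)
wins = [(35, 28), (29, 31), (40, 30), (33, 33)]
pct, avg_shots, avg_diff = shot_summary(wins)
print(f"Outshoot {pct:.1f}%, average shots {avg_shots:.2f}, differential {avg_diff:+.2f}")
# Outshoot 50.0%, average shots 34.25, differential +3.75
```

Because the season column pools games, it is a weighted combination of the two coaching stints rather than a simple average: the Table 9 percentages imply that the Penguins outshot opponents in 8 of 15 wins under Johnston and 26 of 33 under Sullivan, or 34 of 48 (70.8%) overall.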

How did the Penguins perform differently under Johnston versus Sullivan? Under Johnston, the Penguins outshot their opponents in only 53.3% of the games they won and 53.8% of the games they lost. In both wins and losses, they had negative average shot differentials (-0.467 and -1.85, respectively). Under Sullivan, however, the Penguins were a different team. In games they won under Sullivan, they outshot their opponents 78.8% of the time, with an average shot differential of +6.55. In losses, they outshot their opponents in 81% of the games and still had an average shot differential of +4.81. Even in games they lost, they put more pucks on goal than their opponents. As the data show, the Penguins consistently played a style built on putting pucks on net in order to generate as many goals as possible. Mike Sullivan introduced this style of play, which turned the Penguins’ season around.

As stated previously, the Penguins changed their system to a faster-paced, simpler game in which their skilled players could thrive (West 2016). Instead of trying to make the perfect play in the neutral zone, the Penguins began to run a system in which pucks were chipped into the opponents’ end to avoid costly neutral-zone turnovers. This style of play helped the Penguins obtain a positive shot differential, which led to more goal scoring. As shown in Table 9, the key difference between Sullivan and Johnston is that Sullivan’s team consistently outshot its opponents and controlled the pace of play. Because time of possession is difficult to measure in hockey given how fluid the game is, shots on goal are one way to measure a team’s success at controlling the puck relative to its opponent. Mike Sullivan’s style of play, which allows skilled players to use their speed and skill to beat defenders, is one that teams around the NHL could emulate to generate more scoring opportunities and thus more goals.

2.10 Pittsburgh Penguins’ Goal Interarrival Times

Earlier in this chapter, Penn State’s goal interarrival times were analyzed. In this section, the interarrival times of the Pittsburgh Penguins’ goals are analyzed. The interarrival time summary statistics for the 241 goals scored are displayed in Table 10. Penn State’s mean interarrival time between consecutive goals is 11.10 minutes, whereas the Penguins’ mean is 10.15 minutes. The standard deviations are broadly similar, with the Penguins at 7.62 minutes and Penn State at 9.24 minutes.

Table 10: Basic statistics on Pittsburgh Penguins’ goal interarrival times

| Statistic | Minutes | Seconds |
|---|---|---|
| Mean | 10.15 | 609.00 |
| Variance | 58.13 | 209275.16 |
| Standard deviation | 7.62 | 457.47 |

According to the goodness-of-fit tests in Minitab, the Weibull distribution fit the interarrival time data best. Because of the data’s variability, most other distributions, including the exponential and gamma, fit considerably worse than the Weibull distribution. The statistics are shown in Table 11. The Anderson-Darling (A-D) statistic was 1.246; a low value is desirable because it indicates a small distance between the empirical distribution of the data and the fitted distribution. The Weibull had the lowest A-D value of all the distributions tested. The p-value was low (<0.010), which means that a formal test rejects the Weibull at the 1% significance level; nevertheless, the Weibull distribution is used going forward as the best available approximation because it fit the data better than the alternatives (Figure 9). The fitted shape and scale also show that the data are right-skewed and widely spread out. This makes sense in a sport where goals are highly random.

Table 11: Weibull distribution fitting results for Penguins’ goal interarrival times

| A-D value | P-value | Shape | Scale (seconds) |
|---|---|---|---|
| 1.246 | <0.010 | 1.27602 | 654.42688 |
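As a rough consistency check on Table 11, the mean of a Weibull distribution with shape k and scale λ is λΓ(1 + 1/k). The Python sketch below evaluates that formula with the fitted parameters, assuming the scale is in seconds; the result, roughly 607 seconds, sits close to the 609-second sample mean in Table 10.

```python
import math

SHAPE, SCALE = 1.27602, 654.42688  # Table 11; scale assumed to be in seconds


def weibull_mean(shape, scale):
    """Mean of a Weibull(shape, scale) distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1 + 1 / shape)


fitted_mean = weibull_mean(SHAPE, SCALE)
print(f"Fitted mean: {fitted_mean:.1f} s vs. sample mean 609.0 s")
```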

Figure 9: Distribution fitting for Pittsburgh Penguins’ interarrival goal data (R Output)

Once the Weibull distribution was selected as the best fit, further analysis was completed to see how the Penguins performed over the season. Generally, a team that scores every 5 to 10 minutes is on pace to score at least six goals per game. Six goals per game is not a realistic target; in fact, it would be a rare occurrence, as the Penguins averaged 2.93 goals per game during the 2015-2016 season. However, as seen in Figure 10, 28.24% of the interarrival times fall between 300 and 600 seconds (5-10 minutes). In other words, if the Penguins continued playing at their 2015-2016 pace, over a quarter of their goal interarrival times would fall between 5 and 10 minutes. By comparison, 25.83% of Penn State’s interarrival times fall between 5 and 10 minutes. In a game that relies heavily on emotion and natural momentum swings, it is imperative that teams score frequently and thus reduce the interarrival times of goals. Given that the Penguins won the Stanley Cup in 2016, it is plausible that they were one of the better teams at reducing goal interarrival times, although this assumption should be backed up with future work.
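The 28.24% figure is the probability mass that the fitted Weibull places between 300 and 600 seconds, F(600) - F(300), where F(t) = 1 - exp(-(t/λ)^k). The Python sketch below recomputes it from the Table 11 parameters, assuming the scale λ is in seconds; the helper name is hypothetical.

```python
import math

SHAPE, SCALE = 1.27602, 654.42688  # Table 11; scale assumed to be in seconds


def weibull_cdf(t, shape=SHAPE, scale=SCALE):
    """P(interarrival time <= t) under the fitted Weibull distribution."""
    return 1 - math.exp(-((t / scale) ** shape))


p_5_to_10 = weibull_cdf(600) - weibull_cdf(300)  # 5 to 10 minutes
print(f"{100 * p_5_to_10:.2f}%")  # 28.24%
```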

Figure 10: Pittsburgh Penguins’ Weibull distribution plot highlighting the percent of goal interarrival times between 5-10 minutes

Moreover, according to the fitted Weibull distribution displayed in Figure 11, 25% of the Penguins’ goal interarrival times fall below 246.5 seconds (4.11 minutes). This means that 75% of the Penguins’ goal interarrival times are longer than 4.11 minutes. Realistically, this makes sense in a random sport in which the timing of goals is highly variable.

Figure 12 highlights other percentiles of the interarrival data in more detail. 95% of the goal interarrival times fall between 37 seconds and 1820 seconds. This large range is explained mostly by the variability in the season data and the randomness of the sport. In future work, it would be interesting to compare this range with other teams to determine how elite NHL teams differ from average NHL teams. Penn State, on the other hand, had 95% of its interarrival times fall between 32 and 2113 seconds. Both teams have similar ranges, with Penn State’s range being slightly larger. Using this research as a model, more in-depth analysis can be completed in the future.
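Percentiles of the fitted Weibull come from inverting its CDF: t_q = λ(-ln(1 - q))^(1/k). Evaluated with the Table 11 parameters (scale assumed to be in seconds), the Python sketch below gives about 246.5 seconds for the 25th percentile and about 37 and 1820 seconds for the 2.5th and 97.5th percentiles, the same 95% range quoted above. The helper name is hypothetical.

```python
import math

SHAPE, SCALE = 1.27602, 654.42688  # Table 11; scale assumed to be in seconds


def weibull_quantile(q, shape=SHAPE, scale=SCALE):
    """Time t such that a fraction q of interarrival times fall below t."""
    return scale * (-math.log(1 - q)) ** (1 / shape)


for q in (0.025, 0.25, 0.975):
    t = weibull_quantile(q)
    print(f"{100 * q:g}th percentile: {t:.1f} s ({t / 60:.2f} min)")
```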

Figure 11: Pittsburgh Penguins’ Weibull distribution plot highlighting the 25th percentile of goal interarrival times.

Figure 12: Empirical CDF for Penguins highlighting the 2.5, 5, and 97.5 percentiles of interarrival goal data

2.11 Pittsburgh Penguins Future Considerations

Teams can use interarrival time data to see how they are performing. If teams find that they are not scoring frequently, they can adjust their strategies to take advantage of crucial momentum shifts during the game. In addition, by knowing a particular team’s interarrival time distribution, winning probabilities can be calculated more accurately than with the approach Thomas describes in his research. Thomas looked at aggregated winning probabilities for all NHL teams, but with a more focused analysis of each team’s interarrival time data, predicted winning probabilities can become more accurate. In future work, in-game winning probabilities can be investigated to determine a team’s chances of winning based on the score and the time remaining in the game.


2.12 Conclusion

Goals are needed to win hockey games, and goal interarrival times are crucial for winning games as well. The more frequently a team scores, the shorter its interarrival times and the better its chance of outscoring its opponent. In this chapter, the 160 goals scored by the 2016-2017 Penn State hockey team were analyzed, along with their interarrival times. The purpose was to find any consistencies in team strategies. As discussed, Penn State saw a drop in second-period goals, which can be attributed to a more conservative style of play in the second period. This output is useful for coaches who want to see at which points of the game scoring is deficient so they can adjust their strategy to improve scoring.

The same data were analyzed for the 2016 Stanley Cup Champion Pittsburgh Penguins. The Penguins’ 241 goals were studied under Mike Johnston and Mike Sullivan. While not much could be concluded from the histogram of goals scored per minute of game time, unlike with Penn State, the goal paces under Mike Johnston and Mike Sullivan suggested that the Penguins would likely have missed the playoffs under Johnston. The Penguins saw a spike in shot differentials and played at a faster pace suited to the players’ skill sets. Likewise, the linear regression model indicated that GA carry slightly more weight than GF in predicting a team’s final point total at the end of the season. Under Mike Sullivan, the Penguins had positive average shot differentials in both losses and wins. On the other hand, during the 28 games under Mike Johnston, the struggling Penguins had negative average shot differentials in both wins and losses.

Shots are essential to winning and being successful, and the data emphasize this point. This study recommends that teams get as many pucks to the net as possible over the course of a game and season. To do this, coaches should cater to their players’ skill sets, as Mike Sullivan did when he took over as head coach. This study supports the effectiveness of Mike Sullivan as head coach, as he was able to take difficult decisions away from his skilled players and let them generate as many scoring chances as possible (West 2016).

The interarrival times of the Penguins’ goals were fitted to a Weibull distribution, from which broader conclusions were drawn. Penn State, judging from its fitted interarrival distribution, would see roughly a quarter of its goal interarrival times fall between 5 and 10 minutes, which is a good benchmark to achieve. Teams aim to score as frequently as possible to outscore their opponents. If teams can consistently score every 5-10 minutes, they will almost always be in a position to win games. In a sporadic sport like hockey, it is the job of coaches and players to harness in-game momentum shifts in order to score frequently. Analytics is just another tool to help coaches and players adjust their playing style to win games and ultimately make the playoffs. Future analytical work with these data can produce models for predicting the likelihood of success at certain points in games. Even so, the data analyzed give reasonable insight into the randomness of the sport and the probability of winning games based on team goals.


Chapter 3: Evaluating Player Contribution to Team Success

The methods and results for quantifying a player’s worth to his team are described in the following sections.

3.1 Introduction

The previous chapter focused on team statistics and goals scored during the season. Its primary purpose was to determine whether certain playing styles or in-game team strategies could affect a team’s success. This chapter, however, focuses on individual players. Every National Hockey League (NHL) player from four NHL regular seasons was analyzed and clustered by style of play.

Inspiration for this chapter was drawn from the premise of finding undervalued players in Major League Baseball (MLB). Billy Beane, the GM of the Oakland Athletics during the early 2000s, sought undervalued professional baseball players who could help the team compete with the best, most financially stable teams in the league. Beane’s strategy was popularized in Michael Lewis’ book Moneyball. Kevin Grier and Tyler Cowen write that “The Moneyball thesis is simple: Using statistical analysis, small-market teams can compete by buying assets that are undervalued by other teams and selling ones that are overvalued by other teams” (2011). Beane used players’ on-base percentage (OBP) as a measure to find inexpensive players who could get on base and subsequently score runs. He found that home-run hitters are financially overvalued by the league and that players who can get on base by any method are cheap alternatives for building a winning team (Cowen & Grier 2011).

The NHL currently has 31 teams, and every GM in the league aims to make his team the most competitive in the league every year. He does this by assembling a team of professional players who can contribute the most to team success. While teams want the best players in the league, a salary cap prevents them from assembling a roster full of elite players with large contracts. The current NHL salary cap is $75 million (Clipperton 2017). According to Timothy Chan, who researched player contribution to team performance, “GMs must build well-balanced, high-performing teams by leveraging any competitive advantage they can find” (Chan et al., 2012). By using the statistics collected on every NHL player, teams can predict future play and estimate what each player effectively contributes to the team. While hockey analytics is a growing field and is still not the sole source GMs rely on, it can certainly help teams identify the best players for their rosters.

In this chapter, player types are defined and their contribution to team success is investigated. Four NHL seasons, beginning with the 2013-2014 NHL season and ending with the 2016-2017 NHL season, are used to cluster players into specific player types. Forwards and defensemen are clustered into four player types each and then further analyzed for their effect on team success.

Similar to Billy Beane’s approach in MLB, this analysis uses the Corsi For Percentage (CF%) to identify potentially undervalued players in the NHL. CF% is a statistic that measures a player’s puck possession skill (Masisak 2015). Because of the fluidity of the sport, CF% is an effective way to identify players who can control the pace of play. The statistic compares the shot attempts (shots on goal, missed shots, and blocked shots) that a player’s team produces with the shot attempts that the opposing team produces while he is on the ice. In other words, if his team generates more shot attempts than the opponent while he is on the ice, it can be assumed that his team is possessing the puck better than the other team. Over the course of a season, players with a CF% above 50% are viewed as players who positively control the game for their team. These players may not have the most points on their team and may be undervalued for the impact they have in generating quality scoring chances.
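The CF% definition above reduces to a ratio of on-ice shot attempts. The Python sketch below computes it for a hypothetical player; the function names and season totals are illustrative assumptions, and the attempt count follows the Corsi definition quoted later in this chapter (shots, missed shots, and blocked shots).

```python
def shot_attempts(shots_on_goal, missed_shots, blocked_shots):
    """Corsi counts every shot attempt: on goal, missed, or blocked."""
    return shots_on_goal + missed_shots + blocked_shots


def corsi_for_pct(cf, ca):
    """Share (in percent) of on-ice shot attempts taken by the player's team."""
    return 100 * cf / (cf + ca)


# Hypothetical on-ice season totals for one player
cf = shot_attempts(600, 250, 250)  # attempts by his team while he is on the ice
ca = shot_attempts(500, 200, 200)  # attempts by the opponent while he is on the ice
print(corsi_for_pct(cf, ca))  # 55.0, i.e. above the 50% break-even mark
```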

3.2 Literature Review

How can NHL teams assemble the best group of players without going over the league salary cap? NHL teams dress 20 players each game: 12 forwards, 6 defensemen, and 2 goalies (Chan et al., 2012). NHL GMs have to put together a team that can compete for the Stanley Cup without spending more than $75 million. While this may seem like a simple task, assembling a competitive team is very challenging. Timothy Chan states that “In hockey, scientifically analyzing individual player performance is a challenging task because the game is so fluid” (2012).

In his research, Chan clusters forwards and defensemen into four clusters each and goalies into three clusters. Forwards are classified as Top Line, Second Line, Defensive, or Physical. Defensemen are classified as Offensive, Defensive, Average, or Physical. Finally, goalies are classified as Elite, Average, or Bottom. Chan looks at four seasons’ worth of player statistics and creates clusters for each individual season. For forwards, Chan uses goals (G), assists (A), plus/minus (+/-), hits, blocked shots (Blks), and penalty minutes (PIM) as the key statistics to cluster the players. He uses the same statistics for defensemen but combines goals and assists into total points (Pts), since defensemen score less than forwards. Goalies are clustered by save percentage (SV%), goals-against average (GAA), wins (W), and the number of games without allowing a goal (shutouts, or SO).

He uses only single-season statistics to create the cluster groups because players can change teams, and their roles on a team can change from year to year. For this reason, the models created in this chapter follow the same approach. Once the cluster groups are created for each position, the objective of his research is to determine the “impact of different player types on the overall performance of a team” (Chan et al., 2012). Chan looks at every team in the 2008-2009 season and determines how many players from each player cluster each team has. From there, he determines each player type’s contribution to the team. The independent variables are the player types, and the dependent variable in the model is the number of points the team finished with at the end of the season.

At the time of Chan’s research in 2012, advanced hockey statistics were not as common as they are today. According to NHL.com, Corsi and Fenwick, two very common advanced statistics, were not publicly available on the site until 2011. Chan’s model did not include Corsi For and Fenwick percentages, which distinguishes the model in this research from prior research. Brian Macdonald writes in his research analyzing player performance using regression models, “Fenwick rating (shots plus missed shots) and Corsi rating (shots, missed shots, blocked shots) have been used to analyze players and teams because they have been shown to be better than goals as a predictor of future goals” (2012). Corsi and Fenwick are advanced statistics used to track puck possession, which is arduous to measure directly in a sport as fluid and spontaneous as hockey. These statistics therefore attempt to quantify puck possession by comparing a team’s shot attempts to its opponent’s. The theory is that if a player is on the ice for more shot attempts for than against, he is positively contributing to his team’s puck possession.

Furthermore, Macdonald asks in his expected goals research, “Can this predictive performance be improved further if we include additional statistics like hits, faceoffs, etc., as predictor variables, along with some combination of goals, shots, missed shots and blocked shots?” (2012). Chan’s 2012 research does not incorporate CF% to cluster players; thus, the goal of this research is to improve on the existing clustering and regression model by using an advanced puck possession metric.

3.3 Methodology

This section describes the data source for this chapter, along with the methodologies for creating the player clusters, the linear regression model, and the bi-criteria optimization model.

3.3.1 Data

The individual player statistics for this chapter were gathered from multiple sources and cross-referenced for accuracy. Data for every player in the 2013-2014, 2014-2015, 2015-2016, and 2016-2017 NHL seasons were gathered from https://frozenpool.dobbersports.com and cross-referenced on https://www.hockey-reference.com (2018). Only statistics from each 82-game regular season were collected. In each season, 850 players were recorded. Only players who played at least 10 games in a season were kept for clustering, because players who played fewer than 10 games were judged to have little effect on their team’s final season performance. The 10-game threshold left 747, 729, 732, and 730 players in the 2013-2014, 2014-2015, 2015-2016, and 2016-2017 NHL seasons, respectively.
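The 10-game threshold is a simple filter over player-season records. The sketch below assumes a hypothetical `PlayerSeason` record with only the fields the filter needs; the actual frozenpool and hockey-reference exports use their own column names.

```python
from dataclasses import dataclass

MIN_GAMES = 10  # players with fewer games are excluded from clustering


@dataclass
class PlayerSeason:
    # Hypothetical record; field names are illustrative only.
    name: str
    season: str
    games_played: int


def keep_for_clustering(players):
    """Return the player-seasons with at least MIN_GAMES games played."""
    return [p for p in players if p.games_played >= MIN_GAMES]


sample = [
    PlayerSeason("Player A", "2016-2017", 82),
    PlayerSeason("Player B", "2016-2017", 9),
    PlayerSeason("Player C", "2016-2017", 10),
]
kept = keep_for_clustering(sample)
print([p.name for p in kept])  # ['Player A', 'Player C']
```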

Finally, the salary cap data were collected from https://frozenpool.dobbersports.com and cross-referenced on https://www.capfriendly.com (2018). The cap hit for each player was used. The cap hit is the total value of a player’s contract divided by the length of the contract in years, and it measures the player’s annual salary cap charge to his team (Chan et al., 2012). The current salary cap is $75 million, according to https://www.capfriendly.com.
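As a minimal sketch, the cap hit definition above (total contract value divided by contract length in years) can be computed as follows; the contract in the example is hypothetical.

```python
def cap_hit(total_contract_value, contract_years):
    """Annual cap hit: total contract value divided by contract length in years."""
    if contract_years <= 0:
        raise ValueError("contract_years must be positive")
    return total_contract_value / contract_years


# Hypothetical contract: $30 million over 6 years
print(cap_hit(30_000_000, 6))  # 5000000.0
```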

3.3.2 Clustering Model

The clustering in this research considers only NHL forwards and defensemen, unlike Chan’s research, which also considers goalies. Goalies were excluded because Chan’s model already provides a thorough analysis of goalie performance that would not be affected by introducing CF%.

Forwards and defensemen were each clustered into 4 distinct groups to provide greater granularity on their specific playing style. The statistics used to cluster forwards were goals, assists, plus-minus (+/-), hits, season penalty minutes (PIM), and Corsi For Percentage (CF%).

Goals and assists are very common statistics used to locate the elite point producers in the NHL.

The +/- statistic is also common and indicates whether a player is on the ice for more goals scored by his team than goals scored against it. Each time his team scores while he is on the ice, he receives a plus one; each time the opponent scores while he is on the ice, he receives a minus one (power-play goals are not counted toward +/-). Hits is the number of body checks that a player delivers to opposing players during the season. PIM (penalty minutes) is the number of minutes a player is assessed in penalties. Minor penalties, such as tripping or slashing, are two minutes each. Major penalties, which are usually given for fighting, are five minutes. When a player receives a penalty, he cannot play for the duration of the penalty.
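The +/- bookkeeping can be sketched as a tally over goal events. The `Goal` record and its fields are hypothetical, and the sketch assumes the standard NHL rule that power-play goals are not counted.

```python
from typing import NamedTuple


class Goal(NamedTuple):
    # Hypothetical event record, viewed from one player's perspective
    for_players_team: bool  # True if the player's team scored
    power_play: bool        # True if the scoring team was on a power play
    player_on_ice: bool     # True if the player was on the ice


def plus_minus(goals):
    """+1 for each counted goal for, -1 for each counted goal against."""
    total = 0
    for goal in goals:
        if not goal.player_on_ice or goal.power_play:
            continue  # off-ice goals and power-play goals are not counted
        total += 1 if goal.for_players_team else -1
    return total


season = [
    Goal(True, False, True),    # even-strength goal for: +1
    Goal(False, False, True),   # even-strength goal against: -1
    Goal(True, True, True),     # power-play goal for: not counted
    Goal(True, False, True),    # even-strength goal for: +1
    Goal(False, False, False),  # goal against while off the ice: not counted
]
print(plus_minus(season))  # 1
```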

Finally, CF%, as stated previously, measures puck possession. CF% counts all shot attempts taken while a player is on the ice, whether or not he took the shot himself. The formula for Corsi For (CF) is:

$$CF = \text{shots on goal} + \text{shots blocked by the opposing team} + \text{shots that miss the net}$$

To find CF%, both CF and Corsi Against (CA) are considered. CA uses the same formula as CF but counts the shot attempts that the opposing team takes while the player is on the ice. The CF% formula is shown below:

$$CF\% = \frac{CF}{CF + CA}$$

As noted by https://www.hockey-reference.com, a percentage above 50% means that a player’s team is in control of the puck more frequently than the opposing team when he is on the ice.
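A short sketch of the CF and CF% formulas above, using hypothetical on-ice shot-attempt counts for a single player. CF% is multiplied by 100 so it is expressed as a percentage, matching the tables in this chapter.

```python
def corsi(shots_on_goal, blocked, missed):
    """All shot attempts: on goal + blocked by the other team + missed the net."""
    return shots_on_goal + blocked + missed


def cf_percent(cf, ca):
    """CF% = CF / (CF + CA), scaled by 100 to read as a percentage."""
    return 100.0 * cf / (cf + ca)


# Hypothetical on-ice attempts while one player was on the ice
cf = corsi(shots_on_goal=300, blocked=120, missed=130)  # his team's attempts: 550
ca = corsi(shots_on_goal=260, blocked=110, missed=130)  # opponents' attempts: 500
print(round(cf_percent(cf, ca), 2))  # 52.38
```

A value above 50 means the player’s team out-attempted its opponents while he was on the ice.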

Defensemen were clustered based on total individual points, +/-, hits, PIM, and CF%. Points are the sum of goals and assists. Points were used instead of goals and assists individually because defensemen score less than forwards, so combining the two gives a fuller picture of a defenseman’s offensive production.

For both forwards and defensemen, these statistics were normalized and standardized, similar to Chan’s approach in his NHL player contribution model. Every player’s statistics were normalized by games played; for example, goals were normalized by dividing a player’s season goals by the number of games he played. Next, the statistics were standardized in Minitab so that no statistic carried more weight than another, since no statistic was deemed more important for clustering purposes.
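The normalization and standardization steps can be sketched as follows. The exact standardization Minitab applied is not specified in the text; this sketch assumes the usual z-score (subtract the mean, divide by the sample standard deviation), and the goal and games-played values are hypothetical.

```python
from statistics import mean, stdev


def per_game(totals, games_played):
    """Normalize season totals by games played."""
    return [t / gp for t, gp in zip(totals, games_played)]


def standardize(values):
    """z-scores: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical season goals and games played for three players
goals = [40, 10, 5]
games_played = [80, 50, 25]
goals_per_game = per_game(goals, games_played)  # [0.5, 0.2, 0.2]
z_scores = standardize(goals_per_game)
print([round(z, 3) for z in z_scores])  # [1.155, -0.577, -0.577]
```

In the research, each statistic (goals/GP, assists/GP, and so on) would be standardized separately across all players, so every column ends up on a comparable scale.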

Minitab was used to cluster the players. More specifically, k-means clustering was used to divide forwards and defensemen into four groups each. Every team has four forward lines (12 forwards, three per line) and three defensive pairings (6 defensemen, two per pairing). For forwards, it therefore makes logical hockey sense to use four groups. For defensemen, there are only three pairings, but a fourth cluster was judged useful to gain more granularity on defensemen player types (Chan et al., 2012).
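The k-means step can be illustrated with a small, dependency-free implementation of Lloyd’s algorithm. This is not Minitab’s implementation: the farthest-first seeding, the iteration cap, and the toy two-dimensional data are assumptions chosen to keep the example deterministic. In the research, each point would instead be a player’s vector of standardized per-game statistics.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Deterministic seeding: start from the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = updated
    return labels, centroids


# Toy data: four well-separated groups of three points each
points = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
    (10.0, 0.0), (10.5, 0.0), (10.0, 0.5),
    (0.0, 10.0), (0.5, 10.0), (0.0, 10.5),
    (10.0, 10.0), (10.5, 10.0), (10.0, 10.5),
]
labels, centroids = kmeans(points, k=4)
print(labels)
```

Because k-means only finds a local optimum, results can depend on the starting centroids; the cluster numbering itself (1 to 4) is arbitrary, which is why the clusters in this chapter are named by inspecting their average statistics.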

The same method was used for every season, and the resulting clusters were named for the player type each represents. The forward clusters were named top player, 2nd line player, grinder, and average player. Top players generally have the most goals and assists. Second-line players have statistics similar to top players and occasionally have better seasons than them. Grinders most often lead their teams in hits and penalty minutes. Average forwards are players that teams typically place on their third or fourth line because they do not produce as many points as top line or 2nd line forwards.

For defensemen, the player types were offensive, defensive, 3rd pair, and physical defensemen. Offensive defensemen typically have the most points, whereas defensive defensemen have the most positive +/- numbers. Third-pair defensemen typically do not lead in any statistic, as they are average defensemen. Finally, physical defensemen lead in the hits and PIM categories. Once the clusters were created, the average summary statistics for each cluster were reported along with the cluster’s average cap hit. Table 12 summarizes the normalized clustering parameters used for forwards and defensemen.

Table 12: Normalized clustering statistics (GP = Games Played)

| Forwards | Defensemen |
|---|---|
| Goals/GP | Pts/GP |
| Assists/GP | +/- / GP |
| +/- / GP | Hits/GP |
| Hits/GP | PIM/GP |
| PIM/GP | CF% (Corsi For Percentage) |
| CF% (Corsi For Percentage) | |
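Once every player has a cluster label, the per-cluster averages reported in the results tables (for example, average goals and average cap hit) are simple group means. The field names and values below are hypothetical; note that the results tables report season totals rather than the standardized per-game values used for clustering.

```python
from collections import defaultdict


def cluster_summary(rows, label_key="cluster"):
    """Average every numeric field within each cluster and count its members."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    summary = {}
    for label, members in groups.items():
        fields = [key for key in members[0] if key != label_key]
        averages = {f: sum(m[f] for m in members) / len(members) for f in fields}
        averages["n_players"] = len(members)
        summary[label] = averages
    return summary


# Hypothetical labeled players (season totals and cap hit in $M)
rows = [
    {"cluster": "Top player", "goals": 30, "cap_hit_m": 6.0},
    {"cluster": "Top player", "goals": 20, "cap_hit_m": 4.0},
    {"cluster": "Grinder", "goals": 4, "cap_hit_m": 1.0},
]
summary = cluster_summary(rows)
print(summary["Top player"])  # {'goals': 25.0, 'cap_hit_m': 5.0, 'n_players': 2}
```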


3.3.3 Player Contribution Linear Regression Model

A linear regression model was created to determine how much each player type contributes to his team’s season point total. Season point totals determine whether a team makes the playoffs. A team earns two points in the standings for every win, one point for every overtime loss, and zero points for every regulation loss. Sixteen NHL teams make the playoffs every season, based on their point totals at the end of the regular season.
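The standings rule described above can be written as a one-line function; the win and loss record in the example is hypothetical.

```python
def standings_points(wins, overtime_losses):
    """Two points per win, one per overtime loss, none per regulation loss."""
    return 2 * wins + overtime_losses


# Hypothetical 82-game record: 45 wins, 25 regulation losses, 12 overtime losses
print(standings_points(45, 12))  # 102
```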

The 2016-2017 season statistics were used to create the model. For every team, a weighted playing time statistic called Effective Average Ice Time (EAIT), developed for this research, was calculated. The purpose of this weighted statistic was to capture how many players from each cluster were on each team and how much they played for that team during the season. To calculate EAIT, each player’s number of games played was first divided by 82 games (a full season). Each player’s average ice time for the season was then multiplied by this fraction; the result is the player’s EAIT. The formula below is calculated for every player.

$$EAIT = \frac{GP}{82} \times \text{Player's Average Ice Time}$$
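The EAIT formula translates directly into code. In this sketch, ice time is in minutes and the player is hypothetical: a player who appears in half of the 82 games at 18 minutes per game has an EAIT of 9 minutes.

```python
SEASON_GAMES = 82


def eait(games_played, avg_ice_time_min):
    """Effective Average Ice Time: (GP / 82) x average ice time, in minutes."""
    return games_played / SEASON_GAMES * avg_ice_time_min


print(eait(41, 18.0))  # 9.0
```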

Next, the EAIT values of each player type on every NHL team were summed. For example, the Pittsburgh Penguins had seven top line players on their roster, whose EAIT values summed to 75.34 minutes. This summation was done for every player type on all 30 NHL teams for the 2016-2017 season.
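Summing EAIT by team and player type is a grouped sum. The tuples below are hypothetical (team, player type, games played, average ice time) records, not the Penguins’ actual roster.

```python
from collections import defaultdict


def team_eait_by_type(players, season_games=82):
    """Sum each player's EAIT by (team, player type)."""
    totals = defaultdict(float)
    for team, player_type, games_played, avg_ice_time in players:
        totals[(team, player_type)] += games_played / season_games * avg_ice_time
    return dict(totals)


# Hypothetical records: (team, player type, games played, average ice time in minutes)
players = [
    ("Team A", "Top-F", 82, 20.0),
    ("Team A", "Top-F", 41, 16.0),
    ("Team A", "Grinder-F", 41, 10.0),
]
totals = team_eait_by_type(players)
print(totals)  # {('Team A', 'Top-F'): 28.0, ('Team A', 'Grinder-F'): 5.0}
```

Each team’s row of these sums becomes one observation of the independent variables in the regression.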

The EAIT data for every team was used to create the linear regression model, where the dependent variable was the total team points and the independent variables were the player types.

The intercept was omitted from the regression, as in Chan’s research, because the regression coefficients had more significance without it. The coefficients represent the contribution that each player type makes to his team’s season point total. For example, a top line forward has a regression coefficient of 0.466. Because the model is fitted on EAIT, each coefficient represents a player type’s contribution to team points per minute of effective average ice time. This quantity is hard to interpret in isolation, but it is meaningful to compare the coefficient values with each other. For example, a top line forward has a higher coefficient than a grinder (physical forward), meaning that he contributes more to team success.
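Fitting a linear model without an intercept amounts to solving the normal equations (XᵀX)b = Xᵀy, where each row of X holds a team’s summed EAIT per player type and y holds team points. The sketch below is a minimal, dependency-free version using Gauss-Jordan elimination; it is not Minitab’s algorithm, and the tiny dataset is synthetic, constructed so the exact answer is known.

```python
def fit_no_intercept(X, y):
    """Least squares through the origin: solve (X^T X) b = X^T y.

    Assumes X^T X is invertible (no player type column is all zeros or
    an exact combination of the others)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Synthetic data generated exactly from coefficients [2, 3]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [2.0, 3.0, 5.0, 7.0]
coefficients = fit_no_intercept(X, y)
print([round(c, 6) for c in coefficients])  # [2.0, 3.0]
```

With no intercept column in X, a team with zero EAIT in every player type is forced to a prediction of zero points, which is what gives each coefficient its per-minute interpretation.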

Further data analysis was done on the clustering and regression models. Each season’s clusters were searched for undervalued players. In addition, the regression model was used to compare teams’ predicted season point totals with their actual point totals to identify teams that overperformed or underperformed.

Finally, a bi-criteria optimization model was created based on the regression model in order to determine the optimal number of each player type per team while also minimizing salary cap expenditure.

The regression coefficients were slightly modified so that the optimization model would return the optimal number of each player type to select for a team. Each player type’s 2016-2017 coefficient was multiplied by that player type’s average ice time. The modified coefficients represent the points that each player type contributes to his team over a season, given that he plays the average ice time for his player type. For example, the top line forward coefficient (0.466) was multiplied by the top line forwards’ average ice time (16.45 minutes) to get a season team point contribution of 7.666 points. In other words, each top line forward contributes 7.666 points to his team during the season, assuming he plays that average ice time. After the data modification was complete, further analysis was conducted.
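The coefficient modification is a single multiplication per player type. The sketch reproduces the top line forward example from the text (0.466 × 16.45 minutes); values for other player types would come from their own average ice times, which are not listed here.

```python
def season_points_per_player(coefficient, avg_ice_time_min):
    """Modified coefficient: per-minute EAIT coefficient x the player type's
    average ice time, giving season team points per player of that type."""
    return coefficient * avg_ice_time_min


# Top line forward example from the text: 0.466 x 16.45 minutes
print(round(season_points_per_player(0.466, 16.45), 3))  # 7.666
```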

3.4 Results: Player Clusters

As stated previously, NHL players from four seasons were analyzed to create specific player types. Each season was analyzed separately. The ultimate goal of clustering players was to find undervalued players. In addition, the player types were used to create a linear regression model used to predict player impact on team performance.

The first NHL season analyzed was 2013-2014. In Table 13, the forward clusters are summarized by their clustering statistics. The highlighted values represent the maximum values in each statistical category. Table 14 lists sample players from each forward cluster.

Table 13: 2013-14 forwards clustering results

| Player Type | No. Players | Goals | Assists | +/- | Hits | PIM | CF% | Cap Hit ($M) |
|---|---|---|---|---|---|---|---|---|
| Top player | 125 | 21.36 | 30.10 | 9.71 | 52.92 | 29.77 | 52.93 | 4.04 |
| 2nd line | 86 | 13.30 | 16.44 | -0.10 | 141.83 | 49.58 | 50.49 | 2.53 |
| Grinder | 45 | 4.02 | 5.09 | -5.76 | 118.98 | 93.69 | 43.35 | 2.08 |
| Average | 217 | 6.96 | 10.32 | -5.35 | 53.82 | 20.90 | 48.40 | 2.60 |

Table 14: Sample forwards in 2013-14 clusters

| Player Type | Sample Players |
|---|---|
| Top player | Sidney Crosby |
| 2nd line | Mike Fisher, Ryan Kesler |
| Grinder | Ryan Reaves |
| Average | Sam Gagner, Matt Read |


It is clear that top line forwards produce the most points (goals and assists). They also have the most positive +/- rating, which makes sense because they are on the ice for more goals scored than goals against. To be among the best players on a team, top line forwards must produce a large number of points. Top line forwards also have the highest CF%. This is consistent with the clustering results, because top line players can control the pace of the game by possessing the puck and leading the team in scoring opportunities. As stated previously, when a player’s CF% is above 50%, his team is generating more shot attempts than the opponent while he is on the ice.

Furthermore, the clustering results follow a logical hockey hierarchy: goals, assists, +/-, and CF% all decrease from one cluster type to the next. Top line forwards have the most goals and assists, followed by second line forwards and then average forwards. Grinders, or physical forwards, have the fewest points; they are instead marked by their penalty minutes and hits. These players are generally less skilled but more physical than any other player type. Because of this, they do not possess the puck as well as top and second line players, and they earn less than any other player type. Cap hit follows the same order, except that average forwards earn slightly more than second line forwards ($2.60 million versus $2.53 million).

As expected, top line players are the best at generating scoring opportunities. CF% is a statistic that can help pinpoint which teams and players dominate the sport. As one writer for The Hockey News notes about playoff series, “Since 2007-08 – not including this year – the team that won the possession battle in the series won 65 percent of their respective series, and their chances of winning went up the higher the possession advantage was” (Luszczyszyn 2015). Thus, teams with more puck-possessing forwards and defensemen should enjoy more long-term success. Strong possession players can be elite yet still undervalued, because they do not generate the abundance of goals and assists that the highest paid players do. Much as Billy Beane used on-base percentage (OBP) to identify undervalued baseball players, this research uses CF% to find undervalued players in the NHL.

In Figure 13, forwards are plotted by cluster on goals-per-game and CF% axes to identify potentially undervalued players.

Figure 13: 2013-14 forwards cluster matrix plot (Sidney Crosby)

Cluster 1 represents top line forwards, cluster 2 represents 2nd line players, cluster 3 represents grinders, and cluster 4 represents average forwards. In Figure 13, Sidney Crosby is identified because he is widely considered the best player in the NHL. In the 2013-2014 season, Crosby averaged 1.3 points per game and finished with a 53.07% CF%. Crosby is clearly elite, which merits his $8.70 million annual cap hit. This clustering model can therefore be used to validate a player’s worth.

On the other hand, the model was able to identify an undervalued player. Tyler Toffoli of the Los Angeles Kings finished the season with an astounding 60.31% CF% but averaged only 0.468 points/game. As Figure 13 shows, Toffoli falls in the top line forward category, yet he earned only $0.87 million per season.

Figure 14: 2013-14 forwards cluster matrix plot (Tyler Toffoli)

In the following season, Toffoli averaged 0.64 points/game, and in the 2015-2016 season he averaged 0.71 points/game, according to Hockey Reference. His points-per-game production thus increased in each of the two seasons after his outstanding 2013-2014 CF%. This key statistic could have been used to identify Toffoli as an undervalued asset likely to produce more points in the future, since he was already generating many scoring opportunities.

Defensemen were also clustered for the 2013-2014 season. The results are shown in Table 15, where the highlighted cells represent the maximum values in each statistical category. Offensive defensemen are known for their offensive skills and thus have the highest average point total. They also possess the puck better than any other type of defenseman, as they have the highest CF%. Defensive defensemen are good at preventing goals and therefore have the most positive +/- statistic, because few goals are scored against their team while they are on the ice. Physical defensemen lead in hits and penalty minutes because of their aggressive style of play. Finally, the 3rd pair defensemen (average NHL defensemen), as expected, do not lead any statistical category, and they also have the lowest average cap hit.

In Table 16, sample players from each defensemen cluster are listed.

Table 15: 2013-14 defensemen clustering results

| Player Type | No. Players | Points | +/- | Hits | PIM | Corsi For % | Cap Hit ($M) |
|---|---|---|---|---|---|---|---|
| Offensive | 52 | 38.44 | 3.62 | 82.04 | 37.96 | 52.07 | 4.17 |
| Defensive | 95 | 14.41 | 6.04 | 75.05 | 28.87 | 51.82 | 2.08 |
| 3rd Pair | 71 | 11.61 | -7.82 | 60.48 | 22.65 | 45.95 | 1.84 |
| Physical | 34 | 10.41 | -3.32 | 145.44 | 68.91 | 47.00 | 2.03 |

Table 16: Sample defensemen in 2013-14 clusters

| Player Type | Sample Players |
|---|---|
| Offensive | Duncan Keith, P.K. Subban |
| Defensive | Dan Girardi, Johnny Boychuk |
| 3rd Pair | Andrew MacDonald |
| Physical | Dion Phaneuf |

Defensemen, just like forwards, can be over or undervalued in terms of their contribution relative to their salary. The clustering results facilitated the mining for undervalued and overvalued defensemen. Figure 15 represents the defensemen clustering types for the 2013-2014 NHL season.

Defensemen are plotted on a points-per-game versus CF% plot. The offensive defensemen, shown in blue, are the clear leaders in scoring and possession metrics. Elite puck possession skills can be extremely valuable for a defenseman because defensemen lead the rush out of their team’s defensive zone. If they can possess the puck well and generate scoring chances, their team will be successful while they are on the ice.

More specifically, Figure 15 highlights John Carlson in the offensive defensemen category. He averaged 0.463 points/game and had a season CF% of 46.96%. By contrast, as highlighted in Figure 16, Jake Muzzin of the Los Angeles Kings produced fewer points (0.316 points/game) but had an astounding 61.04% CF%. Muzzin, a defensive defenseman, had a cap hit of only $1.0 million, versus John Carlson’s $3.90 million. Muzzin controlled the puck better and was on the ice for more shot attempts than Carlson but earned $2.90 million less. For a GM, Muzzin would be an attractive player to acquire because his skill was undervalued relative to his annual cap hit.

Figure 15: 2013-14 defensemen cluster matrix plot (John Carlson)


Figure 16: 2013-14 defensemen cluster matrix plot (Jake Muzzin)

Notably, the two players identified as undervalued in this study (Tyler Toffoli and Jake Muzzin) both played for the 2014 Stanley Cup Champion Los Angeles Kings. Both players had low cap hits but controlled the puck very well. Their low cap hits allowed the Kings to pay more for higher point-producing players without sacrificing scoring opportunities. According to a blogger on the L.A. Kings SB Nation blog, zone time and Corsi metrics are very closely related (P, R. 2013). Given their high CF%, it is reasonable to infer that the Kings controlled the puck in the opposing team’s zone more often than not when Toffoli and Muzzin were on the ice. These undervalued players were generating offense while also limiting the opponent’s scoring chances. The clustering results led to the discovery of these undervalued Stanley Cup champions, and GMs can use a similar approach to find undervalued players to help their teams in the future.

The next season analyzed was the 2014-2015 NHL season. The same method was used.

The clustering results are shown in Table 17 for forwards and Table 18 for defensemen. Again, the highlighted cells are the maximum values for each statistical category.

Table 17: 2014-15 forwards clustering results

| Player Type | No. Players | Goals | Assists | +/- | Hits | PIM | CF% | Cap Hit ($M) |
|---|---|---|---|---|---|---|---|---|
| Top player | 153 | 18.88 | 28.44 | 7.17 | 57.25 | 27.118 | 52.571 | 3.53 |
| 2nd line | 90 | 13.4 | 15.69 | 0.978 | 146.42 | 38.14 | 51.60 | 2.41 |
| Grinder | 54 | 3.74 | 5.19 | -4.35 | 118.94 | 74.11 | 45.63 | 1.20 |
| Average | 176 | 6.59 | 9.26 | -7.15 | 60.22 | 18.73 | 46.57 | 1.80 |

Table 18: 2014-15 defensemen clustering results

| Player Type | No. Players | Points | +/- | Hits | PIM | CF% | Cap Hit ($M) |
|---|---|---|---|---|---|---|---|
| Offensive | 54 | 40.06 | 5.74 | 91.4 | 35.30 | 51.31 | 4.00 |
| Defensive | 104 | 13.38 | 4.09 | 61.71 | 23.86 | 51.81 | 2.09 |
| 3rd Pair | 44 | 9.48 | -0.30 | 65.55 | 19.02 | 45.34 | 1.64 |
| Physical | 53 | 11.02 | -1.83 | 138.43 | 51.77 | 48.02 | 2.05 |

As in 2013-2014, top line forwards lead the goals, assists, +/-, CF%, and annual cap hit categories. The grinders again led in penalty minutes, while the average forwards did not lead any category. Offensive defensemen led the points, +/-, and cap hit categories, but unlike the previous season, they trailed the defensive defensemen in CF% by 0.50 percentage points. The physical defensemen led the hits and PIM categories and earned an almost identical average cap hit to 2013-2014. Interestingly, offensive defensemen earn more on average than top line forwards, which suggests that teams are more willing to pay a premium for an elite offensive defenseman than for an elite forward. Section 3.5 (Results: Regression Model) examines each player type’s contribution to team success to determine whether paying more for an offensive defenseman is an economically sensible management decision.

Jake Muzzin (Figure 17) falls in the offensive defenseman category in 2014-2015, with a CF% of 58.14% and 0.53 points/game. He is a consistent player who was paid only $1.0 million, less than the average offensive defenseman earns. His 2013-2014 season does not appear to be a fluke; rather, Muzzin appears to be an undervalued player, since his production did not dip in the 2014-2015 season. This type of research can help GMs complete their rosters without overpaying for players.

Figure 17: 2014-15 defensemen cluster matrix plot (Jake Muzzin)

Further data mining for undervalued defensemen was completed. Figure 18 highlights Kris Letang of the Pittsburgh Penguins, and Figure 19 highlights Kevin Shattenkirk of the St. Louis Blues. Letang averaged 0.78 points/game with a CF% of 56%, while Shattenkirk averaged 0.79 points/game with a CF% of 54.24%. Both were top-tier defensemen, but Letang earned $3 million more. Which player would a GM prefer to have? Based on their scoring and puck possession (CF%), they are similar players, but Shattenkirk is paid less and would therefore make more sense for teams trying to save money.

Figure 18: 2014-15 defensemen cluster matrix plot (Kris Letang)

Figure 19: 2014-15 defensemen cluster matrix plot (Kevin Shattenkirk)

The next season clustered was 2015-2016. The results for forwards are shown in Table 19 and the results for defensemen in Table 20. The clusters holding the maximum value in each category are consistent with the previous two seasons, with one key change: the physical defensemen lead in CF%. It appears that in 2015-2016 the physical defensemen cluster generated, on average, more offense than in previous seasons. Jake Muzzin, for example, fell into the physical defensemen cluster this season because he had more hits and PIM than in previous seasons, yet he still earned a 57.45% CF%. Players like Muzzin who fell into the physical defensemen category raised the cluster’s overall CF%.

Table 19: 2015-16 forwards clustering results

| Player Type | No. Players | Goals | Assists | +/- | Hits | PIM | CF% | Cap Hit ($M) |
|---|---|---|---|---|---|---|---|---|
| Top player | 109 | 23.211 | 32.17 | 7.30 | 72.09 | 34.66 | 51.76 | 4.95 |
| 2nd line | 162 | 11.18 | 15.19 | -1.06 | 63.62 | 23.22 | 51.55 | 2.25 |
| Grinder | 81 | 7.35 | 8.86 | -4.25 | 131.57 | 63.8 | 47.86 | 1.87 |
| Average | 130 | 4.32 | 6.18 | -5.63 | 57.92 | 16.8 | 45.60 | 1.48 |

Table 20: 2015-16 defensemen clustering results

| Player Type | No. Players | Points | +/- | Hits | PIM | CF% | Cap Hit ($M) |
|---|---|---|---|---|---|---|---|
| Offensive | 30 | 43.033 | -6.27 | 80.37 | 38.10 | 50.47 | 4.78 |
| Defensive | 53 | 24.70 | 11.92 | 64.92 | 26.62 | 52.01 | 3.19 |
| 3rd Pair | 132 | 10.92 | -2.93 | 69.14 | 24.37 | 47.88 | 2.15 |
| Physical | 34 | 12.65 | 2.29 | 146.62 | 67.79 | 52.175 | 2.40 |

In the 2015-2016 season, Nikita Kucherov appeared to be in the process of becoming an elite top line forward. As shown in the forward cluster matrix plot in Figure 20, Kucherov averaged 0.39 goals/game and finished the season with a 54.13% CF%. Kucherov, still on his rookie contract, was poised to earn a larger salary. His elite skill level is apparent in the clustering, as he falls in the same group as players like Sidney Crosby (arguably the best player in the world). Indeed, Kucherov earned $4.70 million the next season. Consistent with the clustering results, his team saw potential in him and gave him a larger contract to keep him with the team. The clustering results thus have another use: teams can use them to decide whether a player has enough potential to merit a larger salary in subsequent seasons.

Figure 20: 2015-16 forwards cluster matrix plot (Nikita Kucherov)

Finally, the 730 players who played at least 10 games in the 2016-2017 season were also clustered. The results are shown in Table 21 and Table 22. The key difference from the other seasons studied is that the 2nd line forwards have more assists on the whole and a higher average cap hit than the top line forwards. This may be because some elite players were clustered as 2nd line forwards on the basis of their other attributes. For example, Patrick Kane was considered a 2nd line forward in this season’s clustering, although most hockey experts would regard him as an elite player who belongs among the top line forwards. Kane’s dip in goal scoring and +/- rating may be one reason he was clustered as a 2nd line forward for the 2016-2017 season. Nonetheless, the clusters holding the remaining forward maximum values are consistent with previous seasons.

Likewise, the defensemen clustering results are very consistent with previous years. Four seasons were clustered to verify the accuracy and consistency of the results. In section 3.5, the regression model uses the 2016-2017 season data because it is the most recent and is consistent with the other seasons’ data.

Table 21: 2016-17 forwards clustering results

| Player Type | No. Players | Goals | Assists | +/- | Hits | PIM | CF% | Cap Hit ($M) |
|---|---|---|---|---|---|---|---|---|
| Top player | 64 | 24.09 | 23.55 | 6.86 | 95.69 | 37.36 | 52.36 | 3.70 |
| 2nd line | 124 | 17.03 | 30.02 | 2.97 | 44.82 | 27.27 | 51.49 | 4.09 |
| Grinder | 59 | 5.66 | 6.92 | -3.14 | 119.27 | 62.44 | 46.10 | 1.64 |
| Average | 235 | 7.01 | 8.51 | -0.08 | 60.95 | 19.09 | 48.28 | 1.66 |

Table 22: 2016-17 defensemen clustering results

| Player Type | No. Players | Points | +/- | Hits | PIM | CF% | Cap Hit ($M) |
|---|---|---|---|---|---|---|---|
| Offensive | 47 | 40.11 | 6.34 | 67.40 | 29.47 | 52.01 | 4.45 |
| Defensive | 72 | 14.53 | 8.46 | 77.13 | 25.36 | 50.87 | 2.10 |
| 3rd Pair | 88 | 13.52 | -8.33 | 64.74 | 25.55 | 47.71 | 2.45 |
| Physical | 41 | 10.22 | -0.17 | 127.75 | 54.73 | 49.90 | 2.10 |

As noted earlier, Nikita Kucherov appeared to be becoming an elite player in the 2015-2016 season. In the 2016-2017 season, Kucherov received a raise to $4.70 million and proved his worth. He was again in the top player forward cluster and ended the season with a 55.39% CF%. He averaged 0.54 goals/game, an increase from 0.39 goals/game in the previous season. Once again, the clustering model identified a player who was undervalued and bound to increase his point production in subsequent seasons.

Figure 21: 2016-17 forwards cluster matrix plot (Nikita Kucherov)

The ultimate goal of the clustering models for each season was to find undervalued or overvalued players, to show where NHL GMs could have seen future value in signing a certain player. As the examples above illustrate, the model could have been used to find undervalued players. In these examples, this clustering technique, which unlike Chan’s uses CF%, proved to be a useful tool for finding inexpensive players who can help teams win. In addition, the 2016-2017 clustering results are used in the next section to build a regression model that predicts player impact on team performance.

3.5 Results: Regression Model

Once the 2016-2017 clusters were created for forwards and defensemen, a regression model was built to estimate each player type’s impact on team point total. NHL GMs try to build teams that make the playoffs, and only the top eight teams in each conference (Eastern and Western) qualify. In other words, only 16 of 31 teams make the playoffs each year, based on their regular season point totals.

To make the playoffs, teams need a certain number of points. The minimum varies every year, but in the 2016-2017 season the last playoff team in the Eastern Conference had 95 points (Toronto Maple Leafs), and the last playoff team in the Western Conference had 94 points (Calgary Flames). Every season, NHL GMs must decide how to assemble their team in order to make the playoffs and thus compete for the Stanley Cup. Once teams make the playoffs, regardless of their seed, they have a chance to win the Stanley Cup.

Accordingly, a linear regression model was created using the 2016-2017 player types and every NHL team’s final season point total to estimate the effect that specific player types have on season point total. As stated in the methodology, each team’s roster was represented by the summed Effective Average Ice Time (EAIT) of its players in each player type.

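The intercept was not included in the regression model because the coefficient values had greater significance without it. The linear regression equation is shown below, where each term is the team’s summed EAIT (in minutes) for that player type, and its coefficients are listed in Table 23:

$$\text{Team Points} = 0.466\,(\text{Top-F}) + 0.4681\,(\text{2nd-F}) + 0.238\,(\text{Average-F}) + 0.320\,(\text{Grinder-F}) + 0.256\,(\text{OD}) + 0.344\,(\text{DD}) + 0.127\,(\text{3rdD}) + 0.282\,(\text{Physical})$$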

Table 23: Team point regression model coefficient values (2016-17)

| Player Type | Regression Coefficient |
|---|---|
| Top Line Forward | 0.466 |
| 2nd Line Forward | 0.4681 |
| Average Forward | 0.238 |
| Grinder Forward | 0.320 |
| Offensive Defensemen (OD) | 0.256 |
| Defensive Defensemen (DD) | 0.344 |
| 3rd Pair (average) Defensemen | 0.127 |
| Physical Defensemen | 0.282 |
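Applying the fitted equation to a team is a weighted sum of its summed EAIT per player type. The coefficients below are those in Table 23; the EAIT values in the example are hypothetical and describe only part of a roster, so the resulting total is far below a real team’s points.

```python
# Coefficients from Table 23 (2016-17); keys follow the equation's labels
COEFFICIENTS = {
    "Top-F": 0.466, "2nd-F": 0.4681, "Average-F": 0.238, "Grinder-F": 0.320,
    "OD": 0.256, "DD": 0.344, "3rdD": 0.127, "Physical": 0.282,
}


def predict_team_points(eait_by_type):
    """Sum of coefficient x summed EAIT (minutes) per player type; no intercept."""
    return sum(COEFFICIENTS[player_type] * minutes
               for player_type, minutes in eait_by_type.items())


# Hypothetical partial roster: 50 EAIT minutes of top line forwards, 40 of defensive defensemen
print(round(predict_team_points({"Top-F": 50.0, "DD": 40.0}), 2))  # 37.06
```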

The regression model is statistically significant, with a p-value of approximately 0. The R-squared value is 99.53%, which indicates that the model fits the data well, since 99.53% of the variation in team points is explained by the independent variables (player types). The predicted R-squared value is 99.15%, which indicates that the model should predict future team points well.
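How R-squared is computed matters for a model without an intercept. Statistical software commonly reports R-squared for such models relative to the uncentered total sum of squares (the sum of y²) rather than the sum of squared deviations from the mean, which tends to produce values close to 100%. The text does not state which definition Minitab used here, so the sketch below computes both on hypothetical data.

```python
def r_squared(y, y_hat):
    """Return (centered R^2, uncentered R^2) for observed y and fitted y_hat."""
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    y_bar = sum(y) / len(y)
    centered_tss = sum((obs - y_bar) ** 2 for obs in y)
    uncentered_tss = sum(obs ** 2 for obs in y)
    return 1 - sse / centered_tss, 1 - sse / uncentered_tss


# Hypothetical observed and fitted team points
centered, uncentered = r_squared([100, 90, 80], [98, 92, 80])
print(round(centered, 4), round(uncentered, 4))  # 0.96 0.9997
```

The same fit scores 0.96 under the centered definition but about 0.9997 under the uncentered one, so the two versions should not be compared directly across models.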

Each coefficient can be understood as a player type’s team point contribution per minute of EAIT. Top line forwards and 2nd line forwards provide the most value to their teams, with coefficients of 0.466 and 0.4681, respectively. In the NHL, these players are highly sought after in free agency because GMs recognize the contribution they can make, and the model is consistent with this. These players score, set up goals, and lead their teams in possession metrics, and their presence is essential for championship teams.

Among defensemen, defensive defensemen contribute the most, providing their team 0.344 points for every minute of EAIT. Interestingly, average forwards and 3rd pair defensemen provide the least value to their teams; grinders and physical defensemen contribute more to team success than either. This is plausible because, although grinders are more physical and take more penalties, they can also kill penalties and prevent goals. Likewise, physical defensemen are good at eliminating prime scoring chances through an intimidating style of play.

The model’s Root Mean Square Error (RMSE) was 6.35, which is very similar to that of Chan’s player performance regression model (2012). Next, each team’s summed EAIT for each player type was entered into the regression model to find the predicted point total for every team. Table 24 highlights three playoff teams that overperformed (the Pittsburgh Penguins, St. Louis Blues, and Calgary Flames) and one non-playoff team that underperformed (the Winnipeg Jets), according to the model.
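RMSE is the square root of the average squared prediction error. Regression output often reports a closely related quantity, the standard error of the regression, which divides the sum of squared errors by n - p (observations minus estimated coefficients) instead of n; the text does not say which version the 6.35 figure is. The sketch below supports both, using hypothetical data.

```python
from math import sqrt


def rmse(y, y_hat, n_params=0):
    """sqrt(SSE / (n - n_params)). n_params=0 gives the plain RMSE;
    n_params=p gives the regression standard error for p coefficients."""
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    return sqrt(sse / (len(y) - n_params))


# Hypothetical observed and predicted team points
y = [100, 90, 80, 70]
y_hat = [97, 94, 80, 70]
print(rmse(y, y_hat))  # 2.5
print(round(rmse(y, y_hat, n_params=2), 4))  # 3.5355
```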

Table 24: Team Predicted Points vs. Actual Points (2016-17)

Team | Predicted Points | Actual Points | Difference
Pittsburgh Penguins | 103.5 | 111 | +7.5
St. Louis Blues | 89.9 | 99 | +9.1
Calgary Flames | 92.3 | 94 | +1.7
Winnipeg Jets | 91.7 | 87 | -4.7

The Penguins overperformed the model by 7.5 points, but their overperformance was less consequential than that of the St. Louis Blues or the Calgary Flames, for whom it affected the playoff race. The minimum point total needed to make the playoffs in the Western Conference in the 2016-2017 season was 94 points. The Winnipeg Jets were the first team out of the playoffs with 87 points, and they underperformed: the model expected them to finish the season with 91.7 points. Had the St. Louis Blues not overperformed, they would have finished with 89.9 points (roughly 90 points), and had the Winnipeg Jets not underperformed, they would have finished with 91.7 points (roughly 92 points). In other words, had each team performed as the model expected, the Jets, rather than the Blues, would have been playoff contenders. Such overperformance or underperformance may be attributable, at least in part, to a coach’s ability to lead his team.
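The differences in Table 24 and the counterfactual comparison above are simple arithmetic. The Python sketch below reproduces them from the Table 24 values; the dictionary layout and helper names are hypothetical.

```python
# Predicted and actual 2016-2017 point totals (Table 24)
TEAMS = {
    "Pittsburgh Penguins": (103.5, 111),
    "St. Louis Blues": (89.9, 99),
    "Calgary Flames": (92.3, 94),
    "Winnipeg Jets": (91.7, 87),
}


def performance_gap(predicted, actual):
    """Positive = overperformed the model, negative = underperformed."""
    return round(actual - predicted, 1)


def counterfactual_order(teams, names):
    """Rank the named teams by predicted points, i.e., as if each met the model exactly."""
    return sorted(names, key=lambda name: teams[name][0], reverse=True)


if __name__ == "__main__":
    for team, (predicted, actual) in TEAMS.items():
        print(f"{team}: {performance_gap(predicted, actual):+.1f}")
    # Winnipeg (91.7) ranks ahead of St. Louis (89.9) on predicted points
    print(counterfactual_order(TEAMS, ["St. Louis Blues", "Winnipeg Jets"]))
```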

This analysis is important because it can help GMs evaluate their players and coaches.

When a team underperforms, a number of factors can be blamed, one of which is the coach. The coach may not be using his talent appropriately, or the players may not fit the coach’s system. This type of analysis can help GMs restructure their team in the off-season so that the coach and players are a good fit for each other and the team performs up to expectations. NHL GMs can build on this research’s model to assemble a team that is predicted to earn a playoff spot.

Overall, the regression model was a good indicator of team success. The model can be improved and possibly expanded, but it is already a tool that NHL GMs can use to measure player impact and compare expected with actual results. As the model suggests, GMs should aim to assemble a team with as many top line forwards, 2nd line forwards, and defensive defensemen as possible.

3.6 Results: Bi-Criteria Optimization Model

From the results of the regression model, a bi-criteria optimization model was created to find the optimal number of each player type to include on a team while also minimizing the team’s annual cap hit. The average cap hit for each player cluster was used in the model. In addition, the regression model was modified so that its output is expressed in terms of the number of players needed, rather than the number of minutes each player cluster plays. In the original model, each coefficient represents the number of points a player type contributes for every minute of EAIT. The new coefficients, shown in Table 25, represent the number of points one player from a given cluster contributes to his team. They were obtained by multiplying each original coefficient by the cluster’s average ice time for the 2016-2017 season. For example, the top line forward’s coefficient (0.466) was multiplied by the cluster’s average ice time (16.45 minutes) to obtain a modified coefficient of 7.666 points per top line player.
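The conversion is one multiplication per cluster. The Python sketch below applies it to the top line example and, run in reverse, shows the average ice time implied for 2nd line forwards by the Table 25 value (7.920) and the per-minute coefficient (0.4681), about 16.9 minutes. That implied value is a back-calculation, not a figure reported in this research, and rounding in the published coefficients makes it approximate. Function names are hypothetical.

```python
def per_player_coefficient(points_per_minute, avg_ice_time_minutes):
    """Convert a per-minute EAIT coefficient into points contributed per player."""
    return points_per_minute * avg_ice_time_minutes


def implied_ice_time(points_per_player, points_per_minute):
    """Invert the conversion: average ice time implied by the two coefficients."""
    return points_per_player / points_per_minute


if __name__ == "__main__":
    # Top line forward (values quoted in the text): 0.466 pts/min x 16.45 min
    print(round(per_player_coefficient(0.466, 16.45), 3))  # 7.666
    # 2nd line forward: per-player value over its per-minute coefficient
    print(round(implied_ice_time(7.920, 0.4681), 2))
```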


Table 25: Team points contributed by each player type (2016-17)

Player Type | New Regression Coefficient (Points/Player)
Top Line Forward | 7.666
2nd Line Forward | 7.920
Average Forward | 3.011
Grinder Forward | 3.555
Offensive Defensemen (OD) | 5.676
Defensive Defensemen (DD) | 6.078
3rd Pair (Average) Defensemen | 2.278
Physical Defensemen | 4.749

Once the regression coefficients were modified, the optimization problem was modeled.

The objective was to maximize team points and minimize the team’s total salary cap hit. NHL teams can spend at most $75 million, but they must spend at least $55.4 million on players each season (Clements 2017). NHL teams dress 12 forwards, 6 defensemen, and 2 goalies every game (Chan et al., 2012). Since goalies were not among the clustered player types, the only personnel constraints in the model are that each team needs 12 forwards and 6 defensemen. Using the average salary cap hit for each player type (Table 21 and Table 22), the model was constructed. Table 26 summarizes the variables, objective functions, and constraints.


Table 26: Bi-Criteria Optimization Model

Variables

F1, F2, F3, F4 | The number of each forward type, where F1 = top line, F2 = 2nd line, F3 = average, F4 = grinder
D1, D2, D3, D4 | The number of each defenseman type, where D1 = offensive defensemen, D2 = defensive defensemen, D3 = 3rd pair, D4 = physical

Objective functions

$$\max\ \mathrm{PTS} = 7.666F_1 + 7.920F_2 + 3.011F_3 + 3.555F_4 + 5.676D_1 + 6.078D_2 + 2.278D_3 + 4.749D_4$$

Maximize total team points. The team’s total points depend on the contribution from each player type on the team.

$$\min\ \mathrm{COST} = 3.70F_1 + 4.09F_2 + 1.66F_3 + 1.64F_4 + 4.45D_1 + 2.10D_2 + 2.45D_3 + 2.10D_4$$

Minimize the team’s total salary cap hit (in $ millions), where the number of each player type on the team is multiplied by its average salary.

Constraints

$$F_1 + F_2 + F_3 + F_4 = 12$$

Each team must have 12 forwards.

$$D_1 + D_2 + D_3 + D_4 = 6$$

Each team must have 6 defensemen.

$$55.4 \le \mathrm{COST} \le 75$$

Each team must spend between $55.4 million and $75 million. COST is the same expression as the minimum-cost objective.
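How the two objectives in Table 26 were combined is not stated. The Python sketch below assumes a lexicographic approach: maximize PTS first, then break ties with the lower COST, subject to the roster and cap constraints. Because there are only 455 ways to split 12 forwards among four types and 84 ways to split 6 defensemen among four types, it simply enumerates all 38,220 rosters; an integer programming solver would be the usual choice for larger problems. Coefficients come from Table 25 and average cap hits from Table 28; the names are hypothetical.

```python
# Per-player points (Table 25) and average cap hit in $ millions (Table 28)
FORWARDS = {
    "top_line": (7.666, 3.70),
    "second_line": (7.920, 4.09),
    "average": (3.011, 1.66),
    "grinder": (3.555, 1.64),
}
DEFENSEMEN = {
    "offensive": (5.676, 4.45),
    "defensive": (6.078, 2.10),
    "third_pair": (2.278, 2.45),
    "physical": (4.749, 2.10),
}


def compositions(total, parts):
    """Yield every tuple of `parts` nonnegative integers that sums to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def solve(forwards=FORWARDS, defense=DEFENSEMEN, n_fwd=12, n_def=6,
          cap_min=55.4, cap_max=75.0):
    """Search every roster; maximize points, then prefer the lower cap hit."""
    names = list(forwards) + list(defense)
    values = list(forwards.values()) + list(defense.values())
    best = None
    for f in compositions(n_fwd, len(forwards)):
        for d in compositions(n_def, len(defense)):
            counts = f + d
            points = sum(n * v[0] for n, v in zip(counts, values))
            cost = sum(n * v[1] for n, v in zip(counts, values))
            if not cap_min <= cost <= cap_max:
                continue  # violates the salary floor or ceiling
            key = (points, -cost)  # lexicographic: points first, then cheaper
            if best is None or key > best[0]:
                best = (key, counts, points, cost)
    if best is None:
        return None
    _, counts, points, cost = best
    roster = {name: n for name, n in zip(names, counts) if n > 0}
    return roster, points, cost


if __name__ == "__main__":
    roster, points, cost = solve()
    print(roster, round(points, 1), round(cost, 2))
    # prints {'second_line': 12, 'defensive': 6} 131.5 61.68
```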

The output from the model is not surprising. The model’s output states that it is optimal to compose a team of 12 second-line forwards and six defensive defensemen. This would yield 131.5 team points and cost the team $61.68 million. The summary of the solution is shown in Table 27.

The team consisted entirely of 2nd line forwards and defensive defensemen because these two types had the highest coefficients at their respective positions. In addition, the team stayed under the maximum salary cap because only the average salary cap hit for each cluster was considered.

For example, some players in the 2nd line cluster may earn close to $10 million, while others in the same cluster may earn only $1 million. The standard deviations of the salary cap hit for each player type (Table 28) were not factored into the model, so the results are somewhat skewed. In reality, teams would not be able to afford a team full of 2nd line players, as the large standard deviations of cap hits demonstrate.

Table 27: Optimal team with a maximum $75 million salary cap

Position | Team Structure
Forwards | 2nd Line Forwards = 12
Defensemen | Defensive Defensemen = 6
Team Points | 131.5 points
Total Salary Cap | $61.68 million

Table 28: Averages and standard deviations for cap hit per player type

Player Type | Average Cap Hit ($M) | Cap Hit Standard Deviation ($M)
Top Line FWD | 3.7 | 2.5
2nd Line FWD | 4.09 | 2.35
Grinder FWD | 1.64 | 1.33
Average FWD | 1.66 | 1.41
Offensive DEF | 4.45 | 2.12
Defensive DEF | 2.1 | 1.81
3rd Pair DEF | 2.45 | 1.79
Physical DEF | 2.1 | 1.72

Moreover, in order to determine the cheapest hypothetical team that NHL GMs could assemble, the salary cap constraint was set equal to $55.4 million. The optimal team under this constraint included 11 top line forwards, 1 grinder forward, and 6 defensive defensemen. This team was predicted to earn 125.2 points. Table 29 summarizes the results. Again, this is a simplified model that does not consider that some players may actually contribute more or less than the average contribution for their player type. In addition, certain players obviously make more or less than the average salary for their player type. Thus, this model, while interesting to study, does not tell the entire story and can certainly be improved.

Table 29: Optimal team when the salary cap was exactly $55.4 million

Position | Team Structure
Forwards | Top Line Forwards = 11; Grinder Forwards = 1
Defensemen | Defensive Defensemen = 6
Team Points | 125.2 points
Total Salary Cap | $55.4 million

Finally, team contribution relative to a player’s salary was studied. In Table 30, the average cap hit per team point contribution is shown for all eight player types. In other words, how much money do teams need to pay each player type in order to receive one point in the standings? For example, a top line forward is paid $483,000 for every team point in the standings he contributes.

Table 30: Cap hit ($M) per contributed team point

Player Type | Cap Hit ($M) per Point
Top Line FWD | 0.483
2nd Line FWD | 0.516
Grinder FWD | 0.545
Average FWD | 0.467
Offensive DEF | 0.784
Defensive DEF | 0.345
3rd Pair DEF | 1.075
Physical DEF | 0.442

Top line forwards and average forwards appear to be efficient players in terms of the money they earn for the team contribution they provide: they have the lowest cap hit per team point produced among forwards. Likewise, defensive defensemen would be the most efficient signing for a team, since they earn $345,000 for every team point they contribute. Overall, however, defensemen cost more than forwards for every point they contribute. It is recommended that GMs focus on signing talented top line forwards and defensive defensemen. 3rd pair defensemen, on the other hand, are not very attractive free-agent signings because they cost $1.075 million for every point they contribute to the team. That money could be spent more wisely on a defensive or physical defenseman, for example. That said, GMs would clearly prefer a team consisting of all top line forwards, because they are the top point producers and puck controllers. GMs will sometimes have to overpay for the best players, but these results suggest it may be better to sign the most cost-efficient players.

3.6.1 Sensitivity Analysis

Sensitivity analysis was performed on the bi-criteria optimization results. In every clustering model except the one for the 2016-2017 NHL season, top line forwards earned more money than any other forward type. The 2016-2017 season, on which the model was based, may have had a few outliers that skewed the cluster slightly. As a result, choosing 12 second-line forwards was determined to be the most efficient solution. However, what would the result have been if the contributions to team points differed slightly from the regression model? In other words, if top line forwards had a higher coefficient, how would the results change?

For the sensitivity analysis, a top line forward’s contribution was changed to 8 points (originally 7.666 points) and a 2nd line forward’s contribution was changed to 7.5 points (originally 7.920). In addition, the average cap hits from the 2015-2016 clustering were used: a top line forward earned $4.95 million on average, while a 2nd line forward earned $2.25 million on average. The results differed: 12 top line forwards and 6 defensive defensemen became the optimal team. The team point total was 132.47 points (versus 131.5 points in the original optimal solution), and the team’s salary was $72 million (versus $61.68 million in the original solution). The optimal solution clearly depends on how the players are clustered, and a few outliers in the 2016-2017 season could have altered the clusters and thus affected the optimal solution.
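This result can be checked with a short argument. Ignoring the cap, points are maximized by filling every forward slot with the forward type that has the largest coefficient, and every defense slot likewise; if that roster also falls inside the $55.4 million to $75 million window, it is optimal. The Python sketch below applies this check to the first sensitivity case. Cap hits for types other than top line and 2nd line forwards are not given for this case, so the sketch assumes the 2016-2017 values from Table 28; a $2.10 million defensive defenseman cap hit is consistent with the reported $72 million total (12 × $4.95 million + 6 × $2.10 million). The names are hypothetical.

```python
# Sensitivity case 1: top line and 2nd line values are from the text; the
# other (points, cap hit $M) pairs are ASSUMED to be the 2016-2017 figures.
FORWARDS = {
    "top_line": (8.0, 4.95),
    "second_line": (7.5, 2.25),
    "average": (3.011, 1.66),
    "grinder": (3.555, 1.64),
}
DEFENSEMEN = {
    "offensive": (5.676, 4.45),
    "defensive": (6.078, 2.10),
    "third_pair": (2.278, 2.45),
    "physical": (4.749, 2.10),
}


def best_type(group):
    """Player type with the largest per-player point contribution."""
    return max(group, key=lambda name: group[name][0])


def greedy_roster(forwards, defense, n_fwd=12, n_def=6, cap_min=55.4, cap_max=75.0):
    """Fill each position with its best type and report whether the cap window holds.

    Ignoring the cap, this roster maximizes points; if it is also feasible it is
    optimal, otherwise a full search (as in sensitivity case 2) is needed.
    """
    f, d = best_type(forwards), best_type(defense)
    points = n_fwd * forwards[f][0] + n_def * defense[d][0]
    cost = n_fwd * forwards[f][1] + n_def * defense[d][1]
    return {f: n_fwd, d: n_def}, points, cost, cap_min <= cost <= cap_max


if __name__ == "__main__":
    roster, points, cost, feasible = greedy_roster(FORWARDS, DEFENSEMEN)
    print(roster, round(points, 2), round(cost, 2), feasible)
```

The same check explains the original solution: with the Table 25 and Table 28 values, the best types are 2nd line forwards and defensive defensemen, and their $61.68 million total already satisfies the cap window.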

One more sensitivity model was run. In this case, the 2015-2016 NHL season average salary cap data was used (Table 19 and Table 20). The top line player contribution was 8 points and the 2nd line forward contribution remained at 7.920. The defensive defensemen contribution was reduced to 5.8 points and the offensive defensemen contribution was increased to 6 points. In this model, the optimal team consisted of 7 top line forwards, 5 second line forwards, and 6 offensive defensemen. This team had a season point total of 131.6 versus the 131.5 points from the original model. The salary of this team was $75 million versus the original $61.68 million.

This sensitivity analysis shows that even though a top line forward contributes more than a 2nd line forward, some cheaper 2nd line forwards are added to the team to save money for offensive defensemen. In this particular analysis, offensive defensemen earn more than defensive defensemen but also contribute more to team success; the model therefore reached its optimal solution by reducing the number of top line forwards to free cap space for offensive defensemen. This is a logical solution because a strong defensive unit is crucial for team success in the NHL, so the model is consistent with real-world reasoning.

Finally, the original model ignores salary outliers. For example, the average 2nd line forward earns $4.09 million, but many players in this category earn more because they are “franchise players” whom teams do not want to lose. David Krejci and Patrice Bergeron, for example, were both clustered as 2nd line forwards for the 2016-2017 season, yet both earn more than the average 2nd line forward salary because they are above-average 2nd line forwards. Krejci and Bergeron had cap hits of $7.20 million and $8.70 million, respectively. The optimization solution had the team using 12 second line forwards with a total salary cap hit of $61.68 million; a team with Bergeron and Krejci, on the other hand, would cost $69.4 million. Such a team may also perform better than the model predicts, because the model considers only the average contribution of 2nd line forwards, not specific players’ contributions, and a higher-paid player may contribute more than the average player in his cluster. Thus, when analyzing the results, it is important to remember that the model uses cluster-level metrics rather than individual players. It is also interesting to note that, under this model, teams could afford only two players earning $10 million. Some teams will pay this amount to obtain elite forwards, but the salary cap limits how many highly paid players they can sign. These teams must therefore fill the rest of their roster with less attractive players, such as average forwards and 3rd pair defensemen, because they are spending more money on elite top line forwards.
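The cost figures in this paragraph follow from swapping named players into the optimal roster from Table 27. The Python sketch below assumes that roster (12 2nd line forwards and 6 defensive defensemen at the Table 28 averages) and the $75 million cap; it reproduces the $69.4 million figure and counts how many $10 million players fit. The function names are hypothetical.

```python
BASE_COUNT = 12          # 2nd line forwards in the optimal roster (Table 27)
AVG_SECOND_LINE = 4.09   # average 2nd line forward cap hit, $M (Table 28)
AVG_DEFENSIVE_D = 2.10   # average defensive defenseman cap hit, $M (Table 28)
N_DEFENSEMEN = 6
CAP_MAX = 75.0


def roster_cost(star_cap_hits):
    """Total cap hit ($M) when some average 2nd line forwards are replaced by named players."""
    average_forwards = BASE_COUNT - len(star_cap_hits)
    return (average_forwards * AVG_SECOND_LINE + sum(star_cap_hits)
            + N_DEFENSEMEN * AVG_DEFENSIVE_D)


def max_affordable(star_cap_hit):
    """Largest number of players at `star_cap_hit` that keeps the roster under the cap."""
    n = 0
    while n < BASE_COUNT and roster_cost([star_cap_hit] * (n + 1)) <= CAP_MAX:
        n += 1
    return n


if __name__ == "__main__":
    print(round(roster_cost([]), 2))            # 61.68, the optimal roster
    print(round(roster_cost([7.20, 8.70]), 2))  # 69.4, with Krejci and Bergeron
    print(max_affordable(10.0))                 # 2
```

A third $10 million player would push the total to about $79.4 million, which is why the limit is two.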

3.7 Future Work

The clustering, regression, and optimization models can all be further improved to obtain more detailed recommendations. In the future, more seasons can be studied for the clustering and regression models; with more seasons, for example, the clusters may be more accurate. In addition, the regression model can be used to estimate a team’s potential point total based on its current roster. This would help GMs see which off-season transactions they need to make to increase their chances of winning. Using the regression model as a tool to improve teams is an area for future study.

Finally, it would be interesting to look at players on a granular level. In the model, the player types were aggregated for each team. In future studies, it would be beneficial to see the team point contribution for specific NHL players so that NHL GMs can sign players based on their specific contribution rather than their cluster type contribution. In doing this, the bi-criteria optimization model can be used to select specific players to make up the ideal team. This would be a valuable tool to have when GMs are deciding what players to trade for and sign during the free agency period in the off-season.

3.8 Conclusion

How do teams value player contribution? That is the question that keeps NHL GMs busy.

In this chapter, the answer to this question was researched. The k-means clustering method was used to cluster NHL players from four seasons into specific player groups. Forwards were divided into four groups and defensemen were divided into four groups. A key difference from Chan’s research is that CF% was used to cluster players. CF% is an advanced statistic that can help track puck possession, which is also an indicator for how well a player can control the pace of play.

The linear regression model based on the 2016-2017 NHL season indicated that top line forwards, 2nd line forwards, and defensive defensemen had the greatest impact on team performance. During the 2016-2017 season, a few teams, such as St. Louis and Calgary, overperformed according to the model, and their overperformance earned them a chance to compete for the Stanley Cup. Winnipeg, on the other hand, underperformed and missed the playoffs.

Finally, a bi-criteria optimization model was created to find the optimal number of players of each type to maximize team points and minimize the team’s salary cap hit. A team of all 2nd line forwards and defensive defensemen was found to be optimal. The model’s main limitation is that every player, even within the same cluster, earns a different salary and can make a different contribution to team points. This was not considered in the model but can be researched in future work. Nevertheless, incorporating the advanced CF% statistic into the model proved successful for finding undervalued and overvalued players.

Chapter 4: Conclusion

Can hockey be explained by numbers and models? Is there a way NHL GMs and coaches can leverage the immense amount of data collected on every team and every player in the league? The answer to these questions is yes. Baseball executives like Billy Beane (VP of the Oakland Athletics) proved that analytics has a place in sports, but hockey has been slow to welcome the analytics community. In recent years, however, researchers like Thomas, Chan, and Macdonald have demonstrated the value of the data collected on teams and players. GMs can, for example, make informed decisions about particular players and predict their worth to the team. GMs aim to reach the playoffs, and they want to earn just enough points to get there because in the NHL anything can happen in the playoffs.

In this thesis, two separate topics were considered: goal interarrival times and player contribution to team success. In Chapter 2, the interarrival times of every Penn State goal during the 2016-2017 NCAA season were analyzed, along with every goal the Pittsburgh Penguins scored during their 2015-2016 NHL campaign. The two teams’ interarrival times compared well, but both teams had periods of inefficiency. The Penguins were analyzed for two reasons. First, they won the Stanley Cup, so it was interesting to determine the strategies a championship team employs. Second, they fired their coach Mike Johnston and hired Mike Sullivan after 28 games. Further study using the linear regression model based on Goals For (GF) and Goals Against (GA) indicated that the Penguins were on track to miss the playoffs under Johnston. Hiring Mike Sullivan proved to be a clever move: not only did the Penguins turn their season around, but they also went on to win the Stanley Cup.

Chapter 3 shifted focus to individual player contribution. Players from four NHL seasons were clustered using a series of hockey statistics, and, unlike in Chan’s research, CF% was incorporated into the clustering. The CF% was used to find undervalued players in the league for each season. From there, a linear regression model was created to quantify the contribution of each player type to team success. It was determined that the elite offensive players (top line forwards and 2nd line forwards) contribute the most to their team. However, defensive defensemen are the most cost-efficient way for teams to gain additional points, as they had the lowest cap hit per team point contributed. Finally, a bi-criteria optimization model showed that a team consisting of all 2nd line forwards and all defensive defensemen would maximize team points while minimizing the team’s total salary cap hit.

The GF and GA regression model created in Chapter 2 (Table 8) and the player contribution model from Chapter 3 (Table 24) both did a fairly good job of predicting the total points for the 2016-2017 Penguins. The Penguins scored 278 goals and allowed 229 goals in the 2016-2017 season (www.hockey-reference.com). Their predicted point total was 108.65 points using the Chapter 2 regression and 103.5 points using the Chapter 3 regression. The two results are similar even though the models use different metrics. Both, however, underestimate the Penguins’ actual point total (111 points). Teams can use both models to predict their season point total and then make changes to improve the team.

Likewise, there is an old adage in hockey that defense wins championships (Compton 2014). This research suggests that the adage has some validity. The regression model from Chapter 2 indicated that each goal scored increases a team’s season point total by 0.3669 points, whereas each goal allowed decreases it by 0.4347 points. A goal against therefore has a larger effect than a goal scored, so a strong defense that prevents goals will help teams win. Likewise, the Chapter 3 player contribution regression showed that defensive defensemen contribute more to team success than any other type of defenseman. This result is consistent with the goal scoring regression from Chapter 2, as both highlight the importance of building a team around defensemen who can prevent goals from being scored.
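Assuming, as described above, that the Chapter 2 model is linear in GF and GA, the two coefficients translate directly into changes in a team’s predicted season points. A worked example using only the quoted coefficients:

$$\Delta \mathrm{PTS} = 0.3669\,\Delta \mathrm{GF} - 0.4347\,\Delta \mathrm{GA}$$

$$\Delta \mathrm{GF} = +10:\quad \Delta \mathrm{PTS} = 0.3669 \times 10 = 3.669$$

$$\Delta \mathrm{GA} = -10:\quad \Delta \mathrm{PTS} = -0.4347 \times (-10) = 4.347$$

Preventing 10 goals is therefore worth roughly 0.68 more points in the standings than scoring 10 additional goals, which is the asymmetry behind the adage.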

This research indicates that teams that can effectively prevent goals from being scored have a greater chance for success. NHL teams should build a strong defensive core first and then acquire elite forwards to complement their defense.

Finally, analytics has a lot of room to grow within hockey. Many teams are investing in analytics departments because they see the value in another tool to evaluate players. With this said, analytics should never be the sole source of evaluating players and teams. While the results in this thesis are beneficial, teams should never pick players based on pure analytics, but instead should ensure that players are good fits with the team, coach, and other players. With better data in the future, quantifying player effectiveness can become more holistic and account for outside factors not considered in this research. Nonetheless, hockey analytics will certainly grow in the future because teams are always looking to find that next great competitive edge.


BIBLIOGRAPHY

1. Chan, T. C., Cho, J. A., & Novati, D. C. (2012). Quantifying the Contribution of NHL Player Types to Team Performance. Interfaces, 42(2), 131-145. https://doi.org/10.1287/inte.1110.0612

2. Chiari, M. (2015, December 12). Mike Johnston Fired by Penguins: Latest Details, Comments, Reaction. Retrieved February 14, 2018, from http://bleacherreport.com/articles/2598361-mike-johnston-fired-by-penguins-latest-details-comments-reaction

3. Clements Omnisport, R. (2017, June 18). NHL sets salary cap for 2017-18 season at $75 million. Retrieved March 22, 2018, from http://www.sportingnews.com/nhl/news/nhl-sets-salary-cap-for-2017-18-season-at-75-million/1sd78h2kx12fz1ob2e8x3qyy00

4. Clipperton, J. (2017, December 8). NHL increases salary cap from $75M US to $78-82M for next season | CBC Sports. Retrieved March 15, 2018, from http://www.cbc.ca/sports/hockey/nhl/nhl-salary-cap-1.4439767

5. Compton, R. (2014, October 24). Economist examines what wins hockey games: Defence or offence. Retrieved March 29, 2018, from http://news.umanitoba.ca/economist-examines-what-wins-hockey-games-defence-or-offence/

6. Cowen, T., & Grier, K. (2011, December 9). The Economics of Moneyball. Retrieved March 18, 2018, from http://grantland.com/features/the-economics-moneyball/

7. Delignette-Muller, M. L., & Dutang, C. (2015). fitdistrplus: An R package for fitting distributions. Journal of Statistical Software, 64(4), 1-34. https://doi.org/10.18637/jss.v064.i04

8. Frost, J. (1970, March 22). The Graphical Benefits of Identifying the Distribution of Your Data. Retrieved February 22, 2018, from http://blog.minitab.com/blog/adventures-in-statistics-2/the-graphical-benefits-of-identifying-the-distribution-of-your-data

9. Interpret all statistics for a probability plot with Weibull fit. (2016). Retrieved February 22, 2018, from http://support.minitab.com/en-us/minitab-express/1/help-and-how-to/graphs/probability-plot/interpret-the-results/all-statistics/probability-plot-with-weibull-fit/#ad-value

10. Luszczyszyn, D. (2015, May 7). Why playoff puck possession does matter – even in a short series. Retrieved March 20, 2018, from http://www.thehockeynews.com/news/article/why-playoff-puck-possession-does-matter-even-in-a-short-series

11. Macdonald, B. (2012). An Expected Goals Model for Evaluating NHL Teams and Players. Proceedings of the 2012 MIT Sloan Sports Analytics Conference.

12. Masisak, C. (2015, October 5). Hockey advanced statistics: What is a Corsi number? Retrieved March 29, 2018, from http://www.sportingnews.com/nhl/news/what-is-a-corsi-number-explained-stats-nhl-hockey-advanced-statistics/9q6sdoe4l3o51jb3dm5lyjpxh

13. Parnass, A. (2015, February 22). Analytics, not statistics, driving NHL evolution. Retrieved January 25, 2018, from https://www.nhl.com/news/analytics-not-statistics-driving-nhlevolution/c-754099

14. P, R. (2013, November 8). Corsi Explained. Retrieved March 20, 2018, from https://www.jewelsfromthecrown.com/2013/11/8/5081592/corsi-explained

15. Rosen, D. (2017, June 1). Mike Sullivan's success as Penguins coach no surprise to mentor. Retrieved February 9, 2018, from https://www.nhl.com/news/mike-sullivans-success-as-penguins-coach-no-surprise-to-mentor/c-289707174

16. Swartz, T. B. (2017). Hockey Analytics. Wiley StatsRef: Statistics Reference Online, 1-10. https://doi.org/10.1002/9781118445112.stat07965

17. Thomas, A. C. (2007). Inter-arrival Times of Goals in Ice Hockey. Journal of Quantitative Analysis in Sports, 3(3), Article 5.

18. West, B. (2016, April 6). Penguins' surge under Sullivan comes with simple neutral-zone strategy. Retrieved February 15, 2018, from http://triblive.com/sports/penguins/10258742-74/plus-penguins-sullivan

19. Individual and team statistics collected from https://www.hockey-reference.com

20. NHL player data collected from https://frozenpool.dobbersports.com/frozenpool_report.php

21. NHL player salary cap data collected from https://www.capfriendly.com

22. Penn State scoring data collected from www.Gopsusports.com

23. Pittsburgh Penguins scoring data collected from https://www.hockey-reference.com/teams/PIT/2016_games.html

ACADEMIC VITA STEVEN M. BOLLENDORF

EDUCATION Bachelor of Science, Industrial Engineering The Pennsylvania State University, University Park, PA Graduation: May 2018 Schreyer Honors College

EXPERIENCE ExxonMobil, Houston, TX May 2017 – August 2017 U.S. Distribution Intern • Developed and implemented several data visualization tools using Tableau to reduce distribution costs • Effectively planned for implementation of cloud-based yard management systems in lubricant plants

Acme Corrugated Box, Hatboro, PA May 2016 – August 2016 Industrial Engineering Intern • Performed calculations and gathered data using Excel and C++ to implement a Dücker automatic conveyor line • Presented project findings to management and communicated with internal and external project members • Company implemented new conveyor system, using my calculations, in order to increase throughput efficiency

Undergraduate Researcher, Penn State Industrial Engineering September 2016 – May 2017 • Assisted in a graduate-level research study to identify the most efficient workplace exercise device

Undergraduate Researcher, ACURA August 2015 – May 2016 • Designed device that can test for the coefficient of friction between a person’s foot and a designated surface

LEADERSHIP & INVOLVEMENT Co-Chair, Gateway Orientation, Schreyer Honors College (May 2017- September 2017) • Organized and led a group of 240 new honors students into the honors college

Mentor, Change of Campus Leadership Conference (July 2017- August 2017) • Planned leadership conference and mentored Penn State change of campus students

Project Manager, ENACTUS, Student Business Organization (September 2014- May 2016) • Designed a cost-effective filament extruder that converts recycled plastic to 3-D printing filament • Devised a plan to track, transport, and organize business inventory for community-oriented businesses

Tutor, Penn State Learning Center (August 2015 – May 2016) • Tutored students in mathematics and physics; achieved improvement in student grades

Peer Assistant, Penn State Abington First Year Engagement (September 2015 – December 2015) • Presented information to a freshman mathematics class on campus resources, activities, and study skills

Member, Atlas, THON Special Interest Organization (August 2016- Present) Member, Institute of Industrial and Systems Engineers (IISE) (August 2016 - Present) Member, Hockey Management Association (August 2016 – May 2017)

SOFTWARE SKILLS Tableau Minitab 3-D printing MATLAB C++

HONORS & AWARDS Research Fellowship Grant President’s Freshman Award President Sparks Award David S. Rocchino Scholarship Harold and Inge Marcus IE Scholarship Wells Fargo Scholarship