THE PENNSYLVANIA STATE UNIVERSITY
SCHREYER HONORS COLLEGE

DEPARTMENT OF INDUSTRIAL AND MANUFACTURING ENGINEERING

HOCKEY ANALYTICS: PREDICTIVE MODELING OF TEAM AND PLAYER PERFORMANCE

STEVEN BOLLENDORF
SPRING 2018

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Industrial Engineering with honors in Industrial Engineering

Reviewed and approved* by the following:

Guodong (Gordon) Pang
Associate Professor
Harold and Inge Marcus Department of Industrial and Manufacturing Engineering
Thesis Supervisor

Catherine Harmonosky
Associate Professor and Associate Department Head of Harold and Inge Marcus Department of Industrial and Manufacturing Engineering
Honors Adviser

* Signatures are on file in the Schreyer Honors College.

ABSTRACT

Analytics in hockey is growing in popularity. Knowing which game strategies to implement and which players make a team more competitive is extremely valuable to coaches and general managers (GMs) of National Hockey League (NHL) teams. The goal of this applied research is to examine two aspects of the sport and produce findings that can inform in-game strategy and help NHL teams find the right players at reasonable salaries. This research looks at the 2016-2017 Pennsylvania State University hockey team and the 2015-2016 Pittsburgh Penguins to discover any scoring rate patterns that winning hockey teams possess. Likewise, a linear regression model based on a team’s Goals For (GF), or goals scored by a team, and Goals Against (GA), or goals scored against a team, predicts that GF contribute less to a team’s success than GA. In addition, data on every NHL player from four NHL seasons are used to cluster players into specific player types in order to predict their value to team success. The key clustering metric used is the Corsi For Percentage (CF%), which measures a player’s puck possession skill.
According to this research, elite forwards, second line forwards, and defensive defensemen provide the most value to a team. Lastly, specific teams during the 2016-2017 season are analyzed to determine whether they underperformed or overperformed relative to the model’s predicted team point total.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
Chapter 1: Introduction
    1.1 History and Evolution of Hockey Analytics
    1.2 Objectives
Chapter 2: Goal Interarrival Times
    2.1 Literature Review
    2.2 Introduction: Penn State Ice Hockey (2016-2017)
    2.3 Methodology: Penn State Ice Hockey
    2.4 Results: Penn State Ice Hockey
    2.5 Penn State Hockey Future Considerations
    2.6 Pittsburgh Penguins (2015-2016)
    2.7 Methodology: Pittsburgh Penguins
    2.8 Pittsburgh Penguins Descriptive Statistics
    2.9 NHL Goal Regression Model
    2.10 Pittsburgh Penguins’ Goal Interarrival Times
    2.11 Pittsburgh Penguins Future Considerations
    2.12 Conclusion
Chapter 3: Evaluating Player Contribution to Team Success
    3.1 Introduction
    3.2 Literature Review
    3.3 Methodology
        3.3.1 Data
        3.3.2 Clustering Model
        3.3.3 Player Contribution Linear Regression Model
    3.4 Results: Player Clusters
    3.5 Results: Regression Model
    3.6 Results: Bi-Criteria Optimization Model
        3.6.1 Sensitivity Analysis
    3.7 Future Work
    3.8 Conclusion
Chapter 4: Conclusion
BIBLIOGRAPHY

LIST OF FIGURES

Figure 1: Penn State Hockey Goal Frequency per Minute in Game
Figure 2: Penn State Interarrival Goal Times
Figure 3: R fitting output of interarrival goal time data
Figure 4: Penn State Weibull distribution plot highlighting the percent of interarrival times between 300 and 600 seconds (5-10 min)
Figure 5: Empirical CDF for Penn State highlighting the 2.5, 5, and 97.5 percentiles of interarrival goal data
Figure 6: Pittsburgh Penguins’ goals per minute in game for 2015-16 season
Figure 7: Pittsburgh Penguins’ goals per minute in game under Coach Mike Johnston
Figure 8: Pittsburgh Penguins’ goals per minute in game under Coach Mike Sullivan
Figure 9: Distribution fitting for Pittsburgh Penguins’ interarrival goal data (R Output)
Figure 10: Pittsburgh Penguins’ Weibull distribution plot highlighting the percent of goal interarrival times between 5-10 minutes
Figure 11: Pittsburgh Penguins’ Weibull distribution plot highlighting the 25th percentile of goal interarrival times
Figure 12: Empirical CDF for Penguins highlighting the 2.5, 5, and 97.5 percentiles of interarrival goal data
Figure 13: 2013-14 forwards cluster matrix plot (Sidney Crosby)
Figure 14: 2013-14 forwards cluster matrix plot (Tyler Toffoli)
Figure 15: 2013-14 defensemen cluster matrix plot (John Carlson)
Figure 16: 2013-14 defensemen cluster matrix plot (Jake Muzzin)
Figure 17: 2014-15 defensemen cluster matrix plot (Jake Muzzin)
Figure 18: 2014-15 defensemen cluster matrix plot (Kris Letang)
Figure 19: 2014-15 defensemen cluster matrix plot (Kevin Shattenkirk)
Figure 20: 2015-16 forwards cluster matrix plot (Nikita Kucherov)
Figure 21: 2016-17 forwards cluster matrix plot (Nikita Kucherov)

LIST OF TABLES

Table 1: Basic statistics on goal interarrival times in minutes and seconds
Table 2: Weibull distribution fitting results for Penn State’s goal interarrival times
Table 3: Penguins’ regulation goals under Mike Johnston (28 games)
Table 4: Penguins’ regulation goals under Mike Sullivan (54 games)