<<

Clustering and Analyzing 5v5 NHL Location Data Twitter: @BrendanKumagai Website: kumihockey.com Brendan Kumagai, McMaster University / University of Email: [email protected]

Results Introduction Key Facts Problem: Heatmap-style visualizations are a common and intuitive way to display patterns How do you read this? 10 clusters in shot location data for individual players or teams. However, analyzing these shot maps The tree diagram at the top becomes exponentially more difficult as the number of players/teams increases. For represents how the players 3 subgroups example of this issue, zoom in on the shot heatmaps on the right, now try imagine were split with Ward's linkage Home plate forwards analyzing these and about 360 more players at once. That will get difficult really quickly. hierarchical clustering, resulting Perimeter forwards in 10 colour coded clusters. Defencemen Solution: I propose a framework for grouping NHL players by 5v5 shot location data The heatmap below is a Results separate: 388x388 comparison of the through fitting a shot density polygrid for each player and performing Ward's linkage Perimeter and home plate shooters players' shot maps by Euclidean hierarchical clustering [1] as outlined in the Methodology section. "Dynamic" D-men who produce more shots distance, where dark blue deep in the zone from "static" D-men who Once I have the hierarchical clustering results, I will analyze the composition of each represents very similar while are very highly concentrated in their corner. cluster and compare differences between clusters to search for any interesting skyblue/white represents very different. See "Cluster Composition Observations" observations and patterns in NHL 5v5 shot maps. for more interesting trends.

Methodology Home Plate Forward Clusters F1: Heavy Low Netfront F2: Moderate Netfront + Slot F3: Heavy Netfront + Slot

1 Obtain and prepare NHL 5v5 shot location data 49 Players 37 Players 20 Players

Shot location data obtained from evolving-hockey.com [2]. Top 3 in Goals/60 Top 3 in Goals/60 Top 3 in Goals/60 All unblocked shots 1. C. McDavid (1.19) 1. M. Stone (0.99) 1. J. van Riemsdyk (1.14) 2. J. Guentzel (1.14) 2. E. Malkin (0.98) 2. B. (1.05) At 5v5 Density vs Forward Average 3. B. Tkachuk (1.01) 3. S. Couturier (0.89) 3. A. Lee (0.98) 3 NHL regular seasons (2017-18 to 2019-20) Perimeter Forward Clusters I will be analyzing the 388 players with at least 300 5v5 shots in this dataset. F4: Left-Leaning Perimeter F5: Right-Leaning Perimeter F6: Heavy Perimeter

74 Players 53 Players 30 Players

2 Create a shot density polygrid for each player Top 3 in Goals/60 Top 3 in Goals/60 Top 3 in Goals/60 1. A. Matthews (1.53) 1. A. Ovechkin (1.41) 1. N. Kucherov (1.25) 2. J. Vrana (1.20) 2. B. Gallagher (1.32) 2. V. Tarasenko (1.09) 2D Gaussian kernel density estimation 3. B. Marchand (1.15) 3. V. Arvidsson (1.30) 3. N. Ehlers (1.03) with bandwidth, σ = 2. Defence Clusters

This polygrid is 25x25 with 30 grid D1/D3: Static Shot Location Defence D2/D4: Dynamic Shot Location Defence squares in each corner assumed to be 18 (L) and 33 (R) Players 32 (L) and 42 (R) Players noise and removed from analysis. (565 grid squares in total) Top 3 in Goals/60 Top 3 in Goals/60 1. S. Weber (0.48) [R] 1. D. Hamilton (0.54) [R] 3 Group players based on their shot maps with hierarchical clustering 2. M. Matheson (0.32) [L] 2. V. Hedman (0.47) [L] Density vs Defence Average 3. P. Subban (0.31) [R] 3. J. Carlson (0.42) [R]

Cluster Composition Observations Clusters that are closer Perimeter players Perform Ward's linkage hierarchical to the net have a higher are on average only clustering based on Euclidean distance. expected unblocked 1 inch shorter than shooting %, but luck, netfront players. Cluster F3 shooting talent, and ... other underlying factors Indicating that size reduces the difference may help but is a not What is hierarchical clustering? What is Ward's linkage? Cluster D1 and adds variability in a requirement for getting consistent Ward's linkage is a method of measuring the similarity, or "linkage", between actual unblocked Hierarchical clustering is an two clusters in hierarchical clustering. home plate chances. algorithm that groups together shooting %. objects (players, in this case) based Before using Ward's linkage, I define the (Euclidean) distance between player x and player y as on their features (shot density where and represents the jth grid square density for player x and y, respectively. polygrid squares). Ward's linkage determines which two clusters will lead to the minimal increase in the sum of squared distances between each player and the mean value of the cluster when you form these two clusters together. Algorithm: Determining the best two clusters to link via Ward's linkage: 1.Start with each player in his own 1.Find the mean value of each cluster. The mean of cluster k is where are the vectors of grid Main Takeaways cluster (388 clusters). square densities for all players in the cluster. Through fitting a density polygrid to each player's shot location data and using hierarchical clustering, I identify 10 2.Link the two most similar 2.For each cluster, calculate the distance between the cluster mean and each player, then square these values and sum them up. This is called the "sum of squared distances of the cluster and is defined as for cluster k. clusters of players based on their shooting patterns. clusters. 3.For each pair of clusters, assume they are a single cluster and repeat Steps 1 and 2. 3.Repeat 1 & 2 until you reach the These clusters can be separated into subgroups of perimeter forwards, home plate forwards and defencemen. 4.For each pair of clusters, take the sum of squared distances value for the merged cluster from step 3 and subtract the Furthermore, defencemen can be divided into dynamic and static shooting patterns. desired number of clusters. individual sum of squared distances values found in Step 2. This is written as 5.Select the pair of clusters that has the minimum value from Step 4. Mathematically, this is expressed as The distribution of any statistic, metric or value among players in a cluster can be determined and compared to draw meaningful insights.

References Future Work [1] Ward, J.H., 1963. Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, 236–244 Apply this framework to any other sports location data. [2] Younggren, J. & Younggren, L. NHL PBP Query. Retrieved August 20, 2020. https://www.evolving-hockey.com Continue to explore other high-dimensional clustering techniques for this problem. [3] Yu, D., Boucher, C., Bornn, L., & Javan, M. Playing Fast Not Loose: Evaluating team-level pace of play in using spatio-temporal possession data. Develop "shot map similarities" comparing each individual pair of players with some similarity measurement arXiv preprint arXiv:1902.02020 (2019). Appendix: Cluster Results

F1: Heavy Low Net F2: Moderate Netfront + Slot F3: Heavy Netfront + Slot F4: Left-Leaning Perimeter F5: Right-Leaning Perimeter Marcus Foligno J.T. Miller Connor Brown Mats Zuccarello * * Tyler Johnson Jake Guentzel Ondrej Kase Connor McDavid Colton Sissons Conor Sheary Jakub Voracek Milan Lucic Derek Ryan * Tomas Tatar Jesper Bratt Kevin Labanc Trevor Lewis Jordan Martinook Chris Tierney Blake Coleman David Pastrnak Dustin Brown Tobias Rieder* Tomas Hertl Michael Grabner Jaden Schwartz Richard Panik Mikko Rantanen* Craig Smith Cam Atkinson* Viktor Arvidsson Oliver Bjorkstrand Zack Kassian Anze Kopitar Joonas Donskoi Patric Hornqvist * Calle Jarnkrok Joe Pavelski P.E. Bellemare * Antoine Roussel Ryan Carpenter Ryan Nugent-Hopkins Noel Acciari Brendan Perlini Timo Meier William Carrier T.J. Oshie Jake Debrusk Alex Iafallo Warren Foegele Artturi Lehkonen Brock McGinn Brayden Point Nathan MacKinnon Kyle Clifford* Sebastian Aho Nick Cousins * Justin Williams* Tomas Nosek Sam Gagner Ryan O'Reilly Ondrej Palat** Yanni Gourde** Alex Ovechkin Jeff Carter Filip Chytil Bryan Rust Tyson Jost* ** Bobby Ryan David Krejci Jakub Vrana Evgeny Dadonov Alex Steen Pavel Buchnevich Carl Soderberg Chris Wagner Alex Kerfoot Jean-Gabriel Pageau ** Boone Jenner Josh Anderson Pierre-Luc Dubois Andreas Athanasiou Ryan Hartman Kyle Palmieri Kevin Fiala Mathew Barzal Christian Dvorak Zack Smith Cody Eakin Mark Jankowski* Brad Richardson* Nicklas Backstrom Dominik Simon Christian Fischer* Andrei Svechnikov Alex Radulov Tyler Bertuzzi Mattias Janmark Kasperi Kapanen Vinnie Hinostroza

F6: Heavy Perimeter D1: Static Shooting Right D D2: Dynamic Shooting Right D D3: Static Shooting Left D D4: Dynamic Shooting Left D Teuvo Teravainen Brent Seabrook * Michael Frolik Olli Maatta * Brian Dumoulin Jake Gardiner Patrik Laine Brandon Montour Vince Dunn * Oscar Klefbom T.J. Brodie Nikita Zadorov Dylan Demelo Justin Braun David Savard Alex Edler Ivan Provorov Alex Goligoski Sean Kuraly P.K. Subban Matt Benning Darnell Nurse Ben Hutton Robert Hagg Nick Holden Brandon Carlo Charlie McAvoy Samuel Girard J.T. Compher Rasmus Ristolainen Andre Burakovsky Tony Deangelo Zdeno Chara Marc-Edouard Vlasic Markus Nutivaara Mike Hoffman Frans Nielsen Colin Miller Nate Schmidt Brett Kulak * Jordan Oesterle Darren Helm Anton Stralman Damon Severson Brady Skjei Alex Debrincat Scott Mayfield Ben Chiarot Ryan McDonagh Miro Heiskanen Cody Ceci* Michal Kempny Jakob Chychrun Dmitry Orlov Adrian Kempe Deryk Engelland Brett Pesce Oliver Ekman-Larsson Travis Sanheim Vincent Trocheck John Carlson Jake Muzzin Adam Pelech Nikita Kucherov Mike Green Jack Johnson Jared McCann Keith Yandle * Alex Biega Brayden McNabb Joakim Nordstrom Connor Murphy Sami Vatanen Andy Greene Braydon Coburn*

* = Player is not the best fit in their given cluster by my "eye test" on their individual shot map

** = A group of Tampa Bay forwards who ended up in the "Left-Leaning Perimeter" cluster, but had an abundance of shots juuust slightly above the "Heavy Netfront + Slot" cluster, pushing them out into a perimeter cluster.