Clustering and Analyzing 5V5 NHL Shot Location Data
Total Page:16
File Type:pdf, Size:1020Kb
Clustering and Analyzing 5v5 NHL Shot Location Data Twitter: @BrendanKumagai Website: kumihockey.com Brendan Kumagai, McMaster University / University of Toronto Email: [email protected] Results Introduction Key Facts Problem: Heatmap-style visualizations are a common and intuitive way to display patterns How do you read this? 10 clusters in shot location data for individual players or teams. However, analyzing these shot maps The tree diagram at the top becomes exponentially more difficult as the number of players/teams increases. For represents how the players 3 subgroups example of this issue, zoom in on the shot heatmaps on the right, now try imagine were split with Ward's linkage Home plate forwards analyzing these and about 360 more players at once. That will get difficult really quickly. hierarchical clustering, resulting Perimeter forwards in 10 colour coded clusters. Defencemen Solution: I propose a framework for grouping NHL players by 5v5 shot location data The heatmap below is a Results separate: 388x388 comparison of the through fitting a shot density polygrid for each player and performing Ward's linkage Perimeter and home plate shooters players' shot maps by Euclidean hierarchical clustering [1] as outlined in the Methodology section. "Dynamic" D-men who produce more shots distance, where dark blue deep in the zone from "static" D-men who Once I have the hierarchical clustering results, I will analyze the composition of each represents very similar while are very highly concentrated in their corner. cluster and compare differences between clusters to search for any interesting skyblue/white represents very different. See "Cluster Composition Observations" observations and patterns in NHL 5v5 shot maps. for more interesting trends. Methodology Home Plate Forward Clusters F1: Heavy Low Netfront F2: Moderate Netfront + Slot F3: Heavy Netfront + Slot 1 Obtain and prepare NHL 5v5 shot location data 49 Players 37 Players 20 Players Shot location data obtained from evolving-hockey.com [2]. Top 3 in Goals/60 Top 3 in Goals/60 Top 3 in Goals/60 All unblocked shots 1. C. McDavid (1.19) 1. M. Stone (0.99) 1. J. van Riemsdyk (1.14) 2. J. Guentzel (1.14) 2. E. Malkin (0.98) 2. B. Point (1.05) At 5v5 Density vs Forward Average 3. B. Tkachuk (1.01) 3. S. Couturier (0.89) 3. A. Lee (0.98) 3 NHL regular seasons (2017-18 to 2019-20) Perimeter Forward Clusters I will be analyzing the 388 players with at least 300 5v5 shots in this dataset. F4: Left-Leaning Perimeter F5: Right-Leaning Perimeter F6: Heavy Perimeter 74 Players 53 Players 30 Players 2 Create a shot density polygrid for each player Top 3 in Goals/60 Top 3 in Goals/60 Top 3 in Goals/60 1. A. Matthews (1.53) 1. A. Ovechkin (1.41) 1. N. Kucherov (1.25) 2. J. Vrana (1.20) 2. B. Gallagher (1.32) 2. V. Tarasenko (1.09) 2D Gaussian kernel density estimation 3. B. Marchand (1.15) 3. V. Arvidsson (1.30) 3. N. Ehlers (1.03) with bandwidth, σ = 2. Defence Clusters This polygrid is 25x25 with 30 grid D1/D3: Static Shot Location Defence D2/D4: Dynamic Shot Location Defence squares in each corner assumed to be 18 (L) and 33 (R) Players 32 (L) and 42 (R) Players noise and removed from analysis. (565 grid squares in total) Top 3 in Goals/60 Top 3 in Goals/60 1. S. Weber (0.48) [R] 1. D. Hamilton (0.54) [R] 3 Group players based on their shot maps with hierarchical clustering 2. M. Matheson (0.32) [L] 2. V. Hedman (0.47) [L] Density vs Defence Average 3. P. Subban (0.31) [R] 3. J. Carlson (0.42) [R] Cluster Composition Observations Clusters that are closer Perimeter players Perform Ward's linkage hierarchical to the net have a higher are on average only clustering based on Euclidean distance. expected unblocked 1 inch shorter than shooting %, but luck, netfront players. Cluster F3 shooting talent, and ... other underlying factors Indicating that size reduces the difference may help but is a not What is hierarchical clustering? What is Ward's linkage? Cluster D1 and adds variability in a requirement for getting consistent Ward's linkage is a method of measuring the similarity, or "linkage", between actual unblocked Hierarchical clustering is an two clusters in hierarchical clustering. home plate chances. algorithm that groups together shooting %. objects (players, in this case) based Before using Ward's linkage, I define the (Euclidean) distance between player x and player y as on their features (shot density where and represents the jth grid square density for player x and y, respectively. polygrid squares). Ward's linkage determines which two clusters will lead to the minimal increase in the sum of squared distances between each player and the mean value of the cluster when you form these two clusters together. Algorithm: Determining the best two clusters to link via Ward's linkage: 1.Start with each player in his own 1.Find the mean value of each cluster. The mean of cluster k is where are the vectors of grid Main Takeaways cluster (388 clusters). square densities for all players in the cluster. Through fitting a density polygrid to each player's shot location data and using hierarchical clustering, I identify 10 2.Link the two most similar 2.For each cluster, calculate the distance between the cluster mean and each player, then square these values and sum them up. This is called the "sum of squared distances of the cluster and is defined as for cluster k. clusters of players based on their shooting patterns. clusters. 3.For each pair of clusters, assume they are a single cluster and repeat Steps 1 and 2. 3.Repeat 1 & 2 until you reach the These clusters can be separated into subgroups of perimeter forwards, home plate forwards and defencemen. 4.For each pair of clusters, take the sum of squared distances value for the merged cluster from step 3 and subtract the Furthermore, defencemen can be divided into dynamic and static shooting patterns. desired number of clusters. individual sum of squared distances values found in Step 2. This is written as 5.Select the pair of clusters that has the minimum value from Step 4. Mathematically, this is expressed as The distribution of any statistic, metric or value among players in a cluster can be determined and compared to draw meaningful insights. References Future Work [1] Ward, J.H., 1963. Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association, 58, 236–244 Apply this framework to any other sports location data. [2] Younggren, J. & Younggren, L. NHL PBP Query. Retrieved August 20, 2020. https://www.evolving-hockey.com Continue to explore other high-dimensional clustering techniques for this problem. [3] Yu, D., Boucher, C., Bornn, L., & Javan, M. Playing Fast Not Loose: Evaluating team-level pace of play in ice hockey using spatio-temporal possession data. Develop "shot map similarities" comparing each individual pair of players with some similarity measurement arXiv preprint arXiv:1902.02020 (2019). Appendix: Cluster Results F1: Heavy Low Net F2: Moderate Netfront + Slot F3: Heavy Netfront + Slot F4: Left-Leaning Perimeter F5: Right-Leaning Perimeter Andrew Copp Marcus Foligno Evgeni Malkin Tyler Bozak Patrick Marleau Nazem Kadri Mikael Granlund J.T. Miller Connor Brown Jakob Silfverberg Auston Matthews Riley Sheahan Mats Zuccarello Paul Stastny Artem Anisimov Oskar Sundqvist Stefan Noesen James Van Riemsdyk William Nylander* Rickard Rakell* Jake Guentzel Brandon Saad Sidney Crosby Mathieu Perreault Anthony Mantha Tyler Johnson Travis Konecny Ondrej Kase Sam Reinhart Mark Scheifele Connor McDavid Anthony Duclair Ryan Strome Colton Sissons Conor Sheary Mikko Koivu Taylor Hall Jakub Voracek Tyler Toffoli Sean Monahan Milan Lucic Nick Ritchie Micheal Ferland Derek Ryan Brayden Schenn* Tomas Tatar Jesper Bratt Kevin Labanc Trevor Lewis Patrick Maroon Jordan Martinook Chris Tierney Jonathan Toews Zemgus Girgensons Carl Hagelin Jason Zucker Blake Coleman David Pastrnak Dustin Brown Craig Smith Cam Atkinson* Matthew Tkachuk Tobias Rieder* Sean Couturier Tomas Hertl Michael Grabner Jaden Schwartz Richard Panik Mikko Rantanen* Viktor Arvidsson Oliver Bjorkstrand Zack Kassian Anze Kopitar Joonas Donskoi Patric Hornqvist Johnny Gaudreau Andrew Cogliano* Jeff Skinner Anders Lee Calle Jarnkrok Nick Bjugstad Scott Laughton Matt Calvert Joe Pavelski Tom Wilson Mikael Backlund Max Domi P.E. Bellemare Brock Nelson Brendan Gallagher* Colton Sceviour Nick Bonino Antoine Roussel Wayne Simmonds Ryan Carpenter Leon Draisaitl Clayton Keller Ryan Nugent-Hopkins Kyle Okposo Tyler Pitlick Evander Kane Jamie Benn Ryan Johansen Noel Acciari Josh Bailey Logan Couture Brendan Perlini Timo Meier Jason Pominville Jason Spezza Phillip Danault William Carrier T.J. Oshie Anthony Beauvillier Jake Debrusk Alex Iafallo Brad Marchand Warren Foegele Jack Eichel Jonathan Marchessault Artturi Lehkonen Brock McGinn Mark Stone Filip Forsberg Tanner Pearson Miles Wood Brayden Point Nathan MacKinnon Tyler Seguin Chris Kreider Sebastian Aho Ryan Dzingel Jonathan Drouin Kyle Clifford* Brandon Tanev Radek Faksa Blake Comeau Elias Lindholm Max Pacioretty John Tavares Lawson Crouse Gabriel Landeskog Nick Cousins Luke Glendening Mika Zibanejad* Justin Williams* Zach Hyman Alex Galchenyuk Aleksander Barkov Danton Heinen Matt Duchene Tomas Nosek Nino Niederreiter Brett Connolly Sam Gagner Kevin Hayes Sam Bennett Charlie Coyle Vladislav Namestnikov Ryan O'Reilly Ondrej Palat** Yanni Gourde** Alex Ovechkin Jeff Carter Filip Chytil Bo Horvat Bryan Rust Nico Hischier Tyson Jost* Alex Killorn** Kyle Connor Bobby Ryan David Krejci Jakub