Shot Quality 2005-06

Revisiting NHL shot quality for the 2005-06 Regular Season

Ken Krzywicki – October 2006

Abstract

This study revisits previous analyses of NHL shot quality. A logistic regression model was built on 2005-06 regular season data to predict the probability that a shot on goal goes in.

More data elements were available for study in 2005-06 than in 2003-04. For example, the NHL published take-aways, give-aways, hits and missed shots, none of which were published in 2003-04. New rules, such as the elimination of the center red line for offside passes, the goalie “forbidden zone,” redrawn zone dimensions and tighter officiating standards, made for a different-looking game than in the past. Far more power play opportunities arose this past season, and goals per game were up.

A logistic regression model was chosen to fit the 2005-06 regular season data, with a binary target outcome variable of goal versus save. Five predictor variables (distance, rebound, situation, shot after opponent turnover and shot type) remained in the final model. Each shot on goal was then assigned a predicted probability of going in, i.e., shot quality for. One minus this value was the predicted probability of a save (shot quality against). The model fit the data well, as demonstrated below. Certain inferences about actual performance compared to predicted performance are made herein.

Background

Before building this new model, we examined a model built using 2003-04 regular season data [1] and found it to be rather predictive when applied to 2005-06. However, because of the additional data elements available and the changes made to the game, a new model was constructed. This might seem contrary to the reasons given in the paper Playoff Shot Quality – Examining 2003-04 NHL Playoff Shot Quality Using a Regular Season Model [2] for not rebuilding the model based on playoff data, where scoring was down.
The reason given at the time for not redeveloping the model was that the regular season algorithm fit the playoff data. We still stand by that reasoning, but wish to add that the available playoff data had only 389 goals on 4,816 shots. The low number of goals would have made redevelopment of the model difficult at best; it would have been hard to construct a robust model that did not overfit a training dataset. We have, nonetheless, recently attempted to fit a new model to the 2003-04 playoff data and found this to be the case [3]. Clumping of scores around a certain value, lack of a smooth distribution of predicted probabilities and overfitting issues were encountered.

The original 2003-04 model also fit the 2005-06 data, but for the reasons cited above (changes in the rules and zone dimensions, and the availability of additional data elements) we decided to rebuild the model using current data. While the new model was similar, it availed itself of an extra variable that was not previously obtainable (shot after a turnover) and, as we shall see further on, the trend for short-handed shots was opposite that from 2003-04.

[1] Krzywicki – January 2005
[2] Krzywicki – November 2005

2005-06 Regular Season Data

The data used for the model was collected from the NHL play-by-play (PBP) and game summary (GS) files, both of which are available at www.nhl.com. The PBP and GS files were generated by the Real Time Scoring System (RTSS) and provided information on the events that occurred throughout the games. This information formed the building blocks for this study. Some files were not available and others stopped short of recording the full game, so the data did not tie out to the year-end figures published by the League; any differences were statistically immaterial to this study. We had 7,426 goals on 73,570 shots for this analysis. All figures presented herein were extracted from this data.
[3] Bootstrapping techniques were not attempted.

Methodology

A binary target variable was created with a value of 1 for a goal and 0 for a save. The data was randomly split 75%/25% into a training (development) sample and a validation sample. Potential predictor variables were classed, or “binned,” and considered for model inclusion. Variables considered for the model included:

• Distance in feet
• Shot type
• Situation
  o Even strength, short-handed or power play
• Period
• Rebound [4]
  o A shot within two seconds of another shot with a distance less than 25 feet and no intervening event
• Own rebound
  o Rebound shot, as defined above, taken by the same player as prior shot
• Shot after face-off win
  o A shot where the shooting team won a face-off, taken within 5 seconds, with no intervening event
• Shot after opponent shot block
  o A shot after the opponent blocks a shot with no intervening event
• Shot after blocking an opponent’s shot
  o A shot after shooting team blocks an opponent’s shot with no intervening event
• Shot after take-away
  o A shot after shooting team records a take-away with no intervening event
• Shot after opponent give-away
  o A shot after opponent records a give-away with no intervening event
• Shot after either type of turnover (defined above)
• Shot after a missed shot
  o A shot after shooting team misses a shot with no intervening event

A logistic regression was constructed on the 75% training sample and validated on the 25% that was held out. This was done to ensure that the model did not overfit the development set. Only statistically significant variables remained in the final model.
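As a rough illustration of how a derived flag such as the rebound indicator could be computed from a time-ordered event list, here is a minimal Python sketch. The `Event` record, its field names and the event kind strings are hypothetical and are not the RTSS file layout. The sketch also assumes that the 25-foot condition applies to the second shot, that both shots come from the same team, and that "within two seconds" is inclusive; the definition above does not settle these points.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    """Hypothetical play-by-play record (not the actual RTSS layout)."""
    game_seconds: int          # elapsed game time of the event
    team: str                  # team credited with the event
    kind: str                  # e.g. "SHOT" (saved shot on goal), "GOAL", "HIT"
    distance_ft: float = 0.0   # shot distance; unused for non-shot events


def flag_rebounds(events: List[Event]) -> List[bool]:
    """Return one flag per event; True marks a shot on goal treated as a rebound.

    Assumed reading of the definition: the immediately preceding event is a
    saved shot by the same team, at most 2 seconds earlier, and the current
    shot is from less than 25 feet. Checking only the immediately preceding
    event enforces the "no intervening event" condition.
    """
    flags = []
    prev = None
    for ev in events:
        is_rebound = (
            ev.kind in ("SHOT", "GOAL")
            and prev is not None
            and prev.kind == "SHOT"
            and prev.team == ev.team
            and ev.game_seconds - prev.game_seconds <= 2
            and ev.distance_ft < 25
        )
        flags.append(is_rebound)
        prev = ev
    return flags
```

The other "shot after ..." variables in the list follow the same pattern: look at the immediately preceding event, check its type and team, and apply a time limit where the definition gives one.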
[4] As defined by Ryder in Shot Quality – A Method for the Study of the Quality of a Hockey Team’s Shots Allowed [January 2004]

Model Results

Five variables from the list above, plus an intercept term, remained in the final model; the others were either too highly correlated with these or insignificant.

Variable             Range                Points     Marginal Contribution
Intercept            Add to all records   -2.0671
Distance             Less than 12 ft       0.5718    0.0484
                     12 ft                 0.5221
                     13 - 16 ft            0.4856
                     17 - 18 ft            0.3464
                     19 - 21 ft            0.2699
                     22 - 32 ft            0.0000
                     33 - 35 ft           -0.4455
                     36 - 37 ft           -0.5130
                     38 - 40 ft           -0.7515
                     41 - 44 ft           -0.8876
                     45 - 52 ft           -0.9855
                     53 - 59 ft           -1.0885
                     60 ft or more        -1.2802
Rebound              Yes                   1.3382    0.0198
                     No                   -0.0743
Situation            Even Strength        -0.1542    0.0091
                     Short-Handed         -0.0582
                     Power Play            0.3702
Shot after turnover  Yes                   0.3917    0.0028
                     No                   -0.0428
Shot Type            Wrap or Slap         -0.0815    0.0007
                     Wrist                 0.0127
                     Backhand              0.0227
                     Snap                  0.0289
                     Tip-In                0.1744

Table 1: Model Scorecard, Sorted by Marginal Contribution of Each Variable

Shot quality was defined by the model score, or predicted probability of a goal, using Table 1 as follows:

$$P(\mathrm{GOAL}) = \frac{1}{1 + e^{-\mathrm{points}}} \qquad (1)$$

Here, points is the intercept plus the points for the one applicable range of each variable. For example, an even-strength wrist shot from 15 ft that is neither a rebound nor a shot after a turnover scores -2.0671 + 0.4856 - 0.0743 - 0.1542 - 0.0428 + 0.0127 = -1.8401 points, which gives P(GOAL) of about 0.137.

The marginal contribution [5] of shot type was not very high, but since it added value to the model, was statistically significant and made sense, it remained. Shot distance and rebound contributed most to the model, as evidenced by their relatively higher marginal contributions.

An interesting difference in the point assignments for situation versus the 2003-04 model was for short-handed shots. In 2003-04, short-handed shots had a positive trend, i.e., they were of higher quality. This was attributed to the fact that most short-handed shots on goal were probably on breakaways or odd-man rushes. This likely held true for 2005-06 as well, but with the introduction of the shootout, goalies started practicing breakaways more often.
This might explain the lower quality status of short-handed shots. That said, the point assignment was only -0.0582, which, while significant, was relatively weak. This was another reason for not simply using the 2003-04 model; we wished to minimize misclassification, which would result from short-handed shots receiving positive points when that clearly was not the trend during the 2005-06 regular season. Other variables that were common to both models [6] exhibited directionally similar trends.

To show that the model did not overfit the training data, we want the Kolmogorov-Smirnov (KS) statistic [7] for the development and validation datasets to be close, and it was: a difference of 1.51 KS points, or 4.6%, was observed. Details are shown in Table 2 below:

[5] Reduction in max-rescaled R² when variable removed from model, i.e., full model R² minus R² of model without variable in question.
[6] All variables except shot after turnover were also in the original model.
[7] See Glossary for definition.

KS Report for Training Sample

Pred Probability (%)  Totals  Cuml %   Save   Int Rate  Cuml %   Goal   Int Rate  Cuml %   KS     Avg Scr (%)
19.64 - 100.00        5,560   10.09%   4,044  72.73%    8.16%    1,516  27.27%    27.38%   19.22  28.17
14.35 - 19.63         5,454   19.98%   4,509  82.67%    17.25%   945    17.33%    44.45%   27.20  16.24
13.09 - 14.34         5,615   30.17%   4,870  86.73%    27.07%   745    13.27%    57.90%   30.83  13.80
9.04 - 13.08          5,271   39.73%   4,655  88.31%    36.46%   616    11.69%    69.03%   32.56  11.60
8.18 - 9.03           5,554   49.81%   5,043  90.80%    46.63%   511    9.20%     78.26%   31.62  8.90
5.54 - 8.17           5,396   59.60%   5,012  92.88%    56.74%   384    7.12%     85.19%   28.45  6.67
4.47 - 5.53           5,888   70.28%   5,547  94.21%    67.93%   341    5.79%     91.35%   23.42  5.11
3.53 - 4.46           5,844   80.89%   5,633  96.39%    79.29%   211    3.61%     95.16%   15.87  3.91
3.10 - 3.52           5,122   90.18%   4,993  97.48%    89.36%   129    2.52%     97.49%   8.13   3.30
0.00 - 3.09           5,412   100.00%  5,273  97.43%    100.00%  139    2.57%     100.00%  0.00   2.69
Total                 55,116           49,579 89.95%
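The KS column of the training-sample report above can be reproduced from the save and goal counts in each score band. The Python sketch below assumes that KS is computed per band as the cumulative percentage of goals minus the cumulative percentage of saves, with bands ordered from highest to lowest predicted probability; that reading matches the printed values. The function name `ks_by_band` is illustrative only.

```python
from typing import List

# Goal and save counts per score band, copied from the KS report for the
# training sample, ordered from the highest-probability band to the lowest.
goals = [1516, 945, 745, 616, 511, 384, 341, 211, 129, 139]
saves = [4044, 4509, 4870, 4655, 5043, 5012, 5547, 5633, 4993, 5273]


def ks_by_band(goals: List[int], saves: List[int]) -> List[float]:
    """Cumulative % of goals minus cumulative % of saves, in percentage points."""
    total_goals = sum(goals)
    total_saves = sum(saves)
    cum_goals = 0
    cum_saves = 0
    separation = []
    for g, s in zip(goals, saves):
        cum_goals += g
        cum_saves += s
        separation.append(
            100.0 * (cum_goals / total_goals - cum_saves / total_saves)
        )
    return separation


ks = ks_by_band(goals, saves)
max_ks = max(ks)             # the KS statistic is the largest separation
max_band = ks.index(max_ks)  # zero-based index of the band where it occurs
```

With these counts the largest separation falls in the fourth band (9.04 to 13.08), in line with the 32.56 printed in the KS column. Running the same calculation on the validation sample and comparing the two maxima is the overfitting check described above.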
