Paper: Rugby Football Categorization of Rugby Union Players by Performance Characteristics Using Principal Component Analysis and Cluster Analysis Makoto Kiuchi*, Hirofumi Maehana** and Nobuyoshi Hirotsu*
*Graduate School of Health and Sports Science, Juntendo University 1-1 Hiragagakuendai, Inzai, Chiba 270-1695 Japan [email protected] **Department of Psychological Counseling Faculty of Human Sciences, Mejiro University [Received October 4, 2019 ; Accepted August 4, 2020]
The aim of this study was to categorize rugby players by performance characteristics irrespective of positions using principal component analysis (PCA) and cluster analysis (CA). Data were drawn from the Japan Rugby Top League 2015–2016 season; a sample of 231 players and 16 items were used in the analysis. We used PCA to identify the players’ ability and CA to categorize players’ performance characteristics, regardless of position. PCA reduced the 16 items to four principal components: penetration, defense and competition, ball handling and kicking, and turnovers. We categorized players into six clusters based on the four component scores: playmaker, ball carrier, tackler and supporter, competitor, passer and kicker, and spoiler. Although player categorization by CA was achieved regardless of position, the players’ performance characteristics also influenced categorization, and specific positions have performance characteristics. Unique players were identified for some positions. In addition, we clarified a difference in performance characteristics between Japanese and foreign players in the league. Since the categorization of players by performance characteristics irrespective of position enables teams to find necessary players, this study may be advantageous for team management and may contribute to more useful analysis of players and clarifying performance characteristics.
Keywords: game data, performance characteristics, player categorization, principal component analysis, cluster analysis
[Football Science Vol.17, 86-97, 2020]
1. Introduction because the players who can perform the role on the team could be identified. This means that a team that In rugby, players with varied performances is performs an open attack requires a player who can increasing in many domestic competitions due to the have characteristics of playing a pass to succeed. Since rule allowing foreign players to be selected for national the success of a team greatly depends on individual teams of other countries, providing certain conditions performance (Lim et al., 2009), categorizing players by are met. For example, the number of players from other performance characteristics may lead to greater success. countries such as New Zealand, Australia, and South Therefore, categorizing players by their performance Africa has increased in the Japan Rugby Top League characteristics can be helpful in creating a more effective (JRTL). The JRTL currently has a its most diverse selection process. collection of players ever, with each player also having In rugby, there are ten positions, divided into forwards different performance characteristics. (FW) and backs (BK). In the FW pack, there is a prop The performance characteristics of players determine (PR), a hooker (HO), a lock (LO), a flanker (FL), and their role in the team. In many cases, individual players number 8 (NO8); and the BK consists of a scrum-half may be assigned specific roles on a team depending (SH), a fly-half (FH), center (CTB), wing (WTB), and on their performance characteristics. When a team is fullback (FB). A player is generally chosen depending constructed with the aim of strengthening it, it is useful on the position and the trend of performance (Duthie to categorize players by performance characteristics, et al., 2006; Parsons and Hughes, 2001; Quarrie et al.,
86 Football Science Vol.17, 86-97, 2020 http://www.jssf.net/home.html Categorization of Rugby Union Players
2013; Villarejo et al., 2013; Vivian et al., 2001). The are partitioned into groups. Thus, by using CA with performance characteristics of players might relate to the data of their synthetic variables, players who have their positions. For example, an FH performs many similar performance characteristics during the match passes and kicks to move forward to the ball and could be categorized. This will help us to identify the territory for other members of the team. Additionally, an performance characteristics of the individual players. LO is performing many tackles and providing support Hence, through a comprehensive analysis using PCA during a field play (Parsons and Hughes, 2001). and CA, we could categorize players by performance Although the performance characteristics of players characteristics. relate to positions, categorizing players firstly based on The usefulness of such analysis has been seen in other performance characteristics without considering players’ sports: PCA analysis has been used for soccer (football) positions, could contribute to finding a more varied team (Barros et al., 2006; Fernandez-Navarro et al., 2016), composition. For example, a team that has a weak point tennis (Kramer et al., 2017), and Olympic records data in its defense might find players that have characteristics (Naik and Khattree, 1996). For example, Fernandez- to tackle in the SH. Or, a team that is weak in ball Navarro et al. (2016) analyzed the playing style of a possession might select players, such as LOs, who are soccer team comprehensively using PCA. In their study, good ball carriers. The players who have this kind of 19 items of game data were reduced to six components characteristics may not be usual players, but they could that displayed the team performance characteristics be needed in a specific team. As the variation of the from each component score. Likewise, CA has been performance is increasing in modern rugby with the aim used to categorize the performance trend of a position of strengthening a team, categorizing players by their in rugby (Quarrie et al., 2013), which displays the performance characteristics regardless of position, i.e., differences between the activities and time-motion of irrespective of position, could be helpful to understand the positional groups, both between and within the FW performance characteristics of individual players and and the BK. Moura et al. (2014) conducted a study that contributes to strengthen a team. This is important in applied both PCA and CA, summarizing the game data modern rugby, especially now that there is increased of soccer teams using PCA and then classifying them variation among the performance of the players. into winning and losing teams based on their component Categorization of players by performance scores using CA. Similarly, through PCA and CA, rugby characteristics may be conducted using game data. players can be classified depending on their performance Game data contain important information related to characteristics. performance during a game. Therefore, game data were The advantage of categorizing players by performance used to evaluate players. James et al. (2005) developed a characteristics, irrespective of position, is the potential to position-specific evaluation indicator to assess compare identify suitable players for teams. Clarifying the kind the players of each position. Their evaluation method of performance characteristics each player can perform has been used to identify the players’ playing patterns contributes to the acquisition and appointment of through a comparison of such data. However, since players by the team’s management. Therefore, the aim rugby is complicated and multifaceted (International of this study was to categorize players by performance Rugby, 2011), a comprehensive analysis of the characteristics, irrespective of positions using PCA and performance of each player needed to be conducted. CA. We believed that principal component analysis In the JRTL, there are players with a various range (PCA) and cluster analysis (CA) would be effective of performance characteristics from not only Japan but categorizing the players by performance characteristics also elsewhere in the world, and they have a variety from a comprehensive analysis. PCA can reduce data of skills. In this study, the classification of players by into several factors through a grouping process and performance characteristics in the JRTL was used to emphasize the key factors that should be analyzed, and highlight such variation. can create the synthetic variable that could present the ability of performance of the player related to each component. Thus, by using PCA to analyze the game data, the player can be evaluated comprehensively with their synthetic variables. On the other hand, CA can be used for classifications in which similar items
Football Science Vol.17, 86-97, 2020 87 http://www.jssf.net/home.html Kiuchi, M. et al.
2. Methods Ethical approval for this study was granted by the ethics committee of Juntendo University (Faculty of 2.1. Data Health and Sports Sciences, Juntendo University, Chiba, Japan). Data from the JRTL 2015–2016 season was provided by Data Stadium Inc., which was certified as the official 2.2. Statistical Analysis statistics provider by the JRTL. There were ten games in the JRTL 2015–2016 season. Officially, each game lasts We used PCA to extract components from the 80 minutes. Since each team played ten games, the total 16 items (see Table 1). The criterion for component time played per season was 80 × 10 = 800 minutes. For extraction was that the eigenvalue exceeded 1.0. Play this study, we analyzed the data of 231 players who had items with principal component loadings greater than 0.5 played for more than 400 minutes, or about one-half displayed a positive or negative correlation and indicated of the total time that season. The players who played a substantial value for each component. In addition, the different positions for each game were categorized, component scores were quantified by categorizing them depending on the position that they played most often. into six clusters using CA by the k-means method. We We selected items that can be played in any position used the SPSS statistics software (Version 22.0) for this to the comprehensive analysis of all players enabled. analysis. Therefore, the field play was analyzed, except for set pieces such as place kicks, lineouts, and scrums. Data analysis was conducted based on the 16 items indicated in Table 1.
Table 1 Variables studied in player assessment