Archive of SID

Social Network Analysis of Passes and Communication Graph in Football by mining Frequent Subgraphs

Amir Hossein Ahmadi, Master of Information Technology Engineering, Tarbiat Modares University, Tehran, Iran, [email protected]
Ahmadi Noori, Master of Information Technology Engineering, Tarbiat Modares University, Tehran, Iran, [email protected]
Babak Teimourpour*, Assistant Professor of Information Technology Engineering, Tarbiat Modares University, Tehran, Iran, [email protected]

Abstract−Sport is regarded as an inseparable part of human life. Currently, a growing trend is observed in people's interest in football teams. In general, effective communication between players is one of the main factors required for a team's victory. The present study aimed to perform analyses from the perspective of social and communication networks (such as player passes and in-game transactions) to improve team performance. The analysis was performed on data collected from three matches of the Persepolis club in the first half-season of the Iranian Premier League 2019-20. This research reviews the issue from two integral perspectives: 1) evaluating the performance of individuals as a part of a social network, and 2) investigating the communication network between players. To this aim, we used the innovative method of recognizing and classifying frequent subgraphs. It is worth noting that 20 percent of these routes were in the defensive line while 31 percent were in the defensive midfield. However, there were no routes in the attacker line or offensive midfield, which indicated a form of weakness. In addition, various types of node degrees, points, and n-pass cycles were calculated in other sections. The results revealed the weak performance of the connection bridge between the team's playmakers and the end-players who shoot the ball. Although these topics were examined on a small scale, using only three matches of one team, the results can be generalized to other cases.

Keywords−Social Network Analysis; Graph Analysis; Frequent Subgraphs; Reach Centrality; Football Analysis.

I. INTRODUCTION

Generally, discovering different ways to score a goal in a football team by using the pass network helps the technical staff to analyze the match and select players and tactics to be used in the next matches. So far, a large number of studies using network theory have represented passes leading to offensive situations as networks, in which nodes indicate players and edges represent passes between players. Pass network analysis covers various topics, such as quantitatively evaluating the characteristics of players and teams [1, 2, 3]. Another study analyzed only successful passes, including throw-ins, goal kicks, corner kicks, and free kicks [4].

Concerning graph and network analysis, different types of node centrality criteria can be found in a graph, which indicate the relative importance of a node in the graph. Most centrality concepts were first developed in social network analysis and, consequently, many of the terms used to measure centrality have a sociological origin [5].

Given the high importance of analyzing the players' cooperation in team matches, several studies have applied social network analysis to determine relationships between players in a team [3, 6, 7], as a high degree of coordination in the team leads to better performance. Thus, it can be said that a team with strong coordination and interaction between its players can achieve high performance in games [8]. Using the network approach, one study indicated the crucial role of interaction patterns between players in team performance [3].

However, determining the relationship between the players in a team is necessary to formulate the passing process within a team. In a cycle of consecutive passes, the passing process depends on the connection between the players and the team's collective behavior [9, 10]. The game position of a player is the main limitation on the distribution of passes and on communication between teammates [11, 12].

Analyzing the centrality of the pass network is another important issue that should be considered in this field. The creation of scoring positions during the game and the match score depend to some extent on the centrality of the passing network in different game situations during consecutive passes [13]. A study of the players in the 2014 FIFA World Cup concluded that midfielders have the highest values of out-degree centrality, degree centrality, closeness centrality, and distance centrality in most teams [14].

On the other hand, some research has studied ball possession and its effect on the result of the match by comparing the possession and direct play styles. In this regard, studies have highlighted the importance and necessity

www.SID.ir


of evaluating how to use possession style related to effective offensive aspects by teams [15, 16]. For example, a study reviewed the playing methods of successful and unsuccessful teams in the 1986 World Cup and concluded that successful teams had a higher rate of possession than unsuccessful teams [17]. However, another article obtained a result on goals scored in the competition, according to which 80% of goals were scored with a sequence of three passes or fewer [18]. Specifically, no contradiction exists between these two results, and it can be said that the team with the highest possession percentage has scored goals in three passes or fewer. Another study revealed that successful teams created more opportunities by using long consecutive passes, although the ratio of goals scored by direct style was better than that of the possession game [19]. In long consecutive pass cycles, the number of shots was significantly higher for successful teams, compared to short consecutive pass cycles [19]. Another study showed that top European football teams use long consecutive passes to score goals when losing or drawing, and short consecutive passes when winning [20].

A review of previous research in this field shows that the highest degree of centrality can be found in the offensive players of teams that mainly use a direct style [21]. Another study reported that the highest centrality was in the wing-back defenders and the defensive midfielders. These values indicate that the first step in the pass sequence comes from the wing-back defenders and the defensive midfielders. An attempt to create an attacking position starts from the spaces at the back of the ground. The authors also suggested a play style based on possession and the absence of counterattacks [22, 23]. In teams whose play style is based on pass sequences among midfield players, midfielders received the most passes from their teammates and were therefore the team's target players [22]. Despite important studies in this area, few have used centrality criteria to identify the players with an important role in the structure of the team network graph [21, 24].

This research seeks to analyze the pass and communication network by finding and categorizing frequent subgraphs for a football team (Persepolis), aiming to identify and examine team play style patterns, strengths and weaknesses, critical routes, and strong communication between players. Adopting a different attitude toward players, from a social network perspective, can generate different results for team use and increase the quality of matches.

II. RESEARCH METHOD

In this study, we examined three matches of a team in the 19th season of the Pro League (Persepolis F.C. against Gol Gohar Sirjan F.C., Paykan F.C., and Esteghlal F.C.). Then, we designed and constructed a database of communications between Persepolis players and in-game transactions (such as shoot, shoot (on goal), lose the ball, kick, corner, etc.) as events. The origin, destination, and type of communication should be specified in each event. The statistical results of these communications are shown in Table I.

TABLE I. STATISTICS OF NODES AND COMMUNICATION IN EACH MATCH

Match | Total number of nodes (N) | Total number of edges (L)
Persepolis vs. Gol Gohar | 23 | 657
Persepolis vs. Peykan | 23 | 589
Persepolis vs. Esteghlal | 23 | 374

A. Cycle

A cycle, or ball circulation cycle, runs from the first event in which Persepolis players possess the ball to the last event, in which they lose the ball. These cycles are classified as successful or unsuccessful, depending on the result of each cycle. In successful cycles, the destination of the last communication is one of the nodes such as shoot, on-target shoot, and the like. Unsuccessful cycles, on the other hand, are those whose last event is losing the ball and the like.

B. Communication graph

The team's communication network was developed based on transactions. Nodes include the numbers of the players present on the ground, goal (as Gol), shoot on goal (as Shoot2Darvazeh), shoot, cross (as Santr), throw-in (as Out), corner, losing the ball (as Lo), and faults made by Persepolis and opponent players (as Khata1, Khata2). Fig 1 illustrates the network as a directed graph in the Gephi visualization software, as an example.

C. Degrees distribution

For each node, the sum of the input and output degrees is calculated to obtain the frequency distribution graph of the nodes. The degree of each node indicates the level of the node's involvement in match transactions, which is one of the criteria for measuring the individual performance of players. Mathematically, G represents the directed graph of the communication and V(G) is the set of vertices in this graph. If $v \in V(G)$, then the degree of the vertex v equals the sum of the edges leaving and entering this vertex.

D. Average degrees

The average node degree and the graph density were calculated as a criterion for displaying cooperation between the players. Fig 2 shows the average degrees, average weighted degrees, and graph density obtained according to (1), (2), and (3) from the three matches. The weighted degree of each node is obtained by weighting the events as described in the reach centrality section. In this way, the


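parameter L' denotes the number of edges without counting duplicate edges, N represents the number of nodes, and W is the sum of the weights of the nodes:

$$\text{Average degree} = \frac{L'}{N} \qquad (1)$$

$$\text{Average weighted degree} = \frac{W}{N} \qquad (2)$$

$$\text{Graph density} = \frac{L'}{N(N-1)} \qquad (3)$$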
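The statistics of Sections C and D can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `(origin, destination, weight)` event format, the function name `graph_statistics`, and the choice to take W as the sum of event weights are all assumptions made for the example.

```python
from collections import Counter


def graph_statistics(events, n_nodes=None):
    """Summarise a directed communication graph.

    events:  list of (origin, destination, weight) tuples, one per event
             (hypothetical record format).
    n_nodes: optional node count N; defaults to the nodes seen in events.
    """
    events = list(events)
    nodes = {node for origin, dest, _ in events for node in (origin, dest)}
    n = len(nodes) if n_nodes is None else n_nodes

    # L': edges counted once, ignoring duplicates.
    unique_edges = {(origin, dest) for origin, dest, _ in events}
    # W: taken here as the sum of event weights (an assumption).
    total_weight = sum(weight for _, _, weight in events)

    # Node degree = in-degree + out-degree, duplicates included.
    degree = Counter()
    for origin, dest, _ in events:
        degree[origin] += 1
        degree[dest] += 1

    return {
        "degree": dict(degree),
        "average_degree": len(unique_edges) / n,
        "average_weighted_degree": total_weight / n,
        "density": len(unique_edges) / (n * (n - 1)),
    }


# Toy events: player 1 passes to player 4 twice, 4 passes to 8, 8 shoots.
events = [("1", "4", 1.0), ("4", "8", 1.0), ("1", "4", 1.0), ("8", "Shoot", 2.0)]
stats = graph_statistics(events)
```

As a plausibility check, with N = 23 an L' of 156 (a value back-computed here, not reported in the paper) gives an average degree of 6.783 and a density of 0.308, which agrees with the values plotted in Fig 2 for the match against Gol Gohar.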

Fig. 1. Visualizing the communication network of Persepolis versus Peykan

Values plotted in Fig. 2 (Persepolis vs. Gol Gohar, Peykan, Esteghlal):
Average weighted degrees | 27.435 | 19.87 | 14.13
Average degrees | 6.783 | 6.913 | 6.043
Graph density | 0.308 | 0.314 | 0.275

E. Frequent – Correlation subgraph

Frequent subgraphs were extracted from the communication network (by gSpan-master) to recognize frequent sequences of match events. Frequent subgraphs indicate strong communication between the nodes participating in each of these sequences. Each of these subgraphs is a correlation path. Frequent subgraphs can be used to formulate a correlation productivity strategy for players and their capabilities (before matches), or to analyze the correlation paths of the opponent team's players (in previous matches) in order to present a counter plan against these communications, or for in-game analysis. The number of repetitions and the number of nodes in each subgraph are of great importance. Furthermore, evaluating the type of correlation paths formed during a match may reveal a part of the team's strategy.

F. Reach centrality

Reach centrality in the communication graph was calculated by weighting each event. First, a weight of 1 was assigned to all events, and then the weighting process was applied to each cycle. If the cycle was classified as successful, a positive weight, depending on the type of cycle, was assigned to the last event of the cycle, and the other events in the cycle received a fraction of this weight, in descending order from the end of the cycle to its beginning. In contrast, a negative weight was assigned to the last two events of cycles in the unsuccessful category.

Ultimately, the weights of the edges are summed over the different cycles. The sum indicates the suitability or unsuitability of each path (from the origin player to the target, that is, reaching the successful cycle events), taking the effective players into account.
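The weighting scheme can be sketched as follows. The paper does not state the numeric weights, so `bonus`, `decay`, and `penalty` below are hypothetical parameters, and adding the adjustment on top of the base weight of 1 is likewise an assumption of this sketch rather than the authors' stated procedure.

```python
from collections import defaultdict


def weight_cycle(events, successful, bonus=3.0, decay=0.5, penalty=-1.0):
    """Return one weight per event of a single cycle.

    Every event starts at weight 1. In a successful cycle the last event
    gains `bonus` and each earlier event gains a geometrically smaller
    share. In an unsuccessful cycle the last two events gain `penalty`.
    All three parameter values are illustrative assumptions.
    """
    weights = [1.0] * len(events)
    if successful:
        share = bonus
        for i in reversed(range(len(events))):
            weights[i] += share
            share *= decay
    else:
        for i in range(max(0, len(events) - 2), len(events)):
            weights[i] += penalty
    return weights


def edge_scores(cycles):
    """Sum event weights per (origin, destination) edge over all cycles."""
    scores = defaultdict(float)
    for events, successful in cycles:
        for edge, weight in zip(events, weight_cycle(events, successful)):
            scores[edge] += weight
    return dict(scores)


# Two toy cycles: one ending in a shot, one ending in losing the ball.
good = [("8", "70"), ("70", "88"), ("88", "Shoot")]
bad = [("8", "70"), ("70", "Lo")]
scores = edge_scores([(good, True), (bad, False)])
```

With these illustrative parameters the edge 88 to Shoot collects the largest score, while the edge 8 to 70 is pulled down by its appearance in the unsuccessful cycle.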

Fig. 2. Analyzing cooperation and comparing the points of nodes degree and graph density

III. RESULTS

A. Degrees distribution

Figs 3, 4, and 5 provide the distribution graph for the degrees of nodes in the three Persepolis matches, which is one of the criteria for comparing and analyzing graphs. These figures indicate the team's behavioral difference in these three games. The blue and orange dashed lines are trend lines that indicate the pattern of the degrees distribution. The orange dashed line represents the linear trend relative to the beginning and the end, while the blue dashed line shows the average trend between nodes.


As shown in Fig 5, the frequency of nodes decreases at higher degrees.

Active players in each match can be identified by specifying the nodes whose communication degree (including duplicate communications) is higher than average.

B. Average degrees

By comparing the three criteria of average degrees, average weighted degrees, and graph density, it can be observed that:

The Persepolis vs. Peykan match has the highest average degree and graph density, which indicates higher cooperation between the players in that match.

The average weighted degree of the Persepolis vs. Gol Gohar match was the highest, suggesting better statistical performance in gaining reach centrality points.

Fig. 3. Degrees distribution in the communication graph of Persepolis match against Gol Gohar (x-axis: degree of nodes; y-axis: number of nodes)

C. Frequent – correlation subgraph

Correlation paths were obtained in all three matches. The following figures depict some of the correlations in the Persepolis vs. Peykan match. The repetition of the subgraphs in Fig 6 implies Persepolis' strategy of penetrating from the right-hand side through player no. 17. The repetition of subgraphs in Fig 7, from the Persepolis vs. Esteghlal match, shows a proper relationship between the defensive line and the midfield line. However, the midfield line fails to communicate with the offensive line to complete the correlation route.

Correlation paths with at least three players and three repetitions were collected from the events of the three matches. Further, we categorized the paths depending on their beginning and end points: either both were in one line (defensive, midfield, or offensive), or the beginning point was in one line and the end point was in another line (Fig 8). Reciprocating paths, which start in one line, move to another line, and finally return to the origin line, are assigned to the "other" category. (For example, path 1 to 6 in Fig 6 is considered a defensive line path.)

Fig. 4. Degrees distribution in the communication graph of Persepolis match against Peykan
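The collection and categorization steps above can be illustrated with a simplified sketch. It counts contiguous node sequences inside cycles, which is a plain substitute for illustration and not the gSpan algorithm used in the paper. The `line_of` mapping from player number to line is hypothetical, and "at least three players" is read here as a path of at least three nodes.

```python
from collections import Counter


def frequent_paths(cycles, min_players=3, min_support=3):
    """Count contiguous sub-paths with at least `min_players` nodes and
    keep those occurring at least `min_support` times over all cycles."""
    counts = Counter()
    for cycle in cycles:
        for start in range(len(cycle)):
            for end in range(start + min_players, len(cycle) + 1):
                counts[tuple(cycle[start:end])] += 1
    return {path: n for path, n in counts.items() if n >= min_support}


def categorize(path, line_of):
    """Label a path by the lines of its first and last players.

    Paths that leave their starting line and return to it fall into the
    "other" category, as described in the text.
    """
    lines = [line_of[player] for player in path]
    start, end = lines[0], lines[-1]
    if start != end:
        return f"{start} to {end}"
    if any(line != start for line in lines):
        return "other"
    return start


# Toy data: each cycle is the ordered list of players touching the ball.
cycles = [[1, 4, 8], [1, 4, 8, 70], [1, 4, 8], [4, 8, 70]]
line_of = {1: "defensive", 4: "defensive", 8: "midfield", 70: "midfield"}

frequent = frequent_paths(cycles)
labels = {path: categorize(path, line_of) for path in frequent}
```

In this toy data only the path 1, 4, 8 reaches three repetitions, and it is labeled as a defensive to midfield path.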

D. Centrality

Reach centrality was computed and visualized. Figs 9, 10, and 11 color the players dark blue, light blue, and cream, from high to low centrality. In these figures, dark colors indicate players who sent valuable balls (shoot, cross, assist, etc.) and were effective players. A strategy of getting the ball to these players is worth analyzing.

Fig. 5. Degrees distribution in the communication graph of Persepolis match against Esteghlal


Fig. 6. Frequent subgraph

The graphs illustrated in Figs 12 and 13 were plotted based on reach centrality, in which the size of each node represents the reach centrality of that node's input (players who received and sent more important balls during the match are displayed larger). This analysis seeks to find the critical path to reach the goal node. Figs 12 and 13 show some of the paths with the highest reach score from the selected node to the target node. In Fig 12, the path contains the sequence of nodes "8-70-88-16-goal". In Fig 13, the path contains the sequence of nodes "17-7-88-16-goal".
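One way to search for such a critical path is sketched below, under stated assumptions: the path score is taken to be the sum of edge scores, paths may not revisit a node, and the edge scores in the example are invented for illustration. The paper does not specify how its critical paths were computed, so this is only one possible reading.

```python
def best_path(edges, source, target):
    """Return (score, path) for the highest-scoring simple path.

    edges: dict mapping (origin, destination) to a reach score.
    Exhaustive depth-first search; adequate for a graph of about two
    dozen nodes but exponential in the worst case.
    """
    adjacency = {}
    for (origin, dest), score in edges.items():
        adjacency.setdefault(origin, []).append((dest, score))

    best = (float("-inf"), None)

    def dfs(node, visited, score, path):
        nonlocal best
        if node == target:
            if score > best[0]:
                best = (score, list(path))
            return
        for nxt, edge_score in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(nxt, visited, score + edge_score, path)
                path.pop()
                visited.remove(nxt)

    dfs(source, {source}, 0.0, [source])
    return best


# Hypothetical edge scores around the node sequence shown in Fig 12.
edges = {
    ("8", "70"): 2.0,
    ("70", "88"): 3.0,
    ("88", "16"): 4.0,
    ("16", "goal"): 5.0,
    ("8", "16"): 1.0,
}
score, path = best_path(edges, "8", "goal")
```

With these invented scores the search prefers the longer route 8, 70, 88, 16, goal over the shortcut 8, 16, goal, because the summed score is higher.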

Fig. 7. Frequent subgraph

E. N-pass cycle

Cycles achieve different results, with different scores, during the game. The number of passes in each cycle was used as a label, and the cycles were categorized accordingly. The score of each cycle was calculated as the sum of the scores of its paths in reach centrality (1P: a cycle with a length of one pass).

Fig 14 shows the result of the analysis of n-pass cycles in the three matches of the Persepolis club. According to the graph, Persepolis performed better in 6-pass, 7-pass, and 8-pass cycles, with ratios of total points to number of cycles of 7.9, 9.2, and 9.2, respectively (for example, 150 points over 19 six-pass cycles gives about 7.9). This analysis reveals Persepolis' play style of implementing a possession strategy and creating opportunities against the closed play style of the opponent.

IV. CONCLUSION

The success of a team relies on various factors. By adopting a social network approach to passes and in-game transactions, this research sought to analyze the pattern of players' communication in previous competitions. Strong interaction between team members, as well as their cooperation as a whole network, improves team performance dynamically. This study has evaluated the sport from an innovative, network-based perspective and provided results, suggestions, and strategies.


Fig. 8. Categorizing Persepolis frequent subgraphs (pie chart of the number of subgraphs per category: defensive, defensive to midfield, midfield, midfield to defensive, midfield to offensive, offensive, offensive to midfield, others)

Fig. 9. Displaying the performance of Persepolis players against Gol Gohar in terms of points gained in the match

Fig. 10. Displaying the performance of Persepolis players against Pekan in terms of points gained in the match

Based on the studies and analyses conducted on the database of Persepolis matches in the , this research was carried out from two perspectives:

• Evaluating players' performance as a part of a social network (various types of node degrees, point of each node, etc.)
• Examining the players' communication network as a graph (correlation path, n-pass cycle, critical path, etc.)

The first perspective addressed the calculation of different types of degrees, total degree, point of nodes, and graph density. On average, each player was associated with six other nodes.

The second perspective was related to the identification and categorization of correlation paths. In total, 20% of these paths were in the defensive line while 31% were in the midfield line. However, no path was observed in the offensive line or from the midfield to the offensive line. By calculating the reach centrality, critical path, and n-pass cycles, it was found that the 8-pass cycles were effective for Persepolis. Finally, the problem was found in the offensive line, or in establishing effective communication with this line, due to the possession play style, playing at the back of the ground, and the lack of strong correlation with the offensive line. These findings are based on only three matches of the team. Applying other methods could yield new results; for example, future work could examine structural holes in the network.

Fig. 11. Displaying the performance of Persepolis players against Esteghlal in terms of points gained in the match

Fig. 12. The graph displaying the critical path with the highest score

Fig. 13. The graph displaying the critical path with the highest score

Cycle type | 1P | 2P | 3P | 4P | 5P | 6P | 7P | 8P
Total points of cycles | 81 | 7 | 128 | 146 | 73 | 150 | 138 | 83
Number of cycles | 56 | 57 | 36 | 26 | 20 | 19 | 15 | 9

Fig. 14. Graph of total points gained in terms of cycle type

REFERENCES

[1] Clemente, F. M., Couceiro, M. S., Martins, F. M. L., & Mendes, R. S., “Using network metrics in soccer: macro-analysis”, Journal of Human Kinetics, 45, 123-134, 2015.
[2] Duch, J., Waitzman, J. S., & Amaral, L. A., “Quantifying the performance of individual players in a team activity”, PLoS One, 5(6), e10937, 2010.
[3] Grund, T. U., “Network structure and team performance: The case of English premier league soccer team”, Social Networks, 34(4), 682-690, 2012.
[4] Kawasaki, T., Sakaue, K., Matsubara, R., & Ishizaki, S., “Football pass network based on the measurement of player position by using network theory and clustering”, International Journal of Performance Analysis in Sport, 19(3), 381-392, 2019.
[5] Al Falahi, Kanna, Nikolaos Mavridis, and Yacine Atif., "Social networks and recommender systems: a world of current and future synergies." In Computational Social Networks, pp. 445-465. Springer, London, 2012.
[6] Bourbousson, J., Poizat, G., Saury, J., & Seve, C., “Team coordination in : Description of the cognitive connections among teammates”, Journal of Applied Sport Psychology, 22(2), 150-66, 2010.
[7] Duch, J., Waitzman, J. S., & Amaral, L. A., “Quantifying the performance of individual players in a team activity”, PLoS One, 5(6), 109-19, 2010.
[8] Fewell, J. H., Armbruster, D., Ingraham, J., Petersen, A., & Waters, J. S., “Basketball teams as strategic networks”, PLoS One, 7(11), 474-85, 2012.
[9] Duch J, Waitzman JS, Amaral LA., “Quantifying the performance of individual players in a team activity”, PLoS One, 5(6), e10937, 2010.
[10] Clemente FM, Martins FML, Kalamaras D, Wong DP, Mendes RS., “General network analysis of national soccer teams in Fifa World Cup 2014”, Int J Perform Anal Sport, 15(1), 2015.
[11] Clemente FM, Martins FML, Kalamaras D, Wong DP, Mendes RS., “Midfielder as the prominent participant in the building attack: a network analysis of national teams in Fifa World Cup 2014”, Int J Perform Anal Sport, 704-22, 2015.
[12] Peña JL, Touchette H., “A network theory analysis of football strategies”, In: Clanet C, editor. Sports physics: Proc. Euromech Physics of Sports Conference. Palaiseau, France: Éditions de l'École Polytechnique, p. 517-28, 2012.
[13] Clemente, Filipe Manuel, Hugo Sarmento, and Rodrigo Aquino., "Player position relationships with centrality in the passing network of world cup soccer teams: Win/loss match comparisons", Chaos, Solitons & Fractals 133, 109625, 2020.
[14] Clemente, F. M., Martins, F. M. L., Kalamaras, D., Wong, P. D., & Mendes, R. S., “General network analysis of national soccer teams in FIFA World Cup 2014”, International Journal of Performance Analysis in Sport, 15(1), 80-96, 2015.
[15] Tenga, A., Holme, L., Ronglan, L.T., and Bahr, R., “Effect of playing tactics on goal scoring in Norwegian professional soccer”, Journal of Sports Sciences, 28(3), 237-244, 2010.
[16] Lago-Ballesteros, J., Lago-Peñas, C., and Rey, E., “The effect of playing tactics and situational variables on achieving score-box possessions in a professional soccer team”, Journal of Sports Sciences, 30(14), 1455-1461, 2012.
[17] Hughes, M.D., Robertson, K., & Nicholson, A., “An analysis of the 1984 World Cup of Association Football”, In Science and Football, edited by T. Reilly, A. Lees, K. Davids and W. Murphy, pp. 363-367. London: E & FN Spon, 1988.
[18] Reep, C., Pollard, R., & Benjamin, B., “Skill and chance in ball games”, Journal of the Royal Statistical Society, A, 134, 623-629, 1971.
[19] Mike Hughes & Ian Franks., “Analysis of passing sequences, shots and goals in soccer”, Journal of Sports Sciences, 23:5, 509-514, 2005.
[20] Paixão, Paulo, Jaime Sampaio, Carlos H. Almeida, and Ricardo Duarte., "How does match status affects the passing sequences of top-level European soccer teams?." International Journal of Performance Analysis in Sport 15, no. 1, 229-240, 2015.
[21] Malta, Pedro, and Bruno Travassos. "Caraterização da transição defesa-ataque de uma equipa de Futebol." Motricidade 10, no. 1: 27-37, 2014.
[22] Clemente, Filipe Manuel, Micael Santos Couceiro, Fernando Manuel Lourenço Martins, and Rui Sousa Mendes. "Using network metrics to investigate football team players' connections: A pilot study." Motriz: Revista de Educação Física 20, no. 3: 262-271, 2014.
[23] Clemente, Filipe Manuel, Fernando Manuel Lourenço Martins, Dimitris Kalamaras, Joana Oliveira, Patrícia Oliveira, and Rui Sousa Mendes. "The social network analysis of Switzerland football team on FIFA World Cup 2014." Journal of Physical Education and Sport 15, no. 1: 136, 2014.