Searching for a Unique Style in Soccer
Laszlo Gyarmati, Haewoon Kwak
Qatar Computing Research Institute
{lgyarmati,hkwak}@qf.org.qa

Pablo Rodriguez
Telefonica Research
[email protected]

arXiv:1409.0308v1 [cs.SI] 1 Sep 2014

ABSTRACT
Is it possible to have a unique, recognizable style in soccer nowadays? We address this question by proposing a method to quantify the motif characteristics of soccer teams based on their pass networks. We introduce the concept of “flow motifs” to characterize the statistically significant pass sequence patterns. It extends the idea of network motifs, highly significant subgraphs that usually consist of three or four nodes. The analysis of the motifs in the pass networks allows us to compare and differentiate the styles of different teams. Although most teams tend to apply a homogeneous style, surprisingly, a unique strategy of soccer exists. Specifically, FC Barcelona’s famous tiki-taka does not consist of uncountable random passes but rather has a precise, finely constructed structure.

Categories and Subject Descriptors
G.3 [Probability and Statistics]: Probabilistic algorithms; I.5.4 [Pattern Recognition]: Applications

General Terms
Algorithms, Theory

Keywords
soccer analytics, network motifs, pass network

1. INTRODUCTION
Although the basic rules of soccer have barely changed since the 1920s (the introduction of the two-opponent version of the offside rule), they intrinsically enabled teams to develop distinctive strategies that dominated the soccer landscape for several years (e.g., the Hungarian and the Dutch teams of the 50s and 70s) [12]. However, in those days the possibilities to prepare against these teams and thus ruin their strategies were limited due to technological reasons. Nowadays, technological advancements of the last decade allow team staffs to view any first division soccer game on short notice; hence, it seems challenging to have and sustain a unique playing characteristic, one that is also successful, in the global soccer space. Does such a unique, recognizable style exist in soccer nowadays?

The identification and understanding of the style of soccer teams have practical impacts apart from the esthetics of the game. The players of a team should obey the style (i.e., strategy) of the team to maximize the team’s chance to win. Hence, it is crucial to raise youngsters and to sign players who are capable of playing according to the style of the team. Failing to do so has an impact not only on the success but also on the profitability of the club. There are numerous examples of newly signed players who were not compatible with the style of their new clubs [11]; therefore, there is a need for a quantitative analysis of a team’s style to avoid these discrepancies.

The rareness of goals is the profoundest feature of soccer that distinguishes it from other team sports. Although the teams have 11 players and 90 minutes to score in each match, it is not unusual to have a goalless draw as the final score [1]. These results are not solely due to spectacular performances of the goalkeepers but rather a consequence of the low number of scoring chances. Hence, metrics related to scoring cannot describe the style (i.e., the strategy) of a soccer team.

Passes, on the other hand, happen in large numbers in every game irrespective of the quality of the teams. The pass network of a soccer team consists of the players as vertices and the passes between the players as edges. Prior art focused either on the high-level statistics of the pass networks (e.g., betweenness, shortest paths) or on the strength of the connection between pairs of players [2, 8, 6, 4]. These metrics describe the static properties of a pass network: they aggregate all the passes into one network, neglect the order of passes, etc. In contrast, we focus on the dynamic aspects of the pass networks by examining the “flow motifs” of the teams.

We propose the concept of “flow motifs” to characterize the statistically significant pass sequence patterns. It extends the idea of network motifs, highly significant subgraphs that usually consist of three or four nodes, suggested by Milo et al. [5], which mainly apply to static complex networks (e.g., food webs, protein-structure networks, and social networks). We extend their work towards “flow motifs” to analyze pass networks that are highly dynamic and in which the order of connections is important. Our methodology starts with the extraction of the passing sequences, i.e., the order of players whom the ball traversed. Afterwards, we computationally determine the significance of the different k-pass-long motifs in the passing style of the teams. Our flow motif profile focuses on how the ball traverses within a team. We not only count the number of passes, but also check which players are involved and how they organize the flow of passes. Based on these computed flow motif profiles, we finally cluster the teams. To the best of our knowledge, our study is the first of its kind that investigates motifs in soccer passing sequences. Our contribution is twofold:

1. we propose a method to quantify the motif characteristics of soccer teams based on their pass networks, and
2. we identify similarities and disparities between teams and leagues using the teams’ motif fingerprints.

In the recent decade, several data-provider companies and websites have arisen to annotate soccer matches and to publish soccer datasets. Such initiatives include Prozone [9], OptaPro [7], Instat Football [3], and Squawka [10], among others. The prevalence of data providers enables us to take a data-driven, quantitative approach to identify the styles of the soccer teams. We focus on the 2012/13 seasons of major European soccer leagues and analyze the passing strategies of the teams throughout the whole season.

[Figure 1 plot. Panel title: Motif: ABAC. Vertical axis: z-score. Horizontal axis: teams of the Spanish first division.]
Figure 1: The prevalence of the ABAC motif in case of the teams of the Spanish first division (median, quartiles) with respect to their z-scores. FC Barcelona applies the ABAC motif much more frequently than any other team in the league.

2. METHODOLOGY
The “flow motifs” of a pass network, in which players are linked via executed passes, consist of a given number of consecutive passes, namely, an ordered list of players who were involved in the particular passes. Throughout this paper we focus on motifs consisting of three consecutive passes; however, it is straightforward to generalize our methodology to investigate motifs with fewer or more passes. Our methodology relaxes the identity of the involved players, i.e., it does not differentiate motifs based on the names of the players but rather focuses on the structure of the passes. There are five distinct motifs when we analyze three-pass-long motifs: ABAB, ABAC, ABCA, ABCB, and ABCD. For example, the motif ABAB denotes the following pass sequence: first, player 1 passes to player 2; second, player 2 passes the ball back to player 1; and finally, player 1 passes again to player 2. If a similar pass sequence happens between player 3 and player 4, the identified motif is ABAB again (i.e., the crucial characteristic is what happened and not between whom).

Our methodology quantifies the prevalence of the flow motifs in the pass networks compared to random networks whose degree distribution is the same. To achieve this, we start with a list of passes that a team made during a match.

where Tmax denotes the time threshold between two passes. These constraints ensure that the passes are consecutive (i.e., a player receives the ball and then passes it forward) and have no major breaks. Throughout our study, we use Tmax = 5 sec to determine whether two passes belong to the same ball possession.

Third, we extract all the three-pass-long sub-possessions from the ball possessions (e.g., a ball possession having n passes contains n − 2 motifs) and convert the player identifiers into the appropriate A, B, C, and D labels to assemble the motifs. For example, a ball possession where the ball moves between players as 2 → 4 → 5 → 6 → 4 → 6 translates into three motifs, namely, ABCD, ABCA, and ABCB:

2 → 4 → 5 → 6 : ABCD
4 → 5 → 6 → 4 : ABCA
5 → 6 → 4 → 6 : ABCB

After obtaining the motifs that are present in the pass network, we quantify the prevalence of the motifs by comparing the pass network of the team to random pass networks having identical properties (in particular, the number of vertices and their degree distribution). Specifically, we randomly perturb the labels of the motifs prevalent in the original pass network and thus create pseudo motif distributions. In our data analyses, we generate 1000 random pass networks for each original pass network. Finally, we compute the z-scores (a.k.a. standard scores) of the motifs by comparing
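As a rough illustration of the possession-splitting and motif-labelling steps described in this section, the following Python sketch may be useful. It is not the authors' implementation. The `Pass` record, its field names, and all function names are hypothetical. The chaining rule in `split_possessions` (the next passer must be the previous receiver, and at most Tmax seconds may separate the two passes) is our reading of the constraints, assumed for illustration only.

```python
from collections import Counter
from typing import List, NamedTuple


class Pass(NamedTuple):
    """One completed pass; the field names are illustrative assumptions."""
    time: float    # seconds since kick-off
    passer: int    # identifier of the player who makes the pass
    receiver: int  # identifier of the player who receives the pass


T_MAX = 5.0  # seconds, the threshold value quoted in the text


def split_possessions(passes: List[Pass], t_max: float = T_MAX) -> List[List[int]]:
    """Group time-ordered passes into possessions (lists of player ids).

    Assumed rule: a pass continues the current possession if its passer is
    the receiver of the previous pass and the gap is at most t_max seconds.
    """
    possessions: List[List[int]] = []
    current: List[int] = []
    prev = None
    for p in passes:
        chained = (prev is not None
                   and p.passer == prev.receiver
                   and p.time - prev.time <= t_max)
        if chained:
            current.append(p.receiver)
        else:
            if current:
                possessions.append(current)
            current = [p.passer, p.receiver]
        prev = p
    if current:
        possessions.append(current)
    return possessions


def label_motif(players: List[int]) -> str:
    """Relabel players by order of first appearance, e.g. [4, 5, 6, 4] -> 'ABCA'."""
    labels = {}
    for player in players:
        if player not in labels:
            labels[player] = chr(ord("A") + len(labels))
    return "".join(labels[player] for player in players)


def extract_motifs(possession: List[int], n_passes: int = 3) -> List[str]:
    """Return the motif of every run of n_passes consecutive passes."""
    window = n_passes + 1  # k passes involve k + 1 ball touches
    return [label_motif(possession[i:i + window])
            for i in range(len(possession) - window + 1)]


def motif_counts(possessions: List[List[int]], n_passes: int = 3) -> Counter:
    """Count how often each motif occurs across all possessions."""
    counts: Counter = Counter()
    for possession in possessions:
        counts.update(extract_motifs(possession, n_passes))
    return counts
```

Applied to the possession 2 → 4 → 5 → 6 → 4 → 6 from the text, `extract_motifs([2, 4, 5, 6, 4, 6])` yields ABCD, ABCA, and ABCB, matching the n − 2 motifs expected for a five-pass possession. The resulting counts would be the observed values that a z-score computation compares against the randomized networks.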