arXiv:1409.0308v1 [cs.SI] 1 Sep 2014

Searching for a Unique Style in Soccer

Laszlo Gyarmati, Haewoon Kwak
Qatar Computing Research Institute
{lgyarmati,hkwak}@qf.org.qa

Pablo Rodriguez
Telefonica Research
[email protected]

ABSTRACT
Is it possible to have a unique, recognizable style in soccer nowadays? We address this question by proposing a method to quantify the motif characteristics of soccer teams based on their pass networks. We introduce the concept of "flow motifs" to characterize the statistically significant pass sequence patterns. It extends the idea of network motifs, highly significant subgraphs that usually consist of three or four nodes. The analysis of the motifs in the pass networks allows us to compare and differentiate the styles of different teams. Although most teams tend to apply a homogeneous style, surprisingly, a unique strategy of soccer exists. Specifically, FC Barcelona's famous tiki-taka does not consist of uncountable random passes but rather has a precise, finely constructed structure.

Categories and Subject Descriptors
G.3 [Probability and Statistics]: Probabilistic algorithms; I.5.4 [Pattern Recognition]: Applications

General Terms
Algorithms, Theory

Keywords
soccer analytics, network motifs, pass network

1. INTRODUCTION
Although the basic rules of soccer have barely changed since the 1920s (the introduction of the two-opponent version of the offside rule), they intrinsically enabled teams to develop distinctive strategies that dominated the soccer landscape for several years (e.g., the Hungarian team of the 50s and the Dutch team of the 70s) [12]. However, in those days the possibilities to prepare against these teams, and thus to ruin their strategies, were limited for technological reasons. Nowadays, the technological advancements of the last decade allow team staffs to view any first division soccer game on short notice; hence, it seems challenging to have and sustain a unique playing characteristic, and one that is additionally successful, in the global soccer space. Does such a unique, recognizable style exist in soccer nowadays?

The identification and understanding of the style of soccer teams have practical impacts apart from the esthetics of the game. The players of a team should obey the style (i.e., strategy) of the team to maximize the team's chance to win. Hence, it is crucial to raise youngsters and to sign players who are capable of playing according to the style of the team. Failing to do so has an impact not only on the success but also on the profitability of the club. There are numerous examples of newly signed players who were not compatible with the style of their new clubs [11]; therefore, there is a need for a quantitative analysis of a team's style to avoid such discrepancies.

The rareness of goals is the profoundest feature of soccer that distinguishes it from other team sports. Although the teams have 11 players and 90 minutes to score in each match, it is not unusual to have a goalless draw as the final score [1]. These results are not solely due to the spectacular performance of the goalkeepers but are rather a consequence of the low number of scoring chances. Hence, metrics related to scoring cannot describe the style (i.e., strategy) of a soccer team.

Passes, on the other hand, happen numerously in every game irrespective of the quality of the soccer teams. The pass network of a team consists of the players as vertices and the passes between the players as edges. Prior art focused either on the high-level statistics of the pass networks (e.g., betweenness, shortest paths) or on the strength of the connection between pairs of players [2, 8, 6, 4]. These metrics describe the static properties of a pass network: e.g., they aggregate all the passes into one network and neglect the order of the passes. On the contrary, we focus on the dynamic aspects of the pass networks by examining the "flow motifs" of the teams.

We propose the concept of "flow motifs" to characterize the statistically significant pass sequence patterns. It extends the idea of network motifs, highly significant subgraphs that usually consist of three or four nodes, suggested by Milo et al. [5], which mainly apply to static complex networks (e.g., food webs, protein-structure networks, and social networks). We extend their work towards "flow motifs" to analyze pass networks that are highly dynamic and in which the order of connections is important. Our methodology starts with the extraction of the passing sequences, i.e., the order of players whom the ball traversed. Afterwards, we determine computationally the significance of the different k-pass-long motifs in the passing style of the teams. Our flow motif profile focuses on how the ball traverses within a team. We not only count the number of passes, but also check which players are involved and how they organize the flow of passes. Based on these computed flow motif profiles, we finally cluster the teams. To the best of our knowledge, our study is the first of its kind that investigates motifs in soccer passing sequences. Our contribution is twofold:

1. we propose a method to quantify the motif characteristics of soccer teams based on their pass networks, and

2. we identify similarities and disparities between teams and leagues using the teams' motif fingerprints.
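As a minimal sketch of the idea of reducing a passing sequence to its structure (not the authors' implementation), the following Python snippet relabels the players of a pass sequence by order of first appearance (A, B, C, D) and slides a window of three passes over one possession. The function names `label_window` and `motifs_from_possession` are hypothetical; the player sequence 2 → 4 → 5 → 6 → 4 → 6 is the paper's own worked example.

```python
from typing import Hashable, List, Sequence


def label_window(players: Sequence[Hashable]) -> str:
    """Relabel players by order of first appearance: A, B, C, ..."""
    labels = {}
    out = []
    for player in players:
        if player not in labels:
            # The next unused letter is assigned to each newly seen player.
            labels[player] = chr(ord("A") + len(labels))
        out.append(labels[player])
    return "".join(out)


def motifs_from_possession(players: Sequence[Hashable],
                           passes_per_motif: int = 3) -> List[str]:
    """Slide a window of (passes_per_motif + 1) players over one possession."""
    window = passes_per_motif + 1
    return [label_window(players[i:i + window])
            for i in range(len(players) - window + 1)]


# Five passes (n = 5) yield n - 2 = 3 three-pass motifs.
print(motifs_from_possession([2, 4, 5, 6, 4, 6]))  # ['ABCD', 'ABCA', 'ABCB']
```

Because the labels depend only on the order of first appearance, two exchanges between different pairs of players map to the same motif string, which is the identity-relaxing property the method relies on.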

In the recent decade, several data-provider companies and websites have arisen to annotate soccer matches and to publish soccer datasets. Such initiatives include Prozone [9], OptaPro [7], InStat Football [3], and Squawka [10], among others. The prevalence of data providers enables us to take a data-driven, quantitative approach to identify the styles of soccer teams. We focus on the 2012/13 seasons of major European soccer leagues and analyze the passing strategies of the teams throughout the whole season.

Figure 1: The prevalence of the ABAC motif in the case of the teams of the Spanish first division (median, quartiles) with respect to their z-scores. FC Barcelona applies the ABAC motif much more frequently than any other team in the league.

2. METHODOLOGY
The "flow motifs" of a pass network, in which players are linked via executed passes, consist of a given number of consecutive passes, namely, an ordered list of the players who were involved in the particular passes. Throughout this paper we focus on motifs consisting of three consecutive passes; however, it is straightforward to generalize our methodology to motifs with fewer or more passes. Our methodology relaxes the identity of the involved players, i.e., it does not differentiate motifs based on the names of the players but rather focuses on the structure of the passes. There are five distinct motifs when we analyze three-pass-long motifs: ABAB, ABAC, ABCA, ABCB, and ABCD. For example, the motif ABAB denotes the following pass sequence: first, player 1 passes to player 2; second, player 2 passes the ball back to player 1; and finally, player 1 passes again to player 2. If a similar pass sequence happens between player 3 and player 4, the identified motif is ABAB again (i.e., the crucial characteristic is what happened and not between whom).

Our methodology quantifies the prevalence of the flow motifs in the pass networks compared to random networks whose degree distribution is the same. To achieve this, we start with a list of the passes that a team made during a match. The format of a pass record is

\[ p_n = \langle \mathrm{player}_i(n),\ \mathrm{player}_j(n),\ t(n) \rangle \]

where $\mathrm{player}_i(n)$ passed the ball to $\mathrm{player}_j(n)$ at time instance $t(n)$. Second, we derive all the ball possessions that a team had. A ball possession $\langle p_1, p_2, \ldots, p_n \rangle$ consists of passes that fulfill two constraints:

\[ \mathrm{player}_j(m) = \mathrm{player}_i(m+1), \quad \forall m \in \{1, \ldots, n-1\} \]
\[ t(m+1) - t(m) \le T_{\max}, \quad \forall m \in \{1, \ldots, n-1\} \]

where $T_{\max}$ denotes the time threshold between two passes. These constraints assure that the passes are consecutive (i.e., a player receives the ball and then passes it forward) and that there are no major breaks between them. Throughout our study, we use $T_{\max} = 5$ s to determine whether two passes belong to the same ball possession.

Third, we extract all the three-pass-long sub-possessions from the ball possessions (e.g., a ball possession having n passes contains n − 2 motifs) and convert the player identifiers into the appropriate A, B, C, and D labels to assemble the motifs. For example, a ball possession where the ball moves between players as 2 → 4 → 5 → 6 → 4 → 6 translates into three motifs, namely, ABCD, ABCA, and ABCB:

2 → 4 → 5 → 6: ABCD
4 → 5 → 6 → 4: ABCA
5 → 6 → 4 → 6: ABCB

After having the motifs that are present in the pass network, we quantify the prevalence of the motifs by comparing the pass network of the team to random pass networks having identical properties (in particular, the number of vertices and their degree distribution). Specifically, we randomly perturb the labels of the motifs present in the original pass network and thereby create pseudo motif distributions. In our data analyses, we generate 1000 random pass networks for each original pass network. Finally, we compute the z-scores (a.k.a. standard scores) of the motifs by comparing the original and the constructed random pass networks. As a result, we have a characteristic of the (passing) style of a team for every match, expressed in terms of the z-scores of the motifs.

3. DATA ANALYSIS AND RESULTS
We use publicly accessible information on the pass networks of soccer teams. In particular, the dataset contains information from the 2012/13 seasons of the Spanish, Italian, English, French, and German first divisions.
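The possession-building step of the methodology can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the `Pass` record, `split_possessions`, and `player_sequence` are hypothetical names, and the input is assumed to be the time-ordered pass list of a single team in a single match, with timestamps in seconds.

```python
from typing import List, NamedTuple


class Pass(NamedTuple):
    passer: int     # player_i(n)
    receiver: int   # player_j(n)
    t: float        # t(n), in seconds (assumed unit)


def split_possessions(passes: List[Pass], t_max: float = 5.0) -> List[List[Pass]]:
    """Group time-ordered passes into ball possessions.

    Two consecutive passes stay in the same possession only if the receiver
    of the first is the passer of the second and the gap is at most t_max.
    """
    possessions: List[List[Pass]] = []
    current: List[Pass] = []
    for p in passes:
        if current:
            prev = current[-1]
            chained = prev.receiver == p.passer
            quick = p.t - prev.t <= t_max
            if not (chained and quick):
                possessions.append(current)
                current = []
        current.append(p)
    if current:
        possessions.append(current)
    return possessions


def player_sequence(possession: List[Pass]) -> List[int]:
    """Ordered list of players whom the ball traversed in one possession."""
    return [possession[0].passer] + [p.receiver for p in possession]


example = [Pass(2, 4, 0.0), Pass(4, 5, 2.0), Pass(5, 6, 4.0),
           Pass(6, 4, 12.0), Pass(4, 6, 13.0)]
for possession in split_possessions(example):
    print(player_sequence(possession))
```

In this example the 8-second gap before the fourth pass exceeds the 5-second threshold, so the list is split into two possessions even though the receiver and passer match.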

[Figure 3 plot: three panels (Motif: ABAB, Motif: ABCA, Motif: ABCB); y-axis: Z score; x-axis: teams of the Spanish first division.]

Figure 3: Z-scores of the ABAB, ABCA, and ABCB motifs in the Spanish league.
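The z-scores shown in the figures follow the usual standard-score definition: the observed motif count minus the mean count over the randomized networks, divided by the standard deviation of those counts. The sketch below assumes the motif counts of the randomized pass networks are already available, since the randomization procedure itself is not reproduced here. The function name `motif_z_scores` and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List

MOTIFS = ("ABAB", "ABAC", "ABCA", "ABCB", "ABCD")


def motif_z_scores(observed: Dict[str, int],
                   randomized: List[Dict[str, int]]) -> Dict[str, float]:
    """Standard score of each motif count against a set of randomized counts.

    observed   -- motif counts in the original pass network of one match
    randomized -- one dict of motif counts per randomized pass network
    """
    scores: Dict[str, float] = {}
    for motif in MOTIFS:
        null_counts = [r.get(motif, 0) for r in randomized]
        mu = mean(null_counts)
        sigma = pstdev(null_counts)  # population std; an assumed choice
        if sigma > 0:
            scores[motif] = (observed.get(motif, 0) - mu) / sigma
        else:
            # The z-score is undefined when the randomized counts do not vary.
            scores[motif] = float("nan")
    return scores
```

A positive score means the motif occurs more often in the real pass network than in the randomized ones, and a negative score means it occurs less often.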

[Figure 2 plot: Motif: ABCD; y-axis: Z score; x-axis: teams of the Spanish first division. Figure 4 plot: teams of the Spanish first division positioned along PC1 (x-axis) and PC2 (y-axis).]

Figure 2: FC Barcelona uses the ABCD motif less often than the other teams.

Figure 4: K-means clustering of the teams in the Spanish league. One of the four clusters contains only a single team, namely FC Barcelona, which has a unique style based on its passing motifs.

The part of the dataset covering the Spanish league spans 20 teams, 380 matches, and more than 250 thousand passes. We quantify the motif characteristics of the teams using the aforementioned dataset. We first present results on the passing styles of teams in the Spanish first division and later compare our findings with the other European leagues and teams.

We compare the Spanish teams with respect to their ABAC motifs in Figure 1. Most of the teams have similar z-scores, i.e., they apply the ABAC pass motif to a comparable extent. However, FC Barcelona has a quite distinct strategy: it applies ABAC motifs significantly more often than the other teams (the difference is at least 2.5 standard deviations). The trend is similar in the case of the ABCD motif; the only difference is that the majority of the teams have notably larger z-scores than FC Barcelona (Figure 2). This means that FC Barcelona applies this motif significantly less frequently than the other teams. In general, FC Barcelona uses structured motifs (i.e., motifs with more back-and-forth passes such as ABAB, ABAC, and ABCB) more often than simpler ones compared to other teams. We present the results of the remaining motifs in Figure 3.

We next analyze the similarities and the differences of the teams' motif characteristics via cluster analysis. First, for each team, we construct a feature vector representing the team's usage of motifs. We use the mean of the z-scores of the five distinct motifs as the features (averaging the z-scores over the 38 matches a team had in the season). Afterwards, we cluster the teams based on their five-element feature vectors. We use two methods for cluster analysis: k-means and hierarchical clustering. We illustrate the result of the k-means clustering in Figure 4 (the clusters are color-coded), where the ratio of the within-cluster sum of squares to the total sum of squares is 90.3%. For example, the cluster that contains Atlético de Madrid and Athletic Bilbao, among others, is characterized by extensive usage of the ABAB and ABCA motifs. While most of the teams are clustered in three major groups, FC Barcelona is separated from the other teams. FC Barcelona is the only team in its cluster; hence, it has distinctive motif characteristics.

The Ward hierarchical clustering algorithm reveals a similar trend, as shown in Figure 5. Again, FC Barcelona has a solitary style while the other teams have resembling features. The implications of the two clustering schemes are consistent: FC Barcelona had a unique passing style, significantly different from that of any other team in the Spanish league.

Figure 5: Ward hierarchical clustering of the teams in the Spanish league. FC Barcelona does not belong to any major groups of the teams.

Finally, we take a broader point of view and investigate whether the style of FC Barcelona remains unique if we consider four additional European soccer leagues. We show the teams in Figure 6 based on their motifs using principal component analysis. Although we analyze more teams that have more variation in their pass characteristics, FC Barcelona is still able to maintain its rare, distinct style. It is surprising that Torino, an Italian team nearly relegated at the end of the season, has a style diverse from the vast majority of the considered teams and shares properties with teams like Lille, Milan, and Juventus, dominant teams in the French and the Italian leagues. The distinctive feature of Torino's strategy is that it involves less frequent usage of the ABCA motif.

4. FUTURE WORK
The presented results illustrate the potential of analyzing the flow motifs of soccer teams. There are several ways to extend the investigation of pass motifs to reveal finer-grained details of teams and players. As future work, we plan to address three areas: (i) explore the impact of home and away games on the prevalence of the motifs, (ii) condition the pass motifs on the results of the matches, and (iii) study the players' involvement in the different motifs.

5. CONCLUSIONS
We proposed a quantitative method to evaluate the styles of soccer teams through their passing structures. The analysis of the motifs in the pass networks allows us to compare and differentiate the styles of different teams. Although most teams tend to apply a homogeneous style, surprisingly, a unique strategy of soccer is also viable, and quite successful, as we have seen in recent years. Our results shed light on the unique philosophy of FC Barcelona quantitatively: the famous tiki-taka does not consist of uncountable random passes but rather has a precise, finely constructed structure.


[Figure 6 plot: teams of the five leagues positioned along PC1 (x-axis) and PC2 (y-axis).]

Figure 6: The style of soccer teams of the Spanish, Italian, English, French, and German soccer leagues. FC Barcelona has a unique style even on a European scale.

6. REFERENCES
[1] C. Anderson and D. Sally. The Numbers Game: Why Everything You Know about Football is Wrong. Penguin UK, 2013.
[2] J. Duch, J. S. Waitzman, and L. A. N. Amaral. Quantifying the performance of individual players in a team activity. PloS One, 5(6):e10937, 2010.
[3] InStat Football. http://instatfootball.com. 2014.
[4] P. Lucey, D. Oliver, P. Carr, J. Roth, and I. Matthews. Assessing team strategy using spatiotemporal data. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1366–1374. ACM, 2013.
[5] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824–827, 2002.
[6] T. Narizuka, K. Yamamoto, and Y. Yamazaki. Statistical properties of position-dependent ball-passing networks in football games. arXiv preprint arXiv:1311.0641, 2013.
[7] OptaPro. http://optasportspro.com. 2014.
[8] J. L. Peña and H. Touchette. A network theory analysis of football strategies. arXiv preprint arXiv:1206.6904, 2012.
[9] Prozone. http://prozonesports.com. 2014.
[10] Squawka. http://www.squawka.com. 2014.
[11] The Independent. The worst ever January transfer signings, including Fernando Torres, Andy Carroll and Eric Djemba-Djemba. http://www.independent.co.uk/sport/football/news-and-comment/t 2014.
[12] J. Wilson. Inverting the Pyramid: The History of Soccer Tactics. Nation Books, 2013.