
2020 24th International Conference Information Visualisation (IV)

Visual Analysis of FIFA Data

Michael Burch, Eindhoven University of Technology, Department of Computer Science and Mathematics, Eindhoven, The Netherlands

Günter Wallner, Eindhoven University of Technology, Department of Industrial Design, Eindhoven, The Netherlands

Sergiu Lazar Angelescu, Eindhoven University of Technology, Department of Computer Science and Mathematics, Eindhoven, The Netherlands

Peter Lakatos, Eindhoven University of Technology, Department of Computer Science and Mathematics, Eindhoven, The Netherlands

Abstract: Soccer is one of the most popular sports in the world, played by thousands of professionals and amateurs every week. Consequently, it is no surprise that it generates an enormous amount of data. In today's data-driven world it is essential to find a self-explanatory way of presenting this data so that visual patterns can be derived that relate to the underlying data patterns. In this paper, we describe an interactive visualization for analyzing soccer data and identifying patterns, correlations, and insights. We illustrate the usefulness of our approach, which is especially targeted towards non-visualization experts, by applying it to World Cup data and by discussing potential use cases.

Index Terms: Information visualization, Sports analytics, Soccer, Non-expert visualization

I. INTRODUCTION

Soccer is widely regarded as the world's most popular sport. More than forty percent of people can be considered to be soccer fans [4]. It is estimated that there are around 250 million soccer players all around the world, including 240 million amateurs [23]. The global soccer market is huge, comprising players, teams, leagues, fan clubs, sponsors, scouts, betting agencies, and many others, all of whom interact in myriad ways, generating an enormous amount of data. The popularity of soccer and the increase in capacities can also be tracked through the attendance reports of each event (cf. [5]). One million visitors were first reported in Brazil in 1950. Since then, this number has tripled, not just because of additional venues and capacities, but also due to the increased number of participating teams. The record in terms of attendance was set in 1994 in the USA with over 3.5 million spectators [5] and 32.1 billion viewers worldwide (cf. [15]).

The collection of data from soccer games and the calculation of basic soccer statistics [16] form a basis for further analyses and visualizations focusing, for instance, on the evaluation of player performance during or after a match. Advanced statistics, analyses, metrics, and modeling of soccer players and their performances can then be applied to find insights that, for instance, can be used for training purposes.

All this data is interesting for several stakeholders such as trainers, the players themselves, managers, betting companies, customers, and the mass media. However, the time-varying nature of the data and its growth in size can make utilizing the data increasingly challenging. Thus, supporting the analysis with easy-to-understand interactive visualizations helps to foster and simplify insight generation. In this vein, visualization combined with algorithmic data processing and data transformation [7], [18] has emerged as a valuable solution. Visualization techniques [20], if they are visually scalable and easy to understand, are extremely useful when it comes to interpreting large amounts of data, not only for experts but especially also for non-experts in visualization.

We propose a visual and interactive dashboard to help extract insights from soccer data. The dashboard shows relations between overall statistics, including goals scored or received, ball possession, and final place by the end of the competitions.

Fig. 1. Overview dashboard of all the World Cups.

978-1-7281-9134-8/20/$31.00 ©2020 IEEE, DOI 10.1109/IV51561.2020.00028

II. BACKGROUND AND RELATED WORK

Several approaches for the visual representation of sports data have been proposed in the past (cf. [10]), including soccer data (e.g., [12], [16]). However, many of them either make use of complex techniques (e.g., [2], [8], [24]) that are hard to understand for non-visualization experts or only utilize simple infographics (e.g., [16], [17]) that provide limited insights into the data. Moreover, interactions [25], such as linking several graphics in multiple coordinated views [11], are typically not well integrated in these visualization tools. Such interactions would, however, make them easier to use for non-experts in visualization. In a similar fashion, most of the existing soccer visualizations either focus on the player behavior during a match (e.g., [2]) or statistically represent all results in an aggregated manner only (e.g., [16]).

Even just one match might generate thousands of data points and events, collected by cameras, wearable sensors, GPS-packed balls, and human analysts. As discussed by Memmert [9], there is a huge potential in evaluating this data, especially in creating spatial analyses. However, only a fraction of it gets processed nowadays, leaving some of the potential untouched. One reason behind this is that many approaches opt for the most common and easily accessible statistics (cf. [17]) and do not look at the data in depth from different perspectives.

In addition, different stakeholders involved in soccer, such as clubs, players, supporters, betting agencies, or scouts, all have varying objectives and expectations. It is quite a challenge to have 'one platform for all' since the tasks to be solved differ widely. In addition, the range of soccer statistics is quite varied. Among the most prominent statistics websites are SoccerSTATS [14] and WhoScored [22], which mainly focus on regular statistical data. However, the latter also has interactive visualizations for each game, featuring several perspectives of the match. Consequently, many existing websites such as SoccerSTATS [14] are mainly designed with a single community in mind but rarely with multiple ones. Providing several easy-to-understand visualizations can be of great help for different communities since they can provide a starting point for further data exploration based on built hypotheses [7].

Most dashboards and visualizations of soccer matches share the same core statistical charts showing, for example, the number of passes, ball possessions, shots on target, and others. Static measures are usually displayed in a comparison table or a static chart. While there are innovative and state-of-the-art visualizations [10], most existing ones only focus on a few aspects of the data and hence do not provide several perspectives on it. Another common limitation of many of these dashboards is a lack of interaction [25]. Most of them do not offer an interactive way to explore the data and navigate in it. For example, filtering specific matches or players, sliding through a timeline of attacks, or watching a visualization of the shots on target are possibilities for interactions that could improve the analysis of a match [2].

III. DATASETS

As data source we used different datasets that contain information from all FIFA World Cups. At the beginning, the only data collected were the teams, the final result, the referee, and the stadium. In the subsequent decades more and more data started to be collected, including additional statistics such as goals and assists, and occasionally corner kicks and total fouls. Looking at the last World Cup in 2018, 36 different match details have been collected for every single game, including the percentage of successful passes, the number of fouls committed, tackles, distance covered by the whole team, and the like. Nowadays, a complete dataset contains many aspects of each game: events such as passes, the players involved in them and the location of the ball on the pitch, the area covered by each player during the match, the location of the shots on target, and the position from which a player tried to score.

Based on our goals of designing an interactive and easy-to-use visualization for non-experts in visualization and the resources we found online, we used the following datasets [1]:

• World Cup matches: Results of all soccer matches from all World Cups, together with the half-time score for each game, the referees, and the stadium.
• World Cups: Statistics per World Cup from the countries that reached the semi-finals, the total number of matches and goals scored, and the number of people attending the competition.
• 2018 World Cup statistics: A set with over 36 different types of statistics for each game from the last World Cup, including the final score, the number of yellow/red cards, the number of corners, and the position from which each goal was scored together with the player who scored.

When it comes to the specific data we used in this project, we focused on the total number of games won and goals scored (first half, second half, total) by each country in all World Cups until 2018, how many times each country finished a World Cup in one of the first four places, and the total number of people attending each World Cup. In addition, we decided to emphasize the improvement in terms of the collected data in the last edition of the World Cup. As such we also used the total distance covered in kilometers per team, pass accuracy (in percent), ball possession (in percent), and the position from where each goal was scored, by whom, and in what game.

IV. TASKS

In this section, we discuss the tasks that can be solved with the chosen datasets. We take three different perspectives on the tasks, starting with basic, straightforward ones and then moving on to more complex analyses.

Firstly, we focus on offering an overview of the data, including all teams and World Cups, then narrow it down to top-performing countries and, finally, look at insights into individual national teams, following the overview first, details-on-demand scheme [13]. By looking at the dataset of all World Cup matches since 1930, it is easy to get an overview of the different national teams' performances. We can draw conclusions from the number of wins, goals scored, and attendance of each event. Our first and most basic task is to aid the discovery of the most successful teams in the history of World Cups.

In addition, because of the detailed dataset of the latest World Cup, we can compare teams in many different ways. Our objective is to be able to determine the best performing teams and to assess their strengths based on different features.

Further narrowing the view, we aim to enable the inspection of the goal scoring tendencies of the teams and try to see their strengths and weaknesses through the data. Individual players also play a role at this point, since we know which of them scored goals and from where. This offers the potential to deduce information about team strategies from goal-related information.
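The first task, finding the most successful teams from the match results, can be illustrated with a minimal sketch. It is not the Tableau implementation used for the dashboard; the function name `wins_per_country`, the tuple layout, and the sample rows are hypothetical and the results are made up. It also assumes that a match with equal scores counts as a draw, so knockout games decided on penalties would need extra handling.

```python
from collections import Counter


def wins_per_country(matches):
    """Count match wins per team; matches with equal scores add no win."""
    wins = Counter()
    for home, away, home_goals, away_goals in matches:
        if home_goals > away_goals:
            wins[home] += 1
        elif away_goals > home_goals:
            wins[away] += 1
    return wins


# Made-up rows (not real results): (home team, away team, home goals, away goals)
sample_matches = [
    ("Team A", "Team B", 2, 1),
    ("Team C", "Team A", 0, 3),
    ("Team B", "Team C", 1, 1),
    ("Team B", "Team A", 2, 0),
]

# Teams sorted by number of wins, most successful first
ranking = wins_per_country(sample_matches).most_common()
print(ranking)
```

A ranking of this kind is the sort of aggregate that a color-coded overview map could encode, with one value per country.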

V. VISUALIZATION DESIGN

The dashboard was built using Tableau [21] since it is easy to use and provides support for interactivity.

Fig. 2. Summary of the different dashboards with overview dashboard (top left), goal dashboard (top right), statistics dashboard (bottom left), and the main menu (bottom right).

A. Used Techniques

Visualizations are effective tools that support the rapid extraction of insights because they leverage the perceptual abilities of humans for detecting visual patterns (cf. [6], [19], [20]).

In terms of chosen visualizations for our dashboard, we started by defining our target group and tasks (see above), followed by setting goals such as providing an opportunity for comparative analysis and highlighting which parameters impact the final result of a game the most. Finally, we decided to offer heat maps, tree maps, stacked, horizontal, and side-by-side bar charts, as well as scatter plots since they are commonly considered to be easy to understand, also by non-visualization experts. Color is used as the main visual variable in the dashboard to determine the most successful teams throughout World Cup history. We used a neutral value (light gray color) for average results and increased the saturation towards opposite colors (blue/red) for diverging data (see, e.g., Figure 1).

To facilitate the analysis of the data from different perspectives, we created three different interactive dashboards (overview, goals, and statistics) and a main menu that connects them. The main menu forms the entry point and redirects the user to the other three dashboards. Each of the three individual interactive dashboards has a home button located at the top left corner taking users back to the main screen. An overview of the dashboards can be found in Figure 2.

B. Interaction and Linked Design

We implemented multiple types of interactions to increase the efficiency of data exploration and information extraction. We mainly follow the interaction principles discussed by Yi et al. [25]. We identified three key advantages of using interactive data visualization elements:

• Multiple questions: Interactivity allows users to raise multiple questions per visualization, supporting the comparison of results and the break-down of responses according to specific questions based on different attributes.
• Focus on detail: Interactivity allows users to zoom into a visualization, select an area of interest, and inspect this area of the chart in greater detail. Hovering over data points allows users to get further details about a specific item.
• Non-expert experience: Interactivity is particularly powerful because it allows users to select points in a chart in order to create a summary of the selected data or to filter for specific portions. Hence, by using easy-to-apply visual concepts the users can focus on more or less information, depending on their information needs.

As illustrated in Figure 3, all dashboards include interactions such as hovering over different elements, zooming in on different areas, and selecting multiple elements for comparative analysis. The visual representations are not only interactive but are also linked with each other, that is, selecting certain graphical primitives applies this selection in all other views in which they are visible as well.

Fig. 3. Interactions: details-on-demand using tool tips (left), focus using color-coded rows (dark blue row, center), and highlighting (red-colored region, right).

On the highest level, filtering per country is supported, which allows users to go into more detail and check different statistics about selected countries, also over time. An overview of this feature can be seen in Figure 4.

Fig. 4. Overview Dashboard: Geographic regions color-coded according to a specific attribute. Additional charts and textual information provide further details about certain countries.

Moreover, an interactive dashboard with all goals scored in the last World Cup was created. It provides information related to the position from where each goal was scored, who scored it, against whom, and the date of the game. Besides the basic interactions outlined above, three additional filters were implemented. One filter allows the user to select a certain player in order to see only the goals scored by this particular player. The second one enables the user to observe only the goals scored against a specific team, and the last one offers the user the possibility to inspect the goals scored in a certain period of time by specifying the start and end date. These filters can also be applied simultaneously, as shown in Figure 5.

Fig. 5. Goals Dashboard: All goals scored against a selected team in the 2018 World Cup, between the 15th and 23rd of June 2018.

VI. APPLICATION EXAMPLES

In this section we present application examples based on all World Cups, the top teams in 2018, and goal scoring.

A. All World Cups

In our first use case, we explore the results of the 20 World Cups held since 1930.¹ A glance at Figure 6 shows that four countries (the more reddish, the more wins; the more bluish, the fewer wins) dominated the World Cups with respect to the number of games won: Germany, Brazil, Argentina, and Italy. The rest of Western Europe is closely behind. Figure 6 also indicates that Asia is quite underrepresented in soccer (grey-colored countries), with only two countries, one of them Korea, competing. Africa has steady contenders; however, no African team has ever played in a final. In fact, only 13 countries have played in a final, and only eight countries have won World Cups at all. Another interesting fact is that the last five consecutive cups were won by different countries.

Fig. 6. Total number of games won by each country across all World Cups.

Figure 7 provides an overview of the teams finishing in the Top 4. It is interesting to see that, for example, England and Spain do not have any podium results other than a single win. The Netherlands came close to winning in 1974, 1978, and 2010.

Looking at which teams scored the most goals, it is no surprise to see the winners ranking high, with Germany and Brazil far ahead. Germany has scored over 240 goals during the last 90 years. In Figure 10 we can see Hungary in 9th place in the list of all-time goals, even though the last time they participated in the event was 1986. However, they were so successful in the first World Cups that they are still ahead of teams that are active today. This also highlights a current limitation of this visualization, as the data is aggregated over time.
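The aggregation limitation noted above can be made concrete with a small sketch: two teams with the same all-time total can have very different histories, which only becomes visible once goals are broken down per tournament. The function names, the record layout, and all numbers below are hypothetical and purely illustrative; this is not part of the dashboard described in the paper.

```python
from collections import defaultdict


def goals_by_country_and_year(records):
    """Group goals per country and tournament year."""
    totals = defaultdict(lambda: defaultdict(int))
    for country, year, goals in records:
        totals[country][year] += goals
    # Convert to plain dicts with years in ascending order
    return {c: dict(sorted(by_year.items())) for c, by_year in totals.items()}


def all_time_totals(per_year):
    """Collapse the per-year breakdown into one number per country."""
    return {c: sum(by_year.values()) for c, by_year in per_year.items()}


# Made-up records (not real statistics): (country, tournament year, goals)
records = [
    ("Team X", 1934, 12),
    ("Team X", 1938, 10),
    ("Team Y", 2010, 8),
    ("Team Y", 2014, 7),
    ("Team Y", 2018, 7),
]

per_year = goals_by_country_and_year(records)
print(all_time_totals(per_year))  # identical totals for both teams
print(per_year)                   # different temporal profiles
```

Keeping the tournament year as a second grouping key is one possible way to let a view distinguish historically successful teams from currently active ones.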
¹ The first event was hosted in Uruguay; 13 teams participated, and it was the only competition that did not have qualifications.

Fig. 7. World Cup results per country.

Fig. 10. Total number of goals scored by each country across all World Cups.

B. Top Teams in 2018

Shifting the focus to the most recent World Cup, our objective here is to explore the strengths of the top performing teams and to see how their performance measures relate to each other. We started with three features (accuracy of passes, ball possession, and running distance covered), which can be seen in Figure 8 (sorted by accuracy).

Fig. 8. Passing accuracy, ball possession, and distance covered by all teams.

The winner, France, performed only moderately in all three aspects. Passing accuracy aligns closely with ball possession, because the majority of possession is held by the defenders and mid-fielders, who can make more pressure-free passes. Saudi Arabia has a high accuracy, most probably for the reason mentioned about ball possession, which is additionally supported by the fact that three teams are within the Top 4 for both measures. Possession is also usually negatively associated with running, because the team that is chasing the ball has to cover more distance. With an average of 125 kilometers covered in its games, the Russian team ran the most during matches, closely followed by England and Croatia. However, in general most players cover around 10 kilometers during the 90 minutes.

Next, we look at the balls recovered, shot attempts, and total goals scored, as shown in Figure 9 (sorted by total goals). The general impression is that more balls recovered result in more attempts which, in turn, result in more goals. However, good teams lose possession less often as well. By re-sorting the view according to possessions (not shown), we could observe that the bottom three teams of this list are Belgium, Germany, and Spain. However, they are at the top with respect to attempts, less so with goals. The data displays their dominance in the game, but also shows the lack of success throughout the tournament. Belgium scored the most goals and had the second most attempts after Croatia. One team is high up in the score list with half as many attempts as the other top teams; it scored a goal with every fourth attempt.

Fig. 9. Balls recovered, attempts, and goals.

C. Goal Scoring

Our third and last use case is based on goals scored by a team and by individual players. We start by visualizing the goals scored in the recent World Cup, as depicted in Figure 11. As we have shown in the previous use case, Belgium was the top scoring team. Figure 12 illustrates the positions from which they scored their goals. We can see that all their goals were scored from within the penalty area. It can be assumed that their strategy was to finish the moves as close as possible to their opponent's goal, and to go for certainty rather than having too many attempts from outside the penalty area. A similar visual pattern is observed if we filter for Harry Kane (not shown), the top scoring player with six goals. These goals were scored even closer to the net, allowing for the assumption that the assists he received were well-played.

Fig. 11. Origin of goals in the 2018 World Cup.

Fig. 12. Belgium's goals, all in the penalty area.

Another unexpected tendency which can be observed is the distribution of goals over the two halves of a match. Figure 13 shows that all top scoring teams have the same peculiarity: more goals were scored in the second half than in the first. There can be several reasons for this: most soccer team strategies do not rush scoring a goal, and patience plays an important role in the games; pressure also builds up as time goes on, and teams have no choice but to attack as the game gets closer to its end; and lastly, teams get more tired in the second half, which inevitably leads to more mistakes and scoring opportunities. However, more data would be necessary to make more in-depth claims about possible reasons.

Fig. 13. Balls recovered, attempts, and goals.

VII. CONCLUSION AND FUTURE WORK

In this paper we presented an interactive dashboard for exploring soccer data. The visualizations are designed for non-experts, i.e., people with limited knowledge about graphical representations. We illustrated the workings of the dashboard by presenting several use cases using soccer data from World Cups and provided insights into the data based on several perspectives. However, a thorough evaluation of our interactive visualization will be necessary to ascertain its usefulness and intuitiveness for non-experts but also for experts, in the best case supported by eye tracking [3]. As a future step we would like to take the implementation towards a prediction tool. There is a wealth of other insights that could be gained as well. We would also like to focus more on individual player analysis and prediction of events; however, these require highly extensive datasets which we did not have available for this work. It would also be intriguing to see differences in insights regarding the national competitions as opposed to club leagues. Lastly, one of the biggest benefits of visualization is the added value of being able to spatially represent the data such that underlying patterns in changes in formation, possession, and attack style are clearly visible.

REFERENCES

[1] World cups statistics, 2018. https://gitlab.com/djh or/2018-world-cup-stats/blob/master/world cup 2018 stats.csv Accessed: August, 2020.
[2] G. L. Andrienko, N. V. Andrienko, G. Budziak, T. von Landesberger, and H. Weber. Exploring pressure in football. In Proceedings of the 2018 International Conference on Advanced Visual Interfaces, AVI, pages 54:1–54:3, 2018.
[3] T. Blascheck, M. Burch, M. Raschke, and D. Weiskopf. Challenges and perspectives in big eye-movement data visual analytics. In Proceedings of the 1st International Symposium on Big Data Visual Analytics, pages 17–24, 2015.
[4] T. N. Company. World football report - Nielsen, 2018. https://www.nielsen.com/us/en/insights/report/2018/world-football-report/ Accessed: August, 2020.
[5] FIFA. FIFA World Cup™ comparative statistics 1982-2014, 2014. https://resources.fifa.com/image/upload/comparative-statistics-1982-2010-519730.pdf?cloudid=o3dx8kjlopctz35v460o Accessed: August, 2020.
[6] C. G. Healey and J. T. Enns. Attention and visual memory in visualization and computer graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7):1170–1188, 2012.
[7] D. A. Keim. Solving problems with visual analytics: Challenges and applications. In Proceedings of Machine Learning and Knowledge Discovery in Databases - European Conference, pages 5–6, 2012.
[8] J. L. S. Malqui, N. M. L. Romero, R. Garcia, H. Alemdar, and J. L. D. Comba. How do soccer teams coordinate consecutive passes? A visual analytics system for analysing the complexity of passing sequences using soccer flow motifs. Computers and Graphics, 84:122–133, 2019.
[9] R. R. Memmert. Match analysis, big data and tactics: Current trends in elite soccer. German Journal of Sports Medicine, 3, 2018.
[10] C. Perin, R. Vuillemot, C. D. Stolper, J. T. Stasko, J. Wood, and S. Carpendale. State of the art of sports data visualization. Computer Graphics Forum, 37(3):663–686, 2018.
[11] J. C. Roberts. Guest editor's introduction: special issue on coordinated and multiple views in exploratory visualization. Information Visualization, 2(4):199–200, 2003.
[12] M. Ryoo, N. Kim, and K. Park. Visual analysis of soccer players and a team. Multimedia Tools and Applications, 77(12):15603–15623, 2018.
[13] B. Shneiderman. The eyes have it: a task by data type taxonomy for information visualizations. In Proceedings 1996 IEEE Symposium on Visual Languages, pages 336–343, 1996.
[14] SoccerSTATS. SoccerSTATS, 2019. https://www.soccerstats.com Accessed: August, 2020.
[15] C. Solberg, Harry Arne Gratton. Broadcasting the World Cup, pages 47–62. Palgrave Macmillan UK, London, 2014.
[16] R. Theagarajan, F. Pala, X. Zhang, and B. Bhanu. Soccer: Who has the ball? Generating visual analytics and player statistics. In 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR, pages 1749–1757. IEEE Computer Society, 2018.
[17] E. R. Tufte. The Visual Display of Quantitative Information. Graphics Press, 1992.
[18] W. M. P. van der Aalst. Process Mining - Data Science in Action, Second Edition. Springer, 2016.
[19] C. Ware. Information Visualization: Perception for Design. Morgan Kaufmann, 2004.
[20] C. Ware. Visual Thinking: for Design. Morgan Kaufmann Series in Interactive Technologies, Paperback, 2008.
[21] R. M. G. Wesley, M. Eldridge, and P. Terlecki. An analytic data engine for visualization in Tableau. In T. K. Sellis, R. J. Miller, A. Kementsietsidis, and Y. Velegrakis, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD, pages 1185–1194. ACM, 2011.
[22] WhoScored. Whoscored?, 2019. https://www.whoscored.com Accessed: August, 2020.
[23] M. Wiewiorski, M. Wurm, A. Barg, M. Weber, and V. Valderrabano. Football/soccer. In Foot and Ankle Sports Orthopaedics, pages 459–464. Springer, 2016.
[24] Y. Wu, X. Xie, J. Wang, D. Deng, H. Liang, H. Zhang, S. Cheng, and W. Chen. Forvizor: Visualizing spatio-temporal team formations in soccer. IEEE Transactions on Visualization and Computer Graphics, 25(1):65–75, 2019.
[25] J. S. Yi, Y. ah Kang, J. T. Stasko, and J. A. Jacko. Toward a deeper understanding of the role of interaction in information visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6):1224–1231, 2007.
