Valuing On-The-Ball Actions in Soccer: a Critical Comparison of Xt and VAEP

Valuing On-the-Ball Actions in Soccer: A Critical Comparison of xT and VAEP Maaike Van Roy, Pieter Robberechts, Tom Decroos, Jesse Davis KU Leuven, Department of Computer Science ffi[email protected] Abstract Besides soccer, such models were developed in many other sports, including basketball (Cervone et al., 2014), Objectively quantifying a soccer player’s contributions within American football (Romer, 2006), ice hockey (Routley and a match is a challenging and crucial task in soccer analytics. Schulte, 2015; Liu and Schulte, 2018) and rugby (Kempton, Many of the currently available metrics focus on measuring Kennedy, and Coutts, 2016). However, the low-scoring na- the quality of shots and assists only, although these represent less than 1% of all on-the-ball actions. Most recently, several ture of soccer and the small number of on-the-ball actions approaches were proposed to bridge this gap. By valuing how makes quantifying a player’s contributions within a soccer actions increase or decrease the likelihood of yielding a goal, match particularly challenging. Therefore, those models be- these models are effective tools for quantifying the perfor- came popular in soccer analytics only recently, fueled by the mances of players for all sorts of actions. However, we lack availability of more extensive data. an understanding of their differences, both conceptually and Two primary data sources about soccer matches exist that in practice. Therefore, this paper critically compares two such can be used to value actions: event stream data and opti- models: expected threat (xT) and valuing actions by estimat- cal tracking data. Event stream data annotates the times and ing probabilities (VAEP). Both approaches exhibit variety in locations of specific events (e.g., passes, shots, and cards) their design choices, that leads to different top player rankings and major differences in how they value specific actions. that occur in a game. Optical tracking data records the locations of the players and the ball multiple times per sec- ond. While some work exists on valuing actions using track- Introduction ing data (Fernandez,´ Bornn, and Cervone, 2019; Link, Lang, and Seidenschwarz, 2016; Spearman, 2018), the vast major- A fundamental task in soccer analytics is to objectively ity of work focuses on event stream data as it is more widely quantify a player’s performance during a match. Typically, available, both in terms of leagues covered and availability the goal is to summarize a player’s contribution to the team’s to clubs.1 performance using one, or a handful of numbers. This can Broadly speaking, there are three styles of approaches for help inform a variety of different decisions that a club must valuing actions in soccer using event stream data: make in areas such as team selection, opponent scouting, and player acquisition. Furthermore, these approaches facil- Count-based approaches. These techniques (McHale and itate fan engagement as they provide fodder for debating the Scarf, 2007; McHale, Scarf, and Folker, 2012; Pappalardo relative merits of different players (e.g., McHale, Scarf, and et al., 2019) rate players by (1) assigning a weight to each Folker 2012; Decroos et al. 2019) or they can help tell the action type, and (2) calculating a weighting sum of the story of a match (e.g. Decroos et al. 2017a). number of times a player performs each action type (e.g., In recent years, soccer analytics researchers and enthu- pass, dribble, cross, tackle) during a match. The weights siasts have proposed several of these performance metrics are typically learned by training a model that correlates for assessing individual players. Although, the majority of these counts with either the match outcome or the number these metrics focuses on measuring the quality of specific of goals scored (or conceded). action types in a variety of specific game situations, such Expected possession value (EPV) approaches. These as shooting opportunities (Green, 2012), off-ball position- techniques (Rudd, 2011; Mackay, 2017; Decroos et ing (Spearman, 2018), passing (Bransen and Van Haaren, al., 2017b; Yam, 2019; Singh, 2019) divide a match 2018) and set pieces (McKinley, 2018). The latest research into possessions or phases, which are sequences of has attempted to join these models together in a unifying consecutive on-the-ball actions where the same team framework that can value a wide range of action types in possesses the ball. Hence, these models value each action varying game scenarios (Decroos et al., 2019; Yam, 2019; that progresses the ball, typically by seeing how much the Singh, 2019; Fernandez,´ Bornn, and Cervone, 2019). action changed the team’s chances of producing a goal Copyright c 2020, Association for the Advancement of Artificial 1Often, tracking data is not shared across leagues, which makes Intelligence (www.aaai.org). All rights reserved. event stream data valuable for player recruitment purposes. scoring attempt. Conceptually, the vast majority of these implies that soccer games are divided into possessions, approaches can be seen as modeling a possession using a which are periods of the game where the same team has Markov model. control of the ball. Subsequently, each possession can be Action-based approaches. VAEP (Decroos et al., 2019) is discretized in a consecutive sequence of ball-progressing ac- a recent approach that goes beyond the possession-based tions. The key insight underlying xT and similar models is ones by trying to value a broader set of actions and by that players perform these actions with the intention to move taking the action and game context into account. Decroos the game into a state in which they are more likely to score. et al. frame the problem as a binary classification task and These game states directly correspond to the transient states of a Markov model: players transition the game from one rate each action by estimating its effect on the short-term 3 probabilities that a team will both score or concede. state to another by passing or dribbling until absorption (i.e., a goal or possession turnover). Despite the prevalence and importance of these types of Although the game states can be made arbitrarily com- models, the various approaches are rarely, if ever, directly plex (Rudd, 2011; Yam, 2019), xT represents each game compared either conceptually or empirically. In this work, state Si by only considering the location of the ball. There- we will focus on providing such a comparison between fore, xT overlays a M × N grid on the pitch in order to the elegant and popular expected threat (xT) model (Singh, divide it into M · N zones. Each zone z is then assigned a 2019) and the VAEP model (Decroos et al., 2019). We select value xT(z) that reflects how threatening teams are at that these two models as they are canonical exemplars for the last location, in terms of scoring (Figure 1). The value Q(Si) of two styles of approaches. Because the EPV and action-based game state Si = fa1; : : : ; aig is then simply the value of that approaches both focus on rating each individual action based zone corresponding to ai’s end location. The Markov model on properties of the action, they are more closely related in view allows deriving these xT values from historical data by spirit to each other than to the count-based approaches which iteratively solving the following equation: look at aggregated actions without accounting for any aspect M×N of an action’s context. We highlight the key differences in X 0 design choices made by xT and VAEP, which yields differ- xT(z) = sz · xG(z) + mz · Tz!z0 · xT(z ); ent strengths and weaknesses. Qualitatively, we show sev- z0=1 eral illustrative actions where this leads to each formalism where sz is the probability that a player will shoot when in producing different valuations for particular actions. Quan- zone z, xG(z) is the probability of a shot from zone z being titatively, we show that this leads to different rankings of converted into a goal, mz is the probability that a player will players with xT being slightly more correlated with play- move the ball when in zone z, and T is a transition matrix making whereas VAEP tends to favor shooting. Importantly, that defines the probability that the player moves the ball to both rankings deviate from traditional metrics like goals or each of the other zones when in zone z. Intuitively, solving assists per 90 minutes, which shows they give novel insights the equation boils down to looking another action ahead with into player performance. each added iteration. In the first iteration, all xT(z) values are initialized to zero. After iteration i, xT(z) then represents Action Valuing Frameworks the probability of scoring within the next i actions. When considering event stream data, a soccer match can be Subsequently, the model values a successful action ai that 0 viewed as a sequence of n consecutive actions a1; a2; : : : an. moves the ball from zone z to zone z by computing the dif- Each action ai is described by a number of properties such ference between the threat value before and after that action: as its start location, its end location, its start time, and what 0 VxT (ai) = xT(z ) − xT(z): (2) type of action it was. The effect of an action ai is to move the game from state S = fa ; : : : ; a g to state S = i−1 1 i−1 i VAEP fa1; : : : ; ai−1; aig. Consequently, at a high-level EPV and action-based approaches all value actions according to the VAEP uses a much more complex game state representation following equation: than xT. It considers the three last actions that happened during the game: Si = fai−2; ai−1; aig.

Load more