
Predicting Wide Receiver Trajectories in American Football

Namhoon Lee and Kris M. Kitani The Robotics Institute, Carnegie Mellon University [email protected], [email protected]

Abstract

Predicting the trajectory of a wide receiver in the game of American football requires prior knowledge about the game (e.g., route trees, defensive formations) and an accurate model of how the environment will change over time (e.g., opponent reaction strategies, motion attributes of players). Our aim is to build a computational model of the wide receiver which takes into account prior knowledge about the game and short-term predictive models of how the environment will change over time. While prior knowledge of the game is readily accessible, it is quite challenging to build predictive models of how the environment will change over time. We propose several models for predicting short-term motions of opponent players to generate dynamic input features for our wide receiver forecasting model. In particular, we model the wide receiver with a Markov Decision Process (MDP), where the reward function is a linear combination of static features (prior knowledge about the game) and dynamic features (short-term prediction of opponent players). Since the dynamic features change over time, we make recursive calls to an inference procedure over the MDP while updating the dynamic features. We validate our technique on a video dataset of American football plays. Our results show that more informed models that accurately predict the motions of the defensive players are better at forecasting wide receiver plays.

Figure 1: Our approach forecasts viable trajectories for the wide receiver in American football.

1. Introduction

The task of analyzing human activity has received much attention in the field of computer vision. Among the many sub-fields dealing with human activities, we address the problem of activity forecasting, which refers to the task of inferring the future actions of people from visual input [15]. Human activity forecasting is a different task from that of recognition, detection or tracking. Vision-based activity forecasting aims to predict how an agent will act in the future given an image of the world.

Forecasting human activity in dynamic environments is extremely difficult because changes in the environment must also be hallucinated. Recent work on forecasting activity [15] has been limited to static environments where the reward function defined over the state space does not change (i.e., objects in the scene are assumed to be immovable), making it easier to reason about future action sequences. Predicting the future becomes difficult if the environment keeps changing over time, since the agent will have to take into account the possible repercussions of his actions on others. This is especially true in sports scenarios: an offensive play will evolve differently depending on the response of the defense over time. In this work, we focus on forecasting human activity in the dynamic domain of sports by iteratively predicting short-term changes in the environment.

To estimate future changes in the environment (the defense), we utilize two different short-term dynamics prediction models: (1) a non-linear model exploiting Gaussian process regression (GPR), and (2) a linear model based on a constant motion constraint on human motion (CM).



Figure 2: (a) Registration of an American football video onto a top-view ground field. Trajectory of the wide receiver (cyan) and the cornerback (magenta). (b) In a static environment, the wide receiver (WR, cyan dot) will proceed straight to the goal (⋆) since the defender blocking his path is assumed immovable. In a dynamic environment, however, the wide receiver makes a detour to avoid collision with the cornerback (CB, magenta triangle).

To process the output of the short-term dynamics predictors, we develop a sequential inference algorithm that incrementally forecasts future actions of the wide receiver based on the updated set of short-term predictions of defensive motion. We stress here that we only hallucinate defensive motion and we do not use true observations of the defense (the proposed algorithm does not have access to the future). Our approach enables forecasting in dynamic environments and outperforms the standard state-of-the-art forecasting model in predicting wide receiver activity in terms of both qualitative and quantitative measurements. To the best of our knowledge, this is the first work to predict wide receiver trajectories from images by taking into account the incremental dynamics of defensive play.

The contributions of this work are summarized as follows: (1) development of short-term dynamics prediction models that accurately predict changes in the defense, (2) use of a dynamic state reward function that seamlessly integrates multiple short-term forecast distributions during inference, and (3) the application of visual activity forecasting to predicting wide receiver trajectories.

1.1. Related work

Human activities in sports scenes have been analyzed through a variety of computer vision techniques for different purposes. Intille and Bobick [8] proposed a recognition algorithm for identifying team-based action in the domain of American football. Laviers and Sukthankar [18, 19] also modeled opponent players of American football successfully, but they used real-time responses to learn policies in an artificial game setting. Previous work [11, 13] used motion fields to predict play evolution in dynamic sports scenes and predicted global centers of focus. Lucey et al. [20] effectively discovered team formations and plays using the role representation. For the purpose of predicting agent motion, not necessarily in sports, Kim et al. [14] predicted agent motion using a Kalman filter based on reciprocal velocity obstacles. Pellegrini et al. [22] used the stochastic linear trajectory avoidance model to predict pedestrian trajectories. Walker et al. [28] proposed visual prediction using mid-level visual patches. Urtasun et al. [27] proposed a form of Gaussian process dynamical models for learning human motion, and the authors in [12] used Gaussian process regression to learn motion trajectories, classify the patterns and detect anomalous events. In [17, 10, 29], future human activities are predicted with different approaches in different domains. Despite previous work on understanding human activities in sports and social scenes, we are more interested in forecasting activity by modeling the response of the players to predict changes in the scene.

Recent research has also shown that predictive models can be inferred using inverse reinforcement learning [1, 24, 21, 23, 15, 7, 2]. Several works on modeling agent behavior have been presented, including maximum margin based prediction [26], and combining prior knowledge and evidence to derive a probability distribution over the reward functions [24]. Also, based on the principle of maximum entropy [9], probabilistic approaches to inverse optimal control have been developed [31, 32, 30]. Our work is most similar to [31, 15] in that we use an MDP based on maximum entropy inverse reinforcement learning. Our main focus is, however, to enable forecasting in dynamic environments using predicted features over future unobserved states.
2. Overview: American Football

In American football, the wide receiver is an offensive position that plays a key role in passing plays. In order to succeed in passing plays, the wide receiver attempts to receive the ball from the quarterback, while avoiding or outrunning defenders (typically cornerbacks or safeties) along his pass route. A cornerback is a defensive role that covers the wide receiver and attempts to block any passing plays. It is vital for the wide receiver to avoid the defense and advance as far as possible. Since the movement of the wide receiver can differ depending on the defenders, predicting the wide receiver trajectories should take into account the possible changes to the environment (see Figure 2b).

To simplify our problem, we build our proposed prediction model on the assumption that the cornerback is the primary defender affecting the wide receiver's future activity and contributes to the changes in the environment (i.e., generates dynamic features). In our scenario, the opponent (CB) forms a negative force field which repels the wide receiver.

We describe how the dynamics of the environment are predicted in Section 3. Then, using the proposed methods to predict the dynamic features (i.e., defensive motion), we perform sequential inference for predicting wide receiver trajectories in Section 4. We present our comprehensive experimental results in Section 5, along with an application to sports analytics in Section 6.

Figure 3: Creation of a training example used in the non-linear feature regressor. Both wide receiver (cyan) and cornerback (magenta) are contained in x implicitly to predict the opponent reaction.

3. Prediction of the Dynamic Environment

3.1. Non-linear feature regressor

One approach to predicting the opponent reaction is to use a regressor trained in a supervised manner. Instead of assuming a simple parametric model which lacks expressive power, we propose to use a finer approach, Gaussian process regression (GPR), which gives more flexibility in representing data [25] as well as variance estimates that we use for generating dynamic features. Fully specified by its mean function m(x) and covariance function k(x, x'), the Gaussian process generates a distribution over functions f as follows,

    f ∼ GP(m(x), k(x, x'))                                                (1)

where we use a linear model for the mean function, m(x) = wx + b, and an isotropic Gaussian with noise for the covariance function, k(x, x') = σ_y^2 exp(−(x − x')^2 / (2 l^2)) + σ_n^2 δ_{ii'}. The hyperparameters {w, b, σ_y, σ_n, l} are obtained by a gradient-based numerical optimization routine. Given some noisy observations, Gaussian process regression not only fits the training examples but also computes the posterior that yields predictions for unseen test cases,

    y_* | y ∼ N(μ_* + Σ_*^T Σ^{−1}(y − μ), Σ_** − Σ_*^T Σ^{−1} Σ_*)       (2)

where μ and μ_* are the training mean and the test mean respectively, Σ is the training set covariance, Σ_* is the training-test set covariance and Σ_** is the test set covariance. For the known function values of the training cases y, the prediction y_* corresponding to the test inputs x_* is, in our case, the predicted position of the cornerback.

Given the GPR model, we collect the training samples x^(i) and labels y^(i) as follows. We construct x of a training example by concatenating all the relative distance vectors ~c_i (i = 1, ..., n) pointing from the centroid c_0 of the trajectory pair to the control points c_i that are uniformly picked on the two trajectories of the wide receiver and cornerback. Each example is then rotated to a baseline to make it rotation-translation invariant (see Figure 3). Since the absolute coordinates and orientations of the trajectories vary across scenes, this sort of data calibration is necessary. The training label y is the vectorized future locations of the cornerback for the next k time steps. Having a set of N training examples of the form {(x^(1), y^(1)), ..., (x^(N), y^(N))}, we create the training samples [x, y] to perform GPR. Note that x of the training examples contains the information of both the wide receiver and the cornerback implicitly (i.e., all control points from both WR and CB) in an attempt to estimate the future opponent locations and create the dynamic features based on the interplay between wide receiver and cornerback. Using the covariance of the prediction from the regression, we estimate the area (i.e., 95% confidence interval) of the opponent reaction as the dynamic feature.
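For illustration, the following is a minimal Python sketch of how such a short-term opponent regressor could be set up with scikit-learn's GaussianProcessRegressor. It is a simplification of the model above (a zero-mean GP with an RBF plus white-noise kernel rather than the linear mean, and no rotation to a baseline); the function names, the number of control points and the horizon k are illustrative assumptions, not the authors' implementation.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def make_example(wr_traj, cb_traj, n_ctrl=10, k=5):
    # wr_traj, cb_traj: (T, 2) arrays of field coordinates for one play.
    # The input x concatenates relative vectors from the pair centroid c0 to
    # control points picked uniformly on both observed trajectories; the label
    # y stacks the next k cornerback locations (the paper additionally rotates
    # each example to a common baseline, omitted here).
    obs_wr, obs_cb = wr_traj[:-k], cb_traj[:-k]
    c0 = np.vstack([obs_wr, obs_cb]).mean(axis=0)
    idx_wr = np.linspace(0, len(obs_wr) - 1, n_ctrl).astype(int)
    idx_cb = np.linspace(0, len(obs_cb) - 1, n_ctrl).astype(int)
    ctrl = np.vstack([obs_wr[idx_wr], obs_cb[idx_cb]])
    x = (ctrl - c0).ravel()
    y = (cb_traj[-k:] - c0).ravel()
    return x, y

def fit_gpr(X, Y):
    # Kernel hyperparameters are tuned by gradient-based maximum likelihood
    # inside fit(), mirroring the gradient-based routine mentioned above.
    kernel = RBF(length_scale=10.0) + WhiteKernel(noise_level=1.0)
    return GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, Y)

# At test time, gpr.predict(x_test[None, :], return_std=True) gives the
# hallucinated CB positions together with a spread from which a 95% band
# (mean +/- 1.96 * std) can be rasterized into the dynamic feature map.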
3.2. Linear (constant motion) feature regressor

Since the motion of a player is unlikely to change drastically in a short period of time, one natural approach to estimating the opponent reaction in the near future is to directly model the human activity using instantaneous speed. Specifically, the velocity of a player at the current state is determined by averaging over the velocities in the past frames. Then we can estimate the location of a player in the future using the velocity at the current state, computed as follows,

    l_{t+k} = l_t + (k / n_f) Σ_{i=t−n_f}^{t−1} V_i                        (3)

where l_t is the location of a player (represented in 2D) at time t, l_{t+k} is the location of the player k time steps later, n_f is the number of past frames, and V_i is the velocity at frame i. Thus, the future location of the opponent player k time steps ahead is estimated by k multiples of the average velocity per frame plus the current location. Depending on the number of past frames used, the average velocity will differ, as will the estimated locations of a player. The estimated locations of a player are then Gaussian filtered to form the area of the opponent reaction as the dynamic feature.
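A minimal numpy sketch of the constant-motion estimate in Eq. (3) follows; the function name and the default window n_f are assumptions for illustration, and the Gaussian filtering of the estimates into a feature map is left out.

import numpy as np

def predict_constant_motion(locations, k, n_f=5):
    # locations: (T, 2) observed opponent positions up to the current frame t.
    # Returns l_{t+k} = l_t + (k / n_f) * sum of the last n_f frame velocities.
    v = np.diff(locations[-(n_f + 1):], axis=0)   # last n_f velocity vectors
    return locations[-1] + k * v.mean(axis=0)

# Example: a player drifting one unit per frame in x and 0.1 in y.
traj = np.cumsum(np.tile([1.0, 0.1], (8, 1)), axis=0)
print(predict_constant_motion(traj, k=5))         # roughly traj[-1] + [5.0, 0.5]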
4. Forecasting in Dynamic Environments

4.1. Maximum entropy inverse optimal control

A Markov decision process (MDP) is a mathematical model for decision making [3, 6] in the form of a Markov chain defined with states s, actions a, transition probabilities p^{s'}_{s,a} and an initial distribution p(s_0). As a stochastic process, an agent takes an action to move to another state with the transition probability and accordingly receives a reward r(s) for that action. In our problem domain, the location of an agent (i.e., the wide receiver) from a top-down view in world coordinates represents the state s = [x, y], and the movement to an adjacent state near the current state is defined as the action a for the agent. The goal of the MDP is to find a policy π for the agent which maximizes the cumulative reward over a sequence of actions. The reward of a path R(ζ) is the sum of all state rewards r(s), each of which is the sum of weighted feature responses F(s) ∈ ℜ^k along the path,

    R(ζ; θ) = Σ_{s∈ζ} r(s; θ) = Σ_{s∈ζ} θ · F(s)                          (4)

where the path ζ is a sequence of states, θ is a vector of reward weights, and the reward of a state r(s; θ) represents the immediate reward received at a state s. The feature responses F(s) = [F_1(s), F_2(s), ..., F_k(s)] are a series of different types of features, where each feature response is represented as a 2D feature response map over the football field. In this work, we model not only static features but also dynamic features to represent the high cost (e.g., low reward) region which is attributed to the opponent reaction that changes over time.

As the reward weights θ are typically unknown, inverse reinforcement learning (IRL) or inverse optimal control (IOC) attempts to recover the reward weights from demonstrated examples such that the agent model generates action sequences similar to a set of demonstrated examples [1]. In maximum entropy inverse reinforcement learning [31], the distribution over a path ζ is defined as,

    p(ζ; θ) = e^{R(ζ;θ)} / Z(θ) = e^{Σ_{s∈ζ} θ·F(s)} / Z(θ)               (5)

where Z(θ) is the normalizing partition function, showing that a path with higher reward is exponentially more preferred. With the objective of recovering the optimal reward function parameters θ*, we perform an exponentiated gradient descent to maximize the likelihood of the observed data under the maximum entropy distribution,

    θ* = argmax_θ L(θ) = argmax_θ Σ_ζ log p(ζ; θ)                          (6)

where L(θ) is the log-likelihood of the observation, which is the trajectory of the wide receiver in our case. The gradient of the log-likelihood is expressed by the difference between the empirical mean feature count f̄, which is the sum of the feature counts over all demonstrated trajectories ζ̄, and the expected mean feature count f̂_θ over the trajectories ζ̂ sampled from the forecast distribution, which is represented by the state visitation frequency D(s) as follows,

    f̄ = Σ_{ζ̄} Σ_{s∈ζ̄} F(s),
    f̂_θ = Σ_{ζ̂} p(ζ̂; θ) Σ_{s∈ζ̂} F(s) = Σ_s D(s) F(s).                     (7)

As the expected feature count comes to match the empirical feature count, the learner comes to mimic the demonstrated behavior [1].

4.2. Sequential inference in dynamic environment

In a dynamic environment where dynamic features come from the opponent reaction, the state reward should change over time as players move during the play. We thus define the state reward as a linear combination of the static state reward and the dynamic state reward as follows,

    r_t(s; θ) = r(s; θ_s) + r_t(s; θ_d)                                    (8)

where r_t(s; θ) represents the time-varying state reward at time t, while θ_s and θ_d are the reward weights of the static features and the dynamic features respectively.

Unlike standard forecasting approaches in static environments, which perform a single long-term forecast [15], we perform multiple short-term predictions sequentially while updating the dynamic feature. Note that in this process we do not use any real-time observations to estimate the dynamic features. It is our interest to perform a long sequence of short-term forecasts based on the predicted dynamic features.

To be more precise, the forecast distribution is expressed as a state visitation frequency D(s), which is computed by propagating the initial distribution D(s_0) using the optimal policy. In our case we have a short-term forecast distribution D̃^(t)(s) at each time step t, which changes over time and gradually produces the final forecast distribution. In this process, we pass the previous short-term forecast distribution as input to the next inference cycle so that the forecasting proceeds without discontinuity. It is also necessary to have the predicted short-term trajectories of the wide receiver and cornerback, ζ̃_WR and ζ̃_CB respectively, for the next inference. From the forecast distribution at each time step, the state with the maximum probability is chosen as the predicted location of the wide receiver. The predicted locations altogether form the predicted short-term trajectory of the wide receiver ζ̃_WR. Combined with the estimated opponent location, they become the input of the next forecasting cycle. This sequential inference yields a final forecast distribution which is no shorter in prediction length than the current state-of-the-art [15], but enables forecasting in a dynamic environment.
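As a small illustration of Eq. (8), the time-varying reward map can be assembled from stacks of feature response maps; the sketch below assumes numpy arrays of shape (K, H, W) for the maps, and the variable names are illustrative.

import numpy as np

def reward_map_t(static_maps, dynamic_maps_t, theta_s, theta_d):
    # static_maps: (K_s, H, W) static feature responses (route tree, distance, ...).
    # dynamic_maps_t: (K_d, H, W) dynamic responses predicted for time t
    # (e.g., the band around the hallucinated cornerback positions).
    # theta_s, theta_d: learned reward weights. Returns r_t(s) as an (H, W) map.
    return (np.tensordot(theta_s, static_maps, axes=1)
            + np.tensordot(theta_d, dynamic_maps_t, axes=1))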
4.3. Algorithm

Solving for the reward function reduces to the problem of recovering the optimal reward weights θ* that best describe the demonstrated examples. This is achieved during the training phase through an alternating two-step algorithm consisting of a backward pass and a forward pass.

In the backward pass, we first generate the dynamic feature F_t(s) using ground-truth observations, and then iteratively compute the state-action log partition function Q(s, a) and the state log partition function V(s) through value iteration [3]. The soft value V(s) is the expected cost to the destination from a state s, and Q(s, a) is the value one step later after taking action a from the current state. The policy π_θ(a|s) is then computed by taking the exponentiated difference between Q(s, a) and V(s).

In the forward pass, we propagate the initial distribution D(s_0) using the policy obtained during the backward pass. The state visitation frequency D(s), or the forecast distribution, is then computed by taking the sum of all the propagated distributions at each step. Next, the expected feature count f̂_θ is computed by multiplying D(s) by the feature responses F(s). The two-step algorithm repeats until the expected feature count f̂_θ matches the empirical feature count f̄, using the exponentiated gradient descent θ ← θ e^{λ∇L(θ)}, where the gradient ∇L(θ) is the difference between f̄ and f̂_θ, and λ is the step size (Eq. 6 and 7).

During the test phase, we use the optimal reward weights θ* learned in the training phase to compute the reward function, and the backward-forward pair repeats until the wide receiver reaches the destination, resulting in the final forecast distribution (see Algorithm 1). Unlike the standard forecasting approach, which performs a single long-term inference, our algorithm is designed to perform multiple short-term forecasts incrementally while updating the dynamic features until it reaches the goal, thereby enabling forecasting in a dynamic environment.

Algorithm 1 Sequential Inference in Dynamic Environment
Input: θ* (optimal reward weights)
Output: D(s) (forecast distribution)
 1: ζ̃_WR ← ζ_WR, ζ̃_CB ← ζ_CB, and D(s_0) ← 1
 2: repeat
 3:   Backward pass:
 4:     Estimate F_t(s) with ζ̃_WR and ζ̃_CB
 5:     r_t(s; θ*) = r(s; θ*_s) + θ*_d · F_t(s)
 6:     for n = N, ..., 2, 1 do
 7:       V^(n)(s_goal) ← 0
 8:       Q^(n)(s, a) = r_t(s; θ*) + E_{p^{s'}_{s,a}}[V^(n)(s')]
 9:       V^(n−1)(s) = soft max_a Q^(n)(s, a)
10:     end for
11:     π_θ(a|s) ← e^{Q(s,a) − V(s)}
12:   Forward pass:
13:     for n = 1, 2, ..., N_s do
14:       D^(n)(s_goal) ← 0
15:       D^(n+1)(s) = Σ_{s',a} p^{s}_{s',a} π_θ(a|s') D^(n)(s')
16:       ζ̃_WR ← argmax_s D̃^(n+1)(s)
17:     end for
18:     D(s) = Σ_n D^(n)(s)
19:     D(s_0) ← D(s)
20:     f̂_θ = Σ_s F(s) D(s)
21: until the WR reaches the goal
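The backward and forward passes of Algorithm 1 can be sketched on a grid MDP as follows. This is a simplified stand-in (deterministic 8-connected moves, a fixed reward map, no dynamic-feature update or weight-learning loop), not the authors' implementation, and all names are assumptions.

import numpy as np

ACTIONS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
NEG_INF = -1e10

def shift(grid, di, dj, fill=NEG_INF):
    # out[i, j] = grid[i + di, j + dj] where that neighbor exists, else `fill`.
    out = np.full_like(grid, fill)
    H, W = grid.shape
    out[max(0, -di):H - max(0, di), max(0, -dj):W - max(0, dj)] = \
        grid[max(0, di):H - max(0, -di), max(0, dj):W - max(0, -dj)]
    return out

def soft_max_over_actions(Q):
    m = Q.max(axis=0)
    return m + np.log(np.exp(Q - m).sum(axis=0))

def backward_forward(r, goal, start, n_backward=200, horizon=100):
    # Backward pass (Algorithm 1, lines 6-11): soft value iteration.
    V = np.full(r.shape, NEG_INF)
    for _ in range(n_backward):
        V[goal] = 0.0
        Q = np.stack([r + shift(V, di, dj) for di, dj in ACTIONS])
        V = soft_max_over_actions(Q)
    V[goal] = 0.0
    Q = np.stack([r + shift(V, di, dj) for di, dj in ACTIONS])
    policy = np.exp(Q - V)                            # pi(a|s) ~ exp(Q - V)
    policy /= policy.sum(axis=0, keepdims=True)
    # Forward pass (lines 13-18): propagate the start distribution and
    # accumulate the state visitation frequency D(s).
    D_step = np.zeros(r.shape); D_step[start] = 1.0
    D = np.zeros(r.shape)
    for _ in range(horizon):
        D_step[goal] = 0.0                            # absorb mass at the goal
        D += D_step
        D_step = sum(shift(policy[a] * D_step, -di, -dj, fill=0.0)
                     for a, (di, dj) in enumerate(ACTIONS))
    return V, policy, D

# Example: a 20x30 field with rewards in [-1, 0] and the goal on the right edge.
r = -0.05 * np.ones((20, 30)); r[8:12, 15:20] = -1.0  # penalized opponent area
V, policy, D = backward_forward(r, goal=(10, 29), start=(10, 0))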
5. Experimental Results

We used the OSU Digital Scout Project's football dataset, which includes the tracking results of all 22 players with labels of player positions [5]. Using the registration matrices provided, we performed a perspective transform to register the players onto a top-down view of the football field [4], as seen in Figure 2a. The dataset contains 20 videos of passing plays (each video contains only one passing play). 10 videos were selected based on their lengths (several passing plays were too short to analyze, e.g., 2 seconds). We performed leave-one-out cross-validation on the selected 10 videos to prevent overfitting to a specific split of the dataset.

Feature maps: We used five types of features, each represented as a feature map: (1) linear distance to the goal line, (2) cornerback at the start of the play, (3) initial formation of the defense, (4) wide receiver route tree, and (5) our proposed short-term dynamics feature. These features are illustrated in Figure 4. The linear distance feature aims to encode the logic that there is more incentive to press forward as the WR gets closer to the goal line. The wide receiver route tree encodes the prior belief that the WR is more likely to follow standardized paths. The feature maps are normalized such that the reward values range over [0, −1].

Metrics and measurements: There are four different metrics used to evaluate the forecasting result. First, we used the Kullback-Leibler divergence (KLD) [16], which measures the difference between two probability distributions P and Q, defined by

    D_KL(P || Q) = Σ_i P(i) log(P(i) / Q(i))                              (9)

where the true data P is the ground-truth trajectory normalized by its length, Σ_{s∈ζ} P(s) = 1, and the approximation Q is the forecast distribution.
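A small sketch of the KLD measurement in Eq. (9) follows, assuming the forecast is an (H, W) visitation map D(s) and the ground truth is a list of visited grid cells; the names are illustrative.

import numpy as np

def kld_to_ground_truth(gt_cells, forecast, eps=1e-12):
    # P spreads unit mass uniformly over the ground-truth trajectory cells
    # (sum_s P(s) = 1); Q is the normalized forecast distribution.
    P = np.zeros_like(forecast, dtype=float)
    for cell in gt_cells:
        P[cell] += 1.0 / len(gt_cells)
    Q = forecast / max(forecast.sum(), eps)
    m = P > 0
    return float(np.sum(P[m] * np.log(P[m] / (Q[m] + eps))))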


Figure 4: (a)-(e) Five types of feature response maps for an example scene. (f) A reward map computed using the feature response maps with the corresponding reward weights learned in the training phase. Red regions indicate states with high reward, and the play proceeds to the right in this scene.
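For instance, the linear-distance feature of Figure 4a can be rasterized directly; the sketch below assumes the offense advances toward increasing column index and that feature maps are scaled into [0, −1] as described above.

import numpy as np

def linear_distance_feature(H, W):
    # 0 at the goal line (rightmost column), -1 at the farthest column, so
    # states closer to the goal line incur a smaller penalty.
    cols = np.arange(W, dtype=float)
    dist = (W - 1 - cols) / max(W - 1, 1)
    return -np.tile(dist, (H, 1))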


Second, we measured the physical distance between the trajectories sampled from the learned distribution (100 samples in our experiment) and the ground-truth trajectory in two ways: one being the Euclidean distance (L2) and the other being the modified Hausdorff distance (MHD). The MHD allows comparing the best corresponding local points within a small temporal window of two trajectories through local time warping. The computed distance is divided by the length of the trajectory to accommodate trajectories of different lengths in different scenes. Furthermore, we measured the likelihood of the demonstrated trajectory under the obtained policy by computing the negative log-loss (NLL) as follows,

    NLL(ζ) = E_{π_θ(a|s)} [ −log ∏_{s∈ζ̄} π_θ(a|s) ]                      (10)

The distribution divergence (KLD) and the physical distances (L2 and MHD) measure how much a forecast result differs from the demonstrated example. In the case of the negative log-loss (NLL), we measure how likely the demonstrated example is under the optimal policy learned through the proposed algorithm.

Figure 5: (a) Average prediction errors of static and dynamic features. The prediction errors in the dynamic environment are computed by the sequential inference procedures, while the error in the static environment is computed by keeping the initial location of the cornerback fixed throughout the play. (b-d) Average errors of the GPR regressor under varying parameters. (e) The distance to the goal for the WR and CB.

5.1. Evaluation of short-term dynamics predictors

The dynamic features are the predicted (hallucinated) locations of the moving defense. The feature map has a low reward in locations on the field where there is a defensive player, as seen in Figure 4e. Since the forecasted trajectory of the wide receiver depends critically on the accurate prediction of the defense, it is important to evaluate the accuracy of the short-term dynamics predictors (dynamic features). The errors of three different models of the defense (the cornerback) are plotted in Figure 5a. The first model is the naive static assumption, where the wide receiver plans a path based only on the initial location of the cornerback. The second predictive model, CM, assumes that the cornerback will maintain constant velocity (measured at the beginning of the play). The third predictive model, GPR, uses a Gaussian process regressor to incrementally predict the position of the cornerback. The GPR-based predictive model performs the best but has increasing error due to the accumulation of error over time.
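A sketch of the windowed modified Hausdorff distance used as the physical-distance metric follows; the +/- window matching is a simple stand-in for the local time warping described above, and the function and parameter names are assumptions.

import numpy as np

def mhd(A, B, window=5):
    # A, B: (T, 2) trajectories. Each point is matched to the closest point of
    # the other trajectory within +/- `window` frames; the symmetric maximum of
    # the two directed means is normalized by the trajectory length.
    def directed(X, Y):
        dists = []
        for i, x in enumerate(X):
            c = min(i, len(Y) - 1)                    # clamp to Y's index range
            lo, hi = max(0, c - window), min(len(Y), c + window + 1)
            dists.append(np.linalg.norm(Y[lo:hi] - x, axis=1).min())
        return float(np.mean(dists))
    return max(directed(A, B), directed(B, A)) / len(A)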
duration the prediction shorter into a farther selected predict to harder is it h lss uigtegm sse n5 ( 5e in become seen cornerback as game and diverging the receiver during start wide closest CM the the and when GPR is that af- error moment erroneous in The more becomes 5a. CM that and ter game, the of middle nu flnt 0frteGR iue5 hw htthe that ( increases shows prediction 5c of Figure duration GPR. the the as increases for error 20 vec- length input of nominal the input of a length only the is ( to there tor respect with that error shows in con- 5b change of Figure number input the points. and of trol prediction, length of duration the the including sample, selections, parameter timal Best (a). approach forecasting proposed regular the the and than (b) (red) trajectories CM receiver form. proposed wide electronic the ground-truth in using the viewed distributions to forecast closer The much match comparison. (c) in GPR results forecasting Selected 6: Figure PLAY3 PLAY2 PLAY1 diinly eosre htteerr fdnmcfea- dynamic of errors the that observed we Additionally, etse u P erso nvrosstig o op- for settings various on regressor GPR our tested We i.e i.e ,prilyhluiae rjcoy,adw sdan used we and trajectory), hallucinated partially ., ,GRadC)aesmlri h einn othe to beginning the in similar are CM) and GPR ., a eua b M(c) GPR (b)CM (a) Regular i.e ,a rm 20). frame at ., i.e ., i re fKD 2 H n NLL). and MHD L2, 24% KLD, and of 17%, order 21%, (in 6%, by errors reducing state- [15], error current of-the-art all the Over outperform forecast other. methods each proposed their the to 5a metrics, analogous be Figure to in out turn seen results receiver as wide cornerback the the until outruns dif- errors little prediction show average GPR is and in scenes CM ference As test 1. all Table over the average in summarized the and metrics different ing are robust. prox- approaches more proposed distribution much the the using field, ground-truth the football to play- the imity the to of respect size with the Considering ers not. ground- fea- does static the forecasting the of ture match that whereas GPR, closely, (red) and trajectories CM truth methods, proposed using area) the (shaded qualitative distributions a forecast for The 6 comparison). Figure assump- (see feature significant is (static forecasting standard tion) per- the the against time, we gain over since environment formance the expected, of As changes the static. consider assumed is environment where [15] the framework forecasting standard a over approach forecasting activity of Evaluation 5.2. play. the of part later the in than rather ceiver h oeatrslsaeas esrdqatttvl us- quantitatively measured also are results forecast The prediction feature our of performance the evaluated We t = 0 t = 10 t = 20 t = 30 t = 40

Figure 7: A destination forecasting result (panels at t = 0, 10, ..., 90). The forecast distribution becomes closer to the demonstrated trajectory (red) as more observations are revealed over time. Best viewed in electronic form.

Method         KLD    L2     MHD    NLL
Regular [15]   9.05   9.96   8.44   2.05
Ours (CM)      8.48   7.88   7.07   1.55
Ours (GPR)     8.51   7.87   7.03   1.58

Table 1: Average errors of forecasting approaches measured in different metrics. The proposed methods, CM and GPR, outperform the standard forecasting model in all criteria.

6. Application to Sports Analytics

One application of activity forecasting in the sports domain is its use for sports analytics. Forecasting distributions can be used to convey how easy it is for the opponent team to predict offensive motion. They can also be used to understand how players diverge from typical play motion. Using the forecasting framework introduced in this paper, it is possible to update the forecasting distribution while the play is in motion. A benefit of observing real data is that it allows us to hypothesize about the final goal (destination) of the wide receiver as the play unfolds. This task is called destination forecasting.

In order to infer the destination of the wide receiver as the play unfolds, we can compute the posterior distribution over goals using the ratio of partition functions as follows,

    p(s_d | s_0, u_{1:t}) ∝ p(u_{1:t} | s_d, s_0) · p(s_d)
                          ∝ e^{V_{u_{1:t}}(s_d) − V(s_d)} · p(s_d)        (11)

where V_{u_{1:t}}(s_d) is the state log partition function with observations u_{1:t}, V(s_d) is the state log partition function without any observation, and s_d refers to the state at the destination. By switching the role of the start location to the destination, the value function of the potential destinations can be computed. As more observations of both the wide receiver and the opponent are revealed, the changes in the reward reflect the progress toward a goal.

An incremental visualization of destination forecasting is shown in Figure 7. It can be seen that the forecast distribution becomes closer to the true trajectory (red) as more observations are revealed. The average KLD over all test scenes is plotted in Figure 8 as a quantitative measurement. We believe that this type of visual prediction can be used for sports analytics.

Figure 8: Average KLD over all test scenes from destination forecasting. The error decreases as the play comes to the end.

7. Conclusion

We have presented a novel approach to forecasting an agent's activity in a dynamic environment by focusing on the task of predicting wide receiver passing plays in American football. The prediction of possible alterations in the visual environment has been a neglected and ill-posed problem, yet we proposed to embrace it within an optimal control framework via dynamic feature regression techniques that predict opponent reactions. We demonstrated that a careful design of the strategic role of an agent, plus modeling the changes of the environment in a sequential inference scheme, enables successful play forecasting in dynamic environments.

Acknowledgement

This research was supported in part by the JST CREST grant and the Jeongsong Cultural Foundation. We thank Wen-Sheng Chu for valuable comments on the manuscript.

References

[1] P. Abbeel and A. Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-First International Conference on Machine Learning, page 1. ACM, 2004.
[2] C. L. Baker, R. Saxe, and J. B. Tenenbaum. Action understanding as inverse planning. Cognition, 113(3):329–349, 2009.
[3] R. Bellman. Dynamic programming and Lagrange multipliers. Proceedings of the National Academy of Sciences of the United States of America, 42(10):767, 1956.
[4] R. Hess and A. Fern. Improved video registration using non-distinctive local image features. In Computer Vision and Pattern Recognition (CVPR), 2007 IEEE Conference on, pages 1–8. IEEE, 2007.
[5] R. Hess and A. Fern. Discriminatively trained particle filters for complex multi-object tracking. In Computer Vision and Pattern Recognition (CVPR), 2009 IEEE Conference on, pages 240–247. IEEE, 2009.
[6] R. A. Howard. Dynamic programming and Markov processes. 1960.
[7] D.-A. Huang and K. M. Kitani. Action-reaction: Forecasting the dynamics of human interaction. In Computer Vision–ECCV 2014, pages 489–504. Springer, 2014.
[8] S. S. Intille and A. F. Bobick. Recognizing planned, multi-person action. Computer Vision and Image Understanding, 81(3):414–445, 2001.
[9] E. T. Jaynes. Information theory and statistical mechanics. Physical Review, 106(4):620, 1957.
[10] Y. Jiang and A. Saxena. Modeling high-dimensional humans for activity anticipation using Gaussian process latent CRFs. In Robotics: Science and Systems (RSS), 2014.
[11] K. Kim, M. Grundmann, A. Shamir, I. Matthews, J. Hodgins, and I. Essa. Motion fields to predict play evolution in dynamic sport scenes. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 840–847. IEEE, 2010.
[12] K. Kim, D. Lee, and I. Essa. Gaussian process regression flow for analysis of motion trajectories. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 1164–1171. IEEE, 2011.
[13] K. Kim, D. Lee, and I. Essa. Detecting regions of interest in dynamic scenes with camera motions. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1258–1265. IEEE, 2012.
[14] S. Kim, S. J. Guy, W. Liu, R. W. Lau, M. C. Lin, and D. Manocha. Predicting pedestrian trajectories using velocity-space reasoning. In Algorithmic Foundations of Robotics X, pages 609–623. Springer, 2013.
[15] K. M. Kitani, B. D. Ziebart, J. A. D. Bagnell, and M. Hebert. Activity forecasting. In European Conference on Computer Vision. Springer, October 2012.
[16] S. Kullback and R. A. Leibler. On information and sufficiency. The Annals of Mathematical Statistics, pages 79–86, 1951.
[17] T. Lan, T.-C. Chen, and S. Savarese. A hierarchical representation for future action prediction. In Computer Vision–ECCV 2014, pages 689–704. Springer, 2014.
[18] K. Laviers and G. Sukthankar. A real-time opponent modeling system for football. In IJCAI Proceedings–International Joint Conference on Artificial Intelligence, volume 22, page 2476. Citeseer, 2011.
[19] K. R. Laviers and G. Sukthankar. Using opponent modeling to adapt team play in American football. 2014.
[20] P. Lucey, A. Bialkowski, P. Carr, S. Morgan, I. Matthews, and Y. Sheikh. Representing and discovering adversarial team behaviors using player roles. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 2706–2713. IEEE, 2013.
[21] G. Neu and C. Szepesvári. Apprenticeship learning using inverse reinforcement learning and gradient methods. arXiv preprint arXiv:1206.5264, 2012.
[22] S. Pellegrini, A. Ess, and L. Van Gool. Predicting pedestrian trajectories. In Visual Analysis of Humans, pages 473–491. Springer, 2011.
[23] S. Petti and T. Fraichard. Safe motion planning in dynamic environments. In Intelligent Robots and Systems (IROS), 2005 IEEE/RSJ International Conference on, pages 2210–2215. IEEE, 2005.
[24] D. Ramachandran and E. Amir. Bayesian inverse reinforcement learning. Urbana, 51:61801.
[25] C. E. Rasmussen. Gaussian Processes for Machine Learning. 2006.
[26] N. D. Ratliff, J. A. Bagnell, and M. A. Zinkevich. Maximum margin planning. In Proceedings of the 23rd International Conference on Machine Learning, pages 729–736. ACM, 2006.
[27] R. Urtasun, D. J. Fleet, and P. Fua. 3D people tracking with Gaussian process dynamical models. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 1, pages 238–245. IEEE, 2006.
[28] J. Walker, A. Gupta, and M. Hebert. Patch to the future: Unsupervised visual prediction. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 3302–3309. IEEE, 2014.
[29] D. Xie, S. Todorovic, and S.-C. Zhu. Inferring "dark matter" and "dark energy" from videos. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 2224–2231. IEEE, 2013.
[30] B. D. Ziebart, J. A. Bagnell, and A. K. Dey. The principle of maximum causal entropy for estimating interacting processes. Information Theory, IEEE Transactions on, 59(4):1966–1980, 2013.
[31] B. D. Ziebart, A. Maas, J. A. D. Bagnell, and A. Dey. Maximum entropy inverse reinforcement learning. In Proceedings of AAAI 2008, July 2008.
[32] B. D. Ziebart, N. Ratliff, G. Gallagher, C. Mertz, K. Peterson, J. A. Bagnell, M. Hebert, A. K. Dey, and S. Srinivasa. Planning-based prediction for pedestrians. In Intelligent Robots and Systems (IROS), 2009 IEEE/RSJ International Conference on, pages 3931–3936. IEEE, 2009.