Conditional Generative Adversarial Networks for Speed Control in Trajectory Simulation
Total Page:16
File Type:pdf, Size:1020Kb
Conditional Generative Adversarial Networks for Speed Control in Trajectory Simulation Sahib Julka Vishal Sowrirajan Joerg Schloetterer University of Passau University of Passau University of Duisburg-Essen Germany Germany Germany [email protected] [email protected] [email protected] Michael Granitzer University of Passau Germany [email protected] Abstract and autonomous vehicles [3]. However, it remains a chal- lenge due to the subjectivity and variability of interactions Motion behaviour is driven by several factors - goals, in real world scenarios. Trajectory prediction not only needs presence and actions of neighbouring agents, social rela- to be sensitive to several real world constraints, but also in- tions, physical and social norms, the environment with its volves implicit semantic modelling of an agents mobility variable characteristics, and further. Most factors are not patterns, while anticipating the movements of other agents directly observable and must be modelled from context. Tra- in the scene. jectory prediction, is thus a hard problem, and has seen in- Recently we have witnessed a shift in perspective from creasing attention from researchers in the recent years. Pre- the more deterministic approaches of agent modelling with diction of motion, in application, must be realistic, diverse handcrafted features [4–9], to the latent learning of vari- and controllable. In spite of increasing focus on multimodal able outcomes via complex data-driven deep neural network trajectory generation, most methods still lack means for ex- architectures [10–13]. State-of-the-art systems are able to plicitly controlling different modes of the data generation. generate variable or multimodal predictions that are socially Further, most endeavours invest heavily in designing spe- acceptable (adhere to social norms), spatially aware and cial mechanisms to learn the interactions in latent space. similar to the semantics in training data. Most systems can We present Conditional Speed GAN (CSG), that allows con- sufficiently generate outcomes according to the original dis- trolled generation of diverse and socially acceptable trajec- tribution, but lack means for controlling different modes of tories, based on user controlled speed. During prediction, data generation, or to be able to extrapolate to unseen con- CSG forecasts future speed from latent space and conditions texts. Consequently, controlled simulation is a challenge. its generation based on it. CSG is comparable to state-of- Furthermore, most approaches focus on modelling of a the-art GAN methods in terms of the benchmark distance arXiv:2103.11471v1 [cs.CV] 21 Mar 2021 single agent i.e., pedestrians [10] or vehicles [14], and lack, metrics, while being simple and useful for simulation and thus, the modelling of heterogeneous semantic classes. We data augmentation for different contexts such as fast or slow propose that these systems need to be 1. Spatio-temporal paced environments. Additionally, we compare the effect of context aware: aware of space and temporal dynamics of different aggregation mechanisms and show that a naive ap- surrounding agents to anticipate possible interactions and proach of concatenation works comparable to its attention avoid collision, 2. Control-aware: compliant to external and and pooling alternatives. internal constraints, such as kinematic constraints, and sim- ulation control, and 3. Probabilistic: able to anticipate mul- tiple forecasts for any given situation, beyond those in the 1. Introduction training data. Modelling social interactions and the ability to forecast To be able to model the implicit behaviour and predict motion dynamics is pertinent to several application domains especially the sudden, unexpected changes, it is essential such as robot planning systems [1], traffic operations [2], that these systems understand not only the spatial context 1 but also the temporal context. This context should be iden- trajectory forecasting [29, 30], due to their acclaimed suc- tifiable, and adaptable. For instance, in urban simulations, it cess in modelling long sequences, yielding an advantage in is important to simulate trajectories with different character- prediction accuracies over previous deterministic methods. istics specific to the location and time e.g. slow pedestrians In Social-LSTM [29], the authors introduced a grid-based in malls vs fast in busy streets, and so on. Simulations need pooling method, in order to capture local intricate motion to be able to adapt to changing environments. dynamics, and thus introducing spatial sense in these net- In this work, we propose a generative neural net- works. Inspite of their success, all these methods were lim- work framework called CSG (Conditional Speed GAN) iting because of their inability to model multimodal trajec- that takes into account the aforementioned requirements. tories. We leverage the conditioning properties offered by condi- tional GANs [15], to induce temporal structure to the la- 2.1. Generative Models: tent space in sequence generation system inspired by previ- Generative methods, with recent advancements became ous works [10, 12, 16]. Consequently, CSG can be condi- the natural choice for modelling trajectories, since they of- tioned for controlled simulation. CSG is trained in a self- fer distribution learning, rather than optimising on single supervised setting, on multiple contexts such as speed and best outcome. Most related works employ some kind of agent-type in order to generate trajectories specific to those deep recurrent base with latent variable model, such as the conditions, without the need for inductive bias in the form Conditional Variational Autoencoder (CVAE)[23,31] to ex- of explicit aggregation methods used extensively in previ- plicitly encode multimodality or Generarative Adversarial ous works [10,17–22]. The main contributions of this work Networks (GAN) to implicitly do so [10, 12, 32]. A few are as follows: interesting GAN variants have been developed to tackle some of the aforementioned challenges, such as the So- 1. A generative system that can be conditioned on agent cial GAN [10], which can produce multiple socially ac- speed and semantic classes of agents, to simulate mul- ceptable trajectories, and encouraged multimodal genera- timodal and realistic trajectories based on user defined tion of by introducing a variety loss. Additionally, with the control. pooling module, using permutation invariant max-pooling, 2. A trajectory forecaster that uses predicted speeds from a kind of neighbourhood spatial embedding was introduced, the latent space to generate conditional future moves that demonstrated improvement over local grid-based en- that are socially acceptable, without special aggrega- coding, such as the kind used in Social-LSTM [29]. This tion mechanisms like pooling or attention, and per- was improved with an attention mechanism proposed in forms comparable to state of the art, as validated on SoPhie [12], which was explored by numerous following several trajectory prediction benchmarks. works [21, 22, 33–35] and improved in Social Ways [11] and Social BiGAT [16]. 2. Related Work In the current state, the generative models can effectively learn distributions and forecast diverse and acceptable tra- There is a plethora of scientific work done in the past in jectories. However, there still exist open questions as to how the field of trajectory forecasting. Based on structural as- to decide which mode is best, or if the mean is good enough sumptions [23], the existing literature can broadly be clas- for changing scenarios. Existing methods do not tackle the sified as: 1. Ontological, which are mechanics-based, such problem of mode control, which is an essential characteris- as the Cellular Automata model [9] , Reciprocal Veloc- tic needed for simulation and adaptation to different scenar- ity Obstacles (RVO) method [8], or the Social Forces (SF) ios. Further, a key challenge is to find an ideal strategy to model [4] - that use dynamic systems to model the forces aggregate information in scenes with variable neighbours, that affect human motion. For instance, SF models dynam- and it remains unanswered whether special mechanisms like ics with newtonian controls, like, attraction towards goal, pooling or attention are really needed. and repulsion against other agents; these methods make Recently graph based methods have been introduced for strong structural assumptions, and often fail to capture the spatio-temporal modelling such as the Trajectron [28] that intricate motion dynamics [4,5,24,25], and 2. Phenomono- takes as input the relative velocity of the neighbours to logical, which are more data driven and aim to implic- model interaction or the Trajectron++ [13], an improved itly learn complex relationships and distributions. These variant. In this setup, each pedestrian is denoted as a include methods such as GPR (Gaussian Process Regres- node, and two interacting pedestrians are connected with sion) [26], Inverse Reinforcement learning [27], and the an edge. The node representations learn the trajectory se- more recent RNN based methods. However these methods quence, while the edge representations learn the interaction are still restrictive, e.g., GPR suffers from long inference sequence. While these methods provide state of the art re- times [28]. RNNs have fairly recently gained traction in sults in terms of trajectory prediction metrics, such as av- erage and final displacement error, they lack