A Denoising Hybrid Model for Anomaly Detection in Trajectory Sequences

Maria Liatsikou, Symeon Papadopoulos, Lazaros Apostolidis, Ioannis Kompatsiaris
Information Technologies Institute, CERTH, Thessaloniki, Greece
[email protected]

ABSTRACT

The advances of mobile technologies have led to massive amounts of trajectory data, i.e. data about the tracked routes of moving objects. The detection of anomalies in trajectory data is an evolving research domain, which has applications in traffic management and in public safety, but also in climate research and in animal habit analysis. It is a challenging task due to the existence of non-linear spatiotemporal dependencies. In this work we propose the combination of deep learning techniques with a traditional outlier detection methodology for detecting anomalous trajectories. Two variants of denoising sequential autoencoder architectures are applied for unsupervised anomaly detection in vehicle trajectories: an LSTM-based autoencoder and a Sequence to Sequence LSTM autoencoder. A weighted distance-based loss function is optimized during the training phase, taking into account the impact of trajectory length on the detection outcome. We propose a hybrid architecture for the detection phase, which combines each of the autoencoders with the Local Outlier Factor algorithm – a density-based anomaly detection method – in order to detect anomalies. Our models are evaluated on different variants of synthetic anomalies generated from our dataset. The results indicate a clear performance advantage of our approach compared to competitive baselines.

KEYWORDS

Trajectory Analysis, Anomaly Detection, Denoising Autoencoder, Long Short-Term Memory Network, LSTM, Sequence to Sequence

© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceedings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus) on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 INTRODUCTION

The rapid advances in Global Positioning Systems (GPS), combined with those in computation and storage systems, allow the large-scale collection and analysis of the digital traces obtained from GPS devices. The knowledge extracted from vehicle or human trajectory data can offer important insights in the fields of traffic monitoring and management, public safety and surveillance. A relevant research domain, which has lately been attracting interest, is anomaly detection in trajectory data.

An anomalous trajectory is one that appears to be different compared to others with respect to some kind of similarity [3]. It is really difficult to uniformly define abnormality, as it is usually context dependent [25]. For instance, vehicle trajectory anomalies may happen because of anomalies in traffic, caused by various events, like protests, accidents or physical disasters, or because of errors in the GPS devices or of unexpected drivers' behaviors.

In this work, we propose the use of autoencoder-based unsupervised anomaly detection and its combination with a density-based outlier detection algorithm during the detection phase. Two types of autoencoder architectures are applied, both variants of LSTM-based networks. The aim is the detection of abnormal trajectories, defining them as the ones that the autoencoder fails to reconstruct. Our models are trained using real-world data, including both normal and abnormal trajectories. Since there is no ground truth in our task, in order to evaluate our models on unseen data and compare their performance, we generate different types of abnormal data based on our test set and incorporate them in it. Anomalous trajectories are then detected based on their reconstruction errors.

The proposed methods can address the problems of mixed distribution densities in a dataset and the dependencies on suitable similarity metrics. These are common drawbacks of traditional outlier detection approaches, in which user-defined parameters, like distance or density parameters, affect the outcome and are not easily defined for sequence data. In general, deep learning models have several advantages over other methods for outlier detection in sequences, as they can effectively address the challenges of feature extraction, high dimensionality and non-linearity [18]. Moreover, there is no need to explicitly describe a normal pattern for a trajectory, nor to define the type of the anomaly (e.g. a trajectory with too few neighbors or with small density). Instead, the proposed method detects trajectories that differ from the normal patterns in a rather abstract manner.
Specifically, the model is trained to reconstruct most of the trajectories included in a dataset, while it fails on the more irregular ones, which are classified as anomalies. What is more, the applied architectures are capable of capturing the temporal dependencies in the trajectory data, as they are both based on recurrent neural networks (RNNs).

Several recent works have applied neural autoencoders in trajectory outlier detection tasks [4, 9, 21, 25]. They either use feed-forward or sequential autoencoders in order to detect outliers in trajectory data. In these works a threshold is set on the error values for classifying a trajectory as an outlier or not, or a qualitative analysis is conducted on the trajectories with the highest reconstruction errors. Moreover, the autoencoders are trained by minimizing a standard loss function, which does not account for the trajectory length.

In this work we make the following contributions:

• We propose a hybrid architecture that combines a sequential denoising autoencoder with a density-based outlier detection model for unsupervised anomaly detection in trajectories. We apply the Local Outlier Factor (LOF) algorithm on the autoencoder's reconstruction errors of unseen data. In scoring outlier detection techniques, an anomaly score is assigned to each instance and the final output is a ranked list of all instances with respect to their scores: those in the highest ranks are detected as anomalies.
• We train two variants of sequential denoising autoencoders as a first step in this hybrid setting, by minimizing a Haversine distance-based weighted loss function, effectively accounting for the overall length of each trajectory.
• We test on several anomaly detection tasks with synthetic data, showing important gains in performance using the proposed models against competitive baselines.
2 RELATED WORK

Most trajectory anomaly detection methods rely on distance, density, historical similarity or classification [3, 11]. Distance-based methods detect as outliers trajectories with an insufficient number of neighbors according to a similarity measure. They detect global outliers and their results highly depend on the suitability of the chosen similarity metric. Methods relying on density classify a trajectory as an outlier if its density is lower than a threshold. They can detect local abnormalities in datasets with varying densities, but are computationally expensive. Historical similarity approaches are applied for detecting temporal outliers, based on the calculation of changes in historical similarity between data points. Classification-based methods include the detection of anomalous trajectories using a motion-classifier (a classifier applied on trajectories represented as sequences of motion features) in a supervised setting, and also machine learning models, either supervised or unsupervised. The latter include Isolation Forests, One-Class SVMs and deep learning architectures, like autoencoders, which have remarkable performance in sequential outlier detection. In this section we present examples of each of these approaches.

A density-based algorithm is used by Fontes et al. [10] for trajectory anomaly detection between regions of interest. Spatiotemporal outliers are detected among sub-trajectories based on the concepts of neighborhood, 'standard' sub-trajectory (the neighborhood of which has a minimum density) and synchronization. Although spatiotemporal aspects are considered, this approach refers to trajectories between specific regions, and its outcomes are strongly related to user-specified parameters.

TRAOD [15] is a hybrid (distance/density-based) partition-and-detect algorithm for trajectory outlier detection. After a trajectory is partitioned, outliers are detected in its parts using distance and density metrics, so that anomalies are detected both in dense and in sparse areas. A trajectory is considered anomalous if its outlying parts exceed a threshold. Though TRAOD alleviates the problem of local density, its outcome depends on user-defined parameters and it only considers spatial data, without capturing the temporal sequence.

Methods using historical similarity between points detect temporal, rather than spatial, outliers. Li et al. [17] propose a framework for detecting temporal outliers in road segments. For every segment and time step, a 'temporal neighborhood vector' is defined, consisting of aggregated historical similarities between specific variables in relation to all other segments. An outlier is detected if there is a drastic change in the similarity values. Though related to our task, this method analyses temporal outliers in road network links rather than outliers in trajectories.

Classification-based methods refer to outlier detection with motion-classifiers or to classification models. In ROAM [16] – a supervised trajectory anomaly detection algorithm – trajectories are represented as pattern fragments, which are transformed to a feature space of spatiotemporal attributes. A hierarchical rule-based classifier is applied for classifying the trajectories into one of two classes (supervised setting).

iBAT [29] is a lazy isolation technique for trajectory outlier detection. A grid is used to split the studied area, and all trajectories that share the same starting/ending grid cells are grouped together to detect the anomalous trajectories among them. Thus, this method focuses on discovering anomalous trajectories in specific regions. Furthermore, trajectories are represented as sequences of discrete variables (cell ids), which results in a loss of spatial information.
One-Class SVM clustering is applied on a dataset of trajectories extracted from video sequences in the work of Piciarelli et al. [23]. The aim is to detect the region in the feature space that encloses the normal data and excludes the anomalous ones. To address the problem of correctly choosing the number of acceptable anomalies – unknown in unlabeled data – the authors suggest a technique for removing the outliers in the training set, leveraging geometric characteristics of the feature space of the model. After clustering the trajectories, the outliers are detected using geometric criteria in the model's feature space, without taking account of temporal dependencies.

More recent works have started applying deep learning methods for detecting outliers in trajectory data. Roy & Bilodeau [25] work on abnormal event detection by analyzing fixed-length trajectories in road intersections. A deep feed-forward autoencoder is trained on normal data, learning to reconstruct the input trajectories from its latent-space representation. A threshold value based on the reconstruction error in the training/validation sets is used for detecting outlying trajectories. A feed-forward autoencoder is also used in the work of Olive et al. [21] to discover outlying (fixed-length) trajectory patterns in flight routes. Trajectories with high reconstruction errors are then analyzed to find possible associations with urgent situations. Though feed-forward neural networks have shown good potential in anomaly detection tasks, they fail at capturing temporal dependencies. Instead, recurrent neural networks learn from sequential data and can be effectively applied in outlier detection tasks with a temporal dimension. They also have the advantage of capturing the context of the entire trajectory. This makes them more effective than methods based on trajectory segmentation [18]. Di Mauro & Ferilli [9] deploy a Seq2Seq architecture based on an LSTM autoencoder in an unsupervised setting in order to detect anomalous vehicle routes. The routes are represented as sequences of gate ids, which ignores the spatial information, as in [29]. A Sequence to Sequence real-time outlier detector for human trajectories is proposed by Bouritsas et al. [4], aiming at discovering suspicious behavior in video surveillance footage. Focusing on the localization of anomaly detection, each trajectory is partitioned into sub-trajectories, represented as sequences of coordinates. Synthetic data are used for evaluation and the training is applied only on normal instances.

In this work, instead of applying one of the aforementioned methods for trajectory anomaly detection, we propose a hybrid approach, a combination of deep learning and density-based anomaly detection techniques, while taking into consideration the length of each trajectory during the training phase. The performance of the proposed architecture is tested on synthetic data, generated to account for different patterns of anomalies.

3 METHODS

3.1 Models

Autoencoders are used as the basis for our unsupervised outlier detection method. Given an input sequence of vectors of spatial coordinates (longitude, latitude) T_i = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, the autoencoder attempts to reproduce this input by minimizing a loss function. This loss function refers to the error of the reconstructed outputs compared to the original input sequences. The instances that the model fails to reconstruct successfully suffer from high reconstruction error and are detected as anomalies. In our work, we employ a Long Short-Term Memory (LSTM) network [13] as the basis for our models due to its ability to capture temporal dependencies in the data.

The autoencoder consists of two components – the encoder and the decoder. During the encoding phase, the sequence is compressed into a latent vector of lower dimension, which carries all its spatiotemporal information in a compressed format (bottleneck architecture). The decoder then reconstructs the input using its latent representation. Both the encoder and the decoder may consist of one or more layers. Dense layers are added on top of the last decoder layer and produce the final reconstruction for the m timesteps.

In our work, we use two variants of denoising LSTM autoencoders [28], which attempt to output the clean input data from a partially corrupted input. As a result, the model learns representations which are robust to noise. This is accomplished by adding Gaussian noise with zero mean and unit variance to the input data.
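As a concrete sketch of this denoising setup (our own illustration, not the authors' code): the inputs are corrupted with unit-variance Gaussian noise, while the clean sequences remain the reconstruction targets.

```python
import numpy as np

# Hypothetical batch of standardized trajectories: (n_trips, m, 2).
X_clean = np.random.randn(1000, 9, 2)
# Corrupt the inputs with zero-mean, unit-variance Gaussian noise.
X_noisy = X_clean + np.random.normal(loc=0.0, scale=1.0, size=X_clean.shape)
# model.fit(X_noisy, X_clean, ...)  # corrupted inputs, clean targets
```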
LSTM: In our first variant, the last layer of the encoder returns the encoder's output, which is copied m times (equal to the number of sequence timesteps) before being fed to the decoder. Similar model architectures have been used in anomaly detection tasks in other areas of research, such as natural language processing [27]. The decoder's architecture is the same as the encoder's, and dense layers added on top produce the final output.

SEQ: Our second variant is a Sequence to Sequence network (Seq2Seq) [6, 26], which has shown good results in the area of machine translation. LSTM layers comprise both the encoder and the decoder. The encoder sequentially reads each element of the sequence and encodes it in a hidden state, updated at every timestep. Instead of copying the encoder's output, as with the LSTM autoencoder, this hidden state is used as the initial state of the decoder, which is trained to generate the output sequence step by step. We train the model using teacher forcing, so that the actual value at each timestep is used as input to the decoder. The final reconstructed sequence is output by dense layers. In our work, both architectures (LSTM, SEQ) consist of one encoder, one decoder and one dense (output) layer.
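The two variants might be wired up in Keras (the library the authors name in section 4.2) roughly as follows. This is our own sketch: the layer sizes come from the paper (one 16-unit LSTM encoder, one 16-unit LSTM decoder, one dense output layer; Gaussian noise on the input), but all variable names and the exact layer wiring are our assumptions.

```python
from tensorflow.keras import layers, models

M, F = 9, 2  # timesteps per trip, features (longitude, latitude)

def lstm_autoencoder():
    """LSTM variant: the latent vector is copied M times for the decoder."""
    inp = layers.Input(shape=(M, F))
    x = layers.GaussianNoise(1.0)(inp)             # zero mean, unit variance
    latent = layers.LSTM(16)(x)                    # encoder -> latent vector
    x = layers.RepeatVector(M)(latent)             # copy latent vector M times
    x = layers.LSTM(16, return_sequences=True)(x)  # decoder
    out = layers.TimeDistributed(layers.Dense(F))(x)
    return models.Model(inp, out)

def seq2seq_autoencoder():
    """SEQ variant: encoder states initialise the decoder (teacher forcing)."""
    enc_in = layers.Input(shape=(M, F))
    x = layers.GaussianNoise(1.0)(enc_in)
    _, h, c = layers.LSTM(16, return_state=True)(x)    # keep final states
    dec_in = layers.Input(shape=(M, F))                # shifted targets at train time
    dec = layers.LSTM(16, return_sequences=True)(dec_in, initial_state=[h, c])
    out = layers.TimeDistributed(layers.Dense(F))(dec)
    return models.Model([enc_in, dec_in], out)
```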
Traditionally, the loss function minimized during training is the mean squared error (MSE) between the predicted and actual (location) sequences. Here, we propose a weighted MSE loss function. The weight variable is introduced in order to penalize long trajectories more, as they tend to have a higher reconstruction error regardless of their abnormality. The equation of the weighted loss function is:

$$L_{HVR} = \frac{1}{2|I|} \sum_{i} \sum_{t} \left[ \left( x_i(t) - x_i^r(t) \right)^2 + \left( y_i(t) - y_i^r(t) \right)^2 \right] \cdot w_i \tag{1}$$

where i is a specific instance from a total number of instances |I|, t denotes a certain timestep within a trajectory route, {x_i(t), y_i(t)} is the actual and {x_i^r(t), y_i^r(t)} is the reconstructed location (longitude, latitude) of trajectory i at timestep t. The weight factor w_i is given by the log Haversine distance covered in the (actual) trajectory. For two given points (α, β) it is given by:

$$w_{\alpha,\beta} = 2 \arcsin \sqrt{ \sin^2\!\left( \frac{y_\alpha - y_\beta}{2} \right) + \cos(y_\alpha)\cos(y_\beta)\,\sin^2\!\left( \frac{x_\alpha - x_\beta}{2} \right) } \tag{2}$$

where x_k and y_k refer to the longitude and latitude of point k in radians, respectively. In our experiments, we employ both models (LSTM, SEQ) with both loss functions (MSE, HVR).
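A sketch of how the weight factor can be computed and applied, assuming w_i is precomputed per trajectory as the log of the total Haversine distance covered (Eq. 2 summed over consecutive points) and passed to Keras as a sample weight, which scales the per-trajectory MSE as in Eq. (1). The function names are ours.

```python
import numpy as np

def haversine(lon1, lat1, lon2, lat2):
    """Eq. (2): central angle between two points; coordinates in radians."""
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * np.arcsin(np.sqrt(a))

def trajectory_weight(traj):
    """traj: (m, 2) array of (lon, lat) in radians; w_i = log distance covered."""
    lon, lat = traj[:, 0], traj[:, 1]
    return np.log(haversine(lon[:-1], lat[:-1], lon[1:], lat[1:]).sum())

# With Keras, per-trajectory weights can then be supplied as sample weights:
# autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_noisy, X_clean, sample_weight=weights, ...)
```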
3.2 Anomaly detection methods

After training the models, typically an anomaly threshold is set in order to classify a trajectory as normal or not [2, 4, 20, 25]. We employ two different methods to detect outliers in unseen trajectories, both operating on the sequences of their reconstruction errors (i.e. those derived by the LSTM/SEQ models, as presented in the previous subsection). The evaluation is conducted using a test dataset and synthetic anomalous data.

AVG: The average values of the errors are calculated for the sequences in the test set and are used as a proxy for measuring the abnormality of a sequence of locations visited in a trajectory. The test instances are then sorted based on these values, with the most anomalous trajectories being placed at the highest ranks.

LOF: Instead of working with averages, we propose a hybrid architecture, combining each autoencoder with the Local Outlier Factor (LOF) algorithm [5]. LOF is a density-based outlier detection algorithm, which performs well even in the case of different outlying densities and detects both local and global anomalies. Anomalies are detected by comparing their local densities with the average local densities of their k nearest neighbors. The algorithm contains three steps:

(1) The k nearest neighbors of each record are found, using a distance metric (in our case, Euclidean distance on the reconstruction errors of the LSTM or SEQ models, trained using MSE or HVR).

(2) The Local Reachability Density is computed for each record p, using its k nearest neighbors NN_k, according to the equation:

$$LRD_k(p) = 1 \Big/ \frac{\sum_{o \in NN_k(p)} rd_k(p, o)}{|NN_k(p)|} \tag{3}$$

where rd_k(p, o) is the reachability distance of the object p with respect to object o.

(3) The Local Outlier Factor score of a record is computed as the ratio of the average local densities of its neighbors to its local density. The equation for the LOF score is:

$$LOF(p) = \frac{\sum_{o \in NN_k(p)} \frac{LRD_k(o)}{LRD_k(p)}}{|NN_k(p)|} \tag{4}$$

Finally, LOF scores close to 1 are considered to be associated with normal records, as their densities are similar to those of their neighbors, while outliers have higher score values. The LOF algorithm is applied on the sequences of the reconstruction errors predicted by the autoencoder. The test instances are then sorted based on their LOF scores, and outliers are detected as in the AVG method.
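In scikit-learn terms (the implementation library named in section 4.2), the two detection methods can be sketched as follows; `errors` and the function names are our own illustration.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def rank_by_lof(errors, k=100):
    """errors: (n_trips, m) reconstruction-error sequences from LSTM/SEQ."""
    lof = LocalOutlierFactor(n_neighbors=k)   # Euclidean distance by default
    lof.fit(errors)
    scores = -lof.negative_outlier_factor_    # higher score = more anomalous
    return np.argsort(scores)[::-1]           # trip indices, most anomalous first

def rank_by_avg(errors):
    """The AVG method: rank by the mean error per trajectory."""
    return np.argsort(errors.mean(axis=1))[::-1]
```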
4 EXPERIMENTS

4.1 Data

Dataset: We make use of the dataset released in the Prediction Challenge of ECML PKDD 2015¹ [19], which contains 1,710,671 taxi trips made by all 442 taxis in the city of Porto, Portugal, during the period from 01/07/2013 to 30/06/2014. This dataset has been used in the past for related tasks, such as destination prediction [8, 24] and travel time prediction [12].

¹ http://www.geolink.pt/ecmlpkdd2015-challenge/

The location of each trip is sampled every 15 seconds; we sub-sample this signal so that we have one timestep per minute, in order to test the performance of our models at low sampling rates. Furthermore, we also observe that there are some duplicates, which are removed from the dataset. In order to keep out trajectories with very short length, we exclude trajectories for which the sum of distances between the first, the middle and the last point is less than 500 meters; we used only these three points of each trajectory instead of all of its points for computational reasons. Trips with very low duration (lower than three minutes) or very high duration (higher than two hours) are also excluded. Since the autoencoders require the trajectories to have a specific number of points, we choose this number to be nine, so that 75% of the total number of trajectories has more points. As a result, we keep the trajectories with more than nine points and retain the first nine of them. We note that keeping the first nine points of each trajectory may result in excluding some anomalies which require more points to be expressed. However, we choose this percentage (75% of all trajectories) so that we take into account a large proportion of the total number of trajectories, while the number of points is large enough for the task of detecting anomalous trajectories to be meaningful. We end up with 1,218,657 sequences of nine spatial coordinates (longitude, latitude). The features of the dataset (longitude, latitude) are standardized to zero mean and unit variance, according to the equation:

$$\tilde{x} = \frac{x - \mu(x)}{\sigma(x)} \tag{5}$$

where μ(x) and σ(x) are the mean and standard deviation of all values of each feature, respectively.

We use 80% of the trajectories of each of the 12 months of the time period for training. From the remaining 20%, 10% is used in order to define the end of the training process to avoid overfitting (validation set) and 10% for evaluating our models (test set).

Synthetic Examples of Anomalies: We are interested in capturing different types of anomalies in the trips made by the taxis in our dataset. Due to the lack of labels in our test set, we generate synthetic data by altering a small portion (1%) of the trajectories in the test set. We thereby aim at capturing clear cases of anomalous trips, as well as anomalies that correspond to certain driving patterns. Thus, we generate artificial anomalies corresponding to (a) noisy (distorted) trips (DSTRT) and (b) cyclic routes (CYCLE). Table 1 summarises the two approaches, their variants and the exact pattern we have used to generate such synthetic trajectories. In the experiments that are discussed next, we employ each of these patterns independently in our test set.

Table 1: Overview of our methods for generating synthetic anomalies in the test set. The 'Pattern' column refers to the pattern we use to alter the locations of the actual trip [0, ..., 8].

Method  Variant   Description                                      Pattern
DSTRT   DSTRT_c   Complete noise.                                  [0,8,1,7,2,6,3,5,4]
        DSTRT_p   Light noise.                                     [0,1,2,4,3,5,6,7,8]
CYCLE   CYCLE_c   Perform the same cycle twice and then
                  continue with the rest of the trip.              [0,1,2,3,0,1,2,3,4]
        CYCLE_b   Go back-and-forth all the time.                  [3,4,5,4,3,4,5,4,3]
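Since each pattern in Table 1 is an index map over the nine kept points, generating a synthetic anomaly amounts to reordering a trip's points, e.g. (a sketch, names ours):

```python
import numpy as np

# The Table 1 patterns as index maps over the trip's nine points.
PATTERNS = {
    "DSTRT_c": [0, 8, 1, 7, 2, 6, 3, 5, 4],   # complete noise
    "DSTRT_p": [0, 1, 2, 4, 3, 5, 6, 7, 8],   # light noise
    "CYCLE_c": [0, 1, 2, 3, 0, 1, 2, 3, 4],   # cycle performed twice
    "CYCLE_b": [3, 4, 5, 4, 3, 4, 5, 4, 3],   # back-and-forth
}

def make_anomaly(traj, kind):
    """traj: (9, 2) array of (lon, lat); reorder its points per the pattern."""
    return traj[np.array(PATTERNS[kind])]
```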
4.2 Model Comparison

Task Definition: Our goal is to detect anomalous driving patterns by analysing the trajectories. To this end, we first compare the ability of our four variants to reconstruct the trajectories in the test set against a non-sequential model ('FF', see below). We then conduct our main evaluation task.

Anomaly Detection Task: We form this task in a rank-based manner. We rank the trajectories based on their anomaly factor, as derived by the models, aiming at ranking the (artificially generated) anomalous trajectories higher than the 'normal' ones. We compare the performance of our sequential models against baseline methods, as detailed next. Finally, we compare two of our model variants in a qualitative way, to get insights on the advantages of using the Haversine-weighted loss function compared to standard MSE, as well as of the hybrid (+LOF) approach operating on the reconstruction errors compared to considering the average error.

Models: We contrast the two models defined in section 3.1, each trained by minimising (a) the Mean Squared Error (MSE) and (b) the Haversine-weighted (HVR) loss function. We denote the models as {LSTM_m, SEQ_m} when they use the MSE loss function and as {LSTM_h, SEQ_h} when they use the HVR loss function, respectively. We further compare their performance by ranking the trajectories based on the AVG or the LOF method, as detailed in section 3.2. All versions of our models have two hidden layers of 16 units, followed by one dense layer for the final prediction, since these parameters offered the best performance in terms of reconstruction error in all cases.

To test the importance of the temporal component in our models for the Anomaly Detection task, we compare them against a denoising autoencoder implemented via a Feed-Forward Neural Network with two hidden layers (FF), and against a Local Outlier Factor (LOF) algorithm trained on the raw trajectory data instead of the sequences of reconstruction errors derived by one of our models. Both of these baseline models treat the different timesteps as different features, ignoring their temporal dimension. FF is trained on the same data and using the same noise as our models. LOF is trained directly on the test set to detect the anomalies, and we select the number of neighbors (trials: [10, 50, 100, 500, 1000]) on the basis of its performance on the test set, so as to provide it with a strong advantage over our models. Our sequential models as well as the FF baseline are trained using the Adam optimizer [14], an L2 regularization norm equal to 0.1, a learning rate equal to 0.0001 and a batch size equal to 512. Finally, we provide the evaluation metrics (see next subsection) of a naïve random ranking model (NRR), by averaging the performance over 10K experiments with random scores on the test set. For the case of the Trajectory Reconstruction subtask, we also compare our models against FF.

We implement the neural network architectures based on the Keras library [7] and the LOF algorithm using the Scikit-learn library [22], both in Python.
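In Keras terms, the stated training configuration might look as follows (a sketch; where exactly the L2 norm of 0.1 is attached — here, as a kernel regularizer on the recurrent layers — is our assumption):

```python
from tensorflow.keras import optimizers, regularizers

# e.g. layers.LSTM(16, kernel_regularizer=regularizers.l2(0.1))
optimizer = optimizers.Adam(learning_rate=0.0001)
# autoencoder.compile(optimizer=optimizer, loss="mse")  # or the HVR-weighted loss
# autoencoder.fit(X_noisy, X_clean, batch_size=512,
#                 validation_data=(V_noisy, V_clean))   # val loss ends training
```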
Anomaly Scores: In the Anomaly Detection task, once our models (LSTM_m, LSTM_h, SEQ_m, SEQ_h) and the FF baseline make predictions on the test set², we measure the MSE between the model predictions and the actual location at each timestep of every trip in the test set. We then rank the trips of the test set by employing either of the methods described in section 3.2. For LOF, we rank the trips in the test set based on their LOF scores, in order to identify (i.e. rank higher) the synthetic ones.

² Note that 1% of the trips in the test set have been altered according to one of the four patterns described in section 4.1.

Evaluation: We make use of three evaluation metrics that are suited for ranking tasks and widely used in related work. Specifically, we use two common classification-based metrics, adjusted for the needs of ranking tasks: Recall at k (r@k) measures the portion of the anomalous trips that are present in the top-k trips ranked by our model; Precision at k (p@k) measures the portion of the top-k trips ranked by our model that are indeed anomalous. We finally calculate the F1 Measure at k as our main evaluation metric, defined as follows:

$$F1 = \frac{2 \cdot r@k \cdot p@k}{r@k + p@k} \tag{6}$$

We set k to be equal to 5% of the size of our test set. This value is larger than the percentage of the artificial anomalies we aim at detecting (1%), since we expect that there will be more (non-artificial) anomalies in our test set. Nevertheless, we further experiment with different values of k (range: [0.1%-50%]).

To help with the reproducibility of our experiments, we make available the implementation of the proposed methods and the conducted experiments on GitHub³.

³ https://github.com/marialiatsikou/BMDA_anomaly_detection
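The three metrics can be computed directly from a ranking and the set of injected anomalies, e.g. (a sketch):

```python
def metrics_at_k(ranking, anomalies, k):
    """ranking: trip indices, most anomalous first; anomalies: injected indices."""
    hits = len(set(ranking[:k]) & set(anomalies))
    p = hits / k                                      # precision at k
    r = hits / len(anomalies)                         # recall at k
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0    # Eq. (6)
    return p, r, f1
```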
5 RESULTS

In this section we present the results of our models' comparison. The reconstruction errors of our sequential models – a metric of their ability to accurately reconstruct the trajectories – are compared to the FF model error in our test set. Then, we present the comparison between the performance of our hybrid LOF models and the AVG method (described in section 3.2). The impact of using a Haversine-weighted loss function is also examined next. Finally, we present our empirical insights by comparing the anomalies detected in our test set by SEQ_m with AVG against SEQ_h with LOF, to test the advantages of the latter approach.

5.1 Base Task – Trajectory Reconstruction

The mean absolute and mean squared error in the test set are computed as metrics of the reconstruction error. The results are shown in Table 2. The two sequential models show better reconstruction ability than the FF model, demonstrating the advantage of models that consider the temporal dimension of the trajectories. We also find that the incorporation of the Haversine-weighted (HVR) loss function further improves the results consistently across the two models (SEQ, LSTM).

Table 2: Mean Absolute and Mean Squared Error of our four models and the FF baseline. Performance is measured via the reconstruction errors in the test set.

          MAE       MSE
FF        0.00200   8.726 · 10^-6
LSTM_m    0.00130   4.112 · 10^-6
SEQ_m     0.00156   4.968 · 10^-6
LSTM_h    0.00130   3.908 · 10^-6
SEQ_h     0.00144   4.286 · 10^-6

5.2 Main Task – Anomaly Detection

Table 3 shows the results of our models and baselines on the anomaly detection task across all experiments (i.e. across all synthetic datasets, as summarised in Table 1). In the following, we examine the effects of the different components of our models on the anomaly detection task.

Anomaly Detection Methods: Our sequential models outperform the FF and LOF baseline models in almost all cases. The application of LOF on the sequences of reconstruction errors improves, in all cases of synthetic datasets, the performance of all four sequential autoencoders compared to the AVG method, in which the ranking of trajectories is determined by the average error values. The relative improvements in F1 measure (averaged over the corresponding values in the four datasets) of the LSTM_m and LSTM_h models are 18.3% and 15.1%, respectively. The corresponding numbers for SEQ_m and SEQ_h are much higher (50.2% and 33.2%, respectively). The hybrid approach gives much higher F1 scores even for the predictions made by the FF model (i.e., comparing FF+AVG vs FF+LOF). It is worth noting that FF with LOF outperforms the other models in the case of CYCLE_b, while its performance in this case is much lower compared to all other models (except for NRR) with the AVG method. These findings show the clear and consistent benefit we gain via the hybrid architecture proposed in this work.

Loss Functions: We examine the performance of the models with respect to the loss functions used. The Haversine-weighted loss function (HVR) outperforms the Mean Squared Error loss (MSE) with respect to the F1 measure in 13 out of the total 16 cases shown in Table 3 (the cases refer to the predictions of the two sequential models for the four different datasets). The relative gains in performance from using the Haversine-weighted loss are 1.32% and 4.14% for the LSTM and SEQ models, respectively.

To further examine the performance of the four variants (the SEQ/LSTM models (a) trained using the MSE/HVR loss function and (b) using the AVG/LOF method for their final ranking), Figures 1 and 2 show the variation of the F1 measure over different threshold values of k for the LSTM and SEQ models, respectively. It becomes clear that our hybrid approach combined with the Haversine-weighted loss function (HVR+LOF) gives better results for LSTM and SEQ in most cases, providing a strong boost in performance compared to the common MSE+AVG approach.

Table 3: Evaluation (in %) of all models in all runs for the Anomaly Detection task for k=5%. Blue cells indicate the best-performing anomaly detection method (AVG vs LOF) for each model, per synthetic dataset. Green cells indicate the best-performing model per synthetic dataset. Bold scores show the best-performing loss (MSE vs HVR) when comparing the two variants of each of our sequential models.

                       DSTRT_c          DSTRT_p          CYCLE_c          CYCLE_b
                     AVG     LOF      AVG     LOF      AVG     LOF      AVG     LOF
Precision
  NRR               1.00    1.00     1.00    1.00     1.00    1.00     1.00    1.00
  LOF                  –   17.23        –    2.69        –    4.43        –    4.43
  FF               10.98   16.43     1.74    3.53     0.94    3.58     1.72    9.29
  LSTM_m           16.72   18.22     3.02    5.15     3.68    4.41     4.91    5.73
  SEQ_m            15.08   18.12     2.48    4.94     2.48    5.22     3.15    6.53
  LSTM_h           16.90   18.38     3.10    4.73     4.00    4.56     5.12    5.86
  SEQ_h            16.02   18.78     2.79    5.05     3.33    5.10     3.76    5.55
Recall
  NRR               5.00    5.00     5.00    5.00     5.00    5.00     5.00    5.00
  LOF                  –   86.21        –   13.46        –   22.17        –   22.17
  FF               54.93   82.18     8.70   17.65     4.68   17.90     8.62   46.47
  LSTM_m           83.66   91.13    15.11   25.78    18.39   22.09    24.55   28.65
  SEQ_m            75.45   90.64    12.40   24.71    12.40   26.11    15.76   32.68
  LSTM_h           84.56   91.95    15.52   23.65    20.03   22.82    25.62   29.31
  SEQ_h            80.13   93.92    13.96   25.29    16.67   25.53    18.80   27.75
F1 Measure
  NRR               1.67    1.67     1.67    1.67     1.67    1.67     1.67    1.67
  LOF                  –   28.72        –    4.49        –    7.39        –    7.39
  FF               18.30   27.38     2.90    5.88     1.56    5.96     2.87   15.48
  LSTM_m           27.88   30.37     5.03    8.59     6.13    7.36     8.18    9.55
  SEQ_m            25.14   30.20     4.13    8.23     4.13    8.70     5.25   10.89
  LSTM_h           28.18   30.64     5.17    7.88     6.67    7.60     8.54    9.77
  SEQ_h            26.70   31.30     4.65    8.43     5.55    8.51     6.26    9.25

Figure 1: F1 Measure for different values of k (x-axis) for the LSTM model.

Figure 2: F1 Measure for different values of k (x-axis) for the SEQ model.

5.3 Qualitative Analysis

As a final step, we employ SEQ_m with the AVG method and SEQ_h with the LOF method to rank all the trajectories of our original test set. As opposed to the quantitative analysis conducted thus far on the artificially generated anomalies, here we are interested in gaining qualitative insights on the anomalies detected by each of these methods on our actual test set. We obtained the two sets of the top-0.1% (i.e., most anomalous) trajectories detected by each of the two models in the test set and focused on those that are predicted by one model only (i.e., we excluded the intersection of the two sets).

Two annotators labelled these trajectories as 'normal' or 'abnormal' (or 'NA', if they could not tell). To familiarise themselves with the task, the annotators (one author, one PhD graduate with no domain knowledge) were first provided with examples of normal and abnormal trajectories in the test set and were then asked to label the trajectories in our dataset of interest in a similar manner.

Figure 3: The types of anomalies detected by SEQ_h with the LOF method. The patterns from left to right are: GPS device error, unreasonably long covered distance, cyclic route, back and forth movement.

Figure 4: The types of anomalies detected by SEQ_m with the AVG method. The patterns from left to right are: one back and forth movement and three routes with cyclic parts.

Figure 5: Four typical trajectories classified as normal by SEQ_h with the LOF method.

There was an agreement on 73 of the 86 cases, with the inter-annotator agreement in Cohen's kappa terms being 0.72 (i.e. 'substantial agreement'). For SEQ_m with the AVG method, 4 trajectories out of the 42 for which the annotations coincide (excluding 'NA') were labelled as 'abnormal' (accuracy=9.5%); for SEQ_h with the LOF approach the anomalous trajectories were 15 out of 29 (accuracy=51.7%), certifying the superiority of our hybrid model.

In Figure 3, four patterns of anomalies detected by SEQ_h with the LOF method are displayed. The first corresponds to an error of the GPS device, while the rest possibly refer to driving patterns: an unreasonably long covered distance, a cyclic trip and a trip including back and forth movement. For comparison purposes, we present the four trajectories annotated as anomalies by both annotators for SEQ_m with the AVG method in Figure 4. It is obvious that the types of anomalies detected with this approach are more restricted: they refer to one trip which includes back and forth movement and three trips with cyclic parts. Finally, some typical trajectories that our hybrid model (SEQ_h with the LOF method) classified as normal are displayed in Figure 5. The findings of this qualitative analysis show that the proposed approach can capture different types of anomalies, despite not being tailored to any of them explicitly. Annotating a larger dataset of trajectories based on their pattern of anomaly could lead to further insights on the performance of our models in future work.

6 CONCLUSION

In this work we introduce a hybrid architecture, which combines a sequential denoising autoencoder with a density-based outlier detection algorithm, for trajectory anomaly detection. We propose two variants of sequential models, a denoising LSTM autoencoder and a denoising Seq2Seq autoencoder. They are both trained by optimizing a Haversine distance-based weighted loss function, in order to take into account the overall distance covered in a trajectory. During the detection phase, the Local Outlier Factor algorithm is applied on the sequences of reconstruction errors of unseen trajectory data. The evaluation is conducted on four different types of synthetic data, generated by altering a small portion of the trajectories of our test set. We show that the hybrid architectures outperform other anomaly detection methods. Regarding the proposed loss function, the results for both detection methods (AVG and LOF) are in most cases improved by the introduction of the weighted loss.

In the future, we intend to apply our methodology to datasets of different trajectory length and type, including bike sharing data from the i-CHANGE platform [1], and to collaborate with domain experts to obtain annotations for anomalous trajectories. We also intend to test other variants of loss functions during training, to explore transfer learning approaches across different datasets and domains, and to try out different deep learning architectures.

ACKNOWLEDGMENTS

This research was co-financed by the European Regional Development Fund of the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH – CREATE – INNOVATE (project code: T1EDK-04582).

REFERENCES

[1] Lazaros Apostolidis, Symeon Papadopoulos, Maria Liatsikou, Ioannis Fyrogenis, Efthymis Papadopoulos, George Keikoglou, Konstantinos Alexiou, Nasos Chondros, Ioannis Kompatsiaris, and Ioannis Politis. 2020. i-CHANGE: A Platform for Managing Dockless Bike Sharing Systems. In International Conference on Computational Science and Its Applications. Springer, 851–867.
[2] Javed Ashraf, Asim D Bakhshi, Nour Moustafa, Hasnat Khurshid, Abdullah Javed, and Amin Beheshti. 2020. Novel Deep Learning-Enabled LSTM Autoencoder Architecture for Discovering Anomalous Events From Intelligent Transportation Systems. IEEE Transactions on Intelligent Transportation Systems (2020).
[3] Kiran Bhowmick and Meera Narvekar. 2018. Trajectory Outlier Detection for Traffic Events: A Survey. In Intelligent Computing and Information and Communication. Springer Singapore, Singapore, 37–46.
[4] Giorgos Bouritsas, Stelios Daveas, Antonios Danelakis, and Stelios CA Thomopoulos. 2019. Automated Real-time Anomaly Detection in Human Trajectories using Sequence to Sequence Networks. In 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). IEEE, 1–8.
[5] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. 2000. LOF: Identifying Density-Based Local Outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 93–104.
[6] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078 (2014).
[7] François Chollet et al. 2015. Keras. https://keras.io.
[8] Alexandre De Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent, and Yoshua Bengio. 2015. Artificial Neural Networks Applied to Taxi Destination Prediction. In Proceedings of the 2015th International Conference on ECML PKDD Discovery Challenge – Volume 1526. 40–51.
[9] Nicola Di Mauro and Stefano Ferilli. 2018. Unsupervised LSTMs-based Learning for Anomaly Detection in Highway Traffic Data. In International Symposium on Methodologies for Intelligent Systems. Springer, 281–290.
[10] Vitor Cunha Fontes, Lucas Andre de Alencar, Chiara Renso, and Vania Bogorny. 2013. Discovering Trajectory Outliers between Regions of Interest. In GeoInfo. Citeseer, 49–60.
[11] Manish Gupta, Jing Gao, Charu C Aggarwal, and Jiawei Han. 2013. Outlier Detection for Temporal Data: A Survey. IEEE Transactions on Knowledge and Data Engineering 26, 9 (2013), 2250–2267.
[12] Thomas Hoch. 2015. An Approach for the Kaggle Taxi Travel Time Prediction Challenge. In Proceedings of the 2015th International Conference on ECML PKDD Discovery Challenge – Volume 1526. 52–62.
[13] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long Short-Term Memory. Neural Computation 9, 8 (1997), 1735–1780.
[14] Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980 (2014).
[15] Jae-Gil Lee, Jiawei Han, and Xiaolei Li. 2008. Trajectory Outlier Detection: A Partition-and-Detect Framework. In 2008 IEEE 24th International Conference on Data Engineering. IEEE, 140–149.
[16] Xiaolei Li, Jiawei Han, Sangkyum Kim, and Hector Gonzalez. 2007. ROAM: Rule- and Motif-Based Anomaly Detection in Massive Moving Object Data Sets. In Proceedings of the 2007 SIAM International Conference on Data Mining. SIAM, 273–284.
[17] Xiaolei Li, Zhenhui Li, Jiawei Han, and Jae-Gil Lee. 2009. Temporal Outlier Detection in Vehicle Traffic Data. In 2009 IEEE 25th International Conference on Data Engineering. IEEE, 1319–1322.
[18] Weining Lu, Yu Cheng, Cao Xiao, Shiyu Chang, Shuai Huang, Bin Liang, and Thomas Huang. 2017. Unsupervised Sequential Outlier Detection with Deep Architectures. IEEE Transactions on Image Processing 26, 9 (2017), 4321–4330.
[19] Joao Mendes-Moreira and Luís Moreira-Matias. 2015. On Learning from Taxi-GPS Traces. In Proceedings of the 2015th International Conference on ECML PKDD Discovery Challenge – Volume 1526. CEUR-WS.org, 37–39.
[20] Ali H Mirza and Selin Cosan. 2018. Computer Network Intrusion Detection Using Sequential LSTM Neural Networks Autoencoders. In 2018 26th Signal Processing and Communications Applications Conference (SIU). IEEE, 1–4.
[21] Xavier Olive, Jeremy Grignard, Thomas Dubot, and Julie Saint-Lot. 2018. Detecting Controllers' Actions in Past Mode S Data by Autoencoder-Based Anomaly Detection.
[22] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[23] Claudio Piciarelli, Christian Micheloni, and Gian Luca Foresti. 2008. Trajectory-Based Anomalous Event Detection. IEEE Transactions on Circuits and Systems for Video Technology 18, 11 (2008), 1544–1554.
[24] Alberto Rossi, Gianni Barlacchi, Monica Bianchini, and Bruno Lepri. 2019. Modelling Taxi Drivers' Behaviour for the Next Destination Prediction. IEEE Transactions on Intelligent Transportation Systems (2019).
[25] Pankaj Raj Roy and Guillaume-Alexandre Bilodeau. 2018. Road User Abnormal Trajectory Detection Using a Deep Autoencoder. In International Symposium on Visual Computing. Springer, 748–757.
[26] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. 2014. Sequence to Sequence Learning with Neural Networks. In Advances in Neural Information Processing Systems. 3104–3112.
[27] Adam Tsakalidis and Maria Liakata. 2020. Sequential Modelling of the Evolution of Word Representations for Semantic Change Detection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 8485–8497.
[28] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol, and Léon Bottou. 2010. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research 11, 12 (2010).
[29] Daqing Zhang, Nan Li, Zhi-Hua Zhou, Chao Chen, Lin Sun, and Shijian Li. 2011. iBAT: Detecting Anomalous Taxi Trajectories from GPS Traces. In Proceedings of the 13th International Conference on Ubiquitous Computing. 99–108.