Anomaly Detection Using Predictive Convolutional Long Short-Term Memory Units

by

Jefferson Ryan Medel

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Engineering

Supervised by Dr. Andreas Savakis
Department of Computer Engineering
Kate Gleason College of Engineering
Rochester Institute of Technology
Rochester, NY
November 2016

Approved By:

Dr. Andreas Savakis
Primary Advisor, R.I.T. Dept. of Computer Engineering

Dr. Andres Kwasinski
Secondary Advisor, R.I.T. Dept. of Computer Engineering

Dr. Roy Melton
Secondary Advisor, R.I.T. Dept. of Computer Engineering

Dedication

I would like to dedicate this thesis to my loving parents, Ric and Juliet Medel, and my sister, Riane Medel, who have given me unwavering support over the years.

Acknowledgements

I would like to thank my advisor, Dr. Andreas Savakis, for his assistance and guidance throughout the development of this thesis, and my committee members, Dr. Andres Kwasinski and Dr. Roy Melton, for their time. I also want to thank my colleagues in the Computer Vision Lab, Peter Muller, Dan Chianucci, and Breton Minnehan, for their knowledge and company.

Abstract

Automating the segmentation of anomalous activities within long video sequences is complicated by the ambiguity of how such events are defined. This thesis approaches the problem by learning generative models with which meaningful sequences can be identified in videos using limited supervision. We propose two types of end-to-end trainable Convolutional Long Short-Term Memory (Conv-LSTM) networks that are able to predict the subsequent video sequence from a given input. The first is an encoder-decoder based model that learns spatio-temporal features from stacked non-overlapping image patches, and the second is an autoencoder based model that utilizes max-pooling layers to learn an abstraction of the entire image. The networks learn to model “normal” activities from usual events. Regularity scores are derived from the reconstruction errors of a set of predictions, with abnormal video sequences yielding lower regularity scores as they diverge further from the actual sequence over time. The models utilize a composite structure, and we examine the effects of “conditioning” to learn more meaningful representations. The best model is chosen based on the reconstruction and prediction accuracies. The Conv-LSTM models are evaluated both qualitatively and quantitatively, demonstrating competitive results on multiple anomaly detection datasets. Conv-LSTM units are shown to provide competitive results for modeling and predicting learned events when compared to state-of-the-art methods.
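To make the abstract's notion of a regularity score concrete, the sketch below shows one common way such a score can be derived from per-frame reconstruction errors; the function name, the sum-of-squared-errors measure, and the min-max normalization are illustrative assumptions, not necessarily the exact formulation developed later in this thesis.

import numpy as np

def regularity_scores(predicted, actual):
    # Hypothetical sketch: per-frame reconstruction error taken as the sum of
    # squared pixel differences between predicted and observed frames.
    # 'predicted' and 'actual' are assumed to be arrays of shape (T, H, W).
    errors = np.sum((predicted - actual) ** 2, axis=(1, 2))
    # Min-max normalize over the sequence and invert, so frames that diverge
    # most from the actual sequence receive the lowest regularity scores.
    e_min, e_max = errors.min(), errors.max()
    return 1.0 - (errors - e_min) / (e_max - e_min + 1e-8)

Under this convention, an abnormal sequence whose predictions drift away from the observed frames produces regularity scores that fall toward zero, matching the behavior described in the abstract.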
Table of Contents

Dedication ..... ii
Acknowledgements ..... iii
Abstract ..... iv
List of Figures ..... vii
List of Tables ..... xii
Glossary ..... xiii
Chapter 1 Introduction ..... 1
1.1. Thesis Contributions ..... 2
1.2. Thesis Outline ..... 2
Chapter 2 Background ..... 4
2.1. Feed Forward Neural Networks ..... 4
2.2. Convolutional Neural Networks ..... 5
2.3. Recurrent Neural Networks ..... 7
2.3.1 Long Short-Term Memory ..... 8
2.3.2 Convolutional Long Short-Term Memory ..... 10
2.4. Future Video Prediction ..... 13
2.5. Anomaly Detection in Videos ..... 17
Chapter 3 Anomaly Detection through Future Prediction ..... 19
3.1. Proposed Architectures ..... 19
3.1.1 Proposed Composite Convolutional LSTM Encoder-Decoder ..... 19
3.1.2 Proposed Conv-LSTM Autoencoder ..... 22
3.1.3 Parameters of Note ..... 24
3.2. Evaluation Algorithm ..... 26
3.2.1 Parameters of Note ..... 28
Chapter 4 Experimental Results ..... 30
4.1. Experimental Setup ..... 30
4.1.1 Dataset Selection ..... 31
4.2. Preliminary Code Validation ..... 34
4.3. Parameter Selection ..... 36
4.3.1 Model Parameters ..... 36
4.3.2 Evaluation Parameters ..... 38
4.4. Results ..... 39
4.4.1 Predicting Past and Future Video Sequences ..... 39
4.4.2 Preliminary Anomaly Detection Evaluation ..... 46
4.4.3 Improved Anomaly Detection Evaluation ..... 46
Chapter 5 Conclusion ..... 66
Bibliography ..... 68

List of Figures

Figure 1: A Simple Feed Forward Neural Network. The nodes in the input layer are the input data, while the nodes in the hidden and output layers are perceptrons. Each connection between nodes represents a weighted connection. ..... 4
Figure 2: Diagram illustrating the relationship between the input and the hidden layer. Each resulting feature map in the hidden layer uses its own convolutional filter. The filters process the input with a sliding window that sums the convolutional results across every channel at the same coordinate. ..... 6
Figure 3: A visualization of the LeNet architecture. It combines convolutional, pooling, and fully connected layers to learn a model that can solve classification problems. ..... 7
Figure 4: Unrolled Recurrent Unit ..... 8
Figure 5: Long Short-Term Memory Cell ..... 9
Figure 6: Inner Structure of a Conv-LSTM Unit ..... 12
Figure 7: The composite structure for an unrolled LSTM unit. Blue lines represent potential conditioning. ..... 14
Figure 8: Reconstruction and future prediction obtained from the Composite Model on a dataset of Bouncing MNIST images [19] ..... 15
Figure 9: Reconstruction and future prediction obtained from the Composite Model on a dataset of natural image patches [19] ..... 16
Figure 10: Two prediction examples. All of the predictions and ground truths are sampled with an interval of 3. From top to bottom: input frames; ground truth frames; prediction by the ConvLSTM forecasting network [17] ..... 17
Figure 11: High-level view of the Conditioned Composite