Received March 2, 2021, accepted March 13, 2021, date of publication March 23, 2021, date of current version March 31, 2021.

Digital Object Identifier 10.1109/ACCESS.2021.3068172

Intelligent Anomaly Detection for Large Network Traffic With Optimized Deep Clustering (ODC) Algorithm

ANNIE GILDA ROSELIN 1,2, PRIYADARSI NANDA 1, SURYA NEPAL 2, AND XIANGJIAN HE 1 (Senior Member, IEEE)
1 Department of Electrical and Data Engineering, University of Technology Sydney (UTS), Ultimo, NSW 2007, Australia
2 Commonwealth Scientific and Industrial Research Organisation (CSIRO/Data61), Marsfield, NSW 2122, Australia
Corresponding author: Annie Gilda Roselin ([email protected]; [email protected])
This work was supported by Data61 through the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Marsfield, Australia.

ABSTRACT The availability of enormous amounts of unlabeled data drives anomaly detection research towards unsupervised algorithms, and deep clustering algorithms for anomaly detection have gained significant research attention. We propose intelligent anomaly detection for extensive network traffic analysis with an Optimized Deep Clustering (ODC) algorithm. Firstly, ODC optimizes the deep AutoEncoder by tuning its hyperparameters, thereby achieving a reduced reconstruction error rate from the deep AutoEncoder. Secondly, ODC feeds the optimized deep AutoEncoder's latent view to the BIRCH clustering algorithm to detect known and unknown malicious network traffic without human intervention. Unlike other deep clustering algorithms, ODC does not require the number of clusters needed to analyze the network traffic dataset to be specified in advance. We evaluate the ODC algorithm on the CoAP off-path dataset obtained from our testbed and on the MNIST dataset to compare our algorithm's accuracy with state-of-the-art clustering algorithms. The evaluation results show that the ODC deep clustering method outperforms the existing deep clustering methods for anomaly detection.

INDEX TERMS Latent space view, anomaly detection, regularization, BIRCH clustering.

I. INTRODUCTION
The increase in network traffic is directly proportional to the increase in malicious activities on the Internet. IoT plays a vital role in producing a massive number of network traffic datasets and creates significant challenges for detecting anomalies.

Anomaly detection in network traffic with machine learning is a rapidly growing research area [1]–[7]. Deep clustering techniques for anomaly detection use variations of the AutoEncoder's latent representation with a k-means clustering algorithm. For example, Deep Embedding Clustering (DEC) [8], Improved Deep Embedding Clustering (IDEC) [9], and Deep Density-based Clustering (DDC) [10] use a dense deep AutoEncoder; Deep Convolutional Embedded Clustering (DCEC) [11] and Deep Density-based Clustering with Data Augmentation (DDC-DA) [10] use a convolutional AutoEncoder with k-means clustering; and the Gaussian mixture variational AutoEncoder (GMVAE) [12] pairs a variational AutoEncoder with k-means clustering. Most of these deep clustering techniques use the k-means algorithm for the data clustering part, which in turn demands the number of clusters manually. In a real-time situation, predicting the number of clusters at the initial time (when training the model) for a new dataset might not help discover new and unknown anomalies. To overcome this major limitation of the existing works, we use BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) in our ODC deep clustering technique. BIRCH has the advantage of intelligent cluster assignment and anomaly detection without human intervention. Also, a deep AutoEncoder reduces the dimensionality of the dataset irrespective of whether it contains linear or non-linear data. The BIRCH clustering method has not received much attention among researchers working on deep clustering methods. However, BIRCH has the capability of doing intelligent clustering on a vast dataset [13].

The associate editor coordinating the review of this manuscript and approving it for publication was Amir Masoud Rahmani.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Our contributions are summarized as follows:
• Optimization of the deep AutoEncoder by tuning the hyper-parameters to achieve a reduced reconstruction error rate.
• We derived a novel unsupervised anomaly detection algorithm, ODC, by incorporating the BIRCH clustering algorithm with the latent representation of the enhanced deep AutoEncoder.
• Unlike other deep clustering algorithms, ODC does not require specifying the number of clusters needed to analyze the network traffic.
• ODC intelligently handles anomalies, including known and unknown attacks, for a huge dataset.
• We analyzed how the Branching factor value and the Threshold value of BIRCH influence the clustering accuracy and the normalized mutual information score.

We observed that our ODC clustering algorithm outperforms the existing deep clustering methods for anomaly detection. Moreover, ODC suits vast network traffic datasets where multiple scans of the dataset are not advisable, since ODC embeds the BIRCH clustering algorithm and incorporates its advantages. We achieved high clustering accuracy and a high normalized mutual information score for the anomaly detection process due to the combination of a deep AutoEncoder and the BIRCH clustering algorithm. Also, ODC puts a stop to the need for domain experts to manually label large datasets and to explicitly specify the number of clusters needed for the dataset.

Our proposed method differs from the state of the art [14]–[18] in that we associate BIRCH clustering with our enhanced deep AutoEncoder. To preserve each data point's local structure, StructAE [19] learns representations for each data point by minimizing the reconstruction error with respect to itself. ODC, however, achieves a low reconstruction error rate by tuning hyperparameters such as the activation function and the regularization function. Hence, we show that ODC preserves the data points' structure, leading to an intelligent clustering method to detect anomalies.

The rest of the paper is organized as follows. Section II provides the background information needed to understand the ODC clustering algorithm. The working principles of a deep AutoEncoder and the BIRCH clustering algorithm are explained in Section II-A and Section II-B, respectively. Section III describes the state of the art of deep clustering algorithms. The proposed deep clustering method is explained in Section IV. Section V describes the evaluation process of the proposed deep clustering method. Finally, we discuss the possible extension of our research in Section VI.

II. BACKGROUND
Anomaly detection [20] is the technique of identifying rare events or observations that raise suspicion by being statistically different from the rest of the observations. Modern organizations are beginning to understand the significance of interconnected operations for getting the full picture of their business. Additionally, they have to react instantly to fast-moving changes in data, particularly in the event of cybersecurity threats. Unfortunately, there is no effective way to manage and analyze continually evolving datasets manually. With dynamic systems having various components in ceaseless motion, where the "normal" behavior is continually redefined, a new proactive approach to identifying anomalous behavior is required [20].

Anomaly detection varies across many real-world applications and academic research areas, based on the dataset used to train the machine learning model. With the emergence of sensor networks, processing data as it arrives has become a necessity [21]. Techniques have been proposed that can operate in an online fashion [22]; such techniques assign an anomaly score to a test instance as it arrives, but also incrementally update the model. The authors in [23] showcased the importance of anomaly detection in dynamic settings through a real-world application example, i.e., forest fire risk prediction. They also recommend redesigning the current models to be able to detect outlying patterns accurately and efficiently. More specifically, when there are many features, a set of anomalies emerges in only a subset of dimensions at a particular period. This set of anomalies may appear normal with regard to a different subset of dimensions and periods.

The authors in [24] discussed the unavailability of financial data for fraud detection research and a methodology for synthetic data generation. They suggest that a universal technique in the domain of fraud detection is yet to be found, due to the evolving change in the context of normality and to data unavailability. According to [25], much of the research on connected-vehicle anomaly detection is performed on simulated data (37 out of the 65 surveyed papers); in-vehicle network data and vehicular ad hoc network (VANET) data are seldom considered together to safeguard connected vehicles (except for 1 out of the 65 surveyed papers); and connected-vehicle safety research does not get the same amount of attention as cybersecurity research. It is observed that the anomaly detection domain has various promising research directions; many anomaly detection methods require a large test dataset for detecting anomalies [26]. The literature survey we conducted in anomaly detection motivates us to use machine learning models to determine the abnormal behavior of a legitimate user in a private network.

Anomaly detection can be performed using the ideas of machine learning, in the following manners:
Supervised Anomaly Detection: This strategy requires a labeled dataset with normal and abnormal examples for building a predictive model. The most well-known supervised methods include supervised neural networks, support vector machines, k-nearest neighbors, Bayesian networks, and decision trees [27]. Supervised models are believed to give a superior detection rate compared to unsupervised techniques because of their capacity to encode interdependencies between variables, to incorporate both prior knowledge and data, and to return a confidence score with the model output [2].


Unsupervised Anomaly Detection: This strategy does not require labeled training data. It assumes that the vast majority of the network connections are normal traffic, that only a modest percentage is unusual, and that malicious traffic is statistically different from normal traffic [28]. In light of these two assumptions, groups of frequent instances are considered normal, and rare data groups are classified as anomalies. The most popular unsupervised algorithms include K-means, AutoEncoders, GMMs (Gaussian Mixture Models), and PCA (Principal Component Analysis) based analysis [29].

Deep learning is the subspace of machine learning that accomplishes great performance, as it learns the detailed features of datasets with the help of neural networks [30]. The existing deep clustering techniques for anomaly detection merge a deep learning algorithm and a clustering algorithm, usually the k-means clustering algorithm. Based on the observations from these background studies and the research gap learned from the related work in Section III, we propose our ODC in Section IV for intelligent anomaly detection.

A. DEEP AUTOENCODER
An AutoEncoder with more than one hidden layer is called a deep AutoEncoder. Deep AutoEncoders learn more complex features of the dataset since they have more layers than a simple AutoEncoder. The deep AutoEncoder intends to reconstruct the input with minimum reconstruction error. The encoding part, the decoding part, and the latent representation (compressed input) are the three essential parts of the deep AutoEncoder. The application of a deep AutoEncoder is unavoidable in network traffic analysis, since it compresses a sizeable high-dimensional dataset into a low-dimensional one.

For a given training dataset X = {x_1, x_2, ..., x_m} with m samples [31], where x_i is a d-dimensional feature vector, the encoder maps the input vector x_i to a hidden representation vector h_i through a deterministic mapping f_θ, as given in (1):

h_i = f_θ(x_i) = σ(W x_i + b)   (1)

where W is a d′ × d matrix, d′ is the number of hidden units, b is a bias vector, and θ = {W, b} is the mapping parameter set. σ is a proper activation function. The decoder maps the resulting hidden representation h_i back to a reconstructed d-dimensional vector y_i in the input space:

y_i = g_θ̂(h_i) = σ(Ŵ h_i + b̂)   (2)

where Ŵ is a d × d′ matrix, b̂ is a bias vector, and θ̂ = {Ŵ, b̂} [31]. The goal of training the AutoEncoder is to minimize the difference between input and output. Therefore, a loss function is calculated by the following equation:

L(x, y) = (1/m) Σ_{i=1}^{m} ‖x_i − y_i‖²   (3)

where m is the total number of training samples. The main objective is to find the optimal parameters (h_i and θ) which can effectively minimize the difference between input and reconstructed output over the whole training set:

θ = {W, b} = arg min_θ L(x, y)   (4)
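To make the mappings in (1)–(4) concrete, the following is a minimal NumPy sketch of one encoder/decoder pair and the mean-squared reconstruction loss; the dimensions, the logistic activation, and the random data are illustrative assumptions rather than the configuration used in this paper.

import numpy as np

rng = np.random.default_rng(0)
d, d_prime, m = 20, 8, 100                        # input dim, hidden units, samples (assumed)

W, b = 0.1 * rng.normal(size=(d_prime, d)), np.zeros(d_prime)    # encoder parameters (theta)
W_hat, b_hat = 0.1 * rng.normal(size=(d, d_prime)), np.zeros(d)  # decoder parameters (theta-hat)

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))        # a generic activation for the sketch

X = rng.normal(size=(m, d))
H = sigma(X @ W.T + b)                            # eq. (1): h_i = sigma(W x_i + b)
Y = sigma(H @ W_hat.T + b_hat)                    # eq. (2): reconstruction y_i
L = np.mean(np.sum((X - Y) ** 2, axis=1))         # eq. (3): (1/m) sum ||x_i - y_i||^2
print(round(L, 4))                                # training, eq. (4), would minimize L over theta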
B. BIRCH CLUSTERING ALGORITHM
BIRCH, which stands for Balanced Iterative Reducing and Clustering using Hierarchies, was created in 1996 by Tian Zhang, Raghu Ramakrishnan, and Miron Livny. BIRCH is best suited for large or streaming datasets due to its ability to find good clustering solutions with a single scan of the data. Optionally, the algorithm can further scan through the data to improve the clustering quality. BIRCH outperforms existing clustering methods such as the K-means and DBSCAN clustering algorithms [13] in handling large datasets.

According to [13], BIRCH is a multipath search tree, similar in structure to a B+ tree. There are three kinds of nodes in a cluster-feature (CF) tree: Leaf, NonLeaf, and MinCluster. Three parameters are engaged in the model. The first parameter is B (Branching factor), the greatest number of child nodes that a non-leaf node can hold. The second parameter is L, the maximum number of entries that a leaf node can hold. Furthermore, the third parameter is T (Threshold), the maximum radius of a cluster. A CF entry is a triple of statistics summarizing the data points in a single cluster:

CF = (N, LS, SS)   (5)

• Count (N): the number of data values in the cluster.
• Linear Sum (LS): the vector sum of the individual coordinates of the data points; a measure of the location of the cluster:

LS = Σ_{i=1}^{N} x_i   (6)

• Squared Sum (SS): the sum of the squared coordinates of the data points; a measure of the spread of the cluster:

SS = Σ_{i=1}^{N} (x_i)²   (7)

BIRCH has two phases:
• Phase 1: Building the CF tree. Load the network traffic data into memory by building a cluster-feature (CF) tree. This phase will compress the initial CF tree only when this option is chosen at training time.
• Phase 2: Global clustering. Optionally refine the clusters obtained from Phase 1 by applying an existing clustering algorithm on the leaves of the CF tree.

In view of the additivity property of CFs [13], the CF value of a parent node is the sum of the CF values of its child nodes:

CF_1 + CF_2 = (N_1 + N_2, LS_1 + LS_2, SS_1 + SS_2)   (8)
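As a concrete illustration of (5)–(8), the short NumPy sketch below builds CF triples for two toy point sets and checks the additivity property; the data are made up for illustration.

import numpy as np

def cf(points):
    # CF = (N, LS, SS) of eq. (5): count, linear sum (eq. 6), squared sum (eq. 7)
    points = np.asarray(points, dtype=float)
    return len(points), points.sum(axis=0), (points ** 2).sum()

cf1 = cf([[1.0, 2.0], [2.0, 2.0]])
cf2 = cf([[8.0, 9.0]])

# eq. (8): the CF of a merged cluster is the component-wise sum of its parts
merged = (cf1[0] + cf2[0], cf1[1] + cf2[1], cf1[2] + cf2[2])
direct = cf([[1.0, 2.0], [2.0, 2.0], [8.0, 9.0]])
assert merged[0] == direct[0]
assert np.allclose(merged[1], direct[1]) and np.isclose(merged[2], direct[2])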


III. RELATED WORK
In DEC [8], an initial dense AutoEncoder is pre-trained by minimizing the reconstruction error. Then, as a clustering optimization stage, the method iterates between computing an auxiliary target distribution from the AutoEncoder representation and minimizing the Kullback-Leibler divergence to it. In IDEC [9], it is argued that the clustering loss of DEC corrupts the feature space; therefore, IDEC jointly optimizes the clustering loss and the reconstruction loss of the AutoEncoder.

The deep clustering in [32] shows that an l2 normalization on the latent representation of the AutoEncoder makes the latent space more separable and compact in the Euclidean space. This significantly improves the clustering precision when k-means clustering is applied to the latent representation. The DDC [10] clustering technique reduces the dimension of the dataset with the help of a deep convolutional AutoEncoder and the t-SNE algorithm. Consequently, DDC applies density-based clustering to the result of the t-SNE algorithm (2-dimensional embedded data) without mentioning the number of clusters in advance. The deep clustering algorithms [33], [34] and DDC use t-SNE for further dimensionality reduction of the input data. The issue with t-SNE is that it preserves neither the distances nor the density of the data. Also, the compressed data cannot be guaranteed to recreate the original input, since there are no hyper-parameters to reduce the reconstruction error between the input data and the recreated data.

Recent works on convolutional AutoEncoder clustering, such as [35]–[39], are most applicable for clustering image datasets, not for analysing network traffic datasets. DCEC [11] embraces a convolutional AutoEncoder and shows that it improves the clustering accuracy of DEC and IDEC. Dealing with anomalies in credit card transactions [15] is done with an AutoEncoder and the k-means clustering algorithm on a European bank transaction dataset. However, this work and the other works specified in this Section III have the problem of predicting the number of clusters after pre-training the AutoEncoder.

Our proposed algorithm ODC optimizes the pre-training process of the deep AutoEncoder to reduce the reconstruction error. Furthermore, it uses BIRCH clustering to overcome the limitations of the existing deep clustering algorithms.

IV. PROPOSED DEEP CLUSTERING METHOD
ODC groups the network traffic data based on the Euclidean distance between the nodes, so that we get more and more dynamic clusters as the network traffic passes on to the ODC model.

A. ENHANCED DEEP AUTOENCODER
The enhanced AutoEncoder model is constructed using a proper combination of activation functions, regularizers, and optimization functions to reduce the reconstruction error value. Our enhanced AutoEncoder treats every input as self-reliant values, thereby reducing the over-fitting of the training data. The ODC training phase requires pre-training and fine-tuning the model parameters to enhance the efficiency of the model.

We used the ELU (Exponential Linear Unit) [40] activation function for all layers and the Adamax optimization function for the enhanced AutoEncoder model:

f(x) = x,              if x ≥ 0
f(x) = α(exp(x) − 1),  if x < 0   (9)

ELUs have negative values that push the mean of the activations closer to zero. Mean activations close to zero allow faster learning, as the gradient approaches the natural gradient. ELUs saturate at a negative value for strongly negative inputs. Besides, code interference between different concepts is less likely, since saturated negative values avoid producing distributed codes. α is a hyper-parameter of the ELU. Positively activated ELUs interact by activating the next layer of units. Thus, the ELU activation function is well suited for deep network models where the vanishing gradient interferes with the learning of the model.

The dropout regularizer randomly drops out nodes, thereby increasing the uniqueness of each node in the network. The co-adaptation of the features in the nodes is reduced by adopting a dropout regularizer in the network.

The number of hidden layers (h_i = {h_0, h_1, h_2, h_3, h_4}) in our enhanced AutoEncoder is five. Here, the latent space can be represented as

h_2 = f_θ(x_2) = σ(W x_2 + b)   (10)

According to [41], the dropout function is defined as

r_j^(l) ∼ Bernoulli(p)
ỹ^(l) = r^(l) ∗ y^(l)   (11)

In equation (11), ∗ signifies an element-wise product. For any layer l, r^(l) is a vector of independent Bernoulli random variables, each of which has probability p of being 1. This vector is sampled and multiplied element-wise with the outputs of that layer, y^(l), to create the thinned outputs ỹ^(l). These reduced features are then used as input to the next layer. This procedure is applied at each layer. If we apply dropout to the hidden layer with a probability value of p, the equations would be modified as follows (at training time):

x̃ ∼ Dropout(x)
h = f(W x̃ + b)
h̃ ∼ Dropout(h)
y = g(Ŵ h̃ + b̂)   (12)
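Putting the pieces of this subsection together, a minimal Keras sketch of such an enhanced AutoEncoder could look as follows. The layer widths follow the encoder/decoder dimensions reported later in Section V (d - 1626 - 756 - 50 and its mirror); the input dimension d, the dropout rate p = 0.2, and the linear output activation are illustrative assumptions, not values reported in this paper.

from tensorflow import keras
from tensorflow.keras import layers

d = 1800  # input feature dimension (assumed)

inputs = keras.Input(shape=(d,))
x = layers.Dense(1626, activation="elu")(inputs)               # h0, ELU of eq. (9)
x = layers.Dropout(0.2)(x)                                     # dropout of eqs. (11)-(12)
x = layers.Dense(756, activation="elu")(x)                     # h1
x = layers.Dropout(0.2)(x)
latent = layers.Dense(50, activation="elu", name="latent")(x)  # h2: the latent space, eq. (10)
x = layers.Dense(756, activation="elu")(latent)                # h3
x = layers.Dense(1626, activation="elu")(x)                    # h4
outputs = layers.Dense(d, activation="linear")(x)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adamax", loss="mse")            # Adamax + reconstruction loss

encoder = keras.Model(inputs, latent)  # after training, this produces the view fed to BIRCH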


A loss function is calculated by the following equation:

L(x, y) = (1/m) Σ_{i=1}^{m} ‖x_i − y_i‖² + r   (13)

where m is the total number of training samples and r is the regularization term. The column named "Train RE" in Table 1 refers to the reconstruction error rate at training time, and "Test RE" refers to the reconstruction error rate at testing time of our enhanced deep AutoEncoder. The values in Table 1 show how our optimized deep AutoEncoder outperforms the alternatives in reducing the reconstruction error rate both at training and at testing time.

TABLE 1. Optimization of deep AutoEncoder.

B. OPTIMIZED DEEP CLUSTERING WITH BIRCH
The compressed representation of the data points (h_2 = f_θ(x_2) = σ(W x_2 + b)), obtained from the enhanced deep AutoEncoder as explained in Section IV-A, is fed into the BIRCH clustering algorithm. Each new data point is added to the CF tree after calculating the radius of the cluster. The radius (R) of a cluster is calculated as

R = √( Σ_{i=1}^{N} (x_i − C)² / N ) = √( (N·C² + SS − 2·C·LS) / N )

which simplifies to

R = √( SS/N − (LS/N)² )

The calculated R value decides where to push the new data point. If R < T, the new data point is pushed into the same leaf node. If R > T, the new data point forms a new leaf node. Thereby, the CF tree is built for all the data points in our training and testing data. If we divide the sum of the data points by the number of data points, we get the centroid of the cluster. The centroid (C) of the cluster is calculated as

C = Σ_{i=1}^{N} x_i / N = LS / N

Thereby, we can also calculate the distance between two clusters CF_i and CF_j.
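The statistics above follow directly from a CF triple. Below is a minimal sketch, assuming Euclidean space and NumPy arrays, of the centroid, the radius, the R-versus-T insertion test, and the twice-the-radius anomaly rule used in Section IV-C below; the toy CF values are made up.

import numpy as np

def centroid_and_radius(N, LS, SS):
    # C = LS/N and R = sqrt(SS/N - ||LS/N||^2), as derived above
    C = LS / N
    R = np.sqrt(max(SS / N - C @ C, 0.0))
    return C, R

def absorbs(x, N, LS, SS, T):
    # R-versus-T test: absorb x only if the enlarged cluster's radius stays below T
    N2, LS2, SS2 = N + 1, LS + x, SS + x @ x       # CF additivity, eq. (8)
    _, R2 = centroid_and_radius(N2, LS2, SS2)
    return R2 < T                                   # True: same leaf; False: new leaf

def is_anomaly(x, C, R):
    # Section IV-C rule: farther than twice the radius of the closest cluster
    return np.linalg.norm(x - C) > 2 * R

N, LS, SS = 4, np.array([4.0, 6.0]), 14.0           # toy CF of four 2-d points
C, R = centroid_and_radius(N, LS, SS)
print(C, round(R, 3), absorbs(np.array([1.2, 1.4]), N, LS, SS, T=1.5))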
C. ODC OUTLIER HANDLING
We can set aside a fixed amount of disk/memory space for handling anomalies. Anomalies are leaf nodes of low density that are judged to be irrelevant with respect to the overall clustering pattern. When we rebuild the CF tree by reinserting the old leaf nodes, the size of the new CF tree is reduced in two ways [13]. First, we increase the threshold value T, thereby allowing each leaf node to absorb more points. Second, we treat some leaf nodes as potential anomalies and write them out to disk. An old leaf node is viewed as a potential anomaly if it has far fewer data points than average. An increase in the T value, or a change in the distribution due to the new data, could well imply that a potential anomaly no longer qualifies as an anomalous data point. A data point whose Euclidean distance to the closest seed is larger than twice the radius of that cluster is treated as an anomaly [13]. As a result, the potential anomalies are examined to check whether they can be re-absorbed into the tree without making the tree grow in size.

In Algorithm 1, steps 1 to 4 explain how the compressed form of the input dataset is produced with the help of the optimized deep AutoEncoder. Furthermore, steps 5 to 22 describe the handling process of the BIRCH [13] clustering algorithm. As a result, ODC handles outliers in the network traffic data better than the existing deep clustering combinations. The evaluation of the resultant clusters of ODC is discussed in Section V.

V. EXPERIMENTAL EVALUATION
The enhanced deep AutoEncoder is implemented in Python using Keras [42]. Experiments on our datasets are conducted on a regular laptop with an Intel Core i7 processor. To evaluate our algorithm ODC, we use the CoAP off-path dataset [5] to find the anomalies in IoT network traffic, and the standard, publicly available MNIST [43] image dataset to compare the accuracy of ODC's results with other existing works. We use the testbed from [5] to get more instances of IoT traffic with a CoAP off-path attack and feed the proposed algorithm with 10,000 unlabeled instances of IoT-CoAP traffic. We are ready to provide the CoAP off-path dataset to anyone who wants to redo the experiment for their research. To the best of our knowledge, our work is the first to combine a deep AutoEncoder with the BIRCH clustering algorithm for anomaly detection in IoT network traffic datasets. The MNIST dataset has 70,000 digits of 28 × 28 pixels. We use the code publicly released by the respective DEC and IDEC authors to apply the corresponding algorithms to our dataset.

The encoder of our ODC contains two hidden layers and an input layer for both datasets, MNIST and CoAP off-path, as in Figure 1. The decoder part contains two hidden layers and an output layer for both datasets. The dimension of the encoder is set as input data dimension (d) - 1626 - 756 - 50. The decoder dimension is set as the reverse of the encoder, i.e., 50 - 756 - 1626 - output dimension (d). The graphs in Figure 2 and Figure 3 show that the fall in the reconstruction error rate depends on the choice of activation and optimization functions. Though the ReLU + Adamax combination and the ELU + dropout combination with the baseline deep AutoEncoder have a similar reconstruction error rate, the latter combination (ELU, dropout) produces a consistently low reconstruction error rate across different iterations and different datasets.
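The pipeline of Sections IV-A and IV-B can be sketched end to end with scikit-learn's Birch class as a stand-in BIRCH implementation (the paper does not state which BIRCH implementation it uses). The blob data below stands in for the 50-dimensional latent view produced by the trained encoder, with a cluster spread chosen to be compatible with the threshold; B = 15 and T = 1.5 follow the values chosen later in this section.

import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# stand-in for the latent view h2 of 10,000 traffic instances
Z, _ = make_blobs(n_samples=10_000, n_features=50, centers=5,
                  cluster_std=0.1, random_state=0)

odc = Birch(branching_factor=15, threshold=1.5, n_clusters=None)
labels = odc.fit_predict(Z)                 # n_clusters=None: nothing preset, as ODC requires
print(len(np.unique(labels)))               # clusters discovered without supervision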


FIGURE 1. Enhanced deep clustering (enhanced deep AutoEncoder + BIRCH clustering).

FIGURE 2. Comparing the training reconstruction error (Train RE) of deep AutoEncoder with various combinations of activation and optimization functions.

Algorithm 1 Optimized Deep Clustering With BIRCH for Outlier Handling
Input: Input data: X; Epochs: E; Batch size: I; Branching factor: B; Threshold: T
Output: Labels
Data: Training/testing set x
Parameters: Optimized deep AutoEncoder weights W, cluster radius R, and cluster centers
1  Let t = 0
2  while iter < E do
3    if iter mod I == 0 then
4      Compute latent points h_i = f_θ(x_i) = σ(W x_i + b) by applying (9), (11), and (13)
5    Start CF tree t1 as in Section IV-B with initial T
6    Continue scanning the data and insert into t1
7    if out of memory then
8      Increase T
9      Rebuild CF tree t2 of new T from CF tree t1
10     if a leaf data point of t1 is an outlier and disk space is available then
11       Write that data point out as an outlier
12     else
13       Use the data point to rebuild t2
14     if t1 <= t2 then
15       if the disk has space then
16         Go to step 5 and repeat the process for the rest of the data points
17       else
18         Re-absorb potential outliers into t1
19         Go to step 5 and repeat the process for the rest of the data points
20     else
21       Re-absorb potential outliers into t1
22       Go to step 5 and repeat the process for the rest of the data points
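A deliberately simplified, runnable rendering of the rebuild loop in steps 5 to 22 of Algorithm 1 is sketched below: when the tree grows past a memory budget, T is raised, the tree is rebuilt, and sparse leaves are set aside as potential outliers. A flat list of leaf CFs stands in for the height-balanced tree of a real BIRCH implementation, and the budget and sparsity cutoff are assumptions.

import numpy as np

def build_leaves(points, T):
    # greedy CF-leaf construction: absorb a point only if the radius stays below T
    leaves = []                                    # each leaf holds its CF triple (N, LS, SS)
    for p in points:
        placed = False
        for leaf in leaves:
            N, LS, SS = leaf["N"] + 1, leaf["LS"] + p, leaf["SS"] + p @ p
            C = LS / N
            if np.sqrt(max(SS / N - C @ C, 0.0)) < T:   # step 6: insert into this leaf
                leaf.update(N=N, LS=LS, SS=SS)
                placed = True
                break
        if not placed:
            leaves.append({"N": 1, "LS": p.copy(), "SS": p @ p})
    return leaves

rng = np.random.default_rng(0)
points = rng.normal(size=(500, 2))
T, memory_budget = 0.5, 50                         # budget = max leaves held in memory (assumed)

leaves = build_leaves(points, T)
while len(leaves) > memory_budget:                 # steps 7-9: out of memory -> raise T, rebuild
    T *= 1.5
    leaves = build_leaves(points, T)

mean_count = np.mean([leaf["N"] for leaf in leaves])
potential_outliers = [leaf for leaf in leaves if leaf["N"] < 0.1 * mean_count]  # steps 10-11
print(len(leaves), len(potential_outliers), round(T, 3))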

FIGURE 3. Comparing the testing reconstruction error (Test RE) of deep AutoEncoder with various combinations of activation and optimization functions.

At the time of training the model, the decoder is used to reduce the reconstruction error rate. Once the model is optimized with a low reconstruction error, we merge the BIRCH clustering technique with the encoder's latent representation.

The clustering accuracy (ACC) depends on the branching factor (B) and the threshold value (T). When training the clustering algorithm, we choose the values of B and T through several iterations. We start by setting the value of B to 15 and T to 1.5 to get good clustering accuracy and NMI. The Branching factor value and the Threshold value influence the ACC and NMI of the CoAP off-path dataset. It is noted that when the threshold value and the branching factor value decrease, we get good ACC and NMI values, as shown in the graphs of Figure 8 and Figure 9. Hence, the B and T values directly influence the ACC and NMI values of a dataset.

Table 2 shows that our proposed algorithm ODC has higher clustering accuracy than the state-of-the-art deep clustering methods. The method mentioned in Table 2 as AE (AutoEncoder) with the k-means algorithm performs the k-means clustering algorithm on the latent representation of the trained AutoEncoder. We use the same AutoEncoder parameters as ours (ODC) to evaluate the AE + K-means deep clustering method.
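The B/T selection described above can be rendered as a simple grid search scored with NMI; the grids and the synthetic stand-in data are assumptions for illustration, with scikit-learn's Birch again standing in for the BIRCH implementation.

from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score

Z, y_true = make_blobs(n_samples=2_000, n_features=50, centers=5,
                       cluster_std=0.1, random_state=0)

best = None
for B in (15, 25, 50):
    for T in (0.5, 1.0, 1.5):
        labels = Birch(branching_factor=B, threshold=T, n_clusters=None).fit_predict(Z)
        score = normalized_mutual_info_score(y_true, labels)
        if best is None or score > best[0]:
            best = (score, B, T)
print(best)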


TABLE 2. Comparing the accuracy of various deep clustering techniques.

FIGURE 4. Accuracy of CoAP dataset.

FIGURE 5. NMI of CoAP dataset.

FIGURE 6. Accuracy of MNIST dataset.

FIGURE 7. NMI of MNIST dataset.

FIGURE 8. ACC and NMI of CoAP off-path dataset based on branching factor of BIRCH.

FIGURE 9. ACC and NMI of CoAP off-path dataset based on threshold value of BIRCH.

We utilize two standard unsupervised evaluation metrics for evaluation and comparison with the benchmark methods: clustering Accuracy (ACC) and Normalized Mutual Information (NMI). ACC is defined as

ACC = max_m ( Σ_{i=1}^{n} 1{l_i == m(c_i)} ) / n   (14)

and NMI is defined as

NMI = I(l; c) / max{H(l), H(c)}   (15)

where 1{·} is the indicator function, l_i is the ground-truth label, c_i is the cluster assignment of the i-th sample predicted by the algorithm, and m ranges over all possible one-to-one mappings between predicted clusters and labels; l = {l_i}_{i=1}^{n} and c = {c_i}_{i=1}^{n}, and n is the number of samples. I(l; c) denotes the mutual information between l and c, and H(·) denotes their entropy. Both ACC and NMI are in [0, 1], and higher scores imply more accurate clustering results.
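Both metrics are straightforward to compute. In the sketch below, the optimal one-to-one mapping m of equation (14) is found with the Hungarian algorithm (SciPy's linear_sum_assignment), a standard way of evaluating clustering accuracy, and NMI comes from scikit-learn; the toy labels are illustrative.

import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    # eq. (14): best one-to-one mapping between predicted clusters and labels
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                           # co-occurrence matrix
    rows, cols = linear_sum_assignment(-count)     # maximize correctly matched samples
    return count[rows, cols].sum() / y_true.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]                        # the same partition under permuted names
print(clustering_accuracy(y_true, y_pred))         # 1.0
print(normalized_mutual_info_score(y_true, y_pred))  # eq. (15): 1.0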


The graphs in Figure 4, Figure 5, Figure 6, and Figure 7 show how well our proposed ODC algorithm outperforms the state-of-the-art deep clustering algorithms. For the MNIST dataset, ODC achieves 0.975 ACC and 0.955 NMI. For the CoAP off-path dataset, ODC accomplishes 0.983 ACC and 0.957 NMI.

VI. CONCLUSION
We proposed an intelligent anomaly detection algorithm, ODC, for extensive network traffic analysis. IoT environments produce a massive amount of data, and we need a mechanism/model to detect anomalies within these vast datasets. ODC optimizes the deep AutoEncoder to train the encoder. The latent version of the network traffic instances is fed into the BIRCH clustering algorithm for anomaly detection without human intervention. We demonstrated that ODC intelligently detects anomalies in vast datasets. We analyzed the influence of the B and T values on the ACC and NMI values of an input dataset. The performance of the ODC deep clustering algorithm is evaluated through our implementation, and the results presented in Table 2 clearly show that our proposed scheme exhibits better performance in comparison with existing schemes.

Future directions of our work would be experimenting with ODC metrics other than the Euclidean distance for anomaly detection. Our ODC anomaly detection method can be upgraded further by automating the whole anomaly detection model. This involves generating an alert message and sending it to the system or network administrator without delay. Also, the source causing the suspected network traffic can be identified and terminated or suspended from regular network communication in a fraction of a second.

REFERENCES
[1] A. Faigon, K. Narayanaswamy, J. Tambuluri, R. Ithal, S. Malmskog, and A. Kulkarni, "Machine learning based anomaly detection," U.S. Patent 10 270 788, Apr. 23, 2019.
[2] R. Sommer and V. Paxson, "Outside the closed world: On using machine learning for network intrusion detection," in Proc. IEEE Symp. Secur. Privacy, May 2010, pp. 305–316.
[3] S. Zhao, M. Chandrashekar, Y. Lee, and D. Medhi, "Real-time network anomaly detection system using machine learning," in Proc. 11th Int. Conf. Design Reliable Commun. Netw. (DRCN), Mar. 2015, pp. 267–270.
[4] T. Dunning and E. Friedman, Practical Machine Learning: A New Look at Anomaly Detection. Sebastopol, CA, USA: O'Reilly Media, 2014.
[5] A. G. Roselin, P. Nanda, S. Nepal, X. He, and J. Wright, "Exploiting the remote server access support of CoAP protocol," IEEE Internet Things J., vol. 6, no. 6, pp. 9338–9349, Dec. 2019.
[6] S. Bulusu, B. Kailkhura, B. Li, P. K. Varshney, and D. Song, "Anomalous example detection in deep learning: A survey," IEEE Access, vol. 8, pp. 132330–132347, 2020.
[7] R. Wang, K. Nie, T. Wang, Y. Yang, and B. Long, "Deep learning for anomaly detection," in Proc. 13th Int. Conf. Web Search Data Mining, 2020, pp. 894–896.
[8] J. Xie, R. Girshick, and A. Farhadi, "Unsupervised deep embedding for clustering analysis," in Proc. Int. Conf. Mach. Learn., 2016, pp. 478–487.
[9] X. Guo, L. Gao, X. Liu, and J. Yin, "Improved deep embedded clustering with local structure preservation," in Proc. IJCAI, 2017, pp. 1753–1759.
[10] Y. Ren, N. Wang, M. Li, and Z. Xu, "Deep density-based image clustering," 2018, arXiv:1812.04287. [Online]. Available: http://arxiv.org/abs/1812.04287
[11] X. Guo, X. Liu, E. Zhu, and J. Yin, "Deep clustering with convolutional autoencoders," in Proc. Int. Conf. Neural Inf. Process. Cham, Switzerland: Springer, 2017, pp. 373–382.
[12] N. Dilokthanakul, P. A. M. Mediano, M. Garnelo, M. C. H. Lee, H. Salimbeni, K. Arulkumaran, and M. Shanahan, "Deep unsupervised clustering with Gaussian mixture variational autoencoders," 2016, arXiv:1611.02648. [Online]. Available: http://arxiv.org/abs/1611.02648
[13] T. Zhang, R. Ramakrishnan, and M. Livny, "BIRCH: An efficient data clustering method for very large databases," ACM SIGMOD Rec., vol. 25, no. 2, pp. 103–114, Jun. 1996.
[14] W. Wang, D. Yang, F. Chen, Y. Pang, S. Huang, and Y. Ge, "Clustering with orthogonal AutoEncoder," IEEE Access, vol. 7, pp. 62421–62432, 2019.
[15] M. Zamini and G. Montazer, "Credit card fraud detection using autoencoder based clustering," in Proc. 9th Int. Symp. Telecommun. (IST), Dec. 2018, pp. 486–491.
[16] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen, "Deep autoencoding Gaussian mixture model for unsupervised anomaly detection," in Proc. Int. Conf. Learn. Represent., 2018, pp. 1–20.
[17] N. Mrabah, N. M. Khan, R. Ksantini, and Z. Lachiri, "Deep clustering with a dynamic autoencoder: From reconstruction towards centroids construction," 2019, arXiv:1901.07752. [Online]. Available: http://arxiv.org/abs/1901.07752
[18] X. Peng, H. Zhu, J. Feng, C. Shen, H. Zhang, and J. T. Zhou, "Deep clustering with sample-assignment invariance prior," IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 11, pp. 4857–4868, Nov. 2020.
[19] X. Peng, J. Feng, S. Xiao, W.-Y. Yau, J. T. Zhou, and S. Yang, "Structured AutoEncoders for subspace clustering," IEEE Trans. Image Process., vol. 27, no. 10, pp. 5076–5086, Oct. 2018.
[20] P. Harrington, Machine Learning in Action. Shelter Island, NY, USA: Manning Publications, 2012.
[21] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, pp. 1–58, Jul. 2009.
[22] D. Pokrajac, A. Lazarevic, and L. J. Latecki, "Incremental local outlier detection for data streams," in Proc. IEEE Symp. Comput. Intell. Data Mining, Mar. 2007, pp. 504–515.
[23] M. Salehi and L. Rashidi, "A survey on anomaly detection in evolving data: [With application to forest fire risk prediction]," ACM SIGKDD Explor. Newslett., vol. 20, no. 1, pp. 13–23, May 2018.
[24] M. Ahmed, A. N. Mahmood, and M. R. Islam, "A survey of anomaly detection techniques in financial domain," Future Gener. Comput. Syst., vol. 55, pp. 278–288, Feb. 2016.
[25] G. K. Rajbahadur, A. J. Malton, A. Walenstein, and A. E. Hassan, "A survey of anomaly detection for connected vehicle cybersecurity and safety," in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2018, pp. 421–426.
[26] S. Jose, D. Malathi, B. Reddy, and D. Jayaseeli, "A survey on anomaly based host intrusion detection system," J. Phys., Conf. Ser., vol. 1000, no. 1, Apr. 2018, Art. no. 012049.
[27] T. Shon, Y. Kim, C. Lee, and J. Moon, "A machine learning framework for network anomaly detection using SVM and GA," in Proc. 6th Annu. IEEE Syst., Man Cybern. (SMC) Inf. Assurance Workshop, Jun. 2005, pp. 176–183.
[28] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, "A geometric framework for unsupervised anomaly detection," in Applications of Data Mining in Computer Security. Boston, MA, USA: Springer, 2002, pp. 77–101.
[29] J. Zhang and M. Zulkernine, "Anomaly based network intrusion detection with unsupervised outlier detection," in Proc. IEEE Int. Conf. Commun., vol. 5, Jun. 2006, pp. 2388–2393.
[30] R. Chalapathy and S. Chawla, "Deep learning for anomaly detection: A survey," 2019, arXiv:1901.03407. [Online]. Available: http://arxiv.org/abs/1901.03407
[31] F. Farahnakian and J. Heikkonen, "A deep auto-encoder based approach for intrusion detection system," in Proc. 20th Int. Conf. Adv. Commun. Technol. (ICACT), Feb. 2018, pp. 178–183.
[32] C. Aytekin, X. Ni, F. Cricri, and E. Aksu, "Clustering and unsupervised anomaly detection with l2 normalized deep auto-encoder representations," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2018, pp. 1–6.


[33] Y. Wang, Z. Shi, X. Guo, X. Liu, E. Zhu, and J. Yin, "Deep embedding for determining the number of clusters," in Proc. 32nd AAAI Conf. Artif. Intell., 2018, pp. 1–2.
[34] J. Ding, A. Condon, and S. P. Shah, "Interpretable dimensionality reduction of single cell transcriptome data with deep generative models," Nature Commun., vol. 9, no. 1, pp. 1–13, Dec. 2018.
[35] M. Kriegerowski, G. M. Petersen, H. Vasyura-Bathke, and M. Ohrnberger, "A deep convolutional neural network for localization of clustered earthquakes based on multistation full waveforms," Seismol. Res. Lett., vol. 90, no. 2A, pp. 510–516, Mar. 2019.
[36] F. Li, H. Qiao, and B. Zhang, "Discriminatively boosted image clustering with fully convolutional auto-encoders," Pattern Recognit., vol. 83, pp. 161–173, Nov. 2018.
[37] S. M. Mousavi, W. Zhu, W. Ellsworth, and G. Beroza, "Unsupervised clustering of seismic signals using deep convolutional autoencoders," IEEE Geosci. Remote Sens. Lett., vol. 16, no. 11, pp. 1693–1697, Nov. 2019.
[38] Y. Ren, K. Hu, X. Dai, L. Pan, S. C. H. Hoi, and Z. Xu, "Semi-supervised deep embedded clustering," Neurocomputing, vol. 325, pp. 121–130, Jan. 2019.
[39] Y. Gu, X. Lu, L. Yang, B. Zhang, D. Yu, Y. Zhao, L. Gao, L. Wu, and T. Zhou, "Automatic lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy in chest CTs," Comput. Biol. Med., vol. 103, pp. 220–231, Dec. 2018.
[40] D.-A. Clevert, T. Unterthiner, and S. Hochreiter, "Fast and accurate deep network learning by exponential linear units (ELUs)," 2015, arXiv:1511.07289. [Online]. Available: http://arxiv.org/abs/1511.07289
[41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
[42] A. Gulli and S. Pal, Deep Learning With Keras. Birmingham, U.K.: Packt, 2017.
[43] The MNIST Database of Handwritten Digits. Accessed: Feb. 4, 2020. [Online]. Available: http://yann.lecun.com/exdb/mnist/

PRIYADARSI NANDA received the bachelor's degree in computer engineering, the master's degree in computer engineering, and the Ph.D. degree in computing science. He is currently a Senior Lecturer with the University of Technology Sydney (UTS), with extensive experience in research and development of cyber security, IoT security, and wireless sensor network security. His most significant work has been in the areas of intrusion detection and prevention systems using image processing techniques, Sybil attack detection in IoT-based applications, and intelligent firewall design. He has successfully supervised ten research students in the past and is currently supervising eight research students in cyber security research. He has published more than 80 high-quality refereed research articles, including in the IEEE TRANSACTIONS ON COMPUTERS, the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, and Future Generation Computer Systems (FGCS), as well as many ERA Tier A/A* conference papers. In 2017, his work in cyber security research earned him and his team the prestigious Oman Research Council's National Award for best research.

SURYA NEPAL received the B.E. degree from the National Institute of Technology, Surat, India, the M.E. degree from the Asian Institute of Technology, Bangkok, Thailand, and the Ph.D. degree from RMIT University, Australia. He is currently a Principal Research Scientist with the Information Engineering Laboratory, CSIRO Computational Informatics. He has more than 15 years of experience in computer science research, latterly with a specific focus on security, privacy, and trust in distributed systems. He has more than 100 publications to his credit, has edited or coauthored several books, and is the co-inventor of two patents. Much of his work appears in top international forums, such as VLDB, ICDE, ICWS, SCC, CoopIS, ICSOC, the International Journal of Web Services Research, the IEEE TRANSACTIONS ON SERVICES COMPUTING, the IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, ACM Computing Surveys, and ACM Transactions on Internet Technology.

ANNIE GILDA ROSELIN received the B.Tech. (Hons.) degree in information technology from the Noorul Islam College of Engineering, Anna University, Chennai, India, and the M.E. (Hons.) degree in computer science and engineering from the Mepco Schlenk Engineering College, Sivakasi, Anna University, Chennai, India. She is currently pursuing the Ph.D. degree with the University of Technology Sydney. She worked as a Lecturer with the Department of Computer Science and Engineering, Velammall Engineering College, Chennai, from 2008 to 2011. Her research interests include end-to-end IoT security: authentication, vulnerability exploration, and anomaly detection with deep learning. She is the successful recipient of the Industrial Scholarship funded by CSIRO/Data61, Australia.

XIANGJIAN HE (Senior Member, IEEE) is currently a Professor of computer science with the School of Electrical and Data Engineering, University of Technology Sydney (UTS), a Core Member of the Global Big Data Technologies Centre, and an Associate Member of the AAI (Advanced Analytics Institute). As a Chief Investigator, he has received various research grants, including four national research grants awarded by the Australian Research Council (ARC). He is also the Director of the Computer Vision and Pattern Recognition Laboratory, Global Big Data Technologies Centre (GBDTC), UTS. He is a leading researcher in several research areas, including big-learning based human behaviour recognition on a single image; image processing based on hexagonal structure; authorship identification of a document and a document's components, such as sentences and sections; network intrusion detection using computer vision techniques; car license plate recognition of high-speed moving vehicles with changeable and complex backgrounds; and video tracking with motion blur. He has been a member of the IEEE Signal Processing Society Student Committee. He has been awarded the Internationally Registered Technology Specialist title by the International Technology Institute (ITI).
