A Review of Machine Learning and Deep Learning Techniques for Anomaly Detection in IoT Data

Applied Sciences (https://www.mdpi.com/journal/applsci)

Review

Redhwan Al-amri 1,*, Raja Kumar Murugesan 1,*, Mustafa Man 2,*, Alaa Fareed Abdulateef 3, Mohammed A. Al-Sharafi 4,* and Ammar Ahmed Alkahtani 5

1 School of Computer Science and Engineering, Taylor’s University, Subang Jaya 47500, Selangor, Malaysia
2 Faculty of Ocean Engineering Technology & Informatics, Universiti Malaysia Terengganu (UMT), Kuala Nerus 21030, Terengganu, Malaysia
3 School of Computing, Universiti Utara Malaysia, Sintok 06010, Kedah, Malaysia; [email protected]
4 Department of Information Systems, Azman Hashim International Business School, Universiti Teknologi Malaysia, Skudai 81310, Johor, Malaysia
5 Institute of Sustainable Energy (ISE), Universiti Tenaga Nasional (UNITEN), Kajang 43000, Malaysia; [email protected]
* Correspondence: [email protected] (R.A.-a.); [email protected] (R.K.M.); [email protected] (M.M.); alsharafi@ieee.org (M.A.A.-S.)

Abstract: Anomaly detection has gained considerable attention in the past couple of years. Emerging technologies, such as the Internet of Things (IoT), are known to be among the most critical sources of data streams that produce massive amounts of data continuously from numerous applications. Examining these collected data to detect suspicious events can reduce functional threats and avoid unseen issues that cause downtime in the applications. Due to the dynamic nature of the data stream characteristics, many unresolved problems persist. In the existing literature, methods have been designed and developed to evaluate certain anomalous behaviors in IoT data stream sources. However, there is a lack of comprehensive studies that discuss all the aspects of IoT data processing. Thus, this paper attempts to fill this gap by providing a complete image of various state-of-the-art techniques on the major problems and core challenges in IoT data. The nature of data, anomaly types, learning mode, window model, datasets, and evaluation criteria are also presented. Research challenges related to data evolving, feature-evolving, windowing, ensemble approaches, nature of input data, data complexity and noise, parameters selection, data visualizations, heterogeneity of data, accuracy, and large-scale and high-dimensional data are investigated. Finally, the challenges that require substantial research efforts and future directions are summarized.

Keywords: anomaly detection; data stream; deep learning; Internet of Things; machine learning

Citation: Al-amri, R.; Murugesan, R.K.; Man, M.; Abdulateef, A.F.; Al-Sharafi, M.A.; Alkahtani, A.A. A Review of Machine Learning and Deep Learning Techniques for Anomaly Detection in IoT Data. Appl. Sci. 2021, 11, 5320. https://doi.org/10.3390/app11125320

Academic Editor: Gabriella Tognola
Received: 4 May 2021
Accepted: 4 June 2021
Published: 8 June 2021

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The advent of the Internet has revolutionized communication between humans. Similarly, Internet of Things (IoT) devices are reshaping how humans perceive and interact with the physical world. By 2025, IoT systems are expected to cross nearly 75 billion connected devices, tripling the global population [1].

The IoT is a network of heterogeneous objects, such as smartphones, laptops, intelligent devices, and sensors, connected to the Internet through various technologies [2]. The IoT enables various sensors and devices to communicate with each other directly without user interaction [3]. IoT has become one of the biggest data sources in the past few years [4]. Methods such as machine learning algorithms can be used to extract meaningful information from these data.

IoT remains a significant challenge. One technique that effectively analyses the collected data stream is anomaly detection [5,6]. These unexplained phenomena could be outliers, anomalies, cyber-attack, novelties, exceptions, deviations, surprises, or noise [7,8], where outliers are the data points that are considered out of the ordinary. Detection of these points can be done using outlier detection methods [9]. The anomalies are a special kind of outlier that has actionable pieces of information which could be meaningful. To detect these points, anomaly detection methods are used [10]. Similarly, fault detection is used to detect noise, which is unwanted and wrong data that has to be removed [11]. Cyber attacks, on the other hand, are more sophisticated; they can be hidden between the data points and hard to detect [12]. Figure 1 illustrates the difference between the above-mentioned terms.

Figure 1. Outliers in the data stream.

In many cases, IoT real-time applications generate infinitely massive data streams that pose unique limitations and obstacles to machine learning algorithms [13]. These challenges require a careful design of the algorithm to process these data [14]. Most existing data stream algorithms are less efficient and have limited capability requirements [15]. Many studies have investigated the techniques used for anomaly detection, such as [16–19] that address static data and data stream using both statistical and machine learning methods. However, these studies have not focused on evolving data streams.

Some studies have focused on the detection of anomalies in the data stream, such as [16]. However, previous studies have not addressed all the requirements that have to be available in the algorithm to process IoT data streams and the main challenges for choosing an excellent algorithm that suits the IoT data characteristics.

For anomaly detection, many algorithms can be used to detect anomalies in the data stream. A good anomaly detection algorithm should consider the following restrictions related to data streams:
• Data points are pushed out continuously, and the speed of arrival of data depends on the data source. Thus, it could be fast or slow.
• The data stream could be potentially infinite, which means there could be no end to the incoming data.
• Features and/or characteristics of the arriving data points may evolve.
• Data points are potentially one-pass, i.e., the data points can be used only once, and discarded after. Thus, fetching important data characteristics is important.
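To make the stream restrictions listed above concrete, the following is a minimal Python sketch of a detector that respects them. It is an illustration only and not a method taken from this review: the class name `StreamingZScoreDetector`, the helper `detect`, and the parameter values `alpha`, `threshold`, and `warmup` are all assumptions chosen for the example. Each point is seen once and then discarded (one-pass), the state is three numbers regardless of stream length (potentially infinite stream), and exponentially weighted statistics let the notion of "normal" drift over time (evolving data).

```python
import math
from typing import Iterable, Iterator, Tuple


class StreamingZScoreDetector:
    """One-pass detector based on an exponentially weighted mean and variance.

    Hypothetical illustration: alpha, threshold, and warmup are assumed
    values, not settings recommended by the reviewed literature.
    """

    def __init__(self, alpha: float = 0.05, threshold: float = 3.0, warmup: int = 10):
        self.alpha = alpha          # forgetting factor: larger adapts faster
        self.threshold = threshold  # number of standard deviations that counts as anomalous
        self.warmup = warmup        # points to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x: float) -> bool:
        """Score x against the current state, then fold x into the state."""
        self.count += 1
        if self.count == 1:
            self.mean = x  # the first point only initialises the state
            return False

        std = math.sqrt(self.var)
        is_anomaly = (
            self.count > self.warmup
            and std > 0.0
            and abs(x - self.mean) > self.threshold * std
        )

        # Incremental exponentially weighted update; x is not stored afterwards.
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        return is_anomaly


def detect(stream: Iterable[float], detector: StreamingZScoreDetector) -> Iterator[Tuple[int, float]]:
    """Lazily yield (index, value) for every point the detector flags."""
    for i, x in enumerate(stream):
        if detector.update(x):
            yield i, x


if __name__ == "__main__":
    # Synthetic sensor readings with one injected spike at index 30.
    readings = [20.0 + 0.1 * (i % 5) for i in range(50)]
    readings[30] = 35.0
    for index, value in detect(readings, StreamingZScoreDetector()):
        print(f"anomaly at index {index}: {value}")
```

In this sketch a flagged point is still folded into the running statistics, which keeps the detector able to follow a genuine shift in the data at the cost of briefly widening the threshold after a spike. Skipping the update for flagged points would be the opposite trade-off.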
To have good quality anomaly detection, algorithms must have the ability to handle the following challenges before they can be used effectively:
• Ability to handle fast data: the anomaly detection algorithm must be able to handle
