Anomaly Detection in Cloud Computing Environments

Total Page:16

File Type:pdf, Size:1020Kb

Anomaly Detection in Cloud Computing Environments Anomaly Detection in Cloud Computing Environments vorgelegt von M. Sc. Florian Johannes Schmidt an der Fakultät IV - Elektrotechnik und Informatik der Technischen Universität Berlin zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften - Dr.-Ing. - genehmigte Dissertation Promotionsausschuss: Vorsitzender: Prof. Dr. David Bermbach Gutachter: Prof. Dr. Odej Kao Gutachter: Prof. Dr. Johan Tordsson Gutachter: Prof. Dr. Jan Nordholz Tag der wissenschaftlichen Aussprache: 18. Juni 2020 Berlin 2020 Thank you! { Kathrin, Martina, Hartmut, Friederike, Maryse, Einstein, Felix, Fabian, Fiona, Marion, Thomas, Susanne, Heino, Hildegard, Ella, Harry, Helmut, Anton, Marcel, Odej, Alex, Sören, Ilya, Lauritz, Sasho, Jana, Feng, Tobias, Jan, Kevin, Stefan, Steven, Annika, Alexandra, Elisa, Eva, Paul, René, Sven, Vincent, Johannes, Yannick, Tim } i Zusammenfassung Cloud Computing Paradigmen, werden in der modernen Softwareentwicklung bereits von den meisten Unternehmen angewendet. Die Bereitstellung von digitalen Diensten in einer Cloudumgebung bietet sowohl die Möglichkeit der kostenefzienten Nutzung von Ressourcen als auch die Möglichkeit auf Bedarf dynamisch die Anwendungen zu skalieren. Basierend auf dieser Flexibilität werden immer komplexere Softwareanwen- dungen entwickelt, welches zu anspruchsvollen Wartungsarbeiten der Gesamtinfras- truktur führen. Ebenfalls werden immer höhere Ansprüche an die Verfügbarkeit von Softwarediensten gestellt (99,999% im Industriekontext), was durch die Komplexität moderner Systeme nur noch schwieriger und unter großer Mühe gewährleistet werden kann. Aufgrund dieser Trends steigt der Bedarf an intelligenten Anwendungen, die automatisiert Anomalien erkennen und Vorschläge erarbeiten, um Probleme zu erken- nen, zu beheben oder zumindest zu mindern um keinen negativen Einfuss auf die Servicequalität zu kaskadieren. Diese Arbeit beschäftigt sich mit der Erkennung von degradierten abnormalen Systemzuständen in Cloudumgebungen. Hierbei wird sowohl eine holistische Analy- sepipeline und -infrastruktur beschrieben als auch die Anwendbarkeit von verschiede- nen Strategien des maschinellen Lernens diskutiert, um möglichst eine voll automa- tisierte Lösung bereitzustellen. Basierend auf den zugrunde liegenden Annahmen, wird ein neuartiger unsupervised Anomalieerkennungsalgorithmus namens CABIRCH vorgestellt und dessen Anwendbarkeit analysiert und diskutiert. Da die Wahl der Hy- perparameter einen wichtigen Einfuss auf die Genauigkeit des Algorithmus hat, wird zudem ein Hyperparameterauswahlverfahren mit einer neuartigen Fitness-Funktion vorgestellt, welches zur Vollautomatisierung der Anomalieerkennung führen soll. Hi- erbei ist das Verfahren generalisiert anwendbar für eine Vielzahl von unsupervised Anomalieerkennungsalgorithmen, welche basierend auf jüngsten Veröfentlichungen umfassend evaluiert werden. Dabei wird die Anwendbarkeit zur automatisierten Erken- nung von degradierten abnormalen Systemzuständen gezeigt und mögliche Limitierun- gen diskutiert. Die Ergebnisse zeigen, dass eine Erkennung der verschiedenen Anoma- lien gewährleistet werden kann, jedoch mit einer Fehlalarmrate von über 1%. ii Abstract Cloud computing is widely applied by modern software development companies. Pro- viding digital services in a cloud environment ofers both the possibility of cost-efcient usage of computation resources and the ability to dynamically scale applications on demand. Based on this fexibility, more and more complex software applications are being developed leading to increasing maintenance eforts to ensure the reliability of the entire system infrastructure. Furthermore, highly available cloud service require- ments (99.999% as industry standards) are difcult to guarantee due to the complexity of modern systems and can therefore just be ensured by great efort. Due to these trends, there is an increasing demand for intelligent applications that automatically detect anomalies and provide suggestions solving or at least mitigating problems in order not to cascade a negative impact on the service quality. This thesis focuses on the detection of degraded abnormal system states in cloud environments. A holistic analysis pipeline and infrastructure is proposed, and the applicability of diferent machine learning strategies is discussed to provide an auto- mated solution. Based on the underlying assumptions, a novel unsupervised anomaly detection algorithm called CABIRCH is presented and its applicability is analyzed and discussed. Since the choice of hyperparameters has a great infuence on the accuracy of the algorithm, a hyperparameter selection procedure with a novel ftness function is proposed, leading to further automation of the integrated anomaly detection. The method is generalized and applicable for a variety of unsupervised anomaly detection algorithms, which will be evaluated including a comparison to recent publications. The results show the applicability for the automated detection of degraded abnormal system states and possible limitations are discussed. The results show that detection of system anomaly scenarios achieves accurate detection rates but comes with a false alarm rate of more than 1%. iii Contents 1 Introduction 1 1.1 Research Objectives and Main Contributions . 2 1.2 Publications . 3 1.3 Outline of the Thesis . 4 2 Background 6 2.1 Anomaly detection . 6 2.1.1 Types of Anomalies . 7 2.1.2 Failures and Degraded State Anomalies . 7 2.2 Application Domain . 9 2.2.1 IP multimedia subsystem . 9 2.2.2 Video on demand . 10 2.3 Analytic Concepts . 10 2.3.1 Machine Learning Methodologies . 12 2.3.2 BIRCH . 12 2.3.3 Autoencoder . 14 2.3.4 Variational Autoencoder . 15 2.3.5 Long Short Term Memory Networks . 16 2.3.6 Dynamic Threshold Models . 17 2.3.7 Genetic Algorithm . 18 2.4 Evaluation Metrics . 19 3 Related Work 22 3.1 Characteristics of Service Anomalies . 22 3.2 Anomaly Detection . 23 3.3 Concept Adapting Clustering . 28 3.4 Hyperparameter Optimization . 30 4 Framework for AI-based Anomaly Detection 33 4.1 ZerOps Framework . 33 4.2 Categorization of AI-based Anomaly Detection . 35 4.3 Evaluation . 39 4.3.1 Supervised Learning Evaluation . 39 4.3.2 Semi-supervised Evaluation . 42 4.3.3 Summary . 44 5 Concept Adapting BIRCH 46 5.1 Concept Adapting BIRCH . 46 5.1.1 Micro-cluster Aging . 47 5.2 Anomaly Detection using Concept Adapting BIRCH . 53 5.2.1 Identity Function Threshold Model . 54 5.3 Evaluation . 55 iv v 5.3.1 Infuence of Decay Rate Selection . 56 5.3.2 CABIRCH-based Anomaly Detection . 60 6 Cold Start-Aware Identity Function Threshold Models 67 6.1 Integration of Hyperparameter Optimization into IFTM Framework . 68 6.2 Automated Hyperparameter Optimization . 70 6.2.1 Initialization, Crossover, Mutation, Termination . 70 6.2.2 Fitness Function Defnition . 71 6.3 Evaluation . 73 7 Evaluation 81 7.1 Evaluation Setup . 82 7.1.1 Resource Monitoring . 83 7.1.2 Anomaly Injection Framework . 84 7.2 Evaluation Results . 86 7.3 Discussion . 93 7.4 Future Work . 96 8 Conclusion 98 Appendices 119 A Online Arima 120 B Intervals for Identity functions and Threshold models 123 C Detailed Evaluation Results 125 Chapter 1 Introduction Digitalization transforms our world in various areas like Industry 4.0 (product line, manufacturing automation, predictive maintenance, etc.), transportation (self-driving cars, car-2-car communication, intelligent trafc control systems), smart home, medi- cal assisted surgery, and many more. Gartner predicts that by 2020 there exist more than 20 billion connected IoT-devices [1]. Cisco even forecasts 28.5 billion connected devices by 2022 [2]. While the number of devices and sensors increases, the key net- work technologies like 5G and virtualization of cloud computing and fog computing change the business opportunities and enable fexibility for the infrastructure. The improved fexibility and business opportunities come at a high cost, as the system complexity increases signifcantly. It introduces the challenge of not only administer- ing the complex IT-infrastructure but also adds the challenges of maintaining every e.g. remote device, edge cloud, software service, and the heterogeneous networks in between. Managing this complexity surpasses the ability of human experts to oversee the entire system and react with quick responses to meet the promised Quality of Service (QoS) parameters or even Service Level Agreements (SLA). In application scenarios such as softwarization of dedicated hardware solutions to virtualized environments, telecommunication providers hope to beneft from increased fexibility and cost-efectiveness. Still, the given dedicated hardware components pro- vide a reliability of 99.999% [3], which is therefore demanded for the virtualized com- ponents. Due to the increased complexity of the computation model, which includes hardware components and a stack of virtualized components, softwarized components cannot cope with the high demand for reliability. Because of the fragility of such system stacks, the expectations of system administrators increases to maintain the continuous operation of the services [4]. With respect to this and the recent develop- ments of artifcial intelligence (AI), concepts are developed to analyze and automate large portions of operational tasks for administrators. AI-based automation of in- frastructure operation (AIOps) provides the vision of establishing a system able to autonomously operate and remediate large IT environments. The increased demand for qualifed operation maintenance support refects the es- tablishment
Recommended publications
  • A Review on Outlier/Anomaly Detection in Time Series Data
    A review on outlier/anomaly detection in time series data ANE BLÁZQUEZ-GARCÍA and ANGEL CONDE, Ikerlan Technology Research Centre, Basque Research and Technology Alliance (BRTA), Spain USUE MORI, Intelligent Systems Group (ISG), Department of Computer Science and Artificial Intelligence, University of the Basque Country (UPV/EHU), Spain JOSE A. LOZANO, Intelligent Systems Group (ISG), Department of Computer Science and Artificial Intelligence, University of the Basque Country (UPV/EHU), Spain and Basque Center for Applied Mathematics (BCAM), Spain Recent advances in technology have brought major breakthroughs in data collection, enabling a large amount of data to be gathered over time and thus generating time series. Mining this data has become an important task for researchers and practitioners in the past few years, including the detection of outliers or anomalies that may represent errors or events of interest. This review aims to provide a structured and comprehensive state-of-the-art on outlier detection techniques in the context of time series. To this end, a taxonomy is presented based on the main aspects that characterize an outlier detection technique. Additional Key Words and Phrases: Outlier detection, anomaly detection, time series, data mining, taxonomy, software 1 INTRODUCTION Recent advances in technology allow us to collect a large amount of data over time in diverse research areas. Observations that have been recorded in an orderly fashion and which are correlated in time constitute a time series. Time series data mining aims to extract all meaningful knowledge from this data, and several mining tasks (e.g., classification, clustering, forecasting, and outlier detection) have been considered in the literature [Esling and Agon 2012; Fu 2011; Ratanamahatana et al.
    [Show full text]
  • Machine Learning Or Econometrics for Credit Scoring: Let’S Get the Best of Both Worlds Elena Dumitrescu, Sullivan Hué, Christophe Hurlin, Sessi Tokpavi
    Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds Elena Dumitrescu, Sullivan Hué, Christophe Hurlin, Sessi Tokpavi To cite this version: Elena Dumitrescu, Sullivan Hué, Christophe Hurlin, Sessi Tokpavi. Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds. 2020. hal-02507499v2 HAL Id: hal-02507499 https://hal.archives-ouvertes.fr/hal-02507499v2 Preprint submitted on 2 Nov 2020 (v2), last revised 15 Jan 2021 (v3) HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Machine Learning or Econometrics for Credit Scoring: Let's Get the Best of Both Worlds∗ Elena, Dumitrescu,y Sullivan, Hu´e,z Christophe, Hurlin,x Sessi, Tokpavi{ October 18, 2020 Abstract In the context of credit scoring, ensemble methods based on decision trees, such as the random forest method, provide better classification performance than standard logistic regression models. However, logistic regression remains the benchmark in the credit risk industry mainly because the lack of interpretability of ensemble methods is incompatible with the requirements of financial regulators. In this paper, we pro- pose to obtain the best of both worlds by introducing a high-performance and inter- pretable credit scoring method called penalised logistic tree regression (PLTR), which uses information from decision trees to improve the performance of logistic regression.
    [Show full text]
  • Reinforcement Learning in Supervised Problem Domains
    Technische Universität München Fakultät für Informatik Lehrstuhl VI – Echtzeitsysteme und Robotik reinforcement learning in supervised problem domains Thomas F. Rückstieß Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzender: Univ.-Prof. Dr. Daniel Cremers Prüfer der Dissertation 1. Univ.-Prof. Dr. Patrick van der Smagt 2. Univ.-Prof. Dr. Hans Jürgen Schmidhuber Die Dissertation wurde am 30. 06. 2015 bei der Technischen Universität München eingereicht und durch die Fakultät für Informatik am 18. 09. 2015 angenommen. Thomas Rückstieß: Reinforcement Learning in Supervised Problem Domains © 2015 email: [email protected] ABSTRACT Despite continuous advances in computing technology, today’s brute for- ce data processing approaches may not provide the necessary advantage to win the race against the ever-growing amount of data that can be wit- nessed over the last decades. In this thesis, we discuss novel methods and algorithms that are capable of directing attention to relevant details and analysing it in sequence to overcome the processing bottleneck and to keep up with this data explosion. In the first of three parts, a novel exploration technique for Policy Gradi- ent Reinforcement Learning is presented which replaces traditional ad- ditive random exploration with state-dependent exploration, exploring on a higher, more strategic level. We will show how this new exploration method converges faster and finds better global solutions than random exploration can. The second part of this thesis will introduce the concept of “data con- sumption” and discuss means to minimise it in supervised learning tasks by deriving classification as a sequential decision process and ma- king it accessible to Reinforcement Learning methods.
    [Show full text]
  • Image Anomaly Detection Using Normal Data Only by Latent Space Resampling
    applied sciences Article Image Anomaly Detection Using Normal Data Only by Latent Space Resampling Lu Wang 1 , Dongkai Zhang 1 , Jiahao Guo 1 and Yuexing Han 1,2,* 1 School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; [email protected] (L.W.); [email protected] (D.Z.); [email protected] (J.G.) 2 Shanghai Institute for Advanced Communication and Data Science, Shanghai University, 99 Shangda Road, Shanghai 200444, China * Correspondence: [email protected] Received: 29 September 2020; Accepted: 30 November 2020; Published: 3 December 2020 Abstract: Detecting image anomalies automatically in industrial scenarios can improve economic efficiency, but the scarcity of anomalous samples increases the challenge of the task. Recently, autoencoder has been widely used in image anomaly detection without using anomalous images during training. However, it is hard to determine the proper dimensionality of the latent space, and it often leads to unwanted reconstructions of the anomalous parts. To solve this problem, we propose a novel method based on the autoencoder. In this method, the latent space of the autoencoder is estimated using a discrete probability model. With the estimated probability model, the anomalous components in the latent space can be well excluded and undesirable reconstruction of the anomalous parts can be avoided. Specifically, we first adopt VQ-VAE as the reconstruction model to get a discrete latent space of normal samples. Then, PixelSail, a deep autoregressive model, is used to estimate the probability model of the discrete latent space. In the detection stage, the autoregressive model will determine the parts that deviate from the normal distribution in the input latent space.
    [Show full text]
  • Unsupervised Network Anomaly Detection Johan Mazel
    Unsupervised network anomaly detection Johan Mazel To cite this version: Johan Mazel. Unsupervised network anomaly detection. Networking and Internet Architecture [cs.NI]. INSA de Toulouse, 2011. English. tel-00667654 HAL Id: tel-00667654 https://tel.archives-ouvertes.fr/tel-00667654 Submitted on 8 Feb 2012 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. 5)µ4& &OWVFEFMPCUFOUJPOEV %0$503"5%&-6/*7&34*5²%&506-064& %ÏMJWSÏQBS Institut National des Sciences Appliquées de Toulouse (INSA de Toulouse) $PUVUFMMFJOUFSOBUJPOBMFBWFD 1SÏTFOUÏFFUTPVUFOVFQBS Mazel Johan -F lundi 19 décembre 2011 5J tre : Unsupervised network anomaly detection ED MITT : Domaine STIC : Réseaux, Télécoms, Systèmes et Architecture 6OJUÏEFSFDIFSDIF LAAS-CNRS %JSFDUFVS T EFʾÒTF Owezarski Philippe Labit Yann 3BQQPSUFVST Festor Olivier Leduc Guy Fukuda Kensuke "VUSF T NFNCSF T EVKVSZ Chassot Christophe Vaton Sandrine Acknowledgements I would like to first thank my PhD advisors for their help, support and for letting me lead my research toward my own direction. Their inputs and comments along the stages of this thesis have been highly valuable. I want to especially thank them for having been able to dive into my work without drowning, and then, provide me useful remarks.
    [Show full text]
  • Machine Learning and Extremes for Anomaly Detection — Apprentissage Automatique Et Extrêmes Pour La Détection D’Anomalies
    École Doctorale ED130 “Informatique, télécommunications et électronique de Paris” Machine Learning and Extremes for Anomaly Detection — Apprentissage Automatique et Extrêmes pour la Détection d’Anomalies Thèse pour obtenir le grade de docteur délivré par TELECOM PARISTECH Spécialité “Signal et Images” présentée et soutenue publiquement par Nicolas GOIX le 28 Novembre 2016 LTCI, CNRS, Télécom ParisTech, Université Paris-Saclay, 75013, Paris, France Jury : Gérard Biau Professeur, Université Pierre et Marie Curie Examinateur Stéphane Boucheron Professeur, Université Paris Diderot Rapporteur Stéphan Clémençon Professeur, Télécom ParisTech Directeur Stéphane Girard Directeur de Recherche, Inria Grenoble Rhône-Alpes Rapporteur Alexandre Gramfort Maitre de Conférence, Télécom ParisTech Examinateur Anne Sabourin Maitre de Conférence, Télécom ParisTech Co-directeur Jean-Philippe Vert Directeur de Recherche, Mines ParisTech Examinateur List of Contributions Journal Sparse Representation of Multivariate Extremes with Applications to Anomaly Detection. (Un- • der review for Journal of Multivariate Analysis). Authors: Goix, Sabourin, and Clémençon. Conferences On Anomaly Ranking and Excess-Mass Curves. (AISTATS 2015). • Authors: Goix, Sabourin, and Clémençon. Learning the dependence structure of rare events: a non-asymptotic study. (COLT 2015). • Authors: Goix, Sabourin, and Clémençon. Sparse Representation of Multivariate Extremes with Applications to Anomaly Ranking. (AIS- • TATS 2016). Authors: Goix, Sabourin, and Clémençon. How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms? (to be submit- • ted). Authors: Goix and Thomas. One-Class Splitting Criteria for Random Forests with Application to Anomaly Detection. (to be • submitted). Authors: Goix, Brault, Drougard and Chiapino. Workshops Sparse Representation of Multivariate Extremes with Applications to Anomaly Ranking. (NIPS • 2015 Workshop on Nonparametric Methods for Large Scale Representation Learning). Authors: Goix, Sabourin, and Clémençon.
    [Show full text]
  • Machine Learning for Anomaly Detection and Categorization In
    Machine Learning for Anomaly Detection and Categorization in Multi-cloud Environments Tara Salman Deval Bhamare Aiman Erbad Raj Jain Mohammed SamaKa Washington University in St. Louis, Qatar University, Qatar University, Washington University in St. Louis, Qatar University, St. Louis, USA Doha, Qatar Doha, Qatar St. Louis, USA Doha, Qatar [email protected] [email protected] [email protected] [email protected] [email protected] Abstract— Cloud computing has been widely adopted by is getting popular within organizations, clients, and service application service providers (ASPs) and enterprises to reduce providers [2]. Despite this fact, data security is a major concern both capital expenditures (CAPEX) and operational expenditures for the end-users in multi-cloud environments. Protecting such (OPEX). Applications and services previously running on private environments against attacks and intrusions is a major concern data centers are now being migrated to private or public clouds. in both research and industry [3]. Since most of the ASPs and enterprises have globally distributed user bases, their services need to be distributed across multiple Firewalls and other rule-based security approaches have clouds, spread across the globe which can achieve better been used extensively to provide protection against attacks in performance in terms of latency, scalability and load balancing. the data centers and contemporary networks. However, large The shift has eventually led the research community to study distributed multi-cloud environments would require a multi-cloud environments. However, the widespread acceptance significantly large number of complicated rules to be of such environments has been hampered by major security configured, which could be costly, time-consuming and error concerns.
    [Show full text]
  • CSE601 Anomaly Detection
    Anomaly Detection Jing Gao SUNY Buffalo 1 Anomaly Detection • Anomalies – the set of objects are considerably dissimilar from the remainder of the data – occur relatively infrequently – when they do occur, their consequences can be quite dramatic and quite often in a negative sense “Mining needle in a haystack. So much hay and so little time” 2 Definition of Anomalies • Anomaly is a pattern in the data that does not conform to the expected behavior • Also referred to as outliers, exceptions, peculiarities, surprise, etc. • Anomalies translate to significant (often critical) real life entities – Cyber intrusions – Credit card fraud 3 Real World Anomalies • Credit Card Fraud – An abnormally high purchase made on a credit card • Cyber Intrusions – Computer virus spread over Internet 4 Simple Example Y • N1 and N2 are N regions of normal 1 o1 behavior O3 • Points o1 and o2 are anomalies o2 • Points in region O 3 N2 are anomalies X 5 Related problems • Rare Class Mining • Chance discovery • Novelty Detection • Exception Mining • Noise Removal 6 Key Challenges • Defining a representative normal region is challenging • The boundary between normal and outlying behavior is often not precise • The exact notion of an outlier is different for different application domains • Limited availability of labeled data for training/validation • Malicious adversaries • Data might contain noise • Normal behaviour keeps evolving 7 Aspects of Anomaly Detection Problem • Nature of input data • Availability of supervision • Type of anomaly: point, contextual, structural
    [Show full text]
  • Analysis of Earthquake Forecasting in India Using Supervised Machine Learning Classifiers
    sustainability Article Analysis of Earthquake Forecasting in India Using Supervised Machine Learning Classifiers Papiya Debnath 1, Pankaj Chittora 2 , Tulika Chakrabarti 3, Prasun Chakrabarti 2,4, Zbigniew Leonowicz 5 , Michal Jasinski 5,* , Radomir Gono 6 and Elzbieta˙ Jasi ´nska 7 1 Department of Basic Science and Humanities, Techno International New Town Rajarhat, Kolkata 700156, India; [email protected] 2 Department of Computer Science and Engineering, Techno India NJR Institute of Technology, Udaipur 313003, Rajasthan, India; [email protected] (P.C.); [email protected] (P.C.) 3 Department of Basic Science, Sir Padampat Singhania University, Udaipur 313601, Rajasthan, India; [email protected] 4 Data Analytics and Artificial Intelligence Laboratory, Engineering-Technology School, Thu Dau Mot University, Thu Dau Mot City 820000, Vietnam 5 Department of Electrical Engineering Fundamentals, Faculty of Electrical Engineering, Wroclaw University of Science and Technology, 50-370 Wroclaw, Poland; [email protected] 6 Department of Electrical Power Engineering, Faculty of Electrical Engineering and Computer Science, VSB—Technical University of Ostrava, 708 00 Ostrava, Czech Republic; [email protected] 7 Faculty of Law, Administration and Economics, University of Wroclaw, 50-145 Wroclaw, Poland; [email protected] * Correspondence: [email protected]; Tel.: +48-713-202-022 Abstract: Earthquakes are one of the most overwhelming types of natural hazards. As a result, successfully handling the situation they create is crucial. Due to earthquakes, many lives can be lost, alongside devastating impacts to the economy. The ability to forecast earthquakes is one of the biggest issues in geoscience. Machine learning technology can play a vital role in the field of Citation: Debnath, P.; Chittora, P.; geoscience for forecasting earthquakes.
    [Show full text]
  • Anomaly Detection in Data Mining: a Review Jagruti D
    Volume 7, Issue 4, April 2017 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Anomaly Detection in Data Mining: A Review Jagruti D. Parmar Prof. Jalpa T. Patel M.E IT, SVMIT, Bharuch, Gujarat, CSE & IT, SVMIT Bharuch, Gujarat, India India Abstract— Anomaly detection is the new research topic to this new generation researcher in present time. Anomaly detection is a domain i.e., the key for the upcoming data mining. The term ‘data mining’ is referred for methods and algorithms that allow extracting and analyzing data so that find rules and patterns describing the characteristic properties of the information. Techniques of data mining can be applied to any type of data to learn more about hidden structures and connections. In the present world, vast amounts of data are kept and transported from one location to another. The data when transported or kept is informed exposed to attack. Though many techniques or applications are available to secure data, ambiguities exist. As a result to analyze data and to determine different type of attack data mining techniques have occurred to make it less open to attack. Anomaly detection is used the techniques of data mining to detect the surprising or unexpected behaviour hidden within data growing the chances of being intruded or attacked. This paper work focuses on Anomaly Detection in Data mining. The main goal is to detect the anomaly in time series data using machine learning techniques. Keywords— Anomaly Detection; Data Mining; Time Series Data; Machine Learning Techniques.
    [Show full text]
  • Automatic Classification of Oranges Using Image Processing and Data Mining Techniques
    Automatic classification of oranges using image processing and data mining techniques Juan Pablo Mercol, Juliana Gambini, Juan Miguel Santos Departamento de Computaci´on, Facultad de Ciencias Exactas y Naturales, UBA, Buenos Aires, Argentina [email protected],{jgambini,jmsantos}@dc.uba.ar, Abstract Data mining is the discovery of patterns and regularities from large amounts of data using machine learning algorithms. This can be applied to object recognition using image processing techniques. In fruits and vegetables production lines, the quality assurance is done by trained people who inspect the fruits while they move in a conveyor belt, and classify them in several categories based on visual features. In this paper we present an automatic orange’s classification system, which uses visual inspection to extract features from images captured with a digital camera. With these features train several data mining algorithms which should classify the fruits in one of the three pre-established categories. The data mining algorithms used are five different decision trees (J48, Classification and Regression Tree (CART), Best First Tree, Logistic Model Tree (LMT) and Random For- est), three artificial neural networks (Multilayer Perceptron with Backpropagation, Radial Basis Function Network (RBF Network), Sequential Minimal Optimization for Support Vector Machine (SMO)) and a classification rule (1Rule). The obtained results are encouraging because of the good accuracy achieved by the clas- sifiers and the low computational costs. Keywords: Image processing, Data Mining, Neural Networks, Decision trees, Fruit Qual- ity Assurance, Visual inspection, Artificial Vision. 1 Introduction During the last years, there has been an increase in the need to measure the quality of several products, in order to satisfy customers needs in the industry and services levels.
    [Show full text]
  • Anomaly Detection in Raw Audio Using Deep Autoregressive Networks
    ANOMALY DETECTION IN RAW AUDIO USING DEEP AUTOREGRESSIVE NETWORKS Ellen Rushe, Brian Mac Namee Insight Centre for Data Analytics, University College Dublin ABSTRACT difficulty to parallelize backpropagation though time, which Anomaly detection involves the recognition of patterns out- can slow training, especially over very long sequences. This side of what is considered normal, given a certain set of input drawback has given rise to convolutional autoregressive ar- data. This presents a unique set of challenges for machine chitectures [24]. These models are highly parallelizable in learning, particularly if we assume a semi-supervised sce- the training phase, meaning that larger receptive fields can nario in which anomalous patterns are unavailable at training be utilised and computation made more tractable due to ef- time meaning algorithms must rely on non-anomalous data fective resource utilization. In this paper we adapt WaveNet alone. Anomaly detection in time series adds an additional [24], a robust convolutional autoregressive model originally level of complexity given the contextual nature of anomalies. created for raw audio generation, for anomaly detection in For time series modelling, autoregressive deep learning archi- audio. In experiments using multiple datasets we find that we tectures such as WaveNet have proven to be powerful gener- obtain significant performance gains over deep convolutional ative models, specifically in the field of speech synthesis. In autoencoders. this paper, we propose to extend the use of this type of ar- The remainder of this paper proceeds as follows: Sec- chitecture to anomaly detection in raw audio. In experiments tion 2 surveys recent related work on the use of deep neu- using multiple audio datasets we compare the performance of ral networks for anomaly detection; Section 3 describes the this approach to a baseline autoencoder model and show su- WaveNet architecture and how it has been re-purposed for perior performance in almost all cases.
    [Show full text]