<<

Anomaly Detection on Gas Turbine Time-series Data Using Deep LSTM-Autoencoder

Marzieh Farahani

Marzieh Farahani
Autumn 2020
Degree Project in Computational Science and Engineering, 30 credits
Supervisor: Lili Jiang
External Supervisor: Mohamed Elhafiz Hassan
Examiner: Eddie Wadbro
Master of Science Programme in Computational Science and Engineering, 120 credits

Abstract

Anomaly detection, with the aim of identifying outliers, plays a very important role in various applications (e.g., online spam, manufacturing, finance). An automatic and reliable anomaly detection tool with accurate prediction is essential in many domains. This thesis proposes an anomaly detection method that applies deep LSTM (long short-term memory) networks to time-series data. Validated on real-world data at Siemens Industrial Turbomachinery (SIT), the proposed method shows promising performance and can be employed in different data domains, such as device logs of turbine machines, to provide useful information on abnormal behaviors.

In detail, the proposed method applies an autoencoder to perform feature selection by keeping vital features and learning an encoded representation of the time series. This approach reduces the extensive input data by extracting the autoencoder's latent output. For prediction, we then train a deep LSTM model with three hidden layers on the encoder's latent-layer output. Afterwards, given the output of the prediction model, we detect the anomalous sensors of a specific gas turbine using a threshold approach.

Our experimental results show that the proposed method performs well at detecting anomalies on a noisy, real-world dataset. Moreover, they confirm that making predictions based on the reduced, encoded representation is more accurate; applying an autoencoder can therefore improve both the anomaly detection and the prediction tasks. Additionally, the performance of deep neural networks improves significantly for data with high complexity.

Acknowledgements

We are grateful that the master's final project on anomaly detection of a gas turbine using a deep LSTM-Autoencoder was completed within the given time by Marzieh Farahani, a student of the master's programme in Computational Science and Engineering. This thesis could not have been completed without the effort and cooperation of Siemens and Umeå University. I also thank both supervisors at Siemens and Umeå University, Mr. Mohamed Elhafiz Hassan and Dr. Lili Jiang, for their guidance and encouragement in finishing the final project. Last but not least, I would like to thank the Siemens data scientists, my family, and Mr. Mehrdad Farahani for being a constant source of inspiration and guidance.

Contents

1 Introduction
  1.1 Objectives
  1.2 Scope and Limitation
  1.3 Literature Review
    1.3.1 Statistical-based methods
    1.3.2 Prediction-based methods
    1.3.3 Reconstruction-based methods
  1.4 Thesis Structure

2 Principles and Concepts
  2.1 Time Series and Anomaly Forecasting
    2.1.1 Key Components Associated with an Anomaly Detection Problem
      2.1.1.1 Nature/Type of Anomaly
      2.1.1.2 Type of Time-spaces
  2.2 Time Series and Deep LSTM
  2.3 Dimensionality Reduction (Autoencoder)

3 Methodology
  3.1 Dataset
  3.2 Model Design
  3.3 Prediction Model
    3.3.1 Reconstruction Autoencoder
      3.3.1.1 Reduction Using AE
    3.3.2 Deep LSTM
  3.4 Detection Model
    3.4.1 Anomaly Scoring and Selection of Candidate Set

4 Experimental Study and Results Analysis
  4.1 Prediction Model
    4.1.1 Reconstruction Autoencoder
    4.1.2 Deep LSTM
  4.2 Detection Model

5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work

References

1 Introduction

Anomaly detection (aka outlier detection) is the process of identifying unexpected items, observations, or events in data sets that differ from the norm. As an integral part of most companies and businesses, anomaly detection significantly reduces financial and technical losses. Time-ordered data in various platforms (industrial, health, economic, and financial) has been increasing exponentially along with the emerging Internet of Things (IoT) [18], which enables devices to collect and share data. This growth creates new business opportunities as well as new challenges for detecting outliers in time-series data.

However, many companies still rely on manual monitoring to identify anomalies on different underlying bases, which requires substantial human effort to monitor daily or weekly reports on operations or performance. It is therefore challenging for companies to track all metrics simultaneously and find correlations between them. Besides these difficulties, time series data in companies is noisy and large-scale, and, especially, labels or classes for the anomalous data are lacking. Many researchers are therefore trying to apply data-driven methods. These methods for anomaly detection can be mainly categorized into three types: statistical modeling, such as k-means clustering; temporal feature modeling, which is mainly based on Long Short-term Memory (LSTM); and spatial feature modeling, which takes advantage of Convolutional Neural Networks (CNN) [14]. The primary purpose of these methods is to develop stable algorithms that adapt to system conditions and detect outliers even in different environments. Deep learning methods have become successful due to their capacity to handle non-linearity in complex temporal correlations [8].

Deep learning (DL) is derived from classical machine learning (ML), yet deep learning is responsible for the growth of artificial intelligence usage by improving existing algorithms. It has shown high-grade performance because of its power to deal with unstructured and unlabeled data, and because no domain knowledge is needed to extract features. Nevertheless, it is fair to say that deep learning approaches have limitations, such as extra training time and the need for lots of training data. Moreover, one of the most cited drawbacks of deep learning is that the neural networks at its core are black boxes.

The project aims to provide and apply a deep LSTM method together with an autoencoder, in an unsupervised way, to detect time series anomalies. Apart from working with an unlabeled dataset, there is no need to conduct manual feature engineering, which is a complicated task. Instead, many deep neural network parameters are trained to learn the input data's critical features during the training stage. In addition, the autoencoder helps deal with large-scale, high-dimensional input data that is ordered in time.

1.1 Objectives

Siemens Industrial Turbomachinery (SIT) is one of the biggest international companies in power generation. The company has invested in diverse projects to examine and study machine lifetime and its corresponding components in order to identify how and when various failures influenced the system. In recent years, the digitalization transformation has benefited from collecting and maintaining data in database formats that carry various valuable information about unexpected events, component repairs, and operation outage history. The controlling system, which includes a computing device, gives us information about the hardware components' thermodynamic and operating parameters using sensors placed along the turbine sections. This thesis's principal goal is to develop an advanced maintenance strategy that can help the power plant operators increase their assets' availability and reliability and minimize their CAPEX and OPEX. Capital Expenditures (CAPEX) are significant purchases a company makes on goods or services to develop its future performance. Furthermore, Operating Expenses (OPEX) are the typical costs, such as salaries and rent, that a company incurs to run its day-to-day operations [36]. To reach this goal, we have stated a general Deep Learning (DL) model with two aims:

• Daily forecasting, in order to predict each sensor's value five minutes ahead for a specific gas turbine machine.

• Detection, in order to find the list of sensors with anomalous behavior that give the system a hard time. Together with the daily forecasting model, it reduces system cost.

The model is expected to satisfy the following:

• The model must be generalizable. In this study, the model was developed for a specific case study; for a different case study, results should be obtainable with minimal changes.

• The model should be accurate and validated under different environmental conditions.

The prediction/estimation approach detects abnormal behavior (in the shape of anomalies) in the collected data by comparing it with the desired network outputs. It helps automate the decision-making process for the power plant operators and classify useful patterns during operations. Detecting anomalies can give pre-warnings and reduce system costs for the manufactories. The anomaly detection work in this thesis especially provides useful information to the relevant department within Siemens Industrial Turbomachinery (SIT).

1.2 Scope and Limitation

The thesis scope was decided after careful analysis of the customer service dataset. It is required to declare that a turbomachine, such as a gas turbine, generally includes several sections, and each section includes numerous hardware components. Over time, multiple elements within the gas turbine, including thermal cycling, vibration, and pressure pulses, were measured by sensors (aka signals). Fifteen gas turbine units were considered according to customers' commonly requested units with the frequent turbine-model package. Finally, the project began with one final gas turbine unit; the rest were left out of the project because they were in the commissioning phase and the quality and quantity of their signal values were not good enough. Another limitation is related to the records of signal values for the specific gas turbine unit at several time intervals. The records were collected from 2012 until the ongoing year. Nevertheless, some signals did not have any records for some months between 2012 and 2020. As the project's complexity was high enough, the author of this thesis studied the quality and quantity of the signals for each year. Finally, it has to be pointed out that the primary dataset considered during the thesis is from the year 2013.

1.3 Literature Review

There has been a considerable amount of research in the field of anomaly detection. The most manageable and common way to do time series anomaly detection is to set thresholds and generate warnings whenever the metric goes above or below the threshold. However, finding the threshold for each metric requires a deep understanding of the indicator's behavior, and it is a difficult task to capture the desired output from the complex structures in the data. To overcome this difficulty, more advanced techniques, namely statistical-based, prediction-based, and reconstruction-based methods, are mainly applied.

1.3.1 Statistical-based methods

Statistical-based methods [35] can be classified into supervised and unsupervised approaches. Both supervised and unsupervised techniques aim to isolate anomalies within the time series. In the supervised method, observations are labeled as healthy or faulty based on previous historical data; this dataset is then used to create classification models that can predict unseen records. Support Vector Machine (SVM) and k-nearest neighbor (KNN) are representative algorithms of this category. These algorithms rely on a distance measure between objects: objects that are distant from the others are considered anomalies. This kind of detection is also called distance-based [30]. Both KNN and SVM are classical machine learning methods and are generally used for classification [15] [17]. Nevertheless, standard SVM and KNN may fall short when dealing with anomaly detection, so researchers have examined how these methods could be adapted to anomaly detection problems.

A vital factor for composing an anomaly-based detection model is to select significant features for making decisions. In recent research, the KNN (k-nearest-neighbor) approach, in combination with the mother algorithm, showed excellent and successful feature selection and weighting performance [34]. The procedure is done simply by weighting all initial features in the training stage based on the distance measures, and the top ones are selected to complete the testing stage. The KNN algorithm performs the identification of the nearest neighbors. In most cases, KNN is used as a classifier technique; in this study [35], KNN is presented as a semi-supervised approach to determine the indicator's performance in the health area. This paper [26] also showed how mapping the data into the kernel space and separating it from the origin with maximum margin could address the weaknesses of the standard SVM on anomaly detection problems. This method is called a one-class support vector machine (OCSVM).

The application of these techniques is restricted by the availability of training data of anomalies. Several researchers use density-based methods such as the Local Outlier Factor (LOF) and k-means clustering [37] to handle the limitation of distance-based methods. Still, these techniques' success depends on the similarities between the clusters and the anomalies' characteristics. According to their mutual similarity, observations are grouped into different clusters; for example, standard data may come from large and dense clusters, while anomalies may arise from small and sparse clusters.

In summary, the statistical-based methods cover two families, namely distance-based and density-based. They face two major obstacles with time-series data: they require previous knowledge about the anomaly duration, and they cannot capture temporal correlations.

1.3.2 Prediction-based methods

It is essential for all methods to try to highlight the difference between standard and faulty behaviors. Prediction-based methods learn a predictive model for the given time series data to predict future values. A data point is flagged as an anomaly if the difference between the predicted and original value exceeds a certain threshold. Several traditional prediction models employ the relationships between the time series and its lag features to predict future values, such as Auto-Regressive (AR), Moving Average (MA), Autoregressive Integrated Moving Average (ARIMA), and Seasonal Autoregressive Integrated Moving Average (SARIMA). There are many papers focusing on the above techniques; however, in most cases, these time series prediction methods were not applied to anomaly detection [4] [31]. Still, there exists work that extends traditional time series models so that they can detect anomalies [38] [23].

These techniques have some significant limitations. For instance, this study [24] discussed trend and seasonal time series forecasting methods and their importance for making critical decisions. The research shows that a traditional forecasting model, such as ARIMA, has difficulty modeling nonlinear relationships between variables. Moreover, the ARIMA model assumes a constant standard deviation in its errors, which may not hold for different problems [32].

Deep learning-based approaches attempt to overcome these challenges. LSTM (Long Short-Term Memory) is a particular form of recurrent neural network (RNN) that was initially proposed to solve the vanishing gradient problem in RNNs by replacing their simple internal loop with a different formation that makes LSTMs capable of tracking variables in a sequence and learning the long dependencies between them. In numerous studies, the LSTM, alone or combined with different approaches, can effectively detect anomalies [16] [20].

1.3.3 Reconstruction-based methods

Reconstruction-based models learn by encoding their input data to a lower-dimensional representation in the latent structure and decoding it back to the original input. According to this research [9], "Reconstruction-based methods assume that anomalies lose information when they are mapped to a lower dimension space, thereby cannot be effectively reconstructed; thus, high reconstruction errors suggest high chances of being anomalies." There are several dimensionality-reduction techniques, such as Principal Component Analysis (PCA) and the autoencoder. Of these two methods, the autoencoder has received more attention because it can better handle PCA's limitations; the most visible limitations are that PCA is restricted to linear reconstruction and requires positively correlated data that follows a Gaussian distribution.

Lately, the use of the autoencoder method for anomaly detection has grown. For instance, in this paper [28], a Variational Auto-encoder (VAE) with attention could provide a structured and expressive representation to detect anomalous behavior in time series. Furthermore, another paper [19] uses a Recurrent Neural Network (RNN) to generate multiple autoencoders with different neural network connection structures; as a result, the framework outperforms others on time series outlier detection problems. It is also worth mentioning that the Encoder-Decoder using Long Short-term Memory (LSTM) showed excellent performance on multi-sensor anomaly detection [25].

1.4 Thesis Structure

Chapter 2 presents the theoretical background of this project. It covers a summary of time series and deep learning forecasting methods, followed by the theory of the dimensionality reduction techniques (autoencoders) used in this work. Chapter 3 focuses on the methods used in this project; it starts with the dataset and data preparation, and then the scheme of the prediction and detection models is defined. Chapter 4 shows and discusses the results of the prediction and detection models based on the chosen case study. Chapter 5 concludes with an outline of the outputs and the future work along this project's path.

2 Principles and Concepts

This chapter defines some principles and concepts related to the knowledge needed in the context of this study. The first section demonstrates time series and their properties and reviews the necessary tools for studying time series anomaly detection, which provides the organization with useful information for making significant decisions. The second section introduces the definitions of Deep Learning (DL) and Long Short-term Memory (LSTM) algorithms. The third part discusses the autoencoder algorithm and its applicability to the anomaly detection problem.

2.1 Time Series and Anomaly Forecasting

A time series [30] is a collection of random observations S = {X_t, t ∈ T} made sequentially through time T. In time-series data, we have only one realization and a finite number of variable records. If only one variable is changing over time, the time series is specified as a univariate time series (UTS); otherwise, the set S is defined as a multivariate time series (MTS). In figure 1, one sensor variable of the time-series dataset has been chosen over a determined time duration.

Figure 1: active load sensor's behavior through time

Figure 2: active load sensor’s seasonal decomposition

The time interval for data collection could be, for example, seconds, minutes, hours, days, weeks, months, or years. Time-series data arise naturally in various disciplines, namely finance, economics, environmental science, electrical engineering, and computer science [11]. A stationary time series is said to have a constant long-term mean and variance independent of time. Detection of stationarity or non-stationarity is done by differencing the data from a shifted version of itself after subtracting the trend and seasonality. As a rule, non-stationary data is unpredictable and cannot be modeled or forecasted; when predicting a time series, the data is expected to be stationary. Forecasting [40] in time series is simply described as a process to predict the changes that happen within the given data and the moves that will happen in the future. The prediction methods can be used on the presented data whenever:

• Firstly, each variable's records must have the time dimension and be arranged in temporal order.
• Secondly, the record values are continuous over a settled period under specific laws.

Temporal features, such as trend, seasonality, and residuals, give us important and useful information for the prediction scheme; they can be obtained by decomposing the series as in equation 2.1. The result of the time series decomposition is shown in figure 2.

X_t = m_t + s_t + Y_t    (2.1)

• m_t, trend: a long-term, non-periodic movement in the mean.

• s_t, seasonal variation: cyclic fluctuations, for example due to calendar or daily variations.

• Y_t, residuals: random and all other unexplained variations.
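
To make the decomposition in equation 2.1 concrete, the following minimal sketch extracts the trend, seasonal, and residual components of a single sensor series with statsmodels. The series name sensor_series and the period of 288 samples (a daily cycle at 5-minute sampling) are illustrative assumptions, not values taken from the thesis.

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

def decompose_sensor(sensor_series: pd.Series, period: int = 288):
    # Additive decomposition X_t = m_t + s_t + Y_t (equation 2.1).
    # period=288 assumes 5-minute samples with a daily cycle; adjust to the data at hand.
    result = seasonal_decompose(sensor_series, model='additive', period=period)
    return result.trend, result.seasonal, result.resid  # m_t, s_t, Y_t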

2.1.1 Key Components Associated with an Anomaly Detection Problem

To study time series anomaly detection and prediction, the first thing is to know what an anomaly is and what types exist; it is essential to agree on what counts as an exception. The second is to understand the different types of time-spaces used in prediction. As discussed before, anomalies are patterns in data that do not fit a well-defined notion of normal behavior. Most of the present anomaly detection techniques solve a particular problem, and solving the problem is influenced by numerous circumstances, such as the type of anomalies [2] and the prediction time-spaces that deal with numerical data.

2.1.1.1 Nature/Type of Anomaly

Point anomalies: a single record of the data deviates strongly from the rest of the data points in the dataset. A common example is credit card fraud detection.

Contextual anomalies: anomalies in the dataset whose detection hugely depends on contextual information. This type of anomaly is common in time-series data.

Collective anomalies: a collection of related data instances, considered with respect to the entire dataset, is regarded as anomalous, rather than any individual value.

2.1.1.2 Type of Time-spaces

Time series prediction on numerical data can be made over three types of time-spaces: short-term, mid-term, and long-term periods [24]. A short-term forecasting period is set as a time frame of fewer than three months, whereas the mid-term focuses on a time frame of three months to one year, and the long-term is considered more than a year. It is fair to state that this categorization of time frames can change based on the problem's circumstances. For example, in traffic time series prediction, it is possible to consider the short-term, mid-term, and long-term periods as seconds, minutes, and hours.

2.2 Time Series and Deep LSTM

The definition of deep learning varies slightly. However, most researchers agree at the core that deep learning is a sub-field of machine learning which can learn from high-dimensional data in a supervised, unsupervised, or hybrid manner [22]. The word "deep" pictures a network of layers stacked on top of each other. Each layer can be seen as a non-linear module that receives the previous layer's output as its input and transforms the input data into meaningful output automatically, which is one reason these models are quite popular. In recent years, deep learning has frequently been applied in various anomaly detection algorithms, as illustrated in figure 3. Deep anomaly detection (DAD) [1] techniques can automatically learn and extract features without manual feature engineering by domain experts. Unsupervised deep anomaly detection is expected to gain more attention because collecting labels for an imbalanced dataset has many difficulties; a dataset is imbalanced if anomalous behavior happens rarely and most of the records are normal.

Figure 3: Performance Comparison of traditional vs Deep Learning algorithms. Picture adopted from [1]

The first group of techniques deals with supervised classification. In these methods, records of the variable are labeled as anomalous or normal based on previous historical data; this dataset is then used to create classification models that can predict the state (normal or anomalous) of unseen records. The second group deals with unsupervised methodologies, which are based on unlabeled states. This approach aims to detect outlier behavior in contrast with the legitimate behavior; for this purpose, the model needs to extract the standard behavior of each state and then identify anomalous activities.

In summary, unsupervised deep learning models are usually used for denoising, compression, or finding correlations. One of these models is the Long Short-term Memory (LSTM). Long Short-term Memory networks (LSTMs) are well-suited to classifying, processing, and making predictions based on data that behaves like a time series. The LSTM was developed to deal with the exploding and vanishing gradient problems encountered when training traditional Recurrent Neural Networks (RNNs) [27]. LSTMs are capable of learning the dependencies between variables over a long period of time.

In general, as demonstrated in figure 4, an LSTM [13] contains a hidden state h_t, a cell state c_t, and LSTM gates (input, output, and forget). The hidden state and cell state are also known as the external and internal state, respectively. The external state is the output of the network; it reflects the LSTM capacity, and the choice of the hidden cell size is on the user's shoulders.

Figure 4: LSTM structure. Picture adopted from [5]

The cell state is one of the significant differences between LSTM and RNN networks, because the internal state can act as a memory cell for the LSTM and keeps information from the past; however, it is not required to appear at the output gate of the LSTM network. The gates of the LSTM [12] provide continuous analogs of writing, reading, and resetting information. It is essential to point out that the final result of each gate goes through the sigmoid function to map the values between zero and one. The forget gate is the first gate in the LSTM network. It is responsible for deciding how much information should be kept by the network: the closer the sigmoid result is to one, the more information from the past is stored by the LSTM unit; similarly, the closer the sigmoid result is to zero, the less information from the past is saved. This result affects the previous cell state c_{t-1}. The input gate is the second gate. It is responsible for choosing the amount of new information added to the previous LSTM knowledge to perform better. This choice is made after applying the sigmoid function to the new input and the past state.

The cell state is updated by multiplying the input gate's result with C̃_t to provide a new vector that is added to the recurrent cell state. The output gate decides on the LSTM output, and it affects the hidden state value as well.

Three different sets of weights (W_xh, W_hh, b) are included in the LSTM gates. The weights are matrices that represent a linear transformation of the input. The calculation of the weights is done automatically based on the input and the desired output shape. The functions of the LSTM unit are shown in detail in the following equations (2.2-2.6).

It is good to know that for an LSTM layer with h units, the number of parameters is 4 * (h_units * h_units + h_units * num_features + h_units * 1).

Forget gate:    f_t = σ(W_xhf x_t + W_hhf h_{t-1} + b_f)    (2.2)

Input gate:    i_t = σ(W_xhi x_t + W_hhi h_{t-1} + b_i)    (2.3)

Information:    C̃_t = tanh(W_xhc x_t + W_hhc h_{t-1} + b_c)    (2.4)

Cell state:    C_t = f_t ⊙ C_{t-1} + i_t ⊙ C̃_t    (2.5)

Hidden state/output:    h_t = o_t ⊙ tanh(C_t)    (2.6)
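
To make the parameter-count rule above concrete, the sketch below builds a single Keras LSTM layer with the functional API and compares Keras' own count with 4 * (h*h + h*num_features + h). The sizes (200 units, 12 timesteps, 10 features) are taken from the model in chapter 3, and TensorFlow 2.x is assumed.

import tensorflow as tf

h_units, timesteps, n_features = 200, 12, 10

inputs = tf.keras.layers.Input(shape=(timesteps, n_features))
outputs = tf.keras.layers.LSTM(h_units)(inputs)
model = tf.keras.Model(inputs, outputs)

# 4 gates, each with recurrent weights (h*h), input weights (h*num_features) and a bias (h)
expected = 4 * (h_units * h_units + h_units * n_features + h_units)
print(model.count_params(), expected)  # both print 168800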

2.3 Dimensionality Reduction (Autoencoder)

There are different techniques to reduce input dimensionality. For instance, Principal Component Analysis (PCA) is used for dimensionality reduction as a linear method, and the autoencoder is applied for dimensionality reduction as a non-linear method. PCA briefly uses statistical techniques to give an unlabeled, high-dimensional dataset a dimensionality reduction. Moreover, the autoencoder benefits from applying both dimensionality reduction and feature engineering; autoencoders are usually helpful for extracting useful features from the input data in an unsupervised way [22]. The autoencoder is a special design of neural network that tries to learn an image of its input. The autoencoder model is formed of two main models which are in charge of an operation called reconstruction: the encoder model encodes its input data into a lower or higher dimension in a hidden layer (aka latent space), and the decoder model tries to decode back the original input in the desired space. There are different types of autoencoders available, like variational, sparse, and denoising autoencoders [21] [10]. Autoencoders have some knowledge beforehand of how their output should look; therefore, they are considered self-supervised models [39]. Figure 5 shows the general structure of the autoencoder. The structure of the autoencoder is usually symmetric, meaning the encoder layer sizes are the same as the decoder layer sizes but in reverse order.

Figure 5: Autoencoder structure: f and g represent the encoder and decoder functions

The following equations 2.7 and 2.8 show the general functions for a basic autoencoder with one layer. In these equations, the functions f(x) and g(x) represent the encoder and decoder models, respectively. In the encoder stage, σ_1 and σ_2 are activation functions, W^(1) and W^(2) are weight matrices, and b^(1) and b^(2) are bias vectors. The entire reconstruction of the input x is determined by g∘f(x).

h = f(x) = σ_1(W^(1) x + b^(1))    (2.7)

x̃ = g(h) = σ_2(W^(2) h + b^(2))    (2.8)

It is essential to consider that, to better perform the input reconstruction, the system needs to minimize the error based on the loss function defined in equation 2.9.

L(x, x̃) = ||x - x̃||² = ||x - σ_2(W^(2) σ_1(W^(1) x + b^(1)) + b^(2))||²    (2.9)
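
A minimal NumPy sketch of equations 2.7-2.9 follows; the sizes and the sigmoid activations are arbitrary choices for illustration, not the configuration used later in the thesis.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 3                                            # arbitrary sizes
x = rng.normal(size=n_in)

W1, b1 = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)   # encoder parameters
W2, b2 = rng.normal(size=(n_in, n_hidden)), np.zeros(n_in)       # decoder parameters

h = sigmoid(W1 @ x + b1)            # equation 2.7: h = f(x)
x_rec = sigmoid(W2 @ h + b2)        # equation 2.8: reconstruction g(h)
loss = np.sum((x - x_rec) ** 2)     # equation 2.9: squared reconstruction error
print(loss)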

In this paper [3], autoencoders are categorized based on the number of layers. There are two main types: shallow and deep. The shallow type, also known as the autoencoder's main structure, contains three layers: input, encoding (one hidden layer), and output. In contrast, a deep autoencoder has more than one hidden layer. Figure 6 shows four types of autoencoders made from these two combinations.

Figure 6: Different types of autoencoder structure. Picture adopted from [3]

3 Methodology

3.1 Dataset

For this project, we obtained data for one specific gas turbine at Siemens Industrial Turbomachinery (SIT), representing the records of 69 sensors in 2013 between January and December. The data is a daily log, at a one-minute interval, of each sensor's KPIs for one year. In figures (7-9), three arbitrary examples out of the 69 time series are shown. Each time series behaves differently through time depending on the sensor's location on the gas turbine. We consider the unexpected changes (increases and drops) in the time series patterns as anomalous behavior, which we intend to predict.

Figure 7: inlet pressure sensor after Standardization: The real value of the sensor could not be shown because of the Siemens company restriction

Figure 8: air temperature sensor after Standardization: The real value of the sensor could not be shown because of the Siemens company restriction

Figure 9: outlet pressure sensor after Standardization: The real value of the sensor could not be shown because of the Siemens company restriction

Due to the restriction on providing sensors with their realistic values, it is essential to mention important properties of the dataset that affect the preprocessing steps. Each of these time series is on a different scale. Furthermore, the original dataset contains two types of quality flags for the sensor values (good-quality and bad-quality); after removing the bad-quality values from the dataset, missing values may appear. Additionally, there are sensors that do not show any signal behavior, their values only lying on two numbers (zero or one), which makes the data noisy. Therefore, data preprocessing helps to make the raw data ready to be fed to the neural network. Data preprocessing is the way to handle missing values, normalization, and vectorization. There are standard preprocessing techniques related to time series, for instance Power Transformation,

Difference Transformation, Standardization, and Normalization. The power transformation is used to transform data into a normal (Gaussian) distribution. The difference transform removes the trend and seasonality structure from the time series. Standardization transforms the data to zero mean and standard deviation one, as shown in equation 3.1. Normalization is a scaling of the data to between zero and one, or minus one and plus one, as noted in equation 3.2; it is also called the Min-Max scaler.

Z_x = (x_i - x̄) / σ    (3.1)

MinMax_x = (x_i - x_min) / (x_max - x_min)    (3.2)
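
The two transformations in equations 3.1 and 3.2 can be written directly in NumPy; the following is only an illustrative sketch on a toy array, while the thesis itself applies scikit-learn's StandardScaler (code block 3.2).

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])                 # toy sensor readings

z = (x - x.mean()) / x.std()                       # equation 3.1: standardization
min_max = (x - x.min()) / (x.max() - x.min())      # equation 3.2: min-max scaling

print(z)
print(min_max)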

The goal is to have a mid-term prediction for each of the 69 sensors based on the historical data and to predict whether there will be abnormal behavior in each sensor's performance. For this aim, the flowchart in figure 10 shows the steps we followed in the preprocessing stage.

Figure 10: data preprocess steps

The first step is the preparation of the time interval. This step allows the user to determine the time interval duration; for instance, the user could adjust the one-minute interval of the raw data to a five-minute interval. The implementation code for this part is shown in block 3.1.

import pandas as pd

def data_preparation(df, dt_col_name, val_col_name, interval='5T'):
    # Parse the timestamp column and resample the sensor value to the requested interval mean
    df[dt_col_name] = pd.to_datetime(df[dt_col_name])
    df = df.groupby(pd.Grouper(key=dt_col_name, freq=interval))[val_col_name].mean()
    df = pd.DataFrame(df)
    df[dt_col_name] = df.index
    df = df.reset_index(drop=True)
    return df

Block 3.1: time interval preparation code block
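
A hypothetical call to the helper above; the dataframe raw_df and the column names 'ts' and 'value' are illustrative and not taken from the original dataset.

# Resample the one-minute raw log of one sensor to five-minute means.
df_5min = data_preparation(raw_df, dt_col_name='ts', val_col_name='value', interval='5T')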

In the whole duration of 2013, some records can be missing because of the operation per- formed in the preprocessing phase and mechanical/electrical failures during the data recov- ery process. These missing values are considered unknown data (aka incompleted feature vector). Different types of approaches exist to deal with missing values. One of the ap- proaches is imputation or estimation of missing values. Imputation could be implemented 17(43)

based on statistical methods such as Mean imputation, Regression Imputation, and Multi- ple Imputation. [7]. In this project, we use Mean imputation to solve the missing sensors’ value, as it shown in block 3.1. This research deals with the time series prediction problem. It is clear to state prediction on a non-time series dataset is more accessible than a time series dataset. The reason behind that is scoring on new records can be performed independently of the other records. How- ever, in the non-time series data, scoring new records depends on recent records’ look-back window. Hence, The following steps are considered to be Normalizing data and generating the timesteps called the " Multi-feature Window Method." At first, we used standard scaling to scale the time series. Next, in the multi-feature window method, we chose a window size of 12 as a number of timesteps, and the prediction is for when timestep is equal to zero. In the case of the multi-feature prediction, each time series is considered one feature itself. Since there are 69 selected time series, it gives us 69 features. We put the first timestep of all 69 time-series at first positions, then the second timestep of all time-series go after them, and so on, as illustrated in the Figure 11.

Figure 11: Multi-feature approach: the train data starts with timestep 12 of all time series, then timestep 11 of all 69 time series, down to timestep 1 of all 69 time series. The targets are the future timesteps of all time series. Each time series (ts) is the data related to one sensor located on the gas turbine.

import pandas as pd
from sklearn import preprocessing

x_cols = list(d_tot_copy.columns[1:])
ts_col = d_tot_copy.columns[0]

# standard scaler: zero mean and unit variance for every sensor column
s_data = d_tot_copy[[ts_col] + x_cols]
scaler = preprocessing.StandardScaler()
scaler_data = scaler.fit_transform(s_data[x_cols].values).tolist()
scaler_data = pd.DataFrame(scaler_data, columns=x_cols)
s_data = pd.DataFrame(pd.concat([scaler_data, s_data[ts_col]], axis=1), columns=s_data.columns)
s_data.head()

Block 3.2: standardization code block
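
The multi-feature window method itself is not shown in the original listings; the following sketch is one possible implementation, assuming scaled is a NumPy array of shape (n_rows, 69) holding the standardized sensors. The helper name and the exact ordering of timesteps inside a window are our assumptions (figure 11 orders them from timestep 12 down to 1).

import numpy as np

def make_windows(scaled: np.ndarray, window: int = 12):
    # scaled: (n_rows, n_features) standardized sensor matrix
    # returns X with shape (n_samples, window * n_features) and targets y with shape (n_samples, n_features)
    X, y = [], []
    for t in range(window, len(scaled)):
        X.append(scaled[t - window:t].reshape(-1))   # flatten the last `window` timesteps of all sensors
        y.append(scaled[t])                          # the next timestep of all sensors is the target
    return np.asarray(X), np.asarray(y)

# X, y = make_windows(scaled)   # X: (n_samples, 828) for 69 sensors and a window of 12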

The last step in the data preprocessing flowchart is splitting the data into train, validation, and test sets. It is vital to consider that a random split is not the right choice for a time series dataset: choosing random rows from the dataset would cause the loss of valuable information, because of the continuous and time-ordered behavior of time-series datasets. Figures (12-14) show a normalized sample of the final data split into training, validation, and test datasets.

Figure 12: standardization data examples. The sensor-9 time series is split into train data (blue), validation data (red), and test data (green)

Figure 13: standardization data examples. The sensor-7 time series is split into train data (blue), validation data (red), and test data (green)

Figure 14: standardization data examples. The sensor-467 time series is split into train data (blue), validation data (red), and test data (green)

3.2 Model Design

This chapter presents the framework of the research methods followed in this study. It provides the prediction and detection model structures in detail, which helped solve the anomaly detection problem associated with Siemens Industrial Turbomachinery (SIT), as described in figure 15.

Figure 15: proposed model in the study

3.3 Prediction Model

The prediction model consists of two parts: a reconstruction autoencoder, and a Deep Long Short-term Memory model.

3.3.1 Reconstruction Autoencoder

This part generates a representation of our input time series. For this purpose, an autoencoder with a reconstruction goal was implemented. It receives 2D input data with shape (None, 828) and outputs a reconstruction of its input. The autoencoder model is composed of two models, an encoder and a decoder, each built as a multilayer network. The encoder model has two hidden dense layers and one latent dense layer. The decoder model has one latent dense layer, two hidden dense layers, and an output layer.

Figure 16: Encoder model structure: consists of the input layer, two hidden dense layers with 512 and 256 hidden units, and a latent layer with 120 units

Figure 16 shows the structure of the encoder model, built as a multilayer perceptron network: each neuron in one layer is connected to all neurons of the next layer. The input, for 69 features (the number of available time series) and 12 timesteps, has the shape (None, 12 * 69) = (None, 828). The first hidden layer has 512 neurons, so its output has the shape (None, 512). The second hidden layer has 256 neurons; hence, this layer's output has the shape (None, 256). The last layer in the encoder model is the latent layer, which has 120 neurons, since our goal is to extract essential features by reducing the total size of the input layer; therefore, the output of this layer is (None, 120). Equation 3.3 gives the computation of the whole encoder structure, where o_n^(3) indicates the output of the third dense layer (aka latent layer). In this equation, x_i is the input to the model, w_uv^(l) denotes the connection between the v:th neuron in layer l-1 and the u:th neuron in layer l, and the bias of the u:th neuron in layer l is represented by b_u^(l). σ_1 and σ_2 are the activation functions in the first and second dense layers, respectively.

" " " # # # 3 3 2 1 1 2 3 On=64 = ∑wnm σ2(∑wk j σ1(∑w jixi + b j ) + bk) + bn (3.3) m j i

A multilayer perceptron network builds the decoder model in the same way as the encoder: each neuron in one layer is connected to all neurons of the next layer. The decoder's input is the result of the encoder stage in the latent layer, which has the shape (None, 120). The first hidden layer has 256 neurons and the second hidden layer has 512 neurons; hence, their outputs have the shapes (None, 256) and (None, 512), respectively. The last layer in the decoder model has 828 neurons, since our goal is to reconstruct the input layer; therefore, the output of this layer is (None, 828). The structure is the same as in figure 16 but in the opposite direction.

We applied the mean squared error (MSE) as the loss function for this model. Equation 3.4 explains how to measure this loss, where y'_i is the reconstructed value, y_i is the input value, and N is the number of features (aka observations). The loss function represents the error of the reconstructed value compared to the expected result. Another vital step is updating the weights in order to improve the reconstruction result of the network. Gradient descent assists us in this step by minimizing the error given by the loss function. The process is done by calculating the gradient: first, we collect the different network parameters that affect the loss function, such as the weight and bias matrices, into θ; then, the gradient of the loss function with respect to these parameters is ∂L(θ)/∂θ. Equation 3.5 represents how to update the θ parameters for each layer using the gradient descent method. In the equation, γ is the learning rate, a parameter that controls how much the parameters are updated.

L = (1/N) Σ_{i=1}^{N} (y_i - y'_i)²    (3.4)

θ = θ - γ ∂L(θ)/∂θ    (3.5)

Meanwhile, the network is extensive and the size of the training data is vast; therefore, an optimizer algorithm benefits the model by speeding up learning and minimizing the loss function. Stochastic Gradient Descent (SGD) is an alternative to gradient descent: instead of computing the loss and updating the parameters on the whole dataset, the SGD algorithm divides the dataset into batches and updates the parameters for each batch's loss calculation. There are other optimizer algorithms, such as RMSprop, AdaGrad, and Adam. The optimizer is a hyperparameter, and tuning it helps to get a better result as well. In this study, we mainly used Adam.

3.3.1.1 Reduction Using AE

The autoencoder is a way to transform the representation of the input. There are two kinds of design for the autoencoder: sparse or compressed. A sparse autoencoder is obtained by keeping the number of hidden layer nodes greater than the number of original input nodes. On the other hand, a compressed autoencoder is obtained by selecting the number of hidden layer nodes to be less than the number of original input nodes. This study focuses on the compressed representation of the input, which achieves the desired dimensionality reduction effect. In this part, we are looking for a non-linear projection method that maps the data from a high-dimensional feature space to a lower-dimensional feature space, because sample data in high-dimensional space generally does not diffuse through the whole space but lies on a low-dimensional manifold embedded in the high-dimensional space. The dimensionality reduction process is done by designing the non-linear autoencoder reconstruction in the first stage, as shown in code block 3.3.

import tensorflow as tf

def build_ae(input_dim, latent_dims, lr, dropout_rate):
    # inputs
    inputs = tf.keras.layers.Input(shape=[input_dim], name='inputs')
    x = inputs

    hidden_dims = latent_dims[:-1]
    latent_dim = latent_dims[-1]

    # encoder: hidden dense layers followed by the latent layer
    for hidden_dim in hidden_dims:
        x = tf.keras.layers.Dense(hidden_dim, activation='linear')(x)
        x = tf.keras.layers.Dropout(rate=dropout_rate)(x)

    x = tf.keras.layers.Dense(latent_dim, activation='linear', name='latent_layer')(x)

    # decoder: mirror the hidden layers in reverse order
    for hidden_dim in hidden_dims[::-1]:
        x = tf.keras.layers.Dense(hidden_dim, activation='linear')(x)
        x = tf.keras.layers.Dropout(rate=dropout_rate)(x)

    outputs = tf.keras.layers.Dense(input_dim, activation='sigmoid')(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    model.compile(optimizer=opt, loss='mse')

    return model

Block 3.3: autoencoder build model code block
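
A hypothetical call to build_ae reproducing the configuration used in this study (828 input features, 512 and 256 hidden units, a 120-dimensional latent layer, dropout 0.1, learning rate 2e-4):

# 828 = 12 timesteps * 69 sensors; latent_dims lists the hidden sizes followed by the latent size.
ae = build_ae(input_dim=828, latent_dims=[512, 256, 120], lr=2e-4, dropout_rate=0.1)
ae.summary()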

The critical point in designing the autoencoder model is to keep the number of hidden layer nodes smaller than the number of original input layer nodes. The final step is selecting the latent layer, which contains the compressed information of the input layer. The scheme of the reduction process is shown in code block 3.4 and figure 17.

Figure 17: Summary of the reduction model

def dr_model(ae, layer_name='latent_layer'):
    # Cut the trained autoencoder at its latent layer to obtain the dimensionality reduction model
    inputs = ae.input
    outputs = ae.get_layer(layer_name).output

    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam', loss='mse')

    return model

Block 3.4: reduction build model code block

3.3.2 Deep LSTM

As the second part of the model, we applied a deep LSTM model with three LSTM layers. It is worth discussing the model structure in more detail. One sample is a sequence of inputs that overlaps with the next sequence; one feature is one observation at a timestep; and the timestep count is the number of times the LSTM is unfolded (aka neurons). Accordingly, the LSTM's input must be a 3-dimensional tensor representing the time sequence order, with the shape (n_samples, timesteps, n_features), as shown in figure 18.

Figure 18: A 3D time series data tensor. Picture adopted from [6]
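
As a small illustration of this shape requirement, the sketch below turns a batch of flat feature vectors into the 3D tensor (n_samples, timesteps, n_features) that an LSTM layer expects; the sizes and the random data are placeholders.

import numpy as np

n_samples, timesteps, n_features = 1000, 12, 10
flat = np.random.rand(n_samples, timesteps * n_features)        # e.g. 120-dimensional encoded windows

lstm_input = flat.reshape(n_samples, timesteps, n_features)     # shape (1000, 12, 10)
print(lstm_input.shape)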

Figure 19 shows that this model’s input is in the shape of (None, 12, 10), where 12 is the

number of time steps, and 10 is the number of features. Therefore, the LSTM has been unfolded 12 times.

Figure 19: An LSTM layer with input shape of (None, 12, 10)

The units of the LSTM model are the number of hidden units, and they define the dimension of the output. The units can be considered the LSTM capacity; the larger the number of units, the more learning capacity the LSTM has. It is relevant to mention that this is one of the parameters that must be tuned to counter overfitting during the training phase. This model's hidden layers were chosen as 200, 100, and 200 units for the first, second, and third layers, respectively. The output layer is a dense layer with 69 hidden units to predict all 69 target (sensor) values for the next 5 minutes based on information from the last 60 minutes.

Depending on the desired LSTM model, LSTMs can have different output approaches; the hidden states can be considered the outputs of an LSTM layer. Each LSTM layer has an option called return-sequence. Return-sequence is set to False by default, which means only the last LSTM hidden state, i.e., the last timestep of the current sequence, is returned as output. By setting return-sequence to True, the output of the LSTM will be all hidden states from all timesteps in the sequence (not only the last one). In this model, we set return-sequence to True for the first and second hidden layers with 200 and 100 units; for the last hidden layer with 200 units, we set return-sequence to False. In addition, if the return-state option of an LSTM layer is set to True, then c_t is returned as output besides h_t. The outputs of the three layers are shown in table 1.

Table 1: Output of each of the three hidden layers

Input shape       hidden-layer 1 and 2 outputs          hidden-layer 3 output
(None, 12, 10)    (None, 12, 200), (None, 12, 100)      (None, 200)

import tensorflow as tf

def lstm_model(n_timestamps, n_features, n_outputs, n_units=None, dropout_rate=0.2, lr=2e-4):

    n_units = n_units if isinstance(n_units, list) else [100, 100]

    # create the input: entries have the shape (None, n_timestamps, n_features)
    inputs = tf.keras.layers.Input(shape=[n_timestamps, n_features], name='inputs')

    x = inputs

    # stacked LSTM layers; all but the last return the full sequence of hidden states
    for units in n_units[:-1]:
        x = tf.keras.layers.LSTM(units, return_sequences=True)(x)
        x = tf.keras.layers.Dropout(rate=dropout_rate)(x)

    x = tf.keras.layers.LSTM(n_units[-1], return_sequences=False)(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x)

    outputs = tf.keras.layers.Dense(n_outputs, activation='linear')(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)

    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    model.compile(optimizer=opt, loss='mse')

    return model

Block 3.5: Deep LSTM build model preparation code block
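
A hypothetical call to lstm_model with the configuration described above (three LSTM layers of 200, 100, and 200 units, 12 timesteps, 10 features, 69 outputs):

deep_lstm = lstm_model(n_timestamps=12, n_features=10, n_outputs=69,
                       n_units=[200, 100, 200], dropout_rate=0.1, lr=2e-4)
deep_lstm.summary()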

Statefulness in LSTMs is related to the use of batches in the training process. The batch size is the number of samples the network sees before updating the weights [33]; in this model, the batch size is 64. To prevent overfitting during the training process, besides the other methods and hyperparameter tuning, one can use dropout regularization. Dropout randomly sets the output of some hidden units of a layer to zero during training; the dropout rate in this study was chosen as 0.1. A simple explanation of statefulness is that a long sequence is divided into smaller pieces, or batches, and the cell state of the last timestep of the i:th sample of the current batch is passed to the i:th sample of the next batch to initialize its value. The math behind an LSTM was described in equations (2.2-2.6) of section 2.2, and we used MSE as the loss function. We also used Adam as the optimizer and chose a linear activation function for the LSTM layers. The implementation code of the deep LSTM and its summary are shown in code block 3.5.

The prediction model is a combination of the autoencoder reduction and the deep Long Short-term Memory models. First, a three-layer multilayer autoencoder is used for automatic feature selection and representation learning (encoded features), as explained before. The purpose of using the autoencoder is to learn the behavior across multiple time series with a variety of patterns, in order to capture the correlation among them and obtain useful features with a fixed dimension. Then, by extracting the reduced representation of the input from the autoencoder model's latent layer, we obtain the reduced form of the input features. It is essential to mention that if there is abnormal behavior in the input, it will be captured by the encoder. The next step is to feed these embedded features to the prediction part of the model. The prediction part is a deep LSTM with three layers whose input is the new, reduced representation created by the autoencoder reduction. This new representation needs to be in 3D shape, as described earlier in this section; to fulfill this need, we used the expand-dimension technique. Figure 20 shows the combination of the two models of sections 4.1.1 and 4.1.2.

Figure 20: A prediction model based on features extracted by an autoencoder model. Picture adopted from [41]
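
The following sketch wires the two parts together in the order described above. It reuses build_ae, dr_model, and lstm_model from the code blocks in this chapter; the arrays X_train, X_valid, and y_train and the training arguments are illustrative assumptions rather than the exact training script of the thesis.

# 1) train the reconstruction autoencoder on the flat (None, 828) windows
ae = build_ae(input_dim=828, latent_dims=[512, 256, 120], lr=2e-4, dropout_rate=0.1)
ae.fit(X_train, X_train, batch_size=64, epochs=100, validation_data=(X_valid, X_valid))

# 2) cut the network at the latent layer and encode the windows (828 -> 120 features)
encoder = dr_model(ae, layer_name='latent_layer')
latent_train = encoder.predict(X_train)

# 3) expand the 120-dimensional encoding into the 3D shape (n_samples, 12, 10) expected by the LSTM
latent_train_3d = latent_train.reshape(-1, 12, 10)

# 4) train the deep LSTM to predict the 69 sensor values five minutes ahead
deep_lstm = lstm_model(n_timestamps=12, n_features=10, n_outputs=69,
                       n_units=[200, 100, 200], dropout_rate=0.1, lr=2e-4)
deep_lstm.fit(latent_train_3d, y_train, batch_size=64, epochs=100)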

3.4 Detection Model

From the start, we have no prior knowledge about normal and abnormal data. Nevertheless, our goal is focused on predicting and detecting anomalous behavior. Section 3.3 thoroughly reported the prediction model and how the autoencoder reduction with the deep LSTM can foretell five minutes ahead for each of the 69 signals based on one hour of history. Figure 21 displays the principal detection model actions; the flowchart in the figure shows the steps we took to detect anomalous behavior. The first step in detecting anomalies is to feed the prediction model's output to the detection model. The next step is measuring the aggregate error. The prediction error is obtained by calculating the formula shown in equation 3.6, and the aggregate error is computed by taking the mean of the prediction errors of all 69 sensors; the aggregate error represents an error that wraps several errors into a single one. The next move in the flowchart is finding candidate anomaly dates and the involved sensors, which can be reached with the following critical steps.

Figure 21: scheme of the Detection Model Steps

Error_i = (y_i - y'_i)  ⇒  Aggregate_Error = ( Σ_{i=1}^{N} √((y_i - y'_i)²) ) / N    (3.6)

3.4.1 Anomaly Scoring and Selection of Candidate Set

To accomplish this step, we must distinguish the observations whose anomaly scores deviate significantly from the others. The scoring technique must be applied to the aggregate errors of equation 3.6. The critical problem is finding the best cut-off threshold, when the boundaries between normal and anomalous behavior are not noticeable, in order to minimize the false positive rate while maximizing the detection rate. According to two assumptions from the paper [29] about anomaly detection on unlabelled data, the anomalies are assumed to form a small portion of the data, which is assumed not to exceed five percent, since we are more interested in finding a fraction of the anomalies with high confidence than in finding all anomalies. However, the dataset might not approximately represent a large portion of standard data; to solve this problem, we consider a suitable portion of the data as normal, for example 80%. To score the aggregate errors, we applied the quantile method to fix the confidence area of anomalies. The implementation of this step can be found in block 3.6.

import numpy as np

# calculate the per-sensor squared error
error = np.square(yy - yy_pred).T

# calculate the aggregate error (mean over all sensors for every test timestep)
agg_error = np.mean(error, axis=0)

# candidate aggregate error: keep only the values above the 99% quantile
ac_agg_error = np.where(agg_error > np.quantile(agg_error, 0.99), agg_error, 0)

Block 3.6: threshold on detection model code block. yy represents the y-test and yy_pred represents the prediction result of the deep LSTM on the test data

A quantile determines how many values in a distribution are above or below a specific limit. Figure 22 shows that 1% of the dataset is considered anomalous and 99% as standard. Next, we need to select candidate sets of dates on which anomaly events occurred. Hence, the first step is obtaining all dates in the test dataset that fall in the anomaly confidence area, which is 1% of the whole dataset, as illustrated in figure 23, part one. Next, based on the dates suspected to have an anomalous state, we need to determine the involved sensors. This is arranged by investigating all 69 sensors in the anomaly confidence area with the quantile technique, as shown in figure 23, part two. In other words, we need to check whether each sensor lies in the top 1% of the dataset or not.
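
A sketch of the two-stage quantile selection of figure 23, continuing from code block 3.6; the array dates (the test-set timestamps) is assumed to exist, and error is the per-sensor squared error matrix of shape (69, n_test) from that block.

import numpy as np

# stage 1: dates whose aggregate error falls in the top 1% become candidate anomaly dates
date_threshold = np.quantile(agg_error, 0.99)
candidate_idx = np.where(agg_error > date_threshold)[0]
candidate_dates = dates[candidate_idx]

# stage 2: on those dates, flag every sensor whose own error exceeds its 99% quantile
sensor_thresholds = np.quantile(error, 0.99, axis=1)            # one threshold per sensor
involved_sensors = {}
for idx in candidate_idx:
    involved_sensors[dates[idx]] = np.where(error[:, idx] > sensor_thresholds)[0]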

Figure 22: Confidence Area of anomaly and Normality area

Figure 23: Quantile Technique to find candidate set of anomaly case

4 Experimental Study and Results Analysis

In this chapter, the results found by this study are presented and discussed. The chapter is arranged into two sections: the first shows the prediction model results obtained by tuning hyperparameters in order to find the best combination; the second discusses the detection model and its limitations.

4.1 Prediction Model

The first model concentrates on the prediction, which is a combination of the autoencoder reduction and the deep LSTM. Each model's results are described individually.

4.1.1 Reconstruction Autoencoder

For the reconstruction autoencoder model with multi-feature experiments, combinations of the values listed in table 2 were tried. Table 3 shows the settings applied to the autoencoder model experiment. The model is trained on the training data and validated on the validation dataset. Table 4 and figure 24 display the tuning results for the top five combinations given the parameters settled in table 3. As table 4 shows, the reconstruction autoencoder's best result with multi-features belongs to the model using the Adam optimizer, 512 and 256 as the numbers of first and second hidden layer units, and 0.1 as the dropout rate. Figure 25 shows the loss functions during training of the autoencoder model.

Table 2: Combinations of hyper-parameter values used for training the autoencoder model

First hidden layer     512, 256
Second hidden layer    256, 128
Dropout rate           0.1, 0.2

Table 3: Parameter settings for the autoencoder model

Parameter                Value
Batch-size               64
Epochs                   100
Train data shape         (89586, 828)
Validation data shape    (7905, 828)
Test data shape          (7905, 828)
Latent layer             120
Optimizer                Adam
Activation               linear
Learning rate            2e-4

Table 4: Error on the test data set during the hyperparameter tuning process for the autoencoder model. U1: number of first hidden layer units; U2: number of second hidden layer units

Optimizer    Best combination               Mean squared error (MSE)
Adam         dr=0.1, U1=512, U2=256         0.04205
Adam         dr=0.1, U1=256, U2=256         0.04787
Adam         dr=0.1, U1=512, U2=128         0.05704
Adam         dr=0.1, U1=256, U2=128         0.05902
Adam         dr=0.2, U1=512, U2=256         0.07366

Figure 24: Visualization of the error on the test data set during the hyperparameter tuning process for the autoencoder model. U1: number of first hidden layer units; U2: number of second hidden layer units. Each connection between parameters is shown by a color and ends with the value of the mean squared error; cold colors show the lowest MSE and warm colors show the highest MSE

Figure 25: Training and validation loss plots for the best result of Table 4 for the multi-feature AE model

In chapter 3, we selected three time series as samples to follow up and illustrated them in figures (12-14). Figure 26 shows the reconstruction results of the autoencoder model for those time series. The data in figure 26 correspond to the green part (test data) of figures (12-14). The error rates for those reconstructions can be found in Table 5.

Table 5 The reconstruction error on the train/valid/test data sets

                  MSE       RMSE
    Train error   0.01694   0.13015
    Valid error   0.00471   0.06862
    Test error    0.01642   0.12814

Figure 26: Reconstruction results of the autoencoder multi-feature model for three time series samples. The x-axis shows the date in the test data and the y-axis shows the reconstructed and expected values.

4.1.2 Deep LSTM

The second model in the prediction stage was the deep LSTM from the methodology chapter. The model was trained using combinations of the parameters listed in Table 6 during the tuning process.

Table 6 Combination of different hyper-parameter values and hidden network sizes used to train the Deep LSTM model

    Dropout             0.1, 0.2
    First LSTM layer    100, 200
    Second LSTM layer   100, 200
    Third LSTM layer    100, 200

Table 7 and figure 27 display the tuning results for the five best combinations with the learning rate fixed at 2e-4. As the table shows, the deep LSTM's best result belongs to the model using the Adam optimizer with 200, 100, and 200 units in the first, second, and third hidden layers. The best dropout rate and activation function are 0.1 and linear, respectively.

Figure 27: Visualization of the error on the test data set during the hyperparameter tuning process for the Deep LSTM model. lstm-U1, lstm-U2, lstm-U3: number of first, second, and third hidden layer units. Each connection between parameters is shown by a color and ends with the value of the mean-squared error; cold colors show the lowest MSE and warm colors the highest MSE.

Table 7 Error on the test data set during the hyperparameter tuning process for the Deep LSTM model. lstm-U1, lstm-U2, lstm-U3: number of first, second, and third hidden layer units

    Optimizer   Activation   Best combination                                MSE
    Adam        linear       dr=0.1, lstm-U1=200, lstm-U2=100, lstm-U3=200   0.13568
    Adam        linear       dr=0.1, lstm-U1=200, lstm-U2=200, lstm-U3=200   0.14199
    Adam        linear       dr=0.1, lstm-U1=200, lstm-U2=100, lstm-U3=100   0.14234
    Adam        linear       dr=0.2, lstm-U1=200, lstm-U2=200, lstm-U3=200   0.14546
    Adam        linear       dr=0.1, lstm-U1=100, lstm-U2=100, lstm-U3=200   0.14845

The final chosen parameters for the Deep LSTM model are presented in Table 8. Figure 28 shows the prediction results of the Deep LSTM model for the three example signals of the specific gas turbine introduced in figures (12-14) of chapter three. The data in this figure correspond to the green part (test data) of those figures. The error rates for those predictions can be found in Table 9. Figure 29 shows the loss functions during training of the Deep LSTM model.

Table 8 Parameter settings for the Deep LSTM model. lstm-U1, lstm-U2, lstm-U3: number of first, second, and third hidden layer units

    Parameter               Value
    Batch size              64
    Epochs                  100
    Train data shape        (89586, 12, 10)
    Validation data shape   (7905, 12, 10)
    Test data shape         (7905, 12, 10)
    Best combination        dr=0.1, lstm-U1=200, lstm-U2=100, lstm-U3=200, lr=2e-4
    Optimizer               Adam
    Activation              linear
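Under the settings of Table 8, one plausible realization of the stacked LSTM is sketched below; the output dimension (taken here as 69, one predicted value per sensor) and the dropout placement are assumptions, since the table only fixes the hidden sizes, dropout rate, optimizer, activation, and learning rate.

from tensorflow.keras import layers, models, optimizers

timesteps, n_features = 12, 10   # lookback window and encoded features per step (Table 8)
output_dim = 69                  # assumed: one predicted value per sensor

model = models.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.LSTM(200, return_sequences=True),   # lstm-U1
    layers.Dropout(0.1),
    layers.LSTM(100, return_sequences=True),   # lstm-U2
    layers.Dropout(0.1),
    layers.LSTM(200),                          # lstm-U3
    layers.Dropout(0.1),
    layers.Dense(output_dim, activation="linear"),
])

model.compile(optimizer=optimizers.Adam(learning_rate=2e-4), loss="mse")
# model.fit(x_train, y_train, batch_size=64, epochs=100,
#           validation_data=(x_valid, y_valid))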

Table 9 The error rate of the prediction results on the train/valid/test data sets

                  MSE       RMSE
    Train error   0.01754   0.13209
    Valid error   0.00957   0.09782
    Test error    0.14848   0.38533

Figure 28: Prediction results of the Deep LSTM multi-feature model for three time series samples. The x-axis shows the date in the test data and the y-axis shows the predicted and expected values.

Figure 29: Training and validation loss plots for the Deep LSTM model

4.2 Detection Model

The last model from the methodology chapter was the detection model, which was built on the prediction model's output. Two primary techniques were used in the detection model: the aggregate error and a quantile as a fixed threshold. Figure 30(a) depicts the aggregate error on the test data with the selected candidate date sets, illustrated as red points. As an example, Table 10 shows three dates with their candidate anomaly sensors.

Table 10 Three examples of selected candidate anomaly sets of sensors

    Date                  Selected anomaly sensors
    2013-12-30 08:20:00   SNSR-65, SNSR-7, SNSR-465, SNSR-45
    2013-12-29 21:40:00   SNSR-11, SNSR-6, SNSR-25, SNSR-51
    2013-12-30 16:05:00   SNSR-466, SNSR-44, SNSR-18, SNSR-55

Figure 30(b) depicts the detection result for sensor 7, which represents the gas turbine's air temperature; the active load sensor records the performance of the gas turbine machine. From the figure, the air temperature sensor experienced anomalous activity on 30-12-2013 between 04:40:00 and 09:10:00. Furthermore, we plot a second sensor, the outlet pressure. In figure 31, we can observe that on 29-12-2013, between 21:35:00 and 21:40:00, this sensor's behavior put a strain on the gas turbine machine. One limitation of the detection model was the validation of its results: no customer dataset was available to compare the detected anomalies with the real anomaly states of the corresponding sensors.

At this stage, the only possible way was to look at the gas turbine's performance on the anomaly dates. From figure 30(b), we can observe that there was a drop on 30-12-2013 in the active load plot. Since the drop reached zero, we can say the gas turbine was shut down at that time; hence, something unusual occurred that caused such a drop in the whole system. This issue may have been caused by a single sensor or by several sensors at the same time.

Figure 30: (a): aggregate error plot. (b): anomaly events of the air temperature sensor. The first plot represents the performance of the gas turbine through the active load sensor, the second plot represents the true and predicted values of the air temperature sensor, and the last plot shows the aggregate error with the selected candidate anomaly dates as red points. The x-axis and y-axis are the date in the test data and the value, respectively.

Figure 31: Anomaly events of the outlet pressure sensor. The first plot represents the performance of the gas turbine through the active load sensor, the second plot represents the true and predicted values of the outlet pressure sensor, and the last plot shows the aggregate error with the selected candidate anomaly dates as red points. The x-axis and y-axis are the date in the test data and the value, respectively.

5 Conclusion and Future Work

5.1 Conclusion

The results of reconstructing the time series with the autoencoder are presented in figure 26. As observed, the model reconstructed the general pattern of each selected time series very well. It could also reconstruct inputs with sudden changes, as pointed out in figure 26 for the time series named air temperature with ID Sensor_7. In general, the autoencoder model can reconstruct both the general patterns and the sudden changes in almost all time series. This part performed very well at selecting vital features by reducing the extensive input data.

Furthermore, the prediction model was developed as a deep LSTM model to produce short-term predictions. It is fair to mention that the data consist of sensors with different behaviors: there were sensors with indicator behavior, whose values switched between zero and one, and there were sensors with negative values. To handle this behavior, we decided to use a standardization scaler. The prediction model deals well with the data's complexity. Additionally, the model follows the time series trend quite well and does not fail to predict sudden changes, as can be observed for sensor 467, which represents the outlet pressure of the gas turbine.

An interesting thing to notice in Table 9 is that the training error is smaller than the test prediction error. It seems that our training data are more straightforward than the test data, and it is easier to find the trends in the training data set. This could be due to how we created our train, test, and validation datasets, illustrated in figures (12-14). As those figures show, our training data come from a completely different time of the year than the test (and validation) data, which causes a temporal bias: the training data have a different distribution than the test data. In other words, our training data cover January 1st 2013 to November 7th 2013, the validation data belong to November 8th to December 5th 2013, and the test data are almost the last month of the year.

As mentioned before, we labeled the test data set with the valid abnormal data points. As shown in figures 30 and 31, the proposed detection model detects abnormal states during specific time intervals. However, this part has only been validated by comparing the results with the performance of the gas turbine machine.

5.2 Future Work

In this thesis, the proposed model for anomaly detection could still be examined with other configurations to see whether more favorable outcomes can be obtained. Future studies could experiment with different lookback window sizes and investigate their effect on the prediction results. This study used a lookback window of 12 to capture one hour of history for each sensor.

To obtain the data's encoded representation, we employed an autoencoder that reconstructs its input. One could instead train an autoencoder that predicts the next timesteps rather than reconstructing the input; this would help the encoder learn the essential features of the data and make better predictions.

To obtain the sensors involved in anomaly cases, we used a threshold approach on the output of the prediction model. The threshold is chosen as a fixed number for all sensors of a particular gas turbine. One could implement a flexible threshold instead of the fixed one, where the threshold is chosen based on the behavior of each sensor; this way, the detection model would perform better (a minimal sketch of this idea is given at the end of this section).

It is fair to mention that there is a limitation in the evaluation of the detection model: there was no company source against which to evaluate the captured results, which depict the sensors responsible for the anomaly events. However, this part of the thesis is as essential as the prediction part, and it would be a valuable effort to address the evaluation of the detection model in future studies. Doing so has two main benefits: it provides a way to evaluate the prediction results, and it gives the company useful information to reduce the expenses related to the gas turbine's lifetime.
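As a minimal sketch of the flexible-threshold idea, each sensor could receive its own quantile-based cutoff computed from its own error distribution. The variable names follow Block 3.6, and the choice of a per-sensor 99% quantile is an illustrative assumption, not a method evaluated in this thesis.

import numpy as np

# error: per-sensor squared errors with shape (n_sensors, n_timesteps), as in Block 3.6
# one threshold per sensor, taken from that sensor's own error distribution
per_sensor_threshold = np.quantile(error, 0.99, axis=1, keepdims=True)

# a sensor is flagged at a timestep only if it exceeds its own threshold
flagged = error > per_sensor_threshold              # boolean matrix (n_sensors, n_timesteps)
candidate_timesteps = np.where(flagged.any(axis=0))[0]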
