
HARD DISK DRIVE FAILURE DETECTION WITH RECURRENCE QUANTIFICATION ANALYSIS

A Thesis Presented

By Wei Li

to The Department of Mechanical & Industrial Engineering

in partial fulfillment of the requirements for the degree of Master of Science

in the field of Industrial Engineering

Northeastern University

Boston, Massachusetts

August 2020

ACKNOWLEDGEMENTS

I would like to express my sincere appreciation to my thesis advisor, Professor Sagar Kamarthi, who has provided me with guidance, encouragement, and patience throughout the duration of this project. I would also like to extend my gratitude to Professor Srinivasan Radhakrishnan for his inspiration and guidance.


TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

1. INTRODUCTION

1.1 Hard Disk Drive

1.2 Self-Monitoring Analysis and Reporting Technology (SMART)

1.3 Research Objective

2. STATE-OF-THE-ART LITERATURE REVIEW

2.1 Search Criteria

2.2 Information Extraction Method

2.3 Results

2.3.1 Problem Definition

2.3.2 Data Source

2.3.3 Feature Selection

2.3.4 Preprocessing and Imbalanced Data Treatment

2.3.5 Model Selection

2.3.6 Performance Evaluation Metrics

2.4 Discussion

3. METHODOLOGY

3.1 Problem Definition

3.2 Recurrence Features Generation

3.2.1 Phase Space Reconstruction

3.2.2 Recurrence Plot

3.2.3 Recurrence Quantification Measurements

3.3 Machine Learning Approaches

3.4 Performance Evaluation

4. CASE STUDY

4.1 Data Preparation

4.2 Recurrence Features Generation

4.3 Model Evaluation

5. CONCLUSION

References

LIST OF TABLES

Table 1. Summary of three popular open-source datasets

Table 2. Overview of feature selection methods

Table 3. Summary of frequently used RQA measures

Table 4. Parameter selection results for the attributes

Table 5. Performance evaluation metrics from different models (95% confidence interval)

Table 6. Performance of the decision tree with SMOTE for different times in advance


LIST OF FIGURES

Figure 1. Components of a hard disk drive with a single platter

Figure 2. The columns of the information extraction table

Figure 3. Illustration of three major problem definitions

Figure 4. Frequency of the three problem definitions over the years

Figure 5. Proportion of data collection methods

Figure 6. Summary of proposed machine learning models

Figure 7. Popularity of the models over the years

Figure 8. Proposed hard disk prognostic system

Figure 9. Graph representation of a Lorenz attractor (σ = 10, ρ = 25, β = 8/3)

Figure 10. Steps to trim the time series

Figure 11. Examples of four types of SMART features

Figure 12. Parameter search log for feature Writes of Disk100365

Figure 13. Example recurrence plot of Servo10 of Disk100365


ABSTRACT

The need for fast and reliable data storage and management has been immense in the era of big data. Prognostic techniques such as the Self-Monitoring Analysis and Reporting Technology (SMART) provide status monitoring and failure detection for hard disk drives, the most widely used data storage device in the industrial setting.

However, the original threshold method fails to yield a satisfying failure detection rate.

Recently, researchers have developed various machine-learning-based techniques to make full use of the SMART attributes and improve the failure detection rate while controlling the false alarm rate.

This work first provides a review of the state-of-the-art SMART-based machine learning applications. Covering 51 publications from 2001 to 2020, the review presents a synthesis of methodologies following the machine learning steps, reveals the research trend over time, and points out some future research possibilities. Then a recurrence-theorem-based prognostic hard disk drive failure detection method is proposed. To capture the time series structure well, the system first converts the original SMART data to recurrence quantification analysis (RQA) measures with time-delay embedding and then passes the data to a binary classifier. The system is tested on a dataset with 369 hard disk drives. The basic decision tree model with SMOTE oversampling achieves an FDR and FAR with a 95% confidence interval of (85%, 100%) and (0%, 6.7%) respectively when detecting errors two days in advance.

1. INTRODUCTION

The Industrial Internet of Things (IIoT) has bridged the gap between operational technology and information technology. Correspondingly, cloud-based data platforms have improved production efficiency and quality by reducing local development and maintenance effort. To ensure high quality of the continuous and diverse cloud services, maintaining highly reliable data storage is a critical task. Therefore, the prognostic health assessment of the hard disk drive, the most preferred data storage device in the industrial setting, has become essential for the smooth running of processes at data centers.

1.1 Hard Disk Drive

The introduction of the stored-program computer by John von Neumann in 1945 marked the beginning of general-purpose computers, in which the central processing unit is connected with storage devices that store both data and program instructions (Lesser & Haanstra, 1957). Memory is an essential and critical component of any modern computer.

The memory devices used in modern computers can be categorized into three types: primary, secondary, and tertiary memory devices (Hadjieleftheriou et al., 2007). The primary memory devices allow the CPU to read/write data at the fastest possible speed. These devices tend to have small memory size and stay connected close to the CPU. The processor registers, which allow the CPU the fastest possible read/write access to data, belong to this class. The CPU uses these tiny registers to store and retrieve data pertaining to the currently executing instructions. The processor cache, another primary memory device with fast read/write access, stores data that has recently been accessed from slower storage devices. The main memory, which is also a primary memory device, stores instructions and data of running programs. It is relatively slower than the registers and the processor cache. Primary memory devices are semiconductor-based and are very expensive due to high manufacturing cost.

Secondary memory devices, which have a relatively slower processing speed than primary memories, greatly outperform the primary memory devices in capacity. They enhance the performance and functionality of a computer by enabling read/write access to enormous amounts of data and information. Secondary memories are usually supported by electromechanical devices. Unlike primary memory devices, which require a continuous power supply to preserve their data, secondary memory devices retain their contents regardless of power supply. Secondary memory devices are good for long-term storage of data and information. In many modern computers, hard disk drives play the role of secondary memory. Hard disk drives, which are based on magnetic principles, permanently store data and information. They are two orders of magnitude less expensive than primary memory devices. They offer random and quick access capability and high data transfer rates. However, they are about six orders of magnitude slower than primary devices when it comes to access time. Hard disk drives continue to serve as secondary memory devices for three main reasons: (a) hard disk drives are rigid and durable; (b) their tight tolerances allow hard disk drives not only to store vast amounts of data but also to perform read/write operations very fast; and (c) they are much less expensive than semiconductor-based memory devices. As the demand for capacity and speed keeps rising, hard disk drive manufacturers are making hard disk drives that are more compact and faster. Hard disk drives have established themselves as important subunits of modern computer systems. A hard disk drive consists of three important components: the platter(s), the heads, and the actuator. It can have either a single platter or a stack of platters. Figure 1 shows the components of a hard disk drive with a single platter.

Figure 1. Components of a hard disk drive with a single platter

A platter is a circular disk assembled tightly to the spindle. The spindle spins the platter during operation. The platter in a modern HDD rotates more than a hundred times per second, maintaining a nanoscale gap between the disk head and the magnetic media that stores data. The platter has two smooth surfaces, each of which stores millions of bits of data using a thin film media layer composed of magnetic material. A 0 or 1 bit is represented by orienting the magnetic material in one or the other direction. The thin film media on the platter surfaces provides high data density. The platter substrate is made of glass composites, which provide a very smooth surface for the deposition of magnetic material. Each platter surface is divided into tracks and sectors. A track is a concentric ring on the platter and a sector is a circumferential (or angular) segment of a track. A sector is the smallest addressable storage unit on the platter. A read/write scan covers at least one sector. All sectors contain an equal and preset number of bytes: 512 bytes for most hard drives. Sectors are separated by gaps which store error correcting codes and head synchronization information required for the hard disk drive to perform its operations correctly. The track density, in tracks per inch (TPI), and the linear density, in bits per inch of a track (BPI), characterize the physical capabilities of a drive. A typical platter may have as many as 100,000 tracks per inch of radial distance and may store up to 900,000 bits per inch length of a track (Hadjieleftheriou et al., 2007).

1.2 Self-Monitoring Analysis and Reporting Technology (SMART)

IBM invented the first hard disk monitoring technology, Predictive Failure Analysis (PFA), and incorporated it in the IBM 0662 SCSI-2 disk drives in 1992. The PFA code monitors subsystems of the hard disk drives. If a monitored record exceeds a pre-determined threshold, an alarm is set off. The output of the monitoring software is binary: "OK" or "About to fail." The PFA code monitors attributes such as read/write errors, fly height changes, and torque amplification control. Later, Compaq developed a similar technology named IntelliSafe in conjunction with the disk drive manufacturers Seagate, Quantum, and Conner. The IntelliSafe system monitors the disk health parameters and passes the records to the operating system and user-space monitoring software. Subsequently, Compaq, IBM, Seagate, Quantum, and Conner, along with the Small Form Factor Committee, developed a new standard named Self-Monitoring Analysis and Reporting Technology (SMART), which combines conceptual elements of Compaq's IntelliSafe and IBM's PFA (McLeod, 2005).

Self-Monitoring Analysis and Reporting Technology (SMART) is a monitoring system which detects and reports various indicators of reliability to anticipate failures. The system takes records of SMART attributes at a fixed rate and generates an alarm when an attribute value exceeds its pre-defined threshold. However, the threshold method provides poor prediction performance: the failure detection rate is around 3-10% when the false alarm rate is 0.1% (Murray et al., 2005). In this case, a health assessment method with better performance is required to prevent the catastrophic consequences of an unpredicted disk failure, which can be unrecoverable and permanent.
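To make the baseline concrete, the following minimal Python sketch illustrates the one-rule logic of the threshold method described above. The attribute names and threshold values here are hypothetical, for illustration only; they are not actual SMART thresholds.

```python
import numpy as np

def smart_threshold_alarm(attributes, thresholds):
    """Original SMART logic: flag the drive as soon as any monitored
    attribute record exceeds its pre-defined threshold."""
    for name, values in attributes.items():
        if np.max(values) > thresholds[name]:
            return "About to fail"
    return "OK"

# Hypothetical attribute histories and thresholds, for illustration only.
drive = {"ReadError1": np.array([0, 0, 2, 5]), "Servo2": np.array([1.1, 1.2, 1.0])}
limits = {"ReadError1": 10, "Servo2": 2.0}
print(smart_threshold_alarm(drive, limits))  # -> OK
```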

1.3 Research Objective

The objective of this work is to achieve prognostic hard disk drive failure detection with the help of recurrence theory and supervised machine learning methods. The model is expected to outperform the original SMART threshold-based method in failure detection rate at a low false alarm rate. The remainder of this thesis is organized as follows. Section 2 gives a detailed review of state-of-the-art hard disk drive health assessment applications. Section 3 proposes a failure detection model based on recurrence theory and supervised machine learning. Section 4 validates the feasibility of the model on a real-world hard disk dataset. Section 5 concludes with some discussion.

2. STATE-OF-THE-ART LITERATURE REVIEW

The objective of this review is to identify, assess, and analyze the state-of-the-art applications of machine learning methods for SMART-based hard disk drive failure detection. The review process includes four stages: literature search, full-text assessment, information extraction, and results analysis. The review provides a synthesis of current methodology in this field, reveals the research trend over time, and points out some future research possibilities.

2.1 Search Criteria

The author conducted the search using the academic search engines IEEE Xplore and Engineering Village. The search keywords consisted of "hard disk drive", "failure detection", "machine learning", and "SMART". To avoid missing potentially related publications, the author also searched with synonyms of the keywords, such as "hard drive" and "health assessment". Relevant articles were included after an examination of the title, keywords, and abstract. A total of 51 peer-reviewed journal articles and conference papers were selected. They cover the time frame from 2001 to May 2020.

2.2 Information Extraction Method

The generic process for a machine learning application includes data collection, preprocessing, and modeling. Therefore, the author created a table to extract information throughout the machine learning process after a full-text assessment of each paper. Figure 2 shows the columns of the information extraction table following a logical flow.

Figure 2. The columns of the information extraction table

2.3 Results

In this section, the data collected above are summarized and presented in an organized manner. More than a summary of all the studies, the results are analyzed in both qualitative and quantitative form.

2.3.1 Problem Definition

The problem definition of the selected publications generally falls into one of three categories: failure detection, health degree classification, and remaining useful life (RUL) prediction. The failure detection problem is normally based on a one-class or binary classification model. With the SMART data as input, the model can detect a failing disk a preset time in advance. The health degree classification problem introduces a self-defined quantitative health degree, which extends the one-class or binary classification to multi-class classification. Moreover, RUL prediction, which takes the shape of a prediction problem, estimates the remaining time before a failure happens. Figure 3 shows the three problem definitions and their outputs with an example.

Figure 3. Illustration of three major problem definitions

The frequency of the three problem definitions over the years is plotted in Figure 4. The increasing trend of research interest in this topic is captured in the graph. Also, the performance of RUL prediction models has improved in recent years with advances in data collection and modeling technology. Therefore, the research focus is gradually shifting from simple disk failure detection to RUL prediction.

Figure 4. Frequency of the three problem definitions over the years

Beyond the objectives summarized above, the specific focus of each study differs. For example, a few studies attempted to achieve real-time failure detection to meet the requirements of the Industry 4.0 revolution.

2.3.2 Data Source

Life data analysis requires time-to-failure data of a product under normal operating conditions. However, collecting data on failed hard disks is challenging because hard disks are reliable data storage devices with annual failure rates of only 0.3% to 3% (Yang & Sun, 1999). The methods used to collect disk data in the selected studies include datacenter collection, field collection, and degradation experiments. Figure 5 shows the proportion of studies using each data collection method.

Figure 5. Proportion of data collection methods

The disk data collected in datacenters is most preferred because of the data's large volume and homogeneous characteristics. The easy access to open-source data provided by big datacenters also contributes to its popularity. On the contrary, disk data collected from the field contains more noise because of the heterogeneous conditions the disks work under before failure. Moreover, the degradation experiment, where researchers accelerate the failure of hard disks under laboratory conditions, is an effective way to obtain the life characteristics of a disk. However, the accelerated degradation settings, such as high temperature and heavy usage, and the limited number of disks researchers can test at a time both constrain the application of this data collection method.

Open-source data are the primary input for the selected studies: 34 out of 51 used a dataset from the Backblaze Datacenter, Quantum Inc., or the Center for Magnetic Recording Research (CMRR) at UCSD. A summary of these open-source datasets is shown in Table 1.

Table 1. Summary of three popular open-source datasets

Backblaze Datacenter
Collection method: Datacenter collection
Number of disks: More than 100,000 drives of different models, with an annual failure rate of 1%
Time span: Real-time data from 2014 to 2020
Interval between time snaps: 24 hours
SMART attributes: Raw and normalized SMART values for 62 different SMART attributes
References: Botezatu et al., 2016; Chaves et al., 2016; Rincón et al., 2017; dos Santos Lima et al., 2017; Aussel et al., 2017; Pereira et al., 2017; Xiao et al., 2018; dos Santos Lima et al., 2018; Chaves et al., 2018; Anantharaman et al., 2018; Zhang et al., 2018; Mashhadi et al., 2018; Xie et al., 2018; Shen et al., 2018; Su & Huang, 2018; Kaur & Kaur, 2019; Züfle et al., 2020

Quantum Inc.
Collection method: Degradation experiment
Number of disks: Type A: 1936 hard drives with 9 failed ones; Type B: 1808 hard drives with 27 failed ones
Time span: 2 to 3 months
Interval between time snaps: Several days
SMART attributes: 15 cumulative and incremental SMART attributes
References: Hughes et al., 2000; Hamerly & Elkan, 2001

CMRR at UCSD
Collection method: Failed disk data collected from the field; good disk data from a reliability demonstration test run by the manufacturer
Number of disks: 369 same-type hard drives with 191 failed ones
Time span: Most recent 25 days
Interval between time snaps: Semi-regular intervals ranging from 0.5 hour to 2 hours
SMART attributes: 64 different SMART attributes
References: Murray et al., 2003; Murray et al., 2005; Agarwal et al., 2009; Zhao et al., 2010; Wang et al., 2011; Teoh et al., 2012; Pitakrat et al., 2013; Wang & Miao et al., 2013; Xie et al., 2018; Lima et al., 2018; Kaur & Kaur, 2018; Kaur & Kaur, 2019; Wang et al., 2019; Yu, 2019; Li et al., 2019; Jiang et al., 2020; Wang & Zhang, 2020

The data from Backblaze is currently the largest open-source dataset on disk drive performance. The huge data volume and the diversity of disk types offer researchers much flexibility in creating subsets toward their own data analysis goals. However, the extreme imbalance between good and failed disks is challenging in the model training stage. The data from Quantum Inc. covers the life span of the disks because it was collected from a degradation test. It shares the data imbalance problem with the Backblaze dataset. In contrast, the data from CMRR does not have the imbalance problem. However, since the good drives and the failed drives were collected in different environments, it is possible that a model will in fact learn the difference between environments instead of the difference in disk performance.

2.3.3 Feature Selection

Many factors may lead to the failure of a hard disk drive. Usually, it is not feasible to take all of them into consideration. Therefore, it is critical to select the few factors that are most influential for the particular application. The feature selection methods proposed in the studies range from intuitive selection to model-assisted selection. Table 2 shows the proposed feature selection methods sorted by their complexity levels. Due to the nonparametric nature of the hard disk drive attributes, degrading pattern detection, the nonparametric reverse arrangement test, and RFE with baseline models have proved to be the most efficient feature selection methods.

Table 2. Overview of feature selection methods

Select readable and meaningful attributes and discard attributes with mostly constant values (Pitakrat et al., 2013; Zhu et al., 2013; Franklin, 2017; Shen et al., 2018; Züfle et al., 2020)

Select features with significant degrading patterns (Zeid & Kamarthi, 2010; Botezatu et al., 2016; dos Santos Lima et al., 2018; Lima et al., 2018)

Select attributes whose descriptive statistics differ the most between classes (Li et al., 2017; Mashhadi et al., 2018)

Select attributes that have a high correlation coefficient with disk errors (Jiang et al., 2020)

Select features based on the ranking of their information entropy (Ganguly et al., 2016)

Select attributes with high ranking in failure modes, mechanisms, and effects analysis (FMMEA) (Wang et al., 2011; Wang & Miao et al., 2013; Wang & Ma et al., 2013; Wang & Jiang et al., 2019)

Select attributes by the nonparametric reverse arrangement test (Murray et al., 2003; Murray et al., 2005; Zhao et al., 2010; Li et al., 2014; Pang et al., 2016; Li et al., 2016; Xu et al., 2016; Rincón et al., 2017; Aussel et al., 2017; Chaves et al., 2018; Wang & Bao et al., 2019; Liu et al., 2019)

Select features using the Recursive Feature Elimination (RFE) algorithm with baseline models such as Random Forest (Hamerly & Elkan, 2001; Chaves et al., 2016; Pereira et al., 2017; Xiao et al., 2018; Xu et al., 2018; Kaur & Kaur, 2018; Kaur & Kaur, 2019)

Select highly predictive but uncorrelated features using the Maximum Relevance Minimum Redundancy (mRMR) algorithm (Wang & Miao et al., 2013)

2.3.4 Preprocessing and Imbalanced Data Treatment

The proposed data preprocessing methods include missing value imputation with k-NN, outlier elimination, data normalization, data discretization, and feature combination. The choice of preprocessing methods strongly depends on the characteristics of the original dataset as well as the requirements of the selected machine learning model.

A common challenge for studies in this field is the imbalanced nature of the hard disk data, because the failure of a hard drive is a small-probability event. Among all the imbalanced data treatments proposed in the selected studies, the naïve resampling methods, random oversampling and random undersampling, are adopted the most due to their easy implementation and efficiency for models that are sensitive to a skewed distribution. However, the drawbacks of naïve resampling methods cannot be ignored: random oversampling inevitably increases the likelihood of model overfitting, and random undersampling tends to discard useful and important dataset information (Fernández et al., 2018).

Some researchers used novel resampling methods to avoid the problems caused by random resampling. Zhang et al. (2018) used the K-means clustering algorithm (Kanungo et al., 2002) to cluster the healthy drives into 10 clusters; the top 30% of samples closest to each centroid were then selected to represent the healthy drive population.

Kaur & Kaur (2019) proposed a novel algorithm named the Balanced Splitter Algorithm (BSA) for the imbalance problem. BSA splits all good drives into subsets and trains the model using every combination of a good subset with the available failed drives. The algorithm adjusts the class proportion without losing any useful information.

Züfle et al. (2020) applied Enhanced Structure Preserving Oversampling (ESPO) and the Synthetic Minority Oversampling Technique (SMOTE) separately. ESPO creates new records of the failed class while maintaining the covariance structure (Cao et al., 2013). SMOTE creates new records of the failed class with the help of a nearest-neighbor distance measure (Chawla et al., 2002). First, it finds the nearest neighbors of each failed record. Then a neighbor is selected at random, and the new record is created at a randomly selected point between the two examples in the feature space. The experimental results show that ESPO improves the prediction quality at the cost of more computation time, while SMOTE performs worse than random oversampling.
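As a concrete illustration of the SMOTE mechanism just described, the sketch below is a simplified reading of Chawla et al. (2002), not the reference implementation; the class sizes in the example are taken from the case study in Section 4.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_minority, n_new, k=5, seed=0):
    """Simplified SMOTE: pick a minority sample, pick one of its k nearest
    minority neighbors, and create a synthetic record at a random point on
    the line segment between the two in feature space."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)          # idx[:, 0] is the sample itself
    synthetic = np.empty((n_new, X_minority.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(X_minority))
        j = idx[i, rng.integers(1, k + 1)]      # a random true neighbor
        gap = rng.random()                      # interpolation coefficient in [0, 1)
        synthetic[s] = X_minority[i] + gap * (X_minority[j] - X_minority[i])
    return synthetic

# Example: grow 31 failed-disk records to match 174 good ones.
X_failed = np.random.default_rng(1).normal(size=(31, 7))
X_minority_balanced = np.vstack([X_failed, smote(X_failed, n_new=174 - 31)])
```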

2.3.5 Model Selection

Because of the shape of the hard disk drive failure prediction problem and the labeled nature of the available data, it is natural to use supervised or semi-supervised learning models. Figure 6 summarizes all the models proposed in the selected studies. The models are categorized into eight groups based on the model principle.

Figure 6. Summary of proposed machine learning models

Probabilistic models build on the probability distribution of attributes. Hamerly & Elkan (2001) used a naïve Bayes classifier and naïve Bayes submodels on the Quantum Inc. dataset. Both outperform the threshold method by increasing the true positive rate without raising the false alarm rate. As an extension, Murray et al. (2003) developed a semi-supervised learning model, multiple instance naïve Bayes (mi-NB), which outperforms the naïve Bayes classifier. Statistical hypothesis tests can also substitute for the original error threshold method (Hughes et al., 2002). As one of the earliest studies in this field, this hypothesis test achieved an accuracy of 40%-60%, which is 3-4 times higher than the threshold method. The same research team also validated the feasibility of this method on the CMRR dataset (Murray et al., 2003). Moreover, the hidden Markov model (HMM), which takes time series of attributes as input, can be effective in disk failure prediction (Zhao et al., 2010; Teoh et al., 2012; Wang & Jiang et al., 2019). Some novel probabilistic approaches have been proposed in recent years. Wang & Ma et al. (2013) proposed a two-step parametric method, which outperforms the SVM and HMM under the same experimental conditions. In the first step, the model captures the degradation deviation of the attribute. Then a temporal probabilistic model tracks the anomaly progression and creates a confident prediction with a minimum false alarm rate.

Rule-based models are based on a rule or a set of rules constructed by analyzing characteristics or statistics of the training data. Compared to black-box models, they can provide an understanding of events. Agarwal et al. (2009) applied Maximum Likelihood Rules (MLRules) to a self-collected dataset, where the variables consist of both disk events and SMART attributes. The method generates highly interpretable rules and outperforms the baseline SVM model in the same experimental setting.

Instance-based learning computes a similarity measure between a new record and the training records to make a decision. Compared to the commonly used Euclidean distance, the Mahalanobis distance can measure the distance between a record and a distribution while considering the covariance between attributes. This characteristic makes it a suitable distance measure in this case. Wang et al. (2011) proposed a threshold method with the Mahalanobis distance: if the distance between a new record and the healthy drive distribution exceeds the threshold value, the record is classified as failed. The research team later combined sophisticated feature selection methods to improve the model performance (Wang & Miao et al., 2013). The feature-selection-based Mahalanobis distance (FSMD) outperforms SVM on the CMRR dataset.

Function approximation models estimate functions that map attributes to a value that represents an output class. Yang et al. (2015) used the logistic regression model because of its simplicity and scalability. The research team defined the hard drive failure prediction problem as a cloud computing application with big data. Therefore, the focus is mainly on improving the quality of the training data instead of proposing sophisticated algorithms. Franklin (2017) validated the significance of a pre-failure signature, the Reallocated Sector Count; a linear regression model is then applied to forecast the failure rate.

Decision trees are tree-like graphs, where each node contains a conditional statement that further splits the node into branches. Suchatpong and Bhumkittipich (2014) used decision tree learning with boosting and pruning. Similarly, Li et al. (2014) applied basic classification and regression trees with model updating strategies; their performance on a dataset collected from a data center is better than that of a BP-ANN model. Rincón et al. (2017) proved the efficiency of the decision tree in a heterogeneous environment. In addition, Kaur & Kaur (2019) proposed a voting-based decision tree classifier, which outperforms the one-for-all decision tree. The random forest, which combines a multitude of single trees during training and aggregates their results, can correct the decision tree's tendency to overfit the training set. It produces promising results in hard disk drive failure prediction (Su & Yon, 2018; Xiao et al., 2018; Xu et al., 2018; Anantharaman et al., 2018; Mashhadi et al., 2018; Shen et al., 2018; Su & Huang, 2018; Jiang et al., 2020; Züfle et al., 2020). The gradient boosting algorithm can also improve the performance of decision trees in this problem setting (Botezatu et al., 2016; Aussel et al., 2017; Jiang et al., 2020).

Artificial Neural Networks (ANN) are models inspired by the structure and function of biological neural networks. Deep learning, a modern update to the ANN, is concerned with building much larger and more complex neural networks. Zeid and Kamarthi (2010) applied an ANN model to a combination of SMART features and the Read/Write Rate. Zhu et al. (2013) improved the ANN performance by adding the AdaBoost algorithm and back propagation. The Recurrent Neural Network (RNN), which is designed for sequence problems, suits the time series input in this problem. Therefore, multiple studies have investigated different RNN structures and modifications and obtained promising results (Li et al., 2016; Xu et al., 2016; Zhang et al., 2018; Liu et al., 2019; Wang & Bao et al., 2019). Moreover, a few researchers adopted the Long Short-Term Memory (LSTM) network to avoid the vanishing gradient problem of the traditional RNN model (dos Santos Lima et al., 2017; dos Santos Lima et al., 2018; Lima et al., 2018).

Ensemble learning generates and combines multiple weak learners to produce one prediction result. Ganguly et al. (2016) proposed a simple two-stage ensemble model with a decision tree at stage one and logistic regression at stage two to prevent overfitting. Li et al. (2019) constructed a stacking ensemble model with XGBoost and LSTM as submodels; the model performance meets the standard of production availability. Similarly, Wang & Zhang (2020) stacked XGBoost, LSTM, and XGBoost iterative regression, and the ensemble model outperforms all the single models. Compared to stacking ensemble models, the Bayesian-network-based ensemble learning method gains the most popularity in this field because of its ability to assess the uncertainty in its component model predictions (Pang et al., 2016; Chaves et al., 2016; Pereira et al., 2017; Chaves et al., 2018). Moreover, Xie et al. (2018) proposed an optimized modeling engine (OME), which integrates a one-for-one (OFO) model, a one-for-all (OFA) model, and a transfer learning (TL) model. The OME outperforms an existing one-for-all model on the Backblaze dataset.

The time trend of model family popularity is presented in Figure 7. Most of the studies compare baseline models against the novel model. Therefore, if a study investigated models from more than one group, the model with the best performance is counted as the primary model proposed. Two publications that reviewed the performance of multiple models are excluded from the popularity analysis.

Figure 7. Popularity of the models over the years

2.3.6 Performance Evaluation Metrics

The objective of substituting machine learning models for the original SMART threshold method is to improve the true alarm rate while controlling the false alarm rate. Therefore, the failure detection rate (FDR), the false alarm rate (FAR), and the receiver operating characteristic (ROC) curve are the most preferred among the basic classification metrics.

Researchers have also defined new metrics to assess model performance in the hard drive problem setting. Li et al. (2016) proposed the migration rate (MR), defined as the proportion of data that is successfully migrated before disk failure, and the mis-migration rate (MMR), defined as the proportion of data on healthy disks that is migrated needlessly. In the multi-class classification setting, Xu et al. (2016) proposed the tolerance-skipping-one-level accuracy, which tolerates assessment mistakes by one health level.

For the RUL prediction problem, besides basic performance metrics such as AE, MAE, and RMSE, prognostic performance metrics also come into the picture. Researchers adopted the prediction horizon (PH) to verify a model's ability to estimate the end-of-life in advance (Chaves et al., 2018; dos Santos Lima et al., 2018; Lima et al., 2018). They also used the α-λ performance metric to check a model's ability to return results within a certain error margin.

2.4 Discussion

Despite the generous amount of research conducted to predict hard disk drive failure, limitations still exist in model performance and implementation feasibility. Three significant issues are observed in the state-of-the-art studies.

(1) The sophisticated structure of disk drives leads to difficulties in understanding the disk failure mechanism. In other words, there are multiple possible causes for the failure of a disk. Different categories of disk failure are represented by different SMART patterns, which introduces bias towards high-frequency failure types and makes the machine learning model less efficient. This characteristic also has a negative impact on the feature selection process: by excluding some of the features which contain less information value, information about some low-frequency failures may be discarded.

(2) The datasets utilized by the current work differ in type, size, attributes, and quality, which hinders the performance evaluation and comparison of different models. The data imbalance and insufficiency pose challenges for model building. Also, most of the datasets are homogeneous, meaning the disks are of the same type and operate in the same environment. This constrains the implementation of the novel models in real-world environments.

(3) The computation time and cost are often not considered when developing sophisticated models. A failure prediction system will most likely be applied in environments like data centers, where numerous drives are constantly in high usage. A failure prediction system is expected to handle big data and offer constant surveillance of all the drives. In this case, the computational speed and cost will be considered before implementation. Some sophisticated models can yield good performance, especially the neural networks and the ensemble models. However, their complexity and speed may constrain their implementation in real-world environments.

In order to improve model performance and implementation practicality, future work should focus on the following aspects: (1) improve the understanding of drive failure patterns and the factors that signal hard drive failure in order to produce high-quality features; (2) develop models using datasets collected from real-world applications; and (3) incorporate cloud computing to handle the big data.

3. METHODOLOGY

3.1 Problem Definition

The proposed hard disk prognostic system is shown in Figure 8. The objective is to raise an alarm for disk failure a preset time in advance. The system takes the SMART time series data of a disk as input, converts the input to recurrence features, and classifies the disk into the good or failed class with a binary classifier. The time series data is converted to RQA measures and then fed into a supervised machine learning model, such as logistic regression or a classification tree. The impact of imbalanced data treatment methods is also discussed in the performance evaluation section.

Figure 8. Proposed hard disk prognostic system

3.2 Recurrence Features Generation

A dynamical system is defined by a present state and laws governing its evolution over time (Katok & Hasselblatt, 1997). The behavior of many real-world systems can be represented by a set of states in a state space and predicted to a certain extent from the initial state and a rule for how the state evolves. However, the nonlinearity and complexity of real-world systems put constraints on creating a precise description or prediction of the system behavior. Among the approaches to analyze the properties of a dynamical system, recurrence analysis renders promising results because the recurrence of states is fundamental for deterministic dynamical systems and typical for stochastic dynamical systems (Marwan et al., 2007). The SMART attributes monitor the disk status over time with various disk logs and sensors. For example, servo patterns and sector temperatures are monitored by servo sectors and temperature sensors respectively, and cumulative error counts are recorded in the disk log. Together they describe the nonlinear characteristics of the disk performance. In this work, recurrence analysis is employed as a feature extraction method for these nonlinear time series data.

3.2.1 Phase Space Reconstruction

When faced with a chaotic univariate time series, linear methods usually fail to capture its structure. An insufficient understanding of the time series dynamics leads to poor performance in distinguishing useful information from noise and in forecasting. In this case, the attractor is introduced to represent the central properties of the dynamics. An attractor, a portion of the phase space of a dynamical system, is a set of numerical values representing the direction toward which the system tends to evolve (Farmer, 1982). A direct observation of the phase space (or the attractor) of a system requires access to all dynamical variables of that system. However, in most cases only one dynamical variable is available, so the dimension of the phase space remains unknown. For example, Figure 9 shows a three-dimensional Lorenz attractor and a time series x (Tucker, 1999). When a point moves along the track of the Lorenz attractor, its position projected on the x axis forms the time series x. A good attractor, like the Lorenz attractor for the time series x, can represent the structure of the chaotic time series clearly. Therefore, research interest arises in reconstructing a good attractor from a given chaotic time series.

Figure 9. Graph representation of a Lorenz attractor (σ = 10, ρ = 25, β = 8/3)
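For reproducibility, a Lorenz series like the one in Figure 9 can be regenerated in a few lines of Python. Only the parameters (σ = 10, ρ = 25, β = 8/3) come from the figure; the integration span and initial condition below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, state, sigma=10.0, rho=25.0, beta=8.0 / 3.0):
    """Lorenz equations with the parameter values used in Figure 9."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

sol = solve_ivp(lorenz, (0.0, 50.0), [1.0, 1.0, 1.0],
                t_eval=np.linspace(0.0, 50.0, 5000))
x_series = sol.y[0]  # projection onto the x axis: a chaotic univariate series
```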

A popular method for phase space reconstruction is the time delay embedding theorem (Takens, 1981). The state of a system at a time instant i can be reconstructed as the phase space trajectory

$\vec{x}(i) = \{u_i, u_{i+\tau}, u_{i+2\tau}, \ldots, u_{i+(m-1)\tau}\}$

with an embedding dimension m and a time delay τ. If m ≥ 2d + 1, where d is the dimension of the attractor, the preservation of the topological structure of the original time series is guaranteed, which means both trajectories represent the same dynamical system in different coordinate systems. The proper selection of the embedding dimension m and the time delay τ ensures a good phase space reconstruction. As mentioned before, Takens (1981) and Whitney et al. (1992) proved that an embedding dimension m will be sufficient if m ≥ 2d + 1, where d is the dimension of the attractor. However, the attractor dimension is usually unknown beforehand, and m < 2d + 1 may also be sufficient in some cases. For example, the Lorenz attractor with d = 3 can be embedded with m = 3. Choosing an overly high m may amplify the impact of noise and introduce unnecessary computational complexity. Therefore, an approach to determine the smallest sufficient embedding dimension is much desired. Kennel et al. (1992) proposed a method to determine the minimum embedding dimension with false nearest neighbors. When m is too small, the reconstructed trajectories tend to intersect, so that points close in the reconstructed space are not close in the actual phase space. Such points are defined as false nearest neighbors. The idea of the proposed method is to raise m until the false nearest neighbors within a certain threshold vanish.
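A minimal sketch of the time-delay embedding and the false-nearest-neighbor fraction, under the definitions above; the helper names, the test series, and the tolerance value rtol are illustrative assumptions (rtol = 15 is a conventional choice from the literature, not one specified in this thesis).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def embed(u, m, tau):
    """Delay embedding: row i is (u_i, u_{i+tau}, ..., u_{i+(m-1)tau})."""
    n = len(u) - (m - 1) * tau
    return np.column_stack([u[k * tau : k * tau + n] for k in range(m)])

def fnn_fraction(u, m, tau, rtol=15.0):
    """Fraction of false nearest neighbors in dimension m (Kennel et al., 1992):
    neighbors whose separation grows by more than rtol when the (m+1)-th
    delay coordinate is appended."""
    n = len(u) - m * tau                   # points that survive in dimension m+1
    X = embed(u, m, tau)[:n]
    dist, idx = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    d, j = dist[:, 1], idx[:, 1]           # nearest neighbor other than the point itself
    extra = np.abs(u[np.arange(n) + m * tau] - u[j + m * tau])
    return float(np.mean(extra / np.maximum(d, 1e-12) > rtol))

# Raise m until the false-neighbor fraction (approximately) vanishes.
u = np.sin(np.linspace(0, 60, 2000)) + 0.05 * np.random.default_rng(0).normal(size=2000)
print([round(fnn_fraction(u, m, tau=3), 3) for m in range(1, 6)])
```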

The autocorrelation function and the mutual information are two frequently used methods to determine a proper time delay τ (Fraser & Swinney, 1986). A good embedding requires a reasonably small τ which yields minimal redundancy between $u_i, u_{i+\tau}, u_{i+2\tau}, \ldots, u_{i+(m-1)\tau}$. The idea of the autocorrelation method is therefore to pick the delay such that $u_i$ and $u_{i+\tau}$ are linearly uncorrelated on average. However, $u_i$ and $u_{i+\tau}$ with zero autocovariance are not necessarily independent. This is where the concept of mutual information comes into the picture. The mutual information function measures independence based on nonlinear information theory. The idea of the mutual information method is to choose the delay at the first minimum of the mutual information function. However, no method works perfectly for all scenarios. It is necessary to double-check the embedding parameters by examining the recurrence plot.
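Both delay-selection rules can be sketched compactly, as below. The histogram-based mutual information estimator is one simple choice among many, and the bin count is an assumption; the function names are the author's illustration.

```python
import numpy as np

def delay_by_autocorrelation(u, max_lag=100):
    """Smallest lag at which the autocorrelation first drops to zero or
    below, so that u_i and u_{i+tau} are linearly uncorrelated on average."""
    u = np.asarray(u, float)
    for tau in range(1, max_lag):
        if np.corrcoef(u[:-tau], u[tau:])[0, 1] <= 0.0:
            return tau
    return max_lag

def delay_by_mutual_information(u, max_lag=100, bins=16):
    """Lag at the first local minimum of the histogram-estimated mutual
    information between u_i and u_{i+tau} (Fraser & Swinney, 1986)."""
    u = np.asarray(u, float)
    def mi(tau):
        pxy, _, _ = np.histogram2d(u[:-tau], u[tau:], bins=bins)
        pxy /= pxy.sum()
        px, py = pxy.sum(axis=1), pxy.sum(axis=0)
        nz = pxy > 0
        return float(np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])))
    vals = [mi(tau) for tau in range(1, max_lag)]
    for k in range(1, len(vals) - 1):
        if vals[k - 1] > vals[k] <= vals[k + 1]:
            return k + 1                       # lags start at 1
    return int(np.argmin(vals)) + 1
```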

3.2.2 Recurrence Plot

The recurrence plot (RP) is a visual representation of the times at which a system state $\vec{x}_i$ in a phase space recurs. The RP visualizes the high-dimensional phase space through a two-dimensional graph and reveals hidden information of a time series in a qualitative way. The element value $R_{i,j}$ of the two-dimensional matrix is defined as

$R_{i,j}(\varepsilon) = H\left(\varepsilon - \lVert \vec{x}_i - \vec{x}_j \rVert\right), \quad i, j = 1, 2, \ldots, N$

where N is the number of states $\vec{x}_i$, $H(\cdot)$ is the Heaviside function, ε is the threshold distance, and $\lVert \cdot \rVert$ is a norm (Marwan et al., 2007). When two state vectors $\vec{x}_i$ and $\vec{x}_j$ fall in the same neighborhood with a fixed size given by the threshold ε, the matrix element $R_{i,j}$ takes the value 1, and the corresponding point in the RP is solid.
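The matrix R can be computed directly from the embedded trajectory. The sketch below assumes the Euclidean norm and reuses the hypothetical embed helper from the earlier sketch; the choice of ε is deliberately left open, as discussed next.

```python
import numpy as np
from scipy.spatial.distance import cdist

def recurrence_matrix(X, eps):
    """R[i, j] = H(eps - ||x_i - x_j||): 1 where two states fall within an
    eps-neighborhood of each other (Euclidean norm assumed here)."""
    return (cdist(X, X) <= eps).astype(np.int8)

# Example usage with an embedded trajectory X = embed(u, m, tau); a common
# (but case-sensitive) starting point is a small fraction of the maximum
# phase-space diameter, e.g. eps = 0.1 * cdist(X, X).max().
```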

The threshold value ε is an essential parameter of the RP. Multiple studies have proposed rules to determine a proper ε (Mindlin et al., 1992; Zbilut et al., 2002; Thiel et al., 2002). However, the choice of a proper threshold is highly case-sensitive, and a testing and adjusting process is inevitable during the construction of RPs.

3.2.3 Recurrence Quantification Measurements

Despite their ability to visualize the recurrence characteristics of a dynamical system in an intuitive way, recurrence plots may introduce the subjectivity of the interpreter and thus lead to misleading interpretations. To quantify the recurrence plot structures, Zbilut et al. (1992) and Marwan et al. (2007) introduced recurrence quantification analysis (RQA) measures. The seven measures adopted in this research are summarized in Table 3.

Table 3. Summary of frequently used RQA measures

Recurrence Rate (RR): the relative density of recurrence points in the sparse matrix and the estimator of the correlation sum.
$RR = \frac{1}{N^2} \sum_{i,j=1}^{N} R_{i,j}$, where $R_{i,j}$ is an element of the matrix and N is the dimension of the matrix.

Determinism (DET): the fraction of recurrence points that form diagonal lines, representing similar time evolution of system states.
$DET = \sum_{l=l_{\min}}^{N} l\,P(l) \Big/ \sum_{i,j=1}^{N} R_{i,j}$, where $P(l)$ is the histogram of the diagonal line lengths l.

Laminarity (LAM): the fraction of recurrence points that form vertical lines, representing laminar states in the system.
$LAM = \sum_{v=v_{\min}}^{N} v\,P(v) \Big/ \sum_{v=1}^{N} v\,P(v)$, where $P(v)$ is the histogram of the vertical line lengths v.

Mean Diagonal Line Length (L): the mean of the diagonal line lengths and the inverse of the system divergence ($K_2$ entropy).
$L = \sum_{l=l_{\min}}^{N} l\,P(l) \Big/ \sum_{l=l_{\min}}^{N} P(l)$

Trapping Time (TT): the average length of the vertical lines and a measure of the mean time the system is trapped in a particular state with slow change.
$TT = \sum_{v=v_{\min}}^{N} v\,P(v) \Big/ \sum_{v=v_{\min}}^{N} P(v)$

Entropy (ENTR): the Shannon entropy of the probability distribution of the diagonal line lengths and a measure of recurrence structure complexity.
$ENTR = -\sum_{l=l_{\min}}^{N} P(l) \ln P(l)$

Vertical Entropy (VENTR): the Shannon entropy of the probability distribution of the vertical line lengths.
$VENTR = -\sum_{v=v_{\min}}^{N} P(v) \ln P(v)$

3.3 Machine Learning Approaches

Machine learning (ML) is the technique of automatically learning patterns from data without relying on rule-based programming. ML methods are typically categorized into supervised and unsupervised learning. Supervised learning requires a labeled dataset; the ML model can discover the inferred relationship between the input and output through training and make predictions for future inputs. Supervised learning methods can perform both classification and regression tasks. On the contrary, unsupervised learning can discover hidden structures in an unlabeled dataset, which is useful for applications such as detecting anomalous conditions, clustering, and association. In addition to supervised and unsupervised learning, semi-supervised learning and reinforcement learning offer extra options for real-world applications (Alpaydin, 2020). Considering the RQA measures input, basic supervised learning methods like logistic regression and decision trees are chosen based on the model popularity found in the preceding literature review.

3.4 Performance Evaluation

Because the problem is defined as a binary classification problem, classification metrics are chosen to evaluate the model performance. Among the basic classification metrics, the focus stays on the failure detection rate (FDR) and the false alarm rate (FAR), as the concern remains to raise the true alarm rate while controlling the false alarm rate. The FDR is calculated by dividing the number of correctly detected failed drives by the total number of failed drives. The FAR is obtained by dividing the number of good drives classified as failed by the total number of good drives. In addition, the model complexity and computation time are of interest when comparing the two proposed models.
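The two rates reduce to a few lines of code; the sketch assumes the label convention 1 = failed, 0 = good.

```python
import numpy as np

def fdr_far(y_true, y_pred):
    """FDR: detected failed drives / all failed drives.
    FAR: good drives flagged as failed / all good drives.
    Assumes labels 1 = failed and 0 = good."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fdr = float(np.mean(y_pred[y_true == 1] == 1))
    far = float(np.mean(y_pred[y_true == 0] == 1))
    return fdr, far

print(fdr_far([1, 1, 0, 0, 0], [1, 0, 0, 1, 0]))  # -> (0.5, 0.333...)
```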

4. CASE STUDY

4.1 Data Preparation

The dataset used to evaluate the proposed failure detection model is from the Center for Magnetic Recording Research (CMRR) at UCSD (Murray et al., 2005). This dataset contains time series of SMART attributes collected from a total of 369 hard disk drives, all of the same model. 178 of the drives are healthy and 191 have failed. The data for the healthy drives were collected from a semi-controlled reliability test run by the manufacturer, while the data for the failed drives were collected from customer-returned drives. For every disk, the SMART attributes are measured every two hours. The time history stored in the disks varies from 8 hours to 25 days.

Due to the uneven lengths of the time histories, the time series are trimmed with the steps shown in Figure 10. The first step is to eliminate disks with a recorded time of less than 200 hours, because Murray et al. (2005) pointed out that disks with low hours might contain garbage information. The second step is to crop the time series to the same length, 400 hours here. The third step involves the time-in-advance. The objective of the model is to detect the disk failure a certain amount of time in advance. Therefore, for a failed disk, the records in the preset time-in-advance window before the failure point are eliminated. The healthy disk time series are also trimmed to keep the lengths consistent. The time-in-advance settings selected in this work range from 1 day to 4 days.

Figure 10. Steps to trim the time series
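The three trimming steps can be sketched as follows. The assumption that the most recent 400 hours are kept (rather than the first 400) and the helper name are the author's illustration, not code from the original study.

```python
import numpy as np

HOURS_PER_SAMPLE = 2  # CMRR SMART attributes are sampled every two hours

def trim_series(series, advance_hours, min_hours=200, keep_hours=400):
    """Step 1: drop disks with under 200 recorded hours. Step 2: crop to a
    common 400-hour window (most recent hours assumed). Step 3: remove the
    time-in-advance window at the end of the series; healthy disks are
    shortened identically to keep lengths consistent."""
    if len(series) * HOURS_PER_SAMPLE < min_hours:
        return None                                    # step 1
    series = series[-keep_hours // HOURS_PER_SAMPLE:]  # step 2
    cut = advance_hours // HOURS_PER_SAMPLE
    return series[:-cut] if cut else series            # step 3

two_days = trim_series(np.arange(300), advance_hours=48)  # 48 h = 2 days in advance
print(len(two_days))  # 200 samples kept, minus 24 for the 2-day window -> 176
```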

Among the 64 SMART attributes included for each drive, 17 useful features are retained after a distribution exploration and a survey of past literature. The retained features are Servo2, Servo5, Servo10, Writes, Reads, FlyHeight1, FlyHeight6-13, ReadError1, ReadError3, and ReadError18.

4.2 Recurrence Features Generation

After the preprocessing, the time series are transformed into recurrence features and then fed to the machine learning algorithms. Before creating the RPs and the RQA measures, the embedding dimension m and the time delay τ require careful selection to yield promising phase space reconstruction results. To increase the model's ability to detect anomalies, the behavior of the good disks needs to be well captured. Therefore, the parameter selection tests are performed with the time series from healthy disks.

Based on the behavior of each feature, the SMART features are categorized into four groups: time series with visible repeating patterns (Servo2, Servo10, Writes, Reads), fast-changing time series with no visible patterns (FlyHeight1, 6-13), slow-changing time series with dominating constant values (Servo5), and others (ReadError1, 3, 18). Figure 11 shows an example of the four types of features from Disk100349. Note that the read error family is categorized as others because most of the disks have a constant 0 value. Among the disks which do have read error values, a visible repeating pattern can be observed for good disks and a chaotic pattern for failed disks.

Figure 11. Examples of four types of SMART features

The time delay embedding is performed on the features with visible repeating patterns and the fast-changing features with no visible patterns. The selected approach to determine the time delay τ is the autocorrelation method, which returns the smallest τ such that u_i and u_{i+τ} are linearly uncorrelated on average. Then the false nearest neighbors (FNN) method is used to pick the minimum embedding dimension m at which the false nearest neighbors within a certain threshold vanish. For example, Figure 12 shows the parameter searching process for the Writes feature of Disk100365. The original time series has a visible repeating pattern. The time delay search log indicates that the autocorrelation is closest to 0 when τ = 3, and the time delay graph validates that τ = 3 can capture the pattern shown in the original time series. Finally, the FNN method is run with τ = 3; the proportion of false neighbors reaches its lowest when the embedding dimension m = 5.

Figure 12. Parameter search log for feature Writes of Disk100365

After repeating the same process for each feature, the selected embedding dimensions and time delay values are shown in Table 4.

Table 4. Parameter selection results for the attributes

Servo2 (time series with visible repeating patterns): m = 2, τ = 2
Servo5 (slow-changing time series with dominating constant values): m = 1, τ = 0
Servo10 (time series with visible repeating patterns): m = 2, τ = 2
Writes (time series with visible repeating patterns): m = 5, τ = 3
Reads (time series with visible repeating patterns): m = 6, τ = 3
FlyHeight1, 6-13 (fast-changing time series with no visible patterns): m = 1, τ = 1
ReadError1, 3, 18 (others): m = 1, τ = 0

The parameters suggested by the autocorrelation and FNN methods are validated by observing the corresponding recurrence plots. Figure 13 shows an example recurrence plot of Servo10 of Disk100365. Finally, the RQA measures are created and stored for the machine learning model implementation.

Figure 13. Example recurrence plot of Servo10 of Disk100365

4.3 Model Evaluation

The focus of the proposed failure detection system is on developing the recurrence measures from the SMART attributes. It is believed that recurrence measures from a proper phase space reconstruction will lead to a promising classification result without the need for a sophisticated machine learning model structure. In this sense, two basic binary classifiers, logistic regression and the decision tree, are chosen based on the model popularity found in the preceding literature review. To avoid overfitting the training data, the logistic regression model is trained with an L1 penalty, which corresponds to Lasso regression. Also, the decision tree is pruned with the cost-complexity pruning method.

Before plugging the RQA measures into the models, the imbalance between the classes raises concern. After the time series cropping process mentioned before, there are 174 good disks and 31 failed disks left in the dataset. A novel oversampling approach, the synthetic minority oversampling technique (SMOTE), is chosen to rebalance the record counts between the classes (Chawla et al., 2002). Different from random oversampling, SMOTE selects minority records that are close to each other and creates a new sample between the selected examples in the feature space.

The models are trained using 10-fold cross-validation. Table 5 presents the 95% confidence intervals of the evaluation metrics failure detection rate (FDR), false alarm rate (FAR), F1 score, and AUROC with data from two days in advance. In general, the decision trees outperform the logistic regression, with relatively high performance and low variance. Specifically, the decision tree with SMOTE oversampling achieves the best classification results.

Table 5. Performance evaluation metrics from different models (95% confidence interval)

Model FDR FAR F1 Score AUROC

Logistic Regression with Unmodified Data (51%, 88%) (1.6%, 8.6%) (0.54, 0.83) (0.62, 0.94)

Logistic Regression with SMOTE Oversampling (58%, 100%) (0%, 8.9%) (0.55, 0.90) (0.68, 0.97)

Decision Tree with Unmodified Data (80%, 100%) (0%, 6.8%) (0.80, 1) (0.89, 1)

Decision Tree with SMOTE Oversampling (85%, 100%) (0%, 6.7%) (0.70, 0.94) (0.86, 0.98)
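The evaluation setup behind Table 5 can be sketched with scikit-learn and imbalanced-learn, as below. The feature matrix here is a random placeholder standing in for the per-disk RQA measures, and the regularization strengths (C, ccp_alpha) are illustrative values, not the ones tuned in this work.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Placeholder features: one row of RQA measures per disk; 1 = failed, 0 = good.
rng = np.random.default_rng(0)
X = rng.normal(size=(205, 30))
y = np.r_[np.zeros(174), np.ones(31)]        # class sizes from the case study

models = {
    "logistic_l1": LogisticRegression(penalty="l1", solver="liblinear"),  # Lasso-style
    "tree_pruned": DecisionTreeClassifier(ccp_alpha=0.01, random_state=0),
}
for name, clf in models.items():
    # Putting SMOTE inside the pipeline keeps oversampling out of the test folds.
    pipe = Pipeline([("smote", SMOTE(random_state=0)), ("clf", clf)])
    scores = cross_validate(pipe, X, y, cv=10, scoring=("recall", "f1", "roc_auc"))
    print(name, {k: v.mean().round(3) for k, v in scores.items() if k.startswith("test_")})
```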

Table 6 presents the performance of the decision tree with SMOTE for different time-in-advance settings. From the table, we can conclude that the model is robust in producing failure detection results with a low false alarm rate.


Table 6. Performance of the decision tree with SMOTE for different times in advance

Time in Advance FDR FAR F1 Score AUROC

1 day (79%, 100%) (0%, 8.3%) (0.73, 0.96) (0.88, 1)

2 days (85%, 100%) (0%, 6.7%) (0.70, 0.94) (0.86, 0.98)

3 days (71%, 100%) (0%, 8.6%) (0.76, 0.96) (0.89, 0.99)

4 days (68%, 100%) (0%, 5.9%) (0.79, 0.98) (0.85, 0.98)

Observing the tree structure, the top attributes include the recurrence rate of Writes and Reads, the mean diagonal line length of Writes and Servo5, the vertical entropy of Servo5, and the trapping time of Reads. The interpretable rules related to these attributes have the potential to substitute for the current threshold-based SMART alarm system.

5. CONCLUSION

This work focused on hard disk drive failure detection with two contributions: a thorough review of the state-of-the-art applications of machine learning methods for SMART-based hard disk drive failure detection, and a recurrence-theorem-based prognostic hard disk drive failure detection method.

In the state-of-the-art literature review, the current methodologies in this field are presented following the generic machine learning process. We discovered that the research trend has been shifting from simple disk failure detection to RUL prediction over the years. Researchers tend to use open-source data collected from real-world data centers. Also, novel feature selection and imbalance treatment methods are preferred to improve model performance. Sophisticated modeling is often the highlight of the studies in this field. However, the complexity and the speed may constrain their implementation in real-world environments.

Inspired by the insights from the review, a recurrence-theorem-based hard disk drive prognostic method is proposed. The focus of this method is to capture the underlying time series structure through phase space reconstruction with carefully selected embedding parameters. After being converted to RQA measures, the data are passed to a basic decision tree for classification. A case study with the CMRR dataset validated the feasibility of this model. The embedding parameters were selected per feature. The basic decision tree model with SMOTE oversampling achieves an FDR and FAR with 95% confidence intervals of (85%, 100%) and (0%, 6.7%), respectively, when detecting failures two days in advance.
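
For concreteness, the core of this pipeline (delay embedding of one attribute, a thresholded recurrence matrix, and one RQA measure) can be sketched as follows. The parameters and the input signal are illustrative; the thesis selects the embedding dimension and delay per feature rather than fixing them as here.

    # Sketch: delay embedding -> recurrence matrix -> recurrence rate (RR).
    import numpy as np

    def delay_embed(x, dim, tau):
        """Takens-style delay embedding of a 1-D series."""
        n = len(x) - (dim - 1) * tau
        return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

    def recurrence_rate(x, dim=3, tau=2, eps=0.1):
        V = delay_embed(np.asarray(x, dtype=float), dim, tau)
        D = np.linalg.norm(V[:, None, :] - V[None, :, :], axis=-1)
        R = D <= eps    # recurrence matrix (line of identity kept for simplicity)
        return R.mean() # fraction of recurrent point pairs

    x = np.sin(np.linspace(0, 20, 200)) + 0.05 * np.random.randn(200)
    print(recurrence_rate(x))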

Future work includes: (a) validating the proposed method on a larger real-world dataset, such as real-time data collected from the Backblaze data center; (b) extending the binary classification to health-degree multiclass classification given sufficient data; (c) testing the possibility of classification with a convolutional neural network using recurrence plots as input.

References

Agarwal, V., Bhattacharyya, C., Niranjan, T., & Susarla, S. (2009, December). Discovering rules from disk events for predicting hard drive failures. In 2009 International Conference on Machine Learning and Applications (pp. 782-786). IEEE.

Alpaydin, E. (2020). Introduction to machine learning. MIT press.

Anantharaman, P., Qiao, M., & Jadav, D. (2018, July). Large scale predictive analytics for hard disk remaining useful life estimation. In 2018 IEEE International Congress on Big Data (BigData Congress) (pp. 251-254). IEEE.

Aussel, N., Jaulin, S., Gandon, G., Petetin, Y., Fazli, E., & Chabridon, S. (2017, December). Predictive models of hard drive failures based on operational data. In 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA) (pp. 619- 625). IEEE.

Botezatu, M. M., Giurgiu, I., Bogojeska, J., & Wiesmann, D. (2016, August). Predicting disk replacement towards reliable data centers. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 39-48).

Cao, H., Li, X. L., Woon, D. Y. K., & Ng, S. K. (2013). Integrated oversampling for imbalanced time series classification. IEEE Transactions on Knowledge and Data Engineering, 25(12), 2809-2822.

Chaves, I. C., de Paula, M. R. P., Leite, L. G., Gomes, J. P. P., & Machado, J. C. (2018, July). Hard disk drive failure prediction method based on a Bayesian network. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-7). IEEE.

Chaves, I. C., de Paula, M. R. P., Leite, L. G., Queiroz, L. P., Gomes, J. P. P., & Machado, J. C. (2016, October). Banhfap: A bayesian network based failure prediction approach for hard disk drives. In 2016 5th Brazilian Conference on Intelligent Systems (BRACIS) (pp. 427-432). IEEE.

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357.

dos Santos Lima, F. D., Amaral, G. M. R., de Moura Leite, L. G., Gomes, J. P. P., & de Castro Machado, J. (2017, October). Predicting failures in hard drives with LSTM networks. In 2017 Brazilian Conference on Intelligent Systems (BRACIS) (pp. 222-227). IEEE.

dos Santos Lima, F. D., Pereira, F. L. F., Chaves, I. C., Gomes, J. P. P., & de Castro Machado, J. (2018, October). Evaluation of recurrent neural networks for hard disk drives failure prediction. In 2018 7th Brazilian Conference on Intelligent Systems (BRACIS) (pp. 85-90). IEEE.

Farmer, J. D. (1982). Chaotic attractors of an infinite-dimensional dynamical system. Physica D: Nonlinear Phenomena, 4(3), 366-393.

Fernández, A., García, S., Galar, M., Prati, R. C., Krawczyk, B., & Herrera, F. (2018). Learning from imbalanced data sets (pp. 1-377). Berlin: Springer.

Franklin, P. H. (2017, January). Predicting disk drive failure using condition based monitoring. In 2017 Annual Reliability and Maintainability Symposium (RAMS) (pp. 1-5). IEEE.

Fraser, A. M., & Swinney, H. L. (1986). Independent coordinates for strange attractors from mutual information. Physical review A, 33(2), 1134.

Ganguly, S., Consul, A., Khan, A., Bussone, B., Richards, J., & Miguel, A. (2016, March). A practical approach to hard disk failure prediction in cloud platforms: Big data model for failure management in datacenters. In 2016 IEEE Second International Conference on Big Data Computing Service and Applications (BigDataService) (pp. 105-116). IEEE.

Hadjieleftheriou, M., Papadopoulos, A. N., & Zhang, D. (2007). Disk Storage and Basic File Structures. Northeastern University.

Hamerly, G., & Elkan, C. (2001, June). Bayesian approaches to failure prediction for disk drives. In ICML (Vol. 1, pp. 202-209).

Hughes, G. F., Murray, J. F., Kreutz-Delgado, K., & Elkan, C. (2002). Improved disk-drive failure warnings. IEEE transactions on reliability, 51(3), 350-357.

Jiang, T., Huang, P., & Zhou, K. (2020). Cost‐efficiency disk failure prediction via threshold‐moving. Concurrency and Computation: Practice and Experience, e5669.

Kanungo, T., Mount, D. M., Netanyahu, N. S., Piatko, C. D., Silverman, R., & Wu, A. Y. (2002). An efficient k-means clustering algorithm: Analysis and implementation. IEEE transactions on pattern analysis and machine intelligence, 24(7), 881-892.

Katok, A., & Hasselblatt, B. (1997). Introduction to the modern theory of dynamical systems (Vol. 54). Cambridge university press.

Kaur, K., & Kaur, K. (2018, July). Failure prediction and health status assessment of storage systems with decision trees. In International Conference on Advanced Informatics for Computing Research (pp. 366-376). Springer, Singapore.

Kaur, K., & Kaur, K. (2019). Failure Prediction, Lead Time Estimation and Health Degree Assessment for Hard Disk Drives Using Voting Based Decision Trees. CMC- COMPUTERS MATERIALS & CONTINUA, 60(3), 913-946.

Kennel, M. B., Brown, R., & Abarbanel, H. D. (1992). Determining embedding dimension for phase-space reconstruction using a geometrical construction. Physical review A, 45(6), 3403.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. nature, 521(7553), 436-444.

Lesser, M. L., & Haanstra, J. W. (1957). The Random-Access Memory Accounting Machine—I. System Organization of the IBM 305. IBM Journal of Research and Development, 1(1), 62-71.

Li, J., Ji, X., Jia, Y., Zhu, B., Wang, G., Li, Z., & Liu, X. (2014, June). Hard drive failure prediction using classification and regression trees. In 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (pp. 383-394). IEEE.

Li, J., Stones, R. J., Wang, G., Li, Z., Liu, X., & Xiao, K. (2016, September). Being accurate is not enough: New metrics for disk failure prediction. In 2016 IEEE 35th Symposium on Reliable Distributed Systems (SRDS) (pp. 71-80). IEEE.

Li, J., Stones, R. J., Wang, G., Liu, X., Li, Z., & Xu, M. (2017). Hard drive failure prediction using decision trees. Reliability Engineering & System Safety, 164, 55-65.

Li, Q., Li, H., & Zhang, K. (2019, October). Prediction of HDD Failures by Ensemble Learning. In 2019 IEEE 10th International Conference on and Service Science (ICSESS) (pp. 237-240). IEEE.

Lima, F. D. S., Pereira, F. L. F., Leite, L. G., Gomes, J. P. P., & Machado, J. C. (2018, July). Remaining useful life estimation of hard disk drives based on deep neural networks. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-7). IEEE.

Liu, D., Wang, B., Li, P., Stones, R. J., Marbach, T. G., Wang, G., ... & Li, Z. (2019, December). Predicting Hard Drive Failures for Cloud Storage Systems. In International Conference on Algorithms and Architectures for Parallel Processing (pp. 373-388). Springer, Cham.

Marwan, N., Romano, M. C., Thiel, M., & Kurths, J. (2007). Recurrence plots for the analysis of complex systems. Physics reports, 438(5-6), 237-329.

Mashhadi, A. R., Cade, W., & Behdad, S. (2018). Moving towards real-time data-driven quality monitoring: a case study of hard disk drives. Procedia Manufacturing, 26, 1107-1115.

McLeod, S. (2005). SMART anti-forensics. Forensic Focus.

Mindlin, G. M., & Gilmore, R. (1992). Topological analysis and synthesis of chaotic time series. Physica D: Nonlinear Phenomena, 58(1-4), 229-242.

Murray, J. F., Hughes, G. F., & Kreutz-Delgado, K. (2003, June). Hard drive failure prediction using non-parametric statistical methods. In Proceedings of ICANN/ICONIP.

Murray, J. F., Hughes, G. F., & Kreutz-Delgado, K. (2005). Machine learning methods for predicting failures in hard drives: A multiple-instance application. Journal of Machine Learning Research, 6(May), 783-816.

Pang, S., Jia, Y., Stones, R., Wang, G., & Liu, X. (2016, July). A combined Bayesian network method for predicting drive failure times from SMART attributes. In 2016 International Joint Conference on Neural Networks (IJCNN) (pp. 4850-4856). IEEE.

Pereira, F. L. F., dos Santos Lima, F. D., de Moura Leite, L. G., Gomes, J. P. P., & de Castro Machado, J. (2017, October). Transfer learning for Bayesian networks with application on hard disk drives failure prediction. In 2017 Brazilian Conference on Intelligent Systems (BRACIS) (pp. 228-233). IEEE.

Pereira, F., Teixeira, D., Gomes, J. P., & Machado, J. (2019, October). Evaluating One- Class Classifiers for Fault Detection in Hard Disk Drives. In 2019 8th Brazilian Conference on Intelligent Systems (BRACIS) (pp. 586-591). IEEE.

Pitakrat, T., Van Hoorn, A., & Grunske, L. (2013, June). A comparison of machine learning algorithms for proactive hard disk drive failure detection. In Proceedings of the 4th international ACM Sigsoft symposium on Architecting critical systems (pp. 1-10).

Rincón, C. A., Pâris, J. F., Vilalta, R., Cheng, A. M., & Long, D. D. (2017, July). Disk failure prediction in heterogeneous environments. In 2017 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS) (pp. 1-7). IEEE.

Shen, J., Wan, J., Lim, S. J., & Yu, L. (2018). Random-forest-based failure prediction for hard disk drives. International Journal of Distributed Sensor Networks, 14(11), 1550147718806480.

Su, C. J., & Huang, S. F. (2018). Real-time big data analytics for hard disk drive predictive maintenance. Computers & Electrical Engineering, 71, 93-101.

Su, C. J., & Yon, J. A. Q. (2018). Big Data Preventive Maintenance for Hard Disk Failure Detection. International Journal of Information and Education Technology, 8(7).

Suchatpong, T., & Bhumkittipich, K. (2014, May). Hard Disk Drive failure mode prediction based on industrial standard using decision tree learning. In 2014 11th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) (pp. 1-4). IEEE.

Takens, F. (1981). Detecting strange attractors in turbulence. In Dynamical systems and turbulence, Warwick 1980 (pp. 366-381). Springer, Berlin, Heidelberg.

Teoh, T. T., Cho, S. Y., & Nguwi, Y. Y. (2012, July). Hidden markov model for hard-drive failure detection. In 2012 7th International Conference on Computer Science & Education (ICCSE) (pp. 3-8). IEEE.

Thiel, M., Romano, M. C., Kurths, J., Meucci, R., Allaria, E., & Arecchi, F. T. (2002). Influence of observational noise on the recurrence quantification analysis. Physica D: Nonlinear Phenomena, 171(3), 138-152.

Tucker, W. (1999). The Lorenz attractor exists. Comptes Rendus de l'Académie des Sciences-Series I-Mathematics, 328(12), 1197-1202.

Wang, H., & Zhang, H. (2020, January). AIOPS Prediction for Hard Drive Failures Based on Stacking Ensemble Model. In 2020 10th Annual Computing and Communication Workshop and Conference (CCWC) (pp. 0417-0423). IEEE.

Wang, J., Bao, W., Zheng, L., Zhu, X., & Yu, P. S. (2019). An Attention-augmented Deep Architecture for Hard Drive Status Monitoring in Large-scale Storage Systems. ACM Transactions on Storage (TOS), 15(3), 1-26.

Wang, Y., Jiang, S., He, L., Peng, Y., & Chow, T. W. (2019, July). Hard Disk Drives Failure Detection Using A Dynamic Tracking Method. In 2019 IEEE 17th International Conference on Industrial Informatics (INDIN) (Vol. 1, pp. 1473-1477). IEEE.

Wang, Y., Ma, E. W., Chow, T. W., & Tsui, K. L. (2013). A two-step parametric method for failure prediction in hard disk drives. IEEE Transactions on industrial informatics, 10(1), 419-430.

Wang, Y., Miao, Q., & Pecht, M. (2011, May). Health monitoring of hard disk drive based on Mahalanobis distance. In 2011 Prognostics and System Health Management Conference (pp. 1-8). IEEE.

Wang, Y., Miao, Q., Ma, E. W., Tsui, K. L., & Pecht, M. G. (2013). Online anomaly detection for hard disk drives based on mahalanobis distance. IEEE Transactions on Reliability, 62(1), 136-145.

Whitney, H., Eells, J., & Toledo, D. (1992). Collected Papers of Hassler Whitney. Nelson Thornes.

Xiao, J., Xiong, Z., Wu, S., Yi, Y., Jin, H., & Hu, K. (2018, August). Disk failure prediction in data centers via online learning. In Proceedings of the 47th International Conference on Parallel Processing (pp. 1-10).

Xie, Y., Feng, D., Wang, F., Zhang, X., Han, J., & Tang, X. (2018, October). OME: An Optimized Modeling Engine for Disk Failure Prediction in Heterogeneous Datacenter. In 2018 IEEE 36th International Conference on Computer Design (ICCD) (pp. 561-564). IEEE.

Xu, C., Wang, G., Liu, X., Guo, D., & Liu, T. Y. (2016). Health status assessment and failure prediction for hard drives with recurrent neural networks. IEEE Transactions on Computers, 65(11), 3502-3508.

Xu, Y., Sui, K., Yao, R., Zhang, H., Lin, Q., Dang, Y., ... & Chintalapati, M. (2018). Improving service availability of cloud systems by predicting disk error. In 2018 USENIX Annual Technical Conference (USENIX ATC 18) (pp. 481-494).

Yang, J., & Sun, F. B. (1999, January). A comprehensive review of hard-disk drive reliability. In Annual Reliability and Maintainability. Symposium. 1999 Proceedings (Cat. No. 99CH36283) (pp. 403-409). IEEE.

Yang, W., Hu, D., Liu, Y., Wang, S., & Jiang, T. (2015, September). Hard drive failure prediction using big data. In 2015 IEEE 34th Symposium on Reliable Distributed Systems Workshop (SRDSW) (pp. 13-18). IEEE.

Yu, J. (2019, November). Hard disk Drive Failure Prediction Challenges in Machine Learning for Multi-variate Time Series. In Proceedings of the 2019 3rd International Conference on Advances in Image Processing (pp. 144-148).

Zbilut, J. P., & Webber Jr, C. L. (1992). Embeddings and delays as derived from quantification of recurrence plots. Physics Letters A, 171(3-4), 199-203.

Zbilut, J. P., Zaldivar-Comenges, J. M., & Strozzi, F. (2002). Recurrence quantification based Liapunov exponents for monitoring divergence in experimental data. Physics Letters A, 297(3-4), 173-181.

Zeid, A., & Kamarthi, S. (2010, January). Assessment of Current Health of Hard Disk Drives. In Smart Materials, Adaptive Structures and Intelligent Systems (Vol. 44168, pp. 793-796).

Zhang, J., Wang, J., He, L., Li, Z., & Philip, S. Y. (2018, November). Layerwise perturbation-based adversarial training for hard drive health degree prediction. In 2018 IEEE International Conference on Data Mining (ICDM) (pp. 1428-1433). IEEE.

Zhao, Y., Liu, X., Gan, S., & Zheng, W. (2010, July). Predicting disk failures with HMM- and HSMM-based approaches. In Industrial Conference on Data Mining (pp. 390- 404). Springer, Berlin, Heidelberg.

Zhu, B., Wang, G., Liu, X., Hu, D., Lin, S., & Ma, J. (2013, May). Proactive drive failure prediction for large scale storage systems. In 2013 IEEE 29th symposium on mass storage systems and technologies (MSST) (pp. 1-5). IEEE.

Züfle, M., Krupitzer, C., Erhard, F., Grohmann, J., & Kounev, S. (2020, March). To Fail or Not to Fail: Predicting Hard Disk Drive Failure Time Windows. In International Conference on Measurement, Modelling and Evaluation of Computing Systems (pp. 19-36). Springer, Cham.