Hard Disk Drive Failure Detection with Recurrence Quantification Analysis
A Thesis Presented by Wei Li to the Department of Mechanical & Industrial Engineering in partial fulfillment of the requirements for the degree of Master of Science in the field of Industrial Engineering

Northeastern University
Boston, Massachusetts
August 2020

ACKNOWLEDGEMENTS

I would like to express my sincere appreciation to my thesis advisor, Professor Sagar Kamarthi, who has been providing me with guidance, encouragement and patience throughout the duration of this project. I would also like to extend my gratitude to Professor Srinivasan Radhakrishnan for his inspiration and guidance.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1. INTRODUCTION
1.1 Hard Disk Drive
1.2 Self-Monitoring Analysis and Reporting Technology (SMART)
1.3 Research Objective
2. STATE-OF-THE-ART LITERATURE REVIEW
2.1 Searching Criteria
2.2 Information Extraction Method
2.3 Result
2.3.1 Problem Definition
2.3.2 Data Source
2.3.2 Feature Selection
2.3.3 Preprocessing and Imbalanced Data Treatment
2.3.4 Model Selection
2.3.5 Performance Evaluation Metrics
2.4 Discussion
3. Methodology
3.1 Problem Definition
3.2 Recurrent Features Generation
3.2.1 Phase Space Reconstruction
3.2.2 Recurrence Plot
3.2.3 Recurrence Quantification Measurements
3.3 Machine Learning Approaches
3.4 Performance Evaluation
4. CASE STUDY
4.1 Data Preparation
4.2 Recurrence Features Generation
4.3 Model Evaluation
5. CONCLUSION
Reference

LIST OF TABLES

Table 1. Summary of three popular open-source datasets
Table 2. Overview of feature selection methods
Table 3. Summary of Frequently Used RQAs
Table 4. Parameter selection results for the attributes
Table 5. Performance evaluation metrics from different models (95% confidence interval)
Table 6. Performance of decision tree without SMOTE for different time in advance

LIST OF FIGURES

Figure 1. Components of a hard disk drive with single platter
Figure 2. The columns of the information extraction table
Figure 3. Illustration of three major problem definitions
Figure 4. Frequency of the three problem definitions over years
Figure 5. Proportion of data collection methods
Figure 6. Summary of proposed machine learning models
Figure 7. Popularity of the models over years
Figure 8. Proposed hard disk prognostic system
Figure 9. Graph Representative of a Lorenz Attractor (s = 10, r = 25, b = 8/3)
Figure 10. Steps to trim the time series
Figure 11. Examples of four types of SMART features
Figure 12. Parameter search log for feature Writes of Disk100365
Figure 13. Example recurrence plot of Servo10 of Disk100365

ABSTRACT

The need for fast and reliable data storage and management has been immense since the advent of cloud computing. Prognostic techniques such as the Self-Monitoring Analysis and Reporting Technology (SMART) provide status monitoring and failure detection for hard disk drives, the most widely used data storage devices in industrial settings. However, the original threshold method fails to yield a satisfactory failure detection rate. Recently, researchers have developed various machine-learning-based techniques to make full use of the SMART attributes and improve the failure detection rate while controlling the false alarm rate. This work first provides a review of state-of-the-art SMART-based machine learning applications. Covering 51 publications from 2001 to 2020, the review synthesizes methodologies along the steps of the machine learning workflow, reveals research trends over time, and points out some future research possibilities.
Then a recurrence-theorem-based prognostic method for hard disk drive failure detection is proposed. To capture the time series structure, the system first converts the original SMART data to recurrence quantification analysis (RQA) measures using time-delay embedding and then passes them to a binary classifier. The system is tested on a dataset of 369 hard disk drives. When detecting errors two days in advance, the basic decision tree model with SMOTE oversampling achieves a failure detection rate (FDR) and a false alarm rate (FAR) with 95% confidence intervals of (85%, 100%) and (0%, 6.7%), respectively.

1. INTRODUCTION

The Industrial Internet of Things (IIoT) has bridged the gap between operational and information technology. Correspondingly, cloud-based data analytics platforms have improved production efficiency and quality by reducing local development and maintenance effort. To ensure high quality of continuous and diverse cloud services, maintaining highly reliable data storage is a critical task. Therefore, prognostic health assessment of the hard disk drive, the preferred data storage device in industrial settings, has become essential for the smooth running of processes at data centers.

1.1 Hard Disk Drive

The introduction of the stored-program computer by John von Neumann in 1945 marked the beginning of general-purpose computers, in which the central processing unit is connected to storage devices that store both data and program instructions (Lesser & Haanstra, 1957). Memory is an essential component of any modern computer. The memory devices used in modern computers can be categorized into three types: primary, secondary, and tertiary memory devices (Hadjieleftheriou et al., 2007). The primary memory devices allow the CPU to read and write data at the fastest possible speed. These devices tend to have small memory