
The 7th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems, July 31-August 4, 2017, Hawaii, USA

Inferring File System of Solid State Drives based on Current Consumption

Jacob Melton, Ryan Rakvic, James Shey, Hau Ngo, Kevin D. Fairbanks, Owens Walker, Justin Blanco, Dane Brown, Luke McDowell
Electrical and Computer Engineering Department
United States Naval Academy
Annapolis, MD, US

Abstract—With the increasing demand for faster, reliable secondary storage, Solid State Drives (SSDs) have provided a viable replacement for Hard Disk Drives (HDDs). SSDs contain NAND components and a processor that executes device-level firmware to optimize performance. The on-board processor and firmware handle operations such as garbage collection and encryption with no visibility to the user. Therefore, classifying SSD internal behavior can help identify compromised devices. This paper uses high-precision measurements of the power drawn by an SSD, taken via an oscilloscope, to infer a drive's file system. We consider four file systems (NTFS, exFAT, FAT32 and EXT4) and demonstrate that frequency analysis of power consumption can identify the file system in use. In particular, we show that transforming the frequency-domain power signature with principal component analysis can produce a small number of highly predictive features. Using a k-NN classifier, we then demonstrate that these features enable an SSD's power signature to identify the correct file system with 94.4 percent accuracy on a Samsung SSD and with 96.5 percent accuracy on a Crucial SSD.

I. INTRODUCTION

A Solid State Drive (SSD) operates and stores data in a fundamentally different way from a Hard Disk Drive (HDD). SSDs use NAND-based flash memory, which has no moving magnetic components, while HDDs have rotating disks. More specifically, NAND flash memory uses an array of memory cells made up of floating gate transistors managed by the flash translation layer [1]. The transistors are arranged in columns connected in series, with the source terminal of a given transistor connected to the drain terminal of the next transistor. Many of these columns are placed together to form a memory block. Block sizes typically range from 256KB to 4MB and, due to the arrangement of the flash memory, data has to be erased in blocks [2]. Since SSDs use NAND flash memory, they have lower access times, higher data transfer rates, and lower power consumption than HDDs, which manipulate a magnetic film to record information. A controller on the SSD serves as a bridge to the host computer, executes garbage collection, and performs related operations [3].

When using a secondary storage device such as an SSD, a file system is used to organize data in memory. A file system provides a method to organize physical data locations so that the data can be tracked and accessed. Some file systems such as NTFS provide additional functions such as fault tolerance, which provides an additional record of file locations in case of unforeseen errors [4]. This extra fault tolerance is a result of using a file system's journaling to track what data and metadata have not been fully committed. In the event of a system crash, the journal allows the drive to quickly return to a consistent state. This paper investigates the file systems NTFS, exFAT, FAT32 and EXT4. EXT4 and NTFS are journaling file systems while exFAT and FAT32 are not. All of these file systems are commonly used in modern computing systems.

The main contribution of this paper is a measurement and analysis technique for inferring the file system by observing the current consumption of the measured device. Knowing the file system used on a drive can help identify what Operating System (OS) is being used with the SSD. Since OSs have default file systems, a method to determine the file system on a device can give information about the OS. It is important for investigators to know the OS being used for a computer system. The OS controls how files are being accessed not only on the SSD but throughout the whole system [7]. Additionally, being able to classify the file system an SSD is formatted with, based on power consumption, can be used to identify malfunction or other abnormal behavior [5,6]. For instance, a unique component of SSDs compared to HDDs is the onboard controller. The user has no visibility into the controller's operations and therefore cannot determine if the firmware being executed contains malicious operations. Devices can contain Trojan circuits that compromise the security and integrity of information stored on the SSD. Therefore, it is important to develop techniques to analyze SSD behavior to ensure the device is performing as expected. In this work, we classify file systems using the current consumption signatures of read and write operations of varying size.

U.S. Government work not protected by U.S. copyright

II. RELATED WORK

With their unique advantages, SSDs continue to gain popularity with consumers. In [3, 9, 10], the performance of SSDs is analyzed in different situations and potential improvements are proposed. For example, in [3] the performance of the TRIM operation is analyzed in the EXT4 [15] and NTFS file systems. The TRIM operation is shown to degrade performance of the SSD over time.

In [11], the authors use a black box approach to analyze the performance of several different SSD models using a variety of workloads. Creating a model for SSD behavior is important for instances such as designing a storage system. The authors of [12] explain how the unique components of SSDs, such as the controller, contribute to an inability to recover data from drives in a computer forensic situation. As SSDs get faster, more aggressive firmware programs store and permanently delete data in manners that affect recovery of information.

The authors of [8] present a method for inferring TRIM commands, which can be seen as the foundation of work that our team builds upon in this paper. These results show that power analysis techniques can be used to infer or gather more information about the internal operations of SSDs. In [13], a similar energy consumption monitoring technique is used to identify read and write operations across different manufacturers. Read and write operations for different SSDs exhibit similar energy consumption characteristics.

III. EXPERIMENTAL DESIGN

This section describes the experimental setup and the software used for data collection and analysis. The apparatus was developed in a previous study [8] to monitor SSD power consumption.

A. Hardware

In this section the physical components of the experiment are discussed.

1) Computer System: The computer used to program and interface with the SSD has an Intel Core i5-2400 processor with 8 GB DDR3 RAM. During the data collection phase, Windows 10 was running on the computer. The system had two drives mounted to it: the first was an ADATA SP600 SSD used to store the operating system and software required for the experiments, and the second was the target SSD. The target SSD was mounted as a secondary drive to ensure that the resulting current signature was unrelated to the OS operations occurring during the experiment.

2) Taking Measurements from SSD: The measurements were gathered from a daughterboard where a 0.1 Ohm precision resistor was placed in series with the power supply to the SSD, as seen in Fig. 1(a). By measuring the voltage (V) across a resistor (R) of known value, the current (I) supplied can be calculated through the equation V=IR. This method creates a way to monitor the current flowing directly into the SSD. The voltages were measured with 14 bits/sample at a rate of 200 kSamples/s using a GEN3i High Speed data recorder (HBM, Darmstadt, Germany) [14], as seen in Fig. 1(b). The data recorder was triggered externally by a voltage pulse sent over a USB-to-Serial connection to synchronize the start and stop times of the data runs.

3) SSD: The target SSDs were a Samsung 850 Evo device with a size of 250GB and a Crucial MX200 device with a size of 250GB. The Samsung's model was MZ-75E250, with a read speed of 425 MB/s and a write speed of 386 MB/s. The Crucial's model was CT500MX200SSD1, with a read speed of 555MB/s and a write speed of 500MB/s.

B. Data Collection and Analysis

1) Data Collection: The data collection process is semi-automated, with the manual step being the formatting of the drive with the desired file system of NTFS, exFAT, FAT32 or EXT4. Once the file system is chosen, a Python script creates files of a user-specified size ranging from 10MB to 1GB composed of random characters. Each of these text files is then written to and read from the SSD fifteen times while the current consumption of the device is being recorded. The same random text files are used for testing with both SSDs and all of their file systems. A total of 1800 current data files were gathered for the testing of the four file systems on each SSD. A minimum of two minutes is waited between the end of an operation and the initiation of another to ensure that any residual processes on the SSD are completed before the new operation. The end of an operation is determined when the host computer has finished writing to or reading from the SSD. Two minutes was chosen because this amount of time ensures that there are no visually discernible signs of activity on the host computer or in the voltage signal resulting from the operation on the SSD. Subsequently, the SSD was reformatted with a different file system and this data gathering process was repeated.

Fig. 1. System Setup including (a) Daughterboard and (b) Data Recorder [8]

2) Data Analysis: Since the data gathered by the GEN3i data recorder were voltage measurements, the data points were converted to current values. A k-nearest neighbors (k-NN) classifier was then created to classify current consumption data files as indicative of either the NTFS, EXT4, FAT32, or exFAT file system. The data analysis had two phases. The first was to extract features derived from the frequency domain as a representation of the signal, to be used as classifier inputs. The second was to build and test the performance of the classifier.

Fig. 2. Power Spectral Density of 100MB writes across NTFS, exFAT, FAT32, and EXT4 from 0 Hz to 10,000 Hz. The y-axis magnitude is the PSD in dB, 10log10(V^2/Hz).

a) Feature Extraction: For each current consumption signature the power spectral density (PSD) was estimated (Welch method, 80 ms Hamming windows with 50% overlap). Since the signals were sampled at 200 kSamples/s, the Nyquist frequency is 100 kHz. For each current consumption signal, the frequency axis was divided into 100 ranges of 1,000 Hz and the power in each range was integrated to get 100 values to represent each signal. Fig. 2 shows a power spectral density plot of a 100MB write across the tested file systems, truncated from zero to 10,000 Hz. Once these values were obtained, a principal component analysis (PCA) was performed and the number of principal components needed to preserve at least 90% of the total data variance (three) was retained. These top three principal component values were then used to represent each current consumption signature. In the case of the Samsung drive, these top three principal components accounted for 98.3% of the variance in the signals. In the case of the Crucial drive, these top three principal components accounted for 95.1% of the variance.

b) k-NN Classifier: The extracted features for each operation were used to build a k-NN (k=1; Euclidean distance) classifier to predict file system type. The extracted features were placed in one of four groups depending on what file system was used for the original operations. Classifier performance was evaluated on unseen data using 5-fold cross-validation. Each held-out input was assigned the file system of its nearest neighbor in feature space within the training set.

IV. EXPERIMENTAL RESULTS

Data were collected on SSDs from two manufacturers for four different file systems. Specifically, read and write operations were executed on the SSD for file sizes of 10MB, 20MB, 30MB, 50MB, 75MB, 100MB, 200MB, 300MB, 400MB, 500MB, 600MB, 700MB, 800MB, 900MB, and 1GB for the file systems NTFS, exFAT, FAT32, and EXT4.

For each file system, SSD and file size combination, fifteen write operations and fifteen read operations were executed. Each SSD was tested while mounted on a Windows 10 operating system. The collected data were randomly divided into five equally sized groups. Each group contained an equal number of read and write operations. 5-fold cross-validation was then used to estimate the accuracy of the k-NN classifier. In particular, the classifier was trained on four sets of data and tested on the fifth set to determine its accuracy; then a new set was chosen to be tested while the classifier was trained on the remaining four sets. This process was repeated until all five sets had been used as a test set, and the resulting five accuracy measurements were averaged to estimate the generalization performance of the classifier. Figures 3 and 4 show the training data in a feature space for the Samsung and Crucial drives, respectively. Each shade in Fig. 3 and 4 represents a different file system.

Fig. 3. Training Data of Principal Components of NTFS, exFAT, FAT32, and EXT4 file systems for Samsung Drive

TABLE I. SAMSUNG SSD CONFUSION MATRIX

                             Actual File System
Predicted File System    NTFS    FAT32    EXT4    exFAT
NTFS                      430       18       3        8
FAT32                       5      423       0        3
EXT4                       13        4     431       24
exFAT                       2        5      16      415

TABLE II. CRUCIAL SSD CONFUSION MATRIX

                             Actual File System
Predicted File System    NTFS    FAT32    EXT4    exFAT
NTFS                      438        1       8        2
FAT32                       0      449       0        1
EXT4                       12        0     422       19
exFAT                       0        0      20      428
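The evaluation procedure described in Sections III-B and IV (a k=1 Euclidean nearest-neighbor classifier scored with 5-fold cross-validation) can be illustrated with the minimal sketch below. It is not the authors' code. The helper names, the synthetic three-dimensional points standing in for the three principal components, and the choice to balance folds by file-system label are all assumptions made for illustration; the paper itself states only that each group held an equal number of read and write operations.

```python
import math
import random
from collections import defaultdict


def nearest_label(train, query):
    """Return the label of the training point closest to query (k=1, Euclidean)."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


def stratified_folds(samples, n_folds, seed=0):
    """Shuffle samples and deal them into folds so each label is spread evenly.

    Balancing by label is an assumption of this sketch, not a detail from the paper.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in samples:
        by_label[label].append((features, label))
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        for i, sample in enumerate(group):
            folds[i % n_folds].append(sample)
    return folds


def cross_validated_accuracy(samples, n_folds=5, seed=0):
    """Train on n_folds - 1 folds, test on the remaining one, and average."""
    folds = stratified_folds(samples, n_folds, seed)
    accuracies = []
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [s for i, fold in enumerate(folds) if i != held_out for s in fold]
        correct = sum(nearest_label(train, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / n_folds


def make_toy_samples(per_class=20, seed=1):
    """Synthetic, well-separated 3-D clusters (hypothetical data, not measurements)."""
    rng = random.Random(seed)
    centers = {"NTFS": (0, 0, 0), "FAT32": (10, 0, 0),
               "EXT4": (0, 10, 0), "exFAT": (0, 0, 10)}
    samples = []
    for label, center in centers.items():
        for _ in range(per_class):
            point = tuple(c + rng.uniform(-1, 1) for c in center)
            samples.append((point, label))
    return samples


toy = make_toy_samples()
accuracy = cross_validated_accuracy(toy)
print(f"5-fold 1-NN accuracy on toy data: {accuracy:.3f}")
```

Because the toy clusters are far apart relative to their spread, every held-out point has a same-class nearest neighbor; real current-consumption features overlap, which is why the measured accuracies fall below 100 percent.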

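As a consistency check, the overall accuracies can be recomputed directly from the counts in Tables I and II. The short sketch below assumes only the table values; the function and variable names are illustrative. Rows are predicted file systems and columns are actual file systems, so the correct classifications lie on the diagonal.

```python
# Rows: predicted file system; columns: actual file system.
# Order in both directions: NTFS, FAT32, EXT4, exFAT (as in Tables I and II).
LABELS = ["NTFS", "FAT32", "EXT4", "exFAT"]

SAMSUNG = [
    [430, 18, 3, 8],
    [5, 423, 0, 3],
    [13, 4, 431, 24],
    [2, 5, 16, 415],
]
CRUCIAL = [
    [438, 1, 8, 2],
    [0, 449, 0, 1],
    [12, 0, 422, 19],
    [0, 0, 20, 428],
]


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """Fraction of each actual class (column) that was predicted correctly."""
    n = len(matrix)
    return [matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)]


for name, matrix in (("Samsung", SAMSUNG), ("Crucial", CRUCIAL)):
    print(f"{name}: {overall_accuracy(matrix) * 100:.1f} percent overall")
    for label, recall in zip(LABELS, per_class_recall(matrix)):
        print(f"  {label}: {recall * 100:.1f} percent of actual samples recovered")
# Samsung: 94.4 percent overall (1699 of 1800)
# Crucial: 96.5 percent overall (1737 of 1800)
```

Each column sums to 450, which matches 15 file sizes times 30 operations per file system on each drive.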
The classifier correctly identified the file system of the test data with 94.4 percent accuracy for the Samsung drive and 96.5 percent accuracy for the Crucial drive. Tables I and II are confusion matrices for the Samsung and Crucial drives, respectively, giving the breakdown of classifier-predicted and actual file systems across all test samples. The tables suggest that some misclassification error types may be common across drives. For example, EXT4 and exFAT, when misclassified, are commonly confused with each other on both the Crucial and Samsung drives. The high accuracy of the k-NN classifier supports the conclusion that different file systems cause different energy usage patterns, and that this is true across SSD manufacturers.

Fig. 4. Training Data of Principal Components of NTFS, exFAT, FAT32, and EXT4 file systems for Crucial Drive

V. CONCLUSIONS AND FUTURE WORK

This paper demonstrates that it is possible to use current measurements from an SSD to infer the file system on the device. In particular, the results indicate that by appropriately capturing and processing voltage measurements taken across a resistor in series with the power supply, we could infer the file system type with better than 94% accuracy on SSDs from two different manufacturers.

While the results indicate differences in current consumption across different file systems, more work can be done to explore a greater range of file sizes. Other potential avenues for future research include expanding the operations beyond read and write operations and increasing the number of SSD models tested. Additionally, real-time hardware-accelerated classification of the current signatures could be pursued. Finally, expanding the power consumption analysis further away from the SSD, potentially to the computer's power source, would address the question of how far away from the SSD operations can be inferred.

ACKNOWLEDGMENTS

The design team would like to thank the Naval Academy lab technicians and the members of the United States Naval Academy Digital Forensics Lab for their help throughout this project. The team would also like to thank ASRC Federal Mission Solutions for their generous support during the project.

REFERENCES

[1] Yan, Wei, Xuguang Wang, and Xujin Yu. "Design and implementation of an efficient flash-based SSD architecture." In Information Science and Technology (ICIST), 2014 4th IEEE International Conference on, pp. 79-83. IEEE, 2014.
[2] Midorikawa, Hiroko, Hideyuki Tan, and Toshio Endo. "An evaluation of the potential of flash SSD as large and slow memory for stencil computations." In High Performance Computing & Simulation (HPCS), 2014 International Conference on, pp. 268-277. IEEE, 2014.
[3] Kim, Giryoung, and Dongkun Shin. "Performance analysis of SSD write using TRIM in NTFS and EXT4." In Computer Sciences and Convergence Information Technology (ICCIT), 2011 6th International Conference on, pp. 422-423. IEEE, 2011.
[4] Bairavasundaram, Lakshmi N., Meenali Rungta, Nitin Agrawal, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Michael M. Swift. "Analyzing the effects of disk-pointer corruption." In Dependable Systems and Networks With FTCS and DCC, 2008. DSN 2008. IEEE International Conference on, pp. 502-511. IEEE, 2008.
[5] M. Tehranipoor and F. Koushanfar, “A survey of hardware trojan taxonomy and detection,” Design Test, IEEE, vol. PP, no. 99, pp. 1–1, 2013.
[6] Wang, Xiaoxiao, Mohammad Tehranipoor, and Jim Plusquellic. "Detecting malicious inclusions in secure hardware: Challenges and solutions." In Hardware-Oriented Security and Trust, 2008. HOST 2008. IEEE International Workshop on, pp. 15-19. IEEE, 2008.
[7] Pecherle, George, Cornelia Győrödi, Robert Győrödi, Bogdan Andronic, and Iosif Ignat. "New method of detection and wiping of sensitive information." In Intelligent Computer Communication and Processing (ICCP), 2011 IEEE International Conference on, pp. 145-148. IEEE, 2011.
[8] Shey, James, Ryan Rakvic, Hau Ngo, Owens Walker, Thomas Tedesso, Justin A. Blanco, and Kevin Fairbanks. "Inferring trimming activity of solid-state drives based on energy consumption." In Instrumentation and Measurement Technology Conference Proceedings (I2MTC), 2016 IEEE International, pp. 1-6. IEEE, 2016.
[9] Chamazcoti, Saeideh Alinezhad, Seyed Ghassem Miremadi, and Hossein Asadi. "On endurance of erasure codes in SSD-based storage systems." In Computer Architecture and Digital Systems (CADS), 2013 17th CSI International Symposium on, pp. 67-72. IEEE, 2013.
[10] Wu, Xiaoquan, Nong Xiao, Fang Liu, Zhiguang Chen, Yimo Du, and Yuxuan Xing. "RAID-aware SSD: improving the write performance and lifespan of SSD in SSD-based RAID-5 system." In Big Data and Cloud Computing (BdCloud), 2014 IEEE Fourth International Conference on, pp. 99-103. IEEE, 2014.
[11] S. Li and H. Huang, “Black-box performance modeling for solid-state drives,” in Modeling, Analysis Simulation of Computer and Telecommunication Systems (MASCOTS), 2010 IEEE International Symposium on, Aug 2010, pp. 391–393.
[12] G. Bell and R. Boddington, “Solid State Drives: The Beginning of the End for Current Practice in Digital Forensic Recovery?” Journal of Digital Forensics, Security and Law, pp. 1–20, 2010.
[13] Canclini, JonPaul, James McMasters, James Shey, Owens Walker, Ryan Rakvic, Hau Ngo, and Kevin D. Fairbanks. "Inferring read and write operations of solid-state drives based on energy consumption." In Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), IEEE Annual, pp. 1-6. IEEE, 2016.
[14] Hottinger Baldwin Messtechnik GmbH, “GEN series GEN3i – Data Sheet,” B3762-5.1 datasheet, Nov. 2013.

[15] Fairbanks, Kevin D. "An analysis of Ext4 for digital forensics." Digital Investigation 9 (2012): S118-S130.
