Cluster-Slack Retention Characteristics: a Study of the NTFS Filesystem
Department of Computer Science
Karlstad University

Zak Blacher

Master's Thesis D2010:06

© 2010 The author and Karlstad University

This thesis is submitted in partial fulfillment of the requirements for the Master's degree in Computer Science. All material in this thesis which is not my own work has been identified, and no material is included for which a degree has previously been conferred.

Zak Blacher

Approved, 10th June, 2010

Advisor: Thijs Holleboom
Examiner: Donald Ross

Abstract

This paper explores the statistical properties of microfragment recovery techniques used on NTFS filesystems in digital forensics. A microfragment is the remnant file data that persists in the cluster slack after the file has been overwritten. The total amount of cluster slack is related to the size distribution of the overwriting files as well as to the cluster size. Experiments have been performed by varying the size distributions of the overwriting files as well as the cluster sizes of the partition. The results are then compared with existing analytical models.

Acknowledgements

I would like to thank my supervisors Thijs Holleboom and Johan Garcia for their support in the creation of this document. I would very much like to thank Anja Fischer for her help proofreading and formatting this document, and Thijs for providing some of the graphics used to demonstrate these models. I would also like to thank Johan Garcia and Tomas Hall for providing the C and C++ code used to generate and count the file fragments. Additionally, I would like to thank the community at #windows on irc.freenode.net for their help and pointers in understanding and making sense of the NTFS filesystem. I would also like to thank the Microsoft Corporation for the creation of the NTFS filesystem.

Contents

1 Introduction
  1.1 Introduction
  1.2 F.I.V.E.S.
  1.3 Units
2 Background
  2.1 Introduction
  2.2 Digital Forensics
  2.3 Hard Drive Structure
    2.3.1 Files and File Distributions
    2.3.2 Microfragment Analysis
    2.3.3 NTFS
    2.3.4 Expected Microfragment Distribution
  2.4 Overview of the Microfragment Analysis Model
    2.4.1 Fixed Distribution
    2.4.2 Uniform Distribution
    2.4.3 Exponential Distribution
3 Experiments
  3.1 Introduction
    3.1.1 Testbed
  3.2 findGenFrag Experiments
    3.2.1 'File Size Distribution' Test
    3.2.2 'Cluster Size Distribution' Test
    3.2.3 '30 Repetitions' Test
    3.2.4 'Rolling Hash' Tests
  3.3 Summary
4 Results
  4.1 Introduction
  4.2 findGenFrag Results
    4.2.1 'File Size Distribution' Test
    4.2.2 'Cluster Size Distribution' Test
    4.2.3 '30 Repetitions' Test
    4.2.4 'Rolling Hash' Tests
  4.3 Summary
5 Conclusion
  5.1 Microfragment Collection
  5.2 Future Work
References
A Appendix
  A.1 Graph Data
    A.1.1 Uniform Distribution Calculations
    A.1.2 'File Size Distribution' Test
    A.1.3 'Cluster Size Distribution' Test
    A.1.4 '30 Repetitions' Test
    A.1.5 'Rolling Hash' Tests
  A.2 Fragment Analysis
    A.2.1 findGenFrag.c
    A.2.2 genDistrFile.c
  A.3 Python Scripts
    A.3.1 script2.py
    A.3.2 script_rh.py
  A.4 Extension & Misc Functions
    A.4.1 OpenOffice Graph Export Macro
    A.4.2 extension-functions.c

List of Figures

2.1 Hard Drive Structure
2.2 Windows XP Default File Distribution
2.3 Cluster Slack Example
3.1 Cluster Size v. Slack Recovery
4.1 Cluster Size Distribution Test
4.2 Tail Overlap Example
4.3 Cluster Size Test (rf param v. cluster size)
4.4 Cluster Size Test (cluster size v. rf param)
4.5 Cluster Size Test (Microfragment Recovery)
4.6 Repetitions 4096 Test (Uniform v. Exponential)
4.7 Repetitions 4096 Test
4.8 Recovered Microfragments (Experimental v. Analytical)
4.9 Size Distribution Tests Comparison (Hashes v. Fragments)
4.10 Size Distribution Tests (Hashes by Cluster Size)
4.11 Cluster Size Tests (Rolling Hash)
4.12 Cluster Size Tests (Ratio Demonstration)
4.13 Microfragment v. Hash Ratios
4.14 Microfragment Detection Results

List of Tables

2.1 NTFS Feature List
A.1 'Uniform Probability' graph data
A.2 'File Size Distribution' graph data
A.3 'Cluster Size Distribution' graph data
A.4 '30 Repetitions' graph data

Chapter 1

Introduction

1.1 Introduction

This chapter presents a brief overview of the goal of this thesis, and introduces the FIVES project and the units used within this document.

The purpose of this dissertation is to explore and quantify the results of microfragment analysis of an NTFS volume. A microfragment is a remnant of an old file that survives in the slack space of the cluster now occupied by a newer, smaller file. If these results prove consistent and reliable, then tail-slack inspection (explained in Chapter 2) can be considered a viable means of forensic file-fingerprint recovery. This thesis presents a series of microfragment analysis tests (expanded upon in Chapter 3) and compares the results with the models described in reference [1], 'Fragment Retention Characteristics in Slack Space.' Two additional tests have been designed to compare the matching abilities of microfragment analysis with those of rolling-hash block recovery, a more traditional approach used in digital forensics.

1.2 F.I.V.E.S.

The goal of the Forensic Image and Video Examination Support (FIVES) project [2] is to develop a set of automated investigative tools to be used in conjunction with law enforcement agencies to assist in the detection of offending data. Microfragment matching is useful for demonstrating the previous existence of offending files on a device, and is the central aspect of the FIVES toolkit.
The experiments performed in Chapter 3 are designed to demonstrate the precision and effectiveness of these tools. FIVES is a targeted project within the Safer Internet Program [3].

1.3 Units

The standard block size in this document is 512 bytes of 8 bits each. All measured units refer to the IEC binary units unless otherwise specified. Efforts have been made to use the notation kibibyte (2^10 bytes) and its abbreviation 'KiB' instead of the SI-defined kilobyte (10^3 bytes) and its abbreviation 'kB'. The larger IEC units are mebibytes (MiB) and gibibytes (GiB), which are 2^20 and 2^30 bytes respectively. More about these units can be found in the Wikipedia entry for kilobyte [4].

This notation has not yet gained widespread acceptance in the technical vernacular, and several competing conventions are used in various computer-related fields. In this dissertation, however, it is important to differentiate between the SI and IEC units, as certain calculations are performed using numbers from both bases.

Chapter 2

Background

2.1 Introduction

The purpose of this chapter is to expand upon the motivation behind this thesis: the analysis of distributed files and the detection of remnant data. To this end, this chapter briefly introduces the concept of digital forensics, provides a basic overview of traditional hard drive construction, and describes the process by which microfragments are generated. Furthermore, it provides a comprehensive description of the NT filesystem's features, a graphic demonstrating the general trend for file sizes on a typical NTFS partition, and finally a restatement of the formulas for calculating the expected appearance of microfragments on a device.

2.2 Digital Forensics

In contrast with criminal forensics, the goal of digital forensics is to explain the state of a digital artifact rather than to determine its cause.
Current implementations of forensic file recovery involve taking a full snapshot of a hard disk or device and performing an analysis on either the unallocated area or on the full volume. The goal of these methods is to recover metadata and unallocated sectors belonging to deleted files. From the deleted sectors it is possible to extract parts of the underlying data, but fully recovering overwritten data is infeasible, as the magnetic medium does not retain any history.

2.3 Hard Drive Structure

Traditionally, storage has been expressed in terms of cylinder, head, and sector count tuples (CHS). A typical hard drive is composed of several rotating platters. Each face of each platter is divided into concentric rings called cylinders. These cylinders are further divided into arc-sections called blocks, and these blocks typically store 512 bytes of data. Each face of each platter has a separate read head that floats just above the surface. The read head seeks to a cylinder and captures data from the chosen block. Often these devices capture many blocks at a time (see Figure 2.1).

Figure 2.1: Hard Drive Physical Structure

For example, a floppy disk reports 80 cylinders, 2 heads, and 18 sectors of 512 bytes each: 80 × 2 × 18 × 512 B = 1,474,560 B = 1440 KiB, which is the standard capacity for a high-density floppy diskette. This notation is not without problems, however. The original Master Boot Record specification allotted 24 bits of addressing information to represent 1024 cylinders, 255 heads, and 63 sectors [5], limiting devices to approximately 8.4 GB (about 7.8 GiB).