Accelerating Digital Forensic Searching Through Gpgpu Parallel Processing Techniques
Total Page:16
File Type:pdf, Size:1020Kb
ACCELERATING DIGITAL FORENSIC SEARCHING THROUGH GPGPU PARALLEL PROCESSING TECHNIQUES A thesis submitted for the degree of Doctor of Philosophy (PhD) by Ethan Bayne School of Design and Informatics, Abertay University. February 2017 Declaration Candidate’s declarations: I, Ethan Bayne, hereby certify that this thesis submitted in partial fulfilment of the requirements for the award of Doctor of Philosophy (PhD), Abertay University, is wholly my own work unless otherwise referenced or acknowledged. This work has not been submitted for any other qualification at any other academic institution. Signed ……………………………………………………………………… Date…………………………………………………………………………. Supervisor’s declaration: I, Robert Ian Ferguson, hereby certify that the candidate has fulfilled the conditions of the Resolution and Regulations appropriate for the degree of Doctor of Philosophy (PhD) in Abertay University and that the candidate is qualified to submit this thesis in application for that degree. Signed ……………………………………………………………………… Date…………………………………………………………………………. Certificate of Approval I certify that this is a true and accurate version of the thesis approved by the examiners, and that all relevant ordinance regulations have been fulfilled. Supervisor…………………………………………………………………. Date………………………………………………………………………… ii Dedication I would like to thank my supervisors – Dr Robert Ian Ferguson and Dr Adam Sampson – for the countless conversations around the different aspects of this research. Their timely encouragement and suggestions have aided in achieving successes beyond anything we expected at the beginning of this investigation. A notable mention goes to Dr Lynsay Shepherd and Dr Gavin Hales. Their friendship (and “bants”) in the department against the dark arts office has kept me sane for the duration of my PhD studies. This work is dedicated to my mum and dad for their continued love and support, without it, this research would have been impossible to accomplish. I would also like to thank my wife – Cecilia – for her continued patience and encouragement. Without her emotional support and supply of caffeine at home, I would not have had the same motivation to succeed in my research. iii ABSTRACT Background String searching within a large corpus of data is a critical component of digital forensic (DF) analysis techniques such as file carving. The continuing increase in capacity of consumer storage devices requires similar improvements to the performance of string searching techniques employed by DF tools used to analyse forensic data. As string searching is a trivially-parallelisable problem, general purpose graphic processing unit (GPGPU) approaches are a natural fit. Currently, only some of the research in employing GPGPU programming has been transferred to the field of DF, of which, a closed-source GPGPU framework was used— Complete Unified Device Architecture (CUDA). Findings from these earlier studies have found that local storage devices from which forensic data are read present an insurmountable performance bottleneck. Aim This research hypothesises that modern storage devices no longer present a performance bottleneck to the currently used processing techniques of the field, and proposes that an open-standards GPGPU framework solution – Open Computing Language (OpenCL) – would be better suited to accelerate file carving with wider compatibility across an array of modern GPGPU hardware. This research further hypothesises that a modern multi-string searching algorithm may be better adapted to fulfil the requirements of DF investigation. Methods This research presents a review of existing research and tools used to perform file carving and acknowledges related work within the field. To test the hypothesis, parallel iv file carving software was created using C# and OpenCL, employing both a traditional string searching algorithm and a modern multi-string searching algorithm to conduct an analysis of forensic data. A set of case studies that demonstrate and evaluate potential benefits of adopting various methods in conducting string searching on forensic data are given. This research concludes with a final case study which evaluates the performance to perform file carving with the best-proposed string searching solution and compares the result with an existing file carving tool— Foremost. Results The results demonstrated from the research establish that utilising the parallelised OpenCL and Parallel Failureless Aho-Corasick (PFAC) algorithm solution demonstrates significantly greater processing improvements from the use of a single, and multiple, GPUs on modern hardware. In comparison to CPU approaches, GPGPU processing models were observed to minimised the amount of time required to search for greater amounts of patterns. Results also showed that employing PFAC also delivers significant performance increases over the BM algorithm. The method employed to read data from storage devices was also seen to have a significant effect on the time required to perform string searching and file carving. Conclusions Empirical testing shows that the proposed string searching method is believed to be more efficient than the widely-adopted Boyer-Moore algorithms when applied to string searching and performing file carving. The developed OpenCL GPGPU processing framework was found to be more efficient than CPU counterparts when searching for greater amounts of patterns within data. This research also refutes claims that file carving is solely limited by the performance of the storage device, and presents compelling evidence that performance is bound by the combination of the performance of the storage device and processing technique employed. v TABLE OF CONTENTS Declaration ........................................................................................................................................... ii Certificate of Approval .......................................................................................................................... ii Dedication............................................................................................................................................ iii ABSTRACT ............................................................................................................................................ iv List of Figures ........................................................................................................................................ x List of Tables ........................................................................................................................................ xii Definitions .......................................................................................................................................... xiii Chapter 1: INTRODUCTION ............................................................................................................ 1 1.1 Motivation .................................................................................................................................. 1 1.2 Problem ...................................................................................................................................... 2 1.3 Thesis aim ................................................................................................................................... 3 1.4 Thesis contribution ..................................................................................................................... 4 1.5 Thesis organisation ..................................................................................................................... 5 Chapter 2: BACKGROUND .............................................................................................................. 7 2.1 A historic perspective on the current state of digital forensics .................................................. 7 2.2 An introduction to the concepts of file carving ........................................................................ 11 2.3 Introduction to string searching algorithms ............................................................................. 13 2.4 Differences between CPU and GPU architecture ..................................................................... 15 2.5 Understanding OpenCL and GPGPU processing ....................................................................... 19 2.6 Related work ............................................................................................................................. 23 2.7 Chapter summary ..................................................................................................................... 27 Chapter 3: SOLUTION CONSTRUCTION AND TESTING METHODOLOGY........................................ 28 3.1 Research platform development .............................................................................................. 28 3.1.1 Technologies used ............................................................................................................ 29 vi 3.1.2 Research platform design ................................................................................................ 30 3.1.3 Research platform implementation ................................................................................. 33 3.1.4 Research platform testing ................................................................................................ 37 3.2 Algorithm choices for data analysis .........................................................................................