A Methodology for Automated Digital Evidence Processing
Total Page:16
File Type:pdf, Size:1020Kb
PhD Thesis Alleviating the Digital Forensic Backlog: A Methodology for Automated Digital Evidence Processing Xiaoyu Du A thesis submitted in fulfilment of the degree of PhD in Computer Science Supervisor: Dr. Mark Scanlon Head of School: Assoc. Prof. Chris Bleakley UCD School of Computer Science College of Science University College Dublin September 2020 Table of Contents List of Figures . vii List of Tables . ix Abstract . xiii 1 Introduction . 1 1.1 Overview . 1 1.1.1 Challenges of Digital Forensics . 2 1.1.2 Approaches and Technologies for the Problem . 2 1.2 Research Questions . 4 1.3 Contribution of this Work . 4 1.4 Limitations of this Work . 5 1.5 Layout of this Thesis . 5 1.6 Brief Overview of the Approach . 6 2 Literature Review . 7 2.1 Introduction . 7 2.2 Digital Forensics . 8 2.2.1 Source of Digital Evidence . 9 2.2.2 Mobile Forensics . 10 2.2.3 Cloud Forensics . 11 2.2.4 IoT Forensics . 12 2.2.5 Digital Forensic Artefacts Example . 13 2.2.6 Digital Evidence Backlogs . 16 2.3 Digital Forensic Test Images . 17 2.3.1 Current Available Disk Images . 18 2.3.2 Automated Disk Image Generation Approach . 18 2.4 Digital Forensic Process Model . 19 2.4.1 Digital Forensic Framework in Initial Phase . 21 2.4.2 Refined Digital Forensic Process Models . 22 2.4.3 Recent Digital Forensic Models for Handling Modern Advancements 23 2.5 Triage Process Model . 25 2.5.1 Digital Device Triage Tools . 26 2.6 Cloud-based Digital Forensic Framework . 27 2.6.1 DFaaS Framework . 27 i 2.6.2 HANSKEN: DFaaS System Used by NFI . 28 2.6.3 Benefits and Advantages of DFaaS . 29 2.7 Data Deduplication and Data Reduction . 30 2.7.1 Data Deduplication Technology . 30 2.7.2 Data Deduplication in Digital Forensics . 31 2.7.3 Data Reduction Approaches . 32 2.8 Automated Digital Forensic Analysis . 33 2.8.1 Challenges of Automation in Digital Forensics . 33 2.8.2 Metadata and Timeline Analysis . 34 2.8.3 Plaso/Log2timeline . 35 2.9 Machine Learning and Digital Forensics . 35 2.9.1 Background of Machine Learning . 35 2.9.2 Machine Learning in Digital Forensics . 36 2.9.3 Background of Deep Learning . 38 2.9.4 Deep Learning Applications in Digital Forensics . 39 2.9.5 Current Challenges and Future Directions . 40 2.10 Correlation Analysis . 41 2.10.1 Cyber-investigation Analysis Standard Expression (CASE) . 42 2.11 File Artefact Prioritisation . 42 2.12 Summary . 43 2.12.1 Gaps in the State of the Art . 43 3 Methodology . 45 3.1 Introduction . 45 3.1.1 A Centralised Digital Evidence Processing System . 45 3.1.2 An Automated File Artefact Analysis Approach . 46 3.1.3 Design of Experimentation and Evaluation . 47 3.2 Tools for Test Disk Images Generation . 48 3.2.1 TraceGen: Overview . 48 3.2.2 TraceGen: Existing Automation Options . 49 3.2.3 EviPlant: Image Generation Using \Evidence Packages" . 51 3.3 Deduplicated Data Acquisition of the System . 52 3.3.1 Deduplicated Acquisition Process Pipeline . 53 3.3.2 Client Responsibilities . 54 3.3.3 Sever Responsibilities . 54 3.3.4 Summary . 55 3.4 Data Storage of the System . 55 3.4.1 Categorising Files Collected from the Disk . 56 3.4.2 Unallocated Space and Slack Space on the Disk . 56 3.4.3 Metadata and Acquisition Records . 57 3.4.4 Acquisition Logs for Performance Analysis . 59 3.5 Forensically Sound Image Reconstruction from Deduplicated Acquisition . 60 3.6 Data Extraction and Preparation . 62 3.6.1 Data Extraction: Tools and Techniques . 62 ii 3.6.2 File Data Reduction/Selection . 62 3.6.3 File Timeline Generation and Feature Extraction . 63 3.6.4 Feature Extraction from File system Metadata . 64 3.6.5 Train/Test Data and Evaluation . 65 3.7 Relevancy File Artefacts Prioritisation . 65 3.7.1 An Overview of the Workflow . 67 3.7.2 Relevancy Score Determination . 68 3.7.3 Adding Pre-trained Models . 69 3.7.4 Relevancy Score Generation . 70 3.8 Summary . 71 4 Test Disk Image Generation . 73 4.1 Overview . 73 4.1.1 Choice of Technologies . 73 4.2 File Data for Disk Image Creation . 74 4.3 EviPlant: Creation, Manipulation and Distribution Disk Image . 75 4.3.1 Overview . 75 4.3.2 Evidence Packages . 75 4.3.3 Testing of Diffing and Injection . 76 4.3.4 Summary . 77 4.4 TraceGen: Automated User Action Emulation . 78 4.4.1 Overview . 78 4.4.2 Virtual Machine Configuration . 78 4.4.3 File Provenance and Interactions . 79 4.4.4 User Action Emulation . 80 4.4.5 Continuous Usage Trace Generation . 80 4.4.6 Summary . 84 5 Deduplicated Digital Evidence Processing . 86 5.1 Overview . 86 5.2 Prototype Setup and Test Data . 87 5.3 Results: Average Acquisition Speed . 87 5.4 Results: Storage Space Saved . 91 5.5 Results: Image Reconstruction . 91 5.6 Tests on an Improved Approach for Data Acquisition . 92 5.6.1 Results . 93 5.7 Summary . 94 5.7.1 Benefits of this Approach . 95 6 File Artefact Analysis and Relevancy Prioritisation . 96 6.1 Overview . 96 6.1.1 Data Processing and Experimentation . 96 6.2 Metadata and Timeline from Disk Image . 97 6.2.1 Timeline Generation and Analysis . 98 iii 6.2.2 An Example Disk Image Timeline . 99 6.2.3 File Timeline Generation . 101 6.2.4 An Example of File Timeline . 102 6.2.5 Feature Extraction from File Timeline . 102 6.3 Metadata-based File Artefact Classification . 103 6.3.1 Example Scenario . ..