A Framework for Identifying Associations in Digital Evidence Using Metadata
A thesis by Sriram Raghavan
Bachelor of Technology (Electronics), Anna University INDIA
Master of Science (Computer Science), IIT Madras INDIA

Submitted in accordance with the regulations for the award of Degree of Doctor of Philosophy

School of Electrical Engineering and Computer Science
Science & Engineering Faculty
Queensland University of Technology, Brisbane
May 2014

Keywords

Association Group, Digital Artifact, Evidence Composition, Metadata match, Metadata Associations Model, Provenance Information Model, Similarity Pocket, Similarity Group, Unified Forensic Analysis

Abstract

During a digital forensics investigation, it is often necessary to identify ‘related’ files and log records for analysis. While this is typically achieved through manual examination of content, recent advances in storage technologies pose two major challenges: the growing volumes of digital evidence; and the technological diversity in storage of data in different file formats and representations. Both these challenges call for a scalable approach to determine related files and logs from one or more sources of digital evidence. In this thesis, I address some of the challenges involved in identifying associations that are inherent among the sources of digital evidence, via their metadata. Metadata pertains to information that describes the data stored in a source, be it a hard disk drive, file system, individual file, log record or a network packet. By definition, metadata as a concept is ubiquitous across multiple sources and hence presents an ideal vehicle to integrate heterogeneous sources and to identify related artifacts. I develop a metadata based model and define metadata-association based relationships to identify ‘related’ files, log records and network packets, using metadata value matches.
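
The idea of relating artifacts through metadata value matches can be illustrated with a minimal sketch. This is not the thesis's model or its toolkit. The function name `group_by_metadata_match`, the artifact names and the metadata fields are all hypothetical, and the sketch assumes metadata has already been extracted into flat dictionaries.

```python
from collections import defaultdict


def group_by_metadata_match(artifacts):
    """Return the (field, value) pairs shared by two or more artifacts.

    `artifacts` maps an artifact name to a dict of its metadata.
    This layout is an assumption made for illustration only.
    """
    index = defaultdict(set)
    for name, metadata in artifacts.items():
        for field, value in metadata.items():
            index[(field, value)].add(name)
    # A pair held by at least two artifacts counts as a metadata value match.
    return {key: sorted(names) for key, names in index.items() if len(names) > 1}


# Hypothetical artifacts from three kinds of source: a document, an image, a log record.
artifacts = {
    "report.docx": {"author": "alice", "software": "Word 2010"},
    "photo.jpg": {"author": "alice", "camera": "Canon"},
    "access.log#17": {"user": "bob", "software": "Word 2010"},
}

matches = group_by_metadata_match(artifacts)
for (field, value), names in sorted(matches.items()):
    print(f"{field}={value}: {names}")
```

Indexing on the (field, value) pair keeps the comparison independent of the artifact type, which is the property that lets files, log records and packets be treated uniformly in a sketch like this.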
While there are many tools that allow the extraction of metadata for examination (using only a small fraction of the available metadata), the analysis step is left largely to the individual forensics investigator. In this thesis, I present a consolidated review of such tools and identify specific functionalities that are essential to integrate multiple sources for conducting automated analysis. In addition, I develop a framework and design the associated technology architecture to integrate these functionalities. In this framework, I define a metadata-based layer for automating the analysis, called the metadata association model, which identifies the metadata value matches across arbitrary artifacts and organizes them based on the association semantics into meaningful groups. I have built a prototype toolkit of this model, called the AssocGEN analysis engine, for identifying related artifacts from forensic disk images and log files. I evaluated this tool on heterogeneous collections of digital image files and word processing documents and successfully identified doctored files and determined the origin of artifacts downloaded from the Web.

Apart from associations in digital evidence that arise out of metadata value matches, time-based sequencing is an essential part of forensic investigations. Timestamps are an important part of metadata and correspond to events that transpired. Sequencing these timestamps gives rise to a timeline. However, generating a timeline across heterogeneous sources poses several challenges, and timestamp interpretation is one of them. To address this issue, I develop a provenance information model and the associated technology architecture to incorporate timestamp interpretation while generating a unified timeline across multiple heterogeneous sources. The provenance information model can also validate time-based assertions by comparing semantically related timestamps to establish evidence consistency.
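
Why timestamp interpretation matters for a unified timeline can be shown with a small sketch. It is not the provenance information model itself. It assumes each event carries either a timezone-aware timestamp or a naive local timestamp plus a known UTC offset (FAT file systems, for example, record local time without a time zone). The helper names, the event tuples and the offset value are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_utc(naive_local, utc_offset_hours):
    """Interpret a naive local timestamp using an assumed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


def unified_timeline(events):
    """Normalize every event to UTC and sort into one timeline.

    Each event is (source, description, timestamp, utc_offset_hours);
    the offset is only used when the timestamp is naive.
    """
    normalized = []
    for source, description, stamp, offset in events:
        if stamp.tzinfo is None:
            stamp = to_utc(stamp, offset)
        else:
            stamp = stamp.astimezone(timezone.utc)
        normalized.append((stamp, source, description))
    return sorted(normalized)


# Hypothetical events: a FAT timestamp in local time (UTC+10) and a log record in UTC.
events = [
    ("log", "download recorded", datetime(2014, 5, 1, 0, 30, tzinfo=timezone.utc), None),
    ("fat", "file modified", datetime(2014, 5, 1, 10, 0), 10),
]

timeline = unified_timeline(events)
for stamp, source, description in timeline:
    print(stamp.isoformat(), source, description)
```

Read at face value, the FAT entry (10:00) appears to follow the log record (00:30); once the local time is interpreted as UTC+10, it precedes it by thirty minutes. The offset must come from the investigation context, and the sketch simply takes it as given.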
I have built a prototype toolkit of this model, called UniTIME, to generate unified timelines from files, logs and packet captures and successfully evaluated it on datasets containing FAT file systems and ZIP file formats.

In summary, this research develops a framework for identifying associations in digital evidence using metadata for forensic analyses. I show that metadata based associations can help uncover the inherent relationships between heterogeneous digital artifacts which can aid reconstruction of past events. I also show that metadata association based analysis is amenable to automation by virtue of the ubiquitous nature of metadata across forensic disk images, files, system and application logs and network packet captures. The results obtained demonstrate that metadata based associations can be used to extract many meaningful relationships between digital artifacts, thus potentially benefitting real-life forensics investigations.

Table of Contents

Keywords
Abstract
List of Figures
List of Tables
List of Abbreviations
Acknowledgements
Declaration
Previously Published Material
1 Introduction
1.1 The Heterogeneous Nature of Digital Evidence
1.2 Motivation for Finding Associations in Digital Evidence
1.2.1 Concept of a Digital Artifact
1.2.2 Metadata in Digital Investigations
1.3 Using Metadata to Determine Associations in Digital Evidence
1.4 Objectives of this Thesis
1.5 Overview of the Research Method
1.6 Contributions from this Research
1.7 Chapter Summary
2 Related Work
2.1 Digital Forensics: A Multi-Staged Scientific Process
2.2 Related Research
2.2.1 Modeling the Digital Forensic Process
2.2.2 Evidence Acquisition and Representation
2.2.3 Evidence Examination & Discovery
2.2.4 Digital Forensic Analysis
2.3 Metadata … in and as … Digital Evidence
2.3.1 File Metadata
2.3.2 Use of File Metadata in Digital Forensics
2.3.3 Metadata for Grouping Files
2.3.4 Extending Metadata to Logs and Network Packet Captures
2.4 Timestamps as Metadata and Digital Time-lining
2.4.1 Timestamp Semantics and Interpretation
2.4.2 Causal Ordering of Events
2.4.3 Timestamp Representation Across Systems
2.5 Chapter Summary
3 Research Method
3.1 Research Methodology
3.2 Identifying the Research tasks