
DIGITAL FORENSIC RESEARCH CONFERENCE

A General Strategy for Differential Forensic Analysis

By Simson Garfinkel, Alex Nelson and Joel Young

From the proceedings of The Digital Forensic Research Conference, DFRWS 2012 USA, Washington, DC (Aug 6th - 8th)

DFRWS is dedicated to the sharing of knowledge and ideas about digital forensics research. Ever since it organized the first open workshop devoted to digital forensics in 2001, DFRWS continues to bring academics and practitioners together in an informal environment. As a non-profit, volunteer organization, DFRWS sponsors technical working groups, annual conferences and challenges to drive the direction of research and development.

http://dfrws.org

Digital Investigation 9 (2012) S50–S59

Contents lists available at SciVerse ScienceDirect

Digital Investigation

journal homepage: www.elsevier.com/locate/diin

A general strategy for differential forensic analysis

Simson Garfinkel a,*, Alex J. Nelson b, Joel Young a

a Computer Science, Naval Postgraduate School, 900 N Glebe St., Arlington, VA 22203, USA
b Computer Science, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA

abstract

Keywords: Forensics; Differencing; Forensic strategies; Feature extraction; Temporal analysis

The dramatic growth of storage capacity and network bandwidth is making it increasingly difficult for forensic examiners to report what is present on a piece of subject media. Instead, analysts are focusing on what characteristics of the media have changed between two snapshots in time. To date, different algorithms have been implemented for performing differential analysis of computer media, memory, digital documents, network traces, and other kinds of digital evidence. This paper presents an abstract differencing strategy and applies it to all of these problem domains. Use of an abstract strategy allows the lessons gleaned in one problem domain to be directly applied to others. Published by Elsevier Ltd.

* Corresponding author. Tel.: +1 617 876 6111. E-mail address: [email protected] (S. Garfinkel).

1. Introduction

This paper describes differential forensic analysis, a practice that is increasingly used by digital forensic examiners but has not been formalized until now. Differential forensic analysis compares two different digital forensic images (or, more generally, any pair of digital artifacts) and reports the differences between them. Focusing on the changes allows the examiner to reduce the amount of information that needs to be examined (by eliminating data that does not change), while simultaneously focusing on the changes that are thought to be the result of a subject's activities (for presumably, it was the activity of the subject that somehow transformed the first digital image into the second).

Differential analysis is widely practiced today. Reverse engineers attempt to infer the behavior of malware by comparing the contents of a hard drive before the malware is introduced with the hard drive captured after the malware infection. Sex offenders on many controlled release programs must submit their computers for regular analysis, so that an examiner can determine if the offender has visited a banned website. Network engineers compare month-to-month traffic summaries in an attempt to learn how demands on their networks evolve, as well as to identify the presence of malware.

No matter what specific modality is being examined, all of these use cases involve the collection of at least two digital objects: a baseline and a final image. Differential analysis reports the differences between the two, that is, what has changed. But despite the similarity of purpose, to date each differential analysis use case has been developed in isolation, with different procedures, tools and reporting standards.

We show that these scenarios can all be implemented using the same strategy. Furthermore, the strategy can cover scenarios apparently unrelated to computer forensics, such as reporting on the changes within a document or even file synchronization. The key to this strategy is the extraction of features from the digital artifacts, in which each feature has a separately describable name, location, content, and possibly other metadata.

1.1. Contributions

This paper presents a principled study of differential analysis and then applies that work to multiple contexts, including the analysis of files on a computer's disk drive, the pattern of data sent across a network, and even reports

from other forensic tools. We show that a small set of well-chosen abstractions allows the same differential analysis strategy to be applied to all of these cases.

It is important to note that the tools we have written were created before we formalized our general strategy, not after. Although it would be quite elegant to have a single implementation of differential analysis and then to specialize that implementation for each modality, what actually happened is that we unwittingly wrote multiple implementations of the same abstract strategy each time we wrote another differential analysis program. Only after writing several different differential analysis tools were we able to appreciate the commonalities between the implementations and to realize that the strategy could be made general by an appropriate choice of abstraction.

2. Definitions, terminology and notation

In this section we introduce a consistent terminology for discussing differential analysis. We apply this terminology to prior work as well as to our own contributions.

Differential analysis. An analytical process that compares two objects (images) A and B and reports the differences between them. Although at first it might seem sensible to report the differences as (B − A), experience has shown that it is frequently more useful to report the differences as the series of operations R necessary to transform A into B:

    A →R B    (1)

Typically A and B represent snapshots in time: A might be an image of a hard drive recorded before a computer is deployed, and B might be an image of the same drive after it has been compromised by an attacker. However, both A and B might be two different systems that are based on a common object a:

    a →RA A,  a →RB B    (2)

Typically the operations R that are reported are a function of both the data formats and the needs of the examiner. If A and B are disk images and the examiner is evaluating the installation footprint of a new application, then R might be a list of files and registry entries that are created or changed. But if the examiner is looking for evidence of a malware infection, R might be a list of op-codes that are changed in existing executables.

Image. A byte stream from any data-carrying device representing the object under analysis. Practitioners will be familiar with disk images, memory images and cell phone images. Images may be physical, which can be thought of as a collection of sectors, or logical, which can be thought of as a collection of files.

Note that for the purposes of this abstract strategy the only real difference between a sector and a file is that sectors are constant-length collections of bytes and are identified by a number (which can be referred to either as a name or a location, depending on the context), while files are collections of bytes of variable length that are identified by name (typically a path name consisting of one or more directory names and a final file name).

We use the term image to refer to any kind of digital artifact. In this article we occasionally use the word object as a synonym for image when warranted by context.

Baseline image (A). The image first acquired at time TA.

Final image (B). The last acquired image, taken at time TB.

Intermediary images (In). Zero or more images recorded between the baseline and final images. Image In is the nth image acquired.

Common baseline. A single image that is a common ancestor to multiple final images. For example, a in Equation (2) is a common baseline for A and B.

Image delta (B − A). The differences between two images, typically between the baseline image and the final image.

Differencing strategy. A strategy for reporting the differences between two or more images.

Differencing strategies and algorithms that implement those strategies have long been applied to programs, text, and word processing files (Horwitz, 1990), and are widely available in tools such as Unix diff and Microsoft Word. Traditionally there has been little distinction between the tool that implements the algorithm and the algorithm itself, and both have been developed for specific differencing tasks.

This paper presents a general strategy for differential analysis. By general we mean that the strategy can be equally applied to other articles of forensic interest, such as memory images and network packet dumps. For example, if A and B are collections of packets sent over a network on two successive days, the examiner might be interested in an R that describes changes to metadata describing network flows, for example, that a web server that was previously listening on one IP address and port was moved to another location, or that a protocol that was previously protected with SSL is no longer using encryption. On the other hand, a differential analysis of a Microsoft Word document at two points in time might report that some paragraphs have been changed while others have been moved, an analysis performed by Word's "Compare Documents" feature, or by the Unix command-line diff utility on text files.

Feature (f). A piece of data or information that is either explicitly extracted from the image or otherwise dependent upon data within the image. For example, an email address from an address book, a URL from a browser cache, the hash value of a sector, and a histogram of port frequency use within a set of packets are all examples of features.

Feature in image ((A, f)). Features typically are found in images. In this case, feature f is found in image A.

Feature name (NAME(A, f)). Every feature may have zero, one or multiple names. For example, if a feature is the contents of a file, the feature name might be the file name.

Feature location (LOC(f)). The location in the image in which the feature is found. If the feature is the contents of a file, the address might be the sectors where the file resides. Some features can have both logical and physical addresses. For example, a file's location may be either its file name (which is the file's logical address in the file system), its inode number, or a set of sector numbers (which are the file's physical address on the media). A feature might thus have multiple locations.

Feature extraction (F()). The process of deriving one or more features from bulk data. Differential analysis is rarely applied on a byte-for-byte basis to bulk data. While it is certainly possible to compare two objects byte-for-byte and report where the bytes differ, it is more useful to transform the data through some kind of grouping, data reduction, or feature identification step.

In order to manage complexity, differencing algorithms extract features as atomic units of one or more bytes, and report differences between A and B as differences in their extracted features. For example, the Unix diff program first extracts lines from the files being compared. Many systems that perform differencing on files in a forensic context use the cryptographic hash of a long sequence of bytes as a feature.

Extracted feature set F(A). The set of features extracted from an image. For example, the Unix diff program treats each file as an ordered set of extracted lines.

Feature set delta (F(B) − F(A)). The differences between the feature sets extracted from two images. The delta is not necessarily symmetric.

Transformation set (R). The specific set of operations that are applied to A to produce B. For example, the Unix diff program can generate a "patch file" that is a set of changes which, when applied to the first file, will produce the second. This can be indicated with the notation shown in Equation (1).

2.1. Processing multiple images

Clearly, any sequence of differential analyses performed on a baseline image, one or more intermediary images, and a final image can be reduced to a set of pairwise comparisons between an image In and image In+1. For this article we shall therefore only consider differences between pairs of images.

3. Prior work

Although differential analysis is practiced by a growing number of forensic examiners, there is very little that has been published on the subject in the forensic context. For this reason we expand the prior work section to include related concepts from revision control systems and file synchronization systems.

3.1. Differential analysis with diff

Differential analysis of text and binary objects has a long history in computer science dating back to the original Unix diff command (Thompson and Ritchie, 1975), the algorithm for which was described by Hunt and McIlroy (1976). Of particular note for this article, the original diff command could generate an edit script that will transform the first file being compared into the second using the Unix ed editor.

Modern versions of the diff command implement variants of the Myers Difference Algorithm (frequently called simply the diff algorithm), which operates by finding the longest common substrings in a set of symbols, where each symbol is typically taken to be a line of text (Myers, 1986).

It is possible to use the diff command for forensic differential analysis. The examiner must first prepare for each snapshot a text file containing the name of each allocated file, that file's time stamp, size and hash value. The two files are then compared with diff. This is the essence of Tripwire (Kim and Spafford, 1994). Today this operation can be performed simply by using diff with the output of md5deep (Kornblum, 2011) or SleuthKit's fls command (Carrier, 2005). The main limitation of this approach is that it tends to over-report differences. For example, a file that is renamed will be reported as one file being deleted and a new file with the old file's modification date, size and hash value being created elsewhere in the file system.

3.2. Manual forensic differencing

Examiners can use existing forensic tools such as EnCase (Guidance Software, Inc., 2011) and FTK (Access Data, 2011) to perform forensic differencing, but as these tools have no support for automated differencing, the differencing must be performed manually, or automated with macros.

For example, using EnCase it is possible to compare the file systems on two drives by computing the hash values for the files of one drive and then using this as a filter to view the files on the second drive. New files can be found by filtering for files on the second drive that do not have hashes in the file set from the first drive; files that have been renamed are those for which the hash value is the same but the file name has been changed.

Levy (2012) has developed an EnScript that analyzes two disk drives and reports files on the second that are not on the first. This is done using file hash values as an EnScript filter.

3.3. Automated forensic differencing

White (2008) presented a system at AAFS 2008 for identifying the changes to a Windows registry resulting from the installation of a program. The system consists of a program to ingest two Microsoft RegEdit-generated Registry patch files and produce an XML file containing the differences between the two files, and a second program using that XML file to create a Windows Registry Dataset (WiReD) file containing "changes to the Registry caused by application installation, de-installation, execution or other modifying operations." The NIST (2009) follow-up publication indicated that the WiReD dataset is "based on an FBI database schema."

Watkins et al. (2009) described a system using a combination of differencing and similarity matching to determine the minimal change set necessary to transport a copy of a disk image from one location to another. The Teleporter uses a shared database of files and disk sectors at both the sending and receiving location as a code book and, when possible, sends these codes rather than the entire disk sectors. Using our notation, Teleporter treats the disk sent as B, assumes that A is null, and creates R with reference to the code book.
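As a concrete illustration of the manifest-comparison approach described in Section 3.1, the following sketch (ours, not code from any published tool) compares two md5deep-style listings of hash and path pairs and reports created, deleted, and changed files. A renamed file appears as one deletion plus one creation, which is exactly the over-reporting noted above. The file format and report layout are assumptions made for this example.

```python
#!/usr/bin/env python
"""Tripwire-style manifest differencing (illustrative sketch only)."""
import sys

def load_manifest(path):
    """Return {file_path: hash} from an md5deep-style 'HASH  /path' listing."""
    manifest = {}
    with open(path) as f:
        for line in f:
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                digest, name = parts
                manifest[name] = digest
    return manifest

def report(baseline_path, final_path):
    a = load_manifest(baseline_path)   # baseline image A
    b = load_manifest(final_path)      # final image B
    created = sorted(set(b) - set(a))
    deleted = sorted(set(a) - set(b))
    changed = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    # A renamed file shows up once in `deleted` and once in `created`,
    # demonstrating the over-reporting limitation of this approach.
    for name in created:
        print("CREATED", name)
    for name in deleted:
        print("DELETED", name)
    for name in changed:
        print("CHANGED", name)

if __name__ == "__main__":
    report(sys.argv[1], sys.argv[2])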

3.4. Differencing outside of forensics

Others have applied differential analysis to a variety of circumstances. Apiwattanapong et al. (2004) describe an algorithm for differencing object-oriented programs; Lindholm et al. (2006) present a simple XML differencing algorithm, while Duley et al. (2010) provide an algorithm for differencing Verilog HDL. More recently, Loh and Kim (2010) introduce LSdiff, "a program differencing tool to identify systematic structural differences."

Many file formats, including network configuration files, are not readily parsed with line-based tools such as diff. Weaver and Smith (2011) are building a context-free analog to the classic grep and hierarchical differencing tools, able to better search structured documents. Weaver et al. (2011) focused on IGTF PKI and Cisco IOS policies. Their strategy was to parse the files into policy trees, develop change tables summarizing the trees, and then mine the tables for reporting.

3.5. Data synchronization

Data synchronizers are programs that manage pairs of datasets such that changes to a master file system are reflected in the replica. In practice there are two approaches to implementing data synchronization. The first approach is to employ differencing: deltas between the master and replica are computed, such that the differences contain sufficient detail to modify the replica to make it a copy of the master. The second approach is to start with two identical databases, track changes to the master, and replay those changes to the replica.

The popular rsync (Tridgell and Mackerras, 1996) file system replication tool implements differencing-based synchronization, while MySQL database replication implements change-tracking synchronization. (In the case of MySQL synchronization, the changes are data-modifying SQL statements that are sent from the master database to the replicants.) Unison (Pierce and Vouillon, 2004) uses differential analysis to create both the initial sync and a snapshot of each system's metadata. The file system and snapshot are then compared to create a list of change deltas that are used to accelerate its change-based synchronization scheme.

3.6. Revision control systems

Closely related to the problem of data synchronization is the problem of keeping multiple copies of source code repositories consistent. Two broad strategies have emerged for revision control. These are centralized (e.g., RCS (Tichy, 1982), Subversion (Pilato et al., 2008)) and decentralized (e.g., Git (Grosenbach, 2007) and Darcs (Roundy, 2005)). In centralized systems, a master repository maintains a full history of all changes and folds in changes on command from subordinate (users') copies. In decentralized systems, there is no master repository. Instead each user keeps their own copy of the repository and changes are merged as needed from other repositories. Jacobson (2009) has partially formalized the merging process for one of these systems (darcs) using group theory over compositions of invertible functions. Each function represents a particular change needed for the transformation.

3.7. Timeline analysis

In a forensic context, differential analysis often implicitly involves the analysis of timestamps as well. Marrington et al. (2011) developed a system called CAT Detect that used a rule-based system to detect events in the computer activity timeline, for example, a user that deletes a file before it is created. Although the system does not use differential analysis, it could be readily extended to do so.

Olsson and Boldt (2009) presented a tool for the visual display of timelines. Such a tool could be combined with a differential analysis tool to display only changes between two snapshots, rather than all time elements within an image. By displaying the results of differential analysis, temporal inconsistencies (Section 5.3) would be readily apparent as activity outside the region between the two snapshots. Schatz et al. (2006) discussed an approach for correlating multiple timestamps on a computer for the purpose of validating timestamps.

4. Differencing in digital forensics

Unlike synchronization and revision control uses, forensic analysts frequently do not use differential reporting to create byte-identical replicas of forensic case materials. Instead, analysts use differencing to understand the processes that resulted in the changes. For example, when analyzing a baseline network flow against one captured during an intrusion, the analyst may not care about the actual content. Indeed, if the actual content is encrypted, the analyst may not even be able to view it! Instead the analyst may look for differences between the destinations, ports, or average packet size of the connections. A carefully designed system that extracts and processes the specific features of interest can dramatically reduce the amount of data that needs to be processed, reducing processing and storage costs. Intelligent feature extraction also makes it easier for the developer to create reports that focus on features of interest to the analyst, rather than reports that are optimized for ease of data processing.

Existing differential analysis and synchronization systems carefully select the features that they extract and compare. The key insight in describing those systems with our strategy was the realization that decisions regarding which features to consider are frequently inherent in the implementations that we have created. For example:

- Unison does not synchronize deleted files or file fragmentation patterns, as such "features" are invisible to the POSIX file API. Instead, it obtains a list of features by walking the file system and consulting a local database.
- Subversion does not (by default) synchronize file modification dates, because keeping such dates synchronized is not relevant to the system's original purpose of source code control. Subversion does synchronize file names, content, MIME types, and whether or not files are executable; all of which are relevant to source code control.

Clearly, even existing systems distinguish changes that matter from those that do not. This is done through feature selection and extraction.

4.1. Use cases

This section presents three cases in which a forensic analyst might wish to determine the difference between disks in time or space: malware discovery and analysis; insider threat identification; and pattern of life analysis.

4.1.1. Malware discovery and analysis

Malware discovery and identification is the process of examining the contents of memory to determine if malicious code or files exist on the device. Disk differencing in time and space can quickly identify malicious files by analyzing registry hives, file additions and deletions, and abnormal file size growth. This process will not identify the most advanced or targeted malware, but it will uncover many instances.

4.1.2. Insider threat identification

Employee malpractice and espionage is a growing concern among corporations as well as governmental agencies. Automated techniques in disk differencing on forensic images can identify and verify harassment cases, policy-use violations, theft (intellectual property, money, identities), and a variety of other malpractices.

Several issues exist in identifying or gathering evidence in insider threat cases. One problem is that forensic images are typically gathered only after misuse has already occurred, making it difficult to establish the baseline. Additionally, forensic images are usually not gathered immediately after the event occurs. Instead, the disk images are produced days or months after the malicious activity has already taken place. This can make finding the evidence difficult or impossible.

By using disk differencing techniques one can identify the evidence more quickly by identifying abnormalities in time and space. These automated techniques can correlate information from the case to the base image and final image to reduce the final image's size to only the pertinent information.

4.1.3. Pattern of life

To analyze an individual's "pattern of life" is to learn the individual's habits regarding work, play, meals, and other regular activity. Understanding routine behaviors can be useful for a variety of activities involving data collection or intervention.

Differential analysis can be a powerful tool for determining an individual's pattern of life for the simple reason that computer systems are intimately involved in so many habitual activities. Differential analysis can also identify if a computer is used by one or more people, if a single person is using multiple accounts, or even if an account has been hijacked. Differential analysis is useful in all of these cases because it can be used to create a set of deltas that occur over time. Such deltas can be readily assembled into profiles using appropriate statistical techniques.

4.2. Summarized reporting

Although it is relatively easy to simply calculate and report F(B) − F(A) (that is, the list of all features that are different between the two images), such lists tend to be extraordinarily long and relatively difficult to interpret. Based on our analysis of the above use cases, combined with interviews conducted with users of differential reports, we have concluded that there is a need for a range of higher-level differential reports. These include:

- The introduction of new features. This might be the creation of new files or the appearance of a new email address in a document.
- The increase in count of an existing feature.
- The decrease in count of an existing feature.
- The removal of a feature from the image.
- A feature that is relocated from one location to another.

Frequently it is possible to suppress information in a report that is not of interest to an analyst. For example, analysts that are interested in finding a subject's email correspondents are frequently interested in new email addresses but not in the disappearance of email addresses: a new email address indicates a new connection, but the disappearance of an email address may simply indicate that a file was deleted and the sectors overwritten by another file.

5. A general strategy for differential analysis

This section presents our strategy. The algorithm operates by augmenting each feature with metadata and then determining the set of operations necessary to transform one set of extracted features into another.

5.1. Feature metadata

In addition to a sequence of bytes, each feature has the following associated metadata:

Location (mandatory): Each feature must have at least one location. There are few restrictions on the nature or form of the location. Features and locations have a one-to-many relationship, such that different features necessarily have different locations, but a single feature might have multiple locations if copies of it are present at multiple locations. Differential analysis algorithms can be dramatically more efficient if locations are sorted and then searched using a binary search algorithm. LOC(A, f) is a conceptual function that returns all of the locations of feature f in object A. In practice this function is implemented by searching A for all features and arranging them in a reverse index.

Name (optional): In many cases, it is useful for features to have names so that they can be referred to in the human-readable difference report. NAME(A, f) is a conceptual function that returns all of the names for feature f in object A.

Timestamp(s) and other metadata (optional): Features can also have one or more timestamps. Typically timestamps are extracted from the features themselves, although they may be extracted from elsewhere in the target media. Most obviously, timestamps are useful for constructing timelines, but they can also be used to find temporal inconsistencies (Section 5.3). Beyond timestamps, features could have additional metadata as well.

5.2. Change primitives

Images A and B can be viewed as sets of features, F(A) and F(B). Differential analysis can then be reduced to identifying the specific operations for converting feature set F(A) into F(B). These operations are called change primitives. Change primitives operate on features, not images, and as such, do not explain image deltas but rather feature set deltas.

Unlike revision control systems such as Darcs, which need to be able to revert changes, forensic differencing does not require the change primitives to be invertible.

With respect to any feature f, the change primitive needed to transform (A, f) → (B, f) can be identified using the rules described in text in Table 1 and in set notation in Table 2. The full set of primitives can be efficiently calculated by enumerating all features, feature names, and feature locations in both objects, removing the features that are unchanged between A and B, identifying the features created and deleted, and finally identifying features that have moved, been renamed, had names added, and had names deleted.

Table 1
Change detection rules in English.

- If something did not exist and now it does, it was created.
- If it did exist before and now it does not, it was deleted.
- If it is in a new location, it was moved.
- If more copies of it exist, it was copied.
- If fewer copies of it exist, something got deleted.
- Aliasing means names can be added or deleted.

Table 2
Abstract rules for transforming A into B (A →R B), based on observed changes to features (f), feature locations (LOC(A, f)), and feature names (NAME(A, f)). Although the RENAME primitive is not strictly needed (it can be implemented with an ADDNAME and a DELNAME), it is useful to distinguish the two operations.

Rule                                                                          | Change primitive for A →R B
f ∈ F(A) and f ∈ F(B)                                                         | (No change)
f ∈ F(A) and f ∉ F(B)                                                         | DELETE f
f ∉ F(A) and f ∈ F(B)                                                         | CREATE f
|LOC(A, f)| = 1 and |LOC(B, f)| = 1 and LOC(A, f) ≠ LOC(B, f)                 | MOVE LOC(A, f) → LOC(B, f)
|LOC(A, f)| < |LOC(B, f)|                                                     | COPY LOC(A, f) → (LOC(B, f) \ LOC(A, f))
|LOC(A, f)| > |LOC(B, f)|                                                     | DELETE (LOC(A, f) \ LOC(B, f))
|NAME(A, f)| = 1 and |NAME(B, f)| = 1 and NAME(A, f) ≠ NAME(B, f)             | RENAME NAME(A, f) → NAME(B, f)
(|NAME(A, f)| ≠ 1 or |NAME(B, f)| ≠ 1) and n ∉ NAME(A, f) and n ∈ NAME(B, f)  | ADDNAME f, n
(|NAME(A, f)| ≠ 1 or |NAME(B, f)| ≠ 1) and n ∈ NAME(A, f) and n ∉ NAME(B, f)  | DELNAME f, n
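The rules in Table 2 map almost directly onto set operations. The following sketch is our own minimal rendering of those rules in Python (the language of the tools in Section 6), assuming features are hashable values and that locations and names are supplied as dictionaries of sets; it is not code from any of the tools described later in this paper.

```python
def change_primitives(F_A, F_B, LOC_A, LOC_B, NAME_A, NAME_B):
    """Emit change primitives per Table 2 (illustrative sketch).

    F_A, F_B      : sets of features extracted from images A and B.
    LOC_*, NAME_* : dicts mapping a feature to its set of locations/names.
    """
    ops = []
    for f in F_A - F_B:
        ops.append(("DELETE", f))
    for f in F_B - F_A:
        ops.append(("CREATE", f))
    for f in F_A & F_B:                        # feature present in both images
        la, lb = LOC_A.get(f, set()), LOC_B.get(f, set())
        na, nb = NAME_A.get(f, set()), NAME_B.get(f, set())
        if len(la) == 1 and len(lb) == 1 and la != lb:
            ops.append(("MOVE", f, la, lb))
        elif len(la) < len(lb):                # new copies appeared
            ops.append(("COPY", f, la, lb - la))
        elif len(la) > len(lb):                # some copies disappeared
            ops.append(("DELETE-LOC", f, la - lb))
        if len(na) == 1 and len(nb) == 1 and na != nb:
            ops.append(("RENAME", f, na, nb))
        else:
            for n in nb - na:
                ops.append(("ADDNAME", f, n))
            for n in na - nb:
                ops.append(("DELNAME", f, n))
    return ops
```

Applied to the extracted feature sets of a baseline and a final image, the returned list is a feature-level rendering of the transformation set R.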
5.3. Temporal inconsistencies

If A and B are from the same system and TB > TA, one would expect all new features in the feature set delta F(B) − F(A) to be timestamped after TA. If B contains features that predate TA or postdate TB, then there is an inconsistency. It is useful for analysts to identify and explain temporal inconsistencies when performing differential analysis:

1. Inconsistencies most obviously arise from tampering of some type: either evidence tampering (for example, the planting of new files with old file stamps), or system tampering (for example, changing the system's clock to a previous time) (Marrington et al., 2011).
2. Inconsistencies can be a result of unexpected system operation (for example, the Unix cp -p command will preserve the mtime (modification date) of a file that is copied, but the file's ctime (inode change time) will necessarily reflect the time that the copy is made) (Garfinkel et al., 2010).
3. Inconsistencies may be inherent in the manner that systems track time. For example, Nelson (in press) found that for Microsoft Windows Registry hives the last-updated key, the hive header, and the hive file could have inconsistent mtimes, while Garfinkel and Rowe (2010) found that Windows rounds many times to the hour.
4. Finally, inconsistencies can result from tool error.

5.4. Reporting

Once the features have been extracted, the change primitives enumerated, and temporal inconsistencies evaluated, the strategy enters the reporting stage. This stage has two primary missions: first, to suppress irrelevant information and second, to emphasize the important differences.

As discussed above, source code control systems detect differences and propagate them to all clients so that all have identical copies of the same source code once local modifications are taken into account. That is, it requires that R be sufficiently rich to allow B to be generated from A. Forensic examiners rarely have such a stringent requirement for differential analysis. Being already in possession of both A and B, their goal, instead, is to determine what has changed at the appropriate level of detail to enable their objectives (Section 4.1). Thus forensic differential analysis frequently requires that extraneous information be suppressed.

As previously noted (Section 4.2), the most straightforward way to suppress extraneous information is by simply not extracting unwanted features. But some kinds of suppression (such as reporting increases in feature counts but not decreases) can only be done in the reporting stage. Some techniques are:

1. Present count statistics rather than the actual features. For example, a report differencing two network streams may only present the number of common IP address/port pairs rather than enumerating all of them.
2. Organize the features in hierarchies and allow the viewer to drill down into the level of detail needed.
3. Organize the features into timelines.
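Technique 1, reporting count statistics rather than raw features, can be prototyped in a few lines. The sketch below is our illustration rather than output from any of the tools in Section 6, and it assumes features have already been tagged with a type label such as "email" or "ipaddr".

```python
from collections import Counter

def count_report(features_a, features_b):
    """Summarize a feature-set delta as per-type counts (illustrative sketch).

    features_a, features_b: iterables of (feature_type, value) pairs,
    e.g. ("email", "alice@example.com"); the tagging scheme is assumed.
    """
    a, b = set(features_a), set(features_b)
    new_by_type = Counter(ftype for ftype, _ in b - a)
    gone_by_type = Counter(ftype for ftype, _ in a - b)
    for ftype in sorted(set(new_by_type) | set(gone_by_type)):
        print(f"{ftype}: {new_by_type[ftype]} new, "
              f"{gone_by_type[ftype]} no longer present")
```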

6. Tools we have written

In this section we present tools that implement variants of the general algorithm.

6.1. idifference: differences between two different disk images

One of the most basic differential analysis tasks is to compare two disk images and report the files that have been added, deleted, renamed, and altered. This is the basic functionality that Garfinkel (2009) introduced in the idifference.py program. However, the program's initial implementation had subtle bugs that only became evident when we attempted to explain its behavior using the algorithm described in this article.

Our current idifference.py program is based upon Garfinkel (2012)'s DFXML toolset. The program reads a DFXML file associated with each disk image. (If a DFXML file is not available, the fiwalk program is run as a subprocess to produce a DFXML stream which is processed instead.) Each DFXML file contains a <fileobject> XML block for each allocated, deleted or orphaned file. These XML blocks are used to create Python fileobject objects. Each object can be queried for the corresponding file's length, contents, the hash of the contents, and metadata such as the file modification time. A SAX-based framework makes it relatively easy to write Python programs that ingest and process XML files that are gigabytes in size.

This basic disk differencing implementation maintains two data structures for each disk image:

- fnames[], a Python dictionary that maps complete path names to file objects.
- inodes[], a Python dictionary that maps the (partition, inode #) pair to a unique file object.

These dictionaries reside in an instance of the DiskState class. When a disk image B is processed, the DiskState instance has the fnames[] and inodes[] dictionaries associated with image A. Then for each file object fi, the following operations are performed:

1. If fi is not allocated, it is ignored. (This program only reports differences of allocated files.)
2. The program retrieves the file object associated with fi's file name from A. This can be calculated with: ofi = fnames[fi.filename()]
3. If there is no entry in the fnames[] array for the name fi.filename(), the file is new.
4. If ofi.sha1() != fi.sha1(), then the file's contents have changed.
5. If ofi.mtime() != fi.mtime(), the file's modification time has changed.
6. The fi file object is removed from fnames[].
7. Finally, the file object is added to the new_sha1s[], new_fnames[], and new_inodes[] dictionaries, which will represent the version of the disk image at time Tn when the disk image associated with time Tn+1 is processed.

After all of the files in B are processed:

1. The fnames[] array contains a list of the file names that were present in A but not in B. These file names are reported as a list of files that were deleted.
2. inodes[] is set to new_inodes[].
3. fnames[] is set to new_fnames[].

This program views inodes as the features that are extracted from the disk image. Here the "name" of the feature is the path name. A change of the feature's name is a rename event, which strictly corresponds to a file that has been renamed.
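The comparison loop described above can be paraphrased as follows. The accessor names (filename(), sha1(), mtime(), allocated(), partition(), inode()) follow the fileobject interface described in the text, but the sketch itself is ours: it omits the reporting options, rename detection via inodes, and the SAX plumbing of the real idifference.py.

```python
class DiskState:
    """Holds the state of the previously processed image (illustrative sketch)."""

    def __init__(self):
        self.fnames = {}   # path name            -> fileobject
        self.inodes = {}   # (partition, inode #) -> fileobject

    def next_image(self, fileobjects):
        """Compare the image described by `fileobjects` against the prior one."""
        new_fnames, new_inodes = {}, {}
        for fi in fileobjects:
            if not fi.allocated():
                continue                               # only allocated files are reported
            ofi = self.fnames.get(fi.filename())
            if ofi is None:
                print("NEW", fi.filename())
            else:
                if ofi.sha1() != fi.sha1():
                    print("CHANGED CONTENT", fi.filename())
                if ofi.mtime() != fi.mtime():
                    print("CHANGED MTIME", fi.filename())
                del self.fnames[fi.filename()]          # mark the old entry as seen
            new_fnames[fi.filename()] = fi
            new_inodes[(fi.partition(), fi.inode())] = fi
        for name in self.fnames:                        # anything left was deleted
            print("DELETED", name)
        self.fnames, self.inodes = new_fnames, new_inodes
```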

6.2. rdifference: differences between two registry hives

We have also tailored a version of idifference.py to describe the difference between two Windows Registry hive files. The new tool is called rdifference.py. This is possible because the Registry behaves much like a simple file system. The Registry as a whole exposes a hierarchical namespace, and each hive is "mounted" like a child file system at fixed points. For example, the system hive file mounts at HKEY_LOCAL_MACHINE\SYSTEM. Nelson (in press) designed an XML format, RegXML, to represent hives, and implemented a RegXML processing interface in the DFXML toolset. The same SAX model that supports idifference.py has an interface for hive cells similar to file-system inodes.

Several distinctions relevant to differencing exist between hives and more-complete file systems. File system "Directories" and "Files" have analogous structures in hives, "Key" and "Value" cells respectively. Values have one feature file metadata lack: content is explicitly typed, for instance as UTF-16 characters or binary. However, most file system metadata does not exist in hives:

- There are only mtimes; there are no access, creation, or change times.
- Keys and the hive header have mtimes, but values do not.
- There is no notion of an "inode" as a data structure always separate from cell content.

The general algorithm of rdifference.py is a subset of idifference.py. We presently omit rename detection, as cells have only their paths as unique identifiers. (It may be possible to measure subtree similarity to infer renames, but we leave that to future work.) The program presently reports:

- New and deleted hive cells, including cells that have precisely matching full paths (i.e., a cell being fully duplicated, including name).
- Deleted cells.
- Values with modified content or type.
- Keys with changed mtimes (from which one can infer a changed value's mtime).

It is important for tools to be cautious about assuming a unique path for each object in a file system analysis: Nelson (in press) observed hives that contained cells with non-unique paths. That is, some key in the hive had at least two children with completely matching names. This may cause a tool that walks a file system to fail in some manner, as path names are normally unambiguous. Hence, it is important to identify path components during processing by a characteristic that must be unique, such as a byte offset. These corner-case hives came from a research corpus of drives (Garfinkel et al., 2009) used and discarded by people around the world.

6.3. bulk_diff.py

The bulk_diff program compares histograms from two runs of the bulk_extractor program (Beverly et al., 2011) and reports on the differences between them. Written to address the real-world needs of forensic analysts, this program reports email addresses, URLs, IP addresses, domain names, and other information that are present in B but not present in A.

The bulk_diff program was written to detect new activity on the part of a computer system subject to monitoring. For example, in the M57-Patents scenario (Garfinkel et al., 2009), five computers belonging to a simulated small company had their hard drives imaged every day for three weeks. The bulk_diff program makes it possible to rapidly infer what happened on any day n by running bulk_extractor on In and on In+1 and comparing the results. URLs, email addresses, and other features found on In+1 that are not on In necessarily correspond to activity that took place on day n. Features present on In that are not present on In+1 have little to no probative value, as they merely correspond to files that were overwritten during the course of operation on day n. (While one might argue that the sudden drop of many features may correspond to activities on day n intended to remove evidence, a drop might just as easily correspond to an increase in legitimate activity.)

6.4. corpus_sync.py

Previously we noted that the problem of synchronizing two file systems is a special case of the differential analysis problem. This was an important realization in our group that has allowed us to work with our multi-terabyte forensic corpora in a more efficient manner.

Most file synchronization programs in use today (rsync and Unison) are designed for synchronization over a network and assume IP connectivity during the synchronization process. We are not aware of an offline file synchronization system, for example, a system that would allow synchronization using data carried on terabyte hard drives.

We therefore created a program called corpus_sync that synchronizes two file systems using a DFXML file that describes B, the master file system, a second DFXML file that describes A, the file system to be brought into agreement with B, and DB, a database of additional materials that can be retrieved by hash value (this database would presumably be placed on removable media and hand carried or shipped from B to A).

The corpus_sync program implements the abstract rules in Table 2, but it orders them and adds an additional step to handle the case where file1 is renamed to be file2 while file2 is renamed to be file1. The algorithm is thus:

1. Read the DFXML files for A and B into memory.
2. Every file afi whose hash value is not in B is removed from the file system and from A.
3. For every file bfi in B:
   (a) If bfi's filename is in A and bfi's hash value is the same as the file in A with the same name, ignore bfi.
   (b) If bfi's hash value is not in A, get bfi from DB.
   (c) Let afi be a file in A that has the same hash value as bfi.
   (d) Add the name bfi.filename() + '.new' to afi.
4. For every file filename + '.new' that was created, rename the file to filename, erasing filename first if it already exists.

The DFXML files used with this program are generated by the md5deep program, allowing corpus_sync itself to be quite small. In practice, the get operation in step 3b can either create a list of files that need to be transferred, or can copy the file from the transfer media. On a modern 64-bit system we are able to process the DFXML file for a corpus of more than a million objects in under 300 s (not including the time necessary to make the file system alterations).
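The four steps above can be paraphrased in Python roughly as follows. The manifest representation, the fetch_from_db() stub, and the use of hard links for step 3d are assumptions made for this sketch; it is not the corpus_sync.py source, and it ignores directory creation and error handling.

```python
import os

def corpus_sync(manifest_a, manifest_b, root, fetch_from_db):
    """Bring the tree at `root` (described by manifest_a) into agreement with
    manifest_b, following steps 1-4 above (illustrative sketch).

    manifest_a, manifest_b: dicts mapping file name -> hash value.
    fetch_from_db(h, dest):  copies the file with hash h to dest (stub).
    """
    hashes_b = set(manifest_b.values())
    by_hash_a = {}
    # Step 2: remove files whose content no longer appears anywhere in B.
    for name, h in list(manifest_a.items()):
        if h not in hashes_b:
            os.remove(os.path.join(root, name))
            del manifest_a[name]
        else:
            by_hash_a.setdefault(h, name)
    # Step 3: stage every changed or new file of B under a temporary ".new" name,
    # so that swapped file names (file1 <-> file2) are handled safely.
    for name, h in manifest_b.items():
        if manifest_a.get(name) == h:
            continue                                   # 3a: already correct
        dest = os.path.join(root, name + ".new")
        if h not in by_hash_a:
            fetch_from_db(h, dest)                     # 3b: pull from the code book
        else:
            os.link(os.path.join(root, by_hash_a[h]), dest)   # 3c/3d: reuse local copy
    # Step 4: move each staged name into place, replacing any existing file.
    for name in manifest_b:
        staged = os.path.join(root, name + ".new")
        if os.path.exists(staged):
            os.replace(staged, os.path.join(root, name))
```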
URLs, þ email addresses, and other features found on In 1 that are þ 6.5. flowdiffddifference between two pcap files not on In necessarily correspond to activity that took place on day n. Features present on In that are not present on In 1 þ We are developing a tool that uses the strategy have little to no probative value, as they merely correspond described in this document to determine the differences fi to les that were overwritten during the course of opera- between two sets of network packet capture flows. Each tion on day n. (While one might argue that the sudden drop flow is processed with the tcpflow (Elson and Garfinkel, of many features may correspond to activities on day n 2011) TCP session reconstruction tool. This tool was intended to remove evidence, a drop might just as easily recently modified to output a DFXML file with correspond to an increase in legitimate activity.) a <fileobject> section for each flow. The feature extracted for each flow will consist of a signature for the protocol in 6.4. corpus_sync.py questiondfor example, HTTP or POP. The addresses of each flow will be IP addresses and port number of the flow’s Previously we noted that the problem of synchronizing origin. We expect that flowdiff will be available for down- two file systems is a special case of the differential analysis load by April 2012. problem. This was an important realization in our group that has allowed us to work with our multi-terabyte 7. Case study: file system differencing forensic corpora in a more efficient manner. Most file synchronization programs in use today (rsync In the M57-Patents scenario (Woods et al., 2011), one of and Unison), are designed for synchronization over the personas is a suspect for possessing (simulated) a network and assume IP connectivity during the contraband imagery. In the scenario, benign cat pictures are synchronization process. We are not aware of an offline file proxies for illicit materials. Suspicions arose when synchronization systemdfor example, a system that would a computer the persona was using was sold on Craigslist S58 S. Garfinkel et al. / Digital Investigation 9 (2012) S50–S59

Table 3
Summary statistics for changes in "Jo" computers from the M57-Patents scenario (Woods et al., 2011). Counts are based on inodes. The chosen times represent the base image (Nov. 12, 2009, start of day), the first work day (Nov. 16), the next to last day of work for the machine (Nov. 19), and the last day of work for the machine due to software failures (Nov. 20). All images were taken at the end of the day except for the Nov. 12 starting image.

                                    | T11-12 start → T11-20 | T11-12 start → T11-16 | T11-16 → T11-19 | T11-19 → T11-20
Files in former image               | 24,131                | 24,131                | 28,735          | 29,678
Files in latter image               | 30,497                | 28,735                | 29,678          | 30,497
New files                           | 8546                  | 5140                  | 1157            | 2773
Deleted files                       | 1900                  | 200                   | 98              | 1814
Renamed files                       | 463                   | 449                   | 566             | 703
Files with modified content         | 1011                  | 687                   | 981             | 568
Files with changed file properties  | 3581                  | 1906                  | 4275            | 1784

There is a warrant to search that persona's electronic possessions for the contraband imagery. We show how, with a baseline image available, we can identify content created, removed, updated, modified, or viewed since deployment. This can greatly reduce the search scope for an investigator.

The scenario data include one or more images per computer for each day of operation for the (fictional) company, and baseline images for the initial four computers. It may be that administrators have a subset of these data, particularly a deployment image and an image at time of collection, so they could perform differential analysis. We used idifference.py to produce differences between several of the "Jo" images. Table 3 summarizes the change counts and describes the selected times. Table 3 also shows the count of inodes within the disk images, to show the reduction in files one would need to analyze by only looking at changed files. Rename operations make it difficult to get a precise match between the difference in file counts and the difference in new and deleted files. We include differences from disk images that would not commonly be collected, to show normal activity amounts.

Many of the differences appear to come from non-interactive processes. Software updates and Windows background activity dominate the output. However, since in this case the police know they are looking for illicit images, filtering the difference reports for the ".jpg" extension should trim down the results to what they need to show pictures placed on the drive. Indeed, the difference report of T11-12 start → T11-20 reports on a little over three hundred files ending with ".jpg" that changed. Many of those JPEGs turn out to be significant in the scenario. With full file paths provided in the difference report, investigators can quickly find files of interest.

8. Future research

As noted in Section 4.2, to facilitate interpretation for forensic purposes, high-level differential reports are needed. Research and tool building is needed for automatic generation of these differential reports emphasizing occurrence of new features, changes in counts, and disappearance of features. The idifference.py and bulk_diff tools discussed in Section 6 could be generalized to report changes in any features present in the DFXML, regardless of whether the DFXML images come from fiwalk, bulk_extractor, or some other tool. With an appropriate front end, the user could drill in and select which feature types are of interest to build tailored reports.

By taking the union of extracted feature sets produced by different feature extraction tools and performing differencing on the resulting feature sets, one can build a natural hierarchy of more powerful tools. For example, one could union the fiwalk and RegXML feature sets, yielding a fusion between file system and Windows Registry features. A differencing tool could then identify co-occurrences in the differences. As discussed in Section 3, differential analysis often involves timeline analysis. Existing timeline approaches can be viewed as very high-level feature extractors: the timeline is the feature. Research is needed on differencing the timelines extracted from pairs (or series) of images.

Further research will always be needed in developing new features, ways of integrating features into metadata, and techniques for extracting features from images. When evaluating the digital evidence (cellphones, computers, etc.) for a newly captured criminal, at first glance there are not pairs of images to delta, just the current images. However, one can always perform differencing against stock images extracted from virgin equipment and OS installs. Changes in firmware and OS updates can help timeline the captive's behavior and location. Modulo legal issues, if the captive is subsequently released and re-caught, differencing can be performed between previous and new images. Research is needed on what features are relevant for deciding on holding or releasing.

9. Conclusion

All differencing tasks are fundamentally identical, whether they operate directly at the content level, as diff does, or on higher-level features, as does the bulk_diff tool we introduce above. In this paper we introduce a common vernacular for discussing differencing and outline the state of the art for both forensic and traditional applications. From this foundation we note the universal similarities and elicit a general strategy hidden in all extant differencing systems.

By explicitly leveraging this strategy, we propose that development of future differencing tools will be streamlined. To demonstrate, we outline the functionality of our tools through the lens of our terminology and strategy. Furthermore, we expect that educating forensic analysts to use new differencing techniques will be easier if couched within the context of our strategy.

Acknowledgments

Portions of this work were funded by NSF Award DUE-0919593. We wish to thank Adam Russell for his useful inputs.

The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce, distribute, or authorize reprints for any reason notwithstanding any copyright annotations thereon.

References

Access Data. Forensic Toolkit (FTK); 2011.
Apiwattanapong T, Orso A, Harrold MJ. A differencing algorithm for object-oriented programs. In: Proceedings of the 19th IEEE international conference on automated software engineering. Washington, DC, USA: IEEE Computer Society; 2004. p. 2–13.
Beverly R, Garfinkel S, Cardwell G. Forensic carving of network packets and associated data structures. In: Proceedings of the 2011 DFRWS conference; 2011. p. S78–89.
Carrier B. The Sleuth Kit and Autopsy: forensics tools for Linux and other Unixes; 2005.
Duley A, Spandikow C, Kim M. A program differencing algorithm for Verilog HDL. In: Proceedings of the IEEE/ACM international conference on automated software engineering. New York, NY, USA: ACM; 2010. p. 477–86. ASE '10.
Elson J, Garfinkel S. tcpflow; 2011.
Garfinkel S. Digital Forensics XML. Digital Investigation 2012;8:161–74.
Garfinkel S, Parker-Wood A, Huynh D, Migletz J. A solution to the multi-user carved data ascription problem. IEEE Transactions on Information Forensics and Security 2010;5(4):868–82.
Garfinkel S, Rowe N. Global analysis of drive file times. In: Fifth international workshop on systematic approaches to digital forensic engineering. Oakland, CA: IEEE; 2010. p. 97–108.
Garfinkel SL. Automating disk forensic processing with SleuthKit, XML and Python. In: Proceedings of the fourth international IEEE workshop on systematic approaches to digital forensic engineering. Oakland, CA: IEEE; 2009. p. 73–84.
Garfinkel SL, Farrell P, Roussev V, Dinolt G. Bringing science to digital forensics with standardized forensic corpora. In: Proceedings of the 9th annual digital forensic research workshop (DFRWS). Quebec, CA: Elsevier; 2009.
Grosenbach G. Git source code control. 1st ed. PeepCode; 2007.
Guidance Software, Inc. EnCase Forensic; 2011.
Horwitz S. Identifying the semantic and textual differences between two versions of a program. In: Proceedings of the ACM SIGPLAN 1990 conference on programming language design and implementation. New York, NY, USA: ACM; 1990. p. 234–45. PLDI '90.
Hunt JW, McIlroy MD. An algorithm for differential file comparison. Technical report 41. Bell Laboratories; 1976.
Jacobson J. A formalization of Darcs patch theory using inverse semigroups. Technical report CAM09-83. UCLA; 2009.
Kim GH, Spafford EH. The design and implementation of Tripwire: a file system integrity checker. In: Proceedings of the 2nd ACM conference on computer and communications security. New York, NY, USA: ACM; 1994. p. 18–29. CCS '94.
Kornblum J. md5deep and hashdeep, latest version; 4.1.2011. http://md5deep.sourceforge.net/ [last accessed 18.02.12].
Levy J. Differential EnScript v1; 2012.
Lindholm T, Kangasharju J, Tarkoma S. Fast and simple XML tree differencing by sequence alignment. In: Proceedings of the 2006 ACM symposium on document engineering. New York, NY, USA: ACM; 2006. p. 75–84. DocEng '06.
Loh A, Kim M. LSdiff: a program differencing tool to identify systematic structural differences. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering, vol. 2. New York, NY, USA: ACM; 2010. p. 263–6. ICSE '10.
Marrington A, Baggili I, Mohay G, Clark A. CAT Detect (computer activity timeline detection): a tool for detecting inconsistency in computer activity timelines. In: Proceedings of the 2011 DFRWS conference; 2011. p. S52–61.
Myers E. An O(ND) difference algorithm and its variations. Algorithmica 1986;1:251–66.
Nelson AJ. RegXML: XML conversion of the Windows Registry for forensic processing and distribution. In: Chow KP, Shenoi S, editors. Advances in digital forensics VIII. Springer; IFIP Advances in Information and Communication Technology, in press.
NIST. National Software Reference Library; 2009. http://www.nsrl.nist.gov/Documents/NSRL-CFS-April-2009.pdf.
Olsson J, Boldt M. Computer forensic timeline visualization tool. In: Proceedings of the 2009 DFRWS conference; 2009. p. S78–87.
Pierce BC, Vouillon J. What's in Unison? A formal specification and reference implementation of a file synchronizer. Technical report; 2004.
Pilato C, Collins-Sussman B, Fitzpatrick B. Version control with Subversion. 2nd ed. O'Reilly Media, Inc.; 2008.
Roundy D. Darcs: distributed version management in Haskell. In: Proceedings of the 2005 ACM SIGPLAN workshop on Haskell. New York, NY, USA: ACM; 2005. p. 1–4. Haskell '05.
Schatz B, Mohay G, Clark A. A correlation method for establishing provenance of timestamps in digital evidence. In: Proceedings of the 2006 DFRWS conference; 2006. p. S98–107.
Thompson K, Ritchie DM. diff: differential file comparator. UNIX programmer's manual. 6th ed.; 1975.
Tichy WF. Design, implementation, and evaluation of a revision control system. In: Proceedings of the 6th international conference on software engineering. Los Alamitos, CA, USA: IEEE Computer Society Press; 1982. p. 58–67. ICSE '82.
Tridgell A, Mackerras P. The rsync algorithm. Technical report TR-CS-96-05; ANU computer science technical reports; 1996.
Watkins K, McWhorter M, Long J, Hill B. Teleporter: an analytically and forensically sound duplicate transfer system. In: Proceedings of the 2009 DFRWS conference; 2009.
Weaver GA, Foti N, Bratus S, Rockmore D, Smith SW. Using hierarchical change mining to manage network security policy evolution. In: Proceedings of the 11th USENIX conference on hot topics in management of internet, cloud, and enterprise networks and services. Berkeley, CA, USA: USENIX Association; 2011. p. 8. Hot-ICE '11.
Weaver GA, Smith SW. Context-free grep and hierarchical diff. In: LISA '11: 25th large installation system administration conference. USENIX; 2011.
White D. Tracking computer use with the Windows Registry dataset; 2008.
Woods K, Lee C, Garfinkel S, Dittrich D, Russell A, Kearton K. Creating realistic corpora for forensic and security education. In: 2011 ADFSL conference on digital forensics, security and law. Richmond, VA: Elsevier; 2011.