Calhoun: The NPS Institutional Archive
Faculty and Researcher Publications
2012 Digital forensics XML and the DFXML toolset
Garfinkel, Simson
Digital Investigation, Volume 8, pp. 161-174, 2012
http://hdl.handle.net/10945/44286
Digital Investigation 8 (2012) 161–174
Digital Investigation
journal homepage: www.elsevier.com/locate/diin
Digital forensics XML and the DFXML toolset
Simson Garfinkel*
Naval Postgraduate School, 900 N. Glebe, Arlington, VA 22203, USA

Article info

Article history:
Received 6 May 2011
Received in revised form 18 November 2011
Accepted 25 November 2011

Keywords:
Digital forensics XML
DFXML
Forensic tools
Forensic tool validation
Forensic automation

Abstract

Digital Forensics XML (DFXML) is an XML language that enables the exchange of structured forensic information. DFXML can represent the provenance of data subject to forensic investigation, document the presence and location of file systems, files, Microsoft Windows Registry entries, JPEG EXIFs, and other technical information of interest to the forensic analyst. DFXML can also document the specific tools and processing techniques that were used to produce the results, making it possible to automatically reprocess forensic information as tools are improved.

This article presents the motivation, design, and use of DFXML. It also discusses tools that have been created that both ingest and emit DFXML files.

Published by Elsevier Ltd.
* Corresponding author. Tel.: +1 617 876 6111. E-mail address: slgarfi[email protected].

1. Introduction

Digital Forensics XML (DFXML) is an XML language designed to represent a wide range of forensic information and forensic processing results. By matching its abstractions to the needs of forensics tools and analysts, DFXML allows the sharing of structured information between independent tools and organizations. Since the initial work in 2007, DFXML has been used to archive the results of forensic processing steps, reducing the need for re-processing digital evidence, and as an interchange format, allowing labeled forensic information to be shared between research collaborators. DFXML is also the basis of a Python module (dfxml.py) that makes it easy to create sophisticated forensic processing programs (or “scripts”) with little effort.

Forensic tools can be readily modified to emit and consume DFXML as an alternative data representation format. For example, the PhotoRec carver (Grenier, 2011) and the md5deep hashing application (Kornblum, 2011) were both modified to produce DFXML files. The DFXML output contains the files identified, their physical location within the disk image (in the case of PhotoRec), and their cryptographic hashes. Because these programs now both emit compatible DFXML, their output can be processed by a common set of tools.

DFXML can also document provenance, including the computer on which the application program was compiled, the linked libraries, and the runtime environment. Such provenance can be useful both in research and in preparing courtroom testimony.

DFXML’s minimal use of XML features means that the forensic abstractions, APIs and representations described in this paper can be readily migrated to other object-based serializations, including JSON (Zyp and Court, 2010), Protocol Buffers (Google, 2011) and the SQL schema implemented in SleuthKit 3.2 (Carrier, 2010). Indeed, it is possible to readily convert between all four formats.

1.1. The need for DFXML

Today’s digital forensic tools lack composability. Instead of being designed with the Unix approach of tools that can be connected together to solve big problems, most commonly used forensic tools are monolithic systems designed to ingest a small number of data types (typically disk images and hash sets) and produce a limited set of output types (typically individual files and final reports).
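The composability argument can be made concrete with a small sketch. It assumes that a carver and a hashing tool both write per-file records in the same XML vocabulary, so a single reader handles both outputs and can re-serialize them as JSON. The element names used here (dfxml, fileobject, filename, filesize, byte_run, hashdigest) and the sample records are assumptions chosen for illustration; they are not taken from this section, the samples omit XML namespaces, and the code uses only the Python standard library rather than the dfxml.py module.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical output of a file carver: a recovered file and where it
# sits in the disk image (sample values invented for illustration).
CARVER_XML = """<dfxml>
  <fileobject>
    <filename>f0000100.jpg</filename>
    <filesize>2048</filesize>
    <byte_runs>
      <byte_run offset="0" img_offset="51200" len="2048"/>
    </byte_runs>
  </fileobject>
</dfxml>"""

# Hypothetical output of a hashing tool: a file and its MD5 digest.
HASHER_XML = """<dfxml>
  <fileobject>
    <filename>report.doc</filename>
    <filesize>11</filesize>
    <hashdigest type="md5">5eb63bbbe01eeed093cb22bb8f5acdc3</hashdigest>
  </fileobject>
</dfxml>"""


def read_fileobjects(xml_text):
    """Return one plain dict per <fileobject> element in xml_text."""
    records = []
    for fo in ET.fromstring(xml_text).iter("fileobject"):
        records.append({
            "filename": fo.findtext("filename"),
            "filesize": int(fo.findtext("filesize", default="0")),
            # Map each hash algorithm name to its digest string.
            "hashes": {h.get("type"): h.text for h in fo.iter("hashdigest")},
            # Convert every byte_run attribute to an integer.
            "byte_runs": [
                {name: int(value) for name, value in run.attrib.items()}
                for run in fo.iter("byte_run")
            ],
        })
    return records


# The same reader consumes both tools' output; the merged records are
# then written in a second serialization (JSON).
records = read_fileobjects(CARVER_XML) + read_fileobjects(HASHER_XML)
as_json = json.dumps(records, indent=2)
print(as_json)
```

Because the records are reduced to dictionaries, lists, strings and integers, moving them to another object-based serialization requires no knowledge of which tool produced them.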
1742-2876/$ – see front matter Published by Elsevier Ltd. doi:10.1016/j.diin.2011.11.002
This is true both of tools with limited functionality (e.g., simple file carvers), as well as complex GUI-based tools that include integrated scripting languages. The lack of composability complicates automation and tool validation efforts, and in the process has subtly limited the progress of digital forensics research.

Although there are existing file formats and a few XML languages used in digital forensics today, they are confined to specific applications and limited domains. The lack of standardized abstractions makes it difficult to compare results produced by different tools and algorithms. This lack of standardization similarly impacts tool developers, who must frequently implement functions in their tools that exist in others.

1.2. Specific uses for DFXML

DFXML improves composability by providing a language for describing common forensic processes (e.g., cryptographic hashing), forensic work products (e.g., the location of files on a hard drive), and metadata (e.g., file names and timestamps).

Various prototype DFXML implementations have been used by the author since 2007 for a variety of purposes:

1.3. Contributions

This paper makes several specific contributions to the field of digital forensics. First, it describes the motivation and design goals for DFXML. Second, the paper presents specific examples of how DFXML can be used to describe forensic artifacts. These examples make it easy for developers of today’s forensic tools to adapt their tools to emit and ingest DFXML as a complement to their current file formats. Next, it presents an API that allows for the rapid prototyping and development of forensic applications. Finally, it describes how the DFXML abstractions can be used as a building block for creating new automated forensic processes.

2. Prior work

Although file formats, abstractions, and XML are all used in digital forensics today, they are rarely themselves the subject of study. Mainly, these topics arise when practitioners discover that they cannot share information with one another, or even between different tools, because data are stored in different formats.

2.1. Digital evidence containers