DFRWS 2015 US

Wirespeed: Extending the AFF4 forensic container format for scalable acquisition and live analysis

Bradley L. Schatz

Schatz Forensic, Level 10 149 Wickham Tce Brisbane, QLD 4000, Australia

Abstract

Current approaches to forensic acquisition are failing to scale to large devices and fast storage interfaces. The research described in this paper identifies limitations in current, widely deployed forensic image formats which limit both the ability to acquire evidence at maximal rates and the ability to undertake live analysis in today's environment. Extensions to the AFF4 forensic file format are proposed which address these limitations. The proposals have been implemented, and proof of concept demonstrated by showing that non-linear partial images may be taken at rates that exceed current physical acquisition approaches, and that linear acquisition can proceed at rates significantly exceeding current approaches, in the range of 400-500 MB/s (24-30 GB/min).

Keywords: Digital forensics; Evidence containers; Live forensics; Acquisition; Imaging; AFF4

© 2015 The Authors. Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Introduction

Within the field of digital forensics the volume problem is well recognized: more devices and larger storage compound the amount of data to preserve and analyse per case. In 2015, storage technology exhibits a significant amount of diversity: the capacity of spinning disk hard drives grows at a rate faster than their I/O rates; SSDs are capable of I/O rates many times faster than their spinning disk cousins; and storage is increasingly absent a direct attachment, located across comparatively low I/O rate networks, such as cloud based storage.

Forensic acquisition has failed to scale with this growth in both volume and I/O rates. In practice, forensic imaging remains generally reliant on the imaging approach defined over a decade ago: the linear and complete image, integrity protected by a hash, and optionally block compressed (Rosen, 2002). In the field, the dominant methods of imaging generally achieve I/O rates in the realm of 100-150 MB/s, with hardware imaging tool manufacturers beginning to promise speeds of a maximum of 250 MB/s (Tableau, 2014). Few empirical studies exist in the literature in regard to acquisition throughput. Zimmerman (2013) observes far lower rates, in the low 100s of MB/s, and Bertasi and Zago (2013) observe peak rates of 110 MB/s.

Source devices with I/O rates exceeding the commodity SATA hard drive threshold of around 200 MB/s are increasingly common. Formerly it may have only been servers with RAID subsystems which produced this load; today's SSDs, and in-cloud virtual attached storage, commonly double this rate. Acquisition of such devices commonly proceeds at rates far lower than their maximum I/O rate.

The above trends result in longer acquisition times for bandwidth constrained spinning disks, and in sub-optimal acquisition durations for bandwidth rich devices. Both of these contribute to latency between the identification of target evidential devices and the gaining of meaningful analytic results, due in part to a lack of cohesion between forensic processing steps (Roussev et al., 2013).

Current responses to minimize this latency when facing very large sets of evidence fail to be wholly satisfactory to the practitioner: preserve a limited subset of available evidence and run the risk of not preserving potentially relevant evidence (triage), or cause significant availability impacts on computing resources while complete imaging of everything proceeds.

Current research proposals addressing this end to end latency issue are scarce. The LOTA proposal (Roussev et al., 2013) goes far in conceptualizing the problem as a real time processing problem; however, due to the requirement for low hundreds of CPU cores per target device being processed, the proposed approach is generally only applicable to in-lab usage, limiting its application to evidence seized (collected) from the field and processed within the laboratory.

The primary hypotheses of this research are that latency between the beginning of acquisition and the obtaining of analysis results can be reduced by reframing acquisition as a (potentially) partial and non-linear activity, and by employing a forensic image format that enables the former while not creating bottlenecks.

We posit that for contexts where seizure is not an option (for example civil, covert, and incident response contexts, and at the scale of hundreds of machines) this latency can be minimized in two ways. First, by employing a forensic container format that enables acquisition to proceed at maximal I/O rates. Secondly, by re-conceptualizing forensic acquisition as a process which occurs hand in hand with live analysis. Computationally feasible analysis tasks, and live analysis tasks which traditionally fall in the domain of triage activities or live preview, are applied as a priority, contributing to the incremental building of a partial (or eventually full) physical image.

Improvements to the AFF4 forensic container which resolve the identified shortcomings are proposed, and their effectiveness evaluated through a systematic set of experiments focused on high input rate acquisition.

Motivation

This section describes at a high level our acquisition and live analysis system (which we call Wirespeed), the implementation and testing of which has identified the shortcomings in existing forensic containers described later in this paper. We note that the architecture of the Wirespeed system is the subject of a separate publication.

Acquisition as an interactive & batch process

Fundamental to our proposal is the idea that the scope of acquisition should be subject to change and prioritized based on the live analysis activities of the examiner, and more generally by categories of forensic data identified for acquisition. Starting with an empty forensic image, the image is successively built from target blocks, first of volume metadata and then filesystem metadata. A non-linear, partial image is foundational to such an approach.

Blocks are then preserved into the image based on two separate activities.

An analyst might conduct a live analysis of the storage device via a virtual disk (i.e. iSCSI) interface to the image: blocks present in the partial image are returned to the analyst's preferred tool directly via the virtual hard disk, and blocks not present in the image are scheduled for reading from the target device and, when read, stored in the image and returned to the analyst's tool via the virtual hard disk. These interactive accesses are prioritized by a scheduler ahead of other accesses.

The analyst might also task the acquisition of various categories of data via a separate GUI controlling the acquisition process: our current implementation allows for file subsets such as event logs, registries & page files, and wider categories such as all allocated data and unallocated data. This enables the analyst to successively widen the scope of acquisition as their live analysis activities proceed. Further, live analysis might result in the determination that the target device is not relevant, and the acquisition process terminated, the resulting partial image remaining as a basis for reproducing the said analysis.

Maximizing acquisition rate

Maximizing the rate at which the above non-linear sequence of blocks might be assembled into a forensic image is a systems problem involving a range of variables, including:

- the maximal I/O rate of the target device;
- the maximal I/O rate of the target device interface;
- the method of hashing;
- the number of cores and CPU speed;
- the method of compression;
- the path taken in reading disk blocks;
- the available I/O interconnects to the image storage device;
- the maximal write rate of the filesystem implementation;
- the maximal write rate of the evidence container device.

Depending on budgetary constraints, it is reasonable to believe that on a small scale the examiner may be able to control the majority of these. For example, an analyst employing a removed dead disk acquisition using a forensic media duplicator ("drive cloner") in the field has control over many of the above, but not the maximal I/O rates of the target device and its storage interface.

In the field, and at a scale of numerous devices, examiners are regularly faced with far less control over these variables: budgetary constraints will limit the number of forensic duplicators able to copy removed dead disks in parallel. Common responses to these pressures are the usage of forensic live-CDs to employ in-situ dead disk techniques, necessitating the usage of the CPU and I/O channels of the target system. For such approaches, combined with old server hardware, acquiring via USB2 (max 40 MB/s) is, while not desirable, still in 2015 a reasonable option.

It is well understood in the literature that, for at least spinning disks, the I/O rate of reads from the disk is maximal for reads of long runs of consecutive blocks, with seeks imposing a significant penalty in throughput. Accordingly, we adopt a priority based scheduling approach that prioritises interactive analysis over non-interactive acquisition goals, and schedules the reading of blocks and caching in RAM to minimize disk seeks when acquiring non-linear block sequences.
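The scheduling idea can be made concrete with a small sketch. The following Python fragment is illustrative only and is not the Wirespeed implementation: it assumes a two-level priority (interactive requests ahead of background acquisition ranges) and coalesces contiguous queued ranges before issuing reads, so that long sequential runs are favoured over seek-heavy patterns.

```python
import heapq

INTERACTIVE, BACKGROUND = 0, 1  # lower value = higher priority

class ReadScheduler:
    """Toy priority scheduler for block-range reads (illustrative only)."""

    def __init__(self):
        self._queue = []   # heap of (priority, seq, start_lba, length)
        self._seq = 0

    def submit(self, start_lba, length, priority=BACKGROUND):
        # The sequence number keeps ordering stable for equal priorities.
        heapq.heappush(self._queue, (priority, self._seq, start_lba, length))
        self._seq += 1

    def next_run(self):
        """Pop the highest priority request and coalesce any queued request
        of the same priority that is contiguous with it, turning many small
        reads into one longer sequential read."""
        if not self._queue:
            return None
        prio, _, start, length = heapq.heappop(self._queue)
        merged = True
        while merged:
            merged = False
            for i, (p, _, lba, ln) in enumerate(self._queue):
                if p == prio and lba == start + length:
                    length += ln
                    del self._queue[i]
                    heapq.heapify(self._queue)
                    merged = True
                    break
        return (start, length, prio)

# Example: an interactive read jumps ahead of queued background ranges.
sched = ReadScheduler()
sched.submit(0, 2048)                          # background: volume metadata
sched.submit(1_000_000, 4096)                  # background: allocated blocks
sched.submit(52_000, 8, priority=INTERACTIVE)  # analyst touched these blocks
print(sched.next_run())                        # -> (52000, 8, 0)
```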

What is a disk image?

The preceding sections outlined our approach to minimizing latency in a combined forensic acquisition and live analysis system, and posited that the key enabler of the approach is in addressing limitations in the current conception of the evidence container. This section and the following one review the current state of the art.

For the purposes of this paper, an image refers to a physical forensic image, which is, by consensus in the field, generally agreed to refer to a near-complete contiguous bit for bit ordered copy of a storage medium, stored with a linear hash of the same bitstream. An image may be split into blocks and optionally compressed in such a form that it supports efficient random access. We say near-complete as the image may differ from the actual source evidence due, for example, to inaccessible sectors in the original yielding discontinuities containing unknown data.

Not all image formats proposed in the literature strictly adhere to the above definition; the reader is referred to the papers describing the original Advanced Forensic Format (AFF) (Garfinkel et al., 2006) and the AFF4 container format (Cohen et al., 2009) for a thorough review of evidence container formats currently in use within digital forensics.

As our work builds on the AFF4 work, we recap the key features of this format in comparison to the model described above. The AFF4 evidence container expands on the model by adding address space virtualization and a globally unique referencing scheme. The address space virtualization scheme allows one to construct a virtual bitstream based on mapping sub-bitstreams into a virtual address space. In AFF4 terminology the virtual bitstream is specified by a Map stream. Maps can point to arbitrary sources of bytes: the first such contemplated being an Image stream (a contiguous bit stream composed of compressed or uncompressed blocks of equal size). Such blocks are organized into groups, called Segments, for efficient indexing. The globally unique referencing scheme is used by the Map to refer to objects in the AFF4 system, ranging from byte sequences to entities, located locally or remotely.
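To illustrate the Map abstraction just described, the following sketch (illustrative only; it does not reproduce the AFF4 on-disk encoding) models a Map as a list of entries mapping runs of a virtual address space onto named target streams, and resolves a read by walking those entries.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class MapEntry:
    map_offset: int     # offset in the virtual (mapped) address space
    length: int
    target: str         # e.g. an Image stream URI, or a symbolic stream
    target_offset: int

class MapStream:
    """Toy model of an AFF4-style Map: a virtual bitstream assembled from
    runs of other streams (illustrative only)."""

    def __init__(self, entries, streams):
        self.entries = sorted(entries, key=lambda e: e.map_offset)
        self.starts = [e.map_offset for e in self.entries]
        self.streams = streams  # name -> callable(offset, length) -> bytes

    def read(self, offset, length):
        out = bytearray()
        while length > 0:
            i = bisect_right(self.starts, offset) - 1
            e = self.entries[i]
            run = min(length, e.map_offset + e.length - offset)
            out += self.streams[e.target](e.target_offset + offset - e.map_offset, run)
            offset += run
            length -= run
        return bytes(out)

# A two-entry map: the first 8 bytes come from an image stream,
# the next 8 from a zero-filled (symbolic) stream.
image_data = b"ABCDEFGH"
streams = {
    "aff4://image-stream": lambda off, ln: image_data[off:off + ln],
    "aff4:Zero": lambda off, ln: b"\x00" * ln,
}
m = MapStream([MapEntry(0, 8, "aff4://image-stream", 0),
               MapEntry(8, 8, "aff4:Zero", 0)], streams)
print(m.read(4, 8))   # b'EFGH\x00\x00\x00\x00'
```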
Limitations in current evidence container approaches and related work

The dominant approach to assurance of the integrity of the forensic image is the linear bitstream hash: generally an MD5 or SHA1 hash (or both) of the linear image byte sequence, starting from the first byte of the first storage block to the last byte of the last storage block. An alternative to linear bitstream hashes is the use of segment hashes: hashes of equal and fixed size chunks of the source device. In the AFF container format these were typically 1M or 16M, and in dcfldd they are configurable.

At high I/O rates, linear bitstream hashes are problematic for scaling acquisition. Such a hash is a relatively expensive operation in terms of CPU resources, with SHA1 currently able to hash at around 600 MB/s on a current generation i7 core (see Table 1) and around 450 MB/s on a current generation i5 core. Unlike segment hashes, linear bitstream hashes cannot be parallelized across multiple cores; accordingly, on low speed CPUs linear bitstream hashes become a bottleneck for high speed acquisition.

Table 1
Single core algorithm throughput for i7-4770 3.4 GHz.

Function       Algorithm   Throughput (single core)
Compression    Snappy      1405.42 MB/s
Compression    LZ4         1538.31 MB/s
Compression    Deflate     39.42 MB/s
Hash           Blake2b     601.87 MB/s
Hash           SHA1        619.23 MB/s
Hash           MD5         745.65 MB/s

We contend that hashes in general are not an efficient use of CPU resources when faced with uniform, sparse data such as sectors filled with zeros. Today's SSDs appear to store such empty blocks as sparse segments and are capable of streaming such sectors at rates exceeding the maximum bandwidth of these hashes (current generation PCIe based SSDs are beginning to approach 1 GB/s).

Partial imaging

Inherent in the triage approach is the understanding that the analyst (or tool manufacturer) can make reasoned decisions about the evidential value of data, and that triage should facilitate preservation of the most valuable data while leaving behind the least. The dominant approach to preservation in triage is the logical imaging approach. That is, specific file streams and associated filesystem metadata are collected and stored in a logical evidence container. Limitations of this approach include the absence of context (only the metadata chosen by the tool author is collected) and a lack of repeatability (the collected metadata is an interpretation).

An alternative is to relax the completeness requirement of the standard block level forensic image, such that only a subset of the blocks are acquired. Achieving this requires such an image format to be able to record not only collected data blocks, but also which data blocks have not been collected: the "holes", so to speak. This issue is well recognized in the volatile memory acquisition field, due to the prevalence of memory discontinuities in the physical RAM address space, and has led to the adoption of two primary approaches: employ a linear image and fill the discontinuities with a known "innocuous" value (Encase EWF and Garner's DD), or adopt a section based binary storage format (i.e. ELF, DMP).

X-Ways has recently added support for "skeleton images" (Watkins et al., 2009), the implementation of which is a raw image backed by a sparse NTFS file. On attachment of an empty skeleton image, subsequently read blocks of an associated device are copied into the skeleton image at the appropriate offsets, such that relevant volume metadata, filesystem metadata, and accessed file content are saved into the image. An MD5 hash is calculated for each sector and saved in a log file.
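The contrast between a linear bitstream hash and segment (block) hashes can be sketched as follows; the fragment is illustrative only. A linear hash must consume the bytes in order on a single core, whereas fixed-size block hashes are independent of one another and can be computed by a pool of workers.

```python
import hashlib
from concurrent.futures import ProcessPoolExecutor

CHUNK = 1024 * 1024  # 1 MiB blocks, in the spirit of AFF's segment hashes

def linear_hash(device, size):
    """Single SHA1 over the whole byte sequence (not parallelizable)."""
    h = hashlib.sha1()
    for off in range(0, size, CHUNK):
        h.update(device(off, min(CHUNK, size - off)))
    return h.hexdigest()

def _block_hash(data):
    return hashlib.sha1(data).hexdigest()

def block_hashes(device, size, workers=4):
    """One SHA1 per fixed-size chunk; chunks hash independently in parallel."""
    chunks = [device(off, min(CHUNK, size - off)) for off in range(0, size, CHUNK)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(_block_hash, chunks))

if __name__ == "__main__":
    blob = bytes(range(256)) * 4096 * 8        # 8 MiB of sample data
    dev = lambda off, ln: blob[off:off + ln]
    print(linear_hash(dev, len(blob)))
    print(block_hashes(dev, len(blob)))
```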

Non-linear imaging

While ideal for maximizing the read bandwidth of storage devices such as spinning disks, the general linear approach to imaging prevents the preservation of blocks in any other order.

The "Teleporter" (Watkins et al., 2009) and the AFF4 hash based imaging proposals (Cohen and Schatz, 2010) implicitly proposed a non-linear approach to acquisition, but fell short of addressing the overall integrity hashing approach. While it may seem straightforward to create a linear bitstream hash of the input (albeit out-of-order) bitstream, verifying such a hash is not possible using the AFF4 evidence container in its current form. This is due to there being no mechanism in the AFF4 by which the ordering of all Map segments is recorded.

Compression throughput

Within the field there is a lack of consensus regarding the benefits of employing compression in acquisition. The use of compression allows a tradeoff between CPU and output I/O resources. In I/O bandwidth limited and CPU rich environments, compression can lead to significant gains in throughput and minimization of latency.

The de-facto standard compression scheme in physical images is the application of the Deflate algorithm (Deutsch, 1996) to data blocks, as employed by sgzip, EWF, AFF1 and AFF4. The Deflate algorithm is an order of magnitude slower than MD5 and SHA1, achieving on the order of 40 MB/s on a single core of an i7. Unlike linear bitstream hashing, such block compression is parallelizable across multiple cores, and as demonstrated by tools such as Guymager, readily achieves speeds in the range of 100 MB/s (Zimmerman, 2013). In addition to Deflate, AFF v3 supported the use of LZMA, a slower but more space efficient compressor.

Symbolic compression

Byte runs consisting entirely of a single byte value are a common occurrence on storage mediums. These are increasingly of relevance due to the emptying of unallocated space brought about by some implementations of the SSD TRIM feature. While highly compressible, the compression of such data runs using dictionary coding compressors such as Deflate is inefficient compared with testing for the contents being uniformly of the same byte. Furthermore, storing the compressed version is inefficient in terms of storage consumption.

While the notion of sparse image sections (byte sequences consisting entirely of the byte 0) was contemplated by the AFF4 proposal, it was never formally described. AFF4 provided the aff4:Zero stream, which is used to represent data runs consisting wholly of bytes with hex value 0x00, similar to /dev/zero in Unix. More recently, the EWF2 format has extended this idea by introducing sparse chunks, which enable the specification of chunks containing wholly a particular byte. Such symbolic compression is useful in symbolically compressing erased flash memory, which in that state contains wholly the byte 0xFF.

Aggregating output throughput

The current model of forensic imaging is generally based on the paradigm of storing the forensic image on a single storage device. Assuming unlimited CPU and memory bandwidth, this restricts the upper limit of the speed at which imaging proceeds based on the combination of storage device write throughput, storage I/O bus speed, and OS/filesystem efficiency. For example, when imaging with a Linux based forensic Live CD, one might be able to sustain on the order of 200 MB/s to a commodity SATA drive; however, this might be limited to only 40 MB/s if the only output I/O channel on the suspect computer is USB2.

Assuming sufficiently fast I/O channels, filesystem, and CPU, the primary limiting factor in scaling image throughput is then dictated in current approaches by the write speed of a single hard drive or its bus. Approaches contemplating spreading image sub-component writes across multiple destinations are limited in the literature: the original AFF4 work contemplated storing of sub parts of forensic images in arbitrary physical or virtual locations, and Farrell (2013) describes a method of copying hard drive blocks across multiple drives simultaneously, utilizing the aggregate I/O throughput, without however addressing hashing.

The current generation of dedicated imaging devices are beginning to offer "fill and spill" of an image from one storage device to another when full; however this addresses scaling of image size but not rate.
Extending the AFF4 container format for low latency acquisition and analysis

The former section identified a range of limitations inherent in current acquisition approaches. This section describes our proposed solutions to these limitations, by way of additions to the AFF4 container format. While it is not in the scope of this paper to describe the architecture and operation of the software which implements this extended version of the AFF4, we describe some implementation related concerns where relevant.

Symbolic compression

Our implementation performs symbolic compression prior to regular compression, identifying blocks consisting entirely of the same byte. Where such blocks are found, regular compression is omitted. These blocks are represented in the AFF4 Map by reference to virtual streams named after the hex representation of the byte, i.e. "aff4:SymbolicStream00" and "aff4:SymbolicStreamFF", and so on for the byte range 0x00 to 0xff. We keep "aff4:Zero" as a synonym for backwards compatibility.
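The following sketch shows the kind of test implied by the description above: detect that a chunk is uniformly one byte value and, only if it is not, fall back to a general purpose compressor. The stream naming helper mirrors the aff4:SymbolicStreamNN convention; everything else in the fragment is illustrative only.

```python
import zlib

def classify_chunk(chunk: bytes):
    """Return ("symbolic", stream_name) for a uniform chunk, otherwise
    ("stored", compressed_bytes). Illustrative only; zlib (Deflate) is used
    here where an implementation might prefer Snappy or LZ4."""
    if chunk and chunk == bytes([chunk[0]]) * len(chunk):
        # Uniform run: no data is written, only a Map reference is recorded.
        name = "aff4:Zero" if chunk[0] == 0 else "aff4:SymbolicStream%02X" % chunk[0]
        return ("symbolic", name)
    return ("stored", zlib.compress(chunk, 1))

print(classify_chunk(b"\x00" * 32768))   # ('symbolic', 'aff4:Zero')
print(classify_chunk(b"\xff" * 32768))   # ('symbolic', 'aff4:SymbolicStreamFF')
kind, blob = classify_chunk(b"not uniform data" * 2048)
print(kind, len(blob))                   # 'stored' and a compressed length
```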
Partial imaging

Our implementation contains our own independent implementations of the MBR, GPT and LVM volume management schemes, and the NTFS, HFS+, FATx & EXTx filesystems. These are used to interpret higher level goals related to what is to be acquired into concrete regions of blocks of the target storage device.

With the ability to map out the topography of the target storage device in place per the above, supporting partial imaging on top of AFF4 is straightforward. Using the Map abstraction, we define two new virtual stream abstractions: aff4:UnreadableData and aff4:UnknownData. Used similarly to the existing aff4:Zero stream, the former is used to identify areas where the data is unknown due to read failures. The latter is used to refer to byte sequences within the Map (and consequently the image) for which the corresponding data is unknown, due to its not having been read. On closing a partial image (i.e. an image which has blocks that have never been stored), all such address runs (holes) are identified in the Map, and a corresponding aff4:UnknownData region placed in the Map.
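As a concrete illustration of this hole accounting step (illustrative only, not the Wirespeed implementation), the fragment below takes the runs actually acquired into a partial image and, on close, derives the complementary runs that would be recorded against an aff4:UnknownData reference.

```python
def unknown_runs(acquired, device_size):
    """Given a list of (offset, length) runs already stored in the image,
    return the complementary (offset, length) holes covering the rest of the
    device; these are the regions a closer would map to aff4:UnknownData."""
    holes, cursor = [], 0
    for off, length in sorted(acquired):
        if off > cursor:
            holes.append((cursor, off - cursor))
        cursor = max(cursor, off + length)
    if cursor < device_size:
        holes.append((cursor, device_size - cursor))
    return holes

# A 1 MiB device with two acquired runs leaves three unknown regions.
print(unknown_runs([(4096, 8192), (65536, 4096)], 1024 * 1024))
# [(0, 4096), (12288, 53248), (69632, 978944)]
```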

Aggregate output throughput

While the original AFF4 work proposed a means of storing portions of a forensic image across multiple volumes, located locally or remotely, it did not explicitly address scaling. We have implemented, as an application of the AFF4, what we call striped imaging. Like its RAID namesake, we spread writes across multiple disks; however, unlike RAID, we are capable of utilizing the aggregate bandwidth of output channels of differing throughput. Our striped imaging profile is implemented using the AFF4 primitives in the following way:

1. An AFF4 evidence container is created on each destination storage device;
2. All such containers have a different and unique Volume ID;
3. In each such container a uniquely identified Image stream is opened;
4. A Map stream is created in each container such that the ID of the map is identical;
5. For each image chunk, if the chunk is either compressed or un-compressible data, the chunk is written to the Image stream with the smallest number of queued writes (measured in bytes and not objects), and the Map streams in each container are updated to reflect the storage of the chunk in that stream. If the chunk is a symbolic chunk, it is stored in the Map streams in each container without writing data to any Image stream (sketched below).
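A minimal sketch of the write-dispatch policy in step 5 follows; it is illustrative only and assumes hypothetical container objects with a queued_bytes counter. Compressed or uncompressible chunks go to the Image stream of the least-loaded destination, while symbolic chunks are recorded only in the Maps.

```python
class Destination:
    """Hypothetical stand-in for one AFF4 container on one output device."""
    def __init__(self, name):
        self.name = name
        self.queued_bytes = 0   # bytes accepted but not yet flushed
        self.image = []         # stands in for the Image stream
        self.map_entries = []   # stands in for the (replicated) Map stream

def dispatch_chunk(destinations, chunk_offset, payload, symbolic_name=None):
    if symbolic_name is not None:
        # Symbolic chunk: recorded in every Map, no Image stream write.
        for d in destinations:
            d.map_entries.append((chunk_offset, symbolic_name))
        return None
    # Data chunk: write to the destination with the fewest queued bytes.
    target = min(destinations, key=lambda d: d.queued_bytes)
    target.image.append(payload)
    target.queued_bytes += len(payload)
    for d in destinations:
        d.map_entries.append((chunk_offset, target.name))
    return target.name

dests = [Destination("usb3-a"), Destination("usb3-b")]
print(dispatch_chunk(dests, 0, b"x" * 32768))             # usb3-a
print(dispatch_chunk(dests, 32768, None, "aff4:Zero"))    # None (map-only)
print(dispatch_chunk(dests, 65536, b"y" * 16384))         # usb3-b
```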

Non-linear imaging and hashing

The AFF4 container format already supports non-linear imaging (a specific application of which is demonstrated in hash based imaging) by way of the Map abstraction. One aspect of non-linear imaging not considered in the AFF4 work was the means of calculating linear hashes for such images. The current AFF4 design is limited in that it does not record the ordering relationship of blocks stored in the Map. Furthermore, one cannot assume that blocks stored in the Image stream are ordered, due to concurrency in the compression process; and symbolic blocks, stored only in the Map, may be merged, potentially making the original block ordering indeterminate.

Our initial approach to addressing this limitation was to revise the Map serialization of AFF4 to optionally include a record of the ordering of all blocks as received from OS reads of the subject storage device (which we called a mapPath). Our experiments identified that such a linear hash of the raw data was a significant bottleneck on low powered CPUs due to a lack of concurrency.

Accordingly, our default approach to hashing is to employ segment based hashing (which we call block based hashing due to the overloaded usage of the term segment in our current vocabulary), and to limit its application to non-sparse (non-symbolic) blocks. This enables hashing to proceed in parallel on multi-core CPUs.

We define a third bevy element (a block hash segment) which stores the block hash of each corresponding chunk of a data stream. The name of the block hash segment is based on the name of the bevy and the type of hash in use. For example, for the compressed block stream named aff4://83a3d6db-85d5/ we might have the segments described in Table 2.

We consider hashing of sparse chunks to be a waste of resources and accordingly do not hash those chunks. Symbolic regions, being a component of the Map, are protected by the Map hash. For example, for a Map named aff4://89313d6db-15d9/ the hash is calculated over the concatenation of the Map's constituent segments: the period (".") represents concatenation of bytes, the sha256 operations are applied to the content of the named segments, and curly brackets represent optional elements.

A hash of the block hashes of an Image stream, the aff4:blockHashesHash, is calculated over the block hash segment of that stream, and can be calculated incrementally as the hash of each chunk is added. In striped acquisitions there will be multiple Image streams, each of which will have a stream of block hashes, and accordingly an aff4:blockHashesHash over those. The ordering of those in the blockHash calculation is determined by the ordinal of the corresponding Image stream in the Map index.

Finally, recognizing the importance of having a single hash to record and communicate which applies to the entire image, we generate a single SHA256 hash, called aff4:blockHash, based on the concatenation of the hashes of the block hashes of each stream and the associated Map hashes. The above hashing approach enables the blockHash to be calculated without any re-reading of data from the image.
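The paper defines these hashes formally; as an illustration only of the composition described in the prose above (per-chunk hashes, a per-stream hash over those, and a single image-wide hash over the per-stream hashes and the Map hashes), consider the following sketch. Names, inputs and ordering here are assumptions, not the normative definition.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def block_hashes(chunks):
    """One hash per stored (non-symbolic) chunk of an Image stream."""
    return [sha256(c) for c in chunks]

def block_hashes_hash(hashes):
    """Hash over the concatenated per-chunk hashes of one Image stream;
    it can be folded in incrementally as chunks are written."""
    h = hashlib.sha256()
    for bh in hashes:
        h.update(bh)
    return h.digest()

def image_block_hash(per_stream_hashes, map_hashes):
    """Illustrative single image-wide hash: concatenation of each stream's
    blockHashesHash (in Map-index order) followed by the Map hashes."""
    h = hashlib.sha256()
    for sh in per_stream_hashes:
        h.update(sh)
    for mh in map_hashes:
        h.update(mh)
    return h.hexdigest()

stream_a = [b"chunk0", b"chunk1"]      # chunks landing on destination A
stream_b = [b"chunk2"]                 # chunks landing on destination B
bhh_a = block_hashes_hash(block_hashes(stream_a))
bhh_b = block_hashes_hash(block_hashes(stream_b))
map_hash = sha256(b"serialized map and index segments")  # placeholder input
print(image_block_hash([bhh_a, bhh_b], [map_hash]))
```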

Table 2
Bevy segments related to segment hash.

AFF4 name                                        Segment type              Purpose
aff4://83a3d6db-85d5/00000032                    Data segment              Chunks of stored data
aff4://83a3d6db-85d5/00000032/index              Chunk index segment       Offsets within data segment for chunks
aff4://83a3d6db-85d5/00000032/blockHashes.sha1   SHA1 block hash segment   SHA1 hashes of chunks within data segment

Compression throughput

As identified in the prior section, non-symbolic compression is the single most costly compute related operation involved in contemporary physical acquisition. While parallelizing the operation has succeeded in achieving rates in the realm of 100-200 MB/s, imaging remains, with high I/O rate source devices, CPU bound. Scaling requires more cores or faster algorithms. Our approach is to employ both.

We modify the AFF4 container schema to support configurable compression schemes and apply lighter-weight compression algorithms: the Snappy algorithm developed by Google and the LZ4 algorithm used in the ZFS filesystem. In our benchmarks both algorithms exhibit single threaded compression at around 1500 MB/s on a single core of a dual core i7 (around 30 times faster than Deflate). While these algorithms do not compress data as well as the Deflate algorithm, they do allow a CPU/bandwidth tradeoff which assists in achieving higher effective speeds. In situations where the available CPU exceeds output bandwidth (for example cross-internet), usage of Deflate or even LZMA may be warranted.

In order to support such configurable compression, we add a property called "aff4:compressionMethod" to the AFF4 Stream object, with the corresponding values https://code.google.com/p/lz4/ and http://code.google.com/p/snappy/ used to specify the algorithm used for that stream. The following example demonstrates the Snappy compression method specified for the stream aff4://0466b8fb-9af0-4ef2-b36c-8b0d90fc0ac2.
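The serialized form of this property is not reproduced here. The following fragment, using the rdflib library, sketches how such a statement might be expressed and serialized as Turtle; the http://aff4.org/Schema# namespace and the exact serialization are assumptions rather than the normative form.

```python
from rdflib import Graph, Namespace, URIRef

AFF4 = Namespace("http://aff4.org/Schema#")   # assumed namespace for aff4: terms

g = Graph()
g.bind("aff4", AFF4)
stream = URIRef("aff4://0466b8fb-9af0-4ef2-b36c-8b0d90fc0ac2")
# One triple: this stream's compression method is Snappy.
g.add((stream, AFF4.compressionMethod, URIRef("http://code.google.com/p/snappy/")))
print(g.serialize(format="turtle"))
```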

Experimental methodology

In order to demonstrate the limitations identified in existing evidence container approaches and to demonstrate proof of concept, a testbed focusing on high bitrate acquisition was built. Table 3 summarises the hardware used in the testbed.

Sample preparation

The target device (test drive) was populated with data as follows:

1. Windows 8.1 installed from empty to the whole drive (around 10.2 GB allocated);
2. High entropy data added to the filesystem from Linux /dev/random (38.4 GB);
3. Windows main filesystem shrunk by around 100 GB;
4. A new volume and NTFS filesystem created in the free space;
5. From the GovDocs1 corpus (Garfinkel et al., 2009), the contents of Zip files 001-075 extracted into the fresh volume, then the files corresponding to 001-040 copied such that a second set of those files exists on the filesystem (59.8 GB).

Compression characteristics

The drive was linearly acquired using our system with the Snappy compression algorithm, and a graph generated from the image such that the y-axis corresponds to the chunk size (in bytes) and the x-axis corresponds to the ordinal of the chunk (see Fig. 1). The uppermost horizontal line (red points in the web version) indicates the size of chunks prior to symbolic and Snappy compression, and the green points (in the web version) the size of chunks as stored, post symbolic and Snappy compression.

With reference to Fig. 1, the leftmost vertical grouping labeled "A" corresponds to around 10.2 GB of files related to the Windows installation, and shows Snappy compression has yielded significant gains. Similarly, the GovDocs subset (the rightmost vertical grouping labeled "E", roughly corresponding to 59.8 GB) shows significant stored size reductions due to Snappy compression.

Table 3
Testbed hardware.

Element           Details
Target device     240 GB Intel 730 SSD
Computer 1        4 core i7-4770R 3.20 GHz CPU system
Computer 2        2 core i5-3337U 1.80 GHz CPU system
USB3 bridge(s)    Orico USB3/eSATA bridge
Evidence HDD(s)   Toshiba 2 TB 7200 RPM commodity SATA

Fig. 1. Stored chunk size vs chunk address for sample image.

The uppermost horizontal grouping labeled "C" corresponds to the 38.4 GB of high entropy (random) data in the filesystem, and is uncompressed. Much of the lowermost horizontal groupings labeled "B", "D", and "F" corresponds to sparse (unused) portions of the drive that have been symbolically compressed.

Compression effectiveness

Two separate test runs generated a linear image of the sample drive connected via SATA3 to Computer 1, with a single hard drive via USB3 as the image destination. The variable in the experiment was the usage of compression (symbolic compression remains active in both runs). Samples of the read I/O rate of the sample drive were collected on a 1 s basis using the Linux iostat utility.

Fig. 2 is a graph of the results, where the read I/O rate of the target drive is graphed vs elapsed time; the results for compression enabled are graphed in red points (in the web version) and the results for compression disabled are graphed in green points (in the web version).

With reference to the figure, when read from the left, it is apparent that the rough shape of both plots is in general similar in regard to both the sparse (empty) blocks (labeled "A", "B", and "C") and the high entropy (random) data (labeled "D"). In the former case, acquisition proceeds at close to the wire speed of the SSD, which, according to Intel, is around 550 MB/s (Intel, 2014). This is due to the negligible load on both the CPU and output I/O channel due to the effect of symbolic compression.

In the case of high entropy data, the I/O rate hovers at around 200 MB/s, with the maximum write throughput of the destination evidence hard disk being the bottleneck. The primary variance in the two plots is the sections corresponding to the Windows OS (labeled "E") and the GovDocs files (labeled "F"): with compression the I/O rate is higher, while without compression the I/O rate tends towards the 200 MB/s limit imposed by the output I/O channel. Acquisition completes more quickly using compression than without, with acquisition completing 2:08 faster with Snappy compression.

Striping effectiveness

The effectiveness of striping acquisition across multiple output channels was tested by employing the same hardware setup as the prior experiment. Three test runs were conducted involving the generation of linear images, with the variable being the number of output channels (USB3 connected disks).

The results of the runs are graphed as Fig. 3, with the read I/O bandwidth graphed in red (in the web version) for the run using only a single destination drive, in green (in the web version) for the run using two destination drives, and in blue (in the web version) for the run employing three destination drives.

With reference to the figure, the high entropy data section contains the most variability, being I/O bound on output for the run employing a single destination drive, sitting at around 200 MB/s (label "A"). The addition of a second destination drive significantly increases the effective bandwidth for this section, averaging around 400 MB/s (label "B"). The addition of a third destination drive increases the effective bandwidth again, approaching 500 MB/s (label "C").

Table 4 summarises the total acquisition time for the three runs.

The impact of hashing

In order to validate our theory that stream based hashing is a bottleneck for CPU constrained environments, the following test was conducted. Computer 2 (a dual core i5) was paired with two destination drives (an aggregate of 400 MB/s). As before, we acquire a linear image, and in this test vary the hashing scheme between stream based hashing and block based hashing.

Fig. 2. Linear acquisition throughput is increased by compression.

Fig. 3. Aggregated I/O channels increase linear acquisition throughput.

Stream based hashing takes a hash of ALL data (LBAstart to LBAend), including blocks that are symbolically compressed, whereas block based hashing is as described in Section 5.4, omitting symbolic blocks.

Fig. 4 plots the read I/O bandwidth of both tests. With reference to the figure, it is apparent that for the sparse section of the disk the read I/O rate sits at around 350 MB/s (the green plot (in the web version), label "A"), effectively constrained by the single core performance of SHA1 hashing for this CPU (which we have confirmed by a separate test). For the same section of data (labeled "B") under block based hashing, the bottleneck is, as with all prior tests, approaching the maximum read I/O bandwidth of the combination of the target device and SATA3 bus (around 530 MB/s).

A similar test employing Computer 1 (the 4 core i7) showed no effective difference between the usage of stream and block based hashing, most likely due to the single thread hashing performance exceeding the maximum input rate.

Effectiveness of partial acquisition

In order to evaluate the effectiveness of partial non-linear acquisition, test runs were undertaken involving an allocated-only acquisition of the sample using Computer 1, with two output channels. The graph of read I/O rates of the test, along with the results of a linear acquisition for the same setup, is presented as Fig. 5.

Table 4
Total acquisition time by number of destinations.

Number of destinations   Total acquisition time
Single destination       11:29
Two destinations         8:00
Three destinations       7:40

Fig. 4. Block based hashing gives higher throughput than stream based hashing in linear acquisition.

With reference to the figure, it can be seen that an allocated-only acquisition (115 GB in total) completes in just over half the time that it takes for a full linear acquisition, at an average rate of around 400 MB/s.

Summary of total acquisition times

Table 5 summarizes the total acquisition times associated with varying both the number of stripes and the scope of acquisition (linear full vs. non-linear allocated) for Computer 1 and the test drive. With reference to the table, for linear acquisition of the whole disk, the average acquisition rate reaches what we have observed to be the maximum read rate of the SSD in our test system when using 3 stripes as output (a 35% increase in throughput over one stripe). Using two stripes yields around a 30% improvement (Table 6).

The single stripe allocated acquisition has a comparatively lower acquisition rate, due to the removal of the easily compressible and low storage cost sparse regions. Adding a second stripe to an allocated acquisition nearly doubles the throughput, consistent with being output I/O bound. Adding a third stripe, while giving an appreciable gain of 40 MB/s, is less than the gain of adding a third stripe for linear acquisitions. We speculate that this may be due to the more random nature of this workload impacting our scheduler or the SSD's read-ahead.

Table 5
Total acquisition times for proposed approach.

Configuration                                Total time (min:s)   Average rate (MB/s)
Wirespeed, 3 stripes, linear                 7:30                 533
Wirespeed, 2 stripes, linear                 8:00                 500
Wirespeed, single evidence disk, linear      11:29                348
Wirespeed, 3 stripes, allocated              4:17                 447
Wirespeed, 2 stripes, allocated              4:42                 408
Wirespeed, single evidence disk, allocated   8:21                 229

Comparison with existing approaches

In order to compare the effectiveness of the approach with existing common approaches, we undertook 6 further test runs, using the test sample, Computers 1 & 2, and a single image destination.

Fig. 5. Non-linear partial acquisition reduces acquisition time.

The tests recorded the total acquisition time using FTK Imager and X-Ways running from a WinFE forensic live CD, based on Windows 8.1. No graphs were generated due to an absence of performance monitoring infrastructure on the LiveCD.

FTK Imager 3.1.3.2 was executed using compression setting 1 (fast compression) and standard stream hashes. There is no setting to configure the hash(es) in use, so the defaults (SHA1 & MD5) were used. X-Ways 18.0 SR-8 was executed using fast adaptive compression, the default number of compression threads (2 * CPU core count), and only SHA-1 hashing. Both applications use Deflate based compression. The results are summarized in Table 6.

With reference to the table, it is apparent that for X-Ways Forensics the cost of the Deflate algorithm is largely negated by the usage of threaded compression and an abundance of CPU resources for Computer 1, whereas with low CPU resources (Computer 2) the usage of the Deflate algorithm lowers throughput by more than half.

Recalling that the maximum write rate of the destination drive has been measured to be around 200 MB/s, at that rate writing 240 GB of raw data should take only around 20 min. Comparing the measured acquisition times with this, it is apparent that for FTK Imager, even with an abundance of CPU resources, the compression/output throughput tradeoff of using compression is inconclusive. With limited CPU resources it is detrimental. For X-Ways, under abundant CPU, the usage of compression yields significant gains but, like FTK Imager, is detrimental to throughput on lower power hardware. The proposed technique yields significant throughput gains in both situations.

Table 6
Total linear acquisition time and average rate.

Acquisition application   Computer 1          Computer 2
FTK Imager                20:10 (198 MB/s)    37:38 (106 MB/s)
X-Ways Forensics          13:58 (286 MB/s)    33:23 (120 MB/s)
Proposed approach         11:29 (348 MB/s)    15:08 (264 MB/s)

Limitations

The above experiments do not include an image verification phase, which is generally used in practice. Verification under our proposed extensions to the AFF4 format enjoys commensurate throughput increases due to faster compression algorithms and the absence of hashing of symbolic chunks.

Conclusions

In this work we identified the limitations inherent in current forensic physical acquisition approaches which limit both the rate at which acquisition may proceed and the extent to which targeted subsets of physical disks can be acquired. The contributions of the paper are the proposal of a number of extensions to the AFF4 image container format which allow acquisition to proceed at high bitrates through increased concurrency, lightweight compression schemes & horizontal scaling of image storage devices. These proposed extensions have been systematically evaluated through experiments which quantify the limitations of former approaches under a variety of factors adversely affecting acquisition throughput, and proof of concept demonstrated by acquiring images at target bitrates of around 500 MB/s.

Finally, it has been demonstrated that partial non-linear images may be taken at bit rates that exceed current physical acquisition approaches.

Acknowledgments

The author thanks the reviewers for their detailed and considered feedback.

References

Bertasi P, Zago N. FASTDD: an open source forensic imaging tool. In: Availability, Reliability and Security (ARES), 2013 Eighth International Conference on. IEEE; 2013. p. 485-92.
Cohen M, Garfinkel S, Schatz B. Extending the advanced forensic format to accommodate multiple data sources, logical evidence, arbitrary information and forensic workflow. Digit Investig 2009;6:S57-68.
Cohen M, Schatz B. Hash based disk imaging using AFF4. Digit Investig 2010;7:S121-8.
Deutsch P. DEFLATE compressed data format specification version 1.3. RFC 1951, IETF; 1996.
Farrell. Digital memory imaging system and method. Patent GB2503600; 2013.
Garfinkel S, Farrell P, Roussev V, Dinolt G. Bringing science to digital forensics with standardized forensic corpora. DFRWS 2009, Montreal, Canada; 2009.
Garfinkel S, Malan D, Dubec KA, Stevens C, Pham C. Advanced forensic format: an open extensible format for disk imaging. In: Advances in digital forensics II. New York: Springer; 2006. p. 13-27.
Intel. Intel Solid State Drive 730 Series. 2014 (accessed 26.09.14), http://www.intel.com.au/content/www/au/en/solid-state-drives/solid-state-drives-730-series.html.
Rosen A. ASR expert witness compression format specification. 2002 (accessed Sept 2014), http://www.asrdata.com/SMART/whitepaper.html.
Roussev V, Quates C, Martell R. Real-time digital forensics and triage. Digit Investig 2013;10(2):158-67.
Tableau. Tableau TD2u forensic duplicator. 2014 (accessed 26.09.14), https://www.guidancesoftware.com/products/Pages/tableau/products/forensic-duplicators/td2u.aspx.
Watkins K, McWhorter M, Long J, Hill B. Teleporter: an analytically and forensically sound duplicate transfer system. Digit Investig 2009;6:S43-7.
Zimmerman E. 2013 (accessed 26.09.14), http://goo.gl/2h9pCd.