Journal of Digital Forensics, Security and Law

Volume 10 Number 3 Article 3

2015

On the Network Performance of Digital Evidence Acquisition of Small Scale Devices over Public Networks

Irvin Homem Stockholm University, [email protected]

Spyridon Dosis Stockholm University


Part of the Computer Engineering Commons, Computer Law Commons, Electrical and Computer Engineering Commons, Forensic Science and Technology Commons, and the Information Security Commons

Recommended Citation: Homem, Irvin and Dosis, Spyridon (2015) "On the Network Performance of Digital Evidence Acquisition of Small Scale Devices over Public Networks," Journal of Digital Forensics, Security and Law: Vol. 10, No. 3, Article 3. DOI: https://doi.org/10.15394/jdfsl.2015.1205. Available at: https://commons.erau.edu/jdfsl/vol10/iss3/3

This Article is brought to you for free and open access by the Journals at Scholarly Commons. It has been accepted for inclusion in Journal of Digital Forensics, Security and Law by an authorized administrator of Scholarly Commons. For more information, please contact [email protected]. © 2015 ADFSL. This work is licensed under a Creative Commons Attribution 4.0 International License.

ON THE NETWORK PERFORMANCE OF DIGITAL EVIDENCE ACQUISITION OF SMALL SCALE DEVICES OVER PUBLIC NETWORKS Irvin Homem [email protected]

Spyridon Dosis [email protected]

Department of Computer and Systems Sciences Stockholm University Postbox 7003, 164 07 Kista Sweden

ABSTRACT

While cybercrime proliferates – becoming more complex and surreptitious on the Internet – the tools and techniques used in performing digital investigations are still largely lagging behind, effectively slowing down law enforcement agencies at large. Real-time remote acquisition of digital evidence over the Internet is still an elusive ideal in the combat against cybercrime. In this paper we briefly describe the architecture of a comprehensive proactive digital investigation system that is termed the Live Evidence Information Aggregator (LEIA). This system aims at collecting digital evidence from potentially any device in real time over the Internet. Particular focus is placed on the importance of the efficiency of the network communication in the evidence acquisition phase, in order to retrieve potentially evidentiary information remotely and with immediacy. Through a proof of concept implementation, we demonstrate the live, remote evidence capturing capabilities of such a system on small scale devices, highlighting the necessity for better throughput and availability envisioned through the use of Peer-to-Peer overlays.

Keywords: Digital Forensics, Digital Evidence, Remote acquisition, Proactive forensics, Mobile devices, P2P, Network performance, Availability

INTRODUCTION

Malevolent activities quickly adapt and evolve to align themselves with the particularities of their given environment. This is seen in that they are no longer only confined to the physical world. They have readily adapted to the digital realm, taking up their niche markedly on the Internet. Examples of such are the Zeus banking Trojan (Stone-Gross, 2012) and the Flame malware (sKyWIper Analysis Team, 2012), stealing banking credentials and performing espionage activities respectively. They are no longer rare occurrences with mild consequences. They have permanently set up camp in intricate and surreptitious forms, taking unjust advantage over unsuspecting users going about commonplace activities on the Internet. The Regin malware (Kaspersky Lab, 2014), formally analyzed and documented in 2014 as a cyberespionage tool, is an example of this, having been said to have possibly been in the wild since 2003. Today, all activities in the digital realm are at risk of being compromised by malicious actors aiming at perpetrating theft, impersonation or sabotage, or at paralyzing others' activities for personal benefit.

The consequences of such malicious activities for the unsuspecting user have also become more detrimental, persistent and far reaching, in that they are largely untraceable and easily invisible to the untrained eye. Developing novel and innovative methods that enable malicious activities to remain effectively undetected and untraceable is the hallmark of these evildoers. They are almost always one step ahead of the pursuers. Furthermore, it is relatively easy to hide among the deluge of data that is created among communication devices that support the basic network communication on the Internet. Malevolent activity in the "Digital Realm" can thus easily become rampant and uncontrollable if there are no equally innovative methods to counter the offending actors and their activities. The rate of innovation and uptake of novel techniques by law enforcement agencies, digital forensics practitioners and incident responders must at the very least be equivalent to that of their criminal counterparts, if they are to keep up with the proliferation of crime on the Internet.

One of the foremost areas in digital crime investigations where innovative means of combatting crime are highly necessary, but largely lacking, is the evidence capture process. This is the initial stage of an investigation where artifacts from the scene of the crime need to be retrieved in their original form, or, in the case of digital investigations, in some form of a complete copy of the original artifact that can be proven to be devoid of any tampering (National Institute of Standards and Technology, 2004; Scientific Working Group on Digital Evidence (SWGDE), 2006). This process needs to be performed meticulously, carefully and in many cases slowly in order to ensure that there is no potentially crucial piece of evidence left behind. This is the state of affairs in the real physical world.

However, today's crime scene is rapidly edging away from a physical reality into a more virtual one. The forms of evidence found in these "Digital Crime Scenes" have also moved from the traditional fingerprints, footprints, hair samples, blood samples or other DNA-related evidence into more digital artifacts. Such digital forms of evidence commonly include hard-disk drives, live (RAM) memory, network traffic captures, mobile devices, RAID sets (M. Cohen, Garfinkel, & Schatz, 2009), and virtually any other form of technology that records past events of its actions, that can be captured and analyzed during or after the criminal event, and whose integrity can be verified. This opens the floor to almost any form of computer appliance (physical or virtual) that can be thought of. Thus arises the heterogeneity problem among devices – or, simply put, the seeming lack of standardization among vendors of devices that perform related tasks. Different devices may have different physical connectors, operating systems, applications, storage formats, encoding schemes and communication protocols (CDESF Working Group, 2006). This heterogeneity makes the job of a Digital Investigator a lot more difficult because of the wide variety in which evidence could manifest itself in the wild. This greatly hampers any manual efforts of collecting evidence, even with the assistance of the semi-automated tools of today such as disk imagers.

In addition to this, Electronic Crime cases today often involve more than just a single device. Several computer-like appliances including tablets, mobile phones, digital cameras, GPS devices, smart TVs and even embedded devices such as onboard vehicle computer systems (from trucks, cars and even ships) could be seized for a single case, in order to be subjected to further investigative analysis. If we also bring the vast realm of the Internet into play, such evidence sources could include web application accounts, online email accounts, cloud storage facilities, network traffic captures and logs (Raghavan, Clark, & Mohay, 2009). It is not difficult to imagine that all these evidence forms could easily be part of a single case in today's world, and even more so in the imminent realm of the Internet of Things. The sheer volume of data that one would have to sift through in order to investigate a single case could be in the order of Terabytes, and this can be a more than daunting task to perform (Case, Cristina, Marziale, Richard, & Roussev, 2008).

Furthermore, in the realm of the Internet, composed of massively interconnected devices sharing vast amounts of highly varying data, crossing paths at high velocities, the speed of the capture of potentially evidentiary information is of essence. The same levels of meticulousness and carefulness of physical evidence acquisition may as well be sacrificed to some extent for the agility that is needed in reacting to crime in the digital world. This is because potentially evidentiary information that is not captured almost instantaneously is likely to be lost forever in just a matter of seconds. However, this does not mean that all accuracy and care in the collection of digital evidence artifacts is ignored; rather, it is traded off and reduced in favour of speed. Nevertheless, the maintenance of the chain of custody is always very important in any digital investigation. New methods of achieving similar standards of preservation of digital evidence to those of physical evidence also need to be sought after and integrated into legal standards.

Finally, at present, investigators grapple with the problem of the relatively immature forensic tools that they are presented with. Current industry standard forensic tools such as EnCase, FTK, XRY, Volatility and Wireshark, at the moment of writing, do not cater for the highly divergent nature of digital evidence sources. Most tools focus on a single niche area such as Filesystem Data, Live Memory, Network Traffic, Mobile Devices or Log data. Some have recently begun to expand their capabilities. The latest version of EnCase Enterprise v7 (Guidance Software, 2014) claims to acquire evidence from disk drives as well as RAM and some mobile devices within an organizational context. AccessData's FTK AD Enterprise (AccessData, 2014) also claims largely similar functionality. Neither deals with network traffic data yet. They are also yet to provide a comprehensive method to interface with all the variety of data present in order to provide a uniform investigation platform. In addition to this, current tools have rather limited capabilities for capturing potentially evidentiary data on demand over networks, as well as for dealing with extremely large datasets. Both EnCase and FTK products in their latest releases seem to allow for remote evidence acquisition within an enterprise network; however, the performance of these is unknown and the tools proprietary, thus difficult to measure. Furthermore, most such tools would struggle and would quickly become problematic when presented with Internet-scale crime scenes.

In this paper, we present the architecture of a scalable, distributed, multi-component incident response and digital investigation platform aimed at dealing with large scale distributed cybercrime investigations. We name this system the Live Evidence Information Aggregator, or LEIA in short. The LEIA architecture aims at curbing cybercrime through assisting digital forensics practitioners and law enforcement agencies in improving their digital crime response capabilities.

This is to be done through addressing several of the aforementioned problems, such as the innate and growing complexity of the growing "Internet-of-Things" types of cases, as well as dealing with the constantly growing amounts of heterogeneous data vis-a-vis the present shortage of physical resources and technical capacity within law enforcement. We also address the need for proactive collection of evidence from potential evidence sources on demand over public networks, and further show the need for availability through redundancy and for faster-throughput network transfers such as those seen in Peer-to-Peer technologies. The rest of this paper is organized as follows: In Section 2, we review related work, outlining shortcomings of previous similar solutions. Section 3 describes the requirements for a comprehensive distributed digital investigation platform. The functionality of the LEIA system, with particular focus on the networking component, is described in Section 4. The network-focused proof of concept implementation and results are outlined in Section 5. In Sections 6 and 7, we summarize the work done in this study and propose further work that may be done in this area, respectively.

BACKGROUND AND RELATED WORK

Several progressive efforts have been made towards improving the efficiency of the Digital Investigation process. The motivations behind these have spawned from the changing requirements of national and international legal systems, the evolution in the digital crime scene, the visible backlogs of cases overburdening law enforcement agencies, and advances in technological capabilities.

Some of these efforts include: delegation and collaboration among teams; reduction of evidence sizes through filtering out known files; and fast and simple automation of important but mundane, repetitive tasks (such as indexing data for subsequent searches, file carving, or parsing running processes in memory or TCP flows in network captures). Most of these capabilities have been implemented in current industry standard forensic tools; however, investigators and analysts still remain overburdened (van Baar, van Beek, & van Eijk, 2014). This is because of the presently abundant and steadily growing amounts of heterogeneous and disjointed datasets from multiple sources that they are tasked to collect and analyze. Methods to alleviate this problem through fully automating the remote collection and pre-processing of such data are so far either lacking in efficiency or in scalability. Several unidirectional solutions, such as (Almulhem & Traore, 2005), have been proposed in a bid to solve this multi-faceted problem; however, they have not been unequivocally successful. In recent times there have been initiatives to centralize evidence storage (Ren & Jin, 2005) but distribute processing among several machines (Roussev & Richard III, 2004). There has also been a push towards having the different parties involved in solving a case work together, even from geographically separate locations (Davis, Manes, & Shenoi, 2005), particularly with respect to technical staff in niche areas (such as filesystem forensics, network forensics, live memory forensics or mobile forensics) and the legal experts. Collaboration has been the mainstay of the attempt to get cases solved faster.

Reducing the amount of data that needs to be collected is also a means of reducing the amount of time needed to analyze the data. This has previously been done through "Known File Filtering" as well as through scripts crafted to use heuristics (Koopmans & James, 2013). Network Security Monitoring has also been an avenue for gathering data, with the help of Intrusion Detection Systems (IDSs) assisted through data mining (Leu & Yang, 2003). However, this has been the specific mandate of the IDS, centralized or distributed as the case may be, with terminating (end) devices or intermediary devices generally playing very minor roles in this task.

As far as is known to the author, there has not been much done, through any single initiative, in terms of expanding the scope of data captured to be the mandate of all possible devices of reasonable capability. Enabling individual devices to natively act as part of the Incident Response System, towards the aim of collecting potential evidentiary data, has not been widely studied. Additionally, collaboration on the human processing level has been emphasized, but it has not been introduced among unrelated networked devices. These devices could possibly be harnessed to work together towards aiding in intelligent real-time capturing, filtering and processing in order to attain and retain that which could be considered as possible evidentiary data, antecedent to the event of a crime being detected. It is for these reasons that we delve into this area to explore it further.

Notable related studies include (Zonouz, Joshi, & Sanders, 2011), which describes a live network forensics system that provisions varying Intrusion Detection Systems on host machines based on their respective resource costs. It works in a virtualized environment where snapshots are taken periodically and used to revert the system back to the point before an attack began. Each system rollback results in different IDSs being deployed to collect new and possibly better information. This presupposes that the attacker re-enacts their malicious behavior in a similar way to their previous attempts, each time their efforts are thwarted by the system. Storage of the potential evidentiary information in a forensically sound manner is not particularly dealt with in that study; the aim was to understand attacks better in order to make better decisions on what kind of preventive measures to deploy.

The RAFT system (Scanlon & Kechadi, 2010) proposed an architecture for performing remote evidence acquisition from the disks of computers using a live CD prepared with disk acquisition tools and networking capabilities. One of the drawbacks of this system was the inability to take live captures of the disk, thus requiring the machine to be rebooted, as well as needing a CD drive or a USB port. The remote acquisition was also seen to suffer from speed deficiencies, and a need was identified to improve on this. We extend this idea but orient it towards mobile devices. (Scanlon, Farina, Khac, & Kechadi, 2014) further describe a methodology for performing forensics on devices participating in decentralized cloud storage services such as BitTorrent Sync. They show how the default replication of data on multiple devices can help in recovering data despite it having been maliciously removed to hinder forensic analysis.

(Shields, Frieder, & Maloof, 2011), (Yu et al., 2005), (M. I. Cohen, Bilby, & Caronni, 2011), and (Moser & Cohen, 2013) describe distributed system architectures for proactive collection and summarization of evidence, with centralized data storage and processing. They are, however, particularly directed at closed-domain enterprise systems, where there is some form of control and order instigated by system administrators. Participation of computer systems outside the control of the enterprise is not considered. The system being proposed in this study is aimed at being universal – applying to the entire Internet.

The work done by Redding in (Redding, 2005) is the most closely related study done in the area of pro-active and collaborative computer forensic analysis among heterogeneous systems. Redding proposes a peer-to-peer framework for network monitoring and forensics through which network security events can be collected and shared among the peers. "Analysis, forensic preservation and reporting of related information can be performed using spare CPU cycles" (Redding, 2005), together with other spare, under-utilized, or unused resources. This system, however, seems to be designed to collect only network security events and not any other forms of evidence from individual host devices. Furthermore, it seems to be aimed towards an "administratively closed environment" under the control of some systems administrator within an enterprise. An open system that has the Internet as its domain of operation, assisting in the collection of any form of computer-based evidence, is what is not dealt with in Redding's work. Thus, it is this that is sought after in the current study, as will be described later in this paper.

In order to facilitate uniform, seamless exchange of forensic artifacts between heterogeneous entities, some form of standardization of the transmitted evidence formats is necessary. One of the bodies that has made proposals related to this is the Common Digital Evidence Storage Format Working Group (CDESF Working Group, 2006). Other notable efforts include (Schatz & Clark, 2006), which makes use of the Resource Description Framework (RDF) from Semantic Web technologies as a common data representation layer for digital evidence related metadata, using ontologies to describe the vocabulary related to this data, and (Kahvedžić & Kechadi, 2009), where a detailed ontology of Windows Registry artifacts of interest is introduced. The Open Forensic Integration Architecture (FIA) in (Raghavan et al., 2009) and FACE (Case et al., 2008) describe methods for the integration of digital evidence from multiple evidence sources in a bid to facilitate more efficient analysis. The Advanced Forensic Format (Garfinkel, 2006), AFF4 (M. Cohen et al., 2009) and XIRAF (Alink, Bhoedjang, Boncz, & de Vries, 2006) describe annotated evidence storage formats that allow for the addition of arbitrary metadata as well as interoperability among different tools.

In AFF4 (M. Cohen et al., 2009), notably, remote evidence capture, some form of availability through manually driven redundancy, and some parallelism in the evidence capture process of RAID data sets are also present. However, it seems that the initiation of these processes is instigated through human intervention. They are not fully automated through machine triggers, and thus could be slow to react in acquiring evidence. The availability (fail-over) provided through redundancy is based on whether the evidence captured is required in other locations. If it is not required elsewhere, then the fail-over mechanism would not work because there would be only one copy of the evidence. For the parallelism (described particularly for acquiring individual disks in a RAID set), it is unclear whether it could also apply in parallelizing other potential evidence data sources such as RAM memory or NAND storage on mobile devices.

The proposed idea that this study covers is composed of several areas of specialization, namely: the Internet of Things (IoT), Intrusion Detection Systems, Peer to Peer Networks, Virtualization infrastructures, Large Scale Cloud storage and Semantic Web technologies. Most of these technologies have been previously harnessed in different capacities, singularly or in small clusters, towards the benefit of digital forensics for today's complex internetworked and intertwined cyber realm. However, to the author's knowledge, there has so far not been any work done, other than our previous work seen in (Homem, 2013) and (Homem, Dosis, & Popov, 2013), that aims to merge all these technologies together in order to provide a singular scalable solution that solves the recurring problems of large amounts of data, several sources of evidence, inability to collect evidence efficiently over networks, heterogeneity among systems, insufficient processing power, security and privacy – that are constantly troubling digital forensic analysts and law enforcement agencies worldwide. We extend our previous work by describing the P2P gossiping protocol further, as well as by performing tests on larger partition sizes with two more mobile devices that have slightly better computing power. More insights related to the performance and reliability of the remote evidence capture process are also realized through these tests.

CHARACTERISTICS OF THE DESIRED SOLUTION

Inspired by the challenges documented by Palmer at the first DFRWS conference (Palmer, 2001), we describe below a wish-list of characteristics that one would like to have in a comprehensive Digital Forensics and Incident Response system for a public open domain networked environment such as the Internet. They are aimed at complementing and updating Palmer's list in light of the current state of electronic crime and the present state of forensic tools, as described earlier.

i. Distribution: The ability to deal with massive amounts of distribution in terms of participants, data storage, processing and dissemination. The system needs to be able to handle the heterogeneity that may come with distributed systems as well.

ii. Scalability: Large scale interconnectivity, as well as the possibility of new entities joining, and others leaving the system, dynamically and gracefully without drastic negative effects on the system. The ability to easily improve or extend the capabilities of the system through new modules is also desired.

iii. Availability: Providing suitable levels of functionality as and when required.

iv. Universality: Among the heterogeneity and lack of standardization among vendors of different systems, there needs to be some standardization and common understanding between the systems on the level of communication and storage of potential evidentiary information.

v. Responsiveness: The system should be able to aptly detect when a security policy has been irrecoverably violated, thus collecting information in order to pursue the perpetrators of the criminal actions. This also improves on efficiency and privacy in that the system does not have to perpetually be collecting all possible information from all possible systems.

vi. Resource Sharing: Today, large complex problems are being solved through collaboration and sharing of resources, as seen in Crowdsourcing, P2P networks and cloud infrastructures. These provide on-demand, rapid availability of large amounts of resources from collective resource pools, providing speed, efficiency and the benefits of "the wisdom of the crowd".

vii. Integrity (Trust, Reliability & Accuracy): As a system facilitating law enforcement in digital crimes, the levels of trust, reliability, accuracy and integrity of the information need to be high enough for it to be accepted as a veritable source of evidentiary information for a court of law. The Daubert standards and the chain of custody need to be adhered to.

viii. Privacy & Confidentiality: Personally identifiable and secret information must be maintained as anonymous and confidential as is reasonably acceptable, unless incriminated. Unauthorized access to such information is not to be allowed.

ix. Security: In addition to ensuring the security of the potential evidentiary information that it aims to collect and process, the system must also take its own security into consideration – especially in terms of authentication, authorization, accountability and non-repudiation of activities undertaken.

LEIA: THE LIVE EVIDENCE INFORMATION AGGREGATOR

The LEIA is a 4-tiered system architecture that may be described as a combination of hypervisors, intrusion detection systems, peer-to-peer systems and cloud storage. It is made up of the following components:

a) The Host-based Hypervisor (HbH)
b) The Peer-to-Peer Distribution Architecture (P2P-da)
c) The Cloud-based Backend (CBB)
d) The Law Enforcement Controller (LEC)

The functionality of each of the layers of the LEIA system is briefly described in the following sections.

The Host-based Hypervisor (HbH)

The Host-based Hypervisor (HbH) system is composed of a virtualization layer managed by a hypervisor – a privileged secure platform managing the guest operating system (OS). The hypervisor contains an inbuilt host-based intrusion detection system, also termed the embedded intrusion detection system (em-IDS). Security utilities within the guest OS, such as anti-malware tools and intrusion detection systems, maintain their own data and logs that are accessible to the HbH. The HbH collects and assimilates the information that it gets from its own inbuilt intrusion detection system together with other information collected from the other security utilities that may exist within the guest OS. This helps in getting a better perspective of current malicious activity that may be underway.
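As a simple illustration of this assimilation step, alerts from the em-IDS and the guest OS security utilities could be normalized into a single time-ordered view, as sketched below. The alert fields, feed names and severity scale are hypothetical; the paper does not define an alert format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List

@dataclass
class Alert:
    source: str        # e.g. "em-IDS", "guest-antivirus", "guest-IDS"
    timestamp: datetime
    severity: int      # assumed scale: 1 (low) .. 5 (critical)
    description: str

def assimilate(feeds: Iterable[Iterable[Alert]], min_severity: int = 2) -> List[Alert]:
    """Merge alert feeds from the em-IDS and the guest OS security utilities
    into one time-ordered view, dropping low-severity noise."""
    merged = [a for feed in feeds for a in feed if a.severity >= min_severity]
    return sorted(merged, key=lambda a: a.timestamp)
```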

Page 66 © 2015 ADFSL On the Network Performance of Digital Evidence Acquisition … JDFSL V10N3 This work is licensed under a Creative Commons Attribution 4.0 International License. Further to this sharing of information This is managed by the Cloud-based Backend within a single HbH system, individual HbH Differencing Engine (CBB-DE) component of systems also share their information about the CBB. The aim of this is to quickly filter malicious activity they may have discovered out known system data or files through with each other. This communication is matching the files on an HbH against hashes of facilitated through the Peer-to-Peer system files that are known to be benign and Distribution Architecture (P2P-da). This have not been moodified in any way. collaborative effort among the HbH systems A user data profile with its corresponding further helps improve the accuracy of IDSs and hash-lists is alsoo created. Thhe user-data hash- eventually forensic data acquisition. list is also maintained in a dual format – with In order to reduce the amount of data that a local copy residing on the HbH and a remote may need to be collected for analysis, each master copy beiing maintained at the CBB. HbH maintains a hash list of the local files on Further to this the actual user data is backed its guest operating system (Local - Known up at the CBB. Thus, the user data hash lists Data Hash-List, L-KDHL). This L-KDHL is are used to check which files have changed and periodically cross-checked and updated against may need to be backed up to the CBB. a Master – Known Data Hash-List (M-KDHL) stored at the Cloud-based Backend (CBB).
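As an illustration of this hash-list differencing, the sketch below shows how an HbH node might build its L-KDHL and identify files whose hashes are absent from the M-KDHL, and which are therefore candidates for collection or backup. The function names and the use of SHA-256 are assumptions; the paper does not prescribe a specific hash algorithm.

```python
import hashlib
import os

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks to avoid loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_local_kdhl(root):
    """Build the Local Known Data Hash-List (L-KDHL): path -> SHA-256."""
    kdhl = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                kdhl[path] = sha256_of_file(path)
            except OSError:
                continue  # unreadable files are skipped, not fatal
    return kdhl

def unknown_files(local_kdhl, master_hashes):
    """Return paths whose hashes are absent from the M-KDHL, i.e. files that
    are not known-benign and may need to be collected or backed up."""
    return [p for p, h in local_kdhl.items() if h not in master_hashes]

# Example: master_hashes would normally be fetched from the CBB-DE.
# local = build_local_kdhl("/data")
# to_collect = unknown_files(local, master_hashes=set())
```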

Figure 1. The components of the HbH subsystem


With respect to "Known Malicious Files", which are files that have been previously identified as having malicious content within them, a "Known Malicious File" hash list is to be maintained only on the CBB. It is not held on individual HbH systems as it can easily become large and unmanageable for a single HbH to maintain.

The hypervisor is the critical element when it comes to the collection of potential evidentiary data. Having privileged access, the hypervisor is able to directly interact with the file system, network interfaces, memory caches and other low-level resources, which are all primary sources of common evidentiary data in digital investigations. The embedded IDS (em-IDS) also collects information, mostly in the form of logs, which are parsed to produce synthesized alerts. When evidentiary data from the local HbH is collected, it is transmitted towards the CBB via neighbouring HbH systems through the action of the P2P-da system (described in the next section). While such data is in transit through a neighbouring HbH system, and travelling onward to the CBB, it is always held in an encrypted form and only within temporary storage.

The Peer-to-Peer Distribution Architecture (P2P-da)

The essence of the P2P-da is to provide reliability, scalability and rapid throughput of transmitted data even in the face of high rates of "churn", that is, large numbers of nodes joining and leaving the network. In order to achieve this, a cocktail of P2P protocols is put together in order to extract the best qualities of each and also allow each to compensate for the others' shortcomings. The particular P2P protocols that are put together in order to build the P2P-da are: gradient overlay protocols (Sacha, Dowling, Cunningham, & Meier, 2006), epidemic protocols (Jelasity, Voulgaris, Guerraoui, Kermarrec, & Steen, 2007), and the BitTorrent protocol (B. Cohen, 2003).

There are 3 main functionalities of the P2P-da:

I. Maintenance of the P2P Overlay
II. Dissemination and aggregation of Malicious Behaviour Information and alerts
III. Incident response data collection

These functionalities generally correspond to the P2P protocols that make up the essence of the P2P-da. The maintenance of the P2P overlay is facilitated mainly through gradient (hierarchical) overlays assisted through epidemic (gossip-based) overlays. The dissemination and aggregation of malicious behaviour information is mainly facilitated by epidemic (gossip-based) overlays. Incident response data collection is mainly facilitated through an adaptation of the BitTorrent protocol. The details behind these individual functionalities are dealt with in the following sections.

Maintenance of the P2P Overlay

The essence of this is for the overall P2P network to maintain connectivity among neighbouring nodes as well as the larger HbH node population. Further to this, the aim here is to link HbH nodes in such a way that they are most beneficial to each other as well as to the overall communication of security events and evidence transmission aims.

In order to do this, a hierarchy is to be created among the peer nodes such that those less endowed with resources are lower in the hierarchy and those that are better endowed are higher in the hierarchy. The aim of this is to ensure that nodes that lack resources generally communicate security event information, or transmit potentially large evidence files, towards more reliable and stable peers. It is assumed that nodes with more resources are more likely to be better equipped to deal with larger amounts of information and are also more likely to be online and available to be communicated with.

A gradient overlay network is suited to ensure this form of network structure. It is built in such a way that a utility metric is used to determine which nodes are most suitable to connect to, and which nodes to avoid. This utility metric is determined from a combination of factors including the amount of resources available on a node, the current state of use of the node and the amount of time that it has been online. These utility metrics are shared through random node interactions, typical of "gossip-based" (epidemic) P2P protocols, in order for nodes to get to know of other nodes that might be better to link to.

As gossip-based P2P protocols are known to eventually converge to a generally stable state, a hierarchy of the HbH systems is thus formed, with the less endowed elements on the outer edges and the more capable elements closer towards the centre of the LEIA system (that is, the CBB).

Mechanics of the Gossiping and Gradient Overlay

Neighbour (peer) selection is an important process in maintaining both overlays. This is because it affects the performance of the gossiping overlay in its ability to communicate and converge, and thus also the gradient overlay. In order to be efficient in converging information across a network where random interaction is a key factor, each peer has to be equipped with good "local knowledge" as well as "distant knowledge". In our case this would require each peer to have knowledge of both peers that are "nearby" as well as peers that are "distant". The distance metric determining this distance could be simple Euclidean distance in terms of the difference between the utility metrics of devices, or some other useful sense of distance, such as geographic distance. Thus each peer should store information on "nearby" devices, termed the "similar set", consisting of devices with similar utility metrics, as well as a set of devices which are distant in terms of the utility metric, termed the "random set". This mix of nearby and distant peer knowledge would also assist in preventing partitioning of the network due to excessive clustering.

For the gradient overlay to be maintained, each peer should also maintain information about a set of peers that has "weaker" utility metrics and a set that has "stronger" utility metrics. This is done in order to maintain communication between peers where the hierarchy of the gradient is needed. Such is the case when a peer needs to pass on more computationally intensive tasks, or when ensuring that evidence is transported onward to a more stable peer. We term these lists the subordinate list and the superior list, respectively.

These lists are exchanged among peers periodically and randomly. This means that a peer picks a peer either from its similar set or its random set and exchanges all or part of its subordinate and superior lists if they are different. Thresholds need to be set in order to ensure that the exchanges do not result in only a particular set of peer information being exchanged; however, these thresholds are not further discussed in this study. Additionally, a list of "recently seen peers" should also be maintained in order to avoid cycling between the same peers repeatedly.
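A minimal sketch of this neighbour-selection and list-exchange logic follows. The utility function, list sizes and exchange policy are illustrative assumptions rather than the protocol's actual parameters; the sketch only shows how a peer might rank known peers into similar, subordinate and superior sets by a utility metric, gossip with a partner, and avoid recently seen peers.

```python
import random
from dataclasses import dataclass, field

@dataclass
class PeerInfo:
    peer_id: str
    resources: float   # e.g. normalized CPU/RAM/storage score
    load: float        # current utilization, 0..1
    uptime: float      # hours online

def utility(p: PeerInfo) -> float:
    # Illustrative utility metric: more resources and uptime, less load.
    return p.resources * (1.0 - p.load) + 0.1 * p.uptime

@dataclass
class PeerView:
    me: PeerInfo
    known: list = field(default_factory=list)          # list[PeerInfo]
    recently_seen: set = field(default_factory=set)    # peer ids

    def partition(self, k: int = 5):
        """Split known peers into similar / subordinate / superior sets."""
        my_u = utility(self.me)
        ranked = sorted(self.known, key=lambda p: abs(utility(p) - my_u))
        similar = ranked[:k]
        subordinate = [p for p in self.known if utility(p) < my_u][:k]
        superior = [p for p in self.known if utility(p) > my_u][:k]
        return similar, subordinate, superior

    def pick_gossip_partner(self):
        """Pick a partner from the similar set or at random, avoiding peers
        gossiped with recently so the same pairs are not cycled endlessly."""
        similar, _, _ = self.partition()
        candidates = [p for p in (similar or self.known)
                      if p.peer_id not in self.recently_seen]
        if not candidates:
            self.recently_seen.clear()
            candidates = self.known
        partner = random.choice(candidates)
        self.recently_seen.add(partner.peer_id)
        return partner

    def merge(self, received: list):
        """Merge peer descriptors received from a gossip partner."""
        ids = {p.peer_id for p in self.known}
        self.known.extend(p for p in received if p.peer_id not in ids)
```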

Dissemination and Aggregation of Malicious Behaviour Information & Alerts

This capability is necessary in order to facilitate the collaborative mechanisms needed to ensure that security event information is shared, and that potentially useful evidence information is captured efficiently and transmitted securely. Security event information known by individual HbH peers is duly shared out to others in order for the overall system to have a more informed security posture as well as to be forewarned of imminent malicious events. This includes the distribution of malicious activity signatures as well as the discovery of malicious activity originating from certain hosts. When such messages are received, only a set of the most common and recently active malicious activity signatures is maintained at the HbH. These kinds of messages are termed "Management Correspondence messages" and can be shared out to any peers that a particular HbH has address information about and that have connectivity.

The other major type of messages involved in this functionality is termed "Security Incident Control messages". These messages facilitate the reaction to the detection of a malicious event. This mainly includes the communication of procedures to initiate the evidence capture process on certain components of certain nodes, as well as initiating initial pre-processing such as determining the IP addresses of recently connected devices in order to extend the evidence capture process to other suspected devices.

There may be other forms of messages that might need to traverse the P2P-da; however, the 2 categories mentioned thus far are the major types.

Incident response data collection

This functionality is triggered by the detection of malicious events via the collective knowledge gained through collaborating HbH systems, the em-IDS and guest OS security mechanisms. For more volatile data such as network traffic and live memory, a fixed time period is chosen for which to perform the capture process (or a fixed number of snapshots of the data over a short period of time, particularly for live memory), after which a decision is to be made whether subsequent captures need to be made, or whether what has been collected so far suffices. The Cloud-based Backend Differencing Engine (CBB-DE) filters out known system files through facilitating the hash comparisons. Primary analysis for IP addresses and hostnames on the data collected may result in the triggering of other HbH systems to capture data also.

The actual data collection procedure involves 3 stages, as described in the following sections. The diagram below (Fig. 2) depicts the data collection and transfer process of the P2P-da.
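Before the transfer process itself, the two message categories above could be represented as simple typed payloads, for example as sketched below. The field names are hypothetical; the paper does not define a wire format for either message type.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ManagementCorrespondence:
    """Shared freely with any reachable peer: signatures and alerts."""
    sender_id: str
    signatures: List[str]        # most common / recently active signatures
    suspected_hosts: List[str]   # hosts observed originating malicious activity

@dataclass
class SecurityIncidentControl:
    """Directed reaction to a detected incident."""
    target_node: str
    components: List[str]        # e.g. ["disk", "memory", "network"]
    capture_window_secs: int     # fixed period for volatile data capture
    preprocessing: List[str]     # e.g. ["extract-recent-ips"]
```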


Figure 2: The P2P-da Data Transfer process

a) Data Partitioning

Different data formats (memory dumps, logs, files, packet captures, disk images) are compressed and stored temporarily on the HbH system in a modified AFF4 data structure that also contains simple RDF metadata describing the evidence. This data structure is termed the Incident Data Archive (IDA). Each IDA data structure is partitioned into equal-size pieces that will be referred to as shards. The shard is a signed and encrypted partition of the IDA, analogous to the idea of a "piece" in the BitTorrent protocol. A metadata file termed the "reflection" (which corresponds to the BitTorrent metadata file) is also created and sent directly to the CBB. In this way the CBB acts as the "tracker" and "leeches" IDAs from participating HbH systems in the P2P-da, thus benefiting from the high throughput of the BitTorrent protocol.

b) Shard Distribution

Multiple copies of each individual shard are distributed to more capable neighbours (supporters), facilitated by the gradient overlay. Each time a shard is passed on it increases its "heat level". After a certain "heat" threshold (that we refer to as the "melting point") an HbH system is obliged to directly upload to the CBB (more specifically the HbH Master Peers of the CBB), else an election procedure is initiated to determine which previously supporting HbH should be delegated the uploading task. In order to avoid an individual node being the only "proxy" and thus a potential single point of failure, individual HbH systems are only allowed to partake in uploading a certain number of IDA shards, governed by the "dependency value". This improves the overall reliability of the larger system through reducing the possibility of having a single point of failure in the transmission process.

c) Rapid fragment reconstruction

For a particular shard, downloads are initiated from all of its respective supporter locations. This is done for redundancy and bandwidth maximization purposes. Similar to the BitTorrent protocol download, priority is given to the shards that are the least commonly available, that is, those that have the fewest recorded supporters. In order to reconstitute the IDA, individual hashes of shards are verified as they are received against those in the reflection. Several supporters upload at the same time, thus if a shard is in error, that from another supporter is taken. Once successfully transferred, shards are deleted from supporting HbH systems.

The Cloud-based Backend (CBB)

The CBB system is a highly available, scalable, responsive, centralized back-end storage service capable of storing large amounts of data in a homogeneous form. It is subdivided into 3 major components: the Storage System (SS), the Differencing Engine (DE) and the HbH Master Peers.

The Storage System (SS) is built upon the Hadoop HDFS architecture (Shvachko, Kuang, Radia, & Chansler, 2010), which provides not only the raw storage capabilities but also scalability, availability, reliability and responsiveness. The Differencing Engine (DE) filters out known files before having them stored on the CBB. This is provisioned through the MapReduce (Dean & Ghemawat, 2008) capabilities supported by Hadoop. The DE also provides a query-response mechanism to the HbH systems with information on known benign data as part of the Master Known Data Hash-List (M-KDHL). The M-KDHL contains data about known files, memory processes, protocol flows and log entries, and thus enables their removal from IDAs being prepared. This reduces the size of IDAs before they are stored on the Storage System (SS) of the CBB. Versions of known data entries in the M-KDHL need also to be maintained in a scalable manner; however, the mechanics of this is not discussed in this study.

The HbH Master Peers are a particular set of well-endowed peers that are directly connected to the core CBB system (that is, the SS and DE), providing an interface to the rest of the LEIA system through the P2P-da. They do not have other core functionalities unrelated to their LEIA responsibilities and are essentially the backbone of the P2P-da, and ultimately the provider of connectivity of the LEIA system outwards to the other HbH systems. The HbH Master Peers also serve as the central point from which system software updates and malicious event detection heuristics originate and are disseminated outwards to the HbH systems in the wild.
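A minimal sketch of the shard and "reflection" mechanics from steps (a) and (c) is given below: an IDA is cut into equal-size pieces, each piece is hashed into the reflection, and received shards are verified against it before reassembly. Signing, encryption and the heat/melting-point bookkeeping are omitted, and the shard size is an assumed value rather than one specified by the paper.

```python
import hashlib

SHARD_SIZE = 4 * 1024 * 1024  # assumed 4 MiB shards

def make_shards(ida_bytes: bytes):
    """Partition an Incident Data Archive (IDA) into equal-size shards
    and build the 'reflection': an ordered list of shard hashes."""
    shards = [ida_bytes[i:i + SHARD_SIZE]
              for i in range(0, len(ida_bytes), SHARD_SIZE)]
    reflection = [hashlib.sha256(s).hexdigest() for s in shards]
    return shards, reflection

def verify_shard(index: int, shard: bytes, reflection) -> bool:
    """Check a received shard against the reflection before accepting it;
    a mismatch means the copy from another supporter should be used."""
    return hashlib.sha256(shard).hexdigest() == reflection[index]

def reconstruct(received: dict, reflection) -> bytes:
    """Reassemble the IDA once every shard index has a verified copy."""
    assert all(i in received for i in range(len(reflection))), "missing shards"
    return b"".join(received[i] for i in range(len(reflection)))
```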


Figure 3: The Cloud-based Backend components
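To make the Differencing Engine's known-file filtering concrete, the sketch below expresses it as a MapReduce-style job in the Hadoop Streaming fashion: incoming IDA records are joined against M-KDHL entries by hash, and only unknown entries are kept. The record format ("hash<TAB>payload", with "KNOWN" marking M-KDHL entries) is an assumption for illustration, not the actual DE pipeline.

```python
# Sketch of the Differencing Engine as a Hadoop Streaming style map/reduce:
# keep only records whose hash is NOT present in the M-KDHL.
import sys
from itertools import groupby

def mapper(lines):
    """Key each record by its hash.
    IDA records:    '<sha256>\t<path-or-metadata>'
    M-KDHL records: '<sha256>\tKNOWN'"""
    for line in lines:
        h, _, value = line.rstrip("\n").partition("\t")
        yield h, value

def reducer(pairs):
    """Pairs arrive sorted by hash (as Hadoop guarantees); drop any hash
    group that contains a KNOWN marker, emit the rest unchanged."""
    for h, group in groupby(pairs, key=lambda kv: kv[0]):
        values = [v for _, v in group]
        if "KNOWN" not in values:
            for v in values:
                yield f"{h}\t{v}"

if __name__ == "__main__":
    # Local simulation of the shuffle phase: sort by key, then reduce.
    pairs = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
    sys.stdout.write("\n".join(reducer(pairs)) + "\n")
```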

The Law Enforcement Controller System

The Law Enforcement Controller is the main interface that law enforcement personnel interact with in order to perform their directed analysis for a particular digital investigation case. Through it, a Law Enforcement Agent can initiate specific queries to the data sets stored on the CBB, thus retrieving detailed, structured information as well as new knowledge inferred through correlation of data originating from different sources that may help in solving a case. The aim of this is to automate otherwise manual tasks of correlating data from different heterogeneous sources in order to pose valid assertions based on the data that could assist a forensic analyst in performing their duties of making sense of digital artifacts. This functionality is described in more detail in (Dosis, Homem, & Popov, 2013).

Additionally, from the new-found knowledge obtained through correlation, patterns of malicious activities are to be learnt and stored. These Malicious Activity Patterns are to be used as feedback to the HbH systems in order to improve the detection capabilities of the inbuilt IDSs and thereby also improve the accuracy of collection of data of potential forensic evidentiary use.

PROOF OF CONCEPT EVALUATION AND RESULTS

As the first part of testing the motivations behind the designed architecture, we decided to focus on the network transmission component, as it is critical in enabling speedier evidence collection. In order to demonstrate the need for better throughput in networks and the need for redundancy for availability, such as those exhibited in P2P overlays, an experiment was set up to simulate the conditions of the LEIA, however without the P2P-da component. This means that the experiment was performed with the transmission of potentially evidentiary information from an HbH system to the CBB over a traditional client-server paradigm. The experiment itself focused on the time taken to perform remote extraction, compression and transmission of increasingly larger disk images over an encrypted channel from small scale devices over the Internet, and the subsequent reconstruction and storage of this data on a Hadoop HDFS cluster. The success rate of each evidence acquisition trial run was also considered, to determine reliability.

It should be mentioned that, for the sake of simplicity of the experiment, the actual hypervisor of the HbH system was not built; however, closely similar conditions – particularly in terms of the LEIA prototype application having privileged access – were met. In order to test and measure the performance of the proof of concept application working over the client-server paradigm, six different small scale devices were used. Two rounds of testing were performed: the first round with the four less powerful devices, and the second round with the two more powerful devices. The table below outlines the specifications of the devices being captured.

Table 1
Small scale device specifications

Device | Platform | Processor | Chipset | RAM | Disk
Chumby Classic | Busybox v1.6.1 | 350MHz ARM926EJ-S | Freescale i.MX233 | 64MB | 64MB
HTC Incredible S | Android OS v2.3.3 (Gingerbread) | 1GHz Scorpion | Qualcomm Snapdragon MSM8255 | 768MB | 1.1GB
HTC MyTouch 4G Slide | CyanogenMod 10.2 Alpha | Dual-core 1.2GHz Scorpion | Qualcomm Snapdragon S3 MSM8260 | 768MB | 4GB
Samsung Galaxy Tab 2 (WiFi only) | Android OS v4.0.3 (Ice Cream Sandwich) | Dual-core 1GHz | TI OMAP 4430 | 1GB | 8GB
Samsung Galaxy S4 (LTE-A) | Android OS v5.0.1 (Lollipop) | Quad-core 2.3GHz Krait 400 | Qualcomm Snapdragon 800 MSM8974 | 2GB | 16GB
Google Nexus 5 | Android OS v5.1.1 | Quad-core 2.3GHz Krait 400 | Qualcomm Snapdragon 800 MSM8974 | 2GB | 16GB


Figure 4. The experimental set up
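The client side of this setup amounts to reading the target partition, compressing it, and streaming it over an encrypted channel to the collection server, which reassembles and stores it on HDFS. A minimal sketch of such a sender follows; the host name, port, device path and chunk size are placeholders, and the prototype's actual wire protocol is not published in the paper.

```python
import socket
import ssl
import zlib

SERVER = ("collector.example.org", 9443)   # placeholder collection endpoint
DEVICE = "/dev/block/mmcblk0p1"            # placeholder partition to acquire
CHUNK = 1 << 20                            # read 1 MiB at a time

def acquire_partition(device=DEVICE, server=SERVER):
    """Stream a compressed copy of a block device over a TLS connection."""
    context = ssl.create_default_context()
    compressor = zlib.compressobj(level=6)
    with socket.create_connection(server) as raw_sock, \
         context.wrap_socket(raw_sock, server_hostname=server[0]) as tls, \
         open(device, "rb") as dev:
        while True:
            block = dev.read(CHUNK)
            if not block:
                break
            tls.sendall(compressor.compress(block))
        tls.sendall(compressor.flush())  # flush any remaining compressed bytes

if __name__ == "__main__":
    acquire_partition()
```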

For the testing and the performance evaluation, partitions of the various devices were filled to specific size limits with random files, including images, PDFs, music files and compressed archive files (RARs), in order to simulate normal disk usage. These devices were subsequently captured over the network.

In the first round, the capture process was repeated 10 times for each individual partition size on each device in order to get the average file transfer time that each size took. The sizes measured were taken at 9 intervals with gradually increasing sizes. The maximum size of 4GB was taken as the largest size because the average capture (file transfer) times were beginning to take rather long periods (50-80 minutes) per test acquisition round. Furthermore, the maximum disk size on any of the devices available for testing was 8GB (with the rest being 4GB, 1.1GB and 64MB). A 4GB mini-SD card was also available and was used to supplement the HTC Incredible S in order to simulate a larger disk size. The Chumby Classic only had 64MB of flash (NAND) memory available, and no expansion capabilities, thus it was not included in the testing for remote data transfer performance as there was no way to increase the size of the storage capacity. It was, however, used in testing to show that the remote device capture of such a small scale device running on a Linux-based platform was possible. It was also used as the main prototyping device because it had a rather small storage capacity that enabled rather quick disk acquisitions when testing the software developed.

The repetition process and the use of averaging were done in order to compensate for the effects of random processes that could have affected network transmission times. Such random processes could include network traffic from other users of the networks being used, phone calls coming in and interfering with the I/O processes of the devices, or applications being updated on the devices, among others.

The tables below show the partition sizes used and the average times (in milliseconds) taken to perform the transfer:

Table 2
First Round Results – Tests on "HTC Incredible S"

Partition Amount Used | # of Test Runs | Avg. File Transfer time (ms) | Avg. File Transfer time (Minutes)
16MB | 10 | 13664 | 0.23
133MB | 10 | 84600.8 | 1.41
250MB | 10 | 392323.9 | 6.54
507MB | 10 | 553933.1 | 9.23
1000MB | 10 | 978571.8 | 16.31
1500MB | 10 | 1360375 | 22.67
2000MB | 10 | 2932376.8 | 48.87
3000MB | 10 | 3877676.8 | 64.63
4000MB | 10 | 4814006.6 | 80.23

Table 3
First Round Results – Tests on "HTC MyTouch 4G Slide"

Partition Amount Used | # of Test Runs | Avg. File Transfer time (ms) | Avg. File Transfer time (Minutes)
21.4MB | 10 | 8583 | 0.14
87.0MB | 10 | 31467 | 0.52
255MB | 10 | 230709 | 3.85
500MB | 10 | 338180 | 5.64
1000MB | 10 | 1174482 | 19.57
1550MB | 10 | 1323845.90 | 22.06
2000MB | 10 | 1673928 | 27.90
3000MB | 10 | 2052952.40 | 34.22
4000MB | 10 | 3015056.60 | 50.25

Table 4
First Round Results – Tests on "Samsung Galaxy Tab 2"

Partition Amount Used | # of Test Runs | Avg. File Transfer time (ms) | Avg. File Transfer time (Minutes)
4MB | 10 | 1235 | 0.02
11MB | 10 | 67608 | 1.13
250MB | 10 | 286947 | 4.78
500MB | 10 | 426783 | 7.11
1000MB | 10 | 960952 | 16.02
1500MB | 10 | 1488236 | 24.80
2000MB | 10 | 2829355 | 47.16
3000MB | 10 | 2951551 | 49.19
4000MB | 10 | 3707556 | 61.79
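For a sense of scale, the largest first-round rows correspond to effective throughputs of roughly 0.8 to 1.3 MB/s, as the short calculation below shows (values taken from Tables 2 to 4).

```python
# Effective throughput implied by the first-round averages (size / avg time).
rows = {
    "HTC Incredible S, 4000MB":     (4000, 4814006.6),
    "HTC MyTouch 4G Slide, 4000MB": (4000, 3015056.60),
    "Samsung Galaxy Tab 2, 4000MB": (4000, 3707556),
}
for label, (size_mb, avg_ms) in rows.items():
    print(f"{label}: {size_mb / (avg_ms / 1000):.2f} MB/s")
```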

The data above, from three of the four different specimen devices used in the first round, was plotted on a graph in order to visualize the general trend of the file transfer time against the partition size for the client-server network paradigm of remote evidence acquisition. The diagram that follows depicts the graph that was attained:

[Line graph: File Transfer Time (seconds) against Partition Size Used (MB) for the HTC MyTouch 4G Slide, HTC Incredible S and Samsung Galaxy Tab 2.]

Figure 5. Remote Acquisition Performance from First Round of Tests using Client-Server paradigm

In the second round of testing and performance evaluation, the two most powerful and best equipped devices were tested. The process from round one was replicated, but now with larger partition sizes, since these two devices had larger disk storage capacity (16GB). It was noticed in the first round that capturing the larger partition sizes was not always successfully completed. In the first round, in order to collect 10 time duration values for the larger partition sizes, often more than 10 trial runs were required. Connection time-outs due to random failures either on the devices, the network, or the server side were deemed to be the cause of failure.

Thus, in the second round it was determined that strictly 10 runs would be taken and an average of these would be the resulting estimated value. Among the 10 trial runs the number of successes and failures would be recorded. This would enable determining the average success rate for ever-increasing partition sizes.
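The per-size statistics reported for the second round reduce to the simple aggregation sketched below; the sample durations are invented purely to demonstrate the calculation.

```python
def summarize_trials(durations_secs, attempts=10):
    """Average transfer time over successful trials, plus the success rate,
    as reported per partition size in the second round."""
    successes = len(durations_secs)
    avg_secs = sum(durations_secs) / successes if successes else None
    return {
        "successes": successes,
        "success_rate": successes / attempts,
        "avg_secs": avg_secs,
        "avg_minutes": avg_secs / 60 if avg_secs else None,
    }

# Hypothetical example: 6 of 10 trials completed for one partition size.
print(summarize_trials([5200, 5300, 5350, 5280, 5400, 5336]))
```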

The tables that follow indicate the partition size intervals and the respective time durations taken. The number of successes out of 10 trials per partition size is also indicated.

Table 5
Second Round Results – Tests on "Samsung Galaxy S4"

Partition Amount Used | # of Successful Trials | Avg. File Transfer time (Sec) | Avg. File Transfer time (Minutes)
263MB | 10 | 275 | 4.58
535MB | 10 | 535 | 8.92
1055MB | 10 | 887 | 14.78
2168MB | 10 | 1735 | 28.92
3221MB | 10 | 2617 | 43.62
4304MB | 9 | 3358 | 55.97
6470MB | 6 | 5311 | 88.52
8637MB | 6 | 6953 | 115.88
10247MB | 5 | 8373 | 139.55
12778MB | 4 | 10472 | 174.53
15758MB | 3 | 13088 | 218.13

Table 6
Second Round Results – Tests on "Google Nexus 5"

Partition Amount Used | # of Successful Trials | Avg. File Transfer time (Sec) | Avg. File Transfer time (Minutes)
263MB | 10 | 270 | 4.50
535MB | 10 | 497 | 8.28
1074MB | 10 | 820 | 13.67
2168MB | 9 | 1780 | 29.67
3221MB | 10 | 2757 | 45.95
4304MB | 8 | 3526 | 58.77
6470MB | 5 | 5412 | 90.20
8637MB | 4 | 7119 | 118.65
10247MB | 2 | 9369 | 156.15
12778MB | 2 | 13001 | 216.68
15758MB | 2 | 15117 | 251.95


[Line graph: File Transfer Time (seconds) against Partition Size Used (MB) for the Samsung Galaxy S4 and Google Nexus 5.]

Figure 6. Remote Acquisition Performance from 2nd Round of Tests using Client-Server paradigm

[Figure omitted: plot of Number of Successes in 10 Trials against Partition Size Used (MB) for the Samsung Galaxy S4 and Google Nexus 5.]

Figure 7. Remote Acquisition Successes out of 10 Trials from 2nd Round of Tests using Client-Server paradigm

DISCUSSION

In Figure 5, taken from the data in the first round of tests, the curves seem to start off with a linear relationship which soon turns into more of an exponential one. The "HTC MyTouch 4G Slide" clearly portrays this characteristic, with the rest of the devices also exhibiting it, though not as vividly. Overall, there seems to be a more exponential relationship between the partition size and the file transfer time for the larger partition sizes. One could posit that as the partition sizes increase, even to sizes substantially larger than those in the graph, the relationship will become ever more exponential. This means that the times taken to acquire such partitions would increase in exponential magnitude, which suggests that the client-server paradigm is likely not suitable enough for the task of performing remote evidence acquisition, especially in the type of environment that the LEIA system is aimed at. This could imply the need for a more efficient network transfer paradigm for this type of activity in order to enhance the throughput.
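One simple way to probe the scaling trend discussed above is to look at how the averaged transfer time grows each time the partition size doubles. The sketch below merely restates the Table 4 averages and computes those growth factors; with so few, noisy data points the output is indicative at best.

```python
# Sketch: growth factor of the averaged transfer time (Table 4, Samsung Galaxy
# Tab 2) for each doubling of partition size present in the table.
# Values restate the reported averages; times are in milliseconds.
table4_ms = {250: 286947, 500: 426783, 1000: 960952, 1500: 1488236,
             2000: 2829355, 3000: 2951551, 4000: 3707556}

for size_mb, time_ms in sorted(table4_ms.items()):
    doubled = size_mb * 2
    if doubled in table4_ms:
        factor = table4_ms[doubled] / time_ms
        print(f"{size_mb}MB -> {doubled}MB: transfer time grew {factor:.2f}x")
```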
However, in Figure 6, taken from the data in the 2nd round of tests – using more powerful devices as well as larger partition sizes – the curves seem to take a more linear shape. This could be attributed to a variety of factors, including the availability of better hardware in the form of CPU capacity, higher RAM, better network cards and better I/O speeds between components, as well as updated software, kernel modules and drivers such as the Android ION memory manager.

The major difference between the sets of devices used in the first round and those in the 2nd round is the improvement in hardware and software capabilities. It could be said that lower spec devices may not be able to scale well with large amounts of storage space to be transferred over the network. Higher spec devices may be able to scale suitably with an increase in disk storage; however, a significant mismatch between what is fitted into these mobile devices by default and additional external storage attached to them (e.g. through large SD cards) may cause the scaling of remote evidence acquisition to deviate from the linear pattern.

As was noted in the first round of testing, and thus measured further in the second round, the success rate of the remote evidence acquisition reduced with larger partition sizes. This is seen in Tables 5 and 6 as well as in Figure 7. As the partition sizes, and effectively the amount of data requiring transfer, increase, the chances of a one-time success were seen to reduce. This can be attributed to the data transfer times taking rather long, thus increasing the time-window (and thus the likelihood) in which an unpredictable network interruption could cause the transfer to fail. Additionally, it was noticed that as the available disk space came closer to being fully utilized, the devices were more likely to behave erratically. Sudden freezing (requiring a reboot) while performing the remote evidence capture, as well as error notifications related to high disk usage, were observed in both devices used in the second round of tests. The lack of an expansion slot for external secondary storage in the form of an SD card on the Nexus 5 perhaps could have led to its slightly worse performance compared to the Galaxy S4. It is possible that the Galaxy S4 made use of the SD card placed in its expansion slot to cache data; however, this was not independently verified.

In the proof of concept set-up, partitions are captured through the "dd" tool, compressed, and streamed over an encrypted channel to the Cloud-based Backend (CBB) storage site, where the actual file is reconstructed. Other than the inherent re-transmission capabilities of the underlying TCP protocol, the proof of concept does not embody its own extra recovery method for transmission failures. Thus, if a network failure interrupts the streaming, a large file being reconstructed on the CBB end could be left incomplete. In the event the device being captured also goes offline, the evidence may be irrecoverably lost in this client-server scenario.
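The capture pipeline just described can be pictured roughly as follows. This is an illustrative sketch only, not the LEIA proof-of-concept code: the block-device path, the destination host and port, and the use of a TLS socket as the "encrypted channel" are assumptions made for the example.

```python
# Illustrative sketch (not the LEIA implementation): stream a partition image,
# produced by dd and gzip-compressed on the fly, over a TLS-encrypted socket
# to a hypothetical collection endpoint.
import socket
import ssl
import subprocess

DEVICE = "/dev/block/mmcblk0p1"               # assumed partition to acquire
CBB_HOST, CBB_PORT = "cbb.example.org", 9443  # hypothetical CBB endpoint

context = ssl.create_default_context()
with socket.create_connection((CBB_HOST, CBB_PORT)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=CBB_HOST) as tls_sock:
        # dd reads the raw partition; gzip compresses the stream before transfer.
        dd = subprocess.Popen(["dd", f"if={DEVICE}", "bs=4M"],
                              stdout=subprocess.PIPE)
        gz = subprocess.Popen(["gzip", "-c"], stdin=dd.stdout,
                              stdout=subprocess.PIPE)
        dd.stdout.close()  # let dd receive SIGPIPE if gzip exits early
        while True:
            chunk = gz.stdout.read(1 << 20)  # forward the stream 1 MiB at a time
            if not chunk:
                break
            tls_sock.sendall(chunk)
        gz.wait()
        dd.wait()
```

A matching listener on the CBB side would decompress the incoming stream and write it back out to reconstruct the image file.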

It should be noted, though, that storage failures were dealt with through the replication afforded by Hadoop HDFS, which was managing the CBB backend filesystem. Thus, some form of availability, and hence reliability, was achieved, though only for evidence data successfully saved in storage at least once.

From the results gathered in the test experiments, we discover that larger partition sizes may not scale well in remote evidence acquisition, depending on the capabilities of the device in question. Additionally, we see that devices with better hardware could scale well with data sizes that proved problematic for less capable devices. Furthermore, we see that for large partition sizes the remote acquisition times do increase to significantly long durations (over 4 hrs for >14GB), and thus the likelihood of a failure occurring during these long transfers increases. We postulate that the use of P2P networks, between the evidence capture location and the eventual storage location, could assist in providing availability through replication of data at multiple peers, as well as better throughput through using the bandwidth of multiple peers to facilitate uploading of potential evidentiary data. Certain P2P overlays, such as the BitTorrent protocol, are known to provide better network throughput, and thus shorter latency between evidence capture and storage. Others are known to provide high availability through replication of data at multiple peers, thus reducing the problem of a single point of failure that may be experienced with mobile devices having intermittent network connectivity. This could also potentially increase the time-window in which evidence can be gathered, aiding the need to collect as much evidence as possible in digital investigations.

Smaller file sizes being transmitted over the network are seen to have a higher likelihood of succeeding; thus splitting up the larger partitions into smaller pieces before transmission may also help reduce failures. This is also a common practice in P2P overlay networks such as BitTorrent, where large files are split into smaller pieces, further supporting our hypothesis of using P2P overlays to help improve availability and network throughput. Another alternative paradigm that could assist in improving the reliability of remote evidence capture would be to collect and transmit metadata of the activities occurring on the various evidence sources (disk, memory or network) rather than transferring the entire evidence source. The metadata repositories are likely to be smaller in size, and thus transferring and replicating these among peers during the remote evidence capture process could prove to be more efficient by promoting the reliability seen in the transfer of smaller files.
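A piece-wise transfer of the kind alluded to above could look roughly like the following. This is a sketch under assumed parameters (the 4 MiB piece size and SHA-256 piece digests are illustrative choices, not part of the proof of concept); the per-piece digests are what would allow individual pieces to be verified and re-requested independently, in the spirit of BitTorrent-style piece exchange.

```python
# Sketch (assumed parameters): split a captured partition image into fixed-size
# pieces and record a digest per piece, so pieces can be verified and re-sent
# individually instead of restarting a failed monolithic transfer.
import hashlib
from pathlib import Path

PIECE_SIZE = 4 * 1024 * 1024  # 4 MiB pieces, an illustrative choice

def split_into_pieces(image_path: str, out_dir: str) -> list[str]:
    """Write image pieces to out_dir and return the SHA-256 digest of each piece."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    digests = []
    with open(image_path, "rb") as img:
        index = 0
        while True:
            piece = img.read(PIECE_SIZE)
            if not piece:
                break
            (out / f"piece_{index:06d}").write_bytes(piece)
            digests.append(hashlib.sha256(piece).hexdigest())
            index += 1
    return digests

# Example usage (hypothetical paths):
# manifest = split_into_pieces("partition.img.gz", "pieces/")
```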
In determining the success rates of the remote evidence acquisition, it should be mentioned that only 10 trial runs were done per partition size, because the evidence acquisition times were getting significantly longer. More than 10 trials proved to be inconvenient and a hindrance given the time constraints of this study. Ten trials were deemed sufficient to show an indicative result, even though a sample of 10 may not be statistically significant enough to generalize the results as fully representative of the phenomenon being analyzed.

CONCLUSION

In this study we outlined the numerous problems that blight the digital investigation process, and law enforcement agencies at large, rendering them slow and ultimately ineffective. We proposed a comprehensive architecture of a proactive system – one that makes use of hypervisors, P2P networks, the RDF framework and cloud storage – that could help improve the digital investigation process through automation. Through a small proof of concept, we demonstrated a limited part of this system, showing remote evidence acquisition of small scale devices over public networks. From the results of testing this proof of concept, we motivate the need for a network paradigm that enables more reliability and better throughput in the network transfer process. Some P2P overlays, such as the BitTorrent protocol, demonstrate these characteristics and could possibly provide the solution to improving the speed and reliability of remote evidence capture.

FUTURE WORK

Though this architecture is promising, larger disk acquisitions need to be performed with more modern small scale devices that are equipped with larger storage capacities, in order to further confirm the need for a more efficient and reliable form of network data transfer in the form of P2P communication. From the proposed architecture, several parameters within the P2P communication protocols need further optimization and testing. Additionally, a PKI infrastructure can be infused into the system in order to improve the security of the communication and storage facilities. Also, the storage capabilities of the Cloud-based Backend could be supplemented by that of participating HbH nodes in order to realize a more distributed, decentralized and independent storage solution. The concept of privacy also needs to be addressed within the scope of this solution. Finally, an experiment with a wider scope, in terms of multiple devices being tested simultaneously, would be greatly desired in order to better drive this architecture towards becoming a reality.


REFERENCES

AccessData. (2014). AccessData FTK ADEnterprise. Retrieved December 4, 2014, from http://accessdata.com/solutions/digital-forensics/ad-enterprise

Alink, W., Bhoedjang, R. A. F., Boncz, P. A., & de Vries, A. P. (2006). XIRAF - XML-based indexing and querying for digital forensics. Digital Investigation, 3, 50–58. doi:10.1016/j.diin.2006.06.016

Almulhem, A., & Traore, I. (2005). Experience with Engineering a Network Forensics System. Proceedings of the 2005 International Conference on Information Networking: Convergence in Broadband and Mobile Networking. Korea: Springer Berlin Heidelberg.

Case, A., Cristina, A., Marziale, L., Richard, G. G., & Roussev, V. (2008). FACE: Automated digital evidence discovery and correlation. Digital Investigation, 5, S65–S75. doi:10.1016/j.diin.2008.05.008

CDESF Working Group. (2006). Standardizing digital evidence storage. Communications of the ACM. doi:10.1145/1113034.1113071

Cohen, B. (2003). Incentives build robustness in BitTorrent. Workshop on Economics of Peer-to-Peer Systems. Retrieved from http://www.ittc.ku.edu/~niehaus/classes/750-s06/documents/BT-description.pdf

Cohen, M., Garfinkel, S., & Schatz, B. (2009). Extending the advanced forensic format to accommodate multiple data sources, logical evidence, arbitrary information and forensic workflow. Digital Investigation, 6, S57–S68. doi:10.1016/j.diin.2009.06.010

Cohen, M. I., Bilby, D., & Caronni, G. (2011). Distributed forensics and incident response in the enterprise. In Digital Investigation (Vol. 8, pp. S101–S110). Elsevier Ltd. doi:10.1016/j.diin.2011.05.012

Davis, M., Manes, G., & Shenoi, S. (2005). A network-based architecture for storing digital evidence. Advances in Digital Forensics: IFIP International Conference on Digital Forensics, 194, 33–42. doi:10.1007/0-387-31163-7_3

Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM, 51(1), 1–13. doi:10.1145/1327452.1327492

Dosis, S., Homem, I., & Popov, O. (2013). Semantic Representation and Integration of Digital Evidence. Procedia Computer Science, 22, 1266–1275. doi:10.1016/j.procs.2013.09.214

Garfinkel, S. L. (2006). AFF: A New Format for Storing Hard Drive Images. Communications of the ACM, 49(2), 85–87. Association for Computing Machinery.

Guidance Software. (2014). Encase Enterprise. Retrieved December 4, 2014, from https://www.guidancesoftware.com/products/Pages/encase-enterprise/overview.aspx

Homem, I. (2013). LEIA: The Live Evidence Information Aggregator: A Scalable Distributed Hypervisor-based Peer-2-Peer Aggregator of Information for Cyber-Law Enforcement. KTH - The Royal Institute of Technology.

Homem, I., Dosis, S., & Popov, O. (2013). LEIA: The Live Evidence Information Aggregator: Towards efficient cyber-law enforcement. In World Congress on Internet Security (WorldCIS-2013) (pp. 156–161). London. doi:10.1109/WorldCIS.2013.6751038

Jelasity, M., Voulgaris, S., Guerraoui, R., Kermarrec, A.-M., & van Steen, M. (2007). Gossip-based peer sampling. ACM Transactions on Computer Systems (TOCS), 25(3), 1–36. Retrieved from http://dl.acm.org/citation.cfm?id=1275520

Kahvedžić, D., & Kechadi, T. (2009). DIALOG: A framework for modeling, analysis and reuse of digital forensic knowledge. Digital Investigation, 6, S23–S33. doi:10.1016/j.diin.2009.06.014

Kaspersky Lab. (2014). The Regin Platform: Nation-State Ownage of GSM Networks.

Koopmans, M. B., & James, J. I. (2013). Automated network triage. Digital Investigation, 10(2), 129–137. doi:10.1016/j.diin.2013.03.002

Leu, F.-Y., & Yang, T.-Y. (2003). A host-based real-time intrusion detection system with data mining and forensic techniques. IEEE 37th Annual 2003 International Carnahan Conference on Security Technology, 2003. Proceedings. doi:10.1109/CCST.2003.1297623

Moser, A., & Cohen, M. I. (2013). Hunting in the enterprise: Forensic triage and incident response. Digital Investigation, 10(2), 89–98. doi:10.1016/j.diin.2013.03.003

National Institute of Standards and Technology. (2004). Digital data acquisition tool specification (Draft for Comments). Retrieved from http://www.cftt.nist.gov/Pub-Draft-1-DDA-Require.pdf

Palmer, G. (2001). A Road Map for Digital Forensic Research. In Proceedings of the Digital Forensic Research Workshop, 2001. Utica, New York.

Raghavan, S., Clark, A., & Mohay, G. (2009). FIA: An open forensic integration architecture for composing digital evidence. Forensics in Telecommunications, Information and Multimedia: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 8, 83–94. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-02312-5_10

Redding, S. (2005). Using Peer-to-Peer Technology for Network Forensics. Advances in Digital Forensics: IFIP International Federation for Information Processing, 194, 141–152. doi:10.1007/0-387-31163-7_12

Ren, W., & Jin, H. (2005). Distributed agent-based real time network intrusion forensics system architecture design. In Proceedings - International Conference on Advanced Information Networking and Applications, AINA (Vol. 1, pp. 177–182). IEEE. doi:10.1109/AINA.2005.164

Roussev, V., & Richard III, G. G. (2004). Breaking the Performance Wall: The Case for Distributed Digital Forensics. Digital Forensics Research Workshop, 1–16.

Sacha, J., Dowling, J., Cunningham, R., & Meier, R. (2006). Discovery of stable peers in a self-organising peer-to-peer gradient topology. In International Conference on Distributed Applications and Interoperable Systems (DAIS) (pp. 70–83). Retrieved from http://link.springer.com/chapter/10.1007/11773887_6

Scanlon, M., Farina, J., Khac, N. A. Le, & Kechadi, T. (2014). Leveraging Decentralization to Extend the Digital Evidence Acquisition Window: Case Study on BitTorrent Sync. Journal of Digital Forensics, Security and Law, 9(December), 85–99.

Scanlon, M., & Kechadi, M. T. (2010). Online acquisition of digital forensic evidence. In Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (Vol. 31 LNICST, pp. 122–131). doi:10.1007/978-3-642-11534-9_12

Schatz, B., & Clark, A. (2006). An open architecture for digital evidence integration. In AusCERT Asia Pacific Information Technology Security Conference (pp. 15–29). Gold Coast, Queensland. Retrieved from http://eprints.qut.edu.au/21119/

Scientific Working Group on Digital Evidence (SWGDE). (2006). Data integrity within computer forensics. Retrieved from https://www.swgde.org/documents/Current Documents/2006-04-12 SWGDE Data Integrity Within Computer Forensics v1.0

Shields, C., Frieder, O., & Maloof, M. (2011). A system for the proactive, continuous, and efficient collection of digital forensic evidence. In Digital Investigation (Vol. 8, pp. S3–S13). Elsevier Ltd. doi:10.1016/j.diin.2011.05.002

Shvachko, K., Kuang, H., Radia, S., & Chansler, R. (2010). The Hadoop Distributed File System. 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST), 1–10. doi:10.1109/MSST.2010.5496972

sKyWIper Analysis Team. (2012). sKyWIper (a.k.a. Flame a.k.a. Flamer): A Complex Malware for Targeted Attacks (Vol. 05). Budapest. Retrieved from http://www.crysys.hu/skywiper/skywiper.pdf

Stone-Gross, B. (2012). The Lifecycle of Peer-to-Peer (Gameover) ZeuS. Retrieved from http://www.secureworks.com/cyber-threat-intelligence/threats/The_Lifecycle_of_Peer_to_Peer_Gameover_ZeuS/

Van Baar, R. B., van Beek, H. M. A., & van Eijk, E. J. (2014). Digital Forensics as a Service: A game changer. Digital Investigation, 11, S54–S62. doi:10.1016/j.diin.2014.03.007

Yu, J., Ramana Reddy, Y. V., Selliah, S., Reddy, S., Bharadwaj, V., & Kankanahalli, S. (2005). TRINETR: An architecture for collaborative intrusion detection and knowledge-based alert evaluation. Advanced Engineering Informatics, 19(2), 93–101. doi:10.1016/j.aei.2005.05.004

Zonouz, S., Joshi, K., & Sanders, W. (2011). Floguard: Cost-aware systemwide intrusion defense via online forensics and on-demand IDS deployment. In Computer Safety, Reliability, and … (pp. 338–354). Naples, Italy: Springer-Verlag, Berlin, Heidelberg. doi:10.1007/978-3-642-24270-0_25