Project Report

Fast Processing of Large (Big)

Forensics Data

Indian Academy of Sciences Summer Research Fellowship Program - 2014

By Pritam Dash

School of Computing Science and Engineering, Vellore Institute of Technology, Chennai Campus

Guide: Dr. B.M. Mehtre, Associate Professor, Institute for Development and Research in Banking Technology, Hyderabad, India


Declaration

I hereby declare that the project entitled Fast Processing of Large (Big) Forensics Data submitted for the Indian Academy of Sciences Summer Research Fellowship Program 2014 is my original work and the project has not formed the basis for the award of any degree, associateship, fellowship or any other similar work.

Signature of the Student:

Place: Hyderabad

Date: 25th July 2014


Certificate

This is to certify that this project report entitled Fast Processing of Large (Big) Forensics Data, submitted to the Indian Academy of Sciences, Bangalore, is a bona fide record of work done by Mr. Pritam Dash, undergraduate student at Vellore Institute of Technology, Chennai Campus, under my supervision from 29th May 2014 to 25th July 2014.

Signature of Guide

Place: Hyderabad

Date: 25th July 2014


Acknowledgement

I take this opportunity to express my profound gratitude and deep regards to my guide, Dr. B.M. Mehtre, for his exemplary guidance, monitoring and constant encouragement throughout the course of this work. The help and guidance given by him from time to time shall carry me a long way in the journey of life on which I am about to embark.

I also take this opportunity to express a deep sense of gratitude to Mr. Sandeep K, Research Associate, IDRBT for his support, valuable information and guidance, which helped me in completing this task through various stages.

I am obliged to the staff members of IDRBT for the valuable information provided by them in their respective fields. I am grateful for their cooperation during the period of my summer research.

Pritam Dash

Hyderabad, 25th July 2014


Abstract

The magnitude of potential digital evidence sources has grown exponentially. The increasing amount of data, especially unstructured data, is becoming a major challenge for forensic investigators. Hence, in cyber forensics the ability to sift through vast amounts of data quickly is now paramount. To speed up processing, it is essential in the triage process to first eliminate those files that are clearly unrelated to the investigation. An effective method for supporting this work is matching files against black and white lists. We compare five different methods of finding additional uninteresting files: frequent hash values, frequent paths, frequent sizes, clustered creation times, and uninteresting extensions [1]. Tests were run on data sources of different volumes collected from Windows and Linux systems. In this work we propose a new strategy for faster processing of large forensics data, and we provide a comparison between the total number of uninteresting files found by hash set matching and by the five methods mentioned above. In our initial tests we could eliminate an additional 2.37% and 3.40% of uninteresting files in the Windows and Linux data sources respectively.

Keywords: data source, hash set, metadata, black-list, white-list, uninteresting files.


Table of Contents

Declaration
Certificate
Acknowledgement
Abstract
Table of Contents
1 Digital Forensics Overview
  1.1 Introduction
  1.2 Phases in Digital Forensics Analysis
  1.3 Branches of Digital Forensics
    1.3.1 Computer forensics
    1.3.2 Mobile device forensics
    1.3.3 Network forensics
    1.3.4 Forensic data analysis
    1.3.5 Database forensics
  1.4 Challenges in Digital Forensics
    1.4.1 Large (big) data as a challenge
2 Literature survey
  2.1 Introduction
  2.2 Strategies and tools for fast processing
    2.2.1 Metadata analysis
    2.2.2 Super clustering using Dirim
    2.2.3 Using Jumplist to identify fraudulent documents
    2.2.4 OpenLV
    2.2.5 Hash set matching
    2.2.6 Indexing through piecewise hash signatures
    2.2.7 Indexing image hashes
    2.2.8 Methods for finding additional uninteresting files
3 Proposed Method
  3.1 Step 1: Finding drives of interest
  3.2 Step 2: Eliminating uninteresting files
  3.3 Experimental setup
  3.4 Results and discussions
4 Conclusion and Future Work
  4.1 Conclusion
  4.2 Future work
References


1 Digital Forensics Overview

1.1 Introduction

Digital forensics is the use of scientifically derived and proven methods for the preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of digital evidence derived from digital sources, for the purpose of facilitating the reconstruction of events found to be criminal. The technical aspect of an investigation is divided into several sub-branches relating to the type of digital devices involved: computer forensics, network forensics, forensic data analysis and mobile device forensics. The typical forensic process encompasses the seizure, forensic imaging (acquisition) and analysis of digital media, and the production of a report on the collected evidence. As well as identifying direct evidence of a crime, digital forensics can be used to attribute evidence to specific suspects, confirm alibis or statements, determine intent, identify sources (for example, in cases), or authenticate documents. Investigations are much broader in scope than other areas of forensic analysis (where the usual aim is to provide answers to a series of simpler questions), often involving complex time-lines or hypotheses.

1.2 Phases in Digital Forensics Analysis

Fig- 1 Phases In Digital Forensics

- Identification – This process includes the search, recognition and documentation of the physical devices on the scene potentially containing digital evidence.
- Collection – Devices identified in the previous phase are collected and transferred to an analysis facility.


- Acquisition – This process involves producing an image of a source of potential evidence, ideally identical to the original.
- Preservation – Evidence integrity, both physical and logical, must be ensured at all times.
- Analysis – Interpretation of the data from the acquired evidence. It usually depends on the context, aims or focus of the investigation and can range from malware analysis to image forensics and many other application-specific areas. At a higher level, analysis could include content analysis via, for instance, forensic linguistics or sentiment analysis techniques.
- Reporting – Communication and/or dissemination of the results of the digital investigation to the parties concerned.

1.3 Branches of Digital Forensics Digital forensics includes several sub-branches relating to the investigation of various types of devices, media or artifacts.

Fig- 2. Digital forensics - branches

1.3.1 Computer forensics

The goal of computer forensics is to explain the current state of a digital artifact, such as a computer system, storage medium or electronic document. The discipline usually covers computers, embedded systems (digital devices with rudimentary computing power and onboard memory) and static memory (such as USB pen drives).


Computer forensics can deal with a broad range of information, from logs (such as history) to the actual files on the drive.

1.3.2 Mobile device forensics

Mobile device forensics is a sub-branch of digital forensics relating to the recovery of digital evidence or data from a mobile device. Investigations usually focus on simple data such as call data and communications (SMS/email) rather than in-depth recovery of deleted data. Mobile devices are also useful for providing location information, either from inbuilt GPS/location tracking or via cell site logs, which track the devices within their range.

1.3.3 Network forensics

Network forensics is concerned with the monitoring and analysis of network traffic, both local and WAN/internet, for the purposes of information gathering, evidence collection, or intrusion detection. Traffic is usually intercepted at the packet level, and either stored for later analysis or filtered in real time. Unlike other areas of digital forensics, network data is often volatile and rarely logged, making the discipline often reactionary.

1.3.4 Forensic data analysis

Forensic data analysis is a branch of digital forensics. It examines structured data with the aim of discovering and analysing patterns of fraudulent activity resulting from financial crime.

1.3.5 Database forensics

Database forensics is a branch of digital forensics relating to the forensic study of databases and their metadata. Investigations use database contents, log files and in-RAM data to build a timeline or recover relevant information.

1.4 Challenges in Digital Forensics

In recent years, forensic computing has evolved into a recognized discipline, with certified practitioners and guidelines pertaining to the conduct of their activities. With the ubiquity of computer-based devices in everyday use, forensic techniques are increasingly being applied to a broad range of digital media and equipment, thus posing many challenges for experts as well as for those who make use of their skills. Some common challenges which are now a major concern in forensic investigations are:
- Increase in the number of devices per person
- Larger storage devices
- Increased turnaround time
- Case backlogs
- Pressure to accelerate reporting time
- Technology changes

- New apps
- Cloud forensics

1.4.1 Large (big) data as a challenge

Digital forensics investigation is facing new challenges because of the dramatic increase in storage size per computer or device and the substantial increase in the usage of solid-state removable media. The worldwide use of personal mobile devices such as smartphones and tablets, together with the increasing adoption of cloud services by individuals and businesses, has added more complexity to the forensic investigation process. The number of potential digital evidence sources has grown exponentially. Studies have shown that a forensic investigation may involve dealing with a wide range of distinct drive images and files collected from various computers and users. The increasing amount of such unstructured data is becoming a major challenge for forensic investigators. Detailed forensic analysis of all the drives and files is impractical, as most of the files in the corpus will not provide any forensically interesting information. Hence, in cyber forensics, fast processing of this vast amount of data has become an important concern.

1. Increasing amount of data per PC
2. Increasing amount of data per user

3. Increase in volume of data in servers
4. Increase in usage of smartphones and removable USB devices

Fig- 3. Large (big) data as a challenge for digital forensics


2 Literature survey

2.1 Introduction

Inspecting all the files on a drive takes a long time, so inspecting the directory metadata is a good, time-saving alternative. Metadata provides a sufficient description of, and statistics about, the files on a drive and is usually 1000 times smaller than the drive's contents [2]. Clues that a drive is of interest include signs of anti-forensic techniques used to hide data on the computer: erasing the contents of a drive, steganographically hidden messages, and rootkits used to hide certain processes.

To speed up subsequent processing, eliminating forensically uninteresting files is wiser than examining each and every file on the drive. This elimination process can significantly reduce the size of the corpus. Files whose contents do not provide useful information about the users of a drive can be labelled "uninteresting" [1]. These are usually operating-system files and application-software files that do not contain user-created information, together with common internet-document downloads that do not provide user-discriminating information [3].

Forensic investigators often use software tools that match files against white-lists and black-lists based on cryptographic hashes of types like MD5 and SHA-1. The National Software Reference Library (NSRL) provides a huge collection of such hashes. Forensic tools like SleuthKit can extract the directory metadata of a given drive image. We can eliminate files whose hash values match those in published sets [4]. This also has the benefit of detecting modified files, since their hash values differ [3]. However, the coverage of the published hash values is limited to known software and operating systems [5]; they do not provide hash values for files created dynamically. To confirm that a file is uninteresting we can open it and inspect it for user-created and user-discriminating data.
Moreover, uninteresting files occupy most of a drive's space, so eliminating them in the initial phase of the investigation significantly reduces the processing time. Unfortunately, a disk contains some software directories that do hold interesting user files, so finding the uninteresting ones is not always straightforward. This report discusses a comparison between existing approaches to detecting uninteresting files, proposes a method for improving performance, in particular by correlating files on drives and across drives in a corpus, and provides a baseline comparison for establishing that files are uninteresting.

2.2 Strategies and tools for fast processing

Forensic investigators suffer from an ever-increasing amount of unstructured data. Therefore, techniques have been developed to support quick identification of suspicious information. Some techniques supporting this work are presented below:


- Certain techniques like clustering related data, metadata analysis and hash set matching help in eliminating uninteresting files, thereby reducing the corpus size and the processing time.
- Since the volume of data involved in such investigations is so huge, it is essential to automate as much work as possible. For this purpose specialized automated tools were developed, e.g. EnCase, Autopsy, SleuthKit, Fiwalk and OpenLV.

A brief description of these tools and techniques is presented in the following sections.

2.2.1 Metadata analysis

Inspecting the drive metadata is a better alternative to inspecting all the files on a drive. The directory metadata of a computer drive contains the listing of the stored files and directories and their properties, which provides significant information about the user of the drive. Importantly, examining it requires much less time than examining file contents. The following clues can be sought in the metadata of a file system to detect possible attempts to hide something or prevent it from being known, and other anti-forensic activities: encrypted files and directories, suspicious file extensions, malicious software, and deletions of files [2].
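As an illustration, a minimal metadata sweep of a mounted drive might look like the following sketch. This is not the interface of any specific tool; the record fields are illustrative choices:

```python
# Sketch: collect directory metadata (paths, extensions, sizes, timestamps)
# for triage, instead of reading file contents. Field names are illustrative.
import os

def collect_metadata(root):
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable entries are skipped, never opened
            records.append({
                "path": path,
                "extension": os.path.splitext(name)[1].lower(),
                "size": st.st_size,
                "mtime": st.st_mtime,
            })
    return records
```

Each record is a small dictionary, so a whole drive's listing fits comfortably in memory even when the file contents would not.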

2.2.2 Super clustering using Dirim

Anomalies are found both by comparing overall drive statistics and by comparing clusters of related files, using a novel approach of "superclustering" of clusters. It is desirable to have ways to summarize drives quickly to determine whether a drive, and any particular files on it, are worth investigating. One aspect of interestingness is the degree to which files on a drive are "suspicious": they appear to be out of character with similar files on other drives, or appear to be concealing information. The contents of each file can be examined individually to find clues, but this takes time. Suspiciousness can be determined automatically from directory metadata (file names, extensions, paths, sizes, times, fragmentation, status flags and hash codes) that can suggest which file contents might be worth looking at.
- Metadata analysis won't catch sophisticated methods like steganography or putting data into slack space, but such things are rare in criminal investigations: drives are seized unexpectedly, and criminals don't have time to hide things in this manner.
- In a digital forensic investigation, anomalousness can be measured by comparing statistics over a set of drives and within a drive.
- Deceptiveness can be found in specific clues to concealment or misdirection.


The Dirim tool analyzes the file metadata for a set of disks and reports anomalies and suspiciousness. It is built in Python; it takes about a day to analyze 1467 drives on a modern 64-bit machine and about 10 minutes to analyze a new drive [6].
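The cross-drive comparison idea can be sketched as follows. This is a simplified stand-in for Dirim's statistics, not its actual algorithm: one per-drive statistic (say, the fraction of executable files) is compared against the corpus mean, and drives far from it are flagged. The 2-standard-deviation threshold is an illustrative choice:

```python
# Sketch of cross-drive anomaly scoring: a drive statistic is compared
# with the same statistic on other drives; values far from the corpus
# mean are flagged as anomalous.
from statistics import mean, stdev

def anomalous_drives(stats_by_drive, threshold=2.0):
    """stats_by_drive: {drive_id: numeric statistic}; returns flagged ids."""
    values = list(stats_by_drive.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all drives identical on this statistic
    return [drive for drive, v in stats_by_drive.items()
            if abs(v - mu) / sigma > threshold]
```

The same scoring can be repeated per file cluster rather than per drive, which is closer to the superclustering idea described above.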

2.2.3 Using Jumplist to identify fraudulent documents

The collection of electronic evidence involves its acquisition, investigation, analysis, and event reporting. The services of a computer forensic analyst may cost about $350 per hour; depending on the nature of the case, total costs can quickly reach $10,000 or more. Therefore, if a reliable forensic method is available with the potential for decreasing investigative costs, it should be a consideration in a digital investigation. JumpLister is a free software program that has the potential to reduce such investigative expenses [7].

2.2.4 OpenLV

OpenLV is a free, GPL-licensed, easy-to-use forensics application. It is a tool developed in Java that meets the need for faster initial triage, and it also provides an option to review the digital evidence at later stages of the investigation. The user of OpenLV loads the digital forensic images into a virtual platform and configures certain settings, such as the amount of memory, the system time, the operating system, and the locations of the input evidence and output virtualization files. To support this work OpenLV provides a straightforward GUI. Some settings, such as the size of the virtual hard disk, are inferred from the input evidence. Once configured, the user invokes the virtualization software and interacts with the evidence system. Since OpenLV works without modifying the evidence, its use in triage does not preclude subsequent in-depth forensic analysis. Unlike many popular forensics tools, OpenLV requires little training and facilitates an unprecedented level of interaction with the evidence [9].

2.2.5 Hash set matching

Matching files against black-lists and white-lists using software tools, based on cryptographic hashes of types like MD5 and SHA-1, can remove a significant number of uninteresting files. The NIST Information Technology Laboratory provides a huge reference collection of such hashes through the National Software Reference Library (NSRL). Commercial vendors like hashsets.com and Bit9.com also provide hashes. Forensic tools like SleuthKit (Autopsy) extract directory metadata from drive images. We can eliminate files whose hash values match those in published sets [1]. However, published hash values miss many kinds of software files [5], especially files created dynamically; the published hash sets provide hash values only for known software and operating-system files. The content distribution of files by extension for five hash sets (NSRL-RDS, Hashsets.com, Bit9.com, the Real Data Corpus, and RDC filtering) is presented in Table 1 below.

Table 1: Distribution of files by extension type for five hash sets. [1]

Types of extension | Real Data Corpus | NSRL RDS | Hashsets.com | Bit9.com | RDC filtering
None               | 10.56% | 13.78% | 9.62%  | 21.85% | 10.21%
Oper. system       | 3.74%  | 4.55%  | 1.53%  | 6.89%  | 0.00%
Graphics           | 16.23% | 13.64% | 13.86% | 13.03% | 13.14%
Camera images      | 3.14%  | 0.80%  | 2.36%  | 6.13%  | 22.11%
Temporaries        | 0.08%  | 0.02%  | 0.06%  | 2.20%  | 4.25%
Web                | 8.25%  | 8.83%  | 17.56% | 4.45%  | 6.82%
Misc. documents    | 1.71%  | 1.74%  | 1.46%  | 2.00%  | 4.69%
MS Word            | 0.17%  | 0.03%  | 0.16%  | 0.71%  | 2.98%
Presentations      | 0.26%  | 0.02%  | 0.07%  | 0.13%  | 0.51%
Database           | 0.29%  | 0.18%  | 0.21%  | 0.73%  | 1.04%
Other MS Office    | 0.09%  | 0.11%  | 0.05%  | 0.21%  | 0.15%
Spreadsheets       | 0.43%  | 0.38%  | 0.14%  | 0.46%  | 1.60%
Email              | 0.11%  | 0.03%  | 0.09%  | 0.12%  | 0.33%
Links              | 0.01%  | 0.04%  | 0.05%  | 1.08%  | 2.00%
Compressed         | 1.33%  | 7.05%  | 2.22%  | 0.65%  | 1.23%
Help               | 0.94%  | 0.28%  | 0.51%  | 1.01%  | 0.00%
Audio              | 1.47%  | 0.38%  | 0.71%  | 3.21%  | 4.42%
Video              | 0.20%  | 0.04%  | 0.16%  | 0.35%  | 0.79%
Program source     | 7.16%  | 11.44% | 8.98%  | 2.20%  | 4.11%
Executables        | 18.70% | 14.51% | 18.59% | 12.90% | 0.00%
Disk images        | 0.78%  | 1.87%  | 1.40%  | 1.15%  | 0.52%
XML                | 0.94%  | 2.17%  | 1.24%  | 1.00%  | 0.61%
Logs               | 0.04%  | 0.05%  | 0.06%  | 0.76%  | 2.29%
Copies             | 0.09%  | 0.04%  | 0.28%  | 0.40%  | 0.33%
Integers           | 1.03%  | 1.80%  | 0.83%  | 2.17%  | 4.59%
Configuration      | 5.32%  | 5.10%  | 3.66%  | 5.14%  | 2.35%
Update             | 0.06%  | 0.01%  | 0.07%  | 0.16%  | 0.00%
Virtual machine    | 0.12%  | 0.04%  | 0.09%  | 0.08%  | 0.08%
Multipurpose       | 2.79%  | 3.76%  | 3.48%  | 2.46%  | 5.64%
Miscellaneous      | 2.39%  | 5.32%  | 7.55%  | 2.41%  | 3.14%
2.2.6 Indexing through piecewise hash signatures

The general strategy is to divide a file into pieces and to produce a piece hash for each piece. The piecewise hash signature (PHS) of a file is the concatenation of all its piece hashes. In this case a small change in the file affects only a small portion of the PHS. Piecewise hashing is block hashing in which all pieces are consecutive blocks of a fixed size. A variation of piecewise hashing uses pieces of fixed size but triggers the starting points of pieces based on the content of the file; hence, pieces may overlap and there may be gaps between pieces.

Indexing strategy: the key idea is to build an index over the n-grams contained in the PHSs instead of an index of the PHSs themselves. For each n-gram in a query, the index provides a list of all hashes which contain the same n-gram. Such hashes are suitable candidates for neighbours, because having common n-grams correlates with being neighbours [8].
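A toy version of this idea, assuming fixed-size blocks and a truncated MD5 as the piece hash (both illustrative choices, not the parameters used by F2S2), can be sketched as:

```python
# Sketch of PHS n-gram indexing: the PHS is the sequence of per-block
# hashes, and an inverted index maps each n-gram of piece hashes to the
# files containing it.
import hashlib
from collections import defaultdict

def piecewise_signature(data, block_size=64):
    """PHS as a list of truncated per-block MD5 digests."""
    return [hashlib.md5(data[i:i + block_size]).hexdigest()[:8]
            for i in range(0, len(data), block_size)]

def build_index(signatures, n=2):
    """signatures: {file_id: PHS list}; returns {n-gram tuple: set of ids}."""
    index = defaultdict(set)
    for file_id, sig in signatures.items():
        for i in range(len(sig) - n + 1):
            index[tuple(sig[i:i + n])].add(file_id)
    return index

def candidate_neighbours(index, query_sig, n=2):
    """All indexed files sharing at least one PHS n-gram with the query."""
    hits = set()
    for i in range(len(query_sig) - n + 1):
        hits |= index.get(tuple(query_sig[i:i + n]), set())
    return hits
```

A file with one modified block still shares most of its PHS n-grams with the original, so the original is returned as a candidate neighbour.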

2.2.7 Indexing image hashes

Similarity-preserving hashes can provide a means to recognize known content and modified versions of known content. However, to support similarity search, efficient indexing strategies are essential. [10] provides two indexing strategies for robust image hashes created by a tool called ForBild.

The first strategy uses a vantage point tree (VP-tree), constructed in a top-down manner. After choosing a vantage point for the root node, the distance between each data point and the vantage point is calculated. Each child of the root node receives a subset of the data points corresponding to a certain range of distance values. This procedure is applied recursively to the children until a cancel criterion is reached, e.g. a desired depth or at most one data point per node. The processing of similarity queries against the tree exploits the triangle inequality: if the query point has distance d1 from a vantage point and a data point has distance d2 from the same vantage point, then the distance between the query point and the data point is at least |d1 - d2|.

The second strategy uses locality-sensitive hashing (LSH). An LSH scheme can be used to estimate the similarity of two items by randomly picking a certain number of hash functions from the LSH family and evaluating these functions on the two items. The approximate similarity score is the relative frequency of hash matches. This is useful if the evaluation of the actual similarity function is significantly more expensive than the evaluation of several hash functions. Moreover, a similarity hash can be constructed by concatenating the hash values produced by the selected hash functions.
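For bit-vector hashes compared under Hamming distance, a classic LSH family is "read one randomly chosen bit position". The following sketch (not ForBild's actual implementation) estimates similarity as the relative frequency of matches over a fixed random sample of positions:

```python
# Sketch of the LSH similarity estimate: each randomly chosen bit
# position is one LSH function for Hamming distance; the fraction of
# sampled positions on which two hashes agree approximates similarity.
import random

def lsh_similarity(hash_a, hash_b, num_functions=64, seed=42):
    """hash_a, hash_b: equal-length bit strings such as '010110...'."""
    rng = random.Random(seed)  # fixed seed makes the scheme repeatable
    positions = [rng.randrange(len(hash_a)) for _ in range(num_functions)]
    matches = sum(hash_a[p] == hash_b[p] for p in positions)
    return matches / num_functions
```

Concatenating the sampled bits of one hash yields the "similarity hash" mentioned above, since comparing two such concatenations position by position gives the same match count.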

2.2.8 Methods for finding additional uninteresting files

This work is very close to ours. Published black and white lists do not provide hash values for dynamically created files. [1] provides methods to find additional uninteresting files beyond the coverage of published hash sets:
- Frequent hash values: files in different directories having the same hash value computed on their contents. Such files could be uninteresting because the same file may have been copied into different directories.
- Frequent paths: files with the same full path occurring on different drives. Frequently occurring paths may be due to mass distribution and are unlikely to be forensically interesting.
- Clustered creation times: files created at the same time or within a short period. This suggests automated copying, so such files are unlikely to be forensically interesting.
- Unusually frequent file sizes: files with the same size and extension. This suggests a certain kind of log record, which again is not forensically interesting.
- Uninteresting extensions: files in white-listed categories such as operating-system files, application-software files, database files, executables, disk images, XML, etc. Files with no extension or with more than one extension are excluded from this category, as they may hold potentially suspicious content.

Limitations: running the above tests eliminates uninteresting files beyond published hash sets, but since these methods are executed on the entire corpus, time is spent examining files on drives that do not record any suspicious activity. Hence, eliminating non-suspicious drives in the first phase will significantly reduce the size of the corpus and save computation time.
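The frequency-based methods above all reduce to counting one metadata field across the corpus and flagging values that recur. A minimal sketch, assuming each file is a dictionary with illustrative `md5` and `rel_path` fields and using an illustrative threshold of 3 occurrences:

```python
# Sketch of the "frequent hash values" and "frequent paths" methods:
# files whose content hash or whose path recurs across the corpus are
# marked uninteresting.
from collections import Counter

def frequent_values(records, key, min_count=3):
    """Values of `key` occurring at least min_count times in the corpus."""
    counts = Counter(r[key] for r in records)
    return {v for v, c in counts.items() if c >= min_count}

def mark_uninteresting(records, min_count=3):
    freq_hashes = frequent_values(records, "md5", min_count)
    freq_paths = frequent_values(records, "rel_path", min_count)
    return [r for r in records
            if r["md5"] in freq_hashes or r["rel_path"] in freq_paths]
```

The "unusually frequent file sizes" method is the same counting pattern applied to the (size, extension) pair instead of the hash or path.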


3 Proposed Method

Simply eliminating files by hash set matching against published sets won't give the best results, since both the NSRL hashes and commercial hashes like those from Bit9.com and hashsets.com do not cover dynamically created files. Hence, we use five additional methods to eliminate further uninteresting files. We propose the following method for faster processing of large (big) forensics data: first eliminating non-suspicious drives, then eliminating additional uninteresting files beyond the coverage of published hash databases.

3.1 Step 1: Finding drives of interest

Searching for interesting drives in phase 1 saves much time, since the corpus contains thousands of drives but millions of files. This can be done by examining the drive metadata: examining the metadata requires roughly 1000 times less effort than examining all the files on the drive, and it tells us whether changes have been observed in any file on the drive. If we find any user-created activity on a drive, we can label that drive a drive of interest. In our experiment we found that drive C had no forensically interesting files, which was detected by hash set matching. In such cases we can save time by checking for suspicious activity on a drive through analysis of the drive metadata: once we discover that the drive has not been part of any anti-forensic activity, we can skip analyzing its files.

3.2 Step 2: Eliminating uninteresting files

After examining the corpus to find drives with anti-forensic content, the next step is to analyze the files on the interesting drives. Again, examining each and every file makes little sense, as most of the files on a drive will not provide any user-created or user-discriminating information. To eliminate such uninteresting files, [1] suggested a protocol:
- Run the frequent hashes, frequent paths, and frequent sizes methods to eliminate uninteresting files in the corpus.
- Eliminate the files whose hash values match the available uninteresting lists, e.g. the NSRL-RDS, hashsets.com and Bit9.com hash databases.
- On the remaining files, run the clustered creation times, contextually uninteresting files and unknown extensions methods to further eliminate uninteresting files.
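Of the methods in this protocol, clustered creation times is the only one that is not a simple frequency count. A minimal sketch, with an illustrative 60-second window and 5-file minimum cluster size:

```python
# Sketch of the clustered-creation-time check: files whose creation
# timestamps fall into the same short window were probably installed by
# automated copying and can be treated as uninteresting.
def clustered_files(timestamps, window=60, min_cluster=5):
    """timestamps: {path: creation time in seconds}; returns clustered paths."""
    ordered = sorted(timestamps.items(), key=lambda kv: kv[1])
    clustered, current = [], []
    for path, t in ordered:
        # a gap larger than `window` closes the current cluster
        if current and t - current[-1][1] > window:
            if len(current) >= min_cluster:
                clustered.extend(p for p, _ in current)
            current = []
        current.append((path, t))
    if len(current) >= min_cluster:
        clustered.extend(p for p, _ in current)
    return clustered
```

Small clusters are left alone, since a user saving a handful of documents in quick succession should not be discarded as automated copying.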

3.3 Experimental setup

Our experiment consists of two parts: first, the extraction of metadata and hash values using Autopsy, and the elimination of uninteresting files by matching the hash values against NSRL-RDS and hashsets.com hash values; second, the execution of our proposed methods to eliminate additional uninteresting files. To perform these tests we collected data from Windows and Linux systems. We used the NSRL-RDS 2.44 hash database released in March 2014, the November 2012 version of the known-malware hash datasets from hashsets.com, and known hash sets representing more than 253,00 notable hash values derived from internet file-sharing activities (executables, binaries, documents, compressed files, etc.) from the January 2011 version of the hash sets released at hashsets.com. Autopsy was used to extract metadata and MD5 hash values for the files in the given input image file (data source).

Table 1: Hash set sources used in our experiment

                               | NSRL-RDS, March 2014 | Hashsets.com, November 2012 | Bit9.com, April 2013
Number of entries              | 95,909,483           | 17,774,612                  | 321,847
Number of distinct hash values | 29,311,204           | 6,464,209                   | 321,847
Fraction distinct              | 0.306                | 0.364                       | 1.0

3.4 Results and discussions

We collected data sets of 7 GB, 14 GB and 50 GB each from both Windows and Linux systems, and the tests were conducted on each data set separately to identify uninteresting files. We used the Autopsy tool to generate metadata and MD5 hash values on a Windows Server 2012 R2 64-bit operating system with Intel Xeon processors (2.60 GHz, 2 processors) and 128 GB RAM. The properties and descriptions of the data sources collected from different PCs are presented in the tables below.

Table 2. Description of the data sets given as input to Autopsy (Test-1: columns A, B; Test-2: columns C, D; Test-3: columns E, F)

                                                        | Windows (A) | Linux (B) | Windows (C) | Linux (D) | Windows (E) | Linux (F)
Data source size                                        | 7 GB        | 7 GB      | 14 GB       | 14 GB     | 50 GB       | 50 GB
Total no. of folders                                    | 14          | 15        | 5           | 18        | 25          | 31
Total no. of subfolders                                 | 111         | 601       | 352         | 834       | 967         | 1210
Total no. of files analyzed                             | 70497       | 372719    | 133584      | 489099    | 441313      | 1017254
Time taken for analysis                                 | 3 min       | 6 min     | 4 min       | 6 min     | 34 min      | 48 min
No. of suspicious files identified by hash set matching | 303         | 4980      | 0           | 18733     | 12750       | 54128

Table 3. After applying the methods of eliminating additional uninteresting files to the 7 GB data sources (files before elimination: Windows 70497, Linux 372719)

Method                    | Windows: files removed | Windows: files left | Linux: files removed | Linux: files left
Frequent hash values      | 22557                  | 47922               | 59173                | 313454
Files with same full path | 89                     | 47833               | 47                   | 313498
Frequent sizes            | 37675                  | 10158               | 269955               | 43590
Clustered creation times  | 9231                   | 927                 | 39638                | 3952
Uninteresting extensions  | 927                    | 0                   | 2752                 | 1200

Table 4. After applying the methods of eliminating additional uninteresting files to the 14 GB data sources (files before elimination: Windows 133584, Linux 489099)

Method                    | Windows: files removed | Windows: files left | Linux: files removed | Linux: files left
Frequent hash values      | 14024                  | 119559              | 97900                | 391198
Files with same full path | 4                      | 119555              | 0                    | 391198
Frequent sizes            | 99261                  | 20294               | 339637               | 51561
Clustered creation times  | 20294                  | 0                   | 47362                | 4199
Uninteresting extensions  | -                      | -                   | 3822                 | 337

Table 5. After applying the methods of eliminating additional uninteresting files to the 50 GB data sources (files before elimination: Windows 441313, Linux 1017254)

Method                    | Windows: files removed | Windows: files left | Linux: files removed | Linux: files left
Frequent hash values      | 14024                  | 119559              | 317564               | 699690
Files with same full path | 4                      | 119555              | 321                  | 699369
Frequent sizes            | 99261                  | 20294               | 456210               | 243159
Clustered creation times  | 17239                  | 3055                | 213425               | 29734
Uninteresting extensions  | 2344                   | 711                 | 24232                | 5502

Hence, the above results show that, using the methods frequent hash values, frequent paths, frequent sizes, clustered creation times and uninteresting extensions, we can eliminate files beyond the NSRL-RDS and hashsets.com coverage. We combined the final results obtained from the different data sets to summarize the efficiency of our method on Windows and Linux systems. Using the above methods we could eliminate an additional 2.37% of the files in the data sets from Windows systems and 3.40% of the files in the data sets from Linux systems. The comparative results obtained in our experiment are summarized in the table below.

Table 6. Combined report of the percentage of files eliminated by hash set matching against the additional five methods

File system      | Total no. of files analyzed | % of files eliminated by Autopsy | % of files eliminated by additional methods | Difference (in percentage)
Windows (71 GB)  | 645394                      | 97.41                            | 99.78                                       | 2.37
Linux (71 GB)    | 1879072                     | 96.23                            | 99.63                                       | 3.40


4 Conclusion and Future Work

4.1 Conclusion

Our experiment has shown that we can eliminate files beyond the NSRL hash database and the hashsets.com hash sets using certain additional methods. In the forensic investigation of a corpus containing thousands of drives, certain clues can distinguish a suspicious drive from an uninteresting drive by examination of its metadata alone. We can also look for general evidence of concealment representing targets of interest. The uninterestingness of a file is usually based on whether the file contains user-created or user-discriminating information. We can use relatively simple methods to eliminate considerable numbers of uninteresting files beyond the coverage of published black-list and white-list hash databases. The proposed strategy will save a significant amount of time and attention in the investigation process by avoiding detailed analysis of forensically uninteresting drives and files.

4.2 Future work

Future work will involve executing this approach on an international corpus with a larger number of drives collected from different users, which will test the validity of this approach in real-world forensic investigations. We also envisage publishing the hash values of the additional uninteresting files, which will be useful to forensic investigators, as they will then be able to eliminate these files using hash set matching with the support of available forensic software tools.


References

1. Rowe, N. C.: Identifying forensically uninteresting files in a large corpus. 5th International Conference on Digital Forensics and Computer Crime, Moscow, Russia (2013)
2. Rowe, N., Garfinkel, S.: Finding anomalous and suspicious files from directory metadata on a large corpus. 3rd International ICST Conference on Digital Forensics and Cyber Crime, Dublin, Ireland, October 2011
3. Pennington, A., Linwood, J., Bucy, J., Strunk, J., Ganger, G.: Storage-Based Intrusion Detection. ACM Transactions on Information and System Security, Vol. 13, No. 4, 30 (2010)
4. Kornblum, J.: Auditing Hash Sets: Lessons Learned from Jurassic Park. Journal of Digital Forensic Practice, Vol. 2, No. 3, 108-112 (2008)
5. Rowe, N.: Testing the National Software Reference Library. Digital Investigation, Vol. 9S (Proc. Digital Forensics Research Workshop 2012, Washington, DC, August), pp. S131-S138 (2012)
6. Rowe, N. C., Garfinkel, S. L.: Finding suspicious activity on computer systems. 11th European Conference on Information Warfare and Security, Laval, France (2012)
7. Smith, G. S.: Using jump lists to identify fraudulent documents. Digital Investigation 9 (2013)
8. Winter, C., Schneider, M., Yannikos, Y.: F2S2: Fast forensic similarity search through indexing piecewise hash signatures. Digital Investigation 10 (2013)
9. Vidas, T., Kaplan, B., Geiger, M.: OpenLV: Empowering investigators and first-responders in the digital forensics process. Digital Investigation 11 (2014)
10. Winter, C., Steinebach, M., Yannikos, Y.: Fast indexing strategies for robust image hashes. Digital Investigation 11 (2014)
