SADFE 2015
Proceedings of the 10th International Conference on Systematic Approaches to Digital Forensic Engineering
Editors:

Carsten Rudolph, Monash University, Melbourne, Victoria, Australia
Nicolai Kuntze, Huawei European Research Center, Frankfurt am Main, Germany
Barbara Endicott-Popovsky, University of Washington, Seattle, WA, USA
Antonio Maña, University of Malaga, Malaga, Spain
Proceedings of the 10th International Conference on Systematic Approaches to Digital Forensic Engineering (SADFE 2015)
ISBN: 978-84-608-2068-0 Safe Society Labs (Spain)
© Copyright remains with authors of each publication. Authors retain the right to reproduce, distribute, display, adapt and perform their own work for any purpose. The proceedings of the SADFE 2015 conference are published by Safe Society Labs as open access, and licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Typeset & Cover Design: Hristo Koshutanski (Safe Society Labs)
1 http://creativecommons.org/licenses/by-nc/4.0/
Preface
This volume constitutes the proceedings of the 10th International Conference on Systematic Approaches to Digital Forensic Engineering (SADFE 2015). Over the years, SADFE has been a venue that established new interdisciplinary relations and connections, and it has been the source of new initiatives and collaborations. One example of such an activity was the 2014 Dagstuhl Seminar "Digital Evidence and Forensic Readiness", with participants from 4 continents.

This year, the SADFE steering committee took two risks. Most importantly, this is the first SADFE since 2007 that is not co-located with another event. Second, it is the first SADFE in Europe, highlighting the necessity of international co-operation in the area of digital forensics. Nevertheless, SADFE will continue to have the character of a workshop: a single track, so that all participants share the same information, with sufficient time and space for interaction and discussions.

In response to the 2015 SADFE call for papers, 39 submissions from 16 different countries on 5 continents were received and reviewed. Of the papers submitted, 18 were accepted for presentation at the conference; of those, 12 were selected for publication in the Journal of Digital Forensics, Security and Law (http://www.jdfsl.org). The program also included keynote talks by Michael M. Losavio on "Smart Cities, Digital Forensics and Issues of Foundation and Ethics" and by Klaus Walker on "The careless application of digital evidence in German criminal proceedings". In addition, a panel on the topic of "Digital Forensics: Future Challenges for Security Forces and Government Agencies" was held with the participation of representatives from law enforcement agencies from around the world, including the Netherlands, the UK, the United Arab Emirates and Spain.

Many people contributed to the organisation and preparation of this conference, including the program committee and the SADFE steering committee.
A special thanks goes to the host and General Chair Antonio Maña. He took care of countless tasks including the overall organisation of the conference, the SADFE 2015 website, publication and proceedings, venue, social events, final program, and many others. SADFE 2015 would have been impossible without his commitment and experience. Last, but certainly not least, thanks go to all the authors who submitted papers and all the attendees. We hope this year's program will once again stimulate exchange and discussions beyond the conference, and we look forward to the next 10 years of SADFE.
September 2015 Carsten Rudolph, Nicolai Kuntze, Barbara Endicott-Popovsky
Program Co-chairs SADFE 2015
Organization
Steering Committee:
Deborah Frincke (Co-Chair), Department of Defense, USA
Ming-Yuh Huang (Co-Chair), Northwest Security Institute, USA
Michael Losavio, University of Louisville, USA
Alec Yasinsac, University of South Alabama, USA
Robert F. Erbacher, Army Research Laboratory, USA
Wenke Lee, Georgia Institute of Technology, USA
Barbara Endicott-Popovsky, University of Washington, USA
Roy Campbell, University of Illinois, Urbana-Champaign, USA
Yong Guan, Iowa State University, USA
General Chair: Antonio Maña, University of Malaga, Spain
Program Committee Co-Chairs:
Carsten Rudolph, Huawei European Research Center, Germany
Nicolai Kuntze, Huawei European Research Center, Germany
Barbara Endicott-Popovsky, University of Washington, USA
Publication Chair: Ibrahim Baggili, University of New Haven, USA
Publicity Chair Europe: Joe Cannataci, University of Malta, Malta
Publicity Chair North America: Dave Dampier, Mississippi State University, USA
Publicity Chair Asia: Ricci Ieong, University of Hong Kong, Hong Kong
Program Committee
Sudhir Aggarwal, Florida State University, USA
Galina Borisevitch, Perm State University, Russia
Frank Breitinger, University of New Haven, USA
Joseph Cannataci, University of Groningen, Netherlands
Long Chen, Chongqing University of Posts and Telecommunications, China
Raymond Choo, University of South Australia, Australia
K.P. Chow, University of Hong Kong, Hong Kong
David Dampier, Mississippi State University, USA
Hervé Debar, France Telecom R&D, France
Barbara Endicott-Popovsky, University of Washington, USA
Robert Erbacher, Northwest Security Institute, USA
Xinwen Fu, UMass Lowell, USA
Simson Garfinkel, Naval Postgraduate School, USA
Brad Glisson, University of Glasgow, UK
Lambert Großkopf, Universität Bremen, Germany
Yong Guan, Iowa State University, USA
Barbara Guttman, National Institute of Standards and Technology, USA
Brian Hay, University of Alaska Fairbanks, USA
Jeremy John, British Library, UK
Ping Ji, John Jay College of Criminal Justice, USA
Andrina Y.L. Lin, Ministry of Justice Investigation Bureau, Taiwan
Pinxin Liu, Renmin University of China Law School, China
Michael Losavio, University of Louisville, USA
David Manz, Pacific Northwest National Laboratory, USA
Nasir Memon, Polytechnic Institute of New York University, USA
Mariofanna Milanova, University of Arkansas at Little Rock, USA
Carsten Momsen, Leibniz Universität Hannover, Germany
Kara Nance, University of Alaska Fairbanks, USA
Ming Ouyang, University of Louisville, USA
Gilbert Peterson, Air Force Institute of Technology, USA
Slim Rekhis, University of Carthage, Tunisia
Golden Richard, University of New Orleans, USA
Corinne Rogers, University of British Columbia, Canada
Ahmed Salem, Hood College, USA
Viola Schmid, Technische Universität Darmstadt, Germany
Clay Shields, Georgetown University, USA
Vrizlynn Thing, Institute for Infocomm Research, Singapore
Sean Thorpe, Faculty of Engineering and Computing, University of Technology, Jamaica
William (Bill) Underwood, Georgia Institute of Technology, USA
Wietse Venema, IBM T.J. Watson Research Center, USA
Hein Venter, University of Pretoria, South Africa
Xinyuan (Frank) Wang, George Mason University, USA
Kam Woods, University of North Carolina, USA
Yang Xiang, Deakin University, Australia
Fei Xu, Institute of Information Engineering, Chinese Academy of Sciences, China
Alec Yasinsac, University of South Alabama, USA
S.M. Yiu, University of Hong Kong, Hong Kong
Wei Yu, Towson University, USA
Nan Zhang, George Washington University, USA
Sponsoring Institutions
Safe Society Labs, S.L. http://www.safesocietylabs.com/
The University of Malaga http://www.uma.es
Journal of Digital Forensics, Security and Law http://www.jdfsl.org
Table of Contents
UFORIA - A Flexible Visualisation Platform for Digital Forensics and E-Discovery ..... 8
Arnim Eijkhoudt, Sijmen Vos, Adrie Stander

Dynamic Extraction of Data Types in Android's Dalvik Virtual Machine ..... 13
Paulo R. Nunes de Souza, Pavel Gladyshev

Chip-off by Matter Subtraction: Frigida Via ..... 19
David Billard, Paul Vidonne

The EVIDENCE Project: Bridging the Gap in the Exchange of Digital Evidence Across Europe ..... 25
Maria Angela Biasiotti, Mattia Epifani, Fabrizio Turchi

A Collision Attack on Sdhash Similarity Hashing ..... 36
Donghoon Chang, Somitra Kr. Sanadhya, Monika Singh, Robin Verma

An empirical study on current models for reasoning about digital evidence ..... 47
Stefan Nagy, Imani Palmer, Sathya Chandran Sundaramurthy, Xinming Ou, Roy Campbell

Data Extraction on MTK-based Android Mobile Phone Forensics ..... 54
Joe Kong

Open Forensic Devices ..... 55
Lee Tobin, Pavel Gladyshev

A study on Adjacency Measures for Reassembling Text Files ..... 56
Alperen Şahin, Hüsrev T. Sencar

An integrated Audio Forensic Framework for Instant Message Investigation ..... 57
Yanbin Tang, Zheng Tan, K.P. Chow, S.M. Yiu

Project Maelstrom: Forensic Analysis of the Bittorrent-powered Browser ..... 58
Jason Farina, M-Tahar Kechadi, Mark Scanlon

Factors Influencing Digital Forensic Investigations: Empirical Evaluation of 12 Years of Dubai Police Cases ..... 59
Ibtesam Al Awadhi, Janet C Read, Andrew Marrington, Virginia N. L. Franqueira

PLC Forensics based on CONTROL Program Logic Change Detection ..... 60
Ken Yau, Kam-Pui Chow

Forensic Acquisition of IMVU: A Case Study ..... 61
Robert van Voorst, M-Tahar Kechadi, Nhien-An Le-Khac

Cyber Black Box/Event Data Recorder: Legal and Ethical Perspectives and Challenges with Digital Forensics ..... 62
Michael Losavio, Pavel Pastukov, Svetlana Polyakova

Tracking and Taxonomy of Cyberlocker Link Sharers based on Behavior Analysis ..... 63
Xiao-Xi Fan, Kam-Pui Chow

Exploring the Use of PLC Debugging Tools for Digital Forensic Investigations on SCADA Systems ..... 64
Tina Wu, Jason R.C. Nurse

The Use of Ontologies in Forensic Analysis of Smartphone Content ..... 65
Mohammed Alzaabi, Thomas Martin, Kamal Taha, Andy Jones
Proceedings of 10th Intl. Conference on Systematic Approaches to Digital Forensic Engineering
UFORIA - A FLEXIBLE VISUALISATION PLATFORM FOR DIGITAL FORENSICS AND E-DISCOVERY Arnim Eijkhoudt & Sijmen Vos Amsterdam University of Applied Sciences Amsterdam, The Netherlands [email protected], [email protected]
Adrie Stander University of Cape Town Cape Town, South Africa [email protected]
ABSTRACT

With the current growth of data in digital investigations, one solution for forensic investigators is to visualise the data for the detection of suspicious activity. However, this process can be complex and difficult to achieve, as there are few tools available that are both simple and able to handle a wide variety of data types. This paper describes the development of a flexible platform, capable of visualising many different types of related data. The platform's back and front end can efficiently deal with large datasets and support a wide range of MIME types that can be easily extended. The paper also describes the development of the visualisation front end, which offers flexible, easily understandable visualisations of many different kinds of data and data relationships.
Keywords: cyber-forensics, e-discovery, visualisation, cyber-security, computer forensics, digital forensics, big data, data mining

1. INTRODUCTION

With the growth of data that can be encountered in digital investigations, it has become difficult for investigators to analyse the data in the time available for an investigation. As stated by Teerlink & Erbacher (2006), "A great deal of time is wasted by analysts trying to interpret massive amounts of data that isn’t correlated or meaningful without high levels of patience and tolerance for error". Data visualisation might help to solve this problem, as the human brain is much faster at interpreting images than textual descriptions. The brain can also examine graphics in parallel, where it can only process text serially (Teerlink & Erbacher, 2006).

According to Garfinkel (2010), existing tools use the standard WIMP model (Window, Icon, Menu, Pointing device). This model is poorly suited to representing large amounts of forensic data in an efficient and intuitive way. Research must improve forensic tools to integrate visualisation with automated analysis, allowing investigators to interactively guide their investigations (Garfinkel, 2010).

Many computer forensic tools are not ideally suited for identifying correlations among data, or for finding and visually presenting groups of facts that were previously unknown or unnoticed. These limitations of digital forensic tools are similar to the forensic analysis of logs in network forensics. For example, logs residing in routers, webservers and web proxies are often manually examined, which is a time-consuming and error-prone process (Fei, 2007). Similar considerations apply to E-mail analysis as well.

Another issue with current tools is that they do not always scale well and will likely have problems dealing with the growth of data in digital investigations (Osborne, Turnbull, & Slay, 2010). Currently, there are few affordable tools suited to
and available for these use-cases or situations. Additionally, the available tools tend to be complex, requiring extensive training and configuration in order to be used efficiently.

Investigative data visualisation is used to assist viewers with little to no understanding of the subject matter in reconstructing a crime or item and understanding what is being presented, for example an investigator who is not familiar with a particular scenario. On the other hand, analysis visualisations can be used to review data and to assess competing scenario hypotheses by investigators who do have an understanding of the subject matter (Schofield & Fowle, 2013).

A timeline is a valuable form of visualisation, as it greatly assists a digital forensic investigator in proving or disproving a hypothetical model proposed for the investigation. A timeline can also provide support for the mandate the digital forensic investigator received prior to commencing the investigation (Ieong, 2006). Interaction between role players can normally also be shown in network diagrams, so that the combination of a timeline and network diagram can generally answer many who and when questions.

The aspects of what and where can often be answered by examining the contents of evidence items, such as E-mails or the positional data of mobile phone calls. It is therefore important to be able to display the details of data with ease as well.

This paper describes the development of a flexible platform, Uforia (Universal Forensic Indexer and Analyser), that can be used to visualise many different types of data and data relations in an easy and fast way. The platform consists of two sections, a back end and a front end, and is based on readily available open source technologies. The back end is used to pre-process the data in order to speed up the indexing and visualisation process handled by the front end. The resulting product is a simple and extremely flexible tool, which can be used for many types of data with little or no configuration. Very little training is needed to use Uforia, making it accessible and usable for forensic investigators without a background in digital investigations or systems, such as auditors.

2. ADVANTAGES

Uforia offers many advantages, of which the first is very low cost.

A second advantage is that the system scales well due to its use of multiprocessing and distributed technologies such as ElasticSearch, so that extremely large numbers of artefacts can be handled in a very short time. The processing of the Enron set, consisting of roughly 500 000 E-mails without attachments, typically takes less than ten minutes to complete on contemporary consumer-grade hardware. This pre-processing step also ensures that little to no processing needs to be done at the time of visualisation.

Thirdly, Uforia's development heavily focused on making it as user- and developer-friendly as possible. Many forensic tools need a substantial amount of training and configuration to accomplish meaningful tasks. As this makes such systems difficult and expensive to use and develop for, it was considered paramount during Uforia's continued development to address these issues. Although a full UX study has not been conducted yet, the UI and feature set were developed using mock-ups and feedback from UX and graphical designers, as well as potential users from several fields of expertise, such as process, compliance and risk auditors, forensic investigators and law enforcement officers, where none of the participants were given prior usage instructions.

Another advantage is the extreme flexibility of the system. It is very easy to add new modules, e.g. for handling new MIME types, as the programming of such a module can normally be accomplished in a very short time using simple Python programming. Additionally, the front end is completely web based, and no special software needs to be installed to use it. This, combined with the following of common web design and UX standards, suggests that even novice users can achieve meaningful results with little to no training.

3. BACK END

3.1 START-UP PHASE

Uforia's back end is used to process the files containing the data that will eventually be indexed and used in the visualisation process.
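The start-up phase described above scans a configurable module directory and maps each supported MIME type to its handler modules. A minimal sketch of such a discovery step is shown below; the `MIME_TYPES` convention and the function name are assumptions made for illustration, not Uforia's actual module layout.

```python
import importlib.util
import pathlib

def discover_modules(module_dir):
    """Scan a directory of handler modules and build a MIME type -> handlers map.

    Each module is assumed to declare a MIME_TYPES list naming the types it
    can handle (a convention invented for this sketch).
    """
    handlers = {}
    for path in sorted(pathlib.Path(module_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)      # load the handler module from disk
        for mime in getattr(module, "MIME_TYPES", []):
            # zero, one or more handler modules may serve the same MIME type
            handlers.setdefault(mime, []).append(module)
    return handlers
```

A design like this keeps the core engine ignorant of individual file formats: dropping a new `.py` file into the directory is enough to register support for a new MIME type.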
The back end's first step is to create a MySQL table for the files. This table contains all metadata common to any digital file, as well as calculated metadata (such as NIST hashes).

A second database table is then generated; it contains information about the supported MIME types. This table is built by looking at a configurable directory containing the modules for the MIME types that can be handled by the system. Every module that can handle a specific MIME type is identified and added to this table. The table eventually contains zero, one or more 1:n key/value pairs for each of the supported MIME types and their respective module handlers. The module handlers are themselves stored as key/value pairs, with their original names as keys to the matching unique table names.

These tables are then created for each module, so that Uforia can store the returned, processed data from each particular module in its unique table.

Modules are self-contained files and extremely easy to develop. They only require the structure of their database table to be stored as a simple Python comment line in the particular module, starting with # TABLE: …, and a predefined process function which should return the array of the data to be stored.

3.2 PROCESSING

Once all tables are created, the processing of the files that need to be analysed can start.

The first step is to build a list of the files involved. This is read from the config file. Once this list is completed, every file in the list is processed.

The MIME type of each file is determined and then the relevant processing modules (0, 1 ... n) are called to process the file. The results returned by each module are then stored in the database table that was generated earlier for that particular module.

When Uforia encounters a container format, it can deal with it efficiently by recursively calling itself. For instance, the Outlook PST module will unpack encountered PST files to a temporary directory and then call Uforia recursively for that temporary location. The unpacked individual E-mails are then automatically picked up by the normal E-mail module and processed accordingly.

Uforia can also deal efficiently with flat-file database(-like) formats by having modules return their results as a multi-dimensional array. Uforia's database engine turns these into multiple-row inserts into the appropriate modules' tables. Examples of modules that deal with flat files in this fashion are the modules that handle mobile phone data (CSV format) and the simple PCAP-file parser.

Due to its highly-threaded operation, the back end can pre-process large volumes of data efficiently in relatively little time. Once the processing steps are completed, the stored data needs to be transferred from the back-end storage in JSON format to the ElasticSearch engine for use by the visualisation front end.

4. FRONT END

The front end uses ElasticSearch, AngularJS and D3.js for the visualisation and administration interface.

The first step during the visualisation process is to select, in the admin interface, the modules or file types that need to be visualised.

The next step is to select (and possibly group any identical) fields that need to be indexed by the ElasticSearch engine. The administration interface will hint at similar field names in other supported data types to allow for the merging of data types into one searchable set. This makes it possible to correlate the timing of e.g. cell phone calls and E-mails.

During or after the indexing and storing in ElasticSearch, one or more visualisations must then be assigned to the mapping in the admin interface. This also includes specifying the fields that should be laid out on the visualisation's axes.

The data in ElasticSearch can then be searched and visualised, even if the index process has not been completed yet. Because the front end uses ElasticSearch, searches are fast and highly scalable. Only when full detail views of selected evidence items are necessary does the underlying back-end database need to be accessed.
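Following the module contract described in Section 3.1 — a `# TABLE: …` comment describing the table layout and a `process` function returning the array of data to store — a hypothetical E-mail module might look like the sketch below. The column names and the exact `process` signature are illustrative assumptions, not Uforia's actual schema.

```python
# TABLE: sender:VARCHAR(255), recipient:VARCHAR(255), subject:TEXT, sent:INT

import email
from email.utils import parsedate_to_datetime

def process(fullpath):
    """Parse one RFC 822 E-mail file and return a single row of metadata.

    A module for a flat-file format (e.g. CSV call records) would instead
    return a list of such rows, which the database engine would turn into
    a multiple-row insert.
    """
    with open(fullpath, "rb") as handle:
        message = email.message_from_binary_file(handle)

    date_header = message.get("Date")
    sent = int(parsedate_to_datetime(date_header).timestamp()) if date_header else 0
    return [message.get("From", ""), message.get("To", ""),
            message.get("Subject", ""), sent]
```

Because the table layout travels with the module as a comment, adding support for a new MIME type stays a self-contained, single-file change.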
5. USER INTERFACE

The interface is designed with the goal of optimizing user-friendliness and ease of understanding. The user interface sports a 'responsive design', with UI elements automatically resizing and repositioning themselves for different screen sizes, such as those of laptops, tablets and mobile phones, as can be seen in Figure 1.

Figure 1: Mobile Interface

1) The user selects an 'evidence type', which is the name used for the collection, as it was generated in the admin interface.
2) Uforia then loads the module fields that have been indexed for that evidence type, e.g. 'Content' for E-mails or documents.
3) The user selects whether the field should 'contain[s]' or 'omit[s]' the information in the last field.
4) Finally, the user selects one of the visualisations that have been assigned to the evidence type.
5) Uforia will now render the requested information using the selected visualisation, with some of the visualisations offering additional manipulation (such as a network graph).

Lastly, all visualisations have one or more 'hot zones' where the user can 'click through' to bring up a detailed view of the selected evidence item(s).

6. EXAMPLES

In this section, an example can be seen of how Uforia can be used to quickly determine the E-mail contacts of suspects. Despite the limited space available in this paper, it is nevertheless possible to recreate similar scenarios for other data types.

Figure 2 shows an example of a network graph derived from a sample set of PST files, where the E-mail content was searched for the words 'investigate', 'books', 'suspect' or 'trading' and shown as a network graph indicating which individuals communicated about these words, with the size of a node indicating the amount of communication received. This immediately indicates the links between several possible suspects, including one whose PST mailbox was not included in the dataset processed by Uforia.

Figure 2: Network Graph

Another example is creating a timeline, as seen in Figure 3, to determine when messages were sent and which were sent around the time of the possible transgression. It is easy to determine the times of the E-mail messages by hovering over the intersections on the timeline, and to investigate the original E-mails by clicking on the intersections (see Figure 4).
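The 'contain[s]'/'omit[s]' choice in step 3 maps naturally onto an ElasticSearch boolean query. A sketch of how the front end could assemble such a filter body follows; the function and field names are hypothetical, while the query shapes (`bool`, `must`, `must_not`, `match`) are standard ElasticSearch Query DSL.

```python
def build_filter_query(field, term, mode="contains"):
    """Translate a UI filter selection into an ElasticSearch bool query body.

    mode is either "contains" (the field must match the term) or "omits"
    (documents matching the term are excluded).
    """
    clause = {"match": {field: term}}
    if mode == "contains":
        return {"query": {"bool": {"must": [clause]}}}
    if mode == "omits":
        return {"query": {"bool": {"must_not": [clause]}}}
    raise ValueError("mode must be 'contains' or 'omits'")
```

For instance, the Enron example in Section 6 would issue a "contains" query such as `build_filter_query("content", "trading")` against the indexed E-mail content field.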
Figure 3: Timeline

The timeline visualisation can handle multiple items, like calls from a large number of mobile phones. Figure 4 shows anonymised data from a real case, illustrating how contacts and time can easily be determined. The horizontal axis indicates the flow of time, while the graph nodes and coloured lines indicate the moment of contact between the two phone numbers. By clicking on the intersections, the original data can once again be displayed.

Figure 4: Mobile Phone Timeline

7. CONCLUSION

Uforia shows that it is possible to create a simple, user-friendly product that is nevertheless powerful enough to use in the most demanding investigations. It is easy to extend if any new MIME types are encountered or new features are needed.

Uforia was tested on a number of real-life scenarios, and in all cases it was able to produce real results in a fast and efficient way, requiring hardly any operator training. In conclusion, Uforia is a fast, flexible and low-cost solution for investigating large volumes of data.

REFERENCES

Fei, B. K. (2007). Data Visualisation in Digital Forensics. Masters Dissertation, University of Pretoria, Pretoria, South Africa.

Garfinkel, S. L. (2010). Digital forensics research: The next 10 years. Digital Investigation, 64-73.

Ieong, R. S. (2006). FORZA - Digital forensics investigation framework that incorporate legal issues. Digital Investigation, 3, 29-34.

Osborne, G., Turnbull, B., & Slay, J. (2010). The 'Explore, Investigate and Correlate' (EIC) conceptual framework for digital forensics information visualisation. International Conference on Availability, Reliability and Security (pp. 630-634).

Schofield, D., & Fowle, K. (2013). Visualising forensic data: Evidence (Part 1). Journal of Digital Forensics, Security and Law, 8(1), 73-90.

Teerlink, S., & Erbacher, R. F. (2006). Foundations for visual forensic analysis. 7th IEEE Workshop on Information Assurance. West Point, NY: IEEE.
DYNAMIC EXTRACTION OF DATA TYPES IN ANDROID’S DALVIK VIRTUAL MACHINE Paulo R. Nunes de Souza, Pavel Gladyshev
Digital Forensics Investigation Research Laboratory, University College Dublin, Ireland
ABSTRACT

This paper describes a technique to acquire statistical information on the types of data objects that go into volatile memory. The technique was designed to run on Android devices and was tested in an emulated Android environment. It consists of inserting code into the Dalvik interpreter so that, at execution time, every data value that goes into memory is logged along with its type. At the end of our tests we produced probability distribution information that allowed us to collect important statistical information distinguishing memory values between references (Class, Exception, Object, String), Float and Integer types. The results showed this technique could be used to identify data objects of interest, in an emulated environment, assisting in the interpretation of volatile memory evidence extracted from real devices.
Keywords: Android, Dalvik, memory analysis.
1. INTRODUCTION

In digital forensic investigations, it is sometimes necessary to analyse and interpret raw binary data fragments extracted from the system memory, pagefile, or unallocated disk space. Even if the precise data format is not known, the expert can often find useful information by looking for human-readable ASCII strings, URLs, and easily identifiable binary data values such as Windows FILETIME timestamps and SIDs. Figure 1 shows an example of a memory dump, where a FILETIME timestamp can be easily seen (a sequence of 8 random-looking binary values ending in 01). To date, the bulk of digital forensic research has focused on the Microsoft Windows platform; this paper describes a systematic experimental study to find (classes of) easily identifiable binary data values in the Android platform.

Figure 1: Hexadecimal view of a memory dump

2. BACKGROUND

Traditional digital forensics relies on evidence found in persistent storage. This is mainly due to the need of both sides of the litigation to reproduce and verify every forensic finding. The persistent storage can be forensically copied, providing a controllable way to repeat the analysis and arrive at the same results.

An alternative way is to combine traditional forensics with so-called live forensics. Live forensics relies on evidence found in volatile memory to draw conclusions. This type of evidence features a lesser level of control and repeatability compared with traditional evidence. On the other hand, live evidence may unravel key information for the progress of a case. However, the question regarding the reliability of live evidence remains in place, mainly at two moments: the memory acquisition and the memory analysis.

On the memory acquisition front, law enforcement and researchers are working to establish standard procedures to be used. These procedures could be based on physical or logical extraction. The physical extraction could need disassembling of the device or the use of JTAG as done by Breeuwsma
[2006]. The logical extraction can be more diverse than that: data can be collected by interacting with the system with user privileges, as done by Yen et al. [2009]; by gaining system privileges through a kernel module, as done by Sylve et al. [2012]; or even through a virtual machine layer that has free access to the memory, as done by Guangqi et al. [2014], among others. Regardless of the extraction method, the extracted data will need to be analysed.

One challenge faced when analysing a memory dump is that application data is stored in memory following the algorithms of the program owning that memory space. Given the variety of software running on today's devices, the task of interpreting a device's extracted memory is complex. Researchers are tackling this challenge from different angles. Volatility [2015] provides a customizable way to identify kernel data structures in memory dumps; Lin et al. [2011] used graph-based signatures to identify kernel data structures; Hilgers et al. [2014] use the Volatility framework to identify structures beyond the kernel ones, identifying static classes in the Android system.

A deeper memory analysis tool that would consistently interpret data structures from application software has not yet been developed. In-depth memory analysis is normally done on an ad-hoc basis, interpreting the memory dump in the light of the reverse-engineered application source code, as done by Lin [2011]. A broader approach, one that does not depend on the application's source code, could be a powerful aid to deep memory analysis.

Such an approach would have advantages and disadvantages. As an advantage, it could be used in situations where the source code is unknown, unavailable, or legally disallowed from being reverse engineered. On the other hand, without the source code to deterministically assert the meaning of each memory cell, the method has to take a probabilistic approach. The foundation for such an approach is a probabilistic understanding of memory data associated with their respective types. This paper uses the Android OS as the environment in which to present a technique to gather memory values together with their types, making a probabilistic understanding of the memory content possible.

3. ANDROID STRUCTURE

The Android OS is an operating system based on Linux, with extensions and modifications, maintained by Google. The OS was designed to run on a large variety of devices sharing some common characteristics [Ehringer, 2010]: (1) limited RAM; (2) little processing power; (3) no swap space; (4) powered by battery; (5) diverse hardware; (6) sandboxed application runtime.

Figure 2: Architecture of Android OS

To provide a system that could run on such diverse and resource-limited devices, a multi-layered OS was built (Figure 2). The 5 layers are: (1) Linux kernel; (2) Hardware Abstraction Layer (HAL); (3) Android runtime and native libraries; (4) Android framework; (5) Applications.

The Android OS is a hybrid of a compiled and an interpreted system; the boundary between compiled and interpreted execution is the Android runtime. The versions of Android used in our experiments (android-2.3.6_r1 and android-4.3_r2.1) feature the Dalvik Virtual Machine (Dalvik VM) in the runtime package. All programs running in the layers underneath the Dalvik VM are compiled, and all programs running in the layers above it are interpreted. The Dalvik VM hosts programs that were written in Java syntax, compiled to an intermediary code level called bytecode, and then packed to be loaded into Dalvik. When software is launched inside the Dalvik VM, each line of bytecode is interpreted into machine code, normally for the ARM architecture.
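The probabilistic idea can be illustrated with a short sketch: given, for each data type, an empirical distribution of previously observed values, an unknown 32-bit memory cell can be ranked by how typical its value is of each type. The feature used here (the high byte of the value) and the toy counts are invented for illustration; a real corpus would come from the per-type logs collected later in this paper.

```python
from collections import Counter

# Toy empirical model: for each type, counts of the high byte of
# observed 32-bit values. The numbers are invented; the paper's results
# suggest references cluster a little after 0x40000000, while integers
# cluster around very small positive and negative values.
OBSERVED = {
    "Object": Counter({0x41: 900, 0x42: 80}),
    "Int":    Counter({0x00: 500, 0xFF: 300, 0x41: 40}),
    "Float":  Counter({0x40: 300, 0x41: 200, 0xC0: 100}),
}

def rank_types(value):
    """Rank candidate types for a 32-bit value by the relative
    frequency of its high byte in each type's observed distribution."""
    hi = (value >> 24) & 0xFF
    scores = {}
    for dtype, hist in OBSERVED.items():
        total = sum(hist.values())
        scores[dtype] = hist[hi] / total
    return sorted(scores, key=scores.get, reverse=True)

print(rank_types(0x41A1FC68)[0])   # Object: high byte typical of references
print(rank_types(0x0000002F)[0])   # Int: small values typical of integers
```

The two sample values are taken from Listing 1; with a large enough corpus this kind of scoring is what allows a raw memory cell to be labelled without the application's source code.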
Proceedings of 10th Intl. Conference on Systematic Approaches to Digital Forensic Engineering
The Dalvik VM is implemented as a register-based virtual machine. This means that the instructions operate on virtual registers, those virtual registers being memory positions in the host device. The instruction set provided by the Dalvik VM consists of a maximum of 256 instructions, some of them currently unused. Part of the used instructions is type specific, and those are the ones chosen to collect data and type information.

The Dalvik VM instruction set is grouped in categories: binop/lit8 is the set of binary operations receiving as one of the arguments a literal of 8 bits; binop/lit16 is the set of binary operations receiving as one of the arguments a literal of 16 bits; binop/2addr is the set of binary operations with only two registers as arguments, the result being stored in the first register provided; binop is the set of binary operations with three registers as arguments, two source registers and one destination register; unop is the set of unary operations with two registers as arguments, one source register and one destination register; staticop is the set of operations that operate over static object fields; instanceop is the set of operations that operate over instance object fields; arrayop is the set of operations that operate over array fields; cmpkind is the set of operations that compare two floating point or long values; const is the set of operations that move a given literal to a register; move is the set of operations that move the content of a register to another register.

Each of those categories has a number of instructions specifically designed to operate over some data type. The whole instruction set distinguishes 12 data types, namely: (1) Boolean; (2) Byte; (3) Char; (4) Class; (5) Double; (6) Exception; (7) Float; (8) Integer; (9) Long; (10) Object; (11) Short; (12) String.

4. MODULAR INTERPRETER (MTERP)

As the Android OS is open source, the source code of the OS [Google, 2015], including the Dalvik VM, is available to be downloaded and modified. By inspecting the Dalvik VM source code in detail, it was possible to identify that the interpreter [2] would be a strong candidate to host the data collecting code. The features that most suit our needs are: (1) there is a different entry for each bytecode instruction, called an opcode; (2) several of the opcodes of the Dalvik VM are type related. Therefore, it is a good point to place the code designed to collect the data, relating the values and types that go to memory.

Even though the Dalvik interpreter is conceptually the central point through which every single line of Dalvik bytecode should pass, there is one exception. The Android OS features an optimization element called Just In Time (JIT) compilation that can bypass the Dalvik interpreter [Google, 2010]. The JIT compiler is designed to identify the most demanded tracks of code that run over the Dalvik VM. Once identified, those tracks are compiled and, the next time they are demanded, the JIT calls the compiled track instead of the interpreter. This way, the code we use to collect our data would not be executed and the collected data would not be accurate.

JIT configuration    # of instructions logged
WITH_JIT = true      2,676,540
WITH_JIT = false     3,643,739

Table 1: Number of instructions logged during the Android booting process

In our tests, the JIT compiler would skip, on average, 26.5% of the type-bearing instructions during the Android booting process (Table 1). To avoid this source of error, it was necessary to deactivate the JIT compiler on our test Android OS. The Android build system contains a variable WITH_JIT that is used to deploy an Android system with or without JIT. In order to deactivate Just In Time compilation, we edited the makefile Android.mk [3] and forced WITH_JIT to false.

Having deactivated the JIT, it is necessary to insert the logging code into the interpreter. The interpreter source code is put together in a modular fashion, and for this reason it is called the modular interpreter (mterp). For each target architecture variant there will be a configuration file in the mterp folder [4].

[2] The interpreter is located in the following directory of the Android source tree: /android/dalvik/vm/mterp
[3] Android.mk is located in the following directory of the Android source tree: /android/dalvik/vm
[4] The mterp folder is located in the following directory of the Android source tree: /android/dalvik/vm/mterp
The configuration will define, for each Dalvik VM instruction, which version of the ARM architecture will be used and where the corresponding source code is located. In order to log all the designated instructions, several ARM source code files, scattered in the mterp folder, need to be edited accordingly, and any extra subroutine can be inserted in the file footer.S. After all the code is edited, it is required to run a script called rebuild.sh, located in the mterp folder, that will deploy the interpreter [5]. Finally, the Android system, which will contain the modified interpreter, needs to be built.

When executing the deployed Android OS, the data extraction takes place. The extracted data is stored in a single file with one entry per line, as shown in Listing 1. The key information in each entry is in the two last columns, containing the type and the hexadecimal value stored in memory.

Listing 1: Unprocessed log sample

D(285:298) Object = <0x41a1fc68>
D(285:298) Int = <0x00034769>
D(285:298) Object = <0x41a1fc68>
D(285:298) Int = <0x00011db5>
D(285:298) Byte = <0x2f>
D(285:298) Int = <0x00000000>
D(285:298) Int = <0x0000002f>
D(285:298) Char = <0x2f>

Having this file, we process it to separate each data type into its own file and exclude any extra information apart from the hexadecimal value, as depicted in Figure 3.

Figure 3: Log processing (the emulated Android OS produces mterp.log through the extraction step; the log processing step splits it into one file per type: Boolean.log, Byte.log, ..., String.log)

Summing up, to extract the memory values associated with their respective types we needed to do the following:

• deactivate the JIT compiler of the Android OS;
• inject code into the Dalvik interpreter to log types and values on each interpreted type-bearing instruction;
• run the adjusted Android OS to collect data in the logs;
• process the logged data.

The deactivation of the JIT compiler and the modification of the Dalvik interpreter code, expectedly, generated an execution overhead. Considering the average booting time, the logging procedure seems to have affected the response time more than the JIT deactivation. Table 2 shows the average booting times with and without JIT, as well as with and without the logging code.

                     Log = off    Log = on
WITH_JIT = true         62s        2176s
WITH_JIT = false        62s        3026s

Table 2: Average booting time in seconds

[5] The deployed interpreter is located in the following directory of the Android source tree: /android/dalvik/vm/mterp/out
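The log-processing step of Figure 3 can be sketched in a few lines of Python: parse each entry of the raw log, keep only the hexadecimal value, and group the values per type (in practice, writing each group to Boolean.log, Byte.log, ..., String.log). The regular expression is an assumption based on the entry format of Listing 1.

```python
import re
from collections import defaultdict

# Matches entries such as "D(285:298) Int = <0x00034769>":
# group 1 = type name, group 2 = hexadecimal value.
ENTRY = re.compile(r"^D\(\d+:\d+\)\s+(\w+)\s+=\s+<(0x[0-9a-fA-F]+)>$")

def split_by_type(lines):
    """Group the hex values of a raw log by data type."""
    by_type = defaultdict(list)
    for line in lines:
        m = ENTRY.match(line.strip())
        if m:                        # silently skip any non-entry line
            by_type[m.group(1)].append(m.group(2))
    return by_type

raw = [
    "D(285:298) Object = <0x41a1fc68>",
    "D(285:298) Int = <0x00034769>",
    "D(285:298) Byte = <0x2f>",
    "D(285:298) Char = <0x2f>",
]
for dtype, values in split_by_type(raw).items():
    # in practice: append one value per line to f"{dtype}.log"
    print(f"{dtype}.log: {values}")
```

This produces exactly the per-type value files that the statistical analysis of the next section consumes.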
5. RESULTS

Having all the processed logs, it was possible to extract some statistical information from them. Table 3 shows in what proportion each type appears
in the logs. The table makes clear that the Int type prevails over the other types, with 54.3% of the appearances. Other types with a rather common rate of occurrence are Byte (8.17%), Char (12.19%) and Object (24.00%). The remaining types each account for less than 1%.

Type         # of occurrences    % of total
Bool                    6,512       0.1787%
Byte                  297,578       8.1668%
Char                  444,163      12.1898%
Class                   1,454       0.0399%
Double                    836       0.0229%
Exception                 168       0.0046%
Float                   6,374       0.1749%
Int                 1,978,652      54.3028%
Long                    7,837       0.2151%
Object                874,196      23.9917%
Short                   3,034       0.0833%
String                 22,935       0.6294%
Total               3,643,739     100.0000%

Table 3: Number of occurrences of each type in the logs

At this point the 32-bit types are highlighted. They are: (1) Class; (2) Exception; (3) Float; (4) Integer; (5) Object; (6) String. Each of those 6 types has its own probability distribution of values, plotted in Figure 4.

Figure 4: Probability distribution of values by 32-bit type (log scale)

From the distributions it is possible to spot the similarity among four of the types: (1) Class; (2) Exception; (3) Object; (4) String. All 4 of them have a predominant peak a little after the value 0x40000000. This similarity can be explained by the fact that those 4 types are indeed references, therefore pointers to a memory address. Focusing only on the values around 0x40000000, the Float type could be confused with the reference types, because it also displays a peak around 0x40000000, although a much broader one; moreover, it has a second, lower peak around 0xc0000000. The Int type displays occurrences along the whole spectrum of values, featuring two more relevant peaks: one around 0x00000000 and the other around 0xffffffff. Those two peaks can be explained by a greater occurrence of integers with small absolute values, of positive and negative sign, respectively.

6. CONCLUSION

This paper explained a technique to capture memory data along with their corresponding data type in an emulated Android OS. The technique required the deactivation of the optimization process called Just In Time compilation and the modification of the interpreter ARM code. The technique creates an expected overhead on the Android execution time. As this technique was only designed to run in emulated Android, this overhead is not an issue. The technique allowed us to collect statistical information that made it possible to distinguish memory values between references (Class, Exception, Object, String), Float and Integer
types. Beyond this specific test case, this technique could be used to build a statistical data corpus of Android memory content. Such a corpus may become a tile in the work of paving the ground for the development of a consistent deep memory analysis tool.

7. ACKNOWLEDGEMENTS

This work was supported by a research grant (BEX 9072/13-6) from Science Without Borders, implemented by the CAPES Foundation, an agency under the Ministry of Education of Brazil.

REFERENCES

Ing. M.F. Breeuwsma. Forensic imaging of embedded systems using JTAG (boundary-scan). Digital Investigation, 3(1):32-42, 2006. ISSN 1742-2876. doi: 10.1016/j.diin.2006.01.003.

David Ehringer. The Dalvik virtual machine architecture, 2010.

Google. Google I/O 2010 - A JIT compiler for Android's Dalvik VM. Google Developers, May 2010. URL www.youtube.com/watch?v=Ls0tM-c4Vfo. Accessed 6th March 2015.

Google. Android source code repository, 2015. URL https://android.googlesource.com/plataform/manifest. Accessed 11th February 2015.

Liu Guangqi, Wang Lianhai, Zhang Shuhui, Xu Shujiang, and Zhang Lei. Memory dump and forensic analysis based on virtual machine. In Mechatronics and Automation (ICMA), 2014 IEEE International Conference on, pages 1773-1777, August 2014. doi: 10.1109/ICMA.2014.6885969.

C. Hilgers, H. Macht, T. Müller, and M. Spreitzenbarth. Post-mortem memory analysis of cold-booted Android devices. In IT Security Incident Management & IT Forensics (IMF), 2014 Eighth International Conference on, pages 62-75, May 2014. doi: 10.1109/IMF.2014.8.

Zhiqiang Lin. Reverse Engineering of Data Structures from Binary. PhD thesis, CERIAS, Purdue University, West Lafayette, Indiana, August 2011.

Zhiqiang Lin, Junghwan Rhee, Xiangyu Zhang, Dongyan Xu, and Xuxian Jiang. SigGraph: brute force scanning of kernel data structure instances using graph-based signatures. In 18th Annual Network & Distributed System Security Symposium Proceedings, 2011.

Joe Sylve, Andrew Case, Lodovico Marziale, and Golden G. Richard. Acquisition and analysis of volatile memory from Android devices. Digital Investigation, 8(3-4):175-184, 2012. ISSN 1742-2876. doi: 10.1016/j.diin.2011.10.003.

Volatility. The Volatility framework, 2015. URL http://www.volatilityfoundation.org/. Accessed 18th March 2015.

Pei-Hua Yen, Chung-Huang Yang, and TaeNam Ahn. Design and implementation of a live-analysis digital forensic system. In Proceedings of the 2009 International Conference on Hybrid Information Technology, ICHIT '09, pages 239-243, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-662-5. doi: 10.1145/1644993.1645038.
CHIP-OFF BY MATTER SUBTRACTION: FRIGIDA VIA

David Billard (1), Paul Vidonne (2)

(1) University of Applied Sciences in Geneva, Switzerland, [email protected]
(2) LERTI, France, [email protected]
ABSTRACT

This work introduces a previously unpublished technique for extracting data from flash memory chips, especially from Ball Grid Array (BGA) components. The technique does not require any heating of the chip component, as opposed to infrared or hot-air de-soldering. In addition, it avoids the need to re-ball the BGA when balls are missing or left at the wrong place. It thus enhances the quality and integrity of the data extraction. However, the technique is destructive for the device motherboard and has limitations when the memory chip content is encrypted. It works by subtracting matter by micro-milling, without heating. The technique has been used extensively in about fifty real cases over more than one year. It is named frigida via, by contrast with the calda via of infrared heating.
Keywords: Chip-off forensics, data extraction, BGA, data integrity preservation, micro-milling, infrared heating.
1. INTRODUCTION

Forensic laboratories face daily the challenge of extracting data from embedded or small-scale digital devices. In the better case, the devices are already known to commercial vendors of extraction tools and a proven method is available to the practitioner. In most cases, the devices are unknown, or broken, and then begins the fastidious search for a method to extract data from the device without jeopardizing the judicial value of the (hypothetical) concealed evidence.

When no software-based method exists, the de-soldering of the chip holding the data is carried out. The chip is often a flash memory component, more and more often of Ball Grid Array (BGA) technology. The de-soldering, even when routinely executed, is not error free and induces a heavy stress on the component. Furthermore, the control of the heating is based on temperature probes which are not always accurate enough. This leads to chips being heated too much or chips being torn off. In the first case, the data content may be altered, or even destroyed on some occasions. In the second case, some balls of the BGA will stay on the motherboard and the practitioner will have to re-ball the chip in order to extract data using a BGA reader.

As an example, the BGA component shown in figure 1 comes from a cell phone motherboard. The labeling on the chip is very clear: it is a NAND chip, and the edges of the chip are sharp.

Figure 1: BGA from a cell phone motherboard

The chip has been heated using infrared and the result is shown in figure 2. The component changed color (no labeling is visible any more) and the edges are blurred. The ball grid is also a bit wavy: the heating
has a dramatic effect on the component. However, the component is still readable and data can be extracted. The ruler (in millimeters) has been added to give the reader a better idea of the component's size.

Figure 2: Heated BGA recto and verso

In this paper we propose a new method for taking BGA chips off motherboards without heating them. In fact, instead of taking the chip off, we remove the motherboard from under the chip. We use micro-milling technology and subtract matter from the motherboard on the other side of the chip, until we reach the ball grid. The process is constantly monitored and controlled, and it stops when reaching the balls. A result of this process is shown below.

The Micron chip presented in figure 3 is still attached to the motherboard. The labeling is clear, and the edges of the chip are sharp.

Figure 3: Micron BGA on the motherboard

Once the milling process is done, the chip labeling is still as clear on the recto, and the grid balls are all present on the verso, as shown in figure 4.

Figure 4: Milled Micron BGA recto and verso

Since no heating has been applied, the chip content has been spared any stress and is intact. We have been using and refining this technique for about one year on fifty real cases. We had an issue with only one particular case, which is presented later in this work.

The paper is organized as follows: section 2 is a review of the literature about data extraction from flash components; section 3 presents the principle of the milling process, the machine, and the interaction with precision bar turning; section 4 lists some lessons learned in using this technique compared to infrared heating and presents a comparative table of pros and cons.

2. RELATED WORKS

An extensive literature exists about extracting data from flash (or EEPROM) memory chips. Most of this literature assumes that the device is in working order. For instance, (Breeuwsma, 2006) addresses the use of JTAG (boundary-scan) in order to bypass or trick the processor or the memory controller. In (Sansurooah, 2009), the author addresses the use of flasher tools to load a bootloader into the device memory; this bootloader is designed to gain access to low-level memory management, thus enabling the reading of all memory blocks. Some papers, like (Fiorillo, 2009), use hot-air de-soldering to compare the content of flash memory chips before and after some writing of data. In (Willassen, 2005), several ways of de-soldering chips are mentioned, all based on heating the component (hot air, infrared, ...). In a remarkable presentation, (van der Knijff, 2007) presents an overview of most techniques for chip-off and JTAG access.

Commercial products like (Cellebrite, 2015) or (Microsystemation, 2015) are based on several techniques to gain access to the low-level memory. Although these tools are not suited for chip-off, they provide the ability to decode memory dumps extracted from flash memory chips.

To our knowledge, the memory reading of broken or dismantled digital devices is done either by heating
and chip-off, or sometimes by entirely reconstructing the device around the flash memory. Our paper brings a previously unpublished approach, requiring no heating, thus enhancing the integrity and quality of the data extraction. It is especially designed for broken devices but works also for running devices, with some limitations, discussed in Sec. 4.

3. SUBTRACTING MATTER

3.1 PRINCIPLE

The aim of the technique is to subtract matter around the component. For a BGA component, it comes down to obliterating the motherboard and its other components, leaving the BGA component alone. The technique can be summarized in the following steps:

1. Localization step: since the motherboard is milled at its verso, just under the memory chip, the cutting tool has to be directed to the location of the chip while the chip is hidden by the motherboard. Thus it is necessary to locate the chip on the verso side of the motherboard by measuring distances from the board sides to the chip sides on the recto side, then using the measures to draw the shape of the chip on the verso of the motherboard. Figure 5 presents a photograph of the drawing of the shape of the chip on the verso of the motherboard.

Figure 5: Localization step (drawing the chip shape at the verso)

2. Revolving step: turning the BGA component, still attached to its part of the motherboard, over on itself, in order to have the motherboard facing up (and thus the component facing down).

3. Peeling step: using a milling cutter to cut the motherboard, layer by layer, until just short of the grid balls. Sometimes it also means cutting layers of the BGA component, when the grid balls are lightly encased in the chip. Figure 6 presents a photograph of the milling cutter sawing through the motherboard until the grid balls are exposed.

Figure 6: Peeling step (milling to the grid balls)

For this milling step, it is of utmost importance that the milling cutter head and the motherboard be perfectly aligned at 90°. Even a very small angle deviation may lead to a catastrophic bite of the milling cutter into the BGA component. In that case, the component may be utterly destroyed.

4. Cleansing step: removing the last bits of motherboard layer and epoxy that may still adhere to the grid balls.

Once those steps are finished, there is no need to re-ball the component, since no ball has been lost. The component can be used straight away in a flash reader, provided that the practitioner has the right pinout module.

The upper image in figure 7 represents a sectional view of a BGA, taken from (Guenin, 2002). The lower image represents the working of the milling cutter, subtracting the motherboard and leaving the grid balls exposed.
Figure 7: Process illustrated (top: sectional view of a BGA soldered to the motherboard; bottom: sectional view of a BGA detached by milling)

3.2 VARIANT

In some cases, in particular when processor and memory are piled one on top of the other, the motherboard has to be cut all around the component before the localization step, either by drilling holes close to the four sides (like old-fashioned stamps) or by drilling one hole and using a fretsaw all around the BGA component. This operation is called the punching step, and figure 8 presents a photograph of it.

Figure 8: Punching step (separating the component from the others)

3.3 MACHINE

The machine used for the milling is a standard precision micro-milling machine from Proxxon (Proxxon, 2015). It must be capable of 0.05 millimeter steps (0.002 inch) with a rotating speed varying from 5,000 to 20,000 rpm (revolutions per minute). The milling cutters usually have a diameter between 1 and 3 millimeters (0.04 to 0.12 inch). A watchmaker-grade magnifier, or a digital magnifier, is needed to control and verify the peeling step.

3.4 PRECISION BAR TURNING

The idea to implement this frigida via technique comes from interaction with specialists in precision bar turning. These people are specialized in manufacturing tiny pieces of hardware, like the gear wheels one can find in mechanical watches, or complex components with special alloys used in space satellites.

We were facing more and more devices locked to investigation due to their poor condition: a cell phone with a bullet hole, a GPS retrieved from a sunken boat, or a tablet barely surviving a plane crash. Using commercial tools or flash boxes was not an option, and infrared heating was adding additional stress to components already submitted to heavy stress. Therefore, instead of thinking like repair firms, whose job is to detach an object in order to repair it or analyze the failure of the whole device, we thought about isolating the memory from its external surroundings. In other words: obliterating the surrounding area, in order to leave the component exposed.

One of the first cases prompting us to use milling was the investigation of a cell phone retrieved after a car chase between the police and three drug dealers. The motherboard was badly damaged and we feared that using infrared on the memory chip might further damage the chip. After extensive testing on spare devices, the milling process was applied to the device remnants and the information was successfully extracted.

4. LESSONS LEARNED AND METHOD COMPARISON

4.1 ENCRYPTION

The technique explained in this paper has to be used with prudence when dealing with encrypted devices. In a real case about narcotics, a BlackBerry 9720 was seized. It had a keyboard lock that the owner was not willing to part with. The frigida via was successfully used, and figure 9 presents the recto and verso images of the SKhynix chip.
Figure 9: Milled SKhynix BGA recto and verso

But after reading the chip, it appeared that all of the component content was encrypted. Finally, after some weeks, the password was supplied. Unfortunately, this password alone is not sufficient to decrypt the content: it must be used in conjunction with some hardware information contained in other components of the motherboard. Thus, even with the password, the memory remains encrypted.

4.2 PROCESS DURATION & COMPARISON

The milling technique takes between thirty minutes and one hour, depending on the quality of the motherboard. Namely, if the motherboard is flat, without any deformation, it takes less than thirty minutes; if the motherboard has been retrieved after a helicopter crash, it takes about one hour. Once the chip is off the motherboard, it is immediately available for reading, and the first contact in the reader socket is usually the good one.

The infrared (or hot-air) method is usually shorter for the chip-off itself, thirty minutes being the upper limit of the process. However, the process can be impeded in many ways. First, the chip can lose grid balls during the process, some of them staying attached to the motherboard. After cooling the chip, many tries are needed to find which grid balls are missing, and additional time is needed to re-ball the chip, even if not all the grid balls need to be present, only the "useful" ones. The heating process also leaves residues of matter that have to be scraped off using toothbrushes or special treatment. Several tries are then also needed to place the chip correctly into the reader socket, since the edges of the chip are no longer rectilinear. Furthermore, the epoxy layer between the chip and the motherboard can glue the chip to the motherboard, even if the grid balls are melted. We did not find out whether the epoxy glues the chip and the motherboard together at heating time or during the assembly of the motherboard. In that case, even heavy heating cannot de-solder the chip, and will more likely destroy the content of the component.

In table 1 we summarize the main differences between calda via and frigida via.

Calda via: Infrared        Frigida via: Milling
Heat damage                No heat applied
Re-balling necessary       No need of re-balling
Extensive cleansing        Light cleansing
Re-soldering possible      No re-soldering
Same process duration      Same process duration

Table 1: Comparison Infrared vs Milling

Table 1 shows the most obvious differences between infrared and milling. But even if milling seems superior to infrared in many aspects, we still use both techniques on cases. The choice of the technique to apply is dictated by several factors, among which:

1. the availability of the machines;
2. the risk of finding encrypted data linked to hardware components;
3. the risk of damaging the chip by heating;
4. the likelihood of epoxy gluing the memory chip and the motherboard;
5. the training of the practitioner.

When facing a chip-off, we apply a risk-based decision matrix in order to decide between calda and frigida via.

5. CONCLUSION

In this paper, we present a new technique for extracting data from flash memory chips, especially from Ball Grid Array (BGA) components. This technique, called frigida via (or milling), is complementary to infrared or hot-air chip-off processes and offers many new possibilities. Instead of relying on the heating of the solder of the BGA component, in the hope that the component
will detach cleanly from the motherboard, the technique presented in this paper relies on subtracting the motherboard from the component. The motherboard is milled under the chip until the grid balls are exposed. At the end of the process, the chip is freed from the motherboard and can be placed on a reader socket for further analysis.

Since this technique does not require any heating of the chip component, as opposed to infrared or hot-air de-soldering, it avoids the inadvertent degradation of the memory. As a matter of fact, the component may already be weakened by external causes, or simply be of fragile design, and heating, even with careful control of the temperature, may lead to the destruction of the memory content. Therefore, the frigida via is more respectful of data integrity, since it does not impose additional stress on the memory chip, and the quality of the data extraction is enhanced.

In addition, the frigida via avoids the need to re-ball the BGA in case balls are missing or left at the wrong place. It also eliminates the problem of the epoxy gluing the memory chip and the motherboard together in some devices.

However, this technique is destructive for the device motherboard, and re-soldering of the chip component is impossible. That impossibility is a severe limitation when the memory content is encrypted by a combination of password and hardware-related information.

The technique works and has been used in about fifty real cases, for more than one year.

REFERENCES

Breeuwsma, I. M. (2006). Forensic imaging of embedded systems using JTAG (boundary-scan). Digital Investigation, 3(1), 32-42. doi: 10.1016/j.diin.2006.01.003

Cellebrite. (2015). UFED mobile forensics. Retrieved from http://www.cellebrite.com

Fiorillo, S. (2009, December). Theory and practice of flash memory mobile forensics. Proceedings of the 7th Australian Digital Forensics Conference, 52-84.

Guenin, B. (2002, February). The many flavors of ball grid array packages. Electronics Cooling. Retrieved from http://www.electronics-cooling.com/2002/02/the-many-flavors-of-ball-grid-array-packages/

Microsystemation. (2015). XRY mobile forensics. Retrieved from https://www.msab.com/

Proxxon. (2015). Precision lathe and milling systems. Retrieved from http://www.proxxon.com

Sansurooah, K. (2009, December). A forensics overview and analysis of USB flash memory devices. Proceedings of the 7th Australian Digital Forensics Conference, 99-108.

van der Knijff, R. (2007). 10 good reasons why you should shift focus to small scale digital device forensics. Retrieved from http://www.dfrws.org/2007/proceedings/vanderknijff_pres.pdf

Willassen, S. Y. (2005). Forensic analysis of mobile phone internal memory. Retrieved from http://digitalcorpora.org/corpora/files/Mobile-Memory-Forensics.pdf
THE EVIDENCE PROJECT: BRIDGING THE GAP IN THE EXCHANGE OF DIGITAL EVIDENCE ACROSS EUROPE
Maria Angela Biasiotti, Mattia Epifani, Fabrizio Turchi
Institute of Legal Information Theory and Techniques of the Italian National Research Council
Florence, Italy, 50127
[email protected], [email protected], [email protected]
ABSTRACT

The EVIDENCE Project starts from three assumptions: that the very nature of data and information held in electronic form makes it easier to manipulate than traditional forms of data; that all legal proceedings rely on the production of evidence in order to take place; and that electronic evidence is no different from traditional evidence in that the party introducing it into legal proceedings must be able to demonstrate that it is no more and no less than it was when it came into their possession. On this basis, the project aims at providing a road map (guidelines, recommendations, technical standards) for realising the missing Common European Framework for the systematic and uniform application of new technologies in the collection, use and exchange of evidence. This road map, incorporating standardized solutions, aims at enabling all involved stakeholders to rely on an efficient regulation, treatment and exchange of digital evidence, having at their disposal, as legal and technological background, a Common European Framework allowing them to gather, use and exchange digital evidence according to common standards, rules, practices and guidelines. EVIDENCE activities will also aim at enabling the implementation of a stable network of experts in digital forensics, communicating and exchanging their opinions, and contributing to the building of a stable communication channel between the public and the private sectors dealing with electronic evidence.
Keywords: digital evidence, digital evidence exchange, metadata, formal languages.

1. THE CONTEXT

All legal proceedings rely on the production of evidence in order to take place. Electronic evidence is no different from traditional evidence in that the party introducing it into legal proceedings must be able to demonstrate that it is no more and no less than it was when it came into their possession. In other words, no changes, deletions, additions or other alterations have taken place. The very nature of data and information held in electronic form makes it easier to manipulate than traditional forms of data. When acquired and exchanged, the integrity of the information must be maintained and proved.

Legislations on criminal procedures in many European countries were enacted before these technologies appeared, thus taking no account of them and creating a scenario where criteria are different and uncertain, regulations are not harmonized and aligned, and therefore exchange among EU Member States' jurisdictions and at transnational level is very hard to realize. What is missing is a Common European Framework to guide policy makers, law enforcement agencies and judges when dealing with digital evidence treatment and exchange. The EVIDENCE project interpreted this request by defining it as:

• the need for a common background for all actors involved in the Electronic Evidence life-cycle: policy makers, LEAs, judges and lawyers;

• the need for a common legal layer devoted to the regulation of Electronic Evidence in Courts;

• the need for standardized procedures in the use, collection and exchange of Electronic Evidence (across EU Member States).

In response to the above needs and gaps, the EVIDENCE project aims at providing a Road Map (guidelines, recommendations, technical standards) for realizing the missing Common European Framework for the systematic and uniform application of new technologies in the collection, use and exchange of evidence. This Road Map, incorporating standardized solutions, would enable policy makers to realize an efficient regulation, treatment and exchange of digital evidence, and LEAs as well as judges, magistrates, prosecutors and lawyers practising in the criminal field to have at their disposal as legal/technological background a Common European Framework allowing them to gather, use and exchange digital evidence according to common standards and rules.

In order to produce this common, unique European approach to the treatment and exchange of electronic evidence, the EVIDENCE project has identified the following steps as relevant:

• Developing a common and shared understanding of what electronic evidence is and which are the relevant concepts of electronic evidence in the involved domains and related fields (digital forensics, criminal law, criminal procedure, criminal international cooperation);

• Detecting which rules and criteria are utilized for processing electronic evidence in EU Member States, and eventually how the exchange of evidence is regulated;

• Detecting the existence of criteria and standards for guaranteeing the reliability, integrity and chain-of-custody requirements of electronic evidence in the EU Member States, and eventually in the exchange of it;

• Defining operational and ethical implications for Law Enforcement Agencies all over Europe;

• Defining implications on data privacy issues;

• Identifying and developing technological functionalities for a Common European Framework for gathering and exchanging electronic evidence;

• Seizing the EVIDENCE market.

The project is now at its halfway mark: steps 1, 5 and 7 are completed, whilst steps 2, 3, 4 and 6 are on the way to producing their final assessment.

2. PRELIMINARY REMARKS ON THE CONCEPT OF ELECTRONIC EVIDENCE

Before going for any kind of classification, the very first issue at stake has been to set the right scenario and to fix the range and scope of the categorization task with respect to the Project aims and goals. In this sense, our aim is to develop a framework for the application of new technologies in the collection, use and exchange of evidence between Courts of the EU Member States. So, the main keywords to be considered are: Source of Evidence, Authenticity, Evidence, ICT and Exchange.

The use of ICT associated with evidence is often described utilizing two main expressions: Electronic Evidence and Digital Evidence. Is the first one different from the second, or are they just synonyms?

We know for sure that both electronic and digital evidence originate from the so-called sources of evidence, and that there is a specific need to carry out a forensic analysis in order to identify the evidence itself. We are also aware of the fact that these sources might be electronic or non-electronic, and that in the latter case a source can acquire the status of "digital/electronic evidence" if digitized.

The analysis of the most significant sources of information demonstrated that there is no uniform use of the terms that identify this domain. Indeed, both digital evidence and electronic evidence are accepted terms in the scientific community. For instance, the International Standard Document ISO/IEC 27037, "Guidelines for identification, collection, acquisition, and preservation of digital evidence", prefers the term digital evidence, because it refers to data that is already in a digital format and does not cover the conversion from analogical data into digital data. On the other hand, authoritative sources such as the Council of Europe have opted for the term Electronic Evidence in the recently published "Electronic Evidence Guide" (Council of Europe, 2013).
Moreover, there are many different definitions of "Electronic/Digital Evidence", each of them highlighting some, but not all, essential features. The following are the main definitions we have collected and analysed so far (Mason, 2012):

• any data stored or transmitted using a computer that supports or refutes a theory of how an offense occurred or that addresses critical elements of the offense such as intent or alibi (Carrier, 2006);

• digital evidence is any data stored or transmitted using a computer that supports or refutes a theory of how an offense occurred or that addresses critical elements of the offense such as intent or alibi (Casey, 2011).

None of the above-cited definitions of digital evidence or electronic evidence matched our needs, therefore we finally decided to adopt the following original definition: Electronic Evidence is any data resulting from the output of an analogue device and/or a digital device of potential probative value that is generated by, processed by, stored on or transmitted by any electronic device. Digital evidence is that electronic evidence which is generated or converted to a numerical format.

Therefore, the EVIDENCE Project activities are based upon its own core definition, capable, in our opinion, of catching all the various sides and challenges of Electronic Evidence, relying on its very general abstraction level.

Based upon this definition, our statement is that within the Electronic Evidence category both evidence that is "born digital" and evidence that is "not born digital", but that may have become such during its life-cycle, are to be included.

As a matter of fact, electronic evidence and digital evidence in our conceptualization do coincide (see Figure 1). Therefore, we will assume that, semantically speaking, Electronic Evidence is the broader class including both those records "born digital" as well as those "not born digital" but digitized afterwards. Once the digitization process has been carried out, the evidence becomes "electronic" even if it was originally "non-electronic" or analogical.

Figure 1 depicts the relationship between Electronic Evidence and the other forms in which it may appear, with a specific focus on:
Figure 1: From Sources of Evidence to Electronic Evidence
• Not Electronic items that should be digitized, and that are therefore afterwards treated as if they were born digital, once their authenticity with respect to the original is assured;

• Electronic items in some sort of analogical form, which, as in the case of the Not Electronic items, should be digitized.

In the same Figure 1 it is to be noted that:

• Arrows represent the process of transformation needed to generate the transition from "Non Electronic" or "Analogical" to Digital items.

• Lines show that no process is needed and that the evidence is per se electronic.

Of course, the transition from Analogical or Not Electronic to Electronic is not an essential step; it may happen but is not mandatory. In this way we can include every type of evidence present in paper documents, objects, court hearings with witnesses and so on, which, due to the increasing use of ICT, are frequently objects of digitization. Therefore, we prefer to use the term Electronic Evidence, which in our opinion comprises a larger range of items/potential evidence.

3. ELECTRONIC EVIDENCE LIFE CYCLE

Starting from the relevant concepts extracted both manually and semi-automatically, this step of the project was focused on the identification and classification of the building blocks of the conceptual model oriented to the description of the Electronic Evidence domain. The structuring is mainly based upon the electronic evidence life-cycle as described in Figure 2. Having clarified the starting point of the conceptualization and the choice of the term preferred for the categorization, it is worthwhile to describe the flow to which actions are referred in the digital forensics domain. Therefore, a brief description of the digital forensics procedures will outline the process used to manage electronic evidence. The very first milestone starts with an incident, an unlawful criminal, civil or commercial act, which sets the scene for the electronic evidence life-cycle scenario. Indeed, an artefact or a record enters into the forensic process only if an incident forces it to do so. Otherwise, for all of its natural lifespan the artefact or record will remain outside the forensic process and thus forensically irrelevant – though it may continue to be very relevant to its user or owner.
Figure 2: Electronic Evidence management timeline/life cycle
The phases we have taken into consideration are chiefly based on already existing investigative process models and on ISO 27043, which represents a point of reference aiming to create a harmonized model on the basis of the other existing models.

The digital evidence management timeline/life-cycle consists of the following main phases regarding the handling of electronic evidence, starting from the incident event:

• Case Preparation: this is the first step of the digital evidence management timeline and it comprises organizational, technical and investigative aspects.

• Evidence Identification: this is the step consisting of examining/studying the crime scene in order to preserve, as much as possible, the original state of the digital/electronic devices that are going to be acquired.

• Evidence Handling: this is the step where it is defined which specific standard procedures are to be followed, based on the kind of device being handled.

• Evidence Classification: this is the step consisting of identifying the main features and the status of the device, taking notes about Case ID, Evidence ID, seizure place/date/made by, evidence type, picture, status, etc.

• Evidence Acquisition: this is one of the most critical phases within the digital evidence handling processes: the forensics specialist must take care of the potential digital evidence in order to preserve its integrity during the following processes, up to its presentation before a Court.

• Evidence Analysis: this is a process heavily affected by the kind of case under investigation, the type of evidence to be handled and the features related to each piece of evidence to be examined (e.g. installed operating system, type of file system, etc.).

• Evidence Reporting: this is one of the most critical steps. After the completion of the identification, acquisition and analysis activities, digital evidence specialists have to complete their job by producing a report with all the activities carried out and the outcomes achieved. The report must contain all the details needed to allow the specialists to testify before a Court relying only on that document.

The investigation process model depicted in Figure 2 represents a simplified view of the whole process, because some concurrent processes have not been represented, such as obtaining authorization, documentation, managing information flow, preserving the chain of custody, and preserving digital evidence. Furthermore, it is not a sequential flow; it may be circular in some points and it might loop back to certain steps. Examples could be:

• The analysis can reveal that some references to data sources have not been acquired.

• During the acquisition phase it might be possible to reconsider the acquisition plan to include more data sources.

• During presentation some questions may arise requiring further analysis in order to provide satisfactory answers.

More and more evidence may be generated in the course of most court hearings, with witnesses being recorded and their testimony entered into the official court record, irrespective of whether a case is criminal or civil. Furthermore, in our specific view, once the reporting phase is accomplished the electronic evidence may open to the scenario of Electronic Evidence Exchange. In this case the further step dedicated to Presentation may take place before a National Court or before another EU Member State.
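The integrity requirement described for the Evidence Acquisition phase is, in practice, typically met by computing a cryptographic digest of the forensic copy at acquisition time and re-verifying it at every later phase. The sketch below illustrates the idea in Python; the helper names and the choice of SHA-256 are ours, not prescribed by the paper:

```python
import hashlib

def acquisition_digest(data: bytes) -> str:
    """Hash the evidence at acquisition time; the digest is recorded in
    the chain-of-custody documentation (algorithm choice is illustrative)."""
    return hashlib.sha256(data).hexdigest()

def verify_integrity(data: bytes, recorded_digest: str) -> bool:
    """Re-hash at any later phase (analysis, presentation) and compare
    against the digest recorded at acquisition."""
    return hashlib.sha256(data).hexdigest() == recorded_digest

image = b"\x00raw forensic image bytes\xff"   # toy stand-in for a forensic copy
digest = acquisition_digest(image)
assert verify_integrity(image, digest)             # untouched copy verifies
assert not verify_integrity(image + b"x", digest)  # any alteration is detected
```

Because verification can be repeated independently by any party holding the recorded digest, the same mechanism also supports the exchange scenarios discussed later.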
Figure 3: Overview of exchange data between Legal Authorities
Figure 3 outlines, at a high level of description, the exchange process that takes place between the Requesting and Requested Legal Authorities involved in the case after the analysis or interpretation is completed.

4. MID-TERM RESULTS

In order to produce the Road Map, a specific set of objectives have been considered essential and a group of mid-term results have been achieved.

4.1 ELECTRONIC EVIDENCE DOMAIN CATEGORIZATION

Within the activities carried out in the Categorization work package6, a common and shared understanding has been developed of what electronic evidence is and which are the relevant concepts of electronic evidence in the involved domains and related fields, such as digital forensics, criminal law, criminal procedure and criminal international cooperation. A mind map representation of the whole categorization is visible at the following address: http://www.evidenceproject.eu/categorization.

4.2 LEGAL ISSUES PRELIMINARY RESULTS

One of the main goals of the project, addressed by the Legal Issues work package7, is the identification of a legal framework in the EU Member States governing the implementation of new technologies in processing evidence, including trans-border exchange. Some general considerations have been reached on the basis of a pilot comparative study:

• There is no comprehensive international or European legal framework relating to e-evidence, only a few relevant legal instruments (e.g. the Cybercrime Convention);

• Although some regulation exists at national level, rules vary considerably even among countries with similar legal traditions (e.g. on admissibility issues);

• An interpretative evolution of the national criminal laws has been gradually developing so as to apply (also) to e-evidence (amendments to existing norms);

• There has been an increase in the knowledge and expertise of the actors involved in the handling of e-evidence, but specific standards are still missing;

• Several national data protection laws have been modified as a consequence of the introduction of antiterrorism measures;

• The different laws and practices of Member States contribute to creating a situation of legal and practical uncertainty.

4.3 DATA PROTECTION ISSUES

Another crucial goal of the project, addressed by the Data Protection Issues work package8, is the identification of data protection issues and remedies regarding the process of gathering and using electronic evidence. The following general considerations have been determined:

• Secondary law: there are no valid regulations addressing data protection issues related to the collection of electronic evidence;

• Conventions: the Cybercrime Convention contains procedural regulations on the collection of electronic evidence and data protection safeguards; the European Convention on Mutual Assistance in Criminal Matters addresses the exchange of evidence;

• Art. 82 (2) TFEU: the EU has a legal competence to harmonise particular aspects of criminal procedure law, such as admissibility, which includes rules on the means of collecting electronic evidence. This competence could be used to set up a minimum standard of privacy safeguards to be established in relation to the use of certain means of collecting electronic evidence.

Moreover, in most domestic legal frameworks rather few and not necessarily sufficient and/or congruent privacy safeguards related to electronic evidence exist. Examples are:

• Procedural Law: Structure and Rules – very few definitions of electronic evidence exist;

• Cross-Border Scenarios & International Law – in cloud computing environments legal issues are not sufficiently, or not at all, addressed by law;

• Investigative Measures – existing rules often apply both to physical and electronic evidence;

• Admissibility – not regulated specifically.

4.4 DIGITAL FORENSICS TOOLS CATALOGUE

Starting from the digital evidence life-cycle shown in Figure 2, there are already standards for many of the phases depicted. In particular, for the acquisition and investigative processes ISO 27043, ISO 27037 and ISO 27042 represent points of reference. In composing the overview of existing standards for the handling of electronic evidence, within the activities related to the Standard Issues work package9, a huge number of digital forensics tools have been gathered and a Digital Forensics Tools Catalogue has been created, concerning tools for the Acquisition and Analysis phases as described at different levels of detail by the above-mentioned ISO/IEC standards.

The Catalogue represents an overview of the forensics tools for handling digital evidence generally accepted in the EU Member States. The Catalogue, in its current version 1.0 dated February 2015, comprises over 1,200 tools divided into two main branches: Acquisition and Analysis. The Digital Forensics Tools Catalogue is visible at the following URL: http://wp4.evidenceproject.eu

6 The activities have been developed by the CNR-ITTIG (Italy) and CNR-IRPPS (Italy), partners of the Evidence project.
7 The activities have been developed by the University of Groningen (The Netherlands), partner of the Evidence project.
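The two-branch organization of the Catalogue can be pictured with a few illustrative records; the field names and tool names below are hypothetical and do not reflect the Catalogue's actual schema:

```python
from dataclasses import dataclass

@dataclass
class CatalogueEntry:
    """Illustrative record; not the Catalogue's real data model."""
    name: str    # tool name (hypothetical examples below)
    branch: str  # "Acquisition" or "Analysis", the two main branches
    phase: str   # life-cycle phase covered (per ISO/IEC 27037/27042/27043)

catalogue = [
    CatalogueEntry("imager-x", "Acquisition", "Evidence Acquisition"),
    CatalogueEntry("fs-parser-y", "Analysis", "Evidence Analysis"),
    CatalogueEntry("mem-dumper-z", "Acquisition", "Evidence Acquisition"),
]

# A consumer of the Catalogue can filter tools by branch or by phase.
acquisition_tools = [e.name for e in catalogue if e.branch == "Acquisition"]
assert acquisition_tools == ["imager-x", "mem-dumper-z"]
```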
4.5 MARKET SIZE MAP OF ACTORS

Another relevant goal of the project, addressed by the Market Size work package10, is the identification and classification of the main types of actors involved in the "social arena" of electronic evidence.

There are two types of actors having a direct interest in electronic evidence:

• Process Actors: public and private actors involved in handling the electronic evidence;

• Context Actors: actors providing technical solutions and assistance in this field.

Furthermore, there are nine typological areas of Process Actors, in turn comprising a total of 40 types of actors:

• Public law enforcement and intelligence agencies (e.g. law enforcement officers, detectives, intelligence agencies);

• Actors of the legal criminal trial (e.g. judges, prosecutors, lawyers, etc.);

• Notaries;

• Public register actors (e.g. business register actors, civil acts register actors, land register actors, etc.);

• Forensic examiners (e.g. fraud examiners, forensic laboratory staff members, Digital Evidence First Responders, etc.);

• Private investigators;

• Hardware producers (e.g. hardware producers for computer forensics, for mobile forensics, etc.);

• Technology/software producers (e.g. software houses that produce complete commercial toolkits for forensic analyses, that make software for specific commercial analyses, etc.);

• Service providers (e.g. major consulting firms, associated professional studios, etc.).

Finally, ten typological areas of Context Actors, in turn containing twenty-six types of actors, can be enumerated:

• Specialized international organizations (e.g. UN agencies concerned with justice and technological innovation, etc.);

• Law-making bodies (e.g. European organizations, national governments);

• Technological innovation actors linked to the Internet (e.g. Internet service providers, cloud technology providers);

• Legal and forensic associations and networks (e.g. general legal and forensic associations and networks, associations and networks concerned with issues linked to new technologies);

• Research bodies, associations and networks (e.g. organizations and associations concerned with Internet and ICT, academic institutions concerned with ICT, etc.);

• Actors involved in the field of human rights (e.g. civil rights organizations, privacy protection organizations, etc.);

• The media (e.g. traditional and social media, etc.);

• Enterprises interested in the proper functioning of justice (e.g. individual firms, business associations);

• Transnational projects (e.g. digital forensics research projects and training);

• Other actors collecting evidence (e.g. public and private actors that collect data / potential evidence).

8 The activities have been developed by the Leibniz Universität Hannover (Germany), partner of the Evidence project.
9 The activities have been developed by the CNR-ITTIG (Italy), partner of the Evidence project.
10 The activities have been developed by the Laboratory of Citizenship Sciences (Italy), partner of the Evidence project.

5. ELECTRONIC EVIDENCE EXCHANGE: STATUS QUO OVERVIEW

As far as the Exchange process (see Figure 3) is concerned, there is no standard published or proposed; furthermore, it represents one of the essential points of the EVIDENCE Project, which aims to facilitate and foster the exchange between different authorities and across the EU Member States. The project aims at defining functional specifications for exchanging digital evidence in such a way that, no matter what forensic tool is being used by an examiner, the results of his or her examination must be verifiable by another examiner, independently of the tool being used, as long as the tools are comparable in specification and function.

On the basis of the information gathered so far, it seems that, at the moment, in cross-border criminal cases cooperation is mostly based upon international agreements or letters rogatory to the foreign Court. Independently of the legal framework identified by the EU Member States, the cooperation is mostly human-based: the electronic evidence exchange is carried out between judicial stakeholders, from a source EU authority to another judicial authority in the target EU Member State. This approach is similar across countries and, at first glance, the Exchange does not appear to be based on any electronic means at all.

In most cases the forensic copy of the original source of evidence is exchanged: a judicial/police authority from an EU Member State A (requesting authority) requests an EU Member State B (requested authority) to generate a forensic copy, based on mutual trust between the two competent authorities. Later, the exchange of the forensic copy will be attained on a human basis: the authority from
country A instructs someone to collect the copy, or the copy is delivered by a secure courier to the requesting authority. In any case, it has to be emphasized that no electronic means is involved in the exchange process.

To facilitate human cooperation, institutions such as Eurojust, Europol and Interpol have put in place systems or platforms in order to communicate and share relevant information. There are two different cross-border cooperation levels:

• judicial cooperation, based almost exclusively on the regular international procedures for mutual assistance in criminal matters – regulated by strict procedures, time-consuming and unpredictable, but the only way for an evidence exchange;

• investigation cooperation, simpler and quicker, but only for operational or technical information and coordination activities. During investigations there may be an information exchange that cannot be used during the trial beyond the pleading stage.

In many cases judicial authorities act relying on international agreements established through Eurojust to coordinate investigations and prosecutions between the EU Member States when dealing with cross-border crime.

The exchange of the electronic evidence should take place in a secure environment, relying on a service for exchanging the evidence in a secure manner. In order to achieve this goal, such a service will rely on digital certificates in order to certify the ownership of a public key. This would allow the judicial authorities (relying parties) to rely upon signatures or assertions made by the private key that corresponds to the certified public key.

6. ELECTRONIC EVIDENCE EXCHANGE: EXISTING PLATFORMS

There are already existing platforms for the information exchange but, for confidentiality reasons, it has been almost impossible to collect detailed information about their architecture and the kind of information exchanged. The most important system in the evidence exchange is SIENA, which stands for Secure Information Exchange Network Application. It is a secure communication system managed by Europol, dedicated to the EU law enforcement community and based on the Universal Message Format (UMF) standard. SIENA is used for exchanging personal information related to the crime areas within the mandate of Europol, including EU restricted information.

7. ELECTRONIC EVIDENCE EXCHANGE: PROPOSED STANDARDS

The requirement for a standard language to represent a broad range of forensic information and processing results has become an increasing need within the forensics community. For the electronic evidence exchange a similar need has to be addressed, even though the aim of the exchange may concern different issues, for example malware analysis, the exchange of relevant artifacts, or the comparison of tool results. Research activities conducted in this field have been used to develop and propose many languages.

CybOX (Cyber Observable eXpression) is one of the most important languages that have been recently proposed. It has been devised by MITRE along with other related languages such as CAPEC (Common Attack Pattern Enumeration and Classification), STIX (Structured Threat Information eXpression) and TAXII (Trusted Automated eXchange of Indicator Information). The use of standard languages for information exchange has also been dealt with in recent scientific contributions, published in 2014 by the European Union Agency for Network and Information Security (ENISA), in particular Actionable Information for Security Incident Response and Standards and Tools for Exchange and Processing of Actionable Information. Another relevant resource is a recent document (Casey et al., 2015) that proposed DFAX (Digital Forensic Analysis eXpression), which leverages CybOX for representing the technical information.
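To give an idea of the kind of tool-neutral description that languages such as CybOX and DFAX aim at, the sketch below serializes a single, schematic file observable; the property names are simplifications of ours and are not valid CybOX or DFAX:

```python
import json

# Schematic, CybOX-inspired description of one acquired file; the property
# names are illustrative and do NOT follow the real CybOX/DFAX schemas.
observable = {
    "type": "file-observable",
    "properties": {
        "file_name": "invoice.pdf",       # hypothetical exhibit
        "size_in_bytes": 31842,
        "hashes": {"SHA-256": "0f3a9b"},  # shortened illustrative digest
    },
    "provenance": {
        "examiner": "examiner-A",
        "tool": "any-comparable-imaging-tool",
    },
}

# Serialized to an open, tool-neutral format, the record can be re-read and
# verified by a different examiner using a different (comparable) tool.
payload = json.dumps(observable, sort_keys=True)
decoded = json.loads(payload)
assert decoded["properties"]["hashes"]["SHA-256"] == "0f3a9b"
```

The point is not the particular fields but the round trip: structured information about the evidence survives the exchange independently of the tool that produced it.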
8. ELECTRONIC EVIDENCE EXCHANGE: CHALLENGES

The regular international procedures for mutual assistance in criminal matters are time-consuming and unpredictable, but they represent, at the moment, the only way for the evidence exchange. Nevertheless, the current situation may pose obstacles to fighting serious cross-border and organized crime, especially in investigative cases where time is crucial.

Furthermore, when it comes to Electronic Evidence Exchange, a group of questions is to be borne in mind:

• What information should be exchanged?

• When may the exchange take place?

• How could the information be exchanged, also taking into consideration security issues?

• Which kinds of stakeholders are involved?

The present situation raises three main issues:

• exchange procedures may be slow: this aspect must be especially borne in mind in investigative cases where time is crucial for fighting serious cross-border and organized crime;

• exchange procedures may involve big expenses, as in the case of travelling abroad to collect the original or copied source of evidence;

• judicial and police authorities must invest a lot of money to keep up with the development of forensic technology.

In order to address these issues, a possible solution could be using a cloud environment, centralized or distributed, for exchanging/sharing evidence, where the users could be competent authorities (e.g. judicial, police, etc.) but private subjects as well. This platform could speed up the exchange procedures and it could avoid, except for special cases, travelling abroad to collect the original source of evidence. Moreover, through a digital platform, a wider cooperation could be put in place and, for example, specific technical support could be requested through the same digital platform from a police authority to another located in a different EU Member State. A more developed technological cooperation among the involved authorities could optimize costs and better distribute resources.

9. CONCLUSIONS

At the moment, there is no standard for the exchange and it is mostly human-based. Only in the case of data held by third parties is there a well-established cooperation between judicial authorities and Internet Service Providers (ISPs). In this context the exchange is managed through platforms provided by the ISPs via the web. This scenario may pose serious issues:

• exchange procedures may be slow, which must be especially borne in mind in investigative cases where time is crucial for fighting serious cross-border and organized crime;

• exchange procedures may involve big expenses, as in the case of travelling abroad to collect the original or copied source of evidence;

• judicial and police authorities must invest a lot of money to keep up with the development of forensic technology: expenses related to software updates and to keeping up personnel competencies;

• the exchange desperately needs trusted procedures and environments between the involved stakeholders.

So the way forward for the electronic evidence exchange would be to introduce a cloud environment to be used by judicial and police authorities and by private stakeholders in order to speed up the process, optimize costs and foster a more developed cooperation and trust among the involved competent authorities. Moreover, using this platform it could be possible to carry out an electronic evidence exchange using specific metadata along with the data related to the source of evidence. This metadata, expressed in an open standard language, could describe the digital evidence in a unique way and be used by software companies/producers to represent the widest range of forensic information and forensic processing results in order to share structured information between independent tools and organizations.

REFERENCES

Carrier, B. (2006). A Hypothesis-Based Approach to Digital Forensic Investigations. Center for Education and Research in Information Assurance and Security, Purdue University.

Casey, E. (2011). Digital Evidence and Computer Crime: Forensic Science, Computers, and the Internet (3rd ed.). Elsevier.

Casey, E., Back, G., & Barnum, S. (2015). Leveraging CybOX to standardize representation and
exchange of digital forensic information. Digital Investigation, 12S, 102-110. Elsevier.
Council of Europe. (2013). Electronic Evidence Guide. Retrieved on February 2015 from http://www.coe.int/t/dghl/cooperation/economicc rime/cybercrime/Documents/Electronic%20Evid ence%20Guide/default_en.asp
Daniel, L., Daniel, L. (2011). Digital Forensics for Legal Professionals. Syngress Media Inc.
ISO/IEC 27037. (2012). Guidelines for identification, collection, acquisition and preservation of digital evidence. Retrieved on March 2015 from http://www.iso.org/iso/home/store/catalogue_tc/c atalogue_detail.htm?csnumber=44381
ISO/IEC 27043. (2015). Incident investigation principles and processes. Retrieved on March 2015 from http://www.iso.org/iso/home/store/catalogue_tc/c atalogue_detail.htm?csnumber=44407
Garfinkel, S. L. (2012). Digital forensics XML and the DFXML toolset. Digital Investigation. Elsevier.
Mason, S. (2012). Electronic Evidence, third edition. LexisNexis Butterworths.
Peterson, G., Shenoi, S. (Eds.). (2012). Advances in Digital Forensics VIII. Springer.
A COLLISION ATTACK ON SDHASH SIMILARITY HASHING

Donghoon Chang, Somitra Kr. Sanadhya, Monika Singh, Robin Verma
Indraprastha Institute of Information Technology Delhi (IIIT-D), India
{donghoon,somitra,monikas,robinv}@iiitd.ac.in

ABSTRACT

Digital forensic investigators can take advantage of tools and techniques that have the capability of finding similar files out of the thousands of files up for investigation in a particular case. Finding similar files can significantly reduce the volume of data that needs to be investigated. Sdhash is a well-known fuzzy hashing scheme used for finding similarity among files; it produces a 'score of similarity' on a scale of 0 to 100. In a prior analysis of sdhash, Breitinger et al. claimed that 20% of the contents of a file can be modified without influencing the final sdhash digest of that file. They suggested that the file can be modified in certain regions, termed 'gaps', and yet the sdhash digest will remain unchanged. In this work, we show that their claim is not entirely correct. In particular, we show that if even 2% of the file contents in the gaps are changed randomly, then the sdhash digest changes with probability close to 1. We then provide an algorithm to modify the file contents within the gaps such that the sdhash digest remains unchanged even when the modifications are about 12% of the gap size. On the attack side, the proposed algorithm can deterministically produce collisions by generating many different files corresponding to a given file with a maximal similarity score of 100.
Keywords: Fuzzy hashing, similarity digest, collision, anti-forensics.
1. INTRODUCTION

The modern world has been turning increasingly digital: conventional books have been replaced by ebooks, letters have been replaced by emails, paper photographs have been replaced by digital images, and compact audio and video cassettes have been replaced by mp3 and mp4 CDs/DVDs. Due to the reducing costs of storage devices and their ever increasing size, people tend to store several (maybe slightly different) versions of a file. In case a person is suspected of some illegal activity, security agencies typically seize their digital devices for investigation. Manual forensic investigation of an enormous volume of data is hard to complete in a reasonable amount of time. Therefore, it may be helpful for an investigator to reduce the data under investigation by eliminating similar files from the suspect's hard disk. On the other hand, in some situations, the investigator might be interested in looking only at files similar to a given file in order to investigate modifications to that file.

Most forensic software packages contain tools which check for 'similarity' between files. Automatic filtering is normally done by measuring the amount of correlation between files. However, the correlation method does not work well if the adversary deliberately modifies the file in such a manner that the correlation value becomes very low. For example, a C program can be modified by changing the names of variables, writing looping constructs in a different way, adding comments, etc. Ideally, an investigator would like to efficiently know the percentage change between two versions of a file so that he can concentrate on files which are slightly different from a desired file. Using a Cryptographic Hash Function (CHF) as a digest of the file does not work in this situation, as even a single bit change in the file content
is expected to modify the entire digest randomly by the application of a CHF.

'Approximate Matching' is a technique for finding similarity among given files, typically by assigning a 'similarity score'. An approximate matching technique can be characterized into one of the following categories: Bytewise Matching, Syntactic Matching and Semantic Matching (Breitinger, Guttman, McCarrin, & Roussev, 2014). Bytewise Matching relies on the byte sequence of the digital object without considering the internal structure of the data object; these techniques are known as fuzzy hashing or similarity hashing. Syntactic Matching relies on the internal structure of the data object; it is also called Perceptual Hashing or Robust Hashing. Semantic Matching relies on the contextual attributes of the digital objects. Sdhash, proposed by Roussev (Roussev, 2010a) in 2010, is one of the most widely used fuzzy hashing schemes. It is used as a third-party module in the popular forensic toolkit 'Autopsy/Sleuth Kit' [1] and in another toolkit 'BitCurator' [2].

Breitinger et al. analyzed sdhash in (Breitinger, Baier, & Beckingham, 2012; Breitinger & Baier, 2012) and commented that "approximately 20% of the input bytes do not influence the similarity digest. Thus it is possible to do undiscovered modifications within gaps". In this work, we show that this claim is not entirely correct. We show that if data within the 'gaps' is randomly modified, then the digest changes even when the modifications are only about 2% of the 'gap size'. After that we propose an algorithm which can generate multiple files having an sdhash similarity score of 100 corresponding to a given file, by modifying up to 12% of the 'gap size'. The proposed algorithm can also be used as an anti-forensic mechanism that defeats a digital forensic investigation which filters out similar files from a given storage medium: an attacker could generate multiple dissimilar files corresponding to a particular file with a 100% matching sdhash digest using our technique.

The rest of the paper is organized as follows: we discuss related literature in §2. Notations and definitions used in the paper are provided in §3. The sdhash scheme is explained in §4 and existing analysis of the scheme is presented in §5. §6 contains our analysis and attack on sdhash, followed by our proposed algorithm. Finally, we conclude the paper in §7 and §8 by proposing solutions to mitigate our attack on sdhash.

2. RELATED WORK

The first fuzzy hashing technique, Context Triggered Piecewise Hashing (CTPH), was proposed by Kornblum (Kornblum, 2006) in his tool named ssdeep. The CTPH scheme is based on the spamsum algorithm proposed by Tridgell (Tridgell, 2002) for spam email detection. The ssdeep tool computes a digest of the given file by first dividing the file into several chunks and then concatenating the least significant 6 bits of the hash value of each chunk. A hash function named FNV is used to compute the hash of each chunk.

Chen et al. (Chen & Wang, 2008) and Seo et al. (Seo, Lim, Choi, Chang, & Lee, 2009) proposed some modifications to ssdeep to improve its efficiency and security. Baier et al. (Baier & Breitinger, 2011) presented a thorough security analysis of ssdeep and showed that it does not withstand an active adversary for blacklisting and whitelisting.

Roussev et al. (Roussev, 2009, 2010a) proposed a new fuzzy hashing scheme called sdhash. The basic idea of the sdhash scheme is to identify statistically improbable features, based on the entropy of consecutive 64-byte sequences of file data (each such sequence is called a 'feature'), in order to generate the final hash digest of the file. Breitinger et al. (Breitinger & Baier, 2012) showed some weaknesses in sdhash and presented improvements to the scheme. A detailed security and implementation analysis of sdhash was done in (Breitinger et al., 2012) by the same authors. This work uncovered several implementation bugs and showed that it is possible to beat the similarity score by tampering with a given file without changing the perceptual behavior of the file (e.g. image files look almost the same despite the tampering).

[1] http://wiki.sleuthkit.org/index.php?title=Autopsy_3rd_Party_Modules
[2] http://wiki.bitcurator.net/?title=Software
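The CTPH digest construction described above can be illustrated with a short sketch. This is an illustration of the idea only, not the real ssdeep code: real ssdeep chooses chunk boundaries with a rolling hash, whereas fixed-size chunks are used here, and the FNV-1a constants are the standard 32-bit ones.

```python
# Sketch of the CTPH/ssdeep digest idea: split the data into chunks, hash each
# chunk with FNV-1a, and keep only the least significant 6 bits of each hash,
# encoded as one base64 character.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def fnv1a_32(data: bytes) -> int:
    # Standard 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime.
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h

def ctph_sketch(data: bytes, chunk_size: int = 64) -> str:
    # Fixed-size chunks for illustration; real ssdeep uses a rolling hash to
    # pick content-defined chunk boundaries, which makes the digest robust
    # against insertions and deletions.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return "".join(B64[fnv1a_32(c) & 0x3F] for c in chunks)
```

Because only 6 bits of each chunk hash survive, editing one chunk changes at most one digest character, which is what makes digests of similar files comparable.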
3. NOTATIONS

The following notations are used throughout this work:

• D denotes the input data object of N bytes, D = B0 B1 B2 ... B_{N-1}, where B_i is the i-th byte of D.
• f_k denotes a feature, a sequence of L (= 64) consecutive bytes of D, that is f_k : B_k B_{k+1} ... B_{k+63}, where 0 ≤ k ≤ N - 64.
• #bf denotes the number of features within a bloom filter bf.
• |bf| denotes the number of bits set to one within the bloom filter bf.
• t denotes a threshold (sdhash uses t = 16).
• SFscore(bf1, bf2) denotes the similarity score of bloom filters bf1 and bf2.

4. DESCRIPTION OF SDHASH

We now describe the working of sdhash using the notation defined in §3. Given a data object D of length N bytes (B0 B1 B2 ... B_{N-1}), a feature f_k is a sequence of L (= 64) consecutive bytes of D. The Shannon entropy of the normalized byte frequency distribution nbf_k of feature f_k is

    H(nbf_k) = -Σ_x P[nbf_k = x] log2 P[nbf_k = x]

with H_max(nbf_k) = log2 α = 8 (α = 256 possible byte values) and H_min(nbf_k) = 0. The normalized entropy of nbf_k is H(nbf_k) / H_max(nbf_k) = H(nbf_k) / 8, and its range is 0 to 1. It is scaled up to the range 0 to 1000 and represented by H_norm(nbf_k):

    H_norm(nbf_k) = ⌊1000 · H(nbf_k) / 8⌋

After calculating the normalized entropy of each feature, a precedence rank is assigned to the respective feature of the data object D, based on the empirical observation of the probability density function of the normalized entropy over an experimental data set.

Let Q be the experimental data set of q data objects D1 D2 D3 ... Dq of the same type and the same size. The random variable here is the normalized entropy of the next data object's nbf_k in the set Q, represented as nenfd_Q. Let A be the set of integers from 0 to 1000, i.e. {0, 1, 2, ..., 1000}, and let t_x = Pr[nenfd_Q = x] for each x in A (so, for example, t_1000 = Pr[nenfd_Q = 1000]).

We assign a rank r_i to each t_i as follows: r_i = 1000 if t_i is the largest, and r_i = 0 if t_i is the smallest. Now each feature f_k of D is assigned a precedence rank R_prec,D(f_k) as follows:

    ∀ f_k of D: R_prec,D(f_k) = r_i, where Pr[nenfd_Q = H_norm(nbf_k)] = t_i,

where D is the given data object, n is the number of features of data object D and 0 ≤ k < n.
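The entropy normalization above can be sketched as follows. This is a minimal illustration under the definitions given in this section, not the sdhash source (which maintains the entropy incrementally as the 64-byte window slides over the file).

```python
import math
from collections import Counter

def entropy_score(feature: bytes) -> int:
    # Shannon entropy of the byte frequency distribution of a 64-byte feature,
    # normalized by H_max = log2(256) = 8 and scaled to the integer range
    # 0..1000, as in H_norm above.
    counts = Counter(feature)
    n = len(feature)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return int(1000 * h / 8)
```

Note that a 64-byte window can contain at most 64 distinct byte values, so the achievable maximum for a single feature is 1000 · log2(64)/8 = 750, even though the normalization constant is 8.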
[Figure 2: Popularity Rank Calculation, from (Roussev, 2009, 2010a)]

Now the features with R_pop,D(f_k) ≥ t (threshold) are selected (in the sdhash implementation, t = 16). The selected features are the least likely features to occur in any data object. These features are called "Statistically Improbable Features", and they will be used to generate the fingerprint of the data object D. Let {f_s0, f_s1, ..., f_sx} be the selected features, where 0 ≤ s0 < s1 < ... < sx.

Two of the implementation bugs, the 'Window size bug' and the 'Left most bug' mentioned in (Breitinger et al., 2012), still exist in the latest version 3.4 of the sdhash implementation. Listing 1 shows the implementation of the above stated bugs. At line number 13, there is an error in the first condition that causes incorrect identification of the minimum precedence rank (R_prec,D(f_k)); this is referred to as the 'Window size bug'. The error can be removed by replacing the first condition of the while loop with 'chunk_ranks[i+pop_win-1] >= min_rank'. There is another error in the if condition at line numbers 14-15 and 26-27, which has been referred to as the 'Left most bug': if two features (f_i, f_j) have equal precedence rank (R_prec,D(f_i) = R_prec,D(f_j)) and are lowest within a popularity window, then this condition causes the selection of the rightmost feature, which contradicts the proposed sdhash
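A minimal sketch of the popularity-window selection with the corrected, leftmost-minimum behavior described above. The names chunk_ranks, pop_win and t follow the text; the structure is illustrative and is not the sdhash 3.4 source.

```python
def popularity_scores(chunk_ranks, pop_win=64):
    # Slide a window of pop_win precedence ranks over the feature sequence;
    # in each window the LEFTMOST feature with the minimum rank wins one
    # popularity point (list.index returns the leftmost minimum, avoiding
    # the 'Left most bug' described above).
    scores = [0] * len(chunk_ranks)
    for i in range(len(chunk_ranks) - pop_win + 1):
        window = chunk_ranks[i:i + pop_win]
        scores[i + window.index(min(window))] += 1
    return scores

def select_features(chunk_ranks, pop_win=64, t=16):
    # Features whose popularity score reaches the threshold t (sdhash uses
    # t = 16) are the 'statistically improbable features' that go into the
    # fingerprint.
    scores = popularity_scores(chunk_ranks, pop_win)
    return [i for i, s in enumerate(scores) if s >= t]
```

With chunk_ranks = [5, 3, 3, 4] and pop_win = 2, the windows are [5,3], [3,3], [3,4]; the leftmost minimum wins each time, so position 1 scores 2 and position 2 scores 1.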