Linköping Studies in Science and Technology

Thesis No. 1361

Completing the Picture — Fragments and Back Again

by

Martin Karresand

Submitted to Linköping Institute of Technology at Linköping University in partial fulfilment of the requirements for the degree of Licentiate of Engineering

Department of Computer and Information Science Linköpings universitet SE-581 83 Linköping, Sweden

Linköping 2008

Completing the Picture — Fragments and Back Again

by

Martin Karresand

May 2008
ISBN 978-91-7393-915-7
Linköping Studies in Science and Technology
Thesis No. 1361
ISSN 0280–7971
LiU–Tek–Lic–2008:19

ABSTRACT

Better methods and tools are needed in the fight against child pornography. This thesis presents a method for file type categorisation of unknown data fragments, a method for reassembly of JPEG fragments, and the requirements put on an artificial JPEG header for viewing reassembled images. To enable empirical evaluation of the methods a number of tools based on the methods have been implemented.

The file type categorisation method identifies JPEG fragments with a detection rate of 100% and a false positives rate of 0.1%. The method uses three algorithms, Byte Frequency Distribution (BFD), Rate of Change (RoC), and 2-grams. The algorithms are designed for different situations, depending on the requirements at hand.

The reconnection method correctly reconnects 97% of a Restart (RST) marker enabled JPEG image, fragmented into 4 KiB large pieces. When dealing with fragments from several images at once, the method is able to correctly connect 70% of the fragments at the first iteration.

Two parameters in a JPEG header are crucial to the quality of the image: the size of the image and the sampling factor (actually factors) of the image. The size can be found using brute force and the sampling factors only take on three different values. Hence it is possible to use an artificial JPEG header to view all or parts of an image. The only requirement is that the fragments contain RST markers.

The results of the evaluations of the methods show that it is possible to find, reassemble, and view JPEG image fragments with high certainty.

This work has been supported by The Swedish Defence Research Agency and the Swedish Armed Forces.

Department of Computer and Information Science Linköpings universitet SE-581 83 Linköping, Sweden

Acknowledgements

This licentiate thesis would not have been written without the invaluable support of my supervisor Professor Nahid Shahmehri. I would like to thank her for keeping me and my research on track and having faith in me when the going has been tough. She is a good role model and always gives me support, encouragement, and inspiration to bring my research forward.

Many thanks go to Helena A, Jocke, Jonas, uncle Lars, Limpan, Micke F, Micke W, Mirko, and Mårten. Without hesitation you let me into your homes through the lenses of your cameras. If a picture is worth a thousand words, I owe you more than nine million!

I also owe a lot of words to Brittany Shahmehri. Her prompt and thorough proof-reading has indeed increased the readability of my thesis.

I would also like to thank my colleagues at the Swedish Defence Research Agency (FOI), my friends at the National Laboratory of Forensic Science (SKL) and the National Criminal Investigation Department (RKP), and my fellow PhD students at the Laboratory for Intelligent Information Systems (IISLAB) and the Division for Database and Information Techniques (ADIT). You inspired me to embark on this journey. Thank you all, you know who you are!

And last but not least I would like to thank my beloved wife Helena and our lovely newborn daughter. You bring happiness and joy to my life.

Finally I acknowledge the financial support from FOI and the Swedish Armed Forces.

Martin Karresand

Linköping, 14th April 2008

Contents

1 Introduction
  1.1 Motivation
  1.2 Problem Formulation
  1.3 Contributions
  1.4 Scope
  1.5 Outline of Method
  1.6 Outline of Thesis

2 Identifying Fragment Types
  2.1 Common Algorithmic Features
    2.1.1 Centroid
    2.1.2 Length of data atoms
    2.1.3 Measuring Distance
  2.2 Byte Frequency Distribution
  2.3 Rate of Change
  2.4 2-Grams
  2.5 Evaluation
    2.5.1 Microsoft Windows PE files
    2.5.2 Encrypted files
    2.5.3 JPEG files
    2.5.4 MP3 files
    2.5.5 Zip files
    2.5.6 Algorithms
  2.6 Results
    2.6.1 Microsoft Windows PE files
    2.6.2 Encrypted files
    2.6.3 JPEG files
    2.6.4 MP3 files
    2.6.5 Zip files
    2.6.6 Algorithms

3 Putting Fragments Together
  3.1 Background
  3.2 Requirements
  3.3 Parameters Used
    3.3.1 Background
    3.3.2 Correct decoding
    3.3.3 Non-zero frequency values
    3.3.4 Luminance DC value chains
  3.4 Evaluation
    3.4.1 Single image reconnection
    3.4.2 Multiple image reconnection
  3.5 Result
    3.5.1 Single image reconnection
    3.5.2 Multiple image reconnection

4 Viewing Damaged JPEG Images
  4.1 Start of Frame
  4.2 Define Quantization Table
  4.3 Define Huffman Table
  4.4 Define Restart Interval
  4.5 Start of Scan
  4.6 Combined Errors
  4.7 Using an Artificial JPEG Header
  4.8 Viewing Fragments

5 Discussion
  5.1 File Type Categorisation
  5.2 Fragment Reconnection
  5.3 Viewing Fragments
  5.4 Conclusion

6 Related Work

7 Future Work
  7.1 The File Type Categorisation Method
  7.2 The Image Fragment Reconnection Method
  7.3 Artificial JPEG Header

Bibliography

A Acronyms

B Hard Disk Allocation Strategies

C Confusion Matrices

List of Figures

2.1 Byte frequency distribution of .exe
2.2 Byte frequency distribution of GPG
2.3 Byte frequency distribution of JPEG with RST
2.4 Byte frequency distribution of JPEG without RST
2.5 Byte frequency distribution of MP3
2.6 Byte frequency distribution of Zip
2.7 Rate of Change frequency distribution for .exe
2.8 Rate of Change frequency distribution for GPG
2.9 Rate of Change frequency distribution for JPEG with RST
2.10 Rate of Change frequency distribution for MP3
2.11 Rate of Change frequency distribution for Zip
2.12 2-gram frequency distribution for .exe
2.13 Byte frequency distribution of GPG with CAST5
2.14 ROC curves for Windows PE files
2.15 ROC curves for an AES encrypted file
2.16 ROC curves for files JPEG without RST
2.17 ROC curves for JPEG without RST; 2-gram algorithm
2.18 ROC curves for files JPEG with RST
2.19 ROC curves for MP3 files
2.20 ROC curves for MP3 files; 0.5% false positives
2.21 ROC curves for Zip files
2.22 Contour plot for a 2-gram Zip file centroid

3.1 The frequency domain of a data unit
3.2 The zig-zag ordering of a data unit traversal
3.3 The scan part binary format coding

4.1 The original undamaged image
4.2 The Start Of Frame (SOF) marker segment
4.3 Quantization tables with swapped sample rate
4.4 Luminance table with high sample rate
4.5 Luminance table with low sample rate
4.6 Swapped chrominance component identifiers
4.7 Swapped luminance and chrominance component identifiers
4.8 Moderately wrong image width
4.9 The Define Quantization Table (DQT) marker segment
4.10 Luminance DC component set to 0xFF
4.11 Chrominance DC component set to 0xFF
4.12 The Define Huffman Table (DHT) marker segment
4.13 Image with foreign Huffman tables definition
4.14 The Define Restart Interval (DRI) marker segment
4.15 Short restart interval setting
4.16 The Start Of Scan (SOS) marker segment
4.17 Luminance DC Huffman table set to chrominance ditto
4.18 Complete exchange of Huffman table pointers
4.19 A correct sequence of fragments
4.20 An incorrect sequence of fragments

5.1 Possible fragment parts

List of Tables

2.1 Camera make and models for JPEG with RST
2.2 Camera make and models for JPEG without RST
2.3 MP3 files and their encoding
2.4 Algorithm base names used in evaluation
2.5 File type information entropy
2.6 Centroid base names used in evaluation
2.7 2-gram algorithm confusion matrix
2.8 Large GPG centroid 2-gram algorithm confusion matrix

3.1 Multiple image reconnection evaluation files
3.2 Results for the image fragment reconnection method; single images
3.3 Results for the image fragment reconnection method; multiple images

4.1 Relation between image width and height

B.1 Data unit allocation strategies

C.1 Centroid base names used in evaluation
C.2 Confusion matrix: 2-gram algorithm
C.3 Confusion matrix: BFD with JPEG rule set
C.4 Confusion matrix: BFD and RoC with JPEG rule set
C.5 Confusion matrix: BFD and RoC with Manhattan dist. metric
C.6 Confusion matrix: BFD and RoC
C.7 Confusion matrix: BFD
C.8 Confusion matrix: BFD and RoC with JPEG rule set and signed values
C.9 Confusion matrix: BFD and RoC using signed values and Manhattan distance metric
C.10 Confusion matrix: BFD and RoC using signed values
C.11 Confusion matrix: RoC with JPEG rule set and signed values
C.12 Confusion matrix: RoC using Manhattan distance metric and signed values
C.13 Confusion matrix: RoC using signed values
C.14 Confusion matrix: RoC with JPEG rule set
C.15 Confusion matrix: RoC using Manhattan distance metric
C.16 Confusion matrix: RoC

Chapter 1

Introduction

The work presented in this thesis is directed at categorising and reconnecting Joint Photographic Experts Group (JPEG) image fragments, because these capabilities are important when searching for illegal material. In this chapter we describe the motivation for our research, state the research problem and scope, present the contributions of our work, and finally draw the outline of the thesis.

1.1 Motivation

In a paper from the US Department of Justice [1, p. 8] it is stated that

The Internet has escalated the problem of child pornography by increasing the amount of material available, the efficiency of its distribution, and the ease of its accessibility.

Due to the increasing amount of child pornography the police need to be able to scan hard disks and networks for potential illegal material more efficiently [2, 3, 4]. Identifying the file type of fragments from the data itself makes it unnecessary to have access to the complete file, which will speed up scanning. The next requirement, then, is to make it possible to determine whether an image fragment belongs to an illegal image, and to do that by viewing the partial picture it represents. In this way the procedure becomes insensitive to image modifications.

The fact that it is possible to make identical copies of digital material, something which cannot be done with physical entities, separates digital evidence from physical evidence. The layer of abstraction introduced by the digitisation of an image also simplifies concealment, for example by fragmenting a digital image into small pieces hidden in other digital data. A criminal knowing where to find the fragments and in which order they are stored can reconnect them. In this way he or she can recreate the original image without loss of quality, which is not true for a physical image.


The abstract nature of digital material makes it harder for the police to find illegal images, as well as connect such images to a perpetrator. The police are constantly trying to catch and prosecute the owners of illegal material, but are fighting an “uphill battle” [5, 6]. The new technology makes it easier for the criminals to create and distribute the material, and also provides a higher degree of anonymity. Consequently there is a need for tools that are able to work with fragmented data, and especially digital image files, transported over a network or held on some kind of storage media.

Regarding the imminent problem of keeping the amount of digital child pornography at bay, the police need tools that are able to discover image data in all possible situations, regardless of the state of the image data, even at the lowest level of binary data fragments. Preferably the tools should be able to quickly and accurately identify the file type of data fragments and then combine any image fragments into (partial) pictures again.

1.2 Problem Formulation

The procedure for recovering lost data is called file carving and can be used in a wide range of situations, for example, when storage media have been corrupted. Depending on the amount of available file recovery metadata¹ the task can be very hard to accomplish. At the web page of the 2007 version of the annual forensic challenge issued by the Digital Forensic Research Workshop (DFRWS) it is stated that [7]:

Many of the scenarios in the challenge involved fragmented files where fragments were sequential, out of order, or missing. Existing tools could not handle these scenarios and new techniques had to be developed.

DFRWS reports that none of the submissions to the forensic challenge of year 2007 completely solve the problems presented.

The main characteristic of a fragmented data set is its lack of intuitive structure, i.e. the data pieces are randomly scattered. The structuring information is held by the file metadata. The metadata of files on a hard disk come in the form of a file system. The file system keeps track of the name and different time stamps related to the use of a file. It also points to all hard disk sectors making up a file. Since files often require the use of several sectors on a hard disk and the size of a file often changes during its lifetime, files are not always stored in consecutive sectors. Instead they become fragmented, even though current file systems use algorithms that minimise the risk of fragmentation [8] (see also Appendix B). When a hard disk crashes and corrupts the file system, or even when a file is deleted, the pointers to the sectors holding a file are lost, and hence the fragments that moments before together formed a file now only are pieces of random data.

¹ In this thesis we define file recovery metadata to be data indirectly supporting the file recovery procedure by structuring the data to be recovered.


In the case of network traffic the structuring information is often held by the Transmission Control Protocol (TCP) header information, or possibly by a protocol at a higher layer in the network stack. When monitoring network traffic for illegal material a full TCP session is required to read a transferred file and its content, but the hardware and different protocols used in a network often fragment files. The routing in a network causes the packets to be transported along different paths, which limits the amount of data that is possible to collect at an arbitrary network node, because there is no guarantee that all fragments of a file pass that particular node. If parts of a TCP session are encountered it is possible to correctly order the network packets at hand, but parts of the transferred data are lost and hence we have the same situation as for a corrupted file system.

To the best of our knowledge current tools either use parts of a file system to find files and categorise their content or use metadata in the form of header or footer information. The tools are built on the assumption that files are stored in consecutive hard disk sectors. When a tool has identified the start and probable end of a file all hard disk sectors in between are extracted. In other words, the tools are built on qualified guesses based on high level assumptions about how the data would usually be structured, without really using the information residing in the data itself.

A JPEG image is a typical example showing the structure of a file. The image file consists of a file header containing image metadata and tables for decoding, a section of compressed picture data, and a finalising file footer. The picture data part consists of a stream of raw data corresponding to a horizontal and downwards traversal of the image pixels. To allow a JPEG image file to be correctly decoded the picture data has to be properly ordered. Since existing file carving tools rely on file headers and on data being stored in consecutive order, it is currently not possible to reassemble a fragmented JPEG image.

The problems related to file carving of JPEG images give rise to the following research questions:

• How can file fragments be categorised without access to any metadata?

• How can fragmented files be reassembled without access to any metadata?

• What techniques can be used to enable viewing of a reassembled JPEG image?

  – How much can be seen without access to a JPEG header?
  – Can an artificial JPEG header be used?
  – What requirements are there on a working JPEG header?

These research questions cover the process of identifying, reassembling and viewing fragmented JPEG images, without having access to anything but the fragments themselves.


1.3 Contributions

The overall aims of our research are to explore what information can be extracted from fragments of high entropy data², what parameters govern the amount of information gained, and finally to use the parameters found to develop effective, efficient and robust methods for extracting the information. The research includes studies of the parameters needed to extract different amounts of information, and at which level of abstraction that can be done. The ultimate goal is to be able to reconnect any identified file fragments into the original file again, if at all possible.

The main contributions of our work lie within the computer forensics and data recovery area, exemplified by fragmented JPEG image files. We have developed and evaluated a method³ to categorise the file type of unknown digital data fragments. The method currently comprises three algorithms, which are used both as stand alone algorithms and in combination. The algorithms are based on separate parameters found to increase the ability to find the file type of unknown data fragments.

To supplement the file type categorisation algorithms, we have studied parameters that enable reconnection of fragments of a file, making it possible to rebuild a fragmented image file. The method to reconnect fragments is accompanied by experimental results showing the importance of different parameters in the JPEG header fields for the viewing of a restored image, or even a partial image. This enables us to create generic JPEG file headers to be added to the recovered images, if there are no proper header parts to be found among the fragments. The details of our contributions are as follows:

• We present two algorithms, Byte Frequency Distribution (BFD) [9] and Rate of Change (RoC) [10], which use parameters in the form of statistical measures of single byte relations and frequencies to identify the file type of unknown digital data fragments. The algorithms do not require any metadata to function and execute in linear time. They are well suited for file types without any clear structure, such as encrypted files.

• A third algorithm, called the 2-gram algorithm [11, 12], which uses parameters in the form of statistical properties of byte pairs to identify the file type of unknown digital data fragments. The 2-gram algorithm is suitable for situations where a high detection rate in combination with a low false positives rate is preferable, and a small footprint and fast execution is of less importance. The algorithm does not need any metadata to function and is well suited for file types with some amount of structure. Its high sensitivity to structure can be used to find hidden patterns within file types.

² We define high entropy data to be binary files in the form of, for example, compiled source code, compressed data, and encrypted information.
³ The method is called "the Oscar method" in our previously published papers [9, 10, 11, 12].


• We introduce a method to find and calculate a value indicating the validity of two JPEG data fragments being consecutive. Several parameters work in tandem to achieve this, the most important parameter being the DC luminance value chain of an image. The method is currently implemented for JPEG data with Restart (RST) markers, which are used to resynchronise the data stream and the decoder in case of an error in the data stream. By using the method repeatedly digital JPEG image files can be rebuilt as long as all fragments of the images are available. If only a sub-set of all fragments of an image is available, the correct order of the fragments at hand can be found in most cases.

• We explore what impact modifications to fields in a JPEG header have on a displayed image. The results of these experiments can be used to create a generic JPEG file header making it possible to view the result of the JPEG image fragment reconnection method, even for only partially recovered images.

The parameters the algorithms are built on are applied to fragmented JPEG image files in this thesis, to help improve the police’s ability to search for child pornography. The parameters and methodologies are generalisable and may be applied to other file types as well with minor modifications. Our work therefore indirectly contributes to a wide area of practical applications, not only to JPEG images and child pornography scanning.

1.4 Scope

The scope of the work presented in this thesis covers identification of JPEG image fragments and reconnection of JPEG image fragments containing RST markers. The images used for the research are produced by digital cameras and a scanner. None of the images are processed by any image manipulation application or software, apart from the cameras' internal functions and the scanner's software.

The set of images has been collected from friends and family and consists of typical family photographs. The contributors have selected the images themselves and approved them to be used in our research. We have allowed technical imperfections such as blur, extreme low and high key, and large monochrome areas to help improve the robustness of the developed methods.

The cameras used for the research are typical consumer range digital cameras, including one Single Lens Reflex (SLR) camera. The built-in automatic exposure modes of the cameras have been used to different extents; we have not taken any possible anomalies originating from that fact into account.

The JPEG images used to develop the methods presented in the thesis are of the non-differential Huffman entropy coded baseline sequential Discrete Cosine Transform (DCT) JPEG type. The images adhere to the International Telegraph and Telephone Consultative Committee (CCITT) Recommendation T.81 [13] standard, which is the same as the International Organization for


Standardization (ISO)/International Electrotechnical Commission (IEC) International Standard 10918-1, and their accompanying extensions [14, 15, 16, 17]. Consequently the file type categorisation and fragment reassembly methods also follow that standard, where applicable.

1.5 Outline of Method

The work in this thesis can be divided into three main parts:

• categorisation of the file type of unknown fragments,

• reassembly of JPEG image fragments, and

• the requirements put on an artificial JPEG image header.

The method comprising the above-mentioned steps can be compared to the method used to piece together a large jigsaw puzzle. First the pieces are sorted into groups of sky, grass, sea, buildings, etc. Then the individual features of the pieces are used to decide how they should be fitted together. Lastly the wear and tear of the pieces governs if it is possible to correctly fit them together.

The method for file type categorisation utilises the byte frequency distribution, the frequency distribution of the derivative of a sequence of bytes, and the frequency distribution of byte pairs to categorise the file type of binary fragments. A model representing a specific file type is compared to an unknown fragment and the difference between the model and the fragment is measured and weighted by the standard deviation of each frequency value. Several models are compared to the unknown sample, which is then categorised as being of the same file type as the closest model.

Successful reassembly of a set of JPEG image fragments is achieved by using repetitions in the luminance DC value chain of an image. When two fragments are correctly connected vertical lines are repeated at an interval equalling 1/8 of the image width in pixels. The reassembly method also uses parameters related to the decoding of the image data. By measuring the validity of a joint between two fragments and then minimising the cost of a sequence of connected fragments, a correct image can be rebuilt with high probability.

The third part identifies the parameters which are crucial to a proper decoding of an image. These parameters are then used to form an artificial JPEG image header. The effects of different kinds of manipulation of the header fields are also explored.

1.6 Outline of Thesis

Chapter 1: The current chapter, motivating the research and outlining the research problems, contributions, scope, outline of the work and organisation of the thesis.


Chapter 2: Presents a method to identify the file type of unknown data fragments. The three different algorithms used by the file type categorisation method, BFD, RoC, and 2-grams, are presented and evaluated.

Chapter 3: The JPEG image fragment reconnection methodology is presented and evaluated.

Chapter 4: The possibility of viewing images damaged in some way is discussed. The damage can come from erroneous header information or only parts of the scan being reconnected, hence leading to a mismatch between the header and scan parts.

Chapter 5: The research and results are discussed in this chapter, focusing on alternative methods, limitations, problems and their (possible) solutions.

Chapter 6: This chapter gives an overview of the related work in the file carving and fragmented image file reconstruction arenas. The chapter also covers the similarities and differences between the related work and the research presented in this thesis.

Chapter 7: Presents the conclusions to be drawn from the material presented in the thesis, together with subjects left as future work.


Chapter 2

Identifying Fragment Types

This chapter presents three algorithms, Byte Frequency Distribution (BFD), Rate of Change (RoC), and 2-grams, which are part of the method for file type categorisation of data fragments. The algorithms are all based on statistical measures of frequency (mean and standard deviation). They are used to create models, here called centroids [18, p. 845], of different file types. The similarity between an unknown fragment and a centroid is measured using a weighted variant of a quadratic distance metric. An unknown fragment is categorised as belonging to the same type as the closest centroid. Alternatively a single centroid is used together with a threshold; if the distance is below the threshold the fragment is categorised as being of the same file type as the centroid represents.

Hard disks store data in sectors, which are typically 512 bytes long. The operating system then combines several sectors into clusters, sometimes also called blocks. Depending on the size of the partition on the hard disk, clusters can vary in size, but currently the usual cluster size is 4 KiB, which is often also the size of pages in Random Access Memory (RAM). The file type categorisation method is currently set to use data blocks of 4 KiB, but this is not a requirement; the method should be able to handle fragments as small as 512 B without becoming unstable.

2.1 Common Algorithmic Features

The three algorithms share some basic features, which will be discussed in this section. These features are the use of a centroid as a model of a specific file type, small sized data atoms, and a weighted (quadratic) distance metric.

The BFD and RoC algorithms have been tested with some alternative features. These features are presented together with each algorithm specification.

2.1.1 Centroid

The centroid is a model of a selection of characteristics of a specific file type and contains statistical data unique to that file type. In our case the centroid consists of two vectors representing the mean and standard deviation of each byte's or byte pair's frequency distribution, or the frequency distribution of the difference between two consecutive byte values.

To create a centroid for a specific file type we use a large number of known files of that type. The files are concatenated into one file. The resulting file is then used to calculate the mean and standard deviation of the byte value count for a 4 KiB large fragment. The 2-gram method uses 1 MiB large fragments and the results are scaled down to fit a 4 KiB fragment.

The aim is to model a real world situation as closely as possible. We therefore collect all files used in the experiments from standard office computers, the Internet, and private consumer digital cameras. Some files were cleaned of unwanted impurities in the form of other file types. This is described in detail in the following sections.
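As an illustration of the centroid creation just described, the following minimal Python sketch (not the thesis implementation; the file name is hypothetical) computes the per-byte mean and standard deviation over 4 KiB fragments of a concatenated training file:

    import numpy as np

    FRAGMENT_SIZE = 4096  # 4 KiB, matching common cluster and RAM page sizes

    def build_centroid(training_file):
        """Per-byte-value mean and standard deviation over 4 KiB fragments."""
        histograms = []
        with open(training_file, "rb") as f:
            while True:
                block = f.read(FRAGMENT_SIZE)
                if len(block) < FRAGMENT_SIZE:
                    break  # skip the trailing partial fragment
                counts = np.bincount(np.frombuffer(block, dtype=np.uint8),
                                     minlength=256)
                histograms.append(counts)
        histograms = np.array(histograms)
        return histograms.mean(axis=0), histograms.std(axis=0)

    # mean_vec, std_vec = build_centroid("jpeg_training.dat")  # hypothetical file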

2.1.2 Length of data atoms

An n-gram is an n-character long sequence where all characters belong to the same alphabet of size s, in our case the American Standard Code for Information Interchange (ASCII) alphabet giving s = 256. The byte frequency distribution is derived by finding the frequency with which each unique n-gram appears in a fragment. These frequencies then form a vector representing the sample to be compared to a centroid. The data the file type categorisation method uses is in the form of 1-grams and 2-grams, i.e. bytes and byte pairs.

The order of the bytes in the data is not taken into consideration when using a 1-gram method. This fact gives such methods a higher risk of false positives than methods using longer sequences of bytes. By considering the order of the bytes we decrease the risk of false positives from disk clusters having the same byte frequency distribution as the data type being sought, but a different structure. Using larger n-grams increases the amount of ordering taken into consideration and is consequently a nice feature to use. However, when using larger n-grams the number of possible unique grams, b, also increases. The size of b depends on whether the n-grams have a uniform probability distribution or not. The maximum value, b = s^n, is required when the distribution is uniform, because then all possible n-grams will be needed. Therefore the execution time and memory footprint of the 2-gram algorithm are increased exponentially, compared to the 1-gram BFD and RoC algorithms, for high entropy file types.

The BFD and RoC algorithms use a 256 character long alphabet, which is well suited for 4 KiB large data fragments. Using 2-grams may require b = 65536, otherwise not all possible 2-grams can be accounted for. Consequently training on data blocks less than 64 KiB in size affects the quality of the 2-gram centroid. We therefore use 1 MiB large data blocks when collecting the statistics for the 2-gram centroid. The mean and standard deviation values are then scaled down to match a fragment size of 4 KiB. Since the scan part of a JPEG has a fairly even byte frequency distribution this method gives a mean value close to 1 for most of the 2-grams. Although the frequency count is incremented integer-wise, and thus every hit will have a large impact on the calculations of the distance, the 2-gram algorithm performs well for JPEG and MPEG 1, Audio Layer-3 (MP3) file fragments.

2.1.3 Measuring Distance

The distance metric is the key to a good categorisation of the data. Therefore we use a quadratic distance metric, which we extend by weighting the difference of each individual byte frequency with the same byte's standard deviation. In this way we focus the algorithm on the less varying, and hence more important, features of a centroid.

The metric measures the difference between the mean value vector, \vec{c}, of the centroid and the byte frequency vector, \vec{s}, of the sample. The standard deviation of byte value i is represented by σ_i and to avoid division by zero when σ_i = 0 a smoothing factor α = 0.000001 is used. The metric is described as

    d(\vec{s}, \vec{c}) = \sum_{i=0}^{n-1} (s_i - c_i)^2 / (\sigma_i + \alpha).        (2.1)

The advantage of using a more computationally heavy quadratic-based metric over a simpler linear-based metric is the quadratic-based method's ability to strengthen the impact of a few large deviations over many small deviations. Assuming the vector sums, \|\vec{c}\|_1 and \|\vec{s}\|_1, are constant, Equation (2.1) gives lower distance values for two vectors separated by many small deviations than for two vectors separated by a few large deviations. A linear-based method gives the same distance, regardless of the size of the individual deviations, as long as the vector sums remain the same. Since we are using 4 KiB blocks of data the vector sums \|\vec{c}\|_1 = \|\vec{s}\|_1 = 4096 and consequently we have to use a quadratic distance metric to achieve decent categorisation accuracy.

Some file types generate fairly similar histograms and it is therefore necessary to know which byte codes are more static than others to discern the differences in spectrum between, for example, an executable file and a JPEG image. We therefore have to use the more complex method of looking at the mean and standard deviation of individual byte codes, instead of calculating two single values for the vector.
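Expressed in code, Equation (2.1) amounts to a few lines. The following Python sketch (not the thesis implementation) assumes the sample and centroid vectors are NumPy arrays of length 256, for example produced as in the centroid sketch above:

    import numpy as np

    ALPHA = 0.000001  # smoothing factor, avoids division by zero

    def weighted_quadratic_distance(sample, mean_vec, std_vec):
        """Quadratic distance between a sample histogram and a centroid,
        weighted by the centroid's standard deviation (Equation 2.1)."""
        diff = sample.astype(float) - mean_vec
        return np.sum(diff * diff / (std_vec + ALPHA))

The fragment is then categorised as the file type of the closest centroid, or accepted against a single centroid when the distance falls below a threshold.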

2.2 Byte Frequency Distribution

The byte frequency distribution algorithm counts the number of occurrences of each byte value between 0 and 255 in a block of data. We use the mean and standard deviation of each byte value count to model a file type. The mean values are compared to the byte count of an unknown sample and the differences are squared and weighted by the standard deviations of the byte counts in the model. The sum of the differences is compared to a predefined threshold and the sample is categorised as being of the same type as the modelled file if the sum of the differences is less than the threshold.


The size of the data blocks is currently 4 KiB to match the commonly used size of RAM pages and disk clusters. The method does not require any specific data block size, but a block size of 512 bytes is recommended as a lower bound to avoid instability in the algorithm.

A special JPEG extension is used together with the BFD and RoC algorithms. This extension utilises some of the JPEG standard specifics regulating the use of special marker segments in JPEG files. By counting the number of occurrences of some markers we can significantly improve the detection rate and lower the false alarm rate of the algorithm. The extension is implemented as a rule set to keep track of marker sequences. If the algorithm finds any pair of bytes representing a disallowed marker segment in a fragment, the fragment will not be categorised as JPEG.

In Figures 2.1–2.6 the byte frequency distributions of the centroids of five different file types, Microsoft Windows Portable Executable (PE) files (.exe), GNU Privacy Guard (GPG) encrypted files (GPG), JPEG images (JPEG), MP3 audio files (MP3), and Zip compressed files (Zip), are shown. Please observe that the JPEG histogram is based only on the data part of the files used for the centroid; the other centroids are based on complete, although cleaned, files. Also note that the scaling of the Y-axis differs between the figures. This is done to optimise readability. As can be seen, the histogram of the executable files is different from the compressed GPG, JPEG, MP3, and Zip files. Even though the histograms of the compressed files are fairly similar, they still differ, with the most uniform histogram belonging to the GPG type, then the Zip, MP3, and last the JPEG.

The executable files used to create this centroid were manually cleaned of JPEG, Graphics Interchange Format (GIF), and Portable Network Graphics (PNG) images and Figure 2.1 should therefore give an idea of what the executable parts of Windows PE files look like. There are however both tables and string parts left in the data used to create the centroid, which makes the centroid more general and less exact. The significantly higher rates of 0x00¹ and 0xFF for the executable file centroid are worth noticing. The printable ASCII characters remaining in the compiled code can be seen in the byte value range of 32 to 126. There is a segment with lower variation in the approximate byte value range of 145 to 190.

Figure 2.2 shows a diagram of the byte frequency distribution of an encrypted file. In this case the file is encrypted using GPG 1.4.6 and the Advanced Encryption Standard (AES) algorithm with a 128 bit long key. The frequency distribution is almost flat, which is an expected feature of an encrypted file.

The JPEG histogram in Figure 2.3 shows an increase in the number of 0xFF bytes, because of the RST markers used. The JPEG centroid without RST markers in Figure 2.4 does not show the same increased frequency level at the highest byte value as can be seen in Figure 2.3. There is also a noticeably larger amount of 0x00 in both centroids. This is because of the zero value required to follow a non-marker 0xFF. Hence the byte values are fairly evenly distributed in the raw

¹ Throughout the thesis hexadecimal values will be denoted as 0xYY, where YY is the hexadecimal value of a byte.
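To illustrate the kind of rule the JPEG extension described above can enforce, the sketch below (an assumption about one plausible rule, not necessarily the exact rule set used in the thesis) rejects a fragment if it contains a 0xFF byte followed by anything other than a stuffed 0x00, a RST marker (0xD0 to 0xD7), or the End Of Image marker (0xD9), since no other marker segments are allowed inside the entropy coded part of a JPEG scan:

    # Byte values allowed to follow 0xFF inside JPEG scan data:
    # 0x00 (byte stuffing), 0xD0-0xD7 (RST markers), 0xD9 (End Of Image).
    ALLOWED_AFTER_FF = set(range(0xD0, 0xD8)) | {0x00, 0xD9}

    def violates_jpeg_scan_rules(fragment):
        """Return True if the fragment contains a marker sequence that is
        disallowed in the entropy coded (scan) part of a JPEG file."""
        for i in range(len(fragment) - 1):
            if fragment[i] == 0xFF and fragment[i + 1] not in ALLOWED_AFTER_FF:
                return True
        return False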



Figure 2.1: The byte frequency distribution of a collection of executable Microsoft Windows PE files. A logarithmic scale is used for the Y-axis and the frequency values correspond to 4 KiB of data.


Figure 2.2: The byte frequency distribution of a large file encrypted with GPG using the AES algorithm with a 128 bit long key. A logarithmic scale is used for the Y-axis and the frequency values correspond to 4 KiB of data.



Figure 2.3: The byte frequency distribution of the data part of a collection of JPEG images containing RST markers. A logarithmic scale is used for the Y-axis and the frequency values correspond to 4 KiB of data.

JPEG coding and the extra zeroes added are clearly visible in a histogram.

The appearance of an MP3 file centroid can be seen in Figure 2.5. The byte frequency distribution diagram was created from 12 MP3 files stripped of their IDentify an MP3 (ID3) tags. The use of a logarithmic scale for the Y-axis enhances a slight saw tooth form of the curve.

The byte frequency diagram for a Zip file centroid, which can be seen in Figure 2.6, has a clearly visible saw tooth form, which is enhanced by the scaling; all values in the plot fit into the range between 14.3 and 18.6. The reason for the saw tooth form might be the fact that the Huffman codes used for the DEFLATE algorithm [19, p. 7] are given values in consecutive and increasing order. The peaks in the plot lie at hexadecimal byte values ending in 0xF, i.e. a number of consecutive 1s. The Huffman codes have different lengths, where shorter codes are more probable, and our experience is that they more often combine into a number of consecutive 1s rather than 0s.

2.3 Rate of Change

We define the Rate of Change (RoC) as the value of the difference between two consecutive byte values in a data fragment. The term can also be defined as the value of the derivative of a byte stream in the form of a data fragment. By utilising the derivative of a byte stream, the ordering of the bytes is to some



Figure 2.4: The byte frequency distribution of the data part of a collection of JPEG images without RST markers. A logarithmic scale is used for the Y-axis and the frequency values correspond to 4 KiB of data.


Figure 2.5: The byte frequency distribution of a collection of MP3 files. A logarithmic scale is used for the Y-axis and the frequency values correspond to 4 KiB of data.



Figure 2.6: The byte frequency distribution of a collection of compressed Zip files. A logarithmic scale is used for the Y-axis and the frequency values correspond to 4 KiB of data.

extent taken into consideration, but the algorithm cannot tell what byte values give a specific rate of change, apart from a rate of change of 255, of course.

The frequency distribution of the rate of change is used in the same way as the byte frequency distribution in the BFD algorithm. A centroid is created, containing a mean vector and a standard deviation vector. The centroid is then compared to a sample vector by measuring the quadratic distance between the sample and the centroid, weighted by the standard deviation vector of the centroid, as described in Equation (2.1).

The original RoC algorithm [10] looks at the absolute values of the difference in byte value between two consecutive bytes. Hence the algorithm cannot tell whether the change is in a positive or negative direction, i.e. a positive or negative derivative of the byte stream. The number of rate of change values of a model and a sample are compared using the same weighted sum of squares distance metric as the BFD algorithm uses (see Equation (2.1)). We also experiment with an alternative distance metric using the 1-norm of the differences between a model and a sample. We have further extended the RoC algorithm by using signed rate of change values (difference values), as well as by adding the JPEG rule set extension from the BFD algorithm (see Section 2.2).

The reason for originally using the absolute value of the byte value difference and not a signed value is that since the difference value range is bounded, the sum of the signed byte value differences will be equal to the difference between the first and last byte in a sequence. By using the absolute value of the differences we

get an unbounded sequence difference sum, and can in that way use the value as a similarity measure, if needed. We therefore have made the decision that the loss of resolution induced by the absolute value is acceptable. Another reason is that the higher resolution of a signed value difference possibly affects the statistical metrics used, because the statistics are based on half the amount of data. There is also a risk of false negatives if the centroid becomes too specialised and strict.

To improve the detection ability of the RoC algorithm the standard deviation of the centroid can be used. If we assume a data fragment with a fully random and uniform byte distribution it would have a histogram described by y = 256 - x, where y is the number of rate of changes of a certain value x for x = [1, 2, ..., 255]. When the byte distribution becomes ordered and less random the histogram would become more disturbed, giving a standard deviation σ larger than zero. Therefore the value of σ for the sample and centroid vectors could be used as an extra measure of their similarity.

Figure 2.7 to Figure 2.11 show the frequency distribution of the rate of change for five different file types. As can be seen in the figures, there are differences between the curves of the compressed (JPEG, GPG, MP3 and Zip) files, and the executable Windows PE files. It is also possible to see that the Zip file has a smoother curve, i.e. lower standard deviation, than the JPEG file. The reasons for the bell shaped curves are the logarithmic scale of the Y-axis together with the fact that, assuming a fully random and uniform byte distribution, the probability of difference x is described by

    p(x) = \frac{256 - x}{(256 + 1) \cdot 256 / 2}; \quad x = [0, 1, \ldots, 255].

Hence there are more ways to get a difference value close to zero, than far from zero. The sum of the negative RoC frequencies cannot differ from the sum of the positive values by more than 255, i.e.

    -255 \le \sum_{1}^{255} \mathrm{RoC}^{-} - \sum_{1}^{255} \mathrm{RoC}^{+} \le 255

Executable files contain padding where the same byte value is used repeatedly. This can be seen in Figure 2.7 as the peak at RoC value 0. There are also two noticeable large negative RoC values, which are mirrored to a lesser extent on the positive side. The large positive value frequencies are more evenly spread over the spectrum than the corresponding negative values, but otherwise the RoC frequency plot is balanced.

The encrypted file RoC plot in Figure 2.8 shows a perfect bell curve, which also is the expected result. Had there been any disturbances in the plot they would have been an indication of a weakness or bug in the implementation of the encryption algorithm.

In Figure 2.9 we can see the large amount of 0xFF00 in a JPEG stream. The plot shows a stream containing RST markers, which are indicated by the small peak starting at -47 and ending at -40. The four peaks at the positive side are



Figure 2.7: The frequency distribution of the Rate of Change for a collection of executable Windows PE files. The Y-axis is plotted with a logarithmic scale and the frequency values correspond to 4 KiB of data.


Figure 2.8: The frequency distribution of the Rate of Change for a large file encrypted with GPG. The Y-axis is plotted with a logarithmic scale and the frequency values correspond to 4 KiB of data.



Figure 2.9: The frequency distribution of the Rate of Change for the data part of a collection of JPEG images containing RST markers. The Y-axis is plotted with a logarithmic scale and the frequency values correspond to 4 KiB of data.

indications of the bit padding of 1s preceding an 0xFF value. They give the preceding byte a limited number of possible values and consequently the frequencies of a number of RoC values are increased.

The MP3 centroid in Figure 2.10 has significantly deviating RoC values in positions -240, -192, -107, -48, -4, 0, 6, and 234. By manually checking the centroid we found that the large RoC value at 0 is due to padding of the MP3 stream with 0x00 and 0xFF. The value at -192 comes from the hexadecimal byte sequence 0xC000. Another common sequence is 0xFFFB9060, as is the sequence 0x0006F0. The first sequence gives the difference values -4, -107, and -48 and the latter sequence has the difference values 6 and 234. Consequently the noticeable RoC values are generated by a few byte sequences and are not evenly distributed over all possible difference values.

The only noticeable variation in the RoC value plot of Zip files in Figure 2.11 is at position 0. Otherwise the curve is smooth, which is expected since the file type is compressed.

The RoC algorithm is meant to be combined with the BFD algorithm used by the file type categorisation method. It is possible to use the same quadratic distance metric for both algorithms and in that way make the method simpler and easier to control. The combination is made as a logical AND operation, because our idea is to let the detection abilities of both algorithms complement each other, thus cancelling out their individual weaknesses.

Depending on the similarity of the sets of positives of the two algorithms the



Figure 2.10: The frequency distribution of the Rate of Change for a collection of MP3 files. The Y-axis is plotted with a logarithmic scale and the frequency values correspond to 4 KiB of data.


Figure 2.11: The frequency distribution of the Rate of Change for a collection of compressed Zip files. The Y-axis is plotted with a logarithmic scale and the frequency values correspond to 4 KiB of data.

improvement due to the combination will vary. The detection rate decreases if one of the algorithms has a close to optimal positive set, i.e. few false negatives and false positives, while at the same time the positive set of the other algorithm is less optimal. If we want to prioritise the detection rate we shall use a logical OR operation instead.
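As a concrete sketch of the rate of change statistics and the logical AND combination discussed in this section (Python, not the thesis implementation), the rate of change histogram of a fragment can be computed from the differences of consecutive byte values, in either absolute or signed form:

    import numpy as np

    def roc_histogram(fragment, signed=False):
        """Frequency distribution of differences between consecutive bytes.
        Absolute differences give 256 bins (0..255); signed differences
        give 511 bins (-255..255)."""
        values = np.frombuffer(fragment, dtype=np.uint8).astype(np.int16)
        diffs = np.diff(values)
        if signed:
            return np.bincount(diffs + 255, minlength=511)
        return np.bincount(np.abs(diffs), minlength=256)

    def combined_detector(fragment, bfd_accepts, roc_accepts):
        """Logical AND combination: both algorithms must accept the fragment.
        Replace 'and' with 'or' to prioritise the detection rate instead."""
        return bfd_accepts(fragment) and roc_accepts(fragment)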

2.4 2-Grams

The 2-gram algorithm was developed to explore the use of byte pairs for file type categorisation. There are both advantages and disadvantages to using 2-grams. The disadvantages come from the exponential increase of the centroid's size, which mainly affects the execution speed of the algorithm, but also the memory footprint. In theory the effect is a 256-fold increase in process time and memory footprint compared to the 1-gram algorithms. The increase in size of the memory footprint of the 2-gram algorithm can be ignored due to the amount of RAM used in modern computers; the requirement of the current implementation is a few hundred KiB of RAM. The effect of the increase in execution time of the algorithm can be lowered by optimisation of the code, but the decrease is not bounded.

The main advantage of the algorithm is the automatic inclusion of the order of the bytes into the file type categorisation method, signalling that an unknown fragment does not conform to the specification of a specific file type and thus should not be categorised as such. A typical example is the appearance of certain JPEG header byte pairs disallowed in the data part. Another example is special byte pairs never occurring in JPEG files created by a specific camera or software.

The centroid modelling a file type is created by counting the number of unique 2-grams in a large number of 1 MiB blocks of data of the file type to identify. The data in each 1 MiB block are used to form a frequency distribution of all 65536 possible 2-grams and the mean and standard deviation of the frequency of each 2-gram is calculated and stored in two 256 × 256 matrices. The values are then scaled down to correspond to a block size of 4 KiB.

The reason for using 1 MiB data blocks is to get a solid foundation for the mean and standard deviation calculations. When using a block size less than the possible number of unique 2-grams for creation of a centroid the calculations inevitably become unstable, because the low resolution of the incorporated values creates round-off errors.

A typical 2-gram centroid can be seen in Figure 2.12. This particular centroid represents a Windows PE file in the form of a contour plot with two levels. To lessen the impact of large values the height of the contour has been scaled logarithmically. It is possible to see the printable characters, as well as the padding of 0x00 and 0xFF in the lower left and upper right corners.

When the similarity between a sample fragment and a centroid is measured the same type of quadratic distance metric as for the 1-gram file type categorisation method is used. Equation (2.2) describes the metric, where matrix S depicts the sample 2-gram frequency distribution and C the mean value matrix of the



Figure 2.12: A contour plot of the frequency distribution of the 2-gram algorithm for a collection of Windows PE files (.exe). The Z-axis is plotted with a logarithmic scale and the frequency values are scaled down to correspond to 4 KiB of data.

centroid. The standard deviation of 2-gram ij is represented by σ_ij. There is also a smoothing factor α = 0.000001, which is used to avoid division by zero when σ_ij = 0.

    d(S, C) = \sum_{i=0, j=0}^{i=j=255} (s_{ij} - c_{ij})^2 / (\sigma_{ij} + \alpha).        (2.2)

The reason for not using a more advanced distance metric is execution speed. The current maximum size of a single hard disk is 750 GiB and hence a forensic examination can involve a tremendous amount of data to scan.
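The 2-gram counting and the matrix form of the distance in Equation (2.2) can be sketched as follows (Python, not the thesis implementation); the scaling factor in the final comment is one plausible choice and is an assumption, since the thesis only states that the statistics are scaled down from 1 MiB blocks to 4 KiB fragments:

    import numpy as np

    ALPHA = 0.000001  # smoothing factor, avoids division by zero

    def bigram_counts(block):
        """256 x 256 matrix of byte pair frequencies in a block of data."""
        b = np.frombuffer(block, dtype=np.uint8).astype(np.uint32)
        pairs = b[:-1] * 256 + b[1:]
        return np.bincount(pairs, minlength=65536).reshape(256, 256)

    def bigram_distance(sample_counts, mean_mat, std_mat):
        """Weighted quadratic distance to a 2-gram centroid (Equation 2.2)."""
        diff = sample_counts - mean_mat
        return np.sum(diff * diff / (std_mat + ALPHA))

    # Centroid creation (sketch): take the element-wise mean and standard
    # deviation of bigram_counts over many 1 MiB training blocks, then scale
    # both down, e.g. by (4096 - 1) / (2**20 - 1), to match 4 KiB fragments.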

2.5 Evaluation

To evaluate the detection rate versus the false positives rate of the different algorithms for file type categorisation we use real world data in the form of files collected from standard office computers and the Internet. If necessary the collected files are concatenated end-to-end into a single large file, which is then manually cleaned of foreign data, i.e. sections of data of other file types. The resulting files will be called cleaned full size files or simply full files throughout the thesis. The cleaned full files are truncated to the length of the shortest such file, in this case 89 MiB, and used for the evaluation. We will call the evaluation files uniformly sized files, or simply uni-files.


The goal of the experiments is to measure the detection rate versus the false positives rate of each of the algorithm and centroid combinations. There are 15 algorithms (variations on the three main algorithms) to be tested on 7 centroids (of 5 main types). The following algorithms are used:

BFD The standard Byte Frequency Distribution algorithm with a quadratic dis- tance metric. BFD with JPEG rule set A variant of the standard BFD extended with a JPEG specific rule set. RoC The standard Rate of Change algorithm with a quadratic distance metric and absolute valued rate of changes. RoC with JPEG rule set A variant of the standard RoC algorithm extended with a JPEG specific rule set. RoC with Manhattan distance metric A variant of the standard RoC algorithm where a simple Manhattan (1-norm) distance metric is used. RoC with signed difference values A variant of the standard RoC algorithm using signed values for the differences. RoC with signed values and JPEG rule set A variant of the standard RoC al- gorithm with signed difference values and extended with a JPEG rule set. RoC with signed values and Manhattan distance A variant of the RoC algo- rithm using signed difference values and a simple Manhattan distance met- ric. BFD and RoC combined A combination of the standard BFD and standard RoC algorithms giving the intersection of their detected elements, i.e. per- forming a logical AND operation. BFD and RoC with Manhattan distance metric A combination of the stan- dard BFD algorithm and the RoC algorithm using a Manhattan distance metric. BFD and RoC with JPEG rule set A combination of the BFD and RoC algo- rithms using a JPEG specific rule set. BFD and RoC with signed values A combination of the standard BFD algo- rithm and the RoC algorithm using signed difference values. BFD and RoC with signed values and JPEG rule set The BFD algorithm in combination with the RoC algorithm using signed difference values and a JPEG specific rule set. BFD and RoC with signed values and Manhattan distance A combination of the standard BFD algorithm and a RoC algorithm using signed difference values and a Manhattan distance metric.


2-gram The 2-gram (byte pair) frequency distribution with a quadratic distance metric.

The algorithm variants are combined with centroids for the following file types. The creation of each centroid and its underlying files is described in the coming subsections.

• Windows PE files
• Zip files
• AES encrypted file using GPG
• CAST5 encrypted file using GPG
• JPEG image data parts with RST markers
• JPEG image data parts without RST markers
• MP3 audio files without an ID3 tag

The reason for using two different encryption algorithms is the fact that GPG uses packet lengths of 8191 bytes when using the older CAST5 algorithm. This results in the 0xED value being repeated every 8192 bytes, giving rise to a clearly visible deviation in the centroid, which can be seen in Figure 2.13. This marker is also used for all available algorithms in GPG when it is possible to compress the source file, or the size of the final encrypted file is not known in advance [20, pp. 14–16]. A higher amount of 0xED in an encrypted file therefore indicates

1. an older algorithm being used, or

2. a compressible source file type, and

3. that GPG does not know the length of the encrypted file in advance.

Case 1 becomes trivial if the header is available and is therefore less important, but cases 2 and 3 mean that some information is leaking, although not very much.

The JPEG images are divided into two groups according to whether or not they come from cameras or applications using RST markers. This partitioning is made to test if it is possible to enhance the detection rate for such images. The RST markers are frequent and easy to detect, so there is no problem creating clean image data to train and test on.

The files used for the evaluation are truncated to form equally sized data sets. We do this to eliminate the risk of affecting the evaluation results by having centroids based on different amounts of data. The truncation method does not give rise to artificial byte combinations, unlike a method where non-consecutive segments of a large file are joined together. This is important when the algorithm takes the ordering of the bytes into consideration.



Figure 2.13: The byte frequency distribution of a large file encrypted with GPG, using the CAST5 algorithm. A logarithmic scale is used for the Y-axis and the frequency values correspond to 4 KiB of data.

The downside of using a fixed amount of data from the first part of a large file is that the composition of the data in the excluded part might differ from the included part and in that way bias the centroid. When creating the full files, which are used to make the uni-files, we try to make an evenly distributed mix of the included files and to concatenate them in an acceptable order.

The evaluation of the detection rate uses half of the data from a uni-file for creating a centroid and the other half to test on. The halves are then swapped and the evaluation process is repeated. The reason for this is to avoid testing on the same data as was used to create the centroid. Performing the evaluation twice on different parts of the data also guarantees that the same amount of data is used for evaluation of the detection rate and the false positives rate.

Unfortunately the splitting of files, giving a lower amount of data for centroid creation, affects the quality of the centroids. For file types with an almost uniform byte frequency distribution the decrease in quality is noticeable.

2.5.1 Microsoft Windows PE files

We use 219 files2 of Microsoft Windows PE format extracted from a computer running Windows XP SP2. The file utility of Linux categorises the files into three main groups:

2The raw data files and source code can be found at http://www.ida.liu.se/~iislab/security/forensics/material/ or by contacting the author at [email protected]


• MS-DOS executable PE for MS Windows (console) Intel 80386 32-bit

• MS-DOS executable PE for MS Windows (GUI) Intel 80386

• MS-DOS executable PE for MS Windows (GUI) Intel 80386 32-bit

The executable files are concatenated by issuing the command

cat *.exe > exes_new.1st

which treats the files in alphabetical order, without case sensitivity. The resulting exes_new.1st file is manually cleaned of JPEG, GIF, and PNG images by searching for their magic numbers, keeping track of the look of the byte stream and ending the search when an end marker is found. We check for embedded thumbnail images to avoid ending our cleaning operation prematurely.

The GIF images are found by their starting strings “GIF87a” or “GIF89a”. They end in 0x003B, which can also occur within the image data; consequently the correct file end can be hard to find. We use common sense: there are often a number of zeros around the hexadecimal sequence when it represents the end of a GIF image. PNG images start with the hexadecimal byte sequence 0x89504E470D0A1A0A and are ended by the string “IEND”.

The cleaned full file is called exes_clean_full.raw and the final step is to truncate the full file to form an 89 MiB long file. The new file is called exes_clean_new.uni and is used for the evaluation.
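The cleaning was done manually; the following is only a rough sketch of the magic-number search it relied on, assuming the concatenated file fits in memory. JPEG handling, with its embedded thumbnails, and the common-sense check for zeros around the GIF end marker are deliberately left out.

    GIF_STARTS = (b"GIF87a", b"GIF89a")
    PNG_START = bytes.fromhex("89504E470D0A1A0A")
    PNG_END = b"IEND"

    def embedded_image_spans(data: bytes):
        """Yield (kind, start, end) candidate spans of embedded PNG and GIF data.
        The end offset may be None (or wrong for GIFs) and needs manual checking."""
        pos = data.find(PNG_START)
        while pos != -1:
            end = data.find(PNG_END, pos)
            yield ("png", pos, end + len(PNG_END) if end != -1 else None)
            pos = data.find(PNG_START, pos + 1)
        for magic in GIF_STARTS:
            pos = data.find(magic)
            while pos != -1:
                end = data.find(b"\x00\x3b", pos)  # 0x003B also occurs inside image data
                yield ("gif", pos, end + 2 if end != -1 else None)
                pos = data.find(magic, pos + 1)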

2.5.2 Encrypted files

We create two encrypted files3 from the Zip full file using two symmetric encryption algorithms, CAST5 and AES with 128 bit keys. Due to the discovery of the 0xED feature in GPG, the CAST5 file is given a name related to its source file, not the encryption algorithm name. The framework we use is GPG 1.4.6, which is called by

gpg --cipher-algo=cast5 --passphrase=gpg_zips_full.raw \
    --output=gpg_zips_full.raw -c zips_new_full.raw

and

gpg --cipher-algo=aes --passphrase=gpg_zips_full.raw \
    --output=gpg_aes_full.raw -c zips_new_full.raw

The full file of the Zip type is used because it is not further compressible and consequently the 0xED packet length marker is not used for AES or other modern algorithms4.

3The raw data files and source code can be found at http://www.ida.liu.se/~iislab/security/forensics/material/ or by contacting the author at [email protected]
4The modern algorithms available in our version of the GPG software are AES, AES192, AES256, and Twofish.


The encrypted full files are truncated to form two 89 MiB long files called gpg_aes_new.uni and gpg_zips_new.uni. These are then used for the evaluation.

2.5.3 JPEG files

We create two centroids for the JPEG file type, one for images using RST markers and one for images without RST markers5. The data files are named accordingly, jpegs_rsm_full.raw and jpegs_no_rsm_full.raw.

The images contained in the source files used for our research are taken from our database of standard, private (amateur) JPEG images captured using consumer grade cameras and in typical private life conditions. This selection of images is meant to give a broad and well balanced spectrum of images to work with. We have not excluded any technically poor images, because our intention is to keep the evaluation as close as we can to a real-life situation, without sacrificing control of the test environment. All image donors have approved the selection of images and given us oral permission to use them in our research. The fact that we only use the data part of the images, i.e. the header is stripped off, makes it harder to recreate the images, or in other ways identify any objects in the images.

The jpegs_rsm_full.raw file consists of 343 images stripped of their headers, together giving 466 MiB of data. In Table 2.1 the number of images for each camera make, model and image size included in the file is shown. Images of portrait and landscape orientation are shown on separate lines, because their image data is differently oriented. This should not affect the encoding at a low level (byte level in the files), but at the pixel level it might matter.

The jpegs_no_rsm_full.raw file is made up of 272 images giving a total of 209 MiB. The images in this file do not contain any RST markers and come from the camera makes and models given in Table 2.2. The two full files are truncated to form two equally large (89 MiB) uni-files, called jpegs_rsm_new.uni and jpegs_no_rsm_new.uni. These uni-files are used in the evaluation.

The compression levels of the image files differ, which affects the centroid’s ability to correctly model a generic JPEG image. The compression level depends on the camera settings when the photograph was taken. A lower compression level means that the level of detail retained in the image files is higher and there are longer sequences of image code between each RST marker, if used. Longer sequences slightly change the byte frequency distribution, hence the centroid is affected. We have not investigated to what extent the compression level affects the centroid, but our experience is that the effect is negligible.

5The raw data files and source code can be found at http://www.ida.liu.se/~iislab/security/forensics/material/ or by contacting the author at [email protected]


Table 2.1: The number of images coming from different camera makes and models, and their sizes in pixels. Portrait and landscape mode images are separated. All images contain RST markers.

# images  Camera  Image size
25  Canon DIGITAL IXUS 400  1600x1200
25  Canon DIGITAL IXUS v  1600x1200
25  Canon EOS 350D DIGITAL  3456x2304
25  Canon EOS 400D DIGITAL  3888x2592
4  Canon PowerShot A70  1536x2048
46  Canon PowerShot A70  2048x1536
25  CASIO COMPUTER CO.,LTD EX-Z40  2304x1728
25  Kodak CLAS Digital Film Scanner / HR200  1536x1024
24  Konica KD-310Z  1600x1200
1  Konica Digital Camera KD-310Z  2048x1536
11  KONICA MINOLTA DiMAGE G400  1704x2272
14  KONICA MINOLTA DiMAGE G400  2272x1704
25  NIKON D50  3008x2000
6  NIKON E3500  1536x2048
19  NIKON E3500  2048x1536
3  Panasonic DMC-FZ7  2112x2816
22  Panasonic DMC-FZ7  2816x2112
18  SONY DSC-P8  2048x1536

Table 2.2: The number of images coming from different camera makes and models, and their sizes in pixels. Portrait and landscape mode images are separated. There are no RST markers in the images.

# images  Camera  Image size
73  FUJIFILM FinePix2400Zoom  1280x960
9  FUJIFILM FinePix2400Zoom  640x480
95  FUJIFILM FinePix E550  2848x2136
73  OLYMPUS IMAGING CORP. uD600,S600  1600x1200
21  OLYMPUS IMAGING CORP. uD600,S600  2816x2112
1  OLYMPUS IMAGING CORP. uD600,S600  640x480


Table 2.3: The MP3 files included in mp3_no_id3_full.raw. The individual encoding parameters are shown for each file.

File name  Bitrate  Encoder
01_petra_ostergren_8A6F_no_id3.mp3  variable  iTunes v7.1.1.5
02_david_eberhard_192DE_no_id3.mp3  variable  iTunes v7.1.1.5
03_boris_benulic_30DF_no_id3.mp3  variable  iTunes v7.1.1.5
04_dick_kling_81F0_no_id3.mp3  variable  iTunes v7.1.1.5
05_maria_ludvigsson_3A96_no_id3.mp3  variable  iTunes v7.2.0.34
06_hakan_tribell_C59F_no_id3.mp3  variable  iTunes v7.1.1.5
07_anders_johnson_0558_no_id3.mp3  variable  iTunes v7.1.1.5
08_marie_soderqvist_4B92_no_id3.mp3  variable  iTunes v7.1.1.5
09_erik_zsiga_D020_no_id3.mp3  128 KiB/s  -
10_carl_rudbeck_F7CB_no_id3.mp3  variable  iTunes v7.1.1.5
11_nuri_kino_7CFA_no_id3.mp3  112 KiB/s  -
12_johan_norberg_4F96_no_id3.mp3  112 KiB/s  -

2.5.4 MP3 files

We use 12 MP3 files6 featuring a pod radio show in the evaluation. The files contain ID3 tags [21], which are deleted using the Linux tool id3v2 0.1.11. We use a script to clean the files and then check each file manually to verify the cleaning. The resulting files are then concatenated and saved in a file called mp3_no_id3_full.raw, in the order shown in Table 2.3. The resulting full file is truncated to a size of 89 MiB and called mp3_no_id3_new.uni.

All included files are MP3 version 1 files. They are all sampled at 44.1 kHz as Joint Stereo. Their individual encoding can be seen in Table 2.3.

The reason for deleting the ID3 tags is that they contain extra information. There is for example a field for attaching pictures [22, Sect. 4.15]. It is even possible to attach any type of data using a label called “General encapsulated object” [22, Sect. 4.16], hence an MP3 file can contain executable code. Since we want to use nothing but the audio stream in our evaluation we have to delete the ID3 tags.

2.5.5 Zip files

The file zips_new_full.raw consists of 198 zipped text files7 from the Gutenberg project [23]. Among them is a zipped version of the Swedish bible from 1917 in the form of a 1.5 MiB compressed file. All files from the Gutenberg project are compressed with Zip version 2 or above. The text in the files is ISO-8859-1 encoded. We also include a zipped .iso (ReactOS-0.3.1-REL-live.zip) and

6The raw data files and source code can be found at http://www.ida.liu.se/~iislab/security/forensics/material/ or by contacting the author at [email protected]
7The raw data files and source code can be found at http://www.ida.liu.se/~iislab/security/forensics/material/ or by contacting the author at [email protected]

a zipped Preboot Execution Environment (PXE) network XWin edition of the Recovery Is Possible (RIP) 3.4 Linux rescue system (zipped with a software version 1.0). The files are concatenated, sorted by file name in ascending order, with the text files first, then the ReactOS, RIP, and finally a zipped version of the exes_clean_full.raw file. The full file is added to extend the amount of data for the creation of the encrypted file centroid (see Section 2.5.2). The original executable file collection is 89 MiB unzipped and 41 MiB zipped and is compressed using Linux Zip 2.32.

The zips_new_full.raw file is truncated to an 89 MiB long file, which is used for the evaluation. The new file is called zips_new_new.uni.

2.5.6 Algorithms

We test the amount of false positives and the detection ability of each algorithm by creating a centroid for each file type and then measuring the distance between the centroid and every fragment of each file type. The uni-files are used as test files, the distances are sorted, and the test file giving the lowest distance is recorded for each combination of fragment, centroid and algorithm.

When measuring the detection rate of the centroids, i.e. having them detect fragments of their own type, we partition the uni-files in two and use one part for training and the other for testing. We perform each test twice, swapping the roles of the file parts.

Because of the quality decrease of centroids based on smaller amounts of data, especially for almost uniformly distributed file types (see Section 2.6.6), we make a small extra evaluation using an alternative data source for the encrypted file type centroids. The data comes from a 644 MiB large CD iso file, which is downloaded from one of the Debian mirror sites [24]. The file is compressed using Zip and then two encrypted files are created in the same way as described in Section 2.5.2, one using AES and one using CAST5. The results are then used to create a confusion matrix.
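As an illustration, the following is a minimal sketch of the fragment categorisation loop, assuming a centroid is stored as a (mean, standard deviation) pair of 256-element arrays and that the BFD variant with the quadratic distance metric is used; the helper names are hypothetical and not taken from the thesis tools.

    import numpy as np

    FRAG_SIZE = 4096       # 4 KiB fragments, as in the evaluation
    ALPHA = 0.000001       # smoothing factor from Equation (2.2)

    def byte_histogram(fragment: bytes) -> np.ndarray:
        """Byte frequency distribution (BFD) of a single fragment."""
        return np.bincount(np.frombuffer(fragment, dtype=np.uint8),
                           minlength=256).astype(float)

    def categorise(fragment: bytes, centroids: dict) -> str:
        """Return the name of the closest centroid; each centroid is a
        (mean, std) pair of 256-element arrays built from training data."""
        hist = byte_histogram(fragment)
        def dist(name: str) -> float:
            mean, std = centroids[name]
            return float(np.sum(((hist - mean) / (std + ALPHA)) ** 2))
        return min(centroids, key=dist)

    def fragments(data: bytes):
        """Split a uni-file into consecutive 4 KiB fragments."""
        for off in range(0, len(data) - FRAG_SIZE + 1, FRAG_SIZE):
            yield data[off:off + FRAG_SIZE]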

2.6 Results

The results of the evaluation will be presented using Receiver Operating Characteristic (ROC) curves [25]. A ROC curve plots the true positives against the false positives while the detection threshold is varied.

There is little meaning in plotting detection rate values below 50%, or false positives values above 50%, because outside those boundaries the results are getting close to guessing. Strictly speaking that happens when the ROC curve falls below the diagonal where the false positives rate equals the detection rate. Therefore we have limited the plots to the upper left quarter of the ROC plotting area, although the consequence might be that some results are not shown in the figures.
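A small sketch of how such ROC points can be computed from measured distances, assuming two lists of distances are available per centroid: one for fragments of the centroid's own type and one for fragments of all other types. The function name and the threshold sweeping strategy are ours, not a description of the thesis tools.

    import numpy as np

    def roc_points(own_type_distances, other_type_distances):
        """(false positives rate, detection rate) pairs obtained by sweeping
        the distance threshold over every observed distance value."""
        pos = np.asarray(own_type_distances, dtype=float)
        neg = np.asarray(other_type_distances, dtype=float)
        points = []
        for threshold in np.unique(np.concatenate([pos, neg])):
            detection_rate = np.mean(pos <= threshold)   # own-type fragments accepted
            false_positives = np.mean(neg <= threshold)  # foreign fragments accepted
            points.append((false_positives, detection_rate))
        return points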


Table 2.4: The following base names, possibly together with a prefix or suffix, are used in the evaluation. The corresponding algorithm names are given as a translation.

Base name  Algorithm
2-gram_ny  2-gram
map-bf  BFD
map-bf_jpeg_spec  BFD with JPEG rule set
map-bf_n_roc  BFD and RoC combined
map-bf_n_roc_jpeg_spec  BFD and RoC with JPEG rule set
map-bf_n_roc_norm  BFD and RoC with Manhattan distance metric
map-roc  RoC
map-roc_jpeg_spec  RoC with JPEG rule set
map-roc_norm  RoC with Manhattan distance metric
map_noabs-bf_n_roc  BFD and RoC with signed values
map_noabs-bf_n_roc_jpeg_spec  BFD and RoC with signed values and JPEG rule set
map_noabs-bf_n_roc_norm  BFD and RoC with signed values and Manhattan distance
map_noabs-roc  RoC with signed difference values
map_noabs-roc_jpeg_spec  RoC with signed values and JPEG rule set
map_noabs-roc_norm  RoC with signed values and Manhattan distance

What we miss are the conservative8 and liberal9 parts of the curves in the graphs.

The algorithm ROC curves for each centroid used in the evaluation will be presented in the following sections, one for each file type. We choose to present the results from the centroid point of view because in that way the algorithms can easily be compared against each other. The legend of each ROC curve plot contains algorithm base names, which are given together with the corresponding algorithm name in Table 2.4.

The file type categorisation method and its algorithms are generalisable, which the evaluation shows. It is applicable to different types of compressed or encrypted files, which have a high amount of entropy. Information entropy is a measure of the uncertainty of a signal or stream of bytes. It can be calculated using Equation (2.3), where X is a discrete random variable in the set Ω, with the probability function f(x), and log_2 denotes the base 2 logarithm, which gives H(X) the unit bit [26].

8Defined as giving few positives, neither true nor false. They are conservative in the sense that they need strong evidence to give a positive categorisation. [25]
9Defined as giving many positives, both true and false. They are liberal in the sense that they need only weak evidence to give a positive categorisation. [25]


Table 2.5: The information entropy values for some file types.

File type  Entropy [bits]
Windows PE  6.581
MP3  7.938
JPEG w. RST  7.971
Zip  7.998
GPG AES  8.000


H(X) = \sum_{x \in \Omega} f(x) \left( -\log_2 f(x) \right)  (2.3)

The entropy values for some of the file types included in the evaluation can be found in Table 2.5. They are calculated using Equation (2.3) and the values for the mean byte frequency of the centroids. The executable file differs from the other files by its approximately 17% lower value, but all values are close to the size of a byte, rendering their byte streams nearly random in appearance.
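A minimal sketch of Equation (2.3) applied to a byte frequency distribution, as one way of obtaining values such as those in Table 2.5; the function name is hypothetical.

    import math

    def entropy_bits(byte_frequencies) -> float:
        """Shannon entropy of Equation (2.3) for a 256-entry frequency list,
        e.g. the mean byte frequencies of a centroid."""
        total = sum(byte_frequencies)
        h = 0.0
        for count in byte_frequencies:
            if count:
                p = count / total
                h -= p * math.log2(p)
        return h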

2.6.1 Microsoft Windows PE files

The only algorithm able to properly detect Windows PE files is the 2-gram algorithm. The ROC curve can be seen in Figure 2.14 and starts to level out at 60% detection rate and a false positives rate of 0.05%. It then slowly increases towards a new plateau at 75% detection rate and 10% false positives rate.

The reason for the non-smooth curve is probably the fact that executable files can be divided into several sub-parts, where the main parts are machine code and tables. Tables are very structured and easy to categorise manually, while machine code looks more random and unstructured. But machine code contains specific byte pairs that are more frequent than others, and that is detected by the 2-gram algorithm. The tables do not necessarily have a small number of more frequent byte pairs, and hence are not detected until the detection threshold is considerably higher than for machine code. We have not confirmed this experimentally; so far we have only formed a hypothesis.

2.6.2 Encrypted files

We use two different centroids for the encrypted file experiments, because we found a deviation of the frequency of byte 0xED in some cases (see Section 2.5 and Figure 2.13). This deviation is too small to be seen in the ROC curves without a high magnification. Therefore we only show the results for the AES encrypted file centroid. Figure 2.15 shows the results, where four clusters of algorithms are visible. At the 50% detection rate level the clusters are, from left to right:



Figure 2.14: The results for the Windows PE files centroid for algorithms scoring above 50% detection rate and below 50% false positives rate.

1. BFD, and BFD and RoC, at 11% false positives.

2. RoC with a quadratic distance metric, at 15% false positives.

3. RoC with a Manhattan distance metric, at 21% false positives.

4. 2-gram, at 29% false positives.

As can be seen in Figure 2.15 the 2-gram algorithm has the steepest inclination, but the best algorithms are still the BFD, and the BFD and RoC combination, reaching 100% detection rate at between 31.4% and 34.9% false positives. The 2-gram algorithm is a close second (or maybe more correctly, a close sixth) at 35.4% false positives.

The difference in results between the centroids for the AES encrypted file and the CAST5 encrypted file is greatest at 50% detection rate. The best algorithm is the BFD in both cases; for CAST5 the false positives rate is 10.5% and for AES it is 10.8%.

2.6.3 JPEG files

The ROC curve for JPEG files without RST markers in Figure 2.16 is clustered into three curve areas, with a fourth area just touching the bottom right corner. The fourth cluster is not visible in the plot, but it is there. It has a false positives rate of approximately 49.7%.



Figure 2.15: The results for the AES encrypted file centroid for algorithms scoring above 50% detection rate and below 50% false positives rate.


Figure 2.16: The results for the centroid for JPEG without RST markers for algorithms scoring above 50% detection rate and below 50% false positives rate.



Figure 2.17: The results for the centroid for JPEG without RST markers for the 2-gram algorithm.

The clustered areas are related to the following algorithms, with the given false positives rate at a detection rate of 50%, counting from left to right:

1. 2-gram, false positives rate 0%

2. RoC with signed difference values, false positives rate 26%

3. RoC with absolute difference values, false positives rate 31%

4. BFD and RoC with quadratic distance metric, false positives rate 50%

The BFD and RoC combinations with a simple Manhattan distance metric, as well as the single BFD algorithm, give false positives rates higher than 50% and are not shown in the figure.

The 2-gram algorithm gives outstanding results compared to the other algorithms. The centroid for JPEG images lacking RST markers used together with the 2-gram algorithm gives a 100% detection rate at a false positives rate of less than 0.12%, as can be seen in Figure 2.17. For detection rates below 67.6% there are no false positives. Please observe the logarithmic scale of the (false positives) X-axis.

The centroid for JPEG images with RST markers, which can be seen in Figure 2.18, does not give results as good as the previous JPEG centroid. The best results are achieved together with the BFD and RoC combinations, and the single BFD algorithms. All these algorithms give a false positives rate of less than 1% at a detection rate of 50%. Best, from a false positives rate point of view, is the BFD and RoC combination using signed difference values, a quadratic distance metric, and a JPEG specific rule set. This algorithm reaches a detection rate of 83.8% at a false positives rate of 6%. It then levels out and eventually reaches a 100% detection rate at 95.7% false positives rate. The 2-gram algorithm is the first to reach a detection rate of 100% at a false positives rate of 16.8%. It starts at a false positives rate of 5.1% at 50% detection rate.



Figure 2.18: The results for the centroid for JPEG with RST markers for algorithms scoring above 50% detection rate and below 50% false positives rate.

The clusters of algorithms that can be seen in Figure 2.18 comprise the following algorithms, from left to right at 80% detection rate:

1. BFD and RoC with signed difference values, absolute valued BFD and RoC (except for the JPEG specific rule set algorithm), and the BFD algorithms, 4% false positives.

2. BFD and RoC with absolute difference values and a JPEG specific rule set, 8% false positives.

3. 2-gram algorithm, 16% false positives.

4. RoC with signed difference values and quadratic distance metric, 20% false positives.

5. RoC with signed difference values and Manhattan distance metric, 23% false positives.

6. RoC with absolute difference values and quadratic distance metric, 27% false positives.


7. RoC with absolute difference values and Manhattan distance metric, 29% false positives.

Since the centroid for JPEG images with RST markers only allows, but does not require, RST markers in a data fragment, it is more prone to false positives than the other JPEG centroid. This explains the differences in the results for the 2-gram algorithm, which is especially good at utilising fixed parameters that are required by a file type.

2.6.4 MP3 files

The results for the MP3 file centroid are shown in Figure 2.19. The 2-gram algorithm is the best and is accompanied by three clusters at 85% detection rate. The clusters are, from left to right:

1. 2-gram, 0.1% false positives.

2. BFD and RoC combinations, the signed difference valued slightly to the right, 2% false positives.

3. RoC, 7% false positives.

4. BFD, 10% false positives.

Even though the RoC algorithm with a quadratic distance metric is above the 2-gram algorithm curve when reaching false positive levels of 40% or higher in Figure 2.19, the 2-gram algorithm is the first to reach a 100% detection rate, with a false positives rate of 86%. The second best is actually the BFD algorithm, but it reaches 100% detection at 99.3% false positives.

The detailed results, presented for a maximum false positives rate of 0.5%, can be seen in Figure 2.20. The result of the 2-gram algorithm levels out at a detection rate of 94% at 0.24% false positives. The BFD and RoC combinations level out at approximately 70% detection rate, with the BFD and RoC combination using a Manhattan distance metric a few percent above the other three.

2.6.5 Zip files

In Figure 2.21 the results of the Zip file centroid can be seen. The straighter curves belong to the RoC algorithms and the S-shaped curves belong to the BFD and RoC combination algorithms, plus the single BFD algorithm. The clusters of curves shown in the figure belong to, from left to right at the 65% detection rate level, the following algorithms:

1. RoC with quadratic distance metric and signed difference values, 29.8% false positives rate.

2. RoC with quadratic distance metric and absolute difference values, 31.9% false positives rate.



Figure 2.19: The results for the MP3 file centroid for algorithms scoring above 50% detection rate and below 50% false positives rate.


Figure 2.20: The results for the MP3 file centroid for algorithms scoring above 50% detection rate and below 0.5% false positives rate.



Figure 2.21: The results for the Zip files centroid for algorithms scoring above 50% detection rate and below 50% false positives rate.

3. BFD and RoC combinations with quadratic distance metric, 33.8% false positives rate.

4. BFD, and BFD and RoC combinations with Manhattan distance metric, 35% false positives rate.

5. RoC with Manhattan distance metric and signed difference values, 42.9% false positives rate.

6. RoC with Manhattan distance metric and absolute difference values, 44.5% false positives rate.

Worth noting is that the 2-gram algorithm is not included in Figure 2.21. At a detection rate of 50% it has 68% false positives, and reaches 100% detection at 99% false positives. This might be due to the regular patterns in an otherwise almost uniform distribution, which make the centroid resemble a structured file type. The regular patterns can be seen in Figure 2.22. The figure shows that the 2-gram frequencies are slowly increasing towards higher byte pairs. There is one line corresponding to similarly valued 2-grams, and two lines representing byte pairs where one of the byte values is twice the value of the other, i.e. x = 2y or 2x = y. There is also a pattern of squares of bytes either starting or ending with 0xF.



Figure 2.22: A contour plot for a Zip file centroid of 2-grams. The almost evenly distributed centroid has clearly visible patterns.

2.6.6 Algorithms

One way to represent the performance of a categorisation algorithm is to use a confusion matrix. In our case the matrix shows the number of fragments categorised as different file types when using a specific centroid. Hence the rows represent different centroids, the actual classes, and the columns represent categorisations by the algorithms, meaning the predicted classes. The short form of the centroid names used in the confusion matrices can be found in Table 2.6.

The best results are achieved using the 2-gram algorithm, but in some of our tests other algorithms performed better than the 2-gram algorithm. Table 2.7 presents the confusion matrix of the 2-gram algorithm. The confusion matrices of the rest of the algorithms can be found in Appendix C.

As can be seen, the two centroids representing encrypted information are better at categorising each other than themselves. This phenomenon can be traced back to the smaller data sets used to create the centroids for the detection rate evaluation (see Section 2.5). A smaller data sub-set (of a larger data set) gives a standard deviation less than or equal to the standard deviation of the full data set. Therefore the distance metric used (see Equation (2.2)) gives slightly higher distance values for the sub-sets than for the full data set.

The variations in the data between the encrypted file types are smaller than the difference in the mean standard deviation between the full data and the sub-set data centroids. Consequently the full data set centroids are better at


Table 2.6: The following short names are used for the confusion matrices.

Short name  Centroid
exe  Windows PE files
cast5  CAST5 encrypted GPG file
aes  AES encrypted GPG file
no rst  JPEG images without RST markers
rst  JPEG images with RST markers
mp3  MP3 audio files without an ID3 tag
zip  Files compressed using Zip

Table 2.7: The confusion matrix of the 2-gram algorithm.

        exe    aes    cast5  no rst  rst    mp3    zip
exe     15332  8      10     12      15     7247   56
aes     100    402    16450  3       0      0      5725
cast5   107    16614  339    0       0      0      5620
no rst  10     0      0      22667   0      0      3
rst     0      0      0      9677    13003  0      0
mp3     60     371    276    3       6      21882  82
zip     113    5699   5650   4       0      0      11214

categorising almost-similar data types than the small centroids are at categorising their own data type. The mean standard deviations of the four sub-set centroids are approximately 1.5% lower than the mean standard deviations of the full data set centroids, thus increasing the distance measurements by the same amount for the sub-set centroids.

The effect of the smaller amount of data on the detection rate is only noticeable for the encrypted file types, and then only between different encryption algorithms. The small extra amount of 0xED in the CAST5 encrypted file is hidden by the difference in the mean standard deviation values between the full data set and the sub-set centroids.

To verify that a centroid created from a larger data set would improve the detection rate of the centroid’s file type, we performed a limited test where we used a 604 MiB compressed file to create two GPG centroids, one using the AES encryption algorithm and one using CAST5. The testing files were the same as we used in the evaluation (see Section 2.5.2). The resulting confusion matrix can be seen in Table 2.8.

The centroids created using the 604 MiB file have a mean standard deviation approximately 1.2% higher than the centroids created from the 94 MiB zips_new_new.uni file. As can be seen when comparing Table 2.7 and Table 2.8 the detection ability of the AES and CAST5 centroids has increased significantly. The AES centroid now actually detects its own file type better than


Table 2.8: The confusion matrix of the 2-gram algorithm using a 604 MiB compressed file to create the GPG AES and GPG CAST5 centroids.

        exe  aes   cast5  no rst  rst  mp3  zip
aes     56   9720  9609   0       0    0    3295
cast5   55   9682  9641   0       0    0    3302

any other, although the difference to the CAST5 file type is only 111 fragments, or 1.2%. The amount of false positives for all file types included in the test has decreased, with a 45% decrease for exe files and a 42% decrease for the zip files.

Chapter 3

Putting Fragments Together

When a number of data fragments of a specific file type have successfully been recovered using the file type categorisation method, the next step is to try to reconnect them in the right order to recreate the original file. This chapter will present a method to do that for JPEG images having RST markers. The basic idea behind the JPEG image fragment reconnection method is outlined in Section 1.5. The term fragment joint validity will be used to represent the validity of two fragments being properly joined together into a correct pair, and as such being part of a correct JPEG image.

The parameters we judge as being important when decoding a JPEG scan have been tested during the development of the image fragment reconnection method. The parameters represent different logical layers of a JPEG image, starting at the bottom with the raw byte stream and ending at the DC frequency pixel level. The evaluation and the results of that method are presented in Section 3.4 and Section 3.5.

For the first round of tests we used a single image for all tests. We extracted the picture data and divided it into 381 4 KiB large fragments, which were then used for the reconnection experiments. A brute-force attempt to put together the original image from these fragments would give 381! sequences to test for the correct combination. For the next round of experiments we used images from several different cameras. This is explained in more detail in Section 3.4.

3.1 Background

A JPEG file is made up of a header section and a data section, which is called the scan. The header consists of a number of markers containing tables and other information; the compressed pixel data is stored in the scan. The raw pixel data is divided into data units of 8×8 pixels. The pixels are often coded as Red Green Blue (RGB) colours.

The majority of the markers can be found in the header section, but there are markers which may appear in the scan too. All markers start with the byte 0xFF and then one byte identifying the marker type. Most of the markers are followed

by a two byte long value indicating the length of the marker, excluding the two first bytes. A JPEG file always starts with the Start-Of-Image (SOI) marker, 0xFFD8, and ends with an End Of Image (EOI) marker, 0xFFD9. The scan is compressed in several ways, one of them being an entropy encoding, usually Huffman.

The Start of Image (SOI) marker should always be the first two bytes of a JPEG image. Its hexadecimal value is 0xFFD8 and it stands alone, i.e. it is not followed by a two-byte length value. This marker is often used by file carving tools to identify JPEG files. Any JPEG coded image is preceded by this marker. This means that there can be several SOI markers in one image file, because if there is a thumbnail version of the image embedded in the header section the thumbnail will also start with a SOI.

The EOI marker is denoted by 0xFFD9 and marks the end of a scan. Therefore it is not followed by any length value. Since embedded thumbnails are small images in themselves there can be several EOIs in one image file. This marker is used by file carving tools to stop the carving when dealing with JPEG images. Since an image file can be fragmented this is an inexact way of extracting JPEG image files; there is a high risk of non-JPEG data being included in the extracted data. There is also a risk of missing the real image and only carving a thumbnail version of it, if the carving algorithm is not set to require a large enough amount of data to be included in an image before allowing the appearance of an EOI.

Colour baseline sequential JPEG images do not use the RGB scheme, instead they use a luminance and chrominance colour coding scheme. The luminance represents the black and white nuances of the image and the chrominance the colours, often taken from a two-dimensional palette. A common colour coding scheme used for JPEG images is the YCbCr scheme, where Y represents the luminance, Cb represents the blue hues and Cr represents the red hues of an image. Each of the Y, Cb, and Cr are called components of the image. There are other colour coding schemes possible to use, e.g. YIQ, YUV, and CMYK, but they are all variants of the luminance and chrominance coding scheme. The reason for not using the RGB colour coding scheme is the fact that it is easier to compress images using a luminance and chrominance coding scheme.

One of the compression steps utilised in JPEG images uses the Discrete Cosine Transform (DCT) to convert from the amplitude domain to the frequency domain. Performing calculations in the frequency domain allows lossy compression to be applied to an image by zeroing out higher frequencies. The higher frequencies hold the tiny details of an image and can be omitted without much loss of quality. This can be seen in Figure 3.1, which shows the frequency coefficients of a JPEG image data unit.

The JPEG coding algorithms are applied to blocks of one to four data units at a time, starting at the upper-left corner of an image and traversing it horizontally. Since the luminance is more important for the perceived quality of an image further compression is sometimes achieved by extracting the chrominance for larger areas of an image. This is done by using pixel pairs for the chrominance, either horizontally, vertically, or both. Hence sometimes two or four data units


Figure 3.1: The 64 frequencies resulting from a DCT of a JPEG data unit. The DC coefficient is in the upper-left corner. The image is copied from Wikipedia [27].



Figure 3.2: The zig-zag order in which a data unit in the frequency domain is traversed to form a 64 items long vector. The DC coefficient is in the upper-left corner. The AC coefficients are traversed in increasing frequency order.

are used at the same time. The standard stipulates that this can be done for any component, as long as the total sum of data units used is less than 11, counting the data units for each component separately.

The block of frequencies resulting from the DCT is traversed in zig-zag order (see Figure 3.2) to form a 64 item long vector. The first item is the DC coefficient, having zero frequency both horizontally and vertically. The rest of the coefficients are called AC coefficients.

The smallest unit to be put into the scan part of a JPEG image is a Minimum Coded Unit (MCU). It contains at least one data unit of each component. If the scan part of an image does not contain a whole number of MCUs it is corrupt. Hence the quality and structure of the MCUs in the scan part of an image file are good things to check when working with JPEG fragments.
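For reference, a small sketch that generates the traversal order of Figure 3.2 for an n×n block; it reproduces the standard JPEG zig-zag order but is not code from the thesis tools.

    def zigzag_order(n: int = 8):
        """(row, col) coordinates of an n x n frequency block in JPEG zig-zag
        order, starting at the DC coefficient in the upper-left corner."""
        diagonal = lambda rc: rc[0] + rc[1]
        return sorted(((r, c) for r in range(n) for c in range(n)),
                      key=lambda rc: (diagonal(rc),
                                      rc[0] if diagonal(rc) % 2 else rc[1]))

For the 8×8 case the first few coordinates are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), matching the figure.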

3.2 Requirements

To be able to reconnect fragments we need a method that requires no, or very little, post-processing. To be usable the method has to be able to correctly point out a significant number of the correct fragment pairs at once, by ordering

them based on their fragment joint validity. The validity measurements should clearly differentiate between a correct pair and a false pair. The method should scale well and be able to handle fragments from different images at the same time. This also requires the method to be able to separate fragments belonging to images with different restart intervals1 and MCU sizes. The method should also be able to handle black-and-white and colour image fragments at the same time, which means fragments with different numbers of image components in a frame.

It is important to be able to find the starting fragment of a scan, since we can then base the rest of the processing on that information. Having access to the starting fragments in a collection we get a lower bound on the number of possible images to recreate.

When looking at the connection between two fragments the most probable connection point is in the middle of a MCU. Either the joint cuts through a Huffman code, a value code, or through the boundary between them. These types of joints are trivial to handle and can be used for calculating the fragment joint validity. There are however situations where the joint cuts through, or occurs at the beginning or end of, a marker segment. These types of joints are harder to handle and, depending on the abstraction level we are working at, such a joint may give very little or no information regarding the fragment joint validity. The method used for reconnection of image fragments should be able to correctly handle all types of joints.

3.3 Parameters Used

This section presents the parameters that have been tested for their ability to help in reconnecting JPEG image fragments. Results are included even for the parameters that do not meet the requirements specified in Section 3.2. For each experiment involving a specific parameter we tried to falsify the hypothesis that a specific parameter was good enough to fulfil the specified requirements.

3.3.1 Background

The coding of a JPEG image is more complex than what will be described in this section. We have chosen to exclude details that are not relevant to the understanding of our work and focus on the overall process, to be able to clearly explain our ideas. Our algorithms, tools and experiments are all compliant with the original JPEG standard from 1992 [13], with extensions. A full description of the coding process can be found in Annex F of the JPEG standard [13, pp. 87–118].

The scan contains the compressed pixel data from the light sensors of the camera. As mentioned earlier the data are first converted to a luminance and chrominance colour scheme, often having three components. The components are transformed into the frequency domain with the help of the Discrete Cosine

1The distance in MCUs between two RST markers. More information is available in Section 4.4.


Transform, giving a vector with amplitude values for 64 frequencies in increasing order. The frequency vectors are compressed using a quantization table, which gives a reduction factor for each frequency. The higher frequencies are usually reduced to a higher degree since they hold less important information for the overall quality of the image. The integer part of the resulting quantized values is used, hence values less than 1 are set to 0, giving a lossy compression.

In Figure 3.3 we can see how the coding of the DC coefficients is based on the difference of two consecutive DC values for components of the same type. The length of the signed difference value is Huffman coded and the value itself follows immediately after the Huffman code. The longest representation, i.e. the largest difference to be coded, is 11 bits long. The first DC value of a scan, or in the case of RST markers being used, the first DC of a new restart interval, is set to the actual luminance value2 of that data unit. Since both the length of the Huffman code and the following value can vary in size there is no guarantee that Huffman codes or value representations start at byte boundaries.

2 The algorithm uses 0 as a starting value, giving DC 0 0 as the first difference value. 3The first four bits of a byte. − 4The last four bits of a byte. 5The intervals are regular relative to the number of MCUs, the number of bytes between two RST markers may vary.

48 3.3. PARAMETERS USED


Figure 3.3: The scan part in binary format showing the DC and AC component encodings for a MCU. The DC values are given as the difference between two consecutive DC components. This is the first MCU after a restart marker, hence the DC value 3 − 0. The Huffman codes contain the number of consecutive zeros and the number of bits needed to encode the non-zero value.

3.3.2 Correct decoding

An obvious way of checking if two fragments can be joined together into a larger JPEG fragment is to look at the sequence of RST markers. We then use the requirement that for two consecutive fragments, i and i + 1, containing RST markers, the expression

RST^{First}_{i+1} = \left( RST^{Last}_{i} + 1 \right) \bmod 8

should be true. There is a problem if the joint cuts through the 0xFFDx pair. The 0xFF might belong to a 0xFF00 marker, or the Dx might be just an ordinary part of the data stream, not the second half of a marker. We therefore have to include fragments ending in 0xFF and fragments starting with 0xD0 to 0xD7, and 0x00. The amount of extra fragment pairs to check after such an inclusion is manageable.

Using a requirement of a correct RST marker sequence our test image gives us 18012 possible fragment pairs. 190 of the pairs are correct and can be joined together into a full image. The set of fragment pairs includes pairs where the joint cuts through a marker. The method of using correct RST marker sequences cannot give the fragment joint validity, thus we have to test every possible combination to find the correct one. If we assume there is an equal number of possible fragment pairs sharing the same start fragment, we still need to test more than (18012/381)^{380} > 47^{380} combinations to recreate the original image.


This is much better than the 381! fragment combinations to be tested using a brute force method, but not good enough.

Decoding the actual data stream, starting at the last known RST marker in the potential first fragment of a pair, will help us tell whether the 0xFF belongs to a 0x00 or is part of a RST marker. If the decoding ends in a proper way, it is a RST part; if the decoding fails at the 0xFF it belongs to a 0x00. We need not use the embedded amplitude values nor take care of more than the length information6 in the Huffman code. The decoding has to check for the correct number of MCUs per DRI, and that each MCU is correctly built. This assumes we know these features in advance, but the number of allowed values is low, hence we can test for each of them.

The full decoding strategy can also be used to check if any fragment pair is a proper one. If a pre-selection of possible fragments has been done using the RST markers, the full decoding strategy will decrease the number of possible fragment pairs to 3118 for our test image. This amounts to more than (3118/381)^{380} > 8^{380} combinations to test, which is too many. Since the method can only tell whether a certain fragment pair is possible or not, we cannot get any measurements of the fragment joint validity and consequently have to test every combination.
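A minimal sketch of the RST sequence requirement for a candidate fragment pair, RST_first(i+1) = (RST_last(i) + 1) mod 8; the helper is a simplified stand-in that neither performs the full decoding described above nor handles joints that cut through a marker.

    def rst_indices(frag: bytes):
        """Indices n of all RSTn markers (0xFFD0-0xFFD7) found in a fragment."""
        return [frag[i + 1] - 0xD0 for i in range(len(frag) - 1)
                if frag[i] == 0xFF and 0xD0 <= frag[i + 1] <= 0xD7]

    def rst_sequence_ok(first: bytes, second: bytes) -> bool:
        """True if RST_first(i+1) == (RST_last(i) + 1) mod 8 for the pair."""
        a, b = rst_indices(first), rst_indices(second)
        return bool(a) and bool(b) and b[0] == (a[-1] + 1) % 8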

3.3.3 Non-zero frequency values

The embedded amplitude values in the stream can be used to measure the fragment joint validity. We define the length of a luminance or chrominance amplitude value vector as the index of the highest frequency non-zero amplitude, starting at the DC frequency. To get a value of the fragment joint validity we measure the luminance and chrominance vectors’ average lengths for a MCU and compare them to average values calculated over several hundred megabytes of data.

The length of the luminance vector(s) of a MCU is related to the length of the chrominance vectors, because they all represent the same collection of pixels. The same quantization tables are used for all vectors of the same type throughout the image, thus the relation of the lengths of the luminance and chrominance vectors should be fairly constant, at least compared to the actual differences. This is true even when the amount of detail encoded in each MCU varies. We therefore used the relation between the luminance and chrominance vectors as an alternative measurement of the fragment joint validity.

We also tested a variant of the average difference value luminance and chrominance relation measurement. Instead of using a single average value, we measured the probability of a certain vector length. The measurements were made over the same amount of image data as for the average measurements. The probabilities for each MCU, as well as the sum of the probabilities over a restart interval, were used to calculate the fragment joint validity.

A third method that we tested was to code the luminance and chrominance vectors as 8 byte sequences where each bit position indicated if the corresponding

6The second nibble of the decoded Huffman value.

vector value was non-zero. We then calculated the probability for each vector pattern. To lower the amount of required data storage we truncated the 8 byte sequences and kept only the upper 4 bytes (the 32 lowest frequencies in each vector). The byte sequences were used to calculate the sum of the probabilities of each vector for every MCU, and all MCUs in a restart interval. We also calculated the combined probability of the luminance and chrominance vectors to be able to use only one value for the measurements.

Finally we created a matrix containing the frequencies of each vector position and amplitude value of the luminance and chrominance vectors. We then calculated the mean and standard deviation of the frequency values. This method became too slow and complicated to be useful, and we therefore did not proceed with any further tests.

None of the methods described in this section fulfilled the requirements specified in Section 3.2.

3.3.4 Luminance DC value chains

The DC luminance value is the most important factor for the quality of a JPEG image. It can be used to show a low-resolution, grey-scale version of the original image. Consequently it contains the outlines of the objects in the image. Assuming there is at least one vertically oriented line in the image, the luminance DC values can be used to correctly rebuild the image. The vertically oriented line need not be perpendicular to the horizontal edge of the image, but the closer, the better. Since most of the objects in an image have smooth outlines, this method will work for a large set of images. The use of the DC values puts this method at the pixel level in the coding model of a JPEG image.

The method builds on the fact that the scan of a JPEG image is started at the top-left corner and proceeds one horizontal line at a time downwards. Therefore the vertically oriented lines will be reflected as a high rate of change of the DC values in a limited area. If we use two sliding windows that move through the DC value chain and calculate the difference between the DC values in the windows, we get a measurement of the similarity of the DC sub-chains. The method consists of several steps:

1. Process all available image fragments, check for correct RST marker se- quences and fully decode all restart intervals.

2. Identify the area with the highest rate of change and the distance to its closest match using a sliding window approach.

3. Use the distance values to calculate the most probable width (distance be- tween two high rate of changes) of the image.

4. Rerun the algorithm with the calculated value for most probable width as a fixed distance value for all fragments and calculate the area between the DC curves of the high rate of change windows.

51 CHAPTER 3. PUTTING FRAGMENTS TOGETHER

5. Use the area value as the fragment joint validity, the smaller the area the higher the validity.

The size of the sliding window can change automatically and will be shown together with the fragment joint validity value, because the window size affects the confidence of the validity value. Larger window sizes give higher confidence, but require larger fragments. We also give a weighted fragment joint validity value where the original validity is divided by the window size.

The rate of change of the similarity values calculated when the two sliding windows move in parallel indicates whether the fragment pair is correctly connected or not. A low rate of change shows that the distance between the sliding windows is close, or equal, to the width of the image, and hence the pair is correct. However, if the DC values are static the rate of change of the similarity values will also be low. We therefore also measure the rate of change of the DC sub-chain in a sliding window. To get a picture of the rate of change for the whole fragment pair we measure the rate of change of the rate of change values. A higher overall rate of change indicates that the fragment pair holds both smoother areas and more rapidly changing areas. This is typical for parts of proper images.

Depending on the compression rate of the image and the fragment size used, the method can handle images of different sizes with different rates of success. Generally the use of 4 KiB fragments gives a maximum image width of approximately 2000 pixels. The compression rates in our data set give a range between 1700 and 4200 pixels for a 4 KiB fragment. If the level of detail is high, the compression level is low, and the maximum image width the image fragment reconnection method can handle is decreased. Of course the opposite situation is also true.

The DC value chain algorithm will give the fragment joint validity of all possibly correct fragment pairs. Each fragment can be part of two pairs, one as the first member of the pair and one as the second member. Thus it is possible to piece the pairs resulting from an iteration of the algorithm into a complete image, if all fragments of the image are present. Identifying the end fragment is trivial, and the possible start fragments are detected by the algorithm.
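The following Python sketch illustrates the core of the sliding window idea under simplifying assumptions (a single decoded DC value chain covering at least two image widths, fixed window size, hypothetical function names); it is not the evaluated implementation:

    def rate_of_change(window):
        # Sum of absolute differences between consecutive DC values in the window.
        return sum(abs(b - a) for a, b in zip(window, window[1:]))

    def estimate_width(dc_chain, win=8, max_width=512):
        """Estimate the image width (in MCUs): locate the window with the highest rate
        of change and the distance to its closest matching window further down the chain."""
        start = max(range(len(dc_chain) - win),
                    key=lambda i: rate_of_change(dc_chain[i:i + win]))
        ref = dc_chain[start:start + win]
        best_d, best_diff = None, float("inf")
        for d in range(win, min(max_width, len(dc_chain) - start - win)):
            cand = dc_chain[start + d:start + d + win]
            diff = sum(abs(a - b) for a, b in zip(ref, cand))
            if diff < best_diff:
                best_d, best_diff = d, diff
        return best_d

    def joint_validity(dc_chain, width, win=8):
        """Area between the DC curves one image width apart; a smaller area
        indicates a more plausible fragment joint."""
        area = 0
        for i in range(0, len(dc_chain) - width - win, win):
            a = dc_chain[i:i + win]
            b = dc_chain[i + width:i + width + win]
            area += sum(abs(x - y) for x, y in zip(a, b))
        return area

A lower joint_validity value would then rank the corresponding fragment pair higher, mirroring step 5 above.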

3.4 Evaluation

This section presents how the evaluations of the image fragment reconnection method were set up. We evaluated the method both on a large number of single images and on fragments from six images at the same time. Please observe that, due to an unfortunate duplication of seven images, the images used in the evaluation are named and referenced as being 240, but only 233 are used in reality.


3.4.1 Single image reconnection

The DC value chain algorithm was evaluated using 233 images from the 3 cameras in our database that use RST markers. We included 40 of the largest image files from each camera to account for low compression rates and high levels of detail. We also included 40 of the smallest image files from each camera to incorporate technically bad images, featuring blur, significant low key or high key, and other extreme conditions. Due to a mistake 7 of the Fuji FinePix 240Z images in the small image group were duplicated from the large image group, consequently in reality we only used 113 small images. The tests are however still numbered as being 240, not 233. The reason for the selection of images7 was to test the limits of the method.

The evaluation was performed by first splitting each image into 4 KiB fragments. The largest image file gave 446 fragments and the smallest file was split into 57 fragments. The compression rates varied between 3.32 and 8.16. The fragments were combined into pairs with the help of their first and last RST markers, as described in Section 3.3.2 (a small sketch of this step is given at the end of this subsection). Then a first run of the DC value chain algorithm was made and the image width in DC values was calculated. The next run of the algorithm used the width and measured the fragment joint validity of each fragment pair. Finally the fragment joint validity values were sorted and the number of correct pairs at first to fifteenth place was counted. We also noted the number of pairs that were missing, i.e. not properly categorised, and the reasons why. The reasons for the missing fragment pairs are:

Missing: the algorithm can not correctly decode the fragment pair, for miscellaneous reasons (excluding the following reasons),

Short loop: the pair is too short to fit even the first algorithm run,

Short 2nd: the second fragment is too short,

DC 0 seq: all DC values are zero, and

Short total: too few DC values in the pair to fill the image width.

All missing fragments have been manually checked to find out why the algorithm could not properly categorise them. The reasons are presented in Section 3.5.
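As a rough illustration of the evaluation set-up, the sketch below (hypothetical; the exact pairing rule of Section 3.3.2 is not reproduced here) splits a file into 4 KiB fragments and pre-screens candidate pairs by requiring that the RST marker number closest to the end of the first fragment is followed, modulo 8, by the marker number closest to the start of the second fragment:

    import re

    RST = re.compile(rb"\xff[\xd0-\xd7]")

    def split_fragments(data, size=4096):
        return [data[i:i + size] for i in range(0, len(data), size)]

    def border_markers(frag):
        """Return the RST marker numbers (0-7) closest to the start and end of a
        fragment, or None if the fragment contains no RST marker."""
        hits = RST.findall(frag)
        if not hits:
            return None
        return hits[0][1] & 0x0F, hits[-1][1] & 0x0F

    def candidate_pairs(fragments):
        """Keep only pairs (A, B) where B's first marker number follows A's last
        marker number in the cyclic RST numbering; this cheap check discards most
        impossible combinations before the DC value chain algorithm is run."""
        info = {i: border_markers(f) for i, f in enumerate(fragments)}
        pairs = []
        for a, ma in info.items():
            for b, mb in info.items():
                if a != b and ma and mb and (ma[1] + 1) % 8 == mb[0]:
                    pairs.append((a, b))
        return pairs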

3.4.2 Multiple image reconnection

We also tested the algorithm's ability to handle fragments from several images at the same time. To do this we fragmented six images into 4 KiB fragments, giving a total of 1331 fragments. The images came from the set of 233 images used for the other reconnection evaluation and can be found in Table 3.1.

7The raw data files and source code can be found at http://www.ida.liu.se/~iislab/ security/forensics/material/ or by contacting the author at [email protected]


Table 3.1: The images used for the multiple image reconnection evaluation. The files are taken from the set of 233 images used for the single-file reconnection evaluation.

Test  File name               Size     Pixels     Comp.  Corr. width  Est. width
116   jocke_fuji_257          296003   1280x960   4.15   160          320
83    jocke_fuji_283          314698   1280x960   3.90   160          320
43    lars_olympus_p9020199   1347595  2816x2112  4.41   352          352
123   lars_olympus_p9090233   308676   1600x1200  6.22   200          400
163   micke_fuji_0730         1451003  2848x2136  4.19   356          356
23    micke_fuji_1535         1719515  2848x2136  3.53   356          356

The 1331 fragments were processed in the same way as was done for the single-file reconnection evaluation. The images were selected to represent one high compression rate image and one low compression rate image from the three cameras using RST markers in our data set.

3.5 Result

In this section the results of the two evaluations are presented, together with comments on the results. The images used in the evaluation are referenced as being 240, although there are only 233 unique images used in reality. Because of a mistake 7 images were duplicated. The duplicates are excluded from the results and tables.

3.5.1 Single image reconnection

The results8 of the reconnection evaluation using 233 images are given in Table 3.2. We have calculated the percentages of fragments in each category relative to the total number of fragments, and also relative to the sum of fragments that the algorithm should have categorised. To avoid incorrect pairing of fragments due to an insufficient amount of data, the algorithm is set to exclude fragments categorised as too short, i.e. not a full image width in length, or where all luminance DC values are 0.

The results show that the fragment joint validity value is highest for the proper image fragment pair in 96.8% of the cases. If we also include the second highest fragment joint validity we cover 98.8% of the proper pairs. From third to fifteenth place the number of proper pairs at each level decreases by a factor of approximately 2.

For 64 of the 233 images the algorithm correctly connected all fragment pairs. Thus an image can theoretically be fully restored in 27.5% of all cases.

8The raw data files and source code can be found at http://www.ida.liu.se/~iislab/ security/forensics/material/ or by contacting the author at [email protected]


Table 3.2: The results of the reconnection evaluation using 233 images. The first percentage column is calculated relative to the total, and the second relative to the sum of fragments excluding the short and DC 0 sequences.

Connection       Number of    Fragments incl.  Fragments excl.
validity rank    fragments    short [%]        short [%]
1:               42884        85.130           96.821
2:               864          1.715            1.951
3:               235          0.466            0.531
4:               106          0.210            0.239
5:               59           0.117            0.133
6:               42           0.083            0.095
7:               26           0.052            0.059
8:               11           0.022            0.025
9:               12           0.024            0.027
10:              5            0.010            0.011
11:              0            0                0
12:              2            0.004            0.004
13:              0            0                0
14:              1            0.002            0.002
15:              1            0.002            0.002
Missing:         44           0.087            0.099
Short loop:      2            0.004            -
Short 2nd:       0            0                -
DC 0 seq:        7            0.014            -
Short total:     6074         12.058           -
Total:           50375        100.000          100.000

For 47 of the remaining 169 images in the evaluation the algorithm gave the correct pairs the highest and second highest fragment joint validity. Consequently a brute force reconnection using the two highest fragment joint validities of each starting fragment in a possible pair would give a fully correct image scan part in more than 52.4% of the cases.

There was a total of 44 missing fragments, spread over 10 images. Only one image gave rise to more than one missed fragment. The reasons for the missed fragments were:

Test 43, 63, 91, 105, 162, 169, 209, 210, 231: The last fragment is too short to contain any RST marker. Hence the first check of the RST marker sequence is too strict and omits the fragment.

Test 153: The width of the image was wrongly detected to be 600 MCUs. The correct value was 200. Consequently the fragment was not included in the final set of probable fragment pairs.

There is really only one reason for all the missed fragments, and that is that we have put constraints that are too strict on the algorithm. We have done so because we want to keep the amount of false positives down. If we happen to miss one correct pair it matters less than having many incorrect fragment pairs to waste time on. Knowing we can trust the algorithm to give a low degree of false positives lets us use the algorithm repeatedly, having only a small amount of possible fragment pairs to deal with in each iteration.

The problem with the short last fragment, lacking a RST marker, can be helped by a more intelligent pre-screening of the fragments. Since we have used the file type categorisation method to extract the fragments, we know that they are most probably JPEG fragments. We can therefore look for all fragments with the EOI marker and include them in the process even though they do not contain any RST markers. Should they not belong to the image we want to recreate, we will probably get an error. Missing the last fragment of an image containing RST markers does not do any serious harm; the image will be correct apart from the last restart interval, i.e. the lower right corner of the image.

The image width problem originates in the fact that some highly compressed fragments can actually contain more than a full image width of DC values. Since the algorithm is set to compare the two fragments, it starts the second sliding window at the border of the fragments, in the second fragment. Therefore the distance between the sliding windows has to be a multiple of the image width. As long as there are good vertical lines without any disruptions in the image there is no real problem; the algorithm can still correlate the two sequences.

The problem with the incorrect image width is manageable. We can manually see that a size of 600 MCUs is not probable9. The fact is that the image fragment reconnection method identified the width to be 400 or 600, not 200 as it should be, for all but one of the images for that camera and resolution.

9Currently the standard cameras have a maximum resolution of about 12 M pixels. A width of 600 MCUs for such a camera would be equal to at least 15 M pixels.


Table 3.3: The results of the reconnection evaluation using 6 images at once. The first percentage column is calculated relative to the total, and the second relative to the sum of fragments excluding the short and DC 0 sequences.

Connection       Number of    Fragments incl.  Fragments excl.
validity rank    fragments    short [%]        short [%]
1:               748          56.241           70.433
2:               77           5.790            7.250
3:               39           2.932            3.672
4:               25           1.880            2.354
5:               16           1.203            1.507
6:               23           1.729            2.166
7:               15           1.128            1.412
8:               13           0.977            1.224
9:               14           1.053            1.318
10:              10           0.752            0.942
11:              8            0.602            0.753
12:              6            0.451            0.565
13:              13           0.977            1.224
14:              6            0.451            0.565
15:              3            0.226            0.282
>15:             40           3.008            3.766
Missing:         6            0.451            0.565
Short loop:      0            0                -
Short 2nd:       0            0                -
DC 0 seq:        0            0                -
Short total:     268          20.150           -
Total:           1330         100.000          100.000

One of the images was identified as being 599 MCUs wide. Still, only one of the 40 images was affected by the bad width estimation. We can therefore conclude that the image fragment reconnection method is robust and handles noise well.

3.5.2 Multiple image reconnection

The multiple image reconnection evaluation used 1331 fragments from 6 images, giving a total number of 219200 possible fragment pairs. These were reduced to a total of 32515 possible fragment pairs after setting the image width to 356 DC values in the DC value chain algorithm. The final result is given in Table 3.3.

The 40 fragment pairs that were given fragment joint validity values placing them at 16th place or worse come from the use of the wrong image width. The width that was used originates from the majority of the fragments. When these are removed another size would be estimated, and related to the new most probable image width, possibly fitting the 40 fragment pairs better. In this way the set of fragments can be iterated through and eventually all fragment pairs are correctly connected.

Five of the six missing fragment pairs are related to the last and first fragments of different images. Hence they cannot be connected. By manually examining them we can easily find the start and end fragments of the involved images, which helps us identify the number of images in our data set. This will in the end allow us to recreate any image where we have access to all fragments. The remaining missing fragment pair, which does not contain an image start and end fragment, is missing because its second fragment is too short to contain a RST marker. It is the end fragment of an image and is only 11 bytes long, hence it does not contain a full restart interval, or even a complete MCU.

Although the multiple image reconnection ability of the image fragment reconnection method is lower than for single images, the method is still able to correctly identify the majority of the fragment pairs. By using it repeatedly and removing fragment pairs that have been correctly joined, any number of fragment pairs and images should be possible to handle. The execution time increases linearly with the number of potential fragment pairs. Therefore, it is important to use pre-selection based on the RST markers to decrease the number of fragment pairs to be checked.
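A minimal sketch of the iterative use suggested above (assuming each fragment pair has already been given a fragment joint validity value, where a smaller value is better):

    def greedy_pairs(scored_pairs):
        """scored_pairs: iterable of (validity, first_fragment, second_fragment).

        Accept pairs in order of increasing validity so that every fragment is used
        at most once as a first member and at most once as a second member; the
        accepted pairs can then be removed and the procedure repeated on the rest."""
        used_first, used_second, accepted = set(), set(), []
        for validity, a, b in sorted(scored_pairs):
            if a not in used_first and b not in used_second:
                accepted.append((a, b))
                used_first.add(a)
                used_second.add(b)
        return accepted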

Chapter 4

Viewing Damaged JPEG Images

This chapter will explore the effects of missing or erroneous JPEG image header tables. If the image fragment reconnection method does not have access to all fragments of an image, or for some other reason cannot connect all fragments into the original image, the same problems arise as when we have a missing or damaged JPEG header. But when the header is missing completely, it is in some situations still possible to view the image. The most important header marker segments and tables will be presented in separate sections. The presentation is not meant to be exhaustive, only to show some of the more significant errors and their effects on the image. The image used to show the effects of the manipulations we have made is shown in its original undamaged state in Figure 4.1.

4.1 Start of Frame

There are 13 different SOF markers, but our work only concerns the first one (SOF0), which is used for non-differential baseline sequential DCT Huffman coded images. The SOF marker contains the following information, which can be seen in Figure 4.2:

Start Of Frame (SOF) Marks the beginning of the Start of Frame marker seg- ment, indicated by the hexadecimal value of 0xFFC0.

Frame header length (Lf) The length of the marker segment, excluding the first two marker bytes. The formula used to calculate the length is 8 + 3 × Nf.

Sample precision (P) The number of bits used for the samples of the components in the image. Should always be 8 for baseline sequential DCT JPEGs.

Number of lines (Y) The height of the image, can be 0 to 65535 lines.


Figure 4.1: This is the original undamaged image used to show the effects of different manipulations of the JPEG header.


Figure 4.2: The Start Of Frame (SOF) marker segment


Number of samples per line (X) The width of the image, can be 1 to 65535 samples per line.

Number of image components in frame (Nf) Number of components used to encode a pixel in the image. A grey-scaled image has one component, a colour image has three or four components depending on the colour coding system used. The valid values lie in the range of 1 to 255 for baseline sequential DCT JPEGs.

Component identifier (Ci) A unique label used to identify component i in an image. The label can be 0 to 255.

Horizontal sampling factor (Hi) Specifies the number of horizontal data units of component i in each MCU, ranging between 1 and 4.

Vertical sampling factor (Vi) Specifies the number of vertical data units of component i in each MCU, ranging between 1 and 4.

Quantization table destination selector (Tqi) Defines which quantization table to use for component i, of up to four possible, for baseline sequential DCT JPEGs.
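The SOF0 fields above map directly onto a small parser; the sketch below (Python, assuming the segment bytes start at the 0xFFC0 marker) extracts the parameters that matter most for viewing, i.e. the image size and the sampling factors:

    import struct

    def parse_sof0(segment):
        """Parse a SOF0 marker segment given as bytes starting at 0xFF 0xC0.
        Returns precision, height, width and per-component (id, H, V, Tq)."""
        assert segment[0:2] == b"\xff\xc0"
        length, precision, height, width, ncomp = struct.unpack(">HBHHB", segment[2:10])
        assert length == 8 + 3 * ncomp  # Lf as defined above
        comps = []
        for i in range(ncomp):
            cid, hv, tq = struct.unpack(">BBB", segment[10 + 3 * i:13 + 3 * i])
            comps.append({"id": cid, "H": hv >> 4, "V": hv & 0x0F, "Tq": tq})
        return {"precision": precision, "height": height, "width": width,
                "components": comps}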

The SOF header section contains several parameters affecting the quality of the image. The number of data units in each MCU can be changed in many ways, leading to a number of effects appearing in the image. If any of the chrominance quantization tables gets a high (or erroneous) sample factor (Hi), the result will be a discolouring of the image, together with single MCUs creating specks. This is not very probable, though, since the common case is to repeat the luminance quantization table several times to compensate for the lower resolution specified for the chrominance, especially horizontally. If we retain the sum of the sample factors in the image, i.e. move the factor from the luminance quantization table H1 to one of the chrominance tables, we get the effect that can be seen in Figure 4.3. In this specific case it is the Cr (H2) table that is affected.

The luminance quantization table is more important than the chrominance dittos for the overall quality of the image. This becomes apparent when the luminance table gets the wrong sample factor. In our case the original image has two-to-one pixels horizontally and one-to-one vertically, i.e. H1 = 2 and V1 = 1. If we increase that to two-to-one vertically (V1 = 2) we get the effect that can be seen in Figure 4.4. The image is stretched out vertically to twice its height and the colours are almost gone. Areas that have thin colour in the original image are nearly perfectly decoded into a grey scale, but coloured areas are heavily affected.

If we instead decrease the sampling factor for the luminance quantization table to one-to-one both horizontally and vertically, i.e. H1 = 1 and V1 = 1, the effect is that the image gets rendered as one fourth of its original size. Since the original image has a two-to-one horizontal sample factor the image is repeated twice horizontally. The effect is shown in Figure 4.5.

When the component identifiers Ci of an image are swapped, the effect is clearly visible.


Figure 4.3: The effect on a JPEG image having the horizontal sample rate H1 of the luminance quantization tables exchanged with the Cr chrominance table (H2), retaining the sum of the sample factors for the image.

Figure 4.4: The effect on a JPEG image having the sample rate V1 of the luminance table set too high.

Swapping the two chrominance components results in heavy discolouring of the image. The effect comes from swapping the red and blue hues in the image, and typically human skin turns significantly blue. An example of that can be seen in Figure 4.6. If the component identifier of the luminance is swapped with one of the chrominance components, the result is an almost unrecognisable image. This clearly shows the importance of the luminance for the overall quality of the image. Since the luminance is often reduced less by the quantization step of the JPEG process, the effect of the swap is a decreased contrast and a more intense red or blue colour, compared to the original image. An image having the luminance component identifier swapped with the blue, Cb, chrominance component can be seen in Figure 4.7. The disproportionate red colour comes from the non-affected red Cr chrominance table and values in the scan. The reason for the discolouring is that the values in the luminance quantization table are lower than the chrominance table values, therefore the affected chrominance table will not properly restore the full colouring of its corresponding hue.

When the width of the image is wrong, the image will be symmetrically distorted in either left or right direction. If the image width is very wrong the pattern will be almost horizontal. An example of a moderately distorted image, with the width increased by 16 pixels, can be seen in Figure 4.8. The diagonals from the lower left corner to the upper right corner are typical for images where the image width, the number of samples per line X, is n × width ≤ X < 1.5n × width, where n = 0, 1, 2, .... If the width instead is 1.5n × width ≤ X < (n + 1) × width, the diagonals are oriented from the upper left corner to the lower right corner. An increase of the image width in intervals of the original correct image width will generate a fully viewable image, which is repeated horizontally a number of times. Even a change of the image's width to one half of the original width keeps enough of the image to make it easily recognisable.

If we convert all images in our database to landscape format, the most common relation between the width w and the height h of the images is 1.333, i.e. w = (4/3) × h. The sizes, width and height relations, and number of images of each size can be found in Table 4.1. The image sizes were extracted using the Linux application.

Since the relation between the width and the height of an image in reality only takes on a few values, it is easy to guess the correct width or height of an image. The light sensors of digital cameras often come in a few known sizes, which makes the guessing even easier. To further simplify the matter, there are only a few manufacturers of light sensors for cameras in the world, hence there is a high probability of the same sensor make in different cameras.

Changes of the image height will only affect the number of lines of the image shown, incremented in steps of 8 or 16 pixels per line depending on the vertical sampling factor. If the height is set too high, the bottom of the image will be extended by a grey area.


Figure 4.5: The effect on a JPEG image having the sample rate of the luminance table set too low, i.e. H1 = 1 and V1 = 1.

Figure 4.6: The effect on a JPEG image having the identifiers of the two chromi- nance components swapped.


Figure 4.7: The effect on a JPEG image having the component identifiers of the luminance swapped with one of the chrominance identifiers, in this case the blue Cb component.

Figure 4.8: The effect on a JPEG image with the image width, the number of samples per line, being set too high (increased 16 pixels in this case).


Table 4.1: The relation between the image width and height of the images in our database. All image sizes have been converted to landscape format.

Width  Height  Relation  # of images
640    480     1.333     110
1280   960     1.333     73
1397   931     1.500     1
1536   1024    1.500     685
1600   1200    1.333     1528
1888   1259    1.499     1
2048   1365    1.500     2
2048   1536    1.333     3594
2272   1704    1.333     298
2304   1728    1.333     216
2816   2112    1.333     1257
2848   2136    1.333     1211
2883   1922    1.500     1
3008   2000    1.504     178
3456   2304    1.500     262
3658   2439    1.499     1
3888   2592    1.500     402


Figure 4.9: The Define Quantization Table (DQT) marker segment. The last 65 bytes in the marker segment are repeated n times, where n is the number of quantization tables used.

4.2 Define Quantization Table

The Define Quantization Table (DQT) marker segment has the hexadecimal value 0xFFDB as its first two bytes. The tables are used to compress the image by putting weights on the 64 frequencies from the discrete cosine transformation. The amplitude of each frequency is divided by a value in the table. Higher frequencies are less important for the perceived quality of the image and thus get a higher value in the table.

The outline of the segment can be seen in Figure 4.9 and contains the following information:

Define Quantization Table (DQT) Marks the beginning of the Define Quan- tization Table marker segment. The marker segment is indicated by the hexadecimal value 0xFFDB.


Quantization table definition length (Lq) The length of the marker segment, excluding the first two marker bytes. The length value for baseline sequential DCT JPEGs is calculated as 2 + n × 65, where n is the number of quantization tables used.

Quantization table element precision (Pq) This value is always 0 for baseline sequential DCT JPEGs, meaning 8 bit precision.

Quantization table destination identifier (Tq) Specifies one of four destinations for the quantization table when decoding an image. Valid values are 0 to 3.

Quantization table element (Qk) Gives the k-th value, in order of increasing frequency, to use for the quantization step when coding and decoding an image. For 8 bit precision the values should be in the range of 1 to 255. To achieve the specific ordering of the elements, an 8 × 8 block of pixels in the frequency domain is traversed in zig-zag order, beginning at the top-left corner with the next step being horizontal (see Figure 3.2).

Errors in the quantization tables, defined in the Define Quantization Table header segment, affect the contrast and the colours of an image. The biggest effect is achieved when the DC component in the luminance quantization table is wrong. Usually the quantization factor for the DC component is low. We have found empirically that 4 is a common value for the luminance DC component, as well as for the chrominance DC components. If the luminance DC component is high, the contrast of the image becomes higher. In Figure 4.10 the luminance DC component is set to 0xFF, i.e. 255. If instead the luminance quantization table DC component is set to 0x00 the result is a flat, low contrast image. The quality is higher than for the value 0xFF, but this is not surprising since the original value is 0x04.

The DC component of the chrominance quantization table greatly affects the colouring of the image. The Cb and Cr quantization tables affect different colours, but otherwise the effect of setting any of their DC values to 0xFF is the same for both tables. Figure 4.11 shows the effect of the red, Cr, chrominance DC component being set to 0xFF.

Figures 4.10 and 4.11 show the maximum effect an error in a quantization table will have on an image. If the highest frequency is affected the effect on the image is not visible. Even if we set the second lowest frequency to 0xFF the effect on the image is much lower than for the DC component. We can therefore conclude that as long as the quantization tables have the correct length and format, their effect on the quality of the image is limited, although on the border of severe in the case of the luminance DC component.
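A corresponding sketch for the DQT segment (Python, 8-bit table precision assumed) parses the tables and shows the dequantization step in which each coefficient amplitude is multiplied back by its table value during decoding:

    def parse_dqt(segment):
        """Parse a DQT marker segment (0xFF 0xDB ...) with 8-bit table precision.
        Returns a dict mapping table destination (Tq) to its 64 zig-zag ordered values."""
        assert segment[0:2] == b"\xff\xdb"
        length = int.from_bytes(segment[2:4], "big")
        tables, pos = {}, 4
        while pos < 2 + length:
            pq_tq = segment[pos]
            pq, tq = pq_tq >> 4, pq_tq & 0x0F
            assert pq == 0, "baseline JPEG uses 8-bit table elements"
            tables[tq] = list(segment[pos + 1:pos + 65])
            pos += 65
        return tables

    def dequantize(zigzag_coeffs, table):
        # Decoding multiplies each quantized coefficient by its table entry.
        return [c * q for c, q in zip(zigzag_coeffs, table)]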

4.3 Define Huffman Table

The DHT marker segment contains the Huffman tables used for compression of the image. The JPEG standard allows the use of arithmetic entropy coding too, but our work only concerns non-differential Huffman entropy coded images.


Figure 4.10: The effect on a JPEG image of setting the luminance quantization table DC component to 0xFF.

Figure 4.11: The effect on a JPEG image of setting the chrominance quantization table DC component to 0xFF.



Figure 4.12: The DHT marker segment. The last bytes in the marker segment are repeated n times, where n is the number of Huffman tables used.

When coding an image the standard does not specify any particular Huffman tables to use, but in reality we have found that all cameras included in our work use the same Huffman tables, which are the example tables [13, pp. 149–157] from the standard. The outline of the segment can be seen in Figure 4.12 and contains the following information:

Define Huffman Table (DHT) Marks the beginning of the Define Huffman Table marker segment, indicated by the hexadecimal value of 0xFFC4.

Huffman table definition length (Lh) The length of the DHT marker segment, excluding the first two bytes. The formula 2 + Σ_{t=1..n} (17 + m_t) is used to calculate the length parameter, where m_t = Σ_{i=1..16} L_i is the number of parameters following the 16 Li values for table t.

Table class (Tc) Specifies whether this is a DC table, indicated by value 0, or an AC table, indicated by 1.

Huffman table destination identifier (Th) Indicates which table this is, out of two possible for baseline sequential DCT JPEGs.

Number of Huffman codes of length i (Li) Specifies the number of Huffman codes for each of the 16 possible lengths allowed by the standard [13]. The allowed values lie in the range of 0 to 255.

Value associated with each Huffman code (Vi,j) Gives the value associated with each Huffman code of length i. The Huffman coding model determines the meaning of each value. The values range between 0 and 255.

The Huffman tables definition is sensitive to errors, but if the format is correct and the standard is followed, it is still possible to change the definition without corrupting the image. If the maximum number of possible Huffman codes of a specific length is not exceeded, it is possible to redistribute the number of Huffman codes of length i specifications. It is also possible to change the values associated with each Huffman code, as long as they allow the scan to be decoded.
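The Li and Vi,j lists define canonical Huffman codes; the following sketch (illustrative only) reconstructs the (code, length) to value mapping from them in the way prescribed by the standard:

    def build_huffman_codes(bits, huffval):
        """Build a (code, length) -> value mapping from a DHT definition.

        bits:    the 16 Li counts (number of codes of each length 1..16).
        huffval: the Vi,j values, in the order they appear in the segment.
        """
        codes, code, k = {}, 0, 0
        for length in range(1, 17):
            for _ in range(bits[length - 1]):
                codes[(code, length)] = huffval[k]
                code += 1
                k += 1
            code <<= 1  # next length: append a zero bit to the following code
        return codes

    # Tiny toy example: two 1-bit codes map the values 0x00 and 0x01.
    print(build_huffman_codes([2] + [0] * 15, [0x00, 0x01]))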


To illustrate the effect of an error in a Huffman table, the image in Figure 4.13 has been given new Huffman table definitions taken from a different image with a completely different Huffman tables definition. The two images are not related, one is captured by a Fuji digital camera and the other is professionally scanned from a film negative. In our database all cameras used the same Huffman table definition, which is taken from the JPEG standard [13, pp. 149–157]. The two Huffman tables definitions1 differ in many positions, but the image is still not corrupted beyond decoding. Due to the change of the tables the decoder reports 17 extraneous bytes before the EOI marker, but still shows the image. This can be done because of the sequential decoding of the scan used in baseline sequential DCT JPEG images.

4.4 Define Restart Interval

The Define Restart Interval (DRI) marker segment should always be 2 + 4 bytes long. Figure 4.14 specifies the DRI marker segment and the information held in the marker segment has the following meaning:

Define Restart Interval (DRI) Marks the beginning of the Define Restart In- terval marker segment, indicated by the hexadecimal value of 0xFFDD.

Define restart interval segment length (Lr) The length of the marker segment, excluding the first two marker bytes. The value should always be 4.

Restart interval (Ri) Gives the number of MCUs in each restart interval. There can be 0 to 65535 MCUs in a restart interval, but there should always be the same number of MCUs in each interval, except in the interval ending the image. The ending interval can have any number of full MCUs, which in reality means up to Ri MCUs.

The existence of this header marker segment, having a non-zero restart interval value, indicates the use of RST markers in the scan. For every Ri-th MCU in the scan there should be a marker 0xFFDx, where x_{i+1} = (x_i + 1) mod 8, starting at 0xFFD0.

The definition of the restart interval also affects the quality of an image. This is a short restart marker segment, which only contains one value. If we set the restart interval too low, the decoder uses that number of MCUs after each restart marker and skips the rest until the next restart marker in the scan. This gives the effect shown in Figure 4.15, where the original 4 MCU long restart interval has been reset to 3. Since not all the data in the scan is used, the image is shortened by the amount of skipped MCUs. This can be seen in Figure 4.15, which is 75% of its original height.
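The cyclic marker numbering can be checked with a few lines of Python; this sketch scans a byte string and verifies that the RST markers appear in the expected 0xFFD0 to 0xFFD7 order (byte-stuffed 0xFF00 values are ignored automatically, since 0x00 lies outside the marker range):

    def check_rst_sequence(scan):
        """Return True if the RST markers in the scan follow the cyclic order
        0xFFD0, 0xFFD1, ..., 0xFFD7, 0xFFD0, ..."""
        expected = None
        i = 0
        while i < len(scan) - 1:
            if scan[i] == 0xFF and 0xD0 <= scan[i + 1] <= 0xD7:
                n = scan[i + 1] & 0x0F
                if expected is not None and n != expected:
                    return False
                expected = (n + 1) % 8
                i += 2
            else:
                i += 1
        return True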

1The raw data files and source code can be found at http://www.ida.liu.se/~iislab/ security/forensics/material/ or by contacting the author at [email protected]


Figure 4.13: The effect on a JPEG image having its original Huffman tables definition exchanged for the definition of a different image.


Figure 4.14: The DRI marker segment

Figure 4.15: The effect on a JPEG image having its original restart interval definition shortened from 4 MCUs to 3.



Figure 4.16: The Start Of Scan (SOS) marker segment. There can be 1, 3, or 4 scan component specifications in an image.

If instead the restart interval definition value is set too high, the image gets specks at repeated intervals, coming from the missing extra MCUs the decoder tries to decode. These two types of errors are easy to spot, but not very likely, since it is trivial to find the correct restart interval definition from the scan.

Even though the multi-layered coding of the scan, using Huffman codes and values interleaved, is easily corrupted, the use of restart markers improves the robustness of the scan. The level of robustness is connected to the restart marker interval; the shorter the interval, the more robust the decoding of the scan. Depending on the decoder, the extra spurious bytes at the end of the scan, before the EOI, resulting from the wrong restart interval definition, will be reported. It is still possible to decode the image, possibly lacking the pixels of some of the last MCUs.

4.5 Start of Scan

The SOS marker segment precedes the scan of an image and connects the components of the image to the quantization and Huffman tables specified earlier in the header section. The components are interleaved if there is more than one component in the image, i.e. it is a colour image.

The outline of the segment can be seen in Figure 4.16 and contains the following information:

Start Of Scan (SOS) Marks the beginning of the Start of Scan marker segment, indicated by the hexadecimal value of 0xFFDA.

Scan header length (Ls) The length of the SOS marker segment, excluding the first two bytes. The formula 6 + 2 × Ns is used to calculate the length of the segment.

Number of image components in scan (Ns) Defines how many source image components there are in the scan. The number of components given by Ns should match the number of Csj, Tdj, and Taj there are. Valid values are in the range 1 to 4.


Scan component selector (Csj) Uniquely identifies and orders the Nf components in the image. Each Csj shall match a Ci in the SOF header, and follow the same order. If there is more than one image component, the components should be interleaved in the MCUs, in increasing order of j. The following restriction applies if Nf > 1:

Σ_{j=1..Ns} Hj × Vj ≤ 10,

where Hj and Vj are the horizontal and vertical sampling factors specified in the SOF header. The marker parameter can have a value between 0 and 255, but should be part of the set of Ci specified in the SOF header.

DC entropy coding table destination selector (Tdj ) Specifies which of the two possible DC Huffman tables to use when decoding component Csj . Valid values are 0 and 1.

AC entropy coding table destination selector (Taj ) Specifies which of the two possible AC Huffman tables to use when decoding component Csj . Valid values are 0 and 1.

Start of spectral or predictor selection (Ss) This parameter holds the index of the first DCT coefficient in the zig-zag order to be used when coding an image. For baseline sequential DCT JPEGs it should be set to 0.

End of spectral selection (Se) This parameter holds the index of the last DCT coefficient in the zig-zag order to be used when coding an image. For baseline sequential DCT JPEGs it should be set to 63.

Successive approximation bit position high (Ah) This parameter should be set to 0.

Successive approximation bit position low (Al) The parameter has no mean- ing for baseline sequential DCT JPEGs and should be set to 0.

The information that can be changed in the start of scan header segment is the Huffman tables' connections to the luminance and chrominance components. It is only possible to exchange a luminance DC or AC table with the corresponding table for the chrominance. Since the DC table for the luminance has the greatest effect on the image quality, Figure 4.17 shows what happens when the luminance DC Huffman table is set to point to the chrominance DC table.

If we set one of the chrominance components' DC table to point to the luminance DC Huffman table, the colours are affected, with a pattern of corrupted MCUs. The quality of the image is still good and it is possible to recognise the subjects in the image. A complete exchange of all Huffman tables gives the result shown in Figure 4.18.


Figure 4.17: The effect on a JPEG image where the luminance DC Huffman table is set to use the chrominance DC Huffman table.

Figure 4.18: The effect on a JPEG image where all Huffman table pointers in the Start of Scan header segment are wrong.


The start of scan header segment is static for all 9958 valid colour baseline sequential DCT JPEG images in our database. The formal specification of the start of scan header marker only allows a few different alternatives for baseline sequential DCT JPEG images. Consequently the marker's position as the last header marker segment before the scan, together with its static appearance, makes it suitable to use as an indicator of the first fragment of an image. Even if the start of scan segment is not complete, it should be possible to use as a signature of the start of a JPEG image scan. For a colour baseline sequential DCT JPEG image the header segment's typical hexadecimal byte sequence is

FF DA 00 0C 03 01 00 02 11 03 11 00 3F 00.
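A sketch of using this byte sequence as a signature (Python); every hit marks a position where a scan is likely to begin, immediately after the 14-byte segment:

    SOS_SIGNATURE = bytes.fromhex("FFDA000C03010002110311003F00")

    def find_scan_starts(data):
        """Return offsets where the scan data starts, i.e. immediately after each
        occurrence of the typical colour baseline SOS segment."""
        hits, start = [], 0
        while (pos := data.find(SOS_SIGNATURE, start)) != -1:
            hits.append(pos + len(SOS_SIGNATURE))
            start = pos + 1
        return hits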

4.6 Combined Errors

The errors described in the previous sections can be combined in many ways. Some combinations will even cancel each other out; for example, a change of the Huffman table indices in the start of scan header segment can be reversed by changes to the Huffman table definition segment. In other situations the effects will reinforce each other and render the image almost unrecognisable. There are however only a limited number of unique effects, although their level of severity may vary. It is therefore possible to recognise and separate their effects even in cases where several errors are combined. In severe cases it might be possible to perform brute force searches for the factors in the combined effect.

4.7 Using an Artificial JPEG Header

It is possible to use an artificial JPEG header, preferably taken from another image captured by the same digital camera or created by the same application. An important factor is the Huffman code definitions, but as can be seen in Section 4.3, it is possible to exchange the Huffman tables of an image with the tables from a completely different image, as long as the new tables are valid Huffman tables. However, all cameras in our database use the same Huffman tables definition, which happens to be the example tables [13, pp. 149–157] given in the JPEG standard. The scanner uses its own Huffman table definitions, but they are the same for all images. The thumbnail images embedded in the scanner images' headers use the standard tables, though.

A possible reason for the cameras using the standard Huffman tables is execution speed and low power consumption. It is costly both in time and power consumption to let the camera calculate unique Huffman tables for every image. The downside is an increased file size, but since the consumer has to pay separately for the storage media, the camera manufacturers do not have any incentive to keep the file sizes down. It might therefore be safe to assume that all (consumer grade) cameras use the same standard Huffman tables.


The effect of using the wrong quantization tables is not that severe, especially as the values in the tables are often found in the hexadecimal range of 0x00 to 0x20. The lower values often appear at the top of the table and then slowly increase towards the end. It is therefore possible to create completely artificial quantization tables giving acceptable results, even though using tables from other images is probably safer.

Some cameras create two quantization table definitions, one for the luminance table and one for the chrominance tables. Their lengths are the same, though. Consequently it does not matter if one single definition header segment is used in the artificial header, even for images having split definitions from the beginning.

The restart interval definition header segment is not always used. All images using restart markers in our database use the interval 4. But our database only contains images from three cameras using RST markers and the value may therefore very well be different for other cameras.

The horizontal and vertical sampling factors can vary. Each parameter can either have the value 1 or 2. For all images in our database only the luminance settings vary, not the chrominance ditto, which keeps the number of possible settings down. The sum of the sampling factors should always be between 2 and 4. It is therefore possible to use a brute force model, simply trying three different combinations. Most of the images in our database have a horizontal sampling factor of 2, and a vertical factor of 1.

The settings for the number of lines and the number of samples per line of an image can vary and finding the correct setting for them may require some work. As shown in Table 4.1 the width and height of the images in our database either relate to each other as 3/2 or 4/3, or the inverses thereof. This fact, combined with knowledge of the common number of pixels in cameras, can be a help in finding the proper image width. The height is less important; as long as there is enough data (pixels) in the scan it is enough to test a few settings, and the image will not be distorted. The decoder will give an error reporting too few scan lines if the height is too large.

The conclusion to be made is that artificial headers will give acceptable results in most cases, although perhaps not give the original image back. From our experiments we have found that there are only a few crucial parameters that change between images, regardless of the camera make.
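A sketch of the brute force enumeration suggested above; the sampling factor candidates and the aspect ratios follow the observations in this chapter, while the list of widths (taken from Table 4.1) is only one possible starting point:

    from itertools import product

    # Luminance sampling factor combinations to try (chrominance kept at 1x1);
    # the three combinations correspond to a factor sum between 2 and 4.
    SAMPLING_CANDIDATES = [(2, 1), (1, 1), (1, 2)]

    # Common width:height relations (4:3 and 3:2) and widths observed in Table 4.1.
    ASPECTS = [4 / 3, 3 / 2]
    CANDIDATE_WIDTHS = [640, 1280, 1600, 2048, 2272, 2304, 2816, 2848, 3008, 3456, 3888]

    def candidate_headers():
        """Enumerate (H1, V1, width, height) tuples to try in an artificial header."""
        for (h1, v1), width, aspect in product(SAMPLING_CANDIDATES, CANDIDATE_WIDTHS, ASPECTS):
            yield h1, v1, width, round(width / aspect)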

4.8 Viewing Fragments

If there are restart markers embedded in the scan part of an image it is possible to use an artificial header with properly adjusted height to view the fragment. The hardest part is to collect a large enough amount of consecutive fragments, since an 8 KiB fragment covers approximately 3000 × 8 pixels, i.e. one line in the scan. To be able to see a large enough part of an image we need around 10 scan lines.


Figure 4.19 shows an image made from fragments 146 to 301 of our test image, resulting in a 624 KiB file. We have used an artificial (but correct) header and found the height to be 880 pixels by trial and error. Otherwise there are no adjustments. The first fragment gives a few corrupted MCUs in the upper left corner of the image. They come from the last part of the restart interval that was cut in half when the fragment was created. Since the first restart marker in fragment 146 is 0xFFD1 the image is wrapping around, i.e. horizontally displaced.

If the sequence of fragments joined together is not correct, the result is an image made up of patches of the original image, or the images included in the set of fragments. It is easily recognisable from the sharp horizontal borders between the different fragment sequences. A typical example can be seen in Figure 4.20. The fragments used in Figure 4.20 come from three sequences of fragments from the test image. The sequences are, from top to bottom: fragments 310–360, fragments 146–201, and fragments 15–66. As can be seen they all give different horizontal displacements of the image.
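The mechanics of producing such a file are simple; the sketch below (hypothetical file names) concatenates an artificial header, a sequence of scan fragments, and a terminating EOI marker:

    def write_viewable(header_path, fragment_paths, out_path):
        """Concatenate an artificial JPEG header (everything up to and including the
        SOS segment) with a sequence of scan fragments and terminate with an EOI
        marker. The height field in the header is assumed to have been adjusted
        beforehand, e.g. by trial and error."""
        with open(out_path, "wb") as out:
            with open(header_path, "rb") as h:
                out.write(h.read())
            for path in fragment_paths:
                with open(path, "rb") as f:
                    out.write(f.read())
            out.write(b"\xff\xd9")  # EOI

    # Example (hypothetical file names):
    # write_viewable("artificial_header.bin",
    #                [f"frag_{i:04d}.bin" for i in range(146, 302)],
    #                "partial_image.jpg")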


Figure 4.19: A correct sequence of fragments with an artificial header adjusted to fit the height of the resulting image.

Figure 4.20: An incorrect sequence of fragments with an artificial header adjusted to fit the height of the resulting image. The typical sharp horizontal borders between the different fragment sequences are clearly visible.

Chapter 5

Discussion

In this chapter we discuss the research and the results presented in the previous chapters. We end the chapter with some conclusions drawn from our research.

5.1 File Type Categorisation

The results indicate that the file type categorisation method and its algorithms are well suited for high information entropy file types. One reason might be the fact that some lower information entropy files contain sub-parts of different data types. A typical example is the Windows PE files, which often contain machine code, tables and possibly text strings in different parts of a file. By using a complete Windows PE file for training we include all of these sorts of data in the centroid, making the centroid more general. It might therefore generate false positives for any of the sub-part data types.

The partitioning of files is clearly a problem, but might possibly be fixed by treating the sub-parts of a file as separate entities. This approach will however require the parts of a specific file type to be found and isolated in advance when a centroid is to be created. To be able to make the partitioning of the file type data we have to find the exact borders of the sub-parts before we can proceed. If all sub-parts of a file can be found and isolated, we then have to decide how the sub-part centroids should be combined into a metacentroid of the file type in question. A solution might be to give each sub-part centroid a weight based on its share of the total amount of bytes used for centroid creation. Another possible solution is to modify the file type categorisation method to use a sliding window approach, where the size of the window equals the smallest possible unique sub-part of a file.

The 2-gram algorithm is the only algorithm able to detect executable files with a higher detection rate than simply guessing the correct file type. The results for the Windows PE files are better than the results presented in [12], because we have cleaned the files used to create the centroid. The previous executable file centroid contained JPEG, PNG and GIF images; these were removed this time.

By cleaning the executable files, i.e. using (partially) artificial data, we get a more exact model of pure executable code, but at the same time we dissociate the centroid from a real world situation. This duality burdens all file types where there might be other file types embedded in the binary data. For example, Zip files may contain JPEG images, because the algorithm does not try to compress already compressed files. If we remove this embedded data, we get more homogeneous file type data to create our centroids from, but once again, what we might then face in a real life situation may not be what we have trained our detectors to handle. A large set of centroids to use for categorisation will to some extent compensate for this.

The main difference in the performance of the 2-gram algorithm for different file types may possibly be explained by its ability to capture regularities in files. For example, JPEG images without RST markers should not contain any 2-grams starting with 0xFF, except for 0xFF00 and 0xFFD9. This is a strong regularity and consequently the results from running the 2-gram algorithm on that file type are very good indeed. For the JPEG with RST marker file type the results are not that good. This file type allows, but does not require, some occurrences of 2-grams between 0xFFD0 and 0xFFD7. This relaxes the 2-gram algorithm, and its performance degrades.

Another feature that affects the performance of the 2-gram algorithm is the necessary scaling of the size of the centroid creation blocks relative to the size of the detection fragments. Since we use 1 MiB blocks to create the centroid, but 4 KiB fragments for categorisation, we introduce instability into the process. Every deviation from the centroid by a sample produces a disproportional effect on the distance metric. As long as the standard deviation for a specific 2-gram is low, its effect on the distance measurement is decreased, but for 2-grams with a high standard deviation the effect can be considerable.

The use of a file type specific rule set in combination with 1-grams only improves the results for the stand-alone algorithms, i.e. when the BFD or RoC are not combined. The signed difference value extension of the RoC algorithm affects the results more than the rule set extension does. The file type of the centroid has a remarkable effect on the results. The 2-gram algorithm is good at detecting fragments of JPEG images lacking RST markers and MP3 file fragments. When it comes to other file types the results are worse; it has the highest level of false positives at the 50% detection level for AES encrypted GPG files and is not even included on the ROC curve plot for Zip files. The 1-gram algorithms and their corresponding centroids seem to be better at describing a file type without any specific regularities, while the 2-gram algorithm is overly sensitive to a lack of regularities. However, the 2-gram algorithm is much better than all the 1-gram algorithms when dealing with files containing strict regularities, such as JPEG images lacking RST markers. Consequently, if we have specific and known features related to a certain file type, we should preferably use the 2-gram algorithm. If we need to work with less regular and featureless file types, we should go for a 1-gram algorithm.


5.2 Fragment Reconnection

The image fragment reconnection method depends heavily on the existence of RST markers in the scan part of an image. Without the RST markers it is hard to synchronise the decoder with the stream, especially as the Huffman codes and embedded amplitude values often can be decoded incorrectly for quite long sequences. However, the probability of incorrectly decoding a full 4 KiB fragment is very low, based on our empirical evaluations. Important features to be used for decreasing the false positive rate of the decoding are the sampling factor, the relation between the length of the luminance and chrominance non-zero value vectors, and the probability of a certain non-zero vector pattern. Also the actual values of the vectors can be used; small variations of the low frequency amplitudes are more probable than high variations, in normal, every-day images at least.

By varying the sampling factors and the starting bit of a fragment, a brute force model of synchronising the decoder with the scan might be achievable even for scan parts of images without RST markers. To do this the maximum decoding length for each iteration has to be recorded together with the non-zero amplitude value vector parameters. These parameters can then be used to find the most probable starting point of the first MCU in the fragment. If we know where the first MCU starts we can use a slightly modified image fragment reconnection method; the checking of the length of the RST intervals should be abolished. There is no need to calculate the differences between consecutive DC values, since they are already given as delta values.

Since the image fragment reconnection method relies on finding the start of a MCU, i.e. the first DC luminance Huffman code, every bit position starting from the beginning of a fragment needs to be processed to see whether it is the first in a MCU or not. The first bits of a fragment can represent different attributes in the JPEG scan. Figure 5.1 illustrates this. For JPEGs without RST markers each bit may belong to a Huffman code, its accompanying amplitude value, or a zero byte padding after a 0xFF byte value. The Huffman codes can either be part of a luminance table, where there can be one, two, or four tables in a row, or they can come from one of two chrominance tables. The codes in the tables can in turn belong to a DC or an AC table, where AC codes are up to 63 times more frequent than DC values.

To find a bound on the performance penalty introduced by the iterative search for the start of the first MCU, we use the longest possible Huffman codes that represent no consecutive zeros and the longest possible amplitude value representation. In this way we find the maximum theoretical length of a MCU to be 1820 bytes, or 14560 bits. Each bit position needs to be tested with one, two or four luminance tables, thus the maximum number of iterations before a modified image fragment reconnection method has found the start of an MCU is 14560 × 3 = 43680. Consequently it should be possible to use the proposed method on any standard JPEG image fragment, without any severe performance degradation.



Figure 5.1: The possible functions of the starting bits of a non-RST marker JPEG fragment. There are one, two, or four luminance tables and two chrominance tables.

The real iteration limit is lower than 43680, because it is not possible to use only the maximum positive value for the DC frequency. The DC tables give the delta value from the previous DC value of the same component1. By using the maximum possible positive value once, the next DC value will be out of range if we once again use a maximum positive delta value for the same component.

The image fragment reconnection method can handle fragments as single entities without any order known in advance. In reality, file systems will store parts of files in consecutive order, although possibly wrapping over the end of the hard disk. This simplifies matters a lot, since we then only need to find the first fragment and can build from there. The situation is similar for network TCP packets, where we have packet header fields giving the ordering of the packets. The image fragment reconnection method might appear unnecessarily advanced for such situations, but if the ordering is lost for some reason, it can still reconnect the fragments.

The description of the modified image fragment reconnection method to be used for image fragments lacking RST markers is a theoretical idea of one way to generalise the method. We have not performed any tests or other empirical evaluations of the modifications, hence what we suggest is only a proposal for a method.
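Purely as an illustration of this untested proposal, the brute force synchronisation could be organised along the following lines, where try_decode stands in for a real entropy decoder that reports how many bits it managed to decode from a given starting hypothesis:

    def candidate_mcu_starts(fragment_bits, try_decode, max_offset=14560 * 3):
        """Illustrative only: test candidate starting bit offsets and luminance
        table counts (1, 2 or 4 data units per MCU) and keep the hypotheses that
        decode the longest, as suggested in the text above."""
        results = []
        for lum_tables in (1, 2, 4):
            for offset in range(min(max_offset, len(fragment_bits))):
                decoded = try_decode(fragment_bits, offset, lum_tables)
                results.append((decoded, offset, lum_tables))
        return sorted(results, reverse=True)[:10]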

5.3 Viewing Fragments

The possibility to view (parts of) the original image is heavily dependent on the amount of data it is possible to collect. Even though as little as one MCU worth of data is enough, the resulting image is too small to give much viewable information. Still, the smaller the image, the larger the amount of image information in each MCU. Since the size of images is a limiting factor on the Internet, because of the relatively small size of current computer displays, a single network packet payload can still contain enough image data to render it viewable. Thus the difference between a network packet and a hard disk fragment is not as big as the difference in size might indicate.

¹ Luminance or one of the two chrominance tables, that is.


The use of an artificial JPEG header makes it possible to view one version of the original image. The question is whether the loss of quality disqualifies the image from being used as evidence in a court of law. What we can recreate is an image where the main objects are visible and correct, with a varying degree of noise. There is however a risk that the noise introduces errors in the details of the image, and therefore decreases the evidence value of the image.

An alternative to trying to recreate as much of an image as possible, with the negative effects of increased noise and potential errors in the image details, is to recreate the image using only the DC luminance values. This method also avoids the use of a full artificial header; the only thing needed is a Huffman table for the decoding of the values. We only need to know the length of the AC values interleaved in the data stream, we do not need to decode them. Such an image will show the outlines of the objects in the image. The requirements on the viewer application are not high; it is enough that it can display a surface plot of a matrix, thus any 3D-plotting software is enough.
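As a small illustration of the last point, a matrix of decoded DC luminance values can be rendered as a surface plot with any 3D-plotting package; the sketch below uses matplotlib and random dummy data in place of real decoded values.

import numpy as np
import matplotlib.pyplot as plt

# dc_matrix stands in for one DC luminance value per MCU, arranged
# according to the (possibly guessed) image width; random data here.
dc_matrix = np.random.randint(-512, 512, size=(60, 80))

rows, cols = dc_matrix.shape
x, y = np.meshgrid(np.arange(cols), np.arange(rows))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(x, y, dc_matrix, cmap="gray")  # object outlines appear as height
ax.view_init(elev=80, azim=-90)                # view almost straight from above
plt.show()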

5.4 Conclusion

This work can be divided into three sub-parts logically describing the work flow of finding and reconnecting data fragments, especially of baseline sequential DCT JPEG images. It is possible to use the sub-parts individually, but they are meant to work together.

The first part describes how the file type can be found from fragments, without access to any metadata. To do this our method for categorisation of the file type of data fragments is presented. The fragments can be taken from a crashed hard disk, a network packet, a RAM dump, or any other situation where there is a lot of unknown data to be processed. The file type categorisation method works very well for JPEG images, and satisfactorily for other file types. The method comprises three algorithms that should be chosen depending on the aim of the application. When robust function and fast execution are important, and some detection rate can be sacrificed, the BFD and RoC algorithms are suitable. When JPEG images are to be found the 2-gram algorithm is the best. The BFD and RoC algorithms are better at categorising file types without any clear structure; the 2-gram algorithm may be too sensitive for file types having byte frequency distributions with high standard deviation rates.

The second part describes how to reassemble fragmented baseline sequential JPEG images having RST markers in the scan. The image fragment reconnection method uses the internal structure of fragments at the DC luminance amplitude value level to form fragment pairs, measuring their fragment joint validity. By sorting the pairs in fragment joint validity order, approximately 97% of an image can be restored automatically, if all fragments are available. Repeated use of the image fragment reconnection method on the remaining fragments should quickly restore the full image.


When there are fragments from several images to be reconnected, the first iteration of the image fragment reconnection method correctly combines approximately 70% of the pairs. Even in this case the method is to be used iteratively.

The third part presents techniques that enable viewing of reassembled JPEG images, together with the important JPEG header marker segments and how errors in them affect an image. In reality only two parameters are crucial for the decoding of an individual image: the size of the image and its sampling factors. For all other parameters artificial data can be used, preferably a header from another image, but even fully artificial constructions are possible. This gives us the ability to view fractions of images, where not all fragments are found, or where a full reconnection of the fragments for some reason is not possible.

An important factor for the ability to view image fragments is the use of RST markers in the scan. The first few MCUs in the reconnected fragments may be corrupt, but when the decoder encounters the first RST marker it can correctly decode the following MCUs. If the height setting in the header is too large the decoder will fail, but by starting at a low setting and iteratively increasing the image height the setting can be optimised for the partial image.

Consequently it is possible to identify, reconnect, and view a fragmented JPEG image, as long as it uses RST markers. This ability is crucial to the police and will help them in their search for child pornography. The methods and tools make it possible to scan even heavily fragmented hard disks and network traffic for remnants of illegal images. The remnants can then be used to recreate parts of an image, which can be equipped with an artificial header to allow the image parts to be displayed.
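The iterative height search mentioned above can be sketched as follows. The decode_with_height routine is a hypothetical stand-in for any JPEG decoder invoked with an artificial header of the given height, assumed to raise an error when the height setting is too large for the available scan data.

def find_max_height(scan_data, decode_with_height, step=1, max_height=10000):
    """Increase the image height in the artificial header until the decoder
    fails, and keep the largest height that still decoded successfully."""
    best = None
    for height in range(step, max_height + 1, step):
        try:
            decode_with_height(scan_data, height)  # hypothetical decoder call
        except Exception:
            break  # too large: the previous height is the optimum
        best = height
    return best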

Chapter 6

Related Work

The PAYLoad-based anomaly detector for intrusion detection (PAYL) [28, 29] is related to our file type categorisation method through the use of a byte frequency distribution algorithm. The system uses a distance metric called the simplified Mahalanobis distance to measure the similarity between a sample and a centroid. The metric is based on two vectors, called x and y, representing the sample and the centroid respectively, and the standard deviation, σ_i, of each byte i in the centroid. A smoothing factor α is used to avoid division by zero when σ_i = 0. As can be seen in Equation (6.1) this is similar to a linear-based distance metric between the sample and a model, weighted by the standard deviation of the byte frequency distribution.

d(\vec{x}, \vec{y}) = \sum_{i=0}^{n-1} \frac{|x_i - y_i|}{\sigma_i + \alpha}    (6.1)
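A minimal sketch of Equation (6.1), assuming NumPy and a smoothing factor chosen only for illustration:

import numpy as np

def simplified_mahalanobis(x, y, sigma, alpha=0.001):
    """Sum of per-byte absolute differences between the sample x and the
    centroid y, each weighted by the centroid's standard deviation sigma
    plus the smoothing factor alpha (Equation 6.1)."""
    x, y, sigma = (np.asarray(v, dtype=float) for v in (x, y, sigma))
    return float(np.sum(np.abs(x - y) / (sigma + alpha)))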

PAYL has been implemented using different approaches for how to handle the centroid creation. The later implementations use k-means clustering in two steps to reduce the memory requirements and increase the modelling accuracy.

The main differences between the file type categorisation method and PAYL are the modelling of the data, the distance metric, and the centroid creation process. PAYL is based on the byte frequency distribution of the data stream, which does not take the ordering of the bytes into consideration. The file type categorisation RoC algorithm measures the rate of change in the data stream and can therefore distinguish between data blocks having similar byte frequency distributions, but different ordering of the bytes.
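To make the distinction concrete, the sketch below computes a byte frequency distribution and a rate-of-change histogram for a data block. It is an illustrative approximation only; here the rate of change is taken to be the absolute difference between consecutive byte values, which is our simplified reading rather than the exact implementation evaluated in the thesis.

from collections import Counter

def bfd(block: bytes):
    """Byte frequency distribution: how often each byte value 0-255 occurs."""
    counts = Counter(block)
    return [counts.get(v, 0) for v in range(256)]

def roc_histogram(block: bytes):
    """Rate-of-change histogram: frequencies of the absolute differences
    between consecutive byte values, so the byte ordering matters."""
    diffs = Counter(abs(a - b) for a, b in zip(block, block[1:]))
    return [diffs.get(d, 0) for d in range(256)]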

The simplified Mahalanobis distance metric used by the PAYL intrusion detector suffers from an inability to separate samples having a few large deviations measured against the centroid from samples having several smaller deviations, in cases where the mean and standard deviation values of the centroid are almost uniform. This is the case when dealing with compressed or encrypted data, for example JPEG or .zip files. The file type categorisation method uses a quadratic distance metric which is better at handling almost uniform byte frequency distributions. Due to the weaknesses of the distance metric the PAYL method has to use a more complex and computationally heavy centroid creation process than the file type categorisation method.

The anomaly based intrusion detector Anagram [30] has evolved from PAYL and uses higher order (n > 1) n-grams to detect malicious content in network traffic. The creators of Anagram use both normal and malicious traffic to detect intrusion attempts. To minimise the memory requirements two Bloom filters are used to store the n-grams, one for each traffic type. The n-grams appearing in the malicious Bloom filter are weighted by a factor 5 when the anomaly score is calculated. To increase the resistance to mimicry attacks Anagram is equipped with a randomised testing function, which chooses several, possibly interleaved, sub-parts of a packet payload for anomaly testing. The reported results are very good, with a detection rate of 100% at a false positives rate of less than 0.03%. The metric used to calculate the anomaly score of a packet counts the number of new n-grams, N_new, and the total number of n-grams, T, giving: Score = N_new / T.

The main difference between our file categorisation method and Anagram is the latter method's use of high-order n-grams. In our experiments with the 2-gram algorithm we noticed that it had problems categorising encrypted file types. Anagram is currently only applied to text and executable code, which have a lower entropy than compressed and encrypted files. It would therefore be interesting to see how well Anagram can handle high entropy file types. Since we have not seen any such evaluations we cannot judge its efficiency compared to our file type categorisation method.

A system called Statistical PARSEr (SPARSE) [31, 32], created by the research group which invented PAYL and Anagram, is used to detect malware hidden in Microsoft Word documents. SPARSE incorporates Anagram as one of its detectors. Documents to be scanned are parsed and data objects are extracted. These are then fed to a static detector (Anagram). If the static detector labels an object as negative, the object is given to a dynamic detector executing the object and looking for malicious system events. If any of the detectors report a positive finding a “malcode locator” searches each parsed section for the malicious code.

SPARSE is focused on Word documents and the use of multi-n-gram algorithms through Anagram. The incorporation of Anagram gives SPARSE the same detection characteristics as Anagram, thus also the same differences compared to our file categorisation method.
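The Anagram score described above reduces to the ratio of previously unseen n-grams; the sketch below uses a plain Python set as a stand-in for Anagram's Bloom filters, and the n-gram length is an illustrative choice, not the one used by its authors.

def anagram_score(payload: bytes, trained_ngrams: set, n: int = 5) -> float:
    """Score = N_new / T: the fraction of the payload's n-grams that were
    never observed during training."""
    grams = [payload[i:i + n] for i in range(len(payload) - n + 1)]
    if not grams:
        return 0.0
    n_new = sum(1 for g in grams if g not in trained_ngrams)
    return n_new / len(grams)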

The group that created PAYL and Anagram has also used the byte frequency distribution algorithm for creating fileprints [33, 34, 35], which are used to determine the file type of unknown files, in the same manner as the first version [9] of the file type categorisation method. The fileprint method has borrowed its base from the byte frequency distribution of PAYL, together with the simplified Mahalanobis distance metric. The method is further improved by more advanced centroid creation methods, of which the two-layer method of PAYL is one. Furthermore the authors experiment with truncating the files to be categorised by the fileprint method. They find that by using only a few hundred bytes from the beginning of the files they get higher detection rates than when applying the method to full-length files.

Since the fileprint method and PAYL were developed in parallel these two methods also share their main differences relative to our file type categorisation method. In addition to the differences in the modelling of the data, the distance metric, and the centroid creation process, the fileprint method also differs from the file type categorisation method in the previous method's implicit dependence on file headers through the truncation of files to be categorised. Our file type categorisation method is meant to do what the current tools cannot do, namely categorise data fragments without having to rely on header data and file blocks being stored consecutively. In other words our file type categorisation method can work regardless of the state of the file system or degree of fragmentation. By using only the first few hundred bytes of a file the fileprint method in reality becomes a reinvention of the Unix file command.

McDaniel and Heydari [36] use the concept of n-grams in a file type categorisation method. They use different statistical metrics applied to single bytes in files to make fingerprints of files. McDaniel and Heydari's main source of data is the header and footer parts of files. They try to consider the ordering of the bytes in files through the use of what they call byte frequency cross-correlation. One example of bytes with high byte frequency cross-correlation is the HyperText Markup Language (HTML) tag markers “<” and “>”. The McDaniel and Heydari method differs from the 2-gram method through the use of single bytes, as well as their method's dependence on file header and footer data.

Yet another n-gram based method is presented in a paper by Shanmugasundaram and Memon [37]. They use a sliding window algorithm for fragmented file reassembly using n-gram frequency statistics. They let the window pass over the edges of the fragments and then combine those fragments having the highest probability of being consecutive based on the frequency of the n-grams found in the window. Their use of n-gram frequency statistics is similar to the 2-gram method, but they apply the method in another way and require all fragments of a file to be known in advance.

Shamir and van Someren [38] use the entropy of hard disk data fragments to locate the cryptographic keys stored on disk. They use a sliding window of 64 bytes and count the number of unique byte values in the window. A cryptographic key will generate a higher entropy than other binary data and in that way it is possible to differentiate keys from other data. The Shamir and van Someren method differs from the file type categorisation method in that their method uses smaller data blocks and does not need any centroid to measure new, unknown samples against. Also the area of usage differs, although the file type categorisation method could be used for similar tasks. On the other hand, the Shamir and van Someren method cannot be used for identifying the file type of an unknown data fragment and is specialised at detecting cryptographic keys.
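The Shamir and van Someren heuristic is easy to sketch: slide a 64-byte window over the data and count distinct byte values. The threshold used in the example is an assumption for illustration, not a value from their paper.

def distinct_byte_profile(data: bytes, window: int = 64):
    """Number of distinct byte values in every 64-byte window; windows
    covering random key material tend to score close to the maximum."""
    return [len(set(data[i:i + window]))
            for i in range(len(data) - window + 1)]

def key_candidates(data: bytes, threshold: int = 60):
    """Offsets whose windows look key-like according to the profile."""
    return [i for i, score in enumerate(distinct_byte_profile(data))
            if score >= threshold]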

Garfinkel [39] presents a method called Bifragment Gap Carving (BGC), which can reconnect a file fragmented into two fragments separated by one or more junk hard disk sectors. The algorithm requires that the fragments are stored in order and that the start and end of the fragmented file can be found. The algorithm runs in quadratic time for single object carving and O(n^4) time if all bi-fragmented files of a specific type are to be carved. To verify that a file is appropriately carved Garfinkel has developed a pluggable validator framework, implementing each validator as a C++ class. The framework has several return values, apart from the standard accept or reject.

The BGC method relates to the file type categorisation and image fragment reconnection methods in combination. A drawback of the BGC method is that it, as the name implies, can only handle files fragmented into two parts. The main idea behind the algorithm gives it a high complexity, which makes it scale badly. The file type categorisation method can find all fragments of a specific type in linear time, and then the image fragment reconnection method can be used to reconnect them in much less than O(n^2) time. If we assume ordered fragments, the reconnection can be done in linear time. The file type categorisation and image fragment reconnection methods do not need access to any headers or footers to work, hence they have a more generic application area than the BGC method.

Cohen [40] describes two semantic carvers for Portable Document Format (PDF) and Zip files. The carvers utilise the internal structure of the two file types to identify and reassemble fragments. For text based file types, such as PDF, this is straightforward, but for binary file types there might be large sequences of uniformly distributed byte values. Cohen does not explain how fragments not containing any header-related information can be identified. The article implies that the fragments need to be consecutively stored and possibly that the number of other fragments needs to be low. A file type specific discriminator should be used to find the correct sequence of fragments for each file, taken from a set of possible sequences of fragments.

The method Cohen proposes uses only information related to the headers and internal structure. Therefore the method in reality is limited to ordered structures and low fragmentation environments. The file type categorisation and image fragment reconnection methods are designed to work in any circumstances and environment. They therefore do not make any assumptions about the surrounding fragments, and only need information that can be retrieved from the fragments themselves to work.

Erbacher and Mulholland [41] have studied the problem of basic steganography, where complete and unencrypted files are hidden within other files. To do this they use 13 different statistical measures calculated over a sliding window to detect deviations in files of various formats. They also experiment with varying sizes of the sliding window. The best results are given by the byte value average, kurtosis, distribution of averages, standard deviation, and distribution of standard deviations, using sliding window sizes between 256 and 1024 bytes. The results Erbacher and Mulholland get show that files are not homogeneous streams of data, but instead vary significantly throughout and have clearly visible subsections.
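A small sketch of the kind of sliding-window statistics Erbacher and Mulholland report as most useful (byte value average, standard deviation, and kurtosis); the window size is one of the sizes they mention, but the exact computation is our own simplification.

import numpy as np

def window_statistics(data: bytes, window: int = 512):
    """Byte value average, standard deviation, and kurtosis for each
    non-overlapping window of the data."""
    stats = []
    for i in range(0, len(data) - window + 1, window):
        block = np.frombuffer(data[i:i + window], dtype=np.uint8).astype(float)
        mean, std = block.mean(), block.std()
        kurt = ((block - mean) ** 4).mean() / std ** 4 if std > 0 else 0.0
        stats.append((mean, std, kurt))
    return stats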

The work by Erbacher and Mulholland has a slightly different approach than our work and does not aim at recognising file types, but at seeing deviations within files. The statistical metrics used by Erbacher and Mulholland are related to the BFD and RoC algorithms of the file type categorisation method, but the smaller sliding window size used by Erbacher and Mulholland, and the fact that they do not really model the file types, separate the two methods from each other.

Memon and his research group [42, 37, 43, 44] have looked at how to reassemble fragmented bitmap images. To do this they have assigned weights to each possible fragment pair and used different path optimising algorithms to find the correct fragment pair sequence. They assume each fragment contains at least an image width of data, and that all fragments are available. The header fragments provide their weight-assigning algorithms with needed information on image width and resolution. The weights are calculated for each possible image width. Memon and Pal say that with some additional steps of decompression, denormalisation, and inverse DCT their method can handle JPEG images, but they do not prove it.

The image fragment reconnection method is similar to Memon's and Pal's method at a high level; both methods assign weights to fragment pairs and optimise the sum of the fragment chains. There are differences between the methods too; the image fragment reconnection method does not need a header fragment to be able to determine the width of an image. It is also successfully applied to JPEG images, and is more efficient, since it uses pre-processing to decrease the set of possible fragment pairs that need to be assigned weights and optimised.

Memon and his group have developed a network abuse detection system, called Nabs [45], that can identify whether the payload of a network packet is uncompressed, compressed, or encrypted. They used statistical measures from the time domain, the frequency domain, and some higher order statistics, and applied a Sequential Forward Feature Selection (SFFS) algorithm to find the best parameters. The parameters are, in order of decreasing importance, entropy, power in the 0–π/8 frequency band, mean, variance, and mean and variance in the π/2–π frequency band. These parameters give an accuracy of approximately 84%, where the accuracy is calculated as the number of correctly categorised samples divided by the total number of samples.

The Nabs detection system shows that the important features are not the most complex, hence the aim of the file type categorisation method to use simple, fast and robust algorithms for detection is in line with the research by Memon and his group. The important parameters presented by Memon and colleagues would be interesting to test with the file type categorisation method. The difference between the two methods is that Nabs only tries to identify whether a fragment is compressed or not, or encrypted, while the file type categorisation method is meant to identify the exact file type of a fragment.
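For illustration, two of the Nabs parameters mentioned above, byte entropy and the power in a low frequency band, might be computed roughly as below. The band limit and the normalisation are assumptions made for this sketch, not the definitions used in the Nabs paper.

import numpy as np

def byte_entropy(block: bytes) -> float:
    """Shannon entropy (bits per byte) of the block's byte value distribution."""
    counts = np.bincount(np.frombuffer(block, dtype=np.uint8), minlength=256)
    p = counts[counts > 0] / len(block)
    return float(-(p * np.log2(p)).sum())

def low_band_power(block: bytes, upper: float = np.pi / 8) -> float:
    """Fraction of the spectral power below the given angular frequency,
    loosely mirroring the 0 to pi/8 band feature."""
    x = np.frombuffer(block, dtype=np.uint8).astype(float)
    x -= x.mean()
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.linspace(0.0, np.pi, len(spectrum))
    total = spectrum.sum()
    return float(spectrum[freqs <= upper].sum() / total) if total > 0 else 0.0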

Shannon [46] uses entropy and the proportion of human readable ASCII characters to identify the type of files. The calculations are performed on complete files, not fragments. The work is interesting, even though it is not directed at categorisation of data fragments. The main difference between this method and the file type categorisation method is the latter method's focus on a lower abstraction level through data fragments.

Hall and Davis [47] use the compressibility and entropy calculated over a 90 byte large sliding window and collect samples at 100 points in a file, to identify the file type of fragments. Their text implies that the sliding window is moved one window size at a time, in reality making it a fixed window used to create samples. The standard deviation of the average measurement of each sample point is calculated and used to indicate the level of confidence of each point. They use all parts of a file, including the header, which is deducible from the plots of their results. Hall and Davis also claim their method can be used to link unencrypted files to their resulting encrypted files by their entropy measurements. The plots of the preliminary results show a remarkable similarity between the encrypted files and their source files. According to Davis [48] the reason is that they used Electronic Code Book (ECB) mode, which does not properly hide the underlying structure of the source file [49]. Due to this fact ECB mode is not used in modern encryption applications, hence decreasing the importance of their preliminary results.

The work by Hall and Davis clearly shows how files are often possible to divide into subsections with completely different properties. This is interesting and we will use their ideas in our future work. The main difference between the file type categorisation method and the work by Hall and Davis is the latter method's use of complete files, although divided into 90 byte large windows, while the file type categorisation method is applied to data fragments, although averaged over complete files. Hall and Davis also use more complex metrics to calculate their statistics, through their use of entropy and compressibility. The file type categorisation method is meant to be simple to lower the execution time of the algorithms.

Veenman [50] presents an alternative way to identify the file type of a data fragment. The method uses histograms, similar to the BFD algorithm, entropy measurements, and algorithmic complexity to find the file type of an unknown fragment. Unlike the file type categorisation method Veenman does not use any threshold to do the categorisation. For the applicable file types the results are all worse than for the 2-gram algorithm of the file type categorisation method. To improve the recognition performance Veenman incorporates preceding and consecutive fragments in the calculations. This may be useful in a non-fragmented environment, but for unordered fragment environments the recognition performance may very well decrease. Veenman reports that for text based file types the recognition performance is improved, but for binary file types, such as JPEG and bitmap (BMP), the performance is degraded, even in a non-fragmented environment.

The main differences between Veenman's work and the file type categorisation method are the metrics used and especially the way Veenman incorporates surrounding fragments in the calculations. The file type categorisation method is mainly intended to be used for binary, high entropy file types, and is therefore not tested on text based file types. Hence some of the results of Veenman's work are not possible to compare to the results of the file type categorisation method reported in this thesis.

Richard et al. [51] have studied what they call in-place file carving, which is a method to reduce the requirements for hard disk space during file carving. They utilise a virtual file system separated from the computer to be analysed, and use it to map interesting blocks of a corrupted hard disk together with a file carver. In this way it is possible to view the prospective files found during carving as if they existed on the hard disk. Since prospective files are not real files they can easily be deleted without actually touching the corrupt hard disk. The work by Richard et al. is related to the work in this thesis because the file type categorisation method can be used for in-place file carving too. By repeatedly using the file type categorisation method with different centroids the contents of a hard disk can be mapped and stored as, for example, lists of disk clusters belonging to a specific file type.

Scalpel [52] is a file carving tool developed by Richard and colleagues. It shares some code with the Foremost 0.69 tool [53], and is aimed at fast execution. The main idea is to perform two passes through a disk image. The first pass identifies all file headers and stores their positions. The next pass identifies all file footers found within a pre-set distance from the headers and stores the footers' positions. The tool then uses the positions to carve all possible files of different types. Since the Scalpel tool uses file header and footer information to find files, it cannot perform file carving in an unordered disk image, which the file type categorisation method can. Richard and colleagues plan to extend the header and footer identification capability, but the difference in file categorisation capability from our file type categorisation method will still remain.

Haggerty and Taylor [54] have taken on the problem of matching unknown samples of a file to the original file. Their solution is called forsigs and is meant to be faster and harder to defeat than using hash checksums. Forsigs uses one or more signature blocks from the original file and searches a suspect's hard disk for any hits. 16 byte positions from one or more randomly chosen blocks of the original file are used as a signature. If needed the 16 byte positions can also be randomly chosen. The randomness is used to make it harder for a suspect to defeat forsigs by simply changing a few bytes in the file. Forsigs enables specific files to be identified with high certainty without the need for a file system.
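The forsigs idea of matching on a handful of byte positions can be sketched as follows; the block size, the number of probed positions, and the block-aligned scanning are our assumptions for the illustration, not the exact parameters of Haggerty and Taylor's tool.

import random

BLOCK_SIZE = 4096  # assumed block size for the illustration

def make_signature(original_block: bytes, n_positions: int = 16, seed: int = 0):
    """Pick 16 randomly chosen (position, byte value) pairs from one block
    of the original file."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(original_block)), n_positions))
    return [(p, original_block[p]) for p in positions]

def scan_image(disk_image: bytes, signature):
    """Offsets of every block in the image that matches the signature at
    all probed positions."""
    hits = []
    for offset in range(0, len(disk_image) - BLOCK_SIZE + 1, BLOCK_SIZE):
        block = disk_image[offset:offset + BLOCK_SIZE]
        if all(block[p] == value for p, value in signature):
            hits.append(offset)
    return hits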

Forsigs is used to identify specific and known files, whilst the file type categorisation method is used to identify the file type of fragments. This is definitely a difference between the two methods, but they complement each other. Both are independent of any file system, and by including the image fragment reconnection method in the process we get an efficient way of searching for malicious material. We first run forsigs on a hard disk to look for known malicious files. Then we let the file type categorisation method identify all fragments of the same type(s) as the malicious files. Finally we use the image fragment reconnection method to reconnect the found fragments. Depending on the circumstances we may stop at only checking the reconnected files containing fragments matched by forsigs.

Chapter 7

Future Work

This chapter presents areas and questions left as future work. Some of the material has already been discussed in Chapter 5, but this chapter more specifically points out our intended future research path.

The overall direction of our research will move from classical storage media file carving to realtime applications such as network monitoring and live system forensics. The focus will be on parameters that govern the level of success in dire circumstances, where no metadata is available, but applied to different areas where binary data is stored, transferred, or modified. We will continue to work our way through the abstraction layers of a computer system, eventually covering all abstraction layers and main application areas for binary data in a computer system. Practical use of our future work can be found in many computer security areas covering, for example:

• computer forensics,

• data rescue,

• intrusion detection systems,

• rootkit protection,

• anti-virus scanners,

• intrusion analysis, and

• network management.

An interesting area for possible use of our research is efficient network communication in asymmetric channels. This falls outside of the computer security scope of our work, but may offer the possibility of synergy effects where both research fields will benefit. We will conduct a pilot study to further evaluate the applicability of the idea and have therefore left it as future work for the moment.


7.1 The File Type Categorisation Method

An outstanding issue that has to be dealt with in the near future is the fact that the results of the experiments using the file type categorisation method for the Windows PE file type can still be improved. Since the ability to detect executable code is a key feature of many computer security services, for example intrusion detection systems or anti-virus scanners, any further improvements of the detection ability of the file type categorisation method would be of high value to the computer security field.

We have already improved the file type categorisation method's detection rate regarding executable code from below 50% to 70% by erasing all GIF, JPEG, and PNG images from the files used to create the centroid. The simplicity of the solution in combination with the drastic improvement of the detection rate has inspired us and we will therefore test an approach where we divide the executable files into smaller parts and sort them into sub-groups. Each group would then get its own centroid, which hopefully will have a high detection rate for its specific part. The sub-division of the centroid creation files may also result in a lower need for cleaning the files before a centroid is created.

We will explore a new algorithm related to the RoC algorithm, where we check the distance between similar byte values. This new algorithm would take regularities in a byte stream into account, such as tables in an executable file, without adding the instability of the 2-gram algorithm. In this way we would get the fast execution and small footprint of the 1-gram algorithms and the higher detection ability of the 2-gram algorithm. The new algorithm is currently being implemented and we hope to be able to report results in the near future.

The important parameters presented in the Nabs [45] paper are partially similar to the parameters used in the file type categorisation method. The parameters mentioned in the Nabs paper that are currently not used by the file type categorisation method will be evaluated for possible inclusion in the method. We will therefore study the Nabs approach more closely and follow up on its current status.

The Anagram [30] intrusion detector uses interesting techniques to achieve very good results. The key features of the detector will be tested in our file categorisation method. We are especially interested in the use of Bloom filters and the benefits of longer n-grams. We will evaluate Anagram's ability to handle high entropy file types. We will also investigate the stability of the Anagram approach when subjected to data fragments of different sizes.

The quadratic distance metric used in the file type categorisation method will possibly be exchanged. Our idea for a new metric is to calculate the angle between the centroid vectors and the vectors of an unknown sample. The advantages of such an approach are that it is well-founded in mathematics and can be efficiently implemented in certain computer architectures. There are also other similarity metrics that might be interesting to test in the future.
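Such an angle-based metric would amount to something like the cosine-based sketch below; this is our reading of the idea, not an implemented or evaluated design.

import numpy as np

def angle_between(sample, centroid) -> float:
    """Angle in radians between a sample feature vector and a centroid
    vector; 0 means identical direction, pi/2 means orthogonal."""
    a = np.asarray(sample, dtype=float)
    b = np.asarray(centroid, dtype=float)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        return np.pi / 2  # treat an all-zero vector as maximally dissimilar
    cosine = np.clip(np.dot(a, b) / norm, -1.0, 1.0)
    return float(np.arccos(cosine))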

It is possible to decide whether a specific camera was used to capture a given image or not [55, 56, 57, 58]. This fact has inspired us to briefly study [11, 12] the possibility of identifying the camera make used to capture an image from a fragment of the image file. Our preliminary results indicate that it is not possible. In light of our new ideas for algorithms to improve the file type categorisation method, we will re-initiate the camera recognition study. Since there are only a few manufacturers of light sensors on the market, we may at least be able to distinguish between cameras using different light sensors. Any positive result within this field would extend the capabilities of the police and help them capture child molesters. We will therefore run the camera recognition experiments in parallel with our current file detection research.

Depending on the results from the re-initiated camera recognition experiments, we might also look at identification of image manipulation software. That problem is harder to solve, since computer software is not subject to the same physical limitations as camera light sensors. Hence software can be adjusted and patched to compensate for any positive results from our research.

7.2 The Image Fragment Reconnection Method

An important issue we will address in the near future is the fact that the image fragment reconnection method currently only handles JPEG images containing RST markers. By adding the ability to also reconnect fragments lacking RST markers the method will be usable even in live situations, not only in testing environments. We already have a theoretical idea of how to extend the capabilities of the image fragment reconnection method and will implement and test it in the near future. In a more distant future we plan to explore the possibility of generalising the image fragment reconnection method, giving it the ability to reassemble any type of file. This is partly dependent on the progress of the file type categorisation algorithm development, hence the timing of the generalisation of the image fragment reconnection method is not yet clear.

The near future work on the image fragment reconnection method includes continued experiments using fragments from several images, to confirm the results we present in this thesis. The experiments should also include a complete reassembly of a file to find the number of iterations needed to correctly reconnect all fragments in our test set. The results from these experiments can be used to improve the image fragment reconnection method and help in finding any yet unknown weaknesses in the algorithms. This work is planned to be done in parallel with the implementation of the fully JPEG capable image fragment reconnection method.

7.3 Artificial JPEG Header

Our research on the use of artificial JPEG headers has been performed using an image containing RST markers. This has biased the results in a positive direction, possibly leading to a higher level of success than if we had used an image without RST markers. All example images containing small discoloured horizontal bars are subject to this bias.

The bars end at the border of a restart interval; if there had been no RST markers, the first bar to appear would have continued to the end of the image. We therefore have to re-evaluate the effects of changes to the sample rates, the component identifiers, and the Huffman table definitions and pointers in a non-RST marker JPEG image.

An open issue to address regarding the use of artificial JPEG headers is whether it is possible to use a fully generic Huffman table definition. Our current research indicates that almost identical Huffman table definitions are exchangeable, but what happens when the definitions differ to a higher degree? Will the image be possible to decode at all? All cameras in our database use the same Huffman table definitions, but we need to know whether the same is true for any digital camera. If there are only a few different definitions used, a brute force model where all possible Huffman table definitions are tried on an image would be feasible. This would possibly also lead to the identification of the camera make used to capture the image.

The principle of Huffman coding is that frequent symbols get shorter codes than less frequent symbols, and shorter codes mean fewer possible combinations. Thus the set of short Huffman codes is small and should theoretically be frequently used. Hence, are there any theoretical or practical structures governing how the symbols get coded in a JPEG Huffman table definition? If any structures exist, will they cause similarities between two independent Huffman table definitions, and if that is the case, in what parts of the definitions?

Bibliography

[1] R. Wortley and S. Smallbone, “Child pornography on the internet,” in Problem-Oriented Guides for Police, ser. Problem-Specific Guides Series. US Department of Justice, May 2006, no. 41, accessed 2008-03-15.

[2] J. Ruscitti, “Child porn fight merits police resources,” http://www.crime-research.org/news/16.02.2008/3201/, Feb. 2008, accessed 2008-03-14.

[3] A. McCue, “Online child porn investigation costs police £15m,” http://management.silicon.com/government/0,39024677,39128445,00.htm, Mar. 2005, accessed 2008-03-14.

[4] BBC News, “Child porn fight ’lacks funding’,” http://news.bbc.co.uk/go/pr/fr/-/1/hi/technology/3972763.stm, Nov. 2004, accessed 2008-03-14.

[5] ——, “BT acts against child porn sites,” http://news.bbc.co.uk/1/hi/technology/3786527.stm, June 2004, accessed 2008-03-17.

[6] Yahoo News Canada, “Police arrest 22 in Ontario as part of massive child porn bust,” http://ca.news.yahoo.com/s/capress/080212/national/child_porn_raids_7, Feb. 2008, accessed 2008-04-01.

[7] “DFRWS 2007 forensics challenge results,” http://www.dfrws.org/2007/challenge/results.shtml, 2007, accessed 2008-03-15.

[8] B. Carrier, File System Forensic Analysis. Addison-Wesley, 2005.

[9] M. Karresand and N. Shahmehri, “Oscar – file type identification of binary data in disk clusters and RAM pages,” in Proceedings of IFIP International Information Security Conference: Security and Privacy in Dynamic Environments (SEC2006), ser. Lecture Notes in Computer Science, 2006, pp. 413–424.

[10] ——, “File type identification of data fragments by their binary structure,” in Proceedings from the Seventh Annual IEEE Systems, Man and Cybernetics (SMC) Information Assurance Workshop, 2006. Piscataway, NJ, USA: IEEE, 2006, pp. 140–147.


[11] ——, “Oscar – file type and camera identification using the structure of binary data fragments,” in Proceedings of the 1st Conference on Advances in Computer Security and Forensics, ACSF, J. Haggerty and M. Merabti, Eds. Liverpool, UK: The School of Computing and Mathematical Sciences, John Moores University, July 2006, pp. 11–20.

[12] ——, “Oscar – using byte pairs to find file type and camera make of data fragments,” in Proceedings of the 2nd European Conference on Computer Network Defence, in conjunction with the First Workshop on Digital Forensics and Incident Analysis (EC2ND 2006), A. Blyth and I. Sutherland, Eds. Springer Verlag, 2007, pp. 85–94.

[13] The International Telegraph and Telephone Consultative Committee (CCITT), “Recommendation T.81 — Information Technology – Digital Compression and Coding of Continuous-Tone Still Images – Requirements and Guidelines,” Sept. 1992, also published as ISO/IEC International Standard 10918-1.

[14] E. Hamilton, “JPEG File Interchange Format – version 1.02,” Sept. 1992.

[15] International Telecommunication Union (ITU-T), “Annex F of ITU-T Recommendation T.84 | ISO/IEC IS 10918-3, Digital Compression and Coding of Continuous-Tone Still Images – Extensions,” 1996, ITU-T was previously known as CCITT.

[16] Japan Electronics and Information Technology Industries Association (JEITA), “Exchangeable image file format for digital still cameras: Exif Version 2.2,” Apr. 2002, JEITA CP-3451.

[17] Japan Electronic Industry Development Association (JEIDA), “Digital Still Camera Image Standard (Exchangeable image file format for Digital Still Cameras: Exif – Version 2.1),” June 1998, earlier versions published 1996 (v. 1.0) and 1997 (v. 1.1).

[18] M. Damashek, “Gauging similarity with n-grams: Language-independent categorization of text,” Science, vol. 267, no. 5199, pp. 843–848, Feb. 1995.

[19] P. Deutsch, “RFC 1951: DEFLATE compressed data format specification version 1.3,” http://www.ietf.org/rfc/rfc1951.txt, May 1996, last accessed 2007-11-20.

[20] J. Callas, L. Donnerhacke, H. Finney, D. Shaw, and R. Thayer, “RFC 4880: OpenPGP message format,” http://tools.ietf.org/rfc/rfc4880.txt, Nov. 2007, last accessed 2007-11-26.

[21] D. O’Neill, “ID3v2 — the audience is informed,” http://www.id3.org/Home, July 2007, last accessed 2007-11-27.

[22] M. Nilsson, D. Mahoney, and J. Sundström, “ID3 tag version 2.3.0,” http://www.id3.org/id3v2.3.0, Feb. 1999, last accessed 2007-11-27.


[23] The Gutenberg Project, “Project Gutenberg,” http://www.gutenberg.org/wiki/Main_Page, July 2007, last accessed 2007-11-26.

[24] “Debian Etch r2 XFce CD iso file from the SUNET FTP site,” ftp://ftp.sunet.se/pub/Linux/distributions/debian-cd/current/i386/iso-cd/debian-40r2-i386-xfce-CD-1.iso, Feb. 2008, accessed 2008-02-05.

[25] T. Fawcett, “An introduction to ROC analysis,” Pattern Recognition Letters, vol. 27, no. 8, pp. 861–874, June 2006.

[26] L. Råde and B. Westergren, Mathematics Handbook for Science and Engineering, 4th ed. Studentlitteratur, 1998.

[27] FelixH and Stephantom, “Image:Dctjpeg.png,” http://upload.wikimedia.org/wikipedia/commons/2/23/Dctjpeg.png, Jan. 2008, accessed 2008-03-29.

[28] K. Wang and S. Stolfo, “Anomalous payload-based network intrusion detection,” in Recent Advances in Intrusion Detection (RAID) 2004, ser. Lecture Notes in Computer Science, E. Jonsson et al., Ed., vol. 3224. Springer-Verlag, July 2004, pp. 203–222.

[29] K. Wang, G. Cretu, and S. Stolfo, “Anomalous payload-based worm detection and signature generation,” in 8th International Symposium on Recent Advances in Intrusion Detection, RAID 2005, ser. Lecture Notes in Computer Science, A. Valdes and D. Zamboni, Eds., vol. 3858. Springer-Verlag, 2006, pp. 227–246.

[30] K. Wang, J. Parekh, and S. Stolfo, “Anagram: A content anomaly detector resistant to mimicry attacks,” in 9th International Symposium on Recent Advances in Intrusion Detection (RAID), ser. Lecture Notes in Computer Science, D. Zamboni and C. Kruegel, Eds., vol. 4219. Springer Berlin / Heidelberg, 2006, pp. 226–248, DOI: 10.1007/11856214_12.

[31] W.-J. Li, S. Stolfo, A. Stavrou, E. Androulaki, and A. Keromytis, “A study of malcode-bearing documents,” in Proceedings of the 4th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA 2007), ser. Lecture Notes in Computer Science (LNCS), B. Hämmerli and R. Sommer, Eds., vol. 4579. Springer Verlag, 2007, pp. 231–250.

[32] W.-J. Li and S. Stolfo, “SPARSE: a hybrid system to detect malcode-bearing documents,” http://www1.cs.columbia.edu/ids/publications/cucs-006-08. , Dept. of Computer Science, Columbia University, USA, Tech. Rep. cucs-006-08, Jan. 2008, accessed 2008-04-01.

[33] W.-J. Li, K. Wang, S. Stolfo, and B. Herzog, “Fileprints: Identifying file types by n-gram analysis,” in Proceedings from the sixth IEEE Systems, Man and Cybernetics Information Assurance Workshop, June 2005, pp. 64–71.


[34] S. Stolfo, K. Wang, and W.-J. Li, “Fileprint analysis for malware detection,” Computer Science Department, Columbia University, New York, NY, USA, Tech. Rep., 2005, review draft.

[35] ——, Malware Detection, ser. Advances in Information Security. Springer US, 2007, vol. 27, no. IV, ch. Towards Stealthy Malware Detection, pp. 231–249.

[36] M. McDaniel and H. Heydari, “Content based file type detection algorithms,” in HICSS ’03: Proceedings of the 36th Annual Hawaii International Conference on System Sciences (HICSS’03) - Track 9. Washington, DC, USA: IEEE Computer Society, 2003, p. 332.1.

[37] K. Shanmugasundaram and N. Memon, “Automatic reassembly of document fragments via context based statistical models,” in Proceedings of the 19th Annual Computer Security Applications Conference, 2003, http://www.acsac.org/2003/papers/97.pdf, accessed 2006-04-30.

[38] A. Shamir and N. van Someren, “Playing ‘hide and seek’ with stored keys,” in Financial Cryptography: Third International Conference, FC’99, ser. Lecture Notes in Computer Science, M. Franklin, Ed., vol. 1648. Springer-Verlag, 1999, pp. 118–124.

[39] S. Garfinkel, “Carving contiguous and fragmented files with fast object validation,” Digital Investigation, vol. 4, no. Supplement 1, pp. 2–12, Sept. 2007, DOI: http://dx.doi.org/10.1016/j.diin.2007.06.017.

[40] M. Cohen, “Advanced carving techniques,” Digital Investigation, 2007, in press, DOI: http://dx.doi.org/10.1016/j.diin.2007.10.001.

[41] R. Erbacher and J. Mulholland, “Identification and localization of data types within large-scale file systems,” in SADFE ’07: Proceedings of the Second International Workshop on Systematic Approaches to Digital Forensic Engineering. Washington, DC, USA: IEEE Computer Society, 2007, pp. 55–70, DOI: http://dx.doi.org/10.1109/SADFE.2007.12.

[42] K. Shanmugasundaram and N. Memon, “Automatic reassembly of document fragments via data compression,” in Digital Forensic Research Workshop, 2002, http://isis.poly.edu/memon/publications/pdf/2002_Automatic_Reassembly_of_Document_Fragments_via_Data_Compression.pdf, last accessed 2008-01-14.

[43] A. Pal, K. Shanmugasundaram, and N. Memon, “Automated reassembly of fragmented images,” in ICME ’03: Proceedings of the 2003 International Conference on Multimedia and Expo. Washington, DC, USA: IEEE Computer Society, 2003, pp. 625–628.

[44] N. Memon and A. Pal, “Automated reassembly of file fragmented images using greedy algorithms,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 385–393, Feb. 2006.


[45] K. Shanmugasundaram, M. Kharrazi, and N. Memon, “Nabs: A system for detecting resource abuses via characterization of flow content type,” in Proceedings of 20th Annual Computer Security Applications Conference, 2004.

[46] M. Shannon, “Forensic relative strength scoring: ASCII and entropy scoring,” International Journal of Digital Evidence, vol. 2, no. 4, 2004, http://www.utica.edu/academic/institutes/ecii/publications/articles/A0B3DC9E-F145-4A89-36F7462B629759FE.pdf, last accessed 2007-12-21.

[47] G. Hall and W. Davis, “Sliding window measurement for file type identification,” http://www.mantechcfia.com/SlidingWindowMeasurementforFileTypeIdentification.pdf, 2007, last accessed 2007-12-10.

[48] W. Davis, “Re: Fw: Questions on "sliding window measurement for file type identification" paper,” Mail conversation, Jan. 2008.

[49] A. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied Cryptography. Boca Raton, 1997.

[50] C. Veenman, “Statistical disk cluster classification for file carving,” in Proceedings of the Third International Symposium on Information Assurance and Security, 2007 (IAS 2007), N. Zhang, A. Abraham, Q. Shi, and J. Thomas, Eds. IEEE Computer Society, 2007, pp. 393–398, DOI: http://dx.doi.org/10.1109/ISIAS.2007.4299805.

[51] G. Richard III, V. Roussev, and L. Marziale, “In-place file carving,” in Advances in Digital Forensics III, ser. IFIP International Federation for Information Processing, P. Craiger and S. Shenoi, Eds., vol. 242/2007. Springer Boston, 2007, pp. 217–230.

[52] G. Richard III and V. Roussev, “Scalpel: A frugal, high performance file carver,” in Proceedings of the 5th Annual Digital Forensics Research Workshop (DFRWS 2005), 2005.

[53] K. Kendall, J. Kornblum, and N. Mikus, “Foremost - latest version 1.5.3,” http://foremost.sourceforge.net/, 2007, accessed 2008-04-14.

[54] J. Haggerty and M. Taylor, “FORSIGS: forensic signature analysis of the hard drive for multimedia file fingerprints,” in New Approaches for Security, Privacy and Trust in Complex Environments, ser. IFIP International Federation for Information Processing, vol. 232. Springer Boston, 2007, pp. 1–12, DOI: 10.1007/978-0-387-72367-9_1.

[55] J. Lukáš, J. Fridrich, and M. Goljan, “Determining digital image origin using sensor imperfections,” in Proceedings of SPIE Electronic Imaging, Image and Video Communication and Processing, 2005, pp. 249–260.

[56] ——, “Digital bullet scratches for images,” in Proceedings of ICIP 2005, 2005.


[57] G. Glover, “Binghamton University research links digital images and cameras,” http://www.eurekalert.org/pub_releases/2006-04/bu-bur041806.php, Apr. 2006, accessed 2006-04-28.

[58] J. Lukáš, J. Fridrich, and M. Goljan, “Digital camera identification from sensor pattern noise,” IEEE Transactions on Information Forensics and Security, vol. 1, no. Summer 2, pp. 205–214, 2006.

Appendix A

Acronyms

ADIT Division for Database and Information Techniques

AES Advanced Encryption Standard

ASCII American Standard Code for Information Interchange

BFD Byte Frequency Distribution

BGC Bifragment Gap Carving

BMP bitmap

CCITT International Telegraph and Telephone Consultative Committee

DCT Discrete Cosine Transform

DFRWS Digital Forensic Research Workshop

DHT Define Huffman Table

DRI Define Restart Interval

ECB Electronic Code Book

EOI End Of Image

FAT File Allocation Table

FOI Swedish Defence Research Agency

GIF Graphics Interchange Format

GPG GNU Privacy Guard

HTML HyperText Markup Language

ID3 IDentify an MP3


IEC International Electrotechnical Commission

IISLAB Laboratory for Intelligent Information Systems

ISO International Organization for Standardization

JPEG Joint Photographic Experts Group

MCU Minimum Coded Unit

MP3 MPEG 1, Audio Layer-3

NTFS New Technologies File System

PAYL PAYLoad-based anomaly detector for intrusion detection

PDF Portable Document Format

PE Portable Executable

PNG Portable Network Graphics

PXE Preboot Execution Environment

RAM Random Access Memory

RGB Red Green Blue

RIP Recovery Is Possible

RKP National Criminal Investigation Department

RoC Rate of Change

ROC Receiver Operating Characteristic

RST Restart

SFFS Sequential Forward Feature Selection

SKL National Laboratory of Forensic Science

SOF Start Of Frame

SOS Start Of Scan

SPARSE Statistical PARSEr

TCP Transmission Control Protocol

Appendix B

Hard Disk Allocation Strategies

When an operating system allocates hard disk data units to the file system, several different strategies can be used. A simple strategy is to start from the beginning of the hard disk and allocate the first available data unit. This strategy is logically called first available. The result is often that files are fragmented and that data units close to the end of the hard disk are seldom allocated. The Linux second and third extended (ext2/ext3) file systems use a first available allocation strategy. [8, p. 179]

Another allocation strategy similar to the first available strategy is the next available strategy. When this is used the operating system starts looking for free data units from the position of the last allocated data unit. In this way the hard disk surface is more evenly utilised, but the risk of fragmentation is still high. The possibility to recover deleted files is increased relative to the first available strategy, since the new allocation point is moved in a round-robin fashion. The File Allocation Table (FAT) file system typically uses this allocation strategy. [8, p. 180]

A third allocation strategy is the best fit strategy, where the operating system searches for a set of consecutive unallocated data units large enough to hold the new file in its entirety. This strategy cannot handle all situations; if there is no large enough unallocated area available the operating system will have to resort to a first available or next available strategy. If a file grows in size the best fit allocation strategy might still lead to fragmentation, if an application opens the original file and updates it. If the application instead makes a copy of the file, updates it and writes it to a new location on disk, the risk of fragmentation is decreased. The behaviour is application specific, so any file system can become fragmented. An example of a file system using the best fit allocation strategy is the New Technologies File System (NTFS). [8, p. 180]

The actual allocation strategy used by an operating system may vary, but the most probable strategy has been empirically examined by Carrier [8]. The results can be seen in Table B.1.


Table B.1: Data unit allocation strategies for different operating systems and file systems, as reported by Carrier [8].

Operating System    File System    Strategy
MS Windows 9x       FAT            Next available
MS Windows XP       FAT            Next available
MS Windows XP       NTFS           Best fit
Linux               ext2/3         First and next available

The Linux file systems ext2 and ext3 divide the hard disk into a number of groups containing blocks of hard disk sectors. The strategy is to keep the seek time to a minimum by allocating metadata and data units of a file to the same group, if possible. Therefore the file systems will use a first available strategy when allocating data units to a new file, and a next available strategy when a file is enlarged. [8, p. 410]

Regardless of which of the three allocation strategies an operating system uses, the data units will be allocated in sequential order, although possibly in a round-robin fashion. If the allocation address of the first data unit of a file is not known, it is possible to perform a brute force search to locate it. This will simplify an attempt to reconnect the fragments (data units) of a file found on a hard disk. By locating every fragment of a certain file type it is possible to reconnect them into the original files simply by using their relative positions and a method to calculate the connection validity of each fragment. The final step is then to test each of them as the starting fragment. Since we know the ordering of the fragments we only have to separate fragments belonging to different files to be able to reconnect them.
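For illustration, the difference between the first available and next available strategies can be sketched as below; the free-list representation and the wrap-around behaviour are our own simplifications of the strategies described above.

def first_available(free_units, last_allocated):
    """Always allocate the lowest-numbered free data unit."""
    return min(free_units)

def next_available(free_units, last_allocated):
    """Allocate the first free data unit after the last allocation,
    wrapping around to the start of the disk if necessary."""
    later = [u for u in free_units if u > last_allocated]
    return min(later) if later else min(free_units)

# Example: with units 3, 7 and 12 free and unit 8 allocated last,
# first_available picks 3 while next_available picks 12.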

Appendix C

Confusion Matrices

This appendix contains the confusion matrices of the different algorithms used in the testing. The rows in the tables represent the centroids and the columns represent the testing files. This should be seen as the centroid representing the correct categorisation, the actual class, and the testing files representing the predicted classes. The short form of the centroid names used in the confusion matrices can be found in Table C.1. The matrices can be found in Tables C.2–C.16.

To increase the readability the table for the 2-gram algorithm is repeated in this appendix. It can also be found in Section 2.6.6, together with a short analysis of the results shown in the table.

Table C.1: The following short names are used for the confusion matrices.

Short name    Centroid
exe           Windows PE files
cast5         CAST5 encrypted GPG file
aes           AES encrypted GPG file
no rst        JPEG images without RST markers
rst           JPEG images with RST markers
mp3           MP3 audio files without an ID3 tag
zip           Files compressed using Zip


Table C.2: The confusion matrix of the 2-gram algorithm.

          exe    aes  cast5  no rst    rst    mp3    zip
exe     15332      8     10      12     15   7247     56
aes       100    402  16450       3      0      0   5725
cast5     107  16614    339       0      0      0   5620
no rst     10      0      0   22667      0      0      3
rst         0      0      0    9677  13003      0      0
mp3        60    371    276       3      6  21882     82
zip       113   5699   5650       4      0      0  11214

Table C.3: The confusion matrix of the Byte Frequency Distribution algorithm having the JPEG rule set extension.

          exe    aes  cast5  no rst    rst    mp3    zip
no rst     96   4332   4425    2716   8260      1   2850
rst        33    956   1034    1884  18454     18    301

Table C.4: The confusion matrix of the combination of the Byte Frequency Distribution algorithm with the JPEG rule set extension and the Rate of Change (RoC) algorithm.

          exe    aes  cast5  no rst    rst    mp3    zip
no rst    145   5155   5253    2100   7123      0   2904
rst        49   1480   1518    1487  17713      1    432

Table C.5: The confusion matrix of the Byte Frequency Distribution algorithm combined with the Rate of Change algorithm using a Manhattan distance metric.

          exe    aes  cast5  no rst    rst    mp3    zip
exe      5480    209    181      26      0  16635    149
aes        52  10213  10116       1      0      0   2298
cast5      51  10261  10077       1      0      0   2290
no rst     86   4576   4660    3023   7513      2   2820
rst        26    890    878    1715  18850     14    307
mp3        18    832    858       1      3  20659    309
zip        87   8987   8963       0      1      0   4642

Table C.6: The confusion matrix of the Byte Frequency Distribution algorithm combined with the Rate of Change algorithm.

          exe    aes  cast5  no rst    rst    mp3    zip
exe      3547    252    251      20      0  18380    230
aes        64  10027  10003       0      0      0   2586
cast5      65  10057   9966       0      0      0   2592
no rst     87   4752   4782    3180   7004      1   2874
rst        31    941    933    1688  18727      4    356
mp3        16    785    804       0      2  20740    333
zip       101   8727   8750       0      0      0   5102

Table C.7: The confusion matrix of the Byte Frequency Distribution algorithm.

             exe     aes   cast5  no rst     rst     mp3     zip
exe         5717     312     299      38       0   16147     167
aes           35   10211   10325       0       1       0    2108
cast5         36   10204   10321       0       1       0    2118
no rst        96    4332    4425    2716    8260       1    2850
rst           33     956    1034    1884   18454      18     301
mp3           19    1762    1785       0       0   18704     410
zip           76    8781    8749       0       2       0    5072

Table C.8: The confusion matrix of the combination of the Byte Frequency Distribution algorithm having a JPEG rule set and the Rate of Change algorithm using signed values.

             exe     aes   cast5  no rst     rst     mp3     zip
no rst        63    4430    4455    3377    7590       1    2764
rst           19     809     817    1673   19078       5     279

Table C.9: The confusion matrix of the Byte Frequency Distribution algorithm with a JPEG rule set and the Rate of Change algorithm using signed values and a Manhattan distance metric.

             exe     aes   cast5  no rst     rst     mp3     zip
exe         4891     214     191      22       0   17206     156
aes           40   10172   10175       1       0       0    2292
cast5         39   10183   10152       1       0       0    2305
no rst        79    4342    4387    3177    7925       0    2770
rst           18     816     851    1730   18999       8     258
mp3           20     976    1026       1       0   20336     321
zip           70    8949    8846       0       1       0    4814


Table C.10: The confusion matrix of the Byte Frequency Distribution algorithm combined with the Rate of Change algorithm using signed values.

             exe     aes   cast5  no rst     rst     mp3     zip
exe         2671     310     274      23       1   19167     234
aes           52   10080   10104       0       0       0    2444
cast5         51   10112   10076       0       0       0    2441
no rst        63    4430    4455    3377    7590       1    2764
rst           19     809     817    1673   19078       5     279
mp3           15     861     914       0       1   20558     331
zip           77    8749    8579       0       0       0    5275

Table C.11: The confusion matrix of the Rate of Change algorithm using a JPEG rule set and signed values.

             exe     aes   cast5  no rst     rst     mp3     zip
no rst       120    4609    4513    6355    3694      87    3302
rst           55    1915    1906    3398   13962      33    1411

Table C.12: The confusion matrix of the Rate of Change algorithm with Manhattan distance metric and signed values.

             exe     aes   cast5  no rst     rst     mp3     zip
exe         3473     678     635    1338     261   14979    1316
aes          182    7189    7156    2242    1056     147    4708
cast5        182    7323    7180    2148    1011     119    4717
no rst       133    4605    4580    6280    3659     151    3272
rst           68    2164    2175    3724   13016      55    1478
mp3           65     799     771     554     234   19240    1017
zip          179    7055    6999    2461    1083     190    4713

Table C.13: The confusion matrix of the Rate of Change algorithm using signed values.

             exe     aes   cast5  no rst     rst     mp3     zip
exe         1002    1054     979    2232     314   15242    1857
aes          200    8411    8270     133      99      66    5501
cast5        198    8427    8274     120      94      66    5501
no rst       120    4609    4513    6355    3694      87    3302
rst           55    1915    1906    3398   13962      33    1411
mp3           67    1071    1024      53      47   19178    1240
zip          205    8089    7958     120     100     153    6055

Table C.14: The confusion matrix of the Rate of Change algorithm using a JPEG rule set.

             exe     aes   cast5  no rst     rst     mp3     zip
no rst       150    5056    4924    5125    3763      99    3563
rst           84    2727    2716    3370   11777      52    1954

Table C.15: The confusion matrix of the Rate of Change algorithm using a Manhattan distance metric.

             exe     aes   cast5  no rst     rst     mp3     zip
exe         5268     431     421    1651     192   13689    1028
aes          211    7134    6932    2470    1318      97    4518
cast5        209    7119    6952    2430    1358      87    4525
no rst       164    5086    5004    5230    3521     161    3514
rst           91    2869    2838    3486   11327     100    1969
mp3           60     561     608     635     431   19469     916
zip          209    7040    6748    2674    1337     135    4537

Table C.16: The confusion matrix of the Rate of Change algorithm.

             exe     aes   cast5  no rst     rst     mp3     zip
exe         2213     570     542    1491     107   16534    1223
aes          205    8366    8226     211     245      67    5360
cast5        204    8382    8210     206     241      66    5371
no rst       150    5056    4924    5125    3763      99    3563
rst           84    2727    2716    3370   11777      52    1954
mp3           64     937     935     173     277   18942    1352
zip          229    8055    7911     257     298     137    5793



Division, Department: Department of Computer and Information Science (Institutionen för datavetenskap), Linköpings universitet
Date: 2008-05-20
Language: English
Report category: Licentiate thesis (Licentiatavhandling)
ISBN: 978-91-7393-915-7
ISRN: LiU-Tek-Lic–2008:19
Title of series, numbering: Linköping Studies in Science and Technology, Thesis No. 1361
ISSN: 0280–7971

Title: Completing the Picture — Fragments and Back Again

Author: Martin Karresand


Keywords: computer security, computer forensics, file carving
