Dataset for File Fragment Classification of Image File Formats

Fakouri and Teimouri BMC Res Notes (2019) 12:774
https://doi.org/10.1186/s13104-019-4812-0

BMC Research Notes
DATA NOTE, Open Access

Reyhane Fakouri and Mehdi Teimouri*

*Correspondence: [email protected]
Information Theory and Coding Laboratory, University of Tehran, Tehran, Iran

© The Author(s) 2019. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Abstract

Objectives: File fragment classification of image file formats is a topic of interest in network forensics. There are a few publicly available datasets of files with image formats. However, there is no public dataset for file fragments of image file formats. So, a big research challenge in file fragment classification of image file formats is to compare the performance of the developed methods over the same datasets.

Data description: In this study, we present a dataset that contains file fragments of ten image file formats: Bitmap, Better Portable Graphics, Free Lossless Image Format, Graphics Interchange Format, Joint Photographic Experts Group, Joint Photographic Experts Group 2000, Joint Photographic Experts Group Extended Range, Portable Network Graphics, Tagged Image File Format, and Web Picture. Corresponding to each format, the dataset contains the file fragments of image files with different compression settings. For each pair of file format and compression setting, 800 file fragments are provided. In total, the dataset contains 25,600 file fragments.

Keywords: Classification, File formats, File fragments, Image file formats

Objective

A large amount of Internet traffic is used for exchanging image file formats. As the sizes of these files are usually much bigger than the maximum network packet size, the files are segmented into fragments. The fragments generated by various users are transmitted over the network. Some of these fragments can be received by the network surveillance unit. The network surveillance unit may wish to detect the file format of each fragment for network forensics purposes.

Some research has been carried out in the field of file fragment classification of image file formats [1, 2]. There are a few publicly available datasets of files with different formats [3]. However, there is no public dataset for file fragments of image file formats. This makes it difficult for other researchers to compare the proposed methods with the existing methods.

In this study, we present a dataset that contains file fragments of ten image file formats: Bitmap (BMP), Better Portable Graphics (BPG), Free Lossless Image Format (FLIF), Graphics Interchange Format (GIF), Joint Photographic Experts Group (JPEG), Joint Photographic Experts Group 2000 (JPEG 2000), Joint Photographic Experts Group Extended Range (JPEG XR), Portable Network Graphics (PNG), Tagged Image File Format (TIFF), and Web Picture (WEBP). Corresponding to each format, the dataset contains the file fragments of image files with different compression settings.

Data description

First, the whole set of raw image files is downloaded from the RAISE project [4]. These raw files are then converted in order to obtain image files in ten different formats: BMP, BPG, FLIF, GIF, JPEG, JPEG 2000, JPEG XR, PNG, TIFF, and WEBP. For each image file format, different compression settings are considered. Each raw image is converted into a specific file format using a particular compression setting. So, the contents of any two image files are not the same.

In total, 32 pairs of file format and compression setting are considered. For each pair of file format and compression setting, we have 160 compressed images. So, in total we have 5120 image files. Each of these files is segmented into 1 Kbyte (i.e. 1024 bytes) fragments. Then, five fragments are randomly selected among the fragments of each file. Before randomly selecting the fragments, the first 12.5% and the last 12.5% of the fragments of each file are discarded. This is to ensure that the fragments do not contain the file headers or trailers.

For each pair of file format and compression setting, we have 800 file fragments. So, the dataset of file fragments contains 25,600 file fragments. The dataset is partitioned according to the 32 different pairs of file format and compression setting. Each partition is represented by an individual data set shown in Table 1. For example, data set 1 (i.e. BMP1.dat) contains 800 fragments of uncompressed BMP files. Data sets are provided in a generic binary data file format with the .dat file extension.

Data file 1 (i.e. SettingsTable.pdf) contains a table that specifies the 32 pairs of file format and compression setting. In this table, the software program employed for generating each file format is also specified. Data file 2 (i.e. ConversionSettings.zip) contains several screenshots of the software programs that display the employed compression settings. Data file 3 (i.e. ReadFragments.m) is a MATLAB script that reads all the fragments from one or more specific data sets. By running this script and selecting some data set files, the fragments contained in these data sets are read and stored in a variable named Dataset. Variable Dataset is a MATLAB cell array with [...]

Table 1 Overview of data files/data sets

| Label | Name of data file/data set | File types (file extension) | Data repository (DOI) |
|---|---|---|---|
| Data file 1 | SettingsTable | Portable document format (.pdf) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data file 2 | ConversionSettings | Archive file format (.zip) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data file 3 | ReadFragments | MATLAB script file (.m) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 1 | BMP1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 2 | BMP2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 3 | BMP3 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 4 | BPG1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 5 | BPG2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 6 | BPG3 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 7 | BPG4 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 8 | FLIF | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 9 | GIF1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 10 | GIF2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 11 | JPEG1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 12 | JPEG2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 13 | JPEG3 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 14 | JPEG4 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 15 | JPF1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 16 | JPF2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 17 | JPF3 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 18 | JPF4 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 19 | JXR1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 20 | JXR2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 21 | JXR3 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 22 | JXR4 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 23 | PNG1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 24 | PNG2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 25 | PNG3 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 26 | PNG4 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 27 | TIF1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 28 | TIF2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 29 | WEBP1 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 30 | WEBP2 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 31 | WEBP3 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |
| Data set 32 | WEBP4 | Generic binary data (.dat) | OSF (https://doi.org/10.17605/OSF.IO/YH3XP) |

Funding

The authors declare no source of funding.

Availability of data materials

The data described in this Data note can be freely and openly accessed on OSF at https://doi.org/10.17605/OSF.IO/YH3XP [5].
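As an illustration of the fragment selection steps given under Data description, the following Python sketch splits a file into 1024-byte fragments, discards 12.5% of the fragments at each end, and draws five of the remaining fragments at random. It is not the authors' code. The function name `sample_fragments`, the rounding of the trimmed count, the handling of a trailing partial fragment, and sampling without replacement are all assumptions made for this example.

```python
import math
import random

FRAGMENT_SIZE = 1024      # bytes per fragment, as stated in the text
TRIM_FRACTION = 0.125     # share of fragments discarded at each end
FRAGMENTS_PER_FILE = 5    # fragments selected from each file
PAIRS = 32                # pairs of file format and compression setting
IMAGES_PER_PAIR = 160     # compressed images per pair


def sample_fragments(data: bytes, rng: random.Random) -> list[bytes]:
    """Select FRAGMENTS_PER_FILE interior fragments of `data` (hypothetical helper)."""
    # Assumption: a trailing partial fragment (< 1024 bytes) is ignored.
    total = len(data) // FRAGMENT_SIZE
    # Assumption: the number of fragments trimmed at each end is rounded up.
    trim = math.ceil(total * TRIM_FRACTION)
    candidates = range(trim, total - trim)
    if len(candidates) < FRAGMENTS_PER_FILE:
        raise ValueError("file too small to sample from")
    # Assumption: fragments are drawn without replacement.
    chosen = rng.sample(candidates, FRAGMENTS_PER_FILE)
    return [data[i * FRAGMENT_SIZE:(i + 1) * FRAGMENT_SIZE] for i in chosen]


# Counts implied by the procedure, matching the figures in the text.
TOTAL_FILES = PAIRS * IMAGES_PER_PAIR                         # 5120
FRAGMENTS_PER_PAIR = IMAGES_PER_PAIR * FRAGMENTS_PER_FILE     # 800
TOTAL_FRAGMENTS = PAIRS * FRAGMENTS_PER_PAIR                  # 25,600
```

Passing an explicit `random.Random` instance keeps the selection reproducible, which can be useful when comparing classifiers on identically sampled fragments.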
