The Analysis of File Carving Process Using PhotoRec and Foremost

Nurhayati, Nurul Fikri
Department of Informatics Engineering, Faculty of Science and Technology
Syarif Hidayatullah State Islamic University Jakarta
Jl. Ir. H. Juanda No. 95, Ciputat 15412, Jakarta, Indonesia

Abstract— The rapid development of computers is accompanied by the development of digital storage devices. One common problem of digital storage devices is data loss. Data loss can be addressed with file carving techniques, which are performed using carving tools such as PhotoRec and Foremost. This research was conducted to examine and compare the performance of the carving process of PhotoRec and Foremost based on three parameters: the number of returned files, file validation, and the speed of the process. The research used a simulation method. File validation uses the SHA1 hash algorithm to confirm that the original and returned files are identical. The results, presented in tables, show that PhotoRec performs better than Foremost. PhotoRec returns fewer files than Foremost, but a higher percentage of its returned files are valid. Additionally, the carving process of PhotoRec is faster than that of Foremost. The research concludes that PhotoRec is better than Foremost.

Keywords— File carving, PhotoRec, Foremost, SHA1, Hash Algorithm

I. INTRODUCTION
Technological development is accompanied by the development of digital storage media. Digital storage media are not free from problems such as data loss [7]. Data loss occurs due to many factors, including viruses, hardware malfunction, software malfunction, and corruption. To overcome this, a method of restoring data called file carving is used [3, 14].

There are various carving tools, each with its own advantages, among them PhotoRec, Foremost, Scalpel, EnCase, FTK, and Adroit Photo Forensics [4]. Regardless of the level of sophistication, the performance of each carving tool is different. This is caused by differences in the methods and configurations used by the carving tools themselves. Three parameters are commonly used by forensic practitioners to assess the performance of a carving tool: the percentage of returned files, the correctness and reliability of the results provided by the tool, and the speed of the carving process [1, 10].

This research used PhotoRec and Foremost as the carving tools. PhotoRec was selected because it can run on different operating systems (it is multiplatform), which makes it easy to run, and it does not require configuration. PhotoRec performs carving by examining every block on the storage media. Meanwhile, Foremost (based on [...]) can only be used on Linux and requires configuration for the carving process. Foremost performs carving by finding the headers and footers of files. PhotoRec and Foremost are used in this research to restore image files of various types (jpg, png, bmp, and tif) and multimedia files, both audio and video (wav, mp3, wma, mp4, mkv, avi, and flv).

Based on the explanation above, the research titled "Analysis of File Carving Process Using PhotoRec and Foremost" was carried out and is expected to be a reference for the further development of carving tools. In addition, this research is expected to be a reference for choosing a carving tool and to inform people about techniques for recovering lost data.

II. RELATED WORK
Data loss [7] can occur due to various causes, among them viruses, human error, hardware malfunction, or system error. Data loss can affect a variety of files, such as documents, photos, and email. Based on a survey by Statista (2015), the digital data most often lost are videos and photos [9, 16].

Data loss can be overcome by using one of the methods in [...] called file carving [8, 13]. File carving is an important aspect of computer forensics and has a major impact on it because it adds the flexibility to retrieve stored information from the underlying [...] [12]. There are several carving tools that can be used for computer forensics. The simplest carving tools work by finding the header and footer. More sophisticated carving tools, called validating file carvers, validate a file before it is stored to disk. The most advanced carving tools can reassemble fragmented files, which is called fragmented file recovery [5].

Yoo et al. studied the recovery of multimedia files in a compressed state, but covered only three types of multimedia files: AVI, WAV, and MP3 [17]. Meanwhile, Aljumah et al. examined carving results obtained from a disk image and from a disk drive. That study was conducted to determine whether carving results differ between a disk image and a disk drive [6, 15, 2].

File carving can restore lost data. However, references [10, 14] state that new problems arise with so many carving tools available, such as not knowing the capabilities and limits of the carving tools, which makes data restoration ineffective. Based on the explanation above, this study analyzes file carving using PhotoRec and Foremost. The study aims to determine the performance of the carving process. The criteria used are the speed of the carving process, the number of returned files, and the validation of returned files.

III. PROPOSED SOLUTION
This study uses a simulation method that consists of the following stages:

A. Conceptual Model
There are two concepts in this research: the process of carving through the disk drive directly and the process of carving through the disk image [11].

Fig 2. Process of carving through the disk drive

The components in each architecture are as follows:

1. Storage Device
A storage device, or storage medium, is used as input. This study uses a Toshiba 16GB flash drive as the storage medium. The flash drive holds a wide range of image and multimedia files, such as video and audio. In this study the flash drive is in a formatted state, and the files it contained have not yet been overwritten by other files, so the data can still be returned using the carving technique.

2. Imaging
Imaging is used to create a clone of the storage medium in the form of an image file. The image is made using dc3dd. Imaging generates a file ending in .img, .dd, or .raw. The resulting image is then used as the input for the carving process with PhotoRec and Foremost.
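As a rough illustration of what the imaging step involves, the following Python sketch copies a source block by block into an image file and computes a SHA1 digest of the copied data. It is not dc3dd and not the procedure used in this study; the function name `create_image`, the block size, and any paths passed to it are assumptions made for illustration. Reading a real device such as a flash drive usually requires administrator rights.

```python
import hashlib


def create_image(source_path, image_path, block_size=1024 * 1024):
    """Copy source_path to image_path block by block.

    Returns the SHA1 hex digest of the copied data, so the image can
    later be checked against the source. Hypothetical helper, not dc3dd.
    """
    sha1 = hashlib.sha1()
    with open(source_path, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # end of the source reached
                break
            dst.write(block)
            sha1.update(block)
    return sha1.hexdigest()
```

A call such as `create_image("/dev/sdX", "flash.dd")` (both names hypothetical) would produce a `.dd` image of the kind described above.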

Fig 1. The process of carving through the disk image

3. Carving Tools
A carving tool is a tool used to restore data. This study used two carving tools, PhotoRec and Foremost. Each carving tool has a different configuration. Even a single tool can produce different results when a different configuration is used. Therefore, the present study analyzes the performance of the carving process using several parameters, namely the speed of the carving process, the number of files returned, and the validity of the returned files.
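The header and footer approach mentioned for the simplest carving tools can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the implementation of Foremost or PhotoRec: it assumes the data fits in memory, handles only JPEG, and the function name `carve_jpeg` and the `max_size` limit are hypothetical. It also only finds files stored contiguously, which is one reason a tool of this kind can return files that fail validation.

```python
JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpeg(data, max_size=10 * 1024 * 1024):
    """Return the byte ranges that start with a JPEG header and end
    at the next JPEG footer, skipping candidates larger than max_size."""
    carved = []
    pos = 0
    while True:
        start = data.find(JPEG_HEADER, pos)
        if start == -1:
            break  # no further header in the data
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without any footer after it
        end += len(JPEG_FOOTER)
        if end - start <= max_size:
            carved.append(data[start:end])
            pos = end
        else:
            pos = start + 1  # candidate too large, try the next header
    return carved
```

Each element of the returned list would be written to the output folder as a candidate file; whether it is a valid copy of an original is decided only in the validation step.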

4. Output/Carving Result
Once the carving process has finished, a folder containing the files recovered from the originals is produced. The folder is initially inaccessible, so its access rights are changed; afterwards the folder can be read, written, and executed.
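The files in the output folder are the input to the validation component, which in this study is a simple shell script using SHA1. That script is not reproduced in the paper, so the following is only a Python sketch of the same idea: hash every original file, hash every returned file, and report what share of the returned files matches an original. The function names and the folder layout are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, block_size=65536):
    """Return the SHA1 hex digest of a file, read in blocks."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            sha1.update(block)
    return sha1.hexdigest()


def validate(original_dir, recovered_dir):
    """Compare returned files with the originals by SHA1 hash.

    Returns (number of returned files, number of valid files,
    percentage of returned files that are valid).
    """
    original_hashes = {
        sha1_of_file(p) for p in Path(original_dir).rglob("*") if p.is_file()
    }
    recovered = [p for p in Path(recovered_dir).rglob("*") if p.is_file()]
    valid = sum(1 for p in recovered if sha1_of_file(p) in original_hashes)
    percent = 100.0 * valid / len(recovered) if recovered else 0.0
    return len(recovered), valid, percent
```

Matching is done by hash rather than by file name because carving tools generally do not restore the original file names.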

5. Validation
Finally, validation is performed on the carving results to check whether each file is correct, meaning that it is identical to the original. The check compares the original file with the returned file. The author uses a simple shell script with the SHA1 hash function to compare the hash value of the original file with that of the returned file. Based on the validation result, it can be determined whether the returned file is correct or not.

B. Input / Output Data
Based on the results obtained when formulating the problem, photos and videos are among the most frequently lost files. Therefore, the present study selects photo, audio, and video files in different formats, three files for each of the file types bmp, jpg, mkv, mp3, png, tif, avi, flv, mp4, wav, and wma. Overall, the total number of files to be restored is 33.

C. Modelling
This stage defines the scenarios used for the simulation process. In this study there are eight scenarios: four scenarios carve through the disk image and four scenarios carve through the disk drive directly. Each scenario is run through the carving process using PhotoRec and Foremost, and each is repeated in three experiments.

Scenarios 1-4 carve through the disk image. Scenario 1 restores only jpg image files, scenario 2 restores only wav audio files, scenario 3 restores only mp4 video files, and scenario 4 restores all of the predetermined file types.

Scenarios 5-8 carve through the disk drive directly. Scenario 5 restores only jpg image files, scenario 6 restores only wav audio files, scenario 7 restores only mp4 video files, and scenario 8 restores all of the predetermined file types.

D. Simulation
The simulation is run using the scenarios determined in the previous stage. In addition, testing is done according to the parameters determined in the previous stage.

Before the simulation is run, some preparation is needed: creating a disk image using dc3dd, installing PhotoRec, installing Foremost, and writing the simple shell script that uses the SHA-1 hash algorithm to validate the returned files.

E. Verification and Validation
Verification and validation of the earlier stages are carried out at this stage. If an error is found in the earlier stages, it is corrected or improved. Verification is done by testing whether PhotoRec, Foremost, and the shell script that was created can run. Validation is performed by rechecking whether PhotoRec, Foremost, and the SHA-1 shell script are in accordance with the provisions of the conceptual model, the input/output data, and the modelling.

F. Experimentation
After PhotoRec and Foremost are installed on [...], the file recovery process is conducted in accordance with the simulation concepts and models described previously. After the carving process is completed, the validation process is carried out using the shell script that was created. Validation is done by comparing the hash value of the original file with that of the returned file.

G. Output Analysis
The results are analyzed after all the scenarios have been run; the analysis is discussed in the next chapter.

IV. ANALYSIS AND SIMULATION RESULT

A. Simulation 1
The experiments conducted in each scenario aim to obtain the average value for that scenario. The average values for scenario 1 are as follows:

Table I. Comparison Results Scenario 1

| Scenario 1 | PhotoRec | Foremost |
|---|---|---|
| Number of Files | 4 files | 12 files |
| Process Speed | 4 minutes 49 seconds | 5 minutes 32 seconds |
| File Validation | 75% | 25% |

Table I shows the results of scenario 1. PhotoRec returned 4 files, while Foremost returned 12 files. When the files were validated, 75% of the files returned by PhotoRec were valid, compared with only 25% for Foremost. This shows that returning more files is not necessarily better.

On the process speed parameter, the PhotoRec carving process took 4 minutes 49 seconds, while Foremost took 5 minutes 32 seconds. From these results it can be concluded that, in scenario 1, PhotoRec is superior to Foremost on the speed and validation parameters. Foremost is superior on the number of returned files, although more of its returned files are false positives (returned files that are not the appropriate files).

B. Simulation 2
The experiments conducted in each scenario aim to obtain the average value for that scenario. The average values for scenario 2 are as follows:

Table II. Comparison Results Scenario 2

| Scenario 2 | PhotoRec | Foremost |
|---|---|---|
| Number of Files | 6 files | 3 files |
| Process Speed | 4 minutes 57 seconds | 5 minutes 7 seconds |
| File Validation | 50% | 0% |

Table II shows the results of scenario 2. PhotoRec returned 6 files, while Foremost returned 3 files. When the files were validated, 50% of the files returned by PhotoRec were valid, while for Foremost the figure was 0%.

On the process speed parameter, the PhotoRec carving process took 4 minutes 57 seconds, while Foremost took 5 minutes 7 seconds. From these results it can be concluded that, on all of the parameters used, PhotoRec is superior to Foremost in scenario 2.

C. Simulation 3
The experiments conducted in each scenario aim to obtain the average value for that scenario. The average values for scenario 3 are as follows:

Table III. Comparison Results Scenario 3

| Scenario 3 | PhotoRec | Foremost |
|---|---|---|
| Number of Files | 3 files | 2 files |
| Process Speed | 4 minutes 10 seconds | 5 minutes 13 seconds |
| File Validation | 100% | 66.7% |

Table III shows the results of scenario 3. PhotoRec returned 3 files, while Foremost returned 2 files. When the files were validated, 100% of the files returned by PhotoRec were valid, whereas only 66.7% of those returned by Foremost were valid.

On the process speed parameter, the PhotoRec carving process took 4 minutes 10 seconds, while Foremost took 5 minutes 13 seconds. From these results it can be concluded that, on all of the parameters used, PhotoRec is superior to Foremost in scenario 3.

D. Simulation 4
The experiments conducted in each scenario aim to obtain the average value for that scenario. The average values for scenario 4 are as follows:

Table IV. Comparison Results Scenario 4

| Scenario 4 | PhotoRec | Foremost |
|---|---|---|
| Number of Files | 32 files | 35 files |
| Process Speed | 7 minutes 51 seconds | 10 minutes 44 seconds |
| File Validation | 92.7% | 55.68% |

Table IV shows the results of scenario 4. PhotoRec returned 32 files, while Foremost returned 35 files. When the files were validated, 92.7% of the files returned by PhotoRec were valid, compared with only 55.68% for Foremost. This shows that returning more files is not necessarily better.

On the process speed parameter, the PhotoRec carving process took 7 minutes 51 seconds, while Foremost took 10 minutes 44 seconds. From these results it can be concluded that, in scenario 4, PhotoRec is superior to Foremost on the speed and file validation parameters. Foremost returns a higher number of files, although more of its returned files are false positives (returned files that are not the appropriate files).

E. Simulation 5
The experiments conducted in each scenario aim to obtain the average value for that scenario. The average values for scenario 5 are as follows:

Table V. Comparison Results Scenario 5

| Scenario 5 | PhotoRec | Foremost |
|---|---|---|
| Number of Files | 4 files | 12 files |
| Process Speed | 21 minutes 8 seconds | 21 minutes 19 seconds |
| File Validation | 75% | 25% |

Table V shows the results of scenario 5. PhotoRec returned 4 files, while Foremost returned 12 files. When the files were validated, 75% of the files returned by PhotoRec were valid, compared with only 25% for Foremost. This shows that returning more files is not necessarily better.

On the process speed parameter, the PhotoRec carving process took 21 minutes 8 seconds, while Foremost took 21 minutes 19 seconds. The carving process takes much longer when carving directly from the disk drive than when carving from the disk image. From these results it can be concluded that, in scenario 5, PhotoRec is superior to Foremost on the speed and validation parameters. Foremost is superior on the number of returned files, although more of its returned files are false positives (returned files that are not the appropriate files).

F. Simulation 6
The experiments conducted in each scenario aim to obtain the average value for that scenario. The average values for scenario 6 are as follows:

Table VI. Comparison Results Scenario 6

| Scenario 6 | PhotoRec | Foremost |
|---|---|---|
| Number of Files | 6 files | 3 files |
| Process Speed | 20 minutes 58 seconds | 21 minutes |
| File Validation | 50% | 0% |

Table VI shows the results of scenario 6. PhotoRec returned 6 files, while Foremost returned 3 files. When the files were validated, 50% of the files returned by PhotoRec were valid, while for Foremost the figure was 0%.

On the process speed parameter, the PhotoRec carving process took 20 minutes 58 seconds, while Foremost took up to 21 minutes. The carving process takes much longer when carving directly from the disk drive than when carving from the disk image. From these results it can be concluded that, on all of the parameters used, PhotoRec is superior to Foremost in scenario 6.

G. Simulation 7
The experiments conducted in each scenario aim to obtain the average value for that scenario. The average values for scenario 7 are as follows:

Table VII. Comparison Results Scenario 7

| Scenario 7 | PhotoRec | Foremost |
|---|---|---|
| Number of Files | 2 files | 1 file |
| Process Speed | 21 minutes 5 seconds | 20 minutes 31 seconds |
| File Validation | 66.7% | 33.3% |

Table VII shows the results of scenario 7. PhotoRec returned 2 files, while Foremost returned 1 file. When the files were validated, 66.7% of the files returned by PhotoRec were valid, whereas the figure for Foremost was 33.3%.

On the process speed parameter, the PhotoRec carving process took 21 minutes 5 seconds, while Foremost took 20 minutes 31 seconds. From these results it can be concluded that, in scenario 7, PhotoRec is superior to Foremost on the file validation parameter and the number of returned files. On the process speed parameter, however, Foremost is better than PhotoRec this time.

H. Simulation 8
The experiments conducted in each scenario aim to obtain the average value for that scenario. The average values for scenario 8 are as follows:

Table VIII. Comparison Results Scenario 8

| Scenario 8 | PhotoRec | Foremost |
|---|---|---|
| Number of Files | 31 files | 34 files |
| Process Speed | 29 minutes 58 seconds | 26 minutes 5 seconds |
| File Validation | 88.5% | 49% |

Table VIII shows the results of scenario 8. PhotoRec returned 31 files, while Foremost returned 34 files. When the files were validated, 88.5% of the files returned by PhotoRec were valid, compared with only 49% for Foremost. This shows that returning more files is not necessarily better.

On the process speed parameter, the PhotoRec carving process took 29 minutes 58 seconds, while Foremost took 26 minutes 5 seconds. From these results it can be concluded that, in scenario 8, PhotoRec is superior to Foremost on the file validation parameter. Foremost is superior on process speed and on the number of returned files, although many of the files returned by Foremost are false positives (returned files that are not the appropriate files).

V. CONCLUSION
Based on the simulation results and the analysis that has been done of the process of carving files using PhotoRec and Foremost, with the parameters number of returned files, speed of the process, and file validation, it can be concluded that:
1. The Foremost carving process takes longer than the PhotoRec process in six of the eight scenarios; Foremost was faster only in scenarios 7 and 8.
2. PhotoRec returned fewer files than Foremost when restoring jpg files and all file types (scenarios 1, 4, 5, and 8).
3. The percentage of valid files returned by PhotoRec is far greater than that of Foremost in every scenario.
Based on these findings, it can be concluded that the performance of PhotoRec is better than the performance of Foremost.

References

[1] Ashraf, M. Nadeem. 2012. Forensic Multimedia File Carving. Department of Computer and Systems Sciences. Royal Institute of Technology.
[2] Aljumah, Abudullah, et al. 2014. Comparison between file carving from disk drive and disk image. Salman bin Abdulaziz University. Saudi Arabia.
[3] Beek, Christiaan. 2011. Introduction to File Carving. McAfee.
[4] Carrier, Brian. 2005. File System Forensic Analysis. Addison Wesley Professional.
[5] Courrejou, Timothy and Garfinkel, Simson L. 2011. A Comparative Analysis of File Carving Software. Naval Postgraduate School. California.
[6] EC-Council. 2010. Computer Forensics: Investigating Hard Disks, File & Operating Systems. Course Technology.
[7] Datlabs. 2016, 20 May. Common Data Loss Situations. Retrieved from Datlabs: http://www.datlabsdatarecovery.co.uk/common-data-loss-situations/
[8] Etforecasts. 2016, 19 May. Computer-in-Use Forecast by Country. Retrieved from Etforecasts: http://www.etforecasts.com/product/ES_cinusev2.htm
[9] Grenier, Christophe. 2016, 2 June. PhotoRec, Digital Picture and File Recovery. Retrieved from cgsecurity: http://www.cgsecurity.org/wiki/PhotoRec
[10] Laurenson, Thomas. 2013. Performance Analysis of File Carving Tools. University of Otago. New Zealand.
[11] Leaver, Michael J. 2007. Disk Imaging is Not a Total Backup Solution. Retrieved from 2BrightSparks: http://2brightsparks.com/resources/articles/disk-imaging-is-not-a-total-backup-solution.html
[12] Mikus, Nicholas. 2005. An Analysis of Disc Carving Techniques. Naval Postgraduate School. California.
[13] Madani, Sajjad A, et al. 2010. Wireless sensor networks: modeling and simulation.
[14] Merola, Antonio. 2008. Data Carving Concepts. SANS Institute.
[15] Poisel, Rainer, et al. 2011. Advanced File Carving Approaches for Multimedia Files. St. Pölten. Austria.
[16] Statista. 2016, 15 June. Private data stored on digital devices according to internet users worldwide as of June 2015. Retrieved from Statista: http://www.statista.com/statistics/463338/private-data-stored-on-digital-devices-worldwide/
[17] Yoo, Byeongyeong, et al. 2011. A study on multimedia file carving method. Korea University. Republic of Korea.