2018 International Conference on Computer, Communication and Network Technology (CCNT 2018) ISBN: 978-1-60595-561-2

Design and Implementation of Data Recovery System Based on NTFS

* , Fu-jun FENG , Xiao-jun LI and Jun-ping YAO Research Inst. of High-Tech, China *Corresponding author

Keywords: NTFS, Data storage, Data recovery, MFT.

Abstract. NTFS in Windows is a kind of file systems with good stability and security, and it is important for data security to research on data recovery based on NTFS. Analyzing the structure of NTFS, a data recovery system based on NTFS is designed and implemented, and three modules are presented, which includes disk analyzing, partition scanning and data recovery modules. The data recovery module is the core of the recovery system, and the implementation process of dada recovery based on NTFS is provided particularly. The results of soft testing show that the system proposed can recover the original data very well.

Introduction With the continuous development of network technology and the rapid expansion of the amount of data, we have entered the era of big data. More and more users store some important information in a computer disk, but have to face the risk of missing data. Data recovery is an important method to protect data security, which is the last line of defense for information security. The technology of data recovery can save the user’s data from some unreadable storage devices when the computer system is destroyed by misoperation, virus attack, hardware failure or network attack, and it can make data losses minimum[1-4]. Data recovery can be divided into two categories based on the types of failure, that is logic recovery and hardware recovery. Data recovery due to the disk failure is called hardware recovery, such as track or disk damage. Data recovery completed by software is called logic recovery, which does not involve hardware maintenance in the whole process. Most data recovery belongs to the logic recovery, the main reasons of which include: (1) misoperation, such as misformat or misdeletion; (2) damage by malicious program, such as virus infection; (3) fault of , such as system halt. Data recovery involves a wide range of areas, and the problems of data damage and repair usually exist where electronic data and digital documents need to store. We should check the storage media thoroughly before data recovery, and design a reasonable scheme to recovery the loss data according to the actual situation. So far there are many researches on data recovery for computer data[5-8] and electronic evidence[9-10]. NTFS is a kind of used in Windows system, which has good fault tolerance and security. The advantage of NTFS is good stability, and it is suitable for the situation which demands high data security. So it is important to research data recovery based on NTFS for data security and protection, and a data recovery system based on NTFS is designed and realized in this paper.

File Structure of NTFS

Master File Table (MFT) In NTFS, all data is stored in the disk like a file, such as bitmap, document, catalogue and system boot loader etc., and these system files are called element files, these data are called element data. The storage location of a file on the disk is determined by MFT, which is the core in NFTS. For MFT, it is

240

a relational database composed of arrays of file records, and each file has a file record, it may have several records for the large file. The length of each file record is fixed, that is 1KB. The file record is numbered from “0”, and the first 16 files are element files starting with “$”, which belong to the system files and are hidden for users. These system files have the fixed location in MFT, while the file records of other files and catalogues can be stored anywhere. The first file record in MFT is $MFT itself, which records all files and catalogues. In order to ensure the reliability of the file system structure, a mirror file $MFTMirr is specially prepared for $MFT, which is as the second record in MFT. Data Recovery of NTFS When a file is deleted in NTFS, the data area is not be covered, and at the same time, the file record in MFT is not cleared, but there are three aspects changed: one which offsets 0x16H from the MFT header is changed; the attribute of index root for the father folder is changed; locations corresponding to the clusters occupied in the file record of bitmap are set to “0”. When a file is deleted, the key information includes: (1) BPB parameter BPB means BIOS Parameter Block, which is the basis of NTFS. It occupies 73 and records some important information for the file system. The main parameters for data recovery are MftStartCluster, SectorPerCluster, BytesPerSector. (2) Change of MFT The file record header of MFT begins with “46 49 4C 45”, which is the expression of “FILE” by ASCII, and it will not be changed when the file is deleted. The symbol byte is located at 0x16H offset, and 00H denotes the deleted file, 01H normal file, 02H deleted catalogue, and 03H normal catalogue. If we want to recover some deleted files in NTFS, we can search some file records of MFT, then determine the file name and path, according to the attributes of file name and the referenced cluster number of the father catalogue. If the cluster pointer is well preserved and the file contents are not be covered, the file can be recovered well, even if there are many file fragments. If the attribute value is resident, the data will not be covered before MFT is distributed again. If the attribute value is stored in several file records, it need to recover the other file records of MFT.

Design and Implementation of a Data Recovery System When a file is deleted in NTFS, the symbol byte in the file record header is setted to 00H or 02H, while other attributes are not be changed; for the files with data run, spaces occupied by the deleted file will be retrieved, but the data contents do not be changed. In this paper, a data recovery system is designed, and the main function modules have been implemented. Disk Analyzing Module The purpose of disk analysis is to localize the partition of NTFS, and analyze the related information about Main Boot Record (MBR), (EBR) and MFT. MBR is located at the No.0 track, No.0 cylinder and No.1 sector. The first 446 bytes of MBR with 512 bytes are boot loader, and the subsequent 64 bytes are Disk Partition Table (DPT). A partition is defined by every 16 bytes, so it can determine 4 partitions at most. The fifth byte in each DPT expresses the , and 0x07H means NTFS. Each DPT indicates information of a partition or extended partition, and the DPT of a main extended partition maybe include a DPT of a subextended partition, so a chain structure is created, which can record all partitions in disk. For the extended partition, the DPT structure is the same to the basic partition, and the expression of partition type is 0x05H. The information of extended partition is stored in the form of chain table, and information of the later extended partition is recorded in the DPT of the prior one, but the space of the two DPTs does not overlap. EBR is stored in the first sector of the extended partition, and it can only express two DPTs.

241 The key code is to define the structure of the MFT attribute, which is shown as below: TFILE_RECORD = packed record // the structure of MFT attribute header Header: TNTFS_RECORD_HEADER; // header SequenceNumber : Word; // serial number ReferenceCount : Word; // number of the hard connection AttributesOffset : Word; // offset of the first attribute Flags : Word; // 0: deleted file, 1: common file, 2: deleted catalogue, 3: common catalogue BytesInUse : DWord; // actual size of the file record(bytes) BytesAllocated: DWord; //assigned size of the file record(bytes) BaseFileRecord : Int64; // basic record NextAttributeID : Word; // the attribute ID MFTRecordNumber : DWord; // number of this MFT record TRECORD_ATTRIBUTE = packed record // the structure of attribute value AttributeType : DWord; // type Length : DWord; // length NonResident : Byte; // flag of nonresident (0x00H: resident; 0x01H: nonresident) NameLength : Byte; // length of name NameOffset : Word; // offset of name Partition Scanning Module The function of partition scanning module is to scan file records of MFT in NTFS partition, and decide whether the file is deleted by checking the two bytes starting with 0x16H offset for each file record, then obtain the filename, established time and the last modification time by analyzing the attribute with type of “30H” and “10H”. The key code is shown as below: with PBootSequence do begin if (cOEMID[1]+cOEMID[2]+cOEMID[3]+cOEMID[4] < > 'NTFS') then begin MainForm.DriveSize.Caption := 'size'+'bytes'; MainForm.DriveMFTLocation.Caption := 'MFTlocation : ' if not FixupUpdateSequence(MFTData,diskInfo) then begin Closehandle(hDevice); Dispose(PBootSequence); IsSearching := false; MainForm.DriveMFTSize.Caption:= 'MFTsize:'+IntToStr(MFT_SIZE*diskInfo.BytesPerCluster)+ 'bytes'; MainForm.DriveMFTRecordsCount.Caption := 'file number:'+IntToStr; MainForm.btnScan.Caption := 'stop scanning'; MainForm.btnScan.Caption := 'scanning'; MainForm.btnRecover.Enabled := true; Data Recovery Module If the attribute value of a catalogue is stored in the basic file record of MFT, it is called the resident attribute. For the resident attribute, the attribute value is stored behind the attribute name. If the attibute value is so large that it can not be stored in one file record, a storage space will be distributed for this attribute value by NTFS from the data area, and this stroage space is called data run. The attribute stored in a data run is called nonresident attribute. The key code of data recovery module is shown as below: private { Private declarations } FdiskInfo : TDISK_INFORMATION; begin { Place thread code here } isRecovering := true; with MainForm do begin PB.Max := FFileCount; 242

btnRecover.Caption := 'Stop Recovery'; btnScan.Enabled := false; for i := 0 to ListView1.Items.Count -1 do begin if StopRecover then begin break; end; if listview1.Items[i].Checked then begin btnRecover.Caption := ' File Recovery '; btnScan.Enabled := true; end; The process of data recovery for NTFS is: firstly, judging whether the attibute with type “80H” is resident and obtain the data content; then creating a new file with the name of deleted file in other partition, and copying the original content to the new data area. The flowchart of data recovery is shown in Fig.1. Start

Open disk,read the DPT of MBR and EBR, and determine the disk partition

Obtain the parameters of BPB

Read the value of “MftStartCluster”

No Whether scanning MFT is over?

Yes List filename,creation time,modification time etc., and mark the deleted file

Whether the file No attribute is nonresident? Yes Obtain the data

Data recovery

End

Figure 1. Flowchart of data recovery.

Software Testing The process of software testing for data recovery is: (1) creat a file “test.doc” in partition D, and backup the original file in the backup folder; 243 (2) delete the file “test.doc” and empty recycle; (3) open the data recovery software, and scan the partition D, then some deleted files will be listed; (4) choose the file need to be recovered in the file list, and recover it to the partition E. we can choose the option of “recovery with primitive name” which may cover the old storage area, or “recovery with file number”. (5) we can choose the two options respectively, and the results shows that the recovery contents are the same as the original ones. For the other file types, such as video, audio or compressed files, the recovery process is similar to that in above, and the original data will be recovered well by the software we designed.

Summary Through analyzing the structure of NTFS, a data recovery system is proposed in this paper, and it includes three modules, those are disk analyzing, partition scanning and data recovery modules. The testing results show that it can realize the data recovery and data protection for the different types of files. We will optimize the recovery velocity and complete more functions of the designed system in the next step.

References [1] Shijian Dai, Yanhui Tu, The Technology of Data Recovery, second ed., Publishing House of Electronics Industry, Beijing, 2014. [2] Jingsheng Zhang, Methods and Case Analysis of Data Recovery, Publishing House of Electronics Industry, Beijing, 2010. [3] Wei Liu. Deep Secret of Data Recovery, Publishing House of Electronics Industry, Beijing, 2010. [4] Yong-gang Liu. Decryption of Data Recovery, Publishing House of Electronics Industry, Beijing, 2011. [5] Zhi Zhang, Xinxin Chen, Shiqiang Li, Research on the construct of the important data recovery platform, Journal of Netinfo Security, s1 (2016):146-148. [6] Wei Wang, Technology and application of computer data recovery[J]. J. Elec. Tech. & Soft. Eng., 7 (2015):207-207. [7] Xianwei Xu, Yanying Yang, Ji Cao, Research on the data recovery of files in Windows, J. West Anhui Univ., 2 (2014):24-27. [8] Qi Mo, Fei Dai, Rui Zhu, Modelling and analysis for data recovery strategy in business process collaboration, J. App. Research of Computers, 2 (2016):462-466. [9] Fen Dai, Lu Li, Hongwei Liu. Research on the data recovery of electronic evidence, J. Infor. & Commun., 2016(1):123-125. [10] Guofu Ma, Shengli Ma, Zixian Wang, Application of data recovery in electronic evidence and judicial identification, J. Hebei Univ., 5 (2015):538-545.

244