File Extension Renaming and Signaturing

By

Ryan Ware

Digital Forensics

September 19, 2006

Introduction

Today, several of the major operating systems use file extensions to some degree. File extensions help the operating system determine the appropriate program and method needed to open a file. Many cases in digital forensics involve the modification of file extensions on one or more files in digital media.

These modified files make the analysis process difficult at times. Without the proper identification of the types of files, important evidence may be excluded from an investigation. Thus, the modification of file extensions must be identified and corrected during a digital forensics investigation.

Background

File extensions:

To distinguish the format of a file, several operating systems use file extensions; two major examples are Windows and Mac OS X. A file extension is a short series of alphanumeric characters appended to the end of a file name, after a period [4]. Windows uses file extensions to determine the best program to open a file, as well as a list of other recommended programs. Under Windows, common file extensions include .exe and .com for executables, .jpg and .gif for images, . and . for audio files, and .txt and .doc for text documents. Without the proper file extension, Windows may attempt to open a file with a program that is incapable of reading it, which could cause an error or produce unintelligible output.

File headers:

Unix systems first used file headers, or magic numbers, to determine the format of a file, much as other systems use file extensions. Today, many programs and operating systems use these file headers. File headers are not visible in ordinary programs, but a hex editor or hex dump program can display them, along with the rest of the file's contents, in hexadecimal. Some examples of file headers include [6]:

File type    File header information
JPEG         Begins with the bytes FF D8; JFIF files also contain the ASCII string 'JFIF'
PDF          The ASCII string '%PDF'
GIF          The ASCII string 'GIF87a' or 'GIF89a'
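The header checks in the table above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool works: the signature table and the function names `identify_by_header` and `identify_file` are hypothetical, and only the three formats listed above are covered.

```python
from typing import Optional

# Hypothetical signature table built from the examples above:
# (file type, offset of the magic bytes, the magic bytes themselves).
SIGNATURES = [
    ("JPEG", 0, b"\xff\xd8"),
    ("PDF", 0, b"%PDF"),
    ("GIF", 0, b"GIF87a"),
    ("GIF", 0, b"GIF89a"),
]


def identify_by_header(data: bytes) -> Optional[str]:
    """Return the first file type whose magic bytes match, or None."""
    for file_type, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return file_type
    return None


def identify_file(path: str) -> Optional[str]:
    """Read only the first few bytes of a file and identify it by header."""
    with open(path, "rb") as f:
        return identify_by_header(f.read(16))


if __name__ == "__main__":
    print(identify_by_header(b"GIF89a\x01\x00"))  # GIF
    print(identify_by_header(b"hello world"))     # None
```

The file name plays no part in this check, so the result is the same whatever extension the file carries.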

Multipurpose Internet Mail Extensions (MIME):

MIME represents the types of messages and files sent over the Internet. This Internet standard initially allowed text in non-ASCII character encodings to be sent over the Internet, but it was expanded to allow other files such as images, movies, and executables. The MIME standard uses headers to denote the type of a file. For example, the header of most plain-text messages appears as follows [7]:

MIME-Version: 1.0
Content-Type: text/plain
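Python's standard library can read such a header. The sketch below assumes the two header lines above followed by a blank line and a short body; it parses the declared type with the `email` module and, for comparison, looks up the type implied by a file name's extension with `mimetypes`. The helper names are hypothetical.

```python
import email
import mimetypes
from typing import Optional

# The example header above, followed by a blank line and a short body.
RAW_MESSAGE = "MIME-Version: 1.0\nContent-Type: text/plain\n\nHello, world.\n"


def declared_mime_type(raw: str) -> str:
    """Parse a MIME message and return its declared Content-Type."""
    msg = email.message_from_string(raw)
    return msg.get_content_type()


def extension_mime_type(filename: str) -> Optional[str]:
    """Return the MIME type implied by a file name's extension, if known."""
    # A fresh MimeTypes() instance starts from Python's built-in default
    # table, so the answer does not depend on the host's configuration files.
    return mimetypes.MimeTypes().guess_type(filename)[0]


if __name__ == "__main__":
    print(declared_mime_type(RAW_MESSAGE))   # text/plain
    print(extension_mime_type("test.jpg"))   # image/jpeg
```

The declared type travels with the message; the extension-based lookup trusts the file name, which is exactly what renaming subverts.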

File Extension Renaming

File extensions are very important for identifying the programs that can open a file and display its contents correctly. Windows relies heavily on file extensions to open files. For example, suppose we create a text document called test.txt. Windows normally tries to open it with Notepad, , or WordPad. We then copy this file but change the extension to .jpg. When we try to open test.jpg, Windows tries to open the file with an image program such as ImageReady or Photoshop. These programs cannot properly open the document and present an error.

However, since we know that the file is actually a text document, we can tell Windows to open it with Notepad, which can properly open the file even though its extension is .jpg.

Such renaming can cause a significant problem during an investigation of digital media.

When investigating only a few files, the modification of file extensions does not cause a great problem: the investigator can open and test each file to ensure that its extension and type match. However, most computer hard drives contain tens of thousands or even several hundred thousand files. Examining each file would be infeasible for one investigator or even a small team of forensics investigators. Thus, many investigators use tools that quickly scan most of the files. These tools usually depend on basic signatures such as file names and extensions. If a file extension has been modified, both the tool and the investigator might pass over important evidence. For this reason, most investigators use tools that can identify at least some file extension modifications.
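A tool that catches simple renaming only needs to compare each file's extension with its header. The sketch below walks a directory tree and flags files whose extension promises a signature that the first bytes do not contain. The extension-to-signature table and the names `find_mismatches` and `demo_scan` are hypothetical, and real tools know far more formats.

```python
import os
import tempfile
from typing import Iterator, List

# Hypothetical table: extension -> acceptable leading bytes, taken from the
# signature examples in the Background section.
EXPECTED_MAGIC = {
    ".jpg": (b"\xff\xd8",),
    ".jpeg": (b"\xff\xd8",),
    ".gif": (b"GIF87a", b"GIF89a"),
    ".pdf": (b"%PDF",),
}


def find_mismatches(root: str) -> Iterator[str]:
    """Yield paths whose extension promises a signature the header lacks."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            magics = EXPECTED_MAGIC.get(ext)
            if magics is None:
                continue  # no known signature for this extension, so skip it
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                header = f.read(16)
            if not header.startswith(magics):
                yield path


def demo_scan() -> List[str]:
    """Recreate the test.txt -> test.jpg example and scan a temporary folder."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "test.jpg"), "w") as f:
            f.write("This is really a text document.\n")
        with open(os.path.join(root, "real.gif"), "wb") as f:
            f.write(b"GIF89a" + bytes(10))
        with open(os.path.join(root, "notes.txt"), "w") as f:
            f.write("No signature is checked for .txt files.\n")
        return sorted(os.path.basename(p) for p in find_mismatches(root))


if __name__ == "__main__":
    print(demo_scan())  # ['test.jpg']
```

A file renamed to an extension the table does not list, or one whose header was changed along with its extension, passes silently; that limitation is the problem discussed under Modified or Mangled Headers.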

Tools

The tools digital forensics investigators currently use examine file header information to identify files with incorrect file extensions. These tools include:

file - a Unix command that examines the header of the file to determine the type

Droid - examines the header of a file and claims to do some internal analysis

The Coroner's Toolkit - uses the file command to check the file type

The Sleuth Kit - also uses the file command

These tools can correctly identify files with modified file extensions without much difficulty.
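The file command can also be driven from a script. The following sketch assumes a file(1) implementation that supports the `--brief` and `--mime-type` options, as recent versions of the widely used implementation do; the helper names are hypothetical, and the function returns None when the command is unavailable.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Optional


def file_command_type(path: str) -> Optional[str]:
    """Return the MIME type reported by file(1), or None if it cannot be run."""
    exe = shutil.which("file")
    if exe is None:
        return None  # file(1) is not installed (e.g. on a stock Windows system)
    result = subprocess.run(
        [exe, "--brief", "--mime-type", path],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def demo_file_command() -> Optional[str]:
    """Save text under a .jpg name and ask file(1) what it contains."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "test.jpg")
        with open(path, "w") as f:
            f.write("This is really a text document.\n")
        return file_command_type(path)


if __name__ == "__main__":
    print(demo_file_command())  # typically text/plain; None without file(1)
```

Because file(1) reads the contents rather than the name, the .jpg extension does not mislead it.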

However, most of these tools consistently fail to identify the file type when both the file extension and file header have been modified. For example, Droid can accurately identify the type of a file if the header has not been modified. However, if both the header and the extension have been changed, Droid cannot determine the type of the file.

Modified or Mangled Headers

As mentioned above, the tools used today look primarily at the file header to determine the type of a file and whether its extension has been modified. With a hex editor, however, one can edit the header of a file and change it to anything: the user can mangle the header or change it to match some other known file type. Thus, in most cases, modifying a file header causes current tools to fail to identify the type of a file. For example, a file named original.txt could be changed so that its new extension is .jpg and its header is that of a .mov file. Most of the tools used by forensics investigators may be "stumped" by these changes, but they may still flag the file as suspicious because its extension and header do not match. A more interesting case is a change of both the file extension and the header to the same type. In that case, a file that would otherwise have been flagged, such as an executable, may not be flagged because it looks exactly like a text file in both its extension and its header.
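The case where both the extension and the header are changed is easy to reproduce. In this minimal sketch, a hypothetical `forge_gif_header` helper overwrites the first six bytes of a text file with a GIF signature; a header-only check like those described above then reports GIF. The `printable_ratio` statistic is an illustrative assumption, not a feature of any tool named here: it shows that the rest of the file still looks like plain text, which a header check never examines.

```python
import string

GIF_SIGNATURES = (b"GIF87a", b"GIF89a")
# Byte values of printable ASCII characters, including whitespace.
PRINTABLE = set(string.printable.encode("ascii"))


def forge_gif_header(data: bytes) -> bytes:
    """Hypothetical tampering step: overwrite the first six bytes with 'GIF89a'."""
    return b"GIF89a" + data[6:]


def header_check(data: bytes) -> str:
    """Header-only identification, in the style of the tools above."""
    return "GIF" if data.startswith(GIF_SIGNATURES) else "unknown"


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII (a crude content statistic)."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


ORIGINAL = b"Meeting moved to Friday. Bring the ledger.\n" * 20
TAMPERED = forge_gif_header(ORIGINAL)  # would be saved as, e.g., original.gif

if __name__ == "__main__":
    print(header_check(ORIGINAL), header_check(TAMPERED))  # unknown GIF
    print(printable_ratio(TAMPERED))                       # 1.0
```

The compressed image data in a genuine GIF would contain many non-printable bytes, so a ratio near 1.0 behind a GIF header is at least suspicious.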

Research

There are several other possible ways to identify the type of a file without examining its extension or header. As mentioned above, the MIME standard provides some identification of files. This may not help in all cases, because a MIME type may not be attached to the file. Currently, most forensics tools do not appear to use the MIME type as another means of identification, so further research could be performed in this area.

There are three more possible research directions. First, one may compress a file and determine its type from the compression ratio. Some research has already been done on this topic, though not with regard to digital forensics. The researchers were able to identify the basic type of a file, such as image, audio, movie, or text, but were not able to identify specific file types, for example telling a JPEG apart from a GIF. A second research topic would be to examine the structure of a file. Some research has also been done in this field.
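The compression-ratio idea can be illustrated with zlib. This sketch is only an assumption about how such a measurement might look; the research mentioned above is not cited here, and its choice of compressor and method may differ. Text and other redundant data shrink dramatically, while data that is already compressed, such as the image data inside a JPEG, barely shrinks at all.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 9) -> float:
    """Compressed size divided by original size; lower means more redundant."""
    if not data:
        raise ValueError("cannot compute a compression ratio for empty input")
    return len(zlib.compress(data, level)) / len(data)


SAMPLE_TEXT = b"The quick brown fox jumps over the lazy dog. " * 200

if __name__ == "__main__":
    # Repetitive text compresses to a small fraction of its size.
    print(f"text:   {compression_ratio(SAMPLE_TEXT):.3f}")
    # Random bytes stand in for already-compressed content; the ratio stays near 1.
    print(f"random: {compression_ratio(os.urandom(8192)):.3f}")
```

Mapping ratios to categories such as text or image would require thresholds learned from real files, which this sketch does not attempt.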

However, research on file structure has so far only been able to identify JPEG images, based on the rate of change of the byte values in a file [2]. The third research topic involves fuzzy hashing of a file. According to Jesse Kornblum, files can be matched if they share significant sequences of bytes in the same order [3]. All of these research topics appear to have potential, but they still require a great deal of work before practical tools can be produced for digital forensics investigators.
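Kornblum's approach, context-triggered piecewise hashing, is implemented in his ssdeep tool. The toy version below only illustrates the idea under simplifying assumptions: it uses a plain rolling sum instead of ssdeep's rolling hash, a single fixed block size, and Python's difflib for comparison, so its signatures and scores are not compatible with ssdeep. All names are hypothetical.

```python
import difflib
import random
from typing import Tuple

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
FNV_OFFSET = 0x811C9DC5  # 32-bit FNV offset basis
FNV_PRIME = 0x01000193   # 32-bit FNV prime
WINDOW = 7               # bytes in the rolling window


def piecewise_hash(data: bytes, block_size: int = 64) -> str:
    """Toy context-triggered piecewise hash (not compatible with ssdeep).

    A rolling sum over the last WINDOW bytes decides where chunks end, so
    boundaries depend only on local content. Each chunk contributes one
    base64 character taken from its FNV-1 hash.
    """
    signature = []
    chunk_hash = FNV_OFFSET
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        chunk_hash = ((chunk_hash * FNV_PRIME) & 0xFFFFFFFF) ^ byte
        if rolling % block_size == block_size - 1:
            signature.append(B64[chunk_hash % 64])  # chunk boundary reached
            chunk_hash = FNV_OFFSET
    if chunk_hash != FNV_OFFSET:
        signature.append(B64[chunk_hash % 64])  # trailing partial chunk
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> int:
    """Score two signatures from 0 to 100 with difflib (ssdeep uses an edit distance)."""
    if not sig_a or not sig_b:
        return 0
    matcher = difflib.SequenceMatcher(None, sig_a, sig_b, autojunk=False)
    return round(100 * matcher.ratio())


def demo_similarity() -> Tuple[int, int]:
    """Compare data with an edited copy and with unrelated data."""
    rng = random.Random(2006)
    original = bytes(rng.randrange(256) for _ in range(8192))
    edited = original[:4000] + b"INSERTED TEXT" + original[4000:]
    unrelated = bytes(rng.randrange(256) for _ in range(8192))
    base = piecewise_hash(original)
    return (similarity(base, piecewise_hash(edited)),
            similarity(base, piecewise_hash(unrelated)))


if __name__ == "__main__":
    print(demo_similarity())  # edited copy scores high, unrelated data low
```

Because chunk boundaries depend only on nearby bytes, an insertion disturbs only the chunks around it, so the edited copy keeps most of its signature while unrelated data shares little of it.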

Conclusion

Simple file extension renaming can cause problems. However, these changes are fairly easy to catch with current forensics tools. A more significant problem occurs when both a file's extension and its header have been changed. The forensics tools in use do not appear to handle these situations well. More research is needed on identifying file types without relying on file extensions or file headers.

References

1) Carrier, Brian. "The Sleuth Kit Informer." February 15, 2003.

2) Karresand, Martin and Shahmehri, Nahid. "File Type Identification of Fragments by Their Binary Structure." June 23, 2006.

3) Kornblum, Jesse. "Fuzzy Hashing." August 21, 2006.

08/msg00004.html>.

4) " Extension." September 18, 2006.

5) "Introduction - Droid."

6) "Magic number (programming)." September 17, 2006.

7) "MIME." September 19, 2006.