
Digital Forensics

New XML-Based Files: Implications for Forensics

Two new office document file formats (Office Open XML and OpenDocument Format) make it easier to glean time stamps and unique document identifiers while also improving opportunities for file carving and data recovery.

SIMSON L. GARFINKEL, US Naval Postgraduate School
JAMES J. MIGLETZ, US Marine Corps Command and Control Integration Division

COPUBLISHED BY THE IEEE COMPUTER AND RELIABILITY SOCIETIES ■ 1540-7993/09/$25.00 © 2009 IEEE ■ MARCH/APRIL 2009

For more than 20 years, programs such as Microsoft Word have stored their documents in binary file formats. That’s changing as Microsoft, Sun Microsystems, and other developers migrate to new XML-based formats for document files.

Document files are of critical interest to forensic practitioners because of the data they contain; they’re also a rich topic for forensic research. Although most investigations concern themselves solely with a document’s surface content, some examinations dive deeper, examining the metadata or deleted material that’s still present in the file. Investigators can, for instance, use metadata to identify individuals potentially responsible for unauthorized file modification, establish text plagiarism, or even indicate falsification of evidence. Unfortunately, metadata can also be modified to implicate innocent people, and the ease of modifying these new files means that it’s far easier to make malicious modifications that are difficult (if not impossible) to detect [1].

With so many aspects to consider, we present a forensic analysis of the two rival XML-based office document file formats: the Office Open XML (OOX) format that Microsoft adopted for its Office software suite and the OpenDocument Format (ODF) used by Sun’s OpenOffice software. We detail how forensic tools can exploit features in these file formats and show how these formats could cause problems for forensic practitioners. For additional information on the development and increased use of these two file formats, see the “Background” sidebar.

Analysis and Forensic Implications

To begin our analysis, we created multiple ODF and OOX files using Microsoft Office 2007 for Windows, Microsoft Office 2008 for Macintosh, OpenOffice 2.3.1, and NeoOffice 2.2.2 (a version of OpenOffice that runs under MacOS). We analyzed these files using specially written XMLdoc tools, including an XML ZIP-file browser, a search utility, and a program for automatically displaying the differences between two XML-containing ZIP files. These tools are freely available from our Web site (www.afflib.org).

For this study, we decided not to evaluate the digital signature provisions of either the ODF or OOX formats because these features are rarely used in practice.

Data Recovery

Overall, we found that ODF and OOX files tend to be smaller than equivalent legacy non-XML files, almost certainly a result of ZIP compression. Although it’s trivial to add parts to or remove parts from a ZIP archive after its creation, we found that in many cases doing so corrupted the file so that it couldn’t be processed with Microsoft Office or OpenOffice.

The ZIP structure for these files is useful when performing data recovery or file carving. (File carving is the process of recognizing files by their content, rather than file system metadata. Carving is frequently used for recovering files from devices that have hardware errors, have been formatted, or have been partially overwritten.) Because each part of the archive includes a multibyte signature and a 32-bit cyclic redundancy check (CRC32) for validation, we can recover parts of a ZIP archive even when other parts of it are damaged, missing, or otherwise corrupted. We can also use the CRC32 and relative offsets within the archive to automatically reassemble fragmented ZIP files [2]. We can then manually process recovered parts or insert them into other OOX/ODF files to view the data.

Manifest. ODF and OOX both contain a ZIP directory as the last structure in the file. We can examine this directory using standard tools, such as the Unix unzip command or Sun’s JAR.

ODF has a second directory that records the document parts in an XML data structure called META-INF/manifest.xml. The OOX files store references to the additional document parts in the [Content_Types].xml and .rels parts, in addition to the document contents themselves.

    Length  Name
    ------  ----
      1527  [Content_Types].xml
       735  _rels/.rels
      1107  word/_rels/document.xml.rels
      4780  word/document.xml
      6613  word/media/image1.png
      7559  word/theme/theme1.xml
     39832  docProps/thumbnail.jpeg
     25316  word/embeddings/Microsoft_Word_Document1.docx
      2036  word/settings.xml
       276  word/webSettings.xml
       734  docProps/app.xml
       726  docProps/core.xml
     15019  word/styles.xml
      1521  word/fontTable.xml
    ------  -------
    107781  14 files

Figure 1. ZIP archive directory. We embedded a Microsoft Word document inside another Microsoft Word document with Word’s “Insert/Object...” command.

Contents. Both file formats include a special XML file that contains the document’s main flow. In ODF, this file is called content.xml. The primary contents of an OOX word processing document created with Microsoft Office 2007 or 2008 reside in the document.xml part, although the standard allows a different name to be specified in the [Content_Types].xml part.
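The part-level recovery described under Data Recovery can be illustrated with a short sketch. It is not the authors' tool: the names carve_parts and _read_part are hypothetical, and the code assumes the standard ZIP local file header layout (a 30-byte fixed header beginning with the signature PK\x03\x04, followed by the part name, an extra field, and the data). It handles only stored and deflated parts, scans a byte buffer for header signatures, and keeps a part only when its CRC32 validates, so it needs neither the central directory nor the other parts to be intact.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"        # signature that starts every local file header
DESCRIPTOR_SIG = b"PK\x07\x08"   # optional signature before a data descriptor
HEADER = struct.Struct("<4sHHHHHIIIHH")  # fixed 30-byte part of the header


def carve_parts(blob):
    """Return {part name: bytes} for every part in blob whose CRC32 validates."""
    recovered = {}
    pos = blob.find(LOCAL_SIG)
    while pos != -1:
        part = _read_part(blob, pos)
        if part is not None:
            name, data = part
            recovered[name] = data
        pos = blob.find(LOCAL_SIG, pos + 1)
    return recovered


def _read_part(blob, pos):
    """Decode the part whose local header starts at pos, or return None."""
    if pos + HEADER.size > len(blob):
        return None
    (_sig, _ver, flags, method, _time, _date,
     crc, csize, _usize, name_len, extra_len) = HEADER.unpack_from(blob, pos)
    name_start = pos + HEADER.size
    data_start = name_start + name_len + extra_len
    name = blob[name_start:name_start + name_len].decode("utf-8", "replace")
    has_descriptor = bool(flags & 0x08)  # CRC32 is stored after the data

    if method == 8:  # deflate: the compressed stream marks its own end
        inflater = zlib.decompressobj(-15)
        try:
            data = inflater.decompress(blob[data_start:])
        except zlib.error:
            return None
        if not inflater.eof:
            return None  # stream was cut short
        if has_descriptor:
            tail = inflater.unused_data
            if tail[:4] == DESCRIPTOR_SIG:
                tail = tail[4:]
            if len(tail) < 4:
                return None
            (crc,) = struct.unpack_from("<I", tail)
    elif method == 0 and not has_descriptor and csize > 0:  # stored
        data = blob[data_start:data_start + csize]
        if len(data) != csize:
            return None
    else:
        return None  # other cases are outside this sketch

    # Accept the part only if the recomputed CRC32 matches the recorded one.
    if zlib.crc32(data) & 0xFFFFFFFF != crc:
        return None
    return name, data
```

Calling carve_parts on the raw bytes of a disk image fragment or a truncated document returns whatever parts survive intact; a recovered word/document.xml or content.xml can then be examined directly. Reassembling fragmented archives, as mentioned in the text, would need more than this sketch provides.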
Forensic tools should extract text from the content parts, but tool developers must understand that text can be present in other document parts as well. For example, Microsoft Word allows other Word documents to be embedded within a Word document using the “Insert/Object...” menu command. These documents are embedded as a named .docx file inside the ZIP archive, as Figure 1 shows. In such an instance, where files are embedded within other files, investigators should analyze files recursively using a special forensic tool.

The most straightforward way for forensic practitioners to handle these new compound document formats is to save the file and then open it with a compliant program. Although this approach works, it raises several potential problems:

• The compound document might contain active content that the forensic investigator doesn’t wish to execute. (Despite assurances from Microsoft and others that these file formats are safer, both ODF and OOX have provisions for storing active content [3] and therefore can carry viruses.)
• Links to external Web sites can reveal that someone has captured the file and is analyzing it.
• If parts of the file are overwritten or missing, applications such as Word or OpenOffice might be unable to open the files.
• Desktop applications can overlook or ignore critical information of interest to the forensic investigator.

To this end, we tested both Guidance’s EnCase 6.11 and AccessData’s Forensic ToolKit 1.8 and determined that they could display and search for text inside ODF files, OOX files, and OOX files embedded as objects inside other OOX files.

Both the compressed nature of ODF and OOX files and the multiple codings for the strings possible within XML represent a significant problem for forensic program developers. Because all the text is compressed, it’s no longer possible to find it by scanning for strings within raw disk or document images. And because XML allows strings to be coded in hexadecimal or even interrupted by comment characters (for example, str<!-- ignore -->ing), any forensic tool that takes shortcuts in decoding the ZIP archive or implementing the full XML schema could return false negatives when performing searches.

Embedded objects and thumbnails. A big advantage of these XML file formats is that images and other objects embedded in word processing files are stored in the ZIP file as their own parts.

We found that Microsoft Office 2008 and NeoOffice for Macintosh both stored thumbnail images of the documents’ first page by default: Microsoft stores the thumbnail as a .jpg, while NeoOffice stores it as two files, a .png and a .pdf. We also found .pptx thumbnails created by PowerPoint 2007 on Windows. However, Word 2007 and Excel 2007 didn’t save thumbnails by default, presumably because the ...

Background

Document files are fundamentally container files, that is, single files (a consecutive stream of bytes) that contain multiple data objects. A typical Microsoft Word file might contain data streams associated with the summary info, the main text, tables, and embedded images. The file also contains numerous forms of metadata, both for the document and for the container itself.

...

Microsoft’s Office 2003 allowed these formats to be used as alternative document file formats; with Microsoft Office 2007, the XML-based document formats became the default file format [3]. Native support for Office Open XML is provided today in Microsoft Office 2007 for Windows and Office 2008 for Macintosh.

...pressed; embedded images are stored as binary objects within their own parts.

Because Microsoft’s XML languages are defined in terms of behaviors built in to Microsoft Office, OOX files can’t be readily translated into ODF or vice versa.

OpenDocument Format

Sun Microsystems submitted the OpenOffice OpenDocument Format (ODF) to the Organization for the Advancement of Structured Information Standards (Oasis).
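Two points from the main text, recursive analysis of embedded documents and the risk of false negatives when strings are coded as character references or interrupted by comments, can be illustrated together. The following is a minimal sketch, not the XMLdoc tools or any commercial product; the function names decoded_text and search_container are hypothetical. It relies on Python's standard XML parser to resolve character references and discard comments before the search term is compared, and it descends into any part that is itself a ZIP archive.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def decoded_text(xml_bytes):
    """Return the character data of an XML part, fully decoded.

    The parser resolves character references such as &#x63; and drops
    comments, so text split by either is rejoined before searching.
    """
    root = ET.fromstring(xml_bytes)
    return "".join(root.itertext())


def search_container(data, needle, prefix=""):
    """Return the paths of all XML parts whose decoded text contains needle.

    Parts that are themselves ZIP archives (for example, an embedded
    .docx) are searched recursively.
    """
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            body = archive.read(info)
            path = prefix + info.filename
            if zipfile.is_zipfile(io.BytesIO(body)):
                hits.extend(search_container(body, needle, path + " -> "))
                continue
            try:
                text = decoded_text(body)
            except ET.ParseError:
                continue  # not well-formed XML (image, binary object, ...)
            if needle in text:
                hits.append(path)
    return hits
```

A search for a term then reports hits such as word/embeddings/Microsoft_Word_Document1.docx -> word/document.xml even when the raw XML never contains the term as a contiguous byte string. The sketch joins all character data without regard to paragraph boundaries and reads each part fully into memory; a real tool would also need to defend against maliciously constructed XML and oversized archives, which the standard parser used here does not do.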