The Microsoft Office Open XML Format Preview for Developers White Paper
Total Page:16
File Type:pdf, Size:1020Kb
The Microsoft Office Open XML Format Preview for Developers White Paper Published: June 2005 For the latest information, please see http://www.microsoft.com/office/wave12 Contents Summary...................................................................................................................................1 Introducing the Microsoft Office Open XML Format .................................................................2 File Structure.............................................................................................................................3 ZIP Package .........................................................................................................................3 Parts .....................................................................................................................................4 Relationships ........................................................................................................................5 Macro-Enabled Files versus Macro-Free Files.....................................................................7 File Extensions .....................................................................................................................7 Solution Development...............................................................................................................9 Data Interoperability..............................................................................................................9 Content Manipulation..........................................................................................................10 Content Sharing and Reuse ...............................................................................................11 Document Assembly...........................................................................................................12 Document Security .............................................................................................................13 Managing Sensitive Information .........................................................................................13 Document Styling................................................................................................................13 Document Profiling .............................................................................................................14 Conclusion ..............................................................................................................................15 References..............................................................................................................................15 Summary Microsoft® “Office 12” will introduce new default XML file formats for Microsoft Office Word word processing, Excel® spreadsheet, and PowerPoint® presentation graphics programs, and will change the way developers can approach solutions based on Office documents. This white paper explores the new file format and discusses solution opportunities and scenarios for developers. The Microsoft Office XML Open Format 1 Introducing the Microsoft Office Open XML Format By default, documents created in Microsoft “Office 12” will be based on an XML file format definition. This new format is distinct from the binary-based file format that has been a mainstay of past Office versions. The new Microsoft Office Open XML Format introduces a number of benefits that will accrue not only to developers and the solutions they build, but also to individual users and organizations of all sizes. The following highlights are some of overall benefits of the Open XML Format: • Open and Royalty-Free – The Open XML Format is based on XML and ZIP technologies, thereby making it universally accessible. The specification for the format and schemas will be published and made available under the same royalty-free license that exists today for the Microsoft Office 2003 Reference Schemas and which is openly offered and available for broad industry use. • Interoperable – With industry standard XML at the core of the Open XML Format, exchanging data between Microsoft Office applications and enterprise business systems is greatly simplified. Without requiring access to the Office applications, solutions can alter information inside an Office document or create a document entirely from scratch by using standard tools and technologies capable of manipulating XML. • Robust – The Open XML Format has been designed to be more robust than the binary formats, and, therefore, will reduce the risk of lost information due to damaged or corrupted files. Even documents created or altered outside of Office are less likely to corrupt, as Office programs have been designed to recover documents with improved reliability by using the new format. • Efficient – The Open XML Format uses ZIP and compression technologies to store documents. This type of file compression offers potential cost savings as it reduces the disk space required to store files and decreases the bandwidth needed to transport files by way of e-mail, over networks, and across the Web. • Secure – The openness of the Open XML Format translates to more secure and transparent files. Documents can be shared confidently because personally identifiable information and business sensitive information, such as user names, comments and file paths, can be easily identified and removed. Similarly, files containing content, such as OLE objects or Visual Basic® for Applications (VBA) code can be identified for special processing. For further discussion on the Microsoft Office Open XML Format and its associated benefits, refer to the Microsoft “Office 12” Preview Web site listed in the reference section at the end of this document. The remainder of this document will focus on a technical discussion of the Open XML Format and the opportunities it presents for developers. The Microsoft Office XML Open Format 2 File Structure • ZIP container with compression • Multiple XML parts describing file data, metadata, customer data • Non-XML parts supported as native files (images, OLE objects) • Relationships define file structure At the core of the new Microsoft Office Open XML Format is the usage of XML reference schemas and a ZIP container. The combination of XML and ZIP allows for a very robust and modular format that enables a large number of new scenarios. Each file is composed of a collection of any number of parts; this collection defines the document. Document parts are held together by the container file or package using the industry standard ZIP format. Most parts are simply XML files that describe application data, metadata, and even customer data stored inside the container file. Other non-XML parts may also be included within the container package, and include such parts as binary files representing images or OLE objects embedded in the document. Parts can specify a relationship to other parts; this design provides the structure for an Office file. While the parts make up the content of the file, the relationships describe how the pieces of content work together. The result is an XML file format for Office documents that is tightly integrated but modular and highly flexible. The next few sections will explore each component of the Open XML Format in greater detail and cover specific discussions about how the three participating Office programs utilize the format. ZIP Package • Industry standard format • User gets a ‘single file’ experience • Compression reduces storage requirements • Developers can process file with standard tools There are many elements that go into creating an Office document. Some of these are commonly shared across all the Office applications, for example, document properties, styles, charts, hyperlinks, comments, and annotations. Other elements are specific to each application, like worksheets in Excel, slides in PowerPoint or headers and footers in Word. When users save a document with the current or previous versions of Microsoft Office, a single file is written to disk, which can then easily be opened subsequently. This metaphor is important to how documents are stored, managed and shared in practice. By wrapping the individual parts of an “Office 12” file in a ZIP container, documents will still remain a single file instance. The use of a single package file to represent the entity of a single document means users will have the same experience that they do today when saving and opening documents with “Office 12.” The Microsoft Office XML Open Format 3 “Office 12" File Container Document Properties WordML / SpreadsheetML , etc. Charts Custom-Defined XML Images , Video, Sound Embedded Code /Macros Comments With previous Office versions, developers looking to manipulate the content of an Office document had to know how to read and write data according to the structured storage defined within the binary file. This process is known to be complex and challenging, notably because the Office binary file formats were designed to be primarily accessed through the Office programs. The formats were structured to mirror the in-memory structures of the applications and to run on low memory machines with slow, hard drives. Altering Office binary files programmatically without the Office applications has also been identified as a leading cause of file corruption, and has deterred some developers from even attempting to try to make alterations to the files. ZIP was chosen because it is a well-understood industry standard. There are many tools available today to work with the ZIP