Editor: Barry Leiba • leiba@watson..com Standards OpenDocument Format The for Office

Rob Weir • IBM

OpenDocument Format (ODF) is an XML-based format for office documents, such as , text documents, and presentations. ODF is application-, platform-, and vendor-neutral and thereby facilitates broad of office documents.

lthough personal productivity applica- this was often the pragmatic choice. An ad hoc tions (PPAs) and their associated file for- , designed specifically for a given ap- A mats have been around for many years, plication, will typically parse faster and con- they haven’t been the subject of sume less memory than a general-purpose one. until very recently. We can better understand In fact, to optimize performance, some early for- the current importance and relevance of stan- mats were little more than a direct serialization dardization in this area if we first consider the of their application’s internal data structures. technological, historical, and economic Public availability of documentation for these of electronic documents. early file formats varied. Some vendors made After more than a decade of evolution, PPA their formats available on demand or published functionality has converged on a set of capa- them in form. Others never released their bilities that are conventionally expressed via formats. Still others released them under re- three application types: a , a strictive licenses that limited use by competi- , and a presentation ap- tors. Interoperability among word processors, plication. Although , , Sun, IBM, where it existed at all, was hard-won and often and all offer PPAs, as do open source accomplished by reverse engineering and trial vendors, the difference in functionality among and error. these offerings is relatively minor compared to And so it was, through much of the 1980s and the difference in functionality among the three 1990s. Although the formats weren’t interoper- application types. In other words, the difference able or specified in open standards, this was tol- between and Corel WordPerfect erable because office documents were primarily is minor compared to the difference between exchanged as printed copies. Where exchanges Microsoft Word and . Given this occurred electronically, they were most often convergence in functionality within each ap- between known parties, such as workers in the plication type, we can now describe their core same office, using the same software. Interop- commonalities in an open, standardized docu- erability in such cases was trivially obtained ment format. because authors could make strong, Historically, we weren’t so fortunate. When valid assumptions about the receiving party’s we examine the history of PPAs, we see that the software and platform. In other words, if every- general practice was for each vendor to define a one you correspond with uses the same version file format specifically and exclusively for use of the same word processor running on the same with its own particular product. So, WordPerfect version of the same configured had its own format, as did WordPro, Word, Word- the same way with the same installed, Star, XyWrite, Writing Assistant, StarOffice, and then, in theory, exchanges so on. From a technical perspective, in the days should be straightforward. But reality is rarely when RAM and processing cycles were precious, so cooperative.

March/April 2009 1089-7801/09/$25.00 © 2009 IEEE Published by the IEEE Computer Society 83

Authorized licensed use limited to: KnowledgeGate from IBM Market Insights. Downloaded on March 12, 2009 at 10:45 from IEEE Xplore. Restrictions apply. Standards

From the Department Editor

or many years, when you used a word-processing application, your files were ing to defined archival policies. F stored in a format that was fully understood only by the application you used. Increasingly, governments commu- WordPerfect stored WordPerfect files, Microsoft Word stored Word files, and nicate with citizens electronically, so on — and, while developers often included ad hoc support for their competi- both in publishing government - tors’ formats, that was a hit-and-miss thing, vulnerable to changes in proprietary uments and in receiving documents formats. The same went for presentation slides and spreadsheets. We all have ex- from the public. When a government perience with the results of this, with the difficulties in exchanging files between adopts, by deliberate policy or iner- different applications. tial passivity, a vendor’s proprietary ODF — OpenDocument Format — addresses this problem by providing a stan- document format — which then pro- dard format for storage and exchange of office documents. It works on Windows, motes the exclusive use of that same MacOS, , and other platforms, and it works for any application whose vendor vendor’s products — this requirement chooses to implement it. In this issue’s “Standards” department, IBM’s Rob Weir, who then trickles down to all those who worked on the ODF standard, will take us through some of its history and details. wish to with that government. — Barry Leiba This unintended consequence thus becomes an impediment to those in society least able to afford the pro- prietary software. For example, in Why ODF Now? structures of a single vendor’s appli- the aftermath of Hurricane Katrina, In recent years, our complacency cations — then documents created us- the US Federal Emergency Manage- with a lack of office-document stan- ing those applications will only work, ment Agency (FEMA) set up a Web dards has been shattered. Document or work well, with those applications. site for disaster victims to apply for formats, which historically were un- As the user continues to create and federal assistance. However, the Web seen and unspoken of, are now hotly accumulate documents in that pro- site was designed to work only with debated. Governments throughout prietary format, his or her ability to Microsoft Explorer running the world now mandate or recom- evaluate and migrate to alternative on , rather than mend the use of ODF.1 Corporations PPAs in the market diminishes due strictly following the W3C’s HTML are investing significant time and to substantial switching costs. The standard, which would have worked effort to further ODF adoption or, in presence of high switching costs is everywhere. So, hurricane victims some cases, prevent it. Why the in- the essence of vendor lock-in and is attempting to connect from a friend’s creased emphasis on document for- a significant challenge from the per- or the library’s Linux ma- mat standards? spective of the user, who now faces chine were unable to apply for aid Several factors are at play here. effectively diminished options; the online.2 Dependency on a single First, when the World Wide Web be- potential competitor, who finds it vendor’s proprietary formats is bad came a mainstream success, new docu- more difficult to enter a market dom- economics as well as bad policy. ment distribution patterns emerged. inated by an incumbent with an es- Authors could now easily post docu- tablished lock-in; and public policy, Document Format ments online for anyone to download, which has a long-standing practice of Technical Requirements but they could no longer assume that encouraging competition. One fun- A standard office format must sup- the receiving party was using the damental way to reduce the strength port the gamut of current uses. Of- same software stack. The receiver’s of vendor lock-in and encourage sub- fice documents, as used today, range capabilities were unknown and, in stitutability of equivalent goods is by from unstructured, free-form docu- principle, unknowable. In such an adopting and using open standards ments, to semi-structured memos, to environment, we can achieve in- that encourage interoperability. strongly structured forms. PPA us- teroperability only through deliber- The debate these issues generate ers range in awareness from novices ate engineering efforts, not by merely has been amplified by its predomi- with little or no conception of formal assuming a universe of homogeneous nate impact on public administra- document structure and the ben- systems. The primary way to do this is tions, in which the document-based efits of style and content separation by creating and adopting standards. bureaucracy, along with being pe- to highly sophisticated users creat- Second is the economic impact of rennially budget-challenged, is con- ing highly structured documents as “vendor lock-in” and the deleterious strained by public accountability, the part of a larger document-processing effects that ensue. If a document for- need to provide equal access to citi- pipeline. A document format must mat is designed traditionally — that zens, and the often legally mandated facilitate and encourage modern best is, to closely reflect the internal data duty to preserve documents accord- practices in document design, but it

84 www.computer.org/internet/ IEEE INTERNET COMPUTING

Authorized licensed use limited to: KnowledgeGate from IBM Market Insights. Downloaded on March 12, 2009 at 10:45 from IEEE Xplore. Restrictions apply. OpenDocument Format

should still enable ad hoc use by nov- Similarly, a document format should • be suitable for office documents ice users, who conceptualize docu- facilitate easy global and local navi- containing text, spreadsheets, ments merely in WYSIWYG terms. gation — for instance, by encourag- charts, and graphical documents; A document format must deal ing the use of structured tables with • be compatible with XML v1.0 and with the complexity of modern PPAs, explicitly declared row and column W3C namespaces in XML v1.0 including a lengthy list of conven- headers. Defining the optimal char- specifications; tional features such as headers and acteristics of “accessible” documents • preserve the structure of the doc- footers, footnotes, tables of con- is a crucial and ongoing task. ument to allow re-editing (for ex- tents, revision tracking, and so on. Note also that not all document ample, footnotes must be stored It should define an explicit encoding consumers are human users. Increas- as structured footnotes, not just for such common features and allow ingly, we see that documents are read as text in the document that looks extensibility for more arcane fea- and even created by software auto- like a footnote); tures that only a single implementa- mation. From the simple text scanner • be friendly to transformations us- tion might use. for a search engine’s indexer to more ing the W3C’s Extensible Stylesheet Electronic documents exist in sophisticated information extraction Language (XSLT) or similar XML- an international context and need and analysis applications, the trend based languages or tools; to adapt to international linguistic, is for a document to be read many • keep the document’s content and cultural, and business customs, both more times, and by many more ap- layout information separate to en- current and historical. This encom- plications, than were ever involved able independent processing; and passes issues such as differing char- in creating it. So, a document format • “borrow” from similar, existing acter sets, bidirectional text, ruby must be platform independent as well standards wherever possible and (furigana) , varying ways as adaptable to disparate application permitted. of representing currency symbols, and machine types. Documents are dates, and numbers, varying rules not only for PC use. They are contributed its for sorting (collating) text, and so read and written on “headless” serv- specification of the XML format that on.3 Because these textual conven- ers without any user interface; edited OpenOffice.org uses because it was tions have evolved over centuries of on Web-browser-based word proces- close to meeting these requirements uncoordinated scribal practice, they sors; or even viewed and edited on already, and the TC used this as a don’t easily reduce to a single, clean, mobile phones and other small form- starting point to develop ODF. logical formulation. factor devices with constrained user OASIS published ODF 1.0 in A document format must consider interfaces. A document format must May 2005; the International Or- application and data security as well accommodate all the places and ways ganization for ­Standardization/ as issues such as document encryp- in which we use documents. International Electrotechnical Com- tion, digital signatures, and the rami- mission ratified it in May 2006 as ISO/ fications of executable code embedded A History of ODF IEC 26300:2006 (www.iso.org/iso/iso within documents via scripts or mac- The ODF standard was created and _catalogue/catalogue_tc/catalogue ros. For example, all executable code is maintained by the ODF Technical _detail.htm?csnumber=43485). in a document must be clearly de- Committee (TC) within the Organiza- Promotion has occurred via the clared as such to enable third-party tion for the Advancement of Struc- OpenDocument Format Alliance5 antivirus utilities to find and scan the tured Information Standards (OASIS). (not formally associated with OASIS code for threats. See the “ODF Resources” sidebar for or the OASIS ODF TC) and the OASIS Users of electronic documents links to more information. ODF Adoption TC. vary in their abilities. For instance, OASIS formed the ODF TC in OASIS published a minor re- users with visual impairments might late 2002; it includes representa- lease, ODF 1.1, in February 2007 to require “,” such tives from commercial and open address a few issues as screen readers, to read and edit source software publishers, govern- identified during an early ODF de- documents.4 To be compatible with ment, and ­academia. The TC’s charter ployment by the Commonwealth of such devices, a document format must aimed to create an open, XML-based Massachusetts. ensure that it can encode textual an- file format specification for office The OASIS ODF Interoperability notations for any content that would applications (http://lists.oasis-open. and Conformance TC was created otherwise be purely graphical. An org/archives/tc-announce/200211/ in late 2008 to facilitate interop- embedded image, for example, must msg00001.). The file format erability among ODF implementa- allow an associated text description. needed to tions, with plans to develop an ODF

March/April 2009 85

Authorized licensed use limited to: KnowledgeGate from IBM Market Insights. Downloaded on March 12, 2009 at 10:45 from IEEE Xplore. Restrictions apply. Standards

ODF Resources

or definitive information on OpenDocument Format (ODF), see the Web page archive, most documents are smaller F for the Organization for the Advancement of Structured Information Standards than they would be in an equivalent (OASIS) OpenDocument Format Technical Committee (ODF TC): www. proprietary binary format such as -open.org/committees/tc_home.php?wg_abbrev=office. Microsoft Word’s .doc format. You can download the current specification (ODF 1.1) here: http://docs.oasis Each ODF file contains four key -open.org/office/v1.1/OS/OpenDocument-v1.1.pdf. embedded XML files: The TC is co-chaired by Michael Brauer ([email protected]) and Rob Weir ([email protected]). • manifest.xml, which is the table The TC invites public comments on ODF via its public comment list: www.oasis of contents for the ODF container -open.org/committees/comments/index.php?wg_abbrev=office. and lists all contained files keyed The ODF Adoption TC maintains the OpenDocument..org Web site: http:// by MIME content type (for exam- .xml.org. ple, “text/xml” or “image/png”); Whitepapers and case studies related to ODF adoption are available at the ODF • meta.xml, which contains docu- Alliance’s Web site: www.odfalliance.org. ment-level , including J. David Eisenberg has written a book, OASIS OpenDocument Essentials, which he the bibliographic fields defined makes freely available online: http://books.evc-cit.info/. by the Dublin Core element set, The ODF Toolkit Union has open source libraries for manipulating ODF content such as creator, title, subject, here: http://odftoolkit.org. and language; • styles.xml, which contains the style definitions for the docu- ment, such as what text attributes conformance test suite and host standards and avoiding vendor lock- should be used for “header 1” style interoperability “plugfests,” among in is high. In fact, ODF’s existence versus what attributes should be other activities. and promotion has helped raise gen- used for “normal” text; and eral awareness about these topics. • content.xml, which contains the Application and Tool Support Several policy decisions at vari- document’s structured content, Most leading word processors sup- ous government levels have recom- including paragraphs, lists, ta- port ODF, either natively “out of mended or even mandated using ODF, bles, frames, and images. the box” or via freely downloadable including 16 national governments extensions or plug-ins. Commercial and eight provincial governments ODF’s attribute vocabulary is applications supporting ODF include (, , , based on that used in the W3C’s XLS traditional desktop word proces- , , , , Formatting Objects (XSL:FO) stan- sors such as , Corel , the , , dard (www.w3.org/TR/xsl11). So, a WordPerfect, and Lotus Symphony, and , to name a few). text style definition for text that’s in as well as Web-based word proces- 20-point, red, would be sors such as & Spread- Salient Technical Features defined as sheets and Zoho Writer. Open source An ODF document is typically stored implementations of ODF include and d ist r ibuted in a conta iner fi le con-

elements, simi- ODF has come along when awareness, es such as XML for increased verbos- lar to HTML. links use especially within public administra- ity in the data representation, once the W3C’s XLink standard (www. tions, about the importance of open ODF’s XML is compressed into a .zip w3.org/TR/xlink). So, a default style

86 www.computer.org/internet/ IEEE INTERNET COMPUTING

Authorized licensed use limited to: KnowledgeGate from IBM Market Insights. Downloaded on March 12, 2009 at 10:45 from IEEE Xplore. Restrictions apply. OpenDocument Format

paragraph with a hyperlink to a Web mit suggestions to the ODF comment www.huffingtonpost.com/russell-shaw/ page would look like this: mailing list at www.oasis-open. stupid-fema-tech-blunder-_b_7055.html. org/committees/comments/index. 3. Y. Savourel, J. Kosek, and R. Ishida, “Best More Although the initial versions of W3C, 2008; www.w3.org/TR/xml-i18n-bp. ­information on ODF can be the ODF standard have focused on 4. P. Korn and R. Schwerdtfeger, “Open Docu- found encoding the storage format for the ment Format v1.1 Accessibility Guidelines here. packaging format for bundling mul- -v1.0.. tiple XML files and associated media, 5. OpenDocument Format Alliance, “ODF More complicated constructs are text structure and formatting, vector Annual Report 2008,” 2008; www.odf possible, such as multilevel lists, graphics, and mathematical equa- alliance.org/resources/Annual-Report merged cells in tables, and vector tions — are also applicable to a wid- -ODF-2008.pdf. graphics. But the principles remain er range of application types, such the same: ODF is application- and as project , outlining, Rob Weir is a software architect for IBM. His vendor-neutral and uses existing mind-mapping software, or . research interests are ODF and personal standards where practical. productivity applications. Weir has a BA in astronomy and astrophysics from ODF Futures he conventional WYSIWYG Harvard College. He is cochair of the The OASIS ODF TC is currently com- T word processor could be nearing OASIS ODF TC, as well as a member of pleting work on its draft of ODF 1.2, the end of its useful lifetime. ODF the OASIS ODF Adoption TC, the OASIS which we hope will be ready for for- might evolve to take on greater so- ODF Interoperability and Conformance mal public review and approval as phistication in the area of semantic TC, and a representative in ISO/IEC JTC an OASIS standard in mid-2009. ODF encoding, with facilities to let au- 1 SC34 for the US. Weir maintains a blog 1.2 will focus mainly on thors capture, in a structured way, covering ODF and related topics at www. more of what they’re thinking. Hu- robweir.com/blog. Contact him at robert • the addition of an RDF/XML and man thought is far too rich and di- [email protected]. OWL-based metadata framework verse to be captured merely as bold, to allow metadata annotations italic, or underlined. An allowance of ODF content at a fine-grained for semantic layers could let authors level, which will facilitate appli- encode not just their assertions but cations such as semantic tagging, also their judgments, estimations real-time collaborative editing, of certainty and doubt, facts versus and document compositing from opinions, provenance, authority, and shared fragments; so on in a way that would better lend • the specification of a detailed itself to visualization, , and THEN expression language for spread- analysis. The challenge, which we sheet formulas, called OpenFor- eagerly anticipate, is to evolve ODF mula, which contains hundreds in a direction that embraces these of commonly used logical, math- (and other) possibilities. Learn about ematical, financial, and scientific computing history functions; and References and the people • additional enhancements to fur- 1. “Governments Increasingly Turn to who shaped it. ther increase accessibility. OpenDocument Format as ODF Alli- ance Marks Unprecedented 2008,” ODF COMPUTING In a parallel effort, as ODF 1.2 is Alliance, 2008; www.odfalliance.org/ completed, the ODF TC has created press/Release20081222-annual-report- an “ODF-Next” requirements sub- odf-2008.pdf. committee to collect, classify, and 2. R. Shaw, “Stupid FEMA Tech Blunder http://computingnow. prioritize feature proposals for sub- Will Thwart Some Katrina Claim Appli- computer.org/ct sequent ODF versions. You can sub- cations,” Huffington Post, 2 Sept. 2005;

March/April 2009 87

Authorized licensed use limited to: KnowledgeGate from IBM Market Insights. Downloaded on March 12, 2009 at 10:45 from IEEE Xplore. Restrictions apply.