Opendocument Format the Standard for Office Documents
Total Page:16
File Type:pdf, Size:1020Kb
Editor: Barry Leiba • [email protected] Standards OpenDocument Format The Standard for Office Documents Rob Weir • IBM OpenDocument Format (ODF) is an XML-based open standard file format for office documents, such as spreadsheets, text documents, and presentations. ODF is application-, platform-, and vendor-neutral and thereby facilitates broad interoperability of office documents. lthough personal productivity applica- this was often the pragmatic choice. An ad hoc tions (PPAs) and their associated file for- file format, designed specifically for a given ap- A mats have been around for many years, plication, will typically parse faster and con- they haven’t been the subject of standardization sume less memory than a general-purpose one. until very recently. We can better understand In fact, to optimize performance, some early for- the current importance and relevance of stan- mats were little more than a direct serialization dardization in this area if we first consider the of their application’s internal data structures. technological, historical, and economic context Public availability of documentation for these of electronic documents. early file formats varied. Some vendors made After more than a decade of evolution, PPA their formats available on demand or published functionality has converged on a set of capa- them in book form. Others never released their bilities that are conventionally expressed via formats. Still others released them under re- three application types: a word processor, a strictive licenses that limited use by competi- spreadsheet, and a presentation graphics ap- tors. Interoperability among word processors, plication. Although Microsoft, Corel, Sun, IBM, where it existed at all, was hard-won and often and Google all offer PPAs, as do open source accomplished by reverse engineering and trial vendors, the difference in functionality among and error. these offerings is relatively minor compared to And so it was, through much of the 1980s and the difference in functionality among the three 1990s. Although the formats weren’t interoper- application types. In other words, the difference able or specified in open standards, this was tol- between Microsoft Word and Corel WordPerfect erable because office documents were primarily is minor compared to the difference between exchanged as printed copies. Where exchanges Microsoft Word and Microsoft Excel. Given this occurred electronically, they were most often convergence in functionality within each ap- between known parties, such as workers in the plication type, we can now describe their core same office, using the same software. Interop- commonalities in an open, standardized docu- erability in such cases was trivially obtained ment format. because document authors could make strong, Historically, we weren’t so fortunate. When valid assumptions about the receiving party’s we examine the history of PPAs, we see that the software and platform. In other words, if every- general practice was for each vendor to define a one you correspond with uses the same version file format specifically and exclusively for use of the same word processor running on the same with its own particular product. So, WordPerfect version of the same operating system configured had its own format, as did WordPro, Word, Word- the same way with the same fonts installed, Star, XyWrite, Writing Assistant, StarOffice, and then, in theory, electronic document exchanges so on. From a technical perspective, in the days should be straightforward. But reality is rarely when RAM and processing cycles were precious, so cooperative. MARCH/APRIL 2009 1089-7801/09/$25.00 © 2009 IEEE Published by the IEEE Computer Society 83 Authorized licensed use limited to: KnowledgeGate from IBM Market Insights. Downloaded on March 12, 2009 at 10:45 from IEEE Xplore. Restrictions apply. Standards From the Department Editor or many years, when you used a word-processing application, your files were ing to defined archival policies. F stored in a format that was fully understood only by the application you used. Increasingly, governments commu- WordPerfect stored WordPerfect files, Microsoft Word stored Word files, and nicate with citizens electronically, so on — and, while developers often included ad hoc support for their competi- both in publishing government doc- tors’ formats, that was a hit-and-miss thing, vulnerable to changes in proprietary uments and in receiving documents formats. The same went for presentation slides and spreadsheets. We all have ex- from the public. When a government perience with the results of this, with the difficulties in exchanging files between adopts, by deliberate policy or iner- different applications. tial passivity, a vendor’s proprietary ODF — OpenDocument Format — addresses this problem by providing a stan- document format — which then pro- dard format for storage and exchange of office documents. It works on Windows, motes the exclusive use of that same MacOS, Linux, and other platforms, and it works for any application whose vendor vendor’s products — this requirement chooses to implement it. In this issue’s “Standards” department, IBM’s Rob Weir, who then trickles down to all those who worked on the ODF standard, will take us through some of its history and details. wish to deal with that government. — Barry Leiba This unintended consequence thus becomes an impediment to those in society least able to afford the pro- prietary software. For example, in Why ODF Now? structures of a single vendor’s appli- the aftermath of Hurricane Katrina, In recent years, our complacency cations — then documents created us- the US Federal Emergency Manage- with a lack of office-document stan- ing those applications will only work, ment Agency (FEMA) set up a Web dards has been shattered. Document or work well, with those applications. site for disaster victims to apply for formats, which historically were un- As the user continues to create and federal assistance. However, the Web seen and unspoken of, are now hotly accumulate documents in that pro- site was designed to work only with debated. Governments throughout prietary format, his or her ability to Microsoft Internet Explorer running the world now mandate or recom- evaluate and migrate to alternative on Microsoft Windows, rather than mend the use of ODF.1 Corporations PPAs in the market diminishes due strictly following the W3C’s HTML are investing significant time and to substantial switching costs. The standard, which would have worked effort to further ODF adoption or, in presence of high switching costs is everywhere. So, hurricane victims some cases, prevent it. Why the in- the essence of vendor lock-in and is attempting to connect from a friend’s creased emphasis on document for- a significant challenge from the per- Macintosh or the library’s Linux ma- mat standards? spective of the user, who now faces chine were unable to apply for aid Several factors are at play here. effectively diminished options; the online.2 Dependency on a single First, when the World Wide Web be- potential competitor, who finds it vendor’s proprietary formats is bad came a mainstream success, new docu- more difficult to enter a market dom- economics as well as bad policy. ment distribution patterns emerged. inated by an incumbent with an es- Authors could now easily post docu- tablished lock-in; and public policy, Document Format ments online for anyone to download, which has a long-standing practice of Technical Requirements but they could no longer assume that encouraging competition. One fun- A standard office format must sup- the receiving party was using the damental way to reduce the strength port the gamut of current uses. Of- same software stack. The receiver’s of vendor lock-in and encourage sub- fice documents, as used today, range capabilities were unknown and, in stitutability of equivalent goods is by from unstructured, free-form docu- principle, unknowable. In such an adopting and using open standards ments, to semi-structured memos, to environment, we can achieve in- that encourage interoperability. strongly structured forms. PPA us- teroperability only through deliber- The debate these issues generate ers range in awareness from novices ate engineering efforts, not by merely has been amplified by its predomi- with little or no conception of formal assuming a universe of homogeneous nate impact on public administra- document structure and the ben- systems. The primary way to do this is tions, in which the document-based efits of style and content separation by creating and adopting standards. bureaucracy, along with being pe- to highly sophisticated users creat- Second is the economic impact of rennially budget-challenged, is con- ing highly structured documents as “vendor lock-in” and the deleterious strained by public accountability, the part of a larger document-processing effects that ensue. If a document for- need to provide equal access to citi- pipeline. A document format must mat is designed traditionally — that zens, and the often legally mandated facilitate and encourage modern best is, to closely reflect the internal data duty to preserve documents accord- practices in document design, but it 84 www.computer.org/internet/ IEEE INTERNET COMPUTING Authorized licensed use limited to: KnowledgeGate from IBM Market Insights. Downloaded on March 12, 2009 at 10:45 from IEEE Xplore. Restrictions apply. OpenDocument Format should still enable ad hoc use by nov- Similarly, a document format should • be suitable for office documents ice users, who conceptualize docu- facilitate easy global and local navi- containing text, spreadsheets, ments merely in WYSIWYG terms. gation — for instance, by encourag- charts, and graphical documents; A document format must deal ing the use of structured tables with • be compatible with XML v1.0 and with the complexity of modern PPAs, explicitly declared row and column W3C namespaces in XML v1.0 including a lengthy list of conven- headers. Defining the optimal char- specifications; tional features such as headers and acteristics of “accessible” documents • preserve the structure of the doc- footers, footnotes, tables of con- is a crucial and ongoing task.