Technology & Standards Watch

XML-based Office Document Standards by Walter Ditch

Version 1.0
First published: August 2007
Publisher: JISC: Bristol, UK
Copyright owner: Higher Education Funding Council for England

To make sure you are reading the latest version of this report, you should always download it from the original source.
Original source: http://www.jisc.ac.uk/techwatch

© HEFCE 2007 JISC Technology and Standards Watch, Aug. 2007 XML-based Office Documents

Executive Summary

Historically, standardisation of the office document formats we use in our everyday working environment has been achieved through the widespread adoption of products from a very small number of suppliers. Initially this was helpful as it meant that a kind of de facto interoperability was achieved, but it has also created a form of vendor lock-in, which requires users to have purchased a particular brand of software product in order to be able to undertake everyday office tasks. This use of de facto, proprietary standards has become increasingly unacceptable, especially within the public sector, where information has to be provided to members of the public without requiring them to have bought software from a particular vendor. Policy moves from within the EU and elsewhere are driving the use of open standards to encourage open and inclusive document exchange.

Current trends in office document file formats show a strong move towards open, standards-based XML formats and away from closed solutions, and major government and corporate software contracts increasingly demand compatibility with open standards (many of which are based on the ubiquitous XML). Competing software vendors have therefore, understandably, been keen to have their own preferred office file formats endorsed as open standards. Recent developments related to standards approvals have at times shown something of an undignified rush to the standards 'finish line', with interested parties promoting acceptance of their own solutions, while being directly or indirectly hostile to competing proposals.

Developments related to modifiable office document file formats are at a crucial stage. The ISO/IEC 26300:2006 OpenDocument Format for Office Applications (ODF) is being challenged by Ecma-376: Office Open XML (OOXML). At the present time, the OOXML format is progressing through the ISO/IEC's six-month fast track approval process and, if approved, would result in the existence of two ISO standards, a matter that has caused considerable controversy.

This report discusses the above developments and the issues raised, provides a brief comparison of the main technical advantages and disadvantages of ODF and OOXML, and analyses the possible outcomes of the standards approval process and their significance to education. The report also includes mention of Adobe's Portable Document Format (PDF) which, although not an XML-based office format, is the most widely used format for documents that are uploaded to the Web. This makes it an important feature of the office document landscape, especially where the electronic provision of non-revisable documents to the general public is concerned.

The report proposes that although the UK higher education sector has, for a long time, understood the interoperability benefits of open standards, it has been slow to translate this into easily understandable guidelines for implementation at the level of everyday applications such as office document formats. As far as higher education is concerned, the use of office document formats has now reached a watershed. There is an urgent need for co-ordinated, strategically informed action over the next five years, if the higher education community is to facilitate a cost-effective approach to the switch to XML-based office document formats.


Table of Contents

1. Introduction
1.1 Office applications and binary formats
1.2 A short history of document file formats
1.3 What are standards?
2. Towards open standards for office documents
2.1 Government moves towards interoperability
2.2 Education sector developments
2.3 Defining open standards
2.4 Vendor-led moves towards open standards
2.5 Implications
3. Comparing ODF and OOXML
3.1 Technical analysis: ODF
3.2 Technical analysis: OOXML
3.3 Format conversion and associated problems
3.4 Legal issues
4. Future developments
4.1 Trends in the market for office documentation software
4.2 Online office documentation services
4.3 Living in a two-format world
4.4 Semantic Web
5. Implications for education
5.1 Fidelity and backwards compatibility
5.2 Opportunities
6. Conclusion and recommendations
About the Author
Appendix A: What are standards?
Appendix B: Numbers of office documents published on the Web
References


1. Introduction

1.1 Office applications and binary formats

We are all familiar with the day-to-day office applications that sit on our desktops. They allow us to read, create and edit a range of different types of content (words, drawings, etc.) and store them on our hard drives as different types of office document (for example, a word processed document, a spreadsheet of figures or a presentation). These software packages can be categorised in two ways: those that allow the creation and editing of content and those that simply allow the display or printing of content. Both these categories of software manipulate content that is stored as a file on the user's hard disc or network storage, separate from the actual software package that uses it. The format of this file has now become a high profile issue.

1.2 A short history of document file formats

1.2.1 Binary files

In the early days of personal computing there were many word processing and other office-related applications available. These applications usually made use of binary format files, i.e. the human readable content was encoded into a machine-readable representation of the data, in binary form (Goldfarb and Prescod, 1998). The exact details of the representation or encoding were often a proprietary standard and undocumented, and thus difficult for software from other vendors to read or process. This meant that content became deeply coupled with the software that was used to create and handle it. The problem with this was that, because there were so many different software packages, which were invariably unable to read another vendor's format, users found it very difficult to exchange documents with each other [1].

Eventually, as the market matured in the 1980s, a relatively small number of such proprietary file formats, such as those generated by WordPerfect or Lotus 1-2-3, and, later, Microsoft's .doc, .xls, and .ppt file types (or, for read only access at least, Adobe's .pdf file type), came to dominate. This meant that a kind of interoperability was achieved through market consolidation. This is an example of de facto standardisation: in order to be able to read and edit the files sent from other people, one needs to 'join the club' and invest in the same software. This is a form of what economists refer to as a Network Effect [2].

1.2.2 Towards XML

Since the 1960s computer scientists have worried about the lack of interoperability and exchangeability of documents between different software applications and there has been an ongoing move towards developing a common document format. Debates about commonality also took place in parallel to discussions about abstraction: the ability to abstract the meaning of information in a document and separate this from its rendition (i.e. presentation) (Goldfarb and Prescod, 1998). These discussions led to the development, in the late 1970s and early 1980s, of the Standard Generalized Markup Language (SGML). Later, as part of its work in the 1990s, the W3C developed a subset of SGML that would retain SGML's major virtues but also "embrace the Web ethic of minimalist simplicity" (Goldfarb and Prescod, 1998, p. 17). This new language was Extensible Markup Language or XML.
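The contrast drawn in sections 1.2.1 and 1.2.2 can be illustrated with a short Python sketch. It is not taken from any real office format: the memo content, the binary record layout and the element names (memo, title, para) are all assumptions invented for illustration. The same content is first packed into a binary record whose layout a reader would have to be told, then stored as XML, from which two different renditions are produced.

```python
import struct
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content: a memo with a title and two paragraphs.
TITLE = "Budget memo"
PARAGRAPHS = ["Costs rose in 2006.", "A review is planned."]


def to_binary(title, paragraphs):
    """Pack the memo into an invented binary layout.

    Layout (an assumption, not a real format): a 2-byte count of strings,
    then each string as a 2-byte length followed by its UTF-8 bytes.
    Nothing in the bytes says which string is the title.
    """
    strings = [title] + paragraphs
    out = struct.pack("<H", len(strings))
    for s in strings:
        data = s.encode("utf-8")
        out += struct.pack("<H", len(data)) + data
    return out


def to_xml(title, paragraphs):
    """Store the same memo as XML text with self-describing element names."""
    memo = ET.Element("memo")
    ET.SubElement(memo, "title").text = title
    for p in paragraphs:
        ET.SubElement(memo, "para").text = p
    return ET.tostring(memo, encoding="unicode")


def render_plain(xml_text):
    """One rendition: plain text with an upper-case heading."""
    memo = ET.fromstring(xml_text)
    lines = [memo.findtext("title").upper()]
    lines += [p.text for p in memo.findall("para")]
    return "\n".join(lines)


def render_html(xml_text):
    """A second rendition of the same content: a minimal HTML fragment."""
    memo = ET.fromstring(xml_text)
    parts = ["<h1>%s</h1>" % escape(memo.findtext("title"))]
    parts += ["<p>%s</p>" % escape(p.text) for p in memo.findall("para")]
    return "".join(parts)


binary_record = to_binary(TITLE, PARAGRAPHS)
xml_record = to_xml(TITLE, PARAGRAPHS)
```

Printing xml_record shows markup that a person can read in any text editor, whereas binary_record only makes sense to software that already knows the invented layout. The two render functions consume the same stored content, which is the separation of meaning from rendition described above.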

1 Such difficulties with formats for storing information were not new. Punched cards were produced in competing formats by IBM and UNIVAC (an 80 column and a 90 column version) until into the 1960s (see: http://www.cs.uiowa.edu/~jones/cards/history.html)

2 For more information on the Network Effect see (Anderson (P), 2007)


Although, formally, XML is a W3C Recommendation for creating markup languages, for the purpose of this discussion we can simply state that XML is a standard format that can be used to store and organise information. The information in an XML file is in plain text format and thus can be opened by a simple text editor and read by a human. This means that content held in an XML file can be abstracted from its mode of representation and be used across a huge variety of applications.

The benefits of the new markup language were widely seen and it has been taken up by a large variety of different information management and software communities. XML has developed to become an essential tool, a kind of lingua franca, for the interchange of data between software, computer systems, documents, databases etc. and as a format for document storage. It is generally accepted that documents stored in XML and plain text files (rather than binary) will be readable and processable long into the future.

This flexibility and potential for interoperability has been of considerable interest to a variety of users, and in particular has significantly affected public sector policy in relation to office document formats, as will be seen in the next section. This has resulted in a situation where the use of binary file formats, particularly where these require the use of proprietary software, is no longer seen as acceptable. Such use of a proprietary, de facto standard is increasingly being seen as a form of vendor lock-in, which reduces consumer choice and increases cost.

1.3 What are standards?

In general, 'standards' are designed to provide a kind of blueprint for someone who wants to build something. They can bring together current best practice, so that the thing that is built is safe, for example, or can be used to provide conformity, when we need to ensure interoperability between different components. It is important to remember that the standard is not the thing that is built: there is a distinction that needs to be made between 'a standard which may be implemented' and 'something that is an implementation of a standard' (Sutor, 2006, p.4). For more information on standards see Appendix A.


2. Towards open standards for office documents

The tacit acceptance of proprietary office file formats as a way of achieving interoperability is becoming less tenable. Government agencies, in particular, are becoming increasingly conscious of the need to provide easy access to electronic documents to all stakeholders, while not requiring them to purchase a particular software product in order to view or edit these documents. The requirement to provide long term availability and archiving of documents is also encouraging a move away from proprietary file formats, particularly where future access might only be via a single supplier's software products.

The HE/FE sector depends heavily on the effective transfer of information, currently making use of a range of standards, including HTML (data-driven Web sites for example), TCP/IP (internal and external network transport protocols), together with a number of (mostly proprietary) office document file types. Major educational application areas include:

• information related to teaching and learning (e-learning content documents and presentations for example, either using native HTML or a combination of HTML and office file formats)

• course and learner information, often being displayed in HTML form, with information coming from a back-end database

• general administration (memos, reports, spreadsheets, etc., mostly using office file formats)

• provision of publicly available electronic information, including institution and course details, plus documents published under freedom of information legislation, typically using a mixture of HTML, office file formats and Adobe PDF files for read-only content.

2.1 Government moves towards interoperability

Government moves towards open standards, particularly related to office document formats, are taking place at a number of levels, including:

• European Union, United Nations and World Trade Organisation

• National Governments

• Regional Governments

These developments are significant to HE/FE in that they set the policy framework, at national and international level, in which education operates. Government efforts to develop vendor-neutral and user-centric policies may also provide a framework that influences the development of policy in the education sector. Major developments in these areas are outlined in the following sections.

2.1.1 European Union and United Nations

The EU has had an interest in the use of open standards to facilitate electronic transfer of information for more than 20 years, as part of its efforts to encourage open and inclusive document exchange within the European Union. A notable early effort in this area was the EU-sponsored development of the Open Document Architecture (ODA) standard, which was intended to facilitate the transfer of documents between applications. Despite being approved as an Ecma standard in 1985 and being finalised as an ISO standard (ISO/IEC 8613) in the mid 1990s, ODA failed to become widely used due to a number of factors, in particular, excessive delays in standards development and lack of vendor support (Mahler, 2006).

In a more recent initiative, the European Union commissioned an investigation into developments related to existing open document formats and associated market trends, which was undertaken by the Valoris consulting group in 2003. The resulting comprehensive report identified a series of criteria by which competing office document formats could be judged, including:

• use of open standards

• being non-binary (i.e. XML-based)

• capable of being modified

• preserving format fidelity (see section 3.3.1)

• offering cross-platform interoperability

• supporting current features

• supporting future word processing features

• being widely adopted

The EU's acceptance of the broad findings of the Valoris report was followed by the issue of the European Commission's Telematics between Administrations Committee (TAC) conclusions and recommendations (2004) related to open document formats. These forward looking findings identified the importance of open standards as a key enabler of interoperability between governments, citizens and other stakeholders, and made a number of key statements and recommendations, some of which are summarised below:

• The key role of government in encouraging 'non discriminatory' and 'cross platform' access to electronic information was recognised. This included avoiding placing any requirement on end users to purchase or use specific software, while striving to encourage innovation and competition in the office software marketplace.

• It was recognised that not all public documents needed to be published in editable form, and that interoperability issues would be fewer for read only documents (an indirect reference to Adobe's PDF).

• Given that interoperability issues between modifiable document formats were anticipated, public sector organisations were encouraged to publish documents using multiple formats. Where documents were published in a single editable form, it was recommended that this should use a format around which there is industry consensus, as demonstrated by its approval as an international standard.

There have also been parallel initiatives by the United Nations:

“All Member States and other stakeholders should have the right to access public information made available in electronic format by the organizations and no one should be obliged to acquire a particular type of software in order to exercise such a right. Organizations should seek to foster the interoperability of their diverse ICT systems by requiring the use of open standards and open file formats irrespective of their choice of software. They should also ensure that the encoding of data guarantees the permanence of electronic public records and is not tied to a particular software provider.”
UN Joint Inspection Unit, 2005 (extracted from Recommendation 1)


2.1.2 National and regional government trends

National governments are moving steadily towards the adoption of open standards as a means of ensuring interoperability, although the rate of adoption varies from country to country. In the UK, government interoperability requirements are defined by the e-Government Interoperability Framework (e-GIF), with the e-GIF Technical Standards Catalogue (2005) defining precise interoperability requirements between applications. Section 3 states that “XML specifications for office applications” is an area under consideration for inclusion in a future version of the catalogue. Thus, although the UK Government has published general interoperability guidelines, based on open standards, there is not yet any specific guidance related to office document types.

In the USA, the Commonwealth of Massachusetts has become an extremely important player in the policy context within which XML-based file formats have been operating. In 2005, the state's Chief Information Officer issued a recommendation that public documents, published within the state, should utilise open, XML-based formats. Massachusetts' decision was in the vanguard of moves to open public documents and so far it is the only US state to have mandated such a move, although there have been attempts in other state legislatures (Fontana, 2007a).

There have also been a number of notable developments in other countries relating to the adoption of standards for office documentation. Key European examples include: Norway, which is currently considering mandating the use of ODF for public and government documentation (Kirk, 2007); Belgium, where all government departments must be able to read and exchange ODF-based documents by September 2007 (Orlowski, 2006); and the Spanish region of Extremadura, which announced at the end of July that it would also be adopting ODF as the official format for regional document exchange (Kaplan, 2007). Denmark is also noteworthy, having announced that government agencies will be required to test both OOXML and ODF during a one-year period beginning in early 2008. All new products bought by departments must support at least one of the standards (Ministry of Science, 2007).

Further afield, Japan recently announced that it would give procurement preference to products which make use of open standards (ODF Alliance, 2007). China, meanwhile, has taken a different approach with the production of an alternative revisable office document standard, called Uniform Office Format (UOF). UOF seems to have been developed to cater for the specific needs of Chinese users, including specific language requirements, plus the requirement for cost effective software licensing.

2.2 Education sector developments

The Higher and Further education communities in the UK have developed a culture which is strongly supportive of open standards, and this is reflected in the development and support activities of JISC and its associated services and projects (Kelly et al., 2006). Policy, strategy, procurement, guidance and funding criteria have developed over the years to promote the use of open standards throughout teaching, learning, research and university administration as part of a three stranded approach: open content, open standards, and open source (Kelly et al., 2007). Within the UK's higher and further education community this work has been largely driven by JISC, UKOLN, CETIS and OSS-Watch.

JISC has a long-standing policy of encouraging the development and adoption of open standards within the higher and further education community. The JISC 2007-2009 strategy document (JISC, 2007) discusses new technology approaches and outlines three principles in respect of its funded projects. The second of these is a commitment to open standards which "support interoperability between systems whether commercial or open source and where available and broadly adopted, allow institutions to mix and match products of either type and to replace products without high switch costs" (p. 27). JISC (2005) also maintains a policy on the related issue of the use of open source software which states, in item 2, part 2: "Documentation, graphics, sound, data and other files must, wherever possible and practicable, use open standards".

All of this work aligns, in general, with the developing public sector policy context with regard to office document standards. As well as government moves towards open standards, educational policy relating to both open standards and office document formats is likely to be informed by work taking place within formal standards organisations (e.g. ISO) and vendor organisations and consortia (see section 1.4).

2.3 Defining open standards

Rather surprisingly, given these trends, there is no single, universally accepted definition of the term 'open standard', with a number of overlapping (but variable) definitions being provided by international organisations such as the European Union [3], International Telecommunications Union [4], and by individual national governments [5]. The European Union's Valoris Report gives the following, deliberately minimalist, definition of an open standard:

“The minimum requirements for an open standard are that the document format is completely described in publicly accessible documents, that this description may be distributed freely and that the document format may be implemented in programs without restrictions, royalty-free, and with no legal bindings.”
Valoris, 2003, p. 20

Even such an apparently simple statement may be the subject of much debate. Not all standards are 'publicly accessible', for example, with some standards development organisations (SDOs) charging a fee for the provision of standards-related documentation.
The ability to freely implement a standard in a software program is a further difficult area, with some vendor-developed technologies being the subject of licensing agreements, or even protected by software patents. Patents, and particularly their implementation within software that uses open standards, have proved controversial in this area. Some legal jurisdictions are more amenable to software patents than others (Wilson, 2005) but there is widespread concern at the proliferation of such patents (Rutledge, 2001). Indeed, some open standards have been developed specifically in order to avoid restrictions associated with existing patents (the PNG graphics standard, for example, was partly a response to patented technology within the GIF format). Even higher education has been slow to create a formal definition, preferring, instead, to identify a series of characteristics (Kelly et al., 2007) such as:

• The development of open standards is the responsibility of a trusted neutral organisation

3 The European Commission's definition of an open standard may be found in the European Interoperability Framework for Pan European eGovernment Services (p.9), which is available online from http://ec.europa.eu/idabc/servlets/Doc?id=19528 [accessed 12/06/07].

4 A non-exhaustive list of features of open standards has been produced by an ITU working group, and is available online at http://www.itu.int/ITU-T/othergroups/ipr-adhoc/openstandards. [accessed 12/06/07].

5 The Danish Government's National IT and Telecom Agency has produced a document which discusses the characteristics of various types of standard, and their consequent degrees of 'openness'. It is available online from http://www.oio.dk/files/040622_Definition_of_open_standards.pdf [accessed 12/06/07].


• The responsibility for the ongoing maintenance and development of the standard is taken by a trusted neutral organisation

• Involvement in the development of the standard is open to all

• There is no discriminatory barrier to use of the standard

• Access to the standard is available to all, without any financial barrier

Inherent in these characteristics is knowledge and understanding of the role of standards in product development, and the processes involved in creating them.

2.3.1 Standards Development Organisations (SDO)

There are different types of SDOs, playing different roles within the overall standards creation and approval process (see Appendix A). Dargan (2005) distinguishes between formal standards bodies (FSB) and vendor organisations and consortia (VOC). In addition, not all SDOs enjoy equal stature. The rigour of the standards development process, the degree of independence, the 'openness' of resulting standards, and the extent to which an SDO's products are subsequently adopted, are all ways of understanding the relative value that is accrued by vendor or consortia 'standards', especially where that specification or standard lays claims to openness.

Some definitions of an 'open' specification may require that the process be open to all interested parties, consensus driven, and not overly dominated by a single developer (although the Valoris definition makes no reference to this). This latter requirement, in particular, may restrict the number of standards which can be considered truly open, especially from the UK higher education perspective. In terms of this report, we will use the term 'standard' to indicate approval by a formal standards body such as the BSI or ISO and "" standard to indicate approval by a vendor organisation or consortium. The issue of openness will be an ongoing part of the discussion in this report.

Formal Standards Development Bodies (FSB)

The foremost of the formal standards bodies is the International Organisation for Standardisation, or ISO [6]. Although officially a non-governmental organization (NGO) in the sense that its members are not delegations of national governments, in fact, membership is a mixture of national partnerships of industry associations, and institutions that are part of the governmental structure of their countries, or are mandated by their government. This means that it actually has more power than a traditional NGO as the standards it sets may become mandated by national governments. ISO is set up as a network of 157 different countries, with one member per country, and describes itself as "a bridging organization in which a consensus can be reached on solutions that meet both the requirements of business and the broader needs of society, such as the needs of stakeholder groups like consumers and users" (ISO, 2007).

6 The discrepancy, in English, between 'International Organization for Standardization' and the shorthand 'ISO' is explained on the ISO website as being due to the fact that International Organization for Standardization "would have different abbreviations in different languages (IOS in English, OIN in French), [and so] it was decided to use a language-independent word derived from the Greek, isos, meaning 'equal'. Therefore, the short form of the Organization's name is always ISO" (ISO website: http://www.iso.org/iso/en/networking/pr/isoname/isoname.html [last accessed 10th July 2007]).

Vendor Organisations and Consortia

There are very many VOCs in existence, and a full list is beyond the scope of this report. As far as UK higher education is concerned, JISC participates in a number of consortia, mainly through CETIS, OSS-Watch and UKOLN. A recent JISC review of standards bodies found six consortia in which JISC is active and which have a high level of alignment with JISC objectives (JISC, 2006):

• Dublin Core Metadata Initiative (DCMI): Initial involvement with the DCMI was through UKOLN's participation in the JISC-funded MODELS project. JISC is now a DCMI affiliate and is active in various DCMI working groups.

• Open Archives Initiative (OAI): concerned with the efficient dissemination of content, open access, e-prints. The OAI is highly relevant to work in the JISC Information Environment.

• Organisation for the Advancement of Structured Information Standards (OASIS): valuable for its work on domain specific Web services and related specifications. OASIS works on a diverse set of technologies in areas such as service discovery, business process automation, security, document merger, but also conformance and dissemination programmes.

• Open Group: has a focus on interoperability between enterprise systems. Aligns with JISC’s support for interoperability of educational systems.

• World Wide Web Consortium (W3C): for Web data formats, XML standards, metadata standards etc. JISC is represented on the W3C through UKOLN and has provided experts to sit on various working groups.

2.4 Vendor-led moves towards open standards

As both the pressure from users and changes to public policy have intensified, the opportunity to make use of the ubiquity of emerging XML technology has led vendors to begin moves away from proprietary, binary formats to more open methods based on document markup. The Valoris report (2003) identified two primary XML-based office file formats for editable documents: OpenOffice.org XML format (which was available either through Sun Microsystems' commercial StarOffice product, or via the free and open source OpenOffice.org project); and Microsoft's (at the time still proprietary) Office 2003 XML reference schema file formats, which were available as an option with their 2003 enterprise edition of Office. Modified forms of these two XML-based formats became, in later years, ODF and OOXML respectively (see later in this section).

The later TAC report (2004) went on to suggest that Microsoft should be encouraged to move away from binary file types, and to consider submitting their newly developed XML formats (Office 2003 reference schemas) to a recognised SDO of their choice. This is an important development. As the move away from binary towards XML-based formats reaches fruition there is growing pressure to firm the formats up through a standardisation process involving formal standards bodies. This consolidates the move away from a past of proprietary and de facto standards and encourages software producers to 'open up' their formerly closed file formats.

It is also important to note that Valoris emphasised editable office document formats, and so did not focus on Adobe's PDF format, which was considered primarily a read-only format (see section 1.4.3). However, PDF is overwhelmingly the preferred format for documents that can be downloaded from the Web, and no discussion of file formats would be complete without taking this into consideration (Ditch, 2007, summarised in appendix B).

2.4.1 Sun Microsystems: OpenDocument Format (ODF)

In August 1999, Sun Microsystems purchased a relatively small German software company called StarDivision, and, in the process, acquired the company's StarOffice office application suite. This acquisition allowed Sun to enter the office application market, an area dominated by Microsoft. The StarOffice package, and subsequently OpenOffice.org, made use of XML for its file format, rather than a binary solution (OASIS, 2006a). After initially allowing their StarOffice software to be used free of charge for personal or educational use, a dedicated zero-cost version was subsequently produced. The release of OpenOffice.org version 1.0 as a free, open source office application (managed by a project with the same name and website address as the application) in May 2002 allowed a relatively rapid entry into the office software market, with users able to download the application without restriction. The StarOffice application remained available as a commercially supported application (downloadable for approximately US$70), which was intended to appeal to corporate users.

The XML format used by OpenOffice.org v1.0 was taken up by the OASIS Open Office Technical Committee (TC) in late 2002 for development into an open standard. This subsequently developed into the OASIS Open Document Format for Office Applications (OpenDocument) TC which worked on what became known as the OpenDocument Format (ODF) (Tenhumberg et al., 2006). Following a period of development led by Sun Microsystems, industry and other partners (current membership of the technical committee is available from the OASIS website [7]), ODF was approved as an OASIS standard in May 2005. In September 2005, OASIS submitted it for ISO/IEC 'fast-track' approval and in May 2006, ODF was approved as the internationally recognised office document standard, ISO/IEC 26300:2006 Open Document Format for Office Applications.
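To make the idea of an XML-based, non-binary file format more concrete, the following Python sketch builds and then reads a stripped-down ODF-style text document. It relies on two details of the published ODF specification that this report has not described at this point: an ODF file is normally a ZIP package, and the document body is held in a member called content.xml. The package is deliberately minimal (a real one also carries a manifest, styles and metadata, so an office application may refuse to open it), and the helper names build_package and extract_paragraphs are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs and media type as defined by the ODF specification.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_package(paragraphs):
    """Return the bytes of a minimal ODF-style package (illustrative only)."""
    ET.register_namespace("office", OFFICE_NS)
    ET.register_namespace("text", TEXT_NS)
    root = ET.Element("{%s}document-content" % OFFICE_NS)
    body = ET.SubElement(root, "{%s}body" % OFFICE_NS)
    text = ET.SubElement(body, "{%s}text" % OFFICE_NS)
    for p in paragraphs:
        ET.SubElement(text, "{%s}p" % TEXT_NS).text = p

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype member comes first and is stored uncompressed.
        zf.writestr("mimetype", ODT_MIMETYPE,
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("content.xml", ET.tostring(root, encoding="utf-8"),
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def extract_paragraphs(package_bytes):
    """Recover paragraph text using only generic ZIP and XML tools."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text for p in root.iter("{%s}p" % TEXT_NS)]


package = build_package(["First paragraph.", "Second paragraph."])
print(extract_paragraphs(package))
```

Because the content is ordinary XML inside a standard ZIP container, the text can be recovered without the application that created it, which is the practical meaning of moving away from undocumented binary formats.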
Following ISO approval, further development work has been undertaken by the OASIS technical group, particularly in the areas of accessibility, spreadsheet formulas, and metadata, with separate subcommittees working on each topic. Details of all versions of the ODF specification, together with subgroup activities, are available from the ODF Technical Committee home page8. Although the ODF specification is long by normal standards, at more than 700 pages, the reuse of existing open standards, or portions of such standards, considerably reduces its complexity.

The ODF file format is now used natively by an increasing number of office applications, although not directly by the market-leading Microsoft Office suite (for which two third party plug-ins are available)9. OpenOffice.org is perhaps the most widely known ODF-based office application. It is claimed to have received around 80 million downloads to date, although the difficulties of equating downloads with actual ongoing usage are acknowledged (OpenOffice, 2007b). The OpenOffice.org statistics page, which quotes sources including IDC and Gartner, tentatively suggests usage levels may be around 10% of the office market.

2.4.2 Microsoft: Office Open XML (OOXML) format

Microsoft Office has been the dominant office productivity suite for some time, with one recent estimate of its market share at 95%, with a customer base of 400 million users (Business Week, 2006). Back in 2003, general industry trends towards the standardisation of application-specific

7 OASIS ODF technical committee membership and voting rights may be viewed from http://www.oasis-open.org/committees/membership.php?wg_abbrev=office [accessed 13/04/07]

8 http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office [accessed 13/04/07].

9 http://sourceforge.net/projects/odf-converter and http://www.sun.com/software/star/staroffice/index.jsp

formats were becoming evident, with these being increasingly XML-based. Although Microsoft enjoyed a large share of the office market, its competitors, including Sun Microsystems and Adobe, were quick to identify these trends (perhaps quicker than Microsoft), and to begin standards developments of their own. In response to discussions with the Danish Government over the use of XML (LaMonica, 2003), Microsoft published its XML Reference Schemas10, which documented the XML-based file formats that could be used as options within Office 2003 (Cover, 2003). Although not put forward for standardisation at this time, the publication of information regarding file formats, together with licensing of their use, was generally welcomed.

In November 2005, Microsoft, together with a number of industry partners and supporters, took steps down the standards road, and in December 2005 began working with Ecma International11 to produce an open specification for its own Office file formats. This was approved as Ecma-376: Office Open XML File Formats (OOXML)12 in December 2006 (ECMA, 2006a; ECMA, 2006b). On the same day, Ecma agreed to fast-track OOXML to ISO/IEC for consideration as a draft international standard (referred to by ISO as ISO/IEC DIS 29500) (ECMA, 2006b). Ecma's fast-track approval process has got off to a troubled start, with the initial one-month consultation process receiving an unusually high number of responses from national bodies (20 in total), the majority of which were negative (Aslett, 2007a). A comprehensive list of criticisms of OOXML was published at around this time by Grokdoc (2007), and it seems likely that some opinions were informed by this information.
Ecma (2007a) has provided a robust response to the national body comments received and, at the time of writing, the Ecma-376 proposal is continuing through the fast-track approval process (with votes due to be submitted on 2 September 2007). With the release of Office 2007, Microsoft has moved its applications to an XML-based file model, with the OOXML format covering word processor, spreadsheet and presentation file types.

2.4.3 Adobe PDF

Although not a general-purpose office file format, Adobe's Portable Document Format (PDF) has become a de facto standard for displaying and distributing high fidelity views of non-revisable office documents. PDF was first released in 1993 as a format for precisely describing a printable document in a device-independent manner. It is not based on XML, but on the proprietary PostScript page description language developed in the late 1970s by Adobe's founder John Warnock. Although it is an example of a proprietary format, and Adobe hold patents related to the technology, the company has made it available for use by others. A number of office software suites can now export documents in PDF form, either directly from within the office application, or by the addition of free plug-in software. This, combined with the availability of free-of-charge PDF viewer software (Adobe Reader), and Web

10 Microsoft Reference Schemas for Office 2003 are available from http://www.microsoft.com/downloads/details.aspx?familyid=fe118952-3547-420a-a412-00a2662442d [accessed 11/04/07].

11 Ecma International is a not-for-profit association under Swiss law. It profiles itself as being emphatically industry-led and therefore faster and less bureaucratic than the FSBs. There are five levels of membership and all members must be companies except for one category for 'not for profit' organisations. Its agreements are free to download and can be freely copied.

12 Technical details of the Ecma Office Open XML specification are available from http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50_final_draft.htm [accessed 12/04/07].

browser plug-ins, has made the PDF format a convenient and relatively trouble-free publishing medium in situations where the end user does not require, or the producer does not want the user to have, the ability to further edit the document. It has become particularly popular for work destined for printing, in the associated pre-press market, and for longer documents that are made available for download over the Web (see appendix B).

In parallel with previously discussed trends towards open standards, Adobe have worked steadily with SDOs, most notably AIIM (the Enterprise Content Management Association) and NPES (the Association for Suppliers of Printing, Publishing, and Converting Technologies), to develop PDF-related open standards. In 2001, ISO approved PDF/X (known officially as ISO 15930-1:2001: Prepress digital exchange), which is a sub-set of the main PDF format designed to handle the reliable transfer of files from one printing process to another. In May 2005, ISO also approved another sub-set, PDF/A, which focuses on long-term archiving of an electronic document (officially known as ISO/IEC 19005-1:2005: Electronic document file format for long-term preservation). In January 2007, Adobe commenced the process of releasing, in full, version 1.7 of the PDF format to the ISO/IEC standardisation process, partnering with AIIM for this purpose (Adobe, 2007). The proposed standard will not be XML-based, although a separate project within Adobe, Mars, is currently developing an entirely XML-based implementation of the PDF filetype13.

The development of Microsoft's XML Paper Specification (XPS) technology14, which offers broadly similar capabilities to PDF, together with its native integration with the Windows Vista operating system, has been a source of some controversy.
Microsoft recently withdrew its planned direct support for 'Save as PDF' and 'Save as XPS' features in Office 2007, at least partly in response to concerns raised by Adobe (Bangeman, 2006, and Kawamoto, 2006, offer typical press coverage). Microsoft has announced similar plans to seek standardisation of XPS (Fisher, 2006) and, further to this, Ecma announced in July 2007 that a committee had been formed to take this forward (ECMA, 2007b).

2.5 Implications

There has been considerable controversy around the approval of Ecma-376 (OOXML) and the resulting attempt to have it approved as an ISO standard. The concerns centre on the possibility of the ratification of two ISO standards for open XML document formats, and the perceived lack of openness of the OOXML format. Indeed, the European Union, recognising that developments related to standards approvals had taken place following the original TAC recommendations (ODF approval by OASIS and ISO/IEC, OOXML endorsement by Ecma, and the ISO/IEC approval of PDF/A for long term data archiving), revised, extended and republished the original 2004 TAC conclusions and recommendations on open document formats in December 2006 as the Pan-European eGovernment Services Committee (PEGSCO) Conclusions and Recommendations on Open Document Formats (PEGSCO, 2006). The document made ten detailed recommendations, with the first five targeted at 'public administrations' (government), and the last five aimed at 'developers'. These recommendations are listed in Tables 1 and 2.

13 http://labs.adobe.com/wiki/index.php/Mars

14 http://www.microsoft.com/whdc/xps/default.mspx


Table 1: PEGSCO recommendations for 'public administrations'
Ref Recommendation

6.1 To make maximum use of internationally standardised open document exchange and storage formats for internal and external communication;

6.2 To use only formats that can be handled by a variety of products, avoiding in this way to force the use of specific products on their correspondents. When the usage of proprietary formats is unavoidable, alternative, internationally standardised open formats shall be provided in addition to proprietary formats;

6.3 To adapt, where appropriate, national guidelines and regulations, taking into account the arrival of international standards in this area;

6.4 To consider the definition of minimum requirements in regard to the functionalities of open document exchange formats in view of pursuing the compatibility of applications;

6.5 To create guidelines for the use of revisable and non-revisable document exchange and storage formats for different purposes.

Table 2: PEGSCO recommendations for 'industry, industry consortia and international standardisation bodies'
Ref Recommendation

6.6 To work together towards one international open document standard, acceptable to all, for revisable and non-revisable documents respectively;

6.7 To develop applications that can handle all relevant international standards, leaving the choice to their customers as to what format will be used "by default";

6.8 To avoid invalidating the purpose of open document exchange and storage formats by offering extensions to the relevant international standards as default formats.

6.9 To make proposals for conformance testing and to develop adequate tools in order to safeguard interoperability between applications;

6.10 To continue to improve the existing standards, also taking into account additional needs such as electronically signed documents.

The PEGSCO document goes on to suggest that the possible approval of Ecma-376 as a second ISO standard for revisable documents may result in increased administrative burdens and incompatibilities, due to the potential need to publish documents in multiple formats. An extract is given below:

“Member State experts have identified the perceived compatibility problems between ISO 26300 (ODF) based products and the commercial applications that dominate the offices of today’s administrations as the main barrier for the use of open document exchange and storage formats. The potential arrival of a second international standard for revisable documents may mean that administrations


will be required to support multiple formats leading to more complexity and increased costs. Although filters, translators and plug-ins may theoretically enable interoperability, experience shows that multiple transformations of formats may lead to problems, especially as there is no complete mapping between all features of each of the different standards.” PEGSCO, 2006, p. 6

2.5.1 The current situation and why it matters to education

The situation with regard to office document formats (as of July 2007) is very fluid. ODF has been passed as an international standard and it is supported by a number of office suite products. OOXML has been implemented as the native file format supported by Microsoft's Office 2007 application (which was released in January 2007). This format has been approved by Ecma International and has been submitted to ISO for fast-track approval as an international standard. At the time of writing, an international ballot process is taking place to determine whether OOXML should be approved by ISO, and votes are due in September 2007 although, strictly speaking, the process continues until the Ballot Resolution Meeting, currently scheduled for February 2008. In effect we are close to a situation where there may be two international standards for office documents.

These events have attracted considerable controversy throughout the computing industry and there are claims and counter-claims on either side regarding the wisdom and necessity of having two standards, the technical merits of the two, and the tactics being employed. This report will argue that it is important to understand these developments as they are likely to have a number of implications for the education community, both as a large user and publisher of office documentation and as a community with a strong interest in the development of open standards.
Although there is widespread use of Microsoft Office applications within the community there is generally a time-lag with regard to upgrading to new releases (OSS Watch, 2007; Ditch, 2007 - summarised in appendix B). Appendix B demonstrates that current usage of the newer XML file formats is low. A window of opportunity therefore presents itself for considering how document file formats should be used, and how HE/FE should approach the issue of upgrading to XML. The rest of the report will focus on explaining the issues pertinent to education and drawing out some of the implications of the discussion.


3. Comparing ODF and OOXML

The Valoris report (2003) found that Microsoft's Office product enjoyed technical superiority over OpenOffice.org, but it also identified that the combination of the OpenDocument file format (ODF), the free of charge OpenOffice.org software, and its low cost StarOffice alternative, offered sufficient features and functionality for many users. The report also predicted that OpenOffice.org might be expected to gain a 10% share of the overall office software market, with Microsoft's share stabilising at around 85% (a decrease of around 10 percentage points). Furthermore, the ODF file format was predicted to become the format of choice where interoperability was the key factor.

This section will compare the two formats based on discussion of the published specifications. It should also be noted that this section does not analyse the technical specification of the PDF file type, or its variants, since PDF is considered to be a non-revisable office document format. Due to the size of the associated specifications for ODF and OOXML (700+ and 6,000+ pages respectively) the technical analysis will necessarily be 'high level', with reference made to more detailed technical sources, where necessary15.

3.1 Technical analysis: ODF

ODF is an XML-based document file format for office applications that facilitates the creation and editing of documents containing text, spreadsheets, charts, presentations and graphical elements.

3.1.1 File format and structure

The ODF standard defines an XML structure or schema, and the associated semantics, for office-based applications. This makes use of XML elements, attributes and namespaces of a number of other existing standard formats such as XSL-FO, SVG (Scalable Vector Graphics), XLink, XForms, MathML and Dublin Core (see Tables 2 and 3 of the ODF specification for more details (OASIS, 2006b)).
Extensions to the ODF schema are permitted, through the use of additional XML namespaces, thus offering the possibility of including custom XML schemas (incorporation of user specified data, for example). However, according to Geyer (2006), this is not currently implemented by any ODF compatible applications.

The ODF file structure uses a Zip file as a compressed archive to hold a series of XML files and other information (such as binary files containing embedded images) that describe the document's content and presentation. A manifest file holds an index of all the files contained in the package and their types. The use of a Zip file in this manner offers the dual advantages of creating a single document file, containing multiple separate components, and also of using compression technology to reduce the overall size of that file.

A RELAX NG schema16 is used to specify and validate the pattern for the structure and content of the ODF document. RELAX NG has become popular as a lightweight and easy to use schema language, itself written in XML, and supported as an open standard by OASIS17. A useful overview of the ODF file format, its historical development, internal structure (including the previously mentioned Zip file format), together with a number of claimed

15 An extensive comparison of ODF and Open XML formats is also available from MacNaghten (2007).

16 An XML schema is used to define a particular type of XML document, imposing various constraints on the structure and content of the document.

17 The OASIS Relax-NG technical committee home page is available from: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=relax-ng

advantages, is available from OASIS (OASIS, 2006a).

3.1.2 Implementations and applications that support ODF

A number of office document and associated applications support ODF on a variety of hardware and operating system platforms. These include OpenOffice.org, StarOffice, KOffice, Lotus Notes, AbiWord, Google Docs & Spreadsheets, Zoho Writer and AjaxWriter. Corel has announced that it will support both ODF and OOXML in its next release of WordPerfect (due in late 2007) (Corel, 2006) and IBM has announced support for ODF (but currently not OOXML) in Lotus Notes (Schwartz, 2006).

3.1.3 Advantages and disadvantages of ODF

As discussed in appendix A, a number of competing claims and counter-claims can be made with regard to a standard. This is an area of considerable controversy and the reader should be aware of the polarised nature of some of the debate. This section lists some of the technical advantages and disadvantages of ODF that are expected to be of relevance to the higher education community.

Table 3: Technical advantages of the ODF file format
Advantage Description

Advantage: International Standard
Description: ODF has been approved by ISO as ISO/IEC 26300.

Advantage: Simple specification, building on existing open standards
Description: ODF makes internal use of a number of open standards, thus reducing the complexity of the specification and allowing developers to leverage existing open standards support in applications.

Advantage: Supported by multiple applications
Description: ODF is supported by a number of applications (see section 3.1.2) from a range of vendors, some of which are free, open source solutions.

Advantage: ODF uses a mixed content markup model, with very good separation of content and presentation
Description: It has been suggested (Carrera et al., 2005) that ODF is technically superior (for word processed documents at least) to OOXML because it uses a mixed content markup model, with improved separation of content and presentation (in effect suggesting that ODF text is 'like XHTML' and that OOXML text is 'like an XML data structure').
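The 'like XHTML' versus 'like an XML data structure' contrast in the last row of Table 3 can be made more concrete with a small sketch. The two fragments below are hand-written simplifications using element names from the respective formats; they are not extracts from either specification, and the style name T1 is hypothetical. The Python code simply recovers the plain text from each form.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# ODF-style paragraph (illustrative): text and inline markup are interleaved,
# i.e. mixed content, much as in XHTML. "T1" is a hypothetical style name.
ODF_PARAGRAPH = (
    f'<text:p xmlns:text="{TEXT_NS}">Hello '
    '<text:span text:style-name="T1">world</text:span>!</text:p>'
)

# WordprocessingML-style paragraph (illustrative): each piece of text sits in
# its own run (w:r), with formatting properties (w:rPr) held alongside it.
OOXML_PARAGRAPH = (
    f'<w:p xmlns:w="{W_NS}">'
    '<w:r><w:t xml:space="preserve">Hello </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>'
    '<w:r><w:t>!</w:t></w:r>'
    '</w:p>'
)


def odf_text(xml):
    """Mixed content: the text is read straight through the element tree."""
    return "".join(ET.fromstring(xml).itertext())


def ooxml_text(xml):
    """Run structure: the text is gathered from the w:t element of each run."""
    root = ET.fromstring(xml)
    return "".join(t.text or "" for t in root.iter(f"{{{W_NS}}}t"))


if __name__ == "__main__":
    print(odf_text(ODF_PARAGRAPH))
    print(ooxml_text(OOXML_PARAGRAPH))
```

Both functions yield the same sentence; the difference lies in how much structure a tool has to walk through to reach it, which is the point at issue in the comparison above.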


Table 4: Technical disadvantages of ODF

Disadvantage: Mathematical formulae use a little-used formula language
Description: It is claimed (González-Álvarez, 2006) that TeX is more commonly used to mark up mathematical expressions than the W3C's MathML, and that MathML has some technical issues. It is debatable how relevant the first point is, as MathML is an underlying XML format and therefore not often encountered directly by the user. However, MathML is also an internationally agreed open specification and hence its use will encourage interoperability between users.

Disadvantage: ODF is insufficiently detailed: spreadsheet formulae are application defined
Description: The ODF specification v1.0 lacks detail in areas such as spreadsheet formulae. It does not define a specific syntax for spreadsheet formulae, leading to possible incompatibility between ODF documents (Fioretti, 2005a). However, a detailed spreadsheet formula syntax, OpenFormula, has been under development by the OASIS formula subcommittee since 2005 and is now in final draft stages18.

Disadvantage: Macro/scripting language is not defined in ODF
Description: There is no defined macro language in ODF, even though ODF applications such as OpenOffice.org support macros. This may cause incompatibility between ODF compatible applications (Fioretti, 2005b). This may be an area which requires further clarification in the interests of improving interoperability between office applications.

Disadvantage: No support for digital signatures
Description: The ODF standard does not make reference to digital signatures, even though ODF compliant applications make use of them. However, OpenOffice.org makes use of the W3C's XML-Dsig Recommendation, which is likely to be incorporated into the next revision (v1.2) of the ODF standard.
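The package structure described in section 3.1.1 (a Zip archive holding XML files plus a manifest that indexes them) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a complete or valid ODF document: the function names are hypothetical, the content file is a placeholder, and the conventions used here (a mimetype entry stored first and uncompressed, and a manifest at META-INF/manifest.xml) are taken from general knowledge of ODF packaging rather than from this report.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def build_minimal_package():
    """Return the bytes of a toy ODF-style Zip package (placeholder content)."""
    manifest = (
        f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
        f'<manifest:file-entry manifest:full-path="/" '
        f'manifest:media-type="{ODT_MIMETYPE}"/>'
        '<manifest:file-entry manifest:full-path="content.xml" '
        'manifest:media-type="text/xml"/>'
        '</manifest:manifest>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # Assumed convention: the media type is written first, uncompressed.
        package.writestr("mimetype", ODT_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        # The remaining parts are compressed to reduce the overall file size.
        package.writestr("content.xml", "<placeholder/>",
                         compress_type=zipfile.ZIP_DEFLATED)
        package.writestr("META-INF/manifest.xml", manifest,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_manifest(package_bytes):
    """Map each path listed in the manifest to its declared media type."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("META-INF/manifest.xml"))
    path_attr = f"{{{MANIFEST_NS}}}full-path"
    type_attr = f"{{{MANIFEST_NS}}}media-type"
    return {
        entry.get(path_attr): entry.get(type_attr)
        for entry in root.iter(f"{{{MANIFEST_NS}}}file-entry")
    }


if __name__ == "__main__":
    for path, media_type in list_manifest(build_minimal_package()).items():
        print(path, "->", media_type)
```

Because the container is an ordinary Zip file, any general-purpose archive tool can open it, which is part of the appeal of the single-file, multi-component design noted in section 3.1.1.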

3.2 Technical analysis: OOXML

Office Open XML (OOXML) is an XML-based specification for the formatted description of office documents such as word processing files, spreadsheets, presentations, charts and drawings.

3.2.1 File format and structure

As was the case with ODF, the OOXML file format is based on a compressed Zip archive, which is renamed according to the office file type. This Zip file contains a number of separate files, called parts, together with a simple folder structure. Many of these files are XML, although other file types, such as images, may be included. Microsoft uses the term Open Packaging Conventions (OPC) to refer to this arrangement and content of files and folders within a Zip archive. Detailed information regarding the OOXML format may be found in the Ecma TC45 Office Open XML documentation (ECMA, 2006a)19. Ecma OOXML uses the

18 See: http://wiki.oasis-open.org/office/About_OpenFormula

19 A white paper overview of the structure of OOXML is provided by ECMA at: http://www.ecma-international.org/news/TC45_current_work/OpenXML%20White%20Paper.pdf


W3C's XML Schema format as the 'normative' option for specifying and validating the pattern for the structure and content of the document, although RELAX NG is provided for as an 'informative' option. Links between documents are described indirectly, using relationships, in which a named relationship in one file (or part) is actually resolved in another file (the relationships part). For example, an image can be referenced internally (within the Zip archive), externally from the computer's local file system, or externally using a URL20.

OOXML uses three custom XML-based languages to describe the primary types of document content: WordprocessingML, SpreadsheetML, and PresentationML. As their names suggest, these three XML languages correspond to the three main OOXML document types. Each document type allows the embedding of material in the other primary markup languages and a number of other subsidiary, custom XML formats. These subsidiary formats include:

• DrawingML (pictures, charts, diagrams)

• VML (a legacy graphical information format that is being deprecated by Microsoft)

• Math (for the graphical display of mathematical equations)

Other XML languages may also be encountered in association with a variety of Microsoft document types, including Visio technical drawings (DataDiagramML) and InfoPath forms (FormTemplateML), although these are not defined within the Ecma-376 specification.

3.2.2 Implementations and applications that support OOXML

OOXML is the default format supported by Microsoft's Office 2007 range of office applications. The range is available in a variety of suites, each of which features a number of the latest versions of well-known applications such as Word, Excel, Outlook and PowerPoint. This updated version of the Office product suite is currently available for the Windows XP and Vista operating systems. The new Apple Mac version of Office (known as Microsoft Office 2008 for Mac OS X) is slated for release in early 2008 and will also natively support OOXML (Microsoft, 2007b). OOXML is also partially supported by Novell's edition of OpenOffice.org (through the use of a translator (Fontana, 2007b)), by an open-source spreadsheet project and, experimentally, on the Apple Mac via NeoOffice. Corel has announced that it will support both OOXML and ODF in its next release of WordPerfect (due in 2007) (Corel, 2006).

3.2.3 Advantages and disadvantages of OOXML

Ecma's attempt to get OOXML approved as an ISO standard has caused considerable controversy. As already noted, this controversy focuses on the process through the standardisation bodies as well as the technical issues. This section lists some of the perceived technical advantages and disadvantages of the OOXML specification that will be of relevance to the higher education community.

20 A useful tutorial on methods of embedding images within a Word 2007 document has been produced by Doug Mahugh. See http://blogs.msdn.com/dmahugh/archive/2006/12/10/images-in-open--documents.aspx for more details [accessed 15/04/07].


Table 5: Technical advantages of the OOXML file format

Advantage: Backwards compatibility with existing Microsoft Office documents
Description: Compatibility, at high levels of fidelity and with the large number of existing legacy documents stored in Microsoft proprietary binary format, is a major design goal of OOXML.

Advantage: Faster operation and better memory use
Description: It has been suggested that Microsoft's Office implementation of OOXML is faster than OpenOffice.org when opening large spreadsheet files, and that it also uses a lot less memory (Ou, 2005).

Advantage: The use of a separate 'relationships' file to hold hyperlinks simplifies editing of these hyperlinks
Description: OOXML specifies that all external references, such as hyperlinks or linked files, are kept in a single, separate XML file within the zipped package. This simplifies the process of fixing links if the files are moved from one server to another (Rice, 2006).
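The relationships mechanism described in section 3.2.1, and the link-fixing advantage claimed in the last row of Table 5, can be sketched as follows. This is an illustrative toy package, not output from Microsoft Office: the part names follow a layout commonly seen in word processing packages, the targets and server names are invented, and the helper functions are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"


def build_toy_package():
    """Return the bytes of a toy OPC-style Zip package with one .rels part."""
    rels = (
        f'<Relationships xmlns="{REL_NS}">'
        f'<Relationship Id="rId1" Type="{REL_TYPE}image" '
        'Target="media/image1.png"/>'
        f'<Relationship Id="rId2" Type="{REL_TYPE}hyperlink" '
        'Target="http://www.example.org/report" TargetMode="External"/>'
        '</Relationships>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", "<placeholder/>")
        package.writestr("word/_rels/document.xml.rels", rels)
    return buffer.getvalue()


def load_relationships(package_bytes, rels_part):
    """Map each relationship Id to a (target, mode) pair."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read(rels_part))
    return {
        rel.get("Id"): (rel.get("Target"), rel.get("TargetMode", "Internal"))
        for rel in root.iter(f"{{{REL_NS}}}Relationship")
    }


def retarget_external(relationships, old_prefix, new_prefix):
    """Rewrite external targets after a move, leaving document parts untouched."""
    updated = {}
    for rel_id, (target, mode) in relationships.items():
        if mode == "External" and target.startswith(old_prefix):
            target = new_prefix + target[len(old_prefix):]
        updated[rel_id] = (target, mode)
    return updated


if __name__ == "__main__":
    rels = load_relationships(build_toy_package(), "word/_rels/document.xml.rels")
    print(retarget_external(rels, "http://www.example.org/", "http://docs.example.org/"))
```

The document part only ever refers to identifiers such as rId2, so a change of server needs an edit in the relationships part alone. The same indirection is what gives rise to the XSLT concern raised in Table 6, since a transformation of the document part on its own cannot see the targets.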

Table 6: Technical disadvantages of the OOXML file format

Disadvantage: Inconsistencies with existing ISO standards
Description: Examples of this include: paper sizes (ISO 216 defines names for paper sizes, whereas OOXML uses its own numeric codes for these sizes); dates and times (these are covered in ISO 8601, but OOXML makes use of an alternative mechanism which considers 1900 as a leap year and does not understand dates prior to 1900, an existing error found in Microsoft Office legacy spreadsheets); HTML colour names (ISO/IEC 15445).

Disadvantage: Inconsistencies with existing W3C Recommendations
Description: OOXML defines its own vector graphics markup (DrawingML) rather than making use of SVG. This may be in order to remain backwards compatible with an earlier Microsoft Office drawing format, VML. A counter argument to this criticism is that standards such as SVG may not be wholly suitable for the required purpose, leading to a requirement to invent a new solution, or to adapt a standard to an excessive degree. Support for this viewpoint comes from the unlikely source of Sun's own development community (Ahrens, 2007); see in particular the follow-up comments. However, in addition, OOXML does not make use of the W3C recommended mathematics markup language, MathML.

Disadvantage: Cloning behaviour of undocumented legacy features
Description: Several sections of the OOXML specification make reference to the behaviour of an application without defining the nature of that behaviour, for example 'autoSpaceLikeWord95'. It is argued that only Microsoft can implement these proprietary features and therefore OOXML cannot be reasonably implemented by others (Grokdoc, 2007; FSFE, 2007).

Disadvantage: Size of documentation
Description: The OOXML standard runs to some 6,000 pages, and responses to the Ecma International standardization process have argued that this is a serious issue (ECMA, 2007a) which results from the failure to leverage existing, open standards within the standard.

Disadvantage: The use of a separate 'relationships' file to hold hyperlinks
Description: It has been argued that this may cause problems with the manipulation of the XML in an OOXML document and, in particular, may affect the use of the standard translation tool, XSLT. This needs to be clarified, as it is potentially very serious, since the inability to transform the XML would restrict the repurposing of the information contained in the file, and would also inhibit easy conversion to other formats (for example, to HTML for viewing on a Web browser and PDF for printing (Barnes, 2007)).

Disadvantage: Macro/scripting language is not defined in OOXML
Description: There is no defined macro language in the OOXML specification, even though Microsoft Office 2007 does support VBA macros. VBA is a proprietary language and could therefore cause incompatibility between OOXML-compatible applications.

Disadvantage: Specification is incomplete
Description: There are elements in Microsoft's Office 2007 file formats that are not documented in Ecma-376, e.g. VBA. This may cause interoperability problems with applications that utilize Ecma-376.
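The date issue listed in the first row of Table 6 can be illustrated with a short worked example. The sketch below assumes the widely documented legacy spreadsheet convention in which serial number 1 stands for 1 January 1900 and serial number 60 stands for the non-existent 29 February 1900; it is an illustration of that convention, not a reading of the Ecma-376 text, and the function name is hypothetical.

```python
from datetime import date, timedelta


def legacy_serial_to_date(serial):
    """Convert a legacy spreadsheet serial number to a calendar date.

    Assumed convention: serial 1 is 1 January 1900, and serial 60 is the
    fictitious 29 February 1900 (1900 was not a leap year).
    Returns None for the fictitious day.
    """
    if serial < 1:
        # The serial scheme has no representation for dates before 1900.
        raise ValueError("dates before 1 January 1900 cannot be represented")
    if serial == 60:
        return None
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    # From serial 61 onwards every value is one day 'late' because of the
    # extra fictitious day, so the epoch is shifted back by one.
    return date(1899, 12, 30) + timedelta(days=serial)


if __name__ == "__main__":
    for serial in (1, 59, 60, 61):
        result = legacy_serial_to_date(serial)
        # isoformat() gives the ISO 8601 calendar form, e.g. 1900-03-01.
        print(serial, result.isoformat() if result else "29 February 1900 (not a real date)")
```

An ISO 8601 date string carries no such quirk, which is why critics regard the serial scheme as an inconsistency with the existing standard rather than as a neutral design choice.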

3.3 Format conversion and associated problems

As we have seen, office document-related file formats are rapidly evolving towards open standards. However, the possibility of multiple competing standards means that there will continue to be a strong requirement for interoperability between formats. Some vendors are looking to provide this interoperability natively, as a direct part of the main application, while others prefer the plug-in approach, in which an additional piece of software, usually from a third party, is added to the application. Native support has the advantage of providing the user with the ability to directly open or save documents of a supported type, without the need to locate and install additional software. This is clearly a more convenient option for users. However, vendor arguments against the provision of direct support may include:

• possible increase in the size and complexity of the main application

• the need for engineering resource, which may be difficult to justify where the demand for a format is considered insufficient

• existing provision of plug-ins by one or more third parties

• patent restrictions and other legal issues

Microsoft has funded third party development of an ODF plug-in (Microsoft, 2006) to allow Microsoft Office users to save or load files using the ODF format, with the results of the development effort being made available via SourceForge21. The initial version of the plug-in supports Word only, but Excel and PowerPoint compatibility is in beta development at the time of writing. In a recent interview, Microsoft's Director of Corporate Standards, Jason Matusow, gave useful insights into Microsoft's reasoning for adopting the plug-in approach, as reported by Aslett (2007a). Matusow cited low levels of interest in ODF during the development phase of Office 2007 as one reason behind the lack of direct support, with Microsoft instead receiving strong demand for direct PDF support. Interest in ODF was seen to be coming particularly from the government sector, with a possible requirement to support ISO standards being significant to Microsoft's existing government customers (who might need to create ODF documents from within Microsoft Office). The choice of a third party plug-in development model was apparently inspired by a desire to adopt a 'transparent development

21 Details of the ODF converter are available from http://sourceforge.net/projects/odf-converter [accessed 08/04/07].

model' and to avoid suggestions of Microsoft 'manipulating standards' (Aslett, 2007a).

At the opposite end of the spectrum, Novell have developed a plug-in to allow OpenOffice.org users to save files using the Ecma OOXML format. Once again, the initial version of the plug-in supports word processed documents only. It is further restricted to work only with Novell's own customised version of OpenOffice.org, which is referred to as the Novell Edition of OpenOffice.org22. Performance of this plug-in has been tested by Popov (2007) in comparison with the open source OpenXML/ODF Translator Add-in for Office. Popov considers the latter to be the better option, although even this does not currently offer seamless interoperability between complex word processed documents.

Sun Microsystems have also announced (Sun Microsystems, 2007) the availability of a plug-in for Microsoft Office 2003 (Windows-only)23, which allows users to open or save documents using the ODF format. Sun state that the Commonwealth of Massachusetts is using this plug-in as part of its strategy to publish publicly available documents using open standards (Sun Microsystems, 2007).

Although some vendors prefer it, the plug-in solution is far from perfect from a user perspective. For example, if an office document is widely published using a given format, then the availability of plug-ins should theoretically allow that document to be opened and edited in any office application. However, the need to install the relevant plug-in on each individual computer makes it difficult to rely on this capability, since not all users will have carried out, or perhaps have the resources to carry out, this task. This is a particular problem for HE/FE, where institutions have large numbers of machines to maintain and no control over the software on a student's home PC.
The development of a diverse range of plug-ins may also lead to problems, including variable quality, delayed availability, and incomplete coverage of features and applications. Interoperability issues have also been suggested by Weir (2007), related to limitations imposed by user interface design, suggesting that the plug-in approach offers less flexibility than native support. As can be seen from the above discussion, developments in plug-in technology are evolving rapidly, but are currently at an immature stage. It is likely that inherent problems with the plug-in approach will make seamless interoperability difficult to achieve, and this will be particularly problematic for HE. These are the kinds of issues that are behind PEGSCO's concerns around the increased administrative burden that could be imposed on institutions, and it is therefore hoped that native support will emerge as the dominant solution in the medium term, as this would offer important advantages to users of office software:

• universal access to standards-based file formats, without the need to install additional software

• complete bidirectional support for all major file types, from all major office applications

• uniform and predictable quality, following thorough testing by office vendors

• reduced need to publish files using multiple formats

The actual achievement of these aims is likely to be facilitated by the consistent requirement of direct support for approved open standards, at the time of software procurement. Government, corporate and educational purchasers of office software are in a particularly strong position to influence suppliers, through their ability to set clear procurement guidelines, and to communicate this information to vendors.

22 Available from http://download.novell.com/ [accessed 09/04/07].

23 Available from: http://www.sun.com/software/star/openoffice/


3.3.1 Format fidelity

Format fidelity is a measure of how accurately a copy reproduces its source. It is an important concept in the debate over file formats, and in any discussion about choosing between them, as it is a key component in measuring interoperability between applications. The goal is often to have full or 100% fidelity (often seen as being particularly important for accurate records archiving), but lesser degrees may be acceptable to users in certain circumstances. HTML, for example, provides what is known as 'page similarity' in that the basic information is present but the nature of browsers means that no two will display the page exactly the same.

The concept of format fidelity as it relates to office documents is discussed at length in the Valoris report (2003, pp. 37–42). It includes the ability to retain document formatting and associated meaning following 'round trip' editing by multiple office applications, behaving consistently across differing computing environments, including tolerance of system-based dependencies such as installed fonts, and not depending excessively on user-based preferences or 'viewer' capabilities, as might be found with a Web browser, for example. It is also a measure of whether one can convert a file in one format to another format, or save a file in one software package in a particular format and then open it with another software package and see exactly the same information and layout replicated in both cases. Converters exist that will take a file in one format and convert it to another, and one of the key measures of their perceived success is the degree of fidelity of the final version when compared to the original.

Fidelity is also used in the context of whether or not all the possible features of a software package can be supported by a particular file format, e.g. if a new word processing package allows the addition of a 3D table to a document, can that table be accurately saved within a non-native file format? Generally, users like fidelity and the feeling that if they save a file, then next time they read it back, even if in another vendor's product or on another platform, it will be the same. One of the main reasons for the widespread adoption of Adobe PDF is that the format provides for very high fidelity of the display and printing of a document on a vast range of computer platforms and printing devices.

A key requirement for fidelity is the ability to read an older file, in a previous format, even a binary one, and replicate it accurately in the newer format. This helps to ensure backwards compatibility. Microsoft argues that OOXML actually carries out a different role to ODF, in that OOXML is specifically designed to allow 100% fidelity with previous Microsoft Office documents and their formats, i.e. OOXML can, for example, handle specific features in the binary, proprietary Microsoft Word 95 format and replicate them in the new XML-based format. In a recent interview with IT journalist Tim Anderson, Microsoft's Jean Paoli, the general manager for OOXML architecture, is quoted as saying: “What people want is to make sure that their billions of important documents can be saved in a format where they don’t lose any information. As a design goal, we said that those formats have to represent all the information that enables high-fidelity migration from the binary formats” (Anderson (T), 2007).

Readers should be aware that there is considerable controversy surrounding the claims and counter-claims between supporters of the two XML-based standards as regards fidelity. This specific goal of OOXML, as envisaged by its developers, is one of the crucial debating points in discussions over future choices of XML-based formats. Detractors argue that OOXML does not actually manage to embrace all the possible features of old binary formats and that 100% fidelity has not been evidenced and may not even be achievable at all using XML formats (Puttick, 2007). A number of converters have been produced for various office packages, but the degree of fidelity of these solutions has not yet been authoritatively established.
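As an illustration of how the fidelity of a converter might be estimated, the following minimal Python sketch compares the text of a document before and after a 'round trip' through another format. It is an assumption-laden simplification rather than an established test method: the function names are hypothetical, the paragraphs are assumed to have been extracted already by some other tool, and only textual content is compared, whereas the notion of fidelity described above also covers layout, fonts and feature support.

```python
from difflib import SequenceMatcher


def text_fidelity(original: str, round_tripped: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two text extracts."""
    return SequenceMatcher(None, original, round_tripped).ratio()


def lost_paragraphs(original: list[str], round_tripped: list[str]) -> list[str]:
    """List paragraphs present in the original but absent after conversion."""
    surviving = set(round_tripped)
    return [p for p in original if p not in surviving]


if __name__ == "__main__":
    # Hypothetical sample data: the converter has dropped the last paragraph.
    before = ["Quarterly report", "Sales rose by 4%.", "See Table 2."]
    after = ["Quarterly report", "Sales rose by 4%."]
    print(text_fidelity("\n".join(before), "\n".join(after)))
    print(lost_paragraphs(before, after))
```

A ratio of 1.0 indicates only that the extracted text is identical; it says nothing about whether the visual layout has survived, so a measure of this kind could at best be one component of a fuller fidelity assessment.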


3.4 Legal issues

In addition to the technical comparison, there are also legal considerations to be taken into account. Users of office-related software and their associated file formats should be aware of general issues related to intellectual property rights, software licensing, and the application of patents related to software. Indeed, as we have noted in section 2.3, these considerations have an impact on discussions over the definition of an open standard.

The ability to freely implement a standard in a software program is a difficult area, with some vendor-developed technologies being the subject of licensing agreements, or even protected by software patents. Many standards make use of specifications that are made from, or make reference to, the technical developments of private companies. This is a natural and obvious consequence of the fact that neither public sector researchers nor the standards bodies themselves generate all technology advancements. Such technologies are often the subject of patent protection and the issue of the use of patented technology within standards is the subject of considerable controversy (Clark, 2002).

Standards organisations often respond to this dilemma by requiring companies to sign an agreement to license their technology, either on a royalty-free basis or on what is known as a Reasonable And Non-Discriminatory (RAND) basis. A well-known example is that of W3C, who have issued a Patent Policy Framework that "seeks to issue Recommendations that can be implemented on a Royalty-free (RF) basis" (W3C, 2004). This is further complicated, however, by a) debates over what is considered "reasonable" and b) non-disclosure of patents and related patents, where a company may even be unaware they have a claim to a related technology until after the standard has been settled (sometimes known as 'submarine' patents since they surface some time after the due processes have been completed) (Soininen, 2005). Determining whether a patent claim relates to a proposed technology standard is a difficult process that requires legal analysis and is ultimately determined in a court of law (Lin, 2003). This is further complicated by the fact that companies may have patent applications in process that relate to a technical standard. Such patent applications are not publicly available and are usually considered to be highly commercially confidential.

This controversy means that certain definitions of open standards may optionally permit the charging of reasonable fees related to intellectual property rights such as patents, while other definitions require that the standard should be available to all, entirely free of charge. Useful guidance in this area is provided by the JISC-funded OSS-Watch service24.

Both ODF and OOXML were originally proprietary-developed products and, although they are in the process of being opened and standardised, they may still be subject to what might be termed legal 'baggage'. The 'right to use' such formats is a controversial issue, with some users suspicious that commercially sponsored formats may not be sufficiently free of such legal encumbrances. However, office software vendors, wanting to encourage the uptake of their proposed formats, have been keen to defuse such concerns and reassure the user community of their good intentions.

When Microsoft first released their XML Reference Schemas with Office 2003, they issued a covenant not to sue (Microsoft, 2007a), which, although preserving Microsoft's intellectual property rights, was intended to permit use of the format by third parties (both end users and developers of software seeking to provide compatibility with Microsoft applications). The covenant not to sue was extended to include OOXML following the release of the Ecma standard. Critics of Microsoft's covenant not to sue have suggested that it actually 'grants no rights' (Grokdoc, 2007).
However, further reassurance may be gained from Microsoft's Open

24 Guidance related to IPR, licensing and patents is available online from http://www.oss-watch.ac.uk/resources/ipr.xml [accessed 13/04/07].


Specification Promise25, in which they undertake not to make financial claims for the use of any 'covered specification' (with the list of covered specifications including the mandatory elements of OOXML).

Such concerns are not restricted to OOXML. Sun Microsystems has issued separate statements related to patents and IPR via the OASIS website (OASIS, 2005), although it should be noted from the text that coverage for future versions of ODF exists only where “Sun participates to the point of incurring an obligation”. Vendors have also commissioned independent legal advice, in order to provide further reassurance. For example, Sun Microsystems have obtained advice from Eben Moglen (2006), indicating that the ODF file format may be freely used in free and open source software.

However, despite these many reassurances, a number of high profile IPR-related controversies such as the SCO v. Linux26 dispute, together with patent assertions related to the LZW compression algorithm used by the GIF graphics format, have served to heighten levels of suspicion in this area. More recently, Microsoft has claimed that open source software, including OpenOffice and a variety of other applications, infringes 235 of their patents (Parloff, 2007). Precise details of the allegedly infringed patents have not been released, leading some to speculate (Moody, 2006) that the intention may be to spread 'fear, uncertainty and doubt' or to extract royalties through the agreement of licensing deals. Despite Novell's recent licensing agreement with Microsoft (which included indemnity for their customers against any such patent liability), Aslett (2007b) reports that Novell have sought to distance themselves from Microsoft's claims. Whether these allegations will be tested in court seems uncertain, given the considerable negative publicity generated by previous IPR disputes, and the possibility of protracted and damaging litigation and counter-litigation.
Following Adobe's announcement of its intention to release the full PDF format for industry standardisation, Adobe has indicated its intention to adopt a similar licensing model to Microsoft and Sun. They will issue a covenant not to sue and will permit royalty free use of the format, while retaining relevant intellectual property rights (Adobe, 2007).

25 Microsoft's Open Specification Promise is available from http://www.microsoft.com/interop/osp/default.mspx [accessed 13/04/07].

26 A time line of developments related to the SCO-Linux saga is available from http://www.linux.org/news/sco/timeline.html [accessed 31/05/07].


4. Future developments

4.1 Trends in the market for office documentation software

An understanding of current and potential future market trends related to office software is useful for document creators, particularly where documents are intended for a diverse audience and may require long-term preservation through electronic repositories (as is the case in higher and further education). The focus of this section is primarily on trends related to revisable office documents, including word processor, spreadsheet and presentation document types.

Microsoft is currently the largest player in the office software market and is likely to remain so for the foreseeable future. However, Microsoft seems likely to face increasing competition from a number of other contenders, including Sun Microsystems, OpenOffice.org, IBM and Corel, together with a variety of entirely online providers (see below). There is also an increasing number of smaller, less well known packages that tend to focus on the easier-to-use, basic functionality of word processing and are aimed at less professional audiences. Zaine Ridling (2007) has written a comprehensive review of the options in this market.

A considerable proportion of Microsoft's Office software revenue comes from sales of updates to existing customers. Microsoft's update policy may be roughly summarised as a repeating cycle of 'smaller ripples and larger waves', with major product upgrades tending to appear every four years, interspersed by minor updates every two years. For example, the company has released major versions of its Office product in 2000, 2003 and 2007, with Office XP (2001) as a minor upgrade. The lack of a minor Office upgrade in 2005 is a deviation from this trend. A future minor update to Microsoft Office might therefore be predicted around 2009, with a more substantial upgrade appearing around 2011.

Given that existing customers have to pay for ongoing upgrades, either through annual subscription or as individual purchases, Microsoft has historically needed to provide evidence of value for money through innovative additions in order to generate continuing sales revenue. Recent major upgrades have tended to add major functionality in a single headline area, with Internet functionality, new XML capabilities and an overhauled user interface being the focus of the three previously mentioned major upgrades. It is likely that future major revisions to Microsoft software will continue to offer new or enhanced features, although it is difficult to predict their exact focus ahead of time. However, Simon Witts, a Microsoft executive, recently discussed a possible release of Office in 2009 and labelled it 'Office 14'27 (ZDNet, 2006). This product is likely to focus on roles within organisations, with a specific version of the package aimed at individual roles (for example, sales, R&D and HR). There will also be incorporation of Web 2.0 technologies (Krill, 2006).

If new or improved functionality is the 'carrot', encouraging users to upgrade, then the 'stick' is the withdrawal of minor updates, bug fixes and technical support from older versions. This is a particular issue for larger corporate and educational users who, although possibly content with existing versions (and potentially averse to change), are unwilling to operate with unsupported software. Microsoft have issued a standard support lifecycle policy28, which for products such as Windows or Office generally offers five years of standard support, starting from the product's general release date, followed by an optional further five years of extended support (at extra cost). Key support dates for common Windows and Office products are

27 It is believed that Microsoft is planning to skip over Office 13 due to it being unlucky!

28 Microsoft's Support Lifecycle Policy FAQ is available from http://support.microsoft.com/gp/lifepolicy [accessed 03/06/07].

given in Table 7, based on information provided by Microsoft's Lifecycle Information service29.

Table 7: Support transition for Microsoft Office and Windows versions

Product         General Release Date   Mainstream Support Until   Extended Support Until
Office 2000     27/06/1999             30/06/2004                 14/07/2009
Office XP       31/05/2001             11/07/2006                 12/07/2011
Office 2003     17/11/2003             14/04/2009                 08/04/2014
Office 2007     27/01/2007             10/04/2012                 10/04/2017
Windows 2000    31/03/2000             30/06/2005                 13/07/2010
Windows XP      31/12/2001             14/04/2009                 08/04/2014
Windows Vista   25/01/2007             10/04/2012                 11/04/2017
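
For institutions planning upgrades, the support dates tabulated above lend themselves to a simple lookup. The following Python sketch is illustrative only: it transcribes the Office rows of the table (which uses day/month/year order), and the function name `support_phase` and the three phase labels are assumptions introduced here, not Microsoft terminology for an official service.

```python
from datetime import date

# (mainstream support until, extended support until), transcribed from the
# Office rows of the support table; the table gives dates as day/month/year.
SUPPORT = {
    "Office 2000": (date(2004, 6, 30), date(2009, 7, 14)),
    "Office XP": (date(2006, 7, 11), date(2011, 7, 12)),
    "Office 2003": (date(2009, 4, 14), date(2014, 4, 8)),
    "Office 2007": (date(2012, 4, 10), date(2017, 4, 10)),
}


def support_phase(product: str, on: date) -> str:
    """Classify a product as mainstream, extended or unsupported on a date."""
    mainstream_until, extended_until = SUPPORT[product]
    if on <= mainstream_until:
        return "mainstream"
    if on <= extended_until:
        return "extended"
    return "unsupported"


if __name__ == "__main__":
    publication = date(2007, 8, 1)  # approximate publication date of this report
    for name in SUPPORT:
        print(name, support_phase(name, publication))
```

The sketch ignores the period before general release and treats each 'until' date as inclusive; both are simplifying assumptions.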

Despite Microsoft's best efforts, the rate of uptake of newer versions of Office has, in the past, been notoriously slow. The current situation immediately following the release of Office 2007 may be usefully compared to that which existed immediately following the release of Office XP in 2001. Valoris (2003, p.69), quoting market research conducted by Giga Information Group, stated at the time that, although Microsoft Office products enjoyed a 95% share of the office software market, only 11% had upgraded to the current XP version. A further 50% were using the predecessor Office 2000, while 33% were still using Office 97. Furthermore, users seemed to be equally divided on whether they intended to upgrade, with 42% actively considering upgrading and a further 45% with no plans to upgrade. If previous market trends are repeated with Office 2007, then it will be some time before sole reliance may be placed on the OOXML file format, regardless of the market share enjoyed by Microsoft's competitors. A relevant recommendation from a Directions on Microsoft report on File Formats in Office 2007 (Helm, 2006) is given below: “In summary, any organization moving to Office 2007 will probably want to adhere to an old network interoperability maxim: "Be conservative in what you send, and liberal in what you accept." For Office files, that suggests generating new documents in existing Office formats whenever possible and limiting use of new Office features that aren't supported by the existing formats, but ensuring that all computers can process incoming documents in the new formats.” While traditional competitors to Microsoft seem to occupy a relatively static market share of approximately 5%, the uptake of OpenOffice.org seems likely to increase significantly, given the product's zero price tag. OpenOffice.org, in their Strategic Marketing Plan (2004, p.8), describe their approach as being based on a 'disruptive marketing' model. 
The disruptive element in this case is the targeting of markets unattractive to competitors, and of non-consumers who cannot afford to purchase traditional office software. Thus initial growth is likely to be real, but based largely on increasing the numbers of office software users, rather

29 Lifecycle information related to Microsoft products is available from http://support.microsoft.com/gp/lifeselect [accessed 03/06/07].

than competing away significant numbers of traditional customers. In education, the student population represents a large potential market, which may be receptive to such an approach and this will affect institutional strategies for adoption of XML-based office formats.

Further growth in OpenOffice.org's market share in traditional areas is likely to come from the government and public sector, who will be attracted by the status of ODF as an existing ISO/IEC standard, plus the prospect of savings in software licensing costs. Indirect growth due to bundling of OpenOffice.org with other products is also likely to occur (including Linux distributions, low cost PCs, and products containing OpenOffice.org functionality). However, it should also be noted that Microsoft's recent announcements regarding claimed patent infringements, relating to a number of open source products (including OpenOffice.org), may have an adverse effect on the uptake of these products in corporate and educational sectors, in countries where these patents apply such as the USA, until these allegations have been either refuted or resolved.

4.2 Online office documentation services

An additional entrant into this market is the entirely online office software provider, with a number of solutions having recently become available. These services are part of the first stages in a predicted general move away from 'shrink-wrapped', licensed software packages and towards Web-based delivery of services. At the moment these developments are being driven by AJAX's capability to provide a rich user experience within the Web browser window (Anderson (P), 2007) but are likely to be further driven by the development of Web Services technologies such as WS-* and the uptake of service-oriented approaches, as well as the burgeoning off-line capability of Web applications such as Zimbra Desktop, Google Gears and Firefox 3.
Online document services typically offer subscription-based access to full services, or zero-cost basic functionality, funded by on-screen advertising. Recent reviews of free online office suites (Ericson, 2007; Wenzel, 2006) showed that these services could offer a viable solution, given an always-on broadband Internet connection. Claimed advantages include the ease with which documents can be shared with others over the Internet, and the ease of use for people who are travelling away from their office PC.

An interesting observation here is that one of the reviewed online services, Google Docs and Spreadsheets, is produced by a major online competitor of Microsoft, and the announcement of presentation capabilities, in addition to existing word processor and spreadsheet features, seems likely to bring Google's online office suite into more direct competition with Microsoft Office (Turner, 2007). Microsoft may respond by offering online, office-related services of their own, and some commentators feel it is only a matter of time before Office is released as a Web-based suite (Foley, 2007). Indeed, an online demo version of Office 2007 has already been made available by Microsoft30. The longer-term result may be a reduction in the proportion of users willing to pay for premium office software functionality.

If recent trends with Web-based e-mail are paralleled, then educational users, including students, may increasingly use online services at home and full office suite functionality when on campus. Interoperability between these services is likely to be important to many of these users. The long-term impact of the entirely online office software provider model is currently unclear, given its heavy reliance on an always-on (and always-responsive) Internet connection. While users may be prepared to tolerate lower responsiveness and reliability

30 http://office.microsoft.com/en-gb/products/HA101687261033.aspx

levels for services such as e-mail, it remains to be seen whether similar allowance would be made towards entirely online office applications, or to accessing office documents potentially stored on a remote file system. Given the increasing availability of high speed broadband Internet services in the home, the continued effectiveness of these online services, when accessed from a shared campus network (with inevitably lower concurrent bandwidth per user), may need to be assessed. However, the appeal and user convenience which may come from integration of services such as search engine, Web portal, e-mail and office applications, together with anywhere/anytime access to user created documents, should not be underestimated.

The online solutions do not make use of their own proprietary file formats for documents, instead relying on the existing de facto and open standards. Both Google Docs and Zoho Writer31, for example, support saving documents in various formats including PDF, ODF, RTF, and Microsoft's (binary) Word format.

4.3 Living in a two-format world

Even if Ecma-376 is not approved as an ISO standard, users will still, in effect, have to operate in a two-format world: one an ISO standard, the other de facto. This is in addition to the continued widespread use of the PDF format for non-editable and print-based documents (which is itself approaching the international standards process) and (X)HTML for millions of webpage documents.

In addition, as previously mentioned in section 2.1.2, the Chinese have been working on the Uniform Office Format (UOF), another XML-based standard (McAllister, 2006). This is similar to both ODF and OOXML in that it is an XML-based format that makes use of Zip file containers. UOF seems to have been developed to cater for the needs of Chinese users, including specific language requirements, plus the requirement for cost effective software licensing.
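
Because ODF and OOXML both package their XML parts inside a Zip container, software that has to cope with more than one format can make a first guess at a file's family by looking at the container's entries. The Python sketch below is a simplified illustration, not a validator: it relies on the `mimetype` entry used by ODF packages and the `[Content_Types].xml` entry used by OOXML packages, and the function names are hypothetical. UOF is not handled, since its internal layout is not described in this report.

```python
import io
import zipfile


def sniff_package(data: bytes) -> str:
    """Guess the family of a Zip-based office document from its entries."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "not a zip package"
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = set(package.namelist())
        if "mimetype" in names:
            mimetype = package.read("mimetype").decode("ascii", "replace").strip()
            if mimetype.startswith("application/vnd.oasis.opendocument"):
                return "ODF"
        if "[Content_Types].xml" in names:
            return "OOXML"
    return "unknown zip package"


def make_zip(entries: dict[str, bytes]) -> bytes:
    """Build an in-memory Zip archive (helper for the demonstration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name, payload in entries.items():
            package.writestr(name, payload)
    return buffer.getvalue()


if __name__ == "__main__":
    odf_like = make_zip({"mimetype": b"application/vnd.oasis.opendocument.text"})
    ooxml_like = make_zip({"[Content_Types].xml": b"<Types/>"})
    print(sniff_package(odf_like), sniff_package(ooxml_like))
```

A check of this kind only identifies the packaging convention; it says nothing about whether the XML inside conforms to the relevant standard.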
Efforts are underway to harmonise UOF with ODF32, thus ensuring interoperability between standards, and in the meantime a plug-in is available to convert between ODF and UOF33. Microsoft is also supporting the development of an OOXML/UOF converter34. OpenMalaysia, an online source covering standards, reports that UOF compatible applications are being rapidly adopted by Chinese government agencies (Yoonkit, 2007). One such application is RedOffice, which is based on the OpenOffice.org codebase, and is thus also providing widespread ODF support to Chinese users, in addition to UOF.

It seems likely that all the office documentation packages (offline and online) will support ODF, OOXML and PDF, either natively or through plug-in converter packages, to varying degrees of ability. One consequence may be that user domains emerge with a split between formats: for example, public bodies and governments, who currently have a strong propensity towards use of ODF, might continue with this format, whilst the commercial sector embraces OOXML. Alternatively, there may be more of a mix, with users simply getting used to exchanging files in one format or another and continually switching between the two. Eventually one may emerge as a kind of de facto 'winner'. A second scenario might see this current round of XML adoption fail in a similar way to the

31 Zoho Writer: http://www.zoho.com/

32 A draft OASIS charter related to the harmonisation of UOF and ODF is available from http://www.oasis-open.org/archives/office/200609/msg00029.html [last accessed 19/06/07].

33 Conversion between UOF and ODF formats is supported by a plug-in available from http://odf-to-uof.sourceforge.net/index.html [last accessed 19/06/07].

34 Details of an OOXML/UOF converter for Microsoft Word are available from http://uof-translator.sourceforge.net/ [last accessed 19/06/07].

failure of ODA (see section 2). Indeed, some commentators argue that if the standardisation process fails in this manner, what might actually happen is a reversion to binary formats and the loss of the many advantages of moving to XML (Lie, 2007). Although this could be considered an unlikely outcome, it is not impossible.

An alternative scenario is one outlined by Tim Bray (co-inventor of XML and currently an employee of Sun Microsystems) in a blog entry in 2005, which is to work towards one standard which provides both a universal standard for basic editable documentation and deals with the issue of the enormous legacy of Microsoft documents: "The ideal outcome would be a common, shared, office-XML dialect for the basics—and it should be ODF (or a subset), since that’s been designed and debugged—then another extended vocabulary to support Microsoft features whether they’re cool new whizzy features or mouldy old legacy features (XML Namespaces are designed to support exactly this kind of thing)." (Bray, 2005). This point of view is, to say the least, not without its detractors (see, for example, Robert Scoble (2005), an ex-Microsoft employee).

XHTML as a format

Another option is to consider the merits of (X)HTML as a format, not only for webpages, but also as a kind of universal or common format for all office documents. For example, Lie (2007), a director at the company that built the Opera Web-browser, recently argued that a common standard is the way forward, but that it should be built upwards from the existing W3C Recommendations for (X)HTML and CSS. Barnes (2007) concurs and outlines the view of some of those involved in document preservation, which is that XHTML can handle both the preservation and access aspects of long-term document storage and that it may be adequate for long-term preservation of simple documents that will be viewed directly by a browser.
The idea of XHTML as a common format was also discussed during the EU deliberations over the Valoris report, but Valoris rules out XHTML as a general purpose document format, citing fidelity issues (p.30). Tim Bray has written about this idea, arguing that the European Union, when considering document formats during the Valoris report, should have considered XHTML as a standard office document format (Bray, 2004). Jon Udell agrees, and adds that the reason XHTML is so useful is that, through the hypertext link, it is well adapted to the modern working environment in which we use a 'network of linked documents', each of which is tied to the context in which it was created (an email thread, for example) (Udell, 2004). This latter point emphasises the need to consider the process of creating and reusing documentation in the context of human workflow, an issue that, although important, has been beyond the scope of this report. These debates may well run for some time.

4.4 Semantic Web

Looking a little further into the future, the ideas that are driving the development of the Semantic Web will also impact on office documentation (Carr et al., 2004). The Semantic Web vision rests, in part, on the provision of machine-processable information about meaning, embedded in files that are available on the Web. The addition of this machine-processable information is a difficult process and an area of active research.

As we have seen, office documentation is moving into a new phase in which the information is separated from the presentation. Researchers in the Semantic Web argue that this can be taken a stage further with the addition of knowledge (embedded in the document during the authoring stage) that can later be automatically extracted by machine. This process is known as semantic annotation or knowledge mark-up, and proponents argue that in future this may well be part and parcel of the everyday creation of documentation.
Early examples of this technique include SHOE35 and Adobe XMP36. The former provides for additional HTML tags that can be included in a webpage and the latter allows Semantic Web RDF constructs to be embedded into PDF files. Experimental work into developing authoring tools in this direction is ongoing, but WickOffice (Carr et al., 2004) and SemanticWord37 (a Microsoft Word-based environment, which adds toolbars to the standard environment to facilitate the addition of semantic annotation) are examples of early-stage demonstrator projects (Oliveira and Lima-Marques, 2006).

35 http://www.cs.umd.edu/projects/plus/SHOE/

36 http://www.adobe.com/products/xmp/

37 http://mr.teknowledge.com/daml/SemanticWord/SemanticWord.htm
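
To make the idea of semantic annotation more concrete, the following minimal Python sketch shows knowledge being embedded in an XML document at authoring time and later extracted by machine. It is an illustration only: the `ann` namespace, its URI, and the `fact` element with its `subject` and `predicate` attributes are hypothetical and invented for this example. They are not the SHOE or XMP vocabularies, and a real system would more likely use RDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation namespace, invented for this sketch only.
ANN_NS = "http://example.org/annotation"

# A tiny document in which the author has marked up two facts inline.
DOCUMENT = """\
<doc xmlns:ann="http://example.org/annotation">
  <p>This report was written by
    <ann:fact subject="report" predicate="author">Walter Ditch</ann:fact>
    and published in
    <ann:fact subject="report" predicate="published">August 2007</ann:fact>.
  </p>
</doc>
"""


def extract_facts(xml_text):
    """Return (subject, predicate, object) triples found in the annotations."""
    root = ET.fromstring(xml_text)
    triples = []
    for fact in root.iter("{%s}fact" % ANN_NS):
        obj = (fact.text or "").strip()
        triples.append((fact.get("subject"), fact.get("predicate"), obj))
    return triples


def plain_text(xml_text):
    """Return the human-readable text with all mark-up removed."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


FACTS = extract_facts(DOCUMENT)

if __name__ == "__main__":
    print(plain_text(DOCUMENT))
    for triple in FACTS:
        print(triple)
```

The same file serves both audiences: a reader sees ordinary prose, while a program can recover the embedded statements without any natural-language processing.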


5. Implications for education

As already noted, HE is strongly supportive of the use of open standards as they enable interoperability. This policy is, however, somewhat divorced from the everyday reality of the context in which most people create and use office documents. The current furore over the future of office documentation formats presents Higher and Further education with a number of challenges and opportunities.

5.1 Fidelity and backwards compatibility: the challenge of long-term data storage

There are many situations within HE/FE in which long-term storage of, and access to, documents is highly desirable (for example, libraries) or even, in some cases, a legal requirement (for example, records management, tax-related financial data etc.). As Stanescu (2004) and Day (2006) make clear, it is important for longevity that documents are stored on suitable media and in a suitable format, and that measures for the latter include authenticity (fidelity) and support for backwards compatibility. The preservation of documents over long periods of time can be considered to have two goals: to make information available for re-use or re-purposing in the future and to retain the exact look and feel of a document over a long period of time (CENDI, 2006).

The developers of office systems are aware of these requirements, which are not unique to education, and have introduced measures to help support this. For example, Adobe has developed a sub-type of the PDF format (PDF/A), intended specifically for data archiving, and this has recently been approved by ISO/IEC as a standard (LeFurgy, 2003). The increasing availability of XML-based formats may also simplify the task of data archiving, particularly where these formats are the basis of open standards and are supported by multiple office applications. Indeed, some archivists argue that the choice is one of four depending on the scenario: PDF, PDF/A, TIFF38 or XML (CENDI, 2006)39.
This is a complicated area that needs a whole report of its own, but for the purpose of our discussion we will stay focused on XML. Because XML is a text-based format in which the actual material is encoded using an international, open standard and is therefore human-readable, it is inherently more sustainable as a data format (LeFurgy, 2003). The nature of XML means that content is separated from presentation, so content can easily be repurposed to new data formats by using technologies like XSLT40. A very small corruption of a binary file can make it unprocessable, but this is not the case with XML41. In the long term, even in situations where the original software that created the XML is lost and the platform it used consigned to history, it is more than likely that the file will still be readable, and capable of being translated and processed.

As already mentioned, the issue of format fidelity and backwards compatibility has proven to be a very contentious point. The British Library, for example, considers this to be so important that it has publicly backed Microsoft and Ecma in their attempts to have OOXML approved as a standard. The National Archives also has a strong interest in this debate as 95% of all Government and court records are now produced electronically. The Archive has a legal commitment to preserving these important public records and is actively developing new ways of working as it moves from paper to electronic records through its Seamless Flow Programme42.

38 TIFF stands for Tagged Image File Format, and provides a pixel-by-pixel representation of an image or document's contents.

39 A further discussion of file formats for preservation is available: http://dlib.org/dlib/december05/johnson/12johnson.html

40 Extensible Stylesheet Language Transformations, a W3C Recommendation, allows the easy conversion of data that conforms to one XML schema to another.

41 Although it is worth noting that Barnes (2007) makes the point that when the XML is stored in compressed Zip files (as is the case with both ODF and OOXML) it is still prone to such corruption.

The move towards XML-based formats has led to considerable debate within archiving and preservation circles. Barnes (2007), for example, argues that the move to XML-based formats within word processing applications is an improvement in terms of preservation considerations, but considers OOXML to be 'unsuitable' and ODF to have a number of disadvantages. He argues for the use of ODF as an intermediate format between a word processed document (which could be written using one of the packages that support ODF either natively or with plug-ins) and DocBook XML. This approach, of taking XML-based formats generated by everyday word processing applications and using XSLT translation in order to produce an XML format suitable for preservation, is also outlined by Dobratz (2005).

Others consider there are other, more attractive, ways of preserving documentation. Day (2006) outlines the use of emulation technologies (in order to recreate a preserved document's exact look and feel) including the experimental use of IBM's Universal Virtual Computer by the National Library of the Netherlands. Inevitably, such discussions are widened to consider the full authoring and publishing workflow. It is likely that the HE archiving and preservation community will need to instigate further work to establish its requirements and develop a policy for the move to XML.

5.2 Opportunities

We are entering a new operational environment for two main reasons: firstly, there is currently a high-level international debate over the way forward for XML-based formats for office documents.
Secondly, a new version of Microsoft's Office application has been released in the last few months, but is yet to have a significant impact on education (see appendix B), a sector that is notoriously slow to upgrade (OSS Watch, 2007). However, based on OSS Watch's estimation of upgrade cycles of smaller information systems, it is reasonable to expect that this process may well gather pace in the next two to five years. This means that there is a window of opportunity to pause and take stock of the situation with regard to the production and exchange of office documentation, and to produce robust guidelines for JISC and institutions as to how best to take advantage of the benefits offered by upgrading to XML-based office formats.

However, there is also a third, more speculative factor to take into account. These events are taking place against a background in which the student population is making increasing use of open source solutions, provided free of charge. Indeed, some institutions are actively encouraging this process by, for example, providing CD-ROMs or USB keys (Sayer, 2007) with a selection of pre-installed open source packages. There is also a more general move towards the uptake of open source within HE and a recent OSS Watch report documented a major study of the sustainability of open source software as a viable option for higher education (OSS Watch, 2007).

42 http://www.nationalarchives.gov.uk/documents/faqs-for-general-public.pdf

5.2.1 Open standards and policy

As well as the policy and strategy context in which universities and colleges operate with regard to file formats, there are several practical issues with regard to choosing an appropriate file format for the tasks within education. Many factors come into play and this is borne out by the fact that although bodies like JISC support the use of open standards in general, the reality of office documentation usage, on the ground, within universities and colleges, is that closed formats are widely used.

5.2.2 Defining open standards

It has been noted elsewhere that the Higher and Further education communities in the UK have a culture which is broadly supportive of open standards. There has already been some work on the issue of open standards within the education community (e.g. Kelly et al., 2007). However, bearing in mind recent EU activity in this area, it may now be appropriate to revisit this work in the light of PEGSCO and try to develop practical advice for those looking to move to XML-based office document formats.

5.2.3 Raising awareness and selection of appropriate formats

A number of possible office document formats are available, as we have discussed in section three. Sensible selection of format, based on identification of requirements for internal use and external publication, can reduce the need to publish documents using multiple formats, while ensuring convenient access to information. However, it may be necessary to raise awareness of the role of the end user when deciding on the appropriate file format for a particular document type, in line with PEGSCO recommendation 6.2 (see section 2.5). This could be used to develop specific guidelines, in line with PEGSCO 6.3, that tie in open standards best practice in such a way that they can be easily interpreted and implemented by staff within institutions who have responsibility for software purchasing decisions. For example, PEGSCO recommendation 6.7 advocates that users should be free to choose their own preferred file formats when opening or saving documents, and these formats should be equally convenient to use, subject to any inherent limitations.
This is a useful 'first principle' for helping people to understand why it is important for vendors of office software to provide native support for a broad range of open standards, and why the development of optional and third party plug-ins should be reserved for specialist scenarios, rather than for use with widely available open standards. Practical considerations that would be factors in any discussion of formats are:

• Degree of interoperability and openness of file format
• IPR issues
• Appropriateness of associated office applications that support the format (e.g. costs, platform, availability etc.)
• Long term sustainability of storage and backwards compatibility with legacy formats
• Fidelity requirements

5.2.4 IPR and patent issues

Organisations are understandably reluctant to make use of file formats or office applications/packages which have unresolved IPR 'baggage'. However, it is beyond the scope of this report to form a valid assessment of the issues as they are presented in the media, and it is recommended that JISC should consider commissioning work in this area.


6. Conclusions and recommendations

The tacit acceptance of proprietary office file formats as a way of achieving interoperability is becoming less acceptable. Government agencies, in particular, are becoming increasingly conscious of the need to provide easy access to electronic documents to all stakeholders, while not requiring them to purchase a particular software product in order to view or edit these documents. The requirement to provide long term availability and archiving of documents is also encouraging a move away from proprietary file formats, particularly where future access might only be via a single supplier's software products. JISC and the wider HE/FE community, as part of the public sector, will be required to address these issues with respect to how they deal with the publication of electronic documents, and the internal/external transference of document files.

The pressure to move to open file formats has been ongoing for several years, but as Microsoft, the market leader in office document formats, has arguably been slow to move from its proprietary, binary file formats, the net effect, for most people, has been the continued use of proprietary, de facto standards. However, due to pressure to move towards open document file formats, Microsoft has implemented a staged introduction of XML-based formats for its Office suite. With the release of Office 2007, the transition has been formalised, and existing users who upgrade to Office 2007 will, essentially, be upgrading to a form of XML. However, in developing its XML-based Office suite, Microsoft has chosen not to support the international standard43 but, rather, to develop its own specification (OOXML), which, it argues, has been designed to provide better backwards compatibility with Microsoft binary file formats. The OOXML specification was approved by the Ecma International consortium, which is now trying to have OOXML approved as an alternative ISO standard.
This has created a huge amount of controversy. Not only are there concerns from the EU about the possible creation of two international standards in the same area, but the manner in which the process has been managed, and the quality of the Ecma specification document itself, have aroused anger and distrust. The arguments and counter-arguments are numerous and complicated. The reality is that even if OOXML is not approved by ISO, JISC and HEIs will still be operating in a 'two-standard' world, one de jure and the other de facto.

JISC is very supportive of open standards as an aid to interoperability and the implementation of open source software, where appropriate. Most members of staff, however, across JISC and HE more generally, use, and perhaps more significantly, are used to using Microsoft Office. However, it should be stressed that the real arguments are not as straightforward as this, and there may be significant implications for those involved in archiving and preservation activities. The widespread use of Adobe's PDF, and its status as the preferred format for uploading documents to the Web, complicates matters still further.

However, if JISC and HEIs are going to be able to take advantage of the increased interoperability and reusability on offer through XML, they need to start planning the switch to XML very soon. This means that they will have to take hard decisions about whether to support ODF or Microsoft Office (rather than OOXML per se, as it appears that Ecma-376, the OOXML specification, may not fully specify the format used in Office). Institutions looking to upgrade to XML will need to be confident that they are making the most appropriate decision for their situation, and this will need to be made on the basis of identification of users' needs, costs, interoperability, and sustainability as well as existing and future public sector obligations. How JISC and HEIs respond to this issue over the next two to five years will have significant impact on the effectiveness of XML implementation for many years to come. Therefore, how HE responds now has the potential to affect us far into the future.

43 ISO 26300 (ODF) is the international standard for XML-based office document formats, and it is supported by a variety of software packages, the most notable being OpenOffice.org

Recommendations

Based on the potentially slow speed of user migration to Office 2007 and widespread third party support for older file formats, it is likely that legacy Microsoft file types may be preferred over newer XML formats for some time, especially within HE. Bearing in mind the long-term cost and interoperability implications of the switch, it is recommended that consideration should be given, as a matter of some urgency, to commissioning some work into the best approach to upgrading, possibly even involving JISC's international partners. This work should also take into account possible future scenarios with regard to the use of open source office document packages amongst students and the need to provide interoperability on campus.

In concert with this work, it is recommended that profile-raising should take place within JISC and institutions, about the implications of selecting appropriate file formats for document publication. It may also be appropriate to produce guidelines for staff and to revise these guidelines periodically (PEGSCO Recommendation 6.5).

In addition, given recent announcements related to alleged patent infringements, it may be appropriate to commission a study of the IPR issues surrounding the claims made about the different formats, in order to offer legally-based advice to educational decision makers considering implementation of products which might potentially be affected. This work should also inform the first recommendation.

In the meantime, users of Microsoft Office software may wish to identify a time-scale for upgrading to the latest version.
Those organisations not planning immediate upgrades may wish to install compatibility packs as an interim measure, allowing the latest Office file types to be opened and saved by currently installed software versions (see the Directions on Microsoft recommendation in Section 4). It could also be prudent to avoid the use of newly introduced features until these are supported by the majority of applications that are in widespread use.

Where special conditions exist for long term data storage requirements, it is likely that there will need to be a separate, specialist study for organisations or departments that are heavily involved with the secure, long-term storage of important records or archiving and preservation. As an interim measure, organisations may wish to review the Adobe PDF/A (ISO/IEC 19005-1:2005: Electronic document file format for long-term preservation) format, which is specifically intended for data archiving purposes, possibly in addition to storage of the original revisable office document format.

Provision of online office applications and related services is evolving rapidly. A possible area for further research relates to the potential use of such facilities for academic purposes. A TechWatch watching brief should be maintained in this area.


About the author

Walter Ditch is a lecturer at Middlesbrough College, specialising in electronics and engineering applications of ICT. Over the past twenty years, he has variously worked as an ICT manager, lecturer and author, most recently producing a 1,000-page e-book covering the open source office application, OpenOffice.org.

A graduate of the University of Sunderland (Electrical and Electronic Engineering), he also has an MSc in Information Technology Management and has worked as a Microsoft Certified Systems Engineer.

He can be contacted at: [email protected].


Appendix A: What are standards?

Summary of draft content for the forthcoming JISC Standards Watch report (Autumn/Winter 2007)

From the very earliest times, humans have used standards to help spread innovation and technical progress and to help develop markets (Temple, 2005). The use of alphabets, currencies and units of measurement are all examples of the use of standards. In general, 'standards' are commonly accepted agreements on ways to do or make things, and their use ('standardisation') contributes one or more of the following four functions in a modern society:

• Ensuring interoperability or compatibility between different parts of a product or between products as part of a system or network
• The provision of a minimum level of quality, which may be defined in terms of functionality or safety of products
• The reduction of variety, allowing for economies of scale
• The provision of information.

The first of these is the most relevant for any discussion of ICT and its associated technologies. Without widespread agreement over ways of working and standards associated with interoperability, technical achievements like the Internet would not have been possible.

A short history of standards

Although early efforts at standardisation were designed to increase efficiency and productivity on engineering projects, the potential to protect consumers also became an important factor. In the late 1950s, the British Standards Institution (BSI) extended the scope of their kitemark® standard to domestic goods, in order to enable consumers to ascertain the quality of products that were appearing on the market.

The lay person's understanding is that a standard can only be produced by a nationally or internationally recognised formal standards body such as the BSI (UK), DIN (Germany), and organisations such as ISO and IEEE for international standards. This is true in a very specific way, but in a more general sense, standards, particularly in the computer industry, are actually more complicated than this. As well as the formal standards bodies, there are other organisations which produce their own 'standards' and the status that these other standards accrue often depends, in part, on the status of the organisation that produced them. Pedersen and Fomin (2005) report that there are over 400 fora that are relevant to the development and adoption of ICT standards, all with their own blend of people representing different interests. A wide variety of international, European, national, and non-government agencies, as well as vendor organisations, academics and researchers, consortia, individual companies, and even individual people all get involved in the development of what might loosely be termed standards.

Standards in ICT

In the past, the process of formal standards development was very time consuming, with full ISO accreditation taking up to seven years (Cargill, 1997).
The reason for this was, partly, the need to be seen to be as democratic as possible, since, traditionally, some of the standards that are ratified by the formal standards bodies (FSBs) go on to form the basis for legal regulation within individual countries. Regulators are therefore expected to have a principled concern for the fullest democratic accountability during the development of what may become legally mandated requirements. In addition, formal standards are traditionally used in procurement decisions by central government departments.

However, in the late 1980s this became a problem for the fast-moving computer industry. The story has it that the speed of innovation meant that technical developments were increasingly time-sensitive, and that this led to the creation of faster-moving, industry-led consortia to act on standards development. In fact, Egyedi (2001a) argues that little quantitative data for this speed deficit exists, but, whatever the underlying reasons, one of the factors has to be the fact that the technology area with greatest growth at the time (the early Internet and Web) did not seek any formal standards body activity, but chose to work through the Internet Engineering Task Force (IETF) and W3C rather than the ISO44. Cargill (1997) states that the rise and growth in the number of consortia in the early 1990s was a 'major shock' to the ISO and other national bodies involved in IT standardisation, who saw a reduction in both their activities and in the number of members of their bodies. This, in turn, led to efforts to streamline some of the bureaucratic burdens within the FSBs. As an example, this reduced the time from start to becoming a draft standard in ISO/IEC from seven years in the late 1980s to three years by the mid 1990s. Two of the most important innovations in respect of speed of process were the introduction of Fast-Track in 1987 and PAS (Publicly Available Specifications45) in 1994 (Egyedi, 2001b). Fast-Track allowed consortia and membership fora who already had a formal relationship with JTC1's standards work to submit prepared technical specifications as Draft International Standards, thereby circumventing some of the earlier standards development processes.

However, although there is now a wide variety of consortia and formal standards bodies working in the technical arena, Egyedi (2001a) argues that a simplistic notion of consortia as fast but undemocratic and formal standards bodies as slow but democratic and open, is unfair.
This is a complex discussion and has been the subject of debate amongst people involved in standards development for the last 25 years. The interested general reader should bear these debates in mind when reviewing the progress of a technology through one or more of these types of bodies.

How are standards created?

The process of the creation and development of a standard is one way of understanding why different standards development organisations (SDOs) are seen as having different status. For example, proprietary standards are created by individual companies for use within their organisations and by their customers. Over time, successful proprietary standards may develop, through the market, into what are called de facto standards. In this case, it is the 'fact' that a great many people, possibly everyone, are making use of a proprietary standard that makes it a standard; no formal standards-making process has taken place.

Other standards are created by a much more collective process. In these cases, committees of manufacturers, research organisations, government departments and consumers work together to draw up standards. These collective standards often start life as the ideas and technical specifications of university researchers, professional bodies, learned societies or private companies rather than being created from scratch. In fact, the quality of the technical specification can also be used as a way of judging the quality of a standard.

44 This was compounded by the failure of the Open Systems Interconnection (OSI) standardisation effort. OSI was a major attempt in the 1980s by ISO's JTC1 committee to standardise computer communication around a seven-layer model in order to allow any operating system to communicate easily with another. The process failed, in large part, because of the need to find consensus from amongst many vested interests. The Internet was then widely taken up as the solution to the inter-communication problem without the need for further formal standards work. The failure of OSI was seen as a signal event in the story of formalised standards versus the industry consortia (see Cargill, 1997 for the full story).

45 In full: JTC1 N3660, ISO/IEC Directives – Procedures for the Technical Work of JTC1 on Information Technology – Edition 3-supplement 1: The transposition of Publicly Available Specifications into International Standards.


Different organisations have widely varying processes for the initiation, development and acceptance of a standard. The complexity of these processes is often a reflection of the number of stakeholders involved and the geographical reach of a standard. In addition, standardisation is not simply a process of resolving technical issues; political, economic and administrative factors often also come into play (Jakobs et al., 1996). A wide variety of formats for document drafts, working groups, committees and ballot processes exist which attempt to form, in one way or another, a common agreement through some form of voting or consensus-seeking.

What are open standards?

The idea of openness within standards creation has several elements: the process and speed by which the standard is created; the cost of accessing and using the standard; and any copyright or intellectual property impediments to implementing the standard. For example, proprietary standards can be open, in the sense that the pertinent information and detailed specification is public and open to all, but this is not often the case since such open proprietary standards offer less of a market advantage. Those overseen by national and international standards bodies, which are supposed to be imbued with inherent public good motivations, are far more likely to meet the criteria of openness.

A further categorisation takes note of the speed with which specifications are developed and ratified as standards. This is closely related to the concept of democracy, i.e. how varied the membership and how open the decision-making procedures within a standards body are, and whether any one interest can exert control over proceedings: "One of the elemental principles of standardization is that the future of a specification, once it is delivered to a standardization group, lies with the group and the market and not with a single vendor" (Cargill, 1997, p. 199).
Both speed and democracy are closely linked to the rise of standards development through industry-led consortia. The status of an industry standard is often benchmarked in relation to the processes that created it, as these can vary so significantly between consortia.

Appendix B: Numbers of office documents published on the Web, by filetype and Internet domain

Type / Description            All (.)        .com         .org        .edu        .gov     .co.uk    .org.uk     .ac.uk    .gov.uk

Adobe PDF / Microsoft XPS
  Adobe PDF (.pdf)        302,000,000  59,800,000  120,000,000  34,700,000  16,800,000  2,600,000  1,530,000  1,410,000  1,600,000
  Microsoft XPS (.xps)            823         570          107           4           0          0          0          0          0

Microsoft legacy formats
  Word processor (.doc)    44,900,000   2,350,000    2,050,000     277,000   1,720,000    277,000    318,000    379,000    830,000
  Spreadsheet (.xls)       15,600,000   1,450,000    1,020,000      42,000   1,340,000     42,000     18,900     25,300    106,000
  Presentation (.ppt)      14,700,000   1,360,000    1,610,000      25,600     675,000     25,600     34,500     87,400     16,000

Microsoft Office 2007 (OOXML) formats
  Word processor (.docx)          925         497           49           8           0          5          0          0          1
  Spreadsheet (.xlsx)             182         112            2           3           0          0          0          1          0
  Presentation (.pptx)            722         497            8           8           0          0          1          0          0

Open document formats
  Word processor (.odt)        91,900         910       19,600         738          59        169        105         84          7
  Spreadsheet (.ods)           21,100       1,270          515         119           3         54         42         90          1
  Presentation (.odp)          50,100         701       11,900         929          85         34         29        157          0
  Graphics (.odg)               1,660         240          273         121           6          6          1          1          0

Source: google.co.uk, 17th – 19th July 2007
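
As a rough aid to reading the table, the following Python sketch builds search strings in the site:/filetype: form used to gather the counts and works out each format family's share of the global ("All") column. The figures are copied from the table; the function names and the grouping dictionary are illustrative assumptions and are not part of the original method.

```python
# Global ("All") counts copied from the Appendix B table, grouped by format family.
ALL_COUNTS = {
    "Adobe PDF / Microsoft XPS": {"pdf": 302_000_000, "xps": 823},
    "Microsoft legacy formats": {"doc": 44_900_000, "xls": 15_600_000, "ppt": 14_700_000},
    "Microsoft Office 2007 (OOXML) formats": {"docx": 925, "xlsx": 182, "pptx": 722},
    "Open document formats": {"odt": 91_900, "ods": 21_100, "odp": 50_100, "odg": 1_660},
}


def build_query(domain, extension):
    """Compose a search string such as 'site:ac.uk filetype:pdf'."""
    return f"site:{domain} filetype:{extension}"


def group_totals(counts):
    """Sum the per-extension counts within each format family."""
    return {group: sum(by_ext.values()) for group, by_ext in counts.items()}


def group_shares(counts):
    """Return each family's fraction of all documents counted in the table."""
    totals = group_totals(counts)
    grand_total = sum(totals.values())
    return {group: total / grand_total for group, total in totals.items()}


if __name__ == "__main__":
    print(build_query("ac.uk", "pdf"))
    for group, share in group_shares(ALL_COUNTS).items():
        print(f"{group}: {share:.4%}")
```

Search-engine hit counts are estimates, so shares computed this way are best read as orders of magnitude rather than precise measurements.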

Search syntax example: site:ac.uk filetype:pdf

Note: Highlighted cells show a significant 'vendor led' effect in the global total (>1%). See: http://www.webskills4u.com/index.php/site/article/google_file_formats/ for method of data generation

References

ADOBE, 2007. Adobe to release PDF for industry standardization. Adobe press release, 29th January 2007. Available online at: http://www.adobe.com/aboutadobe/pressroom/pressreleases/200701/012907OpenPDFAIIM.html [last accessed: 01/08/07].

AHRENS, K. 2007. What About SVG? GullFOSS (Sun Microsystems blog). Available online from http://blogs.sun.com/GullFOSS/entry/what_about_svg [last accessed: 01/08/07].

ANDERSON, P. 2007. What is Web 2.0? ideas, technologies and implications for education. JISC Technology and Standards Watch. JISC: Bristol, UK. February 2007. Available online at: www.jisc.ac.uk/techwatch [last accessed: 01/08/07].

ANDERSON, T. 2007. Microsoft's Jean Paoli on the XML document debate. Tim Anderson's ITWriting (journalist's personal blog). Available online at: http://www.itwriting.com/blog/?page_id=187 [last accessed: 01/08/07].

ASLETT, M. 2007a. 'Legitimate concerns' raised over Microsoft's Office formats. Computer Business Review Online, 7th March 2007. Available online from http://www.cbronline.com/article_news.asp?guid=CCED1215-080E-4A71-A5B7-15027927C415 [last accessed: 03/04/07].

ASLETT, M. 2007b. Novell distances itself from Microsoft's patent claims. Computer Business Review, 16th May 2007. Available online from http://www.cbronline.com/article_news.asp?guid=BE40F937-452D-481F-B728-0262DE6FB8D4 [last accessed: 13/06/07].

BANGEMAN, E. 2006. Adobe may sue Microsoft over "PDF-killer". Ars Technica, 20th November 2006. Available online from http://arstechnica.com/news.ars/post/20061120-8254.html [last accessed: 12/04/07].

BARNES, I. 2007. The Digital Scholar's Workbench. In: CHAN, L., MARTENS, B. (eds) 2007. ELPUB2007, Openness in Digital Publishing: Awareness, Discovery and Access. Proceedings of the 11th International Conference on Electronic Publishing held in Vienna, Austria 13-15 June 2007, pp. 285-296. OEKK-Editions: Vienna, Austria. Available online: http://elpub.scix.net/cgi-bin/works/Show?159_elpub2007 [last accessed: 06/08/07].

BRAY, T. 2004. On Custom Schemas. Personal Blog, 18th June 2004. Available online at: http://www.tbray.org/ongoing/When/200x/2004/06/17/CustomSchemas [last accessed: 02/08/07].

BRAY, T. 2005. Thought Experiments. Personal Blog, 27th November 2005. Available online at: http://www.tbray.org/ongoing/When/200x/2005/11/27/Office-XML [last accessed: 01/08/07].

BUSINESS WEEK, 2006. More to life than the Office. BusinessWeek, 3rd July 2006. Available online at: http://www.businessweek.com/magazine/content/06_27/b3991412.htm [last accessed: 01/08/07].

CARGILL, C. 1997. Prelude. Standard View. Volume 5, Issue 4, December 1997, pp. 128-132. ACM Press: New York, USA.

CARR, L., MILES-BOARD, T., WOUKEU, A., WILLS, G., AND HALL, W. 2004. The case for explicit knowledge in documents. In: Proceedings of the 2004 ACM Symposium on Document Engineering (Milwaukee, Wisconsin, USA, October 28 - 30, 2004). DocEng '04. ACM Press, New York, NY, pp. 90-98. DOI= http://doi.acm.org/10.1145/1030397.1030417

CARRERA, D., D'ARCUS, B., EISENBERG, D., HUDSON, A. 2005. Format Comparison Between ODF and MS XML. Groklaw, 25th November 2005. Available online from http://www.groklaw.net/article.php?story=20051125144611543 [last accessed: 17/04/07].

CENDI, 2006 (revised 2007). Formats for Digital Preservation: A review of the alternatives and issues. Information Services and Use. Volume 27, Number 1-2, 2007, pp. 45 – 63. IOS Press: Alabama, USA. Available online: http://iospress.metapress.com/index/9V61373118049755.pdf [last accessed: 01/08/07].
CLARK, D. 2002. Do Web Standards and Patents Mix? Computer. Volume 35, Issue 10, pp. 19-22, October 2002. IEEE: New York, USA.
COREL, 2006. Corel WordPerfect Office to support Open Document Format and Microsoft Office Open XML. Corel press release, 29th November 2006. Available online at: http://www.corel.com/servlet/Satellite/us/en/Content/1153321430604?pressId=1164741065876 [last accessed: 01/08/07].
COVER, R. 2003. Microsoft Licenses Office 2003 XML Reference Schemas. Cover Pages (OASIS), 17th November 2003. Available online from http://xml.coverpages.org/LicenseOfficeSchemas.html [last accessed: 12/04/07].
DARGAN, P. 2005. Open Systems and Standards for software product development. Artech House Inc: Massachusetts, USA.
DAY, M. 2006. The Long-Term Preservation of Web Content. In: MASANES, J. (ed.) 2006. Web Archiving. Springer: Berlin, Germany.
DITCH, W. 2007. Using Google to Track File Format Usage on the Web. Webskills4u website. Available online at: http://www.webskills4u.com/index.php/site/article/google_file_formats/ [last accessed: 03/08/07].
DOBRATZ, S. 2005. Thinking the long term: the XML-based publishing workflow for handling theses and dissertations at Humboldt-University Berlin. 8th International Symposium on Electronic Theses & Dissertations, ETD2005. 28th-30th September 2005, Sydney, Australia. Available online at: http://adt.caul.edu.au/etd2005/papers/075Dobratz.pdf [last accessed: 03/08/07].
ECMA, 2006a. Ecma Office Open XML File Formats Standard - Final draft. Ecma International: Geneva, Switzerland. Available online from http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50_final_draft.htm [last accessed: 12/04/07].
ECMA, 2006b. Ecma International approves Office Open XML standard. Ecma International press release, 7th December 2006. Ecma International: Geneva, Switzerland. Available online at: http://www.ecma-international.org/news/PressReleases/PR_TC45_Dec2006.htm [last accessed: 02/08/07].
ECMA, 2007a. National Body Comments from 30-Day Review of the Fast Track Ballot for ISO/IEC DIS 29500 (Ecma-376) “Office Open XML File Formats”. Ecma International: Geneva, Switzerland, 28th February 2007. Available online from http://www.ecma-international.org/news/TC45_current_work/Ecma%20responses.pdf [last accessed: 10/04/07].
ECMA, 2007b. Ecma International creates TC46 to standardize XML Paper Specification. Ecma International press release, 3rd July 2007. Geneva, Switzerland. Available online at: http://www.ecma-international.org/news/PressReleases/Ecma%20creates%20TC46.htm [last accessed: 02/08/07].
E-GIF, 2005. E-GIF Technical Standards Catalogue. UK Cabinet Office. Version 6.2, September 2005. Available from: http://www.govtalk.gov.uk/documents/TSCv6.2_2005_7_14_final.pdf [last accessed: 12/04/07].
EGYEDI, T. 2001a. Beyond Consortia, Beyond Standardisation? Final Report for the European Commission. October 2001, Delft University of Technology: Netherlands. Available online: http://www.tbm.tudelft.nl/webstaf/tinekee/Report_EU_Beyond_Stand.pdf [last accessed: 01/08/07].

EGYEDI, T. 2001b. Why Java was – not – standardised twice. Computer Standards & Interfaces. Volume 23, Issue 4 (September 2001), pp. 253-265. Elsevier Science Publishers: Amsterdam, Netherlands.
ERICSON, R. 2007. Online Office Suites: The Winner Is Clear. ComputerWorld, 17th January 2007. Available online from http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9007884 [last accessed: 12/04/07].
FIORETTI, M. 2005a. OpenDocument office suites lack formula compatibility. NewsForge, 20th September 2005. Available online from http://software.newsforge.com/article.pl?sid=05/09/09/192250&from=rss [last accessed: 13/04/07].
FIORETTI, M. 2005b. Macros an obstacle to office suite compatibility. NewsForge, 17th September 2005. Available online from http://software.newsforge.com/article.pl?sid=05/09/09/1640253&tid=93 [last accessed: 13/04/07].
FISHER, K. 2006. Microsoft's PDF-killer heads towards standards body. ars technica, 15th October 2006. Available online from http://arstechnica.com/news.ars/post/20061015-7992.html [last accessed: 12/04/07].
FOLEY, M. 2007. Microsoft's Office 2007 team wants in on Web 2.0. ZDNet, 10th January 2007. Available online at: http://blogs.zdnet.com/microsoft/?p=194 [last accessed: 01/08/07].
FONTANA, J. 2007a. Massachusetts puts Open XML out for consideration. PC World, 4th July 2007. Available online at: http://www.pcworld.com/article/id,134156-c,opensource/article.html [last accessed: 01/08/07].
FONTANA, J. 2007b. Novell ships translator for OpenXML as fruit of Microsoft partnership. Linux World, 3rd May 2007. Available online at: http://www.linuxworld.com/news/2007/030507-novell-translator.html [last accessed: 01/08/07].
FSFE, 2007. Six questions to national standardisation bodies. Free Software Foundation Europe. Available online at: http://fsfeurope.org/documents/msooxml-questions [last accessed: 01/08/07].
GEYER, C. 2006. OpenDocument FAQ. Opendocument (OASIS), 6th September 2006. Available online: http://opendocument.xml.org/faq [last accessed: 13/04/07].
GOLDFARB, C., PRESCOD, P. 1998. The XML Handbook. Prentice Hall: New Jersey, USA.
GONZALEZ-ÁLVAREZ, J. R. 2006. Microsoft avoids MathML in Office XML format. Canonical Science Today. Available online from: http://canonicalscience.blogspot.com/2006/08/microsoft-avoids- -in-office-xml_22.html [last accessed: 16/04/07].
GROKDOC, 2007. EOOXML Objections. GrokDoc, 23rd January 2007. Available online from http://www.grokdoc.net/index.php/EOOXML_objections [last accessed: 13/04/07].
HELM, R. 2006. New File Formats in Office 2007. Directions on Microsoft, 4th December 2006. Available online at: http://www.directionsonmicrosoft.com/sample/DOMIS/update/2007/01jan/0107nffio2.htm [last accessed: 01/08/07].
ISO, 2007. Overview of the ISO system. ISO website. Available online at: http://www.iso.org/iso/en/aboutiso/introduction/index.html [last accessed: 02/08/07].
JAKOBS, K., PROCTER, R., WILLIAMS, R. 1996. Users and Standardization—Worlds apart? The example of Electronic Mail. Standard View, Volume 4, No. 4, December 1996, pp. 183 – 191. ACM Press: New York, USA.
JISC, 2005. Policy on open source software for JISC projects and services. JISC Executive, 24th January 2005. Available online at: http://www.jisc.ac.uk/fundingopportunities/open_source_policy.aspx [last accessed: 01/08/07].
JISC, 2006. JISC Review of standards organisations. Draft, internal document, September 2006.
JISC, 2007. JISC Strategy 2007-2009. JISC Executive. Available online at: http://www.jisc.ac.uk/aboutus/strategy/strategy0709.aspx [last accessed: 01/08/07].
KAPLAN, J. 2007. Spain Steps onto the ODF Map. Personal Blog, 31st July 2006. Available online at: http://jakaplan.blogspot.com/2006_07_30_archive.html [last accessed: 06/08/07].
KAWAMOTO, D. 2006. Microsoft, Adobe squabble over PDF. CNET News, 2nd June 2006. Available online from http://news.com.com/2100-1012-6079320.html [last accessed: 12/04/07].
KELLY, B., DUNNING, A., RAHTZ, S., HOLLINS, P., PHIPPS, L. 2006. A Contextual Framework for Standards. Proceedings of WWW2006, Edinburgh, Scotland, 22-26 May 2006. Conference Proceedings, Special Interest Tracks, Posters and Workshops (CD ROM).
KELLY, B., WILSON, S., METCALFE, R. 2007. Openness in Higher Education: Open Source, Open Standards, Open Access. Proceedings of the 11th International Conference on Electronic Publishing held in Vienna, Austria 13-15 June 2007, pp. 161-174. OEKK-Editions: Vienna, Austria.
KIRK, J. 2007. Norway likely to mandate open document formats. InfoWorld, 15th May 2007. Available online at: http://www.infoworld.com/article/07/05/15/norway-mandates-open-documents_1.html [last accessed: 06/08/07].
KRILL, P. 2006. Microsoft eyes "people-ready" software. InfoWorld, 5th April 2006. Available online at: http://www.infoworld.com/article/06/04/05/77167_HNwittssoftware2006_1.html [last accessed: 01/08/07].
LAMONICA, M. 2003. Microsoft pries open office 2003. ZDNet, 17th November 2003. Available online at: http://news.zdnet.com/2100-3513_22-5108018.html [last accessed: 01/08/07].
LEFURGY, W. 2003. PDF/A: Developing a File Format for Long-Term Preservation. RLG DigiNews, Vol. 7, No. 6 (December 2003). Available online at: http://www.rlg.org/preserv/diginews/diginews7-6.html#feature1 [last accessed: 01/08/07].
LIE, H. 2007. Microsoft's amusing standards stance. CNET News, 22nd February 2007. Available online at: http://news.com.com/Microsofts+amusing+standards+stance+-+page+2/2010-1013_3-6161285-2.html?tag=st.next [last accessed: 01/08/07].
LIN, D. 2003. Dual Tragedies: IP Rights in Industry Standards. Computer. Vol. 36, No. 2, February 2003, pp. 25-27. IEEE: New York, USA.
MACNAGHTEN, E. 2007. ODF/OOXML technical white paper. Free Software Magazine, Issue 17, 2nd May 2007. Available online from http://www.freesoftwaremagazine.com/articles/odf_ooxml_technical_white_paper?page=0%2C0 [last accessed: 13/06/07].
MAHLER, E. 2006. Open document formats. Pushing String (personal blog), 16th May 2006. Available online at: http://www.xmlgrrl.com/blog/archives/2006/05/16/open-document-architectures/ [last accessed: 01/08/07].
MCALLISTER, N. 2006. China aims to set a new office doc standard. InfoWorld, 4th December 2006. Available online at: http://www.infoworld.com/article/06/12/04/49OPopenent_1.html [last accessed: 01/08/07].
MICROSOFT, 2006. Microsoft expands document Interoperability. Microsoft press release. Available online at: http://www.microsoft.com/presspass/press/2006/jul06/07-06OpenSourceProjectPR.mspx [last accessed: 02/08/07].

MICROSOFT, 2007. Microsoft covenant regarding Microsoft Office 2003 XML Reference Schemas and Ecma Office Open XML File Formats. Microsoft Corporation: Washington, USA. Available online from: http://office.microsoft.com/en-us/products/HA102134631033.aspx [last accessed: 13/04/07].
MICROSOFT, 2007b. It's Coming: Mac BU announces intent to deliver Office 2008 for Mac. Microsoft press release, 9th January 2007. Available online at: http://www.microsoft.com/presspass/press/2007/jan07/01-09MacworldPR.mspx [last accessed: 01/08/07].
MINISTRY OF SCIENCE, 2007. Important Political Progress for Open Standard. Danish Ministry of Science, Technology and Innovation press release, 25th June 2007. Available online at: http://videnskabsministeriet.dk/site/frontpage/press/important-political-progress-for-open-standards [last accessed: 06/08/07].
MOGLEN, E. 2006. OpenDocument Opinion Letter. Software Freedom Law Center, 12th July 2006. Available online at: http://www.softwarefreedom.org/resources/2006/OpenDocument.html [last accessed: 01/08/07].
MOODY, G. 2006. Microsoft's Masterpiece of FUD. Linux Journal, 19th September 2006. Available online at: http://www.linuxjournal.com/node/1000097 [last accessed: 02/08/07].
OASIS, 2005. Sun OpenDocument Patent Statement. OASIS website, 29th September 2005. Available online from http://www.oasis-open.org/committees/office/ipr.php [last accessed: 13/04/07].
OASIS, 2006a. Open by Design: The Advantages of the OpenDocument Format (ODF). OASIS ODF Adoption TC, 10th December 2006. Available online from http://www.oasis-open.org/committees/download.php/21450/oasis_odf_advantages_10dec2006.pdf [last accessed: 10/06/07].
OASIS, 2006b. OpenDocument v1.0 (Second Edition) specification. OASIS ODF Adoption TC, 19th July 2006. Available online at: http://www.oasis-open.org/committees/download.php/19274/OpenDocument-v1.0ed2-cs1.pdf [last accessed: 02/08/07].
ODF ALLIANCE, 2007. Japan Becomes First Asian Nation to Embrace Open Software Standards Such as ODF. ODF Alliance press release, 9th July 2007. Available online at: http://www.odfalliance.org/press/Release20070710.pdf [last accessed: 06/08/07].
OLIVEIRA, E., LIMA-MARQUES, M. 2006. An architecture of authoring environments for the semantic web. In: MARTENS, B., DOBREVA, M. (eds.) 2006. Proceedings of ELPUB2006 Conference on Electronic Publishing, Bansko, Bulgaria, June 2006. IMI-BAS: Sofia, Bulgaria.
OPENOFFICE.ORG, 2004. Strategic Marketing Plan 2010. OpenOffice.org website. Available at: http://marketing.openoffice.org/strategy/v0.5.pdf [last accessed: 12/04/07].
OPENOFFICE.ORG, 2007a. The OpenOffice.org ODF Toolkit Project. OpenOffice.org website. Available at: http://odftoolkit.openoffice.org/ [last accessed: 06/03/07].
OPENOFFICE.ORG, 2007b. OpenOffice.org Stats Project. OpenOffice.org website. Available at: http://stats.openoffice.org/ [last accessed: 01/08/07].
ORLOWSKI, A. 2006. Belgium adopts open office doc format. The Register, 27th June 2006. Available at: http://www.theregister.co.uk/2006/06/27/belgium_odf/ [last accessed: 06/08/07].
OSS WATCH, 2007. Sustainability study: a case study review of open source sustainability models. April 2007. University of Oxford and JISC. Available online at: http://www.jisc.ac.uk/media/documents/programmes/distributedelearning/sustainabilitystudy-1[1].0.pdf [last accessed: 01/08/07].

OU, G. 2005. Performance analysis of OpenOffice and MS Office. ZDNet, 25th October 2005. Available online from http://blogs.zdnet.com/Ou/?p=120 [last accessed: 21/06/07].
PARLOFF, R. 2007. Microsoft takes on the free world. CNN Money.com (Fortune Magazine), 14th May 2007. Available online from http://money.cnn.com/magazines/fortune/fortune_archive/2007/05/28/100033867/index.htm [last accessed: 13/06/07].
PEGSCO, 2006. Conclusions and recommendations on Open Document Exchange Formats. IDABC: Brussels, Belgium. Available online at: http://ec.europa.eu/idabc/servlets/Doc?id=26971 [last accessed: 05/03/07].
PEDERSEN, M., FOMIN, V. 2005. Open Standards and Their Early Adoption. Research Report, Department of Informatics, Copenhagen Business School, November 2005. Available online at: http://ir.lib.cbs.dk/paper/ISBN/x656517335 [last accessed: 06/08/07].
POPOV, D. 2007. Two OpenXML translators compared. Linux.com, 19th March 2007. Available online at http://www.linux.com/article.pl?sid=07/03/12/1654255 [last accessed: 02/04/07].
PUTTICK, C. 2007. Preserving legacy files with ECMA Office Open XML (MSOOXML). ODF Alliance – Europe Action Group, April 2007. Available online at: http://www.odf-eag.eu/repository/white-papers/preserving-legacy-files-with-msooxml.pdf [last accessed: 01/08/07].
RICE, F. 2006. Introducing the Office (2007) Open XML File Formats. Microsoft: Washington, USA. May 2006. Available online from: http://msdn2.microsoft.com/en-us/library/aa338205.aspx [last accessed: 02/08/07].
RIDLING, Z. 2007. Word Processor Review. Donation Coder website, 14th June 2007. Available online at: http://www.donationcoder.com/Reviews/Archive/WordProcs/#textmaker [last accessed: 01/08/07].
RUTLEDGE, L. 2001. Multimedia Standards: Building blocks of the Web. IEEE Multimedia. Volume 8, Issue 3, Jul-Aug 2001, pp. 13 – 15. IEEE: New York, USA.
SAYER, P. 2007. French students to get open-source software on USB key. ComputerWorld, 2nd February 2007. Available online from: http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9010159 [last accessed: 01/08/07].
SCHWARTZ, E. 2006. IBM to adopt ODF for Lotus Notes. InfoWorld, 16th May 2006. Available online from http://www.infoworld.com/article/06/05/16/78380_HNibmodf2_1.html [last accessed: 12/04/07].
SCOBLE, R. 2005. Tim Bray wants Microsoft to make Office support ODF. Scobleizer, personal blog, 28th November 2005. Available online at: http://scobleizer.com/2005/11/28/tim-bray-wants-microsoft-to-make-office-support-odf/ [last accessed: 01/08/07].
SOININEN, A. 2005. Open standards and the problems with submarine patents. The 4th conference on Standardization and innovations in information technology, 2005. 21st-23rd September 2005, pp. 218-231.
STANESCU, A. 2004. Assessing the Durability of Formats in a Digital Preservation Environment. D-Lib Magazine. Vol. 10, No. 11. Available online at: http://dlib.org/dlib/november04/stanescu/11stanescu.html [last accessed: 01/08/07].
SUN MICROSYSTEMS, 2007. Sun Microsystems announces OpenDocument Format (ODF) Plug-in Application for Microsoft Office. Sun Microsystems press release, 7th February 2007. Available online at: http://www.sun.com/aboutsun/pr/2007-02/sunflash.20070207.1.xml [last accessed: 01/08/07].
SUTOR, B. 2006. Open Standards vs. Open Source: How to think about software, standards and Service Oriented Architecture at the beginning of the 21st century. Personal blog, May 2006. Available online at: http://www.sutor.com/newsite/essays/index.php [last accessed: 01/08/07].

TAC, 2004. TAC approval on conclusions and recommendations on open document formats. IDABC: Brussels, Belgium. Available online at http://ec.europa.eu/idabc/en/document/2592/5588 [last accessed: 05/03/07].
TEMPLE, P. 2005. An overview. In: The Empirical Economics of Standards. DTI Economics Paper No. 12, June 2005, pp. 8-38. DTI: London, UK.
TENHUMBERG, E., HARBISON, D., WEIR, R. 2006. Open by design: the format standard for office applications. UPGRADE. Vol. VII, No. 6, December 2006.
TURNER, A. 2007. Google targets PowerPoint but denies building Office-killer. ITWire, 19th April 2007. Available online from http://www.itwire.com.au/content/view/11425/1085/ [last accessed: 03/06/07].
UDELL, J. 2004. Open document Formats. InfoWorld, 17th June 2004. Available online at: http://weblog.infoworld.com/udell/2004/06/17.html#a1025 [last accessed: 03/08/07].
UN Joint Inspection Unit, 2005. Policies of United Nations system organizations towards the use of open source software (OSS) in the secretariats. UN Joint Inspection Unit: Geneva, Switzerland. Available for download at: http://www.unjiu.org/data/reports/2005/en2005_3.pdf [last accessed: 01/08/07].
VALORIS, 2003. Comparative assessment of Open Documents Formats Market Overview. IDABC: Brussels, Belgium. Available online at: http://ec.europa.eu/idabc/en/document/3439/5585#VALORIS [last accessed: 17/03/07].
W3C, 2004. W3C Patent Policy. W3C website, 5th February 2004. Available at: http://www.w3.org/Consortium/Patent-Policy-20040205/#def-RF [last accessed: 01/02/07].
WENZEL, E. 2006. Type and Travel: Web-based word processors. ZDNet, 28th November 2006. Available online at: http://reviews.zdnet.co.uk/software/productivity/0,1000001108,39284895,00.htm [last accessed: 01/08/07].
WEIR, R. 2007. Interoperability by Design. An Antic Disposition (Rob Weir's personal blog), 22nd May 2007. Available online at: http://www.robweir.com/blog/2007/05/interoperability-by-design.html [last accessed: 01/08/07].
WILSON, R. 2005. Software Patents. OSS Watch website, April 2005. Available from: http://www.oss-watch.ac.uk/resources/softwarepatents.xml [last accessed: 12/06/07].
YOONKIT, 2007. UOF and ODF comparison. Which should we choose? OpenMalaysia.com, 18th January 2007. Available online from: http://www.openmalaysiablog.com/2007/01/uof_and_odf_com.html [last accessed: 19/06/07].
ZDNET, 2006. Microsoft Office 14 to play roles. ZDNet, Between the Lines blog, 5th April 2006. Available online at: http://blogs.zdnet.com/BTL/?p=2835 [last accessed: 01/08/07].
