Deployment of XML for Office Documents in Organizations

JYVÄSKYLÄ LICENTIATE THESES IN COMPUTING 16 Eliisa Jauhiainen DeployPent of XML for OfÀFe DoFXPents in Organizations JYVÄSKYLÄ LICENTIATE THESES IN COMPUTING 16 Eliisa Jauhiainen Deployment of XML for Office Documents in Organizations UNIVERSITY OF JYVÄSKYLÄ JYVÄSKYLÄ 2014 Deployment of XML for Office Documents in Organizations JYVÄSKYLÄ LICENTIATE THESES IN COMPUTING 16 Eliisa Jauhiainen Deployment of XML for Office Documents in Organizations UNIVERSITY OF JYVÄSKYLÄ JYVÄSKYLÄ 2014 Editor Mauri Leppänen Department of Computer Science and Information Systems, University of Jyväskylä URN:ISBN:978-951-39-5600-4 ISBN 978-951-39-5600-4 (PDF) ISBN 978-951-39-5599-1 (nid.) ISSN 1795-9713 Copyright © 2014, by University of Jyväskylä Jyväskylä University Printing House, Jyväskylä 2014 ABSTRACT Jauhiainen, Eliisa Deployment of XML for office documents in organizations Jyväskylä: University of Jyväskylä, 201, 63 p. (+ four included articles) (-\YlVN\Ol/LFHQWLDWH7KHVHVLQ&RPSXWLQJ ISSN) ,6%1 (nid.), 978-951-39-5600-4 (PDF) Licentiate Thesis Majority of the content in organizations is stored as documents. Structured documents, like XML documents, allow the structure definitions, document instances, and layout specifications to be handled as separate entities. This is an important feature to realize from a document management point of view. A class of similar documents with the same structure constitutes a document type. The documents are built from components that are logical units of information within the context of the document type. Office documents are typically authored using word-processing software, they are relatively short in length, and intended for human consumption. The development of open office standards brought XML to organizations’ office en- vironments and changed the capabilities of using document content in ways that were previously impossible or difficult. The aim of this research is to ex- plore possible benefits of using custom XML schemas in office documents and factors influencing the design of such schemas. This study reports findings from two action research studies. Both cases involved designing custom XML schemas for office documents with the case organizations. The research resulted in the increased understanding of real-life requirements and challenges of XML schema design for office documents and provided insights of XML-based office document use after implementation. The research area changed significantly since the study was started. When the research began in 2005, the open XML standards for office documents were still under development, yet office suites available had already XML support enabling the use of custom XML schemas. The longevity of the study has provided an opportunity to closely observe the use of an XML-based office document authoring system using custom XML schemas in a case organization through the years. The contributions of this study are the insights to schema design efforts. The two cases revealed issues motivating organizations to con- sider custom XML schemas for their office documents and how XML document management can be analyzed and described. The use of schema design methods was observed to be beneficial in both cases. Keywords: document management, document analysis, schema design, office document Author’s address Eliisa Jauhiainen Department of Computer Science and Information Systems University of Jyväskylä, Finland Supervisors Airi Salminen Department of Computer Science and Information Systems University of Jyväskylä, Finland Anne Honkaranta Digia Plc Jyväskylä, Finland Reviewers Jari Multisilta CICERO Learning University of Helsinki, Finland Tero Päivärinta Department of Computer Science, Electrical and Space Engineering Luleå University of Technology, Sweden ACKNOWLEDGEMENTS I would like to thank both of my supervisors, Professor Airi Salminen and Anne Honkaranta, for their insightfulness, guidance and patience. It’s been a long road, but luckily filled with light and many opportunities to grow. I would also like to express my warmest thanks to the reviewers of this thesis, Jari Multisilta and Tero Päivärinta, for their contribution and insightful comments. This thesis marks an end of an interesting journey. Luckily I have shared it with the best travelling companions possible, a group of very near and dear ones in the Department of Information Systems and Computer Science. The la- dies known as the Tricksters (i.e. “huiputtajat”) - Maritta Pirhonen, Minna Sil- vennoinen, Irja Kankaanpää, and Marjo Silvennoinen - all of you made this road more fun to travel. Thanks for the laughs, shoulders to cry on, sympathetic ears, and many great discussions. Sharing thoughts, ideas and a wide spectrum of feelings with you on research and life in general has been both enlightening and empowering experience for me. I would also like to thank Reija Nur- meksela, my “partner in crime”. Writing research articles with you is always a pleasure. Your insightfulness never ceases to amaze me. Some people enjoy afternoon tea at five o'clock, but Seija Paananen and I enjoy our seven o’clock coffee in the morning. Seija is without a doubt the heart of the Department of Information Systems and Computer Science, and one of the most significant supporters for me throughout these years. Your positivity is truly inspiring! I would also like to express my gratitude to the hardworking people in the Faculty Office. If I have done anything to make your everyday work even a lit- tle bit easier, I’ll take that as one of the most important accomplishment of these past years. Don’t ever forget how important and valuable work you do for the faculty staff every day. Jyväskylä 31.5.2013 Eliisa Jauhiainen LIST OF FIGURES FIGURE 1 XML element tree ................................................................................... 13 FIGURE 2 Components of a content management environment ...................... 18 FIGURE 3 XML document conforming to the DocBook DTD ........................... 20 FIGURE 4 Research process of the thesis .............................................................. 33 FIGURE 5 Example of component list for memo document type ..................... 35 FIGURE 6 Document hierarchy model .................................................................. 36 FIGURE 7 Document component model ............................................................... 38 FIGURE 8 Top-level analysis of four document types ........................................ 40 FIGURE 9 Simple example of reuse map for four document types .................. 40 FIGURE 10Example of an information model ....................................................... 41 LIST OF TABLES TABLE 1 Summary of the characteristics of XML documents ........................... 14 TABLE 2 Interpretations of components identified from memo document type ......................................................................................... 16 TABLE 3 Comparison of standards ....................................................................... 23 TABLE 4 Comparison of methods ......................................................................... 43 CONTENTS 1 XML FOR OFFICE DOCUMENTS ...................................................................... 11 1.1 XML documents ........................................................................................ 12 1.2 Data-centric vs. document-centric XML ................................................ 13 1.3 XML document components ................................................................... 15 1.4 Document management in offices .......................................................... 17 1.5 Standard schemas for XML documents ................................................. 19 1.5.1 DocBook ............................................................................................ 19 1.5.2 OpenDocument Format .................................................................. 20 1.5.3 The Office Open XML ..................................................................... 21 1.5.4 Comparison of the standards ........................................................ 22 1.6 Office document standardization ........................................................... 23 2 RESEARCH GOAL AND METHODOLOGY .................................................... 25 2.1 Research objectives ................................................................................... 25 2.2 Research approach and research process .............................................. 26 2.2.1 Action research ................................................................................ 26 2.2.2 Case 1: The MemoX system ........................................................... 27 2.2.3 Case 2: RAKE project ...................................................................... 30 2.3 Research process ........................................................................................ 32 3 SCHEMA DESIGN METHODS ........................................................................... 34 3.1 The Maler and El Andaloussi method ................................................... 34 3.2 Document Engineering Approach .......................................................... 37 3.3 Unified Content Strategy ......................................................................... 39 3.4 Comparison of the methods .................................................................... 42 4 SUMMARY OF THE INCLUDED ARTICLES .................................................. 44 4.1 Article 1: “Two Methods

Deployment of XML for Office Documents in Organizations

Markdown Markup Languages What Is Markdown? Symbol

Using NROFF and TROFF

Looking to the Future by JOHN BALDWIN

Interface Specification – Archive Content Services

A User's Guide to the Lout Document Formatting System

Triroff, an Adaptation of the Device-Independent Troff for Formatting Tri-Directional Text

Unix Programmer's Manual

Lilypond Informations Générales

Lilypond Allgemeine Information

Markup 101: Markup Basics Cynthia Zender, SAS Institute, Inc., Cary, NC

Electronic Document Preparation Pocket Primer

Nroff/Troff User's Manual