An Open Markup Format for Mathematical Documents (Version 1.1)
OMDoc: An Open Markup Format for Mathematical Documents (Version 1.1)

Michael Kohlhase
Computer Science, Carnegie Mellon University
Pittsburgh, Pa 15213, USA
http://www.cs.cmu.edu/~kohlhase

October 5, 2005

Abstract

In this report we present a content markup scheme for (collections of) mathematical documents including articles, textbooks, interactive books, and courses. It can serve as the content language for agent communication of mathematical services on a mathematical software bus. We motivate and describe the OMDoc language and present an XML document type definition for it. Furthermore, we discuss applications and tool support.

This document describes version 1.1 of the OMDoc format. This version is mainly a bug-fix release made necessary by experiments in encoding legacy material and theorem prover interfaces in OMDoc. The changes are relatively minor, mostly adding optional fields. Version 1.1 of OMDoc freezes the development so that work on version 2.0 can begin. In contrast to the OMDoc format, which has not changed much, this report is a total rewrite: it closes many documentation gaps, clarifies various remaining issues, and adds a multitude of new examples.

Contents

1 Introduction
2 Mathematical Markup Schemes
  2.1 Document Markup for the Web
  2.2 XML, the eXtensible Markup Language
  2.3 Mathematical Objects and Formulae
  2.4 Meta-Mathematical Objects
  2.5 An Active Web of Mathematical Knowledge
3 OMDoc Elements
  3.1 Metadata for Mathematical Elements
    3.1.1 The Dublin Core Elements
    3.1.2 Roles in Dublin Core Metadata
  3.2 Mathematical Statements
    3.2.1 Specifying Mathematical Properties
    3.2.2 Symbols, Definitions, and Axioms
    3.2.3 Assertions and Alternatives
    3.2.4 Mathematical Examples in OMDoc
    3.2.5 Representing Proofs in OMDoc
    3.2.6 Abstract Data Types
  3.3 Theories as Mathematical Contexts
    3.3.1 Simple Inheritance
    3.3.2 Inheritance via Translations
    3.3.3 Statements about Theories
    3.3.4 Parametric theories in OMDoc
  3.4 Auxiliary Elements
    3.4.1 Preservation of Text Structure
    3.4.2 Non-XML Data and Program Code in OMDoc
    3.4.3 Applets in OMDoc
    3.4.4 Exercises
  3.5 Adding Presentation Information to OMDoc
    3.5.1 Specifying Style Information for OMDoc Elements
    3.5.2 Specifying the Notation of Mathematical Symbols
  3.6 Identifying and Referencing OMDoc Elements
    3.6.1 Locating OMS elements by the OMDoc Catalogue
    3.6.2 A URI-based Mechanism for Element Reference
    3.6.3 Uniqueness Constraints and Relative URI references
4 OMDoc Applications, Tools, and Projects
  4.1 Transforming OMDoc by XSLT StyleSheets
    4.1.1 OMDoc Interfaces for Mathematical Software Systems
    4.1.2 Presenting OMDoc to Humans
  4.2 QMath: An Authoring Tool for OMDoc
  4.3 MBase, an Open Mathematical Knowledge Base
  4.4 Project ActiveMath
    4.4.1 OMDoc Extensions
    4.4.2 Adaptive Presentation
    4.4.3 Integration of External Systems
    4.4.4 Current Status
5 Conclusion
A Errata to the released Specification
B Changes from Version 1.0
C Quick-Reference Table to the OMDoc Elements
D Quick-Reference Table to the OMDoc Attributes
E The OMDoc Document Type Definition

Chapter 1 Introduction

It is plausible to expect that the way we do (i.e. conceive, develop, communicate about, and publish) mathematics will change considerably in the next nine¹ years. The Internet plays an ever-increasing role in our everyday life, and most mathematical activities will be supported by mathematical software systems (we will call them mathematical services) connected by a commonly accepted distribution architecture, which we will call the mathematical software bus.
We will subsume all proposed architectures and implementations of this idea [FHJ+99, FK99, DCN+00, AZ00] by the term MathWeb. We believe that interoperability based on communication protocols will eventually make the construction of bridges between the particular implementations simple, so that the combined systems appear to the user as one homogeneous web.

One of the tasks that have to be solved is to define an open markup language for the mathematical objects and knowledge exchanged between mathematical services. The OMDoc format presented in this report attempts to do this by providing an infrastructure for the communication and storage of mathematical knowledge.

In chapter 2 we will describe the status quo of mathematical markup schemes before OMDoc and show that these markup schemes, while giving a good basis, are not sufficient for content-based markup of mathematical knowledge. They do not provide markup for mathematical forms like definitions, theorems, and proofs that have long been considered paradigmatic of mathematical documents like textbooks and papers. They also leave implicit the large-scale structure of mathematical knowledge. In particular, it has traditionally been structured into mathematical theories that serve as a situating context for all forms of mathematical communication.

¹ In the release document of OMDoc1.0 [Koh00c] we claimed that it would change in the next 10, and that was one year ago.

In chapter 3, we define the OMDoc markup primitives and motivate them from either particular structures in mathematical documents or from processing needs of computer-supported mathematics.
As all mathematical communication is in the form of (or can be transcribed to) mathematical documents such as publications, overhead slides, letters, e-mails, and in/output from mathematical software systems, OMDoc uses documents as a guiding intuition for mathematical knowledge, with the goal of providing a framework in which all of these forms can be accommodated. In accordance with this motivation, OMDoc provides a rich mix of elements of informal and formal mathematics. To model a particular kind of document in OMDoc, usually only a subset of these elements will be needed, e.g. informal ones for traditional mathematical textbooks, or formal ones for communication between software systems. However, the availability of both kinds of markup primitives in OMDoc allows the development of novel kinds of mathematical documents, in which formal and informal elements are intimately intermixed.

We will discuss current and intended applications of the OMDoc format in chapter 4 and discuss which applications will need which parts of the OMDoc format.

Finally, the appendix contains useful materials like the OMDoc document type definition and a quick reference table.

OMDoc Version 1.1

This document describes version 1.1 of the OMDoc format. Version 1.0 was released on November 1, 2001, after about 18 months of development, to give developers a stable interface to base their systems on. It has been adopted by various projects in automated deduction, algebraic specification, and computer-supported education. The experience from these projects has uncovered a multitude of small deficiencies and extension possibilities of the format, which have been discussed in the OMDoc community. Version 1.1 is an attempt to roll the uncontroversial and non-disruptive part of the extensions and corrections into a consistent language format. We have tried to keep the changes to version 1.0 conservative, adding optional attributes or child elements.
In some cases we had to introduce non-conservative changes to repair design flaws and inconsistencies of version 1.0. One example is the hypothesis element, which has received a required attribute discharged-in that is necessary for specifying the scope of local assumptions in proofs and cannot be inferred from the context. To minimize disruption, we have tried to keep changes like this one to a minimum for the elements that are in frequent use today. We are working on a new version (OMDoc2.0) that will incorporate re-organizations of central features of OMDoc like the definition element.

We have, however, re-organized some parts of the OMDoc format that are currently less used, in the expectation that this will make them more effective. Examples are the representations of complex theories (see sections 3.3.2 to 3.3.4) or the organization of non-XML data (section 3.4.2).

Finally, we have added new features that were missing from OMDoc1.0 and turned out to be important for the enterprise of representing mathematical knowledge. Examples of this are a new referencing scheme for OMDoc elements in section 3.6 and a new way of specifying presentation for OMDoc elements. In both cases, the method that was used in OMDoc1.0 for symbols is extended and generalized to arbitrary OMDoc elements. These extensions have found their way into OMDoc1.1, even though they are not totally fixed yet, since we expect to gain implementation experience for OMDoc2.0. They are non-disruptive, since they are strictly additional.

An element-by-element account of the changes is tabulated in appendix B.
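To make the non-conservative change to the hypothesis element more concrete, the following is a minimal sketch of how a tool might flag version 1.0 material that lacks the now-required discharged-in attribute. Only the hypothesis element and its discharged-in attribute come from the text above; the surrounding proof and derive elements, the id values, the text content, and the helper name hypotheses_missing_scope are illustrative assumptions, not taken from the OMDoc document type definition.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified proof fragment. Only `hypothesis` and its
# `discharged-in` attribute are named in the text; all other element names,
# ids, and the text content are assumptions made for illustration.
SAMPLE = """
<proof id="pf1">
  <hypothesis id="h1" discharged-in="d2">Assume n is even.</hypothesis>
  <derive id="d1">Then n*n is even.</derive>
  <hypothesis id="h2">Assume m is odd.</hypothesis>
  <derive id="d2">Hence the claim.</derive>
</proof>
"""


def hypotheses_missing_scope(xml_text):
    """Return the ids of hypothesis elements that lack discharged-in."""
    root = ET.fromstring(xml_text)
    return [
        h.get("id")
        for h in root.iter("hypothesis")
        if h.get("discharged-in") is None
    ]


# The second hypothesis has no discharged-in attribute, so it is reported.
print(hypotheses_missing_scope(SAMPLE))  # ['h2']
```

A check of this kind only tests for the presence of the attribute; deciding which proof step actually discharges an assumption still requires the author's knowledge, which is why the attribute cannot be filled in automatically.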
Acknowledgments

Of course the OMDoc format has not been developed by one person alone; the original proposal was taken up by several research groups, most notably the Ωmega group at Saarland University, the InKa and ActiveMath projects at the German Research Center for Artificial Intelligence (DFKI), the RIACA group at the Technical University of Eindhoven, the In2Math project at the University of Koblenz, and the CourseCapsules project at Carnegie Mellon University. They have discussed the initial proposals, represented their materials in OMDoc, and in the process refined the format with numerous suggestions and discussions (see http://www.mathweb.org/~mailists/omdoc for the archive of the OMDoc mailing list).

The author specifically would like to thank Serge Autexier, Olga Caprotti, David Carlisle, Claudio Sacerdoti Coen, Arjeh Cohen, Armin Fiedler, Andreas Franke, George Goguadze, Dieter Hutter, Erica Melis, Paul Libbrecht, Martijn Oostdijk, Alberto Palomo Gonzales, Martin Pollet, Julian Richardson, Manfred Riem, and Michel Vollebregt for their input, discussions, and feedback from implementations and applications.