Published in: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries - JCDL '18 / Chen, Jiangping et al. (eds.). - New York : ACM Press, 2018. - pp. 233-242. - ISBN 978-1-4503-5178-2

Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context

Moritz Schubotz¹, André Greiner-Petter¹, Philipp Scharpf¹, Norman Meuschke¹, Howard S. Cohl², Bela Gipp¹

¹ Information Science Group, University of Konstanz, Germany ([email protected])
² Applied and Computational Mathematics Division, NIST, U.S.A. ([email protected])

ABSTRACT

Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial for communicating information, e.g., in scientific papers, and for performing computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and the content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the

1 INTRODUCTION

In STEM disciplines, i.e., Science, Technology, Engineering, and Mathematics, mathematical formulae are ubiquitous and crucial for communicating information in documents, such as scientific papers, and for performing computations in computer algebra systems (CAS). Mathematical formulae represent complex semantic information in a concise form that is independent of natural language.