Using NLG for Speech Synthesis of Mathematical Sentences
Alessandro Mazzei, Università degli Studi di Torino, [email protected]
Michele Monticone, Università degli Studi di Torino, [email protected]
Cristian Bernareggi, Università degli Studi di Torino, [email protected]

Proceedings of The 12th International Conference on Natural Language Generation, pages 463–472, Tokyo, Japan, 28 Oct - 1 Nov, 2019. © 2019 Association for Computational Linguistics

Abstract

People with sight impairments can access a mathematical expression by using its LaTeX source. However, this mechanism has several drawbacks: (1) it assumes knowledge of LaTeX, (2) it is slow, since LaTeX is verbose, and (3) it is error-prone, since LaTeX is a typographical language. In this paper we study the design of a natural language generation system for producing a mathematical sentence, i.e. a natural language sentence expressing the semantics of a mathematical expression. Moreover, we describe the main results of a first human-based evaluation experiment of the system for the Italian language.

1 Introduction

The recent progress of computational linguistic techniques and frameworks has had a deep impact on the field of assistive technologies. For instance, the recent development of commercial platforms for building speech dialogue systems, which are designed for non-impaired people, can also help people with disabilities in daily activities. For example, a vocal command can be used to unlock a door in a house. However, for more specialized activities one needs to understand the needs of specific communities in specific domains.

In the case of the mathematical domain, blind people can access a mathematical expression by using its LaTeX source. However, this process has several drawbacks. First of all, it assumes knowledge of LaTeX. Second, listening to LaTeX is slow, since LaTeX is verbose. Finally, it is error-prone, since LaTeX is a typographical language, that is, a language designed for specifying the details of typographical visualization rather than for efficiently communicating the semantics of a mathematical expression. For instance, the simple LaTeX expression $f(x)$ is just a typographical description, and so it represents both the function application of f to x and the multiplication of the constant f by the constant x surrounded by parentheses.

In this paper we study the design of a natural language generation (NLG) system for producing a mathematical sentence, that is, a natural language sentence containing the semantics of a mathematical expression. Indeed, humans, when they have to orally communicate mathematical expressions, use their most sophisticated communication technology, that is, natural language. However, with respect to other domains, the mathematical domain has a number of peculiarities for speech that need to be accounted for (see Section 4).

We have three main research goals in this paper. The first goal is answering the question: what is the linguistic status of a mathematical expression? In other words, we want to investigate the possibility of using the standard notions of linguistics, primarily syntax, for mathematical sentences. The second goal concerns the possibility of using a standard NLG architecture, that is, a sentence planner and a realizer, for the production of a mathematical sentence. The third goal concerns the possibility of simplifying the listening of a mathematical expression by using speech features during the speech synthesis. Indeed, in contrast with other fields, the mathematical domain is essentially a spoken domain (Chang, 1983). By only listening to the audio format of a mathematical sentence, that is, without accessing its written form, the standard precedence of the mathematical operators is hardly recognizable. In other words, speech features, such as pauses and prosody, can modify the perceived structure of the mathematical sentence in a peculiar way.

The schematic architecture of the developed framework is depicted in Figure 1. The schema follows the well-known interlingua approach of rule-based machine translation (Hutchins and Somer, 1992). The process of generating a mathematical sentence from its LaTeX source is a two-step algorithm. In the first step the LaTeX is analyzed and its semantics is represented in Content MathML (CMML henceforth), a W3C standard¹ for the syntax and semantics of mathematical expressions. In the second step, the CMML representation is used as input of the S2S (Semantics to Speech) module, that is, an NLG module, to generate the mathematical sentence and, after the introduction of parentheses or pauses, its audio format encoding.

¹ https://www.w3.org/TR/MathML3/chapter4.html

[Figure 1 diagram: math. expression in LaTeX -> LatexML -> CMML -> PostProcessor -> Enhanced CMML -> S2S -> written math. sentence -> SynthCaller -> audio math. sentence]

Figure 1: The software architecture for the generation of mathematical sentences. The information flow starts from (1) the LaTeX representation of the expression, (2) its translation into CMML, (3) enhancement of the CMML, (4) generation of the written form of the mathematical sentence, (5) production of the audio form of the mathematical sentence.

The paper is structured as follows. In Section 2 we give a short review of the accessibility problem of mathematical expressions for visually impaired people. In Section 3 we describe the first step of the algorithm, that is, the process of extracting the CMML representation from its LaTeX representation. In Section 4 we describe our assumptions about the syntactic structures associated with mathematical operators. In Section 5 we describe the second step of the algorithm, that is, the NLG of the mathematical expression from its CMML representation. In Section 6 we describe a first human-based evaluation of the system for the Italian language, performed by four blind people. Finally, Section 7 closes the paper with some considerations and pointers to future work.

2 Related work

Research to enable people with sight impairments to access mathematical notation has been conducted in two main directions. On the one hand, different techniques have been investigated to preserve mathematical notation in a source format that can be processed by a screen reader throughout the whole workflow of a scientific document. In particular, nowadays it is possible to embed mathematical expressions in web pages not only as images, which cannot be processed by screen readers, but through MathML or MathJax (Cervone, 2012), and in PDF documents produced from LaTeX through the LaTeX package Axessibility (Ahmetovic et al., 2018). On the other hand, many research works have investigated how people with sight impairments can read and understand mathematical notation, along two directions: conversion into Braille and speech reading.

Since Braille is not a universal standard, different converters have been developed. The most widespread include conversion from LaTeX to Japanese Braille (Hara et al., 2000), to Nemeth code, mostly used in English-speaking countries (Papasalouros and Tsolomitis, 2015), to Marburg code, mostly used in German-speaking countries (Murillo-Morales et al., 2016), and from MathML to Spanish, French and Italian Braille codes (Soiffer, 2016). Nonetheless, Braille cannot support all the notations that can be expressed through LaTeX or presentation MathML (e.g., category theory, computational logic) and it has no mechanism to introduce new notations, hence the converters have a number of limitations.

As for speech reading, different techniques have been investigated. First, the most common approach transforms LaTeX or MathML expressions into a readable sentence by mapping a sequence of mathematical symbols to an aural equivalent for English (Raman, 1996), Spanish, German and French (Soiffer, 2007), Polish (Bier and Sroczynski, 2015) and Thai (Boonprakong et al., 2017). This approach is totally unambiguous, but it is very verbose (e.g. multiple nested parentheses can hardly be retained). Moreover, only a limited number of mathematical contexts are managed. Second, in addition to sequential speech reading, hierarchical exploration of the mathematical expression is provided (Soiffer, 2007; Sorge et al., 2014). This approach reduces the mental workload needed to retain the chunks of the expression. Third, speech reading is generated in a controlled environment such as a mathematical editor (Waltraud Schweikhardt, 2006; Raman and Gries, 1994). The context is defined by the author, hence the speech reading can be more accurate with respect to the semantics.

The idea of using mathematical sentences for improving the accessibility of mathematical expressions has been previously presented and tested in (Ferres and Fuentes Sepúlveda, 2011; Fuentes Sepúlveda and Ferres, 2012) for the Spanish language. However, in contrast to (Ferres and Fuentes Sepúlveda, 2011; Fuentes Sepúlveda and Ferres, 2012)

Left (CMML generated by LatexML):

    <m:apply>
      <m:eq/>
      <m:ci>y</m:ci>
      <m:apply>
        <m:times/>
        <m:ci>f</m:ci>
        <m:ci>x</m:ci>
      </m:apply>
    </m:apply>

Right (enhanced CMML):

    <apply>
      <eq/>
      <ci>y</ci>
      <apply>
        <ci>f</ci>
        <ci>x</ci>
      </apply>
    </apply>

Figure 2: On the left, the CMML generated by LatexML. On the right, the enhanced CMML obtained after the preprocessing phase.

some extra characters in the tag names (e.g. the prefix m:). With the same aim, we replaced all non-standard characters in the tag values, e.g. some variable names which are written by LatexML in an italic font. Moreover, in certain cases LatexML generates some tags following the OpenMath standard, as in the case of conditional-set, which is converted by using the csymbol tag. For the sake of generality, in order to simplify the generation step of the system, we decided to map these cases to their corresponding pure CMML tag.

2. There are cases in which the typographical origin of the LaTeX creates an ambiguity that cannot always be correctly solved by LatexML.
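The tag-name clean-up mentioned in the first preprocessing item (removing the m: marker from tag names and replacing non-standard characters in tag values) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name clean_cmml and the sample input are hypothetical, the input is assumed to be namespace-qualified CMML like the left panel of Figure 2, and "non-standard characters" are assumed to be Unicode styled letters that NFKC normalization maps back to plain ones.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical input mirroring the left panel of Figure 2. The italic x
# (U+1D465) stands in for a variable name emitted in an italic font.
SAMPLE = (
    '<m:apply xmlns:m="http://www.w3.org/1998/Math/MathML">'
    "<m:eq/><m:ci>y</m:ci>"
    "<m:apply><m:times/><m:ci>f</m:ci><m:ci>\U0001D465</m:ci></m:apply>"
    "</m:apply>"
)


def clean_cmml(cmml: str) -> str:
    """Return CMML with bare tag names and plain-letter identifiers.

    Illustrative helper (name and behaviour are assumptions of this sketch).
    """
    root = ET.fromstring(cmml)
    for element in root.iter():
        # ElementTree stores a qualified tag as "{namespace-uri}local-name";
        # keeping only the local name removes the "m:" marker on output.
        if element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
        # Assumption: styled letters are Unicode compatibility characters,
        # which NFKC normalization folds into their plain counterparts.
        if element.text:
            element.text = unicodedata.normalize("NFKC", element.text)
    return ET.tostring(root, encoding="unicode")
```

Calling clean_cmml(SAMPLE) yields a tree with the same structure but unprefixed tag names and plain identifiers. The sketch deliberately leaves the times element in place: deciding whether f(x) denotes a product or a function application, as in the right panel of Figure 2, is a separate step.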