The Fyntour Multilingual Weather and Sea Dialogue System

Decalog 2007: Proceedings of the 11th Workshop on the Semantics and Pragmatics of Dialogue, pages 157–158. Trento, Italy, 30 May – 1 June 2007. Edited by Ron Artstein and Laure Vieu.

Eckhard Bick, University of Southern Denmark, Odense, [email protected]
Jens Ahlmann Hansen, University of Southern Denmark, Odense, [email protected]

1 Introduction

The Fyntour multilingual weather and sea dialogue system provides pervasive access to weather, wind and water conditions for domestic and international tourists who come to fish for sea trout along the coasts of the Danish island of Funen. Callers access information about high and low waters, wind direction etc. via spoken dialogues in Danish, English or German. We describe the solutions we have implemented to deal with number format data in a multi-language environment. We also show how the translation of free text 24-hour forecasts from Danish to English is handled through a newly developed machine translation system. In contrast with most current, statistically-based MT systems, we make use of a rule-based approach, exploiting a full parser and context-sensitive lexical transfer rules, as well as target language generation and movement rules.

2 Number Format Data

The Fyntour system provides information in Danish, English and German. A substantial amount of data is received and handled in an interlingua format, i.e. data showing wind speed (in m/s) and precipitation (in mm) are language-neutral numbers which are simply converted into language-specific pronunciations by specifying the locale of the speech synthesis in the VoiceXML, e.g.

    <prompt xml:lang="da-DK"> 1 </prompt>   "en"
    <prompt xml:lang="de-DE"> 1 </prompt>   "ein"
    <prompt xml:lang="en-GB"> 1 </prompt>   "one"

In Germany, wind speed is normally measured using the Beaufort scale (vs. the Danish m/s norm), while visitors from English-speaking countries are accustomed to the 12-hour clock (vs. the continental European 24-hour clock). These cultural preferences can be catered for by straightforward conversions of the shared number format data, performed by the application logic generating the dynamic VXML output of the individual languages.

However, the translation of dynamic data in a free text format, from Danish to English and Danish to German, is more complex. This applies to the above-mentioned forecasts, which are written in Danish by different meteorologists. In the Fyntour system, the Danish-English translation problem has been solved by a newly developed machine translation (MT) system. The Constraint Grammar based MT system, which is rule-based as opposed to most existing, probabilistic systems, is introduced below.

3 CG-based MT System

The Danish-English MT module, Dan2eng, is a robust system with a broad-coverage lexicon and grammar, which in principle will translate unrestricted Danish text or transcribed speech without strict limitations to genre, topic or style. However, a small benchmark corpus of weather forecasts was used to tune the system to this domain and to avoid lexical or structural translation gaps, especially concerning time and measure expressions, as well as certain geographical references and names.

Methodologically, the system is rule-based rather than statistical and uses a lexical transfer approach with a strong emphasis on source language (SL) analysis, provided by a pre-existing Constraint Grammar (CG) parser for Danish, DanGram (Bick 2001). Contextual rules are used at 5 levels:

1. CG rules handling morphological disambiguation and the mapping of syntactic functions for Danish (approximately 6,000 rules)
2. Dependency rules establishing syntactic-semantic links between words or multi-word expressions (220 rules)
3. Lexical transfer rules selecting translation equivalents depending on grammatical categories, dependencies and other structural context (16,540 rules)
4. Generation rules for inflexion, verb chains, compounding etc. (about 700 rules)
5. Syntactic movement rules turning Danish into English word order and handling sub-clauses, negations, questions etc. (65 rules)

At all levels, CG rules may be exploited to add or alter grammatical tags that will trigger or facilitate other types of rules.

Fig 1: The Dan2eng system

As an example, let us have a look at the translation spectrum of the weatherwise tedious, but linguistically interesting, Danish verb at regne (to rain), which has many other, non-meteorological, meanings (calculate, consider, expect, convert ...) as well. Rather than ignoring such ambiguity and building a narrow weather forecast MT system or, on the other hand, striving to make an "AI" module understand these meanings in terms of world knowledge, Dan2eng chooses a pragmatic middle ground where grammatical tags and grammatical context are used as differentiators for possible translation equivalents, staying close to the (robust) SL analysis. Thus, the translation rain (a) is chosen if a daughter/dependent (D) exists with the function of situative/formal subject (@S-SUBJ), while most other meanings ask for a human subject. As a default [1] translation for the latter, calculate (f) is chosen, but the presence of other dependents (objects or particles) may trigger other translations. regne med (c-e), for instance, will mean include if med has been identified as an adverb, while the preposition med triggers the translations count on for human "granddaughter" dependents (GD = <H>), and expect otherwise. Note that the include translation also could have been conditioned by the presence of an object (D = @ACC), but would then have to be differentiated from (b), regne for ('consider').

regne_V [2]
(a) D=(@S-SUBJ) :rain;
(b) D=(<H> @ACC) D=("for" PRP)_nil :consider;
(c) D=("med" PRP)_on GD=(<H>) :count;
(d) D=("med" PRP)_nil :expect;
(e) D=(@ACC) D=("med" ADV)_nil :include;
(f) D=(<H> @SUBJ) D?=("på")_nil :calculate;

It must be stressed that the use of grammatical relations as translation differentiators is very different from a simple memory-based approach, where chains of words are matched from parallel corpora. First, the latter approach, at least in its naïve, lexicon-free version, cannot generalize over semantic prototypes (e.g. <H> for human) or syntactic functions, conjuring up the problem of sparse data. Second, simple collocation, or co-occurrence, is much less robust than functional dependency relations that will allow interfering material such as modifiers or sub-clauses, as well as inflexional or lexical variation.

For more details on the Dan2eng MT system, see http://beta.visl.sdu.dk/ (demo, documentation, NLP papers).

[1] The ordering of differentiator-translation pairs is important: defaults, with fewer restrictions, have to come last. For the numerical value of a given translation, 1/rank is used.
[2] The full list of differentiators for this verb contains 13 cases, including several prepositional complements not included here (regne efter, blandt, fra, om, sammen, ud, fejl ...)
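The number-format conversions described in Section 2 of the Fyntour paper (m/s to Beaufort for German callers, 24-hour to 12-hour clock for English-speaking callers) can be sketched roughly as follows. This is a minimal Python illustration and not the Fyntour application logic: the function names to_beaufort, to_12_hour and wind_prompt are hypothetical, the thresholds are the standard Beaufort scale boundaries in m/s, and the prompt carries only the bare number, as in the paper's own VoiceXML example.

```python
from bisect import bisect_right

# Lower bounds (m/s) of Beaufort forces 1..12 on the standard scale.
_BEAUFORT_LOWER_BOUNDS = [0.3, 1.6, 3.4, 5.5, 8.0, 10.8,
                          13.9, 17.2, 20.8, 24.5, 28.5, 32.7]


def to_beaufort(speed_ms: float) -> int:
    """Map a wind speed in m/s to a Beaufort force (0..12)."""
    # The force equals the number of lower bounds that the speed has reached.
    return bisect_right(_BEAUFORT_LOWER_BOUNDS, speed_ms)


def to_12_hour(hour: int, minute: int) -> str:
    """Convert a 24-hour clock time to a 12-hour string with am/pm."""
    suffix = "am" if hour < 12 else "pm"
    hour_12 = hour % 12 or 12  # 0 and 12 are both spoken as 12
    return f"{hour_12}:{minute:02d} {suffix}"


def wind_prompt(speed_ms: float, lang: str) -> str:
    """Build a VoiceXML prompt for a wind speed (hypothetical helper).

    Assumption: German output uses Beaufort, all other locales keep m/s.
    """
    if lang.startswith("de"):
        value = str(to_beaufort(speed_ms))
    else:
        value = f"{speed_ms:g}"
    return f'<prompt xml:lang="{lang}"> {value} </prompt>'
```

The same shared value is thus stored once and converted only at output time, so the speech synthesizer still picks the pronunciation from the xml:lang locale.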
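The ordered differentiator list for regne in Section 3 of the Fyntour paper can be illustrated with a small first-match selector. The sketch below is an assumption-laden simplification and not Dan2eng code: the Node class, the RULES table and the translate function are hypothetical, tags are plain strings, the optional D?=("på") condition and the _nil/_on preposition handling are reduced to the final English string, and a granddaughter is taken to be a child of any daughter. The 1/rank score follows the paper's first footnote.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dependent of the verb: a set of tags plus its own dependents."""
    tags: set
    children: list = field(default_factory=list)


# (label, required daughter tag sets, required granddaughter tag sets, translation)
# Order matters: defaults with fewer restrictions come last.
RULES = [
    ("a", [{"@S-SUBJ"}], [], "rain"),
    ("b", [{"<H>", "@ACC"}, {"for", "PRP"}], [], "consider"),
    ("c", [{"med", "PRP"}], [{"<H>"}], "count on"),
    ("d", [{"med", "PRP"}], [], "expect"),
    ("e", [{"@ACC"}, {"med", "ADV"}], [], "include"),
    ("f", [{"<H>", "@SUBJ"}], [], "calculate"),
]


def _all_found(required, nodes):
    """True if every required tag set is contained in the tags of some node."""
    return all(any(req <= node.tags for node in nodes) for req in required)


def translate(daughters):
    """Return (translation, 1/rank) for the first matching rule, else (None, 0.0)."""
    granddaughters = [g for d in daughters for g in d.children]
    for rank, (_label, need_d, need_gd, target) in enumerate(RULES, start=1):
        if _all_found(need_d, daughters) and _all_found(need_gd, granddaughters):
            return target, 1 / rank
    return None, 0.0


# Hand-built inputs (not parser output): "Det regner" and "Jeg regner med regn".
formal_subject = Node({"det", "@S-SUBJ"})
human_subject = Node({"jeg", "<H>", "@SUBJ"})
med_regn = Node({"med", "PRP"}, [Node({"regn", "N"})])
```

With these inputs, translate([formal_subject]) selects rain via rule (a), while translate([human_subject, med_regn]) falls through to rule (d) and selects expect, because rule (c) requires a human granddaughter. Moving rule (f) to the top would make calculate win for every human subject, which is why the default has to come last.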
Dialog OS: an extensible platform for teaching spoken dialogue systems

Daniel Bobbert, CLT Sprachtechnologie GmbH, Science Park Saar, 66123 Saarbrücken, Germany, [email protected]
Magdalena Wolska, Computational Linguistics, Universität des Saarlandes, 66041 Saarbrücken, Germany, [email protected]

1 Introduction

With the area of spoken dialogue systems rapidly developing, educational resources for teaching basic concepts of dialogue systems design in Language Technology and Computational Linguistics courses are becoming increasingly important. Dialog OS is an extensible platform for developing (spoken) dialogue systems that is intended, among other things, as an educational tool. It allows students to quickly grasp the main ideas of finite-state-based modelling and to develop relatively complex applications with flexible dialogue strategies. Thanks to Dialog OS' intuitive interface and extensibility, system implementation tasks can be distributed among non-technically- and technically-oriented students, making the tool suitable for a variety of courses with participants of different backgrounds and interests. Below, we give a brief overview of the framework and outline some of the student projects in which it was used as a basis for dialogue management and modelling.

2 Dialog OS: a brief overview

Dialog OS is an extensible platform for managing and modelling (spoken) dialogue systems. It comprises an intuitive Graphical User Interface (GUI), default dialogue components, and a communications API to build new components.

Default components. Dialog OS comes with built-in modules for professional quality speech input and output using technology from Nuance and AT&T. As part of the platform, Dialog OS provides a number of default input/output device clients that can be directly connected without extra programming. Among those are: a simple text console for text-based input and output, a sound player, and a default client for a connection to an SQL database. CLT can also provide built-in connections to a number of other research and commercial Automatic Speech Recognition (ASR) and Text-To-Speech (TTS) systems.

Extensibility. Dialog OS can be extended to work with an arbitrary number of clients through a Java-based API. The low-level communication between Dialog OS and the clients is handled by a dedicated internal protocol and remains invisible to the user. Programming a new client involves a Java implementation of a high-level functional protocol for the given client, without having to deal with the details of network connection with the dialogue engine itself.

FSA-based dialogue modelling. The central part of the dialogue system is the dialogue model. Dialog OS offers an intuitive way of modelling dialogues using Finite State Automata (McTear, 2002). Building a dialogue model consists of
