Initial Considerations in Building a Speech-To-Speech Translation System for the Slovenian-English Language Pair

Total Page:16

File Type:pdf, Size:1020Kb

Initial Considerations in Building a Speech-To-Speech Translation System for the Slovenian-English Language Pair Initial Considerations in Building a Speech-to-Speech Translation System for the Slovenian-English Language Pair J. Žganec Gros1, A. Mihelič1, M. Žganec1, F. Mihelič2, S. Dobrišek2, J. Žibert2, Š. Vin- tar2, T. Korošec2, T. Erjavec3, M. Romih4 1Alpineon R&D, Ulica Iga Grudna 15, SI-1000 Ljubljana, Slovenia 2University of Ljubljana, SI-1000 Ljubljana, Slovenia 3Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia 4Amebis, Bakovnik 3, SI-1241 Kamnik, Slovenia E-mail: [email protected] Abstract. The paper presents the design concept of the VoiceTRAN Communicator that in- tegrates speech recognition, machine translation and text-to-speech synthesis using the DARPA Galaxy architecture. The aim of the project is to build a robust multimodal speech-to-speech translation communicator able to translate simple domain-specific sentences in the Slove- nian-English language pair. The project represents a joint collaboration between several Slovenian research organizations that are active in human language technologies. We pro- vide an overview of the task, describe the system architecture and individual servers. Fur- ther we describe the language resources that will be used and developed within the project. We conclude the paper with plans for evaluation of the VoiceTRAN Communicator. 1. Introduction point of the interaction. Consequently, this com- promises the flexibility and naturalness of using Automatic speech-to-speech (STS) translation the system. systems aim to facilitate communication among The VoiceTRAN Communicator is being built people who speak in different languages (Lavie within a national Slovenian research project in- et al., 1997), (Wahlster, 2000), (Lavie et al., volving 5 partners: Alpineon, the University of 2002). Their goal is to generate a speech signal Ljubljana (Faculty of Electrical Engineering, in the target language that conveys the linguis- Faculty of Arts and Faculty of Social Studies), tic information contained in the speech signal the Jožef Stefan Institute, and Amebis as a sub- from the source language. contractor. There are, however, major open research is- The project is co-funded by the Slovenian sues that challenge the deployment of natural Ministry of Defense. The aim of the project is to and unconstrained speech-to-speech translation build a robust multimodal speech-to-speech trans- systems, even for very restricted application lation communicator, similar to Phraselator (Sa- domains, due to the fact that state-of-the-art auto- rich, 2001) or Speechalator (Waibel, 2003), able matic speech recognition and machine transla- to translate simple sentences in a Slovenian-Eng- tion systems are far from perfect. Additionally, lish language pair. in comparison to translating written text, con- The application domain is limited to common versational spoken messages are often conveyed application scenarios that occur in peace-keeping with imperfect syntax and casual spontaneous operations on foreign missions when the users speech. In practice, when building demonstra- of the system have to communicate with the lo- tion systems, STS systems are typically imple- cal population. More complex phrases can be mented by imposing strong constraints on the entered via keyboard using a graphical user in- application domain and the type and structure terface. of possible utterances, i.e. both in the range and in the scope of the user input allowed at any EAMT 2005 Conference Proceedings 288 Initial considerations in building a speech-to-speech translation system for Slovenian-English 2. System architecture Takes the signals from Audio Server and Speech Rec- maps audio samples into text strings. The VoiceTRAN Communicator uses the DAR- ognizer Produces a N-best sentence hypothesis PA Galaxy Communicator architecture (Seneff list. et al, 1998). The Galaxy Communicator open Receives N-best postprocessed sentence source architecture was chosen to provide inter- hypotheses from the Speech Recognition Machine Server and translates them from a source module communication support as its plug-and- Translator language into a target language. play approach allows interoperability of com- Produces a scored disambiguated sen- mercial software and research software compo- tence hypothesis list. nents. It was specially designed for develop- Receives rich and disambiguated word ment of voice-driven user interfaces in a multi- strings from the Machine Translation modal platform. It is a distributed, message- Speech Syn- server. thesizer Converts then input word strings into based, hub-and-spoke infrastructure optimized speech and prepares them for the audio for constructing spoken dialogue systems. server. There are two ways of porting modules into the Galaxy architecture: the first is to alter its code so that it can be incorporated into the Galaxy architecture; the second is to create a wrapper or a capsule for the existing module, the capsule then behaves as a Galaxy server. We have opted for the second option since we want to be able to test commercial modules as well. Minimal changes to the existing mod- ules were required, mainly those regarding in- put/output processing. Figure 1. VoiceTRAN system architecture A particular session is initiated by a user ei- ther through interaction with a graphical user The VoiceTRAN Communicator is composed of interface (typed input) or the microphone. The a number of servers that interact with each other VoiceTRAN Communicator servers capture spo- through the Hub as shown in Figure 1. The Hub ken or typed input from the user, and return the is used as a centralized message router through servers’ responses with synthetic speech, graph- which servers can communicate with one an- ics, and text. The server modules are described other. Frames containing keys and values are in more detail in the next subsections. emitted by each server. They are routed by the hub and received by a secondary server based 2.1. Audio server on rules defined in the Hub script. The VoiceTRAN Communicator consists of The audio server connects to the microphone in- the Hub and five servers: audio server, graphic put and speaker output terminals on the host com- user interface, speech recognizer, machine trans- puter and performs recoding user input and play- lator and speech synthesizer: ing prompts or synthesized speech. Input speech captured by the audio server is automatically Audio Server Receives speech signals from the micro- recorded to files for later system training. phone and sends them to the recognizer. Sends synthesized speech to the speak- 2.2. Speech Recognizer ers. Receives input text from the keyboard. The speech recognition server receives the input Displays recognized source language audio stream from the audio server and pro- Graphic User sentences and translated target language vides at its output a word graph and a ranked Interface sentences. list of candidate sentences, the N-best hypothe- Provides user controls for handling the application. ses list that can include part-of-speech informa- tion generated by the language model. EAMT 2005 Conference Proceedings 289 Žganec Gros, et al. The speech recognition server used in Voice- A postprocessing algorithm inserts basic punc- TRAN is based on the Hidden Markov Model tuation and capitalization information before Recognizer developed by the University of Ljubl- passing the target sentence to the speech synthe- jana (Dobrišek, 2001). It will be upgraded to sizer. The output string can also convey lexical perform large vocabulary speaker (in)dependent stress information in order reduce disambigua- speech recognition on a wider application do- tion efforts during text-to-speech synthesis. main. A back-off class-based trigram language A multi-engine based approach will be used model will be used. Given a limited amount of in the early phase of the project that makes it training data the parameters in the models will possible to exploit strengths and weaknesses of be carefully chosen in order to achieve maxi- different MT technologies and to choose the mum performance. most appropriate engine or combination of en- Further in the project we want to test other gines for the given task. Four different transla- speech recognition approaches with an empha- tion engines will be applied in the system. We sis on robustness, processing time, footprint and will combine TM (translation memories), SMT memory requirements. (statistical machine translation), EBMT (exam- Since the final goal of the project is a stand- ple-based machine translation) and RBMT (rule- alone speech communicator used by a specific based machine translation) methods. A simple user, the speech recognizer can be additionally approach to select the best translation from all trained and adapted to the individual user in or- the outputs will be applied. der to achieve higher recognition accuracy at A bilingual aligned domain-specific corpus least in one language. will be used to build the TM and train the A common speech recognizer output typi- EBMT and the SMT phrase translation models. cally has no information on sentence boundaries, In SMT an interlingua approach, similar to the punctuation and capitalization. Therefore, addi- one described in (Lavie et al., 2002) will be in- tional postprocessing in terms of punctuation and vestigated and promising directions pointed out capitalization will be performed on the N-best in (Ney, 2004) will be pursued. hypotheses list before it is passed to the ma- The Presis translation system will be used as chine translator. our baseline system (Romih
Recommended publications
  • Statistical Machine Translation from Slovenian to English
    Journal of Computing and Information Technology - CIT 15, 2007, 1, 47–59 47 doi:10.2498/cit.1000760 Statistical Machine Translation from Slovenian to English Mirjam Sepesy Maucec and Zdravko Kacic Faculty of Electrical Engineering and Computer Science, University of Maribor In this paper, we analyse three statistical models for the The historical enlargement of the EU has brought machine translation of Slovenian into English. All of many new and challenging language pairs for them are based on the IBM Model 4, but differ in the type of linguistic knowledge they use. Model 4a uses only machine translation. A lot of work has been basic linguistic units of the text, i.e., words and sentences. done on Czech [4], Polish [12], Croatian [3], In Model 4b, lemmatisation is used as a preprocessing Serbian [20] and not at last Slovenian [6]. step of the translation task. Lemmatisation also makes it possible to add a Slovenian-English dictionary as an The Czech-English machine translation system additional knowledge source. Model 4c takes advantage of the morpho-syntactic descriptions (MSD) of words. is based on dependency trees. Dependency trees In Model 4c, MSD codes replace the automatic word represent the sentence structure, as concentrated classes used in Models 4a and 4b. The models are around the verb. The presented system was out- experimentally evaluated using the IJS-ELAN parallel corpus. performed by the statistical translation system GIZA++/ISI ReWrite Decoder [16, 10, 18], Keywords: statistical machine translation, translation trained on the same corpus. model, lemmatisation, morpho-syntactic description, Slovenian language. The Polish-English MT system[12] uses an elec- tronic dictionary annotated for morphological, syntactic, and partly semantic information.
    [Show full text]
  • Initial Considerations in Building a Speech-To-Speech Translation System for the Slovenian-English Language Pair
    Initial Considerations in Building a Speech-to-Speech Translation System for the Slovenian-English Language Pair J. Žganec Gros1, A. Mihelič1, M. Žganec1, F. Mihelič2, S. Dobrišek2, J. Žibert2, Š. Vin- tar2, T. Korošec2, T. Erjavec3, M. Romih4 1Alpineon R&D, Ulica Iga Grudna 15, SI-1000 Ljubljana, Slovenia 2University of Ljubljana, SI-1000 Ljubljana, Slovenia 3Jožef Stefan Institute, Jamova 39, SI-1000 Ljubljana, Slovenia 4Amebis, Bakovnik 3, SI-1241 Kamnik, Slovenia E-mail: [email protected] Abstract. The paper presents the design concept of the VoiceTRAN Communicator that in- tegrates speech recognition, machine translation and text-to-speech synthesis using the DARPA Galaxy architecture. The aim of the project is to build a robust multimodal speech-to-speech translation communicator able to translate simple domain-specific sentences in the Slove- nian-English language pair. The project represents a joint collaboration between several Slovenian research organizations that are active in human language technologies. We pro- vide an overview of the task, describe the system architecture and individual servers. Fur- ther we describe the language resources that will be used and developed within the project. We conclude the paper with plans for evaluation of the VoiceTRAN Communicator. 1. Introduction point of the interaction. Consequently, this com- promises the flexibility and naturalness of using Automatic speech-to-speech (STS) translation the system. systems aim to facilitate communication among The VoiceTRAN Communicator is being built people who speak in different languages (Lavie within a national Slovenian research project in- et al., 1997), (Wahlster, 2000), (Lavie et al., volving 5 partners: Alpineon, the University of 2002).
    [Show full text]
  • The Sloleks Morphological Lexicon and Its Future Development Kaja Dobrovoljc, Simon Krek and Tomaž Erjavec
    Kaja Dobrovoljc, Simon Krek and Tomaž Erjavec The Sloleks Morphological Lexicon and its Future Development Kaja Dobrovoljc, Simon Krek and Tomaž Erjavec Abstract This paper presents Sloleks, the largest open-source machine-readable mor- phological lexicon of the Slovene language to date. We first briefly present its development and the formal grammar behind it, and then provide a detailed presentation of the types and structure of inflectional, derivational, grammat- ical and other included information, with a special emphasis on its formal representation within the standardized XML LMF framework. Given that Sloleks is a strong candidate to be used in the compilation of a new diction- ary of modern Slovene, both as a source of morphological information and as a background resource for the language technology tools needed to create it, the second part of the presentation explores the most important aspects of its future development, in particular the expansion of its entry list, addi- tion of pronunciation information, normative categorization of variants and a corpus-based re-evaluation of the existing inflectional paradigms. Such an extensive usage-based open-source morphological lexicon of modern Slovene with a unified system of morphological description will have a long-term use for both language technologies and for other born-digital reference works for the Slovene language. Keywords: morphological lexicon, lexicon of inflected forms, machine read- able dictionary, morphology, inflection, derivation, pronunciation, language standardisation 42 DICTIONARY OF MODERN SLOVENE: PROBLEMS AND SOLUTIONS THE SLOLEKS MORPHOLOGICAL LEXICON AND ITS FUTURE DEVELOPMENT 1 INTRODUCTION When it comes to morphologically rich languages, such as Slovene, the description of morphological paradigms of inflected parts of speech is traditionally very impor- tant.
    [Show full text]
  • Building a Slovenian-English Language Pair Speech-To-Speech Translation System
    SPECOM'2006, St. Petersburg, 25-29 June 2006 Building a Slovenian-English Language Pair Speech-to-Speech Translation System Jerneja Žganec Gros1, France Mihelič2, Mario Žganec1 1Alpineon Research and development Alpineon d.o.o., Ljubljana, Slovenia 2Laboratory of Artificial Perception, Systems and Cybernetics Faculty of Electrical Engineering, University of Ljubljana, Slovenia [email protected] [email protected] similar to Phraselator [4] or Speechalator [5], able to translate Abstract simple sentences in a Slovenian-English language pair. In the paper the design phases of the VoiceTRAN The application domain is limited to common application Communicator are presented, which combines speech scenarios that occur in peace-keeping operations on foreign recognition, machine translation and text-to-speech synthesis missions when the users of the system have to communicate using the Darpa Galaxy architecture. The aim of the with the local population. More complex phrases can be contribution was to build a robust multimodal speech-to- entered via keyboard using a graphical user interface. speech translation communicator able to translate simple First an overview of the VoiceTRAN system architecture domain-specific sentences in the Slovenian-English language is given. We continue to describe the individual server pair. The project represents a joint collaboration between modules. We conclude the paper by discussing the speech-to- several Slovenian research organizations that are active in speech translation evaluation methods and outlining plans for human language technologies. future work. We provide an overview of the task, describe the system architecture and individual servers. Further we describe the 2. System architecture language resources that were used and developed within the project.
    [Show full text]
  • Trans-European Language Resources Infrastructure) European Seminar (1St, Tihany, Hungary, September 15-16
    DOCUMENT RESUME ED 413 738 FL 024 759 AUTHOR Rettig, Heike, Ed. TITLE Language Resources for Language Technology: Proceedings of the TELRI (Trans-European Language Resources Infrastructure) European Seminar (1st, Tihany, Hungary, September 15-16, 1995) . INSTITUTION Institut fuer deutsche Sprache, Mannheim (Germany). ISBN ISBN-963-8461-99-3 PUB DATE 1995-00-00 NOTE 189p.; For individual articles, see FL 024 760-778. "In collaboration with Julia Pajzs and Gabor Kiss." PUB TYPE Collected Works - Proceedings (021) EDRS PRICE MF01/PC08 Plus Postage. DESCRIPTORS *Computational Linguistics; Computer Software; *Computer Software Development; Contrastive Linguistics; Czech; Data Processing; Dictionaries; *Discourse Analysis; Dutch; English; Foreign Countries; Information Technology; *Language Planning; Language Research; *Languages; Languages for Special Purposes; Linguistic Theory; Machine Translation; Morphology (Languages); Research Methodology; Russian; Shared Resources and Services; Slovenian; Spelling; Structural Analysis (Linguistics); Suprasegmentals; Uncommonly Taugi:t Languages; Vocabulary IDENTIFIERS Speech Recognition ABSTRACT. This proceedings contains papers from the first European seminar of the Trans-European Language Resources Infrastructure (TELRI) include: "Cooperation with Central and Eastern Europe in Language Engineering" (Poul Andersen); "Language Technology and Language Resources in China" (Feng Zhiwei); "Public Domain Generic Tools: An Overview" (Tomaz Erjavec); "The 'Terminology Market'" (Christian Galinski); "Lexical
    [Show full text]
  • Proceedings of the Joint Workshop on Exploiting Synergies Between
    EACL 2012 Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra) at EACL-2012 Proceedings of the Workshop c 2012 The Association for Computational Linguistics ISBN 978-1-937284-19-0 Order copies of this and other ACL proceedings from: Association for Computational Linguistics (ACL) 209 N. Eighth Street Stroudsburg, PA 18360 USA Tel: +1-570-476-8006 Fax: +1-570-476-0860 [email protected] ii Introduction Welcome to the Joint EACL Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra). This two-day workshop addresses two specific but related research problems in computational linguistics. The ESIRMT event (1st day) aims at reducing the gap, both theoretical and practical, between information retrieval and machine translation research and applications. Although both fields have been already contributing to each other instrumentally, there is still too much work to be done in relation to solidly framing these two fields into a common ground of knowledge from both the procedural and paradigmatic perspectives. The HyTra event (2nd day) aims at sharing ideas among researchers developing and applying statistical, example-based, or rule-based machine translation systems and who wish to enhance their systems with elements from the other approaches. The joint workshop provides participants with the opportunity of discussing research related to technology integration
    [Show full text]