Deliverable D 1.1 Deliverable Title: Benchmark

Total Page:16

File Type:pdf, Size:1020Kb

Deliverable D 1.1 Deliverable Title: Benchmark Deliverable D 1.1 Deliverable title: Benchmark Project acronym: Translate4Rail Starting date: 01/12/2019 Duration (in months): 24 Call (part) identifier: H2020-S2RJU-OC-2019 Grant agreement no: 881779 Due date of deliverable: M2 Actual submission date: 31-01-2019 Responsible: RNE Dissemination level: PU Status: Final Reviewed: yes GA 881779 P a g e 1 | 53 Document history Revision Date Description 1 31-01-2019 First issue Report contributors Name Beneficiary Short Details of contribution Name RailNetEurope RNE Consolidation of the benchmarking Assessment and analysis Union Internationale Initial benchmark research UIC des Chemins de fer Consolidation of the report GA 881779 P a g e 2 | 53 Table of Contents 1. Executive Summary ....................................................................................................................... 5 2. Abbreviations and acronyms ........................................................................................................ 6 3. Background ................................................................................................................................... 8 4. Objective/Aim ............................................................................................................................. 10 5. TRANSLATION FOR BREAKING LANGUAGE BARRIERS IN THE RAILWAY FIELD ........................... 11 5.1. Translate4Rail project objectives and benefits ................................................................. 11 6. BENCHMARK IN RAILWAYS AND OTHER INDUSTRIES ................................................................ 12 6.1. Railways ............................................................................................................................. 12 6.1.1. Use of speech recognition in simulators ........................................................................... 12 6.1.2. RNE and UIC ongoing projects .......................................................................................... 14 6.2. Air traffic management ..................................................................................................... 15 6.2.1. Activities in the field of speech recognition in air traffic .................................................. 15 6.2.2. MALORCA .......................................................................................................................... 17 6.2.3. AcListant ............................................................................................................................ 19 6.3. Maritime traffic ................................................................................................................. 20 6.3.1. Activities in the field of speech recognition in maritime traffic ....................................... 20 6.3.2. Fraunhofer IDMT: Improved Maritime Communication .................................................. 21 6.4. Military applications.......................................................................................................... 22 6.4.1. Activities in the field of speech recognition in military applications ................................ 22 6.4.2. MFLTS Project ................................................................................................................... 23 6.5. Automotive ....................................................................................................................... 24 6.5.1. Activities in the field of speech recognition in Automotive ............................................. 24 6.6. General Language projects ............................................................................................... 25 6.6.1. EMIME ............................................................................................................................... 25 6.6.2. TC STAR ............................................................................................................................. 26 6.7. The Benchmark of the industries ...................................................................................... 27 7. KEY SOLUTION PROVIDERS ......................................................................................................... 29 7.1. Key aspects for a language solution.................................................................................. 29 7.2. Google Translate ............................................................................................................... 31 7.3. Microsoft ........................................................................................................................... 32 7.4. IBM .................................................................................................................................... 34 7.5. Baidu ................................................................................................................................. 36 GA 881779 P a g e 3 | 53 7.6. Carnegie Mellon University ............................................................................................... 36 7.7. Systran ............................................................................................................................... 37 7.8. DeepL ................................................................................................................................ 38 7.9. iTranslate ........................................................................................................................... 39 7.10. SDL Government ............................................................................................................ 40 7.11. KantanMT ...................................................................................................................... 42 7.12. List of speech to speech applications ............................................................................ 43 7.13. List of translation devices .............................................................................................. 46 7.14. The Benchmark of the Translation tool and providers ................................................. 48 8. CONCLUSIONS ............................................................................................................................. 51 9. References .................................................................................................................................. 52 GA 881779 P a g e 4 | 53 1. Executive Summary Benchmark – Speech to Speech: State of the Art (Internet-based search) The concept of the Translate4Rail project is to offer drivers a fully comprehensive set of predefined standardised messages which encompass all they have to exchange with an infrastructure manager traffic controller in normal or exceptional operational situations in a country where they do not understand nor speak the local language. An IT tool will then be implemented to enable the driver and the traffic controller to understand each other even though each of them speaks in his/her native language. This will capitalise on the works already committed on this matter between Infrastructure Managers (IMs) and Railway Undertakings (RUs) at RNE and UIC level. These works have dealt with the analysis of the various types of operational situations needing exchanges between RUs drivers and IMs signallers. The project will enable to test these works and to further harmonise and standardise pre-defined messages in the light of the tests carried out. Such messages will be uttered by the driver or the traffic controller. They will then be identified, translated and uttered in the language of the other party by the given tool. The project will define the functional characteristics of the tool which will create a frame for the exchanges between drivers and traffic controllers. This tool will use voice recognition and translation applications. The tool will be tested on pilot trains running on cross border sections of rail freight corridors where drivers have to use different languages. The project intends to at least maintain the level of safety, increase the traffic fluidity at borders and to increase the competitiveness of the rail sector. Years ago, research has been started regarding the use of computers to recognize and render sign language. Currently, there are numbers of research projects focused also for text and speech automatic translation. Technology is rapidly changing and improving the way we know the world operates today and could in the future. A typical speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques have been proposed for the integration of speech recognition and machine translation. However, corresponding techniques have not yet been considered for speech synthesis. [1] Now, we are dealing with the question: What are the areas in which you see artificial intelligence playing a role? And there is a simple answer: Whether we admit it or not, new technologies and artificial intelligence in the future, perhaps soon, will greatly affect our work and our way of life. It is up to us how we can take advantage of the opportunities that could positively influence our daily lives. For man, language is the most natural thing, and so it is the greatest challenge that machines face, just like humans. Depends on the topic, this benchmark report is partly a compilation of internet-based search. GA 881779 P a g e 5 | 53 2. Abbreviations and acronyms Abbreviation / Acronyms Description
Recommended publications
  • Neuro Informatics 2020
    Neuro Informatics 2019 September 1-2 Warsaw, Poland PROGRAM BOOK What is INCF? About INCF INCF is an international organization launched in 2005, following a proposal from the Global Science Forum of the OECD to establish international coordination and collaborative informatics infrastructure for neuroscience. INCF is hosted by Karolinska Institutet and the Royal Institute of Technology in Stockholm, Sweden. INCF currently has Governing and Associate Nodes spanning 4 continents, with an extended network comprising organizations, individual researchers, industry, and publishers. INCF promotes the implementation of neuroinformatics and aims to advance data reuse and reproducibility in global brain research by: • developing and endorsing community standards and best practices • leading the development and provision of training and educational resources in neuroinformatics • promoting open science and the sharing of data and other resources • partnering with international stakeholders to promote neuroinformatics at global, national and local levels • engaging scientific, clinical, technical, industry, and funding partners in colla- borative, community-driven projects INCF supports the FAIR (Findable Accessible Interoperable Reusable) principles, and strives to implement them across all deliverables and activities. Learn more: incf.org neuroinformatics2019.org 2 Welcome Welcome to the 12th INCF Congress in Warsaw! It makes me very happy that a decade after the 2nd INCF Congress in Plzen, Czech Republic took place, for the second time in Central Europe, the meeting comes to Warsaw. The global neuroinformatics scenery has changed dramatically over these years. With the European Human Brain Project, the US BRAIN Initiative, Japanese Brain/ MINDS initiative, and many others world-wide, neuroinformatics is alive more than ever, responding to the demands set forth by the modern brain studies.
    [Show full text]
  • Context in Neural Machine Translation: a Review of Models and Evaluations
    Context in Neural Machine Translation: A Review of Models and Evaluations Andrei Popescu-Belis University of Applied Sciences of Western Switzerland (HES–SO) School of Management and Engineering Vaud (HEIG–VD) Route de Cheseaux 1, CP 521 1401 Yverdon-les-Bains, Switzerland [email protected] January 29, 2019 Abstract features when translating entire texts, especially when this is vital for correct translation. Taking textual con- This review paper discusses how context has been text into account1 means modeling long-range depen- used in neural machine translation (NMT) in the dencies between words, phrases, or sentences, which past two years (2017–2018). Starting with a brief are typically studied by linguistics under the topics of retrospect on the rapid evolution of NMT models, discourse and pragmatics. When it comes to transla- the paper then reviews studies that evaluate NMT tion, the capacity to model context may improve certain output from various perspectives, with emphasis on those analyzing limitations of the translation of translation decisions, e.g. by favoring a better lexical contextual phenomena. In a subsequent version, choice thanks to document-level topical information, the paper will then present the main methods that or by constraining pronoun choice thanks to knowledge were proposed to leverage context for improving about antecedents. translation quality, and distinguishes methods that This review paper puts into perspective the signif- aim to improve the translation of specific phenom- icant amount of studies devoted in 2017 and 2018 to ena from those that consider a wider unstructured improve the use of context in NMT and measure these context.
    [Show full text]
  • Final Study Report on CEF Automated Translation Value Proposition in the Context of the European LT Market/Ecosystem
    Final study report on CEF Automated Translation value proposition in the context of the European LT market/ecosystem FINAL REPORT A study prepared for the European Commission DG Communications Networks, Content & Technology by: Digital Single Market CEF AT value proposition in the context of the European LT market/ecosystem Final Study Report This study was carried out for the European Commission by Luc MEERTENS 2 Khalid CHOUKRI Stefania AGUZZI Andrejs VASILJEVS Internal identification Contract number: 2017/S 108-216374 SMART number: 2016/0103 DISCLAIMER By the European Commission, Directorate-General of Communications Networks, Content & Technology. The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the Commission. The Commission does not guarantee the accuracy of the data included in this study. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which may be made of the information contained therein. ISBN 978-92-76-00783-8 doi: 10.2759/142151 © European Union, 2019. All rights reserved. Certain parts are licensed under conditions to the EU. Reproduction is authorised provided the source is acknowledged. 2 CEF AT value proposition in the context of the European LT market/ecosystem Final Study Report CONTENTS Table of figures ................................................................................................................................................ 7 List of tables ..................................................................................................................................................
    [Show full text]
  • Three Directions for the Design of Human-Centered Machine Translation
    Three Directions for the Design of Human-Centered Machine Translation Samantha Robertson Wesley Hanwen Deng University of California, Berkeley University of California, Berkeley samantha [email protected] [email protected] Timnit Gebru Margaret Mitchell Daniel J. Liebling Black in AI [email protected] Google [email protected] [email protected] Michal Lahav Katherine Heller Mark D´ıaz Google Google Google [email protected] [email protected] [email protected] Samy Bengio Niloufar Salehi Google University of California, Berkeley [email protected] [email protected] Abstract As people all over the world adopt machine translation (MT) to communicate across lan- guages, there is increased need for affordances that aid users in understanding when to rely on automated translations. Identifying the in- formation and interactions that will most help users meet their translation needs is an open area of research at the intersection of Human- Computer Interaction (HCI) and Natural Lan- guage Processing (NLP). This paper advances Figure 1: TranslatorBot mediates interlingual dialog us- work in this area by drawing on a survey of ing machine translation. The system provides extra sup- users’ strategies in assessing translations. We port for users, for example, by suggesting simpler input identify three directions for the design of trans- text. lation systems that support more reliable and effective use of machine translation: helping users craft good inputs, helping users under- what kinds of information or interactions would stand translations, and expanding interactiv- best support user needs (Doshi-Velez and Kim, ity and adaptivity. We describe how these 2017; Miller, 2019). For instance, users may be can be introduced in current MT systems and more invested in knowing when they can rely on an highlight open questions for HCI and NLP re- AI system and when it may be making a mistake, search.
    [Show full text]
  • The MT Developer/Provider and the Global Corporation
    The MT developer/provider and the global corporation Terence Lewis1 & Rudolf M.Meier2 1 Hook & Hatton Ltd, 2 Siemens Nederland N.V. [email protected], [email protected] Abstract. This paper describes the collaboration between MT developer/service provider, Terence Lewis (Hook & Hatton Ltd) and the Translation Office of Siemens Nederland N.V. (Siemens Netherlands). It involves the use of the developer’s Dutch-English MT software to translate technical documentation for divisions of the Siemens Group and for external customers. The use of this language technology has resulted in significant cost savings for Siemens Netherlands. The authors note the evolution of an ‘MT culture’ among their regu- lar users. 1. Introduction 2. The Developer’s perspective In 2002 Siemens Netherlands and Hook & Hat- Under the current arrangement, the Developer ton Ltd signed an exclusive agreement on the (working from home in the United Kingdom) marketing of Dutch-English MT services in the currently acts as the sole operator of an Auto- Netherlands. This agreement set the seal on an matic Translation Environment (Trasy), process- informal partnership that had evolved over the ing documents forwarded by e-mail by the Trans- preceding six-seven years between Siemens and lation Office of Siemens Netherlands in The the Developer (Terence Lewis, trading through Hague. Ideally, these documents are in MS Word the private company Hook & Hatton Ltd). Un- or rtf format, but the Translation Office also re- der the contract, Siemens Netherlands under- ceives pdf files or OCR output files for automatic takes to market the MT Services provided by translation.
    [Show full text]
  • A Comparison of Free Online Machine Language Translators Mahesh Vanjani1,* and Milam Aiken2 1Jesse H
    Journal of Management Science and Business Intelligence, 2020, 5–1 July. 2020, pages 26-31 doi: 10.5281/zenodo.3961085 http://www.ibii-us.org/Journals/JMSBI/ ISBN 2472-9264 (Online), 2472-9256 (Print) A Comparison of Free Online Machine Language Translators Mahesh Vanjani1,* and Milam Aiken2 1Jesse H. Jones School of Business, Texas Southern University, 3100 Cleburne Street, Houston, TX 77004 2School of Business Administration, University of Mississippi, University, MS 38677 *Email: [email protected] Received on 5/15/2020; revised on 7/25/2020; published on 7/26/2020 Abstract Automatic Language Translators also referred to as machine translation software automate the process of language translation without the intervention of humans While several automated language translators are available online at no cost there are large variations in their capabilities. This article reviews prior tests of some of these systems, and, provides a new and current comprehensive evaluation of the following eight: Google Translate, Bing Translator, Systran, PROMT, Babylon, WorldLingo, Yandex, and Reverso. This re- search could be helpful for users attempting to explore and decide which automated language translator best suits their needs. Keywords: Automated Language Translator, Multilingual, Communication, Machine Translation published results of prior tests and experiments we present a new and cur- 1 Introduction rent comprehensive evaluation of the following eight electronic automatic language translators: Google Translate, Bing Translator, Systran, Automatic Language Translators also referred to as machine translation PROMT, Babylon, WorldLingo, Yandex, and Reverso. This research software automate the process of language translation without the inter- could be helpful for users attempting to explore and decide which auto- vention of humans.
    [Show full text]
  • Machine Translation in the Field of Law: a Study of the Translation of Italian Legal Texts Into German
    Comparative Legilinguistics vol. 37/2019 DOI: http://dx.doi.org/10.14746/cl.2019.37.4 MACHINE TRANSLATION IN THE FIELD OF LAW: A STUDY OF THE TRANSLATION OF ITALIAN LEGAL TEXTS INTO GERMAN EVA WIESMANN, Prof., PhD Department of Interpreting and Translation, University of Bologna Corso della Repubblica 136, 47121 Forlì, Italy [email protected] ORCID: https://orcid.org/0000-0001-9414-8038 Abstract: With the advent of the neural paradigm, machine translation has made another leap in quality. As a result, its use by trainee translators has increased considerably, which cannot be disregarded in translation pedagogy. However, since legal texts have features that pose major challenges to machine translation, the question arises as to what extent machine translation is now capable of translating legal texts or at least certain types of legal text into another legal language well enough so that the post- editing effort is limited, and, consequently, whether a targeted use in translation pedagogy can be considered. In order to answer this question, DeepL Translator, a machine translation system, and MateCat, a CAT system that integrates machine translation, were tested. The test, undertaken at different times and without specific translation memories, provided Eva Wiesmann: Machine Translation in… for the translation of several legal texts of different types utilising both systems, and was followed by systematisation of errors and evaluation of translation results. The evaluation was carried out according to the following criteria: 1) comprehensibility and meaningfulness of the target text; and 2) correspondence between source and target text in consideration of the specific translation situation.
    [Show full text]
  • Automated Translation in Typo3
    AUTOMATED TRANSLATION IN TYPO3 Whitepaper on Translating contents automatically using DeepL and Google Translate in TYPO3 backend by Georgy GE What is TYPO3? TYPO3 CMS is an Open Source Enterprise Content Management System with a large global community. TYPO3 is written in PHP scripting language and was initially authored by Dane Kasper Skårhøj in 1997. The standout features of TYPO3 is its diversity and modularity. It can keep running on multiple web servers like Apache, Nginx or IIS and over numerous operating systems such as Linux, Microsoft Windows, FreeBSD, Mac OS X and OS/2. TYPO3 provides the basis for modern content management that can be adapted by small business websites to large multi-lingual global enterprises portals. TYPO3 always keeps a check on the updated requirements of businesses and public institutions. TYPO3 is opted by small and medium enterprises and municipalities because of its license- cost-free open source approach. In addition to the basic set of interfaces, functions and modules, TYPO3's functionality spectrum can be implemented using extensions. TYPO3 allows users or website operators to upgrade the corresponding websites using a solid extensions framework, and helps to publish and deliver any form of content to multiple devices. During the course of development of a TYPO3 website, a developer can rely on a strong community and about 6,000 extensions that offer unlimited possibilities. Ph : +41 43 558 4360 E-mail: [email protected] Visit: www.pitsolutions.com 2 Why TYPO3? 1. No license cost TYPO3 is an open source software under the GNU General Public License with no license fee.
    [Show full text]
  • TAUS Speech-To-Speech Translation Technology Report March 2017
    TAUS Speech-to-Speech Translation Technology Report March 2017 1 Authors: Mark Seligman, Alex Waibel, and Andrew Joscelyne Contributor: Anne Stoker COPYRIGHT © TAUS 2017 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system of any nature, or transmit- ted or made available in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of TAUS. TAUS will pursue copyright infringements. In spite of careful preparation and editing, this publication may contain errors and imperfections. Au- thors, editors, and TAUS do not accept responsibility for the consequences that may result thereof. Design: Anne-Maj van der Meer Published by TAUS BV, De Rijp, The Netherlands For further information, please email [email protected] 2 Table of Contents Introduction 4 A Note on Terminology 5 Past 6 Orientation: Speech Translation Issues 6 Present 17 Interviews 19 Future 42 Development of Current Trends 43 New Technology: The Neural Paradigm 48 References 51 Appendix: Survey Results 55 About the Authors 56 3 Introduction The dream of automatic speech-to-speech devices enabling on-the-spot exchanges using translation (S2ST), like that of automated trans- voice or text via smartphones – has helped lation in general, goes back to the origins of promote the vision of natural communication computing in the 1950s. Portable speech trans- on a globally connected planet: the ability to lation devices have been variously imagined speak to someone (or to a robot/chatbot) in as Star Trek’s “universal translator” to negoti- your language and be immediately understood ate extraterrestrial tongues, Douglas Adams’ in a foreign language.
    [Show full text]
  • Report About a Survey-Based Research on Machine Translation Post-Editing
    REPORT ABOUT A SURVEY‐BASED RESEARCH ON MACHINE TRANSLATION POST‐EDITING Common ground and gaps between LSCs, linguists and trainers Clara Ginovart Cid Universitat Pompeu Fabra Barcelona 22/12/2020 This report presents the findings of a survey‐based research based on three online questionnaires. It is part of the Industrial Doctorates research n° “2017 DI 010”. One questionnaire is addressed to Language Service Companies (LSCs) that sell or use as a process machine translation post‐editing (MTPE, also abbreviated PEMT). It received 66 submissions. Another questionnaire is addressed to individual linguists (inhouse or freelance) who accept MTPE assignments, and it received 141 submissions. Finally, the third questionnaire is addressed to European master or postgraduate PE educators, and it received 54 submissions. The survey‐based research is completed in between the end of 2018 and the beginning of 2019. In this report, the content of each questionnaire is presented in the first section (1. Content of the Questionnaires). In the second section (2. Results of the Submissions), the data of their findings is displayed, where the percentages have been rounded off. The three related publications where the findings are discussed are cited below: Ginovart Cid C, Colominas C, Oliver A. Language industry views on the profile of the post‐ editor. Translation Spaces. 2020 Jan 14. DOI:10.1075/ts.19010.cid Ginovart Cid C. The Professional Profile of a Post‐editor according to LSCs and Linguists: a Survey‐Based Research. HERMES ‐ Journal of Language and Communication in Business, 60, 171‐190. 2020 Jul 08. DOI:10.7146/hjlcb.v60i0.121318 Ginovart Cid C, Oliver A.
    [Show full text]
  • Assessing the Comprehensibility and Perception of Machine Translations
    ASSESSING THE COMPREHENSIBILITY AND PERCEPTION OF MACHINE TRANSLATIONS A PILOT STUDY Aantal woorden: 16 804 Iris Ghyselen Studentennummer: 01400320 Promotor: Prof. dr. Lieve Macken Masterproef voorgelegd voor het behalen van de graad master in het vertalen in de richting Toegepaste Taalkunde Academiejaar: 2017 – 2018 ASSESSING THE COMPREHENSIBILITY AND PERCEPTION OF MACHINE TRANSLATIONS A PILOT STUDY Aantal woorden: 16 804 Iris Ghyselen Studentennummer: 01400320 Promotor: Prof. dr. Lieve Macken Masterproef voorgelegd voor het behalen van de graad master in het vertalen in de richting Toegepaste Taalkunde Academiejaar: 2017 – 2018 i VERKLARING I.V.M. AUTEURSRECHT De auteur en de promotor(en) geven de toelating deze studie als geheel voor consultatie beschikbaar te stellen voor persoonlijk gebruik. Elk ander gebruik valt onder de beperkingen van het auteursrecht, in het bijzonder met betrekking tot de verplichting de bron uitdrukkelijk te vermelden bij het aanhalen van gegevens uit deze studie. ii ACKNOWLEDGMENTS There are several people who deserve my profound gratitude for their help and support while I was writing this dissertation. In the first place I would like to give thanks to all the respondents who filled in my questionnaire. That small act of kindness meant a great deal to me. The questionnaire required some time and attention to fill in and was distributed in a period when people were being overwhelmed on social media with all kinds of questionnaires for other theses. Therefore, I am very grateful to each and every one of them who helped me. Secondly, I would like to express my deep sense of gratitude to my supervisor, Prof.
    [Show full text]
  • Translation Technology Landscape Report
    Translation Technology Landscape Report April 2013 Funded by LT-Innovate Copyright © TAUS 2013 1 Translation Technology Landscape Report COPYRIGHT © TAUS 2013 All rights reserved. No part of this publication may be reproduced, stored in a retrieval system of any nature, or transmitted or made available in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of TAUS. TAUS will pursue copyright infringements. In spite of careful preparation and editing, this publication may contain errors and imperfections. Authors, editors, and TAUS do not accept responsibility for the consequences that may result thereof. Published by TAUS BV, De Rijp, The Netherlands. For further information, please email [email protected] 2 Copyright © TAUS 2013 Translation Technology Landscape Report Target Audience Any individual interested in the business of translation will gain from this report. The report will help beginners to understand the main uses for different types of translation technology, differentiate offerings and make informed decisions. For more experienced users and business decision-makers, the report shares insights on key trends, future prospects and areas of uncertainty. ,QYHVWRUVDQGSROLF\PDNHUVZLOOEHQH¿WIURPDQDO\VHVRIXQGHUO\LQJYDOXH propositions. Report Structure This report has been structured in discrete chapters and sections so that the reader can consume the information needed. The report does not need to be read from VWDUWWR¿QLVK7KHFRQYHQLHQFHDIIRUGHGIURPVXFKDVWUXFWXUHDOVRPHDQVWKHUHLV some repetition. Authors: Rahzeb Choudhury and Brian McConnell Reviewers - Jaap van der Meer and Rose Lockwood Our thanks to LT-Innovative for funding this report. Copyright © TAUS 2013 3 Translation Technology Landscape Report CONTENTS 1. Translation Technology Landscape in a Snapshot 8 2.
    [Show full text]