TAUS Speech-To-Speech Translation Technology Report March 2017
16 pages · PDF · 1,020 KB
Recommended publications
- "Can You Give Me Another Word for Hyperbaric?": Improving Speech Translation Using Targeted Clarification Questions
  Necip Fazil Ayan, Arindam Mandal, Michael Frandsen, Jing Zheng, Peter Blasco, Andreas Kathol (SRI International, Menlo Park, USA); Frédéric Béchet, Benoit Favre (Aix-Marseille Université, Marseille, France); Alex Marin, Tom Kwiatkowski, Mari Ostendorf, Luke Zettlemoyer (University of Washington, Seattle, USA); Julia Hirschberg, Svetlana Stoyanchev (Columbia University, New York, USA); Philipp Salletmayr (Graz Institute of Technology, Austria)
  Abstract: We present a novel approach for improving communication success between users of speech-to-speech translation systems by automatically detecting errors in the output of automatic speech recognition (ASR) and statistical machine translation (SMT) systems. Our approach initiates system-driven targeted clarification about errorful regions in user input and repairs them given user responses. Our system has been evaluated by unbiased subjects in live mode, and results show improved success of communication between users of the system.
  Index Terms: speech translation, error detection, error correction, spoken dialog systems.
  Our previous work on speech-to-speech translation systems has shown that there are seven primary sources of errors in translation:
  • ASR named entity OOVs: "Hi, my name is Colonel Zigman."
  • ASR non-named entity OOVs: "I want some pristine plates."
  • Mispronunciations: "I want to collect some +de-MOG-raf-ees about your family?" (demographics)
  • Homophones: "Do you have any patients to try this medication?" (patients vs. patience)
  • MT OOVs: "Where is your father-in-law?"
  • Word sense ambiguity: "How many men are in your company?" (organization vs. […])

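The targeted-clarification idea in the entry above lends itself to a brief illustration. The sketch below is only a toy, not the SRI system: it assumes a hypothetical error detector has already marked suspect spans, and shows how a system might template one clarification question per errorful region.

```python
# Toy sketch of targeted clarification (not the SRI system): error spans,
# error-type labels and question templates are hypothetical stand-ins.
from dataclasses import dataclass

@dataclass
class ErrorSpan:
    start: int        # index of the first suspect token
    end: int          # index one past the last suspect token
    error_type: str   # e.g. "asr_oov", "homophone", "word_sense"

TEMPLATES = {
    "asr_oov": 'I did not catch "{span}". Could you spell it or use another word?',
    "homophone": 'Did you mean "{span}" as in {gloss}?',
    "word_sense": 'When you say "{span}", which sense do you mean: {gloss}?',
}

def clarification_questions(tokens: list[str], errors: list[ErrorSpan]) -> list[str]:
    """Generate one targeted clarification question per detected errorful region."""
    questions = []
    for err in errors:
        span = " ".join(tokens[err.start:err.end])
        template = TEMPLATES.get(
            err.error_type, 'Could you repeat the part about "{span}"?')
        # str.format ignores unused keyword arguments, so every template works here.
        questions.append(template.format(span=span, gloss="<candidate senses>"))
    return questions

# Example: the word-sense case quoted in the abstract.
tokens = "How many men are in your company ?".split()
print(clarification_questions(tokens, [ErrorSpan(6, 7, "word_sense")]))
```
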
- Venturing into Our Past: Newsletter of the Jewish Genealogical Society of the Conejo Valley and Ventura County (JGSCV), Volume 4, Issue 5, February 2009
  President's Letter: There was laughter while learning at Ron Arons's presentation at the January 4 meeting, "The Musical 'Chicago' and All That Genealogical Jazz"! Ron is a four-time guest speaker, each time with a different presentation that has the audience learning while enjoying the tales he tells. This presentation, part of a program given at the 2008 IAJGS International Conference on Jewish Genealogy, used a myriad of internet resources, newspaper and court documents to separate truth from fiction about Belva Gaertner, one of the two real-life women tried for murdering their lovers in Chicago in 1924. The websites and references Ron referred to are posted on the www.JGSCV.org website under Programs, Previous, at the January 4, 2009 date.
  Recipients of this newsletter who have not yet joined JGSCV for 2009 are encouraged to do so. What better way to spend $25/$30 than on socialization, fun, education and meeting new friends! Additional contributions to our library and program funds are greatly appreciated. Because JGSCV is a non-profit organization, your contributions are tax-deductible. [Photo: speaker Ron Arons]
  We met last November and December on Monday evenings. Some members may prefer the Sunday afternoon meetings. The board is considering holding 2-3 meetings a year on a weekday evening and would like to hear from members who would have problems with such a plan. The majority of meetings would still be on Sunday afternoons.

- Neuroinformatics 2019, September 1-2, Warsaw, Poland: Program Book
  What is INCF? INCF is an international organization launched in 2005, following a proposal from the Global Science Forum of the OECD to establish international coordination and a collaborative informatics infrastructure for neuroscience. INCF is hosted by Karolinska Institutet and the Royal Institute of Technology in Stockholm, Sweden. INCF currently has Governing and Associate Nodes spanning four continents, with an extended network comprising organizations, individual researchers, industry, and publishers. INCF promotes the implementation of neuroinformatics and aims to advance data reuse and reproducibility in global brain research by:
  • developing and endorsing community standards and best practices
  • leading the development and provision of training and educational resources in neuroinformatics
  • promoting open science and the sharing of data and other resources
  • partnering with international stakeholders to promote neuroinformatics at global, national and local levels
  • engaging scientific, clinical, technical, industry, and funding partners in collaborative, community-driven projects
  INCF supports the FAIR (Findable, Accessible, Interoperable, Reusable) principles and strives to implement them across all deliverables and activities. Learn more: incf.org, neuroinformatics2019.org
  Welcome: Welcome to the 12th INCF Congress in Warsaw! It makes me very happy that, a decade after the 2nd INCF Congress took place in Plzen, Czech Republic, the meeting comes to Central Europe for the second time, to Warsaw. The global neuroinformatics scenery has changed dramatically over these years. With the European Human Brain Project, the US BRAIN Initiative, the Japanese Brain/MINDS initiative, and many others worldwide, neuroinformatics is alive more than ever, responding to the demands set forth by modern brain studies.

- Skype Translator: Breaking Down Language and Hearing Barriers. A Behind the Scenes Look at Near Real-Time Speech Translation
  William D. Lewis, Microsoft Research, One Microsoft Way, Redmond, WA 98125. E-mail: [email protected]
  Abstract: In the Skype Translator project, we set ourselves the ambitious goal of enabling successful open-domain conversations between Skype users in different parts of the world, speaking different languages. Building such technology is more than just stitching together the component parts; it also requires work in allowing the parts to talk with one another. In addition to allowing speech communication between users who speak different languages, these technologies also enable Skype communication with another class of users: those who are deaf or hard of hearing. Accommodating these additional users required design changes that benefited all users of Skype Translator. Not only does Skype Translator promise to break down language barriers, it also promises to break down the hearing barrier.
  1 Introduction: In 1966, Star Trek introduced us to the notion of the Universal Translator. Such a device allowed Captain Kirk and his crew to communicate with alien species, such as the Gorn, who did not speak their language, or even converse with species who did not speak at all (e.g., the Companion from the episode "Metamorphosis"). In 1979, Douglas Adams introduced us to the "Babel fish" in The Hitchhiker's Guide to the Galaxy which, when inserted into the ear, allowed the main character to do essentially the same thing: communicate with alien species who spoke different languages. Although flawless communication using speech and translation technology is beyond the current state of the art, major improvements in these technologies over the past decade have brought us many steps closer.

E-Book Readers Kindle Sony and Nook a Comparative Study
E-BOOK READERS KINDLE SONY AND NOOK: A COMPARATIVE STUDY Baban Kumbhar Librarian Rayat Shikshan Sanstha’s Dada Patil Mahavidyalaya, Karjat, Dist Ahmednagar E-mail:[email protected] Abstract An e-book reader is a portable electronic device for reading digital books and periodicals, better known as e- books. An e-book reader is similar in form to a tablet computer. Kindle is a series of e-book readers designed and marketed by Amazon.com. The Sony Reader was a line of e-book readers manufactured by Sony. NOOK is a brand of e-readers developed by American book retailer Barnes & Noble. Researcher compares three E-book readers in the paper. Keywords: e-book reader, Kindle, Sony, Nook 1. Introduction: An e-reader, also called an e-book reader or e-book device, is a mobile electronic device that is designed primarily for the purpose of reading digital e-books and periodicals. Any device that can display text on a screen may act as an e-book reader, but specialized e-book reader designs may optimize portability, readability (especially in sunlight), and battery life for this purpose. A single e-book reader is capable of holding the digital equivalent of hundreds of printed texts with no added bulk or measurable mass.[1] An e-book reader is similar in form to a tablet computer. A tablet computer typically has a faster screen capable of higher refresh rates which makes it more suitable for interaction. Tablet computers also are more versatile, allowing one to consume multiple types of content, as well as create it. -
- CCIA Comments in ITU CWG-Internet OTT Open Consultation
  CCIA Response to the Open Consultation of the ITU Council Working Group on International Internet-related Public Policy Issues (CWG-Internet) on the "Public Policy considerations for OTTs" [1].
  Summary: The Computer & Communications Industry Association welcomes this opportunity to present the views of the tech sector to the ITU's Open Consultation of the CWG-Internet on the "Public Policy considerations for OTTs". CCIA acknowledges the ITU's expertise in the areas of international technical standards development and spectrum coordination, and its ambition to help improve access to ICTs for underserved communities worldwide. We remain supporters of the ITU's important work within its current mandate and remit; however, we strongly oppose expanding the ITU's work program to include Internet and content-related issues and Internet-enabled applications that are well beyond its mandate and core competencies. Furthermore, such an expansion would regrettably divert the ITU's resources away from its globally recognized core competencies. The Internet is an unparalleled engine of economic growth enabling commerce, social development and freedom of expression. Recent research notes the vast economic and societal benefits from Rich Interaction Applications (RIAs), a term that refers to applications that facilitate "rich interaction" such as photo/video sharing, money transferring, in-app gaming, location sharing, translation, and chat among individuals, groups and enterprises [2]. Global GDP has increased US$5.6 trillion for every ten percent increase in the usage of RIAs across 164 countries over 16 years (2000 to 2015) [3]. However, these economic and societal benefits are at risk if RIAs are subjected to sweeping regulations.

- A Survey of Voice Translation Methodologies: Acoustic Dialect Decoder
  International Conference on Information Communication & Embedded Systems (ICICES-2016)
  Hans Krupakar, Keerthika Rajvel, Bharathi B, Angel Deborah S, Vallidevi Krishnamurthy (Dept. of Computer Science and Engineering, SSN College of Engineering). E-mail: [email protected]
  Abstract: Speech translation has always been about giving source text/audio input and waiting for the system to give translated output in the desired form. In this paper, we present the Acoustic Dialect Decoder (ADD), a voice-to-voice ear-piece translation device. We introduce and survey the recent advances made in the field of speech engineering, to employ in the ADD, particularly focusing on the three major processing steps of recognition, translation and synthesis. We tackle the problem of machine understanding of natural language by […] in order to make the process of communication amongst humans better, easier and more efficient. However, the existing methods, including Google voice translators, typically handle the process of translation in a non-automated manner. This makes the process of translating word(s) and/or sentences from one language to another slower and more tedious. We wish to make that process automatic: to have a device do what a human translator does, inside our ears.

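As a minimal illustration of the cascade implied by those three processing steps, the sketch below composes recognition, translation and synthesis behind assumed interfaces; the `Recognizer`, `Translator` and `Synthesizer` protocols are hypothetical placeholders, not the ADD's actual components.

```python
# Minimal sketch of a recognition -> translation -> synthesis cascade.
# The component interfaces are hypothetical placeholders, not the ADD's design.
from typing import Protocol

class Recognizer(Protocol):
    def transcribe(self, audio: bytes, language: str) -> str: ...

class Translator(Protocol):
    def translate(self, text: str, source: str, target: str) -> str: ...

class Synthesizer(Protocol):
    def speak(self, text: str, language: str) -> bytes: ...

def translate_speech(audio: bytes, source: str, target: str,
                     asr: Recognizer, mt: Translator, tts: Synthesizer) -> bytes:
    """Cascade the three steps: ASR output feeds MT, MT output feeds TTS."""
    transcript = asr.transcribe(audio, language=source)      # recognition
    translation = mt.translate(transcript, source, target)   # translation
    return tts.speak(translation, language=target)           # synthesis
```
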
- Learning Speech Translation from Interpretation
  Dissertation approved by the Faculty of Informatics of the Karlsruhe Institute of Technology (KIT) for the academic degree of Doktor der Ingenieurwissenschaften, submitted by Matthias Paulik of Karlsruhe. Date of oral examination: 21 May 2010. First reviewer: Prof. Dr. Alexander Waibel. Second reviewer: Prof. Dr. Tanja Schultz.
  Declaration: I hereby declare that I wrote this thesis independently, used no sources or aids other than those indicated, marked all passages taken over verbatim or in substance as such, and observed the statutes of KIT, formerly Universität Karlsruhe (TH), on safeguarding good scientific practice in their applicable version. Karlsruhe, 21 May 2010, Matthias Paulik.
  Abstract: The basic objective of this thesis is to examine the extent to which automatic speech translation can benefit from an often available but ignored resource, namely human interpreter speech. The main contribution of this thesis is a novel approach to speech translation development which makes use of that resource. The performance of the statistical models employed in modern speech translation systems depends heavily on the availability of vast amounts of training data. State-of-the-art systems are typically trained on: (1) hundreds, sometimes thousands of hours of manually transcribed speech audio; (2) bilingual, sentence-aligned text corpora of manual translations, often comprising tens of millions of words; and (3) monolingual text corpora, often comprising hundreds of millions of words. The acquisition of such enormous data resources is highly time-consuming and expensive, rendering the development of deployable speech translation systems prohibitive to all but a handful of economically or politically viable languages.

- Three Directions for the Design of Human-Centered Machine Translation
  Samantha Robertson, Wesley Hanwen Deng, Niloufar Salehi (University of California, Berkeley); Timnit Gebru (Black in AI); Margaret Mitchell; Daniel J. Liebling, Michal Lahav, Katherine Heller, Mark Díaz, Samy Bengio (Google)
  Abstract: As people all over the world adopt machine translation (MT) to communicate across languages, there is increased need for affordances that aid users in understanding when to rely on automated translations. Identifying the information and interactions that will most help users meet their translation needs is an open area of research at the intersection of Human-Computer Interaction (HCI) and Natural Language Processing (NLP). This paper advances work in this area by drawing on a survey of users' strategies in assessing translations. We identify three directions for the design of translation systems that support more reliable and effective use of machine translation: helping users craft good inputs, helping users understand translations, and expanding interactivity and adaptivity. We describe how these can be introduced in current MT systems and highlight open questions for HCI and NLP research.
  [Figure 1: TranslatorBot mediates interlingual dialog using machine translation. The system provides extra support for users, for example, by suggesting simpler input text.]
  […] what kinds of information or interactions would best support user needs (Doshi-Velez and Kim, 2017; Miller, 2019). For instance, users may be more invested in knowing when they can rely on an AI system and when it may be making a mistake, […]

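As one concrete example of "helping users understand translations", the sketch below implements a round-trip (backtranslation) check, a strategy users commonly apply on their own. It only illustrates the design direction and is not a method proposed in the paper; `translate()` is a hypothetical stand-in for any MT backend and the threshold is arbitrary.

```python
# Illustrative sketch only: flag possibly unreliable translations with a
# round-trip (backtranslation) check. translate() is a hypothetical
# placeholder for an MT backend; the 0.6 threshold is arbitrary.
from difflib import SequenceMatcher
from typing import Optional

def translate(text: str, source: str, target: str) -> str:
    """Placeholder for a call to an MT system."""
    raise NotImplementedError("plug in an MT backend here")

def round_trip_warning(text: str, source: str, target: str,
                       threshold: float = 0.6) -> Optional[str]:
    """Return a warning if the backtranslation drifts far from the original input."""
    forward = translate(text, source, target)
    back = translate(forward, target, source)
    similarity = SequenceMatcher(None, text.lower(), back.lower()).ratio()
    if similarity < threshold:
        return ("This translation may be unreliable. "
                "Consider rephrasing with shorter, simpler sentences.")
    return None
```
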
- Is 42 the Answer to Everything in Subtitling-oriented Speech Translation?
  Alina Karakanta (Fondazione Bruno Kessler / University of Trento), Matteo Negri (Fondazione Bruno Kessler), Marco Turchi (Fondazione Bruno Kessler), Trento, Italy. E-mail: [email protected], [email protected], [email protected]
  Abstract: Subtitling is becoming increasingly important for disseminating information, given the enormous amounts of audiovisual content becoming available daily. Although Neural Machine Translation (NMT) can speed up the process of translating audiovisual content, large manual effort is still required for transcribing the source language, and for spotting and segmenting the text into proper subtitles. Creating proper subtitles in terms of timing and segmentation highly depends on information present in the audio (utterance duration, natural pauses). In this work, we explore two methods for applying Speech Translation (ST) to subtitling: a) a direct end-to-end and b) a classical cascade approach.
  […] content, still rely heavily on human effort. In a typical multilingual subtitling workflow, a subtitler first creates a subtitle template (Georgakopoulou, 2019) by transcribing the source language audio, timing and adapting the text to create proper subtitles in the source language. These source language subtitles (also called captions) are already compressed and segmented to respect the subtitling constraints of length, reading speed and proper segmentation (Cintas and Remael, 2007; Karakanta et al., 2019). In this way, the work of an NMT system is already simplified, since it only needs to translate matching the length of the source text (Matusov et al., 2019; Lakew et al., 2019). However, the essence of a good subtitle goes beyond matching a predefined […]

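The "42" of the title alludes to the widely used guideline of at most 42 characters per subtitle line. As an illustration of the length and reading-speed constraints mentioned in the excerpt, the sketch below checks a subtitle block against such limits; the specific numbers (42 characters per line, 21 characters per second, two lines) are common industry guidelines used here only for illustration, not necessarily the values used in the paper.

```python
# Minimal sketch of checking common subtitling constraints (line length,
# line count, reading speed). The limits are typical industry guidelines,
# not values taken from the paper's experimental setup.
from dataclasses import dataclass

MAX_CHARS_PER_LINE = 42
MAX_LINES = 2
MAX_CHARS_PER_SECOND = 21.0

@dataclass
class Subtitle:
    lines: list[str]   # the text lines of one subtitle block
    start: float       # display start time in seconds
    end: float         # display end time in seconds

def violations(sub: Subtitle) -> list[str]:
    """Return human-readable descriptions of constraint violations."""
    problems = []
    if len(sub.lines) > MAX_LINES:
        problems.append(f"too many lines: {len(sub.lines)} > {MAX_LINES}")
    for i, line in enumerate(sub.lines, start=1):
        if len(line) > MAX_CHARS_PER_LINE:
            problems.append(f"line {i} has {len(line)} chars > {MAX_CHARS_PER_LINE}")
    duration = max(sub.end - sub.start, 1e-6)
    cps = sum(len(line) for line in sub.lines) / duration
    if cps > MAX_CHARS_PER_SECOND:
        problems.append(f"reading speed {cps:.1f} cps > {MAX_CHARS_PER_SECOND}")
    return problems

# Example: a two-line subtitle displayed for 1.5 seconds.
print(violations(Subtitle(["This line is fine.", "So is this one."], 0.0, 1.5)))
```
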
Speech Translation Into Pakistan Sign Language
Master Thesis Computer Science Thesis No: MCS-2010-24 2012 Speech Translation into Pakistan Sign Language Speech Translation into Pakistan Sign AhmedLanguage Abdul Haseeb Asim Illyas Speech Translation into Pakistan Sign LanguageAsim Ilyas School of Computing Blekinge Institute of Technology SE – 371 79 Karlskrona Sweden This thesis is submitted to the School of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full time studies. Contact Information: Authors: Ahmed Abdul Haseeb E-mail: [email protected] Asim Ilyas E-mail: [email protected] University advisor: Prof. Sara Eriksén School of Computing, Blekinge Institute of Technology Internet : www.bth.se/com SE – 371 79 Karlskrona Phone : +46 457 38 50 00 Sweden Fax : + 46 457 271 25 ABSTRACT Context: Communication is a primary human need and language is the medium for this. Most people have the ability to listen and speak and they use different languages like Swedish, Urdu and English etc. to communicate. Hearing impaired people use signs to communicate. Pakistan Sign Language (PSL) is the preferred language of the deaf in Pakistan. Currently, human PSL interpreters are required to facilitate communication between the deaf and hearing; they are not always available, which means that communication among the deaf and other people may be impaired or nonexistent. In this situation, a system with voice recognition as an input and PSL as an output will be highly helpful. Objectives: As part of this thesis, we explore challenges faced by deaf people in everyday life while interacting with unimpaired. -
- The MT Developer/Provider and the Global Corporation
  Terence Lewis (Hook & Hatton Ltd) & Rudolf M. Meier (Siemens Nederland N.V.). E-mail: [email protected], [email protected]
  Abstract: This paper describes the collaboration between MT developer/service provider Terence Lewis (Hook & Hatton Ltd) and the Translation Office of Siemens Nederland N.V. (Siemens Netherlands). It involves the use of the developer's Dutch-English MT software to translate technical documentation for divisions of the Siemens Group and for external customers. The use of this language technology has resulted in significant cost savings for Siemens Netherlands. The authors note the evolution of an "MT culture" among their regular users.
  1. Introduction: In 2002 Siemens Netherlands and Hook & Hatton Ltd signed an exclusive agreement on the marketing of Dutch-English MT services in the Netherlands. This agreement set the seal on an informal partnership that had evolved over the preceding six to seven years between Siemens and the Developer (Terence Lewis, trading through the private company Hook & Hatton Ltd). Under the contract, Siemens Netherlands undertakes to market the MT services provided by […]
  2. The Developer's perspective: Under the current arrangement, the Developer (working from home in the United Kingdom) acts as the sole operator of an Automatic Translation Environment (Trasy), processing documents forwarded by e-mail by the Translation Office of Siemens Netherlands in The Hague. Ideally, these documents are in MS Word or RTF format, but the Translation Office also receives PDF files or OCR output files for automatic translation.