Statistical Machine Translation from English to Tuvan*

Rachel Killackey, Swarthmore College
Linguistics Senior Thesis 2013

Abstract

This thesis aims to describe and analyze findings of the Tuvan Machine Translation Project, which attempts to create a functional statistical machine translation (SMT) model between English and Tuvan, a minority language spoken in southern Siberia. Though most Tuvan speakers are also fluent in Russian, easily accessible SMT technology would allow for simpler English translation without the use of Russian as an intermediary language. The English to Tuvan half of the system that I examine makes consistent morphological errors, particularly involving the absence of the accusative suffix with the basic form -ni. Along with a typological analysis of these errors, I show that the introduction of novel data that corrects for the missing accusative suffix can improve the performance of an SMT system. This result leads me to conclude that SMT can be a useful avenue for efficient translation. However, I also argue that SMT may benefit from the incorporation of some linguistic knowledge such as morphological rules in the early steps of creating a system.

1. Introduction

This thesis explores the field of machine translation (MT), the use of computers in rendering one natural language into another, with a specific focus on MT between English and Tuvan, a Turkic language spoken in south central Siberia. While MT is a growing force in the translation of major languages with millions of speakers such as French, Spanish, and Russian, minority and non-dominant languages with relatively few speakers have been largely ignored.
Additionally, languages with complex morphology have been difficult candidates for the creation of successful MT systems, particularly with regard to statistical machine translation (SMT), which uses probabilistic methods in processing a corpus of texts. Tuvan fulfills both of these qualities: it is a minority language with complex, agglutinative morphology. Thus, Tuvan presents an interesting subject for the implementation of an SMT system.

* Many thanks to the following people for their help throughout the process of writing this thesis: Nathan Sanders, for his incredibly helpful thesis advising; K. David Harrison, who generously provided the opportunity for me to do this research; Kathryn Montemurro, for her indispensable wisdom and music taste; Vicki Sear, for her excellent editing skills; and Peter Nilsson, for his unparalleled scripting expertise. I also want to extend a huge thank you to the Microsoft Research machine translation team and to all of the Tuvan reviewers and project leaders who worked on the Tuvan MT Project. I am profoundly indebted to their contributions to the Project and therefore to this thesis. Any remaining errors are my own. This thesis is dedicated to my late father, Joseph Killackey.

This thesis culminates with the assertion that an SMT system can in fact improve from the reintroduction of data that targets and corrects morphological errors - specifically involving a linguistic unit as minute as one affix - that the system has made previously. Thus, I argue for an approach to SMT that also incorporates elements of linguistic structure.

I begin in Section 2 with an overview of the available literature on machine translation, focusing on the two major paradigms of MT: rule-based and statistical machine translation. In addition, I discuss the primary way in which the output quality of most MT systems is evaluated: the Bilingual Evaluation Understudy (BLEU) score.
In Section 3, I introduce the Tuvan Machine Translation Project and the Microsoft Translator Hub and summarize the methodology of the Project. I also present a basic sketch of the grammar of Tuvan, emphasizing complex elements of phonology and morphology that have been difficult for the Project's SMT system to grapple with, and I present the results of the English to Tuvan half of the system. In Section 4, I analyze the types of errors that the Project's SMT system makes and the post-editing corrections to these errors made by fluent Tuvan speakers. I provide an analysis of the effects of re-presenting the corrected data into the system in Section 5 and summarize the major results of this thesis and offer concluding remarks in Section 6.

2. Machine Translation

From its inception in the 1950s, the field of machine translation (MT) has had a long and varied history to reach its status today as a major presence in both the research community and commercial sector (Hutchins 1986). Defined as "the application of computers to the translation of texts from one natural language to another," MT has been implemented to achieve any one or more of the following goals: assimilation, the translation of foreign material for the purpose of understanding the content; dissemination, translating text for publication in other languages; and communication, the translation of more informal content such as emails, chat room discussions, and online blogs (Hutchins 1986:14, Koehn 2010). While most MT systems certainly do not produce perfect translations, the output can still be useful even for monolingual foreign language speakers in gathering a basic understanding of a text. For the purposes of this thesis, I am concerned primarily with the MT goals of assimilation and communication.

To begin, I define some key terms.
In the domain of MT, a source language is the language from which a text is being translated, while a target language is the language into which a text is being translated (Hutchins 1986). Together, these two languages are called a language pair. Thus, translation can be defined as the general task in which texts in the source language are rendered into the target language, such that "the only invariant between the two is meaning" (Nirenburg and Goodman 1998:291). Furthermore, most MT systems require some degree of post-editing, or human revision to the MT system output. Native speakers of the target language usually perform the post-editing, with their main task being the rearrangement of the MT output into coherent, grammatical sentences of the target language. Parallel corpora are source language texts paired with their target translations and are imperative for many types of MT. Monolingual texts are documents written in the target language that help the translation system decide which of the considered alternative translations is more accurate, natural-sounding, and in tune with context in examples of the target language. Finally, reference translations are texts written in the target language to which the translation system output is compared in computing the BLEU score.

The subject of what constitutes a "good" translation of any kind is still relatively ill-defined, though there are methods for assessing and comparing the quality of the output of MT systems (see Section 2.3). However, the main task of MT can be stated quite simply: the computer must obtain input in the source language and produce an output text in the target language so that the meaning of the source text is the same as that of the target text. In fact, the differences among the MT efforts can be summarized in terms of the solutions that they propose for the problem of finding meanings of expression in target language for the various facts of meaning of the input text units.
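To make the role of reference translations in the BLEU score more concrete, the following Python sketch computes a simplified, sentence-level BLEU-style score against a single reference. It is an illustration only: the function names (`ngrams`, `modified_precision`, `simple_bleu`) and the toy English tokens are hypothetical, no smoothing is applied, and it should not be read as the scoring used by the Tuvan MT Project or the Microsoft Translator Hub.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against one reference."""
    cand_counts = ngrams(candidate, n)
    ref_counts = ngrams(reference, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Credit each candidate n-gram at most as often as it occurs in the reference.
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def simple_bleu(candidate, reference, max_n=4):
    """Hypothetical helper: unsmoothed, single-reference BLEU-style score."""
    if not candidate:
        return 0.0
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one empty n-gram level zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: candidates no longer than the reference are penalized
    # in proportion to how much shorter they are.
    if len(candidate) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_mean)


reference = "the cat sat on the mat".split()
system_output = "the cat sat on a mat".split()
print(simple_bleu(system_output, reference))
```

The clipping step is what keeps a system from scoring well by repeating a frequent word: an output consisting of "the" seven times matches the reference above on only two of its seven unigrams.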
Nirenburg raises several important questions with regard to the issue of the translation of meaning (1987:2):

1. What is the meaning of the text?
2. Does it have any component structure?
3. How does one represent the meaning of a text?
4. How does one set out to extract the meaning of a text?
5. Is it absolutely necessary to extract meaning (or at least all of the meaning) in order to translate?

While MT may not be able to answer these questions directly, they do help to underscore the fact that the central problem of MT (and perhaps of translation in general) is not computational, but linguistic. Creating linguistic rules or statistical algorithms with which to analyze data is difficult enough, but dealing with lexical ambiguity, syntactic complexity, vocabulary differences, elliptical and ungrammatical constructions, and retaining meaning makes the process decidedly more complex. There are two main paradigms currently implemented in the field of MT that attempt to address these difficulties, one based on rule-based methods and the other on statistical methods.

2.1 Rule-based Machine Translation

In general, rule-based machine translation (RBMT) - also known as "Knowledge-based MT" or the "Classical Approach" - was the approach heralded by some of the first MT researchers in the 1970s and 1980s, including those who built pioneering systems such as SYSTRAN and Eurotra (Toma 1977, Johnson et al. 1985). This approach is characterized by a heavy emphasis on both source and target linguistic information in creating a system. Historically, there have been three subtypes of RBMT: direct, transfer, and interlingua. Each of these types differs in the degree to which the representation of meaning and the linguistic structures are tied to the language pair in question.

The direct approach, depicted in Figure 1, is the simplest of the three.
This method carries out translation unidirectionally, or from only one language to another, for one specific language pair (e.g., only English to Russian).

Figure 1. Direct approach to rule-based machine translation (Hutchins and Somers 1992).

Second, the interlingua approach uses an additional intermediate step to create a general representation of meaning that is independent of the language pair in question. This approach operates bidirectionally (e.g., both from English to Russian and from Russian to English) and occurs in two stages for each direction: from the source language to the interlingua, and then from the interlingua to the target language.

Finally, the transfer approach involves three stages, each generating some level of syntactic representation of both the source and target languages (Hutchins and Somers 1992).
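One practical consequence of the difference between the direct and interlingua approaches can be shown with a small counting sketch. The arithmetic below is not taken from the thesis; it assumes that every language must be translatable into every other language, that a direct system covers exactly one direction of one pair, and that an interlingua design needs one analysis module and one generation module per language. The function names are hypothetical.

```python
def direct_systems(n_languages: int) -> int:
    """Unidirectional direct systems needed to cover every ordered language pair."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """Modules needed with an interlingua: one analyzer and one generator per language."""
    return 2 * n_languages


for n in (2, 3, 10):
    print(n, direct_systems(n), interlingua_modules(n))
```

Under these assumptions the two designs cost the same for three languages (six components each), while for ten languages the direct approach needs 90 systems against 20 interlingua modules, which is one reason a pair-independent representation of meaning is attractive despite being harder to define.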