Country, Nation, and Language: Machine Translation in Iceland
M.A. ritgerð
Country, Nation, and Language
Machine Translation in Iceland
Anna Caroline Wagner
Júní 2021

Háskóli Íslands
Hugvísindasvið
Íslensku- og menningardeild
Þýðingafræði

Ritgerð til M.A.-prófs (30 einingar)
Anna Caroline Wagner
Kt.: 010888-3879
Leiðbeinandi: Gauti Kristmannsson
Maí 2021

Abstract

The Icelandic language is inextricably linked to the self-image of Icelanders. Icelandic and the Icelandic literary tradition were major influences on the independence movement on the island and thus on the founding of the nation-state. The "holy trinity" of country, nation, and language was invented during the period of nationalism in the 19th century, and its influence continues to this day in language preservation and in attitudes towards translations. Translations occupy a peripheral position in the literary polysystem and are expected to be domesticated. Machine translations often have a foreignizing effect and may therefore be rejected, especially where translations occupy a peripheral (or weak) position within the literary polysystem.

This work looks at language technology in Iceland, especially machine translation, and adopts an interpretive and analytical standpoint. A language technology plan called Máltækniáætlun 2018-2022 is currently in place, under which a machine translation system is being developed as an assistance tool for translators. This thesis covers general and local opportunities and challenges in machine translation and applies Hans Vermeer's skopos theory to machine translation systems. Translation theories are mapped out with regard to machine translation, and historical comparisons are made, especially with regard to the evaluation of translations. Finally, the thesis discusses machine bias and how it can be present in the data used to train the Icelandic machine translation systems.

Ágrip

Íslensk tunga er stór hluti af sjálfsmynd Íslendinga.
Tungumálið og bókmenntahefðin höfðu mikil áhrif á sjálfstæðisbaráttu Íslendinga og í kjölfar þess stofnun þjóðríkisins. Hin heilaga þrenning „land, þjóð og tunga“ var fundin upp á tíma þjóðernishyggjunnar á 19. öld og áhrifin eru sjáanleg allt til dagsins í dag í varðveislu tungumálsins og afstöðu til þýðinga. Þýðingar skipa jaðarstöðu í bókmenntakerfinu og búist er við að þær séu aðlagaðar að heimamenningunni. Þar af leiðir að vélaþýðingar hafa oft framandi áhrif, sem kann að leiða til þess að þeim sé hafnað, sérstaklega í samhengi stöðu þýðinga í bókmenntakerfinu. Í ritgerð þessari er litið á máltækni, sérstaklega vélþýðinga á Íslandi, og gengið út frá túlkandi og sundugreinandi sjónarmiðum. Núna er Máltækniáætlun 2018-2022 virk, þar sem vélþýðingarkerfi er þróað til að aðstoða þýðendur. Þessi ritgerð fjallar um almenn og staðbundin tækifæri og áskoranir í vélþýðingum og beitir Skopos-kenningunni Hans Vermeers á vélþýðingar. Kenningar úr þýðingafræði eru kortlagðar með tilliti til vélþýðinga og sögulegur samanburður, sérstaklega gagnvart mati á þýðingum, gerður. Að lokum er fjallað um hlutdrægni í gögnum (machine bias) og hvernig hún getur komið fram í gögnum sem notuð eru til að þjálfa íslenska vélþýðingarkerfið.

Huga mínum

Table of Contents

1 Introduction
2 Language Technology
2.1 Historical Overview
2.2 Methodology of Machine Translations
2.2.1 First Generation Machine Translation Systems
2.2.2 Corpus-Based Machine Translation
2.2.3 Neural Networks and Deep Learning
2.3 Human Translator Aids
2.4 Challenges to Machine Translations
2.5 Evaluation of Machine Translation
3 Icelandic Language Technology
3.1 Language Technology for Small Languages
3.2 Icelandic Language and the Icelandic Independence Movement
3.3 Language Technology for Icelandic
3.4 Open and Closed Machine Translation Systems for Icelandic
4 Translation Theories and Translations in Icelandic
4.1 Concepts of Western Translation Theories
4.2 Meaning and Equivalence
4.3 Functionalist Approaches
4.3.1 The Structuralist Approach
4.3.2 Holmes' Map and Literary Polysystem Theory
4.3.4 Descriptive Translation Studies and Skopos Theory
4.4 Approaches from Cultural Studies
5 Conclusion
Bibliography

Table of Figures

Figure 1 Vauquois' Triangle
Figure 2 Direct MT System
Figure 3 Interlingua model with two language pairs
Figure 4 Transfer model with two language pairs
Figure 5 "Máltæknivistkerfið" / "The Language Technology Ecosystem"
Figure 6 Holmes's Map
Figure 7 The relations between function, product, and process in translation

1 Introduction

Iceland is a small island in the North Atlantic Ocean with Icelandic as its official language. What is the defining characteristic of Iceland: the Icelandic language or its nature? This may be one of the biggest questions to ask about Iceland's self-definition.1 Iceland and Icelandic have always been politically interwoven. Close ties between a country and its national language are by no means unique to Iceland, but Iceland holds a distinctive position as a micro-state. Iceland recently took a seat on the steering committee of the UNESCO Global Task Force for Making a Decade of Action for Indigenous Languages, which aims to bring greater global attention to the critical situation of indigenous languages.2 Iceland has at certain points been under both Norwegian and Danish rule, yet the language has changed less than the other Germanic languages, presumably because of its geographical isolation.3

If asked what makes Iceland Icelandic, the answer is usually: the language. When 25% of the Icelandic nation came together in 1994 to celebrate the semicentennial anniversary of the Icelandic republic, the question of what exactly makes Iceland a nation was asked. According to former MP Páll Pétursson, the answer was unmistakably the Icelandic language.

Það er öðru fremur tungan. Hún tengir okkur saman og gerir okkur að sérstökum hópi í samfélagi veraldarinnar […]. Hún varðveitir menningararf fyrri alda og gefur okkur eigin sögu sem kemur okkur við og tengir okkur við fortíðina og landið sem við byggjum.4

It is above all the language. It connects us and constitutes us as a special group in the global community. […] It sustains the cultural heritage of past centuries and gives us our own narrative which applies to us and connects us to the past and the land that we build.

1 Guðmundur Hálfdanarson, "From Linguistic Patriotism to Cultural Nationalism: Language and Identity in Iceland," Languages and identities in historical perspective (2005).
2 "Ísland Í Stýrihóp Unesco Vegna Áratugar Frumbyggjamála," mbl.is, https://www.mbl.is/frettir/innlent/2021/04/19/island_i_styrihop_unesco_vegna_aratugar_frumbyggjam/. And "Unesco Launches the Global Task Force for Making a Decade of Action for Indigenous Languages," https://en.unesco.org/news/unesco-launches-global-task-force-making-decade-action-indigenous-languages?fbclid=IwAR0keeVLDwjswW3QwolM3AwgWlzLStdq31StkVBqUQ8H9vce6LRJvHRJJtg.
3 Höskuldur Thráinsson, The Syntax of Icelandic, ed. B. Comrie P. Austin, J. Bresnan, D. Lightfoot, I. Roberts, N. V. Smith, Cambridge Syntax Guides