M.A. thesis

Country, Nation, and Language: Machine Translation in Iceland

Anna Caroline Wagner

June 2021

ÍSLENSKU- OG MENNINGARDEILD

Háskóli Íslands Hugvísindasvið Íslensku- og menningardeild Þýðingafræði

Country, Nation, and Language

Machine translation in Iceland

Thesis for the M.A. degree (30 credits) Anna Caroline Wagner Kt.: 010888-3879

Supervisor: Gauti Kristmannsson, May 2021

Abstract

The Icelandic language is inextricably linked to the self-image of Icelanders. Icelandic and the Icelandic literary tradition were major influences on the independence movement on the island and thus on the founding of the nation-state. The Holy Trinity of country, nation, and language was invented during the period of nationalism in the 19th century, and its influence continues to this day in language preservation and attitudes towards translation. Translations occupy a peripheral position in the literary polysystem and are expected to be domesticated. Machine translations often have a foreignizing effect, which can lead to their rejection, especially if translations occupy a peripheral (or weak) position within the literary polysystem. This work looks at language technology in Iceland, especially machine translation, and assumes an interpretive and analytical standpoint. Currently a language technology plan called Máltækniáætlun 2018-2022 is in place, under which a machine translation system is being developed as an assistance tool for translators. This thesis covers general and local opportunities and challenges in machine translation and applies the Skopos theory by Hans Vermeer to machine translation systems. Translation theories are mapped out with regard to machine translation and historical comparisons are made, especially regarding translation evaluation. Finally, machine bias is discussed, along with how it can be present in the data used to train the Icelandic machine translation systems.

Ágrip

Íslensk tunga er stór hluti af sjálfsmynd Íslendinga. Tungumálið og bókmenntahefðin höfðu mikil áhrif á sjálfstæðisbaráttu Íslendinga og í kjölfar þess stofnun þjóðríkisins. Hin heilaga þrenning „land, þjóð og tunga“ var fundin upp á tíma þjóðernishyggjunnar á 19. öld og áhrifin eru sjáanleg allt til dagsins í dag í varðveislu tungumálsins og afstöðu til þýðinga. Þýðingar skipa jaðarstöðu í bókmenntakerfinu og búist er við að þær séu aðlagaðar að heimamenningunni. Þar af leiðir að vélaþýðingar hafa oft framandi áhrif, sem kann að leiða til þess að þeim sé hafnað, sérstaklega í samhengi stöðu þýðinga í bókmenntakerfinu. Í ritgerð þessari er litið á máltækni, sérstaklega vélþýðinga á Íslandi, og gengið út frá túlkandi og sundugreinandi sjónarmiðum. Núna er Máltækniáætlun 2018-2022 virk, þar sem vélþýðingarkerfi er þróað til að aðstoða þýðendur. Þessi ritgerð fjallar um almenn og staðbundin tækifæri og áskoranir í vélþýðingum og beitir Skopos-kenningunni Hans Vermeers á vélþýðingar. Kenningar úr þýðingafræði eru kortlagðar með tilliti til vélþýðinga og sögulegur samanburður, sérstaklega gagnvart mati á þýðingum, gerður. Að lokum er fjallað um hlutdrægni í gögnum (machine bias) og hvernig hún getur komið fram í gögnum sem notuð eru til að þjálfa íslenska vélþýðingarkerfið.

Huga mínum

Table of Contents

1 Introduction
2 Language Technology
2.1 Historical Overview
2.2 Methodology of Machine Translations
2.2.1 First Generation Machine Translation Systems
2.2.2 Corpus-Based Machine Translation
2.2.3 Neural Networks and Deep Learning
2.3 Human Translator Aids
2.4 Challenges to Machine Translations
2.5 Evaluation of Machine Translation
3 Icelandic Language Technology
3.1 Language Technology for Small Languages
3.2 Icelandic Language and the Icelandic Independence Movement
3.3 Language Technology for Icelandic
3.4 Open and Closed Machine Translation Systems for Icelandic
4 Translation Theories and Translations in Icelandic
4.1 Concepts of Western Translation Theories
4.2 Meaning and Equivalence
4.3 Functionalist Approaches
4.3.1 The Structuralist Approach
4.3.2 Holmes's Map and Literary Polysystem Theory
4.3.4 Descriptive and Skopos Theory
4.4 Approaches from Cultural Studies
5 Conclusion
Bibliography

Table of Figures

Figure 1 Vauquois' Triangle
Figure 2 Direct MT System
Figure 3 Interlingua model with two language pairs
Figure 4 Transfer model with two language pairs
Figure 5 "Máltæknivistkerfið" / "The Language Technology Ecosystem"
Figure 6 Holmes's Map
Figure 7 The relations between function, product, and process in translation

1 Introduction

Iceland is a small island in the North Atlantic Ocean with Icelandic as its official language. Which is the defining characteristic of Iceland, the Icelandic language or Icelandic nature, might be one of the biggest questions to ask about Iceland's self-definition.1 Iceland and Icelandic have always been politically interwoven. The close ties between a country and its national language are by no means unique to Iceland, but Iceland stands out as a micro-state. Iceland recently took a seat on the steering committee of the UNESCO Global Task Force for Making a Decade of Action for Indigenous Languages to bring greater global attention to the critical situation of indigenous languages.2 Iceland has at certain points been under both Norwegian and Danish rule, yet the language has changed less than the other Germanic languages, presumably because of its geographical isolation.3 If asked what makes Iceland Icelandic, the answer is usually: the language. When 25% of the Icelandic nation came together in 1994 to celebrate the semicentennial anniversary of the Icelandic republic, the question of what exactly makes Iceland a nation was asked. According to former MP Páll Pétursson, the answer was unmistakably the Icelandic language.

Það er öðru fremur tungan. Hún tengir okkur saman og gerir okkur að sérstökum hópi í samfélagi veraldarinnar […]. Hún varðveitir menningararf fyrri alda og gefur okkur eigin sögu sem kemur okkur við og tengir okkur við fortíðina og landið sem við byggjum.4

It is above all the language. It connects us and constitutes us as a special group in the global community. […] It sustains the cultural heritage of past centuries and gives us our own narrative which applies to us and connects us to the past and the land that we build.

1 Guðmundur Hálfdanarson, "From Linguistic Patriotism to Cultural Nationalism: Language and Identity in Iceland," Languages and identities in historical perspective (2005). 2 "Ísland Í Stýrihóp Unesco Vegna Áratugar Frumbyggjamála," mbl.is, https://www.mbl.is/frettir/innlent/2021/04/19/island_i_styrihop_unesco_vegna_aratugar_frumbyggjam/. And "Unesco Launches the Global Task Force for Making a Decade of Action for Indigenous Languages," https://en.unesco.org/news/unesco-launches-global-task-force-making-decade-action-indigenous-languages?fbclid=IwAR0keeVLDwjswW3QwolM3AwgWlzLStdq31StkVBqUQ8H9vce6LRJvHRJJtg. 3 Höskuldur Thráinsson, The Syntax of Icelandic, ed. B. Comrie P. Austin, J. Bresnan, D. Lightfoot, I. Roberts, N. V. Smith, Cambridge Syntax Guides (Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo: Cambridge University Press, 2007), 1. 4 Guðmundur Hálfdanarson, Íslenska Þjóðríkið - Uppruni Og Endimörk (Reykjavík: Hið Íslenska Bókmenntafélag, 2001), 15. My translation.

Vigdís Finnbogadóttir, president at the time and still today one of the key spokeswomen for language preservation in Iceland, asked the audience to:

[…] hugsa til þess að um aldir átti íslensk þjóð sér umfram allt eina réttlætingu, ein rök til þess að krefjast áheyrnar á þingum heimsins: Hún átti sér sjálfstætt tungumál og á þessu tungumáli hafði hún varðveitt minningar sínar, sögur sínar, ljóð sín, frábrugðin minningum, sögum og ljóðum annara þóða.5

[…] think about the fact that the Icelandic nation had one raison d'être through the ages, one reason to demand an audience in the parliaments of the world: It had an independent language and had preserved its memories in that language, its stories, its poems, different from the memories, stories, and poems of other nations.

The connection between nation and language has often been cited, especially in the early and middle 19th century as European countries strove for independence. Johann Gottfried Herder's (1744-1803) theories on languages and nations were a precursor to nationalism in Europe. He believed that nations ultimately evolve out of the constant interplay between mankind and nature. Language thus mirrors the natural constitution of societies because they evolve and change in the constant battle of mankind with their environment.6 Nationalism in mainland Europe has brought on centuries of war and destruction. In Iceland, the struggle of nationalism has always been of a more political nature. Due to its geographic isolation and late industrialization, Iceland did not directly participate in any of the European conflicts during the rise of the nation-states in the 19th and 20th centuries. Iceland's independent literary tradition (including the Sagas of Icelanders, the Poetic Edda, and the Prose Edda, to name a few) has often been associated with the fight for sovereignty.7 As Ástráður Eysteinsson points out, the holy trinity of country, nation, and language was established or “invented” during the Romantic period of Icelandic literature.8 Land, þjóð og tunga (country, nation, and language) is a poem by Snorri Hjartarson from 1949.9 The poem has been used emblematically in political

5 Ibid., 16. My translation. 6 Ibid., 19. 7 Ástráður Eysteinsson, Tvímæli. Þýðingar Og Bókmenntir, ed. Matthías Viðar Sæmundsson Ástraður Eysteinsson, Fræðirit (Reykjavík: Háskólaútgafan, 1996), 46-47. 8 Ibid., 234. 9 Snorri Hjartarson, Land Þjóð Og Tunga, ed. Lesbók Morgunblaðsins (1949).

speeches throughout the 20th century and cited in discussions on the conservation of the Icelandic language and culture.10

There have been discussions on whether translations should also be considered a part of the national literary heritage. According to some scholars, such as Ástráður Eysteinsson, translations should be counted to some extent.11 Eysteinsson mentions that translations are in most cases overlooked or not included in comprehensive anthologies of Icelandic canonical literature (he calls this translation blindness)12 but are instead sustained in other areas such as “commercial” or “popular” literature intended for children and young adults, or fairy tales.13 The translation of foreign concepts and words into Icelandic via the creation of new Icelandic words in fact protects the language. This in turn can create a complacency whereby all cultural boundaries are annihilated and the resistance within translations is negated as the foreignness of the culture and material is wiped away.14 Since translations of literary works already have a complicated status within the Icelandic literary canon, how do machine translations fare? Some might argue that MT does not over-domesticate a text, but rather reminds the reader that it is a translation by foreignizing it.15

The Icelandic language community consists of roughly 330 000 speakers. Most are in Iceland, but this number also counts expats around the world. Some second- and third-generation emigrants in North America speak Icelandic as their mother tongue. In the last decade, more and more immigrants have moved to Iceland

10 Another Master's thesis about the self-image of Icelanders in times of globalization, published within the Faculty of Political Science in 2010, takes its name from this poem. Adda María Jóhannsdóttir, ""Land, Þjóð Og Tunga - Þrenning Sönn Og Ein." Þjóðerni Og Sjálfsmynd Á Tímum Hnattvæðingar" (Háskóli Íslands, 2010). 11 Eysteinsson, 244. 12 Gauti Kristmannsson, "Theory, World Literature, and the Problem of ," in Untranslatability Goes Global, ed. Katie Lateef-Jan Suzanne Jill Levine, Routledge Advances in Translation and Interpreting Studies (London, New York: Routledge, 2018), 129. 13 Eysteinsson, 223, 55. 14 Ibid., 246. 15 Peter Constantine, "Google Translate Gets Voltaire: Literary Translation and the Age of Artificial Intelligence," Contemporary French and Francophone Studies 23, no. 4 (2020): 474.

and speak Icelandic as their second language.16 Icelandic is a morphologically rich Northern Germanic language and is one of the Insular Scandinavian languages (which also include Faroese).17 Nominal categories include four cases, and the inflectional paradigms of the nouns vary depending on the inflectional class and gender of the noun. When nouns are modified by adjectives, the adjective agrees with the noun in gender, case, and number.18 Icelandic is the official language of trade, commerce, politics, the educational system, and all day-to-day interactions in the country.19 The writing system builds on Latin script with some special characters that are not seen in modern English. One such letter is only used in Icelandic (Þ, þ).20 To display Icelandic correctly, technology must support the ISO 8859-1 standard, which covers the non-ASCII characters (á, é, í, ó, ú, ý, ð, þ, æ, ö, Á, É, Í, Ó, Ú, Ý, Ð, Þ, Æ, Ö).21
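The encoding requirement can be illustrated with a minimal Python sketch. It only checks that the twenty non-ASCII letters listed above can be encoded in ISO 8859-1 but not in plain ASCII; the helper name `is_encodable` is an illustrative assumption, and the comparison with UTF-8 is added here for context and is not part of the cited source.

```python
# The non-ASCII Icelandic letters listed in the text above.
ICELANDIC_NON_ASCII = "áéíóúýðþæöÁÉÍÓÚÝÐÞÆÖ"


def is_encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ISO 8859-1 stores each of these letters in a single byte.
latin1_bytes = ICELANDIC_NON_ASCII.encode("iso-8859-1")
# UTF-8 (an addition for comparison) needs two bytes per letter here.
utf8_bytes = ICELANDIC_NON_ASCII.encode("utf-8")

print(is_encodable(ICELANDIC_NON_ASCII, "ascii"))       # plain ASCII fails
print(is_encodable(ICELANDIC_NON_ASCII, "iso-8859-1"))  # ISO 8859-1 works
print(len(latin1_bytes), len(utf8_bytes))
```

A system limited to 7-bit ASCII would therefore be unable to represent any of these letters, which is the practical point behind the requirement.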

Icelandic presents certain challenges for machine translation, be it due to the size of the language community or the richness of its morphology. For the last 20 years, there has been an active community of experts working within language technology in Iceland. As the development of language technology costs the same irrespective of the user base, the development of Icelandic language technology is comparatively expensive; this work examines why the Icelandic government subsidizes the development.

The first chapter gives an introduction to machine translation without relating to research in Iceland specifically. The second part looks at small language communities and how they cope with language technologies, with a focus on Iceland. The third part looks at translation theory and how it can be applied to the focus of Icelandic language technology research. This thesis assumes an interpretive and

16 Kristín M. Jóhannsdóttir Eiríkur Rögnvaldsson, Sigrún Helgadóttir, Steinþór Steingrímsson, Íslensk Tunga Á Stafrænni Öld. The Icelandic Language in the Digital Age., ed. Georg; Uszkoreit Rehm, Hans, Hvítbókarröð = White Paper Series (Springer Verlag, 2012), 9. 17 Ibid., 24. 18 Thráinsson, 1-3. 19 Eiríkur Rögnvaldsson, 41. 20 Ibid., 11. 21 Eiríkur Rögnvaldsson, "Icelandic Language Technology Ten Years Later," (2008): 11.

analytical standpoint on machine translation and the way translations are interwoven with society in the Icelandic context.

2 Language Technology

Within the field of natural language processing (NLP) lies machine translation, which embodies mankind's ancient dream of having an automatic translation for all possible language combinations available in the blink of an eye. As Harold Somers points out, terms such as machine translation can be misleading, as hardly anybody calls computers “machines” anymore.22 Furthermore, machine translation (MT) is only one kind of translation tool among multiple available translation options.23 The term MT has nevertheless established itself. Translation in our current time is as important as ever, as the following quote from almost 20 years ago shows:

The snowballing acceleration of available information, the increase in intercultural encounters, and the continuing virtualization of private and business life have resulted in drastic and lasting changes in the way translators work.24

MT is part of NLP in the form of computer-assisted human-human interactions, while other tasks might include human-computer interactions (chatbots, etc.) or machine access to human-human interactions (digital discovery in corpora).25 According to Bender and Lascarides, the more access these NLP systems have to the meaning that speakers convey with natural language, the more useful and effective they will be.26 The big question of when fully automated high-quality MT will be achieved has jocularly been answered with “in five years' time” for decades,

22 Harold Somers, "Introduction," in Computers and Translation: A Translator's Guide, ed. Harold Somers, Benjamins Translation Library (Amsterdam / Philadelphia: John Benjamins Publishing Company, 2003), 1. It is also important to note that NLP does not include formal languages, including programming languages or “computer languages”, which display their own syntax and semantics as well as, occasionally, dialects. 23 Frank Austermühl, Electronic Tools for Translators, ed. Anthony Pym, Translation Practices Explained (Manchester, Northampton: St. Jerome Publishing, 2001), 1. 24 Ibid. 25 Alex Lascarides Emily M. Bender, Linguistic Fundamentals for Natural Language Processing Ii, ed. Graeme Hirst, Synthesis Lectures on Human Language Technologies (Toronto: Morgan & Claypool Publishers, 2020), 1. 26 Ibid.

which points to the underlying challenges the field faces.27 Some argue that MT is a sub-field of computational linguistics (CL) or natural language processing (NLP), although historically it is the other way around.28

The way our globalized world is organized, in its technological, political, and economic dimensions, calls for translation and localization to an ever-increasing degree. Translation means transferring the meaning from a source language to (a) target language(s), while localization means translating words and graphics and adapting products to the specific norms of the target culture.29 However, even translation is not “just” knowing, understanding, and transferring the usual meaning of words. Rather it is “[…] necessary to convey the meaning of the entire message, not just transfer words from one language to another.”30 Humans still have a cognitive advantage over computational units in that respect, but traditional human translation cannot meet all the translation needs of a globalized world.31 The singularity, the point at which MT is expected to match human translation in quality, has now been predicted to occur in about a decade's time.32

One question that remains is whether translation is (still) necessary. English is the dominant language, the lingua franca, of science, technology, international politics, and business. The importance of English for economic growth is reflected in the investment numbers within the Organization for Economic Co-operation and Development (OECD). English-speaking countries receive about three times more foreign investment than non-English-speaking countries.33 Nonetheless, the European Union has granted all member states the privilege to conduct their official

27 Philipp Koehn, Neural Machine Translation (Cambridge: Cambridge University Press, 2020), 29. 28 Zhang Xiaojun Liu Qun, "Machine Translation. General.," in The Routledge Encyclopedia of Translation Technology, ed. Chan Sin-wai (London, New York: Routledge, 2015), 105. 29 Austermühl, 146. 30 Caitlin Christianson Joseph Olive, John McCary, Handbook of Natural Language Processing and Machine Translation. Darpa Global Autonomous Language Exploitation (New York, Dordrecht, Heidelberg, London: Springer, 2011), vii. 31 Gabriel Armand Djiako, Lexical Ambiguity in Machine Translation and Its Impact on the Evaluation of Output by Users (Saarbrücken: Universitätsverlag des Saarlandes, 2019), 19. 32 Jaap van der Meer, "Translation Technology - Past, Present and Future," in The Bloomsbury Companion to Studies, ed. Maureen Ehrensberger-Dow Erik Angelone, Gary Massey, Bloomsbury Companions (London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020), 307. 33 Austermühl, 2.

business within EU institutions in their own language.34 As will be discussed later in this work, this stance was an important stepping stone in developing language technology, as all of the translations produced by about 4000 translators were used as a corpus to train automated translation systems. Even though English may be dominant, other languages remain important in a globalized world: “A powerful catalyst for translation has thus been created by the rapid internationalization of markets, particularly by the need to localize not only products but also the methods of designing, producing, marketing and distribution.”35 Reducing the time needed to produce a translation has been one of the main arguments for the mechanization of translations since research in the field began.36 Translation is indeed still very much needed.

Another relevant question is whether machines will ever be able to adequately translate high literature. Most translations are of more mundane material such as instructional manuals, user interfaces, publicity leaflets, scientific papers and books, commercial and business transactions, etc. According to MT researchers, these technical translations are often repetitive and tedious, yet require accuracy and consistency. Machine-assisted translations and translation aids are well suited to this kind of translation work.37 Translation scholars disagree with this statement, arguing that the work process involves a lot of creativity and cultural adaptivity.38 Different schemes for categorizing translation tools have been proposed, first by Alan Melby in the 1980s. Hutchins and Somers proposed a categorization according to the degree of human involvement versus mechanization, which will be further discussed in the chapter on computer-assisted translation (CAT). Frank Austermühl claimed in 2001 that: “[…] the idea of an independently acting, error-free translating machine is equally

34 Ibid. 35 Ibid., 4. 36 Djiako, 20. 37 Harold L. Somers W. John Hutchins, An Introduction to Machine Translation (London, San Diego, New York, Boston, Sydney, Tokyo, Toronto: Academic Press, 1992), 2. 38 Jody Byrne, Scientific and Technical Translation Explained, ed. Sharon O´Brien Sara Laviosa, Kelly Washbourne, Translation Practices Explained (Manchester, Kinderhook: St. Jerome Publishing, 2012), 5,161.

unrealistic [as a lone human translator with a pencil and typewriter] and will not become a reality for a long time to come, if at all.”39

2.1 Historical Overview

The main issues that still prevail in MT are the problem of speed versus accuracy, and the approach to translation itself, be it simple word-for-word dictionary look-up, interlingual approaches, or transfer systems. John Hutchins introduced the word dissemination to describe the editing of MT output by humans, and he uses assimilation to describe readers wanting to get a rough idea of the text (sometimes called gisting). He believes that the two serve fundamentally different needs. He further mentions that systems created for personal use have usually favored assimilation, while professional systems have leaned towards dissemination.40

The crude beginnings of MT can likely be traced back to the 17th century, according to Hutchins and Somers. They argue that both Descartes and Leibniz speculated about dictionaries organized on the basis of universal numerical codes. Cave Beck, Athanasius Kircher, and Johann Becher had all published actual examples by the middle of that century. Thinkers of the time entertained the idea of a universal, unambiguous language that humanity could use to communicate without misunderstandings.41 It was to be based on logical principles and iconic symbols, and various attempts were made throughout the centuries, most notably Esperanto.42 Until the middle of the 20th century only a few attempts were made to mechanize translations. In 1933, two patents were granted in France and Russia for mechanical translation systems. The invention in France entailed a storage device on paper tape, which was to be used to find equivalent words in a different language. The Russian

39 Frank Austermühl, Electronic Tools for Translators, ed. Anthony Pym, Translation Practices Explained (Manchester, Northampton: St. Jerome Publishing, 2001), 11. 40 John Hutchins, "Multiple Uses of Machine Translation and Computerised Translation Tools," in International Symposium on Data and Sense Mining, Machine Translation and Controlled Languages - ISMTCL 2009 (centre Tesnière, université of Franche-Comté, Besançon, France: Besançon: Presses universitaires de Franche-Comté, 2009), 13-14. 41 W. John Hutchins, 5. 42 Other examples would include Interslavic, which was constructed in recent years as a language that should be easily understandable by all native speakers of a Slavonic language. Old Church Slavonic from the 9th century was used in 1666 to create a pan-Slavic language, which has been modernized again since 2006. See Jan van Steenbergen, "A Short History of Interslavic," interslavic-language.org, http://steen.free.fr/interslavic/history.html.

invention by Petr Smirnov-Troyanskii foresaw the MT of the second half of the century in that he envisioned three stages of translation. At the first stage an editor would undertake the “logical” analysis of the source language and arrange words according to their base form and syntactic functions. The second stage would use a machine to transform those base sentences from the source language into base sentences of a target language, and at the third stage an editor (only fluent in the target language) would produce an output in the target language.43 As we shall see later in this chapter, this patent predicted the transfer approach of machine translation that was to take place from the 1960s onwards, once the computational requirements of executing such a task were met.

Research into the use of computers for translating natural languages began very soon after the appearance of the first computers, or electronic calculators, developed during World War II to perform large-scale mathematical calculations in support of military actions. Mechanical translation research at the time was not widespread and rested in the hands of very few scientists, who were mainly mathematicians and electrical engineers. Warren Weaver was the single most influential person during that early period of MT. He was a mathematician and the director of the Natural Sciences Division at the Rockefeller Foundation. Weaver was in contact with most of the leading experts in computational research and information theory research, such as Claude Shannon and Norbert Wiener. On March 4th, 1947 he drafted a letter to Norbert Wiener (also a linguist) to inquire about the possibilities of using computers for translation. He also considered the possibility of using cryptographic methods in the statistical analysis of frequencies of letters and letter-pairs to aid translation, though these ideas were very soon discarded as not useful (as they were only monolingual). Indeed, translation was to become the first non-numerical application of computers.44 During those early years, Weaver also met with British scientists working on similar issues, namely Andrew D. Booth, an acquaintance of Alan Turing, one of the most influential theorists of computer science. Turing had a background in

43 W. John Hutchins, 6. 44 Gabriela Saldanha Mona Baker, "Routledge Encyclopedia of Translation Studies," (London, New York: Routledge, 2020), 305.

cryptanalysis and thought to use computers for translations as a demonstration of their “intelligence”. Booth came to work on a mechanized dictionary look-up using punched card equipment that eventually produced difficult but understandable translations. Weaver was enormously disappointed with Booth's mechanical dictionary, as he had expected more groundbreaking research. He aimed to connect Claude Shannon's information theory, statistical successes in war-time code breaking, and notions about language universals to arrive at a better MT output. In July of 1949, he wrote a highly influential memorandum titled “Translation” that marked the beginning of funding and research on mechanical translation in the U.S.45 Soon research teams were established at select universities, with Yehoshua Bar-Hillel as the first person ever to be appointed to a research position in mechanical translation. He believed that fully automated MT was not possible, and he was disillusioned by the fact that accuracy and speed in MT could not be achieved at the same time.46 He foresaw the transfer approach in MT, discussed later in this chapter. Bar-Hillel was very skeptical of the value of statistical methods using large corpora and frequency counts (it bears mentioning that quick-access storage units would have been an issue at the time); this is by far the preferred method in current research. He much preferred the establishment of a universal or general grammar, based on mathematical logic and modern structural linguistics (the formalist research drawn on was mainly by Zellig Harris and, to a lesser extent, Noam Chomsky), to create an artificial language or interlingua.47

During the first MT conference in June 1952, an interesting point was made by Victor Oswald from UCLA. He suggested that to resolve one of the greatest cruxes of MT, homonyms easily resolved by humans but not machines, one could use statistical analysis within clearly defined subfields, for example the language used by brain surgeons in the context of their work. He proposed that fields that already use a

45 John Hutchins, "Milestones in Machine Translation. Part 1 - How It All Began in 1947 and 1948," Language Today 3, no. December 1997 (1997). And The History of Machine Translation in a Nutshell, (2005), http://www.hutchinsweb.me.uk/Nutshell-2014.pdf. And W. John Hutchins, "Milestones in Machine Translation No.2," Language Today 6, no. March 1998 (1998). 46 "Milestones in Machine Translation. Part 3 - Bar Hillel's Survey, 1951," Language Today 8, no. May 1998 (1998). 47 "Milestones in Machine Translation. Part 4: The First Machine Translation Conference, June 1952," Language Today 13, no. October 1998 (1998).

specific, limited, technical vocabulary by convention would be more suitable for automated translation, thus promoting sublanguages.48

The period between 1952 and 1966 was characterized by a lot of optimism, but in the end did not produce any groundbreaking discoveries, and so disillusion grew. Research on mechanized bilingual dictionaries and on rules for word order in the target-language output came to a standstill. The Automatic Language Processing Advisory Committee (ALPAC), commissioned by the U.S. government, concluded in its famous 1966 report that “there is no immediate or predictable prospect of useful machine translation.”49 The committee found that MT was less accurate, slower, and twice as expensive as translations produced by human translators. This report marked the virtual end of funding and research in the United States for over a decade, but research continued in Europe, Canada, and Russia. Two systems launched during the aftermath of the ALPAC report are still in use today: Canada's Meteo system, launched in 1976 for translating weather reports, and the Systran system, launched in 1976 by the Commission of the European Communities.50 Much of the early research did not go unnoticed, though, and was of lasting importance to computational linguistics, artificial intelligence, and some research groups that made contributions to linguistic theory.51 The ALPAC report proposed that research should shift towards machine-aided translations. The history of MT and translation technology has thus been intertwined since the very beginning. Chan Sin-wai classified this period after the ALPAC report as a period of germination for translation technology.52 Translation memory, one of the main concepts and functions of computer-assisted translation (CAT), emerged during that period in the late 1970s and 1980s, but only became widely commercially available in the 1990s. The pioneers in the area were Alan K. Melby, Martin Kay, and Peter Arthern, all working independently of one another.53 Translation memories are bilingual databases

48 Ibid. 49 As seen in Hutchins, The History of Machine Translation in a Nutshell. 50 Ibid. 51 W. John Hutchins, 6. 52 Chan Sin-wai, "The Development of Translation Technology. 1967-2013," in The Routledge Encyclopedia of Translation Technology, ed. Chan Sin-wai (London, New York: Routledge, 2015), 3-4. 53 Ibid., 4.

designed to store translated text segments (translation units) together with the corresponding original texts.54 Technical documents tend to be repetitive, so often up to 50% of overlapping elements can be recycled, resulting in faster translations with consistent style and terminology.55 During the 1980s various smaller, consumer-oriented systems were also released for use in text-processing software and on personal computers. Most of the efforts during that time were focused on indirect translation with intermediary representations, sometimes interlingual. Additional semantic, morphological, and syntactic analysis was often included.56

In the early 1990s a new approach was brought forward: the example-based translation approach, based on statistical analysis of large-scale corpora. The first advances were made by Japanese teams and a group from IBM, who created the Candide system. This new approach followed on from rule-based systems, and the first computer-assisted translation (CAT) tools entered the market in the form of translation memory systems (such as Trados). Translation memory systems give translators easy access to their own previously translated texts. More workstation modules were developed for professional translators, as well as domain-restricted systems and work on controlled language (for the input language). Another new development was the earliest research into the translation of speech, integrating speech recognition, speech synthesis, and translation modules, with both rule-based and corpus-based approaches.57

The late 1990s saw an increasing demand for localization products, especially for software localization. Localized products were expected to launch at the same time as the original version (typically in English) or very shortly thereafter. MT was very helpful in this regard, as documentation such as software manuals was internally repetitive and changed very little between editions. Additionally, the demand for MT products for non-professional users increased rapidly, which was met in part by downsizing systems that were previously intended

54 Austermühl, 135. 55 Ibid., 134-35. 56 Hutchins, The History of Machine Translation in a Nutshell. 57 Ibid.

for professional use. It is unclear how end-users responded to and used these systems, as they produce raw translations powered exclusively by MT. Their market share remained high enough to satisfy reasonable demand, but observers suspect that these use-at-home systems were barely used after the initial wave of enthusiasm and the subsequent disappointment with the poor MT output. Handheld devices such as pocket translators (computerized versions of dictionaries and phrasebooks) also saw a quick rise and fall in popularity among travelers during that time.58 A new field emerged concerning direct online applications, such as webpages, e-mails, and others, and along with it the need for instantaneous translations that need not be perfect, but good enough for the text to be intelligible.59

The first online MT systems became available from the mid-1990s. These services were typically offered on a subscription basis, built on existing software, and rule-based, such as Systran and Systran Express, Globalink by CompuServe, and Niftyserve by Fujitsu.60 Finally, free online MT systems entered the market, marking the end of this era. The first was AltaVista's Babelfish, launched in 1997 and based on various Systran systems, followed by others including Google Translate (with a statistical approach at the time61), currently the most-used translation system worldwide, with an estimated 100 billion words a day in 103 languages.62 As Hutchins points out, free online translation services were a novelty at the time and users started “playing” with them.

Attracted by the possibilities, many users “tested” the services by inputting for translation sentences containing idiomatic phrases, ambiguous words and complex structures, and even proverbs and deliberately opaque sayings. A favorite method of “evaluation” was back translation (“to-and-from” translation), into another language and back to the original – a method which might appear valid to the uninitiated, but which is not

58 "Multiple Uses of Machine Translation and Computerised Translation Tools." 59 The History of Machine Translation in a Nutshell. 60 "Multiple Uses of Machine Translation and Computerised Translation Tools." The statistical approach has since been dismissed for a neural approach. 61 Ibid. 62 Patrick King, "Small and Medium-Sized Enterprise (Sme) Translation Service Provider as Technology User: Translation in New Zealand," in The Routledge Handbook of Translation and Technology, ed. Minako O´Hagan, Routledge Handbooks in Translation and Interpreting Studies (London, New York: Routledge, 2020), 155.


satisfactory. […] Numerous commentators have enjoyed finding fault with online MT and, by implication, with MT itself.63

By 2014 statistical machine translation (SMT) was the dominant research framework. Rule-based methods were kept where necessary, for example for morphologically rich languages (Russian, Finnish, or Icelandic) and for problems of discourse relations (such as the treatment of pronouns). SMT methods owe their popularity to the availability of large bilingual and monolingual corpora, the availability of open-source software for performing SMT (alignment, filtering, reordering), and widely accepted evaluation methods. Online translation has outlived PC-based systems and is now almost the only form of MT in use.64 By 2015 the first neural machine translation (NMT) systems had achieved higher evaluation scores than SMT, and NMT is now regarded as the state of the art, even though it needs larger training data sets than SMT.65

MT is increasingly used in large global companies and translation services, often with pre-processed input such as controlled language or terminology files. Outputs are also monitored and post-edited, sometimes with statistics-based methods and sometimes through crowdsourcing. The final frontier is the development of methods for evaluating translation output, as statistical measures and human assessments often differ in their judgement. Most professional translators, many of whom were skeptical of MT systems in the past, now fully embrace them and use terminology and translation memories to create draft translations.66 Translating social networking sites also proves challenging, as the language is

63 Hutchins, "Multiple Uses of Machine Translation and Computerised Translation Tools." 15. At the time of writing of this thesis, there are still YouTube channels dedicated to pointing out the nonsense that online MT systems produce when translating across various languages, such as the account TranslatorFails, whose videos have up to 12 million views. The same YouTube channel also received an introductory mention in Lynne Bowker's latest book, see Jairo Buitrago Ciro Lynne Bowker, Machine Translation and Global Research: Towards Improved Machine Translation Literacy in the Scholarly Community (Bingley: Emerald Publishing Limited, 2019), 1. The popular entertainment show The Tonight Show Starring Jimmy Fallon also has a segment called Google Translate Songs, with some clips on YouTube reaching 18 million views, pointing to the popularity of these formats. 64 Hutchins, The History of Machine Translation in a Nutshell. 65 Andy Way, "Machine Translation: Where Are We Today?," in The Bloomsbury Companion to Language Industry Studies, ed. Maureen Ehrensberger-Dow Erik Angelone, Gary Massey, Bloomsbury Companions (London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020), 316-17. 66 Hutchins, The History of Machine Translation in a Nutshell.

colloquial and not present in the large corpora that SMT systems are trained on.67 Texts on social media are also generally shorter than formal texts (and so contain less context) and frequently contain typos, slang, and abbreviations.68 Facebook launched its own translation system in 2008, with an emphasis on supporting low-resource languages by utilizing multilingual modeling and aligning multilingual word embedding spaces, among other techniques.69

2.2 Methodology of Machine Translations

It is possible to distinguish between different input systems and sources for translation. The two main input types are speech and electronic text, as hardcopy texts are used less often in research and development due to frequent copyright issues. Each input type has its own challenges. Speech input arrives without sentence boundaries or word- and phrase-boundary markers, and the confusability of some phonemes adds further uncertainty. Text input presents similar challenges: orthographic separation is typically present, but in some languages, such as Chinese or the Semitic languages, either word boundaries are not marked or explicit vowel marking is absent, which creates ambiguity.70

Hutchins and Somers categorize MT systems according to the following criteria: a system is bilingual if it is designed for two specified languages, or multilingual if it is designed for more than two. A system is further classified as uni-directional (translating in one direction only) or bi-directional (translating in both directions); these are also called non-reversible and reversible systems respectively.71

67 "Multiple Uses of Machine Translation and Computerised Translation Tools," 16. 68 Don Husa Paco Guzman, "Expanding Automatic Machine Translation to More Languages," https://engineering.fb.com/2018/09/11/ml-applications/expanding-automatic-machine-translation-to-more-languages/. 69 Guillaume Lample Alexis Conneau, Marc´Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou, "Word Translation without Parallel Data," in ICLR 2018 (2018). And Myle Ott Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc´Aurelio Ranzato, "Phrase-Based & Neural Unsupervised Machine Translation," ArXiv abs/1804.07755 (2018). 70 Joseph Olive, vii. 71 W. John Hutchins, 70.


Analysis, often the first step in machine translation, is divided into morphological analysis (e.g., identification of word endings), syntactic analysis (e.g., identification of phrase structures), and semantic analysis, which seeks to resolve lexical and structural ambiguities.

The three different system designs that will be discussed are the direct translation approach, the interlingua approach, and the transfer approach.72 It is helpful to look at the so-called Vauquois´ Triangle, which depicts the depth of intermediary representations used during the translation process and has been in use since 1968. Transfer approaches are subcategorized into syntactic and semantic transfer approaches.

FIGURE 1 VAUQUOIS' TRIANGLE73

2.2.1 First Generation Machine Translation Systems

Direct machine translation is historically the oldest approach and has been more or less abandoned by now. It covers MT systems that are specifically designed for one language pair, e.g., Russian as the source language and English as the target language. Source texts were barely analyzed. Rather, they were fed through a bilingual dictionary to yield a translation that was often barely comprehensible and lacked

72 Ibid., 4. 73 Originally in Vauquois 1968, here quoted from: Alan K. Melby, "Future of Machine Translation: Musings on Weaver's Memo," in The Routledge Handbook of Translation and Technology, ed. Minako O´Hagan (London, New York: Routledge, 2020), 424.

intermediate stages of the translation process.74 The direct approach is also called the dictionary-based approach.75 Since the results are typically unintelligible, this method is useful only when the gist of a text is needed. These rapid and crude translations can be useful for industrial and scientific organizations when expert researchers and engineers need an overview of the research undertaken elsewhere in the world, and can, if needed, order a more thorough translation when they deem the text valuable for their own objective.76
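
The word-for-word character of the direct approach can be illustrated with a minimal sketch. The dictionary entries below are hypothetical (transliterated Russian to English) and are not taken from any historical system; the sketch only assumes that each source word is looked up in a bilingual word list without analysis or reordering.

```python
# Toy bilingual dictionary (transliterated Russian -> English).
# The entries are illustrative assumptions, not data from a real system.
BILINGUAL_DICT = {
    "ya": "I",
    "chitayu": "read",
    "interesnuyu": "interesting",
    "knigu": "book",
}


def direct_translate(sentence: str, dictionary: dict[str, str]) -> str:
    """Translate word by word: no analysis, no reordering.

    Words missing from the dictionary are copied through unchanged.
    """
    words = sentence.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


print(direct_translate("Ya chitayu interesnuyu knigu", BILINGUAL_DICT))
# prints: I read interesting book
```

The missing article in the output ("I read interesting book") shows why such raw output is suited to gisting rather than to publication.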

Before empiricism took the lead in the late 1980s, rationalism, and with it Rule-Based Machine Translation (RBMT), had been the main approach since the emergence of machine translation. RBMT is also known as knowledge-based MT, as it draws on syntactic, semantic, morphological, and later contextual knowledge of both the source and target language to render translations. Theoretical linguists would assemble grammatical rules and dictionaries for the systems to use. Since the emergence of statistical machine translation, more research has been put into developing hybrid systems and constraint-based grammars.77 RBMT is not preferred in modern approaches, but is still used today in both hybrid and stand-alone models. Some of the historically most important systems were RBMT systems. The models used for RBMT are the direct model, the interlingua model, and the transfer model. They require varying depths of analysis of the source and target language.

FIGURE 2 DIRECT MT SYSTEM78

74 W. John Hutchins, 4, 72. 75 Liu Qun, 110. 76 See e.g. Sylviane Cardey, "Translation Technology in France," ibid., 279. 77 Bai Xiaojing Yu Shiwen, "Rule-Based Machine Translation," ibid. (2013), 186-87. 78 W. John Hutchins, 72.


FIGURE 3 INTERLINGUA MODEL WITH TWO LANGUAGE PAIRS79

FIGURE 4 TRANSFER MODEL WITH TWO LANGUAGE PAIRS80

The interlingua approach seeks to convey more meaning than direct translation typically achieves. It achieves translation in two separate stages. The meaning of the source-language text is first transferred into the interlingua (which can be a semantic network, a knowledge representation, an artificial language (e.g., Esperanto), a natural language representation, or even a logic expression) and is then translated from the interlingua into the target language. The two processes happen independently of each other, and in multilingual settings an analysis program may be linked to any number of generation programs.81 Because interlingual approaches seek a universal representation, they are very suitable for MT involving multiple languages: they drastically reduce the number of components needed compared to direct and transfer approaches. The interlingua approach can also be called a knowledge-based approach when the interlingua uses knowledge representation. An interlingua can also be called a bridge language, metalanguage, or pivot language. Recent web-based translation services will sometimes use English as

79 Ibid., 74. 80 Ibid., 75. 81 Ibid., 4.

a pivot language to support MT between languages for which no direct system is available.82
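
The reduction in components mentioned above can be made concrete with a short calculation. The sketch assumes the usual textbook accounting, which is not spelled out in the text: a direct design needs one system per ordered language pair, an interlingua design needs one analysis and one generation module per language, and a transfer design needs both of these plus one transfer module per ordered pair. The function names are chosen for illustration only.

```python
def direct_modules(n: int) -> int:
    """One dedicated system per ordered language pair: n * (n - 1)."""
    return n * (n - 1)


def interlingua_modules(n: int) -> int:
    """One analysis and one generation module per language: 2 * n."""
    return 2 * n


def transfer_modules(n: int) -> int:
    """Analysis and generation per language plus one transfer module
    per ordered language pair: 2 * n + n * (n - 1)."""
    return 2 * n + n * (n - 1)


for n in (3, 10, 24):
    print(n, direct_modules(n), interlingua_modules(n), transfer_modules(n))
```

Under these assumptions, ten languages require 90 direct systems or 110 transfer-design modules, but only 20 interlingua modules.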

The transfer approach uses a system with three stages, in which all programs involved are specific to certain source and target languages. First, the source-language text is converted into an intermediate representation (with ambiguities resolved, irrespective of the target language). In the next stage this representation is converted into an equivalent representation for the target language, and differences between the languages, such as structure and vocabulary, are resolved. In the final stage the target-language text is generated.83 Independent analysis yields a source-intermediary structure that can be used for MT into various target languages; in that case the characteristics of the target language are not considered when analyzing the source language. Independent generation is equally possible, whereby source-language characteristics are not considered during the generation phase, thus creating a target-intermediary structure usable for MT from various source languages. Two transfer approaches are available, named after their main transfer process: syntactic transfer and semantic transfer. Both share the components of morphological and syntactic analysis and of syntactic and morphological generation, while semantic transfer adds a layer of semantic transfer and generation.84
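
The three stages of analysis, transfer, and generation can be sketched for a single noun phrase. Everything in the following example is a toy assumption made for illustration: the English and Spanish lexicons, the single structural rule (adjective before noun becomes noun before adjective), and the simplified gender agreement. It is not a description of any real transfer system.

```python
# Hypothetical toy lexicons for an English -> Spanish sketch.
SOURCE_LEXICON = {"the": "DET", "red": "ADJ", "old": "ADJ",
                  "house": "NOUN", "book": "NOUN"}
TRANSFER_LEXICON = {"the": "el", "red": "rojo", "old": "viejo",
                    "house": "casa", "book": "libro"}
FEMININE_NOUNS = {"casa"}


def analyse(sentence):
    """Stage 1: analyse the source text into (lemma, part of speech)
    pairs without looking at the target language."""
    return [(word, SOURCE_LEXICON[word]) for word in sentence.lower().split()]


def transfer(source_repr):
    """Stage 2: lexical transfer plus one structural rule
    (ADJ NOUN -> NOUN ADJ)."""
    target = [(TRANSFER_LEXICON[lemma], pos) for lemma, pos in source_repr]
    i = 0
    while i < len(target) - 1:
        if target[i][1] == "ADJ" and target[i + 1][1] == "NOUN":
            target[i], target[i + 1] = target[i + 1], target[i]
            i += 2
        else:
            i += 1
    return target


def generate(target_repr):
    """Stage 3: simplified morphological generation (gender agreement
    with a feminine noun) and linearisation."""
    feminine = any(pos == "NOUN" and lemma in FEMININE_NOUNS
                   for lemma, pos in target_repr)
    words = []
    for lemma, pos in target_repr:
        if feminine and pos == "DET" and lemma == "el":
            words.append("la")
        elif feminine and pos == "ADJ" and lemma.endswith("o"):
            words.append(lemma[:-1] + "a")
        else:
            words.append(lemma)
    return " ".join(words)


def translate(sentence):
    return generate(transfer(analyse(sentence)))


print(translate("the red house"))  # prints: la casa roja
print(translate("the old book"))   # prints: el libro viejo
```

Because `analyse` never consults the Spanish lexicon, the same analysis output could in principle feed a transfer module for another target language, which is the independence property described above.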

2.2.2 Corpus-Based Machine Translation

Corpus-Based Machine Translation has two main branches, Statistical Machine Translation (SMT) and Example-Based Machine Translation (EBMT), as well as hybrid systems that combine the strengths of both. What the two branches have in common is that the translations they produce are not governed by grammatical rules but instead repurpose pre-existing translations. Both SMT and EBMT use parallel corpora as their primary source.85

82 Liu Qun, 111. 83 W. John Hutchins, 4. 84 Liu Qun, 110-11. 85 Billy Wong Tak-ming; Jonathan J. Webster, "Example-Based Machine Translation," ibid. (2013), 137.


The idea behind SMT was first proposed by Warren Weaver in 1949, at the dawn of machine translation. He suggested that the statistical techniques introduced in Claude Shannon's information theory might help computers translate automatically between natural languages. Limited computer resources at the time made it impossible to realize this objective, but SMT saw a resurgence roughly 40 years later and became one of the most widely used and studied methods of MT. SMT generates translations using probabilistic models estimated from parallel texts (corpora) of pre-existing human translations.86 The role of human translators in SMT has sometimes been overlooked or ignored in the academic community of SMT developers, although some acknowledge human translators as the source of the data.87 Conversations about copyright and the ethics of using human translators' output as material continue to this day.88

When Peter Brown from IBM presented the purely statistical approach in 1988, he infamously stated that “every time I fire a linguist, my system's performance improves”89 and argued for an empirical, corpus-based approach rather than a rationalist, linguistic one. It later became clear that the two approaches are not mutually exclusive, and they began to be combined in hybrid systems rather than set against each other. Other names for EBMT are analogy-based, case-based, or memory-based translation. They all share the approach of using an already translated corpus or database and matching new input against that database to find suitable examples, recombining them in an analogical manner in order to produce a correct translation.90 While RBMT requires costly, time-consuming manual development of rules that are not transferable between languages, SMT uses a data-driven approach to acquiring translation knowledge. The translation knowledge derived automatically from

86 Zhang Min Liu Yang, "Statistical Machine Translation," ibid. (2015), 201. 87 Dorothy Kenny, "The Ethics of Machine Translation," in Proceedings of the XI NZSTI National Conference, ed. Sybille Ferner (NZSTI, 2011). 88 David Lewis Joss Moorkens, "Copyright and the Re-Use of Translation as Data," in The Routledge Handbook of Translation and Technology, ed. Minako O´Hagan (London, New York: Routledge, 2020). 89 Harold L. Somers, ""New Paradigms" in Mt: The State of the Play Once the Dust Has Settled," (2003). 90 Ibid.

machine-readable parallel text is not language specific and thus independent of any specific pair of languages.91

Large corpora are crucial for EBMT and SMT to function. These corpora must be aligned in parallel, meaning that two texts are analyzed and divided into corresponding segments. Some corpus linguists use the term translation corpus to refer to mutual translations, while parallel corpus indicates a collection of genre-similar multilingual texts. The corpora often constitute a sublanguage, which the system is then able to handle better. Large corpora include the bilingual proceedings of the Canadian and Hong Kong parliaments and the Europarl multilingual corpus of the European Union, covering over 20 languages. Some corpora are built manually, thus avoiding the problem of ambiguous examples (conflicting examples with more than one translation).92

Probabilistic modeling is the very heart of SMT. Earlier systems were based on generative translation models (step-by-step models), while further advances came with the discriminative models introduced in 2002. Discriminative models have become mainstream as they are able to use a variety of diverse and overlapping knowledge sources as features. SMT has moved from modeling flat structures, such as words and phrases, to hierarchical structures, e.g., syntactic trees.93 An SMT system is trained automatically from a parallel corpus, a process called parameter estimation. The parameters of the earlier generative models are typically “[…] probability distributions on unobserved latent variables such as word-to-word translation sub-models, distortion models, etc.”94
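
Parameter estimation for a word-to-word translation sub-model can be sketched with expectation-maximisation in the style of IBM Model 1. The following is a minimal illustration under stated assumptions: the three-sentence German-English corpus is invented, the NULL word and all distortion parameters are omitted, and the function names are hypothetical. It is not the training procedure of any particular system discussed here.

```python
from collections import defaultdict

# Toy sentence-aligned corpus (German, English); purely illustrative.
CORPUS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]


def train_word_model(corpus, iterations=20):
    """Estimate word translation probabilities t(e | f) by EM.

    Alignments between words are never observed; they are the latent
    variables whose expected counts are collected in the E-step.
    """
    pairs = [(f.split(), e.split()) for f, e in corpus]
    target_vocab = {e for _, es in pairs for e in es}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (e, f) link
        total = defaultdict(float)  # expected count of each source word f
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    fraction = t[(e, f)] / norm
                    count[(e, f)] += fraction
                    total[f] += fraction
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f]
                                for (e, f), c in count.items()})
    return t


def best_translation(t, source_word):
    """Return the target word with the highest t(e | source_word)."""
    candidates = {e: p for (e, f), p in t.items() if f == source_word}
    return max(candidates, key=candidates.get)


model = train_word_model(CORPUS)
for word in ("das", "haus", "buch", "ein"):
    print(word, "->", best_translation(model, word))
```

Although no word alignment is given in the corpus, the co-occurrence pattern alone is enough for the model to prefer "house" for "haus" and "book" for "buch".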

The early SMT systems were based on word-based models, where the basic unit of translation is the word and each decision that the system takes is associated with a probability. Decisions include, for example, predicting the length of the translation and choosing appropriate words. The decision sequence for generating the optimal translation is the one with the highest overall probability. These word-based models

91 Liu Yang, 201. 92 Somers. 93 Liu Yang, 201-02. 94 Ibid., 202.

are not widely used today but are still utilized to train phrase-based and syntax-based translation models.95 Phrase-based SMT treats phrases as the units of translation. Phrases here are sequences of consecutive words, not (necessarily) phrases in the sense of syntactic theory. Phrase-based systems can handle idiom translation, word insertion, and even deletion. Phrase segmentation, reordering, and translation are the three sub-models used in phrase-based SMT.96 Distortion sub-models are an important component of these systems, modeling the divergence in word order between natural languages such as subject-verb-object languages (e.g., English) and subject-object-verb languages (e.g., Japanese).97 Both word-based and phrase-based SMT model translation on flat structures.
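
Why phrases as units help with idioms can be shown with a deliberately simplified sketch. The phrase table below is hypothetical, carries no probabilities, and is searched greedily from left to right with the longest match first; real phrase-based decoders score competing segmentations and also model reordering, which is left out here.

```python
# Hypothetical English -> German phrase table. A real system would learn
# scored entries from a parallel corpus; these are illustrative only.
PHRASE_TABLE = {
    ("kicked", "the", "bucket"): "starb",
    ("the", "old", "man"): "der alte Mann",
    ("the",): "der",
    ("old",): "alte",
    ("man",): "Mann",
    ("bucket",): "Eimer",
}
MAX_PHRASE_LEN = 3


def translate_phrases(sentence):
    """Monotone, greedy longest-match segmentation and translation."""
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        for length in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + length])
            if phrase in PHRASE_TABLE:
                output.append(PHRASE_TABLE[phrase])
                i += length
                break
        else:
            # No entry of any length matched: copy the word through.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate_phrases("the old man kicked the bucket"))
# prints: der alte Mann starb
print(translate_phrases("the bucket"))
# prints: der Eimer
```

The three-word source phrase maps to a single target word, an instance of the deletion that word-by-word lookup cannot express.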

Recent focus in SMT has shifted towards hierarchical syntactic structures, based on the premise that most natural languages are hierarchically structured. The aim is “[…] to assign a parallel syntactic tree structure to a pair of sentences in different languages, with the goal of translating the sentences by applying reordering operations on the trees.”98 The underlying mathematical model is called a synchronous grammar or transduction grammar. Synchronous grammars describe structurally correlated pairs of languages and have been researched since the late 1990s in the form of Context-Free Grammars (CFG) and related formalisms. Hierarchical models are syntax-based models, and moving from flat to hierarchical structures has been shown to improve translation quality.99 They are syntax-based in the sense of using the fundamental idea of syntax, rather than exploiting linguistic syntactic structures. Real linguistic parse trees are used in Synchronous Tree Substitution Grammars. In some systems, linguistic syntax is used only on the target side (String-to-Tree), which suits translation from resource-scarce languages into resource-rich languages such as English. Tree-to-String models use linguistic syntax only on the source side (e.g., English) and can therefore translate into languages that lack high-accuracy parsers. The disadvantages of syntax-based SMT include the availability and accuracy of parsers, which is still a

95 Ibid., 204. 96 Ibid., 205. 97 Ibid., 206. 98 Ibid., 207. 99 Ibid., 207-08.

problem for many natural languages. Syntax-based SMT has high memory requirements due to the large number of learned rules, and its decoders are much slower than phrase-based decoders due to the complexity encountered.100
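
The idea of a synchronous grammar, in which one derivation yields a sentence pair and reordering happens on the tree, can be sketched as follows. The two rules, the romanised Japanese lexicon with case particles folded into the words, and the derivation tree are all toy assumptions for illustration; no rule learning, scoring, or parsing is shown.

```python
# Each synchronous rule pairs a source-side and a target-side ordering of
# the same children. Indices refer to positions in the list of children.
# Rules and lexicon are toy assumptions.
SYNC_RULES = {
    "S": {"source": [0, 1], "target": [0, 1]},   # S  -> <NP VP, NP VP>
    "VP": {"source": [0, 1], "target": [1, 0]},  # VP -> <V NP, NP V>
}
LEXICON = {"John": "Jon-ga", "reads": "yomu", "books": "hon-o"}

# Derivation tree: (label, [children]) for rules, (label, word) for leaves.
TREE = ("S", [("NP", "John"),
              ("VP", [("V", "reads"), ("NP", "books")])])


def realise(node, side):
    """Read off the source or the target string from one shared tree."""
    label, content = node
    if isinstance(content, str):
        return [content if side == "source" else LEXICON[content]]
    words = []
    for index in SYNC_RULES[label][side]:
        words.extend(realise(content[index], side))
    return words


print(" ".join(realise(TREE, "source")))  # prints: John reads books
print(" ".join(realise(TREE, "target")))  # prints: Jon-ga hon-o yomu
```

The subject-verb-object to subject-object-verb reordering is expressed once, in the VP rule, rather than word by word.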

EBMT entails extracting knowledge from pre-existing translations to facilitate new translations. The idea is typically attributed to Nagao in 1984. Around the same time the first computer-aided translation (CAT) systems were being introduced, operating on a similar example-based principle by using a translation memory. In CAT systems human translators produce a translation from the examples, while in EBMT the translation output is produced by the system.101 EBMT is usually considered best suited to sublanguage translation due to its reliance on pre-existing examples.102 The three main stages of EBMT are “[…] (1) matching source fragments against the examples, (2) identifying the corresponding translation fragments, and then (3) recombining them to give the target output.”103 Like SMT, and unlike RBMT, these systems can handle extra-grammatical sentence structures. EBMT offers high flexibility within the three stages of matching, aligning, and recombining, so most systems include some statistical components as well. Examples may be stored as strings (sentences or phrases), tree structures, templates, or any other annotated representation appropriate for the process. The acquisition of pre-existing examples is similar to SMT, with large bilingual or multilingual corpora being the main source. Efforts have also been made to use corpora that are not necessarily aligned in parallel, such as newsfeeds from news agencies in different languages, as they are believed to convey overlapping information.104 There is no minimum example size as far as granularity is concerned, so a bilingual dictionary can in theory be seen as a restricted corpus, a translation aligned at word level. Examples are often stored at an arbitrary length, not necessarily matching a linguistically meaningful structure or constituent. The most common grain size is the sentence. Since it is unlikely that a full sentence will yield usable results, the sentences

100 Ibid., 210. 101 BIlly Wong Tak-ming; Jonathan J. Webster, "Example-Based Machine Translation," ibid. (2013), 137-38. 102 Ibid., 144. 103 Ibid., 138. 104 Ibid., 139.

must be broken down into smaller chunks during matching and recombination. The size of the example base must also be taken into consideration, as the quality of the system stops improving beyond a certain size of example base, while processing time keeps increasing and slows down translation.105 In order to increase performance in EBMT, some systems generalize examples into “translation rules” to reduce the size of the example base. Sentences like “John flew to Düsseldorf on December 3rd” can be stored as “ flew to on ”, with variables in place of the name, the destination, and the date, which is easier to match against similar examples.

The matching process in EBMT is one of the most studied topics in the field, as retrieving examples that closely match the source sentence is crucial. Examples that are stored as string pairs at sentence level can be matched against each other with the help of thesauri, but will sometimes need to be decomposed into smaller fragments to help example retrieval. The similarity measure might be as simple as a character-based one, in which two string segments are compared for the number of modifications needed to turn one into the other. This is known as edit distance and has found widespread use in translation memories, spell checking, and speech processing. The edit distance counts the insertions, deletions, and substitutions needed until two examples are identical; the measure is both language-independent and simple.106

Recombination is the next stage. After the translation examples are matched against the input sentences, their counterpart fragments need to be retrieved from the example base and finally combined into a meaningful target sentence. Somers mentions that these are really two separate problems. First, the fragments from the translation example must be scanned to find out which portion corresponds to the source text. Secondly, these portions must be recombined appropriately. When translation examples are already stored as segments smaller than sentences (see the matching process), the first problem might already be partially solved. However, sometimes multiple possible translation examples are available. Some systems use probabilistic (statistical) methods to aid the recombination process, and other systems that store examples in tree

105 Ibid., 139-40. 106 Ibid., 141.

structures will need to carry out recombination as tree unification. Some systems rely on rule-based engines that supply linguistic knowledge to aid recombination.107
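
The character-based edit distance used in the matching stage can be sketched as follows. The distance function is the standard Levenshtein dynamic programme; the two-entry example base and its German translations are hypothetical and serve only to show how the closest stored example might be retrieved.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of insertions, deletions,
    and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


# Hypothetical example base of (source sentence -> stored translation).
EXAMPLE_BASE = {
    "John flew to Düsseldorf on December 3rd":
        "John flog am 3. Dezember nach Düsseldorf",
    "Mary took the train to Berlin":
        "Mary nahm den Zug nach Berlin",
}


def closest_example(source):
    """Return the stored example with the smallest edit distance."""
    best = min(EXAMPLE_BASE, key=lambda s: edit_distance(source, s))
    return best, EXAMPLE_BASE[best]


print(edit_distance("kitten", "sitting"))  # prints: 3
print(closest_example("John flew to Dortmund on December 5th")[0])
```

The measure needs no linguistic knowledge, which is what makes it language-independent, but it also means that the retrieved translation still has to be adapted in the recombination stage.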

As Somers mentions, most research has been done on hybrid systems, and rule-based and example-based systems are not mutually exclusive but indeed often complementary. In some systems that their developers do not describe as statistical or example-based, statistical methods are used to learn or generalize the linguistic rules needed for an RBMT system. Much work has been done on extracting linguistic knowledge and vocabulary from large corpora without calling the systems SMT or RBMT. So what constitutes example-based or statistical MT besides the use of corpora? As Somers describes it: “EBMT means that the main knowledge-base stems from examples. However, as we have seen, examples may be used as a device to shortcut the knowledge acquisition bottleneck in rule-based MT, the aim being to generalize the examples as much as possible.”108 Where examples are stored in a tree structure, EBMT is firmly based on RBMT and the two enhance each other. As Somers mentions, one of the advantages of SMT is that the linguistic knowledge of the system can easily be enriched simply by adding new data. Overgeneration is reduced, as the system will only produce constructions that really do occur in natural language. As no complex rules or theoretical constructs are involved, rule conflicts do not occur. However, conflicting examples can still present a problem.109 Some systems also combine RBMT and EBMT: the initial translation is carried out with example matching, and the chunks that could not be matched are translated using rule-based machine translation.110

2.2.3 Neural Networks and Deep Learning

Almost all development in MT after 2012 has been achieved using neural networks.111 Research has focused on eliminating errors rather than trying to achieve a perfect translation, which has proven to be a more fruitful approach.112 Rebranding efforts

107 Ibid., 143-44. 108 Somers. 109 Ibid. 110 Webster, 145. 111 Andrey Kurenkov, "A Brief History of Neural Nets and Deep Learning," Skynet Today 2020. 112 Koehn, 19.

have been made to present neural networks, or artificial neural networks, as “deep learning”, but the term neural network is still widely used. Neural networks draw their inspiration from biological neurons, which receive input signals from other neurons through their dendrites. A neuron becomes activated if the input signal is strong enough, and the signal continues along an axon to the dendrites of yet another neuron. Artificial neural networks adopt this idea in the form of inputs combined into a weighted sum, an activation function, and an output value. McCulloch and Pitts conceived the first neural networks in 1943, but the first actual neural networks (perceptrons) were not created until 1957.113 Perceptrons were conceived by Frank Rosenblatt, a psychologist, as a simplified mathematical model of how neurons in the brain operate. These early models could compute the logical functions OR, AND, and NOT by summing up binary inputs and outputting a “1” if the sum exceeds a threshold value, or a “0” if the threshold is not reached.114 The first machine was able to classify simple shapes from a 20x20 pixel input. To achieve more complex outputs, the perceptrons were stacked in layers (neural nets), where each perceptron was responsible for one output of the function.
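
The threshold behaviour described above can be sketched in a few lines. The weights and thresholds chosen for OR, AND, and NOT are one possible hand-picked assignment, given here as an assumption for illustration rather than as values from the historical models.

```python
def threshold_unit(inputs, weights, threshold):
    """Output 1 if the weighted sum of the binary inputs exceeds the
    threshold, otherwise 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Hand-picked parameters (assumptions) for the three logical functions.
def logical_and(x1, x2):
    return threshold_unit((x1, x2), weights=(1, 1), threshold=1.5)


def logical_or(x1, x2):
    return threshold_unit((x1, x2), weights=(1, 1), threshold=0.5)


def logical_not(x):
    return threshold_unit((x,), weights=(-1,), threshold=-0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", logical_and(x1, x2), "OR:", logical_or(x1, x2))
print("NOT 0:", logical_not(0), "NOT 1:", logical_not(1))
```

The same unit computes all three functions; only the weights and the threshold differ.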

Neural machine translation (NMT) is a well-defined task in itself. If NMT succeeds in producing output of sufficient quality, it could fuel other applications that were previously out of reach. Examples might include cross-lingual information retrieval or cross-lingual information extraction (such as search engines searching the entire web in different languages according to a search query and returning formatted information to the user).115 Further research interest has focused on moving neural translation beyond text as input and output and on using clues outside written text, such as those needed in image caption translation or video subtitle translation.116 In his book about NMT, Philipp Koehn mentions that neural network research, much like MT, has undergone several boom-and-bust cycles with high hopes and overly optimistic expectations after a breakthrough, but ultimately a cooling-down of interest when practical

113 Ibid., 31. 114 Kurenkov. 115 Koehn, 26-27. 116 Ibid., 27-28.

applications proved to be less than expected.117 The perceptron was able to learn by placing weights on the inputs, which were adjusted whenever the algorithm proposed a wrong output. Research came to a stop in 1969 when Minsky and Papert showed that the Boolean XOR function was not learnable by perceptrons.118
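The weight adjustment and the XOR limitation mentioned above can be illustrated with a small sketch of the classic perceptron learning rule. The function names, the learning rate, and the epoch limit are assumptions made for this example only.

```python
def predict(weights, bias, x):
    """Fire (1) if the weighted sum plus bias is positive, otherwise output 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples, epochs=25, lr=1.0):
    """Adjust the weights whenever the prediction is wrong.

    Returns the weights, the bias, and the number of errors in the last epoch run.
    """
    weights, bias = [0.0, 0.0], 0.0
    errors = 0
    for _ in range(epochs):
        errors = 0
        for x, target in samples:
            delta = target - predict(weights, bias, x)
            if delta != 0:
                errors += 1
                weights = [w + lr * delta * xi for w, xi in zip(weights, x)]
                bias += lr * delta
        if errors == 0:  # every sample classified correctly: stop early
            break
    return weights, bias, errors


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND errors in final epoch:", train_perceptron(AND_DATA)[2])
print("XOR errors in final epoch:", train_perceptron(XOR_DATA)[2])
```

AND is linearly separable, so the rule settles on weights that classify all four cases. No single line separates the XOR cases, so at least one error remains however long the loop runs.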

The first neural networks learned in a supervised fashion, meaning they were given a training set of examples of both input and output (x and y) and “learned” to generalize and derive a function that could then predict outputs for inputs they had not seen before (in that case a linear function, because of its two values). Most datasets also set aside a test set of previously unseen input and output combinations, whose purpose is to evaluate the effectiveness of the machine learning algorithms. As with RBMT, overfitting is an issue, whereby algorithms learn the training set too specifically and cannot generalize to a test set.119 Machine learning itself refers more broadly to the “field of study that gives computers the ability to learn without being explicitly programmed”.120 Machine learning is about algorithms recognizing patterns in data and making decisions based on probability theory, statistics, combinatorics, and optimization.121

The next step in neural network development was to stack layers into multiple-layer networks, consisting of an input layer, an output layer, and various hidden layers. The hidden layers' output remains “hidden” and only serves as input for the next layer. That way numerous computations can tackle more complicated problems than a perceptron or single-layer network could. Through calculus and the chain rule, the errors in the network's output values can be propagated backwards through the layers (backpropagation). Single-layer networks passed information only forward, but multilayer neural networks can also learn backwards, by adjusting the weights of the individual neurons in the layers to minimize the errors.122 Backpropagation dates theoretically back to the 1960s but did not become popular until 1986, giving research into neural networks a new

117 Ibid., 29. 118 Kurenkov. 119 Ibid. 120 Arthur Samuel, 1959 as seen in Sebastian Raschka to Teaching, 24.05.2015, 2015, https://sebastianraschka.com/Articles/2015_singlelayer_neurons.html. 121 Ibid. 122 Kurenkov.

momentum. Many neural network architectures still in use today were proposed at that time, such as recurrent neural networks, long short-term memory cells, and convolutional neural networks.123 A typical classification problem that could now be solved (and is still often used to explain how neural networks work) is handwritten digit recognition.124 Interest cooled down again in the late 1990s when researchers found neural networks too complex and hard to train for tasks such as natural language processing and speech recognition.125
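To make the role of the chain rule in backpropagation more tangible, the following minimal sketch uses a deliberately tiny network: one input, one hidden neuron, and one output neuron. The architecture, the sigmoid activation, the squared-error loss, and all names are assumptions chosen for illustration. The analytically derived gradients are compared with numerical estimates.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """Forward pass: input -> hidden neuron -> output neuron."""
    h = sigmoid(w1 * x)
    y = sigmoid(w2 * h)
    return h, y


def loss(w1, w2, x, target):
    """Half the squared error between the network output and the target."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def backward(w1, w2, x, target):
    """Backward pass: apply the chain rule layer by layer, starting at the output."""
    h, y = forward(w1, w2, x)
    d_z2 = (y - target) * y * (1.0 - y)      # error signal at the output neuron
    grad_w2 = d_z2 * h
    d_z1 = d_z2 * w2 * h * (1.0 - h)         # error passed back to the hidden neuron
    grad_w1 = d_z1 * x
    return grad_w1, grad_w2


def numeric_gradients(w1, w2, x, target, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    g1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return g1, g2


w1, w2, x, target = 0.5, -0.3, 1.5, 1.0
print("analytic:", backward(w1, w2, x, target))
print("numeric: ", numeric_gradients(w1, w2, x, target))
```

Subtracting a small multiple of these gradients from the weights lowers the error, which is the weight adjustment that training repeats many times over.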

The deep learning aspect of neural networks developed around 2015. The neural networks learned unsupervised, meaning they began to derive structures from unlabeled data on their own, also called pattern recognition. Compared to supervised learning, they now only needed input data for training. An example might be an autoencoder that compresses and decompresses data; it encodes the input to a hidden layer and decodes the hidden layer to an output, with the input and output data ideally being the same.126 Recurrent neural networks and convolutional neural networks have also been successfully applied to language modeling in NLP, meaning that they predict the words that are likely to follow a given sequence of words, a task traditionally approached with n-grams (autocomplete is an example). The idea of word vectors came into play so that neural networks could learn the similarity between words in natural language. Heinrich Schütze introduced the idea in 1993 in his paper “Word Space”, in which words are mapped onto a multi-dimensional space (over 100 dimensions), where each word occupies a point.127 Similar words will be in closer proximity to one another, so this is a useful tool for language modeling. These representations are called word vectors and are built from phrase co-occurrences in Schütze's paper. These word vectors can be fed into neural networks, optimized using an error function through backpropagation, and used for language modeling, as proposed in Yoshua Bengio's influential 2003 paper, “A Neural Probabilistic Language Model”.128
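The idea that similar words end up close together in a vector space can be sketched with simple co-occurrence counts. The toy corpus, the window size, and the function names below are assumptions made for this illustration. The code does not reproduce the method of Schütze's paper, which works with far larger data and far more dimensions.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by the counts of the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus, invented for this example.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the translator reads the book",
]

vectors = cooccurrence_vectors(CORPUS)
print("cat ~ dog: ", round(cosine(vectors["cat"], vectors["dog"]), 3))
print("cat ~ book:", round(cosine(vectors["cat"], vectors["book"]), 3))
```

Because “cat” and “dog” share most of their neighbouring words in this corpus, their vectors point in nearly the same direction, whereas “cat” and “book” share only the article.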

123 Koehn, 32. 124 Grant Sanderson, "What Is Backpropagation Really Doing? | Deep Learning, Chapter 3," in Neural Networks, ed. 3Blue1Brown (2017). 125 Koehn, 32. 126 Kurenkov. 127 Heinrich Schütze, "Word Space," Advances in neural information processing systems 5 (1993). 128 Kurenkov.


Backpropagation does not, however, work very well with neural networks that have many layers, or deep neural nets. This is because errors are traced back from the output layer and attributed to the preceding layers, so that with more layers the repeated multiplication in the chain rule results in either huge or tiny numbers (exploding or vanishing gradients129) and the neural net can no longer be trained effectively.130
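A back-of-the-envelope sketch shows why the numbers become tiny or huge. It assumes, purely for illustration, that every layer contributes the same factor to the chain rule, namely one weight multiplied by the slope of the activation function. The slope value 0.25 is the maximum slope of the sigmoid function.

```python
def backpropagated_factor(num_layers, weight, activation_slope):
    """Multiply the per-layer chain-rule factors an error signal passes through."""
    factor = 1.0
    for _ in range(num_layers):
        factor *= weight * activation_slope
    return factor


for layers in (2, 5, 10, 20):
    vanishing = backpropagated_factor(layers, weight=1.0, activation_slope=0.25)
    exploding = backpropagated_factor(layers, weight=8.0, activation_slope=0.25)
    print(f"{layers:>2} layers: vanishing {vanishing:.3e}, exploding {exploding:.3e}")
```

With a per-layer factor below 1 the signal shrinks exponentially with depth, and with a factor above 1 it grows exponentially. In either case the weights in the early layers receive updates that are of little use.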

After research cooled down in the early 2000s due to the challenges mentioned above, the field gained new traction after it was “rebranded” from neural nets to deep learning by Bengio, LeCun, and Geoffrey Hinton, amongst others. In a 2006 breakthrough paper that would rekindle interest in neural nets, Hinton argued that they could indeed be trained if the weights of the layers were not randomly initialized but pre-trained. While neural nets were previously mostly trained on the pre-labeled MNIST dataset of handwritten digits, a new and more challenging dataset was introduced by 2009. ImageNet was intended to contain 50 million pictures to illustrate the WordNet database (a database of English words grouped by meaning). By 2012 computational power reached its limits, and Hinton, George Dahl, and Abdel-rahman Mohamed discovered that graphics processing units (GPUs) were 70 times faster than CPUs for deep models. Dahl and Mohamed began to work for Microsoft and had access to Big Data, that is, previously unavailable amounts of training data, to train speech recognition. In summary, deep learning equals lots of training data plus parallel computation plus scalable, smart algorithms.131

Recalling the Vauquois triangle (Figure 5), one can argue that SMT had already come a long way towards an interlingua and that NMT is a return to lexical transfer; perhaps in the future it can take a further step towards interlingua translation.132

129 Jürgen Schmidhuber, "Deep Learning in Neural Networks: An Overview," Neural Networks 61 (2014). 130 Kurenkov. 131 Ibid. 132 Koehn, 11.


2.3 Human Translator Aids

Translation now seems unthinkable without the use of some kind of technology. Computer-aided translation (CAT) technologies were the first to emerge, and MT has sometimes been regarded as a CAT tool because it usually requires some human interaction, i.e., post- or pre-editing.133 Harold Somers describes the launch of commercially viable MT and CAT systems in the 1980s as disastrous, but much has changed since.134 In a 2008 study, over 70% of translation agencies and translators reported using CAT technology.135 In a 2018 study involving 1200 language professionals from 55 countries, over half reported using MT.136 Historically, CAT and MT were handled separately. The lines are now blurred, as MT is increasingly integrated into CAT systems.137 Indeed, 99% of translations today are estimated to be mediated by machines.138 The need for translation has outgrown human capabilities; since the launch of Google Translate in 2006, translations have become something that people expect to have immediately available.139

The first technologies, developed around 1990, were translation memory tools that exploited repetitive content through parallel corpus-based searches.140 Specialized terminology could be translated consistently after the introduction of software applications such as TermTracer in 1989 or Translator's Workbench in 1994.141

133 Lynne Bowker, "Computer-Aided Translation: Translator Training," in The Routledge Encyclopedia of Translation Technology, ed. Chan Sin-wai (London, New York: Routledge, 2015). 134 Harold Somers, "Translation Technologies and Minority Languages," in Computers and Translation, ed. Harold Somers, Benjamins Translation Library (Amsterdam, Philadelphia: John Benjamins Publishing Company, 2003), 97. 135 LT2013, "Status and Potential of the European Language Technology Markets," in The Forum for Europe´s Language Technology Industry, ed. Shaping Europe´s digital future (online: European Commission, 2014), 14. 136 Minako O´Hagan, "Introduction: Translation and Technology: Disruptive Entanglement of Human and Machine," in The Routledge Handbook of Translation and Technology, ed. Minako O´Hagan (London, New York: Routledge, 2020), 1. 137 Dorothy Kenny, "Translation and Translator Training," ibid., Routledge Handbooks in Translation and Interpreting Studies, 506. 138 Henry Liu, "Foreword," in The Bloomsbury Companion to Language Industry Studies, ed. Maureen Ehrensberger-Dow Erik Angelone, Gary Massey, Bloomsbury Companions (London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020). 139 Jaap van der Meer, "Translation Technology - Past, Present and Future," ibid., 288. 140 Koehn, 21. 141 Meer, 286. Translations can become unusable very quickly if no terminology file is used and the translation work is produced by several human translators, see C. Terry Warner Alan K. Melby, The Possibility of Language. A Discussion of the Nature of Language, with Implications for Human and Machine Translation (Amsterdam, Philadelphia: John Benjamins Publishing Company, 1995), 162.


Other translation technology tools include app localization systems, audio-video captioning, community-translation platforms, controlled authoring tools, globalization management systems, localization project management, MT platforms, post-editing tools, proxy-based localization management, quality assurance tools, speech-to-, terminology management tools and repositories, translation apps, translation management systems, and translation memory tools.142 In the beginning, Hutchins and Somers classified the technologies along a continuum depending on who bore the main burden, humans or machines: from MT to human-aided machine translation to machine-aided human translation to human translation. The term CAT was often used to refer to machine-aided human translation.143

It has been suggested that the long-used term CAT should be abandoned in favor of other terms that recenter the technologies around the human translator, e.g., translation environment tools (TEnTs) or augmented translation (AT).144 Most high-quality translations are overseen by language service providers, who often outsource the work to freelance translators.145 The paradigm shift from “from-scratch” translations, now increasingly obsolete, to post-editing content generated from multiple sources has led some scholars, such as Pym, to suggest abandoning the term “source text” and replacing it with “start text”.146

2.4 Challenges to Machine Translations

Translation is a difficult task even for human translators, and MT faces some unique problems of its own. As Arnold dissected in 2003 (before NMT became popular), all translation problems for MT can be attributed to four limitations of computers: their inability to perform vaguely specified tasks, to learn things themselves, to perform common-sense reasoning, and to deal with problems that have more than one potential solution or the combinatorial explosion that follows.147

142 Meer, 290-99. 143 Kenny, "Translation and Translator Training," 505. 144 Minako O´Hagan, "Introduction: Translation and Technology: Disruptive Entanglement of Human and Machine," ibid. 145 Koehn, 21. 146 Maureen Ehrensberger-Dow Erik Angelone, Gary Massey, "Introduction," in The Bloomsbury Companion to Language Industry Studies, ed. Maureen Ehrensberger-Dow Erik Angelone, Gary Massey, Bloomsbury Companions (London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020), 3.

More general translation problems that translation systems encounter, whether rule-based, statistical, or neural, are semantic translation problems, i.e., cases where meaning is expressed differently in different languages or is implied rather than stated explicitly. Pronominal anaphora (pronouns referring back to an antecedent, i.e., something or someone mentioned earlier in the text) are a big challenge to MT,148 as the co-reference must first be resolved (to which item of the text the pronoun refers) before the pronoun can be translated into the right gender149, if applicable, so as not to produce morphologically incorrect variants that harm the adequacy, fluency, and politeness of the translation.150 As Terry Winograd pointed out, commonsense reasoning is central to resolving that task. An adequate model of natural language understanding must also include linguistic and non-linguistic reasoning if it is to resolve these tasks.151 Other translation difficulties revolve around real-world knowledge, facts that human speakers are assumed to know, such as what a cousin is in terms of family relationship to the speaker. Another challenge is the discourse structure of documents. Syntactic translation problems arise with languages that use morphology or word order to mark the relationships between words. Phrase translation problems arise because meaning is not always compositional, so MT has to recognize idiomatic phrases rather than translate them word for word. Ambiguity is a big problem, as natural language can be ambiguous on various levels: the meaning of words, syntactic properties, and morphology. Sometimes ambiguity

147 Doug Arnold, "Why Translation Is Difficult for Computers," in Computers and Translation: A Translator's Guide, ed. Harold Somers, Benjamins Translation Library (Amsterdam, Philadelphia: John Benjamins Publishing Company, 2003), 119. 148 Stefanie Dipper Heike Zinnmeister, Melanie Seiss, "Abstract Pronominal Anaphors and Label Nouns in German and English: Selected Case Studies and Quantitative Investigations," in Crossroads between , Translation Studies & Machine Translation, ed. Silvia Hansen-Schirra Oliver Czulo (Berlin: Language Science Press, 2017). 149 Marta Recasens Kellie Webster, Vera Axelrod, Jason Baldridge, "Mind the Gap: A Balanced Corpus of Gendered Ambiguous Pronouns," Transactions of the Association for Computational Linguistics (2018). 150 Christian Hardmeier Eva Vanmassenhove, Andy Way, "Getting Gender Right in Neural Machine Translation" (paper presented at the Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, 2018). 151 Emily M. Bender, 9-11. Some language pairs, such as English to French from Google Translate will now pass the Winograd Schema Challenge according to Constantine, 475.

is willfully entertained, and a translation should reflect that. Ambiguity on the word level, when one word has different meanings, can only be resolved by context information.152 While ambiguity can be used to advantage, for example in politics or advertising, by avoiding controversy while still communicating effectively, it has a negative impact on fluency and adequacy metrics in the output of MT.153

Bender and Lascarides argue that even though information is presented in complex and idiosyncratic ways, it can be formalized in a logically precise model of meaning.154 Language models themselves do not perform natural language understanding and can be successful only in tasks that can be approached by manipulating linguistic form. They do not capture “meaning” and cannot be described as “understanding” natural language, although claims to the contrary are sometimes made.155 When large language models are perceived to be “understanding” or even “reasoning”, Bender and Koller argue, they are simply very good at leveraging artefacts and are thus able to learn phenomena such as subject-verb agreement in English; when those language models extrapolate forms from their training data, however, they cannot be seen as learning meaning in the process.156 Vector-space representations of words are especially well suited to picking up syntactic and semantic (lexical similarity) word classes.157

Neural networks work best when they are trained on huge datasets, which are not available for all applications in Artificial Intelligence (AI). Their other limitations include a lack of interpretability and verifiability, among others.158 Data sparsity is one of the biggest challenges to the data-driven method, as the most frequent words in large corpora appear very often, while rare words might occur only once. The

152 Koehn, 5-8. 153 Djiako. 154 Emily M. Bender, 204. 155 Alexander Koller Emily M. Bender, "Climbing Towards Nlu: On Meaning, Form, and Understanding in the Age of Data" (paper presented at the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020). 156 Benoît Sagot Ganesh Jawahar, Djamé Seddah, "What Does Bert Learn About the Structure of Language?" (paper presented at the Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019). 157 Emily M. Bender. 158 Kurenkov.


Europarl corpus contains a total of 30 million words, 6.5% or 1,929,379 of which are “the”, followed in frequency by full stops and commas. 33,447 words occur only once in this large corpus. Some argue against purely data-driven methods, holding that they should be augmented with relevant generalizations from linguistic understanding.159 These large corpora build “meaning” by analyzing the co-occurrence metrics of words,160 eventually recognizing words by the company they keep.161 Text embedding models use vector representations of words (or sentences and phrases) to model semantic relationships between words as geometric relationships, thus serving as a sort of dictionary that supplies computer programs with word meanings.162 These text embedding models have been shown to encode harmful social biases by encoding undesirable correlations in data.163 Widely used word embeddings (word vectors), trained without supervision, also exhibit female/male gender stereotypes to a large extent.164 The vector differences between words represent the words' relationships, so an analogy puzzle of “man is to king as woman is to x” can be solved, but the same method will also yield “man is to computer programmer as woman is to homemaker” or “father is to doctor as mother is to nurse”.165 Debiasing algorithms can be used to reduce the bias in computer systems so as not to amplify the bias in society,166 which in turn creates discrimination and gender stereotypes in machine learning.167 CommonCrawl and other large datasets are believed to represent

159 Koehn. 15. NMT used to be ineffective in handling rare words too: Mike Schuster Yonghui Wu, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean, "Google´S Neural Machine Translation System: Bridging the Gap between Human and Machine Translation," ArXiv (2016). 160 Vinodkumar Prabhakaran Ben Hutchinson, Emily Denton, Kellie Webster, Yu Zhong, Stephen Denuyl, "Social Biases in Nlp Models as Barriers for Persons with Disabilities" (paper presented at the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020), 5494. 161 John R. Firth, "A Synopsis of Linguistic Theory, 1930-1955," Studies in linguistic analysis (1975). 162 Kai-Wei Chang Tolga Bolukbasi, James Zou, Benkatesh Saligrama, Adam Kalai, "Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings," in NeurIPS (Barcelona2016), 1. 163 Ben Hutchinson, 5492. 164 Tolga Bolukbasi, 1. 165 Ibid., 1-3. 166 Seraya Maouche, "Google Ai: Opportunities, Risks, and Ethical Challenges," Contemporary French and Francophone Studies 23, no. 4 (2020). 167 Tolga Bolukbasi, 15.

different views of the world. However, according to Bender, Gebru, and McMillan-Major, filtering, along with factors that narrow Internet participation, will lead to some views being overrepresented in the training data, such as white supremacy, misogyny, ageism, and other views in the US and UK English versions.168 Participation statistics show that younger users from developed countries are overrepresented in general Internet use.169 Among editors of Wikipedia, only 8.8-15% are women, and 64% of the Reddit user base is male.170 Underrepresented populations are therefore less likely to be included in training data for language models, especially if the data are assembled from social media that some age groups might use considerably less.171 Measurable, undesirable social biases towards persons with disabilities have also been shown to be represented in English language models trained on large textual corpora. These biases could lead to harm to the dignity of individuals, reduced autonomy (when mentions of disabilities are censored disproportionately), or reduced freedom of speech, along with the perpetuation of societal stereotypes.172 As Birhane and Prabhu note when referring to image searches: “[f]eeding AI systems on the world's beauty, ugliness, and cruelty, but expecting it to reflect only the beauty is a fantasy.”173
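The analogy puzzles discussed earlier in this section rest on simple vector arithmetic, which the following sketch illustrates. The three-dimensional vectors are hand-made toy values invented for this example (the dimensions loosely stand for gender, royalty, and occupation) and are not taken from any trained embedding model. The gender component given to the two occupation words is a deliberate assumption that mimics a correlation a model might absorb from biased text.

```python
import math

# Hypothetical toy vectors: (gender, royalty, occupation). Invented for illustration only.
TOY_VECTORS = {
    "man":        (1.0, 0.0, 0.0),
    "woman":      (-1.0, 0.0, 0.0),
    "king":       (1.0, 1.0, 0.0),
    "queen":      (-1.0, 1.0, 0.0),
    "programmer": (0.6, 0.0, 1.0),   # spurious gender component (assumed)
    "homemaker":  (-0.6, 0.0, 1.0),  # spurious gender component (assumed)
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def solve_analogy(a, b, c, vectors=TOY_VECTORS):
    """Answer 'a is to b as c is to x' by finding the word closest to b - a + c."""
    target = tuple(vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


print("man : king :: woman :", solve_analogy("man", "king", "woman"))
print("man : programmer :: woman :", solve_analogy("man", "programmer", "woman"))
```

The same arithmetic that recovers the harmless royalty relation also reproduces the stereotype, because it cannot tell a definitional gender difference from a correlation picked up from the data.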

2.5 Evaluation of Machine Translation

The evaluation and quality assessment of translations have been a concern since the very beginnings of translation studies, and the evaluation of MT output is no different in that regard. Indeed, evaluation is central to translation, because it is central to every aspect of communication, indicative of both an axiological and

168 Angelina McMillan-Major Emily M. Bender, Timnit Gebru, Shmargaret Shmitchell, "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?," in Conference on Fairness, Accountability, and Transparency (FAccT ´21) (Virtual Event, 2021). 169 Ibid. 170 Michael Barrera, "Mind the Gap: Addressing Structural Equity and Inclusion on Wikipedia," (Arlington: University of Texas Arlington, 2020). 171 Mark Diaz Amanda Lazar, Robin Brewer, Chelsea Kim, Anne Marie Piper, "Going Gray, Failure to Hire, and the Ick Factor: Analyzing How Older Bloggers Talk About Ageism," in the 2017 ACM Conference (Portland, 2017). 172 Ben Hutchinson, 5491,95. 173 Vinay Uday Prabhu Abeba Birhane, "Large Image Datasets: A Pyrrhic Win for Computer Vision" (paper presented at the Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021), 1541. See also Benjamin's discussion on learned beauty in AI algorithms: Ruha Benjamin, Race after Technology (Cambridge, Medford: Polity Press, 2019), 57.

ideological position.174 Typically, a comparison will be made of how creatively or faithfully linguistic features are transferred across language borders.175 So far, no comparison of evaluation techniques for human translation and MT has been made, but it should prove a fruitful enterprise.176

Several measures are available for the categorization and evaluation of translations. Best-practice quality measures are important, as they will also show the progress achieved during engineering. Task-based evaluation on a real-world task such as information gathering is not an ideal way of evaluating because it yields too few data points, although information gathering may currently be the most common use of MT. If the information gathering is broken down, however, it can be used to benchmark MT quality, for example with a question-answering exercise to assess content understanding.177 Translator productivity is another potential measure of MT usability, measuring the time that professional human translators spend editing MT output.178 Human assessments are frequently employed to evaluate MT, and while Koehn deems them less informative than task-based evaluation methods, they are scalable at reasonable cost. Human evaluators are tasked with rating translations for adequacy and fluency on a graded scale, with adequacy (“Does the output convey the same meaning as the input sentence?”) being the hardest to judge. Human evaluators might prefer to judge two systems against one another in ranking campaigns instead of splitting the judgement of one system into two factors, which yields more consistent results.179 As human evaluators vary greatly in their use of scales, continuous scales have proven better than direct assessment, as the researchers can normalize the results.180 Instead of having humans evaluate the output, another task-based translation evaluation factor is human translation edit

174 Jeremy Munday, Evaluation in Translation (London, New York: Routledge, 2012), 9,155. 175 Qian Duoxiu, "Introducing Corpus Rhetoric into Translation Quality Assessment. A Case Study of the White Papers on China´S National Defense," in The Human Factor in Machine Translation, ed. Chan Sin-wai, Routledge Studies in Translation Technology (London, New York: Routledge, 2018), 83. 176 Harold Somers, "Christa Hauenschild and Susanne Heizmann (Eds), Machine Translation and Translation Theory," Machine Translation 15, no. 3 (2000): 265. 177 Koehn, 42-43. 178 Ibid., 44. 179 Ibid., 45-47. 180 Ibid., 49.

rate (HTER), which is evaluation by post-editing. HTER measures the amount of post-editing a human evaluator has applied to the machine translation output (which is considerably lower than TER, which measures the edit distance between an independently produced human translation and the MT output).181 However, rating translations will ultimately remain a subjective enterprise. A team of professionals will set the gold standard for each text based on accuracy, fidelity, appropriate style and register, and intelligibility. This standard can vary depending on the particular recipients and circumstances and is very difficult to pin down or to draw final conclusions about.182
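The edit-distance idea behind TER and HTER can be sketched as follows. This is a simplified illustration under stated assumptions: it counts only word insertions, deletions, and substitutions and divides by the length of the reference, whereas the published TER metric additionally allows shifts of word sequences. The function names and example sentences are invented for this sketch. Passing a post-edited version of the MT output as the reference gives an HTER-style score, while passing an independently produced human translation gives a TER-style score.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions, and substitutions (Levenshtein distance)."""
    hyp, ref = hypothesis.split(), reference.split()
    previous = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        current = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            current.append(min(previous[j] + 1,          # delete a hypothesis word
                               current[j - 1] + 1,       # insert a reference word
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def edit_rate(hypothesis, reference):
    """Edits needed to turn the hypothesis into the reference, per reference word."""
    reference_length = len(reference.split())
    if reference_length == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis, reference) / reference_length


mt_output = "the house is small"
post_edited = "the house is very small"
print(edit_rate(mt_output, post_edited))  # one inserted word over five reference words
```

Because a post-editor changes only what is necessary, the post-edited text typically stays closer to the MT output than an independent translation does, which is why HTER values tend to be lower than TER values.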

Unlike human assessment, automatic metrics hold several advantages for MT researchers, as they are both faster and cheaper.183 Their objective is to compare MT output to human-generated reference translations. While they are widely used, the ability of metrics such as the Word Error Rate (WER) and the Bilingual Evaluation Understudy (BLEU) to distinguish better from worse MT systems has been called into question. The BLEU metric is still widely used today, as it offers an automatic evaluation and is thus able to process large quantities of text; it is a compromise between requiring and ignoring matching word order.184 The metric counts word or word group matches between machine-generated translations and multiple human-generated translations. BLEU does not only count the single words that match the reference translation, but also n-gram matches (bigrams, trigrams, or 4-grams). A refinement available to BLEU is the precision metric, whereby the matching is reversed, thus counting the n-grams from the MT within the human translation. To achieve better results in the precision metric, an MT system should skip output in difficult cases.185 Other measures such as recall and F-measures are available within BLEU, and final scores are computed from thousands of sentence pairs out of a test set.186 What BLEU does not evaluate is the relative relevance of

181 Ibid., 52. 182 W. John Hutchins, 2. 183 Koehn, 52. 184 Ibid., 53. 185 Ibid., 54. 186 Ibid., 45-55.

different words.187 BLEU scores for human translations are rarely higher than those for MT output, even though the human translations are of higher quality. However, human judgements of adequacy and fluency have been shown to correlate considerably with BLEU scores.188 When scoring single sentences, TER (translation error rate, also called translation edit rate) is the preferred measure.189 Neither BLEU nor TER gives credit for morphological variants, which the CharacTER metric addresses, but it has not been widely adopted.
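The n-gram matching at the heart of BLEU can be sketched in a few lines. The version below is a simplified, sentence-level illustration under stated assumptions: it uses clipped n-gram precision up to 4-grams, a geometric mean, and a brevity penalty, but no smoothing, and the function names are invented here. Reported BLEU scores are normally computed over a whole test set with established tools rather than with a sketch like this one.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, each credited at most as often as it occurs there."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU: brevity penalty times geometric mean of the n-gram precisions."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # Length of the reference closest in length to the candidate.
    closest_ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    if len(candidate) > closest_ref_len:
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - closest_ref_len / len(candidate))
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


reference = "the cat is on the mat".split()
print(modified_precision("the the the the the the the".split(), [reference], 1))
print(bleu("the cat sat on the mat".split(), [reference]))
```

The clipping step is what keeps a degenerate output consisting only of the word “the” from scoring well, and the brevity penalty offsets the incentive to leave out difficult passages in order to raise precision.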

Human translations produced by translation service providers (TSPs) are often certified under the ISO 17100 standard, covering specifications according to industry codes, legislation, or best-practice guides. As the International Organization for Standardization writes, “the use of raw output from machine translation plus post-editing is outside the scope of ISO17100:2015”.190 The standard was reviewed and confirmed in 2020 and does consider translation within CAT valid, but not the post-editing of MT output, so it will be interesting to see how this might develop in the future.

3 Icelandic Language Technology

3.1 Language Technology for Small Languages

Translation “[…] has been instrumental in the formation of writing and literary culture in every European language […]”191 and continues to gain influence as the world becomes increasingly globalized. Although some see English as a lingua franca, UNESCO and the European Union, among others, place a focus on multilingualism. Weissbort and Eysteinsson argue that many types of English coexist today, including scholarly, scientific, commercial, and political, which are similar

187 Ibid., 59. 188 Ibid., 60. 189 Ibid., 56. 190 International Organization for Standardization (ISO), "Iso17100:2015(En) Translation Services - Requirements for Translation Services," in Scope (2020 (2015)). 191 Ástráður Eysteinsson Daniel Weissbort, Translation - Theory and Practice (Oxford: Oxford University Press, 2006), 1.

enough to be still intelligible without translation across them.192 The Bible is the most translated text in Western culture and history, and with the story of the Tower of Babel it serves as a leitmotif for the history of translation studies. The narrative of the Tower of Babel in the book of Genesis tells a tale of humans united under one language who attempted to build a tower to the heavens. The deity thus decided to divide humanity by creating multiple languages. Following this tradition, translation therefore has a sacrilegious desire to re-unify humanity and human cultures.193

Before discussing “small” languages, one should discuss small states. Small states are generally defined as nations that are “[…] small in landmass, population, economy, and military capacity.”194 Yet, as Brady and Thorhallsson mention, territorial size as a measure of relative power might be outdated in times of hybrid warfare, with importance shifting instead to a state's space or maritime boundaries as well as national resilience, cyber defense, unity, and digital diplomacy capacity. Henderson describes small states based on six factors, including, among others, low participation in international affairs due to a lack of resources, an economic focus on foreign affairs, and a moralistic approach without the resources necessary to back up that moral emphasis.195 Iceland is most certainly considered a small state in some regards; some might even say a “micro-state”.196

The distinction between a minority language and a majority language will always be quantitative rather than qualitative – quantitative not in terms of the number of speakers but rather the amount of economic viability. Most translations are motivated by commercial considerations. 197 Since the very beginning of MT, commercially and economically more “important” languages such as English paired with German, Spanish, Italian, French, Japanese, Korean, Russian, and Chinese were increasingly released and researched in MT programs. Other European languages,

192 Ibid., 5. 193 Ibid., 8. 194 Baldur Thorhallsson Anne-Marie Brady, "Small States and the Turning Point in Global Politics," in Small States and the New Security Environment, ed. Baldur Thorhallson Anne-Marie Brady, The World of Small States (Cham: Springer, 2021), 2. 195 Ibid. 196 Lee Miles, "Foreword," in Iceland and European Integration. On the Edge, ed. Baldur Thorhallsson, Europe and the Nation State (London, New York: Routledge, 2004), xiii. 197 Somers, "Translation Technologies and Minority Languages," 87.

such as Czech, Polish, Bulgarian, Romanian, Latvian, Lithuanian, Estonian, and Finnish are rarely to be found.198 The language pair Arabic-English was rarely found until the latter half of the 1990s, but this has changed, largely due to the global political situation, and it is now a large field of research.199 Many of the world's most spoken languages have been neglected, such as Asian languages (Malay, Indonesian, Thai, Vietnamese), major languages of India (Hindi, Urdu, Bengali, Punjabi, Tamil), and many African languages. As Hutchins reasons, this neglect stems not only from low commercial value but also from a lack of language resources200 such as corpora needed for statistical MT or rule-based lexica and grammars.201 Hutchins argues that categorizations of minority languages have a geographical dimension as well. In Spain, Basque and Catalan are minor; in the European Union, languages such as Welsh, Irish, Estonian, and Lithuanian qualify for that label. Recently more advances have been made in minority languages, with MT systems for Basque, Catalan, Galician, Czech, Estonian, Bulgarian, and Latvian, and in South and South East Asia for Bengali, Tamil, Thai, and Vietnamese. As mentioned earlier, one of the main problems of minority (and immigrant) languages is the lack of language resources such as corpora of translations, word-processing software (some languages might not have scripts), dictionaries, and spellcheckers (some languages do not have a standard spelling convention), and sometimes even a lack of experienced and qualified researchers.202 Another issue is that the automatic production of content lists must be language sensitive. Alphabetization differs between languages – for example, Icelandic and Danish list some letters at the end of the alphabet (Ö/Ø), while other alphabets such as German or French do not treat accented letters as separate letters.
203 Between 1947 and 1957, during the early years of MT, six countries researched and developed MT (the USA, Russia, United Kingdom, Japan, China, and former Czechoslovakia). By 2007 however, 30 out of 193 countries worldwide were

198 Hutchins, "Multiple Uses of Machine Translation and Computerised Translation Tools." 199 Joseph Olive. 200 Only about 11 of all the world's languages can be considered resource-rich languages: Emily M. Bender, "The #Benderrule: On Naming the Languages We Study and Why It Matters," The Gradient, https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/. 201 Hutchins, "Multiple Uses of Machine Translation and Computerised Translation Tools," 18. 202 Ibid. 203 Somers, "Translation Technologies and Minority Languages," 94.

found to be involved to some extent in MT. According to Chan Sin-wai, 16% of all countries worldwide have been engaged in MT, with 30% of them active in research and development.204 Chan Sin-wai's list of 31 countries from 2013 does not include Iceland, however. As researchers of language technology in Iceland mention, Icelandic is seldom supported in language technology applications and software due to its negligible number of native speakers.205 Another characteristic of minority languages is subtitling. Subtitles and captions are more important for minority languages than for majority languages, as minority languages have less “original” material and less dubbed material, and thus rely more on written screen translations.

UNESCO considers language an essential component of human nature and, following that argumentation, a quintessential element of culture. Thus, linguistic diversity serves as a major guarantee for cultural diversity. UNESCO considers multilingualism a prerequisite for an inclusive knowledge society as well as for an “[…] open, plural and sustainable development that UNESCO aims to promote.”206 As Rögnvaldsson and others discuss, out of the 6000 existing languages207 on the planet, over 2000 will die out over the next decades. Some others will survive in daily conversations, but will not be usable in science, technology, or business. The survival of languages depends not only on the number of speakers and published media, but also on the role of the language within technological developments and the digital age.208 In 2012 an evaluation of European languages concluded, according to four main evaluation categories (MT, speech analysis, text analysis, basic language technology), that Icelandic was in the lowest of five

204 Sin-wai, 26. 205 Jón Guðnason Anna Björk Nikulásdóttir, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson, "Language Technology Programme for Icelandic 2019-2023," arXiv:2003.09244 (2020). 206 UNESCO Executive Board, "Report by the Director-General on the Execution of the Programme Adopted by the General Conference. Intersectoral Mid-Term Strategy on Languages and Multilingualism," (2007). 1. 207 Joshi et al. speak of 7000 languages. Sebastin Santy Pratik Joshi, Amar Budhiraja, Kalika Bali, Monojit Choudhury, "The State and Fate of Linguistic Diversity and Inclusion in the NLP World" (paper presented at the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020). UNESCO also counts 7000 languages: "Upcoming Decade of Indigenous Languages (2022 – 2032) to Focus on Indigenous Language Users’ Human Rights," https://en.unesco.org/news/upcoming-decade-indigenous-languages-2022-2032-focus-indigenous-language-users-human-rights. 208 Eiríkur Rögnvaldsson, 1.

categories, the same category as other languages with fewer speakers such as Irish, Lithuanian, Latvian and Maltese.209 Even though a language may be spoken by few, the development of language technology and resources costs the same for every language, regardless of the number of speakers.210 Rögnvaldsson and others argue that with the advent of the Gutenberg press some regional languages that were seldom printed went extinct, such as Cornish (now revived) and Dalmatian. They ask whether the digital age will have a similar impact on languages.211 Similarly, a team from Microsoft Research mentions that languages with little or no support in LT amount to over 90% of the languages used worldwide and are used by over a billion people, which raises problems of inclusion.212 As they mention, according to Bender, most NLP models are not language agnostic, even though they claim to be.213 Icelandic is a member of the Indo-European language family, which accounts for 85% of the papers that Bender assembled to gauge the extent of research on various language families. Icelandic furthermore belongs to the Germanic genus, which accounts for roughly 72% of the papers she compared, and so is still in a favorable position compared to the Afro-Asiatic, Sino-Tibetan or Altaic language families.214 Joshi and others mention that Icelandic, among other languages such as Wolof and Kilivila, has a rare feature that exists in only 38 languages in the world.215 This feature is often ignored due to its rarity, and these languages can be adversely affected if no sufficient typological representation is available.216 They mention that there is hope for low-resource languages if there are focused communities working on them.217 As Bender, Gebru and McMillan-Major point out, “most language technology is built to serve the needs of those who already have the most privilege in

209 Ibid., 2. 210 Rögnvaldsson. 211 Eiríkur Rögnvaldsson, 4-5. 212 Pratik Joshi. 213 Emily M. Bender, "On Achieving and Evaluating Language-Independence in NLP," LiLT 6, no. 3 (2011). 214 Bender studied papers submitted for the 2009 EACL conference in ibid. 215 Pratik Joshi. The SNegVO/SVNegO characteristic is displayed as well in the other Scandinavian languages Danish, Norwegian and Swedish and the Grassfield Bantu language Aghem: Matthew S. Dryer, "Position of Negative Morpheme with Respect to Subject, Object, and Verb," in The World Atlas of Language Structures Online, ed. Matthew S. Dryer and Martin Haspelmath (Leipzig: Max Planck Institute for Evolutionary Anthropology, 2013). 216 Pratik Joshi, 6286. 217 Ibid., 6290.

society.”218 There are, however, some examples in which language technology is designed to benefit marginalized communities.219 Pine and Turin point out that linguistic endangerment is by no means an inevitable by-product of modernization, and mention in this respect the strategic power of the language of colonial authorities.220

3.2 Icelandic Language and the Icelandic Independence Movement

How important is a language for a small community? In Iceland it is inseparable from the country's fight for independence, which is in turn entangled with its rich literary history. Taking a step back, we join Benedict Anderson in asking what a nation is. It is an imagined political community: even within the smallest nation, people will never be able to meet all of their fellow countrymen, but will still feel some degree of unity.221 The imagined community moves through homogeneous, “empty” time,222 while its members can be sure that the nation will survive223 and give them a frame by which they share a continuous experience with both their ancestors and their fellow members of the community (whom they have evidently never met). This simultaneity is a temporal coincidence, measured by clock and calendar, like news articles in a newspaper that have something in common – an imagined community via actors who are unaware of one another but performed an action at the same clocked, calendrical time.224 The development of vernaculars into languages of power in medieval Europe is regarded by Anderson as a gradual, pragmatic, haphazard and unselfconscious process.225 This linguistic diversity alone has made the new imagined communities possible, along with a technology of communication

218 Emily M. Bender, 4. 219 Mark Turin Aidan Pine, "Language Revitalization," Oxford Research Encyclopedia of Linguistics (2018). 220 Ibid. Their research focuses however on languages that work on standardizing orthographies; an issue that has been long resolved in Iceland. 221 Benedict Anderson, Imagined Communities. Reflections on the Origin and Spread of Nationalism (London, New York: Verso, 2006 (1983)), 6. 222 Ibid., 26. 223 Ibid., 12. 224 Ibid., 25-26. 225 Ibid., 42.


(print) and a system of productive relations (capitalism).226 The printed languages in turn made an imagined community possible.227 Print not only standardized vernaculars and spelling, so that people from different regions could communicate and, at the same time, gain awareness of innumerable other people with the same language; it also made it easier to associate a certain language with a certain territory.228 A particular antiquity is central to the idea of a nation, and printed books in fixed languages (knowledge from earlier centuries) were now accessible and able to remain in a permanent form, temporally and spatially reproducible without modernizing changes.229 These imagined communities then “set the stage” for modern nations, with their boundaries mostly following dynastic expansionism.230 Monolingual dictionaries made languages portable, and bilingual dictionaries approached languages with an egalitarian view, bringing two languages to an equal status side by side.231

Reports and fears of the death of the Icelandic language are not new and resurface regularly. In 2006, a speaker at a conference on linguistic development predicted that Icelandic would not be spoken a century from now.232 More recently, the Icelandic Minister of Education, Science and Culture, Lilja Alfreðsdóttir, wrote a letter to the CEO of Disney to request that its streaming service Disney+ include Icelandic translations. She explained that Disney's movies have had “[…] a formative effect on generations of children, not least due to the excellent […] and high-quality Icelandic translations as subtitles. Our language is the core of the nation's culture and identity [….].”233 The fear of the extinction of the

226 Ibid., 43. 227 Marshall McLuhan, The Gutenberg Galaxy. The Making of Typographic Man (Toronto: University of Toronto Press, 1962 ), 199. 228 Anderson, 43-44. 229 Ibid., 44. 230 Ibid., 46. 231 Ibid., 71. 232 Hálfdanarson, "From Linguistic Patriotism to Cultural Nationalism: Language and Identity in Iceland," 55. 233 Lilja Alfreðsdóttir, "Letter to Disney," news release, 2021, https://scontent.frkv1- 1.fna.fbcdn.net/v/t1.0- 0/p417x417/145337113_3886638161401479_6361409550049456436_o.jpg?_nc_cat=109&ccb=3& _nc_sid=730e14&_nc_ohc=AOrLHQBi64UAX-OVFs7&_nc_ht=scontent.frkv1- 1.fna&tp=6&oh=87d82462becea36e329a06cd74eb8503&oe=605EFE4B. The discussion even made


Icelandic language is not new, given the dominance of English and the earlier dominance of Danish. But, as Guðmundur Hálfdanarson writes, “[t]here is not much factual evidence to support these fears, because Icelandic seems to be thriving, but they are an integral part of the existential angst of the age of globalization.”234 One of the great political issues of modern times is without a doubt the interplay between language and the construction and maintenance of national identities. Some say that Iceland was only able to obtain independence from Denmark because of its own language, which had become a tool in the struggle for self-determination and a defining marker of the nation.235 The interest in the “pure” Icelandic language dates back well into the 19th century and was subsequently enmeshed with the deciphering of Icelandic manuscripts and sagas, which were illegible to the Danes.236 Iceland was under Danish rule at the time; it was granted its own constitution in 1874 and gained independence by 1944,237 the reason for independence being mostly rooted in arguments surrounding Icelandic culture and language.238 Well into the 20th century Iceland was one of the poorest countries in Western Europe. However, this never dampened the enthusiasm of Icelandic intellectuals for arguing for Iceland's independence.239 The idea that the current Republic of Iceland was a restoration of the early Icelandic “nation” of the 10th century emerged during celebrations of Icelandic independence. This idea viewed the independence movement as a purely Icelandic idea, and not a European import.240 A counter to this argument has been that the King of Denmark proposed a regional parliament for Iceland in 1840, calling it Alþingi (Alþingi was the name of the original general assembly in medieval times, abolished in 1800 by the Danish monarchy).241 Today English has replaced

it into the New York Times: Egill Bjarnason, "Iceland Has a Request for Disney+: More Icelandic, Please," The New York Times 2021. 234 Hálfdanarson, "From Linguistic Patriotism to Cultural Nationalism: Language and Identity in Iceland," 62. 235 Ibid., 62-63. 236 Ibid., 58. 237 "Icelandic Modernity and the Role of Nationalism," in Nordic Paths to Modernity, ed. Björn Wittrock Jóhann Páll Árnason (New York, Oxford: Berghahn Books, 2012). 238 "From Linguistic Patriotism to Cultural Nationalism: Language and Identity in Iceland," 62. 239 "Icelandic Modernity and the Role of Nationalism," 252. 240 "Discussing Europe: Icelandic Nationalism and European Integration," in Iceland and European Integration. On the Edge, ed. Baldur Thorhallsson, Europe and the Nation State (London, New York: Routledge, 2004), 131. 241 "Icelandic Modernity and the Role of Nationalism," 255.


Danish as the preferred foreign language to learn, with French and German holding a distant third and fourth place.242 Returning to Anderson's imagined communities within time and space, Eysteinsson observes that Icelanders did not have newspapers or novels at the time of the awakening of their nationalism (as Anderson had suggested in his general theory). Instead, Icelanders found a rich source of inspiration, with even mythical dimensions, in their literary tradition and the language that unites them with the present. The language became a metaphorical island, and served alongside the national and literary identity as the uniting force for the inhabitants.243 This island is of course not the same as the literal one, as the language can never hold the local or “indigenous” meaning that is ascribed to it. Foreign influences can be found widely throughout the language, and the exclusion of translations from the literary canon was one of the main characteristics of this “patriotic literature”, itself a consequence of the creation of this imagined community.244 The Icelandic Language Technology Project Plan follows the logic of preserving the language independently, thus concentrating all the research domestically and keeping sovereignty over the language technology.

3.3 Language Technology for Icelandic

Research on language technology (LT) in Iceland has mostly followed a top-down approach in recent years, meaning that political influence and funding of the development of LT have been vital. LT is considered a young field in Iceland. Microsoft's operating system was translated into Icelandic in 1996 after input by the Icelandic Ministry of Education, Science and Culture highlighting the importance of strengthening Icelandic LT.245 The ministry appointed a committee in 1998 to assess LT; its report, published in 1999, concluded that LT was virtually non-existent even though Icelandic was used as the basis of the nation's business interactions and daily

242 Baldur Thorhallsson Gunnar Helgi Kristinsson, "The Euro-Sceptical Political Elite," in Iceland and European Integration. On the Edge, ed. Baldur Thorhallsson, Europe and the Nation State (London, New York: Routledge, 2004), 153. 243 Eysteinsson, 240. 244 Ibid., 240-41. 245 Martha Dís Brandt, "Developing an Icelandic to English Shallow Transfer Machine Translation System" (Reykjavík University, 2011), 4.

communications.246 The committee concluded that the language could be in danger of extinction if no use was made of LT to make Icelandic usable in all aspects of daily life.247 The report assessed that funding of 225-250 million ISK per year would be necessary. The first LT program was launched in 2000, with important technological results. It was the hope of Eiríkur Rögnvaldsson, professor emeritus of Icelandic Language and Linguistics, that the Icelandic example could inspire other small language communities with little LT by showing how the fruitful cooperation of authorities, academia, and industry can build LT resources from scratch.248 The first project was an Icelandic Frequency Dictionary, a balanced corpus of 600 000 tokens, used as a gold standard for tagging Icelandic text. The first LT program in 2000 resulted in several important LT resources, such as a morphological database for modern Icelandic inflections, a balanced morphosyntactically tagged corpus of 25 million words, a training model for data-driven POS taggers, a text-to-speech system, a speech recognizer and an improved spell-checker.249 In 2005 the Icelandic Centre for Language Technology (ICLT) was founded. ICLT is a collaboration of the Department of Lexicography at the Árni Magnússon Institute for Icelandic Studies, the Institute of Linguistics at the University of Iceland, and the School of Computer Science at Reykjavík University.250 From 2008 a Basic Language Resource Kit (BLARK) was developed, including a POS tagger, a lemmatizer, a shallow parser, and a context-sensitive spell checker.251

Currently the language technology plan Máltækniáætlun, or Language Technology Project Plan (LTPP), is in action. The plan was commissioned in 2016 by Illugi Gunnarsson, at the time the Minister of Culture and Education. The concept was to create a 5-year budget and action plan. The reasoning behind the project plan was explained as follows:

Vaxandi áhrif tölvutækni á daglegt líf munu á næstu árum krefjast aðgerða af hálfu stjórnvalda til að tryggja að íslenskan verði gjaldgeng í samskiptum sem byggja á tölvu-

246 Rögnvaldsson. 247 Brandt, 4. 248 Rögnvaldsson. 249 Brandt, 5. 250 "Introduction," http://www.iclt.is/index_en.html. 251 Brandt, 6.


og fjarskiptatækni. Það er mat þeirra sem best til þekkja að íslenskunni stafi hætta af þessari þróun verði ekkert að gert. Jafnframt felst í því mikil tækifæri fyrir íslenskt samfélag ef hægt er að nota tungumálið til fulls í samskiptum við snjalltæki ýmiss konar.252

In coming years, the increasing effect of computers on our daily lives will demand action from the government to ensure that using Icelandic will be an option in all communications using computers and telecommunications technology. Knowledgeable opinion states that, should no action be taken, the Icelandic language is in grave danger. If the language can be used for communication on all types of smart devices, this will also present great opportunities for Icelandic society.253

The LTPP lists 5 core areas of research: speech analysis, speech synthesis, MT, spell checkers, and more generalized language resources. The authors of the LTPP mention that it is essential for smaller language communities to use the available tools, not only for the benefit of the language but also for the benefit of the language community itself. The authors mention that in cases where LT resources are not available in Icelandic, devices will still be used, but in other languages. That would lead to lost opportunities and to those without sufficient knowledge of another language being left behind. The cost of developing LT must be weighed against a lower quality of life on the one hand and the need to keep society, language, and businesses competitive on the other.254 The LTPP is made up of three pillars, of which the first one will be further described below. The first pillar (and backbone) is the government-funded development of infrastructure software and resources. The second pillar is a competitive fund managed by the Icelandic Centre for Research (Rannís) that allocates funds towards research and development. The third pillar is a joint Master's program of the University of Iceland and Reykjavík University. The total cost of all three pillars is estimated at 14 million Euros.255

The proposed MT unit within the LTPP is an open, automatic, bilingual, and bidirectional MT between Icelandic and English. The LTPP conceptualizes the MT unit as a helpful tool for translators, speeding up their output and thus saving money and time. The LTPP assesses that translating between Icelandic and other languages

252 Jón Guðnason Anna Björk Nikulásdóttir, Steinþór Steingrímsson, Máltækni Fyrir Íslensku 2018- 2022. Verkáætlun (Reykjavík: Mennta- og menningarmálaráðuneytið, 2017), 11. 253 Language Technology for Icelandic 2018-2022. Project Plan (Reykjavík: Mennta- og menningarmálaráðuneytið, 2017), 11. 254 Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 12. 255 Anna Björk Nikulásdóttir.

is not feasible via an intermediary language, thus the efforts should concentrate on one bidirectional language pair. They also mention that the MT system must be trained for the domains it is supposed to work with; they do not mention a controlled language, but they do mention the general genre and topic of the text. They agree that the text from the MT system need not be perfect, as it is meant to help translators work faster and to convey the gist of foreign texts.256 The necessary resources are shown in the following figure, taken from the LTPP:

FIGURE 6 "MÁLTÆKNIVISTKERFIÐ" / "THE LANGUAGE TECHNOLOGY ECOSYSTEM"257

Much of the groundwork has already been laid in Iceland, for example with grammar checkers and speech synthesizing used by the visually impaired from the 1990s onward. The quality and possibilities of usage of said tools are, however,

256 Anna Björk Nikulásdóttir, Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 18. 257 Ibid., 29.

limited according to the LTPP.258 The LTPP stresses that those tools should be accessible to the Icelandic-speaking community, as the economic influence can be significant.259 The LTPP mentions that research in speech recognition was performed in collaboration between Google and Reykjavík University. As a result, the speech recognition system is the property of Google, but the authors of the LTPP would much prefer the technology to be Icelandic-owned, as alterations to the software should be handled by local organizations with local knowledge.260

The LTPP mentions that the first MT systems were rule-based until predictive models took over the scene. They mention that predictive models have the disadvantage that it is very expensive to compile the bilingual corpora needed to train them, especially for smaller languages. They also mention that neural networks have shown promising results for languages with complex grammars, scoring better than earlier approaches.261 The LTPP does not mention a read-aloud option in the MT chapter, as is common in e.g. Google Translate. Icelandic is a challenge not only because of the small size of the language community; its rich morphology, with long compound words and many inflected forms, also poses challenges, for example for pronunciation resources.262 The Icelandic writing system is considered deep, meaning spelling does not necessarily correspond to pronunciation. In complex orthographic systems a complex alignment model must be applied to monitor the quality of the pronunciation resources necessary for both automatic speech recognition (ASR) and synthesis systems.263 As is generally accepted in AI, success in translation technology depends on access to data. It is thus important that Icelandic does not become an under-resourced language if the LT is to keep up. Some predict that MT will reach human translation quality levels as soon as 2029,264 so it is important that resources are developed sooner rather than later.

258 Ibid., 30. 259 Ibid., 31. 260 Language Technology for Icelandic 2018-2022. Project Plan, 42, 61-62. 261 Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 72-73. 262 Martin Jansche, "Computer-Aided Quality Assurance of an Icelandic Pronunciation Dictionary" (paper presented at the European Language Resources Association (ELRA), Reykjavík, Iceland, 2014), 1. 263 Ibid., 4. 264 Meer, 289.


3.4 Open and Closed Machine Translation Systems for Icelandic

The SMT system Moses has been actively developed since 2005 and was released under an open LGPL license. Moses is used in research as well as commercial systems, and Microsoft and Google have both incorporated it into their translation systems. Moses is trained on both bilingual and monolingual corpora. It collects paired words or phrases, aligns them in probabilistic tables, and infers a meaning based on the pairings. The bilingual corpora train the translation model, and the monolingual corpus is used for the language model, which assesses how likely the output is in the target language. Within the system several translation algorithms and language models can be chosen, and it often takes time to find the ideal setup.265 NMT systems translating to and from Icelandic surfaced after 2014, when the wave of research in NMT began. OpenNMT is a system from SYSTRAN and Harvard, published under an MIT license, and Nematus is a system released under the BSD license. Both systems were released in 2017, and the LTPP mentions that an assessment must be made of which system works better with Icelandic. As the LTPP mentions in another chapter, if cheaper solutions are available by utilizing other software environments rather than building from scratch, they should be taken full advantage of.266 Other NMT systems that support Icelandic are not discussed in the LTPP. Examples of these NMT systems include the Facebook NMT system (with the open source toolkits fairseq sequence modeling and Translate PyTorch library), the Bing Microsoft Translator, AWS Amazon Translate, and Yandex Translate.267 Google Translate is mentioned in the language technology plan as being “inaccurate”, because it does not differentiate between different texts and thus makes mistakes in phrases, concepts, and polysemous words.268 Google Translate is a closed system, which means only Google can develop it further.
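The division of labour in a statistical system, where bilingual data feeds a translation model and monolingual data feeds a language model, can be illustrated with a toy sketch. This is not Moses and does not use its interface; the phrase table, its probabilities, the tiny monolingual corpus, and all function names are invented purely for illustration, and the search is a brute-force enumeration rather than a real decoder.

```python
import itertools
import math
from collections import Counter

# Hypothetical phrase table: P(English phrase | Icelandic word).
# In a real system these probabilities are estimated from a bilingual corpus.
PHRASE_TABLE = {
    "ég": {"I": 0.9, "me": 0.1},
    "sé": {"see": 0.7, "am": 0.3},
    "hest": {"a horse": 0.8, "horse": 0.2},
}

# Hypothetical monolingual corpus used to train the language model.
MONOLINGUAL = ["I see a horse", "I see a house", "I am here", "see me"]


def train_bigram_lm(sentences):
    """Return a function giving the add-one smoothed bigram log-probability."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split()
        return sum(
            math.log((bigrams[(a, b)] + 1) / (contexts[a] + len(vocab)))
            for a, b in zip(tokens, tokens[1:])
        )

    return log_prob


def translate(source, lm_log_prob):
    """Pick the candidate maximising translation-model plus language-model score."""
    options = [PHRASE_TABLE[word].items() for word in source.split()]
    best, best_score = None, -math.inf
    for combo in itertools.product(*options):  # monotone, word-by-word
        candidate = " ".join(phrase for phrase, _ in combo)
        tm_score = sum(math.log(p) for _, p in combo)
        score = tm_score + lm_log_prob(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


lm = train_bigram_lm(MONOLINGUAL)
print(translate("ég sé hest", lm))
```

In this toy setup the language model helps resolve the ambiguous word sé: the candidate with "see" is preferred because "I see a" is well attested in the monolingual data, which mirrors, in miniature, why a larger monolingual corpus benefits such a system.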

265 Anna Björk Nikulásdóttir, Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 74. 266 Language Technology for Icelandic 2018-2022. Project Plan, 41. 267 These systems might have not been available at the time of printing of the LTPP in early June of 2017. However the LTPP mentions virtual assistants such as Apple Siri, Google Assistant, Microsoft Cortana, Samsung Bixby and Amazon Alexa. Ibid., 32. 268 Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 75.


There are still RBMT systems for Icelandic on the market today. These systems include Tungutorg, a closed system developed by Stefán Briem, and Apertium, an open system developed in 2009-2010 at Reykjavík University (currently not accessible online). Neither of these systems is currently in general use, and the LTPP mentions that their rule-based approach is unlikely to be able to compete with NMT.269 Apertium was at the time considered an important attempt at MT for Icelandic, primarily developed for translations from Icelandic to English and to a lesser extent from Icelandic to Swedish.270 It was praised for being easy to use and, as open source software, open to contributions from anybody.271 The Apertium system was developed using the existing IceNLP toolkit by Hrafn Loftsson, and had a higher position-independent error rate (PER) and word error rate (WER) than other MT systems available at the time.272 In addition to using a 5,000-entry bilingual dictionary, a target language corpus was created from Wikipedia entries in order to perform an automatic evaluation of the MT output.273
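The two error rates named above can be made concrete with a short sketch. It assumes one common formulation: WER as the word-level edit distance divided by the reference length, and PER as the same idea with word order ignored. The function names and example sentences are illustrative only and are not taken from the Apertium evaluation.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def per(reference, hypothesis):
    """Position-independent error rate: compares bags of words, ignoring order."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return 1 - (matches - max(0, len(hyp) - len(ref))) / len(ref)


reference = "I see a horse"
print(wer(reference, "a horse I see"), per(reference, "a horse I see"))
print(wer(reference, "I see a house"), per(reference, "I see a house"))
```

The first hypothesis contains all the right words in the wrong order, so WER penalizes it fully while PER does not; the second has a single wrong word, which both metrics count equally. Lower values are better for both.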

The NMT development should take 3 years to complete, and the system should be evaluated on how correctly the NMT carries the subject matter across between languages and on how readable the text is in the target language.274 The evaluations are made against the following criteria.

(1) Evaluate how many translators use MT to aid their work. The greater the number, the more useful the MT is to the translators.
(2) Translation Error Rate (TER): the number of changes made to the MT output before it is finished and transmitted (post-editing).
(3) The BLEU metric, used to evaluate whether changes made to an MT system have improved the system. However, the BLEU algorithm is not useful for evaluating the quality of the translations or comparing systems of different underlying architectures.

269 Ibid. 270 Ingibjörg Elsa Björnsdóttir, "Vélþýðingar Á Íslensku Og Apertium-Þýðingarkerfið," Orð og Tunga 18 (2016): 131. 271 Ibid., 143. 272 Brandt, 1. 273 Ibid., 2. 274 Anna Björk Nikulásdóttir, Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 85.

(4) Clock translators' output with and without MT. Evaluations based on subtitle translations have shown an increase in productivity of 35.5%.275
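Criteria (2) and (4) lend themselves to a simple calculation. The sketch below is a hedged illustration in Python: it approximates the post-editing effort of criterion (2) as the word-level edit distance between raw MT output and its post-edited version, which ignores the block shifts that full TER also counts, and it expresses criterion (4) as a percentage change in throughput. The function names and the figures in the usage example are hypothetical and are not taken from the LTPP.

```python
def edit_distance(source_words, target_words):
    """Minimum number of word insertions, deletions and substitutions."""
    prev = list(range(len(target_words) + 1))
    for i, source_word in enumerate(source_words, start=1):
        curr = [i]
        for j, target_word in enumerate(target_words, start=1):
            substitution = prev[j - 1] + (source_word != target_word)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def post_edit_rate(mt_output: str, post_edited: str) -> float:
    """Edits per word of the post-edited text (simplified: no shift operation)."""
    mt_words, final_words = mt_output.split(), post_edited.split()
    if not final_words:
        raise ValueError("post-edited text must contain at least one word")
    return edit_distance(mt_words, final_words) / len(final_words)


def productivity_gain_percent(words_per_hour_without_mt: float,
                              words_per_hour_with_mt: float) -> float:
    """Relative change in translator throughput, in percent."""
    if words_per_hour_without_mt <= 0:
        raise ValueError("baseline throughput must be positive")
    difference = words_per_hour_with_mt - words_per_hour_without_mt
    return difference * 100 / words_per_hour_without_mt


# Hypothetical throughput figures, chosen only to illustrate the arithmetic.
print(productivity_gain_percent(400, 542))
print(post_edit_rate("a b c d", "a b x d"))
```

A lower post-edit rate and a higher productivity gain would both indicate that the MT output is more useful to the translator, although neither figure says anything about the quality of the final text.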

It has been proposed that 1.5 years should be spent evaluating whether NMT or SMT should be pursued, and the one with the better results will ultimately be chosen.276 The LTPP also mentions that it would be beneficial to select a specialized field for the translation, because the NMT is more likely to choose a correct translation in that setting.277 The necessary LT for MT is the ParIce corpus, the first parallel corpus built specifically for MT research, with 39 million Icelandic words in 3.5 million segment pairs. The largest contributions come from the Opus corpus of film and TV subtitles and from the European Medicines Agency document portal. The corpus is to be expanded during the program, as Icelandic with its rich morphology demands larger training data.278 In order to develop an open MT system, some necessary precursory steps must be taken. They consist of assembling a bilingual corpus of 25-30 million pairs of sentences or parts of sentences. The language technology plan suggests using Wikipedia, OpenSubtitles, and CommonCrawl for aligning an Icelandic-English corpus. Furthermore, a corpus of official EEA translations from the Translation Centre at the Ministry of Foreign Affairs should be added, and the process should be automated so that the corpus is easier to expand in the future.279 CommonCrawl is a non-profit organization, and although it yields noisy data it can be used for better language models for languages with small Wikipedias.280 Bilingual dictionaries would be helpful for controlled areas, but no publication from governmental agencies is available in that area. The Translation Centre at the Ministry of Foreign Affairs has a terminology database, but it is not accessible in a way that could be used for MT. The only available bilingual dictionary is the terminology bank from the Árni Magnússon Institute for Icelandic Studies.281 The

275 Ibid., 79-80. 276 Ibid., 83. 277 Ibid., 85. 278 Anna Björk Nikulásdóttir, 7. 279 Anna Björk Nikulásdóttir, Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 80-81. 280 Piotr Bojanowski Edouard Grave, Prakhar Gupta, Armand Joulin, Tomas Mikolov, "Learning Word Vectors for 157 Languages," in International Conference on Language Resources and Evaluation (LREC) (2018), 3487. 281 Anna Björk Nikulásdóttir, Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 93.

monolingual corpus of Icelandic texts set to be produced as part of the program is also mentioned, which can be used to train the fluency of the target language output text.282 Another development towards the usability of the NMT is back-translation, where monolingual Icelandic texts are translated by one of the baseline systems, thereby obtaining training data for the direction en->is. The baseline systems that are to be developed will be evaluated according to performance, and the best one will be selected in the end.283 Two NMT systems are to be developed: an attention-based NMT system using Tensor2Tensor, and another using OpenNMT with a bidirectional LSTM model. The third system is a statistical phrase-based MT system based on Moses. Pre- and postprocessing tasks include a project on handling named entities.284
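The back-translation step described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions: `translate_is_to_en` stands in for whichever baseline is->en system is used, and the toy lookup translator and example sentences are hypothetical placeholders, not part of the LTPP systems.

```python
from typing import Callable, Iterable, List, Tuple


def back_translate(
    icelandic_sentences: Iterable[str],
    translate_is_to_en: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (English source, Icelandic target) pairs for en->is training.

    The Icelandic side is authentic text; the English side is machine output.
    """
    pairs = []
    for sentence in icelandic_sentences:
        target = sentence.strip()
        if not target:
            continue  # skip empty lines in the monolingual corpus
        synthetic_source = translate_is_to_en(target).strip()
        if synthetic_source:
            pairs.append((synthetic_source, target))
    return pairs


def toy_is_to_en(sentence: str) -> str:
    """Hypothetical stand-in for a baseline is->en system (lookup table only)."""
    lookup = {"Hún les bók.": "She reads a book."}
    return lookup.get(sentence, "")


# The blank line and the sentence the toy system cannot translate are dropped.
synthetic_pairs = back_translate(
    ["Hún les bók.", "  ", "Óþekkt setning."], toy_is_to_en
)
print(synthetic_pairs)
```

The design point is that the target side of every pair is fluent, human-written Icelandic, so errors in the synthetic English source affect what the en->is system reads rather than what it learns to produce.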

The LTPP finally proposes an application programming interface in which the MT system that the project plan proposes can be compared to closed systems like Google Translate. According to the graph provided, the MT system is ultimately to be a CAT system for translators, and although it is not specifically described as a CAT system within the text, it is mentioned several times that it is intended to help translators work faster than if they were to translate from scratch. Human translators can translate 20-30% faster with the aid of machine translation.285 The system now online at Miðeind is developed in connection with the LTPP and Almannarómur and is called vélþýðing. The description of the system does not mention that it is a CAT system, but rather that users can help improve the translations via crowdsourcing in the future, and that it is meant to compete with other free online NMT systems. Miðeind notes that Google Translate and Microsoft Translator handle Icelandic reasonably well, but wants to do better.286 The LTPP is very clear that the MT system is to be a CAT system (see Figure 6) and that it is to be used in more specialized areas than Google Translate,287 while the developing firm leans towards high-quality, fully automated MT output. The use of crowdsourcing and the call for input from non-professional translators at vélþýðing is somewhat contradictory to the LTPP

282 Ibid., 112. 283 Anna Björk Nikulásdóttir, 7. 284 Ibid. 285 Anna Björk Nikulásdóttir, Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 72. 286 Miðeind ehf, "Vélþýðing," https://xn--mieind-qwa.is/velthyding.html. 287 Anna Björk Nikulásdóttir, 1.

statement. It also remains to be seen how much the LTPP will be able to profit from observing translators' work to obtain data regarding the usability of the system. Literary translators often consider their work an art form and tend to remain skeptical about post-editing MT output.288 The attitudes of professional translators in Iceland working in commercial environments towards NMT are a topic for further research. Another MT system currently being developed comes from the Principle Project: a domain-specific system specialized in sub-language MT tasks in Digital Service Infrastructures, eJustice, and eProcurement, built by producing high-quality, curated language resources for under-resourced European languages.289

Almannarómur (vox populi), a self-owned organization founded in 2014, is responsible for the implementation of the language technology plan along with the research and development team SÍM.290 Other countries have similar implementation plans, such as the STEVIN (Essential Speech and Language Technology Resources) plan for Dutch/Flemish, a Spanish language technology plan, and two plans for Estonian. Almannarómur acts as the project manager and oversees the Consortium for LT in Iceland SÍM, consisting of 9 members (The Árni Magnússon Institute for Icelandic Studies, Reykjavík University, University of Iceland, the national broadcaster RÚV, Creditinfo, The Association of the Visually Impaired, Grammatek, Miðeind, and Tiro).291

The Translation Center at the Ministry of Foreign Affairs (Þýðingamiðstöð utanríkisráðuneytis) provides a terminology database with over 82 000 entries assembled during the translation of the European Economic Area (EEA) Agreements after 1990. 292 The other major terminology database is maintained by the Árni Magnússon Institute for Icelandic Studies, a research institute at the University of Iceland, independently funded by the Ministry of Education, Science, and Culture. Some of the terminology collections are bilingual (Icelandic-English), while others

288 King, 156. 289 Principle Project, "Principle Leaflet/Infographic," (2021). 290 Almannarómur, "Hvað Er Máltækni Og Hvaða Máli Skiptir Hún Fyrir Íslensku?," https://almannaromur.is/. 291 Anna Björk Nikulásdóttir. 292 Sigrún Þorgeirsdóttir, "Hugtakasafn Þýðingamiðstöðvar Utanríkisráðuneytisins," https://hugtakasafn.utn.stjr.is/index.en.adp.

are multilingual.293 Private-sector practitioners such as Skopos Þýðingastofa, túlkamiðstöðin, Alþjóðasetur, Skjal Þýðingar, Þýðingastofa JC, Efnavernd Þýðingar, Markmál Þýðingastofa, Tvístirni Þýðingastofa, and Þýðingastofa Lingua will likely use one or more CAT systems and terminology databases; further research in this area would be of interest and could also reveal the number of professional translators in Iceland. The tools developed under the LTPP are free to individuals and organizations, as per the Common Language Resources and Technology Infrastructure (CLARIN) project of the European Union, in which Iceland held observer status from 2018, eventually becoming a full member by 2020.294 Translators in Iceland are members of the Association of Translators and Interpreters; to become members, practicing translators must already belong to another association (e.g., of professional translators, television translators, or certified court interpreters and translators, etc.).295 Translation contract templates and rates are available to freelance translators via the Writers' Union of Iceland.296 Not all countries, even similarly geographically isolated islands, take the same approach as Iceland. In New Zealand, for example, “[g]eographical isolation has encouraged an outward-looking approach to doing business”,297 without government subsidies and with practitioners looking overseas instead. In Iceland, geographical isolation has rather led to an inward-looking approach, with an attempt to re-create all necessary LT tools domestically with the aid of public funding. Regarding inclusiveness, the assembly of a corpus of non-standard written Icelandic, including grammatical mistakes and improper lexical usage, began in March 2020; the corpus is to be published under a CC-BY license.298 A further call for data has

293 Stofnun Árna Magnússonar í íslenskum fræðum, "Íðorðabankinn," https://idordabanki.arnastofnun.is/. 294 Samúel Þórisson Eiríkur Rögnvaldsson, "Um Clarin-Is," https://clarin.is/um-clarin/. 295 Bandalag Þýðenda og túlka, "Lög Félagsins," (2004). The last entry on the webpage dates back three years, though. 296 Rithöfundasamband Íslands. Bandalag skrifandi stétta, "Þýðingasamningur Við Útgefendur - Taxtar," https://rsi.is/samningar-og-taxtar/taxtar/thydingasamningur-vid-utgefendur-taxtar/. 297 King, 150. 298 Lilja Björk Stefánsdóttir, "Við Söfnum Leiðréttingum Á Íslensku Ritmáli," ed. Máltækni við Háskóla Íslands (2020).

been extended to the community of people with Icelandic as a second language on online forums.299

As Jiménez-Crespo points out, non-professional translation (NPT) has emerged in recent years due to the accessibility afforded by digital technologies and web-based translation technologies.300 In contexts such as Facebook Translate initiatives, fansubbing processes, the TED Open Translation initiative, and cloud subtitling platforms such as Amara, the translating is done by self-declared translators (participants who have not received specific education in translation), who are typically not remunerated.301 The two main approaches are solicited and unsolicited translations. Solicited translations are crowdsourced translations, meaning that the initiating party (e.g., companies, institutions, organizations) controls the technologies used and often carries out the translations as micro-tasks. Unsolicited translations are often found when self-organized communities (often called participatory cultures) produce their own translations, such as activist translations or fansubbing.302 Research is needed to determine the extent of NPT in Iceland, and further discussion is required in the future.

4 Translation Theories and Translations in Icelandic

Machine translation itself is an interdisciplinary field, uniting scientists working within computational and applied linguistics and artificial intelligence.303 However, MT is also largely concerned with translation, which is why it is surprising to find so little research on MT from the translation studies community throughout the years.304 More recently there seems to be a shift, as in the past year two large

299 Þórunn Arnardóttir, "Óskað Eftir Textum Frá Fólki Sem Hefur Íslensku Sem Annað Mál," news release, 2020. The goal is to assemble 100 000 expressions. Anna Björk Nikulásdóttir, Language Technology for Icelandic 2018-2022. Project Plan, 44. 300 Miguel A. Jiménez-Crespo, "Technology and Non-Professional Translation," in The Routledge Handbook of Translation and Technology, ed. Minako O´Hagan (London, New York: Routledge, 2020), 1. 301 Ibid., 239. 302 Ibid., 240-41. 303 Djiako, 23. 304 Jeremy Munday's introductory book on translation studies, which covers all the main discussions, can serve as an example. It contains only 7 entries for MT but 54 entries on various types of equivalence. Jeremy Munday, Introducing Translation Studies. Theories and Applications, 3rd ed. (London, New York: Routledge, 2012).

publications have concerned themselves with the various aspects of human translator and machine interactions.305 The emergence of NMT has prompted research from the translation studies communities as well as the diversification of job titles within the language service industry.306 Research has recently shifted to a translator studies perspective, with an emphasis on translation and interpreting workplace research.307 Translation scholars and scholars of Icelandic actively take part in the development of MT in Iceland.308

4.1 Concepts of Western Translation Theories

Western scholars are responsible for many theoretical concepts within translation theory, such as mass communication, discourse analysis, back-translation, and psycholinguistics, and translation theory is largely Eurocentric. American scholars, such as Eugene Nida, also known as the “father of translation theory”, have also contributed greatly to the field.309 Western translation theories derived from , primarily the study of Classical Greek and Latin, and are highly concerned with lexical fidelity.310 This association with a small section of canonical texts and the focus on written texts have resulted in translation theories' “[…] traditional preoccupation with identity and preservation, its pervasive metaphors of transport and transference and its assumption of discrete, bounded entities, whether linguistic, social, political, historical or disciplinary.”311 As Hermans argues, these theories provide a narrow basis on which to reflect and theorize on the complexities of a postmodern,

305 Minako O´Hagan, The Routledge Handbook of Translation and Technology, Routledge Handbooks in Translation and Interpreting Studies (London, New York: Routledge, 2020). And Maureen Ehrensberger-Dow Erik Angelone, Gary Massey, The Bloomsbury Companion to Language Industry Studies, Bloomsbury Companions (London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020). 306 O´Hagan, "Introduction: Translation and Technology: Disruptive Entanglement of Human and Machine." 307 Regina Rogl Hanna Risku, Jelena Milošević, "Researching Workplaces," in The Bloomsbury Companion to Language Industry Studies, ed. Maureen Ehrensberger-Dow Erik Angelone, Gary Massey, Bloomsbury Companions (London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020), 37. 308 Project. 309 Chan Sin-wai, "Caught in the Web of Translation. Reflections on the Compilation of Three Translation Encyclopedias," in The Human Factor in Machine Translation, ed. Chan Sin-wai, Routledge Studies in Translation Technology (London, New York: Routledge, 2018), 56. 310 Munday, Introducing Translation Studies. Theories and Applications, 9-10. 311 Theo Hermans, Translating Others, 2 vols., vol. 1 (London, New York: Routledge, 2014 (2006)), 9.

postcolonial, and globalizing world.312 Maria Tymoczko argues that non-Western translation theories might not share this narrow perspective, but rather assume a conceptual orientation that accepts that the translated text can show a substantial change of form.313 She also mentions that the theories are limited by the dominant ideological perspective of the time of their conception, for example Western imperialism or Western historical circumstance, such as “[…] the position of a national language and literature within a larger cultural hegemony.”314 She argues that theories in general are based on presuppositions that must be articulated and acknowledged and eventually reviewed and reconsidered. The translation theories discussed in this thesis are based on Greco-Roman textual traditions, a small subset of a European cultural context, and grow out of “[…] Christian values, nationalistic views about the relationship between language and cultural identity, and an upper-class emphasis on technical expertise and literacy.”315 Although translation studies now occupies an epistemological space independent of linguistics, the idea still prevails at times that a translation is in fact the original, just in a different code. In layman's terms, translating would thus mean taking the subject apart and reconstructing it with a different material and different means, while the building itself nevertheless remains the same.316

Throughout the 20th and 21st centuries, translation studies and MT research have generally followed separate paths. As Baker and Saldanha put it, following Hauenschild and Heizmann, “[…] theoretical engagement with machine translation in translation studies was limited for most of the twentieth century, with translation scholars' indifference largely reciprocated by their counterparts in machine translation.”317 A possible explanation for this is that translations sometimes became new “originals” that could no longer be altered, such as translations of the

312 Ibid. 313 Maria Tymoczko, "Reconceptualizing Translation Theory. Integrating Non-Western Thought About Translation," in Translating Others, ed. Theo Hermans (London, New York: Routledge, 2014 (2006)), 22. 314 Ibid., 14. 315 Ibid., 14-15. 316 Ovidi Carbonell Cortés, "Misquoted Others. Locating Newness and Authority in ," ibid., 47. 317 Mona Baker, 305.

Bible.318 Would translation theory and people in general be less bothered if the translations did not concern a “holy”, “canonical” or historical text,319 but everyday occurrences that they are used to receiving in translated form, e.g. daily news from abroad?320 Hauenschild and Heizmann published their book in 1997 in an attempt to bridge the gap between translation studies and MT, but as Harold Somers points out, they mostly rely on the analysis of only one MT system. Somers argues that their section on aspects of human translation of considerable interest to MT remains speculative as to how these theories might be applicable in MT.321 The second section discusses human translation oriented towards MT and holds greater relevance, according to Somers. Two researchers quoted in the book (Prahl and Petzold) draw a connection between translation problems (as described within translation studies) and translation mismatches (as described within MT research). Somers states that they mistakenly interpret these translation mismatches as cases where a source-language expression has many target-language equivalents.322 As he points out, MT literature distinguishes these types of problems much more precisely, into mismatches, divergences, and gaps.323 Hauenschild and Heizmann cite a paper that discusses the fact that novice translators often make assumptions that they deduce from a hypothetical context, whereas experienced translators will try to work with an ambiguity-preserving strategy. As MT is often criticized for not being able to use real-world knowledge, Somers is relieved to see that humans struggle with insufficient context, too.324 Heizmann mentions human translators using translation strategies (whether consciously or unconsciously) in moments of such insufficient background information during the translation process. She deduces that MT systems also use such decision processes, and in another paper

318 Eysteinsson, 36. 319 A case in point would be the chivalric sagas or riddarasögur according to Álfrún Gunnlaugsdóttir. Ibid., 45,48-49. 320 Ibid., 14-17. 321 Somers, "Christa Hauenschild and Susanne Heizmann (Eds), Machine Translation and Translation Theory," 262. 322 Ibid., 264. 323 Bonni Jean Dorr, Machine Translation: A View from the Lexicon (Cambridge, London: The MIT Press, 1993), xv. 324 Somers, "Christa Hauenschild and Susanne Heizmann (Eds), Machine Translation and Translation Theory," 264.

argues that human reduction processes can be modelled in an MT system.325 However, there is perhaps more interaction with MT than previously disclosed. As André Lefevere and Susan Bassnett wrote, the main question in the early years of translation studies seemed to be whether translation is possible at all. They point out that translation studies has moved on to asking why people are interested in proving or disproving the feasibility of something that has existed for over 4000 years.326 Lefevere and Bassnett mention that this “preposterous” question regarding the possibility of translation emerged in the post-War period, when machines promised translations valid for all places and all times. As they sarcastically write: “machines, and machines alone, were to be trusted to produce “good” translations, always and everywhere.”327 It is thus interesting to view the history of translation studies and the struggles with equivalence from this viewpoint. Modern translation studies refers to any theories that (claim to) relate to translations, according to Lefevere and Bassnett.328 Translation theories prior to the post-War period are not discussed in this thesis.

4.2 Meaning and Equivalence

The eternal struggle between literal and free translation carries over to MT.329 These two extreme types of translation are rarely found; in reality one usually encounters a mixture of both. The most extreme type of literal translation is “interlinear translation”, meaning that the source text syntax is copied into the target text syntax, a word-for-word translation. This type of translation is not very intelligible; however, such translations will often prove helpful for rhetorical and linguistic analysis. Interlinear translation can be compared to the first-generation direct MT systems. The most radical type of free translation is a transferring of meaning from a source

325 Susanne Heizman, "Human Strategies in Translation and Interpreting - What Mt Can Learn from Translators," in MT - Ten Years On, ed. Verbmobil (Cranfield: Universität Hildesheim, Institut für Angewandte Sprachwissenschaft, 1994), 1. Translations as decision processes are also described by Jiří Levý, "Translation as a Decision Process," in The Translation Studies Reader, ed. Lawrence Venuti (London, New York: Routledge, 2000). 326 Susan Bassnett André Lefevere, "Where Are We in Translation Studies?," in Constructing Cultures: Essays in Literary Translation Topics in Translation, ed. André Lefevere Susan Bassnett, Topics in Translation (Clevedon: Multilingual Matters, 1998), 1. 327 Ibid. 328 Ibid. 329 Alan K. Melby, 9.

language to the meaning of a target language. It is however far from obvious what “meaning” entails and how to measure it.330 Throughout history, translation theories have diverged in their stances for and against both types of translation. Walter Benjamin's influential 1923 essay, “The Task of the Translator”, for example, argues for a very literal approach.331 Advocates of free translation sometimes accuse literal translations of violating the target language, while advocates of literal translation argue that such translations are “faithful” to the source texts, implying that free translations are unfaithful. Lefevere and Bassnett also mention that the concept of a universally valid equivalence or translatability has been abandoned and that translators will “[…] decide on the specific degree of equivalence they can realistically aim for in a specific text […]”, and that this equivalence has little to do with the original concept as a foolproof way of finding the abstract equivalence.332 A potential definition of equivalence was proposed by Luhmann, who stated that A and B are functionally equivalent if both are able to solve problem X.333

As Andrew Chesterman points out, many memes run through translation history. One of them is the relation norm, stating that a translator should ensure that an appropriate relationship is established between source and target text.334 Traditionally this norm was defined as “faithfulness” or “fidelity” to the source text and the concept of truth in equivalence. Since equivalence is impossible according to Chesterman, translators are bound to be unfaithful, which is where the traduttore traditore (translator-traitor) trope originates.335 The idea that translations are a betrayal or inferior reproduction of an original is closely related to

330 Ibid. 331 Walter Benjamin, "Charles Baudelaire Tableaux Parisiens. Deutsche Übertragung Mit Einem Vorwort Über Die Aufgabe Des Übersetzers," in Walter Benjamin. Gesammelte Schriften Iv 1, ed. Tillman Rexroth (Frankfurt am Main: Suhrkamp, 1991 (1972)). English translations: Harry Zohn, "The Task of the Translator. An Introduction to the Translation of Baudelaire´S Tableaux Parisiens," in Walter Benjamin. Illuminations, ed. Hannah Arendt (New York: Schocken Books, 2007 (1968)). And E. M. Valk James Hynd, "The Task of the Translator " in Translation Theory and Practice. A Historical Reader, ed. Daniel Weissbort Ástraður Eysteinsson (Oxford: Routledge, 2006 (1968)). 332 André Lefevere, 1-2. 333 Hans J. Vermeer Katharina Reiss, Towards a General Theory of Translational Action. Skopos Theory Explained, trans. Christiane Nord (London, New York: Routledge, 2014 (1984)), 119. 334 Andrew Chesterman, Memes of Translation. The Spread of Ideas in Translation Theory, vol. 22, Benjamins Translation Library (Amsterdam, Philadelphia: John Benjamins Publishing Company, 1997), 178. Italics not in original. 335 Ibid.

the terms “translation” versus “original”, and connected to authority and power.336 Translations as an enrichment for the target-language culture were at one end of the spectrum between target-dominant and source-dominant theories. When texts are to be “[…] exploited for the benefit of the receiving culture”,337 fluency, readability and intelligibility become the main concern. These translations were then typically tagged as belles infidèles, beautiful but unfaithful translations.338 As Lori Chamberlain points out, these metaphors were often highly sexualized; she mentions that the term belles infidèles was coined in the 17th century but owes its longevity to the appearance of having found not only a phonetic similarity but also of having captured the cultural complicity by equating fidelity in marriage and in translation. So, translations should be like women, either beautiful or faithful. As Chamberlain points out, this double standard runs throughout the history of translation metaphors. Within the belles infidèles meme, the translation (the female) and its relationship to the source text (husband, father, author) are condemned publicly: as in traditional marriages, the wife/translation is publicly outed for crimes the husband/source text is incapable of committing.339 The word fidelity is no exception, according to Chamberlain, as it is adapted according to context, often in a gendered version and riddled with ambivalence and anxiety about paternity (authorship or authority) and maternity (expressed through the belles infidèles or the “adulation” of the mother tongue of the target culture).340 As Lefevere and Bassnett point out, the equivalence concept used to be the central concept of translation studies but has since disintegrated.341 Melby argues that the main reason for the struggle between free and faithful translation is the search for, or claim of, transcendental meaning, which suggests that language communicates a meaning or message independent of its “carrier” language; in other words, the claim that meaning exists independently of humans. Ferdinand de Saussure argued the same with his distinction between langue

336 Susan Bassnett, "When Is a Translation Not a Translation?," in Constructing Cultures: Essays on Literary Translation Topics, ed. André Lefevere Susan Bassnett, Topics in Translation (Clevedon: Multilingual Matters, 1998), 25. 337 Chesterman, 22, 25. 338 Ibid. 339 Lori Chamberlain, "Gender and the Metaphorics of Translation," in The Translation Studies Reader, ed. Lawrence Venuti (London, New York: Routledge, 2000), 315. 340 Ibid., 319. 341 André Lefevere, 1.

and parole; however, his approach is destabilized since his transcendent meaning has its basis in the divine. As Melby asks, “[i]f computers could produce translations indistinguishable from those of humans, would that ability be evidence for transcendental meaning?” If meaning is not transcendental, how is it then possible to communicate through languages?342

Scholars such as Neubert and Shreve have argued that there is indeed not just one correct translation.343 Little research has been undertaken to evaluate rhetorical perspectives, as Qian Duoxiu points out. Dynamic equivalence (according to Nida and Taber) was initially thought to cover the rhetorical effects of translations as well. Different perspectives were later added to the equivalence portfolio, such as communicative equivalence by Newmark in 1981, shifts by Catford in 1965, pragmatic equivalence by Baker in 1992, and directional equivalence by Pym in 2010.344 Register evaluation was popularized by Juliane House's work on the interpersonal function of language.345 Duoxiu proposes to assess translations from a rhetorical perspective as well, in order to determine whether the target text had similar success in evoking similar feelings or calling the reader to take similar actions as the source text. He proposes to evaluate such effects according to a methodology of corpus-based rhetorical assessment. To employ that method, the texts in the corpus are broken down into rhetorical features (words and phrases) and evaluated according to a set of pre-determined options on the overall effect of the phrase or word. Once the texts are tagged, they can be analyzed according to predominant factors, and thus the translation can correspondingly be evaluated and quantitatively measured.346 Hatim and Mason agree that a register analysis, an analysis of the communicative potential of the utterances, is fruitful.347 They state that the structure and texture of text are subject to higher-order contextual requirements, composing the communicative

342 Alan K. Melby, 11. 343 Gregory M. Shreve Albrecht Neubert, Translation as Text, ed. Gert Jäger Albrecht Neubert, Gregory M. Shreve, Translation Studies (Kent, London: The Kent State University Press, 1992). 344 Duoxiu, 84. 345 Munday, Evaluation in Translation, 19. 346 Duoxiu, 89. 347 Ian Mason Basil Hatim, The Translator as Communicator (London, New York: Routledge, 1997), 97.

potential through register-based, semiotic, and pragmatic features.348 Thus these tools can give a new dimension to quality assessments of equivalence in translation studies,349 and perhaps even to MT evaluation in the future.

Nida and Taber suggested in 1969 that translation should be split into three processes, called “analysis-transfer-synthesis”.350 This approach was new to translation studies at the time in the sense that it made translation an explicit process. As Melby points out, early MT followed that same structure; however, it is questionable how these steps relate to the human translation process.351 Human translation has been viewed as a black-box process, that is, an individualistic view of a mysterious inner process that occurs when translating. This idea persists in many translation theories today.352 The black-box metaphor is interesting with regard to machine learning, as in neural networks the “[…] results of deep learning are sub-symbolic and un-inspectable by humans”.353

Translations often display the component of communication, which relates to MT in a variety of ways as discussed in this section. As George Steiner writes, no two people speak the exact same language for a variety of reasons. These reasons include the individual's desire not only to represent a message or idea but also to retain the potential to conceal information or leave something unsaid.354 The MT scholar Alan K. Melby and C. Terry Warner go so far as to argue that the limits of MT are due to machines' lack of communicative intent and of acknowledgment of their counterparts.355 They furthermore posit that this is due to over-reliance on the objectivist generalizations that linguistics offers, in which meaning exists independently of people, words, and languages.356 Melby and Warner argue, following Levinas, that “[…]

348 Ibid. 349 Duoxiu, 97. 350 Charles R. Taber Eugene A. Nida, The Theory and Practice of Translation, Helps for Translators. Prepared under the Auspices of the United Bible Societies (Leiden: E. J. Brill, 1982 (1969)), 33. 351 Alan K. Melby, 10. 352 Tymoczko, 18. 353 Melby, 428. 354 George Steiner, After Babel (London, Oxford, New York: Oxford University Press, 1976 (1975)), 47-49. 355 Alan K. Melby, 119. 356 Ibid., 2.

language is grounded in constant interactions between humans who are responding to ethical obligations to each other, yielding or resisting the perceived needs of the other.”357 They state that not only the form but also the content of language depends on recognition of the other party in a communicative act.358 Translations within the context of communication have been theorized widely, for example by Jörn Albrecht. He views translation as a special case of communication, as a communication process in two steps. He does not view translation as a problem of recoding, but rather as an adequate rendering of semantic information. Equivalence does not mean that the texts must be identical, but rather that they are of equal value.359 Hans J. Vermeer disagrees with the view of translation as a two-phase process of communication. The text can never serve two masters at once; the translational action must recognize the relationship between source culture and target culture as well as reorganize the relationship between the verbalized elements and the situation, rather than merely transcoding the linguistic signs.360

The parallel corpus used to train the Icelandic NMT system is to an extent assembled from subtitles. Subtitles are an interesting field within translation studies. They display their own set of rules, for example regarding politeness or tone, which are very culturally dependent.361 Politeness here relates to face-threatening acts as described in Brown and Levinson's study, which speakers will normally try to avoid as much as possible. The severity of a face-threatening act also depends on the relative distance and social power between speaker and addressee.362 The dynamics of politeness will often require a degree of linguistic modification to be relayed trans-culturally.363 There are various discourse processes that must be studied in regard to subtitling, as the

357 Ibid., 167. 358 Ibid., 122-23. 359 Holger Siever, Übersetzungswissenschaft. Eine Einführung (Tübingen: Narr Francke Attempto Verlag GmbH + Co. KG, 2015), 63. 360 Katharina Reiss, 58. Self-translations are another interesting field here at the fringes of translation theory, when authors publish their translations of their own work but might in the process rethink their original concepts. Bassnett, 31. 361 Basil Hatim, 1,78. 362 Stephen C. Levinson Penelope Brown, Politeness. Some Universals in Language Usage, ed. John J. Gumperz, Studies in Interactional Sociolinguistics (Cambridge, New York, New Rochelle, Melbourne, Sydney: Cambridge University Press, 1987). 363 Basil Hatim, 82.

medium in which meaning is conveyed has specific physical constraints and the redundancy of speech is reduced. Translators and dubbers must also assess coherence strategies, as readers cannot back-track meaning.364 Translators will also have to construct the discourse for a specific target audience according to the effect it will have on them. 365 Hatim and Mason refer to the viewers as “auditors”, as the screenwriter intends the dialogue for receivers who are not directly addressed. 366 Under severe limitations of space, subtitlers

[…] make it their overriding priority to establish coherence for their receivers, i.e. the mass auditors, by ensuring easy readability and connectivity; their second priority would then be the addressee-design of the fictional characters on screen (particularly in terms of the inter-personal pragmatics involved). Specifically, there is systematic loss in subtitling of indicators of interlocutors accommodating to each other's “face-wants”.367

According to Hatim and Mason's analysis, it is difficult for target-language audiences to retrieve interpersonal meaning in its entirety from subtitles, which sometimes even give misleading impressions of the characters' directness in the observed discourses.368 It could prove interesting to further investigate the discourses present within the subtitle collection used to train the Icelandic NMT and SMT systems from the LTPP.

MT and human translation are related in their early approaches to the field, mainly building on linguistic (lexical) transfers. An evolution to a functional approach is observable within human translation,369 which will be covered in the following chapter. Machine translation has evolved from a lexical approach to a linguistic rule-based approach, and onward towards a pragmatic approach relying on data collection.370

364 Ibid., 79. 365 Ibid., 82. 366 Ibid., 83. 367 Ibid., 84. 368 Ibid., 96. 369 Djiako, 29. 370 Ibid., 30.


4.3 Functionalist Approaches

4.3.1 The Structuralist Approach

As mentioned earlier, translation theory and MT research existed mostly parallel to one another with few intersections. 371 Jeremy Munday mentions, however, that the relationship of translation studies to other disciplines is not fixed. A strong link to contrastive linguistics (where overlaps could be found with MT scholars372) in the 1960s shifted to a focus on cultural studies and, more recently, to computation and media.373 Recent research has focused on interdisciplinarity or multidisciplinarity, with translation studies in the role of a Phoenician trader (Murray) among longer-established disciplines such as comparative literature, modern language studies, cultural studies such as gender and post-colonial studies, linguistics, and philosophy.374 Since its beginnings in the 1970s, translation studies has struggled with the juxtaposition of professional translation and more abstract research activity. 375 The earliest translation research focused on translation as a language-learning methodology or translation as a part of comparative literature. As to the object of translation studies, Jakobson's structuralist approach suggests the following categories of translation:

Intralingual translations or rewording – interpreting verbal signs with other verbal signs within the same language; this often includes explanatory actions, for example re-writing a Wikipedia article into simplified English.

Interlingual translations – interpreting verbal signs with other verbal signs of another language; also called translation proper (and the object of study in this paper).

371 Mona Baker, 305. 372 Baker and Saldanha quote Bennett ibid. 373 Munday, Introducing Translation Studies. Theories and Applications, 25. 374 Ibid., 24. 375 Ibid., 11.


Intersemiotic translations or transmutations – interpreting verbal signs by means of a non-verbal sign system (for example translating a written text into music or movies).376

MT is mostly unconcerned with intersemiotic translation; it deals to a small extent with intralingual translation (spell-checkers and other word-processing tools) and mainly with interlingual translation. Recent developments in the field of localization, that is, the cultural and linguistic adaptation for a different part of the world, have challenged the rigidness of the definition of those approaches.377

4.3.2 Holmes's Map and Literary Polysystem Theory

Ástráður Eysteinsson asks whether translators in Iceland have played a main role in inventing the literary language, reinforcing the pillars of the ancient language that is still used today.378 He does not answer this question unambiguously, because his main intention is to view translations within the outlines of the literary polysystem, as established by Itamar Even-Zohar. The polysystem theory is based on the works of Russian formalists of the 1920s and Czech structuralists of the 1930s and 1940s and, although conceived to work within literature, can also be applied to other complexes within culture.379 Within formalism, literary works are not studied by themselves, but as part of the social, historical, literary, and cultural framework.380 A prerequisite for the polysystem theory is the inclusion of all works of literature, irrespective of norms of taste.381 The position of translated literature within the polysystem is not fixed, as the system displays a dynamic hierarchy that can change throughout history. Innovatory and conservative systems are in constant competition and flux. 382 Translation occupies a primary position in the polysystem when it is linked to major events of literary history, and shapes the center of the polysystem.383 This could occur

376 Ibid., 8. 377 Ibid., 9. 378 Eysteinsson, 225. 379 Itamar Even-Zohar, "Polysystem Studies," Poetics Today. International Journal for Theory and Analysis of Literature and Communication 11, no. 1 (1990): 2. Munday, Introducing Translation Studies. Theories and Applications, 165. 380 Introducing Translation Studies. Theories and Applications, 165. 381 Even-Zohar, 13. 382 Munday, Introducing Translation Studies. Theories and Applications, 166. 383 Even-Zohar, 46.

when a “young” literature emerges and looks abroad for applicable models. This could also happen when a literature is peripheral or “weak”, and looks to import the literary types it lacks, usually when a smaller language or nation is dominated by a larger culture. A third possibility of translations assuming a primary position might be when a critical turning point presents itself in history and creates a vacuum in the literature of the country, whereby old models no longer suffice (often for younger generations). 384 If, however, translation assumes a secondary position within the polysystem, it moves to a peripheral system that has no major influence on the central system, and even becomes a conservative element, conforming to the literary norms and preserving conventional forms. This secondary position is the standard mode for translated literatures, according to Even-Zohar.385 The status within the polysystem will dictate the translation strategy: when the translated works occupy a primary position, the translations will be oriented more closely towards the source text, resulting in revolutionary or foreign texts for the home culture. These trends from translations may even eventually become the new norm in the target culture, as the polysystem will allow for innovations at that point. In cases where translations occupy a secondary position within the polysystem, translators adhere closely to target culture conventions, producing non-adequate translations according to Even-Zohar.386 Susan Bassnett agrees with the importance of the systematic approach for a radical rethinking of how canons are composed and literary history is written, but criticizes Even-Zohar's crude, evaluative terminology.387

Looking at the polysystem in Iceland, it is important to note that typically a very clear line has been drawn between translations and original works. During the Icelandic Age of Enlightenment, the line was not so clear, or at least not so judgmental.388 The center of Icelandic literature was focused on translations from roughly 1740-1860, with translations of so-called classic literature.389 A turning point in the polysystem

384 Ibid., 47-48. 385 Munday, Introducing Translation Studies. Theories and Applications, 167. 386 Even-Zohar, 50-51. 387 Susan Bassnett, "The Translation Turn in Cultural Studies," in Constructing Cultures. Essays in Literary Translation, ed. André Lefevere Susan Bassnett, Topics in Translation (Clevedon: Multilingual Matters, 1998), 127-28. 388 Eysteinsson, 229. 389 Ibid.

was reached in 1835, when the journal Fjölnir was first published. Translations were still printed in the first edition of Fjölnir, but they were not met with approval and were considered “futile for most Icelanders”.390 From this point on there has been a visible goal of placing the medieval manuscripts in the primary position within the literary polysystem, a practice that continues to this day.391 Thus, translations began to occupy a secondary position within the polysystem. As Eysteinsson points out, the romanticization of the medieval sagas as well as the emphasis on the purity of the language borders on intralingual translation in order for the narrative to work.392 The history of the nation is construed in such a way that there seems to be an incessant bridge to a messianic past.393 The Holy Trinity of country, nation and language was thus “invented” during the Romantic Period of Icelandic literature. 394 Within Western theories of literature systems, translations have long been shunned because originals occupy the primary position. Once the canonic literatures of nations came into play, translations were largely ignored.395 The ambivalence towards MT systems (are they for the general public or for specialized translators?) and the fear of the language becoming obsolete as outlined in the LTPP are deeply rooted within the discourses described here. MT in Iceland thus enters this polysystem with its own challenges and the predetermined secondary position of translations. As Even-Zohar argues, the position within the polysystem determines translation approaches. As translations occupy a peripheral position in Iceland, domestication strategies should be preferred; thus the foreignization effect sometimes experienced in MT output is likely to be rejected.396

Another observation from translation studies regarding foreignization and domestication concepts concerns the visibility of the translator (or in this case the MT system). Eugene Nida first posed the question in his influential 1947 essay on principles of Bible translation, where he argues for a cultural adaptation with the

390 Ibid., 230. 391 Ibid., 231. 392 Ibid., 232. 393 Ibid., 233. 394 Ibid., 234. 395 Ibid., 65. 396 Constantine, 474.

translator attempting to be invisible (meaning that the text appears to have been written in the target language to begin with). On the other hand, Lawrence Venuti argued in 1986 that invisibility may not necessarily be desirable.397 Venuti calls for translators to inscribe themselves into the text, instead of trying to be invisible.398 He states that the “illusion of transparency” is due to the translator's effort to have the target text free of linguistic and stylistic peculiarities (he calls it “domesticated” 399 ), so that it appears to the reader that the text is not a translation, but the “original”. 400 Another pertinent question in this context is whether translations are to be defined by an identity-type relationship, or whether it might be fruitful to define them instead by a similarity relationship, which can entail difference.401

In 1972, the US scholar James S. Holmes published a paper seminal to the emergence of the academic discipline of translation studies in the English-speaking world (the paper was not widely available until 1988).402 “The Name and Nature of Translation Studies” draws up a map of an overall framework of what translation studies covers,403 later presented by Gideon Toury as Descriptive Translation Studies (DTS).404 Holmes also argues for translation studies as the name of the discipline, which was ultimately established. 405 He presented a classification within the field that Toury depicts with an iconic map.

397 Alan K. Melby, 9. 398 Bassnett, "When Is a Translation Not a Translation?," 25. 399 Lawrence Venuti, The Translator's Invisibility. A History of Translation, ed. André Lefevere Susan Bassnett, Translation Studies (London, New York: Routledge, 1995), 34. 400 Ibid., 1. 401 Tymoczko, 23. 402 Munday, Introducing Translation Studies. Theories and Applications, 10. 403 James S. Holmes, "The Name and Nature of Translation Studies," in The Translation Studies Reader, ed. Lawrence Venuti (London, New York: Routledge, 2000). 404 Gideon Toury, Descriptive Translation Studies - and Beyond, Benjamins Translation Library (Amsterdam, Philadelphia: John Benjamins Publishing Company, 1995). 405 Holmes, 173-74.


FIGURE 7 HOLMES'S MAP406

The partial theories within the framework are not mutually exclusive, but often occur simultaneously within one text.407 Toury also affirms that, even though the separation serves a methodical purpose, the aim on an institutional level should be to study the interdependencies as well, rather than only the local problems. 408 As Barkhordar suggests, Holmes's Map can be used to examine the different branches of MT research.409 It will be used here to categorize Icelandic language technology, as mentioned in the 2018-2022 project plan for Icelandic language technology.410

Within the pure branch of translation studies and the descriptive sub-branch, the product-oriented theories are used within MT to evaluate the output. Diachronic studies are applied when MT output is evaluated based on the different stages of MT history, while synchronic studies are used when texts are evaluated based on linguistic markers (for example lexicality, ambiguity, and others). The LTPP proposes that it should be possible to compare the Icelandic NMT output to other NMT output from Google Translate or others; 411 however, it is not mentioned explicitly whether the authors expect a synchronic or diachronic comparison

406 Toury, 4. 407 Holmes, 181. 408 Toury, 5. 409 Seyyed Yahya Barkhordar, "The Assessment of Machine Translation According to Holmes' Map of Translation Studies," Translation Journal (2018). 410 Anna Björk Nikulásdóttir, Language Technology for Icelandic 2018-2022. Project Plan. 411 Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 85.

approach. My interpretation is that they expect a synchronic approach, as both systems are NMT, although they are currently in different stages of development and differ in being open versus closed.412

Within the process-oriented approaches, theories deal with the decision-making structures during translation, as described earlier in this chapter by Susanne Heizmann. 413 At some point during the writing of this thesis, end-users were given the possibility to instantly revise possible synonyms within the vélþýðing NMT output. There are also plans to establish a testing environment where the public can submit texts and choose the translations they prefer, thus giving instant feedback to the newly developed neural network, but this is not currently available within vélþýðing.414 It is, however, available within Google Translate and Facebook's NMT. Thus, decision-making processes might be remodeled.

The function-oriented approaches within the descriptive branch of pure translation theories are concerned with the usage of the MT output, whether for general purposes or for narrower use. The LTPP mentions that if the new translation system is to be useful in certain domains, it must be fed with a tailored parallel corpus from that domain.415 The sub-language approach of the Principle Project is another example of tailored MT systems.416 Another example is the Læknarómur, a speech recognition system for radiologists.417

The final sub-branch within “pure” translation studies in DTS consists of the theoretical approaches. Here, the partial approaches are examined first. The medium-restricted theories can be applied to MT by examining the translator itself as well as the medium used by MT.418 Human translators and MT interact in

412 Language Technology for Icelandic 2018-2022. Project Plan, 87. This is currently employed online here: Miðeind ehf, "Vélþýðing. Knúin Af Greyni," https://velthyding.is/. During the writing of this thesis, there was at some point a possibility for the end-user to compare the output of Google Translate and Vélþýðing within the interface, but this feature had been disabled while writing this chapter (March 27th and 28th 2021). 413 Barkhordar. 414 Anna Björk Nikulásdóttir, Language Technology for Icelandic 2018-2022. Project Plan, 87. 415 Ibid., 85. 416 Project. 417 Anna Björk Nikulásdóttir, Language Technology for Icelandic 2018-2022. Project Plan, 42. 418 Barkhordar.

countless ways, especially within CAT systems. The LTPP mentions that it is desirable that as many translators as possible log their translations into the new systems.419 It also states that the usefulness of the translation system will largely be determined by its usefulness for translators as potential end-users.420 According to the LTPP, the medium used within the system should be trained with a data-driven approach as opposed to the rule-based approaches used in Tungutorg and Apertium. The authors believe that NMT will be the most successful, but that SMT could also prove useful.421

The area-restricted theories can be applied here insofar as the translation system developed will be language-pair restricted (Icelandic – English) and bidirectional. Further language pairs or a multilingual system are not mentioned in the project plan.422

The rank-restricted theories can be applied to MT in the case of sublanguages and different linguistic ranks of a language, which will call for lexical and grammatical modifications.423 The project plan mentions that the system is to be built for general use, but with the possibility to adapt it to more specified fields.424

The text-type-restricted theories will study the genre of the text to be translated by MT. With some text types, accuracy is not necessary, rather a quick overview is needed. Text-types are not mentioned specifically in the LTPP, but the plan mentions that the differentiation of the general system is useful for official documents or subtitle translation as well as for domain-specific language.425

The time-restricted theories from Holmes's map can be used to study the advances made by MT technology from a historical angle. The LTPP indeed draws a

419 Anna Björk Nikulásdóttir, Language Technology for Icelandic 2018-2022. Project Plan, 80. 420 Ibid., 81. 421 Ibid. 422 Ibid., 83. 423 Barkhordar. 424 Anna Björk Nikulásdóttir, Language Technology for Icelandic 2018-2022. Project Plan, 83. 425 Ibid., 88-89.

quick overview of the history of MT in general and mentions an evolution from RBMT to SMT and NMT.426

The problem-restricted theories in MT consider the challenges that MT experiences, which can be linguistic or extra-linguistic. Non-standard language can be especially challenging to MT.427 The Language Technology Project Plan mentions the inadequacies of various MT systems, for example Google Translate's poor output: “The accuracy is low as it does not differentiate between texts from different domains and, as a result, is prone to making errors in translating ambiguous words, phrases and concepts. It is a closed-software that only Google can develop or adapt to special needs.”428 They also mention that the two RBMT systems that are currently not in public use (Tungutorg and Apertium) are unlikely to yield better results than the proposed NMT.429

Finally, the general theoretical branch of translation studies from Holmes's map is not mentioned in Barkhordar but will be covered in the following chapters here. The applied branch of translation studies, with translator training, translation aids, and translation criticism, is somewhat covered within the LTPP. The MT system is mainly conceptualized to be useful, which can be assessed by whether translators use it.430 Translator training is not mentioned but could entail the acknowledgment of the crucial role played by MT in society. 431 The area of translation criticism is mentioned in the evaluation chapter in this work. Not all researchers agree on these classifications, however. Jeremy Munday places editing within the translation criticism branch, which has an MT component. He places all other CAT tools, as well as MT itself, corpora, and online databases, as IT applications under translation aids.432

426 Ibid., 74, 78. 427 Barkhordar. 428 Anna Björk Nikulásdóttir, Language Technology for Icelandic 2018-2022. Project Plan, 78. 429 Ibid. 430 Máltækni Fyrir Íslensku 2018-2022. Verkáætlun, 78. 431 Barkhordar. 432 Munday, Introducing Translation Studies. Theories and Applications, 19.


4.3.4 Descriptive Translation Studies and Skopos Theory

Within DTS, Toury mentions the futility of studying a product-oriented subject without determining the function of the product or studying a process-oriented subject without considering cultural-semiotic conditions. 433 He proposes the following model:

FIGURE 8 THE RELATIONS BETWEEN FUNCTION, PRODUCT, AND PROCESS IN TRANSLATION434

He mentions, just like Even-Zohar, that the translation strategy adopted by the translator will largely depend on the status of translation in a central or peripheral position, which will influence high and low prestige or rarity versus prevalence.435 The focus of the functionalist theories has thus moved away from a source language orientation to a target language orientation. 436 Earlier translation theories are strongly grounded in linguistics and largely concerned with equivalence, but functionalist approaches reject equivalence altogether and associate themselves rather with comparative literature studies.437

Gideon Toury is also known to break with the tradition of defining the objects within translation studies in Western (Eurocentric) models, as he proposed an a posteriori definition within translation studies. As such, any text that is accepted as

433 Toury, 7. 434 Ibid. 435 Ibid. 436 Erich Prunč, Einführung in Die Translationswissenschaft (Graz: Selbstverlag, Institut für Theoretische und Angewandte Translationswissenschaft, 2002), 238. 437 Siever, 169.

a translation by a receptor culture must be studied as a translation,438 irrespective of the scholar's expectations of a translation.439 Thus, even pseudo-translations without a source text become the objects of translation studies.440 The notion of equivalence suggested by earlier researchers, rooted in a source-text orientation, was thereby abandoned.441 As such, it is naive to compare translations to the source text based on deviations and equivalence, as they must be studied based on their own story of origin as well as the literary system of the target culture.442 The members of the Manipulation School,443 as they came to be called, have themselves pointed out that they do not suggest a model of evaluation, but of explanation.444

With the shift away from purely linguistic models and a focus on the target text and culture, translation studies saw a rise in theories pertaining to the practice of human translators 445 , or translational actions, 446 and translation as an act of intercultural communication.447 Katharina Reiß circled back to the equivalence model but aimed the equivalence at the entire text, rather than the sentence level.448 The theory came to be associated with Hans J. Vermeer as skopos theory (skopos is Greek and defined within the theory as “purpose”), while Reiß' text-typological approach is a specific theory within the larger skopos theory background.449 The skopos of the text should be the guiding light to the translation process, which is a variety of translational action, based on a source text (unlike a consultant's report for example). The aim and mode of the translatum (the resulting translated text) should be

438 Theo Hermans, "Introduction. Translation Studies and a New Paradigm," in The Manipulation of Literature. Studies in Literary Translation, ed. Theo Hermans (London, Sydney: Croom Helm, 1985), 13. 439 Tymoczko, 21. 440 Siever, 171. 441 Ibid., 170. 442 Jörn Albrecht, Literarische Übersetzung (Darmstadt: Wissenschaftliche Buchgesellschaft, 1998), 196. 443 Prunč, 229. 444 Theo Hermans, Translation in Systems. Descriptive and System-Oriented Approaches Explained, ed. Anthony Pym, Translation Theories Explained (Manchester: St. Jerome Publishing, 2009 (1999)), 159, 4. 445 Munday, Introducing Translation Studies. Theories and Applications, 111. 446 Katharina Reiss. Some translate the term to be “translatorial actions”. See Christiane Nord, "Translator's Preface," in Towards a General Theory of Translational Action, ed. Hans J. Vermeer Katharina Reiss (London, New York: Routledge, 2012), ii. 447 Munday, Introducing Translation Studies. Theories and Applications, 133. 448 Ibid., 111. 449 Nord, i.

negotiated by the client who commissions the translation.450 The source text thus becomes a constituent of the translation commission, and therefore the basis for all relevant factors pertaining to the translation, which should be hierarchically ordered.451 The target text can diverge from the source text considerably, but it may also follow the same skopos as the source text, whatever skopos was deemed necessary or appropriate in the context. In order to qualify as a translational action, the translator must explain why they acted as they did during the translational process.452 Such an aim is present in any action and indicates that any act of speech is skopos oriented.453 Vermeer asserts that his theory is applicable to all translational actions, from advertising texts to literary translations, but it is not the point to discuss the extent to which the skopos is realizable.454 The recipients of a translation are always clear according to the skopos theory. A translator might not visualize a specific addressee, but if they try to make themselves intelligible “to the world” they must assume, albeit unconsciously, a certain level of intelligence and education among the recipients.455 A skopos need not be stated explicitly; it can also be implied by the commissioner or translator themselves, but a statement of skopos is necessary in order for the translation to be carried out at all. 456 In practice, commissions are typically given explicitly (“Please translate this text”), but as Vermeer writes:

The specification of purpose, addressees etc. is usually sufficiently apparent from the commission situation itself: unless otherwise indicated, it will be assumed in our culture that for instance a technical article about some astronomical discovery is to be translated as a technical article for astronomers.457

The commission of the translation should therefore include the goal or aim of the translation, including practical matters such as fee and deadline, and should be negotiated between the commissioner and the translator. The realizability

450 Hans J. Vermeer, "Skopos and Commission in Translational Action," in The Translation Studies Reader, ed. Lawrence Venuti (London, New York: Routledge, 2000), 221. 451 Ibid., 222. 452 Ibid., 223. 453 Ibid., 224. 454 Ibid., 226. 455 Ibid., 227. 456 Ibid., 228. 457 Ibid., 229.

would only depend on the target culture, and if the commission cannot be realized, the translator must decide upon an “optimal” translation depending on the circumstances, which can mean “as good as possible in view of the resources available”.458 The translation can contain more or even less information than the source text, but according to Vermeer it will contain other information than the source text.459 The skopos theory is still applied today. As Lynne Bowker states, translations must be fit-for-purpose, so it is the responsibility of the client to specify the purpose of the translation, and the way in which the translator produces the translation is largely left up to them. The iron triangle of time vs. cost vs. quality is to be kept in mind, where one of the three will always draw the short straw.460 Thus, these theories are central to MT within a CAT approach.

Skopos theory assumes a human translator, able to make informed decisions and acting as an interpreter.461 Some observations can be deduced by applying the skopos theory to MT. In vélþýðing one must click “translate” to receive a translation, thus requesting the translation; or in Vermeer's words, ordering the commission. In Google Translate, the translation happens automatically as soon as text is entered in the source language field. Presumably the commission happens here as soon as one visits the website, whose sole purpose is to provide translations. On Facebook, a translation is often suggested a priori. The user then has the possibility of requesting the original text. In this case the social network assumes the need for a translation and presupposes a commission. The skopos is implied in all cases and not explicitly stated, but according to Vermeer, all translations must have a skopos in order for the translation to be carried out at all. The deadline is as soon as possible, and it is accepted that all three translation services are free. As Vermeer states, the realizability of the skopos depends on the target culture and language; thus, in cases where the MT system cannot provide an optimal solution, often a translation “as good as possible in view of the resources” will be given. The end user must then determine

458 Ibid., 230. 459 Siever, 87. 460 Lynne Bowker, "Fit-for-Purpose Translation," in The Routledge Handbook of Translation and Technology, ed. Minako O´Hagan (London, New York: Routledge, 2020), 459-65. 461 Siever, 87.

80 whether the translation is intelligible and whether their intended skopos was achieved. Indeed, Google Translate and vélþýðing offer to suggest edits to the target text in case the end-user can provide a more accurate translation in their opinion. However, it could prove interesting to further research whether the feasibility of asking users what their skopos with the translation is. This way the MT system could offer different results to different users. If translators are to be the end users, it is especially important to see what kind of assistance they are looking for. Running the following sentence on March 28th, 2021 produced the following results:

Source sentence: Sigurður er stundum fúll.

Human translation: Sigurður is sometimes grumpy.

Vélþýðing: Victory is sometimes furious.

Google Translate: Sigurður is sometimes drunk.

The sentence was not meant to trick the systems and does not contain homonyms, sarcasm, names that are identical to objects or concepts, non-standard language, slang, or figures of speech. Vélþýðing erroneously translated the male name Sigurður. Google Translate keeps the name but does not recognize the word fúll, apparently treating it as “full” in the sense of drunk. If the user's skopos were clear, the system could offer different versions. For example, if the user's intention were to receive an optimal solution, the system could mark the words it was not sure about or leave them in the original language for the user to look up manually. The results suggest that the systems assume that the user's skopos is to receive some translation rather than no translation at all. As Nida and Taber asked when discussing equivalence: “Is this a correct translation?” The answer to this question is another question: “For whom?”462 This point should be discussed further with the MT development teams and researched empirically, which is outside the scope of this thesis.

462 Eugene A. Nida, 1. Italics not in the original, but used when quoted in Katharina Reiss, 86.
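The suggestion that an MT system could respond differently depending on the user's skopos can be illustrated with a minimal sketch. Everything in it is hypothetical: the three skopos labels, the `TokenCandidate` structure, the `render` function, and the confidence scores are assumptions made for illustration only and do not describe how vélþýðing or Google Translate work internally.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical skopos labels; not taken from any real MT system.
SKOPOI = ("gist", "flag", "keep_source")


@dataclass
class TokenCandidate:
    source: str        # word as it appears in the source text
    target: str        # the system's best guess in the target language
    confidence: float  # invented score between 0 and 1


def render(candidates: List[TokenCandidate], skopos: str = "gist",
           threshold: float = 0.5) -> str:
    """Assemble the output according to the user's stated skopos."""
    if skopos not in SKOPOI:
        raise ValueError(f"unknown skopos: {skopos!r}")
    words = []
    for cand in candidates:
        if skopos == "gist" or cand.confidence >= threshold:
            # "Some translation rather than none": always emit the best guess.
            words.append(cand.target)
        elif skopos == "flag":
            # Mark the words the system was not sure about.
            words.append(f"[{cand.target}?]")
        else:
            # keep_source: leave uncertain words in the original language.
            words.append(cand.source)
    return " ".join(words)


# The example sentence from the text, with invented confidence scores.
EXAMPLE = [
    TokenCandidate("Sigurður", "Sigurður", 0.9),
    TokenCandidate("er", "is", 0.95),
    TokenCandidate("stundum", "sometimes", 0.9),
    TokenCandidate("fúll", "drunk", 0.3),
]

for skopos in SKOPOI:
    print(f"{skopos}: {render(EXAMPLE, skopos)}")
```

With these invented scores, the "gist" setting reproduces the behaviour observed above, while "flag" and "keep_source" leave the doubtful word visible to the user, who can then look it up manually.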

A theoretical and methodological turning point in translation studies occurred around 1990 with the emergence of the cultural approach or “cultural turn”, associated with André Lefevere and Susan Bassnett.463 Following this cultural turn, translation studies gained full independence from linguistics and literature studies.464 As André Lefevere writes, language has only a tangential impact on translations. It is to be equated with transcoding. The technical activity of translation is but one small part of the field of translations; rather, the definitions of translations within a culture are culturally bound.465 Translations themselves oscillate from the communication of information (now the primary concern of translations) to the circulation of cultural capital (now a minor concern). The third type of translation is to be found within the entertainment industry (movies, novels), while the fourth type comprises “persuasive” texts that try to get the reader to pursue a certain course of action. As Lefevere points out, these types are not mutually exclusive, but information is necessary to function on a professional level, while cultural capital is necessary to function within the “right circles” within the society.466 As Lefevere writes: “[t]hat cultural capital is transmitted, distributed, and regulated by means of translation, among other factors, not only between cultures, but also within one given culture.”467 He further points out that cultural capital must fit the discursive formation, or textual system that constitutes “world literature”, and therefore must be analogous to something that already exists. This is especially relevant to oral poetry, such as the Finnish Kalevala or even the Nordic sagas to some extent.468 Translation studies thus stresses that translation doesn't take place in a void, but that there are many stakeholders involved who will decide which texts are selected for translation, what roles editors play, as well as the textual and extratextual constraints on the translator during the translation process itself.469 It could prove to be another fruitful enterprise to ponder these questions in regard to MT: which texts are selected for translation, why, and by whom? As Bassnett mentions, translation studies and cultural studies are concerned with power relations and textual production. They have both seen a “ripening” process, from an anthropological notion of culture to cultures or internationalization and from an oppositional standpoint of literary criticism to a study of hegemonic relations, and debates about equivalence to questions of hegemonic influences in text productions. She mentions that “[…] both have used semiotics to explore the problematics of encoding and decoding.”470

463 Jingjing Huan Chen Yan, "The Culture Turn in Translation Studies," Open Journal of Modern Linguistics 4 (2014).
464 Gauti Kristmannsson, "Þegar Hagrætt Er Í Menningunni," in Þýðingar, Endurritun Og Hagræðing Bókmenntaarfsins, ed. Þröstur Helgason, Ritröð Í Þýðingafræði (Reykjavík: Þýðingasetur Háskóla Íslands, 2013), 14.
465 André Lefevere, "Chinese and Western Thinking on Translation," in Constructing Cultures: Essays on Literary Translation, ed. Susan Bassnett and André Lefevere, Topics in Translation (Clevedon: Multilingual Matters, 1998), 24.
466 "Translation Practice(S) and the Circulation of Cultural Capital: Some Aeneids in English," in Constructing Cultures: Essays on Literary Translation, ed. Susan Bassnett and André Lefevere, Topics in Translation (Clevedon: Multilingual Matters, 1998), 41.
467 Ibid.
468 "The Gates of Analogy: The Kalevala in English," in Constructing Cultures: Essays on Literary Translation, ed. Susan Bassnett and André Lefevere, Topics in Translation (Clevedon: Multilingual Matters, 1998), 76.

469 Bassnett, "The Translation Turn in Cultural Studies," 123.
470 Ibid., 133.

4.4 Approaches from Cultural Studies

Translations present unique challenges to discourses of power and MT should be viewed in this context too. As the proposed bi-directional MT system will only be available in English, it points to Iceland presenting itself within a Western tradition, the context of which will be discussed here. As the corpora used to train the MT systems will consist of modern texts gathered from the Internet via CommonCrawl (along with Wikipedia and OpenSubtitles), discourse analysis of post-colonial language use and power relations is an important step in lessening the inherent machine bias. Machine bias is one of the greatest challenges that arise from unsupervised training data. When the size of language models increases, it becomes more obscure what exactly is included in the training data. When texts are exclusively collected from online resources they can “[…] overrepresent hegemonic viewpoints and encode biases potentially damaging to marginalized populations.”471 It is also important that the NMT resolves grammatical gender correctly as per the Winograd Schema Challenge to improve morphological agreement, as English does not express grammatical gender (only natural gender),472 whereas Icelandic does. Gender bias473 and other biases are not mentioned in the LTPP, and this chapter argues for the abundance of issues that could be observed and studied in this context.474

Orientalism and the concept of Otherness have had a lot of influence on translation studies. Western translation studies have been historically quite occupied with preservation, identity, and assumptions about discrete entities of linguistic, social, or political dimensions as well as written canonical texts. As Theo Hermans mentions, they are thus rather narrowly equipped to deal with the inequities and complexities of a postmodern, postcolonial, unstable, globalized world. Even if translation studies tries to be intercultural, its own history stands in the way of accepting radical differences and the particularities of the local.475 Lawrence Venuti is known for theorizing the translator's invisibility476 within Anglo-American culture, and he also promotes foreignizing strategies, in the tradition of Schleiermacher's ethnodeviant pressure to send the reader abroad and register the linguistic and cultural differences of foreign texts. Schleiermacher's other choice was a domesticating approach, in which the author is brought back home through an ethnocentric reduction of the text to target-language cultural values. According to Venuti, foreignizing strategies should be preferred, as they limit the “ethnocentric violence of translation” and serve as a “strategic cultural intervention” against the hegemonic Anglophone nations. Venuti considers them helpful against racism, ethnocentrism, and imperialism.477 Even though conscious foreignizing strategies like these might not be employed, texts might still be rejected within a target culture. This might occur when translations of “exotic” texts (of the “Other”) are supposed to be foreignized and the translation is received by the target culture as being too “modern” or “domesticated”.478 Anthropologist Johannes Fabian formulated the denial of coevalness, which points to the core of foreignization. The Other is denied a place within the same temporality as the target culture; it is not contemporary, but primitive. It is distanced within either an inaccessible future or a remote past.479 Thus this “Othering” becomes instrumental in building an image of self and other through that relationship, as theorized within post-colonial discourses.480 It is important to keep these kinds of discourses in mind when assembling parallel corpora of data, especially because these discourses will be present in English-Icelandic texts assembled online, as outlined in this chapter.

471 Emily M. Bender, 1.
472 Anne Curzan, Gender Shift in the History of English, ed. Merja Kytö, Studies in English Language (Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo: Cambridge University Press, 2003), 2.
473 Jaeyeong Yang Won Ik Cho, Jiwon Kim, Nam Soo Kim, "Towards Cross-Lingual Generalization of Translation Gender Bias" (paper presented at the FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, online, 2021).
474 Other NMT developers take a strong stance on machine learning fairness, e.g.: Sarah E. Fox Allison Woodruff, Steven Rousso-Schindler, Jeff Warshaw, "A Qualitative Exploration of Perceptions of Algorithmic Fairness" (paper presented at the CHI 2018, Montréal, 2018). The LTPP mentions, though, that in the development of grammar and spell-checking the distinction between right and wrong might not be easy, as the “dative sickness” is going to be found in a lot of the source material used to train the LT, but is nonetheless not the correct usage. Anna Björk Nikulásdóttir, Language Technology for Icelandic 2018-2022. Project Plan, 92.
475 Hermans, Translating Others, 1, 1.
476 Venuti, 1.
477 Ibid., 20.

478 Cortés, 55.
479 Johannes Fabian, Time and the Other. How Anthropology Makes Its Object (New York: Columbia University Press, 2014 (1983)), 31.
480 Cortés, 55.

Post-colonial studies is an academic field based on studying the influences of colonialism and imperialism. As The Empire Writes Back states, over three-quarters of all people living in the world today have had their lives shaped by colonialism. It argues that the day-to-day experiences of colonized people are powerfully encoded through various art forms, including literature.481 As Homi Bhabha writes, the collective experience of nationness, cultural value and community interest are negotiated in the overlap and displacement of domains of difference, interstices that were formerly articulated within the singularities of class, gender, generation, location, geographical locale, or anything that inhabits a claim to identity in the modern world.482 Cultural identities are not fixed or predetermined; they are negotiated in historical transformation, especially the minority perspective in the social articulation of difference. One form of identification can stem from the recognition of tradition, which “restages” the past and thereby introduces other, incomparable cultural temporalities into the invention of tradition. At the same time, this process obstructs access to an originary identity or “received” tradition.483 The present is thus not synchronic or homogenous, quite unlike history, which “[…] tells the beads of sequential time like a rosary, seeking to establish serial, causal connections […].”484

Furthermore, states and nations are not always identical, but may be joined by a hyphen. As Gayatri Spivak and Judith Butler state, when we ponder the relations, “[t]he state we are in when we ask this question may or may not have to do with the state we are in.”485 The wordplay on “state we are in” may refer both to a “state of mind” and to the state that offers citizenship, signifies legal and institutional structures, and forms the conditions under which people are juridically bound.486 They ask with reference to Hannah Arendt whether there is a way (or right) to belong outside the possibilities states offer: a state without nationalism, thus nullifying the need for a nation-state.487 Butler and Spivak state that Arendt theorized statelessness but was not able to theorize the desire for citizenship.488 The nation-state is reenacted through performance and language, as is symbolically visible in the language of a state's national anthem.489 As they write, “[t]he nation-state requires the national language.”490

Orientalism is attributed mainly to Edward Said and asks the question of whether modern imperialism has ever ended at all.491 Orientalism has been criticized by Homi Bhabha among others for being too constrained by a dualist model, as well as not being automatically applicable to the centuries preceding imperialism.492 Orientalism is a school of thought, an intellectual power,493 that Said associates with American, French, and British coverage or tradition of constructing the East or Orient as “[…] its contrasting image, idea, personality, experience.”494 The East is not a fixed geographic entity, as it may contain the Middle East from a European perspective or Asia (China, Japan) for the US.495 The discourse around the Orient is largely rooted in the colonial past and dates back well into the 18th century, according to Said.496 The objects of Orientalism are mostly imaginative, and it is exercised by making statements about the Orient, describing it, teaching it or ruling over it from a Western perspective.497 The distinction between Orient and Occident is a man-made reality, with its own imagery, vocabulary, and tradition of thought, and the two ideas reflect and support each other.498 However, the Orient was mostly constructed by the West as a means of finding its own identity against the “otherness” of the East, creating a discourse of “us” against “them”.499 The Orient thus didn't speak for itself, but it was spoken for, made “oriental”, by an inequality in the relationship of power.500 The foreignization of the Others came from Europe's position of strength, depicting the Orient as childlike, irrational, depraved, “different”. Thus, they could depict Europe as mature, virtuous, rational, “normal”. The Orient was thus not a place of action; rather, it was a place to be judged, to be ruled, to be disciplined or to be illustrated.501 As Said says, “[…] Orientalism was ultimately a political vision of reality whose structure promoted the difference between the familiar (Europe, the West, “us”) and the strange (the Orient, the East, “them”)”.502 The influence of this discourse is still pervasive today, according to Said.503 He further asks whether such a polarization is necessary and useful for the explanation of human experience, and whether it is possible at all to categorize humans into such different cultures, traditions, societies and realities.504 As Said argues, it only contributes to the polarization of the human experience, limits human encounters, and channels any knowledge into the separated compartments of East and West.505 However, it has been pointed out that within the frame of post-colonial theory, the West is often seen as a collective whole. Yet the social groups and countries within that frame all have their particularities and stand in power relations to one another, a fact that is often overlooked.506 Translations, as a part of literature, fit poorly into this polarizing narrative of the human experience and separated cultures. They have therefore been largely ignored when canons of national literatures were assembled.507

481 Gareth Griffiths Bill Ashcroft, Helen Tiffin, The Empire Writes Back. Theory and Practice in Post-Colonial Literatures, 2nd ed. (London: Routledge, 2002 (1989)), 1.
482 Homi K. Bhabha, The Location of Culture (London, New York: Routledge, 1994), 1-2.
483 Ibid., 2. See also the discussion in polysystem theory in this thesis.
484 Ibid., 4.
485 Gayatri Chakravorty Spivak Judith Butler, Who Sings the Nation-State? Language, Politics, Belonging (London, New York, Calcutta: Seagull Books, 2007), 2.
486 Ibid., 2-3.
487 Ibid., 48-50.
488 Ibid., 74.
489 Ibid., 68-71.
490 Ibid., 79.
491 Edward W. Said, Orientalism (London, New York, Toronto, Dublin, Melbourne, New Delhi, Auckland, Johannesburg: Penguin Books, 2003 (1978)), xvi.
492 Dagný Kristjánsdóttir, "Guðríður Símonardóttir: The Suspect Victim of the Turkish Abductions in the 17th Century," in The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, ed. Ebbe Volquardsen Lill-Ann Körber, Berliner Beiträge Zur Skandinavistik (Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020), 154.
493 Said, 41.
494 Ibid., 2.
495 Ibid., 1-2.
496 Ibid., 2-3.
497 Ibid., 3.
498 Ibid., 5.
499 Ibid., 3, 7, 45.
500 Ibid., 5-6.
501 Ibid., 40.
502 Ibid., 43.
503 Ibid., 44, xxii.
504 Ibid., 45.
505 Ibid., 46.

506 Kristín Loftsdóttir, "Whiteness Is from Another World: Gender, Icelandic International Development and Multiculturalism," European Journal of Women's Studies 19, no. 1 (2012): 42.
507 Eysteinsson, 66.

Iceland has a rich history with its neighbors, dating back to Norwegian rule of the country until 1383 and then merging (along with Norway) with Denmark, where it remained under the Danish crown until 1944.508 Even in the aftermath of Icelandic sovereignty, Denmark still has considerable impact on the political debates and culture as well as social phenomena in Iceland.509 Nevertheless, Icelanders rarely consider themselves to be post-colonial, which can be explained to some degree by the fact that Icelanders descend from medieval Scandinavian settlers and are not an indigenous population.510 In fact, when Icelandic artefacts were to be exhibited next to Greenland and the West Indies during a colonial exhibition in 1905 in Copenhagen's Tivoli Gardens, Icelanders objected firmly.511 However, the question of whether Iceland is truly post-colonial cannot be answered conclusively. Some suggest looking at the Nordic involvement in terms of colonial compliancy, as the self-image of many people in the North is that of Nordic exceptionalism, referring to a more peaceful and non-colonial past as compared to other European countries.512 Others mention that colonialism must be accepted as a key factor during the shaping of European identities, to which Iceland belongs.513

The Icelandic language had become incomprehensible to other Scandinavians by the late Middle Ages, explained by the divergence of the other Scandinavian languages through association with other European languages.514 The independence campaign was headed by middle class men of social standing, educated in Copenhagen (the country's capital at the time). It was justified by Iceland's literary tradition as well as their medieval legal tradition, as they argued that Iceland had never formally submitted to Denmark.515 The literary history is commonly broken down into concepts of the Golden Age and the Dark Age. To the Golden Age from 1100-1400 belong many Icelandic family sagas as well as the Book of Icelanders and the Book of Settlements, Codex Regius (with the Poetic Edda), Heimskringla and other poetry in skaldic verse, preserved from preliterate times. Literature from the Dark Age was considered inferior in quality and included religious writings, but also other significant genres like rímur, annals, and historical materials as well as the first travel books and autobiographies. In the Modern Age the educated elite came under the influence of Romanticism, as was common in Europe at the time. Common people also started to make their voices heard, and popular culture emerged.516 Most publications well into the eighteenth century were of a religious nature, with most of the medieval manuscripts not being re-printed for the mass market until 1890.517

Iceland within a post-colonial context occupies a special status, as Iceland did not “inherit” its literary tradition from the colonizer. However, the intellectuals arguing for Iceland's independence were very much educated by the colonizer, and thus only came into the position of wanting to reclaim the manuscripts that had left Iceland “for the center” of the colonizer. Iceland had already written its literature and had its own aspirations of expansion. Eysteinsson mentions here Sigurður Nordal's ideas that it would not have been unthinkable that Iceland could have constituted an empire in the 11th century.518 This understanding of culture is still to some degree visible in the present time, although it has never been quite as unrestrained as it was with Nordal. The idea behind this thought is that Icelandic culture is self-sufficient within its literature, and an interesting reversal of the ideas of colony (or post-colony) versus colonized is visible in this trope. This idea is to some extent a delusion according to Eysteinsson, but since the nationalists of the 19th century found so much inspiration in the medieval literature that in turn sparked the struggle for independence, it also holds true to some extent that contemporary Iceland would be impossible without the cultural heritage.519 Thus the idea that the literature of the 19th century was a direct successor of the medieval heritage is explained, as well as the shunning of translations as part of that cultural heritage.520 It is impossible to imagine how large a role translations played within the medieval literary heritage, but Eysteinsson mentions that it was a more significant contribution than has been claimed within historical reviews.521 Looking at the most recent history (after 1945), the majority of literary publications have in fact been translations, but they were mute within the polysystem.522

508 Sigurður Gylfi Magnússon, Wasteland with Words: A Social History of Iceland (London: Reaktion Books Ltd, 2010), 18.
509 Lill-Ann Körber Ebbe Volquardsen, "The Postcolonial North Atlantic: An Introduction," in The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, ed. Lill-Ann Körber Ebbe Volquardsen, Berliner Beiträge Zur Skandinavistik (Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2014), 9.
510 Ibid., 13-14. And Guðmundur Hálfdanarson, "Iceland Perceived: Nordic, European or Colonial Other?," in The Postcolonial North Atlantic, ed. Ebbe Volquardsen Lill-Ann Körber, Berliner Beiträge Zur Skandinavistik (Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020).
511 Ann-Sofie Nielsen Gremaud, "Iceland as Centre and Periphery: Postcolonial or Crypto-Colonial Perspectives," in The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, ed. Ebbe Volquardsen Lill-Ann Körber, Berliner Beiträge Zur Skandinavistik (Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020), 99.
512 Kristín Loftsdóttir, "Endurútgáfa Negrastrákanna. Söguleg Sérstaða Íslands, Þjóðernishyggja Og Kynþáttafordómar," Ritið 13, no. 1 (2013): 108. And Lars Jensen Kristín Loftsdóttir, "Nordic Exceptionalism and the Nordic "Others"," in Whiteness and Postcolonialism in the Nordic Region, ed. Lars Jensen Kristín Loftsdóttir, Studies in Migration and Diaspora (London, New York: Routledge, 2016 (2012)).
513 Kristín Loftsdóttir, "Icelandic Identities in a Postcolonial Context," in The Postcolonial North Atlantic, ed. Ebbe Volquardsen Lill-Ann Körber, Berliner Beiträge Zur Skandinavistik (Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020).
514 Magnússon, 18.
515 Ibid., 147.
516 Ibid., 149.
517 Ibid., 158-59.

518 Eysteinsson, 237. And Sigurður Nordal, Íslenzk Menning (Reykjavík: Mál og Menning, 1942), 284-285.
519 Eysteinsson, 238.
520 Ibid., 238-39.
521 Ibid., 67.
522 Ibid., 257-59.

Iceland as part of a Scandinavian context has appeared as both a producer and a subject of cultural hierarchizations.523 It is now widely accepted that Iceland and other Nordic countries were not exempted from post-colonialism or Orientalism,524 and are deeply entrenched within colonial practices that still manifest in their societies, peoples, and cultures.525 Some individuals within Scandinavian or Nordic countries have been occupied by the powerful narrative of (gender) equality as a source of identity, while neglecting to come to terms with the region's own colonial practices and ideologies.526 The West in general has gained a status as modern and civilized, which can be called a hegemonic status, with Europe seen as the cradle of modernity and civilization.527 Kristín Loftsdóttir reads the Icelandic nationalist ideas within this context, for example the stance on the purity of the language and the emphasis on the uniqueness of the literature. Icelandic nationalism and the struggle for independence must be contextualized within the island's status under Danish rule in the 19th century and the influence of nationalist discourses within Europe on the Icelandic elite. As the country only gained independence in 1944, it did not directly participate in the colonial projects of the 19th and 20th centuries; however, the nationalistic ideas that come part and parcel with colonialism were a large part of the Icelandic identity.528 The 19th century discourse placed Iceland very strongly within the “brotherhood” of the colonizers.529 Contemporary discourses, however, place Iceland firmly outside of the racism of the 20th century.530 Icelanders thus describe themselves as a former colony (distancing themselves from the evils of colonialism) that is now one of the wealthiest countries in the world531 and, as outlined in the introduction, as an important member of the indigenous language community.

523 Loftsdóttir, "Icelandic Identities in a Postcolonial Context."
524 A position that was still quite novel when first brought forward in the early 2010s.
525 Ebbe Volquardsen Lill-Ann Körber, "Preface to the Second Edition," in The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, ed. Ebbe Volquardsen Lill-Ann Körber, Berliner Beiträge Zur Skandinavistik (Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020).

Iceland has become a hub of tourism since the 2008 financial collapse, and it is possible to disseminate the way Iceland was produced as a destination within a post-colonial context. Tourism can be interpreted as a pseudo-event, a socially produced experience with a network of wittingly inauthentic actors engaging in a mediated reality. Destinations are thus not a fixed concept, but mobile entities in flux, that need to be constantly performed and narrated.532 Postcolonial concepts can be

526 Loftsdóttir, "Whiteness Is from Another World: Gender, Icelandic International Development and Multiculturalism," 43. 527 Gail Lewis, "Imaginaries of Europe. Technologies of Gender, Economies of Power," ibid.13, no. 2 (2006). 528 Kristín Loftsdóttir, "Whiteness Is from Another World: Gender, Icelandic International Development and Multiculturalism," ibid.19, no. 1 (2012): 44. 529 ""Pure Manliness": The Colonial Project and Africa's Image in Nineteenth Century Iceland," Identities: Global Studies in Culture and Power 16, no. 3 (2009): 271. 530 "Going to Eden: Nordic Exceptionalism and the Image of Blackness in Iceland," African and Black Diaspora: An International Journal 7, no. 1 (2014): 29. 531 "Whiteness Is from Another World: Gender, Icelandic International Development and Multiculturalism," 48. 532 Kristín Loftsdóttir Katrín Anna Lund, Michael Leonard, "More Than a Stopover: Analysing the Postcolonial Image of Iceland as a Gateway Destination," Tourist Studies 17, no. 2 (2016): 146.

91 used to disseminate the exotic as a postmodern variant of colonialism and those who have the power to shape destinations hold the power to shape the message and opinion of the destination. Within travel literature Iceland has often been viewed as a wilderness at the margins, in need of being tamed for human amusement, either within European hegemony or beyond its reach. Viewing Iceland as the cultural Other by projecting the perceived alien environment onto its people dates back to the 19th century.533 Translators who translated Icelandic texts positioned themselves as “us” (a central-European community) against “them”, a community upon which judgement, goodwill or pity is cast, in commentaries.534 This is visible both in literary translations as well as writings in modern-day tourism. 535 While Iceland could theoretically be placed as part of the North and West (signifying “modernity” and “civilization”) rather than East or South (signifying “tradition” or “backwardness”) it seems to have been rather assigned to the West Nordic region, or Vestnorden. Within the travel literature of the 19th century this vast space of the Faroe Islands, Iceland and Greenland marked the symbolic edge of the industrialized corner or Northwestern Europe.536 Travelers typically did not consider Iceland a blank slate but approached it with ambivalence. Nature served as a romanticized representation against the bourgeois Europe while Iceland was also recognized within the European intellectual hierarchy for supplying a foundational myth of Northern European culture with its medieval literature, but since deteriorated into “barbarian chaos”537, or childlike simplicity. 538 Borealism as comparable to Orientalism has not been systematically studied or defined as an epistemological space, however discursive strategies from Orientalism can certainly be used to interpret literature about Iceland. 539 Post-colonial discourses in Iceland can also be observed within social

533 Hálfdanarson, "Iceland Perceived: Nordic, European or Colonial Other?," 40. 534 Marion Lerner, "Paratexts as an Instrument of Power. German Translations of Icelandic Prose around 1900.," CLINA 5, no. 1 (2019): 188. 535 "Images of the North, Sublime Nature, and a Pioneering Icelandic Nation," in Iceland and Images of the North, ed. Sumarliði R. Ísleifsson, Collection Droit Au Pôle (Québec: Presses de l´Université du Québec, 2011). 536 Hálfdanarson, "Iceland Perceived: Nordic, European or Colonial Other?," 46. 537 Ibid., 46-47. 538 Kristín Loftsdóttir, "Icelandic Identities in a Postcolonial Context," ibid., ed. Ebbe Volquardsen Lil-Ann Körber, 75. 539 Guðmundur Hálfdanarson, "Iceland Perceived: Nordic, European or Colonial Other?," ibid., ed. Ebbe Volquardsen Lill-Ann Körber, 47.

92 discourses regarding immigration (the majority accepting low-skill jobs) and racism, as well as the association with a nationalist rhetoric as Business Vikings during the economic boom. 540 When the first travel literature about Iceland was published during the early 19th century, the nature and language often stood out as particularly alien and unusual.541 Iceland as a place within the Arctic profits from the fascination of the High North as a place of adventures and exploration, with the narrative of pure, “clean”, remote and untouched nature.542 However, within Iceland's controversial post-colonial status, the environmental crisis of tourism and its overconsumption of resources was added. 543 For its recent period of mass tourism, Iceland has constructed itself for external parties as an European hetero-image, as the “exotic” Other,544 even a pre-civilized paradise at the boundary of civilization and magic.545 Tourism has also seen to new publications in Iceland, among them many translations, in Iceland, of folklore, children's literature, medieval sagas, and history books.546 Gauti Kristmannson mentions, that Icelandic literature and Icelandic in its peripheral location would not survive without translation, in both directions (into Icelandic and from Icelandic). He argues that translation is one of or indeed the only means by which Icelandic culture can defend itself against the onslaught of English and American culture.547 When speaking of “defense”, Kristmannsson views translation as either a jingoistic (or parochial) “national literature” attitude, “[…] or a post- colonial justification of the oppressed people of Iceland (which arguably went from the political colonization by Denmark to cultural colonization by the USA).”548

540 Kristín Loftsdóttir, "Icelandic Identities in a Postcolonial Context," ibid., ed. Ebbe Volquardsen, Lill-Ann Körber, 76-77.
541 Katrín Anna Lund, 147.
542 Katla Kjartansdóttir, Katrín Anna Lund, Kristín Loftsdóttir, ""Puffin Love": Performing and Creating Arctic Landscapes in Iceland through Souvenirs," ibid. 18 (2017): 143.
543 Reinhard Hennig, "Postcolonial Ecology: An Ecocritical Reading of Andri Snær Magnason's Dreamland: A Self-Help Manual for a Frightened Nation (2006)," in The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, ed. Ebbe Volquardsen, Lill-Ann Körber, Berliner Beiträge Zur Skandinavistik (Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020).
544 Ann-Sofie Nielsen Gremaud, "Ísland Sem Rými Annarleikans. Myndir Frá Bókasýningunni Í Frankfurt Árið 2011 Í Ljósi Kenningar Um Dul-Lendur Og Heterótópíur," Ritið 12, no. 1 (2012).
545 "Iceland as Centre and Periphery: Postcolonial or Crypto-Colonial Perspectives," 100. Mass tourism in Iceland has been on pause since the onset of the COVID-19 pandemic.
546 Forlagið, one of the larger publishing houses, has an entire section of 511 titles named “books for foreign travelers”.
547 Kristmannsson, "Theory, World Literature, and the Problem of Untranslatability," 135-36.
548 Ibid., 136.


5 Conclusion

The role of translation in the shaping of national cultures is a widely debated topic and will be further complicated by the introduction of MT into the debate.

This thesis examined the situation in Iceland while drawing on international approaches. The field is diverse and fast-evolving, and it will generate topics of discussion for years to come. I see potential for translation theory to propose an inclusive way of reviewing machine translations automatically, one that takes account of the strengths and weaknesses of both fields. Further research is proposed on the usage of computer-assisted translation (CAT) systems in Iceland as well as on the extent of non-professional translation (NPT).

Support for Icelandic within generic language technology is typically limited, as the language has some peculiarities that pose challenges for technology not accustomed to it. The Icelandic government has therefore commissioned a Language Technology Project Plan covering various language technologies, machine translation among them. The plan is in line with the importance of the national language to the self-image of the Icelandic nation. The financing of the four-year action plan for natural language technology development can thus be explained through the lens of monolingual nation-states and the importance of a national language as one of the founding pillars of such a nation. Developing language resources is comparatively expensive for Iceland, since language technology costs the same to develop irrespective of the size of the user base. Even so, the language is so central to the nation's self-understanding that sovereignty over the language tools is vital for the nation to be realized. As the proposed machine translation systems are either statistical or neural, both will depend on copious amounts of training data. As Iceland is a country with a post-colonial background, embedded in other discourses surrounding gender and inclusiveness, it is important to be aware of the bias that machine learning can acquire from unsupervised training data.

Translations in Iceland occupy a peripheral position within the literary polysystem, which informs translation strategies. Machine translation must therefore attempt to domesticate as much as possible, as foreignizing strategies will likely be firmly rejected by the language community.


Bibliography

(ISO), International Organization for Standardization. "ISO 17100:2015(en) Translation Services - Requirements for Translation Services." In Scope, 2020 (2015). Abeba Birhane, Vinay Uday Prabhu. "Large Image Datasets: A Pyrrhic Win for Computer Vision." Paper presented at the Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2021. Aidan Pine, Mark Turin. "Language Revitalization." Oxford Research Encyclopedia of Linguistics (2018). Alan K. Melby, C. Terry Warner. The Possibility of Language. A Discussion of the Nature of Language, with Implications for Human and Machine Translation. Amsterdam, Philadelphia: John Benjamins Publishing Company, 1995. Albrecht, Jörn. Literarische Übersetzung. Darmstadt: Wissenschaftliche Buchgesellschaft, 1998. Albrecht Neubert, Gregory M. Shreve. Translation as Text. Translation Studies. Edited by Gert Jäger, Albrecht Neubert, Gregory M. Shreve. Kent, London: The Kent State University Press, 1992. Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou. "Word Translation without Parallel Data." In ICLR 2018, 2018. Alfreðsdóttir, Lilja. "Letter to Disney." news release, 2021, https://scontent.frkv1-1.fna.fbcdn.net/v/t1.0-0/p417x417/145337113_3886638161401479_6361409550049456436_o.jpg?_nc_cat=109&ccb=3&_nc_sid=730e14&_nc_ohc=AOrLHQBi64UAX-OVFs7&_nc_ht=scontent.frkv1-1.fna&tp=6&oh=87d82462becea36e329a06cd74eb8503&oe=605EFE4B. Allison Woodruff, Sarah E. Fox, Steven Rousso-Schindler, Jeff Warshaw. "A Qualitative Exploration of Perceptions of Algorithmic Fairness." Paper presented at the CHI 2018, Montréal, 2018. Almannarómur. "Hvað Er Máltækni Og Hvaða Máli Skiptir Hún Fyrir Íslensku?" https://almannaromur.is/. Amanda Lazar, Mark Diaz, Robin Brewer, Chelsea Kim, Anne Marie Piper. "Going Gray, Failure to Hire, and the Ick Factor: Analyzing How Older Bloggers Talk About Ageism." In the 2017 ACM Conference. Portland, 2017. Anderson, Benedict. Imagined Communities.
Reflections on the Origin and Spread of Nationalism. London, New York: Verso, 2006 (1983). André Lefevere, Susan Bassnett. "Where Are We in Translation Studies?" In Constructing Cultures: Essays on Literary Translation, edited by André Lefevere, Susan Bassnett. Topics in Translation. Clevedon: Multilingual Matters, 1998. Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson. "Language Technology Programme for Icelandic 2019-2023." arXiv:2003.09244 (2020). Anna Björk Nikulásdóttir, Jón Guðnason, Steinþór Steingrímsson. Language Technology for Icelandic 2018-2022. Project Plan. Reykjavík: Mennta- og menningarmálaráðuneytið, 2017. ———. Máltækni Fyrir Íslensku 2018-2022. Verkáætlun. Reykjavík: Mennta- og menningarmálaráðuneytið, 2017. Anne-Marie Brady, Baldur Thorhallsson. "Small States and the Turning Point in Global Politics." Chap. 1 In Small States and the New Security Environment, edited by Baldur Thorhallsson, Anne-Marie Brady. The World of Small States, 1-13. Cham: Springer, 2021.


Arnardóttir, Þórunn. "Óskað Eftir Textum Frá Fólki Sem Hefur Íslensku Sem Annað Mál." news release, 2020. Arnold, Doug. "Why Translation Is Difficult for Computers." Chap. 8 In Computers and Translation: A Translator's Guide, edited by Harold Somers. Benjamins Translation Library, 119-42. Amsterdam, Philadelphia: John Benjamins Publishing Company, 2003. Austermühl, Frank. Electronic Tools for Translators. Translation Practices Explained. Edited by Anthony Pym. Manchester, Northampton: St. Jerome Publishing, 2001. Bahordar, Seyyed Yahya. "The Assessment of Machine Translation According to Holmes' Map of Translation Studies." Translation Journal (2018). Barrera, Michael. "Mind the Gap: Addressing Structural Equity and Inclusion on Wikipedia." Arlington: University of Texas Arlington, 2020. Basil Hatim, Ian Mason. The Translator as Communicator. London, New York: Routledge, 1997. Bassnett, Susan. "The Translation Turn in Cultural Studies." Chap. 8 In Constructing Cultures: Essays on Literary Translation, edited by André Lefevere, Susan Bassnett. Topics in Translation, 123-41. Clevedon: Multilingual Matters, 1998. ———. "When Is a Translation Not a Translation?" Chap. 2 In Constructing Cultures: Essays on Literary Translation, edited by André Lefevere, Susan Bassnett. Topics in Translation. Clevedon: Multilingual Matters, 1998. Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, Stephen Denuyl. "Social Biases in NLP Models as Barriers for Persons with Disabilities." Paper presented at the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020. Bender, Emily M. "The #BenderRule: On Naming the Languages We Study and Why It Matters." The Gradient, https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/. ———. "On Achieving and Evaluating Language-Independence in NLP." LiLT 6, no. 3 (2011). Benjamin, Ruha. Race after Technology.
Cambridge, Medford: Polity Press, 2019. Benjamin, Walter. "Charles Baudelaire Tableaux Parisiens. Deutsche Übertragung Mit Einem Vorwort Über Die Aufgabe Des Übersetzers." In Walter Benjamin. Gesammelte Schriften IV 1, edited by Tillman Rexroth. Frankfurt am Main: Suhrkamp, 1991 (1972). Bhabha, Homi K. The Location of Culture. London, New York: Routledge, 1994. Bill Ashcroft, Gareth Griffiths, Helen Tiffin. The Empire Writes Back. Theory and Practice in Post-Colonial Literatures. 2nd ed. London: Routledge, 2002 (1989). Bjarnason, Egill. "Iceland Has a Request for Disney+: More Icelandic, Please." The New York Times, 2021. Björnsdóttir, Ingibjörg Elsa. "Vélþýðingar Á Íslensku Og Apertium-Þýðingarkerfið." Orð og Tunga 18 (2016): 113-43. Board, UNESCO Executive. "Report by the Director-General on the Execution of the Programme Adopted by the General Conference. Intersectoral Mid-Term Strategy on Languages and Multilingualism." 2007. Bowker, Lynne. "Computer-Aided Translation: Translator Training." Chap. 4 In The Routledge Encyclopedia of Translation Technology, edited by Chan Sin-wai, 88-105. London, New York: Routledge, 2015. ———. "Fit-for-Purpose Translation." Chap. 27 In The Routledge Handbook of Translation and Technology, edited by Minako O'Hagan, 453-69. London, New York: Routledge, 2020. Brandt, Martha Dís. "Developing an Icelandic to English Shallow Transfer Machine Translation System." Reykjavík University, 2011.


Byrne, Jody. Scientific and Technical Translation Explained. Translation Practices Explained. Edited by Sharon O'Brien, Sara Laviosa, Kelly Washbourne. Manchester, Kinderhook: St. Jerome Publishing, 2012. Cardey, Sylviane. "Translation Technology in France." In The Routledge Encyclopedia of Translation Technology, edited by Chan Sin-wai. London, New York: Routledge, 2015. Chamberlain, Lori. "Gender and the Metaphorics of Translation." Chap. 23 In The Translation Studies Reader, edited by Lawrence Venuti, 314-30. London, New York: Routledge, 2000. Chen Yan, Jingjing Huan. "The Culture Turn in Translation Studies." Open Journal of Modern Linguistics 4 (2014): 487-94. Chesterman, Andrew. Memes of Translation. The Spread of Ideas in Translation Theory. Benjamins Translation Library. Vol. 22, Amsterdam, Philadelphia: John Benjamins Publishing Company, 1997. Constantine, Peter. "Google Translate Gets Voltaire: Literary Translation and the Age of Artificial Intelligence." Contemporary French and Francophone Studies 23, no. 4 (2020): 471-79. Cortés, Ovidi Carbonell. "Misquoted Others. Locating Newness and Authority in Cultural Translation." Chap. 1 In Translating Others, edited by Theo Hermans, 43-66. London, New York: Routledge, 2014 (2006). Curzan, Anne. Gender Shift in the History of English. Studies in English Language. Edited by Merja Kytö. Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo: Cambridge University Press, 2003. Daniel Weissbort, Ástráður Eysteinsson. Translation - Theory and Practice. Oxford: Oxford University Press, 2006. Djiako, Gabriel Armand. Lexical Ambiguity in Machine Translation and Its Impact on the Evaluation of Output by Users. Saarbrücken: Universitätsverlag des Saarlandes, 2019. Dorr, Bonnie Jean. Machine Translation: A View from the Lexicon. Cambridge, London: The MIT Press, 1993. Dryer, Matthew S. "Position of Negative Morpheme with Respect to Subject, Object, and Verb."
In The World Atlas of Language Structures Online, edited by Matthew S. Dryer, Martin Haspelmath. Leipzig: Max Planck Institute for Evolutionary Anthropology, 2013. Duoxiu, Qian. "Introducing Corpus Rhetoric into Translation Quality Assessment. A Case Study of the White Papers on China's National Defense." Chap. 5 In The Human Factor in Machine Translation, edited by Chan Sin-wai. Routledge Studies in Translation Technology, 83-99. London, New York: Routledge, 2018. Ebbe Volquardsen, Lill-Ann Körber. "The Postcolonial North Atlantic: An Introduction." In The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, edited by Lill-Ann Körber, Ebbe Volquardsen. Berliner Beiträge Zur Skandinavistik, 7-31. Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2014. Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov. "Learning Word Vectors for 157 Languages." In International Conference on Language Resources and Evaluation (LREC), 3483-87, 2018. ehf, Miðeind. "Vélþýðing." https://xn--mieind-qwa.is/velthyding.html. ———. "Vélþýðing. Knúin Af Greyni." https://velthyding.is/. Eiríkur Rögnvaldsson, Kristín M. Jóhannsdóttir, Sigrún Helgadóttir, Steinþór Steingrímsson. Íslensk Tunga Á Stafrænni Öld. The Icelandic Language in the Digital Age. Hvítbókarröð = White Paper Series. Edited by Georg Rehm, Hans Uszkoreit. Springer Verlag, 2012.


Eiríkur Rögnvaldsson, Samúel Þórisson. "Um Clarin-Is." https://clarin.is/um-clarin/. Emily M. Bender, Alex Lascarides. Linguistic Fundamentals for Natural Language Processing II. Synthesis Lectures on Human Language Technologies. Edited by Graeme Hirst. Toronto: Morgan & Claypool Publishers, 2020. Emily M. Bender, Alexander Koller. "Climbing Towards NLU: On Meaning, Form, and Understanding in the Age of Data." Paper presented at the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020. Emily M. Bender, Angelina McMillan-Major, Timnit Gebru, Shmargaret Shmitchell. "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" In Conference on Fairness, Accountability, and Transparency (FAccT '21), 14. Virtual Event, 2021. Erik Angelone, Maureen Ehrensberger-Dow, Gary Massey. The Bloomsbury Companion to Language Industry Studies. Bloomsbury Companions. London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020. ———. "Introduction." Chap. 1 In The Bloomsbury Companion to Language Industry Studies, edited by Maureen Ehrensberger-Dow, Erik Angelone, Gary Massey. Bloomsbury Companions, 1-15. London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020. Eugene A. Nida, Charles R. Taber. The Theory and Practice of Translation. Helps for Translators. Prepared under the Auspices of the United Bible Societies. Leiden: E. J. Brill, 1982 (1969). Eva Vanmassenhove, Christian Hardmeier, Andy Way. "Getting Gender Right in Neural Machine Translation." Paper presented at the Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, 2018. Even-Zohar, Itamar. "Polysystem Studies." Poetics Today. International Journal for Theory and Analysis of Literature and Communication 11, no. 1 (1990). Eysteinsson, Ástráður. Tvímæli. Þýðingar Og Bókmenntir. Fræðirit. Edited by Matthías Viðar Sæmundsson, Ástráður Eysteinsson. Reykjavík: Háskólaútgáfan, 1996. Fabian, Johannes.
Time and the Other. How Anthropology Makes Its Object. New York: Columbia University Press, 2014 (1983). Firth, John R. "A Synopsis of Linguistic Theory, 1930-1955." Studies in linguistic analysis (1975). fræðum, Stofnun Árna Magnússonar í íslenskum. "Íðorðabankinn." https://idordabanki.arnastofnun.is/. Ganesh Jawahar, Benoît Sagot, Djamé Seddah. "What Does BERT Learn About the Structure of Language?" Paper presented at the Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019. Gremaud, Ann-Sofie Nielsen. "Iceland as Centre and Periphery: Postcolonial or Crypto-Colonial Perspectives." In The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, edited by Ebbe Volquardsen, Lill-Ann Körber. Berliner Beiträge Zur Skandinavistik, 83-105. Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020. ———. "Ísland Sem Rými Annarleikans. Myndir Frá Bókasýningunni Í Frankfurt Árið 2011 Í Ljósi Kenningar Um Dul-Lendur Og Heterótópíur." Ritið 12, no. 1 (2012): 7-29. Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato. "Phrase-Based & Neural Unsupervised Machine Translation." ArXiv abs/1804.07755 (2018). Gunnar Helgi Kristinsson, Baldur Thorhallsson. "The Euro-Sceptical Political Elite." Chap. 9 In Iceland and European Integration. On the Edge, edited by Baldur Thorhallsson. Europe and the Nation State, 145-61. London, New York: Routledge, 2004. Hanna Risku, Regina Rogl, Jelena Milošević. "Researching Workplaces." Chap. 3 In The Bloomsbury Companion to Language Industry Studies, edited by Maureen


Ehrensberger-Dow, Erik Angelone, Gary Massey. Bloomsbury Companions, 37-63. London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020. Hálfdanarson, Guðmundur. "Discussing Europe: Icelandic Nationalism and European Integration." Chap. 8 In Iceland and European Integration. On the Edge, edited by Baldur Thorhallsson. Europe and the Nation State, 128-45. London, New York: Routledge, 2004. ———. "From Linguistic Patriotism to Cultural Nationalism: Language and Identity in Iceland." Languages and identities in historical perspective (2005): 55-67. ———. "Iceland Perceived: Nordic, European or Colonial Other?" In The Postcolonial North Atlantic, edited by Ebbe Volquardsen, Lill-Ann Körber. Berliner Beiträge Zur Skandinavistik, 39-67. Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020. ———. "Icelandic Modernity and the Role of Nationalism." In Nordic Paths to Modernity, edited by Björn Wittrock, Jóhann Páll Árnason. New York, Oxford: Berghahn Books, 2012. ———. Íslenska Þjóðríkið - Uppruni Og Endimörk. Reykjavík: Hið Íslenska Bókmenntafélag, 2001. Heike Zinsmeister, Stefanie Dipper, Melanie Seiss. "Abstract Pronominal Anaphors and Label Nouns in German and English: Selected Case Studies and Quantitative Investigations." In Crossroads between Contrastive Linguistics, Translation Studies & Machine Translation, edited by Silvia Hansen-Schirra, Oliver Czulo, 153-95. Berlin: Language Science Press, 2017. Heizmann, Susanne. "Human Strategies in Translation and Interpreting - What MT Can Learn from Translators." In MT - Ten Years On, edited by Verbmobil. Cranfield: Universität Hildesheim, Institut für Angewandte Sprachwissenschaft, 1994. Hennig, Reinhard. "Postcolonial Ecology: An Ecocritical Reading of Andri Snær Magnason's Dreamland: A Self-Help Manual for a Frightened Nation (2006)." In The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, edited by Ebbe Volquardsen, Lill-Ann Körber. Berliner Beiträge Zur Skandinavistik, 105-27.
Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020. Hermans, Theo. "Introduction. Translation Studies and a New Paradigm." In The Manipulation of Literature. Studies in Literary Translation, edited by Theo Hermans. London, Sydney: Croom Helm, 1985. ———. Translating Others. 2 vols. Vol. 1, London, New York: Routledge, 2014 (2006). ———. Translation in Systems. Descriptive and System-Oriented Approaches Explained. Translation Theories Explained. Edited by Anthony Pym. Manchester: St. Jerome Publishing, 2009 (1999). Hjartarson, Snorri. Land Þjóð Og Tunga. Edited by Lesbók Morgunblaðsins, 1949. Holmes, James S. "The Name and Nature of Translation Studies." Chap. 13 In The Translation Studies Reader, edited by Lawrence Venuti, 172-86. London, New York: Routledge, 2000. Hutchins, John. The History of Machine Translation in a Nutshell. 2005. http://www.hutchinsweb.me.uk/Nutshell-2014.pdf. ———. "Milestones in Machine Translation. Part 1 - How It All Began in 1947 and 1948." Language Today 3, no. December 1997 (1997): 22-23. ———. "Multiple Uses of Machine Translation and Computerised Translation Tools." In International Symposium on Data and Sense Mining, Machine Translation and Controlled Languages - ISMTCL 2009. Centre Tesnière, University of Franche-Comté, Besançon, France: Besançon: Presses universitaires de Franche-Comté, 2009. Hutchins, W. John. "Milestones in Machine Translation No.2." Language Today 6, no. March 1998 (1998): 22-23.


———. "Milestones in Machine Translation. Part 3 - Bar Hillel's Survey, 1951." Language Today 8, no. May 1998 (1998): 22-23. ———. "Milestones in Machine Translation. Part 4: The First Machine Translation Conference, June 1952." Language Today 13, no. October 1998 (1998): 12-13. "Introduction." http://www.iclt.is/index_en.html. "Ísland Í Stýrihóp Unesco Vegna Áratugar Frumbyggjamála." mbl.is, https://www.mbl.is/frettir/innlent/2021/04/19/island_i_styrihop_unesco_vegna_aratugar_frumbyggjam/. James Hynd, E. M. Valk. "The Task of the Translator." In Translation Theory and Practice. A Historical Reader, edited by Daniel Weissbort, Ástráður Eysteinsson. Oxford: Routledge, 2006 (1968). Jansche, Martin. "Computer-Aided Quality Assurance of an Icelandic Pronunciation Dictionary." Paper presented at the European Language Resources Association (ELRA), Reykjavík, Iceland, 2014. Jiménez-Crespo, Miguel A. "Technology and Non-Professional Translation." Chap. 14 In The Routledge Handbook of Translation and Technology, edited by Minako O'Hagan, 239-55. London, New York: Routledge, 2020. Joseph Olive, Caitlin Christianson, John McCary. Handbook of Natural Language Processing and Machine Translation. DARPA Global Autonomous Language Exploitation. New York, Dordrecht, Heidelberg, London: Springer, 2011. Joss Moorkens, David Lewis. "Copyright and the Re-Use of Translation as Data." Chap. 28 In The Routledge Handbook of Translation and Technology, edited by Minako O'Hagan, 469-82. London, New York: Routledge, 2020. Jóhannsdóttir, Adda María. ""Land, Þjóð Og Tunga - Þrenning Sönn Og Ein." Þjóðerni Og Sjálfsmynd Á Tímum Hnattvæðingar." Háskóli Íslands, 2010. Judith Butler, Gayatri Chakravorty Spivak. Who Sings the Nation-State? Language, Politics, Belonging. London, New York, Calcutta: Seagull Books, 2007. Katharina Reiss, Hans J. Vermeer. Towards a General Theory of Translational Action. Skopos Theory Explained. Translated by Christiane Nord. London, New York: Routledge, 2014 (1984).
Katrín Anna Lund, Katla Kjartansdóttir, Kristín Loftsdóttir. ""Puffin Love": Performing and Creating Arctic Landscapes in Iceland through Souvenirs." Tourist Studies 18, no. 2 (2017): 142-58. Katrín Anna Lund, Kristín Loftsdóttir, Michael Leonard. "More Than a Stopover: Analysing the Postcolonial Image of Iceland as a Gateway Destination." Tourist Studies 17, no. 2 (2016): 144-63. Kellie Webster, Marta Recasens, Vera Axelrod, Jason Baldridge. "Mind the Gap: A Balanced Corpus of Gendered Ambiguous Pronouns." Transactions of the Association for Computational Linguistics (2018). Kenny, Dorothy. "The Ethics of Machine Translation." In Proceedings of the XI NZSTI National Conference, edited by Sybille Ferner: NZSTI, 2011. ———. "Translation and Translator Training." Chap. 30 In The Routledge Handbook of Translation and Technology, edited by Minako O'Hagan. Routledge Handbooks in Translation and Interpreting Studies, 498-516. London, New York: Routledge, 2020. King, Patrick. "Small and Medium-Sized Enterprise (SME) Translation Service Provider as Technology User: Translation in New Zealand." Chap. 9 In The Routledge Handbook of Translation and Technology, edited by Minako O'Hagan. Routledge Handbooks in Translation and Interpreting Studies, 148-66. London, New York: Routledge, 2020. Koehn, Philipp. Neural Machine Translation. Cambridge: Cambridge University Press, 2020. Kristín Loftsdóttir, Lars Jensen. "Nordic Exceptionalism and the Nordic "Others"." In Whiteness and Postcolonialism in the Nordic Region, edited by Lars Jensen, Kristín


Loftsdóttir. Studies in Migration and Diaspora. London, New York: Routledge, 2016 (2012). Kristjánsdóttir, Dagný. "Guðríður Símonardóttir: The Suspect Victim of the Turkish Abductions in the 17th Century." In The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, edited by Ebbe Volquardsen, Lill-Ann Körber. Berliner Beiträge Zur Skandinavistik, 143-64. Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020. Kristmannsson, Gauti. "Theory, World Literature, and the Problem of Untranslatability." In Untranslatability Goes Global, edited by Katie Lateef-Jan, Suzanne Jill Levine. Routledge Advances in Translation and Interpreting Studies. London, New York: Routledge, 2018. ———. "Þegar Hagrætt Er Í Menningunni." In Þýðingar, Endurritun Og Hagræðing Bókmenntaarfsins, edited by Þröstur Helgason. Ritröð Í Þýðingafræði. Reykjavík: Þýðingasetur Háskóla Íslands, 2013. Kurenkov, Andrey. "A Brief History of Neural Nets and Deep Learning." Skynet Today, 2020. Lefevere, André. "Chinese and Western Thinking on Translation." In Constructing Cultures: Essays on Literary Translation, edited by André Lefevere, Susan Bassnett. Topics in Translation. Clevedon: Multilingual Matters, 1998. ———. "The Gates of Analogy: The Kalevala in English." Chap. 5 In Constructing Cultures: Essays on Literary Translation, edited by Susan Bassnett, André Lefevere. Topics in Translation, 76-90. Clevedon: Multilingual Matters, 1998. ———. "Translation Practice(S) and the Circulation of Cultural Capital: Some Aeneids in English." Chap. 3 In Constructing Cultures: Essays on Literary Translation, edited by André Lefevere, Susan Bassnett. Topics in Translation, 41-57. Clevedon: Multilingual Matters, 1998. Lerner, Marion. "Images of the North, Sublime Nature, and a Pioneering Icelandic Nation." In Iceland and Images of the North, edited by Sumarliði R. Ísleifsson. Collection Droit Au Pôle. Québec: Presses de l'Université du Québec, 2011. ———.
"Paratexts as an Instrument of Power. German Translations of Icelandic Prose around 1900." CLINA 5, no. 1 (2019): 181-90. Levý, Jiří. "Translation as a Decision Process." In The Translation Studies Reader, edited by Lawrence Venuti. London, New York: Routledge, 2000. Lewis, Gail. "Imaginaries of Europe. Technologies of Gender, Economies of Power." European Journal of Women's Studies 13, no. 2 (2006): 87-102. Lill-Ann Körber, Ebbe Volquardsen. "Preface to the Second Edition." In The Postcolonial North Atlantic. Iceland, Greenland and the Faroe Islands, edited by Ebbe Volquardsen, Lill-Ann Körber. Berliner Beiträge Zur Skandinavistik. Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020. Liu, Henry. "Foreword." In The Bloomsbury Companion to Language Industry Studies, edited by Maureen Ehrensberger-Dow, Erik Angelone, Gary Massey. Bloomsbury Companions. London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020. Liu Qun, Zhang Xiaojun. "Machine Translation. General." In The Routledge Encyclopedia of Translation Technology, edited by Chan Sin-wai. London, New York: Routledge, 2015. Liu Yang, Zhang Min. "Statistical Machine Translation." In The Routledge Encyclopedia of Translation Technology, edited by Chan Sin-wai. London, New York: Routledge, 2015. Loftsdóttir, Kristín. "Endurútgáfa Negrastrákanna. Söguleg Sérstaða Íslands, Þjóðernishyggja Og Kynþáttafordómar." Ritið 13, no. 1 (2013): 101-24. ———. "Going to Eden: Nordic Exceptionalism and the Image of Blackness in Iceland." African and Black Diaspora: An International Journal 7, no. 1 (2014): 27-41.


———. "Icelandic Identities in a Postcolonial Context." In The Postcolonial North Atlantic, edited by Ebbe Volquardsen, Lill-Ann Körber. Berliner Beiträge Zur Skandinavistik, 67-83. Berlin: Nordeuropa-Institut der Humboldt-Universität zu Berlin, 2020. ———. ""Pure Manliness": The Colonial Project and Africa's Image in Nineteenth Century Iceland." Identities: Global Studies in Culture and Power 16, no. 3 (2009): 271-93. ———. "Whiteness Is from Another World: Gender, Icelandic International Development and Multiculturalism." European Journal of Women's Studies 19, no. 1 (2012): 41-54. LT2013. "Status and Potential of the European Language Technology Markets." In The Forum for Europe's Language Technology Industry, edited by Shaping Europe's digital future. online: European Commission, 2014. Lynne Bowker, Jairo Buitrago Ciro. Machine Translation and Global Research: Towards Improved Machine Translation Literacy in the Scholarly Community. Bingley: Emerald Publishing Limited, 2019. Magnússon, Sigurður Gylfi. Wasteland with Words: A Social History of Iceland. London: Reaktion Books Ltd, 2010. Maouche, Seraya. "Google Ai: Opportunities, Risks, and Ethical Challenges." Contemporary French and Francophone Studies 23, no. 4 (2020): 447-55. McLuhan, Marshall. The Gutenberg Galaxy. The Making of Typographic Man. Toronto: University of Toronto Press, 1962. Meer, Jaap van der. "Translation Technology - Past, Present and Future." Chap. 13 In The Bloomsbury Companion to Language Industry Studies, edited by Maureen Ehrensberger-Dow, Erik Angelone, Gary Massey. Bloomsbury Companions, 285-311. London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020. Melby, Alan K. "Future of Machine Translation: Musings on Weaver's Memo." Chap. 25 In The Routledge Handbook of Translation and Technology, edited by Minako O'Hagan, 419-37. London, New York: Routledge, 2020. Miles, Lee. "Foreword." In Iceland and European Integration. On the Edge, edited by Baldur Thorhallsson.
Europe and the Nation State. London, New York: Routledge, 2004. Mona Baker, Gabriela Saldanha. "Routledge Encyclopedia of Translation Studies." London, New York: Routledge, 2020. Munday, Jeremy. Evaluation in Translation. London, New York: Routledge, 2012. ———. Introducing Translation Studies. Theories and Applications. 3rd ed. London, New York: Routledge, 2012. Nord, Christiane. "Translator's Preface." In Towards a General Theory of Translational Action, edited by Hans J. Vermeer, Katharina Reiss. London, New York: Routledge, 2012. Nordal, Sigurður. Íslenzk Menning. Reykjavík: Mál og Menning, 1942. O'Hagan, Minako. "Introduction: Translation and Technology: Disruptive Entanglement of Human and Machine." In The Routledge Handbook of Translation and Technology, edited by Minako O'Hagan. London, New York: Routledge, 2020. ———. The Routledge Handbook of Translation and Technology. Routledge Handbooks in Translation and Interpreting Studies. London, New York: Routledge, 2020. Paco Guzman, Don Husa. "Expanding Automatic Machine Translation to More Languages." https://engineering.fb.com/2018/09/11/ml-applications/expanding-automatic-machine-translation-to-more-languages/. Penelope Brown, Stephen C. Levinson. Politeness. Some Universals in Language Usage. Studies in Interactional Sociolinguistics. Edited by John J. Gumperz. Cambridge, New York, New Rochelle, Melbourne, Sydney: Cambridge University Press, 1987. Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, Monojit Choudhury. "The State and Fate of Linguistic Diversity and Inclusion in the NLP World." Paper presented at


the Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 2020. Project, Principle. "Principle Leaflet/Infographic." 2021. Prunč, Erich. Einführung in Die Translationswissenschaft. Graz: Selbstverlag, Institut für Theoretische und Angewandte Translationswissenschaft, 2002. Raschka, Sebastian. "Single-Layer Neural Networks and Gradient Descent." In Teaching, 2015. Rögnvaldsson, Eiríkur. "Icelandic Language Technology Ten Years Later." (2008). Said, Edward W. Orientalism. London, New York, Toronto, Dublin, Melbourne, New Delhi, Auckland, Johannesburg: Penguin Books, 2003 (1978). Sanderson, Grant. "What Is Backpropagation Really Doing? | Deep Learning, Chapter 3." In Neural Networks, edited by 3Blue1Brown, 2017. Schmidhuber, Jürgen. "Deep Learning in Neural Networks: An Overview." Neural Networks 61 (2015): 85-117. Schütze, Heinrich. "Word Space." Advances in neural information processing systems 5 (1993): 895-902. Siever, Holger. Übersetzungswissenschaft. Eine Einführung. Tübingen: Narr Francke Attempto Verlag GmbH + Co. KG, 2015. Sin-wai, Chan. "Caught in the Web of Translation. Reflections on the Compilation of Three Translation Encyclopedias." Chap. 3 In The Human Factor in Machine Translation, edited by Chan Sin-wai. Routledge Studies in Translation Technology. London, New York: Routledge, 2018. ———. "The Development of Translation Technology. 1967-2013." Chap. 1 In The Routledge Encyclopedia of Translation Technology, edited by Chan Sin-wai, 3-31. London, New York: Routledge, 2015. Somers, Harold. "Christa Hauenschild and Susanne Heizmann (Eds), Machine Translation and Translation Theory." Machine Translation 15, no. 3 (2000): 262-66. ———. "Introduction." In Computers and Translation: A Translator's Guide, edited by Harold Somers. Benjamins Translation Library. Amsterdam, Philadelphia: John Benjamins Publishing Company, 2003. ———. "Translation Technologies and Minority Languages." Chap.
6 In Computers and Translation, edited by Harold Somers. Benjamins Translation Library, 87. Amsterdam, Philadelphia: John Benjamins Publishing Company, 2003. Somers, Harold L. ""New Paradigms" in Mt: The State of the Play Once the Dust Has Settled." (2003). Steenbergen, Jan van. "A Short History of Interslavic." interslavic-language.org, http://steen.free.fr/interslavic/history.html. Stefánsdóttir, Lilja Björk. "Við Söfnum Leiðréttingum Á Íslensku Ritmáli." edited by Máltækni við Háskóla Íslands, 2020. Steiner, George. After Babel. London, Oxford, New York: Oxford University Press, 1976 (1975). stétta, Rithöfundasamband Íslands. Bandalag skrifandi. "Þýðingasamningur Við Útgefendur - Taxtar." https://rsi.is/samningar-og-taxtar/taxtar/thydingasamningur-vid- utgefendur-taxtar/. Thráinsson, Höskuldur. The Syntax of Icelandic. Cambridge Syntax Guides. Edited by B. Comrie P. Austin, J. Bresnan, D. Lightfoot, I. Roberts, N. V. Smith Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo: Cambridge University Press, 2007. Tolga Bolukbasi, Kai-Wei Chang, James Zou, Benkatesh Saligrama, Adam Kalai. "Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings." In NeurIPS. Barcelona, 2016.

Toury, Gideon. Descriptive Translation Studies - and Beyond. Benjamins Translation Library. Amsterdam, Philadelphia: John Benjamins Publishing Company, 1995.
Bandalag þýðenda og túlka. "Lög Félagsins." (2004).
Tymoczko, Maria. "Reconceptualizing Translation Theory. Integrating Non-Western Thought About Translation." Chap. 1 In Translating Others, edited by Theo Hermans, 13-33. London, New York: Routledge, 2014 (2006).
"UNESCO Launches the Global Task Force for Making a Decade of Action for Indigenous Languages." https://en.unesco.org/news/unesco-launches-global-task-force-making-decade-action-indigenous-languages?fbclid=IwAR0keeVLDwjswW3QwolM3AwgWlzLStdq31StkVBqUQ8H9vce6LRJvHRJJtg.
"Upcoming Decade of Indigenous Languages (2022 – 2032) to Focus on Indigenous Language Users’ Human Rights." https://en.unesco.org/news/upcoming-decade-indigenous-languages-2022-2032-focus-indigenous-language-users-human-rights.
Venuti, Lawrence. The Translator's Invisibility. A History of Translation. Translation Studies. Edited by Susan Bassnett, André Lefevere. London, New York: Routledge, 1995.
Vermeer, Hans J. "Skopos and Commission in Translational Action." Translated by Andrew Chesterman. In The Translation Studies Reader, edited by Lawrence Venuti. London, New York: Routledge, 2000.
W. John Hutchins, Harold L. Somers. An Introduction to Machine Translation. London, San Diego, New York, Boston, Sydney, Tokyo, Toronto: Academic Press, 1992.
Way, Andy. "Machine Translation: Where Are We Today?" In The Bloomsbury Companion to Language Industry Studies, edited by Erik Angelone, Maureen Ehrensberger-Dow, Gary Massey. Bloomsbury Companions. London, New York, Oxford, New Delhi, Sydney: Bloomsbury Academic, 2020.
Billy Wong Tak-ming, Jonathan J. Webster. "Example-Based Machine Translation." In The Routledge Encyclopedia of Translation Technology, edited by Chan Sin-wai. London, New York: Routledge, 2013.
Won Ik Cho, Jaeyeong Yang, Jiwon Kim, Nam Soo Kim. "Towards Cross-Lingual Generalization of Translation Gender Bias." Paper presented at the FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, online, 2021.
Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean. "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation." ArXiv (2016).
Yu Shiwen, Bai Xiaojing. "Rule-Based Machine Translation." In The Routledge Encyclopedia of Translation Technology, edited by Chan Sin-wai. London, New York: Routledge, 2013.
Zohn, Harry. "The Task of the Translator. An Introduction to the Translation of Baudelaire's Tableaux Parisiens." In Walter Benjamin. Illuminations, edited by Hannah Arendt. New York: Schocken Books, 2007 (1968).
Þorgeirsdóttir, Sigrún. "Hugtakasafn Þýðingamiðstöðvar Utanríkisráðuneytisins." https://hugtakasafn.utn.stjr.is/index.en.adp.
