Introducing the Autshumato Integrated Translation Environment
Hendrik J. Groenewald and Wildrich Fourie
Centre for Text Technology (CTexT), North-West University, Potchefstroom, South Africa

© 2009 European Association for Machine Translation. Proceedings of the 13th Annual Conference of the EAMT, pages 190–196, Barcelona, May 2009

Abstract

Translation is an indispensable process for socioeconomic and cultural development in a multilingual society. The use of translation tools such as translation memories and machine translation systems can be beneficial in supporting the human translator. Unfortunately, the availability of these tools is limited for resource-scarce languages. In this paper, we introduce the Autshumato Integrated Translation Environment that facilitates a comprehensive set of translation tools, including machine translation and translation memory. The Autshumato Integrated Translation Environment is specifically developed for translation between the official South African languages, but it is in essence language-independent and can therefore be used to translate between any language pair.

1 Introduction

South Africa is a culturally diverse country, with a large number of different ethnic groups. This rich cultural diversity is the main reason for South Africa having no fewer than eleven official languages. The South African Constitution (Republic of South Africa, 1996) guarantees equal status to each of the eleven official languages.

Translating between these eleven languages is an immense task, and unsurprisingly, English often acts as the primary language for government communication. The problem with English being the lingua franca is that the other 10 official languages are marginalised. An example of this is the fact that the South African government does not have the capacity to produce parliamentary records in all eleven official languages. Multilingual parliamentary records are one of the most important resources for machine translation (MT) projects elsewhere in the world. One such project is the EuroMatrix project, a statistical and hybrid machine translation project for translating between European languages (see www.euromatrix.net). The EuroMatrix project uses the parliamentary records of the European Parliament, available in all 23 official languages of the European Union, as one of its main resources for the development of machine translation systems. The parliamentary records of the South African government are only recorded in English at both provincial and national level. The result of this is that a large number of South African citizens are denied access to important government information in their home language, since English is the mother tongue of only 8.2% of the South African population (Van der Merwe and Van der Merwe, 2006). Another negative aspect is that researchers and developers working on machine translation systems for South African languages do not have access to as much data as their European colleagues.

The South African Government is aware of the importance of elevating the status and promoting the equal use of all eleven official languages on all levels. As a result, the government is involved in a large number of initiatives and projects to promote multilingualism. One such project is the Autshumato¹ Project, which was commissioned by the South African Department of Arts and Culture for the development of translation tools and resources for the eleven official South African languages. An integral part of the Autshumato Project is the Autshumato Integrated Translation Environment (ITE), which is the subject of this paper.

¹ Autshumato was a Khoi-khoi leader who worked as an interpreter between the Europeans and the Khoi-khoi people during the establishment of the Dutch settlement at the Cape of Good Hope in the 17th century (Giliomee and Mbenga, 2007). Autshumato can for this reason be viewed as one of the first translators in South Africa.

The rest of this article is organised as follows: the next section provides information about the end-user requirements for the Autshumato ITE; Section 3 describes related work; and Section 4 provides detailed information about the design and implementation of the system. The main functionalities are discussed in Section 5, while the paper concludes with a discussion of future work in Section 6.

2 End-user requirements

As indicated in the previous section, the South African government's translation agencies do not have the capacity to translate all government documents and communications into all of the official languages. The magnitude of this problem is increased by the lack of availability of translation tools for the eleven official languages. The purpose of the Autshumato project is to develop tools and resources that will help government translators increase both the quantity and the quality of their translation work. Although these tools are specifically developed for use by government translators, they will in future be released under an open source license for use by all translators.

Since government translators are the most important clients of the Autshumato Project, one of the first objectives of this project was to determine the end-user requirements of the translators working at various government departments. The end-user requirements were elicited by means of questionnaires and joint requirements planning (JRP) sessions. The joint requirements planning sessions were held with translators working at the offices of the South African National Language Service (NLS) and members of the development team. The results of the questionnaires and joint requirements planning sessions can be summarised as follows:

• Inaccurate translations – The overload of work that translators receive has a negative influence on the quality of translations.

• A system that contains a memory of terms previously coined and used in similar documents is not available. The NLS has a terminology database, but this is not electronically accessible by the translators due to proprietary licensing restrictions. The translators do however receive a hard copy of the terminology database.

• In the past, some translators used SDL Trados (see www.trados.com), a translation memory (TM) system. SDL Trados is not used anymore, due to software compatibility and licensing issues.

• No online machine translation systems exist for South African languages. In general, insufficient online resources are available for African languages.

• Translators do not use a standard file naming convention. The consequence of this is difficulty in finding parallel translations of the same document.

• Hardware – Most translators do not have dual monitors connected to their computers. Dual monitors can be useful when translating between electronic documents.

• Translators make no use of previously translated documents. Some documents contain duplicate information that can be copied instead of translated again. Previously translated documents can also act as a template for new documents on the same topic or genre.

• Some translators perform translation by overtyping on the original document. They do not keep copies of the original documents, making it impossible to obtain parallel-translated documents.

• The South African government has decided to implement open source software in all government departments. Proprietary operating systems and word processors are being phased out in favour of open source solutions. The problem with this is that the change to open source is happening very slowly. Some departments have already implemented open source, while others are still using proprietary software. The implication of this is that all software developed for the South African government must be open source and cross-platform compatible.

• Most translators use Microsoft® Office Word as their default translation environment. They are used to the graphic user interface of Microsoft® Office Word, and therefore would prefer a computer-assisted translation program with a similar user interface to that of Microsoft® Office Word.

• Translators working at government departments have basic computer skills. Translators require computer-assisted translation tools with a shallow learning curve that are simple and intuitive to use.

• Surprisingly, most translators are not opposed to machine translation systems.

The abovementioned preferences were analysed and rendered into end-user requirements, which steered the design and development of Autshumato ITE.

[...] fact that all the translation tools/resources required by the translator are accessible from within a single environment. These translation tools and resources include translation memory, machine translation, and term banks (glossaries). A diagram of the tools/resources that provide input to the ITE is displayed in Figure 1. The ITE must furthermore support open standards (e.g. TMX and XLIFF) to ensure compatibility with other translation tools. Using open standards also ensures that information is always accessible, and never becomes locked away within a proprietary format requiring a legacy application to access.

3 Related Work

Translation memories and machine translation systems are traditionally seen as opposing technologies.
Although combining translation memory with machine translation is not a novel idea (Mügge, 2001 and Shuttleworth, 2002), very few computer-assisted