Using Danish as a CG Interlingua: A Wide-Coverage Norwegian-English Machine Translation System Eckhard Bick Lars Nygaard Institute of Language and The Text Laboratory Communication University of Southern Denmark University of Oslo Odense, Denmark Oslo, Norway
[email protected] [email protected] Abstract running, mixed domain text. Also, some languages, like English, German and This paper presents a rule-based Japanese, are more equal than others, Norwegian-English MT system. not least in a funding-heavy Exploiting the closeness of environment like MT. Norwegian and Danish, and the The focus of this paper will be existence of a well-performing threefold: Firstly, the system presented Danish-English system, Danish is here is targeting one of the small, used as an «interlingua». «unequal» languages, Norwegian. Structural analysis and polysemy Secondly, the method used to create a resolution are based on Constraint Norwegian-English translator, is Grammar (CG) function tags and ressource-economical in that it uses dependency structures. We another, very similar language, Danish, describe the semiautomatic as an «interlingua» in the sense of construction of the necessary translation knowledge recycling (Paul Norwegian-Danish dictionary and 2001), but with the recycling step at the evaluate the method used as well SL side rather than the TL side. Thirdly, as the coverage of the lexicon. we will discuss an unusual analysis and transfer methodology based on Constraint Grammar dependency 1 Introduction parsing. In short, we set out to construct a Norwegian-English MT Machine translation (MT) is no longer system by building a smaller, an unpractical science. Especially the Norwegian-Danish one and piping its advent of corpora with hundreds of output into an existing Danish deep millions of words and advanced parser (DanGram, Bick 2003) and an machine learning techniques, bilingual existing, robust Danish-English MT electronic data and advanced machine system (Dan2Eng, Bick 2006 and 2007).