Automated Translation at Grenoble University

AUTOMATED TRANSLATION AT GRENOBLE UNIVERSITY Bernard Vauquois and Christian Boitet Groupe d'Etudes pour ia Traduction Automatique Boite Postale 68 Universit~ de Grenoble 38402 Saint-Martin-d'H/~res, France INTRODUCTION syntactic analysis, direct mapping into a dependency analysis, procedural semantic analysis (using a special- The authors thank Dr. Slocum for the opportunity to ized low-level programming language to transform tree present the work on machine translation at Grenoble. structures) into an interlingua (the famous "language The plan he has proposed for the contributions to this pivot", or "pivot language"), lexical transfer, syntactic special issue was certainly a very good starting point, as a generation (using the same low-level tree-transformation common frame to present various systems around the language), and morphological generation (beginning with world. a recursive top-down procedure applied to the tree to get It is, however, inevitable that we could not completely the correct order of the leaves, to be transformed into fit into it, so that we have sometimes taken some liberty words). for which we hope to be excused. Since 1972, another approach has been followed that relies more on the use of SLLPs (Specialized Languages 1 PROJECT HISTORY for Linguistic Programming); that is, on procedural tech- I.I DATESAND FUNDING SOURCES niques like controlled production systems and heuristic programming. Also, a main goal has been to design the Founded in 1972, GETA is one of the laboratories of the basic software and the various lingwares with a view to Computer Science Department at Grenoble University. multilingual translation. From its inception it was supported by CNRS, the French A completely integrated programming environment National Center for Research, by means of association (ARIANE-78) has been developed and used to build a contracts renewed every four years, and by the Universi- variety of linguistic models, in order to test the general ty, having the status of "University Research Team asso- multilingual design and the various facilities for lingware ciated with CNRS". Also, several projects have been preparation, debugging, and actual use, as well as for partially supported by contracts from the Ministries of human pre- and post-editing. Defense, Telecommunications, and Industry. Before that, in 1961, CNRS had created CETA 1.2 LANGUAGESTRANSLATED; PROJECT (Centre d'Etudes pour la Traduction Automatique) as a AND SYSTEM SIZES "laboratoire propre", that is, a laboratory supported by The first three subsections below describe the kinds of CNRS funds and various contracts but not by the Univer- experiments performed at GETA. sity. From 1961 to 1971, CETA elaborated some basic 1.2.1 WRITINGOF VERY REDUCED SCALE MODELS software tools for MT, and experimented with it mainly The writing of very reduced scale models is mainly on translation from Russian into French. Some investi- oriented to training researchers in the methodology of gations were also carried out into the analysis of German MT and the use of the various SLLPs available under and Japanese. In 1971, a corpus including about ARIANE-78. 400,000 running words had been translated from Russian As an example, Feng Zhi Wei (ISTIC, Beijing) has into French, after several years of development of the written a small multilingual translation system from various grammars and dictionaries (the "lingware", so to Chinese into Japanese, French, English, German, and speak). Russian, that (of course) uses the same analysis modules The system was typically second generation: finite- for all language pairs. state morphological analysis, augmented context-free Copyright1985 by the Association for Computational Linguistics. Permission to copy without fee all or part of this material is granted provided that the copies are not made for direct commercial advantage and the CL reference and this copyright notice are included on the first page. To copy otherwise, or to republish, requires a fee and/or specific permission. 0362-613X/85/010028-36503.00 28 Computational Linguistics, Volume 11, Number 1, January-March 1985 Bernard Vauquois and Christian Boitet Automated Translation at Grenoble University A transfer from English (using a bigger model of 1350 lines for the morphological analysis English analysis) to Chinese and a generation of Chinese 3100 lines for the structural analysis has been produced by Ping Yang (ISTIC). In a similar 380 lines for the structural transfer way, using the same analysis of English, Professor Jun- 930 lines for the syntactic generation Ichi Tsujii (of Kyoto University) conducted an exper- 120 lines for the morphological generation iment into English-Japanese translation. Since 1983 this system has been used to test an exper- imental "translation unit" using a production-oriented 1.2.2 CONSTRUCTIONOF SMALLSCALE MODELS subset of ARIANE-78, PROTRA. Photocopies of techni- The construction of small scale models - that is, with cal abstracts from the Referativnyij Zhurnal are regularly small dictionaries but medium size grammars - has been received by the unit, manually inputted, checked for done for feasibility studies and training. typing errors (by using the morphological analyzer), A French-English system was constructed by J,Ph. machine translated, manually revised (using a multiwin- Guilbaud and M. Dymetman in the framework of a feasi- dow editor), and sent back. The revision effort includes bility study for the Ministry of Telecommunications the search for the source of errors encountered. during 1981. This system has recently been reused to As it is, and taking into account that manual input implement a rough "self grading" technique to be used in represents the major bottleneck, around 50 to 60 connection with a translator work station (A. Melby, abstracts per month (5000 to 7500 running words) are BYU). processed. Fields covered include space sciences and Another example is the development of a German- metallurgy. French system, using the same generation as the Next comes the English-Malay system (IN1-BM1). In Russian-French system (see below). This work was 1979, a cooperative project with USM (Universti Sains started around 1979, with J. Ph. Guilbaud of GETA in Malaysia, Penang) was launched. The aim of the project charge of morphological analysis and transfer, and is to produce an English-Malay system for the translation Professor Stahl (Paris) in charge of structural analysis. of teaching material in technical fields. The level of labo- In cooperation with IFCI (Institut de Formation et de ratory prototype should be attained by the end of 1984, Conseil en Informatique, Grenoble), another system, when systematic tests will be performed. BEX-FEX, has been developed for teaching purposes (B. The analysis part has been jointly developed by GETA Roudaud, S. Chappuy, and E. Guilbaud). The analysis is and USM. It is the same that was mentioned earlier: a simplified version of the IN1 analysis (see below), and having initially been started for this project, it has been the dictionaries are purposely very small (500 lexical reused for experiments of translation into other units for each language). languages. As far as size is concerned, the dictionaries In order to give an idea of the complexity of such a contain 1800 English lexical units (about 3000 words) system, rather than measure the number of grammatical and 1800 Malay lexical units (about 2700 words). The rules, because they can be very simple or very complex, various grammars contain we measured the number of source lines of the grammars 420 lines for the morphological analysis written in the SLLPs ATEF, ROBRA, or SYGMOR. 5000 lines for the structural analysis BEX-FEX contains 600 lines for the structural transfer 420 lines for the morphological analysis 1500 lines for the syntactic generation 2500 lines for the structural analysis 670 lines for the morphological generation 300 lines for the structural transfer More recently, a similar project was started with 1000 lines for the syntactic generation several Universities of Thailand (Chulalongkorn and 100 lines for the morphological generation Rakhamhaeng in Bangkok, Prince of Songkla in Had- Also in connection with IFCI, a feasibility study for an Yai) for translation from English into Thai. English-Arabic system was started in 1984. Last but not least, ARIANE-78 and the related linguistic techniques have been selected as the basis for an !.2.3 BUILDINGOF LARGE SCALE SYSTEMS industrial development, in the framework of the French Large scale systems have been built at the level of labo- National Project. After a preparation phase in 1982-83 ratory prototypes. (ESOPE project), this project was formally launched in The largest system with regard to the vocabulary, and late 1983, with a group of private companies around SG2 the most experimented with, is the Russian-French as industrial partners. system RUB-FRB. According to an estimate made in June 1.2.4 PROJECT SIZES 1984, the dictionaries contain 7000 Russian lexical units (about 17000 words) and 5600 French lexical units AS far as the number of people "engaged in the project" (about 16000 words). is concerned, no easy answer may be given, because of By "words", we mean here lenuna, that is, a normal the multiplicity of the experiments. GETA itself has a form of an occurrence, usually used to identify an item in core of about ten people engaged in software or ling- a "natural" dictionary. The various grammars contain ware, plus a varying number of students, not working Computational

Load more