
Target Preposition Selection – an Experiment with Transformation-Based Learning and Aligned Bilingual Data

Ebba Gustavii
Department of Linguistics and Philology, Uppsala University, Sweden
[email protected]

Abstract. The translation of prepositions is often considered one of the more difficult tasks within the field of machine translation. We describe an experiment using transformation-based learning to induce rules to select the appropriate target language preposition from aligned bilingual data. Results show an accuracy of 84.9%, to be compared with a baseline of 75.5%, where the most frequent translation alternative is always chosen.

1. Introduction

The selection of prepositions may be due to many factors, some of which are mainly idiosyncratic to the language in question, and some of which are dependent on the content that the prepositions contribute. In the field of machine translation, the translation of prepositions is thus often considered to be one of the more difficult issues, and often there are separate modules dedicated to that task.

The many dependencies, often lexical in nature, make it cumbersome, maybe even unfeasible, to manually identify and formalize the constraints necessary to translate prepositions appropriately. With the growing bulk of large parallel corpora, however, supervised machine-learning techniques may be used to facilitate the tedious work: either by revealing patterns hidden in the data, or, more directly, by using the techniques to generate classifiers selecting the appropriate preposition.

Here we will take the latter approach, and apply transformation-based learning to induce rules for correcting prepositions output by a rule-based machine translation system. Selectional constraints will be sought in the target language context. For training, however, solely aligned bilingual corpus data will be used, and one rule sequence will be induced for each source language preposition. Each classifier will be trained on target language prepositions actually being aligned to the respective source language preposition.

The paper is organized as follows: In the second section, we will look into the heterogeneous nature of prepositions and discuss some of its implications on the translation process. In the third section, we will briefly review some previous experiments on related tasks; we will specifically consider whether they have involved the use of aligned bilingual data or not. The fourth section will outline and motivate the main features of the current approach. In the fifth section, transformation-based learning will be introduced. The sixth section presents the actual experiment: the data and tools, the parameter settings and the choice of templates. Section seven is devoted to a presentation of the results. In the final section, some concluding remarks will be given.

2. How Prepositions Translate

Linguists often distinguish two types of prepositional uses: their functional use and their lexical use.[1] In its functional use, a preposition is governed by some other word, most often by a verb as in example 1, but sometimes by an adjective (afraid of), or a noun (belief in).

1. I believe in magic.

[1] Other labels that have been used for approximately the same distinction are: determined vs. non-determined, synsemantic vs. autosemantic, and non-predicative vs. predicative (Tseng, 2002).

112 EAMT 2005 Conference Proceedings Target Language Preposition Selection

The selection of a functional preposition is determined by the governor, and the preposition typically does not carry much semantic information. This is evident when comparing semantically similar verbs taking different prepositions, such as charge NP with NP, blame NP for NP, and accuse NP of NP. When translating a functional preposition, the identity of the source language preposition is thereby of less importance. Rather, the crucial information lies in the co-occurrence patterns of the target language.[2] Working from an interlingual perspective, Miller (1998) suggests that content-free prepositions, which roughly coincide with prepositions in their functional use, need not be represented at the interlingual level at all, but are better treated as a problem of generation. Within a corpus-based strategy, this would correspond to using only monolingual target data as corpus data.

In their lexical use, prepositions are not determined by some governing word, but are selected due to their meaning. In example 2, other prepositions than in are grammatically valid, e.g. under or beside, but these would alter the meaning of the utterance.

2. The rabbit is in the hat.

When translating a lexical preposition, the identity of the source language preposition, or rather the content it carries, is thus of importance; something which implies the need for bilingual data.

The best place to look for clues for the selection of a target preposition is evidently dependent on whether the source preposition is functional or lexical. The optimal strategy would thus be to treat functional and lexical prepositions differently. In practice, however, it turns out to be very difficult to classify prepositional uses into these categories. The verb put, for instance, subcategorizes for a direct object and a locative complement, where the latter is often expressed by a prepositional phrase (e.g. put the vase on the table). The prepositional phrase is thus subcategorized for, but still, the selection of the preposition is semantically based. Moreover, lexical prepositions are not always chosen on the basis of their content only, but may be further constrained by the nouns they govern. We say at the bank and in the store, though the prepositions contribute approximately the same meaning in both cases. (For an in-depth discussion of classificational issues of prepositions, see Tseng (2000).)

When choosing a strategy for selecting the appropriate target preposition, one should thus keep both kinds of prepositional uses in mind - something which implies the need for both bilingual and monolingual data.

[2] This is a bit simplified. The particular syntactic relation that is signaled by the source language preposition may of course be of relevance.

3. Related Work

Several strategies have been suggested for the task of selecting the appropriate target word in context. Most of these, however, address the translation of content words. We will take a brief look at some of the more influential such proposals. For the specific task of selecting the appropriate target preposition, we will take a closer look at a strategy proposed by Kanayama (2002).

The methods suggested for target word selection may be classified according to whether they make use of aligned bilingual corpus data or not.

The obvious advantage of not using aligned bilingual corpora, but monolingual corpora instead, is the vast increase in data available. Dagan and Itai (1994) suggest a statistically-based approach using a monolingual target corpus and a bilingual dictionary. When the bilingual dictionary gives several translation alternatives for a word, the context is considered, and the alternatives are ranked according to how frequently they occur in a similar context in the target language corpus. When there is more than one selection to be made, the order is determined by a constraint propagation algorithm. The results from an evaluation on a small English-Hebrew test set were promising, showing a recall of 68% and a precision of 91%.

Kanayama (2002) presents an algorithm specifically tailored to acquire statistical data for the translation of the Japanese postposition de to the appropriate English preposition. Following Dagan and Itai (1994), he selects the target word on the basis of co-occurrence patterns in the target language. For the experiment, however, a Japanese parsed corpus is also used, from which almost half a million verb phrases with the postposition de are extracted. These are partially translated to English, with the preposition left unspecified. Next, a parsed English newspaper corpus is searched for the partial translations where the unspecified preposition is instantiated as one of six predefined translations of de. When translating de, the most frequent target preposition, given the surrounding verb and noun, is chosen. In case there are no such tuples in the data, only the noun context is considered. As a last resort, a default preposition is selected. The reported total precision was 68.5%, to be compared with a baseline of 41.8% (where the default translation is always chosen).

Dagan and Itai (1994) note that the use of non-aligned corpus data alone makes it impossible to distinguish between instances of a target word that correspond to different source words when gathering context statistics for the target words. Therefore, each instance of a target word will be treated as a translation of all the source words for which it is a potential translation. In both experiments, this has been reported to be a source of errors. For instance, the algorithm suggested by Kanayama selects with over for in work (with/for) the company, since that construction is the most frequent one in the target language corpus. In the particular context though, with is not an appropriate translation of de, but corresponds to the translation of some other adposition.

Approaches to target word selection that make use of aligned bilingual data have also been suggested. Among the more influential ones are Brown et al (1991a; 1991b). In their proposal, the translation process is preceded by a sense-labeling phase, where ambiguous words are labeled with senses that correspond to different translations in the particular target language. A word token is sense-labeled by reference to a single informant site in its context (e.g. the first verb to its right). For each ambiguous word, the algorithm identifies the informant site that partitions the tokens in a way that maximizes the mutual information between the senses and the aligned translations. For instance, when translating the French verb prendre to English, the most informative feature was found to be the accusative object (approximated as the closest succeeding noun). By incorporating the sense-labeling technique into a statistical machine translation system, Brown et al (1991b) increased the number of acceptable translations produced by the system from 37 to 45 sentences out of 100.

In statistical machine translation, aligned bilingual data plays a major role in the selection of target words. Probability estimates are extracted from a translation model and a language model, which are built from an aligned bilingual corpus and a monolingual corpus, respectively. In part, however, the problem noted by Dagan and Itai (1994) still prevails; since the target language model is built on non-aligned data, there are no means to distinguish the different sources when context statistics are gathered for a target word.

4. Main Features of the Current Approach

The aim of the current experiment is to construct classifiers able to correct prepositions output from a rule-based MT-system. We will assume that the rule-based system, as a default, picks the most frequent target language preposition given the source preposition. Our task will thus be to identify the contexts where this default selection should be overridden, and the selected preposition changed for a more appropriate one.[3] We will avoid inducing rules where a preposition should be changed to some other part-of-speech, or where it should be completely removed, since such rules would alter the output structure in an uncontrolled way. The focus will consequently be on situations where prepositions translate as prepositions. This limits the applicability of the strategy to relatively similar languages, such as the ones of the current study (Swedish and English).

To induce the classifiers we will use the symbolic induction algorithm transformation-based learning (TBL) (for a very brief introduction, see section 5). TBL has successfully been applied to a wide range of NLP tasks, e.g. part-of-speech tagging (Brill, 1995), prepositional phrase attachment (Brill & Resnik, 1994), spelling correction (Mangu & Brill, 1997) and word sense disambiguation (Lager & Zinovjeva, 2001).

[3] We will assume that the rule-based system annotates whether prepositions are output as defaults or have been selected by some rule. The post-processing filter should only be applied to the former ones.
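As an illustration, a post-processing filter of this kind would apply an ordered rule sequence to each default-annotated preposition. The experiment does not publish its induced rules, so the rule, the feature name and the context representation below are invented for illustration only:

```python
def correct_preposition(default_prep, context, rule_sequence):
    """Apply an ordered sequence of override rules to a default preposition.
    Each rule (from_prep, to_prep, feature, value) fires when the current
    choice equals from_prep and the context carries the triggering feature."""
    prep = default_prep
    for frm, to, feat, val in rule_sequence:
        if prep == frm and context.get(feat) == val:
            prep = to
    return prep

# Hypothetical induced rule: the default 'on' becomes 'about' when the
# closest preceding verb lemma is 'talk'.
rules = [("on", "about", "preceding_verb_lemma", "talk")]

print(correct_preposition("on", {"preceding_verb_lemma": "talk"}, rules))  # about
print(correct_preposition("on", {"preceding_verb_lemma": "put"}, rules))   # on
```

Since the rules are ordered, a later rule may override the output of an earlier one, mirroring how TBL rule sequences are applied at test time.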


For the current task, where we look for contexts in which a default selection should be overridden, we find TBL to be particularly well-suited; starting with a good heuristic and then, iteratively, defining contexts where previous decisions should be changed, is at the heart of TBL.

Paliouras et al (2000) compare the performance of different machine learning techniques (symbolic induction algorithms, probabilistic classifiers and memory-based classifiers) on word sense disambiguation (WSD), and find the symbolic induction algorithms to give the best results. Since WSD and target word selection are relatively similar tasks, this gives further motivation for the choice of a symbolic induction algorithm for the task at hand.

Since the selection of target language prepositions is to a great extent due to factors idiosyncratic to the target language, we will follow Dagan and Itai (1994), and Kanayama (2002), in looking for selectional constraints in the target language context. To avoid confusing the sources, as may happen when non-aligned data is used, we will however use an aligned bilingual corpus, and induce one rule sequence for each source language preposition. Each classifier will be trained on actual translations (i.e. alignments) only of the respective source language preposition. This strategy, to look for selectional constraints in the target language context while still keeping track of the identity of the source language preposition, may be viewed as a compromise to accommodate both functional and lexical uses of prepositions.

The classifiers will have access to the word form, the lemma and the part-of-speech of the potential contextual triggers. We will primarily accommodate selectional constraints triggered by governing words, or by governed nominals inside the prepositional phrase. The potential governors will be approximated as the closest preceding verb, noun or adjective, and the governed nominals as the closest succeeding noun. With fully parsed data, the governor, as well as the governed nouns, would be recognized with higher precision. The resulting classifiers would however be dependent on having access to fully parsed data, something which is not always output from rule-based MT-systems.

5. Transformation-Based Learning

Transformation-based learning, introduced by Brill (1995), is an error-driven symbolic induction algorithm that learns an ordered set of rules from annotated training data. The format of the induced rules is determined by a set of rule templates that define what features the rules are to condition on. In a first stage, the algorithm labels every instance with its most likely tag (initial annotation). It then iteratively examines every possible rule-instantiation and selects the one which improves the overall tagging the most. The iteration continues until no rule-instantiation reaches a reduction in error above a certain threshold.

In our experiments we use µ-TBL, a flexible and efficient Prolog implementation of a generalized form of transformation-based learning, developed by Lager (1999).

6. Experimental Setup

6.1. Data and Evaluation

As parallel corpus data, we have used a subset of the Swedish-English EUROPARL corpus (Koehn, n.d.). The subset consists of approximately 3

Source Language    Accuracy   Accuracy   Nr of Training
Preposition        TBL        Baseline   Instances
i (in)             87.0%      83.3%      27190
av (of)            89.4%      79.8%      21182
för (for)          80.2%      73.2%      14632
med (with)         88.6%      85.4%       8465
på (on)            81.1%      45.3%       7898
om (on)            73.4%      59.3%       7502
Total:             84.9%      75.5%          -

Table 1. Accuracy for the six most frequent source language prepositions (score threshold 2, accuracy threshold 0.6). Baseline calculated from always selecting the most frequent translation (given in brackets).
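As a rough consistency check, the totals in Table 1 can be approximately reproduced by weighting the per-preposition accuracies with the training-instance counts. The test-set sizes are not reported, so using training counts as weights is an assumption and the match is only approximate:

```python
# (TBL accuracy, baseline accuracy, training instances) from Table 1.
rows = {"i":   (0.870, 0.833, 27190),
        "av":  (0.894, 0.798, 21182),
        "för": (0.802, 0.732, 14632),
        "med": (0.886, 0.854,  8465),
        "på":  (0.811, 0.453,  7898),
        "om":  (0.734, 0.593,  7502)}

n = sum(c for _, _, c in rows.values())
tbl_total = sum(a * c for a, _, c in rows.values()) / n
base_total = sum(b * c for _, b, c in rows.values()) / n
# Both land within a few tenths of a point of the published 84.9% / 75.5%.
print(f"TBL: {tbl_total:.3f}, baseline: {base_total:.3f}")
```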

million tokens in each language, out of which approximately 90% were used for training, and the remaining 10% were left for testing. The corpus was word-aligned with the GIZA++ toolkit (Och & Ney, 2000).

To identify the prepositions, and to allow more general rules to be learnt, the corpus was part-of-speech tagged. For both languages the TnT tagger (Brants, 2000) was used, with a model extracted from the Penn Treebank Wall Street Journal Corpus (Marcus et al, 1994) for the English part, and from the Stockholm-Umeå Corpus (Ejerhed et al, 1992) for the Swedish part (Megyesi, 2002).

In the English part, all verbs, nouns and adjectives were lemmatized with the morphological tool morpha (Minnen et al, 2001).

From the aligned and processed corpus, training and testing sets were extracted for the six most frequent prepositions in the training corpus: i, av, för, med, på and om. For each of those, we extracted the aligned target language prepositions in their sentence context.

The target prepositions in the training and the testing sets were initially annotated with the most frequent translation of their respective source prepositions (as estimated from the training corpus). In so doing, we are simulating the output of an MT-system that always selects the most frequent translation of a source language preposition.

Each rule sequence was evaluated by running the built-in evaluation function in µ-TBL on its respective test set.

6.2. Templates

The templates determine the format of the rules to be learnt, or more specifically, what features should be conditioned by the rules. As was previously noted, we have defined the templates to accommodate selectional constraints triggered either by some governing word, or by a word inside the prepositional phrase. Templates for external triggers are defined to condition the closest preceding noun, verb or adjective. There are also supplementary templates conditioning any immediately preceding word and/or part-of-speech. Templates for internal triggers are defined to condition the closest succeeding noun. Also here, supplementary templates are defined to condition any immediately succeeding word and/or part-of-speech.

6.3. µ-TBL – Parameter Settings

When running the µ-TBL system, the user must decide on a minimum score threshold[4] and a minimum accuracy threshold[5]. The optimal values of these depend on the data at hand, and are best estimated empirically. Here we have only experimented with three values for each: 2, 4, and 6 as possible score thresholds, and 0.6, 0.8 and 1.0 as possible accuracy thresholds.

[4] The score of a rule is its number of positive instances minus its number of negative instances.
[5] The accuracy of a rule is its number of positive instances over its total number of instances.

7. Experimental Results

The best overall results, presented in Table 1, were achieved with a score threshold of 2 and an accuracy threshold of 0.6. The increase in accuracy, as compared to a baseline where the most frequent translation of each preposition is always selected, varies considerably across the different source language prepositions. It ranges from 3.2 to 35.8 percentage points, and is generally higher where the baseline is low. The two prepositions that show the highest baseline are med and i. For these, the most frequent translation is appropriate in more than 80% of the cases. By adding the post-processing filter to these, the accuracy only slightly increases (by 3.2 and 3.7 percentage points respectively). For på and om, on the other hand, the most frequent translation is appropriate in only 45.3% and 59.3% of the respective cases. Adding the post-processing filter to these dramatically improves the accuracy (by 35.8 and 14.1 percentage points respectively).

Intuitively, med and i are more inclined to be used lexically than are på and om. This may, in part, explain why the baseline strategy of simply selecting the most frequent translation is so much more effective for the former two prepositions than for the latter two.

Summing up the results for all six prepositions, the application of the learnt rule sequences gives an accuracy of 84.9%, which corresponds to an increase of 9.4 percentage points as compared to the baseline.
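The greedy learning loop described in section 5, with the score and accuracy thresholds of section 6.3, can be sketched as follows. This is a simplified illustration, not the µ-TBL implementation: the feature names and the toy data are invented, and templates are reduced to single feature lookups:

```python
def learn_rules(instances, templates, score_threshold=2, accuracy_threshold=0.6):
    """Greedy TBL sketch. Each instance is a dict with 'current' (the
    initial most-frequent-translation annotation), 'gold' (the aligned
    target preposition) and context features named by `templates`.
    A rule (feature, value, from_prep, to_prep) rewrites 'current'.
    Score = positives - negatives; accuracy = positives / total."""
    rules = []
    while True:
        # Candidate rules: corrections suggested by mislabeled instances.
        candidates = {(f, inst[f], inst["current"], inst["gold"])
                      for inst in instances for f in templates
                      if inst["current"] != inst["gold"]}
        best, best_score = None, None
        for feat, val, frm, to in candidates:
            matches = [inst for inst in instances
                       if inst["current"] == frm and inst.get(feat) == val]
            pos = sum(inst["gold"] == to for inst in matches)
            neg = len(matches) - pos
            if (pos - neg >= score_threshold
                    and pos / len(matches) >= accuracy_threshold
                    and (best is None or pos - neg > best_score)):
                best, best_score = (feat, val, frm, to), pos - neg
        if best is None:          # no candidate clears both thresholds
            return rules
        feat, val, frm, to = best
        for inst in instances:    # apply the winning rule before iterating
            if inst["current"] == frm and inst.get(feat) == val:
                inst["current"] = to
        rules.append(best)

# Toy data: a default 'on' that aligns to 'about' after the verb 'talk'.
data = [{"current": "on", "gold": "about", "verb": "talk"},
        {"current": "on", "gold": "about", "verb": "talk"},
        {"current": "on", "gold": "on",    "verb": "put"}]
print(learn_rules(data, ["verb"]))  # [('verb', 'talk', 'on', 'about')]
```

One rule sequence of this kind would be induced per source language preposition, each trained only on the target prepositions aligned to that source preposition.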


8. Concluding Remarks

We have reported on an experiment with using transformation-based learning to induce rules to select target language prepositions. Selectional constraints have been sought in the target language context. To avoid losing control of the source language prepositions, we have used aligned bilingual corpus data only, and induced one rule sequence for each source language preposition.

An evaluation, using the built-in evaluation function in µ-TBL, revealed an accuracy of 84.9%, which corresponds to an increase of 9.4 percentage points as compared to the baseline where the most frequent translation is always selected.

It still remains to be investigated how the application of the rule sequences would perform on data output from a real MT-system. The rules condition on target words in the context of the prepositions, and the applicability of the rules is thus dependent on the translation of the surrounding words. The effect of this is something which can only be estimated empirically.

9. References

BRANTS, T. (2000). 'TnT – a statistical part-of-speech tagger'. In Proceedings of the 6th Applied NLP Conference (pp. 224-231), Seattle, USA.

BRILL, E. and P. Resnik. (1994). 'A rule-based approach to prepositional phrase attachment disambiguation'. In Proceedings of the 15th Conference on Computational Linguistics (pp. 1198-1204), Kyoto, Japan.

BRILL, E. (1995). 'Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging'. Computational Linguistics, (21:4):543-566.

BROWN, P., S. Della Pietra, V. Della Pietra and R. Mercer. (1991a). 'A statistical approach to sense disambiguation in machine translation'. In Proceedings of the DARPA Workshop of Speech and Natural Language (pp. 146-151), Pacific Grove, California.

BROWN, P., S. Della Pietra, V. Della Pietra and R. Mercer. (1991b). 'Word Sense Disambiguation using statistical methods'. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics (pp. 264-270), Berkeley, California.

DAGAN, I. and A. Itai. (1994). 'Word Sense Disambiguation Using a Second Language Monolingual Corpus'. Computational Linguistics, (20:4):563-596.

EJERHED, E., G. Källgren, O. Wennstedt and M. Åström. (1992). 'Linguistic Annotation System of the Stockholm-Umeå Project'. Technical Report, Department of General Linguistics, University of Umeå.

KANAYAMA, H. (2002). 'An Iterative Algorithm for Translation Acquisition of Adpositions'. In Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation (pp. 85-95), Keihanna, Japan.

KOEHN, P. (n.d.). 'Europarl: A Multilingual Corpus for Evaluation of Machine Translation'. Draft, Unpublished.

LAGER, T. (1999). 'The µ-TBL System: Logic Programming tools for Transformation-Based Learning'. In Proceedings of the 3rd International Workshop on Computational Natural Language Learning (pp. 33-42), Bergen, Norway.

LAGER, T. and N. Zinovjeva. (2001). 'Sense and Deduction: The Power of Peewees Applied to the SENSEVAL-2 Swedish Lexical Sample Task'. In Proceedings of SENSEVAL-2: 2nd International Workshop on Evaluating Word Sense Disambiguation Systems, Toulouse, France.

MANGU, L. and E. Brill. (1997). 'Automatic rule acquisition for spelling correction'. In Proceedings of the 14th International Conference on Machine Learning (pp. 187-194), Nashville, Tennessee.

MARCUS, M., B. Santorini and M.-A. Marcinkiewicz. (1994). 'Building a large annotated corpus of English: The Penn Treebank'. Computational Linguistics, 19(2):313-330.

MEGYESI, B. (2002). 'Data-Driven Syntactic Analysis – Methods and Applications for Swedish'. PhD thesis. Department of Speech, Music and Hearing, KTH, Stockholm, Sweden.

MILLER, K. (1998). 'From above to under: Enabling the Generation of the Correct Preposition from an Interlingual Representation'. In Proceedings of the AMTA/SIG-IL Second Workshop on Interlinguas, Langhorne, Pennsylvania.

MINNEN, G., J. Carroll and D. Pearce. (2001). 'Applied morphological processing of English'. Journal of Natural Language Processing, (7:3):207-223.

OCH, F. and H. Ney. (2000). 'Improved Statistical Alignment Models'. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (pp. 440-447), Hong Kong, China.


PALIOURAS, G., V. Karkaletsis, I. Androutsopoulos and C.D. Spyropoulos. (2000). 'Learning Rules for Large-Vocabulary Word Sense Disambiguation: A Comparison of Various Classifiers'. In Proceedings of the 2nd International Conference on Natural Language Processing (pp. 383-394), Patra, Greece.

TSENG, J. L. (2000). 'The Representation and Selection of Prepositions'. PhD Thesis, University of Edinburgh.
