Cross-lingual Transfer of Semantic Role Labeling Models

Mikhail Kozhevnikov and Ivan Titov
Saarland University, Postfach 15 11 50, 66041 Saarbrücken, Germany
{mkozhevn|titov}@mmci.uni-saarland.de

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1190–1200, Sofia, Bulgaria, August 4-9 2013. © 2013 Association for Computational Linguistics

Abstract

Semantic Role Labeling (SRL) has become one of the standard tasks of natural language processing and proven useful as a source of information for a number of other applications. We address the problem of transferring an SRL model from one language to another using a shared feature representation. This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline. We also consider the contribution of different aspects of the feature representation to the performance of the model and discuss the practical applicability of this method.

1 Background and Motivation

Semantic role labeling has proven useful in many natural language processing tasks, such as question answering (Shen and Lapata, 2007; Kaisser and Webber, 2007), textual entailment (Sammons et al., 2009), machine translation (Wu and Fung, 2009; Liu and Gildea, 2010; Gao and Vogel, 2011) and dialogue systems (Basili et al., 2009; van der Plas et al., 2009).

Multiple models have been designed to automatically predict semantic roles, and a considerable amount of data has been annotated to train these models, if only for a few more popular languages. As the annotation is costly, one would like to leverage existing resources to minimize the human effort required to construct a model for a new language.

A number of approaches to the construction of semantic role labeling models for new languages have been proposed. On one end of the scale is unsupervised SRL, such as Grenager and Manning (2006), which requires some expert knowledge, but no labeled data. It clusters together arguments that should bear the same semantic role, but does not assign a particular role to each cluster. On the other end is annotating a new dataset from scratch. There are also intermediate options, which often make use of similarities between languages. This way, if an accurate model exists for one language, it should help simplify the construction of a model for another, related language.

The approaches in this third group often use parallel data to bridge the gap between languages. Cross-lingual annotation projection systems (Padó and Lapata, 2009), for example, propagate information directly via alignment links. However, they are very sensitive to the quality of parallel data, as well as the accuracy of a source-language model on it.

An alternative approach, known as cross-lingual model transfer, or cross-lingual model adaptation, consists of modifying a source-language model to make it directly applicable to a new language. This usually involves constructing a shared feature representation across the two languages. McDonald et al. (2011) successfully apply this idea to the transfer of dependency parsers, using part-of-speech tags as the shared representation of words. A later extension of Täckström et al. (2012) enriches this representation with cross-lingual word clusters, considerably improving the performance.

In the case of SRL, a shared representation that is purely syntactic is likely to be insufficient, since structures with different semantics may be realized by the same syntactic construct, for example "in August" vs "in Britain". However, with the help of recently introduced cross-lingual word representations, such as the cross-lingual clustering mentioned above or the cross-lingual distributed word representations of Klementiev et al. (2012), we may be able to transfer models of shallow semantics in a similar fashion.

In this work we construct a shared feature representation for a pair of languages, employing cross-lingual representations of syntactic and lexical information, train a semantic role labeling model on one language and apply it to the other one. This approach yields an SRL model for a new language at a very low cost, effectively requiring only a source language model and parallel data.

We evaluate on five (directed) language pairs – EN-ZH, ZH-EN, EN-CZ, CZ-EN and EN-FR, where EN, FR, CZ and ZH denote English, French, Czech and Chinese, respectively. The transferred model is compared against two baselines: an unsupervised SRL system and a model trained on the output of a cross-lingual annotation projection system.

In the next section we will describe our setup, then in section 3 present the shared feature representation we use, discuss the evaluation data and other technical aspects in section 4, present the results and conclude with an overview of related work.

2 Setup

The purpose of the study is not to develop yet another semantic role labeling system – any existing SRL system can (after some modification) be used in this setup – but to assess the practical applicability of cross-lingual model transfer to this problem, compare it against the alternatives and identify its strong/weak points depending on a particular setup.

2.1 Semantic Role Labeling Model

We consider the dependency-based version of semantic role labeling as described in Hajič et al. (2009) and transfer an SRL model from one language to another. We only consider verbal predicates and ignore the predicate disambiguation stage. We also assume that the predicate identification information is available – in most languages it can be obtained using a relatively simple heuristic based on part-of-speech tags.

The model performs argument identification and classification (Johansson and Nugues, 2008) separately in a pipeline – first each candidate is classified as being or not being a head of an argument phrase with respect to the predicate in question, and then each of the arguments is assigned a role from a given inventory. The model is factorized over arguments – the decisions regarding the classification of different arguments are made independently of each other.

With respect to the use of syntactic annotation we consider two options: using an existing dependency parser for the target language and obtaining one by means of cross-lingual transfer (see section 4.2).

Following McDonald et al. (2011), we assume that a part-of-speech tagger is available for the target language.

2.2 SRL in the Low-resource Setting

Several approaches have been proposed to obtain an SRL model for a new language with little or no manual annotation. Unsupervised SRL models (Lang and Lapata, 2010) cluster the arguments of predicates in a given corpus according to their semantic roles. The performance of such models can be impressive, especially for those languages where semantic roles correlate strongly with the syntactic relation of the argument to its predicate. However, assigning meaningful role labels to the resulting clusters requires additional effort and the model's parameters generally need some adjustment for every language.

If the necessary resources are already available for a closely related language, they can be utilized to facilitate the construction of a model for the target language. This can be achieved either by means of cross-lingual annotation projection (Yarowsky et al., 2001) or by cross-lingual model transfer (Zeman and Resnik, 2008). This last approach is the one we are considering in this work, and the other two options are treated as baselines. The unsupervised model will be further referred to as UNSUP and the projection baseline as PROJ.

2.3 Evaluation Measures

We use the F1 measure as a metric for the argument identification stage and accuracy as an aggregate measure of argument classification performance. When comparing to the unsupervised SRL system, the clustering evaluation measures are used instead. These are purity and collocation
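As a concrete illustration, the two clustering measures (defined formally below) can be computed in a few lines of Python. This is a sketch for exposition only, not the evaluation code used in the paper; clusterings are represented as parallel lists of induced-cluster and gold-role ids, one entry per argument.

```python
from collections import Counter

def purity(induced, gold):
    # For each induced cluster, count the arguments belonging to the
    # best-matching gold role, then normalize by the number of arguments.
    n = len(induced)
    overlap = 0
    for c in set(induced):
        counts = Counter(g for i, g in zip(induced, gold) if i == c)
        overlap += max(counts.values())
    return overlap / n

def collocation(induced, gold):
    # Symmetric to purity: best-matching induced cluster per gold role.
    return purity(gold, induced)

def f1_c(induced, gold):
    # Harmonic mean of purity and collocation (the F1^c of section 2.3).
    pu, co = purity(induced, gold), collocation(induced, gold)
    return 2 * pu * co / (pu + co)
```

For example, a single induced cluster covering two equally frequent gold roles has purity 0.5 but collocation 1.0, giving F1^c of about 0.67.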

PU = (1/N) Σ_i max_j |G_j ∩ C_i|

CO = (1/N) Σ_j max_i |G_j ∩ C_i|

where C_i is the set of arguments in the i-th induced cluster, G_j is the set of arguments in the j-th gold cluster and N is the total number of arguments. We report the harmonic mean of the two (Lang and Lapata, 2011) and denote it F1^c to avoid confusing it with the supervised metric.

3 Model Transfer

The idea of this work is to abstract the model away from the particular source language and apply it to a new one. This setup requires that we use the same feature representation for both languages; for example, part-of-speech tags and dependency relation labels should be from the same inventory.

Some features are not applicable to certain languages because the corresponding phenomena are absent in them. For example, consider a strongly inflected language and an analytic one. While the latter can usually convey the information encoded in the word form in the former one (number, gender, etc.), finding a shared feature representation for such information is non-trivial. In this study we will confine ourselves to those features that are applicable to all languages in question, namely: part-of-speech tags, syntactic dependency structures and representations of the word's identity.

3.1 Lexical Information

We train a model on one language and apply it to a different one. In order for this to work, the words of the two languages have to be mapped into a common feature space. It is also desirable that closely related words from both languages have similar representations in this space.

Word mapping. The first option is simply to use the source language words as the shared representation. Here every source language word would have itself as its representation and every target word would map into a source word that corresponds to it. In other words, we supply the model with a gloss of the target sentence. The mapping (bilingual dictionary) we use is derived from a word-aligned parallel corpus, by identifying, for each word in the target language, the word in the source language it is most often aligned to.

Cross-lingual clusters. There is no guarantee that each of the words in the evaluation data is present in our dictionary, nor that the corresponding source-language word is present in the training data, so the model would benefit from the ability to generalize over closely related words. This can, for example, be achieved by using the cross-lingual word clusters induced in Täckström et al. (2012). We incorporate these clusters as features into our model.

3.2 Syntactic Information

Part-of-speech tags. We map part-of-speech tags into the universal tagset following Petrov et al. (2012). This may have a negative effect on the performance of a monolingual model, since most part-of-speech tagsets are more fine-grained than the universal POS tags considered here. For example, the Penn Treebank inventory contains 36 tags and the universal POS tagset only 12. Since the finer-grained POS tags often reflect more language-specific phenomena, however, they would only be useful for very closely related languages in the cross-lingual setting.

The universal part-of-speech tags used in evaluation are derived from gold-standard annotation for all languages except French, where predicted ones had to be used instead.

Dependency structure. Another important aspect of syntactic information is the dependency structure. Most dependency relation inventories are language-specific, and finding a shared representation for them is a challenging problem. One could map dependency relations into a simplified form that would be shared between languages, as it is done for part-of-speech tags in Petrov et al. (2012). The extent to which this would be useful, however, depends on the similarity of the syntactic-semantic interfaces of the languages in question.

In this work we discard the dependency relation labels where the inventories do not match and only consider the unlabeled syntactic dependency graph. Some discrepancies, such as variations in attachment order, may be present even there, but this does not appear to be the case with the datasets we use for evaluation. If a target language is poor in resources, one can obtain a dependency parser for the target language by means of cross-lingual model transfer (Zeman and Resnik, 2008). We

take this into account and evaluate both using the original dependency structures and the ones obtained by means of cross-lingual model transfer.

3.3 The Model

The model we use is based on that of Björkelund et al. (2009). It is comprised of a set of linear classifiers trained using Liblinear (Fan et al., 2008). The feature model was modified to accommodate the cross-lingual cluster features and the reranker component was not used.

We do not model the interaction between different argument roles in the same predicate. While this has been found useful, in the cross-lingual setup one has to be careful with the assumptions made. For example, modeling the sequence of roles using a Markov chain (Thompson et al., 2003) may not work well in the present setting, especially between distant languages, as the order of arguments is not necessarily preserved. Most constraints that prove useful for SRL (Chang et al., 2007) also require customization when applied to a new language, and some rely on language-specific resources, such as a valency lexicon. Taking into account the interaction between different arguments of a predicate is likely to improve the performance of the transferred model, but this is outside the scope of this work.

3.4 Feature Selection

Compatibility of feature representations is necessary but not sufficient for successful model transfer. We also have to make sure that the features we use are predictive of similar outcomes in the two languages.

Depending on the pair of languages in question, different aspects of the feature representation will retain or lose their predictive power. We can be reasonably certain that the identity of an argument word is predictive of its semantic role in any language, but it might or might not be true of, for example, the word directly preceding the argument word. It is therefore important to prevent the model from capturing overly specific aspects of the source language, which we do by confining the model to first-order features. We also avoid feature selection, which, performed on the source language, is unlikely to help the model to better generalize to the target one. The experiments confirm that feature selection and the use of second-order features degrade the performance of the transferred model.

3.5 Feature Groups

For each word, we use its part-of-speech tag, cross-lingual cluster id, word identity (glossed, when evaluating on the target language) and its dependency relation to its parent. Features associated with an argument word include the attributes of the predicate word, the argument word, its parent, siblings and children, and the words directly preceding and following it. Also included are the sequences of part-of-speech tags and dependency relations on the path between the predicate and the argument.

Since we are also interested in the impact of different aspects of the feature representation, we divide the features into groups as summarized in table 1 and evaluate their respective contributions to the performance of the model. If a feature group is enabled, the model has access to the corresponding source of information. For example, if only the POS group is enabled, the model relies on the part-of-speech tags of the argument, the predicate and the words to the right and left of the argument word. If Synt is enabled too, it also uses the POS tags of the argument's parent, children and siblings.

  POS     part-of-speech tags
  Synt    unlabeled dependency graph
  Cls     cross-lingual word clusters
  Gloss   glossed word forms
  Deprel  dependency relations

Table 1: Feature groups.

Word order information constitutes an implicit group that is always available. It includes the Position feature, which indicates whether the argument is located to the left or to the right of the predicate, and allows the model to look up the attributes of the words directly preceding and following the argument word. The model we compare against the baselines uses all applicable feature groups (Deprel is only used in EN-CZ and CZ-EN experiments with original syntax).

4 Evaluation

4.1 Datasets and Preprocessing

Evaluation of the cross-lingual model transfer requires a rather specific kind of dataset. Namely, the data in both languages has to be annotated

with the same set of semantic roles following the same (or compatible) guidelines, which is seldom the case. We have identified three language pairs for which such resources are available: English-Chinese, English-Czech and English-French.

The evaluation datasets for English and Chinese are those from the CoNLL Shared Task 2009 (Hajič et al., 2009) (henceforth CoNLL-ST). Their annotation in the CoNLL-ST is not identical, but the guidelines for "core" semantic roles are similar (Kingsbury et al., 2004), so we evaluate only on core roles here.

The data for the second language pair is drawn from the Prague Czech-English Dependency Treebank 2.0 (Hajič et al., 2012), which we converted to a format similar to that of CoNLL-ST (see http://www.ml4nlp.de/code-and-data/treex2conll). The original annotation uses the tectogrammatical representation (Hajič, 2002) and an inventory of semantic roles (or functors), most of which are interpretable across various predicates. Also note that the syntactic annotation of English and Czech in PCEDT 2.0 is quite similar (to the extent permitted by the difference in the structure of the two languages) and we can use the dependency relations in our experiments.

For English-French, the English CoNLL-ST dataset was used as a source and the model was evaluated on the manually annotated dataset from van der Plas et al. (2011). The latter contains one thousand sentences from the French part of the Europarl (Koehn, 2005) corpus, annotated with semantic roles following an adapted version of the PropBank (Palmer et al., 2005) guidelines. The authors perform annotation projection from English to French, using a joint model of syntax and semantics and employing heuristics for filtering. We use a model trained on the output of this projection system as one of the baselines. The evaluation dataset is relatively small in this case, so we perform the transfer only one-way, from English to French.

The part-of-speech tags in all datasets were replaced with the universal POS tags of Petrov et al. (2012). For Czech, we have augmented the mappings to account for the tags that were not present in the datasets from which the original mappings were derived. Namely, tag "t" is mapped to "VERB" and "Y" – to "PRON".

We use parallel data to construct the bilingual dictionary used in word mapping, as well as in the projection baseline. For English-Czech and English-French, the data is drawn from Europarl (Koehn, 2005); for English-Chinese, from MultiUN (Eisele and Chen, 2010). The word alignments were obtained using GIZA++ (Och and Ney, 2003) and the intersection heuristic.

4.2 Syntactic Transfer

In the low-resource setting, we cannot always rely on the availability of an accurate dependency parser for the target language. If one is not available, the natural solution would be to use cross-lingual model transfer to obtain it.

Unfortunately, the models presented in the previous work, such as Zeman and Resnik (2008), McDonald et al. (2011) and Täckström et al. (2012), were not made available, so we reproduced the direct transfer algorithm of McDonald et al. (2011), using Malt parser (Nivre, 2008) and the same set of features. We did not reimplement the projected transfer algorithm, however, and used the default training procedure instead of perceptron-based learning. The dependency structure thus obtained is, of course, only a rough approximation – even a much more sophisticated algorithm may not perform well when transferring syntax between such languages as Czech and English, given the inherent difference in their structure. The scores are shown in table 2.

  Setup   UAS, %
  EN-ZH   35
  ZH-EN   42
  EN-CZ   36
  CZ-EN   39
  EN-FR   67

Table 2: Syntactic transfer accuracy, unlabeled attachment score (percent). Note that in case of French we evaluate against the output of a supervised system, since manual annotation is not available for this dataset. This score does not reflect the true performance of syntactic transfer.

We will henceforth refer to the syntactic annotations that were provided with the datasets as original, as opposed to the annotations obtained by means of syntactic transfer.
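The construction of the bilingual dictionary used for word mapping (section 3.1) – each target word mapped to the source word it is most frequently aligned to – can be sketched in a few lines. This is an illustrative Python sketch, not the actual preprocessing code; the alignment links would come from word-aligned parallel data such as the GIZA++ output mentioned above.

```python
from collections import Counter, defaultdict

def build_word_mapping(aligned_pairs):
    """Build a gloss dictionary: map each target word to the source
    word it is most often aligned to. `aligned_pairs` is an iterable
    of (source_word, target_word) alignment links."""
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[tgt][src] += 1
    # For every target word, pick the most frequent source counterpart.
    return {tgt: src_counts.most_common(1)[0][0]
            for tgt, src_counts in counts.items()}
```

At application time, each target-language word is simply looked up in this dictionary to produce the "gloss" representation the source-language model was trained on.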

4.3 Baselines

Unsupervised Baseline: We are using a version of the unsupervised semantic role induction system of Titov and Klementiev (2012a) adapted to the shared feature representation considered, in order to make the scores comparable with those of the transfer model and, more importantly, to enable evaluation on transferred syntax. Note that the original system, tailored to a more expressive language-specific syntactic representation and equipped with heuristics to identify active/passive voice and other phenomena, achieves higher scores than those we report here.

Projection Baseline: The projection baseline we use for English-Czech and English-Chinese is a straightforward one: we label the source side of a parallel corpus using the source-language model, then identify those verbs on the target side that are aligned to a predicate, mark them as predicates and propagate the argument roles in the same fashion. A model is then trained on the resulting training data and applied to the test set. For English-French we instead use the output of the fully featured projection model of van der Plas et al. (2011), published in the CLASSiC project.

5 Results

In order to ensure that the results are consistent, the test sets, except for the French one, were partitioned into five equal parts (of 5 to 10 thousand sentences each, depending on the dataset) and the evaluation performed separately on each one. All evaluation figures for English, Czech or Chinese below are the average values over the five subsets. In the case of French, the evaluation dataset is too small to split further, so instead we ran the evaluation five times on a randomly selected 80% sample of the evaluation data and averaged over those. In both cases the results are consistent over the subsets: the standard deviation does not exceed 0.5% for the transfer system and projection baseline and 1% for the unsupervised system.

5.1 Argument Identification

We summarize the results in table 3. Argument identification is known to rely heavily on syntactic information, so it is unsurprising that it proves inaccurate when transferred syntax is used. Our simple projection baseline suffers from the same problem. Even with original syntactic information available, the performance of argument identification is moderate. Note that the model of van der Plas et al. (2011), though relying on more expressive syntax, only outperforms the transferred system by 3% (F1) on this task.

  Setup   Syntax  TRANS  PROJ
  EN-ZH   trans   34.5   13.9
  ZH-EN   trans   32.6   15.6
  EN-CZ   trans   46.3   12.4
  CZ-EN   trans   42.3   22.2
  EN-FR   trans   61.6   43.5
  EN-ZH   orig    51.7   19.6
  ZH-EN   orig    53.2   29.7
  EN-CZ   orig    63.9   59.3
  CZ-EN   orig    67.3   60.9
  EN-FR   orig    71.0   51.3

Table 3: Argument identification, transferred model vs. projection baseline, F1.

Most unsupervised SRL approaches assume that the argument identification is performed by some external means, for example heuristically (Lang and Lapata, 2011). Such heuristics or unsupervised approaches to argument identification (Abend et al., 2009) can also be used in the present setup.

5.2 Argument Classification

In the following tables, the TRANS column contains the results for the transferred system, UNSUP for the unsupervised baseline and PROJ for the projection baseline. We highlight in bold the higher score where the difference exceeds twice the maximum of the standard deviation estimates of the two results.

Table 4 presents the unsupervised evaluation results.

  Setup   Syntax  TRANS  UNSUP
  EN-ZH   trans   83.3   73.9
  ZH-EN   trans   79.2   67.6
  EN-CZ   trans   66.4   66.1
  CZ-EN   trans   68.2   68.7
  EN-FR   trans   74.6   65.1
  EN-ZH   orig    84.5   89.7
  ZH-EN   orig    79.2   83.0
  EN-CZ   orig    74.1   74.0
  CZ-EN   orig    74.6   76.7
  EN-FR   orig    73.3   72.3

Table 4: Argument classification, transferred model vs. unsupervised baseline in terms of the clustering metric F1^c (see section 2.3).
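The rule for highlighting scores in these tables (bold where the difference exceeds twice the larger of the two standard deviation estimates) can be stated as a small helper function. This is an illustrative sketch of the criterion, not code from any of the evaluated systems.

```python
def highlight_higher(score_a, std_a, score_b, std_b):
    """Return 'a' or 'b' if that score exceeds the other by more than
    twice the maximum of the standard deviation estimates, else None
    (meaning neither score would be bolded)."""
    threshold = 2 * max(std_a, std_b)
    if score_a - score_b > threshold:
        return "a"
    if score_b - score_a > threshold:
        return "b"
    return None
```

For instance, with the reported deviations of at most 0.5% (1% for the unsupervised system), 84.5 vs. 89.7 would be highlighted, while 74.1 vs. 74.0 would not.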

Note that the unsupervised model performs as well as the transferred one or better where the original syntactic dependencies are available. In the more realistic scenario with transferred syntax, however, the transferred model proves more accurate.

In table 5 we compare the transferred system with the projection baseline. It is easy to see that the scores vary strongly depending on the language pair, due to both the difference in the annotation scheme used and the degree of relatedness between the languages. The drop in performance when transferring the model to another language is large in every case, though; see table 6.

  Setup   Syntax  TRANS  PROJ
  EN-ZH   trans   70.1   69.2
  ZH-EN   trans   65.6   61.3
  EN-CZ   trans   50.1   46.3
  CZ-EN   trans   53.3   54.7
  EN-FR   trans   65.1   66.1
  EN-ZH   orig    71.7   69.7
  ZH-EN   orig    66.1   64.4
  EN-CZ   orig    59.0   53.2
  CZ-EN   orig    61.0   60.8
  EN-FR   orig    63.0   68.0

Table 5: Argument classification, transferred model vs. projection baseline, accuracy.

  Setup   Target  Source
  EN-ZH   71.7    87.1
  ZH-EN   66.1    86.2
  EN-CZ   59.0    80.1
  CZ-EN   61.0    75.4
  EN-FR   63.0    82.5

Table 6: Model accuracy on the source and target language using original syntax. The source language scores for English vary between language pairs because of the difference in syntactic annotation and role subset used.

We also include the individual F1 scores for the top-10 most frequent labels for EN-CZ transfer with original syntax in table 7. The model provides meaningful predictions here, despite low overall accuracy. Most of the labels (see http://ufal.mff.cuni.cz/~toman/pcedt/en/functors.html) are self-explanatory: Patient (PAT), Actor (ACT), Time (TWHEN), Effect (EFF), Location (LOC), Manner (MANN), Addressee (ADDR), Extent (EXT). CPHR marks the nominal part of a complex predicate, as in "to have [a plan]_CPHR", and DIR3 indicates destination.

  Label   Freq.   F1    Re.   Pr.
  PAT     14707   69.4  70.0  68.7
  ACT     14303   81.1  81.7  80.4
  TWHEN   3631    70.6  65.1  77.0
  EFF     2601    45.4  67.2  34.3
  LOC     1990    41.8  35.3  51.3
  MANN    1208    54.0  63.8  46.9
  ADDR    1045    30.2  34.4  26.8
  CPHR    791     20.4  13.1  45.0
  EXT     708     42.2  40.5  44.1
  DIR3    695     20.1  17.3  23.9

Table 7: EN-CZ transfer (with original syntax), F1, recall and precision for the top-10 most frequent roles.

5.3 Additional Experiments

We now evaluate the contribution of different aspects of the feature representation to the performance of the model. Table 8 contains the results for English-French.

  Features                Orig  Trans
  POS                     47.5  47.5
  POS, Synt               53.0  53.1
  POS, Cls                53.7  53.7
  POS, Gloss              63.7  63.7
  POS, Synt, Cls          55.9  56.4
  POS, Synt, Gloss        65.2  66.3
  POS, Cls, Gloss         61.5  61.5
  POS, Synt, Cls, Gloss   63.0  65.1

Table 8: EN-FR model transfer accuracy with different feature subsets, using original and transferred syntactic information.

The fact that the model performs slightly better with transferred syntax may be explained by two factors. Firstly, as we already mentioned, the original syntactic annotation is also produced automatically. Secondly, in the model transfer setup it is more important how closely the syntactic-semantic interface on the target side resembles that on the source side than how well it matches the "true" structure of the target language, and in this respect a transferred dependency parser may have an advantage over one trained on target-language data.

The high impact of the Gloss features here

1196 may be partly attributed to the fact that the map- diction (Spreyer and Frank, 2008). Interestingly, ping is derived from the same corpus as the eval- it has also been used to propagate morphosyntac- uation data – Europarl (Koehn, 2005) – and partly tic information between old and modern versions by the similarity between English and French in of the same language (Meyer, 2011). terms of word order, usage of articles and prepo- Cross-lingual model transfer methods (McDon- sitions. The moderate contribution of the cross- ald et al., 2011; Zeman and Resnik, 2008; Durrett lingual cluster features are likely due to the insuf- et al., 2012; Søgaard, 2011; Lopez et al., 2008) ficient granularity of the clustering for this task. have also been receiving much attention recently. For more distant language pairs, the contribu- The basic idea behind model transfer is similar to tions of individual feature groups are less inter- that of cross-lingual annotation projection, as we pretable, so we only highlight a few observations. can see from the way parallel data is used in, for First of all, both EN-CZ and CZ-EN benefit notice- example, McDonald et al. (2011). ably from the use of the original syntactic annota- A crucial component of direct transfer ap- tion, including dependency relations, but not from proaches is the unified feature representation. the transferred syntax, most likely due to the low There are at least two such representations of syntactic transfer performance. Both perform bet- lexical information (Klementiev et al., 2012; ter when lexical information is available, although Tackstr¨ om¨ et al., 2012), but both work on word the improvement is not as significant as in the case level. This makes it hard to account for phenom- of French – only up to 5%. 
The situation with Chinese is somewhat complicated in that adding lexical information here fails to yield an improvement in terms of the metric considered. This is likely due to the fact that we consider only the core roles, which can usually be predicted with high accuracy based on syntactic information alone.

6 Related Work

Development of robust statistical models for core NLP tasks is a challenging problem, and adaptation of existing models to new languages presents a viable alternative to exhaustive annotation for each language. Although the models thus obtained are generally imperfect, they can be further refined for a particular language and domain using techniques such as active learning (Settles, 2010; Chen et al., 2011).

Cross-lingual annotation projection (Yarowsky et al., 2001) approaches have been applied extensively to a variety of tasks, including POS tagging (Xi and Hwa, 2005; Das and Petrov, 2011), morphology segmentation (Snyder and Barzilay, 2008), verb classification (Merlo et al., 2002), mention detection (Zitouni and Florian, 2008), LFG parsing (Wróblewska and Frank, 2009), relation detection (Kim et al., 2010), SRL (Padó and Lapata, 2009; van der Plas et al., 2011; Annesi and Basili, 2010; Tonelli and Pianta, 2008), dependency parsing (Naseem et al., 2012; Ganchev et al., 2009; Smith and Eisner, 2009; Hwa et al., 2005) or temporal relation prediction (Spreyer and Frank, 2008).

Some phenomena are expressed differently in the languages considered: for example, the syntactic function of a certain word may be indicated by a preposition, inflection or word order, depending on the language. Accurate representation of such information would require an extra level of abstraction (Hajič, 2002).

A side-effect of using adaptation methods is that we are forced to use the same annotation scheme for the task in question (SRL, in our case), which in turn simplifies the development of cross-lingual tools for downstream tasks. Such representations are also likely to be useful in machine translation.

Unsupervised semantic role labeling methods (Lang and Lapata, 2010; Lang and Lapata, 2011; Titov and Klementiev, 2012a; Lorenzo and Cerisara, 2012) also constitute an alternative to cross-lingual model transfer. For an overview of semi-supervised approaches we refer the reader to Titov and Klementiev (2012b).

7 Conclusion

We have considered the cross-lingual model transfer approach as applied to the task of semantic role labeling and observed that for closely related languages it performs comparably to annotation projection approaches. It allows one to quickly construct an SRL model for a new language without manual annotation or language-specific heuristics, provided an accurate model is available for one of the related languages along with a certain amount of parallel data for the two languages. While an-
notation projection approaches require sentence- and word-aligned parallel data and crucially depend on the accuracy of the syntactic parsing and SRL on the source side of the parallel corpus, cross-lingual model transfer can be performed using only a bilingual dictionary.

Unsupervised SRL approaches have their advantages, in particular when no annotated data is available for any of the related languages and there is a syntactic parser available for the target one, but the annotation they produce is not always sufficient. In applications such as Information Retrieval it is preferable to have precise labels, rather than just clusters of arguments, for example.

Also note that when applying cross-lingual model transfer in practice, one can improve upon the performance of the simplistic model we use for evaluation, for example by picking the features manually, taking into account the properties of the target language. Domain adaptation techniques can also be employed to adjust the model to the target language.

Acknowledgments

The authors would like to thank Alexandre Klementiev and Ryan McDonald for useful suggestions and Täckström et al. (2012) for sharing the cross-lingual word representations. This research is supported by the MMCI Cluster of Excellence.

References

Omri Abend, Roi Reichart, and Ari Rappoport. 2009. Unsupervised argument identification for semantic role labeling. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL '09, pages 28–36, Stroudsburg, PA, USA. Association for Computational Linguistics.

Paolo Annesi and Roberto Basili. 2010. Cross-lingual alignment of FrameNet annotations through hidden Markov models. In Proceedings of the 11th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing'10, pages 12–25, Berlin, Heidelberg. Springer-Verlag.

Roberto Basili, Diego De Cao, Danilo Croce, Bonaventura Coppola, and Alessandro Moschitti. 2009. Cross-language frame semantics transfer in bilingual corpora. In Alexander F. Gelbukh, editor, Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing, pages 332–345.

Anders Björkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 43–48, Boulder, Colorado, June. Association for Computational Linguistics.

Ming-Wei Chang, Lev Ratinov, and Dan Roth. 2007. Guiding semi-supervision with constraint-driven learning. In ACL.

Chenhua Chen, Alexis Palmer, and Caroline Sporleder. 2011. Enhancing active learning for semantic role labeling via compressed dependency trees. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 183–191, Chiang Mai, Thailand, November. Asian Federation of Natural Language Processing.

Dipanjan Das and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. In Proceedings of the Association for Computational Linguistics.

Greg Durrett, Adam Pauls, and Dan Klein. 2012. Syntactic transfer using a bilingual lexicon. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1–11, Jeju Island, Korea, July. Association for Computational Linguistics.

Andreas Eisele and Yu Chen. 2010. MultiUN: A multilingual corpus from United Nation documents. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). European Language Resources Association (ELRA).

Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. 2008. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871–1874.

Kuzman Ganchev, Jennifer Gillenwater, and Ben Taskar. 2009. Dependency grammar induction via bitext projection constraints. In Proceedings of the 47th Annual Meeting of the ACL, pages 369–377, Stroudsburg, PA, USA. Association for Computational Linguistics.

Qin Gao and Stephan Vogel. 2011. Corpus expansion for statistical machine translation with semantic role label substitution rules. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 294–298, Portland, Oregon, USA.

Trond Grenager and Christopher D. Manning. 2006. Unsupervised discovery of a statistical verb lexicon. In Proceedings of EMNLP.

Jan Hajič. 2002. Tectogrammatical representation: Towards a minimal transfer in machine translation. In Robert Frank, editor, Proceedings of the 6th International Workshop on Tree Adjoining Grammars and Related Frameworks (TAG+6), pages 216–226, Venezia. Università di Venezia.

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, Pavel Straňák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 1–18, Boulder, Colorado.

Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2012. Announcing Prague Czech-English Dependency Treebank 2.0. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey, May. European Language Resources Association (ELRA).

Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering, 11(3):311–325.

Richard Johansson and Pierre Nugues. 2008. Dependency-based semantic role labeling of PropBank. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 69–78, Honolulu, Hawaii.

Michael Kaisser and Bonnie Webber. 2007. Question answering based on semantic roles. In ACL Workshop on Deep Linguistic Processing.

Seokhwan Kim, Minwoo Jeong, Jonghoon Lee, and Gary Geunbae Lee. 2010. A cross-lingual annotation projection approach for relation detection. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING '10, pages 564–571, Stroudsburg, PA, USA. Association for Computational Linguistics.

Paul Kingsbury, Nianwen Xue, and Martha Palmer. 2004. Propbanking in parallel. In Proceedings of the Workshop on the Amazing Utility of Parallel and Comparable Corpora, in conjunction with LREC'04.

Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of the International Conference on Computational Linguistics (COLING), Bombay, India.

Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proceedings of the Tenth Machine Translation Summit, pages 79–86, Phuket, Thailand. AAMT.

Joel Lang and Mirella Lapata. 2010. Unsupervised induction of semantic roles. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 939–947, Los Angeles, California, June. Association for Computational Linguistics.

Joel Lang and Mirella Lapata. 2011. Unsupervised semantic role induction via split-merge clustering. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).

Ding Liu and Daniel Gildea. 2010. Semantic role features for machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China.

Adam Lopez, Daniel Zeman, Michael Nossal, Philip Resnik, and Rebecca Hwa. 2008. Cross-language parser adaptation between related languages. In IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 35–42, Hyderabad, India, January.

Alejandra Lorenzo and Christophe Cerisara. 2012. Unsupervised frame based semantic role induction: application to French and English. In Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, pages 30–35, Jeju, Republic of Korea, July. Association for Computational Linguistics.

Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 62–72, Stroudsburg, PA, USA. Association for Computational Linguistics.

Paola Merlo, Suzanne Stevenson, Vivian Tsang, and Gianluca Allaria. 2002. A multi-lingual paradigm for automatic verb classification. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), pages 207–214, Philadelphia, PA.

Roland Meyer. 2011. New wine in old wineskins? Tagging Old Russian via annotation projection from modern translations. Russian Linguistics, 35(2):267(15).

Tahira Naseem, Regina Barzilay, and Amir Globerson. 2012. Selective sharing for multilingual dependency parsing. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 629–637, Jeju Island, Korea, July. Association for Computational Linguistics.

Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34(4):513–553, December.

Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1).

Sebastian Padó and Mirella Lapata. 2009. Cross-lingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36:307–340.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31:71–105.

Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of LREC, May.

Mark Sammons, Vinod Vydiswaran, Tim Vieira, Nikhil Johri, Ming-Wei Chang, Dan Goldwasser, Vivek Srikumar, Gourab Kundu, Yuancheng Tu, Kevin Small, Joshua Rule, Quang Do, and Dan Roth. 2009. Relation alignment for textual entailment recognition. In Text Analysis Conference (TAC).

Burr Settles. 2010. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison.

Dan Shen and Mirella Lapata. 2007. Using semantic roles to improve question answering. In EMNLP.

David A. Smith and Jason Eisner. 2009. Parser adaptation and projection with quasi-synchronous grammar features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 822–831. Association for Computational Linguistics.

Benjamin Snyder and Regina Barzilay. 2008. Cross-lingual propagation for morphological analysis. In Proceedings of the 23rd National Conference on Artificial Intelligence.

Anders Søgaard. 2011. Data point selection for cross-language adaptation of dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, volume 2 of HLT '11, pages 682–686, Stroudsburg, PA, USA. Association for Computational Linguistics.

Kathrin Spreyer and Anette Frank. 2008. Projection-based acquisition of a temporal labeller. In Proceedings of IJCNLP 2008.

Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of the Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), pages 477–487, Montréal, Canada.

Cynthia A. Thompson, Roger Levy, and Christopher D. Manning. 2003. A generative model for semantic role labeling. In Proceedings of the 14th European Conference on Machine Learning, ECML 2003, pages 397–408, Dubrovnik, Croatia.

Ivan Titov and Alexandre Klementiev. 2012a. A Bayesian approach to unsupervised semantic role induction. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL).

Ivan Titov and Alexandre Klementiev. 2012b. Semi-supervised semantic role labeling: Approaching from an unsupervised perspective. In Proceedings of the International Conference on Computational Linguistics (COLING), Bombay, India, December.

Sara Tonelli and Emanuele Pianta. 2008. Frame information transfer from English to Italian. In Proceedings of LREC 2008.

Lonneke van der Plas, James Henderson, and Paola Merlo. 2009. Domain adaptation with artificial data for semantic parsing of speech. In Proceedings of the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 125–128, Boulder, Colorado.

Lonneke van der Plas, Paola Merlo, and James Henderson. 2011. Scaling up automatic cross-lingual semantic role annotation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT '11, pages 299–304, Stroudsburg, PA, USA. Association for Computational Linguistics.

Alina Wróblewska and Anette Frank. 2009. Cross-lingual projection of LFG F-structures: Building an F-structure bank for Polish. In Eighth International Workshop on Treebanks and Linguistic Theories, page 209.

Dekai Wu and Pascale Fung. 2009. Can semantic role labeling improve SMT? In Proceedings of the 13th Annual Conference of the European Association for Machine Translation (EAMT 2009), Barcelona.

Chenhai Xi and Rebecca Hwa. 2005. A backoff model for bootstrapping resources for non-English languages. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 851–858, Stroudsburg, PA, USA.

David Yarowsky, Grace Ngai, and Richard Wicentowski. 2001. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of the Human Language Technology Conference.

Daniel Zeman and Philip Resnik. 2008. Cross-language parser adaptation between related languages. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 35–42, Hyderabad, India, January. Asian Federation of Natural Language Processing.

Imed Zitouni and Radu Florian. 2008. Mention detection crossing the language barrier. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.