Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources

Simone Conia, Andrea Bacciu, Roberto Navigli
Sapienza NLP Group, Department of Computer Science, Sapienza University of Rome
{first.lastname}@uniroma1.it

Abstract

While cross-lingual techniques are finding increasing success in a wide range of Natural Language Processing tasks, their application to Semantic Role Labeling (SRL) has been strongly limited by the fact that each language adopts its own linguistic formalism, from PropBank for English to AnCora for Spanish and PDT-Vallex for Czech, inter alia. In this work, we address this issue and present a unified model to perform cross-lingual SRL over heterogeneous linguistic resources. Our model implicitly learns a high-quality mapping for different formalisms across diverse languages without resorting to word alignment and/or translation techniques. We find that, not only is our cross-lingual system competitive with the current state of the art, but that it is also robust to low-data scenarios. Most interestingly, our unified model is able to annotate a sentence in a single forward pass with all the inventories it was trained with, providing a tool for the analysis and comparison of linguistic theories across different languages. We release our code and model at https://github.com/SapienzaNLP/unify-srl.

1 Introduction

Semantic Role Labeling (SRL) – a long-standing open problem in Natural Language Processing (NLP) and a key building block of language understanding (Navigli, 2018) – is often defined as the task of automatically addressing the question "Who did what to whom, when, where, and how?" (Gildea and Jurafsky, 2000; Màrquez et al., 2008). While the need to manually engineer and fine-tune complex feature templates severely limited early work (Zhao et al., 2009), the great success of neural networks in NLP has resulted in impressive progress in SRL, thanks especially to the ability of recurrent networks to better capture relations over sequences (He et al., 2017; Marcheggiani et al., 2017). Owing to the recent wide availability of robust multilingual representations, such as multilingual embeddings (Grave et al., 2018) and multilingual language models (Devlin et al., 2019; Conneau et al., 2020), researchers have been able to shift their focus to the development of models that work on multiple languages (Cai and Lapata, 2019b; He et al., 2019; Lyu et al., 2019).

A robust multilingual representation is nevertheless just one piece of the puzzle: a key challenge in multilingual SRL is that the task is tightly bound to linguistic formalisms (Màrquez et al., 2008) which may present significant structural differences from language to language (Hajic et al., 2009). In the recent literature, it is standard practice to sidestep this issue by training and evaluating a model on each language separately (Cai and Lapata, 2019b; Chen et al., 2019; Kasai et al., 2019; He et al., 2019; Lyu et al., 2019). Although this strategy allows a model to adapt itself to the characteristics of a given formalism, it is burdened by the non-negligible need for training and maintaining one model instance for each language, resulting in a set of monolingual systems.

Instead of dealing with heterogeneous linguistic theories, another line of research consists in actively studying the effect of using a single formalism across multiple languages through annotation projection or other transfer techniques (Akbik et al., 2015, 2016; Daza and Frank, 2019; Cai and Lapata, 2020; Daza and Frank, 2020). However, such approaches often rely on word aligners and/or automatic translation tools which may introduce a considerable amount of noise, especially in low-resource languages. More importantly, they rely on the strong assumption that the linguistic formalism of choice, which may have been developed with a specific language in mind, is also suitable for other languages.

In this work, we take the best of both worlds and propose a novel approach to cross-lingual SRL. Our contributions can be summarized as follows:

• We introduce a unified model to perform cross-lingual SRL with heterogeneous linguistic resources;

• We find that our model is competitive against state-of-the-art systems on all the 6 languages of the CoNLL-2009 benchmark;

• We show that our model is robust to low-resource scenarios, thanks to its ability to generalize across languages;

• We probe our model and demonstrate that it implicitly learns to align heterogeneous linguistic resources;

• We automatically build and release a cross-lingual mapping that aligns linguistic formalisms from diverse languages.

We hope that our unified model will further advance cross-lingual SRL and represent a tool for the analysis and comparison of linguistic theories across multiple languages.

2 Related Work

End-to-end SRL. The SRL pipeline is usually divided into four steps: predicate identification, predicate sense disambiguation, argument identification, and argument classification. While early research focused its efforts on addressing each step individually (Xue and Palmer, 2004; Björkelund et al., 2009; Zhao et al., 2009), recent work has successfully demonstrated that tackling some of these subtasks jointly with multitask learning (Caruana, 1997) is beneficial. In particular, He et al. (2018) and, subsequently, Cai et al. (2018), Li et al. (2019) and Conia et al. (2020) indicate that predicate sense signals aid the identification of predicate-argument relations. Therefore, we follow this line and propose an end-to-end system for cross-lingual SRL.

Multilingual SRL. Current work in multilingual SRL revolves mainly around the development of novel neural architectures, which fall into two broad categories, syntax-aware and syntax-agnostic ones. On the one hand, the quality and diversity of the information encoded by syntax is an enticing prospect that has resulted in a wide range of contributions: Marcheggiani and Titov (2017) made use of Graph Convolutional Networks (GCNs) to better capture relations between neighboring nodes in syntactic dependency trees; Strubell et al. (2018) demonstrated the effectiveness of linguistically-informed self-attention layers in SRL; Cai and Lapata (2019b) observed that syntactic dependencies often mirror semantic relations and proposed a model that jointly learns to perform syntactic dependency parsing and SRL; He et al. (2019) devised syntax-based pruning rules that work for multiple languages. On the other hand, the complexity of syntax and the noisy performance of automatic syntactic parsers have deterred other researchers who, instead, have found methods to improve SRL without syntax: Cai et al. (2018) took advantage of an attentive biaffine layer (Dozat and Manning, 2017) to better model predicate-argument relations; Chen et al. (2019) and Lyu et al. (2019) obtained remarkable results in multiple languages by capturing predicate-argument interactions via capsule networks and by iteratively refining the sequence of output labels, respectively; Cai and Lapata (2019a) proposed a semi-supervised approach that scales across multiple languages.

While we follow the latter trend and develop a syntax-agnostic model, we underline that both the aforementioned syntax-aware and syntax-agnostic approaches suffer from a significant drawback: they require training one model instance for each language of interest. Their two main limitations are, therefore, that i) the number of trainable parameters increases linearly with the number of languages, and ii) the information available in one language cannot be exploited to make SRL more robust in other languages. In contrast, one of the main objectives of our work is to develop a unified cross-lingual model which can mitigate the paucity of training data in some languages by exploiting the information available in other, resource-richer languages.
Cross-lingual SRL. A key challenge in performing cross-lingual SRL with a single unified model is the dissimilarity of predicate sense and semantic role inventories between languages. For example, the multilingual dataset distributed as part of the CoNLL-2009 shared task (Hajic et al., 2009) adopts the English Proposition Bank (Palmer et al., 2005) and NomBank (Meyers et al., 2004) to annotate English sentences, the Chinese Proposition Bank (Xue and Palmer, 2009) for Chinese, the AnCora (Taulé et al., 2008) predicate-argument structure inventory for Catalan and Spanish, the German Proposition Bank which, differently from the other PropBanks, is derived from FrameNet (Hajic et al., 2009), and PDT-Vallex (Hajic et al., 2003) for Czech. Many of these inventories are not aligned with each other as they follow and implement different linguistic theories which, in turn, may pose different challenges.

Padó and Lapata (2009), and Akbik et al. (2015, 2016) worked around these issues by making the English PropBank act as a universal predicate sense and semantic role inventory and projecting PropBank-style annotations from English onto non-English sentences by means of word alignment techniques applied to parallel corpora such as Europarl (Koehn, 2005). These efforts resulted in the creation of the Universal PropBank, a multilingual collection of semi-automatically annotated corpora for SRL, which is actively in use today to train and evaluate novel cross-lingual methods such as word alignment techniques (Aminian et al., 2019). In the absence of parallel corpora, annotation projection techniques can still be applied by automatically translating an annotated corpus and then projecting the original labels onto the newly created silver corpus (Daza and Frank, 2020; Fei et al., 2020), whereas Daza and Frank (2019) have recently found success in training an encoder-decoder architecture to jointly tackle SRL and translation.

While the foregoing studies have greatly advanced the state of cross-lingual SRL, they suffer from an intrinsic downside: using translation and word alignment techniques may result in a considerable amount of noise, which automatically puts an upper bound on the quality of the projected labels. Moreover, they are based on the strong assumption that the English PropBank provides a suitable formalism for non-English languages, and this may not always be the case. Among the numerous studies that adopt the English PropBank as a universal predicate-argument structure inventory for cross-lingual SRL, the work of Mulcaire et al. (2018) stands out for proposing a bilingual model that is able to perform SRL according to two different inventories at the same time, although with significantly lower results compared to the state of the art at the time. With our work, we go beyond current approaches to cross-lingual SRL and embrace the diversity of the various representations made available in different languages. In particular, our model has three key advantages: i) it does not rely on word alignment or translation tools; ii) it learns to perform SRL with multiple linguistic inventories; iii) it learns to link resources that would otherwise be disconnected from each other.

3 Model Description

In the wake of recent work in SRL, our model falls into the broad category of end-to-end systems as it learns to jointly tackle predicate identification, predicate sense disambiguation, argument identification and argument classification. The model architecture can be roughly divided into the following components:

• A universal sentence encoder whose parameters are shared across languages and which produces word encodings that capture predicate-related information (Section 3.2);

• A universal predicate-argument encoder whose parameters are also shared across languages and which models predicate-argument relations (Section 3.3);

• A set of language-specific decoders which indicate whether words are predicates, select the most appropriate sense for each predicate, and assign a semantic role to every predicate-argument couple, according to several different SRL inventories (Section 3.4).

Unlike previous work, our model does not require any preexisting cross-resource mappings, word alignment techniques, translation tools, other annotation transfer techniques, or parallel data to perform high-quality cross-lingual SRL, as it relies solely on implicit cross-lingual knowledge transfer.

3.1 Input representation

Pretrained language models such as ELMo (Peters et al., 2018), BERT (Devlin et al., 2019) and XLM-RoBERTa (Conneau et al., 2020), inter alia, are becoming the de facto input representation method, thanks to their ability to encode vast amounts of knowledge. Following recent studies (Hewitt and Manning, 2019; Kuznetsov and Gurevych, 2020; Conia and Navigli, 2020), which show that different layers of a language model capture different syntactic and semantic characteristics, our model builds a contextual representation for an input word by concatenating the corresponding hidden states of the four top-most inner layers of a language model.
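For concreteness, the following is a minimal sketch of how such a representation can be obtained with the Transformers library, which the paper uses (Section 4.1). The choice of checkpoint and the single-sentence batch are illustrative assumptions, not the authors' exact preprocessing.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base", output_hidden_states=True)

batch = tokenizer("The cat threw its ball out of the window", return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

# out.hidden_states is a tuple of [batch, seq_len, dim] tensors, one per layer
# (plus the embedding layer); keep the four top-most layers and concatenate them.
h = torch.cat(out.hidden_states[-4:], dim=-1)  # shape: [batch, seq_len, 4 * dim]
```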

More formally, given a word w_i in a sentence w = ⟨w_0, w_1, ..., w_i, ..., w_{n-1}⟩ of n words and its hidden state h_i^k = l^k(w_i | w) from the k-th inner layer l^k of a language model with K layers, the model computes the word encoding e_i as follows:

    h_i = h_i^K ⊕ h_i^{K-1} ⊕ h_i^{K-2} ⊕ h_i^{K-3}
    e_i = Swish(W^w h_i + b^w)

where x ⊕ y is the concatenation of the two vectors x and y, and Swish(x) = x · sigmoid(x) is a non-linear activation which was found to produce smoother gradient landscapes than the more traditional ReLU (Ramachandran et al., 2018).

3.2 Universal sentence encoder

Expanding on the seminal intuition of Fillmore (1968), who suggests the existence of deep semantic relations between a predicate and other sentential constituents, we argue that such semantic relations may be preserved across languages. With this reasoning in mind, we devise a universal sentence encoder whose parameters are shared across languages. Intuitively, the aim of our universal sentence encoder is to capture sentence-level information that is not formalism-specific and spans across languages, such as information about predicate positions and predicate senses. In our case, we implement this universal sentence encoder as a stack of BiLSTM layers (Hochreiter and Schmidhuber, 1997), similarly to Marcheggiani et al. (2017), Cai et al. (2018) and He et al. (2019), with the difference that we concatenate the output of each layer to its input in order to mitigate the problem of vanishing gradients. More formally, given a sequence of word encodings e = ⟨e_0, e_1, ..., e_{n-1}⟩, the model computes a sequence of timestep encodings t as follows:

    t_i^j = e_i                                    if j = 0
    t_i^j = t_i^{j-1} ⊕ BiLSTM_i^j(t^{j-1})        otherwise

    t = ⟨t_0^{K'}, t_1^{K'}, ..., t_{n-1}^{K'}⟩

where BiLSTM_i^j(·) is the i-th timestep of the j-th BiLSTM layer and K' is the total number of layers in the stack. Starting from each timestep encoding t_i, the model produces a predicate representation p_i, which captures whether the corresponding word w_i is a predicate, and a sense representation s_i, which encodes information about the sense of a predicate at position i:

    p_i = Swish(W^p t_i + b^p)
    s_i = Swish(W^s t_i + b^s)

We stress that the vector representations obtained for each timestep, each predicate and each sense lie in three spaces that are shared across the languages and formalisms used to perform SRL.

3.3 Universal predicate-argument encoder

In the same vein, and for the same reasoning that motivated the design of the above universal sentence encoder, our model includes a universal predicate-argument encoder whose parameters are also shared across languages. The objective of this second encoder is to capture the relations between each predicate-argument couple that appears in a sentence, independently of the input language. Similarly to the universal sentence encoder, we implement this universal predicate-argument encoder as a stack of BiLSTM layers. More formally, let w_p be a predicate in the input sentence w = ⟨w_0, w_1, ..., w_p, ..., w_{n-1}⟩; then the model computes a sequence of predicate-specific argument encodings as follows:

    a_i^j = t_p ⊕ t_i                              if j = 0
    a_i^j = a_i^{j-1} ⊕ BiLSTM_i^j(a^{j-1})        otherwise

    a = ⟨a_0^{K''}, a_1^{K''}, ..., a_{n-1}^{K''}⟩

where t_i is the i-th timestep encoding from the universal sentence encoder and K'' is the total number of layers in the stack. Starting from each predicate-specific argument encoding a_i, the model produces a semantic role representation r_i for word w_i:

    r_i = Swish(W^r a_i + b^r)

Similarly to the predicate and sense representations p and s, since the predicate-argument encoder is one and the same for all languages, the semantic role representation r obtained must draw upon cross-lingual information in order to abstract from language-specific peculiarities.

3.4 Language-specific decoders

The aforementioned predicate encodings p, sense encodings s and semantic role encodings r are shared across languages, forcing the model to learn from semantics rather than from surface-level features such as word order, part-of-speech tags and syntactic rules, all of which may differ from language to language. Ultimately, however, we want our model to provide semantic role annotations according to an existing predicate-argument structure inventory, e.g., PropBank, AnCora, or PDT-Vallex. Our model, therefore, includes a set of linear decoders that indicate whether a word w_i is a predicate, what the most appropriate sense for a predicate w_p is, and what the semantic role of a word w_r with respect to a specific predicate w_p is, for each language l:

    σ^p(w_i | l) = W^{p|l} p_i + b^{p|l}
    σ^s(w_p | l) = W^{s|l} s_p + b^{s|l}
    σ^r(w_r | w_p, l) = W^{r|l} r_r + b^{r|l}

Although we could have opted for more complex decoding strategies, in our case linear decoders have two advantages: 1) they keep the language-specific part of the model as simple as possible, pushing the model into learning from its universal encoders; 2) they can be seen as linear probes, providing an insight into the quality of the cross-lingual knowledge that the model can capture.

3.5 Training objective

The model is trained to jointly minimize the sum of the categorical cross-entropy losses on predicate identification, predicate sense disambiguation and argument identification/classification over all the languages in a multitask learning fashion. More formally, given a language l and the corresponding predicate identification loss L^{p|l}, predicate sense disambiguation loss L^{s|l} and argument identification/classification loss L^{r|l}, the cumulative loss L is:

    L = Σ_{l ∈ L} ( L^{p|l} + L^{s|l} + L^{r|l} )

where L is the set of languages – and the corresponding formalisms – in the training set.
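To make the architecture concrete, below is a minimal PyTorch sketch of the universal encoders and the language-specific decoders; the paper's implementation is in PyTorch (Section 4.1), but this is a simplified reading of Sections 3.1-3.5, not the released code. The predicate-argument encoder and the role decoders are elided for brevity, and the dimensions follow Table 4 in Appendix A (d_w = 512, K' = 3 sentence encoder layers, d_sp = 32, d_ss = 512).

```python
import torch
import torch.nn as nn

class SwishLinear(nn.Module):
    """Linear projection followed by Swish(x) = x * sigmoid(x)."""
    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        self.linear = nn.Linear(dim_in, dim_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.linear(x)
        return h * torch.sigmoid(h)

class ConcatBiLSTMStack(nn.Module):
    """BiLSTM stack where each layer's output is concatenated to its input
    (Section 3.2), so the feature size grows with depth."""
    def __init__(self, dim_in: int, hidden: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = dim_in
        for _ in range(num_layers):
            self.layers.append(
                nn.LSTM(dim, hidden, batch_first=True, bidirectional=True))
            dim += 2 * hidden  # t^j = t^{j-1} concatenated with BiLSTM^j(t^{j-1})
        self.dim_out = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for lstm in self.layers:
            out, _ = lstm(x)
            x = torch.cat([x, out], dim=-1)
        return x

class UnifiedSRL(nn.Module):
    """Universal encoders shared by all languages + per-inventory linear decoders."""
    def __init__(self, lm_dim: int, senses_per_inventory: dict):
        super().__init__()
        self.word_enc = SwishLinear(4 * lm_dim, 512)               # e_i, Section 3.1
        self.sent_enc = ConcatBiLSTMStack(512, 256, 3)             # t_i, Section 3.2
        self.pred_repr = SwishLinear(self.sent_enc.dim_out, 32)    # p_i
        self.sense_repr = SwishLinear(self.sent_enc.dim_out, 512)  # s_i
        # Language-specific decoders (Section 3.4): plain linear probes.
        self.pred_dec = nn.ModuleDict(
            {l: nn.Linear(32, 2) for l in senses_per_inventory})
        self.sense_dec = nn.ModuleDict(
            {l: nn.Linear(512, n) for l, n in senses_per_inventory.items()})

    def forward(self, lm_states: torch.Tensor) -> dict:
        # lm_states: [batch, seq_len, 4 * lm_dim], the concatenated LM layers.
        t = self.sent_enc(self.word_enc(lm_states))
        p, s = self.pred_repr(t), self.sense_repr(t)
        # One forward pass yields predictions for every inventory at once.
        return {l: (self.pred_dec[l](p), self.sense_dec[l](s))
                for l in self.pred_dec}

def multitask_loss(logits: dict, gold: dict) -> torch.Tensor:
    """Sum of categorical cross-entropies over all languages (Section 3.5)."""
    ce = nn.CrossEntropyLoss()
    return sum(
        ce(logits[l][0].flatten(0, 1), gold[l]["predicate"].flatten())
        + ce(logits[l][1].flatten(0, 1), gold[l]["sense"].flatten())
        for l in logits)
```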
4 Experiments

We evaluate our model in dependency-based multilingual SRL. The remainder of this Section describes the experimental setup (Section 4.1), provides a brief overview of the multilingual dataset we use for training, validation and testing (Section 4.2), and shows the results obtained on each language (Section 4.3).

4.1 Experimental Setup

We implemented the model in PyTorch (https://pytorch.org) and PyTorch Lightning (https://www.pytorchlightning.ai), and used the pretrained language models for multilingual BERT (m-BERT) and XLM-RoBERTa (XLM-R) made available by the Transformers library (Wolf et al., 2020). We trained each model configuration for 30 epochs using Adam (Kingma and Ba, 2015) with a "slanted triangle" learning rate scheduling strategy which linearly increases the learning rate for 1 epoch and then linearly decreases the value for 15 epochs. We did not perform hyperparameter tuning and opted instead for standard values used in the literature; we provide more details about our model configuration and its hyperparameter values in Appendix A. In the remainder of this Section, we report the F1 scores of the best models selected according to the highest F1 score obtained on the validation set at the end of a training epoch. Hereafter, all the results of our experiments are computed by the official scorer of the CoNLL-2009 shared task, available at https://ufal.mff.cuni.cz/conll2009-st/scorer.html.

4.2 Dataset

To the best of our knowledge, the dataset provided as part of the CoNLL-2009 shared task (Hajic et al., 2009) is the largest and most diverse collection of human-annotated sentences for multilingual SRL. It comprises 6 languages, namely, Catalan, Chinese, Czech, English, German and Spanish, which belong to different linguistic families and feature significantly varying amounts of training samples, from 400K predicate instances in Czech to only 17K in German; we provide an overview of the statistics of each language in Appendix B. (The CoNLL-2009 shared task originally included a seventh language, Japanese, which is not available anymore on LDC due to licensing issues.) CoNLL-2009 is the ideal testbed for evaluating the ability of our unified model to generalize across heterogeneous resources since each language adopts its own linguistic formalism, from English PropBank to PDT-Vallex, from Chinese PropBank to AnCora. We also include VerbAtlas (Di Fabio et al., 2019), a recently released resource for SRL, with the aim of understanding whether our model can learn to align inventories that are based on "distant" linguistic theories; indeed, VerbAtlas is based on clustering WordNet synsets into frames that share similar semantic behavior, whereas PropBank-based resources enumerate and define the possible senses of a lexeme. We build a training set for VerbAtlas using the mapping from PropBank available at http://verbatlas.org.

As a final note, we did not evaluate our model on Universal PropBank (https://github.com/System-T/UniversalPropositions) since i) it was semi-automatically generated through annotation projection techniques, and ii) it uses the English PropBank for all languages, which goes against our interest in capturing cross-lingual knowledge over heterogeneous inventories.

CoNLL-2009 – Multilingual – In Domain

                                            CA     CZ     DE     EN     ES     ZH
CoNLL-2009 ST best                         80.3   85.4   79.7   85.6   80.5   78.6
Marcheggiani et al. (2017)                   —    86.0     —    87.7   80.3   81.2
Chen et al. (2019)                         81.7   88.1   76.4   91.1   81.3   81.7
Cai and Lapata (2019b)                       —      —    82.7   90.0   81.8   83.6
Cai and Lapata (2019a)                       —      —    83.8   91.2   82.9   85.0
Lyu et al. (2019)                          80.9   87.5   75.8   90.1   80.5   83.3
He et al. (2019)                           86.0   89.7   81.1   90.9   85.2   86.9

This work, m-BERT frozen / monolingual     86.2   90.0   85.2   90.5   85.0   86.4
This work, m-BERT / monolingual            86.8   90.3   85.8   90.7   85.3   86.9
This work, m-BERT / cross-lingual          87.1   90.8   86.5   91.0   85.6   87.3

This work, XLM-R frozen / monolingual      86.8   90.4   86.5   90.8   85.2   86.9
This work, XLM-R / monolingual             87.8   91.6   87.6   91.6   86.0   87.5
This work, XLM-R / cross-lingual           88.0   91.5   88.0   91.8   86.3   87.7

Table 1: F1 scores on the in-domain evaluation of CoNLL-2009 with gold pre-identified predicates. "CoNLL-2009 ST best" refers to the best results obtained (by different systems) during the Shared Task. We include all the systems that reported results in at least 4 languages. The original table additionally marks each system as syntax-aware or syntax-agnostic.
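As a side note on the training schedule of Section 4.1, the following is a minimal sketch of one way to realize the "slanted triangle" strategy with a standard PyTorch scheduler, using the bounds from Table 4 in Appendix A (max LR 10^-3, min LR 10^-5, 1 warmup epoch, 15 cooldown epochs). The per-epoch formulation and the behavior after the cooldown ends are assumptions, not taken from the released code.

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

MAX_LR, MIN_LR = 1e-3, 1e-5  # bounds from Table 4 (Appendix A)
WARMUP, COOLDOWN = 1, 15     # epochs, as described in Section 4.1

def slanted_triangle(epoch: int) -> float:
    """Per-epoch multiplier for MAX_LR: linear warmup, then linear cooldown."""
    if epoch < WARMUP:
        return (epoch + 1) / WARMUP
    progress = min((epoch - WARMUP) / COOLDOWN, 1.0)  # clamped once cooldown ends
    return 1.0 - progress * (1.0 - MIN_LR / MAX_LR)

# `model` is assumed to be the UnifiedSRL sketch from Section 3.
optimizer = Adam(model.parameters(), lr=MAX_LR)
scheduler = LambdaLR(optimizer, lr_lambda=slanted_triangle)
# After each training epoch: scheduler.step()
```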

4.3 Results

Cross-lingual SRL. Table 1 compares the results obtained by our unified cross-lingual model against the state of the art in multilingual SRL, including both syntax-agnostic and syntax-aware architectures, on the in-domain test sets of CoNLL-2009 when using gold pre-identified predicates, rather than the predicates identified by the model itself, as is standard in the CoNLL-2009 shared task. While proposing a state-of-the-art architecture is not the focus of this work, we believed it was important to build our cross-lingual approach starting from a strong and consistent baseline. For this reason, Table 1 includes the results obtained when training a separate instance of our model for each language, using the same strategy adopted by current multilingual systems (Cai and Lapata, 2019a; He et al., 2019; Lyu et al., 2019) and showing results that are competitive with He et al. (2019), inter alia. Remarkably, thanks to its universal encoders shared across languages and formalisms, our unified cross-lingual model outperforms our state-of-the-art baseline in all the 6 languages at a fraction of the cost in terms of number of trainable parameters (a single cross-lingual model against six monolingual models, each trained on a different language). Similar results can be seen in Table 2, where our cross-lingual approach improves over the state of the art on the out-of-domain evaluation of CoNLL-2009, especially on the German and English test sets, which were purposely built to include predicates that do not appear in the training set. These results confirm empirically our initial hunch that semantic role labeling relations are deeply rooted beyond languages, independently of their surface realization and their predicate-argument structure inventories.

Finally, for completeness, Appendix E includes the results of our system on the individual subtasks, namely, predicate identification and predicate sense disambiguation.

CoNLL-2009 – Out of Domain

                                  CZ     DE     EN
CoNLL-2009 ST best               85.4   65.9   73.3
Zhao et al. (2009)               82.7   67.8   74.6
Marcheggiani et al. (2017)       87.2     —    77.7
Li et al. (2019)                   —      —    81.5
Chen et al. (2019)                 —      —    82.7
Lyu et al. (2019)                86.0   65.7   82.2

This work, m-BERT / mono         90.4   72.6   84.6
This work, m-BERT / cross        91.0   73.0   85.0
This work, XLM-R / mono          90.8   73.9   83.7
This work, XLM-R / cross         91.1   74.2   84.3

Table 2: F1 scores on the out-of-domain evaluation of CoNLL-2009 with gold pre-identified predicates.

Low-resource cross-lingual SRL. We evaluate the robustness of our model in low-resource cross-lingual SRL by artificially reducing the training set of each language to 10% of its original size. Table 3 (top) reports the results obtained by our model when trained separately on the reduced training set of each language (monolingual), and the results obtained by the same model when trained on the union of the reduced training sets (cross-lingual). The improvements of our cross-lingual approach compared to the more traditional monolingual baseline are evident, especially in lower-resource scenarios, with absolute improvements in F1 score of 25.5%, 9.7% and 26.9% on the Catalan, German and Spanish test sets, respectively. This is thanks to the ability of the model to use the knowledge from a language to improve its performance on other languages.

One-shot cross-lingual SRL. An interesting open question in SRL is whether a system can learn to model the semantic relations between a predicate sense s and its arguments, given a limited number of training samples in which s appears. In particular, in our case, we are interested in understanding how the model fares in a synthetic scenario where each sense appears at most once in the training set, that is, we evaluate our model in a one-shot learning setting. As we can see from Table 3 (bottom), our cross-lingual approach outperforms its monolingual counterpart trained on each synthetic dataset separately by a wide margin, once again providing strong absolute improvements – 18.7% in Catalan, 9.2% in German and 16.1% in Spanish in terms of F1 score – for languages where the number of training instances is smaller.

It is not uncommon for supervised cross-lingual tasks to feature different amounts of data for each language, depending on how difficult it is to get manual annotations for each language of interest. We simulate this setting in SRL by training our model on 100% of the training data available for the English language, while keeping the one-shot learning setting for all the other languages. As Table 3 (bottom) shows, non-English languages exhibit further improvements as the number of English training samples increases, lending further credibility to the idea that SRL can be learnt across languages even when using heterogeneous resources. Not only do these results suggest that a cross-lingual/cross-resource approach might mitigate the need for a large training set in each language, but also that reasonable cross-lingual results may be obtained by maintaining a single large dataset for a high-resource language, together with several small datasets for low-resource languages.

CoNLL-2009 – In Domain

                                                          CA     CZ     DE     EN     ES     ZH
This work, XLM-R / monolingual / 10% training            52.7   79.9   60.2   81.7   49.2   72.9
This work, XLM-R / cross-lingual / 10% training          78.2   84.0   69.9   84.3   76.1   78.6

This work, XLM-R / monolingual / 1-shot learning         44.5   21.8   40.9   67.4   46.5   72.1
This work, XLM-R / cross-lingual / 1-shot learning       63.2   28.9   50.1   70.2   62.6   73.6
This work, XLM-R / cross-lingual / 1-shot / 100% EN      66.4   29.6   55.5   91.6*  64.3   76.7

Table 3: F1 scores on the in-domain evaluation of CoNLL-2009 with gold pre-identified predicates for low-resource (top) and one-shot learning (bottom) scenarios. *: the result in EN on the last line is not directly comparable with those above as we use the full English training set.

5 Analysis and Discussion

Cross-formalism SRL. In contrast to existing multilingual systems, a key benefit of our unified cross-lingual model is its ability to provide annotations for predicate senses and semantic roles in any linguistic formalism. As we can see from Figure 1 (left), given the English sentence "the cat threw its ball out of the window", our language-specific decoders produce predicate sense and semantic role labels not only according to the English PropBank inventory, but also for all the other resources, as the model correctly identifies the agentive and patientive constituents independently of the formalism of interest. And this is not all: our model may potentially work on any of the 100 languages supported by the underlying language model (m-BERT or XLM-RoBERTa), e.g., on Italian, as shown in Figure 1 (right). This is vital for those languages for which a predicate-argument structure inventory has not yet been developed – an endeavor that may take years to come to fruition – and, therefore, manually annotated data are unavailable. Thus, as long as a large amount of pretraining data is openly accessible, our system provides a robust cross-lingual tool to compare and analyze different linguistic theories and formalisms across a wide range of languages, on the one hand, and to overcome the issue of performing SRL on languages where no inventory is available, on the other.

Figure 1: Thanks to its universal encoders, our unified cross-lingual model is able to provide predicate sense and semantic role labels according to several linguistic formalisms. Left: SRL labels for an English input sentence. Right: SRL labels for an Italian input sentence, which can be translated into English as "The president refuses the help of the opponents". Notice that Italian is not among the languages in the training set.
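To illustrate what annotating "in a single forward pass with all the inventories" looks like in practice, here is a hypothetical usage sketch built on the UnifiedSRL module sketched in Section 3; the variable names are placeholders, not the released interface.

```python
# Hypothetical usage of the UnifiedSRL sketch: one forward pass, one set of
# predictions per inventory the model was trained on.
logits_by_inventory = model(h)  # h: concatenated top-4 LM states, [1, seq_len, 4*dim]
for inventory, (pred_logits, sense_logits) in logits_by_inventory.items():
    is_predicate = pred_logits.argmax(dim=-1)  # [1, seq_len], 0/1 per token
    senses = sense_logits.argmax(dim=-1)       # [1, seq_len], inventory-specific ids
    print(inventory, is_predicate.tolist(), senses.tolist())
```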

Figure 2: Visualization of the cross-resource mapping learnt by our model. Left: Mapping from Chinese PropBank, German PropBank and AnCora (both Catalan and Spanish) to English PropBank. Right: Mapping from English PropBank, German PropBank, Chinese PropBank and AnCora (both Spanish and Catalan) to VerbAtlas. [The figure shows weighted edges between predicate-sense nodes such as empezar.c1 and comenzar.c1 (ES – AnCora), starten.1 and starten.2 (DE – German PropBank), 开始.01 (ZH – Chinese PropBank), start-v.01, begin-v.01 and initiate-v.01 (EN – English PropBank), and coarse-grained VerbAtlas frames such as INFORM, WARN, ACCUSE, CRITICIZE and REGRET_SORRY.]

Aligning heterogeneous resources. As briefly mentioned previously, the universal encoders in the model architecture force our system to learn cross-lingual features that are important across different formalisms. A crucial consequence of this approach is that the model learns to implicitly align the resources it is trained on, without the aid of word aligners and translation tools, even when these resources may be designed around specific languages and, therefore, present significant differences. In order to bring to light what our model implicitly learns to align in its shared cross-lingual space (see Sections 3.2 and 3.3), we exploit its language-specific decoders to build a mapping from any source inventory, e.g., AnCora, to a target inventory, e.g., the English PropBank. In particular, we use our cross-lingual model to label a training set originally tagged with a source inventory to produce silver annotations according to a target inventory, similarly to what is shown in Figure 1. While producing the silver annotations, we keep track of the number of times each predicate sense in the source inventory is associated by the model with a predicate sense of the target inventory. As a result, we produce a weighted directed graph in which the nodes are predicate senses and an edge (a, b) with weight w indicates that our model maps the source predicate sense a to the target predicate sense b at least w times.
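A minimal sketch of this counting procedure is given below. The corpus format and the annotate(sentence, inventory) helper, which is assumed to return (token index, sense label) pairs from one language-specific decoder, are hypothetical; neither comes from the released code.

```python
from collections import Counter

# Build the weighted directed graph described above: count how often each
# source-inventory sense is mapped to each target-inventory sense.
edge_weights = Counter()
for sentence, gold_senses in source_corpus:  # gold_senses: (index, sense) pairs
    # Hypothetical helper: silver senses from the target-inventory decoder.
    silver = dict(annotate(sentence, target_inventory))
    for index, source_sense in gold_senses:
        if index in silver:
            edge_weights[(source_sense, silver[index])] += 1

# An edge (a, b) with weight w means the model mapped source sense a to target
# sense b w times; keeping the top-k edges per node yields plots like Figure 2.
top_edges = edge_weights.most_common(10)
```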
A portion of this graph is displayed in Figure 2 where, for visualization purposes, we show the most frequent alignments for each language, i.e., the top-3 edges with largest weight from the nodes of each inventory to the nodes of the English PropBank (Figure 2, left) and to the nodes of VerbAtlas (Figure 2, right). We release the full alignment and the corresponding graph at https://github.com/SapienzaNLP/unify-srl.

For example, Figure 2 (left) shows that our model learns to map the Spanish AnCora sense empezar.c1 and the German PropBank sense starten.2 to the English PropBank sense start.01, but also that, depending on the context, the Chinese PropBank sense 开始.01 can correspond to both start.01 and begin.01. Figure 2 (right) also shows that our model learns to map senses from different languages and formalisms to the coarse-grained senses of VerbAtlas, even though the latter formalism is quite distant from the others as its frames are based on clustering WordNet synsets – sets of synonymous words – that share similar semantic behavior, rather than enumerating and defining all the possible senses of a lexeme as in the English and Chinese PropBanks. To the best of our knowledge, our unified model is the first transfer-based tool to automatically align diverse linguistic resources across languages without relying on human supervision.

6 Conclusion and Future Work

On one hand, recent research in multilingual SRL has focused mainly on proposing novel model architectures that achieve state-of-the-art results, but require a model instance to be trained on and for each language of interest. On the other hand, the latest developments in cross-lingual SRL have revolved around using the English PropBank inventory as a universal resource for other languages through annotation transfer techniques. Following our hunch that semantic relations may be deeply rooted beyond the surface realizations that distinguish one language from another, we propose a new approach to cross-lingual SRL and present a model which learns from heterogeneous linguistic resources in order to obtain a deeper understanding of sentence-level semantics. To achieve this objective, we equip our model architecture with "universal" encoders which share their weights across languages and are, therefore, forced to learn knowledge that spans across varying formalisms.

Our unified cross-lingual model, evaluated on the gold multilingual benchmark of CoNLL-2009, outperforms previous state-of-the-art multilingual systems over 6 diverse languages, ranging from Catalan to Czech, from German to Chinese, and, at the same time, also considerably reduces the amount of trainable parameters required to support different linguistic formalisms. And this is not all. We find that our approach is robust to low-resource scenarios, where the model is able to exploit the complementary knowledge contained in the training sets of different languages.

Most importantly, our model is able to provide predicate sense and semantic role labels according to 7 predicate-argument structure inventories in a single forward pass, facilitating comparisons between different linguistic formalisms and investigations about interlingual phenomena. Our analysis shows that, thanks to the prior knowledge encoded in recent pretrained language models and our focus on learning from cross-lingual features, our model can be used on languages that were never seen at training time, opening the door to alignment-free cross-lingual SRL on languages where a predicate-argument structure inventory is not yet available. Finally, we show that our model implicitly learns to align heterogeneous resources, providing useful insights into inter-resource relations. We leave an in-depth qualitative and quantitative analysis of the learnt inter-resource mappings for future work.

We hope that our work can serve as a stepping stone for future developments towards the unification of heterogeneous SRL. We release the code to reproduce our experiments and the checkpoints of our best models at https://github.com/SapienzaNLP/unify-srl.

Acknowledgments

The authors gratefully acknowledge the support of the ERC Consolidator Grant MOUSSE No. 726487 under the European Union's Horizon 2020 research and innovation programme. This work was supported in part by the MIUR under the grant "Dipartimenti di eccellenza 2018-2022" of the Department of Computer Science of Sapienza University.

References

Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan, and Huaiyu Zhu. 2015. Generating high quality proposition banks for multilingual Semantic Role Labeling. In Proceedings of ACL.
Alan Akbik, Vishwajeet Kumar, and Yunyao Li. 2016. Towards semi-automatic generation of proposition banks for low-resource languages. In Proceedings of EMNLP.
Maryam Aminian, Mohammad Sadegh Rasooli, and Mona Diab. 2019. Cross-lingual transfer of semantic roles: From raw text to semantic roles. In Proceedings of IWCS.
Anders Björkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual Semantic Role Labeling. In Proceedings of CoNLL.
Jiaxun Cai, Shexia He, Zuchao Li, and Hai Zhao. 2018. A full end-to-end semantic role labeler, syntactic-agnostic over syntactic-aware? In Proceedings of COLING.
Rui Cai and Mirella Lapata. 2019a. Semi-supervised Semantic Role Labeling with cross-view training. In Proceedings of EMNLP.
Rui Cai and Mirella Lapata. 2019b. Syntax-aware Semantic Role Labeling without parsing. Transactions of ACL (TACL), 7:343–356.
Rui Cai and Mirella Lapata. 2020. Alignment-free cross-lingual Semantic Role Labeling. In Proceedings of EMNLP.
Rich Caruana. 1997. Multitask learning. Machine Learning, 28(1):41–75.
Xinchi Chen, Chunchuan Lyu, and Ivan Titov. 2019. Capturing argument interaction in Semantic Role Labeling with capsule networks. In Proceedings of EMNLP.
Simone Conia, Fabrizio Brignone, Davide Zanfardino, and Roberto Navigli. 2020. InVeRo: Making Semantic Role Labeling accessible with intelligible verbs and roles. In Proceedings of EMNLP.
Simone Conia and Roberto Navigli. 2020. Bridging the gap in multilingual Semantic Role Labeling: A language-agnostic approach. In Proceedings of COLING.
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of ACL.
Angel Daza and Anette Frank. 2019. Translate and label! An encoder-decoder approach for cross-lingual Semantic Role Labeling. In Proceedings of EMNLP, pages 603–615.
Angel Daza and Anette Frank. 2020. X-SRL: A parallel cross-lingual Semantic Role Labeling dataset. In Proceedings of EMNLP.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional Transformers for language understanding. In Proceedings of NAACL.
Andrea Di Fabio, Simone Conia, and Roberto Navigli. 2019. VerbAtlas: A novel large-scale verbal semantic resource and its application to Semantic Role Labeling. In Proceedings of EMNLP.
Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of ICLR.
Hao Fei, Meishan Zhang, and Donghong Ji. 2020. Cross-lingual Semantic Role Labeling with high-quality translated training corpus. In Proceedings of ACL.
Charles J. Fillmore. 1968. The case for case. Universals in Linguistic Theory.
Daniel Gildea and Daniel Jurafsky. 2000. Automatic labeling of semantic roles. In Proceedings of ACL.
Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, and Tomas Mikolov. 2018. Learning word vectors for 157 languages. In Proceedings of LREC.
Jan Hajic, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Stepánek, Pavel Stranák, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of CoNLL.
Jan Hajic, J. Panevová, Zdenka Uresová, Alevtina Bémová, V. Kolárová, and P. Pajas. 2003. PDT-Vallex: Creating a large-coverage valency lexicon for treebank annotation.
Luheng He, Kenton Lee, Omer Levy, and Luke Zettlemoyer. 2018. Jointly predicting predicates and arguments in neural Semantic Role Labeling. In Proceedings of ACL.
Luheng He, Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2017. Deep Semantic Role Labeling: What works and what's next. In Proceedings of ACL.
Shexia He, Zuchao Li, and Hai Zhao. 2019. Syntax-aware multilingual Semantic Role Labeling. In Proceedings of EMNLP.
John Hewitt and Christopher D. Manning. 2019. A structural probe for finding syntax in word representations. In Proceedings of NAACL.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8).
Jungo Kasai, Dan Friedman, Robert Frank, Dragomir R. Radev, and Owen Rambow. 2019. Syntax-aware neural Semantic Role Labeling with supertags. In Proceedings of NAACL.
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR.
Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5, pages 79–86.
Ilia Kuznetsov and Iryna Gurevych. 2020. A matter of framing: The impact of linguistic formalism on probing results. In Proceedings of EMNLP.
Zuchao Li, Shexia He, Hai Zhao, Yiqing Zhang, Zhuosheng Zhang, Xi Zhou, and Xiang Zhou. 2019. Dependency or span, end-to-end uniform semantic role labeling. In Proceedings of AAAI.
Chunchuan Lyu, Shay B. Cohen, and Ivan Titov. 2019. Semantic Role Labeling with iterative structure refinement. In Proceedings of EMNLP.
Diego Marcheggiani, Anton Frolov, and Ivan Titov. 2017. A simple and accurate syntax-agnostic neural model for dependency-based Semantic Role Labeling. In Proceedings of CoNLL.
Diego Marcheggiani and Ivan Titov. 2017. Encoding sentences with graph convolutional networks for Semantic Role Labeling. In Proceedings of EMNLP.
Lluís Màrquez, Xavier Carreras, Kenneth C. Litkowski, and Suzanne Stevenson. 2008. Semantic Role Labeling: An introduction to the special issue. Computational Linguistics, 34(2):145–159.
Adam Meyers, Ruth Reeves, Catherine Macleod, Rachel Szekely, Veronika Zielinska, Brian Young, and Ralph Grishman. 2004. The NomBank project: An interim report. In Proceedings of the Workshop Frontiers in Corpus Annotation.
Phoebe Mulcaire, Swabha Swayamdipta, and Noah A. Smith. 2018. Polyglot Semantic Role Labeling. In Proceedings of ACL.
Roberto Navigli. 2018. Natural Language Understanding: Instructions for (present and future) use. In Proceedings of IJCAI.
Sebastian Padó and Mirella Lapata. 2009. Cross-lingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36:307–340.
Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of NAACL.
Prajit Ramachandran, Barret Zoph, and Quoc V. Le. 2018. Searching for activation functions. In Proceedings of ICLR.
Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, and Andrew McCallum. 2018. Linguistically-informed self-attention for Semantic Role Labeling. In Proceedings of EMNLP.
Mariona Taulé, Maria Antònia Martí, and Marta Recasens. 2008. AnCora: Multilevel annotated corpora for Catalan and Spanish. In Proceedings of LREC.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of EMNLP.
Nianwen Xue and Martha Palmer. 2004. Calibrating features for semantic role labeling. In Proceedings of EMNLP.
Nianwen Xue and Martha Palmer. 2009. Adding semantic roles to the Chinese Treebank. Natural Language Engineering, 15(1):143–172.
Hai Zhao, Wenliang Chen, Jun'ichi Kazama, Kiyotaka Uchimoto, and Kentaro Torisawa. 2009. Multilingual dependency learning: Exploiting rich features for tagging syntactic and semantic dependencies. In Proceedings of CoNLL.

A Model Hyperparameters

Table 4 reports the hyperparameter values we choose for our model configuration and experiments.

Hyperparameter                                 Value
d_w    Size of e_i                               512
K'     Universal sentence encoder layers           3
d_t    Size of t_i                               512
K''    Universal pred.-arg. encoder layers         1
d_z    Size of a_i                               256
d_sp   Size of p_i                                32
d_ss   Size of s_i                               512
d_sa   Size of r_i                               512
Batch size                                        32
Batch size when fine-tuning                      128
Max learning rate                               10^-3
Min learning rate                               10^-5
Max lr for LM fine-tuning                       10^-5
Min lr for LM fine-tuning                       10^-6
Warmup epochs                                      1
Cooldown epochs                                   15
Training epochs                                   30

Table 4: Hyperparameter values for our model architecture. We use the same hyperparameter values for our monolingual and cross-lingual experiments.

B Data Statistics

Tables 5, 6 and 7 provide an overview of the training, development and test sets provided as part of the CoNLL-2009 shared task, with statistics about sentences, predicates and arguments.

C Hardware Infrastructure

All the experiments were performed on an x86-64 architecture with 64GB of RAM, an 8-core CPU running at 3.60GHz, and a single Nvidia RTX 2080Ti with 11GB of VRAM.

D Training Details

Training was performed using half-precision via Apex (https://github.com/NVIDIA/apex). Training times varied considerably depending on the experiment setting: the shortest experiment lasted 26 minutes (training m-BERT on 10% of the Catalan training set), whereas the longest experiment lasted 46 hours (training XLM-RoBERTa on the union of all the datasets of all the languages).

E Other Results

Predicate identification. In Table 8 we report the results of our model on predicate identification.

Predicate sense disambiguation. In Table 9 we report the results of our model on predicate sense disambiguation.

F Alignment Examples

Figure 3 provides two more examples, one in French (left), the other in Catalan (right). We remark that the training set of CoNLL-2009 does not include sentences in French; however, our cross-lingual model correctly outputs SRL tags according to the other seven language-specific decoders.

CoNLL-2009 training sets

      Sentences                          Predicates           Arguments
      Total_s   Annotated  Avg. Len.     Total_p    Senses    Total_a   Roles
CA    13,200    12,873     30.2           37,431     3,554     84,367    38
CZ    38,727    38,578     16.9          414,237     9,135    365,255    60
DE    36,020    14,282     22.2           17,400     1,271     34,276    10
EN    39,279    37,847     25.0          179,014     8,237    393,699    52
ES    14,329    13,835     30.7           43,824     4,534     99,054    43
ZH    22,277    21,071     28.5          102,813    12,587    231,869    36

Table 5: Overview of the CoNLL-2009 training sets. For each dataset we report the number of sentences (Total_s), the number of sentences with at least one annotated predicate (Annotated), the average number of tokens per sentence (Avg. Len.), the number of predicates (Total_p) and predicate senses (Senses), and also the number of arguments (Total_a) and argument roles (Roles).

CoNLL-2009 development sets

      Sentences                          Predicates           Arguments
      Total_s   Annotated  Avg. Len.     Total_p    Senses    Total_a   Roles
CA     1,724     1,675     31.5            5,105     1,436     11,529    34
CZ     5,228     5,210     16.9           55,517     3,467     49,071    54
DE     2,000       532     19.7              588       255      1,169     9
EN     1,334     1,283     25.7            6,390     1,990     13,865    32
ES     1,655     1,588     31.4            5,076     1,565     11,600    36
ZH     1,762     1,663     29.5            8,103     2,535     18,554    24

Table 6: Overview of the CoNLL-2009 development datasets. For each dataset we report the number of sentences (Total_s), the number of sentences with at least one annotated predicate (Annotated), the average number of tokens per sentence (Avg. Len.), the number of predicates (Total_p) and predicate senses (Senses), and also the number of arguments (Total_a) and argument roles (Roles).

CoNLL-2009 test sets

      Sentences                          Predicates           Arguments
      Total_s   Annotated  Avg. Len.     Total_p    Senses    Total_a   Roles
CA     1,862     1,802     29.4            5,001     1,425     11,275    32
CZ     4,213     4,196     16.8           44,585     3,018     39,223    55
DE     2,000       506     20.1              550       238      1,073     8
EN     2,000     1,913     25.0            8,987     2,254     19,949    35
ES     1,725     1,663     30.2            5,175     1,623     11,824    33
ZH     2,556     2,400     30.1           12,282     3,458     27,712    26

Table 7: Overview of the CoNLL-2009 testing datasets. For each dataset we report the number of sentences (Total_s), the number of sentences with at least one annotated predicate (Annotated), the average number of tokens per sentence (Avg. Len.), the number of predicates (Total_p) and predicate senses (Senses), and also the number of arguments (Total_a) and argument roles (Roles).

CoNLL-2009 – Predicate Identification

                                            CA     CZ     DE     EN     ES     ZH
This work, m-BERT frozen / monolingual     97.9   98.6   90.5   93.8   97.8   94.3
This work, m-BERT / monolingual            98.3   98.9   91.4   94.3   98.4   95.0
This work, m-BERT / cross-lingual          98.3   99.0   91.6   94.4   98.4   95.1

This work, XLM-R frozen / monolingual      97.9   98.9   90.5   93.9   98.0   94.7
This work, XLM-R / monolingual             98.3   99.2   91.5   94.3   98.4   95.2
This work, XLM-R / cross-lingual           98.5   99.3   91.9   94.6   98.6   95.4

Table 8: F1 scores on the predicate identification subtask, which is not part of the CoNLL-2009 shared task setting.

CoNLL-2009 – Predicate Disambiguation

                                            CA     CZ     DE     EN     ES     ZH
This work, m-BERT frozen / monolingual     90.0   93.2   86.9   96.8   87.3   94.9
This work, m-BERT / monolingual            90.3   93.5   87.3   97.2   87.5   95.0
This work, m-BERT / cross-lingual          90.3   93.5   87.3   97.2   87.6   95.3

This work, XLM-R frozen / monolingual      90.1   93.6   86.8   96.8   87.4   95.2
This work, XLM-R / monolingual             90.4   93.7   87.3   97.1   87.6   95.6
This work, XLM-R / cross-lingual           90.5   93.9   87.5   97.2   87.8   95.8

Table 9: Accuracy on the predicate sense disambiguation subtask computed by the official CoNLL-2009 scorer which, by default, takes into account only the sense numbers, e.g., 01 of eat.01.

Figure 3: Output of our cross-lingual system for a French (left) and a Catalan (right) sentence.
