POLYGLOT: Multilingual Semantic Role Labeling with Unified Labels

Alan Akbik, Yunyao Li
IBM Research, Almaden Research Center
650 Harry Road, San Jose, CA 95120, USA
{akbika,yunyaoli}@us.ibm.com

Abstract

Semantic role labeling (SRL) identifies the predicate-argument structure in text with semantic labels. It plays a key role in understanding natural language. In this paper, we present POLYGLOT, a multilingual semantic role labeling system capable of semantically parsing sentences in 9 different languages from 4 different language groups. At the core of POLYGLOT are SRL models for individual languages, trained with automatically generated Proposition Banks (Akbik et al., 2015). The key feature of the system is that it treats the semantic labels of the English Proposition Bank as "universal semantic labels": Given a sentence in any of the supported languages, POLYGLOT applies the corresponding SRL model and predicts English PropBank frame and role annotations. The results are then visualized to facilitate the understanding of multilingual SRL with this unified semantic representation.

Figure 1: Example of POLYGLOT predicting English PropBank labels for a simple German sentence: the verb "kaufen" is correctly identified to evoke the BUY.01 frame, while "ich" (I) is recognized as the buyer, "ein neues Auto" (a new car) as the thing bought, and "dir" (for you) as the benefactive.

1 Introduction

Semantic role labeling (SRL) is the task of labeling predicate-argument structure in sentences with shallow semantic information. One prominent labeling scheme for the English language is the Proposition Bank (Palmer et al., 2005), which annotates predicates with frame labels and arguments with role labels. Role labels roughly conform to simple questions (who, what, when, where, how much, with whom) with regard to the predicate. SRL is important for understanding natural language; it has been found useful for many applications such as information extraction (Fader et al., 2011) and question answering (Shen and Lapata, 2007; Maqsud et al., 2014).

Not surprisingly, enabling SRL for languages other than English has received increasing attention. One key effort is to create Proposition Bank-style resources for different languages, such as Chinese (Xue and Palmer, 2005) and Hindi (Bhatt et al., 2009). However, the conventional approach of manually generating such resources is costly in terms of the time and experts required, hindering the expansion of SRL to new target languages.

An alternative approach is annotation projection (Padó and Lapata, 2009; Van der Plas et al., 2011), which utilizes parallel corpora to transfer predicted SRL labels from English sentences onto sentences in a target language. It has shown great promise in automatically generating such resources for arbitrary target languages. In previous work, we presented an approach based on filtered projection and bootstrapped learning to auto-generate Proposition Bank-style resources for 7 languages, namely Arabic, Chinese, French, German, Hindi, Russian and Spanish (Akbik et al., 2015).

Figure 2: Side-by-side view of English, Chinese and Spanish sentences parsed in POLYGLOT's Web UI. English PropBank frame and role labels are predicted for all languages: all example sentences evoke the BUY.01 frame and have constituents accordingly labeled with roles such as the buyer, the thing bought, the price paid and the benefactive.
Unified semantic labels across all languages. One key difference between auto-generated PropBanks and manually created ones is that the former use English Proposition Bank labels for all target languages, while the latter use language-specific labels. As such, the auto-generated PropBanks allow us to train an SRL system that consumes text in various languages and makes predictions in a shared semantic label set, namely English PropBank labels. Refer to Figures 1 and 2 for examples of how our system predicts frame and role labels from the English Proposition Bank for sentences in German, English, Chinese and Spanish.

Similar to how Stanford dependencies are one basis of universal dependencies (De Marneffe et al., 2014; Nivre, 2015), we believe that English PropBank labels have the potential to eventually become a basis of "universal" shallow semantic labels. Such a unified representation of shallow semantics, we argue, may facilitate applications such as multilingual information extraction and question answering, much in the same way that universal dependencies facilitate tasks such as crosslingual learning and the development and evaluation of multilingual syntactic parsers (Nivre, 2015). The key questions, however, are (1) to what degree English PropBank frame and role labels are appropriate for different target languages; and (2) how far this approach can handle language-specific phenomena or semantic concepts.

Contributions. To facilitate the discussion of the above questions, we present POLYGLOT, an SRL system trained on auto-generated PropBanks for 8 languages plus English, namely Arabic, Chinese, French, German, Hindi, Japanese, Russian and Spanish. Given a sentence in one of these 9 languages, the system applies the corresponding SRL model and visualizes the shallow semantic parse with predicted English PropBank labels. POLYGLOT allows us to illustrate our envisioned approach of parsing different languages into a shared shallow semantic abstraction based on the English Proposition Bank. It also enables researchers to experiment with the tool to understand the breadth of shallow semantic concepts currently covered, and to discuss limitations and the potential of such an approach for downstream applications.

2 System Overview

Figure 3 depicts the overall architecture of POLYGLOT. First, we automatically generate labeled training data for each target language with annotation projection (Akbik et al., 2015) (Figure 3, Step 1). We use the labeled data to train for each language an SRL system (Figure 3, Step 2) that predicts English PropBank frame and role labels. Both the creation of the training data and the training of the SRL instances are one-time processes.

Figure 3: System overview.

POLYGLOT provides a Web-based GUI that allows users to interact with the SRL systems (Figure 3, Step 3). Given a natural language sentence, POLYGLOT selects the appropriate SRL instance depending on the language of the input sentence, and displays the semantic parse on the GUI, together with syntactic information.
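As an illustration of Step 3, the following minimal Python sketch shows the dispatch logic described above: one trained SRL instance per supported language, selected via automatic language identification unless the user specifies the language. All identifiers here (detect_language, load_srl_model) are hypothetical stand-ins, not our actual implementation.

    # Sketch of POLYGLOT's per-language dispatch (Figure 3, Step 3).
    # All names are illustrative placeholders, not the real system.

    SUPPORTED = {"ar", "zh", "en", "fr", "de", "hi", "ja", "ru", "es"}

    def detect_language(sentence):
        # Placeholder for an off-the-shelf language identifier.
        return "en"

    def load_srl_model(lang):
        # Placeholder for loading the SRL instance trained on the
        # auto-generated PropBank of this language (Steps 1 and 2).
        return lambda sentence: []  # would return (frame, roles) parses

    MODELS = {lang: load_srl_model(lang) for lang in SUPPORTED}

    def parse(sentence, lang=None):
        # Language is auto-detected by default, or specified by the user.
        lang = lang or detect_language(sentence)
        if lang not in MODELS:
            raise ValueError("unsupported language: " + lang)
        # Every per-language model predicts English PropBank frame and
        # role labels, whatever the input language.
        return MODELS[lang](sentence)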
In the next sections, we briefly describe the creation of the labeled data and the training of the SRL systems, followed by a tour of the Web UI.

3 Auto-Generation of Labeled Data

We followed an annotation projection approach to automatically generate the labeled data for the different languages. This approach takes as input a word-aligned parallel corpus of English sentences and their translations in a target language (TL). A semantic role labeler then predicts labels for the English sentences. In a projection step, these labels are transferred along word alignments onto the target language sentences. The underlying theory is that translated sentence pairs share a degree of semantic similarity, making such projection possible (Padó and Lapata, 2009).

Figure 4: Annotation projection for a word-aligned English-Spanish sentence pair ("Yesterday, Diego bought his girlfriend some flowers" / "Ayer, Diego compró flores para su novia").

Figure 4 illustrates an example of annotation projection: Using an SRL system trained with the English Proposition Bank, the English sentence is labeled with the appropriate frame (BUY.01) and role labels: "Diego" as the buyer (A0 in PropBank annotation), "some flowers" as the thing bought (A1) and "his girlfriend" as the benefactive (A4). In addition, "yesterday" is labeled AM-TMP, signifying a temporal context of this frame. These labels are then projected onto the aligned Spanish words. For instance, "compró" is word-aligned to "bought" and thus labeled as BUY.01. The projection produces a Spanish sentence labeled with English PropBank labels; such data can in turn be used to train an SRL system for Spanish.

State-of-the-art. Direct annotation projection often introduces errors, mostly due to non-literal translations (Akbik et al., 2015). Previous work defined lexical and syntactic constraints to increase projection quality, such as filters that allow only verbs to be labeled as frames (Van der Plas et al., 2011), heuristics that ensure that only heads of syntactic constituents are labeled as arguments (Padó and Lapata, 2009), and the use of verb translation to guide frame mappings. In (Akbik et al., 2015), we additionally proposed a process of filtered projection and bootstrapped learning, and successfully created Proposition Banks for 7 target languages. We found the quality of the generated PropBanks to be moderate to high, depending on the target language. Table 1 shows estimated precision, recall and F1-score for each language under two evaluation methods: partial evaluation counts correctly labeled incomplete constituents as true positives, while exact evaluation only counts correctly labeled complete constituents as true positives. For more details, we refer the reader to (Akbik et al., 2015).
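At its core, the projection step reduces to carrying each predicted label across the word alignment. The following Python sketch illustrates direct projection on the Figure 4 sentence pair. It is deliberately simplified (labels attach to single tokens, and none of the filters discussed above are applied), so it should be read as an illustration of the idea rather than our actual implementation.

    # Sketch of direct annotation projection for one sentence pair.
    # Simplified: single-token labels, no filtered projection.

    def project_labels(en_labels, alignment):
        """en_labels: {en_token_index: label}, e.g. {3: "buy.01"}.
        alignment: list of (en_index, tl_index) word-alignment pairs.
        Returns {tl_token_index: label} for the target-language side."""
        tl_labels = {}
        for en_i, tl_i in alignment:
            if en_i in en_labels:
                tl_labels[tl_i] = en_labels[en_i]
        return tl_labels

    # English: "Yesterday , Diego bought his girlfriend some flowers"
    #           0         1 2     3      4   5          6    7
    en_labels = {0: "AM-TMP", 2: "A0", 3: "buy.01", 5: "A4", 7: "A1"}
    # Spanish: "Ayer , Diego compró flores para su novia"
    #           0    1 2     3      4      5    6  7
    alignment = [(0, 0), (2, 2), (3, 3), (7, 4), (5, 7)]
    print(project_labels(en_labels, alignment))
    # -> {0: 'AM-TMP', 2: 'A0', 3: 'buy.01', 4: 'A1', 7: 'A4'}

The projected Spanish sentence then serves as one training example for the Spanish SRL model.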
                     PREDICATE          ARGUMENT
LANG.     Match    P     R     F1     P     R     F1    Agr    κ
Arabic    part.   0.97  0.89  0.93   0.86  0.69  0.77   0.92  0.87
          exact   0.97  0.89  0.93   0.67  0.63  0.65   0.85  0.77
Chinese   part.   0.97  0.88  0.92   0.93  0.83  0.88   0.95  0.91
          exact   0.97  0.88  0.92   0.83  0.81  0.82   0.92  0.86
French    part.   0.95  0.92  0.94   0.92  0.76  0.83   0.97  0.95
          exact   0.95  0.92  0.94   0.86  0.74  0.80   0.95  0.91
German    part.   0.96  0.92  0.94   0.95  0.73  0.83   0.95  0.91
          exact   0.96  0.92  0.94   0.91  0.73  0.81   0.92  0.86
Hindi     part.   0.91  0.68  0.78   0.93  0.66  0.77   0.94  0.88
          exact   0.91  0.68  0.78   0.58  0.54  0.56   0.81  0.69
Russian   part.   0.96  0.94  0.95   0.91  0.68  0.78   0.97  0.94
          exact   0.96  0.94  0.95   0.79  0.65  0.72   0.93  0.89
Spanish   part.   0.96  0.93  0.95   0.85  0.74  0.79   0.91  0.85
          exact   0.96  0.93  0.95   0.75  0.72  0.74   0.85  0.77

Table 1: Estimated precision and recall over seven languages from our previous evaluation (Akbik et al., 2015).

4 Semantic Role Labeling

Using the auto-generated labeled data, we train the semantic role labeler of the MATE toolkit (Björkelund et al., 2009), which achieved a state-of-the-art semantic F1-score in the multilingual semantic role labeling task of the CoNLL-2009 shared task (Hajič et al., 2009). The labeler is implemented as a sequence of local logistic regression classifiers for the four steps of predicate identification, predicate classification, argument identification and argument classification. In addition, it implements a global reranker to rerank sets of local predictions. It uses a standard feature set of lexical and syntactic features.
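The following Python pseudostructure sketches this four-stage cascade with global reranking. The classifier and reranker objects are placeholders standing in for the trained logistic regression models (the actual MATE implementation is in Java), and feature extraction is omitted entirely.

    # Schematic of the four-stage SRL cascade with global reranking.
    # `clf` maps a stage name to a trained classifier; all models and
    # features are elided placeholders.

    def label_sentence(tokens, clf, reranker):
        parses = []
        # 1. Predicate identification: which tokens evoke a frame?
        predicates = [i for i in range(len(tokens))
                      if clf["pred_id"](tokens, i)]
        for p in predicates:
            # 2. Predicate classification: pick a frame, e.g. "buy.01".
            frame = clf["pred_cls"](tokens, p)
            # 3. Argument identification: candidate argument positions.
            args = [i for i in range(len(tokens))
                    if clf["arg_id"](tokens, p, i)]
            # 4. Argument classification: n-best role assignments, from
            #    which the global reranker picks the best complete one.
            candidates = clf["arg_cls"](tokens, p, args)  # [{arg: role}]
            best = max(candidates,
                       key=lambda roles: reranker(tokens, p, frame, roles))
            parses.append((p, frame, best))
        return parses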

Preprocessing. Before SRL, we execute a pipeline of NLP tools to extract the required lexical, morphological and syntactic features. To facilitate reproducibility of the presented work, we use publicly available open source tools and pretrained models where available. A breakdown of the preprocessing tools used for each language is given in Table 2.

Data sets. In order to generate training data for POLYGLOT, we used the following sources of parallel data: the UN corpus of official United Nations documents (Rafalovitch et al., 2009), the Europarl corpus of European parliament proceedings (Koehn, 2005), the OpenSubtitles corpus of movie subtitles (Lison and Tiedemann, 2016), the Hindencorp corpus automatically gathered from web sources (Bojar et al., 2014), and the Tatoeba corpus of language learning examples (http://tatoeba.org/eng/). The data sets were obtained from the OPUS project (Tiedemann, 2012) and word-aligned using the Berkeley Aligner (https://code.google.com/archive/p/berkeleyaligner/). Table 2 lists the data sets used for each language and the combined number of available parallel sentences.

LANGUAGE   NLP PREPROCESSING                               PARALLEL DATA SETS       #SENTENCES
Arabic     STANFORDCORENLP, KHOJASTEMMER, STANFORDPARSER   UN, OpenSubtitles        24.5M
Chinese    STANFORDCORENLP, MATEPARSER                     UN, OpenSubtitles        12.2M
English    CLEARNLP                                        n/a                      n/a
French     STANFORDCORENLP, MATETRANSITIONPARSER           UN, OpenSubtitles        36M
German     STANFORDCORENLP, MATETRANSITIONPARSER           Europarl, OpenSubtitles  14.1M
Hindi      TNTTAGGER, MALTPARSER                           Hindencorp               54K
Japanese   JJST                                            Tatoeba, OpenSubtitles   1.7M
Russian    TREETAGGER, MALTPARSER                          UN, OpenSubtitles        22.7M
Spanish    STANFORDCORENLP, MATEPARSER                     UN, OpenSubtitles        52.4M

Table 2: NLP tools and sources of parallel data used for each language. Since English is the source language for annotation projection, no parallel data was required to train its SRL model. NLP tools: STANFORDCORENLP (Manning et al., 2014), TNTTAGGER (Brants, 2000), TREETAGGER (Schmid, 1994), KHOJASTEMMER (Khoja and Garside, 1999), STANFORDPARSER (Green and Manning, 2010), CLEARNLP (Choi and McCallum, 2013), MATEPARSER (Bohnet, 2010), MATETRANSITIONPARSER (Bohnet and Nivre, 2012), MALTPARSER (Nivre et al., 2006), JJST: proprietary system.

5 POLYGLOT User Interface

The Web-based GUI of POLYGLOT allows users to enter sentences in one of the 9 languages and request a shallow semantic analysis. Figure 5 presents a screenshot of the GUI. Users begin by entering a sentence in the text field and clicking the "parse sentence" button. As indicated in Figure 3, the sentence is then passed to a language-specific NLP pipeline, based on the associated language, which is detected automatically by default or specified by the user. The pipeline tokenizes and lemmatizes the sentence, and performs morphological analysis, dependency parsing and semantic role labeling.

Output. The syntactic and semantic parsing results are displayed below the input field, following the design of (Björkelund et al., 2010): the topmost result table is the semantic analysis, presented as a grid in which each row corresponds to one identified semantic frame. The grid highlights sentence constituents labeled with roles and includes role descriptions for better interpretability of the parsing results.

Below the results of the semantic analysis, the GUI shows two more detailed views of the parsing results. The first visualizes the dependency parse tree generated using WHATSWRONGWITHMYNLP (https://code.google.com/archive/p/whatswrong/), while the second (omitted from Figure 5 in the interest of space) displays the full syntactic-semantic parse in CoNLL format, including morphological information and other features not present in the dependency tree visualization. These two views may be helpful to users who wish to identify possible sources of SRL errors. For instance, a common error class stems from errors in dependency parsing, which cause incorrect constituents to be labeled as arguments.
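To make the third view concrete: the CoNLL-2009 format (Hajič et al., 2009) stores one token per tab-separated line, with 14 fixed columns followed by one role column per identified predicate. The Python sketch below reads such a sentence block and reassembles the frames; it is a minimal illustration under that format assumption, not the GUI's actual code.

    # Minimal reader for a CoNLL-2009 sentence block: 14 fixed columns
    # (id, form, lemma, plemma, pos, ppos, feat, pfeat, head, phead,
    # deprel, pdeprel, fillpred, pred) plus one role column per predicate.

    def read_conll09(lines):
        rows = [l.rstrip("\n").split("\t") for l in lines if l.strip()]
        tokens = [{"form": r[1], "lemma": r[2], "pos": r[4],
                   "head": int(r[8]), "deprel": r[10],
                   "pred": r[13] if r[12] == "Y" else None,
                   "roles": r[14:]} for r in rows]
        # The k-th role column belongs to the k-th predicate in order.
        frames = []
        for k, pred in enumerate(t["pred"] for t in tokens if t["pred"]):
            roles = [(t["form"], t["roles"][k])
                     for t in tokens if t["roles"][k] != "_"]
            frames.append((pred, roles))
        return tokens, frames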
Example sentence. Figure 5 illustrates the result visualization of the tool. A user enters the sentence "Hier, je voulais acheter une baguette, mais je n'avais pas assez d'argent" (engl. "Yesterday I wanted to buy a baguette, but I didn't have enough money") into the text field. As indicated in the top left corner, this sentence is auto-detected to be French.

Figure 5: POLYGLOT's Web UI with a French example sentence.

The results of the semantic analysis are displayed below the input field. The first column in the grid indicates that three frames have been identified: WANT.01, BUY.01 and HAVE.03. The second row in the grid corresponds to the WANT.01 frame, which identifies "je" (engl. "I") as the wanter and "acheter une baguette" (engl. "buy a baguette") as the thing wanted. The arguments are color-coded by PropBank argument type for better readability. For instance, in the PropBank annotation scheme, the agents of the three frames in the example (the wanter, the buyer and the owner) are all annotated with the same role (A0). They are thus highlighted in the same yellow color in the visualization. This allows a user to quickly gauge whether the semantic analysis of the sentence is correct. (The example sentence in Figure 5 contains one error: the word "hier" (engl. "yesterday") should be labeled AM-TMP instead of AM-DIS; all other labels are correct.)

6 Demonstration and Outlook

We present POLYGLOT as a hands-on demo where users can enter sentences and request shallow semantic analyses. We also plan to make it publicly accessible in the future. We are currently working on improving the annotation projection approach to generate higher quality data by experimenting with further constraints, language-specific heuristics and improved frame mappings. We are particularly interested in how far English Proposition Bank labels are suitable for arbitrary target languages and may serve as the basis of a "universal semantic role labeling" framework: we are qualitatively analyzing auto-generated PropBanks in comparison to manual efforts; meanwhile, we are evaluating POLYGLOT in downstream applications such as multilingual information extraction. Through the presentation of POLYGLOT, we hope to engage the research community in this discussion.

References

Alan Akbik, Laura Chiticariu, Marina Danilevsky, Yunyao Li, Shivakumar Vaithyanathan, and Huaiyu Zhu. 2015. Generating high quality proposition banks for multilingual semantic role labeling. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015), Beijing, China.

Rajesh Bhatt, Bhuvana Narasimhan, Martha Palmer, Owen Rambow, Dipti Misra Sharma, and Fei Xia. 2009. A multi-representational and multi-layered treebank for Hindi/Urdu. In Proceedings of the Third Linguistic Annotation Workshop, pages 186–189. Association for Computational Linguistics.

Anders Björkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual semantic role labeling. In Proceedings of the Thirteenth CoNLL: Shared Task, pages 43–48, Boulder, Colorado. Association for Computational Linguistics.

Anders Björkelund, Bernd Bohnet, Love Hafdell, and Pierre Nugues. 2010. A high-performance syntactic and semantic dependency parser. In Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations, pages 33–36. Association for Computational Linguistics.
Bernd Bohnet and Joakim Nivre. 2012. A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing. In Proceedings of EMNLP-CoNLL 2012, pages 1455–1465. Association for Computational Linguistics.

Bernd Bohnet. 2010. Very high accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd COLING, pages 89–97. Association for Computational Linguistics.

Ondřej Bojar, Vojtěch Diatka, Pavel Rychlý, Pavel Straňák, Vít Suchomel, Aleš Tamchyna, Daniel Zeman, et al. 2014. HindEnCorp – Hindi-English and Hindi-only corpus for machine translation. In Proceedings of the Ninth LREC.

Thorsten Brants. 2000. TnT: A statistical part-of-speech tagger. In Proceedings of the Sixth Conference on Applied Natural Language Processing, pages 224–231. Association for Computational Linguistics.

Jinho D. Choi and Andrew McCallum. 2013. Transition-based dependency parsing with selectional branching. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics.

Marie-Catherine De Marneffe, Timothy Dozat, Natalia Silveira, Katri Haverinen, Filip Ginter, Joakim Nivre, and Christopher D. Manning. 2014. Universal Stanford dependencies: A cross-linguistic typology. In LREC, volume 14, pages 4585–4592.

Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1535–1545. Association for Computational Linguistics.

Spence Green and Christopher D. Manning. 2010. Better Arabic parsing: Baselines, evaluations, and analysis. In Proceedings of the 23rd COLING, pages 394–402. Association for Computational Linguistics.

Jan Hajič, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Antònia Martí, Lluís Màrquez, Adam Meyers, Joakim Nivre, Sebastian Padó, Jan Štěpánek, et al. 2009. The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth CoNLL: Shared Task, pages 1–18. Association for Computational Linguistics.
Shereen Khoja and Roger Garside. 1999. Stemming Arabic text. Computing Department, Lancaster University, Lancaster, UK.

Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In MT Summit, volume 5, pages 79–86.

Pierre Lison and Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In Proceedings of LREC.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60.

Umar Maqsud, Sebastian Arnold, Michael Hülfenhaus, and Alan Akbik. 2014. Nerdle: Topic-specific question answering using Wikia seeds. In COLING (Demos), pages 81–85.

Joakim Nivre, Johan Hall, and Jens Nilsson. 2006. MaltParser: A data-driven parser-generator for dependency parsing. In Proceedings of LREC, volume 6, pages 2216–2219.

Joakim Nivre. 2015. Towards a universal grammar for natural language processing. In Computational Linguistics and Intelligent Text Processing, pages 3–16. Springer.

Sebastian Padó and Mirella Lapata. 2009. Cross-lingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36(1):307–340.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106.

Alexandre Rafalovitch, Robert Dale, et al. 2009. United Nations General Assembly resolutions: A six-language parallel corpus. In Proceedings of the MT Summit, volume 12, pages 292–299.

Helmut Schmid. 1994. Probabilistic part-of-speech tagging using decision trees. In Proceedings of the International Conference on New Methods in Language Processing, volume 12, pages 44–49.

Dan Shen and Mirella Lapata. 2007. Using semantic roles to improve question answering. In EMNLP-CoNLL, pages 12–21.

Jörg Tiedemann. 2012. Parallel data, tools and interfaces in OPUS. In Proceedings of LREC, May 21–27, 2012, Istanbul, Turkey, pages 2214–2218.

Lonneke Van der Plas, Paola Merlo, and James Henderson. 2011. Scaling up automatic cross-lingual semantic role annotation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, Volume 2, pages 299–304. Association for Computational Linguistics.

Nianwen Xue and Martha Palmer. 2005. Automatic semantic role labeling for Chinese verbs. In IJCAI, volume 5, pages 1160–1165.