SemLink+: FrameNet, VerbNet and Event Ontologies

Martha Palmer, Claire Bonial Diana McCarthy Department of Linguistics Department of Theoretical and University of Colorado Applied Linguistics (DTAL) Boulder, CO University of Cambridge mpalmer/[email protected] [email protected]

places and events buried within text. These Abstract relations can be very complex, and are not always explicit, requiring subtle semantic This paper reviews the significant interpretation of the data. For instance, NLP contributions FrameNet has made to systems must be able to automatically our understanding of lexical resources, recognize that Stock prices sank and The stock semantic roles and event relations. market is falling can be describing the same event. Such an interpretation relies upon a 1 Introduction recognition of the similarity between sinking and falling, as well as noting the connection One of the great challenges of Natural between stock prices and the stock market, Language Processing (NLP) is the multitude of and, finally, acknowledgment that they are choices that language gives us for expressing playing the same role. A key element in event the same thing in different ways. This is extraction is the identification of the obviously true when taking other languages participants of an event, such as the initiator of into consideration - the same thought can be an action and any parties affected by it. expressed in English, French, Chinese or Basically who did what to whom, when, where, Russian, with widely varying results. However, why and how? Many systems today rely on it is also true when considering a single to help identify language such as English. Light verb participants, and lexical resources that provide constructions, nominalizations, idioms, slang, an inventory of possible predicate argument paraphrases, and synonyms all give us myriads structures for individual lexical items are of alternatives for “coining a phrase.” This crucial to the success of semantic role labeling causes immense difficulty for NLP systems. (Palmer,et al., 2010). No one has made greater contributions to advancing the state of the art of lexical 3 SemLink+ and Semantic Roles , and its applications to NLP, than Chuck Fillmore. In this paper we focus on the SemLink (Palmer, 2009) is an ongoing effort central role that FrameNet has played in our to map complementary lexical resources: development of SemLink+ and in our current PropBank (Palmer et al., 2005), VerbNet explorations into event ontologies that can play (Kipper et al., 2008), FrameNet (Fillmore et a practical role in accurate automatic event al., 2004), and the recently added OntoNotes extraction. (ON) sense groupings (Weischedel, et al., 2011). They all associate semantic information 2 Detecting events with the propositions in a sentence. Each was created independently with somewhat differing An elusive goal of current NLP systems is the goals, and they vary in the level and nature of accurate detection of events – recognizing the semantic detail represented. FrameNet is the meaningful relations among the topics, people,

13 Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929–2014), pages 13–17, Baltimore, Maryland USA, June 27, 2014. c 2014 Association for Computational Linguistics most fine-grained with the richest semantics, for core verb specific roles and for adjuncts it VerbNet focuses on syntactically-based has several generally applicable ArgModifiers generalizations that carry semantic that have function tag labels such as: MaNneR, implications, and the relatively coarse-grained TeMPoral, LOCation, DIRection, GOaL, etc. PropBank has been shown to provide the most VerbNet uses more traditional linguistic effective training data for supervised Machine thematic role labels, with about 30 in total, and Learning techniques. Nonetheless, they can be assumes adjuncts (ArgM’s) will be supplied by seen as complementary rather than conflicting, PropBank based semantic role labelers. and together comprise a whole that is greater FrameNet is even more fine-grained and has than the sum of its parts. SemLink serves as a frame-specific core and peripheral roles called platform to unify these resources. The recent Frame Elements for each frame, amounting to addition of ON sense groupings, which can be over 2000 individual Frame Element types. thought of as a more coarse-grained view of For example, He talked about politics would WordNet (Fellbaum, 1998), provides even receive the following semantic role labels from broader coverage for verbs, and a level of each framework.1 representation that is appropriate for linking between VerbNet class members and PropBank (talk.01) FrameNet lexical units, as described below. HeArg0 talkedRELATION about politicsArg1 SemLink unifies these lexical resources at several different levels. First by providing VerbNet (Talk-37.5): type-to-type mappings between the lexical HeAGENT talkedRELATION about politicsTOPIC units for each framework. For PropBank these are the very coarse-grained rolesets, for FrameNet (Statement frame): VerbNet they are verbs that are members of HeSPEAKER talkedRELATION about politicsTOPIC VerbNet classes, and for FrameNet they are the lexical units associated with each Frame. The Thanks to Chuck Fillmore’s careful same lemma can have multiple PropBank guidance, the rich, meticulously crafted rolesets and can be in several VerbNet classes Frames in FrameNet, with their detailed and FrameNet frames, but always with descriptions of all possible arguments and their different meanings. In general, the mappings relations to each other, offer the potential of from PropBank to VerbNet or FrameNet tend providing a foundation for inferencing about to be 1-many, while the mappings between events and their consequences. In addition VerbNet and FrameNet are more likely to be 1- FrameNet has from the beginning been 1. For example, the verb hear has just one inclusive in its addition of nominal and coarse-grained sense in PropBank, with the adjectival forms to the Frames, which greatly following roleset: increases our coverage of all predicating elements (Bonial, et al., 2014). There is also a Arg0: hearer comprehensive FrameNet Constructicon that Arg1: utterance, sound painstakingly lists many phrasal constructions, Arg2: speaker, source of sound such as “the Xer, the Yer” that cannot be found anywhere else (Fillmore, et al., 2012). Many of This roleset maps to both the Discover and See these frames, including the constructions, classes of VerbNet, and the Hear and apply equally well to other languages, as Perception_experience frames of FrameNet. evidenced by the various efforts to develop Then, for each lexical unit, SemLink also FrameNets in other languages2 promising a supplies a mapping between the semantic roles likely benefit to multilingual information of PropBank and VerbNet, as well as the roles of VerbNet and FrameNet. PropBank uses 1 very generic labels such as Arg0 and Arg1, Arg0 maps to Agent maps to Speaker. Arg1 maps to Topic maps to Topic. which correspond to Dowty’s Prototypical 2 Agent and Patient, respectively (Dowty, 1991). See FrameNet projects in other languages listed at https://framenet.icsi.berkeley.edu/fndrupal/framenets_in_ PropBank has up to six numbered arguments other_languages

14 processing as well. Given the close theoretical the kitchen event. It can sometimes be quite ties between PropBank, VerbNet and challenging to determine the relationship FrameNet, it should be possible to bootstrap between two events. For instance, earthquakes from the successful PropBank-based automatic are quite often associated with the collapse of semantic role labelers to equally accurate buildings, as in the following example, The FrameNet and VerbNet annotators, and to quake destroyed parts of Sausalito. All tall improve overall semantic role labeling buildings were demolished. performance (Bauer & Rambow, 2011; Many readers might agree that the Dipanjan, et al., 2010; Giuglea & Moschitti, earthquakes CAUSED the demolishment of the 2006; Merlo & der Plas, 2009; Yi, et al., 2007). buildings. However, are the building collapses That is one of the primary goals of SemLink. also SUBEVENTs of the earthquakes? The first release of SemLink (1.1) contained Sometimes they happen a few days later, or mappings between these three lexical resources immediately, simultaneously with the as well as a set of PropBank instances from the earthquake. Are they both subevents? In Wall Street Journal data with mappings to general, for accurate event detection, it would VerbNet classes and thematic roles (Palmer, be very useful to know which events must 2009). Our most recent release, SemLink 1.2,3 precede, must follow, or cannot be now includes mappings to FrameNet frames simultaneous with, which other events. As and Frame Elements wherever they are discussed in the 2013 NAACL Events available (FN version 1.5), as well as ON sense workshop and this year’s ACL Events groupings (Bonial, et al., 2013). The mapping workshop, clear, consistent annotation of files between PropBank and VerbNet (version events and their coreference and causal and 3.2), and FrameNet have also been checked for temporal relations is a much desired but very consistency and updated to more accurately challenging goal (Ikuta & Palmer, 2014). Any reflect the current relations between these assistance that can be provided by lexical resources. resources is welcome. This annotated corpus can now be used to Another very important contribution that train and evaluate VerbNet Class and FrameNet has made is in the realm of defining FrameNet Frame classifiers, to explore clusters these kinds of relations, and others, between of Frame Elements that map to the same frames. Parent-Child Frame to Frame relations VerbNet and PropBank semantic roles, and to can include Inheritance, Subframe, Perspective evaluate approaches to semantic role labeling On, Using, Causative Of, Inchoative of, and that use the type-to-type mappings to bootstrap there is also a Precedes temporal ordering VerbNet and FrameNet role labels from relation. automatic PropBank semantic role labels. The DEFT working group in Richer Event Descriptions has recently been exploring 4 Events, Event Types and Subevents expanding the ACE and ERE event types, and how they can be mapped onto a broader Accurate and informative semantic role ontological context. Exploring the FrameNet labels are an essential component of event relations that the relevant lexical items extraction, but, although necessary, they are participate in has been most informative. We not sufficient. Automatic event detection also first examined the simple LDC ERE requires the ability to distinguish between classification of Conflict events, which has events which are truly separate, such as demonstrations and attacks as siblings (ERE Yesterday, John was throwing a ball to Mary guidelines). We find FrameNet’s classification and Bill was flying a kite, as opposed to related of attacks as Hostile-Encounters quite useful, events such as John was washing the dishes and have no argument with it having an and Mary was drying them. The second pair Inheritance relation with Intentionally_act, and could be seen as temporally related subevents a Using relation with Taking_sides. of an overall doing the dishes or cleaning up Demonstrations, on the other hand, come under the Protest Frame, which has a Using 3 available for download here: http://verbs.colorado.edu/semlink/ relation with Taking_sides. The FrameNet

15 organization of demonstrations and attacks, as well as rich details about all of their possible although perfectly justifiable, doesn’t map participants. If we can harness the power of neatly onto the LDC organization since, to help us dynamically although they are close, they are not siblings. extend and enrich what has already been However, by also considering SUMO (Niles & manually created, we may find our computers Pease, 2001), the Predicate Matrix (de Lacalle , to be much smarter than we ever imagined et al., 2014), WordNet and VerbNet, we were them to be. able to develop the upper level partial Event Ontology given in Figure 1, which comfortably Acknowledgments incorporates the ERE and FrameNet relations This work has benefited immensely from comments within a broader framework, preserving the and suggestions during the discussions of the RED key aspects of each. working group on Event Ontologies, especially We are now discussing the ERE Life events, from Teruko Mitamura, Annie Zaenen, Ann Bies, birth, death, injury, marriage, divorce, etc., and German Rigau. We also gratefully and FrameNet is again proving to be acknowledge the support of the National Science inspirational. SemLink+ will encompass our Foundation Grant NSF-IIS-1116782, A Bayesian growing Event Ontology, as well as the Approach to Dynamic Lexical Resources for mappings between the resources and the Flexible Language Processing, DARPA FA-8750- multiple layers of annotation on the same data. 13-2-0045, subaward 560215 (via LDC) DEFT: Deep Exploration and Filtering of Text, DARPA

Machine Reading (via BBN), and NIH: 1 R01 LM010090-01A1, THYME, (via Harvard). The content is solely the responsibility of the authors and does not necessarily represent the official views of DARPA, NSF or NIH.

References Daniel Bauer & Owen Rambow, 2011, Increasing Coverage of Syntactic Subcategorization Patterns in FrameNet Using VerbNet, In the Proceedings of the IEEE Fifth International Conference on Semantic Computing. Claire Bonial, Julia Bonn, Kathryn Conger, Jena Figure 1 – SemLink+ Event Ontology, partial Hwang and Martha Palmer, 2014. PropBank: Semantics of New Predicate Types. The 9th edition of the Language Resources and 5 Conclusion Evaluation Conference. Reykjavik, Iceland. Since computers do not interact with and Claire Bonial, Kevin, Stowe, and Martha Palmer, experience the world the same way humans do, 2013. Renewing and Revising SemLink. The how could they ever interpret language GenLex Workshop on Linked Data in Linguistics, held with GenLex-13. describing the world the same way humans do? That NLP has made as much progress as it has Dipanjan Das, Nathan Schneider, Desai Chen, and is truly phenomenal, and there is much more Noah A. Smith, 2010. Probabilistic Frame- still that can be done. Rich, detailed, lexical Semantic . In Proceedings of the NAACL resources like FrameNet are major stepping 2010. stones that will enable continued David Dowty, 1991. Thematic Proto-Roles and improvements in the automatic representation Argument Selection. Language, 67:547-619 of sentences in context. FrameNet, and Christiane Fellbaum, 1998. WordNet: An Electronic WordNet, PropBank, VerbNet and SemLink+, Lexical Data-base. Language, Speech and provide priceless, invaluable information about Communications. MIT Press myriads of different types of events and the creative ways in which they can be expressed,

16 Charles. J. Fillmore; Collin F. Baker, and H. Sato, Ontology in Information Systems (FOIS-2001), 2004. FrameNet as a ``Net". In Proceedings of Chris Welty and Barry Smith, eds, Ogunquit, LREC 2004, 4, pages 1091-1094 Maine, October 17-19, 2001. Charles J. Fillmore, Russell R. Lee-Goldman, and Paola Merlo and Lonneke van der Plas. 2009. Russell Rhodes. 2012. “The FrameNet Abstraction and generalization in semantic role Constructicon” Boas, H.C. and Sag, I.A. (Eds.) labels: PropBank, VerbNet or both?, In the Sign-based Construction Grammar, CSLI Proceedings of ACL 2009. Publications. Martha Palmer, 2009. SemLink: Linking PropBank, Ana-Maria Guiglea and Alessandro Moschitti. VerbNet and FrameNet, In the Proceedings of 2006. Semantic role labeling via FrameNet, the Generative Lexicon Conference, GenLex-09. VerbNet and PropBank. In Proceedings of Martha Palmer, Daniel Gildea, and Paul Kingsbury. Coling-ACL 2006, pages 929–936. 2005. The Proposition Bank: An Annotated Rei Ikuta and Martha Palmer (2014) Challenges of Corpus of Semantic Roles. Computational Adding Causation to Richer Event Descriptions, Linguistics, 31(1):71–106 In the Proceedings of 2nd Events Workshop, held Martha Palmer, Daniel Gildea and Nianwen Xue. with ACL 2014, Baltimore, MD. Semantic Role Labeling. 2010. Synthesis Karin Kipper, Anna Korhonen, Neville Ryant and Lectures on Human Language Technology Martha Palmer. 2008. A Large-Scale Series, ed. Graeme Hirst, Morgan and Claypoole. Classification of English Verbs. Language Ralph Weischedel, Eduard Hovy, Mitchell Marcus, Resources and Evaluation Journal, 42(1):21–40 Martha Palmer, Robert Belvin, Sameer Pradan, Maddalen Lopez de Lacalle, Egoitz Laparra, Lance Ramshaw and Nianwen Xue. OntoNotes: German Rigau, 2014, Predicate Matrix: A Large Training Corpus for Enhanced extending SemLink throughWordNet mappings, Processing, included in Part 1 : Data Acquisition The 9th edition of the Language Resources and and Linguistic Resources of the Handbook of Evaluation Conference. Reykjavik, Iceland. Natural Language Processing and : Global Automatic Language Ian Niles and Adam Pease, 2001. Towards a Exploitation Editors: Joseph Olive, Caitlin Standard Upper Ontology. In Proceedings of the Christianson, John McCary, Springer Verglag, 2nd International Conference on Formal pp 54-63, 201

17