Predicate Representations and Polysemy in VerbNet Semantic Parsing

Total Pages: 16

File Type: PDF, Size: 1020 KB

Predicate Representations and Polysemy in VerbNet Semantic Parsing

James Gung* and Martha Palmer, University of Colorado Boulder
[email protected], [email protected]
*Work done prior to joining Amazon.

Proceedings of the 14th International Conference on Computational Semantics, pages 51–62, June 17–18, 2021. ©2021 Association for Computational Linguistics.

Abstract

Despite recent advances in semantic role labeling propelled by pre-trained text encoders like BERT, performance lags behind when applied to predicates observed infrequently during training or to sentences in new domains. In this work, we investigate how semantic role labeling performance on low-frequency predicates and out-of-domain data can be improved by using VerbNet, a verb lexicon that groups verbs into hierarchical classes based on shared syntactic and semantic behavior and defines semantic representations describing relations between arguments. We find that VerbNet classes provide an effective level of abstraction, improving generalization on low-frequency predicates by allowing them to learn from the training examples of other predicates belonging to the same class. We also find that joint training of VerbNet role labeling and predicate disambiguation of VerbNet classes for polysemous verbs leads to improvements in both tasks, naturally supporting the extraction of VerbNet's semantic representations.

         Billy      consoled        the puppy
    PB   Arg0       console.01      Arg1
    VN   Stimulus   amuse-31.1      Experiencer

         Billy      walked          the puppy
    PB   Arg0       walk.01         Arg1
    VN   Agent      run-51.3.2-2-1  Theme

Table 1: Comparison of PropBank (PB) and VerbNet (VN) roles for the predicates console and walk. VerbNet's thematic role assignments (e.g. Stimulus vs. Agent and Experiencer vs. Theme) are more dependent on the predicate than PropBank's numbered arguments.

1 Introduction

Semantic role labeling (SRL) is a form of shallow semantic parsing that involves the extraction of predicate arguments and their assignment to consistent roles with respect to the predicate, facilitating the labeling of e.g. who did what to whom (Gildea and Jurafsky, 2000). SRL systems have been broadly applied to applications such as question answering (Berant et al., 2014; Wang et al., 2015), machine translation (Liu and Gildea, 2010; Bazrafshan and Gildea, 2013), dialog systems (Tur and Hakkani-Tür, 2005; Chen et al., 2013), metaphor detection (Stowe et al., 2019), and clinical information extraction (Gung, 2013; MacAvaney et al., 2017).

Recent approaches to SRL have achieved large gains in performance through the use of pre-trained text encoders like ELMo and BERT (Peters et al., 2018; Devlin et al., 2019). Despite these advances, performance on low-frequency predicates and out-of-domain data remains low relative to in-domain performance on higher-frequency predicates.

The assignment of role labels to a predicate's arguments is dependent upon the predicate's sense. PropBank (Palmer et al., 2005) divides each predicate into one or more rolesets, which are coarse-grained sense distinctions that each provide a set of core numbered arguments (A0-A5) and their corresponding definitions. VerbNet (VN) groups verbs into hierarchical classes, each class defining a set of valid syntactic frames that define a direct correspondence between thematic roles and syntactic realizations, e.g. Agent REL Patient (e.g. John broke the vase) or Patient REL (e.g. The vase broke) for break-45.1 (Schuler, 2005).
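To make the contrast concrete, the following sketch (illustrative only; the dictionary layout is ours, not a format used by the paper or by either resource) encodes the Table 1 annotations, showing that PropBank reuses Arg0/Arg1 for both verbs while VerbNet's class and role labels change with the predicate.

    # Illustrative only: the Table 1 annotations for "console" and "walk" as Python data.
    # The dictionary layout is ours, not a format used by PropBank or VerbNet.
    TABLE_1 = [
        {"sentence": "Billy consoled the puppy",
         "propbank": {"sense": "console.01", "Billy": "Arg0", "the puppy": "Arg1"},
         "verbnet": {"class": "amuse-31.1", "Billy": "Stimulus", "the puppy": "Experiencer"}},
        {"sentence": "Billy walked the puppy",
         "propbank": {"sense": "walk.01", "Billy": "Arg0", "the puppy": "Arg1"},
         "verbnet": {"class": "run-51.3.2-2-1", "Billy": "Agent", "the puppy": "Theme"}},
    ]

    for example in TABLE_1:
        # PropBank reuses Arg0/Arg1 for both verbs; VerbNet's class and roles change.
        print(example["sentence"])
        print("  PB:", example["propbank"])
        print("  VN:", example["verbnet"])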
Recent PropBank (PB) semantic role labeling models have largely eschewed explicit predicate disambiguation in favor of direct prediction of semantic roles in end-to-end trainable models (Zhou and Xu, 2015; He et al., 2017; Shi and Lin, 2019). This is possible for several reasons: First, PropBank's core roles and modifiers are shared across all predicates, allowing a single classifier to be trained over tokens or spans. Second, although definitions of PB roles are specific to the different senses of each predicate, efforts are made when creating rolesets to ensure that A0 and A1 exhibit properties of Dowty's (1991) prototypical Agent and prototypical Patient respectively. Finally, PB rolesets are defined based on VN class membership, with predicates in the same classes thus being assigned relatively consistent role definitions (Bonial et al., 2010).

Unlike PropBank, VerbNet's thematic roles are shared across predicates and classes with consistent definitions. However, VN roles are more dependent on the identity of the predicate (Zapirain et al., 2008; Merlo and Van Der Plas, 2009); examples of PropBank and VerbNet roles illustrating this are given in Table 1. Consequently, VN role labeling models may benefit more from predicate features than PropBank models. Furthermore, while it is possible to identify PB or VN roles without classifying predicate senses, linking the resulting roles to their definitions or to the syntactic frames and associated semantic primitives in VN does require explicit predicate disambiguation (Brown et al., 2019). Therefore, predicate disambiguation is often an essential step when applying SRL systems to real-world problems.

In this work, we evaluate alternative approaches for incorporating VerbNet classes in English VerbNet and PropBank role labeling. We propose a joint model for SRL and VN predicate disambiguation (VN classification), finding that joint training leads to improvements in VN classification and role labeling for out-of-domain predicates. We also evaluate VN classes as predicate-specific features. Using gold classes, we observe significant improvements in both PB and VN SRL. We also observe improvements in VN role labeling when using predicted classes and features that incorporate all valid classes for each predicate. Our code is available at https://github.com/jgung/verbnet-parsing-iwcs-2021.

2 Background and Related Work

VerbNet. VerbNet is a broad-coverage lexicon that groups verbs into hierarchical classes based on shared syntactic and semantic behavior (Schuler, 2005). Each VN class is assigned a set of thematic roles that, unlike PB numbered arguments, maintain consistent meanings across different verbs and classes. VN classes provide an enumeration of syntactic frames applicable to each member verb, describing how the thematic roles of a VN class may be realized in a sentence. Every syntactic frame entails a set of low-level semantic representations (primitives) that describe relations between thematic role arguments as well as changes throughout the course of the event (Brown et al., 2018). The close relationship between syntactic realizations and semantic representations facilitates straightforward extraction of VN semantic predicates given identification of a VN class and corresponding thematic roles. VN primitives have been applied to problems such as machine comprehension (Clark et al., 2018) and question generation (Dhole and Manning, 2020).

Comparing VerbNet with PropBank. Yi et al. (2007) use VN role groupings to improve label consistency across verbs by reducing the overloading of PropBank's numbered arguments like A2. Comparing SRL models trained on PB and VN, Zapirain et al. (2008) find that their VerbNet model performs worse on infrequent predicates than their PB model, and suggest that VN is more reliant on the identity of the predicate than PB, based on experiments removing predicate-specific features from their models. They suggest that the high consistency of A0 and A1 enables PB to generalize better without relying on predicate-specific information. Merlo and Van Der Plas (2009) provide an information-theoretic perspective on the comparison of PropBank and VerbNet, demonstrating that the identity of the predicate is more important to VN SRL than to PB SRL by comparing the conditional entropy of roles given verbs as well as the mutual information of roles and verbs. In multilingual BERT probing studies comparing several SRL formalisms, Kuznetsov and Gurevych (2020) find that layer utilization for predicates differs between PB and VN: PB emphasizes the same layers used for syntactic tasks, while VN uses layers associated more with lexical tasks. These findings reinforce the importance of predicate representations to VerbNet.
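The information-theoretic comparison described above can be made concrete with a short sketch (our illustration, not Merlo and Van Der Plas's code; the toy counts are invented): it computes the conditional entropy H(Role | Verb) and the mutual information I(Role; Verb) from (verb, role) co-occurrence counts, the two quantities used in their comparison.

    import math
    from collections import Counter

    def role_verb_statistics(pairs):
        """Return (H(Role | Verb), I(Role; Verb)) in bits for a list of (verb, role) pairs."""
        n = len(pairs)
        joint = Counter(pairs)
        verbs = Counter(v for v, _ in pairs)
        roles = Counter(r for _, r in pairs)
        h_role = -sum((c / n) * math.log2(c / n) for c in roles.values())
        h_role_given_verb = -sum(
            (c / n) * math.log2(c / verbs[v]) for (v, _), c in joint.items()
        )
        return h_role_given_verb, h_role - h_role_given_verb

    # Toy annotations: VerbNet-style labels vary with the verb, PropBank-style labels do not.
    vn_pairs = [("console", "Stimulus"), ("console", "Experiencer"),
                ("walk", "Agent"), ("walk", "Theme")]
    pb_pairs = [("console", "Arg0"), ("console", "Arg1"),
                ("walk", "Arg0"), ("walk", "Arg1")]

    print("VN H(Role|Verb), I(Role;Verb):", role_verb_statistics(vn_pairs))  # (1.0, 1.0)
    print("PB H(Role|Verb), I(Role;Verb):", role_verb_statistics(pb_pairs))  # (1.0, 0.0)

On these toy counts the VerbNet roles share one bit of mutual information with the verb while the PropBank roles share none, mirroring the observation that VN roles depend more on predicate identity.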
SRL and Predicate Disambiguation. Previous work has investigated the interplay between predicate sense disambiguation and SRL. Dang and Palmer (2005) improve verb sense disambiguation (VSD) using features based on semantic role labels. Moreda and Palomar (2006) find that explicit verb senses improve PB SRL for verb-specific roles like A2 and A3, but hurt performance on adjuncts. Yi (2007) finds that using gold-standard PB roleset IDs as features in an SRL model improves performance only on highly polysemous verbs. Dahlmeier et al. (2009) propose a joint probabilistic model for preposition disambiguation and SRL, finding an improvement over independent models.

Predicate disambiguation plays a critical role in FrameNet (Baker et al., 1998) parsing, in part because FrameNet's role inventory is more than an order of magnitude larger than that of PB and VN. This richer, more granular role inventory lends advantages to approaches that constrain role identification to the set of valid roles for the predicted frame (Das et al., 2014; Hermann et al., 2014), or that jointly encode argument and role representations given identified frames (FitzGerald et al., 2015).

The encoding of a sentence is produced using the target predicate as the Sent. B input to BERT. For example, the sentence "I tried opening it" is processed as

    [CLS] I tried opening it [SEP] opening [SEP]

for the verb open. This enables BERT to incorporate the identity of the predicate in the encoding of each word while clearly delineating it from tokens in the original sentence.

To simplify notation, we treat LM(a, b) ∈ R^(T_a × D_LM) as shorthand for the final-layer BERT encoding of a pair of sentences a = w_1, ..., w_{T_a} and b = w_1, ..., w_{T_b}, with T_a and T_b words respectively, where D_LM is BERT's hidden size. This encoding is produced by applying WordPiece tokenization (WP) to each word in each sentence and concatenating the resulting sequences of token IDs with the standard BERT-specific IDs:

    w = [CLS], WP(a), [SEP], WP(b), [SEP]
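The following sketch (our illustration using the HuggingFace Transformers library, not necessarily the authors' implementation; the bert-base-uncased checkpoint is an assumption) reproduces this predicate-aware encoding: the sentence is passed as Sent. A and the target predicate as Sent. B, and the final-layer output corresponds to LM(a, b), with one D_LM-dimensional vector per WordPiece token.

    # Sketch of the predicate-aware encoding using HuggingFace Transformers (our
    # illustration, not necessarily the authors' implementation; the bert-base-uncased
    # checkpoint is an assumption).
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    sentence = "I tried opening it"   # Sent. A
    predicate = "opening"             # Sent. B: the target predicate

    # WordPiece-tokenize both segments and add the standard [CLS]/[SEP] special tokens.
    inputs = tokenizer(sentence, predicate, return_tensors="pt")
    print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))
    # ['[CLS]', 'i', 'tried', 'opening', 'it', '[SEP]', 'opening', '[SEP]']

    with torch.no_grad():
        outputs = model(**inputs)

    # Final-layer encoding LM(a, b): one D_LM-dimensional vector per WordPiece token.
    print(outputs.last_hidden_state.shape)  # torch.Size([1, 8, 768])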
Recommended publications
  • Abstraction and Generalisation in Semantic Role Labels: PropBank, VerbNet or Both?
    Abstraction and Generalisation in Semantic Role Labels: PropBank, VerbNet or both? Paola Merlo and Lonneke Van Der Plas, Linguistics Department, University of Geneva, 5 Rue de Candolle, 1204 Geneva, Switzerland. [email protected] [email protected]

    Abstract: Semantic role labels are the representation of the grammatically relevant aspects of a sentence meaning. Capturing the nature and the number of semantic roles in a sentence is therefore fundamental to correctly describing the interface between grammar and meaning. In this paper, we compare two annotation schemes, PropBank and VerbNet, in a task-independent, general way, analysing how well they fare in capturing the linguistic generalisations that are known to hold for semantic role labels, and consequently how well they grammaticalise aspects of meaning.

    The role of theories of semantic role lists is to obtain a set of semantic roles that can apply to any argument of any verb, to provide an unambiguous identifier of the grammatical roles of the participants in the event described by the sentence (Dowty, 1991). Starting from the first proposals (Gruber, 1965; Fillmore, 1968; Jackendoff, 1972), several approaches have been put forth, ranging from a combination of very few roles to lists of very fine-grained specificity. (See Levin and Rappaport Hovav (2005) for an exhaustive review.) In NLP, several proposals have been put forth in recent years and adopted in the annotation of large samples of text (Baker et al., 1998; Palmer et al., 2005; Kipper, 2005; Loper et al., 2007). ...
    [Show full text]
  • VerbAtlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling
    VerbAtlas: a Novel Large-Scale Verbal Semantic Resource and Its Application to Semantic Role Labeling. Andrea Di Fabio, Simone Conia, Roberto Navigli. Department of Computer Science and Department of Literature and Modern Cultures, Sapienza University of Rome, Italy. {difabio,conia,navigli}@di.uniroma1.it

    Abstract: We present VerbAtlas, a new, hand-crafted lexical-semantic resource whose goal is to bring together all verbal synsets from WordNet into semantically-coherent frames. The frames define a common, prototypical argument structure while at the same time providing new concept-specific information. In contrast to PropBank, which defines enumerative semantic roles, VerbAtlas comes with an explicit, cross-frame set of semantic roles linked to selectional preferences expressed in terms of WordNet synsets, and is the first resource enriched with semantic information about implicit, shadow, and default arguments. We demonstrate the effectiveness of VerbAtlas in the task of dependency-based Semantic Role Labeling and show how its integration ...

    The automatic identification and labeling of argument structures is a task pioneered by Gildea and Jurafsky (2002) called Semantic Role Labeling (SRL). SRL has become very popular thanks to its integration into other related NLP tasks such as machine translation (Liu and Gildea, 2010), visual semantic role labeling (Silberer and Pinkal, 2018) and information extraction (Bastianelli et al., 2013). In order to be performed, SRL requires the following core elements: 1) a verb inventory, and 2) a semantic role inventory. However, the current verb inventories used for this task, such as PropBank (Palmer et al., 2005) and FrameNet (Baker et al., 1998), are language-specific and lack high-quality interoperability with existing knowledge bases.
    [Show full text]
  • Generating Lexical Representations of Frames Using Lexical Substitution
    Generating Lexical Representations of Frames using Lexical Substitution. Saba Anwar and Chris Biemann, Universität Hamburg, Germany; Artem Shelmanov and Alexander Panchenko, Skolkovo Institute of Science and Technology, Russia.

    Abstract: Semantic frames are formal linguistic structures describing situations/actions/events, e.g. Commercial transfer of goods. Each frame provides a set of roles corresponding to the situation participants, e.g. Buyer and Goods, and lexical units (LUs), words and phrases that can evoke this particular frame in texts, e.g. Sell. The scarcity of annotated resources hinders wider adoption of frame semantics across languages and domains. We investigate a simple yet effective method, lexical substitution with word representation models, to automatically expand a small set of frame-annotated sentences with new words for their respective roles and LUs. We evaluate the expansion quality using FrameNet.

    Table 1: An example of the induced lexical representation (roles and LUs) of the Assistance FrameNet frame using lexical substitutes from a single seed sentence.
        Seed sentence: I hope Patti [Helper] can help [Assistance] you [Benefited party] soon [Time].
        Substitutes for Assistance: assist, aid
        Substitutes for Helper: she, I, he, you, we, someone, they, it, lori, hannah, paul, sarah, melanie, pam, riley
        Substitutes for Benefited party: me, him, folk, her, everyone, people
        Substitutes for Time: tomorrow, now, shortly, sooner, tonight, today, later

    ... annotated resources. Some publicly available resources are FrameNet (Baker et al., 1998) and PropBank (Palmer et al., 2005), yet for many lan...
    [Show full text]
  • Mona Diab Center for Computational Learning Systems (CCLS) Columbia University Email: [email protected] Office: (212)8701290
    Mona Diab, Center for Computational Learning Systems (CCLS), Columbia University. Email: [email protected]. Office: (212)8701290.

    Education
    • Ph.D. Linguistics (Computational Linguistics), Aug. '03, University of Maryland, College Park. Title: Word Sense Disambiguation within a multilingual framework. Advisor: Philip Resnik
    • M.Sc. Computer Science (Artificial Intelligence - Machine Learning), Jun. '97, The George Washington University
    • B.Sc. Computer Science & Software Engineering, Jun. '92, American University in Cairo
    • B.Sc. Tourist Guidance (Egyptology & Archaeology), Jun. '90, Faculty of Tourism and Hotel Management, Helwan University

    Work Experience
    • Associate Research Scientist (Columbia University), July '05 - present. Working on semantic role parsing of Modern Standard Arabic (MSA) [Collaboration with Dan Jurafsky, Stanford University]. Working on syntactic parsing of MSA [Collaboration with Nizar Habash & Owen Rambow, Columbia University]. Working on tools for processing MSA [Collaboration with Kadri Hacioglu, University of Colorado, Boulder]. Developing standards for the processing of Arabic dialects [Collaboration with Nizar Habash & Owen Rambow, Columbia University]. Building Arabic PropBank [Collaboration with Martha Palmer, University of Colorado, Boulder]. Building an Arabic WordNet [Collaboration with Christiane Fellbaum, Princeton University]. Working on WordNet clustering [Collaboration with Dan Jurafsky & Andrew Ng, Stanford University]. Working on Arabic verb subcategorization frame induction [Collaboration with Neal Snider, Stanford University]. Extending work on building lexica using comparable corpora [Collaboration with Vincent Lacey, Georgia Tech].
    • Postdoctoral Research Scientist (Columbia University), Feb. '05 - Jun. '05. Working on tools for processing MSA as well as developing standards for the processing of Arabic dialects.
    • Postdoctoral Scholar (Stanford University), Jan. '04 - Jan. '05. Working with Dan Jurafsky on domain and language independent semantic parsing and disambiguation.
    [Show full text]
  • Distributional Semantics, WordNet, Semantic Role Labeling
    Semantics. Kevin Duh, Intro to NLP, Fall 2019.

    Outline: Challenges; Distributional Semantics; Word Sense; Semantic Role Labeling.

    The Challenge of Designing Semantic Representations. Q: What is semantics? A: The study of meaning. Q: What is meaning? A: ... We know it when we see it. These sentences/phrases all have the same meaning: XYZ corporation bought the stock. / The stock was bought by XYZ corporation. / The purchase of the stock by XYZ corporation... / The stock purchase by XYZ corporation... But how to formally define it?

    Example representations for the sentence "I have a car": a logic formula, a graph representation, key-value records, or a translation in another language ("Ich habe ein Auto"). There's no single agreed-upon representation that works in all cases. Different emphases: words or sentences; syntax-semantics interface, logical inference, etc. Different aims: "deep (and narrow)" vs "shallow (and broad)", e.g. Show me all flights from BWI to NRT. Do we link to actual flight records? Or to the general concept of flying machines?

    Distributional Semantics. Learning distributional semantics from a large text dataset: "Your pet dog is so cute" / "Your pet cat is so cute"; "The dog ate my homework" / "The cat ate my homework". neighbor(dog) overlaps-with neighbor(cat), so meaning(dog) is-similar-to meaning(cat). Word2Vec implements distributional semantics: from "Your pet dog is so cute" it builds training pairs such as (Your pet _ is so | dog) and (dog | Your pet _ is so). From: Mikolov, Tomas, et al. (2013), "Efficient Estimation of Word Representations in Vector Space". Latent Semantic Analysis (LSA) also implements distributional semantics. Pet-peeve: fundamentally, neural approaches aren't so different from classical LSA.
    [Show full text]
  • Foundations of Natural Language Processing Lecture 16: Semantic Role Labelling and Argument Structure
    Foundations of Natural Language Processing, Lecture 16: Semantic Role Labelling and Argument Structure. Alex Lascarides (slides based on those of Schneider, Koehn, Lascarides), 13 March 2020.

    Language is Flexible. Often we want to know who did what to whom (when, where, how and why), but the same event and its participants can have different syntactic realizations: Sandy broke the glass. vs. The glass was broken by Sandy. She gave the boy a book. vs. She gave a book to the boy. Instead of focusing on syntax, consider the semantic roles (also called thematic roles) defined by each event.

    Argument Structure and Alternations. Mary opened the door / The door opened. John slices bread with a knife / This bread slices easily / The knife slices cleanly. Mary loaded the truck with hay / Mary loaded hay onto the truck / The truck was loaded with hay (by Mary) / The hay was loaded onto the truck (by Mary). John gave a present to Mary / John gave Mary a present. cf. Mary ate the sandwich with Kim!

    Syntax-Semantics Relationship. syntax ≠ semantics: the semantic roles played by different participants in the sentence are not trivially inferable from syntactic relations, though there are patterns! The idea of semantic roles can be combined with other aspects of meaning (beyond this course). Commonly used thematic...
    [Show full text]
  • LIMIT-BERT : Linguistics Informed Multi-Task BERT
    LIMIT-BERT: Linguistics Informed Multi-Task BERT. Junru Zhou, Zhuosheng Zhang, Hai Zhao and Shuailiang Zhang. Department of Computer Science and Engineering, Shanghai Jiao Tong University; Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University.

    Abstract: In this paper, we present Linguistics Informed Multi-Task BERT (LIMIT-BERT) for learning language representations across multiple linguistics tasks by Multi-Task Learning. LIMIT-BERT includes five key linguistics tasks: Part-Of-Speech (POS) tags, constituent and dependency syntactic parsing, and span and dependency semantic role labeling (SRL). Different from recent Multi-Task Deep Neural Networks (MT-DNN), our LIMIT-BERT is fully linguistics motivated and thus is capable of adopting an improved masked training objective according to syntactic and semantic ...

    ... and so on (Zhou and Zhao, 2019; Zhou et al., 2020; Ouchi et al., 2018; He et al., 2018b; Li et al., 2019), when taking the latter as downstream tasks for the former. In the meantime, introducing linguistic clues such as syntax and semantics into the pre-trained language models may furthermore enhance other downstream tasks such as various Natural Language Understanding (NLU) tasks (Zhang et al., 2020a,b). However, nearly all existing language models are usually trained on large amounts of unlabeled text data (Peters et al., 2018; Devlin et al., 2019), without explicitly exploiting linguistic knowledge.
    [Show full text]
  • Merging Propbank, Nombank, Timebank, Penn Discourse Treebank and Coreference James Pustejovsky, Adam Meyers, Martha Palmer, Massimo Poesio
    Merging PropBank, NomBank, TimeBank, Penn Discourse Treebank and Coreference. James Pustejovsky, Adam Meyers, Martha Palmer, Massimo Poesio.

    Abstract: Many recent annotation efforts for English have focused on pieces of the larger problem of semantic annotation, rather than initially producing a single unified representation. This paper discusses the issues involved in merging four of these efforts into a unified linguistic structure: PropBank, NomBank, the Discourse Treebank and Coreference Annotation undertaken at the University of Essex. We discuss resolving overlapping and conflicting annotation as well as how the various annotation schemes can reinforce each other to produce a representation that is greater than the sum of its parts.

    1. Introduction: The creation of the Penn Treebank (Marcus et al, 1993) and the word sense-annotated SEMCOR (Fellbaum, 1997) have shown how even limited amounts of annotated data can result in major improvements in complex natural language understanding systems.

    ... annotates the temporal features of propositions and the temporal relations between propositions. The Penn Discourse Treebank (Miltsakaki et al 2004a/b) treats discourse connectives as predicates and the sentences being joined as arguments. Researchers at Essex were responsible for the coreference markup scheme developed in MATE (Poesio et al, 1999; Poesio, 2004a) and have annotated corpora using this scheme, including a subset of the Penn Treebank (Poesio and Vieira, 1998) and the GNOME corpus (Poesio, 2004a). This paper discusses the issues involved in creating a Unified Linguistic Annotation (ULA) by merging annotation of examples using the schemata from these efforts. Crucially, all individual annotations can be kept separate in order to make it easy to produce alternative annotations of a specific type of semantic information without need to modify the annotation at the other levels. Embarking on separate annotation efforts has the advantage of allowing researchers to focus on the difficult issues in each area of semantic annotation ...
    [Show full text]
  • Unstructured Text to DBPEDIA RDF Triples – Entity Extraction
    ISSN (Online) 2394-6849. International Journal of Engineering Research in Electronics and Communication Engineering (IJERECE), Vol 4, Issue 11, November 2017. Unstructured Text to DBPEDIA RDF Triples – Entity Extraction. [1] Monika S G, [2] Chiranjeevi S, [3] Harshitha A, [4] Harshitha M, [5] V K Tivari, [6] Raghevendra Rao. [1-5] Department of Electronics and Communication Engineering, Sri Sairam College of Engineering, Anekal, Bengaluru. [5] Assistant Professor, Department of Electronics and Communication Engineering, Sri Sairam College of Engineering, Anekal, Bengaluru. [6] Assistant Professor, Department of Computer Science and Engineering, Sri Sairam College of Engineering, Anekal, Bengaluru.

    Abstract: With current technologies, the use of data and information has grown significantly over the last few years. Information processing faces issues when data originates from multiple sources in an uncontrolled environment; the environment is uncontrolled because the data is gathered beyond the organization and generated by many people working outside it. The intent of this paper is to delve into this unformatted information and build a framework such that the information becomes more manageable and usable in the organization; a case in point is that resumes submitted for particular positions should become searchable. In this framework, we try to solve the problem and provide suggestions on how to solve other similar problems. In this paper, we describe an end-to-end system that automatically extracts RDF triples describing entity relations and properties from unstructured text. This system is based on a pipeline of text processing modules that includes a semantic parser and a co-reference solver.
    [Show full text]
  • Frame-Based Continuous Lexical Semantics Through Exponential Family Tensor Factorization and Semantic Proto-Roles
    Frame-Based Continuous Lexical Semantics through Exponential Family Tensor Factorization and Semantic Proto-Roles. Francis Ferraro, Adam Poliak, Ryan Cotterell and Benjamin Van Durme. Center for Language and Speech Processing, Johns Hopkins University.

    Figure 1: A simple frame analysis (AGENT, GOAL and ATTEMPT annotations over "She said Bill would try the same tactic again.").

    Abstract: We study how different frame annotations complement one another when learning continuous lexical semantics. We learn the representations from a tensorized skip-gram model that consistently encodes syntactic-semantic content better, with multiple 10% gains over baselines.

    1 Introduction: Consider "Bill" in Fig. 1: what is his involvement with the words "would try," and what does this involvement mean? Word embeddings represent such meaning as points in a real-valued vector space (Deerwester et al., 1990; Mikolov et al., 2013). These representations are often learned by exploiting the frequency that the word cooccurs with contexts, often within a user-defined window (Harris, 1954; Turney and Pantel, 2010).

    ... with judgements about multiple underlying properties about what is likely true of the entity filling the role. For example, SPR talks about how likely it is for Bill to be a willing participant in the ATTEMPT. The answers to this and other simple judgments characterize Bill and his involvement. Since SPR both captures the likelihood of certain properties and characterizes roles as groupings of properties, we can view SPR as representing a type of continuous frame semantics. We are interested in capturing these SPR-based ...
    [Show full text]
  • VerbNet and PropBank
    VerbNet and PropBank: Word and VerbNets for Semantic Processing. DAAD Summer School in Advanced Language Engineering, Kathmandu University, Nepal, Day 3. Annette Hautli.

    Recap: Levin (1993) provides a detailed analysis of syntactic alternations in English. Verbs with a common meaning partake in the same syntactic alternations, yielding verb classes that share syntactic properties. How can we use this information in NLP?

    Today: 1. The English VerbNet (lexical information, the VerbNet API); 2. PropBank.

    Overview: hierarchically arranged verb classes, inspired by Levin (1993) and further extended for NLP purposes. Levin: 240 classes (47 top level, 193 second level), around 3000 lemmas. VerbNet: 471 classes, around 4000 lemmas. Explicit links to other resources, e.g. WordNet, FrameNet.

    Components: thematic roles. 23 thematic roles are used in VerbNet; each argument is assigned a unique thematic role; when verbs have more than one sense, they usually have different thematic roles.

    Role examples:
        Actor: Susan was chitchatting.
        Agent: Cynthia nibbled on the carrot.
        Asset: Carmen purchased a dress for $50.
        Attribute: The price of oil soared.
        Beneficiary: Sandy sang a song to me.
        Cause: My heart is pounding from fear.
    [Show full text]
  • Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles
    Building a Hebrew Semantic Role Labeling Lexical Resource from Parallel Movie Subtitles. Ben Eyal, Michael Elhadad. Dept. of Computer Science, Ben-Gurion University of the Negev, Beer Sheva, Israel. [email protected], [email protected]

    Abstract: We present a semantic role labeling resource for Hebrew built semi-automatically through annotation projection from English. This corpus is derived from the multilingual OpenSubtitles dataset and includes short informal sentences, for which reliable linguistic annotations have been computed. We provide a fully annotated version of the data including morphological analysis, dependency syntax and semantic role labeling in both FrameNet and PropBank styles. Sentences are aligned between English and Hebrew, both sides include full annotations and the explicit mapping from the English arguments to the Hebrew ones. We train a neural SRL model on this Hebrew resource exploiting the pre-trained multilingual BERT transformer model, and provide the first available baseline model for Hebrew SRL as a reference point. The code we provide is generic and can be adapted to other languages to bootstrap SRL resources. Keywords: SRL, FrameNet, PropBank, Hebrew, Cross-lingual Linguistic Annotations

    1. Introduction: Semantic role labeling (SRL) is the task that consists of annotating sentences with labels that answer questions such as "Who did what to whom, when, and where?"; the answers to these questions are called "roles." Two major SRL annotation schemes have been used in recent years, FrameNet (Baker et al., 1998) and PropBank ... We provide full source code and datasets for the described method: https://github.com/bgunlp/hebrew_srl. The resulting dataset is the first available SRL resource for Hebrew. 1.1. FrameNet: This work is part of a larger effort to develop a Hebrew FrameNet (Hayoun and Elhadad, 2016).
    [Show full text]