SemLink – Linking PropBank, VerbNet, FrameNet, WordNet, and OntoNotes Groupings
Martha Palmer, University of Colorado
LING 7800, February 12, 2013

Semlink: Overview
- PropBank
- VerbNet
  - Verbs grouped in hierarchical classes
  - Explicitly described class properties
- FrameNet
- Links among lexical resources
  - PropBank, FrameNet, WordNet, OntoNotes groupings
  - Automatic with PropBank/VerbNet

WordNet – Princeton (Miller 1985, Fellbaum 1998)
- On-line lexical reference (dictionary)
- Nouns, verbs, adjectives, and adverbs grouped into synonym sets
- Other relations include hypernyms (ISA), antonyms, meronyms
- Typical top nodes (5 out of 25):
  - (act, action, activity)
  - (animal, fauna)
  - (artifact)
  - (attribute, property)
  - (body, corpus)

WordNet – Princeton – leave, n.4, v.14 (Miller 1985, Fellbaum 1998)
- Limitations as a computational lexicon
  - Contains little syntactic information
  - No explicit lists of participants
  - Sense distinctions very fine-grained
  - Definitions often vague
- Causes problems with creating training data for supervised WSD – SENSEVAL2
  - Verbs with > 16 senses (including call)
  - Inter-annotator agreement (ITA) 71%
  - Automatic word sense disambiguation (WSD) 64%
  - Dang & Palmer, SIGLEX02
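For a concrete feel for how fine-grained these verb senses are, the sketch below uses NLTK's WordNet interface to list the verb synsets of leave and call and their immediate hypernyms. This assumes NLTK and its WordNet data are installed; sense counts vary slightly by WordNet version.

```python
# A minimal sketch using NLTK's WordNet interface (assumes `nltk` is installed
# and the WordNet corpus has been fetched via nltk.download('wordnet')).
# Sense counts differ slightly across WordNet versions.
from nltk.corpus import wordnet as wn

for lemma in ("leave", "call"):
    synsets = wn.synsets(lemma, pos=wn.VERB)
    print(f"{lemma}: {len(synsets)} verb senses")
    for s in synsets[:3]:                      # show only the first few senses
        hypernyms = s.hypernyms()
        parent = hypernyms[0].name() if hypernyms else "(top node)"
        print(f"  {s.name():<18} hypernym: {parent:<18} gloss: {s.definition()}")
```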


Creation of coarse-grained resources
- Unsupervised clustering using rules (Mihalcea & Moldovan, 2001)
- Clustering by mapping WN senses to ODE (Navigli, 2006)
- OntoNotes – manually grouping WN senses and annotating a corpus (Hovy et al., 2006)
- Supervised clustering of WN senses using OntoNotes and another set of manually tagged data (Snow et al., 2007)

OntoNotes Goal: Modeling Shallow Semantics (DARPA-GALE)
- AGILE Team: BBN, Colorado, ISI, Penn
- Skeletal representation of literal meaning
- Synergistic combination of:
  - Syntactic structure
  - Propositional structure
  - Word sense (wrt ontology)
  - Coreference
[Diagram: Text annotated with Treebank, PropBank, word sense wrt ontology, and coreference layers, yielding OntoNotes Annotated Text]

Empirical Validation – Human Judges: the 90% solution (1700 verbs)
- Leave: 49% -> 86%

Groupings Methodology – Human Judges (w/ Dang and Fellbaum)
- Double blind groupings, adjudication
- Syntactic criteria (VerbNet was useful)
  - Distinct subcategorization frames
    - call him an idiot
    - call him a taxi
  - Recognizable alternations – regular sense extensions:
    - play an instrument
    - play a song
    - play a melody on an instrument
- SIGLEX01, SIGLEX02, JNLE07, Duffield et al., CogSci 2007


Groupings Methodology (cont.)
- Semantic criteria
  - Differences in semantic classes of arguments
    - Abstract/concrete, human/animal, animate/inanimate, different instrument types, ...
  - Differences in the number and type of arguments
    - Often reflected in subcategorization frames
    - John left the room.
    - I left my pearls to my daughter-in-law in my will.
  - Differences in entailments
    - Change of prior entity or creation of a new entity?
  - Differences in types of events
    - Abstract/concrete/mental/emotional/...
  - Specialized subject domains

WordNet: call, 28 senses, 9 groups
[Diagram: the 28 WordNet senses of call clustered into 9 groups]
- Loud cry: WN5, WN16, WN12
- Bird or animal cry: WN15, WN26
- Request: WN3, WN19, WN4, WN7, WN8, WN9
- Label: WN1, WN22
- Call a loan/bond: WN20, WN25
- Challenge: WN18, WN27
- Visit: WN2, WN13
- Phone/radio: WN6, WN23, WN28
- Bid: WN17, WN11, WN10, WN14, WN21, WN24
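The groupings amount to a many-to-one map from fine-grained WordNet senses to coarse sense groups. A minimal sketch of how such a map could be represented and applied, using the call groups above (the data layout and helper name are illustrative, not part of any released resource):

```python
# A minimal sketch: mapping fine-grained WordNet sense numbers of "call"
# to the 9 coarse groups shown above. The function name and data layout
# are illustrative only.
CALL_GROUPS = {
    "loud cry":         {5, 16, 12},
    "bird/animal cry":  {15, 26},
    "request":          {3, 19, 4, 7, 8, 9},
    "label":            {1, 22},
    "call a loan/bond": {20, 25},
    "challenge":        {18, 27},
    "visit":            {2, 13},
    "phone/radio":      {6, 23, 28},
    "bid":              {17, 11, 10, 14, 21, 24},
}

def coarsen(wn_sense_number: int) -> str:
    """Return the coarse group label for a fine-grained WN sense of 'call'."""
    for group, senses in CALL_GROUPS.items():
        if wn_sense_number in senses:
            return group
    raise KeyError(f"sense {wn_sense_number} not grouped")

print(coarsen(6))   # -> "phone/radio"
```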

OntoNotes Status
- More than 2,500 verbs grouped
- Average ITA per verb = 89%
- http://verbs.colorado.edu/html_groupings/
- More than 150,000 instances annotated
  - WSJ, Brown, ECTB, EBN, EBC, WebText
  - Training and testing
- How do the groupings connect to PropBank?

Frames File Example: expect
- Roles:
  - Arg0: expecter (Agent)
  - Arg1: thing expected (Theme)
- Example: transitive, active:
  - Portfolio managers expect further declines in interest rates.
  - Arg0 (Agent): Portfolio managers
  - REL: expect
  - Arg1 (Theme): further declines in interest rates
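A minimal sketch of how a roleset from a frames file and one annotated instance might be represented in code; the class and field names are illustrative, and the real frames files are XML rather than Python objects:

```python
# Illustrative data structures for a PropBank roleset and one annotated
# instance; this is only a sketch, not the released frames-file format.
from dataclasses import dataclass, field

@dataclass
class Roleset:
    roleset_id: str                 # e.g. "expect.01"
    roles: dict                     # arg label -> description

@dataclass
class Instance:
    rel: str                        # the predicate token
    roleset_id: str
    args: dict = field(default_factory=dict)   # arg label -> text span

expect_01 = Roleset("expect.01", {"Arg0": "expecter", "Arg1": "thing expected"})

inst = Instance(
    rel="expect",
    roleset_id=expect_01.roleset_id,
    args={"Arg0": "Portfolio managers",
          "Arg1": "further declines in interest rates"},
)

for arg, span in inst.args.items():
    print(f"{arg} ({expect_01.roles[arg]}): {span}")
```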


Where we are now - DETAILS
- DARPA-GALE, OntoNotes 5.0
  - BBN, Brandeis, Colorado, Penn
  - Multilayer structure: NE, TB, PB, WS, Coref
  - Three languages: English, Arabic, Chinese
  - Several genres (@ ≥ 200K): NW, BN, BC, WT
  - Close to 2M words @ language (less PB for Arabic)
  - Parallel data, E/C, E/A
  - PropBank frame coverage for rare verbs
  - Recent PropBank extensions

Included in OntoNotes 5.1: Extensions to PropBank
- Original annotation coverage:
  - PropBank: verbs; past participle adjectival modifiers
  - NomBank: relational and eventive nouns
- Substantial gap – trying to bridge
  - Light verbs, other predicative adjectives, eventive nouns


English Noun and LVC annotation
- Example noun: decision
  - Roleset: Arg0: decider, Arg1: decision...
  - "...[your ARG0] [decision REL] [to say look I don't want to go through this anymore ARG1]"
- Example within an LVC: make a decision
  - "...[the President ARG0] [made REL-LVB] the [fundamentally correct ARGM-ADJ] [decision REL] [to get on offense ARG1]"

PropBank Verb Frames Coverage
- The set of verbs is open
- But the distribution is highly skewed
- For English, the 1000 most frequent lemmas cover 95% of the verbs in running text.
- Graphs show counts over English Web data containing 150 M verbs.
[Chart: cumulative coverage of verb instances (94%-100%) against number of verb lemmas (1000-8000)]
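The coverage figures are a cumulative-frequency calculation over a verb lemma count table. A minimal sketch, with a toy stand-in for the real corpus counts:

```python
# A minimal sketch of the coverage calculation behind the chart: what fraction
# of verb tokens is covered by the N most frequent verb lemmas. `verb_counts`
# is a toy stand-in for counts over a large web corpus.
from collections import Counter

verb_counts = Counter({"say": 5_000_000, "make": 3_200_000, "go": 2_900_000,
                       "take": 2_100_000, "decouple": 150, "abseil": 40})

def coverage(counts: Counter, n_lemmas: int) -> float:
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(n_lemmas))
    return top / total

for n in (1000, 2000, 5000):
    print(f"top {n:>5} lemmas cover {coverage(verb_counts, n):.1%} of verb tokens")
```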


Verb Frames Coverage by Language

Language | Projected Final Count | Estimated Coverage in Running Text
English  | 5,100                 | 99%
Chinese  | 18,200*               | 96%
Arabic   | 5,250*                | 99%
* This covers all the verbs and most of the predicative adjectives/nouns in ATB and CTB.

How do the PropBank verb frames relate to word senses?
Answer requires more explanation about OntoNotes senses.

Word Senses in PropBank
- Orders to ignore word sense – not feasible for 700+ verbs
  - Mary left the room
  - Mary left her daughter-in-law her pearls in her will
- Frameset leave.01 "move away from":
  - Arg0: entity leaving
  - Arg1: place left
- Frameset leave.02 "give":
  - Arg0: giver
  - Arg1: thing given
  - Arg2: beneficiary
- How do these relate to word senses in other resources?
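Because leave.01 and leave.02 differ in their role inventories (only leave.02 has an Arg2 beneficiary), a rough illustrative heuristic for frameset selection from labeled arguments might look like the sketch below; real frameset tagging is done by annotators or trained classifiers, so this is only a toy:

```python
# A rough, purely illustrative heuristic: pick the frameset whose role set
# best matches the numbered arguments observed for a predicate instance.
FRAMESETS = {
    "leave.01": {"Arg0": "entity leaving", "Arg1": "place left"},
    "leave.02": {"Arg0": "giver", "Arg1": "thing given", "Arg2": "beneficiary"},
}

def guess_frameset(observed_args: set) -> str:
    # Prefer framesets that account for every observed argument,
    # breaking ties in favor of the smaller (tighter-fitting) role set.
    candidates = [(fs, roles) for fs, roles in FRAMESETS.items()
                  if observed_args <= set(roles)]
    candidates.sort(key=lambda fr: len(fr[1]))
    return candidates[0][0] if candidates else "unknown"

print(guess_frameset({"Arg0", "Arg1"}))          # Mary left the room -> leave.01
print(guess_frameset({"Arg0", "Arg1", "Arg2"}))  # ... her pearls to her daughter-in-law -> leave.02
```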


Sense Hierarchy
(Palmer et al., SNLU04 - NAACL04, NLE07; Chen et al., NAACL06)
- PropBank Framesets – ITA >90%, coarse grained distinctions
  - 20 Senseval2 verbs w/ > 1 Frameset
  - Maxent WSD system: 73.5% baseline, 90%
- Sense Groups (Senseval-2) – ITA 82%, intermediate level (includes Levin classes) – 71.7%
  - Tagging w/ groups: ITA 90%, 200 @ hr; taggers 86.9%, Semeval07 (Chen, Dligach & Palmer, ICSC 2007)
- WordNet – ITA 73%, fine grained distinctions, 64%
  - Dligach & Palmer, ACL-11 – 88%

SEMLINK – PropBank, VerbNet, FrameNet, WordNet, OntoNotes Groupings
(Palmer, Dang & Fellbaum, NLE 2007)
[Diagram: SemLink links for carry – carry-11.4 (CARRY-FN, ON1), cost-54.2 (ON2, Frameset1*), fit-54.3 (ON3), ON4 "win election", *ON5-ON11 "carry oneself", "carried away/out/off", "carry to term"; each OntoNotes group covers a set of fine-grained WordNet senses (WN1 ... WN38)]
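One way to picture the hierarchy is as nested mappings from a coarse PropBank/VerbNet frameset through OntoNotes groups down to fine-grained WordNet senses. A minimal sketch for carry, using the class labels from the diagram above; the WN sense assignments are placeholders, not the actual SemLink mapping:

```python
# A minimal sketch of the sense hierarchy for "carry": frameset/VerbNet class
# -> OntoNotes groups -> fine-grained WordNet sense numbers. The WN sense
# assignments below are illustrative placeholders only.
CARRY_HIERARCHY = {
    "carry-11.4":  {"ON1": {1, 2, 3}},          # core "carry/transport" sense
    "cost-54.2":   {"ON2": {16, 22}},
    "fit-54.3":    {"ON3": {5, 38}},
}

def frameset_for_wn_sense(sense: int):
    """Map a fine-grained WN sense number up to its coarse frameset, if any."""
    for frameset, groups in CARRY_HIERARCHY.items():
        for wn_senses in groups.values():
            if sense in wn_senses:
                return frameset
    return None

print(frameset_for_wn_sense(22))   # -> "cost-54.2" under these placeholder assignments
```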

Limitations to PropBank
- WSJ too domain specific
  - Additional annotation & GALE data
  - FrameNet has selected instances from BNC
- Args2-4 seriously overloaded, poor performance
  - VerbNet and FrameNet both provide more fine-grained role labels

VerbNet – based on Levin, B. '93 (Kipper et al., LRE08)
- Class entries:
  - Capture generalizations about verb behavior
  - Organized hierarchically
  - Members have common semantic elements, semantic roles, syntactic frames, predicates
- Verb entries:
  - Refer to a set of classes (different senses)
  - Each class member linked to WN synset(s), ON groupings, PB frame files, FrameNet frames
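NLTK ships a VerbNet corpus reader that exposes exactly these class and verb entries; a minimal sketch (assumes NLTK with the VerbNet data downloaded via nltk.download('verbnet'); class IDs and membership differ across VerbNet releases):

```python
# A minimal sketch using NLTK's VerbNet corpus reader. Class IDs and members
# can differ across VerbNet releases, so treat the output as illustrative.
from nltk.corpus import verbnet as vn

for lemma in ("leave", "carry"):
    class_ids = vn.classids(lemma)      # the classes (senses) this verb belongs to
    print(lemma, "->", class_ids)

# Inspect one class entry: members, thematic roles, and syntactic frames.
if vn.classids("leave"):
    vn.pprint(vn.vnclass(vn.classids("leave")[0]))
```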


Mapping from PB to VerbNet
http://verbs.colorado.edu/semlink

FrameNet: Telling.inform
- [Time In 2002], [Speaker the U.S. State Department] [Target INFORMED] [Addressee North Korea] [Message that the U.S. was aware of this program, and regards it as a violation of Pyongyang's nonproliferation commitments]


Mapping from PropBank to VerbNet (similar mapping for PB-FrameNet)

Frameset id = leave.02 | Sense = give   | VerbNet class = future-having 13.3
Arg0                   | Giver          | Agent/Donor*
Arg1                   | Thing given    | Theme
Arg2                   | Benefactive    | Recipient
*FrameNet label (Baker, Fillmore & Lowe, COLING/ACL-98; Fillmore & Baker, WordNet WKSHP, 2001)

PropBank/VerbNet/FrameNet
- Complementary
- Redundancy is harmless, may even be useful
- PropBank provides the best training data
- VerbNet provides the clearest links between syntax and semantics
- FrameNet provides the richest semantics
- Together they give us the most comprehensive coverage
- So... we're also mapping VerbNet to FrameNet
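A minimal sketch of what a SemLink-style record for the leave.02 row above could look like in code, with a helper that swaps PropBank argument labels for VerbNet or FrameNet labels. The layout, the helper name, and the FrameNet frame name are illustrative; the released SemLink mappings use their own text/XML formats:

```python
# A minimal sketch of a SemLink-style mapping record for leave.02 and a helper
# that relabels PropBank numbered arguments with VerbNet thematic roles or
# FrameNet frame elements. Layout and names are illustrative only.
SEMLINK_LEAVE_02 = {
    "frameset": "leave.02",
    "vn_class": "future_having-13.3",
    "fn_frame": "Giving",              # assumed FrameNet frame, for illustration
    "roles": {
        "Arg0": {"vn": "Agent", "fn": "Donor"},
        "Arg1": {"vn": "Theme", "fn": "Theme"},
        "Arg2": {"vn": "Recipient", "fn": "Recipient"},
    },
}

def relabel(pb_args: dict, mapping: dict, scheme: str = "vn") -> dict:
    """Replace PropBank arg labels with VerbNet ('vn') or FrameNet ('fn') labels."""
    return {mapping["roles"][arg][scheme]: span
            for arg, span in pb_args.items() if arg in mapping["roles"]}

pb_args = {"Arg0": "Mary", "Arg1": "her pearls", "Arg2": "her daughter-in-law"}
print(relabel(pb_args, SEMLINK_LEAVE_02, "vn"))
# -> {'Agent': 'Mary', 'Theme': 'her pearls', 'Recipient': 'her daughter-in-law'}
```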

Mapping Issues (2): VerbNet verbs mapped to FrameNet
- VerbNet clear-10.3, members: clear, clean, drain, empty, trash
- FrameNet classes: Removing, Emptying
[Diagram: members of the single VerbNet class clear-10.3 map to two different FrameNet frames, Removing and Emptying]

Mapping Issues (3): VerbNet verbs mapped to FrameNet
- VN class: put 9.1; FrameNet frame: place
- Members: arrange*, immerse, lodge, mount, sling**
- Thematic roles: agent (+animate), theme (+concrete), destination (+loc, -region)
- Frame Elements: Agent, Cause, Theme, Goal
- Frames: ...
- Examples: ...
- *different sense; ** not in FrameNet


Class formation Issues: create (Susan Brown)
Class formation Issues: produce (Susan Brown)
[Diagrams: WordNet sense groups of create and produce aligned with VerbNet classes (e.g. engender, create) and FrameNet frames (e.g. Cause_to_start, Behind_the_scenes, Intentionally_create, Creating)]


Class formation Issues: break/VerbNet (Susan Brown)
- WN44 – the skin broke
[Diagram: WordNet sense groups of break aligned with VerbNet classes, e.g. break-45.1, split-23.2, hurt, appear-48.1.1, cheat-10.6]

Class Formation Issues: break/FrameNet (Susan Brown)
- WN49 – the simple vowels broke in many Germanic languages
[Diagram: WordNet sense groups of break aligned with FrameNet frames, e.g. Cause_to_fragment, Render_nonfunctional, Experience_bodily_harm, Compliance]


WordNet: leave, 14 senses, grouped
[Diagram: the 14 WordNet senses of leave (plus phrasal senses) clustered into groups:]
- Depart (a job, a room, a dock, a country): WN1, WN5, WN8
- Leave behind, leave alone: WN2, WN4, WN6, WN9, WN10, WN11, WN12, WN13, WN14, WNleave_off2,3, WNleave_behind1,2,3, WNleave_alone1
- Create a state / cause an effect (left us speechless, leave a stain): WN3, WN7
- "Leave off" – stop, terminate (the road leaves off, leave off your jacket): WNleave_off1
- Exclude (leave out the results): WNleave_out1, WNleave_out2

WordNet: leave, 14 senses, groups, PB
[Diagram: the same groups, each additionally linked to a PropBank frameset number (1-5); the Depart group is marked "(for X)"]


Leave behind, leave alone...
- John left his keys at the restaurant.
- We left behind all our cares during our vacation.
- They were told to leave off their coats.
- Leave the young fawn alone.
- Leave the nature park just as you found it.
- I left my shoes on when I entered their house.
- When she put away the food she left out the pie.
- Let's leave enough time to visit the museum.
- He'll leave the decision to his wife.
- When he died he left the farm to his wife.
- I'm leaving our telephone and address with you.

Overlap between Groups and PropBank Framesets – 95%
[Diagram: for the verb develop, Frameset1 and Frameset2 each cover several sense groups spanning WN1-WN20]
Palmer, Dang & Fellbaum, NLE 2007


Broader coverage still needed
- Only 78% of PropBank verbs included in VN
- Most classes focused on verbs with NP and PP complements
- Neglected verbs that take adverbial, adjectival, and sentential complements
Brown, Dligach, Palmer, IWCS 2011

SEMLINK
- Extended VerbNet: 5,391 senses (91% PB)
  - (100+ new classes from Korhonen and Briscoe, 2004; Korhonen and Ryant, 2005)
- Type-type mapping PB/VN, VN/FN
- Semi-automatic mapping of WSJ PropBank instances to VerbNet classes and thematic roles, hand-corrected (now FrameNet also)
- VerbNet class tagging as automatic WSD
- Run SRL, map Arg2 to VerbNet roles, performance improves
Yi, Loper, Palmer, NAACL07
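The last bullet describes a post-processing step: the overloaded Arg2 label in SRL output is replaced by the VerbNet thematic role that the mapping assigns for that verb sense. A minimal sketch in that spirit; the per-verb table is illustrative, since in practice the role comes from the SemLink files:

```python
# A minimal sketch of remapping PropBank Arg2 to a VerbNet thematic role per
# verb sense, in the spirit of Yi, Loper & Palmer (NAACL07). The small table
# below is illustrative; in practice the mapping is read from SemLink.
ARG2_VN_ROLE = {
    "leave.02": "Recipient",     # beneficiary of the giving event
    "hit.01":   "Instrument",
    "cut.01":   "Instrument",
}

def remap_arg2(frameset: str, srl_labels: dict) -> dict:
    """Replace 'Arg2' in SRL output with the VerbNet role for this frameset."""
    vn_role = ARG2_VN_ROLE.get(frameset)
    return {(vn_role if label == "Arg2" and vn_role else label): span
            for label, span in srl_labels.items()}

srl = {"Arg0": "Mary", "Arg1": "her pearls", "Arg2": "her daughter-in-law"}
print(remap_arg2("leave.02", srl))
# -> {'Arg0': 'Mary', 'Arg1': 'her pearls', 'Recipient': 'her daughter-in-law'}
```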

Summary
- Reviewed available lexical resources
  - WordNet, Groupings, PropBank, VerbNet, FrameNet
- We need a whole that is greater than the sum of the parts – Semlink
- Greater coverage, greater richness, increased training data over more genres, opportunities for generalizations

Lexical resources can provide
- Generalizations about subcat frames & roles
- Backoff classes for OOV items for portability
- Semantic similarities/"types" for verbs
- Event type hierarchies for inferencing
- Need to be unified, empirically validated, and extended: Semlink+
- VN & FN need PB-like coverage, and techniques for automatic domain adaptation - Lexlink
- Hybrid lexicons – symbolic and statistical lexical entries?


Acknowledgments
- We gratefully acknowledge the support of the National Science Foundation Grants for Consistent Criteria for Word Sense Disambiguation, Robust Semantic [...], Richer Representations for [...], A Bayesian Approach to Dynamic Lexical Resources for Flexible Language Processing, and DARPA-GALE via a subcontract from BBN.
- Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

And thanks to
- Postdocs: Paul Kingsbury, Dan Gildea, Nianwen Xue
- Students: Joseph Rosenzweig, Hoa Dang, Tom Morton, Karin Kipper Schuler, Jinying Chen, Szu-Ting Yi, Edward Loper, Susan Brown, Dmitriy Dligach, Jena Hwang, Will Corvey, Claire Bonial, Jinho Choi, Lee Becker, Shumin Wu, Kevin Stowe
- Collaborators: Christiane Fellbaum, Suzanne Stevenson, Annie Zaenen, Orin Hargraves
