Automatic Semantic Role Labeling with PropBank/VerbNet
Martha Palmer, University of Colorado
LING 7800, February 12, 2013

SemLink: Overview
- SemLink: linking PropBank, WordNet, OntoNotes groupings, VerbNet, FrameNet
- VerbNet: verbs grouped in hierarchical classes, with explicitly described class properties
- Links among lexical resources: PropBank, FrameNet, WordNet, OntoNotes groupings
WordNet – Princeton (Miller 1985, Fellbaum 1998)
- On-line lexical reference (dictionary)
- Nouns, verbs, adjectives, and adverbs grouped into synonym sets
- Other relations include hypernyms (ISA), antonyms, meronyms
- Typical top nodes (5 out of 25): (act, action, activity), (animal, fauna), (artifact), (attribute, property), (body, corpus)

WordNet – Princeton – leave, n.4, v.14 (Miller 1985, Fellbaum 1998)
- Limitations as a computational lexicon:
  - Contains little syntactic information
  - No explicit lists of participants
  - Sense distinctions very fine-grained, definitions often vague
- Causes problems with creating training data for supervised machine learning – SENSEVAL2:
  - Verbs with > 16 senses (including call): inter-annotator agreement (ITA) 71%, automatic word sense disambiguation (WSD) 64%
  - Dang & Palmer, SIGLEX02
CLEAR – Colorado
Creation of coarse-grained resources
- Unsupervised clustering using rules (Mihalcea & Moldovan, 2001)
- Clustering by mapping WN senses to ODE (Navigli, 2006)
- OntoNotes: manually grouping WN senses and annotating a corpus (Hovy et al., 2006)
- Supervised clustering of WN senses using OntoNotes and another set of manually tagged data (Snow et al., 2007)

OntoNotes Goal: Modeling Shallow Semantics (DARPA-GALE)
- AGILE Team: BBN, Colorado, ISI, Penn
- Skeletal representation of literal meaning
- Synergistic combination, over the same annotated text, of:
  - Syntactic structure (Treebank)
  - Propositional structure (PropBank)
  - Word sense wrt ontology
  - Coreference
Empirical Validation – Human Judges: the 90% solution (w/ Dang and Fellbaum)
- Double-blind groupings, adjudication
- ITA for "leave": 49% -> 86% with grouped senses

Groupings Methodology – Human Judges (1,700 verbs)
- Syntactic criteria (VerbNet was useful):
  - Distinct subcategorization frames: "call him an idiot" vs. "call him a taxi"
  - Recognizable alternations – regular sense extensions: play an instrument / play a song / play a melody on an instrument
- SIGLEX01, SIGLEX02, JNLE07; Duffield et al., CogSci 2007
Groupings Methodology (cont.)
- Semantic criteria:
  - Differences in semantic classes of arguments: abstract/concrete, human/animal, animate/inanimate, different instrument types, …
  - Differences in the number and type of arguments, often reflected in subcategorization frames:
    John left the room.
    I left my pearls to my daughter-in-law in my will.
  - Differences in entailments: change of a prior entity or creation of a new entity?
  - Differences in types of events: abstract/concrete/mental/emotional/…
  - Specialized subject domains

WordNet: call, 28 senses, 9 groups
[Figure: the 28 WordNet senses of "call" partitioned into 9 coarse groups:
Loud cry (WN5, WN12, WN16); Bird or animal cry (WN15, WN26); Request (WN3, WN4, WN7, WN8, WN9, WN19); Label (WN1, WN22); Call a loan/bond (WN20, WN25); Challenge (WN18, WN27); Visit (WN2, WN6, WN13, WN23); Phone/radio (WN28); Bid (WN10, WN11, WN14, WN17, WN21, WN24)]
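The "call" grouping above can be written down as a simple sense-to-group table. The sketch below is a hypothetical data structure for illustration (not the actual OntoNotes groupings file format), using the group labels and WN sense numbers from the figure:

```python
# Hypothetical encoding of the "call" grouping: each coarse group lists
# the fine-grained WordNet sense numbers it collapses. Illustrative only;
# not the real OntoNotes groupings file format.
CALL_GROUPS = {
    "loud cry":         {5, 12, 16},
    "bird/animal cry":  {15, 26},
    "request":          {3, 4, 7, 8, 9, 19},
    "label":            {1, 22},
    "call a loan/bond": {20, 25},
    "challenge":        {18, 27},
    "visit":            {2, 6, 13, 23},
    "phone/radio":      {28},
    "bid":              {10, 11, 14, 17, 21, 24},
}

def group_of(wn_sense: int) -> str:
    """Map a fine-grained WordNet sense number to its coarse group."""
    for group, senses in CALL_GROUPS.items():
        if wn_sense in senses:
            return group
    raise KeyError(f"call sense {wn_sense} is ungrouped")

# Collapsing 28 fine senses into 9 groups is what makes annotation
# tractable (grouping-level ITA ~89% vs. WordNet-level ~71-73%).
assert group_of(28) == "phone/radio"
assert len(CALL_GROUPS) == 9
assert sum(len(s) for s in CALL_GROUPS.values()) == 28
```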
OntoNotes Status
- More than 2,500 verbs grouped
- Average ITA per verb = 89%
- http://verbs.colorado.edu/html_groupings/
- More than 150,000 instances annotated: WSJ, Brown, ECTB, EBN, EBC, WebText
- Training and testing
- How do the groupings connect to PropBank?

Frames File Example: expect
- Roles:
  - Arg0: expecter
  - Arg1: thing expected
- Example (transitive, active): "Portfolio managers expect further declines in interest rates."
  - Agent: Portfolio managers
  - REL: expect
  - Theme: further declines in interest rates
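The frames-file entry for expect and its annotated instance can be sketched in plain Python. This is an illustrative representation of the content only (the real PropBank frames files are XML, and the roleset id "expect.01" is assumed here):

```python
# Sketch of a PropBank frames-file entry and one annotated instance.
# Illustrative representation; the real frames files are XML, and the
# roleset id used here is an assumption.
EXPECT_ROLESET = {
    "id": "expect.01",
    "roles": {"ARG0": "expecter", "ARG1": "thing expected"},
}

# "Portfolio managers expect further declines in interest rates."
instance = {
    "rel": "expect",
    "ARG0": "Portfolio managers",                  # Agent
    "ARG1": "further declines in interest rates",  # Theme
}

# Every labeled argument should be licensed by the roleset.
for arg in instance:
    assert arg == "rel" or arg in EXPECT_ROLESET["roles"]
```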
Where we are now – DETAILS
DARPA-GALE, OntoNotes 5.0
- BBN, Brandeis, Colorado, Penn
- Multilayer structure: NE, TB, PB, WS, Coref
- Three languages: English, Arabic, Chinese
- Several genres (@ ≥ 200K): NW, BN, BC, WT
- Close to 2M words per language (less PB for Arabic)
- Parallel data, E/C, E/A

Included in OntoNotes 5.1: Extensions to PropBank
- Original annotation coverage:
  - PropBank: verbs; past-participle adjectival modifiers
  - NomBank: relational and eventive nouns
- Substantial gap – trying to bridge: light verbs, other predicative adjectives, eventive nouns
- PropBank frame coverage for rare verbs
- Recent PropBank extensions
PropBank Verb Frames Coverage
- The set of verbs is open, but the distribution is highly skewed
- For English, the 1,000 most frequent lemmas cover 95% of the verbs in running text
[Figure: cumulative coverage (94%–100%) against number of lemmas (1000–8000), counted over English Web data containing 150M verbs]

English Noun and LVC Annotation
- Example noun: decision
  - Roleset: Arg0: decider, Arg1: decision…
  - "…[your ARG0] [decision REL] [to say look I don't want to go through this anymore ARG1]"
- Example within an LVC: make a decision
  - "…[the President ARG0] [made REL-LVB] the [fundamentally correct ARGM-ADJ] [decision REL] [to get on offense ARG1]"
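The skew behind the coverage graph can be illustrated with a toy Zipf-style frequency distribution: a small head of the vocabulary accounts for most tokens. The vocabulary size and exponent below are hypothetical choices, not fit to the actual 150M-verb Web counts:

```python
# Toy illustration of skewed verb-lemma frequencies. Under a Zipf-like
# distribution, the most frequent lemmas dominate running text. The
# 8,000-lemma vocabulary and the 1.5 exponent are hypothetical, not
# derived from the Web counts behind the slide's graph.
def coverage_of_top(freqs, k):
    """Fraction of all tokens covered by the k most frequent lemmas."""
    ranked = sorted(freqs, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

zipf = [1 / rank ** 1.5 for rank in range(1, 8001)]  # 8,000 lemmas

cov_1000 = coverage_of_top(zipf, 1000)
assert coverage_of_top(zipf, 8000) == 1.0   # whole vocabulary
assert cov_1000 > 0.9                       # the head dominates
assert cov_1000 > coverage_of_top(zipf, 100)
```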
Verb Frames Coverage

  Language   Projected Final Count   Estimated Coverage in Running Text
  English    5,100                   99%
  Chinese    18,200*                 96%
  Arabic     5,250*                  99%

* This covers all the verbs and most of the predicative adjectives/nouns in ATB and CTB.

Word Senses in PropBank
- Orders to ignore word sense – not feasible for 700+ verbs:
  Mary left the room.
  Mary left her daughter-in-law her pearls in her will.
- Frameset leave.01 "move away from": Arg0: entity leaving; Arg1: place left
- Frameset leave.02 "give": Arg0: giver; Arg1: thing given; Arg2: beneficiary
- How do the PropBank verb frames relate to word senses?
- Answer requires more explanation about OntoNotes senses. How do these relate to word senses in other resources?
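A minimal sketch of how the two leave framesets differ in argument structure, using one hypothetical rule (a beneficiary Arg2 is only licensed by leave.02). Real frameset tagging is a supervised WSD task, not a single rule:

```python
# The two PropBank framesets for "leave", as on the slide.
LEAVE_FRAMESETS = {
    "leave.01": {"gloss": "move away from",
                 "roles": {"ARG0": "entity leaving", "ARG1": "place left"}},
    "leave.02": {"gloss": "give",
                 "roles": {"ARG0": "giver", "ARG1": "thing given",
                           "ARG2": "beneficiary"}},
}

def pick_frameset(labeled_args):
    """Hypothetical one-rule disambiguator: an ARG2 beneficiary is only
    licensed by leave.02; otherwise default to leave.01. Real frameset
    tagging is learned from annotated data, not a single rule."""
    return "leave.02" if "ARG2" in labeled_args else "leave.01"

# Mary left the room.
assert pick_frameset({"ARG0": "Mary", "ARG1": "the room"}) == "leave.01"
# Mary left her daughter-in-law her pearls in her will.
assert pick_frameset({"ARG0": "Mary", "ARG1": "her pearls",
                      "ARG2": "her daughter-in-law"}) == "leave.02"
```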
Sense Hierarchy
(Palmer et al., SNLU04 - NAACL04, NLE07; Chen et al., NAACL06)
- PropBank Framesets – ITA > 90%; coarse-grained distinctions
  - 20 Senseval-2 verbs w/ > 1 Frameset; MaxEnt WSD system: 73.5% baseline, 90%
- Sense Groups (Senseval-2) – ITA 82%; intermediate level (includes Levin classes) – 71.7%
  - Tagging w/ groups: ITA 90%, 200@hr; taggers 86.9%, SemEval07; Chen, Dligach & Palmer, ICSC 2007
- WordNet – ITA 73%; fine-grained distinctions, 64%
  - Dligach & Palmer, ACL-11 – 88%

SEMLINK: PropBank, VerbNet, FrameNet, WordNet, OntoNotes Groupings
(Palmer, Dang & Fellbaum, NLE 2007)
[Figure: the senses of "carry" linked across resources – VerbNet cost-54.2 ~ ON2 ~ PropBank Frameset1*; fit-54.3 ~ ON3; carry-11.4 ~ FrameNet CARRY ~ ON1; ON4 "win election"; each ON group covers its own subset of the WordNet senses (WN1–WN38); *ON5–ON11: carry oneself, carried away/out/off, carry to term]
Limitations to PropBank
- WSJ too domain-specific
  - Additional Brown corpus annotation & GALE data
  - FrameNet has selected instances from BNC
- Args2-4 seriously overloaded, poor performance
  - VerbNet and FrameNet both provide more fine-grained role labels

VerbNet – based on Levin, B., 1993 (Kipper et al., LRE08)
- Class entries:
  - Capture generalizations about verb behavior
  - Organized hierarchically
  - Members have common semantic elements, semantic roles, syntactic frames, predicates
- Verb entries:
  - Refer to a set of classes (different senses)
  - Each class member linked to WN synset(s), ON groupings, PB frame files, FrameNet frames
Mapping from PB to VerbNet
http://verbs.colorado.edu/semlink

FrameNet: Telling.inform
[Time In 2002], [Speaker the U.S. State Department] [Target INFORMED] [Addressee North Korea] [Message that the U.S. was aware of this program, and regards it as a violation of Pyongyang's nonproliferation commitments]
Mapping from PropBank to VerbNet (similar mapping for PB-FrameNet)

  Frameset id = leave.02   Sense = give      VerbNet class = future-having 13.3
  Arg0                     Giver             Agent/Donor*
  Arg1                     Thing given       Theme
  Arg2                     Benefactive       Recipient

* FrameNet label; we're also mapping VerbNet to FrameNet.

PropBank/VerbNet/FrameNet: Complementary
- Redundancy is harmless, may even be useful
- PropBank provides the best training data
- VerbNet provides the clearest links between syntax and semantics
- FrameNet provides the richest semantics
- Together they give us the most comprehensive coverage
- Baker, Fillmore, & Lowe, COLING/ACL-98; Fillmore & Baker, WordNet WKSHP, 2001
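The leave.02 mapping can be applied mechanically to relabel a PropBank-annotated instance with VerbNet thematic roles. This sketch illustrates the idea behind the SemLink type-to-type mapping only; the real SemLink mapping files have their own format:

```python
# SemLink-style role mapping for PropBank leave.02 ("give") onto the
# VerbNet future-having 13.3 class, per the table above. Sketch only;
# not the actual SemLink file format.
PB_TO_VN = {
    ("leave.02", "ARG0"): "Agent",      # Donor/Giver
    ("leave.02", "ARG1"): "Theme",
    ("leave.02", "ARG2"): "Recipient",
}

def to_verbnet(frameset, pb_args):
    """Relabel PropBank numbered args with VerbNet thematic roles."""
    return {PB_TO_VN[(frameset, arg)]: span for arg, span in pb_args.items()}

# I left my pearls to my daughter-in-law in my will.
vn = to_verbnet("leave.02", {"ARG0": "I", "ARG1": "my pearls",
                             "ARG2": "my daughter-in-law"})
assert vn == {"Agent": "I", "Theme": "my pearls",
              "Recipient": "my daughter-in-law"}
```

Because the mapping is type-to-type, one table entry relabels every annotated instance of that frameset, which is how Arg2's overloading can be resolved into distinct VerbNet roles.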
Mapping Issues (2): VerbNet verbs mapped to FrameNet
- VerbNet class clear-10.3: members clear, clean, drain, empty, trash*
- FrameNet frames: Removing, Emptying
- * different sense

Mapping Issues (3): VerbNet verbs mapped to FrameNet
- VN class: put 9.1; FrameNet frame: place
- Members: arrange*, immerse, lodge, mount, sling**
- VN thematic roles: agent (+animate), theme (+concrete), destination (+loc, -region)
- Frame Elements: Agent, Cause, Theme, Goal
- Examples: …
- * different sense; ** not in FrameNet
Class Formation Issues: create / produce (Susan Brown)
[Figure: for "create", the FrameNet frames (Cause_to_start, Intentionally_create, Creating) and the VerbNet classes (engender, create) each cut across the annotated sense groups (groups 1–3) along different lines; for "produce", the FrameNet frames (Cause_to_start, Behind_the_scenes) and the VerbNet classes likewise split the sense groups differently]
Class Formation Issues: break (Susan Brown)
- WN44 – the skin broke
- WN49 – the simple vowels broke in many Germanic languages
[Figure: the VerbNet classes for "break" (break 45.1, split 23.2, hurt, appear 48.1.1, cheat 10.6) and the FrameNet frames (Cause_to_fragment, Render_nonfunctional, Experience_bodily_harm, Compliance) each partition the WordNet senses and the annotated sense groups differently]
WordNet: leave, 14 senses, grouped / with PropBank framesets
[Figure: the WordNet senses of "leave" (plus phrasal entries) in coarse groups; a second version adds PropBank frameset numbers (1–5) to the groups:
- Depart: a job, a room, a dock, a country (for X) – WN1, WN5, WN8
- Leave behind, leave alone – WN2, WN4, WN6, WN9, WN10, WN11, WN12, WN13, WN14, WNleave_off2,3, WNleave_behind1,2,3, WNleave_alone1
- Create a state / cause an effect: left us speechless, leave a stain – WN3, WN7, WNleave_off1
- "leave off": stop, terminate – the road leaves off, leave off your jacket
- Exclude / not exclude: the results – WNleave_out1, WNleave_out2]
Leave behind, leave alone…
- John left his keys at the restaurant.
- We left behind all our cares during our vacation.
- They were told to leave off their coats.
- Leave the young fawn alone.
- Leave the nature park just as you found it.
- I left my shoes on when I entered their house.
- When she put away the food she left out the pie.
- Let's leave enough time to visit the museum.
- He'll leave the decision to his wife.
- When he died he left the farm to his wife.
- I'm leaving our telephone and address with you.

Overlap between Groups and PropBank Framesets – 95%
[Figure: the WordNet senses (WN1–WN20) of "develop" partitioned between Frameset1 and Frameset2]
Palmer, Dang & Fellbaum, NLE 2007
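Several of the example sentences above are verb-particle constructions, and the particle often signals the coarse sense group. The heuristic below is a hypothetical illustration of that cue, not the annotation procedure (grouping was done by human judges):

```python
# Hypothetical particle-keyed heuristic for the "leave" examples: the
# verb-particle construction often signals the coarse sense group.
# Illustration only; real grouping annotation used human judges.
PARTICLE_GROUPS = {
    "off":    "stop/terminate",
    "out":    "exclude",
    "behind": "leave behind",
    "alone":  "leave alone",
}

def coarse_group(sentence: str) -> str:
    tokens = sentence.lower().replace(".", "").split()
    for i, tok in enumerate(tokens):
        if tok in ("leave", "left", "leaving", "leaves"):
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            return PARTICLE_GROUPS.get(nxt, "depart/give/other")
    return "no 'leave' found"

assert coarse_group("They were told to leave off their coats.") == "stop/terminate"
assert coarse_group("When she put away the food she left out the pie.") == "exclude"
assert coarse_group("John left his keys at the restaurant.") == "depart/give/other"
```

Note the cue is far from reliable: "Leave the young fawn alone" separates the particle from the verb, which is exactly why surface heuristics underperform trained sense taggers.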
Broader coverage still needed
- Only 78% of PropBank verbs included in VN
- Most classes focused on verbs with NP and PP complements
- Neglected verbs that take adverbial, adjectival, and sentential complements
- Run SRL, map Arg2 to VerbNet roles – performance improves (Yi, Loper, Palmer, NAACL07)

SEMLINK
- Extended VerbNet: 5,391 senses (91% of PB); 100+ new classes from (Korhonen and Briscoe, 2004; Korhonen and Ryant, 2005)
- Type-to-type mapping PB/VN, VN/FN
- Semi-automatic mapping of WSJ PropBank instances to VerbNet classes and thematic roles, hand-corrected (now FrameNet also)
- VerbNet class tagging as automatic WSD (Brown, Dligach, Palmer, IWCS 2011)
Summary
- Reviewed available lexical resources: WordNet, groupings, PropBank, VerbNet, FrameNet
- We need a whole that is greater than the sum of the parts – SemLink
- Greater coverage, greater richness, increased training data over more genres, opportunities for generalizations

Lexical resources can provide:
- Generalizations about subcat frames & roles
- Backoff classes for OOV items, for portability
- Semantic similarities/"types" for verbs
- Event type hierarchies for inferencing
They need to be unified, empirically validated, and extended: SemLink+
- VN & FN need PB-like coverage, and techniques for automatic domain adaptation – LexLink
- Hybrid lexicons – symbolic and statistical lexical entries?
Acknowledgments
We gratefully acknowledge the support of the National Science Foundation Grants for Consistent Criteria for Word Sense Disambiguation, Robust Semantic Parsing, Richer Representations for Machine Translation, and A Bayesian Approach to Dynamic Lexical Resources for Flexible Language Processing, and of DARPA-GALE via a subcontract from BBN.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

And thanks to
- Postdocs: Paul Kingsbury, Dan Gildea, Nianwen Xue
- Students: Joseph Rosenzweig, Hoa Dang, Tom Morton, Karin Kipper Schuler, Jinying Chen, Szu-Ting Yi, Edward Loper, Susan Brown, Dmitriy Dligach, Jena Hwang, Will Corvey, Claire Bonial, Jinho Choi, Lee Becker, Shumin Wu, Kevin Stowe
- Collaborators: Christiane Fellbaum, Suzanne Stevenson, Annie Zaenen, Orin Hargraves