WordNet Word Sense Disambiguation (WSD) Wrap-up
WordNet, WSD and Wrap-up
Word and VerbNets for Semantic Processing
DAAD Summer School in Advanced Language Engineering, Kathmandu University, Nepal
Day 5 Annette Hautli

Agenda
1 WordNet
2 Word Sense Disambiguation (WSD)
3 Wrap-up

Overview
WordNet is a lexical database of words that are semantically related to each other
Provides information on two fundamental properties of human language: synonymy and polysemy
  Synonymy: one-to-many mapping between meaning and form (one meaning, several words)
  Polysemy: one-to-many mapping between form and meaning (one word, several meanings)

Synonymy
Each node in the semantic network is a “concept”
Each “concept” is expressed by different words, the “synonyms”
The synonym sets, or “synsets”, are the building blocks of WordNet
For example
  {car, automobile, auto, motorcar}
  {queue, line}
  {beat, hit, strike}
Synset members are unordered
All members relate to the same concept
Not taken into account: frequency, genre, etc.

Polysemy
One word has multiple meanings
  {bank, financial institution}
  {bank, river side}
  {bank, furniture}
A word that appears in n synsets is n-fold polysemous
bank is therefore 3-fold polysemous (has three senses)
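
The n-fold polysemy count can be sketched in a few lines of Python. The synsets below are toy data copied from the slide examples, not real WordNet entries, and `polysemy_count` is a hypothetical helper name.

```python
# Toy synsets taken from the slide examples; not real WordNet data.
SYNSETS = [
    {"bank", "financial institution"},
    {"bank", "river side"},
    {"bank", "furniture"},
    {"car", "automobile", "auto", "motorcar"},
    {"queue", "line"},
]


def polysemy_count(word, synsets=SYNSETS):
    """Return n such that `word` is n-fold polysemous in `synsets`."""
    return sum(1 for synset in synsets if word in synset)


print(polysemy_count("bank"))  # 3
print(polysemy_count("car"))   # 1
```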

The “Net” in WordNet
Synsets are connected via different relations
Semantic relations are expressed by bi-directional arcs
Different kinds of relations:
  hypernymy/hyponymy
  meronymy/holonymy
  antonymy
  troponymy
Result: large semantic network
Single top root node, Entity, with about a dozen high-level concepts
Allows programs to compute the similarity between words
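
One possible way to turn the network into a similarity score is to count the arcs on the shortest path between two concepts. The sketch below assumes a tiny hand-made taxonomy (not real WordNet data) and uses 1 / (1 + path length) as the score, which is one common choice among several.

```python
from collections import deque

# Hypothetical toy taxonomy (child -> parent); not real WordNet data.
HYPERNYM = {
    "car": "vehicle",
    "bicycle": "vehicle",
    "SUV": "car",
    "convertible": "car",
    "mountain bike": "bicycle",
    "vehicle": "entity",
}


def neighbours(node):
    """Nodes one arc away, following arcs in both directions."""
    result = {child for child, parent in HYPERNYM.items() if parent == node}
    if node in HYPERNYM:
        result.add(HYPERNYM[node])
    return result


def path_length(source, target):
    """Number of arcs on the shortest path, or None if unconnected."""
    queue, seen = deque([(source, 0)]), {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in neighbours(node) - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """1 / (1 + shortest path length); 1.0 for identical concepts."""
    dist = path_length(a, b)
    return None if dist is None else 1 / (1 + dist)


print(path_similarity("SUV", "convertible"))    # path of 2 arcs via car
print(path_similarity("SUV", "mountain bike"))  # path of 4 arcs via vehicle
```

Concepts that share a close common ancestor get a higher score than concepts that only meet near the root.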

WordNet structure
Actually, WordNet consists of four separate networks, one for each part of speech
  1 nouns
  2 verbs
  3 adjectives
  4 adverbs
Not many relations between these sub-networks

Hypernymy and hyponymy
Relates more/less general concepts of nouns

vehicle
  car, automobile
    convertible
    SUV
  bicycle, bike
    mountain bike

“A car is a kind of vehicle.”
“The class of vehicles includes cars and bikes.”
Transitive: “A car is a kind of vehicle.”, “An SUV is a kind of car.” → “An SUV is a kind of vehicle.”
Noun hierarchies can be up to 16 levels deep
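
The transitive "is a kind of" inference can be sketched by walking up the hypernym links. The dictionary below is a toy copy of the tree on this slide; `hypernym_chain` and `is_a` are hypothetical helper names, and the sketch assumes each concept has at most one hypernym.

```python
# Toy hypernym links (concept -> more general concept), from the slide's tree.
HYPERNYM_OF = {
    "convertible": "car",
    "SUV": "car",
    "car": "vehicle",
    "mountain bike": "bicycle",
    "bicycle": "vehicle",
}


def hypernym_chain(concept):
    """List all increasingly general concepts above `concept`."""
    chain = []
    while concept in HYPERNYM_OF:
        concept = HYPERNYM_OF[concept]
        chain.append(concept)
    return chain


def is_a(specific, general):
    """True if `specific` is a kind of `general`, directly or transitively."""
    return general in hypernym_chain(specific)


print(hypernym_chain("SUV"))   # ['car', 'vehicle']
print(is_a("SUV", "vehicle"))  # True
print(is_a("SUV", "bicycle"))  # False
```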

Meronymy and holonymy
Part-whole relation of nouns

car, automobile
  engine
    spark plug
    cylinder
  wheel
    wheel nut

“An engine has spark plugs.”
“Spark plugs and cylinders are part of an engine.”

Meronymy and holonymy
WordNet distinguishes three kinds of meronymy
  Proper parts: arm – body, page – book, branch – tree
  Substance – stuff: oxygen – water, flour – pizza
  Member – group: student – class, tree – forest, bird – flock
But there are many more kinds of meronymy relations...

Antonymy
Adjective relation
For example
  Hot – cold
  Long – short
  New – old
  Wide – narrow
Other antonyms may be less similar
  Hot – cool
  Long – lengthy
  New – ancient

Troponymy
“Manner relation” between verbs
For example
  to walk – to move
  to whisper – to talk
  to gobble – to eat
Trees can be constructed (although not as deep as for nouns)
  to move – to run – to jog
  to communicate – to talk – to whisper
  to eat – to gobble

Miscellaneous relations
Other relations among verbs reflect a temporal or logical order
  to divorce – to marry (backward presupposition)
  to snore – to sleep, to pay – to buy (inclusion)
  to kill – to die (cause)

Demo
http://wordnet.princeton.edu/

2 Word Sense Disambiguation (WSD)

Thanks
Slides are based on Jurafsky & Martin (2004, chapter 20)

Overview
Word sense disambiguation: the task of selecting the correct sense for a word in context.
Potentially helpful in many NLP tasks, e.g. machine translation, question answering, information retrieval

WSD algorithm
Basic form:
  Input: word in context, fixed set of word senses
  Output: the correct word sense for that use
Context?
  Words surrounding the target word: annotated? just the words, in no particular order? context size?

Supervised WSD
Feature selection:
  Need to identify features that are predictive of word senses
  Fundamental insight: look at the context words
    bass
    smoked bass
    jazz bass player
  Window: e.g. 1-word window
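
A 1-word window around the target can be sketched as simple list slicing. `context_window` is a hypothetical helper, and whitespace tokenization is an assumption made only to keep the example short.

```python
def context_window(tokens, target_index, size=1):
    """Return (left, right) context of up to `size` tokens around the target."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left, right


for phrase in ("smoked bass", "jazz bass player"):
    tokens = phrase.split()
    i = tokens.index("bass")
    print(phrase, "->", context_window(tokens, i, size=1))
```

Here "smoked" versus "jazz" and "player" are the context words that separate the fish sense from the music sense.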

Supervised WSD
Method of feature selection:
  process the dataset (POS tagging, lemmatization, parsing)
  build feature representations encoding the relevant linguistic information
  two main feature types
    collocational features
    bag-of-words approach

Collocational features
Features that take order or syntactic relation into account
Restricted to immediate word context (usually fixed window)
Example:
  lemma and part of speech of two-word window
  syntactic function of the target word

Example
(1) An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
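
For the target word bass in (1), a collocational feature vector over a two-word window might be built as sketched below. The POS tags are hand-assigned for illustration (a real system would take them from a tagger), word forms stand in for lemmas, and `collocational_features` is a hypothetical helper.

```python
# First tokens of example (1) with hand-assigned POS tags (an assumption;
# a real system would get them from a POS tagger).
TAGGED = [
    ("An", "DT"), ("electric", "JJ"), ("guitar", "NN"), ("and", "CC"),
    ("bass", "NN"), ("player", "NN"), ("stand", "VB"), ("off", "RB"),
]


def collocational_features(tagged, target_index, size=2):
    """Word and POS at each position in a +/- `size` window, in order."""
    features = []
    for offset in list(range(-size, 0)) + list(range(1, size + 1)):
        pos = target_index + offset
        if 0 <= pos < len(tagged):
            word, tag = tagged[pos]
        else:
            word, tag = "<pad>", "<pad>"  # window runs past the sentence
        features.extend([word, tag])
    return features


target = [word for word, _ in TAGGED].index("bass")
print(collocational_features(TAGGED, target))
# ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']
```

Unlike a bag of words, the position of each feature in the vector encodes where the word stood relative to the target.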

Bag-of-words features
Lexical features
Pre-selected words that are potentially relevant for sense distinctions, e.g.
  for all-words task: frequent content words in the corpus
  for lexical sample task: content words in the sentences of the target word
Test for presence/absence of a certain word in the selected context

Example
(2) An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
pre-selected words: [fishing, big, sound, player, fly]
feature vector: [0, 0, 0, 1, 0]
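
The feature vector for (2) can be reproduced with a short sketch. The lowercasing and the regular-expression tokenizer are assumptions made for the example, and `bag_of_words_vector` is a hypothetical helper.

```python
import re

SENTENCE = ("An electric guitar and bass player stand off to one side, "
            "not really part of the scene, just as a sort of nod to "
            "gringo expectations perhaps.")
VOCABULARY = ["fishing", "big", "sound", "player", "fly"]


def bag_of_words_vector(sentence, vocabulary):
    """1 if the vocabulary word occurs anywhere in the sentence, else 0."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return [1 if word in tokens else 0 for word in vocabulary]


print(bag_of_words_vector(SENTENCE, VOCABULARY))  # [0, 0, 0, 1, 0]
```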

Evaluation
Measure: sense accuracy (percentage of words that are correctly tagged)
Method: train-test methodology
  split the annotated corpus into a training set and a test set
  system is trained on the training set and evaluated on the test set
Standardized datasets and methods: SensEval and SemEval competitions
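
The train-test methodology and the accuracy measure can be sketched as follows. The annotated examples, the sense labels, and the split fraction are invented for illustration; real evaluations use the SensEval/SemEval data and their official splits.

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and cut it into (train, test)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def sense_accuracy(predicted, gold):
    """Fraction of target words whose predicted sense matches the gold sense."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical sense-annotated examples: (context, sense label).
examples = [("smoked bass", "fish"), ("jazz bass player", "music"),
            ("bass guitar", "music"), ("caught a bass", "fish"),
            ("bass line", "music")]
train, test = train_test_split(examples, test_fraction=0.4)
print(len(train), len(test))

# Two of three predictions match the gold labels.
print(sense_accuracy(["fish", "music", "fish"], ["fish", "music", "music"]))
```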

Evaluation
Baseline: performance we would get without much knowledge or with a simple approach
  necessary for any machine learning experiment
Simplest baseline: most frequent sense
  In WordNet: take the first sense (senses are ordered by frequency)
Very powerful baseline!
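
A most-frequent-sense baseline can be sketched in a few lines. The labels are invented toy data, and `most_frequent_sense` is a hypothetical helper; the baseline simply assigns the most common training sense to every test instance.

```python
from collections import Counter


def most_frequent_sense(training_labels):
    """Return the sense label that occurs most often in the training data."""
    return Counter(training_labels).most_common(1)[0][0]


# Hypothetical sense labels for the target word "bass".
train_labels = ["music", "fish", "music", "music", "fish"]
test_labels = ["music", "fish", "music"]

baseline = most_frequent_sense(train_labels)
predictions = [baseline] * len(test_labels)
accuracy = sum(p == g for p, g in zip(predictions, test_labels)) / len(test_labels)
print(baseline, accuracy)
```

Any supervised system has to beat this number to justify its extra machinery.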

Evaluation
Ceiling or upper-bound performance: inter-annotator agreement
  all-words corpora using WordNet: A0 around 0.75 - 0.8
  more coarse-grained sense distinctions: A0 around 0.9
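
Assuming A0 denotes observed agreement, i.e. the proportion of instances on which two annotators chose the same sense, it can be computed as sketched below. The annotations are invented for illustration.

```python
def observed_agreement(annotator_a, annotator_b):
    """Proportion of instances on which both annotators chose the same sense."""
    if len(annotator_a) != len(annotator_b):
        raise ValueError("both annotators must label the same instances")
    agree = sum(a == b for a, b in zip(annotator_a, annotator_b))
    return agree / len(annotator_a)


# Hypothetical sense tags from two annotators for the same five instances.
a = ["fish", "music", "music", "fish", "music"]
b = ["fish", "music", "fish", "fish", "music"]
print(observed_agreement(a, b))  # 0.8
```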

Summary
Supervised approaches use sense-annotated datasets
  Need many annotated examples for every word
Relevant information in the context
  lexico-syntactic information (collocational features)
  lexical information (bag-of-words features)
Information is encoded in the form of features, and a classifier is trained to distinguish different senses of a given word

3 Wrap-up

This week
1. What is the meaning of words?
Approximation:
  with the help of resources like WordNet and FrameNet
  by abstraction across different parts of speech (operation, noun, and operate, verb)
  by an automatic disambiguation of the sense of a word

This week
2. How do entities take part in an event?
Different ways to describe the roles of an event participant
  thematic roles (VerbNet)
  numbered roles (PropBank)
  frame elements (FrameNet)

This week
3. What event structure does a verb have?
Some information contained in VerbNet
Much more detail needed, e.g. a commercial event with a number of subevents:
  1 a buyer gives money and takes the goods
  2 a seller gives the goods and takes the money

Wrap-up
Central question of semantic NLP: Who did What to Whom, and How, When and Where?
Lexical resources are one building block of the automatic assembly of meaning
Thank you!