
WordNet, Word Sense Disambiguation (wsd), Wrap-up

WordNet, wsd and Wrap-up

Word and VerbNets for Semantic Processing

DAAD Summer School in Advanced Language Engineering, Kathmandu University, Nepal

Day 5, Annette Hautli

Agenda

1 WordNet

2 Word Sense Disambiguation (wsd)

3 Wrap-up

Overview

WordNet is a lexical database of words that are semantically related to each other. It provides information on two fundamental properties of human language: synonymy and polysemy.
Synonymy: one-to-many mapping between meaning and form
Polysemy: one-to-many mapping between form and meaning

Synonymy

Each node in the semantic network is a “concept”. Each “concept” is expressed by different words, the “synonyms”. These synonym sets, or “synsets”, are the building blocks of WordNet. For example:
{car, automobile, auto, motorcar}
{queue, line}
{beat, hit, strike}

Synonymy

Synset members are unordered; all relate to the same concept. Not taken into account: frequency, genre, etc.

Polysemy

One word has multiple meanings:
{bank, financial institution}
{bank, river side}
{bank, furniture}
A word that appears in n synsets is n-fold polysemous; bank is therefore 3-fold polysemous (has three senses).
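The synset and polysemy bookkeeping above can be sketched in a few lines of Python. The data here is a hand-written toy subset, not the real WordNet database:

```python
# Toy model of synsets: each synset is a set of lemmas expressing
# one concept. (Illustrative data only, not the real WordNet.)
synsets = [
    {"car", "automobile", "auto", "motorcar"},
    {"queue", "line"},
    {"beat", "hit", "strike"},
    {"bank", "financial institution"},
    {"bank", "river side"},
    {"bank", "furniture"},
]

def polysemy(word):
    """A word that appears in n synsets is n-fold polysemous."""
    return sum(1 for synset in synsets if word in synset)

print(polysemy("bank"))   # 3: three senses
print(polysemy("queue"))  # 1: monosemous in this toy data
```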

The “Net” in WordNet

Synsets are connected via different relations. Semantic relations are expressed by bi-directional arcs. Different kinds of relations:
hypernymy/hyponymy
meronymy/holonymy
antonymy
Result: a large semantic network with a single top root node Entity and about a dozen high-level concepts. (Allows programs to compute the similarity between words.)

WordNet structure

Actually, WordNet consists of four separate networks, one for each part of speech:
1 nouns
2 verbs
3 adjectives
4 adverbs
There are not many relations between these sub-networks.

Hypernymy and hyponymy

Relates more/less general concepts of nouns

vehicle
  car, automobile
    convertible
    SUV
  bicycle, bike
    mountain bike

“A car is a kind of vehicle.” “The class of vehicles includes cars and bikes.” The relation is transitive: “A car is a kind of vehicle.” + “An SUV is a kind of car.” → “An SUV is a kind of vehicle.” Noun hierarchies can have up to 16 levels.
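The transitivity of hypernymy can be illustrated with a toy version of the tree above. The `hypernym` mapping is invented for illustration; real WordNet chains can run up to 16 levels deep:

```python
# Toy hypernym links (child -> parent), matching the tree above.
hypernym = {
    "convertible": "car",
    "SUV": "car",
    "mountain bike": "bicycle",
    "car": "vehicle",
    "bicycle": "vehicle",
}

def is_a(word, ancestor):
    """Hypernymy is transitive: follow parent links upward."""
    while word in hypernym:
        word = hypernym[word]
        if word == ancestor:
            return True
    return False

print(is_a("SUV", "vehicle"))  # True: an SUV is a kind of vehicle
```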

Meronymy and holonymy

Part-whole relation of nouns:
car, automobile
  engine
    spark plug
    cylinder
  wheel
    wheel nut

“An engine has spark plugs.” “Spark plugs and cylinders are part of an engine.”

Meronymy and holonymy

WordNet distinguishes three kinds of meronymy:
Proper parts: arm – body, page – book, branch – tree
Substance – stuff: oxygen – water, flour – pizza
Member – group: student – class, tree – forest, bird – flock
But there are many more kinds of meronymy relations...

WordNet

Antonymy

Adjective relation. For example:
hot – cold
long – short
new – old
wide – narrow
Other antonyms may be less similar:
hot – cool
long – lengthy
new – ancient

Troponymy

“Manner relation” between verbs. For example:
to walk – to move
to whisper – to talk
to gobble – to eat
Trees can be constructed (although not as deep as for nouns):
to move – to run – to jog
to communicate – to talk – to whisper
to gobble – to eat

Miscellaneous relations

Other relations among verbs reflect a temporal or logical order

to divorce – to marry (backward presupposition) to snore – to sleep, to pay – to buy (inclusion) to kill – to die (cause)

Demo

http://wordnet.princeton.edu/

Agenda

1 WordNet

2 Word Sense Disambiguation (wsd)

3 Wrap-up

Thanks

Slides are based on Jurafsky & Martin (2004, chapter 20)


Overview

Word sense disambiguation: the task of selecting the correct sense for a word in context.

Potentially helpful in many nlp tasks, e.g. machine translation, question answering, information retrieval...

Example

wsd algorithm

Basic form:
Input: a word in context and a fixed set of word senses
Output: the correct word sense for that use
Context? Words surrounding the target word:
annotated?
just words and no particular order?
context size?

Supervised wsd

Feature selection: we need to identify features that are predictive of word senses. Fundamental insight: look at the context words.
bass
smoked bass
jazz bass player
Window: e.g. 1-word window

Supervised wsd

Method of feature selection:
process the dataset (POS tagging, lemmatization, parsing)
build feature representations encoding the relevant linguistic information
Two main feature types:
collocational features
bag-of-words approach

Collocational features

Features that take order or syntactic relations into account, restricted to the immediate word context (usually a fixed window). Example:
lemma and part of speech of a two-word window
syntactic function of the target word

Example

(1) An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.
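A minimal sketch of collocational feature extraction for this sentence, using only the surface tokens of a two-word window around the target. A real system would add lemmas, POS tags, and the target's syntactic function; the naive whitespace tokenization here is for illustration only:

```python
# Collocational features: the words in a fixed window around the
# target, in order. (Surface tokens only; real systems add lemma,
# POS, and syntactic function information.)
def window_features(tokens, target, size=2):
    i = tokens.index(target)
    left = tokens[max(0, i - size):i]
    right = tokens[i + 1:i + 1 + size]
    return left + right

sentence = ("An electric guitar and bass player stand off to one side , "
            "not really part of the scene").split()
print(window_features(sentence, "bass"))
# ['guitar', 'and', 'player', 'stand']
```

For the "bass player" use, the window words "guitar" and "player" are exactly the kind of evidence that points to the music sense rather than the fish sense.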

Bag-of-words features

Lexical features: pre-selected words that are potentially relevant for sense distinctions, e.g.
for the all-words task: frequent content words in the corpus
for the lexical sample task: content words in the sentences of the target word
Test for the presence/absence of a certain word in the selected context.

Example

(2) An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.

pre-selected words: [fishing, big, sound, player, fly] feature vector: [0, 0, 0, 1, 0]
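The feature vector above can be reproduced with a short sketch (whitespace tokenization kept deliberately naive):

```python
# Bag-of-words features: a binary vector marking which of the
# pre-selected words occur anywhere in the context.
def bow_vector(vocabulary, context_tokens):
    present = set(context_tokens)
    return [1 if word in present else 0 for word in vocabulary]

vocabulary = ["fishing", "big", "sound", "player", "fly"]
context = ("An electric guitar and bass player stand off to one side , "
           "not really part of the scene").split()
print(bow_vector(vocabulary, context))  # [0, 0, 0, 1, 0]
```

Only "player" from the pre-selected list occurs in the sentence, so only its position is set to 1, matching the vector on the slide.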

Evaluation

Measure: sense accuracy (percentage of words that are correctly tagged)
Method: train-test methodology
split the annotated corpus into a training set and a test set
the system is trained on the training set and evaluated on the test set
Standardized datasets and methods: SensEval and SemEval competitions

Evaluation

Baseline
Baseline: the performance we would get without much knowledge or with a simple approach; necessary for any machine learning experiment.
Simplest baseline: most frequent sense
WordNet: first sense (senses are ordered)
Very powerful baseline!
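A minimal sketch of the most-frequent-sense baseline together with the sense-accuracy measure. The sense labels are invented toy data for illustration:

```python
from collections import Counter

# Most-frequent-sense baseline: always predict the sense that was
# seen most often in the training data. (Toy labels, invented.)
train_senses = ["bass/fish", "bass/music", "bass/music", "bass/music"]
test_senses  = ["bass/music", "bass/fish", "bass/music", "bass/music"]

mfs = Counter(train_senses).most_common(1)[0][0]  # 'bass/music'
predictions = [mfs] * len(test_senses)

# Sense accuracy: percentage of words tagged with the correct sense.
accuracy = sum(p == g for p, g in zip(predictions, test_senses)) / len(test_senses)
print(accuracy)  # 0.75
```

Any supervised system is only interesting to the extent that it beats this kind of baseline on the same test set.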

Evaluation

Ceiling
Ceiling or upper-bound performance: inter-annotator agreement
all-words corpora using WordNet: observed agreement (Ao) around 0.75–0.8
more coarse-grained sense distinctions: Ao around 0.9

Summary

Supervised approaches use sense-annotated datasets; we need many annotated examples for every word. Relevant information in the context:
lexico-syntactic information (collocational features)
lexical information (bag-of-words features)
The information is encoded in the form of features, and a classifier is trained to distinguish different senses of a given word.

Agenda

1 WordNet

2 Word Sense Disambiguation (wsd)

3 Wrap-up

This week

1. What is the meaning of words? Approximation:
with the help of resources like WordNet and FrameNet
by abstraction across different parts of speech (operation noun and operate verb)
by automatic disambiguation of the sense of a word

This week

2. How do entities take part in an event? Different ways to describe the roles of an event participant:
thematic roles (VerbNet)
numbered roles (PropBank)
frame elements (FrameNet)

This week

3. What event structure does a verb have? Some information is contained in VerbNet, but much more detail is needed, e.g. a commercial event with a number of subevents:
1 a buyer gives money and takes the goods
2 a seller gives the goods and takes the money

Wrap-up

Central question of semantic nlp: Who did What to Whom, and How, When and Where?

Lexical resources are one building block of the automatic assembly of meaning.

Wrap-up

Thank you!
