
3/1/2009

CS626-460: Language Technology for the Web/Natural Language Processing
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
Lecture 8: Wordnet elaborated

Wordnet
• A lexical knowledgebase based on conceptual lookup
• Organizing concepts in a [...].
• Organize lexical information in terms of meaning, rather than word form
• Wordnet can also be used as a [...].

Lexical Matrix

Psycholinguistic Theory
• Human lexical memory for nouns as a hierarchy.
• Can a canary sing? - Pretty fast response.
• Can a canary fly? - Slower response.
• Does a canary have skin? - Slowest response.

Animal (can move, has skin)

Bird (can fly)

canary (can sing)

Wordnet - a lexical reference system based on psycholinguistic theories of human lexical memory.
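The canary hierarchy above can be sketched in a few lines of Python. This is only an illustration, assuming that the response time grows with the number of levels climbed before the property is found; the dictionary layout and the name `levels_to_property` are not from the lecture.

```python
# Each concept points to its parent in the hierarchy (None at the top).
PARENT = {"canary": "bird", "bird": "animal", "animal": None}

# Properties are stored only at the level where they are introduced.
PROPERTIES = {
    "canary": {"can sing"},
    "bird": {"can fly"},
    "animal": {"can move", "has skin"},
}

def levels_to_property(concept, prop):
    """Return how many levels must be climbed from concept to find prop."""
    levels = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES[node]:
            return levels
        node = PARENT[node]
        levels += 1
    return None  # property not found anywhere above the concept

for question in ("can sing", "can fly", "has skin"):
    print(question, levels_to_property("canary", question))
```

More levels climbed corresponds to the slower responses listed for the three questions.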

Wordnet
• Wordnet is a network of [...] linked by lexical and semantic relations.
• The first wordnet in the world was for English, developed at Princeton over 15 years.
• The Eurowordnet, a linked structure of European language wordnets, was built in 1998 over 3 years with funding from the EC as a mission mode project.
• Wordnets for Hindi and Marathi being built at IIT Bombay are amongst the first IL wordnets.
• All these are proposed to be linked into the IndoWordnet which eventually will be linked to the English and the Euro wordnets.

Fundamental Design Question
• Syntagmatic vs. Paradigmatic relations?
• Psycholinguistics is the basis of the design.
• When we hear a word, many words come to our mind by association.
• For English, about half of the associated words are syntagmatically related and half are paradigmatically related.
For cat
– animal, mammal- paradigmatic
– mew, purr, furry- syntagmatic

CFILT, IIT Bombay 1 3/1/2009

Syntagma
• A syntactic environment.
• Can be
– a phrase
– a sentence
– a paragraph
– a chapter or
– a document.
• Generally limited to phrase or sentence.

Syntagmatic Relationship
• The dog barked all night.
• Syntagmatic relationship is co-occurrence relations resulting from common syntactic structure.
• Here dog and bark come in many syntagmas together. So they have a very strong syntagmatic relation.
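A rough way to see the strength of a syntagmatic relation is to count how often two words share a syntagma. The sketch below takes the sentence as the syntagma; the second and third sentences of the toy corpus and the function name `cooccurrence_counts` are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(sentences):
    """Count, for every word pair, the number of sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().replace(".", "").split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts

corpus = [
    "The dog barked all night.",
    "The dog barked at the cat.",   # invented sentence
    "A cat is an animal.",          # invented sentence
]
counts = cooccurrence_counts(corpus)
print(counts[("barked", "dog")], counts[("cat", "dog")])
```

Pairs are stored in alphabetical order, so `("barked", "dog")` is the key for dog and bark.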

Stated Fundamental Application of Wordnet: Sense Disambiguation
Determination of the correct sense of the word
The crane ate the fish vs. The crane was used to lift the load
bird vs. machine

The problem of Sense tagging
• Given a corpus, to assign the correct sense to the words.
• This is sense tagging. Needs Word Sense Disambiguation (WSD)
• Highly important for Question Answering, Machine Translation, Text Mining tasks.

Basic Principle
• Words in natural languages are polysemous.
• However, when synonymous words are put together, a unique meaning often emerges. Use is made of Relational Semantics.
• Componential Semantics where each word is a bundle of semantic features (as in the Schankian Conceptual Dependency system or Lexical Componential Semantics) is to be examined as a viable alternative.

Componential Semantics
• Consider cat and tiger. Decide on componential attributes.
• Furry Carnivorous Heavy Domesticable
• For cat (Y, Y, N, Y)
• For tiger (Y, Y, Y, N)
Complete and correct attributes are difficult to design.
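The cat and tiger bundles can be written down directly as feature tuples. A minimal sketch, using only the four attributes from the slide; the names `LEXICON` and `differing_features` are illustrative.

```python
FEATURES = ("furry", "carnivorous", "heavy", "domesticable")

# Y/N values from the slide, written as booleans in the order of FEATURES.
LEXICON = {
    "cat": (True, True, False, True),
    "tiger": (True, True, True, False),
}

def differing_features(word1, word2):
    """List the features on which two words disagree."""
    return [
        name
        for name, a, b in zip(FEATURES, LEXICON[word1], LEXICON[word2])
        if a != b
    ]

print(differing_features("cat", "tiger"))
```

Two words are separated only if at least one attribute differs, which is why a complete and correct attribute set matters.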


Synset: the foundation (house)
1. house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")
2. house -- (an official assembly having legislative powers; "the legislature has two houses")
3. house -- (a building in which something is sheltered or located; "they had a large carriage house")
4. family, household, house, home, menage -- (a social unit living together; "he moved his family to Virginia"; "It was a good Christian household"; "I waited until the whole house was asleep"; "the teacher asked how many people made up his home")
5. theater, theatre, house -- (a building where theatrical performances or motion-picture shows can be presented; "the house was full")
6. firm, house, business firm -- (members of a business organization that owns or operates one or more establishments; "he worked for a brokerage house")
7. house -- (aristocratic family line; "the House of York")
8. house -- (the members of a religious community living together)
9. house -- (the audience gathered together in a theatre or cinema; "the house applauded"; "he counted the house")
10. house -- (play in which children take the roles of father or mother or children and pretend to interact like adults; "the children were playing house")
11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)
12. house -- (the management of a gambling house or casino; "the house gets a percentage of every bet")

Semantic relations in wordnet
1. Synonymy
2. Hypernymy / Hyponymy
3. Antonymy
4. Meronymy / Holonymy
5. Gradation
6. Entailment
7. [...]
1, 3 and 5 are lexical (word to word), rest are semantic (synset to synset).

Creation of Synsets
Three principles:
• Minimality
• Coverage
• Replaceability

Synset creation (continued)
Home
John’s home was decorated with lights on the occasion of Christmas.
Having worked for many years abroad, John returned home.
House
John’s house was decorated with lights on the occasion of Christmas.
Mercury is situated in the eighth house of John’s horoscope.

Synsets (continued)
{house} is ambiguous.
{house, home} has the sense of a social unit living together; is this the minimal unit?
{family, house, home} will make the unit completely unambiguous.
For coverage: {family, household, house, home} ordered according to frequency.
Replaceability of the most frequent words is a requirement.

Synset creation
From first principles
– Pick all the senses from good standard [...].
– Obtain [...] for each sense.
– Needs hard and long hours of work.
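The minimality question for {house}, {house, home} and {family, house, home} can be illustrated by intersecting the sense sets of the member words. This is a sketch over a toy sense inventory: the sense labels in `SENSES` are hypothetical and far smaller than a real wordnet.

```python
# Hypothetical, abbreviated sense inventory for illustration only.
SENSES = {
    "house": {"dwelling", "legislature", "social unit", "theatre", "firm"},
    "home": {"dwelling", "social unit", "institution"},
    "family": {"social unit", "kin", "taxonomic group"},
    "household": {"social unit"},
}

def shared_senses(words):
    """Senses that every word in the candidate synset can express."""
    return set.intersection(*(SENSES[w] for w in words))

def is_unambiguous(words):
    """A candidate synset is unambiguous if exactly one sense is shared."""
    return len(shared_senses(words)) == 1

for candidate in (["house"], ["house", "home"], ["family", "house", "home"]):
    print(candidate, sorted(shared_senses(candidate)))
```

In this toy inventory {house, home} still shares two senses, and adding family leaves only the social unit sense.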


Synset creation (continued)
From the wordnet of another language in the same family
– Pick the synset and obtain the sense from the gloss.
– Get the words of the target language.
– Often same words can be used, especially for tatsama words.
– Translation, Insertion and deletion.
Hindi Synset: अनुभवी जानकार मंजा हुआ (experienced person)
Marathi Synset: अनुभवी तज्ञ जाणता ज्ञाता

Gloss and Example
Crucially needed for concept explication, wordnet building using another wordnet and wordnet linking.
{earthquake, quake, temblor, seism} -- (shaking and vibration at the surface of the earth resulting from underground movement along a fault plane or from volcanic activity)
Hindi Synset: {भूकंप, भूचाल, भूडोल}; पृथ्वीके पृष्ठभागका हिलना; गुजरातमें हुए भूकंपमें अनेक लोग मारे गये. (shaking of the surface of earth; many were killed in the earthquake in Gujarat)
Marathi Synset: {भूकंप, धरणीकंप}; पृथ्वीचा पृष्ठभाग हालण्याची क्रिया; गुजराथमध्ये झालेल्या भूकंपात अनेक लोक मारले गेले.

Semantic Relations
• Hypernymy and Hyponymy
– Relation between word senses (synsets)
– X is a hyponym of Y if X is a kind of Y
– Hyponymy is transitive and asymmetrical
– Hypernymy is inverse of Hyponymy
(lion->animal->animate entity->entity)

Semantic Relations (continued)
• Meronymy and Holonymy
– Part-whole relation, branch is a part of tree
– X is a meronym of Y if X is a part of Y
– Holonymy is the inverse relation of Meronymy
{kitchen} ………………………. {house}
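Transitivity and asymmetry of hyponymy can be checked on the lion chain (lion->animal->animate entity->entity). A minimal sketch, assuming each synset has a single hypernym; the names `HYPERNYM`, `hypernym_chain` and `is_hyponym_of` are illustrative.

```python
# Direct hypernym links taken from the chain on the slide.
HYPERNYM = {
    "lion": "animal",
    "animal": "animate entity",
    "animate entity": "entity",
}

def hypernym_chain(synset):
    """Follow hypernym links upward and return every ancestor in order."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain

def is_hyponym_of(x, y):
    """X is a hyponym of Y if Y is reachable from X by hypernym links."""
    return y in hypernym_chain(x)

print(hypernym_chain("lion"))
print(is_hyponym_of("lion", "entity"), is_hyponym_of("entity", "lion"))
```

The first check succeeds through two intermediate links (transitivity), while the reverse check fails (asymmetry).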

Lexical Relation
• Antonymy
– Oppositeness in meaning
– Relation between word forms
– Often determined by phonetics, word length etc. ({rise, ascend} vs. {fall, descend})

Troponym and Entailment
• Entailment
{snoring – sleeping}
• Troponym
{limp, strut – walk}
{whisper – talk}


Entailment.
Snoring entails sleeping.
Buying entails paying.
• Proper Temporal Inclusion.
Inclusion can be in any way.
Sleeping temporally includes snoring.
Buying temporally includes paying.
• Co-extensiveness. (Troponymy)
Limping is a manner of walking.

Opposition among verbs.
• {Rise, ascend} {fall, descend}
Tie-untie (do-undo)
Walk-run (slow, fast)
Teach-learn (same activity different perspective)
Rise-fall (motion upward or downward)
• Opposition and Entailment.
Hit or miss (entail aim). Backward presupposition.
Succeed or fail (entail try).

The causal relationship.

Show- see. Give- have.

Causation and Entailment. Giving entails having. Feeding entails eating.

Kinds of Antonymy
Size          Small - Big
Quality       Good - Bad
State         Warm - Cool
Personality   Dr. Jekyll - Mr. Hyde
Direction     East - West
Action        Buy - Sell
Amount        Little - A lot
Place         Far - Near
Time          Day - Night
Gender        Boy - Girl

Kinds of Meronymy
Component-object    Head - Body
Stuff-object        Wood - Table
Member-collection   Tree - Forest
Feature-Activity    Speech - Conference
Place-Area          Palo Alto - California
Phase-State         Youth - Life
Resource-process    Pen - Writing
Actor-Act           Physician - Treatment


Gradation
State         Childhood, Youth, Old age
Temperature   Hot, Warm, Cold
Action        Sleep, Doze, Wake

WordNet Sub-Graph
house, home
– Gloss: A place that serves as the living quarters of one or more families
– Hypernymy: dwelling, abode
– Meronymy: kitchen, backyard, bedroom, veranda, study, guestroom
– Hyponymy: cottage, hermitage

Metonymy
• Associated with Metaphors which are epitomes of semantics
• Oxford Advanced Learners definition: “The use of a word or phrase to mean something different from the literal meaning”
• Does it mean Careless Usage?!

Insight from Sanskritic Tradition
• Power of a word
– Abhidha, Lakshana, Vyanjana
• Meaning of Hall:
– The hall is packed (abhidha)
– The hall burst into laughing (lakshana)
– The Hall is full (unsaid: and so we cannot enter) (vyanjana)

Metaphors in Indian Tradition
• upamana and upameya
– Former: object being compared with
– Latter: object being compared
– Puru was like a lion in the battle with Alexander (Puru: upameya; Lion: upamana)

Upamana, rupak, atishayokti
• upamana: Explicit comparison
– Puru was like a lion in the battle with Alexander
• rupak: Implicit comparison
– Puru was a lion in the battle with Alexander
• Atishayokti (exaggeration): upamana and upameya dropped
– Puru’s army fled. But the lion fought on.


Modern study (1956 onwards, Richards et al.)
• Three constituents of metaphor
– Vehicle (items used metaphorically)
– Tenor (the metaphorical meaning of the former)
– Ground (the basis for metaphorical extension)
• “The foot of the mountain”
– Vehicle: “foot”
– Tenor: “lower portion”
– Ground: “spatial parallel between the relationship between the foot to the human body and the lower portion of the mountain with the rest of the mountain”

Interaction of semantic fields (Haas)
• Core vs. peripheral semantic fields
• Interaction of two words in metonymic relation brings in new semantic fields with selective inclusion of features
• Leg of a table
– Does not stretch or move
– Does stand and support

Lakoff’s (1987) contribution
• Source Domain
• Target Domain
• Mapping Relations

Mapping Relations: ontological correspondences
• Anger is heat of fluid in container
        Heat                  Anger
(i)     Container             Body
(ii)    Agitation of fluid    Agitation of mind
(iii)   Limit of resistance   Limit of ability to suppress
(iv)    Explosion             Loss of control

Image Schemas
• Categories: Container Contained
• Quantity
– More is up, less is down: Outputs rose dramatically; accident rates were lower
– Linear scales and paths: Ram is by far the best performer
• Time
– Stationary event: we are coming to exam time
– Stationary observer: weeks rush by
• Causation: desperation drove her to extreme steps

Patterns of Metonymy
• Container for contained
– The kettle boiled (water)
• Possessor for possessed/attribute
– Where are you parked? (car)
• Represented entity for representative
– The government will announce new targets
• Whole for part
– I am going to fill up the car with petrol


Patterns of Metonymy (contd)
• Part for whole
– I noticed several new faces in the class
• Place for institution
– Lalbaug witnessed the largest Ganapati
Question: Can you have part-part metonymy?

Purpose of Metonymy
• More idiomatic/natural way of expression
– More natural to say the kettle is boiling as opposed to the water in the kettle is boiling
• Economy
– Room 23 is answering (but not *is asleep)
• Ease of access to referent
– He is in the phone book (but not *on the back of my hand)
• Highlighting of associated relation
– The car in the front decided to turn right (but not *to smoke a cigarette)

Feature sharing not necessary
• In a restaurant:
– Jalebii ko abhi dudh chaiye ("Jalebi wants milk now"; no feature sharing)
– The elephant now wants some coffee (feature sharing)

Proverbs
• Describes a specific event or state of affairs which is applicable metaphorically to a range of events or states of affairs provided they have the same or sufficiently similar image-schematic structure

12/05/08

WSD – Problem Definition
• Obtain the sense of
– A set of target words, or
– All words (all word WSD, more difficult)
• Against a
– Sense repository (like the wordnet), or
– A thesaurus (not same as wordnet, does not have semantic relations)
• Using the
– Context in which the word appears.

WSD APPROACHES
CFILT


Example word: operation
• operation ((computer science) data processing in which the result is completely specified by a rule (especially the processing that results from a single instruction)) "it can perform millions of operations per second"
• operation, military operation (activity by a military or naval force (as a maneuver or campaign)) "it was a joint operation of the navy and air force"
• operation, surgery, surgical operation, surgical procedure, surgical process (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body) "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery"
• mathematical process, mathematical operation, operation ((mathematics) calculation by mathematical methods) "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic"

Topics to be covered
• Knowledge Based Approaches
– WSD using Selectional Preferences (or restrictions)
– Overlap Based Approaches
• Machine Learning Based Approaches
– Supervised Approaches
– Semi-supervised Algorithms
– Unsupervised Algorithms
• Hybrid Approaches

KNOWLEDGE BASED v/s MACHINE LEARNING BASED v/s HYBRID APPROACHES
• Knowledge Based Approaches
– Rely on knowledge resources like WordNet, Thesaurus etc.
– May use grammar rules for disambiguation.
– May use hand coded rules for disambiguation.
• Machine Learning Based Approaches
– Rely on corpus evidence.
– Train a model using tagged or untagged corpus.
– Probabilistic/Statistical models.
• Hybrid Approaches
– Use corpus evidence as well as semantic relations from WordNet.

SELECTIONAL PREFERENCES (INDIAN TRADITION)
• “Desire” of some words in the sentence (“aakaangksha”).
– I saw the boy with long hair.
– The verb “saw” and the noun “boy” desire an object here.
• “Appropriateness” of some other words in the sentence to fulfil that desire (“yogyataa”).
– I saw the boy with long hair.
– The PP “with long hair” can be appropriately connected only to “boy” and not “saw”.
• In case the ambiguity is still present, “proximity” (“sannidhi”) can determine the meaning.
– E.g. I saw the boy with a telescope.
– The PP “with a telescope” can be attached to both “boy” and “saw”, so ambiguity is still present. It is then attached to “boy” using the proximity check.

SELECTIONAL PREFERENCES (RECENT LINGUISTIC THEORY)
• There are words which demand arguments, like verbs, prepositions, adjectives and sometimes nouns. These arguments are typically nouns.
• Arguments must have the property to fulfil the demand. They must satisfy selectional preferences.
– Example
• Give (verb)
– agent – animate
– obj – direct
– obj – indirect
• I gave him the book
• I gave him the book (yesterday in the school) -> adjunct
• How does this help in WSD?
– One type of contextual information is the information about the type of arguments that a word takes.
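The desire, appropriateness and proximity steps for the two "I saw the boy with ..." sentences can be sketched as a small attachment routine. The `APPROPRIATE` table is a hypothetical stand-in for real yogyataa knowledge, and the function name `attach_pp` is illustrative.

```python
# Hypothetical appropriateness (yogyataa) table: (head, noun inside the PP).
APPROPRIATE = {
    ("boy", "long hair"),
    ("boy", "telescope"),
    ("saw", "telescope"),
}

def attach_pp(heads_in_order, pp_noun):
    """Pick the head for a PP that follows all the candidate heads.

    heads_in_order lists the words that desire an attachment
    (aakaangksha) in sentence order, so the last one is nearest the PP.
    """
    fitting = [h for h in heads_in_order if (h, pp_noun) in APPROPRIATE]
    if not fitting:
        return None
    # Proximity (sannidhi): among the fitting heads, take the nearest one.
    return fitting[-1]

print(attach_pp(["saw", "boy"], "long hair"))
print(attach_pp(["saw", "boy"], "telescope"))
```

For "long hair" appropriateness alone decides; for "telescope" both heads fit and proximity selects "boy", as on the slide.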


WSD USING SELECTIONAL PREFERENCES AND ARGUMENTS

Sense 1
• This airline serves dinner in the evening flight.
• serve (Verb)
– agent
– object – edible

Sense 2
• This airline serves the sector between Agra & Delhi.
• serve (Verb)
– agent
– object – sector

Requires exhaustive enumeration of:
• Argument-structure of verbs.
• Selectional preferences of arguments.
• Description of properties of words such that meeting the selectional preference criteria can be decided.
E.g. This flight serves the “region” between Mumbai and Delhi.
How do you decide if “region” is compatible with “sector”?
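The two senses of serve can be told apart by checking the object against each sense's selectional preference. A minimal sketch, where `NOUN_TYPES` is a hypothetical hand-built property table; it also shows why "region" fails unless its properties are enumerated.

```python
# Selectional preference on the object for each sense of "serve" (from the slide).
SERVE_OBJECT_PREFERENCE = {"Sense 1": "edible", "Sense 2": "sector"}

# Hypothetical property table; "region" is deliberately left out.
NOUN_TYPES = {
    "dinner": {"edible"},
    "sector": {"sector"},
}

def senses_of_serve(obj):
    """Return the senses of 'serve' whose object preference obj satisfies."""
    properties = NOUN_TYPES.get(obj, set())
    return [
        sense
        for sense, required in SERVE_OBJECT_PREFERENCE.items()
        if required in properties
    ]

for noun in ("dinner", "sector", "region"):
    print(noun, senses_of_serve(noun))
```

"region" matches no sense here because nothing says it is compatible with "sector", which is the enumeration problem raised above.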

OVERLAP BASED APPROACHES
• Require a Machine Readable Dictionary (MRD).
• Find the overlap between the features of different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag).
• These features could be sense definitions, example sentences, hypernyms etc.
• The features could also be given weights.
• The sense which has the maximum overlap is selected as the contextually appropriate sense.

LESK’S ALGORITHM
Sense Bag: contains the words in the definition of a candidate sense of the ambiguous word.
Context Bag: contains the words in the definition of each sense of each context word.
E.g. “On burning coal we get ash.”

Ash
• Sense 1: Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches.
• Sense 2: The solid residue left when combustible material is thoroughly burned or oxidized.
• Sense 3: To convert into ash

Coal
• Sense 1: A piece of glowing carbon or burnt wood.
• Sense 2: charcoal.
• Sense 3: A black solid combustible substance formed by the partial decomposition of vegetable matter without free access to air and under the influence of moisture and often increased pressure and temperature that is widely used as a fuel for burning

In this case Sense 2 of ash would be the winner sense.
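The ash/coal example can be run as a simplified Lesk overlap count. This is a sketch under stated assumptions: the stopword list is hand-picked for this example, no stemming is applied (so "burned" and "burnt" do not match), and the names `bag` and `lesk` are illustrative.

```python
import re

# Hand-picked function words to ignore (an assumption for this example).
STOPWORDS = {"a", "the", "of", "with", "and", "or", "by", "to", "into",
             "is", "when", "that", "as", "for", "under", "without"}

ASH = {
    1: "Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches.",
    2: "The solid residue left when combustible material is thoroughly burned or oxidized.",
    3: "To convert into ash",
}
COAL = {
    1: "A piece of glowing carbon or burnt wood.",
    2: "charcoal.",
    3: ("A black solid combustible substance formed by the partial decomposition "
        "of vegetable matter without free access to air and under the influence "
        "of moisture and often increased pressure and temperature that is widely "
        "used as a fuel for burning"),
}

def bag(text):
    """Lowercased content words of a definition."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def lesk(target_senses, context_senses):
    """Return (best sense id, overlap per sense) for the target word."""
    context_bag = set()
    for definition in context_senses.values():
        context_bag |= bag(definition)
    overlaps = {sid: len(bag(d) & context_bag) for sid, d in target_senses.items()}
    return max(overlaps, key=overlaps.get), overlaps

best, overlaps = lesk(ASH, COAL)
print(best, overlaps)
```

Sense 2 of ash shares "solid" and "combustible" with the coal definitions, while the other two senses share no content words.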

WALKER’S ALGORITHM
• A Thesaurus Based approach.
• Step 1: For each sense of the target word find the thesaurus category to which that sense belongs.
• Step 2: Calculate the score for each sense by using the context words. A context word adds 1 to the score of a sense if the thesaurus category of the word matches that of the sense.
• E.g. The money in this bank fetches an interest of 8% per annum
• Target word: bank
• Clue words from the context: money, interest, annum, fetch

Context word   Sense1: Finance   Sense2: Location
Money          +1                0
Interest       +1                0
Fetch          0                 0
Annum          +1                0
Total          3                 0

Context words add 1 to the sense when the topic of the word matches that of the sense.
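The two steps and the score table for bank can be reproduced with a small lookup. A minimal sketch, assuming a toy thesaurus: the `THESAURUS` mapping is hypothetical, and "fetch" is left without a matching category so that it contributes 0, as in the table.

```python
# Step 1: thesaurus category of each sense of the target word "bank".
SENSE_CATEGORY = {"Sense1": "Finance", "Sense2": "Location"}

# Hypothetical thesaurus categories for the context words.
THESAURUS = {"money": "Finance", "interest": "Finance", "annum": "Finance"}

def walker_scores(context_words):
    """Step 2: add 1 to a sense for each context word in the same category."""
    scores = {sense: 0 for sense in SENSE_CATEGORY}
    for word in context_words:
        category = THESAURUS.get(word.lower())
        for sense, sense_category in SENSE_CATEGORY.items():
            if category == sense_category:
                scores[sense] += 1
    return scores

scores = walker_scores(["money", "interest", "fetch", "annum"])
print(scores, max(scores, key=scores.get))
```

The totals match the table: 3 for the Finance sense and 0 for the Location sense.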
