CS460/626 : Natural Language Processing/Speech, NLP and the Web

Lecture 24, 25, 26: Wordnet
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
17th and 19th (morning and night), 2013

NLP Trinity
[Figure: three axes.]
Problem: Semantics, Parsing, Part of Speech Tagging, Morph Analysis
Language: Marathi, French, Hindi, English
Algorithm: HMM, CRF, MEMM

NLP Layer
[Figure: stack of layers, listed here from top to bottom, with the arrow label "Increased Complexity of Processing".]
Discourse and Coreference
Semantics Extraction
Parsing
Chunking
POS tagging
Morphology

Background: Classification of Words
Word
    Content Word: Verb, Noun, Adjective, Adverb
    Function Word: Preposition, Conjunction, Pronoun, Interjection

NLP: Thy Name is Disambiguation
A word can have multiple meanings, and a meaning can have multiple words.

Word with multiple meanings
Where there is a will, there is a way.
Where there is a will, there are hundreds of relatives.

A meaning can have multiple words
Proverb: "A cheat never prospers"
Proverb: "A cheat never prospers but can get rich faster"

WSD should be distinguished from structural ambiguity
Correct groupings are a must.
Iran quake kills 87, 400 injured
When it rains cats and dogs run for cover
    When it rains, cats and dogs run for cover
    When it rains cats and dogs, run for cover

Groups of words (multiwords) and names can be ambiguous
Broken guitar for sale, no strings attached (pun)
Washington voted Washington to power
pujaa ne pujaa ke liye phul todaa (Pujaa plucked flowers for worship)
Deep world knowledge: The use of a shin bone is to locate furniture in a dark room

Stages of processing
Phonetics and phonology
Morphology
Lexical Analysis
Syntactic Analysis
Semantic Analysis
Pragmatics
Discourse

Example of WSD
Senses of "operation":

Operation, surgery, surgical operation, surgical procedure, surgical process -- (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body; "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery") TOPIC->(noun) surgery#1

Operation, military operation -- (activity by a military or naval force (as a maneuver or campaign); "it was a joint operation of the navy and air force") TOPIC->(noun) military#1, armed forces#1, armed services#1, military machine#1, war machine#1

Operation -- ((computer science) data processing in which the result is completely specified by a rule (especially the processing that results from a single instruction); "it can perform millions of operations per second") TOPIC->(noun) computer science#1, computing#1

mathematical process, mathematical operation, operation -- ((mathematics) calculation by mathematical methods; "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic") TOPIC->(noun) mathematics#1, math#1, maths#1

Is WSD needed in large applications?
Word ambiguity leads to topic drift in IR.
Query: "Madrid bomb blast case"
Query word "case" matched to {case, container}: drifted topic due to inapplicable sense.
Query word "case" matched to {case, suit, lawsuit} and then expanded to {suit, apparel}: drifted topic due to expanded term.

[Figure: "Our observations on error percentages due to various factors", CLEF 2007. Bar chart; y-axis: Error Percentage (0 to 50); x-axis: Hindi-English, Marathi-English; legend: Transliteration, Translation Disambiguation, Stemmer, Dictionary, Ranking.]

How about WSD and MT?

English: Zaheer Khan, the India fast bowler, has been ruled out of the remainder of the series against England.
Hindi (MT output): भारत के तेज गेंदबाज, जहीर खान, इंग्लैंड के खिलाफ श्रृंखला के शेष के बाहर शासन किया गया है.
Error: "ruled out" is rendered as शासन किया ("ruled" in the administrative sense).

English: He will return to India and will be replaced by left-arm seamer RP Singh.
Hindi (MT output): वह भारत लौटने और बाएँ हाथ के तेज गेंदबाज आरपी सिंह द्वारा प्रतिस्थापित किया जाएगा.

English: Zaheer picked up a hamstring injury during the first Test at Lord's.
Hindi (MT output): जहीर लॉर्ड्स में पहले टेस्ट के दौरान हैमस्ट्रिंग चोट उठाया.
Error: "picked up" is rendered as उठाया ("lifted").
English: He had been withdrawn from the squad for India's recent Test series in the West Indies due to a right ankle injury.
Hindi (MT output): वह भारत की वेस्ट इंडीज में हाल ही में एक सही टखने की चोट के कारण टेस्ट श्रृंखला के लिए टीम से वापस ले लिया गया था.
Error: "right" is rendered as सही ("correct").

Wordnet: Psycholinguistic Theory
Human lexical memory for nouns is organized as a hierarchy.
Can a canary sing? Pretty fast response.
Can a canary fly? Slower response.
Does a canary have skin? Slowest response.
Animal (can move, has skin)
    Bird (can fly)
        canary (can sing)
Wordnet is a lexical reference system based on psycholinguistic theories of human lexical memory.

Essential Resource for WSD: Wordnet
Lexical matrix: rows are word meanings (M1 ... Mm), columns are word forms (F1 ... Fn).

Word Meanings   F1              F2              F3                    ...   Fn
M1              E1,1 (depend)   E1,2 (bank)     E1,3 (rely)
M2                              E2,2 (bank)     E2,… (embankment)
M3                              E3,2 (bank)     E3,3
...
Mm                                                                          Em,n

Wordnet: History
The first wordnet in the world was for English, developed at Princeton over 15 years.
The EuroWordNet, a linked structure of European-language wordnets, was built in 1998 over 3 years with funding from the EC as a mission-mode project.
Wordnets for Hindi and Marathi, being built at IIT Bombay, are amongst the first IL (Indian language) wordnets.
All these are proposed to be linked into the IndoWordnet, which will eventually be linked to the English and the Euro wordnets.

Basic Principle
Words in natural languages are polysemous.
However, when synonymous words are put together, a unique meaning often emerges.
Use is made of Relational Semantics.

Lexical and Semantic relations in wordnet
1. Synonymy
2. Hypernymy / Hyponymy
3. Antonymy
4. Meronymy / Holonymy
5. Gradation
6. Entailment
7. Troponymy
1, 3 and 5 are lexical (word to word); the rest are semantic (synset to synset).

WordNet Sub-Graph
[Figure: sub-graph around the synset {house, home}.]
Gloss: A place that serves as the living quarters of one or more families
Hypernymy: {dwelling, abode}
Meronymy: kitchen, backyard, bedroom, veranda, study, guestroom
Hyponymy: hermitage, cottage

Fundamental Design Question
Syntagmatic vs. Paradigmatic relations?
Psycholinguistics is the basis of the design.
When we hear a word, many words come to our mind by association.
For English, about half of the associated words are syntagmatically related and half are paradigmatically related.
For cat:
    animal, mammal: paradigmatic
    mew, purr, furry: syntagmatic

Stated Fundamental Application of Wordnet: Sense Disambiguation
Determination of the correct sense of the word.
"The crane ate the fish" vs. "The crane was used to lift the load"
bird vs. machine

The problem of Sense tagging
Given a corpus, assign the correct sense to each word. This is sense tagging.
It needs Word Sense Disambiguation (WSD).
Highly important for Question Answering, Machine Translation and Text Mining tasks.

Classification of Words
Word
    Content Word: Verb, Noun, Adjective, Adverb
    Function Word: Preposition, Conjunction, Pronoun, Interjection

Example of sense marking: its need
एक_4187 नए शोध_1138 के अनुसार_3123 जिन लोगों_1189 का सामाजिक_43540 जीवन_125623 व्यस्त_48029 होता है उनके दिमाग_16168 के एक_4187 हिस्से_120425 में अधिक_42403 जगह_113368 होती है।
(According to new research, people who have a busy social life have more space in a part of their brain.)
नेचर न्यूरोसाइंस में छपे एक_4187 शोध_1138 के अनुसार_3123 कई_4118 लोगों_1189 के दिमाग_16168 के स्कैन से पता_11431 चला कि दिमाग_16168 का एक_4187 हिस्सा_120425 एमिगडाला सामाजिक_43540 व्यस्तताओं_1438 के साथ_328602 सामंजस्य_166 के लिए थोड़ा_38861 बढ़_25368 जाता है। यह शोध_1138 58 लोगों_1189 पर किया गया जिसमें उनकी उम्र_13159 और दिमाग_16168 की साइज़ के आँकड़े_128065 लिए गए। अमरीकी_413405 टीम_14077 ने पाया_227806 कि जिन लोगों_1189 की सोशल नेटवर्किंग अधिक_42403 है उनके दिमाग_16168 का एमिगडाला वाला हिस्सा_120425 बाकी_130137 लोगों_1189 की तुलना_में_38220 अधिक_42403 बड़ा_426602 है। दिमाग_16168 का एमिगडाला वाला हिस्सा_120425 भावनाओं_1912 और मानसिक_42151 स्थिति_1652 से जुड़ा हुआ माना_212436 जाता है।
(According to research published in Nature Neuroscience, brain scans of several people showed that one part of the brain, the amygdala, grows slightly to accommodate social busyness. The study covered 58 people, whose age and brain-size data were recorded. The American team found that in people who do more social networking, the amygdala region of the brain is larger than in other people. The amygdala region is believed to be linked to emotions and mental state.)

Ambiguity of लोग (People)

लोग, जन, लोक, जनमानस, पब्लिक - एक से अधिक व्यक्ति "लोगों के हित में काम करना चाहिए"
(English synset) multitude, masses, mass, hoi_polloi, people, the_great_unwashed - the common people generally "separate the warriors from the mass" "power to the people"

दुनिया, दुनियाँ, संसार, विश्व, जगत, जहाँ, जहान, ज़माना, जमाना, लोक, दुनियावाले, दुनियाँवाले, लोग - संसार में रहने वाले लोग "महात्मा गाँधी का सम्मान पूरी दुनिया करती है / मैं इस दुनिया की परवाह नहीं करता / आज की दुनिया पैसे के पीछे भाग रही है"
(English synset) populace, public, world - people in general considered as a whole "he is a hero in the eyes of the public"

Basic Principle
Words in natural languages are polysemous.
However, when synonymous words are put together, a unique meaning often emerges.
Use is made of Relational Semantics.
Componential Semantics, where each word is a bundle of semantic features (as in the Schankian Conceptual Dependency system or Lexical Componential Semantics), is to be examined as a viable alternative.

Componential Semantics
Consider cat and tiger. Decide on componential attributes.

         Furry   Carnivorous   Heavy   Domesticable
cat      Y       Y             N       Y
tiger    Y       Y             Y       N

Complete and correct attributes are difficult to design.

Semantic relations in wordnet
1. Synonymy
2. Hypernymy / Hyponymy
3. Antonymy
4. Meronymy / Holonymy
5. Gradation
6. Entailment
7. Troponymy
1, 3 and 5 are lexical (word to word); the rest are semantic (synset to synset).

Synset: the foundation (house) 1.
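
To make the synset idea concrete, here is a minimal Python sketch of a toy wordnet. It is an illustration only, not the data model of any actual wordnet. The synset ids ("house.n", "bank.rely", ...), the Synset class and the three helper functions are all hypothetical names; the words and links are read off the house sub-graph and the "bank" row of the lexical matrix in the slides, with only three of the meronyms included. It shows how grouping synonymous word forms narrows a polysemous word to one sense, and how a semantic relation links synset to synset.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A set of synonymous word forms, a gloss, and typed links to other synsets."""
    words: frozenset
    gloss: str = ""
    links: dict = field(default_factory=dict)  # relation name -> list of synset ids


# Toy data read off the slides; the ids are hypothetical labels.
SYNSETS = {
    "dwelling.n": Synset(frozenset({"dwelling", "abode"})),
    "house.n": Synset(
        frozenset({"house", "home"}),
        "a place that serves as the living quarters of one or more families",
        {"hypernym": ["dwelling.n"],
         "hyponym": ["cottage.n", "hermitage.n"],
         "meronym": ["kitchen.n", "bedroom.n", "veranda.n"]},
    ),
    "cottage.n": Synset(frozenset({"cottage"}), links={"hypernym": ["house.n"]}),
    "hermitage.n": Synset(frozenset({"hermitage"}), links={"hypernym": ["house.n"]}),
    "kitchen.n": Synset(frozenset({"kitchen"})),
    "bedroom.n": Synset(frozenset({"bedroom"})),
    "veranda.n": Synset(frozenset({"veranda"})),
    "bank.rely": Synset(frozenset({"bank", "rely", "depend"})),
    "bank.slope": Synset(frozenset({"bank", "embankment"})),
}


def senses(word):
    """Return the ids of every synset that contains the word form."""
    return sorted(sid for sid, s in SYNSETS.items() if word in s.words)


def shared_senses(words):
    """Return the ids of synsets that contain all of the given word forms."""
    wanted = set(words)
    return sorted(sid for sid, s in SYNSETS.items() if wanted <= s.words)


def hypernym_chain(sid):
    """Follow the first hypernym link upward until a synset has none."""
    chain = []
    while SYNSETS[sid].links.get("hypernym"):
        sid = SYNSETS[sid].links["hypernym"][0]
        chain.append(sid)
    return chain


print(senses("bank"))                         # "bank" alone is ambiguous
print(shared_senses(["bank", "embankment"]))  # adding a synonym leaves one sense
print(hypernym_chain("cottage.n"))            # synset-to-synset relation
```

In this sketch the lexical relation (synonymy) lives inside a synset as shared membership, while the semantic relations (hypernymy, hyponymy, meronymy) are stored as links between synset ids, mirroring the word-to-word versus synset-to-synset distinction made in the relations list.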
