Words, Lexicons and Ontologies

HG8003 Technologically Speaking: The intersection of language and technology

Francis Bond
Division of Linguistics and Multilingual Studies
http://www3.ntu.edu.sg/home/fcbond/
Lecture 4
Location: LT8

HG8003 (2014) Schedule

Lec.  Date   Topic
1     01-16  Introduction, Organization: Overview of NLP; Main Issues
2     01-23  Representing Language
3     02-06  Representing Meaning
4     02-13  Words, Lexicons and Ontologies
5     02-20  Text Mining and Knowledge Acquisition (Quiz)
6     02-27  Structured Text and the Semantic Web
             Recess
7     03-13  Citation, Reputation and PageRank
8     03-20  Introduction to MT, Empirical NLP
9     03-27  Analysis, Tagging, Parsing and Generation (Quiz)
10    Video  Statistical and Example-based MT
11    04-03  Transfer and Word Sense Disambiguation
12    04-10  Review and Conclusions
Exam  05-06  17:00

➣ Video week 10

Review of Meaning

Review of Representing Meaning

➣ Three ways of defining meaning
  ➢ Attributional (Compositional)
  ➢ Relational
  ➢ Distributional
➣ the Syntax-Semantic Interface
  ➢ Usage ↽⇀ Meaning

Attributional Meaning

➣ Give a semantic description of word use in isolation of the categorisation of other lexical items
  ➢ definitions
  ➢ decompositional semantics (break down into primitives)
➣ Easy for humans to understand
➣ Hard to decide on sense boundaries (granularity: splitters vs. lumpers)
➣ Definitions are circular (the grounding problem)
➣ Hard to be consistent

Relational Meaning

➣ Capture correspondences between lexical items by way of a finite set of pre-defined semantic relations
➣ Methodologies:
  ➢ lexical relations
  ➢ constructional relations
➣ Captures many generalizations usefully
➣ Hard to make complete
➣ Leads to large, complex graphs

Distributional Meaning

➣ Capture word meanings as collections of contexts in which words appear
  ➢ n-grams
  ➢ syntactic relations
  ➢ sentences
  ➢ documents
➣ Good for synonymy, not so good for antonymy
➣ Computationally tractable

Why are dictionaries important?

➣ For humans
  ➢ find meaning of unknown words
  ➢ find more information about known words
  ➢ codify knowledge about word usage (glossaries)
➣ For machines
  ➢ store information about words
  ➢ link between text and knowledge

Words

Introduction to Words, Lexicons and Ontologies

➣ Design and implementation
  ➢ Machine Readable Dictionaries
  ➢ Morphological lexicons
  ➢ Syntactic lexicons
  ➢ Semantic lexicons
  ➢ Ontologies
➣ Construction and Maintenance
  ➢ Construction from scratch
  ➢ Boot-strapping from existing resources
  ➢ Ensuring consistency

Machine Readable Dictionaries (MRDs)

➣ Human dictionaries made available on machine
  ➢ Electronic Dictionaries
  ➢ Dictionary Applications
    ∗ often with automatic word lookup
  ➢ On-line dictionaries
    ∗ Sometimes with glosses

A typical entry

definition (n) a concise explanation of the meaning of a word or phrase or symbol

➣ Headword: definition
➣ Part of Speech: n (noun)
➣ Definition:
  ➢ genus: explanation
  ➢ differentia: concise; of the meaning of a word or phrase or symbol
➣ Implied: countable (a), regular plural

Parts-of-Speech (POS)

➣ Traditional Grammar has eight:
  Noun, Verb, Adjective, Adverb (open class)
  Conjunction, Preposition, Pronoun, Interjection (closed class)
➣ In the US, the Penn Treebank POS set is de-facto standard:
  ➢ http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html
  ➢ 45 tags (including punctuation)
➣ In Europe, CLAWS tagset is popular
  ➢ http://ucrel.lancs.ac.uk/claws7tags.html
  ➢ 137 tags (without punctuation)

Penn Treebank Examples (14/45)

Tag   Description              Tag   Description
NN    Noun, singular or mass   VB    Verb, base form
NNS   Noun, plural             VBD   Verb, past tense
NNP   Proper noun, singular    VBG   Verb, gerund or present participle
NNPS  Proper noun, plural      VBN   Verb, past participle
PRP   Personal pronoun         VBP   Verb, non-3rd person singular present
IN    Preposition              VBZ   Verb, 3rd person singular present
TO    to                       .     Sentence-final punctuation (., ?, !)

➣ The tags include inflectional information
  ➢ If you know the tag, you can generally find the lemma
➣ Some tags are very specialized:
  I/PRP wanted/VBD to/TO go/VB ./.
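The point that a tag generally lets you find the lemma can be illustrated with a toy sketch. This is not a real morphological analyser: the IRREGULAR table and the suffix rules are assumptions made up for illustration, and the rules will get many words wrong (for example goes or fabricated).

```python
# Toy lemmatizer: the Penn Treebank tag decides which suffix rule to try.
# IRREGULAR and the suffix rules are illustrative assumptions only.

IRREGULAR = {
    ("made", "VBD"): "make",
    ("made", "VBN"): "make",
    ("took", "VBD"): "take",
    ("taken", "VBN"): "take",
    ("children", "NNS"): "child",
}


def lemmatize(word, tag):
    """Return a guessed lemma for (word, tag); fall back to the word itself."""
    lower = word.lower()
    if (lower, tag) in IRREGULAR:
        return IRREGULAR[(lower, tag)]
    if tag in ("NNS", "VBZ"):
        if lower.endswith("ies") and len(lower) > 4:
            return lower[:-3] + "y"   # flies -> fly, skies -> sky
        if lower.endswith("s") and not lower.endswith("ss"):
            return lower[:-1]         # dogs -> dog, eats -> eat
    if tag in ("VBD", "VBN") and lower.endswith("ed"):
        return lower[:-2]             # wanted -> want, walked -> walk
    if tag == "VBG" and lower.endswith("ing") and len(lower) > 5:
        return lower[:-3]             # walking -> walk
    return word                       # base forms, pronouns, punctuation


def lemmatize_tagged(text):
    """Lemmatize a string in word/TAG format, as in the slide's example."""
    pairs = [token.rsplit("/", 1) for token in text.split()]
    return [lemmatize(word, tag) for word, tag in pairs]


lemmas = lemmatize_tagged("I/PRP wanted/VBD to/TO go/VB ./.")
print(lemmas)
```

The same surface form can need different rules under different tags, which is why the lookup is keyed on the pair (word, tag) rather than on the word alone.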
Good Definitions

➣ a definition should be simpler than the word being explained
➣ the definition should match the part of speech
  definition (n) a concise explanation of the meaning of a word or phrase
  define (v) give a definition for the meaning of a word; “Define ‘sadness’”
➣ the definition should not be circular
➣ all words in the definition should be defined (somewhere)
  ➢ prefer small defining vocabulary
  ➢ only use metalanguage (NSM: Natural Semantic Metalanguage)

Circular definitions

beauty     the state of being beautiful
beautiful  full of beauty
bobcat     a lynx
lynx       a bobcat

Other useful information

http://en.wiktionary.org/wiki/lynx

➣ Pronunciation
➣ Usage Examples
➣ Illustrations
➣ Etymology (history of the word)
➣ Links to other resources

Easier to do without the space restrictions of a paper dictionary.

Dictionaries for NLP

➣ Minimize content in order to minimize the acquisition problem.
➣ Declarativity and human readability, with compilation into a machine-friendly representation.
➣ Modularity so components are reusable: e.g. distinct monolingual and transfer lexicons in an MT system.
➣ Capture generalizations with inheritance (and lexical rules etc). Avoids errors, easier to maintain and expand.
➣ Underspecification to reduce disambiguation for a particular application.

(Copestake, 1992)

Morphological Analysis

森        永    前    日      銀    総    裁
rin       ei    zen   hi      gin   sou   sai
mori            mae   nichi
morinaga        zennichi      gin   sousai
morinaga        zen   nichigin      sousai

➣ 森永 前 日銀 総裁
  Morinaga former Bank of Japan President

Morphological Lexicons

➣ Stem
➣ Inflectional Class
➣ Part of Speech (often 1-200)
➣ Arguments (?)
➣ For example
  ➢ Relations: 前 - 総裁
  ➢ Arguments: 前(総裁)
  ➢ Abstraction: 前(title); 総裁 ⊂ title

Morphological Lexicon

➣ I fabricate for a living
➣ I make things for a living
➣ I fabricated yesterday
➣ I made things for a living
➣ These are differences in the inflectional class

Inflection

➣ Inflection: In many languages, words appear in different forms to show small differences in meaning: for example number (dog/dogs; child/children) or tense/aspect (make/made/making/made; take/took/taking/taken)
➣ Many words pattern the same way; this is called an inflectional class (or paradigm). For example, one class of plurals in English is words that end in y: fly/flies; sky/skies.
➣ The inflectional class is normally not predictable from the meaning or syntax, and so must be stored for each word
➣ The root form (lemma) and the inflected form have the same meaning modulo the number/tense/... and the same basic part-of-speech
➣ Normally a word only undergoes one inflection

Derivation

➣ New words can also be created by changing the form. If the part of speech or meaning changes, we call it derivation: (happy/happiness; happy/unhappy; happy/happily)
➣ You can also get zero derivation, where the meaning changes without a change in form (I butter the bread/I like butter).
➣ The root form in derivation is called the stem and the process of stripping off derivational affixes is stemming
➣ You can have multiple derivations: anti-dis-establish-ment-arian-ism: the stem is establish
➣ Derivation is largely but not entirely productive: employer, teacher, *studier, actor, contractor

Syntactic Lexicon

➣ I fabricated the results
➣ I made up the results
➣ = I made the results up
➣ I walked down the road
➣ ≠ I walked the road down
➣ These are differences in the syntactic lexical type

Differences in Argument Structure

➣ These are also differences in the syntactic lexical type
  ➢ I gave the book to him
  ➢ I gave him the book
  ➢ Cats eat mice
  ➢ Cats eat
  ➢ Cats devour mice
  ➢ *Cats devour
➣ The information about what arguments a verb can take is also called subcategorization, valence or argument frame

Semantic Lexicon

➣ I deposited the money in the bank (financial)
➣ The river overflowed its bank (riverside)
➣ I had lunch by the bank (???)
➣ These are differences in the semantic class

All the possibilities combine

➣ I saw her duck
  ➢ see/saw, saw/sawed
  ➢ duck_N, duck_V
  ➢ duck_N:cloth, duck_N:bird
➣ Still useful to keep separate
  ➢ inflectional paradigm
  ➢ arguments (subcategorization)
  ➢ semantics (selectional preferences)

Transfer Lexicons

➣ bank ↔ 銀行 ginkou
➣ bank ↔ 土手 dote
➣ 鼻 hana ↔ nose
  ➢ trunk [of elephant]
  ➢ muzzle [of horse]
  ➢ snout [of boar]

Dictionaries in Processing

➣ Lexical lookup is slow (disk-based)
  ➢ Compile dictionaries into compressed format
  ➢ Index
  ➢ Cache the index (cache = load it into memory)
  ➢ Cache already accessed entries
  ➢ Keep a list of frequent entries and cache them (the most frequent words are very frequent)
➣ Batch check for consistency off-line

Dictionaries and Intellectual Property Rights (IPR)

➣ Lexicography has a long tradition of extending others’ work
  ➢ Johnson, Murray, ...
  ➢ Language itself should not be restricted
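
Looking back at the Dictionaries in Processing slide, the two caching ideas (cache already accessed entries, and pre-cache a list of frequent entries) can be sketched roughly as follows. This is only an illustration: DISK_LEXICON stands in for a compiled, indexed dictionary file, and the names read_from_disk, lookup and preload are hypothetical.

```python
from functools import lru_cache

# Stand-in for a disk-based dictionary. A real system would read a compiled,
# indexed file here; these entries are illustrative assumptions.
DISK_LEXICON = {
    "bank": ["financial institution", "riverside"],
    "duck": ["bird", "cloth", "to lower the head"],
    "the": ["definite article"],
}

disk_reads = 0  # counts simulated (slow) disk accesses


def read_from_disk(word):
    """Simulate one slow lookup in the on-disk dictionary."""
    global disk_reads
    disk_reads += 1
    return tuple(DISK_LEXICON.get(word, ()))


@lru_cache(maxsize=10000)
def lookup(word):
    """Cache already accessed entries so repeated lookups stay in memory."""
    return read_from_disk(word)


def preload(frequent_words):
    """Warm the cache with a list of frequent entries."""
    for word in frequent_words:
        lookup(word)


preload(["the"])
senses = [lookup(w) for w in "the bank by the bank".split()]
print(disk_reads, "disk reads for 5 tokens")
```

Because the most frequent words are very frequent, even a small preloaded list removes a large share of the slow lookups; unknown words such as by are cached as empty entries so they are not looked up twice.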
