Automatic Labeling of Hypernymy-Troponymy Relation for Chinese Verbs


Automatic Labeling of Hypernymy-Troponymy Relation for Chinese Verbs

Master Thesis
Department of English, National Taiwan Normal University
Advisor: Dr. Shu-Kai Hsieh
Student: Chiao-Shan Lo
July, 2009

Abstract (translated from the Chinese)

In recent years, wordnets have become one of the most widely used resources in fields related to computational linguistics, and they have greatly helped the development of Information Retrieval and Natural Language Processing. A wordnet is built from synonym sets (synsets) and lexical semantic relations. Examples such as the English-based Princeton WordNet and EuroWordNet, which combines several European languages, are already well developed. Building a wordnet, however, cannot be accomplished by one person in a short time; the labor and time it requires are considerable. How to build a wordnet efficiently and systematically has therefore become a goal of recent research. Because the semantic relations among words are the main building blocks of a wordnet, extracting lexical semantic relations automatically is one of the important steps in building one.

The Institute of Linguistics at Academia Sinica has built the Chinese WordNet (CWN), centered on mid-frequency words, with the aim of providing complete sense distinctions for the Chinese lexicon. In the current CWN system, however, the semantic relations among synsets are labeled by human judgment, and the number of labels has not yet reached a scale suitable for practical application. This study therefore proposes a semi-automatic method for labeling the semantic relations among words. The thesis focuses on the hypernymy-troponymy relation between verbs, proposes an automatic labeling method, and extracts Chinese verb pairs that stand in this relation.

The thesis proposes two parallel methods. First, specific lexical syntactic patterns are used to automatically extract verb pairs in a hypernymy-troponymy relation from the Chinese WordNet. Second, a bootstrapping method maps the semantic relations in the English Princeton WordNet onto Chinese in bulk, through the Chinese-English bilingual wordnet built at Academia Sinica (Sinica Bow). The experimental results show that the system can quickly extract a large number of Chinese verb pairs in a hypernymy-troponymy relation. It is hoped that the method can be applied to the automatic labeling of semantic relations in the Chinese WordNet, which is still under development, and to the automatic construction of ontologies, so that a well-developed Chinese lexical knowledge resource can be built efficiently.

Keywords: automatic labeling of semantic relations, verb lexical semantics, hypernymy-troponymy relation of verbs, Chinese WordNet

Abstract

WordNet-like databases have become crucial sources for lexical semantic studies and computational linguistic applications such as Information Retrieval (IR) and Natural Language Processing (NLP). The fundamental elements of WordNet are synsets (synonymous groupings of words) and the semantic relations among synsets. However, creating such a lexical network is a time-consuming and labor-intensive project, and it is even more difficult for languages with few resources, such as Chinese. Chinese WordNet (CWN), which is composed of middle-frequency words, has been launched by Academia Sinica following a paradigm similar to that of Princeton WordNet.

The synset to which each word sense belongs in CWN is labeled manually. However, the lexical semantic relations among synsets in CWN are only partially constructed and lack systematic labeling. This thesis therefore proposes two independent approaches to automatically harvesting lexical semantic relations, focusing on the hypernymy-troponymy relation among verbs. The syntactic pattern-based approach is used because sentence structures can denote relations and reveal information among lexical entries. The bootstrapping approach, on the other hand, aims at exploiting already existing databases and combining them within a common, standard framework. Given a large amount of input data, the proposed approaches can rapidly extract a large number of Chinese verb pairs that are in a hypernymy-troponymy relation, making the construction of a lexical database more efficient. In addition, it is hoped that these approaches will shed light on the automatic acquisition of other Chinese lexical semantic relations and on ontology learning.
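Keywords: automatic extraction, lexical semantic relation, troponymy, Chinese WordNet

ACKNOWLEDGEMENTS

The moment to write these acknowledgements has finally come. Not long after I started writing the thesis I was already thinking about what to say here, because along the way there have been so many people to thank. Writing a thesis is a long and grueling process, full of challenges and bottlenecks that often left me discouraged. Fortunately, many people reached out to help, whether with academic insight or with moral support, and I wish to express my heartfelt thanks to all of them.

First, I would like to thank my advisor, Professor Shu-Kai Hsieh. In the second year of my master's program I worked as his assistant and took every course he offered in the graduate program. I owe my first understanding of computational linguistics to him; he introduced me to an entirely new field of linguistics. While I was writing the thesis, he always gave me a great deal of freedom and answered and supported my questions and ideas. His calm personality was a steadying force: whenever I became anxious over a bottleneck, he was able to help me solve the problem without haste.

I would also like to thank my two oral defense committee members: Professor 高照明 of the Department of Foreign Languages and Literatures, National Taiwan University, and Professor 鍾曉芳 of the Department of English, National Chengchi University. At both of my oral examinations Professor 高 offered many incisive comments, gave me very practical suggestions on both the linguistic and the programming side, and afterwards kindly provided the resources I needed and answered my questions. Professor 鍾 took time out of a busy schedule to serve on the committee, and even so she filled my thesis with detailed written comments and pointed out its shortcomings. This thesis could only be completed with the help of these two professors.

I also thank all the excellent teachers and classmates at National Taiwan Normal University. Every course I took there was both solid and rich, and together they built up my knowledge of linguistics and my ability to write a thesis. I thank my classmates Nancy, Caroline, David, Fu-Pin, Clara, and Jessica II. Although work and thesis writing later kept us apart, we still exchanged experiences and encouraged one another in our free time.

In addition, I sincerely thank the programmer at the Institute of Linguistics, Academia Sinica, Mr. 李 [given name illegible in the source]. Without his generous help this thesis could not have been completed. I also thank [name illegible], [name illegible], and 淳涵 at Academia Sinica for putting aside their own work to help me analyze thousands of corpus entries. I thank my comrades as well [three names, illegible in the source]. Only those who have written a thesis know how painful it is; I was lucky to have your company in those hard days, sharing complaints and encouraging one another.

Last, and most important, I thank my family. My parents have always supported me unconditionally, both emotionally and materially, and my younger sister's frequent messages of care have warmed my heart. Thank you for walking with me all the way and for supporting every decision I have made. I dedicate this thesis to you, my beloved family.

Contents

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Organization of the Thesis
2 Related Works
  2.1 WordNet-like Resources
    2.1.1 Princeton WordNet [31]
    2.1.2 EuroWordNet [45]
    2.1.3 Sinica Bow [23]
    2.1.4 Chinese WordNet [1]
    2.1.5 HowNet [14]
  2.2 Semantic Relations of Verbs
    2.2.1 Semantic Relations of Verbs in WordNet
    2.2.2 Semantic Relations of Verbs in EuroWordNet
    2.2.3 Other Relations of Verbs
  2.3 Troponymy
    2.3.1 Definition of Troponymy
    2.3.2 Distinguishing Manner
  2.4 Automatic Discovery of Lexical Semantic Relation
    2.4.1 Lexico-Syntactic Pattern-Based Approach
    2.4.2 Clustering-Based Approach
    2.4.3 Bootstrapping Approach
  2.5 Summary
3 Methodology
  3.1 Syntactic Pattern-Based Approach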
    3.1.1 Database: Chinese WordNet
    3.1.2 Data Pre-processing
    3.1.3 Syntactic Patterns in Chinese
    3.1.4 Procedure
  3.2 Bootstrapping Approach
    3.2.1 Data Source
    3.2.2 Procedure
  3.3 Evaluation and Scoring
    3.3.1 Evaluation
    3.3.2 Scoring
  3.4 Summary
4 Results and Error Analyses
  4.1 Results from Syntactic Pattern-based Approach
    4.1.1 Error Analyses
    4.1.2 Interim Summary
  4.2 Results from Bootstrapping Approach
    4.2.1 Error Analyses
  4.3 Discussion
    4.3.1 Comparison of Two Approaches
    4.3.2 Comparison of the Results
    4.3.3 Comparison of the Error Types
    4.3.4 General Discussion
  4.4 Summary
5 Conclusion
  5.1 Summary of the Thesis
  5.2 Contribution
  5.3 Limitations of the Present Study and Suggestions for Future Work
Appendix
  A Programming Code
  B Results from Syntactic Pattern-based Approach
  C Results from Bootstrapping Approach

List of Tables

2.1 A finer-grained semantic relation among verbs [9]
2.2 Semantic relations of verbs in WordNet, EuroWordNet and VerbOcean
2.3 Three different types of Troponymy
4.1 General results of syntactic pattern-based approach
4.2 Error types and percentage
4.3 Overall results from bootstrapping approach
4.4 Non hypernymy-troponymy verb pairs (total number of returned verb pairs = 11289)
4.5 General comparison of syntactic pattern-based and bootstrapping approach
4.6 Comparison of error types from results in two approaches
4.7 General comparison of the two approaches

List of Figures

2.1 The first two senses returned by CWN of the verb 走 ‘zou3, walk’
2.2 Four kinds of entailments among English verbs [31]
2.3 Translation-mediated LSR Prediction (the complete model)
2.4 Translation-mediated LSR Prediction (when translation equivalents are synonymous)
3.1 Bootstrapping model
3.2 Overall procedure of bootstrapping approach

Chapter 1 Introduction

1.1 Background

In recent years, there has been an increasing focus in the field of Natural Language Processing (NLP) on the construction of lexical knowledge resources, such as thesauri, WordNet [31], EuroWordNet [45], FrameNet [6], HowNet [13], etc. Among these resources, Princeton WordNet (http://wordnet.princeton.edu), an electronic English lexical database, was started as an implementation of a psycholinguistic model of the mental lexicon. In WordNet, English nouns, verbs, adjectives, and adverbs are organized into synonym sets, called synsets. Synsets in WordNet are connected with each other by various kinds of paradigmatic lexical semantic relations, such as Meronymy and Holonymy (between parts and wholes), Hypernymy and Hyponymy (between specific and more general synsets), etc. These relations act as pointers between synsets. Because of this relation-based organization, WordNet has been widely used to solve a variety of problems in NLP and has attracted great interest on both the theoretical and the applied side, in areas such as Information Retrieval (IR), lexical acquisition, automatic extraction, Word Sense Disambiguation (WSD), and so on. WordNet's growing popularity has prompted the modeling and construction of wordnets in other languages and in various domains as well. EuroWordNet [45], which aims to build a multilingual database for several European languages, is a successful example. To date, WordNet and EuroWordNet serve as crucial resources for NLP applications and have become a standard for evaluating semantic relations.

WordNet covers a large sense-based English lexicon (206,941 word-sense pairs). This extensive coverage took immense labor and time. Further, because semantic relations are open-ended, steadily developing the scope and content of such a resource takes years of intensive labor.
Consequently, there has been significant recent interest in finding methods to build WordNet-like databases in other languages with less effort and time [5] [7] [9] [20] [21] [24] [28] [30] [32] [38] [39]. Lexical semantic relations among synsets are the foundation of a semantic network, but manually constructing all the relations is time-consuming and error-prone. Therefore, one of the most important steps toward efficiently constructing a WordNet-like database is to extract lexical semantic relations automatically.
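
One way to automate this step is to project relations from an existing English wordnet onto Chinese through translation equivalents, which is the general idea behind a bootstrapping approach. The sketch below is only an illustration under stated assumptions: the English hypernym table, the translation table, and the function name project_relations are hypothetical toy data and names, not taken from Princeton WordNet, Sinica Bow, or the thesis's own code.

```python
from itertools import product

# Toy English verb relations, troponym -> hypernym (hypothetical data).
EN_HYPERNYM = {
    "stroll": "walk",
    "walk": "move",
    "whisper": "speak",
}

# Toy English -> Chinese translation equivalents (hypothetical data).
# "whisper" is deliberately missing to mimic a lexical gap.
EN_TO_ZH = {
    "stroll": ["散步"],
    "walk": ["走", "步行"],
    "move": ["移動"],
    "speak": ["說"],
}


def project_relations(en_hypernym, en_to_zh):
    """Project English (troponym, hypernym) pairs onto Chinese.

    Returns a sorted list of candidate (zh_troponym, zh_hypernym) pairs.
    A pair is produced only when both English verbs have a translation.
    """
    pairs = set()
    for troponym, hypernym in en_hypernym.items():
        zh_troponyms = en_to_zh.get(troponym, [])
        zh_hypernyms = en_to_zh.get(hypernym, [])
        for zh_t, zh_h in product(zh_troponyms, zh_hypernyms):
            if zh_t != zh_h:  # skip pairs that collapse to the same word
                pairs.add((zh_t, zh_h))
    return sorted(pairs)


candidates = project_relations(EN_HYPERNYM, EN_TO_ZH)
for zh_troponym, zh_hypernym in candidates:
    print(f"{zh_troponym} is a way of {zh_hypernym}")
```

Pairs produced this way are only candidates. Translation equivalents are not always exact, and an English verb with no Chinese counterpart (such as "whisper" in the toy table) yields no pair at all, so some form of evaluation would still be needed.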