Lexicography 03 Presentation.Pdf

Linguistic Institute 2017
Course Lexicography
Dictionary Structure
Dr Helene Schmolz
University of Passau, Germany

Overview
1. Parts of a dictionary
2. Further aspects of dictionary structure
3. Organization of the word list
3.1 Homonyms and polysemes
3.2 Alphabetical and thematic ordering
3.3 Niching and nesting
3.4 Multi-word lemmas
3.5 Information complementing entries
4. Organization of an entry
4.1 Structural elements
4.2 Textual condensation
Sources

1. Dictionary structure – introduction
Using a dictionary requires competence:
◦ Reference competence
◦ General dictionary competence
◦ Specific dictionary competence
[Image source: http://clipart-library.com/dictionary-cliparts.html]

Which parts of dictionaries can you identify? Think about it, or have a look at print dictionaries.
What is the most important part of a dictionary?
[Figure: dictionary page with the labels "headword/lemma" and "entry"]

1. Parts of a dictionary
Megastructure: entire structure of the main components of a dictionary:
o outside matter: front & back matter (sometimes also middle matter)
o body or word list
Macrostructure: organisation of the word list, i.e. lemmas/headwords (such as alphabetical order)
Microstructure: order of information in each entry
[Figure: megastructure, macrostructure and three microstructures. From: Klotz & Herbst (2016: 39)]

2. Further aspects of dictionary structure
Distribution structure
◦ where lexical information can be found (e.g. in an entry or outside the body)
Cross-reference structure
◦ directs users to related kinds of information (such as referring to synonym/antonym entries)
Access structure
◦ directs users to the information they are looking for
◦ two types:
o outer access structure: which lemma? E.g. running heads
o inner access structure: which information of the lemma? E.g. font variants, colours, numerals or letters, punctuation marks

2. Further aspects of dictionary structure: Example OALD
[Figure: cross-reference structure]
[Figure: outer access structure]
[Figure: inner access structure]

Revisited: Parts of a dictionary
Megastructure: entire structure of the main components of a dictionary (outside matter, body or word list)
Macrostructure: organisation of the word list
Microstructure: order of information in each entry

3. Organization of the word list
What should be represented as one lemma or one entry?
◦ Word forms?
◦ Homonyms and polysemes?
How should lemmas be ordered?
◦ Alphabetical and thematic ordering
◦ Compounds, derived words → niching and nesting
◦ Phrasal expressions, idioms → multi-word lemmas
Information complementing entries

3.1 Homonyms and polysemes
Why do we consider foot – foot as polysemes and bank – bank as homonyms?
The distinction between homonym and polyseme is based on two considerations:
◦ Historical: same or different historical origin
◦ Speaker-psychological: relatedness of meaning or no relatedness (as perceived by native speakers)

Historical consideration:
◦ Same historical origin: gift ‘present’, gift ‘talent’ (both from Old Norse gipt)
◦ Different historical origin: case ‘situation’ (from Old French cas), case ‘container’ (from Old French casse)
Speaker-psychological consideration:
◦ Related: branch ‘of tree’, branch ‘of company’
◦ Not related: trunk ‘chest’, trunk ‘of elephant’

The approach of dictionaries varies. Example “ditch”:
One entry with two word classes:
ditch /dɪtʃ/
° noun a long channel dug at the side of a field or road…
° verb …to get rid of sth/sb because you no longer want or need them…
Two separate entries:
ditch 1 /dɪtʃ/ noun a long channel dug at the side of a field or road…
ditch 2 verb …to get rid of sth/sb because you no longer want or need them…
From: Klotz & Herbst (2016: 41)

3.2 Alphabetical and thematic ordering
Semasiological (by form) vs. onomasiological (by meaning) ordering
Reverse dictionaries:
◦ Initial-alphabetical vs. final-alphabetical ordering (e.g. Backwords Dictionary)
◦ ‘typical’ conceptual dictionary vs. reverse conceptual dictionary → http://reversedictionary.org
Morphologically related lemmas and multi-word lemmas as sub-lemmas → niching and nesting

3.3 Niching and nesting
Lemmas are brought together in one entry
The entry then contains sub-lemma(s)
Nesting: the alphabetical order is not strictly kept
Examples:
◦ Alphabetical order kept (niching): earner … earnest … earnestly … earnestness … earnings …
◦ Alphabetical order broken (nesting): institution … institutional … institutionally … institutionalize … institutionalization … institutionalized …

3.4 Multi-word lemmas
Multi-word lemmas as sub-lemmas in one entry of their component words
Example: throw the baby out with the bathwater
To which entry do such multi-word lemmas belong? What are your arguments?
→ different strategies:
◦ throw, i.e. the first lexical word
◦ baby, i.e. the first noun
◦ bathwater, i.e. the most characteristic word

3.5 Information complementing entries
Sometimes there are independent parts, boxes etc. within or next to entries, such as:
◦ Pictorial illustrations, maps
◦ Statistical charts
◦ Usage boxes
[Figures from: Longman Dictionary of English Language and Culture, 1998; Klotz & Herbst (2016: 58, 67)]

3.5 Further examples
[Figures from OALD 2010]

4. Organization of an entry
Task: Try to write a dictionary entry from the following information on the lemma cycle.
→ use the OHP transparency
The main challenge is to structure it well.
The noun cycle is pronounced /ˈsaɪkl/. It stands for a bicycle or motorcycle (compare also the entries BIKE and BICYCLE). An example of that use is We went for a cycle ride on Sunday. In another sense, cycle stands for a series of events being repeated many times, as in the cycle of the seasons. The word cycle can also be used as a verb. It has the same pronunciation as the noun. As a verb, it is especially used in British English with the sense of riding a bicycle, for example, I usually cycle home through the park.

One solution offered by the OALD (2010):
cycle /ˈsaɪkl/ noun, verb
° noun
1 a bicycle or motorcycle: We went for a cycle ride on Sunday. → see also BIKE, BICYCLE
2 a series of events being repeated many times: the cycle of the seasons.
° verb (especially BrE) to ride a bicycle: I usually cycle home through the park.

4.1 Structural elements
An entry is structured through
◦ syntactic properties (e.g. word class divisions)
◦ sense divisions
◦ structural divisors and general layout

4.2 Textual condensation
Shortening, e.g. no complete sentences, use of “shortcuts” such as abbreviations and symbols
◦ leads to information density
◦ results in less user-friendliness
◦ saves space, but no information is lost

8 types:
◦ Abbreviation of citations
◦ Replacement of text elements by representation (or repetition) symbols, e.g. hyphen or tilde
◦ Replacement of text elements by indication symbols, e.g. vertical stroke, raised dot, stress mark
◦ Use of structure indicators
◦ Use of standardized abbreviations; three main types:
o general language abbreviations, e.g. sb, sth
o abbreviations standardized in linguistic/lexicographic language, e.g. adj., BrE
o abbreviations standardized in the individual dictionary, e.g. T (= transitive verb), U (= uncountable noun)
◦ Summarized presentation of alternative wordings, e.g. brackets, comma, slash
◦ Omission of text elements, e.g. anything which is missing to form a complete sentence
◦ Extrapositioning of text elements

Which types of textual condensation can you find?
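
One of the condensation types listed in 4.2, the replacement of text elements by a repetition symbol, can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function names condense and expand are hypothetical, the tilde is chosen because the slides name it as one possible symbol, and only exact whole-word matches of the headword are replaced (inflected forms such as cycles are left untouched). It does not describe how any particular dictionary is actually produced.

```python
import re

# Assumption: the tilde serves as the repetition symbol for the headword.
TILDE = "~"


def condense(headword: str, text: str) -> str:
    """Replace whole-word occurrences of the headword with the tilde."""
    # \b keeps "cycle" inside "bicycle" or "motorcycle" from being replaced.
    pattern = re.compile(rf"\b{re.escape(headword)}\b")
    return pattern.sub(TILDE, text)


def expand(headword: str, text: str) -> str:
    """Restore the headword; assumes the text contains no other tildes."""
    return text.replace(TILDE, headword)


example = "We went for a cycle ride on Sunday."
short = condense("cycle", example)
print(short)                   # We went for a ~ ride on Sunday.
print(expand("cycle", short))  # We went for a cycle ride on Sunday.
```

Because expand restores the original wording, the sketch reflects the point made above that condensation saves space without losing information, at the cost of asking the user to decode the symbol.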