CS474 Natural Language Processing Semantic Analysis Caveats Introduction to Lexical Semantics
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
Words and Alternative Basic Units for Linguistic Analysis
Words and alternative basic units for linguistic analysis Jens Allwood SCCIIL Interdisciplinary Center, University of Gothenburg A. P. Hendrikse, Department of Linguistics, University of South Africa, Pretoria Elisabeth Ahlsén SCCIIL Interdisciplinary Center, University of Gothenburg Abstract The paper deals with words and possible alternatives to words as basic units in linguistic theory, especially in interlinguistic comparison and corpus linguistics. A number of ways of defining the word are discussed and related to the analysis of linguistic corpora and to interlinguistic comparisons between corpora of spoken interaction. Problems associated with words as the basic units and alternatives to the traditional notion of word as a basis for corpus analysis and linguistic comparisons are presented and discussed. 1. What is a word? To some extent, there is an unclear view of what counts as a linguistic word, generally, and in different language types. This paper is an attempt to examine various construals of the concept “word”, in order to see how “words” might best be made use of as units of linguistic comparison. Using intuition, we might say that a word is a basic linguistic unit that is constituted by a combination of content (meaning) and expression, where the expression can be phonetic, orthographic or gestural (deaf sign language). On closer examination, however, it turns out that the notion “word” can be analyzed and specified in several different ways. Below we will consider the following three main ways of trying to analyze and define what a word is: (i) Analysis and definitions building on observation and supposed easy discovery (ii) Analysis and definitions building on manipulability (iii) Analysis and definitions building on abstraction 2. -
The Generative Lexicon
The Generative Lexicon James Pustejovsky Computer Science Department Brandeis University In this paper, I will discuss four major topics relating to current research in lexical semantics: methodology, descriptive coverage, adequacy of the representation, and the computational usefulness of representations. In addressing these issues, I will discuss what I think are some of the central problems facing the lexical semantics community, and suggest ways of best approaching these issues. Then, I will provide a method for the decomposition of lexical categories and outline a theory of lexical semantics embodying a notion of cocompositionality and type coercion, as well as several levels of semantic description, where the semantic load is spread more evenly throughout the lexicon. I argue that lexical decomposition is possible if it is performed generatively. Rather than assuming a fixed set of primitives, I will assume a fixed number of generative devices that can be seen as constructing semantic expressions. I develop a theory of Qualia Structure, a representation language for lexical items, which renders much lexical ambiguity in the lexicon unnecessary, while still explaining the systematic polysemy that words carry. Finally, I discuss how individual lexical structures can be integrated into the larger lexical knowledge base through a theory of lexical inheritance. This provides us with the necessary principles of global organization for the lexicon, enabling us to fully integrate our natural language lexicon into a conceptual whole. 1. Introduction I believe we have reached an interesting turning point in research, where linguistic studies can be informed by computational tools for lexicology as well as an appreciation of the computational complexity of large lexical databases. -
ON SOME CATEGORIES for DESCRIBING the SEMOLEXEMIC STRUCTURE by Yoshihiko Ikegami
ON SOME CATEGORIES FOR DESCRIBING THE SEMOLEXEMIC STRUCTURE by Yoshihiko Ikegami 1. A lexeme is the minimum unit that carries meaning. Thus a lexeme can be a "word" as well as an affix (i.e., something smaller than a word) or an idiom (i.e., something larger than a word). 2. A sememe is a unit of meaning that can be realized as a single lexeme. It is defined as a structure constituted by those features having distinctive functions (i.e., serving to distinguish the sememe in question from other sememes that contrast with it). A question that arises at this point is whether or not one lexeme always corresponds to just one sememe and no more. Three theoretical positions are foreseeable: (1) one which holds that one lexeme always corresponds to just one sememe and no more, (2) one which holds that one lexeme corresponds to an indefinitely large number of sememes, and (3) one which holds that one lexeme corresponds to a certain limited number of sememes. These three positions will be referred to as (1) the "Grundbedeutung" theory, (2) the "use" theory, and (3) the "polysemy" theory, respectively. The Grundbedeutung theory, however attractive in itself, is to be rejected as unrealistic. Suppose a preliminary analysis has revealed that a lexeme seems to be used sometimes in an "abstract" sense and sometimes in a "concrete" sense. In order to posit a Grundbedeutung under such circumstances, it is to be assumed that there is a still higher level at which "abstract" and "concrete" are neutralized; this is certainly a theoretical possibility, but it seems highly unlikely and unrealistic from a psychological point of view. -
THE LEXICON: a SYSTEM of MATRICES of LEXICAL UNITS and THEIR PROPERTIES ~ Harry H
THE LEXICON: A SYSTEM OF MATRICES OF LEXICAL UNITS AND THEIR PROPERTIES* Harry H. Josselson - Uriel Weinreich /1/, in discussing the fact that at one time many American scholars relied on either the discipline of psychology or sociology for the resolution of semantic problems, comments: In Soviet lexicology, it seems, neither the traditionalists, who have been content to work with the categories of classical rhetoric and 19th-century historical semantics, nor the critical lexicologists in search of better conceptual tools, have ever found reason to doubt that linguistics alone is centrally responsible for the investigation of the vocabulary of languages. /2/ This paper deals with a certain conceptual tool, the matrix, which linguists can use for organizing a lexicon to insure that words will be described (coded) with consistency, that is, to insure that questions which have been asked about certain words will be asked for all words in the same class, regardless of the fact that they may be more difficult to answer for some than for others. The paper will also discuss certain new categories, beyond those of classical rhetoric, which have been introduced into lexicology. 1. INTRODUCTION The research in automatic translation brought about by the introduction of computers into the technology has *The research described herein has been supported by the Information Systems Branch of the Office of Naval Research. The present work is an amplification of a paper, "The Lexicon: A Matrix of Lexemes and Their Properties", contributed to the Conference on Mathematical Linguistics at Budapest-Balatonszabadi, September 6-10, 1968. -
Social & Behavioural Sciences SCTCMG 2019 International
The European Proceedings of Social & Behavioural Sciences EpSBS ISSN: 2357-1330 https://doi.org/10.15405/epsbs.2019.12.04.33 SCTCMG 2019 International Scientific Conference «Social and Cultural Transformations in the Context of Modern Globalism» INTERLEXICOGRAPHY IN TEACHING RUSSIAN AS A SECOND LANGUAGE Elena Baryshnikova (a)*, Dmitrii Kazhuro (b) *Corresponding author (a) Peoples’ Friendship University of Russia, 6, Miklukho-Maklaya str., Moscow, Russia, [email protected], +7 (968) 812-64-61 (b) Peoples’ Friendship University of Russia, 6, Miklukho-Maklaya str., Moscow, Russia, [email protected], +7 (977) 946-74-63 Abstract The paper is concerned with the use of dictionaries of international words by foreign students studying the Russian language. It considers the phenomenon of defining international vocabulary in a dictionary in the context of a scientific field. The authors outline the growth of cross-cultural and cross-language contacts in the history of human civilization. They provide the definition of interlexicography as the method to carry out interlexicological study of international words in various linguistic systems. The paper analyses the issues related to the choice of an illustrative part in the interlexicographic source with regard to qualifying a lexeme as international. There is a background of interlexicographic development based on bi- and multilingual dictionaries of some national languages. The authors analyze the target and goal focus of existing dictionaries of international words, the principles of making up a glossary, its organization and representation in dictionary entries. 
Through descriptive and comparative methods, along with methodological parameters of linguodidactic representation of the material in line with the educational lexicography, the paper identifies some linguistic and methodological advantages and disadvantages of the existing internationalism dictionaries and their linguodidactic value for teaching Russian as a second language to international students. -
The Art of Lexicography - Niladri Sekhar Dash
LINGUISTICS - The Art of Lexicography - Niladri Sekhar Dash THE ART OF LEXICOGRAPHY Niladri Sekhar Dash Linguistic Research Unit, Indian Statistical Institute, Kolkata, India Keywords: Lexicology, linguistics, grammar, encyclopedia, normative, reference, history, etymology, learner’s dictionary, electronic dictionary, planning, data collection, lexical extraction, lexical item, lexical selection, typology, headword, spelling, pronunciation, etymology, morphology, meaning, illustration, example, citation Contents 1. Introduction 2. Definition 3. The History of Lexicography 4. Lexicography and Allied Fields 4.1. Lexicology and Lexicography 4.2. Linguistics and Lexicography 4.3. Grammar and Lexicography 4.4. Encyclopedia and Lexicography 5. Typological Classification of Dictionary 5.1. General Dictionary 5.2. Normative Dictionary 5.3. Referential or Descriptive Dictionary 5.4. Historical Dictionary 5.5. Etymological Dictionary 5.6. Dictionary of Loanwords 5.7. Encyclopedic Dictionary 5.8. Learner's Dictionary 5.9. Monolingual Dictionary 5.10. Special Dictionaries 6. Electronic Dictionary 7. Tasks for Dictionary Making 7.1. Planning 7.2. Data Collection 7.3. Extraction of Lexical Items 7.4. Selection of Lexical Items 7.5. Mode of Lexical Selection 8. Dictionary Making: General Dictionary 8.1. Headwords 8.2. Spelling 8.3. Pronunciation 8.4. Etymology 8.5. Morphology and Grammar 8.6. Meaning 8.7. Illustrative Examples and Citations 9. Conclusion Acknowledgements Glossary Bibliography Biographical Sketch ©Encyclopedia of Life Support Systems (EOLSS) Summary The art of dictionary making is as old as the field of linguistics. People started to cultivate this field from the very early age of our civilization, probably seven to eight hundred years before the Christian era. -
COMMON and DIFFERENT ASPECTS in a SET of COMMENTARIES in DICTIONARIES and SEMANTIC TAGS Akhmedova Dildora Bakhodirovna a Teacher
International Scientific Forum on language, literature, translation, literary criticism: international scientific-practical conference on modern approaches and perspectives. Web: https://iejrd.com/ COMMON AND DIFFERENT ASPECTS IN A SET OF COMMENTARIES IN DICTIONARIES AND SEMANTIC TAGS Akhmedova Dildora Bakhodirovna A teacher, BSU Hamidova Iroda Olimovna A student, BSU Annotation: This article discusses dictionaries that can be a source of information for the language corpus, the structure of the dictionary commentary, and the possibilities of this information in tagging corpus units. Key words: corpus linguistics, lexicography, lexical units, semantic tagging, annotated dictionary, general vocabulary, limited vocabulary, illustrative example. I. Introduction Corpus linguistics is closely related to lexicography because the linguistic units in a dictionary, together with their interpretations, serve as a linguo-lexicographic resource for tagging corpus units. Digitization of the text of existing dictionaries in the Uzbek language is the first step in linking the dictionary with the corpus. Uzbek lexicography has come a long way and has a rich treasure that serves as material for the corpus. According to Professor E. Begmatov, systematicity in the lexicon is not as obvious as in other levels of language. Lexical units are more numerous than phonemes and morphemes, and have the property of periodic instability. Therefore, it is not possible to identify and study the lexicon on a scale. Today, world linguistics has begun to solve this problem with the help of language corpora and has already achieved results. The possibilities for qualitative and quantitative inventory of vocabulary, and for comprehensive research, have expanded. The main source for semantic tagging of language units is the explanatory dictionary of the Uzbek language. -
The Combinatory Morphemic Lexicon
The Combinatory Morphemic Lexicon Cem Bozsahin∗ Middle East Technical University Grammars that expect words from the lexicon may be at odds with the transparent projection of syntactic and semantic scope relations of smaller units. We propose a morphosyntactic framework based on Combinatory Categorial Grammar that provides flexible constituency, flexible category consistency, and lexical projection of morphosyntactic properties and attachment to grammar in order to establish a morphemic grammar-lexicon. These mechanisms provide enough expressive power in the lexicon to formulate semantically transparent specifications without the necessity to confine structure forming to words and phrases. For instance, bound morphemes as lexical items can have phrasal scope or word scope, independent of their attachment characteristics but consistent with their semantics. The controls can be attuned in the lexicon to language-particular properties. The result is a transparent interface of inflectional morphology, syntax, and semantics. We present a computational system and show the application of the framework to English and Turkish. 1. Introduction The study presented in this article is concerned with the integrated representation and processing of inflectional morphology, syntax, and semantics in a unified grammar architecture. An important issue in such integration is mismatches in morphological, syntactic, and semantic bracketings. The problem was first noted in derivational morphology. Williams (1981) provided examples from English; the semantic bracketings in (1a–2a) are in conflict with the morphological bracketings in (1b–2b). (1) a. -ity b. hydro hydro electric electric -ity (2) a. -ing b. Gödel Gödel number number -ing If the problem were confined to derivational morphology, we could avoid it by making derivational morphology part of the lexicon that does not interact with grammar. -
Changing the Rules: Why the Monolingual Learner's Dictionary Should Move Away from the Native-Speaker Tradition
Changing the rules: Why the monolingual learner's dictionary should move away from the native-speaker tradition Michael Rundell This paper starts from a recognition that the reference needs of people learning English are not adequately met by existing monolingual learner's dictionaries (MLDs). Either the dictionaries themselves are deficient, or their target users have not yet learned how to use them effectively: whichever view one takes (and the truth is probably somewhere in between) it is difficult to escape the conclusion that the MLD's full potential as a language-learning resource has not yet been realized. This is recognized eg by Béjoint 1981: 219: "Monolingual dictionaries are not used as fully as they should be ... many students are not even aware of the riches that their monolingual dictionaries contain", a view which has been consistently borne out by any user-research we have conducted at Longman. There is a variety of responses to this situation. Compilers of MLDs may feel a certain exasperation with the 'ungrateful' students who fail to recognize the very real progress that has been made in adapting conventional dictionaries to their special needs. More positively, a growing awareness on the part of teachers of the importance of developing students' reference skills (eg Béjoint 1981: 220, Rossner 1985: 99 f., Whitcut 1986: 111) is complemented by a clear commitment on the part of dictionary publishers to make their products as accessible and user-friendly as possible. These two developments seem to offer the beguiling prospect of a scenario in which ever more helpful MLDs are ever more skilfully exploited by their users, thus resolving, to everyone's satisfaction, the problem identified at the beginning of this paper. -
Part 4: Lexical Semantics 2
Natural Language Processing Part 4: lexical semantics 2 Marco Maggini Tecnologie per l'elaborazione del linguaggio Lexical semantics • A lexicon generally has a highly structured form ▫ It stores the meanings and uses of each word ▫ It encodes the relations between words and meanings • A lexeme is the minimal unit represented in the lexicon ▫ It pairs a stem (the orthographic/phonological form chosen to represent words) with a symbolic form for meaning representation (sense) • A dictionary is a kind of lexicon where meanings are expressed through definitions and examples son noun a boy or man in relation to either or both of his parents. • a male offspring of an animal. • a male descendant : the sons of Adam. • ( the Son) (in Christian belief) the second person of the Trinity; Christ. • a man considered in relation to his native country or area : one of Nevada's most famous sons. • a man regarded as the product of a particular person, influence, or environment : sons of the French Revolution. • (also my son) used by an elder person as a form of address for a boy or young man : “You're on private land, son.” Lexicons & dictionaries • Definitions in dictionaries exploit words and they may be circular (a word definition uses words whose definitions exploit that word) right adj. 1. of, relating to, situated on, or being the side of the body which is away from the side on which the heart is mostly located 2. located nearer to the right hand than to the left 3. -
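The lexeme and circular-definition ideas in the excerpt above can be illustrated with a minimal sketch. It assumes a toy in-memory lexicon; the names `Lexeme`, `definition_words`, and `circular_pairs` are hypothetical, and the definition given for "left" is invented for illustration and does not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """Pairs a stem with one or more senses (here: plain-text definitions)."""
    stem: str
    senses: list[str] = field(default_factory=list)


def definition_words(lexeme: Lexeme) -> set[str]:
    """Collect the lower-cased words used in all definitions of a lexeme."""
    words = set()
    for sense in lexeme.senses:
        for token in sense.lower().split():
            words.add(token.strip(".,;:()"))
    return words


def circular_pairs(lexicon: dict[str, Lexeme]) -> set[tuple[str, str]]:
    """Find pairs of stems whose definitions mention each other."""
    pairs = set()
    for stem, lexeme in lexicon.items():
        for other in definition_words(lexeme):
            if other == stem or other not in lexicon:
                continue
            if stem in definition_words(lexicon[other]):
                pairs.add(tuple(sorted((stem, other))))
    return pairs


# Toy lexicon: the "left" definition is an invented example (assumption).
toy = {
    "right": Lexeme("right", ["located nearer to the right hand than to the left"]),
    "left": Lexeme("left", ["located on the side opposite to the right"]),
    "son": Lexeme("son", ["a boy or man in relation to either or both of his parents"]),
}

print(circular_pairs(toy))
```

This only detects direct two-word cycles; longer chains of mutually dependent definitions would need a graph traversal over the same data.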
The Project, Its Rationale and Processes, Includes a Description of
ED 160 099. AUTHOR Chenhall, Robert. TITLE The Onomastic Octopus. Report No. 3. INSTITUTION California State Univ., Fullerton. SPONS AGENCY National Endowment for PUB DATE NOTE 1 p-. EDRS PRICE MF-$0. HC-$1. Plus Postage DESCRIPTORS Activities; *Cataloging; Classification; Diagrams; *Information Needs; *Museums; Projects; *Subject Index Terms; *Systems Analysis; Thesauri ABSTRACT Activities and information in museums and a project undertaken by the Margaret Woodbury Strong Museum to develop systematic solutions to problems in cataloging museum collections are described. Museum activities are grouped in three categories: (1) initial: acquisition, accession, registration, identification, and restoration; (2) ongoing: research, exhibition, and conservation efforts; and (3) terminal: de-accessioning. The ongoing are the most important of the three types of activities. In order to carry out these activities efficiently, all the information required for each defined activity must be available at the right time and place; however, at the present time there are no systems of nomenclature that are generally accepted as a basis for defining and identifying man-made artifacts. To meet this need, this project focused on the creation of a thesaurus for use at the Strong Museum. This standardized word list, or lexicon, includes words and phrases, and shows synonymous and hierarchical word relationships and dependencies. While it is not a panacea for all museum cataloging problems, it can serve as a common starting point in identification activities which are crucial to the cataloging process. A summary of the project, its rationale and processes, includes a description of the lexicon with examples. (BIM) Reproductions supplied by EDRS are the best that can be made from the original document. -
Corpus Linguistic Analysis of Fear-Factor Lexemes of Selected Online Newspaper Headlines on Coronavirus Pandemic
www.idosr.org Anyanwu and Udoh ©IDOSR PUBLICATIONS International Digital Organization for Scientific Research. ISSN: 2550-7958. IDOSR JOURNAL OF COMMUNICATION AND ENGLISH 5(1) 76-83, 2020. Corpus Linguistic Analysis of Fear-Factor Lexemes of Selected Online Newspaper Headlines on Coronavirus Pandemic Esther Chikaodi Anyanwu and Victoria Chinwe Udoh Department of English Language and Literature, Nnamdi Azikiwe University, Awka. Nigeria ABSTRACT As the world expanded, so too did the spread of diseases and their vocabulary. The spread of the corona virus disease has altered the lives of billions of people, and has equally ushered in a new vocabulary to the general populace encompassing specialist terms from the fields of epidemiology and medicine. The aim of the study was to identify those lexemes that relate to corona virus (Covid-19) and which have dominated the language discourse of Nigerians. It has been observed that most of such lexemes project fear and death of human and economic resources. The researcher randomly collected data from online Nigerian newspaper headlines that were centred on corona virus related issues, and analyzed such data using the theoretical framework of corpus linguistics. The researcher identified those linguistic terms which are associated with Covid-19 as they have overwhelmingly dominated global discourse. The study revealed that so many lexemes that relate to Covid-19 project fear, disaster, confusion, restlessness, panic and death. The researcher was able to spot new words and senses associated with the pandemic and assessed the frequency of their usage. She concluded by reiterating that the lexical wealth of every language is not limited, but rather, there are lots of various lexical units created by different professions and epidemiology and medicine are not exceptions.