Chinese LIWC Lexicon Expansion Via Hierarchical Classification of Word

Total Page:16

File Type:pdf, Size:1020Kb

Chinese LIWC Lexicon Expansion Via Hierarchical Classification of Word The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18) Chinese LIWC Lexicon Expansion via Hierarchical Classification of Word Embeddings with Sememe Attention Xiangkai Zeng,∗‡ Cheng Yang,§ Cunchao Tu,§ Zhiyuan Liu,†§¶ Maosong Sun§¶ ‡School of Computer Science and Engineering, Beihang University, Beijing, China §Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Tsinghua University, Beijing, China ¶Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University, Xuzhou, China Abstract lyze Chinese text. Fortunately, Chinese LIWC (Huang et al. Linguistic Inquiry and Word Count (LIWC) is a word count- 2012) has been released to fill the vacancy. In this paper, we ing software tool which has been used for quantitative text mainly focus on Chinese LIWC and use LIWC to stand for analysis in many fields. Due to its success and popularity, Chinese LIWC if not otherwise specified. the core lexicon has been translated into Chinese and many While LIWC has been used in a variety of fields, its lexi- other languages. However, the lexicon only contains several con only contains less than 7,000 words. This is insufficient thousand of words, which is deficient compared with the because according to (Li and others 2008), there are at least number of common words in Chinese. Current approaches 56,008 common words in Chinese. Moreover, LIWC lexi- often require manually expanding the lexicon, but it often con does not consider the emerging words and phrases on takes too much time and requires linguistic experts to ex- the Internet. Therefore, it is reasonable and necessary to ex- tend the lexicon. To address this issue, we propose to ex- pand the LIWC lexicon so that it is more accurate and com- pand the LIWC lexicon automatically. Specifically, we con- prehensive for scientific research. One way to expand LIWC sider it as a hierarchical classification problem and utilize lexicon is to annotate the new words manually. However, it the Sequence-to-Sequence model to classify words in the is too time-consuming and often requires language expertise lexicon. Moreover, we use the sememe information with to add new words. Hence, we propose to automatically ex- the attention mechanism to capture the exact meanings of pand LIWC lexicon. To the best of our knowledge, we are a word, so that we can expand a more precise and com- the first to expand LIWC lexicon automatically. prehensive lexicon. The experimental results show that our Automatically LIWC lexicon expansion faces the prob- model has a better understanding of word meanings with lems of polysemy and indistinctness. Polysemy means the help of sememes and achieves significant and consis- words and phrases have multiple meanings and are thereby tent improvements compared with the state-of-the-art meth- classified into multiple irrelevant categories. Indistinctness ods. The source code of this paper can be obtained from means many categories in LIWC are fine-grained, and thus https://github.com/thunlp/Auto CLIWC. making it more difficult to distinguish them. One important feature in LIWC lexicon is that categories Introduction form a tree structure hierarchy. Therefore, hierarchical clas- sification algorithms such as Hierarchical SVM (Support Linguistic Inquiry and Word Count (Pennebaker, Booth, and Vector Machine) (Chen, Crawford, and Ghosh 2004), can be Francis 2007, LIWC) has been widely used for comput- easily applied to LIWC lexicon expansion. However, these erized text analysis in social science. LIWC groups words methods are often too generic, without considering the pol- into manually-defined coarse-to-fine grained categories, and ysemy property of words and LIWC lexicon and indistinct- it was originally developed to address content analytic is- ness property of LIWC categories. sues in experimental psychology. Nowadays, there is an in- To address these issues, we propose to incorporate exter- creasing number of applications across fields such as com- nal annotated word information when expanding the lexicon. putational linguistics (Grimmer and Stewart 2013), demo- In linguistics, each word has one or more senses, and each graphics (Newman et al. 2008), health diagnostics (Bucci sense consists of one or more sememes, which are defined as and Maskit 2005), social relationship (Kacewicz et al. 2014), the smallest semantic language unit of meaning (Bloomfield etc. 1926). In this paper, we use HowNet (Dong and Dong 2003) Chinese is the most spoken language in the world (Lewis where words are annotated with relevant sememes. For pol- et al. 2009), but we cannot use the original LIWC to ana- ysemy problem, sememes can explicitly express different ∗This work was done when the first author was visiting Ts- meanings of a word and make it possible to assign multi- inghua University. ple labels. For indistinctness problem, sememe is also help- †Corresponding author: Zhiyuan Liu ([email protected]). ful in differentiating fine-grained categories since we have Copyright c 2018, Association for the Advancement of Artificial more detailed semantic information about the word mean- Intelligence (www.aaai.org). All rights reserved. ings. Moreover, sememes should be given different weights 5650 when classifying words into different categories. Thus, we annotation form a tree hierarchy, we can adopt hierarchi- propose to use attention mechanism to better utilize sememe cal classification methods for automatically LIWC expan- information. The experimental results show that our model sion. As the first effort on LIWC lexicon expansion, we be- achieves significant improvements as compared to the state- lieve that the works in hierarchical classification problems of-the-art methods. are more correlated to LIWC expansion and more suitable The contributions of this paper can be summarized as fol- for baseline comparisons. lows: For hierarchical classification, (Silla Jr and Freitas 2011) summarized a variety of them from different fields and cat- • To the best of our knowledge, our model is the first at- egorized them into five approaches. Flat Classifier (Barbedo tempt to expand LIWC lexicon automatically. and Lopes 2007) is the simplest one to deal with hierarchical • To address the polysemy and indistinctness problems, we classification problems. In this approach, the classifier com- propose to use sememe information to distinguish mean- pletely ignores the hierarchy and predicts only classes at the ings among words. leaf nodes. Local Classifier Per Node (Fagni and Sebastiani 2007) consists of training one binary classifier for class. Lo- • To better utilize sememe information, we use attention cal Classifier Per Parent Node (Silla and Freitas 2009) is the mechanism to assign different weights to sememes when approach where, for each parent node in the class hierar- classifying each word. chy, a multiclass classifier is trained to distinguish among • In the experiments, we show that our model outperforms its child nodes. Similar to Local Classifier Per Parent Node the state-of-the-art methods significantly. approach, Local Classifier Per Level (Clare and King 2003) trains one multiclass classifier, but at each level instead of each node. The last one is Global Classifier (Kiritchenko et Related Work al. 2006), where a single classification model is trained to In this section, we first introduce some previous works based take into account the hierarchy as a whole. on LIWC and then describe recent research in the hierarchi- In recent years, there have been some attempts to use neu- cal classification. Lastly, we briefly mention studies about ral networks for hierarchical classification. (Cerri, Barros, HowNet. and De Carvalho 2014) trained the multilayer perceptron The original English version of LIWC lexicon is one of level by level, and used the predictions of the neural network the most well-known dictionaries in quantitative text anal- as inputs for the next neural network associated with the next ysis. It was first released in the early 1990s and has been hierarchical level. (Karn, Waltinger, and Schutze¨ 2017) pro- updated several times, with the latest version released in posed to use RNN encoder-decoder for entity mention clas- 2015 (Pennebaker et al. 2015). Over these years, there have sification, which is a problem with a hierarchical class struc- emerged numerous scientific findings across different fields ture. The encoder-decoder performs classification by gen- with the help of the original lexicon. (Mehl et al. 2007) erating paths in the hierarchy from top node to leaf nodes. found that women and men both spoke about 16,000 words However, due to the polysemy and indistinctness problems, per day, dispelling the myth of female talkativeness. (Bucci these methods are not suitable for LIWC expansion. There- and Maskit 2007) showed that word counting approach tends fore, we propose to incorporate sememe information. to be less biased than clinician’s self-reports in therapeutic During these years, people manually annotate sememes improvements. (Rohrbaugh et al. 2008) found that the use of and build many linguistic knowledge bases. HowNet (Dong we indicated relationship closeness and was even predictive and Dong 2003) is a typical knowledge base with annotated of heart failure improvements. (Schwartz et al. 2013) ana- sememes, and it has far more words than LIWC. Therefore, lyzed personality, gender, and age in social media using an we can use it directly to help expand the LIWC lexicon. open-vocabulary approach. Due to the success and the im- HowNet has also been widely used in sentiment analysis portance of the original LIWC, (Huang et al. 2012) manually (Fu et al. 2013) and word representation learning (Niu et developed the first Chinese LIWC, and there is an increas- al. 2017). ing number of applications (Gao et al. 2013; Yu et al. 2016; Li et al. 2014) based on it. However, as a manually anno- Problem Formalization tated dictionary, this lexicon only contains less than 7,000 In this section, we first give an illustrative example of words words, which is a serious limitation compared to the number and categories in LIWC lexicon.
Recommended publications
  • Words and Alternative Basic Units for Linguistic Analysis
    Words and alternative basic units for linguistic analysis 1 Words and alternative basic units for linguistic analysis Jens Allwood SCCIIL Interdisciplinary Center, University of Gothenburg A. P. Hendrikse, Department of Linguistics, University of South Africa, Pretoria Elisabeth Ahlsén SCCIIL Interdisciplinary Center, University of Gothenburg Abstract The paper deals with words and possible alternative to words as basic units in linguistic theory, especially in interlinguistic comparison and corpus linguistics. A number of ways of defining the word are discussed and related to the analysis of linguistic corpora and to interlinguistic comparisons between corpora of spoken interaction. Problems associated with words as the basic units and alternatives to the traditional notion of word as a basis for corpus analysis and linguistic comparisons are presented and discussed. 1. What is a word? To some extent, there is an unclear view of what counts as a linguistic word, generally, and in different language types. This paper is an attempt to examine various construals of the concept “word”, in order to see how “words” might best be made use of as units of linguistic comparison. Using intuition, we might say that a word is a basic linguistic unit that is constituted by a combination of content (meaning) and expression, where the expression can be phonetic, orthographic or gestural (deaf sign language). On closer examination, however, it turns out that the notion “word” can be analyzed and specified in several different ways. Below we will consider the following three main ways of trying to analyze and define what a word is: (i) Analysis and definitions building on observation and supposed easy discovery (ii) Analysis and definitions building on manipulability (iii) Analysis and definitions building on abstraction 2.
    [Show full text]
  • ON SOME CATEGORIES for DESCRIBING the SEMOLEXEMIC STRUCTURE by Yoshihiko Ikegami
    ON SOME CATEGORIES FOR DESCRIBING THE SEMOLEXEMIC STRUCTURE by Yoshihiko Ikegami 1. A lexeme is the minimum unit that carries meaning. Thus a lexeme can be a "word" as well as an affix (i.e., something smaller than a word) or an idiom (i.e,, something larger than a word). 2. A sememe is a unit of meaning that can be realized as a single lexeme. It is defined as a structure constituted by those features having distinctive functions (i.e., serving to distinguish the sememe in question from other semernes that contrast with it).' A question that arises at this point is whether or not one lexeme always corresponds to just one serneme and no more. Three theoretical positions are foreseeable: (I) one which holds that one lexeme always corresponds to just one sememe and no more, (2) one which holds that one lexeme corresponds to an indefinitely large number of sememes, and (3) one which holds that one lexeme corresponds to a certain limited number of sememes. These three positions wiIl be referred to as (1) the "Grundbedeutung" theory, (2) the "use" theory, and (3) the "polysemy" theory, respectively. The Grundbedeutung theory, however attractive in itself, is to be rejected as unrealistic. Suppose a preliminary analysis has revealed that a lexeme seems to be used sometimes in an "abstract" sense and sometimes in a "concrete" sense. In order to posit a Grundbedeutung under such circumstances, it is to be assumed that there is a still higher level at which "abstract" and "concrete" are neutralized-this is certainly a theoretical possibility, but it seems highly unlikely and unrealistic from a psychological point of view.
    [Show full text]
  • Download Article
    Advances in Social Science, Education and Humanities Research (ASSEHR), volume 312 International Conference "Topical Problems of Philology and Didactics: Interdisciplinary Approach in Humanities and Social Sciences" (TPHD 2018) Methods of Identifying Members of Synonymic Row Juliya A. Litvinova Elena A. Maklakova Chair of Foreign Languages Chair of Foreign Languages Federal State Budget Educational Institution of Higher Federal State Budget Educational Institution of Higher Education Voronezh State University of Forestry and Education Voronezh State University of Forestry and Technologies named after G.F. Morozov Technologies named after G.F. Morozov Voronezh, Russia Voronezh, Russia [email protected] Affiliation): dept. name of organization Abstract— This article is devoted to identifying the criteria of analysis, method of field modeling, method of semantic synonymity of lexical items. The existence of different definitions interpretation, method of generalization of dictionary of synonymy, the selection criteria of items in the synonymic row definitions, methods of quantitative, lexicographical, indicate the insufficient study and incoherence of this contextual, psycholinguistic analysis. Data for the study were phenomenon in linguistics. The study of the semantics of lexical 2 lexical items in the Russian language (gorodok, gorodishko) items allows explaining the most accurately and authentically the integration and differentiation of lexical items close in meaning. obtained from the Russian Explanatory Dictionaries (V. I. The description of the meaning structure (sememe) is possible Dahl, D. N. Ushakov, S. I. Ozhegov, A. P. Evgenieva, S. A. through the description of its seme composition. The methods of Kuznetsov, T. F. Efremova), Russian National Corpus seme semasiology (lexicographic, psycholinguistic, contextual) (ruscorpora.ru). allow revealing various components in the sememe structure.
    [Show full text]
  • Verbs of 'Preparing Something for Eating by Heating It in a Particular
    DEPARTAMENTO DE FILOLOGÍA INGLESA Y ALEMANA Verbs of ‘preparing something for eating by heating it in a particular way’: a lexicological analysis Grado en Estudios Ingleses Fabián García Díaz Tutora: Mª del Carmen Fumero Pérez San Cristóbal de La Laguna, Tenerife 8 de septiembre de 2015 INDEX 1. Abstract ................................................................................................................................. 3 2. Introduction .......................................................................................................................... 4 3. Theoretical perspective ........................................................................................................ 6 4. Analysis: verbs of to prepare something for eating by heating it in a particular way: cook, fry and roast. ................................................................................................................... 9 4.1. Corpus selection .............................................................................................................. 9 4.2. Verb selection ................................................................................................................ 11 5. Paradigmatic relations ....................................................................................................... 13 5.1. Semantic components and lexematic analysis ............................................................... 13 5.2. Lexical relations ...........................................................................................................
    [Show full text]
  • Improved Word Representation Learning with Sememes
    Improved Word Representation Learning with Sememes Yilin Niu1∗, Ruobing Xie1,∗ Zhiyuan Liu1;2 ,y Maosong Sun1;2 1 Department of Computer Science and Technology, State Key Lab on Intelligent Technology and Systems, National Lab for Information Science and Technology, Tsinghua University, Beijing, China 2 Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University, Xuzhou 221009 China Abstract for each word. Hence, people manually annotate word sememes and build linguistic common-sense Sememes are minimum semantic units of knowledge bases. word meanings, and the meaning of each HowNet (Dong and Dong, 2003) is one of such word sense is typically composed by sev- knowledge bases, which annotates each concep- eral sememes. Since sememes are not ex- t in Chinese with one or more relevant sememes. plicit for each word, people manually an- Different from WordNet (Miller, 1995), the phi- notate word sememes and form linguistic losophy of HowNet emphasizes the significance of common-sense knowledge bases. In this part and attribute represented by sememes. paper, we present that, word sememe in- HowNet has been widely utilized in word similar- formation can improve word representa- ity computation (Liu and Li, 2002) and sentiment tion learning (WRL), which maps word- analysis (Xianghua et al., 2013), and in section 3.2 s into a low-dimensional semantic space we will give a detailed introduction to sememes, and serves as a fundamental step for many senses and words in HowNet. NLP tasks. The key idea is to utilize In this paper, we aim to incorporate word se- word sememes to capture exact meanings memes into word representation learning (WRL) of a word within specific contexts accu- and learn improved word embeddings in a low- rately.
    [Show full text]
  • Towards a Deeper Understanding of Meaning in Language
    Valentina Gavranović www.singidunum.ac.rs Valentina Gavranović TOWARDS A TOWARDS A DEEPER DEEPER UNDERSTANDING UNDERSTANDING OF MEANING OF MEANING IN LANGUAGE AIN COURSEBOOK LANGUAGE IN ENGLISH SEMANTICS A COURSEBOOK IN ENGLISH SEMANTICS is coursebook has been written for undergraduate students majoring in English, with the aim to introduce them with the main concepts found in the domain of semantics, and help them understand the complexities and dierent aspects of meaning in language. Although the book is primarily intended for students who have no previous theoretical knowledge in semantics, it can also be used by those who want to consolidate and expand on their existing learning experience in linguistics and semantics. e scope of concepts and topics covered in this book have been framed within a rounded context which oers the basics in semantics, and provides references and incentives for students’ further autonomous investigation and research. e author endeavoured to present complex linguistic issues in such a way that any language learner with no previous knowledge in this area can use the book. However, anyone who uses this book needs to read it very carefully so that they could understand the theory and apply it while tackling the examples illustrating the concepts provided in this book. Furthermore, the very nature of linguistics demands a thorough and careful reading, rereading and questioning. e book is composed of six chapters, and, within them, there are several sections and subsections. Each chapter contains theoretical explanations followed by a set of questions and exercises whose aim is to help the learners revise the theoretical concepts and apply them in practice.
    [Show full text]
  • Towards Building a Multilingual Sememe Knowledge Base
    Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets Fanchao Qi1∗, Liang Chang2∗y, Maosong Sun13z, Sicong Ouyang2y, Zhiyuan Liu1 1Department of Computer Science and Technology, Tsinghua University Institute for Artificial Intelligence, Tsinghua University Beijing National Research Center for Information Science and Technology 2Beijing University of Posts and Telecommunications 3Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University [email protected], [email protected] fsms, [email protected], [email protected] Abstract word husband A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain sense "married man" "carefully use" words annotated with sememes, have been successfully ap- plied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread human economize utilization. To address the issue, we propose to build a uni- sememe fied sememe KB for multiple languages based on BabelNet, a family male spouse multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB. It man- ually annotates sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of auto- Figure 1: Sememe annotation of the word “husband” in matic sememe prediction for synsets, aiming to expand the HowNet. seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative anal- sememes to annotate senses of over 100 thousand Chinese yses to explore important factors and difficulties in the task.
    [Show full text]
  • Towards Building a Multilingual Sememe Knowledge Base
    The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets Fanchao Qi,1∗ Liang Chang,2∗† Maosong Sun,1,3‡ Sicong Ouyang,2† Zhiyuan Liu1 1Department of Computer Science and Technology, Tsinghua University Institute for Artificial Intelligence, Tsinghua University Beijing National Research Center for Information Science and Technology 2Beijing University of Posts and Telecommunications 3Jiangsu Collaborative Innovation Center for Language Ability, Jiangsu Normal University [email protected], [email protected] {sms, liuzy}@tsinghua.edu.cn, [email protected] Abstract word husband A sememe is defined as the minimum semantic unit of human languages. Sememe knowledge bases (KBs), which contain sense "married man" "carefully use" words annotated with sememes, have been successfully ap- plied to many NLP tasks. However, existing sememe KBs are built on only a few languages, which hinders their widespread human economize utilization. To address the issue, we propose to build a uni- sememe fied sememe KB for multiple languages based on BabelNet, a family male spouse multilingual encyclopedic dictionary. We first build a dataset serving as the seed of the multilingual sememe KB. It man- ually annotates sememes for over 15 thousand synsets (the entries of BabelNet). Then, we present a novel task of auto- Figure 1: Sememe annotation of the word “husband” in matic sememe prediction for synsets, aiming to expand the HowNet. seed dataset into a usable KB. We also propose two simple and effective models, which exploit different information of synsets. Finally, we conduct quantitative and qualitative anal- sememes to annotate senses of over 100 thousand Chinese yses to explore important factors and difficulties in the task.
    [Show full text]
  • 10. Semic Analysis Summary
    Louis Hébert, Tools for Text and Image Analysis: An Introduction to Applied Semiotics 10. SEMIC ANALYSIS SUMMARY Semic analysis was developed within the field of semantics (the study of meaning in linguistic units). This chapter presents Rastier's interpretive semantics, and semic analysis as formulated therein. Semic analysis is performed on a semiotic act – a text, for example – by identifying the semes, that is, the elements of meaning (parts of the signified), defining clusters of semes (isotopies and molecules) and determining the relations between the clusters (relations of presupposition, comparison, etc., between the isotopies). The repetition of a seme creates an isotopy. For example, in "There was a fine ship, carved from solid gold / With azure reaching masts, on seas unknown" (Émile Nelligan, "The Golden Ship"), the words "ship", "masts" and "seas" all contain the seme /navigation/ (as well as others). The repetition of this seme creates the isotopy /navigation/. A semic molecule is a cluster of at least two semes appearing together more than once in a single semantic unit (a single word, for instance). For instance, in the poem just quoted, there is a semic molecule formed by the semes /precious/ + /dispersion/. It appears in the "streaming hair" ("cheveux épars") of the beautiful Goddess of Love (Venus) who was "spread-eagled" ("s'étalait") at the ship's prow, and also in the sinking of the ship's treasure, which the sailors "disputed amongst themselves" ("entre eux ont disputés"). 1. THEORY Semic analysis is performed on a semiotic act – such as a text – by identifying the semes (the elements of meaning), finding clusters of semes (isotopies and molecules) and determining the relations between the clusters (relations of presupposition, comparison, etc., between the isotopies).
    [Show full text]
  • International English Usage Loreto Todd & Ian Hancock
    INTERNATIONAL ENGLISH USAGE LORETO TODD & IAN HANCOCK London First published 1986 by Croom Helm This edition published in the Taylor & Francis e-Library, 2005. “To purchase your own copy of this or any of Taylor & Francis or Routledge's collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.” First published in paperback 1990 by Routledge 11 New Fetter Lane, London EC4P 4EE © 1986 Loreto Todd and Ian Hancock All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. British Library Cataloguing in Publication Data Todd, Loreto International English usage. 1. English language—Usage I. Title II. Hancock, Ian 428 PE1460 Library of Congress Cataloging in Publication Data Todd, Loreto. International English usage. Includes index. 1. English language—Usage—Handbooks, manuals, etc. I. Hancock, Ian F. II. Title. PE1460.T64 1987 428 86–28426 ISBN 0-203-97763-7 Master e-book ISBN ISBN 0-415-05102-9 (Print Edition) ISBN 0-709-94314-8 hb. Contents Introduction iv Contributors vi List of Symbols viii Pronunciation Guide ix INTERNATIONAL ENGLISH USAGE 1–587 Index 588–610 Introduction In the four centuries since the time of Shakespeare, English has changed from a relatively unimportant European language with perhaps four million speakers into an international language used in every continent by approximately eight hundred million people. It is spoken natively by large sections of the population in Australia, Canada, the Caribbean, Ireland, New Zealand, the Philippines, Southern Africa, the United Kingdom and the United States of America; it is widely spoken as a second language throughout Africa and Asia; and it is the most frequently used language of international affairs.
    [Show full text]
  • The Seme ‘Strong’ in Lexicological Definitions
    UDC 811.111’37 Boris Hlebec* University of Belgrade Faculty of Philology English Department Belgrade, Serbia THE SEME ‘STRONG’ IN LEXICOLOGICAL DEFINITIONS Abstract This is an in-depth analysis of a selection of English lexemes containing the seme ‘strong’, performed by means of the collocational method as devised and elaborated by the author in his previous articles. This kind of approach shows the way language really works and that there is no clear borderline between langue and parole, or between lexis and syntax. Key words: semantic definition, collocation, seme, sememe, classeme, cryptotype 1. Introduction Defining lexemes in a scientific way is a rigorous task, which requires insight into the whole lexical system, or at least into a large part of it. When applying our collocational method to this aim (Hlebec 2007, 2008a, 2008b, 2008c, 2010, 2011a, 2011b), the present author has come across * E-mail address: [email protected] Sources of collocations have been various: British National Corpus, Corpus of Contemporary American English, Oxford Collocational Dictionary for Students of English, The Cassell Dictionary of Appropriate Adjectives, Dictionary of Selected Collocations among the most frequently consulted. Slang, literary style and specialized terms have not been taken into account. 7 Belgrade BELLS a defining seme2 ‘strong’ as a recurrent element in quite a lot of lexemes.3 This topic merits a whole volume, and the article reveals only a part of its extensive use. ‘Strong’ is to be understood in its abstract meaning ‘of great intensity’ rather than in its concrete meaning ‘of great bodily strength’. The crucial step in the application of the collocational method is to ascertain the common content of the directive.
    [Show full text]
  • Hewson's Semantic Theory and Implicit Semantics
    HEWSON'S SEMANTIC THEORY AND IMPLICIT SEMANTICS Valeri Vassiliev Chuvash Pedagogical Institute Cheboksary, Russia ABSTRACT The problem of implicit semantics was approached in this paper with a broad understanding of presuppositions, implications, entailments, etc., as derivable information created by three different kinds of con- texts in the four stages of meaning generation. Liens are viewed as meanings implied by the speaker and/or inferred by the hearer in a real act of communication. As meaning generation is assumed to be an ac- tivity, liens are consequently treated as dynamic processes involving psychological associations, rational deductions, and imagination. Each stage of semiogenesis provides a potential possibility for a variety of liens. Most of them are dismissed and cancelled at the subsequent stages of meaning generation so that the final perceptive product-an understood utterance-appears to have only several liens which re- main relevant for a given set of linguistic, referential and situational contexts and for the majority of the speakers. 1. John Hewson's ideas on semantics, which permeate his works, ap- pear to have a considerable explanatory potential applicable to a number of semantic issues. The following is a brief outline of his semantic teaching and an attempt to demonstrate how it can be employed in the treatment of problems pertaining to implicit semantics. 1.1 Hewson's theory of meaning is based on Saussure's distinction of langue and parole, or tongue and discourse, and elaborates the view that the use of language is an activity 'whereby the underlying elements of a tongue are processed, albeit subconsciously, for use in discourse' (Hewson 1990: 27).
    [Show full text]