RELATIVE SEMANTIC COMPLEXITY IN LEXICAL UNITS Bo Ralph Department of Computational Linguistics University of G~teborg GSteborg, Sweden

Summary necessary in order to reveal the under- lying principles of lexical organiza- The lexical component of a human tion. Consequently, computer-based language is typically heterogeneous and lexicology should rank high as a branch extremely complex. Before we can come of computational and theoretical lin- to grips with the underlying lexical guistics. organization, we must reduce the be- wildering complexity. Methods must be elaborated by which the interrelations The Heterogeneity of between the units of the can be elucidated. This paper describes how a Lexical inventories that have de- Swedish lexical material stored in a veloped spontaneously do not usually computer has been semantically strati- constitute neat and clear-cut systems. fied as a stage in the semantic ana- They are typically skewed in the sense lysis of the items included in the data that many phenomena which may seem quite base. In particular, a minor subset of marginal have nonetheless given rise to the lexical items, consisting of a rich , in contrast to the current in the language, has been lexical sparsity characterizing several selected as metalanguage in the defini- domains that are logically more funda- tions. It is argued that, in this way, mental to man. To take just one example, a means of describing the relative there are, in Swedish, rather few ex- semantic complexity in lexical units pressions for eating while there is a is provided. great variety of verbs for making all sorts of noises displaying only minor acoustical (and perceptual) differentia- Introduction tion. Our creative capacity simply seems to be more nourished by our imagination The semantic and syntactic inter- with regard to sounds than by our relations between the lexical units of imagination with regard to food con- a human language are notoriously com- sumption. That the asymmetry is quite plex and intricate, whether considered arbitrary is emphasized by the fact from the individual language-user's that other essential human activities point of view or from the perspective may produce a rich vocabulary. For of the collective competence of a instance, very fine distinctions can, language community. Indeed, they are in Swedish, be expressed monomorphemic- so complex that, when it comes to ally in the field of walking. thorough semantic analysis, scholars have only been able to handle small Such disproportions as those just portions of the lexicon at a time. The mentioned are basically due to histo- typical lexico-semantic study has rical accidents, i.e. pure chance, more therefore concerned single lexical or less. Consequently, they are items or small groups of semantically language-specific rather than universal interrelated items, in particular and cannot be ascribed to any general so-called -fields or semantic tendencies in the human mind. The same domains. holds for all culture-dependent ex- pressions. Thus, if the lexicons of On the other hand, there seems to many languages tend to contain words be a growing sentiment among linguists for buildings and vehicles, it is that the lexical component is very primarily because human beings tend to basic to the functioning of language. develop such things and, secondarily, The crucial role of the lexicon cannot, need to name them. It can be concluded however, be adequately understood un- that the reason for the recurrence of less the scope is widened. Detailed such terms in various languages all knowledge is, admittedly, quite in- over the world is not essentially dispensable in constructing an overall (psycho) linguistic but, rather, a model of the lexicon; but large-scale corollary of comparable extra-linguistic lexical investigations are just as circumstances.

--115 - Cultural conditions may also give part of the vocabulary tends to be rise to other types of lexical hetero- shared by most speakers, more differen- geneity. The lexicon of a language may tiation being found at the other be viewed as comprising different extreme. strata, some of which contain common Fillmore has mentioned a number of words used by everyone, others contain- ways in which languages may differ with ing words used exclusively by special- respect to word semantics. There are ists. Technical language - where "tech- such features as relative analyticity, nical" should be taken in a broad i.e. the degree of semantic trans- sense - in various fields, such as me- dicine, law, economy, technology, etc.; parency characterizing the total lexical system, taxonomic depth, by which is some forms of language used in certain meant the dosage of particular as com- professions or by certain socially de- pared to generic terms, patterns of fined groups, like traders, priests, or meaning extension, areas of synonymy outlaws - these are examples of voca- elaboration, collocational patterns, bulary strata that are likely to be etc. (Fillmore 1978, p. 155-157). 5 In fully mastered only by relatively few fact, different domains within the voca- individuals. It is to be deplored when the language of professional debaters, bulary of a single language may vary a great deal in these respects. For in- for instance in politics and esthetics, stance, terminology is often, although also develops in this direction, as is often the case. not always, harder to analyse than are common words. In particular, terminology Other strata of language may be tends to invite heavy borrowing of for- quite familiar to a majority of the eign lexical material; in this way the language-users although they are less portion of arbitrary lexical units frequently employed, being tied up with increases. different styles, registers, or con- textual settings. This may apply to the It cannot be doubted that somewhere behind the confusing complexity of the vocabulary of honorific language, re- lexicon there is a clue as to what human ligious language, etc. Such differentia- beings find imperative to recognize as tion in as has been exemp- delimited concepts. The categorization lified here is manifested in a reflected by lexical inventories is con- language-specific way, but the very siderably disguised through the hetero- existence of differentiation is a uni- geneity which is a basic characteristic versal trait. It has been suggested of the lexical component, as has been that lexical inventories can be sub- emphasized repeatedly. As a first step, divided into various domains obeying then, methods must be elaborated by different sets of rules that govern the which the complexity can be duly relations between language and reality. handled. In particular, the semantic In other words, there may well be redundancy of the authentic lexicon various kinds of word meanings (cf. must be reduced. Fillmore 1978). 5 Information about a many-splendoured world is to be con- veyed by means of language. The phe- Reducing Redundancy nomena referred to are quite different in nature, and so the semantic content It is very natural in lexico- of lexical items may vary accordingly. semantic analysis to take word defini- tions as a point of departure. It can In most authentic vocabularies be argued that a defined word is seman- there is a gradient ranging from more tically more complex than each word used or less purely grammatical operators in the definition of that word. Also, and structure-dependent items (such as it is a well-known fact that circularity the copula, connectives, quantifiers, very easily creeps into definitions. etc.), over items that are partly Although circularity in definitions has system-oriented, partly more semantic- ally weighted (e.g. pronouns, deictic occasionally been the target of investi- expressions, prepositions), all the way gation and has served successfully as a to items simply indexing "encyclopedic" basis for determining semantic related- ness (e.g. Calzonari 1977), 2 it should, phenomena. There is much fluctuation ideally, be controlled. from language to language in this re- gard, since the division of labour One way of achieving maximal reduc- between vocabulary and grammar proper tion of semantic redundancy in the lex~ may vary. Thus the proportion of words con is, of course, to define all lexical with primarily grammatical functions entries by means of an effective meta- may differ to a high degree between language, e.g. a minimal defining voca- languages. However, the grammar-oriented bulary. Our interest can then be focused

116 on this minimal word-list on the assump- however, not absolutely on a par with tion that it covers the same semantic each other. Both to fish and to angle range as the complete vocabulary defined "contain" an element of catching. It can by it. In practical , de- be argued that they differ from each fining vocabularies have been utilized other, and from to catch, in the way the in, for instance, The General Basic catching is specified. To fish explica- English (1942) ; 8 Michael tes the object caught, viz. 'fish'. That West, An International Reader's Diction- fish is caught is presupposed by to ary (1965); Iu and, in a project having angle as well, but with the additional much wider scope and, therefore, holding specification of the fishing method em- greater theoretical interest, in ployed. However, the two types of speci- Longman's Dictionary of Contemporary fication are not equal with respect to English (1978).J the verbal act 'to catch'. While 'to catch' is presupposed as an element in Defining vocabularies are intui- to fish, the whole meaning 'to try to tively attractive. They seem to capture catch fish' is incorporated in to angle. the notion of basic vocabulary, the The relations can be expressed by general lexical subset included in bracketing in the following manner: everybody's vocabulary. In some excep- tional cases it is very easy to isolate to catch - '(to try to get hold of)' this subset. In Dyirbal, for instance, to fish - '(to catch [= to try to a Queensland Australian language, there get hold of] (fish))' is a special vocabulary used in certain to angle - '(to fish [= to catch social contexts; hence it is referred to (= to try to get hold as "mother-in-law language" (Dixon of) (fish)] (with a 1971). 4 In this subsystem, Dyangul, the hook and line))' same grammatical rules apply, but the The closer relationship between to fish vocabulary is very restricted so that, and to angle may be indicated by making for instance, each Dyangul verb corres- use of to fish in the definition of to ponds to several in the common language. angle. Parallel treatment of pairs or Therefore, the Dyangul vocabulary can be groups of verbs to the effect that one taken as a model for a semantic classi- verb may contain not only the general fication of words in Dyirbal. semantic properties of another verb but A slight disadvantage in using de- actually the other verb itself has been fining vocabularies is the levelling of suggested by, among others, Binnick depth in the linguistic analysis. The (1971) I and Fillmore (1978). 5 lexicon is considered on two fixed In fact, this relative semantic levels alone: that of the lexical en- stratification of the lexicon is rather tries and that of the basic defining similar to Weinreich's strategy for in- words. As is well known, however, lexi- vestigating the semantic content of the cal units play very different rQleS in lexical inventory. Weinreich gives the the language they are part of. Not in- following presentation: frequently, the semantic interrelations within given sets can only be represent- Stratum 0: terms definable only ed in a multi-layered fashion. I do not circularly and by os- wish to claim that the human lexicon is, tensive definition in any strict sense, hierarchically or- Stratum I: terms whose definitions ganized, but various subdivisions of it contain only stratum-0 may well be. terms, but without circularity For instance, to catch something Stratum 2: terms whose definitions means roughly 'to get hold of something', contain only stratum-0 to fish means 'to try to catch fish', and stratum-1 terms, ~d W angle means 'to fish with a hook without circularity and line'. Consistent use of a minimal Stratum n: terms whose definitions defining vocabulary would yield defini- contain only terms of tions like 'to try to get hold of fish strata 0, I, 2, ... n - with a hook and line' for to angle. This I. is by no means a totally inadequate de- finition. To angle is clearly related to He concludes that the metalanguage will verbal expressions like to get hold of; be made up of the complete ordinary the semantic relatedness becomes appar- language except for stratum n (Weinreich ent in a comparison with other verbs, 1962). ~ such as to interrupt, to sneeze, or to A similar line of reasoning is at twinkle. The verbal acts designated by the bottom of the organization of the to catch, to fish, and to angle are, Swedish lexical material analysed in the

117 - project Lexical Data Base, carried out defining-vocabulary units. The defini- at the Department of Computational Lin- tions are more elegantly formulated in guistics, University of GSteborg. A mi- this manner, but, in particular, the nimal defining vocabulary is, in prin- interrelations between lexical items are ciple, utilized in definitions. In addi- more revealingly stated. Such an ap- tion, however, words not included in the proach, building on extensive lexical defining vocabulary proper are occasion- cross-referencing, implies several theo- ally allowed in definitions, with the retical commitments. Therefore, it requirement that they should be ulti- should be emphasized that the data base mately reducible to strict defining described here is aimed at contributing vocabulary units. The minimal defining to [~X~!l~ the semantic interrelations vocabulary comprises words denoting very between lexical items, in the first fundamental concepts pertaining to place. This, however, should not be physical elements and forces, geometric- taken to mean that our goal has been an al notions, topographical properties, ideal ultimate representation of the state and movement, location, time, semantic structure in the lexicon. causation, basic organisms, physical and mental functions of organisms, etc., as well as more culture-sensitive and Investigating Relative Semantic conventionalized concepts, such as Complexity colours, artefacts, social conditions, Careful selection of defining units and the like. and adequate definitional formats are a A larger subset than the defining prerequisite for an acceptable result of vocabulary is the so-called fully de- the empirical work under way. It is true fined vocabulary. This part of the voca- that lexicographers involved in practic- bulary is provided with elaborated de- al undertakings naturally seek to attain finitions. Together with the defining consistent and adequate formulations in vocabulary it makes up the semantic hard definitions. The requirement is even core of the lexicon taken as a whole. We stronger if semantic structure is the are not likely to find more candidates main object of analysis. for this part of the vocabulary no Monolingual usually matter how much material is included in take the reader's knowledge of the the data base. Instead, new material language in question for granted. As a tends to be of a more specific kind, consequence, the definitions may not be e.g. terminology known by only a few explicit enough. For instance, the people, almost obsolete words, non- Swedish causative verbs fylla, glSdga, permanent compounds that have barely runda, sl~ta, sv~rta all agree in focus- passed the threshold of lexicalization, ing on the result of the respective but which are easily analysable in terms activities. In a standard dictionary, of the well-defined part of the voca- ISO, 6 they are defined by verbal phrases bulary; in short, words which do not add very similar to each other in structure: anything further to the basic semantic system of the lexicon. These latter Verb Definition items are not assigned any proper defi- nitions but are semantically specified fylla g~rafull more summarily. 'to fill' 'to make full' gl~dga g~ra gl~dande Thus the data base is, in prin- 'to make 'to make glowing' ciple, divided into three strata: glowing' (I) the ~£i~_~!~X, whose runda g~ra rund(are) units are axiomatic in a logic- 'to round' 'to make round(er) ' al sense and highly restricted sl~ta g~ra sl~t in number; 'to smooth' 'to make smooth' (2) the ~!!x defined vocabulary, sv~rta g~ra svart whose units have carefully 'to blacken' 'to make black' formulated definitions based on However, the verb of the paraphrases, the defining vocabulary; g~ra 'to make', implies quite different (3) the ~K~h~_~2~!~Z, activities in the respective cases, per- whose units are semantically haps something like 'to regulate', 'to described by approximation. treat', 'to shape', 'to grind', and 'to In line with the above reasoning as colour'. By aiming at this higher de- regards relative semantic complexity, gree of exactitude, we both acquire a we allow entities from the fully defined better knowledge of the basic semantic vocabulary to enter into definitions. properties of the lexical entries para- They are ultimately reducible to phrased and obtain good candidates for

118 the eventual defining vocabulary. Verb Verb paraphrase Although the material is stored in and manipulated by the computer, intui- blSd-a uts6ndra blod tion and Sprachgef~hl play dominant 'to bleed' 'to give off blood' roles in this work. Therefore, it is ~[~-! fiska med d~rj 'to whiff' 'to fish with a urgent to employ methods which may guide whiffing-line' our intuitions in a favourable direc- tion. Since, in Swedish, there is no fuml-a bete sig fuml-igt 'to fumble' 'to act fumblingly' "mother-in-law language", the units of h~-! f~rv~rva genom k~ the defining vocabulary have to be de- 'to buy' 'to acquire through termined by a number of methods with the purchase' joint goal of finding the minimal work- pensl-a bestryka med ~! able set of defining words. One impor- 'to paint' 'to paint with a tant method implies large-scale para- brush' phrasing of verbs. In a first round, we ~samka R!~H-a (or: concentrate on such verbs as have equi- ~!~-~ 'to pain' valent paraphrases involving the base ~a~-or) 'to cause pain' of the original verb, retained skrik-a utstSta skrik in the verb complement in the para- 'to cry' 'to ejaculate a cry' phrase. Such verbs are, for instance, uppl~gga i ~! the following: ~!-~ 'to pile' 'to arrange in a pile' Verb Verb paraphrase ~[~-~ k~nna sq[~ 'to grieve' 'to feel grief' bind-a festa med band tr~l-a arbeta som en tr~l 'to bind' 'to fix with a band' 'to toil' 'to work like a slave' ~xkl-a ~ka (med/p~) ~! The verbs of the paraphrases are 'to cycle' 'to go by bicycle' usually deprived of much of the specific fisk-a f~nga risk content characterizing the original, 'to fish' 'to catch fish' simple verbs. They emphasize the purely ~l-na bli 9ul(are) verbal element in the respective events. 'to turn 'to become (more) Most of the specific meaning lies, in- yellow' yellow' stead, in the verb complement in the hamr-a sl~ med hammare paraphrases. It is easily seen that the 'to hammer' 'to hit with a hammer' paraphrase verbs represent different kant-a f~rse med kan~(er) degrees of abstractness, i.e. they are 'to edge' 'to provide with an semantically complex to a varying ex- edge (or: edges)' tent. They are always, however, less ~-a forma ku~-i~ complex than the corresponding simple 'to make 'to shape convex' verbs they derive from in the analysis. convex' Considering the resemblance of these ~-~ kapa med s~H verbs to pronouns and other pro-forms, 'to saw i 'to cut with a saw' "pro-verb" would be a fitting term. tor-ka g~ra Loll(are) 'to dry' 'to make dry (or: Once the set of verbs to be used in drier)' paraphrases is established, it may also tvivl-a k~nna tvivel be employed for verbs with morphologic- 'to doubt' 'to feel doubt' ally dissimilar paraphrases. For in- stance, ~iska 'to love' : hysa k~rlek The Swedish paraphrases are just I to feel love , may be classified to- as natural as the simple verbs in the gether with other verbs of emotion. A examples given. There are hundreds of similar mode of analysis may also be analogous cases. In a host of other applied to such verbs as cannot be as- examples, there are quite conceivable sociated with paraphrases in any appar- paraphrases of basically the same kind, ent way: drabba 'to afflict', h~mta 'to although less conventionalized as col- fetch', m~rka 'to notice', etc. locations. The following verbs are of this type: As to the formats of the defini- tions, it is obvious that the para- phrases signal some fundamental proper- ties of the paraphrased verbs, besides

119 " the nature of the respective reduced same type of analysis. Of course, we are verb. Some verbs incorporate an instru- aware of many problems connected with ment, even morphologically recognizable this approach, e.g. the question of syn- (cykla, d~r~a, s~ga, etc.), others an tactic compatibility between original object implied in the event (bl~da, items and their paraphrases, the rela- kanta); still others focus on the result tive arbitrariness in selecting defining of the event (gulna, skrika, stapla, units, etc. Furthermore, there are many torka), or on the phenomenon or state features in the approach resembling ge- perceived by an experiencer (pl~ga, nerative semantic theory of the early sofia). A class which is potentially 1970's (and, incidentally, the work out- very large integrates an adverbial spe- lined in Mel'~uk and Zolkovskij 19697 cification of the event itself rather and elsewhere); consequently, the same than the actants involved (e.g. linka type of criticism as has been raised 'to limp', porla 'to purl', tindra 'to against that theory applies to the pre- twinkle'). sent work. Relations similar to those obtain- We do not find this too embarrass- ing between verbs and verbal phrases ing. Our work is chiefly empirical, within a language may be found if cor- starting with observable facts, i.e. the responding verbal expressions are com- words themselves, gradually eating our pared across languages. This is a wide- way down into deeper semantic structure. spread and natural method for reinforc- Thus, in a way, we are working in the ing observations on patterns of language direction compared with the structure. In certain lexical domains generative semanticists. We have no wish one language may have developed single to reduce all lexical items to a single (i.e. relatively arbitrary) verbs, while underlying category of units, and we are another language may express the same not prepared to press all lexical items content by phrases. For instance, there into one basic semantic schema. Rather, is a large family of motion verbs in we hope to be able to shed some light on both English and Swedish. In Japanese, the richness of the semantic system of the same meanings are usually rendered Swedish, by elaborating a semantically by various forms of the basic verb for based convertibility system. The method 'to walk' (aruku) augmented by one of a we have used seems to us to provide a number of mimetic adverbial elements. versatile means to such an end. Interestingly, some adverbs of the phrasal collocations thus arising in Japanese are, themselves, limited to a References very restricted context. This amounts I. Binnick, R.I. 1971. Bring and Come. to saying that the phrases are just as Linguistics Inquiry 2. 260-265. lexicalized as the single verbs in Eng- lish and Swedish (cf. Fillmore 1978). 3 2. Calzolari, N. 1977. An Empirical Approach to Circularity in Dictionary Transferring this interlingual com- Definitions. Cahiers de lexicologie parison to one language only, we may 31. 118-128. note that verbal paraphrases lend them- selves to classificatory work in an ana- 3. Dictionary of Contempor@ry English. logous way. Verbs may be more or less 1978. Harlow & London: Longman. productive as pro-verbs in paraphrases, 4. Dixon, R.M.W. 1971. A Method of Se- they may establish more or less natural mantic Description. Semantics, ed. paraphrases, they may occur in phrases by D.D. Steinberg and L.A. Jakobcvi~. which have corresponding single verbs Cambridge: University Press. or not, they may be more or less syno- nymous or antonymous to verbs establish- 5. Fillmore, C.J. 1978. On the Organiza- ed as pro-verbs, etc. By comparing dif- tion of Semantic Information in the ferent pro-verbs and their respective Lexicon. Papers from the Parasession paraphrases with each other we may also on the Lexicon, ed. by D. Farkas et find that a pro-verb may occur in the al. Chicago: Chicago Linguistic paraphrase of another pro-verb, thus Society. 148-173. producing semantic links of the type 6. ISO = Illustrerad svensk ordbok. 1977. discussed above. In such cases, the re- 3rd ed., 3rd pr. Stockholm: Natur och lative semantic complexity is clearly Kultur. recognizable. 7. Mel'~uk, I.A. and A.K. ~olkovskij. Verbs, in particular, are highly 1969. Towards a Functioning 'Meaning- rewarding in such work as has been des- Text' Model of Language. Essays on cribed here. Other word classes are, , Vol. II, ed. by however, accessible to basically the

120 V.Ju. Rozencvejg. 1974. Stockholm: Skriptor. 1-52. 8. The General Basic English Diction- ary, ed. by C.K. Ogden. 1942. New York: W.W. Norton & Co. 9. Weinreich, U. 1962. Lexicographic Definition in Descriptive Semantics. Problems in Lexico@raphy, 2nd ed. by F.W. Householder and S. Saporta 1967. Bloomington: Indiana Uni- versity. 25-44. 10. West, M. 1965. An International Reader's Dictionary. London: Longman.

-121