Words and Alternative Basic Units for Linguistic Analysis
Jens Allwood, SCCIIL Interdisciplinary Center, University of Gothenburg
A. P. Hendrikse, Department of Linguistics, University of South Africa, Pretoria
Elisabeth Ahlsén, SCCIIL Interdisciplinary Center, University of Gothenburg

Abstract

The paper deals with words and possible alternatives to words as basic units in linguistic theory, especially in interlinguistic comparison and corpus linguistics. A number of ways of defining the word are discussed and related to the analysis of linguistic corpora and to interlinguistic comparisons between corpora of spoken interaction. Problems associated with words as the basic units are presented and discussed, as are alternatives to the traditional notion of word as a basis for corpus analysis and linguistic comparisons.

1. What is a word?

To some extent, there is an unclear view of what counts as a linguistic word, both generally and in different language types. This paper is an attempt to examine various construals of the concept “word”, in order to see how “words” might best be used as units of linguistic comparison. Intuitively, we might say that a word is a basic linguistic unit constituted by a combination of content (meaning) and expression, where the expression can be phonetic, orthographic or gestural (deaf sign language). On closer examination, however, it turns out that the notion “word” can be analyzed and specified in several different ways. Below we will consider the following three main ways of trying to analyze and define what a word is:

(i) Analysis and definitions building on observation and supposed easy discovery
(ii) Analysis and definitions building on manipulability
(iii) Analysis and definitions building on abstraction

2. Analysis and definitions building on observation and supposed easy discovery

We will start by considering analyses and definitions intended to build on observation of linguistic communication. Here the idea is that words are the basic building blocks of linguistic communication, providing combinable units of meaning and external expression that should as such be fairly directly observable and discoverable when inspecting linguistic communication, whether in written, spoken or gestural form. This can especially be seen in the definition of orthographic words given below.

(i) Orthographic words

According to Trask (2004), “[a]n orthographic word is a written sequence which has a white space at each end but no white space in the middle”. This definition of “orthographic word” is both too wide and too narrow in relation to other notions of word that intuitively have precedence. For example, the expression “rail road” has two orthographic words but is intuitively one word. This means that the notion of orthographic word as defined captures too much, i.e. it is too wide => too many words. But making “rail road” into two orthographic words is also too narrow => not capturing the word (semantic unit and phonological stress unit, lexeme) that is actually there.

(ii) Phonological words

Trask (2004) in turn defines a phonological word as “a piece of speech which behaves as a unit of pronunciation according to criteria which vary from language to language”. Unfortunately, there are units other than words that arguably meet such phonological requirements, for example phonemes, syllables or breath groups. The definition does not tell us how to differentiate these units from each other. The mention of language-specific criteria does not help, since these might differ both between languages and between the different units. In addition, when transcribing words, i.e. making them into orthographic words, phonetic information that may be used in the identification of phonological words, such as stress, tone patterns and pauses (length), is typically not represented in transcriptions. One reason for this is that such information is not traditionally part of written language; another is that this information may not be easily consciously recognized or observed by phonetically untrained transcribers.

If we consider the relation between orthographic words and phonological words, we may first note that, given these two definitions of a word, there is no 1-1 correspondence between orthographic, phonological and semantic words. Consider the following examples: rail road (2 orthographic words - 1 phonological word) or I’m, you’re, won’t and ain’t (1 orthographic word - 1 phonological word but two semantically motivated words). New York (2 orthographic words) vs. Newfoundland (1 orthographic word). New York and Newfoundland thus, fairly arbitrarily, have different orthographic status, while both are probably single phonological words.

(iii) Gestural words

Using Trask’s definition of phonological words as a model, we can now define gestural words analogously as “a piece of gestural communication which behaves as a unit of gesturing according to criteria which vary from language to language”. The relation between orthographic, phonological, semantically motivated and gestural words is more complex, so that 1-1 correspondences between the three word forms are not always possible to establish here either. Concerning gestural languages (sign languages), one reason for this is that while written and spoken words can be seen as variants of the same unit in two different expressive modes, gestural words in sign language are units in a new language and not gestural variants of the same word, in the sense that the written and spoken variants of a word are variants.
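Returning to the orthographic word: Trask's definition is simple enough to be stated as a few lines of code, which is one way of seeing how it differs from the phonological and gestural definitions. The following is only a minimal sketch in Python, not a proposal for a corpus tokenizer. The function names are hypothetical, and the sketch assumes that "white space" means any Unicode whitespace character and that punctuation stays attached to the neighbouring letters.

```python
from typing import Dict, List


def orthographic_words(text: str) -> List[str]:
    """Return the maximal runs of non-whitespace characters in `text`.

    Assumption (not from Trask): any Unicode whitespace counts as
    "white space", and punctuation is left attached to adjacent letters.
    """
    # str.split() with no argument splits on runs of whitespace and
    # discards leading and trailing whitespace.
    return text.split()


# Expressions discussed in the text above.
EXAMPLES = ["rail road", "I'm", "New York", "Newfoundland"]


def count_orthographic_words(expressions: List[str]) -> Dict[str, int]:
    """Map each expression to its number of orthographic words."""
    return {e: len(orthographic_words(e)) for e in expressions}


if __name__ == "__main__":
    for expression, n in count_orthographic_words(EXAMPLES).items():
        print(f"{expression!r}: {n} orthographic word(s)")
```

Under these assumptions the sketch counts two orthographic words for rail road and New York and one each for I'm and Newfoundland, which reproduces the mismatches with phonological and semantically motivated words noted above. No comparably short procedure follows from the definitions of phonological or gestural words, since their criteria are left unspecified.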
We can also note that only the definition of orthographic word is operational, i.e. lives up to the desiderata of being both directly observable and discoverable and thus directly usable as an element in automated information retrieval. Since, as we have seen above, the criteria given for what is a unit of pronunciation or gesturing are not sufficient, these concepts remain in need of further specification and clarification.

3. Analysis and definitions building on manipulability

Many linguists have thought that word criteria based on inherent word features that are supposed to be directly observable are unreliable and need to be supplemented by other criteria. Some widely used criteria of this kind focus, in a syntactic mode, on the unit status of words. Two criteria are often suggested:

(i) Moveability
(ii) Resistance to intrusion and interruption

Both of these criteria have often been used to define the notion “word”. We will now consider them one by one.

3.1 Moveability

According to this criterion, a word is the smallest element of a sentence that can be moved around without destroying the grammaticality of the sentence. Thus, the fact that the word often in the expression often he went to the house can be moved from first to last position, as in he went to the house often, shows that often is a word. A problem with this criterion is that several kinds of units that have not traditionally been considered words can be moved around in a similar fashion. Consider the following examples:

(i) Movement of morpheme

The unfaithful wife was masked -> The faithful wife was unmasked

Even if un- is not usually regarded as a word but as a morpheme, it can be moved around without destroying the grammaticality of the embedding sentence. The meaning is changed, but since the criterion wisely does not demand preservation of meaning, un- passes as a moveable unit.
If preservation of meaning had been required, even the example given above, using the word often, might not qualify (the information structural aspect of meaning is changed). In fact, very few changes of word order have no effect on meaning, and it would not be a trivial task to say which aspects of meaning do not change when word order is changed. If it is objected that there is no movement here, and that un- is just deleted and affixed to a new unit, a reply is that movement can always be analyzed as a combination of deletion and addition, and since there is no requirement of preservation of meaning, there is no way to rule this type of example out.

(ii) Movement of phrase

By and large you are right -> You are right by and large

The phrase by and large is not usually regarded as a word but clearly behaves in a word-like fashion, using this criterion. Thus, if we use the criterion of moveability, it seems that morphemes and fixed phrases are somewhat arbitrarily excluded from word status.

Since especially what one might call “lexicalized phrases” are important for our argument, we will give some more examples of expressions of this type that arguably have lexicalized status. The classification and examples are taken from Moon (1998) (cf. also Wray 2002).

1. Different types of “anomalous” collocations

at all, by and large, of course, stay put, thank you, in retrospect, kith and kin, on behalf of someone/something, short shrift, to and fro
at least, a foregone conclusion, in effect, beg the question, in time, curry favour, foot the bill, toe the line
in action, into action, out of action, on show, on display, to a …degree, to a …extent

2. Formulae

Simple formulae: alive and well, I’m sorry to say, not exactly, pick and choose, you know
Sayings: an eye for an eye, curiouser and curiouser, don’t let the bastards grind you down, that’s the way the cookie crumbles, home, James, and don’t spare the horses
Proverbs: you can’t have your cake and eat it, enough is enough, first come first served
Similes: good as gold, as old as the hills, like lambs to the slaughter, live like a king

3. Metaphors

Transparent metaphors: alarm bells ring, behind someone’s back, breathe life into something, on (some)one’s doorstep, pack one’s bags
Semi-transparent metaphors: grasp the nettle, on an even keel, the pecking order, throw the towel in, under one’s belt
Opaque metaphors: bite the bullet, kick the bucket, over the moon, red herring, shoot the breeze

3.2 Resistance to ‘intrusion and interruption’

The second common criterion for word-hood is that words are the largest units which resist ‘intrusion and interruption’ by the insertion of new material between their constituent parts.