Stringing together a sentence: linearity and the lexis- interface

Oliver Mason

University of Birmingham

Abstract

Following existing approaches to linear grammar, we explore the application of automatically identified multi-word units to the description of sentence structure. After looking at several sample sentences, we move on to a discussion of routine use vs. creativity in language. The proposed new phraseological grammar does away with both syntactic and functional categories and reduces syntax to a by-product of linearising thought in the form of phraseological units of meaning.

1. Introduction

Phraseology is concerned with the study of units above the level of the single word, units which seem to become increasingly important as empirical principles are more widely applied in the field of linguistics. The single word, while a convenient starting point, is not a suitable entity for describing either sentence structure or aspects of meaning. A word has no meaning in isolation, and even its syntactic environment is usually idiosyncratic when we consider actual use rather than theoretical possibilities. To date, multi-word units (MWUs) have been identified as units of meaning (eg Danielsson 2001), as it is only in conjunction with other words that we are able to decide which aspect of a word's meaning potential has been realised in a particular instance. If we accept the definition of meaning as use, the phrasal environment of a word thus serves as a shorthand description of its use. Stubbs (2001) gives the examples of surgery and bank, which, despite having several distinct meanings in isolation, are never confused when used in authentic sentences. It is, of course, possible to invent sentences deliberately in which their use is ambiguous, but the important point here is that this is not what speakers do in real life. MWUs are important not only in semantics, where they displace the single lexical item as the central element and shift the focus from lexical meaning to phrasal meaning, but also in syntax, where they compete with abstract descriptions ultimately based on phrase structure grammar (Chomsky 1957). As Stubbs (1993) observes, grammarians have traditionally been interested in structures only, and view lexical items as mere instantiations of the grammatical categories to which they belong. More recent approaches (eg Sinclair 1991, Francis 1991, Brazil 1995, Hunston and Francis 2000, Sinclair and Mauranen 2006), on the other hand, have demonstrated that lexical items are more than that: there is a correlation between grammatical structures and the words which occur in them. As with everything in the description of language, this correlation is not absolute, but rather expresses strong tendencies reinforced by everyday usage. Some of these alternative approaches furthermore view sentence structure not as hierarchical (as in analyses derived from phrase structure grammars) but as linear. Such a linear sequence of units (elements in the terminology of Brazil (1995), patterns in Hunston and Francis (2000), and chunks in Sinclair and Mauranen (2006)) is constructed mainly according to the principle of prospection, whereby one unit places constraints upon the range of possible successor units.

1.1 Open Choice vs Idiom

Sinclair (1991) discusses two principles of grammatical description, connected to the Saussurian notions of syntagma and paradigma. The open-choice principle treats each position in an utterance as a (complex) choice, basically a slot that is filled by an appropriate item (hence it is also referred to as the 'slot-and-filler model'). The idiom principle, on the other hand, states that language users have at their disposal a set of larger units, so that they select not individual lexical items (as they would under the open-choice principle) but larger chunks. Sinclair argues that neither principle alone is sufficient to describe language, but that the idiom principle is the more important of the two and should be used by default in describing texts. Only when a phenomenon cannot be accounted for by the idiom principle should we fall back on the open-choice principle. This view fits well with a model that uses MWUs as its basic units, as these would represent the larger chunks that make up utterances in place of single lexical items. As we will see below, Sinclair was right in stating that both principles are required for a comprehensive description.
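To make the ordering of the two principles concrete, here is a minimal Python sketch under stated assumptions. It is not Sinclair's procedure or the method proposed in this paper. A sentence is segmented by preferring, at each position, the longest unit found in a small hand-listed MWU inventory (the idiom principle), and by falling back to a single word only when no unit matches (the open-choice principle). The inventory `MWUS` and the function `segment` are hypothetical, and greedy longest-match is simply one convenient way of operationalising "idiom principle by default".

```python
from typing import List, Set, Tuple

# Hypothetical MWU inventory, listed by hand for illustration only; in the
# approach discussed here such units would be identified automatically.
MWUS: Set[Tuple[str, ...]] = {
    ("of", "course"),
    ("on", "the", "other", "hand"),
    ("in", "the", "form", "of"),
}


def segment(tokens: List[str], mwus: Set[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Split tokens into chunks, preferring the longest matching MWU
    (idiom principle) and falling back to a single word (open choice)."""
    max_len = max((len(unit) for unit in mwus), default=1)
    chunks: List[Tuple[str, ...]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-word units.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in mwus:
                chunks.append(candidate)
                i += n
                break
        else:
            # No MWU starts at this position: treat the word as an open choice.
            chunks.append((tokens[i],))
            i += 1
    return chunks


if __name__ == "__main__":
    sentence = "on the other hand this is of course a choice".split()
    # Prints one chunk per line: "on the other hand", "this", "is",
    # "of course", "a", "choice".
    for chunk in segment(sentence, MWUS):
        print(" ".join(chunk))
```

Greedy longest-match commits to the first long unit it finds and never reconsiders that choice, so overlapping units can yield a poorer segmentation than an exhaustive search would. The sketch only illustrates the default-and-fallback relationship between the two principles.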

2. Multi-word units and phraseology

We now look at multi-word units, which go beyond the single lexical item. While they can of course be described intuitively, there are two principal ways of identifying them automatically by computer. In this section we explain these algorithms, as they form the basis for extracting the MWUs that we later apply to our grammatical description.

2.1 Chains

Most past work on computer-identified phrases has been concerned with n-grams, where word sequences of a particular length n are extracted from a text. Values for n typically range from 2 to about 8 (as on Fletcher's Phrases in English site). This is a great step forward from early studies, which were mainly limited to bigrams and trigrams; this step has been facilitated