The Computational Lexical Semantics of Syntagmatic Relations
Evelyne Viegas, Stephen Beale and Sergei Nirenburg
New Mexico State University
Computing Research Lab, Las Cruces, NM 88003, USA
{viegas, sb, sergei}@crl.nmsu.edu

Abstract
In this paper, we address the issue of syntagmatic expressions from a computational lexical semantic perspective. From a representational viewpoint, we argue for a hybrid approach combining linguistic and conceptual paradigms, in order to account for the continuum we find in natural languages from free-combining words to frozen expressions. In particular, we focus on the place of lexical and semantic restricted co-occurrences. From a processing viewpoint, we show how to generate and analyze syntagmatic expressions using an efficient constraint-based processor, well suited to a knowledge-driven approach.

1 Introduction
"You can take advantage of the chambermaid"¹ is not a collocation one would like to generate in the context of a hotel to mean "use the services of." This is why collocations should constitute an important part of the design of Machine Translation or Multilingual Generation systems.

In this paper, we address the issue of syntagmatic expressions from a computational lexical semantic perspective. From a representational viewpoint, we argue for a hybrid approach combining linguistic and conceptual paradigms, in order to account for the continuum we find in natural languages from free-combining words to frozen expressions (such as the idiom kick the (proverbial) bucket). In particular, we focus on the representation of restricted semantic and lexical co-occurrences, such as heavy smoker and professor ... students respectively, which we define later. From a processing viewpoint, we show how to generate and analyze syntagmatic expressions using an efficient constraint-based processor, well suited to a knowledge-driven approach. In the following, we first compare different approaches to collocations. Second, we present our approach in terms of representation and processing. Finally, we show how to facilitate the acquisition of co-occurrences by using 1) the formalism of lexical rules (LRs), and 2) an inheritance hierarchy of Lexical Semantic Functions (LSFs).

¹ Lederer, R. 1990. Anguished English. A Laurel Book, Dell Publishing.

2 Approaches to Syntagmatic Relations
The term syntagmatic relations, also known as collocations, is used differently by lexicographers, linguists and statisticians, to denote similar but not identical classes of expressions.

The traditional approach to collocations has been lexicographic. Here dictionaries provide information about what is unpredictable or idiosyncratic. Benson (1989) synthesizes Hausmann's studies on collocations, calling expressions such as commit murder, compile a dictionary, inflict a wound, etc. "fixed combinations, recurrent combinations" or "collocations". In Hausmann's terms (1979), a collocation is composed of two elements, a base ("Basis") and a collocate ("Kollokator"); the base is semantically autonomous whereas the collocate cannot be semantically interpreted in isolation. In other words, the set of lexical collocates which can combine with a given base is not predictable, and therefore collocations must be listed in dictionaries.

It is hard to say that there has been a real focus on collocations from a linguistic perspective. The lexicon has been broadly sacrificed by both English-speaking schools and continental European schools. The scientific agenda of the former has been largely dominated by syntactic issues until recently, whereas the latter was more concerned with pragmatic aspects of natural languages. The focus has been on grammatical collocations such as adapt to, aim at, look for. Lakoff (1970) distinguishes a class of expressions which cannot undergo certain operations, such as nominalization or causativization: the problem is hard; *the hardness of the problem; *the problem hardened. The restriction on the application of certain syntactic operations can help define collocations such as hard problem. Mel'čuk's treatment of collocations is detailed below.

In recent years, there has been a resurgence of statistical approaches applied to the study of natural languages. Sinclair (1991) states that "a word which occurs in close proximity to a word under investigation is called a collocate of it .... Collocation is the occurrence of two or more words within a short space of each other in a text". The problem is that with such a definition of collocations, even when improved,² one identifies not only collocations but also free-combining pairs that frequently appear together, such as lawyer-client and doctor-hospital. However, researchers now seem to agree that combining statistical with symbolic approaches leads to quantifiable improvements (Klavans and Resnik, 1996).

The Meaning Text Theory Approach
The Meaning Text Theory (MTT) is a generator-oriented lexical grammatical formalism. Lexical knowledge is encoded in an entry of the Explanatory Combinatorial Dictionary (ECD), each entry being divided into three zones: the semantic zone (a semantic network representing the meaning of the entry in terms of more primitive words), the syntactic zone (the grammatical properties of the entry) and the lexical combinatorics zone (containing the values of the Lexical Functions (LFs)³). LFs are central to the study of collocations:

    A lexical function F is a correspondence which associates a lexical item L, called the key word of F, with a set of lexical items F(L), the value of F. (Mel'čuk, 1988)⁴

We focus here on syntagmatic LFs describing co-occurrence relations such as pay attention, legitimate complaint, from a distance.⁵

Heylen et al. (1993) have worked out some cases which help license a starting point for assigning LFs. They distinguish four types of syntagmatic LFs:
• evaluative qualifier: Magn(bleed) = profusely
• distributional qualifier: Mult(sheep) = flock
• co-occurrence: Loc-in(distance) = at a distance
• verbal operator: Oper1(attention) = pay

The MTT approach is very interesting, as it provides a model of production well suited to generation with its different strata, as well as a great deal of lexical-semantic information. It seems, nevertheless, that all the collocational information is listed in a static way. We believe that one of the main drawbacks of the approach is the lack of any calculus for predicting which expressions can collocate with each other semantically.

3 The Computational Lexical Semantic Approach
In order to account for the continuum we find in natural languages, we argue for a continuum perspective, spanning the range from free-combining words to idioms, with semantic collocations and idiosyncrasies in between, as defined in (Viegas and Bouillon, 1994):
• free-combining words (the girl ate candies)
• semantic collocations (fast car; long book)⁶
• idiosyncrasies (large coke; green jealousy)
• idioms (to kick the (proverbial) bucket)

Formally, we go from a purely compositional approach in "free-combining words" to a non-compositional approach in idioms. In between, a (semi-)compositional approach is still possible. Viegas and Bouillon (1994) showed that we can reduce the set of what are conventionally considered idiosyncrasies by differentiating "true" idiosyncrasies (difficult to derive or calculate) from expressions which have well-defined calculi, being compositional in nature, and which have been called semantic collocations. In this paper, we further distinguish their idiosyncrasies into:
• restricted semantic co-occurrence, where the meaning of the co-occurrence is semi-compositional between the base and the collocate (strong coffee, pay attention, heavy smoker, ...)
• restricted lexical co-occurrence, where the meaning of the collocate is compositional but has a lexically idiosyncratic behavior (lecture ... student; rancid butter; sour milk).

We provide below examples of restricted semantic co-occurrences in (1), and of restricted lexical co-occurrences in (2).

Restricted semantic co-occurrence
The semantics of the combination of the entries is semi-compositional. In other words, there is an entry in the lexicon for the base (the semantic collocate is encoded inside the base), whereas we cannot directly refer to the sense of the semantic collocate in the lexicon, as it is not part of its senses. We assign the co-occurrence a new semi-compositional sense, where the sense of the base is composed with a new sense for the collocate.

² Church and Hanks (1989) and Smadja (1993) use statistics in their algorithms to extract collocations from texts.
³ See (Iordanskaja et al., 1991) and (Ramos et al., 1994) for their use of LFs in MTT and NLG respectively.
⁴ (Held, 1989) contrasts Hausmann's base and collocate with Mel'čuk's key word and LF values.
⁵ There are about 60 LFs listed, said to be universal; the lexicographic approach of Mel'čuk and Zolkovsky has been applied, among other languages, to Russian, French, German and English.
⁶ See (Pustejovsky, 1995) for his account of such expressions using a coercion operator.

(1a) #0=[key: "smoker",
         rel: [syntagmatic: LSFIntensity
                 [base: #0,
                  collocate: [key: "heavy",
                              gram: [subCat: Attributive,
                                     freq: [value: 8]]]]]
         ...]

(1b) #0=[key: "attention",
         rel: [syntagmatic: LSFOper
                 [base: #0,
                  collocate: [key: "pay",
                              gram: [subCat: SupportVerb,
                                     freq: [value: 5]]]]]
         ...]

In examples (1), the LSFs (LSFIntensity, LSFOper, ...) are equivalent (and some identical) to the LFs provided in the ECD.

[...] compositional. In other words, there are entries in the lexicon for the base and the collocate, with the same senses as in the co-occurrence. Therefore, we can directly refer to the senses of the co-occurring words. What we are capturing here is a lexical idiosyncrasy; in other words, we specify that we should prefer this particular combination of words. This is useful for analysis, where it can help disambiguate a sense, and is most relevant for generation; it can be viewed as a preference among the paradigmatic family of the co-occurrence.

(2a) #0=[key: "truth",
         rel: [syntagmatic: LSFSyn
                 [base: #0,
                  collocate: [key: "plain",
                              sense: adj2,
                              lr: [comp: no, superl: no]]]]
         ...]

(2b) #0=[key: "pupil",
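Sinclair's purely distributional definition of a collocate, quoted in Section 2, can be made concrete with a small window-based counter. This sketch is ours, not the authors': the function name `window_collocates`, the window of four tokens on each side and the toy corpus are illustrative assumptions. It shows the problem raised in Section 2, namely that a free-combining pair such as lawyer-client scores as high as a genuine collocation such as heavy smoker.

```python
from collections import Counter
from typing import List


def window_collocates(tokens: List[str], node: str, span: int = 4) -> Counter:
    """Count every token within `span` positions of each occurrence of `node`.

    Follows Sinclair's distributional notion of a collocate: proximity
    alone decides, with no check on the semantic relation between words.
    """
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Invented toy corpus, for illustration only.
corpus = ("the lawyer met the client . the client paid the lawyer . "
          "he is a heavy smoker . she was a heavy smoker too").split()

print(window_collocates(corpus, "lawyer")["client"])  # 2
print(window_collocates(corpus, "smoker")["heavy"])   # 2
```

On this toy corpus, client is counted as a collocate of lawyer exactly as often as heavy is counted for smoker; raw proximity cannot tell the free combination from the restricted co-occurrence, which is why the text argues for combining statistics with symbolic knowledge.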
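Mel'čuk's definition of a lexical function, which maps a key word L to a set of lexical items F(L), fits naturally onto a dictionary keyed by (function, key word). The minimal sketch below restates the four Heylen et al. (1993) examples from Section 2; the table layout and the helper name `apply_lf` are hypothetical and are not taken from the ECD or from any MTT implementation. Its purely static lookup also illustrates the drawback noted in the text: a key word with no listed value simply yields the empty set.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical lexical-combinatorics zone: (LF name, key word L) -> F(L).
# The values restate the Heylen et al. (1993) examples given in the text.
LF_TABLE: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("Magn", "bleed"): frozenset({"profusely"}),
    ("Mult", "sheep"): frozenset({"flock"}),
    ("Loc-in", "distance"): frozenset({"at a distance"}),
    ("Oper1", "attention"): frozenset({"pay"}),
}


def apply_lf(lf: str, key_word: str) -> FrozenSet[str]:
    """Return the value F(L) of lexical function `lf` for `key_word`.

    The lookup is purely static: nothing is computed, so an unlisted
    key word yields the empty set.
    """
    return LF_TABLE.get((lf, key_word), frozenset())


print(sorted(apply_lf("Oper1", "attention")))  # ['pay']
print(sorted(apply_lf("Magn", "rain")))        # []
```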
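Feature structures (1a), (1b) and (2a) can be mirrored by plain records to show how a generator might consult them. This is a sketch under stated assumptions, not the authors' constraint-based processor. The class names `Collocate` and `LexEntry`, the functions `realize` and `allows_rule`, the reading of `freq` as a preference score (higher preferred), and the reading of `lr: [comp: no, superl: no]` as blocking the comparative and superlative lexical rules on the collocate are all our own interpretations. The re-entrancy `base: #0` is represented only implicitly, by storing the collocates inside the base's entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Collocate:
    key: str
    sub_cat: Optional[str] = None          # gram.subCat, e.g. "Attributive"
    freq: Optional[int] = None             # gram.freq.value (assumed: higher = preferred)
    sense: Optional[str] = None            # explicit sense, e.g. "adj2" in (2a)
    lr: Dict[str, bool] = field(default_factory=dict)  # lexical rules; False = blocked


@dataclass
class LexEntry:
    key: str
    # rel.syntagmatic: LSF name -> collocates recorded in the base's entry (#0)
    syntagmatic: Dict[str, List[Collocate]] = field(default_factory=dict)


# Entries (1a), (1b) and (2a) from the text, transcribed into this format.
LEXICON: Dict[str, LexEntry] = {
    "smoker": LexEntry("smoker", {"LSFIntensity": [Collocate("heavy", "Attributive", 8)]}),
    "attention": LexEntry("attention", {"LSFOper": [Collocate("pay", "SupportVerb", 5)]}),
    "truth": LexEntry("truth", {"LSFSyn": [
        Collocate("plain", sense="adj2", lr={"comp": False, "superl": False})]}),
}


def realize(base: str, lsf: str) -> Optional[str]:
    """Generation: choose a collocate for `base` under `lsf`, highest freq first."""
    entry = LEXICON.get(base)
    if entry is None or not entry.syntagmatic.get(lsf):
        return None
    best = max(entry.syntagmatic[lsf], key=lambda c: c.freq or 0)
    return best.key


def allows_rule(base: str, lsf: str, collocate: str, rule: str) -> bool:
    """Check whether lexical rule `rule` (e.g. "comp") may apply to `collocate`."""
    entry = LEXICON.get(base)
    if entry is not None:
        for c in entry.syntagmatic.get(lsf, []):
            if c.key == collocate:
                # Rules not mentioned in `lr` are assumed to be allowed.
                return c.lr.get(rule, True)
    return True


print(realize("smoker", "LSFIntensity"))                 # heavy
print(allows_rule("truth", "LSFSyn", "plain", "comp"))   # False
```

Under these assumptions, realize("attention", "LSFOper") returns "pay", and the blocked comp rule in (2a) keeps a generator from producing a comparative form of plain in this co-occurrence.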