Competence in lexical semantics

András Kornai (Institute for Computer Science, Hungarian Academy of Sciences, Kende u. 13-17, 1111 Budapest, Hungary)
Judit Ács (Dept of Automation and Applied Informatics, BUTE, Magyar Tudósok krt. 2, 1117 Budapest, Hungary)
Márton Makrai (Institute for Linguistics, Hungarian Academy of Sciences, Benczúr u. 33, 1068 Budapest, Hungary)
Dávid Nemeskey (Faculty of Informatics, Eötvös Loránd University, Pázmány Péter sétány 1/C, 1117 Budapest, Hungary)
Katalin Pajkossy (Department of Algebra, BUTE, Egry J. u. 1, 1111 Budapest, Hungary)
Gábor Recski (Institute for Linguistics, Hungarian Academy of Sciences, Benczúr u. 33, 1068 Budapest, Hungary)

Abstract

We investigate from the competence standpoint two recent models of lexical semantics, algebraic conceptual representations and continuous vector models.

Characterizing what it means for a speaker to be competent in lexical semantics remains perhaps the most significant stumbling block in reconciling the two main threads of semantics, Chomsky's cognitivism and Montague's formalism. As Partee (1979) already notes (see also Partee 2013), linguists assume that people know their language and that their brain is finite, while Montague assumed that meanings are characterized by intensions, formal objects that require an infinite amount of information to specify. In this paper we investigate two recent models of lexical semantics that rely exclusively on finite information objects: algebraic conceptual representations (ACR) (Wierzbicka, 1985; Kornai, 2010; Gordon et al., 2011), and continuous vector space (CVS) models which assign to each word a point in finite-dimensional Euclidean space (Bengio et al., 2003; Turian et al., 2010; Pennington et al., 2014). After a brief introduction to the philosophical background of these and similar models, we address the hard questions of competence, starting with learnability in Section 2; the ability of finite networks or vectors to replicate traditional notions of lexical relatedness such as synonymy, antonymy, ambiguity, polysemy, etc. in Section 3; the interface to compositional semantics in Section 4; and language-specificity and universality in Section 5. Our survey of the literature is far from exhaustive: both ACR and CVS have deep roots, with significant precursors going back at least to Quillian (1968) and Osgood et al. (1975) respectively, but we put the emphasis on the computational experiments we ran (source code and lexica available at github.com/kornai/4lang).

1 Background

In the eyes of many, Quine (1951) has demolished the traditional analytic/synthetic distinction, relegating nearly all pre-Fregean accounts of meaning from Aristotle to Locke to the dustbin of history. The opposing view, articulated clearly in Grice and Strawson (1956), is based on the empirical observation that people make the call rather uniformly over novel examples, an argument whose import is evident from the (at the time, still nascent) cognitive perspective. Today, we may agree with Putnam (1976):

'Bachelor' may be synonymous with 'unmarried man' but that cuts no philosophic ice. 'Chair' may be synonymous with 'moveable seat for one with back' but that bakes no philosophic bread and washes no philosophic windows. It is the belief that there are synonymies and analyticities of a deeper nature - synonymies and analyticities that cannot be discovered by the lexicographer or the linguist but only by the philosopher - that is incorrect.


Fortunately, one philosopher's trash may just turn out to be another linguist's treasure. What Putnam has demonstrated is that "a speaker can, by all reasonable standards, be in command of a word like water without being able to command the intension that would represent the word in possible worlds semantics" (Partee, 1979). Computational systems of Knowledge Representation, starting with the Teachable Word Comprehender of Quillian (1968), and culminating in the Deep Lexical Semantics of Hobbs (2008), carried on this tradition of analyzing word meaning in terms of 'essential' or 'analytic' components.

A particularly important step in this direction is the emergence of modern, computationally oriented lexicographic work beginning with Collins-COBUILD (Sinclair, 1987), the Longman Dictionary of Contemporary English (LDOCE) (Boguraev and Briscoe, 1989), WordNet (Miller, 1995), FrameNet (Fillmore and Atkins, 1998), and VerbNet (Kipper et al., 2000). Both the network- and the vector-based approach build on these efforts, but through very different routes.

Traditional network theories of Knowledge Representation tend to concentrate on nominal features such as the IS_A links (called hypernyms in WordNet) and treat the representation of verbs somewhat haphazardly. The first systems with a well-defined model of predication are the Conceptual Dependency model of Schank (1972), the Natural Semantic Metalanguage (NSM) of Wierzbicka (1985), and a more elaborate deep lexical semantics system that is still under construction by Hobbs and his coworkers (Hobbs, 2008; Gordon et al., 2011). What we call algebraic conceptual representation (ACR) is any such theory encoded with colored directed edges between the basic conceptual units. The algebraic approach provides a better fit with functional programming than the more declarative, automata-theoretic approach (Huet and Razet, 2008), and makes it possible to encode verbal subcategorization (case frame) information that is at the heart of FrameNet and VerbNet in addition to the standardly used nominal features (Kornai, 2010).

Continuous vector space (CVS) is also not a single model but a rich family of models, generally based on what Baroni (2013) calls the distributional hypothesis, that semantically similar items have similar distribution. This idea, going back at least to Firth (1957), is not at all trivial to defend, and not just because defining 'semantically similar' is a challenging task: as we shall see, there are significant design choices involved in defining similarity of vectors as well. To the extent CVS representations are primarily used in artificial neural net models, it may be helpful to consider the state of a network being described by the vector whose nth coordinate gives the activation level of the nth neuron. Under this conception, the meaning of a word is simply the activation pattern of the brain when the word is produced or perceived. Such vectors have very large (10^10) dimension so dimension reduction is called for, but direct correlation between brain activation patterns and the distribution of words has actually been detected (Mitchell et al., 2008).

2 Learnability

The key distinguishing feature between 'explanatory' or competence models and 'descriptive' or performance models is that the former, but not the latter, come complete with a learning algorithm (Chomsky, 1965). Although there is a wealth of data on children's acquisition of lexical entries (McKeown and Curtis, 1987), neither cognitive nor formal semantics have come close to formulating a robust theory of acquisition, and for intensions, infinite information objects encoding the meaning in the formal theory, it is not at all clear whether such a learning algorithm is even possible.

2.1 The basic vocabulary

The idea that there is a small set of conceptual primitives for building semantic representations has a long history both in linguistics and AI as well as in language teaching. The more theory-oriented systems, such as Conceptual Dependency and NSM, assume only a few dozen primitives, but have a disquieting tendency to add new elements as time goes by (Andrews, 2015). In contrast, the systems intended for teaching and communication, such as Basic English (Ogden, 1944), start with at least a thousand primitives, and assume that these need to be further supplemented by technical terms from various domains.

Since the obvious learning algorithm based on any such reductive system is one where the primitives are assumed universal (and possibly innate, see Section 5), and the rest is learned by reduction to the primitives, we performed a series of 'ceiling' experiments aiming at a determination of how big the universal/innate component of the lexicon must be. A trivial lower bound is given by the current size of the NSM inventory, 65 (Andrews, 2015), but as long as we don't have the complete lexicon of at least one language defined in NSM terms the reductivity of the system remains in doubt.

For English, a Germanic language, the first provably reductive system is the Longman Defining Vocabulary (LDV), some 2,200 items, which provide a sufficient basis for defining all entries in LDOCE (using English syntax in the definitions). Our work started with a superset of the LDV that was obtained by adding the most frequent words according to the Google unigram count (Brants and Franz, 2006) and the BNC, as well as the most frequent words from a Slavic, a Finnougric, and a Romance language (Polish, Hungarian, and Latin), and Whitney (1885), to form the 4lang conceptual dictionary, with the long-term design goal of eventually providing reductive definitions for the vocabulary of all Old World languages. Ács et al. (2013) describes how bindings in other languages can be created automatically and compares the reductive method to the familiar term- and document-frequency based searches for core vocabulary.

This superset of LDV, called '4lang' in Table 1 below, can be considered a directed graph whose nodes are the disambiguated concepts (with exponents in four languages) and whose edges run from each definiendum to every concept that appears in its definition. Such a graph can have many cycles. Our main interest is with selecting a defining set which has the property that each word, including those that appear in the definitions, can be defined in terms of members of this set. Every word that is a true primitive (has no definition, e.g. the basic terms of the Schank and NSM systems) must be included in the defining set, and to these we must add at least one vertex from every directed cycle. Thus, the problem of finding a defining set is equivalent to finding a feedback vertex set (FVS), a problem already proven NP-complete in Karp (1972). Since we cannot run an exhaustive search, we use a heuristic algorithm which searches for a defining set by gradually eliminating low-frequency nodes whose outgoing arcs lead to not yet eliminated nodes, and make no claim that the results in Table 1 are optimal, just that they are typical of the reduction that can be obtained by modest computation. We defer discussion of the last line to Section 4, but note that the first line already implies that a defining set of 1,008 concepts will cover all senses of the high frequency items in the major Western branches of IE, and to cover the first (primary) sense of each word in LDOCE 361 words suffice.

                                #words    FVS
  4lang (all senses)            31,192  1,008
  4lang (first senses)           3,127    361
  LDOCE (all senses)            79,414  1,061
  LDOCE (first senses)          34,284    376
  CED (all senses)             154,061  6,490
  CED (first senses)            80,495  3,435
  en.wiktionary (all senses)   369,281  2,504
  en.wiktionary (first senses) 304,029  1,845
  formal                         2,754    129

Table 1: Properties of four different dictionaries

While a feedback vertex set is guaranteed to exist for any digraph (if all else fails, the entire set of vertices will do), it is not guaranteed that there exists one that is considerably smaller than the entire graph. (For random digraphs in general see Dutta and Subramanian 2010, for highly symmetrical lattices see Zhou 2013 ms.) In random digraphs under relatively mild conditions on the proportion of edges relative to nodes, Łuczak and Seierstad (2009) show that a strong component essentially the size of the entire graph will exist. Fortunately, digraphs built on definitions are not at all behaving in a random fashion, the strongly connected components are relatively small, as Table 1 makes evident. For example, in the English Wiktionary, 369,281 definitions can be reduced to a core set of 2,504 defining words, and in CED we can find a defining set of 6,490 words, even though these dictionaries, unlike LDOCE, were not built using an explicit defining set.
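A minimal sketch of the kind of greedy heuristic described above (drop words from the defining set whenever everything in their definition is still available, trying rare words first) might look as follows; the function and variable names are illustrative assumptions, not the actual uroboros.py implementation.

```python
# Sketch of the greedy defining-set (feedback vertex set) heuristic described
# above; data structures and the frequency ordering are assumptions.
def defining_set(definitions, freq):
    """definitions: dict mapping each word to the set of words used in its
    definition (an edge runs from the definiendum to each definiens);
    freq: dict mapping words to corpus frequency.
    Returns a set of words from which every other word can be defined."""
    eliminated = set()
    # true primitives (empty definition) can never be eliminated
    candidates = [w for w, d in definitions.items() if d]
    # try to eliminate low-frequency words first, keeping frequent ones
    for w in sorted(candidates, key=lambda w: freq.get(w, 0)):
        # w can be dropped from the defining set only if every word in its
        # definition is still available, i.e. has not been eliminated itself
        if all(d != w and d not in eliminated for d in definitions[w]):
            eliminated.add(w)
    return set(definitions) - eliminated
```

Since an eliminated word only ever points to words eliminated later (or kept), the eliminated subgraph is acyclic, so the remaining words form a valid defining set, though, as noted above, not necessarily a minimal one.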

Since LDOCE pioneered the idea of actively limiting the defining vocabulary, it is no great surprise that it has a small feedback vertex set, though everyday users of the LDV may be somewhat surprised that less than half (1,061 items) of the full defining set (over 2,200 items) are needed.

We also experimented with an early (pre-COBUILD) version of the Collins English Dictionary (CED), as this is more representative of the traditional type of dictionaries which didn't rely on a defining vocabulary. In 154,061 definitions, 65,891 words are used, but only 15,464 of these are not in LDOCE. These words appear in less than 10% of Collins definitions, meaning that using LDOCE as an intermediary the LDV is already sufficient for defining over 90% of the CED word senses.

[Figure 1: Original definition of naphtha]

An example of a CED defining word missing not just from LDV but the entire LDOCE would be aigrette 'a long plume worn on hats or as a headdress, esp. one of long egret feathers'. This number could be improved to about 93% by more detailed parsing of the CED definitions. For example, aigrette actually appears as a crossreference in the definition of egret, and deleting the cross-reference would not alter the sense of egret being defined. The remaining cases would require better morphological parsing of latinate terms than we currently have access to: for now, many definitions cannot be automatically simplified because the system is unaware that e.g. nitrobacterium is the singular of nitrobacteria.

[Figure 2: Reduced definition of naphtha]

Manually spot-checking 2% of the remaining CED words used in definitions found over 75% latinate technical terms, but no instances of undefinable non-technical senses that would require extending the LDV. This is not to say that every sense of every nontechnical word of English is listed in LDOCE, but inspecting even more comprehensive dictionaries such as the Concise Oxford Dictionary or Webster's 3rd makes it clear that their definitions use largely words which are themselves covered by LDOCE. Thus, if we see a definition such as naphtha 'kinds of inflammable oil got by dry distillation of organic substances as coal, shale, or petroleum' we can be nearly certain that words like inflammable which are not part of the LDV will nevertheless be definable in terms of it, in this case as 'materials or substances that will start to burn very easily'.

The reduction itself is not a trivial task, in that a simplified definition of naphtha such as 'kinds of oils that will start to burn very easily and are produced by dry distillation . . . ' can eliminate inflammable only if we notice that the 'oil' in the definition of naphtha is the 'material or substance' in the definition of inflammable. Similarly, we have to understand that 'got' was used in the sense obtained or produced, that dry distillation is a single concept 'the heating of solid materials to produce gaseous products' that is not built compositionally from dry and distillation in spite of being written as two separate words, and so forth. Automated detection and resolution of these and similar issues remain challenging NLP tasks, but from a competence perspective it is sufficient to note that manual substitution is performed effortlessly and near-uniformly by native speakers.
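The substitution step itself can be pictured with a deliberately naive sketch like the one below, which simply splices in the first-sense definition of every token that falls outside the defining vocabulary. All names here are hypothetical, and, as just discussed, a realistic reducer would also need sense selection, multiword concepts such as dry distillation, and morphological analysis, none of which is handled.

```python
# Naive recursive reduction of a definition to the defining vocabulary.
# 'definitions' maps a word to the token list of its first-sense definition;
# cycles are cut off by a crude depth limit. Illustrative sketch only.
def reduce_definition(word, definitions, defining_vocab, depth=10):
    reduced = []
    for token in definitions[word]:
        if token in defining_vocab or token not in definitions or depth == 0:
            reduced.append(token)
        else:
            # e.g. 'inflammable' is replaced by 'materials or substances
            # that will start to burn very easily'
            reduced.extend(
                reduce_definition(token, definitions, defining_vocab, depth - 1))
    return reduced
```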

2.2 Learnability in CVS semantics

The reductive theory of vocabulary acquisition is a highly idealized one, for surely children don't learn the meaning of sharp by their parents telling them it means 'having a thin cutting edge or point'. Yet it is clear that computers that lack a sensory system that would deliver intense signals upon encountering sharp objects can nevertheless acquire something of the meaning by pure deduction (assuming also that they are programmed to know that cutting one's body will CAUSE PAIN) and further, the dominant portion of the vocabulary is not connected to direct sensory signals but is learned from context (see Chapter 6 of McKeown and Curtis 1987).

This brings us to CVS semantics, where learning theory is idealized in a very different way, by assuming that the learner has access to very large corpora, gigaword and beyond. We must agree with Miller and Chomsky (1963) that in real life a child exposed to a word every second would require over 30 years to hear gigaword amounts, but we take this to be a reflection of the weak inferencing ability of current statistical models, for there is nothing in the argument that says that models that are more efficient in extracting regularities can't learn these from orders of magnitude less data, especially as children are known to acquire words based on a single exposure. For now, such one-shot learning remains something of an ideal, in that CVS systems prune infrequent words (Collobert et al., 2011; Mikolov et al., 2013a; Luong et al., 2013), but it is clear that both CVS and ACR have the beginnings of a feasible theory of learning, while the classical theory of meaning postulates offers nothing of the sort, not even for the handful of lexical items (tense and aspect markers in particular, see Dowty 1979) where the underlying logic has the resources to express these.

3 Lexical relatedness

Ordinary dictionary definitions can be mined to recover the conceptual entailments that are at the heart of lexical semantic competence. Whatever naphtha is, knowing that it is inflammable is sufficient for knowing that it will start to burn easily. It is a major NLP challenge to make this deduction (Dagan et al. 2006), but ACR can store the information trivially and make the inference by spreading activation.

We implemented one variant of the ACR theory of word meaning by a network of Eilenberg machines (Eilenberg, 1974) corresponding to elements of the reduced vocabulary. Eilenberg machines are a simple generalization of the better known finite state automata (FSA) and transducers (FSTs) that have become standard since Koskenniemi (1983) in describing the rule-governed aspects of the lexicon, morphotactics and morphophonology (Huet and Razet, 2008; Kornai, 2010). The methods we use for defining word senses (concepts) are long familiar from Knowledge Representation. We assume the reader is familiar with the knowledge representation literature (for a summary, see Brachman and Levesque 2004), and describe only those parts of the system that differ from the mainstream assumptions. In particular, we collapse attribution, unary predication, and IS_A links in a single link type '0' (as in Figs. 1-2 above) and have only two other kinds of links to distinguish the arguments of transitive verbs, '1' corresponding to subject/agent, and '2' to object/patient. The treatment of other link types, be they construed as grammatical functions or as deep cases or even thematic slots, is deferred to Section 4.

By creating graphs for all LDOCE headwords based on dependency parses of their definitions (the 'literal' network of Table 1) using the unlexicalized version of the Stanford Dependency Parser (Klein and Manning, 2003), we obtained measures of lexical relatedness by defining various similarity metrics over pairs of such graphs. The intuition underlying all these metrics is that two words are semantically similar if their definitions overlap in (i) the concepts present in their definitions (e.g. the definition of both train and car will make reference to the concept vehicle) and (ii) the binary relations they take part in (e.g. both street and park are IN town). While such a measure of semantic similarity builds more on manual labor (already performed by the lexicographers) than those gained from state-of-the-art CVS systems, recently the results from the 'literal' network have been used in a competitive system for measuring semantic textual similarity (Recski and Ács, 2015). In Section 4 we discuss the 'formal' network of Table 1 built directly on the concept formulae. By spectral dimension reduction of the incidence matrix of this network we can create an embedding that yields results on word similarity tasks comparable to those obtained from corpus-based embeddings (Makrai et al., 2013).
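The graph-overlap intuition behind these metrics can be sketched as below, scoring a word pair by the overlap of the concept nodes and of the labelled edges extracted from the two definition graphs; the Jaccard formulation and the equal weighting are illustrative choices, not the exact metrics used in the experiments reported here.

```python
# Schematic graph-overlap similarity in the spirit described above.
# Each definition graph is given as (nodes, edges), where nodes is a set of
# concept labels and edges is a set of (source, link_type, target) triples.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 0.0

def definition_similarity(graph1, graph2, alpha=0.5):
    nodes1, edges1 = graph1
    nodes2, edges2 = graph2
    concept_overlap = jaccard(nodes1, nodes2)    # e.g. both mention 'vehicle'
    relation_overlap = jaccard(edges1, edges2)   # e.g. both contain an 'IN town' edge
    return alpha * concept_overlap + (1 - alpha) * relation_overlap
```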

CVS models can be explicitly tested on their ability to recover synonymy by searching for the nearest word in the sample (Mikolov et al., 2013b); antonymy by reversing the sign of the vector (Zweig, 2014); and in general for all kinds of analogical statements such as king is to queen as man is to woman by vector addition and subtraction (Mikolov et al., 2013c); not to speak of cross-language paraphrase/translation (Schwenk et al., 2012), long viewed as a key intermediary step toward explaining competence in a foreign language.

Currently, CVS systems are clearly in the lead on such tasks, and it is not clear what, if anything, can be salvaged from the truth-conditional approach to these matters. At the same time, the CVS approach to quantifiers is not mature, and ACR theories support generics only. These may look like backward steps, but keep in mind that our goal in competence modeling is to characterize everyday knowledge, shared by all competent speakers of the language, while quantifier and modal scope ambiguities are something that ordinary speakers begin to appreciate only after considerable schooling in these matters, with significant differences between the naive (preschool) and the learned adult systems (É. Kiss et al., 2013). On the traditional account, only subsumption (IS_A or '0') links can be easily recovered from the meaning postulates; the cognitively central similarity (as opposed to exact synonymy) relations receive no treatment whatsoever, since similarity of meaning postulates is undefined.

4 Lexical lookup

The interaction with compositional semantics is a key issue for any competence theory of lexical semantics. In the classical formal system, this is handled by a mechanism of lexical lookup that substitutes the meaning postulates at the terminal nodes of the derivation tree, at the price of introducing some lexical redundancy rule that creates the intensional meaning of each word, including the evidently non-intensional ones, based on the meaning postulates that encode the extensional meaning. (Ch. 19.2 of Jacobson (2014) sketches an alternative treatment, which keeps intensionality for the intended set of cases.) While there are considerable technical difficulties of formula manipulation involved, this is really one area where the classical theory shines as a competence theory – we cannot even imagine creating a learning algorithm that would cover the meaning of infinitely many complex expressions unless we had some means of combining the meanings of the lexical entries.

CVS semantics offers several ways of combining lexical entries, the simplest being simply adding the vectors together (Mitchell and Lapata, 2008), but the use of linear transformations (Lazaridou et al., 2013) and tensor products (Smolensky, 1990) has also been contemplated. Currently, an approach that combines the vectors of the parts to form the vector of the whole by recurrent neural nets appears to work best (Socher et al., 2013), but this is still an area of intense research and it would be premature to declare this method the winner. Here we concentrate on ACR, investigating the issue of the inventory of graph edge colors on the same core vocabulary as discussed above. The key technical problem is to bring the variety of links between verbs and their arguments under control: as Woods (1975) already notes, the naive ACR theories are characterized by a profusion of link types (graph edge colors).

We created a version of ACR that is limited to three link types. Both the usual network representations (digraphs, as in Figs. 1 and 2 above) and a more algebraic model composed of extended finite state automata (Eilenberg machines) are produced by parsing formulas defined by a formal grammar summarized in Figure 3. For ease of reading, in unary predication (e.g. mouse -0-> rodent) we permit both prefix and suffix order, but with different kinds of parens: mouse[rodent] and rodent(mouse); and we use infix notation (cow MAKE milk) for transitives (cow <-1- MAKE -2-> milk, link types '1' and '2').

[Figure 3: The syntax of the definitions]

The right column of Figure 3 shows the digraph obtained from parsing the formula on the right hand side of the grammar rules. There are no '3' or higher links, as ditransitives like x give y to z are decomposed at the semantic level into unary and binary atoms, in this case CAUSE and HAVE, 'x cause (z have y)', see Kornai (2012) for further details. A digraph representing the whole lexicon was built in two steps: first, every clause in definitions was manually translated to a formula (which in turn is automatically translated into a digraph), then the digraphs were connected by unifying nodes that have the same label and no outgoing edges.
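A toy rendering of how such clause formulas might be turned into labelled edges and merged into one lexicon graph is given below; the edge representation and the merging rule are simplifications assumed for illustration, while the actual parser (part of the kornai/4lang repository) implements the full grammar of Figure 3.

```python
# Toy sketch: clause formulas as labelled edges (0 = IS_A/attribution,
# 1 = subject/agent, 2 = object/patient), merged into one edge set.
def clause(source, links):
    """links: list of (link_type, target_label) pairs; every edge in the
    clause starts at the node labelled 'source'."""
    return {(source, link, target) for link, target in links}

# mouse[rodent]   ->  a single '0' edge from mouse to rodent
mouse = clause('mouse', [(0, 'rodent')])
# cow MAKE milk   ->  a '1' edge to the agent and a '2' edge to the patient
make = clause('MAKE', [(1, 'cow'), (2, 'milk')])

def merge(graphs):
    """Union the edge sets; since nodes are bare labels here, leaves that
    carry the same label are identified automatically."""
    return set().union(*graphs)

lexicon = merge([mouse, make])
# {('mouse', 0, 'rodent'), ('MAKE', 1, 'cow'), ('MAKE', 2, 'milk')}
```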

The amount of manual labor involved was considerably lessened by the method of Section 3 that finds the feedback vertex set, in that once such a set is given, the rest could be built automatically. This gives us a means of investigating the prevalence of what would become different deep cases (colors, link types) in other KR systems. Deep cases are distinguishers that mediate between the purely semantic (theta) link types and the surface case/adposition system. We have kept our system of deep cases rather standard, both in the sense of representing a common core among the many proposals starting with Gruber (1965) and Fillmore (1968) and in the sense of aiming at universality, a subject we defer to the next section. The names and frequency of use in the core vocabulary are given in Table 2. The results are indicative of a primary (agent/patient, what we denote '1'/'2'), a secondary (DAT/REL/POSS), and a tertiary (locative) layer in deep cases – how these are mapped on language-specific (surface) cases will be discussed in Section 5.

  freq  abbreviation  comment
  487   AGT           agent
  388   PAT           patient
   34   DAT           dative
   82   REL           root or adpositional object
   70   POSS          default for relational nouns
   20   TO            target of action
   15   FROM          source of action
    3   AT            location of action

Table 2: Deep cases

To avoid problems with multiple word senses and with constructional meaning (as in dry distillation or dry martini) we defined each entry in this formal language (keeping different word senses such as light/739 'the opposite of dark' and light/1381 'the opposite of heavy' distinct by disambiguation indexes) and built a graph directly on the resulting conceptual network rather than the original LDOCE definitions. The feedback vertex set algorithm uroboros.py determined that a core set of 129 concepts is sufficient to define the others in the entire concept dictionary, and thus for the entire LDOCE or similar dictionaries such as CED or Webster's 3rd. This upper bound is so close to the NSM lower bound of 65 that a blow-by-blow comparison would be justified.

5 Universality

The final issue one needs to investigate in assessing the potential of any purported competence theory is that of universality versus language particularity. For CVS theories, this is rather easy: we have one system of representation, finite dimensional vector spaces, which admits no typological variation, let alone language-specific mechanisms – one size fits all. As linguists, we see considerable variation among the surface, and possibly even among the deeper aspects of case linking (Smith, 1996), but as computational modelers we lack, as of yet, a better understanding of what corresponds to such mechanisms within CVS semantics.

ACR systems are considerably more transparent in this regard, and the kind of questions that we would want to pose as linguists have direct reflexes in the formal system. Many of the original theories of conceptual representation were English-particular, sometimes to the point of being as naive as the medieval theories of universal language (Eco, 1995). The most notable exception is NSM, clearly developed with the native languages of Australia in mind, and often exercised on Russian, Polish, and other IE examples as well. Here we follow the spirit of GFRG (Ranta, 2011) in assuming a common abstract syntax for all languages. For case grammar this requires some abstraction, for example English NPs must also get case marked (an idea also present in the 'Case Theory' of Government-Binding and related theories of transformational grammar). The main difference between English and the overtly case-marking languages such as Russian or Latin is that in English we compute the cases from prepositions and word order (position relative to the verb) rather than from overt morphological marking as standard. This way, the lexical entries can be kept highly abstract, and for the most part, universal. Thus the verb go will have a source and a goal. For every language there is a langspec component of the lexicon which stores e.g. for English the information that source is expressed by the preposition from and destination by to. For Hungarian the langspec file stores the information that source can be linked by delative, elative, and ablative; goal by illative, sublative, or terminative. Once this kind of language-specific variation is factored out, the go entry becomes before AT src, after AT goal. The same technique is used to encode both lexical entries and constructions in the sense of Berkeley Construction Grammar (CxG, see Goldberg 1995).

Whether two constructions (in the same language or two different languages) have to be coded by different deep cases is measured very badly, if at all, by the standard test suites used e.g. in paraphrase detection or question answering, and we would need to invest serious effort in building new test suites. For example, the system sketched above uses the same deep case, REL, for linking objects that are surface marked by quirky case and for arguments of predicate nominals. Another example is the dative/experiencer/beneficent family. Whether the experiencer cases familiar from Korean and elsewhere can be subsumed under the standard dative role (Fillmore, 1968) is an open question, but one that can at least be formulated in ACR. Currently we distinguish the dative DAT from possessive marking POSS, generally not considered a true case but quite prevalent in this function language after language: consider English (the) root of a tree, or Polish korzeń drzewa. This is in contrast to the less frequent cases like (an excellent) occasion for martyrdom marked by obliques (here the preposition for). What these nouns (occasion, condition, reason, need) have in common is that the related word is goal of the definiendum in some sense. In these cases we use TO rather than POSS, a decision with interesting ramifications elsewhere in the system, but currently below the sensitivity of the standard test sets.
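Returning to the langspec component introduced earlier in this section, it can be pictured as a simple per-language table mapping deep cases to their surface exponents. The dictionary below is only an illustrative sketch of such a table, not the actual 4lang file format, and all names in it are assumptions.

```python
# Illustrative sketch of a langspec table: deep cases -> surface marking.
LANGSPEC = {
    'en': {'src':  {'preposition': ['from']},
           'goal': {'preposition': ['to']}},
    'hu': {'src':  {'case': ['delative', 'elative', 'ablative']},
           'goal': {'case': ['illative', 'sublative', 'terminative']}},
}

def surface_markers(lang, deep_case):
    """Return the language-specific exponents of an abstract deep case,
    e.g. surface_markers('en', 'src') -> {'preposition': ['from']}."""
    return LANGSPEC.get(lang, {}).get(deep_case, {})
```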

6 Conclusion

It is not particularly surprising that both CVS and ACR, originally designed as performance theories, fare considerably better in the performance realm than Montagovian semantics, especially as detailed intensional lexica have never been crafted, and Dowty (1979) remains, to this day, the path not taken in formal semantics. It is only on the subdomain of the logic puzzles involving Booleans and quantification that Montagovian solutions showed any promise, and these, with the exception of elementary negation, do not even appear in more down-to-earth evaluation sets such as Weston et al. (2015). The surprising conclusion of our work is that standard Montagovian semantics also falls short in the competence realm, where the formal theory has long been promoted as offering psychological reality.

We have compared CVS and ACR theories of lexical semantics to the classical approach based on meaning postulates by the usual criteria for competence theories. In Section 2 we have seen that both ACR and CVS are better in terms of learnability than the standard formal theory, and it is worth noting that the number of ACR primitives, 129 in the version implemented here, is less than the dimensions of the best performing CVS embeddings, 150-300 after data compression by PCA or similar methods. In Section 3 we have seen that lexical relatedness tasks also favor ACR and CVS over the meaning postulate approach (for a critical overview of meaning postulates in model-theoretic semantics see Zimmermann 1999), and in Section 4 we have seen that compositionality poses no problems for ACR. How compositional semantics is handled in CVS semantics remains to be seen, but the problem is not a dearth of plausible mechanisms, but rather an overabundance of these.

Acknowledgments

The 4lang conceptual dictionary is the work of many people over the years. The name is no longer quite justified, in that natural language bindings, automatically generated and thus not entirely free of errors and omissions, now exist for 50 languages (Ács et al., 2013), many of them outside the Indoeuropean and Finnougric families originally targeted. More important, formal definitions that rely on less than a dozen theoretical primitives (including the three link types) and only 129 core concepts are now available for all the concepts. So far, only the theoretical underpinnings of this formal system have been fully described in English (Kornai, 2010; Kornai, 2012), with many details presented only in Hungarian (Kornai and Makrai, 2013), but the formal definitions, and the parser that can build both graphs and Eilenberg machines from these, are now available as part of the kornai/4lang github repository. These formal definitions were written primarily by Makrai, with notable contributions by Recski and Nemeskey.

The system has been used as an experimental platform for a variety of purposes, including quantitative analysis of deep cases by Makrai, who developed the current version of the deep case system with Nemeskey (Makrai, 2015); for defining lexical relatedness (Makrai et al., 2013; Recski and Ács, 2015); and in this paper, for finding the definitional core, the feedback vertex set.

Ács wrote the first version of the feedback vertex set finder which was adapted to our data by Ács and Pajkossy, who also took part in the computational experiments, including preprocessing the data, adapting the vertex set finder, and running the experiments. Recski created the pipeline in the http://github/kornai/4lang repository that builds formal definitions from English dictionary entries. Kornai advised and wrote the paper. We are grateful to Attila Zséder (HAS Linguistics Institute) for writing the original parser for the formal language of definitions and to András Gyárfás (HAS Rényi Institute) for help with feedback vertex sets. Work supported by OTKA grant #82333.

References

Judit Ács, Katalin Pajkossy, and András Kornai. 2013. Building basic vocabulary across 40 languages. In Proceedings of the Sixth Workshop on Building and Using Comparable Corpora, pages 52–58, Sofia, Bulgaria, August. ACL.

Avery Andrews. 2015. Reconciling NSM and formal semantics. Ms., v2, January 2015.

Marco Baroni. 2013. Composition in distributional semantics. Language and Linguistics Compass, 7(10):511–522.

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155.

Branimir K. Boguraev and Edward J. Briscoe. 1989. Computational Lexicography for Natural Language Processing. Longman.

R.J. Brachman and H. Levesque. 2004. Knowledge Representation and Reasoning. Morgan Kaufman Elsevier, Los Altos, CA.

Thorsten Brants and Alex Franz. 2006. Web 1T 5-gram Version 1. Linguistic Data Consortium, Philadelphia.

Noam Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press.

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research (JMLR).

Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006. The PASCAL recognising textual entailment challenge. In Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognising Textual Entailment, volume 3944 of LNCS, pages 177–190. Springer.

David Dowty. 1979. Word Meaning and Montague Grammar. Reidel, Dordrecht.

Kunal Dutta and C. R. Subramanian. 2010. Induced acyclic subgraphs in random digraphs: Improved bounds. In Discrete Mathematics and Theoretical Computer Science, pages 159–174.

Katalin É. Kiss, Mátyás Gerőcs, and Tamás Zétényi. 2013. Preschoolers' interpretation of doubly quantified sentences. Acta Linguistica Hungarica, 60:143–171.

Umberto Eco. 1995. The Search for the Perfect Language. Blackwell, Oxford.

Samuel Eilenberg. 1974. Automata, Languages, and Machines, volume A. Academic Press.

Charles Fillmore and Sue Atkins. 1998. FrameNet and lexicographic relevance. In Proceedings of the First International Conference on Language Resources and Evaluation, Granada, Spain.

Charles Fillmore. 1968. The case for case. In E. Bach and R. Harms, editors, Universals in Linguistic Theory, pages 1–90. Holt and Rinehart, New York.

John R. Firth. 1957. A synopsis of linguistic theory. In Studies in linguistic analysis, pages 1–32. Blackwell.

Adele E. Goldberg. 1995. Constructions: A Construction Grammar Approach to Argument Structure. University of Chicago Press.

Andrew S. Gordon, Jerry R. Hobbs, and Michael T. Cox. 2011. Anthropomorphic self-models for metareasoning agents. In Michael T. Cox and Anita Raja, editors, Metareasoning: Thinking about Thinking, pages 295–305. MIT Press.

Paul Grice and Peter Strawson. 1956. In defense of a dogma. The Philosophical Review, 65:148–152.

Jeffrey Steven Gruber. 1965. Studies in lexical relations. Ph.D. thesis, Massachusetts Institute of Technology.

J.R. Hobbs. 2008. Deep lexical semantics. Lecture Notes in Computer Science, 4919:183.

Gérard Huet and Benoît Razet. 2008. Computing with relational machines. In Tutorial at ICON, Dec 2008.

Pauline Jacobson. 2014. Compositional Semantics. Oxford University Press.

Richard M. Karp. 1972. Reducibility among combinatorial problems. In R. Miller and J.W. Thatcher, editors, Complexity of Computer Computations, pages 85–104. Plenum Press, New York.

Karin Kipper, Hoa Trang Dang, and Martha Palmer. 2000. Class based construction of a verb lexicon. In AAAI-2000 Seventeenth National Conference on Artificial Intelligence, Austin, TX.

Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, pages 423–430.

András Kornai and Márton Makrai. 2013. A 4lang fogalmi szótár. In Attila Tanács and Veronika Vincze, editors, IX. Magyar Számítógépes Nyelvészeti Konferencia, pages 62–70.

András Kornai. 2010. The algebra of lexical semantics. In Christian Ebert, Gerhard Jäger, and Jens Michaelis, editors, Proceedings of the 11th Mathematics of Language Workshop, LNAI 6149, pages 174–199. Springer.

András Kornai. 2012. Eliminating ditransitives. In Ph. de Groote and M-J Nederhof, editors, Revised and Selected Papers from the 15th and 16th Formal Grammar Conferences, LNCS 7395, pages 243–261. Springer.

Kimmo Koskenniemi. 1983. Two-level model for morphological analysis. In Proceedings of IJCAI-83, pages 683–685.

Angeliki Lazaridou, Marco Marelli, Roberto Zamparelli, and Marco Baroni. 2013. Compositional-ly derived representations of morphologically complex words in distributional semantics. In ACL (1), pages 1517–1526.

Tomasz Łuczak and Taral Guldahl Seierstad. 2009. The critical behavior of random digraphs. Random Structures and Algorithms, 35:271–293.

Thang Luong, Richard Socher, and Christopher D. Manning. 2013. Better word representations with recursive neural networks for morphology. In CoNLL, pages 104–113.

Márton Makrai, Dávid Márk Nemeskey, and András Kornai. 2013. Applicative structure in vector space models. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, pages 59–63, Sofia, Bulgaria, August. ACL.

Márton Makrai. 2015. Deep cases in the 4lang concept lexicon. In Attila Tanács, Viktor Varga, and Veronika Vincze, editors, X. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2014), pages 50–57 (in Hungarian), 387 (English abstract).

Margaret G. McKeown and Mary E. Curtis. 1987. The nature of vocabulary acquisition. Lawrence Erlbaum Associates.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. In Y. Bengio and Y. LeCun, editors, Proc. ICLR 2013.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013c. Linguistic regularities in continuous space word representations. In Proceedings of NAACL-HLT-2013, pages 746–751.

George A. Miller and Noam Chomsky. 1963. Finitary models of language users. In R.D. Luce, R.R. Bush, and E. Galanter, editors, Handbook of Mathematical Psychology, pages 419–491. Wiley.

George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In Proceedings of ACL-08: HLT, pages 236–244, Columbus, Ohio. Association for Computational Linguistics.

T. M. Mitchell, S.V. Shinkareva, A. Carlson, K.M. Chang, V.L. Malave, R.A. Mason, and M.A. Just. 2008. Predicting human brain activity associated with the meanings of nouns. Science, 320(5880):1191.

C.K. Ogden. 1944. Basic English: A General Introduction with Rules and Grammar. Psyche miniatures: General Series. Kegan Paul, Trench, Trubner.

Charles E. Osgood, William S. May, and Murray S. Miron. 1975. Cross Cultural Universals of Affective Meaning. University of Illinois Press.

Barbara H. Partee. 1979. Semantics - mathematics or psychology? In R. Bäuerle, U. Egli, and A. von Stechow, editors, Semantics from Different Points of View, pages 1–14. Springer-Verlag, Berlin.

Barbara Partee. 2013. Changing perspectives on the 'mathematics or psychology' question. In Philosophy Workshop on "Semantics: Mathematics or Psychology?".

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).

H. Putnam. 1976. Two dogmas revisited. Printed in his (1983) Realism and Reason, Philosophical Papers, 3.

M. Ross Quillian. 1968. Word concepts: A theory and simulation of some basic semantic capabilities. Behavioral Science, 12:410–430.

Willard van Orman Quine. 1951. Two dogmas of empiricism. The Philosophical Review, 60:20–43.

Aarne Ranta. 2011. Grammatical Framework: Programming with Multilingual Grammars. CSLI Publications, Stanford.

Gábor Recski and Judit Ács. 2015. MathLingBudapest: Concept networks for semantic similarity. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, CO, June. ACL.

Roger C. Schank. 1972. Conceptual dependency: A theory of natural language understanding. Cognitive Psychology, 3(4):552–631.

Holger Schwenk, Anthony Rousseau, and Mohammed Attik. 2012. Large, pruned or continuous space language models on a GPU for statistical machine translation. In Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT, pages 11–19. Association for Computational Linguistics.

John M. Sinclair. 1987. Looking up: an account of the COBUILD project in lexical computing. Collins ELT.

Henry Smith. 1996. Restrictiveness in Case Theory. Cambridge University Press.

Paul Smolensky. 1990. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1):159–216.

R. Socher, M. Ganjoo, H. Sridhar, O. Bastani, C. D. Manning, and A. Y. Ng. 2013. Zero-shot learning through cross-modal transfer. In International Conference on Learning Representations (ICLR).

Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394. Association for Computational Linguistics.

Jason Weston, Antoine Bordes, Sumit Chopra, and Tomas Mikolov. 2015. Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv:1502.05698.

William Dwight Whitney. 1885. The roots of the Sanskrit language. Transactions of the American Philological Association (1869-1896), 16:5–29.

Anna Wierzbicka. 1985. Lexicography and conceptual analysis. Karoma, Ann Arbor.

William A. Woods. 1975. What's in a link: Foundations for semantic networks. Representation and Understanding: Studies in Cognitive Science, pages 35–82.

Hai-Jun Zhou. 2013. Spin glass approach to the feedback vertex set problem. Ms., arxiv.org/pdf/1307.6948v2.pdf.

Thomas E. Zimmermann. 1999. Meaning postulates and the model-theoretic approach to natural language semantics. Linguistics and Philosophy, 22:529–561.

Geoffrey Zweig. 2014. Explicit representation of antonymy in language modeling. Technical report, Microsoft Research.
