Vagueness and Learning: A Type-Theoretic Approach

Raquel Fernández, Institute for Logic, Language and Computation, University of Amsterdam, [email protected]
Staffan Larsson, Department of Philosophy, Linguistics and Theory of Science, University of Gothenburg, [email protected]

Abstract

We present a formal account of the meaning of vague scalar adjectives such as 'tall' formulated in Type Theory with Records. Our approach makes precise how perceptual information can be integrated into the meaning representation of these predicates; how an agent evaluates whether an entity counts as tall; and how the proposed semantics can be learned and dynamically updated through experience.

1 Introduction

Traditional semantic theories such as those described in Partee (1989) and Blackburn and Bos (2005) offer precise accounts of the truth-conditional content of linguistic expressions, but do not deal with the connection between meaning, perception and learning. One can argue, however, that part of getting to know the meaning of linguistic expressions consists in learning to identify the individuals or the situations that the expressions can describe. For many concrete words and phrases, this identification relies on perceptual information. In this paper, we focus on characterising the meaning of vague scalar adjectives such as 'tall', 'dark', or 'heavy'. We propose a formal account that brings together notions from traditional formal semantics with perceptual information, which allows us to specify how a logic-based interpretation function is determined and modified dynamically by experience.

The need to integrate language and perception has been emphasised by researchers working on the generation and resolution of referring expressions (Kelleher et al., 2005; Reiter et al., 2005; Portet et al., 2009) and, perhaps even more strongly, on the field of robotics, where grounding language on perceptual information is critical to allow artificial agents to autonomously acquire and verify beliefs about the world (Siskind, 2001; Steels, 2003; Roy, 2005; Skocaj et al., 2010). Most of these approaches, however, do not build on theories of formal semantics for natural language. Here we choose to formalise our account in a theoretical framework known as Type Theory with Records (TTR), which has been shown to be suitable for formalising classic semantic aspects such as intensionality, quantification, and negation (Cooper, 2005a; Cooper, 2010; Cooper and Ginzburg, 2011) as well as less standard phenomena such as linguistic interaction (Ginzburg, 2012; Purver et al., 2014), perception and action (Dobnik et al., 2013), and semantic coordination and learning (Larsson, 2009). In this paper we use TTR to put forward an account of the semantics of vague scalar predicates like 'tall' that makes precise how perceptual information can be integrated into their meaning representation; how an agent evaluates whether an entity counts as tall; and how the proposed semantics for these expressions can be learned and dynamically updated through language use.

We start by giving a brief overview of TTR and explaining how it can be used for classifying entities as being of particular types integrating perceptual information. After that, in Section 3, we describe the main properties of vague scalar predicates. Section 4 presents a probabilistic TTR formalisation of the meaning of 'tall', which captures its context-dependence and its vague character. In Section 5, we then offer an account of how that meaning representation is acquired and updated with experience. Finally, in Section 6 we discuss related work, before concluding in Section 7.

This work is licensed under a Creative Commons Attribution 4.0 International Licence. Page numbers and proceedings footer are added by the organisers. Licence details: http://creativecommons.org/licenses/by/4.0/

Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014), pages 151–159, Dublin, Ireland, August 23-24 2014.

2 Meaning as Classification in TTR

In this section we give a brief and hence inevitably partial introduction to Type Theory with Records. For more comprehensive introductions, we refer the reader to Cooper (2005b) and Cooper (2012).

2.1 Type Theory with Records: Main Notions

As in any type theory, the most central notion in TTR is that of a judgement that an object a is of type T, written as a : T. In TTR judgements are seen as fundamentally related to perception, in the sense that perceiving inherently involves categorising what we perceive. Some common basic types in TTR are Ind (the type of individuals) and R+ (the type of positive real numbers). All basic types are members of a special type Type. Given types T1 and T2, we can create the function type T1 → T2 whose domain are objects of type T1 and whose range are objects of type T2. Types can also be constructed from predicates and objects, P(a1, ..., an). Such types are called ptypes and correspond roughly to propositions in first-order logic. In TTR, propositions are types of proofs, where proofs can be a variety of things, from situations to sensor readings (more on this below).

Next, we introduce records and record types. These are structured objects made up of pairs ⟨ℓ, v⟩ of labels and values that are displayed in a matrix:

(1) a. A record type:
        [ ℓ1 : T1
          ℓ2 : T2(ℓ1)
          ...
          ℓn : Tn(ℓ1, ℓ2, ..., ℓn−1) ]

    b. A record:
        r = [ ℓ1 = a1
              ℓ2 = a2
              ...
              ℓn = an
              ... ]

Record r in (1b) is of the record type in (1a) if and only if a1 : T1, a2 : T2(a1), ..., and an : Tn(a1, a2, ..., an−1). Note that the record may contain more fields but would still be of type (1a) if the typing condition holds. Records and record types can be nested so that the value of a label is itself a record (or record type). We can use paths within a record or record type to refer to specific bits of structure: for instance, we can use r.ℓ2 to refer to a2 in (1b).

As can be seen in (1a), the labels ℓ1, ..., ℓn in a record type can be used elsewhere to refer to the values associated with them. This is a common way of constructing ptypes where the arguments of a predicate are entities that have been introduced before in the record type. A sample record and record type are shown in (2).

(2) [ x = a                    [ x : Ind
      c_man = prf(man(a))   :    c_man : man(x)
      c_run = prf(run(a)) ]      c_run : run(x) ]

In (2), a is an entity of type individual and prf(P) is used as a placeholder for proofs of ptypes P. In the record type above, the ptypes man(x) and run(x) constructed from predicates are dependent on x (introduced earlier in the record type).

2.2 Perceptual Meaning

Larsson (2013) proposes a system formalised in TTR where some perceptual aspects of meaning are represented using classifiers. For example, the meaning of 'right' (as in 'to the right of') involves a two-input perceptron classifier κ_right(w, t, r), specified by a weight vector w and a threshold t, which takes as input a context r including an object x and a position-sensor reading sr_pos. The sensor reading consists of a vector containing two real numbers representing the space coordinates of x. The classifier classifies x as either being to the right on a plane or not.¹

(3) if r : [ x : Ind, sr_pos : RealVector ], then
    κ_right(w, t, r) = right(r.x)   if (r.sr_pos · w) > t
                       ¬right(r.x)  otherwise

As output we get a record type containing either a ptype right(x) or its negation, ¬right(x). Larsson (2013) proposes that readings from sensors may count as proofs of such ptypes. A classifier can be used for judging x as being of a particular type on the grounds of perceptual information. A perceptual proof for right(x) would thus include the output from the position sensor that is directed towards x. Here, this output would be the space coordinates of x.

¹ We are here assuming that we have a definition of dot product for TTR vectors a : RealVectorⁿ and b : RealVectorⁿ such that a · b = Σⁿᵢ₌₁ aᵢbᵢ = a1b1 + a2b2 + ... + anbn. We also implicitly assume that the weight vector and the sensor reading vector have the same dimensionality.

3 Vague Scalar Predicates

Scalar predicates such as 'tall', 'long' and 'expensive', also called "relative gradable adjectives" (Kennedy, 2007), are interpreted with respect to a
scale, i.e., a dimension such as height, length, or cost along which entities for which the relevant dimension is applicable can be ordered. This makes scalar predicates compatible with degree morphology, like comparative and superlative morphemes ('taller than', 'the longest') and intensifier morphemes such as 'very' or 'quite'. In this paper, our focus is on the so-called positive form of these adjectives (e.g. 'tall' as opposed to 'taller' or 'tallest').

A property that distinguishes the positive form from the comparative and the superlative forms is its context-dependence. To take a common example: If Sue's height is 180cm, she may be appropriately described as a tall woman, but probably not as a tall basketball player. Thus, what counts as tall can vary from context to context, with the most relevant contextual parameter being a comparison class relative to which the adjective is interpreted (e.g., the set of women, the set of basketball players, etc.). In addition to being context-dependent, positive-form scalar predicates are also vague, in the sense that they give rise to borderline cases, i.e., entities for which it is unclear whether the predicate holds or not.

Vagueness is certainly a property that affects most natural language expressions, not only scalar adjectives. However, scalar adjectives have a relatively simple semantics (they are often uni-dimensional) and thus constitute a perfect case-study for investigating the properties and effects of vagueness on language use. Gradable adjectives have received a high amount of attention in the formal semantics literature. It is common to distinguish between two main approaches to their semantics: delineation-based and degree-based approaches. The delineation approach is associated with the work of Klein (1980), who proposes that gradable adjectives denote partial functions dependent on a comparison class. They partition the comparison class into three disjoint sets: a positive extension, a negative extension, and an extension gap (entities for which the predicate is neither true nor false). In contrast, degree-based approaches assume a measure function m mapping individuals x to degrees on a particular scale (degrees of height, degrees of darkness, etc.) and a standard of comparison or degree threshold θ (again, dependent on a comparison class) such that x belongs to the adjective's positive extension if m(x) > θ (Kamp, 1975; Pinkal, 1979; Pinkal, 1995; Barker, 2002; Kennedy and McNally, 2005; Kennedy, 2007; Solt, 2011; Lassiter, 2011).

We build on degree approaches but adopt a perception-based perspective and take a step further to formalise how the meaning of these predicates can be learned and constantly updated through language use.

4 A Perceptual Semantics for 'Tall'

To exemplify our approach, we will use the scalar predicate 'tall' throughout.

4.1 Context-sensitivity

We first focus on capturing the context-dependence of relative scalar predicates. For this we define a type T_ctxt as follows:

(4) T_ctxt = [ c : Type
               x : c
               h : R+ ]

The context (ctxt) of a scalar predicate like 'tall' is a record of the type in (4), which includes: a type c (typically a subtype of Ind) representing the comparison class; an individual x within the comparison class (the argument of tall); a perceived measure on the relevant scale(s), in this case the perceived height h of x expressed as a positive real number.

The context presupposes the acquisition of sensory input from the environment. In particular, it assumes that an agent using such a representation is able to classify the entity in focus x as being of type c and is able to use some height sensor to obtain an estimate of x's height (the value of h is the sensor reading). We thus forgo the inclusion of an abstract measure function in the representation. In an artificial agent, this may be accomplished by image processing software for detecting and measuring objects in a digital image.

Besides the ctxt, we also assume a standard threshold of tallness θ_tall of the type given in (5). θ_tall is a function from a type specifying a comparison class to a height value, which corresponds to a tallness threshold for that comparison class. (In Section 5 we will discuss how such a threshold may be computed.)

(5) θ_tall : Type → R+

The meaning of 'tall' involves a classifier for tallness, κ_tall, of the following type:

(6) κ_tall : (Type → R+, T_ctxt) → Type
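To picture the ingredients in (4) and (5), here is a minimal Python sketch (our own illustration, not part of the formal account; the function names, the sensor stub, and the threshold values are assumptions): the agent classifies the entity in focus, reads its height from a sensor, and packages both into a context record.

```python
def read_height_sensor(entity):
    # Stand-in for perceptual input (e.g. image processing software
    # measuring objects in a digital image); returns perceived height
    # in metres. The table below is purely illustrative.
    return {"john_smith": 1.88}[entity]

def make_ctxt(comparison_class, entity):
    # Build a record of type T_ctxt as in (4): comparison class c,
    # individual x, and perceived height h (a positive real).
    return {"c": comparison_class,
            "x": entity,
            "h": read_height_sensor(entity)}

def theta_tall(comparison_class):
    # A threshold of type (5): from a comparison-class type to a
    # height value. Illustrative numbers only.
    return {"Human": 1.87, "BasketballPlayer": 2.00}[comparison_class]

ctxt = make_ctxt("Human", "john_smith")
```

In TTR the comparison class is itself a type rather than a string key, and h comes from a sensor reading rather than a stipulated table; the dictionary encoding is only a convenient approximation of the record in (4).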

We define this classifier as a one-input perceptron that compares the perceived height h of an individual x to the relevant threshold θ determined by a comparison class c. Thus, if θ : Type → R+ and r : T_ctxt, then:

    κ_tall(θ, r) = tall(r.x)   if r.h > θ(r.c)
                   ¬tall(r.x)  otherwise

Simplifying somewhat, we can represent the meaning of 'tall', tall, as a record specifying the type of context (T_ctxt) where an utterance of 'tall' can be made, the parameter of the tallness classifier (the threshold θ), and a function f which is applied to the context to produce the content of 'tall'.

(7) tall = [ T_ctxt = [ c : Type
                        x : c
                        h : R+ ]
             θ = θ_tall
             f = λr : T_ctxt . [ sit = r
                                 sit-type = [ c_tall : κ_tall(θ, r) ] ] ]

The output of the function f is an Austinian proposition (Cooper, 2005b): a judgement that a situation (sit, represented as a record r of type T_ctxt) is of a particular type (specified in sit-type). In the case of tall, the context of utterance (which instantiates r) is judged to be of the type where there is an individual x which is either tall or not tall, according to the output of the classifier κ_tall. The context of utterance in the sit field will include the height-sensor reading, which means that the sensor reading is part of the proof of the sit-type indicating that x is tall (or not, as the case may be).

Thus, to decide whether to refer to some individual x as tall or to evaluate someone else's utterance describing x as tall, an agent applies the function tall.f to the current situation, represented as a record r : T_ctxt. As an example, let us consider a situation that includes the context in (8), resulting from observing John Smith as being 1.88 meters tall (assuming this is our scale of tallness):

(8) ctxt = [ c = Human
             x = john_smith
             h = 1.88 ]

Let us assume that given the comparison class Human, θ_tall(Human) = 1.87. In this case, tall.f(ctxt) will compute as shown in (9). The resulting Austinian proposition corresponds to the agent's judgement that the situation in sit is one where John Smith counts as tall.

(9) (λr : T_ctxt . [ sit = r
                     sit-type = [ c_tall : κ_tall(θ_tall, r) ] ])
    ([ c = Human, x = john_smith, h = 1.88 ]) =

    [ sit = [ c = Human
              x = john_smith
              h = 1.88 ]
      sit-type = [ c_tall : tall(john_smith) ] ]

4.2 Vagueness

According to the above account, 'tall' has a precise interpretation: given a degree of height and a comparison class, the threshold sharply determines whether tall applies or not. There are several ways in which one can account for vagueness, amongst others by introducing perceptual uncertainty (possibly inaccurate sensor readings). Here, in line with Lassiter (2011), we opt for substituting the precise threshold with a noisy, probabilistic threshold. We consider the threshold to be a normal random variable, which can be represented by the parameters of its Gaussian distribution, the mean µ and the standard deviation σ (the noise width).²

To incorporate this modification into our approach, we update the tallness classifier κ_tall we had defined in (6) so that it now takes as parameters µ_tall and σ_tall, both of them dependent on the comparison class and hence of type Type → R+. The output of the classifier is now a probability rather than a ptype such as tall(x) or ¬tall(x). Before indicating how this probability is computed, we give the type of the vague version of the classifier in (10) and the vague representation of the meaning of 'tall' in (11).

(10) κ_tall : (Type → R+, Type → R+, T_ctxt) → [0, 1]

(11) tall = [ T_ctxt = [ c : Type
                         x : c
                         h : R+ ]
              µ = µ_tall
              σ = σ_tall
              f = λr : T_ctxt . [ sit = r
                                  sit-type = [ c_tall : tall(r.x) ]
                                  prob = κ_tall(µ, σ, r) ] ]

² Which noise function may be the most appropriate is an empirical question we do not tackle in this paper. Our choice of Gaussian noise follows Schmidt et al. (2009); see Section 5.1.
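The crisp account in (6)-(9) can be rendered as a short Python sketch (our own illustration under our own naming, not the authors' implementation): the classifier compares the perceived height in a context record against the class-dependent threshold, and tall.f wraps the verdict in an Austinian proposition.

```python
def theta_tall(comparison_class):
    # Class-dependent tallness threshold; theta_tall(Human) = 1.87
    # as assumed for example (8)-(9). Values are illustrative.
    return {"Human": 1.87}[comparison_class]

def kappa_tall(theta, r):
    # One-input perceptron of type (6): compare the perceived height
    # r.h with the threshold for the comparison class r.c, returning
    # the ptype tall(x) or its negation.
    if r["h"] > theta(r["c"]):
        return f"tall({r['x']})"
    return f"not-tall({r['x']})"

def tall_f(r):
    # The function tall.f of (7): judge that situation r is of the
    # type where x is (or is not) tall -- an Austinian proposition.
    return {"sit": r, "sit-type": {"c_tall": kappa_tall(theta_tall, r)}}

ctxt = {"c": "Human", "x": "john_smith", "h": 1.88}
print(tall_f(ctxt)["sit-type"]["c_tall"])  # tall(john_smith)
```

Since 1.88 > 1.87, the resulting proposition records the judgement tall(john_smith), mirroring (9); here ptypes are rendered as strings and records as dictionaries purely for illustration.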

The output of the function tall.f is now a probabilistic Austinian proposition (Cooper et al., 2014). Like before, the proposition expresses a judgement that a situation sit is of a particular type. But here the judgement is probabilistic: it encodes the belief of an agent concerning the likelihood that sit is of a type where x counts as tall. Since we take the noisy threshold to be a normal random variable, given a particular µ and σ, we can calculate the probability that the height r.h of individual r.x counts as tall as follows:

    κ_tall(µ, σ, r) = 1/2 [ 1 + erf( (r.h − µ(r.c)) / (σ(r.c) √2) ) ]

Here erf is the error function, defined as³

    erf(x) = (2/√π) ∫ₜ₌₀ˣ e^(−t²) dt

The error function defines a sigmoid shape (see Figure 1), in line with the upward monotonicity of 'tall'. The output of κ_tall(µ, σ, r) corresponds to the probability that h will exceed the normal random threshold with mean µ and deviation σ.

[Figure 1: Plot of the error function.]

Let us consider an example. Assume that we have µ_tall(Human) = 1.87 and σ_tall(Human) = 0.05 (see Section 5.1 below for justification of the latter value). Let's also assume the same ctxt as above in (8). In this case, tall.f(ctxt) will compute as in (12), given that

    κ_tall(µ_tall, σ_tall, [ c = Human, x = john_smith, h = 1.88 ]) =
        1/2 [ 1 + erf( (1.88 − 1.87) / (0.05 √2) ) ] = 0.579

(12) (λr : T_ctxt . [ sit = r
                      sit-type = [ c_tall : tall(r.x) ]
                      prob = κ_tall(µ_tall, σ_tall, r) ])
     ([ c = Human, x = john_smith, h = 1.88 ]) =

     [ sit = [ c = Human
               x = john_smith
               h = 1.88 ]
       sit-type = [ c_tall : tall(john_smith) ]
       prob = 0.579 ]

This probability can now be used in further probabilistic reasoning, to decide whether to refer to an individual x as tall, or to evaluate someone else's utterance describing x as tall. For example, an agent may map different probabilities to different adjective qualifiers of tallness to yield compositional phrases such as 'sort of tall', 'quite tall', 'very tall', 'extremely tall', etc. The meanings of these composed adjectival phrases could specify probability ranges trained independently. Compositionality for vague perceptual meanings, and the interaction between compositionality and learning, is an exciting area for future research.⁴

³ For an explanation of this standard definition, see http://en.wikipedia.org/wiki/Error_function, which is the source of the graph in Figure 1.
⁴ See Larsson (2013) for a sketch of compositionality for perceptual meaning.

5 Learning from Language Use

In this section we consider possibilities for computing the noisy threshold we have introduced in the previous section and discuss how such a threshold and the probabilistic judgements it gives rise to are updated with language use.

5.1 Computing the Noisy Threshold

We assume that agents keep track of judgements made by other agents. More concretely, for a vague scalar predicate like 'tall', we assume that an agent will have at its disposal a set of observations consisting of entities of a particular type T (a comparison class such as Human) that have been judged to be tall, together with their observed heights. Judgements of tallness may vary across individuals; indeed, such variation (both inter- and intra-individual) is a hallmark of vague predicates. We use Ω_tall^T to refer to the set of heights of those entities x : T that have been considered tall by some individual. From this agent-specific set of observations, which is constantly updated as the agent is exposed to new judgements by other individuals, we want to compute a noisy threshold,
which the agent uses to make her own judgements of tallness, as specified in (11).

Different functions can be used to compute µ_tall and σ_tall from Ω_tall^T. What constitutes an appropriate function is an empirical matter, and the most suitable function possibly varies across predicates (what may apply to 'tall' may not be suitable for 'dark' or 'expensive', for example). Hardly any work has been done on trying to identify how the threshold is computed from experience. A notable exception, however, is the work of Schmidt et al. (2009), who collect judgements of people asked to indicate which items are tall given distributions of items of different heights. Schmidt and colleagues then propose different probabilistic models to account for the data and compare their output to the human judgements. They explore two types of models: threshold-based models and category-based or cluster models. The best performing models within these two types perform equally well and the study does not identify any advantages of one type over the other one. Since we have chosen threshold models as our case-study, we focus our attention on those here.

Each of the threshold models tested by Schmidt et al. (2009) corresponds to a possible way of computing the mean µ_tall of a noisy threshold from a set of observations. The best performing threshold model in their study is the relative height by range model, where (in our notation):

(13) relative height by range (RH-R):
     µ_tall(T) = max(Ω_tall^T) − k · (max(Ω_tall^T) − min(Ω_tall^T))

Here max(Ω_tall^T) and min(Ω_tall^T) stand for the maximum and the minimum height, respectively, of the items that have been judged to be tall by some individual. According to this threshold model, any item within the top k% of the range of heights that have been judged to be tall counts as tall. The model includes two parameters, k and a noise-width parameter that in our approach corresponds to σ_tall. Schmidt et al. (2009) report that the best fit of their data was obtained with k = 29% and σ_tall = 0.05.

5.2 Updating Vague Meanings

We now want to specify how the vague meaning of 'tall' is updated as an agent is exposed to new judgements via language use. Our setting so far offers a straightforward solution to this: If a new entity x : T with height h is referred to as tall, the agent adds h to its set of observations Ω_tall^T and recomputes µ_tall(T), for instance using RH-R as defined in (13). If RH-R is used, ideally the values of k and σ_tall should be (re)estimated from Ω_tall^T. For the sake of simplicity, however, here we will assume that these two parameters take the values experimentally validated by Schmidt et al. (2009) and are kept constant. An update to µ_tall will take place if it is the case that h > max(Ω_tall^T) or h < min(Ω_tall^T). This in turn will trigger an update to the probability output by κ_tall.

As an example, let us assume that our initial set of observations is Ω_tall^Human = {1.87, 1.92, 1.90, 1.75, 1.80} (recall this corresponds to the perceived heights of individuals that have been described as tall by some agent). This means that max(Ω_tall^Human) = 1.92 and min(Ω_tall^Human) = 1.75. Hence, given (13):

(14) µ_tall(Human) = 1.92 − 0.29 · (1.92 − 1.75) = 1.87

Let's assume we now make an observation where a person of height 1.72 is judged to be tall. This will mean that the set of observations is now Ω_tall^Human = {1.87, 1.92, 1.90, 1.75, 1.80, 1.72} and consequently min(Ω_tall^Human) = 1.72, which yields an updated mean of the noisy threshold:

(15) µ_tall(Human) = 1.92 − 0.29 · (1.92 − 1.72) = 1.862

If we were to re-evaluate John Smith's tallness in light of this observation, we would get a new probability 0.64 that he is tall (in contrast to the earlier probability of 0.579 given in (12)).

5.3 Possible Extensions

The set of observations Ω_tall^Human can be derived from a set of Austinian propositions corresponding to instances where people have been judged to be tall. To update from an Austinian proposition p we simply add p.sit.h to Ω_tall^Human and recompute µ_tall(p.sit.c). Note that we are here treating these Austinian propositions as non-probabilistic. This seems to make sense since an addressee does not have direct access to the probability associated with the judgement of the speaker. If we were to take these probabilities into account (for instance, the use of a hedge in 'sort of tall' may be used to make inferences about such probabilities), and if those probabilities are not always 1, we would need a different way of computing µ_tall than the
one specified so far.

Somewhat related to the point above, note that in our approach we treat all judgements equally, i.e., we do not distinguish between possible different levels of trustworthiness amongst speakers. An agent who is told that an entity with height h is tall adds that observation to its knowledge base without questioning the reliability of the speaker. This is clearly a simplification. For instance, there is developmental evidence showing that children are more sensitive to reliable speakers than to unreliable ones during language acquisition (Scofield and Behrend, 2008).

6 Other Approaches

Within the literature in formal semantics, Lassiter (2011) has put forward a proposal that extends in interesting ways earlier work by Barker (2002) and shares some aspects with the account we have presented here. Operating in a probabilistic version of classical possible-worlds semantics, Lassiter assumes a probability distribution over a set of possible worlds and a probability distribution over a set of possible languages. Each possible language represents a precise interpretation of a predicate like 'tall': tall1 = λx. x's height ≥ 5'6"; tall2 = λx. x's height ≥ 5'7"; and so forth. Lassiter thus treats "metalinguistic belief" (representing an agent's knowledge of the meaning of words) in terms of probability distributions over precise languages. Since each precise interpretation of 'tall' includes a given threshold, this can be seen as defining a probability distribution over possible thresholds, similarly to the noisy threshold we have used in our account. Lassiter, however, is not concerned with learning.

Within the computational semantics literature, DeVault and Stone (2004) describe an implemented system in a drawing domain that is able to interpret and execute instructions including vague scalar predicates such as 'Make a small circle'. Their approach makes use of degree-based semantics, but does not take into account comparison classes. This is possible in their drawing domain since the kind of geometric figures it includes (squares, rectangles, circles) do not have intrinsic expected properties (size, length, etc). Their focus is on modelling how the threshold for a predicate such as 'small' is updated during an interaction with the system given the local discourse context. For instance, if the initial context just contains a square, the size of that square is taken to be the standard of comparison for the predicate 'small'. The user's utterance 'Make a small circle' is then interpreted as asking for a circle of an arbitrary size that is smaller than the square.

In our characterisation of the context-sensitivity of vague gradable adjectives in Section 4.1, we have focused on their dependence on general comparison classes corresponding to types of entities (such as Human, Woman, etc) with expected properties such as height. Thus, in contrast to DeVault and Stone (2004), who focus on the local context of interaction, we have focused on what could be called the global context (an agent's experience regarding types of entities and their expected properties). How these two types of context interact remains an open question, which we plan to explore in our future work (see Kyburg and Morreau (2000), Kemp et al. (2007), and Fernández (2009) for pointers in this direction).

7 Conclusions and future work

Traditional formal semantics theories postulate a fixed, abstract interpretation function that mediates between natural language expressions and the world, but fall short of specifying how this function is determined or modified dynamically by experience. In this paper we have presented a characterisation of the semantics of vague scalar predicates such as 'tall' that clarifies how their context-dependent meaning and their vague character are connected with perceptual information, and we have also shown how this low-level perceptual information (here, real-valued readings from a height sensor) connects to high-level logical semantics (ptypes) in a probabilistic framework. In addition, we have put forward a proposal for explaining how the meaning of vague scalar adjectives like 'tall' is dynamically updated through language use.

Tallness is a function of a single value (height), and is in this sense a uni-dimensional predicate. Indeed, most linguistic approaches to vagueness focus on uni-dimensional predicates such as 'tall'. However, many vague predicates are multi-dimensional, including nouns for positions ('above'), shapes ('hexagonal'), and colours ('green'), amongst many others. Together with compositionality (mentioned at the end of Section 4.2), generalisation of the present account to multi-dimensional vague predicates is an interesting area of future development.
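The probabilistic semantics of Section 4.2 and the RH-R update of Section 5 fit together in a few lines of Python. The sketch below is our own illustration (function names are assumptions); it reproduces the paper's worked example, with probabilities of roughly 0.579 and 0.64 for John Smith before and after the 1.72 observation. As in (14), the mean is rounded to 1.87 in the first step.

```python
import math

def kappa_tall(mu, sigma, r):
    # Probability that r.h exceeds a Gaussian threshold with mean mu
    # and standard deviation sigma (the vague classifier of Section 4.2).
    return 0.5 * (1 + math.erf((r["h"] - mu) / (sigma * math.sqrt(2))))

def mu_rh_r(observations, k=0.29):
    # Relative height by range, example (13): anything in the top k of
    # the range of heights judged tall counts as tall (k = 29% and
    # sigma = 0.05 per Schmidt et al. 2009).
    return max(observations) - k * (max(observations) - min(observations))

heights_tall = [1.87, 1.92, 1.90, 1.75, 1.80]   # Omega_tall^Human
ctxt = {"c": "Human", "x": "john_smith", "h": 1.88}

mu = round(mu_rh_r(heights_tall), 2)            # 1.87, as in (14)
p_before = kappa_tall(mu, 0.05, ctxt)           # ~0.579, as in (12)

heights_tall.append(1.72)                       # a new tallness judgement
mu = mu_rh_r(heights_tall)                      # 1.862, as in (15)
p_after = kappa_tall(mu, 0.05, ctxt)            # ~0.64
```

The new minimum widens the observed range, lowers the mean of the noisy threshold, and thereby raises the probability that the same 1.88m individual counts as tall, exactly the update behaviour described in Section 5.2.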

Acknowledgements

The first author acknowledges the support of the Netherlands Organisation for Scientific Research (NWO) and thanks the Centre for Language Technology at the University of Gothenburg for generously funding research visits that led to the work presented in this paper. The second author acknowledges the support of Vetenskapsrådet, project 2009-1569, Semantic analysis of interaction and coordination in dialogue (SAICD); the Department of Philosophy, Linguistics, and Theory of Science; and the Centre for Language Technology at the University of Gothenburg.

References

Chris Barker. 2002. The dynamics of vagueness. Linguistics & Philosophy, 25(1):1–36.

Patrick Blackburn and Johan Bos. 2005. Representation and Inference for Natural Language: A First Course in Computational Semantics. CSLI Publications.

Robin Cooper and Jonathan Ginzburg. 2011. Negation in dialogue. In Proceedings of the 15th Workshop on the Semantics and Pragmatics of Dialogue (SemDial 2011), Los Angeles (USA).

Robin Cooper, Simon Dobnik, Shalom Lappin, and Staffan Larsson. 2014. A probabilistic rich type theory for semantic interpretation. In Proceedings of the EACL Workshop on Type Theory and Natural Language Semantics (TTNLS).

Robin Cooper. 2005a. Austinian truth, attitudes and type theory. Research on Language and Computation, 3(4):333–362, December.

Robin Cooper. 2005b. Austinian truth, attitudes and type theory. Research on Language and Computation, 3:333–362.

Robin Cooper. 2010. Generalized quantifiers and clarification content. In Paweł Łupkowski and Matthew Purver, editors, Aspects of Semantics and Pragmatics of Dialogue. SemDial 2010, 14th Workshop on the Semantics and Pragmatics of Dialogue, Poznań. Polish Society for Cognitive Science.

Robin Cooper. 2012. Type theory and semantics in flux. In Ruth Kempson, Nicholas Asher, and Tim Fernando, editors, Handbook of the Philosophy of Science, volume 14: Philosophy of Linguistics. Elsevier BV. General editors: Dov M. Gabbay, Paul Thagard and John Woods.

David DeVault and Matthew Stone. 2004. Interpreting vague utterances in context. In Proceedings of the 20th International Conference on Computational Linguistics (COLING'04), pages 1247–1253.

Simon Dobnik, Robin Cooper, and Staffan Larsson. 2013. Modelling language, action, and perception in type theory with records. In Constraint Solving and Language Processing, Lecture Notes in Computer Science, pages 70–91. Springer.

Raquel Fernández. 2009. Salience and feature variability in definite descriptions with positive-form vague adjectives. In Workshop on the Production of Referring Expressions: Bridging the Gap between Computational and Empirical Approaches to Reference (CogSci'09).

Jonathan Ginzburg. 2012. The Interactive Stance. Oxford University Press.

Hans Kamp. 1975. Two theories of adjectives. In E. Keenan, editor, Formal Semantics of Natural Language, pages 123–155. Cambridge University Press.

John Kelleher, Fintan Costello, and Josef van Genabith. 2005. Dynamically structuring, updating and interrelating representations of visual and linguistic discourse context. Artificial Intelligence, 167(1):62–102.

Charles Kemp, Amy Perfors, and Joshua B. Tenenbaum. 2007. Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10(3):307–321.

Christopher Kennedy and Louise McNally. 2005. Scale structure, degree modification, and the semantics of gradable predicates. Language, pages 345–381.

Christopher Kennedy. 2007. Vagueness and grammar: The semantics of relative and absolute gradable adjectives. Linguistics and Philosophy, 30(1):1–45.

Ewan Klein. 1980. A semantics for positive and comparative adjectives. Linguistics and Philosophy, 4:1–45.

Alice Kyburg and Michael Morreau. 2000. Fitting words: Vague language in context. Linguistics and Philosophy, 23:577–597.

Staffan Larsson. 2009. Detecting and learning from lexical innovation in dialogue: a TTR account. In Proceedings of the 5th International Conference on Generative Approaches to the Lexicon.

Staffan Larsson. 2013. Formal semantics for perceptual classification. Journal of Logic and Computation.

Dan Lassiter. 2011. Vagueness as probabilistic linguistic knowledge. In R. Nouwen, R. van Rooij, U. Sauerland, and H. C. Schmitz, editors, Vagueness in Communication. Springer.

Barbara Partee. 1989. Possible worlds in model-theoretic semantics: A linguistic perspective. In S. Allen, editor, Possible Worlds in Humanities, Arts and Sciences, pages 93–123. Walter de Gruyter.

Manfred Pinkal. 1979. How to refer with vague descriptions. In R. Bäuerle, U. Egli, and A. von Stechow, editors, Semantics from Different Points of View, pages 32–50. Springer-Verlag.

Manfred Pinkal. 1995. Logic and Lexicon: The Semantics of the Indefinite, volume 56 of Studies in Linguistics and Philosophy. Springer.

François Portet, Ehud Reiter, Albert Gatt, Jim Hunter, Somayajulu Sripada, Yvonne Freer, and Cindy Sykes. 2009. Automatic generation of textual summaries from neonatal intensive care data. Artificial Intelligence, 173(7):789–816.

Matthew Purver, Julian Hough, and Eleni Gregoromichelaki. 2014. Dialogue and compound contributions. In A. Stent and S. Bangalore, editors, Natural Language Generation in Interactive Systems. Cambridge University Press.

Ehud Reiter, Somayajulu Sripada, Jim Hunter, Jin Yu, and Ian Davy. 2005. Choosing words in computer-generated weather forecasts. Artificial Intelligence, 167(1):137–169.

Deb Roy. 2005. Semiotic schemas: A framework for grounding language in action and perception. Artificial Intelligence, 167(1):170–205.

L. A. Schmidt, N. D. Goodman, D. Barner, and J. B. Tenenbaum. 2009. How tall is tall? Compositionality, statistics, and gradable adjectives. In Proceedings of the 31st Annual Conference of the Cognitive Science Society.

Jason Scofield and Douglas A. Behrend. 2008. Learning words from reliable and unreliable speakers. Cognitive Development, 23(2):278–290.

Jeffrey Mark Siskind. 2001. Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic. Journal of Artificial Intelligence Research, 15:31–90.

Danijel Skočaj, M. Janíček, Matej Kristan, Geert-Jan M. Kruijff, Aleš Leonardis, Pierre Lison, Alen Vrečko, and Michael Zillich. 2010. A basic cognitive system for interactive continuous learning of visual concepts. In Proceedings of the Workshop on Interactive Communication for Autonomous Intelligent Robots, pages 30–36.

Stephanie Solt. 2011. Notes on the comparison class. In Vagueness in Communication, pages 189–206. Springer.

Luc Steels. 2003. Evolving grounded communication for robots. Trends in Cognitive Sciences, 7(7):308–312.
