
A Model for Attention-Driven Judgements in Type Theory with Records

Simon Dobnik∗
CLASP, University of Gothenburg, Sweden
[email protected]

John D. Kelleher∗
School of Computing, Dublin Institute of Technology, Ireland
[email protected]

∗ Both authors contributed equally.

Abstract

This paper makes three contributions to the discussion on the applicability of Type Theory with Records (TTR) to embodied dialogue agents. First, it highlights the problem of type assignment or judgements in practical implementations, which is resource intensive. Second, it presents a judgement control mechanism that addresses this problem: types are grouped into clusters or states by their thematic relations, and types are selected following two mechanisms inspired by the Load Theory of selective attention and cognitive control (Lavie et al., 2004). Third, it presents a computational framework, based on Bayesian inference, that offers a basis for future practical experimentation on the feasibility of the proposed approach.

1 Type Theory with Records

One of the central challenges for multi-modal dialogue systems is information fusion: how such a system can represent information from different domains, compare it, compose it, and reason about it. Typically, a situated agent will have to deal with information that comes from its perceptual sensors and will be represented as real-valued vectors, and with conceptual categories (some of which correspond to words in language) that are formed through cognitive processes in the brain. When situated agents are implemented practically, one typically adopts a layered approach starting at the scene geometry and finishing at the level of the agent's knowledge about the objects and their interactions (Kruijff et al., 2006). Although this approach may be good for practical reasons, for example there are pre-existing systems which may be organised in a pipeline this way, it also assumes that representations and operations are distinct at each level and that one needs to design interfaces that would mediate between these levels.

Type Theory with Records (TTR) (Cooper, 2005; Cooper, 2012; Cooper et al., 2015) provides a theory of natural language semantics which views meaning and type assignment as being in the domain of an individual agent who can make judgements that situations (or invariances in the world) are of types (written as a : T). The type inventory of an agent is not static but is continuously refined through the agent's interaction with its physical environment and with other agents through dialogue, which provides instances and feedback on what strategies to adopt to learn from these instances. The reason why agents' meaning representations or type inventories converge to an approximately identical inventory is that agents are situated in identical or sufficiently similar physical environments and have grounded conversations with other agents; see for example the work of (Steels and Belpaeme, 2005) and, for an approach in TTR, (Larsson, 2013). Having the capability to adjust their type representations, agents can adapt to new physical environments and new conversational exchanges. Such a view is not novel to mobile robotics (Dissanayake et al., 2001) nor to approaches to the semantics and pragmatics of dialogue (Clark, 1996), but it is novel to formal semantics (Dowty et al., 1981; Blackburn and Bos, 2005), which represents an important body of work on how meaning is constructed compositionally and reasoned about. Overall, we see TTR as a highly fitting framework for modelling cognitive situated agents as it connects perception to the high-level semantics of natural language and vice versa.

The type system in TTR is rich in comparison to that found in traditional formal semantics (entities, truth values and function types constructed from these and other function types).

In addition, types are used to model meaning in a proof-theoretic way rather than by constraining model-theoretic interpretation. Types in TTR can be either basic types such as Ind or Real, or record types. Record types are represented as matrices containing label-value pairs where labels are constants and values can be basic types, ptypes which act as type constructors, or record types. The corresponding proof-objects of record types are records. These may be thought of as iconic representations in the sense of (Harnad, 1990), or sensory readings that an agent perceives as sensory projections of objects or situations in the world. The example below shows a judgement that a record (a matrix with = as a delimiter) containing a sensory reading is of a type (with : as a delimiter). The traditional distinction between symbolic and sub-symbolic knowledge is not maintained in this framework as both can be assigned appropriate types.

  [ a   = ind26
    sr  = [[34,24],[56,78],...]
    loc = [45,78,0.34] ]
  :
  [ a   : Ind
    sr  : list(list(Real))
    loc : list(Real) ]

An important notion of TTR is that types are intensional, which means that a given situation in the world may be assigned more than one type. For example, a sensory reading of a particular situation in the world involving a spatial arrangement of objects, captured as records of the types shown in the previous example, may be assigned several record types of spatial relations simultaneously, for example Left, Near, At, Behind, etc. Another important notion of TTR is sub-typing, which allows comparison of types. In addition to the type in the previous example, let's call it T1, the record can also be assigned the following two types T2 and T3, whereby the following sub-typing relation between them holds: T1 ⊑ T2 ⊑ T3, where ⊑ reads as "is a subtype of".

  T2 = [ a  : Ind
         sr : list(list(Real)) ]

  T3 = [ a : Ind ]

Thirdly, types may be component types of other types; for example, the p(redicate)-type list(Real) is a component of the larger record type shown above. Finally, record types may also be dependent on other record types. A record type representing a geometric relation between two objects, for example Left, is dependent on at least two record types of the kind shown in the previous example representing perceptual objects. The notion of dependent types is stronger than that of component types and is related to representing linguistic meaning. With missing information matching a component type an agent could still judge (with some error) that a situation is of that type, whereas a judgement without a previous judgement of the dependent type would be impossible.
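To make the shape of such judgements concrete, the following is a minimal sketch (our own illustration, not an implementation of TTR) in which records are Python dictionaries, field types are predicates over values, and a judgement r : T is a categorical check that every field required by T is present and well typed; the subtype check compares labels only and ignores the field types for brevity.

```python
# A minimal sketch of records, record types and judgements (illustrative only).
# Field types are modelled as predicates over plain Python values.

def Ind(v):
    # basic type Ind: here simply an individual marker such as "ind26"
    return isinstance(v, str)

def Real(v):
    # basic type Real
    return isinstance(v, (int, float))

def list_of(item_type):
    # type constructor list(T)
    return lambda v: isinstance(v, list) and all(item_type(x) for x in v)

# The record type from the example and its supertypes T2 and T3.
T1 = {"a": Ind, "sr": list_of(list_of(Real)), "loc": list_of(Real)}
T2 = {"a": Ind, "sr": list_of(list_of(Real))}
T3 = {"a": Ind}

# The record (proof object) from the example.
r = {"a": "ind26", "sr": [[34, 24], [56, 78]], "loc": [45, 78, 0.34]}

def judge(record, rtype):
    """Categorical judgement r : T -- every field of T is present and well typed."""
    return all(label in record and check(record[label]) for label, check in rtype.items())

def is_subtype(t_sub, t_super):
    """T_sub is a subtype of T_super if it has at least all of T_super's fields
    (field types are ignored here for brevity)."""
    return all(label in t_sub for label in t_super)

print(judge(r, T1), judge(r, T2), judge(r, T3))   # True True True
print(is_subtype(T1, T2), is_subtype(T2, T3))     # True True
```

In the setting discussed below the Boolean check would be replaced by a probabilistic degree of belief that the record is of the type.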
The rich type system of TTR and the relations between types give us a lot of flexibility in modelling natural language semantics in embodied dialogue agents. However, one practical problem that an application of TTR faces is how such an agent will cope with an increasing number of types that it continuously acquires through learning, and how it will assign them effectively to every new situation it encounters, given that such an agent has limited processing resources. Since each type assignment involves a judgement (a probabilistic belief that something is of a type T) for each record of a situation, an agent having an inventory of n types would make n judgements, a large proportion of which would yield very low or even zero probabilities as they will be irrelevant or only marginally relevant for the current perceptual and conversational contexts. This is because, due to the regularities in the world, certain types would never be assigned or are very unlikely to be assigned in certain contexts.

(Hough and Purver, 2014) present a model where types are ordered in a lattice by ⊑ which drives incremental type checking for the purposes of resolution of incremental linguistic input or output, which in itself is a different task to ours. The approach captures taxonomic or categorial relations encoded in types. As humans do not necessarily judge situations from most general to most specific or vice versa, the benefit of reducing judgements following the taxonomic organisation of types would vary depending on the situation judged. Such knowledge would allow exclusion of judgements of sub-types of an incompatible type, but the agent's judgements could be further reduced if it were primed what to expect in its current state and its perceptual and conversational contexts by its knowledge about the world and the linguistic behaviour of its interlocutor, captured in a model of thematic relations, that is spatial, temporal, causal or functional relations between individuals occurring in the same situations (Lin and Murphy, 2001; Estes et al., 2011).

Similarly, (Cooper, 2008) argues that agents organise their type inventory into resources that are employed and modified in different activities. If this is so, in addition to a reasoning mechanism on subtype relations humans must also rely on processes by which bundles of types are primed for in particular situational contexts. As a consequence agents will not need to check each situation (a sensory reading in the form of a record) for every type in their inventory but only for those that they are primed for. A property that such a priming mechanism must take into account is that the more accurately an agent is primed by its contexts, the lower the uncertainty and hence the smaller the set of the types it is primed for.

In this paper we focus on the mechanisms that drive the discovery of thematic relations and propose a computational model of how such relations are applied in interaction to prime an agent. The basic premise of the paper is that the mechanisms underpinning attention are fundamental to the control and priming of judgements in TTR. In Section 2, we introduce the Load Theory of Attention. In Section 3, we present an account, based on Load Theory of Attention, of how two different kinds of TTR judgements can be controlled and primed in an agent. In Section 4, we introduce a mathematical framework that illustrates how an agent can maintain probability distributions over its cognitive states and types and use them in the priming process. In Section 5 we give a worked example of this framework priming an agent for judgements. Section 6 gives some remarks about its usability and presents our future work.

2 Load Theory of Attention

One of the major contributions of 20th century psychology has been the study and improved understanding of perceptual attention in humans. There is more than one type of attention mechanism. In particular, a distinction can be made between bottom-up attention and top-down attention processes. Bottom-up attention is automatic, task independent, not under conscious control and is attracted towards salient entities in the environment (e.g., moving objects, singleton red objects, etc.). Top-down attention can be consciously directed by an agent and is dependent on the task they are carrying out, as tasks will have different complexities. Sometimes top-down attention is described in terms of an agent being primed to respond to a mental set of perceptual stimuli that are relevant to the task they are consciously carrying out.

Early research on attention was based on the concept of a structural single-channel bottleneck in perceptual processing (Broadbent, 1958; Welford, 1967). The early orthodoxy of attention as a bottleneck within a single channel has been challenged by several researchers (e.g., (Allport et al., 1972)) and more recent models have viewed attention as a shared resource or capacity that can be spread across multiple tasks simultaneously. For example, in the (Kahneman, 1973) theory of attention and effort the attention capacity can be focused on an individual task or shared across multiple tasks, and the more difficult a task is the more attention is required by that task. Furthermore, the allocation of attention across tasks can be flexibly updated as the agent changes their attention policy from one moment to the next.

An enduring question within attention research has been to understand the conditions under which the perception of task irrelevant distractors is prevented. Most of this research in the 60s, 70s, and 80s was framed in terms of the early-late debate which focused on whether the structural bottleneck that excluded distractors occurred early or late in perceptual processing. Some researchers argued that attention could exclude early perceptual processing of distractors (e.g., (Treisman, 1969)) while others argued that distractor objects were perceptually processed and attention only affected post-perceptual processing, such as working memory and response selection (e.g., (Duncan, 1980)). The reason for such a protracted debate was that there was a lot of evidence to support both views. Results from some studies indicated that unattended information went unnoticed (supporting an early filter) and other studies indicated that distractors were perceptually processed and interfered with task response (supporting a late filter).

A well regarded recent model of attention is perceptual load theory (Lavie et al., 2004). The concept of perceptual load is difficult to define but can be characterised in terms of the number of items that are perceptually available (the more items, the higher the load) and the demands of the perceptual task (e.g., selecting an object based on type and colour is more demanding than selecting an object based solely on type). Perceptual load also involves defining what constitutes an item in a display: (Lavie et al., 2004) give the example that a string of letters can be considered one item (a word) or several items (letters) depending on the task.

Perceptual load theory attempts to resolve the early-late debate using a model of attention that distinguishes between two mechanisms of selective attention: perceptual selection and cognitive control. Perceptual selection is a mechanism that excludes the perception of task irrelevant distractors under situations of high perceptual load; however, in situations of low perceptual load any spare capacity will spill over to the perception of distractor objects. The cognitive control mechanism is an active process that reduces the interference from perceived distractors on task response. It does so by actively maintaining the processing prioritisation of task relevant stimuli within the set of perceived stimuli.

3 Load Theory and type judgements

Agents learn types all the time by making generalisations over invariances in the world and from information communicated to them through conversation (direct transferal of knowledge). However, in order to access the knowledge quickly and efficiently, they organise it in a certain way in memory. We propose a method of how an agent (i) organises its type inventory in memory and (ii) applies this type inventory using a model of attention that avoids the exponential problem of judgements it would have to make without prioritising its type checking. We turn to the second notion first, the priming of type judgements using a model of attention. Within this attention based account a distinction can be made between two types of judgements: (i) pre-attentive and (ii) task induced and context induced judgements.

3.1 Attention-driven judgements

Pre-attentive judgements are controlled by the perceptual selection mechanism of Load Theory. The result of a pre-attentive judgement is the introduction of a type into the working memory or information state in a dialogue model (Ginzburg and Fernández, 2010). Basic representations of the visual environment (Ullman, 1984), such as the segmentation of a visual scene into entities and background, are an example of a pre-attentive judgement. At the very basic level these will be the iconic representations captured by the agent's sensors (Harnad, 1990). Task induced and context induced judgements require conscious attention. As such, they are controlled by the cognitive control mechanisms of Load Theory. These judgements are applied to types that are in working memory and result in new types being introduced to working memory. Task induced and context induced judgements are primed by the types associated via memory with the current activities that the agent is currently engaged in and their physical location. For example, if an agent is making a cup of tea there is a default set of objects relevant to that task that the agent will carry out a visual search for and purposefully recognise (the kettle, tea bags, cups, etc.). The definition of a set of relevant types corresponding to these objects can be understood as priming a set of task induced judgements related to the recognition of these objects. Finally, context induced judgements can be understood as task related judgements that are not by default related to the task but that are extensions to this set and are caused by the agent's interactions with other agents and the physical context of the task. For example, while an agent is making a cup of tea another agent warns them to take care because the plate beside the kettle is very hot, or the agent may inadvertently touch the plate and sense the heat on its own. The judgement relating to the interpretation of the utterance "the plate beside the kettle" or the sensing of and predicting the desired reaction to a hot surface can be understood as a context induced judgement. The utterance or the hot plate is not a part of the task but is introduced in the context in which the task is taking place.

This raises a question of what mechanisms define these classes of type judgements. Pre-attentive type judgements are the judgements that are fundamental to the agent's basic operation and that the agent is continually making in order to be able to cope with its internal states and the external environment. The types involved in these judgements are intimately linked to the agent's biology and embodiment as they are the types of basic representations generated by the sensors of the agent. As such, there is a finite set of these types. The assignment of other types is governed by the attention model of Load Theory. Attention can be either introduced by the task (or agent internally) or the context (agent externally). In terms of knowledge representation there is no difference between the types of the activity of tea making and the types associated with handling of dangerously hot objects. The set of task and context induced types for which an agent is primed at any moment is defined by the current pre-attentive judgements and the sequence of tasks and contexts the agent has been engaged in so far.

For example, the fact that the agent has previously been in the corridor, coupled with new pre-attentive judgements, could prime the agent to be attentive to types one typically judges in a kitchen. An agent learns through experience the types that are relevant in a particular task and context. Practically, this amounts to finding associations between types in the agent's memory and their evolution over time.

3.2 Cognitive states

Thematic relations are relations between objects, events, people and other entities that co-occur or interact together in space and time (Lin and Murphy, 2001; Estes et al., 2011). Inspired by the concept of a thematic relation we propose that an agent's type inventory is organised as a set of cognitive states, where each state defines a set of types that are related by a thematic relationship. A cognitive state may be the cognitive correlate of the agent intentionally performing a task but may also be a non-explicit cognitive state of an agent generally being in a situation or having a disposition.

Importantly, we don't believe that an agent has conscious access to all its cognitive states, nor can all states be directly mapped to concrete activities. Rather, a cognitive state can be understood as a sensitivity towards certain types of objects, events, and situations, where this sensitivity mapping between states and entities has been learned from experience. For example, there may be a cognitive state associated with the agent's basic existence and its wish to continue existing, or of being a parent, or of being in a concert hall, or of being involved in a conversation about playing a trumpet, or of making a cup of tea. The commonality across this disparate set is the fact that it is possible to list a set of types that are relevant to each state, which represent the agent's resources in the terms of (Cooper, 2008). For example, the very fact of an agent's existence makes it sensitive to entities in the environment that endanger it (large things moving towards it at speed) or help its existence (food nearby). The cognitive state related to being in a concert hall might prime the agent to make judgements about the music, the instruments and the conductor. The state of participating in a conversation about playing a trumpet primes judgements relating to the body language of your interlocutor or the relationship you have with your interlocutor (are they an experienced player or an observer). Finally, the state of making tea could prime an agent to make judgements relating to kettles and cups and their arrangement in space.

It might appear that our approach simply pushes the intractability of judgements over the set of combinatorially exploding types onto an intractability over a set of cognitive states. We argue, however, that notwithstanding the apparent complexity of human inner life there is in fact a relatively restricted number of cognitive states that a human, or an agent trying to live like a human, needs to maintain in the course of an average day. While theoretically there could be as many states as the number of type judgements discussed earlier, it is important to note that these states are built by an agent bottom up as the agent discovers new situations. Since an agent will be constrained by the environment in which it operates, since it can only discover a finite set of situations in its life, and since it is equipped with learning mechanisms with a strong bias to make generalisations, it will only build a subset of these states that can be managed by its memory.

Important features of states and types include: (a) an agent may be in several states at the same time (they may be making tea and talking about music), and (b) a type may be associated with more than one state. While an agent is in a state or states, performing any additional type judgements associated with one of the states incrementally reduces its uncertainty of being in several states.

4 A computational model

There are three requirements for our computational model: (i) the agent clusters types according to thematic relations into several states, (ii) types are associated with each state with a certain probability, and (iii) a particular type may be associated with more than one state. Thematic relations between types are expressed by the co-occurrence of types in states. There are several computational mechanisms that could be used to automatically create states (clusters of types) with the above properties from data. For example, all three requirements are satisfied by Latent Dirichlet Allocation (LDA), which is a popular approach to topic modelling (Blei et al., 2003), the analogy between topic modelling and our scenario being that a topic is similar to a state and the association of a word with a topic is equivalent to the association of a type with a state.
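As an illustration of this analogy, the sketch below (our own, with made-up data; not the authors' code) treats each observed situation as a pseudo-document containing the types judged in it, fits an LDA model with scikit-learn, and reads each topic as a state and the normalised topic-word weights as P(type | state).

```python
# Inducing "states" as LDA topics over co-occurring types (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical training data: types that co-occurred in recorded situations.
situations = [
    "kettle cup tea-bag kitchen",
    "kettle cup hot-plate kitchen",
    "score trumpet conductor concert-hall",
    "trumpet conductor audience concert-hall",
]

vectoriser = CountVectorizer(token_pattern=r"[^ ]+")   # keep hyphenated type names intact
counts = vectoriser.fit_transform(situations)

lda = LatentDirichletAllocation(n_components=2, random_state=0)  # number of states fixed a priori
lda.fit(counts)

# Normalised rows of components_ approximate P(type | state) for each induced state.
p_type_given_state = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
types = vectoriser.get_feature_names_out()
for i, row in enumerate(p_type_given_state):
    top = sorted(zip(types, row), key=lambda tr: -tr[1])[:3]
    print(f"state {i}:", [(t, round(p, 2)) for t, p in top])
```

A topic model over such co-occurrence data gives both requirement (ii), a graded association P(type | state), and requirement (iii), since every type receives some weight in every topic; the number of states, however, has to be fixed in advance, which is the drawback taken up next.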

A drawback with LDA is that the number of topics or states must be known a priori. However, the Hierarchical Dirichlet Process (Teh et al., 2006) is an extension of LDA where the number of topics is also learned.

Given that an agent has learned thematic relations between types in the form of their association to states, the control problem which it is facing is that it cannot know which state it is in and consequently it cannot decide what is the optimal collection of types to be primed for in making judgements. As a result the agent must try to infer the best sets of types to prime for by estimating:

1. a posterior distribution over the possible states (and updating this distribution as it receives observations from the world and makes judgements about the world), and

2. making a decision regarding which judgements to be primed to make, based on the updated probability distribution.

The posterior probability of being in a particular state at time t is dependent on the previous states at time t−1 (i.e., the Markov property holds: conditioned on the present, the future is independent of the past), the task and context judgements the agent has made following the priming in the previous t−n states, where n is the length of history an agent keeps, and the new pre-attentive judgements which may reflect perceptual change in its world. So, the posterior probability of each of the cognitive states of the agent can be computed as follows:

  P(s_t | Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1}) =
    η × P(Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1} | s_t) × P(s_t)

where s_t is a state at time t, AS_{t-1} is the set of active states at time t−1 (the set of active states is defined later), Pre_t is the set of new pre-attentive judgements the agent has just made, Task_{t-1} is the set of task relevant judgements the agent has made following previous priming, Cont_{t-1} is the set of contextual judgements the agent has made following previous priming, and η denotes a normalisation process that ensures that the total probability mass of the posterior distribution sums to 1. We argue that the probabilities on the right hand side of this equation can be learned from experience. This learning process can be simplified if we assume conditional independence between Pre_t, Task_{t-1}, Cont_{t-1} and AS_{t-1} given s_t, essentially adopting the same formulation for calculating posterior probabilities as is used by a standard naive Bayes classifier:

  P(s_t | Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1}) =
    η × P(Pre_t | s_t) × P(Task_{t-1} | s_t) × P(Cont_{t-1} | s_t) × P(AS_{t-1} | s_t) × P(s_t)    (1)

Once we have computed a posterior probability over the set of states an agent has, we need a mechanism that explains how this distribution informs the process of priming types. The simplest mechanism would be to select the state with the maximum a posteriori probability and then load into working memory the set of types that are associated with this state. This approach has the advantage of being computationally simple. However, it has the disadvantage that the agent assumes that they are only ever in one state, and, furthermore, if two or more states have a high posterior probability there is the possibility that the agent will keep switching between these states from one moment to the next. An alternative approach that is less susceptible to switching between states is to:

1. use the posterior probability over the states to rank and prune the state set (the states that are not pruned are the active states),

2. renormalise the probability distribution over the set of active states,

3. compute a posterior probability over the set of types associated with the active states using a Bayes optimal classifier,

4. using the posterior probability over types, rank and prune the set of types and load the set of unpruned types into working memory.

In order to rank and prune the state set we simply order the states based on their posterior probabilities and remove all the states that have a probability below a predefined threshold. This rank and pruning approach essentially implements a process whereby an agent can recognise what is not relevant to the current situation. Renormalising the probability distribution over the remaining states is a simple process of summing the probability mass of the unpruned states and then dividing the probability mass of each state by this sum.
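A compact sketch of this update (our paraphrase of Equation 1 followed by the rank, prune and renormalise steps; the probability tables and the threshold are placeholders that would be learned from experience or set by the current cognitive load):

```python
# Sketch of the state update of Equation 1 followed by pruning and renormalisation.
# Distributions are plain dicts mapping state -> probability.

def normalise(dist):
    z = sum(dist.values())
    return {k: v / z for k, v in dist.items()}          # plays the role of eta

def state_posterior(prior, likelihoods, evidence):
    """Equation 1: P(s_t | evidence) = eta * P(s_t) * prod_e P(e | s_t),
    assuming conditional independence of the evidence items given s_t."""
    post = {}
    for s, p_s in prior.items():
        p = p_s
        for e in evidence:        # items of Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1}
            p *= likelihoods[e][s]
        post[s] = p
    return normalise(post)

def prune_and_renormalise(dist, threshold):
    """Keep the states whose posterior reaches the attention threshold (the
    active states) and renormalise the remaining probability mass."""
    active = {s: p for s, p in dist.items() if p >= threshold}
    return normalise(active)
```

The worked example in Section 5 instantiates these tables with concrete numbers.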

The posterior probability of a type at time t is calculated using a Bayes optimal classifier as follows:

  P(type_t | Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1}) =
    Σ_{s ∈ AS_t} P(type | s) × P(s | Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1})    (2)

where type_t denotes a type at time t, AS_t denotes the set of unpruned (active) states at time t, P(s | Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1}) denotes the probability of an active state s after the state set has been pruned and the posterior probability over the active states has been renormalised, and Pre_t, Task_{t-1}, Cont_{t-1}, and AS_{t-1} have the same meanings as defined above. Using a Bayes optimal approach to calculating the posterior distribution over the types associated with the active states is computationally expensive because it includes a summation across the set of active states. However, the size of this set can be restricted based on the pruning criteria used, so the computational cost of this summation operation can be minimised. Some of the benefits, however, of using a Bayes optimal formulation are that: (a) this process explicitly recognises the fact that more than one state may be active at one point, (b) it also recognises the fact that a type may be associated with more than one state and that the strength of association between the type and a state is probabilistic (P(type | s)), and (c) this formulation is robust to small variations in the posterior distribution over states (i.e., when the state with the maximum a posteriori probability changes, the system is stable, in terms of the types that are loaded into memory, if the changes across the distribution are stable). Once the posterior distribution over the types has been calculated, the types can be ranked and pruned in a similar fashion to the states. This means that we need two thresholds for pruning, one for pruning the states and one for pruning the types. The ranking and pruning across the states and the types both reflect the attention based approach we have taken to this work, modelled by Load Theory. When the cognitive load on the agent is low the pruning of states and types can be relaxed, and when the cognitive load from the perceptual selection is high the pruning can become more severe.

5 Worked example

In this section we present a worked example that illustrates how an agent interacting in and moving around an environment can use the proposed models to prime the set of type judgements it has loaded in its memory. This example assumes that the agent has already learned a number of types and has already associated these types with the cognitive states it has constructed over the course of its lifetime.

To begin, we will assume our agent has three cognitive states, S1, S2 and S3, and that the prior probabilities of these states are <0.4, 0.3, 0.3> respectively. Furthermore, the state transition matrix is a right stochastic matrix with i rows and j columns where each cell defines the probability of going from state i to state j in one time step (i.e., each cell defines P(Sj | Si)) and is defined as follows:

        S1    S2    S3
  S1    0.7   0.2   0.1
  S2    0.3   0.4   0.3
  S3    0.1   0.2   0.7

We will assume that there are 3 different pre-attentive types that the agent can assign to low-level perceptual features. For labelling convenience let us assume that these features are biased towards particular locations in a building so that we can name these types after these locations, namely: OFFICE, CORRIDOR, and KITCHEN (cf. semantic labelling of places (Martínez Mozos et al., 2007)). We will also assume that the agent knows three task/contextual types (for the purposes of the example the distinction between task and contextual types is moot). We are interested in constructing agents that can participate in dialogues, so we have decided that these types include types assigned to utterances in dialogue that the agent can engage in; for example, this agent can take part in dialogues relating to WEATHER, MACHINE-LEARNING, or the general WELL-BEING of someone. According to our model the agent should have learnt a probabilistic relationship between each of these types and its own cognitive states. The following right stochastic matrix defines the probabilistic relationship between each of the pre-attentive types and the states (i.e., each cell defines P(type | Si)):

        OFFICE   CORRIDOR   KITCHEN
  S1    0.7      0.2        0.1
  S2    0.1      0.8        0.1
  S3    0.05     0.15       0.8

And, the following matrix defines the probabilistic relationships between each of the task/context types and the states:

        WEATHER   MACH.-LEARN.   WELL-BEING
  S1    0.1       0.7            0.2
  S2    0.4       0.1            0.5
  S3    0.6       0.3            0.1

We also need to define two attention thresholds: one threshold is used to define the set of active states and the other is used to define the set of active types. Unlike the probabilities defined above (which are relatively fixed and are updated via a separate learning process) these attention thresholds may change from moment to moment and are dependent on the cognitive load the agent is experiencing: under high load the thresholds are low, under low load the thresholds are high. For this example, we will assume that the agent is under a moderately high load and that both of these thresholds are set to 0.3.

To begin calculating the set of types that the agent is primed for at time step t we need information relating to: (a) the set of active states at t−1 (AS_{t-1}); (b) the set of task and context type judgements the agent made at time t−1 (T_{t-1}); and (c) the set of pre-attentive judgements the agent has just made at time t (Pre_t). For this example we will assume the following: AS_{t-1} = {S1, S2, S3}, T_{t-1} = {MACHINE-LEARNING, WEATHER}, and Pre_t = {OFFICE, CORRIDOR}.

Our first step is to calculate the probability distribution over the states at time t. We do this using Equation 1. Before we apply Equation 1 we need to calculate the probability distribution for P(AS_{t-1} | S_t). We can calculate these probabilities using the prior probabilities for the states and the transition matrix P(Sj | Si) and applying Bayes' Theorem. The resulting probabilities (rounded to 4 decimal places) are as follows:

          AS1_{t-1}   AS2_{t-1}   AS3_{t-1}
  S1_t    0.7000      0.2250      0.0750
  S2_t    0.3077      0.4615      0.2308
  S3_t    0.1176      0.2647      0.6176

Once we have these probabilities it is a relatively straightforward process to calculate P(S_t | Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1}) using Equation 1. One technical point is that for each factor on the right hand side of the equation (P(Pre_t | s_t), P(Task_{t-1} | s_t), P(Cont_{t-1} | s_t) and P(AS_{t-1} | s_t)) we assume conditional independence between the conditioned events given the evidence. For example, for each S_{i,t} we calculate P(AS_{t-1} | S_{i,t}) as:

  P(AS_{t-1} | S_{i,t}) = P(AS1_{t-1} | S_{i,t}) × P(AS2_{t-1} | S_{i,t}) × P(AS3_{t-1} | S_{i,t})

In this context the posterior probability over the states (rounded to 4 decimal places) is:

  P(S_t | Pre_t, Task_{t-1}, Cont_{t-1}, AS_{t-1}) = <0.5412, 0.3677, 0.0911>

Applying the attention threshold to this distribution over the states, the set of active states at time t is then {S1, S2}, and renormalising the probability mass over these states gives us a probability distribution (again rounded to 4 decimal places) of <0.5954, 0.4046>. We can now use Equation 2 to calculate the posterior probabilities over the task and contextual types. We only do this calculation for types that are associated (i.e., P(type | ASi) > 0) with at least one of the active states. In this instance all three of the task/context types (WEATHER, MACHINE-LEARNING, and WELL-BEING) are associated with at least one of the active states so we calculate the posterior probability for all three of these types. The posterior probabilities over the types are <0.2214, 0.4573, 0.3214>. Applying the type attention threshold of 0.3 to this distribution, there are two types that are active, MACHINE-LEARNING and WELL-BEING, and the agent will be primed to make judgements of these types at time t.

In this example, we only pruned one of the task/context types from the primed list. However, as the number of types grows (remember that types will represent concepts at different levels of abstraction) and the number of states also grows, then the number of types that are pruned will also grow.
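The calculation above can be replayed in a few lines of code; the sketch below (our illustration) fills in the tables from this section and reproduces the reported values up to rounding.

```python
# Reproducing the worked example (the text rounds to 4 decimal places).

states = ["S1", "S2", "S3"]
prior  = {"S1": 0.4, "S2": 0.3, "S3": 0.3}
trans  = {"S1": {"S1": 0.7, "S2": 0.2, "S3": 0.1},    # P(S_j | S_i)
          "S2": {"S1": 0.3, "S2": 0.4, "S3": 0.3},
          "S3": {"S1": 0.1, "S2": 0.2, "S3": 0.7}}
p_pre  = {"OFFICE":   {"S1": 0.7, "S2": 0.1, "S3": 0.05},
          "CORRIDOR": {"S1": 0.2, "S2": 0.8, "S3": 0.15},
          "KITCHEN":  {"S1": 0.1, "S2": 0.1, "S3": 0.8}}
p_task = {"WEATHER":          {"S1": 0.1, "S2": 0.4, "S3": 0.6},
          "MACHINE-LEARNING": {"S1": 0.7, "S2": 0.1, "S3": 0.3},
          "WELL-BEING":       {"S1": 0.2, "S2": 0.5, "S3": 0.1}}

# P(AS_{t-1} = S_i | S_t = S_j) from the prior and the transition matrix (Bayes' theorem).
p_st = {j: sum(prior[i] * trans[i][j] for i in states) for j in states}
p_as = {j: {i: prior[i] * trans[i][j] / p_st[j] for i in states} for j in states}

# Equation 1 with Pre_t = {OFFICE, CORRIDOR}, T_{t-1} = {MACHINE-LEARNING, WEATHER}
# and AS_{t-1} = {S1, S2, S3}.
post = {}
for j in states:
    p = prior[j] * p_pre["OFFICE"][j] * p_pre["CORRIDOR"][j]
    p *= p_task["MACHINE-LEARNING"][j] * p_task["WEATHER"][j]
    for i in states:
        p *= p_as[j][i]
    post[j] = p
z = sum(post.values())
post = {j: v / z for j, v in post.items()}
print(post)                    # ~ {'S1': 0.5412, 'S2': 0.3677, 'S3': 0.0911}

# Prune at the 0.3 threshold and renormalise over the active states {S1, S2}.
active = {j: v for j, v in post.items() if v >= 0.3}
z = sum(active.values())
active = {j: v / z for j, v in active.items()}
print(active)                  # ~ {'S1': 0.5954, 'S2': 0.4046}

# Equation 2: posterior over the task/context types, then prune at 0.3.
p_types = {t: sum(p_task[t][j] * active[j] for j in active) for t in p_task}
print(p_types)                 # ~ WEATHER 0.2214, MACHINE-LEARNING 0.4573, WELL-BEING 0.3214
print([t for t, v in p_types.items() if v >= 0.3])   # ['MACHINE-LEARNING', 'WELL-BEING']
```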

6 Conclusion and future work

In this paper we present a computational mechanism for attention-driven type judgements in an interacting agent that is inspired by cognitive processes in humans such as the discovery of thematic relations and the sharing of cognitive resources between perceptual selection and cognitive control as proposed in Load Theory. It is important to note that the problem of multiple type assignments or judgements is not exclusive to TTR but is a general problem wherever a cognitive agent has to make numerous classifications based on limited computational resources. In robotics this task is known as visual search (Sjöö, 2011; Kunze et al., 2014). The proposed application of TTR allows us to formulate a cognitively-inspired computational model for visual search. The approach is also relevant to computational modelling of situated dialogue. Being primed for particular types would disambiguate interlocutors' utterances based on the previous type judgements and perceptual observations. In dialogue generation it allows priming of the agent to particular topics and therefore can be used for topic modelling of a dialogue system.

The model proposes that an agent has a set of cognitive states that it has learned from past experience. An agent may be in more than one cognitive state at any one time. There is a set of types associated with each cognitive state of an agent. When a cognitive state is active (unpruned) an agent is primed to make judgements relating to the types associated with the state. This is why our account links judgement in TTR and attention. The difficulty with this account is that because more than one cognitive state is active at any one time the agent must decide which of the active cognitive states it should prime for observation. The solution is that the agent should maintain a distribution over its cognitive states and prime its observation relative to the types associated with the cognitive states with high probability. Following Load Theory the agent will actually perceive as many of these primed types as it can before its perceptual capacity is exhausted and it will then select a subset of these primed types for further cognitive processing.

In our forthcoming work we are working towards a computational application of the model to situated dialogue. We are particularly interested in evaluating the benefits of an agent being primed this way in comparison to when it has no priming at all. The model introduces several parameters, for example the number of states, the number of types, and the size of memory for pre-attentive judgements and task and context related judgements, whose effects on system performance will also be investigated.

References

D. Alan Allport, Barbara Antonis, and Patricia Reynolds. 1972. On the division of attention: A disproof of the single channel hypothesis. The Quarterly Journal of Experimental Psychology, 24(2):225–235.

Patrick Blackburn and Johan Bos. 2005. Representation and Inference for Natural Language: A First Course in Computational Semantics. CSLI Publications.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research (JMLR), 3:993–1022.

D. E. Broadbent. 1958. Perception and Communication. Pergamon Press.

Herbert H. Clark. 1996. Using Language. Cambridge University Press, Cambridge.

Robin Cooper, Simon Dobnik, Shalom Lappin, and Staffan Larsson. 2015. Probabilistic type theory and natural language semantics. Linguistic Issues in Language Technology (LiLT), 10(4):1–43, November.

Robin Cooper. 2005. Austinian truth, attitudes and type theory. Research on Language and Computation, 3(2):333–362.

Robin Cooper. 2008. Type theory with records and unification-based grammar. In Fritz Hamm and Stephan Kepser, editors, Logics for Linguistic Structures: Festschrift for Uwe Mönnich, volume 201, page 9. Walter de Gruyter.

Robin Cooper. 2012. Type theory and semantics in flux. In Ruth Kempson, Nicholas Asher, and Tim Fernando, editors, Handbook of the Philosophy of Science, volume 14 (general editors: Dov M. Gabbay, Paul Thagard and John Woods). Elsevier BV.

M. W. M. G. Dissanayake, P. M. Newman, H. F. Durrant-Whyte, S. Clark, and M. Csorba. 2001. A solution to the simultaneous localization and map building (SLAM) problem. IEEE Transactions on Robotics and Automation, 17(3):229–241.

David R. Dowty, Robert Eugene Wall, and Stanley Peters. 1981. Introduction to Montague Semantics. D. Reidel Pub. Co., Dordrecht, Holland.

John Duncan. 1980. The locus of interference in the perception of simultaneous stimuli. Psychological Review, 87(3):272–300.

Zachary Estes, Sabrina Golonka, and Lara L. Jones. 2011. Thematic thinking: The apprehension and consequences of thematic relations. In Brian Ross, editor, The Psychology of Learning and Motivation, volume 54, pages 249–294. Burlington: Academic Press.

Jonathan Ginzburg and Raquel Fernández. 2010. Computational models of dialogue. In Alexander Clark, Chris Fox, and Shalom Lappin, editors, The Handbook of Computational Linguistics and Natural Language Processing, Blackwell Handbooks in Linguistics, pages 429–481. Wiley-Blackwell, Chichester, United Kingdom.

Stevan Harnad. 1990. The symbol grounding problem. Physica D, 42(1–3):335–346, June.

Julian Hough and Matthew Purver. 2014. Probabilistic type theory for incremental dialogue processing. In Proceedings of the EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS), pages 80–88, Gothenburg, Sweden, April. Association for Computational Linguistics.

Daniel Kahneman. 1973. Attention and effort. Prentice-Hall, Englewood Cliffs, N.J.

Geert-Jan M. Kruijff, John D. Kelleher, and Nick Hawes. 2006. Information fusion for visual reference resolution in dynamic situated dialogue. In Perception and Interactive Technologies. Springer.

Lars Kunze, Chris Burbridge, and Nick Hawes. 2014. Bootstrapping probabilistic models of qualitative spatial relations for active visual object search. In AAAI Spring Symposium 2014 on Qualitative Representations for Robots, Stanford University in Palo Alto, California, US, March 24–26.

Staffan Larsson. 2013. Formal semantics for perceptual classification. Journal of Logic and Computation, online:1–35, December 18.

Nilli Lavie, Aleksandra Hirst, Jan W. de Fockert, and Essi Viding. 2004. Load theory of selective attention and cognitive control. Journal of Experimental Psychology: General, 133(3):339–354.

Emilie L. Lin and Gregory L. Murphy. 2001. Thematic relations in adults' concepts. Journal of Experimental Psychology: General, 130(1):3–28.

Óscar Martínez Mozos, Rudolph Triebel, Patric Jensfelt, Axel Rottmann, and Wolfram Burgard. 2007. Supervised semantic labeling of places using information extracted from sensor data. Robotics and Autonomous Systems, 55(5):391–402.

Kristoffer Sjöö. 2011. Functional Understanding of Space: Representing Spatial Knowledge Using Concepts Grounded in an Agent's Purpose. Ph.D. thesis, KTH, Computer Vision and Active Perception (CVAP), Centre for Autonomous Systems (CAS), Stockholm, Sweden.

Luc Steels and Tony Belpaeme. 2005. Coordinating perceptually grounded categories through language: A case study for colour. Behavioral and Brain Sciences, 28(4):469–489.

Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and David M. Blei. 2006. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581.

Anne M. Treisman. 1969. Strategies and models of selective attention. Psychological Review, 76(3):282–299.

Shimon Ullman. 1984. Visual routines. Cognition, 18(1–3):97–159.

A. T. Welford. 1967. Single-channel operation in the brain. Acta Psychologica, 27:5–22.