Towards Understanding Situated Natural Language
Antoine Bordes, LIP6, Université Paris 6, Paris, France ([email protected])
Nicolas Usunier, LIP6, Université Paris 6, Paris, France ([email protected])
Ronan Collobert, NEC Laboratories America, Princeton, USA ([email protected])
Jason Weston, Google, New York, USA ([email protected])

Abstract

We present a general framework and learning algorithm for the task of concept labeling: each word in a given sentence has to be tagged with the unique physical entity (e.g. person, object or location) or abstract concept it refers to. Our method allows both world knowledge and linguistic information to be used during learning and prediction. We show experimentally that we can learn to use world knowledge to resolve ambiguities in language, such as word senses or reference resolution, without the use of hand-crafted rules or features.

1 Introduction

Much of the focus of the natural language processing community lies in solving syntactic or semantic tasks with the aid of sophisticated machine learning algorithms and the encoding of linguistic prior knowledge. For example, a typical way of encoding prior knowledge is to hand-code syntax-based input features or rules for a given task. However, one of the most important features of natural language is that its real-world use (as a tool for humans) is to communicate something about our physical reality or metaphysical considerations of that reality. This is strong prior knowledge that is simply ignored in most current systems.

For example, current parsing systems make no allowance for disambiguating a sentence given knowledge of the physical reality of the world. So, if one happened to know that Bill owned a telescope while John did not, then this should affect parsing decisions for the sentence "John saw Bill in the park with his telescope." Likewise, in terms of reference resolution one could disambiguate the sentence "He passed the exam." if one happens to know that Bill is taking an exam and John is not. Further, one can better disambiguate the word "bank" in "John went to the bank" if one happens to know whether John is out for a walk in the countryside or in the city. Many human disambiguation decisions are in fact based on whether the current sentence agrees well with one's current world model. Such a model is dynamic, as the current state of the world (e.g. the existing entities and their relations) changes over time. Note that the model we use to disambiguate can be built from our perception of the world (e.g. visual perception) and not necessarily from language at all.

Concept labeling  In this paper, we propose a general framework for learning to use world knowledge called concept labeling. The "knowledge" we consider can be viewed as a database of physical entities existing in the world (e.g. people, locations or objects) as well as abstract concepts, and relations between them; e.g. the location of one entity is expressed in terms of its relation with another entity. Our task consists of labeling each word of a sentence with its corresponding concept from the database.

The solution to this task does not provide a full semantic interpretation of a sentence, but we believe it is an important part of that goal. Indeed, in many cases, the meaning of a sentence can only be uncovered after knowing exactly which concepts, e.g. which unique objects in the world, are involved. If one wants to interpret "He passed the exam", one has to infer not only that "He" refers to a "John" and "exam" to a school test, but also exactly which "John" and which test it was. In that sense, concept labeling is more general than traditional tasks like word-sense disambiguation, co-reference resolution and named-entity recognition, and can be seen as a unification of them.
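To make this intuition concrete, the following toy sketch (our own illustration, not the paper's method, which learns such behaviour rather than hand-coding it) shows how a dynamic world model can resolve "He" in "He passed the exam" when, as in the example above, Bill is taking an exam and John is not. The entity names and the taking_exam relation are hypothetical.

    # Toy world model: a hard-coded lookup showing why world state
    # disambiguates reference. All names and relations are hypothetical.
    universe = {
        "entities": {"<John1>", "<Bill1>", "<exam1>"},
        # Bill is currently taking an exam, John is not.
        "taking_exam": {"<Bill1>": "<exam1>"},
    }

    def resolve_pronoun(universe):
        """Candidates for "He": entities the world model says take an exam."""
        return [e for e in universe["entities"] if e in universe["taking_exam"]]

    print(resolve_pronoun(universe))  # ['<Bill1>'], i.e. Bill rather than John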
Learning algorithm  We then go on to propose a tractable algorithm for this task that seamlessly learns to integrate both the world knowledge and the linguistic content of a sentence, without the use of any hand-crafted rules or features. This is a challenging goal, and standard algorithms do not achieve it. Our algorithm is first evaluated on human-generated sentences from RoboCup commentaries (Chen & Mooney, 2008). Yet this dataset does not involve world knowledge. Hence we present experiments using a novel simulation procedure to generate natural language and concept label pairs: the simulation generates an evolving world, together with sentences describing the successive evolutions. These experiments demonstrate that our algorithm can learn to use world knowledge for word disambiguation and reference resolution when standard methods cannot.

Although clearly only a first step towards the goal of language understanding, which is AI-complete, we feel our work is an original way of tackling an important and central problem. In a nutshell, we show that one can learn to resolve ambiguities using world knowledge, which is a prerequisite for further semantic analysis, e.g. for communication.

Previous work  Our work concerns learning the connection between two symbolic systems: that of natural language and the non-linguistic one of the concepts present in a database. Making such an association has been studied in the literature as the symbol grounding problem (Harnad, 1990). More specifically, the problem of connecting natural language to another symbolic system is called grounded language processing (Roy & Reiter, 2005). Some of the earliest works that used world knowledge to improve linguistic processing involved hand-coded parsing and no learning at all, perhaps the most famous being situated in blocks world (Winograd et al., 1972).

More recent work on grounded language acquisition has focused on learning to match language with some other representation. Grounding text with a visual representation, also in a blocks-type world, was tackled in (Feldman et al., 1996). Other works use visual grounding (Siskind, 1994; Yu & Ballard, 2004; Barnard & Johnson, 2005), or a representation of the intended meaning in some formal language (Fleischman & Roy, 2005; Kate & Mooney, 2007; Wong & Mooney, 2007; Chen & Mooney, 2008; Branavan et al., 2009; Zettlemoyer & Collins, 2009; Liang et al., 2009). Example applications of such grounding include using the multimodal input to improve clustering (with respect to unimodal input), e.g. (Siskind, 1994), word-sense disambiguation (Barnard & Johnson, 2005; Fleischman & Roy, 2005) or semantic parsing (Kate & Mooney, 2007; Chen & Mooney, 2008; Branavan et al., 2009; Zettlemoyer & Collins, 2009).

Work using linguistic context, i.e. previously uttered sentences, also ranges from dialogue systems, e.g. (Allen, 1995), to co-reference resolution (Soon et al., 2001). We do not consider this type of contextual knowledge in this paper; however, our framework is extensible to those settings.

2 Concept Labeling

We consider the following setup. One must learn a mapping from a natural language sentence x ∈ X to its labeling in terms of concepts y ∈ Y, where y is an ordered set of concepts, one concept for each word in the sentence, i.e. y = (c_1, …, c_{|x|}) where c_i ∈ C, the set of concepts. These concepts belong to a current model of the world which expresses one's knowledge of it; we term this model a "universe". The framework of concept labeling is illustrated in Figure 1.

World knowledge  We define the universe as a set of concepts and their relations to other concepts: U = (C, R_1, …, R_n), where n is the number of types of relation and R_i ⊂ C², ∀i = 1, …, n. Much work has been done on knowledge representation itself (see (Russell et al., 1995) for an introduction). This is not the focus of this paper, so we made the simplest possible choice. To make things concrete, we now describe the template database we use in this paper.

Each concept c of the database is identified by a unique string name(c). Each physical object or action (verb) of the universe has its own referent. For example, two different cartons of milk will be referred to as <milk1> and <milk2>. Here, we use understandable strings as identifiers for clarity, but they have no meaning for the system.

In our experiments we will consider two types of relation¹, which give our learner world knowledge about geographical properties of the entities in its world, and which can be expressed with the following formulas:

• location(c) = c′ with c, c′ ∈ C: the location of the concept c is the concept c′.

• containedby(c) = c′ with c, c′ ∈ C: the concept c′ physically holds the concept c.

¹Of course this list is easily expanded upon. Here, we give two simple properties of physical objects.
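As an illustration, here is a minimal sketch of one possible encoding of a universe U = (C, R_1, R_2) and of a labeled sentence over it. The dictionary-based encoding and the concepts <kitchen> and <fridge1> are our own assumptions; <milk1>, the two relations, and the "He cooks the rice" labeling follow the paper's notation and its Figure 1 example. The "-" marker for words that refer to no concept is likewise our own convention.

    # A tiny universe U = (C, R_1, R_2): a concept set plus two relations.
    # Encoding and some names are assumptions; the structure follows the paper.
    concepts = {"<John>", "<kitchen>", "<milk1>", "<fridge1>", "<cook>", "<rice>"}

    # Relations stored as functional mappings, mirroring location(c) = c'
    # and containedby(c) = c'.
    location = {"<John>": "<kitchen>", "<fridge1>": "<kitchen>"}
    containedby = {"<milk1>": "<fridge1>"}

    universe = (concepts, location, containedby)

    # A sentence x and its concept labeling y = (c_1, ..., c_{|x|}),
    # one label per word; "-" marks words that refer to no concept.
    x = ["He", "cooks", "the", "rice"]
    y = ["<John>", "<cook>", "-", "<rice>"]

    # Sanity check: every label is a known concept or the "no concept" marker.
    assert all(c == "-" or c in concepts for c in y)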
Strong and weak supervision  Usually in sequence labeling, learning is carried out using fully supervised data composed of input sequences explicitly aligned with label annotations. To learn the task of concept labeling, this would result in training examples composed of triples (x, y, u) ∈ X × Y × U, as in the example of Figure 1.

[Figure: a concept labeling example. x: "He cooks the rice"; y: <John>? <cook>? ? <rice>?; u: a universe with "location" relations.]

An immediately realisable setting is within a computer game environment, e.g. multiplayer Internet games. For such games, the knowledge base of concepts is already given by the underlying game model defined by the game designers (e.g.