
Multi-Modal Models for Concrete and Abstract Concept Meaning

Felix Hill (Computer Laboratory, University of Cambridge) [email protected]
Roi Reichart (Technion - IIT, Haifa, Israel) [email protected]
Anna Korhonen (Computer Laboratory, University of Cambridge) [email protected]

Abstract

Multi-modal models that learn semantic representations from both linguistic and perceptual input outperform language-only models on a range of evaluations, and better reflect human concept acquisition. Most perceptual input to such models corresponds to concrete noun concepts and the superiority of the multi-modal approach has only been established when evaluating on such concepts. We therefore investigate which concepts can be effectively learned by multi-modal models. We show that concreteness determines both which linguistic features are most informative and the impact of perceptual input in such models. We then introduce ridge regression as a means of propagating perceptual information from concrete nouns to more abstract concepts that is more robust than previous approaches. Finally, we introduce weighted gram matrix combination, a means of combining representations from distinct modalities that outperforms alternatives when both modalities are sufficiently rich.

1 Introduction

What is needed to learn the meaning of a word? Children learning words are exposed to a diverse mix of information sources. These include clues in the language itself, such as nearby words or cues from the speaker, but also what the child perceives about the world around it when the word is heard. Learning the meaning of words requires not only a sensitivity to both linguistic and perceptual input, but also the ability to process and combine information from these modalities in a productive way.

Many computational semantic models represent words as real-valued vectors, encoding their relative frequency of occurrence in particular forms and contexts in linguistic corpora (Sahlgren, 2006; Turney et al., 2010). Motivated both by parallels with human language acquisition and by evidence that many word meanings are grounded in the perceptual system (Barsalou et al., 2003), recent research has explored the integration into text-based models of input that approximates the visual or other sensory modalities (Silberer and Lapata, 2012; Bruni et al., 2014). Such models can learn higher-quality semantic representations than conventional corpus-only models, as evidenced by a range of evaluations.

However, the majority of perceptual input for the models in these studies corresponds directly to concrete noun concepts, such as chocolate or cheeseburger, and the superiority of the multi-modal over the corpus-only approach has only been established when evaluations include such concepts (Leong and Mihalcea, 2011; Bruni et al., 2012; Roller and Schulte im Walde, 2013; Silberer and Lapata, 2012). It is thus unclear if the multi-modal approach is effective for more abstract words, such as guilt or obesity. Indeed, since empirical evidence indicates differences in the representational frameworks of both concrete and abstract concepts (Paivio, 1991; Hill et al., 2013), and verb and noun concepts (Markman and Wisniewski, 1997), perceptual information may not fulfill the same role in the representation of the various concept types. This potential challenge to the multi-modal approach is of particular practical importance since concrete nouns constitute only a small proportion of the open-class, meaning-bearing words in everyday language (Section 2).


In light of these considerations, this paper addresses three questions: (1) Which information sources (modalities) are important for acquiring concepts of different types? (2) Can perceptual input be propagated effectively from concrete to more abstract words? (3) What is the best way to combine information from the different sources?

We construct models that acquire semantic representations for four sets of concepts: concrete nouns, abstract nouns, concrete verbs and abstract verbs. The linguistic input to the models comes from the recently released Google Syntactic N-Grams Corpus (Goldberg and Orwant, 2013), from which a selection of linguistic features are extracted. Perceptual input is approximated by data from the McRae et al. (2005) norms, which encode perceptual properties of concrete nouns, and the ESP-Game dataset (Von Ahn and Dabbish, 2004), which contains manually generated descriptions of 100,000 images.

To address (1) we extract representations for each concept type from combinations of information sources. We first focus on different classes of linguistic features, before extending our models to the multi-modal context. While linguistic information overall effectively reflects the meaning of all concept types, we show that features encoding syntactic information are only valuable for the acquisition of abstract concepts. On the other hand, perceptual information, whether directly encoded or propagated through the model, plays a more important role in the representation of concrete concepts.

In addressing (2), we propose ridge regression (Myers, 1990) as a means of propagating features from concrete nouns to more abstract concepts. The regularization term in ridge regression encourages solutions that generalize well across concept types. We show that ridge regression effectively propagates perceptual information to abstract nouns and concrete verbs, and is overall preferable to both linear regression and the method of Johns and Jones (2012) applied to a similar task by Silberer and Lapata (2012). However, for all propagation methods, the impact of integrating perceptual information depends on the concreteness of the target concepts. Indeed, for abstract verbs, the most abstract concept type in our evaluations, perceptual input actually degrades representation quality. This highlights the need to consider the concreteness of the target domain when constructing multi-modal models.

To address (3), we present various means of combining information from different modalities. We propose weighted gram matrix combination, a technique in which representations of distinct modalities are mapped to a space of common dimension where coordinates reflect proximity to other concepts. This transformation, which has been shown to enhance semantic representations in the context of verb clustering (Reichart and Korhonen, 2013), reduces representation sparsity and facilitates a product-based combination that results in greater inter-modal dependency. Weighted gram matrix combination outperforms alternatives such as concatenation and Canonical Correlation Analysis (CCA) (Hardoon et al., 2004) when combining representations from two similarly rich information sources.

In Section 3, we present experiments with linguistic features designed to address question (1). These analyses are extended to multi-modal models in Section 4, where we also address (2) and (3). We first discuss the relevance of concreteness and part-of-speech (lexical function) to concept representation.

2 Concreteness and Word Meaning

A large and growing body of psychological evidence indicates differences between abstract and concrete concepts.[1] It has been shown that concrete words are more easily learned, remembered and processed than abstract words (Paivio, 1991; Schwanenflugel and Shoben, 1983), while neuroimaging studies demonstrate differences in brain activity when subjects are presented with stimuli corresponding to the two concept types (Binder et al., 2005).

[1] Here concreteness is understood intuitively, as per the psychological literature (Rosen, 2001; Gallese and Lakoff, 2005).

The abstract/concrete distinction is important to computational semantics for various reasons. While many models construct representations of concrete words (Andrews et al., 2009; Landauer and Dumais, 1997), abstract words are in fact far more common in everyday language. For instance, based on an analysis of those noun concepts in the University of South Florida dataset (USF) and their occurrence in the British National Corpus (BNC) (Leech et al., 1994), 72% of noun tokens in corpora are rated by human judges as more abstract than the noun war, a concept that many would already consider quite abstract.[2]

[2] This sample covers 15.2% of all noun tokens in the BNC.

[Figure 1 appears here: boxplots of average concreteness rating (scale 0-6) for the noun and verb concepts in the USF data; example nouns marked on the plot include mood, rule, praise, beam, clam, sardine and penguin, and example verbs include believe, enjoy, leave, look, stab and swing.]

Figure 1: Boxplot of concreteness distributions for noun and verb concepts in the USF data, with selected example concepts. The bold vertical line is the mean, boxes extend from the first to the third quartile, and dots represent outliers.

The recent interest in multi-modal semantics further motivates a principled modelling approach to lexical concreteness. Many multi-modal models implicitly distinguish concrete and abstract concepts since their perceptual input corresponds only to concrete words (Bruni et al., 2012; Silberer and Lapata, 2012; Roller and Schulte im Walde, 2013). However, given that many abstract concepts express relations or modifications of concrete concepts (Gentner and Markman, 1997), it is reasonable to expect that perceptual information about concrete concepts could also enhance the quality of more abstract representations in an appropriately constructed model.

Moreover, concreteness is closely related to more functional lexical distinctions, such as those between adjectives, nouns and verbs. An analysis of the USF dataset, which includes concreteness ratings for over 4,000 words collected from thousands of participants, indicates that on average verbs (mean concreteness, 3.64) are considered more abstract than nouns (mean concreteness, 4.91), an effect illustrated in Figure 1. This connection between lexical function and concreteness suggests that a sensitivity to concreteness could improve models that already make principled distinctions between words based on their part-of-speech (POS) (Im Walde, 2006; Baroni and Zamparelli, 2010).

Although the focus of this paper is on multi-modal models, few conventional semantic models make principled distinctions between concepts based on function or concreteness. Before turning to the multi-modal case, we thus investigate whether these distinctions are pertinent to text-only models.

3 Concreteness and Linguistic Features

It has long been known that aspects of word meaning can be inferred from nearby words in corpora. Approaches that exploit this fact are often called distributional models (Sahlgren, 2006; Turney et al., 2010). We take a distributional approach to learning linguistic representations. The advantage of using distributional methods to learn representations from corpora versus approaches that rely on knowledge bases (Pedersen et al., 2004; Leong and Mihalcea, 2011) is that they are more scalable, easily applicable across languages and plausibly reflect the process of human word learning (Landauer and Dumais, 1997; Griffiths et al., 2007). We group distributional features into three classes to test which forms of linguistic information are most pertinent to the abstract/concrete and verb/noun distinctions.

All features are extracted from the Google Syntactic N-grams Corpus. The dataset contains counted dependency-tree fragments for over 10bn words of the English Google Books Corpus.

3.1 Feature Classes

Lexical Features  Our lexical features are the co-occurrence counts of a concept word with each of the other 2,529 concepts in the USF data. Co-occurrences are counted in a 5-word window, and, as elsewhere (Erk and Padó, 2008), weighted by pointwise mutual information (PMI) to control for the underlying frequency of both concept and word.

POS-tag Features  Many words function as more than one POS, and this variation can be indicative of meaning (Manning, 2011). For example, deverbal nouns, such as shiver or walk, often refer to processes rather than entities. To capture such effects, we count the frequency of occurrence with the POS categories adjective, adverb, noun and verb.
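A minimal sketch of the lexical feature extraction described above, written over a plain tokenized corpus rather than the Google Syntactic N-grams data; the function name and toy sentences are illustrative assumptions, not part of the paper's implementation.

    import math
    from collections import Counter

    def lexical_features(sentences, target_words, context_words, window=5):
        """PMI-weighted co-occurrence vectors over a +/- `window` token span."""
        pair_count = Counter()
        for sent in sentences:
            for i, w in enumerate(sent):
                for c in sent[i + 1 : i + 1 + window]:
                    pair_count[(w, c)] += 1
                    pair_count[(c, w)] += 1

        total = sum(pair_count.values())
        marginal = Counter()               # co-occurrence mass of each word
        for (w, _c), n in pair_count.items():
            marginal[w] += n

        def pmi(w, c):
            joint = pair_count[(w, c)]
            if joint == 0 or marginal[w] == 0 or marginal[c] == 0:
                return 0.0
            return math.log((joint / total) / ((marginal[w] / total) * (marginal[c] / total)))

        return {w: [pmi(w, c) for c in context_words] for w in target_words}

    # toy usage
    sents = [["the", "dog", "chased", "the", "cat"], ["the", "cat", "slept"]]
    print(lexical_features(sents, ["dog"], ["cat", "chased", "slept"]))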

                     Context              Example
    Noun Concepts    indirect object      gave it to the man
                     direct object        gave the pie to him
                     subject              the man grinned
                     in PP                was in his mouth
                     adject. modifier     the portly man
                     infinitive clause    to eat is human
    Verb Concepts    transitive           he bit the steak
                     intransitive         he salivated
                     ditransitive         put jam on the toast
                     phrasal verb         he gobbled it up
                     infinitival comp.    he wants to snooze
                     clausal comp.        I bet he won't diet

Table 1: Grammatical features for noun/verb concepts.

    Concept Type      Words   Pairs   Examples
    concrete nouns      303    1280   yacht, cup
    abstract nouns      100     295   fear, respect
    all nouns           403    1716   cup, respect
    concrete verbs       50      66   kiss, launch
    abstract verbs       50     127   differ, obey
    all verbs           100     221   kiss, differ

Table 2: Evaluation sets used throughout. All nouns and all verbs are the union of abstract and concrete subsets and mixed abstract-concrete or concrete-abstract pairs.

Grammatical Features  Grammatical role is a strong predictor of semantics (Gildea and Jurafsky, 2002). For instance, the subject of transitive verbs is more likely to refer to an animate entity than a noun chosen at random. Syntactic context also predicts verb semantics (Kipper et al., 2008). We thus count the frequency of nouns in a range of (non-lexicalized) syntactic contexts, and of verbs in one of the six most common subcategorization-frame classes as defined in Van de Cruys et al. (2012). These contexts are detailed in Table 1.

3.2 Evaluation Sets

We create evaluation sets of abstract and concrete concepts, and introduce a complementary dichotomy between nouns and verbs, the two POS categories most fundamental to propositional meaning. To construct these sets, we extract nouns and verbs from word pairs in the USF data based on their majority POS-tag in the lemmatized BNC (Leech et al., 1994), excluding any word not assigned to either of the POS categories in more than 70% of instances. From the resulting 2175 nouns and 354 verbs, the abstract-concrete distinction is drawn by ordering words according to concreteness and sampling at random from the first and fourth quartiles. Any concrete nouns not occurring in the McRae et al. (2005) Norm dataset were also excluded.

For each list of concepts L ∈ {concrete nouns, concrete verbs, abstract nouns, abstract verbs}, together with the lists all nouns and all verbs, a corresponding set of pairs {(w1, w2) ∈ USF : w1, w2 ∈ L} is defined for evaluation. These details are summarized in Table 2. Evaluation lists, sets of pairs and USF scores are downloadable from our website.

3.3 Evaluation Methodology

All models are evaluated by measuring correlations with the free-association scores in the USF dataset (Nelson et al., 2004). This dataset contains the free-association strength of over 150,000 word pairs.[3] These data reflect the cognitive proximity of concepts and have been widely used in NLP as a gold-standard for computational models (Andrews et al., 2009; Feng and Lapata, 2010; Silberer and Lapata, 2012; Roller and Schulte im Walde, 2013).

For evaluation pairs (c1, c2) we calculate the cosine similarity between our learned feature representations for c1 and c2, a standard measure of the proximity of two vectors (Turney et al., 2010), and follow previous studies (Leong and Mihalcea, 2011; Huang et al., 2012) in using Spearman's ρ as a measure of correlation between these values and our gold-standard.[4] All representations in this section are combined by concatenation, since the present focus is not on combination methods.[5]

[3] Free-association strength is measured by presenting subjects with a cue word and asking them to produce the first word they can think of that is associated with that cue word.
[4] We consider Spearman's ρ, a non-parametric ranking correlation, to be more appropriate than Pearson's r for free-association data, which is naturally skewed and non-continuous.
[5] When combining multiple representations we normalize each representation, then concatenate and then renormalize.
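The evaluation procedure of Section 3.3 amounts to scoring each evaluation pair with the cosine of its two vectors and correlating those scores with the USF association strengths. A minimal sketch, with vectors and scores invented purely for illustration:

    import numpy as np
    from scipy.stats import spearmanr

    def cosine(u, v):
        u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
        denom = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v / denom) if denom else 0.0

    def evaluate(representations, gold_pairs):
        """representations: dict word -> vector; gold_pairs: (w1, w2, usf_strength) triples.
        Returns Spearman's rho between model cosine similarities and gold scores."""
        model, gold = [], []
        for w1, w2, strength in gold_pairs:
            if w1 in representations and w2 in representations:
                model.append(cosine(representations[w1], representations[w2]))
                gold.append(strength)
        rho, _p = spearmanr(model, gold)
        return rho

    # toy usage (vectors and association scores are made up)
    reps = {"cup": [1.0, 0.2, 0.0], "yacht": [0.9, 0.1, 0.1], "fear": [0.0, 0.3, 1.0]}
    pairs = [("cup", "yacht", 0.41), ("cup", "fear", 0.02), ("yacht", "fear", 0.01)]
    print(evaluate(reps, pairs))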

    Feature Type       All Nouns   Conc. Nouns   Abs. Nouns   All Verbs   Conc. Verbs   Abs. Verbs
    (1) Lexical          0.168*       0.199*       0.248*       0.173*       0.268*       0.109
    (2) POS-tag          0.059*       0.012        0.119*       0.052       -0.074        0.123
    (3) Grammatical      0.078*       0.027        0.121*       0.009       -0.017        0.114
    (1)+(2)+(3)          0.182*       0.181*       0.247*       0.172*       0.267*       0.108

Table 3: Spearman correlation ρ of cosine similarity between vector representations derived from three feature classes with USF scores. * indicates statistically significant correlations (p < 0.05).

3.4 Results

The performance of each feature class on the evaluation sets is detailed in Table 3. When all linguistic features are included, performance is somewhat better on noun concepts (ρ = 0.182) than verbs (ρ = 0.172). However, while correlations are significant on concrete (ρ = 0.181) and abstract nouns (ρ = 0.247) and concrete verbs, the effect is not significant on abstract verbs (although it is on verbs overall). The highest correlations for the linguistic features together are on abstract nouns (ρ = 0.247) and concrete verbs (ρ = 0.267). Referring back to the continuum in Figure 1, it is possible that there is an optimum concreteness level, exhibited by abstract nouns and concrete verbs, at which conceptual meaning is best captured by linguistic models.

The results indicate that the three feature classes convey distinct information. It is perhaps unsurprising that lexical features produce the best performance in the majority of cases; the importance of lexical co-occurrence statistics in conveying word meaning is expressed in the well known distributional hypothesis (Harris, 1954). More interestingly, on abstract concepts the contribution of POS-tag features (nouns, ρ = 0.119; verbs, ρ = 0.123) and grammatical features (nouns, ρ = 0.121; verbs, ρ = 0.114) is notably higher than on the corresponding concrete concepts. The importance of such features to modelling free-association between abstract concepts suggests that they may convey information about how concepts are (subjectively) organized and interrelated in the minds of language users, independent of their realisation in the physical world. Indeed, since abstract representations rely to a lesser extent than concrete representations on perceptual input (Section 4), it is perhaps unsurprising that more of their meaning is reflected in subtle linguistic patterns.

The results in this section demonstrate that different information is required to learn representations for abstract and concrete concepts and for noun and verb concepts. In the next section, we investigate how perceptual information fits into this equation.

4 Acquiring Multi-Modal Representations

As noted in Section 2, there is experimental evidence that perceptual information plays a distinct role in the representation of different concept types. We explore whether this finding extends to computational models by integrating such information into our corpus-based approaches. We focus on two aspects of the integration process. Propagation: Can models infer useful information about abstract nouns and verbs from perceptual information corresponding to concrete nouns? And combination: How can linguistic and (propagated or actual) perceptual information be integrated into a single, multi-modal representation? We begin by introducing the two sources of perceptual information.

4.1 Perceptual Information Sources

The McRae Dataset  The McRae et al. (2005) Property Norms dataset is commonly used as a perceptual information source in cognitively-motivated semantic models (Kelly et al., 2010; Roller and Schulte im Walde, 2013). The dataset contains properties of over 500 concrete noun concepts produced by 30 human annotators. The proportion of subjects producing each property gives a measure of the strength of that property for a given concept. We encode this data in vectors with coordinates for each of the 2,526 features in the dataset. A concept representation contains (real-valued) feature strengths in places corresponding to the features of that concept and zeros elsewhere. Having defined the concrete noun evaluation set as the 303 concepts found in both the USF and McRae datasets, this information is available for all concrete nouns.
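A minimal sketch of the encoding just described: each concept receives a vector over the full property inventory, with production proportions as feature strengths and zeros elsewhere. The property names and proportions below are invented for illustration.

    import numpy as np

    def encode_property_norms(norms, property_index):
        """norms: dict concept -> {property: production proportion};
        property_index: dict property -> vector coordinate (about 2,526 in the McRae data).
        Returns dict concept -> dense vector of feature strengths."""
        dim = len(property_index)
        vectors = {}
        for concept, props in norms.items():
            v = np.zeros(dim)
            for prop, strength in props.items():
                v[property_index[prop]] = strength
            vectors[concept] = v
        return vectors

    # toy usage with invented properties
    property_index = {"is_round": 0, "made_of_metal": 1, "used_for_drinking": 2}
    norms = {"cup": {"is_round": 0.6, "used_for_drinking": 0.9},
             "spoon": {"made_of_metal": 0.8}}
    print(encode_property_norms(norms, property_index)["cup"])   # [0.6 0.  0.9]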

The ESP-Game Dataset  To complement the cognitively-driven McRae data with a more explicitly visual information source, we also extract information from the ESP-Game dataset (Von Ahn and Dabbish, 2004) of 100,000 photographs, each annotated with a list of entities depicted in that image. This input enables connections to be made between concepts that co-occur in scenes, and thus might be experienced together by language learners at a given moment. Because we want our models to reflect human concept learning in inferring conceptual knowledge from comparatively unstructured data, we use the ESP-Game dataset in preference to resources such as ImageNet (Deng et al., 2009), in which the conceptual hierarchy is directly encoded by expert annotators. An additional motivation is that ESP-Game was produced by crowdsourcing a simple task with untrained annotators, and thus represents a more scalable class of data source.

We represent the ESP-Game data in 100,000-dimensional vectors, with co-ordinates corresponding to each image in the dataset. A concept representation contains a 1 in any place that corresponds to an image in which the concept appears, and a 0 otherwise. Although it is possible to portray actions and processes in static images, and several of the ESP-Game images are annotated with verb concepts, for a cleaner analysis of the information propagation process we only include ESP input in our models for the concrete nouns in the evaluation set.

The data encoding outlined above results in perceptual representations of dimension ≈ 100,000, for which, on average, fewer than 0.5% of entries are non-zero.[6] In contrast, in our full linguistic representations of nouns (dimension ≈ 4,000) and verbs (dimension ≈ 8,000) (Section 3), an average of ≈ 24% of entries are non-zero. One of the challenges for the propagation and combination methods described in the following subsections is therefore to manage the differences in dimension and sparsity between linguistic and perceptual representations.

[6] The ESP-Game and McRae representations are of approximately equal sparsity.

4.2 Information Propagation

Johns and Jones  Silberer and Lapata (2012) apply a method designed by Johns and Jones (2012) to infer quasi-perceptual representations for a concept in the case that actual perceptual information is not available. Translating their approach to the present context, for verbs and abstract nouns we infer quasi-perceptual representations based on the perceptual features of concrete nouns that are nearby in the semantic space defined by the linguistic features.

In the first step of their two-step method, for each abstract noun or verb k, a quasi-perceptual representation is computed as an average of the perceptual representations of the concrete nouns, weighted by the proximity between these nouns and k:

    k_p = Σ_{c ∈ C̄} S(k_l, c_l)^λ · c_p

where C̄ is the set of concrete nouns, c_p and k_p are the perceptual representations for c and k respectively, and c_l and k_l the linguistic representations. The exponent parameter λ reflects the learning rate. Following Johns and Jones (2012), we define the proximity function S between noun concepts to be cosine similarity. However, because our verb and noun representations are of different dimension, we take verb-noun proximity to be the PMI between the two words in the corpus, with co-occurrences counted within a 5-word window.

In step two, the initial quasi-perceptual representations are inferred for a second time, but with the weighted average calculated over the perceptual or initial quasi-perceptual representations of all other words, not just concrete nouns. As with Johns and Jones (2012), we set the learning rate parameter λ to be 3 in the first step and 13 in the second.
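A minimal sketch of this two-step propagation, assuming for simplicity that all linguistic vectors share one dimension so that cosine similarity can serve as the proximity function throughout (the paper uses corpus PMI for verb-noun proximity); similarities are clipped at zero before exponentiation, which is an added safeguard rather than a detail from the paper.

    import numpy as np

    def cos(u, v):
        d = np.linalg.norm(u) * np.linalg.norm(v)
        return float(u @ v / d) if d else 0.0

    def jj_propagate(ling, percept, lam1=3.0, lam2=13.0):
        """ling: dict word -> linguistic vector (all words);
        percept: dict word -> perceptual vector (concrete nouns only).
        Returns dict word -> (quasi-)perceptual vector."""
        def weighted_sum(k, sources, lam):
            words = [w for w in sources if w != k]
            weights = np.array([max(cos(ling[k], ling[w]), 0.0) ** lam for w in words])
            mat = np.stack([sources[w] for w in words])
            return weights @ mat

        # Step 1: weighted sum over concrete-noun perceptual vectors (lambda = 3).
        step1 = dict(percept)
        for k in ling:
            if k not in percept:
                step1[k] = weighted_sum(k, percept, lam1)
        # Step 2: re-infer over all other words' (quasi-)perceptual vectors (lambda = 13).
        step2 = dict(percept)
        for k in ling:
            if k not in percept:
                step2[k] = weighted_sum(k, step1, lam2)
        return step2

    # toy usage
    rng = np.random.default_rng(0)
    ling = {w: rng.random(6) for w in ["cup", "yacht", "fear", "respect"]}
    percept = {w: rng.random(4) for w in ["cup", "yacht"]}   # concrete nouns only
    print(jj_propagate(ling, percept)["fear"].shape)          # (4,)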

Ridge Regression  As an alternative propagation method we propose ridge regression (Myers, 1990). Ridge regression is a variant of least squares regression in which a regularization term is added to the training objective to favor solutions with certain properties. Here we apply it to learn parameters for linear maps from linguistic representations of concrete nouns to features in their perceptual representations. For concepts with perceptual representations of dimension n_p, we learn n_p linear functions f_i : R^(n_l) → R that map the linguistic representations (of dimension n_l) to a particular perceptual feature i. These functions are then applied together to map the linguistic representations of abstract nouns and verbs to full quasi-perceptual representations.[7]

[7] Because the POS-tag and grammatical features are different for nouns and for verbs, we exclude them from our linguistic representations when implementing ridge regression.

As our model is trained on concrete nouns but applied to other concept types, we do not wish the mapping to reflect the training data too faithfully. To mitigate against this we define our regularization term as the Euclidean l2 norm of the inferred parameter vector. This term ensures that the regression favors lower coefficients and a smoother solution function, which should provide better generalization performance than simple linear regression. The objective for learning the f_i is then to minimize

    ||a X − Y_i||²_2 + ||a||²_2

where a is the vector of regression coefficients, X is a matrix of linguistic representations and Y_i a vector of perceptual feature i for the set of concrete nouns.

We now investigate ways in which the (quasi-)perceptual representations acquired via these methods can be combined with linguistic representations.
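A minimal sketch of this propagation step using scikit-learn: fitting one multi-output Ridge model is equivalent to learning the n_p per-feature functions f_i, each with an l2 penalty on its coefficients. The regularization strength alpha and the random matrices are illustrative assumptions, not values reported in the paper.

    import numpy as np
    from sklearn.linear_model import Ridge

    def ridge_propagate(ling_concrete, percept_concrete, ling_other, alpha=1.0):
        """ling_concrete: (n_concrete, n_l) linguistic matrix for concrete nouns;
        percept_concrete: (n_concrete, n_p) perceptual matrix for the same nouns;
        ling_other: (n_other, n_l) linguistic matrix for abstract nouns / verbs.
        Returns an (n_other, n_p) matrix of quasi-perceptual representations."""
        model = Ridge(alpha=alpha, fit_intercept=False)
        model.fit(ling_concrete, percept_concrete)   # learns all n_p linear maps at once
        return model.predict(ling_other)

    # toy usage with random matrices
    rng = np.random.default_rng(1)
    X_conc, Y_conc = rng.random((50, 20)), rng.random((50, 30))
    X_abs = rng.random((10, 20))
    print(ridge_propagate(X_conc, Y_conc, X_abs).shape)   # (10, 30)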
tics is reflected by projecting representations into The transformed linguistic and perceptual vectors a space defined by the set of concepts themselves, are then concatenated. We follow Silberer and Lap- rather than low-level features. It is also captured by 9 ata by applying a kernalized variant of CCA. the quality weighting, which lends primacy to con- 7Because the POS-tag and grammatical features are differ- cept dimensions that are central to the space. ent for nouns and for verbs, we exclude them from our linguistic Second, mapping representations of different di- representations when implementing ridge regression. mension into vector spaces of equal dimension re- 8Dimensionality reduction is desirable in the present context because of the sparsity of our perceptual representations. sults in dense representations of equal dimension for 9The KernelCCA package in Python: each modality. This naturally lends equal weighting http://pythonhosted.org/apgl/KernelCCA.html or status to each modality and resolves any issues

[Figure 2 appears here: four panels (Concrete Nouns; Abstract Nouns (JJ); Concrete Verbs (JJ); Abstract Verbs (JJ)), each showing the change in performance (y-axis, roughly −0.1 to 0.2) by linguistic feature class (x-axis: Lexical, POS, Grammatical) and by perceptual information source (McRae, ESP, McRae & ESP).]

Figure 2: Additive change in Spearman's ρ when representations acquired from particular classes of linguistic features are combined with (actual or inferred) perceptual representations. Perceptual representations are derived from either the McRae Dataset, the ESP-Game Dataset or both (concatenated). For concepts other than concrete nouns, perceptual information is propagated using the Johns and Jones (JJ) method, and combined with simple concatenation.

Weighted Gram Matrix Combination  The method we propose as an alternative means of fusing linguistic and extra-linguistic information is weighted gram matrix combination, which derives from an information combination technique applied to verb clustering by Reichart and Korhonen (2013). For a set of concepts C = {c_1, ..., c_n} with representations {r_1, ..., r_n}, the method involves creating an n × n weighted gram matrix L in which

    L_ij = S(r_i, r_j) · φ(r_i) · φ(r_j).

Here, S is again a similarity function (we use cosine similarity), and φ(r) is the quality score of r.

The quality scoring function φ can be any mapping R^n → R that reflects the importance of a concept relative to other concepts in C. In the present context, we follow Reichart and Korhonen (2013) in defining a quality score φ as the average cosine similarity of a concept with all other concepts in C:

    φ(r_j) = (1/n) Σ_{i=1}^{n} S(r_i, r_j).

For c_j ∈ C, the matrix L then encodes a scalar projection of r_j onto the other members r_i, i ≤ n, weighted by their quality. Each word representation in the set is thus mapped into a new space of dimension n determined by the concepts in C.

Converting concept representations to weighted gram matrix form has several advantages in the present context. First, both when evaluating and applying semantic representations, we generally require models to determine similarities between concepts relative to others. We might, for instance, require close associates of a given word, a selection of potential synonyms, or the two most similar search queries in a given set. This relative view of semantics is reflected by projecting representations into a space defined by the set of concepts themselves, rather than low-level features. It is also captured by the quality weighting, which lends primacy to concept dimensions that are central to the space.

Second, mapping representations of different dimension into vector spaces of equal dimension results in dense representations of equal dimension for each modality. This naturally lends equal weighting or status to each modality and resolves any issues of representation sparsity. In addition, the dimension equality in particular enables a wider range of mathematical operations for combining information sources. Here, we follow Reichart and Korhonen (2013) in taking the product of the linguistic and perceptual weighted gram matrices L and P, producing a new matrix containing fused representations for each concept:

    M = L P P L.

By taking the composite product LPPL rather than LP or PL, M is symmetric and no ad hoc status is conferred to one modality over the other.
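A minimal sketch of the whole combination step, assuming the two modalities cover the same ordered list of concepts; rows of the resulting matrix M are the fused concept representations.

    import numpy as np

    def weighted_gram(R):
        """R: (n_concepts, d) matrix, one row per concept.
        Returns the n x n matrix L with L_ij = S(r_i, r_j) * phi(r_i) * phi(r_j),
        where S is cosine similarity and phi is the mean cosine similarity of a
        concept with all concepts in the set (its quality score)."""
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        unit = R / norms
        S = unit @ unit.T          # pairwise cosine similarities
        phi = S.mean(axis=1)       # quality scores
        return S * np.outer(phi, phi)

    def wgm_combine(ling, percept):
        """Fuse linguistic and perceptual matrices via the symmetric product M = L P P L."""
        L, P = weighted_gram(ling), weighted_gram(percept)
        return L @ P @ P @ L

    # toy usage: 30 concepts, linguistic and perceptual vectors of different dimension
    rng = np.random.default_rng(3)
    M = wgm_combine(rng.random((30, 80)), rng.random((30, 200)))
    print(M.shape)   # (30, 30)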

4.4 Results

The experiments in this section were designed to address the three questions specified in Section 1: (1) Which information sources are important for acquiring word concepts of different types? (2) Can perceptual information be propagated from concrete to abstract concepts? (3) What is the best way to combine the information from the different sources?

Question (1)  To build on insights from Section 3, we first examined how perceptual input interacts with the three classes of linguistic features defined there. Figure 2 shows the additive difference in correlation between (i) models in which perceptual and particular linguistic features are concatenated and (ii) models based on just the linguistic features.

For concrete nouns and concrete verbs, (actual or inferred) perceptual information was beneficial in almost all cases. The largest improvement for both concept types was over grammatical features, achieved by including only the McRae data. This suggests that signals from this perceptual input and the grammatical features reflect complementary aspects of the meaning of these concepts. We hypothesize that grammatical features (and POS features, which also perform strongly in this combination) confer information to concrete representations about the function and mutual interaction of concepts (the most 'relational' aspects of their meaning (Gentner, 1978)), which complements the more intrinsic properties conferred by perceptual features.

For abstract concepts, it is perhaps unsurprising that the overall contribution of perceptual information was smaller. Indeed, combining linguistic and perceptual information actually harmed performance on abstract verbs in all cases. For these concepts, the inferred perceptual features seem to obscure or contradict some of the information conveyed in the linguistic representations.

While the McRae data was clearly the most valuable source of perceptual input for concrete nouns and concrete verbs, for abstract nouns the combination of ESP-Game and McRae data was most informative. Both inspection of the data and cognitive theories (Rosch et al., 1976) suggest that entities identified in scenes, as in the ESP-Game dataset, generally correspond to a particular (basic) level of the conceptual hierarchy. The ESP-Game data reflects relations between these basic-level concepts in the world, whereas the McRae data typically describes their (intrinsic) properties. Together, these sources seem to combine information on the properties of, and relations between, concepts in a way that particularly facilitates the learning of abstract nouns.

    Model         All Nouns       Conc. Nouns†    Abs. Nouns      All Verbs       Conc. Verbs     Abs. Verbs
    Linguistic    0.175 / 0.335   0.169 / 0.317   0.233 / 0.344   0.148 / 0.178   0.204 / 0.191   0.094 / 0.330
    (JJ)+Concat   0.116 / 0.375   0.258 / 0.442   0.190 / 0.267   0.129 / 0.162   0.301 / 0.062   0.019 / 0.280
    (JJ)+CCA      0.082 / 0.021   0.001 / 0.067   0.085 / -0.018  0.027 / 0.213   0.079 / 0.276   0.095 / 0.200
    (JJ)+WGM      0.098 / 0.213   0.397 / 0.523   0.238 / 0.329   0.059 / 0.169   0.253 / 0.064   -0.080 / 0.254
    RR+Concat     0.232 / 0.432        n/a        0.248 / 0.343   0.013 / 0.212   0.046 / 0.484   0.023 / 0.133
    RR+CCA        0.033 / -0.045       n/a        0.044 / -0.023  0.001 / -0.006  0.018 / 0.344   0.018 / 0.085
    RR+WGM        0.094 / 0.069        n/a        0.232 / 0.327   0.159 / 0.131   0.244 / 0.194   0.075 / 0.283
    LR (best)     0.216 / 0.402        n/a        0.216 / 0.282   0.004 / 0.051   -0.051 / 0.139  -0.008 / 0.197

Table 4: Performance of different methods of information propagation (JJ = Johns and Jones, RR = ridge regression, LR = linear regression) and combination (Concat = concatenation, CCA = canonical correlation analysis, WGM = weighted gram matrix multiplication) across evaluation sets. Values are Spearman's ρ correlation with USF scores (left of each slash) and WordNet path similarity (right of each slash). For the LR baseline we only report the highest score across the three combination types. † No propagation takes place for concrete nouns; this column reflects the performance of combination methods only.

Question (2)  The performance of different methods of information propagation and combination is presented in Table 4. The underlying linguistic representations in this case contained all three distributional feature classes. For more robust conclusions, in addition to the USF gold-standard we also measured the correlation between model output and the WordNet path similarity of words in our evaluation pairs. The path similarity between words w1 and w2 is the shortest distance between synsets of w1 and w2 in the WordNet taxonomy (Fellbaum, 1999), which correlates significantly with human judgements of concept similarity (Pedersen et al., 2004).[10]

[10] Other widely-used evaluation gold-standards, such as WordSim-353 and the MEN dataset, do not contain a sufficient number of abstract concepts for the current purpose.

The correlations with the USF data (the left-hand value in each column of Table 4) of our linguistic-only models (ρ = 0.094–0.233) and best performing multi-modal models (on both concrete nouns, ρ = 0.397, and more abstract concepts, ρ = 0.095–0.301) were higher than the best comparable models described elsewhere (Feng and Lapata, 2010; Silberer and Lapata, 2012; Silberer et al., 2013).[11] This confirms both that the underlying linguistic space is of high quality and that the ESP and McRae perceptual input is similarly or more informative than the input applied in previous work.

[11] Feng and Lapata (2010) report ρ = .08 for language-only and .12 for multi-modal models evaluated on USF over concrete and abstract concepts. Silberer and Lapata (2012) report ρ = .14 (language-only) and .35 (multi-modal) over concrete nouns.

Consistent with previous studies, adding perceptual input improved the quality of concrete noun representations as measured against both USF and path similarity gold-standards. Further, effective information propagation was indeed possible for both abstract nouns (USF evaluation) and concrete verbs (both evaluations). Interestingly, however, this was not the case for abstract verbs, for which no mix of propagation and combination methods produced an improvement on the linguistic-only model on either evaluation set. Indeed, as shown in Figure 2, no type of perceptual input generated an improvement in abstract verb representations, regardless of the underlying class of linguistic features.

This result underlines the link between concreteness and perceptual grounding proposed in the psychological literature. More practically, it shows that concreteness can determine if propagation of perceptual input will be effective and, if so, the potential degree of improvement over text-only models.

Turning to means of propagation, both the Johns and Jones method and ridge regression outperformed the linear regression baseline on the majority of concept types in our evaluation.

Across the five sets and ten evaluations on which propagation takes place (All Nouns, Abstract Nouns, All Verbs, Abstract Verbs and Concrete Verbs), ridge regression performed more robustly, achieving the best performance on six evaluation sets compared to two for the Johns and Jones method.[12]

[12] For these comparisons, the optimal combination method is selected in each case.

Question (3)  Weighted gram matrix multiplication (ρ = 0.397 on USF and ρ = 0.523 on path similarity) outperformed both simple vector concatenation (ρ = 0.258 and ρ = 0.442) and CCA (ρ = 0.001 and ρ = 0.067) on concrete nouns. In the case of both abstract nouns and concrete verbs, however, the most effective means of combining quasi-perceptual information with linguistic representations was concatenation (abstract nouns, ρ = 0.248 and ρ = 0.343; concrete verbs, ρ = 0.301 and ρ = 0.484). One evident drawback of multiplicative methods such as weighted gram matrix combination is the greater inter-dependence of the information sources; a weak signal from one modality can undermine the contribution of the other modality. We hypothesize that this underlies the comparatively poor performance of the method on verbs and abstract nouns, as the perceptual input for concrete nouns is clearly a richer information source than the propagated features of more abstract concepts.

5 Conclusion

Motivated by the inherent difference between abstract and concrete concepts and the fact that abstract words occur more frequently in language, in this paper we have addressed the question of whether multi-modal models can enhance semantic representations of both concept types.

In Section 3, we demonstrated that different information sources are important for acquiring concrete and abstract noun and verb concepts. Within the linguistic modality, while lexical features are informative for all concept types, syntactic features are only significantly informative for abstract concepts.

In contrast, in Section 4 we observed that perceptual input is a more valuable information source for concrete concepts than abstract concepts. Nevertheless, perceptual input can be effectively propagated from concrete nouns to enhance representations of both abstract nouns and concrete verbs. Indeed, conceptual concreteness appears to determine the degree to which perceptual input is beneficial, since representations of abstract verbs, the most abstract concepts in our experiments, were actually degraded by this additional information. One important contribution of this work is therefore an insight into when multi-modal models should or should not aim to combine and/or propagate perceptual input to ensure that optimal representations are learned. In this respect, our conclusions align with the findings of Kiela and Hill (2014), who take an explicitly visual approach to resolving the same question.

Various methods for propagating and combining perceptual information with linguistic input were presented. We proposed ridge regression for inferring perceptual representations for abstract concepts, which proved more robust than alternatives across the range of concept types. This approach is particularly simple to implement, since it is based on an established statistical procedure. In addition, we introduced weighted gram matrix combination for combining representations from distinct modalities of differing sparsity and dimension. This method produces the highest quality composite representations for concrete nouns, where both modalities represent high quality information sources.

Overall, our results demonstrate that the potential practical benefits of multi-modal models extend beyond concrete domains into a significant proportion of the lexical concepts found in language. In future work we aim to extend our experiments to concept types such as adjectives and adverbs, and to develop models that further improve the propagation and combination of extra-linguistic input.

Moreover, while we cannot draw definitive conclusions about human language processing, the effectiveness of the methods presented in this paper offers tentative support for the view that even abstract concepts are grounded in the perceptual system (Barsalou et al., 2003). As such, it may be that, even in the more abstract cases of human communication, we find ways to see what people mean precisely by finding ways to see what they mean.

Acknowledgements

We thank The Royal Society and St John's College for their support.

References

Mark Andrews, Gabriella Vigliocco, and David Vinson. 2009. Integrating experiential and distributional data to learn semantic representations. Psychological Review, 116(3):463.

Marco Baroni and Roberto Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1183-1193. Association for Computational Linguistics.

Lawrence W Barsalou, W Kyle Simmons, Aron K Barbey, and Christine D Wilson. 2003. Grounding conceptual knowledge in modality-specific systems. Trends in Cognitive Sciences, 7(2):84-91.

Jeffrey R Binder, Chris F Westbury, Kristen A McKiernan, Edward T Possing, and David A Medler. 2005. Distinct brain systems for processing concrete and abstract concepts. Journal of Cognitive Neuroscience, 17(6):905-917.

Elia Bruni, Gemma Boleda, Marco Baroni, and Nam-Khanh Tran. 2012. Distributional semantics in technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 136-145. Association for Computational Linguistics.

Elia Bruni, Nam Khanh Tran, and Marco Baroni. 2014. Multimodal distributional semantics. Journal of Artificial Intelligence Research, 49:1-47.

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009, pages 248-255. IEEE.

Katrin Erk and Sebastian Padó. 2008. A structured vector space model for word meaning in context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 897-906. Association for Computational Linguistics.

Christiane Fellbaum. 1999. WordNet. Wiley Online Library.

Yansong Feng and Mirella Lapata. 2010. Visual information in semantic representation. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 91-99. Association for Computational Linguistics.

Vittorio Gallese and George Lakoff. 2005. The brain's concepts: The role of the sensory-motor system in conceptual knowledge. Cognitive Neuropsychology, 22(3-4):455-479.

Dedre Gentner and Arthur B Markman. 1997. Structure mapping in analogy and similarity. American Psychologist, 52(1):45.

Dedre Gentner. 1978. On relational meaning: The acquisition of verb meaning. Child Development, pages 988-998.

Daniel Gildea and Daniel Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245-288.

Yoav Goldberg and Jon Orwant. 2013. A dataset of syntactic-ngrams over time from a very large corpus of English books. In Second Joint Conference on Lexical and Computational Semantics, pages 241-247. Association for Computational Linguistics.

Thomas L Griffiths, Mark Steyvers, and Joshua B Tenenbaum. 2007. Topics in semantic representation. Psychological Review, 114(2):211.

David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. 2004. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639-2664.

Zellig Harris. 1954. Distributional structure. Word, 10(23):146-162.

Felix Hill, Anna Korhonen, and Christian Bentz. 2013. A quantitative empirical analysis of the abstract/concrete distinction. Cognitive Science.

Eric H Huang, Richard Socher, Christopher D Manning, and Andrew Y Ng. 2012. Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, pages 873-882. Association for Computational Linguistics.

Sabine Schulte Im Walde. 2006. Experiments on the automatic induction of German semantic verb classes. Computational Linguistics, 32(2):159-194.

Brendan T Johns and Michael N Jones. 2012. Perceptual inference through global lexical similarity. Topics in Cognitive Science, 4(1):103-120.

Colin Kelly, Barry Devereux, and Anna Korhonen. 2010. Acquiring human-like feature-based conceptual representations from corpora. In Proceedings of the NAACL HLT 2010 First Workshop on Computational Neurolinguistics, pages 61-69. Association for Computational Linguistics.

Douwe Kiela and Felix Hill. 2014. Improving multi-modal representations using image dispersion: Why less is sometimes more. In Proceedings of ACL 2014, Baltimore. Association for Computational Linguistics.

Karin Kipper, Anna Korhonen, Neville Ryant, and Martha Palmer. 2008. A large-scale classification of English verbs. Language Resources and Evaluation, 42(1):21-40.

Thomas K Landauer and Susan T Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211.

Geoffrey Leech, Roger Garside, and Michael Bryant. 1994. CLAWS4: the tagging of the British National Corpus. In Proceedings of the 15th Conference on Computational Linguistics - Volume 1, pages 622-628. Association for Computational Linguistics.

Chee Wee Leong and Rada Mihalcea. 2011. Going beyond text: A hybrid image-text approach for measuring word relatedness. In IJCNLP, pages 1403-1407.

Christopher D Manning. 2011. Part-of-speech tagging from 97% to 100%: is it time for some linguistics? In Computational Linguistics and Intelligent Text Processing, pages 171-189. Springer.

Arthur B Markman and Edward J Wisniewski. 1997. Similar and different: The differentiation of basic-level categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23(1).

Ken McRae, George S Cree, Mark S Seidenberg, and Chris McNorgan. 2005. Semantic feature production norms for a large set of living and non-living things. Behavior Research Methods, 37(4):547-559.

Raymond H Myers. 1990. Classical and modern regression with applications, volume 2. Duxbury Press, Belmont, CA.

Douglas L Nelson, Cathy L McEvoy, and Thomas A Schreiber. 2004. The University of South Florida free association, rhyme, and word fragment norms. Behavior Research Methods, Instruments, & Computers, 36(3):402-407.

Allan Paivio. 1991. Dual coding theory: Retrospect and current status. Canadian Journal of Psychology/Revue Canadienne de Psychologie, 45(3):255.

Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi. 2004. WordNet::Similarity: measuring the relatedness of concepts. In Demonstration Papers at HLT-NAACL 2004, pages 38-41. Association for Computational Linguistics.

Roi Reichart and Anna Korhonen. 2013. Improved lexical acquisition through DPP-based verb clustering. In Proceedings of the Conference of the Association for Computational Linguistics (ACL). Association for Computational Linguistics.

Stephen Roller and Sabine Schulte im Walde. 2013. A multimodal LDA model integrating textual, cognitive and visual modalities. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1146-1157, Seattle, Washington, USA, October. Association for Computational Linguistics.

Eleanor Rosch, Carolyn B Mervis, Wayne D Gray, David M Johnson, and Penny Boyes-Braem. 1976. Basic objects in natural categories. Cognitive Psychology, 8(3):382-439.

Gideon Rosen. 2001. Nominalism, naturalism, epistemic relativism. Noûs, 35(s15):69-91.

Magnus Sahlgren. 2006. The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Ph.D. thesis, Stockholm.

Paula J Schwanenflugel and Edward J Shoben. 1983. Differential context effects in the comprehension of abstract and concrete verbal materials. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(1):82.

Carina Silberer and Mirella Lapata. 2012. Grounded models of semantic representation. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1423-1433. Association for Computational Linguistics.

Carina Silberer, Vittorio Ferrari, and Mirella Lapata. 2013. Models of semantic representation with visual attributes. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria.

Peter D Turney, Patrick Pantel, et al. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141-188.

Tim Van de Cruys, Laura Rimell, Thierry Poibeau, Anna Korhonen, et al. 2012. Multiway tensor factorization for unsupervised lexical acquisition. COLING 2012: Technical Papers, pages 2703-2720.

Luis Von Ahn and Laura Dabbish. 2004. Labeling images with a computer game. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 319-326. ACM.
