
One distributional memory, many semantic spaces

Marco Baroni (University of Trento, Trento, Italy) [email protected]
Alessandro Lenci (University of Pisa, Pisa, Italy) [email protected]

Abstract

We propose an approach to corpus-based semantics, inspired by cognitive research, in which different semantic tasks are tackled using the same underlying repository of distributional information, collected once and for all from the source corpus. Task-specific semantic spaces are then built on demand from the repository. A straightforward implementation of our proposal achieves state-of-the-art performance on a number of unrelated tasks.

1 Introduction

Corpus-derived distributional semantic spaces have proved valuable in tackling a variety of tasks, ranging from concept categorization to relation extraction to many others (Sahlgren, 2006; Turney, 2006; Padó and Lapata, 2007). The typical approach in the field has been a "local" one, in which each semantic task (or set of closely related tasks) is treated as a separate problem that requires its own corpus-derived model and algorithms. Its successes notwithstanding, the "one task – one model" approach also has some drawbacks.

From a cognitive angle, corpus-based models hold promise as simulations of how humans acquire and use conceptual and linguistic information from their environment (Landauer and Dumais, 1997). However, the common view in cognitive (neuro)science is that humans resort to a multipurpose semantic memory, i.e., a database of interconnected concepts and properties (Rogers and McClelland, 2004), adapting the information stored there to the task at hand. From an engineering perspective, going back to the corpus to train a different model for each application is inefficient, and it runs the risk of overfitting the model to a specific task while losing sight of its adaptivity – a highly desirable feature for any intelligent system. Think, by contrast, of WordNet, a single network of semantic information that has been adapted to all sorts of tasks, many of them certainly not envisaged by the resource creators.

In this paper, we explore a different approach to corpus-based semantics. Our model consists of a distributional memory – a graph of weighted links between concepts – built once and for all from our source corpus. Starting from the tuples that can be extracted from this graph, we derive multiple semantic spaces to solve a wide range of tasks that exemplify various strands of corpus-based semantic research: measuring semantic similarity between concepts, concept categorization, selectional preferences, analogy of relations between concept pairs, finding pairs that instantiate a target relation and spotting an alternation in verb argument structure. Given a graph like the one in Figure 1 below, adaptation to all these tasks (and many others) can be reduced to two basic operations: 1) building semantic spaces, as co-occurrence matrices defined by choosing different units of the graph as row and column elements; 2) measuring similarity in the resulting matrix, either between specific rows or between a row and an average of rows whose elements share a certain property.

After reviewing some of the most closely related work (Section 2), we introduce our approach (Section 3) and, in Section 4, we proceed to test it in various tasks, showing that its performance is always comparable to that of task-specific methods. Section 5 draws the current conclusions and discusses future directions.

2 Related work

Turney (2008) recently advocated the need for a uniform approach to corpus-based semantic tasks. Turney recasts a number of semantic challenges in terms of relational or analogical similarity. Thus, if an algorithm is able to tackle the latter, it can

Proceedings of the EACL 2009 Workshop on GEMS: GEometrical Models of Natural Language Semantics, pages 1–8, Athens, Greece, 31 March 2009. © 2009 Association for Computational Linguistics

also be used to address the former. Turney tests his system in a variety of tasks, obtaining good results across the board. His approach amounts to picking a task (analogy recognition) and reinterpreting other tasks as its particular instances. Conversely, we assume that each task may keep its specificity, and unification is achieved by designing a sufficiently general distributional structure, from which semantic spaces can be generated on demand. Currently, the only task we share with Turney is finding SAT analogies, where his method outperforms ours by a large margin (cf. Section 4.2.1). However, Turney uses a corpus that is 25 times larger than ours and introduces negative training examples, whereas we dependency-parse our corpus – thus, performance is not directly comparable. Besides the fact that our approach does not require labeled training data like Turney's, it provides, we believe, a more intuitive measure of taxonomic similarity (taxonomic neighbours are concepts that share similar contexts, rather than concepts that co-occur with patterns indicating a taxonomic relation), and it is better suited to model productive semantic phenomena, such as the selectional preferences of verbs with respect to unseen arguments (eating topinambur vs. eating ideas). Such tasks will require an extension of the current framework of Turney (2008) beyond evidence from the direct co-occurrence of target pairs.

While our unified framework is, as far as we know, novel, the specific ways in which we tackle the different tasks are standard. Concept similarity is often measured by vectors of co-occurrence with context words that are typed with dependency information (Lin, 1998; Curran and Moens, 2002). Our approach to selectional preference is nearly identical to the one of Padó et al. (2007). We solve SAT analogies with a simplified version of the method of Turney (2006). Detecting whether a pair expresses a target relation by looking at shared connector patterns with model pairs is a common strategy in relation extraction (Pantel and Pennacchiotti, 2008). Finally, our method to detect verb slot similarity is analogous to the "slot overlap" of Joanis et al. (2008) and others. Since we aim at a unified approach, the lack of originality of our task-specific methods should be regarded as a positive fact: our general framework can naturally reproduce, locally, well-tried ad-hoc solutions.

3 Distributional semantic memory

Many different, apparently unrelated, semantic tasks resort to the same underlying information, a "distributional semantic memory" consisting of weighted concept+link+concept tuples extracted from the corpus. The concepts in the tuples are typically content words. The link contains corpus-derived information about how the two words are connected in context: it could be, for example, a dependency path or a shallow lexico-syntactic pattern. Finally, the weight typically derives from co-occurrence counts for the elements in a tuple, rescaled via entropy, mutual information or similar measures. The way in which the tuples are identified and weighted when populating the memory is, of course, of fundamental importance to the quality of the resulting models. However, once the memory has been populated, it can be used to tackle many different tasks, without ever having to go back to the source corpus.

Our approach can be compared with the typical organization of databases, in which multiple alternative "views" can be obtained from the same underlying data structure, to answer different information needs. The data structure is virtually independent from the way in which it is accessed. Similarly, the structure of our repository only obeys the distributional constraints extracted from the corpus, and it is independent from the ways it will be "queried" to address a specific semantic task. Different tasks can simply be defined by how we split the tuples from the repository into row and column elements of a matrix whose cells are filled by the corresponding weights. Each of these derived matrices represents a particular view of distributional memory: we will discuss some of these views, and the tasks they are appropriate for, in Section 4.

Concretely, we used here the web-derived, 2-billion word ukWaC corpus,1 dependency-parsed with MINIPAR.2 Focusing for now on modeling noun-to-noun and noun-to-verb connections, we selected the 20,000 most frequent nouns and 5,000 most frequent verbs as target concepts (minus stop lists of very frequent items). We selected as target links the top 30 most frequent direct verb-noun dependency paths (e.g., kill+obj+victim), the top 30 preposition-mediated noun-to-noun or

1 http://wacky.sslmit.unibo.it
2 http://www.cs.ualberta.ca/~lindek/minipar.htm

verb-to-noun paths (e.g., soldier+with+gun) and the top 50 transitive-verb-mediated noun-to-noun paths (e.g., soldier+use+gun). We extracted all tuples in which a target link connected two target concepts. We computed the weight (strength of association) for all the tuples extracted in this way using the local MI measure (Evert, 2005), which is theoretically justified, easy to compute for triples and robust against overestimation of rare events. Tuples with local MI ≤ 0 were discarded. For each preserved tuple c1+l+c2, we added a same-weight c2+l⁻¹+c1 tuple. In graph-theoretical terms (treating concepts as nodes and labeling the weighted edges with links), this means that, for each edge directed from c1 to c2, there is an edge from c2 to c1 with the same weight and inverse label, and that such inverse edges constitute the full set of links directed from c2 to c1. The resulting database (DM, for Distributional Memory) contains about 69 million tuples.

Figure 1 depicts a fragment of DM represented as a graph (assume, for what we just said, that for each edge from x to y there is a same-weight edge from y to x with inverse label: e.g., the obj link from kill to victim stands for the tuples kill+obj+victim and victim+obj⁻¹+kill, both with weight 915.4; subj_in identifies the subjects of intransitive constructions, as in The victim died; subj_tr refers to the subjects of transitive sentences, as in The policeman killed the victim).

[Figure 1 here: a graph with concept nodes die, kill, teacher, victim, policeman, soldier, handbook, school and gun, connected by weighted edges labeled subj_in, subj_tr, obj, with, use, in and at, e.g. kill –obj→ victim (915.4), soldier –with→ gun (105.9).]

Figure 1: A fragment of distributional memory

We also trained 3 closely comparable models that use the same source corpus, the same target concepts (in one case, also the same target links) and local MI as weighting method, with the same filtering threshold. The myPlain model implements a classic "flat" co-occurrence approach (Sahlgren, 2006), in which we keep track of verb-to-noun co-occurrence within a window that can include, maximally, one intervening noun, and noun-to-noun co-occurrence with no more than 2 intervening nouns. The myHAL model uses the same co-occurrence window but, like HAL (Lund and Burgess, 1996), treats left and right co-occurrences as distinct features. Finally, myDV uses the same dependency-based target links of DM as filters. As in the DV model of Padó and Lapata (2007), only pairs connected by target links are preserved, but the links themselves are not part of the model. Since none of these alternative models stores information about the links, they are only appropriate for the concept similarity tasks, where links are not necessary.

4 Semantic views and experiments

We now look at three views of the DM graph: concept-by-link+concept (CxLC), concept+concept-by-link (CCxL), and concept+link-by-concept (CLxC).
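In code, the view-building operation described above can be sketched roughly as follows. This is a toy illustration, not the actual DM implementation: the tuples and weights loosely echo Figure 1 and are illustrative only.

```python
from collections import defaultdict

# Toy concept+link+concept tuples; weights loosely echo Figure 1
# and are purely illustrative, not the actual DM contents.
tuples = [
    ("kill", "obj", "victim", 915.4),
    ("kill", "obj", "soldier", 8948.3),
    ("die", "subj_in", "victim", 1335.2),
    ("soldier", "with", "gun", 105.9),
    ("soldier", "use", "gun", 41.0),
]

# Mirror every tuple with an inverse link, as in the DM graph.
store = tuples + [(c2, l + "-1", c1, w) for c1, l, c2, w in tuples]

def cxlc(store):
    """Concept-by-link+concept view: rows are concepts, columns typed edges."""
    m = defaultdict(dict)
    for c1, l, c2, w in store:
        m[c1][(l, c2)] = w
    return m

def ccxl(store):
    """Concept+concept-by-link view: rows are concept pairs, columns links."""
    m = defaultdict(dict)
    for c1, l, c2, w in store:
        m[(c1, c2)][l] = w
    return m

def clxc(store):
    """Concept+link-by-concept view: rows are argument slots, columns concepts."""
    m = defaultdict(dict)
    for c1, l, c2, w in store:
        m[(c1, l)][c2] = w
    return m
```

The same store feeds all three views: only the split of each tuple into row and column labels changes, while the cell weights stay identical.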
Each view will be tested on one or more semantic tasks and compared with alternative models. There is a fourth possible view, links-by-concept+concept (LxCC), that is not explored here, but would lead to meaningful semantic tasks (finding links that express similar semantic relations).

4.1 The CxLC semantic space

Much work in computational linguistics and related fields relies on measuring similarity among words/concepts in terms of their patterns of co-occurrence with other words/concepts (Sahlgren, 2006). For this purpose, we arrange the information from the graph in a matrix where the concepts (nodes) of interest are rows, and the nodes they are connected to by outgoing edges are columns, typed with the corresponding edge label. We refer to this view as the concept-by-link+concept (CxLC) semantic space.
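The row-comparison operation in this view can be sketched as follows; the sparse row vectors are toy examples echoing the Figure 1 fragment, not actual DM rows.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy CxLC rows over typed (link, concept) columns; weights are illustrative.
rows = {
    "soldier":   {("subj_in-1", "die"): 4547.5,
                  ("obj-1", "kill"): 8948.3,
                  ("with", "gun"): 105.9},
    "policeman": {("subj_in-1", "die"): 68.6,
                  ("obj-1", "kill"): 538.1,
                  ("with", "gun"): 30.5},
    "handbook":  {("use-1", "teacher"): 10.1,
                  ("with-1", "teacher"): 3.2},
}
```

Here soldier and policeman share all their typed contexts and come out as close taxonomic neighbours, while handbook shares none of them with soldier and gets similarity zero.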

From the graph in Figure 1, we can for example construct the matrix in Table 1 (here and below, showing only some rows and columns of interest). By comparing the row vectors of such a matrix using standard geometrical techniques (e.g., measuring the normalized cosine distance), we can find out about concepts that tend to share similar properties, i.e., are taxonomically similar (synonyms, antonyms, co-hyponyms), e.g., soldiers and policemen, that both kill, are killed and use guns.

            subj_in⁻¹ die  subj_tr⁻¹ kill  obj⁻¹ kill  with gun  use gun
teacher            109.4             0.0         9.9       0.0      0.0
victim            1335.2            22.4       915.4       0.0      0.0
soldier           4547.5          1306.9      8948.3     105.9     41.0
policeman           68.6            38.2       538.1      30.5      7.4

Table 1: A fragment of the CxLC space

We use the CxLC space in three taxonomic similarity tasks: modeling human similarity judgments, noun categorization and verb selectional restrictions.

4.1.1 Human similarity ratings

We use the dataset of Rubenstein and Goodenough (1965), consisting of 65 noun pairs rated by 51 subjects on a 0-4 similarity scale (e.g. car-automobile 3.9, cord-smile 0.0). The average rating for each pair is taken as an estimate of the perceived similarity between the two words. Following Padó and Lapata (2007), we use Pearson's r to evaluate how the distances (cosines) in the CxLC space between the nouns in each pair correlate with the ratings. Percentage correlations for DM, our other models and the best absolute result obtained by Padó and Lapata (DV+), as well as their best cosine-based performance (cosDV+), are reported in Table 2.

model    r     model    r
myDV     70    DV+      62
DM       64    myHAL    61
myPlain  63    cosDV+   47

Table 2: Correlation with similarity ratings

DM is the second-best model, outperformed only by DV when the latter is trained on comparable data (myDV in Table 2). Notice that, here and below, we did not try any parameter tuning (e.g., using a similarity measure different than cosine, feature selection, etc.) to improve the performance of DM.

4.1.2 Noun categorization

We use the concrete noun dataset of the ESSLLI 2008 Distributional Semantics shared task,3 including 44 concrete nouns to be clustered into cognitively justified categories of increasing generality: 6-way (birds, ground animals, fruits, greens, tools and vehicles), 3-way (animals, plants and artifacts) and 2-way (natural and artificial entities). Following the task guidelines, we clustered the target row vectors in the CxLC matrix with CLUTO,4 using its default settings, and evaluated the resulting clusters in terms of cluster-size-weighted averages of purity and entropy (see the CLUTO documentation). An ideal solution would have 100% purity and 0% entropy. Table 3 provides percentage results for our models as well as for the ESSLLI systems that reported all the relevant performance measures, indexed by first author. Models are ranked by a global score given by summing the 3 purity values and subtracting the 3 entropies.

            6-way      3-way      2-way      global
model       P    E     P    E     P    E
Katrenko    89   13    100  0     80   59    197
Peirsman+   82   23    84   34    86   55    140
DM          77   24    79   38    59   97    56
myDV        80   28    75   51    61   95    42
myHAL       75   27    68   51    68   89    44
Peirsman−   73   28    71   54    61   96    27
myPlain     70   31    68   60    59   97    9
Shaoul      41   77    52   84    55   93    -106

Table 3: Concrete noun categorization

DM outperforms our models trained on comparable resources. Katrenko's system queries Google for patterns that cue the category of a concept, and thus its performance should rather be seen as an upper bound for distributional models. Peirsman and colleagues report results based on different parameter settings: DM's performance – not tuned to the task – is worse than their top model, but better than their worst.

4.1.3 Selectional restrictions

In this task we test the ability of the CxLC space to predict verbal selectional restrictions. We use the CxLC matrix to compare a concept to a "prototype" constructed by averaging a set of other concepts, that in this case represent typical fillers of a verbal slot.

3 http://wordspace.collocations.de/doku.php/esslli:start
4 http://glaros.dtc.umn.edu/gkhome/cluto/cluto/overview
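The prototype construction can be sketched as follows; this is a minimal illustration with toy filler vectors (loosely modeled on the Figure 1 weights), not the actual DM pipeline.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def normalize(v):
    """Length-normalize a sparse vector."""
    n = sqrt(sum(w * w for w in v.values()))
    return {k: w / n for k, w in v.items()} if n else dict(v)

def prototype(fillers):
    """Sum of normalized filler vectors: the 'typical' filler of a slot."""
    proto = {}
    for v in fillers:
        for k, w in normalize(v).items():
            proto[k] = proto.get(k, 0.0) + w
    return proto

# Toy CxLC vectors for two attested objects of "kill" ...
victim  = {("obj-1", "kill"): 915.4, ("subj_in-1", "die"): 1335.2}
soldier = {("obj-1", "kill"): 8948.3, ("subj_in-1", "die"): 4547.5}
killee = prototype([victim, soldier])

# ... and two candidate nouns, one plausible and one implausible as a "killee".
policeman = {("obj-1", "kill"): 538.1, ("subj_in-1", "die"): 68.6}
idea      = {("obj-1", "discuss"): 40.0}
```

A candidate noun never seen as an object of killing can still be scored by its cosine to the prototype: here policeman scores high and idea scores zero, mirroring the eating topinambur vs. eating ideas contrast.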

For example, by averaging the vectors of the nouns that are, according to the underlying graph, objects of killing, we can build a vector for the typical "killee", and model selectional restrictions by measuring the similarity of other concepts (including concepts that have not been seen as objects of killing in the corpus) to this prototype. Note that the DM graph is used both to find the concepts to enter in the prototype (the set of nouns that are connected to a verb by the relevant edge) and to compute similarity. Thus, the method is fully unsupervised.

We test on the two datasets of human judgments about the plausibility of nouns as arguments (either subjects or objects) of verbs used in Padó et al. (2007), one (McRae) consisting of 100 noun-verb pairs rated by 36 subjects, the second (Padó) with 211 pairs rated by 20 subjects. For each verb in these datasets, we built its prototypical subject/object argument vector by summing the normalized vectors of the 50 nouns with the highest weight on the appropriate dependency link to the verb (e.g., the top 50 nouns connected to kill by an obj link). The cosine distance of a noun to a prototype is taken as the model "plausibility judgment" about the noun occurring as the relevant verb argument. Since we are interested in generalization, if the target noun is in the prototype set we subtract its vector from the prototype before calculating the cosine. For our comparison models, there is no way to determine which nouns would form the prototype, and thus we train them using the same top noun lists we employ for DM. Following Padó and colleagues, performance is measured by the Spearman ρ correlation coefficient between the average human ratings and the model predictions. Table 4 reports percentage coverage and correlations for our models as well as those in Padó et al. (2007) (ParCos is the best among their purely corpus-based systems).

           McRae            Padó
model      coverage  ρ      coverage  ρ
Padó       56        41     97        51
DM         96        28     98        50
ParCos     91        21     98        48
myDV       96        21     98        39
myHAL      96        12     98        29
myPlain    96        12     98        27
Resnik     94        3      98        24

Table 4: Correlation with verb-argument plausibility judgments

DM does very well on this task: on the Padó dataset, its performance is nearly identical to that of the Padó system, which relies on FrameNet. On the McRae data, DM has a lower correlation, but much higher coverage. Since we are using a larger corpus than Padó et al. (2007), who train on the BNC, a fairer comparison might be the one with our alternative models, which are all outperformed by DM by a large margin.

4.2 The CCxL semantic space

Another view of the DM graph is exemplified in Table 5, where concept pairs are represented in terms of the edge labels (links) connecting them. Importantly, this matrix contains the same information that was used to build the CxLC space of Table 1, with a different arrangement of what goes in the rows and in the columns, but the same weights in the cells – compare, for example, the soldier+gun-by-with cell in Table 5 to the soldier-by-with+gun cell in Table 1.

                    in        at       with    use
teacher school      11894.4   7020.1   28.9    0.0
teacher handbook    2.5       0.0      3.2     10.1
soldier gun         2.8       10.3     105.9   41.0

Table 5: A fragment of the CCxL space

We use this space to measure "relational" similarity (Turney, 2006) of concept pairs, e.g., finding that the relation between teachers and handbooks is more similar to the one between soldiers and guns than to the one between teachers and schools. We also extend relational similarity to prototypes. Given some example pairs instantiating a relation, we can harvest new pairs linked by the same relation by computing the average CCxL vector of the examples, and finding the nearest neighbours to this average. In the case at hand, the link profile of pairs such as soldier+gun and teacher+handbook could be used to build an "instrument relation" prototype.

We test the CCxL semantic space on recognizing SAT analogies (relational similarity between pairs) and semantic relation classification (relational similarity to prototypes).

4.2.1 Recognizing SAT analogies

We used the set of 374 multiple-choice questions from the SAT college entrance exam. Each question includes one target pair, usually called

the stem (ostrich-bird), and 5 other pairs (lion-cat, goose-flock, ewe-sheep, cub-bear, primate-monkey). The task is to choose the pair most analogous to the stem. Each SAT pair can be represented by the corresponding row vector in the CCxL matrix, and we select the pair with the highest cosine to the stem. In Table 6 we report our results, together with the state of the art from the ACL wiki5 and the scores of Turney (2008) (PairClass) and of Amaç Herdağdelen's PairSpace system, which was trained on ukWaC. The Attr cells summarize the performance of the 6 models on the wiki table that are based on "attributional similarity" only (Turney, 2006). For the other systems, see the references on the wiki. Since our coverage is very low (44% of the stems), in order to make a meaningful comparison with the other models, we calculated a corrected score (DM−). Having full access to the results of the ukWaC-trained, similarly performing PairSpace system, we calculated the adjusted score by assuming that the DM-to-PairSpace error ratio (estimated on the items we cover) is constant on the whole dataset, and thus the DM hit count on the unseen items is approximated by multiplying the PairSpace hit count on the same items by the error ratio (DM+ is DM's accuracy on the covered test items only).

model      % correct    model      % correct
LRA        56.1         KnowBest   43.0
PERT       53.3         DM−        42.3
PairClass  52.1         LSA        42.0
VSM        47.1         AttrMax    35.0
DM+        45.3         AttrAvg    31.0
PairSpace  44.9         AttrMin    27.3
k-means    44.0         Random     20.0

Table 6: Accuracy with SAT analogies

DM does not excel in this task, but its corrected performance is well above chance and that of all the attributional models, and comparable to that of a WordNet-based system (KnowBest) and a system that uses manually crafted information about analogy domains (LSA). All systems with performance above DM+ (and k-means) use corpora that are orders of magnitude larger than ukWaC.

4.2.2 Classifying semantic relations

We also tested the CCxL space on the 7 semantic relations between nominals adopted in Task 4 of SEMEVAL 2007 (Girju et al., 2007): Cause-Effect, Instrument-Agency, Product-Producer, Origin-Entity, Theme-Tool, Part-Whole, Content-Container. For each relation, the dataset includes 140 training examples and about 80 test cases. Each example consists of a small context retrieved from the Web, containing word pairs connected by a certain pattern (e.g., "* contains *"). The retrieved contexts were manually classified by the SEMEVAL organizers as positive (e.g., wrist-arm) or negative (e.g., effectiveness-magnesium) instances of a certain relation (e.g., Part-Whole). About 50% of training and test cases are positive instances. For each relation, we built "hit" and "miss" prototype vectors, by averaging across the vectors of the positive and negative training pairs attested in our CCxL model (we use only the word pairs, not the surrounding contexts). A test pair is classified as a hit for a certain relation if it is closer to the hit prototype vector for that relation than to the corresponding miss prototype. We used the SEMEVAL 2007 evaluation method, i.e., precision, recall, F-measure and accuracy, macroaveraged over all relations, as reported in Table 7. The DM+ scores ignore the 32% of pairs not in our CCxL space; the DM− scores assume random performance on such pairs. These scores give the range within which our performance will lie once we introduce techniques to deal with unseen pairs. We also report results of the SEMEVAL systems that did not use the organizer-provided WordNet sense labels nor information about the query used to retrieve the examples, as well as the performance of several trivial classifiers, also from the SEMEVAL task description.

model        precision  recall  F      accuracy
UCD-FC       66.1       66.7    64.8   66.0
UCB          62.7       63.0    62.7   65.4
ILK          60.5       69.5    63.8   63.5
DM+          60.3       62.6    61.1   63.3
UMELB-B      61.5       55.7    57.8   62.7
SemEval avg  59.2       58.7    58.0   61.1
DM−          56.7       58.2    57.1   59.0
UTH          56.1       57.1    55.9   58.8
majority     81.3       42.9    30.8   57.0
probmatch    48.5       48.5    48.5   51.7
UC3M         48.2       40.3    43.1   49.9
alltrue      48.5       100.0   64.8   48.5

Table 7: SEMEVAL relation classification

The DM accuracy is higher than the three SEMEVAL baselines (majority, probmatch and alltrue), and DM+ is above the average performance of the comparable SEMEVAL models.

5 http://www.aclweb.org/aclwiki/index.php?title=SAT_Analogy_Questions
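The hit/miss prototype classification can be sketched as follows. The link vectors below are toy examples for a Content-Container-like relation, not actual SEMEVAL training data.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Average of a list of sparse CCxL link vectors."""
    c = {}
    for v in vectors:
        for k, w in v.items():
            c[k] = c.get(k, 0.0) + w / len(vectors)
    return c

def is_hit(pair_vec, hit_proto, miss_proto):
    """A test pair is a positive instance if closer to the hit prototype."""
    return cosine(pair_vec, hit_proto) > cosine(pair_vec, miss_proto)

# Toy link vectors: positive training pairs co-occur with containment-like
# links, negative pairs with unrelated links (weights are invented).
positive_pairs = [{"in": 8.0, "inside": 5.0}, {"in": 6.0, "inside": 7.0}]
negative_pairs = [{"about": 4.0, "with": 3.0}, {"about": 5.0}]
hit_proto, miss_proto = centroid(positive_pairs), centroid(negative_pairs)
```

A new pair whose link profile resembles the positive pairs falls on the hit side, one resembling the negatives on the miss side; only the pairs' link vectors, not the retrieval contexts, enter the decision, as in the experiment above.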

Differently from DM, the models that outperform it use features extracted from the training contexts and/or specific additional resources: an annotated compound database for UCD-FC, machine learning algorithms to train the relation classifiers (ILK, UCD-FC), Web counts (UCB), etc. The less than optimal performance of DM is thus counterbalanced by its higher "parsimony" and generality.

4.3 The CLxC semantic space

A third view of the information in the DM graph is the concept+link-by-concept (CLxC) semantic space exemplified by the matrix in Table 8.

              teacher  victim   soldier  policeman
kill subj_tr  0.0      22.4     1306.9   38.2
kill obj      9.9      915.4    8948.3   538.1
die subj_in   109.4    1335.2   4547.5   68.6

Table 8: A fragment of the CLxC space

This view captures patterns of similarity between (surface approximations to) argument slots of predicative words. We can thus use the CLxC space to extract generalizations about the inner structure of lexico-semantic representations of the sort formal semanticists have traditionally been interested in. In the example, the patterns of co-occurrence suggest that objects of killing are rather similar to subjects of dying, hinting at the classic cause(subj,die(obj)) analysis of killing by Dowty (1977) and many others. Again, no new information has been introduced – the matrix in Table 8 is yet another re-organization of the data in our graph (compare, for example, the die+subj_in-by-teacher cell of this matrix with the teacher-by-subj_in+die cell in Table 1).

4.3.1 The causative/inchoative alternation

Syntactic alternations (Levin, 1993) represent a key aspect of the complex constraints that shape the syntax-semantics interface. One of the most important cases of alternation is the causative/inchoative, in which the object argument (e.g., John broke the vase) can also be realized as an intransitive subject (e.g., The vase broke). Verbs differ with respect to the possible syntactic alternations they can participate in, and this variation is strongly dependent on their semantic properties (e.g. semantic roles, event type, etc.). For instance, while break can undergo the causative/inchoative alternation, mince cannot: cf. John minced the meat and *The meat minced.

We test our CLxC semantic space on the discrimination between transitive verbs undergoing the causative/inchoative alternation and non-alternating ones. We took 232 causative/inchoative verbs and 170 non-alternating transitive verbs from Levin (1993). For each verb vi, we extracted from the CLxC matrix the row vectors corresponding to its transitive subject (vi+subj_tr), intransitive subject (vi+subj_in), and direct object (vi+obj) slots. Given the definition of the causative/inchoative alternation, we predict that with alternating verbs vi+subj_in should be similar to vi+obj (the things that are broken also break), while this should not hold for non-alternating verbs (mincees are very different from mincers).

Our model is completely successful in detecting the distinction. The cosine similarity between transitive subject and object slots is fairly low for both classes, as one would expect (medians of 0.16 for alternating verbs and 0.11 for non-alternating verbs). On the other hand, while for the non-alternating verbs the median cosine similarity between the intransitive subject and object slots is a similarly low 0.09, for the alternating verbs the median similarity between these slots jumps up to 0.31. Paired t-tests confirm that the per-verb difference between transitive subject vs. object cosines and intransitive subject vs. object cosines is highly statistically significant for the alternating verbs, but not for the non-alternating ones.

5 Conclusion

We proposed an approach to semantic tasks where statistics are collected only once from the source corpus and stored as a set of weighted concept+link+concept tuples (naturally represented as a graph). Different semantic spaces are constructed on demand from this underlying "distributional memory", to tackle different tasks without going back to the corpus.
We have shown that a straightforward implementation of this approach leads to excellent performance in various taxonomic similarity tasks, and to performance that, while not outstanding, is at least reasonable on relational similarity. We also obtained good results in a task (detecting the causative/inchoative alternation) that goes beyond classic NLP applications and more in the direction of theoretical semantics.

The most pressing issue we plan to address is how to improve performance in the relational similarity tasks. Fortunately, some shortcomings of our current model are obvious and easy to fix. The low coverage is in part due to the fact that our set of target concepts does not contain, by design, some words present in the task sets. Moreover, while our framework does not allow ad-hoc optimization of corpus-collection methods for different tasks, the way in which the information in the memory graph is adapted to tasks should of course go beyond the nearly baseline approaches we adopted here. In particular, we need to develop a backoff strategy for unseen pairs in the relational similarity tasks, that, following Turney (2006), could be based on constructing surrogate pairs of taxonomically similar words found in the CxLC space.

Other tasks should also be explored. Here, we viewed our distributional memory in line with how cognitive scientists look at the semantic memory of healthy adults, i.e., as an essentially stable long-term knowledge repository. However, much interesting semantic action takes place when underlying knowledge is adapted to context. We plan to explore how contextual effects can be modeled in our framework, focusing in particular on how composition affects word meaning (Erk and Padó, 2008). Similarity could be measured directly on the underlying graph, by relying on graph-based similarity algorithms – an elegant approach that would lead us to an even more unitary view of what distributional semantic memory is and what it does. Alternatively, DM could be represented as a three-mode tensor in the framework of Turney (2007), enabling smoothing operations analogous to singular value decomposition.

Acknowledgments

We thank Ken McRae and Peter Turney for providing datasets, Amaç Herdağdelen for access to his results, Katrin Erk for making us look at DM as a graph, and the reviewers for helpful comments.

References

J. Curran and M. Moens. 2002. Improvements in automatic thesaurus extraction. Proceedings of the ACL Workshop on Unsupervised Lexical Acquisition, 59–66.

D. Dowty. 1977. Word meaning and Montague Grammar. Kluwer, Dordrecht.

K. Erk and S. Padó. 2008. A structured vector space model for word meaning in context. Proceedings of EMNLP 2008.

S. Evert. 2005. The statistics of word cooccurrences. Ph.D. dissertation, Stuttgart University, Stuttgart.

R. Girju, P. Nakov, V. Nastase, S. Szpakowicz, P. Turney and Y. Deniz. 2007. SemEval-2007 task 04: Classification of semantic relations between nominals. Proceedings of SemEval-2007, 13–18.

E. Joanis, S. Stevenson and D. James. 2008. A general feature space for automatic verb classification. Natural Language Engineering, 14(3): 337–367.

T.K. Landauer and S.T. Dumais. 1997. A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2): 211–240.

B. Levin. 1993. English Verb Classes and Alternations. A Preliminary Investigation. Chicago, University of Chicago Press.

D. Lin. 1998. Automatic retrieval and clustering of similar words. Proceedings of ACL 1998, 768–774.

K. Lund and C. Burgess. 1996. Producing high-dimensional semantic spaces from lexical co-occurrence. Behaviour Research Methods, 28: 203–208.

S. Padó and M. Lapata. 2007. Dependency-based construction of semantic space models. Computational Linguistics, 33(2): 161–199.

S. Padó, U. Padó and K. Erk. 2007. Flexible, corpus-based modelling of human plausibility judgements. Proceedings of EMNLP 2007, 400–409.

P. Pantel and M. Pennacchiotti. 2008. Automatically harvesting and ontologizing semantic relations. In P. Buitelaar and Ph. Cimiano (eds.), Ontology learning and population. IOS Press, Amsterdam.

T. Rogers and J. McClelland. 2004. Semantic cognition: A parallel distributed processing approach. The MIT Press, Cambridge.

H. Rubenstein and J.B. Goodenough. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10): 627–633.

M. Sahlgren. 2006. The Word-space model. Ph.D. dissertation, Stockholm University, Stockholm.

P. Turney. 2006. Similarity of semantic relations. Computational Linguistics, 32(3): 379–416.

P. Turney. 2007. Empirical evaluation of four tensor decomposition algorithms. IIT Technical Report ERB-1152, National Research Council of Canada, Ottawa.

P. Turney. 2008. A uniform approach to analogies, synonyms, antonyms and associations. Proceedings of COLING 2008, 905–912.
