Sense-aware Semantic Analysis: A Multi-prototype Word Representation Model using Wikipedia
Zhaohui Wu†, C. Lee Giles‡†
†Computer Science and Engineering, ‡Information Sciences and Technology
Pennsylvania State University, University Park, PA 16802, USA

Abstract

Human languages are naturally ambiguous, which makes it difficult to automatically understand the semantics of text. Most vector space models (VSM) treat all occurrences of a word as the same and build a single vector to represent the meaning of a word, which fails to capture any ambiguity. We present sense-aware semantic analysis (SaSA), a multi-prototype VSM for word representation based on Wikipedia, which can account for homonymy and polysemy. The “sense-specific” prototypes of a word are produced by clustering Wikipedia pages based on both local and global contexts of the word in Wikipedia. Experimental evaluations on semantic relatedness, for both isolated words and words in sentential contexts, and on word sense induction demonstrate its effectiveness.

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Computationally modeling semantics of text has long been a fundamental task for natural language understanding. Among many approaches for semantic modeling, distributional semantic models using large scale corpora or web knowledge bases have proven to be effective (Deerwester et al. 1990; Gabrilovich and Markovitch 2007; Mikolov et al. 2013). Specifically, they provide vector embeddings for a single text unit based on the distributional context where it occurs, from which semantic relatedness or similarity measures can be derived by computing distances between vectors. However, a common limitation of most vector space models is that each word is only represented by a single vector, which cannot capture homonymy and polysemy (Reisinger and Mooney 2010). A natural way to address this limitation could be building multi-prototype models that provide different embeddings for different senses of a word. However, this task is understudied, with only a few exceptions (Reisinger and Mooney 2010; Huang et al. 2012), which cluster the contexts of a word into K clusters to represent multiple senses.

While these multi-prototype models showed significant improvement over single prototype models, there are two fundamental problems yet to be addressed. First, they simply predefine a fixed number of prototypes, K, for every word in the vocabulary, which should not be the case since different words could have a different number of senses. Second, the sense-specific context clusters are generated from a free text corpus, whose quality can be neither guaranteed nor evaluated (Purandare and Pedersen 2004; Schütze 1998). It is possible that contexts of different word senses could be clustered together because they might share some common words, while contexts of the same word sense could be clustered into different groups since they have no common words. For example, apple “Apple Inc.” and apple “Apple Corps” share many contextual words in Wikipedia such as “computer”, “retail”, “shares”, and “logs” even if we consider a context window size of only 3.

Thus, the question posed would be how can we build a sense-aware semantic profile for a word that can give accurate sense-specific prototypes in terms of both number and quality? And for a given context of the word, can the model assign the semantic representation of a word that corresponds to the specific sense?

Compared with existing methods that adopt automatic sense induction from free text based on context clustering, a better way to incorporate sense-awareness into semantic modeling is to do word sense disambiguation for different occurrences of a word using manually compiled sense inventories such as WordNet (Miller 1995). However, due to the knowledge acquisition bottleneck (Gale, Church, and Yarowsky 1992b), this approach may often miss corpus/domain-specific senses and may be out of date due to changes in human languages and web content (Pantel and Lin 2002). As such, we will use Wikipedia, the largest encyclopedia knowledge base online with rich semantic information and wide knowledge coverage, as a semantic corpus on which to test our Sense-aware Semantic Analysis (SaSA). Each dimension in SaSA is a Wikipedia concept/article¹ in which the word appears or with which it co-occurs. By assuming that occurrences of a word in Wikipedia articles of similar subjects should share the sense, the sense-specific clusters are generated by agglomerative hierarchical clustering based on not only the text context, but also Wikipedia links and categories that could provide more semantic information, giving different words their own clusters. The links give unique identification of a word occurrence by linking it to a Wikipedia article, which provides helpful local disambiguation information. The categories give global topical labels of a Wikipedia article that could also be helpful for sense induction. For example, while the pure text context of the word apple in “Apple Inc.” and “Apple Corps” could not differentiate the two senses, the categories of the two concepts may easily show the difference since they have no category labels in common.

¹ Each concept corresponds to a unique Wikipedia article.

Our contributions can be summarized as follows:
• We propose a multi-prototype model for word representation, namely SaSA, using Wikipedia that could give more accurate sense-specific representation of words with multiple senses.
• We apply SaSA to different semantic relatedness tasks, including word-to-word (for both isolated words and words in sentential contexts) and text-to-text, and achieve better performance than the state-of-the-art methods in both single prototype and multi-prototype models.

Sense-aware Semantic Analysis

SaSA follows ESA by representing a word using Wikipedia concepts. Given the whole Wikipedia concept set $W = \{C_1, \ldots, C_n\}$, a word $w$, and the concept set that relates to the word, $C(w) = \{C_{w_1}, \ldots, C_{w_k}\}$, SaSA models $w$ as its sense-aware semantic vector $V(w_{s_i}) = [r_{i1}(w), \ldots, r_{ih}(w)]$, where $r_{ij}(w)$ measures the relevance of $w$ under sense $s_i$ to concept $C_{ij}$, and $S(w) = \{s_1, \ldots, s_m\}$ denotes all the senses of $w$ induced from $C(w)$. Specifically, $s_i = \{C_{i1}, \ldots, C_{ih}\} \subset C(w)$ is a sense cluster containing a set of Wikipedia concepts where occurrences of $w$ share the sense.

Figure 1 demonstrates the workflow of SaSA. Given a word $w$, it first finds all Wikipedia concepts that relate to $w$, including those that contain $w$ (C1, C5, and C7) and those that co-occur with $w$ as Wikipedia links in $w$'s contexts (C2, C3, C4, and C6). We define a context of $w$ as a sentence containing it. Then it uses agglomerative hierarchical clustering to group the concepts sharing the sense of $w$ into a cluster. All the sense clusters represent the sense space $S(w) = \{s_1, \ldots, s_m\}$. Given a context of $w$, sense assignment will determine the sense of $w$ by computing the distance of the context to the clusters. Finally, the sense-aware concept vector will be constructed based on the relevance scores of $w$ in the underlying sense. For example, the vectors of “apple” in T1 and T2 are different from each other since they refer to different senses. They only have some relatedness in C5 and C7, where both senses have word occurrences.

[Figure 1: A demonstrative example for SaSA. For the word “apple”, the pipeline finds related concepts (C1 Apple, C2 Pome, C3 Fruit, C4 Tree, C5 Apple Inc., C6 Cupertino, California, C7 Apple Corps, ...), induces the sense space (s1: C1, C2, C3, C4, ...; s2: C5, C6, ...; s3: C7, ...), assigns a sense to each context, and weights the sense-aware concepts over (C1, ..., C7, ...): for T1 “The apple tree is small ...”, apple(T1) = (255, 89, 163, 186, 4, 0, 2, ...); for T2 “The apple keyboard is small ...”, apple(T2) = (0, 0, 0, 0, 688, 8, 22, ...).]

[...] then find all linked concepts in contexts of $w$ from those articles. These concepts compose the vector space of $w$.

To calculate $r_{ij}(w)$, the “relevance” of $w$ to a concept $C_{ij}$, we define a new TFIDF-based measure, namely TFIDFs, to capture the sense-aware relevance. TF is the sum of two parts: the number of occurrences of $w$ in $C_{ij}$, and the number of co-occurrences of $w$ and $C_{ij}$ in a sentence in cluster $s_i$. DF is the number of concepts in the cluster that contain $w$. When counting the co-occurrences of $w$ and $C_{ij}$, $C_{ij}$ has to be explicitly marked as a Wikipedia link to the concept $C_{ij}$. That is, “apple tree” will be counted as one co-occurrence of “apple” and the concept “Tree” if and only if “tree” is linked to “http://en.wikipedia.org/wiki/Tree”.

One Sense Per Article

As shown in Figure 1, the one sense per article assumption made by SaSA is not perfect. For example, in the article “Apple Inc.”, among 694 occurrences of “apple”, while most occurrences refer to Apple Inc., there are 4 referring to the fruit apple and 2 referring to “Apple Corps”. However, considering that each Wikipedia article actually focuses on a specific concept, it is still reasonable to believe that one sense per article may hold in most cases. We manually checked all the articles listed in the Apple disambiguation page and found that each article has an extremely dominant sense among all occurrences of the word “apple”. Table 1 gives a few examples of sense distribution among four articles. As we can see, each article has a dominant sense. We examined 100 randomly sampled Wikipedia articles and found that 98% of the articles support the assumption. However, considering the two papers “one sense per discourse” (Yarowsky [...]

Concept Space

A concept of $w$ should be about $w$.
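The TFIDFs relevance $r_{ij}(w)$ is described in terms of a two-part TF and a cluster-level DF. A minimal Python sketch of that weighting for a single sense cluster might look as follows. The container `ConceptCounts`, the function `tfidf_s`, the example counts, and in particular the way TF and DF are combined (`TF * log(1 + N / DF)`, with N the number of concepts in the cluster) are assumptions made for illustration; the text here only defines TF and DF themselves.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConceptCounts:
    """Counts for one concept C_ij in a sense cluster s_i (hypothetical container)."""
    occurrences: int           # times w occurs in the article of C_ij
    linked_cooccurrences: int  # sentences in s_i where w co-occurs with a link to C_ij


def tfidf_s(cluster: Dict[str, ConceptCounts]) -> Dict[str, float]:
    """Score every concept of one sense cluster for the word w.

    TF = occurrences + linked co-occurrences (the two parts named in the text).
    DF = number of concepts in the cluster whose article contains w.
    Assumption: the score is TF * log(1 + N / DF), N being the cluster size.
    """
    n = len(cluster)
    df = sum(1 for counts in cluster.values() if counts.occurrences > 0)
    if df == 0:
        # w never occurs in any article of this cluster: no relevance signal.
        return {name: 0.0 for name in cluster}
    idf = math.log(1 + n / df)
    return {
        name: (counts.occurrences + counts.linked_cooccurrences) * idf
        for name, counts in cluster.items()
    }


# Hypothetical counts for a "fruit" sense cluster of the word "apple".
fruit_cluster = {
    "Apple": ConceptCounts(occurrences=200, linked_cooccurrences=5),
    "Pome": ConceptCounts(occurrences=4, linked_cooccurrences=1),
    "Tree": ConceptCounts(occurrences=0, linked_cooccurrences=3),
}
scores = tfidf_s(fruit_cluster)
```

In this sketch a concept such as “Tree”, whose article does not contain the word, still receives a nonzero score through linked co-occurrences, which mirrors the requirement that co-occurrences are counted only when the concept is explicitly linked.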