Using Distributional Semantics for Automatic Taxonomy Induction
Total Page:16
File Type:pdf, Size:1020Kb
Using Distributional Semantics for Automatic Taxonomy Induction Bushra Zafar Michael Cochez Usman Qamar College of Electrical & Fraunhofer – FIT College of Electrical & Mechanical Engineering, Sankt Augustin, Germany Mechanical Engineering, National University of michael.cochez@fit.fraunhofer.de National University of Science and Technology Science and Technology Islamabad, Pakistan Informatik 5 Islamabad, Pakistan [email protected] RWTH Aachen University [email protected] DE-52056 Aachen, Germany Abstract—Semantic taxonomies are powerful tools that provide ontology learning, semantic relation classification, and so structured knowledge to Natural Language Processing (NLP), on [9]. A number of different theories and methodologies Information Retreval (IR), and general Artificial Intelligence have been proposed to induce taxonomy automatically. Very (AI) systems. These taxonomies are extensively used for solving knowledge rich problems such as textual entailment and question common are pattern-based approaches. The working principle answering. In this paper, we present a taxonomy induction system of these approaches is to define patterns or rules first and and evaluate it using the benchmarks provided in the Taxonomy find matching text in a large textual corpus. Often a form Extraction Evaluation (TExEval2) Task. The task is to identify of bootstrapping is used. The system iteratively learns more hyponym-hypernym relations and to construct a taxonomy from patterns from the corpus by deriving patterns from the results a given domain specific list. Our approach is based on a word embedding, trained from a large corpus and string-matching found in the previous round. In this paper we focus on a approaches. The overall approach is semi-supervised. We propose distributional semantics approach. We utilize the similarity a generic algorithm that utilizes the vectors from the embedding of the context in which words occur to infer the meaning effectively, to identify hyponym-hypernym relations and to induce of the words. The context of a word is usually defined as the taxonomy. The system generated taxonomies on English the set of words that often occur near in a large corpus. We language for three different domains (environment, food and science) which are evaluated against gold standard taxonomies. are also somewhat inspired by clustering approaches, which The system achieved good results for hyponym-hypernym iden- incorporate clustering based on word (or context) similarity. tification and taxonomy induction, especially when compared to In this paper we present a system which creates a taxonomy other tools using similar background knowledge. from a given set of words. This way, we can compare to the results of the SemEval – Taxonomy Extraction Evaluation I. INTRODUCTION (TExEval-2) workshop1. The presented framework is semi- Semantic taxonomies are powerful tools that provide struc- supervised and combines the strength of word-embedding, tured knowledge to Natural Language Processing (NLP), Infor- clustering, and string-matching to identify hyponym-hypernym mation Retreval (IR), and general Artificial Intelligence (AI) relations. systems. Semantic taxonomies are extensively used for solving The remainder of the paper is structured as follows. In the knowledge-rich-problems such as textual entailment [1] and next section we provide an overview of related work. Then, question answering [2]. There are a number of manually cre- in section III we introduce our own system. In section IV ated and curated taxonomies such as YAGO [3], Freebase [4], we present our evaluation and compare the results with other SUMO [5], etc. These taxonomies have excellent quality and systems. Then we finish with a short conclusion. cover a broad domain. However, manually curating a taxonomy incurs a high cost II. RELATED WORK as can be seen in taxonomies like WordNet [6] and more The process of automatic taxonomy construction, can be encompassing systems like Cyc [7]. Moreover, these systems split in two steps. First, one needs to obtain the terms which remain incomplete and it is sometimes difficult to add new have to be in the taxonomy. These can either be provided or terms because of contradictions. Further, customization for have to be identified and extract from a text corpus. Second, different domains can be cumbersome. Nowadays, partially these terms have to be placed in a hierarchy, forming the due to rapid technological development and availability of data taxonomy. Often these two steps are combined into one as sources, it has become feasible to construct semi-automatic, or in the pattern-based approaches which we will review first. even automatic, domain specific taxonomies, instead of creat- After that we will look into clustering, classification, graph, ing those manually. [8] The process of taxonomy construction goes by many names like automatic taxonomy induction, 1http://alt.qcri.org/semeval2016/task13/ and distributional approaches. Note that these approaches are using morpho-syntactic analyzes. She essentially proposed often combined, like for example Kozareva and Hovy [10] three identifying principles. First, in single word terms the who made a system based on patterns and graph algorithms suffix is considered a hypernym if the prefix is an existing for creating the taxonomy. term. For example, pin is a hypernym of candlepin. Second, Pattern-based (also sometimes called rule-based) approaches for multi word terms, a term t1 is hypernym of multi word form perhaps the most common class of automatic taxonomy term t2 if t2 contains t1 (as pioneered by Jones). Finally, induction tools. The core idea of these approaches is to prepositions in words are used. For example, the concept establish certain patterns and if these patterns are found in sociology of culture contains the preposition of and hence the corpus the system infers a hypernym–hyponym relation we would decide that sociology is a hypernym of culture between them. For example, if the pattern a [[noun1]] is (example from [8]). There are a few limitations associated a [[noun2]] (where the double rectangular braced words with string-based methods. The first one is that it can only be are placeholders for actual words in the text) is found, then applied to compound terms (e.g., the hypernym of chemistry one can conclude that noun2 is a hypernym of noun1. If cannot be found using a string-based method). The second these patterns are chosen wisely, these systems achieve a high one is that a string-based method cannot differentiate direct accuracy and often remain relatively simple in implementation. and indirect hypernyms. For instance, animal science is a life Obviously, this kind of patterns can be used for applications science, which in turn is a science. However, a string-based beyond hyponym–hypernym detection. Also other relations method will identify science as the hypernym of both. like part-of (meronomy), is-like (synonym), is born in, etc. Another large group of approaches are based on clustering. can be detected using patterns. For these approaches, word contexts are represented as vectors The creation and selection of patterns can be done either containing the values for selected features. Next, a similarity manually [11] or automatically using automatic bootstrap- between the vectors is defined and then the clustering is ping [12]. The bootstrapping technique starts from a limited performed [16], [17]. A number of features have been used set of man-made patterns and extracts words fitting the patterns to create these vectors such as noun-verb relations [18], co- from the corpus. Then, these extracted words are used to find occurrence [19], and so on. Using clustering-based approaches new rules from the corpus. If we, for example, using the rule it is possible to identify relations which do not occur explicitly above, learned that a cake is a dessert, then we can now in the text corpus. However, labeling of clusters is required go trough the corpus and detect other sentences containing in clustering-based approaches, which is difficult for taxon- cake and dessert. If we find the sentence “cake is my favorite omy induction and evaluation. Moreover, similarly to pattern- dessert”, we could instantiate the new rule [[noun1]] is based, the clustering-based approaches fail to produce coherent my favorite [[noun2]]. Now, this new rule can be clusters for small corpora, and hence do not produce good used to find more pairs of words and the process gets repeated. results. To our knowledge, clustering-based approaches have Already in early works pattern-based approach are used for only been used to identify and extract hyponym-hypernym (is- finding is-a relation from a large corpus. Hearst [12], for the a) and synonym (is-like) relations. first time, utilized a set of hyponym-hypernym patterns and Snow et al. [20] used sentences and parse trees for hy- performed bootstrapping to automatically find the is-a relation pernym identification. First, the sentences are extracted and from a corpus. These patterns are now known as Hearst-style if they contain two terms parse trees are used to extract patterns. Others have also used this pattern style like, for patterns. These patterns were consecutively used to train the instance, Mann [13] who used Hearst-style patterns to extract hypernym classifier to induce a taxonomy from text corpora. the is-a relation for proper nouns. Yang and Callan [9] introduced a semi-supervised approach However, a drawback of pattern-based approaches is the for taxonomy induction