Unsupervised Template Mining for Semantic Category Understanding
Total Page:16
File Type:pdf, Size:1020Kb
Unsupervised Template Mining for Semantic Category Understanding 1,2 3 3 1 3 Lei Shi ∗, Shuming Shi , Chin-Yew Lin , Yi-Dong Shen , Yong Rui 1State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences 2University of Chinese Academy of Sciences 3Microsoft Research shilei,ydshen @ios.ac.cn shumings,cyl,yongrui{ } @microsoft.com { } Abstract For example, both “CEO of General Motors” and “CEO of Yahoo” have structure “CEO of [com- We propose an unsupervised approach to pany]”. We call such a structure a category tem- constructing templates from a large collec- plate. Taking a large collection of open-domain tion of semantic category names, and use categories as input, we construct a list of category the templates as the semantic representa- templates and build a mapping from categories to tion of categories. The main challenge is templates. Figure 1 shows some example semantic that many terms have multiple meanings, categories and their corresponding templates. resulting in a lot of wrong templates. Sta- Templates can be treated as additional features tistical data and semantic knowledge are of semantic categories. The new features can be extracted from a web corpus to improve exploited to improve some upper-layer applica- template generation. A nonlinear scoring tions like web search and question answering. In function is proposed and demonstrated to addition, by linking categories to templates, it is be effective. Experiments show that our possible (for a computer program) to infer the se- approach achieves significantly better re- mantic meaning of the categories. For example in sults than baseline methods. As an imme- Figure 1, from the two templates linking to cat- diate application, we apply the extracted egory “symptom of insulin deficiency”, it is rea- templates to the cleaning of a category col- sonable to interpret the category as: “a symptom lection and see promising results (preci- of a medical condition called insulin deficiency sion improved from 81% to 89%). which is about the deficiency of one type of hor- mone called insulin.” In this way, our knowledge 1 Introduction about a category can go beyond a simple string A semantic category is a collection of items shar- and its member entities. An immediate application ing common semantic properties. For example, of templates is removing invalid category names all cities in Germany form a semantic category from a noisy category collection. Promising re- named “city in Germany” or “German city”. In sults are observed for this application in our ex- Wikipedia, the category names of an entity are periments. manually edited and displayed at the end of the An intuitive approach to this task (i.e., extract- page for the entity. There have been quite a lot of ing templates from a collection of category names) approaches (Hearst, 1992; Pantel and Ravichan- dran, 2004; Van Durme and Pasca, 2008; Zhang et national holiday of Brazil (instances: Carnival, Christmas…) national holiday of [country] al., 2011) in the literature to automatically extract- national holiday of South Africa (instances: Heritage Day, Christmas…) ing category names and instances (also called is-a school in Denver school in [place] or hypernymy relations) from the web. school in Houston school in [city] football player [sport] player Most existing work simply treats a category basketball player name as a text string containing one or multiple symptom of insulin deficiency (instances: nocturia, weight loss…) symptom of [medical condition] words, without caring about its internal structure. symptom of cortisol deficiency symptom of [hormone] deficiency (instances: low blood sugar…) In this paper, we explore the semantic structure Semantic Categories Category templates of category names (or simply called “categories”). Figure 1: Examples of semantic categories and ∗This work was performed when the first author was vis- iting Microsoft Research Asia. their corresponding templates. 799 Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 799–809, October 25-29, 2014, Doha, Qatar. c 2014 Association for Computational Linguistics contains two stages: category labeling, and tem- 1) To the best of our knowledge, this is the first plate scoring. work of template generation specifically for cate- Category labeling: Divide a category name gories in unsupervised manner. into multiple segments and replace some key seg- 2) We extract semantic knowledge and statisti- ments with its hypernyms. As an example, as- cal information from a web corpus for improving sume “CEO of Delphinus” is divided to three seg- template generation. Significant performance im- ments “CEO + of + Delphinus”; and the last seg- provement is obtained in our experiments. ment (Delphinus) has hypernyms “constellation”, 3) We study the characteristics of the scoring “company”, etc. By replacing this segment with function from the viewpoint of probabilistic evi- its hypernyms, we get candidate templates “CEO dence combination and demonstrate that nonlinear of [constellation]” (a wrong template), “CEO of functions are more effective in this task. [company]”, and the like. 4) We employ the output templates to clean Template scoring: Compute the score of each our category collection mined from the web, and candidate template by aggregating the information get apparent quality improvement (precision im- obtained in the first phase. proved from 81% to 89%). A major challenge here is that many segments After discussing related work in Section 2, we (like “Delphinus” in the above example) have mul- define the problem and describe one baseline ap- tiple meanings. As a result, wrong hypernyms proach in Section 3. Then we introduce our ap- may be adopted to generate incorrect candidate proach in Section 4. Experimental results are re- templates (like “CEO of [constellation]”). In this ported and analyzed in Section 5. We conclude the paper, we focus on improving the template scor- paper in Section 6. ing stage, with the goal of assigning lower scores to bad templates and larger scores to high-quality 2 Related work ones. Several kinds of work are related to ours. There have been some research efforts (Third, Hypernymy relation extraction: Hypernymy 2012; Fernandez-Breis et al., 2010; Quesada- relation extraction is an important task in text min- Martınez et al., 2012) on exploring the structure of ing. There have been a lot of efforts (Hearst, 1992; category names by building patterns. However, we Pantel and Ravichandran, 2004; Van Durme and automatically assign semantic types to the pattern Pasca, 2008; Zhang et al., 2011) in the literature to variables (or called arguments) while they do not. extract hypernymy (or is-a) relations from the web. For example, our template has the form of “city Our target here is not hypernymy extraction, but in [country]” while their patterns are like “city in discovering the semantic structure of hypernyms [X]”. More details are given in the related work (or category names). section. Category name exploration: Category name A similar task is query understanding, including patterns are explored and built in some ex- query tagging and query template mining. Query isting research work. Third (2012) pro- tagging (Li et al., 2009; Reisinger and Pasca, posed to find axiom patterns among category 2011) corresponds to the category labeling stage names on an existing ontology. For ex- described above. It is different from template gen- ample, infer axiom pattern “SubClassOf(AB, eration because the results are for one query only, B)” from “SubClassOf(junior school school)” without merging the information of all queries to and “SubClassOf(domestic mammal mammal)”. generate the final templates. Category template Fernandez-Breis et al. (2010) and Quesada- construction are slightly different from query tem- Martınez et al. (2012) proposed to find lexical pat- plate construction. First, some useful features terns in category names to define axioms (in med- such as query click-through is not available in cat- ical domain). One example pattern mentioned in egory template construction. Second, categories their papers is “[X] binding”. They need man- should be valid natural language phrases, while ual intervention to determine what X means. The queries need not. For example, “city Germany” is main difference between the above work and ours a query but not a valid category name. We discuss is that we automatically assign semantic types to in more details in the related work section. the pattern variables (or called arguments) while Our major contributions are as follows. they do not. 800 Template mining for IE: Some research work Li et al. (2013) proposed an clustering algorithm in information extraction (IE) involves patterns. to group existing query templates by search intents Yangarber (2003) and Stevenson and Greenwood of users. (2005) proposed to learn patterns which were in Compared to the open-domain unsupervised the form of [subject, verb, object]. The category methods for query template construction, our ap- names and learned templates in our work are not proach improves on two aspects. First, we propose in this form. Another difference between our work to incorporate multiple types of semantic knowl- and their work is that, their methods need a super- edge (e.g., term peer similarity and term clusters) vised name classifer to generate the candidate pat- to improve template generation. Second, we pro- terns while our approach is unsupervised. Cham- pose a nonlinear