PATTY: a Taxonomy of Relational Patterns with Semantic Types
Ndapandula Nakashole, Gerhard Weikum, Fabian Suchanek
Max Planck Institute for Informatics
Saarbrücken, Germany
{nnakasho,weikum,suchanek}@mpi-inf.mpg.de

Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1135–1145, Jeju Island, Korea, 12–14 July 2012. © 2012 Association for Computational Linguistics

Abstract

This paper presents PATTY: a large resource for textual patterns that denote binary relations between entities. The patterns are semantically typed and organized into a subsumption taxonomy. The PATTY system is based on efficient algorithms for frequent itemset mining and can process Web-scale corpora. It harnesses the rich type system and entity population of large knowledge bases. The PATTY taxonomy comprises 350,569 pattern synsets. Random-sampling-based evaluation shows a pattern accuracy of 84.7%. PATTY has 8,162 subsumptions, with a random-sampling-based precision of 75%. The PATTY resource is freely available for interactive access and download.

1 Introduction

Motivation. WordNet (Fellbaum 1998) is one of the most widely used lexical resources in computer science. It groups nouns, verbs, and adjectives into sets of synonyms, and arranges these synonyms in a taxonomy of hypernyms. WordNet is limited to single words; it does not contain entire phrases or patterns. For example, WordNet does not contain the pattern X is romantically involved with Y. Just like words, patterns can be synonymous, and they can subsume each other. The pattern X is romantically involved with Y is synonymous with the pattern X is dating Y; both are subsumed by X knows Y. Patterns for relations are a vital ingredient for many applications, including information extraction and question answering. If a large-scale resource of relational patterns were available, it could boost progress in NLP and AI tasks.

Yet, existing large-scale knowledge bases are mostly limited to abstract binary relationships between entities, such as "bornIn" (Auer 2007; Bollacker 2008; Nastase 2010; Suchanek 2007). These do not correspond to real text phrases. Only the ReVerb system (Fader 2011) yields a larger number of relational textual patterns. However, no attempt is made to organize these patterns into synonymous patterns, let alone into a taxonomy. Thus, the patterns themselves do not exhibit semantics.

Goal. Our goal in this paper is to systematically compile relational patterns from a corpus, and to impose a semantically typed structure on them. The result we aim at is a WordNet-style taxonomy of binary relations. In particular, we aim at patterns that contain semantic types, such as ⟨singer⟩ sings ⟨song⟩. We also want to automatically generalize syntactic variations such as sings her ⟨song⟩ and sings his ⟨song⟩ into a more general pattern sings [prp] ⟨song⟩ with POS tag [prp]. Analogously but more demandingly, we want to automatically infer that the above patterns are semantically subsumed by the pattern ⟨musician⟩ performs on ⟨musical composition⟩, with more general types for the entity arguments in the pattern.

Compiling and organizing such patterns is challenging for the following reasons. 1) The number of possible patterns increases exponentially with the length of the patterns. For example, the string "Amy sings 'Rehab'" can give rise to the patterns ⟨singer⟩ sings ⟨song⟩, ⟨person⟩ sings ⟨artifact⟩, ⟨person⟩ [vbz] ⟨entity⟩, etc. If wildcards for multiple words are allowed (such as in ⟨person⟩ sings * ⟨song⟩), the number of possible patterns explodes. 2) A pattern can be semantically more general than another pattern (when one relation is implied by the other relation), and it can also be syntactically more general than another pattern (by the use of placeholders such as [vbz]). These two subsumption orders have a non-obvious interplay, and neither can be analyzed without the other. 3) We have to handle pattern sparseness and coincidental matches. If the corpus is small, the patterns ⟨singer⟩ later disliked her song ⟨song⟩ and ⟨singer⟩ sang ⟨song⟩, for example, may apply to the same set of entity pairs in the corpus; still, the patterns are not synonymous. 4) Computing mutual subsumptions on a large set of patterns may be prohibitively slow. Moreover, due to noise and vague semantics, patterns may not even form a crisp taxonomy, but require a hierarchy in which subsumption relations are weighted by statistical confidence measures.

Contributions. In this paper, we present PATTY, a large resource of relational patterns that are arranged in a semantically meaningful taxonomy, along with entity-pair instances. More precisely, our contributions are as follows:

1) SOL patterns: We define an expressive family of relational patterns, which combines syntactic features (S), ontological type signatures (O), and lexical features (L). The crucial novelty is the addition of the ontological, semantic dimension to patterns. When compared to a state-of-the-art pattern language, we found that SOL patterns yield higher recall while achieving similar precision.

2) Mining algorithms: We present efficient and scalable algorithms that can infer SOL patterns and subsumptions at scale, based on instance-level overlaps and an ontological type hierarchy.

3) A large lexical resource: On the Wikipedia corpus, we obtained 350,569 pattern synsets with 84.7% precision. We make our pattern taxonomy available for further research at www.mpi-inf.mpg.de/yago-naga/patty/.

The paper is structured as follows. Section 2 discusses related work. Section 3 outlines the basic machinery for pattern extraction. Section 4 introduces our SOL pattern model. Sections 5 and 6 present the syntactic and semantic generalization of patterns. Section 7 explains how to arrange the patterns into a taxonomy. Section 8 reports our experimental findings.

2 Related Work

A wealth of taxonomic knowledge bases (KBs) about entities and their semantic classes have become available. These are very rich in terms of unary predicates (semantic classes) and their entity instances. However, the number of binary relations (i.e., relation types, not instances) in these KBs is usually small: Freebase (Bollacker 2008) has a few thousand hand-crafted relations. WikiNet (Nastase 2010) has automatically extracted ca. 500 relations from Wikipedia category names. DBpedia (Auer 2007) has automatically compiled ca. 8000 names of properties from Wikipedia infoboxes, but these include many involuntary semantic duplicates such as surname and lastname. In all of these projects, the resource contains the relation names, but not the natural language patterns for them. The same is true for other projects along these lines (Navigli 2010; Philpot 2008; Ponzetto 2007; Suchanek 2007).

In contrast, knowledge base projects that automatically populate relations from Web pages also learn surface patterns for the relations: examples are TextRunner/ReVerb (Banko 2007; Fader 2011), NELL (Carlson 2010; Mohamed 2011), Probase (Wu 2011), the dynamic lexicon approach by (Hoffmann 2010; Wu 2008), the LDA-style clustering approach by (Yao 2011), and projects on Web tables (Limaye 2010; Venetis 2011). Of these, only TextRunner/ReVerb and NELL have made large pattern collections publicly available.

ReVerb (Fader 2011) constrains patterns to verbs or verb phrases that end with prepositions, while PATTY can learn arbitrary patterns. More importantly, all methods in the TextRunner/ReVerb family are blind to the ontological dimension of the entities in the patterns. Therefore, there is no notion of semantic typing for relation phrases as in PATTY.

NELL (Carlson 2010) is based on a fixed set of prespecified relations with type signatures (e.g., personHasCitizenship: ⟨person⟩ × ⟨country⟩), and learns to extract suitable noun-phrase pairs from a large Web corpus. In contrast, PATTY discovers patterns for relations that are a priori unknown; PATTY learns arbitrary phrases for patterns.

In OntExt (Mohamed 2011), the NELL architecture was extended to automatically compute new relation types (beyond the prespecified ones) for a given type signature of arguments, based on a clustering technique. For example, the relation musicianPlaysInstrument is found by clustering pattern co-occurrences for the noun-phrase pairs that fall into the specific type signature ⟨musician⟩ × ⟨musicinstrument⟩. This technique works for one type signature at a time, and does not scale up to mining a large corpus. Also, the technique is not suitable for inferring semantic subsumptions. In contrast, PATTY efficiently acquires patterns from large-scale corpora and organizes them into a subsumption hierarchy.

Several lexical resources capture verb categories and entailment: WordNet 3.0 (Fellbaum 1998) contains about 13,000 verb senses, with troponymy and entailment relations; VerbNet (Kipper 2008) is a hierarchical lexicon with more than 5,000 verb senses in ca. 300 classes, including selectional preferences. Again, all of these resources focus solely on verbs. ConceptNet 5.0 (Havasi 2007) is a thesaurus of commonsense knowledge built as a crowdsourcing endeavor; PATTY, in contrast, is constructed fully automatically from large corpora. Automatic learning of paraphrases and textual entailment has received much attention (see the survey of (Androutsopoulos 2010)), but does not consider fine-grained typing for binary relations, as PATTY does.

Class-based attribute discovery is a special case of mining relational patterns (e.g., (Alfonseca 2010; Pasca 2007; Pasca 2008; Reisinger 2009)). Given a semantic class, such as movies or musicians, the task is to determine relevant attributes, such as cast and budget for movies, or albums and biography for musicians, along with their instances. Unlike PATTY's

3 Pattern Extraction

This section explains how we obtain basic textual patterns from the input corpus. We first apply the Stanford Parser (Marneffe 2006) to the individual sentences of the corpus to obtain dependency paths.
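The flavor of the syntactic generalization step described in the introduction — collapsing sings her ⟨song⟩ and sings his ⟨song⟩ into sings [prp] ⟨song⟩ — can be illustrated with a minimal sketch. This is an illustrative toy, not PATTY's implementation: the function name and the mapping of the Penn Treebank possessive-pronoun tag PRP$ to the placeholder [prp] are assumptions for the example, and the sketch generalizes the token span between two entity mentions rather than a full dependency path.

```python
# Toy sketch: generalize the tokens between two entity mentions by
# replacing possessive pronouns (Penn Treebank tag PRP$) with the
# placeholder [prp], so that variants like "sings her <song>" and
# "sings his <song>" collapse into one pattern. Illustrative only.

def generalize_infix(tokens, pos_tags, subj_idx, obj_idx):
    """Return the pattern string between two entity positions,
    with possessive pronouns generalized to [prp]."""
    words = []
    for i in range(subj_idx + 1, obj_idx):
        if pos_tags[i] == "PRP$":
            words.append("[prp]")
        else:
            words.append(tokens[i])
    return " ".join(words)

# "Amy sings her Rehab" vs. "Elton sings his Candle" (toy sentences)
s1 = generalize_infix(["Amy", "sings", "her", "Rehab"],
                      ["NNP", "VBZ", "PRP$", "NNP"], 0, 3)
s2 = generalize_infix(["Elton", "sings", "his", "Candle"],
                      ["NNP", "VBZ", "PRP$", "NNP"], 0, 3)
print(s1, s2, s1 == s2)   # sings [prp] sings [prp] True
```

In PATTY itself the entity arguments additionally carry semantic types from the knowledge base (e.g., ⟨singer⟩ and ⟨song⟩), and generalization operates on dependency-path patterns mined at scale; the toy omits both aspects.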