Sar-Graphs: a Linked Linguistic Knowledge Resource Connecting Facts with Language
Total Page:16
File Type:pdf, Size:1020Kb
Sar-graphs: A Linked Linguistic Knowledge Resource Connecting Facts with Language Sebastian Krause, Leonhard Hennig, Aleksandra Gabryszak, Feiyu Xu, Hans Uszkoreit DFKI Language Technology Lab, Berlin, Germany skrause,lehe02,alga02,feiyu,uszkoreit @dfki.de { } Abstract open data movement, since they address com- plementary aspects of encyclopedic and linguistic We present sar-graphs, a knowledge re- knowledge. source that links semantic relations from Few to none of the existing resources, however, factual knowledge graphs to the lin- explicitly link the semantic relations of knowl- guistic patterns with which a language edge graphs to the linguistic patterns, at the level can express instances of these relations. of phrases or sentences, that are used to express Sar-graphs expand upon existing lexico- these relations in natural language text. Lexical- semantic resources by modeling syntactic semantic resources focus on linkage at the level and semantic information at the level of of individual lexical items. For example, Babel- relations, and are hence useful for tasks Net integrates entity information from Wikipedia such as knowledge base population and re- with word senses from WordNet, UWN is a mul- lation extraction. We present a language- tilingual WordNet built from various resources, independent method to automatically con- and UBY integrates several linguistic resources by struct sar-graph instances that is based linking them at the word-sense level. Linguistic on distantly supervised relation extraction. knowledge resources that go beyond the level of We link sar-graphs at the lexical level to lexical items are scarce and of limited coverage BabelNet, WordNet and UBY, and present due to significant investment of human effort and our ongoing work on pattern- and relation- expertise required for their construction. Among level linking to FrameNet. An initial these are FrameNet (Baker et al., 1998), which dataset of English sar-graphs for 25 rela- provides fine-grained semantic relations of pred- tions is made publicly available, together icates and their arguments, and VerbNet (Schuler, with a Java-based API. 2005), which models verb-class specific syntac- tic and semantic preferences. What is missing, 1 Introduction therefore, is a large-scale, preferably automati- Knowledge graphs, such as Freebase or YAGO, cally constructed linguistic resource that links lan- are networks which contain information about guage expressions at the phrase or sentence level real-world entities and their semantic types, prop- to the semantic relations of knowledge bases, as erties and relations. In recent years consider- well as to existing terminological resources. Such able effort has been invested into constructing a repository would be very useful for many infor- these large knowledge bases in academic research, mation extraction tasks, e.g., for relation extrac- community-driven projects and industrial devel- tion and knowledge base population. opment (Bollacker et al., 2008; Suchanek et al., We aim to fill this gap with a resource whose 2008; Lehmann et al., 2015). A parallel and in structure we define in Section 2. Instances of this part independent development is the emergence resource are graphs of semantically-associated re- of large-scale lexical-semantic resources, such as lations, which we refer to by the name sar-graphs. BabelNet or UBY, which encode linguistic infor- We believe that sar-graphs are examples for a new mation about words and their relations (de Melo type of knowledge repository, language graphs, and Weikum, 2009; Navigli and Ponzetto, 2012; as they represent the linguistic patterns for the re- Gurevych et al., 2012). Both types of resources lations contained in a knowledge graph. A lan- are important contributions to the linguistic linked guage graph can be thought of as a bridge between 30 Proceedings of the 4th Workshop on Linked Data in Linguistics (LDL-2015), pages 30–38, Beijing, China, July 31, 2015. c 2015 Association for Computational Linguistics and Asian Federation of Natural Language Processing the language and the facts encoded in a knowl- The function of sar-graphs is to represent the edge graph, a bridge that characterizes the ways linguistic constructions a language l provides for in which a language can express instances of re- referring to instances of r. A vertex v V corre- ∈ lations. Our contributions in this paper are as fol- sponds to either a word in such a construction, or lows: an argument of the relation. The features assigned to a vertex via the labeling function f provide We present a model for sar-graphs, a resource • information about lexico-syntactic aspects (word of linked linguistic patterns which are used to form and lemma, word class), and lexical seman- express factual information from knowledge tics (word sense), or semantic attributes (global graphs in natural language text. We model entity identifier, entity type, semantic role in the these patterns at a fine-grained lexico-syntactic target relation). They may also provide statisti- and semantic level (Section 2). cal and meta information (e.g., frequency). The We describe the word-level linking of sar- • linguistic constructions are modeled as sub-trees graph patterns to existing lexical-semantic re- of dependency-graph representations of sentences. sources (BabelNet, WordNet, and UBY; Sec- We will refer to these trees as dependency struc- tion 3) tures or dependency constructions. Each structure We discuss our ongoing work of linking sar- • typically describes one particular way to express graphs at the pattern and relation level to relation r in language l. Edges e E are conse- FrameNet (Section 4) ∈ quently labeled with dependency tags, in addition We describe a language-independent, distantly • to, e.g., frequency information. supervised approach for automatically con- A given graph instance is specific to a language structing sar-graph instances, and present a l and target relation r. In general, r links n 2 en- first published and linked dataset of English ≥ tities. An example relation is marriage, connect- sar-graphs for 25 Freebase relations (Sec- ing two spouses to one another, and optionally to tion 5) the location and date of their wedding, as well as 2 Sar-graphs: A linguistic knowledge to their date of divorce: resource rmar.(SPOUSE1, SPOUSE2, CEREMONY, FROM, TO). Sar-graphs (Uszkoreit and Xu, 2013) extend the If a given language l only provides a single con- current range of knowledge graphs, which repre- struction to express an instance of r, then the de- sent factual, relational and common-sense infor- pendency structure of this construction forms the mation for one or more languages, with linguistic entire sar-graph. But if the language offers al- variants of how semantic relations between real- ternatives to this construction, i.e., paraphrases, world entities are expressed in natural language. their dependency structures are also added to the Definition Sar-graphs are directed multigraphs sar-graph. They are connected in such a way containing linguistic knowledge at the syntactic that all vertices labeled by the same argument and lexical semantic level. A sar-graph is a tuple name are merged, i.e., lexical specifics like word form, lemma, class, etc. are dropped from the Gr,l = (V,E, f ,A f ,Σ f ), vertices corresponding to the semantic arguments where V is the set of vertices and E is the set of of the target relation. The granularity of such a edges. The labeling function f associates both ver- dependency-structure merge is however not fixed tices and edges with sets of features (i.e., attribute- and can be adapted to application needs. value pairs): Figure 1 presents a sar-graph for five English f : V E P(A Σ ) ∪ 7→ f × f constructions with mentions of the marriage rela- where tion. The graph covers the target relation relevant parts of the individual mentions, assembled step- P( ) constructs a powerset, • · wise in a bottom-up fashion. Consider the two A is the set of attributes (i.e., attribute names) • f sentences in the top-left corner of the figure: which vertices and edges may have, and Σ f is the value alphabet of the features, i.e., Example 1 • I met Eve’s husband Jack. the set of possible attribute values for all at- • Lucy and Peter are married since 2011. tributes. • 31 SPOUSE1 SPOUSE2 SPOUSE1 SPOUSE2 FROM SPOUSE1 SPOUSE2 FROM I met Eve’s husband Jack. Lucy and Peter are married since 2011. I attended the wedding ceremony of Lucy and Peter in 2011. conj_and auxpass prep_of poss dep prep_of nsubjpass prep_since nn nsubjpass prep_in nsubj nuptialsnuptials weddingwedding dobj exchange vow syn wedding syn (verb) (noun) (noun) prep_in weddingwedding wedding nn partyparty event nsubj CEREMONY syn syn prep_of ceremony prep_of prep_in (noun) SPOUSE1 FROM SPOUSE2 nsubjpass nsubjpass prep_since prep_from marry splitsplit upup nsubjpass dep (verb) syn hubbyhubby divorce poss syn (verb) hubbiehubbie TO prep_in syn husband (noun) conj_and prep_in nsubj prep_in dobj nsubjpass conj_and nsubj det auxpass prep_from Peter and Lucy exchanged the vows in Paris. Lucy was divorced from Peter in 2012. SPOUSE1 SPOUSE2 CEREMONY SPOUSE2 SPOUSE1 TO Figure 1: Example sar-graph for the marriage relation, constructed using the dependency patterns ex- tracted from the sentences shown in the figure. Dashed vertices and edges represent additional graph elements obtained by linking lexical vertices to BabelNet. From the dependency parse trees of these sen- Less explicit relation mentions A key property tences, we can extract two graphs that connect of sar-graphs is that they store linguistic structures the relation’s arguments. The first sentence lists with varying degrees of explicitness wrt. to the un- the spouses with a possessive construction, the derlying semantic relations. Constructions that re- second sentence using a conjunction. In addi- fer to some part or aspect of the relation would tion, the second sentence provides the marriage normally be seen as sufficient evidence of an in- date.