Event-Based Knowledge Reconciliation Using Frame Embeddings and Frame Similarity

Knowledge-Based Systems 135 (2017) 192–203
Contents lists available at ScienceDirect
Knowledge-Based Systems
journal homepage: www.elsevier.com/locate/knosys

Event-based knowledge reconciliation using frame embeddings and frame similarity

Mehwish Alam a, Diego Reforgiato Recupero b,d,∗, Misael Mongiovi c, Aldo Gangemi a,d, Petar Ristoski e

a Université Paris 13, 99 avenue JB Clément, Villetaneuse, Paris 93430, France
b Università degli Studi di Cagliari, Department of Mathematics and Computer Science, Via Ospedale 72, Cagliari 09124, Italy
c CNR, ISTC, Catania, Italy
d CNR, ISTC, Via S. Martino della Battaglia 44, Rome, Italy
e University of Mannheim, Mannheim, Germany

∗ Corresponding author at: via simeto 5, san gregorio, Italy.

Article info
Article history: Received 6 April 2017; Revised 9 August 2017; Accepted 14 August 2017; Available online 16 August 2017
Keywords: Knowledge reconciliation; Frame semantics; Frame embeddings; Frame similarity; Role similarity; Role embeddings; FrameNet; Framester

Abstract
This paper proposes an evolution over MERGILO, a tool for reconciling knowledge graphs extracted from text, using graph alignment and word similarity. The reconciled knowledge graphs are typically used for multi-document summarization, or to detect knowledge evolution across document series. The main point of improvement focuses on event reconciliation, i.e., reconciling knowledge graphs generated from texts about two similar events described differently. In order to gather a complete semantic representation of events, we use the FRED semantic web machine reader, jointly with Framester, a linguistic linked data hub represented using a novel formal semantics for frames. Framester is used to enhance the extracted event knowledge with semantic frames. We extend MERGILO with similarities based on the graph structure of semantic frames and the subsumption hierarchy of semantic roles as defined in Framester. With an effective evaluation strategy similar to the one used for MERGILO, we show the improvement of the new approach (MERGILO plus semantic frame/role similarities) over the baseline.

http://dx.doi.org/10.1016/j.knosys.2017.08.014
0950-7051/© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Several approaches have been proposed for extracting knowledge graphs from text. These knowledge graphs are generated with the aim of making unstructured text machine-readable [1]. When multiple texts describe similar events, it is more efficient and usable to provide the machine with a combination of the graphs generated from those texts. Using this merged graph, a machine reader can obtain the knowledge contained in multiple texts from a single consolidated graph instead of reading several graphs. This problem, termed "Knowledge Reconciliation" (KR), has recently been addressed by MERGILO [2], a tool for reconciling knowledge graphs using graph alignment and word similarity. These reconciled knowledge graphs can further be utilized by specific NLP applications, in particular by graph-based text summarization (which aims at summarizing knowledge represented in multiple closely related pieces of text), for assessing sentence or document similarity, etc.

The current study mainly targets the problem of knowledge reconciliation from the perspective of events. In a text, a complete description of an event is syntactically denoted by a verb, since it defines a relation between event participants. The first step in event-based knowledge reconciliation is to extract event-oriented knowledge graphs. To do so, we use FRED, a machine reader presented in [1], which generates an RDF/OWL graph from any open-domain input text.

To deal with different lexical units describing the same or similar events, we enhance the existing pipeline by enriching the knowledge graphs generated by FRED with semantic frames as defined in FrameNet¹. For this purpose, this study further makes use of mappings between VerbNet² (i.e., VerbNet verb classes and VerbNet roles) and FrameNet, as contained in Framester [3]. Framester is a linguistic linked data hub formulated using a novel formal semantics for frames, aimed at improving semantic interoperability between linguistic resources. Framester uses the RDF version of FrameNet [4]³, formalizes the FrameNet graph in OWL, and introduces a very rich subsumption hierarchy related to FrameNet frame elements (semantic roles).

¹ https://framenet.icsi.berkeley.edu/fndrupal/
² https://verbs.colorado.edu/~mpalmer/projects/verbnet.html

We use Framester graph representations as a way to improve similarity between the nodes and the edges, where nodes represent the frames and edges represent the roles. When different verbs denote similar events, i.e., different verbs evoke different frames that are connected in the FrameNet graph through the semantic relations already defined in FrameNet (such as Inheritance, SubFrame, ...), frame and semantic role similarity measures can greatly improve on the simple string matching techniques introduced in MERGILO. To do so, we consider similarities based on the graph structure of the FrameNet frames as well as on the subsumption hierarchy associated with the semantic roles defined in Framester. The FrameNet graph organizes frames using semantic relations; to benefit from this graph structure, we adapt WordNet similarity measures [5] to the FrameNet graph. We further exploit vector representations of frames, using the FrameNet graph and the subsumption hierarchy of roles as represented in Framester. We follow the RDF2Vec approach [6] to generate graph-based frame embeddings, referred to as Frame2Vec. These graph-based embeddings use graph mining algorithms such as graph walks and graph kernels to traverse the graph, and the result of this traversal is then used to generate vector representations. To find the similarity between two frames and between two roles, this study uses WordNet similarities and cosine similarity, obtaining a better consolidation of multiple graphs, which leads to an improvement over the results of a baseline algorithm for knowledge reconciliation, MERGILO [2]. MERGILO already computes the similarity between the roles represented as edges in the FRED graphs, but it merely performs string matching to decide whether two roles are similar. These embeddings can be used for any NLP application; in the current scenario, however, we use them for knowledge reconciliation.

In more detail, the paper is organized as follows. Section 2 introduces the state of the art and related work. Section 3 lists the data sources, resources and tools we have adopted in our methodology. Then, Section 4 gives some details of MERGILO and its functionalities, as a basis for Section 5, which explains how frame semantics have been […]

[…] classification problem. They generally extract a set of features (context words, part-of-speech tags, dependency path between entities, edit distance, etc.) from the sentence, and the corresponding labels are obtained from a large annotated training corpus. Usually these approaches are neither general nor scalable, and they are computationally very expensive because they require a large amount of training data. Semi-supervised approaches start with seed triples and iterate through the text to extract patterns that match them. The triples matched by these patterns become new seeds, and the process is repeated until no new pattern is found. Some of the most popular approaches in this category are Dual Iterative Pattern Relation Extractor [11], Snowball [12] and TextRunner [13]. For the last category, distant supervision approaches, existing knowledge bases are used with a large text corpus to generate a large number of relation triples. These relations are located within the text, and from them new hypotheses are learned to obtain a generalized model for relation extraction. Projects such as NELL use a predefined ontology and bootstrap relations from the web and text using seed examples of ontology constraints. They then use a multi-view learning paradigm to extract entities and relations from unstructured text.

2.2. Knowledge integration

Approaches for integrating knowledge include cross-document coreference resolution (when knowledge is represented as text documents) and ontology matching (when knowledge is in a machine-readable form). Cross-document coreference resolution aims at associating mentions of the same entity (object, person, concept, etc.) across different texts [14–17]. When the extracted entities are events, the problem becomes the resolution of event coreference across documents [18,19]. The authors of [19] jointly model named entities and events. Clusters of entity and event mentions are constructed and merged according to a similarity threshold based on linear regression. Then, information flows between entity and event clusters through features that model semantic role dependencies. The system handles nominal and verbal events as well as entities, and the joint formulation allows information from event coreference to help entity coreference, and vice versa.

A rich overview of ontology matching methods is provided […]