
A Knowledge and Reasoning Toolkit for Cognitive Applications

Mustafa Canim, Cristina Cornelio, Robert Farrell, Achille Fokoue, Kyle Gao, John Gunnels, Arun Iyengar, Ryan Musa, Mariano Rodriguez-Muro, and Rosario Uceda-Sosa
IBM T.J. Watson Research Center, Yorktown Heights, NY

ABSTRACT
This paper presents a knowledge and reasoning toolkit for developing cognitive applications which have significant requirements for managing structured and semi-structured data. Our system provides enhanced querying and reasoning capabilities along with natural language processing support and the ability to automatically extract data from PDF documents. We also have the capability to manage ontologies in a user-friendly way. Our system is implemented as a set of Web services, and we provide enhanced clients to allow applications to easily access our knowledge and reasoning toolkit.

KEYWORDS
Data extraction, knowledge representation, natural language processing, ontology engineering, reasoning, Resource Description Framework (RDF), Web Ontology Language (OWL)

ACM Reference Format:
Mustafa Canim, Cristina Cornelio, Robert Farrell, Achille Fokoue, Kyle Gao, John Gunnels, Arun Iyengar, Ryan Musa, Mariano Rodriguez-Muro, and Rosario Uceda-Sosa. 2017. A Knowledge and Reasoning Toolkit for Cognitive Applications. In Proceedings of HotWeb'17, San Jose / Silicon Valley, CA, USA, October 14, 2017, 10 pages. https://doi.org/10.1145/3132465.3132478

1 INTRODUCTION
There is an increasing need to provide support for cognitive applications which make use of structured and semi-structured data and require advanced querying and reasoning to be performed on the data. The specific requirements for these types of applications vary. One of the key challenges is to develop tools which support cognitive applications in a general way so that they can be reused for a wide variety of customer scenarios.

We are developing a knowledge and reasoning toolkit (KRT) based on real world scenarios which is designed to be used for a broad range of cognitive applications. It is exposed to the user as a set of Web services, allowing applications to easily make use of it via the http protocol. This paper describes our system and also provides considerable information about past work in the area.

The objective of the Knowledge and Reasoning Toolkit (KRT) is to bring content into a structured knowledge representation, to integrate data from heterogeneous sources, and to enable conversational access in support of humans engaging in conducting relevant tasks. Previously, extracting and representing such information from natural language, structured, and semi-structured (table) content has been highly manual, tedious, and error prone. The KRT creates and leverages knowledge extraction, management, and reasoning techniques to reduce the human effort, ensure a principled representation, and support effective and fluent human-machine conversation.
The KRT leverages knowledge and reasoning at different stages of a project's lifetime. At early stages, the KRT uses reasoning to support data integration tasks, i.e., discovering connections within the datasets, proposing initial alignments, and providing pointers to possible misalignments as well as candidate corrections. The KRT services provide means to assess the quality of the data, offer confidence levels about the integrated data with respect to domain knowledge, and extend the data with new data that might have been missing, or that can be computed if background knowledge is taken into account.

On the query answering and conversational side, the KRT lowers the effort required to obtain useful data by enabling new knowledge management and query answering paradigms that are guided by knowledge and that can interact with the user. Conversational systems can use KRT services to allow a domain expert to interact with the domain knowledge to facilitate bootstrapping, development and maintenance of the system. During query answering, conversational systems can use KRT query services to explore domain knowledge, help the user refine the query, and obtain the desired answers. The KRT provides new forms of query answering that combine the power of structured queries with the flexibility of information retrieval techniques, enabling complex query answering with lower requirements with respect to data consistency.

1.1 Key challenges
Data extraction and data integration projects can require large investments before some results can be exploited. The integration of the structured data, e.g., mapping and alignment, and the training of models for information extraction for the knowledge graphs are lengthy and tedious tasks. Given that the tools and techniques involved are unaware of the meaning of the data and the domain, issues in this data and in the integration or in the models are often invisible until later stages in the project. Debugging and maintenance is a difficult task for which there is little or no automated support.
Similarly, constructing conversational systems that can use the resulting data is often a costly project-specific task. Mapping of user queries to data sets is done in an application specific fashion, limiting re-usability of the systems developed and limiting the queries that the system is able to handle. In general, the conversational systems in these projects lack an abstraction layer that allows them to separate the knowledge that is specific to the domain of the project from the conversational flow of the application; they have no back-end support that allows them to explore, query and reason with the knowledge to drive the flow. The conversational systems also have no way to alter, correct and update this knowledge in order to deal with issues in the data ingestion stage, and no way to use this knowledge to resolve conversation-specific problems, such as disambiguating the relevant data.

The remainder of the paper is structured as follows. Section 2 summarizes past work in handling semi-structured data and describes how the KRT handles it. Section 3 describes reasoning capabilities of our system. Section 4 discusses our natural language support features. Section 5 describes how we extract information from PDF documents. Section 6 describes our ontology management capabilities. Section 7 describes our Web services architecture and user interface. Finally, Section 8 concludes the paper.

2 HANDLING STRUCTURED AND SEMI-STRUCTURED DATA
In this section, we first review various approaches that have been proposed to map legacy structured and semi-structured data in various formats (e.g., XML, CSV, relational) into knowledge graphs. Then, we present the KRT approach to handle structured and semi-structured data. Most state-of-the-art approaches decouple the specification of the mappings or transformations from their actual execution, to either produce a new knowledge graph from the legacy data (materialization approach) or to directly query the data in its original form through query rewrite (virtualization approach). These approaches mainly differ along the following key dimensions:
• Format specific vs. Format agnostic
• Generic automated direct mapping vs. Customizable mapping specification
• Materialization vs. Virtualization
• High level declarative mapping language vs. General purpose transformation/query or programming language
• Existence vs. Absence of visual mapping editors
Some of these dimensions represent a spectrum along which different proposals fall. Others correspond to discrete (but not necessarily mutually exclusive) design choices.

2.1 Format specific vs. Format agnostic
The research on mapping or aligning legacy data to knowledge graphs started more than a decade ago with, as initial focus, the exploration of approaches to map relational data into RDF/OWL. Experiences from early systems such as D2RQ [9, 10], R2O [68], and SQL-RDF [70], which offer both generic automated mapping and customizable mapping approaches, led to the adoption of the W3C standard mapping language for customizing the mapping from relational data to RDF, R2RML [23], along with an approach for automated generic direct mapping from relational data to RDF [4]. [57] provides a good survey of approaches and systems to map relational data to RDF. Likewise, multiple approaches have been proposed to map XML data to RDF through both direct generic automated transformations (e.g., xCurator [75] or XSL-based transformation [12]) or through customized mappings/transformations [7, 8]. Approaches have also been developed to handle mapping from other specific data formats, e.g., CSV and spreadsheet data ([37, 48]). Recently, due to the need to integrate and map information from multiple interrelated sources with heterogeneous formats, new approaches have been proposed to map data in a format agnostic way. A prominent approach, RML [24], extends the R2RML standard [23] in order to support many input formats beside relational data (e.g., JSON, CSV, and XML). Another example, xR2RML [56], extends both R2RML and RML to also handle mapping from NoSQL databases.
2.2 Generic automated direct mapping vs. Customizable mapping specification
Generic direct mapping approaches specify a predefined static transformation of any input of a given format (e.g., CSV or relational data) into a generic RDF graph, which does not "conform" to any specific ontology model. For example, [4] defines a generic direct mapping of relational data to RDF, based on the simple principles that 1) a record is an RDF node, 2) the column name of a relational table is an RDF predicate, and 3) a relational table cell is a literal value. Customizable approaches (e.g., [23, 24]) provide a finer level of user control over the specification of the mappings, which can be driven by the actual semantics of the input data, as opposed to just its syntax, as in generic automated approaches.
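To make the direct-mapping principles concrete, the following minimal sketch applies them with the Python rdflib library. It is only an illustration of the generic approach in [4], not the KRT implementation; the table name, column names, and base IRI are assumptions.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Illustrative base namespace and input data; not taken from the paper.
EX = Namespace("http://example.org/data/")

def direct_map(table_name, rows):
    """Generic direct mapping: each row becomes an RDF node, each column
    name becomes a predicate, and each cell becomes a literal value."""
    g = Graph()
    g.bind("ex", EX)
    for i, row in enumerate(rows):
        node = EX[f"{table_name}/{i}"]              # 1) a record is an RDF node
        g.add((node, RDF.type, EX[table_name]))
        for column, cell in row.items():
            predicate = EX[f"{table_name}#{column}"]  # 2) column name -> predicate
            g.add((node, predicate, Literal(cell)))   # 3) cell -> literal value
    return g

if __name__ == "__main__":
    recipes = [{"name": "Quiche", "servings": 4}, {"name": "Pie Crust", "servings": 8}]
    print(direct_map("Recipe", recipes).serialize(format="turtle"))
```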

2.3 Materialization vs. Virtualization
Current state-of-the-art systems to map legacy data to RDF enable virtual access (mainly through query rewrite) to data as RDF, or materialization of RDF graphs, or both. For example, Ontop [5, 14] uses a combination of ontological constraints (in a global ontology) and mapping rules to rewrite SPARQL queries into SQL queries that can then be evaluated over a traditional relational database - an illustration of the Ontology Based Data Access (OBDA) approach. At the other end of the spectrum, a system like Stardog¹ supports both virtual access and complete materialization of the legacy data into a new knowledge graph.

2.4 High-level declarative mapping language vs. General purpose transformation/query or programming language
Although a majority of state-of-the-art systems specify mappings and transformations using high-level declarative mapping languages (such as R2RML or RML), some approaches rely on general purpose transformation/query or programming languages. For example, [7, 8] use standard XML query or transformation languages such as XQuery or XSLT to encode both mappings and transformations, sometimes in combination with SPARQL, as in [7]. A clear advantage of higher level declarative mapping languages is that they can be much easier to use (even without visual editors) and to analyze, for example, in order to produce sound and complete query rewrites for data access through virtualization. However, their limited expressiveness often prevents the specification of arbitrarily complex mappings and transformations. To address this limitation, language extension points enable the invocation of functions written in general purpose Turing-complete languages.

2.5 Existence vs. Absence of visual mapping editors
Separating mapping specification from mapping execution was an important step in facilitating the customization of the mapping process by end users. However, without powerful visual mapping editors, considerable familiarity with the mapping language and the underlying input format is still required. Recently, to address this issue, visual mapping editors have been developed. However, most of them still require some user familiarity with the mapping language. For example, the FluidOps editor [71] guides the user in an input data driven, step-by-step process that is closely aligned with the main conceptual constructs of R2RML. Thus, although it effectively hides R2RML vocabulary, it still relies on a good user understanding of R2RML concepts. TopBraid Composer² also provides a similar data-driven step-by-step approach, but, in contrast to [71], it supports multiple data sources (relational data, CSV, XML, etc.). [64] describes an ontology-driven improvement over [71], where the user can start from a target ontology (instead of the input data only) and can perform the previous steps in any order, but the steps are still close to R2RML rules. DataOps [65] is another editor supporting heterogeneous sources, but it still exposes to the user some of the syntax of the mapping language. RMLEditor [39] is one of the most advanced visual mapping editors, based on its ease of use and features: it supports both data-driven and ontology-driven approaches, enables users to visualize both the input and the output, and exposes only the syntax of the navigation language of the input data to the user (e.g., XPath for XML, JSONPath for JSON). Unfortunately, RMLEditor, as in the case of other visual editors, is still instance-based: the user starts by loading input data (e.g., XML files) instead of models or schemas describing the structure of all considered data (e.g., XML Schemas). This means that the completeness of the resulting mapping remains difficult to assess based on the few examples provided as input to the mapping process.

2.6 The KRT Approach
In the KRT, we consider ease of mapping and customization by subject matter experts as paramount. In particular, we aim at an approach that relies on a robust, mature, and straightforward visual mapping editor combined with good default automated mappings that can be visually reviewed by users. Unfortunately, none of the visual RDF mapping editors presented in the previous section has a mapping process as simple as connecting elements from the input (schema) to mapped elements in the output ontology by drawing a line between them. Visual mapping editors to map between relational data and XML data have been extensively researched [36, 66, 69], have reached significant maturity and robustness levels, and are supported in major commercial data integration tools (e.g., IBM Rational Application Developer Mapping Tool (IBM RAD)³, Altova MapForce⁴, Stylus Studio⁵). They support model (or schema) based mappings, and mappings can be specified as simply as connecting input elements to output elements (without even specifying an expression in a navigation language such as XPath). Since an RDF graph can be viewed as an XML document (when serialized in the abbreviated RDF/XML syntax), in the KRT we use a commercial XML mapping tool (the IBM RAD Mapping tool) to leverage the extensive research, maturity and ease of use of XML visual mapping tools. However, we had to make two important adaptations to enable easy, consistent and robust RDF mappings using an XML mapping editor.

First, like most XML mapping editors, the RAD XML mapper is schema based. The user starts by specifying the structure of the input and output by providing their DTD or XML Schema. The input structure is then displayed on a left panel while the output structure is displayed on the right panel.
Mappings can then be specified as easily as connecting an element from the input panel to an element of the right panel by drawing a line between them. However, our target is an OWL ontology - not a DTD or XML Schema required by RAD. OWL and XML Schema both define constraints, but these constraints have very different semantics: XML Schema constraints are for checking the validity of XML documents, whereas OWL constraints are for inferring new facts. An XML Schema that constrains the structure of an arbitrary ABox file serialized in the abbreviated RDF/XML syntax would be too permissive; it would essentially specify a single wildcard element (xs:any) as the content of the element, which would have no value in the mapping editor, since the right panel would show only a single uninformative wildcard element. In the KRT, we created a tool to automatically generate a customized XML Schema given a TBox.

Second, we need to make sure that IRIs assigned to mapped RDF resources are used consistently in their definition (as value of the attribute "rdf:about") and in references to them (as value of the attribute "rdf:resource"). RAD, like other XML mapping editors, does not support this directly. Our approach is to define, as a reusable function (called a Submap in RAD), the specification of the mapping that takes a given XML element as input and produces as output the IRI of the corresponding mapped RDF resource. We then consistently invoke this function both to compute the value of the attribute "rdf:about" of the RDF resource and for all references to that resource (as value of the attribute "rdf:resource").

In addition to the ease of mapping editing provided by RAD, since it is schema-based, it is more amenable to assessing the completeness of the specified mappings. Furthermore, by adding semantic constraints like disjointness (e.g., Person and Vehicle are disjoint, or Man and Woman are disjoint) and performing consistency checks with explanations on the generated ABox files, we can automatically detect errors in the generated ABox and debug the mappings. Currently, our tool to generate a customized target XML Schema from a target ontology does not take into account all the semantic constraints defined in the ontology. This means that some mappings that violate semantic constraints can still be specified in the tool - hence the need to do consistency checks on the generated ABox. As we implement more semantic constraints in the tool, more invalid mappings will become infeasible within the visual editor.

¹ http://www.stardog.com/blog/virtual-graphs-relational-data-in-stardog/
² http://www.topquadrant.com/tools/modeling-topbraid-composer-standard-edition/
³ https://www.ibm.com/developerworks/downloads/r/rad/
⁴ https://www.altova.com/mapforce.html
⁵ http://www.stylusstudio.com/xml-mapper.html
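The second adaptation can be illustrated with a small sketch in which a single function plays the role of the reusable Submap, so that the same IRI is produced wherever a resource is defined (rdf:about) or referenced (rdf:resource). The element names and IRI scheme below are illustrative assumptions, not the actual KRT mapping.

```python
from xml.sax.saxutils import escape

BASE = "http://example.org/resource/"   # illustrative base IRI

def resource_iri(element_tag, key):
    """Single point of IRI minting (the role of a reusable Submap):
    called both where a resource is defined and where it is referenced."""
    return f"{BASE}{element_tag}/{escape(str(key))}"

def describe(element_tag, key, properties, references):
    """Emit abbreviated RDF/XML for one mapped resource."""
    lines = [f'<{element_tag} rdf:about="{resource_iri(element_tag, key)}">']
    for prop, value in properties.items():
        lines.append(f"  <{prop}>{escape(str(value))}</{prop}>")
    for prop, (ref_tag, ref_key) in references.items():
        # The same function computes the referenced IRI, so rdf:about and
        # rdf:resource values stay consistent by construction.
        lines.append(f'  <{prop} rdf:resource="{resource_iri(ref_tag, ref_key)}"/>')
    lines.append(f"</{element_tag}>")
    return "\n".join(lines)

print(describe("Recipe", "quiche", {"title": "Quiche"},
               {"hasStep": ("RecipeStep", "quiche-step-1")}))
```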
3 REASONING CAPABILITIES
Logical reasoning is a well-studied and developed area of research that, in recent years, has become fundamental in various knowledge representation paradigms. Logic programming languages such as Horn logic programs, Prolog (and its variants), Datalog, and ASP (Answer Set Programming) all suffer from exponential growth of the search space. This has driven research in two directions: the first one focuses on optimizing computational complexity by developing pruning techniques, heuristics, etc., and the second one consists of using less expressive logics that are more amenable to the large datasets that we find in the majority of real-world use cases. In order to be able to handle these kinds of scenarios in our toolkit, we focus on the OWL and RDFS formalisms. These paradigms both use description logics (DL, decidable subsets of First Order Logic (FOL)) as their underlying logic. In our toolkit we use particular reasoners developed for these models, such as Pellet, Racer, HermiT, Quonto, and ELK. For simple queries we use SPARQL, but for more sophisticated reasoning we use a rule-based reasoner. This enables our toolkit to provide an answer to a user query jointly with a proof, i.e., one or more detailed explanations of the inference required to reach an answer. The explainability of AI systems is an increasingly important factor in convincing human experts of the correctness of the results given by a machine.

Since our model is based on knowledge extraction from natural language (i.e., textual documents and/or dialog with a user), it is necessary to be able to handle noisy data, conflicts and uncertainty. Therefore we are interested in extending the logic of our toolkit with probability theory. While the early logic programming languages, such as Horn logic programs and Prolog, did not focus on expressing and reasoning with uncertainty, in recent years logic programming languages have been developed that can express both logical and quantitative uncertainty, such as SLP (Stochastic Logic Programs), PCC (Probabilistic Concurrent Constraint Programs), ProbLog (Probabilistic Prolog), PSL (Probabilistic Soft Logic), MLN (Markov Logic Networks), and PRISM (PRogramming In Statistical Modeling). We are investigating such programming languages and their restriction/application to the case of description logics (DL), such as the recently developed Pronto reasoner, a version of Pellet that allows probabilistic reasoning. This reasoner assigns an annotation corresponding to the probability value of the considered entity in the knowledge base, which by default is set to 1 if not otherwise specified.

Besides defining a probability distribution over the different conflicting alternatives, another way to handle conflicts in a knowledge representation is using contextual information. There are several approaches to augmenting an ontology with context [6, 11, 17, 35, 46, 60, 61]. One of the first formulations of context in description logic is using named graphs [17]. However, this formulation can cause a huge amount of duplication in the data. In [46] the authors provided an alternative method: they introduced a new namespace prefix (rdfc) to handle context information. The rdfc namespace is characterized by the fundamental relationship is true in (ist), where the formulation "statement ist context" means that the statement is true in that specific context. They then use lifting rules or lifting axioms to deduce the truth of statements in one context from the truth of statements in some other context. Another approach to handle context, proposed in [60, 61], avoids the creation of new namespaces and corresponds to introducing, in a regular ontology, a supporting ontology made of second-order triples defining instances of properties in a specific context. Thus each property can be used in a general way or as an instance of the general property in a specific context.

Context can also improve the computational time of the inference process: in order to compute the answer, context can be used to force the system to consider only the relevant information of a given input context. Other techniques, such as summarization and refinement [25, 32], can be used for this purpose.

The approach to handle context proposed in [60, 61] gives us the opportunity to have a meta-ontology that our system can also use to manage itself. For example, we want to be able to describe rules that deduce which action to perform next, using information about previous operations (or interactions with the user) together with the content of such operations (e.g., when the user wants to add a new class to the ontology, the meta-reasoner should check if the class already exists and, if it does not, should understand automatically which questions to pose to the user in order to create the new class in the right place in the ontology). A second order ontology ([21]) allows us to handle different levels of information. For example, in compliance, we want to represent both the hierarchy between different types of regulations/laws and their characteristics, and the content of such regulations.
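As a concrete illustration of the named-graph formulation of context [17] discussed above (one of the alternatives, not necessarily the KRT's internal representation), the following sketch uses Python's rdflib to assert statements in two contexts and to restrict a query to one of them; the namespace and the compliance example are assumptions.

```python
from rdflib import ConjunctiveGraph, Namespace

EX = Namespace("http://example.org/")        # illustrative namespace

# The same kind of statement asserted in two different contexts (named graphs):
# a property may hold in one regulatory context and not in another.
g = ConjunctiveGraph()
ctx_2016 = g.get_context(EX["context/regulation-2016"])
ctx_2017 = g.get_context(EX["context/regulation-2017"])

ctx_2016.add((EX.CompanyA, EX.compliesWith, EX.RuleX))
ctx_2017.add((EX.CompanyA, EX.compliesWith, EX.RuleY))
ctx_2017.add((EX.RuleY, EX.supersedes, EX.RuleX))

# Consider only the information relevant to a given input context.
for s, p, o in ctx_2017.triples((EX.CompanyA, None, None)):
    print(s, p, o)

# Or inspect every statement together with the context it is true in.
for s, p, o, c in g.quads((None, None, None, None)):
    print(c.identifier, "asserts", s, p, o)
```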

One of the principal differences between Prolog-style logic reasoning and description logic reasoning is the assumption of a closed or open universe. While a closed-world assumption (i.e., "negation as failure") is used in databases, in description logic it is common to assume an open universe: if an entry is not entailed by the knowledge base, it is unknown and so could be either true or false (but not deduced as false pre-emptively). In our scenario it is necessary to understand this difference, since our purpose is to elicit knowledge so as to be able to answer a user question. Thus we want to use this distinction to guide the conversation with the user; for example, consider a scenario in which we receive a query from a user and we try to answer, but the proof is "incomplete" since we miss some information. We then want to ask a set of questions of the user so as to elicit the information that we need in order to provide a positive or negative answer. This procedure can be seen as a form of abduction, but in our case it is guided through interaction with the user. Furthermore, in our tool, we try to minimize the number of interactions with the user, providing a proper minimal list of questions to resolve an incomplete proof.

4 NATURAL LANGUAGE PROCESSING SUPPORT
Analysis and interpretation of text in various human natural languages has been a fundamental aspect of the fields of artificial intelligence ([1]) and computational linguistics ([54]) for decades. Linguistic approaches focus on grammatical formalisms to capture the structural relationships between words, phrases, and clauses within sentences, including phrase structure grammars [18] and dependency grammars [62]. Large lexical resources have been developed [58] to aid disambiguation. State-of-the-art neural network parsers (e.g., [76], [3]) can achieve high accuracy on parsed corpora benchmarks such as the Penn Treebank [55].

Relationships are a fundamental building block of domain-specific knowledge bases used to support question answering, reasoning, explanation, and other tasks. Why is relation extraction so challenging? Relations can be expressed many different ways in natural language text. For example, a "memberOf" relation between John and a Cabinet can be expressed as "John is in the Cabinet", "Cabinet Member John", "John was a member of the Cabinet", and so on. There are many lexical and grammatical variations. Relations can be wide ranging, including social, temporal, spatial, and causal. Relations can be multidimensional (e.g., spatiotemporal). When designing a knowledge representation, one must often consider the cross product of the types of entities being extracted.

Many different methods have been developed for relation extraction. The goal of these methods is to extract relations that reflect the underlying relational structure of the domain of interest in the face of variability, noise, and uncertainty. Most methods evaluate pairs of entities within a sentence or a sliding window and classify the entities into relation types according to the lexical, grammatical, and semantic features of the entities and the surrounding tokens. Rule-based methods are typically accurate, but time consuming to develop and not easily generalized. For example, [38] developed rules to extract hyponym (is-a) relationships from news articles, but these cannot be applied to biomedical texts. Statistical relational learning methods [33] extract relations using learned models. Supervised methods include maximum entropy models [31][43] and conditional random fields [51]. Unsupervised methods include convolutional neural networks (CNNs) with attention models [73], recurrent networks [53], and other methods that use embeddings of characters, words, parts of speech tags, dependency parses, and other information to generate dense feature vectors. For example, Lee, Dernoncourt, and Szolovits [49] extract relations from scientific articles using a CNN with 4 layers: an embedding layer that converts lexico-syntactic features and the argument order into a vector, a convolutional layer that slides filters over the tokens to create feature maps, a max pooling layer that takes the most effective feature in the feature map, and a final layer that uses softmax activation to output the probability of each relation. Rule-based post-processing is used to correct and detect additional relations. [47] provides a recent survey of neural network models for relation extraction.
[67] focuses on mapping natural language to a formal representation of meaning, such as semantic case role labels or higher-level logical forms. New work indicates that neural network methods can be used to map sequences of words to logical expressions by encoding hierarchical representations [26]. Recent results indicate that neural language models that depend only on character-level inputs can also perform well on word and sentence level tasks [45]. Future work on relation extraction will likely focus on longer distance relationships between clauses and sentences in a document, including logical (entailment, presupposition) relationships [72].

Our work focuses on semantic relations that are suitable for applications involving reasoning and inference, such as question answering [30] and dialogue [29]. Most of the knowledge base managed by the KRT originates in data without the formal structure imposed by a schema or ontology and includes text expressed in various natural languages. Structured data sources, such as relational databases and XML documents, also include elements containing text or text mixed with structured data. A key challenge is the interpretation of this semi-structured natural language text into knowledge representations suitable for graph databases.

In the KRT, we prepare semi-structured data for human annotation automatically by extracting text elements and composing them into larger text units while maintaining provenance back to the structured source data. We employ a supervised learning methodology whereby human annotators label sequences of tokens within the text units as mentions of various entities. Mentions of the same entity are annotated as co-references. Mentions of relations are annotated by selecting entities in order to indicate the direction of the relation. Annotation guidelines are developed to improve agreement across the human annotators, and the resulting data is adjudicated in order to arrive at a ground truth. This domain-specific training data is used to train models. Human supervision is necessary given the specialized lexicon, grammar, and semantics of the target domains.

The KRT natural language processing pipeline starts with segmenting the text units into sentences, tokenizing, tagging with part-of-speech, and dependency parsing. Next, machine learning models are used to extract entities and relationships. The entity types used by the machine learning models are a hierarchical projection and filtering of a KRT OWL ontology. Additional entity types are added to enable human annotators to label tokens which modify the semantic interpretation of the extracted relationships, such as negations. Prior to entity detection, we make use of domain-specific dictionaries, generated by KRT microservices, to pre-annotate text with relevant lexical variations of domain entities. One microservice expands tokens using embeddings and matches the resulting terms to noun phrases in the dependency parse. Sequential classification with a sliding window is used to recognize relationships. Following relation extraction, another microservice connects entities and relations extracted from text units with both the nodes and properties generated from analysis of the structured data that contained those text units. Rules are then applied to align and normalize the graphs originating from the unstructured and structured sources.
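A minimal sketch of the early pipeline stages (sentence segmentation, tokenization, part-of-speech tagging, dependency parsing, dictionary-based pre-annotation, and sliding-window candidate generation) is shown below, using the off-the-shelf spaCy library as a stand-in for the KRT components; the model name, the toy dictionary, and the stubbed classifier are assumptions for illustration only.

```python
import spacy

nlp = spacy.load("en_core_web_sm")   # assumed off-the-shelf model, not the KRT's

# Illustrative domain dictionary used to pre-annotate lexical variants.
DOMAIN_TERMS = {"cabinet": "GovernmentBody", "john": "Person"}

def candidate_relation_windows(doc, window=10):
    """Yield pairs of pre-annotated mentions that co-occur within a sliding
    window of tokens, as candidates for relation classification."""
    mentions = [(tok.i, DOMAIN_TERMS[tok.lower_]) for tok in doc
                if tok.lower_ in DOMAIN_TERMS]
    for i, (pos_a, type_a) in enumerate(mentions):
        for pos_b, type_b in mentions[i + 1:]:
            if abs(pos_b - pos_a) <= window:
                yield (doc[pos_a], type_a), (doc[pos_b], type_b)

doc = nlp("John was a member of the Cabinet.")
for sent in doc.sents:                      # sentence segmentation
    for tok in sent:                        # tokens with POS tags and dependency arcs
        print(tok.text, tok.pos_, tok.dep_, "->", tok.head.text)

for (a, type_a), (b, type_b) in candidate_relation_windows(doc):
    # A trained sequential classifier would label this pair (e.g., memberOf);
    # the classification step is stubbed out here.
    print(f"candidate relation between {a.text} ({type_a}) and {b.text} ({type_b})")
```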
In future work, we plan to extract relations from the text contained in structured data, such as tables. We also plan to develop logical and process flow representations of documents that include conditions, actions, and states, including temporal and spatial relationships. Ultimately, a question answering system or dialogue system should be able to query the resulting graph to answer questions involving temporal and spatial reasoning over propositions composed of entities and relations specific to the target domain.

5 INFORMATION EXTRACTION FROM PDF DOCUMENTS
The very first step of building a high quality knowledge graph, and developing cognitive applications on top of it, is to extract information from unstructured data sources. In the last couple of decades, Portable Document Format (PDF) technology has been widely used for information storage and exchange in many domain specific areas. Although this technology provides a very convenient way of exchanging information, it is intended to be used by humans, not computers. One of the main design principles of the KRT is to have advanced capabilities to process digital documents and extract the information conveyed in these documents. Noisy information extraction processes degrade the quality of the knowledge graphs generated. Therefore, the quality of information extraction tools directly affects the quality of cognitive applications. Below we discuss various techniques proposed in the literature in order to extract information from various document sources. We also provide information about similar approaches, as well as other methods we use in the KRT infrastructure for the data ingestion pipeline.

In [74], Williams et al. describe a comprehensive system for information extraction from scientific documents in PDF format. Their system crawls thousands of documents from the CiteSeer digital library. After the crawling step they first extract metadata regarding the papers such as title, authors, abstract, venue, page numbers and publisher information. Extraction is performed using SVMHeaderParse [34], which is an SVM-based header extractor. They refer the reader to a recent comparison of various header extraction tools [52] which shows that more accurate extraction tools than SVMHeaderParse currently exist, though the scalability aspects of these tools are unclear. They also extract citations using the citation string parsing tool ParsCit [22]. Using regular expressions, the section of the text containing citations is first identified, and then each citation is extracted, parsed and tagged. The citation context for each extracted citation is stored, which allows for further citation analysis. Other than metadata and citations, they also mention the capability to extract tables using a specialized module and make them searchable. However, they do not provide further details about how the table extractor works or the quality of the extracted tables. According to [15], figures should be considered as rich information resources in digital documents, yet they have been neglected for a long time in terms of information extraction from digital libraries. Based on this, the authors in [74] target not only the extraction of figures, but also metadata such as the captions associated with the figures. Their extraction method is based on [19], where the positional and font related information extracted from digital documents can be used for accurate extraction of figure metadata. As extraction of such information is heavily dependent on the underlying PDF processing library (such as PDFBox), a machine learning-based system was also developed which uses only textual features. A Lucene based search engine is used to index figure related textual metadata extracted from documents [20]. They also develop a pipeline for extraction of data from 2D line graphs and scatter plots. Extraction of vector images and understanding the semantics of figures in scholarly documents are left as future work. In addition to figure extraction, they also developed modules for extracting algorithms and acknowledgements provided in digital documents, called "AlgSeer" [16] and AckSeer respectively. For extraction of acknowledgements, an ensemble of named entity recognizers is used to extract the entities from the acknowledgments section of a paper, and three types of entities are extracted: person names, organizations, and companies. In order to tackle the entity disambiguation problem, AckSeer utilizes a novel search engine based algorithm for disambiguating the plethora of entities found in the acknowledgment sections of papers [44].
Similar to the methods described above, we have specialized modules in the KRT for information extraction from PDF documents. During the extraction process we use both machine learning and rule-based methods. As the initial step, the KRT ingestion pipeline first converts the PDF documents into HTML format for further processing. Objects within the documents, such as tables and images, are identified using machine learning methods. If there are scanned pages or images within the documents, these are preprocessed with an OCR engine to convert them into text or tables. Once the HTML output is produced, an HTML parser engine is used to parse the document and extract tables, figures, captions and other metadata information. For caption extraction we use regular expression-based methods. Since tables are widely used in digital documents, we perform further processing steps for the extraction of the content of tables. These steps consist of identifying table headers, normalization of spanning row/column headers, and converting them into a more structured format in either XML or JSON, based on the user's preference.
Fang et al. provide a comprehensive list of heuristics that can be used for the extraction of table headers [28], which overlaps with some of the techniques we used for table header extraction. Once the data rows of tables are extracted, the content is converted into a graph structure for post-processing, such as reasoning and query answering. During the query processing step we also perform fuzzy match techniques to link the user queries with the table structure and bring useful information to the users.
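The table-normalization step can be sketched as follows, assuming the PDF has already been converted to HTML. The sketch uses BeautifulSoup and treats the first row as the header row, a simplification of the header-detection heuristics discussed above; it is not the KRT table extractor.

```python
import json
from bs4 import BeautifulSoup

def tables_to_json(html):
    """Extract each HTML <table> into a list of {header: cell} records,
    treating the first row as the header row (a simplifying assumption;
    real table-header detection uses richer heuristics, cf. [28])."""
    soup = BeautifulSoup(html, "html.parser")
    extracted = []
    for table in soup.find_all("table"):
        rows = [[cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
                for tr in table.find_all("tr")]
        if len(rows) < 2:
            continue
        header, data = rows[0], rows[1:]
        extracted.append([dict(zip(header, row)) for row in data])
    return json.dumps(extracted, indent=2)

html = """
<table>
  <tr><th>Drug</th><th>Dose</th></tr>
  <tr><td>Aspirin</td><td>100 mg</td></tr>
</table>
"""
print(tables_to_json(html))
```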

6 ONTOLOGY MANAGEMENT
The Ontology Input Output Service (ONTIOS) manages concurrent access to multiple ontology models (T-Boxes) and instance graphs (A-Boxes) on behalf of one or more users. It sits on top of a SAIL Sesame [13], Blazegraph⁶ or Jena [2] layer and supports frame-based, high level views of complex instances with nested components in domains with structured or semi-structured data (XML, HTML or JSON). These views can be deduced from a user provided ontology or induced from prototypical instances.

[Figure 1: The ONTIOS architecture - model induction, instance ingestion, metadata management and A-Box/T-Box mapping services exposed through data access, query (keyword, frame-based) and repository management REST APIs, layered over storage, write transaction and query/retrieval APIs on a SAIL (Sesame) or Jena layer with an ARQ endpoint and the RIO library.]

The ONTIO Service is domain-agnostic but designed to provide extra services on top of general ontology management and storage tools (like Blazegraph, NeOn [27], Open Anzo⁷ or AllegroGraph⁸). ONTIO leverages the characteristics of a large number of structured data sources in technical and process-oriented fields in order to provide automated ingestion of data according to semi-automatic, user-friendly ontologies that act as type systems to automatically curate the instance data.

There are many scenarios where data instances (A-Box) are already structured or semi-structured, like in the case of Web sites or XML documents describing goods and services, food recipes, or instructions for maintenance or assembly of complex products. These structures often carry valuable semantic information which can be used to automatically deduce a simple OWL-QL ontology with property domain and range definitions. This automatic seed ontology can be enhanced by users to achieve a user-friendly type system for the ingesting, browsing and querying of these data sources.

Given a few sample instances, the ONTIOS parser can deduce classes, data properties and structural object properties (relations), like containment, e.g., Recipe1 – contains –> RecipeStep2, and connect these structural building blocks to domain entities by using simple entity resolution services, e.g., paragraph RecipeStep2 – mentions –> MozzarellaCheese. The result is a seed ontology which is consistent with the underlying structure of the data source.

This seed model can be built upon by domain experts using standard ontology editors. Operations like renaming, adding aliases, extending the inheritance hierarchies and domain or range definitions of properties are allowed. These enhancements are automatically recorded in the user model, or T-Box. No statements of the seed model can be retracted by the user model.

New instances can now be ingested following this user model. If instance data doesn't follow the existing T-Box, e.g., the new data would entail adding a new class to the domain definition of an object property, the mismatched information is either flagged to the domain expert or can be automatically added to the existing T-Box.

Models managed by ONTIOS are required to be within OWL-QL, but the use of disjunction in the definition of property domains and ranges is allowed, since they do not add many statements to the deductive closure and prevent the proliferation of ad-hoc properties. Models should also follow the following set of design principles:
• A single top class, object property, datatype property and annotation property must be defined to help with ontology integration and refactoring.
• Object properties, with the exception of the top property above, should inherit from one of the abstract relations hasAttribute, attributeOf, hasMember, memberOf and associatedTo. This allows users unfamiliar with the ontology to query entire subgraphs associated to an entity, for example, a recipe and all its steps and ancillary sub-recipes, like the pie crust recipe needed to make a quiche. This is similar to other attempts to simplify the myriad of user-created relations in ontologies by subsuming them into well-known UML relations [63].
• The seed ontology and its classes represent the data source and its organization and are separate from the domain sub-ontology. For example, in the Recipe ontology, classes representing the tasks to be performed and their relations are separate from the dishes and ingredient ontologies which constitute the domain knowledge.
tending the inheritance hierarchies and domain or range definitions • Metadata, including user-readable labels and provenance, of properties are allowed. These enhancements are automatically are maintained in each instance. recorded in the user model, or T-Box. No statements of the seed model can be retracted by the user model. ONTIOS relies on (a) the underlying OWL or RDF APIs provided New instances can now be ingested following this user model. by Sesame or Jena, (b) an ARQ endpoint and (c) the RIO library for If instance data doesn’t follow the existing T-Box, e.g., the new data formatting as indicated in Figure 1. The ONTIOS architecture data would entail adding a new class to the domain definition of is similar to [40] in its use of OWL-like vocabulary for classes and an object property, the mismatched information is either flagged instances, even though it only requires an RDF store. The service to the domain expert or can be automatically added to the existing manages multiple access to the ontologies in a server, transactional T-Box. writes and provides its own indexing of classes and properties for ef- Models managed by ONTIOS are required to be within OWL- ficient access of the ontology, according to user-defined vocabulary. QL, but the use of disjunction in the definition of property domain Model induction and instance ingestion services aim to automate and ranges is allowed, since they do not add many statements the transfer of data into a well-formed ontology. Retrieval and query to the deductive closure and prevent the proliferation of ad-hoc services allow users to query the ontology with little knowledge properties. Also, models should also follow the following set of of the underlying structure of relations. Using keywords, the ser- design principles, vice can assemble together simple queries using relation paths of the abstract UML relations. Results are returned in the shape of frame-based nested objects that mimic the underlying ontology structure. 6https://www.blazegraph.com/ 7http://www.openanzo.org/ In summary, the ONTIO service aims to automate the disci- 8https://franz.com/agraph/allegrograph/ plined ingestion of structured data sources using domain experts’ HotWeb’17, October 14, 2017, San Jose / Silicon Valley, CA, USA M. Canim et al. enhanced vocabulary and provide user-friendly query facilities on commercial offerings by several companies including IBM, Amazon, top of an ARQ endpoint, which return a frame-based view of these and Microsoft offer cognitive services which are accessed via http. entities. These services are tailored to domains with many complex, It is becoming increasingly common to provide clients or software structured data sources, as found in many technical and financial development kits (SDKs) which hide details of the underlying http applications. interface and make it easier to use the services. Clients wrap HTTP calls to services in a function, method, or procedure which makes it 7 WEB SERVICES ARCHITECTURE AND considerably easier for an application program to access a service. USER INTERFACE Clients are typically written for specific programming languages such as Java, Python, or Node.js (which is actually a JavaScript As described in the sections above, the KRT is comprised of compo- runtime built on Chrome’s V8 JavaScript engine). Clients are pro- nents with very different characteristics in terms of system load, vided in different languages to accommodate applications written resource consumption, demands of scalability, etc. 
7 WEB SERVICES ARCHITECTURE AND USER INTERFACE
As described in the sections above, the KRT is comprised of components with very different characteristics in terms of system load, resource consumption, demands of scalability, etc. For example, though most components are compute-intensive, the information extraction component can also be I/O-intensive; while the querying and reasoning components require high-performance graph storage, the NLP component might need GPU support to run deep learning models more efficiently. Therefore we adopt a microservice architecture [59] to develop the toolkit as independently deployable services, and hence match the characteristics of each component with appropriate resources.

Adopting a microservice architecture also brings the following major opportunities and benefits [50]:
• Organized around Business Capabilities: Organizing teams around individual business capabilities reduces the context of development and makes clear the boundary of components.
• Decentralized Governance: Since different components are deployed individually and communicate using common protocols such as HTTP, teams have autonomy to choose technology stacks and deployment specifications.
• Independent Scalability: Given different load characteristics, components can be independently scaled and elastically adjusted.
• Infrastructure Automation: Automated tests, continuous deployment and continuous integration are as important for microservice architectures as for traditional monolithic applications. Specifically for microservices, tools like Kubernetes are used to automate microservices management.
We use Node-RED, a programming tool for wiring together hardware devices, APIs and online services, to put together microservices.

7.1 Web Services Interface
Since we offer the KRT as services, it is natural to use Representational State Transfer (REST) Web services as the main form of user interface. We design data and computation resource interfaces according to the semantics of HTTP request methods. For tasks that are short but frequently requested, bulk processing interfaces are provided to reduce the overhead of network transmission. For tasks that have long running times, we design a job management system in the backend to keep the interfaces stateless. Finally, well defined REST Web services can be wrapped into clients and programming language SDKs as described below.

7.2 Enhanced Client Support
It is common to provide interfaces to cognitive services using http as described above. Our KRT exposes services through http, and commercial offerings by several companies including IBM, Amazon, and Microsoft offer cognitive services which are accessed via http. It is becoming increasingly common to provide clients or software development kits (SDKs) which hide details of the underlying http interface and make it easier to use the services. Clients wrap HTTP calls to services in a function, method, or procedure, which makes it considerably easier for an application program to access a service. Clients are typically written for specific programming languages such as Java, Python, or Node.js (which is actually a JavaScript runtime built on Chrome's V8 JavaScript engine). Clients are provided in different languages to accommodate applications written in different languages.

Our client for the KRT provides a number of key features in addition to making it easier to use the http interface. We also provide caching at the client [41]. This is useful for situations when a response to a service call can be reused by subsequent service calls. When a request can be satisfied from a cache instead of by making a remote service call, this can considerably reduce latency.

We also provide the ability to store data at the client and analyze results from service calls at the client [42]. For example, our client can access natural language processing services and aggregate and analyze data across several documents. Our client integrates natural language processing with searching so that results from Web searches can easily be analyzed.

Our client provides both synchronous and asynchronous interfaces for accessing services. The synchronous interface makes requests for services and waits for each response before continuing. The asynchronous interface allows requests to be made to services without blocking. Instead, a request can execute on a separate thread while the main application continues executing in parallel. If the application can function correctly while making asynchronous service calls, the added latency for accessing services can be considerably reduced by using an asynchronous interface.
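A minimal sketch of such a client is shown below, illustrating client-side caching and an asynchronous interface over a plain HTTP API in the spirit of [41, 42]; the endpoint path, parameters, and response format are hypothetical, not the actual KRT interface.

```python
from concurrent.futures import ThreadPoolExecutor

import requests

class KRTClient:
    """Thin client sketch: wraps HTTP calls, caches responses, and offers an
    asynchronous interface. The endpoint and payload shape are hypothetical."""

    def __init__(self, base_url, max_workers=4):
        self.base_url = base_url.rstrip("/")
        self._cache = {}
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def query(self, keywords):
        """Synchronous call with client-side caching of repeated requests."""
        key = ("query", keywords)
        if key not in self._cache:
            resp = requests.get(f"{self.base_url}/query",      # hypothetical endpoint
                                params={"q": keywords}, timeout=30)
            resp.raise_for_status()
            self._cache[key] = resp.json()
        return self._cache[key]

    def query_async(self, keywords):
        """Asynchronous call: returns a Future so the caller does not block."""
        return self._pool.submit(self.query, keywords)

# client = KRTClient("http://localhost:8080/krt")       # hypothetical deployment
# future = client.query_async("quiche ingredients")     # work continues in parallel
# answers = future.result()                              # block only when needed
```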
8 CONCLUSION
This paper has presented our knowledge and reasoning toolkit (KRT) for developing cognitive applications. Our KRT provides advanced management and querying capabilities for both structured and semi-structured data. We discussed the reasoning capabilities and natural language processing support provided by our KRT. The KRT can also automatically extract information from PDF documents and provides ontology management. Our KRT is exposed to users via a Web services interface, and we provide clients for enhancing the functionality of application programs accessing the KRT.

9 ACKNOWLEDGMENTS
The authors would like to thank Francois Lancelot for his contributions to several of the ideas presented in this paper.

REFERENCES
[1] James F Allen. 2003. Natural language processing. (2003).
[2] A. Ameen, K.U. Rahman Kahn, and B.P. Rani. 2014. Reasoning in Semantic Web Using Jena. Computer Engineering and Intelligent Systems 5, 4 (2014).
[3] Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. 2016. Globally Normalized Transition-Based Neural Networks. CoRR abs/1603.06042 (2016). http://arxiv.org/abs/1603.06042

[4] Marcelo Arenas, Alexandre Bertails, Eric Prud'hommeaux, Juan Sequeda, et al. 2012. A direct mapping of relational data to RDF. (2012).
[5] Timea Bagosi, Diego Calvanese, Josef Hardi, Sarah Komla-Ebri, Davide Lanti, Martin Rezk, Mariano Rodríguez-Muro, Mindaugas Slusnys, and Guohui Xiao. 2014. The ontop framework for ontology based data access. In Chinese Semantic Web and Web Science Conference. Springer, 67–77.
[6] Djamal Benslimane, Ahmed Arara, Gilles Falquet, Zakaria Maamar, Philippe Thiran, and Faïez Gargouri. 2006. Contextual Ontologies. In ADVIS (Lecture Notes in Computer Science), Vol. 4243. Springer, 168–176.
[7] Nikos Bikakis, Chrisa Tsinaraki, Ioannis Stavrakantonakis, Nektarios Gioldasis, and Stavros Christodoulakis. 2015. The SPARQL2XQuery interoperability framework. World Wide Web 18, 2 (2015), 403–490.
[8] Stefan Bischof, Stefan Decker, Thomas Krennwallner, Nuno Lopes, and Axel Polleres. 2012. Mapping between RDF and XML with XSPARQL. Journal on Data Semantics (2012), 1–39.
[9] Christian Bizer and Richard Cyganiak. 2009. The D2RQ Plattform. (2009).
[10] Christian Bizer and Andy Seaborne. 2004. D2RQ - treating non-RDF databases as virtual RDF graphs. In Proceedings of the 3rd International Semantic Web Conference (ISWC 2004), Vol. 2004. Springer.
[11] Paolo Bouquet, Luciano Serafini, and Heiko Stoermer. 2005. Introducing Context into RDF Knowledge Bases. In Proceedings of SWAP 2005, the 2nd Italian Semantic Web Workshop. 14–16.
[12] Frank Breitling. 2009. A standard transformation from XML to RDF via XSLT. Astronomische Nachrichten 330, 7 (2009), 755–760.
[13] J. Broekstra, A. Kampman, and F. van Harmelen. 2002. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema. Springer Berlin Heidelberg, Berlin, Heidelberg, 54–68.
[14] Diego Calvanese, Benjamin Cogrel, Sarah Komla-Ebri, Roman Kontchakov, Davide Lanti, Martin Rezk, Mariano Rodriguez-Muro, and Guohui Xiao. 2017. Ontop: Answering SPARQL queries over relational databases. Semantic Web 8, 3 (2017), 471–487.
[15] Sandra Carberry, Stephanie Elzer, and Seniz Demir. 2006. Information Graphics: An Untapped Resource for Digital Libraries. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06). ACM, New York, NY, USA, 581–588. https://doi.org/10.1145/1148170.1148270
[16] Stephen H Carman. 2013. AlgSeer: An Architecture for Extraction, Indexing and Search of Algorithms in Scientific Literature. (2013).
[17] Jeremy J. Carroll, Christian Bizer, Pat Hayes, and Patrick Stickler. 2005. Named Graphs. Web Semant. 3, 4 (Dec. 2005), 247–267. https://doi.org/10.1016/j.websem.2005.09.001
[18] Noam Chomsky. 2002. Syntactic structures. Walter de Gruyter.
[19] Sagnik Ray Choudhury, Prasenjit Mitra, Andi Kirk, Silvia Szep, Donald Pellegrino, Sue Jones, and C. Lee Giles. 2013. Figure Metadata Extraction from Digital Documents. In Proceedings of the 2013 12th International Conference on Document Analysis and Recognition (ICDAR '13). IEEE Computer Society, Washington, DC, USA, 135–139. https://doi.org/10.1109/ICDAR.2013.34
[20] Sagnik Ray Choudhury, Suppawong Tuarob, Prasenjit Mitra, Lior Rokach, Andi Kirk, Silvia Szep, Donald Pellegrino, Sue Jones, and Clyde Lee Giles. 2013. A Figure Search Engine Architecture for a Chemistry Digital Library. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '13). ACM, New York, NY, USA, 369–370. https://doi.org/10.1145/2467696.2467757
[21] Simona Colucci, Tommaso Di Noia, Eugenio Di Sciascio, Francesco M Donini, and Azzurra Ragone. 2010. Second-order description logics: Semantics, motivation, and a calculus. In 23rd International Workshop on Description Logics DL2010. 67.
[22] Isaac G. Councill, C. Lee Giles, and Min-Yen Kan. 2008. ParsCit: an Open-source CRF Reference String Parsing Package. In LREC.
[23] Souripriya Das, Seema Sundara, and Richard Cyganiak. 2012. R2RML: RDB to RDF mapping language. (2012).
[24] Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, and Rik Van de Walle. 2014. RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. In LDOW.
[25] Julian Dolby, Achille Fokoue, Aditya Kalyanpur, Aaron Kershenbaum, Edith Schonberg, Kavitha Srinivas, and Li Ma. 2007. Scalable Semantic Retrieval Through Summarization and Refinement. In Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 1 (AAAI'07). AAAI Press, 299–304. http://dl.acm.org/citation.cfm?id=1619645.1619693
[26] Li Dong and Mirella Lapata. 2016. Language to Logical Form with Neural Attention. CoRR abs/1601.01280 (2016). http://arxiv.org/abs/1601.01280
[27] Michael Erdmann and Walter Waterfeld. 2012. Overview of the NeOn Toolkit. Springer, 281–301.
[28] Jing Fang, Prasenjit Mitra, Zhi Tang, and C Lee Giles. 2012. Table Header Detection and Classification. In AAAI. 599–605.
[29] Robert Farrell, Jonathan Lenchner, Jeffrey Kephart, Alan Webb, Michael Muller, Thomas Erickson, David Melville, Rachel Bellamy, Daniel Gruen, Jonathan Connell, et al. 2016. Symbiotic Cognitive Computing. AI Magazine 37, 3 (2016).
[30] David Ferrucci, Eric Brown, Jennifer Chu-Carroll, James Fan, David Gondek, Aditya A Kalyanpur, Adam Lally, J William Murdock, Eric Nyberg, John Prager, et al. 2010. Building Watson: An overview of the DeepQA project. AI magazine 31, 3 (2010), 59–79.
[31] R. Florian, H. Hassan, A. Ittycheriah, H. Jing, N. Kambhatla, X. Luo, N. Nicolov, and S. Roukos. 2004. A Statistical Model for Multilingual Entity Detection and Tracking. Defense Technical Information Center. https://books.google.com.tr/books?id=LuMwngAACAAJ
[32] Achille Fokoue, Aaron Kershenbaum, Li Ma, Edith Schonberg, and Kavitha Srinivas. 2006. The Summary Abox: Cutting Ontologies Down to Size. In The Semantic Web - ISWC 2006, 5th International Semantic Web Conference, ISWC 2006, Athens, GA, USA, November 5-9, 2006, Proceedings. 343–356. https://doi.org/10.1007/11926078_25
[33] Lise Getoor and Ben Taskar. 2007. Introduction to statistical relational learning. MIT press.
[34] Hui Han, C. Lee Giles, Eren Manavoglu, Hongyuan Zha, Zhenyue Zhang, and Edward A. Fox. 2003. Automatic document metadata extraction using support vector machines. In JCDL '03: Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries. 37–48. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.147.3718
[35] Ramanathan Guha. 1992. Contexts: A Formalization and Some Applications. Ph.D. Dissertation. Stanford, CA, USA. UMI Order No. GAX92-17827.
[36] Laura M Haas, Mauricio A Hernández, Howard Ho, Lucian Popa, and Mary Roth. 2005. Clio grows up: from research prototype to industrial tool. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data. ACM, 805–810.
[37] Lushan Han, Tim Finin, Cynthia Parr, Joel Sachs, and Anupam Joshi. 2008. RDF123: from Spreadsheets to RDF. The Semantic Web - ISWC 2008 (2008), 451–466.
[38] Marti A. Hearst. 1992. Automatic Acquisition of Hyponyms from Large Text Corpora. In Proceedings of the 14th Conference on Computational Linguistics - Volume 2 (COLING '92). Association for Computational Linguistics, Stroudsburg, PA, USA, 539–545. https://doi.org/10.3115/992133.992154
[39] Pieter Heyvaert, Anastasia Dimou, Aron-Levi Herregodts, Ruben Verborgh, Dimitri Schuurman, Erik Mannens, and Rik Van de Walle. 2016. RMLEditor: a graph-based mapping editor for Linked Data mappings. In International Semantic Web Conference. Springer, 709–723.
[40] M. Horridge and S. Bechhofer. 2011. The OWL API: A Java API for OWL Ontologies. Semantic Web 2, 1 (Jan. 2011), 11–21. http://dl.acm.org/citation.cfm?id=2019470.2019471
[41] Arun Iyengar. 2017. Providing Enhanced Functionality for Data Store Clients. In 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19-22, 2017. 1237–1248. https://doi.org/10.1109/ICDE.2017.168
[42] Arun Iyengar. 2017. Supporting Data Analytics Applications Which Utilize Cognitive Services. In 37th IEEE International Conference on Distributed Computing Systems, ICDCS 2017, Atlanta, GA, USA, June 5-8, 2017. 1856–1864. https://doi.org/10.1109/ICDCS.2017.172
[43] Nanda Kambhatla. 2004. Combining Lexical, Syntactic, and Semantic Features with Maximum Entropy Models for Extracting Relations. In Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions (ACLdemo '04). Association for Computational Linguistics, Stroudsburg, PA, USA, Article 22. https://doi.org/10.3115/1219044.1219066
[44] Madian Khabsa, Pucktada Treeratpituk, and C Lee Giles. 2012. Entity resolution using search engine results. In Proceedings of the 21st ACM international conference on Information and knowledge management. ACM, 2363–2366.
[45] Yoon Kim, Yacine Jernite, David Sontag, and Alexander M Rush. 2016. Character-Aware Neural Language Models. In AAAI. 2741–2749.
[46] Graham Klyne. October 2000. Contexts for RDF Information Modelling. Content Technologies Ltd. http://www.ninebynine.org/RDFNotes/RDFContexts.html
[47] Shantanu Kumar. 2017. A Survey of Deep Learning Methods for Relation Extraction. arXiv preprint arXiv:1705.03645 (2017).
[48] Andreas Langegger and Wolfram Wöß. 2009. XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. The Semantic Web - ISWC 2009 (2009), 359–374.
[49] Ji Young Lee, Franck Dernoncourt, and Peter Szolovits. 2017. MIT at SemEval-2017 Task 10: Relation Extraction with Convolutional Neural Networks. CoRR abs/1704.01523 (2017). http://arxiv.org/abs/1704.01523
[50] James Lewis and Martin Fowler. [n. d.]. Microservices. ([n. d.]). https://martinfowler.com/articles/microservices.html
[51] Yaliang Li, Jing Jiang, Hai Leong Chieu, and Kian Ming Adam Chai. 2011. Extracting Relation Descriptors with Conditional Random Fields. In IJCNLP. 392–400.
[52] Mario Lipinski, Kevin Yao, Corinna Breitinger, Joeran Beel, and Bela Gipp. 2013. Evaluation of Header Metadata Extraction Approaches and Tools for Scientific PDF Documents. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '13). ACM, New York, NY, USA, 385–386. https://doi.org/10.1145/2467696.2467753
[53] Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, and Houfeng Wang. 2015. A Dependency-Based Neural Network for Relation Classification. CoRR abs/1507.04646 (2015). http://arxiv.org/abs/1507.04646

[54] Christopher D Manning, Hinrich Schütze, et al. 1999. Foundations of statistical natural language processing. Vol. 999. MIT Press. [55] Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Build- ing a Large Annotated Corpus of English: The Penn Treebank. Comput. Linguist. 19, 2 (June 1993), 313–330. http://dl.acm.org/citation.cfm?id=972470.972475 [56] Franck Michel, Loïc Djimenou, Catherine Faron-Zucker, and Johan Montagnat. 2015. Translation of relational and non-relational databases into RDF with xR2RML. In 11th International Confenrence on Web Information Systems and Technologies (WEBIST’15). 443–454. [57] Franck Michel, Johan Montagnat, and Catherine Faron-Zucker. 2014. A survey of RDB to RDF translation approaches and tools. Ph.D. Dissertation. I3S. [58] George A Miller. 1995. WordNet: a lexical database for English. Commun. ACM 38, 11 (1995), 39–41. [59] Sam Newman. 2015. Building microservices: designing fine-grained systems." O’Reilly Media, Inc.". [60] Vinh Nguyen, Olivier Bodenreider, and Amit Sheth. 2014. Don’t like RDF reifica- tion?: making statements about statements using singleton property. In Proceed- ings of the 23rd international conference on World wide web. ACM, 759–770. [61] Vinh Nguyen and Amit P. Sheth. 2017. Logical Inferences with Contexts of RDF Triples. CoRR abs/1701.05724 (2017). http://arxiv.org/abs/1701.05724 [62] Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajic, et al. [n. d.]. Universal Dependencies v1: A Multilingual Treebank Collection. [63] Carlos Pedrinaci, A. Bernaras, T. Smithers, J. Aguado, and M. Cendoya. 2004. A Framework for Ontology Reuse and Persistence Integrating UML and Sesame. Springer Berlin Heidelberg, Berlin, Heidelberg, 37–46. [64] Christoph Pinkel, Carsten Binnig, Peter Haase, Clemens Martin, Kunal Sengupta, and Johannes Trame. 2014. How to best find a partner? An evaluation of editing approaches to construct R2RML mappings. In European Semantic Web Conference. Springer, 675–690. [65] Christoph Pinkel, Andreas Schwarte, Johannes Trame, Andriy Nikolov, Ana Sasa Bastinos, and Tobias Zeuch. 2015. DataOps: seamless end-to-end anything-to-RDF data integration. In International Semantic Web Conference. Springer, 123–127. [66] Alessandro Raffio, Daniele Braga, Stefano Ceri, Paolo Papotti, and Mauricio A Hernandez. 2008. Clip: a visual language for explicit schema mappings. In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on. IEEE, 30–39. [67] Siva Reddy, Oscar Täckström, Michael Collins, Tom Kwiatkowski, Dipanjan Das, Mark Steedman, and Mirella Lapata. 2016. Transforming dependency struc- tures to logical forms for semantic parsing. Transactions of the Association for Computational Linguistics 4 (2016), 127–140. [68] Jesús Barrasa Rodriguez and Asunción Gómez-Pérez. 2006. Upgrading relational legacy data to the semantic web. In Proceedings of the 15th international conference on World Wide Web. ACM, 1069–1070. [69] Mary Roth, Mauricio A Hernández, Phil Coulthard, L Yan, Lucian Popa, HC-T Ho, and CC Salter. 2006. XML mapping technology: Making connections in an XML-centric world. IBM Systems Journal 45, 2 (2006), 389–409. [70] Andy Seaborne, Damian Steer, and Stuart Williams. 2007. SQL-RDF. In W3C Workshop on RDF Access to Relational Databases. [71] Kunal Sengupta, Peter Haase, Michael Schmidt, and Pascal Hitzler. 2013. Editing R2RML mappings made easy. In Proceedings of the 2013th International Conference on Posters & Demonstrations Track-Volume 1035. CEUR-WS. 
org, 101–104. [72] Galina Tremper and Anette Frank. 2013. A discriminative analysis of fine-grained semantic relations including presupposition: Annotation and classification. Dia- logue & Discourse 4, 2 (2013), 282–322. [73] Linlin Wang, Zhu Cao, Gerard de Melo, and Zhiyuan Liu. 2016. Relation Classifi- cation via Multi-Level Attention CNNs.. In ACL (1). [74] Kyle Williams, Jian Wu, Sagnik Ray Choudhury, Madian Khabsa, and C Lee Giles. 2014. Scholarly big data information extraction and integration in the citeseer χ digital library. In Data Engineering Workshops (ICDEW), 2014 IEEE 30th International Conference on. IEEE, 68–73. [75] S Hassas Yeganeh, Oktie Hassanzadeh, and Renée J Miller. 2011. Linking semistructured data on the web. Interface (2011). [76] Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent Neural Network Regularization. CoRR abs/1409.2329 (2014). http://arxiv.org/abs/1409. 2329