What Is a Knowledge Graph?

Semantic Web 0 (0) 1, IOS Press

James P. McCusker, John S. Erickson, Katherine Chastain, Sabbir Rashid, Rukmal Weerawarana, Marcello Bax, and Deborah L. McGuinness
Computer Science, Rensselaer Polytechnic Institute, Troy, NY, US
E-mails: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved

Abstract. Knowledge graphs have enjoyed a resurgence in research interest after the development of several commercial projects, such as Google’s knowledge graph. However, the use of the term has evolved and may now refer to a wide range of graphs that may not include clear and unambiguous definitions or references. To provide greater clarity to knowledge graph research, we survey the literature for current efforts that may inform a knowledge graph definition, and then use that review along with our work to synthesize a definition that is relevant and informative to current knowledge graph research, while constraining the research space that may be considered a knowledge graph. We define a knowledge graph as “A graph, composed of a set of assertions (edges labeled with relations) that are expressed between entities (vertices), where the meaning of the graph is encoded in its structure, the relations and entities are unambiguously identified, a limited set of relations are used to label the edges, and the graph encodes the provenance, especially justification and attribution, of the assertions.” We evaluate a wide variety of knowledge resources, graphs, and ontologies to determine if they qualify under our definition, and find that while expressing knowledge as a graph structure and unambiguous denotation of entities and relations in the graph are common, it is less common to trace provenance of encoded knowledge, and less common to constrain the relations used when expressing that knowledge. We created our Knowledge Graph Catalog to support this effort, and make it available to the public to search and contribute new knowledge graphs.

Keywords: Knowledge Graphs

1. Introduction

Google introduced its Knowledge Graph project in 2012 [1] in order to enhance its search result quality, but the project has also reignited interest in knowledge graph research. Google has leveraged existing knowledge graphs, such as DBpedia and Freebase, and has also opened up the process of contributing to the graph by ingesting linked data, RDFa, and microdata formats from the Web pages it indexes, based on the vocabularies published by schema.org. The success of the Google Knowledge Graph, and its use of semantic technologies, has led to a resurgence in the use of the term in semantic research to describe similar projects. However, the term “knowledge graph” remains underspecified, and in many cases simply refers to any directed labeled graph. The pre-Semantic Web conceptualization of knowledge graphs provides us with guidance as to what might currently “count” as a knowledge graph and also describes capabilities that do not yet exist in current knowledge graphs. From this synthesis, we propose an updated definition along with a set of knowledge graph requirements. We include the requirement that knowledge graphs represent attributable knowledge; thus they need to include information about where the knowledge came from, as opposed to containing "bare statements" with no justification or provenance. We discuss how knowledge graphs as defined are a crucial component for the future of the Web and have great potential for transformational change in data science and domain sciences.

Knowledge graphs provide an opportunity to expand our understanding of how knowledge can be managed on the Web and how that knowledge can be distinguished from more conventional Web-based data publication schemes such as Linked Data [2]. In recent years, knowledge graphs have grown increasingly prominent through commercial and research applications on the Web. Google was one of the first to promote a semantic metadata organizational model described as a “knowledge graph,” and many other organizations have since used the term in published research on knowledge management and graph databases. Our purpose with this paper is to survey the evolving notion of a knowledge graph, to describe the general space, and to provide an explicit operational description of a knowledge graph. We begin with a review of recent definitions of knowledge graphs, knowledge graph analysis and construction algorithms, and commercial, research, non-profit, and government knowledge graphs. These new knowledge graphs do not strictly adhere to original knowledge graph theory [3], but instead have followed a looser, more flexible definition. We present a more descriptive view of current, practical knowledge graphs, and discuss their potential for evolution and impact.

2. Related Work

Rospocher, et al. present knowledge graphs as collections of facts about entities, typically derived from structured data sources such as Freebase [4]. They cite a dearth of event representations in current knowledge graphs as a shortcoming, one that limits knowledge graphs to encyclopedic items such as birth and death dates, and attribute it primarily to the difficulty of obtaining temporal data about entities in a structured manner. Recent surveys, such as those by Hogenboom, et al. [5] and Deng, et al. [6], provide overviews of numerous methods for event extraction from a variety of sources including social media, news, academic publications, and even images and video, indicating that there is great interest in finding ways to interpret and include such temporal data in a more structured format. Another review by Nickel et al. explores machine learning methods for knowledge graphs, but limits its definition to directed labeled graphs, with the ability to optionally pre-define the schema. They also review, but do not take a position on, the use of the closed versus open world assumptions.

[...] set of relation types are used. These requirements also minimize redundancy within the knowledge graph, which simplifies analytical operations (including reasoning and queries). Popping explores the use of knowledge graphs in network text analysis, along with the challenges they faced at the time [9]. Following Zhang, Popping defines the knowledge graph as a type of semantic network that uses only a few types of relations, but also asserts that additional knowledge may be added to the graph.

Ehrlinger [10] selected some representative definitions that demonstrate the lack of a common core understanding of the concept. Färber, et al. [11] and Huang, et al. [12] define a knowledge graph as an RDF graph. Paulheim [13] argues that "knowledge graphs are supposed to cover at least a major portion of the domains that exist in the world, and are not supposed to be restricted to only one domain." But while DBpedia and Wikidata are general knowledge graphs that do not focus on a single domain, this should not mean that all knowledge graphs must be general. On the contrary, we believe that graphs created for specific domains, such as biology, can be considered knowledge graphs if they follow the other requirements. More recently, many works report on automatically building knowledge graphs out of textual medical knowledge and medical records [14], [15], [16], [17].

3. A Definition of “Knowledge Graph”

The knowledge graph platforms reviewed in this paper do not strictly adhere to the definition of knowledge graph set out in van de Riet and Meersman [3], Stokman and de Vries [7], and Zhang [8]. Since usage has evolved, it is appropriate to develop a definition that follows how the term is currently used.
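
As an illustration only, the definition stated in the abstract can be read as a small data model. The following Python sketch is not the authors' implementation. The class names (Provenance, Assertion, KnowledgeGraph), the example.org identifiers, and the use of a URI-scheme check as a crude stand-in for "unambiguously identified" are all assumptions introduced here.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


def looks_like_iri(identifier: str) -> bool:
    """Crude stand-in for 'unambiguously identified': require a URI scheme."""
    return bool(urlparse(identifier).scheme)


@dataclass(frozen=True)
class Provenance:
    attribution: str    # who or what source made the assertion
    justification: str  # why the assertion is believed (evidence, method)


@dataclass(frozen=True)
class Assertion:
    subject: str            # entity identifier (vertex)
    relation: str           # relation identifier (edge label)
    obj: str                # entity identifier (vertex)
    provenance: Provenance  # required, so no "bare statements"


class KnowledgeGraph:
    """Hypothetical container that enforces the requirements in the definition."""

    def __init__(self, allowed_relations):
        # A limited, fixed set of relations may label the edges.
        self.allowed_relations = frozenset(allowed_relations)
        self.assertions = set()

    def add(self, assertion: Assertion) -> None:
        if assertion.relation not in self.allowed_relations:
            raise ValueError(f"relation not in the limited set: {assertion.relation}")
        for identifier in (assertion.subject, assertion.relation, assertion.obj):
            if not looks_like_iri(identifier):
                raise ValueError(f"ambiguous identifier: {identifier}")
        prov = assertion.provenance
        if not prov.attribution or not prov.justification:
            raise ValueError("assertion lacks attribution or justification")
        self.assertions.add(assertion)

    def entities(self):
        """Return the set of vertices mentioned by any assertion."""
        return {a.subject for a in self.assertions} | {a.obj for a in self.assertions}


# Usage with made-up identifiers under example.org.
LOCATED_IN = "http://example.org/rel/locatedIn"
kg = KnowledgeGraph(allowed_relations={LOCATED_IN})
kg.add(
    Assertion(
        subject="http://example.org/entity/Troy",
        relation=LOCATED_IN,
        obj="http://example.org/entity/NewYork",
        provenance=Provenance(
            attribution="http://example.org/source/gazetteer",
            justification="listed in the source gazetteer",
        ),
    )
)
```

Rejecting an assertion at insertion time is one possible design choice. A catalog that evaluates existing graphs, as the paper describes, would more likely record which requirements a graph meets than refuse non-conforming data.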
